SC11 Tutorial M13: Metadata: "Big Data Means Your Metadata Must Work"
given at SC11: http://sc11.supercomputing.org/
http://epad.dataone.org/20111114-SC11-Tutorial-M13-Metadata

Science Metadata Interest Group: Volunteer Self-Assembled - openly shared list
cobbjw@ornl.gov

Audience Questions:

Primary Role:
a) Domain scientist 3
b) CS Researcher 7
c) Manager ~15
d) Systems Administrator ~30

Q: Level of Familiarity with Metadata (none / some / moderate / expert)
none 3
some ~12
moderate 6
expert 0 (Perhaps some shyness)

Q: Level of Familiarity with Metadata (none / some / moderate / expert)
none 2
some ~2/3
moderate ~10
expert 0 (Perhaps some shyness)

Q: Current Metadata projects
Y: 1/2 to 2/3
N: ~1/3

Q: Familiarity with workflow systems: ~7

Audience counts
1:40 PST: 60
1:50 PST: 61
2:50 PST: 50
3:40 PST: 30

Audience: Questions/Comments - requesting action in Tutorial

Comments/Discussion

Jim Gray: Scientists spend 80% of their time getting the data together and 20% of the time doing science. We want to flip that.

So what's the easiest way to capture metadata from already existing workflows (aka stubborn scientists)?

It is hard. The lesson is that the cost of capturing metadata rises the longer you wait after the data is created. After-the-fact metadata capture is tough but has been undertaken. For example, NARA does this all the time, since it is often the receptacle of projects that have not planned well. Another example is the NEES project . The new NEES operations team has dedicated a full-time curator to help with metadata issues for ongoing and past projects. They have also defined a data file hierarchy rubric within which to place data files from earthquake engineering projects.

I mean how to integrate the capture into software they use that may/may not be in house (commercial software).
For example...if the workflow involves opening up Excel, doing something, how can I capture that? It's difficult when scientists all use different tools to manipulate data.

My (personal) advice is "be thoughtful" beforehand. The usual adage is: good judgment comes from experience, but experience is often obtained from bad judgment. The best way to address this problem is with a team that has felt the pain of not doing it well. Excel is a good example: it is used by many, many scientists as a legitimate part of their science workflow. The California Digital Library is working with Microsoft to try to bring some better data practices to Excel. See: . We hope to have this available sometime in 2012 - with some error bars on the time here.

It's very surprising how many profs don't care about succession of datasets. The amount of work/time grad students spend trying to figure out what old datasets are.... Perhaps the biggest thing funders (like NSF) could do is enforce/require LIMS.

This varies program officer by program officer. The curation progress that NEES is making is a direct result of the emphasis that the NEES program officer places on deposition of project data. She will not approve NSF FastLane reports, project renewals, or even no-cost extensions without satisfactory progress on curation. And that is an anecdotal answer to the question above about "how": the answer is not a technical software issue, it is a social issue. Researchers will care more when their incentives are phrased as such. I have never heard of anyone being denied tenure because of inadequate data curation... yet.

A DataONE overview video:
A DataONE overview lecture:
And on the more humorous side:
A video on sharing Unicorn data: - A humorous Xtranormal view of data sharing/data hoarding << unicorns are real after all!!
A personal geek-out: a video prepared for the ORNL staff awards program (with only minor apologies to Bob Dylan)

**** BREAK ****

Metadata gnomes and Ren and Stimpy made it today!

Responding to kris: do you actually get scientists to do this?
Actually yes, but it also helps if the project they belong to makes it a requirement to supply metadata for their datasets. The demo shows a lot of typing, but in reality you can upload from other documents - the demo is to show you one edi

I see how you input metadata into the GUI metadata tools... but where do you actually 'relate' the actual datasets to the metadata? (Maybe I missed this step.)
No, you didn't miss it - John skipped over that because he didn't want to fill out the 15 pages. At the end, you can upload the dataset. Morpho works for just data files; EML will support databases, but Morpho is intended for individual datasets.
ok cool..

"Social" is key here; until it's as easy as Facebook, people will see it as a burden. And scientists aren't really the social type :P
Maybe not Facebook social, but more and more scientists are members of larger collaborations that have received funding, so working together or finding someone to collaborate with is becoming more important. If another researcher needs data and finds your data, the collaboration can begin. To enable that discovery, quality metadata is very important.
Even that type of social isn't very...well... social anymore :P