OK - I plan on joining.
Spring 2013 Usability & Assessment / Sociocultural Working Group Joint Meeting
Member Nodes Subgroup
Twitter sharing during the meeting: @DataONEorg #SCUAwg
To contact Rama for inclusion in this group, dial him directly at 301-614-5356. He is available from 1:30 to 2:30, then potentially after 4 pm ET. He will use the Etherpad.
Rama: You can contact me at 865.924.9661
Current Meeting Documents________________________________________________________________
MN Policy DRAFT
MN Checklist (this is higher level for public use) http://mule1.dataone.org/OperationDocs/member_node_deployment/mn_checklist.html
and MN Procedure DRAFT (this is detailed for internal use) https://repository.dataone.org/documents/Committees/MNcoord/Coordination%20Work%20Area/SC%20UA%20Joint%20WG%20mtg%2030Apr-2May/DRAFT%20Member%20Node%20procedures.docx
MN Persona Template https://repository.dataone.org/documents/Committees/MNcoord/Coordination%20Work%20Area/SC%20UA%20Joint%20WG%20mtg%2030Apr-2May/DataONE%20Member%20Node%20Persona%20Template%20-%20DRAFT.docx
MN Persona DRAFT (for PPSR - Public Participation in Scientific Research) https://repository.dataone.org/documents/Committees/MNcoord/Coordination%20Work%20Area/SC%20UA%20Joint%20WG%20mtg%2030Apr-2May/DataONE%20Member%20Node%20Persona%20-%20PPSR%20-%20DRAFT.docx
Past Meeting Documents________________________________________________________________
Presentations from February NSF Review of DataONE (Reverse Site Visit) https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-reverse-site-visit-february-2013/presentations_final_versions
Working Group Repository __________________
https://docs.dataone.org/member-area/member-nodes/coordination-work-area/scua_wg_20130430_materials
Background and Related Work __________________
DataONE five Principles:
https://docs.dataone.org/member-area/working-groups/usability-and-assessment/meetings-usability-and-assessments-working-group/joint-u-a-sc-wg/faqs-documentation-environmental-scan/D1%20Principles%205.2.12.docx/view?searchterm=five%20principles
Data Stewardship Principles: https://docs.dataone.org/member-area/working-groups/sociocultural-issues/charter-sociocultural-working-group/sociocultural-issues/ahm-draft-outputs/data-research-lifecycles-stewardship-principles/Data%20Principles%20DataOne-2.doc/view?searchterm=four%20principles
PREVIOUS WORK from Joint UA/SC WG 2012: https://docs.dataone.org/member-area/working-groups/usability-and-assessment/meetings-usability-and-assessments-working-group/joint-u-a-sc-wg
* Tier 1 member node: public data repository; public data only; no access controls; moderate availability level; uses Tier 1 API
* Tier 2 member node: entry of authenticated user; logs presence, use, and responses of the member node (including authentication); delivery of data
* Tier 3 member node: authenticated user with write access to contribute datasets; public and restricted data
* Tier 4 member node: responsibility to back up their own and possibly other Tiers' data
CCIT WG will review.
MN and CN Relationship Diagrams: https://docs.dataone.org/member-area/working-groups/usability-and-assessment/meetings-usability-and-assessments-working-group/joint-u-a-sc-wg/metrics-and-statistics/mn-and-cn-relationship-diagrams
* Metrics and stats presentation - combines work from the last six months plus diagrams: https://docs.dataone.org/member-area/working-groups/usability-and-assessment/meetings-usability-and-assessments-working-group/joint-u-a-sc-wg/metrics-and-statistics/Metric%20-%20Stats%20pres_2.pptx/view
Subgroup work products:
Task lists:
Meeting Agenda: (MN portions)
Overall agenda: https://docs.dataone.org/member-area/working-groups/usability-and-assessment/meetings-usability-and-assessments-working-group/joint-ua-sc-wg-meeting-2013/Joint%20SC_UA%20WG%20Agenda%204.26.13.doc/at_download/file
Block 3: (Tue: 1:30-3:00)
Block 4: (Tue: 3:30-5:00) Persona discussion start: introduction; brainstorming; selection; writing assignments
Block 5: (Wed: 9:00-10:30) Document discussion: external web: procedure; policy
Block 6: (Wed: 11:00-12:30) Initial report out; continued persona development
Block 9: (Th: 9:00-10:30) MN process assessments - what are important metrics? (analytical, anecdotal, general)
Background: Rama's metrics paper: https://docs.dataone.org/member-area/working-groups/usability-and-assessment/meetings-usability-and-assessments-working-group/joint-u-a-sc-wg/metrics-and-statistics/Metrics%20-%20IGARSS%202007%20-%20Paper-20070425.pdf/view
Block 10: (Th: 11:00-12:30) Discussion of Member Node scaling limits; how to increase MN count; "MN lite"
Block 12: (2:00-3:00) Subgroup report out
Meeting Notes: (Free to all to edit and add notes - this is the main reporting for the meeting)
Block 3: (Tue: 1:30-3:00)
Block 4: (Tue: 3:30-5:00) Persona discussion start: introduction; brainstorming; selection; writing assignments
Block 5: (Wed: 9:00-10:30) Document discussion: external web: procedure; policy
Block 6: (Wed: 11:00-12:30) Initial report out; continued persona development
Block 9: (Th: 9:00-10:30) MN process assessments - what are important metrics?
(analytical, anecdotal, general)
Background Info:
DataONE PMP: https://docs.dataone.org/member-area/documents/management/project-management-plans-pmp
Goals, year 5 (year 4): 40 (20) MNs; 60 TB storage; 1M metadata records
May 2012 SC&UA WG products: https://docs.dataone.org/member-area/working-groups/usability-and-assessment/meetings-usability-and-assessments-working-group/joint-u-a-sc-wg/metrics-and-statistics
Block 10: (Th: 11:00-12:30) Discussion of Member Node scaling limits; how to increase MN count; "MN lite"
Block 12: (2:00-3:00) Subgroup report out
Issues not discussed elsewhere (i.e., "Parking lot")
Rama's question during Tuesday morning: Don't forget to use prior outputs of previous SC&UA WG meetings.
================================================
BEGIN CURRENT E-PAD
================================================
Member node breakout
Who is present: Todd Suomela, Suzie Allard, Kevin Crowston, Tanner Jessell, Robert Waltz, Amber Budden, Holly Mercer, John Cobb, Rama (virtual for parts of meeting), Ranjeet (virtual for parts of meeting)
Thursday morning - reconvene: usability and assessment metrics; scalability
Policy documents we will be reviewing: check under Current Meeting Documents (scroll up on the e-pad). Not rigid policy; "guidelines" is a good word.
Checklist; external web presence (tomorrow morning)
Today is the notion of "personas." This group has done much in generating user personas. When we say "member node persona," it is the description of an organization: an "org-sona."
Amber Owens will present on what Laura Creekmore and John Cobb have been working on. We will generate a group of candidate personas to think about, vote, then spend the bulk of the time writing draft personas. We had the notion that there would be more people (8 or 10) and 12 personas. This is an arduous process, so we will be satisfied to have drafts for a couple. Pick your collective brains! Questions are welcome during Amber's presentation.
Start-up question: The tiers we have talked about in the past (Tier 1 through Tier 4) are both implemented in what we are doing operationally and a big part of how we will organize the candidate member nodes. Different member nodes will want to participate differently.
Member Node Personas - Slides
Amber Owens is a master's candidate in the UT School of Information Science, part of the SciData IMLS program (scidata.sis.utk.edu). Interests: user-centered design and the sociocultural impact on cognition and interfacing potential.
Slide 1: MN Personas - help define / refine MN coordination activities
* Personas were informed by Stakeholder network input on potential MN personas
* Consistency of personas at DataONE
Slide: Templates
* DataONE usage scenarios
* Data Conservancy scenarios
* Data Conservancy profiles from Illinois and Purdue
Slide: Features
* Positive feedback on member nodes
* What a good persona should have: background (name, education, age); socioeconomic class and desires; life or career goals, fears, hopes, attitudes; reasons for using DataONE to share and reuse data; needs and expectations; skills... (more).
Slide: Process
* Goals: are we answering the needs of the users? How do we expand the user base?
* Methods: who are we servicing? Spreadsheet of potential member nodes with an attempt to categorize them (identifier, URL for external web site); categorize by type; guided scenarios from the template
Slide: MN URL and Classification
* Name, Description, Location, URL
* Just a way to lay out who the constituency / stakeholders might be. Who are we trying to reach out to?
Slide: Classification Schema
* Different kinds of repositories and organizations, with location and classification along with name and description.
Background informing the process of developing the persona in the exercise:
* 2009 - Scott Ambler - Intro to Persona Model
* Data Curation Profile (online in DataONE docs): talks about user needs but mainly focuses on the data. A good template, but we take it a step further in trying to figure out what to do with the data and how to facilitate best practices, growing each organization.
* A user scenario on the DataONE Docs site: pinpoints the different kinds of users (research scientists, librarians) and activity within the DataONE project. Sufficient for the needs of the scenario, but we take it a step further by informing the practice and meeting the needs of the user.
* Alan Cooper - The Inmates Are Running the Asylum - What makes a really good persona?
What are the categories established?
* Institutional Repository
* Discipline Science Repository for Researchers
* Government Repository
* Individual Investigator Repository
* Replication Node
* Public Participation in Science Repository
* Remote Sensing Large-Scale Data Repository
Q: Are these the categories of member nodes that we are to consider? Also, these are categories we dreamed up; we may add more.
Kevin: Maybe pick a few that are prototypical instead of trying to abstract. Personas need to be specific; the comparison is an individual researcher who graduated from Cornell in the late 1990s.
Robert Waltz: With personas, there is a level of specificity that is encouraged, but they are also useful in a general way to talk about general characteristics. If we get too specific in a description, a potential member node may exclude itself from the dialog; the persona might be taken too literally.
Kevin: The point of personas, generally speaking, is more for the design team. If you are thinking about a researcher interacting with the system, think of this person who has these specific skills. Here's why some repository that hasn't thought of it...
Robert: Also, personas are for advertising.
Suzie Allard: If you were thinking about being a member node, it is easy to think of this and this and this, in a way that tech people look at differently.
Robert: Personas are fictitious, based on user data; imaginary friends in the design world. We are trying to create the same type of thing: an imaginary best-friend member node. Fictionalized eBird versus real eBird? Freedom to fictionalize something about eBird.
Profiles of typical member nodes: we may have a small repository versus a large repository.
Kevin: We did that with researchers: early career, late career.
John: A member node might have a funding change; characteristics of the member node might be captured within the entire persona.
Kevin: There are many more ways that organizations can change.
John: A person might have a spectrum of opinion.
Kevin: Organizations might be merged from several.
John: A few features should rise to the surface and be prominent.
Is there a set of things to ask about organizations? A few more slides will cover that.
Salient questions or characteristics:
* Who are my users?
* What do I value?
* What do I accept?
* Curation level - OAIS model
* Where do I fit in the data life cycle?
* Fixity - data and the state of change - object permanence
* Tier placement
* Funding
Tanner: What is fixity? A: Permanence. Assurance that retrieved data is the same as stored data (including retrieved replicas and retrieval across time).
Template components: Description; Users; Data; Resources / Funding; Expectations
* Create 8 templates and narratives
* Look at components and sub-components
Comments:
Tanner: What about explicitly geospatially oriented repositories? The concept of a data package was interesting for me. For GIS, a package may include many files.
Consensus: Yes, important.
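The fixity definition above (retrieved data, including replicas and data retrieved years later, must be the same as what was stored) is commonly checked by comparing checksums. The following is a minimal sketch using Python's standard `hashlib`; the SHA-256 default and the helper names `checksum` and `fixity_ok` are illustrative assumptions, not DataONE's specified algorithm or API.

```python
import hashlib

def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of an object's bytes (the algorithm choice is an assumption)."""
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()

def fixity_ok(stored_digest: str, retrieved: bytes, algorithm: str = "sha256") -> bool:
    """True if a retrieved copy (original, replica, or later retrieval)
    has the same digest that was recorded when the object was stored."""
    return checksum(retrieved, algorithm) == stored_digest

# Record the digest at deposit time...
original = b"species,count\nrobin,3\n"
recorded = checksum(original)

# ...and compare it against each copy retrieved later.
replica = b"species,count\nrobin,3\n"
corrupted = b"species,count\nrobin,8\n"
print(fixity_ok(recorded, replica))    # True
print(fixity_ok(recorded, corrupted))  # False
```

The point of the design is that only the small digest needs to be kept alongside the object; any copy, wherever and whenever it is retrieved, can be checked against it.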
Geospatial capability might be a characteristic of a repository. It may have special features in terms of searching and retrieval. Geospatial cuts across many issues. Examples:
* USGS National Hydrology dataset
* Cornell University Geospatial Information Repository (CUGIR)
* EDAC (a prospective MN)
Amber: Do we need to add industrial?
Kevin: Looking over the human personas, the characteristics we try to collect are name, age, education, socioeconomic class, desires, life or career goals, reasons to use... etc. From there we went to usage scenarios. It strikes me that almost all of these could carry over to the organizational one: name, age, history (where it came from), maybe class and desires (mission statement), funding levels, goals that a repository is trying to accomplish, fears, hopes, and attitudes. Fears is an important one: what are they afraid of? A smaller repository might fear losing funding or being unable to sustain itself; conversely, a large one might have a different set of fears. Understanding those might inform DataONE on how to support them. Tools: repositories will use a very different set from the one researchers use; what might they use from DataONE? What skills do they have? A technically sophisticated repository is an equal partner, versus a group without a lot of resources that needs a lot of help from DataONE, where DataONE essentially helps professionalize what they are doing. The list of categories might re-think some of the dimensions: a repository with a steady source of funding differs from one that has been scraping by on overhead and whose technology is not that sophisticated.
John Cobb: Agrees. Social parts about integration of the project, member node growth plan, organizational ranking to lead to success or failure.
Kevin: Think of it as market segmentation: which do we go after immediately, and which are the longer-term targets? "Hand holding" to get up and running, versus someone who has more data, more technology, and more staff. A hypothesis.
John Cobb: Good to characterize, but difficult to approach.
Kevin: Useful data divided by hours of integration work.
Suzie Allard: Is it a leader member node? Is it worth the investment if they will be a leader of other member nodes, vis-à-vis other organizations?
Suzie Allard: DSpace has a huge reach within the library community; it may bring 100 extra people.
Kevin: Something organizations have that people don't is the base system. If we can find someone who is really visible in genetics / oceanography / hydrology, what have you, then we suddenly have a higher profile. That is more a marketing use of personas than a systems-development use.
Allard: Crossover - do they need to be standalone, or can they have some crossover? That is important if we are looking for buy-in from a community of libraries; that would be a different kind of organization, an institutional repository type. There are different fears and challenges at an organizational level. It could have value if we know their fear is "not having a tool add-on to do x," because that is what their subject group needs and we don't have that tool.
Robert: The ability to transfer their user base could be a fear or impediment. All XML is marked up with the user's specific credentials; if a MN will not join InCommon, will users lose the ability to access information? Losing control of the data.
Kevin: Suggests making a list of fears that organizations have: losing control of information; losing a privileged connection to users (concern that DataONE will come between you and your users). They don't think about the service being provided; newspapers are concerned about Google News: "we send you links," but they never know the headlines came to us.
Allard: Holly Mercer is connected to the academic library.
Holly Mercer: Investment in infrastructure that people don't use.
John Cobb: Territorial issues.
Amber Owens: Fears and values - looking at the external website. Ask them?
Kevin: For the user personas, a number of people had been in that role and could write them; for others, like the data librarian, you would work from the interviews. A first draft sticks pretty closely to actual repositories, e.g., closely on eBird (the case study on eBird is 30 pages and has a fair amount on goals and technology). Write something that you try to generalize across: identify what the population is, interview some sample, then try to abstract. I'm not convinced this is going to be the most illuminating for our purposes; variation is more important: well set-up technically with good resources versus not; a large collection of legacy data versus more flexibility about the data.
Productive next steps:
1) What do we need to find out? Organizational versus individual persona.
2) Thinking about points of variation - representation of the different kinds.
Amber: Small versus large?
Kevin: Perhaps. Life history would be interesting: how it came to be, how it is currently run. Who am I? Hopes and fears: what would the organization like to become? What is it afraid of, in general and around DataONE? Smaller repositories might be more at risk from funding.
John Cobb: People have mined their database and sold it without their knowledge. That influences interactions with DataONE.
Tanner: Are the institutions happy with their current system? BePress versus DSpace - a front end for other places where they put data. UT Libraries has had this discussion.
Kevin: Reasons for using DataONE? Skills or internal capabilities: what is this organization good at? There could be an entire section on "What will this organization 'get' out of DataONE?" That gets at the fears and hopes: what are they afraid of that DataONE is going to help them do better?
John Cobb: Add a bullet on "How do I perceive my role within DataONE?"
Kevin: Organizational capabilities: where do they have a lot of capabilities, and where are they missing capabilities? Individual investigators - "implement the stack": how / why should we do that?
I understand why that would help, but it's just me; I'm not sure what you are asking me to do. A persona should be highly specific, a point value on some distribution, but it really is the description of some individual.
Suzie Allard: We have the data to do a persona on libraries: we interviewed librarians and had a separate questionnaire at the organizational level, so we do have hard data.
Kevin: Are libraries likely to become member nodes?
John Cobb: Yes, institutional repositories.
Suzie Allard: Most researchers interact with an "information organization" along the way (a data center is different from a library). ORNL does have a library, but it does not handle data. We do have hard data on academic libraries, which have huge potential for DataONE in multiple ways: lots of researchers are seated there; they house most of them. It is a way of keeping their institutional capital, their intellectual capital. It is a huge opportunity and a potential funding stream: libraries are used to paying subscriber fees for other people's data, so if there is something to help them expose their intellectual capital, with best practices for exposing their own data, that could be lucrative. That is not to say that we are taking money, but we can be inexpensive next to other solutions.
John Cobb: Another fear and hope: undermining of funding. They would not say it out loud, but "Does joining DataONE lead to diminished recognition of what we do?"
Tanner: Purpose of each repository: the type of data offered by member nodes should match up with how DataONE connects users to data.
Robert Waltz: Fictional representatives give us a framework in our minds for who our customer base is, to discuss with others and how they work with us. Useful in promoting DataONE.
Allard: Marketing obviously has value; marketers had personas long before computer scientists did.
John Cobb: Look at member nodes already on our radar: redmine.dataone.org/rb/master_backlog/mns is a list of operational member nodes.
Things about the reverse site visit: end of year four; first half, end of year 2. In process laundry; deploy by end of year four. Other things are not written in stone, just shared. As member nodes, this could inspire us.
Ticketing system: backlogs of things to get done. These would be things you deploy in the incremental rev. It turned into kind of a scheduling tool for targeting: 20 by end of year 4, 40 by end of year 5.
Q5 - person to help, waiting on something to happen.
Member node run by some researcher: none in that list. The closest is ONEShare: as you take a spreadsheet, do DataUP.
Kevin Crowston has a terabyte of data and currently uses Google Code.
Robert: We know these people exist because "we know them." The amount of work to bring them into the fold is not realistic. Member node in a box: some standardized way, queries that could be written in some standardized form, read by a harvester and slurped into a DataONE member node (GitHub, Drupal, Google Code, Figshare).
John Cobb: DataONE is not in the business of providing direct archive collection services to users of data: we don't scale correctly, and we don't want to poach on our partners' scope. We find ourselves saying, "Thanks for coming to us; we have a member node set up just for what you do." In one instance we did not, and that's how ONEShare came about. How will the query tools expose the member node data?
Kevin: Operational list: fairly large, fairly well institutionalized, technically savvy.
John Cobb: Opportunistic because they are Metacat?
Robert Waltz: SANParks is not necessarily well funded. Pathfinder is an African member node.
Dr. Allard: From personal experience, the concern seen in South Africa is keeping up with the rest of the world, and researchers do not have support to collect data. Not sensitive about exposing data.
Kevin: Biological samples - Brazil (why are American drug companies coming down, discovering neat stuff, turning it into a drug, and selling it?)
Suzie Allard: When dealing with an actual sample, there are only a specific number of slices you can get out of it; there is a lot more territoriality about physical samples.
Robert Waltz: Should we export data to another continent and not have it on our continent?
Reconvening after the break
Choosing some:
Template - draft a member node persona. Example from what Amber handed out (Laura and John Cobb assisted).
Persona Number 1 - eBird
Amber had a set of questions; she answered the questions for eBird as a canonical PPSR example, then abstracted that into an attempt to make concrete personas.
Persona Number 2: What should that be? What category? How can we answer the questions?
EDAC (http://edac.unm.edu/) - a discipline science repository, GIS oriented or aware, including search and query
III. How is it funded?
* USGS
* NRCS
* State of New Mexico
Multiple funding streams; single archive
IV. What is the user community?
Federal agencies; state, local, and tribal governments; professional societies and organizations; and advisory bodies, nationally and internationally
V. What do they support?
Resource management, scientific research; GIS oriented but specific to New Mexico.
VI. Access method
GIS Clearinghouse; file type; how best to connect the user to data: raster data, vector data, and the metadata files that come with them
What are we not capturing in the description form that we also need for a persona?
a. Hopes / fears
b. Operational age
c. Skills
d. Size of collection (800 objects, well used)
e. Staff size
Suzie Allard suggests a change in direction: prioritize categories by where we can recruit 20 more. Should we discuss categories without agreeing that these are the right categories? Are we missing any? Should any be taken out?
Pulling the categories from earlier in the document:
* Institutional Repository
* Discipline Science Repository for Researchers
* Government Repository
* Individual Investigator Repository
* Replication Node
* Public Participation in Science Repository
* Remote Sensing Large-Scale Data Repository
Rama's initial reaction is that these may overlap. Example: large repository / government repository. Most remote sensing repositories are institutional. Is "institution" really an educational institution? NGO vs. academic repository? Is there a difference between a government repository at the state or federal level and an academic repository?
Rama: Specialized repositories that hold only one scientific discipline (e.g., only Shakespeare's works).
Remote sensing: continuous data flow; mostly government institutional kinds of repositories (or at least funded by government).
DataONE will not be a repository for everything, but to the extent possible we would like DataONE to be able to interoperate; not everyone who interoperates needs to be a member node.
Who is funding a lot of this data? Do we look at where the data is coming from?
Suzie Allard: The problem is that the categories come from different dimensions: source of data; type of data; funding of data acquisition; administrator of the repository; infrastructure node. Service provider, e.g., member nodes.
Rama: Look at tiers, the capabilities that they are going to have.
Robert Waltz: Justifying the categories that we have: how do organizations represent themselves to the public? How do they describe themselves? Select a few examples.
Allard: Discipline science can be the organization in charge; public participation.
Waltz: A replication node is certainly a DataONE-defined type of thing.
Cobb: A series of entities (besides DataONE) provide cyberinfrastructure as a service. The type of resource available is really along a completely different dimension: a warehousing or bridging service.
Rama: Sounds more like a technical skill or capability.
Cobb: It has a lot of sociocultural differences. A data repository is concerned with the dataset created or provided; an infrastructure node is concerned with the service provided. It may be providing a "brokering service" for nodes providing datasets to nodes providing datasets.
Waltz: Organizational affiliations - organizationally, how do we define DataONE?
John Cobb: Size, quality, and type. Quality could be geospatial.
Rama: Is there a summary? A: The Word file on the WebEx is where this is being captured.
Suzie Allard: NGO might be a characteristic to describe some of these categories. We can hit some / talk about them (if you thought of this as a matrix of different characteristics, such as funding streams and NGO status, but look at commonalities and differences).
Rama: Phrases for categories; attributes for entities. Change "Public Participation in Science Repository" to something like a noun phrase. (Changed "Public Participation" to "Citizen Science.")
Cobb: Kevin used the "weirdo" category in that it would stretch our notions: museums, iDigBio (not citizen science, but somewhat contributory, with high metadata-to-data ratios; a record might be a photo or description of a sample at a particular collection location).
Waltz: Museum or other public works?
Rama: Individual PI repository.
Waltz: We had an individual repository, but lost it to editing.
Cobb: To take away: assignments to answer the template, or extend the template, for a particular example in one of these categories; get back here tomorrow.
Allard: Take categories, try to put down some of those attributes, and see if we defined those categories so we see commonalities.
Cobb: Amber began doing that; the rest of her slides talk about these abilities or qualities.
Kevin's comment was that a lot of the exact questions we were asking of the user personas apply equally. We should include that original template as well. The presentation gave the URLs for a lot of them.
Hanging tasks off of people in hopes of getting something by Thursday. Amber laid out a whole list of things to do - travel the whole thing back and forth and then regroup.
* Tanner and Amber: large geospatial data repository (Rama can advise)
* Suzie and Holly: academic institutional repository
* Robert Waltz and John Cobb: replication node
* Todd: something like a "cultural heritage repository"
Have we ever had a private corporation as a member node? If I am a private company doing overhead information, my funding model is that I gather information and sell it to people. Overhead information systems are third-party payor: an agency or institution collects the information and makes it freely available, funded out of another mission.
Spreadsheet document added to e-pad (candidate member node URLs and classification): https://docs.google.com/spreadsheet/ccc?key=0AmuOOMpSMNgLdDRNMERjOVVPUnlKcXhuSHZQNTNLUEE#gid=0
Things that did not fit the mold: some time Thursday, but doing other topics.
Going to Amber's extensive research (2013).
Rama rejoined the breakout and the WebEx at 4:11.
About to look at http://mule1.dataone.org/OperationDocs/member_node_deployment/mn_checklist.html
A couple of personas were selected to start drafting.
Wednesday Morning:
Policy Discussion: https://docs.dataone.org/member-area/member-nodes/coordination-work-area/scua_wg_20130430_materials/mn-policy-draft-v0.8.0
External Web study: Process/Procedure
What should be on this list?
MN Checklist (this is higher level for public use) http://mule1.dataone.org/OperationDocs/member_node_deployment/mn_checklist.html
Governing that: MN Procedure DRAFT (this is detailed for internal use) https://repository.dataone.org/documents/Committees/MNcoord/Coordination%20Work%20Area/SC%20UA%20Joint%20WG%20mtg%2030Apr-2May/DRAFT%20Member%20Node%20procedures.docx
Road Map to a Member Node: 20121217_MemberNodeImplementationPlan.pptx https://docs.dataone.org/member-area/committees/external-advisory-board/2012_17-18dec_eab_meeting_washingtondc/ppt-for-december-2012-eab-meeting-member-node-implementation-plans/?searchterm=20121217_MemberNodeImplementationPlan.pptx
Decision points along the chart. "Guidelines" or "suggestions" might be weak, but something firm might impose boundary conditions external to DataONE.
Talk about policy based on the graph from above. Solicit comments.
Go through an exercise to look at the external web space: from a prospective member node's point of view, what do you need to know about the process?
Allard: Promoter of best practices, not enforcer of standards. "Coordinator" seems like a "big guy who knows everything and runs roughshod over everyone." Caution on the enforcement role.
Crowston: Implications: if a member node wants to do something DataONE does not believe is best practice, we will not say "you can't do that; we're not going to let you in if you don't do it that way." Is that the implication?
Allard: Yes. The issue to tease out is between the technology and the political issue; from the technology side, we say this is the best version to make things work smoothly.
Crowston: A large amount of resources; may not happen in the short run. Suppose some policy has been adopted that, in the opinion of application evaluators, is not a best practice for data management, e.g., updating files whenever: if they want to update an item in the repository, they just do it. DataONE would say that's not really what an archive means.
Cobb: Look at the eBird folks: they take observations from anyone, including Joe Six-Pack who just saw everything on his life list.
Crowston: eBird does vet things that seem unlikely. They don't delete (it is your record), but they may flag it as unreliable.
Cobb: An opportunity within the DataONE ecosystem for an external annotator: likely, unlikely, credible but unverified.
Miriam: Recognizing member nodes that follow certain practices: credit or a stamp for doing that.
Waltz: Argument against minimal: if you don't have well-formed XML, we are going to throw it out. So far, if you are creating well-formed XML (and, for some instances on the back end, valid EML), DataONE can technologically accept what is provided. It may not be worth indexing for discovery purposes, but we can hold on to it in case you come back in the future and want to provide an update to make it more usable. As a repository, we have the capacity to be part of someone's workflow and create data, not just preserve the product for the long term. We might want to put in policy: we are validating XML, and not only that, you had better have these 10 fields in your record or else...
Crowston: Application, assessment of readiness: "technologically we don't think you are ready, but hey, whatever," as opposed to "come back when you are ready."
Waltz: Level of tech: systems must be able to talk to each other.
Crowston: "You don't have good metadata; we can't figure out what's in the repository." Come back, or "sure, whatever"?
Cobb: 1) The decision process: when DataONE reaches agreement, how should we be making that decision? We need many people to weigh in. CCIT is "running the show" because of a vacuum. We see things from on high (strategic needs), but CCIT says this will not work unless you have well-formed data packages. From an evaluation standpoint, as we get to 40 member nodes, there needs to be a process in place. Part of it is quality issues. A couple of questions: how do we make that decision?
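Waltz's two-level check (first reject anything that is not well-formed XML, then require a set of fields) can be sketched with Python's standard `xml.etree.ElementTree`. This is a hypothetical illustration, not DataONE's actual ingest logic: the `REQUIRED_FIELDS` list and the `check_metadata` function are invented for the example (the notes only say "these 10 fields" without naming them), and validation against the EML schema is not attempted.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; the meeting did not specify which 10 fields.
REQUIRED_FIELDS = ["title", "creator", "abstract"]

def check_metadata(xml_text, required=REQUIRED_FIELDS):
    """Return (accepted, problems) for one metadata record.

    Level 1: reject anything that is not well-formed XML.
    Level 2: report required fields that are missing or empty.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, [f"not well-formed: {err}"]

    problems = []
    for field in required:
        # ".//field" searches every descendant of the root for this tag.
        elem = root.find(f".//{field}")
        if elem is None or not (elem.text or "").strip():
            problems.append(f"missing field: {field}")
    return not problems, problems

good = ("<dataset><title>Bird counts</title><creator>A. Person</creator>"
        "<abstract>Counts.</abstract></dataset>")
print(check_metadata(good))                                   # (True, [])
print(check_metadata("<dataset><title>x</title></dataset>"))  # (False, ['missing field: creator', 'missing field: abstract'])
```

Separating the two levels mirrors the policy question in the discussion: a record that is well-formed but incomplete could be held and flagged rather than rejected outright.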
Allard: This was discussed before, at the second DUG. Afterwards we came to an agreement; this is in the leadership team notes somewhere. Suggest that we carefully form the research question we are talking about. Where have we stood before? Where do we need to go? We were talking about scalability, basically the same thing. We have made strategic decisions to help us hit our marks, and talked about different qualities: high-value data, high-risk data, the ability to bring in a large amount of data (one node, or associated groups that come in with it). These were prioritized somewhere. The goal should be to find that, or talk about it in the same way we talked about it before, to recapture the investment. We talked about being opportunistic: low-hanging fruit, just "who we were associated with," then Metacat. High risk, technological readiness. Then there was a shift: the most recent discussions favored nodes that would grow the network most quickly, in records or number of nodes. Brought up earlier: the terminology issue. Be careful about technological quality in all aspects (content, tech), weighed against the ability to be welcoming and inclusive.
Miriam: Document what the process has been?
Allard: If not document it, at least take into account that we have talked about reasons for bringing people in. The implication is that they have well-formed data. A DAAC does have a lot of good things, but tech issues.
Cobb: Have a discussion, reconnect with all stakeholders, present to the DUG, and once and for all say "this is our control document," so in the future we don't say "remember back in 2011 we talked about..."
Miriam: Essentially a decision tree.
Allard: We have specifically avoided a decision tree.
Cobb: "You should consider," "you should strive to": one reason the language is kind of "wimpy."
Allard: It allows greater flexibility. The technology may be ready but it comes down to a personnel issue; a decision tree does not account for that. E.g., making decisions that were not going to be implemented.
Cobb: The project talked about this a lot before; brainstorm of the different discussions (DUG meeting linked by Robert W.).
Miriam: Would it be helpful to list criteria to be considered? Topic, quality, quantity, etc.
Cobb: Three things: 1. make contact with institutional history; 2. the issues just mentioned; 3. what is the process for reaching a decision on moving forward with an MN.
Allard, from the document "MN_Prioritization_2011.07.09.pptx":
* data
* diversity
* st. part
* leadership
* management
* technical
Not taking lab-created metadata; it must follow some standard.
Miriam: From slide 17: is the minimum set of MN requirements publicly available out there on the website?
Allard: When Mike made some of those early handouts, that's what this was based on.
Waltz: "Metadata format used by the candidate is used by DataONE." We have MNs coming in that do not have it (2011). We were thinking about this at the time. "Can be supported" may be a better way to phrase it: a negotiation between DataONE and the member node.
Introduction: Chelsea Williamson Barnwell, working on a paper for a class (STEM communication) on current member nodes; will help classify and set parameters for doing personas.
Crowston: Proprietary, but important data is of high value and public.
Miriam: Not a quantity issue.
Allard: "Public data in collection of value are available and shared upon request." People will share it; it is public in that sense, but you must request it to get it.
Crowston: For projects adopting that, physical access could mean you made it publicly available, or that you have to send an e-mail and ask.
Miriam: Does it describe somewhere what that basic level is?
Cobb: We can point them to the repo on mule.
Miriam: Maybe create something that describes that and defines what the basic level is.
Robert Waltz: At the last CCIT meeting (even the last point can be debated), we discussed that a Tier 1 member node implementation might be too high a bar for some MNs, and talked about creating a "member node lite." We have implementations of some other member nodes.
Although it is good to keep that in, even that is subject to change: this "lite" version of a member node.
Cobb: Important; unsure how to reword it to capture that.
Waltz: May be a...
Crowston: Might use a different term than "member node lite" if you don't have Tier 1.
Allard: If you go to Tier 1, you become an actual member node.
Crowston: "Silver member."
Next slide: Key Characteristics: technical, resources, match to the topic. It could be that the member nodes are not in our sphere; does it match DataONE's earth and environmental focus?
Crowston: Another characteristic of data is not captured there: you can imagine the domain coverage would expand. Might not have been a priority initially, but it's in scope enough.
Miriam: Then we talked about high priority: many member nodes competing for resources, so some kind of prioritization of quantity versus priority.
Cobb: We are going to quickly get to a place where there are more takers than resources to bring on MNs. We will need a way of prioritizing who comes first.
Crowston: Isn't the idea that anyone who wants to be a member node can be? Do now vs. do eventually. Imagine a future scenario in which common repository software could implement the member node software: download the DataONE DSpace plugin, and then, whenever you're ready, start harvesting.
Cobb: A question to pose to the group. An important archive requires 2 person-years of DataONE effort. Another implements a member node, pays DataONE for a quarter FTE, and carries its own freight, but is only marginally important to the dataset. Take one or the other, or both?
Waltz: That amount of effort comes down to "do we have the funding stream to support that?"
Cobb: Find some company with which you are not aligned that decides it wants to fund that.
Crowston: Hard to imagine DataONE saying "we don't want your data" to Elsevier.
Cobb: DataUP was a bit of a trade: Microsoft carried the freight.
Crowston: The last thing said about Elsevier: "By the way, we're doing this for all journals, with some earth science data, but we can't tell you exactly which files are which."
Miriam: Finding, again, to make sure we go back over it: the criteria are good and important. Question posed: what do we do in this situation versus that situation? The process is important. Do we need insight from CCIT, here, there...
Kevin: Proposal: this is a resource allocation problem, a finite pool of resources with potentially large demand. Like a grant competition, you would essentially treat each member node as applying for a grant. There might be multiple pools of money. Some might say they need no money because they are self-funding; others might need large amounts. They could send real amounts to a foundation or funding agency. Basically, you would have a review committee: here are the 30 grant requests; let's prioritize and allocate this year's budget for bringing up member nodes; deserving ones left over can go to the next year's funding allocation. - Ooh, I like that! - LM
Miriam: A modification, assuming that when talking about funding we are really talking about resources: a dual process. First an initial ranking along criteria of the whole suite that comes in; those that rank highly are then reviewed by people. There is a big political capital issue that is impossible to rank. Once candidates have gone through a ranking process, there still needs to be some subjective look.
Crowston: Believes it is all subjective; have a review committee. NSF has "intellectual merit" and "broader impacts." Intellectual merit of the data: opens up to a new community of interest, diversity of data, achieves infrastructure for the next step, spent a lot of effort bringing this one online.
Waltz: An objective criterion: a software platform that already conforms to the DataONE API gives a prioritization.
Crowston: Some would basically say, "We're done, accept us."
Waltz: "Give us a certificate so we can talk to you." That is an objective criterion.
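Kevin's grant-competition framing, combined with Miriam's dual process (a criteria-based first ranking, then human review), can be sketched as a ranking pass followed by allocation against a finite pool of effort. This is a hypothetical Python illustration only: the criterion names loosely follow those listed in these notes, but the weights, the 0-5 scale, the FTE units, and the greedy fund-in-rank-order rule are all assumptions, not an agreed DataONE procedure.

```python
from dataclasses import dataclass

# Hypothetical weights: the notes list criteria (value, risk, diversity,
# technical readiness) but never assign numbers to them.
WEIGHTS = {"value": 3.0, "risk": 2.0, "diversity": 1.0, "readiness": 1.0}


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion -> score on an assumed 0-5 scale
    effort_fte: float  # DataONE effort requested; 0 if self-funding

    def total(self) -> float:
        # Missing criteria count as zero.
        return sum(w * self.scores.get(c, 0) for c, w in WEIGHTS.items())


def allocate(candidates, budget_fte):
    """Rank by weighted score, then fund in rank order while effort remains.

    Returns (funded, deferred). This is only the first, criteria-based pass;
    a review committee would still make the subjective final call.
    """
    funded, deferred = [], []
    remaining = budget_fte
    for cand in sorted(candidates, key=Candidate.total, reverse=True):
        if cand.effort_fte <= remaining:
            funded.append(cand.name)
            remaining -= cand.effort_fte
        else:
            deferred.append(cand.name)  # carried to the next allocation cycle
    return funded, deferred


if __name__ == "__main__":
    pool = [
        Candidate("large archive", {"value": 5, "risk": 4}, effort_fte=2.0),
        Candidate("self-funded node", {"value": 2, "readiness": 5}, effort_fte=0.0),
        Candidate("mid-size node", {"value": 4, "risk": 1, "diversity": 3}, effort_fte=1.0),
    ]
    print(allocate(pool, budget_fte=2.5))
    # -> (['large archive', 'self-funded node'], ['mid-size node'])
```

With a 2.5 FTE pool, the high-scoring archive needing 2.0 FTE is funded, the self-funding node still fits, and the mid-ranked node needing 1.0 FTE is deferred, which is the kind of trade-off Cobb posed. A greedy rule is easy to explain (helpful for transparency) but does not maximize total score.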
Crowston: Internal administrative review, but not peer review.
Miriam: Question about the originality of this discussion.
Cobb: Social experiment. The SC group took a crack at it: a few bubbles. Have a conversation, bring it together; the conversation was not known about. Fork a repository versus merge. Cobb broke down a process: https://redmine.dataone.org/rb/master_backlog/mns
Rolling submissions.
Crowston: Probably rolling, but the easy ones do not require any resources or resource allocation decisions. "On this date we will prioritize": episodically analyze submissions in terms of resource allocation for the project. If it turns out you want external input, all of them could be taken up at a meeting at the DUG to set priorities for the next 6 months.
Cobb: Absent other input, CCIT has been doing this: x many developers, what should we focus on? In the prep month for the reverse site visit (Santa Fe), we put folks on the Redmine ticket system. Look and say: what about SEAD, what about Taiwan, the Data Fed consortium? We have not freed up resources; the process occurs in an ad hoc fashion.
Cobb: Not exactly "defensible." I don't have 5 principles, etc.
Miriam: Ultimately it needs to be transparent to these potential nodes. Huge question.
Allard: It needs to be set up in a way that is agile and does not block out someone who just came on the horizon and really should be online.
Cobb: When Terrapin existed, it was a group of 10 centers working together. Inside, they wrote a policy on what it takes to join. Inflexible. Expensive (tech requirements, a 1 million network requirement). Outside, this was seen as a "walled garden" problem. "DataONE is arbitrarily deciding, playing favorites": that was a pitfall for Terrapin. People were advocating for defunding it.
Miriam: You need this process for making decisions, but also principles guiding it: transparency, agility, defensibility. Is the process "just" and "useful"? Suggestion of a submission process.
Allard: Talk about what the system is for addressing different characteristics. People have some idea of what's important.
At some point it depends on whether it's transparent and on what's going on with DataONE key personnel. Emphasis may shift: diversity in terms of data at risk may weigh heavier than other points, depending on how many people are engaged.
Miriam: These are the criteria we consider; still transparent: these are the criteria, this is the process...
Crowston: An easier way to say it: here are the things that get taken into account; it's up to the committee to decide how they want to decide...
Cobb: Reference in the policy doc that there is a process; it will take months to ratify.
Crowston: The application process should focus on letting the proponent make the case for why this is a deserving case. Tech info on the kind of data, and why this data is of value to the DataONE community. There might be an initial screening phase, with CCIT making a guess as to what kind of resources are needed.
Cobb: NSF: the point of contact is the PI. Not much more than advisory; it helps the PI decide.
Crowston: Governance: who should be involved? The DataONE Users Group? We can say we recommend the D1 Users Group and the PI, giving suggestions as to what we think a committee would look like. Some requests require so few resources that it would take more time to discuss them than to just do it.
>>> Who maintains the list of requests and bugs people to submit information that is missing? <<<
The key decision points identified from DataONE's perspective are:
- Outreach strategy: how to focus on potential Member Nodes to recruit, and how to respond to inquiries from potential Member Nodes
- Evaluation of the initial proposal to initiate new Member Node development
- Operational acceptance of successful development completion
- Coordination of activities at the start of production
- Possible termination of Member Node status at some future time.
An overarching consideration is that these decision points should be guided by DataONE's mission and vision statements:
Mission: Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.
Vision: DataONE will be commonly used by researchers, educators, and the public to better understand and conserve life on earth and the environment that sustains it.
In addition, the DataONE sociocultural working group has articulated five summary principles for data contained within the collections included in DataONE:
Include by reference, not inclusion.
Which version? We should go with the RSV principles (where are they stored in docs?)...
What is DataONE's role within the community: service to MNs, promoter of best practices, or enforcer? A: coordinator and promoter, not enforcer.
Do we include suspect data? Perhaps support both, but try to annotate with quality cues; cf. Bilder's prior comments about indicating or "approving" certain practices and practitioners.
Robert: We can hold onto records that are "well formed"; this is orthogonal to data quality.
1. Data science is transforming environmental science. [ ??? jwc: Still include ???]
2. Data should be part of the permanent scholarly record and requires long-term stewardship.
3. Sharing and reuse maximize the value of data to environmental science.
4. Environmental science is best served by an open and inclusive global community.
5. The data environment is dynamic and requires evidence-based decision-making about practice and governance.
Finally, the frameworks described in this policy are not absolute and inflexible; rather, they are designed to provide guidance that can help maintain the constancy and coherency of DataONE objectives across many different activities and interactions with different communities.
**** Outreach Strategy and Targeting Criteria:
The goal is to create a collection of MNs that advances science, enhances MNs, and sustains DataONE.
First, we hope to enhance community and content. Second, we want to enhance, diversify, and simplify the underlying infrastructure and interoperability. In addition, it is valid to weigh pragmatic considerations such as a potential member node's willingness and eagerness to participate in DataONE and the feasibility (or difficulty) of the technological implementation.
Is an MN-lite an MN or some other type of creature?
**** Evaluation of Initial Member Node proposal:
There is an associated process and procedures document that outlines needed activities along the workflow path before review. Many of these activities are in the nature of information gathering, planning, and scoping. Once that discussion matures, the potential Member Node will propose moving forward to development. That proposal will be considered by DataONE and the Member Node with the goal of reaching a mutually beneficial understanding of the prospective Member Node. From the DataONE perspective, the DataONE Member Node coordination group will assist the prospective Member Node in providing information and answering questions. (The coordination group will also facilitate connections throughout DataONE and will not act as the sole resource for prospective Member Nodes.) Once a complete but brief proposal is prepared, DataONE will undertake a rapid but sufficiently detailed consideration of the Member Node.
The goal of this review is to reach a sufficiently complete understanding of important dimensions of the Member Node, including: the community it serves; the characteristics of its data collection(s); the implications for DataONE of the needed physical cyberinfrastructure, software cyberinfrastructure, and effort hours required to support Member Node development/deployment and continuing operations; any unique or new cyberinfrastructure needs that DataONE will support beyond its currently developed and operational infrastructure; and the required resources, and their availability to the Member Node, to undertake activities during the development, deployment, and operations phases. The exact form of such a proposal is not specified here, but will be provided as a template by the Member Node coordination group, who will also assist prospective Member Nodes in completing the proposal. The template itself is not a controlled document and will evolve as needed.
Suzie: The NM DUG discussed this; find it and re-confirm our position. Qualities:
- High-value data
- High-risk data
- Volume of data
- Technological readiness (now)
- Scalability
We need to touch base with those discussions.
Document where we have been: NM DUG https://docs.dataone.org/member-area/planning-for-dug/dug-meetings/dug-2011-meeting-planning-folder/dug-2011-discussion-notes/DUG_member_node_prioritization_notes.docx
Also see the PowerPoint presentation https://docs.dataone.org/member-area/planning-for-dug/dug-meetings/dug-2011-meeting-planning-folder/dug-2011-presentations/MN_Prioritization_2011.07.09.pptx
Public-facing pages on member nodes: https://www.dataone.org/become-member-node https://ask.dataone.org/question/10/where-can-i-find-more-information-about-becoming-a-member-node/
S&G WG discussions:
Miriam: Items we listed: technology, resources, alignment with DataONE. Process?
Kevin: Review committee.
Once submitted, the proposal for a Member Node to move to development will be available for review and comment.
Specifically, comment will be solicited from the core cyberinfrastructure team (CCIT), as conveyed by the director of development and operations, the director of community engagement and outreach, and the DataONE project manager. The proposal and comments will be reviewed by the DataONE leadership team, who will be asked to make a recommendation. The DataONE principal investigator will make the final decision. The purpose of a gate-and-release process at this point is to understand the importance, impact, and resources needed to move this Member Node through development and into continuing operations.
Process: needs to be transparent, agile, defensible.
Dave: (up in discussion: DRAMBORA (ref????)) risk analysis for consideration/evaluation of data repositories
**** Operational Acceptance of successful development completion:
At the end of the development period, it will be necessary to ensure that the Member Node is functioning in collaboration with the DataONE infrastructure. This is needed to ensure that users of DataONE will be able to access the collections of the Member Node effectively and efficiently. It is also necessary to ensure that adding the Member Node to operations will not create operational problems for DataONE or other Member Nodes. The vast majority of issues to be examined will concern cyberinfrastructure unit tests and dev-ops readiness. Consequently, the DataONE director(s) of development and operations will make this decision on a recommendation from the CCIT, with notification given to the DataONE leadership team and the DataONE Users Group. If the characteristics of the Member Node change significantly during development, the director of development and operations can request a re-evaluation of the initial Member Node proposal.
**** Coordination at start of production:
As the Member Node moves to production, the Member Node will coordinate with the DataONE director of community engagement, education, and outreach to develop communication plans for the start of production. Such items might include revision of internal and external project documentation, and communications released to the general public, funding sponsors such as NSF, and social media. Also put it in the DataONE Newsletter and send it to mailing lists, including MN-related mailing lists.
**** Possible termination of Member Node status at some future time.
[??? jwc: "terminating" is a double-plus ungood term - alternatives? ???]
It is quite possible that at some point a Member Node or DataONE may no longer wish the Member Node to continue as a DataONE Member Node. A likely example is the end of an overarching project that developed and supported the Member Node. Such events are often poorly resourced efforts at the end of projects. To the extent possible, DataONE and the Member Node should seek to foresee this type of event early enough to allow an orderly and reasonable termination. DataONE and the Member Node should reach an understanding about data legacy issues, including the preservation of data already published; continued availability of data where needed (for example, data referenced by persistent identifiers); continued access to collections; management of identities created by the Member Node for access-controlled data; and other issues as appropriate. There also needs to be a mutual understanding about continued access to, and access control of, data that is not publicly available without restriction. In addition, there may be a discussion about a new agreement to host and maintain data at DataONE or elsewhere for Member Nodes that cannot (or do not wish to) continue to manage the data in their collections. [??? jwc this is a bit unclear to me at this point.
Have I captured the issues? What is the decision tree?]
In a sense, it's a will. Add a data will question to the initial evaluation process. Is "data estate planning" part of DataONE's sustainability plan? Wording: "legacy," "succession planning" (the term used in land ownership).
Vieglais: Also consider a change of authoritative MN from the original MN.
Kevin: Ask for these items at evaluation, especially preservation and a data will, realizing that we may not get complete answers. Add a checkbox on the evaluation form: "Will you allow DataONE to maintain rights to the data?"
Kevin: Reword: ask how they plan to make their data available in the long term, and let them choose DataONE.
Cobb: Also keep in mind that this permits DataONE to have the data but does not constitute an obligation by DataONE (i.e., don't go into the insurance business unbonded).
**** Policy Review and Ratification:
The proposed route for approval as a DataONE project policy is for this document to be reviewed and discussed with various DataONE working groups and the DataONE Users Group. It will then be submitted for discussion and recommendation by the DataONE leadership team and for approval (or rejection) by the DataONE PI. This policy will be in effect until revised or revoked. The policy should be reviewed, and if necessary revised, annually, but if that review fails to occur, it will remain in effect unmodified (i.e., no automatic unseating).
***** BREAK *******
External Web process
Robert: Types of data (formats). What types of information come from the MN?
Chelsea: Criteria for the MN; an interactive, clickable list of MNs, similar to the NASA Earth Data Centers.
Tanner: The factsheet and partnership guidelines are PDFs. Send me a Vimeo short description. SlideShare embedded, short-ish: not more than 3 minutes (Suzie says 1.5-2 min).
Kevin: "For more information, contact DataONE." Who do I contact?
Instead, have a link to the MN coordinator.
Add a picture: note it is not on the main MN page but is on the Become a MN page. It should be on the first page.
Amber: When talking about sustainability, talk about the costs of becoming an MN: "What will this cost me?" Difficult to answer decisively, but this is a question. Cf. Mule work by Suzie.
Side comment, Miriam: Do we include this question in the persona development?
Amber B.: I would like to see the workflow/process.
Suzie: What is the value proposition to the MN? Why would I want to? What do I get? What do my users get? How hard will it be? (cost)
Todd: Complete list of current members and/or highlighted examples.
Miriam: Examples of good MN evaluation proposals.
Miriam: Who are the member nodes?
Miriam: Examine current questions coming from incoming inquiries. In the FAQs in the SC mtg.
Tanner: I like it, but the second paragraph is too much text for the web. Break it up into groups and a bottom one.
Kevin: Describe the documents to download; explain why one would want to look at these documents.
Kevin: The backing documents are uneven in their target audience. Write documents targeting different personas (decision-maker, technical person, ...).
Amber: How do I articulate that without excessive length?
A: Kevin: Executive summary. (Discussion on how to pick apart the content.)
Todd: List of institutions that run MNs; this is in the ASK FAQs.
Amber: Tabulation of the tier structure and software alternatives.
Suzie: What is the structural and "power" relationship between me, my MN, and DataONE?
Amber O: Prerequisites to being an MN. Easy prerequisites.
John: Answer the data owner's question about where to archive. (Maybe in ASK.)
Kevin: Who's responsible for what? (It's buried in the documents.)
Chelsea: Where do I find the forms I need to fill out?
Amber: We didn't have it on the page initially, but we gave it out. We are not sure. Consider again. Don't scare people off.
Tanner: Have an interactive checklist: "Congratulations, you scored 95%. Do you want to be an MN?"
Amber: Can and should MNs see the Redmine prioritization? Some items should not be public, but this works against the desire to have everything in Redmine. Think about it and figure it out.
Miriam: There is potential embarrassment.
Amber: Example: Trello tracking in Dryad.
Shift: What are the buckets to organize data? Current: MN, Become a MN, Current MNs.
Amber: A little difficult to change.
Suzie: Add something quick about why being an MN is helpful! See the picture on the tapestry, not the stitching. Maybe add this before "What is a MN".
Suzie: Divide this conversation into a couple of pieces:
1. The technical piece: documenting
2. Breaking the discussion into digestible bites (Tanner)
3. Recruitment conversation (we haven't recruited yet). Is recruitment occurring via the website or word of mouth? We have two website inquiries.
Suzie: Show alliance with MN goals. Give prospects ammunition for internal discussions and decision processes.
https://docs.dataone.org/member-area/working-groups/usability-and-assessment/meetings-usability-and-assessments-working-group/joint-u-a-sc-wg/metrics-and-statistics/Metric%20-%20Stats%20pres_2.pptx/view
Example: Electronic Theses and Dissertations (ETDs). Two approaches: talking to university admins; talking to libraries.
Metrics of three kinds: static, dynamic, transmission (see link above). Coming from the grassroots, moving up.
Identified 5 audiences: library, university-level administrators, faculty, students, IT.
Suzie: DataONE is currently missing the boat in having a place to understand why this is important. We have multiple audiences, and they will come in different ways. Have two paths when getting to the MN pages: technical, and a level above in terms of responsibility (decision maker).
Awareness: Once we have created awareness, where do we drop them to give them information?
Operationally, nodes may want "just the facts, ma'am" to "Git-R-Done."
Personas: timeline, draft in 3 weeks; do a phone call; Laura will Doodle us.
- Tanner & Amber O & Chelsea: geospatial large government repository
- Suzie & Holly & Miriam: academic institutional repository
- Robert & John: replication node
- Todd & John & Miriam: weird node (Cultural Heritage)
Metrics: What is success?