Coordination Meeting for DataONE, TeraGrid MN, STEM TeraGrid Analyses

This meeting is to provide a status update on progress since the last meeting, detailed at http://epad.dataone.org/20110429-D1-TG-STEM

Attendees: Matt Jones, Daniel Fink, Kevin Webb, Dave Vieglais, Nick Dexter, Theo Damoulas, John Cobb

Agenda
* review of target dates:
    * July 15 - STEM analysis products being deposited dynamically, as analyses are completed
    * discuss and resolve any blockers at CCIT meeting in July; add check-up call to CCIT agenda
* status report on action items from previous meeting:
    * (Kevin - lead, Daniel, Matt, Paul, Vivek): [June 30] come up with real metadata template for STEM results; use Matt's first draft created for the Feb NSF demo
    * (Daniel - lead, Dave, Nick, Kevin): [May 31] Draft a document describing information flow and major components, systems, services and protocols involved in the interactions
    * (Matt - lead, Kevin, Dave?, Paul, Nick): [June 15; needs draft of document above] Review the R package design and refactor as necessary for supporting this experiment
        * Evaluate whether STEM analysis would use R package at all; might be command line instead
        * Design packaging/folder structures etc. for pushing content back into D1. Relationships between content elements.
        * How many files?
            * 600 files for each fold for a single species
            * About 200 files per species per year
            * About 1000 files for an entire species
            * Species count: 200-400 species
            * Each fold takes ~3 hours across 100 cores
            * Across-fold summary takes ~1 hour more
        * Q: What is the network connectivity between Lonestar output storage and the TG-D1 node? It is currently 1 Gbps and will move to 10 Gbps this summer
            * note: today Lonestar is predicting about a 6 hour queue wait for a 5 hour 100 node job (https://portal.teragrid.org/group/tg-portal/hpc-queue-prediction)
        * Q: What is the average file size (uncompressed, and compressed)?
    * (Dave): [June 15] Design and set up a staging implementation of D1 infrastructure
    * (Paul): schedule regular (monthly) check-in of this group
        * June - first/second week
        * July - during CCIT meeting
        * August - TBD
* new issues to solve?
* new action items?
    * Mount filesystem for intermediate data transfer (Nick)
    * Goal for July CCIT meeting: get all development scripts working to create EML and upload data and metadata to DataONE development nodes (Kevin, Daniel, Matt)
    * Production runs will start at the beginning of July -- try to get EML generation finished by the time these runs start so that we have what needs to be archived later
    * June 30 - Quick follow-up meeting before CCIT conference

Notes:

Cobb:
Update: Progress on DataONE Member Node implementation on TG and mounting filesystems - perhaps Nick can brief in my absence
AI: (Cobb) Create a science gateway for this effort and associate it with the active allocation.
Note: TeraGrid is undergoing a transition from TeraGrid to "XSEDE" as the programmatic successor. Hopefully not too much will change.
Issue: planning the flow from computation to deposition into D1 and transport to CLO. This is probably a topic to discuss.
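
Sizing sketch: the file counts and link speeds recorded in the agenda (about 1000 files per species, 200-400 species, 1 Gbps now and 10 Gbps later) can be combined into a rough estimate of total file count and ideal transfer time. The average file size is still an open question in these notes, so the 5 MB value below is a placeholder assumption, not a measurement. The function names are also illustrative. The estimate ignores protocol overhead and per-file latency, so real transfers would likely take longer.

```python
# Back-of-envelope sizing for STEM output moving from Lonestar to the TG-D1 node.
# Values marked "from notes" come from the meeting; everything else is an assumption.

FILES_PER_SPECIES = 1000      # from notes: about 1000 files for an entire species
SPECIES_RANGE = (200, 400)    # from notes: 200-400 species
LINK_GBPS = (1, 10)           # from notes: 1 Gbps today, 10 Gbps planned

# ASSUMPTION: the notes leave average file size as an open question.
# 5 MB is a placeholder to be replaced once real sizes are known.
ASSUMED_AVG_FILE_MB = 5.0


def total_files(species: int, files_per_species: int = FILES_PER_SPECIES) -> int:
    """Total number of output files for the given number of species."""
    return species * files_per_species


def transfer_hours(n_files: int, avg_file_mb: float, link_gbps: float) -> float:
    """Ideal transfer time in hours at full link speed (no overhead modeled)."""
    total_bits = n_files * avg_file_mb * 1e6 * 8  # decimal megabytes to bits
    seconds = total_bits / (link_gbps * 1e9)
    return seconds / 3600.0


if __name__ == "__main__":
    for species in SPECIES_RANGE:
        n = total_files(species)
        for gbps in LINK_GBPS:
            hours = transfer_hours(n, ASSUMED_AVG_FILE_MB, gbps)
            print(f"{species} species, {n} files, {gbps} Gbps: {hours:.2f} h")
```

Once the average compressed and uncompressed file sizes are known, substituting them for ASSUMED_AVG_FILE_MB gives a first-order check on whether the 1 Gbps link is a bottleneck relative to the ~3 hour per-fold compute time.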