/AHM10182011ONEDriveFeedback

Feedback on the ONEDrive, 2011-10-18

Names appearing in this document:
DV = Dave Vieglais
MF = Mike Frame
RS = Ryan Scherle

================= Summary ==================

Concerns:

overwhelming the user/computer with too many files (+2)
- number of files per folder
- filename length
- total size of content on disk
how best to market this tool? (+1)
paths not being persistent if content in DataONE changes
need a query building tool for file browsing (similar to the Mercury browsing tool)
this is a very (conceptually) inefficient way for a user to browse data
(How does it respond to "ls -aR" ?) (actually works very well)
Published v Unpublished data--Data from Dryad can be accessed
Quality Control
this makes it easy for users to accidentally download a lot of content (throwing off statistics)
when there is a resourcemap, why aren't the contents shown as a bagit package?

Feature requests:

separate filtering from access -- use mercury as the recommended way to search, then allow the results to be opened in a filesystem view (e.g., a "smart folder") (+3)
- allow filtering on geography (+1)
ability to drag a folder (data package, all data associated with a keyword, etc.) into other investigator tools (+2)
shopping cart or bookmark lists (+2)
generate more features like abstract.txt automatically from the metadata, investigate other output formats that people would want (+1)
multi-language support -- generate multi-language abstracts (+1)
ability to find data similar to existing data, or collaborative filtering (+1)
improve listing of keywords, using controlled vocabularies or separating into pages (+1)
adding a hook so that data sets cannot be combined, permanently, which changes the original data sets
take the metadata and format it for as a new data set, perhaps for only particular users.
modify the display to indicate which items will work with which applications (user interface)
ratings system (e.g., quality of the data package)
allow the filter query to be updated on the fly (while the drive is mounted)
Use CDL Namaste tags for enhancing the directory listing?
when opening a file, report the canonical path, rather than the path used to locate the item (this will allow applications to cache a path that should be stable)
statistics should be segragated by the tool/interface that generated them -- this way we can sort out the stats for ONEDrive from other tools

================= Session #1 (Badger) ==================

What is an ITK component? an interactive tool kit to allow communication with DataOne
would there be a problem if a facet has millions of values, or a facet value has millions of data objects?
- DV: it actually performs very well
- Not trying to normalize the keyword facets now
- if there are too many results, we could create a path that was similar to search results paging -- but this may be a problem for tools that memorize paths (the paths would change as items were added)
is everything read-only
- DV: yes, for now
Will the windows implementation work if there are millions of titles?
- DV: Yes, probably.
- suggestion: use the identifier as a way to navigate to a particular file (using a directory scheme like pairtree)(+2)
  - could provide truncated identifier
Why are there so many keywords for so few titles?
- DV: That's a good question.
Is it filtering on the command line:
- DV: Yes, right now the filtering is done on the command line. We are exploring, through OneMercury, a way to browse and search the interface and then drag it to the command line.
it's a challenge to develop a good UI for this type of interface
- spaces and other strange characters in the metadata can cause poblems for filesystem display
For every metadata granule, is there more information in each folder?
- DV: Yes. The sub folders contain information, too.
If I were a librarian, how would I tell scientist what this tool would do for them:
- DV: Direct access to scientific data
- MF: It's making all the data available from the File/Open menu option. Eventually, publishing new data will be as simple as "Save As..."
  - write access isn't as critical as read access
- For COTS SW that has a captured file open syntax, it allows inclusion of DataONE collections
- It facillitates workflow access to data. Often the progress of workflow is paced by data transasctions. So doing this can accelerate and simplify workflows. Less manual data schlepping.
concern about people overwriting existing data (we need to UI to indicate that it is not exactly the same as a local filesystem) (unintentional writing to DataONE)
Can users combine things (data sets) that were not intended to be combined?
Are the keywords from combined categories?
- DV: yes. It is pulling keywords from all of the categories. --> this is an area that is ripe for improvement
separate filtering from access -- use mercury as the recommended way to search, then allow the results to be opened in a filesystem view
- mercury already does something like this with RSS -- can create any search and save its results as a feed (RS: this is common in other search systems as well)
MF: is it higher priority to clean up the features of the read-only system, or to implement the write capabilities?
- a few people said the read capabilities are more important
Does each entry have a unique URL?
- DV: yes, and there is the potential to provide citation information as well.
What kinds of FAQs need to be available when the site goes live to enable ease of use?
- MF: we have to simplify the package by perhaps adding an icon, simplification of the tool
Can people see themselves using this on a regular basis? (response was noncommittal)
What's the advantage of opening the data this way instead of searching for the data and downloading it on my hard drive?
- uniquely identify the data by the identifiers
- there is an advantage in being able to use content directly, in your existing set of tools, rather than having to download, organize, and then work with the content
you can trust the quality of the data more than data that was just found and downloaded.
- how can we convey provenance information?
How much quality control is DataONE providing?
- DataONE is not providing quality control; it is up to the member nodes.
If someone uses the data, will the identifier change?
- DV: the identifier stays with the file
some of these discussions sound like the GitHub model

Uncontrolled keywords

Normalize keywords - use controlled vocab. - methodology, geographic,
instruments

Give people chunks - a-d
Having path - remember path.
Limit Mount points - max. Number
Provide short url - direct path to url - sort of like a DOI.
Filtering via GUI
Parsing Keywords for keyword folder
Automatically provide extract of record, record

Need a temporary staging area - framework of GET

Proveannce

What about units checking - some kind of hook. - annotation property - good
time to think about it before (via resource map).

I want to know everything going on in this park? Copy query through ONE
Mercury

Question - how important is write access?

Develop tool, need these keywords, spatial bounding, this time result.

Use file system to provide access to day - use mercruy to provide search -
give file string to mount. Like smart folder on MAC - gives shorten path,
put in folder. Give direct

Generating mechanism for Filter via through Mercury

Keyword clean-up

Should change default

Generating abstract.txt very helfpul -

Use style sheets for metadata

What kind of open formats are useful is

Auto generated file from Mercury

Generate citation (endnote)

Help pages - purpose, installation, integration with Mercury

Reason to do this vs just web - added value.

Scroll and using tools

Distinguishing between data that is published and not.

Some reason to trust this data, more can do to attist to quality of data,
provenance information, link between metadata and data - deals with issue
of stuff just lying around on web.

Type DOI in and

Stick in the Folder - text citation in the folder - right next to file

I want this dataset version.

This is an active project, want the latest - embedding a version control
system.

================= Session #2 (Hawk) ==================

How do we know what the files are, if the filenames are all just identifiers?
- DV: Those file names are all unique identifiers and the problem is how to make them useful to the users. Perhaps some combination between the file name and the identifier would make the files more useful.
It would be useful to have a sandbox, to be able to save the datasets I like (e.g., shopping cart)
- DV: excellent suggestion and it wouldn't be too hard to enable with log in, you could have a section in your account.
- compared to dropbox/googledocs -- is this a hybrid of their models? (tags, virtual folders, nesting)
Filters--focus on e.g., Geography--is it searchable?
- Yes, the searches can be limited by content area
Can provide search capabilities within a filter
are new keywords added to the list as the metadata changes?
- DV: Yes.
Are their abstracts available for everything?
- DV: Yes, however the quality varies
You mention the sandbox, can you just select the folder and drag it to your desktop?
- DV: Let's see. Yes, it worked.
- windows filesystems let you add a comment to files, which would help for sorting
MF: how important is write access vs improvements to the read-only access?
- should make the read only access more useful before progressing to the write capability
When you make alterations to the datasets, creating new sets, where does it get stored?
- DV: It depends on where you put it. (Concern about these sets getting lost)
DV: how many people would/wouldn't want to use this type of system?
- Social scientist says she wouldn't use this type of program, due to privacy/security concerns
What if someone wants to use a subsect a data set?
- DV: In the future we will be able to provide that kind of service.
What about translating the information in to several languages?
- DV: it isn't a problem to do a rough translation, but it is much more difficult to do an accurate translation. It depends on the language in which an individual is running the package.
add user ratings? and allow users to adjust their search results based on ratings and identities of the raters
windows doesn't allow files to be copied if the file name is longer than a specific length
When there are too many keywords, can they be collapsed into a first letter?
- DV: yes.
Is there required metadata that goes into the facets?
- Yes, and the performance doesn't suffer much.
Are there any measures of similarity in finding data sets? (suggestion)
- could be uncovered as the usage increases, similar to Amazon algorithm (you might also like... suggestion)

Filename browsing is difficult - how do I know what to choose?
- maybe title string and I'd

Some place take datasets I find, my space, don't have to find them again -
bookmarks

IDs move around, new version, what can be trackable

Filter on Geography.

Provide a search. - do in ONE Mercury as search, drag into file system

Take uncontrolled keywords and map to controlled - could do on MN basis for
their controlled vocab.

MN listing of data set

Add comment to file itself at your desktop - right click in windows,
attributes.

Folder coloring option in MAC

Display on screen of science metadata or data to tell if it will work with
your application.

Generate citation records can take into endnote

What about google desktop - chaining queries. Filter within filter that is
in place

Some portions are ready only and some writeable

Key is for write, have a bunch of pieces, can't take 30 minutes to do.

Issue - role of MN

Social science data - have to have controlled access

Potential subsetting

Priority with integration on ONEMercury is high - take a link from Mercury
into OS is big - would be a deciding factor
Improving Read Only, queries, filter, fix filename

International language support. Generate multi-lingual abstract

Data object user rating - then filter on star, who rated it!

Filter on data for use with a separate software

Like adopted interface people familiar.

Folder lengths may be an issue issues of how many files in a directory.

Collapse of keywords in alphabet or grouping.

Top level view across the data to allow drilling down - high level view

Any measure of similartly, want something similar to this.

================= Session #3 (Wolf) ==================

Is this meant for human interace or machine?
- DV: It is meant for human interace usable on your desktop
so the queries are implicit?
- DV: regarding the filters, what you will see when you submit a querie are all of the files that contain that keyword.
why do other keywords show up when the filter is "habitat"?
- DV: these are the keywords that co-occur with habitat
- DV: yes
can we copy the query from the mercury search to create a drive?
Can you just mount the data package?
- DV: yes, you could. (and lets!)
is it possible to change the query at the command line and see an updated view in the file browser?
- DV: right now, you would need to mount a new version of the OneDrive. But it should be easy for us to allow the query to be updated.
This is a highly labor intensive program. Should it be used for someone looking for something more obscure?
- DV: It could be, but yes, it is labor intensive. The program provides a common view across everything, so all files retrieved would appear under a block related to the one keyword.
MF:How important is the write feature?
- It is more important to improve the read feature first, then work on the write feature
- The contributions of data could be twofold: new data and then comments about already existing data. Do you have a way to differentiate?
  - DV: not yet, but we are considering areas where the deliniation can be seen.
Can we see the files with or without annotations, and other information?
Will paths be created in the files with citation information?
- DV: No, eventually, there will be a way to directly export the citation into your citation software.
Pairtree-ing again. But it isn't human friendly, so not ideal for file system browsers
- DV: It is important to present a hierarcy of content, so that the user can understand the content. It enables restriction and categorization.
It is an interesting idea because you have complete power about how you present the data.
If a query has 15,000 hits, will it wait to display all of the hits or list the hits and regenerate as results appear?
- DV: The actual return from the server fetches a lot at first then continues to fetch after the first hits are returned.
RS: how important is preserving the paths to objects?
- the identity of the object is more important than the actual path
- perhaps the filesystem can report the canonical path, rather than the path used to locate the item

Filters that are spatial and temporal are needed

In mercury search, can just copy query string into filter

Mount at the level of the data package.

Use Title of Data Provider as Folder name

More human readable Id of the data.

Keywords - mapping to a controlled vocabulary to follow hierarchy

Apply style sheets for rendering of science metadata

Ability to refresh mapped drive, based on change in filter.
Add readme document if drag to the desktop

Could be a game changer

First stop interaction

Comments about existing information - annotations would be helpful

Issue of depositing in the drive vs through MN needs to be figured out.
Drive would be great for single, potential un affiliated PI.

Configurable default to show with or without annotations.

Potential use into the drive for citations. Drag file into endnote

Cdl use of metadata in directory listing - spec John K has. Tagging
directory type. Nomasday tool?

Pair tree break into smaller directories. They aren't friendly to users.

================= Session #4 (Tamaya) ==================

Why doesn't the archive.txt show up for all items?
- DV: some items are the actual data files (not metadata), so they don't have abstracts
How practical is it to filter on licensing features?
- DV: it would be good if there were categories of licenses instead of plain text explainations
  - And to see who the sponsor of the project was, along with grant #
How often is the view refreshed?
- DV: any time the OS requests a browsing operation -- it's fairly frequent
authentication?
- DV: not supported yet, but if you have a CILogon certificate, it will work with this
Can see this being utilized in the library community building up virtual collecitons
In that get request, it would be nice to know where the downloads have come from
how does this compare to NFS/AFS?
- it's similar, but at a higher level (connecting to DataONE objects, rather than individual filesystems)