.. meta::
  :keywords: DataONE, CCIT, 20101215, VTC

Note: This document has been committed to subversion at https://repository.dataone.org/documents/Committees/CCIT/20101215_CCIT_VTC.txt. Please edit that document if changes are needed.


DataONE Developer Call - 2010-12-15
===================================

:Attendees: Jim Basney, Randy Butler, Matt Jones, Mark Servilla, Rob Nahf, Ryan Scherle, John Kunze, Giri Palanisamy, Dave Vieglais


Important Action for CCIT:

- Review the authorization document and the conclusions from the discussion during today's call (below). Feedback on this is a high priority, as it will influence documentation that will be provided to the EAB (mid January) and for the NSF review. It will also influence the next steps for design and implementation of the authn / authz APIs for DataONE.


Agenda and Notes
----------------

1. Brief update on the preservation workshop that was held December 5-6 in Chicago.

- Rough meeting notes at http://goo.gl/QcV9z

Main goals initially are to preserve bits, meaning, and behavior. Focus on avoiding losses through technical or manageable problems.

Three groups focused on the bit level; syntax, semantics, and pragmatics; and the organization and network level.

The general goal is to pull together a document for the EAB and to complete it for the NSF review.


2. Discussion about authorization in DataONE. Matt has pulled together a draft outlining the use of XACML for expressing access control policy by DataONE participants:

 http://mule1.dataone.org/ArchitectureDocs-current/design/Authorization.html

Principal representation - the subject returned includes information generated by CILogon - this is done to avoid pushing around fairly long, opaque strings.

Need to map *any* external identifier to a "DataONE" unique identifier for a principal.

Principal DataONE ID maps to potentially multiple identities from multiple identity providers.

certificate subject -> DataONE ID -> access control rule

If an access control (AC) rule refers to a specific identity from a specific identity provider:

  AC subject -> DataONE ID

Beneficial if users do their own identity mapping. Existing solutions? Not really; there are examples, but nothing to drop in.

Probably a good idea to keep identity mapping and groups in the same service to minimize service call overhead.
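As an illustration only (not something presented on the call), the mapping chain above could be sketched in Python as a single service that resolves an external identity to a DataONE principal and its groups in one call. The class name, method names, ID format, and subject strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityService:
    """Hypothetical in-memory identity mapping and group service."""

    # external identity (e.g. a certificate subject) -> DataONE principal ID
    identity_to_principal: dict = field(default_factory=dict)
    # group name -> set of DataONE principal IDs
    groups: dict = field(default_factory=dict)

    def map_identity(self, external_id: str, principal_id: str) -> None:
        """Record that an external identity belongs to a DataONE principal."""
        self.identity_to_principal[external_id] = principal_id

    def add_to_group(self, group: str, principal_id: str) -> None:
        self.groups.setdefault(group, set()).add(principal_id)

    def resolve(self, external_id: str):
        """Return (principal_id, groups) in one call; (None, set()) if unmapped."""
        principal = self.identity_to_principal.get(external_id)
        if principal is None:
            return None, set()
        member_of = {g for g, members in self.groups.items() if principal in members}
        return principal, member_of


service = IdentityService()
# Made-up subject strings from two identity providers, mapped to one principal.
service.map_identity("CN=Dave,O=Google", "d1:dave")
service.map_identity("CN=Dave,O=University of Kansas", "d1:dave")
service.add_to_group("curators", "d1:dave")

principal, member_of = service.resolve("CN=Dave,O=University of Kansas")
print(principal, sorted(member_of))
```

Returning the principal and its groups together reflects the point about keeping both in one service so that an access check needs only one lookup.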


*Symbolic Principals*

- Anonymous role (authenticated, but unknown identity). This is a bit of a high bar for DataONE to address. We won't be in the business of ensuring user anonymity.

- Process for authenticating services (e.g. "MN A" or "CN 1"). The assumption is that we will create "accounts" for the services, each with its own ID, generate a certificate for authentication, etc. It is important that an MN can trust that a request has come from a CN. We could use a group for CNs and base AC on the group, but this has some scalability issues.

*Access Policy Language*

- EML AC language is pretty simple, but still generally too complex for users to understand. Basically follows the Apache config approach (Allow | Deny)

- The vast majority of users utilize only a very small subset of access control rules.

- XACML - does everything that we would need, but does way more than we need -> complex. See for example:

  http://mule1.dataone.org/ArchitectureDocs-current/design/Authorization-technologies.html

- XACML probably too overwhelming to deal with for now.

- Alternative simple language: http://mule1.dataone.org/ArchitectureDocs-current/design/Authorization.html#access-policy-language

- Problem is that this is yet another access control policy language. We *should* be able to use the existing industry standards.

- Need to be able to represent access control rules on both the MNs and the CNs. We could store this information in the system metadata (current approach)

- Need fairly well defined rules for processing access control rules to avoid logic ambiguity

- Efficiency of rule evaluation is important - store rules at the node, then easier to evaluate. If rules are with every object, then access control policy evaluation for sets of objects becomes inefficient.

- Good to store rules in multiple locations close to the relevant services, but this could also be a significant synchronization problem: how to propagate a change in rules to potentially hundreds or thousands of locations quickly?

- Is replicated content dark or can it be exposed according to the access control rules? Certainly can be done, but the implementation of the rules needs to be accurate and trusted.
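To make the point about well-defined processing rules concrete, here is a minimal Python sketch of allow-only evaluation. It is an assumption for illustration, not the draft's policy language: the `AllowRule` type, the permission names, and the principal and group identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    """Hypothetical allow-only rule, e.g. as stored with an object's system metadata."""

    subjects: frozenset     # DataONE principal IDs or group names
    permissions: frozenset  # e.g. {"read"} or {"read", "write"}


def is_allowed(rules, principal, groups, permission):
    """Grant access if any rule matches; with no match, access is refused."""
    identities = {principal} | set(groups)
    return any(
        permission in rule.permissions and identities & rule.subjects
        for rule in rules
    )


rules = [
    AllowRule(frozenset({"group:cn"}), frozenset({"read", "write"})),
    AllowRule(frozenset({"d1:dave"}), frozenset({"read"})),
]

print(is_allowed(rules, "d1:dave", set(), "read"))         # True
print(is_allowed(rules, "d1:dave", set(), "write"))        # False
print(is_allowed(rules, "cn:1", {"group:cn"}, "write"))    # True
```

Because there are only allow rules and the default is refusal, the result does not depend on rule order, which removes one source of logic ambiguity.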

Ryan - generally lean towards using XACML since there are existing processors, with the caveat that XACML is a hard language to get right.

Randy - Perhaps a simple editor might be helpful. 

Create a XACML profile that constrains the rules that can be expressed within the DataONE infrastructure. Otherwise the rules can be very complex. Doing this though means that we can't accept XACML from other sources that can't be expressed in the profile.

Jim - XACML use in the grid world typically has a simpler front-end language

For example: https://twiki.cern.ch/twiki/bin/view/EGEE/SimplifiedPolicyLanguage

- Deny rules can significantly increase the complexity of implementation. Are they really necessary? Mostly a matter of convenience. If group definitions are fairly easy then unlikely to be necessary

- List Objects on MNs should be restricted to CNs only, to avoid the need to evaluate access control rules on MNs.

- Access to other networks *must* be considered as well. But we have no idea what they are planning, so we should drive the process. Hold this federation off until there is an emerging federation that needs to be considered.

- Feedback from LTER? Generally very simple.

- Can we do a simple quantitative analysis of the rules that are in use now to determine what might be excluded?

The big question is whether Deny rules are necessary.
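The quantitative analysis suggested above could be as simple as tallying existing rules by type. The Python sketch below assumes the rules have already been extracted into records with `type` and `permission` keys; that record shape and the sample records are made-up placeholders, not data from any repository.

```python
from collections import Counter

# Placeholder records only; real input would be the access rules currently in use.
existing_rules = [
    {"type": "allow", "permission": "read"},
    {"type": "allow", "permission": "read"},
    {"type": "allow", "permission": "write"},
    {"type": "deny", "permission": "read"},
]


def summarize(rules):
    """Count rules by type and by (type, permission) pair."""
    by_type = Counter(r["type"] for r in rules)
    by_feature = Counter((r["type"], r["permission"]) for r in rules)
    return by_type, by_feature


by_type, by_feature = summarize(existing_rules)
deny_share = by_type["deny"] / len(existing_rules)
print(by_type, f"{deny_share:.0%} of rules are deny rules")
```

A low share of deny rules in real data would support leaving them out of the simplified language.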

*General Conclusions*

  a. Overall approach of using a simple representation of rules internally in DataONE makes implementation a lot simpler though will have re-construction implications down the road as we work with other federations. However - this will likely be the case anyway - it seems unlikely that any project (without infinite resources) would implement full XACML support, so other projects will utilize a subset. That subset may or may not align with what we select from XACML. So, using our own language, and providing a simple service for expressing our rules in XACML would seem the sensible approach.

  b. Other communities (esp. grid) are using the same approach of defining a simple front-end language with some minimal expression of access control rules. Hence the approach of using a DataONE-specific representation is not unusual, and in fact seems to be a common implementation pattern.

  c. Next steps are to complete the draft document to the point where it can provide some implementation design guidelines, then start on the prototyping.

  d. Highly recommended that the approach (adopting and implementing a simplified access policy language) is at least reviewed by the entire CCIT.

  e. The need for DENY rules is still to be determined. General agreement that they are best avoided if possible. The possibility of avoiding them is somewhat driven by the ease with which users and administrators can manage group memberships and create new groups.

  f. The issue of certificate subject alteration by the CILogon service is something that should be dealt with by DataONE. We need to provide an identity mapping service where users can define their identity mapping, essentially defining equivalence between identities authenticated from different identity providers (i.e. asserting that Dave@gmail.com == dave@ku.edu). DataONE needs to design and implement an identity mapping service.



3. Any other business

None.