Discussion on Authn / Authz - Matt, Mark, Dave


Architecture docs: http://mule1.dataone.org/ArchitectureDocs-current/design/Authentication.html

Mark's notes: https://docs.google.com/document/d/10dSiRLy8XasCgtzjQT_cDcuMluF6vKU5TsZdRqqjJDQ/edit?hl=en&authkey=CLCAiYMI

LTER access control rules summary: https://spreadsheets.google.com/ccc?key=0AvmNJnP7eHevdGFzdWJORE14ZWs0NFowX1ZRZkd1V3c&hl=en&authkey=COi89YkO#gid=0


TASK 1483: getAuthSession() should probably only be exposed to Member Nodes to minimize potential for spoofing by through abusive clients.

 
What certificates are we going to accept? only CILogon? What other certificate roots should we trust? 


Using LDAP for group management - the DN would be a DatONE generated ID that is used to define a record that contains other attributs that indicate the alternate identities of other users.

NO: Requirement: Need to track level of assurance for identity providers, implies that we need to be able to verify the level of assurance provided by an identity provider

What are the levels of assurance required by the community? e.g. supporting openID for authn provides a very low assurance that the user is that identity.

Setting access policy requires good identity assurance.

Access rules. What really needs to be supported for the communities. LTER is quite liberal perhaps due to mode of operation (mostly metadata). PISCO is perhaps opposite extreme - many datasets are restricted access.

NCEAS groups tend to have more complex write access rules, also often not publicly readable.

KNB - no use of deny directives. -> Removed deny directive from access policy for DataONE.

DAAC? "A set of internal processes"

Dryad? Need to check with Ryan on the mechanisms for authentication, access control. Lkely to be a highly curated model.

Groups  - regstered users can define groups within the DataONE realm. How does this interplay with groups defined outside of the DataONE domain? 

Need to translate access control rules from each external / MN system into the DataONE model, and vice versa. 

How to deal with changing nature of identities and their access permissions? This is part of data curation, and should be handled by the respective Member Nodes and their data curators.  Note this implies that there are data curators.

Data use policies and licensing issues indicate who has responsibility for managing the access control if the origial data owner / creator changes affiliation.

Next Steps:
- Still a lot of design work to be completed before implemetation commences. Especially sequence diagrams for how things work together.
- Some technology / tool evaluation that needs to be done. e.g. grouper

Technologies to evaluate:
1. grouper
2. OpenLDAP multimaster
3. Review access control policies
4. Requriements review

OK 761: Users can specify authorization rules for data objects, science metadata objects, and process artifacts separately
767
TASK 1472 -768 - how to deprecate ownership of some object
TASK 1471 -769: Member nodes need a special curator role
TASK 1472 -795: revocation - through access policies. requires a curator / administrator / SU role on MNs
Replication of access control information. e.g. locking down LDAP servers behind firewalls
NO: Privacy issues - group members visible to everyone?

NO: Support ability to use encryption to ensure restricted access from untrusted MNs - encrypting content to be unreadable by untrusted MNs / clents

TASK 1473: In abscence of rules (Need to establish the default set of permissions in absence of additional roles) - owner read,write, no other read.

TASK 1474: Given that we have a new account method, we need to have an account verification method to ensure people are who they claim to be.

delete: 
NO: xxx: Sensitive data that is encrypted is generally not replicated except possibly to
avoid risks of leaks of that information (confused)

STORY 1475: notification infrastructure design required for xxx: Need process for data consumer to request authorization for an object

TASK 1470: Need a listUsers and listGroups methods : 
xxx: Need ability to create index of users so that clients can use that list to
assign access control rules, and ability to look up Identity for particular users

TASK 1480 (STORY 1476): xxx: Should have a user profile page to review/revise a user’s own identity, group, and
other Identity information - a place to add identities, set contact info, etc

NO: Unsupportable requirement: xxx: Nodes need to be able to assert minimum LOA re: who has write access - basically, if you need secure storage, then this is the wrong place to store data.

NO: 
xxx: May have need for ‘Deny’ permissions on access for convenience (but probably
lower priority)

NO: This is not in scope for DataONE:
xxx: Should be able to deposit content with embargo, but should be able to grant
anonymous access tokens for access to the data without the owner knowing who it
was that had access (use case for anonymous peer review in Dryad). Some implications for logging - don't log identity of attempted accessor

TASK 1481: redundant: xxx: Should groups be able to own objects?

NO: Can't meet this: 
xxx: Member nodes should be able to restrict data access by individuals on Dept of
Commerce Embargo lists at high LOAs – possibly determine that we won’t support
this, but rather that we state these types of objects must not be uploaded

NO: Not supported (ip address will still be logged): xxx: Anonymous access will be allowed for for publicly accessible objects

NO: Difficult to support (recursive query): xxx: Can groups contain groups, and at what nesting depth?  Yes, one level.

NO: (Nope) xxx: ID and access control should work in all geopolitical jurisdictions
(UI issue) xxx: ID and access control should comply with universal design standards

762: User identities can be derived from existing institutional directory services. Attribute transfer / translation rules and trust issue for the operation. Can perhaps genreate a proposed mapping, then have the users verify the suggested mapping. This is really a MN implementation issue.  DataONE will only see D1 AccessPolicy documents.  Not sure there is a task here for us.

TASK 1482: Need a mechanism to collapse two DataONE accounts into one.

---

Prep for CILogon call

- Review notes, architecture docs etc
- Perhaps the issue of brining in other federations into the CILogon infrastructure
- Need a library that we can use for interacting with CILogon

=============

Friday - 2011-04-08

Implementation Q's

- Grouper ( http://www.internet2.edu/grouper/ ) - stable and widely used. LIGO is using for example

- Distributed session management

- Identity management / mapping. Leaning towards replicated LDAP. Common solution, so seems sensible and safe. qualifier: dc=dataone,dc=org
 - how to verify identity? Incommon provides an assertion that Matt == Matt, but other systems vary by identity verification
 - TeraGrid vets PIs, and PIs can vet their students, etc. -- WebOfTrust
 - CoManage project possible for Identity Management? http://www.internet2.edu/comanage/  https://spaces.internet2.edu/display/COmanage/Home
 
 -- Discussed the pros/cons of SAML vs X509 for transporting session details
     -- really are the same; short lived certs can carry all of the attributes we need
     -- would have advantage of not needing any further login, attributes carried via TLS channel client cert
     
TODO: modify two-phase certificate login to use SSL client authentication using the X509 client cert instead
    -- this really raises the question as to why we have a separate session management service, when CILogon really could be the session manager
    -- the main downside is how hard it is to get a CILogon certificate into client control due to the InCommon redirection issue for desktop apps (less of an issue for web apps)
    
- Access control rules (consistency with other communities?)


===========================
Comments from J. Basney regarding architecture docs and conference call discussions -- see email on 2011-04-08:

[We talked about this on the call...] On the Authentication.html page,
there's an ePPN example of "mbjones@NCEAS" but I think best practice is
for "scope" in the ePPN "user@scope" format to be a registered domain
name, to arbitrate for name collisions, so I think a better ePPN example
would be "mbjones@nceas.ucsb.edu". For two more examples, my ePPN values
are "jbasney@illinois.edu" and "jbasney@idp.protectnetwork.org".

One thing to be aware of is that ePPN values are not always the same as
people's email addresses, i.e., we can't expect people to know what
their campus ePPN value is, and they may get confused that it's
different from their email address.

ePPN Reference:
http://middleware.internet2.edu/eduperson/docs/internet2-mace-dir-eduperson-200806.html#eduPersonPrincipalName

-----

Regarding CILogon Subject DNs, the "/DC=org/DC=cilogon..." string
representation is just one option. It's the OpenSSL "oneline" format.
The RFC 2253 format is another option for displaying Subject DNs, and I
think that option is more common for LDAP environments, so you might
prefer that one. I added OpenSSL examples for displaying the different
string formats here:

http://www.cilogon.org/cert-howto#TOC-Displaying-Certificate-Subject-DNs

So for example the following two forms of my CILogon Subject DN are
equally valid:

/DC=org/DC=cilogon/C=US/O=University of Illinois at
Urbana-Champaign/CN=James Basney A122

CN=James Basney A122,O=University of Illinois at
Urbana-Champaign,C=US,DC=cilogon,DC=org

-----

Do you have any requirements for users to be able to switch between
different "roles" (for example, to escalate privileges for rare
operations)? In the grid world it's common to have roles defined like
"software administrator" and "production manager" where I only sign in
to the "software administrator" role when I need rights to update
software installations, and I only sign in to the "production manager"
role when I need to submit project job runs, but otherwise my usual
logon gives me a base set of permissions that doesn't allow those
things. In DataONE, maybe SAML assertions for MN administrators should
only assert membership in the "MN administrators" group when explicitly
requested, i.e., when the administrator needs to perform an operation
that requires elevated privileges. This might bring more complexity than
needed, but I wanted to raise it as something to consider. Another
option is to have different usernames for different functions (i.e.,
"jbasney" versus "jbasney/ADMIN").

-----

As we discussed on the call, I don't think you need/want a custom
"2-phase handshake" protocol for validating certificates. Just use the
certificate in the SSL handshake that's part of HTTPS. In Java on the
client side you should be able to just set a few system properties
before initiating the connection:

System.setProperty("javax.net.ssl.keyStore",
                  "/home/example/usercred.p12");
System.setProperty("javax.net.ssl.keyStoreType", "pkcs12)";
System.setProperty("javax.net.ssl.keyStorePassword", "password)";

Then on the server side, the Java SSLSession.getPeerCertificates() call
will return the authenticated peer certificate chain on which you can
call X509Certificate.getSubjectX500Principal() to get the authenticated
certificate subject. Let me know if you want more details on that for
Java or other languages.

-----

Matt's question about MyProxy on the call got me thinking... I think you
could use MyProxy as the DataONE AuthN Service and issue short-lived
"DataONE" certificates (based on either LDAP passwords or CILogon
certificates) instead of AuthTokens. You could have MyProxy query LDAP
and insert attributes into the certificates it issues, avoiding the need
for a call-back later for attributes in a SAML assertion. Then DataONE
Client Applications could use standard HTTPS with certificates rather
than defining/implementing a new AuthToken scheme (avoiding the concern
about misuse of AuthTokens by rogue MNs).

Of course I'm biased in favor of MyProxy, but it's widely-used, stable
software that issues standard RFC 5280 certificates, and of course I'd
help you get it to do what you need.

Note that the MyProxy protocol
(http://grid.ncsa.illinois.edu/myproxy/protocol/) isn't a REST protocol.
I don't know if that's a deal-breaker. You could always put a REST
protocol in front of it, like Philip Kershaw did
(http://pypi.python.org/pypi/MyProxyWebService)

-----

Overall I'm still a bit confused about the different identifiers
(principals, PIDs, tokens) in the design. You might want a section in
the specification devoted to explaining the relationships between the
different types of identifiers (unless it's already there and I missed
it). As one specific example, in the Authorization.html doc it says,
"Principal identifiers are strings that are found transported in the
subject field of an identifying certificate produced from the
authentication system." Does that mean they can't be ePPN-type
identifiers? What about X509SubjectNames in the saml:NameID (i.e., found
in a SAML assertion rather than an X.509 certificate)?

-----

As I said on our call, I recommend Grouper
(http://www.internet2.edu/grouper/) very highly, and I think we can
learn from how LIGO is using Grouper with LDAP. It'd also be good to
check with Ken Klingenstein about using COmanage
(http://www.internet2.edu/comanage/) for user/group management. Scott
Koranda is the Grouper expert on the LIGO project, so we might want to
pick his brain at some point. He's working with the Internet2 folks on
Shibboleth, Grouper, and COmanage for LIGO via an NSF award
(http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1032468).

Here are links to Scott's presentation's at last year's I2MMs:
http://www.internet2.edu/presentations/spring10/20100428-ligo-koranda.pdf
http://www.internet2.edu/presentations/fall10/20101104-idm-ligo-koranda.pdf

-----

End comments by J. Basney

===============================