Tuesday, April 05, 2011

An introduction to ORCID

ORCID (Open Researcher & Contributor ID) started out as a CrossRef initiative that then flew the nest, with the support of Nature and Thomson. It now has stakeholders including funders, researchers and librarians. Geoffrey Bilder, our speaker today, has been seconded from his day job at CrossRef to be the technical director at ORCID.

The general problem: identity is cheap
The problem at the heart of ORCID's being is that, on the internet, identity is "cheap" - it's easy to create multiple different profiles in silos on different sites, leaving every site with a fragmented view of you.

The problem in scholarly communications
The scholarly record is built on understanding the provenance and 'network status' of content. Publisher brands are based on the 'provenance infrastructure' (credentials of author, editorial rigour, peer review, citations). Both CrossCheck (another CrossRef initiative) and ORCID are key to the credibility of the author, although note that it's not just about authors - it refers to "contributor identifiers" to acknowledge all the other roles. One person (one ID) can contribute in lots of different ways (author, reviewer, programmer, compiler) and can have relationships to other IDs (edited by, co-author, colleague etc).

The knowledge discovery problem: name ambiguity
ORCID is about knowledge discovery, rather than access control or security - about people publicising their work, but ensuring it is credited accurately. The main issue is name ambiguity: name variations, name "collision" (multiple people with the same name, e.g. the other Geoff Bilder, a Canadian para-ski-glider), name changes, name translations, corporate authors... All complex problems that must be resolved for accurate crediting within scholarly literature. ORCID's mission is to solve this problem through collaboration; various systems exist - economists use RePEc's author claims service, some countries have national databases of researchers - but regional / disciplinary / institutional silos are unhelpful in our networked age. Aspects of identity can be claimed by individuals or asserted on their behalf by institutions; ORCID recognised that it needed to bring both organisational and personal assertions together to seed its system, as neither level by itself would ensure sufficient uptake to make the service useful.

Principles and progress
ORCID's ten guiding principles (http://www.orcid.org/principles) demonstrate the organisation's non-partisan, international, open approach. The board is made up of "anyone who can commit the time and wants to participate". So what have they done so far?
  • Thomson donated the codebase for its ResearcherID service to help jumpstart ORCID
  • Various functions were added to this for ORCID's alpha prototype - Thomson's system was based on personal "claims", so the organisational layer had to be added
  • Now working out last details for licensing the codebase to build a phase I version of the system
  • And planning for future sustainability (funding / staff)
  • Hoping to have something that people can use, next year
Questions:
  • Q: Authors are allowed to create profiles - how can IDs remain unique?
    A: Authors cannot change the identifier, only the information associated with it.
  • Q: The contributor ID could become increasingly complex - how do we define where 'contribution' begins and ends?
    A: We will studiously avoid defining that - it's going to evolve. But the answer is essentially that people will record what they think is important, and if it's not important, it won't be counted for much. [Given that people will have to take the time to enter this data, they will likely only claim credit for things that are useful / important]
  • Q: How will this fit with the requirements of REF?
    A: It's not clear where REF responsibilities will sit but hopefully ORCID will make the process of gathering information easier.
  • Q: Pseudonymity?
    A: A lot of this information is public already, but in aggregation it's more powerful. What if it becomes too easy to find details about stem cell researchers in Alabama or animal science researchers in Oxford? People do have good reasons to want to hide information - even if only to be credited for peer reviewing without it being public. ORCID will allow any or all information except the identifier itself to be hidden.
  • Q: What is happening with the development of IDs in different countries?
    A: It would be a bad idea to think "ORCID's coming, let's stop working on our system". Other systems will continue to exist and be important. At minimum, ORCID will be able to include information about other relevant identifiers.
  • Q: What work will be involved for publishers?
    A: A classic example: a researcher submitting a manuscript currently fills in all the information each time, and that information quickly becomes stale (e.g. contact data). In future, they will upload their ORCID, and publishers can query and recheck information as necessary.
  • Q: Who will be the arbiter of who will be attached to a work as a contributor?
    A: For example, the corresponding author will have more credibility in saying who else contributed.
  • Q: Disambiguation of affiliations?
    A: We may integrate with e.g. Ringgold to create a controlled vocabulary for organisations.
  • Q: What are the data protection issues?
    A: We are transparent about what is being revealed, to whom, and we give authors control - they can make anything except the identifier private.
  • Q: What's the long term funding plan?
    A: Exactly. The technology doesn't matter if we can't sustain an organisation to keep it running. We are looking at future models, from related service provision to membership.
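
Two of the answers above describe the same rule: an author can edit or hide any information attached to a profile, but never the identifier itself. A minimal sketch of that rule follows, assuming a hypothetical `ContributorRecord` class; the class, its method names and the sample identifier string are illustrative only and are not ORCID's actual data model or API.

```python
class ContributorRecord:
    """Hypothetical profile: fixed identifier, editable and hideable fields."""

    def __init__(self, identifier):
        self._identifier = identifier  # set once, never reassigned
        self._fields = {}              # editable information about the person
        self._private = set()          # names of fields hidden from public view

    @property
    def identifier(self):
        # Read-only property: assigning to record.identifier raises AttributeError.
        return self._identifier

    def set_field(self, name, value, private=False):
        """Add or change a piece of information and choose its visibility."""
        if name == "identifier":
            raise ValueError("the identifier cannot be edited or hidden")
        self._fields[name] = value
        if private:
            self._private.add(name)
        else:
            self._private.discard(name)

    def public_view(self):
        """Return the identifier plus every field not marked private."""
        view = {"identifier": self._identifier}
        for name, value in self._fields.items():
            if name not in self._private:
                view[name] = value
        return view


record = ContributorRecord("hypothetical-id-0001")
record.set_field("name", "Example Researcher")
record.set_field("email", "researcher@example.org", private=True)
print(record.public_view())
```

In this sketch the identifier is always present in the public view, while the email address is stored but not shown.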


Tuesday, April 17, 2007

Author Identification Project in the Netherlands

The key issue in author identification is not whether this author produced a particular work (although the problem of orphan works is a separate issue), but whether this author is the same author who produced works A, B and C. Disambiguation, particularly in cataloging, is a significant problem. Catalog information can contain abbreviations or variant spellings, and can include or omit diacritics; authors might change their names or go by nicknames or pseudonyms; and transliterating languages such as Japanese, Chinese or Russian into Latin script can lead to spelling variations. One project aiming to address this situation is based in the Netherlands and consists of a partnership among 12 universities, SURF, UCI and OCLC Pica.
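
To make the catalog problem concrete, here is a minimal sketch of one common first step in name matching: reducing each catalog entry to a crude key so that entries differing only in diacritics, case or abbreviated given names fall into the same candidate group. The functions `strip_diacritics`, `match_key` and `group_candidates`, the "Surname, Given" input format and the sample names are assumptions for illustration; this is not the method used by the project described here, and it cannot separate two different people who share a name.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_key(name):
    """Turn 'Surname, Given' into a lowercase 'surname initial' key."""
    surname, _, given = name.partition(",")
    surname = strip_diacritics(surname).casefold().strip()
    given = strip_diacritics(given).casefold().strip()
    return f"{surname} {given[:1]}".strip()


def group_candidates(names):
    """Group catalog entries that share a key; a person still has to review them."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return dict(groups)


entries = ["Müller, Jürgen", "Muller, J.", "MULLER, Jurgen", "Jansen, Piet"]
for key, members in group_candidates(entries).items():
    print(key, members)
```

A key like this only proposes candidates. Deciding whether the grouped entries are really one person needs further evidence, such as affiliation or publication lists.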

Daniel van Spanje at OCLC PICA presented a status update on the project. The Digital Author Identification (DAI) {NOTE: sites are in Dutch} Project grew out of the Digital Academic REpositories (DARE), a Dutch initiative to stimulate the production of digital science online. The project’s goal is to uniquely identify all of the approximately 40,000 authors conducting research in the Netherlands. A successful pilot test at the University of Groningen in 2005-06 identified approximately 3,000 unique authors and researchers. The project was then rolled out to 13 additional institutions in 2006, with an expected completion date later this year. Information on the authors was gathered from METIS, a registry of metadata on publications and researchers in the Netherlands, and from the GGC, the Dutch national union catalog system, and was used to distinguish and de-duplicate authors before assigning IDs. The project has created a central registry of names covering a wide range of identification information, such as variant names, nationality, language, date of birth and publications. After the pilot, additional fields such as sex, organizational affiliation, titles and dates of employment were added to the file.

The project certainly has privacy ramifications, although according to the Dutch understanding of privacy regulations, the project is justifiable as a library/bibliographic resource. It is unclear whether the same methodology would be acceptable in other countries.

There are some other initiatives that relate to this project on an international level, notably at ISO Technical Committee 46 -- Information and Documentation, which began work on the International Standard Party Identifier (ISPI) in August 2006. ISPI is a new international identification system for the parties (persons and corporate bodies) involved in the creation and production of content entities. As envisioned, the work already done by the DAI project would be incorporated into a broader international standard if one is agreed upon. Certainly, there will be more work on this important issue and hopefully, we can learn from the progress made by OCLC PICA, SURF and this DAI project.
