Tuesday, April 17, 2007

Author Identification Project in the Netherlands

The key issue in author identification is not whether this author produced a particular work (the problem of orphan works is a separate issue), but whether this author is the same author who produced works A, B and C. Disambiguation, particularly in cataloging, is a significant problem. Catalog information can contain abbreviations or variant spellings, and diacritics may be present or missing; authors might change their names or go by nicknames or pseudonyms; and transliterating names from languages such as Japanese, Chinese or Russian into Latin script can lead to spelling variations. One project aiming to address this situation is based in the Netherlands and consists of a partnership among 12 universities, SURF, UCI and OCLC PICA.
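To make the matching problem concrete, here is a minimal sketch in Python of how catalog name variants might be reduced to a common comparison key. It is not the DAI project's method; the function names `strip_diacritics` and `match_key` and the "surname plus initials" rule are assumptions made for illustration only.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks so that 'Müller' and 'Muller' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_key(surname: str, given_names: str) -> str:
    """Build a rough comparison key: normalized surname plus given-name initials.

    Hypothetical rule for illustration; a shared key only marks two catalog
    entries as candidates for the same person, it does not prove identity.
    """
    # Lowercase, drop diacritics, treat hyphens as spaces, collapse whitespace.
    surname_key = " ".join(
        strip_diacritics(surname).lower().replace("-", " ").split()
    )
    # Reduce full given names and abbreviations ("Jörg", "J.") to initials.
    given_parts = strip_diacritics(given_names).lower().replace(".", " ").split()
    initials = "".join(part[0] for part in given_parts)
    return f"{surname_key}|{initials}"


if __name__ == "__main__":
    print(match_key("Müller", "Jörg"))
    print(match_key("Muller", "J."))
```

Both calls produce the same key, which is exactly why a key like this can only propose candidates: two different people named J. Muller would collide as well, and pseudonyms or name changes would not be caught at all. Additional evidence, such as affiliation or publication lists, is needed to confirm a match.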

Daniel van Spanje of OCLC PICA presented a status update on the project. The Digital Author Identification (DAI) Project (note: the project sites are in Dutch) grew out of Digital Academic REpositories (DARE), a Dutch initiative to stimulate the production of digital science online. The project's goal is to uniquely identify all of the approximately 40,000 authors conducting research in the Netherlands. A successful pilot at the University of Groningen in 2005-06 identified approximately 3,000 unique authors and researchers. The project was then rolled out to 13 additional institutions in 2006, with completion expected later this year. Information on the authors was gathered from METIS, a registry of metadata on publications and researchers in the Netherlands, and from the GGC, the Dutch national union catalog system, and was used to distinguish and de-duplicate authors before assigning IDs. The project has created a central registry of names covering a wide range of identifying information, such as variant names, nationality, language, date of birth, and publications. After the pilot, additional fields such as sex, organizational affiliation, titles and dates of employment were added to the file.
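As a rough illustration of what a registry entry with these fields could look like, and of what de-duplication means at the record level, here is a small Python sketch. The class `AuthorRecord`, the function `merge_records`, the field names and the ID format are all hypothetical; the actual DAI record layout is not described in the presentation summary above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class AuthorRecord:
    """Hypothetical registry entry; field names are assumptions."""
    author_id: str                       # illustrative ID, not the real DAI format
    preferred_name: str
    variant_names: Set[str] = field(default_factory=set)
    nationality: Optional[str] = None
    language: Optional[str] = None
    date_of_birth: Optional[str] = None
    sex: Optional[str] = None            # added after the pilot
    affiliation: Optional[str] = None    # added after the pilot
    publications: List[str] = field(default_factory=list)


def merge_records(primary: AuthorRecord, duplicate: AuthorRecord) -> AuthorRecord:
    """Fold a duplicate entry into the primary one, keeping the primary ID."""
    variants = set(primary.variant_names) | set(duplicate.variant_names)
    if duplicate.preferred_name != primary.preferred_name:
        variants.add(duplicate.preferred_name)

    # Keep the primary's publications first, then add any new ones in order.
    publications = list(primary.publications)
    for title in duplicate.publications:
        if title not in publications:
            publications.append(title)

    return AuthorRecord(
        author_id=primary.author_id,
        preferred_name=primary.preferred_name,
        variant_names=variants,
        # For single-valued fields, fall back to the duplicate's value
        # only when the primary has none.
        nationality=primary.nationality or duplicate.nationality,
        language=primary.language or duplicate.language,
        date_of_birth=primary.date_of_birth or duplicate.date_of_birth,
        sex=primary.sex or duplicate.sex,
        affiliation=primary.affiliation or duplicate.affiliation,
        publications=publications,
    )
```

The design choice worth noting is that the name under which the duplicate was filed is kept as a variant rather than discarded, so later lookups under either spelling resolve to the same ID.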

The project certainly has privacy ramifications, although according to the Dutch understanding of privacy regulations, the project is justifiable as a library/bibliographic resource. It is unclear whether the same methodology would be acceptable in other countries.

Other initiatives relate to this project at the international level. ISO Technical Committee 46 (Information and Documentation) began work on the International Standard Party Identifier (ISPI) in August 2006. ISPI is a new international identification system for the parties (persons and corporate bodies) involved in the creation and production of content entities. As envisioned, the work already done by the DAI project would be incorporated into a broader international standard if one is agreed upon. Certainly, there will be more work on this important issue, and hopefully we can learn from the progress made by OCLC PICA, SURF and the DAI project.
