Monday, April 12, 2010

How blogs can drive scientific progress

Adam Bly opens by thanking the Tweeters for keeping him in touch with the conference. Adam, you see, is in New York. We're in Edinburgh. Virtual speakers: a first for UKSG.

Distributing, managing, sharing, visualising data
Adam addresses how we can keep updating and refining our first principles and our technologies in a landscape that is constantly evolving. He illustrates changes in science by showing a picture of the Large Hadron Collider. "It's a reminder of how science has become increasingly global, with dozens of countries (even those in current geopolitical conflict) coming together to collaborate." These huge, global, multidisciplinary collaborations require new approaches to distributing and managing data. Scientists must not only produce data, but also find new approaches to sharing and visualising it.

"Open science is what will drive the most profound and robust advancements in the future, and will ensure that science has the greatest potential to affect society for the better."

Adam says we have a social responsibility to make research as available as possible to the world - to scientists, and researchers, across borders - and ensure that the technology that does this doesn't impede progress. (It's nice to hear someone make this argument about the core data, not the finished article - I think the former is much more valuable to progressing research, and doesn't devalue what publishers add.) The challenge is not whether open science is good or bad, but how it becomes scaleable, sustainable and simple to adopt. We need purpose-built software and environments for researchers, so that this extraordinarily valuable community is not spending its time on IT but is instead discovering the next great cure.

Starting over on scientific communications
We need to bed four first principles into this reboot of the system:
  • Digital core - we've been moving online for years, and will continue to do so. But if the core isn't fundamentally digital, then we just end up hacking solutions around the edge, and can't create the kind of intelligence that a truly digital core for research promises.
  • Free flow of information - economic growth is tied to the abundance of scientific information, and we're on track to increasingly free flow.
  • Standards and interoperability - to ensure that as projects progress they can be tied together, so individual scientists are not navigating through disconnected and redundant applications, but bringing disparate pieces together.
  • Knowledge from information - using tools like data visualisation to see realtime changes and extract knowledge.

ResearchBlogging
Adam uses the example of the ResearchBlogging platform, which enables scientists to communicate with each other and the public, using the simple, open blog as a medium. Scientists can easily tag posts so that they can be syndicated to appropriate outlets. It helps ensure that they are contributing to the scholarly record, and that their work and dialogue are being seen more widely than hitherto (journal clubs etc). He shows an example of a ResearchBlogging feed in the PLoS One site - conversations relating to published papers being immediately syndicated back to the source.

Value of blogs in tracking, funding and planning science research
Some researchers in the Netherlands have done the first bibliometric / webometric study of blogging in science, and have used ResearchBlogging as the subject. They found that blogs are more immediate than traditional academic discourse, and are more contextually relevant than academic literature. They focus on the implications of science. This kind of study enables us to understand where discussions around the web are focussed, to better understand the movements of science and direct policy-making and funding.


Friday, April 20, 2007

Framework for Improving Link Resolver Systems

In 2006, UKSG funded a research project to explore the industry context of link resolvers and the chain of serials delivery, in the hope of describing some of the issues and laying the groundwork for their future resolution. James Culling, Online Project Manager at Oxford University Press, presented the report on the research conducted by Scholarly Information Strategies. (NB – The SIS consulting company led by Simon Inger and Chris Beckett recently disbanded, although they are finishing the report on this project for UKSG.)

When a user conducts a search in a traditional A&I database, a federated search or Google Scholar, or clicks through a reference link via the CrossRef system, the essential metadata about the object is passed via an OpenURL to a link resolver system, such as those available from ExLibris, Serials Solutions, Innovative Interfaces, and Openly Informatics. At the heart of each of these systems is a core “Knowledge Base” which provides the context for the OpenURL, comparing it to issue availability data and library holdings information, and providing a variety of linking options to the content that was searched. The user is then directed to the copy of the content that is available through his/her institutional subscription.
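As a rough illustration of the flow described above, the sketch below encodes article metadata as an OpenURL 1.0 (Z39.88-2004) key/encoded-value query and checks it against a toy knowledge base. The resolver address, the ISSN, the knowledge base layout and the function names are all assumptions made for the example; commercial resolvers hold far richer data and apply their own matching rules.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical resolver address; each institution configures its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def build_openurl(issn, volume, spage, date, atitle=None):
    """Encode article metadata as an OpenURL 1.0 key/encoded-value query."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.spage": spage,
        "rft.date": date,
    }
    if atitle:
        params["rft.atitle"] = atitle
    return RESOLVER_BASE + "?" + urlencode(params)

# Toy knowledge base (invented data): ISSN -> coverage offered by each provider.
# "to": None means coverage runs to the present.
KNOWLEDGE_BASE = {
    "1234-5679": [
        {"provider": "Publisher site", "from": 1998, "to": None, "subscribed": True},
        {"provider": "Aggregator", "from": 1990, "to": 2004, "subscribed": False},
    ],
}

def resolve(openurl):
    """Return the providers that cover the requested year and are subscribed to."""
    query = parse_qs(urlparse(openurl).query)
    issn = query["rft.issn"][0]
    year = int(query["rft.date"][0][:4])
    targets = []
    for entry in KNOWLEDGE_BASE.get(issn, []):
        in_range = entry["from"] <= year and (entry["to"] is None or year <= entry["to"])
        if in_range and entry["subscribed"]:
            targets.append(entry["provider"])
    return targets
```

Calling `resolve(build_openurl("1234-5679", "12", "45", "2003"))` returns only the publisher site here, because the aggregator covers 2003 but is not part of this library's subscriptions. Errors in either the coverage dates or the subscription flags would send the user to the wrong place, which is the kind of data-quality problem the research examined.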

While these systems work extremely well in the vast majority of cases, they are not without significant inefficiencies and inaccuracies. Much of this is due to the complexity of the distributed supply chain of this information. Link resolver providers (each of which has its own data system, structure, and ingest methodology) receive information from publishers and subscription agents, who provide data on publication release, collections, locations, etc. Frequently, this transfer process requires normalization and quality control review, which add to the complexity and the opportunity for error. The library in turn needs to provide holdings and subscription information from its own library systems in order to customize the resolver to match its holdings.

Through a series of conversations with publishers, resolver-systems suppliers and librarians, the research has pointed to some issues and barriers that are inhibiting the deployment and use of these systems. Among the issues identified by SIS were: a lack of awareness that significant issues persist and a lack of cooperation in solving those issues; inaccurate or incomplete data; a lack of procedures for transferring titles; a lack of data format and transfer standards; and a communal responsibility for data quality. While OpenURL compliance is growing rapidly, there will need to be a broader understanding of the role of OpenURL and how it interacts with other necessary information transfers to facilitate the discovery and delivery of content.

Initial recommendations were suggested and may be explored by UKSG and the community. Much like Project COUNTER, a code of practice might be developed to address Knowledge Base compliance: which information is provided, and in what formats. Such a code of practice might certify compliance in the areas of format, delivery method, timing and OpenURL compliance among the key organizations in this process: publishers, subscription agents, and resolver suppliers. There might also be standards which could be developed or expanded to improve this information exchange, such as current work led by EDItEUR on ONIX for Serials Holdings, or possibly a NISO SUSHI equivalent for holdings information.

The final report will be provided to the UKSG Board during their May meeting and the report will likely be posted to the UKSG website sometime shortly thereafter. A summary article is also being prepared for the July issue of Serials. Hopefully, as many other similar UKSG research projects have done, this work will lead to significant outcomes that will improve information exchange.


Tuesday, April 17, 2007

Author Identification Project in the Netherlands

The key issue in author identification is not whether this author produced a particular work (although the problem of orphan works is a separate issue), but whether this author is the same author who produced works A, B and C. Disambiguation, particularly in cataloging, is a significant problem. Catalog information can have abbreviations or variant spellings, and can include or omit diacritics; authors might change their name, or go by nicknames or pseudonyms; and transliterating languages like Japanese, Chinese or Russian into Latin script can lead to spelling variations. One project aiming to address this situation is based in the Netherlands and consists of a partnership among 12 universities, SURF, UCI and OCLC Pica.

Daniel van Spanje at OCLC PICA presented a status update on the project. The Digital Author Identification (DAI) Project (note: the project sites are in Dutch) grew out of the Digital Academic REpositories (DARE), a Dutch initiative to stimulate the production of digital science online. The project’s goal is to uniquely identify all of the approximately 40,000 authors conducting research in the Netherlands. A successful pilot test at the University of Groningen in 2005-06 identified approximately 3,000 unique authors and researchers. The project was then rolled out to 13 additional institutions in 2006, with an expected completion date later this year. Using METIS, a registry of metadata on publications and researchers in the Netherlands, and the GGC, the Dutch national union catalog system, information was gathered on the authors in order to distinguish and de-duplicate them before assigning IDs. The project has created a central registry of names covering a wide range of identification information, such as variant names, nationality, language, date of birth, and publications. After the pilot, additional fields such as sex, organizational affiliation, titles and dates of employment were added to the file.
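To give a feel for the de-duplication problem, here is a minimal sketch of one common first step: reducing each catalog name to a coarse matching key so that variant forms land in the same group for review. It is an illustration only, not the method used by the DAI project; the function names and the "Surname, Given names" input format are assumptions for the example.

```python
import unicodedata
from collections import defaultdict

def name_key(name):
    """Reduce 'Surname, Given names' to a coarse key: (surname, first initial)."""
    # Decompose accented letters, then drop the combining marks (ü -> u).
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    surname, _, given = stripped.partition(",")
    # Lowercase, drop full stops and collapse runs of whitespace.
    surname = " ".join(surname.lower().replace(".", " ").split())
    given = given.strip().lower()
    initial = given[0] if given else ""
    return (surname, initial)

def group_candidates(names):
    """Group catalog names that share a key; each group still needs review."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)
```

With this key, "Müller, Jürgen" and "Muller, J." fall into the same group, while "Jansen, P." and "Janssen, P." stay apart. A shared key only marks records as candidates for the same person: two different researchers can easily share a surname and initial, so further evidence of the kind the registry holds (date of birth, affiliation, publications) is needed before an identifier is assigned.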

The project certainly has privacy ramifications, although according to the Dutch understanding of privacy regulations, the project is justifiable as a library/bibliographic resource. It is unclear whether the same methodology would be acceptable in other countries.

Other initiatives relate to this project at an international level: ISO Technical Committee 46 (Information and Documentation) began work on the International Standard Party Identifier (ISPI) in August 2006. ISPI is a new international identification system for the parties (persons and corporate bodies) involved in the creation and production of content entities. As envisioned, the work already done by the DAI project would be incorporated into a broader international standard if one is agreed upon. Certainly, there will be more work on this important issue and, hopefully, we can learn from the progress made by OCLC PICA, SURF and this DAI project.
