Tuesday, April 04, 2006

Safe places

Erik Oltmans, National Library of the Netherlands (KB). "The International e-Depot"

KB policy background: e-journals dominate the field of academic literature. As Gordon said, who will take care of the long-term accessibility of international e-journals? In the print world, local libraries took care of their own country's output, but this model is no longer sufficient because it is harder to determine the place of origin of e-publications. If there's no obvious guardian, we risk losing the information.

We could ask publishers to deposit in every national library, but they are unlikely to do so. Instead we should spread the geopolitical risk and identify a small number of trustworthy partners -- collaboration/coordination required -- creating centres of expertise: the "Safe Places Network".

The "Safe Places Network" ensures systematic, coordinated preservation and gives libraries a place to retrieve lost content. Publishers need to deposit in a timely manner. A permanent commitment is required from the archiving organisation, with substantial investment and permanent R&D (into changing solutions) -- a continuous effort. KB is part of the "Safe Places Network".

Risks to regular access provision -- potential disruptions, e.g. a catastrophic event at the publisher's server; withdrawal of publications (commercially motivated); technological obsolescence -- always a key issue: inaccessible file formats.

An archiving agreement between publisher and safe place is critical to cover all eventualities. Should any trigger event occur that disrupts access for research libraries/end users, the archival library can deliver the content.

Mellon Statement (Sept 2005), endorsed by the Association of Research Libraries, sets out four essential actions:
1. Preservation is a way of managing risk
2. Qualified archives provide a minimal set of well-defined services -- storing files in non-proprietary formats [i.e. not PDF?]; restricting access to protect publisher's business interests, unless publisher cannot provide access; ensuring open means for auditing archival practices
3. Libraries must invest in a qualified archiving solution -- either their own, or an "insurance collective" (like the Safe Places Network)
4. Libraries must effectively demand archival deposit by publishers as a condition of licensing electronic journals

KB allows 1 MB of storage for each e-publication, i.e. 1 terabyte for 1 million publications. The project is ingesting anywhere from 5,000 to 50,000 publications per day. The system is designed to ingest e-journals, e-books and CD-ROMs. Authentic publications are archived in standard formats (PDF [how does this tie in with Mellon 2?], XML). Publications are validated on ingestion using checksums and JHOVE (which checks the integrity of PDF files); error-handling procedures kick in if necessary. Metadata is converted from proprietary DTDs/schemas.
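
A minimal Python sketch of the storage arithmetic and the checksum step at ingest. The choice of SHA-256, the function names and the (id, content, supplied checksum) batch format are all assumptions for illustration; the talk did not say which checksum algorithm KB uses, and the JHOVE format check is not modelled.

```python
import hashlib
from typing import Iterable, List, Tuple

MB_PER_PUBLICATION = 1  # storage allowance quoted in the talk


def storage_tb(publications: int, mb_each: int = MB_PER_PUBLICATION) -> float:
    """Storage needed in decimal terabytes (1 TB = 1,000,000 MB)."""
    return publications * mb_each / 1_000_000


def daily_growth_gb(publications_per_day: int, mb_each: int = MB_PER_PUBLICATION) -> float:
    """Daily storage growth in decimal gigabytes (1 GB = 1,000 MB)."""
    return publications_per_day * mb_each / 1_000


def sha256_hex(data: bytes) -> str:
    # ASSUMPTION: SHA-256 stands in for whatever checksum the publisher supplies.
    return hashlib.sha256(data).hexdigest()


def ingest_batch(batch: Iterable[Tuple[str, bytes, str]]) -> Tuple[List[str], List[str]]:
    """Split (id, content, supplied checksum) items into accepted and rejected ids.

    Rejected items would go to error-handling procedures (not modelled here).
    """
    accepted, rejected = [], []
    for ident, data, supplied in batch:
        if sha256_hex(data) == supplied.lower():
            accepted.append(ident)
        else:
            rejected.append(ident)
    return accepted, rejected
```

With these figures, storage_tb(1_000_000) gives 1.0, and the quoted ingest range of 5,000 to 50,000 publications a day works out to 5 to 50 GB of new storage per day.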

2 key strategies for digital preservation, both studied at KB:
  • migration -- files continually converted to newest format
  • emulation -- whereby future users experience original look and feel of document
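
Migration can be pictured as following a chain of converters until no newer format is known. A minimal Python sketch, assuming a hypothetical registry of format identifiers ("fmt/v1" and so on) and stub converters that only tag the bytes; it also assumes every intermediate version is kept alongside the original. Emulation, which preserves the original file and recreates its environment instead, is not sketched.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[bytes], bytes]


def stub_converter(target: str) -> Converter:
    """Stand-in for a real conversion tool: appends a marker naming the target format."""
    def convert(data: bytes) -> bytes:
        return data + b" -> " + target.encode()
    return convert


# Hypothetical registry: format -> (successor format, converter to it).
MIGRATIONS: Dict[str, Tuple[str, Converter]] = {
    "fmt/v1": ("fmt/v2", stub_converter("fmt/v2")),
    "fmt/v2": ("fmt/v3", stub_converter("fmt/v3")),
}


def migrate(data: bytes, fmt: str,
            registry: Dict[str, Tuple[str, Converter]] = MIGRATIONS) -> List[Tuple[str, bytes]]:
    """Convert step by step to the newest known format, keeping every version."""
    versions = [(fmt, data)]
    seen = {fmt}
    while fmt in registry:
        next_fmt, convert = registry[fmt]
        if next_fmt in seen:
            raise ValueError(f"migration cycle at {next_fmt}")
        data = convert(data)
        fmt = next_fmt
        seen.add(fmt)
        versions.append((fmt, data))
    return versions
```

Keeping the whole chain, rather than overwriting the file at each step, leaves the authentic original available should a conversion later turn out to have lost something.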

e-Depot does not compete with publisher-provided access: access is on site for KB visitors, or via ILL within the Netherlands (if the content is only available within KB). Remote access can be enabled if the publisher permits it (as some OA publishers do). Retrieval, access, printing and downloading are only allowed for private use; systematic reproduction is prohibited, and user behaviour is monitored in real time to prevent abuse. Usage is therefore currently limited, but as yet no "trigger events" have occurred that would require broader access.

  • growing volume of international e-journals without "natural fatherland"
  • must be preserved by institutions that take responsibility -- systematically and in a coordinated way, by means of the Safe Places Network
  • Mellon Statement defines essential key actions; in line with KB policy
  • KB has made a long-term commitment to be part of the insurance collective
  • new publishers welcome
  • seeking international collaboration

Q: Greg Kerrison, Qinetiq. Concerns about confusion between preservation and access. If you do have a trigger event enabling broader access, a choice will then need to be made between access and preservation. Access always wins; preservation will suffer. Isn't it better to have a plan to provide that access via an intermediary, enabling separation of preservation site from access site?
A: yes. If a trigger event occurs, we would not want to provide that access on our own -- we'd want to involve third-party software vendors to provide it. Storage and preservation are our daily focus. We don't have special user interfaces; only in exceptional circumstances will we need to provide much access.

Q: Charles Oppenheim, Univ. Loughborough. Are publishers contributing money to this project? It does seem to be an insurance policy for them.
A: not right now -- the business model is simple: there are no financial transactions. This may change, and we are negotiating with larger publishers to find an appropriate model.

Q: Robert Kiley, Wellcome Trust. How do you ensure the integrity of your database? At PMC, public access to the content exposes any missing data. How do you know you've got a complete archive if no-one is accessing it?
A: technically, via the checksum procedures at submission -- this makes sure that everything supplied is loaded. If a publisher fails to supply something, we will need to compare our data pile to theirs. We may begin applying checksums to existing data to ensure it is still OK.
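
The answer describes two separate checks: fixity (re-checksumming data already held) and completeness (comparing the archive's holdings with the publisher's). A minimal Python sketch of both, assuming the checksums recorded at ingest are kept in a manifest keyed by a hypothetical publication id; SHA-256 and all names are assumptions.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fixity_audit(store: Dict[str, bytes],
                 manifest: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Re-checksum stored files against the checksums recorded at ingest.

    Returns (ids whose content no longer matches, ids missing from the store).
    """
    corrupted, missing = [], []
    for ident, expected in sorted(manifest.items()):
        data = store.get(ident)
        if data is None:
            missing.append(ident)
        elif sha256_hex(data) != expected:
            corrupted.append(ident)
    return corrupted, missing


def missing_from_archive(publisher_ids: Iterable[str],
                         archive_ids: Iterable[str]) -> List[str]:
    """Items the publisher lists but the archive never received."""
    return sorted(set(publisher_ids) - set(archive_ids))
```

The first function only catches damage to what was ingested; gaps in what was supplied in the first place show up only through the second comparison, which is why both checks are mentioned.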

Q: Ahmed Hindawi. If no-one is using your dark archive, how can you know if there's a problem with the data due to a software bug or similar? Exposing your archive is a way to get your content checked.
A: our administrative procedures ensure we know the versions of the files we hold, and we have a preservation manager tool which enables us to couple file versions with technology and ensure that the data is delivered through the right software to avoid versioning bugs. Our migration & emulation studies are also helping us to find appropriate solutions.
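
The preservation manager tool was only described in outline. One possible shape for "coupling file versions with technology" is a registry from (format, version) to a rendering environment, plus a check that flags holdings with no known renderer. The renderer names and the registry are invented for illustration (PDF 1.3 and 1.4 are real format versions), and nothing here reflects KB's actual tool.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical registry: (format, version) -> software environment that renders it correctly.
RENDERERS: Dict[Tuple[str, str], str] = {
    ("PDF", "1.3"): "viewer-A 5.x",
    ("PDF", "1.4"): "viewer-A 6.x",
    ("XML", "1.0"): "xml-renderer 2.x",
}


def renderer_for(fmt: str, version: str,
                 registry: Dict[Tuple[str, str], str] = RENDERERS) -> Optional[str]:
    """Look up the environment recorded for a format version, or None if unknown."""
    return registry.get((fmt, version))


def at_risk(holdings: Iterable[Tuple[str, str, str]],
            registry: Dict[Tuple[str, str], str] = RENDERERS) -> List[str]:
    """Return ids of (id, format, version) holdings with no known rendering environment."""
    return [ident for ident, fmt, version in holdings
            if (fmt, version) not in registry]
```

Holdings flagged by at_risk are the natural candidates for the migration or emulation work mentioned in the answer.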

Q: Greg Kerrison, Qinetiq. Would it be a good idea to convert your PDFs to PDF/A (the archival version of PDF) when that's realised?
A: it's an option, but we are concerned that we should not lose functionality within our PDFs.

Q: Gordon Tibbitts, Blackwell Publishing. The archive should not be considered the *primary* source for future delivery, and we need to focus on preservation; we shouldn't keep getting lost in access-related discussions.

