Tuesday, April 04, 2006

"Archiving should be done by librarians and archivists, period."

Gordon Tibbitts, Blackwell Publishing.
Quotation from Mellon Foundation, Sep 05: "Digital preservation represents one of the grand challenges facing higher education". Archiving is about preserving. Who should be doing it? What should be archived? What are the current solutions, where are they, how do they work? What critical success factors are there?

"Archiving should be done by librarians and archivists, period."
Publishers often think it should be them. They should assist (fund), but they aren't the best at it: they often can't find missing issues, whereas libraries have them. Publishers have other roles.

Practically: we have to decide what we want to archive. Appropriate content in 3 broad categories:
1. Scholarly journals and books
2. Research material supporting these works (e.g. pre- and post-prints, reviews, lecture materials, data)
3. An emerging type: content built from the discourse surrounding scholarly works e.g. blogs, LMS, lecture notes, social networks, conferences, podcasts, message boards, online "webinars". (Where does this type of content start and end?)

Ideally, we should agree to move these types of content from remote locations to a centralised location -- preserving requires a number of things to ensure the material is acceptably stored, and ready for future transitions. Archiving is not necessarily about access: the focus should be on preserving, with only simple access provided for restoration purposes.

Various national archives -- Dutch National Library, British Library Legal Deposit:
  • follow strict copyright requirements
  • allow scholars to have on site access
  • are only now developing the ability to provide catastrophic recovery of "lost" works
  • some have government funding, others are thinking about cost recovery mechanisms (which could produce a conflict of interest later -- putting a toll gate on the archive)
"Product solution" archives are provided by publishers (really just a big data store, not actually an archive); NFPs such as Portico; even government services such as PubMed Central (but what's their plan for content preservation long term? Are they not just about content delivery?)

A critical step for an archive should be that material is deposited but is not intended for delivery i.e. in but no out -- which disqualifies most publishers' content piles from archival status.

Institutional repositories constitute "roll your own" archives e.g. DSpace, LOCKSS, EPrints, Fedora. These mostly contain type 2 content (above); barely anything has been done to store type 3.

"Community-based" archives are emerging which may lead to a networked solution where disparate data stores act as archives linked together by catalogues/indexing solutions. Could be the way forward for type 3 content. CLOCKSS is an example of community archiving.

Critical success factors:
  • governance -- who's in control? Will governments censor? Is there an archivist/librarian running it? It's worrying to think what issues might enter into government policies and induce them to prohibit access to, or not store, certain content. It's equally worrying if any single entity has control over the archive.
  • economic stability -- how is it funded? by libraries or publishers? We shouldn't lose the chance to create a long term archive by focussing on access and thus antagonising people who make their living out of delivering content.
  • technical soundness. Is it really an archive? Are the standards open for scrutiny? Is the community involved at decision level?
  • community acceptance -- need to know it can be relied on before libraries will cease their own efforts.

Q: Anthony Watkinson. There is a taxonomy on the way as part of UK Legal Deposit. A number of publications have non-text components which can be essential to the message of the scholar. Do you know of any serious efforts to archive and preserve this additional content?
A: I include this in type 3. There are some protocols which have considered multimedia archiving. It is important to realise that storage is one thing, but interoperability with the rest of the scholarly community is key. How do you classify the metadata to provide access to this kind of content? Seems largely to be free text now. LOCKSS does a good job of storing multimedia and is used by about 150 institutions -- but perhaps it's not providing the interoperability that it should.

Q: Bill Russell, Emerald. Resource allocation: we don't know what the future holds. As a publisher, how should we prioritise our resource?
A: Many are supporting archiving solutions; generally lots of them, in the hope that one will stick. If we're talking time rather than money, archival considerations should be a component of everything we build -- which means investing more time in metadata standards.
Q: What if you're a smaller organisation which can't handle the additional requirements?
A: Major institutions & publishers have the time/energy/resource and should enable a mechanism for smaller publishers or lone scholars to get their content into a data store.

Q: Bob Boissy, Springer. Do you mean standards for archiving/preservation should be centrally managed, or that the hardware for the archive itself should be central? (Surely you'd prefer a massively redundant distributed server system?)
A: I mean that you need to bring things into an archive, you can't just leave things linked -- URLs go out of date. The infrastructure can nonetheless be distributed as LOCKSS is.
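To make the "bring it in, don't just link to it" point concrete, here is a minimal sketch of my own (not something shown in the talk). It assumes a hypothetical in-memory `Archive` class: the content is copied in at deposit time, a SHA-256 checksum is recorded so the copy can be checked later, and the original URL is kept purely as a provenance note. Every name here is illustrative, and a real archive would need durable, replicated storage rather than a Python dict.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArchivedItem:
    content: bytes       # the ingested copy itself, not a pointer to it
    sha256: str          # checksum recorded at deposit time
    source_url: str      # provenance only; never used to fetch the content
    deposited_at: str    # ISO 8601 timestamp (UTC)


class Archive:
    """Hypothetical in-memory store, keyed by content checksum."""

    def __init__(self) -> None:
        self._items: dict[str, ArchivedItem] = {}

    def deposit(self, content: bytes, source_url: str) -> str:
        """Copy the content in and return its checksum as the identifier."""
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._items:
            self._items[digest] = ArchivedItem(
                content=content,
                sha256=digest,
                source_url=source_url,
                deposited_at=datetime.now(timezone.utc).isoformat(),
            )
        return digest

    def verify(self, digest: str) -> bool:
        """Recompute the checksum to confirm the stored copy is intact."""
        item = self._items[digest]
        return hashlib.sha256(item.content).hexdigest() == item.sha256

    def restore(self, digest: str) -> bytes:
        """Simple access for restoration only; refuses a damaged copy."""
        if not self.verify(digest):
            raise ValueError("stored copy failed its checksum")
        return self._items[digest].content


if __name__ == "__main__":
    archive = Archive()
    item_id = archive.deposit(b"Article text ...", "http://example.org/article/1")
    # The URL above may go out of date; the archived copy does not depend on it.
    print(item_id, archive.verify(item_id))
```

Nothing in the sketch ever dereferences `source_url`, which is the point: once the material is deposited, the archive no longer cares whether the original link survives. Distribution (as with LOCKSS) would be a matter of running several such stores and comparing checksums between them.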
