Tuesday, April 08, 2008

When HEFCE underspends: a £22 million JISC digitisation project

In 2004, a £10 million HEFCE underspend [crikey Moses!] resulted in a windfall for JISC: Jean Sykes recounts being told to "spend this in 2-3 years on large scale digitisation projects, please."

JISC reviewed a list of extant proposals for content digitisation, but considered it important to consult the community and bring new bids to the table. Six major projects were selected for Phase 1 - the largest digitisation activity in Europe - ranging from 18th century British parliamentary papers to British Library archival sound recordings. [Was it these chaps who had Charlotte Green in stitches last week?] The latter group set up a user panel to help decide which of the masses of recordings in the archive should be prioritised for digitisation.

Standards had to be agreed across all projects, and multimedia in particular presented a variety of obstacles. But from this, a JISC digitisation strategy is emerging. Lessons were learned:
  • user consultation (do it - and get some experts in)
  • procurement (technical and commercial issues)
  • metadata (metadata, metadata - importance cannot be overstated - build it in from the outset)
  • quality assurance and evaluation throughout the project
  • impact assessment (an increasingly big deal - projects now need to build in licences and metrics from the start)
  • project management - and capturing of lessons learned
  • interface accessibility
  • promotion of the finished service.
Phase 2 covers another 16 projects with a further £12m funding from JISC (big up those crazy HEFCE underspends!). Seven thousand reel to reels! Four thousand hours of recordings! Fifteen thousand Giles cartoons! Three thousand high quality Pre-Raphaelite images! Fifteen thousand theatrical objects! Half a million pages of Cabinet Papers! Over one million pages from national, regional and local newspapers! Five thousand university theses! Great War poetry and contextual archive material! [Apologies for all that terribly unliterary exclamation, but really, the breadth and scale of this stuff is staggering - did I already say Three cheers for HEFCE underspends?] Phase 1 and 2 projects will be free at the point of use to UK HE and FE, and some to schools and public libraries.

And now they're already preparing for Phase 3 (and here was I thinking Phase 3s are merely the product of an over-optimistic imagination). Work is underway to assess impact/usage of Phase 1 projects, which unfortunately did not have statistics built in from the outset so some qualitative indicators will need to be used. A gap analysis will be conducted to assess the community's needs, and the development of thematic portals will be investigated to make resources more comparable and usable (these could be extended to cover JISC collections, too). Future sustainability remains a big challenge - keeping digitised content accessible; migrating it to future formats and platforms; updating collections with new content. Ultimately, librarians may need to be prepared to subscribe to this content to ensure its preservation.


Mass digitisation of historical records for access and preservation - Dan Jones, Head of Business Development, National Archives

The National Archives hold government records on 175km of shelving, but increasingly they operate as a digital archive. For each physical document delivered, 100 are delivered online. Over 60 million documents are already available electronically and the National Archives are at the heart of Government policy on information.

Why digitise?
  • Create high quality digital surrogates of records - this helps keep the originals preserved
  • Maximise the access to documents by delivery over the internet
  • Use technology to add value through indexing, contextualisation, search etc.
  • Also use the web to add quality to site visits and to segment the stakeholders well
Only wholesale digitisation works for this model.

The National Archives are in competition with, or affected by, the likes of Apple, Google, broadband uptake, web 2.0 (wikis/blogs and the "wisdom of crowds") and the emergence of specialist providers (esp. genealogical and military historians in the National Archives' case). Thus users want everything now, everywhere, for free.

However, in reality the scale of the collection is vast: over 100 million catalogue entries. The real cost of digitising the whole collection would be around £5 billion. This shows the importance of strategic partnerships with the public and private sectors. The scale may be vast but we need to make attempts to begin this work.

Models of Digitisation
The National Archives exploits a "mixed economy" to develop these access services, with work being internally funded, commercially funded or grant funded.

Services can be free at the point of use or paid for, along the lines of agreed stakeholder segmentation. To address different needs, different solutions are needed:

  • Strategic partnerships - consistent, repeat, high volume demand
  • Internal delivery - more manageable resources - specific one off items e.g. the Domesday Book
  • Digital express - you can request digitisation on demand if you can find what you want. But the catalogue is often not at item level.

Strategic Partners - Awarding contracts
  • Avoid costly time consuming services contracts and tenders
  • Requirements are "output driven" rather than "activity driven" - so specifying the what rather than the how and allowing innovation and flexibility
  • Non exclusive - you may want to repurpose and segment
  • Encourage competition - to encourage innovation, quality and services to non-core stakeholders
  • Package collections - commercially attractive with less attractive, difficult with easy etc - to avoid cherry picking.
1911 Census
Scanning takes place on an enormous scale working with Scotland Online. It's over 0.5 petabytes of data. Also very commercially attractive so additional services have been built in: academic, schools, statistical analysis etc. will roll out sequentially as well as a service for home users. Launches 2009.

Dan outlined the advantages and disadvantages of strategic partnerships with commercial partners. Advantages: financial risk is on the commercial partner, maximum access, re-use of data in the knowledge economy, and many products can be developed at once. Disadvantages: potential loss of control, potential divergence of interests of the respective parties, the need to agree the agenda, a user journey that can be fragmented, and a lot of money and time invested up front to approve and develop processes.

Organisational Impact
This type of approach means a sea change in attitude and means enabling rather than providing services. You entrust the resources to 3rd parties to preserve. There can be a drain on training resource, supervision etc. And you don't spend fewer resources but you do apply those resources differently.

Cabinet Papers 1916-1976 is an internally delivered project, funded by JISC and delivered through Documents Online. It is a big project that will launch in 2009.

The National Archives are improving search substantially, to cross search all databases and present the results more intuitively. They are also using a wiki (Your Archives) to allow individuals and experts to exchange ideas and information, recognising the high expertise of users.

Future Challenges
  • NA will continue to digitise collections but also need to look to new markets, new technologies and new partners (e.g. maps are rich resource but not in high demand at the moment).
  • Provide expertise online
  • Continue to develop and apply customer insight tools - Facebook etc. change all the time and we must be able to develop all the time if we want to deliver services well.
  • Financial sustainability is key to all projects and programmes - the answer may lie in developing cost effective platforms for delivery but it's not a simple question by any means.

The impact of the Digital Archive on NA use has been immense. Over 80 million NA documents delivered digitally in 2007. The growth has been huge and continued. 81.5% of users are satisfied or very satisfied with online services (surveyed 2007). 95% of users are satisfied with our onsite experience. There is global reach and access is being maximised.


Maximising access to, and understanding of, major archives

Dan Jones owns the Domesday Book.

Well, not quite, but it is housed in the National Archives, where he works. The Domesday Book is just one of the 60 million documents available for immediate electronic download (cripes!). Their approach is driven by changing user behaviour (increasing web literacy and expectations) and the pervasiveness of high bandwidth broadband. But in digitising their archives they must contend with over 175km of shelving and over 10 million catalogue entries; Dan's "back of the fag packet" estimate of costs to digitise all this data is over £5 billion (double cripes!).

Models of digitisation
The Archives' digitisation activities are funded from internal budgets, commercial investment and grants. Segmentation of the target markets [and presumably funders' mandates?] informs decisions about which services are charged for, and which are free at the point of use.

Strategic partnerships
Content is digitised in different ways depending on demand: strategic partners are contracted for high-demand items, and digital assets, once created, are non-exclusive - i.e. available for repurposing within other services. One current project is the 1911 census. Five scanners are running round the clock to create 40,000 images per day; these are QAd and transcribed in the Philippines, enabling details of over 35 million individuals to be comprehensively searched. The data will lend itself to use by genealogists, academics and schools, and to statistical analysis. The strategic partnership through which the project is operated minimises the risk for the National Archives and allows them (as a facilitator) to simultaneously carry out other work. But there is a potential for the partners' interests to diverge, and the project's agenda has to be balanced to represent the interests of a broader stakeholder group.

Internal delivery
The JISC-funded project to digitise Cabinet Papers from 1916-75 is complex - the papers are handwritten, and don't lend themselves well to digitisation.

Providing context
Autonomy search has been deployed to provide an integrated search function across all databases and websites. Newer archives have been loaded into a Wiki-based resource which allows individual experts to contribute their ideas and information; "some of our users are far more expert in particular areas of these holdings than we are ourselves".

Future challenges
  • Georeferencing will allow map collections to be unlocked
  • organic growth of legacy systems means experts need considerable training to operate systems (so tools need to be developed)
  • customer tools also need to be constantly reconsidered
  • projects and programmes need to be financially sustainable.
