Thursday, April 20, 2006

Where to view the presentations – and another UKSG blogger!

As Bev has pointed out in a handy comment below, all presentations from this year's UKSG conference are available from the UKSG website. I'm also delighted to announce that NASIG President Mary Page, who attended and spoke at the conference, also found time to blog some session reports and commentary – do take a look at http://www.nasig.org/uksgblog/.

Thursday, April 13, 2006

"DSS feeds or whatever ... it all seems a bit complicated"

This is a direct quotation from my mother; I was trying to introduce her to RSS feeds so she would be able to keep up with various family members' blogs. My early explanations were evidently not simple enough, so in the end I put together the below as a dummy's mummy's guide to getting set up with RSS feeds.

RSS – an acronym often expanded as "really simple syndication" – is a way of being notified when a website you're interested in is updated, to save you having to keep going to check it. To use it, you need a feed reader. This is a bit of software which is either desktop or web-based. You can tell it which websites you're interested in, and it will periodically check and retrieve updates from those websites.
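(For the more technically curious: this is roughly what a feed reader is doing behind the scenes. The sketch below uses Python and the third-party feedparser library; the feed URL is just an example, and a real reader would remember which entries it had already seen between checks.)

```python
import feedparser  # third-party library: pip install feedparser

FEED_URL = "http://liveserials.blogspot.com/atom.xml"  # illustrative feed URL
seen = set()  # a real reader would save this to disk between checks

def check_for_updates():
    """Fetch the feed and print any entries we haven't seen before."""
    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        key = entry.get("id") or entry.get("link")
        if key not in seen:
            seen.add(key)
            print(entry.get("title"), "-", entry.get("link"))

if __name__ == "__main__":
    check_for_updates()  # a reader calls this periodically, e.g. hourly
```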

The one I use is a web-based one so I can check it from various computers. It is called Bloglines1 (http://www.bloglines.com), and having tried a few different readers, I have found it the simplest and quickest to configure, with sufficient (and appropriate) functionality to meet my needs. To set it up:

1. Register: http://www.bloglines.com/register/
This gives you an account to set up with all the feeds you want to read, and you can sign up for some initial feeds on the page you get when you click on the link in the confirmation email (e.g. you could choose "Music lover" from the list on the left, and then sign up for the BBC Music News feed by clicking that box. Or you could choose to be sent Dictionary.com's Word of the Day from the list on the right.)

2. Download the Notifier: http://www.bloglines.com/about/notifier
Choose the appropriate one for your operating system (e.g. Windows, Mac) – this puts a little icon in your system tray (bottom right area of your screen) which will tell you when there are new items in your chosen feeds. The notifier download can also be accessed via the Extras section in My Feeds (Download Notifier).

3. Add some feeds!
On an ongoing basis, the easiest way (I find) to add a feed to your Bloglines account is to browse to the page from which you want to receive a feed, and press a "subscribe to this feed (or blog)" button. You can get the "easy subscribe" code as a bookmark from the "My Feeds" section extras – see "Easy Subscribe Bookmarklet". Once you've got it in your links toolbar, you simply click on it to locate and subscribe to the RSS feed for a web page – try it from this page to see what I mean!

Web pages usually display a small orange RSS or XML symbol to indicate that an RSS feed is available, but you can try clicking your "subscribe" button on any page; if the site doesn't have RSS, you will just get a message saying that Bloglines couldn't find an RSS feed for that page. If a feed is available, it might be offered in multiple formats, but don't worry: Bloglines can cope with any of them, so it doesn't really matter which one you choose. You can subdivide your feeds into folders to help keep on top of them.
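(And for the curious again: the "subscribe" button works by looking in a page's HTML for link tags that advertise a feed. A rough sketch, using only the Python standard library and an illustrative page URL:)

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class FeedLinkFinder(HTMLParser):
    """Collect <link rel="alternate" type="...xml"> hrefs, i.e. advertised feeds."""
    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "alternate" in (a.get("rel") or "") and "xml" in (a.get("type") or ""):
            self.feeds.append(a.get("href"))

page_url = "http://liveserials.blogspot.com/"  # illustrative page
html = urlopen(page_url).read().decode("utf-8", errors="ignore")
finder = FeedLinkFinder()
finder.feed(html)
print(finder.feeds)  # the feed URL(s) the page advertises, if any
```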

You can also set up search feeds that will search all Bloglines-known blogs/sites for a specific term and alert you when it is mentioned; I do this to find out what people are saying about Ingenta. You need to search for your chosen keyword(s) in the box on the top right, and then click "Subscribe to this search" on the results page.

1 As the good old BBC would put it, "other feed readers are available": many are as good and perhaps better in some respects than Bloglines. There are useful summaries/analyses of the most popular RSS readers at SearchEngineWatch. If you're brave, there's a pretty comprehensive, and helpfully annotated, list of available readers at http://allrss.com/rssreaders.html. ALPSP members who want to find out more should check out ALPSP's advice note 31 on RSS feeds.


N.B. This post syndicated from Ingenta's All My Eye blog with kind permission :)

Wednesday, April 05, 2006

It's all over now

All good things must come to an end; as ever we must thank Karen Sadler and Alison Whitehorn for their fabulous organisation and support throughout the annual conference. (Ladies, I hope the envelopes you were presented with contained more than a few free biros snaffled from exhibitors' stands). We'll be back in Warwick next year, so see you all there.

For LiveSerials, it's very much not the end though -- if I can make sense of my remaining notes I'll add a few more session reports, but more than that: we shall continue to use the blog as a home for UKSG announcements and information throughout the year. Make sure you have signed up for our RSS feed if you want to keep up to date with news from the serials industry.

Over and out from Warwick.

And they call it puppy love ...

Stephen Abram was due to present on the "uneasy relationship" between libraries and Google; instead, Peter stays up on the podium to tell us about "Puppy love versus reality", debunking our infatuation1 with Google Scholar.

Peter likes Carole Goble's project, but is less sure about Carole's fondness for Google.

Google Scholar is excellent for undergraduates who need the odd article but not for real scholars -- should be called Google Student!

The reality of Google Scholar is:
  • Secrecy -- about sources, journals, time span, size -- everything
  • Huge gaps in collections crawled -- Google Scholar finds far fewer results than native search engines [I wonder if this is related to how Google displays result statistics; numbers given in Google results do vary and I've heard an explanation for this which I now can't recall ...]
  • Crawling not allowed by e.g. Elsevier
"Information professionals beware of your reputation"

Cited-by numbers are unreliable -- following links shows that the "citing" articles do *not* cite the "cited" articles. And it displays a number but only shows the first few. Google matches cited/citing references "like a senile neighbour", e.g. confusing zip codes or page numbers for publication years (i.e., it's machine-reading citations using relatively crude algorithms, and results aren't eyeballed for accuracy). It also contains links to e.g. journal subscription rates pages as "scholarly documents". This is a major concern because people use its flawed data (e.g. numbers of results or citations) in analyses and debates; it's particularly disturbing given that there is talk of Google Scholar's citation figures being used for e.g. promotion, tenure and funding decisions.

1 Can an infatuation be debunked, or only a myth?

"Survival of the fittest"

Péter Jacsó1, University of Hawaii -- The Endangered Database Species: Are the traditional *commercial* indexing/abstracting & full-text databases dead?

In a word -- no! Commercial A&I/F-T d/bs are not extinct ... yet.
Depends on habitat.

Indexing only d/bs -- near extinction
Most abstracting d/bs - endangered
Some full-text d/bs - vulnerable

Internal reasons for endangered/vulnerable status --

  • Stagnation e.g. British Education Index on Dialog: its focus is the British education system, but there are more relevant records in ERIC
  • Deflation e.g. Mental Health Abstracts -- journal source has been decimated over the last few years, and it became a waste of money to search it as more content was freely available in PubMed
  • Staleness e.g. GeoArchive -- only 2 updates in 2005. Geobase and GeoRef updated twice a month -- search reveals an order of magnitude more records in the latter databases ... size *does* matter
  • Sloppy production e.g. MHA, Information Science Abstracts: no updates in the last year. EBSCO has built the LISTA database, in which nearly all records have abstracts.
  • Flab vs muscle e.g. SportDiscus (prior to EBSCO acquisition) -- too much duplication, seemed big but flabby! And users are paying to access each record. Even Google could bypass the quality provided.
  • Self-destruction e.g. e-psyche. Promised us champagne; didn't even deliver beer. Despite backing of well-known industry veterans.
Peter's forensic evidence (screenshots of search results) available via a link within his presentation (which I'll link to at the UKSG site in due course).

External reasons for endangered/vulnerable status:
Open Access -- 100s of millions of OA indexing records; 10s of millions of OA abstract records (e.g. Medline); millions of free OA full-text documents. A threat to full-text databases on Dialog "which are as they were in 1976 ... and they are still pretty expensive".

A&I publishers are caught in the triple whammy of commercial competitors + government competitors + smart individuals (who are federated-searching OA databases and presenting results for no charge)
-- this is driving enhancements of competitive content
-- innovative hosting platforms e.g. CSA
-- appealing interfaces -- very important (students are spoiled by Ask, Yahoo etc)

An additional problem is self-delusion -- denial and PR-illusion by commercial companies. Databases relaunching themselves -- developments are "Emperor's New Clothes" (again, see links within presentation for examples).

Market is no longer willing to pay for access to A&I databases when the abstracts are freely available from publisher sites and "digital facilitators" -- metasearch engines can freely use the data.

Scirus & Google Scholar "get in the ring"
--Peter disputes Scirus' claim to contain solely scientific data
--Google is "deified" and its citation counts are "very off-base"

Many government databases have smarter software and even offer full text e.g. PLoS, TRIS Online, PubMedCentral, Agricola, NCJRS (which has the best phonetic searching -- Peter tells us he tested 18 misspellings of metaamphetamine [which may or may not be the correct spelling!])

Functionality has moved on in good ways -- no longer just links to the publisher site or the author's email -- now offering links to the references, to lists of other articles citing the current article (which demonstrates value of article)

HighWire's link menus for each article are state of the art e.g. links to ISI's "cited by" records (if the publisher has paid for the service)

Comparison to Dialog's "skeletal" database record; to EBSCO's (which has no abstract); to CSA's LISA (which has no onward links); "Haworth Press are 15 years behind state of the art" -- the cited references are "darn cold" and cannot be clicked on to link to the cited articles.

Users will look on the left and at the top of the screen -- that's where e.g. full text links from abstract records need to be.

Full text is the future; "survival of the fittest" -- those who don't adapt will not survive.

1 I had the pleasure of dining with Péter at the conference dinner on Monday, and am therefore able to advise anyone who wasn't sure that it's pronounced Yotcho (and it's Hungarian).

RAE: top dogs don't slice salami

Jonathan Adams, Evidence Ltd. Research Assessment and UK publication patterns.

The UK's Research Assessment Exercise is a research evaluation cycle which considers output, training, funding and strategy across HE institutions. Peer review panels. Forthcoming changes following recent government budget statement -- shift to metrics post 2008.

The RAE has evidently led to an increase in the UK share of world citations; if citations are a measure of research importance, then UK research is now much improved since the early 80s and has this year overtaken the US in biology and health sciences. RAE is a major driver of research activity in universities. It assesses 4 items per researcher -- including books, articles, proceedings, other works.

Analysis of publications submitted to RAE in 1996 and 2001 -- using publication data to assess comparability between subject areas. Evidence shows that journal articles are proportionally the highest output in science, conference proceedings in engineering, book chapters and monographs more common in humanities, and other content (videos, installations etc.) in the arts.

Researchers submit material for assessment which represents their highest quality work; the assessment will affect the amount of funding received, and departmental prestige.

A shift towards journals is evident -- more journal articles were submitted for assessment in 2001 than in 1996 but in comparing these to ISI's content (Web of Science) it is evident that in some subjects, ISI's coverage is comparatively decreasing -- suggesting Web of Science may be less representative of research in some areas than others (e.g. social sciences less well represented).

Changing cultures -- social science researchers do use bibliometric data to evaluate research quality, but do so in an expert way; journals will become increasingly significant.

(3 days at UKSG caught up with me at this point and the notes I made for the remainder of the presentation make so little sense that they would detract from Jonathan's presentation -- so I have quit while I'm still vaguely ahead!)

Q: Greg Kerrison, Qinetiq. How has the RAE influenced the research process?
A: better overall performance; higher level of productivity. In terms of the way people publish, no evidence that we have increased significantly in comparison to other G8 countries. Suggestions of salami slicing don't seem to be justified; it may be that some (less high profile) researchers are focussing on shorter term goals (in order to have adequate content to submit for the next RAE), but the best researchers are not swayed in that direction.

Incentives, incentives, incentives ...

The transition to electronic-only format: cost and considerations -- Roger Schonfeld, Ithaka

Only with an examination of *incentives* can we find a viable path forward

Based on studies in 2003 (11 libraries), 2004 (publishers):

Publishers
Larger publishers (including NFPs, university presses) have already flipped business models from principally print to electronic-with-print-as-add-on -- pricing has evolved to mitigate the effects of print cancellations on the bottom line (site licenses/tiered pricing).
Larger publishers have significant resources to invest in making the transition, and considerable in-house expertise on which to draw.

Smaller commercial publishers, scholarly societies and university presses -- in a few cases, journals are not yet available electronically; where e-versions do exist, costs have not always been separately tracked, which makes it hard to develop pricing outside of the print model. Focus is more likely to be on humanities/social sciences, which may be responsible for a perceived lack of urgency for going electronic/developing new business model. As may, for example, high dependence on advertising, or high image content (e.g. art history or biology). If there were a dramatic move away from the print format, what would their future be?

Libraries
Costs include not only subscription but selection process, cataloguing, storing etc -- these costs are all lower in electronic format in comparison to print, which is a non-trivial incentive to move away from print and to de-duplicate multiple-format collecting. E formats have provided opportunity to increase size of journal collections.

Economies of scale
For libraries, economies of scale exist primarily for print, not electronic; as print journals are transitioned to electronic, unit costs go up dramatically. Thus the decline of print subscriptions *raises* non-subscription costs substantially at large libraries (which would seem counter-intuitive). "As print collections shrink, will libraries be motivated to move away from print all together?" -- at the very least, there does seem to be an incentive to redesign library processes to try to recapture some of the costs.

Non-e-journals
It seems inevitable that all scholarly journals will have an e-version before long (this is not necessarily the case for books). Several different models could be used to help the transition e.g. collaborations such as BioOne; outsourcing to commercial publishers... each option has its own tradeoffs.

In some cases, could there be no sustainable way to publish an e format? -- some journals may end up replaced by disciplinary/institutional repositories, blogs, and other less formal distribution models.

A business model which is entirely reliant on print today, but is intent on flipping to e format, may result in significant price increases. Libraries should employ programs to consider percentage cost increase and "respond with empathy, else they may unintentionally punish lower-price publishers".

Does OA have a disproportionate effect on lower-price publishers who (a) haven't made the transition to OA and (b) haven't even made the transition to electronic? This additional pressure on smaller publishers has not really been given much air time in the great OA debate.

Library process
The move away from print is inevitable, "whether or not it is managed strategically". A 'strategic format review', whereby a target for journal cancellations is planned over a timeframe, offers an opportunity for a tactical retreat from print and can permit effective cost savings. This is nonetheless politically complicated.

Archiving
Collecting in an e-only environment means libraries don't *own* their acquisitions in the same way, which can complicate archiving when one ceases to collect print. Which types of e-archiving processes are appropriate -- are any ready for comfortable dependence? Efforts include Portico, LOCKSS, British Library legal deposit for e-journals, Dutch National Archive.

Following the transition to electronic formats, is the cost of print holdings justifiable?

What incentives can be developed to ensure the survival of "appropriate print artefacts"? e.g. libraries paying one another to continue holding print.

Conclusions
1. We (the entire serials community) should consider with greater care how traditional society and university press publishers will make a transition to an e-only environment
2. A strategic format review has significant advantages over a chaotic transition
3. Archiving must not be forgotten, for both electronic and legacy print collections.

Q: Diana Leitch, Univ. Manchester. What about the users? We've gone a long way down the e-road, and the demand across all subject areas for e-content is high -- there's a lack of realisation/understanding that some content is not yet electronic.
A: It's clear that there's a growing acceptance of the electronic format, certainly in the sciences and growing elsewhere. Faculty members may not use the bricks-and-mortar library at all, whilst still making regular use of its services; they increasingly suspect that they will cease to depend on libraries, which translates to less economic demand for libraries (there is a lack of understanding about what libraries do). Libraries need to be making a case for themselves in a way that hasn't been necessary previously.

Unbelievably Knackered ... Still Going

It's Wednesday morning and today's expansion of the UKSG acronym is

Unbelievably Knackered ... Still Going.

Masses of fun was had at last night's dinner and quiz as usual. Congratulations to the winning team, whose witty team name escapes me but whose members included Loughborough's Charles Oppenheim, OUP's Richard Gedye (who apparently provided nearly all the answers), IOP's Judith Barnsby and about nineteen others (sorry, but I was already a tad tipsy and am unable to recall the rest of you -- please feel free to identify yourselves and claim your share of the glory). I hope we'll be able to post/link to a snap of the winners in due course (any offers?).

The competitive spirit continued on to the dance floor where shapes aplenty were thrown well into the night. (Top moment: Stevie Wonder's Superstition; lowest moment: two consecutive Shania Twain tracks). I called it a night at a relatively sensible 1.30am. M'learned foolish colleagues carried on partying until 4, although whether 8 people sharing 1 bottle of wine in a kitchen-cum-laundry constitutes a party is debatable...

Tuesday, April 04, 2006

It's snowing!

Safe places

Erik Oltmans, National Library of the Netherlands (KB). "The International e-Depot"

KB policy background. E-journals dominate the field of academic literature; as Gordon said, who will take care of the long-term accessibility of international e-journals? In the print world, local libraries took care of their own country's output, but this model is no longer sufficient (it is harder to determine the place of origin for e-publications). If there's no obvious guardian, we risk losing the information.

We could ask publishers to deposit in every national library, but they are unlikely to do so. We should spread the geopolitical risk and identify a small number of trustworthy partners -- collaboration/coordination required -- creating centres of expertise, "Safe Places Network".

"Safe Places Network" ensures systematic, coordinated preservation. Gives libraries a place to get lost content. Publishers need to deposit in a timely manner. Permanent commitment required from archive organisation, requiring substantial investment, permanent R&D (into changing solutions) -- continuous effort. KB is a part of "Safe Places Network".

Risks to regular access provision -- potential disruptions e.g. a catastrophic event at the publisher's server; withdrawal of publications (commercially motivated); technological obsolescence -- always a key issue: inaccessible file formats.

Archiving agreement between publisher/safe place is critical to cover all eventualities. Should any trigger event occur to disrupt access to research libraries/end users, the archival library can deliver the content.

Mellon Statements (sep 05) endorsed by Association of Research Libraries. 4 essential key actions:
1. Preservation is a way of managing risk
2. Qualified archives provide a minimal set of well-defined services -- storing files in non-proprietary formats [i.e. not PDF?]; restricting access to protect publisher's business interests, unless publisher cannot provide access; ensuring open means for auditing archival practices
3. Libraries must invest in qualified archiving solution -- either its own, or an "insurance collective" (like the Safe Places Network)
4. Libraries must effectively demand archival deposit by publishers as a condition of licensing electronic journals

KB allows 1 MB of storage for each e-publication -- 1 terabyte for 1 million publications. The project is ingesting anywhere from 5,000 to 50,000 publications per day. The system is designed to ingest e-journals, e-books and CD-ROMs. Authentic publications are archived in standard formats (PDF [how does this tie in with Mellon action 2?] and XML). Publications are validated on ingestion by checksums and JHOVE (which checks the integrity of PDF files) -- procedures for error handling kick in if necessary. Metadata is converted from the publisher's proprietary DTD/schema.
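(For the technically curious, the checksum validation described might look something like the minimal sketch below -- the manifest format, file name and checksum are invented for illustration, and KB's actual ingestion procedures are doubtless more elaborate.)

```python
# Minimal sketch of checksum validation on ingest: the publisher supplies a
# manifest of expected checksums, and the archive verifies each file before
# accepting it. The manifest format and file name are invented for illustration.
import hashlib

def md5_of(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def validate_submission(manifest):
    """manifest maps file path -> expected MD5; returns the files that fail."""
    return [path for path, expected in manifest.items() if md5_of(path) != expected]

# Example call (illustrative file name and checksum):
# bad = validate_submission({"article123.pdf": "9e107d9d372bb6826bd81d3542a419d6"})
# if bad: hand off to the error-handling procedures
```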

2 key strategies for digital preservation, both studied at KB:
  • migration -- files continually converted to newest format
  • emulation -- whereby future users experience original look and feel of document

e-Depot does not compete with publisher-provided access; access is on site for KB visitors or via ILL within the Netherlands (if content is only available within KB). Remote access can be enabled if permitted by the publisher (as some OA publishers do). Retrieval, access, printing and downloading are only allowed for private use; systematic reproduction prohibited; real-time monitoring of user behaviour to prevent abuse. Thus usage is currently limited but as yet no "trigger events" experienced to require broader access.

Conclusion
  • growing volume of international e-journals without "natural fatherland"
  • must be preserved by institutions that take responsibility -- systematic and coordinated by means of the Safe Places Network
  • Mellon Statement defines essential key actions; in line with KB policy
  • KB has made long-term commitment to be part of the insurance collective
  • new publishers welcomed
  • seeking international collaboration
Q: Greg Kerrison, Qinetiq. Concerns about confusion between preservation and access. If you do have a trigger event enabling broader access, a choice will then need to be made between access and preservation. Access always wins; preservation will suffer. Isn't it better to have a plan to provide that access via an intermediary, enabling separation of preservation site from access site?
A: yes, if a trigger event occurs, we would not want to enable access alone -- we'd want to involve third party software vendors to provide that access. Storage and preservation is our daily focus. We don't have special user interfaces; only in exceptional circumstances will we need to provide much access.

Q: Charles Oppenheim, Univ. Loughborough. Are publishers contributing money to this project? It does seem to be an insurance policy for them.
A: not right now -- business model is easy -- no financial transactions. This may change and we are negotiating with larger publishers to find an appropriate model.

Q: Robert Kiley, Wellcome Trust. How do you ensure the integrity of your database? At PMC, public access to the content exposes any missing data. How do you know you've got a complete archive if no-one is accessing it?
A: technically, via the checksum procedures at submission -- this makes sure that everything supplied is loaded. If a publisher fails to supply something, we will need to compare our data pile to theirs. We may begin applying checksums to existing data to ensure it is still OK.

Q: Ahmed Hindawi. If no-one is using your dark archive, how can you know if there's a problem with the data due to a software bug or similar? Exposing your archive is a way to get your content checked.
A: our administrative procedures ensure we know the versions of the files we hold, and we have a preservation manager tool which enables us to couple file versions with technology and ensure that the data is delivered through the right software to avoid versioning bugs. Our migration & emulation studies are also helping us to find appropriate solutions.

Q: Greg Kerrison, Qinetiq. Would it be a good idea to convert your PDFs to PDF/A (the archival version) when that's realised?
A: it's an option, but we are concerned that we should not lose functionality within our PDFs.

Q: Gordon Tibbitts, Blackwell Publishing. The archive should not be considered to be the *primary* source for future delivery, and we need to focus on preservation - shouldn't keep getting lost in access-related discussions.

"Archiving should be done by librarians and archivists, period."

Gordon Tibbitts, Blackwell Publishing.
Quotation from Mellon Foundation, Sep 05 "Digital preservation represents one of the grand challenges facing higher education". Archiving is about preserving. Who should be doing it? What should be archived? What are the current solutions, where are they, how do they work? What critical success factors are there?

Who?
"Archiving should be done by librarians and archivists, period."
Publishers often think it should be them -- they should assist (fund), but they aren't the best at it -- they often can't find missing issues ... libraries have them. Publishers have other roles.

What?
Practically: we have to decide what we want to archive. Appropriate content in 3 broad categories:
1. Scholarly journals and books
2. Research material supporting these works (e.g. pre and post-prints, reviews, lecture materials, data)
3. An emerging type: content built from the discourse surrounding scholarly works e.g. blogs, LMS, lecture notes, social networks, conferences, podcasts, message boards, online "webinars". (Where does this type of content start and end?)

Ideally, we should agree to move these types of content from remote locations to a centralised location -- preserving requires a number of things to ensure the material is acceptably stored, and ready for future transitions. Archiving is not necessarily about access, and the focus should be on preserving; only providing non-complex access for restoration purposes.

Various national archives -- Dutch National Library, British Library Legal Deposit
  • follow strict copyright requirements
  • allow scholars to have on site access
  • only in the process of evolving ability to provide catastrophic recovery of "lost" works
  • some have govt funding, others thinking about cost recovery mechanisms (which could produce conflict of interest later -- putting a toll gate on the archive)
"Product solution" archives are provided by publishers (really just a big data store, not actually an archive); NFPs such as Portico; even governments such as PubMedCentral (but what's their plan for content preservation long term? Are they not just about content delivery?)

A critical step for an archive should be that material is deposited but is not intended for delivery i.e. in but no out -- which disqualifies most publishers' content piles from archival status.

Institutional repositories constitute "roll your own" archives e.g. DSpace, LOCKSS, EPrints, Fedora. These mostly contain type 2 content (above); barely anything has been done to store type 3.

"Community-based" archives are emerging which may lead to a networked solution where disparate data stores act as archives linked together by catalogues/indexing solutions. Could be the way forward for type 3 content. CLOCKSS is an example of community archiving.

Critical success factors:
  • governance -- who's in control? Will governments censor? Is there an archivist/librarian running it? It's worrying to think what issues might enter into government policies, inducing them to prohibit access to, or not store, certain content; and it's worrying if any single entity has control over the archive.
  • economic stability -- how is it funded? by libraries or publishers? We shouldn't lose the chance to create a long term archive by focussing on access and thus antagonising people who make their living out of delivering content.
  • technical soundness. Is it really an archive? Are the standards open for scrutiny? Is the community involved at decision level?
  • community acceptance -- need to know it can be relied on before libraries will cease their own efforts.

Q: Anthony Watkinson. There is a taxonomy on the way as part of UK Legal Deposit. A number of publications have non-text components which can be essential to the message of the scholar. Do you know of any serious efforts to archive and preserve this additional content?
A: I include this in type 3. There are some protocols which have considered multimedia archiving. It is important to realise that storage is one thing, but interoperability with the rest of the scholarly community is key. How do you classify the metadata to provide access to this kind of content? Seems largely to be free text now. LOCKSS does a good job of storing multimedia and is used by about 150 institutions -- but perhaps it's not providing the interoperability that it should.

Q: Bill Russell, Emerald. Resource allocation: we don't know what the future holds. As a publisher, how should we prioritise our resource?
A: many are supporting archiving solutions; generally lots of them, in the hope that one will stick. If we're talking time rather than money, archival considerations should be a component of everything we build -- investing more time in metadata standards.
Q: what if you're a smaller organisation which can't handle the additional requirements?
A: Major institutions & publishers have the time/energy/resource and should enable a mechanism for smaller publishers or lone scholars to get their content into a data store.

Q: Bob Boissy, Springer. Do you mean standards for archiving/preservation should be centrally managed, or that the hardware for the archive itself should be central? (Surely you'd prefer massively redundant distributed server system)
A: I mean that you need to bring things into an archive, you can't just leave things linked -- URLs go out of date. The infrastructure can nonetheless be distributed as LOCKSS is.

"Publication costs are just another research cost"

Robert Kiley, Wellcome Trust -- Medical Journals Backfiles Digitisation Project & open access

Project funded by JISC, Wellcome. Supported by several major publishers, and digitisation being carried out by National Library of Medicine. Focus is on providing key teaching resources in history of medicine. Product is available to anyone with a browser but is chiefly targeted at clinical community. Content goes into PubMedCentral, where it is readily discoverable via e.g. PubMed, Medline, Google.

Digitisation is expensive; journals were chosen based on historical importance, impact factor and comparison to existing titles in the collection. Coverage from e.g. 1809 (Journal of the Royal Society of Medicine); 1857 (BMJ); 1866 (Journal of Physiology), and including seminal papers. Participating publishers have to agree not only to backfile digitisation but also to deposit ongoing content in PMC (an embargo is allowed).

References are extracted and matched to PubMed. Underlying data is integrated with the text -- programmatic text mining enables linkage to e.g. chemical compounds in PubChem.
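(As a rough illustration of the reference-matching idea -- not a description of the project's actual pipeline -- a parsed citation fragment can be thrown at NCBI's public E-utilities to retrieve candidate PubMed IDs; the query below is purely illustrative.)

```python
# Rough sketch of matching an extracted reference against PubMed via NCBI's
# public E-utilities (esearch). The citation fragment is illustrative; a real
# matcher would first parse out journal, year, volume and pages.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def match_reference(citation_text):
    params = urlencode({"db": "pubmed", "term": citation_text, "retmode": "json"})
    with urlopen(f"{ESEARCH}?{params}") as resp:
        result = json.load(resp)["esearchresult"]
    return result.get("idlist", [])  # candidate PubMed IDs, if any

# Example (invented citation fragment):
# print(match_reference("British Medical Journal 1857 cholera"))
```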

Wellcome Trust now mandates that research it funds must be deposited in PubMedCentral, and it will provide additional funding to cover author-pays publishing fees. Having all content in one place enables more analysis of funding usage.

Currently aiming to create a UK version of PMC which will provide a mirror service and a local manuscript submission system; working with SHERPA to create a database of Wellcome-compliant publisher-archiving policies -- so authors can easily find out whether the journal they want to publish in is compliant with Wellcome's funding regulations.

Publication costs should be recognised as "just another research cost". RK expects that a combination of OA publishing and OA repositories will change the way biomedical research is disseminated, and that improved access to research papers will lead to additional medical discoveries.

Q: Rick Anderson, U. Nevada Reno. I'm concerned that you think publication costs should be considered part of research costs; doesn't that mean less money for actual research?
A: Yes, our figures suggest between 1-2% maximum, which we think is worth it for the improved access to the literature [that author pays OA publishing offers].

Q: Anthony Watkinson. You don't describe how you're going to give the money for payment. Will you hand it to institutions and let them decide whether to give it to researchers?
A: Yes. We have a list of UK universities at which our researchers are based, and we have given them a block of money which they can use to e.g. take out a subscription to PLOS. We don't want to subsidise every single research grant in those institutions, but we do enable the money to be administered by the university (rather than the individual researchers).
Q: who are then at the mercy of their institutional administrators?
A: the grants are not capped; if it can be demonstrated that the money is being spent on funding publication of Wellcome Trust papers, we will top it up when it expires.

Notes from an exhibitor

A busy day yesterday, with time split between keeping the stand staffed and going to the very interesting papers... The first paper of the day was particularly good, reminding everyone that scientists are not really interested in the format of information, and do not make a distinction between information in "published" sources and other kinds of data. Scientists are already using automated systems to get around such distinctions... very interesting indeed. Always useful to be reminded of the "view from the end-user".

Monday, April 03, 2006

"Facilitators, not gatekeepers"

Linda Stoddart, UN Library (NY) -- From Support to Mission Critical: United Nations libraries in transition

Trying to move from being a "nice to have" service to an essential resource, primarily for financial reasons, as the library is not currently being used. What's important for our staff, and for the delegations representing the 191 United Nations missions? -- most come in with their laptops and Blackberries (sp?!). How do we make sure we are there for our clients?

The library is now managing the intranet which has completely changed our role, and we have become technology consultants.

We are looking to communicate our new vision, and develop a strategy. Big changes for staff in the library, many of whom have been in their roles for a long time. It's bureaucratic. We need to celebrate our successes and learn from our mistakes.

"From collections to connections" -- summarises our approach. People to people. Maybe 20% of staff/delegations are using our libraries -- how do we service the other 80%? Last year we launched a lectures and conversations series with presentations in the library's auditorium. We brought in key speakers (e.g. Kofi Annan; the President of the General Assembly). High profile events to create a knowledge-sharing opportunity -- e.g. a lecture series on the 2004 Asian tsunami.

Changing skill sets -- we need to embolden people to communicate our new vision. We need extroverts. Streamlining processes; creating partnerships.

We need to deliver what's important to senior management, and the intranet has helped us to do this by providing a device for supporting dialogue
  • supporting core work (UN Reform)
  • creating trust between management/staff
  • providing ideas for internal messages
  • assisting use of all new technological tools
  • partnering with other organisational units e.g. IT, HR (library has been very marginalised in the past)
Learning to influence decision making
  • what do people need to know, and when
  • how can applications/tools be used effectively
    • IT dept too busy to provide this level of support; library is on the front line
Within the UN Secretariat -- there are political considerations; low usage of print publications, medium use of e-data; high use of news sources and direct contacts

International organisations are hierarchical -- decisions at high level, and often limited communication of those decisions.

Junior staff are not part of this process and lack a sense of responsibility. How can we embolden people, and change this organisational culture? It is formalised, structural; rewards are based on rank, and there's limited change/risk-taking.

Old -> new
Bureaucratic -> enabling staff to take initiatives
Multi-leveled -> mobility in functions and amongst sections. Some staff had spent their entire career indexing. Not healthy? Seemed normal a couple of years ago! People are ready for a change, and feel part of it.
Policies/procedures that focus on process -> p/p that facilitate meeting client requirements. Lots of time was spent on internal library issues, and staff were blind to what clients wanted.
Silos -> team-oriented.
Culturally:
centralised -> empowering
introverted -> extroverted
focus on activities in library space -> networking and coaching. Learning to use the space -- renovations for the first time in 50 years; an opportunity to rethink our facilities -- better training; video conferencing services; some spaces still for quiet research, but also an area for networking.
slow decision making -> quicker
defensive -> open to feedback (being sensitive to real needs)
insecure -> confident

Training and development is key:
emphasis on library processing technique -> focus on inter-personal skills e.g. interviewing/coaching techniques
learning new library management systems -> understanding content management tools. Integrating everything we do -- email, records, database searching. OPACs are out of date. We no longer need discrete systems.

Questions we ask ourselves: What information (a) do staff need (b) should be shared (c) is needed when, and in what form -- and how should information be organised, stored, accessed and communicated?

New interface - i-seek -- is now the only thing UN senior management know about the library. Links off to all relevant information -- HR, content for new staff, messaging from senior management, etc.

Changes in outlook and attitude -- we are embracing new opportunities, and moving in new directions -- using skillbase of facilitators, not gatekeepers. Identifying new approaches to knowledge sharing and organisational learning, in order to influence decision making process.

Obstacles ... bureaucratic procudures (hard to make decisions); new skills require more training (recruitment is slow); staff still feel boxed in despite understanding the need for change

Opportunities ... to be more flexible; to adopt new skills -> roles -> responsibilities; flexibility and experimentation; team approach = networking, partnering; staff have new challenges.

Changing perceptions: new signals and symbols.

"Whistling past the graveyard"

In the first plenary session of Monday afternoon, Rick Anderson stepped up to ask "What will become of us? Looking into the crystal ball of serials work".

What's already happened
  • information has become much more abundant and accessible -- less need to visit the library to locate content
    • content is no longer king?
    • information *seems* cheap and ubiquitous to patrons -- and this user perception will shape librarians' future
  • attention has become much more scarce
    • users have less time and are less willing to invest it in looking for content
  • the information world has become a *fundamentally online* place
    • librarians need to come to terms with the user notion that "if it's not online I'm not interested"
At U. Nevada, Reno, usage of online content is going up; dramatic drop in circulation since 1994. Number of items checked out per student down by 45%, and further if you exclude DVD check out. If we don't acknowledge these changes to circulation, we're whistling past the graveyard (i.e. putting on a brave show in denial of our worst fears). How far down are these numbers going to go before they stop? What will form the "hard floor" of materials that continue to be used? -- answers likely to vary from institution to institution.

Things likely to happen next
  • The amount of high-quality information available at no charge to the public will continue to increase
    • "follow the money" -- in the last decade, lots of people have worked out how to make money from putting free content online
  • The percentage of high-quality information available at no charge to the public will never reach 100%
    • the OA movement will continue to grow and develop, but Rick is agnostic and suggests it's not likely to replace scholarly publishing as we know it
  • Of what remains non-free, we will continue to purchase the wrong things for our patrons
    • one of the biggest elephants in the crowded living room of our profession is the large amount of money being spent on content that's not necessary
    • we must deal with the elephant -- as our funding bodies will reevaluate how we are funded
Things that are quite likely to happen
  • Laptops will replace desktops, at least among students (and mobiles may replace laptops)
    • *compare # of laptops in your library 2 years ago to now
  • Something like Google Print will emerge and take hold
    • *remember: follow the money -- Google have dramatically demonstrated how much money can be made providing access to content
    • *we can see this in Yahoo!'s nascent e-book project, or Amazon's "Search inside the book"
  • Journal inflation will continue, and library budgets will not catch up
    • *tax payers are unlikely to rise as one and insist that civic leaders give libraries a bigger budget ...
    • U. Nevada Reno had a flat budget this year compared to last, and cut monographs purchasing to protect journal subs
    • *when Rick asked, hypothetically, "what would you do if your materials budget was cut in half", he was disappointed to learn that people would consider looking at their statistics and cancelling some serials...
      • some folks said they would stop buying books all together -- Rick was surprised, but "we might be forced into this as a short term measure"
What does this mean for serials and acquisitions work?
Laptops -- more remote access = fewer people in the library. Gate counts are already low; what if this declines further as users no longer come in to use work stations? It will become hard to justify staffing/existence even if services are still valuable -- will they be perceived as such?

As more info is free online, it will be harder to justify materials budgets. Administrators are desperate to make cuts. Can we make compelling enough arguments to keep our budgets?

Google Print = OPAC flight. Despite sophisticated work to create them, OPACs are crude -- and often "actively user-hostile"; Google may just be a full text index, but its deceptively simple interface is customer-focussed and masks v. clever back end processes.

Since not all information will ever be free, patrons will need someone to pay for it -- but will that be a librarian? U. Nevada Reno's collective purchasing in recent years has involved very few librarians.

Patrons need to get information more quickly -- faster, and more targeted access. We will have to find a way to deliver this.

Conclusions
  • more information, more broadly available
  • less usage of printed materials
  • more remote use of library resources
  • less use of the OPAC
  • more difficulty justifying staffing and budgets
Q: Todd Carpenter, BioOne. As more quality information is freely available (one of this morning's speakers mentioned that 20% of users think they're accessing OA titles when they're not) - is there anything that publishers and librarians can do to overcome this perception that information is free?
A: Users don't care -- our goals should be getting that information to them as transparently as possible. But there does need to be some level of awareness to support expenditure. Some databases do offer customised branding, which can help as long as it doesn't get in the way.

Q: Alexis Walckiers, ECARES. You say you will have more difficulty justifying staff/budgets. In my experience, having information scientists to demonstrate layers of quality of publications is important. Could this be the librarian role of the future? Also, users find it hard to get to information, and need to ask librarians for assistance in locating it. Librarians are still better at discovery.
A: these are two key areas -- we shouldn't try to get patrons to change their behaviour, but to affect students we should work more closely with faculty and get our services integrated into the curriculum. Faculty members have a power over the student that librarians don't; use it.

(For some more details, see my review of Rick's presentation at February's ASA conference)

It's Monday morning ... "Mice Love Lard"

UKSG has officially started, and first on the podium (following retiring chairman Keith Courtney's welcome, above) is Carole Goble from the University of Manchester, with an excellent review of how workflows can be employed to better connect researchers to the collective content they use. My on the spot notes below:

Bioinformaticians (people working in life sciences): their daily work = identifying new overlapping sequences of interest -- looking them up in databases and annotating them to indicate similarity to the genetic sequence under investigation.

Example: 35 different resources; all with web interfaces; many publication-centric. Copy and pasting content from different resources, annotating by hand. Can't replicate or log activity to see if it's been done accurately

Bioinformaticians do not distinguish between data and publications; publishers need to recognise there's not a difference between these 2 types of content for users.

Heretical view: CG doesn't read journals -- but does read content on a pre-print service (journals are outdated).
Where conference papers turn into journal papers -- the first iteration may well be the Powerpoint.
"Google is the Lord's work" -- "I haven't been to the library for 14 years!" -- can find it from laptop and send a PhD student to the library if really necessary ...

Workflows: computerising the research process. Enabling machines to interoperate and execute the necessary processes.
"Workflow at its simplest is the movement of documents and or tasks through a work process" (Wikipedia)
Simple scripting language specifies how steps of a pipeline link together -- hides the backend fiddling about
Linking together and cross referencing data in different repositories -- including serials.
Everything needs to be accessible to the workflow machinery -- including serials.
Results can then be annotated -- semantic metadata annotation of data -- *and* provenance is tracked accurately for future checking/use. You can then reuse, or amend and reuse, your workflow. So the workflow protocol itself becomes valuable, not just the data (therefore it needs to be tested thoroughly to make sure it runs on different platforms etc.). CG cites the Taverna Workbench research project: still just a flaky research project, but 150 biocentres are already using it. And it's just one of many workflow systems.

Workflows can cut processes down from 2 weeks to 2 hours. Publishing workflows enables them to be adapted and shared throughout the community.
e.g. the PubMedCentral portal can be turned into a web service for machines to read. Life sciences databases interlink, e.g. Interpro links to Medline -- these links can be used to retrieve the article. The XML result is "just" part of the workflow and can be processed and used further down the workflow. Extra-value services, e.g. Chilibot -- text mining which sits on top of PubMed and tries to build relationships between genes and proteins -- can again be made into a computable workflow. (Using this workflow, the scientist was able to discover that Mice Love Lard.)
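(For readers who, like me, find a small example helps: below is a toy sketch of the workflow idea -- steps chained in a declared order, with provenance recorded automatically. The step functions are invented placeholders rather than real service calls, and this is emphatically not Taverna itself.)

```python
# A toy illustration of the workflow idea: steps are chained in a declared
# order and provenance (what ran, on what input, when) is recorded as a
# side effect. The step functions are invented placeholders.
from datetime import datetime

def run_workflow(steps, data):
    provenance = []
    for step in steps:
        result = step(data)
        provenance.append({
            "step": step.__name__,
            "input": repr(data),
            "output": repr(result),
            "when": datetime.utcnow().isoformat(),
        })
        data = result
    return data, provenance

# Placeholder steps standing in for calls to remote web services.
def fetch_sequence(accession):
    return f"SEQUENCE({accession})"

def find_similar(sequence):
    return [f"{sequence}-hit1", f"{sequence}-hit2"]

def annotate(hits):
    return {hit: "example annotation" for hit in hits}

result, log = run_workflow([fetch_sequence, find_similar, annotate], "AB123456")
print(result)
print(log)  # the provenance trail is what makes the run repeatable and auditable
```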

Some results will need somebody to read them! -- mixture of machinery and people.

Termina software (Imperial College & Univ Manchester?) looks for terms and recognises them, associating them with a term from a gene ontology -- using text mining -- but it would be easier if text mining wasn't necessary, i.e. if terms could be identified and flagged at the point of publication. The information/knowledge (that these terms are controlled vocabulary) is there at the point of publication -- so why lose it, only to have to reverse-engineer it later?
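(To illustrate the point -- and this is a toy sketch, not the software Carole described -- here is how controlled-vocabulary terms might be flagged in text against a tiny stand-in ontology:)

```python
# A toy sketch of flagging controlled-vocabulary terms at publication time,
# so downstream users don't have to rediscover them by text mining. The
# two-term "ontology" below is a stand-in for a real gene ontology.
import re

MINI_ONTOLOGY = {
    "apoptosis": "GO:0006915",
    "signal transduction": "GO:0007165",
}

def tag_terms(text, vocabulary):
    """Return (term, identifier, character offset) for each vocabulary term found."""
    hits = []
    for term, identifier in vocabulary.items():
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            hits.append((term, identifier, match.start()))
    return sorted(hits, key=lambda h: h[2])

sentence = "The protein is involved in signal transduction and, ultimately, apoptosis."
print(tag_terms(sentence, MINI_ONTOLOGY))
```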

(This reminds me of Leigh Dodds' paper "The Journal Article as Palimpsest", given at Ingenta's Publisher Forum in Dec 2005 -- view slides (.pps).)

Several projects are working on this -- Liz Lyon's eBank project "confusogram" -- e-science workflows & research feeding institutional repositories, but also conference proceedings etc. At the time of data creation, annotation is done -- publication & data are deeply intertwined -- breaking up the silo between data, experiment & publication.

Active data forms a web of data object and publications -- all combined together. Workflows also offer provenance tracking - at the point of capture, giving you evidence for your publication which should also be used within the publication.

Web services, workflows
-> publications need to be machine-accessible.
-> Licensing needs to work, so workflows can be shared
-> DRM, authorisation, authentication all need to work
Integration of data and publications
->workflows need to link results -- need common IDs
Semantic mark-up at source
->need better ways to interpret content
Text mining
-> retro-extraction is more useful if it can read full text not just abstract

Why isn't workflow/data published with publications?
-- privacy/intellectual property; workflows give too much away! Need to be slightly modified before publication. They do also need to be licensed -- what model? -- to enable reuse/sharing of results/workflows.

Conclusions:
o machines, not just people, are reading journals
o if journals are not online, they are unread
o workflows are another form of outcome which should be published alongside data, metadata and publications
o Google rocks!

http://www.mygrid.org.uk
http://www.ukoln.ac.uk/projects/ebank-uk/
http://www.combechem.org

Q: Anthony Watkinson: what should we do to support our editors to offer the best they can to life scientists?
A: Life scientists want semantic mark-up & web services, so they can avoid expensive, unreliable text-mining. So we need to be able to access the journal's content through a computational process -- and ensure that the same identifiers are being used across databases.

Q: Richard Gedye, OUP. Happy to see an OUP journal being used ... wrt controlled vocabularies-- how many of these are there? Should we ask our editors for their advice on which to use? Are there standards?
A: http://www.bioontology.org -- a big US project bringing together all the bio-ontologies being developed in the community; controlled vocabularies only make sense if there's community consensus. This is very much the case in life sciences, but with different levels of endorsement.

Q: Peter Burnhill, EDINA: fingerprinting vs controlled vocab -- is the need to access full text primarily to discover relevant material, or to provide access to it?
A: Both, we want to be able to put it into the pipeline so need to enable access. But also need it for discovery, and primarily (now) for text mining.
Re. fingerprinting -- it helps to create controlled vocab as well as identifying common terms.
(To what extent is there a limit to controlled vocab, and does it need to rely on a lower-level identification structure?)
You do need both -- and identifiers representing a concept, because the words being mined will change. Building controlled vocab is an entire discipline in itself...