Tuesday, April 25, 2006

Workshop report from colleagues in the US

Exclusively electronic: redeploying staff to manage electronic journals
Cindy Hepfer and Susan Davis, University at Buffalo, State University of New York

The presenters of this workshop were invited to discuss how Central Technical Services (CTS) and the Health Sciences Library’s (HSL) Collection Management Services at the University at Buffalo (UB) -- a mid-ranked Association of Research Libraries (ARL) institution in the United States -- were reorganized in late 2004 to promote improved electronic resource management. In January 2005, UB inaugurated a department comprising 6.25 staff who work exclusively with electronic periodicals and databases. In order to create the new department, all technical services functions at the Health Sciences Library were centralized with those in Central Technical Services, except for check-in of the library’s remaining print periodicals and associated binding tasks. In addition, the number of staff in CTS doing monograph acquisitions and copy cataloging was reduced, since the monograph budget had been cut to help maintain existing serials and database subscriptions. Staff from monograph acquisitions, periodicals and standing order management were combined to form a print periodicals and serials unit within the acquisitions department.

The need for a department that focuses exclusively on electronic resource management was explained. The Libraries had made a large investment in electronic resources and users were increasingly dependent on electronic access. At the time of the UKSG presentation (April 2006), UB’s Serials Solutions statistics showed:

  • total holdings: 57,494
  • total unique titles: 36,361
  • 71 databases and packages

Nearly 9,000 of the titles are individually selected or subscribed journals (as opposed to titles received in packages/databases). The 2005/2006 acquisitions budget is over $7,000,000 of which $2,580,000 is spent on centrally funded electronic resources (databases and e-journal packages). In addition, many hundreds of print subscriptions include some level of electronic access. User surveys had repeatedly indicated that faculty, staff and students want the convenience of 24/7 remote online access to journals and databases. Like others in the audience, UB had tried to incorporate electronic management tasks into existing positions. Unfortunately, the complexity and time-consuming nature of managing electronic resources as well as significant demands of the print collection led to an untenable situation. The creation of a new electronic periodicals management department and the redeployment of staff seemed like the most viable option to deal with the crisis.

The presenters emphasized that any library considering or undertaking a reorganization to allow for improved electronic resource management should first review its 'big picture.' Suggested steps included assessing workloads across the library to identify tasks that no longer require the same level of staffing or which can be eliminated altogether, and understanding where user and institutional emphases lie. The kinds of tasks involved in electronic resource management were laid out: pre-order research, trial management, license negotiation, ordering/renewal, registration, bibliographic and password control, holdings and A-Z list management, problem resolution, vendor/publisher communication, archiving tasks, link resolver management, proxy support and user authentication services, electronic resource management system record creation and maintenance, usage data tracking and analysis, and union listing. At UB some of these tasks are handled by staff in Systems or the Libraries’ Web Office, but the majority are handled by the newly formed Electronic Periodicals Management Department (EPMD). The skills and abilities that staff require in order to capably manage electronic resources were also outlined.

The goals for UB’s reorganization were articulated:

  • involve a larger number of staff in e-resource management to develop and expand expertise and skills
  • create efficient and effective workflows
  • eliminate redundant efforts across campus libraries
  • concentrate attention on electronic resources (no distractions from physical objects)

UB’s EPMD implementation -- and how workflow, training, and communication were handled -- was addressed. The presenters felt the reorganization has been quite successful. In addition to improving control of a huge number of electronic resources and to populating an ERM with data, the creation of EPMD has allowed three paraprofessional support staff, as well as 3.25 librarians, to immerse themselves in electronic resource management. Subscription agents were also praised as a resource without which the EPMD team could not function. The presenters concluded by noting that EPMD was probably a short-term solution and that further reorganizations were likely, especially as electronic resource management becomes more of a maintenance issue. Flexibility is one of the main factors in any successful reorganization.

Monday, April 24, 2006

More briefing session reports ...

ONIX for Licensing Terms
Briefing session by Brian Green (EDItEUR) and Mark Bide (Rightscom)

As the number of digital resources in library collections continues to grow, libraries are having difficulty in complying with the widely differing licence terms applied to those resources by their creators and publishers. The ability to express these terms in a standard format and communicate them electronically to users has become a pressing need.

A report by Intrallect for the JISC included the following requirements from libraries:

  • rights should be expressed in machine readable form
  • whenever a resource is described, its rights should also be described
  • users should be able to see the rights information associated with a resource.

In the United States, the Digital Library Federation (DLF), a grouping of the major US academic research libraries, set up its Electronic Resource Management Initiative (ERMI) to aid the rapid development of library systems by providing a series of papers to help both to define requirements and to propose data standards for the management of electronic resources.

EDItEUR, the international body for book and serials e-commerce standards (which include ONIX for Books and ONIX for Serials), commissioned a review of the ERMI work from Rightscom. The review concluded that the ERMI work was a good starting point but required considerable further development.

Following that review and a proof of concept study, co-funded by the JISC and the Publishers Licensing Society (PLS), work commenced on ONIX for Licensing Terms to support the communication of licensing terms for electronic resources from a publisher to a user institution (e.g. an academic institution or consortium), either directly or through an intermediary. The purpose is to enable the licence terms to be loaded into an electronic resources management system maintained by the receiving institution.
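The intended round trip -- a publisher expresses the terms once in a standard format, and the receiving institution's ERM loads them and can answer usage questions automatically -- can be sketched in miniature. The field names and values below are invented for illustration only; the actual ONIX for Licensing Terms format is a far richer XML schema that was still under development at the time:

```python
# A deliberately simplified sketch (NOT the actual ONIX for Licensing
# Terms schema): licence terms expressed as structured, machine-readable
# data that an ERM could load and query. All names are illustrative.
licence = {
    "licensor": "Example Publisher",
    "licensee": "Example University",
    "usages": {
        "interlibrary_loan": "prohibited",
        "course_packs": "permitted",
        "walk_in_use": "permitted",
    },
}

def usage_status(lic, usage):
    """Answer the question a reader would put to the ERM:
    'may we do X with this resource?'"""
    return lic["usages"].get(usage, "not stated")

print(usage_status(licence, "course_packs"))  # permitted
print(usage_status(licence, "text_mining"))   # not stated
```

The point of the standard is precisely that the "not stated" and "prohibited" cases become visible to library staff and users without anyone re-reading the paper licence.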

A joint international working party has been set up by DLF, EDItEUR, NISO and PLS to monitor this work, which is currently being developed through two projects funded by the JISC PALS2 Metadata programme. Further information can be found on the EDItEUR website: www.editeur.org.

Report by Brian Green

Institutional identifiers: how they could streamline the supply chain
Briefing session report by Helen Henderson of Ringgold Inc

Institutional identifiers are not new. They are used for locational and financial purposes, for example MARC Organizational Codes and D-U-N-S numbers. However, they haven’t been used in the journal supply chain because in the past it wasn’t so complex. With the advent of electronic journals and site licences, and multiple intermediaries, this chain has become much more complex and the need to identify the entities involved much more important. In addition to a unique identifier, appropriate metadata is required, for example, location, category, tier, size, URL and credentials. This identifier would be used throughout the chain for licensing, publisher marketing, customer analysis, authorization and authentication, usage statistics and many other purposes. It will be necessary to embed the identifier in all the systems along the way including the ILS (Integrated Library System) or ERM (electronic resources management), publisher fulfilment system, CRM (customer relationship management) and authentication service.
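To make the idea concrete, an identifier-plus-metadata record might look something like the following minimal sketch. The class, field names and values are all invented for illustration and are not drawn from any actual identifier scheme:

```python
from dataclasses import dataclass

@dataclass
class InstitutionRecord:
    """Hypothetical record pairing a unique institutional identifier
    with the kinds of metadata mentioned in the briefing session."""
    identifier: str  # the unique institutional identifier
    name: str
    location: str
    category: str    # e.g. academic, corporate, government
    tier: int        # pricing/size tier
    size: int        # e.g. FTE count
    url: str

# The same record could be quoted in an order, a licence, a usage
# report or an authentication request, so that every system in the
# chain agrees on which institution is meant.
ub = InstitutionRecord(
    identifier="INST-0001",  # illustrative value only
    name="University at Buffalo",
    location="Buffalo, NY, US",
    category="academic",
    tier=3,
    size=27000,
    url="http://www.buffalo.edu/",
)
print(ub.identifier, ub.name)
```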

Ten of the main benefits to publishers have been elaborated by Richard Gedye of Oxford University Press:

  • automatic institutional holdings reports, leading to faster, more accurate pricing quotations
  • identify all renewals – eliminate false lapsers/new subscribers – particularly important when continued online access depends on recognizing a renewal as a renewal and not a ‘new’ order
  • improved knowledge of specific market penetration levels
  • easier to merge/purge customer/subscriber lists when companies or journals change hands
  • easier to assess degree of market overlap between potential business partners
  • more useful and sophisticated usage reporting – incorporating usage data from all channels, at institutional level – could generate new licensing opportunities and pricing models
  • more effectively targeted list rental opportunities
  • better marketing response measurement – which lists resulted in the best initial response and ultimate institutional subscriptions
  • easier to track end-users of consolidated subscriptions
  • easier to identify and measure the value of your most important customers.

For the user community benefits would include:

  • reduce gaps in service
    • operations
    • administration
    • support process
  • reduce delays in activation
    • third party – immediate action
  • all institution holdings documented
    • departmental, society, personal
  • usage statistics
    • complete view of institution usage
  • overview of complete institutional relationships with a publisher
  • easy access to archives
    • publisher archives
    • JSTOR
    • LOCKSS
    • Portico
    • British Library
  • registry facilities
    • central IP registration.

A group of industry players have combined their resources to create a Journal Supply Chain Integration Pilot, to see what the issues, hurdles and benefits of such an identifier would be. The British Library, HighWire Press, HighWire Publishers, Ringgold Inc and Swets Information Services are working together on a year-long pilot which started in January to assess the practical benefits of such an identifier and look at issues like implementation and governance. The website for the pilot is: www.journalsupplychain.org and regular progress reports are being posted to the site. There are also papers on the background to institutional identifiers in the journal supply chain at www.ringgold.com and in The Charleston Advisor (www.charlestonco.com).

Document supply – dead or alive?
Briefing session by Mike McGrath, Editor ‘Interlending and Document Supply’

I presented two briefing sessions at the conference; a new experience for me, and pleasurable. A surprisingly high number of 100 delegates signed up for what is a rather specialized area. However, only about 60 actually made it! Who knows why they didn’t come? One delegate, who must remain nameless, told me that he “turned up to any that he fancied, not necessarily the ones he had registered for.” Perhaps he was not the only one to take the spontaneous approach?
Those that did attend appeared to enjoy it, with a few coming up afterwards and thanking me for not using PowerPoint. It seems de rigueur these days but is often used badly and often not needed at all.

I started with a very short lecture on economics which explained why we see so much change – essentially, the commercial publishers’ drive for profits leads them endlessly to seek new products and markets whilst dominating the markets that they already control. I covered the many factors that have led to the decline of about 40% in document supply over the last five years: ‘big deals’, retrospective conversion of serials, e-books, mass digitization of books, open access and copyright combined with digital rights management. I benchmarked these against a prediction that I made in my last year of working at the British Library in 2001 that document supply would decline by about 40% but would then bottom out. The figure was about right but the bottoming out was not. I offered some explanations for why the decline continues. However, the scale of publishing, the increase in researchers, the evidence emerging of continued low usage of many journals ‘sold’ with big deals, led me to the conclusion that “document supply is down but definitely not out”. None of the attendees disagreed – hopefully because of the supporting arguments, which cannot be covered in this brief summary.

Back to Basics
Briefing session by Finola Osborn, Serials Librarian, University of Warwick and Tamsyn Honour, Commercial Support Manager, Swets Information Services

Back to Basics was a briefing session on the basic practicalities of managing print and electronic serials in libraries, specifically for those who are new to the role, from the perspective of a librarian and a subscription agent.

The session ran on the Monday and Tuesday of the Conference with around 25 people attending on each day. The participants were mostly librarians but publishers and intermediaries were also represented.

We were keen to keep to the brief of ‘basic practicalities’ and covered the key areas of serials management: pre-order information, the order process, invoicing, receipt, claiming, renewals, cancellations, e-journal management, promotion and user education.

The breadth of the topics covered within the time-frame meant that we considered only key points within each area, with the librarian highlighting potential problems and the agent offering possible solutions. Most of the key areas are common to both print and electronic serials, and we wanted to ensure that the management of print serials was not neglected in favour of electronic so we started the session by addressing the print aspect. Inevitably, a substantial amount of the session centred on the management of electronic serials highlighting issues such as licensing, access, registration, and electronic journals management systems.

We finished the session with a look at likely future developments, mentioning the pressures on libraries to move to e-only, archiving, the growing importance of usage statistics, changes in pricing models, the development of federated searches, alternative access models (open access, institutional repositories), and the burgeoning e-book market.

The sessions ended with comments from the participants, largely focusing on electronic matters such as access and licensing problems.

Report by Finola Osborn

Thursday, April 20, 2006

Where to view the presentations – and another UKSG blogger!

As Bev has pointed out in a handy comment below, all presentations from this year's UKSG conference are available from the UKSG website. I'm also delighted to announce that NASIG President Mary Page, who attended and spoke at the conference, also found time to blog some session reports and commentary – do take a look at http://www.nasig.org/uksgblog/.

Wednesday, April 19, 2006

Reports on workshops & briefing sessions

This year there were an amazing 30 workshops and briefing sessions to choose from. Afterwards, leaders of briefing sessions were asked to sum up the salient points, and workshop leaders to report back on their session. In case you missed these at Warwick – or need a bite-sized refresher – the editors of Serials bring you the first two of these reports.

Next year’s model?: online journal business models on trial
Briefing session by Paul Harwood and Albert Prior of Content Complete Ltd

Albert Prior and Paul Harwood described progress to date on trialling two journal business models with a small number of partner publishers and UK Higher Education Institutions.

The background to the trials can be traced to the prevalence of the ‘big deal’ in the agreements that have been reached within NESLi2, the UK’s national journal licensing initiative for the Higher Education Community. The JISC’s Journals Working Group, which oversees NESLi2, was interested to explore whether there were alternative models that might provide advantages over the big deal which, although popular, did place increasingly onerous restrictions on institutions.

A report into alternative business models was commissioned and undertaken by the consultancy, Rightscom. Along with the report, which was published in April 2005, a number of business models were presented. The Journals Working Group identified two which they felt merited further exploration and invited Content Complete, the JISC’s negotiation agent for NESLi2, to organize a number of trials. The two models put forward were ‘core plus peripheral’ and ‘pay-per-view converting to subscription’.

Finding publishers and libraries prepared to participate in a trial was not easy as both parties were already stretched with the pressures of day-to-day work and participation in various other initiatives. However, five publishers and ten libraries were eventually identified, with each publisher working with two libraries. The trials will run throughout 2006.

It quickly became apparent that true pay-per-view using credit cards would not be possible and the models were modified to operate on the basis of downloads. It was also clear that the trials would have to take place ‘behind the scenes’ as the respective parties had already concluded their renewals and payments for 2006.

The kick-off meetings with the parties focused on establishing the cost per download, how to eliminate free content from the counting (OA, back-files), possible discounts for PDF and HTML downloads, how COUNTER interprets matters, and how to ensure that downloads from gateway services, etc. were properly captured.

Subsequent meetings have focused on what data is to be captured and reported, analysis of the first sets of usage statistics and preliminary discussions about the financial and budgetary implications of working with models where there is a degree of uncertainty regarding actual expenditure.

Content Complete will be providing an interim report to the JISC in the summer with the full report likely to be completed early in 2007.

Personal digital assistants for health
Briefing session by Sarah Sutton, Clinical Librarian, University of Leicester

Personal digital assistants (PDAs) are pocket-sized computers. When first introduced, they had very limited memory so could only perform basic diary and address book functions. Now they can fulfil many of the roles of the traditional desk top PC and the most current models are even able to access the web and connect with digital projectors to show presentations.

The University of Leicester has been taking part in a joint project with University Hospitals Leicester on the use of PDAs in the clinical setting since 2002. These PDAs, in addition to the diary and address book functions, also have electronic books available to assist doctors and nurses on their ward rounds. The ‘bundle’ of books trialled at Leicester is marketed as ‘Dr Companion’ and includes the British National Formulary (BNF), the Oxford Handbooks of Clinical Medicine, General Practice and Clinical Specialities, the Oxford Medical Dictionary, summaries of the current Cochrane Systematic Reviews and NICE guidance, Clinical Evidence and much more. This resource is like a small medical library on a computer the size of a big mobile phone.

PDAs are also able to access electronic journals via a selection of routes. There is a PDA version of Adobe Acrobat so articles can be downloaded from the web and saved from a desktop PC to a PDA. If the PDA links wirelessly to the Internet, then articles can be viewed directly from the PDA. There are alerting services that send tables of contents, abstracts and full text to your PDA each time a new edition of a journal appears – HighWire provides an excellent example of this service. If you want most of the facilities of your desktop PC without the hassle of carrying around a laptop, a PDA is a handy addition to your pocket or handbag.

Keep a lookout for more reports over the coming weeks …

Thursday, April 13, 2006

"DSS feeds or whatever ... it all seems a bit complicated"

This is a direct quotation from my mother; I was trying to introduce her to RSS feeds so she would be able to keep up with various family members' blogs. My early explanations were evidently not simple enough, so in the end I put together the below as a dummy's mummy's guide to getting set up with RSS feeds.

RSS – an acronym often expanded as "really simple syndication" – is a way of being notified when a website you're interested in is updated, to save you having to keep going to check it. To use it, you need a feed reader. This is a bit of software which is either desktop or web-based. You can tell it which websites you're interested in, and it will periodically check and retrieve updates from those websites.
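Under the hood a feed reader is doing nothing mysterious: it periodically fetches an XML file (the feed) from each site you've told it about and extracts the new entries. A minimal sketch in Python, using an invented two-item feed inlined as a string (rather than a live URL) so the example is self-contained:

```python
import xml.etree.ElementTree as ET

# A tiny sample RSS 2.0 feed; real readers would fetch this over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>LiveSerials</title>
    <link>http://liveserials.blogspot.com/</link>
    <item>
      <title>Workshop report from colleagues in the US</title>
      <pubDate>Tue, 25 Apr 2006 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>More briefing session reports ...</title>
      <pubDate>Mon, 24 Apr 2006 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def list_items(feed_xml):
    """Return (title, pubDate) pairs for each item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        date = item.findtext("pubDate", default="")
        items.append((title, date))
    return items

for title, date in list_items(SAMPLE_FEED):
    print(f"{date}  {title}")
```

A reader like Bloglines simply remembers which items it has already shown you and flags the rest as new.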

The one I use is a web-based one so I can check it from various computers. It is called Bloglines (http://www.bloglines.com/)1, and having tried a few different readers, I have found it the simplest and quickest to configure, with sufficient (and appropriate) functionality to meet my needs. To set it up:

1. Register: http://www.bloglines.com/register/
This gives you an account to set up with all the feeds you want to read, and you can sign up for some initial feeds on the page you get when you click on the link in the confirmation email (e.g. you could choose "Music lover" from the list on the left, and then sign up for the BBC Music News feed by clicking that box. Or you could choose to be sent Dictionary.com's Word of the Day from the list on the right.)

2. Download the Notifier: http://www.bloglines.com/about/notifier
Choose the appropriate one for your operating system (e.g. Windows, Mac) – this puts a little icon in your system tray (bottom right area of your screen) which will tell you when there are new items in your chosen feeds. The notifier download can also be accessed via the Extras section in My Feeds (Download Notifier).

3. Add some feeds!
Ongoing, the easiest way (I find) to add a feed to your Bloglines account is to browse to the page from which you want to receive a feed, and press a "subscribe to this feed (or blog)" button. You can get the "easy subscribe" code as a bookmark from the "My Feeds" section extras – see "Easy Subscribe Bookmarklet". Once you've got it in your links toolbar, you simply click on it to locate and subscribe to the RSS feed for a web page – try it from this page to see what I mean!

Web pages usually display an orange symbol to indicate that an RSS feed is available. But you can try clicking your "subscribe" button on any page; if the site doesn't have RSS, you will just get a message saying that Bloglines couldn't find an RSS feed for that page. If a feed is available, it might be in multiple formats but don't worry, Bloglines is able to cope with any of them so it doesn't really matter which one you choose. You can subdivide your feeds into folders to help keep on top of them.

You can also set up search feeds that will search all Bloglines-known blogs/sites for a specific term and alert you when it is mentioned; I do this to find out what people are saying about Ingenta. You need to search for your chosen keyword(s) in the box on the top right, and then click "Subscribe to this search" on the results page.

1 As the good old BBC would put it, "other feed readers are available": many are as good and perhaps better in some respects than Bloglines. There are useful summaries/analyses of the most popular RSS readers at SearchEngineWatch. If you're brave, there's a pretty comprehensive, and helpfully annotated, list of available readers at http://allrss.com/rssreaders.html. ALPSP members who want to find out more should check out ALPSP's advice note 31 on RSS feeds.

N.B. This post syndicated from Ingenta's All My Eye blog with kind permission :)

Wednesday, April 05, 2006

It's all over now

All good things must come to an end; as ever we must thank Karen Sadler and Alison Whitehorn for their fabulous organisation and support throughout the annual conference. (Ladies, I hope the envelopes you were presented with contained more than a few free biros snaffled from exhibitors' stands). We'll be back in Warwick next year, so see you all there.

For LiveSerials, it's very much not the end though -- if I can make sense of my remaining notes I'll add a few more session reports, but more than that: we shall continue to use the blog as a home for UKSG announcements and information throughout the year. Make sure you have signed up for our RSS feed if you want to keep up to date with news from the serials industry.

Over and out from Warwick.

And they call it puppy love ...

Stephen Abram was due to present on the "uneasy relationship" between libraries and Google; instead, Peter stays up on the podium to tell us about "Puppy love versus reality", debunking our infatuation1 with Google Scholar.

Peter likes Carole Goble's project, but is less sure about Carole's fondness for Google.

Google Scholar is excellent for undergraduates who need the odd article but not for real scholars -- should be called Google Student!

The reality of Google Scholar is:
  • Secrecy -- about sources, journals, time span, size -- everything
  • Huge gaps in collections crawled -- Google Scholar finds far fewer results than native search engines [I wonder if this is related to how Google displays result statistics; numbers given in Google results do vary and I've heard an explanation for this which I now can't recall ...]
  • Crawling not allowed by e.g. Elsevier
"Information professionals beware of your reputation"

Cited-by numbers are unreliable -- following links shows that the "citing" articles do *not* cite the "cited" articles. And it displays a number but only shows the first few. Google matches cited/citing references "like a senile neighbour", e.g. confusing zip codes or page numbers for publication years (i.e., it's machine-reading citations using relatively crude algorithms, and results aren't eyeballed for accuracy). Also contains links to e.g. journal subscription rates pages as "scholarly documents". This is a major concern because people use its flawed data (e.g. numbers of results or citations) in analyses and debates; particularly disturbing since there is talk of Google Scholar's citation figures being used for e.g. promotion, tenure and funding decisions.
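The failure mode Peter described is easy to reproduce. Here is a toy citation parser (invented for illustration; not Google's actual algorithm, which is unpublished) that treats the first four-digit number in a citation string as the publication year -- and so cheerfully reports a page number as a year:

```python
import re

def guess_year(citation):
    """Naively take the first four-digit number as the publication year."""
    match = re.search(r"\b(\d{4})\b", citation)
    return match.group(1) if match else None

# A well-behaved citation: the year really is the first four-digit number.
good = "Smith, J. Serials management. Serials 19(2), 127-135, 2006."
# A page range appears first: the parser mistakes a page number for the year.
bad = "Smith, J. Serials management. Serials, pp. 1976-1982."

print(guess_year(good))  # 2006
print(guess_year(bad))   # 1976 -- a page number reported as a 'year'
```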

1 Can an infatuation be debunked, or only a myth?

"Survival of the fittest"

Péter Jacsó1, University of Hawaii -- The Endangered Database Species: Are the traditional *commercial* indexing/abstracting & full-text databases dead?

In a word -- no! Commercial A&I/F-T d/bs are not extinct ... yet.
Depends on habitat.

Indexing only d/bs -- near extinction
Most abstracting d/bs - endangered
Some full-text d/bs - vulnerable

Internal reasons for endangered/vulnerable status --

  • Stagnation e.g. British Education Index on Dialog: its focus is the British education system, but there are more relevant records in ERIC
  • Deflation e.g. Mental Health Abstracts -- journal source has been decimated over the last few years, and it became a waste of money to search it as more content was freely available in PubMed
  • Staleness e.g. GeoArchive -- only 2 updates in 2005. Geobase and GeoRef updated twice a month -- search reveals an order of magnitude more records in the latter databases ... size *does* matter
  • Sloppy production e.g. MHA, Information Science Abstracts: no updates in last year. EBSCO have made LISTA d/b in which nearly all records have abstracts.
  • Flab vs muscle e.g. SportDiscus (prior to EBSCO acquisition) -- too much duplication, seemed big but flabby! And users are paying to access each record. Even Google could bypass the quality provided.
  • Self-destruction e.g. e-psyche. Promised us champagne; didn't even deliver beer. Despite backing of well-known industry veterans.
Peter's forensic evidence (screenshots of search results) available via a link within his presentation (which I'll link to at the UKSG site in due course).

External reasons for endangered/vulnerable status:
Open Access -- 100s of millions of OA indexing records; 10s of millions of OA abstract records (e.g. Medline), millions of free OA full text documents. Threat to full text databases on Dialog "which are as they were in 1976 ... and they are still pretty expensive".

A&I publishers in the triple whammy of commercial competitors + government competitors + smart individuals (who are federated searching OA databases and presenting results for no charge)
-- this is driving enhancements of competitive content
-- innovative hosting platforms e.g. CSA
-- appealing interfaces -- very important (students are spoiled by Ask, Yahoo etc)

An additional problem is self-delusion -- denial and PR-illusion by commercial companies. Databases relaunching themselves -- developments are "Emperor's New Clothes" (again, see links within presentation for examples).

Market is no longer willing to pay for access to A&I databases when the abstracts are freely available from publisher sites and "digital facilitators" -- metasearch engines can freely use the data.

Scirus & Google Scholar "get in the ring"
--Peter disputes Scirus' claim to contain solely scientific data
--Google is "deified" and its citation counts are "very off-base"

Many government databases have smarter software and even offer full text e.g. PLoS, TRIS Online, PubMedCentral, Agricola, NCJRS (which has the best phonetic searching -- Peter tells us he tested 18 misspellings of metaamphetamine [which may or may not be the correct spelling!])

Functionality has moved on in good ways -- no longer just links to the publisher site or the author's email -- now offering links to the references, to lists of other articles citing the current article (which demonstrates value of article)

HighWire's link menus for each article are state of the art e.g. links to ISI's "cited by" records (if the publisher has paid for the service)

Comparison to Dialog's "skeletal" database record; to EBSCO's (which has no abstract); to CSA's LISA (which has no onward links); "Haworth Press are 15 years behind state of the art" -- the cited references are "darn cold" and cannot be clicked on to link to the cited articles.

Users will look on the left and at the top of the screen -- that's where e.g. full text links from abstract records need to be.

Full text is the future; "survival of the fittest" -- those who don't adapt will not survive.

1 I had the pleasure of dining with Péter at the conference dinner on Monday, and am therefore able to advise anyone who wasn't sure that it's pronounced Yotcho (and it's Hungarian)

RAE: top dogs don't slice salami

Jonathan Adams, Evidence Ltd. Research Assessment and UK publication patterns.

The UK's Research Assessment Exercise is a research evaluation cycle which considers output, training, funding and strategy across HE institutions. Peer review panels. Forthcoming changes following recent government budget statement -- shift to metrics post 2008.

The RAE has evidently led to an increase in the UK share of world citations; if citations are a measure of research importance, then UK research is now much improved since the early 80s and has this year overtaken the US in biology and health sciences. RAE is a major driver of research activity in universities. It assesses 4 items per researcher -- including books, articles, proceedings, other works.

Analysis of publications submitted to RAE in 1996 and 2001 -- using publication data to assess comparability between subject areas. Evidence shows that journal articles are proportionally the highest output in science, conference proceedings in engineering, book chapters and monographs more common in humanities, and other content (videos, installations etc.) in the arts.

Researchers submit material for assessment which represents their highest quality work; the assessment will affect the amount of funding received, and departmental prestige.

A shift towards journals is evident -- more journal articles were submitted for assessment in 2001 than in 1996 but in comparing these to ISI's content (Web of Science) it is evident that in some subjects, ISI's coverage is comparatively decreasing -- suggesting Web of Science may be less representative of research in some areas than others (e.g. social sciences less well represented).

Changing cultures -- social science researchers do use bibliometric data to evaluate research quality, but do so in an expert way; journals will become increasingly significant.

(3 days at UKSG caught up with me at this point and the notes I made for the remainder of the presentation make so little sense that they would detract from Jonathan's presentation -- so I have quit while I'm still vaguely ahead!)

Q: Greg Kerrison, Qinetiq. How has the RAE influenced the research process?
A: better overall performance; higher level of productivity. In terms of the way people publish, no evidence that we have increased significantly in comparison to other G8 countries. Suggestions of salami slicing don't seem to be justified; it may be that some (less high profile) researchers are focussing on shorter term goals (in order to have adequate content to submit for the next RAE), but the best researchers are not swayed in that direction.

Incentives, incentives, incentives ...

The transition to electronic-only format: cost and considerations -- Roger Schonfeld, Ithaka

Only with an examination of *incentives* can we find a viable path forward

Based on studies in 2003 (11 libraries), 2004 (publishers):

Larger publishers (including NFPs, university presses) have already flipped business models from principally print to electronic-with-print-as-add-on -- pricing has evolved to mitigate the effects of print cancellations on the bottom line (site licenses/tiered pricing).
Larger publishers have significant resources to invest in making the transition, and considerable in-house expertise on which to draw.

Smaller commercial publishers, scholarly societies and university presses -- in a few cases, journals are not yet available electronically; where e-versions do exist, costs have not always been separately tracked, which makes it hard to develop pricing outside of the print model. Focus is more likely to be on humanities/social sciences, which may be responsible for a perceived lack of urgency about going electronic/developing a new business model. As may, for example, a high dependence on advertising, or high image content (e.g. art history or biology). If there were a dramatic move away from the print format, what would their future be?

Costs include not only subscription but selection process, cataloguing, storing etc -- these costs are all lower in electronic format in comparison to print, which is a non-trivial incentive to move away from print and to de-duplicate multiple-format collecting. E formats have provided opportunity to increase size of journal collections.

Economies of scale
For libraries, economies of scale exist primarily for print, not electronic; as print journals are transitioned to electronic, unit costs go up dramatically. Thus the decline of print subscriptions *raises* non-subscription costs substantially at large libraries (which would seem counter-intuitive). "As print collections shrink, will libraries be motivated to move away from print all together?" -- at the very least, there does seem to be an incentive to redesign library processes to try to recapture some of the costs.

It seems inevitable that all scholarly journals will have an e-version before long (this is not necessarily the case for books). Several different models could be used to help the transition e.g. collaborations such as BioOne; outsourcing to commercial publishers... each option has its own tradeoffs.

In some cases, could there be no sustainable way to publish an e format? -- some journals may end up replaced by disciplinary/institutional repositories, blogs, and other less formal distribution models.

A business model which is entirely reliant on print today, but is intent on flipping to e format, may result in significant price increases. Libraries should employ programs to consider percentage cost increase and "respond with empathy, else they may unintentionally punish lower-price publishers".

Does OA have a disproportionate effect on lower-price publishers who (a) haven't made the transition to OA and (b) haven't even made the transition to electronic? This additional pressure on smaller publishers has not really been given much air time in the great OA debate.

Library process
The move away from print is inevitable, "whether or not it is managed strategically". A 'strategic format review', whereby a target for journal cancellations is planned over a timeframe, offers an opportunity for a tactical retreat from print and can permit effective cost savings. This is nonetheless politically complicated.

Collecting in an e-only environment means libraries don't *own* their acquisitions in the same way, which can complicate archiving when one ceases to collect print. Which types of e-archiving processes are appropriate -- are any ready for comfortable dependence? Efforts include Portico, LOCKSS, British Library legal deposit for e-journals, Dutch National Archive.

Following the transition to electronic formats, is the cost of print holdings justifiable?

What incentives can be developed to ensure the survival of "appropriate print artefacts"? e.g. libraries paying one another to continue holding print.

1. We (the entire serials community) should consider with greater care how traditional society and university press publishers will make a transition to an e-only environment
2. A strategic format review has significant advantages over a chaotic transition
3. Archiving must not be forgotten, for both electronic and legacy print collections.

Q: Diana Leitch, Univ. Manchester. What about the users? We've gone a long way down the e-road, and the demand across all subject areas for e-content is high -- there's a lack of realisation/understanding that some content is not yet electronic.
A: It's clear that there's a growing acceptance of the electronic format, certainly in the sciences and growing elsewhere. Faculty members may not use the bricks-and-mortar library at all, whilst still making regular use of its services; they increasingly suspect that they will cease to depend on libraries, which translates to less economic demand for libraries (there is a lack of understanding about what libraries do). Libraries need to be making a case for themselves in a way that hasn't been necessary previously.

Unbelievably Knackered ... Still Going

It's Wednesday morning and today's expansion of the UKSG acronym is

Unbelievably Knackered ... Still Going.

Masses of fun was had at last night's dinner and quiz as usual. Congratulations to the winning team, whose witty team name escapes me but whose members included Loughborough's Charles Oppenheim, OUP's Richard Gedye (who apparently provided nearly all the answers), IOP's Judith Barnsby and about nineteen others (sorry, but I was already a tad tipsy and am unable to recall the rest of you -- please feel free to identify yourselves and claim your share of the glory). I hope we'll be able to post/link to a snap of the winners in due course (any offers?).

The competitive spirit continued on to the dance floor where shapes aplenty were thrown well into the night. (Top moment: Stevie Wonder's Superstition; lowest moment: two consecutive Shania Twain tracks). I called it a night at a relatively sensible 1.30am. M'learned foolish colleagues carried on partying until 4, although whether 8 people sharing 1 bottle of wine in a kitchen-cum-laundry constitutes a party is debatable...

Tuesday, April 04, 2006

It's snowing!

Safe places

Erik Oltmans, National Library of the Netherlands (KB). "The International e-Depot"

KB Policy background. E-journals dominate the field of academic literature; as Gordon said, who will take care of the long term accessibility of international e-journals? In the print world, local libraries took care of their own country's output, but this model is no longer sufficient (it is harder to determine the place of origin for e-publications). If there's no obvious guardian, we risk losing the information.

We could ask publishers to deposit in every national library, but they are unlikely to do so. We should spread the geopolitical risk and identify a small number of trustworthy partners -- collaboration/coordination required -- creating centres of expertise, "Safe Places Network".

"Safe Places Network" ensures systematic, coordinated preservation. Gives libraries a place to get lost content. Publishers need to deposit in a timely manner. Permanent commitment required from archive organisation, requiring substantial investment, permanent R&D (into changing solutions) -- continuous effort. KB is a part of "Safe Places Network".

Risks to regular access provision -- potential disruptions e.g. catastrophic event at publisher's server; withdrawal of publications (commercially motivated); technological obsolescence -- always a key issue: inaccessible file formats.

Archiving agreement between publisher/safe place is critical to cover all eventualities. Should any trigger event occur to disrupt access to research libraries/end users, the archival library can deliver the content.

Mellon Statements (Sep 05) endorsed by Association of Research Libraries. 4 essential key actions:
1. Preservation is a way of managing risk
2. Qualified archives provide a minimal set of well-defined services -- storing files in non-proprietary formats [i.e. not PDF?]; restricting access to protect publisher's business interests, unless publisher cannot provide access; ensuring open means for auditing archival practices
3. Libraries must invest in qualified archiving solution -- either its own, or an "insurance collective" (like the Safe Places Network)
4. Libraries must effectively demand archival deposit by publishers as a condition of licensing electronic journals

KB allows 1 MB of storage for each e-publication -- 1 terabyte for 1 million publications. The project is ingesting anywhere from 5,000-50,000 publications per day. The system is designed to ingest e-journals, e-books and CD-ROMs. Authentic publications are archived, in standard formats (PDF [how does this tie in with Mellon 2?], XML). Publications are validated on ingestion by checksums and by JHOVE (which checks the integrity of PDF files) -- procedures for error handling kick in if necessary. Metadata is converted from the publisher's proprietary DTD/schema.
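The checksum step of that ingestion pipeline is conceptually simple. Below is a minimal sketch (the e-Depot's real tooling is of course more involved, and the function names here are invented for illustration): the publisher declares a checksum in its manifest, and the archive recomputes it on arrival, so any corruption in transit is caught before the file is accepted.

```python
import hashlib

def checksum_of(data: bytes) -> str:
    """Compute a checksum for a file's bytes (SHA-1 here; any strong hash works)."""
    return hashlib.sha1(data).hexdigest()

def validate_ingest(supplied: bytes, declared_checksum: str) -> bool:
    """Compare the checksum declared in the publisher's manifest with one
    computed on arrival; a mismatch would trigger error-handling procedures."""
    return checksum_of(supplied) == declared_checksum

# Simulated deposit: the publisher sends the file plus its checksum.
article = b"%PDF-1.4 ... (article content) ..."
manifest_checksum = checksum_of(article)  # as declared in the manifest

assert validate_ingest(article, manifest_checksum)                # intact file passes
assert not validate_ingest(article + b"\x00", manifest_checksum)  # corrupted file fails
```

The same stored checksums can later be recomputed against the archive's holdings -- which is exactly the mechanism Oltmans mentions below for verifying that archived data "is still OK" even while the archive stays dark.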

2 key strategies for digital preservation, both studied at KB:
  • migration -- files continually converted to newest format
  • emulation -- whereby future users experience original look and feel of document

e-Depot does not compete with publisher-provided access; access is on site for KB visitors or via ILL within the Netherlands (if content is only available within KB). Remote access can be enabled if permitted by the publisher (as some OA publishers do). Retrieval, access, printing and downloading are only allowed for private use; systematic reproduction prohibited; real-time monitoring of user behaviour to prevent abuse. Thus usage is currently limited but as yet no "trigger events" experienced to require broader access.

  • growing volume of international e-journals without "natural fatherland"
  • must be preserved by institutions who take responsibilities -- systematic and coordinated by means of Safe Place Network
  • Mellon Statement defines essential key actions; in line with KB policy
  • KB has made long-term commitment to be part of the insurance collective
  • new publishers welcomed
  • seeking international collaboration
Q: Greg Kerrison, Qinetiq. Concerns about confusion between preservation and access. If you do have a trigger event enabling broader access, a choice will then need to be made between access and preservation. Access always wins; preservation will suffer. Isn't it better to have a plan to provide that access via an intermediary, enabling separation of preservation site from access site?
A: yes, if a trigger event occurs, we would not want to enable access alone -- we'd want to involve third party software vendors to provide that access. Storage and preservation is our daily focus. We don't have special user interfaces; only in exceptional circumstances will we need to provide much access.

Q: Charles Oppenheim, Univ. Loughborough. Are publishers contributing money to this project? It does seem to be an insurance policy for them.
A: not right now -- business model is easy -- no financial transactions. This may change and we are negotiating with larger publishers to find an appropriate model.

Q: Robert Kiley, Wellcome Trust. How do you ensure the integrity of your database? At PMC, public access to the content exposes any missing data. How do you know you've got a complete archive if no-one is accessing it?
A: technically, via the checksum procedures at submission -- this makes sure that everything supplied is loaded. If a publisher fails to supply something, we will need to compare our data pile to theirs. We may begin applying checksums to existing data to ensure it is still OK.

Q: Ahmed Hindawi. If no-one is using your dark archive, how can you know if there's a problem with the data due to a software bug or similar? Exposing your archive is a way to get your content checked.
A: our administrative procedures ensure we know the versions of the files we hold, and we have a preservation manager tool which enables us to couple file versions with technology and ensure that the data is delivered through the right software to avoid versioning bugs. Our migration & emulation studies are also helping us to find appropriate solutions.

Q: Greg Kerrison, Qinetiq. Would it be a good idea to convert your PDFs to PDF/A (the archival version) when that's finalised?
A: it's an option, but we are concerned that we should not lose functionality within our PDFs.

Q: Gordon Tibbitts, Blackwell Publishing. The archive should not be considered to be the *primary* source for future delivery, and we need to focus on preservation - shouldn't keep getting lost in access-related discussions.

"Archiving should be done by librarians and archivists, period."

Gordon Tibbitts, Blackwell Publishing.
Quotation from Mellon Foundation, Sep 05 "Digital preservation represents one of the grand challenges facing higher education". Archiving is about preserving. Who should be doing it? What should be archived? What are the current solutions, where are they, how do they work? What critical success factors are there?

"Archiving should be done by librarians and archivists, period."
Publishers often think it should be them -- they should assist (fund), but they aren't the best at it -- they often can't find missing issues ... libraries have them. Publishers have other roles.

Practically: we have to decide what we want to archive. Appropriate content in 3 broad categories:
1. Scholarly journals and books
2. Research material supporting these works (e.g. pre and post-prints, reviews, lecture materials, data)
3. An emerging type: content built from the discourse surrounding scholarly works e.g. blogs, LMS, lecture notes, social networks, conferences, podcasts, message boards, online "webinars". (Where does this type of content start and end?)

Ideally, we should agree to move these types of content from remote locations to a centralised location -- preserving requires a number of things to ensure the material is acceptably stored, and ready for future transitions. Archiving is not necessarily about access, and the focus should be on preserving; only providing non-complex access for restoration purposes.

Various national archives -- Dutch National Library, British Library Legal Deposit
  • follow strict copyright requirements
  • allow scholars to have on site access
  • only in the process of evolving ability to provide catastrophic recovery of "lost" works
  • some have govt funding, others thinking about cost recovery mechanisms (which could produce conflict of interest later -- putting a toll gate on the archive)
"Product solution" archives are provided by publishers (really just a big data store, not actually an archive); NFPs such as Portico; even governments such as PubMedCentral (but what's their plan for content preservation long term? Are they not just about content delivery?)

A critical step for an archive should be that material is deposited but is not intended for delivery i.e. in but no out -- which disqualifies most publishers' content piles from archival status.

Institutional repositories constitute "roll your own" archives e.g. D-Space, LOCKSS, eprints, fedora. These mostly contain type 2 content (above); barely anything has been done to store type 3.

"Community-based" archives are emerging which may lead to a networked solution where disparate data stores act as archives linked together by catalogues/indexing solutions. Could be the way forward for type 3 content. CLOCKSS is an example of community archiving.

Critical success factors:
  • governance -- who's in control? Will governments censor? Is there an archivist/librarian running it? It's worrying to think what issues might enter into government policies, inducing them to prohibit access to, or not store, certain content. It's worrying if any single entity has control over the archive.
  • economic stability -- how is it funded? by libraries or publishers? We shouldn't lose the chance to create a long term archive by focussing on access and thus antagonising people who make their living out of delivering content.
  • technical soundness. Is it really an archive? Are the standards open for scrutiny? Is the community involved at decision level?
  • community acceptance -- need to know it can be relied on before libraries will cease their own efforts.

Q: Anthony Watkinson. There is a taxonomy on the way as part of UK Legal Deposit. A number of publications have non-text components which can be essential to the message of the scholar. Do you know of any serious efforts to archive and preserve this additional content?
A: I include this in type 3. There are some protocols which have considered multimedia archiving. It is important to realise that storage is one thing, but interoperability with the rest of the scholarly community is key. How do you classify the metadata to provide access to this kind of content? Seems largely to be free text now. LOCKSS does a good job of storing multimedia and is used by about 150 institutions -- but perhaps it's not providing the interoperability that it should.

Q: Bill Russell, Emerald. Resource allocation: we don't know what the future holds. As a publisher, how should we prioritise our resource?
A: many are supporting archiving solutions; generally lots of them, in the hope that one will stick. If we're talking time rather than money, archival considerations should be a component of everything we build -- investing more time in metadata standards.
Q: what if you're a smaller organisation which can't handle the additional requirements?
A: Major institutions & publishers have the time/energy/resource and should enable a mechanism for smaller publishers or lone scholars to get their content into a data store.

Q: Bob Boissy, Springer. Do you mean standards for archiving/preservation should be centrally managed, or that the hardware for the archive itself should be central? (Surely you'd prefer massively redundant distributed server system)
A: I mean that you need to bring things into an archive, you can't just leave things linked -- URLs go out of date. The infrastructure can nonetheless be distributed as LOCKSS is.

"Publication costs are just another research cost"

Robert Kiley, Wellcome Trust -- Medical Journals Backfiles Digitisation Project & open access

Project funded by JISC, Wellcome. Supported by several major publishers, and digitisation being carried out by National Library of Medicine. Focus is on providing key teaching resources in history of medicine. Product is available to anyone with a browser but is chiefly targeted at clinical community. Content goes into PubMedCentral, where it is readily discoverable via e.g. PubMed, Medline, Google.

Digitisation is expensive; journals were chosen based on historical importance, impact factor and comparison to existing titles in the collection. Coverage from e.g. 1809 (Journal of the Royal Society of Medicine); 1857 (BMJ); 1866 (Journal of Physiology) and including seminal papers. Participating publishers have to agree not only to backfile digitisation but also to deposit ongoing content in PMC (an embargo is allowed).

References are extracted and matched to PubMed. Underlying data is integrated with the text -- programmatic text mining enables linkage to e.g. chemical compounds in PubChem.

Wellcome Trust now mandates that research it funds must be deposited in PubMedCentral, and it will provide additional funding to cover author-pays publishing fees. Having all content in one place enables more analysis of funding usage.

Currently aiming to create a UK version of PMC which will provide a mirror service and a local manuscript submission system; working with SHERPA to create a database of Wellcome-compliant publisher-archiving policies -- so authors can easily find out whether the journal they want to publish in is compliant with Wellcome's funding regulations.

Publication costs should be recognised as "just another research cost". RK expects that a combination of OA publishing and OA repositories will change the way biomedical research is disseminated, and that improved access to research papers will lead to additional medical discoveries.

Q: Rick Anderson, U. Nevada Reno. I'm concerned that you think publication costs should be considered part of research costs; doesn't that mean less money for actual research?
A: Yes, our figures suggest between 1 and 2% at most, which we think is worth it for the improved access to the literature [that author-pays OA publishing offers].

Q: Anthony Watkinson. You don't describe how you're going to give the money for payment. Will you hand it to institutions and let them decide whether to give it to researchers?
A: Yes. We have a list of UK universities at which our researchers are based, and we have given them a block of money which they can use to e.g. take out a subscription to PLoS. We don't want to subsidise every single research grant in those institutions, but we do enable the money to be administered by the university (rather than the individual researchers).
Q: who are then at the mercy of their institutional administrators?
A: the grants are not capped; if it can be demonstrated that the money is being spent on funding publication of Wellcome Trust papers, we will top it up when it expires.

Notes from an exhibitor

A busy day yesterday, with time split between keeping the stand staffed, and going to the very interesting papers...the first paper of the day was extremely interesting - reminding everyone that scientists are not really interested in the format of information, and do not make a distinction between information in "published" sources, and other kinds of data. Scientists are already using automated systems to get around such distinctions... very interesting indeed. Always useful to be reminded of the "view from the end-user".

Monday, April 03, 2006

"Facilitators, not gatekeepers"

Linda Stoddart, UN Library (NY) -- From Support to Mission Critical: United Nations libraries in transition

Trying to move from being a "nice to have" service to an essential resource, primarily for financial reasons, as the library is not currently being used. What's important for our staff, and for the delegations representing the 191 United Nations missions? -- most come in with their laptops and Blackberries (sp?!). How can we be sure we are there for our clients?

The library is now managing the intranet which has completely changed our role, and we have become technology consultants.

We are looking to communicate our new vision, and develop a strategy. Big changes for staff in the library, many of whom have been in their roles for a long time. It's bureaucratic. We need to celebrate our successes and learn from our mistakes.

"From collections to connections" -- summarises our approach. People to people. Maybe 20% of staff/delegations are using our libraries -- how do we service the other 80%? Last year we launched a lectures and conversations series with presentations in the library's auditorium. We brought in key speakers (e.g. Kofi Annan; the President of the General Assembly). High profile events to create a knowledge-sharing opportunity -- e.g. a lecture series on the 2004 Asian tsunami.

Changing skill sets -- we need to embolden people to communicate our new vision. We need extroverts. Streamlining processes; creating partnerships.

We need to deliver what's important to senior management, and the intranet has helped us to do this by providing a device for supporting dialogue
  • supporting core work (UN Reform)
  • creating trust between management/staff
  • providing ideas for internal messages
  • assisting use of all new technological tools
  • partnering with other organisational units e.g. IT, HR (library has been very marginalised in the past)
Learning to influence decision making
  • what do people need to know, and when
  • how can applications/tools be used effectively
    • IT dept too busy to provide this level of support; library is on the front line
Within the UN Secretariat -- there are political considerations; low usage of print publications, medium use of e-data; high use of news sources and direct contacts

International organisations are hierarchical -- decisions at high level, and often limited communication of those decisions.

Junior staff are not part of this process and lack a sense of responsibility. How can we embolden people, and change this organisational culture? It is formalised, structural; rewards are based on rank, and there's limited change/risk-taking.

Old -> new
Bureaucratic -> enabling staff to take initiatives
Multi-leveled -> mobility in functions and amongst sections. Some staff had spent their entire career indexing. Not healthy? Seemed normal a couple of years ago! People are ready for a change, and feel part of it.
Policies/procedures that focus on process -> p/p that facilitate meeting client requirements. Lots of time was spent on internal library issues, and staff were blind to what clients wanted.
Silos -> team-oriented.
centralised -> empowering
introverted -> extroverted
focus on activities in library space -> networking and coaching. Learning to use the space -- renovations for the first time in 50 years; an opportunity to rethink our facilities -- better training; video conferencing services; some spaces still for quiet research, but also an area for networking.
slow decision making -> quicker
defensive -> open to feedback (being sensitive to real needs)
insecure -> confident

Training and development is key:
emphasis on library processing technique -> focus on inter-personal skills e.g. interviewing/coaching techniques
learning new library management systems -> understanding content management tools. Integrating everything we do -- email, records, database searching. OPACs are out of date. We no longer need discrete systems.

Questions we ask ourselves: What information (a) do staff need (b) should be shared (c) is needed when, and in what form -- and how should information be organised, stored, accessed and communicated?

New interface - i-seek -- is now the only thing UN senior management know about the library. Links off to all relevant information -- HR, content for new staff, messaging from senior management, etc.

Changes in outlook and attitude -- we are embracing new opportunities, and moving in new directions -- using skillbase of facilitators, not gatekeepers. Identifying new approaches to knowledge sharing and organisational learning, in order to influence decision making process.

Obstacles ... bureaucratic procudures (hard to make decisions); new skills require more training (recruitment is slow); staff still feel boxed in despite understanding the need for change

Opportunities ... to be more flexible; to adopt new skills -> roles -> responsibilities; flexibility and experimentation; team approach = networking, partnering; staff have new challenges.

Changing perceptions: new signals and symbols.

"Whistling past the graveyard"

In the first plenary session of Monday afternoon, Rick Anderson stepped up to ask "What will become of us? Looking into the crystal ball of serials work".

What's already happened
  • information has become much more abundant and accessible -- less need to visit the library to locate content
    • content is no longer king?
    • information *seems* cheap and ubiquitous to patrons -- and this user perception will shape librarians' future
  • attention has become much more scarce
    • users have less time and are less willing to invest it in looking for content
  • the information world has become a *fundamentally online* place
    • librarians need to come to terms with the user notion that "if it's not online I'm not interested"
At U. Nevada, Reno, usage of online content is going up; dramatic drop in circulation since 1994. Number of items checked out per student down by 45%, and further if you exclude DVD check out. If we don't acknowledge these changes to circulation, we're whistling past the graveyard (i.e. putting on a brave show in denial of our worst fears). How far down are these numbers going to go before they stop? What will form the "hard floor" of materials that continue to be used? -- answers likely to vary from institution to institution.

Things likely to happen next
  • The amount of high-quality information available at no charge to the public will continue to increase
    • "follow the money" -- in the last decade, lots of people have worked out how to make money from putting free content online
  • The percentage of high-quality information available at no charge to the public will never reach 100
    • the OA movement will continue to grow and develop, but Rick is agnostic and suggests it's not likely to replace scholarly publishing as we know it
  • Of what remains non-free, we will continue to purchase the wrong things for our patrons
    • one of the biggest elephants in the crowded living room of our profession is the large amount of money being spent on content that's not necessary
    • we must deal with the elephant -- as our funding bodies will reevaluate how we are funded
Things that are quite likely to happen
  • Laptops will replace desktops, at least among students (and mobiles may replace laptops)
    • *compare # of laptops in your library 2 years ago to now
  • Something like Google Print will emerge and take hold
    • *remember: follow the money -- Google have dramatically demonstrated how much money can be made providing access to content
    • *we can see this in Yahoo!'s nascent e-book project, or Amazon's "Search inside the book"
  • Journal inflation will continue, and library budgets will not catch up
    • *tax payers are unlikely to rise as one and insist that civic leaders give libraries a bigger budget ...
    • U. Nevada Reno had a flat budget this year compared to last, and cut monographs purchasing to protect journal subs
    • *when Rick asked, hypothetically, "what would you do if your materials budget was cut in half", he was disappointed to learn that people would simply look at their statistics and cancel some serials...
      • some folks said they would stop buying books altogether -- Rick was surprised, but "we might be forced into this as a short-term measure"
What does this mean for serials and acquisitions work?
Laptops -- more remote access = fewer people in the library. Gate counts are already low; what if they decline further as users no longer come in to use work stations? It will become hard to justify staffing/existence even if services are still valuable -- will they be perceived as such?

As more info is free online, it will be harder to justify materials budgets. Administrators are desperate to make cuts. Can we make compelling enough arguments to keep our budgets?

Google Print = OPAC flight. Despite sophisticated work to create them, OPACs are crude -- and often "actively user-hostile"; Google may just be a full text index, but its deceptively simple interface is customer-focussed and masks v. clever back end processes.

Since not all information will ever be free, patrons will need someone to pay for it -- but will that be a librarian? U. Nevada Reno's collective purchasing in recent years has involved very few librarians.

Patrons need to get information more quickly -- faster, and more targeted access. We will have to find a way to deliver this.

  • more information, more broadly available
  • less usage of printed materials
  • more remote use of library resources
  • less use of the OPAC
  • more difficulty justifying staffing and budgets
Q: Todd Carpenter, BioOne. As more quality information is freely available (one of this morning's speakers mentioned that 20% of users think they're accessing OA titles when they're not) - is there anything that publishers and librarians can do to overcome this perception that information is free?
A: Users don't care -- our goals should be getting that information to them as transparently as possible. But there does need to be some level of awareness to support expenditure. Some databases do offer customised branding, which can help as long as it doesn't get in the way.

Q: Alexis Walckiers, ECARES. You say you will have more difficulty justifying staff/budgets. In my experience, having information scientists to demonstrate layers of quality of publications is important. Could this be the librarian role of the future? Also, users find it hard to get to information, and need to ask librarians for assistance in locating it. Librarians are still better at discovery.
A: these are two key areas -- we shouldn't try to get patrons to change their behaviour, but to affect students we should work more closely with faculty and get our services integrated into the curriculum. Faculty members have a power over the student that librarians don't; use it.

(For some more details, see my review of Rick's presentation at February's ASA conference)

It's Monday morning ... "Mice Love Lard"

UKSG has officially started, and first on the podium (following retiring chairman Keith Courtney's welcome, above) is Carole Goble from the University of Manchester, with an excellent review of how workflows can be employed to better connect researchers to the collective content they use. My on the spot notes below:

Bioinformaticians' (people working in the life sciences) daily work = identifying new overlapping sequences of interest -- looking them up in databases and annotating them to indicate similarity to the genetic sequence under investigation.

Example: 35 different resources, all with web interfaces, many publication-centric. Copying and pasting content from different resources, annotating by hand. Can't replicate or log the activity to see whether it's been done accurately.

Bioinformaticians do not distinguish between data and publications; publishers need to recognise that, for users, there is no difference between these two types of content.

Heretical view: CG doesn't read journals -- but does read content on a pre-print service (journals are outdated).
Where conference papers turn into journal papers, the first iteration may well be the PowerPoint.
"Google is the Lord's work" -- "I haven't been to the library for 14 years!" -- can find it from laptop and send a PhD student to the library if really necessary ...

Workflows: computerising the research process. Enabling machines to interoperate and execute the necessary processes.
"Workflow at its simplest is the movement of documents and or tasks through a work process" (Wikipedia)
Simple scripting language specifies how steps of a pipeline link together -- hides the backend fiddling about
Linking together and cross referencing data in different repositories -- including serials.
Everything needs to be accessible to the workflow machinery -- including serials.
Results can then be annotated -- semantic metadata annotation of data -- *and* provenance is tracked accurately for future checking/use. You can then reuse, or amend and reuse, your workflow. So the workflow protocol itself becomes valuable, not just the data (therefore it needs thorough testing to make sure it runs on different platforms etc.). CG cites the Taverna Workbench: still just a flaky research project, but 150 biocentres are already using it. And it's just one of many workflow systems.
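The mechanics described here -- a scripting layer that links steps into a pipeline and records provenance as it runs -- can be caricatured in a few lines. This is a toy sketch with invented step names, not Taverna's actual API:

```python
from datetime import datetime, timezone

class Workflow:
    """Toy pipeline: steps run in order, and each run is logged so
    results can be checked later and the workflow reused or amended."""
    def __init__(self):
        self.steps = []
        self.provenance = []  # (step name, timestamp, result) records

    def add_step(self, name, func):
        self.steps.append((name, func))
        return self  # allow chaining

    def run(self, data):
        for name, func in self.steps:
            data = func(data)
            # record what ran, when, and what it produced
            self.provenance.append((name, datetime.now(timezone.utc), repr(data)))
        return data

# Hypothetical steps standing in for real services (sequence lookup, annotation)
wf = (Workflow()
      .add_step("normalise", lambda seq: seq.upper())
      .add_step("annotate", lambda seq: {"sequence": seq, "gc_pairs": seq.count("GC")}))

result = wf.run("acgcgc")
print(result)  # {'sequence': 'ACGCGC', 'gc_pairs': 2}
```

The provenance list is the point: every run leaves an auditable trail for future checking, which hand copy-and-paste between 35 web interfaces cannot.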

Workflows can cut processes down from 2 weeks to 2 hours. Publishing workflows enables them to be adapted and shared throughout the community.
e.g. Use of the PubMedCentral portal to make it into a web service for machines to read. Life sciences databases interlink, e.g. Interpro links to Medline -- these links can be used to retrieve the article. The XML result is "just" part of the workflow and can be processed and used further down the workflow. Extra-value services, e.g. Chilibot -- text mining that sits on top of PubMed and tries to build relationships between genes and proteins. Can again be made into a computable workflow. (Using this workflow, the scientist was able to discover that Mice Love Lard.)
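A Chilibot-style step can be caricatured as co-occurrence mining: scan retrieved abstracts for pairs of terms that appear together and propose candidate relationships. A deliberately crude sketch with invented abstracts (the real tool does far more linguistic analysis):

```python
from itertools import combinations

# Invented abstracts and a tiny vocabulary, for illustration only.
abstracts = [
    "Mice fed a high-lard diet showed a preference for lard.",
    "Leptin levels rose in mice on the lard diet.",
]
terms = ["mice", "lard", "leptin"]

def cooccurrences(texts, vocab):
    """Count how often two terms appear in the same abstract --
    a crude proxy for a relationship worth a human's attention."""
    counts = {}
    for text in texts:
        lowered = text.lower()
        present = [t for t in vocab if t in lowered]
        for a, b in combinations(sorted(present), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

print(cooccurrences(abstracts, terms))  # ('lard', 'mice') co-occurs in both
```

Pairs that co-occur repeatedly are flagged for a human to interpret -- in the spirit of the "Mice Love Lard" discovery above.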

Some results will need somebody to read them! -- mixture of machinery and people.

Termina software (Imperial College & Univ. Manchester?) looks for terms and recognises them, associating them with a term from a gene ontology -- using text mining -- but it would be easier if text mining weren't necessary, i.e. if terms could be identified and flagged at the point of publication. The information/knowledge (that these terms are controlled vocabulary) is there at the point of publication -- so why lose it, only to have to reverse-engineer it later?
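The term-recognition step amounts to matching free text against a controlled vocabulary and attaching the vocabulary's identifiers -- i.e. reverse-engineering information that existed at the point of publication. A bare-bones sketch (a two-term toy vocabulary, with Gene Ontology-style identifiers used purely illustratively):

```python
import re

# Toy controlled vocabulary: term -> identifier (GO-style IDs, illustrative).
ontology = {
    "apoptosis": "GO:0006915",
    "cell cycle": "GO:0007049",
}

def tag_terms(text):
    """Find controlled-vocabulary terms in free text and return
    (term, identifier) pairs -- the reverse engineering that
    semantic mark-up at source would make unnecessary."""
    found = []
    for term, ident in ontology.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            found.append((term, ident))
    return sorted(found)

print(tag_terms("Apoptosis is regulated during the cell cycle."))
# [('apoptosis', 'GO:0006915'), ('cell cycle', 'GO:0007049')]
```

Real term recognition has to cope with synonyms, inflections and ambiguity, which is why it is expensive -- and why flagging terms at publication time would be easier.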

(This reminds me of Leigh Dodds' paper "The Journal Article as Palimpsest", given at Ingenta's Publisher Forum in Dec 2005 -- view slides .pps).

Several projects are working on this -- Liz Lyon's eBank project "confusogram" -- e-science workflows & research feeding institutional repositories, but also conference proceedings etc. At the time of data creation, annotation is done -- publication & data are deeply intertwined -- breaking up the silo between data, experiment & publication.

Active data forms a web of data objects and publications -- all combined together. Workflows also offer provenance tracking at the point of capture, giving you evidence for your publication, which should also be used within the publication.

Web services, workflows
-> publications need to be machine-accessible
-> licensing needs to work, so workflows can be shared
-> DRM, authorisation, authentication all need to work
Integration of data and publications
-> workflows need to link results -- need common IDs
Semantic mark-up at source
-> need better ways to interpret content
Text mining
-> retro-extraction is more useful if it can read full text, not just the abstract
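The "common IDs" requirement is the crux of integration: a workflow can only join a data record to a publication mechanically if both carry the same identifier. A minimal sketch with invented records, joining on a DOI:

```python
# Invented records; 10.1000/... is the DOI Foundation's example prefix.
datasets = [{"doi": "10.1000/example.1", "genes": ["abc1", "xyz2"]}]
publications = [{"doi": "10.1000/example.1", "title": "An example paper"}]

def link_on_id(data, pubs, key="doi"):
    """Join data records to publications sharing the same identifier;
    records with no match link to None."""
    index = {p[key]: p for p in pubs}
    return [(d, index.get(d[key])) for d in data]

pairs = link_on_id(datasets, publications)
print(pairs[0][1]["title"])  # An example paper
```

Without an agreed identifier scheme, the lookup simply fails to match -- which is the practical force of the bullet above.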

Why isn't workflow/data published with publications?
-- privacy/intellectual property; workflows give too much away! Need to be slightly modified before publication. They do also need to be licensed -- what model? -- to enable reuse/sharing of results/workflows.

  • machines, not just people, are reading journals
  • if journals are not online, they are unread
  • workflows are another form of outcome which should be published alongside data, metadata and publications
  • Google rocks!


Q: Anthony Watkinson: what should we do to support our editors to offer the best they can to life scientists?
A: Life scientists want semantic mark-up & web services, so they can avoid expensive, unreliable text-mining. So we need to be able to access the journal's content through a computational process -- and ensure that the same identifiers are being used across databases.
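What semantic mark-up at source might look like from the consumer's side: the publisher tags a term with its ontology identifier when the article is produced, and a workflow reads it back with no text mining at all. Element and attribute names here are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Invented mark-up: the ontology ID travels with the term from publication.
article = ('<para>We observed <term id="GO:0006915">apoptosis</term> '
           'in treated cells.</para>')

root = ET.fromstring(article)
tagged = [(t.text, t.get("id")) for t in root.iter("term")]
print(tagged)  # [('apoptosis', 'GO:0006915')]
```

The identifier arrives for free, already machine-readable -- no retro-extraction, and no ambiguity about which concept the word meant.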

Q: Richard Gedye, OUP. Happy to see an OUP journal being used ... w.r.t. controlled vocabularies -- how many of these are there? Should we ask our editors for their advice on which to use? Are there standards?
A: http://www.bioontology.org -- a big US project bringing together all the bio-ontologies being developed in the community; controlled vocabularies only make sense if there's community consensus. This is very much the case in the life sciences, but with different levels of endorsement.

Q: Peter Burnhill, EDINA: fingerprinting vs controlled vocab -- is the need to access full text primarily to discover relevant material, or to provide access to it?
A: Both -- we want to be able to put it into the pipeline, so we need to enable access. But we also need it for discovery, and primarily (for now) for text mining.
Re. fingerprinting -- it helps to create controlled vocab as well as identifying common terms.
(To what extent is there a limit to controlled vocab, and does it need to rely on a lower-level identification structure?)
You do need both -- and identifiers representing a concept, because the words being mined will change. Building controlled vocab is an entire discipline in itself...