Friday, April 15, 2011

The UKSG Echochamber

OU lecturer and UKSG speaker Tony Hirst (@psychemedia) put together a nice graphic during this year's conference, to show who was using the #uksg hashtag on Twitter. Check it out below (click on the image to see it in more detail, or see some of his other UKSG visualisations on his Flickr account).

First pass - #uksg twitter echochamber


Sunday, April 10, 2011

Academic e-resources in the UK: promoting discovery and use

Vic Lyte and Sophia Jones from Mimas, The University of Manchester, presented on The UK Institutional Repository Search (IRS), which is a Mimas project commissioned by JISC in partnership with UKOLN and SHERPA. The project was completed in July 2009 and the service has been running continuously since then.

Content stored in institutional and academic repositories is growing and they recognise that there are limited ways to access this information. This project has taken cross-search and aggregation to the next level, creating a visionary platform that pulls together disparate content, making it easier to search and discover in ways that meet personal or contextual needs.

They demonstrated how the search works, including an impressive 3D visualisation option.

They gave an overview of the JISC Historic Books and JISC Journal Archives products. They also talked about the JISC Collections e-platform, enabling cross-aggregated search of a unique resource from the British Library. Content (300,000 books) previously inaccessible will be searchable on the platform. Features include three types of search (exact, detailed and serendipitous), tabbed filters, Google-style listings and search clouds. It all looked very impressive.


Driving usage - what are publishers and librarians doing to evaluate and promote usage?

Sarah Pearson from the University of Birmingham kicked off this breakout session and outlined her experience of collection development analysis at her institution. She went on to explain that while they have been doing this for some time, usage alone doesn't tell the whole story. They have been looking increasingly at how users get access to content and what path they take.

Sarah highlighted the numerous ways they promote usage at her university. These include news feeds about new acquisitions and trials; making content available in resource discovery interfaces; activating in link resolvers (SFX); integrating with Google Scholar/A&I services; making authentication as seamless as possible and embedding in apps on other sites.

There is a MyLibrary tab on the institution's portal page and a library news section, both of which are widely used. Users can search the library catalogue directly from the university's portal page rather than going to the library pages.

They are also about to use Super Search on Primo Central, which will be embedded in the virtual learning environment and Facebook.

To analyse usage they use a number of services including in-house templates that compare and contrast big deal usage with subscription analysis; JUSP (Jisc) and SCONUL Returns. They look at JR1 reports and evaluate cost per use. They pay particular attention to those resources with low or zero use. They also look at DB1 searches & sessions and compare archive with frontfile usage.
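
To make the cost-per-use arithmetic concrete, here is a minimal sketch of the kind of calculation a JR1 report feeds into. All titles, prices and download counts below are invented for illustration:

```python
# Toy cost-per-use calculation in the spirit of the analysis described above.
# Titles, costs and download counts are invented for illustration.

subscriptions = {
    # title: (annual cost in GBP, full-text downloads from a JR1-style report)
    "Journal of Examples": (1200.0, 4800),
    "Annals of Placeholder Studies": (950.0, 12),
    "Hypothetical Review Letters": (2100.0, 0),
}

for title, (cost, downloads) in sorted(subscriptions.items()):
    cost_per_use = cost / downloads if downloads else None
    note = " <- low/zero use, candidate for review" if downloads < 50 else ""
    figure = f"£{cost_per_use:.2f} per download" if cost_per_use else "no recorded use"
    print(f"{title}: {figure}{note}")
```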


With budgets under threat, librarians are looking at cancelling poorly performing content; big deals, for example, have to demonstrate good overall value.


The University of Birmingham approach, Sarah explained, is to activate online access everywhere and let the user decide.

Google Analytics is being used to look at user behaviour and to understand more about users' journeys to content. They know that the institution's portal page is the number one access point, but the OPAC and Google are still high referrers. Access via mobile devices is currently low, but they expect that to increase.

Evaluating usage is still very manual, and it is labour-intensive to measure the ROI of resources. With increased pressure on budgets, it is important that librarians make the right decisions about which content to subscribe to and purchase, and evaluating usage is an important step in doing this.



Christian Box from IOP Publishing followed on with an interesting presentation about the work they are currently doing at the Institute of Physics. By sharing data between publishers and librarians, he said, we can make the industry more efficient.


I was particularly interested to hear more about the video abstracts they launched in February this year. Authors can now submit video abstracts, and so far these have had over 10,000 views. The human factor is important in engaging with students and researchers and helps to humanise the journal by conveying the inspiration and enthusiasm of the author or editor.

Publishers can learn a lot from evaluating the data they have, such as seeing which research areas are growing. Web analytics, train-of-thought analysis, traffic dashboards (including social media indexes) and extended metrics such as A&I services are all important.

Platform development and ensuring connectedness is key. SEO is still vitally important here.

Social networking/media activity and its impact on usage is difficult to track. Physics World has 8,510 followers on Twitter.

Local language sites (Japan, Latin America and China) have moderate but growing traffic so far.


Access via mobile devices, including iPhones and iPads, is growing, and publishers need to operate in this space to ensure users can access content wherever they are.


Challenges for publishers and librarians alike include creating new and meaningful metrics to cope with the rate of industry change, niche areas of research, and primitive metrics.

As Christian stated at the beginning of his presentation, it is important for librarians and publishers to work together as much as possible and share data to increase efficiency wherever possible.


Friday, April 08, 2011

William Gibson and the future of libraries

On day one of the UKSG 2011 Conference, John Naughton (The Open University and Cambridge University Library) paraphrased William Gibson, 'The future has already arrived...stop trying to predict it.'

'We are living through a revolution and we have no idea where it is going,' he suggested. He used the term 'information bewilderment' to explain further.

Capitalism, he argued, relies on the creative destruction of industries in waves of activity. This is exciting for those on the creative side but scary for those on the destructive side (i.e. the newspaper and music industries).

Obsolete business models are under threat and everyone at the conference is affected, he warned. In the digital age, 'disruptive innovation' is a feature and a way of cutting out the 'middle man' to create profit.

He cited Amazon Kindle Singles as an example, whereby they invite authors (previously published or unpublished) to publish shorter articles (longer than a magazine or journal article but shorter than a novel) as an e-book on the Amazon Kindle platform.

Prediction is futile but you can measure changes. Complexity is the new reality, and the rise and rise of user-generated content offers numerous opportunities for end users to 'cut out the middle man' (i.e. publishers).

In the old ecosystem there were big corporations, while the new ecosystem relies on everything being available in smaller chunks of content (tracks not albums, articles not journals, etc).

What's it got to do with libraries?

There is an intrinsic belief that libraries and librarians do good work but a wave of 'creative disruption' doesn't care. Libraries have traditionally taken a physical form and one of the debates has been about how to maintain the idea of a 'library' when users are increasingly accessing content online. When all academic activity takes place in a digital environment (soon?) how will libraries justify their existence (from place to space)?

John Naughton ended his presentation by suggesting librarians could add value by building services around workflows (social media, RSS feeds etc), as the everyday avalanche of data cries out for the skills of the librarian to create order.

'The best way to predict the future is to invent it.'

Sounds like good advice for those of us in publishing too.


Thursday, April 07, 2011

Collections 2021: The Future of the Collection Is Not a Collection

Rick Anderson invites us to think back to the 1980s and ponder how we answered the questions that arose - say you wanted to know the population of a certain small country, or the migratory pattern of a certain type of whale. In the '80s you had a couple of choices: either you wouldn't try to answer it, or, if you were lucky, you would go to a good library and look in a relevant book. However, access to a good library was an elite opportunity, and most people in the world didn't have one.

Anderson calls this pre-internet age the "Gutenberg Terror". Print, he says, is a terrible medium for distributing information. During the Gutenberg Terror the library was an information temple with the librarian as high priest to grant the sacred knowledge. Now the library is a store front - one of many store fronts offering access to information at a price.

As such, many traditional librarian roles have been undermined. Reference services are largely bypassed (although Anderson points out that with a reference desk staffed at various times by one or more of just 25 librarians serving over 30,000 students at the University of Utah, the chances of even a small percentage of students getting constructive help were always small. "It works by failing").

Library catalogues are all incomplete. An OCLC survey of the general public showed just 1% of electronic information searches begin at the library catalogue. Perhaps not surprising, but when college students were surveyed the number rose to just 2%. Initial circulations per student have also decreased dramatically at Utah.

The traditional library collection is a "bad guess at patron needs", but it was all that could be done in the print era. With recent budget cuts even electronic collection development is becoming hard to defend. The wrong e-book, even at a deep discount, is still the wrong book, and Anderson doesn't want unused books in his library. If 80% of an e-book package is used and only 20% not, that's still a number of books that he could have replaced with something his patrons would use.

And so to Patron Driven Acquisition, the purpose of which is to avoid wasting money by buying books no one wants. Anderson sees PDA as the alternative to building collections, and makes some predictions about how things will be in ten years time:

  • PDA is the new assumption, although it's not the only way. The collection service will be mainly a conduit service, building only limited permanent collections.
  • The smart phone is the killer delivery app. While few people want to sit down for two hours and read a novel on a small device, many of us have lots of blocks of 10-15 minutes in which we will happily read something that's conveniently available.
  • Most academic print acquisition is print on demand, thus avoiding the major waste of print runs based on guesswork.
  • Most search is done on primary documents rather than proxies such as the library catalogue.
  • It is difficult to distinguish library services from other educational services.
Different types of library will build different types of collection:
  • Big collecting libraries such as Oxford, Harvard, LoC will maintain "monuments to Western civilisation"
  • Local research institutions will have smaller and more specialist collections based on curricula
  • Less well funded liberal arts and community colleges will be conduits and will rely heavily on Google Book Search and just-in-time delivery.


The stumbling blocks to this will be:
  • Sclerotic librarians. There are difficult conversations to be had about change.
  • Traditional accreditation structures - counting books on shelf to assess the worth of the library.
  • Fainthearted publishers - justifiably so. PDA will put some publishers out of business. You can't make as much money selling just what people want as you can by selling them content they don't want bundled with that which they do.
  • Customer-focused competitors for patrons' time, such as Google and Amazon. These competitors aren't interested in helping people find good information in the way that libraries are, but they're quick and convenient.

Q: What about undergrads who don't know what they need?
A: PDA doesn't mean no holds barred access to anything at all - the library needs to put some constraints in place. Librarians have traditionally overestimated the influence they can have on teaching students how to be good researchers - professors have this responsibility and it should be taught in the classroom with support from librarians.

Wednesday, April 06, 2011

Squatting in the Library? Visitors and Residents at UKSG

The most tweeted soundbite from Andy Powell's opening presentation for day two of UKSG was 'content should be of the web, not on the web'. This nicely sums up several of the comments and observations of plenary speakers over the first two days. People, we need to be nicer to our content and help it grow to reach its real potential. Content, I'm sorry, you are going to have to work much harder! It was interesting to note that Cliff Lynch was making very similar points to Andy on the other side of the pond, at CNI2011. The well known work 'The Machine is Us/ing Us' demonstrates this well.

Andy drew on a now seminal piece of work on 'Visitors and Residents' by Dave White of Oxford University. This describes how we interact with online environments - with Residents spending a significant portion of their time online, building a known profile/identity and contributing to the process of creation online. Visitors are more likely to use the web as a tool, dipping in and out to find specific resources or answer specific questions. Both groups are equally valid, but we need to be able to cater for both. To date, digital libraries have tended to focus purely on the Visitor.

So what does this mean for those offering services using an online medium? The changing focus on Residents helps us to move from a controlling 'this is you and this is what you are allowed to do' attitude to one where the user has more control - 'this is me and what I have done'. Andy posits that social interaction is what creates content, and queries whether the approach of institutional repositories works in this context. Few researchers see value in depositing or providing metadata, which has led to a culture of mandates - a very controlling approach that does not support the user.

One of the ways we can make some simple but significant changes for our content and services is to exploit the potential of URIs. URIs support the notion of 'of the web', not 'on the web'. Instead of saying where something is, they say WHAT something is. This moves us closer to the concept of intelligent, linked data and helps provide a platform for the type of social activity that is building content.
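
As a toy illustration of that 'what, not where' idea (all URIs and facts below are invented), here is how statements made independently by different services can link up once they share an identifier:

```python
# Toy illustration of 'what, not where': independently made statements
# attach to the same URI, so data about one thing can be merged and linked.
# All URIs and facts here are invented for the example.

triples = set()

def add(subject_uri, predicate, obj):
    """Record a (subject, predicate, object) statement."""
    triples.add((subject_uri, predicate, obj))

BOOK = "http://example.org/id/book/moby-dick"   # identifies the work itself

# A library catalogue asserts one thing about the book...
add(BOOK, "title", "Moby-Dick")
# ...and a social site asserts another, reusing the same identifier.
add(BOOK, "liked_by", "http://example.org/id/person/alice")

# Because both statements share the URI, they can be queried together.
for subject, predicate, obj in sorted(triples):
    print(subject, predicate, obj)
```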

Andy notes that the environment and culture for openness does not simply appear overnight, but companies are beginning to exploit the benefits of linked data - the Facebook 'Like' button, for example, relies on linked data to achieve its aims, and that little button is appearing everywhere!

The takeaway from the talk? Probably a call for both libraries and publishers to think more about how their services are supporting Residents and not just Visitors to provide a truly effective service in an online environment.


An attempt to capture the OA debate between Alma Swan and Steve Hall

In the red corner: Alma Swan's vision for the future of scholarly communication, and how we might get there: researchers and others can have immediate, fully linked, reusable, repurposable, no-charge, no-barriers access to the corpus. Researchers and others wishing to access research outputs (datasets and grey literature as well as journals etc) should be able to roam freely through them, picking and choosing what they read and use. Technologies should be able to do the same thing, to help advance research. OA is the answer because: 1) access gaps for researchers will widen as library budgets are further straitened and big deals are cancelled; 2) professional, practitioner and educational communities and private citizens also have an interest in and need for this material; PubMed Central's stats show that 40% of usage comes from (what they call) "citizens". Immediate access is important - we must not delay the dissemination of knowledge.

In the blue corner: Steve Hall (IOP) says publishers provide services to authors, editors, readers and librarians, but above all to authors - registering, validating and disseminating research. Authors don't pay for these services; libraries do, through subscriptions/licences to journals. This has become a problem as research output has grown (particularly in e.g. China) while library budgets have decreased. The big deal still has a role, but multiple business models will be needed to maintain the existing level of access in academia, and to extend access elsewhere. Open access is two different solutions (green and gold) that can't co-exist. Green OA seeks to make research papers freely available without contributing to their costs. Embargoes allow publishers to recoup their investment but don't widen access; ditching embargoes is unsustainable. Gold OA does contribute to the costs of publishing, and makes content immediately and widely available, but funding is haphazard at the moment - we need a collaborative community response. It's still not clear whether OA will deliver savings (Hall dismisses Houghton's study and points to a more "real world" RIN one). IOP is going hybrid and will take fees from gold OA into account when setting budgets and pricing.

Alma agrees with Steve on the issue of scalability (research output is growing, we need a per-article model) but argues that green is not "unfunded", given the hidden costs covered by academic contribution. Steve responds that publishers charge for the management of the peer review process, not for the piece that is delivered by academia, and cites examples of journals that have seen their subscriptions cannibalised by green OA or delayed access. He argues that depositing manuscripts in a repository is not comparable to publishing, since these manuscripts don't have e.g. reference linking.

Alma says stats show that mandates very quickly achieve 60% deposit, whereas voluntary deposit is about 30%. In 5 years, with more policies, we will be approaching an acceptable level of OA. Several journals have made content freely available online but have seen subscriptions rise, because of broader international visibility. Physics publishers have co-existed for years with ArXiv - what impact has that had on subscriptions in the last 20 years?

Steve cites Harvard's mandate as having achieved only 20% deposit. Policies need to be global, and China, for example, is unlikely to adopt one. Publishers are quietly ready to engage with gold OA. Over half of Elsevier's journals are already hybrid. What gets in the way is the continuing fight over short-embargo green access. If funders keep pushing it, and publishers keep resisting, the stand-off does no-one any good. Better to bring a collection of stakeholders together to review how to facilitate gold OA more quickly. Re ArXiv: these are pre-prints (not peer reviewed - original author manuscripts) and very few journals are comprehensively covered. It is more difficult to make new sales of a journal if a librarian knows most of it is freely available elsewhere.

Alma says that there are several other universities to cite that counter the Harvard example. If mandates aren't working, why are publishers so keen to lobby against them? We cannot ask for gold mandates (no university will do it, not many funders will countenance) so need to pursue policies for green. Publishers will incur costs in transitioning to a per article cost system - each publisher has to decide whether that future is viable.

Unfortunately, not enough time was left for questions from the floor - I bet there were plenty :(


Tuesday, April 05, 2011

Goodbye Serials, Hallo Insights!

You've probably all heard the exciting news by now. Announced at today's UKSG AGM, our journal Serials has been renamed "Insights: connecting the knowledge community". This follows feedback indicating that the old title wasn't really doing justice to the wide range of topics we cover. The new title was the winning entry in a competition among UKSG members - big congratulations to our winner, Jane Harvell of the University of Sussex. She will receive her prize (an iPad!) at a presentation just before tomorrow's final plenary sessions.

More details at http://bit.ly/g8euj6 - feel free to share your thoughts, below!


Shhhh, Turn Off Your Phone!

James Clay starts his session by encouraging us all to turn on all of our devices, say hello to the person sat next to us and generally have a conversation. How nice :-) He lives online at elearningstuff.net - I'd recommend you check it out if you haven't been there before.

The first mobile phone call (from a car) was made in June 1946! This is not a new technology; we have lived with it for a long time. The first proper handheld mobile phone call was made on April 3rd 1973. But it is only in the last couple of years that we have really begun to exploit the potential of mobile devices. The iPad sold 14 million units in the first 9 months of its life - mobile is everywhere.

James goes on to show how all new developments have been treated with scepticism (and btw gives an excellent example of how to actually give a 'history of' at a conference). The 'evil' slate, pen and calculator have all been criticised. We do things because they have always been done that way - resistance to change is normal. So we put up signs that say don't use your phones (or don't swim) to try and control change - but what problems do phones *actually* cause in a library?

The Culture of NO is a big problem in libraries today. When you see a big sign saying DO NOT, it is human nature not to respect it. It is much better to talk to learners and help them respect their environment rather than dictate and direct. A YES culture is a much better place to be.

So what CAN we do with mobile devices in the library? My contributions are some fantastic ideas from thewikiman and Jo Alcock. Other ideas:
  • Use the web. Sounds obvious, but very necessary. Unfortunately, there are very few journal or eBook platforms that are well developed for browsing on phones, or even small notepads.
  • eBooks!
  • Communication.
  • Collaboration. AudioNote is a great example of tools that can be used in this way.
  • QR codes - dotted around the library to provide extra information (see the sketch after this list).
  • Augmented reality - layering information over images within the library. It's a great way to upsell the resources.
  • Barcode scanners - scanning the barcode of a book in WHSmith to see if it is in the library.
  • Making notes - with tools like Evernote.
  • Using tools like Google Goggles to find more information about a statue, a picture, a resource.
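
On the QR code idea above: generating the codes themselves is trivial. A minimal sketch, assuming the third-party Python qrcode package (which uses Pillow) is installed, and using a made-up URL:

```python
# Minimal sketch: generate a QR code that could be printed and dotted around
# the library. Assumes the third-party 'qrcode' package (which uses Pillow)
# is installed; the URL is a made-up example.
import qrcode

img = qrcode.make("http://library.example.ac.uk/guides/finding-journals")
img.save("finding-journals-qr.png")
print("Wrote finding-journals-qr.png")
```
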
It is, of course, not that simple. In FE there are significant concerns about safeguarding and the permanency of the web. This is often used as an excuse - James describes the role of the Innovation Prevention Department, a well-known department in all academic institutions!

Cost is a real issue. Mobile should be about enhancing the service you already provide; it should not be exclusive and discriminatory towards those who cannot afford expensive devices. James also bravely states that eBooks will never replace real books :-)

The digital divide is a real issue (a la Andy Powell's talk this morning). As is connectivity (conference wifi anyone?). The pace of change also makes it difficult for libraries to keep up with changing devices, skilling staff etc. Prioritising is an important focus here.

Despite all of these issues, James is a clear believer in using mobile devices in libraries - and he encourages us all to think about just one way in which we could too.


An introduction to ORCID

ORCID (Open Researcher & Contributor ID) started out as a CrossRef initiative that then flew the nest, with the support of Nature and Thomson. It now has stakeholders including funders, researchers and librarians. Geoffrey Bilder, our speaker today, has been seconded from his day job at CrossRef to be the technical director at ORCID.

The general problem: identity is cheap
The problem at the heart of ORCID's being is that, on the internet, identity is "cheap" - it's easy to create multiple different profiles in silos on different sites, leaving every site with a fragmented view of you.

The problem in scholarly communications
The scholarly record is built on understanding the provenance and 'network status' of content. Publisher brands are based on the 'provenance infrastructure' (credentials of author, editorial rigour, peer review, citations). Both CrossCheck (another CrossRef initiative) and ORCID are key to the credibility of the author, although note that it's not just about authors - it refers to "contributor identifiers" to acknowledge all the other roles. One person (one ID) can contribute in lots of different ways (author, reviewer, programmer, compiler) and can have relationships to other IDs (edited by, co-author, colleague etc).

The knowledge discovery problem: name ambiguity
ORCID is about knowledge discovery, rather than access control or security - about people publicising their work, but ensuring it is credited accurately. The main issue is name ambiguity: name variations, name "collision" (multiple people with the same name, e.g. the other Geoff Bilder, a Canadian para-ski-glider), name changes, name translations, corporate authors... all complex problems that must be resolved for accurate crediting within the scholarly literature. ORCID's mission is to solve this problem through collaboration; various systems exist - economists use RePEc's author claims service, some countries have national databases of researchers - but regional / disciplinary / institutional silos are unhelpful in our networked age. Aspects of identity can be claimed by individuals or asserted on their behalf by institutions; ORCID recognised it needed to bring both organisational and personal assertions together to seed its system, as neither level by itself would ensure sufficient uptake to make the service useful.
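
A toy sketch of why an identifier beats a name string - every name, identifier and title below is invented:

```python
# Toy illustration of name ambiguity: grouping works by author name both
# conflates different people who share a name and splits one person across
# name variants. Grouping by a contributor ID does neither.
# All names, identifiers and titles are invented.

papers = [
    {"name": "J. Smith", "contributor_id": "id-0001", "title": "Paper A"},
    {"name": "J. Smith", "contributor_id": "id-0002", "title": "Paper B"},
    {"name": "Jane Smith", "contributor_id": "id-0001", "title": "Paper C"},
]

def group_by(field):
    groups = {}
    for paper in papers:
        groups.setdefault(paper[field], []).append(paper["title"])
    return groups

print(group_by("name"))            # two people merged, one person split
print(group_by("contributor_id"))  # each person credited exactly once
```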

Principles and progress
ORCID's ten guiding principles (http://www.orcid.org/principles) demonstrate the organisation's non-partisan, international, open approach. The board is made up of "anyone who can commit the time and wants to participate". So what have they done so far?
  • Thomson donated the codebase for its ResearcherID system to help jumpstart ORCID
  • Various functions were added to this for ORCID's alpha prototype - Thomson's system was based on personal "claims", so the organisational layer had to be added
  • Now working out last details for licensing the codebase to build a phase I version of the system
  • And planning for future sustainability (funding / staff)
  • Hoping to have something that people can use next year
Questions:
  • Q: Authors are allowed to create profiles - how can IDs remain unique?
    A: Authors cannot change the identifier, only the information associated with it.
  • Q: The contributor ID could become increasingly complex - how do we define where 'contribution' begins and ends?
    A: We will studiously avoid defining that - it's going to evolve. But the answer is essentially that people will record what they think is important, and if it's not important, it won't be counted for much. [Given that people will have to take the time to enter this data, they will likely only claim credit for things that are useful / important]
  • Q: How will this fit with the requirements of REF?
    A: It's not clear where REF responsibilities will sit but hopefully ORCID will make the process of gathering information easier.
  • Q: Pseudonymity?
    A: A lot of this information is public already, but in aggregation it's more powerful. What if it becomes too easy to find details about stem cell researchers in Alabama, or animal science researchers in Oxford? People do have good reasons to want to hide information - even just if you want to be credited for peer reviewing without it being public. ORCID will allow any or all information except the identifier itself to be hidden.
  • Q: What is happening with the development of IDs in different countries?
    A: It would be a bad idea to think "ORCID's coming, let's stop working on our system". Other systems will continue to exist and be important. At minimum, ORCID will be able to include information about other relevant identifiers.
  • Q: What work will be involved for publishers?
    A: A classic example: a researcher submitting a manuscript currently fills in all the information each time, and that information quickly becomes stale (e.g. contact data). In future, they will upload their ORCID, and publishers can query and recheck information as necessary.
  • Q: Who will be the arbiter of who will be attached to a work as a contributor?
    A: For example, the corresponding author will have more credibility in saying who else contributed.
  • Q: Disambiguity of affiliations?
    A: We may integrate with e.g. Ringgold to create a controlled vocabulary for organisations.
  • Q: What are the data protection issues?
    A: We are transparent about what is being revealed, to whom, and we give authors control - they can make anything except the identifier private.
  • Q: What's the long term funding plan?
    A: Exactly. The technology doesn't matter if we can't sustain an organisation to keep it running. We are looking at future models, from related service provision to membership.


Curated tweetstream: what our audience said about Charles B. Lowry on the economic crisis

Here's another first for LiveSerials - rather than writing a report on the session by Charles B. Lowry, Executive Director of the Association of Research Libraries, I thought I'd give you a snapshot of the tweeting that took place throughout:

Setting the scene:
  • bookstothesky Telling wordcloud from Lowry re library budgets: key words emphasized are budget, reduction(s), cut, reduced :-/ #uksg

On US vs UK library budgets:
  • jharvell Does that mean that 10 universities in the us had library budgets of over 40 million dollars before the cuts? #uksg #didimisssomething?
  • chriskeene @jharvell and the lowest category was 'libraries with budget less than $20million'. different world!
  • jharvell Don't get me wrong those big budgets are brilliant. Brilliant. But my gob hasn't closed for the last 5 mins. #ineverknew #uksg
  • jharvell With the amount of money available in US budgets why are publishers even bothering listening to us in the uk #uksg
On the other hand:
  • charlierapple Decreasing budgets are the new norm, not an aberration, with consequences for teaching and research internationally #uksg Lowry
  • ORourkeTony @charlierapple #uksg I heard someone say recently that flat was the new up!
  • MelindaKenneway Time to head to Canada by the looks of things - they seem to be the only libraries left with budget. #uksg
And finally ...
  • antet Not sure I like the detached phrase "reduced commitment to human resources" #uksg


Nurturing innovation, or why we need to kiss more frogs

Sir John O'Reilly, VC, Cranfield University, promises to touch on a number of different topics, seemingly randomly. Which is a handy excuse for a blogger - any randomness that follows was 'im, not me. Sir John is speaking about rebooting UK HE, from the perspective of the VC of a wholly postgraduate STEM university (he gives yet another definition of the M in ST(E)M - in this case, management).

He starts by noting that, in recent news stories, "students" and "higher education" have almost entirely meant undergraduate - even HEPI's Higher Education Supply and Demand document was focussed on undergraduate demands and provision. HEFCE funding per student fell substantially through the 1990s, despite the government's recognition of the importance of the knowledge economy, and we now know that several universities will begin charging high-end fees from next year. Only undergraduate students can access the Student Loans Company's "favourable" arrangements; what will the climate be for postgraduate students?

Sir John uses Langton's Ant to demonstrate emergence, as a cipher for the complexity of the higher education system: a simple algorithm at the heart of a complex system that generates unpredictable behaviour and unintended consequences. For example, the line between teaching and research is convenient but arbitrary, and the two are symbiotic - weaken the research base, and the teaching will be poor. But likewise: weaken the teaching base, and the research will be poorer.
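
For the curious, Langton's Ant fits in a few lines; a minimal sketch:

```python
# Langton's Ant: a two-rule automaton often used, as here, to show complex,
# unpredictable behaviour emerging from a trivially simple algorithm.

def langtons_ant(steps):
    black = set()             # cells currently black; everything starts white
    x = y = 0                 # ant position
    dx, dy = 0, -1            # facing 'up' (y grows downward)
    for _ in range(steps):
        if (x, y) in black:   # on black: turn left, flip cell back to white
            dx, dy = dy, -dx
            black.remove((x, y))
        else:                 # on white: turn right, flip cell to black
            dx, dy = -dy, dx
            black.add((x, y))
        x, y = x + dx, y + dy  # move forward one cell
    return black

# After roughly 10,000 steps of apparent chaos, the ant settles into a
# repeating 'highway' - behaviour nobody would guess from the rules above.
print(len(langtons_ant(11000)), "black cells")
```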

Drawing this together, Sir John asserts that the changes in funding of postgraduate education may have the unintended consequence of weakening the research output of our universities - which in turn will weaken innovation, which will in due course weaken the economy. "The princess and the frog" can be used as a metaphor for knowledge transfer; to find a prince, you have to kiss a lot of frogs. We must be careful not to disrupt inappropriately the research and innovation agenda and our ability to address it in England. "We are in the business of kissing frogs, to ensure that the future generation has its princes of wealth creation." As Jane Harvell, one of the UKSG tweeters, puts it, "all this is not news for us working in HE, but it does need saying, and is best said by a Vice Chancellor."


The future is open (thanks to metadata)

Rufus Pollock from Open Knowledge Foundation tells us how metadata can and will be more open in the future, and why we should care.

Libraries and publishing used to be mainly about reproduction of the printed word - access and storage too, but mainly reproduction. Once upon a time reproduction was very costly; people needed to club together and form societies in order to afford it.

Now we're matching, filtering and finding, but there's too much info, and every password you have to enter slows you down - and slows down innovation and innovators. Matching is king in a world of too much info: Google's aim is to match people with information, and it all relies on humans making the links and building sites. Imagine if they'd had to ask permission of every single person - we would have missed out on something big.

Of course people have to be paid, machines have to run, etc. BUT much of this production is already paid for, i.e. via academia itself: instead of using the same few favourite books, why not ask friends? Or create our own journals?

Data and content are not commodities to sell but platforms to build on... there are plenty of ways to make money without going closed (although it might be different people making the money of course!)

And why does metadata matter so much? It's the easy way in; everything attaches to it: purchasing services, Wikipedia, analytics such as who wrote it, how many people bought it, etc.

Data is like code and so the level of re-use, and the number of applications we can create is huge.

One such project is JISC OpenBib, which has three million open records provided by the British Library. It integrates with Wikipedia, and includes a distributed social bibliography platform so that users can contribute, correct and enhance. We need to harness users to help us make much better catalogues, and to enrich catalogue data.

So metadata is the skeleton, and right now we have the chance to make a significant change for the better. Metadata and content WILL all be free one day... it may take some time, but it will happen. The day is coming when there won't be a choice. There will be enough people with open data to make it happen.

Do the math: PDA not the answer?

Yesterday we talked about filtering; today, Terry Bucknell suggests, we're looking at the opposite - buying by the bucketful. A multidisciplinary university like Liverpool (where Terry is e-resources manager) buys a wide range of content - for example, 19 ebook packages in 5 years. These have traditionally been bought with leftover budget at the end of the financial year (n.b. Liverpool decided ebook packages are better than journal backfiles here). As "leftover budget" becomes a thing of the past, Terry is analysing the value from these packages in more depth and always considering alternative purchasing models - individual title selection, patron-driven acquisition, etc. Here are some highlights of Terry's analysis:
  • 40% of Liverpool's e-resource usage is e-books - yet 95% of the budget goes on journals.
  • Liverpool's usage is typical - approx 40% of titles in a collection are used in the first year; approximately 60% have been used by the second year
  • Some subjects (e.g. mathematics, at Liverpool) seem to perform badly - is this a factor of how/when information is used in different disciplines? need to be careful before making collection development decisions based on this data
  • All types of books get used at least a bit, but some content (e.g. conference proceedings) is used more than other content (e.g. monographs)
  • Pareto principle applies! 80% of downloads from top 21% of ebooks - Terry doesn't think this should be a factor in how collections are purchased / priced. (Looking more closely, 35% of usage on one platform came from one title! - doesn't tell you anything about the broader collection, just that some books are heavily used)
  • Even on aggregator platforms (where there's a greater level of individual title selection than a publisher package), a third of ebooks have had only 1 or 2 accesses during 2 years
  • With patron-driven acquisition, all ebooks are used (because you don't buy them unless they are) - so it should be better value? Terry used the ebrary model (purchase triggered by 10 page turns / 10 minutes in a title / copy & pasting / printing) to analyse Liverpool's Springer ebook usage stats and calculated that PDA costs would overtake package costs in just one year in most cases (even when cheaper backfiles were excluded from analysis); a toy version of this comparison is sketched after this list.
  • Evidence from elsewhere (e.g. U Iowa ebrary pilot) also shows that PDA budgets run out quickly - libraries who started trials had to resort to buying packages after all
  • ... other PDA models are available ... (and may show different results) but Terry found that a PDA model would have to allow for "6 chapters free" before it would be comparable to package pricing.
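
Here is the toy version of the package-versus-PDA comparison mentioned above. Every price, trigger threshold and usage figure is invented - real PDA models vary by vendor - but the arithmetic shows how quickly per-title purchases approach a package price:

```python
# Back-of-the-envelope comparison of package vs PDA costs. All prices,
# trigger rules and usage figures are invented for illustration.

PACKAGE_PRICE = 20000.0    # hypothetical one-off price for a 1,000-book package
PDA_PRICE_PER_BOOK = 60.0  # hypothetical per-title purchase price
TRIGGER_PAGE_VIEWS = 10    # a purchase triggers after this many page views

# Simulated first-year page views per title: most titles see little or no
# use, a few see heavy use (the Pareto pattern noted above).
usage = [0] * 400 + [3] * 280 + [12] * 200 + [150] * 120

triggered = sum(1 for views in usage if views >= TRIGGER_PAGE_VIEWS)
pda_cost = triggered * PDA_PRICE_PER_BOOK

print(f"Titles triggering a PDA purchase in year one: {triggered}")
print(f"PDA cost £{pda_cost:,.0f} vs package cost £{PACKAGE_PRICE:,.0f}")
# Usage accumulates in later years while a package is bought once, so PDA
# spending can overtake the package price soon after year one.
```
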
Experiments like this can give libraries and publishers an idea of what is a fair price to pay for an ebook package; Terry's conclusions:
  • Some packages are better value than others, and libraries should prioritise these in collections
  • Aggregated databases give cheap critical mass
  • Single title selections are important for core texts
  • PDA can fill the gaps, but not form the foundations
The implications for libraries:
  • Need to centralise book budgets - stop fragmenting by formats etc - a hard sell for lots of faculty / librarians
  • Rapid move to e-only book acquisition - implications for logistics / staffing
The implications for publishers:
  • Packages need to be at least 50% discount for it to be worth it for the library - make it a "no brainer" (70% discount) for the library to purchase, and you'll solve the budget crisis
  • Offer combined books / journals packages with appropriate cost weighting / discounting.


Monday, April 04, 2011

The Gatekeeper is Dead. Long Live the Gatekeeper!

In 1994 Cameron Neylon's PhD supervisor told him he needed to spend half a day a week in the library reading the new journals. Five years later when he finished his doctorate he had Google. This, he says, makes him part of the last generation to remember the library as the place that you go to access information. The last generation to think of journals as paper objects. The idea of physically searching a paper index is almost a joke.

Neylon defines his audience this afternoon pretty neatly as "the people who have to deliver for research and education, and also have to add value". In this task we - and researchers - have a shared problem: there's "too much stuff", information overload.

However, Neylon takes issue with Clay Shirky's statement that "it's not information overload, it's filter failure". This isn't a good way to think about it, because filters block. But surely it's good to block out the stuff no one wants from the deluge of information they can't deal with, to apply standards, etc? Neylon suggests that these filters that we apply actually limit the researcher's ability to explore.

Filters are a problem when the researcher doesn't know what they are blocking and why. They can be useful, but the researcher needs to be allowed to choose the filters that they want to apply, and doesn't want publishers or librarians applying the filters for them. Google allows you to set your own filters.

Neylon gives the example of a chemistry paper that claimed to show something fairly revolutionary. Within hours of publication the experiment had been recreated by several researchers and proved wrong - one of the samples was contaminated. The paper was retracted and labelled as such "for scientific reasons". The reaction of the unintended chemicals is of interest to many, but it's been retracted with no explanation. Failed experiments are as useful as successful, but they don't get published - they are filtered out.

The number of retractions is going up. That's a lot of failed experiments that could be useful to someone's research. Researchers don't know they are repeating failed experiments, but they could.

The gatekeeper was needed in a broadcast world - expensive printing and distribution needed centralising. Decisions needed to be made about what to publish and what to collect. The current flood of information is the "central research opportunity of our age".

"Every book its reader" - Ranganathan's third law of libraries. Filtering is not adding value. Rather than filter failure, Neylon believes we've got a "discovery deficit".

People can be the filter - social aggregation, annotation, critique. A network of linked objects - blogs, tweets, RSS feeds can all be found using Google, and come together to be the researcher's own personal collection. Neylon doesn't want a collection that has been chosen for him by someone else - he wants to choose his own filters from those that are available to him.

Neylon's closing advice: We need to connect people with people so they can build discovery systems. Enable, don't block. Build platforms not destinations, sell (provide) services not content. The content game is dead. Forget about filtering and control and enable discovery.

Q: Are there any publishers who are currently enabling?
A: Most are making some effort, but we need to think about how to make them more effective. We're feeling our way together.

Q: There's a lot of rubbish on the web - are you saying publishers should be publishing this too?
A: No - there's nothing wrong with authorities labelling things as trustworthy - the problem is that no one publishes all of the experiments that didn't work. Publishers can mark up, validate, etc. Just don't block the other stuff.

What scientists really want from digital publishing.

This section of the conference allowed librarians and publishers to hear directly from scientific researchers; first up is Philip Bourne from the University of California, San Diego, who is a computational biologist among other things (e.g. an open access advocate).

Bourne starts by explaining his big hope for scientists’ relationship with publishers in the future:

“as a scientist I want an interaction with a publisher that does not begin when the scientific process ends but begins at the beginning of the scientific process itself”

The current situation is:

  1. Ideas
  2. Experiments
  3. Data gathering
  4. Conclusions - it's at this stage that the publisher comes in

But why couldn't the publisher come in at the data stage? They could help store it for our group. Or even earlier, at the ideation stage: the moment I jot down a few ideas, the publisher could control access to that information, and then at some point down the line the access is opened up - that's when it becomes 'published'.

There are movements in that direction. For example, in Elsevier's ScienceDirect (and some others too) you can click on a figure/image and move it around and manipulate it - the application is integrated on the platform because a publisher and a data provider have cooperated. But this is just the beginning: when you click on the diagram in the article, you get some data back, but it's generic - the figure is being viewed separately from the article text and related data, and now you have to figure out what that metadata means to the article. So this is a good step, but it's not capturing all of the knowledge that you might want. It needs more cooperation, and more open and interactive apps. And it needs:

  • Integrated rich media that improves comprehension, viewable in different ways - a video of an experiment actually being done, delivered to me alongside the text from the article.
  • The ability to review and interact with data on the mobile platform. There should be apps not just for reading but also for interacting with data.
  • Mashups with content from other articles and data - these must be possible at the point of capture, not post-anything.
  • Semantic linking of data that can lead to new knowledge discovery: to find all references to a piece of data (the data itself is probably not cited), to see how the actual data is being used, and to discover relationships that other people have found between your data and other sets of data.

So Bourne wants publishers to become more involved with his work as – he confesses – some of the work is less than organised. He thinks scientists need help with management of data in general, and specifically:

  • Project management. They use e.g. Basecamp for project management, but email folders are primary - this is an unhealthy 'hub and spoke' situation.
  • Content management. It's a mess, with content stored all over the place - slides, posters, lab notebooks etc.
  • Managing negative data. They generate far more negative data than positive, and negative data is important. But you can't find it - it stays hidden. This needs to change.
  • Software. All the software they create is open source, but when the grad student who wrote it leaves, it's lost.


Solutions?

Bourne’s ‘Beyond The PDF’ workshop has generated discussion and ideas. He says “the notion of a journal is just dead – sorry. The concept of a journal is lost to me; its components and objects and data are what I think about. Research articles are useful but the components could be seen as a nanopublication.”

We need more:

  • Semantic tagging of PDFs and beyond
  • Citation ontologies
  • Scholarly HTML - to write these workflows
  • Authoring tools

Microsoft are looking at some of these things already, and Bourne's group has written plugins for Word - e.g. as you type, the plugin auto-checks various ontologies and may suggest you change a common name to a standard name. You can tag that at the point of authoring.
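
A toy sketch of that kind of authoring-time check - the two-entry synonym table below is hand-made for illustration, not a real ontology:

```python
# Toy sketch of an authoring-time ontology check: flag common names in a
# draft and suggest the standard term. The tiny synonym table below is
# hand-made for illustration, not a real ontology.

SYNONYMS = {
    "fruit fly": "Drosophila melanogaster",
    "baker's yeast": "Saccharomyces cerevisiae",
}

def suggest_standard_names(text):
    """Return (common name, standard name) pairs found in the text."""
    lowered = text.lower()
    return [(common, standard)
            for common, standard in SYNONYMS.items()
            if common in lowered]

draft = "We repeated the assay in fruit fly larvae."
for common, standard in suggest_standard_names(draft):
    print(f"Consider replacing '{common}' with '{standard}'")
```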

All of this is not yet a huge success, but it's coming. Right now there is not much incentive, but if publishers can help fast-track the development of these applications then authors will start using them. There's no use just talking about it - it will only be on researchers' radar when they see science done in a way where this process has made a difference. For example, Bourne's group is running a test to look at spinal muscular atrophy (designated by the NIH as treatable). They will coalesce a set of disparate tools and engage the publishers (Elsevier have opened this up), in order to address a specific problem that could change lives.

If this works, it would attract the kind of attention that makes scientists take notice. Only when they see this process succeeding will they start adopting it. The tipping point will come when the tenure 'reward system' starts to change; for the next generation, the way science is researched will improve.

From Tortoise Shells to Tweets - The Future of the Book

Oh dear, Skip Prichard starts his talk at UKSG2011 by praising the bloggers at UKSG - and it's my turn to blog. No pressure Nicole, no pressure! Skip starts by reminding us that it is not worth talking about the 'Future of the Book' without looking at its past.

The earliest known form of writing is some scratchings on a tortoise shell from 6000 BC, found in northern China. From there Prichard takes us on a rapid tour of the familiar names from the history of the book - the Art of War, Gutenberg, Caxton, the penny dreadful. All of these developments had a framework of publication around them that we can reflect on in the same way when we think about digital books - and Ingram Digital has a vested interest in thinking about this. Some of these examples may confuse form and function somewhat, but I think the main point is that we have been consuming 'writing' in various ways for a long time.

So, what is the definition of 'book' in today's age? Is the move from print to digital any different than the move from scroll to bound text? Prichard highlights some trends around the digital move:

1. Shifting Market.

This describes the move from physical stores to online sales. It reminds me of a recent (personal) blog post I wrote about the recent closure of libraries. The shift is being supported by the growing use of appropriate devices: the predictions for 2011 are that there will be 14.7 million e-readers and 44.6 million tablets in use.

Prichard also predicts that academic libraries in the US will be 80% e-only by 2020 (seems quite slow to me!).

2. Generational Shift.

Schools are using a mixture of modern devices like the iPad alongside traditional books that have been in use in the classroom for years. There is a significant change in language - text speak is affecting teenagers' learning across the board.

3. Enhanced eBooks.

Moving beyond merely trying to deliver the print version of a book in digital format. Lonely Planet's new travel guides are an interesting example of this - I'd note they have to be, as they fight for their market against user-generated content on sites like TripAdvisor. The book's container is changing; the book itself must change to keep pace.

Prichard poses some ideas of where we might go with this:
  • Could we use biometrics to change the ending of a book based on your mood?
  • Could your car remember where you were in a book and start reading to you when you start a journey again?
  • Could locations used in a book change based on where you physically are?
  • Could books interact with each other more, e.g. viewing other people's underlinings on Kindle?
More practically, MyiLibrary is using usage algorithms to predict whether a print or digital copy would be more appropriate for a specific library.

Prichard closes by saying that print on demand has to be the future of publishing - it reinvigorates the supply chain, it's green, and it's user-appropriate. He does not see print vanishing - and reflects on the failure of the 'paperless office' as an example of why printed books will not disappear. There was quite a bit of disagreement on Twitter about this... but I wonder if we think about students printing articles / photocopying book chapters as part of the 'print' process. We might not BUY print, but print consumption will always be a personal choice.

Naturally, the audience is not going to let a publisher get away scot-free with giving a presentation without identifying some of the ways in which publishers are NOT helping the shift to digital. The poor business models for ebooks were highlighted, with prices often higher than print, making them inaccessible. The JISC eBook Observatory project has carried out some interesting work around this concept.

Following a question from Peter Burnhill, Prichard notes that the solution will not come from one part of the industry - we should put the pavements where the students choose to walk.

Starting UKSG in a state of ... informed bewilderment

"The future is already here, it's just not evenly distributed." John Naughton (Observer columnist and OU academic) opens with this striking William Gibson quote, reminding us that you only discover new things if you know where to look and are willing to pay attention. We're in a state of "informed bewilderment", with no idea how the internet revolution will pan out, so should stop trying to predict the future, and pay attention to what's already here.

Dissolving value chains
This is either an exciting, fulfilling & rewarding time ... or a traumatic experience that is likely to be destructive to some well-established businesses (even industries). The internet is "a vast global machine for springing surprises ... a phenomenal enabler of disruption." (Read Barbara van Schewick's "Internet Architecture and Innovation", Jonathan Zittrain's "The Future of the Internet - and How to Stop It" or John's own "A Brief History of the Future"). In programming terms, "disruption is a feature, not a bug". It cuts out the middlemen that have been such an enduring feature of our economy - journalists, travel agents ... and librarians, and publishers? "The net dissolves value chains" - where once journalism and classified advertising had a happy marriage, now one is helped by the internet and the other is disrupted. It's impossible to predict when open access will overtake closed access in the scholarly ecosystem, but the direction of funding etc indicates clearly where we're headed, so we need to pay attention to the existing change.

An increasingly complex ecosystem
The scholarly ecosystem has grown complex in its proliferation (of publishers, authors, institutions etc), and "for a system to be viable, it has to match the complexity of its environment." But there's not a single organism in our ecosystem able to match the complexity of our environment. Complexity is the new reality, and complex systems are intrinsically unpredictable. The banking crisis warns us of the dangers of being dependent on a system so complex that few people understand it, a system that is too complex to be modelled, too complex to be understood.

The avalanche of data in science
Librarians' functions have traditionally been determined largely by the physical aspects of materials and their housing. Value and roles were clear in the print ecosystem, whereas now, many students don't visit the physical library. Teaching, scholarship and research increasingly take place in a digital environment; librarians will need to move to where the action is ("from place to space"). The traditional information skills will need overhauling. Cornell's "Seven Steps of the Research Process" starts with encyclopedias and catalogues and only fleetingly, in step 5, refers to finding "internet resources". This already doesn't reflect how students behave. We need to adjust to new realities in science, which is becoming more data intensive. John closes by quoting Alan Kay: "The best way to predict the future is to invent it."
