Tuesday, April 13, 2010

Maximising use of library resources

Sue White and Graham Stone, from the University of Huddersfield, presented a two-phase project (although they emphasised that it is still a work in progress):
  • Phase 1: Looking at low/no use users
  • Phase 2: Linking use to student attainment, looking for evidence of impact and value, connected to the University Teaching and Learning Strategy

They identified three main indicators of use:
  • Access to e-resources, via log-ins to MetaLib (as they can see who users are, which isn't trackable in other usage statistics)
  • Book loans, through Horizon LMS circulation statistics
  • Access to the library, through gate entry statistics at the main campus library, which identify students via their ID cards
The results were sobering: figures for zero use are high, even in Schools perceived as 'good' library users.

They then matched usage data with the student record system (SITS) in order to get complete data for two cohorts of students on 3 year courses. More statistical analysis of the data is needed, but it suggests a clear correlation across all Schools between MetaLib logins and degree classification, and between books borrowed and degree classification. There was no correlation with gate entry figures, however, which may have been due to complicating factors like an extensive refurbishment programme and the location of other student services within the library building.
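
As a rough illustration of the matching step (not the project's actual method), the sketch below joins per-student usage counts to degree classifications and averages usage per class. The student IDs, field names and all figures are invented for the example, and students with no usage rows are treated as zero-use.

```python
from collections import defaultdict
from statistics import mean

# Illustrative data only; none of these figures come from the Huddersfield study.
library_usage = {
    "s01": {"logins": 140, "loans": 52},
    "s02": {"logins": 95, "loans": 30},
    "s03": {"logins": 20, "loans": 8},
    "s04": {"logins": 0, "loans": 0},
    "s05": {"logins": 60, "loans": 12},
}
student_records = {
    "s01": "1st",
    "s02": "2:1",
    "s03": "2:2",
    "s04": "3rd",
    "s05": "2:1",
    "s06": "2:2",  # no usage rows at all, so counted as zero use
}


def usage_by_degree_class(usage, records):
    """Join usage to student records and average usage per degree class."""
    grouped = defaultdict(list)
    for student_id, degree_class in records.items():
        row = usage.get(student_id, {"logins": 0, "loans": 0})
        grouped[degree_class].append(row)
    return {
        degree_class: {
            "mean_logins": mean(r["logins"] for r in rows),
            "mean_loans": mean(r["loans"] for r in rows),
            "students": len(rows),
        }
        for degree_class, rows in grouped.items()
    }


summary = usage_by_degree_class(library_usage, student_records)
for degree_class in ("1st", "2:1", "2:2", "3rd"):
    print(degree_class, summary[degree_class])
```

A table like this only shows association; as the speakers stressed, it says nothing about cause and effect.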

The project team have done more detailed analysis of 15 'low use' courses, focused on 3 year undergraduate courses delivered on the main campus, and excluding courses with fewer than 35 students (to avoid the possibility of identifying individuals).
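
The small-cohort exclusion is easy to picture as a simple filter. This is only a sketch: the course names and sizes are made up, and only the threshold of 35 comes from the talk.

```python
MIN_COHORT = 35  # threshold mentioned in the talk


def reportable_courses(course_sizes, minimum=MIN_COHORT):
    """Return the names of courses with at least `minimum` students, sorted."""
    return sorted(name for name, size in course_sizes.items() if size >= minimum)


# Hypothetical course sizes for illustration.
courses = {"Course A": 120, "Course B": 34, "Course C": 35}
print(reportable_courses(courses))  # ['Course A', 'Course C']
```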

The results still suggest a consistent link between e-resource use, book borrowing and student attainment, across all disciplines. There are outliers, like students who have obtained firsts but didn't appear to be library users, and some courses don't follow the pattern, e.g. where degree classification is influenced by book borrowing but not e-resource use. This raises some interesting questions: Are e-resources not relevant to the course? Is the tutor not advising students to use e-resources? Has the library bought the right e-resources? Do users know about them? Are students using Google to go straight to the e-resources, bypassing MetaLib?

This kind of project does raise some issues so Huddersfield's advice was:
  • Politically sensitive topic to investigate, beware offending tutors
  • Important to have support from senior management of university
  • Identify academic 'champions'
  • Need to acknowledge subject differences: there may be pedagogic reasons why some courses do not use resources the way a library might like
  • Not cause and effect relationship: not a case of 'borrow more books and get a better degree'
  • Be honest about findings eg university spent a lot of money on refurbishing the library but gate counts don't correlate with attainment
Huddersfield's academic librarians now have a mandate to go out to the Schools, to explore reasons for non/low usage on specific courses and develop an action plan. The action plans will cover:
  • course profiling
  • raising tutor/student awareness with targeted promotion
  • reviewing the induction process
  • embedded information skills training at point of need
  • targeting resource allocation (both information resources and staffing)
They will produce an Annual Resource Statement with Schools each year, laying out what percentage of the budget will be spent on books, journals etc., along with a list of resources to be cancelled, renewed or started that year. Progress will then be reviewed annually.

More information is available via the University's repository

[This session was also a useful complement to the discussion about metrics and return on investment raised by Carol Tenopir in the second plenary session on "Economics of Scholarly Information", which focused more on the library's impact on research and in particular grant income]


Wednesday, April 01, 2009

Plenary presentation summary: Journal Spend, use and research outcomes: A UK perspective on Value for Money. Presented by: Ian Rowlands, CIBER

During the second plenary session on Tuesday at UKSG, Mr. Rowlands presented some preliminary data from part of a Research Information Network funded research project. He is halfway through the project and will be continuing into next year. There are some very interesting visualization tools to explore the data online.

There has been an unprecedented growth in access to journal material over the past decade as content has moved from print to electronic. However, it is critical to assess the impact that this increase in access and availability of content has had over the past decade. Has it led to higher productivity and more innovative research?

In exploring the research outcomes, Rowlands is looking at many quantifiable criteria, including: number of COUNTER downloads, number of PhDs, number of grants, institutional spending patterns, and deep log analysis in a variety of disciplines.

It should come as little surprise to the community that the transition from print to electronic publication is nearly complete. 96.1% of science journals are online and 88.5% of arts and humanities journals are online. In 2007, the academic community spent £80 million on e-journal licenses. Collectively those purchases have yielded more than 102 million downloads, or roughly £0.80 per download.
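
For anyone wanting to check the arithmetic, here is the division using the two rounded figures quoted above. The result comes out just under 80p; the small gap from the quoted figure is presumably down to rounding of the inputs or the result.

```python
# Rounded figures as reported in the talk.
spend_gbp = 80_000_000
downloads = 102_000_000

cost_per_download = spend_gbp / downloads
print(f"£{cost_per_download:.2f} per download")  # £0.78 per download
```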

There has been tremendous end-user take up of these resources. The number of downloads doubled from 2004 to 2007. This represents a 21.7% per annum growth in downloads over that period. The core proposition of providing online articles is “very popular” among researchers.
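
A quick compound-growth check shows how "doubled" and "21.7% per annum" relate. This assumes three annual steps between 2004 and 2007, which is my reading rather than something stated in the talk. On that assumption 21.7% a year compounds to about 1.8 times, so "doubled" looks like an approximation.

```python
def annual_growth(multiple, years):
    """Annual growth rate that turns 1 into `multiple` over `years` years."""
    return multiple ** (1 / years) - 1


def compounded(rate, years):
    """Overall multiple after compounding `rate` for `years` years."""
    return (1 + rate) ** years


print(f"{annual_growth(2, 3):.1%}")    # 26.0% a year would be a strict doubling
print(f"{compounded(0.217, 3):.2f}x")  # 1.80x from 21.7% a year
```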

There has also been a rapid increase in the number of journals available at an average institution. The average number of titles per researcher is up from just above 4 to just below 8. {TC – Given the present economic environment it is likely these figures will decrease in the coming year, but they will certainly remain at a higher average level.}

Citation analysis is showing that users are drawing on more sources, and including more references per paper. The use of navigation and discovery tools, together with increased access, has created a situation where research is now more deeply founded in previous work.

University administrations are looking for clear and compelling justifications for the continued expense of information purchases and Mr. Rowlands thinks that compelling information is now available.

This change in availability has affected the information-seeking behavior of end-users. It is not surprising that Google is the “librarian's friend”. Many researchers are using gateways, such as Google, PubMed, etc., to get access to content. Examples of the increase in traffic abound. One OUP journal saw a two-fold increase in use as an effect of opening up its content to Google.

The access provided by online content is also having a profound impact on resource use. The convenience of 24 X 7 access is tremendous. 17% of activity is taking place on weekends and the “Working day is growing” with 1/3 of activity taking place outside of “normal office hours” of 9:00am – 5:00pm. This access was more difficult in a print-based world.

However, questions remain about whether efficient search is the same as, or necessarily yields, successful research. There is a strong negative correlation between the research rating of the scientists in institutions and the average session length on ScienceDirect. The most “successful researchers” were the group spending the least amount of time online with content. Trends pointed to the fact that the most successful researchers use gateways. Much more search activity is taking place outside the library, typically on services like PubMed, Google, and Google Scholar.

There was natural clustering of intensive use, and the figures for the differences between moderate, high and super users correlated significantly with outputs such as the number of papers produced, the amount of grant funds received and the number of PhDs the institution produces. In addition, while the average cost per download is consistent across institutions, the more active the institution, the less it paid per article.

Mr. Rowlands stressed that these data merely show associations, not causation. Nor do the data show any directionality. Is it that a lot of research creates demand for lots of information, or is it that research institutions put things in place for research, which in turn affects results?

The next stage of this research will look at historical information. Among the topics to be explored are the linkages between products, spending and outcomes. He is working to produce a computer model that shows, for example, scenarios for what effect an increase in the number of titles and/or downloads might have on research outcomes.

The initial information points to the fact that downloads and research outputs are like “gears on a bicycle” that move in tandem. As one gear gets bigger, the faster the other gear turns. Although one needs to understand the causality question, the understanding of the fact of the connection is a useful addition to knowledge about assessment and performance measurement.

{NB Disclaimer: Much of this summary is verbatim and/or paraphrased from Mr. Rowlands' talk. Very little in this post is interpreted and it should not be credited to me. Apologies to Mr. Rowlands for any errors.}


Tuesday, April 08, 2008

Q & A for Plenary Session 3

Q: I am an academic and I build robots. I do this because I want information, but you have described what I do as damaging. I do not threaten copyright, but it is difficult to download information responsibly from publisher websites.

A (Ian Bannerman): I'm sorry I gave that impression. Robots do distort statistics though. The type of usage you describe is legitimate and good, and there is work for publishers to do. But if it is measured as human use there is a real issue.

A (Richard Gedye, Oxford Journals (Chair of Session)): we are looking at this in COUNTER and the issues raised by federated searches etc.


Q (for Herbert Van de Sompel): Is there not a place for simple usable metrics for people to use?

A (Herbert Van de Sompel): They should not be simple but usable - we should know what they are about. We don't really know what they're all about. It is really early days in the study of usage indicators so we are trying to get a grasp on this issue. We won't get out of the two year project with a set of simple metrics to use, but we should have some ideas and some valid caveats about using them. There is a real distinction between a research project and launching a metric and being stuck with it for years.

A (Richard Gedye, Oxford Journals (Chair of Session)): Yes, we see Herbert's work as almost a Which? Guide to Usage Statistics that can be taken forward to build the type of simple metrics for practical usage.


Use and Abuse of Usage Measures - Ian Bannerman, Managing Director for Journals, Taylor & Francis

Ian Bannerman is offering a slightly contrasting view to the previous two speakers

COUNTER and the Usage Factor
Launched in 2002, COUNTER is an attempt to make usage data credible and countable. The Usage Factor was conceived by Herbert Van de Sompel and colleagues in 2006. An invitation to tender is now out.

Thomson Scientific Impact Factor: total cites for items published and total items published
Usage Factor: total usage of published items.

Implicit assumptions of usage statistics
  • Usage data is consistent, credible and compatible
  • Usage factor would be a meaningful indicator of something
COUNTER guidelines on filtering for robots and pre-fetching are in draft (release 3) - they will filter a list of known robots. Ian Bannerman does not think these go far enough though. Downloads may not be accurate either - not all downloads are successful or intentionally/human initiated. Also most known robots won't get past access control on subscribed content, but it's the unknown ones that distort the numbers (those within the IP range of a university - amateur attempts to mine data).

Ian has brought up an example article with a very large number of accesses at one Russian institution - accessed once every 9 seconds or so through some local error (COUNTER would ignore that); another example shows every article in a journal being accessed about 57 times by a Korean institution - it looks suspiciously like a robot, but the stats arrive 3 months after the event; a further example is an uncited article (and an obscure one) being accessed 1,183 times - not clear why! There is a lot of noise in the system and it's hard to identify or understand it all.
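
One way to picture the "every 9 seconds" case: a run of requests whose gaps barely vary is unlikely to be a person. The sketch below is a hypothetical heuristic, not anything from COUNTER's draft guidelines; the function name and the thresholds (`min_requests`, `max_jitter`) are assumptions picked for the example.

```python
from statistics import pstdev


def looks_automated(timestamps, min_requests=20, max_jitter=1.0):
    """Flag a series of requests whose gaps are suspiciously regular.

    timestamps: request times in seconds, in any order.
    Returns False for short series, where there is too little evidence.
    """
    if len(timestamps) < min_requests:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # A near-zero spread in the gaps means clockwork-like access.
    return pstdev(gaps) <= max_jitter


# Invented examples: one request every 9 seconds versus irregular reading.
robot_like = [i * 9 for i in range(50)]
human_like = [0, 40, 300, 310, 2000, 2600, 9000, 9100, 20000, 50000,
              50030, 61000, 70000, 70500, 80000, 90000, 91000, 95000,
              99000, 120000]

print(looks_automated(robot_like))  # True
print(looks_automated(human_like))  # False
```

A check like this would miss a robot that randomises its timing, which is rather the point being made about unknown robots.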

Is usage a meaningful indicator anyway?
Ian Bannerman cites Davis & Price (2006) [eJournal Interface can influence usage statistics: implications for libraries, publishers and Project COUNTER. JASIST v.57 n.9, 1243-1248] in showing the impact of an interface on usage which, he claims, is at odds with the meaning of usage statistics. In particular he talks about those journals that require viewing of full text HTML before downloading a PDF - COUNTER would count this twice at present! Bannerman adds that if people's careers relied on usage (rather than impact) of publications you would clearly have some issues here.
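
To make the double-counting concrete, the sketch below compares a raw request count with one that collapses HTML and PDF requests for the same article within the same session. The collapsing rule, the session IDs and the event format are illustrative assumptions, not a rule described in the talk.

```python
def count_full_text_requests(events):
    """Return (raw_count, collapsed_count) for a list of request events.

    events: list of (session_id, article_id, fmt) tuples,
    with fmt being "html" or "pdf".
    """
    raw_count = len(events)
    # Treat HTML + PDF of one article in one session as a single use.
    collapsed_count = len({(session_id, article_id)
                           for session_id, article_id, _fmt in events})
    return raw_count, collapsed_count


# Invented events for illustration.
events = [
    ("sess1", "art42", "html"),  # interface forces the HTML view first
    ("sess1", "art42", "pdf"),   # then the reader downloads the PDF
    ("sess2", "art42", "pdf"),   # a second reader goes straight to the PDF
    ("sess2", "art77", "html"),
]
print(count_full_text_requests(events))  # (4, 3)
```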

Bannerman is also concerned about the impact of publishing usage statistics and the lack of transparency that may occur if financial success depends on them - the Observer Effect. "By measuring the literature we may change the literature." Issues at author or publisher level include (and this is on impact factor): self-citing; alerting authors to content they "should" cite; seeking out prolific high quality authors (who may self-cite); publishing the most citable articles early in the year (larger window for citation and impact factor); targeting topical areas rather than long term studies (affects funding); publishing review articles; etc.

Additional issues for usage factor may be worse: getting friends, your neighbour etc. to download articles (or writing a bot to do it); temptation to leave usage data unfiltered; publishing for students not for researchers (impact factor for citations is prestige amongst peer group, usage is based on numbers); sexing-up title and key-words; using abstract to tease rather than inform; stopping printed journals; blogging it, tagging it and posting it; broadcasting metadata but keeping articles where they are counted - not in OA repositories (although your blogger here feels this is as things are, you could do counting from OA repositories).

Impact Factor - not all attempts to change and improve impact factor are "bad", leave an audit trail, act of citing usually meaningful (you stake your reputation on it). Usage trails not (as) trackable, no reputation impact as practically anonymous.

Recommendations
  • Extreme caution in over-interpreting usage data
  • Further research into factors that influence article downloads
  • Improved guidelines for identifying and filtering robots
  • Awareness of the Observer Effect


Information-seeking behaviour of the virtual scholar: from use to users - David Nicholas, UCL

David began by addressing the issue of disconnection from the user. We monitor activity rather than actual users. The virtual audience differs in composition from the previous audience - we can't even see it, and we find they move elsewhere (accessing material from publisher sites, for instance). Content was king. Now the consumer is king.

We need to identify best practice, find scholarly outcomes and achieve satisfaction.

David has a slide up to illustrate the Virtual Scholar: a portfolio of services used by these users. This information is evidence based including:

  • UK National e-Books Observatory, JISC, 2008-9
  • Impact of Open Access Journal Publishing, OUP, 2006-
  • RIN study on use and impact of journals, RIN, 2008
  • Behaviour of the Researcher of the Future (Google Generation), 2008
Digital information footprints allow a complex view of information seeking behaviour which is rich in detail.

Profiling Information Seeking Behaviour
  • There are huge numbers of scholars and high demand for scholarly product driven by ubiquitous access (on buses, trains, hotels etc, existing users can search more freely and flexibly); huge usage numbers; spiralling growth; usage is not the outcome.
  • Some issue in the fact that many users are overseas - UK government funded scholarly websites have less than a third of their users in the UK; Asia loves OA; what issues does this raise?
  • Many users are young - information seeking behaviour is very different; spend lots of time online and some still see them as "noise" in the stats
  • Robots are always an issue - around half of all scholarly site visits are by robots (in some cases 90% of users are robots); now mimic human behaviour (Google's are particularly shrewd)

Human/Human-like behaviour
  • Shop around (40% of visitors never visit again)
  • Bounce (1-3 pages only of the many available - overseas visitors bounce less, young people more)
  • Flicking (a kind of channel hopping behaviour)
  • View (humans conditioned by emailing, texting etc. don't view articles for more than 2 minutes; they spend more time reading short articles than long articles online; if an article is long they either read the abstract or squirrel it away for later)
  • Power browse (you can hoover through titles, contents, abstracts etc. at a huge rate; books now opened up - great view; horizontal rather than vertical)
  • Navigate (we spend half our time navigating to content)
  • We are not all the same (national differences, e.g. Germans are the most successful searchers and most active information seekers; age differences; gender differences (women are less promiscuous!))
  • Brands are very complex but important (it is difficult to identify where authority lies, especially with authorised resources, where it is hard to tell how your access is occurring; what you think is the brand is not what other people will see as the brand; and some are cool, some are not)
  • Do not behave like a librarian!
  • Behave like an e-shopper (use a common platform, multitask, information pedigree of some of key e-commerce giants: Amazon and Google)
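
For anyone wanting to look for the "shop around" and "bounce" patterns in their own logs, a minimal sketch might look like this. The visit data is invented, and the 3-page cut-off for a bounce simply follows the "1-3 pages" description above.

```python
from collections import Counter


def visit_profile(visits, bounce_pages=3):
    """Summarise one-time visitors and bounces.

    visits: list of (visitor_id, pages_viewed) tuples, one per visit.
    """
    visits_per_visitor = Counter(visitor_id for visitor_id, _pages in visits)
    one_time = sum(1 for count in visits_per_visitor.values() if count == 1)
    bounces = sum(1 for _visitor, pages in visits if pages <= bounce_pages)
    return {
        "one_time_visitor_share": one_time / len(visits_per_visitor),
        "bounce_share": bounces / len(visits),
    }


# Invented visits for illustration.
visits = [("a", 2), ("b", 1), ("c", 12), ("a", 5), ("d", 3)]
print(visit_profile(visits))
# {'one_time_visitor_share': 0.75, 'bounce_share': 0.6}
```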

Impacts, outcomes etc. best summed up by Guardian (quoting Marshall McLuhan's "Gutenberg galaxy").

David Nicholas compared power browsing and information seeking etc. to Alcoholics Anonymous: people don't want to admit they do these things. We are all behaving like this though - not just the young! Older users, however, have a different conceptual framework for this behaviour.

Access is no longer the outcome - we need to go beyond making that access easy and quick; now we need to profile behaviours in order to find best practice and see what works and what does not. Establishing the good and the bad is needed to underpin the development of information literacy. We also need to know how to justify our spend on information resources by proving value.

"We are not fighting Google, the battle is with ourselves"

Unless we connect with our users we will dissipate.


Monday, April 07, 2008

Usage Factors - Break Out Session

Richard Gedye, Research Director Oxford Journals

The Usage Factor: How can we enhance the relevance of usage as an indicator of relative value?

6 years after the launch of COUNTER (which aimed to be consistent, credible and compatible) it seemed a good time to take stock and determine how successful it has been:

  • Consistency - nearly 100 publishers and hosts are now producing reports to the standard
  • Credibility - formal auditing process started in 2007 and details on compliance available on website
  • Compatibility - reliable comparison of amount of use, but not meaningful measure of relative quality or value


Addressing the challenge - ISI impact factor compensates for size of journal in a way COUNTER doesn't; we (the UKSG) determined that we should seek an additional measure of utility/value: the usage factor.

Usage factor = total usage (over period x) of items published during period y / total number of items published during period y
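
Reading the usage factor as usage divided by the number of items published (the same way the impact factor divides citations by items), a minimal sketch looks like the following. This is my reading of a definition that was still being worked out, and the journal figures are invented.

```python
def usage_factor(total_usage, items_published):
    """Usage of items published in a period, divided by the number of items."""
    if items_published <= 0:
        raise ValueError("items_published must be positive")
    return total_usage / items_published


# Invented journals: a large one and a small one.
large_journal = usage_factor(total_usage=50_000, items_published=500)
small_journal = usage_factor(total_usage=12_000, items_published=60)

print(large_journal)  # 100.0
print(small_journal)  # 200.0
```

The small journal has far fewer downloads in total but twice the usage per article, which is the size compensation that raw COUNTER totals don't give you.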

Is there a demand and would it be cost-effective?

Some initial evidence of demand comes from a CIBER survey on behalf of STM and the PA. Authors in the survey were asked to comment on two statements: "article downloads are a good measure of utility" and "citations are a good measure of utility". 70% agreed that article downloads were a useful measure, slightly more than agreed with the statement about citations. "Download metrics would have considerable credibility amongst author community; alternatives to the impact factor... would certainly be of great appeal to librarians and many publishers"

In 2007, UKSG and COUNTER commissioned research looking at ways in which journal quality is currently assessed and the degree to which additional metrics would be of value. The resulting report followed in-depth interviews with librarians and authors.

Advantages
- Useful counterweight to impact factors esp. A&H journals not well covered by ISI
- Useful for journals that are heavily used by readers who do not go on to cite them (more practical journals)
- Especially helpful for journals with relatively few articles
- Data available potentially sooner than with Impact Factors
- Useful to communicate usage information to journal owners in order to compare with other journals (where they are societies etc)

Issues to address
- Only COUNTER statistics should be used and not all publishers are COUNTER compliant
- COUNTER data needs to be made more robust
- Would another global measure, such as usage half-life per journal or per discipline, not be of greater value?
- Concern from those publishers with strong impact factors that this would adversely affect their rankings - this will favour lower quality journals
- This will stimulate publishers to inflate their usage by every means
- How would print usage be taken into account?
- Needs to amalgamate usage from all available hosting sites
- Bigger publishers have huge resources selling licenses to institutions and consortia, societies and smaller publishers do not (research shows the more you drive up usage the more you drive up the impact factor)

Who should do the calculations?

"It would be difficult for librarians to consolidate global usage statistics. Could not the publishers do this?" Publishers fed back that they would like control over the process - centralizing it would add considerable cost to the industry... Publishers on the whole were unwilling to provide their stats to a third party, but willing to have their own calculations audited... 84% of publishers were willing to calculate and publish their Usage Factors - 8% said "no", 8% "perhaps"; the details they would need were how total usage is defined, the specified usage period and the total number of articles.

What are the optimal values to measure?

Having done initial interviews, we widened it out to a web based survey - similar results to CIBER report - ~70% of authors would welcome the measures; strongest in biomedical sciences, least in arts and humanities

For librarians, the UF came in at number 2 in terms of its importance in selecting new journals (IFs were number 4); for renewing existing journals it came in at number 3, below feedback from library users and usage, but above price, cost per download, impact factor and reputation/status of publisher.

Recommendations from the report: that the UF concept be developed further with a view to testing it as a practical, implementable measure of journal quality, value and status. If satisfactory results were obtained from the investigations and tests, we would then need to test the system behind the scenes for a couple of years to ensure it would scale up.

In 2007 we began testing and modelling the UF concept with real data - 6 publishers, 1 aggregator, and 1 hosting service formed the Project Steering Group

Plan is that usage logs will be converted to some form of uniform standard report for third party analysis - RFP being drafted for third parties to bid for the work involved - RFP to be published later this month - still need to agree uniform standard report and details to ensure data consistency, integrity and fitness for purpose - e.g. measuring number of "items" published; assigning a correct publication year for each item (e-first can be up to 2 years ahead of "print"); excluding spiders and crawlers etc

Summer 2008, we hope to deliver a report which outlines potential usage measures which can be assessed and recommend which ones are robust enough to scale up, and any changes that publishers need to make to make the UF reliable and comparable... to propose ways in which it can be audited.

Further information at http://www.uksg.org/usagefactors or contact richard.gedye@oxfordjournals.org

Q: How do you define usage?
A: by adopting COUNTER definition - the number of successful requests for full text items (ANSI answers in the 200s and one in the 300s which covers downloading from your own cache)
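
As a sketch of what counting "successful requests" might look like in practice, assume the codes mentioned are HTTP status codes: anything in the 200s, plus 304 (Not Modified) as the "own cache" case. That mapping is my assumption rather than something spelled out in the session, and the log rows are invented.

```python
def is_successful(status):
    """Assumed reading of the answer: any 2xx code, plus 304 (Not Modified)."""
    return 200 <= status <= 299 or status == 304


def count_usage(requests):
    """Count successful full-text requests.

    requests: list of (status_code, is_full_text) tuples.
    """
    return sum(1 for status, is_full_text in requests
               if is_full_text and is_successful(status))


# Invented log rows for illustration.
requests = [
    (200, True),   # full text delivered
    (304, True),   # served from the reader's cache
    (404, True),   # failed request
    (302, True),   # redirect, not counted
    (200, False),  # abstract page, not full text
]
print(count_usage(requests))  # 2
```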

Q: Does this take into account the purpose of downloads?
A: No, and it can't - it measures how much something is desired or wanted, not why. Shibboleth might be able to look at sub-dividing usage into audience type (undergrads, faculty etc.) - data protection may cause problems here.

Q: With the expected rise of data and text mining, will this activity be excluded?
A: this is a question for COUNTER rather than the Usage Factor project; they are aware of the increase in non-traditional usage such as pre-fetching articles by Google etc. It does have to be acknowledged and scoped. Nature has a separate version solely for data mining where all the words are in the wrong order so that only a machine can read it

Q: any of the libraries are using ERM systems which are COUNTER compliant where they locally host material
A: publishers would be keen on libraries becoming COUNTER compliant and feeding this usage back to them

Q: How would you take account of "illegal" downloads?
A: This raises a number of questions: I assume you mean someone getting around a publisher's access management system? Publishers have a built-in interest in putting a stop to these; most illegal downloads come from rogue activity within existing institutional customers - robots to download large numbers of articles etc. COUNTER can already catch most of this and it is excluded; if it is an individual doing it manually, then we have statistical and technological means of stopping this. COUNTER needs to address retrospective corrections once this abuse has been detected.

Q: As a collective activity between publishers and librarians, what is the audit process and will it be transparent/trustworthy?
A: We don't know yet; one of the tasks for the appointed third party is to come up with recommendations for audits that will satisfy both libraries and publishers so that it is "properly benchmarkable". COUNTER has a relationship with ABC-e - the Audit Bureau of Circulation E Division who audit the COUNTER standards for reporting and advise on their development

Q: What about alternative versions of the articles?
A: Ebscohost are on board and of course publish their own versions of the articles; also on publishers' sites you can have an advanced version of the final articles. We are unlikely to look at author versions on Institutional Repositories, but PubMed Central will be an issue. There needs to be a technical solution to logging usage of distributed copies including authors' copies of articles.
Comment: as use in IRs grows, these need to become COUNTER compliant and within scope, but it is small scale at the moment.

Q: What do you think the problem will be with eBooks etc. - i.e. non-articles?
A: It could extend to this, but it is tricky with no continuity; we should look again as they become more established - although perhaps perfect for major textbooks, which is the closest parallel? Not much demand or impetus? Article downloads could impact authors' careers, less so with books.

Q: What happens if a publisher isn't COUNTER compliant (asked by a librarian)?
A: Include a clause in the license agreement to encourage them to become compliant.
