Tuesday, April 08, 2008

Q & A for Plenary Session 3

Q: I am an academic and I build robots. I do this because I want information, but you have described what I do as damaging. I do not threaten copyright, but it is difficult to download information responsibly from publisher websites.

A (Ian Bannerman): I'm sorry I gave that impression. Robots do distort statistics, though. The type of usage you describe is legitimate and good, and there is work for publishers to do. But if it is measured as human use there is a real issue.

A (Richard Gedye, Oxford Journals (Chair of Session)): We are looking at this in COUNTER, along with the issues raised by federated searches etc.


Q (for Herbert Van de Sompel): Is there not a place for simple, usable metrics for people to use?

A (Herbert Van de Sompel): They should not be simple, but usable - we should know what they are about. We don't really know what they're all about yet. It is really early days in the study of usage indicators, so we are trying to get a grasp on this issue. We won't come out of the two-year project with a set of simple metrics to use, but we should have some ideas and some valid caveats about using them. There is a real distinction between a research project and launching a metric and being stuck with it for years.

A (Richard Gedye, Oxford Journals (Chair of Session)): Yes, we see Herbert's work as almost a Which? Guide to Usage Statistics that can be taken forward to build the type of simple metrics for practical usage.


Use and Abuse of Usage Measures - Ian Bannerman, Managing Director for Journals, Taylor & Francis

Ian Bannerman is offering a slightly contrasting view to the previous two speakers.

COUNTER and the Usage Factor
Launched in 2002, COUNTER attempts to make usage data credible and countable. The Usage Factor was conceived by Herbert Van de Sompel and colleagues in 2006. An invitation to tender is now out.

Thomson Scientific Impact Factor: total cites for items published divided by total items published
Usage Factor: total usage of published items.
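
To make the comparison concrete, here is a minimal sketch of the two ratios. It assumes the Usage Factor would be normalised by the number of items published in the same way as the Impact Factor; the talk only gave "total usage of published items", so that denominator is an assumption, and the function names and figures are purely illustrative.

```python
def impact_factor(total_cites: int, total_items: int) -> float:
    """Cites to items published in the window, per item published."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return total_cites / total_items


def usage_factor(total_usage: int, total_items: int) -> float:
    """Usage of items published in the window, per item published.

    ASSUMPTION: the per-item normalisation mirrors the Impact Factor;
    it was not spelled out in the session.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return total_usage / total_items


# Made-up figures for a hypothetical journal.
print(impact_factor(total_cites=300, total_items=120))
print(usage_factor(total_usage=60000, total_items=120))
```

The two numbers share a denominator but not a numerator: citations are counted in the hundreds, downloads in the tens of thousands, which is part of why noise in the usage numerator matters so much.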

Implicit assumptions of usage statistics
  • Usage data is consistent, credible and compatible
  • Usage factor would be a meaningful indicator of something
COUNTER guidelines on filtering for robots and pre-fetching are in draft (release 3); they will filter a list of known robots. Ian Bannerman does not think these go far enough, though. Download counts may not be accurate either: not all downloads are successful, intentional or human-initiated. Also, most known robots won't get past access control on subscribed content; it's the unknown ones that distort the numbers (those within the IP range of a university, such as amateur attempts to mine data).
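
Filtering against a list of known robots can be sketched roughly as below. The patterns are hypothetical placeholders, not the COUNTER list, and the request records are assumed to be simple dicts with a `user_agent` field.

```python
import re

# HYPOTHETICAL patterns for illustration only; this is not the COUNTER list.
KNOWN_ROBOT_PATTERNS = ["googlebot", "slurp", "crawler", "spider", "wget"]
_ROBOT_RE = re.compile("|".join(KNOWN_ROBOT_PATTERNS), re.IGNORECASE)


def is_known_robot(user_agent: str) -> bool:
    """True if the user-agent string matches any known robot pattern."""
    return bool(_ROBOT_RE.search(user_agent or ""))


def filter_known_robots(requests):
    """Keep only requests whose user agent is not on the known-robot list."""
    return [r for r in requests if not is_known_robot(r.get("user_agent", ""))]


requests = [
    {"article": "a1", "user_agent": "Mozilla/5.0 (Windows NT 5.1)"},
    {"article": "a1", "user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    {"article": "a2", "user_agent": "my-homemade-miner/0.1"},
]
print(len(filter_known_robots(requests)))
```

Note that the home-made miner in the third record passes straight through, which is exactly the gap being described: a list only catches robots that someone has already identified.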

Ian has brought up an example article with many, many accesses in one Russian institution - accessed once every 9 seconds or so by some local error (COUNTER would ignore that); another example shows every article in a journal being accessed about 57 times by a Korean institution - this looks suspiciously like a robot, but the stats arrive 3 months after the event; a further example is an uncited (and obscure) article being accessed 1,183 times - not clear why! There is a lot of noise in the system and it's hard to identify or understand it all.
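
A pattern like the first example (one article hit every few seconds from one institution) could in principle be flagged automatically. The sketch below is one possible heuristic, not anything COUNTER specifies: the thresholds `min_hits` and `max_median_gap` are arbitrary assumptions, and events are assumed to be `(institution, article_id, timestamp_in_seconds)` tuples.

```python
from collections import defaultdict
from statistics import median


def flag_rapid_repeat_access(events, min_hits=20, max_median_gap=15.0):
    """Return {(institution, article_id): median_gap_seconds} for suspicious pairs.

    A pair is flagged when it has at least `min_hits` accesses and the
    median gap between consecutive accesses is at most `max_median_gap`.
    Both thresholds are illustrative assumptions.
    """
    by_key = defaultdict(list)
    for institution, article_id, ts in events:
        by_key[(institution, article_id)].append(ts)

    flagged = {}
    for key, stamps in by_key.items():
        if len(stamps) < min_hits:
            continue
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        typical_gap = median(gaps)
        if typical_gap <= max_median_gap:
            flagged[key] = typical_gap
    return flagged


# Synthetic data: one institution hits an article every 9 seconds,
# another reads the same article roughly once an hour.
events = [("inst_a", "art1", i * 9) for i in range(50)]
events += [("inst_b", "art1", i * 3600) for i in range(50)]
print(flag_rapid_repeat_access(events))
```

A heuristic like this would not catch the second example (about 57 accesses per article spread across a whole journal), which needs a different signal, such as unusually uniform counts across articles.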

Is usage a meaningful indicator anyway?
Ian Bannerman cites Davis & Price (2006) [eJournal Interface can influence usage statistics: implications for libraries, publishers and Project COUNTER. JASIST v.57 n.9, 1243-1248] in showing the impact of an interface on usage which, he claims, is at odds with the meaning of usage statistics. In particular he talks about those journals that require viewing of full text HTML before downloading a PDF - COUNTER would count this twice at present! Bannerman adds that if people's careers relied on usage (rather than impact) of publications you would clearly have some issues here.
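
The double-counting problem can be illustrated with a small sketch. It applies one possible rule, which is an assumption for illustration and not a COUNTER rule: a PDF download that follows an HTML full-text view of the same article in the same session within a short window is treated as the same use. The record layout and the `window_seconds` value are hypothetical.

```python
def count_full_text_requests(requests, window_seconds=300):
    """Count full-text uses, collapsing HTML-then-PDF pairs into one.

    requests: iterable of (session_id, article_id, fmt, timestamp_seconds),
    where fmt is "html" or "pdf". ASSUMED rule: a PDF fetched within
    `window_seconds` of an HTML view of the same article in the same
    session is not counted again.
    """
    last_html_view = {}
    count = 0
    for session_id, article_id, fmt, ts in sorted(requests, key=lambda r: r[3]):
        key = (session_id, article_id)
        if fmt == "html":
            last_html_view[key] = ts
            count += 1
        elif fmt == "pdf":
            html_ts = last_html_view.pop(key, None)
            if html_ts is not None and ts - html_ts <= window_seconds:
                continue  # already counted through the HTML view
            count += 1
    return count


requests = [
    ("s1", "a1", "html", 0),    # interface forces the HTML view first
    ("s1", "a1", "pdf", 30),    # same reader then fetches the PDF
    ("s2", "a1", "pdf", 40),    # another reader goes straight to the PDF
    ("s1", "a2", "html", 100),
]
print(len(requests), count_full_text_requests(requests))
```

On this made-up log the naive count is 4 and the collapsed count is 3; the difference comes entirely from the interface, not from reader behaviour.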

Bannerman is also concerned about the impact of usage statistics on publishing and the lack of transparency that may occur if financial success becomes dependent on them - the Observer Effect. "By measuring the literature we may change the literature." Issues at author or publisher level include (and this is on impact factor): self-citing; alerting authors to content they "should" cite; seeking out prolific, high-quality authors (who may self-cite); publishing the most citable articles early in the year (larger window for citation and impact factor); targeting topical areas rather than long-term studies (affects funding); publishing review articles; etc.

Additional issues for usage factor may be worse: getting friends, your neighbour etc. to download articles (or writing a bot to do it); temptation to leave usage data unfiltered; publishing for students not for researchers (impact factor for citations is prestige amongst peer group, usage is based on numbers); sexing-up title and key-words; using abstract to tease rather than inform; stopping printed journals; blogging it, tagging it and posting it; broadcasting metadata but keeping articles where they are counted - not in OA repositories (although your blogger here feels this is as things are, you could do counting from OA repositories).

Impact Factor - not all attempts to change and improve impact factor are "bad"; citations leave an audit trail, and the act of citing is usually meaningful (you stake your reputation on it). Usage trails are not (as) trackable, and there is no reputational impact because usage is practically anonymous.

Recommendations
  • Extreme caution in over-interpreting usage data
  • Further research into factors that influence article downloads
  • Improved guidelines for identifying and filtering robots
  • Awareness of the Observer Effect


Information-seeking behaviour of the virtual scholar: from use to users - David Nicholas, UCL

David began by addressing the issue of disconnection from the user. We monitor activity rather than actual users. The virtual audience differs in composition from the previous audience - we can't even see it, and we find they move elsewhere (accessing material from publisher sites, for instance). Content was king. Now the consumer is king.

We need to identify best practice, find scholarly outcomes and achieve satisfaction.

David has a slide up to illustrate the Virtual Scholar: a portfolio of services used by these users. This information is evidence-based, drawing on:

  • UK National e-Books Observatory, JISC, 2008-9
  • Impact of Open Access Journal Publishing, OUP, 2006-
  • RIN study on use and impact of journals, RIN, 2008
  • Behaviour of the Researcher of the Future (Google Generation), 2008
Digital information footprints allow a complex view of information seeking behaviour which is rich in detail.

Profiling Information Seeking Behaviour
  • There are huge numbers of scholars and high demand for scholarly product driven by ubiquitous access (on buses, trains, hotels etc, existing users can search more freely and flexibly); huge usage numbers; spiralling growth; usage is not the outcome.
  • Some issue in the fact that many users are overseas - UK government funded scholarly websites have less than a third of their users in the UK; Asia loves OA; what issues does this raise?
  • Many users are young - information seeking behaviour is very different; spend lots of time online and some still see them as "noise" in the stats
  • Robots are always an issue - around half of all scholarly site visits are by robots (in some cases 90% of users are robots); they now mimic human behaviour (Google's are particularly shrewd)

Human/Human-like behaviour
  • Shop around (40% of visitors never visit again)
  • Bounce (1-3 pages only of the many available - overseas visitors bounce less, young people more)
  • Flicking (a kind of channel hopping behaviour)
  • View (humans conditioned by emailing, text etc.; don't view articles for more than 2 minutes; spend more time reading short articles than long articles online; if it is long, either read the abstract or squirrel it away for later)
  • Power browse (you can hoover through titles, contents, abstracts etc at huge rate);
  • Books now opened-up great view
  • Horizontal rather than vertical
  • Navigate (we spend half our time navigating to content)
  • We are not all the same (national differences, e.g. Germans most successful searchers and most active information seekers; age differences; gender differences (women are less promiscuous!))
  • Brands very complex but important (difficult to identify where authority lies, especially with authorised resources, where it is hard to tell how your access is occurring; what you think is the brand is not what other people will see as the brand; and some are cool, some are not)
  • Do not behave like a librarian!
  • Behave like an e-shopper (use a common platform, multitask, information pedigree of some of the key e-commerce giants: Amazon and Google)
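
Figures such as "40% of visitors never visit again" or "1-3 pages only" come from log analysis. As a rough sketch of how two of these measures might be derived, assuming a simplified log of `(visitor_id, pages_viewed)` sessions (real deep log analysis is considerably more involved, and the field names and the 3-page bounce threshold here are illustrative assumptions):

```python
from collections import Counter


def behaviour_summary(sessions, bounce_max_pages=3):
    """Summarise 'shop around' and 'bounce' behaviour from session records.

    sessions: list of (visitor_id, pages_viewed), one entry per visit.
    Returns the share of visitors seen only once and the share of visits
    that viewed at most `bounce_max_pages` pages.
    """
    if not sessions:
        raise ValueError("sessions must not be empty")
    visits_per_visitor = Counter(visitor_id for visitor_id, _ in sessions)
    one_time = sum(1 for n in visits_per_visitor.values() if n == 1)
    bounces = sum(1 for _, pages in sessions if pages <= bounce_max_pages)
    return {
        "one_time_visitor_share": one_time / len(visits_per_visitor),
        "bounce_share": bounces / len(sessions),
    }


# Made-up sessions: visitor "a" returns once, the others visit a single time.
sessions = [("a", 2), ("a", 10), ("b", 1), ("c", 3), ("d", 8)]
print(behaviour_summary(sessions))
```

Any such summary inherits the robot problem noted earlier: unless robot sessions are removed first, they are counted as visitors like everyone else.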

Impacts, outcomes etc. best summed up by Guardian (quoting Marshall McLuhan's "Gutenberg galaxy").

David Nicholas compared power browsing and information seeking etc. to Alcoholics Anonymous: people don't want to admit they do these things. We are all behaving like this though - not just the young! Older users, however, have a different conceptual framework for this behaviour.

Access is no longer the outcome - we need to go beyond making that access easy and quick; now we need to profile behaviours in order to find best practice and see what works and what does not. Establishing the good and the bad is needed to support the development of information literacy. We also need to know how we justify our spend on information resources by proving value.

"We are not fighting Google, the battle is with ourselves"

Unless we connect with our users we will dissipate.
