Tuesday, April 08, 2008

The other side of the story: is usage data all it's cracked up to be?

"I collect them - but I don't want them, I don't need them, and I won't use them," said Ian Bannerman, likening promotional giveaway items to usage statistics. We implicitly assume that usage data is consistent, credible and compatible, and that usage factors will be a meaningful indicator of something.

In terms of credibility, Release 3 of the COUNTER guidelines is making inroads into the robot problem (crawlers distorting usage data) by publishing a list of robots and encouraging publishers to exclude these known robots from their usage reports. But is this enough, asks Ian? It's the "amateur" crawler efforts within university IP ranges that cause the most damage and the most distortion, and COUNTER's code will not help exclude these crawlers. And if you identify a crawler (Ian's example showed an article being downloaded 6,372 times in one day), should you retrospectively exclude its activities from previously published usage statistics? Had this crawler been a little more sophisticated, behaving a little more like a "normal" user, the activity would have remained unnoticed.
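To make the detection problem concrete, here is a minimal Python sketch of the kind of crude filter that would catch a case like the 6,372-downloads-in-a-day example. It is not COUNTER's method or anything Ian presented: the record layout `(date, client_ip, article_id)`, the function names and the threshold of 100 are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical log records: (date, client_ip, article_id).
# This layout is an assumption for illustration, not a COUNTER-defined format.

def flag_suspect_clients(records, max_daily_downloads=100):
    """Return {(date, ip, article): count} for counts above the threshold.

    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    counts = Counter(records)
    return {key: n for key, n in counts.items() if n > max_daily_downloads}

def filter_records(records, suspects):
    """Drop every record whose (date, ip, article) key was flagged."""
    return [r for r in records if r not in suspects]

if __name__ == "__main__":
    # One client fetching the same article 6,372 times, plus a normal reader.
    log = [("2008-04-01", "10.0.0.5", "art-1")] * 6372
    log += [("2008-04-01", "10.0.0.9", "art-1")] * 3
    suspects = flag_suspect_clients(log)
    print(suspects)
    print(len(filter_records(log, suspects)))
```

A fixed threshold like this only catches the clumsy cases. A crawler that spreads its requests across days, articles or IP addresses would pass straight through, which is exactly Ian's point.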

There's a lot going on that we don't understand and that we probably won't ever identify, and interface effects can also distort statistics: HighWire's practice of automatically displaying the HTML version of an article, whilst offering the user the ability to download a PDF from that page, was wrongly giving the impression that the ratio of PDF:HTML downloads was 1:1 (similar stats from Wiley InterScience during the same period gave a more credible 20:1). Given that COUNTER stats are part of the toolkit used by libraries when managing collections, falsely inflating one's statistics in this way is ethically dubious (upon realising this, HighWire discontinued the practice).

There is a danger that in measuring the literature, we change the literature. Impact Factors can be abused, for example, by increasing self-citations to the journal in other articles/editorials; alerting authors to content they should cite (however positive the intention); publishing citable papers early in the year to maximise the number of citations they could garner before they become eligible for impact factoring; targeting topical areas rather than long-term studies; and so on - see Chronicle of Higher Education Oct 2005 (Monastersky's article) for additional thoughts. The Usage Factor will not be immune to such "observer effects" either: authors may encourage everyone they know to download their article and improve their ranking; publishers may "sex up" keywords, seek to double downloads per the HighWire example above, or encourage online coursepacks over printed ones to maximise usage ... and so on. Ian concludes that not all attempts to influence the impact factor are necessarily bad, while attempts to influence the usage factor are - and are less traceable. Ultimately, citations (on which IF is based) are more meaningful than downloads (on which UF is based).

Recommendations
  • Be cautious; don't over-interpret usage data.
  • Let's carry out further research into factors that influence article downloads.
  • Let's improve guidelines on detection/blocking/filtering of robots.
  • Let's watch out for the Observer Effect when developing usage-based metrics.

During questions, Peter Murray-Rust noted that some robotic usage is for non-aberrant purposes (e.g. data mining), and that we need to be careful to distinguish between different forms of robotic activity and ensure that we are not excluding valid usage from metrics.


3 Comments:

Louise Penn said...

I think there is always a danger of using any one metric as the be-all and end-all, especially usage data, as the interpretation of it varies so much from publisher to publisher (or aggregator). It is a way of spotting trends, that's all, but like anything else it has flaws and should not be relied upon blindly. I value usage data but yes, there is something in the view that we collect but do not use (and perhaps, despite all the ways we are convinced we do, we don't need!)

12:24 pm  
Anonymous said...

Can you direct me to an accessible version of the Monastersky article?

2:32 pm  
Anonymous said...

The Monastersky article is here: http://chronicle.com/free/v52/i08/08a01201.htm

2:50 pm  
