The other side of the story: is usage data all it's cracked up to be?
In terms of credibility, Release 3 of the COUNTER guidelines is making inroads into the robot problem (crawlers distorting usage data) by publishing a list of known robots and encouraging publishers to exclude them from their usage statistics: but is this enough, asks Ian? It's the "amateur" crawler efforts within university IP ranges that cause the most damage and the most distortion, and COUNTER's code will not help exclude these crawlers. And if you do identify a crawler (Ian's example showed an article being downloaded 6,372 times in one day), should you retrospectively exclude its activity from previously published usage statistics? Had this crawler been a little more sophisticated, behaving a little more like a "normal" user, its activity would have gone unnoticed.
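The two-pronged cleaning Ian describes - exclude the known robots on the published list, then look for implausibly heavy activity that a user-agent list cannot catch - can be sketched roughly as below. This is a minimal illustration, not COUNTER's actual procedure: the robot substrings, the event fields (ip, user_agent, article, day) and the daily threshold are all assumed for the example.

```python
from collections import Counter

# Hypothetical excerpt of a COUNTER-style robot list (user-agent substrings);
# the real list is longer and maintained by COUNTER.
KNOWN_ROBOTS = ["googlebot", "slurp", "msnbot"]

# Illustrative threshold: flag any IP that downloads the same article more
# than this many times in one day (cf. the 6,372-downloads-in-a-day example).
DAILY_THRESHOLD = 100

def filter_usage(events):
    """Drop requests from known robots, then flag suspiciously heavy
    (ip, article, day) combinations that a user-agent list would miss."""
    # 1. Exclude known robots by user-agent substring match.
    human = [e for e in events
             if not any(bot in e["user_agent"].lower() for bot in KNOWN_ROBOTS)]
    # 2. Count downloads per (ip, article, day) to catch "amateur" crawlers
    #    that present an ordinary browser user-agent.
    counts = Counter((e["ip"], e["article"], e["day"]) for e in human)
    suspect = {key for key, n in counts.items() if n > DAILY_THRESHOLD}
    clean = [e for e in human
             if (e["ip"], e["article"], e["day"]) not in suspect]
    return clean, suspect
```

Note that step 2 only catches the crude case; a crawler that spreads its requests across days, IPs or articles - the "more sophisticated" crawler Ian warns about - would slip straight through a threshold like this.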
There's a lot going on that we don't understand and probably never will identify, and interface effects can also distort statistics: HighWire's practice of automatically displaying the HTML version of an article, whilst offering the user the ability to download a PDF from that page, wrongly gave the impression that the ratio of PDF:HTML downloads was 1:1 (comparable statistics from Wiley InterScience over the same period gave a more credible 20:1). Given that COUNTER stats are part of the toolkit libraries use when managing collections, falsely inflating one's statistics in this way is ethically dubious (upon realising this, HighWire discontinued the practice).
There is a danger that in measuring the literature, we change the literature. Impact Factors can be abused, for example, by increasing self-citations to the journal in other articles/editorials; alerting authors to content they should cite (however positive the intention); publishing citeable papers early in the year to maximise the number of citations they can garner before they become eligible for impact factoring; targeting topical areas rather than long-term studies; and so on - see the Chronicle of Higher Education, Oct 2005 (Monastersky's article) for additional thoughts. The Usage Factor will also not be immune to such "observer effects": authors may encourage everyone they know to download their article and improve its ranking; publishers may "sex up" keywords or seek to double downloads per the HighWire example above, or encourage online coursepacks over printed ones to maximise usage ... and so on. Ian concludes that not all attempts to influence the Impact Factor are necessarily bad, whereas attempts to influence the Usage Factor are - and are less traceable. Ultimately, citations (on which the IF is based) are more meaningful than downloads (on which the UF is based).
- Be cautious; don't over-interpret usage data.
- Let's carry out further research into the factors that influence article downloads.
- Let's improve guidelines on the detection, blocking and filtering of robots.
- Let's watch out for the Observer Effect when developing usage-based metrics.