Tuesday, April 08, 2008

Reconsidering scholarly impact: MESUR

"Usage data totally rocks", chirped the endearingly passionate Herbert van de Sompel as he attempted to rouse us all from our hangovers on this second morning of UKSG. van de Sompel's team in Los Alamos has explored interoperability, OpenURL, OAI-PMH, repository architecture and more, but is now focussing on MESUR, a project intended to enhance methods of assessing scholarly impact.

In the paper age, our best attempt to quantify scholarly impact was to count citations. But in a networked environment, we have many more metrics to deploy:

Usage-based metrics
can include the number of accesses to scholarly material, where they come from and so forth. We can factor in usage of multiple content types (preprints, blog postings and datasets alongside journals and articles) and maintain a comprehensive record from the moment of an item's digital publication. Usage data can, however, present significant challenges. What *exactly* constitutes usage? Can we be sure to protect users' privacy? How do we standardise and aggregate data records?
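To illustrate the standardisation problem, here is a minimal sketch of counting "usage" from a raw access log. The log format, document identifiers and 30-second window are all assumptions for illustration; the window is similar in spirit to COUNTER-style double-click filtering, not a statement of MESUR's actual rules.

```python
from datetime import datetime, timedelta

# Hypothetical log entries: (session_id, doc_id, timestamp).
events = [
    ("s1", "doc-A", datetime(2008, 4, 8, 9, 0, 0)),
    ("s1", "doc-A", datetime(2008, 4, 8, 9, 0, 12)),  # rapid repeat: ignored
    ("s1", "doc-B", datetime(2008, 4, 8, 9, 5, 0)),
    ("s2", "doc-A", datetime(2008, 4, 8, 10, 0, 0)),
]

def count_usage(events, window=timedelta(seconds=30)):
    """Count one 'use' per request, discarding repeat requests for the
    same document in the same session that fall within `window` of the
    previous request (an illustrative double-click filter)."""
    counts = {}
    last_seen = {}
    for session, doc, ts in sorted(events, key=lambda e: e[2]):
        key = (session, doc)
        if key in last_seen and ts - last_seen[key] < window:
            last_seen[key] = ts  # still refresh the window
            continue
        last_seen[key] = ts
        counts[doc] = counts.get(doc, 0) + 1
    return counts

print(count_usage(events))  # doc-A: two distinct uses, doc-B: one
```

Even this toy version forces a definition of "usage" (here: a filtered request), which is exactly the kind of decision that has to be agreed before records from different providers can be aggregated.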

Network-based metrics
can leverage citation networks, co-authorship networks and so on to assess behaviour. We need to select metrics that characterise the network and define the importance of specific nodes within that network. Tools like Google PageRank and the Eigenfactor can help us to assess networks and assign appropriate levels of significance to nodes within them.
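To make the PageRank idea concrete, here is a small power-iteration sketch over a toy citation graph. The graph and node names are invented for illustration; this is the generic algorithm, not MESUR's implementation.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a citation graph given as
    {node: [cited nodes]}. Returns a score per node; a node cited
    by many well-ranked nodes gains importance."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        # Dangling nodes (no outgoing citations) spread rank uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        for node in nodes:
            new[node] += damping * dangling / n
        rank = new
    return rank

# Toy network: C is cited by both A and B, so it should rank highest.
citations = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = pagerank(citations)
print(max(scores, key=scores.get))  # the most 'important' node
```

The same machinery applies whether the edges are citations, co-authorships or usage connections; what changes is how the network is built and therefore what "importance" means.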

MESUR has accumulated a vast dataset (1 billion usage events, relating to 50 million documents/100,000 publications, spanning up to five years) from multiple stakeholders in the information community. It is important to avoid bias in sampling and analysing this data. Cross-validation against existing indicators, to ensure that there is an appropriate level of correlation, allows the team to check whether their results are broadly valid. The project's goal is to assess whether meaningful metrics can be derived from usage data and, if so, how they could be applied.

Networks are identified based on tracking a user's behaviour through a session - for example, creating a connection between the documents downloaded by a single user. Once this type of analysis has been extrapolated to a billion usage events, patterns emerge. This helps to confirm our expectations that, for example, practitioners use the literature differently to researchers. It also shows that whilst users read across multiple disciplines, their citations tend to stick to their own discipline. Correlating different metrics on maps shows that usage-based metrics tend to cluster together (basically in agreement with one another), whereas citation metrics vary both from usage metrics and from each other. Overall, there is an indication that the traditional Impact Factor (IF) "is a completely different animal" to multiple network-based metrics.
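The session-based network construction described above can be sketched as follows. The sessions and document identifiers are hypothetical; the idea is simply that two documents become connected (with growing edge weight) each time a single session downloads both.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical usage log: each session lists the documents downloaded.
sessions = {
    "s1": ["doc-A", "doc-B", "doc-C"],
    "s2": ["doc-A", "doc-B"],
    "s3": ["doc-B", "doc-C"],
}

def cooccurrence_network(sessions):
    """Build weighted undirected edges: two documents are linked once
    per session in which both were downloaded."""
    edges = defaultdict(int)
    for docs in sessions.values():
        for a, b in combinations(sorted(set(docs)), 2):
            edges[(a, b)] += 1
    return dict(edges)

print(cooccurrence_network(sessions))
# ('doc-A', 'doc-B') carries weight 2: both documents appear in two sessions
```

Scaled up to a billion events, network measures applied to this graph yield the usage-based rankings that the project then compares against citation-based ones.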
