Wednesday, April 01, 2009

Google and Librarians: why it shouldn't be us and them

In preparing to speak to us today, Clare Duddy did some Googling. In Google's browser. She warns us upfront, in case we hadn't clocked, that she's pro-Google.

Clare is a Masters student at London Met University who won a UKSG competition to present her views on information discovery in the Google generation. I am torn between wishing that I'd had an opportunity like this when I was a student, and thinking how petrifying it must be to present to an audience of professionals eager to hear your views. Clare tells us she's nervous but proceeds to speak confidently and knowledgeably on a subject that, while familiar to us all, still holds surprises.

Us and them
Clare already works part time at Oxford University libraries as an electronic journals assistant. Interestingly, she sees the "us and them" of the information world as librarians vs Google (not, as some of the other UKSG delegates might see it, as librarians vs publishers). Between her work and her thesis, Clare spends a lot of time looking for information. "I've been online for more than half of my life, and search engines were already prevalent by the time I started my academic career - I've never had to find information without them." She quotes a friend: "Google is an extension of my memory - I don't have to keep facts in my head."

Finding information
There is a new balance in education as we keep up with emerging technologies. Google has a 63% share of the search engine market (13.5bn searches in the US in Jan 09); OCLC research shows that 89% of college students start searches on search engines, and Clare confirms it's her first port of call for all her information needs, from academic to social. It's a known known. Perhaps less known is that the same research shows only 1% of users starting their search in an online database.

The Google generation
The Google generation is not defined by an age group but by a demographic - "always connected"; multi-tasking; computer literate. Clare says we might also see this group as "impatient, gullible and lazy" - taking the first result they find in a search engine and giving librarians sleepless nights. As we know, the main problems with using search engines as our point of entry to research are:

Material not indexed
* deep web
* access controlled
* non-linked
* robot-excluded
* non-HTML
* no static URL
Despite this, Google has value - it highlights "informal literature" - the non-traditional materials that other library resources don't surface so effectively, if at all. Through Google Scholar you can filter your search to authoritative content, and the Library Links program enables libraries to direct users to licensed content. And because of Google's power and influence, they drive exposure and sensible structuring of content (e.g. Harvard has redesigned its website to expose its digital collections more effectively; National Libraries of Australia have created stable URLs and metadata for individual items in their image collection). There is a sense that we overestimate what you can't find, and underestimate the value of what you can find.

Quality of material online
"Democratic" (user-generated) publishing - famously exemplified by Wikipedia - concerns librarians and publishers, the gatekeepers of authoritative content. But Wikipedia's advantage is its breadth - over 2.7 million entries in comparison to Oxford Reference Online's 1.3 million (yes, there could be an apples and oranges issue here). "We have to assume that we can't control the web or impose our authority on it in any kind of comprehensive way", so how do we manage our response to what we find? With "a pinch of salt"; the widespread news coverage of Wikipedia's flaws, and our own knowledge of how simply we can publish what we want, help us understand that not everything we find can be trusted. Librarians already spend a lot of time training users about the quirks of different online resources; why not include Google and Wikipedia (etc) in that training?

Deskilling search
Clare recalls a lecturer harking back to the glory days when "users were not allowed near the computers and had to use a librarian to find information", but "librarians are no longer required in that role" - they feel displaced; is their reticence about broad search resources based on frustration? There is a context in which "one-box" search engines are in fact the best way to find something. But users still need more complex search interfaces, and despite their fondness for simplicity they do recognise the value of more sophisticated search.

Conclusion
"Young people today need to be educated to use these tools properly, just as we had to be taught to use a library and book properly in the past". We shouldn't assume there is one Google generation with one set of characteristics - users are still a complex group with varying needs. It can only be helpful for us to acknowledge the place of Google in our users' lives and to help grow their understanding of this tool in the context of the other tools we offer.

(see next post for question and answer session revealing more of Clare's online behaviour)

Coda: Clare's presentation was excellent - not only interesting and well-informed in terms of the material covered but ably and compellingly presented. The feedback about this session has already been overwhelmingly positive and we'll definitely be thinking about how to follow up with more user input at next year's conference.


Monday, April 07, 2008

How to make your IR effective as a publishing platform for grey literature

"I know nothing about IRs", admits Toby Green, "but I once wrote a paper about tidying up our grey literature at OECD, which seems to have garnered a lot of interest." Today he tells us he'll cover:
  • Post-it-and-hope-Google-finds-it approaches to dissemination of content
  • What does it take to satisfy the needs of various stakeholders
  • What did the OECD do with its working papers
Post and hope
Out of 40 starters at this year's Grand National, 14 finished. Rank outsiders enter what is one of the country's hardest courses - perhaps hoping that everyone else will fall over and allow the rank outsider to win. Is "post it and hope" an equally unlikely strategy for success? It relies on a single discoverability system (search) which puts considerable pressure on metadata to be of sufficient quality to drive successful discovery. And it's a "survival of the fittest" environment: if you are not part of the "short head" (the blockbuster opposite of the long tail) your chances of discovery through major search engines are also limited. It's a passive strategy that is author-, rather than reader-centric. Ultimately, says Toby, it doesn't work. The OECD.org website is a platform for authors to upload their content - which they do - and 90% of it is *never* downloaded.

Stakeholder needs
What do the various stakeholder groups require from literature repositories? As a group - made up of representatives from libraries, publishers, agents, intermediaries - we brainstormed some of the things that different user groups require from a publishing system.

Authors
  • need a channel for dissemination
  • need visibility/recognition for career development
  • need to be read
  • need to claim ownership of ideas
  • need to fulfil mandates (from funders, institutions)
  • need an easy process, preferably with others doing as much as possible
  • need reports on how the work has been used
  • need archiving
  • need links/dissemination to other platforms where they want to be visible/involved
Readers
  • need full text but don't want to have to read it
  • need integration with other workflow tools
  • need easy discoverability - and access - for free
  • need related data and inter-literature links
  • need an indication that the content is authoritative
  • need reliability/predictability of content's location
  • need awareness and other contextual services
Institutional administrators; bosses
  • need reports on usage, financial aspects (value for money), who has been published
  • need prominent branding / enhancement of reputation
  • need budget, usage and a critical mass of deposits
  • need quality to meet institution's standards and reduce later work
Librarians
  • need more time, resource and better equipment
  • need training
  • need standards
  • need tools to support processes
  • need clearer legal guidelines from publishers
Funders
  • need reports (on usage/what's been published) to show that grants are producing sufficient material
  • need visibility, research profile
  • need dissemination to expedite ongoing research
Intermediaries
  • agents
  • aggregators
  • publishers need copyright and brand to be respected/protected; credit where due
What did the OECD do to meet these needs?
Originally, authors could post what they wanted, when they wanted. Readers, however, struggled to find this material. Administrators were concerned about quality control and reputation; funders were asking questions about impact and ROI. Librarians were laughing - despairingly? Authors weren't asking for OECD's assistance; administrators didn't think it had anything to do with OECD. Papers were presented in a jumble on the OECD website:
  • no metadata standards
  • no quality control
  • no underlying database/workflow
  • no common vision
  • no knowledge of what readers need
  • no understanding of discovery systems
OECD's solution was to get the publishing staff involved to
  • establish metadata standards
  • establish quality control steps
  • create underlying database/workflow
  • build common vision
  • research readers' and librarians' needs
  • exploit discovery systems
  • monitor results.
Metadata is key:
  • analyse the papers to identify metadata fields
  • add additional fields to meet industry standards
  • sign off fields so database can be built
  • QA existing metadata; fix numbering problems
  • Fill and QA the database
OECD then created a workflow to minimise effort and create efficiencies - converting the paper to a PDF for hosting and onward dissemination. A single webpage now categorises the papers and links through to organised lists of papers within categories. Metadata is consistent and comprehensive (DOI, abstracts, keywords etc.), and is submitted to RePEc - vastly improving that database's coverage of this content, since authors had previously not been diligent in uploading their own content. And at the full text level, the workflow system adds a templated cover page with improved, consistent branding and clear, exportable citations.
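
As a rough illustration of the "QA existing metadata; fix numbering problems" step, here is a minimal Python sketch. It is not OECD's actual system: the `WorkingPaper` record, its field names, the `qa_problems` function and the sample DOI are all hypothetical, chosen only to mirror the fields mentioned above (DOI, abstracts, keywords).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkingPaper:
    """Hypothetical metadata record for one working paper."""
    number: int
    title: str
    authors: List[str]
    doi: str = ""
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)


# Text fields that must not be empty (an assumption for this sketch).
REQUIRED_TEXT_FIELDS = ("title", "doi", "abstract")


def qa_problems(papers: List[WorkingPaper]) -> List[str]:
    """Return human-readable QA problems found across the records."""
    problems: List[str] = []
    seen: Dict[int, str] = {}  # paper number -> title of first paper using it
    for paper in papers:
        for name in REQUIRED_TEXT_FIELDS:
            if not getattr(paper, name).strip():
                problems.append(f"paper {paper.number}: missing {name}")
        if not paper.authors:
            problems.append(f"paper {paper.number}: no authors listed")
        if not paper.keywords:
            problems.append(f"paper {paper.number}: no keywords")
        # Numbering problem: the same series number used twice.
        if paper.number in seen:
            problems.append(
                f"paper {paper.number}: number already used by "
                f"'{seen[paper.number]}'"
            )
        else:
            seen[paper.number] = paper.title
    return problems


# Made-up sample data; the DOI is a placeholder, not a real identifier.
papers = [
    WorkingPaper(1, "Trade and growth", ["A. Author"],
                 "10.1234/example.1", "An abstract.", ["trade"]),
    WorkingPaper(1, "Untitled draft", []),
]

for problem in qa_problems(papers):
    print(problem)
```

The idea is simply that a check like this runs before records are loaded into the database or passed on to a partner such as RePEc, so incomplete or mis-numbered records are caught by publishing staff and not by readers.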

Following this overhaul of the workflow, traffic to the working papers has more than doubled.
  • authors' needs are being met: data is more visible in more locations, the data is marketed within the OECD platform, reports are available from OECD and its partner platforms, and authors are not required to carry out any of the processes
  • readers can access the full text and improved metadata helps them understand it without reading it, citations can be exported, content is discoverable, background data is linked, citation linking and "more like this" links are forthcoming, the content is clearly trustworthy and well serviced with awareness alerting
  • administrators can download usage reports and assess financial value, branding has improved, quality is controlled (inappropriate content is rejected)
  • librarians are not required to carry out any of the processes, legal guidance is clear
  • funders are getting good value for money without additional expenditure.
In conclusion:
  • QA - requires filtering to protect institution's reputation
  • Distribute to disseminate - content needs to be widely discoverable with supporting capabilities such as MARC records
  • Promotion - internal awareness-raising with authors so they understand why the process is valuable to them
  • Reports and 'ego' tools (RePEc has good ones); reader tool
  • Institutional repositories need to either outsource to a publisher, or employ people with publishing skills to manage the process effectively
