Wednesday, April 09, 2008

Translating Geek to English: exploring the possibilities of the Semantic Web

"I've made a good business of translating Geek to English," says Geoff Bilder (who hates buzzwords and can't remember submitting a paper with the title Web 3.0 - mea culpa, possibly).

Back in 2004, Geoff talked at UKSG about mash-ups, syndication, RSS and FOAF. Those were the balmy days when the term Web 2.0 had not been coined and we could talk about these individual technologies - and let them get on with changing the web - without lumping them together in a faceless buzzword bundle.

We can draw analogies between our current situation and the huge explosion of content that occurred shortly after the invention of the printing press. But if you compare the timelines, we're still in the primitive stages of developing our technology - "we haven't reached our Martin Luther moment". And just as we are uploading facsimiles of printed works onto the web, early modern European printers illuminated their incunabula to make them more palatable to an audience bred on monk-y manuscripts.

But we're uploading masses of this stuff. Too much. Who can read the glut of data that is available - and relevant - to them? Researchers are inundated. "People would really like to try to avoid reading," in order to get on with research rather than background tasks. Web 2.0's "read + write" capabilities help researchers to help each other find what's out there. Blogs are ubiquitous and emerging tools are enabling easier distinction between research-related and other postings. Social bookmarking allows us to share with others, quickly and easily, the information we are interested in. Tagging enables filtering of bookmarks; ultimately it's a process of subscribing to a colleague's brain.

Web 3.0 takes us beyond "read + write" to "read + write + identity + compute". It promises that we won't need to strip data out of published articles (extracting HTML from a print facsimile) and analyse it before stuffing it back in. Instead, we'll create consistent metadata, structure it, share it in standard, easily computer-digestible forms and make better use of the content that is out there: it's the semantic web. Storing data in formats such as RDF allows relationships between data to be modelled; metadata encoded in this way allows HTML pages to be queried (using SPARQL) to extract metadata NOT by harvesting and parsing (unreliable, prone to error) but by extracting tagged fields: the page is not only human-readable, but also machine-readable. This machine-compatibility is key to the semantic web and to Web 3.0 (whatever that is). Just as tables of contents, page numbers and many other tools were developed - over centuries - to make printed content more accessible and useful, so we are now developing new tools that make our current content formats more accessible, more useful.
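
To make the "tagged fields, not parsing" idea concrete, here is a minimal sketch in plain Python. It is not RDF or SPARQL itself, and nothing like it was shown in the talk: it only imitates the underlying model, in which metadata is stored as subject/predicate/object triples and a query is a pattern matched against them. The article and person identifiers, the titles and names, and the helper functions match and titles_by_author are all made up for illustration; the property names simply borrow the look of Dublin Core and FOAF terms.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical metadata for two invented articles. Every identifier,
# title and name here is an assumption made purely for illustration.
TRIPLES: List[Triple] = [
    ("article:1", "dc:title", "An Example Article"),
    ("article:1", "dc:creator", "person:a"),
    ("article:2", "dc:title", "Another Example Article"),
    ("article:2", "dc:creator", "person:a"),
    ("article:2", "dc:creator", "person:b"),
    ("person:a", "foaf:name", "A. Researcher"),
    ("person:b", "foaf:name", "B. Scholar"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for subj, pred, obj in triples:
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subj, pred, obj)


def titles_by_author(triples: List[Triple], name: str) -> List[str]:
    """Follow name -> person -> article -> title, with no text parsing."""
    people = {subj for subj, _, _ in match(triples, p="foaf:name", o=name)}
    articles = {
        subj for subj, _, obj in match(triples, p="dc:creator") if obj in people
    }
    return sorted(
        obj for subj, _, obj in match(triples, p="dc:title") if subj in articles
    )


if __name__ == "__main__":
    print(titles_by_author(TRIPLES, "A. Researcher"))
```

The point of the sketch is that the question "which articles did this person write?" is answered by following explicit relationships between tagged fields, rather than by guessing where an author name sits in a page layout. A real deployment would use an actual RDF store and a SPARQL query in place of these helper functions.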

Richard Gedye asks whether the technology exists to track how many times an article is bookmarked across multiple social bookmarking sites (answer: yes) and to drill down and explore who has bookmarked it (yes, theoretically, but there are privacy issues).

Mark Ware has been exploring Geoff's page during the presentation, and has picked up an article entitled "Scientists shun Web 2.0". Connotea has 50,000 users averaging fewer than 10 tags per user; Ginsparg's review of social bookmarking shows low uptake. Why? Answer: We had the same reaction to personal computers, to email, and to many other technologies at this stage of their development. [We're still fairly early on the adoption curve]. Some things like RSS have only really become useable in the last year or so, as browsers become more intelligent. Only when technologies mature and people recognise the value they add will there be good uptake. Mark responds that scientists don't see that value yet - no time is being saved, they think. Geoff says it IS more efficient; don't knock it till you've tried it. Our current means of interacting as a community, and sharing information, is going to conferences and networking with our peers. That's a much higher-bandwidth method than sharing content digitally.



Geoffrey said...

Slight clarification: I meant that people already find that social events like conferences are very high-bandwidth methods of sharing information. In fact, they are much higher bandwidth than traditional reading and research techniques. The value I see in digital social networking tools is that they replicate this high-bandwidth form of social communication.


11:06 am  
Mark Ware said...

And to clarify my point, I'm not knocking social bookmarking and the other Web 2.0 stuff - I use them all the time myself - but I think it's a genuinely interesting question why scientists don't use them more when (as Geoff says) they seem to replicate important parts of scholarly communication.

One clue may have been in Herbert van de Sompel's fascinating presentation - the real social graph may lie in usage and citations, in which everyone participates.

9:08 am  
Geoffrey said...

And I would wager that exposing the implicit social graph in researchers' current informal communication via email would provide a much more "real" social graph than usage or citations. The interesting thing will be when (if) this informal conversation moves from email to social software systems.

Alternatively, we could tag researchers with RFID tags when they visit conferences. I'll just go get my blow-gun...


9:30 am  