Tuesday, April 13, 2010

Looking for Semantic Meaning

Richard Padley is very much aware that the semantic web can be a confusing area of computer acronyms and complex concepts, and he aims to make it easier for us. He starts by describing the web as something that has been built by humans for humans. The semantic web seeks to add value by creating machine readable data.

Facts cannot be copyrighted, and this is a fundamental concept at the heart of the semantic web. Making facts openly available in a machine-readable format opens up a huge amount of potential for a variety of applications. However, the concept of trust is really important here: where did that fact come from, and is it a reliable source?

There is nothing particularly formal about open data at the moment. Although standardisation is going on, most of the work is growing from the grass roots, driven by people exposing data and then experimenting with it.

The 'linked' part of linked data is the 'genius' part of the open data movement, using web addresses as identifiers to allow you to link to any concept without having to redefine it in your work.
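To make the idea of web addresses as identifiers a little more concrete, here is a minimal sketch in Python. It is my own illustration, not something shown in the talk: the example.org addresses, the two "publisher" data sets and the `describe` helper are all hypothetical. Each fact is a (subject, property, value) triple, and two independent sources can say things about the same concept simply by using the same address for it.

```python
# All addresses below are hypothetical identifiers made up for illustration.
ARTICLE = "http://example.org/id/article/42"
AUTHOR = "http://example.org/id/person/7"

# Two independent sources publish facts as (subject, property, value) triples.
publisher_a = {
    (ARTICLE, "http://example.org/prop/title", "Looking for Semantic Meaning"),
    (ARTICLE, "http://example.org/prop/author", AUTHOR),
}
publisher_b = {
    (AUTHOR, "http://example.org/prop/name", "A. N. Other"),
}


def describe(identifier, *sources):
    """Collect every fact about one identifier from any number of sources."""
    return sorted(
        (prop, value)
        for source in sources
        for (subject, prop, value) in source
        if subject == identifier
    )


# Source A links to the author by address; source B describes that address.
# Neither source had to redefine the other's concept.
author_facts = describe(AUTHOR, publisher_a, publisher_b)
print(author_facts)
```

Because the author is referred to by address rather than by a locally invented key, the second source's description can be picked up without either side changing its data.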

Open data can help solve the fragile schema problem. With a traditional database, any change to the core information can cause major disruption: the changes have to be communicated and tested, and everyone who relies on that data has to be able to update without breaking. Pulling data in real time from a variety of resources that effectively manage changes to information means these changes are handled 'on the fly', allowing knowledge to be effectively repurposed.
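One way to picture why this is less fragile is the rough sketch below. It is my own assumption about the mechanism rather than anything from the talk, and the data and the `titles` function are hypothetical. When facts arrive as individual triples instead of rows in a fixed table, a consumer can read only the properties it understands, so a source adding a new property does not break it.

```python
def titles(triples):
    """A consumer that only reads 'title' facts and ignores everything else."""
    return sorted(value for (_subject, prop, value) in triples if prop == "title")


# Hypothetical data as first published by the source.
before = [("article/1", "title", "Linked Data Basics")]

# Later the source starts publishing an extra property.
after = before + [("article/1", "reviewer", "A. N. Other")]

# The consumer gives the same answer before and after the change.
unchanged = titles(before) == titles(after)
print(unchanged)
```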

Richard demonstrates how two pieces of text can be analysed using inference to answer questions. By taking one paper that describes the factual properties of 'needles' and 'hay', and a second paper that describes the process of attracting needles using steel, a computer can answer the question 'How do I find a needle in a haystack?'.
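The kind of inference described above can be sketched in a few lines of Python. The exact statements in Richard's two papers are not recorded here, so the facts below are my own illustrative assumptions (needles are made of steel, hay is dried grass, magnets attract steel), and the single rule and the function names are hypothetical too.

```python
# Assumed facts from the first paper: properties of needles and hay.
PAPER_ONE = {
    ("needle", "made_of", "steel"),
    ("hay", "made_of", "dried_grass"),
}
# Assumed fact from the second paper: what attracts what.
PAPER_TWO = {
    ("magnet", "attracts", "steel"),
}


def infer(facts):
    """Apply one rule until nothing new appears:
    if X attracts M and Y is made_of M, then X attracts Y."""
    known = set(facts)
    while True:
        new = {
            (x, "attracts", y)
            for (x, p1, m) in known if p1 == "attracts"
            for (y, p2, m2) in known if p2 == "made_of" and m2 == m
        }
        if new <= known:
            return known
        known |= new


def what_separates(target, background, facts):
    """Find things that attract the target but not the background."""
    known = infer(facts)
    return sorted(
        x for (x, p, y) in known
        if p == "attracts" and y == target
        and (x, "attracts", background) not in known
    )


# Neither paper states this on its own; it only follows from combining them.
answer = what_separates("needle", "hay", PAPER_ONE | PAPER_TWO)
print(answer)
```

Under these assumed facts the answer is a magnet, a conclusion that appears in neither paper by itself.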

One of the problems faced by publishers is that we still work within the metaphor of paper in academic publishing, for example by keeping page references that reflect the print world in an online presentation. Publisher platforms are also silos, making linking data across systems impossible. This means users have to develop ever more sophisticated searching skills to cope with the silos. Linked data could take that searching requirement away.

The challenge to publishers is how to change in order to make the best use of linked data, moving beyond the paper metaphor and moving outside the boundaries of their silos. It is time to wake up and start experimenting.

Finally, Monty Python and the Holy Grail can show us how flawed facts can corrupt linked data.


Charlotte said...

Thanks Nicole - this is a great summary of a really complex subject, (well for me anyway)

1:34 pm  

