Semantic Web/Introduction

From Wikibooks, open books for an open world

The semantic web is an exciting evolution of the World Wide Web (WWW) providing machine-readable and machine-comprehensible information far beyond current capabilities. In an age of information deluge, governments, individuals and businesses will come to rely more and more on automated services, which will improve in their capacity to assist humans by “understanding” more of the content on the web. This has potentially far-reaching consequences for all businesses today.

More information on the web needs to be structured in a form that machines can “understand” and process rather than merely display. This approach relies solely on a machine’s ability to solve complex problems by performing well-defined operations on well-defined data. Sir Tim Berners-Lee, inventor of the World Wide Web, coined the term “Semantic Web” to describe it. Berners-Lee, Hendler and Lassila provide the following definition:

The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

—Tim Berners-Lee, Ora Lassila, James Hendler, Scientific American May 2001

What Is The Semantic Web?

The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale. You can think of it as being an efficient way of representing data on the World Wide Web, or as a globally linked database.

The Semantic Web was thought up by Tim Berners-Lee, inventor of the WWW, URIs, HTTP, and HTML. There is a dedicated team of people at the World Wide Web Consortium (W3C) working to improve, extend and standardize the system, and many languages, publications, tools and so on have already been developed. However, Semantic Web technologies are still very much in their infancy, and although the future of the project in general appears to be bright, there seems to be little consensus about the likely direction and characteristics of the early Semantic Web.

What's the rationale for such a system? Data that is generally hidden away in HTML files is often useful in some contexts, but not in others. The problem with most of the data currently on the Web in this form is that it is difficult to use on a large scale, because there is no global system for publishing data in such a way that it can be easily processed by anyone. For example, think of information about local sports events, weather, plane times, Major League Baseball statistics, and television guides: all of this information is presented by numerous sites, but all in HTML. As a result, it is often difficult to use this data in the ways that one might want to.

So the Semantic Web can be seen as a huge engineering solution... but it is more than that. We will find that as it becomes easier to publish data in a repurposable form, more people will want to publish data, and there will be a knock-on or domino effect. We may find that a large number of Semantic Web applications can be used for a variety of different tasks, increasing the modularity of applications on the Web. But enough subjective reasoning... on to how this will be accomplished.

The Semantic Web is generally built on syntaxes which use URIs to represent data, usually in triples-based structures, i.e., many triples of URI data that can be held in databases, or interchanged on the World Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called "Resource Description Framework" (RDF) syntaxes.

URI - Uniform Resource Identifier

A URI is simply a Web identifier: like the strings starting with "http:" or "ftp:" that you often find on the World Wide Web. Anyone can create a URI, and ownership of URIs is clearly delegated, so they form an ideal base technology on which to build a global Web. In fact, the World Wide Web is such a thing: anything that has a URI is considered to be "on the Web".
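As a rough illustration of what such an identifier looks like to a program, the sketch below splits two URIs into their parts using Python's standard `urllib.parse` module. The URIs and the helper name `describe_uri` are made up for this example. For "http:" URIs, the authority part (the host name) is one place where delegated ownership shows up, since whoever controls that host decides what the identifiers under it mean.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into its main components (hypothetical helper for illustration)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # e.g. "http" or "ftp"
        "authority": parts.netloc,   # the host that the identifier is delegated to
        "path": parts.path,
        "fragment": parts.fragment,  # the part after "#", if any
    }


# Hypothetical example URIs; example.org is reserved for documentation use.
for uri in ("http://example.org/people/alice#me",
            "ftp://ftp.example.org/pub/readme.txt"):
    print(describe_uri(uri))
```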

The syntax of URIs is carefully governed by the IETF, which published RFC 2396 as the general URI specification (since superseded by RFC 3986). The W3C maintains a list of URI schemes.
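With URIs in hand, the triples-based structure described earlier can be sketched in a few lines of Python. This is only an in-memory illustration, not an RDF syntax or a real RDF library: the `Triple` type, the `match` helper and every `http://example.org/` URI below are assumptions invented for the example. Each triple is a subject, a predicate and an object, all given as URIs, and a simple pattern match shows how such data might be queried like a small database.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate and object, each given as a URI string."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and terms, invented for this illustration only.
EX = "http://example.org/"

store: List[Triple] = [
    Triple(EX + "event/1", EX + "terms/sport", EX + "sport/baseball"),
    Triple(EX + "event/1", EX + "terms/venue", EX + "venue/city-park"),
    Triple(EX + "event/2", EX + "terms/sport", EX + "sport/baseball"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every argument that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Find every event whose sport is baseball and print its URI.
for t in match(store, predicate=EX + "terms/sport", obj=EX + "sport/baseball"):
    print(t.subject)
```

Because every part of a triple is a URI, two sites that publish triples about the same subject URI can have their data merged simply by combining their lists, which is one way to read the "globally linked database" idea.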