This is the first article in a small series that explains the Semantic Web and aims to leave readers better informed as to whether the Semantic Web, or any part of it, is the right solution for their problems. Since some of the core ideas behind the Semantic Web are rather abstract, I’ll be tackling it in small chunks, with shorter posts meant for easier digestion.
Too Much Data
With the explosion of the internet and the web, there is a wealth of data for everyone to read, process and understand. However, the sheer volume of data can be overwhelming for a single person to handle. Add the amount of user-generated content (blogs, del.icio.us bookmarks, etc.) and the problem becomes almost unsolvable.
What we need is a way to create automated agents that go out and find the relevant data for us, and bring it back. Tags are a poor man’s example of this type of data collection agent. We specify the tags that are of interest to us, and whenever someone uses one of those tags, we get notified. However, there are all sorts of problems with this type of collection. Someone could misuse a tag, or we could get false positives. A false positive in this case occurs when a tag carries more than one meaning. Take the tag ‘traffic’ for example. I could use that tag for content relating to automobile traffic patterns, web site traffic optimization hints, or a report on illegal trading of drugs. Any of those pieces of content could use the tag ‘traffic’, and yet all three have different audiences and different meanings. Since a tag only matches on a single keyword, it casts a wide net. So what’s a solution? Some say it’s the Semantic Web.
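To make the wide net concrete, here is a minimal Python sketch of keyword-only tag matching. The item titles, the extra tags and the `find_by_tag` helper are made up for illustration; they are not taken from any real tagging service.

```python
# Hypothetical tagged items: three use 'traffic' with three different meanings.
tagged_items = [
    {"title": "Automobile traffic patterns", "tags": {"traffic", "cars"}},
    {"title": "Web site traffic optimization hints", "tags": {"traffic", "web"}},
    {"title": "Report on illegal trading of drugs", "tags": {"traffic", "crime"}},
    {"title": "Weekend gardening tips", "tags": {"gardening"}},
]


def find_by_tag(items, tag):
    """Return the titles of every item that carries the given tag keyword."""
    return [item["title"] for item in items if tag in item["tags"]]


# The match is on the keyword alone, so all three meanings come back together.
matches = find_by_tag(tagged_items, "traffic")
for title in matches:
    print(title)
```

Someone interested only in web site traffic still receives the other two items, because nothing in the tag itself says which meaning was intended.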
Everyone is talking about the Semantic Web like it’s 1998 all over again, but what exactly is the Semantic Web and, more importantly, why should you care about it? The two key concepts of the Semantic Web are:
* Common shared meaning
* Machine-processable metadata
By combining those two characteristics, the Semantic Web is supposed to usher in a new era of knowledge management and content dissemination. The first characteristic will be covered in this article, with the second characteristic covered in Part 2.
Semantics is the word typically used for the idea of “common shared meaning”. Dictionary.com defines semantics, in part, as:
“the meaning, or an interpretation of the meaning, of a word, sign, sentence, etc”
Well then, that’s pretty clear, isn’t it? That’s a great academic definition, but what does it mean in relation to technology? More specifically, what does it mean to the web?
Syntax vs. Semantics
To understand how semantics relates to the web, you first need to understand syntax, and what the difference is between semantics and syntax.
Imagine for a moment that you have a web service that defines two operations:
Add(int number1, int number2) : int
Subtract(int number1, int number2) : int
Now, any human can look at the above service definition and infer that the first operation adds two input numbers and returns the sum, and that the second operation takes two input numbers and returns the difference between them. However, a machine processing this definition would not be able to tell what distinguishes the two beyond their names, which are just opaque labels to it. That’s because the syntax of both operations is the same: both accept two numbers and return a third.
Since a human already has an idea of what ‘Add’ and ‘Subtract’ are, and the syntax of the two operations roughly matches what they know, they are able to draw the distinction between the two. That, in a nutshell, is what semantics is. Let’s look at another example.
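As a rough illustration, the same situation can be sketched in Python. The two functions below are stand-ins for the service operations (the Python versions are an assumption of this sketch, not part of the service definition), and the standard `inspect` module plays the role of the machine looking at the definition.

```python
import inspect


def add(number1: int, number2: int) -> int:
    """Stand-in for the Add operation."""
    return number1 + number2


def subtract(number1: int, number2: int) -> int:
    """Stand-in for the Subtract operation."""
    return number1 - number2


# Inspect only the syntax: parameter names, parameter types and return type.
add_signature = inspect.signature(add)
subtract_signature = inspect.signature(subtract)

# The signatures compare equal even though the behavior differs.
same_syntax = add_signature == subtract_signature
print(add_signature)
print(same_syntax)
print(add(5, 3), subtract(5, 3))
```

The signatures are indistinguishable; the only thing separating the two operations is a name, and a name means something only to a reader who already knows what adding and subtracting are.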
Suppose you have an XML file that contains the following data representing two members of a school’s faculty:
<academicStaffMember>Bill Johnson</academicStaffMember>
<professor>Dr. James Smith</professor>
What if you wanted to query for all faculty members in this document? Again, a human would be able to see that not only is an academic staff member a faculty member, but a professor is also a faculty member. A machine processing this file would only be able to perform a superficial query, typically using XPath, like this:
//academicStaffMember – To get all “academic staff members”
//professor – To get all “professors”
There is no query, however, that asks for “all faculty members” and returns both elements. Again, there is no way for a machine to establish a connection between the two elements in the same way that a human can. Since most web pages are marked up in much the same way as this XML, you should have a better idea of the core problem that the Semantic Web tries to solve.
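The limitation is easy to reproduce with Python’s standard `xml.etree.ElementTree` module, which supports a subset of XPath. In this sketch the `<faculty>` wrapper element is an assumption added only so the fragment parses as a single document, and `facultyMember` is a hypothetical element name standing in for the concept a human has in mind.

```python
import xml.etree.ElementTree as ET

# The two elements from the example, wrapped in a hypothetical root element.
document = """
<faculty>
  <academicStaffMember>Bill Johnson</academicStaffMember>
  <professor>Dr. James Smith</professor>
</faculty>
""".strip()

root = ET.fromstring(document)

# Queries on the literal element names work as expected.
staff = [element.text for element in root.findall(".//academicStaffMember")]
professors = [element.text for element in root.findall(".//professor")]

# A query on the concept finds nothing: no element carries that name.
faculty_members = [element.text for element in root.findall(".//facultyMember")]

print(staff)
print(professors)
print(faculty_members)
```

A person could, of course, combine the two working queries by hand, but only because they already know that both element names describe faculty members. That knowledge lives in the reader’s head, not in the document.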
Wrapping It Up
The Semantic Web involves data that not only conforms to a specific syntax, but also carries with it meaning, so that it can be processed and parsed in an automated fashion. How exactly that meaning is expressed is the other half of the Semantic Web definition given above, and is the subject of Part 2.