What is the Semantic Web and why should I care? – Part 2: Metadata

This is the second article in a small series that tries to explain the Semantic Web while leaving people better informed as to whether the Semantic Web, or any part of it, is the right solution for their problems. Since some of the core ideas behind the Semantic Web are rather abstract, I’ll be tackling it in small chunks, with shorter posts meant for easier digestion.

In part 1 of this series, I described the two aspects of the Semantic Web and covered the first of those, common shared meaning. In this article, I’ll expand on the second aspect, machine processable metadata.

Metadata

Data, there’s tons of it. The only way to make heads or tails of it is to store data about the data. That, in a nutshell, is metadata: data about other data. The statement “His first name is John” says that John is the first name of someone. This is a plain-English version of metadata. The piece of data here is John, and it is assigned some metadata that gives us more information about what the data means. In this case, John is someone’s first name, as opposed to their last name.

Since the Semantic Web definition we talked about in Part 1 specified machine processable metadata, we need to be able to transcribe metadata so that computers can parse it. As an example, let’s look at this XML:

<firstName>John</firstName>

In this case, John is the data we are talking about and <firstName> is the metadata we use to describe the piece of data.

However, the problem is that, just like the data itself, the metadata that we store can have an infinite number of values. While having an infinite number of available choices for the data is expected, we should be able to narrow down what metadata we know about and expect. We need a way to specify a range of acceptable values so that machines can parse the metadata. For data in XML format, this is where XML Schema comes into play. XML Schema allows us to specify a schema:

<xsd:element name="firstName"/>

This schema defines what metadata can be found when parsing our data. However, just as XML itself can only specify the syntax of the data we expect, XML Schema can only specify the syntax of the metadata we expect. Neither has any ability to represent semantics about our data. The element ‘firstName’ could just as well be named ‘aaaaaa’, and a computer wouldn’t know the difference. We need to use some other means to represent semantics when storing data.
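To see how little a parser cares about element names, here is a minimal Python sketch using only the standard library. The `<person>` wrapper and the `read_only_child` helper are made up for illustration; they are not part of any schema discussed here.

```python
import xml.etree.ElementTree as ET

def read_only_child(xml_text):
    """Parse a tiny document and return (tag, text) of its first child."""
    root = ET.fromstring(xml_text)
    child = root[0]
    return child.tag, child.text

# Same structure, same data; only the element name differs.
meaningful = "<person><firstName>John</firstName></person>"
gibberish = "<person><aaaaaa>John</aaaaaa></person>"

print(read_only_child(meaningful))
print(read_only_child(gibberish))
```

Both documents parse without complaint, and the parser hands back the tag as an opaque string either way. Any meaning attached to ‘firstName’ lives in the head of the person reading it.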

Enter RDF

RDF stands for Resource Description Framework. It is built on three basic concepts:

  • resources
  • properties
  • statements

Resources are the things that you are describing. People, places, books and articles are examples of resources. Properties are attributes of those resources. Hair color, address, author, and magazine name are examples of properties that I would use to describe the resources above. Statements simply assign values to properties that describe resources. Putting the three together, a statement would look like this:

Griffin Caprio lives in Chicago

In the above statement, “Griffin Caprio” is the resource, “lives in” is the property, and “Chicago” is the value that we’re assigning to the property “lives in”. Now, RDF can take many serialized forms. Since I have been using XML up to this point, we’ll concentrate on RDF’s XML representation. Thus, the above statement would look something like this:

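<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:myNS="http://www.myNS.com/rdf#">
  <!-- The resource being described -->
  <rdf:Description rdf:about="http://blog.1530technologies.com/author">
    <!-- Its type, written as a full URI so no entity declaration is needed -->
    <rdf:type rdf:resource="http://www.myNS.com/rdf#bio"/>
    <!-- A property from my own schema, with its value -->
    <myNS:livesIn>Chicago</myNS:livesIn>
  </rdf:Description>
</rdf:RDF>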

The <rdf:Description> element says that this piece of RDF is describing the resource specified in the about attribute. The <rdf:type> element creates a connection between this RDF statement and a type in my schema. The <myNS:livesIn> element is an element from my own schema that I am assigning a value to.
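If it helps to see the resource, property, and value fall out of the XML, here is a rough Python sketch using only the standard library. It is not a real RDF parser: the `extract_triples` helper is made up for illustration and only understands the simple rdf:Description form shown above. The sample document spells out the type as a full URI and assumes the myNS namespace ends in ‘#’.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Sample document; the '#' at the end of the myNS namespace is an assumption.
RDF_XML = """
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:myNS="http://www.myNS.com/rdf#">
  <rdf:Description rdf:about="http://blog.1530technologies.com/author">
    <rdf:type rdf:resource="http://www.myNS.com/rdf#bio"/>
    <myNS:livesIn>Chicago</myNS:livesIn>
  </rdf:Description>
</rdf:RDF>
"""

def extract_triples(xml_text):
    """Return (resource, property, value) tuples from rdf:Description blocks."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # A property either points at another resource or holds literal text.
            value = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, prop.tag, value))
    return triples

for triple in extract_triples(RDF_XML):
    print(triple)
```

Each printed tuple is one statement: the resource from the about attribute, the property (ElementTree reports it as `{namespace}name`), and the value assigned to it.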

That’s it. That’s RDF.

So what?

That’s it? Yup, basically that’s all RDF is. There are some more elements and attributes, but these are the basics. Honestly, when I first heard of RDF, it made no sense where it would be useful. I even read an entire book on RDF, and still couldn’t make heads or tails of why it was useful. It seemed to add some elements ( Description, type ) and an attribute ( about ), but other than that, you still had to use your own schema ( the myNS:livesIn element above ). To bend your noodle a little more, there’s even a simplified syntax that is allowed for representing RDF in XML. Thus, the above RDF/XML snippet can be represented as such:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:myNS="http://www.myNS.com/rdf#">
  <!-- The element name gives the type; the property becomes an attribute -->
  <myNS:bio rdf:about="http://blog.1530technologies.com/author"
            myNS:livesIn="Chicago"/>
</rdf:RDF>

Wait a minute, so in the above, the only thing I’m adding is surrounding <rdf:RDF> tags and about attributes? Wow, I feel the Semantic Web engrossing me already.

I couldn’t figure out what the difference was between this and getting everyone to use the same schema. I figured I just wasn’t getting it. But if this was such an important technology, and the cornerstone of something like the Semantic Web, surely someone should have been able to come up with a use for RDF since its inception. I mean, the authors of the RDF syntax specification are very bright guys, and Tim Berners-Lee ( Author of the WWW, perhaps you’ve heard of it? ) firmly believes in RDF as a key component of the Semantic Web. So what am I missing here?

Enter RDF Schema

It turns out that just as there is XML Schema for specifying the structure and syntax of XML files, so too does RDF Schema ( RDFS ) exist for specifying the classes and properties that RDF documents can use. However, and this is a key difference, XML is still useful without XML Schema, while, IMHO, RDF is useless without RDFS. This could just be my opinion, but since no one seems to:

  1. Know anything about RDF, despite it being around, in one form or another, for almost 8 years.
  2. Even know that RDFS existed, despite RDF actually being described by it.

I feel pretty confident in putting forth my theory.

RDF Schema gives you the ability to define semantics for any domain that you want. By defining classes of objects and their properties, users can define relationships between resources. Let’s use a modified version of the example I talked about in Part 1. We have data that looks like this:

<executive>Bill Johnson</executive>

<intern>James Smith</intern>

Our problem is, how do we query this data for all employees? Syntax-based queries, such as XPath, would not be able to support this. However, using RDFS, we can define an RDF Schema document that does three things:

  1. Defines an ‘Employee’
  2. Defines an ‘Executive’, that descends from ‘Employee’
  3. Defines an ‘Intern’ that descends from ‘Employee’

Now, we can use a semantic-based query, such as an RDF query, to return all employees.
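To show the difference between the two kinds of query, here is a toy Python sketch. It is not RDF or a real query language; the `SUBCLASS_OF` table, the `PEOPLE` list, and the function names are all made up for illustration. The idea is only that a semantic query walks the class hierarchy, while a syntactic one matches names literally.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {"Executive": "Employee", "Intern": "Employee"}

# Each person is tagged with the most specific class we know about.
PEOPLE = [("Bill Johnson", "Executive"), ("James Smith", "Intern")]

def is_a(cls, target):
    """True if cls equals target or descends from it in SUBCLASS_OF."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)  # step up to the parent, if any
    return False

def syntactic_query(target):
    """Match on the literal class name only, like a tag-name lookup."""
    return [name for name, cls in PEOPLE if cls == target]

def semantic_query(target):
    """Match anything whose class is the target or one of its descendants."""
    return [name for name, cls in PEOPLE if is_a(cls, target)]

print(syntactic_query("Employee"))  # nobody is tagged 'Employee' directly
print(semantic_query("Employee"))   # both people descend from 'Employee'
```

The syntactic version comes back empty because no record literally says ‘Employee’. The semantic version finds both people because the hierarchy tells it that executives and interns are employees.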

Now, I’ll talk a little about one aspect of RDF/RDFS that may have you scratching your head, and I’ll define the RDFS for the example above. So, if you don’t care about the guts of RDFS, feel free to skip to the next section.

Ok, for those of you still with me, it may amuse you to know that when you define RDF types using RDFS, you use the <rdfs:Class> type, such as:

<rdfs:Class rdf:ID="employee"/>

<rdfs:Class rdf:ID="executive">
  <rdfs:subClassOf rdf:resource="#employee"/>
</rdfs:Class>

<rdfs:Class rdf:ID="intern">
  <rdfs:subClassOf rdf:resource="#employee"/>
</rdfs:Class>



Here is the RDF Schema for our little example above. What’s interesting is that the RDF Schema for RDF Schema itself looks like this:

<rdfs:Class rdf:ID="Resource"/>

<rdfs:Class rdf:ID="Class">
  <rdfs:subClassOf rdf:resource="#Resource"/>
</rdfs:Class>



Clever readers will notice that Class is a subclass of Resource, while Resource is itself declared as a Class. Oh, and Class is defined using itself. Is it any wonder that these concepts are a little hard to grasp for most people?



Now, forget about RDFS ( kinda.. )

Hopefully the value of RDF/RDFS is beginning to come through. By being able to define higher level semantics using RDFS, your RDF documents become a little smarter. This is where the second part of the Semantic Web definition comes into play. With the addition of RDFS, your metadata AND semantics become machine processable. This is because RDFS is a primitive ontology language. Ontologies are formalized semantics. We defined a simple ontology above when we mapped the hierarchy of Employee, Executive, and Intern.

Unfortunately, the key word in the previous description of RDFS is PRIMITIVE. RDFS only allows for very simple relationships and semantics, mostly centered around hierarchies. “Executive” descends from “Employee”, for example. In order to specify more complex semantics, a more robust language is required.

This is where the Web Ontology Language comes into play. Incidentally, the TLA for the Web Ontology Language is OWL….HUH? Shouldn’t it be WOL? Eh, the craziness continues….

Coming up for Part 3….Ontologies: Semantic Data Nirvana?

Related Posts:

* What is the Semantic Web and why should I care? – Part 1: Semantics