Atom-OWL Ontology 12 August 2004

For an overview of my work see the parent directory.

title: A Blog on combining Atom and FOAF
author: Henry Story | created: 29 Jun 2004 08:10:00 GMT
N3 file: entry.2004-06-29-1010.n3 | RDF file: entry.2004-06-29-1010.rdf | Atom-like RDF file: atom.2004-06-29-1010.rdf
This is an example blog to illustrate what a fully semantic weblog could look like. Instead of using RSS or Atom content for the feed, I show here how one can create a very powerful, flexible and easy to understand system based on Semantic Web Triples, presented for clarity in N3 format and backed up by an OWL ontology.

This project is a continuation of the work by Danny Ayers on Atom+OWL, but it does not feel the need to stay so close to the Atom specification, and is more interested in using all the tools available to the Semantic Web developer to show how much further one can go in this direction, than it is in convincing people in the Atom community that there is a simple mapping from their work to this one. By integrating with the rest of the Semantic Web community one can also more clearly describe the true contribution of Atom: by refactoring Danny's simple mapping, and extracting elements that are better addressed by other OWL libraries (such as FOAF), we can locate the core of Atom.

Another difference concerns the role of standards. Whereas Atom aims to create a standard format for blogging, this method follows the spirit of the FOAF group: it is an attempt to create something like a software library, in the manner of an Open Source project.

author: Henry Story | created: 12 Jul 2004 17:35:00 GMT
N3 file: entry.2004-07-12-1935.n3 | RDF file: entry.2004-07-12-1935.rdf | Atom-like RDF file: atom.2004-07-12-1935.rdf
All of the content here is freely licensed: the code under the GPL, and the text under the Attribution-ShareAlike 2.0 Creative Commons license.

title: A Note on Format
author: Henry Story | created: 12 Jul 2004 17:45:00 GMT
N3 file: entry.2004-07-12-1945.n3 | RDF file: entry.2004-07-12-1945.rdf | Atom-like RDF file: atom.2004-07-12-1945.rdf
This research is presented in blog format, so that it can itself be a test case for the format it is advocating. The HTML presentation of this blog does not follow the usual blogging tradition of presenting the main entries in reverse chronological order, as this would make following the flow of ideas presented here a little awkward.

The blog entries are written by Henry Story, as are the replies, which were added to illustrate threading. All of the HTML and N3 files contained in this directory were generated by a little Java program available in the same directory. Some of the noteworthy files are:

  • Atom.owl and Atom.n3, the formal OWL spec which backs this all up.
  • AllInOneDatabase.n3, a file that contains all the entries. More on this below.
  • feed.n3, which is the dynamic entry point to the feed for this blog. The other 'feedxxx.n3' files are feed archives.
  • The files that describe the individual entries of this blog.

This model is the second attempt, and tries to incorporate all the lessons learned from the first one.

title: An overview UML diagram
author: Henry Story | created: 12 Aug 2004 18:35:00 GMT
N3 file: entry.2004-08-12-2035.n3 | RDF file: entry.2004-08-12-2035.rdf | Atom-like RDF file: atom.2004-08-12-2035.rdf
Here is a partially simplified UML diagram of Atom-FOAF.

The reality is a little more complicated because there are in fact two ways to represent an Entry:

  • the simple default one shown here
  • another way that takes into account the possible states an Entry can have over time.
There is a simple logical relation between the two views, which I will get into in a later blog entry.

Two important things to notice here are the yellow and green background zones.
The classes on the green background come from the FOAF namespace. Those on the yellow background have until recently been thought to belong to the Atom namespace. It is my contention here (arrived at after a long conversation with Ken McLeod) that these Feed classes are in fact much more general, and don't in any particular way belong to Atom. We can find similar structures in many places we look on the web - pretty much anywhere we need to chunk a potentially large list of results into smaller sections - such as for example search engine results, WebDAV search results(?)... So this is a first attempt at simplification. By pushing out everything Atom-related into the Blog class located on the white background reserved for Atom concepts, we end up with a little 'Feed' structure that could be useful elsewhere (after due renaming perhaps) and with a Blog class where we can place a lot of the 'introspection' information.

The UML diagram is of course backed up by the formally specified Atom OWL spec that ships with this release.

title: It's all about the Entries, stupid!
author: bob wyman | created: 13 Aug 2004 03:30:00 GMT
N3 file: entry.2004-08-13-0530.n3 | RDF file: entry.2004-08-13-0530.rdf | Atom-like RDF file: atom.2004-08-13-0530.rdf
This also very nicely illustrates what I was trying to get at with the 'It's about the Entries, Stupid!' post I sent a few months back. By simplifying the model down to the core it becomes apparent that it is indeed the Entry that is at the core of Atom. The person and the Feed concepts are not central to Atom. They are generic concepts that can be found and used elsewhere.

title: N3 illustration - The Feed
author: H. Story | created: 13 Aug 2004 08:47:00 GMT
N3 file: entry.2004-08-13-1047.n3 | RDF file: entry.2004-08-13-1047.rdf | Atom-like RDF file: atom.2004-08-13-1047.rdf
To start off let us look at the feed files. There are two sets of these files:
  • feed.n3, which is the head of the feed, the dynamic file that changes whenever a new entry is added to the blog. This is the file that blog readers will be polling every so often.
  • feed-entries_0_to_3.n3, feed-entries_x_to_y.n3,... each of which is an archive of older feed entries. These files SHOULD NOT change, making them prime candidates for caching.
Each of these files is a part of the whole result. I don't yet have a concept for the union of all the content in these files. This may be something that needs adding. Each file points to the previous results with code such as
    <>   :previous
              [ a       :Link ;
                :href   <feed-entries_0_to_3.n3> ;
                :mime-type "application/rdf+n3"^^xsd:string ;
                :text   "previous 4 entries"^^xsd:string 
              ] .

which says that the previous entries can be found at the resource <feed-entries_0_to_3.n3>. The file <feed-entries_0_to_3.n3> itself points back to the dynamic element of the feed thus:
  <>   a       :Feed ;
      :about  <blog.n3> ;
      :dynamic <feed.n3> ;

Here :dynamic points to the dynamic part of the feed, and :about points to the blog file that contains the so-called 'introspection' information about the blog: namely the URL for adding new entries, and other things which I know are not yet fully thought through.
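
A client could rebuild the union of all these feed files by following the :previous links from the head of the feed. Here is a minimal Python sketch of that walk. It is not the Java program that generated this blog. It assumes the feed files have already been parsed into plain dictionaries, and the names `FeedPage`, `collect_entries`, `pages`, as well as all file and entry names in the sample data, are hypothetical.

```python
from typing import Dict, List, Optional, TypedDict


class FeedPage(TypedDict):
    """One feed file, reduced to the two facts this sketch needs."""
    entries: List[str]        # entry URLs listed with :entry
    previous: Optional[str]   # target of the :previous link, if any


def collect_entries(pages: Dict[str, FeedPage], head: str) -> List[str]:
    """Walk the :previous chain from `head` and return every entry URL once."""
    seen_pages = set()
    result: List[str] = []
    current: Optional[str] = head
    while current is not None and current not in seen_pages:
        seen_pages.add(current)  # guard against accidental link cycles
        page = pages[current]
        for entry in page["entries"]:
            if entry not in result:
                result.append(entry)
        current = page["previous"]
    return result


# Hypothetical data: these file and entry names are made up for illustration.
pages: Dict[str, FeedPage] = {
    "feed.n3": {"entries": ["entry-e.n3", "entry-f.n3"], "previous": "archive-2.n3"},
    "archive-2.n3": {"entries": ["entry-c.n3", "entry-d.n3"], "previous": "archive-1.n3"},
    "archive-1.n3": {"entries": ["entry-a.n3", "entry-b.n3"], "previous": None},
}

all_entries = collect_entries(pages, "feed.n3")
```

Since the archive files should not change, a real client would only need to walk as far back as the first archive it has already cached.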

Notice: The current feeds contain very little information. They point to the entries themselves. feed.n3 for example points to four (only four for illustrative purposes) entries as shown here:

<>    a       :Feed ;
      :dynamic <> ;
      :entry  <entry.2004-08-13-1047.n3> , <entry.2004-08-13-1445.n3> ,
               <entry.2004-08-13-1752.n3> , <entry.2004-08-13-1632.n3> ;

To help clients tell which entries they have or have not downloaded we can add further information such as the EntryID and the EntryVersion of each of these entries. That is done further down in the feed.n3 file:
      :entry-version <> ;
      :id     <> .

      :entry-version <> ;
      :id     <> .

Clearly a lot more could be added. One could add the title (an obvious addition), perhaps the publication date, the last changed date... One could of course add everything, as with the AllInOneDatabase.n3, but that would be extremely wasteful in bandwidth and very un-RESTful. What to add and not to add is really an empirical research topic. Having very little information is not really a problem. As long as the client can determine where the entries are and which entries it has already fetched (hence the entry-version field) it will only need to fetch the content once. With HTTP 1.1 Persistent Connections, having to make multiple requests is not at all a problem.
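
To make the role of the entry-version field concrete, here is a small Python sketch of how a client might decide what to fetch. It is only an illustration under stated assumptions: the feed is taken to be already parsed into (location, id, entry-version) triples, the client is assumed to remember the last version it downloaded for each id, and the names `FeedItem`, `entries_to_fetch` and all identifiers in the sample data are hypothetical.

```python
from typing import Dict, List, NamedTuple


class FeedItem(NamedTuple):
    location: str        # where the Entry can be fetched (:entry)
    entry_id: str        # the :id of the entry
    entry_version: str   # the :entry-version currently published


def entries_to_fetch(feed: List[FeedItem], cache: Dict[str, str]) -> List[str]:
    """Return the locations whose published version is not in the local cache.

    `cache` maps an entry id to the version the client last downloaded.
    """
    return [
        item.location
        for item in feed
        if cache.get(item.entry_id) != item.entry_version
    ]


# Hypothetical identifiers, made up for illustration.
feed = [
    FeedItem("entry1.n3", "tag:e1", "tag:e1#v2"),  # changed since last fetch
    FeedItem("entry2.n3", "tag:e2", "tag:e2#v1"),  # unchanged
    FeedItem("entry3.n3", "tag:e3", "tag:e3#v1"),  # never seen before
]
cache = {"tag:e1": "tag:e1#v1", "tag:e2": "tag:e2#v1"}

stale = entries_to_fetch(feed, cache)
```

Only the changed and the unseen entries end up in `stale`, so each version of an entry is downloaded at most once.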

title: Two perspectives on a blog entry
author: Henry Story | created: 13 Aug 2004 12:45:00 GMT
N3 file: entry.2004-08-13-1445.n3 | RDF file: entry.2004-08-13-1445.rdf | Atom-like RDF file: atom.2004-08-13-1445.rdf
The current model proposes two views of an entry:
  • the simple Entry, that can be found at a certain retrievable location, and shows only its current state.
  • the Entry as a historical thing, that encompasses all the changes that occurred to it in the actual world (we don't deal with counterfactual entries). This is the EntryID and its associated EntryVersion-s.
This is illustrated by the following diagram:

Again I have tried to highlight the two areas by placing their classes on differently colored backgrounds. On the yellow background is the main class for the temporal Entry representation, and on the green background we have the atemporal Entry representation. Given any one of these one can deduce the other, i.e. they are logical consequences of one another.

Some of the main points distinguishing them are:

  • An entry has a URL that allows one to fetch the information (for example a relative URI such as entry.2004-06-29-1010.n3), whereas EntryIDs and EntryVersions are URNs, which will indeed uniquely identify an Entry, but will not allow one to retrieve it without a search engine. This creates a fundamental difference in use between these two ways of looking at the entry. An Entry is what people should be editing and fetching in a RESTful manner using GET, POST, and PUT. An EntryID is how a client would identify the Entry-s it downloaded to keep track of the changes to them, and that to which they were responding, so it could follow how the entry propagated around the web, etc. The EntryID and EntryState classes are key elements in databases such as AllInOneDatabase.n3, which contains all the information about all the entries in this directory.
  • An entry must have an id and of course an entry-version. The unchanging parts of the Entry, its essential properties, go into the EntryID structure. The contingent properties of an Entry go into the EntryVersion. An EntryID may on the other hand have a number of EntryVersion-s. Each of these EntryVersions represents the state of the Entry over a particular span of time. From an Entry one can easily deduce the EntryVersion and EntryID fields. To go in the opposite direction one first needs to select the latest EntryVersion of an EntryID.
  • An Entry can be a reply to another EntryVersion. It is important to keep track of which version of an entry one is replying to, as this can significantly change the meaning of a response. This could help clients flag responses that might need to be updated or even deleted, or it could warn readers that a response may no longer be relevant to the entry it relates to.

title: Graphical illustration of the two perspectives
author: H. Story | created: 13 Aug 2004 14:32:00 GMT
N3 file: entry.2004-08-13-1632.n3 | RDF file: entry.2004-08-13-1632.rdf | Atom-like RDF file: atom.2004-08-13-1632.rdf
Let me illustrate the two views on an Entry graphically, so as not to have to take any sides among the many possible serialisations of semantic web triples: N3, N-Triples, RDF, ... Each of these serialisation formats can be mapped onto a graph of triples, as explained in the W3C's RDF Semantics paper. I here represent resources by rounded rectangles, blank nodes by circles, literals by rectangles, and of course predicates by named arrows.

Let us start off with a simple graphical representation of an Entry stored in a file entry1.n3 and written by Karl Dubost, in which he asserts the cryptic '2b v not2b'.

The id of the entry is tag:e1, and it is the first version, as hinted at by the entry-version, which is tag:e1#v1. The entry was created on 11 Jun 2002 at 5pm, and was published (issued) shortly thereafter, at 10 minutes past 5. (Note that since we know that the entry is written by Karl Dubost, we may be able to find who his friends are if we have access to some FOAF files that mention him.)

Perhaps shortly afterwards Karl finds that he wants to make a change to his entry. He prefers titles to start with capitals, and changes his statement to a question. He is still thinking about this change, so it does not yet have a publication date. (How we got this file is of course a problem for my story now.) As a result the graph we have is as follows:

Here I have highlighted in green the changes to the graph. Gone is the issued field, a modified date has appeared, and the data fields of the title and entry fields have slightly changed. Of course we have a new version id.

Any person who fetches entry1.n3 after the change (and after Karl issues it) will not be able to retrieve the original version, as it will have been completely replaced by the new one. They will know when the file was last modified, though. But someone might want to keep track of all these changes: either the editor that Karl is using, so that he can backtrack to a previous version if he thinks he has made a mistake, or some aggregator that wants to keep a fuller view of the changes made to the posts on Karl's web site (perhaps in order to notify the aggregator's owner that a post of Karl's he had replied to has changed). Such a tool would presumably want to store the changes in its local database by organising the entries by EntryID, in a database similar to our AllInOneDatabase.n3. The graph for this entry would then look like this:

Here the root of the tree is the EntryID, which points to the two EntryVersions. Notice that in this case the EntryVersions have an entry-location, to help find the original entry file. The location is not attached to the EntryID as the location of an Entry could change over time. In this case the entry has remained in the same position.

It should be very easy to specify a logical relation allowing one to deduce one of the views from the other. Since we are speaking of ontology, there is no conceptual priority of one of these views over the other. They both exist simultaneously.
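
The relation between the two views can be sketched in a few lines of Python. This is only a rough illustration of the idea, not the OWL definition: the class and function names are mine, the `valid_from` field (the moment a given state began) is an assumption added so that the latest EntryVersion can be selected, and in the sample data the 17:30 time and the revised title are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntryVersion:
    """Contingent properties: one state of the entry over a span of time."""
    version: str
    title: str
    location: str             # entry-location of the Entry file
    valid_from: str           # hypothetical field: ISO time this state began
    issued: Optional[str] = None


@dataclass
class EntryID:
    """Essential, unchanging properties plus the recorded states."""
    id: str
    author: str
    created: str
    versions: List[EntryVersion] = field(default_factory=list)


@dataclass
class Entry:
    """The simple view: the current state at a retrievable location."""
    location: str
    id: str
    entry_version: str
    author: str
    created: str
    title: str
    issued: Optional[str] = None


def current_entry(history: EntryID) -> Entry:
    """Temporal view -> simple view: pick the latest EntryVersion."""
    # ISO 8601 strings in one timezone sort chronologically as plain text.
    latest = max(history.versions, key=lambda v: v.valid_from)
    return Entry(latest.location, history.id, latest.version,
                 history.author, history.created, latest.title, latest.issued)


def record_state(entry: Entry, valid_from: str,
                 history: Optional[EntryID] = None) -> EntryID:
    """Simple view -> temporal view: file the Entry's state under its EntryID."""
    if history is None:
        history = EntryID(entry.id, entry.author, entry.created)
    history.versions.append(EntryVersion(entry.entry_version, entry.title,
                                         entry.location, valid_from, entry.issued))
    return history


# Illustrative values only; the revised title and the 17:30 time are made up.
v1 = Entry("entry1.n3", "tag:e1", "tag:e1#v1", "Karl Dubost",
           "2002-06-11T17:00:00", "2b v not2b", issued="2002-06-11T17:10:00")
v2 = Entry("entry1.n3", "tag:e1", "tag:e1#v2", "Karl Dubost",
           "2002-06-11T17:00:00", "(revised title)")

history = record_state(v1, valid_from="2002-06-11T17:00:00")
history = record_state(v2, valid_from="2002-06-11T17:30:00", history=history)
latest = current_entry(history)
```

Going from an Entry to the temporal view needs nothing beyond the Entry itself and the time it was seen, whereas going back requires choosing one EntryVersion, here the most recent.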

title: N3 illustration on the two perspectives
author: H. Story | created: 13 Aug 2004 15:52:00 GMT
N3 file: entry.2004-08-13-1752.n3 | RDF file: entry.2004-08-13-1752.rdf | Atom-like RDF file: atom.2004-08-13-1752.rdf
The entry file using the Entry class is pretty easy to understand. As an example let us take the file describing this entry, namely entry.2004-08-13-1752.n3.
  <>    a       :Entry ;
      :author [ a       <> ;
                        <> ;
                        <> ;
                        "H. Story"^^xsd:string
              ] ;

The first line just specifies that this file ('<>' in N3) is an Entry. It then continues by specifying the author of the Entry using the FOAF classes. Everything else in the file is pretty self-evident. Perhaps the following requires a closer look:
      :id     <> ;
      :entry-version <> ;
      :in-reply-to <> ;
      :title  [ a       :Content ;
                :data   "N3 illustration on the two perspectives"^^xsd:string ;
                :mime-type "text/simple"^^xsd:string
              ] .

Here we specify the id and entry-version tags. Notice that with a well designed URI structure one should be able to guess the id tag from the version tag. Here the version tag just consists of the entry id with '#version1' appended.
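
As a small illustration of such a URI structure, the following Python sketch recovers the id from a version tag by dropping the fragment. It assumes the convention described above, where a version tag is the entry id followed by '#' and a version label. The function name `id_from_version` and the sample tags are hypothetical.

```python
def id_from_version(version_uri: str) -> str:
    """Return the part of the version tag before '#', taken to be the entry id."""
    entry_id, _, _ = version_uri.partition("#")
    return entry_id


# Hypothetical tags following the "<id>#version1" convention.
guessed_id = id_from_version("tag:e1#version1")
```

A tag without a fragment is returned unchanged, so the function is harmless when applied to something that is already an id.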

Notice the in-reply-to property. It relates the current Entry not to another Entry (with a URL) but to another EntryVersion. How do we retrieve the location of the EntryVersion to which this Entry is a reply? Well, further down in the file we have the following statement:

      :entry-location <entry.2004-08-13-1445.n3> .

which associates that entry version with an Entry URL, namely the above entry.2004-08-13-1445.n3.

How does this Entry appear in the AllInOneDatabase.n3? We just need to search that file for the EntryID:

      a       :EntryID ;
      :author _:b2 ;
      :created "2004-08-13T17:52:00+0200"^^xsd:dateTime ;
      :in-reply-to <> ;
      :state  <> .

This just gives us the author _:b2, which is a blank node that is specified in more detail elsewhere in the file, the creation time, what this EntryID was in reply to, and the EntryState tag. The values for that tag are also to be found in the file, and its content should correspond to the text you are now reading.