Semantic annotations for IDEAL
From Morfeo Wiki
- Status: draft
- Author: Diego Berrueta (Fundación CTIC)
- Contributors: Luis Polo, Abel Rionda (Fundación CTIC)
Introduction and current status
MyMobileWeb currently uses a specific declarative language, called IDEAL, to describe user interfaces in XML. There is a reference manual for this language, as well as some examples. The source repository also contains a showcase example and a demo application.
The MyMobileWeb consortium is currently redesigning IDEAL. Among other features, the new revision of the language should support semantic annotations. This is important to enable the semantics-related use cases of the platform. There is no specification document for the new version of IDEAL yet.
Support for semantic annotations in the current version of IDEAL
IDEAL already has some attributes to introduce semantic annotations. They are similar to RDFa, but they were developed in parallel some time ago. They are:
- about-class (equivalent to RDFa's "instanceof/typeof")
- about-resid (equivalent to RDFa's "about")
- about-prop (equivalent to RDFa's "property")
- about-obj-datatype (equivalent to RDFa's "datatype")
- about-link-prop (equivalent to RDFa's "rel")
Additionally, IDEAL has a "mymw:metadata" element that allows embedding chunks of RDF/XML data in IDEAL documents.
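If the existing attributes were aligned with RDFa, the correspondence listed above could be applied mechanically. The following Python sketch renames IDEAL's about-* attributes to their RDFa names using the standard library's ElementTree. It assumes the mapping is strictly one-to-one (it uses typeof rather than the older instanceof) and that each pair carries the same meaning, which may not hold exactly, since the two designs evolved in parallel.

```python
import xml.etree.ElementTree as ET

# One-to-one correspondence between IDEAL's current annotation
# attributes and RDFa attributes, as listed above.
IDEAL_TO_RDFA = {
    "about-class": "typeof",
    "about-resid": "about",
    "about-prop": "property",
    "about-obj-datatype": "datatype",
    "about-link-prop": "rel",
}


def to_rdfa(root: ET.Element) -> ET.Element:
    """Rename IDEAL about-* attributes to their RDFa names, in place."""
    for element in root.iter():
        for ideal_name, rdfa_name in IDEAL_TO_RDFA.items():
            if ideal_name in element.attrib:
                element.set(rdfa_name, element.attrib.pop(ideal_name))
    return root


# Hypothetical IDEAL fragment using the current attributes.
item = ET.fromstring('<item about-class="ex:Fruta" about-prop="rdfs:label">Pera</item>')
to_rdfa(item)
print(ET.tostring(item, encoding="unicode"))
```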
Related work
- There are some W3C technologies for semantic content annotation: RDF, RDFa, GRDDL, POWDER.
- Some time ago, the MyMobileWeb project designed a prototype to show how contents can be annotated in UI languages, and how these annotations can be exploited to auto-complete forms.
Why are existing technologies not enough?
- The current MyMobileWeb interface definition language provides means to annotate the contents, but they do not cover every scenario.
- RDFa and microformats can annotate explicit contents in the UI specification (even if they are not visible to the user!), but they fail to capture annotations when data-binding is used, or when the data is intended to be entered by the user.
- RDFa is designed to annotate final markup formats. However, IDEAL is not a final format (i.e., the client never receives an IDEAL document, because it is transformed at the server side into a different markup language). There is no prior experience with "keeping semantic annotations" across language transformations.
Use cases and requirements
The requirements for the semantic annotations mechanism in IDEAL are:
- Annotations must be made available in RDF, i.e., there must be a mechanism to derive a set of RDF triples from an annotated document. It is not clear, however, whether this extraction should take place at the client- or server-side.
- It must be possible to annotate explicit contents that appear in the user interface definition and are visible to users.
- It must be possible to create additional annotations for contents that appear in the user interface definition but are not visible to users.
- It must be possible to annotate data that do not appear explicitly in the user interface definition, but are obtained through data binding.
- It must be possible to annotate data that do not appear explicitly in the user interface definition, if they are entered by the user (i.e.: forms).
- Semantic annotations must not hamper the performance of the system, particularly regarding the size of the documents transferred through the channel, or the processing time.
Discussion
Our proposal is to use and extend RDFa, with minimal modifications to IDEAL.
Why can't we simply use (current) RDFa with (current) IDEAL?
- RDFa was not designed to annotate contents that are obtained through data binding (but it can annotate them once they are put in place in the final document served to the client!).
- RDFa was not designed to annotate contents entered by the user (forms).
- RDFa cannot annotate user interface widgets, as it would be required to provide auto-completion features (e.g., "this textfield must be filled with the name of a person").
- RDFa (usually) requires "neutral" container elements in the host language (i.e.: "div" and "span" in XHTML). IDEAL doesn't have them.
- RDFa annotations will be applied by the authors of the presentation layer to the IDEAL documents. However, the final users of the documents (the clients) will never receive an IDEAL document. IDEAL documents are transformed at the server into other kinds of markup languages.
Should final mark-up documents be annotated with RDFa?
The question here is whether the RDFa markup should be transmitted to the client (mobile device) over the network as part of the presentation markup. There are two alternatives:
- RDFa is transmitted with the presentation markup. As the presentation markup is not IDEAL but other languages, this alternative would require finding a way to translate the RDFa markup in IDEAL into RDFa markup in those presentation languages. Note that RDFa markup increases the size of the documents, so they will take longer to transmit and the user may have to pay extra money.
- RDFa is not transmitted with the presentation markup. The client receives just the presentation markup, as usual. If the client wants to receive the semantic data, it must perform a separate request. This is how the current semantic annotations in IDEAL work. There should be a way to easily build this second request from the first one: for instance, a pointer to a different URI in a meta header, or content negotiation on the same URI. Note, however, that MyMobileWeb doesn't assign "bookmark-able" URIs to each view; therefore, content negotiation may be impossible.
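When the semantic data are served separately, the client needs a way to discover the second URI. The sketch below assumes, hypothetically, that the pointer in the header is an XHTML `<link rel="alternate" type="application/rdf+xml">` element (the exact convention is still open) and extracts it with Python's standard `html.parser` module. The page and the `/view/42.rdf` URI are invented for illustration.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collect the targets of <link rel="alternate" type="application/rdf+xml">."""

    def __init__(self) -> None:
        super().__init__()
        self.rdf_uris: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; self-closing tags
        # such as <link/> also reach this method via handle_startendtag.
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").split()
        if (tag == "link"
                and "alternate" in rel_tokens
                and attributes.get("type") == "application/rdf+xml"
                and attributes.get("href")):
            self.rdf_uris.append(attributes["href"])


# Hypothetical XHTML page produced for a mobile device.
page = """<html><head>
<link rel="alternate" type="application/rdf+xml" href="/view/42.rdf"/>
</head><body><p>Growing roses</p></body></html>"""

finder = RdfLinkFinder()
finder.feed(page)
print(finder.rdf_uris)
```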
It is important to take into account that there is some information that never appears in the IDEAL markup, but only in the presentation languages. This is the case, notably, of data inserted by data binding.
In case that RDFa is not transmitted with the presentation markup, RDF can be generated as an additional format for the view (as other markup languages are generated now).
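If RDF is generated as an additional format for the view, the extracted triples must be serialized. A minimal sketch, assuming the triples are kept as (subject, predicate, object) tuples of already-abbreviated Turtle terms, groups them by subject in the style used by the examples on this page. Prefix declarations and escaping are left out for brevity.

```python
def to_turtle(triples):
    """Serialize (subject, predicate, object) tuples, grouped by subject.

    Terms must already be valid Turtle (CURIEs, <IRIs>, "literals",
    _:bnodes); no @prefix declarations are emitted."""
    by_subject: dict[str, list[tuple[str, str]]] = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, []).append((predicate, obj))
    blocks = []
    for subject, pairs in by_subject.items():  # dicts keep insertion order
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        blocks.append(f"{subject} {body} .")
    return "\n".join(blocks)


print(to_turtle([
    ("_:f1", "a", "ex:Fruta"),
    ("_:f1", "rdfs:label", '"Pera"'),
    ("_:f2", "a", "ex:Fruta"),
    ("_:f2", "rdfs:label", '"Manzana"'),
]))
```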
Notes on extending RDFa
- The RDFa specification describes the conformance rules for RDFa agents. Interestingly, it opens the door to agents with additional rules that can generate additional triples from a document. In other words, it is acceptable to define new rules, as long as you put the triples they produce in a separate graph and you still generate the default graph according to the standard rules.
- The RDFa specifications do not provide in-depth coverage of how to apply RDFa to general XML documents; only XHTML+RDFa is described. However, the RDFa rules are not XHTML-specific, so they can be used with any XML document.
- Extending RDFa is OK, breaking RDFa is not. When applying RDFa to IDEAL, we should be careful in order to avoid incompatibilities. Most notably, the triples produced by our RDFa extractor (using our extended rules) should be a superset of those produced by standard RDFa agents.
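The superset requirement can be expressed as a simple check. The sketch below, assuming both extractors return sets of (subject, predicate, object) tuples, verifies that the extended extractor kept every standard triple and returns the extra ones, which would go into the separate graph. It compares blank-node labels literally; a real comparison would need graph isomorphism, because two processors may label blank nodes differently.

```python
Triple = tuple[str, str, str]


def extension_triples(standard: set[Triple], extended: set[Triple]) -> set[Triple]:
    """Return the triples contributed only by the extended rules.

    Raises ValueError if the extended extractor dropped a triple that a
    standard RDFa processor produces, i.e. it broke RDFa instead of
    extending it."""
    missing = standard - extended
    if missing:
        raise ValueError(f"standard triples missing: {sorted(missing)}")
    return extended - standard


standard = {("<#p>", "dc:title", '"Growing roses"')}
extended = standard | {("<#p>", "dc:date", '"2008-04-10"')}
print(extension_triples(standard, extended))
```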
Scenarios
Annotation of explicit content that appears in the user interface definition and is visible for users
This is a basic feature of RDFa. Consider the following example:
<mymw:list id="lista">
  <mymw:item id="it1" typeof="ex:Fruta" property="rdfs:label">Pera</mymw:item>
  <mymw:item id="it2" typeof="ex:Fruta" property="rdfs:label">Manzana</mymw:item>
  <mymw:item id="it3" typeof="ex:Fruta" property="rdfs:label">Naranja</mymw:item>
</mymw:list>
The RDF outcome of the previous code is:
_:f1 a ex:Fruta ; rdfs:label "Pera" .
_:f2 a ex:Fruta ; rdfs:label "Manzana" .
_:f3 a ex:Fruta ; rdfs:label "Naranja" .
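The extraction for this case fits in a few lines of Python. The sketch assumes the reading used in the example above, where typeof opens a new blank-node subject that property then describes, and it handles nothing else. Later versions of RDFa changed how typeof and property combine on the same element, so this reading should be re-checked against the RDFa version finally adopted. The mymw namespace URI is a placeholder, since the example does not declare one.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Placeholder namespace URI for the mymw prefix (an assumption).
IDEAL_LIST = """<mymw:list xmlns:mymw="http://example.org/mymw#" id="lista">
  <mymw:item id="it1" typeof="ex:Fruta" property="rdfs:label">Pera</mymw:item>
  <mymw:item id="it2" typeof="ex:Fruta" property="rdfs:label">Manzana</mymw:item>
  <mymw:item id="it3" typeof="ex:Fruta" property="rdfs:label">Naranja</mymw:item>
</mymw:list>"""


def extract(root: ET.Element) -> list[tuple[str, str, str]]:
    """Each @typeof opens a new blank node; @property on the same element
    attaches the element's text to it as a plain literal."""
    labels = (f"_:f{n}" for n in count(1))
    triples = []
    for element in root.iter():
        rdf_type = element.get("typeof")
        if rdf_type is None:
            continue
        subject = next(labels)
        triples.append((subject, "a", rdf_type))
        prop = element.get("property")
        if prop is not None:
            text = "".join(element.itertext())
            triples.append((subject, prop, f'"{text}"'))
    return triples


for triple in extract(ET.fromstring(IDEAL_LIST)):
    print(*triple)
```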
Annotation of additional content that can appear in the user interface definition but is not visible for users
This scenario is within the current scope of RDFa, but it often requires support for "grouping" elements (e.g., div and span) in the host language. The hidden content is placed in attributes of those elements (such as content), which produce no visible output.
<mymw:div about="#mybirthdayparty" typeof="ev:Vevent">
  <mymw:p>
    <mymw:span property="ev:description">My birthday party</mymw:span> will take place at
    <mymw:span property="ev:location">Boecillo</mymw:span> on
    <mymw:span property="ev:date" content="2008-04-07" datatype="xsd:date">next Monday</mymw:span>
  </mymw:p>
</mymw:div>
The RDF outcome of the example above is:
<#mybirthdayparty> a ev:Vevent ;
ev:description "My birthday party" ;
ev:location "Boecillo" ;
ev:date "2008-04-07"^^xsd:date .
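The difference between the displayed text and the recorded value comes from the RDFa content and datatype attributes. A sketch of how an extractor could compute the object literal for an element carrying property, assuming plain ElementTree elements and CURIE datatypes left unexpanded. XML literals, which RDFa produces for elements with child markup, are not handled.

```python
import xml.etree.ElementTree as ET


def literal_of(element: ET.Element) -> str:
    """Turtle literal for an element that carries @property.

    @content, when present, replaces the visible text, so a page can show
    "next Monday" while recording "2008-04-07"; @datatype types the value."""
    value = element.get("content")
    if value is None:
        value = "".join(element.itertext())
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    datatype = element.get("datatype")
    return f'"{escaped}"^^{datatype}' if datatype else f'"{escaped}"'


date = ET.fromstring('<span property="ev:date" content="2008-04-07" '
                     'datatype="xsd:date">next Monday</span>')
place = ET.fromstring('<span property="ev:location">Boecillo</span>')
print(literal_of(date))
print(literal_of(place))
```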
Annotation of content which is obtained through data binding
This scenario is not within the current scope of RDFa, which can only use the data that is explicitly included in the markup of the document. Consider the following table (taken from [1]):
<mymw:table id="myTable" bind="${selectedPS}" optionsbind="${searchPSResult}" keymember="code" paginate="true" style="body selco10">
  <mymw:th style="headerfont headercolor">
    <mymw:td>Code</mymw:td>
    <mymw:td>Name</mymw:td>
    <mymw:td display="${_MYMW_DEV_BELONGS.PdaDevice}">Date</mymw:td>
  </mymw:th>
  <mymw:tr>
    <mymw:td member="code"/>
    <mymw:td member="name"/>
    <mymw:td member="date"/>
  </mymw:tr>
</mymw:table>
A few observations:
- The third column is shown only on some devices (those with a wider display) and hidden on others. This is a presentation trick (to reduce the visual width of the table on devices with a narrow display), but it should not have any impact on the semantics. Note, however, that if the triples are encoded as RDFa in the presentation language (e.g., XHTML), special care must be taken to encode those triples, even if they cannot be attached to "visible" markup.
- Pagination may differ from one device to another. Just as the HTML version has "next" and "prev" links, the RDF version should provide links to the previous and next pages. In this way, a semantic web agent will be able to "follow its nose" to the whole set of triples.
- The usual RDFa attributes are not suitable for marking up data retrieved through data binding. If used, their meaning would be ambiguous: do they refer to the explicit content or to the content obtained through data binding? (Note that the current mechanism to introduce semantic annotations in IDEAL does not use a different set of attributes; however, the attributes have a different meaning when attached to form elements or to data-binding elements.)
It is very difficult to provide a complete solution for all cases. However, for a restricted environment, it is possible to devise a simple solution. One such restriction would be that each row of a table can only describe properties of one subject. Consider the following changes to the previous example:
<mymw:tr typeof-bind="foaf:Document">
<mymw:td member="code" property-bind="dc:identifier"/>
<mymw:td member="name" property-bind="dc:title"/>
<mymw:td member="date" property-bind="dc:date"/>
</mymw:tr>
At presentation time, data binding will take place, and the following table will be displayed:
| Code | Name | Date |
|---|---|---|
| 82 | Growing roses | 2008-04-10 |
The RDF outcome for the table above will resemble the following:
[] a foaf:Document ;
dc:identifier "82" ;
dc:title "Growing roses" ;
dc:date "2008-04-10" .
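A sketch of how the server side could derive these triples after data binding, under the one-subject-per-row restriction. The bound records are modelled as Python dictionaries keyed by the member attribute; this, the placeholder namespace URI and the blank-node labels are assumptions for illustration, not how the MyMobileWeb back end actually represents its data.

```python
import xml.etree.ElementTree as ET
from itertools import count

NS = {"mymw": "http://example.org/mymw#"}  # placeholder namespace URI

ROW = """<mymw:tr xmlns:mymw="http://example.org/mymw#" typeof-bind="foaf:Document">
  <mymw:td member="code" property-bind="dc:identifier"/>
  <mymw:td member="name" property-bind="dc:title"/>
  <mymw:td member="date" property-bind="dc:date"/>
</mymw:tr>"""


def literal(value) -> str:
    """Quote a bound value as a Turtle plain literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def bind_rows(row: ET.Element, records: list[dict]) -> list[tuple[str, str, str]]:
    """One blank-node subject per bound record (one subject per row)."""
    subjects = (f"_:row{n}" for n in count(1))
    triples = []
    for record in records:
        subject = next(subjects)
        triples.append((subject, "a", row.get("typeof-bind")))
        for cell in row.findall("mymw:td", NS):
            prop = cell.get("property-bind")
            if prop is not None:
                triples.append((subject, prop, literal(record[cell.get("member")])))
    return triples


records = [{"code": 82, "name": "Growing roses", "date": "2008-04-10"}]
for triple in bind_rows(ET.fromstring(ROW), records):
    print(*triple)
```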
However, if each row describes more than one resource, then a problem arises. Consider this new example:
<mymw:tr typeof-bind="sioc:Post">
<mymw:td member="title" property-bind="dc:title"/>
<mymw:td member="author" [....?] />
</mymw:tr>
Ideally, we would like to say that there is an instance of foaf:Person, whose foaf:name is obtained through data binding, and which is related to the instance of sioc:Post by a dc:creator predicate. Unfortunately, that kind of relation is difficult to describe. One possibility would be:
<mymw:tr typeof-bind="sioc:Post">
<mymw:td member="title" property-bind="dc:title"/>
<mymw:div rel-bind="dc:creator">
<mymw:div typeof-bind="foaf:Person">
<mymw:td member="author" property-bind="foaf:name" />
</mymw:div>
</mymw:div>
</mymw:tr>
This would produce the following display:
| Post title | Author name |
|---|---|
| Great movie! | John Doe |
And these RDF triples:
[] a sioc:Post ;
dc:title "Great movie!" ;
dc:creator [
a foaf:Person ;
foaf:name "John Doe"
] .
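The nested containers can be handled by a recursive walk. In the sketch below, a rel-bind container links the current subject to a fresh blank node for each child it wraps, a typeof-bind on that child types the new node, and property-bind cells inside it describe the new node. The record dictionary, namespace URI and blank-node labels are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import count

ROW = """<mymw:tr xmlns:mymw="http://example.org/mymw#" typeof-bind="sioc:Post">
  <mymw:td member="title" property-bind="dc:title"/>
  <mymw:div rel-bind="dc:creator">
    <mymw:div typeof-bind="foaf:Person">
      <mymw:td member="author" property-bind="foaf:name"/>
    </mymw:div>
  </mymw:div>
</mymw:tr>"""


def bind_row(row: ET.Element, record: dict) -> list[tuple[str, str, str]]:
    counter = count(1)

    def new_bnode() -> str:
        return f"_:b{next(counter)}"

    def walk(element: ET.Element, subject: str) -> None:
        for child in element:
            rel = child.get("rel-bind")
            if rel is not None:
                # Every resource wrapped by the container becomes an object of rel.
                for inner in child:
                    obj = new_bnode()
                    triples.append((subject, rel, obj))
                    rdf_type = inner.get("typeof-bind")
                    if rdf_type is not None:
                        triples.append((obj, "a", rdf_type))
                    walk(inner, obj)
                continue
            prop = child.get("property-bind")
            if prop is not None:
                triples.append((subject, prop, f'"{record[child.get("member")]}"'))
            walk(child, subject)

    root_subject = new_bnode()
    triples = [(root_subject, "a", row.get("typeof-bind"))]
    walk(row, root_subject)
    return triples


for triple in bind_row(ET.fromstring(ROW), {"title": "Great movie!", "author": "John Doe"}):
    print(*triple)
```

Because every rel-bind wrapper opens its own blank node, applying the same walk to the reordered father/mother layout below would produce one node per wrapper, which is exactly the duplication problem discussed there.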
Note, however, that this solution introduces a deviation from the usage patterns of HTML, and it introduces coupling between the order of the columns and the ability to represent certain relationships between subjects. Therefore, it cannot be considered a complete solution. For instance, given the following example:
<mymw:tr typeof-bind="foaf:Person">
<mymw:td member="personName" property-bind="foaf:name"/>
<mymw:div rel-bind="ex:hasParent">
<mymw:div typeof-bind="foaf:Person">
<mymw:td member="fatherName" property-bind="foaf:name" />
<mymw:td member="fatherBirthdate" property-bind="foaf:birthDate"/>
</mymw:div>
</mymw:div>
</mymw:tr>
The table will look like this:
| Name | Father name | Father birthday |
|---|---|---|
| Alice | John | 1971-07-11 |
If two additional columns are added for the mother, we may have the following:
<mymw:tr typeof-bind="foaf:Person">
<mymw:td member="personName" property-bind="foaf:name"/>
<mymw:div rel-bind="ex:hasParent">
<mymw:div typeof-bind="foaf:Person">
<mymw:td member="fatherName" property-bind="foaf:name" />
<mymw:td member="fatherBirthdate" property-bind="foaf:birthDate"/>
</mymw:div>
</mymw:div>
<mymw:div rel-bind="ex:hasParent">
<mymw:div typeof-bind="foaf:Person">
<mymw:td member="motherName" property-bind="foaf:name" />
<mymw:td member="motherBirthdate" property-bind="foaf:birthDate"/>
</mymw:div>
</mymw:div>
</mymw:tr>
| Name | Father name | Father birthday | Mother name | Mother birthday |
|---|---|---|---|---|
| Alice | John | 1971-07-11 | Beth | 1970-03-10 |
... which is fine, but if the same columns were arranged in a different order:
<mymw:tr typeof-bind="foaf:Person">
<mymw:td member="personName" property-bind="foaf:name"/>
<mymw:div rel-bind="ex:hasParent">
<mymw:div typeof-bind="foaf:Person">
<mymw:td member="fatherName" property-bind="foaf:name" />
</mymw:div>
</mymw:div>
<mymw:div rel-bind="ex:hasParent">
<mymw:div typeof-bind="foaf:Person">
<mymw:td member="motherName" property-bind="foaf:name" />
</mymw:div>
</mymw:div>
<mymw:div rel-bind="ex:hasParent">
<mymw:div typeof-bind="foaf:Person">
<mymw:td member="fatherBirthdate" property-bind="foaf:birthDate"/>
</mymw:div>
</mymw:div>
<mymw:div rel-bind="ex:hasParent">
<mymw:div typeof-bind="foaf:Person">
<mymw:td member="motherBirthdate" property-bind="foaf:birthDate"/>
</mymw:div>
</mymw:div>
</mymw:tr>
| Name | Father name | Mother name | Father birthday | Mother birthday |
|---|---|---|---|---|
| Alice | John | Beth | 1971-07-11 | 1970-03-10 |
For each row, 5 anonymous instances of foaf:Person would be created (the child, two instances for the father, and two for the mother), and it would be impossible to declare that the two instances which describe each parent are actually the same.
A possible workaround would be to provide a mechanism to coin unique URIs for each resource, something in the spirit of:
<mymw:tr about="http://example.org/Person-${rowNumber}" typeof-bind="foaf:Person">
<mymw:td member="personName" property-bind="foaf:name"/>
<mymw:div rel-bind="ex:hasParent">
<mymw:div about="http://example.org/father-of-Person-${rowNumber}" typeof-bind="foaf:Person">
<mymw:td member="fatherName" property-bind="foaf:name" />
</mymw:div>
</mymw:div>
<mymw:div rel-bind="ex:hasParent">
<mymw:div about="http://example.org/mother-of-Person-${rowNumber}" typeof-bind="foaf:Person">
<mymw:td member="motherName" property-bind="foaf:name" />
</mymw:div>
</mymw:div>
<mymw:div rel-bind="ex:hasParent">
<mymw:div about="http://example.org/father-of-Person-${rowNumber}" typeof-bind="foaf:Person">
<mymw:td member="fatherBirthdate" property-bind="foaf:birthDate"/>
</mymw:div>
</mymw:div>
<mymw:div rel-bind="ex:hasParent">
<mymw:div about="http://example.org/mother-of-Person-${rowNumber}" typeof-bind="foaf:Person">
<mymw:td member="motherBirthdate" property-bind="foaf:birthDate"/>
</mymw:div>
</mymw:div>
</mymw:tr>
In this way, only three named instances would be created for each row. Note that, in addition to the synthetic key ${rowNumber}, many use cases will call for a real primary key to be part of the URI. For instance, a natural primary key, such as the ID-card number (e.g. Spanish DNI) may serve to coin unique URIs for each resource.
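The URI-coining step amounts to expanding a template per row. Since ${rowNumber} happens to match the placeholder syntax of Python's string.Template, the sketch below uses it to expand the about attributes of the row above (trimmed to its typed elements, with a placeholder namespace URI) and shows that the five typed elements collapse into three subjects. How MyMobileWeb would actually expose the row number or a primary key to the template is left open here.

```python
import xml.etree.ElementTree as ET
from itertools import count
from string import Template

# The row above, keeping only the elements that open a typed resource.
ROW = """<mymw:tr xmlns:mymw="http://example.org/mymw#"
    about="http://example.org/Person-${rowNumber}" typeof-bind="foaf:Person">
  <mymw:div rel-bind="ex:hasParent"><mymw:div typeof-bind="foaf:Person"
      about="http://example.org/father-of-Person-${rowNumber}"/></mymw:div>
  <mymw:div rel-bind="ex:hasParent"><mymw:div typeof-bind="foaf:Person"
      about="http://example.org/mother-of-Person-${rowNumber}"/></mymw:div>
  <mymw:div rel-bind="ex:hasParent"><mymw:div typeof-bind="foaf:Person"
      about="http://example.org/father-of-Person-${rowNumber}"/></mymw:div>
  <mymw:div rel-bind="ex:hasParent"><mymw:div typeof-bind="foaf:Person"
      about="http://example.org/mother-of-Person-${rowNumber}"/></mymw:div>
</mymw:tr>"""


def coin_subject(element: ET.Element, row_number: int, new_bnode) -> str:
    """Expand ${rowNumber} in @about; without @about, fall back to a blank node."""
    about = element.get("about")
    if about is None:
        return new_bnode()
    return "<" + Template(about).substitute(rowNumber=row_number) + ">"


counter = count(1)
row = ET.fromstring(ROW)
subjects = [coin_subject(e, 1, lambda: f"_:b{next(counter)}")
            for e in row.iter() if e.get("typeof-bind") is not None]
print(len(subjects), "typed elements ->", len(set(subjects)), "subjects")
print(sorted(set(subjects)))
```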
Annotation of content which is entered by the user
In IDEAL, content that is entered by the user is bound to back-end domain objects. Therefore, this scenario is quite similar to the previous one. However, there is a difference: the content is not there when the page is served to the client. For instance, it would not make sense to generate a triple stating that someone's name is the empty string just because the input field of the form is empty when the page is generated.
Instead, the content is typed by the user, so the triple cannot be generated until the user has entered the text. However, this is just one potential use case for the annotation. A different one does not involve generating triples, but describing the fields of a form in order to provide suggestions for auto-completing the requested data. A simple example follows:
<mymw:p id="p1" layout="vertical" align="center" style="nowrap">
  <mymw:label id="date">Date:</mymw:label>
  <mymw:datefield style="mydate" labelid="date" id="date" bind="${date}" property-bind="foaf:birthdate"/>
</mymw:p>
<mymw:p id="p2" align="center">
  <mymw:submit id="submit" value="Accept" principal="true" />
</mymw:p>
There are some questions to be considered:
- How to specify the subject of the triple? A container element is needed to group all the fields of the form (note that IDEAL does not have a 'form' element).
- How to avoid the generation of a triple containing an empty string before the field is filled?
Here is a potential reformulation of the previous example addressing these questions:
<mymw:div typeof-bind="foaf:Person">
  <mymw:p id="p1" layout="vertical" align="center" style="nowrap">
    <mymw:label id="date">Date:</mymw:label>
    <mymw:datefield style="mydate" labelid="date" id="date" bind="${date}" property-bind="foaf:birthdate" property_tobefilled="true" />
  </mymw:p>
  <mymw:p id="p2" align="center">
    <mymw:submit id="submit" value="Accept" principal="true" />
  </mymw:p>
</mymw:div>
Finally, there is another important question: what should happen to the semantic markup when an interface specified in IDEAL is transformed into the presentation language (e.g., XHTML)? It is not possible to embed these annotations as RDFa, because RDFa cannot annotate interfaces (just content). Other solutions are needed, such as the ones described in our paper published in MWeb'07.
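Both uses of the form annotations can be sketched in Python. field_hints lists which property each bound field expects (the input an auto-completion agent would need), and triples_after_submit emits triples only once values exist, skipping empty fields marked with property_tobefilled. Keying submitted values by the field id, the placeholder namespace URI and the blank-node subject are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the form above (placeholder namespace URI).
FORM = """<mymw:div xmlns:mymw="http://example.org/mymw#" typeof-bind="foaf:Person">
  <mymw:p id="p1">
    <mymw:datefield id="date" bind="${date}" property-bind="foaf:birthdate"
                    property_tobefilled="true"/>
  </mymw:p>
  <mymw:p id="p2"><mymw:submit id="submit" value="Accept"/></mymw:p>
</mymw:div>"""


def field_hints(form: ET.Element) -> dict[str, str]:
    """Map each annotated field id to the property it is expected to hold."""
    return {field.get("id"): field.get("property-bind")
            for field in form.iter() if field.get("property-bind") is not None}


def triples_after_submit(form: ET.Element, submitted: dict[str, str],
                         subject: str = "_:user") -> list[tuple[str, str, str]]:
    """Generate triples from submitted values, never from the empty page."""
    triples = [(subject, "a", form.get("typeof-bind"))]
    for field in form.iter():
        prop = field.get("property-bind")
        if prop is None:
            continue
        value = submitted.get(field.get("id"), "")
        if field.get("property_tobefilled") == "true" and not value.strip():
            continue  # still empty: do not assert an empty-string value
        triples.append((subject, prop, f'"{value}"'))
    return triples


form = ET.fromstring(FORM)
print(field_hints(form))
print(triples_after_submit(form, {"date": "1980-05-17"}))
print(triples_after_submit(form, {"date": ""}))
```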
Conclusions
- New container elements must be added to IDEAL to define blocks (i.e.: equivalents to "div" and "span" in XHTML).
- The current attribute set of RDFa is not enough to satisfy all the requirements. New attributes are necessary to keep compatibility with the RDFa specification and existing parsers.
- A decision has to be made on how to serve the semantic descriptions to the client: within the interface language, or as a separate RDF file?
- A new set of rules is necessary to define how triples are derived from IDEAL documents annotated with RDFa. The current set of rules defined for XHTML does not apply due to data binding and user-entered content.
- Annotation of tables is challenging when each row describes several resources that are related to each other.
Why do we need new attributes for data-binding?
Because if we re-use the existing RDFa attributes for data-binding, we will have to re-write existing RDFa rules, and we will break existing RDFa parsers. This issue will be illustrated with an example. Consider the following table that contains data about books (just one book, for simplicity):
| Title | ISBN |
|---|---|
| Segunda Fundación | 555-09312 |
The scenario described above for data-binding with tables proposes to add annotations to the 'td' elements that are actually bound to the data. In this case, if we re-use existing RDFa attributes instead of introducing new ones, the IDEAL document will look like this:
<mymw:tr typeof="foaf:Document">
  <mymw:td member="title" property="dc:title"/>
  <mymw:td member="isbn" property="dc:identifier" />
</mymw:tr>
Unfortunately, if we use this file as the input to a standard-compliant RDFa parser, we will get some strange triples:
[] a foaf:Document ;
dc:title "" ;
dc:identifier "" .
What's the problem? The standard rules have been applied and some triples have been produced as part of the normal processing of the file. Sadly, these triples are not meaningful, and we clearly don't want them. Modifying the standard rules to avoid them would break the RDFa compliance rules (you can extend RDFa to generate new triples, but at least you have to generate the same triples as a standards-compliant RDFa parser would generate).
Note that moving the annotations to the header row would not fix the problem. Consider this markup:
<mymw:th typeof="foaf:Document">
  <mymw:td property="dc:title">Title</mymw:td>
  <mymw:td property="dc:identifier">ISBN</mymw:td>
</mymw:th>
In this case, the RDF outcome would be even worse:
[] a foaf:Document ;
dc:title "Title" ;
dc:identifier "ISBN" .
Therefore, our proposal is to introduce new attributes. Standards-compliant RDFa parsers will just ignore them (no triple will be generated for data-bound tables). Only the MyMobileWeb parser will notice these new attributes, and will use them to produce the proper triples. In this way, MyMobileWeb's parser will generate a superset of the triples prescribed by the standard, which is OK.
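The argument can be checked with a deliberately simplified stand-in for a standard processor, which only understands the standard property attribute and ignores unknown ones such as property-bind. The sketch assumes the current document as subject (written <>) and ignores typeof, so it only contrasts the two markups: reusing property on empty bound cells yields empty literals, while the new attributes yield nothing for a standard processor.

```python
import xml.etree.ElementTree as ET

NS_DECL = 'xmlns:mymw="http://example.org/mymw#"'  # placeholder namespace URI

REUSED = f"""<mymw:tr {NS_DECL}>
  <mymw:td member="title" property="dc:title"/>
  <mymw:td member="isbn" property="dc:identifier"/>
</mymw:tr>"""

NEW_ATTRS = f"""<mymw:tr {NS_DECL}>
  <mymw:td member="title" property-bind="dc:title"/>
  <mymw:td member="isbn" property-bind="dc:identifier"/>
</mymw:tr>"""


def standard_literals(root: ET.Element) -> list[tuple[str, str, str]]:
    """Simplified standard behaviour: @property takes the element's own text.
    Unknown attributes (property-bind, member, ...) are ignored."""
    return [("<>", e.get("property"), '"' + "".join(e.itertext()) + '"')
            for e in root.iter() if e.get("property") is not None]


print(standard_literals(ET.fromstring(REUSED)))
print(standard_literals(ET.fromstring(NEW_ATTRS)))
```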
Future work
- Decide how the annotations will be transferred to the client: (a) as RDFa embedded in the presentation markup, or (b) as RDF files.
- If (a), then a set of rules must be defined to translate annotations in IDEAL to RDFa annotations in the presentation language.
- If (b), then a new set of rules must be defined to extract RDF triples from annotated IDEAL documents at run-time.
- Rewrite the examples at http://195.235.93.70:8081/semantic/ using the new proposed annotation mechanism.
