November 28th, 2003
Spiders & Webs
The Need for a Semantic Web
In this age of deep change for humankind, an age which, after the
coming of the microchip, has become post-industrial; now that the
World Wide Web has penetrated so deeply into our lives, and even in
poor countries like Senegal there is an Internet café at every road's
corner, it is fair to ask ourselves where the Web is going.
In recent years we have witnessed what I called the 'Rise of the
Robots': the birth of WebCrawler, AltaVista, and the superstar Google.
Surely they do a great job, as some numbers show; as of October 2003,
Google:
- searches served: more than 200 million a day;
- Web pages indexed: more than 3 billion;
- images: more than 425 million;
- Usenet messages: more than 800 million;
- unique global users per month: 73.5 million.
These numbers are really impressive. But is the Web really working
well? And what does the future hold? What will the next years look like?
Since today's Web is filled with zillions of pages of text that are
machine-readable but not machine-understandable, it is pretty tough to
automate anything; nor can this volume of information be managed
manually. So the proposed solution is to use metadata to describe the
content of the Web.
The Importance of Metadata
One of the key points is how we manage metadata. Very clarifying
is a presentation written by Paolo Ceravolo, which I have tried to
translate; and since it is a very long article, I have condensed it,
keeping what I consider most important, adding something of my own,
and reassembling the material.
What is astounding is that many people ask what the purpose of
metadata is. This is strange, because without metadata the Web simply
doesn't work. The design choice of the Web is to query not the texts
but the metadata, and this is because the data on the Web have no
structure: they are simply spread across millions of sites. Metadata,
by contrast, are information built following a precise schema. It is
thanks to this structure that we are able to manipulate the data,
knowing how they are interrelated.
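To make the idea of a "precise schema" concrete, here is a minimal sketch in Python. The field names follow the real Dublin Core element set; the record's content and the `conforms` helper are invented for illustration:

```python
# Field names from the Dublin Core element set; the record itself is invented.
DC_FIELDS = {"title", "creator", "date", "subject"}

record = {
    "title": "The Need for a Semantic Web",
    "creator": "A. Writer",   # hypothetical author
    "date": "2003-11-28",
    "subject": "Semantic Web",
}

def conforms(rec, schema=DC_FIELDS):
    """A record can be interrelated with others only if it follows the schema."""
    return set(rec) <= schema

print(conforms(record))                  # a well-formed metadata record
print(conforms({"headline": "news"}))    # free-form data: no shared structure
```

Two records that conform to the same schema can be compared field by field; two arbitrary pages of text cannot.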
At the same time, metadata are the weak link in the chain,
because they are expensive to produce and can sometimes turn out as
clear as mud. If they are generated through human indexing they may be
vague or inexact, and indexing requires well-trained and motivated
people: if it is treated simply as a clerk's chore, the result is not
reliable. The alternative is automated, or semi-automated, extraction.
Here the biggest problem is that an automated tool cannot
distinguish the text from the context: it can only extract the most
significant data, and it can never add information that is not included
in the original document but is related to it, i.e. the context. It
also remains sensitive to the ambiguity of some words.
So the current choice is the so-called Assisted Extraction, corrected
and reviewed by trained operators who can mitigate the limits of the
automated tools. Such tools are usually divided into:
- a Text Zoner, the first step, which divides the text into structural parts, like title, body, and so on;
- then the Preprocessor, which performs a morphological analysis of the sentences, trying to identify the subject, the verb, and the object;
- then a Filter, which cuts off sentences not deemed useful;
- a Named Entity Recognizer, through which minimal lexical structures like names, dates, numbers, company names, and so on, can be identified;
- a parser, which organizes all this information, providing the hierarchy of the relations and ordering them in a tree structure;
- the last step, Lexical Disambiguation, which must make sure that words with multiple meanings are translated in a single way.
[Paolo Ceravolo, online]
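The steps above can be sketched in a few lines of Python. This is only a toy illustration, not a real extraction tool: the zoner, filter, and recognizer below are naive stand-ins written for this article, assuming the first line of a document is its title.

```python
import re

def text_zoner(doc):
    """Split a raw document into zones: here, first line = title, rest = body."""
    title, _, body = doc.partition("\n")
    return {"title": title.strip(), "body": body.strip()}

def sentence_filter(sentences, keywords):
    """Keep only the sentences that mention at least one keyword of interest."""
    return [s for s in sentences if any(k.lower() in s.lower() for k in keywords)]

def named_entity_recognizer(sentence):
    """Very naive NER: capitalized word runs as names, 4-digit numbers as years."""
    names = re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*", sentence)
    years = re.findall(r"\b\d{4}\b", sentence)
    return {"names": names, "years": years}

doc = "Roosevelt\nFranklin Delano Roosevelt launched the New Deal in 1933. It rained."
zones = text_zoner(doc)
kept = sentence_filter(zones["body"].split(". "), ["New Deal"])
entities = named_entity_recognizer(kept[0])
print(entities)
```

A real Preprocessor and Lexical Disambiguation stage require full linguistic analysis, which is exactly why trained operators are still needed to review the output.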
It is the humble opinion of yours truly that one of the biggest
oddities of today's Internet is the shape in which information is
rendered. I mean that if I am a student and need an abstract about
Franklin Delano Roosevelt, I receive some thousands of links, which I
have to browse to find his date of birth or death, or to discover that
he made the "New Deal", what it was, and why it has been important.
So something more is needed to guide the user, be it the student
or the researcher. Is there anything new in sight?
Let's start with some definitions:
Definition: The Semantic Web is the representation of data on the
World Wide Web. It is a collaborative effort led by W3C with
participation from a large number of researchers and industrial
partners. It is based on the Resource Description Framework (RDF),
which integrates a variety of applications using XML for syntax and
URIs for naming. [W3C]
"The Semantic Web is an extension of the current web in which
information is given well-defined meaning, better enabling computers
and people to work in cooperation." -- Tim Berners-Lee, James Hendler,
Ora Lassila, The Semantic Web, Scientific American, May 2001.
Resource Description Framework (RDF)
The RDF is "a foundation for processing metadata. It provides
interoperability between applications that exchange
machine-understandable information on the Web. RDF can be used in a
variety of application areas; for example: in resource discovery to
provide better search engine capabilities; in cataloging for describing
the content and content relationships available at a particular Web
site, page, or digital library; by intelligent software agents to
facilitate knowledge sharing and exchange; in content rating; in
describing collections of pages that represent a single logical
"document"; for describing intellectual property rights of Web pages;
and for expressing the privacy preferences of a user as well as the
privacy policies of a Web site. RDF with digital signatures will be
key to building the "Web of Trust" for electronic commerce,
collaboration, and other applications"
Now, the way in which RDF data are represented, i.e. the syntax,
the model for encoding and transporting this metadata, and the use of
(subject, predicate, object) triples, is beyond the scope of this
article.
Let's say only that: "The syntax uses the Extensible Markup
Language [XML]: one
of the goals of RDF is to make it possible to specify semantics for
data based on XML in a standardized, interoperable manner. RDF and XML
are complementary: RDF is a model of metadata and only addresses by
reference many of the encoding issues that transportation and file
storage require (such as internationalization, character sets, etc.).
For these issues, RDF relies on the support of XML. It is also
important to understand that this XML syntax is only one possible
syntax for RDF and that alternate ways to represent the same RDF data
model may emerge"
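Even setting the XML serialization aside, the underlying model is easy to picture: a set of (subject, predicate, object) statements. Here is a minimal sketch in Python; the URIs, property names, and the tiny `query` helper are invented for illustration:

```python
# RDF statements modeled as plain (subject, predicate, object) triples.
# The URIs and property names below are invented for this example.
triples = [
    ("http://example.org/index.html", "dc:creator", "Jane Doe"),
    ("http://example.org/index.html", "dc:date", "2003-10-01"),
    ("http://example.org/about.html", "dc:creator", "John Doe"),
]

def query(store, subject=None, predicate=None, obj=None):
    """Return every triple matching the fields that are not None."""
    return [t for t in store
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Who created index.html?
creators = query(triples, subject="http://example.org/index.html",
                 predicate="dc:creator")
print(creators)
```

This pattern-matching over triples is precisely the kind of query that is impossible against unstructured page text.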
And let's add that RDF uses classes to define and model the world,
like many object-oriented languages. But RDF has its shortcomings: it
is only a frame system, and it does not include a mechanism for
reasoning. A reasoning mechanism, however, could be built on top of
this frame system. So what's next?
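To see what "a reasoning mechanism built on top" could mean, here is a toy example in Python, again with invented class names: given rdf:type and rdfs:subClassOf statements, a few lines of code can infer every class a resource belongs to.

```python
# Invented example data: a small class hierarchy plus one typed resource.
class_triples = [
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Rex", "rdf:type", "ex:Dog"),
]

def types_of(resource, store):
    """Infer all classes of a resource by walking rdfs:subClassOf links."""
    found = {o for s, p, o in store if s == resource and p == "rdf:type"}
    frontier = set(found)
    while frontier:
        cls = frontier.pop()
        supers = {o for s, p, o in store
                  if s == cls and p == "rdfs:subClassOf"} - found
        found |= supers
        frontier |= supers
    return found

print(types_of("ex:Rex", class_triples))
```

Nothing in the stored triples says that Rex is an animal; that fact falls out of the reasoning layer, not the frame system itself.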
Web Ontology Language (OWL)
Before talking about ontology, let's take a look at the DARPA Agent
Markup Language (DAML). As its site clearly states: "The goal of the
DAML effort is to develop a language and tools to facilitate the
concept of the Semantic Web."
Ontology, definition: the study of existence; a branch of
metaphysics concerned with the nature of being; from the ancient Greek
root ont-, "being".
Now, just for the love of technical matters, let's take a short
look at some DAML coding to understand the implementation of classes.
Only the attributes of the original fragment survive here; the
surrounding element names are reconstructed for illustration, showing a
birth event described by date and place:

<b:Person rdf:ID="Ian">
  <b:birth>
    <b:Birth a:date="7 Aug 1961"
             a:place="Wyandotte, Wayne Co., Michigan"/>
  </b:birth>
</b:Person>
So at least now we have an idea of DAML. But the smart user
could ask: "Does it work? What is out there?"
State of the Art
In the words of Sean B. Palmer (a W3C researcher): "Unfortunately,
the Semantic Web is dissimilar in many ways from the World Wide Web,
including that you can't just point people to a Web site for them to
realise how it's working, and what it is. However, there have been a
number of small scale Semantic Web applications written up. One of the
best ones is Dan Connolly's Arcs and Nodes diagrams experiment"
(Circles and arrows diagrams using stylesheet rules, Dan Connolly).
Sean also advises checking "another good example of the
Semantic Web at work", Dan Brickley et al.'s project. Both are
mentioned in What Can I Do To Help?: hit PgDn on your keyboard and
read.
So I hope to have given my readers at least a pale idea of what
appears to be the future of a layer of the Internet; if the Semantic
Web takes root, it will bring a social change for humankind. But hey,
the average webmaster is a human being, and thus as lazy as any other.
Thank you for your time, stay tuned,
References & Acknowledgements:
Beyond being a user, as a researcher and programmer, my personal
thanks go at least to:
Tim Berners-Lee, Ora Lassila, James Hendler, Sean B. Palmer,
Aaron Swartz, plus some other dozens, for the exceptional work done.
For the purpose of this article the following have been consulted:
(The Semantic Web In Breadth)
(Introduction to Semantic Web)
(N-Triples - N3)
(DAML status and tools)