KM 5433 Blog/Joe Colannino

A blog discussing knowledge management and library science issues.

Saturday, September 16, 2006

Review: Steffi Gradmann: rdfs:frbr – Towards an Implementation Model for Library Catalogs Using Semantic Web Technology.

Introduction

This highly technical article by Steffi Gradmann proposes a smart way of linking library catalogs to the World Wide Web. The particulars of her approach involve using the semantic web via RDF (Resource Description Framework) and FRBR (Functional Requirements for Bibliographic Records). If you are unfamiliar with these concepts, the foregoing links will take you to summaries that explain them.

In brief, the semantic web (also sWeb) promises to be the web on steroids and to allow for much more than keyword searches. This raises the question, "What’s wrong with keyword searches?" In a word, language. It is well known that the smallest possible unit of semantic meaning in any language is the sentence, not the word.[1] Don’t believe me? Green.

Dictionary.com lists 33 meanings for the word green. It is not possible to decide among them unless green is used in a sentence at the very least. Even then, there can be ambiguity:

Joe is green.

Am I envious, sick, inexperienced, conscious of the environment, or painted? When such a simple word can mean everything from recycling to regurgitation, we begin to see the problem with keywords for searches. Yet for the most part, that is what we have without the sWeb.
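To see how little a bare keyword match can tell us, here is a minimal sketch in Python. The four sentences and their sense labels are invented for illustration, and keyword_search is a hypothetical helper, not how any real search engine works.

```python
# Hypothetical mini-corpus: every sentence and sense label is invented for illustration.
DOCUMENTS = [
    ("Joe is green with envy over the promotion.", "envious"),
    ("Joe is green after the boat ride.", "sick"),
    ("Joe is green, having started the job yesterday.", "inexperienced"),
    ("Joe is green and recycles everything.", "environmentally conscious"),
]


def keyword_search(keyword, documents):
    """Return every (text, sense) pair whose text contains the keyword as a whole word."""
    keyword = keyword.lower()
    hits = []
    for text, sense in documents:
        # Split on whitespace and strip trailing punctuation so "green," matches "green".
        words = [w.strip(".,").lower() for w in text.split()]
        if keyword in words:
            hits.append((text, sense))
    return hits


hits = keyword_search("green", DOCUMENTS)
senses = {sense for _, sense in hits}

for text, sense in hits:
    print(f"{sense:28} <- {text}")
```

Every sentence matches the keyword, yet each one uses a different sense of green. The match alone cannot tell them apart; only the rest of the sentence can.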

The Semantic Web (sWeb)

As everyone knows, Al Gore did not invent the internet, but give him his due: he was the first widely recognized source to use the term information superhighway or publicly acknowledge its potential. Tim Berners-Lee is credited with inventing the World Wide Web (if you don’t believe it’s true, just ask him), and he is the leading proponent of the sWeb.

Why the sWeb? If we have any hope of creating a web which allows informed inquiry, we must be able to intelligently differentiate among the semantic meanings of words and understand their relations. This is especially so for bibliographic records. Enter Dr. Gradmann:

bibliographic information originated by libraries still largely remains buried within the ‘hidden Web’….

According to Gradmann, with the current web structure, this is probably all for the best because

as long as different layers of information remain blended in bibliographic records, the non-librarian world probably is better off without these thousands of identical bibliographic records pointing simply to different items or manifestations and thus ‘polluting’ search engine results with massive amounts of redundant information.

Gradmann’s solution is to standardize metadata descriptions (and even the data themselves) using RDF, and to restructure library catalogs around the hierarchical structure of FRBR; or, in her words,

“The proposal is to rethink the technical platforms for librarian metadata implementation in terms of Semantic Web technology and to do so using FRBR as a kind of pivot concept. In that sense, the proposal is not to view FRBR as a kind of ontology to be expressed in RDF, but rather to consider it a kind of specific meta-ontology in the field of librarian information objects, which would have to be expressed using RDF schema.”

Yes, this would be as unclear and as complicated as it sounds. Nonetheless, the proposal has some merit, for several reasons.

  • First, librarians would be structuring the relationships. This is a HUGE advantage because they alone have the unique expertise necessary to structure and index related semantic meanings. It would be virtually impossible for the general public to likewise participate in any similar effort.
  • Second, the pool of documents to be related encompasses only library holdings. Library holdings are formidable, but they comprise far fewer documents than the web at large. Moreover, the effort could begin with certain document subsets such as books.
  • Third, libraries are publicly underwritten; thus a contribution to the general welfare of the information-seeking community would not be compromised by proprietary restrictions.
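To make the mechanics of the proposal a little more concrete, here is a minimal sketch in Python of FRBR-style relationships written as RDF-like (subject, predicate, object) triples. It assumes the four FRBR entity levels (Work, Expression, Manifestation, Item). Every "ex:" name below is a hypothetical label of my own, not an identifier from Gradmann's article or from any published vocabulary, and plain tuples stand in for a real RDF library.

```python
# Illustrative triples only: all "ex:" names are hypothetical, not a real vocabulary.
TRIPLES = [
    ("ex:work1", "ex:type", "ex:Work"),
    ("ex:expr1", "ex:realizes", "ex:work1"),        # an expression realizes a work
    ("ex:manif1", "ex:embodies", "ex:expr1"),       # a manifestation embodies an expression
    ("ex:manif2", "ex:embodies", "ex:expr1"),
    ("ex:item1", "ex:exemplifies", "ex:manif1"),    # an item exemplifies a manifestation
    ("ex:item2", "ex:exemplifies", "ex:manif1"),
    ("ex:item3", "ex:exemplifies", "ex:manif2"),
]

# Predicates that point one level up the hierarchy.
UPWARD = {"ex:realizes", "ex:embodies", "ex:exemplifies"}


def work_of(node, triples):
    """Follow upward links from node until no further parent is found."""
    parents = {s: o for s, p, o in triples if p in UPWARD}
    while node in parents:
        node = parents[node]
    return node


def group_items_by_work(triples):
    """Collect every item under the work it ultimately belongs to."""
    groups = {}
    for s, p, _ in triples:
        if p == "ex:exemplifies":
            groups.setdefault(work_of(s, triples), []).append(s)
    return groups


groups = group_items_by_work(TRIPLES)
print(groups)
```

Three physical items spread over two manifestations collapse into a single entry for the one work. That is roughly the de-duplication Gradmann argues is needed before catalog records stop ‘polluting’ search results.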

Will Gradmann’s proposal ever be put to widespread use? In my opinion, no, but my pessimism has more to do with the politics than the proposal. In Gradmann’s native Germany, or among EU nations in general, people (i.e., taxpayers) might support the concept of expending public funds for such a long-range goal. Even the German language seems better for structuring semantic relations because word form and meaning are a bit more closely linked.[2] What will doom the effort worldwide is that U.S. taxpayers will remain too libertarian to support it. English is the lingua franca[3] of the world.[4] Without U.S. support, most of the world’s documents and web content will be out of reach for this kind of semantic web effort.

In the end, I predict that a private company will find the breakthrough technology and heuristics-turned-to-algorithms to relate words semantically based on context rather than morphologically based on spelling. I know that the task has resisted progress and is far more complex than keyword searching. But by way of analogy, consider that Google made a number of breakthroughs that revolutionized almost everything about web searching. If Berners-Lee invented the web, Google made it usable.

I predict a similar statement will follow in time. It will go something like this: Berners-Lee may have invented the sWeb, but Schmoogle made things findable. Schmoogle? Well, it's just a simile.


END NOTES

[1] This is recognized in a variety of disciplines; my first exposure to the concept was upon reading the first of the four volumes of Geisler’s “Systematic Theology” (though, in my opinion, Grudem has a much better systematic theology in one volume, if you are interested).

[2] Germany has a government body dedicated to promulgating standards for word acceptance, spelling, etc. With its clear rules for spelling and concatenation, German words, especially technical words, have less semantic range than English ones. Unlike Germany (or France, or other countries), the United States has no official body for deciding spelling and grammar, rules for hyphenation, or even which words are accepted as English. The U.S. approach is a democratic rather than republican process; such issues are decided solely by usage, not by authority.

[3] Similar to Koine Greek after the conquests of Alexander the Great. A single world language has emerged only three times in recorded history. With the possible exception of the present, each occasion set the stage for a major cataclysm.

The first occasion culminated in the tower of Babel. According to the Bible, this prompted God to scatter the nations and confuse their languages. Although many regard this as legendary, most cultures have similar legends and all languages appear to derive from a single Ursprache. The table of nations given in Genesis Chapter 10 also appears to be quite accurate, being borne out by scientific disciplines: archaeology and linguistics, to name two; this textual-evidential corroboration is unique in scope and degree as compared to other religious traditions.

The second occasion was in the first century A.D. Despite Roman conquests, Greek remained the lingua franca well through the first century A.D. This is why the New Testament was written in Greek rather than Latin and how it spread so quickly throughout the known world. The cataclysmic event was Christianity – a radical worldview so dramatic that its founder was crucified, its early followers martyred, and the world clock reset from B.C. to A.D.

We are now entering a third cycle. This time, the lingua franca is English. The worldview is postmodern. What is our cataclysm? WW III? The Second Coming? Or will we escape the pattern of history?

[4] Interestingly enough, this is probably due precisely to the democratic and non-authoritarian way that words become English. In what language would Ursprache become the winning word in a national spelling bee championship? Certainly not German! The Germans have no spelling bees, because word sound and spelling are too logically related; spelling bees happen only in America. Though the Germans and French may experience schadenfreude at our rendezvous with the foreign, it is precisely this libertarian and purely democratic tendency that unhesitatingly adopts schadenfreude and rendezvous as English. The result is a richness and semantic depth unrivaled in any other language. This makes for the large number of synonyms that give English its utility and also make it refractory to automated semantic structuring.