KM 5433 Blog/Joe Colannino

A blog discussing knowledge management and library science issues.

Sunday, October 29, 2006

Biblical exegesis and metatags/ J. Colannino: a way to view “usage patterns of collaborative tagging systems” by S.A. Golder and B.A. Huberman.

1a.) In the beginning was the Word,

1b.) And the Word was with God,

1c.) And the Word was God.

So begins the gospel of John. When John wrote his gospel, he undoubtedly had the opening verse of Genesis in mind, for Genesis begins in a similar and well-known way:

1a.) In the beginning, God created the heavens and the earth.

Graphæ, Rhæma, and Logos

The Greeks had three words for our English term “word”: graphæ, the written or engraved word; rhæma, the spoken word; and logos, the verbal idea behind the word. (From logos we derive the English word “logic.”) In his opening lines, John uses the Greek word Logos. The first-century Hellenistic worldview, with its high regard for both thought and language, would have had no problem identifying Thought as being before all things (1a). The concept that logic itself was associated with God in a relational way (1b) would also have been permissible within a first-century worldview, though the tendency would have been to view the construction as a personification.

The problem would begin in earnest with 1c: the Logos is a Person (not a mere personification), the source of all thought and logic, the very God Himself. The chapter goes on to attribute to the Logos the activity of God alone, the creation of all things (vv 3, 10), and then that most vulgar of concepts to the Hellenist: corporeal being (v 14).

The purpose of this digression has been to draw clear distinctions among the different but related concepts of written text (graphæ), idea or meaning (logos), and spoken word (rhæma). Unfortunately, English has lost the precision the ancient Greeks had; we have only one English moniker for all three of these essentially different things: “word.”

Metatag and Keyword Searching

With that background, I approach the subject at hand: with keyword searches, we want logos but search graphæ. Metatags are an attempt to codify logos with graphæ. But logos and graphæ are ontologically distinct. Therefore, if the graphæ is not limited or controlled in some way (i.e., uniquely mapped to logos), then we have ambiguity.

Logos, Graphæ, and Mapping

Golder and Huberman identify several problems with metatags that use uncontrolled graphæ as the search instrument. The most serious is that logos does not map to graphæ bijectively (i.e., one-to-one and onto), nor even in a clean many-to-one fashion in which several terms may share a meaning but each term carries only one. Golder and Huberman express this more conventionally with the term basic-level variation: the unintended practice of using different terms to represent the fundamental attributes of an object or collection of objects. For example, some might refer to “dogs, cats, and birds” as “animals,” others as “pets,” and so on.
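To make the mapping vocabulary concrete, here is a minimal Python sketch that checks which of these properties a tag vocabulary actually has. It assumes the vocabulary can be written down as a table from each term to the set of concepts it is used for; the function name and the concept code “C1” are hypothetical. A “surjective” result means every term has exactly one meaning and every concept has at least one term (the many-to-one case); “bijective” additionally requires exactly one term per concept.

```python
from collections import defaultdict


def mapping_properties(term_to_concepts: dict[str, set[str]],
                       concepts: set[str]) -> dict[str, bool]:
    """Classify how a tag vocabulary (graphæ) maps onto concepts (logos).

    term_to_concepts: each term -> the set of concepts it is used for.
    concepts: every concept the vocabulary is supposed to cover.
    """
    # If any term stands for zero or several concepts, the relation is
    # not even a function from terms to concepts.
    is_function = all(len(cs) == 1 for cs in term_to_concepts.values())

    # Invert the relation: which terms are used for each concept?
    concept_to_terms: dict[str, set[str]] = defaultdict(set)
    for term, cs in term_to_concepts.items():
        for c in cs:
            concept_to_terms[c].add(term)

    # Onto: every concept is reached by at least one term.
    onto = all(concept_to_terms.get(c) for c in concepts)
    # One-to-one: no concept is reached by more than one term.
    one_to_one = all(len(ts) <= 1 for ts in concept_to_terms.values())

    return {
        "surjective": is_function and onto,                # many-to-one, onto
        "bijective": is_function and onto and one_to_one,  # one term per concept
    }


if __name__ == "__main__":
    # Hypothetical code "C1" stands for the collection "dogs, cats, and birds".
    print(mapping_properties({"animals": {"C1"}, "pets": {"C1"}}, {"C1"}))
    # → {'surjective': True, 'bijective': False}
```

Basic-level variation shows up as the failed “bijective” check: two terms land on one concept.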

Democracy comes to metatagging

Golder and Huberman suggest that collaborative tagging can filter basic-level concepts and provide consensus. It creates a kind of surjective map of concepts onto metatags. The activity itself democratizes metatagging. However, metatags are still graphæ that are consensually but not formally controlled. As such, this kind of metatagging still misses out on the benefits that hierarchy brings to classification.
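Consensus here emerges from many users' tags. The sketch below is only one hypothetical way to read a consensus off raw tagging data, not Golder and Huberman's method: it assumes (user, resource, tag) triples and an arbitrary agreement threshold (`min_share`, defaulting to half of a resource's taggers).

```python
from collections import defaultdict


def consensus_tags(taggings: list[tuple[str, str, str]],
                   min_share: float = 0.5) -> dict[str, list[str]]:
    """Return, for each resource, the tags applied by at least `min_share`
    of the distinct users who tagged it, most widely shared first.

    taggings: (user, resource, tag) triples. `min_share` is an assumed,
    tunable threshold.
    """
    # resource -> tag -> set of users who applied that tag
    users_by_tag: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    # resource -> set of users who tagged it at all
    taggers: dict[str, set[str]] = defaultdict(set)

    for user, resource, tag in taggings:
        tag = tag.strip().lower()  # fold trivial variants such as "Pets"
        users_by_tag[resource][tag].add(user)
        taggers[resource].add(user)

    result: dict[str, list[str]] = {}
    for resource, tags in users_by_tag.items():
        n = len(taggers[resource])
        shared = [(len(users) / n, tag) for tag, users in tags.items()
                  if len(users) / n >= min_share]
        # Highest share first; ties broken alphabetically.
        shared.sort(key=lambda pair: (-pair[0], pair[1]))
        result[resource] = [tag for _, tag in shared]
    return result


if __name__ == "__main__":
    taggings = [
        ("u1", "doc1", "pets"),
        ("u2", "doc1", "Pets"),
        ("u3", "doc1", "animals"),
        ("u4", "doc1", "pets"),
    ]
    print(consensus_tags(taggings))  # → {'doc1': ['pets']}
```

Note that “animals” is simply outvoted rather than related to “pets”: consensus picks a winner but supplies no hierarchy.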

Taxonomy and Metatags

Linnaean taxonomy is the most widely used (and, arguably, the most successful) example of a hierarchical classification. Linnaean taxonomy owes its existence to a clear paradigm: the expectation that the work of a rational and orderly Creator exhibits rational order. That paradigm is now passé, and the taxonomy is retained only out of pragmatism; but it was with that paradigm that Carolus Linnaeus revolutionized classification by arranging living things hierarchically based on anatomical similarities.

A similar worldview could be helpful for classifying knowledge, and indeed, only such a worldview truly legitimizes it (but I would love to hear dissenting views): human thought is rational and ordered because humans are created in the image of a rational and orderly Creator. Therefore, thoughts can be ordered and arranged (knowledge). In turn, knowledge is itself further classifiable. This leads to a hierarchy having a universal structure that transcends language, vocabulary, culture, race, and geography. However, such a classification must focus on logos and not on graphæ, because logos alone is universal and transcendent while graphæ is dependent and contingent on the very things logos transcends: e.g., vocabulary, culture, race, and geography.
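One way to picture a classification that focuses on logos rather than graphæ is to build the hierarchy over language-neutral concept codes and hang each language's words off those codes. The codes, labels, and function names below are hypothetical, and the tree is a toy, not a real taxonomy.

```python
from typing import Optional

# Hypothetical concept codes (logos); each points to its broader parent.
PARENT: dict[str, Optional[str]] = {
    "C100": None,    # animal
    "C110": "C100",  # mammal
    "C111": "C110",  # dog
    "C112": "C110",  # cat
    "C120": "C100",  # bird
}

# Graphæ: language-specific labels hang off the codes, not the hierarchy.
LABELS: dict[str, dict[str, str]] = {
    "C100": {"en": "animal", "de": "Tier"},
    "C110": {"en": "mammal", "de": "Säugetier"},
    "C111": {"en": "dog", "de": "Hund"},
    "C112": {"en": "cat", "de": "Katze"},
    "C120": {"en": "bird", "de": "Vogel"},
}


def narrower(code: str) -> set[str]:
    """Return `code` plus every concept beneath it in the hierarchy."""
    found = set()
    for candidate in PARENT:
        node: Optional[str] = candidate
        # Walk up from each concept; if we pass through `code`, it lies below it.
        while node is not None:
            if node == code:
                found.add(candidate)
                break
            node = PARENT[node]
    return found


def search(label: str, lang: str, tagged: dict[str, str]) -> list[str]:
    """Find documents tagged with the concept named `label`, or any narrower one."""
    codes = {c for c, names in LABELS.items() if names.get(lang) == label}
    wanted = set().union(*(narrower(c) for c in codes))
    return sorted(doc for doc, code in tagged.items() if code in wanted)


if __name__ == "__main__":
    tagged = {"rex.txt": "C111", "tom.txt": "C112", "tweety.txt": "C120"}
    print(search("Säugetier", "de", tagged))  # → ['rex.txt', 'tom.txt']
```

Because the hierarchy lives on the codes, a German query for “Säugetier” and an English query for “mammal” reach the same documents, and a broad query also finds documents tagged with narrower concepts.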

Graphæ is not logos; indeed, metatags need not even use real words. For example, suppose the concept of murdering a person were coded as A12J45 (arbitrary), and the concept of foot pain were coded as B23K15 (arbitrary). Then, when someone wrote “these shoes are murder” or “my feet are killing me,” we could map both expressions, via a metatag, to entity B23K15. Searching for the B23K15 metatag, we would find all of the entries associated with foot pain, including those about shoes that don’t fit.
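The B23K15 example can be sketched in a few lines of Python. The phrase table is a hypothetical stand-in for whatever process actually maps text to concepts (the hard part); the point is only that search then runs on the code rather than on the words.

```python
# Arbitrary concept codes from the example above.
FOOT_PAIN = "B23K15"
MURDER = "A12J45"

# Hypothetical phrase table: building this mapping is the hard part.
PHRASE_TO_CODE = {
    "these shoes are murder": FOOT_PAIN,
    "my feet are killing me": FOOT_PAIN,
    "he was murdered": MURDER,
}


def assign_metatags(docs: dict[str, str]) -> dict[str, set[str]]:
    """Attach concept codes to each document whose text contains a known phrase."""
    tags: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        lowered = text.lower()
        tags[doc_id] = {code for phrase, code in PHRASE_TO_CODE.items() if phrase in lowered}
    return tags


def search_by_code(tags: dict[str, set[str]], code: str) -> list[str]:
    """Logos search: match on the concept code, not on the words used."""
    return sorted(doc_id for doc_id, codes in tags.items() if code in codes)


def keyword_search(docs: dict[str, str], word: str) -> list[str]:
    """Graphæ search: plain substring match on the text."""
    return sorted(doc_id for doc_id, text in docs.items() if word in text.lower())


if __name__ == "__main__":
    docs = {
        "review1": "Honestly, these shoes are murder after an hour.",
        "post2": "My feet are killing me!",
        "news3": "Police say he was murdered downtown.",
    }
    tags = assign_metatags(docs)
    print(search_by_code(tags, FOOT_PAIN))  # → ['post2', 'review1']
    print(keyword_search(docs, "murder"))   # → ['news3', 'review1']
```

The keyword search for “murder” lumps the shoe review in with the news story, while the code search groups the shoe review with the foot-pain post, even though the two share no words.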

Recipe for collaborative metatagging

I am not surprised by the great merit of unwitting collaboration; market economies are built on such a principle. This has gotten me thinking. At some risk, let me offer my recipe (half-baked, not fully cooked) for improving the collaborative metatagging of documents. We begin with a non-proprietary, collaboratively developed controlled vocabulary. Ultimately, concepts are expressed as indices; i.e., thoughts have part numbers, so logos and graphæ are kept distinct.

To translate a logos to its unique index, users employ a Rosetta stone. This could be proprietary, but it must be freely available and must update based on usage. Okay, that’s the hard part, but the alphabetical index of the Yellow Pages tells you that “used car dealers” and “auto dealers, used” are both really “p158.” Is that so different?
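A minimal sketch of such a Rosetta stone, assuming it is a table from normalized phrases to concept indices that counts how often users confirm each pairing. The class and method names are hypothetical; “p158” is the Yellow Pages-style index from the example above.

```python
import re
from collections import Counter, defaultdict
from typing import Optional


def normalize(phrase: str) -> str:
    """Fold case, punctuation, and word order, so that an inverted index
    entry like "auto dealers, used" matches "used auto dealers"."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return " ".join(sorted(words))


class RosettaStone:
    """Hypothetical phrase -> concept-index table that learns from usage."""

    def __init__(self) -> None:
        # normalized phrase -> how often users confirmed each index for it
        self._votes: dict[str, Counter] = defaultdict(Counter)

    def record(self, phrase: str, index: str) -> None:
        """Count one confirmed use of `phrase` to mean `index`."""
        self._votes[normalize(phrase)][index] += 1

    def lookup(self, phrase: str) -> Optional[str]:
        """Return the index most often confirmed for this phrase, if any."""
        votes = self._votes.get(normalize(phrase))
        if not votes:
            return None
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    stone = RosettaStone()
    stone.record("used car dealers", "p158")
    stone.record("Auto dealers, used", "p158")
    print(stone.lookup("used auto dealers"))  # → p158
```

Sorting the words handles inverted index entries such as “auto dealers, used,” but it would also conflate phrases whose meaning depends on word order, so a real normalizer would need more care. Updating “based on usage” is modeled here as simple vote counting, so an occasional mistaken entry is outvoted rather than deleted.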

Now, it is well known that search engines cannot rely on metadata, because some people lie or goof. To counter that, those using the system rate each result returned by the search engine with a checkbox: true or false. Over time, the lies and mistakes are filtered from the search results.
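The true/false checkbox can be sketched as a vote count per (metatag, document) pair. The thresholds (at least three ratings, at least half of them marked true) and all names are assumptions for illustration.

```python
from collections import defaultdict


class RatedResults:
    """Hypothetical true/false feedback filter for metatag search results."""

    def __init__(self, min_votes: int = 3, min_true_share: float = 0.5) -> None:
        # Both thresholds are assumptions chosen for illustration.
        self.min_votes = min_votes
        self.min_true_share = min_true_share
        # (metatag, document) -> [true votes, false votes]
        self._votes: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def rate(self, tag: str, doc: str, is_true: bool) -> None:
        """Record one user's checkbox: does `doc` really match `tag`?"""
        self._votes[(tag, doc)][0 if is_true else 1] += 1

    def filter_results(self, tag: str, candidates: list[str]) -> list[str]:
        """Drop documents that enough users have marked false for this tag."""
        kept = []
        for doc in candidates:
            true_votes, false_votes = self._votes.get((tag, doc), (0, 0))
            total = true_votes + false_votes
            # Too few ratings yet: give the document the benefit of the doubt.
            if total < self.min_votes or true_votes / total >= self.min_true_share:
                kept.append(doc)
        return kept


if __name__ == "__main__":
    results = RatedResults()
    for _ in range(3):
        results.rate("B23K15", "spam.html", is_true=False)
    print(results.filter_results("B23K15", ["shoes.html", "spam.html"]))
    # → ['shoes.html']
```

Keeping unrated documents gives new entries a chance to be seen and rated; a deployed system would also need some defense against users who lie with the checkbox itself, which this sketch does not attempt.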

Will it work? I have certainly oversimplified things, but I still think so. Can we just regard it as a work in progress?

