KM 5433 Blog/Joe Colannino

A blog discussing knowledge management and library science issues.

Sunday, September 17, 2006

Review and Musings: Khaled A.F. Mohamed: The impact of metadata in web resources discovering

Metadata are data about data. I have a particular irritation with this word because

  1. It is a proprietary term, not a generic one; therefore, it has no business being used by a library community that has a mission to safeguard intellectual property.
  2. It is of mixed origin: meta is a Greek preposition while data is a Latin noun. (Supradata would be a more consistent synthesis having a wholly Latin etymology.)
  3. The word metadata is a plural noun which is routinely and incorrectly treated as singular by persons who ought to know better. Both data and metadata are incorrectly conjugated most of the time, typically in library and information science journals and books: “the data says….” or “the data is…” are incorrect conjugations; “the data say…” or “the data are…” have a subject and verb that agree in number.
  4. For the record, the singular of data is datum. While I’m at it, there is no such word as specie; the word is species.
  5. I feel better now.

So, from this point forward I am going to use the term supradata in lieu of metadata, even though that is a nonstandard term and never used once in any library science journal of which I am aware nor by Mohamed in his article. I will limit its use insofar as possible to reduce the irritation to the reader – if it becomes unbearable, search and replace supradata with you know what.

Examples of supradata

A most familiar example of supradata is the library card catalog. It contains three entries: author, title, and subject. If the book comprises the data then author, title, and subject are supradata. You can think of supradata as data labels, similar to file folder labels. The Dublin Core standard has 15 main labels, including creator, title, subject, and description. Keywords specifically called out by the creator or indexer also comprise supradata, presuming they represent some attribute of the data as a whole. These are the kind of supradata Mohamed examined.
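
To make the idea of data labels concrete, here is a minimal sketch in Python of a Dublin Core style record for a book, rendered as HTML meta tags. The record contents and the helper name to_meta_tags are my own illustrations, not anything taken from Mohamed's article; the "DC." prefix is the usual convention for embedding Dublin Core labels in a web page.

```python
from html import escape

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)

# Illustrative record: the book itself is the data, these labels describe it.
record = {
    "creator": "Herman Melville",
    "title": "Moby-Dick; or, The Whale",
    "subject": "Whaling",
}


def to_meta_tags(rec):
    """Render a record as HTML meta tags, one tag per label that is present."""
    tags = []
    for element in DUBLIN_CORE_ELEMENTS:
        if element in rec:
            content = escape(rec[element], quote=True)
            tags.append('<meta name="DC.%s" content="%s">' % (element, content))
    return "\n".join(tags)


print(to_meta_tags(record))
```

Only three of the fifteen labels are filled in here, which mirrors the author, title, and subject entries of the card catalog.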

What did Mohamed find?

Supradata made no significant difference in how search engines listed electronic records. This is disturbing because supradata’s raison d’être is to aid search and retrieval, yet it was precisely in this regard that they had virtually no influence.

There are a number of reasons for this; foremost among them is the practice known as spoofing – the dishonest cramming of unrelated supradata into a document in order to inflate a search engine’s tendency to link to it. There are other ways to do this as well – for example, googlebombing: a Google search on “miserable failure” pulls up President George W. Bush’s web site. The Google site explains why this is so. These are reasons that search engines cannot rely on supradata.
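
A toy illustration of why spoofing works against any engine that trusts author-supplied keywords: the Python sketch below ranks pages purely by how many query terms appear in their declared keyword lists. The pages, the keywords, and the functions keyword_score and rank are all invented for this example; no real search engine is this naive, and this is not the method Mohamed tested.

```python
def keyword_score(query, keywords):
    """Count how many query terms appear among a page's declared keywords."""
    declared = {k.lower() for k in keywords}
    return sum(1 for term in query.lower().split() if term in declared)


def rank(query, pages):
    """Return page names ordered from highest to lowest keyword score."""
    return sorted(
        pages,
        key=lambda name: keyword_score(query, pages[name]),
        reverse=True,
    )


# Hypothetical pages: one labeled honestly, one crammed with unrelated keywords.
pages = {
    "honest-whaling-history": ["whaling", "history"],
    "spoofed-casino-ad": ["whaling", "history", "ships", "cheap", "flights", "news"],
}

print(rank("cheap whaling ships", pages))
```

The crammed page matches more query terms than the honest one, so a ranker that takes keywords on faith puts it first. The only defense such a ranker has is to stop trusting the keywords.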

And this leads to the axiom I wish to contemplate in this blog: the dishonesty of some compromises the integrity of the whole, or in biblical parlance, a little leaven leavens the whole batch. If so, where do we go from here?

Like Heaven

Imagine a world where no one lies, or steals, or locks their doors at night – there are no locks. There is no antivirus software and there are no anti-spyware programs. Supradata are part of the warp and woof of documents and they are implemented with excellence and honesty. Casting the net wider – some professions have disappeared: there are no locksmiths, no security industry; but the world is richer because the people with those skills have turned their hearts and souls and minds toward serving others in even more powerful ways. Resources are redeployed from defensive uses to productive ones – swords are beaten into plowshares.

Such a world is not purely equitable – diversity is celebrated: some are richer than others, some are smarter, some are thinner, but no one is envious. Singers sing and listeners listen, painters paint and viewers admire, and all rejoice in one another’s accomplishments. We are free to follow our heart and improve our lot. Our motives are pure and well executed.

That would be heaven, but we live in Los Angeles.

Like Los Angeles

When I was fourteen, living in the greater Los Angeles area, I flew from L.A. to Italy, but almost not: we nearly missed the plane because we couldn’t find the key to the front door. It had never been locked. Not at night, not during the day – never. Our 1961 Chevy Bel Air sat in front of the house with its windows rolled down and an ignition that could be turned without a key. Neighbors would knock on the front door and enter without an invitation. Windows were left open and we all enjoyed the night breeze and blooming jasmine.

Like Today

The world has changed – in some ways for the better and in other ways for the worse. I take great comfort in the fact that I am unlikely to die from anything I hear about in the news: by definition, I will not die of Lyme disease, pit bull attacks, El Niño, plane crashes, or anything else described by a media biased toward the peculiar. I will lead a very boring life and croak from congestive heart failure (if natural) or a car wreck (if accidental). This is how people die, statistically speaking. They do not die from the big one or a meteor striking the earth. In the main, we do not get sucked down into earthquake-generated fissures or sucked up by tornado-generated vortices. Some do, but with a frequency that is so vanishingly small it is less than useless to worry about. This is good news.

Like Yesterday

I do not long for the good old days because those were the days when civil rights were meted out to the few, life expectancy was lower, and the air and water were dirtier. Those were the days when the L.A. smog was so bad it would hurt to breathe while playing.

Like Me and You

It has always been a small minority – the little leaven – that has spoiled things for the rest of us. Violent felons represent less than one-half of one percent of the general population. We can think of spoofers and spammers as leaven within the big batch of the web. While I do not have faith in technology per se, I have faith in the big batch insofar as it contains the image of God – people – like me and you. Technology is the great homogenizer. If the overwhelming majority of persons – the big batch – are decent, then the ubiquity of guns, video cameras, and web technology is overwhelmingly beneficial. If technology is restricted, then the law-abiding among us have preferentially and statistically diminished access and technology becomes oppressive.

Like Tomorrow

The disdain of the big batch by the leaven is very old; elites consider crowds ignorant and accursed. Media elites are fearful of their dwindling information monopoly. With regard to the particular issue Mohamed explores, the leaven has adulterated supradata and those of vision among us have responded by rendering it superfluous. Overall, this is a good thing. The solution to evil is quarantine. If we cannot eradicate bad actors from the web, the next best thing is to render them impotent. Supradata have a place in controlled environments where one can keep leaven from being introduced into the big batch. In uncontrolled information environments one needs supradata generated on the fly with appropriate safeguards to exclude the adulterating leaven; this will require new technology generated by men and women of vision. As regards the web, I see no role for controlled vocabularies or supradata generated in the traditional way. I do see a role for tomorrow’s technology brought to us by those in the big batch who dare to dream.