KM 5433 Blog/Joe Colannino

A blog discussing knowledge management and library science issues.

Friday, November 24, 2006

Digital Library Archaeology: A Conceptual Framework... by Scott Nicholson/Peripheral Thoughts – J. Colannino

Here is a 25-page article that makes the case for data mining in a library setting. Dr. Nicholson calls this “bibliomining.” Clicking on the title above will take you to a summary; my thoughts here will be more peripheral.


Data mining is a relatively new field in statistics, but one that has gained respectability. The company with the most extensive record in data mining is SAS – it is certainly the dominant company in the industry. In my work as knowledge manager for my company, I do a fair amount of data mining. So, I know firsthand that data mining is hundreds of times weaker than on-purpose experimentation, statistically speaking. It is also not strictly possible to determine causal relationships with data mining, only correlative ones. That said, some data are only available in archival form and data mining is an appropriate strategy in these cases.


In looking at the blogs of my student colleagues, I see that many of them are more concerned about the implications of data mining in general than the technology in particular. I tend to agree. The title “Digital Library Archaeology” implies a reconstructive method of artifact analysis. However, archaeology is a science that pieces together history from the remnants of the deceased; if their history is maligned, the subjects themselves suffer no personal harm. That is not the case with bibliomining conducted under an archaeological paradigm, because its subjects are living. Bibliomining cannot be archaeology – cultural anthropology perhaps, but not archaeology. Although Shakespeare penned “What’s in a name…,” I must concur with Mark Twain when he said “The difference between the almost right word and the right word is the difference between the lightning-bug and the lightning.” Cultural anthropology has safeguards for dealing with the habits of living persons – safeguards for anonymity. With these safeguards (and only with them) I endorse the value of bibliomining, and we can be sure this practice will become commonplace.


Now to be fair, Dr. Nicholson is constructing a scientific case, not a political one. However, scientific methods are not without political relevance or repercussion. One cannot do science in a vacuum. As a member of the American Statistical Association (ASA), I have seen willful neglect of the political ramifications in statistical debates for some time now. Most recently this was expressed in ASA’s statement on intelligent design (see my response here). Prior to this, a lengthy debate ensued about whether census data should be adjusted to reflect non-response. Statistically the answer is “Yes,” and this was reflected clearly in the pages of ASA’s periodical, AMSTAT News. However, politically the answer is “No!” or perhaps “Hell no!” as we will certainly have hell to pay if we allow any political body to monkey with census data. Great wealth, political power, and the distribution of funding are tied to census data, and it is sheer political naïveté to believe that such adjustments, if allowed, will be done free from corruption.


In the same way, library and information professionals cannot afford to treat their science as if it were part of some non-overlapping magisteria, to borrow Gould’s phrase. Without some clear ethical requirement to destroy all personally identifying information before analysis, bibliomining will become as misused as social security records were barely 60 years ago in Executive Order 9066. Academics should begin to use their tenure as it was intended – as a foil to tyranny – in order to argue forcefully for practices and legislation to thwart the unwarranted political intrusions that their research may justifiably be imagined to empower. Unless they do, they are only doing half their job. Indeed, those of us without such a protection should argue just as forcefully, for what does it profit a man if he gains the whole world but loses his soul?