I am willing to admit that I remain skeptical about the “one big pile” approach to next-generation catalogs that is sweeping the library automation world. While I don’t agree that advanced relevance ranking techniques are ineffective on bibliographic records (go look: I can find no literature on this topic; there are tons on full text, but nothing on surrogate record relevance), I wonder what happens when the catalog becomes more than it used to be.
If a relevance algorithm is based on whether or not a library holds a title, what happens when an article is thrown in the mix? How does/will Google’s relevance algorithm work when the body of content is 20M books and 20M articles?
One development I am encouraged by comes from our friends at Bowker Syndetics, the folks who have been enriching catalog records for several years now. Traditionally, catalog enrichment with things like book jackets, Tables of Contents (TOCs), reviews, etc., is done on the fly by tying content to something like an ISBN. Of course, the problem with enriching records on the fly is that the content of the enrichment is not part of the retrieval process.
Traditionally, the way around this has been to dump tons of data into the MARC record itself—the perfect example of tradition stunting progress. Our profession’s obsession with “the record”—not MARC, but the record itself—has led to missed opportunities, both philosophical and technological.
Syndetics now has an interesting compromise, called ICE (Indexed Content Enrichment). What if you could have all the enrichment and index it with your MARC data? New catalogs—AquaBrowser, Endeca, Primo, and Encore—will certainly help this idea along. It may even be what led Bowker to see Medialab (creator of AquaBrowser) as a nice little acquisition opportunity.
Calling all researchers! Let’s not make the mistake that some of the vendors and showroom floor demo wizards are making. We need more research in this area. Indexing first chapters, reviews, tables of contents, flyleaves, and annotations (and turning media awards and fiction files into faceted navigation elements) does not necessarily improve relevance ranking. It can provide recall where there was none before, but relevance is something different. And how will any of this compare with full-text (especially book-length text) relevance ranking?
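To make the recall point concrete, here is a minimal sketch in Python of what indexing enrichment alongside the bibliographic record buys you. It is a toy, not how ICE or any vendor's catalog actually works: the records, the field names (`title`, `subject`, `toc`), and the helper functions are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical records. "title" and "subject" stand in for fields from the
# bibliographic record; "toc" stands in for enrichment content.
RECORDS = {
    "rec1": {
        "title": "Gardening Basics",
        "subject": "Gardening",
        "toc": "Soil; Compost; Heirloom tomatoes; Pruning",
    },
    "rec2": {
        "title": "The Tomato Handbook",
        "subject": "Tomatoes",
        "toc": "Varieties; Planting; Pests",
    },
}


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    tokens = (word.strip(".,;:!?()").lower() for word in text.split())
    return [token for token in tokens if token]


def build_index(records, fields):
    """Build an inverted index (token -> set of record ids) over the given fields."""
    index = defaultdict(set)
    for rec_id, record in records.items():
        for field in fields:
            for token in tokenize(record.get(field, "")):
                index[token].add(rec_id)
    return index


def search(index, query):
    """Return the ids of records containing every query term (unranked)."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


# Index the bibliographic fields alone, then the same fields plus enrichment.
marc_index = build_index(RECORDS, ["title", "subject"])
enriched_index = build_index(RECORDS, ["title", "subject", "toc"])

for query in ("compost", "tomatoes"):
    print(query, sorted(search(marc_index, query)), sorted(search(enriched_index, query)))
```

A search for "compost" finds nothing in the first index and finds rec1 in the second. That is recall. Notice that nothing here says which hit should come first when a term appears in a table of contents rather than a title or subject heading. That is the relevance question, and it is still open.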
Is Bowker onto something? I got to thinking about all the hubbub over BISAC codes in the public library space. Then I thought about Bowker owning Books in Print and all this enrichment content. They and others are also heavily involved in the ONIX standard for publishers. Then I recalled that AquaBrowser has a deal with LibraryThing for tagging and other content. Throw in a little ICE and you’ve got a pretty interesting cocktail, making this a more intriguing battle:
BIP + ONIX + BISAC + ICE + LibraryThing vs. MARC
Throw in all the full text that is coming at us and all bets could be off. Think about the fact that Bowker is part of the Cambridge Information Group, which also owns CSA, ProQuest IL, and RefWorks; and now Bowker owns AquaBrowser. Boy, all Bowker needs is an ILS for a soup-to-nuts package. I reckon there are one or two for sale out there.
[This post originally appeared as part of American Libraries’ Hectic Pace Blog and is archived here.]