I’ll admit that I remain skeptical about the “one big pile” approach to next-generation catalogs that is sweeping the library automation world. I don’t agree that advanced relevance ranking techniques are ineffective on bibliographic records (go look: I can’t find any literature on this topic; there’s tons on full-text relevance, but nothing on the relevance of surrogate records). Still, I wonder what happens when the catalog becomes more than it used to be.
If a relevance algorithm is based on whether or not a library holds a title, what happens when an article is thrown into the mix? How will Google’s relevance algorithm work when the body of content is 20 million books and 20 million articles?
One development I’m encouraged by comes from our friends at Bowker Syndetics, the folks who have been enriching catalog records for several years now. Traditionally, catalog enrichment (book jackets, tables of contents (TOCs), reviews, and so on) is done on the fly by linking the content to an identifier such as an ISBN. The problem with enriching records on the fly, of course, is that the enrichment content is not part of the retrieval process: it is displayed, but it is not searched.
The usual workaround has been to dump tons of data into the MARC record itself, a perfect example of tradition stunting progress. Our profession’s obsession with “the record” (not MARC, but the record itself) has led to missed opportunities, both philosophical and technological.
Syndetics now offers an interesting compromise called ICE (Indexed Content Enrichment). What if you could have all the enrichment, without stuffing it into the MARC record, and still index it with your MARC data? New catalogs such as AquaBrowser, Endeca, Primo, and Encore will certainly help this idea along. It may even be what led Bowker to see Medialab (creator of AquaBrowser) as a nice little acquisition opportunity.
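As a rough illustration of why indexing the enrichment matters, here is a minimal Python sketch contrasting display-only (on-the-fly) enrichment with enrichment that is indexed alongside the record. Everything in it is hypothetical: the ISBNs, the records, the table-of-contents text, and the helpers `build_index` and `search` are invented for illustration and say nothing about how Syndetics ICE or any vendor’s catalog actually works. The only point is that text which is merely displayed can never match a query, while text that is indexed with the record can.

```python
import re
from collections import defaultdict

# Hypothetical surrogate records with a few MARC-like fields (not real data).
RECORDS = {
    "9780000000001": {"title": "Women and Work in the Arab World",
                      "subjects": "Women -- Employment -- Arab countries"},
    "9780000000002": {"title": "Gender and Society",
                      "subjects": "Sex role"},
}

# Hypothetical enrichment content keyed by ISBN (here, a table of contents).
ENRICHMENT = {
    "9780000000002": {"toc": "Ch. 4. Histories of Middle East women"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(records, enrichment=None):
    """Build an inverted index mapping each term to the set of ISBNs containing it.

    If enrichment is None, enrichment is assumed to be display-only (on the fly)
    and is left out of the index; otherwise it is indexed with the record.
    """
    index = defaultdict(set)
    for isbn, fields in records.items():
        texts = list(fields.values())
        if enrichment is not None:
            texts += list(enrichment.get(isbn, {}).values())
        for text in texts:
            for term in tokenize(text):
                index[term].add(isbn)
    return index


def search(index, query):
    """Boolean AND search: return the ISBNs that contain every query term."""
    sets = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*sets) if sets else set()


on_the_fly = build_index(RECORDS)            # enrichment shown, not searched
indexed = build_index(RECORDS, ENRICHMENT)   # enrichment indexed with the record

print(search(on_the_fly, "middle east women"))  # set()
print(search(indexed, "middle east women"))     # {'9780000000002'}
```

In this toy case the indexed enrichment adds recall: a record is found that otherwise would not be. It says nothing yet about where that record ought to rank.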
Calling all researchers! Let’s not make the mistake that some vendors and showroom-floor demo wizards are making. We need more research in this area. Indexing first chapters, reviews, tables of contents, flyleaves, and annotations, and turning media awards and fiction files into faceted navigation elements, does not necessarily improve relevance ranking. It can provide recall where there was none before, but relevance (putting the best items at the top of the list) is something different. And how will any of this compare with full-text (especially book-length text) relevance ranking?
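The recall-versus-relevance distinction can also be sketched in a few lines of Python. This toy scorer is an assumption-laden illustration, not any real catalog’s ranking: it simply counts query-term occurrences in each field and multiplies by a per-field weight. It is not TF-IDF or BM25 (real engines typically add inverse document frequency and length normalization), and the records and the field weights in `FIELD_WEIGHTED` are hypothetical. With every field weighted equally, the heavily enriched record B outranks record A, whose title and subject heading match the query directly; a hypothetical field-weighted variant reverses the order. Both runs retrieve the same two records, so recall is identical while the ranking is not.

```python
import re

# Hypothetical records: a sparse surrogate (A) vs. a heavily enriched one (B).
RECORDS = {
    "A": {"title": "Women in the Middle East",
          "subjects": "Women -- Middle East -- Social conditions",
          "toc": ""},
    "B": {"title": "Economic Reform and Labor Markets",
          "subjects": "Economic policy",
          "toc": ("Middle East labor markets; women in Middle East industry; "
                  "Middle East banking; women and Middle East trade")},
}

FLAT = {}  # every field gets the default weight of 1.0
# Hypothetical weights, chosen only to show that ordering depends on scoring.
FIELD_WEIGHTED = {"title": 3.0, "subjects": 3.0, "toc": 0.5}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score(fields, query, weights):
    """Sum over fields of (field weight x occurrences of query terms in that field)."""
    terms = tokenize(query)
    total = 0.0
    for name, text in fields.items():
        tokens = tokenize(text)
        total += weights.get(name, 1.0) * sum(tokens.count(t) for t in terms)
    return total


def rank(records, query, weights):
    """Retrieve records with a nonzero score, ordered from highest to lowest."""
    scores = {key: score(fields, query, weights) for key, fields in records.items()}
    hits = {key: s for key, s in scores.items() if s > 0}
    return sorted(hits, key=hits.get, reverse=True)


print(rank(RECORDS, "middle east women", FLAT))            # ['B', 'A']
print(rank(RECORDS, "middle east women", FIELD_WEIGHTED))  # ['A', 'B']
```

The weights here are arbitrary; the sketch only shows that once enrichment is indexed, the ordering depends on scoring choices that deserve the kind of research called for above.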
Is Bowker onto something? I got to thinking about all the hubbub over BISAC codes in the public library space. Then I thought about Bowker owning Books in Print and all this enrichment content. Bowker and others are also heavily involved in the ONIX standard for publishers. Then I recalled that AquaBrowser has a deal with LibraryThing for tagging and other content. Throw in a little ICE and you’ve got a pretty interesting cocktail, and the battle becomes a lot more intriguing:
BIP + ONIX + BISAC + ICE + LibraryThing vs. MARC
Throw in all the full text that is coming at us and all bets could be off. Consider that Bowker is part of the Cambridge Information Group, which also owns CSA, ProQuest IL, and RefWorks, and that Bowker now owns AquaBrowser. Boy, all Bowker needs is an ILS for a soup-to-nuts package. I reckon there’s one or two for sale out there.
[This post originally appeared as part of American Libraries’ Hectic Pace Blog and is archived here.]
BIP + ONIX + BISAC + ICE + LibraryThing vs. MARC
Whew, space shuttle versus donkey-pulled cart? Game over. Thanks for writing about ICE; I saw it demoed at Midwinter and was intrigued.
Andrew –
You can see ICE in action with AquaBrowser at http://boss.library.okstate.edu
More info at http://www.library.okstate.edu/news.htm
Maybe I’m missing something. I just went to Oklahoma State’s catalog, tried the ICE/AquaBrowser combination, and did a simple search for “Middle East Women”. Literally none of the topics in the cluster on the left seemed relevant (why was “employment” brought up?), and I had to go two pages into the ranked results to find books on women in the Middle East. I then clicked the “classic catalog” tab and performed the same search. A relevant item was listed first!
I admit I love the look of the AquaBrowser product, because classic OPACs are just so dull. But every time I’ve tried AquaBrowser on another library’s catalog, it hasn’t brought up the things I expected. Do we have to sacrifice form for function or function for form? Can’t we have both?
Hi Lisa,
Thank you for sharing your experience!
Let me help with some background. The word cloud’s goal is to offer users useful or surprising connections that lie within your data, in an open, spontaneous, and serendipitous way. Because it derives deeper connections from textual content, the relevance of certain associations is sometimes lost on people who know their cataloging well.
From an end user’s viewpoint, however, it makes sense: if you key in ‘Middle East women’, the word cloud offers associations to help specify what you mean: Middle East women in the context of what? ‘Employment’ may very well be the context you mean, and all you have to do is click it. This brings up ‘Women, work, and economic reform in the Middle East and North Africa’ at the top, followed by ‘Middle Eastern women and the invisible economy’. These are two great finds on Middle East women and employment that would be extremely hard to tease out of a classic catalog, especially for someone who doesn’t know about limiters or controlled vocabulary.
There’s more to AquaBrowser than the word cloud, however, and your search result is a case of ICE in action. The second and third results came up because your search terms were found in the annotations and tables of contents. The third result even has a full chapter on histories of Middle East women. That result would not have come up in an environment where the MARC record wasn’t enriched and searched in real time.
In your example, the book you had in mind gets pushed down. I agree it belongs nearer the top of the list. Examples like this help drive further improvements to our relevancy ranking algorithms.
As for the faceted search on the right of the screen, its advantages are numerous and well established in research, in the literature, and in customer success. AquaBrowser is currently the only system running live at libraries that provides precision faceted search on the catalog and other data sources.
Thanks for your input!