A while back, I made a joke on April 1 about Google buying OCLC. This was actually a well-coordinated April Fool’s Day attack between me and the folks at ALA TechSource (none of our bosses from back then want to know how much time actually went into the coordination). I guess one true test of a joke is its staying power, and this one has it, oddly enough (I suggest using Google to find the folks who took it seriously; that way I can avoid embarrassing anyone). From time to time, I even find current links to the announcement that treat it as real news.
So, either this joke was really funny, or the juxtaposition of two “big switch” players is intriguing to librarians. My money is on the latter. I’m pretty pleased that OCLC has embarked on record-sharing deals with Google, because I have always thought that search companies with great algorithms generally undervalue the power of metadata. I’m convinced that the only way to prove the point is to show them, as libraries are starting to do with faceted-browse catalogs.
I’m also insatiably curious about what the first page of search results in Google Book Search will look like when there are 100 million books in the database. What will Scholar look like with 100 million books and 100 million articles? The best answer I am able to get from Google is “highly relevant.” Relevance ranking is hard, as we have learned in enhancing WorldCat with non-monograph metadata. Done well, however, it greatly enhances the discovery experience for patrons.
Other people are beginning to wonder out loud about the prowess of Google and WorldCat data. Another recent post had me wondering about the future of Google Books. CrossRef has created a new plagiarism screening service called CrossCheck (clever!). It is another indicator of my love of data and what it empowers us to do and discover. It got me thinking (unoriginally, apparently) about what Google could cook up in searching for plagiarism once it has millions and millions of books scanned. I understand that Google is doing duplicate checking in its scans to keep from scanning books twice, so I imagine that “plagiarism checking” would be rather simple for them too. Literary crime detection using Google could be a fun pastime.
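
For the curious, here is a rough sketch of why duplicate checking and plagiarism checking are close cousins. It is not Google's or CrossCheck's method, neither of which is described here; it is the textbook word-shingle approach, and the function names and the window size `k` are assumptions made up for illustration. Each text is broken into overlapping runs of `k` words, and the score is the share of runs the two texts have in common.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text.

    Hypothetical helper for illustration; k=5 is an arbitrary default.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlap_score(text_a, text_b, k=5):
    """Score from 0.0 (no shared k-word runs) to 1.0 (identical shingle sets)."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))
```

A score near 1.0 suggests the same book scanned twice; a modest score between two supposedly independent works is the kind of thing a literary detective might want to look at more closely. At the scale of millions of books, comparing every pair this way would be far too slow, so a real system would need something smarter on top.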