As libraries struggle to create their identity in the digital age, they continue to cling to an antiquated concept with regard to accessing their resources: “To use our collections, content, services and resources, you must enter through the front door”. This is a throwback to the physical limitations of the library as place. This obsolete notion does not take into consideration the less structured nature of the web. Since the confines of “space” are not an issue, the user can effortlessly flit from one resource to another, contextually and tangentially, regardless of location or affiliation.
Librarians will immediately point out some “problems” with this model. The first, of course, is the very real fact that the majority of the content on the web is not academic, peer-reviewed or scientific. Most scholarly content is locked away in publisher and journal aggregator websites protected by IP address or password. Google Scholar and Elsevier’s Scirus begin to address this problem by allowing anyone to search their indexes of copyrighted scholarly content.
This raises the second issue: the content that is supplied is not necessarily the appropriate copy for the individual. Take, for example, Google Scholar and the Journal of Theoretical Biology. The links that Google Scholar provides are to the Journal of Theoretical Biology as aggregated by Ingenta. Georgia Tech, however, has access to the Journal of Theoretical Biology via Elsevier ScienceDirect, which is unlikely to allow Google to index them in the near future. Ingenta will gladly sell the researcher the linked article for $58.32, but, since the library already has electronic access to the same journal, this is unacceptable.
The traditional “solution” to these dilemmas is to discourage users from using “lay” internet resources and to train them to use the library, its website and the electronic resources it pays for. While it certainly is useful to have a population of “information literate” patrons, this method is ultimately inefficient, counterintuitive and entirely too complicated to demand of our users. As Roy Tennant has said, “Only librarians like to search, everyone else likes to find”. Library web resources frequently put a tremendous burden on the researcher to search: to navigate the library website, to know the appropriate database, journal or catalog to search, to understand the Boolean query. We expect them to understand why all databases are not full-text. We give obscure links to even more obscure notions such as link resolvers and require the user to comprehend a convoluted path that may not even take them to what they want.
This assumes, of course, that the researcher is even using the library’s resources, which studies show to be less and less true. According to LibQUAL+, while roughly 40% of faculty at ARL institutions claim to use their library’s website and resources daily, only 11% of undergraduates say the same and, more tellingly, 5.5% of undergraduates never use the library’s website at all. In a speech at the 2004 NFAIS Annual Conference, John Regazzi of Elsevier said, “In a survey for this lecture, librarians and scientists were asked to name the top scientific and medical search resources that they use or are aware of. The difference is startling. Librarians named Science Direct, ISI Web of Science, and Medline, while scientists named Google, Yahoo, and PubMed (librarians also named PubMed).”
The real solution to this problem is not to train the constituency to jump through the proper hoops to find their data, but to localize and contextualize the web for the user. By localizing, I mean that links point to the appropriate copy of a given work and that access through a local proxy server is provided, if necessary. Contextualization means allowing the user to surf wherever he or she pleases while offering alternative resources and links to canned searches based on the user’s query and location. Certainly the latter is more complicated and requires more work on the part of librarians and developers. However, this is a much more logical course to follow for the future. It also helps to give the library and the librarian relevance in the internet age. To the internet researcher, the library should be considered less as a destination and more as a service to aid the researcher.
Now for some real-world examples to explain how localization and contextualization will solve not only the two problems librarians raise with the web, but also the third, unaddressed problem: getting people to use the resources that librarians have compiled for the researcher’s benefit. First there is Wikipedia (http://www.wikipedia.org/), a community-contributed “web encyclopedia” that “anyone can edit” and a current flashpoint among librarians. The prevailing argument against Wikipedia by librarians is that it is not peer reviewed and it is non-authoritative. The very notion that “anyone can edit” it means that there is no guarantee that the author of a given topic entry is any sort of an expert on the given subject. This is a very valid concern; however, it does nothing to address whether or not Wikipedia is a valid search interface, nor does it help the user find “better” content from where he or she is currently. A Wikipedia search for “chaos theory” returns a fairly descriptive page about this concept including a definition, history, related terms and references. Alongside that page, the library could include links to its metasearch software, ScienceDirect or MathSciNet with a canned search, link to the catalog for the subject “chaos theory”, check holdings to see if the references are available locally, and, if a folksonomic layer is applied to the library’s resources, link into that thesaurus, as well.
To use another example, imagine a user searching for “chaos theory introduction” in a major search engine. Suppose one of the returned results is the page “Subject Guides: Chaos” by the Goddard Space Flight Center Library. This page includes links to six electronic journals and three databases. If the user were to just stumble across this page while searching from home, she would not be able to access these resources even though Georgia Tech subscribes to every one of them. Even if the user happened to be performing her search on campus, where IP address restrictions are not an issue, and she finds herself at Caltech Library System’s Physics page, she will be unable to access many of the resources because the links provided route through Caltech’s proxy server, which she does not have the credentials to access, despite the fact that she has the credentials to access the resource in question.
These issues have existed for years and, in general, have only gotten worse as institutions try to improve access to their resources to off-campus users. Indeed, access to resources for off-campus users has improved dramatically, although this has only increased the dependence on the “front door” model of delivery. This method would most likely have persisted for many more years had Google not introduced Google Scholar, which worked in the exact opposite way from how all of the scholarly databases had in the past.
Instead of being an index of content that the library has permission to access, it is an index of all kinds of content, with no regard to whether or not the searcher’s organization has rights to view what it links to. It works less like the myriad aggregator databases that a library might subscribe to and more like a robust federated search or metasearch engine. Its performance, however, is completely unmatched by any competitor, as it returns results from its own local indices rather than remotely querying the countless databases via Z39.50 and other methods that traditional metasearchers utilize.
Most importantly, it is free. Google provides the library with a robust and versatile metasearch engine that at the very least competes with, and certainly complements, the commercially offered rivals that vendors attempt to sell them. The catch, of course, is that the copy that Google exposes may not be appropriate to the library. In fact, it is certainly possible that no protected content available in Google Scholar is accessible by the library’s users, even though every article is available electronically from the library. This “problem” makes for the perfect opportunity to fix the broken service model that has pervaded libraries since the introduction of the world wide web.
By accepting the fact that good and useful content exists outside the confines of the library and its resources, the library and its librarians will be able to focus on establishing their importance in the daily life of their users. They would be able to make suggestions and recommendations that the user would not have thought of or known the existence of. They could make the user’s library synonymous with information search and retrieval, which is exactly what the library is supposed to be. The library would become an adaptable point-of-service tool available to users outside the traditional library locations.
If the content in Google Scholar is available to the user through other means, the question becomes, “How do we expose the appropriate copy to the user?” Another important question to answer is, “Can we use these methods to localize other remote resources?” In Service Autodiscovery for Rapid Information Movement, Chudnov and Frumkin propose an answer to the former: using simple HTML tags to create OpenURL links that are vendor and institution neutral. Either through web browser extensions or bookmarklets, these HTML tags become links to local resources.
This is an incredibly versatile and simple way to open the metadata of a resource to other avenues of exploration. It does not, however, address how these tags come into existence or what sorts of services could be implemented as a result of this exposure. Service autodiscovery is nonetheless integral to creating maintainable and extensible localization and contextualization services and is, in essence, the missing link between the state of information technology now and what it could be.
It is fairly simple to make Google Scholar localized and contextualized. The data is fairly uniform in scope and layout. The process of “screen scraping” to create OpenURLs is less than ideal but sometimes necessary. If Google, and other entities like it, are unwilling or unable to create OpenURL Autodiscovery tags, it is still possible to parse the text of a given result page and create these tags (or their equivalent). Take Google Scholar as an example: the first line of any result is the title. If the item is a book, the title is preceded by [BOOK]. If the item is a citation (which means no link to fulltext), it will be preceded by [CITATION]. The second line is the author. The last line is the source. Google may wind up tweaking its interface a bit, but most likely this format will remain similar.
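The layout just described can be turned into a small parser. What follows is only a minimal Python sketch, assuming the result has already been reduced to its lines of visible text. The function name `parse_result`, the field names and the sample lines are all hypothetical and made up for illustration; treating an unmarked result as an article is likewise an assumption.

```python
import re

# Markers described in the text: [BOOK] and [CITATION] precede the title.
MARKER = re.compile(r"^\[(BOOK|CITATION)\]\s*")


def parse_result(lines):
    """Split one result (a list of text lines) into kind, title, author, source.

    Assumes the layout described above: the first line is the title, the
    second line is the author, and the last line is the source.
    """
    lines = [line.strip() for line in lines if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected at least title, author and source lines")

    title = lines[0]
    kind = "article"  # assumption: an unmarked result links to full text
    match = MARKER.match(title)
    if match:
        kind = match.group(1).lower()
        title = title[match.end():]

    return {"kind": kind, "title": title, "author": lines[1], "source": lines[-1]}


# Made-up sample result, for illustration only.
sample = ["[BOOK] Chaos: An Introduction", "A. Author", "Example Press, 1999"]
record = parse_result(sample)
```

Because the parser depends only on line position and the two bracketed markers, a change to Google's layout would break it in an obvious way rather than a silent one.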
Knowing even this small amount of information can aid us considerably in making Google Scholar work more effectively for our users. There are some assumptions we can make about the result items that will aid in locating local items. For instance, if the item is a book, Google Scholar will present a link to that book’s OpenWorldcat record. By grabbing the url to the OpenWorldcat link, we can then parse the OpenWorldcat page and grab the ISBN. With that, we can query OCLC Research’s ISBN Concordance (xISBN) web service to get back all ISBNs associated with a particular work and then query the local catalog for holdings on all of the ISBNs. If there is no match, the query could be extended to a consortial union catalog.
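The chain of lookups for a book can be sketched as follows. This is a rough Python illustration under stated assumptions, not a working client: `extract_isbn` and `find_holdings` are hypothetical names, the xISBN service and the catalogs are represented by plain callables rather than real network calls, and the ISBNs and holdings in the usage lines are invented.

```python
import re

# Loose pattern for a 10-character ISBN (last character may be X), with
# optional hyphens or spaces between characters. No check-digit validation.
ISBN10 = re.compile(r"\b(?:\d[\s-]?){9}[\dXx]\b")


def extract_isbn(page_text):
    """Return the first ISBN-like string on a page, without separators, or None."""
    match = ISBN10.search(page_text)
    if match is None:
        return None
    return re.sub(r"[\s-]", "", match.group(0)).upper()


def find_holdings(isbn, related_isbns, catalogs):
    """Look up every edition of a work in each catalog, in order.

    related_isbns: callable standing in for an xISBN-style service; takes
        one ISBN and returns a list of ISBNs for the same work.
    catalogs: ordered list of (name, lookup) pairs, where lookup takes an
        ISBN and returns a list of holdings (empty if none).
    Returns (catalog_name, holdings) for the first catalog with a hit, or
    (None, []) if nothing is held anywhere.
    """
    candidates = [isbn] + [i for i in related_isbns(isbn) if i != isbn]
    for name, lookup in catalogs:
        holdings = []
        for candidate in candidates:
            holdings.extend(lookup(candidate))
        if holdings:
            return name, holdings
    return None, []


# Stand-in data for illustration only; none of it describes a real book.
editions = {"0123456789": ["0123456789", "0987654321"]}
local = {}
consortium = {"0987654321": ["Union catalog copy"]}

isbn = extract_isbn("ISBN: 0-123-45678-9")
result = find_holdings(
    isbn,
    lambda i: editions.get(i, [i]),
    [("local", lambda i: local.get(i, [])),
     ("consortium", lambda i: consortium.get(i, []))],
)
```

Passing the catalogs as an ordered list keeps the local-first, consortium-second preference in one place, so adding a further fallback is a one-line change.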
Links to journal articles would be rewritten to be accessed through the institutional web proxy server. With the citation information that has been obtained by screen scraping, the local link resolver can be queried to see if the item is available via fulltext. The only flaw with this method is that, given the sparse metadata that can be gleaned from reading the HTML text, most link resolvers would be unable to link directly to the article.
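Both steps, rewriting a link through the proxy and querying the link resolver, come down to URL construction. The Python sketch below is illustrative only. The proxy and resolver addresses are hypothetical, the proxy is assumed to accept a percent-encoded target in a `url` parameter, and the key names are modeled on older OpenURL-style keys and should be checked against the local resolver.

```python
from urllib.parse import quote, urlencode

# Hypothetical addresses; substitute the institution's own.
PROXY_PREFIX = "https://proxy.example.edu/login?url="
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def proxied(url):
    """Rewrite a publisher link so it is requested through the proxy."""
    # Assumption: the proxy accepts a percent-encoded target URL.
    return PROXY_PREFIX + quote(url, safe="")


def openurl(citation):
    """Build a resolver query from whatever citation fields were scraped."""
    fields = ("genre", "atitle", "title", "aulast", "date", "issn", "isbn")
    # Skip fields the scrape could not supply rather than sending blanks.
    params = [(key, citation[key]) for key in fields if citation.get(key)]
    return RESOLVER_BASE + "?" + urlencode(params)


link = proxied("http://example.com/a?b=1")
query = openurl({
    "genre": "article",
    "title": "Journal of Theoretical Biology",
    "date": "2004",
})
```

With only a journal title and a year, as in the usage lines, the resolver can at best land on the journal rather than the article, which is the limitation noted above.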
With the localized data, it is now possible to start creating contextual applications, as well. With the ISSN of a given journal in the result set, it is possible to query the local catalog (or, if not held locally, the Library of Congress catalog) and retrieve the LC Subject Heading for that journal. With that, it is possible to find other items in the local catalog and present them as possible alternatives for information. Another approach could be to retrieve subject information by retrieving the title of the database the journal appears in, and, assuming that it is not only available from a large multidisciplinary database, the localizer could retrieve other databases in the subject based on data in a subject guide or a database of databases.
Ideally, the system could also query these databases and return the hit count based on the user’s original query. Canned search links could then be presented, giving the user not only more choices to search, but also helping them determine which resources would be appropriate to search. The “related terms” that are returned in Scirus queries could also be pointed at local assets to discover more appropriate content. Also, subject-based limits could be placed on a search for the query in a local metasearch engine. This would allow a very “lay” resource, such as Google Scholar, to expose and assist more sophisticated resources such as catalogs and metasearch applications.
The localizer could also present a link to the virtual reference system so the researcher can ask for assistance where they already are, and it would give the librarian some context as to where the user is. Bookmarking services (such as Furl, del.icio.us or unalog) could save localized pages or even individual citations. Citation managers (such as CiteULike or NPG’s Connotea) that include OpenURL Autodiscovery tags could become useful citation databases.
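The hit-count idea can be sketched in a few lines of Python. This is an illustration under stated assumptions only: `canned_searches` is a hypothetical name, the database names and URL templates are invented, and `hit_count` is a stand-in callable for whatever mechanism would actually ask each database how many records match.

```python
from urllib.parse import quote_plus


def canned_searches(query, databases, hit_count):
    """Return (name, url, hits) for each database with hits, best first.

    databases: list of (name, url_template) pairs; each template holds a
        "{query}" placeholder.
    hit_count: callable taking (name, query) and returning an int.
    """
    links = []
    for name, template in databases:
        hits = hit_count(name, query)
        if hits > 0:
            url = template.replace("{query}", quote_plus(query))
            links.append((name, url, hits))
    # Databases with the most matches are listed first.
    return sorted(links, key=lambda link: link[2], reverse=True)


# Invented databases and counts, for illustration only.
databases = [
    ("Database A", "https://a.example/search?q={query}"),
    ("Database B", "https://b.example/find?term={query}"),
    ("Database C", "https://c.example/s?q={query}"),
]
counts = {"Database A": 3, "Database B": 12, "Database C": 0}

links = canned_searches("chaos theory", databases, lambda name, q: counts[name])
```

Dropping databases with no hits and ordering the rest by count is one simple way to help the user decide where to search next; other rankings are equally plausible.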
It is important for libraries to continue to find ways to make their collections more easily accessible to their users. When information overload sets in, users will naturally go to the places they feel most comfortable. It is at this point that we need to make useful suggestions to aid them in their research. By examining what the user is looking at and where they are looking at it, we can give helpful advice as to where they could or should expand their search. It is time to show our patrons the multitude of side doors into our collections and services.
- Tennant, Roy (2004). Metasearching: The promise and peril
Retrieved February 14, 2005, from http://escholarship.cdlib.org/rtennant/presentations/2004newyork/metasearch.pdf
- Lippincott, S., & Kyrillidou, M. (2004). How ARL University Communities Access Information: Highlights from LibQUAL+™. ARL Bimonthly Report 236, October 2004
Retrieved February 14, 2005 from http://www.arl.org/newsltr/236/lqaccess.html
- Regazzi, John J. (2004). The Battle for Mindshare: A battle beyond access and retrieval
Retrieved February 15, 2005 from http://www.nfais.org/publications/mc_lecture_2004.htm
- Anderson, N. (?). The Goddard Library – Subject Guides: Chaos
Retrieved February 16, 2005 from http://library.gsfc.nasa.gov/SubjectGuides/Chaos.htm
- Smith, C. (2004). Physics
Retrieved February 16, 2005 from http://library.caltech.edu/collections/physics.htm
- Chudnov, D., & Frumkin, J. (2005). Service Autodiscovery for Rapid Information Movement
Retrieved December 15, 2004 from http://curtis.med.yale.edu/dchud/writings/sa4rim.html
- OCLC. (2004). xISBN [OCLC – Projects]
Retrieved November 25, 2004 from http://www.oclc.org/research/projects/xisbn/default.htm