Local Library

SDSR

A Framework for Auto-Generating Library Catalog Tags

NOTE: This was my final project for a graduate course.

Currently, the average online public access catalog is a static program tied to subject headings and keyword searches. These are a good starting point for finding library materials, but as search has advanced elsewhere (Google being the obvious example), libraries have stayed behind. Some libraries have tried social tagging and word clouds in the catalog, but to my knowledge, nothing has been implemented that is truly responsive to the way patrons search. That’s why I am proposing a system that auto-generates tags for library materials based on the actual natural language searches of patrons. The system would improve the more it is used and, if it were connected with other libraries, could become even more effective in a shorter period of time.

Proposal

Subject headings are difficult to work with because they are slow to adapt to new information or topics. They are also unintuitive and hard to navigate. By using natural language patron searches in item records, we can direct patrons to resources that are tailored specifically to their search. That is not to say that subject headings are not useful. They will be good for building and fleshing out the system and will still exist as tags that stand on their own.

Rather than offer a search formula, the process creates a hierarchy based upon user success. If an item is placed on hold or checked out after the use of a certain search term or phrase, that term or phrase would be tied to the item. The next time someone searched for the same thing, the item would have a higher priority in the results. The more often a term is searched and leads to a hold or checkout, the higher the item moves in the results for that term.
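To make the ranking idea concrete, here is a minimal Python sketch. It assumes the catalog already returns a list of item IDs for a search, and that we keep a simple count of how often each term led to a hold or checkout of each item. The names tag_counts, record_success, and rank_results are made up for illustration and are not part of any existing catalog software.

```python
from collections import defaultdict

# tag_counts[term][item_id] = how many times a search for `term`
# ended in a hold or checkout of `item_id` (hypothetical structure).
tag_counts = defaultdict(lambda: defaultdict(int))


def record_success(term, item_id):
    """Count one successful search: the patron held or checked out the item."""
    tag_counts[term][item_id] += 1


def rank_results(term, baseline_results):
    """Move items with more recorded successes for this exact term to the top.

    Python's sort is stable, so items with the same count keep the order
    the existing catalog search gave them.
    """
    counts = tag_counts.get(term, {})
    return sorted(baseline_results, key=lambda item_id: -counts.get(item_id, 0))


# Example: one patron found the Sibley guide through "tree guide".
results = ["family-tree-maker", "tree-doctor", "sibley-trees"]
record_success("tree guide", "sibley-trees")
print(rank_results("tree guide", results))
```

Sorting by a plain count is only one possible choice. A real system would probably need to weigh the count against the catalog's own relevance score.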

By using patron keyword searches as item tags and attaching those tags to the item(s) that they ultimately check out as a result of that search, we can make search results operate in a more intuitive way. The notion is that if one person thinks to search a term to find something, another patron may search the same way. Also, because the tags are specific, they will only take effect when the exact term that is attached is searched. This will prevent stray words from pulling inaccurate results into the ranking.

Ideally the catalog becomes more robust over time rather than less, as is currently the case with subject headings. The idea is that the more people search, use, and have success with the tag system, the more tags will be applied and the better the results will get. The system will obviously be based on the current structure, as how else would people find the materials to which we attach the tags in the first place? But as the system grows, more and more focus will be on the new folksonomic system. Users could have the option of opting in or out of the keyword system for privacy reasons. It must be noted, however, that the tags will be linked with the items themselves rather than the patron. There will be no record of who searched what.

If libraries could tie their systems together, it could speed up the process of attaching tags to items and create a larger, much more robust system. By connecting systems, if a library tagged an item with a search term, other libraries would also have that item tagged with the term. This would be a longer-term goal for the program, as it would generate more issues and bugs than a purely local implementation.
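As a rough illustration of what sharing tags between libraries might look like, the Python sketch below adds together each library's counts. It assumes that libraries can match the same item through a shared identifier such as an ISBN, which the proposal does not specify. The identifiers and the merge_tag_counts function are placeholders invented for the example.

```python
from collections import Counter


def merge_tag_counts(*libraries):
    """Combine per-library counts, keyed by (term, item identifier), into one total."""
    shared = Counter()
    for counts in libraries:
        shared.update(counts)  # Counter.update adds counts rather than replacing them
    return shared


# Placeholder identifiers, not real ISBNs.
library_a = Counter({("tree guide", "isbn-sibley"): 3})
library_b = Counter({("tree guide", "isbn-sibley"): 2,
                     ("bird guide", "isbn-birds"): 1})

shared = merge_tag_counts(library_a, library_b)
print(shared[("tree guide", "isbn-sibley")])
```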

Methods

I’m currently looking at using the Python programming language to write the program. There are a number of reasons why it shows a lot of potential. Google is a well-known user of the language, so it has credentials to stand on. The language is good with data scraping, or sifting through data to find what someone is looking for, and within the code, Python can create and modify lists. These lists would work well for attaching the necessary tags on the back end.

As it happens, my abilities with Python are currently just a smidge better than zero. Because of this, I figured I'd put the idea out for anyone interested in using or adapting it. I've attempted to put my thoughts in code form below, based on the little knowledge I have.

Process:

User enters a search term
The term pulls up a list of results
If an item is checked out or put on hold from that list of results
Then attach term to item in the form of a tag
Else disregard term
term = input("Search the catalog: ")  # temporarily stored search term
# If the term finds a book in the results AND that book is checked out or put on hold,
# then the term becomes a tag and the tag is attached to the book.
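The steps above can be sketched as a small, self-contained Python program. This is only an illustration under assumptions: the two-item catalog, the search function (a stand-in for whatever search the catalog already has), and the finish_search function are all invented for the example.

```python
# A tiny stand-in catalog: item ID -> title and attached tags.
catalog = {
    "sibley-trees": {"title": "The Sibley Guide to Trees", "tags": set()},
    "tree-doctor": {"title": "The Tree Doctor Guide to Care and Maintenance",
                    "tags": set()},
}


def search(term):
    """Stand-in for the existing keyword search: every word must appear in the title."""
    words = term.lower().split()
    return [item_id for item_id, record in catalog.items()
            if all(word in record["title"].lower() for word in words)]


def finish_search(term, results, acted_on=None):
    """Attach the term as a tag if a result was held or checked out; else disregard it."""
    if acted_on is not None and acted_on in results:
        catalog[acted_on]["tags"].add(term)
        return True
    return False


term = "tree guide"
results = search(term)
# The patron places a hold on the Sibley guide, so the term becomes its tag.
finish_search(term, results, acted_on="sibley-trees")
print(catalog["sibley-trees"]["tags"])
```

If the patron leaves without a hold or checkout, calling finish_search(term, results) attaches nothing, which matches the "Else disregard term" step.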

User End Example: Searching for a ‘Field Guide to Trees’

Patron searches the catalog with the phrase “tree guide”
They scroll through titles like: “The Official Guide to Family Tree Maker” and “The Tree Doctor Guide to Care and Maintenance”
They come to “The Sibley Guide to Trees” on the fourth page of the catalog, result number 48.
They put it on hold or get it off the shelf and check it out.
Either one of these actions will automatically assign the search tag “tree guide” to the book.
The book will get higher priority in the results for the next patron who uses that search term.

Discussion

How do we guard against a kind of confirmation bias, i.e. the same items coming up over and over again, getting stronger in the results, and drowning out other potentially relevant or useful items? This may remain an open question for the time being.

Wouldn’t patrons need to be logged in for us to track which book they checked out using that keyword? So they’d need to be logged in to even search? Not necessarily. An item would be tagged if it showed up in a patron’s search and was checked out or put on hold within a certain amount of time of the search. Librarians would have the ability to generate a list of library tags and modify ones that may be attached in error. This kind of error could occur if materials were accidentally included in the results and also just happened to be checked out within the time frame. Ideally, the more the system is developed, the less likely this situation becomes.
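Here is a minimal Python sketch of the time-window idea, with no patron accounts involved. The 30-minute window is an assumed value, since the right length of time is not settled here, and the names recent_searches, log_search, and terms_to_tag are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # assumed value, chosen only for illustration

# Each entry: (time of search, search term, set of item IDs shown in the results).
recent_searches = []


def log_search(when, term, result_ids):
    """Remember what a search showed, without recording who searched."""
    recent_searches.append((when, term, set(result_ids)))


def terms_to_tag(item_id, checkout_time):
    """Return terms whose results showed the item within WINDOW before the checkout."""
    return sorted({
        term for when, term, shown in recent_searches
        if item_id in shown and timedelta(0) <= checkout_time - when <= WINDOW
    })


noon = datetime(2024, 1, 1, 12, 0)
log_search(noon, "tree guide", ["sibley-trees", "tree-doctor"])
log_search(noon - timedelta(hours=3), "leaf identification", ["sibley-trees"])

# A checkout ten minutes after noon only picks up the recent search.
print(terms_to_tag("sibley-trees", noon + timedelta(minutes=10)))
```

Because nothing ties a search to a person, any search inside the window counts, including one made by a different patron. That is exactly the kind of mistaken tag that staff would need to be able to correct.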

How do we avoid a “Google Bomb” possibility? Because library staff will be able to generate a list of tags and associations that they can modify or delete, the process will have human oversight to weed out tampering. If the searches are tied to patron accounts or traced backwards through holds or checkouts, it may be possible for library staff to take action.
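The staff oversight described above could be as simple as the following Python sketch. The item_tags data, including the obviously bogus tag, is invented for the example, and list_tags and remove_tag are hypothetical names rather than features of any real catalog.

```python
# Invented sample data: item ID -> tags attached so far.
item_tags = {
    "sibley-trees": {"tree guide", "cheap car insurance"},
    "tree-doctor": {"tree care"},
}


def list_tags():
    """Return every (item, tag) pair in sorted order for staff to review."""
    return sorted((item_id, tag)
                  for item_id, tags in item_tags.items()
                  for tag in tags)


def remove_tag(item_id, tag):
    """Delete a tag attached in error; does nothing if it is already gone."""
    item_tags.get(item_id, set()).discard(tag)


# Staff spot a tag that has nothing to do with the book and remove it.
remove_tag("sibley-trees", "cheap car insurance")
for item_id, tag in list_tags():
    print(item_id, "->", tag)
```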

This kind of option, however, raises certain privacy questions. Does this kind of system overstep the boundaries of patron confidentiality? Or is it no different from the way we do things now, since only library staff would have access to the information? My take is that it would be the same as now and would not give library staff much more information about a patron than they already have. The only thing that would change is that they could more easily see a patron’s search terms within the catalog, and as long as the library staff is ethical, there would be no real problems.

Conclusion

Going forward I would like to work on creating a prototype, or at least a more detailed method for bringing it to life. I would be interested in taking this prototype, this framework, and my questions to other librarians, programmers, and thinkers to get opinions on the viability of this model and to bring up things I haven’t thought about. With something like this, I believe that to make it as good as I can, it needs to be seen and discussed from different angles. The best path may even be to make it an open source project that anyone could work on and improve. I’m excited to pursue this system and get a better understanding of how technology and libraries intersect in the process.