Technical Overview of Searching in Quicksearch and Books+
Quicksearch unites several Yale search services under one discovery interface. A search from the main Quicksearch page (search.library.yale.edu) returns results organized in two “bento boxes”: Books+, which includes resources from Orbis and Morris, and Articles+, which includes licensed content from Summon. Selecting Books+ enables more focused searching and browsing within the library catalogs. For example, the user may limit a keyword search to a specific field such as “Title,” and narrow the results by adding or removing terms within facets such as “Format,” “Location,” and “Language.” The user may also browse the facets without entering a keyword search, for example, to display all records for resources in the format “Journals & Newspapers” and in the Thai language.
Quicksearch is Yale’s implementation of the open-source Blacklight application. Searching Books+ in Quicksearch is powered by the underlying Solr index, which structures data derived from MARC records for efficient querying and retrieval. Updated unsuppressed bibliographic and holdings records are extracted daily from Orbis and Morris. The extract process transfers selected information from the MFHDs, such as location and call number, into local fields in the associated bibliographic records, which are then staged for ingest into Solr. The ingest process analyzes these modified MARC records and transfers their data to one or more indexes in Solr, while also performing transformations such as removing non-filing characters from titles going into the Title sort index, or translating language codes into names.
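The two transformations named above can be illustrated with a minimal Python sketch. It is not the actual ingest code. It assumes that the non-filing character count comes from the second indicator of the MARC 245 field, and it uses a small hypothetical excerpt of the MARC language code list; the function names are invented for illustration.

```python
# Hypothetical excerpt of the MARC language code list; a real ingest
# would load the complete table.
LANGUAGE_NAMES = {"eng": "English", "fre": "French", "tha": "Thai"}


def title_sort_key(title: str, nonfiling_count: int) -> str:
    """Drop the leading non-filing characters and lowercase the rest.

    nonfiling_count is assumed to come from the second indicator of
    the MARC 245 field (for example, 4 for "The ").
    """
    return title[nonfiling_count:].strip().lower()


def language_name(code: str) -> str:
    """Translate a three-letter language code into a display name.

    Unknown codes are returned unchanged rather than dropped.
    """
    return LANGUAGE_NAMES.get(code.strip().lower(), code)


print(title_sort_key("The Great Gatsby", 4))  # great gatsby
print(language_name("tha"))                   # Thai
```

With this approach the display index can keep the title exactly as cataloged, while the Title sort index files the record under “great” rather than “the.”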
Four main types of indexes govern the discovery interface: search indexes, which are queried when a user performs a general keyword or field-specific search; facet indexes, which populate the facet browse; sort indexes, which are used to sort results; and display indexes, which contain the text displayed to the user. Each index may be configured separately to optimize it for the particular function it supports.
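As a rough illustration of why each index type is configured separately, the sketch below routes a single author heading into four differently processed values. The field names and the specific normalization rules are assumptions made for this example, not the Quicksearch configuration.

```python
import re


def index_author(raw: str) -> dict:
    """Derive one value per index type from a single author heading.

    All field names are hypothetical.
    """
    tokens = re.findall(r"\w+", raw.lower())
    return {
        # Search index: lowercased tokens, so word order and
        # punctuation do not affect matching.
        "author_search": tokens,
        # Facet index: the heading as one string, with trailing
        # punctuation trimmed so variants group together.
        "author_facet": raw.rstrip(" .,"),
        # Sort index: a normalized string used only for ordering.
        "author_sort": " ".join(tokens),
        # Display index: the text exactly as it came from the record.
        "author_display": raw,
    }


doc = index_author("Twain, Mark, 1835-1910.")
print(doc["author_search"])  # ['twain', 'mark', '1835', '1910']
print(doc["author_facet"])   # Twain, Mark, 1835-1910
```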
When a user enters one or more search terms, Solr queries the search indexes associated with the relevant search type (for example, keyword, Title, or Subject). Solr uses a complex and customizable set of algorithms to select search results and rank them by relevancy. Search results are “boosted” in the rankings depending on the index in which they appear. For example, in a general keyword search, results rank higher for search terms in any of the indexes associated with title, subject, or author. Search boosts may be assigned to individual indexes. For example, searching by author queries two indexes, one derived from the main author entry and one derived from any added author entries. Matches in the main entry are boosted higher than added entry matches. Additionally, results are boosted higher if two or more of the search terms appear in close proximity within the record or within an entry in the index.
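One common way to express this kind of boosting in Solr is through the Extended DisMax query parser, where “qf” lists the fields to query with per-field boosts and “pf” adds a further boost when the search terms occur close together. The sketch below builds such a parameter set for an author search. Whether Quicksearch uses these exact parameters is not stated here; the field names and boost values are hypothetical.

```python
from urllib.parse import urlencode


def author_search_params(user_query: str) -> dict:
    """Build Solr edismax parameters for an author search.

    Field names and boost values are illustrative assumptions.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # Main-entry matches count for more than added-entry matches.
        "qf": "author_main_t^20 author_added_t^5",
        # Extra boost when the terms appear near each other in a field.
        "pf": "author_main_t^40 author_added_t^10",
        # Slop: how far apart the terms may be and still earn the pf boost.
        "ps": "3",
    }


params = author_search_params("mark twain")
print(urlencode(params))
```

The resulting query string could be appended to a Solr select URL. Tuning relevancy then becomes a matter of adjusting the numbers after each “^” rather than changing the indexed data.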
All searches in Quicksearch are essentially keyword searches, but enhancement syntax may be configured internally or invoked by users to refine search handling in specific ways. Quicksearch does not use a static Boolean connector for search terms, but rather dynamically calculates a minimum number of terms that a result must match, based on the number of search terms entered. Users may force a term to be included in or excluded from the results by prefixing it with “+” (require) or “-” (omit). Quicksearch also discounts stop words, and performs some stemming to support fuzzy matching (for example, searching for “beginning boating” would also return results with “beginners,” “boat,” “boats,” etc.). Using quotation marks around a phrase causes the phrase to be handled as a single search term.
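The query handling described above can be approximated with a short Python sketch. It is a simplified model, not the Solr implementation: the stop word list and the minimum-match rule are hypothetical, the minimum is applied only to unprefixed terms (an assumption), and stemming is left out.

```python
import re

# Hypothetical stop word list; the real list is set in the Solr configuration.
STOP_WORDS = {"a", "an", "and", "of", "the"}

# An optional +/- prefix followed by a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query: str) -> dict:
    """Sort the terms of a query into required, excluded, and optional."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for prefix, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        # Stop words are discounted unless they are part of a quoted phrase.
        if not phrase and term in STOP_WORDS:
            continue
        if prefix == "+":
            parsed["required"].append(term)
        elif prefix == "-":
            parsed["excluded"].append(term)
        else:
            parsed["optional"].append(term)
    return parsed


def minimum_should_match(optional_count: int) -> int:
    """Hypothetical rule: every term for short queries, three quarters
    (rounded down) for longer ones."""
    if optional_count <= 3:
        return optional_count
    return (optional_count * 3) // 4


parsed = parse_query('+boating -"motor boats" the beginning guide')
print(parsed["required"])  # ['boating']
print(parsed["excluded"])  # ['motor boats']
print(parsed["optional"])  # ['beginning', 'guide']
print(minimum_should_match(len(parsed["optional"])))  # 2
```

Under this rule a five-term query would need to match only three of its terms, which is the sense in which the matching threshold is calculated dynamically rather than fixed as AND or OR.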
Refining how Quicksearch handles user searches is ongoing work that will draw upon both user feedback and data from web analytics. Public-facing documentation of user-customizable search syntax is also underway.