
Searching the Spanish Golden Age with Keyword Spotting

READ-COOP

In sixteenth- and seventeenth-century Spain, theatre flourished, with thousands of plays being produced. This period has become known as the Spanish Golden Age. Thanks to a new prototype web tool, anyone can now search through 40,000 images from a significant digitised collection of manuscripts relating to this period of Spanish history. The tool uses cutting-edge Keyword Spotting technology, which allows users to search images that have never been transcribed.

The tool is a collaboration between the Pattern Recognition and Human Language Technology (PRHLT) research centre at the Universitat Politecnica de Valencia (one of the READ partners), the National Library of Spain and the PROLOPE research group (both READ MOU partners).

The PRHLT research centre has processed these manuscripts with advanced text recognition and probabilistic word indexing technology. This form of searching is often called Keyword Spotting. It is more powerful than conventional full-text search. Instead of indexing a single transcription of each page, it uses the statistical models trained for text recognition to assign probability values to character sequences (words), taking into account many possible readings of each word on a page rather than only the most likely one. A search then looks up these probabilities, so a word can be found even when it was not the top reading.
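The difference from ordinary full-text search can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the page names, words and probabilities are invented, and the flat dictionary stands in for a probabilistic index. It is not the PRHLT tool or its data format. The functions `keyword_spot` and `full_text_search` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical toy probabilistic index: for each page, every candidate reading
# is paired with the (invented) probability that this word appears on the page.
PROB_INDEX = {
    "page_001": {"madrid": 0.92, "madris": 0.05, "comedia": 0.81},
    "page_002": {"madrid": 0.38, "madrigal": 0.55, "jornada": 0.90},
    "page_003": {"comedia": 0.97, "madrid": 0.02},
}

# The single most likely transcription of each page: this is all that a
# conventional full-text search would have to work with.
BEST_TRANSCRIPT = {
    "page_001": "madrid comedia",
    "page_002": "madrigal jornada",
    "page_003": "comedia",
}


def keyword_spot(word: str, threshold: float = 0.3) -> list[tuple[str, float]]:
    """Return (page, probability) pairs where P(word on page) >= threshold,
    sorted so that the most likely pages come first."""
    word = word.lower()
    hits = [
        (page, readings[word])
        for page, readings in PROB_INDEX.items()
        if readings.get(word, 0.0) >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def full_text_search(word: str) -> list[str]:
    """Return pages whose single best transcription contains the word."""
    word = word.lower()
    return [page for page, text in BEST_TRANSCRIPT.items() if word in text.split()]


if __name__ == "__main__":
    print(keyword_spot("Madrid"))      # probabilistic search
    print(full_text_search("Madrid"))  # search over the 1-best transcription
```

In this toy data, `keyword_spot("Madrid")` returns `page_001` and `page_002`, while `full_text_search("Madrid")` finds only `page_001`, because the top reading on `page_002` was "madrigal". The `threshold` parameter lets a user trade precision for recall. Lowering it surfaces less certain matches, and raising it keeps only confident ones.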

Keyword Spotting for the word ‘Madrid’.

The 40,000 pages currently available for searching represent about half of the collection. More documents from the collection will be processed in the same way if further funding can be found.

The release of this Keyword Spotting tool coincides with a new exhibition about the Spanish Golden Age at the National Library of Spain, which runs until March 2019. The exhibition combines original manuscripts with digital displays. The PRHLT team has also created an online quiz (in Spanish) for the exhibition, which asks users to work with the Keyword Spotting tool to find out which words appear frequently or in combination.

If you are interested in Keyword Spotting, check out other tools built by the PRHLT team relating to:
