
How the 'Material Culture of Wills' project transcribed 25,000 wills with Transkribus

Fiona Park

What did people in early modern England think about their possessions? Which objects held the most significance in their lives? It is these questions and more that the Material Culture of Wills: England, 1540-1790 research project is hoping to answer. 

A collaboration between the University of Exeter and The National Archives, this ambitious project is transcribing and analysing 25,000 wills from the period to find out how relationships with material objects evolved in the 250 years leading up to the Industrial Revolution. By harnessing the power of Transkribus, the research team, led by Professor Jane Whittle, is making a vast collection of historical documents accessible for large-scale research—something that would have been unfeasible with traditional manual transcription.

We spoke to Dr Emily Vine, a Research Fellow on the project, and Mark Bell, Senior Digital Researcher at The National Archives, to find out how the team is using digital tools to transform how historians approach early modern wills, and to unlock new insights into material culture in the process.

 

The original copies of the wills are all held at The National Archives. © Emily Vine

 

The power of wills in understanding early modern lives

Wills are one of the most revealing sources of personal history from the early modern period. Unlike other legal or financial records, they capture the thoughts and priorities of individuals at the moment of their death by providing detailed descriptions of possessions and their intended recipients. 

As the team explains, “It is the element of choice in selecting and describing objects that makes wills so revealing of people’s attitudes towards the things they owned.” By working at scale with 25,000 wills, the project aims to uncover long-term trends in how people valued and interacted with their material world.

The wills in question are registered copies that were proved before the Prerogative Court of Canterbury, and are part of a collection held by The National Archives. “Each will was copied out by a Church Court clerk, in fairly uniform handwriting, and following a fairly regular layout,” Emily explained. “The large number of documents, and their relatively consistent handwriting and page layout, make them ideally suited to using Transkribus.”

Many of the wills have a clear and consistent structure, making them ideal for use with text recognition software. © Emily Vine

 

A digital solution for a large-scale challenge

The idea to use Transkribus emerged after project lead Professor Jane Whittle attended a research seminar on the Bentham Project, which had successfully used the platform for a similar large-scale transcription project. “Jane had always wanted to conduct a large-scale study of material culture within early modern wills, but knew that the time taken to transcribe the necessary volume of wills would make this unfeasible,” Emily said. 

However, with Transkribus, the transcription process could be automated, saving the project team valuable resources and time. As Mark explains: “[Creating individual HTR systems] is computationally intensive and requires specialist infrastructure, the creation and maintenance of which would divert time and energy away from the main work of the project. [But Transkribus makes] the training and use of models incredibly easy, and [its] APIs make automation possible, which is essential when working at scale.”

 

The project is analysing wills across a 250-year period. © National Archives

 

Creating the first model

The starting point for the automatic transcription was a Ground Truth dataset of 400 wills that were manually transcribed by volunteers. “We were working with project volunteers who had years of experience transcribing wills, but usually worked by transcribing directly into a Word document,” Emily explained. This project was the first time they had transcribed directly in Transkribus, but, thankfully, they adapted relatively quickly. “I think many of [the volunteers] liked the fact that Transkribus highlights one line at a time and matches up the corresponding line in the manuscript [...], as it can be easy to lose your place or skip a line when moving between a Word document and a photograph of a manuscript.”

Once the Ground Truth had been prepared, Mark set about training the first model and was pleasantly surprised by the results. “I actually started working with this collection in the early days of Transkribus, and the images were just too low quality to get any results,” he explained. “[However this time,] our first 18th-century model needed just 30 pages to get readable transcriptions.” 

This first model had a barely passable character error rate (CER) of 11%, meaning that roughly one character in nine needed correcting. To bring the CER down, more Ground Truth data would be needed. So, the team came up with an innovative idea to generate accurate data using the power of the crowd.
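
For readers unfamiliar with the metric, CER is commonly defined as the number of single-character edits (insertions, deletions and substitutions) needed to turn the model's output into the correct transcription, divided by the length of the correct transcription. The Python sketch below illustrates that general definition only. It is not Transkribus's own implementation, and the function names and the sample line are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of a hypothesis against a reference transcription."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Illustrative line only, not taken from the project's data.
reference = "my best featherbed"
hypothesis = "my best fetherbed"
print(f"CER: {cer(reference, hypothesis):.1%}")
```

Here one dropped character in an 18-character line gives a CER of about 5.6%. At 11%, a line of that length would contain around two such errors on average.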

 

Transkribus was able to recognise text even from poor-quality scans. © National Archives

 

Engaging volunteers through citizen science

Citizen science is a type of scientific research in which the data collection is done not by professional scientists but by members of the public. In large-scale transcription projects, this usually means using volunteers to transcribe documents or proofread the transcriptions of others. The Material Culture of Wills did the latter, asking volunteers to proofread the transcriptions generated by Transkribus.

To achieve this, they launched a crowdsourcing initiative on Zooniverse, an online platform that enables volunteers from around the world to contribute to academic research. The process begins with Transkribus generating ‘rough’ transcriptions using the best available model. These are then uploaded to Zooniverse, where volunteers check individual lines of text against manuscript images and make corrections where necessary. Once each line has been checked three times, the edited versions are consolidated to create a final transcription and additional Ground Truth for the model.
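
The article does not describe how the three checks are consolidated. One simple possibility, sketched below in Python as an assumption rather than the project's actual method, is a majority vote: keep the version of a line that at least two of the three volunteers agree on, and flag the line for another look otherwise. The function name and the `min_agreement` parameter are hypothetical.

```python
from collections import Counter
from typing import Optional


def consolidate(checks: list[str], min_agreement: int = 2) -> Optional[str]:
    """Return the version of a line that enough volunteers agree on, else None.

    Assumption: versions are compared after collapsing runs of whitespace,
    so a stray double space does not count as a disagreement.
    """
    normalised = [" ".join(check.split()) for check in checks]
    if not normalised:
        return None
    text, votes = Counter(normalised).most_common(1)[0]
    return text if votes >= min_agreement else None


# Illustrative volunteer responses for one line, not real project data.
checks = [
    "I give to my daughter my best bed",
    "I give to my daughter  my best bed",
    "I giue to my daughter my best bed",
]
print(consolidate(checks))
```

Lines that return `None` have no agreed reading and could be sent back for further checks instead of being added to the Ground Truth.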

“The important part of our process is that we don't upload the whole transcription [to Zooniverse],” Mark explained. “We sample lines based on [quality] so most of the lines sent to Zooniverse are in need of correction. This makes the most of the volunteers' time as they aren't just checking lots of correct transcription.”
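
Mark does not spell out the exact quality measure the team uses. As a rough illustration, the Python sketch below assumes each transcribed line carries a hypothetical confidence score between 0 and 1 and sends only the least confident lines for checking. The `Line` class, the `confidence` field, the threshold and the batch limit are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    line_id: str
    text: str
    confidence: float  # hypothetical quality score in [0, 1]; higher is better


def select_for_review(
    lines: list[Line],
    threshold: float = 0.9,
    limit: Optional[int] = None,
) -> list[Line]:
    """Pick lines scoring below the threshold, least confident first."""
    candidates = sorted(
        (line for line in lines if line.confidence < threshold),
        key=lambda line: line.confidence,
    )
    return candidates if limit is None else candidates[:limit]


# Illustrative lines only, not real project data.
lines = [
    Line("p1-l1", "In the name of God Amen", 0.98),
    Line("p1-l2", "I giue and bequeth vnto", 0.71),
    Line("p1-l3", "my sonne my siluer spoones", 0.84),
]
for line in select_for_review(lines, threshold=0.9, limit=2):
    print(line.line_id, line.confidence)
```

Sorting by the score means that, if volunteer time runs short, the lines most likely to contain errors are the ones that get checked.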

The volunteer engagement has been remarkable, with over 2,500 people already contributing more than 7,800 hours of work to the project. “We think that we have around 2 million lines of text to check in total, and to date, our Zooniverse volunteers have completed 353,402 line checks,” said Emily. Thanks to the Ground Truth generated by the volunteers, the team has been able to retrain several versions of their first model, making it more accurate with each training cycle. The latest versions of the 17th- and 18th-century models returned a CER of under 4%, a significant improvement on the 11% of the first model.

 

The Zooniverse site allows volunteers to proofread transcriptions in a fun and engaging way. © National Archives via Zooniverse

 

Creative collaborations: Bringing wills to life

Beyond transcription and analysis, the project is also working with creatives to explore innovative ways of interpreting the wills. Dr Laura Sangha, a co-investigator on the project, successfully applied for an Exeter Arts and Culture Creative Fellowship, which has led to a collaboration with composer and lyricist Chris Hoban.

The fellowship, titled Wills as Windows onto Past Lives, explores how historical and creative approaches can intersect to bring early modern wills to life. Chris recently joined the team at The National Archives to examine original manuscripts and begin developing artistic responses to the material. “The team have already started jointly exploring some of the stories and material culture encapsulated in these fascinating documents,” Emily told us. 

“Using Transkribus has also led the project in other fun and collaborative directions,” she went on to explain. “Two of our Transkribus volunteers have written posts for our project blog, and a few members of our Zooniverse community have also discovered shared intellectual interests via our talkboards, and are in the process of collaboratively writing a blog post.” These activities highlight the diverse potential of digital transcription projects—not only as research tools but as sources of inspiration for creative and intellectual engagement.

 

Chris visited The National Archives to examine the wills in person. © Laura Sangha

 

Lessons learned and advice for future projects

As with any large-scale digital humanities project, the team has faced challenges. One key issue was the poor quality of the microfilm scans used in transcription, which made it difficult for layout models to process marginalia and decorative elements. “We have eventually managed to train a decent model for layouts,” Mark explained. “But it was a painstaking process to create the Ground Truth. It still struggles a bit with marginalia, [however,] I plan to try training a Field Model to overcome this.”

Another challenge was deciding how to integrate crowd-checked lines back into the training data. As The National Archives is a member of READ-COOP, the cooperative behind Transkribus, Mark was able to seek advice from the Members’ Slack channel about possible solutions. “The [Transkribus] team suggested just creating an XML file containing those lines,” he said. “This has worked really well and means we can train our models and [produce] higher-quality transcriptions.”

For other researchers considering a similar project, Emily offers some key advice:

  • Assess whether automation is the right fit. “Figure out if the scale of the project lends itself to working with Transkribus. In our case, training an accurate model is much quicker and easier than transcribing 25,000 wills by hand.”
  • Technical expertise is key. “Ensure you have people with technical expertise working as part of the research team, or look at developing working partnerships with those who have that expertise.”



Creating digital versions of these historical documents requires technical expertise. © Laura Sangha

 

The future of historical research

Ultimately, the project demonstrates how text recognition technology and crowdsourcing can unlock new possibilities in historical research. “We’ve been delighted at the transformative potential of Transkribus,” Emily reflects. “A project of our scale wouldn’t be possible without it, and this is why it hasn’t been attempted until now.”

By combining digital innovation, volunteer engagement, and interdisciplinary collaboration, The Material Culture of Wills is shedding new light on the lived experiences of people in early modern England—one will at a time.

Thank you to Emily, Mark, and their colleagues for taking the time to talk to us—we wish you all the best for the project.

 

You can contribute to the Material Culture of Wills project by proofreading transcriptions on the project’s Zooniverse site and find out the latest news from the project team on their blog.
