
How FromThePage enhanced their transcription platform with the Transkribus API

Fiona Park

FromThePage has long championed the idea that public participation can profoundly enrich historical research. As one of the leading platforms for crowdsourced transcription, it is used by heritage institutions around the world to turn historical documents into accessible digital resources with the help of volunteer transcribers. This approach not only provides valuable research material but also fosters a deeper connection with history for everyone involved. 

While FromThePage was originally conceived for manual transcription, the team behind it recently combined forces with Transkribus to create an “AI Assist” feature. Powered by Transkribus’ metagrapho API, this new feature generates an automatic transcription of the document, helping volunteers to decipher hard-to-read words and learn new scripts.

We spoke to Sara Brumfield, co-founder of FromThePage, about collaborating with Transkribus and about how the “AI Assist” feature has benefited their community.

FromThePage allows heritage institutions to transcribe their documents using the power of the crowd. © FromThePage

 

A new type of crowdsourcing platform

While FromThePage is now used by some of the largest cultural heritage institutions in the US and UK, it actually started out back in 2005 as a family history project. Sara’s business partner and husband, Ben, had inherited a collection of diaries from his great-great-grandmother and, inspired by the crowdsourcing approach of Wikipedia, he wanted to find a way to let volunteers transcribe them online. 

Sara and Ben's computer science degrees and extensive IT careers provided the ideal foundation for their project. “As a software engineer, I have worked for IBM and a number of startups for over two decades,” Sara explained. “I’m an expert at working with IT folks specifically to solve problems with software.” This problem-solving mindset resulted in FromThePage, a platform that streamlines document digitisation by connecting institutions with volunteer transcribers.

Fast forward 20 years, and the couple’s hobby project has now become their business, with a client list including Harvard, Stanford, and the British Library. But while their clients might be large organisations, Sara is proud that FromThePage has maintained its family roots. “I grew up in a small business where we had relationships with our customers and knew our success rested on our reputation. I love bringing the same meaning to my family that my parents brought to mine.”

FromThePage offers a range of features for processing historical documents. © FromThePage



The growing popularity of AI

The decision to incorporate AI into FromThePage stemmed from user feedback. "[Our] customers were telling us [...] that their colleagues and patrons were saying 'Can’t you just use AI for transcription?'" Sara explained. The couple recognised the increasing sophistication of automatic text recognition technology. However, they also wanted FromThePage to be a platform that encouraged human engagement in historical documents. Was there a way to combine the two?

That is where Transkribus came in. The AI-powered transcription platform contained the exact technology that Sara and Ben were looking to use, making it the ideal collaboration partner. “Transkribus is a trusted service in the archives world,” Sara said. “Because [Transkribus] models are trained on archival material, rather than commercial material like checks or medical charts, we know – and our partners know – that we’ll get high-quality, content-appropriate [transcriptions].”

The Transkribus technology was incorporated into FromThePage using Transkribus’ metagrapho API. This acts as a messenger service: document images are sent from FromThePage to Transkribus, Transkribus automatically transcribes them, and the API then sends the transcriptions back to FromThePage. Using an API meant the two systems could be integrated without extensive, costly development work.
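
For readers who like to see the idea in code, the round trip described above can be pictured with a minimal Python sketch. It is not the metagrapho API itself: the FakeRecognitionService class, its submit, status and result methods, the job states and the model name are all hypothetical stand-ins that run in memory, and the real endpoints, authentication and payload formats should be taken from the Transkribus documentation.

```python
import base64
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeRecognitionService:
    """Hypothetical in-memory stand-in for a remote text recognition service."""

    _jobs: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def submit(self, image_b64: str, model: str) -> int:
        """Accept a base64-encoded image and return a job id."""
        job_id = next(self._ids)
        size = len(base64.b64decode(image_b64))
        self._jobs[job_id] = {
            "polls": 0,
            "text": f"[{model} transcription of {size}-byte image]",
        }
        return job_id

    def status(self, job_id: int) -> str:
        """Pretend the job needs two status checks before it is finished."""
        job = self._jobs[job_id]
        job["polls"] += 1
        return "FINISHED" if job["polls"] >= 2 else "RUNNING"

    def result(self, job_id: int) -> str:
        """Return the transcription text for a finished job."""
        return self._jobs[job_id]["text"]


def transcribe_page(service, image_bytes: bytes, model: str, max_polls: int = 10) -> str:
    """Send one page image, wait for the job, and return the transcription."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    job_id = service.submit(image_b64, model)
    # A real client would pause between checks; the stub needs no waiting.
    for _ in range(max_polls):
        if service.status(job_id) == "FINISHED":
            return service.result(job_id)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} checks")


if __name__ == "__main__":
    service = FakeRecognitionService()
    print(transcribe_page(service, b"scanned page bytes", "demo-model"))
```

The sketch assumes the common submit, check and fetch pattern, which lets the sending platform hand off a page and collect the text later rather than holding a connection open while recognition runs.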

“Any integration between systems is challenging – there’s the one you know well, and then this black box at the other end. The documentation and support we received from Transkribus was great, though, and it didn’t take long to get an integration working,” Sara said.

“AI Assist” displays the automatic transcription over the original document image. © UNC Cameron Papers via FromThePage

 

Introducing “AI Assist”

The end product of this integration was FromThePage’s “AI Assist” feature. “AI Assist” uses Transkribus’ technology to generate an automatic transcription, which is then overlaid onto the document image. Users can choose whether to see the transcription on the image or not, allowing the platform to cater to different users. "Our experienced genealogists often find [the transcription overlay] annoying because they have a lot of experience with old hands, so they only use it when they need a 'second opinion' on a hard to read word," said Sara. "However, undergraduates who are not fluent in cursive really like having the machine-generated text as a true AI assist."

FromThePage has also experimented with an "AI Draft" option, providing a preliminary transcription for users to refine. However, this approach presented a unique challenge: maintaining user trust in the original document. "We all tend to trust computers – more than we should! And if you don’t trust your ability to read old hands, it’s easy to trust machine-generated text more than you trust yourself to do the hard work of deciphering a hard-to-read word," Sara pointed out.

A shopkeeper's ledger from the History Revealed project. © John Glassford & Co via History Revealed

 

Real-world applications

“AI Assist” has already been used to enhance transcriptions in several of FromThePage’s larger projects. At the University of North Carolina at Chapel Hill, for example, the Cameron Family Papers - Records of Enslavement project used both the “AI Assist” and “AI Draft” features.

"UNC was also gracious enough to provide a lot of feedback on integrating HTR into the FromThePage user interfaces," Sara told us. The volunteer transcribers, many of whom were experienced genealogists, offered valuable insights. "Their feedback was interestingly mixed! From: 'I prefer to transcribe manually. It is more challenging to me in that way' to 'While using AI, I didn't feel as lost when encountering a difficult word and I felt more relaxed about not deciphering a word or words.'"

Another project, led by Molly Kerr at History Revealed, focused on a shopkeeper's letterbook and aimed to support student education and public engagement. "Because many of Molly's volunteers are students, she was interested in applying HTR and using FromThePage's 'AI Assist' feature to give the transcribers the step up they needed when working with cursive from the 1700s," Sara explained. The project also handled multilingual documents by using the Text Titan model for the French and German letters and the English Eagle model for the rest.

Every couple of years, the Transkribus User Conference brings together READ-COOP members and Transkribus users from around the world. © Transkribus

 

The Transkribus partnership

Aside from the technical benefits, the fact that Transkribus shares FromThePage's values was also an important factor for Sara and Ben. Both organisations are small to medium in size, and both create tech solutions specifically for the library and archival sector. "We believe in – and frankly really enjoy – participating in communities of smart digital library folks doing interesting things," Sara said. "We also like working with another small business, rather than one of the tech behemoths to whom we are just an API transaction."

It was this alignment of values that led FromThePage to become a member of READ-COOP, the cooperative behind Transkribus. READ-COOP has over 200 members, ranging from major universities such as Cambridge and Vienna, which use Transkribus every day for research, to private members who simply want to show their support for the platform. In contrast to most other tech companies, it is READ-COOP’s members, not its management board, who shape the future of Transkribus, and this was an opportunity that Sara didn’t want to miss.

"We’re fascinated by the coop model applied to software services. [...] READ-COOP is a grand experiment, and we wanted to be part of it."

 

Thank you for talking to us, Sara!

 

You can find out more about becoming a member of READ-COOP on our cooperative website.

Thumbnail image © Sara Brumfield
