
Need to transcribe handwritten Hebrew or Yiddish documents? These AI models could help.

Fiona Park

Across the world, archives and libraries preserve large collections of Hebrew and Yiddish documents. For many researchers, these materials—covering everything from communal ledgers to personal diaries—are a goldmine of information and key to understanding Jewish history.

However, many of these documents are hard to read, and their sheer volume makes it difficult to analyse collections on a large scale. Researchers and institutions are increasingly turning to technology such as Transkribus to make this work easier. Transkribus uses AI-powered text recognition to read and transcribe historical documents, turning static images into digital text that is fully searchable and editable.

Thanks to the generosity and contributions of the Transkribus community, there are already several public models available for both handwritten and printed Hebrew and Yiddish. In this post, we'll look at three such models and show how you can use them to turn your documents into digital text with Transkribus.

 

A pinkas contains information about a community, such as important events and decisions. © Emanuel Elyasaf via Transkribus

 

1) Transcribing a 19th-century pinkas

A "pinkas" is a community's register or ledger, a key document for understanding its history. These books, often kept for centuries, track a community's day-to-day business, finances, and decisions, but they are almost always handwritten. The "Pinkas Brody Model" (ID 59324) was created for exactly this kind of material. It was trained on over 159,000 words of 19th-century handwritten documents from the pinkas of the Brody community. The model handles both Hebrew and Yiddish cursive scripts from this period and achieves a Character Error Rate (CER) of 4.4% on its test documents, which makes it a strong starting point for any archive working with 19th-century Jewish communal records.
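For readers unfamiliar with the metric, CER is commonly defined as the number of single-character edits (insertions, deletions, substitutions) needed to turn the model's output into the correct text, divided by the length of the correct text. The Python sketch below illustrates that common definition only. The function names and the sample strings are made up for this example, and Transkribus may normalise text or treat whitespace differently when it reports its figures.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a character from the reference
                current[j - 1] + 1,      # add a character from the hypothesis
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the corrected text has 11 characters,
# and the model output is missing one letter.
reference = "פנקס הקהילה"
hypothesis = "פנקס הקהלה"
print(f"CER: {character_error_rate(reference, hypothesis):.1%}")
# prints CER: 9.1%
```

Read this way, a CER of 4.4% corresponds to roughly 44 characters needing correction in every 1,000, on material similar to the model's test documents.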

Try the model with your documents →

 

Creating digital versions of these cemetery records makes it easier to find particular names or locations. © IGRA via Transkribus

 

2) Decoding handwritten cemetery names

For genealogists and family historians, cemetery records are a key resource. But transcribing the names and dates from burial registers can be challenging because the handwriting is sometimes almost illegible. The "IGRA Sfardi Burial Hebrew" (ID 293793) model was created to tackle this problem. It was trained by the Israel Genealogy Research Association (IGRA) on a collection of Sephardic burial records written in Hebrew. Although the training set is small, at just under 5,000 words, the model is specialised for the difficult task of reading names in a ledger. With a test CER of 4.36%, it gives genealogists and institutions a tool for digitising and indexing cemetery data, making these records searchable for the first time.

Try the model with your documents →

 

The HaMeorer journal was just one of the publications that helped to train the DiJeSt model. From HaMeorer, May 1907, page 6, via hameorer.net

 

3) Digitising printed Hebrew and Yiddish

While handwriting can be a challenge, old printed texts in Hebrew and Yiddish have their own problems, such as unusual typefaces and poor print quality. The "DiJeSt 3.0" model (ID 357765) is a large model for printed Hebrew and Yiddish. Trained on a dataset of nearly 1.5 million words, it can handle a wide range of printed materials, such as Yiddish newspapers, religious books printed in block script, and other publications. The model is part of the "Digitizing Jewish Studies" project and is well suited to large digitisation projects at libraries and universities that hold extensive collections of Hebrew and Yiddish printed works.

Try the model with your documents →

 

 

How to use these models in Transkribus

Transkribus was built for historians, archivists, and scholars, not computer scientists, so getting started with transcribing your documents takes only a few steps.

  1. Upload your documents: First, upload your scanned images (as PDFs or JPGs) to your personal or institutional collection in Transkribus.

  2. Select your pages: Once uploaded, you can select the document, or a specific range of pages, that you want to transcribe.

  3. Find the right model: Go to the "Process with AI" tab to start the text recognition. In the public models section, you can search for the model that best fits your material. You can search by name (e.g., "Pinkas Brody Model") or by the model ID number if you know it.

  4. Start the recognition: After picking your model, click "Start Recognition". Transkribus will then process the pages, using the AI model to read the script and create a transcription.

  5. Review and correct: No AI is perfect, but a good model gets you most of the way there. The transcription will appear in the text editor right next to the document image, letting you easily check, correct, and edit the text.

For more information about using public AI models with Transkribus, visit our Help Center or YouTube channel.
