
Bringing AI back to the community: The Transkribus vision for 2026

Fiona Park

The past few years have been defined by an unprecedented AI boom. Large Language Models (LLMs) such as Gemini and ChatGPT have amazed us with their ability to predict text and generate transcriptions at scale, turning complex tasks into something that looks almost effortless. But for the heritage sector, this moment has raised a fundamental question: Who controls the process of turning historical sources into digital knowledge?

As we move into 2026, the market is crowded with powerful, largely opaque “black-box” tools that prioritise speed and convenience. At Transkribus, we are taking a different approach. Our goal is not just to recognise text, but to preserve it, and to keep that process firmly in the hands of the community.

We spoke with READ-COOP Board Members Florian Stauder and Michaela Prien about how the platform is evolving to support sustainable, transparent AI workflows for cultural heritage.

 

Michaela Prien and Florian "Flo" Stauder are joint co-executive directors of READ-COOP, the cooperative society that manages Transkribus. © Transkribus


Q: Hi Michaela, hi Flo — thanks for talking to us. AI has exploded over the last few years. What has this meant for text recognition with historical documents?

Flo: There are simply far more options than there used to be. We now see everything from small, open-access tools that do very specific things, to large LLMs like ChatGPT, which are getting better and better at recognising and predicting text.

That said, none of these tools really covers the full range of needs that scholars and archivists have when working with historical documents. Even among experienced Transkribus users, it’s common to combine several tools — using Transkribus for core transcription, LLMs for particular tasks, or even developing custom solutions. For us, the big question for 2026 is how we bring these capabilities together into a single, coherent environment that properly addresses the needs of our community.



Q: Which of those community needs are you particularly focused on?

Michaela: Cultural heritage work is rarely about producing text alone. It’s about context, standards, and long-term usability. Nowadays, institutions don’t just need high-quality transcriptions, they need data that fits into existing archival systems, follows scholarly conventions, and can still be reused years or decades from now. That kind of work requires structured, transparent workflows, which generic AI tools usually don’t provide.

Flo: Exactly. This is why we’re moving beyond the idea of Transkribus as just a text recognition tool. We’re building an environment where each step supports the next, from transcription, to enrichment, to collaboration and publishing. Features like structured outputs, configurable workflows, and integrated collaboration aren’t extras for us; they’re central to making heritage data meaningful and sustainable.

Michaela: And this applies across the board. Whether someone is working on a small personal collection or coordinating a large institutional project with millions of pages, the platform should support the entire lifecycle of the material, without forcing users to move data between disconnected systems or adapt their work to opaque, one-size-fits-all solutions.

 

The Stockholm City Archives are just one of the many cultural heritage institutions that use Transkribus to make their documents digitally accessible. © Transkribus



Q: Could you tell us about some of the new features you have planned for this year?

Flo: One of the most important goals for 2026 is integrating LLMs into Transkribus. Rather than trying to build our own large models from scratch, we’re focusing on connecting existing LLMs directly to the platform. This allows users to benefit from their strengths while keeping their data and workflows in one place.

Michaela: Data protection is the key consideration here. Transkribus is built on principles of transparency and data sovereignty: Any data you upload stays on the platform and remains under your control. Commercial LLMs don’t offer the same guarantees, so their use will always be clearly marked as “opt-in”. When a user starts a process that involves an external model, they will be explicitly informed about what data is shared and for how long, to make sure the process remains transparent.



Q: Does this mean LLMs will replace traditional Transkribus models?

Flo: Not at all. Public and custom Transkribus models will remain the foundation of the platform. They are designed specifically for historical sources and for users who need accuracy, control, and reproducibility.

LLMs are a complementary layer. They’re very good at predicting text and supporting certain enrichment or analysis tasks, but preservation requires more than prediction alone. Transkribus is about creating reliable, traceable digital representations of historical material that can be reused in the long term. Used together, these technologies can be very powerful.

 

Named Entity Recognition takes away the need to manually tag people, places, and other entities, speeding up the research process. © Transkribus



Q: Is this the year we will finally see Named Entity Recognition on Transkribus?

Michaela: That’s the plan! Named Entity Recognition (NER) is a highly requested feature and takes us well beyond simple, linear transcription. It allows users to automatically detect, classify, and tag entities such as people, places, and organisations within their texts. The real value is that this turns unstructured text into structured, interoperable data.

Once entities are identified, users can query entire collections — for example, finding every mention of “Goethe” or “Berlin” across thousands of documents — and reuse that information in databases, GIS systems, or network analysis tools. This makes analysis far more powerful, as researchers and archivists can query and link their data much more easily than before.
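The structured reuse Michaela describes can be illustrated with a small sketch. Here, tagged entities are plain Python dicts with illustrative field names (this is not Transkribus's actual export schema), but the idea is the same: once entities carry types and stable identifiers, a whole collection becomes queryable.

```python
# Hypothetical entity records from a NER run over a collection.
# Field names and Wikidata IDs are illustrative, not Transkribus's schema.
entities = [
    {"doc": "letter_001", "page": 3, "text": "Goethe", "type": "person", "wikidata": "Q5879"},
    {"doc": "letter_001", "page": 4, "text": "Berlin", "type": "place",  "wikidata": "Q64"},
    {"doc": "diary_1820", "page": 1, "text": "Goethe", "type": "person", "wikidata": "Q5879"},
]

def mentions(records, name):
    """Return every (document, page) where the named entity appears."""
    return [(r["doc"], r["page"]) for r in records if r["text"] == name]

print(mentions(entities, "Goethe"))
# [('letter_001', 3), ('diary_1820', 1)]
```

Because each record can carry a stable identifier such as a Wikidata ID, the same data can be pushed into databases, GIS systems, or network analysis tools without renaming or re-matching entities by hand.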

Flo: From a technical perspective, NER requires robust models trained on diverse historical materials. From the user’s perspective, however, the goal is simplicity. It's about empowering scholars to ask much deeper research questions of their sources without getting bogged down in manual data entry.



Q: You’ve been developing custom end-to-end models, as shown in the project with the Museum für Naturkunde, Berlin. Are these going to be rolled out on a larger scale?

Flo: Yes, that is what we are currently working on. Our new Smart Extract Models leverage cutting-edge architectures like DAN (Document Attention Network) and DONUT (Document Understanding Transformer) to make it faster to extract information from documents. With a traditional model, a user transcribes the text and then has to manually define fields for data extraction. A Smart Extract Model does all of that simultaneously.

Michaela: Imagine uploading a complex 17th-century parish register with irregular columns and dense handwriting. A Smart Extract Model can recognise the text, understand the structure, identify fields like dates, names, and professions, and output a clean table in a single step. This dramatically reduces the time spent on layout analysis and post-processing.
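To make the "clean table in a single step" concrete: the end result of such a pipeline is tabular data that can go straight into a spreadsheet or database. The sketch below assumes the model's output has already been parsed into records (the field names and structure are hypothetical, not the actual model output format) and shows the final serialisation step.

```python
import csv
import io

# Hypothetical records extracted end-to-end from a parish register:
# one row per entry, with the fields already identified by the model.
records = [
    {"date": "1672-03-14", "name": "Anna Huber",   "profession": "weaver"},
    {"date": "1672-03-21", "name": "Johann Maier", "profession": "farmer"},
]

# Serialise the records as CSV, ready for a spreadsheet or database import.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "name", "profession"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

The point of the end-to-end approach is that everything before this step, i.e. layout analysis, transcription, and field identification, happens in one model pass instead of three manual stages.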

 

Smart Extract Models were first developed in this project with the Museum für Naturkunde in Berlin. © Transkribus



Q: Which other features are you planning to tackle this year?

Michaela: The first one is Datasets. These are curated, versioned collections of pages used to train or evaluate models. Historically, it could be challenging for users to maintain version control, ensure quality, and share the exact training data that underpinned a high-performing model.

Flo: With the new Datasets feature, users gain a dedicated, systematic workflow for curating and managing their data. They can define a specific, high-quality set of documents and pages, version it, check its consistency, and easily share this exact dataset with collaborators or colleagues. This ensures better reproducibility: If a colleague wants to replicate a model's success, they can access the exact training data used.
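One common way to make "the exact training data" citable, and a reasonable mental model for versioned datasets in general (this is a sketch, not how Transkribus implements the feature), is to derive a stable version identifier from a content hash over the dataset's manifest.

```python
import hashlib
import json

# A minimal dataset manifest: which pages belong to the dataset, in a
# fixed order. Names and structure are illustrative only.
manifest = {
    "name": "parish-registers-v2",
    "pages": sorted(["doc12/p001", "doc12/p002", "doc47/p010"]),
}

# Hashing a canonical serialisation gives a deterministic identifier:
# anyone with the same pages computes the same ID, so a model's
# documentation can cite exactly which data it was trained on.
digest = hashlib.sha256(
    json.dumps(manifest, sort_keys=True).encode()
).hexdigest()[:12]
print(f"{manifest['name']}@{digest}")
```

If a single page is added or removed, the identifier changes, which is precisely what makes it useful for reproducibility: two collaborators can verify they are training on identical data by comparing one short string.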

Michaela: The second upcoming feature is Projects. As research collections grow, and as more institutions adopt Transkribus, it has become clear that the platform needs a higher level of organisational structure. Projects are designed to bring clarity to complex work. Instead of a single, long list of documents, models, and datasets, users can now create distinct organisational containers — a "Project" — which contains all the resources connected to a certain research task.

Flo: This means a user working on the "18th Century Correspondence Project" can keep all their relevant documents, the models trained specifically for that hand, and the curated datasets neatly grouped and separate from, say, their "Medieval Charters Pilot." It dramatically simplifies access for collaborative teams, ensures models and data are correctly linked, and provides a clean, focused workspace for every research project.

Michaela: The last details on all of these features are still being finalised but we will provide full transparency on pricing and credit usage as soon as they are ready for rollout.



Q: Is there anything else that 2026 has in store?

Michaela: The highlight of our year will be the Transkribus User Conference in September. This dynamic event brings together users, developers, and leading researchers to discuss the future of AI and historical document analysis.

Flo: We will be running in-depth, practical workshops on all the features we’ve just talked about: NER, Smart Extract, Datasets, and Projects. It's a good environment to gain hands-on experience, ask the development team direct questions, and participate in vital discussions about the digitisation of historical sources. This year, we are also moving the event from Innsbruck to the University of Passau, one of our cooperative’s members. Passau is a beautiful German city with so much to offer, so this will surely bring a fresh dimension to the TUC.

 

The Transkribus User Conference 2026 will take place at the end of September in Passau, Germany. © Transkribus



Q: Save me a ticket! And finally, is there anything else you would like to say about the 2026 vision for Transkribus?

Michaela: At its core, Transkribus exists to unlock, understand, and preserve written heritage. Everything we build is guided by that mission. We’re not just offering AI tools; we’re creating an environment that turns historical sources into lasting digital knowledge.

Flo: And because we are a cooperative, we have a responsibility to the community. We prioritise purpose over profit, and we work to ensure that the knowledge created on the platform remains open, reusable, and under the control of those who produce it. We believe that keeping the full workflow, from capture, to processing, to enrichment and publication, within a transparent, community-governed ecosystem is essential for sustainable heritage work.



Thank you Michaela and Flo for taking the time to talk to us.
