
Introducing Text Titan I bis

Florian Stauder | April 16, 2025

A new generation of general-purpose text recognition for Latin scripts in Transkribus

We’re introducing Text Titan I bis, the next iteration of our public model for Latin script recognition in Transkribus. Built on a broader and more balanced training set than its predecessor, it is designed to deliver strong out-of-the-box performance across a wide variety of historical and modern documents, whether printed or handwritten, common or unusual.

This model is part of an ongoing training initiative. We are scaling model development step by step, starting with a smaller sample of our ground truth data and gradually increasing coverage. Text Titan I bis marks the first milestone in this process, offering a preview of what becomes possible with a richer and more representative dataset.

Why it matters

Many users working with Latin script documents, from researchers and archivists to private individuals, need a model that performs reliably across different languages, writing styles, and time periods. Text Titan I bis meets this need by covering a wide range of materials with minimal setup. It reduces the need for custom models or post-correction, making the transcription process faster and more efficient from the start.

The model is trained on a carefully balanced sample drawn from a large and diverse training pool. This selection strategy supports good generalisation and long-tail performance, particularly in less common or challenging cases.

Benchmark insights

We tested Text Titan I bis on a representative benchmark set of 2,000 high-quality pages. The evaluation covered several languages and script types, using the same parameters as those applied to the original Text Titan I model.

The results show a clear improvement across the board. On average, the character error rate across all Latin scripts dropped from 8.6% to 6.7%, a reduction of 1.9 percentage points. English documents saw a reduction of 2.3 percentage points, handwritten material improved by 2.6 points, and unusual Latin scripts by 3.2 points. Even printed materials, where previous models already performed well, showed a measurable gain.

These gains are not limited to specific cases but are consistent across the entire test set. This confirms that Text Titan I bis provides a robust and general improvement in recognition quality.
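
For readers less familiar with the metric, here is a minimal sketch of how a character error rate is commonly computed: the edit distance between the model output and the reference transcription, divided by the length of the reference. This is the standard textbook definition, not necessarily the exact procedure used for the benchmark above. Text normalisation, whitespace handling, and per-page versus pooled averaging are not described in this post, and the example strings and function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: two wrong characters in a ten-character line.
reference = "Text Titan"
hypothesis = "Test Titam"
cer = character_error_rate(reference, hypothesis)

# The averages reported above, as absolute and relative change.
old_cer, new_cer = 8.6, 6.7                # percent
absolute_drop = old_cer - new_cer          # percentage points
relative_drop = absolute_drop / old_cer    # fraction of the old error rate

print(f"example CER: {cer:.1%}")
print(f"absolute drop: {absolute_drop:.1f} points")
print(f"relative drop: {relative_drop:.1%}")
```

Read this way, the move from 8.6% to 6.7% means that roughly one in five of the character errors made by the previous model no longer occurs on the benchmark average.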

What’s next

Text Titan I bis is available in Transkribus as the next-generation model for Latin script recognition. It delivers measurable improvements over Text Titan I, which remains available, and is recommended for general use across a wide range of documents. It is also the first in a series of increasingly capable models, offering both immediate gains and a preview of the progress ahead.