
Should I use a base model when training a custom model?
The “select a base model” option is a key feature available when training a custom model in Transkribus. But what exactly does it do, and when should you use it? This guide breaks down the concept, functionality, and benefits of using base models to help you set up your model training as effectively as possible.
What is a base model?
When setting up your model training in Transkribus, you have the option to select a base model. This is an existing text recognition model that can be used as a starting point for training your custom model.
Any public or private model within Transkribus can be used as a base model. You can choose anything from a large public model to a custom model you’ve created previously. The only exception is Super Models, which use a different type of technology and therefore cannot be used as base models.
A base model can be selected when setting up the training for a new text recognition model. Image via Transkribus.
What does a base model do?
Models are like manuals, telling Transkribus how to transcribe a certain type of document. If you train a new model from scratch, you need to provide a good amount of Ground Truth training data, which Transkribus analyses to find patterns in the writing and create a set of rules for transcribing the different characters, words, and phrases found in the documents. In this situation, the model has only one source of knowledge: your Ground Truth data.
However, if a base model is selected as part of the training, your new model learns not only from your Ground Truth, but also from the information contained in the base model. In machine learning, this is known as transfer learning, as the knowledge is transferred from the base model to the new model. Base models allow new models to be trained more quickly and with less Ground Truth, as the new model does not have to relearn the information contained in the base model.
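Transkribus handles all of this internally, but if you are curious about what transfer learning looks like in practice, here is a minimal, hypothetical sketch in Python (using PyTorch): instead of starting from randomly initialised weights, the new model loads the weights of a pretrained “base” checkpoint and then fine-tunes them on your Ground Truth. The model class, file name, and layer sizes below are illustrative assumptions, not the actual Transkribus training code.

```python
# Minimal sketch of transfer learning (hypothetical; not the actual
# Transkribus training code). A pretrained "base model" checkpoint is
# loaded and then fine-tuned on a small amount of new Ground Truth.
import torch
import torch.nn as nn

class TinyRecognizer(nn.Module):
    """Placeholder text-recognition network (stand-in for a real HTR model)."""
    def __init__(self, num_chars: int = 100):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, num_chars)

    def forward(self, x):
        return self.head(self.backbone(x))

model = TinyRecognizer()

# Training from scratch: weights start random, so the model must learn
# everything from your Ground Truth alone.
# Training with a base model: start from weights already learned on other data.
use_base_model = True
if use_base_model:
    state = torch.load("base_model.pt")         # hypothetical checkpoint file
    model.load_state_dict(state, strict=False)  # reuse whatever layers match

# Fine-tuning then proceeds as usual, typically with a smaller learning rate,
# so the new Ground Truth adjusts the inherited knowledge rather than
# overwriting it.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Because the inherited weights already encode how common letter shapes look, fine-tuning usually converges in fewer epochs and with less Ground Truth than training from scratch.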
Does using a base model create a more accurate model?
Yes, it can, and often does. For example, take a look at these two training charts of a model created for 20th-century council registers in English.
The model was first trained without a base model (the training chart on the left). This resulted in a CER of 7.18%, which was a little higher than the project team wanted. They then retrained the model using the same Ground Truth, but this time with the English Eagle as a base model (the training chart on the right). With the base model, the CER was 5.86%, and the training needed fewer epochs to reach a good CER.
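As a quick refresher, CER (character error rate) is the proportion of character-level edits (insertions, deletions, and substitutions) needed to turn the model’s transcription into the correct text. Transkribus calculates this for you during training, but the small illustrative Python function below shows the idea using a standard Levenshtein edit distance; it is a sketch for understanding the metric, not code taken from Transkribus.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance between the correct
    text (reference) and the model's transcription (hypothesis), divided by
    the length of the reference. Illustrative only."""
    m, n = len(reference), len(hypothesis)
    # Dynamic-programming table for the edit distance between the two strings.
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n] / max(m, 1)

# A CER of 5.86% means roughly 6 incorrect characters per 100 characters.
print(cer("council meeting", "councll meting"))  # 2 edits / 15 chars ≈ 0.13
```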
Which base model should I choose?
When it comes to base models, general models tend to work better than more specific ones. This is because they have been trained on a wide variety of fonts and handwriting styles, and are therefore more adaptable to new styles.
For instance, there are plenty of models for 20th-century English that could have been used as base models in the example above. However, the researchers decided to use the English Eagle, as this model is the most general and therefore likely to have the biggest impact on the CER of the new model.
Other general models that would be ideal base models include the German Giant, the Dutchess, and Coloso Español.
The German Giant and Coloso Español are general models that make excellent base models. Image via Transkribus.
Does using a base model always increase accuracy?
No, not always. Using a base model that is either too different from, or too similar to, your documents can decrease the accuracy.
For example, if you’re training a model for 19th-century French and you choose a base model for 16th-century Latin, this would simply confuse your new model and lead to a higher CER.
Likewise, if you use a previous version of your model as a base model (a mistake that many people make at the start), the duplicate Ground Truth will also confuse your new model and result in a high CER. In this situation, it is better either to use a more general model as the base model (see above) or simply not to use a base model at all.
How do I know if using a base model will work for my model?
Simply try it! If you have already trained your model with a base model and are unhappy with the CER, try training it again with a different base model, or even without a base model at all. Sometimes this can be the solution, even if it seems counterintuitive.
Where can I learn more about using base models?
Our webinar on “Expert Text Recognition Model Training” offers more information about using base models when training models. You can watch the recording of the webinar below and check out our YouTube channel for more tutorial videos and webinars about using Transkribus.