
How is the CER calculated?

Fiona Park

If you're browsing models on Transkribus, you will see that each model has a Character Error Rate, or CER. This score indicates the accuracy of the model: the lower the CER, the more likely the model is to produce accurate transcriptions.

But how is the CER calculated? And how important is this metric? In this post, we will take a look at all things CER-related, to find out more about this little number.


What is the CER?

CER stands for "Character Error Rate". It is a metric that indicates how often a model makes mistakes in a transcription, expressed as a percentage of incorrect characters. If a model has a CER of 5%, this means that when you use it to transcribe a document, on average 5% of the characters will be incorrectly transcribed.


How is the CER calculated?

If you've ever trained a model, you will know that you first need a collection of documents that have been pre-transcribed with 100% accuracy. These documents, also known as your Ground Truth data, are what Transkribus uses to learn the handwriting you want to train it on. 

When setting up your training, you divide your Ground Truth data into two separate groups: Training Data and Validation Data. Transkribus uses the Training Data to learn the handwriting, and then tests its knowledge by trying to transcribe the documents in the Validation Data. It then compares its "test" automatic transcription against the correct, original transcription and counts how many characters were incorrect. This number is then expressed as a percentage of the total number of transcribed characters, giving you the Character Error Rate (CER).
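To make this split concrete, here is a minimal, hypothetical Python sketch of dividing 50 Ground Truth pages into Training and Validation Data. Transkribus handles this step for you in its training interface; the page names and the 90/10 ratio below are invented for illustration only.

    import random

    # Hypothetical list of 50 fully transcribed Ground Truth pages
    pages = [f"page_{i:02d}" for i in range(1, 51)]

    # Hold out a small portion of the pages for validation
    random.shuffle(pages)
    validation_data = pages[:5]   # e.g. 10% held out for the "test"
    training_data = pages[5:]     # the rest is used for learning

    print(len(training_data), "training pages,", len(validation_data), "validation pages")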


What does Transkribus count as an incorrect character?

When comparing the automatic transcription with the accurate, original transcription, Transkribus looks for three types of errors:

  • Insertions: characters that are in the automatic transcription that weren't in the original transcription
  • Substitutions: characters that were incorrectly transcribed, for example "moom" instead of "moon"
  • Deletions: characters from the original transcription that were missed out in the automatic transcription

It counts how many errors of each type there are, adds them together, and divides the sum by the total number of characters in the original transcription. Finally, it multiplies this number by 100 to turn it into a percentage. As a formula, this translates to:

CER = [ (i + s + d) / n ] × 100

where i is the number of insertions, s the number of substitutions, d the number of deletions, and n the total number of characters in the original transcription.
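To make the formula concrete, here is a minimal Python sketch that counts insertions, substitutions, and deletions using a standard Levenshtein edit distance and expresses the result as a percentage. It illustrates the general calculation only, not Transkribus's actual implementation; the function name is made up.

    def cer(reference: str, hypothesis: str) -> float:
        """Return the Character Error Rate as a percentage.

        The Levenshtein edit distance gives the minimum number of
        insertions, substitutions, and deletions (i + s + d) needed
        to turn the hypothesis into the reference.
        """
        if not reference:
            return 0.0 if not hypothesis else 100.0
        m, n = len(reference), len(hypothesis)
        # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i  # i deletions
        for j in range(n + 1):
            dp[0][j] = j  # j insertions
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
                dp[i][j] = min(
                    dp[i - 1][j] + 1,         # deletion
                    dp[i][j - 1] + 1,         # insertion
                    dp[i - 1][j - 1] + cost,  # substitution (or match)
                )
        return dp[m][n] / m * 100

    # One substitution in a four-character word gives a CER of 25%
    print(cer("moon", "moom"))  # 25.0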


Can you give me a real-life example?

Let's say you have a collection of 1000 pages and you would like to train a model for it. You have already manually transcribed 50 of these pages, and you are sure they are completely accurate. You use 45 of these pages as your Training Data, and select the other 5 pages as your Validation Data. The 5 pages of your Validation Data contain 1576 characters in total.

You start training the model. Transkribus analyses the 45 pages of Training Data, and then tests out what it has learnt by transcribing the 5 pages of Validation Data. It compares the automatic transcriptions with the original transcriptions and finds that 312 characters have been incorrectly transcribed. This is 19.8% out of the total number of characters, giving a CER of 19.8%.
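In terms of the formula above: here i + s + d = 312 and n = 1576, so CER = [ 312 / 1576 ] × 100 ≈ 19.8%.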

Transkribus then repeats this process. This time, because it has learnt more, only 242 of the 1576 characters are incorrectly transcribed, resulting in a CER of 15.4%. It then repeats the process again and again.

Each of these cycles is known as an epoch, and for the first few epochs, the CER will drop at the end of each one. At some point, however, the CER will stabilise and remain roughly the same from epoch to epoch. This is then the final CER of your model. For larger models with lots of training data, it can take several hundred epochs before the CER stabilises.


What is a "good" CER?

In general, a model with a CER of 10% or less is considered accurate enough to produce "useful" transcriptions, even if they still contain a few errors. 

However, just because your model has a CER of more than 10%, this doesn't mean that it will necessarily produce poor transcriptions. For example, if Transkribus transcribed all the words in a document with 100% accuracy, but transcribed every full stop or period as a comma, then this would result in quite a high CER even though the transcriptions are perfectly readable.

Therefore, it is always good to take a quick look at the kinds of errors the model made, rather than relying solely on the CER to assess accuracy. You can do this by comparing the automatic transcription with a corrected version of the same page.

What can I do with models that have a CER higher than 10%?

Often, the errors made by high-CER models do affect the readability of your transcriptions. Even so, error-prone transcriptions can still be useful, depending on your workflow. Here are three examples:

Searching for certain keywords

If you just want to search your collection for certain keywords, then you don't need an entirely accurate transcription. You just need to ensure that the word(s) you are searching for are accurately transcribed. 

For example, if you are searching birth records for a certain location, then as long as Transkribus can reliably recognise that location, it does not matter if the model makes many other errors. You can also use the Fuzzy Search feature in order to find the words you are looking for, even if they are incorrectly transcribed.
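Transkribus's Fuzzy Search works inside the platform itself, but the idea behind it can be illustrated with Python's standard difflib module. The word list below is an invented example of error-prone transcription output.

    import difflib

    # Hypothetical tokens from an inaccurate transcription
    words = ["Jnnsbruck", "baptism", "reglster", "Innsbrnck", "Vienna"]

    # Find tokens similar to the keyword, even if misrecognised.
    # cutoff=0.7 requires roughly 70% character similarity.
    matches = difflib.get_close_matches("Innsbruck", words, n=5, cutoff=0.7)
    print(matches)  # e.g. ['Jnnsbruck', 'Innsbrnck']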

Further processing with other tools

Some inaccurate transcriptions can be cleaned up with other AI tools, such as ChatGPT. This works especially well with more recent material, as large language models tend to recognise modern-day language more effectively than historical language. There is more information on this topic in this article.

Using it as a base model

When training a new model, you can choose to use a base model as a starting point. The new model then learns from both your Ground Truth data and the data in the base model, usually resulting in a more accurate model.

Sometimes, a model with a slightly higher CER can still work very well as a base model, particularly if you only have a small amount of Ground Truth data.


How can I improve the CER of my model?

If your model keeps giving you a high CER, no matter how many epochs you run, then do not despair. From checking the accuracy of your Ground Truth to using a base model, there are a few key things you can do to improve the CER and get it below 10%.

You can find out more in this blog post.

