The concept behind subtitles is straightforward: transcribe, translate, and sync. Yet when deaf, hard-of-hearing, and foreign-language viewers read on-screen captions, they are frequently left wanting more. Poor subtitling can be perplexing, frustrating, or at times hilarious.
Part of the problem is that every text deliverable is digital code, and adding or deleting a single character during translation can render it unusable.
This post will go over the three most common code errors that occur during subtitling, as well as what you can do to avoid them.
The most frequently requested deliverable for video subtitle translation is a text file. This is a significant change from even five years ago, when most projects required burning subtitles into the film or delivering them as graphics for overlay.
The reason for this shift is simple: the rise of online video streaming platforms such as YouTube, Vimeo, Netflix, Amazon, and Hulu, as well as channel-based apps.
Subtitle deliverables have grown in popularity as more content has moved online, including TV shows, movies, marketing spots, e-Learning, and instructional videos. By the way, dubbing and voice-over deliverables have evolved as well – most of these platforms now support multiple foreign-language audio channels.
Today’s multimedia localization professionals must be familiar with text-based formats such as SRT, STL, WebVTT, and SCC. If they’re working on e-Learning or corporate content, they’ll also need to know DFXP and TTML, XML-based timed-text formats that were widely used for captions in Flash players.
These formats vary in complexity, from SRT, which displays time-codes and caption text in a relatively user-friendly way, to SCC, which encodes every character and control command as hexadecimal codes.
Whatever their complexity, all caption text files have one thing in common: strict structural requirements. A change in the time-code structure, or even in the number of tabs or spaces in a file, can have disastrous consequences. This is a real problem in translation, because linguists often work directly in this code and occasionally make mistakes.
Here are the three most common errors, any of which can be a true “code-killer.”
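To give a sense of why SCC is so unforgiving, here is a minimal Python sketch of how plain text becomes SCC hex. It assumes the CEA-608 basic character set, which matches ASCII for most letters, digits, and punctuation, and it sets the odd-parity bit that CEA-608 requires on every byte. It deliberately omits the control codes, positioning commands, and time-codes a real SCC file also carries, and the function names are illustrative rather than part of any library.

```python
# Printable ASCII codes whose glyph differs in the CEA-608 basic set
# (for example, 0x2A is "á" rather than "*"), so this sketch refuses them.
DIFFERENT_IN_608 = set("*\\^_`{|}~")


def odd_parity(byte: int) -> int:
    """Set or clear bit 7 so the byte has an odd number of 1 bits."""
    low7 = byte & 0x7F
    return low7 | 0x80 if bin(low7).count("1") % 2 == 0 else low7


def encode_scc_text(text: str) -> str:
    """Encode plain text as SCC hex words, two characters per four-digit word."""
    data = []
    for char in text:
        if not " " <= char <= "~" or char in DIFFERENT_IN_608:
            raise ValueError(f"{char!r} is outside this sketch's character set")
        data.append(odd_parity(ord(char)))
    if len(data) % 2:
        data.append(odd_parity(0x00))  # pad an odd-length string with a null byte
    return " ".join(f"{data[i]:02x}{data[i + 1]:02x}" for i in range(0, len(data), 2))


print(encode_scc_text("Hello"))  # c8e5 ecec ef80
```

In this encoding "Hello" becomes c8e5 ecec ef80 (the final 80 is a padding null byte), so a single mistyped hex digit in a translated SCC file changes or corrupts a character with no visible clue in the file itself.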
The first is an inserted character, space, or tab. Can you spot the issues in this SRT file?
They are circled in the following image:
There were four problems: a pair of hyphens converted to an em-dash (#1), a space inserted in the middle of an SRT “arrow” (#2), a space inserted before an end time-code (#3), and a tab inserted at the end of a line (#4). #3 is nearly invisible, and #4 is completely invisible.
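All four of these problems can be caught mechanically before delivery. The following Python sketch assumes the strict SRT time-code layout HH:MM:SS,mmm --> HH:MM:SS,mmm, with exactly one space on each side of the arrow. It flags any line that looks like a time-code but doesn’t match that layout, plus any line ending in invisible whitespace such as a tab. The name find_srt_problems is hypothetical, and because some players tolerate looser layouts, its output is best treated as a list of lines to review.

```python
import re
from typing import List, Tuple

# Strict layout: "HH:MM:SS,mmm --> HH:MM:SS,mmm", one space either side of the
# arrow, ASCII digits only ([0-9] rather than \d, which also matches full-width digits).
TIMECODE = r"[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}"
TIMECODE_LINE = re.compile(rf"^{TIMECODE} --> {TIMECODE}$")
# Loose test: a line that starts like a clock time was probably meant to be a time-code.
LOOKS_LIKE_TIMECODE = re.compile(r"^\s*\d{1,2}:\d{2}:\d{2}")


def find_srt_problems(text: str) -> List[Tuple[int, str]]:
    """Return (line number, description) pairs for lines that need a closer look."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((number, "trailing whitespace or tab"))
        if LOOKS_LIKE_TIMECODE.match(line) and not TIMECODE_LINE.match(line.rstrip()):
            problems.append((number, "malformed time-code line"))
    return problems


sample = (
    "1\n"
    "00:00:01,000 --> 00:00:03,000\n"
    "Hello.\n"
    "\n"
    "2\n"
    "00:00:04,000 \u2014> 00:00:06,000\n"  # em-dash where "--" should be
    "Goodbye.\t\n"                        # invisible trailing tab
)
for number, problem in find_srt_problems(sample):
    print(number, problem)
```

Run on the sample, it reports line 6 as a malformed time-code line and line 7 as having trailing whitespace; an extra space inside the arrow or before the end time-code is reported the same way as the em-dash.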
Each of these errors would cause problems when adding this SRT to a video on most online players. Most players issue an integration warning, but only for the first line with a problem. These problems can also be extremely difficult to track down, especially when they are hidden, as the tab above is.
Of course, an SRT file for a feature film or television show can contain hundreds or even thousands of segments, so a widespread problem can result in hours of frustrating labor.
And these errors are very common: in a long document it is easy to lose your place and hit a stray key, or to delete text and then replace it incorrectly. Even the most diligent linguist’s work will eventually suffer from this kind of simple human error.
The second common error involves formatting tags. Many subtitle formats allow for font formatting, screen placement, and other specialized formatting, and most of them use standard XML-style tags, even formats that aren’t XML-based to begin with. If you’ve ever translated XML, you know how easy it is to get those tags mixed up: a single <i> opening tag (for italics) without its corresponding </i> will throw off an entire string.
The same applies to tag hierarchies: a single out-of-place tag will make the code invalid. We see this issue frequently in German localization, for example, because German word order differs so much from English that translators have to move many tags around.
If you’re using a tag-based format, make sure your linguists are familiar with XML in general or the file format in particular.
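A simple stack-based check can catch both unclosed tags and out-of-order tags before a file reaches a player. This Python sketch assumes SRT-style inline tags such as <i>, <b>, and <font ...> within a single subtitle string. It skips self-closing tags like <br/> and is not a full XML validator; the check_tags name is hypothetical.

```python
import re
from typing import List

# Opening tags (<i>, <font color="#ffff00">), closing tags (</i>), and
# self-closing tags (<br/>); group 1 is "/" for closing, group 3 for self-closing.
TAG = re.compile(r"<(/?)([a-zA-Z]+)[^>]*?(/?)>")


def check_tags(text: str) -> List[str]:
    """Report unclosed, unmatched, and mis-nested tags in one subtitle string."""
    errors = []
    stack = []  # tag names opened but not yet closed, innermost last
    for match in TAG.finditer(text):
        closing, name, self_closing = match.group(1), match.group(2).lower(), match.group(3)
        if self_closing:
            continue  # e.g. <br/> needs no partner
        if not closing:
            stack.append(name)
        elif name not in stack:
            errors.append(f"</{name}> has no matching <{name}>")
        elif stack[-1] == name:
            stack.pop()
        else:
            errors.append(f"</{name}> closes <{name}> before the inner <{stack[-1]}>")
            # Drop the innermost matching opener so the mistake is reported once.
            last = len(stack) - 1 - stack[::-1].index(name)
            del stack[last]
    errors.extend(f"<{name}> is never closed" for name in stack)
    return errors


print(check_tags("<i>Hello</i>"))         # []
print(check_tags("<i>Hallo"))             # ['<i> is never closed']
print(check_tags("<b><i>Hallo</b></i>"))  # ['</b> closes <b> before the inner <i>']
```

The third call shows the German-style problem from above: both tags are present, but moving them around during translation has broken the hierarchy.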
The third common error is a broken time-code. Like the inserted spaces, characters, or tabs in the first item, it is mostly caused by simple human error. Occasionally, however, linguists change the time-codes themselves, usually to merge two English-language segments or to split one up when the translation is too long to fit.
These errors fall into two categories: time-code structure errors (such as a missing reel number, a missing colon, a frame number higher than the frame rate allows, or simply a missing decimal number), and time-codes that overlap the previous or next subtitle, which are common when translators try to lengthen a segment’s on-screen time.
These errors are especially difficult to correct in double-byte language projects, such as Japanese and Chinese subtitling, because fixing them requires both a linguist and a professional time-coder who can re-spot those sections of the video.
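Both categories of time-code error can be checked automatically before a file goes back to a time-coder. The Python sketch below assumes non-drop-frame SMPTE time-codes (HH:MM:SS:FF) at an integer frame rate for the structure check, and SRT-style HH:MM:SS,mmm time-codes for the overlap check. Drop-frame time-codes and reel numbers are not handled, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Non-drop-frame SMPTE time-code: hours:minutes:seconds:frames, e.g. "01:00:10:12".
SMPTE = re.compile(r"^([0-9]{2}):([0-9]{2}):([0-9]{2}):([0-9]{2})$")


def smpte_error(timecode: str, fps: int) -> Optional[str]:
    """Describe a structural problem in a SMPTE time-code, or return None if valid."""
    match = SMPTE.match(timecode)
    if not match:
        return "not in HH:MM:SS:FF form"  # e.g. a missing colon
    _hours, minutes, seconds, frames = map(int, match.groups())
    if minutes > 59 or seconds > 59:
        return "minutes or seconds out of range"
    if frames >= fps:
        return f"frame {frames} does not exist at {fps} fps"
    return None


def srt_to_ms(timecode: str) -> int:
    """Convert an SRT time-code such as "00:01:02,500" to milliseconds."""
    clock, millis = timecode.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def find_overlaps(cues: List[Tuple[str, str]]) -> List[int]:
    """Return the 1-based numbers of cues that start before the previous cue ends."""
    overlaps = []
    for index in range(1, len(cues)):
        if srt_to_ms(cues[index][0]) < srt_to_ms(cues[index - 1][1]):
            overlaps.append(index + 1)
    return overlaps


print(smpte_error("01:00:10:12", fps=24))  # None
print(smpte_error("01:00:10:25", fps=24))  # frame 25 does not exist at 24 fps
print(smpte_error("01:0010:12", fps=24))   # not in HH:MM:SS:FF form

cues = [
    ("00:00:01,000", "00:00:03,500"),
    ("00:00:03,200", "00:00:05,000"),  # starts 300 ms before cue 1 ends
    ("00:00:05,000", "00:00:07,000"),
]
print(find_overlaps(cues))  # [2]
```

Note that a cue starting exactly when the previous one ends is not treated as an overlap here; some delivery specs also require a minimum gap between subtitles, which would be a one-line change to the comparison.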
You can, fortunately, avoid these problems by doing the following:
Select the most basic format that will work for your project. The more code your text file contains, the more likely it is to be mangled, so use the simplest format that still provides the quality and flexibility your localization project requires.
Unfortunately, this isn’t possible for much entertainment content, which uses text formatting to convey emotion or irony. And a lot of content still uses SCC (the hex-based format mentioned above), which is notoriously difficult to create.
When subtitling, avoid any software that auto-formats text, particularly Microsoft Word. The hyphen-to-em-dash conversion in the SRT example above is a Word auto-correction. Also keep in mind that Word often auto-capitalizes the first word after a hard return, which can wreak havoc in multi-line subtitles.
Hire linguists with subtitling experience. Translating subtitles is difficult enough for experts, let alone newcomers.
Perform a quality assurance check on the captions and subtitles once they have been implemented. This will not prevent problems introduced during translation, but it will ensure that none of them reach your audience. It usually means taking videos offline while the captions are added, or setting up a beta test, so make room for it in your workflow.
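Because auto-correction swaps characters silently, a quick scan for its typical substitutions is a useful last check. This Python sketch uses an illustrative, not exhaustive, list of characters that word processors commonly insert: smart quotes, dashes, the ellipsis character, and non-breaking spaces. Some of these may be correct in translated dialogue, so it reports their positions for review rather than replacing them. The SUSPECT_CHARS table and find_autoformat_chars function are hypothetical names.

```python
from typing import Iterator, Tuple

# Illustrative, not exhaustive: characters that auto-correct commonly substitutes.
SUSPECT_CHARS = {
    "\u2014": "em-dash (was it '--'?)",
    "\u2013": "en-dash (was it '-'?)",
    "\u201c": "curly opening double quote",
    "\u201d": "curly closing double quote",
    "\u2018": "curly opening single quote",
    "\u2019": "curly closing single quote or apostrophe",
    "\u2026": "ellipsis character (was it '...'?)",
    "\u00a0": "non-breaking space",
}


def find_autoformat_chars(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (line, column, description) for every suspect character, 1-based."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char in SUSPECT_CHARS:
                yield line_number, column, SUSPECT_CHARS[char]


sample = "00:00:01,000 \u2014> 00:00:03,000\nIt\u2019s fine.\n"
for hit in find_autoformat_chars(sample):
    print(hit)
```

Here the em-dash on line 1 is a real error, while the apostrophe on line 2 may be exactly what the translator intended, which is why the sketch only reports.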
Use automatic subtitling services. Rather than risk subtitles littered with errors, you can use an automatic captioning tool such as Motionbear, an automatic subtitle, video, and audio transcription tool with more than 90% accuracy.
It will save you time creating subtitles and help you avoid the careless, unexpected mistakes of manual transcription.
Simply upload your file to Motionbear, or drag and drop it. Then, depending on the file’s length, Motionbear will subtitle the video automatically in anywhere from a few seconds to a few minutes.
You can check out this article to learn how to create subtitles and automatic transcriptions with Motionbear.
Rushing through caption and subtitle projects leads to more human error, more bugs, and longer QA cycles. Thorough project planning, starting as early as post-production of the original English-language project, is the best way to ensure that subtitling projects run smoothly, release on time, and stay within budget.
Our generative AI saves you countless hours on subtitling and transcription tasks.