Apr 6, 2017
Watching movies or TV shows in a foreign language is a great way to learn that language, but it can be challenging. Fast speech, slang, and background noise all make the dialogue harder to follow. I find it helpful to have subtitles that match the speech, but foreign-language films and shows don’t always come with them. Fortunately, you can often find subtitle files (with a .srt extension) at opensubtitles.org. Unfortunately, those files aren’t easy to read, because they are marked up with sequence numbers and timestamps and include every sound made (e.g., a mobile phone ringing).
1
00:00:33,599 --> 00:00:35,270
(NARRA) "Soy Amelia Folch.

2
00:00:36,199 --> 00:00:39,870
Tengo 23 años y sin embargo he salvado la vida del Empecinado.

3
00:00:45,160 --> 00:00:46,550
(Disparo)

4
00:00:48,800 --> 00:00:50,310
He conocido a Lope de Vega.

5
00:00:56,400 --> 00:00:58,080
Y he visto la Armada Invencible.
I wanted to reduce that to:
(NARRA) "Soy Amelia Folch.
Tengo 23 años y sin embargo he salvado la vida del Empecinado.
He conocido a Lope de Vega.
Y he visto la Armada Invencible.
Here’s the solution I came up with.
Run it like this:
python srt_to_txt.py file_name.srt cp1252
Note that the script assumes that lines beginning with lowercase letters or commas are part of the previous line and lines beginning with any other character are new lines. This won’t always be correct, but it does a good enough job to make it easy to follow along with the movie.
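The rules described above can be sketched roughly as follows. This is a minimal reconstruction under stated assumptions, not necessarily the exact script: the function name `srt_to_lines`, the treatment of a fully parenthesized line as a sound cue, the UTF-8 default encoding, and writing the result to a `.txt` file next to the input are all choices made for illustration.

```python
import os
import re
import sys

# Matches an SRT timestamp line such as "00:00:33,599 --> 00:00:35,270".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}")
# Assumption: a line that is nothing but one parenthesized phrase is a sound cue.
SOUND_CUE = re.compile(r"^\([^)]*\)$")


def srt_to_lines(srt_text):
    """Return the spoken text of an SRT document as a list of lines."""
    result = []
    for raw in srt_text.splitlines():
        line = raw.strip()
        # Skip blanks, sequence numbers, timestamps, and sound cues.
        # Caveat: a spoken line consisting only of digits is dropped too.
        if not line or line.isdigit() or TIMESTAMP.match(line) or SOUND_CUE.match(line):
            continue
        if result and (line[0].islower() or line[0] == ","):
            # Lowercase or comma start: treat as a continuation of the previous line.
            separator = "" if line[0] == "," else " "
            result[-1] += separator + line
        else:
            result.append(line)
    return result


def main(argv):
    if len(argv) < 2:
        print("usage: python srt_to_txt.py file_name.srt [encoding]", file=sys.stderr)
        return
    path = argv[1]
    # Assumption: default to UTF-8 when no encoding is given on the command line.
    encoding = argv[2] if len(argv) > 2 else "utf-8"
    with open(path, encoding=encoding) as srt_file:
        lines = srt_to_lines(srt_file.read())
    # Assumption: write the output next to the input, with a .txt extension.
    out_path = os.path.splitext(path)[0] + ".txt"
    with open(out_path, "w", encoding="utf-8") as txt_file:
        txt_file.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

The encoding argument matters because many subtitle files are not UTF-8. Files for Western European languages are often saved as cp1252, and reading them with the wrong codec either raises an error or garbles accented characters.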