Simple Python Script for Extracting Text from an SRT File

Watching movies or TV shows in a foreign language is a great way to learn that language, but it can be challenging. Fast speech, slang, and background noise all make the dialogue harder to follow. I find it helpful to have subtitles that match the speech, but foreign-language films and shows don’t always come with them. Fortunately, you can often find subtitle files (with a .srt extension) at opensubtitles.org. Unfortunately, those files aren’t easy to read, because they are marked up with cue numbers and timestamps and include every sound made (e.g., mobile phone ringing).

For example, we are currently watching El Ministerio del Tiempo on Movistar in Spain. The SRT file of season 1, episode 3 (available here) begins like this:

1
00:00:33,599 --> 00:00:35,270
(NARRA) "Soy Amelia Folch.
2
00:00:36,199 --> 00:00:39,870
Tengo 23 años y sin embargo
he salvado la vida del Empecinado.
3
00:00:45,160 --> 00:00:46,550
(Disparo)
4
00:00:48,800 --> 00:00:50,310
He conocido a Lope de Vega.
5
00:00:56,400 --> 00:00:58,080
Y he visto la Armada Invencible.

I wanted to reduce that to:

(NARRA) "Soy Amelia Folch.
Tengo 23 años y sin embargo he salvado la vida del Empecinado.
He conocido a Lope de Vega.
Y he visto la Armada Invencible.

Here’s the solution I came up with.

Run it like this, passing the SRT file and its character encoding:

python srt_to_txt.py file_name.srt cp1252

Note that the script assumes that lines beginning with lowercase letters or commas are part of the previous line and lines beginning with any other character are new lines. This won’t always be correct, but it does a good enough job to make it easy to follow along with the movie.
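
The joining rule described above can be sketched roughly as follows. This is a minimal illustration, not the original script: the names srt_to_text and srt_file_to_text are hypothetical, and the sketch assumes that cue numbers are digit-only lines, that timestamps follow the "HH:MM:SS,mmm --> HH:MM:SS,mmm" pattern shown in the example, and that a line consisting only of a parenthesized description such as (Disparo) is a sound effect to drop. Joining a comma-led line without a space is also an assumption.

```python
import re

# Matches an SRT timestamp line such as "00:00:33,599 --> 00:00:35,270".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")
# Matches a line that is nothing but a parenthesized description, e.g. "(Disparo)".
SOUND_ONLY = re.compile(r"^\([^)]*\)$")


def srt_to_text(lines):
    """Return a list of text lines extracted from an iterable of SRT lines."""
    output = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator between cues
        if line.isdigit():
            continue  # cue number (assumption: also drops digit-only dialogue)
        if TIMESTAMP.match(line):
            continue  # timing information
        if SOUND_ONLY.match(line):
            continue  # sound description only (assumption inferred from the example)
        if output and line[0] == ",":
            # A leading comma continues the previous line; no space is needed.
            output[-1] += line
        elif output and line[0].islower():
            # A leading lowercase letter continues the previous line.
            output[-1] += " " + line
        else:
            # Anything else starts a new line of text.
            output.append(line)
    return output


def srt_file_to_text(path, encoding="utf-8"):
    """Read an SRT file with the given encoding and return the extracted text."""
    with open(path, encoding=encoding) as srt_file:
        return "\n".join(srt_to_text(srt_file))
```

For a file like the one above, print(srt_file_to_text("file_name.srt", "cp1252")) would print the extracted text. The utf-8 default is an assumption; pass whichever encoding the subtitle file actually uses.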

Written by Nat Dunn.