Simple Python Script for Extracting Text from an SRT File


Watching movies or TV shows in a foreign language is great for learning that language, but it can be challenging: quick speech, slang, and background noise can all make the dialogue hard to follow. I find it helpful to have subtitles that match the speech, but foreign-language films and shows don't always come with them. Fortunately, you can often find subtitle files (with a .srt extension) at opensubtitles.org. Unfortunately, those files aren't easy to read, because they are marked up with sequence numbers and timestamps and include descriptions of every sound (e.g., a mobile phone ringing).

For example, we are currently watching El Ministerio del Tiempo on Movistar in Spain. The SRT file of season 1, episode 3 (available here) begins like this:

1
00:00:33,599 --> 00:00:35,270
(NARRA) "Soy Amelia Folch.

2
00:00:36,199 --> 00:00:39,870
Tengo 23 años y sin embargo
he salvado la vida del Empecinado.

3
00:00:45,160 --> 00:00:46,550
(Disparo)

4
00:00:48,800 --> 00:00:50,310
He conocido a Lope de Vega.

5
00:00:56,400 --> 00:00:58,080
Y he visto la Armada Invencible.

I wanted to reduce that to:

(NARRA) "Soy Amelia Folch.
Tengo 23 años y sin embargo he salvado la vida del Empecinado.
He conocido a Lope de Vega.
Y he visto la Armada Invencible.

Here’s the solution I came up with.

Run it like this:

python srt_to_txt.py file_name.srt cp1252

Note that the script assumes that lines beginning with lowercase letters or commas are part of the previous line and lines beginning with any other character are new lines. This won’t always be correct, but it does a good enough job to make it easy to follow along with the movie.
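The approach described above can be sketched as follows. This is a minimal illustration, not the author's srt_to_txt.py. It assumes the command-line interface from the run example (file name, then encoding) and prints the result to standard output. To match the sample output, it also drops lines that consist only of a parenthesized sound description such as (Disparo); that rule, and the names srt_to_text and keep_line, are assumptions made for this sketch.

```python
"""srt_to_txt.py: a minimal sketch for extracting dialogue text from an SRT file.

Assumed usage: python srt_to_txt.py file_name.srt cp1252
"""
import re
import sys

# A cue's timing line, e.g. "00:00:33,599 --> 00:00:35,270"
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")
# A line that is nothing but one sound description, e.g. "(Disparo)" (assumed rule)
SOUND_ONLY = re.compile(r"^\([^()]*\)$")


def keep_line(line: str) -> bool:
    """Return True for lines that carry spoken text (hypothetical helper)."""
    return (
        bool(line)                       # skip blank separator lines
        and not line.isdigit()           # skip cue numbers
        and not TIMESTAMP.match(line)    # skip timing lines
        and not SOUND_ONLY.match(line)   # skip sound-only lines
    )


def srt_to_text(srt: str) -> str:
    """Convert raw SRT content to plain text using the joining heuristic."""
    output = []
    for raw in srt.splitlines():
        # Drop a UTF-8 byte order mark (if any) and surrounding whitespace
        line = raw.lstrip("\ufeff").strip()
        if not keep_line(line):
            continue
        if output and line[0] == ",":
            # A leading comma continues the previous line with no extra space
            output[-1] += line
        elif output and line[0].islower():
            # A leading lowercase letter continues the previous line
            output[-1] += " " + line
        else:
            # Anything else starts a new line
            output.append(line)
    return "\n".join(output)


def main(argv: list) -> None:
    """Command-line entry point; argv is [script, file_name, encoding]."""
    if len(argv) != 3:
        print("usage: python srt_to_txt.py file_name.srt encoding", file=sys.stderr)
        return
    path, encoding = argv[1], argv[2]
    with open(path, encoding=encoding) as f:
        print(srt_to_text(f.read()))


if __name__ == "__main__":
    main(sys.argv)
```

To save the result instead of printing it, redirect the output, e.g. python srt_to_txt.py file_name.srt cp1252 > file_name.txt. If the text comes out garbled, the file may use a different encoding; utf-8 is a common alternative to cp1252.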

Written by Nat Dunn.
