Simple Python Script for Extracting Text from an SRT File
Watching movies or TV shows in a foreign language is a great way to learn that language, but it can be challenging. Fast speech, slang, and background noise all make the dialogue harder to understand. I find it helpful to have subtitles that match the speech, but foreign-language films and shows don’t always have them. Fortunately, you can often find subtitle files (with a .srt extension) at opensubtitles.org. Unfortunately, those files aren’t easy to read, because they are marked up with timestamps and include every sound made (e.g., mobile phone ringing).
For example, we are currently watching El Ministerio del Tiempo on Movistar in Spain. The SRT file of season 1, episode 3 (available here) begins like this:
1
00:00:33,599 --> 00:00:35,270
(NARRA) "Soy Amelia Folch.

2
00:00:36,199 --> 00:00:39,870
Tengo 23 años y sin embargo
he salvado la vida del Empecinado.

3
00:00:45,160 --> 00:00:46,550
(Disparo)

4
00:00:48,800 --> 00:00:50,310
He conocido a Lope de Vega.

5
00:00:56,400 --> 00:00:58,080
Y he visto la Armada Invencible.

I wanted to reduce that to:
(NARRA) "Soy Amelia Folch.
Tengo 23 años y sin embargo he salvado la vida del Empecinado.
He conocido a Lope de Vega.
Y he visto la Armada Invencible.
Here’s the solution I came up with.
Run it like this:
python srt_to_txt.py file_name.srt cp1252
Note that the script assumes that lines beginning with lowercase letters or commas are part of the previous line and lines beginning with any other character are new lines. This won’t always be correct, but it does a good enough job to make it easy to follow along with the movie.
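The rule described above can be sketched as follows. This is a minimal reconstruction under stated assumptions, not necessarily the original script: the function names (is_text_line, srt_to_text, main), the UTF-8 default when no encoding is given, the choice to drop lines wrapped entirely in parentheses (such as "(Disparo)"), and writing the result next to the input file with a .txt extension are all assumptions made for illustration.

```python
import os
import re
import sys

# Matches SRT timestamp lines such as "00:00:33,599 --> 00:00:35,270".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def is_text_line(line):
    """Return True if a stripped line looks like dialogue rather than SRT markup."""
    if not line:
        return False  # blank separator between cues
    if line.isdigit():
        return False  # cue number (1, 2, 3, ...)
    if TIMESTAMP.match(line):
        return False  # timing line
    if line.startswith("(") and line.endswith(")"):
        return False  # assumption: a fully parenthesized line is a sound cue
    return True


def srt_to_text(srt_content):
    """Reduce the contents of an SRT file to plain dialogue text."""
    lines = []
    for raw in srt_content.splitlines():
        # Drop a byte-order mark if present, then surrounding whitespace.
        line = raw.replace("\ufeff", "").strip()
        if not is_text_line(line):
            continue
        if lines and line[0].islower():
            # Lowercase start: treat as a continuation of the previous line.
            lines[-1] += " " + line
        elif lines and line[0] == ",":
            # Leading comma: attach directly to the previous line.
            lines[-1] += line
        else:
            lines.append(line)
    return "\n".join(lines)


def main(file_name, encoding="utf-8"):
    """Read file_name with the given encoding and write a .txt file beside it."""
    with open(file_name, encoding=encoding, errors="replace") as f:
        text = srt_to_text(f.read())
    # Assumption: the output goes next to the input, with a .txt extension.
    out_name = os.path.splitext(file_name)[0] + ".txt"
    with open(out_name, "w", encoding="utf-8") as f:
        f.write(text)
    return out_name


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python srt_to_txt.py file_name.srt [encoding]")
    else:
        main(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "utf-8")
```

Applied to the excerpt from El Ministerio del Tiempo, this sketch produces the four lines of dialogue shown earlier. One known limitation: a line of dialogue consisting only of digits would be mistaken for a cue number and dropped.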