SRT files are plain text files that contain subtitle information for a video. Each subtitle is represented by a block of text that consists of a number, a timecode, and the subtitle text itself. The timecode indicates when the subtitle should be displayed and when it should be hidden. The format of an SRT file is as follows:
1
00:00:20,000 --> 00:00:24,400
This is the first subtitle.

2
00:00:24,500 --> 00:00:28,200
This is the second subtitle.
The first line of each block is the subtitle number. The second line is the timecode: the start and end times, separated by an arrow (-->). The third and subsequent lines are the subtitle text, and a blank line separates one block from the next.
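To make the timecode line concrete, here is a minimal sketch that converts it into start and end times in milliseconds. The helper names `timestampToMs` and `parseTimecodeLine` are hypothetical, and the sketch assumes the plain `HH:MM:SS,mmm --> HH:MM:SS,mmm` layout shown above with nothing else on the line.

```javascript
// Convert one "HH:MM:SS,mmm" timestamp to milliseconds.
function timestampToMs(timestamp) {
  const match = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(timestamp.trim());
  if (!match) {
    throw new Error(`Unrecognized timestamp: ${timestamp}`);
  }
  const [hours, minutes, seconds, millis] = match.slice(1).map(Number);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Split a timecode line on the arrow and convert both sides.
function parseTimecodeLine(line) {
  const [start, end] = line.split('-->');
  if (end === undefined) {
    throw new Error(`Missing "-->" in timecode line: ${line}`);
  }
  return { startMs: timestampToMs(start), endMs: timestampToMs(end) };
}

const { startMs, endMs } = parseTimecodeLine('00:00:20,000 --> 00:00:24,400');
console.log(startMs, endMs, endMs - startMs); // 20000 24400 4400
```

Working in milliseconds keeps later arithmetic, such as computing how long a subtitle stays on screen, to simple integer subtraction.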
To read the file in Node.js, we can use the built-in fs module and its readFileSync() method, which reads the entire contents of the file synchronously.
const fs = require('fs');
const srt = fs.readFileSync('example.srt', 'utf8');
The readFileSync() method takes two parameters: the file path and the encoding. Because an SRT file is plain text, we pass 'utf8' so that the contents come back as a string rather than a raw Buffer.
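Subtitle files saved by some editors begin with a byte order mark and use Windows line endings, and either can trip up line-based matching later on. The sketch below wraps the read in a hypothetical `readSrtFile` helper that normalizes both. The temporary demo file and its name are assumptions made only so the example runs on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read an SRT file and normalize it for parsing.
function readSrtFile(filePath) {
  const raw = fs.readFileSync(filePath, 'utf8');
  return raw
    .replace(/^\uFEFF/, '')    // drop a leading byte order mark, if present
    .replace(/\r\n?/g, '\n');  // convert CRLF or lone CR line endings to LF
}

// Demo: write a small file to the temp directory, read it back, clean up.
const demoPath = path.join(os.tmpdir(), 'read-srt-demo.srt');
fs.writeFileSync(
  demoPath,
  '\uFEFF1\r\n00:00:20,000 --> 00:00:24,400\r\nThis is the first subtitle.\r\n',
  'utf8'
);
const srt = readSrtFile(demoPath);
fs.unlinkSync(demoPath);

console.log(JSON.stringify(srt));
```

Note that readFileSync() throws if the file does not exist, so a real script would typically wrap the call in try/catch.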
Once we have the contents of the file, we can use a regular expression to extract the subtitle text. The string match() method can pick out the lines that are neither a timecode nor a subtitle number.
const srtText = srt.match(/^[^\d]+(.+)/gm);
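The regular expression /^[^\d]+(.+)/gm is meant to match lines that do not start with a digit, which skips the subtitle numbers and the timecodes. The g flag stands for global, so every match is returned rather than only the first, and m stands for multiline, so ^ matches at the start of each line rather than only at the start of the string. Note that [^\d] also matches newline characters, so a match can run past the end of its line and pick up the following subtitle number, and any subtitle line that begins with a digit is skipped.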
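A quick check on a two-cue sample shows what the pattern captures in practice. The second pattern is an assumed alternative, not part of the original approach: it keeps every match on a single line by rejecting lines made up only of digits and lines that contain an arrow.

```javascript
const sample = [
  '1',
  '00:00:20,000 --> 00:00:24,400',
  'This is the first subtitle.',
  '',
  '2',
  '00:00:24,500 --> 00:00:28,200',
  'This is the second subtitle.',
  '',
].join('\n');

// Pattern from the text: [^\d]+ is free to cross line breaks, so the
// first match runs on until it reaches the digit of the next cue number.
const original = sample.match(/^[^\d]+(.+)/gm);
console.log(original);
// [ 'This is the first subtitle.\n\n2', 'This is the second subtitle.' ]

// Assumed alternative: skip digit-only lines and timecode lines, and
// match the rest of each remaining line ("." never matches a newline).
const textLine = /^(?!\d+$)(?!.*-->).+$/gm;
const stricter = sample.match(textLine) || [];
console.log(stricter);
// [ 'This is the first subtitle.', 'This is the second subtitle.' ]
```

The stricter pattern still has a limit: a subtitle line consisting only of digits, such as "42", would be mistaken for a cue number and dropped.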
The match() method returns an array of the matched strings, or null if nothing matched. We can then use the join() method to combine all the strings in the array into a single string.
const srtText = (srt.match(/^[^\d]+(.+)/gm) || []).join('\n');
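As a possible alternative to matching line by line, the text can be selected by its position inside each block: the first two lines are the number and the timecode, and everything after them is subtitle text. The following is a minimal sketch of that idea; `extractSubtitleText` is a hypothetical helper name, and it assumes blocks are separated by blank lines and contain no blank lines of their own.

```javascript
// Hypothetical helper: return the text of every block, joined by newlines.
function extractSubtitleText(srt) {
  return srt
    .replace(/^\uFEFF/, '')        // tolerate a leading byte order mark
    .replace(/\r\n?/g, '\n')       // normalize line endings to LF
    .split(/\n{2,}/)               // blocks are separated by blank lines
    .map((block) => block.trim().split('\n'))
    // Keep only well-formed blocks: number, timecode, at least one text line.
    .filter((lines) => lines.length >= 3 && lines[1].includes('-->'))
    .map((lines) => lines.slice(2).join('\n'))
    .join('\n');
}

const sample = [
  '1',
  '00:00:20,000 --> 00:00:24,400',
  'This is the first subtitle.',
  '',
  '2',
  '00:00:24,500 --> 00:00:28,200',
  '2 fast',
  '2 furious',
  '',
].join('\n');

console.log(extractSubtitleText(sample));
```

Because the text is chosen by position rather than by its first character, subtitle lines that begin with a digit are kept, and an empty input yields an empty string instead of an error.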
graph TD;
    A[Read SRT File] --> B[Extract Subtitle Text];
    B --> C[Join Subtitle Text];