I have been experimenting with different ways to subtitle videos, because putting subtitles on speech-heavy videos longer than 3 minutes with Sony Vegas is s l o w. I searched around a bit and found a way to hardcode subtitles onto a video using a combination of Youtube's fabulous subtitling tools and the free software Handbrake. I uploaded my transcribed Spanish text to Youtube, and it automatically assigned the timestamps.
I had only one problem: I needed to translate the subtitles from Spanish to English. I downloaded the subtitle file from Youtube and pasted it into Google Translate to quickly get a (very) rough translation, which I then thoroughly reviewed for errors.
The biggest error was something odd that happened to the formatting in the translation process. The timestamps in the Spanish version look like this:
```
1
00:00:00,770 --> 00:00:08,490
Muchas gracias, pastor Coby y toda la congregación y liderazgo de tan distinguida iglesia. Le

2
00:00:08,490 --> 00:00:15,029
agradezco sinceramente que me permita dirigirme a su congregación para compartir nuestras
```
When I translated this document into English using Google Translate, they looked like this:
```
1
00: 00: 00,770 -> 00: 00: 08,490
Thank you very much Pastor Coby and the entire congregation and leadership of such a distinguished church.

2
00: 00: 08,490 -> 00: 00: 15,029
I sincerely thank you for allowing me to address to your congregation to share our
```
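For reference, a well-formed SRT timing line has the shape HH:MM:SS,mmm --> HH:MM:SS,mmm, so the translated version breaks it in two ways: it adds a space after each colon and shortens the arrow to "->". A small Ruby check can tell the two apart. This is just a sketch; the SRT_TIMING pattern and the valid_timing_line? name are my own, not part of any library.

```ruby
# A well-formed SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
SRT_TIMING = /\A\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\z/

# Returns true only for a correctly formatted timing line.
# strip removes surrounding whitespace, including a trailing "\r" from Windows line endings.
def valid_timing_line?(line)
  SRT_TIMING.match?(line.strip)
end

puts valid_timing_line?('00:00:00,770 --> 00:00:08,490')   # original Spanish file
puts valid_timing_line?('00: 00: 00,770 -> 00: 00: 08,490') # after Google Translate
```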
After manually fixing the format of a few entries, I realized this would be so much easier if I could just use code to reformat everything.
And so I did! Worked like a charm!
```ruby
=begin
Input: text
Output: text
Rules:
Problem: break up the text based on newlines. Every time there is a new line
that starts with 0, fix the formatting.
DS: strings, array
Algo:
define a method called format_timestamps, takes one parameter, text
split text into array of strings based on newline characters
iterate over strings in the array and if the string starts with '0',
  remove all spaces
  replace the '-' character with ' --'
  replace the '>' character with '> '
join the array back together as a string
=end

def format_timestamps(text)
  arr_text = text.split("\n")
  arr_text.map do |str|
    # Timing lines start with "00:"; subtitle numbers and (normally) text lines don't.
    if str.start_with?('0')
      str.gsub!(' ', '')      # "00: 00: 00,770 -> ..." becomes "00:00:00,770->..."
      str.gsub!('-', ' --')   # "->" becomes " -->"
      str.gsub!('>', '> ')    # add the space after the arrow
      str                     # gsub! returns nil when nothing matches, so return str itself
    else
      str
    end
  end.join("\n")
end
```
My initial algorithm called for removing the spaces after the colons and then adding an extra dash before the arrow, but then I realized I could remove all the spaces and add them back only where needed. Much easier!
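Here is a rough Ruby sketch comparing the two ideas on a single garbled timing line. The method names fix_by_targeting and fix_by_stripping are hypothetical, fix_by_targeting is only my reconstruction of the first idea, and both fold the dash and arrow-space steps into a single sub('->', ...) call.

```ruby
GARBLED = '00: 00: 00,770 -> 00: 00: 08,490'

# First idea: remove only the space after each colon, then lengthen the arrow.
def fix_by_targeting(line)
  line.gsub(': ', ':').sub(' -> ', ' --> ')
end

# Final idea: remove every space, then put spaces back around a lengthened arrow.
def fix_by_stripping(line)
  line.delete(' ').sub('->', ' --> ')
end

puts fix_by_targeting(GARBLED)
puts fix_by_stripping(GARBLED)
```

Both print 00:00:00,770 --> 00:00:08,490 for this line. The targeting version only works if the stray spaces land exactly where expected (after each colon and around the arrow), while stripping every space doesn't care where they ended up, which is probably why it felt easier.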
I seem to remember that it's not a very good idea to pack so many steps into one block, like the three substitution steps inside the map block of my format_timestamps method. It would probably be better to write some helper methods and call them inside the block. If I kept working on this program, that would also improve readability, because I could give the helper methods descriptive names.
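As a rough sketch of that refactor, the block could delegate each step to a small helper. The names timestamp_line?, remove_spaces, and widen_arrow are hypothetical, picked just to show how descriptive names read inside the block.

```ruby
# Hypothetical helpers: each one handles a single step of the cleanup.
def timestamp_line?(line)
  line.start_with?('0')
end

def remove_spaces(line)
  line.delete(' ')
end

# Turns "->" into " --> " (extra dash plus a space on each side).
def widen_arrow(line)
  line.sub('->', ' --> ')
end

def format_timestamps(text)
  text.split("\n").map do |line|
    timestamp_line?(line) ? widen_arrow(remove_spaces(line)) : line
  end.join("\n")
end

sample = "1\n00: 00: 00,770 -> 00: 00: 08,490\nThank you very much Pastor Coby"
puts format_timestamps(sample)
```

The block now reads almost like the algorithm notes: if it's a timestamp line, remove the spaces and widen the arrow; otherwise leave it alone.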
Another fun process I want to practice again is opening, editing, and closing the .srt file directly, without copy-pasting all the text into my program. For today, though, my text was short enough that it didn't really matter.
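A minimal sketch of that, assuming the translated subtitles are saved as a UTF-8 .srt file. File.read and File.write each open and close the file for you, so there's no separate close step. The method name fix_srt_file is hypothetical, and it writes to a new file so the original stays untouched.

```ruby
# Reads a translated .srt file, fixes the timing lines, and writes the result to output_path.
def fix_srt_file(input_path, output_path)
  text = File.read(input_path, encoding: 'UTF-8')
  # lines keeps each line's newline, so blank lines and the final newline survive.
  fixed = text.lines.map do |line|
    # Timing lines start with "0", e.g. "00: 00: 00,770 -> 00: 00: 08,490"
    line.start_with?('0') ? line.delete(' ').sub('->', ' --> ') : line
  end.join
  File.write(output_path, fixed)
end
```

Calling something like fix_srt_file('translated.srt', 'translated_fixed.srt') (hypothetical file names) would write a fixed copy next to the original. Like the original rule, this treats any line starting with 0 as a timing line, so a line of dialogue that happened to begin with a zero would also lose its spaces.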