This project aims to semi-automatically extract hardcoded subtitles. It contains two submodules, SubGen and SubOCR. In addition, you need VideoSubFinder to do the work.
Sometimes you have a low-quality video file with hardcoded subtitles and an HD video file of the same movie without subtitles. Or the hardcoded subtitles are in a different language and you want to translate them. Either way, you have only two options: a) create a new subtitle file from scratch by typing the lines yourself, or b) extract the subtitles from the video file automatically. Of course, you will choose wisely.
The overall workflow is as follows:
- Use VideoSubFinder to open the video file with hardcoded subtitles and save images containing the subtitles along with their timings
  - This process is almost automatic. All you need to do is wait about 10 to 30 minutes for a 2-hour movie with 800 to 1500 subtitles
- Use SubOCR to scan the images and convert them into plain text
  - This process is almost automatic. All you need to do is wait about 10 minutes for 1200 subtitle images
- Use SubGen to visually proofread the subtitles generated by SubOCR
  - This is the only manual process. Usually, the OCR results are quite accurate, and if you are familiar with this program, you will spend only about 15 minutes on 1200 subtitles
This README mainly introduces the usage of SubGen. Some important notes for the other two programs are also briefly covered. For more details about SubOCR, please refer to its README.
This project mainly targets Windows users. All three programs can be run on Windows XP, 7, and 10, either 32-bit or 64-bit OS.
- VideoSubFinder is C++-based. Therefore, Microsoft Visual C++ Redistributable for Visual Studio 2015-2019 is needed (Download here). However, it is very likely that you have already installed it because a lot of other software depends on it
- SubOCR is Ruby-based and therefore cross-platform. On Windows, there is NO additional requirement (NO NEED to install Ruby, unless you, as a developer, want to use your own Ruby environment). However, you MUST have access to the Internet because the program sends requests to the web-based BaiduOCR API
- SubGen is .NET-based. If your OS is Win7, there is nothing you need to do. If you are running Win10, you might need to enable .NET Framework 3.5. It is very simple, though: just double-click the executable and follow the system prompts
Download the latest release of SubOCR and SubGen, and extract the zip archive, preferably into the same folder as VideoSubFinder.
- First of all, remember to click "Clear folders" in VideoSubFinder to remove any files generated earlier for other videos
- Next, set the beginning time, ending time, and the region of the subtitles; then, click "Run Search" to get the images (Fig. 1). Generally speaking, the default search parameters are good enough
- There is NO NEED to run "Create Cleared Text Images", and you should NOT do so. The RGBImages will suffice for SubOCR, and there will be no risk of data loss (for example, in Fig. 1, the first line will be ignored by the program in the "cleared text images"); in addition, a lot of time will be saved
- Now you can close VideoSubFinder
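If you want to inspect the timings yourself, the following is a minimal Python sketch that reads a start and end time from an image filename. It assumes the filenames in RGBImages begin with a pattern like `H_MM_SS_mmm__H_MM_SS_mmm`; this pattern and the function name `parse_timing` are assumptions for illustration and are not taken from this project, so please check the actual filenames first.

```python
import re
from datetime import timedelta

# Assumed filename pattern: H_MM_SS_mmm__H_MM_SS_mmm followed by anything.
# This is an assumption for illustration; verify it against your RGBImages folder.
TIMING = re.compile(
    r"^(\d+)_(\d{2})_(\d{2})_(\d{3})__(\d+)_(\d{2})_(\d{2})_(\d{3})"
)


def parse_timing(filename):
    """Return (start, end) as timedelta objects, or None if the name does not match."""
    match = TIMING.match(filename)
    if match is None:
        return None
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
    end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
    return start, end
```

Filenames that do not match the assumed pattern return `None`, so stray files in the folder are skipped rather than causing an error.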
- Before running SubOCR, you need to configure the program properly by editing SubOCR.rb as instructed here
  - In particular, you MUST apply for a BaiduOCR API_Key and replace the placeholder at the beginning of SubOCR.rb as instructed here
- Generally speaking, the default settings are good enough. You can just double-click SubOCR.exe to run it, and the results will be saved to the TXTResults folder
  - For more details, see here
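Because SubOCR depends on a web API, it can be handy to check whether every image received a result. Below is a minimal Python sketch, not part of this project, that lists images without a matching text file. It assumes each result in TXTResults is a `.txt` file with the same base name as its image; the helper name `missing_results` and the list of image extensions are hypothetical.

```python
from pathlib import Path

# Assumed image extensions; adjust to what your RGBImages folder actually contains.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png", ".bmp"}


def missing_results(images_dir, results_dir):
    """Return sorted image base names that have no matching .txt file in results_dir."""
    images = {
        p.stem
        for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    # Assumption: each OCR result is saved as <image base name>.txt
    results = {p.stem for p in Path(results_dir).glob("*.txt")}
    return sorted(images - results)
```

For example, `missing_results("RGBImages", "TXTResults")` returns an empty list when every image has a result under this naming assumption.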
- Select the folder of VideoSubFinder and set the region of the subtitles in the images (Fig. 2)
- In the revision window, compare the original images with the OCR results and correct anything wrong (Fig. 3)
  - This can be done while SubOCR is still running to save time
- In the review window, make final changes and save the subtitle file (Fig. 4)
- Now you will return to the first window (Fig. 2), and you can start processing another video by repeating the steps above
Fig. 2 SubGen
Fig. 3 The revision window
Fig. 4 The review window
- In the first interface (Fig. 2), the program automatically detects the subtitle region. You can then adjust it manually by dragging with the left mouse button (to set the top-left corner) and the right mouse button (to set the bottom-right corner)
  - This need not be very accurate. The cropped region will be shown on the left in the revision window
- Then, press Enter to proceed to the revision window (Fig. 3)
- You can hover the mouse over different places to see tips, the timing, and the corresponding filename of the OCR result
- There will be an asterisk (*) in the window caption if the page has been saved before
- The vertical line character (|) in the text means there was a line break. The color of the text indicates the status:
  - Black: the original OCR results
  - Blue: the text has been altered
  - Red: the line segmentation is incorrect, and you may want to transfer some lines from one textbox to another (the cause of and the workaround for this issue are explained in the README of SubOCR)
- The hotkeys:
  - Enter: Save the current page and move to the next page
  - Esc: Close the window and proceed to the review window without saving
  - PageUp/PageDown: Move to the previous/next page without saving
    - If you are already at the first page, PageUp will take you to the last one, and vice versa
  - ↑/↓: Move to the previous/next textbox
    - If you are already at the first textbox, ↑ will take you to the last one, and vice versa
  - Shift+↑/Shift+↓: Select the text from the beginning to the current position, or the other way around
    - You can use Shift+↑ followed by ← to move the cursor to the left end, and vice versa
  - Ctrl+↑/Ctrl+↓: Transfer the selected text to the previous/next textbox
    - You cannot use this function if you are already at the first or the last textbox
- Then, press Esc to proceed to the review window (Fig. 4)
  - The cursor will be at the first line of the subtitles. If you scroll up, you can define the styles of the text, etc.
  - Press Shift+Enter or Ctrl+Enter to close the window and save
- Done!
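Two of the conventions described above can be sketched in a few lines of Python: the `|` marker that stands for a line break, and the wrap-around behavior of PageUp/PageDown and ↑/↓. This is only an illustration of the behavior as documented here, not SubGen's actual code; the function names `split_lines` and `wrap_index` are hypothetical.

```python
def split_lines(text):
    """Split an OCR result on the '|' marker into its individual lines."""
    return [part.strip() for part in text.split("|")]


def wrap_index(current, step, count):
    """Move by step (-1 or +1) among count items, wrapping around at both ends."""
    return (current + step) % count
```

With five pages, `wrap_index(0, -1, 5)` gives `4` (the last page) and `wrap_index(4, 1, 5)` gives `0` (the first page), which matches the "and vice versa" notes in the hotkey list.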