ocr_bitmap can run out of buffer memory copying the "last font tag" #1586

Open
wants to merge 9 commits into
base: master


@jstrot jstrot commented Jan 7, 2024

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Version: 0.94

During OCR of a VOB program stream (PS), ccextractor can run out of buffer space when it has to copy all the text since the last font tag (which may also be the beginning of the input):

$ ./ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt
...
Error: In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.

I believe the bug has existed since that piece of code was introduced back in 2017 (#844).

The fix simply makes sure the allocated buffer is big enough for this extra string.
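
As a rough illustration of what "big enough" means here, the sketch below grows a buffer so that the bytes already written plus the extra string fit. It is not the code from this pull request: `ensure_room` and its parameters are hypothetical names, and growing with `realloc` is an assumption (the actual fix may simply allocate a larger buffer up front).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from ocr.c: make sure *buf can hold
 * used + extra bytes, growing it with realloc when needed.
 * Returns 0 on success, -1 on overflow or allocation failure
 * (in which case *buf and *cap are left untouched). */
static int ensure_room(char **buf, size_t *cap, size_t used, size_t extra)
{
	assert(buf != NULL && cap != NULL && used <= *cap);

	if (extra <= *cap - used)
		return 0; /* already fits */

	if (extra > (size_t)-1 - used)
		return -1; /* used + extra would wrap around */

	size_t needed = used + extra;
	char *grown = realloc(*buf, needed);
	if (grown == NULL)
		return -1;

	*buf = grown;
	*cap = needed;
	return 0;
}
```

Note that `realloc` may move the block, so any pointer into the old buffer (an output iterator, for example) would have to be recomputed from its offset after a successful call.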

Example crash under gdb:

$ gdb --args ./ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt
(gdb) run                     
Starting program: /home/jst/tools/src/ccextractor/linux/ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt  
[Thread debugging using libthread_db enabled]  
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".  
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.  
Teletext portions taken from Petr Kutalek's telxcc                                                               
--------------------------------------------------------------------------  
Input: test.vob                                                             
[Extract: 1] [Stream mode: Autodetect]                      
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]  
[Timing mode: Auto] [Debug: No] [Buffer input: No]                          
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]  
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]  
[Add font color data: Yes] [Add font typesetting: Yes]         
[Convert case: No][Filter profanity: No] [Video-edit join: No]  
[Extraction start time: not set (from start)]                        
[Extraction end time: not set (to end)]                              
[Live stream: No] [Clock frequency: 90000]              
[Teletext page: Autodetect]                                     
[Start credits text: None]                       
[Quantisation-mode: CCExtractor's internal function]  
                                                 
-----------------------------------------------------------------  
Opening file: test.vob                           
File seems to be a program stream, enabling PS mode   
Analyzing data in general mode                   
                                                                   
                                                 
New video information found                          
[720 * 480] [AR: 02 - 4:3] [FR: 04 - 29.97] [progressive: no]  
   
  0%  |  00:00                                   
...                          
Skip forward to the next Sequence or GOP start.  
 95%  |  19:38  
Skip forward to the next Sequence or GOP start.  
  
Skip forward to the next Sequence or GOP start.  
  
Thread 1 "ccextractor" hit Breakpoint 1, fatal (exit_code=1000, fmt=0x555555ee8da0 "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n") at ../src/lib_ccx/utility.c:272  
272             va_start(args, fmt);  
(gdb) up  
#1  0x00005555557976ed in ocr_bitmap (arg=0x602000008250, palette=0x602000b1c390, alpha=0x602000b1c3b0 "", indata=0x62a000726200 "", w=556, h=42, copy=0x60400003c210) at ../src/lib_ccx/ocr.c:638  
638                                                             fatal(CCX_COMMON_EXIT_BUG_BUG, "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n", errno);  
(gdb) list  
633                                             {  
634                                                     if ((new_text_out_iter - new_text_out) +  
635                                                             (last_font_tag_end - last_font_tag) >  
636                                                         length)  
637                                                     {  
638                                                             fatal(CCX_COMMON_EXIT_BUG_BUG, "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n", errno);  
639                                                     }  
640                                                     memcpy(new_text_out_iter, last_font_tag, last_font_tag_end - last_font_tag);  
641                                                     new_text_out_iter += last_font_tag_end - last_font_tag;  
642                                             }  
(gdb) p new_text_out_iter - new_text_out  
$1 = 96  
(gdb) p last_font_tag_end - last_font_tag  
$2 = 76  
(gdb) p length  
$3 = 158  
(gdb) p new_text_out_iter - new_text_out + last_font_tag_end - last_font_tag  
$4 = 172
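
The numbers printed above show why the guard fires: 96 bytes are already written, the span since the last font tag is 76 bytes, and 96 + 76 = 172 exceeds the 158-byte `length` by 14 bytes. The same comparison can be sketched as a stand-alone function; `append_would_overflow` is a hypothetical name used only for illustration, and the check is rewritten as a subtraction so the sum cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone version of the guard shown at ocr.c:634-636.
 * Returns 1 when appending `extra` bytes after `used` bytes would
 * exceed `capacity`. Written as a subtraction so that used + extra
 * cannot wrap around. */
static int append_would_overflow(size_t used, size_t extra, size_t capacity)
{
	assert(used <= capacity);
	return extra > capacity - used;
}
```

With the values from this session, `append_would_overflow(96, 76, 158)` is true; the largest span that would still fit is 62 bytes.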

Before reaching this point, I also had to fix an ASan error caused by process_spu calling memcpy on overlapping buffers. I can't say I understand why the buffers overlap, but using memmove at least fixes the error.

==611746==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x7fffdf1eae84,0x7fffdf1eb528) and [0x7fffdf1ea800, 0x7fffdf1eaea4) overlap
    #0 0x7ffff786db25 in __interceptor_memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:899
    #1 0x5555556c2302 in process_spu ../src/lib_ccx/dvd_subtitle_decoder.c:387
    #2 0x5555556fe994 in process_data ../src/lib_ccx/general_loop.c:662
    #3 0x555555701650 in process_non_multiprogram_general_loop ../src/lib_ccx/general_loop.c:968
    #4 0x555555702248 in general_loop ../src/lib_ccx/general_loop.c:1062
    #5 0x5555556738ee in api_start ../src/ccextractor.c:204
    #6 0x555555675c39 in main ../src/ccextractor.c:465
    #7 0x7ffff64456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #8 0x7ffff6445784 in __libc_start_main_impl ../csu/libc-start.c:360
    #9 0x555555672c50 in _start (/home/jst/tools/src/ccextractor/linux/ccextractor+0x11ec50) (BuildId: 466667d3e95ff9aa8e7b1165aeac946dcfc18371)
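
For context on the report above: both ranges are 0x6a4 (1700) bytes long and their start addresses are 0x684 (1668) bytes apart, so they share 32 bytes. `memcpy` has undefined behaviour when source and destination overlap, while `memmove` is specified to handle it. The sketch below is illustrative only; `ranges_overlap` and `drop_prefix` are hypothetical helpers and do not come from dvd_subtitle_decoder.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the half-open ranges [a, a+n) and [b, b+n) share bytes. */
static int ranges_overlap(uint64_t a, uint64_t b, uint64_t n)
{
	return a < b + n && b < a + n;
}

/* Hypothetical example: drop the first n bytes of a NUL-terminated
 * string in place. Source and destination overlap, so memmove is
 * required; memcpy would be undefined behaviour here. */
static void drop_prefix(char *s, size_t n)
{
	size_t len = strlen(s);
	assert(n <= len);
	memmove(s, s + n, len - n + 1); /* +1 copies the terminator */
}
```

Calling `ranges_overlap(0x7fffdf1eae84, 0x7fffdf1ea800, 0x6a4)` with the addresses from the ASan report returns 1, which matches the reported overlap.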

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 79aaf86...:

Report Name    Tests Passed
Broken         0/13
CEA-708        0/14
DVB            0/7
DVD            0/3
DVR-MS         0/2
General        0/27
Hauppage       0/3
MP4            0/3
NoCC           0/10
Options        0/86
Teletext       0/21
WTV            0/13
XDS            0/34

All tests passing on the master branch were passed completely.

NOTE: The following tests have been failing on the master branch as well as the PR:


Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 280939d...:

Report Name    Tests Passed
Broken         0/13
CEA-708        0/14
DVB            0/7
DVD            0/3
DVR-MS         0/2
General        0/27
Hauppage       0/3
MP4            0/3
NoCC           0/10
Options        0/86
Teletext       0/21
WTV            0/13
XDS            0/34

All tests passing on the master branch were passed completely.

NOTE: The following tests have been failing on the master branch as well as the PR:


Check the result page for more info.
