[WIP] Set up a Rust native version of Hardsubx context and rewrite hardsubx.c #1458


Open: wants to merge 30 commits into master


shashwat1002 (Contributor)

  • constructor function (new) that takes the options as an argument [this will replace init_hardsubx]
  • Default implementation for the hardsubx context
  • Default implementation for cc_subtitle

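The bullets above map naturally onto a constructor plus Rust's Default trait. The following is a minimal sketch only: every type, field and variant name in it (OcrMode, HardsubxOptions, HardsubxContext, CcSubtitle, frames_processed, and so on) is a hypothetical stand-in rather than the PR's actual definitions, and the zero/empty defaults are placeholders, not CCExtractor's real defaults.

```rust
/// Hypothetical OCR granularity; the real enum in mod.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OcrMode {
    #[default]
    Frame,
    Word,
    Letter,
}

/// Hypothetical subset of the user options the context is built from.
#[derive(Debug, Clone)]
pub struct HardsubxOptions {
    pub ocr_mode: OcrMode,
    pub min_sub_duration: f32,
    pub conf_thresh: f32,
    pub lum_thresh: f32,
    pub detect_italics: bool,
}

/// Hypothetical native context. The derived `Default` yields zeroed/empty
/// values (placeholders), standing in for hand-written C initialization.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct HardsubxContext {
    pub ocr_mode: OcrMode,
    pub min_sub_duration: f32,
    pub conf_thresh: f32,
    pub lum_thresh: f32,
    pub detect_italics: bool,
    /// Hypothetical runtime state that does not come from the options.
    pub frames_processed: u64,
}

impl HardsubxContext {
    /// Builds a context from the options (the role init_hardsubx played);
    /// any field not taken from the options falls back to `Default`.
    pub fn new(options: &HardsubxOptions) -> Self {
        Self {
            ocr_mode: options.ocr_mode,
            min_sub_duration: options.min_sub_duration,
            conf_thresh: options.conf_thresh,
            lum_thresh: options.lum_thresh,
            detect_italics: options.detect_italics,
            ..Self::default()
        }
    }
}

/// Hypothetical, heavily simplified stand-in for cc_subtitle.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CcSubtitle {
    pub text: Option<String>,
    pub start_time_ms: i64,
    pub end_time_ms: i64,
    pub got_output: bool,
}
```

Using struct-update syntax (..Self::default()) in new keeps the constructor short and means a newly added field only needs a sensible Default, not an edit to every construction site.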
shashwat1002 (Contributor, Author)

@PunitLodha please check if the new struct looks okay

PunitLodha (Member)

@prateekmedia Since you made the change to rsmpeg, is it a drop-in replacement? Does this PR need any changes?

prateekmedia (Member) commented Mar 15, 2023

@PunitLodha you can check what CI reports for the build_ocr_hardsubx (Linux) job. If it passes, no changes are needed; otherwise, I think some imports may need to be added.

For example, ffmpeg_sys_next::* is not equivalent to rsmpeg::*, so some dependencies need to be imported manually. See my implementation in mod.rs and utility.rs.

cfsmp3 (Contributor) commented Mar 3, 2024

@prateekmedia @IshanGrover2004 @PunitLodha thoughts on this PR?

PunitLodha (Member)

Code looks fine, but it needs testing before it can be merged. @prateekmedia @cfsmp3

cfsmp3 (Contributor) commented Mar 23, 2025

@shashwat1002 if this still applies, can you rebase?

prateekmedia requested a review from Copilot on March 29, 2025 19:25
Copilot AI left a comment

Pull Request Overview

This PR sets up a native Rust version of the HardsubX context and rewrites portions of the hardsubx C implementation. The changes include introducing a new constructor (new) that takes options as an argument, adding default implementations for the hardsubx context and cc_subtitle, and refactoring OCR-related functions to use native Rust types.

Reviewed Changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 1 comment.

Summary per file:
  • src/rust/src/lib.rs: Introduces a new enum for encoding types required for native HardsubX processing
  • src/rust/src/hardsubx/utility.rs: Adds an edit_distance_string function using Rust idioms
  • src/rust/src/hardsubx/mod.rs: Adds context-related enums, struct initializations, and a new constructor for the context
  • src/rust/src/hardsubx/main.rs: Integrates the new context and processing flows for burned-in subtitle extraction
  • src/rust/src/hardsubx/classifier.rs: Refactors OCR functions to return Rust Strings rather than raw C strings
  • src/rust/build.rs: Updates the allowlist and symbol list for FFI bindings
  • src/rust/Cargo.toml: Adds new dependencies to support numeric types and traits
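The summary mentions an edit_distance_string function in utility.rs without showing it. Purely as an illustration, here is a minimal Levenshtein-distance sketch in idiomatic Rust; the (&str, &str) -> usize signature and the choice to count chars rather than bytes are assumptions, not the PR's code.

```rust
/// Levenshtein distance between two strings, counted in Unicode scalar
/// values (`char`s) rather than bytes. Keeps only one previous row, so
/// memory use is O(len(b)).
pub fn edit_distance_string(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the part of `a` processed so far and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = Vec::with_capacity(b_chars.len() + 1);
        // Turning a[..=i] into the empty prefix of `b` takes i + 1 deletions.
        curr.push(i + 1);
        for (j, &cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}
```

Comparing OCR text from consecutive frames (treating a small distance as the same subtitle line) is one plausible use for such a function, but that is an assumption here rather than something this PR page states.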
Files not reviewed (2)
  • src/lib_ccx/hardsubx.c: Language not supported
  • src/rust/wrapper.h: Language not supported
Comments suppressed due to low confidence (2)

src/rust/src/lib.rs:31

  • [nitpick] The enum name 'ccx_encoding_type' does not follow Rust naming conventions. Consider renaming it to 'CcxEncodingType'.
pub enum ccx_encoding_type {
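Following this nitpick, the enum could take a CamelCase name while keeping a C-compatible layout. This is a sketch that assumes the variants and discriminants mirror the C enum ccx_encoding_type (UNICODE = 0, LATIN_1 = 1, UTF_8 = 2, ASCII = 3) and that raw values arrive over FFI as u32. Verify both assumptions against the C header before relying on them.

```rust
/// Output text encoding. Variants and discriminants are ASSUMED to mirror
/// the C `enum ccx_encoding_type`; check them against the C header.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CcxEncodingType {
    Unicode = 0,
    Latin1 = 1,
    Utf8 = 2,
    Ascii = 3,
}

impl CcxEncodingType {
    /// Converts a raw value received from C, returning `None` for unknown
    /// discriminants instead of transmuting them (which would be UB).
    pub fn from_raw(value: u32) -> Option<Self> {
        match value {
            0 => Some(Self::Unicode),
            1 => Some(Self::Latin1),
            2 => Some(Self::Utf8),
            3 => Some(Self::Ascii),
            _ => None,
        }
    }
}
```

If the Rust type has to keep the C spelling (for instance, to match bindgen output), putting #[allow(non_camel_case_types)] on the enum silences the naming lint instead of renaming it.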

src/rust/src/hardsubx/mod.rs:24

  • The import 'use std::matches;' is unnecessary or incorrect since the 'matches!' macro is available in the prelude. Consider removing this import.
use std::matches;
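On this second suppressed comment: use std::matches; does compile, but it is redundant. The matches! macro (stable since Rust 1.42) is exported by the standard library and can be used in any module without an import. Here is a small illustration using a hypothetical OcrMode enum, not the PR's actual type:

```rust
// No `use std::matches;` is needed: `matches!` comes from the standard
// library and is usable in every module without an explicit import.
#[derive(Debug, Clone, Copy)]
enum OcrMode {
    Frame,
    Word,
    Letter,
}

/// Hypothetical helper: true when OCR runs at word or letter granularity.
fn is_word_or_letter(mode: OcrMode) -> bool {
    matches!(mode, OcrMode::Word | OcrMode::Letter)
}
```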

 } else {
-    text_out = TessBaseAPIGetUTF8Text((*ctx).tess_handle);
+    text_out = TessBaseAPIGetUTF8Text(ctx.tess_handle);

Copilot AI Mar 29, 2025


Memory allocated by TessBaseAPIGetUTF8Text is not freed, which may lead to a memory leak. Ensure that TessDeleteText is called on the returned pointer after converting it to a Rust String.
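The underlying issue is that the string returned by TessBaseAPIGetUTF8Text is owned by Tesseract and must be released with TessDeleteText once it has been copied. Below is a minimal sketch of a helper that enforces the copy-then-free order. The name take_c_string and its closure-based free parameter are hypothetical; they are chosen so the example runs without linking Tesseract.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Copies a NUL-terminated C string into an owned `String`, then passes the
/// original pointer to `free` so the C library can release its allocation.
/// Returns `None` (and frees nothing) when `ptr` is null. Invalid UTF-8 is
/// replaced with U+FFFD rather than rejected.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string, and `free`
/// must be the deallocator matching the allocation behind `ptr`.
pub unsafe fn take_c_string<F>(ptr: *mut c_char, free: F) -> Option<String>
where
    F: FnOnce(*mut c_char),
{
    if ptr.is_null() {
        return None;
    }
    // Copy first: the borrowed `CStr` must not be used after `free` runs.
    let text = unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned();
    // Release the C-side buffer exactly once, after the copy.
    free(ptr);
    Some(text)
}
```

With the generated bindings, a call might look like unsafe { take_c_string(TessBaseAPIGetUTF8Text(ctx.tess_handle), |p| TessDeleteText(p)) }. This assumes TessDeleteText is included in the bindgen allowlist in build.rs.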



4 participants