Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added --limit=<seconds> flag to csv export. (Includes tests.) #15

Closed
wants to merge 2 commits into from

Conversation

Arthaey
Copy link
Contributor

@Arthaey Arthaey commented Jul 24, 2017

Addresses (my own) feature request #14.

Added --limit=<seconds> flag to csv export. Only the first seconds will be processed, then substudy will stop early. Useful when testing a new file or subtitle timing.

Added tests to confirm that the generated csv contains expected data.

Bumped cli_test_dir to version 0.1.2 to take advantage of new expect_file_contents and expect_contains methods.

…ll be

processed, then substudy will stop early. Addresses feature request emk#14.

Added tests to confirm that the generated csv contains expected data.

Bumped cli_test_dir to version 0.1.2 to take advantage of new
expect_file_contents and expect_contains methods.
@emk emk self-assigned this Jul 24, 2017
Copy link
Owner

@emk emk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great PR! Thank you very much for sending it.

I have just a couple of ideas for improvement, because I'm picky and unreasonable. 😉 But I think this would be a great addition.

And thank you so much for including tests. Yay, tests!

@@ -21,7 +21,7 @@ Subtitle processing tools for students of foreign languages

Usage: substudy clean <subs>
substudy combine <foreign-subs> <native-subs>
substudy export csv <video> <foreign-subs> [<native-subs>]
substudy export csv [--limit=<secs>] <video> <foreign-subs> [<native-subs>]
substudy export review <video> <foreign-subs> [<native-subs>]
substudy export tracks <video> <foreign-subs>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two thoughts:

  • Can we find a better name than --limit?
  • If we're going to support this for export csv, can we support it for all the export subcommands? I think they mostly use the same code internally, don't they?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't particularly like the name "limit" either, but I couldn't think of a better one at the time. "End-at" is much better, thanks! I've changed it to "end-at" throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to add it to the other exports before I knew if you'd even accept this PR, but I agree with you, it's much better to be consistent and not surprise/confuse users. So I've added it to the other 2 export subcommands.

@@ -129,6 +134,11 @@ impl Exporter {
self.native.as_ref()
}

/// Get the seconds to limit processing to.
pub fn processing_limit(&self) -> Option<f32> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to improve this name, too. processing_limit doesn't specify either what kind of processing we're doing, or the units we use to measure the limit. Maybe end_time? And then we could change --limit to --end-time, perhaps? I'm open to other suggestions, of course!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even --end-at and end_at if that looks better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to end[-_]at! :)

tests/tests.rs Outdated
@@ -54,6 +54,10 @@ fn cmd_combine() {
assert!(from_utf8(&output.stdout).unwrap().find("¡Si!").is_some());
}

static FIRST_LINE: &'static str = "[sound:empty_00060_828-00066_164.es.mp3],0:01:00.828,empty,\"<img src=\"\"empty_00063_496.jpg\"\" />\",¡Si! ¡Aang ha vuelto!,Yay! Yay! Aang\'s back!,,,¡Lo sabía!,I knew it!\n";

static LAST_LINE: &'static str = "[sound:empty_00077_124-00083_379.es.mp3],0:01:17.124,empty,\"<img src=\"\"empty_00080_251.jpg\"\" />\",\"El no ha hecho nada, Sokka, fué un accidente.\",Aang didn\'t do anything.,harás que vengan a nosotros.,\"You\'re leading them straight to us, aren\'t you?\",,\n";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FIRST_LINE and LAST_LINE test data is too specific, and it will cause the tests to break whenever we change the details of the sound naming, the exact time padding, or the HTML we output for Anki. It might be better to limit it to:

static FIRST_LINE: &'static str = "¡Si! ¡Aang ha vuelto!,Yay! Yay! Aang\'s back!,,,¡Lo sabía!,I knew it!";

...and something similar for LAST_LINE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see some test somewhere that confirms the exact format of the csv export, so we can be sure we don't break Anki support. But yeah, let's do that in a different PR in a better way than this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that changing to your suggestion without updating cli_test_dir will mean that we lose the test that verifies --end-at actually had the desired effect.

tests/tests.rs Outdated
testdir.expect_path("empty_csv/cards.csv");
testdir.expect_path("empty_csv/empty_00063_496.jpg");
testdir.expect_path("empty_csv/empty_00060_828-00066_164.es.mp3");
testdir.expect_file_contents("empty_csv/cards.csv", FIRST_LINE);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the corresponding negative tests here, both expect_not_path and expect_not_file_contents, or something like that. Please feel free to submit a PR against cli_test_dir with the APIs you need!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about accepting this PR's tests as-is, then when I've added expect_not_path and that's been made available, I can submit a follow-up PR here using the new APIs?

I prefer multiple PRs that keep things moving forward, rather than a series of dependent PRs... but also I recognize that sometimes "later" never happens. They're both your projects, so I'll defer to whichever you'd like.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd prefer to hold off on merging this PR until the tests are complete, because then we can both trivially verify that it works correctly without needing to hand-test it. And then I can be lazy and only make a single release of each project, too. 😉

The rest of this patch is looking really great! I'm impressed that your first Rust contributions are so clean and well-tested!

@Arthaey
Copy link
Contributor Author

Arthaey commented Jul 27, 2017

No, thank you for having tests already in the project, so it was easy to TDD this! I've never even looked at Rust before, so I was glad to have an "easy" entry point. :)

I agree with all your suggestions, but my free time is unpredictable, so I'm not sure when I'll get back to this. Thanks for replying quickly, though, even if I can't follow up as quickly!

@Arthaey
Copy link
Contributor Author

Arthaey commented Jul 30, 2017

For the record, I don't think you're being "nitpicky" at all! Keeping codebases clean, consistent, and well-factored is important. And since I'd never even looked at Rust or your project before, there are bound to be things I need to fix up. :)

}
_ => {}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a clean way in Rust to extract this 3x duplicated code block? The trick is that it uses break it stop iterating early, which a side-effect-y helper function usually can't do. But if I could get rid of the duplication, that would make me happier and the code easier to maintain. :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an if let statement might allow you to drop the _ => {} at least?

fn main() {
    let v1 = Some("world");
    if let Some(v2) = v1 {
        if v2 == "world" {
            println!("Hello, world!");
        }
    }
}

Another option (probably a better one) is to move the duplicated code into a method on exporter, so you can just:

if exporter.finishedBefore(seconds) {
    break;
}

Or even more ambitiously:

if !exporter.shouldExportAt(seconds) {
    next;
}

This version would also work for a --start-at option, which is inevitably going to be added once --end-at exists. :-)

Also, as a general rule for substudy, it's often worth replacing println! (when useful) with debug!. This will hide the extra output unless the user sets RUST_LOG=debug in the environment, but I generally prefer commands that don't talk much as long as they're doing what you asked them to do. ;-) But I do like generous use of debug! and trace!; it's proven extraordinarily useful in production. The log + env_logger crates are pretty awesome for something so simple.

@emk
Copy link
Owner

emk commented Aug 2, 2017

I've updated cli_test_dir to have the APIs that this patch needs. We're getting closer!

@emk
Copy link
Owner

emk commented Aug 20, 2017

Just as a followup, I think this is almost ready to merge. IIRC, it just needs some tests to prove that it honors --end-time and doesn't export any sound clips or CSV records beyond that time.

I've added the necessary APIs to cli_test_dir, so this should be pretty easy. No rush, just letting folks know the current status of this PR. Thank you for the work you've done on this!

@emk
Copy link
Owner

emk commented Nov 23, 2017

Hello, I'm putting some more work in on substudy, and I have some time to look at this if you're still interested. :-) There may be some merge conflicts because I cleaned up a bunch of serialization and error-handling code, but it shouldn't be anything too horrible.

@emk emk removed their assignment Nov 23, 2017
@emk
Copy link
Owner

emk commented May 5, 2024

This could still be merged if somebody were interested in cleaning up the merge conflicts, but the affected code will be heavily rewritten soon.

@emk emk closed this May 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants