Added --limit=<seconds> flag to csv export. (Includes tests.) #15

Arthaey · 2017-07-24T07:10:30Z

Addresses (my own) feature request #14.

Added --limit=<seconds> flag to csv export. Only the first seconds will be processed, then substudy will stop early. Useful when testing a new file or subtitle timing.

Added tests to confirm that the generated csv contains expected data.

Bumped cli_test_dir to version 0.1.2 to take advantage of new expect_file_contents and expect_contains methods.

…ll be processed, then substudy will stop early. Addresses feature request emk#14. Added tests to confirm that the generated csv contains expected data. Bumped cli_test_dir to version 0.1.2 to take advantage of new expect_file_contents and expect_contains methods.

emk

This is a great PR! Thank you very much for sending it.

I have just a couple of ideas for improvement, because I'm picky and unreasonable. 😉 But I think this would be a great addition.

And thank you so much for including tests. Yay, tests!

emk · 2017-07-24T14:05:20Z

src/bin/substudy.rs

@@ -21,7 +21,7 @@ Subtitle processing tools for students of foreign languages

 Usage: substudy clean <subs>
       substudy combine <foreign-subs> <native-subs>
-       substudy export csv <video> <foreign-subs> [<native-subs>]
+       substudy export csv [--limit=<secs>] <video> <foreign-subs> [<native-subs>]
       substudy export review <video> <foreign-subs> [<native-subs>]
       substudy export tracks <video> <foreign-subs>


Two thoughts:

Can we find a better name than --limit?

If we're going to support this for export csv, can we support it for all the export subcommands? I think they mostly use the same code internally, don't they?

I didn't particularly like the name "limit" either, but I couldn't think of a better one at the time. "End-at" is much better, thanks! I've changed it to "end-at" throughout.

I didn't want to add it to the other exports before I knew if you'd even accept this PR, but I agree with you, it's much better to be consistent and not surprise/confuse users. So I've added it to the other 2 export subcommands.

emk · 2017-07-24T14:10:52Z

src/export/exporter.rs

@@ -129,6 +134,11 @@ impl Exporter {
        self.native.as_ref()
    }

+    /// Get the seconds to limit processing to.
+    pub fn processing_limit(&self) -> Option<f32> {


We might be able to improve this name, too. processing_limit doesn't specify either what kind of processing we're doing, or the units we use to measure the limit. Maybe end_time? And then we could change --limit to --end-time, perhaps? I'm open to other suggestions, of course!

Or even --end-at and end_at if that looks better.

Renamed to end[-_]at! :)

emk · 2017-07-24T14:13:32Z

tests/tests.rs

@@ -54,6 +54,10 @@ fn cmd_combine() {
    assert!(from_utf8(&output.stdout).unwrap().find("¡Si!").is_some());
 }

+static FIRST_LINE: &'static str = "[sound:empty_00060_828-00066_164.es.mp3],0:01:00.828,empty,\"<img src=\"\"empty_00063_496.jpg\"\" />\",¡Si! ¡Aang ha vuelto!,Yay! Yay! Aang\'s back!,,,¡Lo sabía!,I knew it!\n";
+
+static LAST_LINE: &'static str = "[sound:empty_00077_124-00083_379.es.mp3],0:01:17.124,empty,\"<img src=\"\"empty_00080_251.jpg\"\" />\",\"El no ha hecho nada, Sokka, fué un accidente.\",Aang didn\'t do anything.,harás que vengan a nosotros.,\"You\'re leading them straight to us, aren\'t you?\",,\n";


The FIRST_LINE and LAST_LINE test data is too specific, and it will cause the tests to break whenever we change the details of the sound naming, the exact time padding, or the HTML we output for Anki. It might be better to limit it to:

static FIRST_LINE: &'static str = "¡Si! ¡Aang ha vuelto!,Yay! Yay! Aang\'s back!,,,¡Lo sabía!,I knew it!";

...and something similar for LAST_LINE.

I'd like to see some test somewhere that confirms the exact format of the csv export, so we can be sure we don't break Anki support. But yeah, let's do that in a different PR in a better way than this.

Note that changing to your suggestion without updating cli_test_dir will mean that we lose the test that verifies --end-at actually had the desired effect.

emk · 2017-07-24T14:15:39Z

tests/tests.rs

+    testdir.expect_path("empty_csv/cards.csv");
+    testdir.expect_path("empty_csv/empty_00063_496.jpg");
+    testdir.expect_path("empty_csv/empty_00060_828-00066_164.es.mp3");
+    testdir.expect_file_contents("empty_csv/cards.csv", FIRST_LINE);


We need the corresponding negative tests here, both expect_not_path and expect_not_file_contents, or something like that. Please feel free to submit a PR against cli_test_dir with the APIs you need!

How do you feel about accepting this PR's tests as-is, then when I've added expect_not_path and that's been made available, I can submit a follow-up PR here using the new APIs?

I prefer multiple PRs that keep things moving forward, rather than a series of dependent PRs... but also I recognize that sometimes "later" never happens. They're both your projects, so I'll defer to whichever you'd like.

Personally, I'd prefer to hold off on merging this PR until the tests are complete, because then we can both trivially verify that it works correctly without needing to hand-test it. And then I can be lazy and only make a single release of each project, too. 😉

The rest of this patch is looking really great! I'm impressed that your first Rust contributions are so clean and well-tested!

Arthaey · 2017-07-27T06:19:47Z

No, thank you for having tests already in the project, so it was easy to TDD this! I've never even looked at Rust before, so I was glad to have an "easy" entry point. :)

I agree with all your suggestions, but my free time is unpredictable, so I'm not sure when I'll get back to this. Thanks for replying quickly, though, even if I can't follow up as quickly!

Arthaey · 2017-07-30T18:44:26Z

For the record, I don't think you're being "nitpicky" at all! Keeping codebases clean, consistent, and well-factored is important. And since I'd never even looked at Rust or your project before, there are bound to be things I need to fix up. :)

Arthaey · 2017-07-31T15:00:38Z

src/export/csv.rs

+                }
+                _ => {}
+            }
+


Is there a clean way in Rust to extract this 3x duplicated code block? The trick is that it uses break it stop iterating early, which a side-effect-y helper function usually can't do. But if I could get rid of the duplication, that would make me happier and the code easier to maintain. :)

Maybe an if let statement might allow you to drop the _ => {} at least?

fn main() { let v1 = Some("world"); if let Some(v2) = v1 { if v2 == "world" { println!("Hello, world!"); } } }

Another option (probably a better one) is to move the duplicated code into a method on exporter, so you can just:

if exporter.finishedBefore(seconds) { break; }

Or even more ambitiously:

if !exporter.shouldExportAt(seconds) { next; }

This version would also work for a --start-at option, which is inevitably going to be added once --end-at exists. :-)

Also, as a general rule for substudy, it's often worth replacing println! (when useful) with debug!. This will hide the extra output unless the user sets RUST_LOG=debug in the environment, but I generally prefer commands that don't talk much as long as they're doing what you asked them to do. ;-) But I do like generous use of debug! and trace!; it's proven extraordinarily useful in production. The log + env_logger crates are pretty awesome for something so simple.

emk · 2017-08-02T14:19:00Z

I've updated cli_test_dir to have the APIs that this patch needs. We're getting closer!

emk · 2017-08-20T14:30:51Z

Just as a followup, I think this is almost ready to merge. IIRC, it just needs some tests to prove that it honors --end-time and doesn't export any sound clips or CSV records beyond that time.

I've added the necessary APIs to cli_test_dir, so this should be pretty easy. No rush, just letting folks know the current status of this PR. Thank you for the work you've done on this!

emk · 2017-11-23T21:48:24Z

Hello, I'm putting some more work in on substudy, and I have some time to look at this if you're still interested. :-) There may be some merge conflicts because I cleaned up a bunch of serialization and error-handling code, but it shouldn't be anything too horrible.

emk · 2024-05-05T00:55:56Z

This could still be merged if somebody were interested in cleaning up the merge conflicts, but the affected code will be heavily rewritten soon.

emk self-assigned this Jul 24, 2017

emk requested changes Jul 24, 2017

View reviewed changes

emk assigned Arthaey Jul 24, 2017

Change --limit to --end-at. Add option to all export subcommands.

23d87ed

Arthaey commented Jul 31, 2017

View reviewed changes

emk removed their assignment Nov 23, 2017

emk added help wanted enhancement labels Nov 23, 2017

emk closed this May 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added --limit=<seconds> flag to csv export. (Includes tests.) #15

Added --limit=<seconds> flag to csv export. (Includes tests.) #15

Arthaey commented Jul 24, 2017

emk left a comment

emk Jul 24, 2017

Arthaey Jul 30, 2017

Arthaey Jul 30, 2017

emk Jul 24, 2017

emk Jul 24, 2017

Arthaey Jul 30, 2017

emk Jul 24, 2017

Arthaey Jul 30, 2017

Arthaey Jul 30, 2017

emk Jul 24, 2017

Arthaey Jul 30, 2017

emk Jul 31, 2017

Arthaey commented Jul 27, 2017

Arthaey commented Jul 30, 2017

Arthaey Jul 31, 2017

emk Jul 31, 2017

emk commented Aug 2, 2017

emk commented Aug 20, 2017

emk commented Nov 23, 2017

emk commented May 5, 2024 •

edited

Loading

Added --limit=<seconds> flag to csv export. (Includes tests.) #15

Added --limit=<seconds> flag to csv export. (Includes tests.) #15

Conversation

Arthaey commented Jul 24, 2017

emk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Arthaey commented Jul 27, 2017

Arthaey commented Jul 30, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

emk commented Aug 2, 2017

emk commented Aug 20, 2017

emk commented Nov 23, 2017

emk commented May 5, 2024 • edited Loading

emk commented May 5, 2024 •

edited

Loading