Skip to content
This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

Check markdown rs #5621

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

Christopher-C-Robinson
Copy link

Draft of markdown parser

Signed-off-by: Domesticcadiz <[email protected]>
Signed-off-by: Domesticcadiz <[email protected]>
Signed-off-by: Domesticcadiz <[email protected]>
@Christopher-C-Robinson Christopher-C-Robinson added the wip Work in Progress (PR incomplete - needs more work or rework) label Apr 25, 2023
@Christopher-C-Robinson Christopher-C-Robinson requested a review from a team as a code owner April 25, 2023 15:38
@katacontainersbot katacontainersbot added the size/tiny Smallest and simplest task label Apr 25, 2023
Signed-off-by: Domesticcadiz <[email protected]>
@jodh-intel
Copy link
Contributor

jodh-intel commented Apr 25, 2023

Speaking to @Christopher-C-Robinson and @Toreseen, we know that the following needs to be done to this draft PR:

  • Remove the target/ binaries introduced by the editor we believe.
  • Remove the src/main_from_go.rs and src/validate_structure.rs.
  • Possibly split the functionality out of src/main.rs into one or more other files below src/.
  • Rewrite the README.md to contain details of the tool itself (it's currently a copy of the top-level README.md).
  • Squash all the commits down into one.
  • Update the commit format to match https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md#patch-format.

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Christopher-C-Robinson and @Toreseen - this is looking good! I've added a few comments.

More generally, I know we've discussed this. But looking at the code I do think it might make sense to split the main into a few other files as this grows. Maybe:

src/output.rs
src/node.rs
src/toc.rs
src/stats.rs

... and then add a corresponding mod output, mod node, etc in src/main.rs to "activate" these modules.

Also, it would be good to add a few unit tests as that will help shake out bugs and increase your confidence in the functions as you'll know they can tolerate all types of nasty input (writing tests can be fun as you try to find lots of devious inputs for your functions to handle 😈)

let input = fs::read_to_string(input_file_path)?;
let root = parse_document(&arena, &input, &ComrakOptions::default());

let toc = generate_toc(&root, &arena);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to only display this when the toc subcommand (or an option) is specified. We use a sub-command for the existing golang tool:

You'll want to cargo add clap and then ideally create structs and/or enums to represent, declaratively, the CLI options. For examples, see:

eprintln!("Usage: {} <input_file>", args[0]);
}

let input_file_path = env::current_dir()?.join("src").join(args[1].clone());
Copy link
Contributor

@jodh-intel jodh-intel Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit odd: why do we need to add src? Although it might work with cargo run -- $file, will it also work if you cargo install and then run check-markdown-rs "$file"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure if this is what you are lookin for, bit I'm pushing a commit now that removes the SRC from the path, assuming that you will already cd into the src directory.

Comment on lines 57 to 62
NodeValue::Heading(heading) => println!(
"{}Header (level {}): {}",
indent,
heading.level,
get_node_text(&node)
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know these blocks are small, but I'd be inclined to create a set of functions to handle each node type:

  • handle_heading_node(indent, level, text)
  • handle_text_node(indent, text)
  • handle_link_node(indent, title, url)

The advantages of doing this include:

  • simplifies print_nodes() and makes it easier to read.
  • makes it easier to unit test different types of nodes raising confidence in the parts of the code you have tested formally.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I added what you were looking for.

Comment on lines 116 to 121
return Err(String::from("Error: More than one H1 heading found."));
}
}

if level > last_seen_level + 1 {
return Err(String::from("Error: Invalid heading level order."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can you remove the Error: prefix from the errors? The type of these objects Result::Err already encodes that they are errors so it's best to let the higher levels handle adding prefixes like this (see comment on that topic below).
  • Can you include the heading in the error being returned here? In fact, ideally, you'd also add the line number and the filename to the error (either here or at a higher level) to make it 100% clear to the user which part of which file contains the error. This is particularly important when the tool starts to jump between files when it finds a cross-file link, and then finds, for example, an invalid link in another file with the same name as a link in the initial file (yep - I've been there and done that ;)

Copy link
Contributor

@Toreseen Toreseen Apr 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first part of this should be fixed now!

Comment on lines 307 to 310
println!(" Found {} invalid links:", errors.len());
for error in errors {
println!(" {}", error);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way you've collected all the errors together here and are displaying them at the end. Conventionally, errors should be displayed to standard error stream, so can you change the println!() calls to eprintln!() and add an "ERROR: " prefix to the call in the for loop?

@katacontainersbot katacontainersbot added size/huge Largest and most complex task (probably needs breaking into small pieces) and removed size/tiny Smallest and simplest task labels Apr 26, 2023
@katacontainersbot katacontainersbot added size/tiny Smallest and simplest task and removed size/huge Largest and most complex task (probably needs breaking into small pieces) labels Apr 30, 2023
@katacontainersbot katacontainersbot added size/huge Largest and most complex task (probably needs breaking into small pieces) and removed size/tiny Smallest and simplest task labels Apr 30, 2023
Comment on lines 16 to 17
cargo build --release
./target/release/markdown_parser input_file.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It might be safe to install the command to avoid the target line potentially being incorrect:

$ cargo install --path .
$ check-markdown-rs input_file.md

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

Comment on lines 18 to 20
if args.len() < 2 {
eprintln!("Usage: {} <input_file>", args[0]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • You need to call std::process::exit(1) after displaying the message: currently, the message is displayed but the program keeps running, then fails:
    $ check-markdown-rs
    Usage: check-markdown-rs <input_file>
    thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/main.rs:22:52
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
  • Please can you add some argument parsing code so that the user can run:
    $ check-markdown-rs --help
    $ check-markdown-rs --version

…curred in the markdown doc

Signed-off-by: Maxwell Wendlandt <[email protected]>
Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Christopher-C-Robinson and @Toreseen,

As we're very close to the end of the outreach project now, I suggest you rework this PR to remove the binary files, etc as follows. Note that once you've done this though, @Toreseen won't be able to git pull from @Christopher-C-Robinson's fork, so you'll both have to work together and have just @Christopher-C-Robinson git push -f to his fork once you've followed the steps below. The steps below are similar in principle to what we did in the meeting on Tuesday with Alec and Jenny's branch. Yes, this is "hacky" in the sense that we're not using git to resolve the problems - we're simply recreating the branch, but I think this approach is simpler to understand. As always, make sure you have a copy of your local branch/repo before doing the following:

# Update your copy of the main branch
git checkout main
git pull

# Switch to your feature branch
git checkout check-markdown-rs

# Save the files you've created for this new tool
cp cmd/check-markdown-rs/Cargo.toml cmd/check-markdown-rs/README.md cmd/check-markdown-rs/src/main.rs /tmp/

# Rename the original version of the branch (it's now a "backup")
git branch -m check-markdown-rs-original

# Create a new clean version of the feature branch
git checkout main
git checkout -b check-markdown-rs

# Recreate the branch
mkdir cmd/check-markdown-rs
cd cmd/check-markdown-rs
cp /tmp/Cargo.toml /tmp/README.md .
mkdir src
cp /tmp/main.rs src

# Add the new files to git
git add Cargo.toml README.md src/main.rs

# Save the changes
git commit -as

# Update your fork (which will automatically update this PR)
git push -f

Once you've done that, you'll also need to update cmd/README.md to add an entry for your new tool, so:

  • Add an entry for your new tool to cmd/README.md
  • git commit -as
  • Merge the new commit with the main commit by running git rebase -i main.
  • Change the 2nd commit to squash.
  • Save and quit the editor.
  • Update the PR again by running, git push -f

@@ -1,13 +0,0 @@
# CLI tools
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Christopher-C-Robinson and @Toreseen - I don't think this file should have been deleted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instructions above show you how to recreate the entire branch which will resolve this issue 😄

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that now. I added that back in. should be pushed sometime tonight.

@Christopher-C-Robinson
Copy link
Author

@jodh-intel

@Toreseen and I have done some serious work tonight on our code. If you get a chance, can you take a look at it. Might be messy bit it works. We both spent about 12 hours each on it individually. I’m finally off to bed. Hopefully we are getting close!

@katacontainersbot katacontainersbot added size/tiny Smallest and simplest task and removed size/huge Largest and most complex task (probably needs breaking into small pieces) labels May 5, 2023
Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Christopher-C-Robinson and @Toreseen ! This is very close now.

Before we can get this landed though, you'll need to clean up the branch to (a) remove all the binary files; and (b) collapse all your commits into a single one - see my big "branch recreation" comment.

Comment on lines 48 to 49
eprintln!("Usage: {} <input_file>", args[0]);
return Ok(());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Ideally, this would return an Err as this is it invalid to call the tool without specifying a file.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is fixed now

@katacontainersbot katacontainersbot added size/huge Largest and most complex task (probably needs breaking into small pieces) and removed size/tiny Smallest and simplest task labels May 8, 2023
@Christopher-C-Robinson
Copy link
Author

Thanks @Christopher-C-Robinson and @Toreseen - this is looking good! I've added a few comments.

More generally, I know we've discussed this. But looking at the code I do think it might make sense to split the main into a few other files as this grows. Maybe:

src/output.rs
src/node.rs
src/toc.rs
src/stats.rs

... and then add a corresponding mod output, mod node, etc in src/main.rs to "activate" these modules.

Also, it would be good to add a few unit tests as that will help shake out bugs and increase your confidence in the functions as you'll know they can tolerate all types of nasty input (writing tests can be fun as you try to find lots of devious inputs for your functions to handle 😈)

@jodh-intel , I split the main.rs into different files. let me know what your thoughts are. I will work more on it tonight.

@katacontainersbot katacontainersbot added size/tiny Smallest and simplest task and removed size/huge Largest and most complex task (probably needs breaking into small pieces) labels May 8, 2023
…ther requested freatures.

Signed-off-by: Maxwell Wendlandt <[email protected]>
@katacontainersbot katacontainersbot added size/huge Largest and most complex task (probably needs breaking into small pieces) and removed size/tiny Smallest and simplest task labels May 9, 2023
Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Christopher-C-Robinson and @Toreseen - this is looking good.

Could you remove the binary files from this PR now? Also, could you delete the unused functions that cargo build lists?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
size/huge Largest and most complex task (probably needs breaking into small pieces) wip Work in Progress (PR incomplete - needs more work or rework)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants