dirty in, dirty out #18

Open
kortina opened this issue Jul 23, 2023 · 0 comments

First, I love this tool -- thanks for making it!

I have run into a few instances where I get somewhat odd markdown back out of gather, so I've been saving a few examples.

(1) https://time.com/6286449/ray-dalio-world-great-disorder/ produces the following:

1. **The ****largest amounts of debt, the fastest rates of debt growth, and the greatest
amounts of central bank printing of money and buying debt since 1930-45. **

which is not the cleanest markdown (there is an empty open/close bold pair before "largest" and a space character before the closing bold at the end of the line), but is clearly the result of parsing:

<ol> <li><strong>The </strong><strong>largest amounts of debt, the fastest
 rates of debt growth, and the greatest amounts of central bank printing of 
money and buying debt since 1930-45. </strong></li> 

Ideally, it would be great to get a cleaned-up version:

1. **The largest amounts of debt, the fastest rates of debt growth, and the greatest 
amounts of central bank printing of money and buying debt since 1930-45.**

(2) https://www.commentary.org/articles/gary-morson/joseph-epsteins-argues-we-all-need-novels/ produces:

 _Great Expecta__tions_: Life disappoints.

which would ideally be

 _Great Expectations_: Life disappoints.

but this is again clearly the result of just parsing:

<i>Great Expecta</i><i>tions</i>: Life disappoints.
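
Both examples come from the same pattern: an inline element is closed and immediately reopened. A minimal sketch of merging such runs at the HTML level, before any Markdown is generated, might look like the following. It is Python purely for illustration; `merge_adjacent_inline` and `INLINE_TAGS` are hypothetical names, not anything in gather, and the regex only handles tags without attributes.

```python
import re

# Inline tags to merge. This list is an assumption for illustration only.
INLINE_TAGS = ("strong", "b", "em", "i", "u")


def merge_adjacent_inline(html: str) -> str:
    """Collapse `</tag><tag>` pairs so a split run becomes one element."""
    for tag in INLINE_TAGS:
        # A close tag followed (optionally after whitespace) by the same
        # open tag changes nothing visually, so drop both and keep the
        # whitespace that sat between them.
        pattern = re.compile(rf"</{tag}>(\s*)<{tag}>", re.IGNORECASE)
        html = pattern.sub(r"\1", html)
    return html


print(merge_adjacent_inline("<i>Great Expecta</i><i>tions</i>: Life disappoints."))
```

Feeding the merged HTML to the converter would then produce a single emphasis span instead of two back-to-back ones. A real implementation would more likely do this on the parsed DOM than with a regex.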

I suppose it makes sense from a simplicity POV to just do a literal parsing of the HTML with no "cleanup" of the markdown, but what do you think of adding a flag to do things like:

  • remove empty <i>, <b>, <u>, etc. elements
  • remove trailing spaces before the closing tag of one of these elements

Essentially, this would mean running the output through some sort of linter with autofix.
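
To make the idea concrete, here is a rough sketch of such an autofix pass applied to the generated Markdown. It is a heuristic under stated assumptions, not a proposal for the actual implementation: `tidy_emphasis` is a hypothetical name, it only trims whitespace inside `**` spans, and it does not skip code spans or code blocks, so an identifier containing `__` would be altered.

```python
import re


def tidy_emphasis(md: str) -> str:
    """Heuristic cleanup of emphasis markers in converted Markdown."""
    # A closing `**` directly followed by an opening `**` is a no-op.
    md = md.replace("****", "")
    # Same for `_..._` closed and reopened in the middle of a word.
    md = re.sub(r"(?<=\w)__(?=\w)", "", md)

    def trim(match: re.Match) -> str:
        inner = match.group(1)
        if not inner.strip():
            # Whitespace-only span: drop the markers, keep the whitespace.
            return inner
        # Move whitespace that sits just inside the markers to the outside.
        lead = inner[: len(inner) - len(inner.lstrip())]
        trail = inner[len(inner.rstrip()):]
        return f"{lead}**{inner.strip()}**{trail}"

    # DOTALL lets a bold span continue across a wrapped line.
    return re.sub(r"\*\*(.+?)\*\*", trim, md, flags=re.DOTALL)


print(tidy_emphasis(" _Great Expecta__tions_: Life disappoints."))
print(tidy_emphasis("1. **The ****largest amounts of debt since 1930-45. **"))
```

Note that the trailing space is moved outside the closing `**` rather than deleted, since trailing whitespace at the end of a Markdown line can be significant.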

Perhaps this is easier imagined / said than done.

Running the second example through prettier yields:

 _Great Expecta\_\_tions_: Life disappoints.

What are your thoughts on such a modification?
