Strip all the remaining HTML tag when keepHtml option is disabled #31

Open
tzi opened this issue Jan 29, 2018 · 3 comments

Comments

@tzi
Member

tzi commented Jan 29, 2018

From #29

@SL-Gundam
Contributor

Here are some examples of leftover HTML after parsing emails with Markdownify. Some of them are a big mess HTML-wise.
I would also really like it if something could be done about the large number of empty lines in the end result, as in Variant1.

I tried cleaning them of any sensitive information. Let me know if I overlooked anything.

The files are paired:
_HTML is the HTML before Markdownify
_Markdownify is what Markdownify made of it after processing with:
$html2markdown = new Markdownify\ConverterExtra();
$html2markdown->setKeepHTML( FALSE );
$body = $html2markdown->parseString( $body );
Variant1_HTML.txt
Variant1_Markdownify.txt

Variant2_HTML.txt
Variant2_Markdownify.txt

Variant3_HTML.txt
Variant3_Markdownify.txt

Variant4_HTML.txt
Variant4_Markdownify.txt

Variant5_HTML.txt
Variant5_Markdownify.txt
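
Until the converter handles these cases itself, the two symptoms described above (leftover tags and long runs of empty lines) could be reduced with a post-processing pass over the converter's output. The following is only a rough sketch under assumptions: `cleanMarkdownifyOutput` is a hypothetical helper, not part of Markdownify, and it is not the fix proposed in #32. It relies only on PHP's built-in `strip_tags`, `str_replace`, `preg_replace`, and `trim`.

```php
<?php
// Hypothetical helper (not part of Markdownify): a post-processing
// workaround applied to the string returned by parseString().
function cleanMarkdownifyOutput(string $markdown): string
{
    // Remove any HTML tags the converter left in the output.
    $text = strip_tags($markdown);

    // Normalize line endings so the patterns below only deal with "\n".
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Drop trailing spaces and tabs at the end of every line.
    $text = preg_replace('/[ \t]+$/m', '', $text);

    // Collapse three or more consecutive newlines into one blank line.
    $text = preg_replace('/\n{3,}/', "\n\n", $text);

    // Remove leading and trailing whitespace from the whole document.
    return trim($text);
}
```

It would be called on the result of the snippet above, for example `$body = cleanMarkdownifyOutput( $body );`. Note the trade-offs: `strip_tags` is a blunt tool that also removes anything tag-like inside code blocks as well as Markdown autolinks written as `<http://...>`, and dropping trailing spaces removes Markdown hard line breaks, so this is probably only acceptable for plain email bodies like the ones attached here.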

@SL-Gundam
Contributor

SL-Gundam commented Feb 18, 2018

Current status when using #32 to fix the various examples above:

Variant1 = 90%
Variant2 = 60%
Variant3 = 60%
Variant4 = 100%
Variant5 = 100%

@SL-Gundam
Contributor

Here is another example, which is fixed in #32 by fd6763e:
Variant7_HTML.txt
Variant7_Markdownify.txt
