-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing improvements for #31 #32
base: master
Are you sure you want to change the base?
Conversation
I'm pretty OK to merge the following feature if you add associated test cases:
I'm not sure about the numeric arguments, because in the HTML spec you can have string attribute value without quotes. What do you think? Cheers, |
I separated 2 of your commits about inline space fixing in #33, because it's valuable! 🎉 |
Regarding the test cases: I am planning on adding those but before i spend time on that i wanted to at least have your opinion about the work in progress. Regarding attribute values: |
Variant 4 (#31) is now fixed 6f40bcc - Ignore the last / for url and title comparison (Variant5) Bacause of ea771c3 the test cases are failing. Do you have objections to this commit? if not i will adjust the test cases as required. Otherwise i would like your input on how to handle this |
e024197 - Cleans up redundant spaces using rtrim. Since a lot of tests are expecting these spaces, they are currently failing |
That should fix all test cases for the previously proposed changes |
Some improvement for handling Variant1 in #31 7a0f6a1 - Markdownify only works properly with line feed EOLs so any incoming HTML is now converted Variant1 is not perfect but good enough and further improvements might effect other situations undesirably |
552ba1b - Ignores Office namespace tag <o:p xmlns:o="#unknown"></o:p>. |
0bea029 - I had an issue where plaintext content would be parsed for markdown with undesired results. This addition allows me to pull the regular expressions out of Markdownify and apply only the escaping on my content. |
* | ||
* @return array escapeInText | ||
*/ | ||
public function getescapeInText() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: should this be getEscapeInText
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the variables use a construction where the first word is not capped like here https://github.com/Elephant418/Markdownify/blob/master/src/Converter.php#L169 and here https://github.com/Elephant418/Markdownify/blob/master/src/Converter.php#L185
I kept the function name to adhere to the same name as the variable being retrieved. @tzi so far has not commented on this. But i think he will go over the code when I've finished all of my "improvements" and added the associated test cases per his request
So the short question is. Should the name of the function be 100% exactly the same as the variable being retrieved? or are there guidelines which decide the function name?
@tzi How to best handle the following code if encountered in html
It's comparable to the <o:p></o:p> but that one at least started and ended with the same tag
|
xmlns can have numbers in them apparently (Outlook 2010)
|
I had a situation where an html email contained an empty table tag (microsoft onedrive email). This makes sure that this does not result in an error: b9b3f41 |
Fix PHP8.3 support
This fully fixes Variant5 #31
Also added an option for ConverterExtra to disable adding the CSS class after a tag. I do need ConverterExtra for table support so this CSS class was an annoying addition for me
Let me know what you think