Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

De-duplicate parsing logic across PHP and JS #1086

Closed
westonruter opened this issue Jun 9, 2017 · 9 comments · Fixed by #1130 or #1152
Closed

De-duplicate parsing logic across PHP and JS #1086

westonruter opened this issue Jun 9, 2017 · 9 comments · Fixed by #1130 or #1152
Labels
[Feature] Blocks Overall functionality of blocks [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Type] Bug An existing feature does not function as intended

Comments

@westonruter
Copy link
Member

There are currently two parsers being used: one in PegJS (in JS) and regular expressions (in the PHP do_blocks and parse_block_attributes functions). Has there been any investigation into re-using the PegJS grammar directly in PHP? For example, php-pegjs. This would be important as the attributes get more complex, such as in implementing dynamic blocks (widgets, #1011).

In that regard, the serialization logic is currently limited to strings that lack any special characters. If a block attempts to serialize a string that includes --> or " then the parser would break. Likewise, if an attribute contains an object then it currently gets encoded as [object Object] and if it is an integer or a boolean, then it gets converted into strings.

I'm also aware of #988, but it doesn't change how attributes get serialized. To me it seems like there needs to be more robust handling of serializing attribute values that need special encoding. While the goal of being human-readable is important, it's probably less important than having data that isn't easily corrupted. This problem was tackled in the Shortcake plugin by URL-encoding such values: wp-shortcake/shortcake#496

Nevertheless, to preserve the original attribute value's data type as well as to escape the value during serialization, perhaps the value should be serialized as JSON and then URL-encoded. This is the approach that the JS Widgets plugin took when porting making widgets available as shortcodes: xwp/wp-js-widgets#32

The attributes encoded with JSON could be prefixed with encoded:. For example:

<!-- wp:example/foo encoded:bar="%7B%22baz%22%3A%5B%22qux%22%5D%7D" --><!-- /wp:example/foo -->

If the parser can be made more robust than to just rely on regular expressions, however, then we could just encode straight JSON as the values.

<!-- wp:example/foo bar={"baz":["qux"]} --><!-- /wp:example/foo -->

Note here that no double-quotes are used for the value because it's encoding an object. Similarly, integers could be unquoted and as well as booleans:

<!-- wp:example/foo someString="a \"great\" quote" --><!-- /wp:example/foo -->
<!-- wp:example/foo someNumber=1.243513 --><!-- /wp:example/foo -->
<!-- wp:example/foo someBool=false --><!-- /wp:example/foo -->
<!-- wp:example/foo someArray=[1,2,3] --><!-- /wp:example/foo -->
@westonruter westonruter added [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Feature] Blocks Overall functionality of blocks labels Jun 9, 2017
@nylen
Copy link
Member

nylen commented Jun 9, 2017

Has there been any investigation into re-using the PegJS grammar directly in PHP? For example, php-pegjs.

This is the correct way forward - we've already run into multiple issues because these two are different (#890 and #870 (comment) for example).

Note that our PEG does include some JavaScript code, so we will need to simplify this and/or provide an approach to run equivalent logic on the server side (#, # for example).

we could just encode straight JSON as the values.
<!-- wp:example/foo bar={"baz":["qux"]} --><!-- /wp:example/foo -->

This seems like the best approach.

@nylen nylen added the [Type] Bug An existing feature does not function as intended label Jun 9, 2017
@westonruter
Copy link
Member Author

It seems php-pegjs requires pegjs 0.8 wheras Gutenberg has 0.10. Also php-pegjs hasn't been updated in two years.

Aside from the PHP implementation, adaptation, and integration, I think the other thing is to modify the current grammar from:

HTML_Attribute_Item
  = HTML_Attribute_Quoted
  / HTML_Attribute_Unquoted
  / HTML_Attribute_Empty

To be something like:

HTML_Attribute_Item
  = HTML_Attribute_JSON
  / HTML_Attribute_Empty

Where HTML_Attribute_JSON is a grammar which replicates the JSON specification, as us bundled with pegjs itself: https://github.com/pegjs/pegjs/blob/master/examples/json.pegjs

@aduth
Copy link
Member

aduth commented Jun 9, 2017

cc @dmsnell , I think you've been working on JSON attributes support in the Automattic/wp-post-grammar repository?

https://github.com/Automattic/wp-post-grammar/blob/master/src/post.pegjs
https://automattic.github.io/wp-post-grammar/explorer/

@westonruter
Copy link
Member Author

Doesn't look like it's quite working yet? For one, I'm seeing the JSON being stored inside of a string instead of appearing literally in the block comment:

<!-- wp:core/text id="a98b469" data='{"version":1.0,"name":"wp-post-grammar","deps":["a","b",{"name":"c","devOnly":true}]}' -->
Content
<!-- /wp -->

It also seems that the JSON is limited in the characters it can represent? I couldn\'t seem to add an apostrophe to the JSON strings without causing a parse error, and &apos; wasn't recognized.

@dmsnell
Copy link
Member

dmsnell commented Jun 11, 2017

In #1130 ^^^ I added an alternative approach to parsing these kinds of attributes. I believe that the real problem wasn't that the parser wasn't designed to handle them so much as it simply failed to handle escaped quotes, which should now be possible.

Personally I would like to do what we can with the JS parser and pushing that out in the design before we spend too much effort harmonizing with a PHP-based parser. I'm sure the parser will be changing a lot and unless we need the server at this point (which maybe we do and I didn't realize it) to be able to parse I feel like we will help ourselves by figuring out the design first then add the implementation.

Thanks for the poke @aduth

@westonruter
Copy link
Member Author

@dmsnell thanks for that. Yeah, the most important problem is the lack of being able to parse attribute values with escaped quotes. The secondary concern I have is in regards to the data type of the values. For example, does foo=false get parsed as boolean false or string "false"? Same goes for numbers and null. If the attribute values are always is quoted then how will we disambiguate between a value that is JSON data vs a string that happens to be valid JSON? If the value is always JSON then will the attribute values always get double-quoted, for example foo='"some string"'?

unless we need the server at this point (which maybe we do and I didn't realize it) to be able to parse

This will become increasingly important now that we are driving into dynamic blocks (widgets) which have server-side rendering.

@dmsnell
Copy link
Member

dmsnell commented Jun 11, 2017

@westonruter thanks for the review!

the most important problem is the lack of being able to parse attribute values with escaped quotes

I disagree. this is a minor problem hardly a problem. it's just a small bug in the parser and we've got a simple fix for that in #1130.

does foo=false get parsed as boolean false or string "false"?

wouldn't see any reason to interpret it as a boolean false. we still haven't made our decisions on what the block structure needs and what attributes mean, but unless we make a decision to say, "any attribute value will be JSON unless it fails to parse as JSON" I would be reluctant to "try" on occasion.

that being said, I don't think it would be bad to consider that, or to make a special data attribute that does imply JSON data. for simple attributes though I think it can be overkill…

<!-- wp:my/block name="false" -->

how will we disambiguate between a value that is JSON data vs a string that happens to be valid JSON

this is my point. so far I haven't seen anyone suggest that attributes are JSON so I don't see why we need to disambiguate. that is, it's not JSON unless a block attempts to parse the JSON as such.

If the value is always JSON then will the attribute values always get double-quoted

earlier today I found something which suggested we might need to double-quote and double-quote-escape all the attributes because I suspect that WordPress or React might be "fixing" the single-quoted strings in the block comment delimiters by replacing them with double-quotes. we discovered this in #1130 when data='{"key":"value"}' turned unexpectedly into data="{"key":"value"}" on save in the editor and then the block was destroyed.

This will become increasingly important now

I'm not as familiar with this, but I thought the point of the dynamic blocks was that they render client-side? Fallback HTML in the post_content but the actual rendering is in the browser?

@westonruter
Copy link
Member Author

@dmsnell

I disagree. this is a minor problem hardly a problem. it's just a small bug in the parser and we've got a simple fix for that in #1130.

I meant it is the main problem (primary defect), not necessarily that it is a major problem. Good that you got a fix.

I'm not as familiar with this, but I thought the point of the dynamic blocks was that they render client-side? Fallback HTML in the post_content but the actual rendering is in the browser?

Take the latest-posts block as an example. It renders on the client server-side, with an abstract live preview in the client editor. It has one attribute poststoshow which is an integer. It doesn't have any fallback HTML since it would always be changing depending on which posts are the most recent.

@nylen
Copy link
Member

nylen commented Jun 12, 2017

Reopening to reflect the main issue here, that our parsing logic is duplicated (and behaves differently) between JS and PHP.

@nylen nylen reopened this Jun 12, 2017
@nylen nylen changed the title Encoding of complex arbitrary attributes De-duplicate parsing logic across PHP and JS Jun 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Blocks Overall functionality of blocks [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Type] Bug An existing feature does not function as intended
Projects
None yet
4 participants