recreate yacc and lex source code #586

santiagoIT · 2024-10-22T20:39:08Z

Hello,

I have enhanced the string regular expression used in StepP21Lex.lex to fix a problem that we have encountered a few times with certain IFC files.

I then ran the MAKEPARSER.BAT batch file to recreate the yacc and lex source files, but I got compile errors. So I undid my changes and ran MAKEPARSER.BAT again with no modifications. It seems to have run fine:

But the generated StepP21Lex.cs file has some changes in it that lead to compile errors:

Longs have been turned into ints:

Also an ifdef is lost:

Should all that be fixed manually or am I missing something or doing something wrong?

andyward · 2024-10-23T09:47:55Z

Yes, it's a hack from a while back. See #561 (comment)

We should really look to replace this old PointsGarden parser

andyward · 2024-10-23T14:39:27Z

I meant to add: you should be able to run git cherry-pick -n 6517bc1 to re-apply the fix from commit 6517bc16042b3cfd820dd7eb45f72bbab92d13ad to your local branch.

santiagoIT · 2024-10-23T15:37:49Z

@andyward
It is precisely the single-backslash issue that I am trying to address; we run into this problem frequently. I hope to try this out soon. If all unit tests pass, I will submit a pull request.

I hope there are tests covering the short Unicode encoding; if not, I will try to add some. I need to make sure that my regex does not break anything there.

santiagoIT · 2024-10-28T20:24:29Z

Unfortunately, the change I made to the regex broke some tests.
I wanted the parser to be tolerant of incorrectly encoded strings.
I ran into the EncodeBackslash() test, which is now disabled, and I can see that the parser used to work that way (fault tolerant) but had to be changed.

I believe the correct approach would be to detect invalid strings by adding a new token type (Tokens.STRING_INVALID) in the lex file.
An exception could then be thrown specifying the line number and string, which would make it clear to the user why the file does not load.
I know very little about the encoding of strings in IFC.

Are these the only valid encodings for IFC?
https://technical.buildingsmart.org/resources/ifcimplementationguidance/string-encoding/

\S: No idea where this comes from.
\PA: Are other code pages supported?

Basically, what I am trying to come up with is a regex that can be used to detect invalid strings. This regex would be run before the regular string regex.
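
To make the idea concrete, here is a rough sketch in Python rather than in the lex grammar. It assumes the escape directives as I understand them from ISO 10303-21 (doubled backslash, \S\, \PA\ to \PI\, \X\, \X2\ and \X4\ closed by \X0\); that list is my assumption and has not been checked against the xbim lex file. The names find_invalid_escapes and check_string are made up for illustration. The sketch only looks at backslashes inside the string body (the text between the apostrophes) and ignores apostrophe doubling.

```python
import re

# Assumed escape directives (my reading of ISO 10303-21, not verified
# against StepP21Lex.lex):
#   \\                  literal backslash
#   \S\c                one character from the upper half of the active page
#   \PA\ .. \PI\        select an ISO 8859 code page
#   \X\hh               one character as two hex digits
#   \X2\hhhh...\X0\     one or more groups of four hex digits
#   \X4\hhhhhhhh...\X0\ one or more groups of eight hex digits
HEX = r"[0-9A-F]"
VALID_ESCAPE = re.compile(
    r"\\\\"
    r"|\\S\\."
    r"|\\P[A-I]\\"
    rf"|\\X\\{HEX}{{2}}"
    rf"|\\X2\\(?:{HEX}{{4}})+\\X0\\"
    rf"|\\X4\\(?:{HEX}{{8}})+\\X0\\"
)


def find_invalid_escapes(body: str) -> list[int]:
    """Return offsets of backslashes that do not start a recognised directive."""
    bad = []
    pos = 0
    while pos < len(body):
        if body[pos] != "\\":
            pos += 1
            continue
        match = VALID_ESCAPE.match(body, pos)
        if match:
            pos = match.end()  # skip the whole directive
        else:
            bad.append(pos)    # lone or malformed backslash
            pos += 1
    return bad


def check_string(body: str, line_no: int) -> None:
    """Raise an error naming the line and the string, as proposed above."""
    bad = find_invalid_escapes(body)
    if bad:
        raise ValueError(
            f"line {line_no}: invalid escape at offset {bad[0]} in string {body!r}"
        )
```

With this, a body such as C:\temp (single backslash) would be reported, while C:\\temp and caf\X2\00E9\X0\ would pass. In the lexer, the equivalent check would emit the new invalid-string token instead of raising directly.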

Thank you!
