recreate yacc and lex source code #586

Open
santiagoIT opened this issue Oct 22, 2024 · 4 comments

Comments

@santiagoIT

Hello,

I have enhanced the string regular expression used in StepP21Lex.lex to fix a problem that we have encountered a few times with certain IFC files.

I then ran the MAKEPARSER.BAT batch file to recreate the yacc and lex source files, but I got compile errors. So I undid my changes and ran MAKEPARSER.BAT again against the unmodified sources. It seems to have run fine:

image

But the generated StepP21Lex.cs file has some changes in it that lead to compile errors:

Longs have been turned into ints:
image

image

Also an ifdef is lost:
image

Should all of that be fixed manually, or am I missing something or doing something wrong?

@andyward
Member

Yes, it's a hack from a while back. See #561 (comment)

We should really look to replace this old PointsGarden parser

@andyward
Member

I meant to add: you should be able to run git cherry-pick -n 6517bc1 to re-apply the #6517bc16042b3cfd820dd7eb45f72bbab92d13ad fix to your local branch.

@santiagoIT
Author

@andyward
It was precisely the single backslash issue that I am trying to address. I hope to be able to try this out soon, and hopefully all unit tests will pass. If so, I will submit a pull request.
We run into this problem frequently.

I hope there are already tests covering the short Unicode encoding; if not, I will try to add some. I need to make sure that my regex does not break anything there.

@santiagoIT
Author

Unfortunately, the change I made to the regex broke some tests.
I wanted the parser to be tolerant of incorrectly encoded strings.
I ran into the EncodeBackslash() test, which is now disabled, and I can see that this is how it used to work (fault tolerant), but it had to be changed.

I believe the correct approach would be to detect invalid strings by adding a new token type (Tokens.STRING_INVALID) in the lex file.
An exception could then be thrown specifying the line number and the string, which would make it clear to the user why the file does not load.
I know very little about how strings are encoded in IFC.

Are these the only valid encodings for IFC?
https://technical.buildingsmart.org/resources/ifcimplementationguidance/string-encoding/

  1. \S. No idea where this comes from.
  2. \PA. Are other code pages supported?

Basically, what I am trying to come up with is a regex that can be used to detect invalid strings. This regex would be run before the regular string regex.
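
To make the idea concrete, here is a rough sketch of such a check, written in Python for brevity rather than as a rule in StepP21Lex.lex. It assumes the valid escapes are the ones I understand ISO 10303-21 to define (doubled apostrophe, doubled backslash, \S\, \PA\ through \PI\, \X\, and \X2\ or \X4\ runs closed by \X0\, with uppercase hex digits). The names first_invalid_offset, check_string and InvalidStepStringError are made up for illustration and do not exist in the code base.

```python
import re
from typing import Optional

# Forms accepted inside a string body (the text between the quotes).
# Assumption: this list reflects my reading of ISO 10303-21 and the
# buildingSMART page; it is not taken from StepP21Lex.lex.
_ALTERNATIVES = [
    r"[^'\\]",                        # any character except apostrophe and backslash
    r"''",                            # doubled apostrophe
    r"\\\\",                          # doubled backslash
    r"\\S\\[\x20-\x7E]",              # S directive plus one printable character
    r"\\P[A-I]\\",                    # code page selection, PA through PI
    r"\\X\\[0-9A-F]{2}",              # X directive with one hex-encoded byte
    r"\\X2\\(?:[0-9A-F]{4})+\\X0\\",  # X2 run of 4-digit groups, closed by X0
    r"\\X4\\(?:[0-9A-F]{8})+\\X0\\",  # X4 run of 8-digit groups, closed by X0
]
_VALID_PREFIX = re.compile("(?:" + "|".join(_ALTERNATIVES) + ")*")


class InvalidStepStringError(ValueError):
    """Hypothetical error type, standing in for a STRING_INVALID token."""


def first_invalid_offset(body: str) -> Optional[int]:
    """Return the offset of the first invalid sequence, or None if the body is valid."""
    end = _VALID_PREFIX.match(body).end()  # the pattern can match empty, so match() never fails
    return None if end == len(body) else end


def check_string(body: str, line_number: int) -> None:
    """Raise an error that names the line and the offending string."""
    offset = first_invalid_offset(body)
    if offset is not None:
        raise InvalidStepStringError(
            f"line {line_number}: invalid string encoding at offset {offset}: {body!r}"
        )
```

With this sketch, a body containing a single backslash such as C:\temp is rejected at offset 2, while the doubled form C:\\temp passes. In the real lexer a lone apostrophe would already end the string, so the apostrophe handling here would need to be adapted.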

Thank you!
