Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding check and restructured file loading #122

Merged
merged 7 commits into from
Aug 24, 2023
Merged

Conversation

gaxweb
Copy link
Contributor

@gaxweb gaxweb commented Aug 16, 2023

See https://www.gefeg.com/jswg/cl/v3/11b/cl1.htm Those are all that are supported in syntax version 3. In v4 there are more: https://www.gefeg.com/jswg/cl/v4x/40219/cl1.htm

As discussed in #82 here's my suggested solution.

I think the check for UNOA and UNOB isn't 100% correct, but it was the best I could find.

The if(!isset(self::$charsets[$this->syntaxID])) check might be better as an entry in the errors array instead of an exception, but then the actual check wouldn't make any sense, if a different syntaxID is used. Maybe an EdifactException?

Also some minor refactoring, like renaming private stuff to be closer to what they are actually called in EDI lingo.

P.S.: I wouldn't return anything from either load(), loadString() or parse(). I consider that leaking object state. Especially since get() exists.

Also some minor refactoring.
@sabas
Copy link
Collaborator

sabas commented Aug 17, 2023

On the PS remark, in edifact-generator I return the object (return $this) to allow method chaining. We could do it here as well?
Would you complete this commit with the suggestion you made in #73 ?

the current commit looks good

@gaxweb
Copy link
Contributor Author

gaxweb commented Aug 17, 2023

If you want to break with that, I'd go one further and remove the $url argument from Parser::__construct(), too, along with the related code in it. The constructor shouldn't do (much) work since they're really meant for dependency injection.

The load() and loadString() methods already exist and are public after all. Just needs one for passing an array, or just adjust the load() method to work with arrays, too.

@sabas
Copy link
Collaborator

sabas commented Aug 17, 2023

Ok let's try! Do you want to be added to the organization?

That made the constructor and resetUN*() methods useless.
@gaxweb
Copy link
Contributor Author

gaxweb commented Aug 18, 2023

I've changed the loading methods.

Do you want to be added to the organization?

Not sure what you mean.

Edit: I just noticed that testSplitMultiMessage() fails in the Reader. I'll have a look.

It was required for the Reader::load() method anyway.
This also dramatically reduced the memory requirement somehow.
The tests wouldn't even complete with less than 5GB of RAM. Less than 1GB needed now.
@gaxweb
Copy link
Contributor Author

gaxweb commented Aug 18, 2023

Fixed that, too. This ok?

@@ -326,7 +329,7 @@ public function load(string $location): self
* @param string $txt
* @return self
*/
public function loadString(string &$txt): self
public function loadString(string $txt): self
Copy link
Contributor Author

@gaxweb gaxweb Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reduced the required memory. Why was this passed by reference? The tests don't seem to care and it only needs 10MB to run them now instead of GBs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because I already had the string loaded in a variable? I don't remember :D

Copy link
Contributor Author

@gaxweb gaxweb Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if it was for performance reasons, it's probably not necessary, maybe even harmful: http://schlueters.de/blog/archives/125-Do-not-use-PHP-references.html

Passing by reference disables the copy-on-write PHP default behaviour, which is probably why the memory requirement exploded.

@gaxweb gaxweb changed the title Simple check of the character encoding Encoding check and restructured loading Aug 18, 2023
@gaxweb gaxweb changed the title Encoding check and restructured loading Encoding check and restructured file loading Aug 18, 2023
@sabas
Copy link
Collaborator

sabas commented Aug 21, 2023

Should parse() return $this?

Btw tell me when you think it's complete to merge..

@gaxweb
Copy link
Contributor Author

gaxweb commented Aug 21, 2023

Should parse() return $this?

I don't really have an opinion on that.

But I'm not certain how (if) to handle multiple calls to the loading methods. The una(b)Checked flags are flipped after the first time parse() is called, which might not relate to the actual data that's now loaded inside the object on consecutive calls. Maybe the two resetUN*() methods should be brought back and executed either inside the load* or parse method(s)?
I'm not sure why the 2 analyseUN*() methods are only run conditionally anyway. They don't look expensive to run.

This is really getting off-topic from the initial pull request though. :D

@gaxweb
Copy link
Contributor Author

gaxweb commented Aug 21, 2023

I think I'm done here. I hope I didn't miss something in the process.

P.S.: Thanks for the invite, but this account is really a catch-all one that multiple people have access to, and you probably don't want all of them in here.

@sabas sabas merged commit da3277d into php-edifact:master Aug 24, 2023
@gaxweb
Copy link
Contributor Author

gaxweb commented Aug 31, 2023

I've found a bit of a problem, if you pass the Reader a Parser that had loaded an array. I'll post about it soon. Just a heads up that the current state isn't quite stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants