Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unquote csv strings #18

Merged
merged 2 commits into from
May 2, 2024
Merged

Conversation

jbruechert
Copy link
Contributor

No description provided.

@jbruechert jbruechert marked this pull request as draft April 29, 2024 15:58
@jbruechert
Copy link
Contributor Author

jbruechert commented Apr 29, 2024

meh, the one case that wasn't easy to solve is the relevant one for MOTIS :(

EDIT: This may also not be the right place to do this

@felixguendling
Copy link
Member

felixguendling commented Apr 29, 2024

parse_arg is still quite generic and also used in the HAFAS Rohdaten parser. Since the double quote escaping seems to be CSV specific, it could make sense to have it in a separate function that's only used by the CSV parser.

Maybe something overloading like this:

  • parse_csv(cstr s, std::string& t) { /* escape */ }
  • parse_csv(cstr s, T& t) { parse_arg(cstr, t); }

General note:

  • I try to use AAA style everywhere. I didn't do it since the beginning, so you'll find plenty of places where it's not AAA style. However, for new code it makes sense to stick to it.
  • I would recommend to configure a "format on save" to have code formatted in the project specific way by default (no action required). Otherwise a commit hook or manual check before committing helps.
  • In general the parser code should avoid temporary and unnecessary allocations at all costs to stay fast. Your code looks fine for now though. 👍

@felixguendling
Copy link
Member

meh, the one case that wasn't easy to solve is the relevant one for MOTIS :(

EDIT: This may also not be the right place to do this

I guess the way you started it here would require to change row types in these structs to std::string:
https://github.com/motis-project/nigiri/blob/55dde0d22013150f7db98054f336555c5003a781/src/loader/gtfs/stop.cc#L178-L179

In this case, it makes sense to std::move the string into the struct stop struct (or in other places the trip struct, etc.) to prevent allocating and copying more memory than necessary.

@jbruechert jbruechert force-pushed the csv-quote branch 2 times, most recently from 7e0ba2c to 53ad5cf Compare April 29, 2024 16:44
@jbruechert
Copy link
Contributor Author

I guess the way you started it here would require to change row types in these structs to std::string: https://github.com/motis-project/nigiri/blob/55dde0d22013150f7db98054f336555c5003a781/src/loader/gtfs/stop.cc#L178-L179

In this case, it makes sense to std::move the string into the struct stop struct (or in other places the trip struct, etc.) to prevent allocating and copying more memory than necessary.

Do you think that is a good way to go? I'm a bit unhappy about the inconsistency of the overloads between cstr and std::string.

@felixguendling
Copy link
Member

felixguendling commented Apr 29, 2024

The question is: what is the alternative? To apply the unescaping function manually where necessary? But then where is it not necessary? Even for agency_id, etc. it's probably required if we do it correctly.

I think it's still good to have an escape hatch to prevent unescaping if you use cstr instead of std::string. The only problem I see is that it's not really obvious that one will be escaped and the other one not.

I think as long as there are no extra allocations, I am fine. I am a bit crazy with this.. especially temporary allocations 🤯

@jbruechert
Copy link
Contributor Author

jbruechert commented Apr 29, 2024

Makes sense.

Now for some reason I can't get the std::string overload to be called from stop.cc, I just hope the cause are not implicit conversions to cstr, since code depends on them all over the place.

EDIT: Okay no, I just got the wrong stop.cc and didn't notice.

@jbruechert
Copy link
Contributor Author

works now 👍

Copy link
Member

@felixguendling felixguendling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please run the formatter after these changes - then I think we're good to go 👍

include/utl/parser/csv.h Show resolved Hide resolved
include/utl/parser/csv.h Outdated Show resolved Hide resolved
include/utl/parser/csv.h Outdated Show resolved Hide resolved
include/utl/parser/csv.h Show resolved Hide resolved
@jbruechert jbruechert marked this pull request as ready for review April 30, 2024 09:49
include/utl/parser/csv.h Outdated Show resolved Hide resolved
include/utl/parser/csv.h Outdated Show resolved Hide resolved
@felixguendling
Copy link
Member

Can you please add a test-case for the found_at == s.size() - 1 case?

Somehow I don't get an email notification for force-push. So if you want me to have a look at it, it would help to request a new review or comment here. Or is there a way to subscribe to force-push events too?

@felixguendling felixguendling merged commit 81821f8 into motis-project:master May 2, 2024
6 of 10 checks passed
@felixguendling
Copy link
Member

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants