Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] CSV output doesn't handle entries with quotes but not commas. #3050

Closed
Michael-S opened this issue Oct 1, 2024 · 1 comment · Fixed by #3063 or #3148
Closed

[BUG] CSV output doesn't handle entries with quotes but not commas. #3050

Michael-S opened this issue Oct 1, 2024 · 1 comment · Fixed by #3063 or #3148
Labels
bug Something isn't working

Comments

@Michael-S
Copy link
Contributor

What is the bug?
CSV output of entries like "a" b or a " b should, according to RFC 4180, escape them to be """a"" b" and "a "" b", respectively. See section 2.5, https://www.rfc-editor.org/rfc/rfc4180

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Add values to a test index.
#!/usr/bin/env bash
# Assuming OPENSEARCH_SERVER is the server,
# OPENSEARCH_USER is the host, OPENSEARCH_PASSWORD is the password.
# and the user has the rights to create and query this index.
# Add an entry to the index that has two quotes in it, not at the boundaries.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/1" \
    -H 'Content-Type: application/json' \
    -d'{  "field1": "\"a\" b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Add an entry that has a quote in it.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/2" \
    -H 'Content-Type: application/json' \
    -d'{ "field1": "\"a\" b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Optional, add an entry that has no quotes or commas.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/3" \
    -H 'Content-Type: application/json' \
    -d'{ "field1": "a b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Optional, add an entry that has commas.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/4" \
    -H 'Content-Type: application/json' \
    -d'{ "field1": "a, b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
  1. Query the index to CSV output.
#!/usr/bin/env bash
curl -XPOST "https://${OPENSEARCH_SERVER}:9200/_plugins/_sql?format=csv" \
    -H 'Content-Type: application/json' \
    -d'{ "query": "SELECT * FROM csv_error LIMIT 50" }' \
     --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
  1. Response data is:
"a" b  # expected to be """a"" b"
a " b  # expected to be "a "" b"
a b     # correct
"a, b" # correct

What is the expected behavior?
The CSV output escapes double quotes properly.

What is your host/environment?

  • OS: Ubuntu 20.04
  • Version - 2.17.0.0
  • Plugins - default

Do you have any additional context?
As far as I can tell, the problem appears to be in function quoteIfRequired in FlatResponseBase.java,

protected String quoteIfRequired(String separator, String cell) {
but it's possible changing it there might affect other output formats.

@Michael-S Michael-S added bug Something isn't working untriaged labels Oct 1, 2024
Michael-S added a commit to Michael-S/opensearch-sql-plugin that referenced this issue Oct 9, 2024
Michael-S added a commit to Michael-S/opensearch-sql-plugin that referenced this issue Oct 9, 2024
Michael-S added a commit to Michael-S/opensearch-sql-plugin that referenced this issue Oct 9, 2024
@dblock dblock removed the untriaged label Oct 21, 2024
@dblock
Copy link
Member

dblock commented Oct 21, 2024

[Catch All Triage - 1, 2]

@Swiddis Swiddis closed this as completed in cfe38d7 Nov 6, 2024
Swiddis pushed a commit to Swiddis/sql that referenced this issue Nov 6, 2024
Swiddis pushed a commit to Swiddis/sql that referenced this issue Nov 6, 2024
Fixes opensearch-project#3050

Signed-off-by: Mike Swierczek <[email protected]>
(cherry picked from commit cfe38d7)
Signed-off-by: Simeon Widdis <[email protected]>
ykmr1224 pushed a commit that referenced this issue Nov 14, 2024
* Improve error handling for some more edge cases (#3080)

* Add failing tests

Signed-off-by: Simeon Widdis <[email protected]>

* Fix the first test

Signed-off-by: Simeon Widdis <[email protected]>

* Revise the tests

Signed-off-by: Simeon Widdis <[email protected]>

* Fix wildcard tests

Signed-off-by: Simeon Widdis <[email protected]>

* Add license header

Signed-off-by: Simeon Widdis <[email protected]>

* Fix rerunning SQL parsing

Signed-off-by: Simeon Widdis <[email protected]>

---------

Signed-off-by: Simeon Widdis <[email protected]>

* Fix: CSV and Raw output, escape quotes (#3063)

Fixes #3050

Signed-off-by: Mike Swierczek <[email protected]>
(cherry picked from commit cfe38d7)
Signed-off-by: Simeon Widdis <[email protected]>

* Fix merge conflict

Signed-off-by: Simeon Widdis <[email protected]>

---------

Signed-off-by: Simeon Widdis <[email protected]>
Signed-off-by: Mike Swierczek <[email protected]>
Signed-off-by: Simeon Widdis <[email protected]>
Co-authored-by: Mike Swierczek <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants