This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

CSV Scanner not working in SFTP context: "extraneous or missing \" in quoted-field" #2812

Closed
acornforth opened this issue Aug 28, 2024 · 1 comment

Comments

@acornforth

Tested with the same files locally using the csv input, and it works fine. Something about the records extracted from the CSV breaks when they are pulled from SFTP. Not sure if this is something specific to the SFTP server we are connecting to.

I have already googled for issues and found some instances of this error occurring, for example when line endings are \r rather than \r\n or \n, but I checked this and it doesn't appear to be the cause in our case.
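For anyone wanting to repeat the line-ending check on the bytes actually received over SFTP, here is a minimal sketch in Go. The helper `countLineEndings` and the sample data are hypothetical and written for illustration only; they are not part of Benthos.

```go
package main

import "fmt"

// lineEndings holds how many of each line-ending style were found.
type lineEndings struct {
	CRLF, LF, CR int
}

// countLineEndings is a hypothetical helper that scans raw bytes and
// counts "\r\n", bare "\n", and bare "\r" line endings.
func countLineEndings(data []byte) lineEndings {
	var c lineEndings
	for i := 0; i < len(data); i++ {
		switch data[i] {
		case '\r':
			if i+1 < len(data) && data[i+1] == '\n' {
				c.CRLF++
				i++ // skip the '\n' that belongs to this "\r\n" pair
			} else {
				c.CR++
			}
		case '\n':
			c.LF++
		}
	}
	return c
}

func main() {
	// Made-up sample mixing all three styles.
	sample := []byte("a,b\r\n1,2\n3,4\r")
	fmt.Printf("%+v\n", countLineEndings(sample))
}
```

A non-zero `CR` count on the downloaded copy, but not on the local copy, would suggest the transfer is altering the file.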

Not sure if this is a Benthos issue directly or something to do with the underlying encoding/csv package.

config below:

input:
  sftp:
    address: '${SFTP_HOST}:22'
    credentials:
      username: '${SFTP_USER}'
      password: '${SFTP_PASSWORD}'
      private_key_file: '/app/.ssh/id_rsa'
    paths:
      - '/*.csv'
    auto_replay_nacks: true
    scanner:
      csv:
        - parse_header_row: true
    delete_on_finish: false
    watcher:
      enabled: true
      minimum_age: 20s
      poll_interval: 60s
      cache: memory

buffer:
  none: {}
output:
  stdout: {}

cache_resources:
  - label: memory
    memory:
      default_ttl: 7200s
  - label: redis
    redis:
      url: 'redis://${REDIS_HOST}:${REDIS_PORT}'
      prefix: benthos_sftpw_cache_

@mihaitodor
Collaborator

Hey @acornforth 👋 Thanks for reporting this! Would you mind doing a quick test with lazy_quotes: true set in the csv scanner? It should look like this (not sure why you had it as an array):

scanner:
  csv:
    parse_header_row: true
    lazy_quotes: true

I think the error you're hitting is due to this code in the parser.

PS: Converting to a discussion as per #2026.

@redpanda-data redpanda-data locked and limited conversation to collaborators Aug 28, 2024
@mihaitodor mihaitodor converted this issue into discussion #2813 Aug 28, 2024
