[sqlserver] use nvarchar to load unicode data? #37

Closed
nickolay opened this issue Oct 27, 2023 · 4 comments

Comments

@nickolay

  • $ sling --version

      Version: 1.0.48
    
  • file /c/temp/test_unicode.csv

      /c/temp/test_unicode.csv: Unicode text, UTF-8 text, with CRLF line terminators
    
  • cat /c/temp/test_unicode.csv

      a;b;c
      aaa;bbb;ñññ
    
  • cat /c/temp/test_unicode.csv | sling run --tgt-conn dev --tgt-object dbo.test

      INF connecting to target database (sqlserver)
      INF reading from stream (stdin)
      INF delimiter auto-detected: ";"
      INF writing to target database [mode: full-refresh]
      INF streaming data
      WRN bcp version 13 is old. This may give issues with sling, consider upgrading.
      INF dropped table "dbo"."test"
      INF created table "dbo"."test"
      INF inserted 1 rows in 0 secs [2 r/s]
      INF execution succeeded
    

This creates varchar columns:

CREATE TABLE [dbo].[test](
	[a] [varchar](255) NULL,
	[b] [varchar](255) NULL,
	[c] [varchar](255) NULL,
	[_sling_loaded_at] [bigint] NULL
) ON [PRIMARY]
GO

...which use the database default collation, corrupting any incompatible characters:

a	b	c
aaa	bbb	+-+-+-

Would it be possible to use the unicode nvarchar type instead of varchar?
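To illustrate why the column type matters: a varchar column stores text in the single code page implied by its collation, while nvarchar stores UTF-16. The sketch below uses Python's codecs as a stand-in for those code pages. That is an assumption: SQL Server's own conversion may substitute a best-fit character or `?` instead of raising an error. The helper `fits_code_page` is hypothetical and not part of sling.

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character in text can be encoded in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


value = "ñññ"

# cp1252 (Western European) contains ñ; cp1251 (Cyrillic) does not.
for code_page in ("cp1252", "cp1251"):
    print(code_page, fits_code_page(value, code_page))

# UTF-16, the encoding behind nvarchar, can represent the value in any collation.
print("utf-16-le bytes:", len(value.encode("utf-16-le")))
```

Whether a given value survives in varchar therefore depends on the database's collation, which is why the same load can look fine on one server and corrupted on another.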

@flarco
Collaborator

flarco commented Oct 28, 2023

Done.
Commit: flarco/dbio@7b87316
Releasing 1.0.50 in a bit.

@flarco flarco closed this as completed Oct 28, 2023
@nickolay
Author

Thanks! I can see that the column type is now nvarchar, but the data is still not loaded correctly.

Will look into it further the next time I have something to load!

@flarco
Collaborator

flarco commented Oct 29, 2023

Interesting, it might be the encoding?
Feel free to try the transforms source option (source.options.transforms), provided as an array.
Maybe decode_latin1 or decode_windows1250?
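A rough illustration of the encoding hypothesis, in Python rather than sling: if the UTF-8 bytes of the file are interpreted under a single-byte code page somewhere along the way, each `ñ` (two bytes in UTF-8) turns into two unrelated characters. That matches the shape of the `+-+-+-` result above (six characters for three letters), although the thread does not confirm which code page, if any, was involved. The functions `misread` and `repair` are hypothetical helpers and do not reproduce sling's transforms.

```python
def misread(text: str, assumed_codec: str) -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(assumed_codec)


def repair(garbled: str, assumed_codec: str) -> str:
    """Undo misread(): recover the original bytes and decode them as UTF-8."""
    return garbled.encode(assumed_codec).decode("utf-8")


original = "ñññ"

# latin-1 and cp437 are example code pages chosen for illustration only.
for codec in ("latin-1", "cp437"):
    garbled = misread(original, codec)
    print(codec, repr(garbled), len(garbled), repair(garbled, codec))
```

The round trip only works while the garbled characters are stored losslessly. Once they have been squeezed into a varchar column that cannot represent them, the original bytes are gone and the file has to be reloaded.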

See:

@nickolay
Author

nickolay commented Mar 5, 2025

> Will look into it further the next time I have something to load!

Filed #518
