Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Microsoft SQL #29

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from
Draft

Conversation

kiendang
Copy link

@kiendang kiendang commented Sep 6, 2024

Close #17.

Not complete yet. Most test failures are due to MS SQL does not use UTF-8 for strings. Need to figure that out.

@lihaoyi
Copy link
Member

lihaoyi commented Sep 6, 2024

Can UTF-8 support be opted-in? This post seems to indicate it should be available https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-utf-8-support-for-sql-server/ba-p/734928

@kiendang
Copy link
Author

kiendang commented Sep 6, 2024

That's the thing. I did try that by setting MSSQL_COLLATION but the container (through testcontainers) is flaky. It could hang or close the connection before the tests even run and I had to rerun the tests several time, at least on my machine. It does fix the encoding issue though.

@lihaoyi
Copy link
Member

lihaoyi commented Sep 7, 2024

The code changes look reasonable. In terms of flakiness, perhaps we need to do a Thread.sleep or retries on startup to give the SQL server container a time to initialize?

@kiendang
Copy link
Author

@lihaoyi Question about the sorting tests with nulls under OptionalTests. Take the nullsLast test

SELECT opt_cols0.my_int AS my_int, opt_cols0.my_int2 AS my_int2
FROM opt_cols opt_cols0
ORDER BY my_int NULLS LAST

the defined accepted result is

value = Seq(
  OptCols[Sc](Some(1), Some(2)),
  OptCols[Sc](Some(3), None),
  OptCols[Sc](None, None),
  OptCols[Sc](None, Some(4))
)

but shouldn't

value = Seq(
  OptCols[Sc](Some(1), Some(2)),
  OptCols[Sc](Some(3), None),
  OptCols[Sc](None, Some(4)),
  OptCols[Sc](None, None)
)

also be acceptable since we only order by my_int and not my_int2?

@lihaoyi
Copy link
Member

lihaoyi commented Sep 10, 2024

@kiendang yes that should be acceptable. IIRC the test suite allows a moreValues argument if you want to pass in additional possible expected values, you can use it to allow either case to pass tests for this case

@kiendang
Copy link
Author

Adapted .take and .drop for MS SQL which fixes quite a few tests. MS SQL doesn't have LIMIT. .take without .drop is translated to SELECT TOP(?) ... while .drop (and .drop with .take) is translated to OFFSET ? ROWS FETCH FIRST ? ROWS ONLY

The UTF-8 workaround fixed around 40 tests. There are still around 93 failed tests out of 310. A large number of those are due to MS SQL does not have a boolean type. You can only use boolean expression in the WHERE clause or CASE WHEN conditions, but not anywhere else. You'd have to use the BIT data type, CASE WHEN or IIF instead.

This is legal

SELECT * FROM buyer WHERE name IS NOT NULL;

These are not

CREATE TABLE tb (
  x BOOLEAN
);

SELECT name IS NOT NULL FROM buyer;

SELECT * FROM buyer ORDER BY name IS NOT NULL;

Instead have to do

CREATE TABLE tb (
  x BIT
);

SELECT IIF(name IS NOT NULL, 1, 0) FROM buyer;

SELECT * FROM buyer ORDER BY IIF(name IS NOT NULL, 1, 0);

@lihaoyi
Copy link
Member

lihaoyi commented Sep 10, 2024

Sounds good. Generating BITs instead of BOOLEANs seems fine

The built-in LogMessageWaitStrategy doesn't work.
@kiendang
Copy link
Author

kiendang commented Sep 12, 2024

It's a bit more involved than just overriding TypeMapper[Boolean]. It works in the simple case if you have a BIT column b, SELECT b ... would return a Java boolean. However something like this SELECT x > 3 ... would not work because it's not legal in MS SQL. We would have to do SELECT IIF(x > 3, 1, 0) .... Is there a way to customize the rendering of Expr[Boolean], specifically inside a SELECT statement?

@lihaoyi
Copy link
Member

lihaoyi commented Sep 12, 2024

Having custom serialization in the SELECT clause only is doable, as we control the entire query serialization pipeline, but how invasive it would be depends on exactly what semantics you need to introduce. Is x > 3 only a problem in SELECT, or are there other places where it causes issues as well? Is it ok if we pass x > 3 to a SQL function, or use it in a WHERE clause, or GROUP BY?

@kiendang
Copy link
Author

Boolean expressions can only be used as conditions, not values.

These are legal

WHERE x > 3 

CASE WHEN x > 3

These are not

SELECT x > 3

ORDER BY x > 3

GROUP BY x > 3

-- this is also illegal because `x > 3` is used as value here
WHERE (x > 3) = (x > 3)

Another issue is boolean literals don't exists either, so these currently fail .filter(_ => true). MSSQL users have to do WHERE 1 = 1.

@lihaoyi
Copy link
Member

lihaoyi commented Sep 12, 2024

And how anout as parameters to functions? Like booleans are allowed as parameters to iif, but how about other sql functions?

Maybe one thing to do here is to look into how Slick and Quill handle this. IIRC both of those libraries support mssql server, and their handling may give us some idea of the best way to proceed for scalasql

@kiendang
Copy link
Author

Can't find anything related to MSSQL functions taking boolean inputs but my guess is that IIF is a special built-in construct and the first argument takes in a condition similar to CASE WHEN and WHERE.

Quill does render boolean values as CASE WHEN ... 1 ELSE 0 and boolean literals as 1 = 1 and 1 = 0

https://scastie.scala-lang.org/kiendang/XHfKDq7jRmS0NwFEAC18pQ/25

@lihaoyi
Copy link
Member

lihaoyi commented Sep 12, 2024

Would following Quill's strategy work? Presumably if it works for Quill well enough for people to depend on it it should work for Scalasql as well

@lihaoyi
Copy link
Member

lihaoyi commented Sep 12, 2024

Seems like SLICK treats all booleans as bits as well https://github.com/slick/slick/blob/c4b081da7996ad74ccc707d139b07c11c2eb6bba/slick/src/main/scala/slick/jdbc/SQLServerProfile.scala#L316

I guess thats what we need to do in ScalaSql. If the library doesnt already provide appropriate hooks to override, we'll have to add them

@lihaoyi
Copy link
Member

lihaoyi commented Oct 17, 2024

Closing due to inactivity. The bounty remains open for anyone reading

@lihaoyi lihaoyi closed this Oct 17, 2024
@kiendang
Copy link
Author

@lihaoyi sorry for the lack of updates. This issue does turn out to require more work than I thought due to the many Microsoft SQL quirks. Beside the issue with Boolean, a lot of remaining failed tests would require more non-trivial solutions than the ones I've already covers. I haven't abandoned the PR but do work on it at a slower pace. If it's ok for you to reopen the PR, I'd work on the remaining issues gradually.

Of course I understand at the same time the bounty's still open to everyone else. Anyone who's interested feels free to continue from branch which provides a good starting point.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 18, 2024

@kiendang got it

@lihaoyi lihaoyi reopened this Oct 18, 2024
@lihaoyi
Copy link
Member

lihaoyi commented Oct 18, 2024

happy to leave it open if you're still working on it.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 18, 2024

@kiendang I'm happy to pay out the original bounty (500USD) for the work you've already done so far, just so you don't feel the work has been wasted in case someone else picks it up and completes it. Email me your international bank transfer details and I'll perform the wire

@kiendang
Copy link
Author

@lihaoyi Thank you! I've been compiling a list of remaining issues. Will clean it up and post here. Would require your perspective for some of them. Plus it'd help others who are interested in continue the work.

I'm also happy to receive the original bounty amount. Will email you later but this could take me a couple weeks coz I've just gone through some life changes.

I've also just seen the post on Reddit. I don't want to block anyone from continue the work and I wouldn't claim the rest of the bounty if someone does. Would it be a good idea if you update the original issue to point to this PR as a starting point?

@lihaoyi
Copy link
Member

lihaoyi commented Oct 18, 2024

@kiendang take your time, no hurry at all. Yes I'll update the ticket to explicitly point at this WIP for others to build upon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Microsoft SQL server (1000USD Bounty)
2 participants