Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow collation to be a part of data type #7748

Merged
merged 9 commits into from
Nov 24, 2023

Conversation

aafemt
Copy link
Contributor

@aafemt aafemt commented Sep 13, 2023

@AlexPeshkoff
Copy link
Member

I see nothing bad that we get closer to SQL standard.

@dyemanov
Copy link
Member

dyemanov commented Sep 26, 2023

A few comments after looking at parse.y:

  1. By SQL spec, COLLATE can be specified not only for specific character set, but also for national_character_type.
  2. I see charset_clause also in udf_data_type and array_type, they're not covered by this patch.

Also, accordingly to the SQL spec, COLLATE cannot be specified for BINARY. It corresponds to your new grammar, but what about CHARACTER SET OCTETS being specified explicitly with some COLLATE? Is it going to be rejected due to collation mismatch after INTL tables lookup or maybe we should prohibit it with a dedicated error message?

24) COLLATE clause can be used as a part of character data type as per SQL standard.
(Dmitry Sibiryakov)

If is used twice, an error is returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not correspond to changes below. For parameters you seem to allow both and use the latest one. What about local variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is intended for table fields only, I'm not sure if parameters and variables are covered. Should they?

Copy link
Member

@dyemanov dyemanov Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have added the code to ParameterClause::ctor with a comment:
// Explicit collation has precedence over data type collation
This does not match the text about error being raised.

And parameters/variables are declared using:
column_domain_or_non_array_type collate_clause ...
where column_domain_or_non_array_type now includes a collate_clause itself after your changes...
domain_or_non_array_type -> domain_or_non_array_type_name -> non_array_type -> simple_type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment applies to cases when data type is derived from domain or column including collation. Old code silently reset collation to default if no explicit COLLATE was provided so I've added the condition and the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For parameters you seem to allow both and use the latest one.

Indeed, I'll fix that but it is not really related to the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about local variables?

Local variables seems to be ok:

SQL> create procedure p2 as
CON> declare a varchar(10) character set win1251 collate pxw_cyrl not null;
CON> declare b varchar(10) character set win1251 not null collate pxw_cyrl;
CON> declare c vu collate unicode_ci_ai;
CON> declare d vu;
CON> declare e vc not null collate win1251;
CON> declare f vc;
CON> begin end^
SQL> create procedure p3 as
CON> declare a varchar(10) character set win1251 collate pxw_cyrl not null collate win1251;
CON> begin end^
Statement failed, SQLSTATE = 42000
SQL error code = -637
-duplicate specification of COLLATE - not supported

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is pity that setClause() is not available outside of parser. Can they be made static?

IMHO, they (along with setClauseFlag / checkDuplicateClause / isDuplicateClause) should be static, as they don't access the parser state. @asfernandes , do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would question the need in collate_clause after domain_or_non_array_type and column_domain_or_non_array_type, as now it's also included inside these parser rules. The only regression would be that NOT NULL must be written after COLLATE, not before as now. Is it an acceptable breakage, given the low probability that both co-exist in input/output parameters / return values in real-world scripts? And there will be no need to check for duplicated definitions then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nodes are going to call Parser::setClause, I don't agree. It's layer violation.
In this case I would move them to Node class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collation can be removed from ParameterClause constructor parameters and assigned later. Is it ok?

@aafemt
Copy link
Contributor Author

aafemt commented Sep 26, 2023

array_type is not covered on purpose for two reasons:

  1. arrays are not much supported in general;
  2. it adds another parser conflict that is not good.

Copy link
Member

@dyemanov dyemanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, I agree with the PR, but the questioned issues must be addressed before merging.

@dyemanov
Copy link
Member

array_type is not covered on purpose for two reasons:

1. arrays are not much supported in general;
2. it adds another parser conflict that is not good.

I don't mind.

@aafemt
Copy link
Contributor Author

aafemt commented Sep 27, 2023

but what about CHARACTER SET OCTETS being specified explicitly with some COLLATE? Is it going to be rejected due to collation mismatch after INTL tables lookup or maybe we should prohibit it with a dedicated error message?

Charset OCTETS has only one collation so unless somebody created a custom collation for it (is it possible at all?) you'll get error "collation not found" trying to set collation to anything but default "OCTETS". I don't think that using of COLLATE should be prohibited for particular charset(s). At least for the sake of backward compatibility.

@dyemanov
Copy link
Member

but what about CHARACTER SET OCTETS being specified explicitly with some COLLATE? Is it going to be rejected due to collation mismatch after INTL tables lookup or maybe we should prohibit it with a dedicated error message?

Charset OCTETS has only one collation so unless somebody created a custom collation for it (is it possible at all?) you'll get error "collation not found" trying to set collation to anything but default "OCTETS". I don't think that using of COLLATE should be prohibited for particular charset(s). At least for the sake of backward compatibility.

OK, fine with me.

@@ -4737,14 +4737,16 @@ array_range
%type <legacyField> simple_type
simple_type
: non_charset_simple_type
| character_type charset_clause
| character_type charset_clause collate_clause
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether it works in our codebase, but accordingly to the SQL specification collate_clause must also be supported for text blobs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but Firebird has no standard character blobs (CLOB or CHARACTER LARGE OBJECT), only binary blobs so this part is not for this PR.

@aafemt
Copy link
Contributor Author

aafemt commented Sep 28, 2023

  1. By SQL spec, COLLATE can be specified not only for specific character set, but also for national_character_type.

Added.

@aafemt aafemt requested a review from dyemanov November 11, 2023 14:04
@AlexPeshkoff AlexPeshkoff merged commit e23f030 into FirebirdSQL:master Nov 24, 2023
22 checks passed
@asfernandes
Copy link
Member

@aafemt this change introduced a problem making this query work:

select cast('x' as varchar(10) character set utf8 collate wrong) from rdb$database;

@aafemt
Copy link
Contributor Author

aafemt commented Sep 11, 2024

If something that never worked started to work I wouldn't call it "a problem".
If cast ignores collation in target data type definition it is the problem with cast and not data type definition. Thus I extract your comment to a separate issue.

@asfernandes
Copy link
Member

If something that never worked started to work I wouldn't call it "a problem".

Not true. It worked perfectly well, giving an error in a not supported contruction.

If cast ignores collation in target data type definition it is the problem with cast and not data type definition. Thus I extract your comment to a separate issue.

No. It's a problem introduced in this PR.

@aafemt
Copy link
Contributor Author

aafemt commented Sep 12, 2024

It worked perfectly well, giving an error in a not supported contruction

Now the construction is supported and the query doesn't produce "not supported" error. It is logical and usual behavior of any new functionality.

@asfernandes
Copy link
Member

Now the construction is supported and the query doesn't produce "not supported" error. It is logical and usual behavior of any new functionality.

I've no idea of what are you talking, really. Makes no sense for me.

Is it a way to say that you're not going to fix regression #8249 introduced by this your change?

@aafemt
Copy link
Contributor Author

aafemt commented Sep 12, 2024

Is it a way to say that you're not going to fix regression #8249 introduced by this your change?

It is a way to tell you "do not call this regression". Regression is a case when working documented functionality doesn't work anymore.

#8249 is a separate bug that should be fixed by me or anyone else like any other bug. You assigned it to me - fine. Now there is nothing to do but to sit and wait.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants