Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improvement](jdbc catalog) Optimize JdbcCatalog case mapping stability #41510

Merged
merged 4 commits into from
Dec 25, 2024

Conversation

zy-kkk
Copy link
Member

@zy-kkk zy-kkk commented Sep 30, 2024

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

  1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing
  2. Save the remote name in ExternalDatabase/ExternalTable object for get
  3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself
  4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive.
  5. Since Doris's column names are case-insensitive, column names will be checked at any time
  6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive.
  7. include/exclude_database_lists The remote name must be used
  8. meta_names_mapping is used to resolve the conflicts mentioned above
  9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping
  10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss
  11. Remove the logic of rebuilding JdbcClient every time refreshCatalog

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zy-kkk
Copy link
Member Author

zy-kkk commented Sep 30, 2024

run buildall

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add need to add more FE unit test to test all these logics

@zy-kkk zy-kkk force-pushed the jdbc_lower_mapping2 branch from 1f71caf to a833914 Compare October 7, 2024 15:12
@zy-kkk
Copy link
Member Author

zy-kkk commented Oct 7, 2024

run buildall

@zy-kkk zy-kkk force-pushed the jdbc_lower_mapping2 branch from a833914 to a310c8f Compare October 8, 2024 03:59
@zy-kkk
Copy link
Member Author

zy-kkk commented Oct 8, 2024

run buildall

@zy-kkk
Copy link
Member Author

zy-kkk commented Oct 9, 2024

run buildall

@zy-kkk zy-kkk force-pushed the jdbc_lower_mapping2 branch 2 times, most recently from 4483c21 to ac6a485 Compare November 4, 2024 17:08
@zy-kkk
Copy link
Member Author

zy-kkk commented Nov 4, 2024

run buildall

@zy-kkk zy-kkk force-pushed the jdbc_lower_mapping2 branch 2 times, most recently from d440a07 to 313cb95 Compare November 19, 2024 16:45
@zy-kkk zy-kkk force-pushed the jdbc_lower_mapping2 branch 2 times, most recently from 72c6f9a to 9185dca Compare December 4, 2024 07:42
String remoteColumnName = jdbcCatalog.getRemoteColumnNames(this.dbName, this.name, column.getName());
remoteColumnNames.put(column.getName(), remoteColumnName);
}
if (!remoteColumnNames.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the remoteColumnNames is always not empty

dbName -> Optional.ofNullable(
buildDbForInit(dbName, Util.genIdByName(name, dbName), logType, true)),
(key, value, cause) -> value.ifPresent(v -> v.setUnInitialized(invalidCacheInInit)));
ignored -> remoteToLocalPairs.stream().map(Pair::value).collect(Collectors.toList()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not right.
If you do this, the dbnames cache will always be remoteToLocalPairs, even if there are new dbs added.

@@ -76,6 +79,8 @@ public abstract class ExternalDatabase<T extends ExternalTable>
protected long id;
@SerializedName(value = "name")
protected String name;
@SerializedName(value = "remoteName")
protected String remoteName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remoteName maybe null when upgrading from old version. How to handle it?

@@ -85,6 +87,7 @@ public class ExternalTable implements TableIf, Writable, GsonPostProcessable {
protected long dbId;
protected boolean objectCreated;
protected ExternalCatalog catalog;
protected ExternalDatabase db;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also mark dbName as deprecated?

@@ -70,6 +70,8 @@ public class ExternalTable implements TableIf, Writable, GsonPostProcessable {
protected long id;
@SerializedName(value = "name")
protected String name;
@SerializedName(value = "remoteName")
protected String remoteName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remoteName maybe null when upgrading from old version. How to handle it?

@Override
public boolean tableExist(SessionContext ctx, String dbName, String tblName) {
makeSureInitialized();
return jdbcClient.isTableExist(dbName, tblName);
ExternalTable tbl = Objects.requireNonNull(this.getDbNullable(dbName)).getTableNullable(tblName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to make sure tbl is always not null?

@@ -49,6 +49,7 @@
import com.google.common.collect.Sets;
import com.google.gson.annotations.SerializedName;
import com.google.gson.internal.LinkedTreeMap;
import jdk.internal.joptsimple.internal.Strings;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong import, use guava

@zy-kkk zy-kkk force-pushed the jdbc_lower_mapping2 branch from 690a196 to af1edc6 Compare December 10, 2024 11:11
@@ -781,6 +781,15 @@ protected ExternalDatabase<? extends ExternalTable> buildDbForInit(String remote
return null;
}
}
} catch (RuntimeException e) {
// Handle "Found conflicting" exception explicitly
if (e.getMessage().contains("Found conflicting")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better define "Found conflicting" as a static field.
and use it here and in RuntimeException msg

LOG.error(e.getMessage());
throw e; // Rethrow to let the caller handle this critical issue
} else {
LOG.warn("Failed to check db {} exist in remote system, ignore it.", localDbName, e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this can be ignored?
Add comment in code

@liutang123
Copy link
Contributor

Hi, I found a bug witch can be fixed in this pr, can you add a test case like #45521 in ExternalTableNameComparedLowercaseTest?

@zy-kkk zy-kkk force-pushed the jdbc_lower_mapping2 branch from af1edc6 to 1e38576 Compare December 23, 2024 07:26
@zy-kkk
Copy link
Member Author

zy-kkk commented Dec 23, 2024

run buildall

1 similar comment
@zy-kkk
Copy link
Member Author

zy-kkk commented Dec 23, 2024

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 39494 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8419a4479c2e719e671ff5d09ac3324fe90cafae, data reload: false

------ Round 1 ----------------------------------
q1	17730	7562	7265	7265
q2	2052	181	171	171
q3	10598	1147	1101	1101
q4	10244	689	718	689
q5	7608	2725	2706	2706
q6	254	150	148	148
q7	999	616	607	607
q8	9225	1823	1938	1823
q9	6633	6429	6368	6368
q10	7036	2254	2338	2254
q11	471	262	265	262
q12	435	220	215	215
q13	17812	2891	2959	2891
q14	250	221	210	210
q15	564	516	513	513
q16	670	585	585	585
q17	985	565	611	565
q18	7285	6611	6626	6611
q19	1345	1014	1013	1013
q20	478	183	189	183
q21	4473	3347	3011	3011
q22	381	303	319	303
Total cold run time: 107528 ms
Total hot run time: 39494 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7248	7198	7223	7198
q2	324	232	229	229
q3	2923	2759	2994	2759
q4	2109	1858	1901	1858
q5	5579	5618	5666	5618
q6	223	140	141	140
q7	2250	1811	1817	1811
q8	3394	3487	3489	3487
q9	8882	8912	8933	8912
q10	3578	3564	3514	3514
q11	598	511	521	511
q12	805	589	581	581
q13	11544	3081	3116	3081
q14	302	277	283	277
q15	569	498	499	498
q16	694	644	636	636
q17	1823	1609	1548	1548
q18	7960	7404	7325	7325
q19	1664	1460	1530	1460
q20	2072	1762	1802	1762
q21	5412	5207	5206	5206
q22	687	606	560	560
Total cold run time: 70640 ms
Total hot run time: 58971 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 188766 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8419a4479c2e719e671ff5d09ac3324fe90cafae, data reload: false

query1	986	387	378	378
query2	6519	2495	2399	2399
query3	6706	222	218	218
query4	34256	23367	23223	23223
query5	4398	492	458	458
query6	284	203	193	193
query7	4643	310	318	310
query8	302	242	232	232
query9	9742	2778	2749	2749
query10	477	240	235	235
query11	18154	15068	14975	14975
query12	154	101	100	100
query13	1657	420	410	410
query14	10169	6539	7058	6539
query15	284	172	179	172
query16	8253	382	470	382
query17	1592	565	538	538
query18	2125	299	304	299
query19	362	158	164	158
query20	127	111	112	111
query21	209	104	105	104
query22	4258	4418	4304	4304
query23	33879	33480	33432	33432
query24	11568	2536	2376	2376
query25	706	398	427	398
query26	1815	156	158	156
query27	2880	332	332	332
query28	8109	2487	2487	2487
query29	1040	473	412	412
query30	297	148	151	148
query31	1045	815	858	815
query32	101	55	58	55
query33	799	291	289	289
query34	996	505	512	505
query35	861	730	735	730
query36	1079	927	959	927
query37	298	73	79	73
query38	4289	4279	3973	3973
query39	1472	1433	1430	1430
query40	289	100	100	100
query41	47	45	50	45
query42	116	103	102	102
query43	549	488	486	486
query44	1266	803	806	803
query45	188	160	163	160
query46	1141	742	719	719
query47	1942	1814	1844	1814
query48	421	323	310	310
query49	1290	405	389	389
query50	789	387	390	387
query51	7177	7032	6970	6970
query52	101	92	90	90
query53	258	181	183	181
query54	1313	410	428	410
query55	80	77	81	77
query56	266	232	245	232
query57	1268	1087	1118	1087
query58	242	208	225	208
query59	3167	2919	2924	2919
query60	277	250	248	248
query61	108	109	106	106
query62	894	653	682	653
query63	230	178	187	178
query64	5193	708	628	628
query65	3212	3230	3214	3214
query66	1413	312	301	301
query67	15876	15351	15336	15336
query68	5834	563	565	563
query69	433	256	249	249
query70	1207	1132	1113	1113
query71	436	252	256	252
query72	6414	4076	4134	4076
query73	783	364	374	364
query74	10294	8761	8834	8761
query75	3457	2585	2711	2585
query76	3523	1024	1105	1024
query77	554	283	287	283
query78	10724	9578	9460	9460
query79	2243	604	618	604
query80	748	429	426	426
query81	507	246	233	233
query82	816	118	113	113
query83	170	148	148	148
query84	242	72	68	68
query85	976	348	301	301
query86	460	313	304	304
query87	4421	4333	4299	4299
query88	3947	2252	2219	2219
query89	428	292	298	292
query90	2208	192	187	187
query91	139	103	115	103
query92	64	50	53	50
query93	2052	566	548	548
query94	1036	278	276	276
query95	349	262	246	246
query96	619	292	281	281
query97	2839	2700	2662	2662
query98	217	197	198	197
query99	1569	1331	1301	1301
Total cold run time: 305713 ms
Total hot run time: 188766 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.32 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8419a4479c2e719e671ff5d09ac3324fe90cafae, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.03	0.03
query3	0.24	0.07	0.07
query4	1.61	0.10	0.11
query5	0.44	0.39	0.40
query6	1.14	0.65	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.56	0.49	0.50
query10	0.54	0.57	0.55
query11	0.15	0.10	0.10
query12	0.13	0.11	0.12
query13	0.60	0.59	0.60
query14	2.74	2.85	2.77
query15	0.87	0.81	0.82
query16	0.39	0.38	0.40
query17	1.04	1.05	1.05
query18	0.23	0.21	0.21
query19	1.82	1.83	1.92
query20	0.02	0.02	0.01
query21	15.36	0.60	0.57
query22	2.77	2.36	1.44
query23	16.82	0.93	0.71
query24	3.70	0.34	1.94
query25	0.15	0.12	0.05
query26	0.58	0.14	0.13
query27	0.04	0.05	0.05
query28	10.19	1.11	1.07
query29	12.56	3.22	3.22
query30	0.25	0.07	0.06
query31	2.85	0.39	0.38
query32	3.24	0.46	0.46
query33	3.03	3.02	3.08
query34	16.97	4.43	4.42
query35	4.47	4.42	4.47
query36	0.68	0.48	0.47
query37	0.10	0.06	0.05
query38	0.04	0.04	0.03
query39	0.04	0.02	0.02
query40	0.17	0.12	0.14
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.86 s
Total hot run time: 31.32 s

@zy-kkk
Copy link
Member Author

zy-kkk commented Dec 24, 2024

run buildall

Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

@zy-kkk zy-kkk merged commit 9e50906 into apache:master Dec 25, 2024
23 of 25 checks passed
@zy-kkk zy-kkk deleted the jdbc_lower_mapping2 branch December 25, 2024 12:15
zy-kkk added a commit to zy-kkk/doris that referenced this pull request Dec 25, 2024
…ty (apache#41510)

1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing
2. Save the remote name in ExternalDatabase/ExternalTable object for get
3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself
4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive.
5. Since Doris's column names are case-insensitive, column names will be checked at any time
6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive.
7. include/exclude_database_lists The remote name must be used
8. meta_names_mapping is used to resolve the conflicts mentioned above
9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping
10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss
11. Remove the logic of rebuilding JdbcClient every time refreshCatalog
zy-kkk added a commit to zy-kkk/doris that referenced this pull request Dec 25, 2024
…ty (apache#41510)

1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing
2. Save the remote name in ExternalDatabase/ExternalTable object for get
3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself
4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive.
5. Since Doris's column names are case-insensitive, column names will be checked at any time
6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive.
7. include/exclude_database_lists The remote name must be used
8. meta_names_mapping is used to resolve the conflicts mentioned above
9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping
10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss
11. Remove the logic of rebuilding JdbcClient every time refreshCatalog
zy-kkk added a commit to zy-kkk/doris that referenced this pull request Dec 25, 2024
…ty (apache#41510)

1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing
2. Save the remote name in ExternalDatabase/ExternalTable object for get
3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself
4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive.
5. Since Doris's column names are case-insensitive, column names will be checked at any time
6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive.
7. include/exclude_database_lists The remote name must be used
8. meta_names_mapping is used to resolve the conflicts mentioned above
9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping
10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss
11. Remove the logic of rebuilding JdbcClient every time refreshCatalog
zy-kkk added a commit to zy-kkk/doris that referenced this pull request Dec 25, 2024
…ty (apache#41510)

1. Cancel the preservation of name mapping by IdentifierMapping, and only use it as the logic of case processing
2. Save the remote name in ExternalDatabase/ExternalTable object for get
3. Use fromRemoteXXName to process the remote name and map it to the local. Currently, only JdbcExternalCatalog uses IdentifierMapping for processing in this method, and other Catalogs are not processed. If processing is required, it should be extended by yourself
4. When lower_case_meta_name is true, the database, table, and column names mapped remotely to local will be checked to see if they are only case-sensitive.
5. Since Doris's column names are case-insensitive, column names will be checked at any time
6. When Fe config's lower_case_table_names is 1 or 2, the table names mapped from remote to local will be checked for only case-sensitive.
7. include/exclude_database_lists The remote name must be used
8. meta_names_mapping is used to resolve the conflicts mentioned above
9. If the same remoteName is in include_database_lists and meta_names_mapping, the filtered remoteName will be displayed according to the name mapped by meta_names_mapping
10. Since IdentifierMapping is no longer used to store the uppercase and lowercase name mapping, ExternalObject itself is used instead, which solves the problem of frequent refresh and small mapping loss
11. Remove the logic of rebuilding JdbcClient every time refreshCatalog
morningman pushed a commit that referenced this pull request Dec 26, 2024
morningman pushed a commit that referenced this pull request Dec 26, 2024
morningman pushed a commit that referenced this pull request Dec 27, 2024
### What problem does this PR solve?

Related PR: #41510

Problem Summary:

We should use remote name to get statistics
github-actions bot pushed a commit that referenced this pull request Dec 27, 2024
### What problem does this PR solve?

Related PR: #41510

Problem Summary:

We should use remote name to get statistics
morningman pushed a commit that referenced this pull request Jan 3, 2025
### What problem does this PR solve?
Related PR: #41510

Problem Summary:

In this pr#41510, namesCache changed from `LoadingCache<String,
List<String>>` to `LoadingCache<String, List<Pair<String, String>>>`,
but the `remove` method of this cache did not change, causing the hms
event case to fail.
github-actions bot pushed a commit that referenced this pull request Jan 3, 2025
### What problem does this PR solve?
Related PR: #41510

Problem Summary:

In this pr#41510, namesCache changed from `LoadingCache<String,
List<String>>` to `LoadingCache<String, List<Pair<String, String>>>`,
but the `remove` method of this cache did not change, causing the hms
event case to fail.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants