Commit

fix wrong regular expression
hfxsd committed Nov 22, 2023
1 parent 0cc518c commit daf9df8
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions tidb-lightning/tidb-lightning-data-source.md
@@ -345,7 +345,7 @@ TiDB Lightning currently only supports Parquet files generated by Amazon Aurora
```
[[mydumper.files]]
# The expression needed for parsing Amazon Aurora parquet files
-pattern = '(?i)^(?:[^/]*/)*([a-z0-9_]+)\.([a-z0-9_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$'
+pattern = '(?i)^(?:[^/]*/)*([a-z0-9\-_]+)\.([a-z0-9\-_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$'
schema = '$1'
table = '$2'
type = '$3'
@@ -377,14 +377,14 @@ Take the Aurora snapshot exported to S3 as an example. The complete path of the

Usually, `data-source-dir` is set to `S3://some-bucket/some-subdir/some-database/` to import the `some-database` database.

-Based on the preceding Parquet file path, you can write a regular expression like `(?i)^(?:[^/]*/)*([a-z0-9_]+)\.([a-z0-9_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$` to match the files. In the match group, `index=1` is `some-database`, `index=2` is `some-table`, and `index=3` is `parquet`.
+Based on the preceding Parquet file path, you can write a regular expression like `(?i)^(?:[^/]*/)*([a-z0-9\-_]+)\.([a-z0-9\-_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$` to match the files. In the match group, `index=1` is `some-database`, `index=2` is `some-table`, and `index=3` is `parquet`.
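
The capture groups described above can be checked outside TiDB Lightning with a minimal sketch like the following. It assumes the expression is meant to carry the `*` quantifiers and escaped dots, and it uses Python's `re` module purely for illustration (TiDB Lightning evaluates the pattern itself, and regex dialects can differ in edge cases). `SAMPLE_PATH` and `parse_route` are hypothetical names, and the sample path is an invented object key relative to `data-source-dir`, not one taken from a real export.

```python
import re

# Pattern with hyphens allowed in the schema and table names.
# Assumption: the `*` quantifiers and `\.` escapes are part of the
# intended expression.
NEW_PATTERN = re.compile(
    r"(?i)^(?:[^/]*/)*([a-z0-9\-_]+)\.([a-z0-9\-_]+)/"
    r"(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$"
)

# Previous pattern, which does not allow hyphens in the schema or table name.
OLD_PATTERN = re.compile(
    r"(?i)^(?:[^/]*/)*([a-z0-9_]+)\.([a-z0-9_]+)/"
    r"(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$"
)

# Hypothetical path relative to data-source-dir (illustrative only).
SAMPLE_PATH = "some-database.some-table/1/part-00000.gz.parquet"


def parse_route(pattern, path):
    """Return (schema, table, type) if the path matches, else None."""
    match = pattern.match(path)
    return match.groups() if match else None


print(parse_route(NEW_PATTERN, SAMPLE_PATH))  # ('some-database', 'some-table', 'parquet')
print(parse_route(OLD_PATTERN, SAMPLE_PATH))  # None
```

The second call returns `None` because `[a-z0-9_]+` cannot consume the hyphen in `some-database`, which is the mismatch this commit addresses.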

You can write the configuration file according to the regular expression and the corresponding index so that TiDB Lightning can recognize the data files that do not follow the default naming convention. For example:

```toml
[[mydumper.files]]
# The expression needed for parsing the Amazon Aurora parquet file
-pattern = '(?i)^(?:[^/]*/)*([a-z0-9_]+)\.([a-z0-9_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$'
+pattern = '(?i)^(?:[^/]*/)*([a-z0-9\-_]+)\.([a-z0-9\-_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$'
schema = '$1'
table = '$2'
type = '$3'