Commit
Merge pull request duckdb#847 from Alex-Monahan/read_parquet
Standardize on read_parquet instead of parquet_scan
Alex-Monahan authored Jul 3, 2023
2 parents ee8918a + 77b0bf9 commit 85c77d6
Showing 2 changed files with 13 additions and 8 deletions.
6 changes: 4 additions & 2 deletions docs/data/partitioning/hive_partitioning.md
@@ -7,6 +7,8 @@ title: Hive Partitioning

```sql
-- read data from a hive partitioned data set
+SELECT * FROM read_parquet('orders/*/*/*.parquet', hive_partitioning=1);
+-- parquet_scan is an alias of read_parquet, so they are equivalent
SELECT * FROM parquet_scan('orders/*/*/*.parquet', hive_partitioning=1);
-- write a table to a hive partitioned data set
COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month));
@@ -36,7 +38,7 @@ orders
Files stored in this hierarchy can be read using the `hive_partitioning` flag.

```sql
-SELECT * FROM parquet_scan('orders/*/*/*.parquet', hive_partitioning=1);
+SELECT * FROM read_parquet('orders/*/*/*.parquet', hive_partitioning=1);
```

When we specify the `hive_partitioning` flag, the values of the columns will be read from the directories.
@@ -46,7 +48,7 @@ Filters on the partition keys are automatically pushed down into the files. This

```sql
SELECT *
-FROM parquet_scan('orders/*/*/*.parquet', hive_partitioning=1)
+FROM read_parquet('orders/*/*/*.parquet', hive_partitioning=1)
WHERE year=2022 AND month=11;
```

15 changes: 9 additions & 6 deletions docs/extensions/httpfs.md
@@ -37,6 +37,9 @@ SELECT COUNT(*) FROM 'https://domain.tld/file.parquet';
Scanning multiple files over HTTP(S) is also supported:

```sql
+SELECT * FROM read_parquet(['https://domain.tld/file1.parquet', 'https://domain.tld/file2.parquet']);
+
+-- parquet_scan is an alias of read_parquet, so they are equivalent
SELECT * FROM parquet_scan(['https://domain.tld/file1.parquet', 'https://domain.tld/file2.parquet']);
```

@@ -129,7 +132,7 @@ SELECT * FROM 's3://bucket/file.extension';
Multiple files are also possible, for example:

```sql
-SELECT * FROM parquet_scan(['s3://bucket/file1.parquet', 's3://bucket/file2.parquet']);
+SELECT * FROM read_parquet(['s3://bucket/file1.parquet', 's3://bucket/file2.parquet']);
```

### Glob
@@ -138,7 +141,7 @@ File globbing is implemented using the ListObjectV2 API call and allows to use f
multiple files, for example:

```sql
-SELECT * from parquet_scan('s3://bucket/*.parquet')
+SELECT * from read_parquet('s3://bucket/*.parquet')
```

This query matches all files in the root of the bucket with the parquet extension.
@@ -147,13 +150,13 @@ Several features for matching are supported, such as `*` to match any number of
character or `[0-9]` for a single character in a range of characters:

```sql
-SELECT COUNT(*) FROM parquet_scan('s3://bucket/folder*/100?/t[0-9].parquet')
+SELECT COUNT(*) FROM read_parquet('s3://bucket/folder*/100?/t[0-9].parquet')
```

A useful feature when using globs is the `filename` option which adds a column with the file that a row originated from:

```sql
-SELECT * FROM parquet_scan('s3://bucket/*.parquet', FILENAME = 1);
+SELECT * FROM read_parquet('s3://bucket/*.parquet', FILENAME = 1);
```

could for example result in:
@@ -178,7 +181,7 @@ s3://bucket/year=2014/file.parquet
If scanning these files with the HIVE_PARTITIONING option enabled:

```sql
-SELECT * FROM parquet_scan('s3://bucket/*/file.parquet', HIVE_PARTITIONING = 1);
+SELECT * FROM read_parquet('s3://bucket/*/file.parquet', HIVE_PARTITIONING = 1);
```

could result in:
@@ -194,7 +197,7 @@ however, these columns behave just like regular columns. For example, filters ca
columns:

```sql
-SELECT * FROM parquet_scan('s3://bucket/*/file.parquet', HIVE_PARTITIONING = 1) where year=2013;
+SELECT * FROM read_parquet('s3://bucket/*/file.parquet', HIVE_PARTITIONING = 1) where year=2013;
```

## Writing

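The equivalence this commit relies on can be checked directly. As an illustrative sketch (assuming a DuckDB shell and a hypothetical local file `orders.parquet`), the two functions are interchangeable:

```sql
-- parquet_scan is an alias of read_parquet, so both queries below
-- return the same result (the file name here is hypothetical)
SELECT count(*) FROM read_parquet('orders.parquet');
SELECT count(*) FROM parquet_scan('orders.parquet');
```

Since the two names resolve to the same table function, switching the documentation to `read_parquet` changes nothing for existing queries that use `parquet_scan`.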