
Commit

Updated based on review
Add remote database url. Modify attribute naming convention. Remove chunking scripts.
zacdezgeo committed Sep 5, 2024
1 parent ec6f51d commit 1ff3ada
Showing 4 changed files with 3 additions and 76 deletions.
4 changes: 2 additions & 2 deletions docs/acceptance/db.md
@@ -10,7 +10,7 @@ The acceptance test below provides steps to verify that the deliverable meets ou

The input data is stored in Parquet format on AWS S3 (object storage), specifically in the file `space2stats_updated.parquet`. Any additional fields must be appended to this file. The Parquet file is tabular with the following columns:
- `hex_id`
-- `{aggregation_method[sum, mean, etc.]}_{variable_name}_{year}`
+- `{variable_name}_{aggregation_method[sum, mean, etc.]}_{year}`

### Database Setup

@@ -20,7 +20,7 @@ You can use a local database for this acceptance test by running the following c
docker-compose up
```

-Alternatively, you can connect to a remote database, such as the Tembo database used for production.
+Alternatively, you can connect to a remote database, such as the [Tembo database](reluctantly-simple-spoonbill.data-1.use1.tembo.io) used for production.

### Data Ingestion

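The naming change in `docs/acceptance/db.md` swaps the first two parts of each column name. The old form was `{aggregation_method}_{variable_name}_{year}` and the new form is `{variable_name}_{aggregation_method}_{year}`. The following is a minimal Python sketch of parsing the new form and renaming old-style columns. It assumes the following:

- Aggregation methods are single lowercase words drawn from a hypothetical fixed set (`AGG_METHODS`; the docs only say "sum, mean, etc.").
- Years are four digits.
- Variable names may themselves contain underscores.

The helpers `parse_column` and `old_to_new` are illustrative and not part of the repository.

```python
import re

# Hypothetical set of aggregation methods; the docs only list "sum, mean, etc."
AGG_METHODS = {"sum", "mean", "min", "max", "count"}

# New convention: {variable_name}_{aggregation_method}_{year}
_NEW = re.compile(r"^(?P<var>.+)_(?P<agg>[a-z]+)_(?P<year>\d{4})$")
# Old convention: {aggregation_method}_{variable_name}_{year}
_OLD = re.compile(r"^(?P<agg>[a-z]+)_(?P<var>.+)_(?P<year>\d{4})$")


def parse_column(name: str) -> tuple[str, str, int]:
    """Split a new-style column name into (variable_name, aggregation_method, year)."""
    m = _NEW.match(name)
    # The regex alone can split an old-style name too, so also check the method.
    if m is None or m["agg"] not in AGG_METHODS:
        raise ValueError(f"not in {{variable}}_{{agg}}_{{year}} form: {name!r}")
    return m["var"], m["agg"], int(m["year"])


def old_to_new(name: str) -> str:
    """Rename an old-style {agg}_{variable}_{year} column to the new order."""
    if name == "hex_id":
        return name  # the key column keeps its name
    m = _OLD.match(name)
    if m is None or m["agg"] not in AGG_METHODS:
        raise ValueError(f"not in {{agg}}_{{variable}}_{{year}} form: {name!r}")
    return f"{m['var']}_{m['agg']}_{m['year']}"
```

Assuming every column other than `hex_id` follows the old convention, a pandas frame could be migrated with `df.rename(columns=old_to_new)`, since `rename` accepts a callable for `columns`.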
18 changes: 0 additions & 18 deletions postgres/chunk_parquet.py
@@ -1,18 +0,0 @@
-import os
-
-import pandas as pd
-
-chunk_dir = "parquet_chunks"
-df = pd.read_parquet("space2stats_updated.parquet")
-chunk_size = 100000  # Number of rows per chunk
-
-if not os.path.exists(chunk_dir):
-    os.mkdir(chunk_dir)
-
-for i in range(0, len(df), chunk_size):
-    chunk = df.iloc[i : i + chunk_size]
-    chunk.to_parquet(
-        os.path.join(chunk_dir, f"space2stats_part_{i // chunk_size}.parquet")
-    )
-
-print("Parquet file split into smaller chunks.")
55 changes: 0 additions & 55 deletions postgres/load_parquet_chunks.sh

This file was deleted.

2 changes: 1 addition & 1 deletion postgres/load_to_prod.sh
@@ -17,7 +17,7 @@ CHUNKS_DIR="parquet_chunks"

# Name of the target table
TABLE_NAME="space2stats"
-PARQUET_FILE=space2stats_updated.parquet
+PARQUET_FILE=space2stats.parquet

echo "Starting"

