initial commit
daranzolin committed Nov 22, 2024
0 parents commit bb09ad9
Showing 1 changed file with 39 additions and 0 deletions.
39 changes: 39 additions & 0 deletions .github/workflows/ttx_to_parquet.yaml
@@ -0,0 +1,39 @@
name: Download and Save Data as Parquet (DuckDB)

on:
  schedule:
    - cron: "0 1 * * *"
  workflow_dispatch: # Allows manual triggering of the workflow

jobs:
  process-data:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Install duckdb
        run: |
          sudo apt-get update
          sudo apt-get install -y duckdb-cli
      - name: Download ttx and write to parquet
        run: |
          DATA_URL="https://data.sfgov.org/resource/g8m3-pdis.json?\$limit=9999999"
          OUTPUT_FILE="data/ttx.parquet"
          duckdb -c "copy (select certificate_number as ban, ownership_name, dba_name, cast(dba_start_date as date) as dba_start_date, cast(location_start_date as date) as location_start_date from read_json_auto('$DATA_URL')) TO '$OUTPUT_FILE' (FORMAT 'parquet');"
          echo "Data saved to $OUTPUT_FILE"
      - name: Configure
        run: |
          git config --global user.name "github-actions[bot]"
          git config --global user.email "github-actions[bot]@users.noreply.github.com"
      - name: Commit
        run: |
          git add data/ttx.parquet
          git commit -m "update ttx"
          git push
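
Note on the download step: the query parameter `$limit` sits inside a double-quoted shell string, where bash treats `$limit` as a variable reference rather than literal text. The sketch below shows the difference between the unescaped and escaped forms. The variable names `UNESCAPED_URL` and `ESCAPED_URL` are illustrative only and are not part of the workflow.

```shell
#!/usr/bin/env bash
# Sketch: how bash treats "$limit" inside double quotes.
unset limit  # mirror the workflow, where no variable named "limit" is defined

# Unescaped: bash substitutes the (empty) value of $limit.
UNESCAPED_URL="https://data.sfgov.org/resource/g8m3-pdis.json?$limit=9999999"

# Escaped: the backslash keeps the literal characters "$limit" in the URL.
ESCAPED_URL="https://data.sfgov.org/resource/g8m3-pdis.json?\$limit=9999999"

echo "unescaped: $UNESCAPED_URL"   # ends in ?=9999999
echo "escaped:   $ESCAPED_URL"     # ends in ?$limit=9999999
```

With the unescaped form the request is sent without a `$limit` parameter, so the row limit set in the workflow would not apply.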

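Note on the commit step: `git commit` exits with a non-zero status when nothing is staged, so a scheduled run that regenerates a file identical to the committed one would fail at that step. One possible guard is sketched below. The helper name `commit_if_changed` is hypothetical and does not appear in the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stage one file and commit only if it actually changed.
commit_if_changed() {
  local file="$1" message="$2"
  git add "$file"
  # `git diff --cached --quiet` exits 0 when the index matches HEAD,
  # i.e. when there is nothing staged to commit.
  if git diff --cached --quiet; then
    echo "No changes to commit"
  else
    git commit -q -m "$message"
    echo "Committed $file"
  fi
}
```

In the workflow this could be called as `commit_if_changed data/ttx.parquet "update ttx"` followed by `git push`, which is harmless when no new commit exists.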