CalStatePays Data Repository

This repository houses the data behind https://calstatepays.org in CSV form.

Introduction

CalStatePays is a visualization application for discovering, exploring, and analyzing the potential earnings of students after graduation from 6 different California State University (CSU) campuses. California State employment records associated with alumni from these CSU campuses are used as the basis for the information that is presented.

Database sources

The raw CSV files are provided by the CalStatePays data team, which is separate from the development team. The CSV files are a filtered-down subset of the full employment data from the state of California. More information on how the data is retrieved and presented can be found in the CalStatePays FAQ section. These filtered CSV files need to be converted to appropriate JSON files so that the Laravel seed operation can load the data into the database. The following steps need to be performed whenever we receive updated CSV files.

  • First we need to clone the CalStatePays source code from its repository and set up the .env file:
$ git clone https://github.com/CalStatePays/calstatepays.git
$ cd calstatepays
$ cp .env.dev .env

⚠️ Important: The steps described above assume that your machine is already set up for development. Please visit the CalStatePays source code repository at https://github.com/CalStatePays/calstatepays for more information.

Once we have the source code downloaded to our machine, we need to tell git that this repository is a submodule of the CalStatePays source code. For more information on how git submodules work, see the git documentation at https://git-scm.com/book/en/v2/Git-Tools-Submodules

  • Run the following commands to declare this repository as a git submodule:
# Make sure we are in the project's root directory (skip this if you are already there)
$ cd calstatepays
$ git submodule add https://github.com/CalStatePays/calstatepays_data.git python_parser
# Place the updated CSV files into the following directory: calstatepays/python_parser/python_parser_work_in_progress/csv
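Before moving on, it can help to confirm that the updated CSV files actually landed in the expected directory. The following is a minimal sketch, not part of the CalStatePays tooling: `check_csv_dir` is a hypothetical helper name, and the only path it relies on is the one given in the step above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CalStatePays tooling):
# succeeds only if the given directory holds at least one .csv file.
check_csv_dir() {
    local dir="$1" f
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*.csv; do
        # If the glob matched nothing, $f is the literal pattern and -f fails.
        if [ -f "$f" ]; then
            echo "CSV files found in: $dir"
            return 0
        fi
    done
    echo "no CSV files in: $dir" >&2
    return 1
}

# Usage, from the calstatepays root directory:
#   check_csv_dir python_parser/python_parser_work_in_progress/csv
```

Running `git submodule status` from the project root is another quick way to confirm that the submodule was registered.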

After successfully creating the git submodule, we can run the Python script that performs the necessary filtering on the dataset in the CSV files and produces a JSON representation of it.

  • To run the Python script we must first start our Docker containers by running the following command:
$ docker-compose up --detach

From here we need to start a shell inside the Docker container that runs our web app, since it already has Python installed.

  • To do this we must run:
$ docker exec -it calstatepays_web /bin/bash

Once we have a shell inside the appropriate Docker container, we issue the following commands to run the Python script:

$ cd python_parser/python_parser_work_in_progress
$ python3.6 python_parser_main.py

⚠️ Important: This command takes a very long time to run, so sit back and relax while the Python script completes.
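Because the seed operation that follows is also slow, it may be worth checking that the generated JSON files parse before starting it. The sketch below is an optional sanity check, not part of the project: `validate_json_dir` is a hypothetical helper, the output directory is passed in as an argument because this document does not state where the script writes its JSON, and the interpreter defaults to `python3` (set `PYTHON=python3.6` if that is the only name available in the container). It relies on Python's standard `json.tool` module, which exits with a non-zero status on malformed input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that every .json file in a directory parses.
# PYTHON is an assumption; override it if the container only has python3.6.
PYTHON="${PYTHON:-python3}"

validate_json_dir() {
    local dir="$1" status=0 f
    for f in "$dir"/*.json; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        if "$PYTHON" -m json.tool "$f" >/dev/null 2>&1; then
            echo "ok: $f"
        else
            echo "invalid JSON: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (replace the placeholder with the directory the script writes to):
#   validate_json_dir <json_output_directory>
```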

After the Python script completes, we can import the new data into the database using the Laravel seed operation.

  • We issue the following commands to run the seed operation:
# Assuming we have just finished running the Python script inside
# the Docker container...
# Navigate to the root directory of the calstatepays repo by
# going up two directories.
$ cd ../../
# We drop any existing tables and data to start fresh
$ php artisan migrate:refresh --seed

⚠️ Important: This command takes a very long time to run, so sit back and relax while the seed operation completes.

The next step is to clear the application’s cache.

  • To clear the application’s cache run the following command:
$ php artisan cache:clear
# Exit the Docker container
$ exit

Congratulations! You have successfully performed a data update to CalStatePays. The final step is to commit the JSON files generated by the Python script and push them to GitHub.

  • Run the following commands to commit your changes to GitHub:
$ git commit -am "Updated the data"
$ git push
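One caveat with the commit command above: `git commit -a` stages only files that git already tracks. If the Python script produced JSON files that did not exist before (an assumption; this document does not say whether it does), they would be left out of the commit. The sketch below uses a hypothetical helper, `list_untracked`, to list such files so they can be added with `git add` first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints files in the given repository that git does not
# track yet, honouring .gitignore. These files are NOT picked up by `git commit -a`.
list_untracked() {
    git -C "$1" ls-files --others --exclude-standard
}

# Usage, from the calstatepays root directory:
#   list_untracked .
# If any generated JSON files are listed, stage them before committing:
#   git add <path/to/new_file.json>
```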
