Latest commit

14e4d39 · Mar 15, 2025

215 lines (161 loc) · 10.1 KB

project.md

Project

As stated in the course description:

Over the semester, students will build a complex end-to-end data system.

You'll be building a live dashboard, with all the infrastructure behind it:

  • Automated data ingestion
  • A database
  • Web-based interactive data visualization

All of this will be in the cloud.

Inspiration

Expectations

Part 1

Goals

Your group will pick an initial:

  • Problem space
  • Dataset

Part of this project is getting experience with automated data ingestion. Doing so is more interesting with data that changes regularly. You can incorporate additional datasets in the future.

Steps

Do the following as a group:

  1. Discuss what you'd like your project to focus on. You don't need to get too specific yet.
  2. Explore datasets that are updated at least weekly (the more often, the better) and pick one.
  3. Create a new notebook in Google Colab.
  4. Ensure you can load the data.
  5. Narrow down to 1-3 research questions.
    • In other words, at the end of this project, what do you want to be able to show?
  6. Draw an example visualization that you'd like to produce.
    • You can do so digitally or on a piece of paper.
    • Include a title, legend, and axis labels (where appropriate).
    • This is just a sketch; don't worry about the specific values.
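
Step 4 can be sketched as a small loading check in the notebook. This is a minimal sketch, assuming the dataset is available as CSV and has a date column; `load_dataset`, the `date` column name, and the sample rows are hypothetical stand-ins for your own data.

```python
import io

import pandas as pd


def load_dataset(source, date_col="date"):
    """Load a CSV (path, URL, or file-like object) and parse its date column."""
    df = pd.read_csv(source, parse_dates=[date_col])
    if df.empty:
        raise ValueError("Dataset loaded but contains no rows")
    # Sort chronologically so the newest records are at the bottom.
    return df.sort_values(date_col).reset_index(drop=True)


# Hypothetical sample standing in for a real, regularly updated dataset.
sample_csv = io.StringIO(
    "date,station,riders\n"
    "2025-01-08,A,120\n"
    "2025-01-01,A,100\n"
    "2025-01-01,B,80\n"
)

df = load_dataset(sample_csv)
print(df.dtypes)
print("Most recent record:", df["date"].max().date())
```

In Colab, pass the dataset's URL or an uploaded file path instead of `sample_csv`. Checking the most recent date each time you load is a quick way to confirm that the source really updates as often as you expect.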

Proposal

You will then submit the following to the Discussion on Ed:

  • What dataset are you going to use?
    • Please include a link.
  • What are your research question(s)?
    • Each question should be specific and objectively answerable through the data available.
  • What's the link to your notebook?
    • Go to Share -> General access -> LionMail -> Commenter.
  • What's your target visualization?
    • Include a picture.
  • What are your known unknowns?
  • What challenges do you anticipate?

Only one person from your group needs to submit. None of this is set in stone; it is just a starting place, and it can all be changed later.

Part 2

Goal: Get experience with an application development framework

Steps

  1. Using your dataset from Part 1:
    1. Create a Streamlit app.
    2. Deploy the app.
    3. Add a visualization.
      • You can get fancy, but don't have to at this stage. Get something simple working first.
  2. Bring in a second relevant dataset. (This one doesn't need to be regularly updated.)
    • This can be shown on a separate page of your Streamlit app, or combined in a single visualization.
  3. Add the names of the people on your team to your Streamlit app homepage.
  4. Turn in the link to your live app via CourseWorks.
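
For step 2, combining the two datasets in a single visualization usually comes down to joining them on a shared key. The following is a minimal sketch, assuming both datasets share a `station` column; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical primary dataset (regularly updated) and secondary lookup dataset.
ridership = pd.DataFrame(
    {"station": ["A", "B", "C"], "riders": [220, 80, 45]}
)
stations = pd.DataFrame(
    {"station": ["A", "B"], "borough": ["Manhattan", "Brooklyn"]}
)

# A left join keeps every row of the primary dataset; indicator=True adds a
# "_merge" column showing which rows found a match in the second dataset.
combined = ridership.merge(stations, on="station", how="left", indicator=True)

unmatched = combined[combined["_merge"] == "left_only"]
print(f"{len(unmatched)} row(s) had no match in the second dataset")

# Aggregate to the shape a chart needs: one value per borough.
riders_by_borough = (
    combined.dropna(subset=["borough"]).groupby("borough")["riders"].sum()
)
print(riders_by_borough)
```

Counting the unmatched rows before charting makes it obvious when the join key is formatted differently in the two sources. The aggregated result can then be passed to a Streamlit chart element such as `st.bar_chart`.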

Tips

Part 3

Goal: Get experience with unit testing

Steps

Work on branches and submit pull requests for the chunks of work. You decide what the "chunks" are.

  1. Without writing any code:
    1. Review your existing code.
      • What can be refactored into functions?
      • Where can we make our code DRY (Don't Repeat Yourself)?
    2. Decide what function you're going to create.
    3. Come up with test cases (inputs) and expected outputs.
      • This can be in a text file, doc, piece of paper, etc.
  2. Then, as code:
    1. Write tests.
    2. Confirm they fail.
    3. Refactor your code into the function.
    4. Make the tests pass.
  3. Repeat until you feel your code is well-organized and well-tested.
  4. Submit the links to the pull requests via CourseWorks.
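
The test-first loop in step 2 might look like the following. This is a minimal sketch using plain `assert` statements in pytest-style test functions; `clean_column_name` is a hypothetical helper chosen for illustration, not something your project necessarily needs.

```python
import re


def clean_column_name(name: str) -> str:
    """Convert a raw column header to snake_case."""
    # Replace each run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


# Test cases decided before writing the function (input -> expected output).
def test_spaces_become_underscores():
    assert clean_column_name("Total Riders") == "total_riders"


def test_punctuation_and_padding_are_removed():
    assert clean_column_name("  Fare ($) ") == "fare"


def test_already_clean_name_is_unchanged():
    assert clean_column_name("station_id") == "station_id"
```

Saved in a file such as `test_cleaning.py`, these run with `pytest`. To see the "confirm they fail" step, start with a function body that only raises `NotImplementedError`, run the tests, and then fill in the real logic.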

Outcome

As a result, your:

should be relatively short and easy to read.

This isn't a one-time thing; continue testing and refactoring as you continue with the Project.

Part 4

Retro

You will hold a team retrospective, with the goal of improving how your team works together. Since the groups are small, it can be fairly informal.

  1. Schedule 45 minutes for the retro.
    • The retro needs to be done live/synchronous, not asynchronous.
  2. Read about retros.
  3. Decide who will be the Facilitator.
    • Optional: Get someone from outside the team.
  4. Facilitator: Set up EasyRetro. Instructions.
  5. In the actual retro:
    1. Read the Agile Prime Directive out loud.
    2. 5 minutes: Individually write down "what went well" and "what could be better".
    3. 10-15 minutes: Discuss what has gone well.
    4. 20-25 minutes: Discuss what could be better.
    5. 5 minutes: Document takeaways / action items.

Analysis

  1. Move your Proposal to the Streamlit app as is.
  2. Revisit the Proposal.
    • Any new insights?
    • Anything you want to adjust?
  3. Document any changes to the Proposal on the Streamlit page.
  4. Proceed with the analysis.
    • If the majority of your code (to call APIs, etc.) is in modules/functions, it can be imported from a Jupyter notebook. You can do exploratory analysis there, moving things to modules/Streamlit as you go.
    • You might not be able to fully answer the question(s) yet, but get as close as you can.

At this point, your project should be looking more like one of the examples. Looking through the Streamlit data elements may be helpful.

Submit

Submit links to:

  • The EasyRetro board
  • Jupyter notebook(s), if any
  • The (updated) Streamlit app

Part 5

Goal: Understand how to work with a cloud-based database

Notes

  • A service account has been created in your Project for you. It has been given read-only access to BigQuery.
  • There are various things that can go wrong in these steps. Don't wait until the last minute.

Steps

  1. Install pandas-gbq.
  2. Load data.
  3. Have your app use BigQuery.
    1. Create a service account key as JSON. The service account is streamlit@[project].iam.gserviceaccount.com.
    2. Set up secrets management locally.
      • Make sure to add secrets.toml to your .gitignore so that you don't accidentally commit it to Git.
    3. Copy the key information to your secrets.toml file.
    4. Modify your app to read data from BigQuery.
    5. Copy the secrets to your deployed app.
    6. Re-deploy.
  4. Submit the links via CourseWorks for:
    • The pull request(s)
    • The link to your live Streamlit app
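
Copying the key into secrets.toml (step 3.3) can be done by hand, but it is easy to break the multi-line private key when pasting. The following is a minimal sketch of a converter, assuming the key is a flat JSON object of string values and that the app will look it up under a `gcp_service_account` section; that section name, the `key_to_toml` helper, and the fake key are all assumptions for illustration.

```python
import json


def key_to_toml(key: dict, section: str = "gcp_service_account") -> str:
    """Render a flat dict of string values as one TOML table."""
    lines = [f"[{section}]"]
    for name, value in key.items():
        # json.dumps quotes the value and escapes newlines and quotes in a way
        # that is also valid for a TOML basic string, so the private key
        # stays on a single line.
        lines.append(f"{name} = {json.dumps(value, ensure_ascii=False)}")
    return "\n".join(lines) + "\n"


# Fake key for illustration only. A real one would be read from the JSON file
# downloaded in step 3.1 using json.load.
fake_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nabc123\n-----END PRIVATE KEY-----\n",
    "client_email": "streamlit@my-project.iam.gserviceaccount.com",
}

print(key_to_toml(fake_key))
```

The output can be pasted into your local secrets.toml and into the deployed app's secrets settings. In the app, `st.secrets["gcp_service_account"]` would then return the key fields, which can be used to build the credentials passed to pandas-gbq.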