Repo Cleanup #458

Open
alex-dewar opened this issue Aug 24, 2023 · 6 comments

@alex-dewar
Contributor

ChipWhisperer is a fairly old project at this point and, as such, the repo has accumulated a lot of files and a large history. This has a lot of negative effects:

  • Downloading the repo takes a while
  • The size of the ChipWhisperer repo makes the installer and VM a lot larger as well
    • The VM is now so large that it's over the GitHub limit for release files
    • These large file sizes increase build time, download time, install time, etc.
  • Some of the paths are a lot longer than they need to be. hardware/victims/firmware/* could be condensed into just target_firmware, for example
  • There are a lot of files here that most people probably don't care about - most people, for example, don't want to print ChipWhisperer-Lite PCBs or rebuild the FPGA/microcontroller firmware

It would therefore be beneficial if we could archive most of that history/those files and start fresh. The archive should be fairly simple; just make a new repo (maybe chipwhisperer-historical) and point a local version there.

For the new chipwhisperer repo, one option would be to start completely fresh; move all the desired files into a new repo and point that here. However, it would be nice if we could keep the history of all the non-NewAE contributions and just squash everything else to reduce space.

@colinoflynn
Contributor

There may be some not-horrible ways too. I tried running git filter-repo --analyze, which showed close to 400MB of deleted directories (in the packed size). That, combined with killing old branches, might go pretty far. It looks like a lot of the FPGA implementation files (e.g. for the CW305) are included, which most people don't care about either, so we could save some space by removing (or moving) them.

We could also consider moving the ChipWhisperer Python stuff to a separate repo... a long-discussed option, but it would need more consideration, as it may be an even more breaking change.

blob-shas-and-paths.txt
directories-all-sizes.txt
path-deleted-sizes.txt
path-all-sizes.txt
extensions-deleted-sizes.txt
extensions-all-sizes.txt
directories-deleted-sizes.txt

@jpcrypt
Contributor

jpcrypt commented Aug 28, 2023

Moving all the FPGA target stuff to its own repo makes sense and would give a substantial reduction.

@alex-dewar
Contributor Author

alex-dewar commented Sep 5, 2023

Seems like the archive import will be even easier than I thought. There's a GitHub option when making a new repo to import another one. Will have to see how this works, but hopefully it grabs all the branches and everything else that's currently there.

EDIT: Yeah, looks like importing preserves everything, including commits/branches/etc.

@alex-dewar
Contributor Author

alex-dewar commented Sep 5, 2023

git-filter-repo also seems to work very well. By deleting the CW305 files and removing the history of every deleted file, I was able to get the repo size down to ~400MB, from roughly 1.7GB. It may be worth trying to squash all the hardware/cw305.py history down to a single commit as well. I'd guess that would save something like 100MB.

I assume that we can make similar gains on chipwhisperer-jupyter as well.

Useful link for this: https://stackoverflow.com/questions/63496368/git-how-to-remove-all-files-from-the-git-history-that-are-not-currently-prese
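
A rough sketch of the approach from that link, assuming git-filter-repo is installed and that this is run on a fresh clone: write out the list of files that currently exist, then have git filter-repo keep only those paths via --paths-from-file. The helper names below (deleted_paths, filter_repo_command, write_keep_list) are made up for illustration and are not part of any ChipWhisperer tooling. One caveat worth checking first: files that were renamed lose their history from before the rename, since the old path is not in the keep list.

```python
import subprocess
from typing import Iterable, List, Set


def paths_to_keep(tracked: Iterable[str]) -> List[str]:
    """Sorted, de-duplicated list of paths that should survive the rewrite."""
    return sorted({p.strip() for p in tracked if p.strip()})


def deleted_paths(history: Iterable[str], tracked: Iterable[str]) -> Set[str]:
    """Paths that appear somewhere in history but are no longer tracked."""
    keep = set(paths_to_keep(tracked))
    return {p.strip() for p in history if p.strip()} - keep


def filter_repo_command(keep_file: str) -> List[str]:
    """Command line that keeps only the paths listed in keep_file."""
    return ["git", "filter-repo", "--paths-from-file", keep_file]


def write_keep_list(repo_dir: str, keep_file: str) -> List[str]:
    """Write the currently tracked files (git ls-files) to keep_file.

    Returns the command to run next; it is deliberately not executed here,
    because rewriting history is destructive.
    """
    result = subprocess.run(
        ["git", "ls-files"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    with open(keep_file, "w", encoding="utf-8") as fh:
        fh.write("\n".join(paths_to_keep(result.stdout.splitlines())) + "\n")
    return filter_repo_command(keep_file)
```

deleted_paths is only there to preview what would be dropped before committing to the rewrite; the history list could come from something like git log --all --name-only.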

@alex-dewar
Contributor Author

For chipwhisperer-jupyter, it looks like there is quite a bit to save, but the traces for the simulated versions of the labs end up taking a lot of space. Without the trace files, we can get the repo down to under 200MB.

@colinoflynn
Contributor

That sounds pretty good! For the traces and similar - we could move them to some external location (either another repo or even off GitHub). I would like something stable, so GitHub might still make sense, but they could get downloaded "on demand" if you actually need them (and not by default).

I had thought about this a little before. We could have a small Python module that deals with it, e.g.:

import chipwhisperer_traces as ct

traces = ct.sca101.etc

Would have to see if there is an existing module that does this for us, but the basic idea is that the first time you access a trace set it actually gets downloaded and then cached locally. Or you can force a download of everything (if, for example, you are running a training and want it all cached locally) with something like ct.download().

The main advantage of a more complicated download system is that it can be updated to be almost anything in the future. So it could point to another URL or even another system (e.g., eventually using a real database or similar).
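
A minimal sketch of the on-demand idea, just to make it concrete. Everything here is hypothetical: the TraceStore class, the name-to-URL index, and the injectable fetch function are assumptions for illustration, not an existing chipwhisperer_traces API. Keeping the fetch step swappable is what would let the backing store change later without touching the notebooks.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Dict

# A fetcher takes a source URL and a destination path and writes the file there.
Fetcher = Callable[[str, Path], None]


def http_fetch(url: str, dest: Path) -> None:
    """Default fetcher: plain HTTP(S) download using the standard library."""
    urllib.request.urlretrieve(url, str(dest))


class TraceStore:
    """Downloads trace files on first access and caches them locally."""

    def __init__(self, index: Dict[str, str], cache_dir: Path,
                 fetch: Fetcher = http_fetch) -> None:
        self._index = index              # trace name -> URL (hypothetical layout)
        self._cache_dir = Path(cache_dir)
        self._fetch = fetch

    def path(self, name: str) -> Path:
        """Return the local path for a trace file, downloading it if needed."""
        if name not in self._index:
            raise KeyError(f"unknown trace set: {name}")
        dest = self._cache_dir / name
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Download to a temporary name first so an interrupted
            # transfer never looks like a valid cached file.
            tmp = dest.with_name(dest.name + ".part")
            self._fetch(self._index[name], tmp)
            tmp.replace(dest)
        return dest

    def download(self) -> None:
        """Force everything into the cache, e.g. before an offline training."""
        for name in self._index:
            self.path(name)
```

Usage would look something like store.path("sca101/lab3.npy") inside a notebook, with the index shipped as a small file in the repo in place of the traces themselves.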


3 participants