-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support large(r) file uploads by directly uploading to S3 #2012
Conversation
server/util/file.py
Outdated
@@ -41,6 +41,7 @@ def s3(): # pylint: disable=invalid-name | |||
"s3", | |||
aws_access_key_id=config.AWS_ACCESS_KEY_ID, | |||
aws_secret_access_key=config.AWS_SECRET_ACCESS_KEY, | |||
region_name="us-west-1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to add this or the url returned in the get presigned post call will not be domain specific which causes errors. I assume this should be an env var?
@@ -237,7 +233,7 @@ const SelectSystemStep: React.FC<{ | |||
<HTMLSelect | |||
large | |||
onChange={setSystemType} | |||
value={systemType ?? undefined} | |||
value={systemType || undefined} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the pre commit hook yelled about this line and made me change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird! I wonder how it got past the hook before
Overall feedback:
Python notes:
|
I like the endpoint names! Very clear. I also can't think of a way to make this fewer than 3 calls. And I like the idea of keeping the 3 calls for local file system uploads, too. To keep branching between local dev and prod as minimal as possible.
I like the idea of consolidating if easy to do so, again just to make local dev as similar to prod as possible.
This makes sense to me!
I like the idea of consolidating validation logic, probably in step 3. The only validation that I'd want to happen sooner is preventing wrong file types, and I think we already cover that on the frontend by virtue of the allowed file types that we specify when we instantiate the file input.
I've always used
I have in some cases used arlo/server/api/standardized_contests.py Lines 132 to 135 in ab2a972
Might help in some of the cases you're describing? But also okay with punting as Jonah suggested. To the questions re wrapper ZIPs, assumptions around Hart/ES&S, etc., I believe we covered all those in Slack but am happy to answer any follow up questions! |
fa3fd69
to
f79cacd
Compare
07c8327
to
1e3f911
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for splitting this up into easy to read commits! It was pretty straightforward to review.
I've got a bunch of small notes but nothing major (hopefully)
client/src/components/useCSV.ts
Outdated
try { | ||
const file = files[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're only supporting a single file, should we change the files: File[]
param to just be file: File
?
: { | ||
fileType: file.type, | ||
} | ||
const getUploadResponse = await axios( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we only used axios
over fetch
in order to gain the ability to track upload progress. I can't remember how it impacts mocks and testing, but might consider switching all requests but s3 upload request to fetch
client/src/components/useCSV.ts
Outdated
`/api${url}/upload-complete`, | ||
addCSRFToken({ | ||
method: 'POST', | ||
data: finalizeUploadformData, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these endpoints should probably just take JSON for consistency with the rest of the API, since we don't need form data to send a file. Though I realize at this point that might be an annoying large change to make.
@@ -237,7 +233,7 @@ const SelectSystemStep: React.FC<{ | |||
<HTMLSelect | |||
large | |||
onChange={setSystemType} | |||
value={systemType ?? undefined} | |||
value={systemType || undefined} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird! I wonder how it got past the hook before
multiple: | ||
values.cvrFileType === CvrFileType.ESS || | ||
values.cvrFileType === CvrFileType.HART, | ||
multiple: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this is false by default, and could be omitted now
server/api/cvrs.py
Outdated
@@ -1035,19 +1034,31 @@ def parse_hart_cvrs( | |||
|
|||
cvr_zip_files: Dict[str, BinaryIO] = {} # { file_name: file } | |||
scanned_ballot_information_files: List[BinaryIO] = [] | |||
hasNonCsvZipFiles = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove has
since this isn't a boolean
if len(cvr_zip_files) == 0 and len(scanned_ballot_information_files) == 0: | ||
cvr_zip_files[jurisdiction.cvr_file.name] = retrieve_file( | ||
jurisdiction.cvr_file.storage_path | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to download the file again in this case, or is there a way to reuse it since we already have it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the wrapper zip is immediately closed I didn't see a obvious way and wanted to err on the side of touching this code as little as possible haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could just do wrapper_zip.seek(0)
and move the closing of the file to later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filing a followup for this #2025
@@ -377,11 +365,12 @@ def test_dominion_cvrs_with_leading_equal_signs( | |||
rv = client.get( | |||
f"/api/election/{election_id}/jurisdiction/{jurisdiction_ids[0]}/cvrs" | |||
) | |||
print(json.loads(rv.data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove print
data={ | ||
"jurisdictions": ( | ||
# We expect the API to order the jurisdictions by name, so we | ||
# upload them out of order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe keep this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
server/tests/helpers.py
Outdated
) | ||
|
||
|
||
def setup_ballot_manifest_upload( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These helpers are great! One nit: they are doing more than just set up the upload, they are doing it, so might be slightly clearer to name them: upload_ballot_manifest
, etc.
1e3f911
to
3830ef4
Compare
a053980
to
023fd09
Compare
was hoping to merge tonight but there is some security issue I'm not understanding in my test deployment so will pick up tomorrow |
7f1eed4
to
ea5df28
Compare
…ndle error testing
ea5df28
to
2dcf903
Compare
#1674
Suppor large(r) file uploads by directly uploading to S3
In order to enable larger file uploads in arlo directly upload from the client to s3 rather then doing "passthrough" file uploads through our server. This is the heroku recommended approach and generally recommended by most other heroku/s3 users. There will still be a 5GB max limit above which we need to chunk the transfer to s3. The process involves
I landed on keeping the upload-url endpoint for step 1 rather then the other suggested approaches as different files have different authentication which directly relates to the file path that that endpoint needs to construct (it needs to know about the election_id and in some cases jurisdiction_id) so it seemed simplest to leave it like this.
This PR also changes CVR uploads to be 1 file and always be the "wrapper" zip we previously created for Hart/ESS. In the case of HART we now check if there were not any zip files inside of the top-level zip after unzipping to handle the situation where someone just uploads the zip of cvrs and does not "wrap" it in anything. There was some validation logic for hart around not having images or other file types in the zip that is now moves to happen in the processing step.
I recommend reviewing by commit, I tried to isolate the bigger test refactors but let me know if you want me to break up any commits further.
If we wanted to support multiple file uploads in the future to not change this I think we would need to: