Need faster deploys #160
Comments
@ronaldtse I got two questions:
|
No, as long as you can maintain it.
Great idea! GitHub now also supports environments, so deploys can be queued: if one job is running, the other jobs wait. In this case, we can use S3 Transfer Acceleration for the temporary bucket (as long as its name does not contain dots, '.').
This is probably necessary in either case. The third option is to use AWS DynamoDB or MongoDB Cloud Atlas, which will be necessary for high frequency update workloads. |
https://github.com/cobbzilla/s3s3mirror seems to work for mirroring. |
I just found out that we could enable Transfer Acceleration if we rename the buckets to remove the dots. It's now possible to use an arbitrarily named S3 bucket as an origin for CloudFront, so we can use "example-com" instead of "example.com" as the bucket name. Let me see what we can do. |
Is this expected? I thought glossaries would not be updated very frequently. |
AWS docs say:
Doesn't sound like our case. |
Frequency: it’s also the burst frequency, e.g. if people make subsequent changes quickly. I found a way to make Transfer Acceleration work with CloudFront, but it requires a separate Lambda@Edge function to return index.html in order to mimic S3 website functionality. In this case we may not need two buckets, but let’s see. |
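For illustration, here is a minimal sketch of what such a Lambda@Edge function could look like as a CloudFront origin-request handler in Python. It only rewrites directory-style URIs to their index.html key. The helper name `rewrite_uri` and the rule "a last path segment without a dot is a directory" are assumptions made for this sketch, not something decided in this thread, and unlike S3 website hosting it does not issue a redirect for paths missing the trailing slash.

```python
def rewrite_uri(uri: str) -> str:
    """Map a directory-style URI to the index.html object inside it.

    Assumption for this sketch: a last path segment without a dot is
    treated as a directory, anything else is served as-is.
    """
    if uri.endswith("/"):
        return uri + "index.html"
    last_segment = uri.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return uri + "/index.html"
    return uri


def handler(event, context):
    """CloudFront origin-request handler: rewrite the URI, then forward."""
    request = event["Records"][0]["cf"]["request"]
    request["uri"] = rewrite_uri(request["uri"])
    # Returning the request object tells CloudFront to continue to the origin.
    return request
```

With this in place, a request for `/concepts/` would be fetched from the bucket as `/concepts/index.html`, while asset paths such as `/assets/site.css` pass through unchanged.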
Wow, that sounds like a very different thing from the deploys we have now. If burst updates can happen, then slow uploads aren't our only problem. Building the full site from scratch will be too slow as well. Note that IEV has 20k concepts or so. We need some kind of incremental site builds in GHA to handle burst updates. Or throttling, or debouncing. |
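To make the debouncing idea concrete, here is a small sketch of trailing-edge debouncing over a list of change timestamps. The function name `debounce` and the 60-second quiet period in the usage example are illustrative assumptions. How this would be wired into GHA is left open.

```python
def debounce(event_times, quiet_period):
    """Return the times at which a build would start.

    A build starts only once `quiet_period` has passed with no further
    change, so a burst of changes collapses into a single build.
    """
    builds = []
    pending = None  # time of the most recent change not yet built
    for t in sorted(event_times):
        if pending is not None and t - pending >= quiet_period:
            # The previous burst went quiet before this change arrived.
            builds.append(pending + quiet_period)
        pending = t
    if pending is not None:
        builds.append(pending + quiet_period)
    return builds


# Three quick edits followed by a later one yield two builds, not four.
print(debounce([0, 10, 20, 300], quiet_period=60))
```

The trade-off is latency: every deploy is delayed by at least the quiet period, in exchange for not rebuilding once per change during a burst.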
Also, we need to prevent race conditions between deploys. |
I'm not sure what exactly Paneron will be responsible for when it comes to site generation, so this may be a silly idea: We can use Paneron to generate concept pages, and then use Jekyll to bind them into a site. Jekyll supports incremental site generation, so if we modify a few files only, then it should finish quite fast. Then we need to upload these modified files without touching the others — maybe Obviously that won't speed up full site rebuilds which we need too. |
My new idea involves persisting generated site across builds. This is going to be a separate Git repo (maybe hosted on GitHub, maybe existing just in GHA cache, it doesn't really matter) because I don't trust file timestamps as much as commit dates. File modification timestamp can be updated for any reason whereas git commit date means actual change to file contents. In steps (all done in GHA):
This approach should greatly reduce deploy time as compared to |
@skalee I think a more comprehensive approach is needed for S3 bucket sync; syncing unchanged items is clearly not desired. A possible mechanism is to maintain a hash index at the root (with hash keys of all files), which is updated by some cron/lambda function, so that when we upload something we can work out which files need updating and which do not. |
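A rough sketch of the hash index idea, assuming the index is a plain mapping from object key to SHA-256 digest (how and where the remote copy is stored, and how the cron/lambda job keeps it current, is not covered here). The function names `build_index` and `plan_sync` are hypothetical.

```python
import hashlib
from pathlib import Path


def build_index(site_dir):
    """Hash every file under site_dir, keyed by its path relative to it."""
    root = Path(site_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def plan_sync(local_index, remote_index):
    """Compare two indexes and return (keys to upload, keys to delete)."""
    to_upload = sorted(
        key for key, digest in local_index.items()
        if remote_index.get(key) != digest  # new or changed content
    )
    to_delete = sorted(key for key in remote_index if key not in local_index)
    return to_upload, to_delete
```

Only the keys in `to_upload` would need to be sent, so a re-deploy with no content changes would transfer nothing apart from the index comparison itself.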
FYI I've just triggered a re-deploy on iev-demo-site and it's slow again, despite the fact that nothing was changed and most files are identical. |
Deploying IEV site took over an hour, most of which (50 minutes) was spent on sending produced files to S3. We need to speed it up.
Currently we deploy with our custom Rake task defined here: https://github.com/geolexica/geolexica-server/blob/master/lib/tasks/deploy.rake. Under the hood it uses `s3 sync`, an official AWS tool. Some ideas on how to deal with that can be found in glossarist/iev-demo-site#66.