Start embedding service faster by including installation step in build #75

Merged
maximiliansoelch merged 6 commits into master from improvement/speed-up-embedding-start
Aug 2, 2023

Conversation

pal03377
Contributor

@pal03377 pal03377 commented Aug 1, 2023

This PR improves the startup speed of the Athena-CoFee embedding service. The current version is slow on my machine because it re-downloads the spaCy model en_core_web_sm on every start. The download now happens in the build stage of the Docker container, so the embedding service starts faster and no longer needs to fetch the model each time it starts.
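
For illustration only, the idea of skipping the download at startup can be sketched as a guard that checks whether the model package is already installed. This is a minimal sketch, not the code from this PR: `model_is_installed`, `download_with_spacy_cli`, and `ensure_model` are hypothetical helper names, and the sketch assumes only that spaCy models are installed as regular Python packages and that `python -m spacy download` is available.

```python
import importlib.util
import subprocess
import sys
from typing import Callable

MODEL_NAME = "en_core_web_sm"  # the model named in this PR


def model_is_installed(name: str) -> bool:
    """Return True if a top-level package with this name is importable.

    spaCy models are installed as regular Python packages, so asking the
    import system is enough and does not load the model itself.
    """
    return importlib.util.find_spec(name) is not None


def download_with_spacy_cli(name: str) -> None:
    """Fetch the model by running `python -m spacy download <name>`."""
    subprocess.run([sys.executable, "-m", "spacy", "download", name], check=True)


def ensure_model(
    name: str = MODEL_NAME,
    download: Callable[[str], None] = download_with_spacy_cli,  # hypothetical hook
) -> bool:
    """Download the model only if it is missing.

    Returns True if a download was triggered, False if the model was
    already present (for example, because the image build installed it).
    """
    if model_is_installed(name):
        return False
    download(name)
    return True
```

If the image build already runs the download (for example with a `RUN python -m spacy download en_core_web_sm` step, which is an assumption about the Dockerfile, not a quote from it), `ensure_model()` returns `False` at container start and no network access is needed.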

The Docker build of the clustering service was also broken; I fixed that as well by updating the hdbscan version it uses.

@pal03377 pal03377 self-assigned this Aug 1, 2023
@pal03377 pal03377 marked this pull request as draft August 1, 2023 19:58
pal03377 added a commit to ls1intum/Athena that referenced this pull request Aug 1, 2023
@pal03377 pal03377 marked this pull request as ready for review August 2, 2023 11:29
@maximiliansoelch maximiliansoelch merged commit 2dbf9c8 into master Aug 2, 2023
7 checks passed
@maximiliansoelch maximiliansoelch deleted the improvement/speed-up-embedding-start branch August 2, 2023 15:48
pal03377 added a commit to ls1intum/Athena that referenced this pull request Aug 6, 2023
* filter for english submissions for CoFee

* rename "content" to "text" on TextSubmissions for consistency with Artemis

* default value for exercise meta

* auto-convert camelCase fields to snake_case

* Authorization instead of X-API-Secret header

* make grading instructions optional

* submission selection: allow camelCase for submission_ids

* change feedback consumer to consume multiple feedbacks

* small fixes related to last commit (feedback consumer becomes feedbackS consumer)

* make feedback.text optional

* add text submission language field to DB

* fix warning log

* add language data to test exercises for CoFee to process them

* small fix for sending more feedback

* correct feedback metadata

* automatically store incoming submissions

* prevent the cofee module from throwing when there is no cluster for a block

* TEMPORARY, REMOVE THIS AFTER: make it easier to test CoFee locally

* undo last temporary changes

* Add cleanup step to deployment

* fix restarting docker script

* download cofee_traefik_config before starting

* only use GitHub docker image registry, not DockerHub

* download another necessary CoFee config file

* Add logs viewer

* fix http basic auth

* fix prospector checks

* change http basic auth

* fix caddyfile

* add postgres db

* add psycopg2 for postgres support

* re-install athena in all modules for psycopg2

* store more data when requesting an Athena endpoint

* cofee: store text clusters first to avoid FK constraint violations

* Merge branch 'develop' into artemis-athena-integration

* small ts fix

* Show 422 errors in the log

* poetry fixes

* make mongodb more quiet

* Change detection of `@config_schema_provider` to be one per worker, not one per app

* install psycopg2 again

* re-install athena in all modules in the hope to have postgres installed

* Remove config_schema_provider check entirely as it's not reliable

* allow sending `submissionIds` instead of `submission_ids` to the submission selection endpoint again

* remove available module name enum because it caused more problems than it solved

* make mongodb much less chatty by silencing it (except for errors)

* add grading_instruction

* fix playground auth header

* fix /feedbacks for playground

* fix another place of the /feedbacks endpoint

* make feedback title nullable (can be null in Artemis)

* don't store submissions before the exercise

* add script to get docker images to build

* make images to build script more sophisticated

* special image building rule for develop

* attempt at making it work with newline outputs

* Script improvements: Stop on error and Fetch develop to find the commit hash of the branch creation

* log executed commands

* error handling for no base commit

* change approach to change detection with GitHub API

* input env vars into build detection script

* only build anything if there is anything to build

* improvements on image building

* fix has_images value in action

* fix logic of images-to-build to detect existing images

* fix my variable declaration in bash

* don't override github output

* Get the current branch again

* remove wrong ifs

* test

* change IMAGE_EXISTS check

* correct endpoint for github images

* fix api endpoint

* change approach to checking package existence yet again

* fix token name 🤦

* adapt image name

* correct image name again for url

* test 2

* undo tests (should not build anything)

* If athena was changed, build all docker images starting with module_*

* athena test

* undo test

* improve changed files to only use the ones changed since the last push

* change 1

* change 2

* try fetching the commit

* simplify changed files getting

* use correct variable

* change 1

* change 2

* another change (athena)

* undo athena change

* Submission selection: Don't crash when an index is out of range

* temporarily use CoFee version from ls1intum/Athena-CoFee#75

* cofee: add log for number of returned feedback suggestions

* provoke error again when they should happen

* remove redundant code in /submissions endpoint

* small syntax improvement

* remove tag that's not needed any more (merged into develop in Athena-CoFee)

* simplify result processing

* remove trailing newline for prospector

* quickfix

* go back to old cluster and segments processing because the new approach crashed

* add explainer comment for return value

* small correction of last commit

* ensure correct order of cluster & block storage

* fix log placeholder

* add distance matrix problem log

* prevent server crash and instead just ignore invalid block distances

* store text block positions in cluster because I found out that they are actually used

* fix text block get syntax

* cofee: correctly connect clusters and segments

* correctly pass cluster IDs for processing

* small cluster IDs fix

* another attempt at getting the cluster IDs

* debugging log

* add cluster instead of merge

* undefined instead of null

Co-authored-by: Felix T.J. Dietrich

---------

Co-authored-by: Felix T.J. Dietrich