Problem with Retrain DL in Quickannotator #13

anuradhakar49 · 2021-10-17T17:46:08Z

Hi,
The installation of the tool runs smoothly as described in the GitHub repository, but I am encountering problems with retraining the deep learning model. For example, after adding 2 pairs of images in a new project, making patches and annotations, and uploading them as training and test images, clicking "Retrain model" on the Project page gives the ERROR: train_autoencoder (job N) failed. On the Annotations page, clicking the "Retrain DL" button displays an HTML error.

Please provide suggestions on how to resolve these errors.
Anuradha Kar

choosehappy · 2021-10-19T12:13:51Z

Can you please provide the log files showing what the exact error is? Unfortunately, this information is too high-level for us to provide any insights.

mariokreutzfeldt · 2021-11-25T13:00:06Z

Hi @anuradhakar49 and @choosehappy

could you solve the issue? I'm having the same problem:

2021-11-25 13:54:10,872 [INFO] (THREAD 18304) About to train a new transfer model for try2
2021-11-25 13:54:10,887 INFO sqlalchemy.engine.base.Engine ROLLBACK
2021-11-25 13:54:10,887 [INFO] (THREAD 18304) ROLLBACK
2021-11-25 13:54:10,888 [INFO] (THREAD 18304) 127.0.0.1 - - [25/Nov/2021 13:54:10] "GET /api/try2/retrain_dl?frommodelid=0 HTTP/1.1" 404 -

System:
Win10, python 3.8, cuda 10.2

Best regards,
Mario

choosehappy · 2021-11-25T15:28:06Z

Sorry to hear this Mario!

Is this information you're putting here from the command line itself, or is it coming from the log file?

If you can send over the entire associated log file that would be appreciated

In the end, we were able to fix anuradhakar49's problem; it was environmental. If I remember correctly, it was an incompatible CUDA driver + CUDA version? @tasvora may have additional info.

tasvora · 2021-11-25T15:32:51Z

Yes, it was an environment issue related to CUDA, but we did not get to look at it in detail, as Anuradha decided to use Linux and it worked fine there.
Regards,
Tasneem

anuradhakar49 · 2021-11-25T15:41:37Z

Yes, this issue is solved and was linked to the CUDA + torch versions. @mariokreutzfeldt Please check that you have a CUDA-compatible GPU and that your code is able to access the GPU (i.e., the GPU is not busy with another task). Also make sure the PyTorch version is compatible with CUDA 10.2 (https://pytorch.org/get-started/previous-versions/). Otherwise, try a reinstall with the CPU-only version of torch to test.
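
The checks suggested above can be run in one go. This is a minimal sketch, not part of QuickAnnotator: the helper name describe_torch_cuda is made up for illustration, and it only relies on standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.get_device_name()).

```python
import importlib.util


def describe_torch_cuda():
    """Summarise whether PyTorch is installed and whether it can reach a GPU."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    import torch  # imported lazily so the check also runs without PyTorch

    info["torch_version"] = torch.__version__      # e.g. "1.8.1+cu102"
    info["built_for_cuda"] = torch.version.cuda    # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


for key, value in describe_torch_cuda().items():
    print(f"{key}: {value}")
```

If built_for_cuda is None, the installed wheel is a CPU-only build, which would explain cuda_available being False even on a machine with a working GPU and driver.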

mariokreutzfeldt · 2021-11-26T11:32:22Z

Dear all,
thank you for your fast replies!!

I have verified the CUDA installation via nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2019 NVIDIA Corporation Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019 Cuda compilation tools, release 10.2, V10.2.89

and the PyTorch installation via torch.cuda.is_available()
True

During installation of QA I ran into many unresolvable version issues.
So I ended up installing the following.

numpy==1.17.3
Flask_SQLAlchemy==2.4.0
scikit_image==0.16.2
scikit_learn==0.24.0
opencv_python_headless==4.1.2.30
scipy==1.4.1
requests==2.22.0
SQLAlchemy==1.3.5
tensorboard==2.4.1
ttach==0.0.2
albumentations==0.4.3
config==0.4.2
Flask==1.0.3
Pillow==8.1.2
llvmlite==0.34.0
numba
umap-learn
Flask_Restless==0.17.0
python-openslide==1.1.2

For Pytorch I had the automatic installation already fail for another project, so I downloaded the packages manually.
torch 1.8.1+cu102
torchaudio 0.10.0+cu102
torchvision 0.9.1+cu102

I installed torch first. When I installed torchaudio and torchvision, pip would uninstall torch and replace it with a non-CUDA version.
So I installed torch+cu102 again after having installed torchaudio and torchvision.

@choosehappy, the complete log is here

Best regards,
Mario
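
When pip silently swaps a CUDA build of torch for a CPU one, as described above, it can help to list which build of each package actually ended up installed. The sketch below is an illustration only (the helper names local_tag and report_builds are hypothetical); it assumes the wheels carry a local version tag such as "+cu102" or "+cpu", which is how the manually downloaded PyTorch wheels listed above are labelled. One possible cause worth checking against the previous-versions page linked earlier in this thread: torchaudio 0.10.0 appears to have been released alongside torch 1.10.0 rather than 1.8.1, which would be consistent with pip replacing torch.

```python
from importlib import metadata


def local_tag(version):
    """Return the build tag after '+', e.g. 'cu102' for '1.8.1+cu102'."""
    return version.split("+", 1)[1] if "+" in version else ""


def report_builds(packages=("torch", "torchvision", "torchaudio")):
    """Map each package to (version, build tag), or None if not installed."""
    report = {}
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = (version, local_tag(version))
    return report


for name, found in report_builds().items():
    print(name, found if found else "not installed")
```

If the three packages report different build tags, or one has no tag at all, the environment is probably mixing CUDA and CPU wheels.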

mariokreutzfeldt · 2021-11-26T13:21:10Z

Quick additional info:
replacing the CUDA with CPU versions of pytorch did not solve it.
Still getting ERROR 404.

choosehappy · 2021-11-26T13:38:25Z

It does look like the environment is really going to be the issue. Those libraries have been tested to work together and are what is used to create, e.g., our Docker files. Unfortunately, this log file doesn't appear to contain anything interesting. Can you also upload all data.* files? There might be up to 3 of them: data.db, data.db-shm, data.db-wal
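
The data.db-shm and data.db-wal files are the companion files SQLite creates when a database runs in write-ahead-log mode, so they belong with data.db. A minimal sketch for bundling them for upload follows; the function name and archive name are hypothetical, and the folder to pass in is wherever your QuickAnnotator installation keeps data.db. It is probably safest to stop QuickAnnotator before copying so the files are consistent with each other.

```python
import zipfile
from pathlib import Path


def bundle_database_files(folder, archive_name="qa_database_files.zip"):
    """Zip every non-empty data.* file found in `folder`; return names added."""
    folder = Path(folder)
    archive_path = folder / archive_name
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.glob("data.*")):
            # Skip empty files (for example a 0-byte data.db-wal).
            if path.is_file() and path.stat().st_size > 0:
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

Calling bundle_database_files(".") from the folder that holds data.db writes the archive next to the database and returns the list of files it included.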

mariokreutzfeldt · 2021-11-26T15:11:46Z

@choosehappy here you go.

It doesn't contain data.db-wal because the file was 0 KB.

choosehappy · 2021-11-26T15:49:04Z

Okay, this database looks like it was cleaned out. It looks like you restarted Quick Annotator after you had the error, which by default goes through and clears out old jobs. Can you set this line: https://github.com/choosehappy/QuickAnnotator/blob/7cf55b1939fc9ad73ccf6d5435b613bfb697c74c/config/config.ini#L7 to False, reproduce your error, and send the files back over?

tasvora · 2021-11-26T16:08:38Z

In addition to that, if you could copy everything you see on the console from which you start the Quick Annotator application, save it as a text file, and send that too, it would help. There may be a specific library error we are missing.
Regards,
Tasneem

mariokreutzfeldt · 2021-11-26T22:01:51Z

Here are the log files and the data.db after changing the config.
I am using the CPU version of PyTorch now and have seen that one project is giving me a "not enough training/test images" message, which makes sense. The second project is still giving error 400.

choosehappy · 2021-11-29T15:08:40Z

hmm...i think we'll have to jump on a call, these log files and database seem to indicate that things are working as expected : )

mariokreutzfeldt · 2021-12-06T08:03:23Z

Thank you @choosehappy and @tasvora for helping solve this issue!
In case someone else is having this problem, it turned out that I had a broken svml_dispmd.dll (730 KB instead of 18 MB).
Also, make sure scikit-image==0.18.1 is installed.

Best regards,
Mario
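
For anyone wanting to check for the same truncated DLL, the sketch below lists every copy of svml_dispmd.dll under a folder together with its size. The function name is hypothetical, and the 730 KB and 18 MB figures are simply the sizes reported above, not official reference values.

```python
from pathlib import Path


def find_dll_sizes(root, dll_name="svml_dispmd.dll"):
    """Return {path: size in MB} for every copy of `dll_name` under `root`."""
    sizes = {}
    for path in Path(root).rglob(dll_name):
        if path.is_file():
            sizes[str(path)] = path.stat().st_size / (1024 * 1024)
    return sizes
```

Calling find_dll_sizes(sys.prefix) (after import sys) scans the active Python environment; a copy well under 1 MB would match the broken file described above.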

stellaqu123 · 2022-06-09T08:53:56Z

Hi @choosehappy and @mariokreutzfeldt,
I have the same problem with Retrain DL in Quickannotator. After annotating a patch, when I ran Retrain DL - From base, I got an error message like "ERROR 404: (Unknown error)". The screenshot is below.
The console log is:
"2022-06-09 08:49:11,130 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2022-06-09 08:49:11,130 [INFO] (THREAD 139621868058368) BEGIN (implicit)
2022-06-09 08:49:11,131 INFO sqlalchemy.engine.base.Engine SELECT project.id AS project_id, project.name AS project_name, project.description AS project_description, project.date AS project_date, project.train_ae_time AS project_train_ae_time, project.make_patches_time AS project_make_patches_time, project.iteration AS project_iteration, project.embed_iteration AS project_embed_iteration
FROM project
WHERE project.name = ?
LIMIT ? OFFSET ?
2022-06-09 08:49:11,131 [INFO] (THREAD 139621868058368) SELECT project.id AS project_id, project.name AS project_name, project.description AS project_description, project.date AS project_date, project.train_ae_time AS project_train_ae_time, project.make_patches_time AS project_make_patches_time, project.iteration AS project_iteration, project.embed_iteration AS project_embed_iteration
FROM project
WHERE project.name = ?
LIMIT ? OFFSET ?
2022-06-09 08:49:11,131 INFO sqlalchemy.engine.base.Engine ('test1', 1, 0)
2022-06-09 08:49:11,131 [INFO] (THREAD 139621868058368) ('test1', 1, 0)
2022-06-09 08:49:11,131 [INFO] (THREAD 139621868058368) About to train a new transfer model for test1
2022-06-09 08:49:11,131 [INFO] (THREAD 139621868058368) About to train a new transfer model for test1
2022-06-09 08:49:11,132 INFO sqlalchemy.engine.base.Engine ROLLBACK
2022-06-09 08:49:11,132 [INFO] (THREAD 139621868058368) ROLLBACK
2022-06-09 08:49:11,132 [INFO] (THREAD 139621868058368) 124.126.17.86 - - [09/Jun/2022 08:49:11] "GET /api/test1/retrain_dl?frommodelid=0 HTTP/1.1" 404 -"
Following the earlier discussion in this thread, I checked my CUDA version and PyTorch version, which are compatible, and torch.cuda.is_available() returns True.
Hoping I could get help about this issue.
Best regards,
Xiaoping

choosehappy · 2022-06-09T11:26:35Z

we can start by collecting more information:

operating system + version
python version
pip freeze output
cuda version
Nvidia GPU version
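
The items in this list can be gathered with a short script. This is a sketch under assumptions: the helper names run and collect_report are made up, and it presumes nvcc and nvidia-smi are on PATH when CUDA and the NVIDIA driver are installed (otherwise it just notes that the tool was not found).

```python
import platform
import shutil
import subprocess
import sys


def run(command):
    """Run `command` (a list of strings) and return its output, or a note if unavailable."""
    if shutil.which(command[0]) is None:
        return f"{command[0]}: not found on PATH"
    result = subprocess.run(command, capture_output=True, text=True)
    return (result.stdout or result.stderr).strip()


def collect_report():
    """Gather the details requested above into one dictionary."""
    return {
        "operating system": platform.platform(),
        "python version": sys.version.split()[0],
        "pip freeze": run([sys.executable, "-m", "pip", "freeze"]),
        "cuda version (nvcc)": run(["nvcc", "--version"]),
        "nvidia gpu (nvidia-smi)": run(["nvidia-smi"]),
    }
```

Printing each key and value of collect_report() into a text file gives a single attachment covering all five items.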

stellaqu123 · 2022-06-10T02:36:01Z

Sure.

operating system + version
I use an Amazon EC2 Linux system. Using the command "cat /proc/version", the version is "Linux version 4.14.238-125.422.amzn1.x86_64 (mockbuild@koji-pdx-corp-builder-64004) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #1 SMP Tue Jul 20 20:51:46 UTC 2021".
python version
python 3.8.13
pip freeze output
the output is here,
pip_freeze_output.txt
cuda version
with command "nvcc --version", the information is as below
"
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
"
Nvidia GPU version
NVIDIA-SMI 450.142.00 Driver Version: 450.142.00 CUDA Version: 11.0
T4
torch version and cuda version
torch version: 1.8.1+cu111
torch.cuda.is_available() returns True

choosehappy · 2022-06-10T17:54:28Z

hmmm!! this all looks very reasonable!

is there any additional information in the console window at the top of the screen on the right?

In looking at the API itself and the console information you provided, the only 404 message that seems reasonable is here:

QuickAnnotator/QA_api.py

Line 147 in cafc757

error=f"Deep learning model {frommodelid} doesn't exist"), 404

This would seem to suggest that you don't have a base model already trained? is that the case?

if you look here:

https://github.com/choosehappy/QuickAnnotator/wiki/Image-List-Page

did you use the "3. (re)train model 0" button?

this step is needed to give good default weights
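
To see the text behind a 404 like the one in the console log, the endpoint can be queried directly and the response body printed. The sketch below is an illustration only: the helper names are hypothetical, and the base URL (host and port) is an assumption that must be replaced with wherever your QuickAnnotator instance is served. Note that on a working setup this request would presumably start a retraining job, just like the button does.

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode


def retrain_url(base_url, project, frommodelid=0):
    """Build the retrain_dl URL in the form seen in the console log above."""
    query = urlencode({"frommodelid": frommodelid})
    return f"{base_url.rstrip('/')}/api/{quote(project)}/retrain_dl?{query}"


def fetch_status_and_body(url):
    """Return (HTTP status, response text), including the body of error responses."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return error.code, error.read().decode("utf-8", "replace")
```

For example, fetch_status_and_body(retrain_url("http://localhost:5555", "test1")) would return the status code together with whatever error text the server sent, assuming the server is reachable at that (assumed) address.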

stellaqu123 · 2022-06-14T10:30:48Z

Thanks @choosehappy .
I hadn't used the "3. (re)train model 0" button before.
When I used the "3. (re)train model 0" button, I got the following error message in the console:
"
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

Downgrade the protobuf package to 3.20.x or lower.
Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
".
After downgrading the protobuf package to 3.19.1, both "3. (re)train model 0" and the Retrain DL function work. The problem is solved.
Thanks for your help! 👍
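
A quick way to tell whether an environment is exposed to this error is to compare the installed protobuf version with the 3.20.x ceiling named in the message. This is a minimal sketch with hypothetical helper names; it only reads package metadata and changes nothing.

```python
from importlib import metadata


def exceeds_3_20(version):
    """True if `version` is newer than the 3.20.x series named in the error."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


version = installed_protobuf_version()
if version is None:
    print("protobuf is not installed")
elif exceeds_3_20(version):
    print(f"protobuf {version} is newer than 3.20.x; consider downgrading")
else:
    print(f"protobuf {version} is within the range the error message suggests")
```

The other workaround from the error message, setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python in the environment before starting the application, avoids the downgrade at the cost of slower parsing.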

choosehappy · 2022-06-14T21:00:16Z

Fantastic! So you're all set? Did you encounter this problem when using the provided Docker file, or were you using your own base operating system?

stellaqu123 · 2022-06-23T09:21:00Z

Yes, I can now use the Quickannotator Retrain DL function.
I didn't use Docker. I just installed the package in my operating system.

choosehappy · 2022-10-11T08:42:35Z

Got it, thanks. Yes, protobuf can be a tricky one to maintain at the OS level :)
