Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database manager not working on local dev machine #2171

Closed
amykglen opened this issue Oct 18, 2023 · 33 comments
Closed

Database manager not working on local dev machine #2171

amykglen opened this issue Oct 18, 2023 · 33 comments
Assignees
Labels

Comments

@amykglen
Copy link
Member

So I'm finally getting my local dev machine updated with the latest databases after my summer away, and running python ARAX_database_manager.py on my machine in master fails to update databases and instead prints a bunch of errors like these:

kg2c_sqlite has a local version, '1.0_KG2.8.0.1', which does not match the remote version, '1.0_KG2.8.4'.
downloading remote version...
ERROR: Wanted to run the following rsync, but it isn't going to work anyway. Skipping: rsync -Lhzcvv --progress [email protected]:/home/rtxconfig/KG2.8.4/kg2c_v1.0_KG2.8.4.sqlite /Users/amyglen/Projects/RTX/code/ARAX/KnowledgeSources/KG2c/kg2c_v1.0_KG2.8.4.sqlite
Error downloading kg2c_sqlite leaving local copy.

I dug into ARAX_database_manager.py, and it looks like @edeutsch commented out the rsync command in August and replaced it with the 'ERROR: Wanted to run..' message here:
c87c238#diff-6c5ef8a6fc1a29e4d6b0769025a7776adf87952fcc3814d3d45b6e9015687b65R342

when I undo that change (uncomment the rsync line and remove the error line), the database manager works normally for me.

is there a reason why the rsync line is commented out in master? was there an issue with it on one of our instances or something?

@amykglen amykglen added the bug label Oct 18, 2023
@edeutsch
Copy link
Collaborator

Yes, there was a long saga over the summer here, that to make a long story short, the running ITRB instances were trying to download databases via this mechanism and this was causing big problems. I applied that quick hack, not realizing that others were using it.

In the mean time, I thought @saramsey had already fixed this issue more robustly. What happened to that? Is that languishing in a branch, or is there a needed command-line flag?

@amykglen
Copy link
Member Author

amykglen commented Oct 18, 2023

Ah, I see. What if we only skip the rsync line if rtxConfig.is_itrb_instance == False? I think then ITRB would still never run the rsync line, but things would still work on local dev machines?

(I think it's pretty important to have the database manager functioning for dev work - it's a huge mess otherwise when trying to run pytests)

@amykglen
Copy link
Member Author

apologies - skip the rsync line if rtxConfig.is_itrb_instance == True, not False

@edeutsch
Copy link
Collaborator

edeutsch commented Oct 18, 2023

I think we concluded that we don't want to be downloading databased on arax.ncats.io either because we do that manually as well. In fact, I thought the final conclusion was that don't ever want database downloading to be happening while ARAX is "running". And so I thought the decision was to only let the database manager perform downloads when run from the CLI.

And I thought @saramsey implemented this already. @saramsey ?

If not, I will just reverse my change. It has caused more thrashing than it avoided at this point.

@saramsey
Copy link
Member

saramsey commented Oct 18, 2023

Well, as far as I can tell, the issue with simply uncommenting that line of code is that ARAX_infer.py creates an instance of the NodeSynonymizer class,

self.synonymizer = NodeSynonymizer()

and the initializer for the NodeSynonymizer class conditionally instantiates the ARAXDatabaseManager class,

db_manager = ARAXDatabaseManager()
db_manager.update_databases()

which can attempt to download databases. I thought we had decided we did not want that to ever happen at query time (i.e., downstream of query_controller.py). So I had been meaning to modify ARAX_database_manager.py to have a special command-line switch and/or constructor parameter, and only ever download databases if it is run with that switch, and not just when update_databases() is run on a ARAXDatabaseManager(),

But maybe the simpler thing is just to change those two lines of ARAX_infer.py to just log a TRAPI error and give up. I'm not wild about downloading databases at query time; we should just have ARAX fail loudly if it is missing a database file. When we restart the application, it will download the databases at application start-up. Just my $0.02.

But that is why I didn't simply comment out that line in ARAX_database_manager.py. I think there may be some other changes we want to make. I'm happy to discuss it, of course.

@saramsey
Copy link
Member

See RTXteam/RTX issue 2098

@saramsey
Copy link
Member

Also consider that ARAX_filter.py has a method that instantiates a QueryGraphInfo object, and the latter class's initializer (as describe above) calls the database manager's update_databses method. So that's another potential case where ARAXDatabaseManager.update_databases could get called at query time, if we simply uncomment the rsync line of code.

@saramsey
Copy link
Member

My feeling is that NodeSynonymizer's initializer should just fail if a database file is missing. It should under no circumstances attempt to download database files. But that's not the case as things stand currently.

@saramsey
Copy link
Member

I think there may be more code paths at query time that indirectly could call ARAXDatabaseManager.update_databases(), but I would need to audit the code more carefully to be sure.

@saramsey
Copy link
Member

So, what do we want to do? Apologies I didn't push this (i.e., 2098) through earlier. But I guess it's probably for the best that we discuss the specific proposed solution before implementing.

@saramsey
Copy link
Member

saramsey commented Oct 18, 2023

I should say @amkglen that I have just locally uncommented that line of code, when I have wanted to install databases. Which is not ideal but was my work-around until we can get this issue resolved more definitively.

@edeutsch
Copy link
Collaborator

I propose that we just update the class so that it will only actually download files if run from the CLI, e.g.,

python3 ARAX_database_manager.py --rsync

Any other ways that the database manager is called, it will decline to do an actual rsync but be noisy about the fact that it got into a state where it would have.

@saramsey
Copy link
Member

I propose that we just update the class so that it will only actually download files if run from the CLI, e.g.,

python3 ARAX_database_manager.py --rsync

Any other ways that the database manager is called, it will decline to do an actual rsync but be noisy about the fact that it got into a state where it would have.

I'd also like it to rsync when it is called at Flask application start-up (we can have a special initializer parameter for that, of course)

@saramsey
Copy link
Member

Do you want me to code that up?

@saramsey
Copy link
Member

I can do it right now.

@saramsey
Copy link
Member

I'm working on this in branch issue-2171

@edeutsch
Copy link
Collaborator

okay, thanks.

saramsey added a commit that referenced this issue Oct 19, 2023
@saramsey
Copy link
Member

Can you all please check commit 63e0449 to see if it looks sane?

@saramsey
Copy link
Member

saramsey commented Oct 19, 2023

In that commit, I've done a few things:

  • used underscores to identify methods on ARAX_database_manager.py that should not be called from outside the module
  • added a boolean class member variable allow_downloads that controls whether downloads are allowed; it is set via a parameter to the class initializer
  • when the ARAX_database_manager.py module is run via the CLI, the allow_downloads boolean is set to True, so downloads are permitted; that variable is also set to True in the two Flask __main__.py modules where they instantiate the ARAXDatabaseManager.
  • the code in node_normalizer.py that creates an instance of ARAXDatabaseManager has been removed.

@edeutsch
Copy link
Collaborator

Can you all please check commit 63e0449 to see if it looks sane?

The line we wanted to uncomment is still commented out?

#os.system(f"rsync -Lhzc{verbose} --progress {remote_location} {local_path}")

saramsey added a commit that referenced this issue Oct 19, 2023
@saramsey
Copy link
Member

saramsey commented Oct 19, 2023

D'oh! (Homer Simpson impression)

Done

@saramsey
Copy link
Member

saramsey commented Oct 19, 2023

With commit 63918bb, it seems to work on my MBP.

@saramsey
Copy link
Member

Do you want to test the branch on /beta or something or just merge it and be legends?

@edeutsch
Copy link
Collaborator

Legends!

@saramsey
Copy link
Member

Merged.

#2173

@saramsey
Copy link
Member

Testing on arax.ncats.io in /test. See updates on Slack in #deployment.

@saramsey
Copy link
Member

Testing looks good. I have moved /test back to the latest master branch (which includes the issue-2171 commits) and restarted ARAX.

@saramsey
Copy link
Member

I'm now going to delete the issue-2171 branch.

@saramsey
Copy link
Member

Deleted. Closing this issue out for now (just reopen if anything looks amiss).

@saramsey
Copy link
Member

@amykglen can you confirm that with the latest code in master, this issue is fixed for you on your local machine?

@amykglen
Copy link
Member Author

amykglen commented Oct 19, 2023

@saramsey - just confirmed python ARAX_database_manager.py works on my machine now. thanks!

so just to clarify - the database manager is no longer run automatically when we run ARAX pytests (without running the whole Flask application), so we should run python ARAX_database_manager.py on dev machines when databases change?

@saramsey
Copy link
Member

saramsey commented Oct 19, 2023

the database manager is no longer run automatically when we run ARAX pytests (without running the whole Flask application), so we should run python ARAX_database_manager.py on dev machines when databases change?

@amykglen: Near as I can tell, the only way ARAXDatabaseManager::update_databases() was being automatically run before during the pytest suite would have been via ARAXQuery::__init__, is that right?

If so, you correct. That's gone, as of commit e528c6c. It happened on Aug. 29, when you were away.

I think (and others at the time concurred) that we don't want the database manager being run in the initializer for ARAXQuery. The update_databases method is slow and we would incur that tax on every single query. (I think?)

If you want the old automation back, it's perfectly fine to check in a shell script like RTX/code/ARAX/test/check-db-and-run-tests.sh that runs the database manager and then runs the tests. But my $0.02 is that I don't think the default behavior of the pytest suite should be to automatically download and update databases locally. When I run pytest, my expectation would be that it runs tests on whatever installation I have set up. Downloading databases can take hours over anemic home broadband, and I frequently test at home. Again, just my $0.02.

I know you have done a lot of work on expand which necessarily involves running the pytests a lot. So I'd really welcome your thoughts.

@amykglen
Copy link
Member Author

thanks @saramsey - no, I'm content running the database manager manually as needed. just wanted to clarify that that's the SOP now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants