Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default auth_source to None instead of self.database #981

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

janosh
Copy link
Member

@janosh janosh commented Jul 23, 2024

using the database as the default authSource breaks jobflow-remote (running on AWS) from talking to a remote database. this PR changes the default to None, allowing MongoClient to use it's own default

i think this change is strictly speaking breaking but i suspect few users are relying on the current behavior since it's common practice to specify the authSource manually if it's not admin, which is the MongoClient default (docs)

authSource: The database to authenticate on. Defaults to the database specified in the URI, if provided, or to “admin”.


the error message is not helpful which is why it took me a while to troubleshoot this. commenting out the lines in my local code which are removed by this PR resolved this error

db_ids = submit_flow(
    flow, worker="aws-slurm", project=proj_name, resources=resources
)
>>> ---> 74     db_ids = submit_flow(
     75         flow, worker="aws-slurm", project=proj_name, resources=resources
     76     )

File /scratch/janosh/.venv/py312/lib/python3.12/site-packages/jobflow_remote/jobs/submit.py:79, in submit_flow(flow, worker, project, exec_config, resources, allow_external_references)
     73         raise ConfigError(
     74             f"Additional stores {missing_stores!r} are not configured for this project."
     75         )
     77 jc = proj_obj.get_job_controller()
---> 79 return jc.add_flow(
     80     flow=flow,
     81     worker=worker,
     82     exec_config=exec_config,
...
    245 elif code == 43:
    246     raise CursorNotFound(errmsg, code, response, max_wire_version)
--> 248 raise OperationFailure(errmsg, code, response, max_wire_version)

OperationFailure: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}

Copy link

codecov bot commented Jul 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.76%. Comparing base (1188a57) to head (801d5c1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #981      +/-   ##
==========================================
- Coverage   88.76%   88.76%   -0.01%     
==========================================
  Files          46       46              
  Lines        3979     3977       -2     
==========================================
- Hits         3532     3530       -2     
  Misses        447      447              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@janosh
Copy link
Member Author

janosh commented Jul 23, 2024

not sure who to ping here, best guess is @rkingsbury, @munrojm.

also @davidwaroquiers since it affects jf-remote and @mkhorton since it's breaking

@davidwaroquiers
Copy link
Contributor

@gpetretto I don't think this is a problem for jf remote. What do you think?

@gpetretto
Copy link
Contributor

Hi @janosh, does this break only in the case of jobflow-remote? Is there anything we could do there to avoid a breaking change in maggma? Or is this a generic problem when accessing a DB hosted on AWS with the MongoStore?

@janosh
Copy link
Member Author

janosh commented Jul 24, 2024

Or is this a generic problem when accessing a DB hosted on AWS with the MongoStore?

i only mentioned that our jf-remote is running on AWS for context (our MongoDB is not on AWS, it's running on a private cluster) but i don't think it's a necessary condition to experience this issue. authSource should be the name of the database that has the collection with the user credentials. if i understand correctly, any MongoDB that uses the default admin for holding the credentials collection (rather than the database where calculation results are stored) would experience this issue, no matter where it's running.

@janosh
Copy link
Member Author

janosh commented Jul 24, 2024

to elaborate, i think this is one of those cases where code that's trying to be helpful ends up causing more user surprise than if the code did nothing. if you have a non-default location for your credentials, then you're more likely to know you need to specify that in order for MongoClient to know where to look for it. but if you use the default and it's being overriden in deeply nested code, it takes a long time to figure out why the connection is failing

@rkingsbury
Copy link
Collaborator

Thanks for reporting @janosh . I tend to agree with you that this is probably a case where "trying to be helpful" is worse than just being more explicit about default behavior. I do have a few questions though:

  1. Can you elaborate on why this caused a problem for jf-remote? Since your PR is just changing a default, couldn't this be resolved by specifying 'admin' as the auth_source? (Or are you just saying that it was highly non-obvious that you had to do that?)

  2. If we are going to change the default here, would it make more sense to explicitly set the default to 'admin' rather than None? I actually can't tell from the MongoClient docs what the default setting of the authsource argument is (None or a str), though it does seem clear that it defaults to admin

  3. But if the default authsource is admin while the current behavior is to use the database itself, won't changing this break a lot of people's MongoStore that have currently been authenticating on the database itself instead of admin? (Possibly not, since your change doesn't trigger any test failures).

@janosh
Copy link
Member Author

janosh commented Jul 25, 2024

Since your PR is just changing a default, couldn't this be resolved by specifying 'admin' as the auth_source? Or are you just saying that it was highly non-obvious that you had to do that?)

yes and yes. the problem is not lack of workaround, it's the time needed to pinpoint the issue given it was 4 levels down in the import stack from the jobflow-remote submit_flow function which failed to authenticate with the database (even though pymongo.MongoClient connected just fine given the exact same credentials). i'm not familiar with the maggma code base so took me well over an hour to pinpoint the source of the connection issue. this PR is meant to remove this pitfall for the next person.

I actually can't tell from the MongoClient docs what the default setting of the authsource argument is (None or a str), though it does seem clear that it defaults to admin

i thought the MongoClient docs which i also quoted in the OP were pretty clear on this point. at least i found what i was looking for right away but maybe just got lucky with my google query. it does default to admin so no need to hard-code the default in maggma:

authSource: The database to authenticate on. Defaults to the database specified in the URI, if provided, or to “admin”.

But if the default authsource is admin while the current behavior is to use the database itself, won't changing this break a lot of people's MongoStore that have currently been authenticating on the database itself instead of admin? (Possibly not, since your change doesn't trigger any test failures).

potentially yes. my suspicion is that few people rely on this default but i don't have data to back that up (other than like you said no tests breaking and people who know their authSource is non-default being more likely to specify it explicitly in which case changing the default makes no difference)

@davidwaroquiers
Copy link
Contributor

davidwaroquiers commented Jul 25, 2024

I am also in favor of @janosh proposal. Using the database itself as the place to store the credentials is to me not the best approach. It means that when you have two databases, you basically have to define two users (of the same name) in the two different databases, and they can have different passwords ... Not very handy ... I also had a partner who did not found how to set the auth_source (he was using admin), so by trial and error, he found that it works when uses the database itself and then went on with this... I then suggested him to use admin (because of the above) and told me that it did not work and he didn't know why. So I told him that he could pass auth_source="admin" in maggma stores. I think it should be the default (either None, so that Mongo's default is used, i.e. admin, or directly admin as a default), even if it may be backward incompatible for some people (but they just have to change their configuration files).

@rkingsbury
Copy link
Collaborator

the problem is not lack of workaround, it's the time needed to pinpoint the issue

Got it! So in your case, you were using a separate admin database for authentication, and because that is the MongoDB default, it wasn't obvious that you had to set a non default kwarg in MongoStore. Do I have that right?

i thought the MongoClient docs which i also quoted in the OP were pretty on this point. at least i found what i was looking for right away but maybe just got lucky with my google query. it does default to admin so no need to hard-code the default in maggma:

Yes I should have been more specific - thank you for quoting and linking the docs. I read those and the default behavior of MongoClient is clear. What I can't tell is whether in the call signature to MongoClient, the default value of authsource is None or "admin" or something else. It's a minor point, but if we are going to change the Store default behavior to better harmonize with MongoClient, I think we ought to use the same semantics to minimize confusion for those coming from MongoClient

potentially yes. my suspicion is that few people rely on this default

My suspicion is that few people specify auth_source unless they have a special requirement to do so. For example in the MongoDB example in the docs we don't do this, and I don't recall ever having set this myself. So if we change the default, I think we will be effectively switching the auth source from the database itself to admin for a lot of users. I don't know enough about Mongo to know whether to expect that to "just work" or if that will require some change in their DB setups. Do any of you know, or could you test?

Unless we are 100% sure switching auth to admin should "just work", I'm thinking maybe we should include a warning when auth_source=None notifying users that the behavior has changed, and they might need to set auth_source equal to their database name.

@davidwaroquiers
Copy link
Contributor

Unless we are 100% sure switching auth to admin should "just work", I'm thinking maybe we should include a warning when auth_source=None notifying users that the behavior has changed, and they might need to set auth_source equal to their database name.

It took me some time to completely understand how MongoDB works for this. Short answer, switching authentication to admin will work but of course you have to re-add the user/credentials inside admin (the other option is for the user to set auth_source to the database name, the warning you mention should indeed be done and some changelog message to warn the users for this change in the new maggma version).

Basically, if it is the first database of a user, e.g. user_A, you add this user inside the admin database (with its password or other authentication method), providing him access to database_1. Then if you want to give him access to database_2, you add within the admin database the access of user_A to the database_2, but you don't need to set the authentication method again. In the case you define the authentication of a user in the database itself, you have to "duplicate" this user in each of the database he needs access to. And the point is these users are NOT the same users in a sense, they just share the same name but exist in different databases. MongoDB is actually quite flexible on this (too flexible imho) in that you could even define the authentication of user_A in database_1 for access to database_2 and vice-versa ... well I don't know who would be that crazy :D

Note that the recommended (and I would say most logical) way (and also the default in MongoDB and pymongo) is to use the admin database for authentication.

My suspicion is that few people specify auth_source unless they have a special requirement to do so. For example in the MongoDB example in the docs we don't do this, and I don't recall ever having set this myself. So if we change the default, I think we will be effectively switching the auth source from the database itself to admin for a lot of users. I don't know enough about Mongo to know whether to expect that to "just work" or if that will require some change in their DB setups. Do any of you know, or could you test?

I'm always setting auth_source in maggma stores (anywhere I'm using it, be it for us or else) because I want to use admin as you probably guessed from my comments :)

@rkingsbury
Copy link
Collaborator

Thanks all. I agree this should be changed, the question is how to do it in the least intrusive way possible.

  • @janosh - I was looking back at how my connections were set up for NERSC-hosted Mongo, and they always used the database itself as the auth_source (because I never set auth_source). I no longer have access to those databases, but could you or someone in Kristin's group please test and see what happens to an existing, working MongoStore that connects to NERSC if you change auth_source from None o admin? I suspect the lion's share of maggma users are either MP-associated staff working with Atlas or researchers at LBNL using NERSC hosted databases, so I want to fully understand how the change will impact them before I release this.

Based on how that goes, let's please add a warning when auth_source is None that explains the change in default behavior. Be explicit about what a user would need to do to go back to the old behavior (e.g., "If you are getting an Authentication Error, try setting auth_source ...")

@janosh
Copy link
Member Author

janosh commented Jul 26, 2024

I no longer have access to those databases, but could you or someone in Kristin's group please test and see what happens to an existing, working MongoStore that connects to NERSC if you change auth_source from None o admin?

I also no longer have access to NERSC. maybe @esoteric-ephemera could give this a try and report what happens when you apply the 2 line change in this PR to maggma and then try to ping a NERSC database.

I suspect the lion's share of maggma users are either MP-associated staff working with Atlas or researchers at LBNL using NERSC hosted databases, so I want to fully understand how the change will impact them before I release this.

i think so too. that said, i think maggma should aim to adopt the best default for a general audience and not override Mongo defaults in order to match infrastructural decisions in NERSC's Mongo setup

@rkingsbury
Copy link
Collaborator

i think so too. that said, i think maggma should aim to adopt the best default for a general audience and not override Mongo defaults in order to match infrastructural decisions in NERSC's Mongo setup

Agreed! I just want to make sure I fully understand the likely impact. Regardless I think the path forward will be to add a warning message.

@rkingsbury
Copy link
Collaborator

@janosh @materialsproject/materials-project-extended-team just resurfacing this discussion. Could someone with NERSC Mongo access let us know what the behavior is if you change auth_source from None to admin?

@janosh
Copy link
Member Author

janosh commented Aug 23, 2024

sorry, this feel off my radar. currently on vacation/traveling but it's not urgent anyway. happy for this to just be parked for now

@esoteric-ephemera
Copy link

esoteric-ephemera commented Aug 23, 2024

@rkingsbury @janosh Hey sorry I was supposed to check this and it fell off my radar too. Using "admin" instead of the database name gives me this error when trying to connect to my NERSC-hosted MongoDB as admin or readonly:

OperationFailure: Authentication failed., full error: {'operationTime': Timestamp(1724438398, 4), 'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed', '$clusterTime': {'clusterTime': Timestamp(1724438398, 4), 'signature': {'hash': ...

Also tbc, if auth_source = None, then I get a different error for typing:

TypeError: Wrong type for authSource, value must be an instance of str

@janosh
Copy link
Member Author

janosh commented Aug 23, 2024

thanks for checking @esoteric-ephemera! not unexpected. i leave the decision to y'all. breaking changes are always a tough sell but overriding a mongo default wasn't a great design choice in a general purpose tool imo

@esoteric-ephemera
Copy link

We'd have to update the jobflow (and probably fireworks) to reflect the change in auth_source. Can you specify auth_source : admin in your jobflow.yaml?

@janosh
Copy link
Member Author

janosh commented Aug 24, 2024

Can you specify auth_source : admin in your jobflow.yaml?

yes, see here

@esoteric-ephemera
Copy link

Is maybe a safer default behavior for maggma to try auth_source = self.database, test the connection, and then try admin? No breaking changes and hides the issue / avoids needless debugging for users with non-NERSC-style MongoDBs

@rkingsbury
Copy link
Collaborator

rkingsbury commented Aug 26, 2024

Is maybe a safer default behavior for maggma to try auth_source = self.database, test the connection, and then try admin? No breaking changes and hides the issue / avoids needless debugging for users with non-NERSC-style MongoDBs

Yes I was going to suggest something along these lines. @janosh I agree in principle that it would be best for maggma to follow MongoDB defaults here, but I do want to do something to mitigate / smoothen the breakage.

I was going to suggest defaulting to None in a try/except, and automatically resetting to self.database (with a warning) if it fails, e.g.:

try:
    Store.connect()
except OperationFailure:
    try:
         # connect with admin=self.database
         warnings.warn('Due to a recent update in maggma, you need to change the value of the admin kwarg!`)
    except OperationFailure as exc:
         raise exc

If we do one of the above, plus proactively message the MP crowd via Slack about the upcoming change, I think I'm comfortable with it.

@janosh
Copy link
Member Author

janosh commented Aug 27, 2024

good ideas @rkingsbury @esoteric-ephemera to minimize breakage! 👍 will take a swing at implementing once back in the office.

@rkingsbury
Copy link
Collaborator

good ideas @rkingsbury @esoteric-ephemera to minimize breakage! 👍 will take a swing at implementing once back in the office.

Hi @janosh , I know this isn't a high priority but just resurfacing this for when you're ready to implement. I'll be happy to merge once you do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants