Bulk User Import into SuperTokens #912
About method 1:
About method 2:
This is a very tiny advantage IMO. What other advantages are there?
Additional points:
@rishabhpoddar After discussions, we've opted for method 1. I've expanded the original description to elaborate on method 1 in detail.
Please give it in SQL format like
The totp object needs skew, period, and deviceName (optional). Please see the API for importing a device.
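A minimal sketch of a TOTP device entry in the import JSON, with a validator for the fields mentioned above. The `secretKey` field and the validation rules (non-negative integer `skew`, positive integer `period`) are assumptions for illustration; the core's device import API is the authority on the real field list.

```typescript
// Hypothetical shape of a TOTP device entry in the bulk import JSON.
// skew, period and deviceName come from the review comment;
// secretKey is an assumption (a TOTP device needs a shared secret).
interface TotpDevice {
  secretKey: string;
  skew: number;        // number of adjacent time windows to accept
  period: number;      // length of one time window, in seconds
  deviceName?: string; // optional
}

// Returns a list of human-readable problems; an empty list means the entry is valid.
function validateTotpDevice(device: Partial<TotpDevice>): string[] {
  const errors: string[] = [];
  if (typeof device.secretKey !== "string" || device.secretKey.length === 0) {
    errors.push("secretKey must be a non-empty string");
  }
  if (typeof device.skew !== "number" || !Number.isInteger(device.skew) || device.skew < 0) {
    errors.push("skew must be a non-negative integer");
  }
  if (typeof device.period !== "number" || !Number.isInteger(device.period) || device.period <= 0) {
    errors.push("period must be a positive integer");
  }
  if (device.deviceName !== undefined && typeof device.deviceName !== "string") {
    errors.push("deviceName must be a string if provided");
  }
  return errors;
}
```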
What are the specs of the setup you ran the test on?
This is not true.
Why do we need this API? Why can't we just auto clear the space in the db?
Done.
Done.
I ran the tests on my MacBook Pro, which has fairly good specs, and was able to insert up to 50K users. 10K is a conservative number that should accommodate lower-end specs too.
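Assuming the 10K figure is used as a per-request cap on how many users are sent to the import API, the script can split its input into chunks no larger than that. `MAX_USERS_PER_REQUEST` is a hypothetical constant name, not an existing setting.

```typescript
// Hypothetical per-request cap, based on the conservative 10K figure above.
const MAX_USERS_PER_REQUEST = 10_000;

// Splits an array into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number = MAX_USERS_PER_REQUEST): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

For example, 25,000 users produce three requests of 10,000, 10,000 and 5,000 users.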
We'll update the data, overwriting fields wherever applicable for simplicity. If a user is performing a bulk import while their previous auth provider is active, it's possible that user data could be updated for some users after creating the export file. Allowing users to update and create users in the same API would be helpful.
Can you please elaborate? Do we support every thirdparty?
Upon some discussion we decided to remove it.
The information on the atomicity of Common Table Expressions (WITH statement) is ambiguous. While some sources suggest potential issues where another query might select the same documents before they are updated, incorporating
I have written more about this in the point above.
Good point. The API will provide the index of the user requiring attention. Our script can then handle the removal of users with errors, creating a separate file for them. Users with errors can be addressed manually later.
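A sketch of that split on the script side, assuming the API reports errors as a list of `{ index, errors }` objects pointing into the submitted array (this response shape is hypothetical). Valid users can be resubmitted as-is, while the rest are collected for a separate file.

```typescript
// Hypothetical shape of a per-user error returned by the import API.
interface UserError {
  index: number;    // position of the offending user in the submitted array
  errors: string[]; // human-readable problems with that user
}

interface SplitResult<T> {
  valid: T[];
  invalid: Array<{ user: T; errors: string[] }>;
}

// Separates the users the API rejected from the ones that can be resubmitted.
function splitByErrors<T>(users: T[], userErrors: UserError[]): SplitResult<T> {
  const errorsByIndex = new Map<number, string[]>();
  for (const e of userErrors) {
    // Merge in case the same index is reported more than once.
    errorsByIndex.set(e.index, [...(errorsByIndex.get(e.index) ?? []), ...e.errors]);
  }
  const valid: T[] = [];
  const invalid: Array<{ user: T; errors: string[] }> = [];
  users.forEach((user, index) => {
    const errors = errorsByIndex.get(index);
    if (errors) invalid.push({ user, errors });
    else valid.push(user);
  });
  return { valid, invalid };
}
```

`invalid` could then be written out with `JSON.stringify(invalid, null, 2)` into a file such as usersHavingInvalidSchema.json for manual fixing.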
Not only could there be new sign-ups, but existing user data could be updated as well. Users can call the API again with the updated data and make it work without "just in time" migration. This requires the bulk import API to update the data if the user already exists.
Here is pseudocode for the import:
function ImportUsers(user) {
  // Set isPrimary to true for the first login method if none of the methods has it set
  if (!user.loginMethods.some(method => method.isPrimary)) {
    user.loginMethods[0].isPrimary = true;
  }
  Start Transaction;
  try {
    let primaryUserId = null;
    let createdUserIds = []; // every recipe user created for this user
    for (const loginMethod of user.loginMethods) {
      let newUserId = null;
      if (loginMethod.recipeId === 'emailpassword') {
        newUserId = CreateUserForEP(loginMethod, user.externalUserId);
      } else if (loginMethod.recipeId === 'passwordless') {
        newUserId = CreateUserForPasswordless(loginMethod, user.externalUserId);
      } else if (loginMethod.recipeId === 'thirdparty') {
        newUserId = CreateUserForThirdParty(loginMethod, user.externalUserId);
      } else {
        throw new Error('Unsupported recipeId: ' + loginMethod.recipeId);
      }
      createdUserIds.push(newUserId);
      if (loginMethod.isPrimary) {
        primaryUserId = newUserId;
      }
    }
    if (createdUserIds.length > 1) {
      Call accountlinking.createPrimaryUser for primaryUserId;
      for (const userId of createdUserIds) {
        if (userId !== primaryUserId) {
          Call accountlinking.linkAccount to link userId to primaryUserId;
        }
      }
    }
    if (user.usermetadata) {
      Update user metadata for primaryUserId;
    }
    if (user.mfa) {
      Call MFA-related APIs for primaryUserId;
    }
    Delete user from bulk_import_users table;
    Commit Transaction;
  } catch (e) {
    // Undo any partially created users, then record the failure outside the rolled-back transaction
    Rollback Transaction;
    user.status = 'FAILED';
    user.error_msg = e.message; // Set a meaningful error message
  }
}
function CreateUserForEP(data, externalUserId) {
  Validate data fields;
  let newUserId = Call EmailPassword.importUserWithPasswordHash;
  if (data.isPrimary) {
    Link this user to externalUserId;
  }
  if (data.time_joined) {
    Update time_joined of the created user;
  }
  if (data.isVerified) {
    Update email verification status;
  }
  return newUserId;
}
function CreateUserForPasswordless(data, externalUserId) {
  Validate data fields;
  Call Passwordless.createCode;
  let newUserId = Call Passwordless.consumeCode; // Email will be verified automatically here
  if (data.isPrimary) {
    Link this user to externalUserId;
  }
  if (data.time_joined) {
    Update time_joined of the created user;
  }
  return newUserId;
}
function CreateUserForThirdParty(data, externalUserId) {
  Validate data fields;
  let newUserId = Call ThirdParty.SignInAndUp API; // Pass data.isVerified
  if (data.isPrimary) {
    Link this user to externalUserId;
  }
  if (data.time_joined) {
    Update time_joined of the created user;
  }
  return newUserId;
}
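The primary-selection and linking rules of ImportUsers can be exercised in isolation with an in-memory stand-in. This is a sketch under stated assumptions: `LoginMethod`, `createRecipeUser` and `importUser` are hypothetical names, user creation is simulated with generated IDs, and no real SuperTokens API is called. It also rejects entries with more than one primary login method, an extra check that is not part of the pseudocode.

```typescript
// Hypothetical, simplified login method entry (not the real bulk import schema).
type RecipeId = "emailpassword" | "passwordless" | "thirdparty";

interface LoginMethod {
  recipeId: RecipeId;
  isPrimary?: boolean;
}

interface ImportPlan {
  primaryUserId: string;
  links: Array<[string, string]>; // [recipe user to link, primary user it is linked to]
}

// Stand-in for the recipe-specific create functions: returns a fake user ID.
function createRecipeUser(method: LoginMethod, index: number): string {
  return `${method.recipeId}-${index}`;
}

function importUser(loginMethods: LoginMethod[]): ImportPlan {
  if (loginMethods.length === 0) {
    throw new Error("a user needs at least one login method");
  }
  // Extra check (assumption): reject ambiguous input with several primaries.
  const primaryCount = loginMethods.filter((m) => m.isPrimary).length;
  if (primaryCount > 1) {
    throw new Error("at most one login method can be primary");
  }
  // Default to the first login method when none is marked primary.
  const primaryIndex = primaryCount === 0 ? 0 : loginMethods.findIndex((m) => m.isPrimary);

  const createdUserIds = loginMethods.map((m, i) => createRecipeUser(m, i));
  const primaryUserId = createdUserIds[primaryIndex];

  // Every other created recipe user is linked to the primary user.
  const links = createdUserIds
    .filter((id) => id !== primaryUserId)
    .map((id): [string, string] => [id, primaryUserId]);

  return { primaryUserId, links };
}
```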
We decided that we will not update existing users in the bulk migration logic. If we see a repeated user in the bulk migration (same external user ID, or same account info in the same tenant), we will mark it as an error.
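A minimal sketch of that duplicate check across an input file. The field names are hypothetical, and scoping account info by recipe (email or phone number for emailpassword/passwordless, third-party ID plus third-party user ID for thirdparty) is an assumption about what "same account info" means. Entries that repeat a key seen in an earlier entry are reported rather than updated.

```typescript
// Hypothetical, simplified shapes; the real bulk import schema has more fields.
interface LoginMethodInfo {
  recipeId: string;
  tenantId: string;
  email?: string;
  phoneNumber?: string;
  thirdPartyId?: string;
  thirdPartyUserId?: string;
}

interface ImportEntry {
  externalUserId?: string;
  loginMethods: LoginMethodInfo[];
}

// Keys that must be unique across the file: the external user ID,
// plus each login method's identity scoped to its recipe and tenant.
function uniquenessKeys(entry: ImportEntry): string[] {
  const keys = new Set<string>();
  if (entry.externalUserId) keys.add(`ext:${entry.externalUserId}`);
  for (const m of entry.loginMethods) {
    const scope = `${m.recipeId}:${m.tenantId}`;
    if (m.recipeId === "thirdparty") {
      keys.add(`${scope}:tp:${m.thirdPartyId}:${m.thirdPartyUserId}`);
    } else {
      if (m.email) keys.add(`${scope}:email:${m.email}`);
      if (m.phoneNumber) keys.add(`${scope}:phone:${m.phoneNumber}`);
    }
  }
  return Array.from(keys);
}

// Returns the indices of entries that repeat a key from an earlier entry.
function findDuplicateIndices(entries: ImportEntry[]): number[] {
  const seen = new Set<string>();
  const duplicates: number[] = [];
  entries.forEach((entry, index) => {
    const keys = uniquenessKeys(entry);
    if (keys.some((k) => seen.has(k))) {
      duplicates.push(index);
      return; // keys of a rejected entry are not registered
    }
    keys.forEach((k) => seen.add(k));
  });
  return duplicates;
}
```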
Problem
When a user generates a bulk import JSON file from an existing authentication provider, ongoing changes to user data can make the import outdated by the time it's completed. For example, if a user updates their password after the JSON file is created, they will not be able to log in with their new password on SuperTokens.
Solution 1: Bulk Import with User Data Updates
A straightforward solution is to perform a second bulk import after switching to SuperTokens. If the Bulk Import API supports updates, the new data can replace or add to what's in the SuperTokens database, helping to correct any inconsistencies from the initial import.
Issues with Solution 1
Solution 2: Implementing
Bulk Import Documentation Outline
Migration steps overview
Overall goals:
Steps:
Code migration
You want to completely replace your existing auth system with SuperTokens
You just follow our docs as is.
You want to keep both alive for a certain amount of time
You just follow our docs as is, except for session verification, where you do the following:
app.post("/verify", verifySession({ sessionRequired: false }), async (req, res) => {
  // With sessionRequired: false, req.session is undefined when there is no SuperTokens session,
  // so such requests can fall back to being verified by the old auth system.
});
Frontend migration:
Option 1: Migration API -> Overview
For bulk migration
For a small number of users
Talk about the migration API here and link to the JSON schema explanation on the "1) Creating the user JSON" page.
1) Creating the user JSON
2) Add the user JSON to the SuperTokens core
3) Monitoring the core cronjob
Example: Create user JSON file from Auth0
Option 2: User Creation without password hashes
For now, let it be how it is in the docs.
Session migration (optional)
For now, let it be how it is in the docs.
Step 4: Post production operations
If you have done bulk migration:
If you have done User Creation without password hashes:
https://docs.google.com/document/d/1TUrcIPbdsHfqheIkB6CTkTNiA6YN-73Xz_pX2kjXhkc/edit
Open PRs:
TODO:
Test with 1M users in CI/CD. Generate the users so that they cover various login methods, tenancy, metadata, roles, etc. Make sure the test does not take too long and that everything works correctly and consistently.
Create an API for starting/stopping the cron job, and another one for getting the status of the cron job (active/inactive). The processing batch size should be a parameter of the starting API.
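For the 1M-user CI test in the first item above, the generated users need to cover the combinations deterministically so that any failing user can be regenerated. A sketch of such a generator follows; the field names, tenant IDs, role name and the chosen combinations are all hypothetical. Because there are 5 login-method combinations and 3 tenants (coprime counts), every combination appears in every tenant within the first 15 users.

```typescript
type Recipe = "emailpassword" | "passwordless" | "thirdparty";

// Hypothetical, simplified test user shape.
interface TestUser {
  externalUserId: string;
  tenantIds: string[];
  loginMethods: { recipeId: Recipe; email: string; isPrimary: boolean }[];
  userMetadata?: Record<string, unknown>;
  userRoles?: string[];
}

// Login-method combinations to cycle through, including linked accounts.
const COMBINATIONS: Recipe[][] = [
  ["emailpassword"],
  ["passwordless"],
  ["thirdparty"],
  ["emailpassword", "thirdparty"],
  ["emailpassword", "passwordless", "thirdparty"],
];
const TENANTS = ["public", "tenant1", "tenant2"]; // hypothetical tenant IDs

// Deterministically generates user number `i`.
function generateUser(i: number): TestUser {
  const recipes = COMBINATIONS[i % COMBINATIONS.length];
  const email = `user${i}@example.com`;
  return {
    externalUserId: `ext-${i}`,
    tenantIds: [TENANTS[i % TENANTS.length]],
    loginMethods: recipes.map((recipeId, j) => ({ recipeId, email, isPrimary: j === 0 })),
    // Metadata for every other user, roles for every third user.
    userMetadata: i % 2 === 0 ? { index: i } : undefined,
    userRoles: i % 3 === 0 ? ["admin"] : undefined,
  };
}

// Yields users lazily so a million users never have to be held in memory at once.
function* generateUsers(count: number): Generator<TestUser> {
  for (let i = 0; i < count; i++) yield generateUser(i);
}
```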
Allow developers to configure parallelism in ProcessBulkImportUsers cron job
Currently, it takes on average 66 seconds to process 1000 users. This is very slow when processing a large number of users. It happens because we loop through the users one by one in a for loop and use just one DB connection for BulkImportProxyStorage.
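For a sense of scale, extrapolating the measured rate to 1M users, assuming the per-user cost stays constant as the data grows:

```latex
\frac{66\ \text{s}}{1000\ \text{users}} = 0.066\ \text{s/user}
\qquad
10^{6}\ \text{users} \times 0.066\ \text{s/user} = 66{,}000\ \text{s} \approx 18.3\ \text{h}
```

With ideal linear scaling, processing N users in parallel would divide this by N; contention on the database will likely make the real speedup smaller.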
The solution is to process users in parallel using threads and to create a BulkImportProxyStorage instance for each user being processed. The number of users processed in parallel will depend on the BULK_MIGRATION_PARALLELISM config value set by the user. This will be a SaaS-protected property and can be added to PROTECTED_CONFIGS in CoreConfig.java. It should have the @NotConflictingInApp annotation.
PR changes in supertokens-core PR
- All the PR changes are done, but there may be more changes after review.
PR changes in supertokens-postgresql-plugin PR
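The core change is in Java, but the idea of processing at most N users at a time can be illustrated with a small bounded-concurrency pool. This TypeScript sketch is an analogy, not the core's implementation: `parallelism` plays the role of BULK_MIGRATION_PARALLELISM, and `processUser` is a hypothetical callback standing in for importing one user with its own BulkImportProxyStorage instance.

```typescript
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Processes every item with at most `parallelism` calls in flight at once.
// Results come back in input order; a failure is recorded for that item
// instead of aborting the whole batch.
async function processInParallel<T, R>(
  items: T[],
  parallelism: number,
  processUser: (item: T) => Promise<R>,
): Promise<Outcome<R>[]> {
  if (!Number.isInteger(parallelism) || parallelism <= 0) {
    throw new RangeError("parallelism must be a positive integer");
  }
  const results: Outcome<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  // Each worker keeps claiming the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await processUser(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const workerCount = Math.min(parallelism, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Claiming indices from a shared counter (rather than pre-splitting the list into N slices) keeps all workers busy even when some users take longer than others.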
Changes in Node Script to add users
The script needs to be rewritten to optimise for the following user stories:
The user is not expected to monitor the script. The script should try to continue processing and retry failures wherever possible.
The user should be able to re-run the script using the same input file multiple times and the script should process the remaining users. This can be implemented by maintaining a state file per input file name.
The Core API calls should use exponential backoff with unlimited retries when we get an error from the Core. This ensures that we don't halt processing if the Core is down for a few seconds.
The script should continue showing the Bulk Import Cron Job status after it has added all the users. Any users with status=FAILED will be added to the same usersHavingInvalidSchema.json file. This file could also be renamed to something like usersHavingErrors.json. Since we will be writing to the usersHavingErrors.json file, developers running the script are expected to wait until all the users have been processed, then fix the error file and add those users again.
The script should display progress logs while adding the users. These could include the total number of users, the number of users added, the number of users with errors, etc.
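The resumability and retry requirements above (re-running with the same input file, and unlimited exponential backoff on Core errors) could be combined roughly as follows. This is a sketch, not the script's actual code: the state file naming (`<input>.state.json`), the single `processedCount` field, the 500 ms base delay, the 30 s cap, and the `submit` callback standing in for the Core API call are all assumptions.

```typescript
import * as fs from "fs";
import * as path from "path";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMs.
function backoffDelay(attempt: number, baseDelayMs = 500, maxDelayMs = 30_000): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// Retries `call` until it succeeds; there is deliberately no retry limit.
async function callCoreWithRetry<T>(call: () => Promise<T>, baseDelayMs = 500, maxDelayMs = 30_000): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const delay = backoffDelay(attempt, baseDelayMs, maxDelayMs);
      console.warn(`Core call failed (attempt ${attempt + 1}), retrying in ${delay} ms:`, error);
      await sleep(delay);
    }
  }
}

interface ScriptState {
  processedCount: number; // users from the input file already accepted by the Core
}

// One state file per input file, e.g. users.json -> users.json.state.json (hypothetical naming).
function stateFilePath(inputFile: string): string {
  return path.join(path.dirname(inputFile), `${path.basename(inputFile)}.state.json`);
}

function loadState(inputFile: string): ScriptState {
  const file = stateFilePath(inputFile);
  return fs.existsSync(file) ? (JSON.parse(fs.readFileSync(file, "utf8")) as ScriptState) : { processedCount: 0 };
}

// Write to a temporary file and rename it over the old one,
// so an interrupted write does not leave a truncated state file behind.
function saveState(inputFile: string, state: ScriptState): void {
  const file = stateFilePath(inputFile);
  fs.writeFileSync(`${file}.tmp`, JSON.stringify(state));
  fs.renameSync(`${file}.tmp`, file);
}

// Submits the users still pending in batches, saving progress after each accepted batch.
async function run<T>(
  inputFile: string,
  users: T[],
  batchSize: number,
  submit: (batch: T[]) => Promise<void>,
  baseDelayMs = 500,
): Promise<void> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let { processedCount } = loadState(inputFile);
  while (processedCount < users.length) {
    const batch = users.slice(processedCount, processedCount + batchSize);
    await callCoreWithRetry(() => submit(batch), baseDelayMs);
    processedCount += batch.length;
    saveState(inputFile, { processedCount });
  }
}
```

Capping the delay keeps unlimited retries from waiting longer and longer without bound. Errors that can never succeed (for example, a request rejected for invalid input) probably should not be retried forever; that distinction is left out here. Also, if the script stops after a batch was accepted but before the state file is saved, that batch is sent again on the next run, and under the no-update rule those users would be reported as duplicates.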
We also need to research the size limit of the JSON file. A JSON file with a million users would be about 880 MB. A regular JSON document is parsed as a whole (e.g. with JSON.parse), so the entire file needs to be read into memory. If this is an issue, we may need to switch to the ndjson file format, which allows the file to be streamed line by line.
Documentation for Bulk Import
Update the CDI spec to include the /bulk-import/import and /bulk-import/users/count APIs. Also update the BulkImportUser schema to include the plainTextPassword field.
Bulk Import for Auth0 users (ignore for now)
After the Bulk Import task is complete, we plan to have a special guide for Auth0 users.
Auth0 users need to request the exported file from the Auth0 support team if they want the password hashes of their users. For any other type of login, they can export the users themselves using the API. However, this API doesn't include user roles.
We could write a script that takes the exported JSON and Auth0 credentials. The script would fetch all the roles and map them to the users. (We could also call the roles API for each user, but that would take more API calls.)
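A sketch of the mapping step, assuming the script has already fetched each role together with the IDs of the users assigned to it. How that data is fetched from Auth0 is left out, and the `ExportedUser` and `RoleWithMembers` shapes are hypothetical. Inverting the role-to-users lists once gives each user's roles without a per-user roles call.

```typescript
// Hypothetical shapes: an exported Auth0 user and a role with its member IDs.
interface ExportedUser {
  user_id: string;
  email?: string;
}

interface RoleWithMembers {
  name: string;
  memberUserIds: string[];
}

// Inverts role -> members into user -> roles, then attaches the roles to each exported user.
function attachRoles(
  users: ExportedUser[],
  roles: RoleWithMembers[],
): Array<ExportedUser & { roles: string[] }> {
  const rolesByUser = new Map<string, string[]>();
  for (const role of roles) {
    for (const userId of role.memberUserIds) {
      const existing = rolesByUser.get(userId) ?? [];
      existing.push(role.name);
      rolesByUser.set(userId, existing);
    }
  }
  return users.map((u) => ({ ...u, roles: rolesByUser.get(u.user_id) ?? [] }));
}
```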
We can also add a separate page to our documentation for Auth0 users that guides them through requesting the export JSON file and running the above script.
Order of changes: