Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project identity-based cloud synchronization #7743

Merged
merged 17 commits into from
Jan 11, 2021
Merged

Conversation

darzu
Copy link
Contributor

@darzu darzu commented Jan 4, 2021

Cloud Save

This PR adds the ability for projects to be saved to the cloud.
Users need to sign into our new identity platform first (which might be restricted to a few participants first.)

In this PR, the basic usage is as follows:

  • Any save made to a project while the user is signed in promotes that project to a cloud project by stamping the user's UserID onto it. Any project with a UserID is considered a cloud project.
  • On first load (and at a few other points), we initiate a cloud sync. When complete, if there are changes from the cloud, the page is reloaded.
  • When a user makes a change, after a few seconds a background cloud save is initiated. If there is a problem, we'll do a cloud sync and maybe reload the page.
  • When there is a local vs remote conflict, we resolve it by having the last modification win.
  • There is a small cloud icon on projects that have been cloud synced.

There's a lot missing here. Importantly:

  • UX to involve the user in conflict decisions
  • UX to indicate when a save is under way
  • UX to indicate the last save and/or sync
  • Analytics

However it should work well enough to start kicking the tires with what is here.

The nitty gritty of the solution breaks down into a few main files:

cloud.ts

This is our cloud client. It can "set", "get", and "list" projects. A Project consists of a Header and Text (or "files", e.g. main.blocks, pxt.json, etc.).

The heart of this code is the "syncAsync" function which is responsible for listing all the projects on the server, all the local projects and doing a reconciliation. Right now, conflicts are resolved by which source has the latest modification. This is bad because we can easily loose arbitrary amounts of user work.

One of the biggest and subtlest difficulties in this work has been properly tracking and managing versions. Each storage medium has its own version system (e.g. our cloud uses "etags", PouchDB uses a "_rev" field). When updating a record, we must supply the version of the previous value. In theory this should prevent us from accidentally overwriting another change, but unfortunately in practice that can still happen plenty. We track version numbers differently for the different mediums and there is a lot of inconsistency for how this is handled. For example, for cloud projects there is a single version number for the whole project, for pouch DB there are two version numbers: one for the Header and one for the Text, and it's possible to hit errors where one storage operation succeeds and the other doesn't.

Because our cloud functions do not implement WorkspaceProvider, we instead track the cloud version on the header. It's basically meaningless to actually store that cloud version on the header itself because as soon as it is stored in the cloud it is by construction obsolete. This has lead to subtle bugs keeping this straight.

Staying in the background is challenging. Right now we mostly do this by having our heavy lifting work done in "syncAsync" which is always run in the background and if syncAsync detects there were changes it refreshes the whole page. This is very heavy handed but it probably works okay in scenarios where the user isn't frequently editing on multiple devices at the same time.

When a single edit is made, a cloud save is initiated by package.ts's savePkgAsync() which does some crude throttling.

workspace.ts

This is a catch all for anything that needs to be done above the WorkspaceProvider abstraction. One of the big customers is package.ts but almost every place in the web app reaches into workspace.ts. Most interface through one of: saveAsync, getHeader(s), isSessionOutOfDate (and its variants), and syncAsync.

One of the big things workspace.ts does is to keep our storage safe during multi-tab access. The methods with "session" in the name do this: isHeadersSessionOutdated, maybeSyncHeadersAsync, refreshHeadersSession. When we detect the session is out of date, we call "syncAsync" which cascades into calling our local git provider's "syncAsync" and the cloud's "syncAsync". This chain of calls is somewhat brittle and fundamentally probably shouldn't be tied to our multi-tab system.

Another concern of workspace.ts is to hide most of the versioning from the rest of the system. It keeps a cached in-memory copy of all projects that are being worked on and what their last version was so that "saveAsync" only needs to take the Header and optional Text to update a file. Of course this defeats the whole point of the versions/etags system, but it mostly works.

One of the most difficult parts of this PR was in deciding what counted as a "modification". For better or worse, we are very liberal about "saves" we make to the project from various parts of MakeCode. For example, when creating a new project, there are at least 5 separate calls made to "saveAsync". Each change triggers between 1-3 and actions such as running the simulator or switching languages or perhaps even the alignment of Jupiter and Mars will cause more "saves". This is a big problem for cloud synced project since each save has the potential to overwrite work done in the cloud. So I introduced a somewhat complicated system for diffing the previous version of the project to the new version that is going to be saved to determine what constitutes a significant or "user" change. This is effetely implementing a .gitignore file where we can ignore certain files and header field changes as not coming from the user and thus not updating the modification time.

Future Infrastructure

Ideally, we would use version control like git, or maybe a consensus system like Google Doc's Operation Transforms or Fluid Frame's CRDTs. This helps us side step many of the main reliability issues we're facing here and enables new features like history, merges, and perhaps realtime collaboration.

While working on this, I found a bunch of issues with how we were using our workspace abstraction which I started cleaning up here: #7657
However this is a larger scope of work and I'm working on this on the back-burner outside the critical path.

@darzu darzu marked this pull request as draft January 5, 2021 02:04
@darzu darzu marked this pull request as ready for review January 6, 2021 23:26
@darzu darzu changed the title [DRAFT] min-bar cloud sync Project cloud synchronization Jan 6, 2021
@darzu darzu changed the title Project cloud synchronization Project identity-based cloud synchronization Jan 6, 2021
@darzu darzu requested a review from eanders-ms January 6, 2021 23:28
webapp/src/app.tsx Outdated Show resolved Hide resolved
webapp/src/workspace.ts Outdated Show resolved Hide resolved
local.isDeleted = true;
localHeaderChanges[local.id] = local
return workspace.forceSaveAsync(local, {}, true)
.then(() => { data.clearCache(); }) // TODO @darzu: is this cache clear necessary?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eanders-ms , do you remember why you added this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking about the call to clearCache?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@darzu
Copy link
Contributor Author

darzu commented Jan 8, 2021

Might be able to test here once staging is up: https://arcade.makecode.com/app/d695b519d3c28d9164c3cb0d093e144647ef32a8-33bb617036

@eanders-ms
Copy link
Contributor

Testing locally, I saw an unexpected "project is open elsewhere" message when signing in while editing a local project.

  1. While signed out, open a locally-stored project in the editor.
  2. Sign in.
  3. Notice the error message.

Seems non-destructive and the error message disappears due to page reload.

@eanders-ms
Copy link
Contributor

Another instance of "project is already opened elsewhere":

  1. Go to the home screen and ensure you're signed in.
  2. Open the F12 debugger.
  3. Open a cloud-saved project.
  4. Notice the debugger breaks on debugger in errorNotification method.

@darzu
Copy link
Contributor Author

darzu commented Jan 8, 2021

Testing locally, I saw an unexpected "project is open elsewhere" message when signing in while editing a local project.

  1. While signed out, open a locally-stored project in the editor.
  2. Sign in.
  3. Notice the error message.

Seems non-destructive and the error message disappears due to page reload.

Hmm, I tried this once and it's not repro'ing. I'll try the other repro then I'll keep poking

@eanders-ms
Copy link
Contributor

When loading the main page (tab opened by pxt serve), I see "project is already opened elsewhere":
image

@eanders-ms
Copy link
Contributor

Possible related to the above, just after that visual error message, I see "trying to access outdated session" in the debugger.

@eanders-ms
Copy link
Contributor

gif of above:
proj

@darzu
Copy link
Contributor Author

darzu commented Jan 8, 2021

Any logs? Preferably with ?dbg=1. I'm not seeing these repro. There must be something persistent in your local storage.

Copy link
Contributor

@eanders-ms eanders-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

webapp/src/app.tsx Outdated Show resolved Hide resolved
webapp/src/browserworkspace.ts Show resolved Hide resolved
// Possible conflict.
const conflictStr = `conflict found for '${local.name}' (${local.id.substr(0, 5)}...). Last cloud change was ${agoStr(remoteFile.header.modificationTime)} and last local change was ${agoStr(local.modificationTime)}.`
// last write wins.
if (local.modificationTime > remoteFile.header.modificationTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to switch to _etag here (if it is being returned)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I'm not sure what we would do in the case where local.modificationTime == remoteFile.header.modificationTime and local.cloudVersion !== remoteFile.header.cloudVersion.
We're inherently depending on modification in the current iteration no matter what.

The List operation doesn't currently return etags so we'd need to modify the backend to support this (now that we suspect it's possible.) I'd prefer to do this in follow up PRs.

local.isDeleted = true;
localHeaderChanges[local.id] = local
return workspace.forceSaveAsync(local, {}, true)
.then(() => { data.clearCache(); }) // TODO @darzu: is this cache clear necessary?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking about the call to clearCache?

webapp/src/codecard.tsx Show resolved Hide resolved
@darzu darzu merged commit 9c3018c into master Jan 11, 2021
@darzu darzu deleted the dazuniga/minbar_cloudsync branch January 11, 2021 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants