-
Notifications
You must be signed in to change notification settings - Fork 581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Project identity-based cloud synchronization #7743
Conversation
…ll due to spurious, aggressive local saving
webapp/src/cloud.ts
Outdated
local.isDeleted = true; | ||
localHeaderChanges[local.id] = local | ||
return workspace.forceSaveAsync(local, {}, true) | ||
.then(() => { data.clearCache(); }) // TODO @darzu: is this cache clear necessary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eanders-ms , do you remember why you added this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking about the call to clearCache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Might be able to test here once staging is up: https://arcade.makecode.com/app/d695b519d3c28d9164c3cb0d093e144647ef32a8-33bb617036 |
Testing locally, I saw an unexpected "project is open elsewhere" message when signing in while editing a local project.
Seems non-destructive and the error message disappears due to page reload. |
Another instance of "project is already opened elsewhere":
|
Hmm, I tried this once and it's not repro'ing. I'll try the other repro then I'll keep poking |
Possible related to the above, just after that visual error message, I see "trying to access outdated session" in the debugger. |
Any logs? Preferably with |
…acking "project is open elsewhere"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Possible conflict. | ||
const conflictStr = `conflict found for '${local.name}' (${local.id.substr(0, 5)}...). Last cloud change was ${agoStr(remoteFile.header.modificationTime)} and last local change was ${agoStr(local.modificationTime)}.` | ||
// last write wins. | ||
if (local.modificationTime > remoteFile.header.modificationTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to switch to _etag here (if it is being returned)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I'm not sure what we would do in the case where local.modificationTime == remoteFile.header.modificationTime
and local.cloudVersion !== remoteFile.header.cloudVersion
.
We're inherently depending on modification in the current iteration no matter what.
The List operation doesn't currently return etags so we'd need to modify the backend to support this (now that we suspect it's possible.) I'd prefer to do this in follow up PRs.
webapp/src/cloud.ts
Outdated
local.isDeleted = true; | ||
localHeaderChanges[local.id] = local | ||
return workspace.forceSaveAsync(local, {}, true) | ||
.then(() => { data.clearCache(); }) // TODO @darzu: is this cache clear necessary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking about the call to clearCache?
Cloud Save
This PR adds the ability for projects to be saved to the cloud.
Users need to sign into our new identity platform first (which might be restricted to a few participants first.)
In this PR, the basic usage is as follows:
There's a lot missing here. Importantly:
However it should work well enough to start kicking the tires with what is here.
The nitty gritty of the solution breaks down into a few main files:
cloud.ts
This is our cloud client. It can "set", "get", and "list" projects. A Project consists of a Header and Text (or "files", e.g. main.blocks, pxt.json, etc.).
The heart of this code is the "syncAsync" function which is responsible for listing all the projects on the server, all the local projects and doing a reconciliation. Right now, conflicts are resolved by which source has the latest modification. This is bad because we can easily loose arbitrary amounts of user work.
One of the biggest and subtlest difficulties in this work has been properly tracking and managing versions. Each storage medium has its own version system (e.g. our cloud uses "etags", PouchDB uses a "_rev" field). When updating a record, we must supply the version of the previous value. In theory this should prevent us from accidentally overwriting another change, but unfortunately in practice that can still happen plenty. We track version numbers differently for the different mediums and there is a lot of inconsistency for how this is handled. For example, for cloud projects there is a single version number for the whole project, for pouch DB there are two version numbers: one for the Header and one for the Text, and it's possible to hit errors where one storage operation succeeds and the other doesn't.
Because our cloud functions do not implement WorkspaceProvider, we instead track the cloud version on the header. It's basically meaningless to actually store that cloud version on the header itself because as soon as it is stored in the cloud it is by construction obsolete. This has lead to subtle bugs keeping this straight.
Staying in the background is challenging. Right now we mostly do this by having our heavy lifting work done in "syncAsync" which is always run in the background and if syncAsync detects there were changes it refreshes the whole page. This is very heavy handed but it probably works okay in scenarios where the user isn't frequently editing on multiple devices at the same time.
When a single edit is made, a cloud save is initiated by package.ts's savePkgAsync() which does some crude throttling.
workspace.ts
This is a catch all for anything that needs to be done above the WorkspaceProvider abstraction. One of the big customers is package.ts but almost every place in the web app reaches into workspace.ts. Most interface through one of: saveAsync, getHeader(s), isSessionOutOfDate (and its variants), and syncAsync.
One of the big things workspace.ts does is to keep our storage safe during multi-tab access. The methods with "session" in the name do this: isHeadersSessionOutdated, maybeSyncHeadersAsync, refreshHeadersSession. When we detect the session is out of date, we call "syncAsync" which cascades into calling our local git provider's "syncAsync" and the cloud's "syncAsync". This chain of calls is somewhat brittle and fundamentally probably shouldn't be tied to our multi-tab system.
Another concern of workspace.ts is to hide most of the versioning from the rest of the system. It keeps a cached in-memory copy of all projects that are being worked on and what their last version was so that "saveAsync" only needs to take the Header and optional Text to update a file. Of course this defeats the whole point of the versions/etags system, but it mostly works.
One of the most difficult parts of this PR was in deciding what counted as a "modification". For better or worse, we are very liberal about "saves" we make to the project from various parts of MakeCode. For example, when creating a new project, there are at least 5 separate calls made to "saveAsync". Each change triggers between 1-3 and actions such as running the simulator or switching languages or perhaps even the alignment of Jupiter and Mars will cause more "saves". This is a big problem for cloud synced project since each save has the potential to overwrite work done in the cloud. So I introduced a somewhat complicated system for diffing the previous version of the project to the new version that is going to be saved to determine what constitutes a significant or "user" change. This is effetely implementing a .gitignore file where we can ignore certain files and header field changes as not coming from the user and thus not updating the modification time.
Future Infrastructure
Ideally, we would use version control like git, or maybe a consensus system like Google Doc's Operation Transforms or Fluid Frame's CRDTs. This helps us side step many of the main reliability issues we're facing here and enables new features like history, merges, and perhaps realtime collaboration.
While working on this, I found a bunch of issues with how we were using our workspace abstraction which I started cleaning up here: #7657
However this is a larger scope of work and I'm working on this on the back-burner outside the critical path.