Customize ds_pk on Delta Sync table created by Amplify #2780

Open
naveenkoduri opened this issue Jul 26, 2023 · 7 comments
Labels
DataStore question Further information is requested transferred

Comments

@naveenkoduri

naveenkoduri commented Jul 26, 2023

Amplify CLI Version

10.6.2

Question

Currently, we have an Android Amplify app that does Selective Sync from the backend and uses a specific GSI on the base table. So, the base query which uses the base table during the sync is all fine and good.

Delta sync is a different story. Whenever our app users are most active, the Delta Sync table gets throttled despite being an on-demand table, because it is designed to use only two partition keys (for instance, foo-table-2023-07-27 and bar-table-2023-07-27). Throttling aside, the RCU cost incurred for this Delta Sync table was always huge when the TTL was 27 hours; it accounted for 99% of our DynamoDB cost. We reverted the TTL to 30 min, which triggered the bug again. That is also cost prohibitive, so we worked around the bug. This means we are back to doing a lot of base queries.

That raises the question: is there a way to customize the default pk and sk that Amplify creates for this Delta Sync table, so that they align with our Selective Sync access pattern?

  • i.e., users always sync ONLY their data using the base query,
  • and also sync ONLY their data since lastSync using the delta query?
@naveenkoduri naveenkoduri added pending-triage question Further information is requested labels Jul 26, 2023
@AnilMaktala
Member

Hey @naveenkoduri , 👋 thanks for raising this! I'm going to transfer this over to our Android repository for better assistance 🙂.

@AnilMaktala AnilMaktala transferred this issue from aws-amplify/amplify-category-api Jul 28, 2023
@david-mcafee

@naveenkoduri - can you please provide the following?

  1. Your project's schema (found under /amplify/backend/api/[api name]/schema.graphql)
  2. Current sync expressions
  3. Any relevant DataStore code snippets that demonstrate how you are querying and subscribing to updates

Thank you!

@david-mcafee

@naveenkoduri - there is currently a PR out that changes the delta table partition key format for better sync performance with models having custom primary keys. New attributes are added to mutations and sync query resolvers to notify AppSync to use the newly improved data format.

Once this PR has been merged, the delta sync performance will be improved to better utilize the custom primary keys that you are using.

@david-mcafee david-mcafee self-assigned this Aug 14, 2023
@naveenkoduri
Author

naveenkoduri commented Aug 14, 2023

@david-mcafee, the PR you mentioned seems to apply only to models that have a custom primary key. Just so you are aware, our model does not have a custom primary key: as you can see in the schema, we are not using the @primaryKey directive. Below is the information you asked for.
Test code
Project Schema:

type TextMessage @model @auth(rules: [
{allow: owner, ownerField: "offId", provider: oidc, identityClaim: "offId"}, 
{allow: owner, ownerField: "customerId", provider: oidc, identityClaim: "customerId"}]) {
  customerId: String!
  offId: String
  contactId: String!
  inId: String @index(name: "byInId", sortKeyFields: ["createdAt"], queryField: "messagesByInId") 
  content: String
  subject: String
  contactFirstNm: String!
  contactLastNm: String!
  inFirstNm: String!
  inLastNm: String!
  createdAt: AWSDateTime
}

Sync Expression

// Only sync TextMessage records whose inId matches the current user's inId.
DataStoreSyncExpression textMessageDataSyncExpression = () -> TextMessage.IN_ID.eq(inId);
DataStoreConfiguration datastoreBuilder = DataStoreConfiguration.builder()
        .syncExpression(TextMessage.class, textMessageDataSyncExpression)
        .syncPageSize(1000)
        .syncMaxRecords(25000)
        .build();

Datastore

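Amplify.DataStore.observe(TextMessage.class,
        cancelable -> {
            // Keep a handle so the subscription can be cancelled later.
            cancelableMessageSubscription = new AtomicReference<>(cancelable);
            Log.d(TAG, "logging"); // TAG: assumed log-tag constant defined in this class
        },
        messageReceived -> {
            // Code to discard any duplicate events
        },
        failure -> {
            DataStoreException dataStoreException = failure;
            Log.e(TAG, "logging", dataStoreException);
            // Tear down the failed subscription and start observing again.
            cancelSubscription();
            resume();
        },
        () -> Log.i(TAG, "Observation complete for TextMessages."));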

CloudFormation generated for the delta table with the latest Amplify CLI, 12.2.5
As you can see below, the CLI creates no GSI for the delta table. Hence, irrespective of how Amplify builds the delta query request, there is no way to query records by inId in DynamoDB unless the data in the PK were of the format {tablename}-{inId}-{yyyy-mm-dd}. Currently, the data is in the format {tablename}-{yyyy-mm-dd}.

  DataStore:
    Type: AWS::DynamoDB::Table
    Properties:
      KeySchema:
        - AttributeName: ds_pk
          KeyType: HASH
        - AttributeName: ds_sk
          KeyType: RANGE
      AttributeDefinitions:
        - AttributeName: ds_pk
          AttributeType: S
        - AttributeName: ds_sk
          AttributeType: S
      BillingMode: PAY_PER_REQUEST
      StreamSpecification:
        StreamViewType: NEW_AND_OLD_IMAGES
      TableName:
        Fn::Join:
          - ''
          - - AmplifyDataStore-
            - Fn::GetAtt:
                - GraphQLAPI
                - ApiId
            - '-'
            - Ref: env
      TimeToLiveSpecification:
        AttributeName: _ttl
        Enabled: true
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete

Also, I just want to reassure you that there is no issue with syncing data to the user's device. If the user has 10 messages, the user's AmplifyDatastore.db always has only 10 messages, and RCU is significantly lower whenever the base query is done. The inefficiency is only with the delta query, where there is no option to query records by a PK expression to begin with, unless I am missing something.
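To make the partitioning concern concrete, here is a minimal Java sketch of the two key layouts. The {tablename}-{yyyy-mm-dd} layout is the one observed in the delta table; the {tablename}-{inId}-{yyyy-mm-dd} layout is purely hypothetical and is not something Amplify generates today. The class name, method names, and sample inId values are made up for illustration.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class DeltaKeySketch {
    // Layout observed in the delta table today: one partition per table per day.
    static String currentKey(String table, LocalDate day) {
        return table + "-" + day; // LocalDate.toString() yields ISO yyyy-MM-dd
    }

    // Hypothetical layout: one partition per table, per inId, per day.
    static String perInIdKey(String table, String inId, LocalDate day) {
        return table + "-" + inId + "-" + day;
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2023, 7, 27);
        List<String> inIds = List.of("foo#1111", "foo#2222", "foo#3333");

        // Every user's changes map to the same partition key under the current layout.
        Set<String> current = inIds.stream()
                .map(id -> currentKey("foo-table", day))
                .collect(Collectors.toSet());

        // Each user gets a separate partition key under the hypothetical layout.
        Set<String> perInId = inIds.stream()
                .map(id -> perInIdKey("foo-table", id, day))
                .collect(Collectors.toSet());

        System.out.println("current layout partitions:  " + current.size());
        System.out.println("per-inId layout partitions: " + perInId.size());
    }
}
```

With three users writing on the same day, the current layout funnels all writes and reads into a single partition key, while the hypothetical layout would spread them across three.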

I happened to find this in the AppSync Sync operation documentation, but I am not sure what we have to do in the Amplify schema to have the mentioned "deltaIndex" generated by Amplify.

deltaIndexName
The index used for the Sync operation. This index is required to enable a Sync operation on the whole delta store table when the table uses a custom partition key. The Sync operation will be performed on the GSI (created on gsi_ds_pk and gsi_ds_sk). This field is optional.

@naveenkoduri
Author

naveenkoduri commented Aug 15, 2023

Adding a sample request received by AppSync and the transformed request sent to DynamoDB, for both the base query and the delta query. The GraphQL query looks much the same in both cases, but the transformed request differs.

Base Query

AppSync Query:

f0dde0a6-a02b-438a-a8fa-e020caaa4551 GraphQL Query: 
query SyncTextMessages($filter: ModelTextMessageFilterInput, $lastSync: AWSTimestamp, $limit: Int) {
  syncTextMessages(filter: $filter, lastSync: $lastSync, limit: $limit) {
    items {
      id
      foo
      bar
    }
    nextToken
    startedAt
  }
}
, Operation: null, Variables: {
    "filter": {
        "and": [
            {
                "inId": {
                    "eq": "foo#1111"
                }
            }
        ]
    },
    "limit": 1000,
    "lastSync": 1691202372480
}

TransformedTemplate:
Please notice how the transformed request has an index name and uses a query as opposed to a filter.

{
    "version": "2018-05-29",
    "operation": "Sync",
    "limit": 1000,
    "lastSync": 1691202372480,
    "query": {
        "expression": "#pk = :pk",
        "expressionNames": {
            "#pk": "inId"
        },
        "expressionValues": {
            ":pk": {
                "S": "foo#1111"
            }
        }
    },
    "scanIndexForward": true,
    "filter": {
        "expression": "",
        "expressionNames": {},
        "expressionValues": {}
    },
    "index": "byInIdIndex"
}

Delta Query

AppSync Query:

397b753e-51ab-4273-8202-965d06125bc6 GraphQL Query: query SyncTextMessages($filter: ModelTextMessageFilterInput, $lastSync: AWSTimestamp, $limit: Int) {
  syncTextMessages(filter: $filter, lastSync: $lastSync, limit: $limit) {
    items {
      id
      foo
      bar
    }
    nextToken
    startedAt
  }
}
, Operation: null, Variables: {
    "filter": {
        "and": [
            {
                "inId": {
                    "eq": "foo#2222"
                }
            }
        ]
    },
    "limit": 1000,
    "lastSync": 1691211580443
}

TransformedTemplate:
Please notice that only the filter is used here; there is no query expression and no index.

{
    "version": "2018-05-29",
    "operation": "Sync",
    "limit": 1000,
    "nextToken": null,
    "lastSync": 1691211580443,
    "filter": {
        "expression": "(#inId = :and_0_inId_eq)",
        "expressionNames": {
            "#inId": "inId"
        },
        "expressionValues": {
            ":and_0_inId_eq": {
                "S": "foo#2222"
            }
        }
    }
}

@david-mcafee

@naveenkoduri - it looks like you tried both a 27 hour TTL for the delta table, as well as a 30 minute TTL. Have you tried experimenting with a TTL configuration in between those values? If not, I would recommend starting with a 2 hour TTL to see if that helps.

I also wanted to follow up on your other comments:

  1. The transformed templates that you include look correct (i.e. there isn't a bug).
  2. You may already be aware, but I also wanted to point out that since you are using the @index directive on the inId field, and your sync expression is set to filter on that value, you are performing a query instead of a scan when performing a base sync (meaning highly efficient and cost-effective data retrieval).
  3. Regarding your questions on delta sync: DataStore queries the delta table using the table name in the partition key (and the date and time in the sort key), and then applies the filter. That is a more costly operation than querying the base table if too many records are added to one specific table in a short time. However, if you updated your schema to use a custom primary key, there would be a huge performance improvement once the PR I linked above is implemented.
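
As a rough illustration of why the filter step in point 3 does not reduce cost: DynamoDB charges a Query for the data it reads before the filter expression is applied, at 0.5 RCU per 4 KB for eventually consistent reads. The Java sketch below is a simplification that ignores pagination. The item size and item counts are assumed figures, not measurements from this project, and treating the delta read as eventually consistent is also an assumption.

```java
final class DeltaReadCostSketch {
    // DynamoDB bills a Query on the total bytes read, rounded up to 4 KB units.
    // An eventually consistent read costs half a read capacity unit per 4 KB.
    static double queryRcu(long itemsRead, long avgItemBytes) {
        long bytes = itemsRead * avgItemBytes;
        long units = (bytes + 4095) / 4096; // round up to the next 4 KB
        return units * 0.5;
    }

    public static void main(String[] args) {
        long avgItemBytes = 1024;           // assumed average item size
        long allUsersSinceLastSync = 100_000; // assumed: every user's changes since lastSync
        long oneUserSinceLastSync = 10;       // assumed: a single user's changes since lastSync

        // Shared partition: all items after lastSync are read, then the filter drops most of them.
        System.out.println("shared partition RCU:   " + queryRcu(allUsersSinceLastSync, avgItemBytes));

        // Hypothetical per-user partition: only that user's items are read.
        System.out.println("per-user partition RCU: " + queryRcu(oneUserSinceLastSync, avgItemBytes));
    }
}
```

Under these assumed numbers, the same ten returned records cost 12500 RCU when read through the shared partition and 1.5 RCU when read from a partition that holds only that user's changes.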

If updating the TTL does not help and / or custom primary keys are not an option for you, please let me know! Thanks!

@naveenkoduri
Author

naveenkoduri commented Aug 17, 2023

@david-mcafee, thanks for looking further into it. Currently, we have a TTL of 30 min for one table and 5 min for another table. Even with these TTL settings, we are seeing the DataStore/DeltaSync table consume more RCU than the base table for the traffic we have. The one option we know will make it better is going further down to 5 min on the other table, since the number of records in the DeltaSync table will decrease, leaving fewer records for the delta query to read. However, this would pose another problem as we grow: we will see a lot of sync requests going to the base table, each returning up to 25000 records.
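
The TTL trade-off can be sketched as follows. As I understand the AppSync Sync operation documentation, a request with no lastSync, or with a lastSync older than the current time minus the delta TTL, is served from the base table; otherwise it is served from the delta table. The Java below is only a model of that rule, with made-up class and method names and an assumed 20-minute gap between syncs.

```java
import java.time.Duration;
import java.time.Instant;

final class SyncTargetSketch {
    enum Target { BASE_TABLE, DELTA_TABLE }

    // A missing lastSync, or one older than (now - delta TTL), falls back to the base table.
    static Target choose(Instant lastSync, Instant now, Duration deltaTtl) {
        if (lastSync == null || lastSync.isBefore(now.minus(deltaTtl))) {
            return Target.BASE_TABLE;
        }
        return Target.DELTA_TABLE;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-08-17T12:00:00Z");
        Instant lastSync = now.minus(Duration.ofMinutes(20)); // assumed: client last synced 20 min ago

        // With a 30 min TTL the client is still inside the delta window.
        System.out.println("30 min TTL -> " + choose(lastSync, now, Duration.ofMinutes(30)));

        // With a 5 min TTL the same client falls back to a full base query.
        System.out.println("5 min TTL  -> " + choose(lastSync, now, Duration.ofMinutes(5)));
    }
}
```

A shorter TTL shrinks the delta table, but it also pushes any client that has been offline longer than the TTL onto the base query path.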

Regarding the primaryKey, we do need it to be an auto-generated UUID, as the PK identifies a specific text message between two parties.

@david-mcafee david-mcafee removed their assignment Mar 7, 2024
@tylerjroach tylerjroach transferred this issue from aws-amplify/amplify-android Aug 15, 2024