Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cassandra-stress doesn't support UDT in user-profile mode #332

Open
yarongilor opened this issue Jun 8, 2023 · 19 comments
Open

cassandra-stress doesn't support UDT in user-profile mode #332

yarongilor opened this issue Jun 8, 2023 · 19 comments

Comments

@yarongilor
Copy link

Creating a UDT and using it in a table schema is not supported by cassandra-stress user-profile mode.
This ability is required for testing specific user-profiles, similar to what is used by customers.

A Dtest for that failed like:
Using a user-profile table schema of:

table_definition: |
  CREATE TABLE IF NOT EXISTS user_with_ck (
    key blob,
    // Case sensitive column names
    //"Md5" text,
    md5 text,
    ...
    hash_bashname hash_and_basename,
    PRIMARY KEY (key, md5)
  ); 

The following code:

        with self.patient_cql_connection(node) as session:
            create_ks(session=session, name='keyspace_complex', rf=3)
            session.execute("CREATE TYPE IF NOT EXISTS hash_and_basename (md5 text, basename text)")
            wait_for_schema_agreement(session)
        time.sleep(30)
        logger.debug('Running stress ...')
        node.stress(["user", "no-warmup", "profile=%s" % os.path.realpath('test_data/batch-test/complex_schema.yaml'),
        ...

Failed with:

E           stderr: java.lang.UnsupportedOperationException: Because of this name: hash_bashname if you removed it from the yaml and are still seeing this, make sure to drop table

It got a similar failure testing in SCT:

< t:2023-06-01 14:03:34,429 f:common.py       l:1742 c:utils                p:DEBUG > Executing CQL 'CREATE TYPE IF NOT EXISTS custom_d1.post_speaker_item (id3 bigint, name3 text, reaction_type3 tinyint);' ...
< t:2023-06-01 14:03:34,434 f:thread.py       l:52   c:cassandra.cluster    p:DEBUG > Refreshing schema in response to schema change. {'target_type': 'TYPE', 'change_type': 'CREATED', 'keyspace': 'custom_d1', 'type': 'post_speaker_item'}
Stress command completed with bad status 1: WARNING: Table 'posts' has column 'speaker_map' of unsupported type
@mykaul
Copy link
Contributor

mykaul commented Jun 8, 2023

Is that the right repo to report this?

@yarongilor
Copy link
Author

Is that the right repo to report this?

Yes. At least it looks so, according to previously existing similar issues.

@mykaul
Copy link
Contributor

mykaul commented Jun 8, 2023

I don't see who's going to develop it. I reckon upstream don't have it either.

@yarongilor
Copy link
Author

ok, thanks, i see.
@fgelcer , @fruch , should we switch back to the previously suggested option of using tlp-stress instead?

@fruch
Copy link
Contributor

fruch commented Jun 11, 2023

Yaron,

can you share the whole user profile file ?

@fruch
Copy link
Contributor

fruch commented Jun 11, 2023

I would recommend testing the upstream cassandra-stress from 4.1, if it's kind of working there, backporting might be easy.

Getting familiar with tlp-stress might also be a good path to start building the exact cases we need.

1 similar comment
@fruch
Copy link
Contributor

fruch commented Jun 11, 2023

I would recommend testing the upstream cassandra-stress from 4.1, if it's kind of working there, backporting might be easy.

Getting familiar with tlp-stress might also be a good path to start building the exact cases we need.

@yarongilor
Copy link
Author

yarongilor commented Jun 12, 2023

I would recommend testing the upstream cassandra-stress from 4.1, if it's kind of working there, backporting might be easy.

Getting familiar with tlp-stress might also be a good path to start building the exact cases we need.

@fruch , i couldn't find any updates for this limitation in cassandra repo.
The code still seems to be there: https://github.com/apache/cassandra/blob/4f3cb5de3708e1c406989bb636892e5d010b1a6b/tools/stress/src/org/apache/cassandra/stress/StressProfile.java#L803

I noticed a quite similar issue, opened for years by @dkropachev - https://issues.apache.org/jira/browse/CASSANDRA-15598
Dmitry can you please confirm that? do yo think we can somehow use the patch you sent so it will resolve this limitation of UDT as well?

@roydahan
Copy link
Collaborator

I think that the fix from @dkropachev was not merged in the upstream but was merged for scylla (or at least that was the plan).

@fruch
Copy link
Contributor

fruch commented Jun 12, 2023

https://issues.apache.org/jira/browse/CASSANDRA-15598 isn't abot UDT, it's about nested sets.

@fruch
Copy link
Contributor

fruch commented Jun 12, 2023

@yarongilor
from the line you shared, it's just a warning that doesn't affect anything, just a print.

            if (unsupportedColumns.size() > 0) {
                for (ColumnInfo column : unsupportedColumns) {
                    System.err.printf("WARNING: Table '%s' has column '%s' of unsupported type\n",
                            tableName, column.name);
                }
            }

the actual problem is in getGenerator, that doesn't have a generator for types it can not recognize

                default:
                    throw new UnsupportedOperationException("Because of this name: "+name+" if you removed it from the yaml and are still seeing this, make sure to drop table");

@dkropachev
Copy link
Collaborator

dkropachev commented Jun 12, 2023

@roydahan , you are correct, i have 3 PRs to their repo, none of them endup even looked at.

@yarongilor , the error message that you see is result of having that patch, otherwise you would see it failing by panic.
There is bug is c-s architecture value generation logic that does not allow it to be used for nested structures, including UDTs.
Unfixed it causes c-s to crash whenever it sees column of this sort.

Unfortunately fixing this issue properly would cause major changes to be made, which would make our version diverge too far from the upstream.
So, easiest solution was to just make c-s silently skip these columns, which was implemented at apache repo as apache/cassandra#451 and at our repo as #144.
#144 - was merged, while apache/cassandra#451 - recently was closed due to the 2y inactivity.

I am going to check whether this issue is sorted out on the upstream at 4.x branch and update this issue

======= Update

This issue persist on the upstream at 4.x.
I can make c-s generate values for this kind of types, but since it require lot's of effort this task has to go thrue management pipeline to get prioritized and scheduled.

@roydahan
Copy link
Collaborator

roydahan commented Jun 15, 2023

@dkropachev assuming we are ok with the diverge between the two, how hard would it be to support the nested structures?

@dkropachev
Copy link
Collaborator

@roydahan , roughly 3 days of work

@roydahan
Copy link
Collaborator

@dkropachev do you think you'll be able to fit this task sometime soon?

@dkropachev
Copy link
Collaborator

@roydahan , I am sorry, next two weeks I am booked beyond capacity, I have it in mind and will find time for it after I finish what I currently have on me. Sorry for delaying it.

@roydahan
Copy link
Collaborator

Thanks @dkropachev, it may become high priority for us at the moment, I'll check our options around that.

@avelanarius
Copy link

@muzarski is going to look into it today. We're not sure if some quick fix is possible, but hopefully @muzarski will determine it by the end of the day.

@avelanarius
Copy link

After some additional private discussion, for now skipping UDT columns will be the workaround and we agree that adding this new feature cannot be achieved by a quick fix. Nevertheless, @muzarski is going to experiment with it for a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants