`datalake`: remove offset translation from `translation_stm` (and add `compaction_test.py`) #24610

WillemKauf · 2024-12-18T22:07:01Z

Previously, datalake::translation::translation_stm computed its max collectible offset as follows:

redpanda/src/v/datalake/translation/state_machine.cc

Lines 112 to 122 in 925707c

    
    model::offset translation_stm::max_collectible_offset() {
        if (!_raft->log_config().iceberg_enabled()) {
            return model::offset::max();
        }
        // if offset is not initialized, do not attempt translation.
        if (_highest_translated_offset == kafka::offset{}) {
            return model::offset{};
        }
        return _raft->log()->to_log_offset(
          kafka::offset_cast(_highest_translated_offset));
    }

This offset translation makes the max collectible offset overly restrictive, because it is unaware of translation batches.

Here, the utility function get_translated_log_offset() is added, which returns the "equivalent" translated log offset for a given kafka offset, taking into account translation batches (which don't need to be translated, and thus shouldn't restrict the max collectible offset).
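To illustrate the difference, here is a toy Python model (not Redpanda code). It assumes one record per batch and that the batch-aware offset is "the log offset just before the one holding the next Kafka offset". The names `to_log_offset`, `naive_collectible`, and `batch_aware_collectible` are hypothetical stand-ins; the real `get_translated_log_offset()` may differ in its details.

```python
from enum import Enum


class BatchType(Enum):
    DATA = "raft_data"               # visible to Kafka clients, must be translated
    CONFIG = "raft_configuration"    # not visible to Kafka, nothing to translate


def to_log_offset(log, kafka_offset):
    """Log offset of the data record with the given Kafka offset.

    If the Kafka offset is past the end of the log, extrapolate as if
    only data records followed (assumption of this toy model).
    """
    seen = -1
    for log_offset, batch_type in enumerate(log):
        if batch_type is BatchType.DATA:
            seen += 1
            if seen == kafka_offset:
                return log_offset
    return len(log) + (kafka_offset - (seen + 1))


def naive_collectible(log, highest_translated_kafka):
    # Old behaviour: translate the Kafka offset directly.
    return to_log_offset(log, highest_translated_kafka)


def batch_aware_collectible(log, highest_translated_kafka):
    # Everything before the next untranslated data record is collectible,
    # including trailing non-data batches.
    return to_log_offset(log, highest_translated_kafka + 1) - 1


C, D = BatchType.CONFIG, BatchType.DATA
log = [C, D, D, C, C, D]  # Kafka offsets 0, 1, 2 live at log offsets 1, 2, 5

print(naive_collectible(log, 1))        # 2
print(batch_aware_collectible(log, 1))  # 4
```

With Kafka offset 1 translated, the naive result stops at log offset 2, while the batch-aware result also releases the two configuration batches at log offsets 3 and 4.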

Use of this function is plumbed through the partition_translator and the translation_stm, and we now bookkeep the _highest_translated_log_offset in the translation_stm to avoid any offset translation within it.

Additionally, a new test for compaction with an Iceberg-enabled topic is added to datalake/compaction_test.py, with some enhancements to the datalake_verifier service to make it compaction-aware.

Backports Required

Release Notes

Improvements

Fixes an overly restrictive condition for retention in Iceberg-enabled topics.

WillemKauf · 2024-12-18T22:23:46Z

tests/rptest/tests/datalake/datalake_verifier.py

@@ -24,22 +24,24 @@

 class DatalakeVerifier():
    """
-     Verifier that does the verification of the data in the redpanda Iceberg table. 
-     The verifier consumes offsets from specified topic and verifies it the data 
+     Verifier that does the verification of the data in the redpanda Iceberg table.


Trailing whitespace removal

vbotbuildovich · 2024-12-19T00:42:10Z

Retry command for Build#59935

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_ChunkedRead == True)"}}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[2,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_TxRangeMaterialized == True, SpilloverManifestUploaded == True)"}}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_TxRangeMaterialized == True, SpilloverManifestUploaded == True)"}}
tests/rptest/tests/write_caching_fi_test.py::WriteCachingFailureInjectionTest.test_crash_all
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[2,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_TxRangeMaterialized == True)"}}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, SpilloverManifestUploaded == True)"}}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_TxRangeMaterialized == True)"}}
tests/rptest/tests/write_caching_fi_e2e_test.py::WriteCachingFailureInjectionE2ETest.test_crash_all@{"use_transactions":false}
tests/rptest/tests/datalake/datalake_e2e_test.py::DatalakeE2ETests.test_topic_lifecycle@{"cloud_storage_type":1,"filesystem_catalog_mode":false}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_Timequery == True)"}}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"path"],"test_case":{"name":"(TS_Read == True, TS_TxRangeMaterialized == True)"}}
tests/rptest/tests/datalake/datalake_e2e_test.py::DatalakeE2ETests.test_topic_lifecycle@{"cloud_storage_type":1,"filesystem_catalog_mode":true}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, TS_Timequery == True, SpilloverManifestUploaded == True)"}}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"path"],"test_case":{"name":"(TS_Read == True, TS_TxRangeMaterialized == True, SpilloverManifestUploaded == True)"}}
tests/rptest/tests/tiered_storage_model_test.py::TieredStorageTest.test_tiered_storage@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"test_case":{"name":"(TS_Read == True, AdjacentSegmentMergerReupload == True)"}}

vbotbuildovich · 2024-12-19T01:44:33Z

CI test results

test results on build#59935

test_id	test_kind	job_url	test_status	passed
rptest.tests.datalake.datalake_e2e_test.DatalakeE2ETests.test_topic_lifecycle.cloud_storage_type=CloudStorageType.S3.filesystem_catalog_mode=False	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c804-449e-983b-f6040b48eed2	FAIL	0/6
rptest.tests.datalake.datalake_e2e_test.DatalakeE2ETests.test_topic_lifecycle.cloud_storage_type=CloudStorageType.S3.filesystem_catalog_mode=False	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f3-40db-9e3c-b72686edcfd1	FAIL	0/6
rptest.tests.datalake.datalake_e2e_test.DatalakeE2ETests.test_topic_lifecycle.cloud_storage_type=CloudStorageType.S3.filesystem_catalog_mode=True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c806-4851-8e40-029c7bdf36d7	FAIL	0/6
rptest.tests.datalake.datalake_e2e_test.DatalakeE2ETests.test_topic_lifecycle.cloud_storage_type=CloudStorageType.S3.filesystem_catalog_mode=True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f4-48f5-b848-e36477ea95e1	FAIL	0/6
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c803-4c2f-9cd0-86f9a8dd5064	FLAKY	5/6
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.ABS.2.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c807-4275-8982-8723111a2347	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.ABS.2.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f4-48f5-b848-e36477ea95e1	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.ABS.2.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c803-4c2f-9cd0-86f9a8dd5064	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.ABS.2.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f6-49d5-ae5a-d5eb3497ad6e	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.path.test_case=.TS_Read==True.TS_TxRangeMaterialized==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c804-449e-983b-f6040b48eed2	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.path.test_case=.TS_Read==True.TS_TxRangeMaterialized==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f2-467e-a057-0e4a790311ae	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.path.test_case=.TS_Read==True.TS_TxRangeMaterialized==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c806-4851-8e40-029c7bdf36d7	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.path.test_case=.TS_Read==True.TS_TxRangeMaterialized==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f3-40db-9e3c-b72686edcfd1	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.AdjacentSegmentMergerReupload==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f6-49d5-ae5a-d5eb3497ad6e	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c807-4275-8982-8723111a2347	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f4-48f5-b848-e36477ea95e1	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_ChunkedRead==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c803-4c2f-9cd0-86f9a8dd5064	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_Timequery==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c804-449e-983b-f6040b48eed2	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_Timequery==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f2-467e-a057-0e4a790311ae	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_Timequery==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c806-4851-8e40-029c7bdf36d7	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_Timequery==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f3-40db-9e3c-b72686edcfd1	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c807-4275-8982-8723111a2347	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f4-48f5-b848-e36477ea95e1	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c803-4c2f-9cd0-86f9a8dd5064	FAIL	0/1
rptest.tests.tiered_storage_model_test.TieredStorageTest.test_tiered_storage.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host.test_case=.TS_Read==True.TS_TxRangeMaterialized==True.SpilloverManifestUploaded==True	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f6-49d5-ae5a-d5eb3497ad6e	FAIL	0/1
rptest.tests.write_caching_fi_e2e_test.WriteCachingFailureInjectionE2ETest.test_crash_all.use_transactions=False	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c804-449e-983b-f6040b48eed2	FAIL	0/1
rptest.tests.write_caching_fi_e2e_test.WriteCachingFailureInjectionE2ETest.test_crash_all.use_transactions=False	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f4-48f5-b848-e36477ea95e1	FAIL	0/1
rptest.tests.write_caching_fi_test.WriteCachingFailureInjectionTest.test_crash_all	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc18-c803-4c2f-9cd0-86f9a8dd5064	FAIL	0/1
rptest.tests.write_caching_fi_test.WriteCachingFailureInjectionTest.test_crash_all	ducktape	https://buildkite.com/redpanda/redpanda/builds/59935#0193dc1c-24f3-40db-9e3c-b72686edcfd1	FAIL	0/1

test results on build#60023

test_id	test_kind	job_url	test_status	passed
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3	ducktape	https://buildkite.com/redpanda/redpanda/builds/60023#0193e62a-cfd5-4342-9d5c-bd82d737eba0	FLAKY	2/6

test results on build#60079

test_id	test_kind	job_url	test_status	passed
rptest.tests.delete_records_test.DeleteRecordsTest.test_delete_records_concurrent_truncations.cloud_storage_enabled=True.truncate_point=start_offset	ducktape	https://buildkite.com/redpanda/redpanda/builds/60079#0193f494-6494-4f55-ac0e-69cf1956752e	FLAKY	5/6

WillemKauf · 2024-12-19T04:28:05Z

A lot of KgoVerifierProducer failures: panic: Out of order offset 0 (vs 0 20000).

Not sure if this is another KgoVerifierProducer issue or if something else has been broken.

The only related change I can see in KgoVerifier was this, in which pw.validOffsets.Insert() is now called under a lock in the new function OnAcked (but CI must have run against this change many times before hitting these failures, so I am uncertain).

EDIT: Probably just because of the oneshot() changes I made. Reverted.

WillemKauf · 2024-12-20T14:33:25Z

Force push to:

Revert changes to kgo_verifier_service::oneshot().

vbotbuildovich · 2024-12-20T20:42:39Z

Retry command for Build#60016

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/datalake/compaction_test.py::CompactionTest.test_compaction@{"cloud_storage_type":1}

WillemKauf · 2024-12-20T21:10:07Z

Force push to:

Change compaction wait condition in compaction_test.py. Translation seems to slow the compaction process down quite a bit.

mmaslankaprv · 2024-12-23T14:46:36Z

src/v/datalake/translation/tests/translated_log_offset_test.cc

+        b.add_batch(std::move(batch)).get();
+    };
+
+    make_and_add_record(model::record_batch_type::raft_data);


I think it would make sense to add a test case that starts from a configuration batch, since that is what happens in real life: the configuration batch is always the first batch in the log.

Thanks for the callout here, I have added two new test cases which begin with configuration batches to translated_log_offset_test.

And most importantly, add the function `get_translated_log_offset()`. This function will be used to compute the appropriate highest translated log offset for a given translated kafka offset while taking into account translator batches. This is an important function for removing offset translation from within the `translation_stm`, and to be less pessimistic about the `max_collectible_offset` returned by the `stm` in the future.

Make use of the added utility function `get_translated_log_offset()` throughout `partition_translator.cc` and `state_machine.cc` in order to update the `translation_stm::_highest_translated_log_offset` as translation occurs. `translation_stm::max_collectible_offset()` now returns the `_highest_translated_log_offset` instead of performing offset translation for the `highest_translated_offset`, which is currently more restrictive than it should be for housekeeping (due to it being translator batch unaware).
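A rough sketch of the bookkeeping described above, as a toy Python model rather than the actual state machine. `TranslationStmModel`, `record_translation`, and the two sentinel constants are hypothetical; in particular, returning the "unset" sentinel before any translation is an assumption carried over from the old code path.

```python
import sys

OFFSET_MAX = sys.maxsize           # stand-in for model::offset::max()
OFFSET_UNSET = -sys.maxsize - 1    # stand-in for a default-constructed model::offset{}


class TranslationStmModel:
    """Keeps the highest translated log offset alongside the Kafka offset."""

    def __init__(self, iceberg_enabled):
        self.iceberg_enabled = iceberg_enabled
        self.highest_translated_kafka_offset = OFFSET_UNSET
        self.highest_translated_log_offset = OFFSET_UNSET

    def record_translation(self, kafka_offset, log_offset):
        # The translator supplies the batch-aware log offset, so the
        # state machine never has to translate offsets itself.
        self.highest_translated_kafka_offset = max(
            self.highest_translated_kafka_offset, kafka_offset)
        self.highest_translated_log_offset = max(
            self.highest_translated_log_offset, log_offset)

    def max_collectible_offset(self):
        if not self.iceberg_enabled:
            return OFFSET_MAX
        # No to_log_offset() call here: just return the bookkept value.
        return self.highest_translated_log_offset


stm = TranslationStmModel(iceberg_enabled=True)
stm.record_translation(kafka_offset=1, log_offset=4)
print(stm.max_collectible_offset())  # 4
```

The point of the design is that the log offset is computed once, where translation happens, and the state machine only stores and returns it.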

By handling gaps in offsets and recording seen keys, we can validate the correctness of a compacted log that has been translated (fully) into an iceberg table.
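As a hedged illustration of "handling gaps and recording seen keys", the sketch below shows one way such a check could look. It is not the actual `datalake_verifier` code: `verify_compacted`, its `(offset, key)` input format, and the exact checks are assumptions made for the example.

```python
def strictly_increasing(offsets):
    """True if offsets only go up; gaps left by compaction are allowed."""
    return all(a < b for a, b in zip(offsets, offsets[1:]))


def verify_compacted(consumed, table_rows):
    """Compare a compacted log against Iceberg table rows.

    Both arguments are lists of (offset, key) tuples. Returns a list of
    error strings; an empty list means the check passed.
    """
    errors = []
    if not strictly_increasing([offset for offset, _ in consumed]):
        errors.append("consumed offsets are not strictly increasing")

    table_by_offset = dict(table_rows)
    seen_keys = set()
    for offset, key in consumed:
        seen_keys.add(key)
        # Every record that survived compaction must be in the table.
        if table_by_offset.get(offset) != key:
            errors.append(f"offset {offset}: row missing or key mismatch")

    # Compaction keeps the latest record per key, so no key may vanish.
    missing = {key for _, key in table_rows} - seen_keys
    if missing:
        errors.append(f"keys in table but not in log: {sorted(missing)}")
    return errors


consumed = [(2, "a"), (5, "b"), (6, "c")]            # offsets 0, 1, 3, 4 compacted away
table_rows = [(0, "a"), (2, "a"), (5, "b"), (6, "c")]
print(verify_compacted(consumed, table_rows))  # []
```

The table may hold rows for offsets that were later compacted out of the log, which is why the sketch only requires the surviving log records to be present in the table, not the reverse.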

Adds a new `test_compaction` test, which uses the `KgoVerifierSeqConsumer` to validate a fully compacted log, along with the `datalake_verifier` service to validate the Iceberg table. Also moves the contents of `compaction_gaps_test.py` into `compaction_test.py`.

WillemKauf · 2024-12-23T16:21:32Z

Force push to:

Add two new tests to translated_log_offset_test.cc
Remove early return in get_translated_log_offset() to correct behavior for the edge case of kafka::offset{}
Add comment to get_translated_log_offset() declaration about its use.

WillemKauf requested a review from andrwng December 18, 2024 22:07

github-actions bot added area/build area/redpanda labels Dec 18, 2024

WillemKauf force-pushed the datalake_translator_offset_fix branch from 5db3f18 to 6c113d7 Compare December 18, 2024 22:09

WillemKauf commented Dec 18, 2024

WillemKauf force-pushed the datalake_translator_offset_fix branch from 6c113d7 to 0e1a24c Compare December 20, 2024 14:32

WillemKauf force-pushed the datalake_translator_offset_fix branch from 0e1a24c to 2fe6c55 Compare December 20, 2024 15:05

WillemKauf force-pushed the datalake_translator_offset_fix branch from 2fe6c55 to b01095c Compare December 20, 2024 21:09

WillemKauf requested review from bharathv and mmaslankaprv December 21, 2024 00:45

mmaslankaprv reviewed Dec 23, 2024

WillemKauf added 5 commits December 23, 2024 11:17

datalake: add highest_translated_log_offset to various objects

1dcd1c5

rptest: make datalake_verifier compaction aware

16504a8

By handling gaps in offsets and recording seen keys, we can validate the correctness of a compacted log that has been translated (fully) into an iceberg table.

WillemKauf force-pushed the datalake_translator_offset_fix branch from b01095c to 4bd7693 Compare December 23, 2024 16:19
