
4 different benchmark sets focused around str.replace_all #7


Merged: 2 commits, Apr 25, 2025

Conversation

OliverMa1
Contributor

This pull request introduces four new benchmark sets aimed at evaluating and challenging string solvers on diverse and difficult problem instances that focus on the str.replace_all operator.

Benchmark Sets

1. pcp-3-3-random (1000 benchmarks)

  • Benchmarks encoding PCP[3,3] on randomly generated strings of exactly length 3 over the alphabet {0, 1}. Each instance uses exactly 3 tile pairs. Most benchmarks are expected to be unsat.

2. pcp-3-4-hard (3170 benchmarks)

Benchmarks encoding PCP[3,4] instances that the paper "Creating Difficult Instances of the Post Correspondence Problem" considered hard to solve. The instances still use 3 tiles, but the tile strings now have length at most 4 rather than exactly 4.
Although originally considered hard, many of these benchmarks have since been solved with alternative techniques such as Parikh automata and model checking. String solvers, however, have not been extensively tested on them: current experiments show that none of these benchmarks are solved by string solvers.
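These PCP encodings follow the standard reduction via str.replace_all: a candidate solution is an unknown string of tile indices, and two chains of str.replace_all apply the top and bottom tile homomorphisms. A minimal sketch of the shape of such a benchmark (the three tile pairs below are illustrative only, not taken from the actual benchmark files):

```smt2
; Hypothetical PCP[3,3]-style instance.
; Tiles (top, bottom): a = ("001", "0"), b = ("1", "011"), c = ("01", "101")
(set-logic QF_S)
(declare-const s String)   ; candidate tile sequence, e.g. "acb"
; s is a non-empty word over the tile indices a, b, c
(assert (str.in_re s (re.+ (re.range "a" "c"))))
; top homomorphism: replace each tile index by that tile's top string;
; bottom homomorphism: likewise with the bottom strings
(assert (=
  (str.replace_all (str.replace_all (str.replace_all s "a" "001") "b" "1") "c" "01")
  (str.replace_all (str.replace_all (str.replace_all s "a" "0") "b" "011") "c" "101")))
(check-sat)   ; sat iff the PCP instance has a solution
```

Because the tile indices (a, b, c) are disjoint from the tile alphabet {0, 1}, the chained str.replace_all calls implement the two homomorphisms without interfering with each other's output.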

3. rna-sat and rna-unsat (500+500 benchmarks)

These benchmarks model a reverse transcription process inspired by bioinformatics.
An unknown RNA string y is converted into a DNA string by applying a series of str.replace_all operations that simulate nucleotide base pairing.

  • rna-sat: instances are constructed so that they are satisfiable.
  • rna-unsat: instances are designed to be unsatisfiable.
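The reverse-transcription step can be modeled roughly as follows (base names follow standard Watson-Crick pairing; the exact constraint shape used in the benchmarks may differ):

```smt2
; Hypothetical sketch of RNA -> DNA reverse transcription via str.replace_all.
; Complement pairing: A -> T, U -> A, C -> G, G -> C.
; Lowercase letters are temporary placeholders so that, e.g., the G's produced
; from C are not themselves rewritten by the later G -> c replacement.
(set-logic QF_S)
(declare-const y String)   ; unknown RNA string
(declare-const d String)   ; observed DNA string
(assert (= d
  (str.replace_all
    (str.replace_all
      (str.replace_all
        (str.replace_all
          (str.replace_all
            (str.replace_all
              (str.replace_all
                (str.replace_all y "A" "t")
                "U" "a")
              "C" "g")
            "G" "c")
          "t" "T")   ; restore placeholders to the final DNA bases
        "a" "A")
      "g" "G")
    "c" "C")))
(check-sat)
```

A sat instance then fixes d to a concrete DNA string that is reachable from some RNA string y, while an unsat instance pins d to a value no RNA string can produce.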

@hansjoergschurr
Collaborator

Thank you for the submission!
The benchmarks look great.

However, I am worried about the diversity of the benchmark families. In particular, the second set, with its 3170 benchmarks, is large.
Do you think it would be reasonable to select e.g. 500 benchmarks from each set?

I would also suggest merging the sets into one family and using subfolders to structure them.
For example, you could have 20250403-PCP-String/pcp-3-4-hard/....

Finally, can you insert white spaces between the solvers in the Target solver field?

@OliverMa1
Contributor Author

I removed some of the benchmarks. There might be some merit in the future to having all 3170 benchmarks, or to finding a more refined subset, but right now they all seem equally hard for string solvers.

@hansjoergschurr
Collaborator

Thank you for the update and the submission in general.

From our perspective, the size of a benchmark set is a difficult question. In the past we accepted some large, but not very diverse, benchmark sets. Those are imho not very useful. I suspect that the subset you selected already gives the SMT solver developers some hard nuts to crack.

@hansjoergschurr hansjoergschurr merged commit 7c142d2 into SMT-LIB:main Apr 25, 2025
2 checks passed