
4 different benchmark sets focused around str.replace_all #7


Merged: 2 commits, Apr 25, 2025

Conversation

OliverMa1
Contributor

This pull request introduces four new benchmark sets aimed at evaluating and challenging string solvers on diverse and difficult problem instances that focus on the str.replace_all operator.

Benchmark Sets

1. pcp-3-3-random (1000 benchmarks)

  • Benchmarks encoding PCP[3,3] on randomly generated strings of exactly length 3 over the alphabet {0, 1}. Each instance uses exactly 3 tile pairs. Most benchmarks are expected to be unsat.

2. pcp-3-4-hard (3170 benchmarks)

Benchmarks encoding PCP[3,4] instances that the paper "Creating Difficult Instances of the Post Correspondence Problem" considered hard to solve. The instances still use 3 tiles, but the tile strings now have length at most 4 rather than exactly 4.
Although originally considered hard, many of these benchmarks have since been solved with alternative techniques such as Parikh automata and model checking. String solvers, however, have not been extensively tested on them: current experiments show that none of these benchmarks are solved by string solvers.
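These PCP encodings follow the standard reduction via str.replace_all: a candidate solution is an unknown string of tile indices, and two chains of str.replace_all apply the top and bottom tile homomorphisms. A minimal sketch of the shape of such a benchmark (the three tile pairs below are illustrative only, not taken from the actual benchmark files):

```smt2
; Hypothetical PCP[3,3]-style instance.
; Tiles (top, bottom): a = ("001", "0"), b = ("1", "011"), c = ("01", "101")
(set-logic QF_S)
(declare-const s String)   ; candidate tile sequence, e.g. "acb"
; s is a non-empty word over the tile indices a, b, c
(assert (str.in_re s (re.+ (re.range "a" "c"))))
; top homomorphism: replace each tile index by that tile's top string;
; bottom homomorphism: likewise with the bottom strings
(assert (=
  (str.replace_all (str.replace_all (str.replace_all s "a" "001") "b" "1") "c" "01")
  (str.replace_all (str.replace_all (str.replace_all s "a" "0") "b" "011") "c" "101")))
(check-sat)   ; sat iff the PCP instance has a solution
```

Because the tile indices (a, b, c) are disjoint from the tile alphabet {0, 1}, the chained str.replace_all calls implement the two homomorphisms without interfering with each other's output.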

3. rna-sat and rna-unsat (500+500 benchmarks)

These benchmarks model a reverse transcription process inspired by bioinformatics.
An unknown RNA string y is converted into a DNA string by applying a series of str.replace_all operations that simulate nucleotide base pairing.

  • rna-sat: instances are constructed so that they are satisfiable.
  • rna-unsat: instances are designed to be unsatisfiable.
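The reverse-transcription step can be modeled roughly as follows (base names follow standard Watson-Crick pairing; the exact constraint shape used in the benchmarks may differ):

```smt2
; Hypothetical sketch of RNA -> DNA reverse transcription via str.replace_all.
; Complement pairing: A -> T, U -> A, C -> G, G -> C.
; Lowercase letters are temporary placeholders so that, e.g., the G's produced
; from C are not themselves rewritten by the later G -> c replacement.
(set-logic QF_S)
(declare-const y String)   ; unknown RNA string
(declare-const d String)   ; observed DNA string
(assert (= d
  (str.replace_all
    (str.replace_all
      (str.replace_all
        (str.replace_all
          (str.replace_all
            (str.replace_all
              (str.replace_all
                (str.replace_all y "A" "t")
                "U" "a")
              "C" "g")
            "G" "c")
          "t" "T")   ; restore placeholders to the final DNA bases
        "a" "A")
      "g" "G")
    "c" "C")))
(check-sat)
```

A sat instance then fixes d to a concrete DNA string that is reachable from some RNA string y, while an unsat instance pins d to a value no RNA string can produce.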

@hansjoergschurr
Collaborator

Thank you for the submission!
The benchmarks look great.

However, I am worried about the diversity of the benchmark families. In particular, the second set, with its 3170 benchmarks, is large.
Do you think it would be reasonable to select e.g. 500 benchmarks from each set?

I would also suggest merging the sets into one family and using subfolders to structure them.
For example, you could have 20250403-PCP-String/pcp-3-4-hard/....

Finally, can you insert white spaces between the solvers in the Target solver field?

@OliverMa1
Contributor Author

I removed some of the benchmarks. There might be some merit in the future to having all 3170 benchmarks, or to finding a more refined subset, but right now they all seem equally hard for string solvers.

@hansjoergschurr
Collaborator

Thank you for the update and the submission in general.

From our perspective, the size of a benchmark set is a difficult question. In the past we accepted some large, but not very diverse, benchmark sets. Those are imho not very useful. I suspect that the subset you selected already gives the SMT solver developers some hard nuts to crack.

@hansjoergschurr hansjoergschurr merged commit 7c142d2 into SMT-LIB:main Apr 25, 2025
2 checks passed