-
Notifications
You must be signed in to change notification settings - Fork 169
Combined tasks #1104
base: master
Are you sure you want to change the base?
Combined tasks #1104
Conversation
Yes, the licensing should be properly handled. The best and easiest way to do this would be to decide on #1099, add machine-readable (REUSE-style) license headers to all relevant source files, and adjust your tool that constructs the combined programs such that it puts the union of all such headers from the input files in to the output files. |
Now that #1099 is closed to getting merged: For which directories would you need REUSE-style license declarations in order to be able to update this PR? After we have this list we can maybe split the workload over a few people. For some directories the task of adding license headers might be a lot of work, but then we could leave programs from these directories out for a first round of combined tasks. |
Thanks for following up on this. We have the following pairs:
So directory-wise, we need |
Oh, that's a problem. For both |
What about We could try to get these merged for SV-COMP'21 and postpone the others. |
Right, I overlooked this pair. This seems doable. All ECA programs should be easy, because I think there were never any cases where other people added tasks to their directories or so, so we can assume they are all from the same source. For |
718eb2d
to
cbc1944
Compare
…2012/Problem12_label0* Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
…ring Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
…ton_ Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
…oft_float_ Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
…systemc/token_ring Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
cbc1944
to
4b37803
Compare
I've created #1210 to handle the feasible tasks. |
I might be missing something, but I don't see what is the aggregated value for this benchmarks. Besides the fact that the new benchmarks are larger than the base ones in isolation, I don't think they pose any challenge to the verifiers that are not posed by the base benchmarks already. Is there anything I am missing? |
The tasks are not just larger than the individual components, but simulate systems with loosely coupled components. This is almost non-existent in sv-benchmarks at the moment, as most tasks describe a single algorithm or program behavior. |
I can see the value in having loosely couple systems (microservices), but since "loosely" in here actually mean completely decoupled (exactly one of them is executed) one could argue that this does not fairly simulate these systems. As I see it, they just increase maintainability complexity without bringing much new in terms of verification challenges. |
I didn't talk about microservices, and I'm not trying to simulate those. You can read more about the tasks here, and also see that they actually are challenging: https://www.sosy-lab.org/research/pub/2020-SEFM.Difference_Verification_with_Conditions.pdf |
Thanks for the pointer. I see that in the context of evolving programs this makes sense. |
Previously, only pals_ were updated
I agree with @hernanponcedeleon. These programs are interesting if the original and evolved programs are provided. Are these programs being submitted in such a context ? If not, then what is the value add by them in the context of reachability analysis? These tasks can be made more challenging if not as loosely coupled. |
Since when is it necessary to argue for new benchmark contributions? I don't see this in the PR checklist. It requires time that is probably better spend on fixing other benchmark tasks.
Just to avoid confusion: The submitted programs are "original" and "evolved", but these concepts may be confusing because the tasks were artificially generated and "original" and "evolved" can be swapped arbitrarily. (We can say that A -> B or B -> A).
Sorry, I'm not sure what you mean with this.
This was already stated: the tasks are not just larger than the individual components, but simulate systems with loosely coupled components. This is almost non-existent in sv-benchmarks at the moment, as most tasks describe a single algorithm or program behavior. Loose coupling is a feature for these tasks: In case you're interested, all further information about how theses tasks were selected and how well CPA-seq and UAutomizer perform on them can be found in the referenced paper. |
Hi, int main() { main1 and main2 are independent of each other. So, they belong to different program slices. Any tool that can verify existing svcomp programs corresponding to main1 and main2 within a CPU time limit of 900 seconds can verify Pi. Thus verifying Pi is like firing the following commands either in parallel, or sequentially, |
Exactly.
Yes, in theory. But the runtime of verifying Pi is unknown. Similarly, traditional bounded model checking on Pi will have to check significantly larger formulas then if main1 and main2 were considered separately. As another example, let's assume Pi is unsafe; main1 is unsafe and main2 is safe. The tasks that were chosen for main1 and main2 are already difficult to solve by themselves - so choosing an efficient abstraction or portfolio approach may be necessary to solve Pi. In theory it is possible to simply run two instances of your tool, as you proposed; you run one instance on main1 and one instance on main2. Many other issues regarding scalability of verifiers apply to Pi. I would argue that these scalability issues are even more evident because Pi consists of loosely coupled components that don't benefit from caching etc. I hope this was able to show that solving these combinations without knowing their loose structure is not as trivial as it may seem.
I think that's an interesting idea.
Yes, that's certainly true. But these tasks, as they currently are, can still be useful for incremental verification. So removing some of them would diminish their purpose. |
Hmm. Thanks for the clarification. |
Description of tasks
New benchmark tasks that are a combination of existing benchmark tasks.
Each benchmark task in this directory is created from two other tasks
of other categories, according to the following pattern:
Definitions are renamed to avoid conflicts, if necessary.
This construction leads to programs with independent program parts.
The name of each benchmark task reveals its origin:
All task names are created by the pattern ${TASK_1}+${TASK_2}.
Open Issue: Licensing
I'm not sure about how to license these tasks.
Some of the tasks that we used are part of sv-benchmarks, but do not have a separate license in their corresponding directory.
For now, I have written a LICENSE.txt that coarsely references the original licenses (if present),
but I don't think this is good enough.
I'd be happy about any constructive input on this matter.
Checklist
programs added to new and appropriately named directory
license present and acceptable (either in separate file or as comment at beginning of program)
contributed-by present (either in README file or as comment at beginning of program)
programs added to a
.set
file of an existing category, or new sub-category established (if justified)intended property matches the corresponding
.prp
fileprograms and expected answer added to a
.yml
file according to task definitions.cfg
file