Comments (12)

dbeyer commented on June 5, 2024

The problem with the pull request that I accidentally merged (sorry again!)
was that it mixes too many different things.
Please create separate pull requests, one for each concern.

For example, you made changes to the continuous integration.
That should be explained and documented by one pull request.

Then you contributed verification tasks. Those should be contributed separately,
with a description file README.txt and licence file LICENSE.txt in each directory,
such that people know that they are free to use those.
Ideally, one pull request for one category.

Then, you added some code for the preprocessing demo category.
This includes a preproc configuration file that defines how to preprocess.

Finally, there is something that you call "SV-COMP benchmark".
I do not understand what you are referring to by this name, but at least,
obviously, this name is reserved for the whole repository (sv-benchmarks).
We use the notion benchmark (subject) to refer to a set of verification tasks
to benchmark (verb) certain characteristics of verification tools.
Perhaps call it mkPreproc or something like this, if it makes
a verification task from a set of source and include files.
This would fit to the category for which it is meant.

Best regards,
Dirk

delcypher commented on June 5, 2024

@dbeyer

The problem with the pull request that I accidentally merged (sorry again!)
was that it mixes too many different things.
Please create separate pull requests, one for each concern.

This doesn't really make sense because most of these things are intertwined with each other. I can't have separate PRs in parallel because they would be interdependent.

I grant you that the TravisCI changes could be broken down further. However:

  • I wanted testing for the new SVCB infrastructure to be enabled from the beginning
  • The TravisCI changes are done in a separate commit.

For example, you made changes to the continuous integration.
That should be explained and documented by one pull request.

As noted above, the TravisCI changes were in their own commit and documented in the commit message. They appeared in the same PR because those changes are needed for testing the SVCB infrastructure.

Then you contributed verification tasks. Those should be contributed separately,
with a description file README.txt and licence file LICENSE.txt in each directory,
such that people know that they are free to use those.
Ideally, one pull request for one category.

I'm happy to add README.txt and LICENSE.txt files, but these tasks are not meant to be permanent. They are "examples" which illustrate how to use the SVCB infrastructure. My expectation is that we would remove them once we have some real benchmarks contributed.

The SVCB infrastructure also cannot be tested without these examples, which is why they were all in the same PR.

Then, you added some code for the preprocessing demo category.
This includes a preproc configuration file that defines how to preprocess.

I don't know what code you are talking about here. Are you talking about the spec.yml files?
If so, you've misunderstood their purpose. Their purpose is to define properties about the benchmarks, one of them being which source files constitute a benchmark. Other properties include the properties being checked, what the expected results are, the architecture, etc.

Read the schema to see the full description:

https://github.com/sosy-lab/sv-benchmarks/blob/799bbb9d162c8206aa72f248dc652e887dfc6ce4/svcb/svcb/schema.yml
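
To make this concrete, here is a rough sketch of what a spec.yml might contain and how it could be loaded. The field names below are illustrative guesses only, not the actual format; the linked schema.yml is the authoritative definition. The snippet assumes PyYAML for parsing.

```python
# Illustrative only: the field names are guesses at the kind of properties a
# spec.yml describes (sources, property checked, expected result,
# architecture); the linked schema.yml is the authoritative definition.
import yaml  # assumes PyYAML is installed

EXAMPLE_SPEC = """
architecture: x86_64
language: c
sources:
  - main.c
  - lib/helper.c
properties:
  - name: no_reach_error
    expected_correct: false
"""

spec = yaml.safe_load(EXAMPLE_SPEC)

# Minimal sanity checks standing in for real schema validation.
assert isinstance(spec["sources"], list) and spec["sources"]
for prop in spec["properties"]:
    assert "name" in prop and "expected_correct" in prop

print("benchmark uses", len(spec["sources"]), "source file(s) for",
      spec["architecture"])
```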

Finally, there is something that you call "SV-COMP benchmark".
I do not understand what you are referring to by this name, but at least,
obviously, this name is reserved for the whole repository (sv-benchmarks).
We use the notion benchmark (subject) to refer to a set of verification tasks
to benchmark (verb) certain characteristics of verification tools.
Perhaps call it mkPreproc or something like this, if it makes
a verification task from a set of source and include files.
This would fit to the category for which it is meant.

mkPreproc is not an appropriate name, and I think it shows a lack of understanding of what was actually in #139.

The main thing that was actually in #139 is the "SVCB" infrastructure, which is:

  • A benchmark specification format defined by a schema (the fact that these use non-preprocessed source files is an implementation detail and should not be the main focus here)
  • An svcb Python module which provides parsing and validation of benchmark specification files (spec.yml). It also provides code for generating build system target declarations, currently only for CMake (see the sketch after this list).
  • CMake code to support building declared benchmarks.
  • Benchmark statistic gathering tools. Currently there is only one (category-count.py)
  • A svcomp.h and stub implementation for building benchmarks. This is referred to as the "SV-COMP runtime". This comes with Doxygen generated documentation.
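
To sketch how these pieces are meant to fit together, here is a rough outline of the parse/validate/generate flow. Every module, function, and field name below is a placeholder for illustration, not the actual svcb API; the real code is in #139.

```python
# Placeholder sketch of the spec.yml -> validation -> build-declaration
# workflow; none of these names are the real svcb API.
import sys
import yaml  # assumes PyYAML


def load_spec(path):
    """Parse a benchmark specification (spec.yml) file."""
    with open(path) as f:
        return yaml.safe_load(f)


def validate_spec(spec):
    """Stand-in for validation against the schema (schema.yml)."""
    for key in ("architecture", "sources", "properties"):
        if key not in spec:
            raise ValueError("spec is missing required field: " + key)


def emit_cmake_target(name, spec):
    """Emit a build-system target declaration for one benchmark."""
    return "add_executable({} {})".format(name, " ".join(spec["sources"]))


if __name__ == "__main__":
    spec = load_spec(sys.argv[1])
    validate_spec(spec)
    print(emit_cmake_target("example_benchmark", spec))
```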

So, to repeat: the main focus is the "SVCB" infrastructure, not the "preprocessing".

I am happy to change the svcb name, but not to mkPreproc. `svcb` seemed like an appropriate name because it is very short and it is infrastructure for the SV-COMP benchmarks.

Anyway, here's my proposal for moving forward:

  1. I'll create a PR for most of the TravisCI changes that were in #139. Obviously the changes that test the "SVCB" infrastructure will have to be left out.
  2. We can do some "bikeshedding" to determine a name for the infrastructure currently known as "SVCB".
  3. Once we have a name I can create a PR that adds the infrastructure along with the examples and testing.

Note that the above proposes splitting #139 (along with a name change of some bits and adding licenses) into two PRs. I don't think it makes sense to split it into more PRs.

delcypher commented on June 5, 2024

A few ideas for a replacement name for the "SVCB" infrastructure:

  • SVCBS - SV-COMP Benchmark Support (python module name svcbs)
  • SVCOMP-BENCHMARK-SUPPORT - SV-COMP Benchmark Support (python module name svcomp_benchmark_support)
  • TSIFNAS - The Support infrastructure formerly known as SVCB (not a serious suggestion ;) )

delcypher commented on June 5, 2024

@zvonimir Seeing as you expressed some interest, you may wish to give your opinion here.

PhilippWendler commented on June 5, 2024

A few ideas for a replacement name for the "SVCB" infrastructure:

  • SVCBS - SV-COMP Benchmark Support (python module name svcbs)
  • SVCOMP-BENCHMARK-SUPPORT - SV-COMP Benchmark Support (python module name svcomp_benchmark_support)

Why are all these names specific to SV-COMP? As far as I see it, this repository is a benchmark suite for software verification, and SV-COMP is just one of its users. There are even tasks in this repository that are not part of SV-COMP and never will be.

PhilippWendler commented on June 5, 2024

Regarding the actual proposal, I think it is way too early to merge it. There are a lot of open questions about what we actually want, and these need to be discussed in the community. Afterwards we should talk about a design, and then an implementation could be proposed and eventually merged.

A prototype like you presented can be helpful for such discussions, so thank you for that, but we can have the discussion without it being merged.

The mailing list is probably a much better place for having general discussions than GitHub, e.g., because threading support is missing here. I won't start a discussion there, but I am going to list the first questions that came to my mind here such that the people who want to start discussing this can see what kind of questions I am referring to:

  • Do we even need a completely new infrastructure/metadata format at all? For example, C verification tasks that need pre-processing could simply be put in a directory per task, and we tell the verifier to verify *.c in that directory (see the sketch after this list). Necessary defines could be added to a header in the directory, and the architecture would be defined per category like it is now.
  • What should be the scope of the new infrastructure? Only tasks written in C(++) without preprocessing? C files that are already preprocessed? Java verification tasks? Other languages? The name and presentation of the current prototype look like it should be for all languages, but in fact it seems pretty specific to C (for example it contains svcomp.h etc., so either it is only for C or the proposal does in fact mix too many things together).
  • Do we want the metadata about the tasks to be read by the verifier, or only by the benchmarking infrastructure? Is it acceptable to force all verifiers to parse and understand a complex metadata file? Otherwise, how will we pass all the relevant information from the benchmarking infrastructure to the verifier?
  • How much metadata do we actually want/need? How much variation do we want to allow, for example regarding variants of the input language etc.? We need to consider that for the verifiers it is less effort if there is as little variation as possible, so maybe aim for as little metadata as technically necessary?
  • Do we need to be able to specify completely different metadata for every single verification task, or is it enough to have sets of files (e.g., per directory) that share the metadata except the expected verification result?
  • Do we want to have different verification tasks that consist of the same source file(s) and property, i.e., differ only in some other flag? If so, how will we present them in tables etc.? How big would be the danger of confusing these tasks? Can people not familiar with the setup identify and use such verification tasks correctly? Do different verification tasks that have the same source file(s) and property need to have different expected results?
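
To make the directory-per-task alternative from the first question above concrete, a driver for it could be roughly as simple as the following sketch. The directory layout, the defines header name, and the verifier command line are all hypothetical.

```python
# Hypothetical sketch of the "directory per task" alternative: verify every
# *.c file in a task directory and pass along an optional header of defines.
# The paths and the verifier name are made up for illustration.
import glob
import os

task_dir = "c/some-category/task_0001"  # hypothetical task directory
sources = sorted(glob.glob(os.path.join(task_dir, "*.c")))
defines_header = os.path.join(task_dir, "defines.h")

cmd = ["some-verifier"] + sources
if os.path.isfile(defines_header):
    cmd += ["-include", defines_header]  # forward the shared defines

print("would run:", " ".join(cmd))
```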

An important argument that we should have in mind when discussing this is the usability of all this outside of SV-COMP. For example, it is absolutely important that everybody can use the benchmark suite defined here as easily as possible when they do benchmarking for one of their papers, even when they need to compare against a verifier that is neither their own nor has at any time participated in SV-COMP and thus does not support any kind of metadata format defined here.

Otherwise we will again start getting papers with insufficient experimental evaluation because of a lack of (usable) benchmarks (avoiding this is one of the most important goals of SV-COMP), or worse, we would maybe even get papers with wrong experimental results because people used the benchmarks in a way that did not take all relevant metadata into account.

delcypher commented on June 5, 2024

@sfsiegel you may be interested in this.

sfsiegel commented on June 5, 2024

I really think we should go ahead and do the merge.
People can look at it, play with it, and see what they think.
All of Philipp's questions are reasonable, but Dan and I have also given many of them a lot of thought (with input from others), and we think we have come up with a solution that addresses them.
So I would like to see the basic system up and running with a few examples before making more suggestions. Then we will have something concrete to discuss.

delcypher commented on June 5, 2024

@PhilippWendler

Why are all these names specific to SV-COMP? As far as I see it, this repository is a benchmark suite for software verification, and SV-COMP is just one of its users. There are even tasks in this repository that are not part of SV-COMP and never will be.

Because the benchmark specification format is tied to and driven by SV-COMP. You may say that the benchmark suite is meant just for software verification, but in its current state it is really tied to SV-COMP (preprocessed files, .prp files, and .set files are all things dictated by SV-COMP).

delcypher commented on June 5, 2024

@PhilippWendler

Regarding the actual proposal, I think it is way too early to merge it. There are a lot of open questions about what we actually want, and these need to be discussed in the community. Afterwards we should talk about a design, and then an implementation could be proposed and eventually merged.

Merging doesn't necessarily mean that the design and implementation are finalised. My intention was that the contributed code would be iterated on once merged.

delcypher commented on June 5, 2024

@PhilippWendler

A prototype like you presented can be helpful for such discussions, so thank you for that, but we can have the discussion without it being merged.

The mailing list is probably a much better place for having general discussions than GitHub, e.g., because threading support is missing here. I won't start a discussion there, but I am going to list the first questions that came to my mind here such that the people who want to start discussing this can see what kind of questions I am referring to:

...

These are good questions to ask. I have thoughts about some of them, but these should be discussed on the mailing list. Notably, though, the list has been pretty silent about this so far, even though I solicited comments on some of these quite a while ago.

delcypher commented on June 5, 2024

I've decided I'm not going to waste my time trying to merge this again.

Development will proceed at https://github.com/delcypher/svcomp-build-mockup and discussion can happen in its issue tracker or on the sv-comp mailing list.
