Set the --testcasefilter directly through the command line #3091
Comments
Have you tried using the If yes, why do you think it should also be available via the command line?
Yes we use the test-case-filter option for local testing. To ensure that mutation testing provides timely feedback in Pull request builds, setting the This method will allow us to focus on specific tests that cover the code changes, reducing the total duration of the mutation testing process to approximately 5 minutes. This is significantly faster than using the By integrating mutation testing with the |
@saranyanandagopal |
How do you identify which tests need to be run? Stryker runs all tests in the init phase for coverage analysis. That is the only sure way to identify which tests cover which mutants. Stryker would need to run tests that have not been changed. That being said, you could request another change: I think Stryker runs all tests twice: once to establish a baseline result (which tests fail and how long they take), and then again to discover coverage. Both could be done in a single pass, but since a typical Stryker run requires running the tests several times, having one extra run for convenience and stability was OK.
@candoumbe We have a naming pattern for our unit test files where all filenames include the word "fixture". I have verified that the following method works on our Azure DevOps Windows machine to identify if any test fixtures are being checked in.
I can then build the $testCaseFilterOption (testing in progress) and plug it into the Stryker command. script: | |
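A minimal sketch of that idea, written in Python rather than as the pipeline script. It rests on several assumptions that are not confirmed in this thread: the changed paths come from something like `git diff --name-only`, fixture files contain "fixture" in their name, each fixture file is named after the test class it contains, and the filter uses the VSTest `FullyQualifiedName~` (contains) operator joined with `|`. `build_test_case_filter` is a hypothetical helper, not part of Stryker.

```python
from pathlib import PurePosixPath


def build_test_case_filter(changed_files):
    """Build a VSTest-style filter from changed fixture files.

    Assumes (hypothetically) that each fixture file is named after the
    test class it contains, e.g. OrderServiceFixture.cs.
    """
    class_names = []
    for path in changed_files:
        # Normalize Windows separators so the path parses the same everywhere.
        file_path = PurePosixPath(path.replace("\\", "/"))
        # Keep only C# files whose name contains "fixture" (case-insensitive).
        if file_path.suffix.lower() != ".cs":
            continue
        if "fixture" not in file_path.stem.lower():
            continue
        if file_path.stem not in class_names:
            class_names.append(file_path.stem)
    # "~" means "contains"; "|" combines the clauses with OR.
    return "|".join(f"FullyQualifiedName~{name}" for name in class_names)


# Hypothetical list of changed files in a pull request.
changed = [
    "src/Orders/OrderService.cs",
    "tests/Orders/OrderServiceFixture.cs",
    "tests\\Billing\\InvoiceFixture.cs",
    "docs/readme.md",
]
print(build_test_case_filter(changed))
```

An empty result means no fixture changed, in which case the pipeline would probably want to skip the filtered run rather than pass an empty filter.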
@dupdob I have seen Stryker running "all" unit tests in the init phase when --since is enabled. In our case it takes close to 2 hours, as we have around 5000 unit tests. But I don't think it does that when the test-case-filter value is passed, because it finishes in ~5 minutes and we can see the mutation report with the score for that part of the code base. Thanks for the suggestion on cutting down a pass. It will certainly improve the mutation testing duration (helpful in our nightly builds, as we can trigger it for multiple projects), but it still won't help our pull request builds. Anything that takes more than 10 minutes cannot be integrated in pull request builds, as it will slow down feature development. My answer above has the command that I tested in the Azure DevOps pipeline to identify the test file names to pass to the test-case-filter option.
If I understand your script correctly, you will only run modified tests. There is a significant risk that it will lead to incorrect results. Stryker needs to run all tests covering a mutation to certify its outcome. The intent behind the Here is another possible optimization request: Stryker could skip unmodified projects, i.e. only test projects (incl. test projects) that have at least one modified file.
Thank you for your quick responses! We have around 4000 tests in one test project and about 1000 in the second. If only the first project has file changes, I assume Stryker would still need to run all 4000 tests. Is that correct? I understand your point about the test-case-filter. Our unit tests are organized in such a way that each test fixture corresponds to a single source code file, keeping related tests together. The risk of missing tests that cover a given mutant is very low because of this organization. Running partial mutation tests in pull requests is better than not running them at all. We also have a nightly mutation testing pipeline that runs all tests, giving us a complete overview. However, the most critical time to catch issues is before merging pull requests. This approach helps us spot bugs early as we shift left. Stryker can add significant value to teams adopting a shift-left strategy by reducing the likelihood of bugs reaching integration testing environments. So far, the only faster option that has worked for us is the test-case-filter. Sorry for being persistent :) but I’ve tested multiple options and haven’t found a faster solution.
No worries about being persistent, I am too. But what I want you to understand is that the mutation score will often be false, and probably worse than it should be using a normal run. The test filter is supposed to be the same between baseline and diff, otherwise score and results are inconsistent.
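To illustrate the concern with made-up data (the mutant names, test names, and `mutation_score` helper below are all hypothetical, and this is not how Stryker computes its report): a mutant that is only killed by a test outside the filter is counted as survived, so the filtered score drops below the full-run score.

```python
def mutation_score(killers_by_mutant, tests_run):
    """Return the share of mutants killed by at least one executed test."""
    killed = sum(
        1 for killers in killers_by_mutant.values() if killers & tests_run
    )
    return killed / len(killers_by_mutant)


# Hypothetical data: which tests would kill each mutant.
killers_by_mutant = {
    "mutant_1": {"OrderServiceFixture.Adds_item"},
    "mutant_2": {"CheckoutFixture.Totals_order"},  # killed only by another fixture
    "mutant_3": set(),  # survives every test
}

all_tests = {"OrderServiceFixture.Adds_item", "CheckoutFixture.Totals_order"}
filtered_tests = {"OrderServiceFixture.Adds_item"}

full_score = mutation_score(killers_by_mutant, all_tests)
filtered_score = mutation_score(killers_by_mutant, filtered_tests)
print(f"full run: {full_score:.0%}, filtered run: {filtered_score:.0%}")
```

In this toy case, two of three mutants are killed in the full run but only one in the filtered run, even though the code and tests are identical.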
Thanks for taking the time for this. To be clear: I am not saying that you cannot see progress; I even think that if you see progress, it is actual progress. |
Got it! When changes are made to the existing code, it’s expected that the corresponding tests are also updated in the same PR. If the tests aren’t updated, they might break during the CI build, requiring the author to review and fix them. Also, our on-demand pipelines will be manually triggered for regression testing before production releases or hot fixes. This means we’ll have smaller CI builds throughout the day, with a longer regression run conducted separately.
Code can be changed without breaking existing unit tests, otherwise you would not need mutation testing. Again, I am just saying that however you look at this, using a different filter between full mutation testing and PRs means you may get misleading results for the PR. In practice, if a PR author finds his/her result to be wrong, he/she would still be able to touch the proper test files to get a more accurate result. That being said, I am just a maintainer for this project, and I feel I cannot make this decision. @rouke-broersma and @richardwerkman: WDYT about exposing This is not difficult to make, but users have to be warned that they may get weird results when using diff mode with a varying filter.
You are correct about the coverage assumption; we do not have 100% test coverage. Stryker has been instrumental in identifying areas with no coverage, and we’ve improved this over the past few weeks. I understand your point about the differences in results between the test-case-filter run and the full mutation run. With our nightly mutation run, our goal is to make any necessary adjustments after the PR but still get immediate feedback in the PRs. Thanks for your questions—they prompted me to consider some important points. I originally raised this in the Slack channel and logged it here at the request of @rouke-broersma |
@dupdob I asked them to create the issue so we can have this discussion 👌. I share your concerns; however, if we decide to add this to the command line, I don't think a warning is necessary. A user could already achieve this by modifying the config file on the fly. The flag would imo not be advertised with the functionality that the OP has in mind, but if they use it like that it's at their own risk. The only big concern I have is the sheer number of command line options. I feel like it has already grown past the number we had back when we decided to remove the majority. Ultimately I think the bigger concern (not sure if this is properly represented in this issue) is that our initial testrun is more than 4 times slower than when running with vstest. From what I understand from our conversation on Slack, the test case filter is a workaround for this problem. I don't remember if it's intentional, but I think our initial testrun is not running with parallel execution enabled. If we can change that, I think that could already alleviate some of the concerns.
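As a rough sketch of what "modifying the config file on the fly" could look like in a pipeline step: the helper below rewrites a JSON config before Stryker is invoked. It assumes the options live under a top-level `stryker-config` object with a `test-case-filter` key; check this against your own config file. `set_test_case_filter` is a hypothetical helper, and the demo writes to a temporary directory only.

```python
import json
import tempfile
from pathlib import Path


def set_test_case_filter(config_path, test_filter):
    """Write the filter into the config file, keeping other options intact."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Assumption: options live under a top-level "stryker-config" object.
    options = config.setdefault("stryker-config", {})
    options["test-case-filter"] = test_filter
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Demo against a throwaway config file.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "stryker-config.json"
    config_file.write_text('{"stryker-config": {"concurrency": 4}}', encoding="utf-8")
    updated = set_test_case_filter(
        config_file, "FullyQualifiedName~OrderServiceFixture"
    )
    on_disk = json.loads(config_file.read_text(encoding="utf-8"))
    print(json.dumps(on_disk, indent=2))
```

Writing to a copy of the config rather than the checked-in file keeps the nightly full run on the unfiltered configuration.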
Parallelizing the initial test will accelerate Stryker testing, although it may not be faster for PR builds. vstest runs approximately 4,800 tests in parallel in about 20 minutes. Considering the time for mutant generation and the final test run, the total duration would be around 45-50 minutes, assuming no additional steps are involved. Does that seem accurate? This will still be an improvement for the regression runs though. Apart from the test-case-filter, which has its own risks, I can’t think of any other solution to make this work for PR builds. I am open to testing other suggestions that you might have. |
Update
Axis of improvements
Actually, there are several options to improve performance for both the baseline run and the diff run. But most of them require significant effort. This is obvious, otherwise they would have already been done.
Safe ones
These improvements have little to no risk.
Skip initial test run
The initial test run could be merged with coverage analysis. Benefit: saves the time of a full test run and gives a better run time baseline (as it includes mutation switching related costs). Risks: a Stryker run on a project with no tests or too many failing tests will be aborted after mutation instead of before.
Remove duplicate initial test runs
For solution mode, a test project will be run several times: once for each of the 'production' projects that it covers.
Remove duplicate coverage analysis run
For solution mode, a test project will be run several times for coverage analysis: once for each of the 'production' projects that it covers.
Diff mode for runs
Stryker could use diff results to skip test projects that have not been modified and that do not cover any modified project.
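The "diff mode for runs" idea can be sketched as a simple selection rule. This is an illustration only, not Stryker's implementation: the project names and the `select_test_projects` helper are hypothetical, and the coverage map (test project to covered production projects) is assumed to be known up front.

```python
def select_test_projects(coverage_map, modified_projects):
    """Return test projects that changed or that cover a changed project."""
    selected = []
    for test_project, covered in coverage_map.items():
        if test_project in modified_projects or covered & modified_projects:
            selected.append(test_project)
    return selected


# Hypothetical solution layout: test project -> production projects it covers.
coverage_map = {
    "Orders.Tests": {"Orders", "Common"},
    "Billing.Tests": {"Billing", "Common"},
    "Reports.Tests": {"Reports"},
}

# A change in one production project selects only the tests covering it.
print(select_test_projects(coverage_map, {"Orders"}))
# A change in a shared project selects every test project that covers it.
print(select_test_projects(coverage_map, {"Common"}))
# A change in a test project itself selects that test project.
print(select_test_projects(coverage_map, {"Reports.Tests"}))
```

Because each selected test project appears once in the result, the same list could also drive a single initial test run per test project instead of one per covered production project.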
@saranyanandagopal I assume you also run your unit tests separately for your pull request builds? We could add an option to skip the initial testrun, leaving the consequence of invalid results with the user. This would save you the additional time of the initial testrun however the coverage run still needs to be performed. @dupdob I thought so too but it looks like we currently limit cpu to 1 during initial testrun, only test discovery is run with multiple cores: These are the runsettings for the initial testrun:
I think this is a bug right? |
No, it is not a bug. It is mandatory to get the proper default test run time. |
Of course, good points. |
Hi @rouke-broersma I understand the concern behind parallelization. We do run unit tests separately in pull request builds. I can test the time it takes when the duplicate initial test runs are removed. |
Hey @rouke-broersma @dupdob hope you are doing well! Following up to see if there are any updates on the implementation of the suggested improvements.
Is your feature request related to a problem? Please describe.
We can’t run Stryker .NET in pull requests due to the long execution time. Developers don’t get immediate feedback on unit test quality, and late feedback (after nightly runs) is often ignored.
Describe the solution you'd like
Allow setting --testcasefilter directly via the command line. This would let us use git diff to dynamically select test files and run Stryker, providing immediate feedback in pull request builds.
Describe alternatives you've considered
Using --since is impractical because it requires running all unit tests initially, which is too slow for pull request builds in a large project with ~4000 tests.
Additional context
Even though the concurrency is set, I don't think it is being used for the initial test run, which adds to the longer execution time.