ICSE 2021
Mon 17 May - Sat 5 June 2021
Thu 27 May 2021 15:25 - 15:45 at Blended Sessions Room 4 - 3.3.4. Testing: Flaky Tests Chair(s): José Miguel Rojas
Fri 28 May 2021 03:25 - 03:45 at Blended Sessions Room 4 - 3.3.4. Testing: Flaky Tests

When developers make changes to their code, they typically run regression tests to detect whether their recent changes (re)introduce any bugs. However, many tests are flaky: their outcomes can change non-deterministically, failing without any apparent cause. Flaky tests are a significant nuisance in the development process, since they make it more difficult for developers to trust the outcome of their tests; hence, it is important to know which tests are flaky. The traditional approach to identifying flaky tests is to rerun them multiple times: if a test is observed both passing and failing on the same code, it is definitely flaky. We conducted a very large empirical study looking for flaky tests by rerunning the test suites of 24 projects 10,000 times each, and found that even with this many reruns, some previously identified flaky tests were still not detected. We propose FlakeFlagger, a novel approach that collects a set of features describing the behavior of each test and then predicts which tests are likely to be flaky based on similar behavioral features. We found that FlakeFlagger correctly labeled as flaky at least as many tests as a state-of-the-art flaky test classifier, but reported far fewer false positives. This lower false positive rate translates directly into saved time for researchers and developers who use the classification result to guide more expensive flaky test detection processes. Evaluated on our dataset of 23 projects with flaky tests, FlakeFlagger outperformed the prior approach (by F1 score) on 16 projects and tied on 4 projects. Our results indicate that this approach can be effective for identifying likely flaky tests prior to running time-consuming flaky test detectors.
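To make the prediction step concrete, below is a minimal sketch of feature-based flakiness prediction. It is not the authors' FlakeFlagger implementation: it assumes scikit-learn as the classifier library, and the feature names (execution time, lines covered, network/filesystem use, assertion count) are hypothetical stand-ins for the behavioral features the abstract describes collecting.

# Minimal sketch of feature-based flaky-test prediction, assuming
# scikit-learn. Not the authors' actual FlakeFlagger pipeline; the
# feature columns below are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Each row describes one test's observed behavior; each label records
# whether reruns ever showed that test both passing and failing (1 = flaky).
# Hypothetical columns: [execution_time_s, lines_covered, uses_network,
#                        uses_filesystem, num_assertions]
X = np.array([
    [0.12,  45, 0, 0,  3],
    [9.80, 310, 1, 1, 12],
    [0.05,  12, 0, 0,  1],
    [4.50, 210, 1, 0,  7],
])
y = np.array([0, 1, 0, 1])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=2))  # rough sanity check on toy data

clf.fit(X, y)
# Rank unseen tests by predicted flakiness so that expensive rerun-based
# detection can be focused on the most suspicious tests first.
new_tests = np.array([[7.2, 280, 1, 1, 9]])
print(clf.predict_proba(new_tests)[:, 1])

In practice, a classifier like this is used to prioritize, not replace, rerun-based detection: tests with a high predicted flakiness score get rerun first, which is where the paper's reported false-positive savings matter.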

Thu 27 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

15:05 - 16:05
3.3.4. Testing: Flaky Tests (Technical Track / Journal-First Papers) at Blended Sessions Room 4
Chair(s): José Miguel Rojas University of Leicester, UK
15:05
20m
Paper
Quantifying, Characterizing, and Mitigating Flakily Covered Program Elements
Journal-First Papers
Pre-print | Media Attached
15:25
20m
Paper
FlakeFlagger: Predicting Flakiness Without Rerunning Tests (Artifact Reusable; Artifact Available)
Technical Track
Abdulrahman Alshammari (George Mason University), Christopher Morris (Carnegie Mellon University), Michael Hilton (Carnegie Mellon University, USA), Jonathan Bell (Northeastern University)
Pre-print | Media Attached
15:45
20m
Paper
An Empirical Analysis of UI-based Flaky Tests (Artifact Reusable; Artifact Available)
Technical Track
Alan Romano (University at Buffalo), Zihe Song (University of Texas at Dallas), Sampath Grandhi (University of Texas at Dallas), Wei Yang (University of Texas at Dallas), Weihang Wang (University at Buffalo, SUNY)
Pre-print | Media Attached

Fri 28 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

03:05 - 04:05
3.3.4. Testing: Flaky Tests (Technical Track / Journal-First Papers) at Blended Sessions Room 4
03:05
20m
Paper
Quantifying, Characterizing, and Mitigating Flakily Covered Program Elements
Journal-First Papers
Pre-print | Media Attached
03:25
20m
Paper
FlakeFlagger: Predicting Flakiness Without Rerunning Tests (Artifact Reusable; Artifact Available)
Technical Track
Abdulrahman Alshammari (George Mason University), Christopher Morris (Carnegie Mellon University), Michael Hilton (Carnegie Mellon University, USA), Jonathan Bell (Northeastern University)
Pre-print | Media Attached
03:45
20m
Paper
An Empirical Analysis of UI-based Flaky Tests (Artifact Reusable; Artifact Available)
Technical Track
Alan Romano (University at Buffalo), Zihe Song (University of Texas at Dallas), Sampath Grandhi (University of Texas at Dallas), Wei Yang (University of Texas at Dallas), Weihang Wang (University at Buffalo, SUNY)
Pre-print | Media Attached