FlakeFlagger: Predicting Flakiness Without Rerunning Tests (ICSE 2021 - Technical Track)

Who

Abdulrahman Alshammari, Christopher Morris, Michael Hilton, Jonathan Bell

Track

ICSE 2021 Technical Track

When

Thu 27 May 2021 15:25 - 15:45 at Blended Sessions Room 4 - 3.3.4. Testing: Flaky Tests Chair(s): José Miguel Rojas
Fri 28 May 2021 03:25 - 03:45 at Blended Sessions Room 4 - 3.3.4. Testing: Flaky Tests

Abstract

When developers make changes to their code, they typically run regression tests to detect if their recent changes (re)introduce any bugs. However, many tests are flaky, and their outcomes can change non-deterministically, failing without apparent cause. Flaky tests are a significant nuisance in the development process, since they make it more difficult for developers to trust the outcome of their tests, and hence, it is important to know which tests are flaky. The traditional approach to identify flaky tests is to rerun them multiple times: if a test is observed both passing and failing on the same code, it is definitely flaky. We conducted a very large empirical study looking for flaky tests by rerunning the test suites of 24 projects 10,000 times each, and found that even with this many reruns, some previously identified flaky tests were still not detected. We propose FlakeFlagger, a novel approach that collects a set of features describing the behavior of each test, and then predicts tests that are likely to be flaky based on similar behavioral features. We found that FlakeFlagger correctly labeled as flaky at least as many tests as a state-of-the-art flaky test classifier, but that FlakeFlagger reported far fewer false positives. This lower false positive rate translates directly to saved time for researchers and developers who use the classification result to guide more expensive flaky test detection processes. Evaluated on our dataset of 23 projects with flaky tests, FlakeFlagger outperformed the prior approach (by F1 score) on 16 projects and tied on 4 projects. Our results indicate that this approach can be effective for identifying likely flaky tests prior to running time-consuming flaky test detectors.
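The abstract defines two ideas that are easy to state in code: the rerun criterion (a test seen both passing and failing on the same code is flaky) and the F1 score used to compare classifiers. The following is a minimal sketch under stated assumptions, not FlakeFlagger's implementation. The function names, the `(test_name, passed)` input format, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def label_flaky_by_rerun(run_results):
    """Return the tests observed both passing and failing.

    run_results: iterable of (test_name, passed) pairs, all assumed to come
    from reruns of the same code version (hypothetical input format).
    """
    outcomes = defaultdict(set)
    for test_name, passed in run_results:
        outcomes[test_name].add(bool(passed))
    # A test is flaky under the rerun criterion only if both outcomes were seen.
    return {name for name, seen in outcomes.items() if seen == {True, False}}


def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 for a set of predicted flaky tests."""
    true_pos = len(predicted & actual)
    false_pos = len(predicted - actual)
    false_neg = len(actual - predicted)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative data only: testA flips between outcomes, testB and testC do not.
runs = [
    ("testA", True), ("testA", False),
    ("testB", True), ("testB", True),
    ("testC", False), ("testC", False),
]
flaky_by_rerun = label_flaky_by_rerun(runs)

# A hypothetical classifier that flags testA correctly and testB incorrectly.
predicted_flaky = {"testA", "testB"}
precision, recall, f1 = precision_recall_f1(predicted_flaky, flaky_by_rerun)
```

In this toy example the false positive on `testB` lowers precision while recall stays perfect, which is the trade-off the abstract highlights: fewer false positives means less time spent on expensive follow-up detection.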

Link to Preprint

https://www.jonbell.net/preprint/icse21-flakeflagger.pdf

Abdulrahman Alshammari, George Mason University, United States
Christopher Morris, Carnegie Mellon University, United States
Michael Hilton, Carnegie Mellon University, United States
Jonathan Bell, Northeastern University, United States

Session Program

Thu 27 May
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

15:05 - 16:05	3.3.4. Testing: Flaky Tests (Technical Track / Journal-First Papers) at Blended Sessions Room 4, +12h. Chair(s): José Miguel Rojas (University of Leicester, UK)

15:05 20m Paper		Quantifying, Characterizing, and Mitigating Flakily Covered Program Elements (Journal-First Papers). Shivashree Vysali Vaidhyam Subramanian (McGill University), Shane McIntosh, Bram Adams (Queen's University)
15:25 20m Paper		FlakeFlagger: Predicting Flakiness Without Rerunning Tests (Technical Track). Abdulrahman Alshammari (George Mason University), Christopher Morris (Carnegie Mellon University), Michael Hilton (Carnegie Mellon University, USA), Jonathan Bell (Northeastern University)
15:45 20m Paper		An Empirical Analysis of UI-based Flaky Tests (Technical Track). Alan Romano (University at Buffalo), Zihe Song (University of Texas at Dallas), Sampath Grandhi (University of Texas at Dallas), Wei Yang (University of Texas at Dallas), Weihang Wang (University at Buffalo, SUNY)

Fri 28 May

03:05 - 04:05	3.3.4. Testing: Flaky Tests (Technical Track / Journal-First Papers) at Blended Sessions Room 4

03:05 20m Paper		Quantifying, Characterizing, and Mitigating Flakily Covered Program Elements (Journal-First Papers). Shivashree Vysali Vaidhyam Subramanian (McGill University), Shane McIntosh, Bram Adams (Queen's University)
03:25 20m Paper		FlakeFlagger: Predicting Flakiness Without Rerunning Tests (Technical Track). Abdulrahman Alshammari (George Mason University), Christopher Morris (Carnegie Mellon University), Michael Hilton (Carnegie Mellon University, USA), Jonathan Bell (Northeastern University)
03:45 20m Paper		An Empirical Analysis of UI-based Flaky Tests (Technical Track). Alan Romano (University at Buffalo), Zihe Song (University of Texas at Dallas), Sampath Grandhi (University of Texas at Dallas), Wei Yang (University of Texas at Dallas), Weihang Wang (University at Buffalo, SUNY)