Fault Localization with Code Coverage Representation Learning (ICSE 2021 - Technical Track)

Who

Yi Li, Shaohua Wang, Tien N. Nguyen

Track

ICSE 2021 Technical Track

When

Thu 27 May 2021 20:50 - 21:10 at Blended Sessions Room 3 - 3.6.3. Fault Localization #2 Chair(s): Davide Falessi
Fri 28 May 2021 08:50 - 09:10 at Blended Sessions Room 3 - 3.6.3. Fault Localization #2

Abstract

In this paper, we propose DEEPRL4FL, a deep learning fault localization (FL) approach that locates the buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DEEPRL4FL does so via novel code coverage representation learning (RL) and data dependencies RL for program statements. These two types of RL on the dynamic information in a code coverage matrix are also combined with code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies) and, at the same time, examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For the code coverage information, DEEPRL4FL first orders the test cases and marks error-exhibiting code statements, so that a model can more easily recognize patterns that discriminate between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is assessed by taking into account its data dependencies on other statements in execution and data flows, in addition to the statement itself. Finally, the vector representations for the code coverage matrix, the data dependencies among statements, and the source code are combined and used as the input of a classifier built from a Convolutional Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DEEPRL4FL outperforms the baseline models and localizes 245 bugs from Defects4J. It improves the top-1 results of the baselines by 15.0% to 206.3%.
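
To make the idea of an ordered code coverage matrix concrete, here is a minimal Python sketch. It is not the DEEPRL4FL algorithm: the abstract does not give the ordering rule, so the heuristic below (failing tests first, then passing tests ranked by how many statements they share with the failing tests) is an assumption chosen only for illustration. The function name `order_tests` and the toy matrix are likewise hypothetical.

```python
def order_tests(coverage, failed):
    """Return test (row) indices in an illustrative order.

    coverage[t][s] is 1 if test t executes statement s, else 0.
    failed[t] is True if test t fails.
    Assumed heuristic (not taken from the paper): failing tests come first,
    then passing tests sorted by overlap with failing-test coverage.
    """
    n_tests = len(coverage)
    n_stmts = len(coverage[0]) if coverage else 0

    # Statements executed by at least one failing test.
    fail_cov = [
        any(coverage[t][s] for t in range(n_tests) if failed[t])
        for s in range(n_stmts)
    ]

    def overlap(t):
        # Number of statements test t shares with the failing tests.
        return sum(1 for s in range(n_stmts) if coverage[t][s] and fail_cov[s])

    failing = [t for t in range(n_tests) if failed[t]]
    passing = [t for t in range(n_tests) if not failed[t]]
    passing.sort(key=overlap, reverse=True)  # stable: ties keep input order
    return failing + passing


# Toy coverage matrix: 4 tests (rows) x 4 statements (columns).
coverage = [
    [1, 1, 0, 0],  # test 0, passes
    [1, 0, 1, 1],  # test 1, fails
    [0, 0, 1, 0],  # test 2, passes
    [1, 0, 1, 0],  # test 3, passes
]
failed = [False, True, False, False]

order = order_tests(coverage, failed)
ordered_matrix = [coverage[t] for t in order]
```

With rows reordered this way, tests with similar coverage sit next to each other, which is the kind of spatial regularity an image-style classifier could plausibly exploit; the actual ordering and marking steps are described in the preprint.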

Link to Preprint

https://arxiv.org/pdf/2103.00270.pdf

Yi Li

New Jersey Institute of Technology

Shaohua Wang

New Jersey Institute of Technology

Tien N. Nguyen

University of Texas at Dallas
