Common Statement Kind Changes to Inform Automatic Program Repair
The search space for automatic program repair approaches is vast and the search for mechanisms to help restrict this search are increasing. We make a granular analysis based on statement kinds to find which statements are more likely to be modified than others when fixing an error. We construct a corpus for analysis by delimiting debugging regions in the provided dataset and recursively analyze the differences between the Simplified Syntax Trees associated with EditEvent’s. We build a distribution of statement kinds with their corresponding likelihood of being modified and we validate the usage of this distribution to guide the statement selection. We then build association rules with different confidence thresholds to describe statement kinds commonly modified together for multi-edit patch creation. Finally we evaluate association rule coverage over a held out test set and find that when using a 95% confidence threshold we can create less and more accurate rules that fully cover 93.8% of the testing instances.