Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models
ACM SIGSOFT Distinguished Paper
Technical Track
Wed 26 May 2021 03:40 - 04:00 at Blended Sessions Room 2 - 1.3.2. Deep Neural Networks: Supporting SE Tasks #1
Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been limited by the scarcity of labeled data and by their efficiency at runtime. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we adopt a three-step training strategy that enables trace models to transfer knowledge from a closely related software engineering challenge with a rich dataset, producing trace links with much higher accuracy than previously achieved. We then apply the T-BERT framework to recover links between issues and commits in open-source projects, and comparatively evaluate the accuracy and efficiency of three BERT architectures within the framework. Experimental results show that the Single-BERT architecture generated the most accurate links, while the Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework far outperform classical IR trace models and achieve impressive tracing accuracy on real-world open-source projects.
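The accuracy/efficiency trade-off between the two architectures can be illustrated with a minimal sketch. This is not the paper's implementation: the toy bag-of-words "embeddings" stand in for BERT encoders, and all function names here are illustrative. The key structural difference survives the simplification: a Siamese (bi-encoder) model embeds each artifact independently, so code-side embeddings can be computed once and cached, while a single (cross-encoder) model must process every issue/code pair jointly.

```python
# Illustrative sketch only: toy encoders stand in for the BERT models
# described in the paper; names are hypothetical.
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding' -- a stand-in for a BERT encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def siamese_score(nl_embedding, code_embedding):
    # Siamese / bi-encoder style: artifacts are encoded independently,
    # so code embeddings can be precomputed and reused across queries
    # (the source of the runtime savings the abstract reports).
    return cosine(nl_embedding, code_embedding)

def single_score(nl_text, code_text):
    # Single / cross-encoder style: the pair is scored jointly
    # (here, toy token overlap), so every pair requires a fresh pass.
    a, b = set(nl_text.lower().split()), set(code_text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

if __name__ == "__main__":
    issue = "fix null pointer dereference in the parser"
    snippets = [
        "def parse(s): guard against null pointer",
        "def render(img): draw all pixels",
    ]
    # Siamese: embed the code side once, then score cheaply per query.
    cached = [embed(s) for s in snippets]
    print([round(siamese_score(embed(issue), v), 3) for v in cached])
    # Single: one joint pass per candidate pair.
    print([round(single_score(issue, s), 3) for s in snippets])
```

In this toy ranking, both scorers place the parser snippet above the unrelated one; the practical difference is that the Siamese variant amortizes encoding cost across the whole code corpus, mirroring the efficiency result reported above.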