Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models
ACM SIGSOFT Distinguished Paper, Technical Track
Wed 26 May 2021 03:40 - 04:00 at Blended Sessions Room 2 - 1.3.2. Deep Neural Networks: Supporting SE Tasks #1
Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed deep learning trace models that link natural language artifacts, such as requirements and issue descriptions, to source code. However, the scarcity of labeled data and poor runtime efficiency have limited how effective these models can be. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we use a three-step training strategy. It lets trace models transfer knowledge from a closely related software engineering challenge that has a rich dataset, and it produces trace links with much higher accuracy than previously achieved. We then apply the T-BERT framework to recover links between issues and commits in open-source projects. We compared the accuracy and efficiency of three BERT architectures within the framework. Experimental results show that a Single-BERT architecture generated the most accurate links, while a Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework far outperform classical IR trace models and achieve high tracing accuracy on real-world OSS projects.
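A common explanation for why a Siamese architecture runs faster than a single joint architecture comes down to how often the model is invoked. A joint (Single-BERT style) model must process every candidate (issue, commit) pair together. A Siamese model encodes each artifact once and then compares the cached vectors cheaply. The sketch below illustrates only that cost difference, under loudly stated assumptions. It is not T-BERT. The model is replaced by a hypothetical token-overlap (Jaccard) scorer, and the names `ForwardPassCounter`, `single_style`, and `siamese_style` are illustrative inventions. With a real BERT, a joint pass over a concatenated pair is not equivalent to comparing independent embeddings. The toy scores agree here only because the same stand-in scorer is used in both places.

```python
def tokens(text):
    """Hypothetical stand-in 'embedding': the set of lowercase tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-set overlap, used here in place of a learned similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ForwardPassCounter:
    """Counts how many times the (hypothetical) model would be invoked."""

    def __init__(self):
        self.passes = 0


def single_style(issues, commits, counter):
    """Joint scoring: one model pass per (issue, commit) pair, N * M in total."""
    scores = {}
    for i, issue in enumerate(issues):
        for j, commit in enumerate(commits):
            counter.passes += 1  # stand-in for one pass over the concatenated pair
            scores[(i, j)] = jaccard(tokens(issue), tokens(commit))
    return scores


def siamese_style(issues, commits, counter):
    """Independent encoding: one pass per artifact (N + M), then cheap comparisons."""

    def encode(text):
        counter.passes += 1  # stand-in for one pass over a single artifact
        return tokens(text)

    issue_vecs = [encode(t) for t in issues]
    commit_vecs = [encode(t) for t in commits]
    return {
        (i, j): jaccard(a, b)
        for i, a in enumerate(issue_vecs)
        for j, b in enumerate(commit_vecs)
    }


# Illustrative (made-up) artifacts, not data from the study.
issues = [
    "fix null pointer in parser",
    "add retry to http client",
    "update docs for cli",
]
commits = [
    "parser: guard against null pointer",
    "http client retry with backoff",
    "docs: cli usage",
    "bump version",
]

single_counter = ForwardPassCounter()
siamese_counter = ForwardPassCounter()
single_scores = single_style(issues, commits, single_counter)
siamese_scores = siamese_style(issues, commits, siamese_counter)

print(single_counter.passes, siamese_counter.passes)  # prints "12 7"
```

With 3 issues and 4 commits the joint approach needs 12 model invocations and the Siamese approach needs 7. The gap widens quickly as projects grow, because the joint cost scales as N × M and the Siamese cost as N + M. The trade-off matches the results reported above: joint encoding lets the model attend across both artifacts at once, which can improve accuracy, while independent encoding allows embeddings to be precomputed and reused.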