MSR 2021
Mon 17 - Wed 19 May 2021
co-located with ICSE 2021
Wed 19 May 2021 02:01 - 02:05 at MSR Room 2, NLP session. Chair(s): Chunyang Chen

In this paper, we study the problem of part-of-speech (POS) tagging for security vulnerability descriptions (SVD). Unlike newswire articles, SVD often contains a high-level natural language description written in mixed language studded with code, domain-specific jargon, vague language, and abbreviations. Moreover, training data dedicated to security vulnerability research is not widely available. Existing neural network-based POS taggers often rely either on manually annotated training data or on applying natural language processing (NLP) techniques, and each option has a significant drawback. The former is extremely time-consuming and requires labor-intensive feature engineering and expertise. The latter is inadequate for identifying linguistically informed words specific to the SVD domain. In this paper, we propose an automatic approach to assign POS tags to tokens in SVD. Our approach uses character-level representations to automatically extract orthographic features and unsupervised word embeddings to capture meaningful syntactic and semantic regularities from SVD. The character-level representations are then concatenated with the word embeddings into a combined feature, from which the model learns to predict POS tags. To address the poor availability of annotated security vulnerability data, we apply a fine-tuning approach. We also provide public access to a POS-annotated corpus of 8M tokens, which serves as a training dataset for this domain. Our evaluation shows a significant improvement in POS tagging accuracy on SVD (17.72% to 28.22%) over current approaches.
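The central mechanism in the abstract, concatenating a character-level representation of each token with its word embedding and then predicting a tag from the combined vector, can be illustrated with a small, standard-library-only Python sketch. This is not the authors' model. The paper learns its character representations and tagger with a neural network and fine-tuning. Here, hashed character trigrams, four hand-picked orthographic flags, a toy embedding table, and a nearest-centroid classifier are hypothetical stand-ins chosen only to keep the example self-contained. All names (`char_features`, `combined_features`, `train_centroids`, `predict`) and the toy data are invented for illustration.

```python
import math
import zlib
from typing import Dict, List, Sequence, Tuple

CHAR_DIM = 16  # hypothetical size of the hashed character-trigram vector
WORD_DIM = 3   # hypothetical word-embedding size (real models use far more)


def char_features(token: str, dim: int = CHAR_DIM) -> List[float]:
    """Character-level view: hashed trigram counts plus orthographic flags.

    A stand-in for a learned character encoder. It captures surface cues
    such as the digits and hyphens in identifiers like CVE-2021-44228.
    """
    vec = [0.0] * dim
    padded = f"<{token}>"
    for i in range(len(padded) - 2):
        # zlib.crc32 is stable across runs, unlike the salted built-in hash()
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim] += 1.0
    flags = [
        float(any(c.isdigit() for c in token)),  # version numbers, CVE ids
        float("-" in token),                      # hyphenated identifiers
        float(any(c.isupper() for c in token)),   # acronyms, product names
        float("." in token or "_" in token),      # file names, code symbols
    ]
    return vec + flags


def word_features(token: str, embeddings: Dict[str, Sequence[float]],
                  dim: int = WORD_DIM) -> List[float]:
    """Word-level view: look up an embedding; unknown words get zeros."""
    return list(embeddings.get(token.lower(), [0.0] * dim))


def combined_features(token: str,
                      embeddings: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the character-level and word-level vectors."""
    return char_features(token) + word_features(token, embeddings)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(examples: Sequence[Tuple[str, str]],
                    embeddings: Dict[str, Sequence[float]]
                    ) -> Dict[str, List[float]]:
    """Average the combined feature vectors of each tag (nearest centroid)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for token, tag in examples:
        feats = combined_features(token, embeddings)
        if tag not in sums:
            sums[tag] = [0.0] * len(feats)
            counts[tag] = 0
        sums[tag] = [s + f for s, f in zip(sums[tag], feats)]
        counts[tag] += 1
    return {tag: [s / counts[tag] for s in vec] for tag, vec in sums.items()}


def predict(token: str, centroids: Dict[str, List[float]],
            embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the tag whose centroid is most similar to the token's features."""
    feats = combined_features(token, embeddings)
    return max(centroids, key=lambda tag: cosine(feats, centroids[tag]))


# Toy data, invented for illustration only.
EMBEDDINGS = {
    "allows": [0.9, 0.1, 0.0],
    "remote": [0.1, 0.8, 0.2],
    "attackers": [0.0, 0.2, 0.9],
}
TRAIN = [("CVE-2021-44228", "NNP"), ("allows", "VBZ"),
         ("remote", "JJ"), ("attackers", "NNS")]

if __name__ == "__main__":
    centroids = train_centroids(TRAIN, EMBEDDINGS)
    print(predict("CVE-2020-1472", centroids, EMBEDDINGS))
```

The `CVE-2020-1472` token is out of vocabulary, so only its character-level half carries information. That is the case the character view is meant to cover. With four training tokens and 16 hash buckets, the printed tag depends on trigram overlap and hash collisions. The sketch therefore only shows how the concatenated features feed a classifier, not how well such a tagger performs.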

Wed 19 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

02:00 - 02:50
NLP (Registered Reports / Technical Papers) at MSR Room 2
Chair(s): Chunyang Chen (Monash University)
02:01
4m
Talk
Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions
Technical Papers
Sofonias Yitagesu (Tianjin University), Xiaowang Zhang (Tianjin University), Zhiyong Feng (Tianjin University), Xiaohong Li (Tianjin University), Zhenchang Xing (Australian National University)
Pre-print
02:05
4m
Talk
Attention-based model for predicting question relatedness on Stack Overflow
Technical Papers
Jiayan Pei (South China University of Technology), Yimin Wu (South China University of Technology; Research Institute of SCUT in Yangjiang), Zishan Qin (South China University of Technology), Yao Cong (South China University of Technology), Jingtao Guan (Research Institute of SCUT in Yangjiang)
Pre-print
02:09
4m
Talk
Characterising the Knowledge about Primitive Variables in Java Code Comments
Technical Papers
Mahfouth Alghamdi (The University of Adelaide), Shinpei Hayashi (Tokyo Institute of Technology), Takashi Kobayashi (Tokyo Institute of Technology), Christoph Treude (University of Adelaide)
Pre-print
02:13
4m
Talk
Googling for Software Development: What Developers Search For and What They Find
Technical Papers
Pre-print
02:17
3m
Talk
Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews
Registered Reports
Mohammad Abdul Hadi (University of British Columbia), Fatemeh Hendijani Fard (University of British Columbia)
Pre-print
02:20
3m
Talk
Cross-status Communication and Project Outcomes in OSS Development–A Language Style Matching Perspective
Registered Reports
Yisi Han (Nanjing University), Zhendong Wang (University of California, Irvine), Yang Feng (State Key Laboratory for Novel Software Technology, Nanjing University), Zhihong Zhao (Nanjing Tech University), Yi Wang (Beijing University of Posts and Telecommunications)
Pre-print
02:23
27m
Live Q&A
Discussions and Q&A
Technical Papers


Information for Participants
Info for room MSR Room 2:

Go directly to this room on Clowdr