MSR 2023
Dates to be announced Melbourne, Australia
co-located with ICSE 2023

Mon 15 May

Displayed time zone: Hobart

09:00 - 10:30
Opening Session & Award Talks
MSR Awards / MIP Award at Meeting Room 109
Chair(s): Emad Shihab Concordia University, Bogdan Vasilescu Carnegie Mellon University
09:00
20m
Day opening
Opening Session & Award Announcements
MSR Awards
Emad Shihab Concordia University, Patanamon Thongtanunam The University of Melbourne, Bogdan Vasilescu Carnegie Mellon University
09:20
20m
Talk
MSR 2023 Foundational Contribution Award
MSR Awards
09:40
20m
Talk
MSR 2023 Ric Holt Early Career Achievement Award
MSR Awards
Li Li Beihang University
10:00
30m
Talk
MIP #1: Mining Source Code Repositories at Massive Scale Using Language Modeling
MIP Award
Miltiadis Allamanis Microsoft Research, Charles Sutton Google Research
11:00 - 11:45
Development Tools & Practices I
Registered Reports / Industry Track / Technical Papers at Meeting Room 109
Chair(s): Olga Baysal Carleton University
11:00
12m
Talk
Understanding the Time to First Response In GitHub Pull Requests
Technical Papers
Kazi Amit Hasan Queen's University, Canada, Marcos Macedo Queen's University at Kingston / Universidad de Montevideo, Yuan Tian Queens University, Kingston, Canada, Bram Adams Queen's University, Kingston, Ontario, Steven H. H. Ding Queen’s University at Kingston
Pre-print
11:12
12m
Talk
Dealing with Popularity Bias in Recommender Systems for Third-party Libraries: How far Are We?
Technical Papers
Phuong T. Nguyen University of L’Aquila, Riccardo Rubei University of L'Aquila, Juri Di Rocco University of L'Aquila, Claudio Di Sipio University of L'Aquila, Davide Di Ruscio University of L'Aquila, Massimiliano Di Penta University of Sannio, Italy
Pre-print
11:24
6m
Talk
Smart Contract Upgradeability on the Ethereum Blockchain Platform: An Exploratory Study
Registered Reports
Ilham Qasse Reykjavik University, Mohammad Hamdaqa Polytechnique Montréal, Björn Þór Jónsson Reykjavik University
11:30
6m
Talk
An Exploratory Study of Ad Hoc Parsers in Python
Registered Reports
Pre-print
11:36
6m
Talk
Improving Agile Planning for Reliable Software Delivery
Industry Track
Jirat Pasuksmit Atlassian, Fan Jiang Atlassian, Kemp Thornton Atlassian, Arik Friedman Atlassian, Natalija Fuksmane Atlassian, Isabelle Kohout Atlassian, Julian Connor Atlassian
Pre-print
11:00 - 11:45
SE for ML
Data and Tool Showcase Track / Technical Papers at Meeting Room 110
Chair(s): Sarah Nadi University of Alberta
11:00
12m
Talk
AutoML from Software Engineering Perspective: Landscapes and Challenges
Distinguished Paper Award
Technical Papers
Chao Wang Peking University, Zhenpeng Chen University College London, UK, Minghui Zhou Peking University
Pre-print
11:12
12m
Talk
Characterizing and Understanding Software Security Vulnerabilities in Machine Learning Libraries
Technical Papers
Nima Shiri Harzevili York University, Jiho Shin York University, Junjie Wang Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Song Wang York University, Nachiappan Nagappan Facebook
11:24
6m
Talk
DeepScenario: An Open Driving Scenario Dataset for Autonomous Driving System Testing
Data and Tool Showcase Track
Chengjie Lu Simula Research Laboratory and University of Oslo, Tao Yue Simula Research Laboratory, Shaukat Ali Simula Research Laboratory
Pre-print
11:30
6m
Talk
NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python
Data and Tool Showcase Track
Ratnadira Widyasari Singapore Management University, Singapore, Zhou Yang Singapore Management University, Ferdian Thung Singapore Management University, Sheng Qin Sim Singapore Management University, Singapore, Fiona Wee Singapore Management University, Singapore, Camellia Lok Singapore Management University, Singapore, Jack Phan Singapore Management University, Singapore, Haodi Qi Singapore Management University, Singapore, Constance Tan Singapore Management University, Singapore, Qijin Tay Singapore Management University, Singapore, David Lo Singapore Management University
11:36
6m
Talk
PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages
Data and Tool Showcase Track
Wenxin Jiang Purdue University, Nicholas Synovic Loyola University Chicago, Purvish Jajal Purdue University, Taylor R. Schorlemmer Purdue University, Arav Tewari Purdue University, Bhavesh Pareek Purdue University, George K. Thiruvathukal Loyola University Chicago and Argonne National Laboratory, James C. Davis Purdue University
Pre-print
11:50 - 12:35
Documentation + Q&A I
Data and Tool Showcase Track / Technical Papers at Meeting Room 109
Chair(s): Ahmad Abdellatif Concordia University
11:50
12m
Talk
Evaluating Software Documentation Quality
Technical Papers
Henry Tang University of Alberta, Sarah Nadi University of Alberta
12:02
12m
Talk
What Do Users Ask in Open-Source AI Repositories? An Empirical Study of GitHub Issues
Technical Papers
Zhou Yang Singapore Management University, Chenyu Wang Singapore Management University, Jieke Shi Singapore Management University, Thong Hoang CSIRO's Data61, Pavneet Singh Kochhar Microsoft, Qinghua Lu CSIRO’s Data61, Zhenchang Xing, David Lo Singapore Management University
12:14
12m
Talk
PICASO: Enhancing API Recommendations with Relevant Stack Overflow Posts
Technical Papers
Ivana Clairine Irsan Singapore Management University, Ting Zhang Singapore Management University, Ferdian Thung Singapore Management University, Kisub Kim Singapore Management University, David Lo Singapore Management University
12:26
6m
Talk
GIRT-Data: Sampling GitHub Issue Report Templates
Data and Tool Showcase Track
Nafiseh Nikehgbal Sharif University of Technology, Amir Hossein Kargaran LMU Munich, Abbas Heydarnoori Bowling Green State University, Hinrich Schütze LMU Munich
Pre-print
11:50 - 12:35
11:50
6m
Talk
TypeScript's Evolution: An Analysis of Feature Adoption Over Time
Technical Papers
Joshua D. Scarsbrook The University of Queensland, Mark Utting The University of Queensland, Ryan K. L. Ko The University of Queensland
Pre-print
11:56
6m
Talk
DGMF: Fast Generation of Comparable, Updatable Dependency Graphs for Software Repositories
Data and Tool Showcase Track
Tobias Litzenberger TU Dortmund University, Johannes Düsing TU Dortmund University, Ben Hermann TU Dortmund
12:02
6m
Talk
Enabling Analysis and Reasoning on Software Systems through Knowledge Graph Representation
Data and Tool Showcase Track
Satrio Adi Rukmono, Michel Chaudron Eindhoven University of Technology, The Netherlands
12:08
6m
Talk
microSecEnD: A Dataset of Security-Enriched Dataflow Diagrams for Microservice Applications
Data and Tool Showcase Track
Simon Schneider Hamburg University of Technology, Tufan Özen Hamburg University of Technology, Michael Chen Hamburg University of Technology, Riccardo Scandariato Hamburg University of Technology
12:14
12m
Talk
Wasmizer: Curating WebAssembly-driven Projects on GitHub
Technical Papers
Alexander Nicholson University of Auckland, Quentin Stiévenart Vrije Universiteit Brussel, Arash Mazidi TU Clausthal, Mohammad Ghafari TU Clausthal
12:26
6m
Talk
Feature Toggle Usage Patterns: A Case Study on Google Chromium
Technical Papers
Md Tajmilur Rahman Gannon University
13:45 - 14:15
MIP Talk
MIP Award at Meeting Room 109
Chair(s): Bogdan Vasilescu Carnegie Mellon University
13:45
30m
Talk
MIP #2: The Impact of Tangled Code Changes
MIP Award
Kim Herzig Microsoft, Andreas Zeller CISPA Helmholtz Center for Information Security
14:20 - 15:15
Language Models
Technical Papers at Meeting Room 109
Chair(s): Patanamon Thongtanunam University of Melbourne
14:20
12m
Talk
On Codex Prompt Engineering for OCL Generation: An Empirical Study
Technical Papers
Seif Abukhalaf Polytechnique Montreal, Mohammad Hamdaqa Polytechnique Montréal, Foutse Khomh Polytechnique Montréal
14:32
12m
Talk
Cross-Domain Evaluation of a Deep Learning-Based Type Inference System
Technical Papers
Bernd Gruner DLR Institute of Data Science, Tim Sonnekalb German Aerospace Center (DLR), Thomas S. Heinze Cooperative University Gera-Eisenach, Clemens-Alexander Brust German Aerospace Center (DLR)
14:44
12m
Talk
Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study
Technical Papers
Tim van Dam Delft University of Technology, Maliheh Izadi Delft University of Technology, Arie van Deursen Delft University of Technology
Pre-print
14:56
12m
Talk
Model-Agnostic Syntactical Information for Pre-Trained Programming Language Models
Technical Papers
Iman Saberi University of British Columbia Okanagan, Fatemeh Hendijani Fard University of British Columbia
14:20 - 15:15
Understanding Defects
Registered Reports / Data and Tool Showcase Track / Technical Papers at Meeting Room 110
Chair(s): Matteo Paltenghi University of Stuttgart, Germany
14:20
12m
Talk
What Happens When We Fuzz? Investigating OSS-Fuzz Bug History
Technical Papers
Brandon Keller Rochester Institute of Technology, Benjamin S. Meyers Rochester Institute of Technology, Andrew Meneely Rochester Institute of Technology
14:32
12m
Talk
An Empirical Study of High Performance Computing (HPC) Performance Bugs
Technical Papers
Md Abul Kalam Azad University of Michigan - Dearborn, Nafees Iqbal University of Michigan - Dearborn, Foyzul Hassan University of Michigan - Dearborn, Probir Roy University of Michigan at Dearborn
Pre-print
14:44
6m
Talk
Semantically-enriched Jira Issue Tracking Data
Data and Tool Showcase Track
Themistoklis Diamantopoulos Electrical and Computer Engineering Dept, Aristotle University of Thessaloniki, Dimitrios-Nikitas Nastos Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Andreas Symeonidis Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki
Pre-print
14:50
6m
Talk
An exploratory study of bug introducing changes: what happens when bugs are introduced in open source software?
Registered Reports
Lukas Schulte University of Passau, Anamaria Mojica-Hanke University of Passau and Universidad de los Andes, Mario Linares-Vasquez Universidad de los Andes, Steffen Herbold University of Passau
14:56
6m
Talk
HasBugs - Handpicked Haskell Bugs
Data and Tool Showcase Track
Leonhard Applis Delft University of Technology, Annibale Panichella Delft University of Technology
15:02
6m
Talk
An Empirical Study on the Performance of Individual Issue Label Prediction
Technical Papers
Jueun Heo, Seonah Lee Gyeongsang National University
15:45 - 16:30
Tutorial #1
Tutorials at Meeting Room 109
Chair(s): Yuan Tian Queens University, Kingston, Canada
15:45
45m
Talk
Tutorial: Recognizing Developers' Emotions Using Non-invasive Biometrics Sensors
Tutorials
Nicole Novielli University of Bari
15:45 - 16:30
Process Automation & DevOps
Data and Tool Showcase Track / Technical Papers / Industry Track at Meeting Room 110
Chair(s): Andy Meneely Rochester Institute of Technology
15:45
12m
Talk
Investigating the Resolution of Vulnerable Dependencies with Dependabot Security Updates
Distinguished Paper Award
Technical Papers
Hamid Mohayeji Nasrabadi Eindhoven University of Technology, Andrei Agaronian Eindhoven University of Technology, Eleni Constantinou University of Cyprus, Nicola Zanone Eindhoven University of Technology, Alexander Serebrenik Eindhoven University of Technology
15:57
12m
Talk
Unveiling the Relationship Between Continuous Integration and Code Coverage
Technical Papers
José Diego Saraiva da Silva UFRN, Daniel Alencar Da Costa University of Otago, Uirá Kulesza Federal University of Rio Grande do Norte, Gustavo Sizílio Federal University of Rio Grande do Norte, José Gameleira Neto Federal University of Rio Grande do Norte, Roberta Coelho, Mei Nagappan University of Waterloo
16:09
6m
Talk
EGAD: A Moldable Tool for GitHub Action Analysis
Data and Tool Showcase Track
Pablo Valenzuela-Toledo University of Bern, Alexandre Bergel University of Chile, Timo Kehrer University of Bern, Oscar Nierstrasz University of Bern, Switzerland
16:15
6m
Talk
The Atlassian Data Lake: consolidating enriched software development data in a single, queryable system
Industry Track
Arik Friedman Atlassian, Rohan Dhupelia Atlassian, Ben Jackson Atlassian
File Attached
16:21
6m
Talk
Are We Speeding Up or Slowing Down? On Temporal Aspects of Code Velocity
Technical Papers
Gunnar Kudrjavets University of Groningen, Nachiappan Nagappan Facebook, Ayushi Rastogi University of Groningen, The Netherlands
Pre-print
16:35 - 17:20
Ethics & Energy
Technical Papers / Registered Reports at Meeting Room 109
Chair(s): Arumoy Shome Delft University of Technology
16:35
12m
Talk
Energy Consumption Estimation of API-usage in Mobile Apps via Static Analysis
Technical Papers
Abdul Ali Bangash University of Alberta, Canada, Qasim Jamal FAST National University, Kalvin Eng University of Alberta, Karim Ali University of Alberta, Abram Hindle University of Alberta
Pre-print
16:47
12m
Talk
An Exploratory Study on Energy Consumption of Dataframe Processing Libraries
Technical Papers
Shriram Shanbhag IIT Tirupati, Sridhar Chimalakonda IIT Tirupati
Pre-print
16:59
6m
Talk
Understanding issues related to personal data and data protection in open source projects on GitHub
Registered Reports
Anne Hennig Karlsruhe Institute of Technology, Lukas Schulte University of Passau, Steffen Herbold University of Passau, Oksana Kulyk IT University of Copenhagen, Denmark, Peter Mayer University of Southern Denmark
17:05
12m
Talk
Whistleblowing and Tech on Twitter
Technical Papers
Laura Duits Vrije Universiteit Amsterdam, Isha Kashyap Vrije Universiteit Amsterdam, Joey Bekkink Vrije Universiteit Amsterdam, Kousar Aslam Vrije Universiteit Amsterdam, Emitzá Guzmán Vrije Universiteit Amsterdam
16:35 - 17:20
Security
Technical Papers / Data and Tool Showcase Track at Meeting Room 110
Chair(s): Chanchal K. Roy University of Saskatchewan
16:35
12m
Talk
UNGOML: Automated Classification of unsafe Usages in Go
Technical Papers
Anna-Katharina Wickert TU Darmstadt, Germany, Clemens Damke University of Munich (LMU), Lars Baumgärtner Technische Universität Darmstadt, Eyke Hüllermeier University of Munich (LMU), Mira Mezini TU Darmstadt
Pre-print File Attached
16:47
12m
Talk
Connecting the .dotfiles: Checked-In Secret Exposure with Extra (Lateral Movement) Steps
Technical Papers
Gerhard Jungwirth TU Wien, Aakanksha Saha TU Wien, Michael Schröder TU Wien, Tobias Fiebig Max-Planck-Institut für Informatik, Martina Lindorfer TU Wien, Jürgen Cito TU Wien
Pre-print
16:59
12m
Talk
MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection
Technical Papers
Hoang H. Nguyen L3S Research Center, Leibniz Universität Hannover, Hannover, Germany, Nhat-Minh Nguyen Singapore Management University, Singapore, Chunyao Xie L3S Research Center, Leibniz Universität Hannover, Germany, Zahra Ahmadi L3S Research Center, Leibniz Universität Hannover, Hannover, Germany, Daniel Kudenko L3S Research Center, Leibniz Universität Hannover, Germany, Thanh-Nam Doan Independent Researcher, Atlanta, Georgia, USA, Lingxiao Jiang Singapore Management University
Pre-print Media Attached
17:11
6m
Talk
SecretBench: A Dataset of Software Secrets
Data and Tool Showcase Track
Setu Kumar Basak North Carolina State University, Lorenzo Neil North Carolina State University, Bradley Reaves North Carolina State University, Laurie Williams North Carolina State University
Pre-print
18:00 - 21:00
MSR Dinner at Cargo Hall, South Wharf
Technical Papers at Offsite
18:00
3h
Meeting
MSR Dinner at Cargo Hall, South Wharf
Technical Papers

Tue 16 May

09:50 - 10:30
Tutorial #2
Tutorials at Meeting Room 109
Chair(s): Alexander Serebrenik Eindhoven University of Technology
09:50
40m
Tutorial
Tutorial: Mining and Analysing Collaboration in git Repositories with git2net
Tutorials
Christoph Gote Chair of Systems Design, ETH Zurich
09:50 - 10:30
Mining Challenge
Mining Challenge at Meeting Room 110
Chair(s): Audris Mockus The University of Tennessee
09:50
6m
Talk
An Empirical Study to Investigate Collaboration Among Developers in Open Source Software (OSS)
Mining Challenge
Weijie Sun University of Alberta, Samuel Iwuchukwu University of Alberta, Abdul Ali Bangash University of Alberta, Canada, Abram Hindle University of Alberta
Pre-print
09:56
6m
Talk
Insights into Female Contributions in Open-Source Projects
Mining Challenge
Arifa Islam Champa Idaho State University, Md Fazle Rabbi Idaho State University, Minhaz F. Zibran Idaho State University, Md Rakibul Islam University of Wisconsin - Eau Claire
Pre-print
10:02
6m
Talk
The Secret Life of CVEs
Mining Challenge
Piotr Przymus Nicolaus Copernicus University in Toruń, Mikołaj Fejzer Nicolaus Copernicus University in Toruń, Jakub Narębski Nicolaus Copernicus University in Toruń, Krzysztof Stencel University of Warsaw
Pre-print
10:08
6m
Talk
Evolution of the Practice of Software Testing in Java Projects
Mining Challenge
Anisha Islam Department of Computing Science, University of Alberta, Nipuni Tharushika Hewage Department of Computing Science, University of Alberta, Abdul Ali Bangash University of Alberta, Canada, Abram Hindle University of Alberta
Pre-print
10:14
6m
Talk
Keep the Ball Rolling: Analyzing Release Cadence in GitHub Projects
Mining Challenge
Oz Kilic Carleton University, Nathaniel Bowness University of Ottawa, Olga Baysal Carleton University
Pre-print
11:00 - 11:45
Documentation + Q&A II
Technical Papers / Data and Tool Showcase Track at Meeting Room 109
Chair(s): Maram Assi Queen's University
11:00
12m
Talk
Understanding the Role of Images on Stack Overflow
Technical Papers
Dong Wang Kyushu University, Japan, Tao Xiao Nara Institute of Science and Technology, Christoph Treude University of Melbourne, Raula Gaikovina Kula Nara Institute of Science and Technology, Hideaki Hata Shinshu University, Yasutaka Kamei Kyushu University
Pre-print
11:12
12m
Talk
Do Subjectivity and Objectivity Always Agree? A Case Study with Stack Overflow Questions
Technical Papers
Saikat Mondal University of Saskatchewan, Masud Rahman Dalhousie University, Chanchal K. Roy University of Saskatchewan
Pre-print
11:24
6m
Talk
GiveMeLabeledIssues: An Open Source Issue Recommendation System
Data and Tool Showcase Track
Joseph Vargovich Northern Arizona University, Fabio Marcos De Abreu Santos Northern Arizona University, USA, Jacob Penney Northern Arizona University, Marco Gerosa Northern Arizona University, Igor Steinmacher Northern Arizona University
Pre-print Media Attached
11:30
6m
Talk
DocMine: A Software Documentation-Related Dataset of 950 GitHub Repositories
Data and Tool Showcase Track
11:36
6m
Talk
PENTACET data - 23 Million Code Comments and 500,000 SATD comments
Data and Tool Showcase Track
Murali Sridharan University of Oulu, Leevi Rantala University of Oulu, Mika Mäntylä University of Oulu
11:00 - 11:45
11:00
12m
Talk
Don't Forget the Exception! Considering Robustness Changes to Identify Design Problems
Technical Papers
Anderson Oliveira PUC-Rio, João Lucas Correia Federal University of Alagoas, Leonardo Da Silva Sousa Carnegie Mellon University, USA, Wesley Assunção Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil, Daniel Coutinho PUC-Rio, Alessandro Garcia PUC-Rio, Willian Oizumi GoTo, Caio Barbosa UFAL, Anderson Uchôa Federal University of Ceará, Juliana Alves Pereira PUC-Rio
Pre-print
11:12
12m
Talk
Pre-trained Model Based Feature Envy Detection
Technical Papers
mawenhao Wuhan University, Yaoxiang Yu Wuhan University, Xiaoming Ruan Wuhan University, Bo Cai Wuhan University
11:24
6m
Talk
CLEAN++: Code Smells Extraction for C++
Data and Tool Showcase Track
Tom Mashiach Ben Gurion University of the Negev, Israel, Bruno Sotto-Mayor Ben Gurion University of the Negev, Israel, Gal Kaminka Bar Ilan University, Israel, Meir Kalech Ben Gurion University of the Negev, Israel
11:30
6m
Talk
DACOS-A Manually Annotated Dataset of Code Smells
Data and Tool Showcase Track
Himesh Nandani Dalhousie University, Mootez Saad Dalhousie University, Tushar Sharma Dalhousie University
Pre-print File Attached
11:36
6m
Talk
What Warnings Do Engineers Really Fix? The Compiler That Cried Wolf
Industry Track
Gunnar Kudrjavets University of Groningen, Aditya Kumar Snap, Inc., Ayushi Rastogi University of Groningen, The Netherlands
Pre-print
11:50 - 12:35
Development Tools & Practices II
Data and Tool Showcase Track / Industry Track / Technical Papers / Registered Reports at Meeting Room 109
Chair(s): Banani Roy University of Saskatchewan
11:50
12m
Talk
Automating Arduino Programming: From Hardware Setups to Sample Source Code Generation
Technical Papers
Imam Nur Bani Yusuf Singapore Management University, Singapore, Diyanah Binte Abdul Jamal Singapore Management University, Lingxiao Jiang Singapore Management University
Pre-print
12:02
6m
Talk
A Dataset of Bot and Human Activities in GitHub
Data and Tool Showcase Track
Natarajan Chidambaram University of Mons, Alexandre Decan University of Mons; F.R.S.-FNRS, Tom Mens University of Mons
12:08
6m
Talk
Mining the Characteristics of Jupyter Notebooks in Data Science Projects
Registered Reports
Morakot Choetkiertikul Mahidol University, Thailand, Apirak Hoonlor Mahidol University, Chaiyong Ragkhitwetsagul Mahidol University, Thailand, Siripen Pongpaichet Mahidol University, Thanwadee Sunetnanta Mahidol University, Tasha Settewong Mahidol University, Raula Gaikovina Kula Nara Institute of Science and Technology
12:14
6m
Talk
Optimizing Duplicate Size Thresholds in IDEs
Industry Track
Konstantin Grotov JetBrains Research, Constructor University, Sergey Titov JetBrains Research, Alexandr Suhinin JetBrains, Yaroslav Golubev JetBrains Research, Timofey Bryksin JetBrains Research
Pre-print
12:20
12m
Talk
Boosting Just-in-Time Defect Prediction with Specific Features of C Programming Languages in Code Changes
Technical Papers
Chao Ni Zhejiang University, xiaodanxu College of Computer Science and Technology, Zhejiang University, Kaiwen Yang Zhejiang University, David Lo Singapore Management University
11:50 - 12:35
Software Libraries & Ecosystems
Technical Papers / Industry Track / Data and Tool Showcase Track at Meeting Room 110
Chair(s): Mehdi Keshani Delft University of Technology
11:50
12m
Talk
A Large Scale Analysis of Semantic Versioning in NPM
Technical Papers
Donald Pinckney Northeastern University, Federico Cassano Northeastern University, Arjun Guha Northeastern University and Roblox Research, Jonathan Bell Northeastern University
Pre-print
12:02
12m
Talk
Phylogenetic Analysis of Reticulate Software Evolution
Technical Papers
Akira Mori National Institute of Advanced Industrial Science and Technology, Japan, Masatomo Hashimoto Chiba Institute of Technology, Japan
12:14
6m
Talk
PyMigBench: A Benchmark for Python Library Migration
Data and Tool Showcase Track
Mohayeminul Islam University of Alberta, Ajay Jha North Dakota State University, Sarah Nadi University of Alberta, Ildar Akhmetov University of Alberta
12:20
6m
Talk
Determining Open Source Project Boundaries
Industry Track
12:26
6m
Talk
Intertwining Communities: Exploring Libraries that Cross Software Ecosystems
Technical Papers
Kanchanok Kannee Nara Institute of Science and Technology, Raula Gaikovina Kula Nara Institute of Science and Technology, Supatsara Wattanakriengkrai Nara Institute of Science and Technology, Kenichi Matsumoto Nara Institute of Science and Technology
Pre-print
13:45 - 14:30
Tutorial #3
Tutorials at Meeting Room 109
Chair(s): Alexander Serebrenik Eindhoven University of Technology
13:45
45m
Tutorial
Tutorial: Beyond the leading edge. What else is out there?
Tutorials
Tim Menzies North Carolina State University
Pre-print
13:45 - 14:30
Software Quality
Data and Tool Showcase Track / Technical Papers at Meeting Room 110
Chair(s): Tushar Sharma Dalhousie University
13:45
12m
Talk
Helm Charts for Kubernetes Applications: Evolution, Outdatedness and Security Risks
Technical Papers
Ahmed Zerouali Vrije Universiteit Brussel, Ruben Opdebeeck Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel
Pre-print
13:57
12m
Talk
Control and Data Flow in Security Smell Detection for Infrastructure as Code: Is It Worth the Effort?
Technical Papers
Ruben Opdebeeck Vrije Universiteit Brussel, Ahmed Zerouali Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel
Pre-print
14:09
12m
Talk
Method Chaining Redux: An Empirical Study of Method Chaining in Java, Kotlin, and Python
Technical Papers
Ali Keshk University of Nebraska-Lincoln, Robert Dyer University of Nebraska-Lincoln
Pre-print Media Attached
14:21
6m
Talk
Snapshot Testing Dataset
Data and Tool Showcase Track
Emily Bui Loyola University Maryland, Henrique Rocha Loyola University Maryland, USA
14:35 - 15:15
14:35
12m
Talk
Large Language Models and Simple, Stupid Bugs
Technical Papers
Kevin Jesse University of California at Davis, USA, Toufique Ahmed University of California at Davis, Prem Devanbu University of California at Davis, Emily Morgan University of California, Davis
Pre-print
14:47
12m
Talk
The ABLoTS Approach for Bug Localization: is it replicable and generalizable?
Distinguished Paper Award
Technical Papers
Feifei Niu Nanjing University, Christoph Mayr-Dorn Johannes Kepler University Linz, Wesley Assunção Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil, Liguo Huang Southern Methodist University, Jidong Ge Nanjing University, Bin Luo Nanjing University, Alexander Egyed Johannes Kepler University Linz
Pre-print File Attached
14:59
6m
Talk
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations
Data and Tool Showcase Track
Catherine Tony Hamburg University of Technology, Markus Mutas Hamburg University of Technology, Nicolás E. Díaz Ferreyra Hamburg University of Technology, Riccardo Scandariato Hamburg University of Technology
Pre-print
15:05
6m
Talk
Defectors: A Large, Diverse Python Dataset for Defect Prediction
Data and Tool Showcase Track
Parvez Mahbub Dalhousie University, Ohiduzzaman Shuvo Dalhousie University, Masud Rahman Dalhousie University
Pre-print
14:35 - 15:15
Human Aspects
Technical Papers / Data and Tool Showcase Track at Meeting Room 110
Chair(s): Alexander Serebrenik Eindhoven University of Technology
14:35
12m
Talk
A Study of Gender Discussions in Mobile Apps
Technical Papers
Mojtaba Shahin RMIT University, Australia, Mansooreh Zahedi The University of Melbourne, Hourieh Khalajzadeh Deakin University, Australia, Ali Rezaei Nasab Shiraz University
Pre-print
14:47
12m
Talk
Tell Me Who Are You Talking to and I Will Tell You What Issues Need Your Skills
Technical Papers
Fabio Marcos De Abreu Santos Northern Arizona University, USA, Jacob Penney Northern Arizona University, João Felipe Pimentel Northern Arizona University, Igor Wiese Federal University of Technology, Igor Steinmacher Northern Arizona University, Marco Gerosa Northern Arizona University
Pre-print
14:59
6m
Talk
She Elicits Requirements and He Tests: Software Engineering Gender Bias in Large Language Models
Technical Papers
Christoph Treude University of Melbourne, Hideaki Hata Shinshu University
Pre-print Media Attached
15:05
6m
Talk
GitHub OSS Governance File Dataset
Data and Tool Showcase Track
Yibo Yan University of California, Davis, Seth Frey University of California, Davis, Amy Zhang University of Washington, Seattle, Vladimir Filkov University of California at Davis, USA, Likang Yin University of California at Davis
Pre-print
15:45 - 17:30
Closing Session
Vision and Reflection / MSR Awards at Meeting Room 109
Chair(s): Patanamon Thongtanunam The University of Melbourne
15:45
20m
Talk
MSR 2023 Doctoral Research Award
MSR Awards
Eman Abdullah AlOmar Stevens Institute of Technology
16:05
30m
Talk
Open Source Software Digital Sociology: Quantifying and Understanding Large Complex Open Source Ecosystems
Vision and Reflection
Minghui Zhou Peking University
16:35
30m
Talk
Human-Centered AI for SE: Reflection and Vision
Vision and Reflection
David Lo Singapore Management University
17:05
25m
Day closing
Closing
MSR Awards
Emad Shihab Concordia University

Accepted Papers

Title
A Dataset of Bot and Human Activities in GitHub
Data and Tool Showcase Track
CLEAN++: Code Smells Extraction for C++
Data and Tool Showcase Track
DACOS-A Manually Annotated Dataset of Code Smells
Data and Tool Showcase Track
Pre-print File Attached
DeepScenario: An Open Driving Scenario Dataset for Autonomous Driving System Testing
Data and Tool Showcase Track
Pre-print
Defectors: A Large, Diverse Python Dataset for Defect Prediction
Data and Tool Showcase Track
Pre-print
DGMF: Fast Generation of Comparable, Updatable Dependency Graphs for Software Repositories
Data and Tool Showcase Track
DocMine: A Software Documentation-Related Dataset of 950 GitHub Repositories
Data and Tool Showcase Track
EGAD: A Moldable Tool for GitHub Action Analysis
Data and Tool Showcase Track
Enabling Analysis and Reasoning on Software Systems through Knowledge Graph Representation
Data and Tool Showcase Track
GIRT-Data: Sampling GitHub Issue Report Templates
Data and Tool Showcase Track
Pre-print
GitHub OSS Governance File Dataset
Data and Tool Showcase Track
Pre-print
GiveMeLabeledIssues: An Open Source Issue Recommendation System
Data and Tool Showcase Track
Pre-print Media Attached
HasBugs - Handpicked Haskell Bugs
Data and Tool Showcase Track
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations
Data and Tool Showcase Track
Pre-print
microSecEnD: A Dataset of Security-Enriched Dataflow Diagrams for Microservice Applications
Data and Tool Showcase Track
NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python
Data and Tool Showcase Track
PENTACET data - 23 Million Code Comments and 500,000 SATD comments
Data and Tool Showcase Track
PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages
Data and Tool Showcase Track
Pre-print
PyMigBench: A Benchmark for Python Library Migration
Data and Tool Showcase Track
SecretBench: A Dataset of Software Secrets
Data and Tool Showcase Track
Pre-print
Semantically-enriched Jira Issue Tracking Data
Data and Tool Showcase Track
Pre-print
Snapshot Testing Dataset
Data and Tool Showcase Track

Call for Papers

The MSR Data and Tools Showcase Track aims to actively promote and recognize the creation of reusable datasets and tools that are designed and built not only for a specific research project, but for the MSR community as a whole. These datasets and tools should enable other practitioners and researchers to jumpstart their research efforts, and should also allow earlier work to be reproduced. The MSR Data and Tools Showcase papers can be descriptions of datasets or tools built by the authors that can be used by other practitioners or researchers, and/or descriptions of the use of tools built by others to obtain specific research results.

The MSR’23 Data and Tools Showcase Track will accept two types of submissions: (1) data showcase papers and (2) reusable tool showcase papers.

  1. Data showcase submissions are expected to include:

    • a description of the data source,
    • a description of the methodology used to gather the data (including provenance and the tool used to create/generate/gather the data, if any),
    • a description of the storage mechanism, including a schema if applicable,
    • if the data has been used by the authors or others, a description of how this was done including references to previously published papers,
    • a description of the originality of the dataset (that is, even if the dataset has been used in a published paper, its complete description must be unpublished) and similar existing datasets (if any),
    • ideas for future research questions that could be answered using the dataset,
    • ideas for further improvements that could be made to the dataset, and
    • any limitations and/or challenges in creating or using the dataset.

  2. Reusable Tool showcase submissions are expected to include:

    • a description of the tool, which includes the background, motivation, novelty, overall architecture, detailed design, and preliminary evaluation of the tool, as well as the link to download or access the tool,
    • a description of the design of the tool, and how to use the tool in practice,
    • clear installation instructions and an example dataset that allow the reviewers to run the tool,
    • if the tool has been used by the authors or others, a description of how the tool was used, including references to previously published papers,
    • ideas for future reusability of the tool, and
    • any limitations of using the tool.

The dataset or tool should be made available at the time of submission of the paper for review but will be considered confidential until publication of the paper. The dataset or tool should include detailed instructions about how to set up the environment (e.g., requirements.txt), how to use the dataset or tool (e.g., how to import the data or how to access the data once it has been imported, how to use the tool with a running example).

At a minimum, upon publication of the paper, the authors should archive the data or tool on a persistent repository that can provide a digital object identifier (DOI) such as zenodo.org, figshare.com, Archive.org, or institutional repositories. In addition, the DOI-based citation of the dataset or the tool should be included in the camera-ready version of the paper. GitHub provides an easy way to make source code citable (with third-party tools and with a CITATION file).

Data and Tools showcase submissions are not:

  • empirical studies, or
  • datasets that are based on poorly explained or untrustworthy heuristics for data collection, or results of trivial application of generic tools.

If custom tools have been used to create the dataset, we expect the paper to be accompanied by the source code of the tools, along with clear documentation on how to run the tools to recreate the dataset. The tools should be open source, accompanied by an appropriate license; the source code should be citable, i.e., refer to a specific release and have a DOI. If you cannot provide the source code or the source code clause is not applicable (e.g., because the dataset consists of qualitative data), please provide a short explanation of why this is not possible.

Evaluation Criteria

The Review Criteria for the Data/Tool Showcase submissions are as follows:

  • value, usefulness, and reusability of the datasets or tools.
  • quality of the presentation.
  • clarity of relation with related work and its relevance to mining software repositories.
  • availability of the datasets or tools.

Important Dates

  • Paper Deadline: Thursday 26th January 2023
  • Author Notification: Tuesday 7th March 2023
  • Camera Ready Deadline: Thursday 16th March 2023

Submission

Submit your paper (maximum 4 pages, plus 1 additional page of references) via the HotCRP submission site: https://msr2023-data-tool.hotcrp.com/.

Submitted papers will undergo single-anonymous peer review. We opt for single-anonymous peer review (as opposed to the double-anonymous peer review of the main track) because of the requirement above to describe how the data has been used in previous studies, including bibliographic references to those studies. Such a reference is likely to disclose the authors’ identity.

To make research datasets and research software accessible and citable, we further encourage authors to attend to the FAIR rules, i.e., data should be: Findable, Accessible, Interoperable, and Reusable.

Submissions must conform to the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options).
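
For LaTeX users, the formatting requirement above could translate into a skeleton along the following lines. This is only a minimal sketch: the title, author block, abstract, and section text are placeholders invented for illustration, and the official IEEE template remains the authoritative reference.

```latex
% Minimal sketch of a submission skeleton. Only the \documentclass line
% comes from the call for papers; everything else is a placeholder.
\documentclass[10pt,conference]{IEEEtran} % no compsoc or compsocconf option

\begin{document}

\title{Placeholder Title of the Showcase Paper}

% Review is single-anonymous, so author names appear in the submission.
\author{\IEEEauthorblockN{Author Name}
\IEEEauthorblockA{Placeholder Affiliation\\
author@example.org}}

\maketitle

\begin{abstract}
Placeholder abstract.
\end{abstract}

\section{Introduction}
Placeholder text.

\end{document}
```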

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship. Please read the ACM Policy on Plagiarism, Misrepresentation, and Falsification and the IEEE - Introduction to the Guidelines for Handling Plagiarism Complaints before submitting.

Upon notification of acceptance, all authors of accepted papers will be asked to complete a copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to register and present the results at the MSR 2023 conference. All accepted contributions will be published in the conference electronic proceedings.