A Ground-truth Dataset and Classification Model for Detecting Bots in GitHub Issue and PR Comments (BotSE 2021)

Who

Mehdi Golzadeh, Alexandre Decan, Damien Legay, Tom Mens

Track

BotSE 2021

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

When

Fri 4 Jun 2021 16:15 - 16:30 at BotSE Room - Bots Helping Software Development Chair(s): Stefan Wagner

Abstract

Bots are frequently used in GitHub repositories to automate repetitive activities that are part of the distributed software development process. They communicate with human actors through comments. While detecting their presence is important for many reasons, no large and representative ground-truth dataset is available, nor are there classification models to detect and validate bots on the basis of such a dataset. This paper proposes a ground-truth dataset, based on a manual analysis with high interrater agreement, of pull request and issue comments from 5,000 distinct GitHub accounts, of which 527 have been identified as bots. Using this dataset, we propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns. We obtained a very high weighted average precision, recall and F1-score of 0.98 on a test set containing 40% of the data. We integrated the classification model into an open source command-line tool to allow practitioners to detect which accounts in a given GitHub repository actually correspond to bots.
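As a rough illustration of the four features named in the abstract, the sketch below computes them for a single account. It is not the authors' implementation. Two choices here are assumptions made only for the example: comments are grouped into a "pattern" when they are identical after lowercasing and whitespace normalisation, and "inequality" is measured with a Gini coefficient over the pattern sizes. The function names `gini` and `account_features` are hypothetical.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of non-negative counts (0.0 means perfectly equal)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on sorted values: sum_i (2i - n - 1) * x_i / (n * total)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


def account_features(comments):
    """Compute illustrative per-account features from a list of comment strings."""
    empty = sum(1 for c in comments if not c.strip())
    non_empty = [c for c in comments if c.strip()]
    # Assumption: a "pattern" is a set of comments that are identical after
    # lowercasing and collapsing whitespace (a stand-in for real clustering).
    patterns = Counter(" ".join(c.lower().split()) for c in non_empty)
    return {
        "empty_comments": empty,
        "non_empty_comments": len(non_empty),
        "patterns": len(patterns),
        "inequality": gini(list(patterns.values())),
    }


if __name__ == "__main__":
    sample = ["LGTM", "lgtm ", "", "Build  failed", "LGTM", "Thanks for the fix!"]
    print(account_features(sample))
```

Intuitively, an account that posts many comments falling into very few patterns looks more bot-like than one whose comments are all different. A trained classifier would take such feature values as input rather than apply a fixed rule.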

Mehdi Golzadeh

Software Engineering lab, University of Mons

Belgium

Alexandre Decan

University of Mons

Belgium

Damien Legay

University of Mons

Tom Mens

University of Mons

Belgium

Session Program

Fri 4 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

16:15 - 17:35	Bots Helping Software Development. BotSE 2021 at BotSE Room. Chair(s): Stefan Wagner (University of Stuttgart)

16:15 15m Paper	A Ground-truth Dataset and Classification Model for Detecting Bots in GitHub Issue and PR Comments (Journal-first Presentation). BotSE 2021. Mehdi Golzadeh (Software Engineering lab, University of Mons), Alexandre Decan (University of Mons), Damien Legay (University of Mons), Tom Mens (University of Mons)
16:30 15m Paper	SAW-BOT: Proposing Fixes for Static Analysis Warnings with GitHub Suggestions. BotSE 2021. Dragos Serban (Eindhoven University of Technology), Bart Golsteijn (Philips), Ralph Holdorp (Philips), Alexander Serebrenik (Eindhoven University of Technology)
16:45 15m Paper	Identifying bot activity in GitHub pull request and issue comments. BotSE 2021. Mehdi Golzadeh (Software Engineering lab, University of Mons), Alexandre Decan (University of Mons), Eleni Constantinou (Eindhoven University of Technology), Tom Mens (University of Mons)
17:00 15m Paper	Designing a Bot for Efficient Distribution of Service Requests. BotSE 2021. Arkadip Basu (Walmart Global Tech), Kunal Banerjee (Walmart Global Tech)
17:15 20m Live Q&A	Open discussion. BotSE 2021