Software Quality and Bug Prediction with Machine Learning
Schedule and location
Kampusklubi Tapahtuma-areena (Event Arena https://sykoy.fi/kampusklubi/), Korkeakoulunkatu 7, Tampere (5th floor)
Registration
Speakers
Senior Research Associate Fabio Palomba, University of Zurich, Switzerland.
Postdoctoral Researcher Valentina Lenarduzzi, Tampere University, Finland.
Organizers
Assistant Professor Davide Taibi, Tampere University
Organization
The seminar consists of three main steps:
1) Pre-assignment. Before the seminar, students will receive a short paper describing the challenge they need to work on and the dataset containing all the information.
Students must familiarize themselves with the dataset and the challenge. Therefore, each student should:
a. Study the paper received, which includes the most recent database layout and links to the online and download versions of the dataset.
b. Create a new issue (a link to the issue tracker will be provided upon acceptance to the seminar) if they have problems with the dataset or want to suggest ideas for improvements.
c. Propose at least two main research questions they are interested in investigating and submit them to us two days before the beginning of the event.
2) INFORTE SEMINAR (10-12 June)
3) Post Seminar Challenge (Assignment)
To answer the research questions proposed and presented during the seminar, students must report their findings in a four-page challenge paper (see information below), submitted before July 15th, 2019.
The challenge paper should describe the results of your work by providing an introduction to the problem you address and why it is worth studying, the version of the dataset you used, the approach and tools you used, your results and their implications, and conclusions. Make sure your report highlights the contributions and the importance of your work.
Challenge papers must not exceed 6 pages (LNCS format) plus 1 additional page only with references and must conform to the CEUR-WS 2019 format and submission guidelines. Each submission will be reviewed by at least three members of the program committee.
Students can decide whether to submit their assignment as:
- A report to be evaluated by the seminar speakers (min 3 pages)
- A paper to be peer-reviewed and, if accepted, published in the workshop proceedings series (CEUR-WS [Jufo 1])
Papers will be peer-reviewed by at least three experts.
The Challenge
The challenge is about mining SonarQube Technical Debt, a dataset providing the version history of technical debt issues in more than 30 open source software projects at the commit level, including information on commits, faults, technical issues, and more. Analyses can be based on this dataset alone or expanded to include data from other resources, such as GitHub data or any other source of information. The overall goal is to study the evolution and maintenance of the projects in the dataset. Questions that are, to the best of our knowledge, not yet sufficiently answered include:
· How are the projects maintained?
· How are faults and technical debt related?
· How can we detect buggy commits?
· How do different technical debt issues co-evolve?
· Does the evolution of technical debt issues follow patterns?
· Do these patterns differ between projects?
These are just some of the questions that could be answered using the provided dataset. We encourage participants to adapt the above questions or formulate their own research questions about software evolution.
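As a starting point for a question such as "How are faults and technical debt related?", the following minimal sketch compares the share of fault-inducing commits among commits that introduced technical debt issues and commits that did not. The records and the field names (`new_td_issues`, `fault_inducing`) are invented for illustration and are assumptions, not the actual layout of the dataset; the pre-assignment paper describes the real database layout.

```python
from collections import defaultdict

# Toy records invented for illustration only. The field names are
# hypothetical and do not reflect the real dataset schema.
COMMITS = [
    {"commit": "a1", "new_td_issues": 0, "fault_inducing": False},
    {"commit": "b2", "new_td_issues": 3, "fault_inducing": True},
    {"commit": "c3", "new_td_issues": 1, "fault_inducing": False},
    {"commit": "d4", "new_td_issues": 0, "fault_inducing": False},
    {"commit": "e5", "new_td_issues": 2, "fault_inducing": True},
    {"commit": "f6", "new_td_issues": 0, "fault_inducing": True},
]


def fault_rate_by_debt(commits):
    """Return the share of fault-inducing commits, split by whether the
    commit introduced at least one technical debt issue."""
    totals = defaultdict(int)
    faulty = defaultdict(int)
    for c in commits:
        group = "with_new_td" if c["new_td_issues"] > 0 else "without_new_td"
        totals[group] += 1
        faulty[group] += int(c["fault_inducing"])
    return {group: faulty[group] / totals[group] for group in totals}


if __name__ == "__main__":
    for group, rate in sorted(fault_rate_by_debt(COMMITS).items()):
        print(f"{group}: {rate:.2f}")
```

A difference between the two rates on toy data says nothing about the real projects; on the actual dataset, such a comparison would need to be made per project and backed by an appropriate statistical test.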
Detailed Program
Day 1 9:30-17:30
Speaker: Fabio Palomba.
Title: Software Design 101: Improving the Design of Existing Code, Tests, and Communities
Abstract: In 1999, Martin Fowler introduced the concept of code smells, describing them as symptoms of poor implementation choices that may worsen source code design and induce software faults. Besides keeping source code design under control, the continuous evolution of software systems and the technologies associated with them - from Continuous Integration to DevOps - has put tests and people in the spotlight: both are now required, more than ever, to work in synergy with production code for the successful evolution of software systems. In this talk, I will overview the state of the art in empirical software engineering and mining software repository methods that allow (i) the investigation and control of source code, test code, and development community design and (ii) the analysis of the interactions among them.
Day 2 9:30-17:30
Speaker: Valentina Lenarduzzi.
Title: Machine Learning Techniques for Bug Prediction
Abstract: Bug prediction is one of software engineering’s holy grails, and it bears on the overall success of software. Predicting software faults from the early phases improves software quality, reliability, and efficiency, and reduces maintenance cost. Developing a robust bug prediction model is a challenging task, and many techniques have been proposed in the literature. In this session, we will overview the state of the art of bug prediction models, focusing on machine learning techniques and looking at their selection, application, and data management. We will also provide a set of options to help students select the most appropriate methods for their post-seminar challenge.
Day 3 9:30-17:30
Students Symposium. In this session, students will give a short presentation (10 min per student) covering 1) their Ph.D. topics, 2) the research questions they are planning to investigate, and 3) the data analysis techniques they are planning to apply.
Proceedings
The proceedings are now online at http://ceur-ws.org/Vol-2520/.
Credit points
Doctoral students participating in the seminar can obtain 3 credit points, plus 1 credit point after the paper is accepted. This requires participating in all of the days and completing the assignments.
Registration fee
This seminar is free of charge for the staff of Inforte.fi member organizations and their PhD students. For others, the participation fee is 400 €. The participation fee includes access to the event and the event materials. Lunch and dinner are not included.