“It’s not a bug, it’s a feature.” - Differentiating between bugs and non-bugs using machine learning

Bugzilla is a noisy data source: bugs are used to track anything, from Create a LDAP account for contributor X to Printing page Y doesn’t work. This makes it hard to know which bugs are bugs and which bugs are not bugs but e.g. feature requests, or meta bugs, or refactorings, and so on. To ease reading the next paragraphs, I’ll refer as bugbug to bugs that are actually bugs, as fakebug to bugs that are not actually bugs, and as bug to all Bugzilla bugs (bugbug + fakebug).

Why do we need to tell if a bug is actually a bug? There are several reasons, the main two being:

Quality metrics: to analyze the quality of a project, to measure the churn of a given release, it can be useful to know, for example, how many bugbugs are filed in a given release cycle. If we don’t know which bugs are bugbugs and which are feature requests, we can’t precisely measure how many problems are found (= bugbugs filed) in a given component for a given release, we can only know the overall number, confusing bugbugs and feature work;
Bug prediction: given the development history of the project, one can try to predict, with some measure of accuracy, which changes are risky and more likely to lead to regressions in the future. In order to do that, of course, you need to know which changes introduced problems in the past. If you can’t identify problems (i.e. bugbugs), then you can’t identify changes that introduced them!

On BMO, we have some optional keywords to identify regressions vs features, but they are not used consistently (and, being optional, they can’t be. We can work on improving the practices, but we can’t reach perfection when there is human involvement). So, we need another way to identify them. A possibility is to use handwritten rules (‘mozregression’ in comment → regression; ‘support’ in title → feature), which can be precise up to a certain accuracy level, but any improvement over that requires hard manual labor. Another option is to use machine learning techniques, leaving the hard work of extracting information from bug features to the machines!

The bugbug project is trying to do just that, at first with a very simple ML architecture.

We have a set of 1913 bugs, manually labelled between the two possible classes (bugbug vs nobug). We augment this manually labelled set with Bugzilla bugs containing the keywords ‘regression’ or ‘feature’, which are basically labelled already. The augmented data set contains 10818 bugs. Unfortunately we can’t use all of them indistinctly, as the dataset is unbalanced towards bugbugs, which would skew the results of the classifier, so we simply perform random under-sampling to reduce the number of bugbug examples. In the end, we have 1928 bugs.

We split the dataset into a training set of 1735 bugs and a test set of 193 bugs (90% - 10%).

We extract features both from bug fields (such as keywords, number of attachments, presence of a crash signature, and so on), bug title and comments.

To extract features from text (title and comments), we use a simple BoW model with 1-grams, using TF-IDF to lower the importance of very common words in the corpus and stop word removal mainly to speed up the training phase (stop word removal should not be needed for accuracy in our case since we are using a gradient boosting model, but it can speed up the training phase and it eases experimenting with other models which would really need it).

We are then training a gradient boosting model (these models usually work quite well for shallow features) on top of the extracted features.

Architecture view — **Figure 1**: A high-level overview of the architecture.

This very simple approach, in a handful of lines of code, achieves ~93% accuracy. There’s a lot of room for improvement in the algorithm (it was, after all, written in a few hours…), so I’m confident we can get even better results.

This is just the first step: in the near future we are going to implement improvements in Bugzilla directly and in linked tooling so that we can stop guessing and have very accurate data.

Since the inception of bugbug, we have also added additional experimental models for other related problems (e.g. detecting if a bug is a good candidate for tracking, or predicting the component of a bug), turning bugbug into a platform for quickly building and experimenting with new machine learning applications on Bugzilla data (and maybe soon VCS data too). We have many other ideas to implement, if you are interested take a look at the open issues on our repo!