Abstract:
Improvements in software development practices that predict and reduce software defects can lead to major cost savings. The goal of this thesis is to demonstrate the value of static analysis metrics and rules in predicting software defects at a much larger scale than previous efforts. The study analyses data collected from more than 500 software applications across three multi-year software development programs, and uses over 150 static analysis measurements. Static analysis metrics, rule violations, and actual historical software defect values are sourced from multiple disparate databases, then joined and groomed for analysis. Several feature selection techniques are employed to narrow the feature set to the most influential variables. Furthermore, a number of machine learning techniques, such as neural networks and random forests, are used to determine whether seemingly innocuous rule violations can serve as significant predictors of software defect rates.