Scoring and Evaluating Software Methods, Practices, and Results


be realized that table 2 is a work in progress. Also, the value of table 2 is not in the precision of the rankings, which are somewhat subjective, but in the ability of the simple scoring method to show the overall sweep of many disparate topics using a single scale.

Table 2 is often used as a quick evaluation method for software organizations and software projects. From interviews with project teams and software managers, the methods actually deployed are checked off on table 2. Then the numeric scores from table 2 are summed and averaged.

A leading company will deploy methods whose scores total more than 250 and average more than 5.5. Lagging organizations and lagging projects will total less than 100 and average below 4.0. The worst average encountered so far was only 1.8, in an evaluation performed as background to a lawsuit for breach of contract. The vendor, who was also the defendant, was severely behind in the use of effective methods and practices.
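The sum-and-average evaluation described above can be sketched in a few lines of Python. This is only an illustration: the function name `evaluate`, the "intermediate" band for results that meet neither pair of thresholds, and the sample scores (including the negative value for a harmful factor) are assumptions made for the example, not values taken from table 2.

```python
from statistics import mean

# Hypothetical scores for methods checked off during interviews.
# The method names appear in the text; the numbers are invented for
# illustration and are NOT the actual table 2 values.
deployed_scores = {
    "team software process (TSP)": 9.0,
    "formal inspections": 9.5,
    "quality function deployment (QFD)": 8.0,
    "lines of code metrics": -4.5,  # assumed: harmful factors score below zero
}


def evaluate(scores):
    """Sum and average the scores, then apply the thresholds from the text."""
    values = list(scores.values())
    if not values:
        raise ValueError("no deployed methods were checked off")
    total = sum(values)
    average = mean(values)
    if total > 250 and average > 5.5:
        band = "leading"
    elif total < 100 and average < 4.0:
        band = "lagging"
    else:
        # Assumption: the text does not name the middle ground.
        band = "intermediate"
    return total, average, band


total, average, band = evaluate(deployed_scores)
print(f"sum={total:.1f} average={average:.2f} band={band}")
```

Because the text states both a sum and an average threshold for each group, the sketch requires both conditions to hold before it labels an organization as leading or lagging.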

Note that the set of factors included is a mixture. It includes full development methods such as team software process (TSP) and partial methods such as quality function deployment (QFD). It includes specific practices such as “inspections” of various kinds, and also social issues such as friction between stakeholders and developers. It also includes metrics such as “lines of code,” which is ranked as a harmful factor because this metric penalizes high-level languages and distorts both quality and productivity data. What all these things have in common is that they either improve or degrade quality and productivity.

Since programming languages are also significant, it might be asked why specific languages such as Java, Ruby, or Objective-C are not included. The reason is that as of 2011 more than 2,500 programming languages exist, and new languages are being created at a rate of about one per calendar month.

In addition, a majority of large software applications use several languages at the same time, such as Java and HTML, or combinations that may top a dozen languages in the same application. There are too many languages, and they change far too rapidly, for an evaluation to be useful for more than a few months. Therefore, languages are covered only in a general way: are they high-level or low-level, and are they current languages or “dead” languages no longer in use for new development?

Unfortunately, a single list of values averaged over three different size ranges and multiple types of applications does not illustrate the complexity of best-practice analysis. Table 3 shows examples of thirty best practices for small applications of 1,000 function points and for large systems of 10,000 function points. As can be seen, the two lists have very different patterns of best practices.

The flexibility of the agile methods is a good match for small applications, while the rigor of TSP and PSP is a good match for the difficulties of large-system development.

Table 3: Best Practice Differences between 1,000 and 10,000 Function Points

It is useful to discuss best practices and worst practices together, since they sit at opposite ends of the spectrum.

A “worst practice” is defined as a method or approach that has been proven to cause harm to a significant number of projects that used it. The word “harm” means degradation of quality, reduction of productivity, or concealment of the true status of projects. “Harm” also includes data so inaccurate that it leads to false conclusions about economic value.

Each of the harmful methods and approaches individually has
