## Real-World Math

Much of the math I learned in high school and college lies dormant. But when I sat down to come up with examples of using math skills on the job, I was surprised to find quite a few. I want to share some of the mathematical concepts I've used lately in real-world situations.

**Algebra**

I was working with an organization that had 200 developers and no testing specialists. They asked me to evaluate the feasibility of retraining some of the developers as testers, so we'd have a 4:1 ratio of developers to testers. I doubted that so many of their developers would want to focus exclusively on testing, but I needed to figure out how many people would have to be retrained. While I could have used trial and error to guess the answer, I decided to put my rusty algebra skills to use instead.

I set up a system of equations, where T is the number of testers and D is the number of developers, after the retraining. First, this equation represents the 4:1 ratio:

D = T * 4

And this equation represents the total number of people we have to work with:

T + D = 200

Using substitution, I calculated the number of testers:

T + D = 200

T + T * 4 = 200

T * 5 = 200

T = 200 / 5

T = 40

I could see that we would need 40 testers, leaving 160 as developers. I was excited to be able to use my high school algebra for something!
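The same substitution works for any headcount and ratio. Here is a minimal sketch in JavaScript; the function name `splitByRatio` and its parameter names are made up for illustration and do not come from any library.

```javascript
// Split a fixed headcount into developers and testers at a given ratio.
// splitByRatio and its parameter names are illustrative only.
function splitByRatio(totalPeople, developersPerTester) {
  // From D = T * r and T + D = total, substitution gives T * (r + 1) = total.
  const testers = totalPeople / (developersPerTester + 1);
  const developers = totalPeople - testers;
  return { testers, developers };
}

const team = splitByRatio(200, 4);
console.log(team.testers, team.developers); // 40 160
```

When the headcount does not divide evenly by the ratio plus one, the result is fractional, and you would have to decide which way to round.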

**The Modulo Function**

Remember calculating the remainder when doing division? For example, 25 ÷ 7 is 3 with a remainder of 4. The "modulo" function gives us the remainder, and it has some interesting applications. The symbol used in most programming languages for the modulo function is %, so to calculate the remainder of 25 ÷ 7, we would type 25 % 7, which gives us 4.
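As a quick check of the division example in JavaScript (variable names are illustrative):

```javascript
// 25 ÷ 7 is 3 with a remainder of 4
const quotient = Math.floor(25 / 7);
const remainder = 25 % 7;
console.log(quotient, remainder); // 3 4

// The quotient and remainder reconstruct the original number
console.log(quotient * 7 + remainder === 25); // true

// In JavaScript, the result of % takes the sign of the left operand
console.log(-1 % 3); // -1
```

The last line only matters if the left operand can go negative, which is not the case for a counter that starts at zero.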

Here's a modulo example. I wrote a WebLoad script that logs in to a Web application. Many copies of the script may run at the same time, and each one needs to log in using a different account. I set up test users named testuser00000, testuser00001, etc. The traditional technique for doling out usernames in a load-test script is to put each username in a file and have each thread read one name from the file. But I prefer the more direct approach of generating the usernames in the script itself, without reading a file. Here's the JavaScript code that does this for me in WebLoad:

`wlLocals.userNum = ClientNum % wlGlobals.totalUsers`

The script appends the value of wlLocals.userNum to the string "testuser" to build the unique username, padding it with leading zeros so the names sort nicely in a database query. ClientNum is a built-in counter that starts at zero for the first thread and increments for each additional thread. (Note: ClientNum is not unique across multiple load generators, so this only works if you have a single load generator.)

When one thread finishes running the script, WebLoad starts another thread. For a long-running test, it's possible to use up all available test accounts. If that happens, I want it to loop back to the beginning of the list of users. That's where the modulo does its magic. I have the total number of users stored in the wlGlobals.totalUsers variable. So "ClientNum % wlGlobals.totalUsers" will cause the userNum to wrap back around to zero to avoid going outside the range of available user accounts.

Let's say we only have three test accounts, testuser00000, testuser00001, and testuser00002. The calculations for userNum would go like this:

| ClientNum | ClientNum % totalUsers = userNum |
| --- | --- |
| 0 | 0 % 3 = 0 |
| 1 | 1 % 3 = 1 |
| 2 | 2 % 3 = 2 |
| 3 | 3 % 3 = 0 |
| 4 | 4 % 3 = 1 |
| 5 | 5 % 3 = 2 |
| 6 | 6 % 3 = 0 |

No matter how large ClientNum gets, userNum will always be less than 3, and we'll never try to use a testuser00003 account that doesn't exist. By the way, this approach doesn't guarantee that the previous script using testuser00000 has finished by the time we wrap back to zero.
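The username generation described above can be sketched in plain JavaScript. This is a standalone illustration, not the original WebLoad script. `buildUsername` is a hypothetical helper, and its two parameters stand in for WebLoad's `ClientNum` and `wlGlobals.totalUsers`. It also assumes an engine that supports `String.prototype.padStart`, which may not hold for WebLoad's scripting environment.

```javascript
// Hypothetical helper: clientNum and totalUsers stand in for WebLoad's
// ClientNum and wlGlobals.totalUsers.
function buildUsername(clientNum, totalUsers) {
  // Wrap around so userNum always stays in the range 0 .. totalUsers - 1
  const userNum = clientNum % totalUsers;
  // Pad to five digits so the names sort nicely, e.g. testuser00002
  return "testuser" + String(userNum).padStart(5, "0");
}

// Reproduce the three-account example for ClientNum 0 through 6
const names = [];
for (let clientNum = 0; clientNum <= 6; clientNum++) {
  names.push(buildUsername(clientNum, 3));
}
console.log(names.join("\n"));
```

With three accounts, the seven names cycle through testuser00000, testuser00001, and testuser00002, and the last one is testuser00000 again.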

**Expected Value**

Probability theory comes into play in many areas of software. One element of probability theory is the "expected value," which is the sum, over all possible outcomes, of each outcome's probability multiplied by its value. The expected value itself may never actually occur (think of it as an average of the possible outcomes). I apply this when I do risk management. The risk exposure formula is essentially an expected value calculation:

exposure = probability of loss * impact

Let's say you identify a risk of a $1,000,000 loss to the company within your project, with a 2% chance of incurring the loss. The risk exposure for this risk is $1,000,000 × 0.02, or $20,000. So you would want to limit any risk mitigation expenses to less than $20,000. This number may not make much sense unless you think of it as an expected value. If you ran your project exactly the same way one hundred times, the odds are that the risk would turn into a real problem twice (2% of 100), for a total loss of $2,000,000. Dividing that total loss across all one hundred project iterations gives us an average of $20,000 for each project. So even though each project will actually have either a $0 loss or a $1,000,000 loss, the expected value is $20,000 on average.
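The $1,000,000 example can be written out as an expected value over both outcomes. This is a minimal sketch. `expectedValue` and the shape of the outcome objects are illustrative choices, not part of any library.

```javascript
// Expected value: sum of probability * value over all possible outcomes.
// expectedValue and the { probability, value } shape are illustrative.
function expectedValue(outcomes) {
  return outcomes.reduce((sum, o) => sum + o.probability * o.value, 0);
}

// Two outcomes: a 2% chance of losing $1,000,000, and a 98% chance of losing nothing
const exposure = expectedValue([
  { probability: 0.02, value: 1000000 },
  { probability: 0.98, value: 0 },
]);
console.log(exposure); // 20000

// The "one hundred projects" view: two losses spread across 100 runs
const averageLoss = (2 * 1000000) / 100;
console.log(averageLoss); // 20000
```

Because the "no loss" outcome contributes zero, the sum collapses to the single product in the risk exposure formula.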

On real software projects, the math is usually much fuzzier. We don't have actuaries on our projects giving us precise probabilities like 2%. I tend to settle for relative measures of Low, Medium, and High for the probability, and I use the same scale for the impact. For the risk exposure formula, I translate these to values of 1, 2, and 3, respectively. We can still use the expected value with the fuzzy numbers in order to make some fuzzy decisions.
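One way to put the fuzzy scale to work is to score each risk and sort by exposure. The sketch below assumes the 1, 2, 3 mapping described above. The risk names are invented purely for illustration, and `fuzzyExposure` is a hypothetical helper.

```javascript
// Map the relative ratings onto the 1-2-3 scale
const SCALE = { Low: 1, Medium: 2, High: 3 };

// exposure = probability of loss * impact, using the fuzzy scores
function fuzzyExposure(probability, impact) {
  return SCALE[probability] * SCALE[impact];
}

// Hypothetical risks, invented for this example
const risks = [
  { name: "Key tester leaves", probability: "Low", impact: "High" },
  { name: "Build server outage", probability: "Medium", impact: "Medium" },
  { name: "Vendor API changes", probability: "High", impact: "High" },
];

// Rank the risks from highest exposure to lowest
const ranked = risks
  .map((r) => ({ name: r.name, exposure: fuzzyExposure(r.probability, r.impact) }))
  .sort((a, b) => b.exposure - a.exposure);

for (const r of ranked) {
  console.log(r.exposure, r.name);
}
```

The scores run from 1 to 9 and carry no units. They are only useful for comparing risks with each other, not for setting a mitigation budget.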

**Logarithms**

You may have heard of logarithmic scales like the Richter scale, the frequency range of octaves on a piano, the decibel scale, etc. I found a use for a logarithmic scale while designing a disk load generator. Here's the basic idea: The maximum file size to test with would be configured in a variable. In a loop, the script would randomly choose a length from one byte up to the maximum size of the file, and then write a chunk of data of that size into the file. If I used a linear scale, the script would spend most of its time writing larger chunks, because they would take longer to write than small chunks. So I wanted to skew the random selection toward smaller chunks, while still sometimes using large chunks. A logarithmic scale took care of this nicely. Here's my Perl code for determining the chunk size for each iteration:

`my $len = int exp(rand(log($filesize)));`

Given a file size, this code chooses a random number between zero and the natural logarithm of the maximum file size; then it reverses the effect of the logarithm using the exp function. When I devised this formula, I wasn't really sure whether it would work, so I ran some experiments. I found that this did indeed produce numbers that were skewed toward the low end of the scale, but it also still produced some larger numbers. The average of a 100,000,000-sample experiment with a file size of one megabyte (1,000,000 bytes) was 72,398, well under the 500,000-byte average chunk size that I would get with a linear scale.
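The experiment is easy to repeat on a smaller scale. Below is a rough JavaScript port of the Perl one-liner, with a much smaller sample than the original run. The names `randomChunkLength`, `maxSize`, and `samples` are illustrative. The closed-form mean in the code, (N - 1) / ln N, is a standard result for the exponential of a uniform random variable and is not taken from the original experiment.

```javascript
// Rough port of the Perl one-liner: int exp(rand(log($filesize)))
// randomChunkLength and its parameters are illustrative names.
function randomChunkLength(maxSize, random = Math.random) {
  // random() is uniform in [0, 1), so the exponent is uniform in [0, ln(maxSize))
  const exponent = random() * Math.log(maxSize);
  // exp() undoes the logarithm; Math.floor truncates like Perl's int for positive values
  return Math.floor(Math.exp(exponent));
}

const maxSize = 1000000; // one megabyte, taken as 1,000,000 bytes
const samples = 200000;  // far fewer than the original 100,000,000
let total = 0;
let belowThousand = 0;
for (let i = 0; i < samples; i++) {
  const len = randomChunkLength(maxSize);
  total += len;
  if (len < 1000) belowThousand++;
}

const sampleMean = total / samples;
// Mean of exp(U) for U uniform on [0, ln N], before truncation
const theoreticalMean = (maxSize - 1) / Math.log(maxSize);

console.log("sample mean:", Math.round(sampleMean));
console.log("theoretical mean:", Math.round(theoreticalMean));
console.log("share below 1,000 bytes:", (belowThousand / samples).toFixed(2));
```

The theoretical mean works out to about 72,382 bytes, in line with the measured 72,398. Roughly half of the chunks come in under 1,000 bytes, the square root of the maximum, which shows how strongly the selection is skewed toward small sizes.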

**Love/Hate Relationship**

I am fascinated by many mathematical concepts. I remember how excited I was at the beginning of my linear algebra class, because the concepts seemed to be so powerful. But my enthusiasm quickly waned when I saw how fast the pace of the class was. We moved on to the next subject before we had a chance for the previous subject to sink in. I'm disappointed by the way math is taught at all levels.

So for those of you who can't stand math, take heart. Math is often not easy to learn, even for those of us who enjoy it. Plus, when we don't use it, we don't retain it. Look for the many ways that math can help you do your job, and find the resources you need to learn or relearn it.