The day after Thanksgiving, the biggest shopping day in the United States, a storm knocked out electric power service to a client's main retail store. No problem--of course they had a contingency plan. A year earlier they had bought an expensive electric power generator justified solely "in case we lose power the day after Thanksgiving."
Relieved that they had planned ahead, they fired up the backup power generator. Silence. The generator was not generating. They'd never actually tried it out before, and now it failed on its big day.
Oh, shoot! They had planned for a contingency. The contingency happened. The plan failed. With 20-20 hindsight, what they had overlooked is obvious.
You have a Y2K effort in place, and it's all about preparation for an event you know is coming. What have you overlooked that's going to bite you? This article will help give you 20-20 foresight to anticipate potential gotchas. It also will give you some ideas of what you can do about them in the time remaining.
Of course, there may be some things you still overlook, and you may not be able to do anything about some of the issues you do identify. We are, after all, in the end game. Nevertheless, you must get to the point where normal maintenance procedures can handle any remaining Y2K problems at such times as they actually occur.
Where You Are Now
You're probably involved with a Y2K remediation and testing effort that's completed already, or at least pretty well on track to be completed long before January 1, 2000. If you're not that far along, you may want to stop reading and get back to work on the basics.
Chances are you're following a testing strategy similar to the one shown in Figure 1. It's basically a three-step Y2K testing process. Step 1 is a Baseline Test to demonstrate how the software works today (that's presuming it does work today), pre-2000 and pre-remediation.
Step 2 is a Regression Test applying the same test data as in Step 1, but after the current software has been remediated for Y2K. We assume the development organization also has performed technical testing of whatever changes they made. The purpose of the Regression Test is to demonstrate that the clean-up changes have not impacted the software's ability to continue functioning in the current manner.
Step 3 is Future Date testing, often involving a "time machine." The changed programs are run with the system date set to January 1, 2000, February 29, 2000, and other dates we'd expect to cause problems. The test data input is essentially the same as in Steps 1 and 2, except that dates are aged to make sense within the context of the future system date.
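The aging step can be sketched in a few lines. This is a minimal illustration, not any particular tool's method: it assumes a simple offset-based scheme where every test-data date is shifted by the same number of days that separates the baseline system date from the simulated future system date. The function name `age_date` is hypothetical.

```python
from datetime import date

def age_date(original: date, baseline_sysdate: date, future_sysdate: date) -> date:
    """Shift a test-data date by the same offset that moves the baseline
    system date to the future system date, so the aged date still makes
    sense in the context of the future-date test run."""
    offset = future_sysdate - baseline_sysdate
    return original + offset

# Baseline run used a 1999 system date; future run simulates 2000-02-29.
baseline = date(1999, 3, 1)
future = date(2000, 2, 29)
print(age_date(date(1999, 2, 15), baseline, future))  # 2000-02-15
```

Note that a fixed day offset is the simplest possible aging rule; as discussed later, real business rules may require more care.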
If you're not following a strategy similar to the three steps described above, you've probably overlooked significant assurances that your Y2K changes are effective. Even if you are following such a strategy, though, be aware there can be several gaps in how it is executed.
Perhaps the most prevalent problem relates to overlooking coordination with ongoing maintenance changes. Since practically every module in your software portfolio may undergo multiple changes for Y2K, other current maintenance changes may be applied to some modules in ways that drop or counteract the Y2K changes.
Comprehensive configuration management to control module versions is essential. So is clustering related changes in identifiable releases. Many organizations find it convenient to promote software back into production as soon as it passes the Regression Test. This shortens the period in which the module is checked out for remediation, reducing the likelihood of having to reapply the remediation changes in case another copy of the module was also modified for other reasons. It also starts preparing us for the desired post-2000 situation of being able to handle date-related problems like any other ongoing maintenance.
An extreme case of overlooked maintenance-type changes afflicts many organizations with external dependencies. This is the situation where a significant piece of software is delivered very late in the Y2K cycle. Most commonly, the software is coming from an outside vendor and represents their Y2K-compliant product release. For example, organizations report their vendors are scheduling deliveries as late as October 1999 for their first Y2K-compliant version of compilers, CASE tools, network and EDI tools, or functional applications.
At a minimum, you need to reopen your Y2K strategy to make sure these late changes are tested. You may have to recompile or regenerate whole systems of software. Solid configuration management is essential for identifying all the pieces and making sure the new versions are promoted together into production.
Hopefully, your Baseline Tests already covered the prior versions of these late-changing pieces of software. While you are essentially Regression Testing the vendor's changes, you may not be able to run the exact set of test cases in your original Baseline, because vendors often change workflow and operating procedures in new releases. Consider, for instance, just the changes you'll need to make to your tests if the vendor has expanded a date-entry field from a two- to a four-digit year.
Similarly, organizations often overlook late implementations of Y2K-compliant interfaced systems from other businesses in their supply chain. Here, timing creates an added challenge you may not have prepared for. For example, assume you're sending orders directly into your supplier's computer system. Currently your supplier uses a two-digit year, but will be switching to four digits. You not only have to make sure your software can use a four-digit year, but you also have to make sure it can keep using two digits until the supplier's version changes. You have to ensure that your software will recognize when the supplier's software changes. You want it to switch logic without your having to physically switch modules when the supplier finally gets around to implementing its Y2K-compliant version.
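One common way to handle that transition period is to accept either format and window two-digit years until the partner switches. The sketch below is illustrative only; the function name and the 50-year pivot are assumptions, and your windowing rule must match the one used throughout your own remediation.

```python
def normalize_year(year_field: str, pivot: int = 50) -> int:
    """Accept either a two- or four-digit year from a trading partner.
    Two-digit years are windowed around a pivot (here 00-49 -> 20xx,
    50-99 -> 19xx) until the partner delivers four digits."""
    year_field = year_field.strip()
    if len(year_field) == 4:
        return int(year_field)
    if len(year_field) == 2:
        yy = int(year_field)
        return 2000 + yy if yy < pivot else 1900 + yy
    raise ValueError(f"unrecognized year format: {year_field!r}")

assert normalize_year("1999") == 1999  # supplier already on four digits
assert normalize_year("99") == 1999    # still on two digits: windowed
assert normalize_year("00") == 2000
```

Because the logic keys off the incoming data itself, no module swap is needed on the day the supplier cuts over.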
The other big issue with your strategy relates to the actual effectiveness of remediation. Colleagues report this is especially an issue when remediation has been outsourced to a "software factory," which generally relies on automated tools to spot and fix dates. Their testing may not always be up to your standards. You may want to spot check by doing some concentrated unit testing of a few programs the vendor claims are clean.
Regardless of who did the remediation, independently evaluate how thoroughly it identified and appropriately corrected dates. The most reliable way to calculate the percent of correct date remediations is manual inspection. Look at the original code of a sample of representative programs written by various developers to identify every date instance. Then compare to all the remediation changes.
Don't be surprised if an automated tool spotted some dates your inspection missed. You'll probably also find some dates the remediation missed. Look, too, for bad fixes that inject errors, especially false positives where an automated tool treated a non-date field as if it were a date. The issue is what level of thoroughness you can live with. Consider individually inspecting all high-risk modules written by developers whose coding styles resulted in remediations less than 95% correct. You may also want to pursue more rigor from your vendor.
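The bookkeeping for such a sample can be kept very simple. The sketch below is one hypothetical way to score a sampled program: the denominator is the union of date fields found by the tool and by manual inspection, and the field names are invented for illustration.

```python
def remediation_score(found_by_tool: set, found_by_inspection: set,
                      correctly_fixed: set) -> float:
    """Percent of actual date instances that were correctly remediated.
    The denominator is the union of dates found by the automated tool
    and by manual inspection of the sample program."""
    all_dates = found_by_tool | found_by_inspection
    return 100.0 * len(correctly_fixed & all_dates) / len(all_dates)

tool = {"ORD-DATE", "SHIP-DATE", "DUE-DATE"}
manual = {"ORD-DATE", "SHIP-DATE", "EXP-YY"}   # inspection also caught EXP-YY
fixed = {"ORD-DATE", "SHIP-DATE", "DUE-DATE"}  # EXP-YY was never remediated
score = remediation_score(tool, manual, fixed)
print(f"{score:.0f}% correct")  # 75% -> well below a 95% threshold
```

A program scoring like this one would be a candidate for full individual inspection.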
Test Case Oversights
The remediated software may find it easy to pass your tests if those tests are weak. Tests are typically weak in two ways. Because they concentrate on dates, they may miss bugs that remediation causes in other areas. Because they concentrate on input-driven current processing, they miss other sources of date-related bugs.
One of the areas most likely to have oversights involves non-date uses of date fields. The most common use involves setting the date to some "unreal" value, like "99," to indicate forever, or never, or something other than 1999. In fact, as we made the transition to 1999, you may have already encountered this problem. The question is whether you detected it, because it may not have blown up or created some outlandish output that catches your attention.
Testing of non-dates is a potential blind spot because the gurus--and most testers--don't understand how a lot of the programming actually works. For example, the gurus warn to test for 9/9/99 (m/d/yy). In reality, that's not a value programmers would have used, because internally it would have been stored as 090999 (mm/dd/yy) rather than 9999.
Furthermore, the types of tests that Baselines probably include are unlikely to detect many of the non-dates which programmers actually used. You're going to need to add some different types of tests to detect real non-dates, such as 99/99/99, 00/00/00, 999/99, 000/00, and especially undisplayable hex "FF" (COBOL HIGH-VALUES) and hex "00" (COBOL LOW-VALUES). Look for differences in record counts, especially in sorts and following purges, to detect these real non-date usages.
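A simple file scan can surface these sentinels before they surprise you. The sketch below assumes a hypothetical fixed-layout record with a six-byte yymmdd field at a known offset; adjust the layout to your own copybooks.

```python
# Hypothetical fixed layout: 10-byte key, then a 6-byte yymmdd date field.
SENTINELS = {b"999999", b"000000", b"\xff" * 6, b"\x00" * 6}

def find_non_dates(records, offset=10, length=6):
    """Return indices of records whose date field holds a sentinel
    'non-date' value: 99/99/99, 00/00/00, or the undisplayable
    COBOL HIGH-VALUES (hex FF) and LOW-VALUES (hex 00)."""
    hits = []
    for i, rec in enumerate(records):
        if rec[offset:offset + length] in SENTINELS:
            hits.append(i)
    return hits

records = [
    b"CUST000001" + b"991231" + b"...",       # a real date
    b"CUST000002" + b"\xff" * 6 + b"...",     # COBOL HIGH-VALUES
    b"CUST000003" + b"999999" + b"...",       # 99/99/99 sentinel
]
print(find_non_dates(records))  # [1, 2]
```

Counting the hits per file gives you the record-count baseline to compare against after sorts and purges.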
While obviously any fix can mess up something else, here's an example of a specific bug likely to be caused by remediation of a non-date usage. An insurance policy year was coded "99" to signify it should not expire. It was "fixed" by making "99" a valid date. But now the policy incorrectly expires in 1999. A field should have been added to the database record to indicate the policy should not expire. You need to test such things to assure that users still can accomplish what they used to be able to accomplish before the fix.
You also need to identify and test for non-date uses that occur separately from the program code. Users have a habit of putting unexpected data in certain fields when the system lets them. Remediation can mess up such undocumented practices.
Another type of testing that Baselines often skip involves end-of-month, end-of-quarter, and especially end-of-year processing. It's common for changes in daily processing to require corresponding changes in programs which are run only at period end. Regular end-of-year processing invariably reveals problems, even without crossing a century.
You'll also need some special types of tests to assure that remediation has not significantly affected performance. Date expansion increases database size, memory usage, and processing time. Check that printing still fits on a page, especially on pre-printed forms. Don't forget to include tests for referential integrity, which assures that links among database records stay synchronized.
Check that outputs are reasonable, not just that they are produced. For example, people historically have failed to detect mail sent erroneously to zip code 40404 (somewhere, I believe, in Kentucky), which is what prints when an IBM mainframe packed field contains blanks.
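The 40404 mechanics are worth seeing once. In EBCDIC a blank is hex 40, so a three-byte packed-decimal zip field full of blanks holds the nibbles 4-0-4-0-4-0; sloppy print logic that unpacks the digits and ignores the (invalid) sign nibble emits "40404". This sketch reproduces that behavior for illustration only:

```python
def unpack_packed_decimal(data: bytes) -> str:
    """Naively unpack IBM packed decimal: two digit nibbles per byte,
    with the final nibble being the sign -- ignored here, just as the
    sloppy print routine ignores it."""
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    return "".join(str(n) for n in nibbles[:-1])  # drop the sign nibble

blanks = b"\x40\x40\x40"  # three EBCDIC spaces in a 3-byte packed zip field
print(unpack_packed_decimal(blanks))  # "40404"
```

A reasonableness check on the output catches this; a check that output merely exists does not.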
Data does not just come from screens. If your systems receive data from external sources, you need to include and test defensive logic to guard against bad data from incorrectly remediated data sources. Recovering old data from your own archives can fit this situation too, since it probably has not been converted to current formats.
Finally, make sure your Baseline Tests examine the key functionality, not just dates. Your systems still need to process orders, print invoices, dispense medication, etc. Demonstrating these non-date functions will be your primary means for detecting errors the remediation introduces outside of the date processing.
This consideration is especially important when testing replacement systems. When your Y2K fix involves implementing a new system, such as a package, you don't really have a Baseline Test of current processing. Tests of the old system's operations will not work for the replacement. Instead, you need to focus on the business functions it was accomplishing and build a set of tests demonstrating that the replacement system performs them adequately. Don't forget to include date tests too.
Oh, by the Way
There are a few other oversights which need to be addressed, but probably outside the Baseline Test. User documentation, online help, user training, and Help Desk support need to be tested to assure they reflect the system revisions correctly. Localization to foreign languages needs to be checked, especially since language changes often affect the sizing of things like dialog boxes, which will be further affected by adding digits.
When performing Future Date Testing, be sure that your aging of data takes into account your business rules. For instance, just aging inputs by a particular number of days may cause transactions to occur on non-business days, which could invoke processing logic different from what you presume you are exercising.
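One common rule of this kind is rolling aged dates forward past weekends. The sketch below is a toy version: it handles only weekends, and real business rules would also skip holidays and apply whatever calendar your applications actually use.

```python
from datetime import date, timedelta

def age_to_business_day(d: date, offset_days: int) -> date:
    """Age a date by a fixed offset, then roll forward past weekends so
    the aged transaction still lands on a business day."""
    aged = d + timedelta(days=offset_days)
    while aged.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        aged += timedelta(days=1)
    return aged

# 1999-06-30 aged 200 days lands on Sunday 2000-01-16,
# so it rolls to Monday 2000-01-17.
print(age_to_business_day(date(1999, 6, 30), 200))  # 2000-01-17
```

Without the roll-forward, the aged transaction would hit weekend processing logic you never intended to exercise.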
Also, don't forget the other systems on which you rely, such as building security access, PBXs, and faxes. In addition, don't overlook end-user computing. All those spreadsheets and personal databases developed on individuals' desktops probably have slipped through your testing net.
Despite all the clamor in the press, my own suspicion is that exposure to lawsuits for Y2K problems is not going to be quite so prevalent as feared. However, you can worsen your odds by being misleading in your Y2K compliance statements. If your later testing discovers problems that may persist, go back and amend your compliance statements.
If All Else Fails
My retail client had a contingency plan. They just hadn't tested it. Their experience, though, offers us other lessons for your end game.
Fortunately, the store manager took the initiative and devised a Plan C. He sent a clerk to a nearby drug store to buy flashlights and pocket calculators. They closed all but a few cash register lanes for shoppers to go through when leaving the store. The manager staffed these lanes with three-person teams. One person held the flashlight. A second person wrote up the order on a pad of paper and used the calculator to add tax and total it. The third person took the money and hand-wrote credit card slips. Theft was negligible and the store still did thousands of dollars of sales despite the darkness. (Overheard: "What color do you think this dress is?" "Dark.")
Having people who are willing and able to deal spontaneously with problems is a key fallback you'll want to include in your end game. Often called SWAT teams, these should be your best systems and operations people. Remember the ground control group that figured out how Apollo 13 astronauts could save themselves from ultimate crisis with duct tape!
Make sure the SWAT team has more than just duct tape. My client was lucky they could find flashlights, batteries, and calculators. Your SWAT team needs to be even more prepared. They will also need system documentation, passwords, backups, screwdrivers, and assured access to office spaces even if automated security systems break down at the witching hour. By the way, your end game also needs to include measures now to make sure the SWAT team and others you depend on are still around come January 1, 2000. Bonuses, neat projects, training in new technologies, concerted team building, and assured continued employment are typical incentives that organizations have found effective so far.
Contingency plans need to protect not only against internal system breakdowns, but also against external risks. To plan effectively, first identify particular types of events that would interrupt business significantly. Then define workable workaround approaches, get them ready, and test that they indeed will work.
For example, if your automated system is unavailable for an extended period, consider doing the essential activities manually, as my retail client did that dark November day. Realize that you've got to be able to operate without looking to the computer for data--and without the computer being the only "one" that actually knows how to do the work. When was the last time you had a retail clerk who knew how to make change without the point-of-sale system? Test the manual procedures by using them now. You'll find you need things you didn't anticipate, like paper forms, backup listings, and cash.
Consider finding alternative sources and/or accumulating reserves for critical supplies, such as medicines for patients, raw materials for manufacturing, and fallback alternative software. You need to demonstrate that your fuel supply will feed backup power for up to two weeks, or longer for life-critical situations. Don't forget, you won't be able to replenish fuel supplies if there's no electricity to run gas pumps. You'll need to test the reliability of your suppliers, including alternates.
Prepare, too, for indirect effects. If you are a supplier of items which customers will tend to stockpile, expect spikes in demand, probably followed by deep troughs. After Christmas, banks can expect runs on cash machines, and airlines will have crowds rushing home...but they probably won't have many New Year's flyers.
We'll see in a year which methods really work best. Good luck!