How to Build Reliable Systems

In his Behaviorally Speaking series, Bob Aiello discusses hands-on software configuration management best practices within the context of organizational and group behavior.

Summary:
Bob Aiello describes some of the essential techniques necessary to ensure that systems can be upgraded and supported while enabling the business through frequent and continuous delivery of new system features.

Anyone who follows technology news is keenly aware that there have been a remarkable number of high-profile system glitches, at times with catastrophic results. Major trading exchanges both in the US and in Tokyo have suffered serious outages that call into question the reliability of the world financial system itself. Knight Capital Group has essentially ceased to exist as a corporate entity after what was reported to be a configuration management error that resulted in a one-day $440 million loss. Incidents like these highlight the importance of effective configuration management best practices and place a strong focus on the need for reliable systems. But what exactly makes a system reliable, and how do we implement reliable systems? This article describes some of the essential techniques necessary to ensure that systems can be upgraded and supported while enabling the business through frequent and continuous delivery of new system features.

Mission-critical and enterprise-wide computer systems today are often very complex, with many moving parts and even more interfaces between components; this presents special challenges even for expert configuration management engineers. These systems are growing more complex still, as the demand for features and rapid time to market creates problems that many technology professionals could not have envisioned even a few years ago.

Computer systems do more today, and many seem to learn more about us each and every day, evolving into intricate knowledge management systems that seem to anticipate our every need. High-frequency trading systems are just one example of multifaceted computer systems that must be supported by industry best practices; this is to ensure rapid and reliable system upgrades and implementation of market-driven new features.

These same systems can cause severe consequences when glitches occur, especially as the result of a failed system upgrade. FINRA, a highly respected regulatory authority, recently issued a targeted examination letter to ten firms that support high-frequency trading systems. The letter requests that the firms provide information about their “software development lifecycle for trading algorithms, as well as controls surrounding automated trading technology” [1]. Some organizations may find it challenging to demonstrate adequate IT controls; really, the goal should be to implement effective IT controls that help guarantee system reliability.

Recently, I had the opportunity to teach configuration management best practices at the NITSL conference in Detroit for nuclear power plant engineers and quality assurance professionals. Everyone in the room was committed to software safety, including reliable safety systems.

In the IEEE, we are starting a working group to help update some of the related industry standards that help define software reliability and measures of dependability and safety. Contact me directly if you are interested in hearing more about participating in these worthwhile endeavors. Standards and frameworks are valuable, but it takes more than guidelines to make reliable software. Most professionals focus on the importance of accurate requirements and well-written test scripts, which are essential but not sufficient to create reliable software. What really needs to happen is that we build in quality from the very beginning, an essential teaching that many of us learned from quality management guru W. Edwards Deming [2].

The key to success is to build the automated deployment pipeline from the very beginning of the application development lifecycle. We all know that software systems must be built with quality in mind from the beginning, and this includes the deployment framework itself. Using effective source code management practices along with automated application build, package, and deployment is only the beginning. You also need to understand that building a deployment factory is a major systems development effort in its own right. It has been my experience that many CM professionals forget to construct automated build, package, and deployment systems with the same rigor that they would apply to a trading system. As the old adage says, “The chain is only as strong as its weakest link,” and inadequate deployment automation is indeed a very weak link.
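To make the idea of automated build, package, and deployment concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular tool: the function names (`build`, `package`, `deploy`, `run_pipeline`), the directory layout, and the choice of a SHA-256 checksum as the baseline are all hypothetical. The point it tries to show is that the deployment step refuses to install anything that does not match what the packaging step produced.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build(source_dir: Path, build_dir: Path) -> Path:
    """Hypothetical build step: copy the sources into a clean build directory."""
    if build_dir.exists():
        shutil.rmtree(build_dir)  # always start from a clean slate
    shutil.copytree(source_dir, build_dir)
    return build_dir


def package(build_dir: Path, dist_dir: Path, name: str) -> Tuple[Path, str]:
    """Archive the build output and record its checksum as the baseline."""
    dist_dir.mkdir(parents=True, exist_ok=True)
    archive = Path(shutil.make_archive(str(dist_dir / name), "zip", root_dir=build_dir))
    return archive, sha256_of(archive)


def deploy(archive: Path, expected_sha256: str, target_dir: Path) -> None:
    """Unpack the archive only if it matches the checksum recorded at package time."""
    if sha256_of(archive) != expected_sha256:
        raise RuntimeError(f"checksum mismatch for {archive.name}: refusing to deploy")
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(target_dir), "zip")


def run_pipeline(source_dir: Path, work_dir: Path, name: str = "app") -> Path:
    """Run build, package, and deploy in order; any failure stops the pipeline."""
    build_dir = build(source_dir, work_dir / "build")
    archive, checksum = package(build_dir, work_dir / "dist", name)
    target_dir = work_dir / "deployed"
    deploy(archive, checksum, target_dir)
    return target_dir


if __name__ == "__main__":
    # Demonstration with throwaway directories and a made-up config file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "src"
        src.mkdir()
        (src / "app.cfg").write_text("mode=production\n")
        deployed = run_pipeline(src, root / "work")
        print(sorted(p.name for p in deployed.iterdir()))
```

Because the pipeline is ordinary code, it can be version controlled, reviewed, and tested like any other system, which is the rigor the paragraph above argues for. A real deployment factory would add much more (environment configuration, rollback, audit logging), none of which is attempted here.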

About the author

Bob Aiello

Technical Editor of CM Crossroads and author of Configuration Management Best Practices: Practical Methods that Work in the Real World, Bob Aiello is a consultant and software engineer specializing in software process improvement, including software configuration and release management. He has more than twenty-five years of experience as a technical manager at top New York City financial services firms, where he held company-wide responsibility for configuration management. He is vice chair of the IEEE 828 Standards Working Group on CM Planning and a member of the IEEE Software and Systems Engineering Standards Committee (S2ESC) Management Board. Contact Bob at Bob.Aiello@ieee.org, via LinkedIn at linkedin.com/in/BobAiello, or visit cmbestpractices.com.

AgileConnection is one of the growing communities of the TechWell network.
