Software Change Management: Disaster Recovery Lessons
19 September 2001
Vic Wheatman, Chris Morris
 
The failure of the Australian Stock Exchange in 1995 shows the potential cost of even a brief interruption in service, and demonstrates that software change management is critical to uninterrupted operation.

 Strategy & Tactics/Trends & Direction
Note Number:  COM-14-5101
Related Terms:  Change Management; Disaster Recovery

Key Issue
What strategies should enterprises employ to provide business process protection in the event of a disaster?

The highly visible 1995 failure of the Australian Stock Exchange's (ASX's) share trading system — caused by a major software change immediately prior to a known critical processing period — is a stark reminder of the exposures an enterprise may face if its IT operations are disrupted.

The ASX Failure

The ASX outage occurred on a Monday, two days after an Australian federal election. The exchange had closed the previous Friday with expectations of a change of government to be followed by a buoyant market when the exchange reopened. The financial press had predicted a turnover of A$1 billion for the day, but the outage — which lasted slightly more than two hours — resulted in a turnover of just A$550 million. This dramatically reduced turnover meant significantly reduced commissions for brokerage firms.

The failure at the ASX was caused by a major software upgrade completed over the weekend. When the exchange opened on Monday, the application failed catastrophically. The ASX attempted to roll back to the earlier version, but the rollback also failed, and the production database was corrupted and had to be reloaded in its entirety. Disaster recovery was initiated at the ASX's backup site, but the exchange was unable to resume normal trading operations for more than two hours.

The ASX had made a multimillion-dollar investment in disaster recovery facilities, including highly reliable hardware, software and communications platforms configured for high availability and a full "hot" remote backup site, but this investment did not guarantee business continuity. Neither the main site nor the standby site experienced hardware or system software failures. This should serve as a valuable reminder that most unplanned system outages are not caused by catastrophic events, such as the terrorist attacks against the World Trade Center on 11 September 2001.

Lessons Learned

One of the most important lessons of the ASX outage concerns the relationship between the IS organization, which provides mission-critical technology, and the business functions that the technology supports. Both share responsibility for service delivery, and the processes of change management are part of that joint responsibility.

In light of recent events, all enterprises should conduct an immediate review of their software change management methodologies. Best practices in software change management are the most-effective and most-efficient processes, usually expressed as policies and procedures, that deliver the highest value-to-cost ratio.

An effective software change management methodology must include processes and procedures for essential activities, including:

  • Problem/change request (CR) initiation and tracking
  • Change impact analysis
  • Version control
  • Security administration of software assets
  • Software promotion
  • Quality reviews
  • Software distribution

Such a methodology should also support release management, which enables major and maintenance releases to be scheduled appropriately, and controlled patch releases to occur on an as-needed basis.
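One way to make "scheduled appropriately" concrete is to have the release calendar reject releases that land too close to a known critical processing period, the situation that preceded the ASX failure. The following is a minimal sketch in Python under stated assumptions, not a description of any particular tool: the names CriticalPeriod, FREEZE_BEFORE and blocking_period, and the freeze lengths of seven and two days, are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CriticalPeriod:
    """A known critical processing period (hypothetical structure)."""
    name: str
    start: datetime
    end: datetime


# Assumed policy: how long before a critical period each release type is frozen.
# Patch releases are deliberately absent, so this check never blocks them;
# they are assumed to go through a separate, controlled emergency workflow.
FREEZE_BEFORE = {
    "major": timedelta(days=7),
    "maintenance": timedelta(days=2),
}


def blocking_period(release_type, go_live, periods):
    """Return the critical period that blocks this release, or None."""
    buffer = FREEZE_BEFORE.get(release_type)
    if buffer is None:
        return None  # release type is not subject to the freeze
    for period in periods:
        # Blocked from the start of the freeze until the period ends.
        if period.start - buffer <= go_live <= period.end:
            return period
    return None


# Usage with made-up dates: a major release two days before a busy trading day.
busy_day = CriticalPeriod(
    "expected high-volume trading day",
    datetime(2001, 10, 1, 9, 0),
    datetime(2001, 10, 1, 17, 0),
)
conflict = blocking_period("major", datetime(2001, 9, 29, 22, 0), [busy_day])
if conflict is not None:
    print(f"Reschedule: release falls inside the freeze for {conflict.name}")
```

A check like this only flags the conflict. Whether to postpone the release or grant an exception remains a joint decision between the IS organization and the business.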

An effective software change management methodology should also assist development organizations in defining and managing the life cycle of an application software CR — i.e., capturing, analyzing, approving, prioritizing, acting on and effecting CRs. CRs will vary, so the software change management methodology must support and provide explicit workflows for a number of CR types, including new development, enhancements, modifications, maintenance and emergency fixes. Ideally, each CR classification should have a defined workflow. A new-development effort, for example, might include the following stages: development, unit test, integrated test, systems test, quality assurance and production release.
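The idea of a defined workflow per CR classification can be sketched as a small state machine. The Python example below is illustrative only. The new-development stages are the ones listed above, while the shortened emergency-fix path, the ChangeRequest class and the rule that each promotion needs a named approver are assumptions made for the sake of the example.

```python
# Stage names for new development come from the text above; the
# emergency-fix workflow is an illustrative assumption.
WORKFLOWS = {
    "new_development": [
        "development", "unit test", "integrated test",
        "systems test", "quality assurance", "production release",
    ],
    "emergency_fix": [
        "development", "unit test", "quality assurance", "production release",
    ],
}


class ChangeRequest:
    """A CR that can only move one stage at a time through its workflow."""

    def __init__(self, cr_id, cr_type):
        if cr_type not in WORKFLOWS:
            raise ValueError(f"no workflow defined for CR type {cr_type!r}")
        self.cr_id = cr_id
        self.cr_type = cr_type
        # Audit trail of (stage, approver); the first stage has no approver.
        self.history = [(WORKFLOWS[cr_type][0], None)]

    @property
    def stage(self):
        return self.history[-1][0]

    def promote(self, approved_by):
        """Move the CR to the next stage and record who approved it."""
        if not approved_by:
            raise PermissionError("promotion requires a named approver")
        stages = WORKFLOWS[self.cr_type]
        position = stages.index(self.stage)
        if position == len(stages) - 1:
            raise RuntimeError(f"{self.cr_id} is already at the final stage")
        self.history.append((stages[position + 1], approved_by))
        return self.stage


# Usage: stages cannot be skipped, so a CR reaches production only
# after passing through every earlier stage in its workflow.
cr = ChangeRequest("CR-0001", "new_development")
cr.promote(approved_by="dev.lead")
print(cr.cr_id, "is now in", cr.stage)
```

Because promotion is the only way to change stage, the history list doubles as the tracking record for the CR.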

Bottom Line

Software change management best practices require that the IS and application development organizations adopt a methodology-driven approach with support for defined CR workflows and leverage technology (i.e., automate) in a manner that is cost-effective and adaptive to the enterprise's needs.


This research is part of a set of related research pieces. See AV-14-5238 for an overview.