18-04-2014, 03:00 PM
Software and System Safety Research Group: A White Paper
Introduction
Computers are rapidly becoming an integral part of nearly every engineered product, as well as controlling the manufacturing process for products: Computers control consumer products, commercial aircraft, nuclear power plants, medical devices, weapon systems, aerospace systems, automobiles, public transportation systems, and so on. Virtually nothing is engineered and manufactured in the U.S. today without computers affecting the design, manufacturing, and operation. Not only do products use computers to operate better or cheaper ("smart" automobiles and appliances are examples), but complex systems are incorporating designs that cannot be operated without computers, for example, unstable aircraft and space vehicles that cannot be operated successfully by humans alone. David Hughes wrote in a recent editorial in Aviation Week and Space Technology:
"Information technology is becoming a key part of everything the aerospace and defense industry does for a living, and as the century closes it is computers and software that hold the keys to the future. The [aerospace] industry is being transformed from dependence on traditional manufacturing into something that looks more like IBM and Microsoft with wings."
At the same time that computers are becoming indispensable in controlling complex engineered systems, quality and confidence issues are increasing in importance. We are hearing more and more about failures due to computers: Software errors have resulted in loss of life, destruction of property, failure of businesses, and environmental harm. Computers now have the potential for destabilizing our financial system. Some large government-financed projects are in trouble or have been canceled because of difficulty in assuring the quality of the software.
One of the reasons for the problems is that these systems require that standard engineering techniques be extended to deal with new levels of complexity, new types of failure modes, and new types of problems arising in the interactions between components. Computers exacerbate engineering problems by allowing new levels of complexity and coupling, with more integrated, multi-loop control in systems containing large numbers of dynamically interacting components. We are attempting to build systems where the interactions between components cannot be thoroughly planned, understood, anticipated, or guarded against. The fundamental problem is intellectual unmanageability: Increased complexity and coupling make it difficult for the designers to consider all the potential system states or for operators to handle all normal and abnormal situations and disturbances safely and effectively. The failures in these systems are arising in the interactions between components. While we train engineers to be experts in individual fields, these complex heterogeneous systems (composed of electromechanical, digital, and human components) require knowledge and techniques that span engineering disciplines.
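The scale of this unmanageability can be illustrated with simple arithmetic. The sketch below is only an illustration under stated assumptions, not a model taken from this paper: it assumes each component has a small, fixed number of states (three here, a hypothetical choice) and that any pair of components may interact. The function names are likewise hypothetical.

```python
from itertools import product


def joint_state_count(states_per_component: int, components: int) -> int:
    """Number of combined system states if every component state can co-occur."""
    return states_per_component ** components


def pairwise_interactions(components: int) -> int:
    """Number of distinct component pairs that could interact."""
    return components * (components - 1) // 2


def enumerate_joint_states(states, components):
    """Explicitly list every combined state; only feasible for tiny systems."""
    return list(product(states, repeat=components))


if __name__ == "__main__":
    # Assumed, illustrative state labels for a single component.
    states = ("nominal", "degraded", "failed")
    for n in (2, 5, 10, 20, 40):
        print(n, pairwise_interactions(n), joint_state_count(len(states), n))
```

Under these assumptions the number of component pairs grows quadratically while the number of combined states grows exponentially, which is one way to see why exhaustive consideration of system states quickly becomes impractical for designers.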
The Software and System Safety Research Group is a response to these problems. Its goal is to act as a focus for interdisciplinary research, education, and development to support the engineering and use of computers embedded in and controlling complex engineered systems. This white paper discusses the problem being attacked, attempts to delineate why the problems have not already been solved, and suggests some specific research topics that we feel are of critical importance in stretching the current limits of complex system engineering.
The Problem
During and after World War II, technology expanded rapidly, and engineers were faced with designing and building more complex systems than had previously been attempted. The creation of systems engineering as a discipline received much of its impetus from aerospace programs, but the new systems engineering techniques were soon adopted and applied to the process industry (chemicals and nuclear power), transportation systems, and other complex engineered systems.
As the systems we wanted to build became too complex or too time-critical to be controlled by humans or even electromechanical devices, computers started to be used to take over at least part and sometimes all of the control functions. Not only are computers flexible and seemingly limitless in their power, but they work at a speed that cannot be duplicated by any other means and are relatively cheap besides. These characteristics allow us to engineer products and complex systems that were previously inconceivable. The computer has freed us from many of the physical limits of electromechanical devices, but we are now faced with practical limitations in our ability to engineer the software parts of these systems.
As electromechanical controllers are replaced by computers, many of the basic engineering and systems engineering techniques that were developed to cope with complex systems are no longer adequate. Software adds the potential for introducing a level of complexity not previously possible: Most control software is too complex for complete mathematical analysis and yet too structured for statistical analysis. At first, heroic human effort, brute force techniques, and tremendous amounts of money were able to get large software projects like the Space Shuttle control system finished successfully. However, our ambitions are starting to outstretch the limits of what brute force and money can accomplish, and the technology to build such systems and to provide the needed confidence in their quality does not exist.
As an example, the Space Shuttle software, one of the largest and most ambitious software development projects of the 1970s, contains about 400,000 lines of code. NASA put enormous amounts of money into its development and still spends approximately $100,000,000 a year to maintain it. In contrast, even automobiles and some household products now have or will soon have that much software in them. More complex projects, such as upgrades to the U.S. Air Traffic Control System, Space Station Freedom, commercial and military aircraft, and even telephone switching systems, contain millions of lines of code. To build such software may require hundreds and sometimes thousands of people, and just organizing these projects is a massive undertaking. The result of not solving these system and software engineering problems may be failures in our attempts to build the complex systems of the future. As just one example, the huge cost overruns and technical difficulties encountered in building a new U.S. Air Traffic Control system led to cancelling large parts of it a few years ago. The more recent scaled-back attempts to provide limited upgrades are also running into problems. The past six months have seen the failure of five satellite launch attempts, several of them blamed on software, including the most recent failure of a Titan IV-B/Centaur Milstar mission that has been billed as the most costly unmanned accident in the 50-year history of Cape Canaveral launch operations.
Merely producing enormous amounts of code is not enough. The potential for losses (human, environmental, and financial) with these computer-controlled systems makes quality of paramount importance. Virtually all non-trivial software has errors in it, and we do not currently have the capability to locate and correct these errors. We are putting reliance on human products that we cannot demonstrate are trustworthy, and it is getting worse as the complexity of the systems we attempt to build increases.
While the U.S. has been ahead of the rest of the world in software engineering, this situation is starting to change. The EEC countries and the Japanese are catching up and may be ahead in achieving high quality levels. Currently, the Japanese outstrip the U.S. in quality and productivity for relatively simple software systems, and they are now working on the engineering of more complex systems. The EEC countries have launched major initiatives in software engineering, including applying mathematical techniques to software, and are now ahead of the U.S. in this and other areas. The center of gravity of software engineering research in general may now have shifted to Europe.
Why the Problems
Although major initiatives are currently missing, certainly a great deal of effort has been and still is being applied to these problems. Why are we still having trouble building embedded software?
One answer to this question is that we have made progress, but the problems we are facing are increasing at a faster rate. The term "software crisis" to describe the problems of software engineering was introduced in the late 1960s and still is being used. However, this usage is misleading. Today we have relatively few problems building the typical software systems of the 1960s. Man's reach always outdistances his grasp: as we learn how to build one type of software system successfully, we immediately want to accomplish more.
But we cannot blame all our limitations on increasing expectations. Although a large number of researchers have been working on software engineering, their results have had limited use in real systems. There may be several reasons for this.
First, academic researchers have concentrated on the mathematical aspects of problems and solutions while ignoring human factors and the necessarily informal aspects of software development. While mathematical techniques are useful in some parts of the process, informal techniques will always be a large part (if not the majority) of any software development effort, and, indeed, most engineering projects in general. Researchers often focus exclusively on formal or on informal aspects of software development without considering their interaction.