Fault Tolerance PPT

**project girl** · 16-01-2013, 04:50 PM

Concepts of Fault Tolerance

Hardware, software and networks cannot be made totally free from failures
Fault tolerance is a non-functional (QoS) requirement: the system must continue to operate even in the presence of faults
Fault tolerance should be achieved with minimal involvement of users or system administrators
Distributed systems can be more fault tolerant than centralized systems, but with more hosts, individual faults are likely to occur more often
Partial failure is characteristic of a distributed system: some components fail while the rest keep running

Attributes of a Dependable System

System attributes:
· Availability – probability that the system is ready for use at a given time
· Reliability – ability of the system to run continuously without failure for a given period of time
· Safety – when the system fails, nothing catastrophic happens
· Maintainability – how easily a failed system can be repaired
Failure in a distributed system = when a service cannot be fully provided
System failure may be partial
A single failure may affect other parts of a system (failure escalation)
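
The availability and reliability attributes above can be put into numbers. The following is a minimal sketch, not taken from the slides: it assumes the common steady-state formula availability = MTTF / (MTTF + MTTR) and, for reliability, a constant failure rate (exponential model). The function names and the sample figures are illustrative assumptions.

```python
import math


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is ready for use: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(t_hours: float, mttf_hours: float) -> float:
    """Probability of running t_hours without failure.

    Assumes a constant failure rate of 1 / MTTF (exponential model).
    """
    return math.exp(-t_hours / mttf_hours)


# Hypothetical figures: fails on average every 999 h, takes 1 h to repair.
a = steady_state_availability(999.0, 1.0)
r = reliability(1000.0, 1000.0)
print(f"availability = {a:.3f}")
print(f"reliability over 1000 h = {r:.3f}")
```

The two numbers measure different things: a system that fails briefly but often can be highly available and still unreliable over a long run.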

Strategies to Handle Faults

Fault avoidance
Techniques that aim to prevent faults from entering the system during the design stage
Fault removal
Methods attempt to find faults within a system before it enters service
Fault detection
Techniques used during service to detect faults within the operational system
Fault tolerance
Techniques designed to tolerate faults, i.e. to allow the system to operate correctly in the presence of faults.

Example: Space Shuttle

Uses 5 identical computers which can be assigned to redundant operation under program control.
During critical mission phases (boost, re-entry and landing), 4 of its 5 computers operate in an NMR (N-modular redundancy) configuration, receiving the same inputs and executing identical tasks. When a failure is detected, the computer concerned is switched out of the system, leaving a TMR (triple modular redundancy) arrangement.
The fifth computer performs non-critical tasks in simplex mode but in extreme cases may take over critical functions. It runs "diverse" software and could be used if a systematic fault were discovered in the other four computers.
The shuttle can tolerate up to two computer failures; after a second failure it operates as a duplex system and uses comparison and self-test techniques to survive a third fault.
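
The NMR-to-TMR-to-duplex degradation described above can be illustrated with a simple majority voter. This is only a sketch of the general voting idea, not the Shuttle's actual implementation; the unit names (`gpc1` ... `gpc4`) and the `vote` function are hypothetical.

```python
from collections import Counter


def vote(outputs):
    """Return (majority_value, ids_of_disagreeing_units).

    `outputs` maps a unit id to the value that unit produced.
    Raises ValueError when no value is held by a strict majority.
    """
    counts = Counter(outputs.values())
    value, n = counts.most_common(1)[0]
    if 2 * n <= len(outputs):
        raise ValueError("no majority")
    faulty = sorted(uid for uid, v in outputs.items() if v != value)
    return value, faulty


# Four redundant units receive the same inputs; one disagrees.
active = {"gpc1": 42, "gpc2": 42, "gpc3": 41, "gpc4": 42}
value, faulty = vote(active)
for uid in faulty:
    del active[uid]  # switch the failed unit out, leaving three (TMR)

print(value, faulty, sorted(active))
```

With only two units left, a disagreement produces no majority and `vote` raises `ValueError`. That is why a duplex system has to fall back on comparison and self-test rather than voting.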

Process Groups

Organize several identical processes into a group
When a message is sent to a group, all members of the group receive it
If one process in a group fails (for whatever reason), some other process can hopefully take over for it
The purpose of introducing groups is to allow processes to deal with collections of processes as a single abstraction.
An important design issue is how to reach agreement within a process group when one or more of its members cannot be trusted to give correct answers.
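
The group abstraction can be sketched in a few lines. This is an in-memory illustration only, with no real networking and no agreement protocol; the class names `Member` and `ProcessGroup` and the "first live member acts for the group" rule are assumptions made for the example.

```python
class Member:
    """One process in the group (simulated in memory)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.inbox = []

    def deliver(self, msg):
        self.inbox.append(msg)


class ProcessGroup:
    """Lets a sender treat several identical processes as one unit."""

    def __init__(self, members):
        self.members = list(members)

    def live_members(self):
        return [m for m in self.members if m.alive]

    def send(self, msg):
        """Deliver msg to every live member; return how many received it."""
        live = self.live_members()
        for m in live:
            m.deliver(msg)
        return len(live)

    def coordinator(self):
        """The first live member acts for the group; the next takes over if it fails."""
        live = self.live_members()
        if not live:
            raise RuntimeError("all group members have failed")
        return live[0]


group = ProcessGroup([Member("p1"), Member("p2"), Member("p3")])
delivered = group.send("update x=1")
group.members[0].alive = False          # p1 crashes
successor = group.coordinator().name    # another member takes over
print(delivered, successor)
```

The sketch assumes crashed members are reliably detected. Handling members that give wrong answers instead of stopping requires an agreement protocol, which is not shown here.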

Reliable Communication

Fault tolerance in a distributed system must take communication failures into account.
A communication channel may exhibit crash, omission, timing, and arbitrary failures.
Reliable point-to-point communication is established by a reliable transport protocol, such as TCP.
In the client/server model, RPC/RMI semantics must be satisfied in the presence of failures.
In a process group architecture or a distributed replication system, a reliable multicast/broadcast service is very important.
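
One way to preserve RPC semantics under omission failures is to have the client resend after a timeout while the server filters duplicates by request id, so the operation executes at most once. The sketch below simulates this in memory; `LossyChannel`, `Server`, `call_with_retry` and the scripted drop patterns are all hypothetical names and values chosen for illustration, and a skipped loop iteration stands in for a real timeout.

```python
class LossyChannel:
    """Drops messages according to a fixed, scripted pattern (True = drop)."""

    def __init__(self, drop_pattern):
        self._drops = iter(drop_pattern)

    def dropped(self):
        # Once the script is exhausted, nothing more is dropped.
        return next(self._drops, False)


class Server:
    def __init__(self):
        self.balance = 0
        self._replies = {}  # request id -> cached reply

    def deposit(self, request_id, amount):
        if request_id in self._replies:
            # Duplicate request: return the cached reply, do not re-execute.
            return self._replies[request_id]
        self.balance += amount
        self._replies[request_id] = self.balance
        return self.balance


def call_with_retry(server, requests, replies, request_id, amount, max_tries=5):
    for _ in range(max_tries):
        if requests.dropped():
            continue  # request lost: client times out and resends
        reply = server.deposit(request_id, amount)
        if replies.dropped():
            continue  # reply lost: server already executed, client resends
        return reply
    raise TimeoutError("no reply after retries")


server = Server()
result = call_with_retry(
    server,
    LossyChannel([True, False, False]),  # first request is lost
    LossyChannel([True, False]),         # first reply is lost
    "req-1",
    100,
)
print(result, server.balance)
```

Without the reply cache, the resend after the lost reply would deposit the amount twice. Retries alone give at-least-once behaviour, which is only safe for idempotent operations.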

Forward Recovery (Exception)

Exceptions
System states that should not occur
Exceptions are either
predefined (e.g. array index out of bounds, divide by zero), or
explicitly declared by the programmer
Raising an exception
The action of indicating that such a state has occurred
Performed when the state is detected during execution of the program
Exception handler
Code to be executed when an exception is raised
Declared by the programmer
Performs the recovery action
Supported by several programming languages
Ada, ISO Modula-2, Delphi, Java, C++.
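
The same mechanism exists in Python, which is used here for a short sketch of forward recovery with exceptions. The scenario (a temperature sensor with a valid range of -50 to 150) and the names `SensorReadError`, `read_temperature` and `average_temperature` are made up for the example.

```python
class SensorReadError(Exception):
    """Programmer-declared exception: the sensor returned an impossible value."""


def read_temperature(raw):
    # Raise the exception as soon as the bad state is detected.
    if not -50.0 <= raw <= 150.0:
        raise SensorReadError(f"reading out of range: {raw}")
    return raw


def average_temperature(raw_readings, last_good):
    total = 0.0
    for raw in raw_readings:
        try:
            value = read_temperature(raw)
            last_good = value
        except SensorReadError:
            # Handler: forward recovery. Substitute the last good value
            # and carry on instead of rolling back to an earlier state.
            value = last_good
        total += value
    # An empty list triggers the predefined ZeroDivisionError here.
    return total / len(raw_readings)


avg = average_temperature([20.0, 999.0, 22.0], last_good=20.0)
print(f"{avg:.2f}")
```

The handler moves the computation forward to a new acceptable state. This only works when the programmer anticipated the fault and knows what a reasonable substitute state looks like.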
