Fault Tolerance

Fault tolerance is the ability of a system to operate uninterrupted despite component failures, ensuring no service breaks and complete recovery. It involves trade-offs between cost and fault tolerance levels, with phases including error detection, damage confinement, and error recovery. Various types of faults, such as processor and network faults, require different fault tolerance mechanisms like replication and process-level redundancy to maintain system attributes like availability, reliability, safety, and maintainability.

Uploaded by

mumar033423

FAULT TOLERANCE

A system's ability to continue operating without interruption despite the failure of one or more of its components.
 How an OS responds to, and tolerates, malfunctions and failures.
 It guarantees no break in service.
 It recovers from failures completely and transparently.

FAULT TOLERANCE
 Every gain in fault tolerance comes with a drawback somewhere else.
 The system may become slower, use more disk space, require more machines, and incur other costs.
 Therefore, fault tolerance is always a trade-off between cost and the degree of fault tolerance.

FAILURE VS ERROR
 Failure: the system's behavior differs from its expected behavior.
 A failure might involve the system being unreachable or producing incorrect output.
 Error: an incorrect state of the system that may lead to a failure.
 Errors do not necessarily cause failures; they can be detected in the system before they produce a failure.

FAULT TOLERANCE
Fault tolerance usually runs through several phases:
Error detection: the error has to be detected in order to avoid a failure.
Damage confinement: the error must be prevented from spreading to other components.
Error recovery: the error must be removed; otherwise the system will run into a failure.
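The three phases can be sketched for a single component whose state is guarded by a checksum. This is a minimal illustration, not a method taken from these notes: the `Component` class, the CRC-32 check (from Python's standard `zlib` module), the `trusted` set and the saved checkpoint are all hypothetical choices made for the example.

```python
import zlib


class Component:
    """Hypothetical component whose state is protected by a CRC-32 checksum."""

    def __init__(self, name: str, data: bytes):
        self.name = name
        self.data = bytearray(data)
        self.checksum = zlib.crc32(data)   # checksum of the known-good state
        self.checkpoint = bytes(data)      # saved copy used for recovery

    def has_error(self) -> bool:
        # Error detection: the current state no longer matches its checksum.
        return zlib.crc32(self.data) != self.checksum


def handle(component: Component, trusted: set) -> bool:
    """Apply detection, confinement and recovery; return True if an error was found."""
    if not component.has_error():                      # 1. error detection
        return False
    trusted.discard(component.name)                    # 2. damage confinement: others stop using it
    component.data = bytearray(component.checkpoint)   # 3. error recovery: restore the good state
    if not component.has_error():
        trusted.add(component.name)                    # re-admit only once the state verifies
    return True
```

Removing the component from `trusted` before repairing it is what keeps the corrupted state from spreading to other components in the meantime.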

PROCESSOR FAULT
A processor fault occurs when a processor behaves in an unexpected manner. Processor faults may be classified into three kinds:
1. Fail-stop: the processor fails completely and never responds again; neighboring processors can detect the failed processor.
2. Slowdown: the processor may run in a degraded fashion or may fail completely.
3. Byzantine: the processor can fail, run in a degraded fashion for some time, or execute at normal speed while trying to make the computation fail.
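Neighbors commonly detect a fail-stop processor through heartbeats and timeouts. The sketch below is a hypothetical timeout-based monitor; the class and method names, and the explicit `now` timestamps (used instead of a real clock so the example is deterministic), are assumptions.

```python
class HeartbeatMonitor:
    """Hypothetical timeout-based failure detector for fail-stop processors."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}   # processor id -> time of its last heartbeat

    def heartbeat(self, proc: str, now: float) -> None:
        # Record that `proc` was alive at time `now`.
        self.last_seen[proc] = now

    def suspected(self, now: float) -> set:
        # A processor silent for longer than the timeout is suspected to have failed.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}
```

A timeout cannot tell a slowed-down processor from a fail-stop one, and a Byzantine processor can keep sending heartbeats while computing wrong results, so a detector like this only fits the fail-stop model.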

NETWORK FAULTS
Network faults occur when processors are prevented from communicating with each other. Link faults can cause new kinds of problems, such as:
One-way links: one processor can send messages to another but cannot receive messages from it.
Network partition: a portion of the network is completely isolated from the rest.
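One way to picture both link problems is to model each working link as a directed pair `(sender, receiver)`. The sketch below uses hypothetical helper names and assumes that two processors can only cooperate over links that work in both directions; it finds the one-way links and groups the processors into partitions.

```python
def one_way_links(links: set) -> set:
    """Pairs (a, b) where a can send to b but b cannot send back to a."""
    return {(a, b) for (a, b) in links if (b, a) not in links}


def partitions(nodes: set, links: set) -> list:
    """Group processors that can exchange messages both ways, directly or via relays."""
    # Only two-way links let a pair of processors actually talk to each other.
    adj = {n: set() for n in nodes}
    for a, b in links:
        if (b, a) in links:
            adj[a].add(b)
    groups, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:                      # depth-first search over two-way links
            n = stack.pop()
            if n in group:
                continue
            group.add(n)
            stack.extend(adj[n] - group)
        seen |= group
        groups.append(group)
    return groups
```

For example, with links `{(A, B), (B, A), (C, D), (D, C), (B, C)}`, `(B, C)` is a one-way link and the processors split into the partitions `{A, B}` and `{C, D}`.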

ATTRIBUTES OF FAULT TOLERANT SYSTEM


A fault-tolerant system is a dependable system, which requires the following attributes:
1. Availability: the system is in a ready state and ready to deliver its functions. Availability refers to a given instant in time: a highly available system is most likely working at any given moment.
2. Reliability: the ability of a system to run continuously without failure. Unlike availability, it is defined over a time interval rather than at an instant. A highly reliable system works continuously without interruption.
3. Safety: even when the system fails to carry out its processes correctly and its operations are incorrect, nothing catastrophic happens, and the failure does not cause other systems to become faulty.
4. Maintainability: failures can be noticed and fixed easily.
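Availability and reliability are often quantified with the mean time to failure (MTTF) and the mean time to repair (MTTR). These formulas are common textbook definitions rather than part of these notes, and the reliability function assumes a constant failure rate (an exponential failure model).

```python
import math


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: fraction of time the system is ready to deliver service."""
    return mttf / (mttf + mttr)


def reliability(t: float, mttf: float) -> float:
    """Probability of running for an interval of length t without any failure,
    assuming a constant failure rate of 1 / MTTF (exponential model)."""
    return math.exp(-t / mttf)
```

With MTTF = 1000 h and MTTR = 10 h, availability is about 0.99, yet the probability of running 1000 hours without any failure is only e^-1, about 0.37. This is the instant-versus-interval distinction above: a system can be highly available without being highly reliable.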

Types of failure:

CLASSIFICATION OF FAILURE
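Transient: occurs once and then disappears; repeating the operation usually succeeds.
Intermittent: occurs, vanishes on its own, then reappears, often unpredictably.
Permanent: persists until the faulty component is repaired or replaced.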
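A common way to cope with transient faults is simply to retry the operation. The sketch below is a hypothetical helper: its name, the use of `OSError` as the fault signal, and the exponential backoff policy are assumptions. It masks transient faults, may mask some intermittent ones, and lets a permanent fault surface after the last attempt.

```python
import time


def retry(operation, attempts: int = 3, delay: float = 0.0):
    """Call operation(); on failure, wait and try again, up to `attempts` times."""
    for i in range(attempts):
        try:
            return operation()
        except OSError:
            if i == attempts - 1:
                raise                        # the fault persisted: treat it as permanent
            time.sleep(delay * (2 ** i))     # exponential backoff between attempts
```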

FAULT TOLERANCE MECHANISMS IN DISTRIBUTED SYSTEMS


Replication-based fault tolerance technique.
Process-level redundancy technique.
Fusion-based redundancy technique.

REPLICATION-BASED FAULT TOLERANCE TECHNIQUE


Replicate the data on other machines, so that the failure of one machine does not stop the whole system.
Replicate the data on different servers.
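A minimal sketch of the idea, assuming an in-memory key-value store in which each "server" is just a Python dictionary (the class and method names are hypothetical): every write goes to all live replicas, and a read can be served by any live replica.

```python
class ReplicatedStore:
    """Hypothetical key-value store that keeps a full copy of the data on every server."""

    def __init__(self, n_servers: int):
        self.replicas = [dict() for _ in range(n_servers)]
        self.alive = [True] * n_servers

    def put(self, key, value):
        # Write the same data to every live replica.
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:
                replica[key] = value

    def get(self, key):
        # Any live replica can answer, so one crashed server does not stop the system.
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:
                return replica[key]
        raise RuntimeError("all replicas have failed")

    def crash(self, i: int):
        self.alive[i] = False
```

A crashed server that later rejoined would hold stale data, which is exactly the consistency problem listed below.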

Problems of replication:
Consistency: the major problem of replication is keeping the replicas consistent, because any client may update the data.
Consistency of data is ensured by a consistency model, such as the sequential or causal memory consistency model.
Degree of replication: a large number of replicas is needed in order to achieve high fault tolerance.
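To see why the degree of replication grows with the fault tolerance target, assume each replica fails independently with probability p. The data is lost only if all n replicas fail, which happens with probability p^n. The hypothetical helper below finds the smallest n that meets a target survival probability; the independence assumption is ours, not the notes'.

```python
def replicas_needed(p_fail: float, target: float) -> int:
    """Smallest n such that at least one of n replicas survives with probability >= target,
    assuming each replica fails independently with probability p_fail."""
    if not (0 <= p_fail < 1) or not (0 < target < 1):
        raise ValueError("need 0 <= p_fail < 1 and 0 < target < 1")
    n = 1
    while 1 - p_fail ** n < target:   # probability that not all n replicas fail
        n += 1
    return n
```

For example, with p = 0.05 a target of 0.9999 needs 4 replicas.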

PROCESS-LEVEL REDUNDANCY TECHNIQUES


Faults that disappear without anything being done are called transient faults. This type of fault is hard to identify.
To handle transient faults, software-based fault tolerance techniques are used.
Process-level redundancy (PLR) compares redundant copies of a process to ensure correct execution.
Checkpointing and rollback are popular techniques in which the current state of the system is saved, so that after a fault the system can roll back to that saved state.
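The sketch below combines comparison of redundant copies with checkpoint and rollback, under simplifying assumptions: the "redundant processes" are just repeated calls to a hypothetical deterministic `step` function, results must be hashable so they can be compared, and majority voting is added so that a single faulty copy can be outvoted. Real PLR runs separate OS processes; this only illustrates compare, roll back and retry.

```python
import copy
from collections import Counter


def run_redundant(step, state, copies=3, max_retries=2):
    """Run step(state) in several redundant copies and compare their results.

    If a majority of copies agree, that result is committed. Otherwise the
    disagreement is treated as a transient fault: execution rolls back to the
    checkpoint and the step is re-executed.
    """
    checkpoint = copy.deepcopy(state)        # checkpoint: save the current state
    for _ in range(max_retries + 1):
        # Each copy starts from a fresh copy of the checkpoint (rollback).
        results = [step(copy.deepcopy(checkpoint)) for _ in range(copies)]
        value, votes = Counter(results).most_common(1)[0]
        if votes > copies // 2:              # a strict majority agrees
            return value
    raise RuntimeError("redundant copies never agreed; the fault may be permanent")
```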

FUSION-BASED TECHNIQUE


Replication: its downside is that it needs multiple backups, which increases cost.
The fusion-based technique addresses this problem because it requires fewer backups.
Backup machines are generated by fusing a given set of machines (an NP problem).
The fusion-based technique has very high overhead during the recovery process, so it is acceptable only when the probability of faults in the system is low.
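A simple way to see how fusion needs fewer backups is XOR parity, the same idea RAID uses. The sketch below assumes each primary's state is a list of integers of equal length and that at most one primary crashes at a time; the function names are hypothetical, and this is only one basic instance of a fused backup, not a general fusion algorithm.

```python
from functools import reduce


def fuse(primaries: list) -> list:
    """Build one fused backup: the element-wise XOR of equally sized integer lists."""
    return [reduce(lambda a, b: a ^ b, column) for column in zip(*primaries)]


def recover(fused: list, survivors: list) -> list:
    """Rebuild the single failed primary by XOR-ing the fused backup with all survivors."""
    return fuse([fused] + survivors)
```

Three primaries would need three backup copies under replication but only one fused backup here. Recovery, however, must read every surviving primary, which reflects the high recovery overhead mentioned above, and this particular sketch tolerates the loss of only one primary.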
