|
System Fault Tolerance
|
System Fault Tolerance is the methodology that enables a control, a system, or a device to recover from failure.
|
System Fault Tolerance is the cornerstone on which all systems are developed. From the simplest to the most complex, primary business commitments to customer service require that responses to customer requests are not interrupted when a system fails. For example, does your company close when you cannot take an order by computer? Do you stop selling when the phone lines go down and you cannot process credit cards? Hopefully not. If you continue to operate somehow, by writing down orders manually or processing credit cards by hand, you already have a level of Fault Tolerance.
||
|
SFT Levels
Level I - Simple Redundancy
A single component of a device, control, subsystem or process is mirrored to another single component of a device, control, subsystem or process. An example of such redundancy is the manual credit card form. When the phones go down, your business does not stop taking orders. When a customer comes in, you can manually generate a sales receipt from a credit card form. The accounting department is not closed, just hampered.

Level II - Device Redundancy
A single device, control, subsystem or process is mirrored to another device, control, subsystem or process, such as RAID. In this case there are several registers, and possibly several phone lines devoted to credit card machines. This prevents the failure of one device from affecting the entire accounting department.

Level III - Process Redundancy
An entire process is mirrored to another complete process. There may be similar systems in several locations. If the credit card machines are out at one location, someone can call or go to another location, which can complete the sales transaction. Accounting processes are duplicated as well, so other individuals have the authority to make decisions if a manager is out sick or a computer is down.

Level IV - System Redundancy
An entire system is mirrored to another redundant system. In this example, not only is the credit card processing system redundant, but the entire credit department or accounting department is duplicated at another store. Sales and management functions can then be picked up by the duplicate when a failure occurs, so customers are not discouraged from making purchases.

Level V - Operational Redundancy
An entire operation is mirrored to another complete operation. This is done on a larger scale, when the cost of duplication is outweighed by the potential loss of sales. In this instance, the entire business operation, from management to shipping and receiving, can be managed by another location with little or no business interruption.
Key personnel are replicated throughout the system, and any failure is taken over by another system so that the customer never loses confidence in the business transaction. From the salesperson on the sales floor to the CIO, policies are in place to protect the sale or the customer interface. This becomes more critical and more developed with online businesses, because the hardware solutions are more easily obtainable.
||
|
Failure is almost inevitable in devices and processes. Minimizing that failure, or preventing it from becoming a business interruption, is considered Disaster Avoidance. System Fault Tolerance is just a single facet of Disaster Avoidance.
|
|||
|
EXAMPLES OF HARDWARE SFT
RAID is not System Fault Tolerance; however, it can be used to raise the initial level of Fault Tolerance to SFT II. By the nature of its design, though, it cannot duplicate or mitigate deficiencies at other levels of SFT.
|||
An example of Level I - Simple Redundancy can be found in a standard network operating system today.
On the hard drive, the File Allocation Table (FAT) is like the card catalog at the library. It records the location of every file, along with subsequent modifications and their locations. The FAT is duplicated in case of an unexpected system shutdown. If the system is not properly shut down, the operating system can go back and compare the two copies of the FAT to control the damage.
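The idea of comparing two copies of an allocation table can be illustrated with a small sketch. This is a toy model only: the list-based tables and the function names `compare_tables` and `repair_table` are assumptions made for illustration, and real file systems differ in how they use and reconcile the backup copy.

```python
def compare_tables(primary, backup):
    """Return the indexes where the two table copies disagree."""
    if len(primary) != len(backup):
        raise ValueError("table copies must be the same length")
    return [i for i, (a, b) in enumerate(zip(primary, backup)) if a != b]


def repair_table(primary, backup, trusted="backup"):
    """Overwrite mismatched entries in the untrusted copy (hypothetical policy)."""
    if trusted == "backup":
        source, target = backup, primary
    else:
        source, target = primary, backup
    for i in compare_tables(primary, backup):
        target[i] = source[i]
    return target


# Toy tables: each entry is an illustrative allocation record, 0 meaning "free".
primary = [2, 3, 0, 0, 5, 0]
backup = [2, 3, 4, 0, 5, 0]  # entry 2 differs after a simulated bad shutdown

mismatches = compare_tables(primary, backup)
print("entries that disagree:", mismatches)

repair_table(primary, backup, trusted="backup")
print("copies match after repair:", primary == backup)
```

Which copy to trust after a crash is a policy decision; the sketch simply takes it as a parameter rather than claiming how any particular operating system decides.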
||
|
Level II - Device Redundancy is mirroring the entire drive to another drive.
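As a rough illustration of drive mirroring, the sketch below writes every block to two in-memory "drives" and reads from whichever one is still healthy. The `MirroredDrive` class and its methods are hypothetical names used only for this example; it models the concept, not any real RAID controller.

```python
class MirroredDrive:
    """Toy mirror: every write goes to all healthy drives."""

    def __init__(self, copies=2):
        # Each "drive" is just a dict mapping block number to data.
        self.drives = [dict() for _ in range(copies)]
        self.failed = set()

    def write(self, block, data):
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[block] = data

    def fail(self, index):
        """Simulate the loss of one drive."""
        self.failed.add(index)

    def read(self, block):
        # Return the block from the first healthy drive that holds it.
        for index, drive in enumerate(self.drives):
            if index not in self.failed and block in drive:
                return drive[block]
        raise OSError("no healthy drive holds this block")


mirror = MirroredDrive()
mirror.write(0, b"order #1001")
mirror.fail(0)               # the first drive dies
recovered = mirror.read(0)   # the data is still served by the second drive
print(recovered)
```

The sketch also shows the limit of this level: if every drive in the mirror fails, the read raises an error, which is why the higher SFT levels exist.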
||
For more information, contact Marloe Group, and let us show you how some simple things can make a big difference in your IT performance.
|
Level III - Process Redundancy is the mirroring of an entire process, such as an entire server mirrored to another server. This level of SFT was significant several years ago; however, its importance began to wane when the hardware-mirroring technology proved flawed.
||||
|
In an SFT III mirrored-server solution, the failure was often not a hardware failure, which is what SFT III was designed for. When a software failure occurred, it was duplicated to the other server, and the result was a business interruption anyway. SFT III is now considered clustering. Instead of a single server standing in for another server, there can be a server farm: several devices that can stand in for each other in the event one fails. This solution also allows for load balancing and other benefits.
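The clustering idea, where several servers stand in for each other and share the load, can be sketched as a simple round-robin router that skips servers marked as down. The `Cluster` class and the server names are assumptions made for illustration; production clusters add health checks, state replication, and much more.

```python
from itertools import count


class Cluster:
    """Toy server farm: round-robin over whichever servers are healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._counter = count()

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Pick the next healthy server, spreading requests evenly."""
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        return candidates[next(self._counter) % len(candidates)]


cluster = Cluster(["app1", "app2", "app3"])   # hypothetical server names
before = [cluster.route() for _ in range(3)]

cluster.mark_down("app2")                     # one server fails
after = [cluster.route() for _ in range(4)]   # the others absorb its traffic

print("before failure:", before)
print("after failure: ", after)
```

Note that this only covers a server that stops responding. A software fault that is present on every server in the farm would still need to be handled separately, which is the weakness of the mirrored-server approach described above.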
||||
|
Level IV - System Redundancy is an entire system mirrored to another redundant system. In most cases, this means a single set of servers or processes, within the IT department for example, is mirrored to another site, usually a third-party data center, where identical systems run concurrently.
||
Failures that occur in this scenario are limited because users can quickly access the offsite equipment; however, business interruption is still possible. It often takes time to move employees to another 'Hot' site to restart operations.
How can we help?
Experience - Our expertise has been called upon by the U.S. State Department as well as Banks and Computer Manufacturers based in Houston.
FSDM - Our methodology for IT has been built up from 20 years of computer networking experience, and what makes us different is we wrote it all down.
Managed Services - Our solution will allow you to manage your business while we take care of the tools that make your business run. Our experience guarantees your IT department will be a better, more responsible department. Our policies and procedures give you tools to see what is really going on.
Documentation - We document our work, and your network. If things go bad and you have to evacuate the building, there are procedures for shutdown, for evacuating equipment, and for protecting data, written so that anyone can figure it out.
Minimize your risk, call Marloe Group today.
|||||||