Self-healing systems

**jaseela123** · 28-08-2017, 04:17 PM

In software systems, the term self-healing describes any application, service or system that can detect that it is not functioning properly and, without human intervention, make the changes needed to restore itself to its normal or designed state. Self-healing means making the system capable of making its own decisions by continually checking and optimizing its state and automatically adapting to changing conditions. The goal is a fault-tolerant and responsive system that can react to changes in demand and recover from failures.
Self-healing systems can be divided into three levels, depending on the size and type of resources we are monitoring and acting on. These levels are as follows.
• Application level
• System level
• Hardware level

We will explore each of these three types separately.

Self-healing at the application level

Application-level healing is the ability of an application, or service, to heal itself internally. Traditionally, we are accustomed to catching problems through exceptions and, in most cases, logging them for further examination. When such an exception occurs, we tend to ignore it and move on (after logging it), as if nothing had happened, hoping for the best in the future. In other cases, we tend to stop the application if an exception of a certain type occurs. An example would be a connection to a database. If the connection is not established when the application starts, we often stop the entire process. If we are a little more experienced, we might retry the connection to the database. Those attempts should be limited; otherwise we can easily end up in an endless loop, unless the connection failure is temporary and the database comes back online soon after.

Over time, we got better ways to deal with problems within applications. One of them is Akka. Its use of supervisors, and the design patterns it promotes, allow us to create internally self-healing applications and services. Akka is not the only one. Many other libraries and frameworks allow us to create fault-tolerant applications capable of recovering from potentially disastrous circumstances. Since we are trying to be agnostic to programming languages, I will leave it to you, dear reader, to research ways to self-heal your applications internally.

Note that self-healing in this context refers to internal processes and does not provide, for example, recovery of failed processes. In addition, if we adopt a microservices architecture, we can quickly end up with services written in different languages, using different frameworks, and so on. It is really up to the developers of each service to design it in a way that lets it heal itself and recover from failures.
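
As an illustration of the limited connection attempts described above, here is a minimal Python sketch. It is not tied to any particular database driver: `connect_with_retry`, `ConnectionFailed` and the `connect` callable are hypothetical names introduced only for this example, and the exponential delay between attempts is an assumption, not something the post prescribes.

```python
import time


class ConnectionFailed(Exception):
    """Raised when every connection attempt has failed (hypothetical name)."""


def connect_with_retry(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `connect` until it succeeds or `max_attempts` is reached.

    `connect` is any zero-argument callable that returns a connection or
    raises an exception. The attempt limit is what prevents an endless loop.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception as error:  # real code would catch the driver's own error type
            last_error = error
            if attempt < max_attempts:
                # Wait longer after each failure: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise ConnectionFailed(f"gave up after {max_attempts} attempts") from last_error
```

If the database is only temporarily unavailable, one of the later attempts succeeds and the application carries on. If it stays down, the caller receives a single clear error and can decide whether to stop the process.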

Self-healing at the system level

Unlike application-level healing, which depends on the programming language and the design patterns we apply internally, system-level self-healing can be generalized and applied to all services and applications, regardless of their internals. This is the kind of self-healing we can design at the level of the whole system. While there are many things that can happen at the system level, the two most commonly monitored aspects are process failures and response time. If a process fails, we need to redeploy the service or restart the process. On the other hand, if the response time is not adequate, we need to scale up or down, depending on whether we have reached the upper or lower response time limit. Recovering from process failures is often not enough. While such actions can restore our system to the desired state, human intervention is often still necessary. We need to investigate the cause of the failure, and then correct the service design or fix a bug. That is, self-healing often goes hand in hand with investigating the causes of the failure. The system recovers automatically and we (humans) try to learn from those failures and improve the system as a whole. For that reason, some kind of notification is also required. In both cases (process failure and increased traffic), the system needs to monitor itself and take some action.
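
The two checks described above (process failure and response time limits) can be sketched as a small decision function. This is only an illustration under stated assumptions: `ServiceStatus`, `decide_action`, `heal` and the threshold values are hypothetical, and a real system would obtain the status from its monitoring tool and perform the action through its scheduler or orchestrator.

```python
from dataclasses import dataclass


@dataclass
class ServiceStatus:
    """Snapshot of one service, as a monitoring tool might report it (hypothetical)."""
    name: str
    running: bool
    avg_response_ms: float
    instances: int


def decide_action(status, lower_ms=100.0, upper_ms=500.0, min_instances=1):
    """Return the healing action for a service; the thresholds are assumed values."""
    if not status.running:
        return "restart"      # process failure: restart or redeploy
    if status.avg_response_ms > upper_ms:
        return "scale_up"     # upper response time limit reached
    if status.avg_response_ms < lower_ms and status.instances > min_instances:
        return "scale_down"   # lower limit reached and there is something to remove
    return "none"


def heal(status, notify=print):
    """Decide on an action and notify humans so the cause can be investigated."""
    action = decide_action(status)
    if action != "none":
        notify(f"{status.name}: {action}")
    return action
```

Keeping the decision separate from the notification makes it easy to run the same check periodically for every service, whatever language the service itself is written in.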

Self-healing at the hardware level

In fact, there is no such thing as self-healing hardware. We cannot have a process that automatically heals failed memory, repairs a broken hard disk, fixes a faulty CPU, and so on. What healing really means at this level is the redistribution of services from an unhealthy node to one of the healthy ones. As with the system level, we need to periodically check the status of the different hardware components and act accordingly. In fact, most of the healing caused by hardware-level problems will occur at the system level. If the hardware does not work properly, it is likely that the service will fail, and the failure will therefore be handled by system-level healing. Hardware-level healing is more related to the preventive types of checks we will discuss shortly.
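
The redistribution idea can be sketched in a few lines of Python. The `redistribute` function, the node names and the "least loaded node first" rule are assumptions made for this example; in practice a cluster scheduler would make this decision.

```python
def redistribute(placement, healthy):
    """Move services away from nodes that failed their hardware checks.

    placement: dict mapping node name -> list of services running on it.
    healthy:   set of node names that passed the periodic checks.
    Returns a new placement that contains only healthy nodes.
    """
    targets = {node: list(services)
               for node, services in placement.items() if node in healthy}
    if not targets:
        raise RuntimeError("no healthy node available")
    for node, services in placement.items():
        if node in healthy:
            continue
        for service in services:
            # Pick the healthy node currently running the fewest services.
            least_loaded = min(targets, key=lambda n: len(targets[n]))
            targets[least_loaded].append(service)
    return targets


placement = {"node-1": ["api", "db"], "node-2": ["cache"], "node-3": []}
print(redistribute(placement, healthy={"node-2", "node-3"}))
# → {'node-2': ['cache', 'db'], 'node-3': ['api']}
```

The hardware itself is not repaired here. The services simply move to nodes that still work, and the failed node is left for a human to fix or replace.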

[Image: health-hardware.png?w=625&h=230]

