The word “resilience” means “the ability to recover quickly from difficulties; toughness”. In the world of microservices, it is very common for services to fail, but failure in itself is not the problem. Our aim should be to make the system recover quickly, to avoid cascading failures, and to keep the other microservices unaffected by the failure of a particular microservice (apply this mantra in your real life also :p). Adding resilience to our microservices is not mandatory, but it is good practice because it improves the user experience.

Let us understand a few things about resilience before moving on to Hystrix.
Microservices Cascading failure
Suppose there is a system of microservices in which Microservice1 calls Microservice2, and Microservice2 in turn calls Microservice3, in order to function.

For some reason, Microservice3 fails and becomes unresponsive to incoming requests. Microservice2, however, keeps sending further requests to it and waiting for the responses, so Microservice2 also becomes slow and unresponsive. In the same way, Microservice1 becomes slow and unresponsive. This is known as a cascading failure: the failure of one component leads to the failure of another, then another, and finally the whole system goes down.



This could have been avoided if Microservice2 had stopped sending requests to Microservice3 after the first few failures and had tried again only after a certain interval. This concept is known as circuit breaking. The name comes from electrical circuits, where a fuse blows (or a breaker trips) when the current is too high, breaking the circuit and saving the electrical appliances from burning out. While the circuit is broken (open), Microservice2 still receives requests that would need Microservice3, but instead of forwarding them it sends an alternative response back to Microservice1. This alternative response is known as a fallback.
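To make the idea concrete, here is a minimal plain-Java sketch of circuit breaking with a fallback. It is not how Hystrix is implemented; the class name SimpleCircuitBreaker, its parameters and the injected clock are hypothetical and exist only for illustration. The sketch counts consecutive failures and ignores thread safety.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical, simplified circuit breaker; NOT Hystrix's implementation and not thread-safe. */
class SimpleCircuitBreaker {
    private final int failureThreshold;  // consecutive failures before the circuit opens
    private final long retryIntervalMs;  // how long to wait before trying the remote call again
    private final LongSupplier clock;    // injectable time source in milliseconds (assumption, eases testing)
    private int consecutiveFailures = 0;
    private long openedAt = -1;          // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long retryIntervalMs, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.retryIntervalMs = retryIntervalMs;
        this.clock = clock;
    }

    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        boolean open = openedAt >= 0;
        if (open && clock.getAsLong() - openedAt < retryIntervalMs) {
            // Circuit is open and the interval has not elapsed: skip the remote call.
            return fallback.get();
        }
        try {
            T result = remoteCall.get();
            // Success closes the circuit and resets the failure count.
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (open || consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // open the circuit, or re-open it after a failed trial call
            }
            return fallback.get();
        }
    }

    /** True from the moment the circuit opens until a trial call succeeds. */
    boolean isOpen() {
        return openedAt >= 0;
    }
}

class CircuitBreakerDemo {
    public static void main(String[] args) {
        long[] now = {0};          // fake clock so the demo does not need to sleep
        int[] remoteCalls = {0};   // how often the failing service was actually contacted
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 30_000, () -> now[0]);
        Supplier<String> failing = () -> {
            remoteCalls[0]++;
            throw new IllegalStateException("Microservice3 is unresponsive");
        };
        Supplier<String> fallback = () -> "fallback response";

        for (int i = 0; i < 4; i++) {
            System.out.println(breaker.call(failing, fallback)); // prints "fallback response" each time
        }
        System.out.println("remote calls attempted: " + remoteCalls[0]); // prints 2, not 4
    }
}
```

In the demo, only the first two calls reach the failing service. The remaining calls are answered by the fallback straight away, which is exactly what protects the caller from waiting on an unresponsive dependency.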

What is Netflix Hystrix?
It is an open-source resilience library by Netflix. It adds latency tolerance and fault tolerance to a distributed system. It improves the overall resilience of the system by isolating the failing services and stopping the cascading effect of failures. It provides features like circuit breaking and fallback. According to Netflix, “Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services, and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.”
Circuit break with Hystrix:
In a microservices system, when the number of errors exceeds the configured threshold, the circuit opens, breaking the further flow of requests to the faulty component. This saves the other services of the system from being affected. The system tries to send a request to the faulty component again only after a certain time interval, and if that request fails too, the circuit is opened again. Hystrix provides this circuit-breaking functionality, and settings such as the failure threshold and the waiting time are configurable. Hystrix follows the ideology of fail-fast.
Fallback with Hystrix:
It is considered a bad user experience if, whenever something fails inside the system, the error itself is reflected back to the user. When the circuit is open, we should not simply tell the user that a particular service is down and return an error response. It is better to execute alternative code and return a custom response to the user. This alternative code is known as a fallback, and Hystrix provides this functionality. Hystrix follows the ideology of fail-gracefully.
How to use Hystrix?
It is always better to understand anything practically with an example. So, we will create two microservices for our example: Microservice1 and Microservice2. Microservice1 will call Microservice2 to get a response. We will add Hystrix logic in Microservice1 and we will stop Microservice2 to see the functioning of Hystrix i.e. how it deals with failures.
Implementing Microservice1
Here is the code for implementing Microservice1:
Microservice1Application.java:
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class Microservice1Application {
    public static void main(String[] args) {
        SpringApplication.run(Microservice1Application.class, args);
    }
}
Microservice1Controller: 
It provides an endpoint /getmicro2, which internally hits the endpoint of Microservice2 and returns the response.
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;
@RestController
public class Microservice1Controller {
    @GetMapping("/getmicro2")
    public String getMicro2Instance()
    {
        String url = "http://localhost:8081/microservice2/port";
        String response = new RestTemplate().getForObject(url,  
                          String.class);
        return response;
    }
}
Microservice1's application.properties:
spring.application.name property specifies the name of the service.
spring.application.name = microservice1
Implementing Microservice2
Now let's implement the second microservice:
Microservice2Application:
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class Microservice2Application {
    public static void main(String[] args) {
        SpringApplication.run(Microservice2Application.class, args);
    }
}
Microservice2Controller:
It has an endpoint /microservice2/port, which returns a string stating the port on which the service is currently running.
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
@RequestMapping("/microservice2")
public class Microservice2Controller {
 
 @Value("${server.port}")
 private String port;
 
 @GetMapping("/port")
 public String getPort()
 {
       return "Microservice2 is running on port: "+ port;
 }
}
Microservice2's application.properties:
spring.application.name property specifies the name of the service. server.port specifies the port on which the service is running.
spring.application.name = microservice2
server.port = 8081
Hystrix's circuit breaking and fallback logic in our Microservice1
- Add the Hystrix dependency to pom.xml:
<dependency>
       <groupId>org.springframework.cloud</groupId>
       <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
 
- Add the @EnableCircuitBreaker annotation on the application class: This annotation enables the circuit breaker implementation. It tells Spring that circuit breaking is being used in this application.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
@SpringBootApplication
@EnableCircuitBreaker
public class Microservice1Application {
    public static void main(String[] args) {
        SpringApplication.run(Microservice1Application.class, args);
    }
}
 
- Specify the fallback method using the @HystrixCommand annotation: Spring Cloud Netflix Hystrix looks for methods annotated with @HystrixCommand and makes them fault tolerant. With this annotation we can specify the fallback method that is executed in case of an error, a timeout or an open circuit.
 
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
@RestController
public class Microservice1Controller {
 
     @HystrixCommand(fallbackMethod = "getMicro2InstanceFallback")
     @GetMapping("/getmicro2")
     public String getMicro2Instance()
     {
          String url = "http://localhost:8081/microservice2/port";
          String response = new RestTemplate().getForObject(url, 
                            String.class);
          return response;
     }
 
     public String getMicro2InstanceFallback()
     {
          return "Microservice2 is down!!";
     }
}
 
- Add the properties below to application.properties: Hystrix properties can also be configured with the @HystrixProperty annotation, but here we specify them in application.properties. Let’s understand the properties. Together they say: open the circuit for 30 seconds (sleepWindowInMilliseconds) if at least 2 requests (requestVolumeThreshold) are sent within 10 seconds (timeInMilliseconds) and 50% (errorThresholdPercentage) or more of them fail. After the sleep window, one request is let through, and if it fails too, the circuit is opened again for the specified sleep window. This cycle repeats until a request succeeds.
 
hystrix.command.default.circuitBreaker.requestVolumeThreshold=2
hystrix.command.default.metrics.rollingStats.timeInMilliseconds=10000
hystrix.command.default.circuitBreaker.errorThresholdPercentage=50
hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds=30000
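The rule expressed by these properties can be sketched in plain Java as follows. This is only an illustration of the decision "enough requests in the window, and a high enough share of them failed"; the class RollingWindowTripRule and its methods are hypothetical, and the sketch tracks individual timestamps for simplicity, which is an assumption of this example rather than a description of Hystrix's internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the trip rule described above; NOT Hystrix's actual code. */
class RollingWindowTripRule {
    private final int requestVolumeThreshold;    // minimum requests in the window before the rule applies
    private final int errorThresholdPercentage;  // failure percentage at or above which the circuit opens
    private final long windowMs;                 // length of the rolling window in milliseconds
    // Each entry is {timestampMs, 1 if the request failed else 0}.
    private final Deque<long[]> outcomes = new ArrayDeque<>();

    RollingWindowTripRule(int requestVolumeThreshold, int errorThresholdPercentage, long windowMs) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.windowMs = windowMs;
    }

    /** Records the outcome of one request made at the given time. */
    void record(long nowMs, boolean failed) {
        outcomes.addLast(new long[] {nowMs, failed ? 1 : 0});
    }

    /** Decides whether the circuit should open, based only on requests inside the window. */
    boolean shouldOpen(long nowMs) {
        // Forget outcomes that have fallen out of the rolling window.
        while (!outcomes.isEmpty() && nowMs - outcomes.peekFirst()[0] >= windowMs) {
            outcomes.removeFirst();
        }
        int total = outcomes.size();
        if (total < requestVolumeThreshold) {
            return false; // too few requests to judge
        }
        long failures = 0;
        for (long[] outcome : outcomes) {
            failures += outcome[1];
        }
        // Integer form of: failures / total >= errorThresholdPercentage / 100
        return failures * 100 >= (long) errorThresholdPercentage * total;
    }
}

class TripRuleDemo {
    public static void main(String[] args) {
        // Same values as the properties above: 2 requests, 50%, 10 second window.
        RollingWindowTripRule rule = new RollingWindowTripRule(2, 50, 10_000);
        rule.record(0, true);
        System.out.println(rule.shouldOpen(0));      // false: only 1 request so far
        rule.record(1_000, true);
        System.out.println(rule.shouldOpen(1_000));  // true: 2 requests, both failed
        System.out.println(rule.shouldOpen(20_000)); // false: both requests are outside the window
    }
}
```

With the values used in this article, a single failed request is never enough to open the circuit; the second failure within the 10 second window is what trips it.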
Let's try it out now. Run only Microservice1 and hit its endpoint, which internally calls Microservice2. Since Microservice2 is down, Microservice1 gets an error while connecting to it. In the console we can see the following: on the 1st request the normal method is called, an error occurs, and hence the fallback is called. On the 2nd request the same thing happens again. On subsequent requests the fallback is called directly, because the circuit is open: 2 requests were sent within 10 seconds and both failed (which is more than 50% of them). After 30 seconds, the normal method is called once again to check whether Microservice2 is up. Since it is not, the circuit is opened again for another 30 seconds.
 
Conclusion:
In this article, we have understood what resilience is, why it is important in a microservices system, and what cascading failure, circuit breaking and fallback are. We have also seen why Hystrix is required, which functionalities it provides to developers, and how to configure and use it in our microservices using a simple demo application.