Hystrix: How It Works and Hystrix in Practice #2

greatwqs · 2019-03-22T16:24:51Z

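Hystrix means "porcupine": the animal's back is covered in quills that let it protect itself. Netflix's Hystrix is a library that helps distributed systems handle timeouts and tolerate faults when services interact, and it likewise protects the system that uses it.
Hystrix evolved out of resilience engineering work that the Netflix API team began in 2011. In 2012, Hystrix continued to evolve and mature, and many teams within Netflix adopted it. Today, tens of billions of thread-isolated calls and hundreds of billions of semaphore-isolated calls are executed via Hystrix every day at Netflix. This has resulted in a dramatic improvement in uptime and resilience.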
https://github.com/Netflix/Hystrix

What problem does Hystrix solve?

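Applications in complex distributed architectures have many dependencies, each of which will inevitably fail at some point. If the host application is not isolated from these external failures, it risks being taken down with them.
For example, for an application that depends on 30 services where each service has 99.99% uptime, here is what you can expect: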

$0.9999^{30} \approx 0.997$, i.e. 99.7% uptime
0.3% of 1 billion requests = 3,000,000 failures
2+ hours of downtime per month, even if all dependencies have excellent uptime.
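The arithmetic above can be checked with a few lines of Java. This is a minimal sketch that assumes, as the example does, that dependency failures are independent, that every request needs all 30 services, and that a month has 30 days; the class name `AvailabilityMath` is hypothetical.

```java
// Reproduces the availability arithmetic with plain JDK math.
// Assumptions: independent failures, every request touches all dependencies, 30-day month.
final class AvailabilityMath {
    private AvailabilityMath() {}

    /** Uptime of a request path that needs all n dependencies to be up. */
    static double combinedUptime(double perServiceUptime, int dependencies) {
        return Math.pow(perServiceUptime, dependencies);
    }

    /** Expected number of failed requests out of totalRequests. */
    static double expectedFailures(double combinedUptime, double totalRequests) {
        return (1.0 - combinedUptime) * totalRequests;
    }

    /** Expected downtime in hours for a month of the given length in days. */
    static double downtimeHoursPerMonth(double combinedUptime, int daysInMonth) {
        return (1.0 - combinedUptime) * daysInMonth * 24.0;
    }
}
```

With these inputs the combined uptime is about 0.99700, which gives roughly 3 million failed requests per billion and about 2.16 hours of downtime in a 30-day month.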

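Reality is generally worse.
Even when all dependencies perform well, if you do not engineer the whole system for resilience, the aggregate impact of even 0.01% downtime on each of dozens of services can add up to hours of downtime per month.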

What Hystrix is designed to do

Preventing any single dependency from using up all container (such as Tomcat) user threads.
Shedding load and failing fast instead of queueing.
Providing fallbacks wherever feasible to protect users from failure.
Using isolation techniques (such as bulkhead, swimlane, and circuit breaker patterns) to limit the impact of any one dependency.
Optimizing for time-to-discovery through near real-time metrics, monitoring, and alerting.
Optimizing for time-to-recovery by means of low-latency propagation of configuration changes and support for dynamic property changes in most aspects of Hystrix, which allows you to make real-time operational modifications with low-latency feedback loops.
Protecting against failures in the entire dependency client execution, not just in the network traffic.
How Hystrix achieves its goals
Wrapping all calls to external systems (or “dependencies”) in a HystrixCommand or HystrixObservableCommand object, which typically executes within a separate thread.
Maintaining a small thread pool (or semaphore) for each dependency; if it becomes full, requests destined for that dependency are immediately rejected instead of queued up.
Measuring successes, failures (exceptions thrown by the client), timeouts, and thread rejections.
Tripping a circuit breaker to stop all requests to a particular service for a period of time, either manually or automatically if the error percentage for the service passes a threshold.
Performing fallback logic when a request fails, is rejected, times out, or short-circuits.
Monitoring metrics and configuration changes in near real-time.
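The mechanisms listed above (a small per-dependency pool, immediate rejection when it is full, a timeout, and a fallback) can be illustrated with the JDK alone. The sketch below is a hypothetical stand-in, not Hystrix's implementation: `IsolatedCall` and its parameters are made up for illustration, and it leaves out metrics and the circuit breaker.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical JDK-only sketch: NOT Hystrix's implementation, just the same idea in miniature.
final class IsolatedCall {
    private final ExecutorService pool;
    private final long timeoutMillis;

    IsolatedCall(int threads, int queueCapacity, long timeoutMillis) {
        ThreadFactory daemonThreads = r -> {
            Thread t = new Thread(r, "isolated-call");
            t.setDaemon(true); // worker threads never keep the JVM alive
            return t;
        };
        // Small dedicated pool with a bounded queue; AbortPolicy rejects once both are full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), daemonThreads,
                new ThreadPoolExecutor.AbortPolicy());
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the dependency call in the pool; failure, timeout, or rejection yields the fallback. */
    <T> T execute(Supplier<T> dependency, Supplier<T> fallback) {
        Callable<T> task = dependency::get;
        Future<T> future;
        try {
            future = pool.submit(task);
        } catch (RejectedExecutionException rejected) {
            return fallback.get(); // pool and queue are full: fail fast instead of waiting
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException timedOut) {
            future.cancel(true); // interrupt the worker; the task must honour interrupts to free it
            return fallback.get();
        } catch (ExecutionException failed) {
            return fallback.get(); // the dependency itself threw
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

For example, `new IsolatedCall(5, 5, 4000).execute(() -> callStudentService(), () -> "fallback")` would run a (hypothetical) `callStudentService()` on one of five dedicated threads and return `"fallback"` if it throws, takes longer than 4 seconds, or finds the pool and queue full.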

Hystrix dependency isolation

How Hystrix works

https://raw.githubusercontent.com/wiki/Netflix/Hystrix/images/hystrix-command-flow-chart.png

1. Construct a HystrixCommand or HystrixObservableCommand Object
2. Execute the Command
3. Is the Response Cached?
4. Is the Circuit Open?
5. Is the Thread Pool/Queue/Semaphore Full?
6. HystrixObservableCommand.construct() or HystrixCommand.run()
7. Calculate Circuit Health
8. Get the Fallback
9. Return the Successful Response
https://github.com/Netflix/Hystrix/wiki/How-it-Works
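The order of the checks in the flow chart matters: a cached response is returned before the circuit is consulted, and a short-circuited or rejected call never reaches `run()`. The following hypothetical sketch walks through that order with semaphore isolation. `CommandFlowSketch` is an illustrative name, the circuit state is supplied from outside, and the cache is a plain map, whereas Hystrix's request cache is scoped to a request context.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the decision order in the flow chart, using semaphore isolation.
// Not Hystrix's code: the cache is a plain map rather than a request-scoped cache.
final class CommandFlowSketch {
    private final Map<String, String> responseCache = new ConcurrentHashMap<>();
    private final Semaphore permits;
    private final BooleanSupplier circuitOpen;
    private final AtomicInteger successes = new AtomicInteger();
    private final AtomicInteger failures = new AtomicInteger();

    CommandFlowSketch(int maxConcurrentCalls, BooleanSupplier circuitOpen) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.circuitOpen = circuitOpen;
    }

    String execute(String cacheKey, Supplier<String> run, Supplier<String> fallback) {
        String cached = responseCache.get(cacheKey);  // step 3: is the response cached?
        if (cached != null) {
            return cached;
        }
        if (circuitOpen.getAsBoolean()) {            // step 4: is the circuit open?
            return fallback.get();                   // step 8 via short-circuit
        }
        if (!permits.tryAcquire()) {                 // step 5: is the semaphore full?
            return fallback.get();                   // step 8 via rejection
        }
        try {
            String result = run.get();               // step 6: run()
            successes.incrementAndGet();             // step 7: feed circuit health
            responseCache.put(cacheKey, result);
            return result;                           // step 9: successful response
        } catch (RuntimeException e) {
            failures.incrementAndGet();              // step 7: feed circuit health
            return fallback.get();                   // step 8: fallback
        } finally {
            permits.release();
        }
    }

    int successes() { return successes.get(); }

    int failures() { return failures.get(); }
}
```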

Inside the Hystrix circuit breaker

https://raw.githubusercontent.com/wiki/Netflix/Hystrix/images/circuit-breaker-1280.png

How the circuit breaker decides between open and closed

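1. Assume the request volume across the circuit (within the rolling statistics window) meets a certain threshold: HystrixCommandProperties.circuitBreakerRequestVolumeThreshold().
2. And assume the error percentage exceeds the error-percentage threshold: HystrixCommandProperties.circuitBreakerErrorThresholdPercentage().
3. Then the circuit breaker transitions from CLOSED to OPEN.
4. While it is OPEN, it short-circuits all requests made against that circuit breaker.
5. After some amount of time (HystrixCommandProperties.circuitBreakerSleepWindowInMilliseconds()), the next single request is let through (this is the HALF-OPEN state). If that request fails, the circuit breaker returns to OPEN for the duration of another sleep window. If it succeeds, the circuit breaker transitions to CLOSED and the logic in step 1 takes over again.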

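// Enable the circuit breaker
@HystrixProperty(name = "circuitBreaker.enabled", value = "true"),
// Minimum number of requests in the rolling window before the error rate is evaluated
@HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"),
// Error-percentage threshold that trips the circuit open (not a success rate)
@HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "40"),
// Sleep window: once the circuit opens, requests go straight to the fallback for this long.
// Afterwards one trial request is let through: success closes the circuit, failure opens a new window.
@HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "10000"),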
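The open/close rules above can be sketched as a small state machine. This is a simplified, hypothetical `SimpleCircuitBreaker`, not Hystrix's `HystrixCircuitBreaker`: it counts outcomes since the last state change rather than over Hystrix's rolling statistics window, and it trips when the error percentage is at or above the threshold. The clock is injected so the sleep window can be tested.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified circuit breaker mirroring the CLOSED -> OPEN -> HALF-OPEN steps above.
// Simplification: counts outcomes since the last state change instead of using a rolling window.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMillis;
    private final LongSupplier clock;       // e.g. System::currentTimeMillis
    private State state = State.CLOSED;
    private long openedAt;
    private int total;
    private int errors;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    /** True if the request may run; false means short-circuit straight to the fallback. */
    synchronized boolean allowRequest() {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && clock.getAsLong() - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN;        // let exactly one trial request through
            return true;
        }
        return false;                       // still sleeping, or a trial request is in flight
    }

    synchronized void onSuccess() { record(false); }

    synchronized void onFailure() { record(true); }

    synchronized State state() { return state; }

    private void record(boolean failed) {
        if (state == State.HALF_OPEN) {     // the trial request decides the next state
            if (failed) { trip(); } else { close(); }
            return;
        }
        if (state == State.OPEN) {
            return;                         // ignore late results from before the trip
        }
        total++;
        if (failed) {
            errors++;
        }
        // Trip only once enough requests were seen AND the error percentage reaches the threshold.
        if (total >= requestVolumeThreshold
                && errors * 100L >= (long) errorThresholdPercentage * total) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        total = 0;
        errors = 0;
    }

    private void close() {
        state = State.CLOSED;
        total = 0;
        errors = 0;
    }
}
```

With the settings from the fragment above, `new SimpleCircuitBreaker(10, 40, 10_000, System::currentTimeMillis)` never trips on fewer than 10 requests; once 4 of 10 requests have failed it opens, short-circuits everything for 10 seconds, and then admits a single HALF-OPEN trial request.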

Demo

student-springboot: the base service provider
school-springboot: calls the student service
Call the school microservice from Postman:
http://localhost:8088/getSchoolDetails1/abcschool
Hystrix annotation:

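// Invoked when the call fails, times out, or is rejected
@HystrixCommand(fallbackMethod = "callStudentService_Fallback",
   commandProperties = {
      // forceClosed = true: the circuit never opens, but failures and timeouts still fall back
      @HystrixProperty(name = "circuitBreaker.forceClosed", value = "true"),
      // Calls that take longer than 4 seconds time out and fall back
      @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "4000")
   },
   // Dedicated thread pool (bulkhead) for calls to the student service
   threadPoolKey = "studentServiceThreadPool",
   threadPoolProperties = {
      @HystrixProperty(name = "coreSize", value = "5"),      // 5 worker threads
      @HystrixProperty(name = "maxQueueSize", value = "5")   // up to 5 waiting calls; more are rejected
   })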
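With `coreSize = 5` and `maxQueueSize = 5`, at most five calls run and five more wait; further concurrent calls are rejected and go to `callStudentService_Fallback`. The JDK-only sketch below counts those rejections. It assumes a plain `ThreadPoolExecutor` with a bounded `LinkedBlockingQueue`, which approximates Hystrix's thread pool but ignores extra checks such as `queueSizeRejectionThreshold`; `BulkheadCapacity` is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical approximation of a bulkhead with coreSize = 5 and maxQueueSize = 5 using only the JDK.
// Hystrix's own pool adds checks (e.g. queueSizeRejectionThreshold) that this sketch ignores.
final class BulkheadCapacity {
    private BulkheadCapacity() {}

    /** Submits `calls` tasks that block until released and returns how many were rejected. */
    static int countRejected(int coreSize, int maxQueueSize, int calls) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreSize, coreSize, 1L, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>(maxQueueSize),   // bounded wait queue
                new ThreadPoolExecutor.AbortPolicy());     // reject instead of blocking the caller
        int rejected = 0;
        try {
            for (int i = 0; i < calls; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();               // simulate a slow dependency
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;                            // all threads busy and queue full
                }
            }
        } finally {
            release.countDown();                           // let the blocked tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }
}
```

`BulkheadCapacity.countRejected(5, 5, 12)` returns 2: ten calls are absorbed (five running, five queued) and the eleventh and twelfth fail fast.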

Hystrix stream:
http://localhost:8088/hystrix.stream
Hystrix Dashboard:
http://localhost:8088/hystrix

http://localhost:8088/hystrix/monitor?stream=http%3A%2F%2Flocalhost%3A8088%2Fhystrix.stream
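The monitor URL above is the dashboard's `/monitor` page with the stream URL percent-encoded into its `stream` query parameter. A small sketch of building it with `java.net.URLEncoder` (the `Charset` overload needs Java 10+); `DashboardUrl` is a hypothetical helper name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the dashboard monitor URL for a given hystrix.stream URL.
final class DashboardUrl {
    private DashboardUrl() {}

    static String monitorUrl(String dashboardBase, String streamUrl) {
        // ':' and '/' become %3A and %2F so the stream URL survives as a single query value
        return dashboardBase + "/monitor?stream=" + URLEncoder.encode(streamUrl, StandardCharsets.UTF_8);
    }
}
```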

Closing notes

Configuration center: Spring Cloud Config / Apollo, @RefreshScope
Gateway: Spring Cloud Gateway, the 80/20 principle
