RED Metrics: Monitoring Microservices Efficiently
Providing a reliable product and user experience requires robust application monitoring. However, monitoring applications efficiently is challenging, and it becomes even more challenging once those applications have moved from a monolithic architecture to microservices.
Back in 2015, Tom Wilkie introduced the RED technique, a monitoring methodology based on what he learned while working at Google. He developed RED because earlier monitoring philosophies and approaches, such as the USE method (utilization, saturation, errors), did not fully meet the needs of software companies and modern software architectures.
The primary goal of the RED technique is to ensure that the application works properly for its end users. In the present era of microservice architectures and cloud infrastructure, hardware measurements matter far less, as long as your service level objectives (SLOs) are being met.
RED stands for rate, errors, and duration. These are the three essential metrics you should monitor for each service in your architecture:
* Rate (The number of requests the service is handling per second)
* Errors (The number of those requests that are failing)
* Duration (The amount of time those requests take)
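As a rough illustration of where these three numbers come from, the Python sketch below wraps a request handler and records, per service, how many requests it handled, how many failed, and how long each one took. The `RedRecorder` and `RedCounters` names are hypothetical, and counting any raised exception as an error is an assumption; a real deployment would normally use a metrics client library and export these values to a monitoring backend rather than keep them in memory.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RedCounters:
    """Raw RED data for one service (hypothetical in-memory structure)."""
    requests: int = 0
    errors: int = 0
    durations: list = field(default_factory=list)  # seconds per request


class RedRecorder:
    """Hypothetical recorder: counts requests, errors, and durations per service."""

    def __init__(self):
        self.by_service = defaultdict(RedCounters)

    def observe(self, service, handler, *args, **kwargs):
        """Run handler as one request of `service`, timing it.

        Assumption: an exception raised by the handler counts as an error;
        it is recorded and then re-raised to the caller.
        """
        counters = self.by_service[service]
        counters.requests += 1
        start = time.perf_counter()
        try:
            return handler(*args, **kwargs)
        except Exception:
            counters.errors += 1
            raise
        finally:
            # Runs for successful and failed requests alike.
            counters.durations.append(time.perf_counter() - start)


recorder = RedRecorder()
recorder.observe("checkout", lambda: "ok")
try:
    recorder.observe("checkout", lambda: 1 / 0)  # simulated failing request
except ZeroDivisionError:
    pass
checkout = recorder.by_service["checkout"]
print(checkout.requests, checkout.errors, len(checkout.durations))  # 2 1 2
```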
Together, these three metrics give you a solid understanding of how your services are performing. The number of requests per second establishes a baseline for how much traffic your service receives. The percentage of those requests that result in errors indicates whether the service is operating within your SLO. Finally, the time those requests take offers insight into the user experience of your application.
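The sketch below turns a window of recorded requests into the three numbers just described: rate in requests per second, error percentage, and a 99th-percentile duration. The `Request` record, the `red_summary` function, and the choice of p99 as the duration statistic are illustrative assumptions; the percentile uses the simple nearest-rank definition, which is only one of several in common use.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration: float  # seconds
    failed: bool


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def red_summary(requests, window_seconds):
    """Summarize one time window as rate, error percentage, and p99 duration."""
    if not requests:
        return {"rate": 0.0, "error_pct": 0.0, "p99": None}
    failed = sum(1 for r in requests if r.failed)
    return {
        "rate": len(requests) / window_seconds,
        "error_pct": 100 * failed / len(requests),
        "p99": percentile([r.duration for r in requests], 99),
    }


# 100 requests in a 10-second window; 2 of them failed.
window = [Request(duration=(i + 1) / 100, failed=(i % 50 == 0)) for i in range(100)]
summary = red_summary(window, window_seconds=10)
print(summary)  # {'rate': 10.0, 'error_pct': 2.0, 'p99': 0.99}
```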
Benefits of the RED method:
1. Reduces the time and effort needed to diagnose service issues
RED makes it simple to see what is wrong with any service and where to start fixing it, even if that service is essentially a black box whose internals you don't understand. You can analyze its telemetry data and identify the best course of action to improve the user experience. Because the same measurements are used for every service, less training and less service-specific expertise are required.
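Because every service exposes the same three numbers, a single triage routine can rank all of them without knowing anything about their internals. In the sketch below, `triage` and its default thresholds (1% errors, 0.5 s p99) are hypothetical values chosen for illustration, not recommendations; the input is assumed to be one rate/error/duration summary per service.

```python
def triage(summaries, max_error_pct=1.0, max_p99_seconds=0.5):
    """Flag services whose RED summary breaches shared thresholds.

    `summaries` maps service name -> {"rate", "error_pct", "p99"}.
    The same thresholds apply to every service, so no service-specific
    knowledge is needed. Results are sorted by error percentage, worst first.
    """
    findings = []
    for name, s in summaries.items():
        problems = []
        if s["error_pct"] > max_error_pct:
            problems.append(f"error rate {s['error_pct']:.1f}% > {max_error_pct}%")
        if s["p99"] is not None and s["p99"] > max_p99_seconds:
            problems.append(f"p99 {s['p99']:.2f}s > {max_p99_seconds}s")
        if problems:
            findings.append((s["error_pct"], name, problems))
    findings.sort(reverse=True)
    return [(name, problems) for _, name, problems in findings]


# Hypothetical summaries for three black-box services.
summaries = {
    "cart": {"rate": 120.0, "error_pct": 0.2, "p99": 0.18},
    "payments": {"rate": 40.0, "error_pct": 4.5, "p99": 0.30},
    "search": {"rate": 300.0, "error_pct": 0.1, "p99": 1.20},
}
print(triage(summaries))
# [('payments', ['error rate 4.5% > 1.0%']), ('search', ['p99 1.20s > 0.5s'])]
```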
2. Aligns with company objectives
App users care little about an app's infrastructure, memory use, CPU utilization, or other hardware metrics; they care about the error messages they run into, how long the app takes to load, and any other issues that affect their experience while using it. The RED technique makes it very clear when a service is failing to meet your SLO and your users are having a poor experience, so it aligns with both your users' and your company's objectives.
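One common way to express "failing to meet your SLO" in user-facing terms is an error budget: an availability target of 99.9% means up to 0.1% of requests may fail within the measurement window. The functions below are a minimal sketch under that assumption, alongside a simple latency objective of the form "95% of requests finish within 300 ms"; the targets and the function names are hypothetical.

```python
def error_budget_remaining(total_requests, failed_requests, slo_target=0.999):
    """Fraction of the error budget left for an availability SLO.

    slo_target is the fraction of requests that must succeed (0.999 = 99.9%).
    Returns 1.0 with no failures, and a negative value once the SLO is breached.
    """
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 1.0 if failed_requests == 0 else float("-inf")
    return 1 - failed_requests / allowed_failures


def meets_latency_slo(durations, threshold_seconds=0.3, target_fraction=0.95):
    """True if at least target_fraction of requests finished within the threshold."""
    if not durations:
        return True
    fast = sum(1 for d in durations if d <= threshold_seconds)
    return fast / len(durations) >= target_fraction


# 400 failures out of 1,000,000 requests against a 99.9% target.
print(round(error_budget_remaining(1_000_000, 400), 3))  # 0.6
print(meets_latency_slo([0.1] * 96 + [0.9] * 4))  # True
```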
3. Drives standardization and consistency
Using RED lets you and your team monitor every service consistently, since everyone looks at the same measures. Dashboards can share the same look and feel regardless of which service they monitor, which makes working as a team much easier.
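To show what "the same look and feel" can mean in practice, the sketch below builds identical rate, error, and duration panels for any service from one template, using PromQL (the Prometheus query language). It assumes, hypothetically, that every service exports a counter named `http_requests_total` with `service` and `code` labels, that 5xx codes count as errors, and that durations are recorded in a histogram named `http_request_duration_seconds`. The panel dictionaries are a simplified stand-in, not any dashboard tool's actual format.

```python
def red_dashboard_panels(service, window="5m"):
    """Return the three RED panels (title + PromQL query) for one service.

    Every service gets the same panels; only the label value changes.
    Metric and label names are assumptions, not a standard.
    """
    sel = f'service="{service}"'
    requests = f"sum(rate(http_requests_total{{{sel}}}[{window}]))"
    errors = f'sum(rate(http_requests_total{{{sel},code=~"5.."}}[{window}]))'
    p99 = (
        "histogram_quantile(0.99, sum by (le) ("
        f"rate(http_request_duration_seconds_bucket{{{sel}}}[{window}])))"
    )
    return [
        {"title": f"{service}: rate (req/s)", "query": requests},
        {"title": f"{service}: errors (fraction of requests)", "query": f"{errors} / {requests}"},
        {"title": f"{service}: duration p99 (s)", "query": p99},
    ]


for panel in red_dashboard_panels("checkout"):
    print(panel["title"])
    print("   ", panel["query"])
```

Generating panels from code like this keeps every service's dashboard structurally identical, so a change to the template reaches all services at once.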
4. Helps with automation
Because every service is measured the same way, RED makes automating repetitive processes simpler and less error-prone. In addition, since the same three metrics are used across services, artifacts such as dashboards can be generated from a single shared template.
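Alerting is a natural candidate for this kind of automation. The sketch below generates one error-rate alert and one latency alert per service from a single template; the dictionaries mirror the fields of a Prometheus alerting rule (`alert`, `expr`, `for`, `labels`, `annotations`) and could be written to a rule file with a YAML library. The metric names, thresholds, the 10-minute `for` duration, and the `severity` label are all illustrative assumptions.

```python
def red_alert_rules(services, max_error_ratio=0.01, max_p99_seconds=0.5, window="5m"):
    """Generate an error-rate alert and a latency alert for every service.

    The output mirrors the structure of a Prometheus rule file
    (groups -> rules). Metric names and thresholds are hypothetical.
    """
    rules = []
    for svc in services:
        sel = f'service="{svc}"'
        total = f"sum(rate(http_requests_total{{{sel}}}[{window}]))"
        errors = f'sum(rate(http_requests_total{{{sel},code=~"5.."}}[{window}]))'
        p99 = (
            "histogram_quantile(0.99, sum by (le) ("
            f"rate(http_request_duration_seconds_bucket{{{sel}}}[{window}])))"
        )
        name = svc.title().replace("-", "")  # "order-history" -> "OrderHistory"
        labels = {"severity": "page", "service": svc}
        rules.append({
            "alert": f"{name}HighErrorRate",
            "expr": f"{errors} / {total} > {max_error_ratio}",
            "for": "10m",
            "labels": dict(labels),
            "annotations": {"summary": f"{svc}: more than {max_error_ratio:.0%} of requests failing"},
        })
        rules.append({
            "alert": f"{name}HighLatency",
            "expr": f"{p99} > {max_p99_seconds}",
            "for": "10m",
            "labels": dict(labels),
            "annotations": {"summary": f"{svc}: p99 latency above {max_p99_seconds}s"},
        })
    return {"groups": [{"name": "red-alerts", "rules": rules}]}


rule_file = red_alert_rules(["checkout", "order-history"])
print([r["alert"] for r in rule_file["groups"][0]["rules"]])
# ['CheckoutHighErrorRate', 'CheckoutHighLatency', 'OrderHistoryHighErrorRate', 'OrderHistoryHighLatency']
```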
Limitations of the RED method:
* Since the RED approach was primarily intended for request-driven applications, it may not give the necessary information for use cases such as batch processing or streaming.
* RED's external perspective can make it difficult to tell how close a service is to failure. A modest increase in traffic may lengthen your response times, and without internal application metrics you may not be able to explain why.
* RED measurements can be interpreted differently depending on the circumstances (for example, what counts as an error, or which duration percentile matters to your users), so adopting RED requires careful planning.
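The second limitation can be made concrete with a textbook idealization, the M/M/1 queue, whose mean time in system is 1 / (mu - lambda) for an arrival rate lambda below the service rate mu. Real services rarely match this model exactly, and the capacity of 100 requests per second below is a hypothetical number, but the shape is typical: as utilization approaches 100%, small increases in traffic cause large increases in duration. RED shows the duration jump; a utilization metric, as in the USE method, is what explains it.

```python
def mm1_mean_response_time(arrival_rate, service_rate):
    """Mean time in system for an idealized M/M/1 queue: 1 / (mu - lambda).

    Rates are in requests per second. Returns infinity when the queue is
    unstable (arrival rate at or above service rate).
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


service_rate = 100.0  # hypothetical capacity: 100 requests/second
for arrival_rate in (50.0, 80.0, 90.0, 95.0):
    utilization = arrival_rate / service_rate
    ms = 1000 * mm1_mean_response_time(arrival_rate, service_rate)
    print(f"load {arrival_rate:.0f} req/s  utilization {utilization:.0%}  mean duration {ms:.0f} ms")
```

In this model, going from 90 to 95 requests per second (about 6% more traffic) doubles the mean duration from 100 ms to 200 ms, even though nothing about the service itself changed.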
Published: Friday, 12 November 2021