RED (Rate/Errors/Duration)

Focus is application performance.

Rate

  • the rate at which your system is receiving requests (e.g. requests per second)
  • can provide important context when monitoring performance or troubleshooting

Errors

  • how many requests are ending in or encountering errors?
  • is a specific call failing 100% of the time?
  • do errors increase as the rate of traffic increases?
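
The question of whether a specific call fails every time can be answered by grouping errors per call. A minimal sketch, assuming each request is available as an (endpoint, status) pair and that any status of 500 or above counts as an error; the function name and the data shape are hypothetical:

```python
from collections import defaultdict


def error_ratio_by_endpoint(requests):
    """Return the fraction of failed requests per endpoint.

    `requests` is an iterable of (endpoint, status) pairs.
    Assumption: a status code >= 500 marks a failed request.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for endpoint, status in requests:
        totals[endpoint] += 1
        if status >= 500:
            errors[endpoint] += 1
    # A ratio of 1.0 means the call failed every time it was made.
    return {endpoint: errors[endpoint] / totals[endpoint] for endpoint in totals}
```

An endpoint with a ratio of 1.0 is failing on every call, while a ratio that climbs together with traffic suggests load-related errors.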

Duration

  • length of time each request to your system takes
  • request duration is critical to determining end-user experience and monitoring overall performance
  • “slow is the new down”
    • as page load time increases from 1 to 3 seconds, the likelihood of a user leaving increases by 32%
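
The three RED signals can be derived from the same set of request records. A minimal sketch, assuming requests were collected over a fixed time window, that statuses of 500 or above are errors, and that the 95th percentile (nearest-rank method) is a reasonable stand-in for "duration"; the `Request` type and `red_summary` function are hypothetical names, not part of any monitoring library:

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration: float  # seconds the request took
    status: int      # HTTP-style status code


def red_summary(requests, window_seconds):
    """Summarise rate, errors, and duration for one time window."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration for r in requests)

    if durations:
        # Nearest-rank p95: ceil(0.95 * total), computed with integers.
        rank = (95 * total + 99) // 100
        p95 = durations[rank - 1]
    else:
        p95 = None

    return {
        "rate_per_second": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p95_duration": p95,
    }
```

A percentile is used here instead of the mean because a few very slow requests can hide behind a healthy-looking average.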

USE (Utilization/Saturation/Errors)

Focus is system resources/infrastructure.

Utilization

  • proportion of a resource's capacity that the system is using to process work
  • CPU, memory, network bandwidth, or even software metrics like process capacity and thread pools

Saturation

  • amount of work that cannot be processed by the system due to a lack of available resources
  • can be observed, for example, as queueing or increased latency

Errors

  • just as errors can signal issues with your application, they can signal issues with your resources
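
The USE signals can be illustrated with a thread pool, one of the software resources mentioned above. A minimal sketch, assuming a point-in-time snapshot of the pool is available; `PoolSnapshot`, its fields, and `use_summary` are hypothetical names chosen for this example:

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    workers_total: int  # size of the pool
    workers_busy: int   # workers currently processing a task
    queued_tasks: int   # tasks waiting because no worker is free
    failed_tasks: int   # tasks that ended in an error


def use_summary(snapshot):
    """Map a pool snapshot onto utilization, saturation, and errors."""
    return {
        # Utilization: share of the pool's capacity currently in use.
        "utilization": snapshot.workers_busy / snapshot.workers_total,
        # Saturation: work that cannot be processed yet (queue length).
        "saturation": snapshot.queued_tasks,
        # Errors: failures reported by the resource itself.
        "errors": snapshot.failed_tasks,
    }
```

For example, `use_summary(PoolSnapshot(8, 6, 3, 1))` describes a pool that is 75% utilized, has three tasks waiting, and has reported one failure.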