Today, it’s nigh impossible to deploy an application without using APIs. While APIs have long played a role in sharing data and other resources between applications, they have grown into an essential part of the architecture of most modern, cloud-native applications. No matter which language your app is written in, how you deploy it or which operating system it runs on, it very likely relies heavily on APIs to do its work.
By extension, guaranteeing API performance and availability is also more critical than ever. Proper monitoring of all the APIs in your portfolio will go a long way toward ensuring application and business health.
With this need in mind, this article examines key metrics that you can monitor to help optimize API performance and availability.
API monitoring concepts
Before diving into those metrics, note that there are three “dimensions” that you should be reporting on when monitoring APIs: The client (or caller), the API layer, and the back-end implementation(s) supporting the API.
It’s also important to test beyond intended functionality. You want to be sure your APIs are useful for the types of problems you designed them to solve, but you also need to anticipate the unique ways clients will use them as well. For example, this may include combinations of calls, where results from one API call are combined or used as input to a second call to form the output in a unique way. (Think mashups and screen scraping.) As a result, a change or issue with one call can affect others. To the extent possible, functionality monitoring should consider all the edge cases with API usage.
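Monitoring chained calls can be sketched as a test that exercises two calls the way a real client combines them, rather than testing each endpoint in isolation. The endpoint names below (`get_user`, `get_orders`) are illustrative stand-ins, not a real API:

```python
def get_user(name):
    # Stand-in for a real API call that returns a user record.
    return {"id": 42, "name": name}

def get_orders(user_id):
    # Stand-in for a second call that consumes the first call's output.
    return [{"order_id": 1, "user_id": user_id}]

def check_chained_flow(name):
    """Exercise the two calls the way a real client would combine them."""
    user = get_user(name)
    orders = get_orders(user["id"])
    # A change to get_user's response shape surfaces here, not just in
    # isolated per-endpoint tests.
    assert all(o["user_id"] == user["id"] for o in orders)
    return len(orders)

print(check_chained_flow("alice"))  # → 1
```

A break in `get_user`'s response shape fails the combined check even if each endpoint still passes its individual test.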
Every list of software concerns should start with security. For an API, security is paramount given the wide range of customers, end users, application use cases, and languages used to access it. You need to ensure that your application, your API, and your data remain secure, and that key information is encrypted and protected along the way. Don’t forget about connection management and role-based access control.
Obviously, the internal performance of your API is critical to customer satisfaction and the health of your own dependent applications. Monitoring performance goes beyond one-time execution in an acceptance test phase. Regular monitoring provides a complete picture of performance over time, where performance might degrade slowly, or slow periodically over a 24-hour cycle due to exposed bottlenecks or scalability issues. It’s also important to have target performance goals or service-level agreements (SLAs) that you can measure against.
Ensure that your API performance monitoring runs continuously, or at least regularly (to limit impact if monitoring overhead is a concern), and look at the following metrics:
- Usage count: The number of times an API call is made within a given time period; the volume of API callers; the number of unique simultaneous callers; and the distribution of back-end calls made to support front-end API calls.
- Latency: The overall responsiveness of your API calls. Track absolute latency (looking for outliers), not just the average, as averages cover up the long-tail calls.
- Back-end latency: The time between when API code sends a request to a back-end implementation and when a response is received from the back end.
- Caching: Data caching metrics, used to tune cache capacities toward target performance goals.
- Errors: Client (caller-side) errors; errors that occur within the API implementation; and any errors in the back-end services supporting your API.
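The long-tail point above can be illustrated with a quick calculation: one slow outlier barely moves the average but dominates a high percentile. The sample data is made up for illustration:

```python
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    k = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[k]

# Nine fast calls and one 980 ms outlier (illustrative numbers).
latencies_ms = [12, 14, 11, 13, 15, 12, 980, 13, 14, 12]

mean = statistics.mean(latencies_ms)   # 109.6 — already misleading
p99 = percentile(latencies_ms, 99)     # 980 — exposes the outlier
print(f"mean={mean:.1f}ms p99={p99}ms")
```

Alerting on percentiles (p95/p99) rather than the mean catches exactly the calls your unhappiest users experience.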
Know and guard against external forces that can affect your API performance — for example, slow clients that hold up synchronous execution, or exceptions that occur within callbacks to clients. This also includes network issues, whether between your API and back-end services, with external API calls you depend on, or internal network connectivity issues and bottlenecks that might creep in.
Finally, don’t forget about scalability. Measuring performance for each call by a single caller is an important metric, but be sure to measure and understand how performance tails off as larger numbers of simultaneous calls are made. This can be from sheer volume of calls made in a time period, the number of unique callers themselves, or a combination.
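A concurrency sweep like the one described above can be sketched as follows. The endpoint here is a simulated stand-in whose calls serialize on a shared lock, mimicking a contended back-end resource:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()

def fake_endpoint():
    # Simulated shared bottleneck: all calls serialize on one lock.
    with _lock:
        time.sleep(0.005)

def timed_call(_):
    start = time.perf_counter()
    fake_endpoint()
    return time.perf_counter() - start

def sweep(concurrency_levels):
    """Record worst-case latency at each level of simultaneous callers."""
    results = {}
    for n in concurrency_levels:
        with ThreadPoolExecutor(max_workers=n) as pool:
            durations = list(pool.map(timed_call, range(n)))
        results[n] = max(durations)
    return results

for n, worst in sweep([1, 4, 8]).items():
    print(f"{n:2d} callers -> worst latency {worst * 1000:.1f} ms")
```

As concurrency rises, the worst-case latency grows because calls queue behind the lock — the same tail-off pattern you want your monitoring to surface before callers hit it.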
Ensure you’re the first to know when an API isn’t available to your callers, before those clients begin flooding your help desk. And although your API may be fast and scale well, you also need to know that it’s returning the correct responses and data. Even if your API is responsive, if it returns bad data or errors, it should be considered unavailable.
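An availability probe in this spirit treats bad data as "down", not just HTTP errors. The `check_health` function and its expected `balance` field are assumptions for illustration:

```python
def check_health(status_code, payload):
    """Return True only if the response is both successful and well-formed."""
    if status_code != 200:
        return False
    # A 200 with missing or malformed fields still counts as unavailable.
    return (isinstance(payload, dict)
            and "balance" in payload
            and isinstance(payload["balance"], (int, float)))

print(check_health(200, {"balance": 10.5}))       # healthy
print(check_health(200, {"error": "db timeout"})) # 200 but bad data -> down
print(check_health(503, {"balance": 10.5}))       # server error -> down
```

Wiring a schema check like this into the probe means a back end that starts returning empty or malformed payloads trips the same alert as an outright outage.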
Know how easy it is to integrate your API with applications and other APIs. An integrated API testing and monitoring framework allows your developers to test and measure APIs quickly, taking the guesswork out of front-end developers’ integration efforts.
As mentioned above, monitoring and validating data returned to the caller is a critical API metric. However, it’s just as important to monitor and validate inbound data from callers. If monitoring shows a trend of improper data being sent to your APIs, it could indicate issues such as a potential denial-of-service attack, invalid responses from your APIs being used as input to subsequent calls, or a misunderstanding of the way your API is meant to be used.
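Tracking that trend can be as simple as a sliding window over recent requests that flags when the invalid-input rate crosses a threshold. The class and threshold below are illustrative, not a real library API:

```python
from collections import deque

class InvalidInputMonitor:
    """Flag when the share of malformed inbound requests gets too high."""

    def __init__(self, window=100, threshold=0.2):
        self.window = deque(maxlen=window)  # True = valid, False = malformed
        self.threshold = threshold

    def record(self, valid):
        self.window.append(valid)

    def alert(self):
        if not self.window:
            return False
        bad_ratio = self.window.count(False) / len(self.window)
        return bad_ratio >= self.threshold

mon = InvalidInputMonitor(window=10, threshold=0.3)
for ok in [True] * 7 + [False] * 3:
    mon.record(ok)
print(mon.alert())  # 30% invalid -> alert fires
```

A sustained spike in the ratio is the cue to investigate: an attack, a broken client, or a confusing API contract.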
Measure for memory leaks
Memory leaks take many forms, and some language platforms help guard against them. APIs are unique in that they’re often subject to the widest range of memory errors, such as:
- Leaks that occur within the API layer itself
- Leaks within the back-end services that support the API
- Leaks from the interaction between the API and the back end (e.g., failing to free memory allocated by the back end after use)
- The failure of clients to free memory your API allocates on their behalf. This could indicate an API usability issue as well.
Even in the age of garbage collection, memory leaks can occur when applications and callers hang on to object references longer than anticipated. Measure memory at each link in the chain to know where issues are occurring to help improve usability going forward.
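One way to measure a single link in that chain, sketched here with Python's standard `tracemalloc` module, is to compare traced memory before and after repeated handler calls. The leaky handler is a deliberately contrived assumption for illustration:

```python
import tracemalloc

_leaky_cache = []  # simulated leak: references retained past the request

def handler(payload):
    _leaky_cache.append(payload * 1000)  # never evicted -> grows forever
    return len(payload)

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
for _ in range(100):
    handler("x" * 100)
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

growth_kb = (after - before) / 1024
print(f"retained after 100 calls: {growth_kb:.0f} KiB")
```

A healthy handler's retained memory should plateau after warm-up; steady linear growth per call, as here, points to references being held longer than anticipated.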
Detect issues proactively
Simulate real users using tools that monitor your API as though real callers are using it. That means executing real-world transactions on your APIs instead of monitoring individual endpoint calls out of the context of actual application usage. This approach proactively uncovers problems that real callers would hit, so you can detect and correct them before your paying customers (or your own applications) are impacted.
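A synthetic transaction of this kind is a scripted journey (here, login → browse → purchase) where any broken step fails the whole check. The step functions are stand-ins for real API calls:

```python
def login(user):
    # Stand-in for an authentication call.
    return {"token": f"tok-{user}"}

def browse(token):
    # Stand-in for a catalog call that requires a valid session.
    assert token.startswith("tok-")
    return ["item-1", "item-2"]

def purchase(token, item):
    # Stand-in for a checkout call.
    assert token.startswith("tok-")
    return {"item": item, "status": "confirmed"}

def synthetic_transaction(user):
    """One end-to-end journey; any broken step fails the whole check."""
    session = login(user)
    items = browse(session["token"])
    receipt = purchase(session["token"], items[0])
    return receipt["status"]

print(synthetic_transaction("probe-user"))  # → confirmed
```

Running this on a schedule from outside your network exercises the same path real users take, including auth and session handling that per-endpoint probes skip.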
Monitor dependent APIs
Your internal API may depend on other internal APIs or external APIs for its own functionality. Even after you’ve put proper monitoring in place for your own APIs, you can’t be sure the same has been done for the external APIs you depend on. Monitor and measure these APIs over time to be sure you understand their behavior and availability. Guard against adverse behavior, and find alternatives if a dependency proves unreliable.
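Guarding against a dependency can be sketched as a wrapper that applies a fallback on failure and records each outcome, so the dependency's reliability can be trended over time. `call_external` is a stand-in for the real dependency:

```python
history = []  # (outcome, detail) pairs for trending dependency behavior

def call_external(fail=False):
    # Stand-in for a dependent API; 'fail' simulates a timeout.
    if fail:
        raise TimeoutError("dependency timed out")
    return {"rate": 1.09}

def call_with_fallback(fail=False, default=None):
    """Record every outcome and degrade gracefully on failure."""
    try:
        result = call_external(fail=fail)
        history.append(("ok", result))
        return result
    except TimeoutError as exc:
        history.append(("error", str(exc)))
        return default  # serve a fallback rather than propagate the error

print(call_with_fallback())                       # normal path
print(call_with_fallback(fail=True, default={}))  # fallback path
```

The `history` log is the raw material for the trending the text describes: a rising error rate for one dependency is the signal to escalate or switch providers.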