Exploring the methods of end-user experience monitoring for APM

#include <std_disclaimer.h>

Today’s application performance management (APM) marketplace is maturing, and the best solutions bring a wide-ranging set of capabilities for measuring performance and understanding behavior across many aspects of the application delivery stack. One of the cornerstones of APM is end-user experience monitoring (EUM).

As defined by Gartner for the APM Magic Quadrant, EUM is:
“The capture of data about how end-to-end latency, execution correctness and quality appear to the real user of the application. Secondary focus on application availability may be accomplished by synthetic transactions simulating the end user.”

But what does that mean? What are those capabilities?

There are a number of methods for end-user monitoring. Each has advantages, and no single one is enough. It is important to look at end-user experience through several sides of the prism to understand how the metrics really match up against user experience. As I was cataloging them for myself, I thought it would be good food for thought to share my definitions.

Synthetic monitoring
Web performance monitoring started with synthetic monitoring in the 1990s. A synthetic monitor is not a real user of your application but an artificial robot user, thus “synthetic.” The robot periodically executes an interaction with your website, API, or web application to verify availability and measure performance. It is one of the easiest forms of monitoring to set up and provides almost immediate value, delivering visibility and hard data without having to install or configure anything within the application. An example of a synthetic monitor would be a web transaction monitor that ensures an online store is working by visiting the home page, searching for a product, viewing the product detail, adding it to the cart, and checking out. This is very similar to the pile of functional tests that should run every time the application is built or deployed.
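To make the robot concrete, here is a minimal sketch of a scheduled synthetic check in TypeScript (runnable on Node 18+, where fetch is built in). The URL, interval, and latency budget are hypothetical examples, not values from any particular product:

    // Minimal synthetic check: fetch a page on a schedule, time it, and
    // flag availability or performance problems.
    const TARGET_URL = "https://shop.example.com/"; // hypothetical store
    const INTERVAL_MS = 60_000;                     // run once per minute
    const SLOW_MS = 2_000;                          // latency budget

    async function runCheck(): Promise<void> {
      const start = Date.now();
      try {
        const res = await fetch(TARGET_URL);
        const elapsed = Date.now() - start;
        if (!res.ok) {
          console.error(`AVAILABILITY: HTTP ${res.status} from ${TARGET_URL}`);
        } else if (elapsed > SLOW_MS) {
          console.warn(`PERFORMANCE: ${elapsed} ms exceeds ${SLOW_MS} ms budget`);
        } else {
          console.log(`OK: ${elapsed} ms`);
        }
      } catch (err) {
        console.error(`AVAILABILITY: request to ${TARGET_URL} failed`, err);
      }
    }

    runCheck();
    setInterval(runCheck, INTERVAL_MS);

A real multi-step transaction monitor drives a full browser through the search-and-checkout flow described above, but the principle is the same.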

Although Gartner has relegated synthetic monitoring to an availability role, it still has a lot of value for performance monitoring that passive methods do not address. No other method can help you measure service delivery when real users are not on the system, which makes it ideal for measuring SLAs. And it is the only way to see individual page resources (a la the waterfall report), as this is not yet a real-user monitoring (RUM) capability. Synthetics also eliminate many of the independent variables that can make real-user monitoring data difficult to compare. Finally, the synthetic connection to the DevOps tool chain of tests run at build or in QA provides a continuous reference point from development environments through test and production.

Web real-user monitoring (RUM)
When I first saw real-user monitoring back in 2008, I knew it was going to change the way we measure web performance. RUM works by extracting performance values using JavaScript. As actual users visit web pages, performance metrics are beaconed back to the great reporting mothership. Originally, the only metric RUM could capture was a basic page load number, but modern browsers now collect a slew of detailed performance metrics thanks to the W3C timing standards, and they will soon even provide access to page-resource-level detail.
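As a sketch of the mechanism (the /rum-beacon collection endpoint is hypothetical), a bare-bones RUM script reads the W3C Navigation Timing values after page load and reports them back on a tiny image request, the classic beaconing trick:

    // Bare-bones RUM beacon: runs in the browser on every page view.
    window.addEventListener("load", () => {
      // Defer one tick so loadEventEnd has been populated.
      setTimeout(() => {
        const t = performance.timing; // W3C Navigation Timing values
        const metrics: Record<string, string> = {
          page:     location.pathname,
          dns:      String(t.domainLookupEnd - t.domainLookupStart),
          connect:  String(t.connectEnd - t.connectStart),
          ttfb:     String(t.responseStart - t.navigationStart),
          pageLoad: String(t.loadEventEnd - t.navigationStart),
        };
        // Report via an image request so no reply handling is needed.
        new Image().src =
          "/rum-beacon?" + new URLSearchParams(metrics).toString();
      }, 0);
    });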

RUM’s great advantage over synthetic is that it can see what’s happening for all of your actual users on all of the web pages they visit. This means you can understand web performance metrics by page, geography, browser, or mobile device type. While this provides a broader understanding of general performance, it also introduces many, many more independent variables, making specific trending and comparison more challenging. RUM is also the front-end method by which transactions are “tagged” so they can be traced and correlated through the back-end for a greater understanding of how software and infrastructure work together to deliver end-user experience, and for root-cause analysis.
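The tagging itself can be as simple as attaching a per-page-view ID to outgoing requests so back-end traces can be stitched to the front-end experience. A sketch, noting that the X-Correlation-Id header name is a common convention rather than a standard, and real APM agents each use their own:

    // Tag every request from this page view with a correlation ID so
    // back-end tracing can correlate server work to the user experience.
    const pageViewId = crypto.randomUUID(); // one ID per page view

    async function taggedFetch(url: string, init: RequestInit = {}) {
      const headers = new Headers(init.headers);
      headers.set("X-Correlation-Id", pageViewId);
      return fetch(url, { ...init, headers });
    }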

RUM’s greatest, and perhaps least exploited, value to business today is that it captures business activity information representing why we have a website to begin with. It is this business outcome data that should be our first canary in the coal mine for determining whether something needs attention.

Mobile real-user monitoring
Mobile web applications can be monitored with traditional RUM; however, today’s native mobile apps require a different mechanism to measure the application experience. That is typically accomplished by adding an extra library into your mobile application that beacons mobile application performance data back for reporting. Like traditional RUM, this is also how transactions are “tagged” for mapping through delivery software and infrastructure.
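The embedded library pattern boils down to timing named operations inside the app and reporting them. A simplified sketch in TypeScript (the collector URL and loadProduct function are hypothetical, and real mobile APM SDKs hook the app lifecycle automatically rather than requiring manual wrapping):

    // Time a named operation and beacon the result; never let
    // reporting failures break the app itself.
    async function timeOperation<T>(name: string, op: () => Promise<T>): Promise<T> {
      const start = Date.now();
      try {
        return await op();
      } finally {
        fetch("https://collector.example.com/mobile-beacon", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ name, ms: Date.now() - start }),
        }).catch(() => { /* swallow reporting errors */ });
      }
    }

    // Usage: await timeOperation("product-detail-load", () => loadProduct("sku-123"));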

With mobile web traffic now reaching 25% of total traffic, and mobile being the #1 method for brands to engage consumers, mobile RUM will be of increasing importance to most organizations.

Network real-user monitoring
Hardware appliances that plug into a network switch’s span port and passively listen to network traffic provide network-based RUM that very accurately represents the end-to-end network performance of the application. This type of packet sniffing leverages timestamps in the network packet headers to break performance down into client, server, and network components.
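To illustrate the idea with a simplified sketch: from a capture point near the server, the gap between the server’s SYN-ACK and the client’s ACK approximates the network round trip, while the gap between the last request byte and the first response byte approximates server think time. (Real appliances handle retransmissions, pipelining, and much more.)

    // Simplified latency decomposition from wire timestamps (ms),
    // as observed at a span port near the server.
    interface WireTimes {
      synAckSent: number;        // server SYN-ACK leaves
      ackReceived: number;       // client ACK returns
      requestLastByte: number;   // last byte of the HTTP request arrives
      responseFirstByte: number; // first byte of the response leaves
      responseLastByte: number;  // last byte of the response leaves
    }

    function decompose(t: WireTimes) {
      return {
        networkMs:  t.ackReceived - t.synAckSent,            // client round trip
        serverMs:   t.responseFirstByte - t.requestLastByte, // server think time
        transferMs: t.responseLastByte - t.responseFirstByte,// payload delivery
      };
    }

    // decompose({ synAckSent: 0, ackReceived: 42, requestLastByte: 50,
    //             responseFirstByte: 230, responseLastByte: 310 })
    // => { networkMs: 42, serverMs: 180, transferMs: 80 }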

My own assessment is that network RUM is particularly good at monitoring HTTP API performance for services, rather than the higher-level end-user experience of an application consumer.

Desktop agent based monitoring
A few tools focused on the enterprise measure end-user performance and usage by installing an agent on the Windows desktop. These agents often use technology similar to network RUM, inspecting client network traffic by IP address and port. This method provides visibility into usage of enterprise applications as well as general performance and availability.
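A toy sketch of the classification step (the application map is invented for illustration; real agents ship much richer catalogs and measure timing as well):

    // Attribute locally observed connections to enterprise applications
    // by destination IP address and port.
    const APP_MAP: Record<string, string> = {
      "10.0.4.20:443":  "HR Portal",  // hypothetical mapping
      "10.0.7.11:1433": "Finance DB", // hypothetical mapping
    };

    interface Conn { dstIp: string; dstPort: number; }

    function classify(c: Conn): string {
      return APP_MAP[`${c.dstIp}:${c.dstPort}`] ?? "Unclassified";
    }

    // classify({ dstIp: "10.0.4.20", dstPort: 443 }) => "HR Portal"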

How many sides of the prism is your organization looking at user experience through?

Hopefully, unless you are already a monitoring guru, you learned a little about the monitoring methods offered by today’s crop of APM tools for understanding end-user experience. What is also interesting to explore is what capabilities users get from the different tools that leverage these methods.

Perhaps a good subject for a future post :)
