Category Archives: APM

Exploring the methods of end-user experience monitoring for APM

#include <std_disclaimer.h>

Today’s application performance management (APM) marketplace is maturing, and the best solutions bring a wide-ranging set of capabilities for measuring performance and understanding behavior across many layers of the application delivery stack. One of the cornerstones of APM is end-user experience monitoring (EUM).

As defined by Gartner for the APM Magic Quadrant, EUM is:
“The capture of data about how end-to-end latency, execution correctness and quality appear to the real user of the application. Secondary focus on application availability may be accomplished by synthetic transactions simulating the end user.”

But what does that mean? What are those capabilities?

There are a number of methods for end-user monitoring. Each has advantages, and no single one is enough. It is important to look at end-user experience through several different sides of the prism to understand how the metrics line up with what users actually experience. As I was cataloging them for myself, I thought sharing my definitions would be good food for thought.

Synthetic monitoring
Web performance monitoring started with synthetic monitoring in the 1990s. A synthetic monitor is not a real user of your application but an artificial robot user, hence “synthetic.” The robot periodically executes an interaction with your website, API or web application to verify availability and measure performance. It is one of the easiest kinds of monitoring to set up, and it delivers almost immediate value: visibility and hard data without installing or configuring anything within the application. An example of a synthetic monitor is a web transaction monitor that verifies an online store is working by visiting the home page, searching for a product, viewing the product detail, adding it to the cart, and checking out. This is very similar to the pile of functional tests that should run every time the application is built.
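
To make the robot-user idea concrete, here is a minimal Python sketch of a scripted synthetic transaction. It assumes each step of the store journey is a callable that returns True on success; `run_transaction`, `StepResult` and `http_ok` are hypothetical names, not any vendor's API. Each step is timed, and the run stops at the first failure because later steps such as checkout depend on earlier ones.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    elapsed_ms: float
    error: Optional[str] = None


def http_ok(url: str, timeout_s: float = 10.0) -> bool:
    """Fetch a URL the way a simple robot user would; True on HTTP 200."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status == 200


def run_transaction(steps: List[Tuple[str, Callable[[], bool]]]) -> List[StepResult]:
    """Run scripted steps in order, timing each one; stop at the first failure."""
    results: List[StepResult] = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            ok = bool(action())
            error = None if ok else "check failed"
        except Exception as exc:  # timeouts, DNS failures, HTTP errors, ...
            ok, error = False, repr(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
        results.append(StepResult(name, ok, elapsed_ms, error))
        if not ok:
            break  # later steps (cart, checkout) depend on earlier ones
    return results
```

A scheduler (cron, or the monitoring platform itself) would run this every few minutes from several locations, for example `run_transaction([("home", lambda: http_ok("https://shop.example.com/")), ...])`, where the URL is a placeholder.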

Although Gartner has relegated synthetic monitoring to an availability role, it still has a lot of value for performance monitoring that passive methods do not address. No other method can measure service delivery when real users are not on the system, which makes it ideal for measuring SLAs. It is also the only way to see individual page resources (à la the waterfall report), since that is not yet quite a real user monitoring (RUM) capability. Synthetics eliminate many of the independent variables that make RUM data difficult to compare. Finally, because synthetic monitors connect to the DevOps toolchain of tests run at build time or in QA, they provide a continuous reference point from development environments through test and into production.
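
Because synthetic checks keep running when no real users are on the system, their pass/fail history translates directly into an availability figure for an SLA. A minimal sketch, assuming one check every five minutes (288 per day), a 30-day month, and an illustrative 99.9% target; the function names are hypothetical:

```python
from typing import Sequence

MINUTES_PER_30_DAYS = 30 * 24 * 60


def availability_pct(checks: Sequence[bool]) -> float:
    """Percentage of synthetic checks that succeeded."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100.0 * sum(checks) / len(checks)


def meets_sla(checks: Sequence[bool], target_pct: float) -> bool:
    """True if the observed availability reaches the SLA target."""
    return availability_pct(checks) >= target_pct


def allowed_downtime_minutes(target_pct: float, period_min: int = MINUTES_PER_30_DAYS) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1 - target_pct / 100) * period_min
```

With these assumptions, a single failed check in a day already puts that day at about 99.65%, below a 99.9% target, and 99.9% over a 30-day month allows roughly 43.2 minutes of downtime. Real SLA calculations often weight failures by duration or require confirmation from multiple locations; this sketch simply counts checks.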

Web real-user monitoring (RUM)
When I first saw real-user monitoring back in 2008, I knew it was going to change the way we measure web performance. RUM works by extracting performance values using JavaScript. As actual users visit web pages, performance metrics are beaconed back to the great reporting mothership. Originally, the only metric RUM could capture was a basic page load number, but modern browsers now collect a slew of detailed performance metrics thanks to the W3C timing standards, and soon will even provide access to resource-level detail for each page.

RUM's great advantage over synthetic monitoring is that it can see what’s happening for all of your actual users on all of the web pages they visit. This means you can understand web performance metrics by page, geography, browser or mobile device type. While this provides a broader understanding of general performance, it also introduces many, many more independent variables, which makes specific trending and comparison more challenging. RUM is also the front-end method by which transactions are “tagged” so they can be traced and correlated through the back end, giving a better understanding of how software and infrastructure work together to deliver the end-user experience and supporting root-cause analysis.
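
The per-dimension view can be sketched as a simple group-and-summarize over beacons. This assumes each beacon has already been decoded into a dictionary with hypothetical `page`, `country`, `browser` and `load_ms` fields; real RUM products use their own schemas. The median is used rather than the mean because RUM timings are typically skewed by a long tail of slow sessions.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Mapping

# Hypothetical decoded beacons; real RUM schemas vary by product.
SAMPLE_BEACONS = [
    {"page": "/home", "country": "US", "browser": "Chrome", "load_ms": 1200},
    {"page": "/home", "country": "DE", "browser": "Safari", "load_ms": 2500},
    {"page": "/search", "country": "US", "browser": "Chrome", "load_ms": 1800},
    {"page": "/cart", "country": "US", "browser": "Chrome", "load_ms": 900},
]


def summarize(beacons: Iterable[Mapping], dimension: str, metric: str = "load_ms") -> Dict[str, dict]:
    """Group beacons by one dimension and report sample count and median timing."""
    groups = defaultdict(list)
    for beacon in beacons:
        groups[beacon[dimension]].append(beacon[metric])
    return {
        key: {"count": len(values), "median_ms": median(values)}
        for key, values in groups.items()
    }
```

For the sample beacons, `summarize(SAMPLE_BEACONS, "browser")` reports 3 Chrome samples with a median of 1200 ms; swapping the dimension to `"country"` or `"page"` gives the other slices.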

RUM's greatest, and perhaps least exploited, value to business today is that it captures business activity information that represents WHY we have a website to begin with. This business outcome data should be our first canary in the coal mine for determining whether something needs attention.

Mobile real-user monitoring
Mobile web applications can be monitored with traditional RUM; however, today’s native mobile apps require a different mechanism to measure the application experience. This is typically done by adding a library to your mobile application that beacons performance data back for reporting. As with traditional RUM, this is also how transactions are “tagged” so they can be mapped through the delivery software and infrastructure.
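
The tagging scheme itself varies by product and is not specified here. One widely used standard today is W3C Trace Context, whose `traceparent` header carries a trace ID that back-end services can log and propagate. A minimal sketch of generating and validating that header, assuming Trace Context as the tagging mechanism (only version `00` is handled; the function names are hypothetical):

```python
import re
import secrets
from typing import Optional, Tuple

# version 00 only: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Build a W3C Trace Context 'traceparent' value."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    # The spec forbids all-zero IDs; regenerate in that (astronomically unlikely) case.
    while trace_id == "0" * 32:
        trace_id = secrets.token_hex(16)
    while span_id == "0" * 16:
        span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str) -> Optional[Tuple[str, str, str]]:
    """Return (trace_id, parent_id, flags), or None if the header is malformed."""
    match = _TRACEPARENT.match(value)
    return match.groups() if match else None
```

An agent would attach `{"traceparent": new_traceparent()}` to each outgoing API call; back-end services that log the same trace ID let the front-end measurement be joined to server-side traces.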

With mobile web traffic now reaching 25% of total traffic and mobile being the #1 way for brands to engage consumers, mobile RUM will be increasingly important to most organizations.

Network real-user monitoring
Hardware appliances that plug into a network switch’s SPAN port and passively listen to network traffic provide network-based RUM that very accurately represents the end-to-end network performance of the application. This type of packet analysis uses packet timestamps to break performance down into client, server, and network components.
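
Here is a simplified model of how captured packet timestamps can be split into components. It assumes the appliance sits at a server-side SPAN port and has already reassembled one HTTP exchange on a TCP connection; the field names are hypothetical, and real products account for retransmissions, pipelining and TLS, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class CapturedExchange:
    """Capture timestamps (ms) for one HTTP exchange, seen at a server-side tap."""
    synack_sent: float          # server's SYN-ACK passes the tap
    ack_received: float         # client's ACK completing the handshake
    request_first: float        # first packet of the HTTP request
    request_last: float         # last packet of the HTTP request
    response_first: float       # first packet of the HTTP response
    response_last_acked: float  # client's ACK for the last response packet


def decompose(x: CapturedExchange) -> dict:
    return {
        # Handshake round trip between tap and client approximates network latency.
        "network_rtt_ms": x.ack_received - x.synack_sent,
        # Gap between connection ready and the request arriving: client-side delay.
        "client_ms": x.request_first - x.ack_received,
        # Request fully received until the response starts: server processing time.
        "server_ms": x.response_first - x.request_last,
        # Response start until the client acknowledges the last packet: delivery.
        "transfer_ms": x.response_last_acked - x.response_first,
    }
```

For timestamps of 10, 50, 52, 55, 175 and 300 ms, this reports a 40 ms network round trip, 2 ms of client delay, 120 ms of server time and 125 ms of transfer time.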

My own assessment is that network RUM is particularly good at monitoring HTTP API performance for services, rather than the higher-level end-user experience of an application consumer.

Desktop agent based monitoring
A few enterprise-focused tools measure end-user performance and usage by installing an agent on the Windows desktop. These agents often use technology similar to network RUM, inspecting client network traffic by IP address and port. This method also provides visibility into enterprise application usage as well as general performance and availability.
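
Classifying traffic by IP address and port can be sketched as a lookup against a table of known application endpoints. The subnets, ports and application names below are hypothetical; a real agent would also record timing for each connection rather than just counting it.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical mapping of enterprise applications to server subnets and ports.
APP_MAP = [
    (ipaddress.ip_network("10.20.0.0/16"), 443, "ERP"),
    (ipaddress.ip_network("10.30.5.0/24"), 1433, "Claims database"),
]


def classify(dst_ip: str, dst_port: int) -> str:
    """Map a connection's destination to a known application, if any."""
    addr = ipaddress.ip_address(dst_ip)
    for network, port, app in APP_MAP:
        if addr in network and dst_port == port:
            return app
    return "unclassified"


def usage_by_app(flows: Iterable[Tuple[str, int]]) -> Counter:
    """Count connections per application from (destination IP, port) pairs."""
    return Counter(classify(ip, port) for ip, port in flows)
```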

How many sides of the prism is your organization looking at user experience through?

Hopefully, unless you are already a monitoring guru, you learned a little about the methods today’s crop of APM tools offer for understanding end-user experience. It would also be interesting to explore what capabilities users get from the different tools that use these methods.

Perhaps a good subject for a future post :)

Unified Monitoring – the new monitoring renaissance has a moniker

#include <std_disclaimer.h>

I’ve been seeing a lot of marketing that uses the term Unified Monitoring lately. At times it’s made me smirk, but mostly it’s made me smile. Let me explain.

It’s made me smirk because once again what’s old is new.

Many of the infrastructure components of Unified Monitoring have been a part of Enterprise Systems Management tools for more than 20 years. IBM, BMC, and CA products have offered dashboards, event management, correlation, reporting and service level management for as long as I can remember, and I have a fair amount of gray hair :)

What’s so compelling is that, to maximize customer experience in the real-time digital enterprise, we are reimagining traditional management systems. Existing network and systems management capabilities are being enhanced with easy-to-use web-based access, big-data-powered analytics, and more focused APM capabilities, including visitor behavioral data. Package all of that capability in a well-defined, self-service pricing model that brings it to hundreds of thousands of companies, not just blue-chip enterprises, and you can start to see the potential.

This is making me smile a big toothy grin!

I’ve suggested before that we are in the great monitoring renaissance and I think that the term Unified Monitoring is probably the arrowhead that all this is lining up behind.

For years we have heard the term “business alignment” used to help IT do the right thing. In tomorrow’s successful digital enterprise there will be no clear lines between business and IT. There will just be teams of specialists all working on part of the customer experience, the business. And those teams will include IT people and UX people and marketing people and customer support people.

Do you remember the science books back in middle school that had those cellophane overlays of the human body systems: the skeletal, muscular, circulatory and organ layers? I’ve always had a vision that we could do the same for our customer experience delivery stack. Business results come from user behaviors; those behaviors result from the user experience; that experience is delivered by application performance; and application performance is supported by the technology delivery stack.

Layering the business like this allows the team to focus on business results. It also lets teams focus on building the user experience first and then the technology required to support it. Combine this layered visualization with powerful anomaly detection and statistical algorithms, and you have a competent, logical artificial intellect helping you deliver, manage and optimize the customer experience. Add marketing analytics, financial and supply chain data, and we might be able to imagine closed-loop, machine-learning-powered Business Resource Planning.

I’m excited about the future of Unified Monitoring and you should be too!

Am I being too utopian?

How to select the most important web performance metric as a KPI – #feelsfast

We all know intrinsically that website performance is important. It has a tremendous impact on all of the business KPIs that measure the success of our online endeavor. I think website performance gets so much attention for two reasons: (1) it’s the most obvious symptom of bad results and (2) it is easy to measure.

In my larger philosophical views on Customer Experience (CX) I’ve suggested…

PX > CX > UX = #usable + #feelsfast + #emotive

Feelsfast here represents “a lack of perceived latency.”

Fifteen years ago, when I first started thinking about web performance, we had only network-oriented metrics for understanding web page performance. Today there is a larger set of collectible metrics that measure many aspects of the User Experience (UX) spectrum. And because today’s web applications are pretty fat clients, we must take client-side performance into account as well.

We are constantly reminded of the importance of performance by vendors, the media and customers through their actions.

What is the most important metric to measure web performance as a KPI? It’s the one that best represents User Experience or a lack of perceived latency.

We have network metrics. These are old-school metrics that focus on how long it takes your server and the network to deliver web page resources to the browser’s network layer.

  • DNS lookup time – time to resolve DNS name
  • TCP connect time – time to TCP connect
  • SSL handshake time – time to perform SSL handshake
  • Time to first byte – time to receive the first packet of data
  • Time to receive the data – time to receive the rest of the response after the first byte
  • Fullpage time – time to load the web page and all its resources

Most of today’s browsers supplement this with a richer set of data based on the W3C Navigation Timing standard:

  • navigationStart – time that the action was triggered
  • unloadEventStart – time of start of unload event
  • unloadEventEnd – time of completion of unload event
  • redirectStart – time http redirection begins
  • redirectEnd – time http redirection completes
  • fetchStart – time the browser is ready to fetch the document, before checking caches
  • domainLookupStart – time of start of DNS resolution
  • domainLookupEnd – time DNS resolution completes
  • connectStart – time when TCP connect request begins
  • connectEnd – time when TCP connect completes
  • secureConnectionStart – time just before secure handshake
  • requestStart – time that the browser requests the resource
  • responseStart – time that the browser receives first packet of data
  • responseEnd – time the browser receives the last byte of data
  • domLoading – time that the document object is created
  • domInteractive – time when the browser finishes parsing the document
  • domContentLoadedEventStart – time just before DOMContentLoaded event
  • domContentLoadedEventEnd – time just after DOMContentLoaded event
  • domComplete – time when the document becomes complete, just before the load event fires
  • loadEventStart – time when the page load event is fired
  • loadEventEnd – time when the page load event completes

This is a nice visual of the W3C timings.

[Figure: W3C Navigation Timing overview]
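
The older network metrics can be derived from the W3C navigation timings listed above. A minimal sketch, assuming the timing values (in milliseconds) have been copied from the browser's legacy `performance.timing` object into a plain dictionary; Navigation Timing Level 2 exposes similar fields through `PerformanceNavigationTiming`. Time to first byte is measured here from `requestStart`; some tools measure it from `navigationStart` instead.

```python
from typing import Dict

# Illustrative timing values in ms (not real measurements).
SAMPLE_TIMING = {
    "navigationStart": 1000,
    "domainLookupStart": 1010,
    "domainLookupEnd": 1040,
    "connectStart": 1040,
    "secureConnectionStart": 1070,
    "connectEnd": 1120,
    "requestStart": 1121,
    "responseStart": 1300,
    "responseEnd": 1450,
    "loadEventEnd": 2900,
}


def network_metrics(t: Dict[str, float]) -> Dict[str, float]:
    """Derive the classic network metrics from Navigation Timing values."""
    metrics = {
        "dns_ms": t["domainLookupEnd"] - t["domainLookupStart"],
        # connectEnd includes any TLS handshake, so this covers SSL time too.
        "tcp_connect_ms": t["connectEnd"] - t["connectStart"],
        "ttfb_ms": t["responseStart"] - t["requestStart"],
        "download_ms": t["responseEnd"] - t["responseStart"],
        "full_page_ms": t["loadEventEnd"] - t["navigationStart"],
    }
    # secureConnectionStart is 0 when the page was not fetched over HTTPS.
    if t.get("secureConnectionStart", 0) > 0:
        metrics["ssl_ms"] = t["connectEnd"] - t["secureConnectionStart"]
    return metrics
```

For the sample values this yields 30 ms of DNS, an 80 ms connect (of which 50 ms is the SSL handshake), 179 ms to first byte, 150 ms of download and a 1900 ms full page time.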

And we have visual timing metrics available from various tools:

  • IE11 brings us msFirstPaint as part of the browser timings
  • webpagetest.org gives us start render, filmstrip view, and the innovative speed index
  • AlertSite.com can provide visual capture and metrics for FirstPaint and Above the Fold using Firefox

How do you choose which web performance metric has the most value as a KPI when all of these have some value? The key is to identify, for each monitored application or web page, which metric best represents a user’s perception of latency; in other words, whether it feels fast. This is likely one of the more modern metrics: loadEventEnd, FirstPaint, Speed Index, or Above the Fold.
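
Speed Index is one of the metrics that tries to capture perceived latency directly. WebPageTest computes it from the filmstrip's visual completeness over time. A minimal sketch of that calculation, assuming frames are (time in ms, percent visually complete) pairs sorted by time, starting at navigation and ending at 100%; it uses a step approximation between frames and is not WebPageTest's own code.

```python
from typing import List, Tuple


def speed_index(frames: List[Tuple[float, float]]) -> float:
    """Speed Index from filmstrip samples of (time_ms, visual_completeness_pct).

    Sums, over each interval between frames, the interval length times the
    fraction of the viewport still incomplete at the start of that interval.
    Lower is better: pages that paint most of their content early score lower.
    """
    if not frames or frames[-1][1] != 100:
        raise ValueError("filmstrip must end at 100% visually complete")
    total = 0.0
    for (t0, vc0), (t1, _) in zip(frames, frames[1:]):
        total += (t1 - t0) * (1 - vc0 / 100)
    return total
```

Two pages that both finish painting at 3000 ms can score very differently: `speed_index([(0, 0), (500, 90), (3000, 100)])` is about 750, while `speed_index([(0, 0), (2500, 10), (3000, 100)])` is about 2950. The first page feels faster, which is exactly what a #feelsfast KPI should reward.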

Once selected, this #feelsfast metric should become a critical business KPI and be tracked and managed as such.
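
Tracking the metric as a KPI usually means reducing each period's samples to one number and comparing it against a target. A minimal sketch, assuming the 75th percentile (a common but not universal choice) computed with the nearest-rank method and a hypothetical budget in milliseconds:

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def kpi_status(samples_ms: Sequence[float], budget_ms: float, p: float = 75) -> Dict[str, object]:
    """Compare one period's #feelsfast percentile against a (hypothetical) budget."""
    value = percentile(samples_ms, p)
    return {"percentile": p, "value_ms": value, "budget_ms": budget_ms,
            "within_budget": value <= budget_ms}
```

For samples of 1 to 100 ms, the 75th percentile is 75 ms, so an 80 ms budget passes and a 70 ms budget fails.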

Are you giving the web performance component of UX enough attention?

Ken

We are in the great monitoring renaissance

#include <std_disclaimer.h>

Someone told me just yesterday that my head was in the clouds, that I was too much of a dreamer about monitoring. I strongly disagree. We are in the great business and application monitoring renaissance!

Today, monitoring systems, both open source and from leading vendors, are simpler to implement and distill better intelligence about application performance than ever before, and better capabilities are coming.

There is a pile of vendors that cover all or most of the 5 APM dimensions described by Gartner. The future, though, is different. It’s something more, something with its own intuition to help us normal humans manage things well. It will be more than a system that helps you become aware of and address technical performance issues like today’s APM. It will be a system that helps you manage Customer Experience across all channels.

Someday we may have the Internet of Things (IoT), where everything is a sensor, but we already have a lot of sensor data for managing business, applications, networks and platforms.

Many organizations already have sensors that collect performance and availability data from:
– synthetic end-user monitoring
– real user monitoring
– algorithm performance
– transaction tracing
– platform monitoring
– network performance monitoring
– database performance
– visitor analytics
– business performance statistics
– events like product releases

The bigger issue is that much of the above sensor data is still viewed in isolation rather than in an integrated way.

What organizations need are business analytics and performance systems that give us the traditional shareable KPI dashboards with a layer underneath: a statistically powered, machine learning layer that analyzes the streams of “big data” coming from all those sensors in real time, identifies anomalous behavior, and correlates it with other anomalous events all the way from the technical stack, through the user experience layer, to business results.
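
The anomaly-identification part of that layer can be illustrated with a deliberately simple statistical test: flag a point when it is more than a few standard deviations from the mean of a trailing window. This rolling z-score is a stand-in for the machine learning layer described here, not a description of any product; the window size and threshold are assumptions, and production systems also handle seasonality and trend.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def detect_anomalies(series: Sequence[float], window: int = 20,
                     threshold: float = 3.0) -> List[Tuple[int, float]]:
    """Flag points whose z-score against the trailing window meets the threshold."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # A perfectly flat history makes any change unusual.
            z = 0.0 if series[i] == mu else math.copysign(math.inf, series[i] - mu)
        else:
            z = (series[i] - mu) / sigma
        if abs(z) >= threshold:
            anomalies.append((i, z))
    return anomalies
```

For a metric that alternates between 100 and 102 and then jumps to 200, only the jump is flagged, with a z-score of 99.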

I was told that this is too complex. That it will never be mainstream.

Yes, performing streaming analysis of data in real time and correlating it across hundreds or thousands of metrics is complex, and so is a fingerprint sensor on a smartphone. It’s OK for something to be complex inside as long as the user interaction is not. Well-designed products take very complex things and make them simple for users to leverage.

This isn’t anything as futuristic as AI. In fact, to me this seems like the maturation of business intelligence (BI) systems applied to customer experience. In the beginning there was the data. The data is big, raw, complex and hard to look at. Over the years we turned that data into information, delivering reports and dashboards that make it easy to ask questions of the data and show KPIs over time. The fulfillment of the BI promise is that software systems can help us turn data into information, and information into knowledge.

That’s really what we are striving for: operational systems smart enough to self-identify anomalous behavior anywhere in the business/technology stack. Machine-detected anomalies effectively raise a flag that needs to be triaged before anyone jumps into action. But isn’t that what we really want from our business monitoring systems?

Tell me when something unusual is happening and provide all the related things that could be causing it.
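
The “provide all the related things” half can be sketched as a time-window join: given one anomaly, list anomalies on other metrics that occurred close to it, nearest first. The metric names and the 120-second window are hypothetical, and closeness in time suggests, but does not prove, a causal link.

```python
from typing import Dict, List, Tuple

# Hypothetical anomaly timestamps (seconds) per metric, across business, UX and technical layers.
SAMPLE_ANOMALIES = {
    "checkout_conversion": [1000],
    "page_load_p75": [960],
    "db_query_ms": [930],
    "cpu_web01": [5000],
}


def related_anomalies(anomalies: Dict[str, List[float]], metric: str, t: float,
                      window_s: float = 120) -> List[Tuple[str, float]]:
    """Anomalies on other metrics within +/- window_s of time t, nearest first."""
    related = [
        (other, ts)
        for other, times in anomalies.items()
        if other != metric
        for ts in times
        if abs(ts - t) <= window_s
    ]
    return sorted(related, key=lambda item: abs(item[1] - t))
```

Here a drop in checkout conversion (business layer) at t=1000 is linked to a page-load anomaly (user experience layer) and a database anomaly (technical stack), while the unrelated CPU spike is left out.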

Just my 2-cents. It doesn’t seem like rocket science to me.

Ken

Gartner’s Application Performance Management leaderboard likely to keep changing in 2014

Every year Gartner publishes the Magic Quadrant (MQ) for Application Performance Management (APM). It is one of the most comprehensive reports covering APM vendors that can address all 5 of the dimensions Gartner has defined. Many other tools and solutions focus on specific areas, often more effectively than those in the research, but the list includes only vendors that provide a complete solution. A Magic Quadrant is an infographic that displays the competing players in a major technology market, broken down into leaders, visionaries, niche players and challengers.

What’s so interesting are the changes from 2012 to 2013 and, perhaps, how things are likely to shift again.

The APM marketplace is very dynamic! Two factors are making it so. The first, of course, is that digital experiences have become much more significant to our business strategies, driven by Cloud, Mobile and Social. The second, a more traditional story, is that the pace of innovation in application performance management is so breakneck that many of the traditional leaders have lost their footing to newer, more agile startups.

Let’s focus on the top right quadrant, the leaders quadrant. The 2013 research lists just 4 technology players on the leaderboard: Compuware, courtesy of its Gomez and Dynatrace acquisitions; Riverbed, courtesy of its OPNET acquisition; and two recently started APM innovators, AppDynamics and New Relic. Those two are jazzy new startups hatched from the brain trust at Wily Technology (acquired by CA). The other two have invested $600M and around $1B in acquisitions to grow into the leaders quadrant.

This is a significant change from the 2012 Magic Quadrant, which listed IBM, CA, Quest (now Dell), and BMC in the leaders quadrant in addition to the products named in 2013. If we add HP and Microsoft to the list, not a single one of the BIG systems management players (HP, IBM, CA, BMC, MS, or Dell for that matter) has innovated enough to be a leader. You know what that means :)

There has already been a significant amount of reporting about New Relic readying itself for a likely 2014 IPO, and the same can be said for AppDynamics.

Compuware’s business has been under fire for some time while it tries to transform itself into a more relevant business. Even Riverbed had recent rumors of a private equity bid of over $3B.

How long will 6 ginormous systems management vendors be without leading products in the hottest part of the IT Ops marketplace?

My guess is that, while many of the same products may appear in the 2014 leaders quadrant, at least a couple will be operating as part of IBM, HP, CA, BMC, Microsoft or Dell.

In fact, getting acquired by CA again might help New Relic’s CEO, Lew Cirne, out of his patent disputes over former Wily patents.

Ken
