Challenges with Application Visibility



After I create a web application and make it generally available, the first and foremost challenge in front of me is figuring out how many servers to deploy to sustain the incoming traffic. To determine the number of servers, I need to know the traffic that is landing on my servers. Aggregated numbers do not really help here, because traffic does not arrive at a constant rate; understanding the traffic pattern is what matters. To start with, I need to provision servers based on the maximum load, unless I have a solution that automatically scales my servers based on traffic.
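To make the "plan for peak, not for average" point concrete, here is a minimal Python sketch. The request rates, per-server capacity and headroom factor are invented numbers, and servers_needed is a hypothetical helper, not a capacity formula for any particular stack.

```python
import math

def servers_needed(peak_rps, per_server_rps, headroom=0.3):
    """Rough server count for a target peak load.

    peak_rps       -- highest observed requests per second (not the daily average)
    per_server_rps -- requests per second one server handles at acceptable latency
    headroom       -- extra capacity fraction kept for spikes and failures
    """
    return math.ceil(peak_rps * (1 + headroom) / per_server_rps)

# The same daily volume can need very different capacity depending on how
# bursty the traffic pattern is -- averages hide the spikes.
print(servers_needed(peak_rps=100, per_server_rps=250))  # flat traffic  -> 1 server
print(servers_needed(peak_rps=900, per_server_rps=250))  # bursty traffic -> 5 servers
```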
Once I make sure that all the traffic is being handled properly, I start worrying about the user experience. The very first impression is built on how long users have to wait between a click and the page load. I divide this into two major parts: the response time from the server, and the time taken by the browser to render the page.
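As a rough illustration of that split, the sketch below breaks a page load into a server-plus-network portion and a browser-rendering portion using Navigation Timing style milestone names. In practice these values would come from the browser's performance API; the millisecond numbers here are invented, and the Python only shows the arithmetic of the split.

```python
# Hypothetical Navigation Timing milestones (milliseconds since navigation start).
timing = {
    "requestStart": 20,    # browser starts sending the request
    "responseStart": 270,  # first byte arrives -> server + network time
    "responseEnd": 320,    # last byte arrives
    "loadEventEnd": 1450,  # page fully loaded and rendered
}

server_and_network_ms = timing["responseStart"] - timing["requestStart"]
download_ms = timing["responseEnd"] - timing["responseStart"]
browser_render_ms = timing["loadEventEnd"] - timing["responseEnd"]

print(f"server + network: {server_and_network_ms} ms")  # 250 ms
print(f"download:         {download_ms} ms")            # 50 ms
print(f"browser render:   {browser_render_ms} ms")      # 1130 ms
```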
When everything looks good at a gross level, it is time to drill down deeper and figure out which areas (URLs) of the application are used more than others. Typically these are the critical parts of the application workflow and require special attention for availability as well as response time. When I see that traffic to some URL is growing very high, I consider dedicating servers to that traffic and try to optimize that code path.
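One way to get that per-URL picture is a small aggregation over access logs, as in the sketch below. The log line shape ("method path status latency_ms") and the sample lines are assumptions made purely for illustration.

```python
from collections import defaultdict

# Assumed log line shape: "<method> <path> <status> <latency_ms>"
log_lines = [
    "GET /checkout 200 180",
    "GET /checkout 200 240",
    "GET /home 200 45",
    "POST /checkout 500 900",
]

stats = defaultdict(lambda: {"count": 0, "total_ms": 0})
for line in log_lines:
    method, path, status, latency_ms = line.split()
    entry = stats[path]
    entry["count"] += 1
    entry["total_ms"] += int(latency_ms)

# Sort URLs by traffic volume to spot the hot paths worth dedicated servers.
for path, entry in sorted(stats.items(), key=lambda kv: kv[1]["count"], reverse=True):
    avg_ms = entry["total_ms"] / entry["count"]
    print(f"{path}: {entry['count']} requests, avg {avg_ms:.0f} ms")
```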
Once the high priority infrastructure items are taken care of, I would like to know more about the users: how many users are visiting, whether they are new or returning, after how much time they return, how many pages they look at (or better, which pages they visit and in what order), which browsers or devices they use, where they come from, how they are distributed geographically, and so on.
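A minimal sketch of the kind of per-user rollup I have in mind, assuming each request record carries a cookie-based visitor id and that the set of previously seen ids is available; the records and ids are made up for the example.

```python
# Hypothetical request records: (visitor_id, page) in arrival order.
requests = [
    ("u1", "/home"), ("u1", "/pricing"), ("u2", "/home"),
    ("u3", "/blog"), ("u1", "/checkout"), ("u2", "/pricing"),
]
previously_seen = {"u1"}  # visitor ids seen on earlier days (assumption)

pages_by_visitor = {}
for visitor, page in requests:
    pages_by_visitor.setdefault(visitor, []).append(page)

new_visitors = [v for v in pages_by_visitor if v not in previously_seen]
returning = [v for v in pages_by_visitor if v in previously_seen]

print(f"new: {len(new_visitors)}, returning: {len(returning)}")
for visitor, pages in pages_by_visitor.items():
    print(visitor, "->", " -> ".join(pages))  # navigation order per visitor
```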
Next I want to figure out how much of the traffic comes from non-useful bots, so that I can block that traffic somehow and focus on the real traffic. I would also like to detect (as well as prevent) traffic sent with malicious intent by an attacker. The malicious intent may be a DDoS-style attack aimed at the availability of my application, an attempt to steal data, or just an attempt to harm the user interface.
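A crude first cut at estimating the bot share, using nothing more than User-Agent substrings. Real bot detection needs far more signals (IP reputation, behaviour analysis, rate patterns), so treat this only as a sketch; the marker list and sample user agents are assumptions.

```python
# Very rough heuristic: flag requests whose User-Agent contains a known bot marker.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")

def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "python-requests/2.31.0",
]
bot_share = sum(looks_like_bot(ua) for ua in user_agents) / len(user_agents)
print(f"bot share: {bot_share:.0%}")  # 67% in this toy sample
```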
When I come up with a new version of my application, I would like to compare its performance with the version currently running. Only if I find the performance acceptable would I expose the new version to all my customers.
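One simple way to do that comparison is to send a small slice of traffic to the new version and compare latency percentiles before widening the rollout. The sketch below shows only the comparison step, on two invented latency samples; it is not a description of any particular deployment tool.

```python
import statistics

# Hypothetical response-time samples (ms) collected from the two versions.
current_version = [120, 130, 125, 140, 200, 118, 133, 150, 127, 142]
new_version     = [110, 115, 112, 500, 118, 109, 120, 116, 114, 111]

def p95(samples):
    # statistics.quantiles with n=20 gives 5% steps; index 18 is the 95th percentile.
    # Percentiles from small samples are rough -- real comparisons need more data.
    return statistics.quantiles(samples, n=20, method="inclusive")[18]

for name, samples in (("current", current_version), ("new", new_version)):
    print(f"{name}: median {statistics.median(samples):.0f} ms, p95 {p95(samples):.0f} ms")

# Only widen the rollout if both median and tail latency are acceptable;
# here the new version has a better median but a much worse p95.
```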
Providing high-level visibility into these areas is not enough. When something goes wrong, the tool providing me application visibility should also help me debug the issue and allow me to drill down to the individual request level.
Google Analytics is generally considered to cover most of these application visibility aspects. However, it does not satisfy the need, for the following reasons:

  • Google Analytics only captures traffic in which an HTML page is accessed. All my Ajax calls and requests to other resources are not included
  • Google Analytics gets triggered only when the JavaScript on the page is executed. Access to the page by bots is not included

So what I see in Google Analytics is only a subset of the traffic hitting my servers. When reports say that bot traffic amounts to more than 30%, I cannot accurately plan based on Google Analytics numbers. Also, Google Analytics, being a client-side JavaScript tool, has no understanding of my deployment infrastructure; it neither helps me drill down into issues nor helps me scale.
One piece that does sit in the traffic path is the load balancer. But load balancers like AWS ELB, NGINX or HAProxy provide very limited visibility into the traffic and are not really helpful from an insights point of view.
[The post has also been published at Appcito Blog]
