Everything you wanted to know about web performance but were afraid to ask, part 2
In this video, Joshua Bixby, Strangeloop president, discusses the performance solution landscape, including content delivery networks and application delivery controllers. He also explains how Strangeloop's front-end optimization solution, Site Optimizer, works. (See part 1 here.)
Joshua Bixby: So, let’s come back to this world. I’ve got a multi-browser world, I’ve got the internet and I have the data center. If you’re going to solve this today, what would you do? Well, the first thing people think about is, I’m going to improve my data center; I’m going to increase the size of my pipe. So, I’m going to go from this pipe to a really big pipe. What we find in our studies, and I would say ubiquitously across all of the studies that I mention, is that this doesn’t help anymore.
We are now at the speed of light. You could spend more money and more time on the pipe between you and your data center and on the pipe within your data center, and it doesn’t change the performance of your site; that’s important. The next thing I’m going to do is invest in bigger servers: I’m going to go from these little servers to really, really big servers. What we see most of the time is that if a site takes, let’s say, 10 seconds to load, less than one second of that, maybe 10 percent of the time, is spent here on the server. And when we look at the small server, it’s not out of CPU, it’s not out of memory; it’s working as fast as it possibly can.
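The arithmetic behind this point can be sketched in a few lines of Python. This is a minimal sketch of Amdahl’s law applied to the figures quoted above (a 10-second load with roughly 10 percent of it on the server); `load_time_with_faster_server` is a hypothetical helper, and the model assumes server time and everything else simply add up, with no overlap.

```python
def load_time_with_faster_server(total_s: float, server_share: float, speedup: float) -> float:
    """Estimate page load time when only the server-side share gets faster.

    Amdahl's law: time spent outside the server (network round trips,
    browser parsing and rendering) is unchanged, so it bounds the gain.
    """
    server_s = total_s * server_share
    everything_else_s = total_s - server_s
    return everything_else_s + server_s / speedup


# Illustrative figures from the talk: a 10 s load, roughly 10% of it on the server.
for speedup in (2, 4, 1_000_000):
    print(speedup, round(load_time_with_faster_server(10.0, 0.10, speedup), 2))
```

Even with an effectively infinite server speedup, the estimate stays near 9 seconds, which is why bigger servers barely move the total.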
So, improving your pipe isn’t going to help, and getting bigger servers isn’t going to help. Most organizations then start looking outside; they start investing in things like application delivery controllers. I want, you know, an F5 or Citrix or Cisco or something here that will improve my site. Well, there certainly is a slight bit of improvement that you can gain from investing in an application delivery controller. But most of the sites on the internet already have a delivery controller, and if you look at the Fortune 500, the average load time is 7 seconds and getting worse. So, clearly that’s not something that’s going to dramatically alter the situation; in fact, you probably already own one. Organizations also look at investing in CDNs, where you can cache some of the objects out here at the edge so those round trips are much shorter.
Again, that’s a solution that’s going to help, but as we look across the Fortune 500, across the enterprise and e-commerce space, almost all of our customers own a CDN and own an ADC, and performance is still a challenge. So, now that we’ve looked at what you could do to solve it: bandwidth isn’t going to help, servers aren’t going to help, an ADC will help but you probably already own one, and a CDN is going to help marginally but you probably own one. How do you move forward from here? One of the solutions that is implemented regularly is front-end acceleration. All of the problems we talked about, the number of round trips and how browsers load, are front-end problems, i.e. the output of your server needs to improve, and the way to solve them today is through development.
So, today if you want to solve this problem, you’re going to hire many, many developers and they are going to improve the code on your server. So, let’s improve the code. There are some huge challenges with this. One is that the problem is getting incredibly complex: the code that has to come out to each of these browsers has to be different, and that’s a very challenging problem. The other is that it’s risky and it’s incredibly expensive. So, the solution today isn’t good enough, and I want to talk to you about how to solve this problem. Let’s take that typical flow again and talk about how we solve the problem that we just demonstrated.
So, let’s go through that same flow. Here is the internet, here is my data center, and I’m going to drop the Strangeloop service or device into this picture. We sit either as a service or as a device, but for today’s purposes let’s just talk about it as transformation. I need to have a transformation element that sits here; in this case it’s Strangeloop. Let’s follow the process again. The request comes from this browser, gets routed through Strangeloop, and comes to the server. As that page comes through the Strangeloop device, remember the first thing that we talked about: I have time here. This server needs time to stitch together that page, and during that time nothing is happening.
The browser is just waiting. So, that’s an opportunity for us. I want to go through all the opportunities, but that’s a big one. What can we do about it? Well, every time you call for the home page of shop.com or example.com, there is a standard set of things that are going to be sent back. So, one of the areas where transformation can help is by sitting and learning. As soon as that request hits us, or hits a transformation engine, we can send a response directly back to the browser while the server is churning. I can start sending the logo, the home page image, maybe some of the CSS, the stuff I know is going to be called because I’ve seen that HTML document so many times.
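One way to picture the “learning” step is to count how often each resource appears across many past responses for the same URL and push only the ones that show up almost every time. The sketch below assumes resource sets have already been extracted from those responses; `learn_common_resources`, the 90 percent threshold, and the file names are illustrative assumptions, not Strangeloop’s actual algorithm.

```python
from collections import Counter
from typing import Iterable, List, Set


def learn_common_resources(observed_pages: Iterable[Set[str]], min_share: float = 0.9) -> List[str]:
    """Return resources referenced by at least `min_share` of the observed
    responses for one URL: candidates that are safe to push to the browser
    while the server is still generating the HTML."""
    pages = list(observed_pages)
    if not pages:
        return []
    # Count in how many responses each resource appeared.
    counts = Counter(res for page in pages for res in page)
    threshold = min_share * len(pages)
    return sorted(res for res, n in counts.items() if n >= threshold)


# Hypothetical history of home page responses: the promo image rotates.
history = [
    {"logo.png", "hero.jpg", "site.css", "promo-a.jpg"},
    {"logo.png", "hero.jpg", "site.css", "promo-b.jpg"},
    {"logo.png", "hero.jpg", "site.css", "promo-a.jpg"},
]
print(learn_common_resources(history))
```

A high threshold keeps rotating content, such as the promo images here, out of the early push, so bytes sent during server think time are rarely wasted.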
So, I can take all the commonalities and start shipping them down. That’s time that’s lost today. With transformation, with site optimization, and with the Strangeloop Site Optimizer, I can now take advantage of it: as soon as the request hits me, we send it off to the server to do its work and we automatically start sending responses, just start filling up the browser. So, that’s number one. Number two is that when the server responds and has finally created that page, it comes through the Strangeloop device again, the one-size-fits-all package that’s coming from the server, and we transform it. What does transformation mean? Well, we change what’s coming out of the server to optimize it for the browser. So, what I come out of here with is very different HTML than what I went in with. If you take a picture of this, I come in with something that might be very large and long, and I change it to something that’s optimized and efficient for this particular browser; you know, these cars that need really good fuel.
So, what’s coming back to the browser will have a different order. Let’s say the original is one, two, three, four, sort of the lowest-common-denominator order. This browser might render more efficiently if I put four up top, then three, then two, and then one, for example. The other thing that’s really important, and that we can do when we are sitting in a place of transformation, is combine resources. Remember when we talked about the 80 to 200 resources that this HTML is going to call? Say traditionally it’s call one, call two, call three, call four, call five; when we transform that, we can actually put these into containers. So, I can send you one container that has all of these files embedded in it.
So, instead of calling for 80, 100, or 200 elements, the HTML that you’re going to get is going to call for five or ten. Let me show you what that looks like. The request comes through, we start responding with resources, and the server does its work. The server responds with HTML and we transform it for this browser; let’s again call this IE7 and use that example. So, now I have an IE7 HTML document. As the browser starts reading it, it gets a call to download image package one. Well, if we’ve done our job, you already have image package one because I sent it back here. So, check, I don’t have to go across the network for that; it’s already there.
Image package two was probably also sent when the request came in and the server was working, so check. Let’s say there is an image package three that I haven’t sent. What’s going to happen is I’m going to make a call across the internet for image package three, and the Strangeloop device is going to respond with it. Remember the diagram that we had before, with hundreds of round trips, the limitation of two highways, and the fact that things weren’t necessarily organized properly? In the Strangeloop world, in the transformation world, I have responded immediately, so things are automatically there. More importantly, I’ve reduced these round trips from hundreds to dozens, or in some cases two or three, because I’ve been able to package all this stuff together. And also, very importantly, I’ve been able to package it specifically for that browser.
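A rough way to see why containers cut the waiting is to model requests as going out in waves over a fixed number of parallel connections (the “two highways”), one round trip per wave. This is a deliberately crude sketch that ignores bandwidth, TCP slow start, and connection setup; `bundle`, `network_fetch_time_ms`, the 100 ms round trip, and the package size of 20 are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def bundle(resources: Sequence[str], max_per_package: int) -> List[List[str]]:
    """Group individual resources into packages (containers) of bounded size."""
    return [list(resources[i:i + max_per_package])
            for i in range(0, len(resources), max_per_package)]


def network_fetch_time_ms(n_requests: int, rtt_ms: float, parallel_connections: int = 2) -> float:
    """Crude model: requests go out in waves of `parallel_connections`, and each
    wave costs one round trip. Bandwidth and TCP slow start are ignored."""
    return math.ceil(n_requests / parallel_connections) * rtt_ms


resources = [f"img{i}.png" for i in range(100)]
packages = bundle(resources, max_per_package=20)
already_pushed = {0, 1}  # hypothetical: packages sent while the server was thinking
to_fetch = [i for i in range(len(packages)) if i not in already_pushed]

print(len(packages), "packages")
print(network_fetch_time_ms(len(resources), rtt_ms=100), "ms unbundled")
print(network_fetch_time_ms(len(to_fetch), rtt_ms=100), "ms for the remaining packages")
```

With 100 separate requests the model charges 50 round trips; after bundling into five packages, two of which were already pushed while the server was working, only two waves remain.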
There are a number of other innovations. Now that we have this technology that sits in the middle and makes transformations, there is more that we can do, and I want to talk through that. One of the most important elements in solving this problem is obviously getting things to you faster, and there are a number of opportunities there; we talked about a few of them. So, here is the browser again. One of the opportunities we have to get things to you faster, with the Strangeloop device, is to respond while the server is thinking. I’m just going to number these: opportunity one is that as the server sits here thinking, churning away and building pages, I have the opportunity to take advantage of that time. I also have another really interesting opportunity: once you’ve actually downloaded the page and it’s rendered for you, users spend time on it. (Here are some nice pictures; I won the award for most improved artist in grade eight, so I apologize in advance.)
What we see is that when users get a page, they scroll, they read, they absorb the page. For the simplest click, people can click in about 1.2 seconds, but many times people spend 3 to 6 seconds on a page: you get to the home page, and then you click somewhere else. Again, there is a huge opportunity here. In those one to six seconds, what’s happening between you and the server today? Nothing. This is a quiet line just waiting to be optimized. So, one of the opportunities that we take advantage of, and that transformation can take advantage of, is that while you’re thinking, the Strangeloop device starts to send resources for the most likely next page that you’re going to see.
So, when you click through to the next page, you already have the bulk of its resources downloaded, and you don’t have to make those calls across the wire. The value of transformation is that we can reduce or eliminate round trips between the server and the browser. This opportunity is preloading, where I’m studying user behavior: thousands of requests from different users are coming through here, all the way back to the server generating HTML. We sit here as a very intelligent system, we watch, and we use our learning algorithms to say, just like your analytics tool does, “huh, if you went to page one, you’ve got a 30 percent likelihood of going to page two, a 20 percent likelihood of going to page three,” et cetera.
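The preloading idea can be sketched as two steps: estimate next-page probabilities from observed click paths, then queue resources for the likely next pages that the current page has not already loaded. The session data below is made up to reproduce the 30 percent and 20 percent figures from the talk; `transition_probabilities`, `preload_list`, and the 25 percent cutoff are hypothetical names and parameters, not a description of Strangeloop’s learning algorithms.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set


def transition_probabilities(sessions: Sequence[Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next page | current page) from observed click paths."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for path in sessions:
        # Count each consecutive (current, next) pair in the path.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    probs: Dict[str, Dict[str, float]] = {}
    for page, c in counts.items():
        total = sum(c.values())
        probs[page] = {nxt: n / total for nxt, n in c.items()}
    return probs


def preload_list(
    probs: Dict[str, Dict[str, float]],
    page: str,
    page_resources: Dict[str, Set[str]],
    min_prob: float = 0.25,
) -> List[str]:
    """Resources of likely next pages that the current page has not already
    loaded, most probable next page first."""
    candidates = sorted(probs.get(page, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    already: Set[str] = set(page_resources.get(page, set()))
    ordered: List[str] = []
    for nxt, p in candidates:
        if p < min_prob:
            break  # candidates are sorted, so every remaining page is less likely
        for res in sorted(page_resources.get(nxt, set()) - already):
            ordered.append(res)
            already.add(res)
    return ordered


# Made-up click paths: from page1, 30% go to page2, 20% to page3, 50% to page4.
sessions = [["page1", "page2"]] * 3 + [["page1", "page3"]] * 2 + [["page1", "page4"]] * 5
page_resources = {
    "page1": {"logo.png", "site.css"},
    "page2": {"logo.png", "site.css", "cart.js"},
    "page3": {"logo.png", "search.js"},
    "page4": {"logo.png", "deals.jpg"},
}
probs = transition_probabilities(sessions)
print(probs["page1"])
print(preload_list(probs, "page1", page_resources))
```

Here page four (50 percent) and page two (30 percent) clear the cutoff, so `deals.jpg` and `cart.js` would be sent while the user reads page one; page three, at 20 percent, would not.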
If you look at the resources across these pages, there are many commonalities, so let’s start packaging those into packages that I can send you while you are reading a page. Again, the only way we can do this is if we can transform HTML and use HTML as our protocol to instruct the browser. HTML is what transformation uses to tell a browser, you need to start doing what I want you to do. We now have control over what our end users’ browsers do. Obviously this isn’t something that we use maliciously, and browsers guard against that; we use it to improve performance. So, I’ve got the server wait time, I have the preloading opportunity while users are reading, and I have all of this opportunity to reduce round trips. And, as we alluded to, but I want to speak to it, I also have the ability to create optimizations that are specific to browsers, be it the order, which we’ve talked about, or which packages different browsers use.
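Browser-specific transformation can be illustrated as applying a per-browser profile to the same list of resource references. The profiles below are hypothetical and only show the mechanism; the one grounded detail is that IE7 does not support `data:` URIs, so a transformer could not inline images for it. `PROFILES`, `kind_of`, and `transform_for_browser` are illustrative names, and classifying resources by file extension is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class BrowserProfile:
    order: Tuple[str, ...]  # resource kinds, highest priority first
    inline_images: bool     # may small images be embedded as data: URIs?


# Hypothetical profiles for illustration only (IE7 genuinely lacks data: URI support).
PROFILES: Dict[str, BrowserProfile] = {
    "IE7": BrowserProfile(order=("css", "img", "js"), inline_images=False),
    "Chrome": BrowserProfile(order=("css", "js", "img"), inline_images=True),
}


def kind_of(url: str) -> str:
    """Classify a resource by extension (a simplification)."""
    if url.endswith(".css"):
        return "css"
    if url.endswith(".js"):
        return "js"
    return "img"


def transform_for_browser(resources: Sequence[str], browser: str) -> Tuple[List[str], bool]:
    """Reorder resource references for one browser. Unknown browsers get the
    untouched, one-size-fits-all order and no inlining."""
    profile = PROFILES.get(browser)
    if profile is None:
        return list(resources), False
    rank = {kind: i for i, kind in enumerate(profile.order)}
    # sorted() is stable, so resources of the same kind keep their original order.
    return sorted(resources, key=lambda url: rank[kind_of(url)]), profile.inline_images


original = ["app.js", "logo.png", "site.css", "hero.jpg"]
for browser in ("IE7", "Chrome", "Firefox"):
    print(browser, transform_for_browser(original, browser))
```

The same server output yields a different document per browser, which is the point of transforming in the middle rather than writing one lowest-common-denominator page.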
So, the browser wars are significant: Firefox, IE, Chrome, and Safari are all trying to win you as a customer. They want your allegiance, and they put little hooks into their browsers that allow them to dominate. Think about IE and Microsoft, Chrome and Google, Safari and Apple: they have a vested interest in building up these tools so that their sites have an optimum experience. So, they are building in hooks to deliver that optimal experience, and transformation lets us take advantage of those hooks to render the optimum experience. There is more. I want to talk about the next phase of this. It’s great to get all of the resources down, and those are all very helpful things; it’s great to take advantage of preloading. But let’s talk about a flow through a site, because this is a really big problem.
When I go to page one, let’s say I have 100 resources. Obviously, a flow that takes somebody through the conversion funnel, from the top all the way through, is usually more than one page. In the case of a shopping site, I have to go through 10 pages: add products, accessorize them, get to checkout, and buy my products. So, a typical flow goes from page 1 to page 2 to page 3, et cetera. Now, this changes the landscape. For the simple problem that we talked about, one page for one user, reducing the round trips is a great strategy. But when you start going from page 1 to page 2 to page 3, the game changes, because this page also has 100 round trips and this page also has 100 round trips. But there is a nuance here: they overlap. Of the 100 round trips here, 30 also show up on this page, and of this page’s round trips, 20 show up on that page.
So, imagine if I took a blind approach where I make every page have the smallest number of round trips. What I would do is start sending you stuff in this package that you also got in this package, and that you also got in this package. So, I would actually be duplicating effort: if 30 of the round trips here are also used here, I do not want to send those 30 again when you come to page 2; that would be inefficient. So, we have to take into consideration the flows of customers through the site, which resources are used universally, and what each package should contain. This is not as simple as making one package per page and making each page as efficient as possible. I actually have to think about all of these interactions.
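Flow-aware packaging can be sketched by walking a predicted flow and assigning each resource to the first page that needs it, so later packages never resend it. The example reproduces the overlaps described above (30 shared resources between pages 1 and 2, 20 between pages 2 and 3). It assumes the browser keeps earlier resources in its cache for the whole flow; `packages_for_flow`, `names`, and the resource names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def packages_for_flow(flow: Sequence[str], page_resources: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each resource to the first page in the flow that needs it, so a
    later page's package never resends something the browser already has."""
    sent: Set[str] = set()
    plan: Dict[str, List[str]] = {}
    for page in flow:
        plan[page] = sorted(page_resources[page] - sent)
        sent |= page_resources[page]
    return plan


def names(start: int, stop: int) -> Set[str]:
    """Hypothetical resource names r<start> .. r<stop-1>."""
    return {f"r{i}" for i in range(start, stop)}


# 100 resources per page; page 2 shares 30 with page 1, page 3 shares 20 with page 2.
page_resources = {
    "page1": names(0, 100),
    "page2": names(70, 170),
    "page3": names(150, 250),
}
plan = packages_for_flow(["page1", "page2", "page3"], page_resources)
print({page: len(res) for page, res in plan.items()})
```

Optimizing each page in isolation would ship 300 resources across the three pages; the flow-aware plan ships 250 (100, then 70, then 80).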
I want to talk briefly about how we solve the problem from a technical perspective, what the architecture looks like, because I think there is something really important here. We are transforming HTML. In the transformation process, imagine HTML coming out from the server and hitting Strangeloop. The Strangeloop Site Optimizer is either a physical appliance that sits in the data center, a virtual appliance that sits within the server environment, or a service; in any of those capacities, it doesn’t matter which, we still have this. So, I’ve got the data center and I’ve got Strangeloop, and as that HTML comes through we are going to act upon it and do our transformation. Imagine: this has to be lightning fast. I can’t sit and hold a page while making decisions for anything longer than one millisecond.
That’s basically what our line speed needs to be: one millisecond. Because of that, and because of the complexity of transformation, in order to solve this problem effectively we’ve got to have a process that is lightning fast and in line, plus the ability to hand off to an offline process, structured within the same box, where we can do all of our heavy lifting. The process of putting things into containers, making it right for each browser, and making these predictions can’t be done in one millisecond; if you try, you are just going to slow down your site.
So, in order to do effective transformation, you need this two-minded animal: an inline process where, within one millisecond, I make perfect transformations, and an offline process where I can basically act like a browser. What happens is that the page for IE7 comes here and we render it just like a browser would, because we need to have the same picture as the browser. We need to create the DOM, we need to create everything that a browser creates, and then we need to take that and start transforming it. What are the best packages? What are the safe packages? How do I combine things together? What’s the best preload list based on all of the algorithms and the math from all of the loads we’ve seen?
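The inline/offline split can be sketched as a fast path that only looks up a precomputed plan and applies cheap string rewrites, and a slow path that does the expensive analysis outside the request. In this minimal sketch, a miss simply passes the page through unchanged and queues it once; `SplitTransformer`, `toy_analyze`, and the idea of a plan as a dictionary of fragment replacements are assumptions for illustration, and nothing here measures or enforces the one-millisecond budget.

```python
import queue
import threading
from typing import Callable, Dict, Set, Tuple

Key = Tuple[str, str]  # (URL, browser family)
Plan = Dict[str, str]  # cheap rewrite rules: original HTML fragment -> replacement


class SplitTransformer:
    """Hypothetical inline/offline split. The inline path does a dictionary
    lookup plus string replacement; all expensive analysis is queued."""

    def __init__(self, analyze: Callable[[str, str], Plan]) -> None:
        self._analyze = analyze
        self._plans: Dict[Key, Plan] = {}
        self._queued: Set[Key] = set()
        self._pending: "queue.Queue[Tuple[Key, str]]" = queue.Queue()
        self._lock = threading.Lock()

    def handle_inline(self, url: str, browser: str, html: str) -> str:
        """Fast path: apply a precomputed plan if one exists; otherwise pass the
        page through untouched and queue it (once) for offline analysis."""
        key = (url, browser)
        with self._lock:
            plan = self._plans.get(key)
            if plan is None and key not in self._queued:
                self._queued.add(key)
                self._pending.put((key, html))
        if plan is None:
            return html
        for old, new in plan.items():
            html = html.replace(old, new)
        return html

    def run_offline_once(self) -> bool:
        """Slow path: analyze one queued page. Returns False if the queue was empty."""
        try:
            key, html = self._pending.get_nowait()
        except queue.Empty:
            return False
        # The expensive step (in the talk: render like a browser, build the DOM, choose packages).
        plan = self._analyze(key[1], html)
        with self._lock:
            self._plans[key] = plan
            self._queued.discard(key)
        return True


def toy_analyze(browser: str, html: str) -> Plan:
    """Stand-in for the offline analysis: here it just merges two stylesheets."""
    return {'<link href="a.css"><link href="b.css">': '<link href="ab.css">'}


transformer = SplitTransformer(toy_analyze)
page = '<head><link href="a.css"><link href="b.css"></head>'
print(transformer.handle_inline("/", "IE7", page))  # first request: served unchanged
transformer.run_offline_once()
print(transformer.handle_inline("/", "IE7", page))  # later requests: rewritten
```

In a real deployment `run_offline_once` would loop in a background worker rather than being called by hand, and plans would need invalidating when the site changes.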
So, what’s really important about the architecture here, in any transformation product, in any product that’s going to do heavy transformation of HTML to create outputs for different users, is the separation between the line-speed work that has to be done quickly and an offline process where I can take the time and do it effectively. We cannot have a process where this is all done together. So, that concludes my session on web page performance and the value of transformation. I encourage you to take a look at all the research that’s out there about how performance improves your business objectives, such as revenue. It’s been a pleasure. Thank you.