While working on optimizing page load times for a high-profile AJAX application, I had a chance to investigate how much I could reduce latency due to external objects. Specifically, I looked into how the HTTP client implementation in common browsers and characteristics of common Internet connections affect page load time for pages with many small objects.
I found a few things to be interesting:
IE, Firefox, and Safari ship with HTTP pipelining disabled by default; Opera is the only browser I know of that enables it. No pipelining means each request has to be answered and its connection freed up before the next request can be sent. This incurs average extra latency of the round-trip (ping) time to the user divided by the number of connections allowed. Or if your server has HTTP keepalives disabled, doing another TCP three-way handshake adds another round trip, doubling this latency.
By default, IE allows only two outstanding connections per hostname when talking to HTTP/1.1 servers or eight-ish outstanding connections total. Firefox has similar limits. Using up to four hostnames instead of one will give you more connections. (IP addresses don't matter; the hostnames can all point to the same IP.)
Most DSL or cable Internet connections have asymmetric bandwidth, at rates like 1.5Mbit down/128Kbit up, 6Mbit down/512Kbit up, etc. Ratios of download to upload bandwidth are commonly in the 5:1 to 20:1 range. This means that for your users, a request takes the same amount of time to send as it takes to receive an object of 5 to 20 times the request size. Requests are commonly around 500 bytes, so this should significantly impact objects that are smaller than maybe 2.5k to 10k. This means that serving small objects might mean the page load is bottlenecked on the users' upload bandwidth, as strange as that may sound.
Using these, I came up with a model to guesstimate the effective bandwidth of users of various flavors of network connections when loading various object sizes. It assumes that each HTTP request is 500 bytes and that the HTTP reply includes 500 bytes of headers in addition to the object requested. It is simplistic and only covers connection limits and asymmetric bandwidth, and doesn't account for the TCP handshake of the first request of a persistent (keepalive) connection, which is amortized when requesting many objects from the same connection. Note that this is best-case effective bandwidth and doesn't include other limitations like TCP slow-start, packet loss, etc. The results are interesting enough to suggest avenues of exploration but are no substitute for actually measuring the difference with real browsers.
To show the effect of keepalives and multiple hostnames, I simulated a user on net offering 1.5Mbit down/384Kbit up who is 100ms away with 0% packet loss. This roughly corresponds to medium-speed ADSL on the other side of the U.S. from your servers. Shown here is the effective bandwidth while loading a page with many objects of a given size, with effective bandwidth defined as total object bytes received divided by the time to receive them:
Interesting things to note:
For objects of relatively small size (the left-hand portion of the graph), you can see from the empty space above the plotted line how little of the user's downstream bandwidth is being used, even though the browser is requesting objects as fast as it can. This user has to be requesting objects larger than 100k before he's mostly filling his available downstream bandwidth.
For objects under roughly 8k in size, you can double his effective bandwidth by turning keepalives on and spreading the requests over four hostnames. This is a huge win.
If the user were to enable pipelining in his browser (such as setting Firefox's network.http.pipelining in about:config), the number of hostnames we use wouldn't matter, and he'd make even more effective use of his available bandwidth. But we can't control that server-side.
Perhaps more clearly, the following is a graph of how much faster pages could load for an assortment of common access speeds and latencies with many external objects spread over four hostnames and keepalives enabled. Baseline (0%) is one hostname and keepalives disabled.
Interesting things from that graph:
If you load many objects smaller than 10k, both local users and ones on the other side of the world could see substantial improvement from enabling keepalives and spreading requests over 4 hostnames.
There is a much greater improvement for users further away.
This will matter more as access speeds increase. The user on 100meg ethernet only 20ms away from the server saw the biggest improvement.
One more thing I examined was the effect of request size on effective bandwidth. The above graphs assumed 500 byte requests and 500 bytes of reply headers in addition to the object contents. How does changing that affect performance of our 1.5Mbit down/384Kbit up and 100ms away user, assuming we're already using four hostnames and keepalives?
This shows that at small object sizes, we're bottlenecked on the upstream bandwidth. The browser sending larger requests (such as ones laden with lots of cookies) seems to slow the requests down by 40% worst-case for this user.
As I've said, these graphs are based on a simulation and don't account for a number of real-world factors. But I've unscientifically verified the results with real browsers on real net and believe them to be a useful gauge. I'd like to find the time and resources to reproduce these using real data collected from real browsers over a range of object sizes, access speeds, and latencies.