Good article. Thanks.
Good article thanks.
excellent article!
but if you’re sending a bigger chunk (bandwidth) then my download finishes earlier, which is less time which is faster internet (bandwidth).
You are in violation of your Google Ads agreement by encouraging people to click on your ad links. Don’t let Google catch you…
“Thanks for Stopping By.
If you found this article useful, please leave a tip by clicking on an ad.”
I never realized that way - high BW is not necessarily high speed. Sandbag problem explains it all. Thanks.
Not bad, you are at least displaying an understanding of the issues.
You said that an average ping to http://www.google.com (137 km from your house) took 73ms. I live on the East Coast of the US and when I ping http://www.google.com, I get an average of 19ms. Also, if I ping http://www.bbc.co.uk I get an average of 44ms. Here’s a picture: http://img478.imageshack.us/img478/7005/pingsup6.jpg . I, too, have Comcast as my provider. Am I missing something?
Eric,
You are actually demonstrating one point from my article, and a point from the upcoming part 2. You and I are seeing different latencies because of distance and a lack of SLAs on the Comcast connection. When I ping the same Google IP from my Comcast connection, I get ping times of 95ms. So, point #1 is that latency is a function of distance and it has an impact on customer experience. Point #2 from the next article is that one method of fighting latency is to create a distributed architecture to move your services closer to your end users. This is what Google has done. The Google server you are accessing is close to you on the East coast. That is why you ping it at 19ms and I ping it at 95ms.
Thanks for taking the time to read and comment.
-Bill
Eric (other Eric),
It is absolutely NOT a violation of the Google Terms and Conditions to encourage my readers to click on an ad and visit my sponsors. The whole purpose of ads is to encourage this type of behavior. More specifically, the Google T&Cs read as follows:
“You shall not, and shall not authorize or encourage any third party to: (i) directly or indirectly generate queries, Referral Events, or impressions of or clicks on any Ad, Link, Search Result, or Referral Button through any automated, deceptive, fraudulent or other invalid means, including but not limited to through repeated manual clicks, the use of robots or other automated query tools and/or computer generated search requests, and/or the unauthorized use of other search engine optimization services and/or software”
I am not encouraging anyone to create “repeated manual clicks.” I merely ask that those who are kind enough to visit my site and read my articles support the sponsoring ads that make this site possible. I know most people are busy and often forget that ads are placed on a site not to annoy readers but to help offset the time and expense providing useful and original content entails. I make a concerted effort to visit the ads on sites I find useful and hope others will do the same for me.
I greatly appreciate all my readers and hope they find the experience was beneficial. I also appreciate Google and its ability to both drive traffic to my site and help me offset some of the costs. Ultimately it is up to the sponsors to make the results from you visiting their site worth the $.05 they pay me for the click.
Thank you for visiting and please come back to read part 2 of this article.
-Bill
You’re going to get banned either way if you keep it up lol. Google doesn’t even want you to mention the ads, let alone tell people to click them even once.
encourage any third party to: (i) directly or indirectly generate queries
you just said directly, generate one click.
Bill,
Thank you for clearing that up 
Bill,
While it is not a violation of the AdSense ToC, please take time to peruse this quotation from the Program Policies.
“In order to ensure a good experience for users and advertisers, publishers may not request that users click the ads on their sites or rely on deceptive implementation methods to obtain clicks. Publishers participating in the AdSense program:
* May not encourage users to click the Google ads by using phrases such as “click the ads,” “support us,” “visit these links,” or other similar language.”
According to this, requesting users to make a click is against the policy.
- Nic
Hey, if they ban me, they ban me. There’s always commission junction, ad brite, link exchange, etc. I’m lucky if I make enough off all of them to cover my hosting fees in any given month, so I’m not too concerned. I maintain that their T&Cs only prevent me from generating queries, clicks or referrals through “automated, deceptive, fraudulent or other invalid means…” Nothing automated, deceptive or fraudulent about asking people to support your sponsors. Not using robots, automated query tools, or computer generated search requests. I’m just one poor schmuck pimping his wares.
If you really want to support me, buy one of the books I recommend off my amazon link, or use my Amazon search box to find and buy something else. I make more money off Amazon than anything else…still not enough to fund my own book habit though. 
-Bill
Am I missing something? You say theoretical speed through fiber optics is 200*10^6 m/s. That’s 200*10^3 kilometers/s, or 200,000 km/sec. Then you make the completely reasonable assumption that the USA is 5,000 km across. Then you somehow say that that means that the theoretical latency is therefore 50ms.
To find the theoretical latency, you then just take the distance (5,000km) and divide it by your given theoretical speed (200,000km/sec). The distances cancel, leaving you with seconds as your unit. Switch that to ms, and you have what you want–the latency.
5,000km / 200,000km/s = .025s = 25ms
25 ms seems like the correct theoretical latency to me, which means saying “a 55ms RTT SLA is pretty good” would actually not be true at all, given that it’s over twice the theoretical latency.
Unless I’m missing something, the math stated in this article is incorrect, which really serves to damage its credibility.
- Sam
Nic,
I re-read the Google policies and changed my pages to read “please visit our sponsors.” It’s walking a fine line, but I don’t specify which ads to click. Google and its advertisers should want people to click the ads. Hopefully, they won’t ban me but if they do, c’est la vie.
Sam,
Latency is measured in round trip time. The packet has to go there, and the acknowledgement has to come back. Google Maps reports the distance from San Francisco to New York as 2,907 miles or 4,678 kilometers. Fiber doesn’t follow a straight path, so I rounded up to 5,000km. Round trip it is 10,000km. Hence 50ms as the theoretical round-trip latency. Sprint is advertising a North American SLA on MPLS of 55ms between any 2 POPs. This means they have 10% overhead on the theoretical limit. Thanks for reading and commenting.
-Bill
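The round-trip arithmetic in this exchange can be checked with a short sketch. It uses only the figures quoted in the thread (200,000 km/s in fiber, a 5,000 km path, a 55ms SLA); the function name is made up for illustration, and real paths add routing and queuing delay on top of pure propagation.

```python
# Sketch only: pure propagation delay, using the figures quoted in the thread.
FIBER_SPEED_KM_PER_S = 200_000  # roughly 200 * 10^6 m/s in fiber


def propagation_delay_ms(distance_km, round_trip=True):
    """Time for a signal to cover the distance, one way or there and back."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    trips = 2 if round_trip else 1
    return one_way_s * trips * 1000


one_way_ms = propagation_delay_ms(5_000, round_trip=False)
rtt_ms = propagation_delay_ms(5_000)
sla_ms = 55  # the advertised SLA mentioned above
overhead = (sla_ms - rtt_ms) / rtt_ms

print(f"one way: {one_way_ms:.0f} ms, round trip: {rtt_ms:.0f} ms")
print(f"SLA overhead over the theoretical RTT: {overhead:.0%}")
```

The one-way figure is the 25ms Sam computed; doubling it for the return trip gives the 50ms used in the article.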
I’m not impressed with my DSL ISP - a province-wide telco. They do what I call “Fedex-style” routing. Instead of mesh-routing or shortest-path routing, they route the traffic from every smaller town back to their central NOC and back out again. The valley I live in has 7 towns about 20 miles apart with a fiber point-to-point backbone connecting them.
The Telco uses the point-to-point fiber for voice traffic, but a tracert from town A to adjacent town B shows it gets routed to big city NOC 300 miles away and back, so instead of a 20 mi. direct route, it takes at least 600 mi. to get 20 mi. up the valley.
Why do they do this? Maybe they can’t afford BGP-capable routers? They can’t afford enough knowledgeable techs to configure a mesh network? They have to do it this way (bring it all back to one central monitoring point) to comply with CALEA-type requirements?
I thought part of the rationale for the Internet was to minimize single-point-of-failure situations. This ‘Fedex-style’ routing can’t be good for latency either..
Any logical reason why Telco ISP does their network this way?
Nice write up!
Living in Hawaii and working for an ISP, the concept of the “big long pipe” is an everyday concern for us. Best case latency for us to the mainland is usually around 60ms first hop with an average of 150 to 200ms to actual servers.
With latency in this range a windows user can’t even benefit from an internet connection over 3mbs (give or take) without doing some tweaking.
Anyway good intro write up to the problem. I love the analogy, I think I’ll steal it.
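The roughly 3mbs ceiling described above follows from TCP sending at most one window of data per round trip. Here is a minimal sketch, assuming a 65,535-byte window with no window scaling (an assumption; the comment does not state the window size) and ignoring packet loss and protocol overhead. The function name is hypothetical.

```python
# Assumption: a 65,535-byte receive window (no RFC 1323 window scaling).
WINDOW_BYTES = 65_535


def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP connection: one full window per round trip."""
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000) / 1_000_000


# RTT values taken from the comment above (60ms first hop, 150 to 200ms to servers).
for rtt_ms in (60, 150, 200):
    limit = max_throughput_mbps(WINDOW_BYTES, rtt_ms)
    print(f"RTT {rtt_ms:>3} ms -> at most {limit:.2f} Mb/s per connection")
```

Under these assumptions the limit depends only on the window and the round-trip time, not on the size of the pipe.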
What about latency of gigabit ethernet network cards? I worry much about latencies in my very fast NFS (Network File System). Gigabit is enough but most disk reads are small and occur many at once. I have very low access time on SCSI disks and a huge (over 6GB) memory disk cache. Usually when you want to buy a network interface card, there is no latency parameter in the product description. I’d like to know which cards are better for that job and how I should search for them.
Off topic : the “If you found this article useful, please visit our sponsors.” right above your Adsense ads will some day get you in trouble
(read: violate Adsense TOS)
Thank you for a great article. It was very educational. You’re in my bookmarks!
Hey Pat,
Serialization delay or latency is the amount of time for a packet to be transmitted on the physical medium. This delay is determined by the size of a packet and the rate of your physical interface.
Serialization delay is only a concern on links below a T1 (for the most part). Anything higher is generally fast enough to never cause any noticeable delay. On your gigabit ethernet link the serialization rate is fixed, so for your example it only takes 0.012ms (12 microseconds) to transmit a 1500 byte packet. The reason you never hear anyone advertising their NIC’s serialization latency is that it is fixed.
Also, the topic of this article is TCP windowing; for all practical purposes TCP sessions are from NIC to NIC. Even though gigabit latency is better than that of most modern hard drives, it’s irrelevant to this discussion, as this topic is about the impact of what’s called propagation delay on a TCP session. Unless you are dealing with a SAN and have to worry about large data transfers from server to server over direct gigabit links, don’t worry about how your hard drives perform compared to your NIC.
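Serialization delay is simply packet size in bits divided by link rate. A minimal sketch of that arithmetic follows; the function name is hypothetical, the link rates are the nominal ones (1.544 Mb/s for a T1, 1 Gb/s for gigabit ethernet), and framing overhead is ignored.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time to clock a packet onto the wire: bits divided by line rate."""
    return packet_bytes * 8 / link_bps * 1000


# Nominal line rates; framing overhead is ignored in this sketch.
LINKS = {
    "T1 (1.544 Mb/s)": 1_544_000,
    "Gigabit Ethernet": 1_000_000_000,
}

for name, rate_bps in LINKS.items():
    delay = serialization_delay_ms(1500, rate_bps)
    print(f"{name}: {delay:.3f} ms for a 1500-byte packet")
```

By this arithmetic a 1500-byte packet takes about 12 microseconds on gigabit ethernet and several milliseconds on a T1, which is why serialization delay mostly matters on slow links.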
I’m a fairly junior sysadmin/network admin but this doesn’t really cover much or explain it correctly — The article makes it look like the further away the destination, the bigger effect on bandwidth when in reality it is very rare if the TCP window size is large enough.
I wrote a little guide on network performance tuning that covers I think most reasons for low bandwidth or high latency - the guide can be found here: http://hackeron.dyndns.org/hackeron/trac.cgi/wiki/Linux%20Network%20Performance%20Tuning
Roman, your link and his are saying the same thing. I wouldn’t be surprised if his next article is about tweaking a windows PC the same way yours discusses tweaking a linux box.
Roman,
Thanks for the great link. This first article was discussing the limitations of TCP on “big long pipes” as Mark put it. The next article will focus on what to do about it. Although much of what I will cover is network and/or infrastructure solutions, I also plan to cover host tweaks in most major operating systems. I will be sure to link back to your page. Thanks for reading.
-Bill
We get away with a much smaller pipe than we would normally- we have an appliance that caches all data on a block level that gets sent over the WAN and only ever sends data once. Most of the time, it just sends references to the other side.
It’s kinda cool, because if an employee accesses a file on a server over the WAN, changes a bunch of stuff, and then emails it to someone else back at the other side, the WAN sends a bunch of references to the blocks that made up the unchanged parts of the file.
Hi,
I’ve noticed some fundamental flaws in your understanding of how TCP works. I’ve put a full response on my blog:
http://fragglet.livejournal.com/11924.html
Fragglet,
Thanks for linking to my article. You are correct that TCP will, to use your analogy, add more trucks, up to a point. How many trucks TCP will support depends on the host OSs on each end, many of which still default to a 64K Byte TCP window. You are incorrect in assuming this only applies to 10Gb/s links. On a long-distance WAN, latency can have an impact on T1s. You are also incorrect in assuming that the TCP window will not shrink due to congestion control algorithms. Most high-latent connections will also experience an increase in packet loss. When packets are lost, the congestion algorithm will decrease the congestion window.
While my sandbag analogy is not perfect, it does describe a fairly complex concept in language that a non-technical person can understand. As distance increases, it takes longer for the packet to travel round trip (the wall) and in some cases the TCP window (the container) shrinks. I hope you will check back in for the 2nd part in the series. In that article I will discuss what to do about latency. This includes tweaking the host TCP stack to increase the “number of trucks” as well as using network accelerators.
Thanks,
-Bill
> Thanks for linking to my article. You are correct that TCP will, to use
> your analogy, add more trucks, up to a point. How many trucks TCP will
> support depends on the host OSs on each end
This is incorrect. The congestion control algorithms run on the sending side, not the destination. It is the behaviour of the congestion control algorithms of the sending OS that determines the TCP window size.
> Most high-latent connections will also experience an increase in packet
> loss. When packets are lost, the congestion algorithm will decrease the
> congestion window.
This is the normal behaviour of the congestion control algorithms. Furthermore, you’re making the flawed assumption that latency causes packet loss, which is not true. Latency and packet loss are both symptoms of network congestion, caused by bandwidth being maxed out at a router. To understand why this is the case, you have to think about how routers work. Packets arrive at a router and are put into a queue. They get transferred over some form of link and retransmitted onto another network.
In an ideal situation, the queue has at most one packet stored in it. If packets arrive faster than the bandwidth of the link between the networks (or the bandwidth of the networks themselves), the queue backs up, as packets are held, waiting for the next one to be retransmitted. It’s kind of like 30 people all trying to get onto a bus at once. You get latency because packets are being held in a queue.
In the extreme situation, packets get lost because the queue is a limited size (you can’t keep queueing packets forever). So after a while, any more incoming packets just get dropped. There are other reasons for dropped packets, but they basically all involve your network hardware being broken. Network congestion due to lack of bandwidth is the main cause of packet loss. This is why it’s used by the congestion control algorithms as a signal to reduce the transmit rate.
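The queueing behaviour described in this comment can be illustrated with a toy model. This is a sketch of a single fixed-size FIFO queue with tail drop, not a model of any real router; the function name, tick-based timing, and the numbers are all made up for illustration.

```python
from collections import deque


def simulate_tail_drop(arrivals_per_tick, service_per_tick, capacity, ticks):
    """Toy FIFO queue: returns (average queueing delay in ticks, packets dropped)."""
    queue = deque()  # holds the arrival tick of each queued packet
    dropped = 0
    delays = []
    for now in range(ticks):
        # Arrivals: enqueue while there is room, otherwise drop (tail drop).
        for _ in range(arrivals_per_tick):
            if len(queue) < capacity:
                queue.append(now)
            else:
                dropped += 1
        # Service: the outgoing link forwards a fixed number of packets per tick.
        for _ in range(min(service_per_tick, len(queue))):
            delays.append(now - queue.popleft())
    average_delay = sum(delays) / len(delays) if delays else 0.0
    return average_delay, dropped


# Hypothetical numbers: the link forwards 2 packets per tick, the queue holds 10.
light = simulate_tail_drop(arrivals_per_tick=1, service_per_tick=2, capacity=10, ticks=100)
heavy = simulate_tail_drop(arrivals_per_tick=3, service_per_tick=2, capacity=10, ticks=100)
print("light load (avg delay, drops):", light)
print("heavy load (avg delay, drops):", heavy)
```

When arrivals stay below the service rate the queue stays empty; once they exceed it, queueing delay climbs first and drops begin when the queue is full. The model covers queueing delay only and says nothing about propagation delay.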
I seriously suggest you go and read Jacobson’s original paper on congestion avoidance [http://ee.lbl.gov/papers/congavoid.pdf], as it explains the problems of congestion avoidance from first principles and how the TCP Reno algorithms help solve these.
Fragglet,
I appreciate your comments and your interest. 3 things:
1) Most applications these days, but certainly not all, are bi-directional. Sometimes I’m the sender and sometimes I’m the receiver. That’s why I said the host on each side. Since the premise of this article is a WAN design, where sometimes clients are sending data and sometimes they are receiving it, I need to be aware of the limitations at both ends.
2) I did not say that latency causes packet loss; I said there was a correlation between the two. I will drop more packets on my trans-atlantic MPLS circuit than on my point-to-point link between two locations in California, all other things being equal.
3) Even if the window stays the same size, it still takes longer for a complete round trip transaction to occur over a highly latent connection. This is the whole reason RFC 1323 exists!
Obviously your contention is that high-latency networks should not have a problem because TCP will magically deal with the issue. The simple fact is that this is not the case. That is why companies design around this problem with CDNs and network accelerators. I hope you’ll check out and link to part 2 tomorrow.
Thanks,
-Bill
> 1) Most applications these days, but certainly not all, are
> bi-directional. Sometimes I’m the sender and sometimes I’m the receiver.
> That’s why I said the host on each side. Since the premise of this article
> is a WAN design, where sometimes clients are sending data and sometimes
> they are receiving it, I need to be aware of the limitations at both ends.
Although you’re right in that most network protocols are bi-directional (eg. HTTP), congestion control only takes effect when the congestion window is reached. In a typical download over HTTP, the client making the request will not reach the congestion window size. The congestion control algorithms on the client are therefore irrelevant. It’s the server’s algorithms that matter, because it’s the one sending lots of data and hitting the congestion window ceiling.
> 2) I did not say that latency causes packet loss; I said there was a
> correlation between the two. I will drop more packets on my trans-atlantic
> MPLS circuit than on my point-to-point link between two locations in
> California, all other things being equal.
Correlation does not equal causation! As I explained, high latencies and lost packets are both symptoms of network congestion. What is the solution to network congestion? …. add more bandwidth!
> 3) Even if the window stays the same size, it still takes longer for a
> complete round trip transaction to occur over a highly latent connection.
> This is the whole reason RFC 1323 exists!
Actually, no. Read the introduction to RFC1323 that explains the reasons for its existence:
The introduction of fiber optics is resulting in ever-higher
transmission speeds, and the fastest paths are moving out of the
domain for which TCP was originally engineered.
> Obviously your contention is that high-latency networks should not have a
> problem because TCP will magically deal with the issue. The simple fact is
> that this is not the case. That is why companies design around this
> problem with CDNs and network accelerators.
No, this is not what I am saying. You are saying that latency causes network problems and that by improving latency you can improve your network. I assert that this is false. If you have latency problems, they are a symptom of network congestion. If your network is suffering from serious congestion, it probably needs more bandwidth.
Fragglet,
>”You are saying that latency causes network problems and that by improving latency you can improve your network. I assert that this is false. If you have latency problems, they are a symptom of network congestion. If your network is suffering from serious congestion, it probably needs more bandwidth.”
Wow. It is impressive how someone can miss the point so completely so many times. While network congestion will add to latency, latency is in and of itself a problem. In a network with zero congestion, latency will still be a problem. The problem is distance. More bandwidth cannot improve upon the speed of light. Sorry. This is the whole point of my article. Latency does cause issues unrelated to bandwidth or congestion. Those issues can be reduced with planning.
Thanks for commenting.
-Bill
8man,
you didn’t provide any details of which ISP you’re using, but yes, a naive CALEA implementation could have everything routed the way you observe though not likely. most of that data is collected local to the node as gathering it on an aggregate interface downstream is harder to do due to the data rates involved.
it’s hard to speculate without seeing traceroutes to understand some of the topology involved. given what you said, it sounds like the majority of devices in their network have little more than static routes to the next router and have no real IGP awareness, or worse, have only one path through the network to other nodes.
it could be that their implementation has all remote nodes as circuits back to their ‘central’ office like a traditional backhauled dial network. in general seeing all traffic go through one node like that is indicative of a lean network with no other paths. this is fairly common in smaller isps as they can not afford the infrastructure as yet to allow for multiple exits from each pop to their core(s). in some cases this backhauling results in said traffic going via a ’scenic’ route. these cases can be financially and politically driven at times as well.
finding out why it is this way would require getting to know your ISP’s network engineers and noc. while they may not be able to share all the details, you could gain a better understanding of some of it. in some cases it may be simple oversight and misconfiguration, as they’re human and make mistakes too.
> Obviously your contention is that high-latency networks should not have a problem because TCP will magically deal with the issue. The simple fact is that this is not the case. That is why companies design around this problem with CDNs and network accelerators. I hope you’ll check out and link to part 2 tomorrow.
CDNs came about because content networks were interested in solving a problem on their own that ISPs and NSPs should be solving but are not for financial and political reasons. the manner in which they did this was to eliminate intermediate networks altogether and introduce faux localization. this is neither here nor there on topic.
a high latency network should experience no more issues than a low latency network. however, as more outstanding data will exist on a high latency network, the risk is bigger when something does become a problem.
simply saying “add more bandwidth’ is oversimplified as well. too many admins default to this when in reality you should first understand the cause of a problem instead of blaming the symptoms.
latency isn’t a problem unto itself. neither is bandwidth. more often than not, your assumptions on what your network is actually doing is the problem.
You guys really aren’t getting it.
Bill is talking about the impact caused to TCP by networks with high, fixed latency.
Here is another example besides Bill’s. I work for an ISP in Hawaii. We use an OC48 to get to the mainland. It has a fixed propagation delay of around 60ms RTT because of the distance. On the other hand, I can set up a T1 line in my lab and get 5ms of latency off it.
It’s this fixed latency that can wreak havoc on a TCP session flow (like Bill explained) regardless of the size of the connection.
For example, a Windows PC that uses the default receive buffers will not be able to take advantage of high speed connections in the 7mbs and up range if there is a lot of latency. With the default RWIN value, a Windows-based PC peaks out around 2.6mb/s on a path with 200ms RTT. Only tweaking the registry will allow a Windows PC to take advantage of RFC1323 mechanisms. By doing some tweaking you can alleviate some of the performance issues that high latency connections cause for TCP.
On the other hand, if you have a Mac or Linux (supposedly Vista too) based PC you probably won’t have to do any tweaking, as they generally have window scaling, SACK and a higher RWIN value enabled by default.
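To see why window scaling matters here, the receive window needed to keep a link full is the bandwidth-delay product. The sketch below computes it for the 7 and 11 Mb/s rates at the 200ms RTT mentioned above, and the smallest RFC 1323 window scale shift that would cover it. Function names are hypothetical; the 65,535-byte unscaled maximum and the shift limit of 14 come from the TCP window field and RFC 1323.

```python
MAX_UNSCALED_WINDOW = 65_535  # largest window a 16-bit TCP window field can carry
MAX_SHIFT = 14                # largest window scale shift allowed by RFC 1323


def required_window_bytes(link_bps, rtt_ms):
    """Bandwidth-delay product in bytes, rounded up (integer arithmetic)."""
    return (link_bps * rtt_ms + 7999) // 8000


def window_scale_shift(window_bytes):
    """Smallest shift such that 65,535 << shift covers the requested window."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift


for mbps in (7, 11):
    needed = required_window_bytes(mbps * 1_000_000, rtt_ms=200)
    shift = window_scale_shift(needed)
    print(f"{mbps} Mb/s at 200 ms RTT needs a {needed:,}-byte window (scale shift {shift})")
```

Both rates need a window well beyond 65,535 bytes at this RTT, which is consistent with an unscaled window being the bottleneck.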
This subject is very near and dear to me as I am currently evaluating, in the lab, our 7 and 11mb DSL offering. One of the things I am testing is the results of website based speed tests to a local server setup with a speed test application. To this directly connected server I see
…around 3.5 to 4mbs download speeds on an 11mb DSL circuit, compared to seeing exactly 11mb (+/- 300kb) when the latency is less than 1ms. I’m not a rocket scientist but it looks to me like latency does have an impact.
If you guys still don’t get it, google “long fat networks”