I normally have `mtr 1.1`¹ running in the background, in the third display mode, which is a 2D histogram: time on the x axis, hops on the y axis, and ASCII character/colour for ping time. When problems occur, this tends to let you easily see the nature of the problem (total loss, elevated packet loss, elevated response times) and the location of the problem (local network, local ISP, public internet). There are definitely occasions for loss%, sent, last/average/best/worst/stddev ping and such things as are found in the first display mode, but most of the time I find the histogram view the most useful starting point.
You can make mtr start in this view with `--displaymode=2` (direct command line argument, `mtr --displaymode=2 …`; or shell alias, `alias mtr="mtr --displaymode=2"`; or environment variable, `MTR_OPTIONS=--displaymode=2`).
Screenshot of this mode: https://temp.chrismorgan.info/2025-02-06-hn-42924182-mtr-dis...
¹ 1.1 = 1.0.0.1 = Cloudflare public DNS, a convenient nearby public internet endpoint.
jlmcguire 39 days ago [-]
MTR is a useful tool, but it is a somewhat common source of illusory issues: it triggers so many ICMP Time Exceeded packets that routers rate-limit them and stop replying to other folks running traces. It's important, as others said, to understand that these tools aren't testing the data path of a network but rather the control-plane path.
cootsnuck 38 days ago [-]
What tools exist for people to test the data path of a network reliably? In my experience MTR has worked well enough to approximate network routing issues, but it has always felt like a blunt tool, given that it can't do anything about hops with firewalls.
jlmcguire 38 days ago [-]
It's tough. iperf is a reasonable tool. It works by setting up tcp connections and actually transferring data.
I like the work https://fasterdata.es.net/ does. They provide clear guides and set expectations if you want to get more bandwidth out of a connection.
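The core of what iperf measures can be sketched in a few lines: open a TCP connection and time a bulk transfer through it. The snippet below is only a loopback toy to show the idea; real iperf uses a remote server and adds parallel streams, UDP mode, window tuning, and reverse-direction tests.

```python
# Minimal sketch of an iperf-style throughput test, over loopback only.
import socket
import threading
import time

CHUNK = b"x" * 65536
TOTAL = 16 * 1024 * 1024  # 16 MiB

def drain(listener):
    """Server side: accept one connection and read until it closes."""
    conn, _ = listener.accept()
    with conn:
        while conn.recv(65536):
            pass

listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # any free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=drain, args=(listener,), daemon=True).start()

client = socket.socket()
client.connect(("127.0.0.1", port))
start = time.monotonic()
sent = 0
while sent < TOTAL:
    client.sendall(CHUNK)
    sent += len(CHUNK)
client.close()
elapsed = time.monotonic() - start
print(f"transferred {sent} bytes in {elapsed:.3f}s")
```

Unlike ICMP-based tools, this exercises the same TCP data path that applications actually use, which is why iperf-style tests answer questions mtr cannot.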
neilv 39 days ago [-]
MTR has long been one of the first little tools that I install on workstations.
sudo apt install mtr-tiny
I also have a hotkey to pop it up in a window, pinging to some host that'll always be somewhere on the other side of any ISP from me. Whenever I suddenly suspect a networking problem from my laptop, I hit the hotkey as the first troubleshooting step. MTR starts to narrow down a few different problems very quickly.
eudhxhdhsb32 39 days ago [-]
Mtr is indeed nice.
One thing I've not understood is why some hops will show consistently higher ping times than hops farther down the chain in the same trace.
Is it indicating that the router is faster at forwarding packets than responding to ping requests?
p_ing 39 days ago [-]
This is always worth a (re)read to understand traceroute: https://archive.nanog.org/sites/default/files/traceroute-201...
^ This should be required reading for anyone using traceroute.
wrigby 39 days ago [-]
> Is it indicating that the router is faster at forwarding packets than responding to ping requests?
Exactly this. In most “real” routers, forwarding (usually) happens in the “data plane”. It’s handled by an ASIC that has a routing table accessible to it in RAM. A packet comes in on an interface, a routing decision is made, and it goes out another interface - all of this happens with dedicated hardware. Pings (ICMP Echo requests), however, get forwarded by this ASIC to a local CPU, where they are handled by software (in the “control plane”).
You’re really seeing different response times from the two control planes - one may be more loaded or less powerful than another, regardless of the capacity of their data planes.
linsomniac 39 days ago [-]
This is also why you may see packet loss at one particular hop but then responses from hops beyond it. The hop with packet loss in this case probably has an overwhelmed CPU, rather than indicating that a particular network link has packet loss. mtr reporting packet loss at a hop is only reliable if every hop after it has similar packet loss.
Maybe the only thing I've explained more in my career than this is why it's ok that your Linux box has no "free" memory.
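The rule above ("loss at a hop is only reliable if every later hop shows similar loss") can be written down directly. A sketch with made-up loss percentages:

```python
# Heuristic: per-hop loss in an mtr report is only credible if it persists
# at every later hop; loss that vanishes downstream is almost certainly a
# router de-prioritising ICMP replies, not a lossy link.
def credible_loss(loss_by_hop, tolerance=5.0):
    """Return indices of hops whose reported loss also shows up downstream."""
    flagged = []
    for i, loss in enumerate(loss_by_hop):
        later = loss_by_hop[i + 1:]
        if loss > 0 and all(l >= loss - tolerance for l in later):
            flagged.append(i)
    return flagged

# Hop 3 reports 60% loss but every later hop is clean: overwhelmed CPU.
print(credible_loss([0, 0, 0, 60, 0, 0, 0]))    # -> []
# Loss appearing at hop 4 and carrying through to the destination: real.
print(credible_loss([0, 0, 0, 0, 30, 32, 31]))  # -> [4, 5, 6]
```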
commandersaki 38 days ago [-]
It also doesn't help that mtr's ICMP handling code is just bad; it counts packets that actually arrive as lost.
commandersaki 38 days ago [-]
I retract my previous statement about bad ICMP code (and other comments where I posted it). I was under the impression that mtr was doing ICMP echo requests to individual hops with decreasing TTLs, but it's just relying on the TTL Exceeded messages generated for the end-to-end echo request. However, this is still a terrible indicator of packet loss: for example, my wifi router heavily deprioritises generating TTL Exceeded packets but will respond to a flood of echo requests with no issue. My main contention is that the per-hop loss indicator is a useless and misleading metric, and you should be measuring these things end to end with traceroute and ping separately.
oxygen_crisis 39 days ago [-]
Traceroute doesn't use ping requests except with the old Windows binary. Usually it uses "Time-to-live (TTL) exceeded in transit" messages.
Beyond that technicality, your guess is often right... Routers will frequently prioritize forwarding packets over sending the TTL exceeded packets tools like MTR use to measure response times.
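For anyone unfamiliar with the mechanism: traceroute sends probes with TTL 1, 2, 3, and so on, and each router that decrements a probe's TTL to zero answers with ICMP Time Exceeded, revealing itself. A toy model (the router names are invented):

```python
# Toy model of how traceroute discovers hops: each probe carries a TTL,
# every router decrements it, and whoever hits zero answers.
PATH = ["gw.home", "isp-edge", "isp-core", "transit", "dest"]

def probe(ttl):
    """Send one probe with the given TTL; return (responder, message)."""
    for router in PATH:
        ttl -= 1
        if ttl == 0:
            if router == PATH[-1]:
                # The probe reached the destination itself.
                return router, "echo reply"
            return router, "time exceeded in transit"
    return PATH[-1], "echo reply"

for ttl in range(1, len(PATH) + 1):
    responder, msg = probe(ttl)
    print(f"TTL={ttl}: {responder} ({msg})")
```

Note that only the final hop answers the probe itself; every intermediate line in a trace is a Time Exceeded message, which is exactly the control-plane traffic routers feel free to de-prioritise.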
ta1243 39 days ago [-]
Also, you can easily have the TTL Exceeded message going via a different route on the return path. Indeed, the same applies to your normal connections: asymmetric routing can be a pain, especially in networks with RPF issues (multicast ones are a particular pain point) and with stateful firewalls, but most of the time it's fine. You just need to be aware of it.
Obviously you know this, but for anyone else reading: a modern traceroute tool (like mtr) can send ICMP, UDP, or TCP, on generic or specific ports. Indeed, the default for mtr on my laptop is to use ICMP.
toast0 39 days ago [-]
Most likely, it's as you described, router N forwards packets much faster than it generates icmp ttl exceeded, and router N+1 is nearby and generates icmp faster.
However, it could also be the case that the routing back to you is significantly different, so you can have a much longer path to you from router N than router N+1.
This is more likely to happen on routes that cross oceans. Say you're tracing from the US to Brazil. If router N and N+1 are both in Brazil, but N sends return packets through Europe and N+1 sends through Florida, N+1 returns will arrive significantly sooner.
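The Brazil example in numbers. The delays below are invented, but the arithmetic is the point: the RTT shown for a hop is the forward delay to it plus that hop's own return-path delay, so hop N can read slower than hop N+1.

```python
# One-way delays in milliseconds, invented for illustration.
forward_to_n  = 90   # probe from the US to router N in Brazil
forward_to_n1 = 92   # probe to router N+1, one hop further along
return_n      = 210  # router N's replies come back via Europe
return_n1     = 95   # router N+1's replies come back via Florida

# What mtr reports per hop is forward delay plus that hop's return delay.
rtt_n  = forward_to_n + return_n    # 300 ms
rtt_n1 = forward_to_n1 + return_n1  # 187 ms
print(f"hop N: {rtt_n} ms, hop N+1: {rtt_n1} ms")
```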
rixed 39 days ago [-]
> Is it indicating that the router is faster at forwarding packets than responding to ping requests?
I believe most of the time this is the reason indeed. Answering an ICMP error to a TTL expiration or to an echo request is very low priority.
This latency in error-message generation may even be a better signal of router load than the latency of the actual trip through it.
commandersaki 39 days ago [-]
Great tool for misleading results.
ta1243 39 days ago [-]
The results aren't misleading. Shockingly large numbers of "computer professionals" have no idea how networks work, but that's because they can't use the tools, not because the tools are misleading.
hiAndrewQuinn 39 days ago [-]
>shockingly large numbers of "computer professionals" have no idea how networks work
Incidentally, if you suspect you yourself are this, I can't recommend any book more highly than Michael W. Lucas's Networking for Systems Administrators. Don't be fooled by the title - the whole idea is to get you to the level where you can talk to a network engineer without looking totally clueless, and no farther - an excellent stopping point.
I would recommend it handily over, say, my own Intro to Networking class in college. And yes, `mtr` is mentioned by name in it!
jeroenhd 39 days ago [-]
People familiar with networking underestimate how complicated networking actually is. A huge segment of programmers will learn about the existence of routing and BGP and end up in a career where HTTPS and maybe DNS is all they need to worry about.
I'm 100% sure the only reason so many programmers know how NAT works is because NAT breaks video games.
tetha 39 days ago [-]
EDIT: Re-Reading. I think I am some degree of a networker underestimating network complexity. I'll stand by that. Please make fun of me for only speaking in IPs and Ports.
Yeh. There is a very achievable level of knowledge about networking that's enough to make a lot of practical problems solvable.
Like, my practically acquired patchwork of knowledge about subnets, routing, some DNS, some VPN tech, maybe some ideas of masquerading and NAT'ing is easily enough to run a multi-site production environment across a number of networking stacks. And I wouldn't really call these things hard. I don't like it when people say "I don't know networking" the moment you say "routing table". The hardest part there is to understand that things are often a very large number of very local decisions and a bunch of crossed fingers to get a packet from A to B. Oh, and no one thinks about return paths until they run a site-to-site VPN.
But just a few steps beyond that is a cliff dropping into a terrifying abyss of complexity. Like, I know acronyms like BGP, CGNAT, ideas like Anycast DNS, and kinda what they do, but it turns into very dark and different magic rather quickly. I say if we need that, we need a networker.
mschuster91 39 days ago [-]
> I'm 100% sure the only reason so many programmers know how NAT works is because NAT breaks video games.
... and filesharing, from the days when bittorrent was huuuuge.
neom 39 days ago [-]
I once interviewed the manager who built MSN Messenger - and when I asked her what was most important to its growth, she said it needed to be able to punch through NATs so kids could use it at high school and uni, because that was the segment they were trying to get it to take off in. (And from what I recall, that strategy indeed worked quite well.)
otterz 39 days ago [-]
Care to elaborate?
zinekeller 39 days ago [-]
One under-appreciated problem (except from MPLS fudging and multiple load-balancing routers) is that traceroute (including MTR) only shows the way from the sender to the recipient, but actual networks, especially non-peered connections, usually do not use the same paths for both directions. One example that I've encountered is network A sending its packets via then-Telia (now Arelion) but network B routing their packets through NTT instead, which is only shown if you have initiated traceroutes in both directions.
Hikikomori 39 days ago [-]
The way you write it makes it seems like you're blaming the tool for misleading results when that's the nature of traceroute itself.
MPLS doesn't have to hide routers, though; that's up to the operator. And even if they do hide them, the trace will still give you an idea of where things went wrong so you can contact the correct people. Load balancing of links is either LACP or ECMP; the first case doesn't really matter, and in the second you'll just see multiple responses at a hop. Neither really has any impact on how useful traceroute is, and neither really misleads.
tristor 39 days ago [-]
It is possible to reveal the impacts of asymmetric routing through other tools; for instance, ThousandEyes can do this by performing a time-synchronized bidirectional trace (among other things it can do that MTR cannot). This can be very valuable.
That said, in practice the majority of end users will not be directly impacted by asymmetric routing, if only because so many services are now cloud-based and the major cloud providers are directly peered with all of the major ISPs at regional meeting points in most countries. As an example, on my connection in Denver on Comcast, traffic to most applications in AWS enters the AWS network /in Denver/ without traversing any transit provider, meaning my traffic effectively never goes across "the Internet": it goes from Comcast (my provider) directly to AWS (the provider for the application).
While it's always good to be mindful of the complexities of real-world routing, for the vast majority of common use cases now, entry-points to the target application are so widely distributed that the most impactful routing is inside the private network of the cloud provider, not across the larger Internet.
Disclaimer: Opinions are my own.
crims0n 39 days ago [-]
Which is why any network engineer worth their salt will ask for a trace in both directions (if available). Asymmetric routing can be an issue, especially when going through stateful devices like firewalls.
RajT88 39 days ago [-]
Network engineers for most issues ask for traces on both sides of the connection.
Packet traces do not lie, per se, but they represent only a certain perspective. More perspectives are needed for problems to come into focus.
tetha 38 days ago [-]
I'm not going to tell you how long I once spent searching for a missing route on the return path of a VPN connection... but damn, the lights that went on when I realized were so bright they hurt.
lode 39 days ago [-]
Traceroute is easy to misinterpret, because it has no insight into underlying networks like MPLS, which could be the cause of issues.
https://movingpackets.net/2017/10/06/misinterpreting-tracero... (discussion at https://news.ycombinator.com/item?id=15474043 )
They are only misleading if you allow yourself to be misled by them. It's an extremely informative measurement if you are aware of how it works and don't misinterpret the results.
perching_aix 39 days ago [-]
None of these claims are mutually exclusive with one another.
"Great tool for misleading results." -> the results the tool provides are either mostly misleading (many are misleading), or are in large part misleading (a large part of each is misleading), potentially both
"Traceroute is easy to be misinterpreted" -> the results the tool provides are easy to misinterpret
"They are only misleading if you allow yourself to be misled by them" -> the results the tool provides require expertise to interpret, implying that otherwise they're (largely) misleading - the same thing the person said right above you
This is turning into a "well I like it and it has its place". Cool, it's just not what was being argued.
ta1243 39 days ago [-]
You can claim pretty much any tool is misleading then. If you don't know how curl works, with say following links, it's "misleading".
perching_aix 39 days ago [-]
Yes, you can. It's basically a terminal case of something being unintuitive. Whether something is misleading is in the eye of the beholder.
Recently my mother felt misled by a car commercial. Her position was that saying things like "under this many years or that many miles" is misleading, because it suggests that it's a set of options she can pick from (which of course ended up not being the case).
Unfortunately for her, this is a natural language construct - whether she understands it correctly or not depends on how aligned her common sense regarding it is with people at large. She understood it differently and thus felt misled. But you may notice that ultimately it was her own mistaken understanding of the common parlance that misled her. So when she said this was misleading the only thing I could reasonably say was exactly this. That I did not find the phrasing misleading, and I'm sorry she'd been misled by it (irrespective of whether that was on her or on the world, as that doesn't really matter).
It's completely on people how they want to handle this. You can find people being misled by stuff like this unreasonable and just tell them so, or you can put out a disclaimer regardless. It depends completely on the case. This goes all the way to having multiple mechanical interlocks at places with heavy-duty X-ray sources, or preferring machine-checked memory management.
commandersaki 38 days ago [-]
The packet loss indicator is the biggest issue I have. I'm well aware that routers may deprioritise ICMP and produce apparent packet loss, and therefore that if you're not seeing cascading packet loss, it's probably phantom. Also, what really matters is end-to-end loss anyway.
The other issue with packet loss is that the tool doesn't handle ICMP properly in the first place. A ping flood to an end-to-end host like 1.1.1.1 shows 0% loss, but when I use mtr to do flood-like pinging it shows my wifi router with 100% loss. If I ping flood my router directly I get 0%.
It’s genuinely a bad tool and you should really just be keeping ping and traceroute separate as they do completely different things.
Elixir6419 37 days ago [-]
It's one of the best tools for troubleshooting packet loss on the internet and on routed networks generally. It gives you way more information than ping or traceroute could.
If you run it in TCP or UDP mode you can even nail down the physical interface that's erroring in a LAG/LACP bundle, because you can manipulate the 5-tuple very precisely.
I'm also curious about the flags you used for ping and mtr that showed you this discrepancy.
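The LAG/LACP point works because routers typically choose a member link by hashing the flow's 5-tuple, so holding four fields fixed and sweeping the source port walks probes across the physical members. A sketch with a stand-in hash (real devices use their own vendor-specific hash functions, and the IPs here are documentation addresses):

```python
import hashlib

N_LINKS = 4  # member links in the LAG/ECMP bundle

def member_link(src_ip, dst_ip, proto, src_port, dst_port):
    """Pick a bundle member from the flow 5-tuple, the way an ECMP/LAG
    hash would. SHA-256 is a stand-in for the hardware hash."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % N_LINKS

# Hold four fields fixed and sweep the source port: each flow hashes to a
# deterministic member, so probes spread across the physical links and a
# single erroring member shows up as loss on only some source ports.
links = {p: member_link("198.51.100.7", "203.0.113.9", "udp", p, 33434)
         for p in range(40000, 40008)}
print(links)
```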
commandersaki 36 days ago [-]
mtr -i 0.1 1.1.1.1 gives 80% loss for my router (OK, not the same as the 100% loss I stated earlier, but I just reran the experiment), which is deprioritising TTL Exceeded packets, but ping -c 1000 -f 192.168.0.1 (my router) yields 0% loss. The per-hop loss indicator is not only incorrect but also wouldn't be useful even if it were accurate, since end-to-end loss is what matters, not a phantom per-hop loss that has no effect on end-to-end loss.
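That 80% figure is exactly what a control-plane rate limiter produces. If mtr probes at 10/s (`-i 0.1`) and the router only generates, say, 2 TTL Exceeded messages per second, 8 in 10 probes go unanswered even though every one of them was forwarded fine. A sketch with invented limits:

```python
# Invented rates: the router forwards every probe (data plane) but will
# only generate a limited number of TTL Exceeded replies (control plane).
def apparent_loss(probe_rate_hz, reply_limit_hz, duration_s=10):
    sent = int(probe_rate_hz * duration_s)
    answered = min(sent, int(reply_limit_hz * duration_s))
    return 100.0 * (sent - answered) / sent

# mtr -i 0.1 probes at 10/s; a router rate-limiting replies to 2/s looks
# like heavy loss even though it dropped nothing on the forwarding path:
print(f"{apparent_loss(10, 2):.0f}% apparent loss")    # -> 80% apparent loss
# The same router answering direct echo requests at line rate shows none:
print(f"{apparent_loss(10, 1000):.0f}% apparent loss") # -> 0% apparent loss
```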
Elixir6419 35 days ago [-]
Right, so control-plane packet rates are rate-limited (to some definition of sane), but the limits are applied to everything alike: traceroutes, pings, and so on.
An argument could be made that a device configured with an echo reply rate limit lower than its TTL Exceeded rate limit would show loss on ping but not on mtr. Which tool would be wrong then? Would you blame ping for producing misleading results?
The running counters and the ability to pick out obvious rate limiting when the loss doesn't cascade into later hops is, to me, akin to traceroute's `* * *` output. It doesn't always mean that packets are blackholed or that connectivity is broken; it just means the tool is producing an artifact of network configuration or network characteristics. Further investigation is needed to figure out what's going on.
MTR imho is giving you much more insight into the network than traceroute or ping separately. It doesn't resolve the usual firewall/rate limiting artifacts, but gives you way more information about paths if you know how to interpret them.
commandersaki 35 days ago [-]
> Right, so control-plane packet rates are rate limited (to some definition of sane), but they are applied to all applications, traceroutes, pings alike.
I'm not sure I understand what you're saying, but in this case the control-plane rates are different for generating TTL Exceeded vs Echo Reply, where one gives 80% loss and the other gives 0% loss at similar rates. Gripe #1: why are we even testing the control plane in the first place? It's a useless metric with no utility for measuring end-to-end latency/loss.
> An argument could be made for a device configured as such to show loss on ping but not on mtr if you configure the rate limits so that the icmp reply rate is lower than ttl expired rates. Which tool would be wrong than? Would you blame ping for producing misleading results?
Sure, that would be a problem, but any combination could be misleading if the data path is yielding 0% loss for high rates of ICMP end to end. This is why it's not a particularly helpful metric and can be downright misleading (usually not to me, but I've seen plenty of people make incorrect inferences from bunk MTR results because the tool isn't intuitive).
> The running counters and the ability to pick out the obvious rate limiting when the loss doesn't cascade into the hops to me is akin to traceroutes * * * output. It doesn't always mean that the packets are blackholed, connectivity is broken, it just means the tool is producing an artifact due to network configuration or network characteristics. Further investigation is needed to figure out what's going on.
Sure, that's great, but not particularly helpful to the masses who misunderstand the tool. I worked as a network engineer for a decade, receiving bunk MTR reports where people freaked out because they were seeing "packet loss" that was nonexistent on the data forwarding plane (you know, the one that actually matters).
> MTR imho is giving you much more insight into the network than traceroute or ping separately. It doesn't resolve the usual firewall/rate limiting artifacts, but gives you way more information about paths if you know how to interpret them.
Time shouldn't be wasted measuring the control path and then investigating to confirm it is the control path and not data path. You cannot make these mistakes using traceroute and ping separately because traceroute doesn't have a notion of a "per-hop" loss indicator and ping doesn't involve intermediate hops (unless an intermediate hop generates an ICMP diagnostic for an echo request).
Elixir6419 34 days ago [-]
> Sure that's great, not particularly helpful to the masses who misunderstand the tool. I worked as a network engineer for a decade receiving bunk MTR reports where people freak out because they're seeing "packet loss" which was inexistent on the data forwarding plane (you know the one that actually matters).
Understanding can be improved. Bunk MTRs are easy to spot. You tell them this is not an issue because .... Then they will learn, and usually that customer will stop sending you bunk MTRs.
I'm pretty sure the number of people opening tickets with providers/network teams because they have nothing better to do is near zero. The fact that they ran an MTR shows they were doing some troubleshooting, and at the end of the day a problem needs to be solved. It may not be on your end, but that needs to be investigated, and the same would apply to a crappy iperf throughput test. IMHO any clue/information about where the problem is, is helpful. You may need to filter the relevant from the irrelevant.
But if I get to pick one out of two problems, one with crappy iperf results and the other with an MTR showing loss that carries over, I would probably pick the second, because that at least gives me an indication of whereabouts I should start looking.
> Time shouldn't be wasted measuring the control path and then investigating to confirm it is the control path and not data path. You cannot make these mistakes using traceroute and ping separately because traceroute doesn't have a notion of a "per-hop" loss indicator.
traceroute does have a per-hop indicator, it's the * in the output; it's just so often off that nobody pays much attention. You can't really catch issues related to route flaps or reroutes with traceroute; with MTR it becomes pretty clear if a reroute happens in the middle of your test. I guess you can keep re-running traceroute, but I will leave it to you to sift through the output of that nightmare, and then it has effectively become MTR, with worse output.
There are also many options available in MTR that aren't in traceroute (triggering the probes with TCP or UDP packets, fixing the local or remote port, etc.). Even if you just run it with 3 packets per hop, you will have way more options. You don't have to use it as a continuous monitor to indicate packet loss; it can give you the traceroute-level information in a much cleaner format, with more options to choose from.
> ping doesn't involve intermediate hops (unless an intermediate hop generates an ICMP diagnostic for an echo request).
ICMP echo requests and replies can be subject to different QoS treatment than TCP/UDP traffic, so that also doesn't necessarily give you the right idea when testing for an end-to-end connectivity issue.
iperf imho is the best bet, and if you want to be really accurate you pick the src/dst port for client/server, just to be sure you get into the same class as your problematic traffic.
As a sidenote, MTR packets also ride the data plane until their TTL runs out.
walrus01 39 days ago [-]
the results aren't misleading, people just don't know how to read them, or understand why bidirectional traceroutes are necessary.
Only misleading if you don't understand what it's doing. If you do, then it's a useful tool.
commandersaki 38 days ago [-]
I don't trust the ICMP code in mtr. I've had an mtr to 1.1.1.1 which showed my wifi router as an intermediate hop with 100% loss when probing at an interval of 0.1 s. A flood ping to my router shows 0%. I'd rather just use time-tested tools such as ping and traceroute, which shouldn't even be combined anyway, since the loss indicator is usually unreliable unless there's cascading loss (and even then it can still be unreliable).
BrandoElFollito 36 days ago [-]
I tried mtr (I usually use ping and traceroute). 87% packet loss at my router and the next hop, then 0% loss. WTF I say to myself and uninstall.
commandersaki 35 days ago [-]
Somewhere in another thread I retracted my issue with the ICMP handling code. But you've nailed my #1 gripe: the per-hop loss indicator is testing the control path for diagnostic packets at each hop, when the tool is meant to diagnose end-to-end latency and loss. How do you square a router (or many routers) showing packet loss when there's 0% end-to-end loss? It doesn't make sense, it's unintuitive, and that's how misleading inferences manifest.
BrandoElFollito 35 days ago [-]
Whatever the 80% lost packets at my router and the next hop mean, there is either a serious problem with the tool, or with both my router AND the next hop.
I will go with the tool, because it is the only one that warns about such a problem.
You can make mtr start in this view with --displaymode=2 (direct command line arguments, `mtr --displaymode=2 …`; or shell alias, `alias mtr="mtr --displaymode=2"`; or set environment variable MTR_OPTIONS=--displaymode=2).
Screenshot of this mode: https://temp.chrismorgan.info/2025-02-06-hn-42924182-mtr-dis...
—⁂—
¹ 1.1 = 1.0.0.1 = Cloudflare public DNS, a convenient nearby public internet endpoint.
I like the work https://fasterdata.es.net/ does. They provide clear guides and set expectations if you want to get more bandwidth out of a connection.
One thing I've not understood is why will some hops have consistently lower ping times than hops farther down the chain in the same trace?
Is it indicating that the router is faster at forwarding packets than responding to ping requests?
https://archive.nanog.org/sites/default/files/traceroute-201...
Exactly this. In most “real” routers, forwarding (usually) happens in the “data plane”. It’s handled by an ASIC that has a routing table accessible to it in RAM. A packet comes in on an interface, a routing decision is made, and it goes out another interface - all of this happens with dedicated hardware. Pings (ICMP Echo requests), however, get forwarded by this ASIC to a local CPU, where they are handled by software (in the “control plane”).
You’re really seeing different response times from the two control planes - one may be more loaded or less powerful than another, regardless of the capacity of their data planes.
Maybe the only thing I've explained more in my career than this is why it's ok that your Linux box has no "free" memory.
Beyond that technicality, your guess is often right... Routers will frequently prioritize forwarding packets over sending the TTL exceeded packets tools like MTR use to measure response times.
Obviously you know, but for anyone else reading, a modern traceroute tool (like mtr) can send icmp, udp or tcp, on generic or specific ports. Indeed the default for mtr on my laptop is to use icmp.
However, it could also be the case that the routing back to you is significantly different, so you can have a much longer path to you from router N than router N+1.
This is more likely to happen on routes that cross oceans. Say you're tracing from the US to Brazil. If router N and N+1 are both in Brazil, but N sends return packets through Europe and N+1 sends through Florida, N+1 returns will arrive significantly sooner.
I believe most of the time this is the reason indeed. Answering an ICMP error to a TTL expiration or to an echo request is very low priority.
This latency in error message generation may even be a better signal of the router load than the latency of the actualy trip through it.
Incidentally, if you suspect you yourself are this, I can't recommend any book more highly than Michael W. Lucas's Networking for Systems Administrators. Don't be fooled by the title - the whole idea is to get you to the level where you can talk to a network engineer without looking totally clueless, and no farther - an excellent stopping point.
I would recommend it handily over, say, my own Intro to Networking class in college. And yes, `mtr` is mentioned by name in it!
I'm 100% sure the only reason so many programmers know how NAT works is because NAT breaks video games.
Yeh. There is a very achievable level of knowledge about networking that's enough to make a lot of practical problems solvable.
Like, my practically acquired patchwork of knowledge about subnets, routing, some DNS, some VPN tech, maybe some ideas of masquerading and NAT'ing is easily enough to run a multi-site production environment across a number of networking stacks. And I wouldn't really call these things hard. I don't like people who are like "I don't know networking" once you say "routing table". The hardest part there is to understand how things are often a very large amount of very local decisions and a bunch of crossed fingers to get a packet from A to B. Oh an no one thinks about return paths until they run a site to site VPN.
But just a few steps beyond that is a cliff dropping into a terrifying abyss of complexity. LIke I know acronyms like BGP, CGNAT, ideas like Anycast DNS and kinda what they do, but it turns into very dark and different magik rather quickly. I say if we need that, we need a networker.
... and filesharing, from the days when bittorrent was huuuuge.
MPLS don't have to hide routers though, up to the operator, even if they do it will give you idea of where things went wrong and you can contact the correct people. Load balancing links is either lacp or ecmp, first case doesn't really matter and in the second you'll just see multiple responses on a hop. Neither really had any impact on how useful traceroute is and doesn't really mislead.
That said, in practice for the majority of end users, they will not be directly impacted by asymmetric routing, if only because so many services are now cloud-based and the major cloud devices are direct peered with all of the major ISPs at regional meeting points in most countries. As an example, on my connection in Denver on Comcast, going to most applications in AWS will enter the AWS network /in Denver/ and without traversing any transit provider, meaning effectively my traffic never goes across "the Internet", it goes from Comcast (my provider) directly to AWS (the provider for the application).
While it's always good to be mindful of the complexities of real-world routing, for the vast majority of common use cases now, entry-points to the target application are so widely distributed that the most impactful routing is inside the private network of the cloud provider, not across the larger Internet.
Disclaimer: Opinions are my own.
Packet traces do not lie, per se, but they represent only a certain perspective. More perspectives are needed for problems to come into focus.
https://movingpackets.net/2017/10/06/misinterpreting-tracero... (discussion at https://news.ycombinator.com/item?id=15474043 )
"Great tool for misleading results." -> the results the tool provides are either mostly misleading (many are misleading), or are in large part misleading (a large part of each is misleading), potentially both
"Traceroute is easy to be misinterpreted" -> the results the tool provides are easy to misinterpret
"They are only misleading if you allow yourself to be misled by them" -> the results the tool provides require expertise to interpret, implying that otherwise they're (largely) misleading - the same thing the person said right above you
This is turning into a "well I like it and it has its place". Cool, it's just not what was being argued.
Recently my mother felt misled by a car commercial. Her position was that saying things like "under this many years or that many miles" is misleading, because it suggests that it's a set of options she can pick from (which of course ended up not being the case).
Unfortunately for her, this is a natural-language construct - whether she understands it correctly depends on how well her intuition about it aligns with people at large. She understood it differently and thus felt misled. But you may notice that ultimately it was her own mistaken understanding of the common parlance that misled her. So when she said the ad was misleading, the only thing I could reasonably say was exactly that: that I did not find the phrasing misleading, and that I was sorry she'd been misled by it (irrespective of whether that was on her or on the world, as that doesn't really matter).
It's completely on people how they want to handle this. You can find people who are misled by stuff like this unreasonable and just tell them so, or you can put out a disclaimer regardless. It depends completely on the case. This scales all the way up to having multiple mechanical interlocks at places with heavy-duty X-ray sources, or preferring machine-checked memory management.
The other issue with packet loss is that the tool doesn't handle ICMP properly in the first place. A ping flood to an end-to-end host like 1.1.1.1 shows 0% loss, but when I use mtr to do flood-like pinging it shows my wifi router with 100% loss. If I ping flood my router directly I get 0%.
It’s genuinely a bad tool and you should really just be keeping ping and traceroute separate as they do completely different things.
If you run it in TCP or UDP mode you can even nail down the physical interface that's erroring in a LAG/LACP bundle, because you can manipulate the 5-tuple precisely.
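To see why manipulating the 5-tuple helps: LAG/ECMP members are typically chosen by hashing the flow's 5-tuple, so changing only the local port can steer probes onto a different member link (which is what mtr's port options let you do). A toy sketch - the hash function, addresses, and `ecmp_member` helper are illustrative, not any vendor's actual algorithm:

```python
import hashlib

def ecmp_member(src_ip, dst_ip, proto, src_port, dst_port, n_links):
    """Pick a LAG/ECMP member link by hashing the 5-tuple.
    Illustrative only -- real routers use vendor-specific hardware hashes."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_links

# Varying only the local (source) port changes the hash input, which is
# how fixing the local port per run lets you test each member of a bundle.
links = {ecmp_member("198.51.100.7", "1.1.1.1", "tcp", p, 443, 4)
         for p in range(33000, 33032)}
print(sorted(links))  # with 32 source ports, usually every link index appears
```

Run enough probes with distinct local ports and you cover every member, so a single erroring interface in the bundle shows up as loss on only some port choices.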
I'm also curious about the flags you used for ping and mtr that showed you this discrepancy.
An argument could be made for a device configured such that it shows loss on ping but not on mtr, if the rate limits are set so that the ICMP echo-reply rate is lower than the TTL-expired rate. Which tool would be wrong then? Would you blame ping for producing misleading results?
The running counters and the ability to pick out obvious rate limiting when the loss doesn't cascade into later hops is, to me, akin to traceroute's * * * output. It doesn't always mean that the packets are blackholed or that connectivity is broken; it just means the tool is producing an artifact of network configuration or network characteristics. Further investigation is needed to figure out what's going on.
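The "does the loss cascade?" heuristic can be made concrete. A minimal sketch (the `classify_loss` helper and the hop names are hypothetical) that flags a hop's loss as a likely control-plane artifact when downstream hops, including the destination, don't share it:

```python
def classify_loss(hops):
    """hops: list of (hostname, loss_pct) in path order, destination last.
    A hop's loss is a likely ICMP rate-limit artifact if downstream hops
    (especially the destination) don't share it; loss that persists to the
    end of the path is the kind worth investigating."""
    dest_loss = hops[-1][1]
    results = []
    for i, (host, loss) in enumerate(hops):
        downstream = [l for _, l in hops[i + 1:]]
        if loss > 0 and downstream and max(downstream) < loss and dest_loss == 0:
            results.append((host, "likely rate-limiting artifact"))
        elif loss > 0:
            results.append((host, "loss carries forward -- investigate"))
        else:
            results.append((host, "clean"))
    return results

# A middle hop de-prioritising TTL-exceeded generation looks alarming in
# isolation, but the clean destination gives it away as an artifact.
path = [("gw.local", 0.0), ("isp-edge", 80.0), ("core1", 0.0), ("1.1.1.1", 0.0)]
for host, verdict in classify_loss(path):
    print(f"{host:10} {verdict}")
```

This is only a first-pass triage, of course - real reports also need the per-hop latency columns and some knowledge of the path - but it captures why an 80%-loss middle hop with a 0%-loss destination usually isn't the problem.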
MTR imho gives you much more insight into the network than traceroute or ping separately. It doesn't resolve the usual firewall/rate-limiting artifacts, but it gives you way more information about paths if you know how to interpret it.
I'm not sure I understand what you're saying, but in this case the control-plane packet rates are different for generating TTL Exceeded vs Echo Reply, where one shows 80% loss and the other shows 0% loss at similar rates. Gripe #1: why are we even testing the control plane in the first place? It's a metric with no utility for measuring end-to-end latency/loss.
> An argument could be made for a device configured as such to show loss on ping but not on mtr if you configure the rate limits so that the ICMP reply rate is lower than TTL-expired rates. Which tool would be wrong then? Would you blame ping for producing misleading results?
Sure, that would be a problem, but any combination could be misleading if the data path is yielding 0% loss at high rates of end-to-end ICMP. This is why it's not a particularly helpful metric and can be downright misleading (usually not to me, but I've seen plenty of people make incorrect inferences from bunk MTR results because the tool isn't intuitive).
> The running counters and the ability to pick out the obvious rate limiting when the loss doesn't cascade into the hops to me is akin to traceroutes * * * output. It doesn't always mean that the packets are blackholed, connectivity is broken, it just means the tool is producing an artifact due to network configuration or network characteristics. Further investigation is needed to figure out what's going on.
Sure, that's great, but it's not particularly helpful to the masses who misunderstand the tool. I worked as a network engineer for a decade receiving bunk MTR reports where people freaked out because they were seeing "packet loss" that was nonexistent on the data forwarding plane (you know, the one that actually matters).
> MTR imho is giving you much more insight into the network than traceroute or ping separately. It doesn't resolve the usual firewall/rate limiting artifacts, but gives you way more information about paths if you know how to interpret them.
Time shouldn't be wasted measuring the control path and then investigating to confirm that it is the control path and not the data path. You can't make these mistakes using traceroute and ping separately, because traceroute doesn't have a notion of a per-hop loss indicator and ping doesn't involve intermediate hops (unless an intermediate hop generates an ICMP diagnostic for an echo request).
Understanding can be improved. Bunk MTRs are easy to spot. You tell them this is not an issue because .... . Then they will learn, and usually that customer will stop sending you bunk MTRs.
I'm pretty sure the number of people opening tickets with providers/network teams because they have nothing better to do is near zero. The fact that they ran an MTR shows they were doing some troubleshooting, and at the end of the day a problem needs to be solved. It may not be on your end, but that needs to be investigated - and the same would apply to a crappy iperf throughput test. IMHO any clue/information about where that problem is, is helpful. You may need to filter relevant from irrelevant.
But if I got to pick one of two problems - one with crappy iperf results, the other with an MTR showing loss that carries over - I would probably pick the second, because that at least gives me an indication of whereabouts I should start looking.
> Time shouldn't be wasted measuring the control path and then investigating to confirm it is the control path and not data path. You cannot make these mistakes using traceroute and ping separately because traceroute doesn't have a notion of a "per-hop" loss indicator.
traceroute does have a per-hop indicator: it's the * in the output, it's just so often off that nobody pays much attention. You can't really catch issues related to route flaps or reroutes with traceroute; with MTR it becomes pretty clear if a reroute happens in the middle of your test. I guess you can keep re-running traceroute, but I'll leave it to you to sift through the output of that nightmare - and then it has effectively become MTR, with worse output.
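The reroute point can be sketched: mtr keeps a running table per hop number, so a path change shows up as a new router appearing at an existing hop. A toy version comparing two consecutive passes (the `diff_paths` helper and the hop names are made up for illustration):

```python
def diff_paths(old, new):
    """Compare hop lists from two successive passes; report hops whose
    responding router changed -- the signature of a reroute/route flap
    that a single traceroute pass would never reveal."""
    changes = []
    for hop, (was, now) in enumerate(zip(old, new), start=1):
        if was != now:
            changes.append((hop, was, now))
    return changes

pass1 = ["gw.local", "10.0.0.1", "core-a.isp.net", "1.1.1.1"]
pass2 = ["gw.local", "10.0.0.1", "core-b.isp.net", "1.1.1.1"]
for hop, was, now in diff_paths(pass1, pass2):
    print(f"hop {hop}: {was} -> {now}")  # hop 3: core-a.isp.net -> core-b.isp.net
```

mtr does this continuously and accumulates counters for every router it has seen at each hop, which is why a mid-test reroute is obvious there and invisible in a single traceroute run.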
There are also many options available in MTR that aren't there in traceroute: triggering the probes with TCP or UDP packets, fixing the local or remote port, etc. Even if you just run it with 3 packets per hop, you'll have way more options. You don't have to use it as a continuous monitor of packet loss; it can give you traceroute-level information in a much cleaner format, with more options to choose from.
> ping doesn't involve intermediate hops (unless an intermediate hop generates an ICMP diagnostic for an echo request).
ICMP echo requests and replies can be subject to different QoS treatment than TCP/UDP traffic, so that also doesn't necessarily give you the right idea when testing for an end-to-end connectivity issue. iperf imho is the best bet, and if you want to be really accurate you pick the src/dst ports for client/server just to be sure you get into the same class as your problematic traffic.
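Beyond port selection, you can also mark probe traffic with the same DSCP class as the problematic traffic - QoS classifiers often match on the TOS/DSCP byte rather than (or in addition to) ports. A minimal sketch setting DSCP EF (46) on a UDP socket; whether the marking survives past your first hop is entirely up to each network along the path:

```python
import socket

# DSCP occupies the upper 6 bits of the IP TOS byte, so EF (46) becomes
# a TOS value of 46 << 2 = 184.
DSCP_EF = 46
tos = DSCP_EF << 2

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
# Datagrams sent from this socket now carry the same marking as EF-classed
# traffic -- whether routers honour or rewrite it is up to each network.
observed = s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(observed)  # 184
s.close()
```

Tools expose the same knob directly (e.g. iperf's TOS option), but it's worth knowing it's just an ordinary unprivileged socket option.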
As a sidenote, MTR packets also ride the data plane until they reach the hop where their TTL expires.
https://www.youtube.com/watch?v=L0RUI5kHzEQ
I will go for the tool because it is the only one that warns about such a problem.