15-744: Computer Networking

1 15-744: Computer Networking, L-16: Network Measurement

2 Outline: Motivation; Three case studies: Internet Topology, Bandwidth Map, Internet Latency

3 Why network measurement? Help understand the network (Internet), e.g. its topology (ISPs don't usually publish this); useful for modeling and simulation. Identify problems and bottlenecks: Where do we need more investment? Which link is down (right now)? Improve application/user experience: end-to-end performance is key to user experience, and user experience means money.

4 What to measure? Network topology: connectivity, redundancy, coverage. Network performance: bandwidth, latency, jitter, utilization. Types of protocols/applications: TCP/UDP/ICMP, P2P/client-server. Application-specific data: a video player's buffering time and bitrate, a web service's page load time.

5 How to measure? A lot of general tools: traceroute, ping, iperf, etc. Components built into applications, e.g. a video player could log buffering time. ISPs know their own networks better, but that data is usually not public. Passive vs. active measurement: pure listening vs. injecting traffic, with different overheads (a minimal active-probe sketch follows).
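A minimal sketch of an active latency probe, measuring TCP connect time to a set of hosts. The hostnames and port are illustrative assumptions, not from the lecture; tools like ping and iperf measure related but not identical quantities.

```python
import socket
import time

def tcp_connect_rtt(host, port=80, timeout=2.0):
    """Return the TCP handshake time to (host, port) in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

if __name__ == "__main__":
    for host in ["www.cmu.edu", "www.example.com"]:   # hypothetical targets
        try:
            print(f"{host}: {tcp_connect_rtt(host):.1f} ms")
        except OSError as err:
            print(f"{host}: probe failed ({err})")
```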

6 Outline: Motivation; Three case studies: Internet Topology (Faloutsos'99, Li'04), Bandwidth Map, Internet Latency

7 Why study topology? The correctness of network protocols is typically independent of topology, but their performance is critically dependent on it (e.g., convergence of route information). We want to identify good topologies and design mechanisms for them. The Internet is impossible to replicate, so topology modeling is needed to generate test topologies.

8 Internet topologies: router level vs. Autonomous System (AS) level (figure: the same set of providers, e.g. AT&T, Sprint, MCI, drawn once as a router-level graph and once as an AS-level graph).

9 Router level vs. AS level. Router-level topologies reflect physical connectivity between nodes, inferred from tools like traceroute or from well-known public measurement projects like Mercator and Skitter. The AS graph reflects peering relationships between providers/clients, inferred from inter-domain routers that run BGP and from public projects like Oregon Route Views.

10 Hub-and-Spoke Topology: a single hub node; common in enterprise networks with a main location and satellite sites. Simple design and trivial routing. Problems: single point of failure, bandwidth limitations, high delay between sites, and the cost of backhauling to the hub.

11 Simple Alternatives to Hub-and-Spoke. Dual hub-and-spoke: higher reliability, higher cost, a good building block. Levels of hierarchy: reduce backhaul cost, aggregate the bandwidth, shorter site-to-site delay.

12 Waxman model (Waxman 1988). Observation: long-range links are expensive. Nodes are placed at random in 2-d space with dimension L. Probability of edge (u,v): a * e^{-d(u,v)/(bL)}, where d(u,v) is the Euclidean distance between u and v, and a and b are constants. Models locality. (Figure: nodes u and v at distance d(u,v).)
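A sketch of the Waxman model as described on this slide. The parameter values (n, a, b, L) are illustrative assumptions, not from the lecture.

```python
import math
import random

def waxman_graph(n=50, a=0.4, b=0.1, L=100.0, seed=0):
    rng = random.Random(seed)
    # Place n nodes uniformly at random in an L x L square.
    pos = [(rng.uniform(0, L), rng.uniform(0, L)) for _ in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])          # Euclidean distance d(u,v)
            # Edge probability decays exponentially with distance: a*e^{-d/(bL)}
            if rng.random() < a * math.exp(-d / (b * L)):
                edges.append((u, v))
    return pos, edges

pos, edges = waxman_graph()
print(f"{len(edges)} edges among 50 nodes")
```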

13 Transit-stub model (Zegura 1997). Observation: real networks exhibit hierarchical structure, specialized nodes (transit, stub, …), and connectivity requirements such as redundancy. These characteristics are incorporated into the Georgia Tech Internetwork Topology Models (GT-ITM) simulator (E. Zegura, K. Calvert and M.J. Donahoo, 1995).

14 Transit-stub model (Zegura 1997): transit domains are placed in 2-d space and populated with routers connected to each other; stub domains are then connected to transit domains. Models hierarchy (a minimal construction sketch follows).
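A very rough sketch of the transit-stub idea: a small, well-connected transit core with stub domains hanging off it. Domain sizes and attachment counts are illustrative assumptions; GT-ITM's actual generator is considerably more elaborate (random graphs per domain, placement in 2-d space).

```python
import random

def transit_stub(n_transit=4, n_stubs=8, stub_size=3, seed=1):
    rng = random.Random(seed)
    edges = []
    # Fully connect the transit domain (models a well-connected core).
    for i in range(n_transit):
        for j in range(i + 1, n_transit):
            edges.append((i, j))
    next_id = n_transit
    for _ in range(n_stubs):
        stub = list(range(next_id, next_id + stub_size))
        next_id += stub_size
        # Chain the stub routers together, then hang the stub off one
        # randomly chosen transit router (models the hierarchy).
        edges.extend(zip(stub, stub[1:]))
        edges.append((rng.choice(range(n_transit)), stub[0]))
    return edges

print(len(transit_stub()), "edges")
```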

15 So… are we done? No! In 1999, Faloutsos, Faloutsos and Faloutsos published a paper demonstrating power-law relationships in Internet graphs; specifically, the node degree distribution exhibited power laws. That changed everything…

16 Power Law Measurement Methodology. AS-level topology: collect BGP routing information from many routers. Router-level topology: traceroute between a lot of pairs of nodes; two IP addresses correspond to the same node if their names (reverse DNS) are the same (a small alias-resolution sketch follows).
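A minimal sketch of the alias-resolution heuristic mentioned above: group router IP addresses that reverse-resolve to the same DNS name. The IPs below are placeholders; real traceroute output would supply them.

```python
import socket
from collections import defaultdict

def group_by_reverse_dns(ips):
    groups = defaultdict(list)
    for ip in ips:
        try:
            name, _, _ = socket.gethostbyaddr(ip)   # reverse (inverse) DNS lookup
        except OSError:
            name = ip                               # no PTR record: keep IP as its own node
        groups[name].append(ip)
    return groups

if __name__ == "__main__":
    for name, ips in group_by_reverse_dns(["8.8.8.8", "8.8.4.4"]).items():
        print(name, ips)
```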

17 Power laws in AS level topology

18 Power Laws and Internet Topology (source: Faloutsos et al., 1999). Most nodes have few connections; a few nodes have lots of connections. Plotting the rank R(d) = P(D > d) × #nodes against degree d gives a straight line on log-log axes, for both the router-level graph and the Autonomous System (AS) graph. This led to active research in degree-based network models. (A small CCDF/rank computation sketch follows.)
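A sketch of the rank/degree relation R(d) = P(D > d) × #nodes on a toy graph. The degree list is synthetic; with a real topology it would come from the measured router- or AS-level graph.

```python
from collections import Counter

degrees = [1, 1, 1, 1, 2, 2, 2, 3, 3, 5, 8, 20]   # synthetic node degrees
n = len(degrees)
counts = Counter(degrees)

for d in sorted(counts):
    # Number of nodes with degree strictly greater than d, i.e. R(d).
    rank = sum(c for deg, c in counts.items() if deg > d)
    print(f"degree {d:2d}: R(d) = {rank:2d}  (P(D>d) = {rank / n:.2f})")
```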

19 GT-ITM abandoned. GT-ITM did not give power-law degree graphs, so new topology generators and explanations for power-law degrees were sought; the focus of generators shifted to matching the degree distribution of the observed graph.

20 Features of Degree-Based Models: Preferential Attachment, Expected Degree Sequence. The degree sequence follows a power law (by construction). High-degree nodes correspond to highly connected central "hubs", which are crucial to the system. Achilles' heel: robust to random failure, fragile to targeted attack. (A preferential-attachment sketch follows.)
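A sketch of preferential attachment, the Barabási-Albert-style growth process behind many degree-based generators: each new node attaches to m existing nodes chosen with probability proportional to their current degree. Parameter values are illustrative.

```python
import random

def preferential_attachment(n=100, m=2, seed=0):
    rng = random.Random(seed)
    edges = [(0, 1)]                       # start from a single edge
    targets = [0, 1]                       # each node appears once per incident edge
    for new in range(2, n):
        chosen = set()
        while len(chosen) < min(m, new):
            chosen.add(rng.choice(targets))    # degree-proportional choice
        for t in chosen:
            edges.append((new, t))
            targets.extend([new, t])
        # 'targets' grows so high-degree nodes keep getting picked more often.
    return edges

print(len(preferential_attachment()), "edges")
```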

21 Problem with power laws: they are descriptive models! Many graphs with a similar degree distribution have very different properties. There is no correct physical explanation; we need an understanding of practical issues: the driving force behind deployment, and the driving force behind growth.

22 Li et al. 2004 (HOT): consider the explicit design of the Internet, using annotated network graphs (capacity, bandwidth), technological and economic limitations, and network performance. Seek a theory for Internet topology that is explanatory and not merely descriptive: explain the high variability in network connectivity; the ability to match large-scale statistics (e.g. power laws) is only secondary evidence. Why do we have the power-law distribution?

23 Router Technology Constraint (figure: Cisco GSR, circa 2002; total bandwidth and bandwidth per degree in Gbps vs. degree, on log-log axes). The technology constraint: a router can be configured for high bandwidth with low degree (e.g. 15 x 10 GE) or high degree with low bandwidth (e.g. 15 x 8 FE), with configurations such as 15 x 3 x 1 GE and 15 x 4 x OC12 in between.

24 Aggregate Router Feasibility (figure: approximate aggregate feasible region spanning core, edge, and older/cheaper technologies; source: Cisco Product Catalog, June 2002). What does this suggest? It is not possible to have high-bandwidth, high-degree routers as the hub nodes.

25 Variability in End-User Bandwidths (figure: connection speed in Mbps vs. rank/number of users; high-performance computing on 1-10 Gbps Ethernet, academic and corporate sites on 10-100 Mbps Ethernet, residential and small business on ~500 Kbps broadband cable/DSL, and ~56 Kbps dial-up). Looking at end-user bandwidth demands, there is great variability in the availability of, and willingness to pay for, bandwidth at the edge. A few users have very high-speed connections, but most users have low-speed connections. Given these facts, what will an ISP do? The ISP's goal is to minimize cost, and the cost of links creates an economic incentive to aggregate traffic as close to the edge as possible. The variability in edge connection speeds thus creates variability in the connectivity of edge routers.

26 Heuristically Optimal Topology: a mesh-like core of fast, low-degree routers, with high-degree nodes at the edges (figure: cores, edges, and hosts).

27 Physical Connectivity (figure: Abilene backbone physical connectivity as of December 16, 2003; a map of backbone links between major cities such as Seattle, Sunnyvale, Los Angeles, Denver, Kansas City, Houston, Indianapolis, Chicago, Atlanta, Washington D.C., and New York, with regional GigaPoPs, universities, and international peers such as GEANT, SURFNet, SINet, WIDE, and TransPAC/APAN attached at the edges; the legend gives link speeds in Gbps).

28 Summary - Topology Measurement. Faloutsos'99 on Internet topology observed "power laws" in Internet structure (router level, AS level, neighborhood sizes) and inspired degree-based topology generators. What is wrong with these topologies? Li'04: many graphs with a similar distribution have different properties; we should look at fundamental technology constraints and economic trade-offs instead. Thought: network measurement is a function of how the system responds, and it's always better to know what the system is like first.

29 Outline: Motivation; Three case studies: Internet Topology, Bandwidth Map (Sun'14), Internet Latency

30 Motivation: measuring bandwidth between two nodes is easy (iperf, …). How about a real-time traffic map of the Internet? CDNs could do better server selection, P2P applications could choose the best peer, and it would help network diagnosis and troubleshooting. Think of Google Maps for real-world traffic.

31 Challenges. Coverage: need millions of vantage points. Overhead: bandwidth measurements usually inject non-trivial traffic into the network. (Near) real-time views: even bigger overhead.

32 Opportunity. Idea: infer the traffic map from video player statistics. The growing volume of Internet video traffic helps coverage: it accounts for 30%-50% of total traffic, with 30M Netflix streaming subscribers. The ability to instrument video players helps overhead and real-time views: companies are already doing this.

33 Overview of Service Capacity & Utilization

34 …Yet Still Challenges: video measurements provide estimates of end-to-end throughput, not per-link values and not capacity/utilization, and give no information about background traffic (unobserved video and non-video traffic).

35 Problem definition

36 Inferring Capacity: use the max observed throughput? Simple, but not accurate; it actually gives only a lower bound. Problems: it tends to underestimate (background traffic) and fails to consider the confluence of flows. Practical aspects: capacities are likely drawn from a discrete set of values (e.g. 1 Gbps), and ISPs tend to minimize cost.

37 Inferring Capacity, Continued: confluence of flows & background traffic; minimize cost; discrete capacities (a minimal sketch of the lower-bound-plus-rounding intuition follows).
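This is not the paper's inference algorithm, just a minimal sketch of the intuition from slides 36-37: the max observed throughput on a link is only a lower bound, and real capacities come from a discrete set, so round the bound up to the next standard value. The standard-capacity list and sample data are assumptions for illustration.

```python
STANDARD_CAPACITIES_MBPS = [100, 1_000, 10_000, 40_000, 100_000]

def infer_capacity(observed_throughputs_mbps):
    """Lower-bound a link's capacity, then snap it to a discrete standard value."""
    lower_bound = max(observed_throughputs_mbps)
    for cap in STANDARD_CAPACITIES_MBPS:
        if cap >= lower_bound:
            return cap
    return lower_bound          # exceeds the largest standard value listed

# Throughputs (Mbps) summed over flows crossing one link in different epochs.
print(infer_capacity([420.0, 730.5, 655.2]))   # -> 1000
```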

38 Side Information to Help, because the original problem is hard to solve. Candidate side information: Gravity models: the traffic volume between an (s, d) pair is roughly proportional to the product of the total traffic volumes originating at s and at d (a tiny sketch follows). Measurement-to-background ratio: e.g., YouTube is ≈18% of peak traffic. Overprovisioning: network core links are typically overprovisioned to run at γ = 30-40% link utilization on average.
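A tiny sketch of the gravity-model heuristic above. The per-node volumes are made-up numbers, and scaling by the overall total is one common normalization choice, not necessarily the paper's.

```python
node_volume = {"A": 40.0, "B": 30.0, "C": 20.0, "D": 10.0}   # total traffic per node
total = sum(node_volume.values())

# Traffic from s to d is proportional to volume(s) * volume(d); scale by the total.
gravity = {
    (s, d): node_volume[s] * node_volume[d] / total
    for s in node_volume for d in node_volume if s != d
}
print(f"A->B estimate: {gravity[('A', 'B')]:.1f}")
```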

39 Inferring Utilization: needs to be more real-time, and the side information does not hold within short time windows. Restate the problem in terms of the measured throughputs T_{s,d,e}, the paths P_{s,d,e}, the link capacities C_l, and the background traffic bg_{l,e}: we can estimate T_{s,d,e} from bg_{l,e} using max-min fairness, which gives a user with a small demand all of what it wants and evenly divides the unused capacity among the bigger users. (A small max-min allocation sketch follows.)
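A sketch of max-min fair allocation on a single link, as described above: small demands are fully satisfied, and the leftover capacity is split evenly among the users that want more. The capacity and demands are example values.

```python
def max_min_fair(capacity, demands):
    allocation = [0.0] * len(demands)
    remaining = sorted(range(len(demands)), key=lambda i: demands[i])
    cap_left = capacity
    while remaining:
        share = cap_left / len(remaining)        # equal share of what is left
        i = remaining.pop(0)
        if demands[i] <= share:
            allocation[i] = demands[i]           # small demand: fully satisfied
        else:
            allocation[i] = share                # big demand: gets the fair share
        cap_left -= allocation[i]
    return allocation

print(max_min_fair(10.0, [1.0, 3.0, 8.0]))       # -> [1.0, 3.0, 6.0]
```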

40 Inferring Utilization - Example. Each A-C flow gets 1M and the B-C flow gets 5M, i.e. T_{A,C} = 1M and T_{B,C} = 5M. Suppose one A-C and one B-C flow is measured. If bg_{A,C} = 4.5M and bg_{B,C} = 13.5M, then T^{est}_{A,C} = 0.5M and T^{est}_{B,C} = 1M; if bg_{A,C} = 4M and bg_{B,C} = 9M, then T^{est}_{A,C} = 1M and T^{est}_{B,C} = 5M.

41 Evaluation – Simulation Setup: synthesized traffic patterns; sensitivity to the number of epochs and to the background-to-measurement traffic ratio; accuracy of capacity inference.

42 Evaluation – Capacity Accuracy

43 Evaluation – Utilization Accuracy

44 Summary - Bandwidth Measurement: early-stage work (Sun'14) on a traffic map that leverages the popularity of video players for low overhead and high coverage. The data is straightforward to collect, but converting end-to-end measurements to per-link numbers is hard, and converting from throughput to capacity is hard. Thought: the interesting facts may be hidden (deep) under the things we (can) measure.

45 Outline: Motivation; Three case studies: Internet Topology, Bandwidth Map, Internet Latency (Singla'14, Li'10)

46 Value of (Very) Low Latency. User experience: a response within 30 ms gives the illusion of zero wait time. Money: a 100 ms latency penalty means a 1% loss for Amazon. Cloud computing & thin clients: desktop as a service, computation offload from mobile devices. New applications: e.g. telemedicine.

47 Latency Lags Bandwidth: the improvement in bandwidth has been huge, because it's easier to achieve and easier to market (the average Internet connection in the US is 11.5 Mbps). Reducing latency is harder and usually requires structural changes to the network. Can we build a "speed of light" Internet? First, let's look at why it is slow (Singla'14).

48 Wait, don't CDNs solve the problem? They do help to some extent, but they don't help all applications (e.g. telemedicine), and CDNs are expensive: available only to larger Internet companies. Lower Internet latency also helps reduce CDN deployment cost.

49 Latency Measurement Methodology: fetch web pages using cURL (just the HTML for the landing pages) for 28,000 web sites from 186 PlanetLab locations. Obtain the time for DNS resolution, the TCP handshake (between SYN and SYN-ACK), the TCP transfer (actual data transmission), and the total (a small cURL timing sketch follows).
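A rough sketch of per-phase timing with cURL's write-out variables, in the spirit of the methodology above (the study itself used 186 PlanetLab vantage points and 28,000 sites). The target URL is a placeholder, and the handshake estimate is only approximate.

```python
import subprocess

FORMAT = "%{time_namelookup} %{time_connect} %{time_starttransfer} %{time_total}"

def curl_timings(url):
    out = subprocess.run(
        ["curl", "-o", "/dev/null", "-s", "-w", FORMAT, url],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    dns, connect, start, total = map(float, out)
    return {
        "dns": dns,                       # DNS resolution
        "tcp_handshake": connect - dns,   # roughly SYN to SYN-ACK
        "transfer": total - start,        # actual data transmission
        "total": total,
    }

print(curl_timings("http://example.com/"))
```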

50 Latency Measurement Methodology, Cont. Also ping the web servers 30 times and log the min and median. Traceroute from PlanetLab nodes to the servers, then geolocate each router/server using commercial geolocation services. IP-based geolocation is not accurate, so multiple services are used; the results are not sensitive to the specific service.

51 Important Concepts. C-latency: the time for light to travel between client and server (shortest path). Router path latency: the time for light to travel through each router along the path. Latency inflation: measured time / c-latency. (A small c-latency sketch follows.)
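A sketch of c-latency and latency inflation as defined above: great-circle distance between client and server divided by the speed of light in vacuum. The coordinates and the measured RTT are made-up example values.

```python
import math

C_KM_PER_MS = 299_792.458 / 1000.0      # speed of light, km per millisecond

def c_latency_ms(lat1, lon1, lat2, lon2):
    """One-way c-latency between two (lat, lon) points, in milliseconds."""
    r = 6371.0                           # Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    dist_km = 2 * r * math.asin(math.sqrt(a))   # haversine great-circle distance
    return dist_km / C_KM_PER_MS

# Example: Pittsburgh to San Francisco, with a hypothetical measured RTT.
c_rtt = 2 * c_latency_ms(40.44, -79.99, 37.77, -122.42)
measured_rtt_ms = 85.0
print(f"c-RTT = {c_rtt:.1f} ms, inflation = {measured_rtt_ms / c_rtt:.1f}x")
```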

52 Result Overview PlanetLab nodes are well-connected, so real world results are probably worse

53 Result Overview - Summary: total latency inflation is 35.4x (median) and 100x (80th percentile). Protocol overhead: DNS (7.4x) and TCP handshake (3.4x). But the minimum ping time is also 3.2x inflated, which suggests inefficiency in lower layers. Very tail-heavy distribution.

54 Is Congestion the Problem? Log the RTT (TCP SYN to SYN-ACK) over a 24-hour period; the variation is generally small.

55 Infrastructure Inflation: the router path is 2x inflated (median), which suggests inefficiency in route selection. Not too bad, since the speed of light in fiber is ~2/3rd of the speed of light in vacuum. Then why is ping still 3.2x inflated? 1) Traceroute may not yield a response from every router. 2) The physical paths of links may not follow the shortest path, for geographic and economic reasons.

56 Absolute Numbers (Li'10): consider client-server applications where the server runs in the cloud. Methodology: instantiate an instance in each data center of the cloud provider, ping these instances from 200 PlanetLab nodes, and record the minimum RTT; this is the best latency you can get with today's cloud.

57 Latency to Closest Cloud: using Amazon (C1), the average RTT is 74 ms.

58 Summary - Latency Measurement. Singla'14 proposes an ambitious goal: cut Internet latency down to the limit of the speed of light, which would likely benefit many applications. Why is today's Internet so slow? Infrastructural inefficiency and protocol overhead. Li'10 studied the latency to the cloud: many tens of milliseconds of latency, with a long-tail distribution. Thought: detailed measurements are a good way to locate the bottlenecks of a system.