1 Performance Troubleshooting across Networks. Joe Breen, University of Utah Center for High Performance Computing
2 What are User Expectations? Fasterdata Network Requirements and Expectations
3 What are the steps to attain the expectations? First, make sure the host specs are adequate: are you shooting for 1G, 10G, 25G, 40G, or 100G? Second, tune the host; most operating systems auto-tune, but higher speeds are still problematic. Third, validate that the network is clean between the hosts. Fourth, make sure the network stays clean.
4 Host specs: Motherboard. Higher CPU speed is better than higher core count. PCI interrupts are tied to a CPU processor ==> try to minimize crossing the bus between CPU processors. Storage host bus adapters and Network Interface Cards require the correct generation of PCI Express and the correct number of lanes. https://fasterdata.es.net/science-dmz/DTN/hardware-selection/motherboard-and-chassis/ https://en.wikipedia.org/wiki/PCI_Express#PCI_Express_3.0 https://fasterdata.es.net/science-dmz/DTN/100g-dtn/ https://fasterdata.es.net/science-dmz/DTN/reference-implementation/
5 Host specs: PCI bus. What generation of PCI Express (PCIe), and how many lanes? 4, 8, and 16 lanes are possible; the number of lanes supported depends on the motherboard and the Network Interface Card (NIC). The speed of a lane depends on the PCIe generation: PCIe 2.0 -> 5 GT/s per lane (4 Gb/s usable after 8b/10b encoding overhead); PCIe 3.0 -> 8 GT/s per lane (~7.9 Gb/s usable after 128b/130b encoding overhead). https://fasterdata.es.net/science-dmz/DTN/hardware-selection/motherboard-and-chassis/ https://en.wikipedia.org/wiki/PCI_Express#PCI_Express_3.0 https://fasterdata.es.net/science-dmz/DTN/100g-dtn/ https://fasterdata.es.net/science-dmz/DTN/reference-implementation/
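The lane math above can be checked with a few lines of arithmetic. A minimal sketch (the GT/s figures and encoding overheads are the published PCIe 2.0/3.0 rates from the Wikipedia page cited above, not numbers from this slide):

```python
# Approximate usable PCIe bandwidth per direction for a slot, in Gb/s.
#   PCIe 2.0: 5 GT/s per lane, 8b/10b encoding    -> 80% efficient
#   PCIe 3.0: 8 GT/s per lane, 128b/130b encoding -> ~98.5% efficient

RATES = {  # generation -> (GT/s per lane, encoding efficiency)
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}

def usable_gbps(generation: int, lanes: int) -> float:
    gt_per_lane, efficiency = RATES[generation]
    return gt_per_lane * efficiency * lanes

print(round(usable_gbps(2, 8), 1))   # 32.0  -> PCIe 2.0 x8 comfortably feeds a 10G NIC
print(round(usable_gbps(3, 16), 1))  # 126.0 -> PCIe 3.0 x16 is what a 100G NIC needs
```

This is why the next slide's rule of thumb (PCIe v2 x8 or better for 10G) works: the slot must have comfortable headroom above the NIC's line rate.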
6 Host specs: PCI implications. A 10G NIC needs PCIe v2 with 8 lanes or greater. https://fasterdata.es.net/science-dmz/DTN/hardware-selection/motherboard-and-chassis/ https://en.wikipedia.org/wiki/PCI_Express#PCI_Express_3.0 https://fasterdata.es.net/science-dmz/DTN/100g-dtn/ https://fasterdata.es.net/science-dmz/DTN/reference-implementation/
7 Host specs: Storage subsystem factors. Local disk: RAID 6, RAID 5, or RAID 1+0; SATA or SAS; spinning disk vs SSD. Network disk: high-speed parallel file system vs NFS or SMB mounts. Notes. Local storage: https://en.wikipedia.org/wiki/Serial_ATA https://en.wikipedia.org/wiki/Serial_Attached_SCSI https://en.wikipedia.org/wiki/Solid-state_drive. Network storage, NFS: v3 https://tools.ietf.org/html/rfc1813 v4.1 https://tools.ietf.org/html/rfc5661. CIFS/SMB: https://en.wikipedia.org/wiki/Server_Message_Block https://msdn.microsoft.com/en-us/library/windows/desktop/aa365233(v=vs.85).aspx https://technet.microsoft.com/en-us/library/cc aspx. Storage performance testers: https://www.spec.org/sfs2008/press/release.html. Parallel file systems: Lustre http://lustre.org/ GPFS https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System https://www.ibm.com/support/knowledgecenter/en/SSFKCN/gpfs_welcome.html. Multi-tenancy: https://en.wikipedia.org/wiki/Multitenancy
8 Host specs and other. Memory: 32GB or greater. Other factors such as multi-tenancy: how busy is your system?
9 Host tuning. The TCP buffer size sets the max data rate; too small means TCP cannot fill the pipe. Buffer size = Bandwidth * Round Trip Time; use ping for the RTT. Most recent operating systems have auto-tuning, which helps. For high-bandwidth NICs, i.e. 40Gbps+, the admin should double-check the maximum TCP buffer settings (OS dependent). See the Matt Mathis paper.
10 Host tuning needs info on the network. Determine the Bandwidth-Delay Product (BDP): BDP = Bandwidth * Round Trip Time, e.g. 10Gbps * 70ms = 700,000,000 bits = 87,500,000 bytes. The BDP determines the proper TCP receive window. RFC 1323 allows TCP extensions, i.e. window scaling. Long Fat Network (LFN): a network with a large bandwidth-delay product. Notes. Matt Mathis original paper. TCP Performance and the Mathis equation. Enabling High Performance Data Transfers https://www.psc.edu/services/networking/68-research/networking/641-tcp-tune. TCP Large Window extensions (window scale and Long Fat Networks): RFC 1323 https://www.ietf.org/rfc/rfc1323.txt. A User's Guide to TCP Windows (Von Welch). Sizing Router Buffers (Appenzeller, Keslassy, McKeown). Internet Protocol Journal, TCP Performance (Geoff Huston).
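The BDP arithmetic above can be sketched in a couple of lines; nothing here is beyond the slide's own formula:

```python
# Bandwidth-Delay Product: the amount of data "in flight" on a path, and
# therefore the minimum TCP buffer size needed to keep the pipe full.

def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """BDP = bandwidth * round-trip time, converted from bits to bytes."""
    return bandwidth_bps * rtt_seconds / 8

# The slide's example: a 10 Gbps path with a 70 ms round trip time.
print(int(bdp_bytes(10e9, 0.070)))  # 87500000 bytes, i.e. 87.5 MB
```

Compare the result against the maximum TCP buffer your OS allows (next slides): if the OS cap is below the BDP, the transfer cannot fill the pipe no matter how clean the path is.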
11 Host Tuning: Linux and Apple Mac. See Notes section for links with details and description. Linux: modify /etc/sysctl.conf with the recommended parameters: # allow testing with buffers up to 128MB net.core.rmem_max = net.core.wmem_max = # increase Linux autotuning TCP buffer limit to 64MB net.ipv4.tcp_rmem = net.ipv4.tcp_wmem = # recommended default congestion control is htcp net.ipv4.tcp_congestion_control=htcp # recommended for hosts with jumbo frames enabled net.ipv4.tcp_mtu_probing=1 # recommended for CentOS7/Debian8 hosts net.core.default_qdisc = fq. Apple Mac: # OSX default of 3 is not big enough net.inet.tcp.win_scale_factor=8 # increase OSX TCP autotuning maximums net.inet.tcp.autorcvbufmax= net.inet.tcp.autosndbufmax= Notes. Apple Mac: https://rolande.wordpress.com/2010/12/30/performance-tuning-the-network-stack-on-mac-osx-10-6/ MS Windows: https://www.speedguide.net/articles/windows server-tcpip-tweaks-5077 MS Win 10 and Server 2016 PowerShell network cmdlets: https://technet.microsoft.com/en-us/itpro/powershell/windows/netadapter/netadapter https://technet.microsoft.com/itpro/powershell/windows/nettcpip/set-nettcpsetting
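The numeric values on this slide did not survive the export. The buffer limits can be reconstructed from the slide's own comments (128 MB = 134217728, 64 MB = 67108864); the min/default fields of tcp_rmem/tcp_wmem below are common Linux defaults, an assumption on my part rather than anything stated on the slide. Verify against the fasterdata host-tuning pages before deploying:

```
# allow testing with buffers up to 128MB
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
# increase Linux autotuning TCP buffer limit to 64MB
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control = htcp
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing = 1
# recommended for CentOS7/Debian8 hosts
net.core.default_qdisc = fq
```

Apply with `sysctl -p` after editing /etc/sysctl.conf.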
12 Host Tuning: MS Windows. See Notes section for links with details and description. Show the autotuning status: "netsh interface tcp show global". Use PowerShell network cmdlets for changing parameters in Windows 10 and Windows Server 2016, e.g. Set-NetTCPSetting -SettingName "Custom" -CongestionProvider CTCP -InitialCongestionWindowMss 6
13 What does the Network look like? What bandwidth do you expect? How far away is the destination? What round trip time does ping report? Are you able to support jumbo frames? Send test packets with the "don't fragment" bit set. Linux: ping -M do -s 8972 <host> (8972 bytes of ICMP payload plus 28 bytes of headers exercises a 9000-byte MTU). Mac: ping -D -s 8972 <host>.
14 What does the Network look like? Do you have asymmetric routing? Traceroute from your local machine gives one direction. Are you able to traceroute from the remote site? Are the two paths mirrors of each other? Notes. Matt Mathis original paper. PSC tuning pages: https://www.psc.edu/services/networking/68-research/networking/641-tcp-tune
15 What does the Network look like? Determine the Bandwidth-Delay Product (BDP): BDP = BW * RTT, e.g. 10Gbps * 70ms = 700,000,000 bits = 87,500,000 bytes. The BDP determines the proper TCP receive window (recap of slide 10; references are in that slide's notes).
16 How clean does the network really have to be? ESnet TCP tuning
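A rough answer comes from the Mathis equation cited in the notes of the earlier slides: achievable TCP throughput is bounded by (MSS / RTT) * (1 / sqrt(p)), where p is the packet loss rate. A sketch of the arithmetic in the simplified form (the constant factor is dropped, and the 1460-byte MSS is an assumed typical Ethernet value, not a number from this deck):

```python
import math

def mathis_limit_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """Simplified Mathis et al. bound: rate < (MSS / RTT) * (1 / sqrt(p))."""
    return (mss_bytes * 8 / rtt_seconds) / math.sqrt(loss_rate)

# One loss in 10,000 packets caps a 70 ms WAN path at ~17 Mbps,
# no matter how big the pipe is:
print(round(mathis_limit_bps(1460, 0.070, 1e-4) / 1e6, 1))  # 16.7

# The same loss rate on a 1 ms LAN path still allows ~1.17 Gbps:
print(round(mathis_limit_bps(1460, 0.001, 1e-4) / 1e9, 2))  # 1.17
```

This is the crux of the slide's question: loss that is invisible on a LAN is crippling at WAN round-trip times, so long-distance big-data paths must be essentially loss-free.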
17 How do I validate the network? Measurement! Active measurement: perfSONAR; iperf https://github.com/esnet/iperf; nuttcp https://www.nuttcp.net/Welcome%20Page.html. Passive measurement: Nagios, SolarWinds, Zabbix, Zenoss, Cacti, PRTG, RRDtool. Trend the drops/discards.
18 How do I make sure the network is clean on a continual basis? Design a network security zone without performance inhibitors. Set up appropriate full-bandwidth security: Access Control Lists, Remotely Triggered Black Hole routing. Set up ongoing monitoring with tools such as perfSONAR. Create a MaDDash dashboard.
19 Set up a performance/security zone. The Science DMZ architecture is a dedicated performance/security zone on a campus. Science DMZ: A Network Design Pattern for Data-Intensive Science https://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
20 Use the right tool. Rclone https://rclone.org/ Globus https://www.globus.org/ FDT, bbcp, UDT
21 Techniques such as Packet Pacing. [Figure: 100G host, parallel streams, no pacing vs 20G pacing] Optimizing Data Transfer Nodes using Packet Pacing https://www.es.net/assets/pubs_presos/packet-pacing.pdf Credit: Brian Tierney, Nathan Hanford, Dipak Ghosal
22 Techniques such as Packet Pacing. Fasterdata packet pacing (ESnet) https://fasterdata.es.net/host-tuning/packet-pacing/ Credit: Brian Tierney, Nathan Hanford, Dipak Ghosal https://www.es.net/assets/pubs_presos/packet-pacing.pdf
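On Linux hosts, pacing like the 20G example on slide 21 can be applied with the fq queueing discipline's maxrate option. A sketch, assuming a hypothetical interface name eth0; see the fasterdata packet-pacing page above for current guidance:

```shell
# Replace the root qdisc on eth0 with fq, pacing each flow to at most
# 20 Gbit/s. Requires root; eth0 is a placeholder for your interface.
tc qdisc add dev eth0 root fq maxrate 20gbit

# Inspect the result:
tc qdisc show dev eth0
```

Pacing trades a small amount of peak rate for far fewer bursts, and therefore fewer drops in under-buffered switches along the path.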
23 Not just about research. Troubleshooting to the cloud is similar: high latency with big pipes. Latency is not just to the front door but also internal to the cloud provider. Example: backups to the cloud are a lot like big science flows.
24 Live example: troubleshooting using bwctl on perfSONAR boxes, e.g. bwctl -s <sender host>
25 References: See Notes pages on print out of slides for references for each slide