The Internet Protocol (IP) IPv4
Seyed Ali Jokar
Contents
1. Principles
2. Addressing
3. Packet Delivery and Forwarding
4. IP header
5. ICMP
6. Fragmentation
7. Terminology
1. Why a network layer?
We would like to interconnect all devices in the world. We have seen that we can solve the interconnection problem with bridges and the MAC layer. However, this is not sufficient, as it does not scale to large networks.
Q. Why?
Bridges use a tree. This is not efficient in a large network, as the tree concentrates all traffic.
Bridges use forwarding tables that are not structured. A bridge must look up the entire table for every packet. The table size and lookup time would be prohibitive.
Solution: connectionless network layer (e.g. Internet Protocol, IP):
  every host receives a network layer address (IP address)
  intermediate systems forward packets based on destination address
Connectionless Network Layer
Connectionless network layer = no connection
  every packet contains destination address
  intermediate systems (= routers) forward based on longest prefix match
[Figure: hosts A.H1, B.C.H2 and B.D.H2 interconnected by routers R1 to R4; each router holds a forwarding table that maps the prefixes A.x, B.x, B.C.x and B.D.x to output ports.]
IP Principles
Homogeneous addressing
  an IP address is unique across the whole network (= the world in general)
  an IP address is the address of an interface
  communication between IP hosts requires knowledge of IP addresses
Routers between subnetworks only:
  a subnetwork = a collection of systems with a common prefix
  inside a subnetwork: hosts communicate directly without routers
  between subnetworks: one or several routers are used
A host either sends a packet to the destination using its LAN, or passes it to the router for forwarding
Terminology:
  host = end system; router = intermediate system
  subnetwork = one collection of hosts that can communicate directly without routers
2. IP addresses
IP address
  Unique addresses in the world, decentralized allocation
  The current format is IPv4; the next format will be IPv6; we will see IPv6 at the end of the lecture. By default, “IP address” = “IPv4 address”
  An IP address is 32 bits, noted in dotted decimal notation
Host and Prefix Part
  An IP address has a prefix and a host part: prefix:host
  The prefix identifies a subnetwork
  The subnet prefix can be any length; a frequent case is 24 bits, but not always
  In order to know its prefix, a host needs to know how many bits constitute it, usually by means of a “subnet mask” (see later)
*Example
[Figure: example network. The ETHZ backbone (Komsys subnet, ezci7-ethz-switch) and the EPFL backbone (DI, LEMA and LRC subnets) are interconnected through routers ed0-ext/ed0-swi, ed2-in/ed2-el and in-inr/in-inj; sic500cs gives modem + PPP access. Hosts shown include stisun1, disun3, lrcsuns (08:00:20:71:0D:D4), lrcpc3 and lrcmac4; a ring connects SIDI and SUN.]
Binary, Decimal and Hexadecimal
Given an integer B (“the basis”), any integer can be represented in “base B” by means of an alphabet of B symbols
Usual cases are
  decimal: 234
  binary: b11101010
  hexadecimal: xEA
Mapping binary <-> hexa is simple: one hexa digit is 4 binary digits
  xE = b1110
  xA = b1010
  xEA = b11101010
Mapping binary <-> decimal is best done by a calculator
  b11101010 = 128 + 64 + 32 + 8 + 2 = 234
Special cases to remember
  xF = b1111 = 15
  xFF = b11111111 = 255
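The conversions on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `to_bases` is made up for the example.

```python
def to_bases(n: int) -> dict:
    """Return the decimal, binary and hexadecimal strings of a non-negative integer."""
    return {
        "decimal": str(n),
        "binary": format(n, "b"),       # base 2, no prefix
        "hexadecimal": format(n, "X"),  # base 16, upper case, no prefix
    }


# One hexadecimal digit corresponds to exactly 4 binary digits.
print(to_bases(234))
# Parsing goes the other way: int(text, base).
print(int("EA", 16), int("11101010", 2))
```

Because one hexadecimal digit is exactly four bits, the binary string can also be read off digit by digit: E gives 1110 and A gives 1010.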
Representation of IP Addresses
dotted decimal: group bits in bytes, write the decimal representation of each byte
  example 1: 128.191.151.1
  example 2:
hexadecimal: hexadecimal representation -- fixed size string
  example 1: x80 BF 97 01
  example 2: x
binary: string of 32 bits (2 symbols: 0, 1)
  example 1: b10000000 10111111 10010111 00000001
  example 2: b
An IP address Prefix is written using one of two Notations: masks / prefixes
Using a mask: address + mask
  example : mask
  the mask is the dotted decimal representation of the string made of: 1 in the prefix, 0 elsewhere
  bitwise address & mask gives the prefix
  here: prefix is
  example 2: mask
  Q1: what is the prefix?
  Q2: how many host ids can be allocated?
Typically used in host configuration
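The rule "address & mask gives the prefix" can be sketched in Python with the standard `ipaddress` module. The function name `prefix_and_size` is made up for illustration, and the sample address 129.132.119.77 with mask 255.255.255.192 is taken from the numbers that survive on the solution slide, so treat it as an assumed example.

```python
import ipaddress


def prefix_and_size(addr: str, mask: str):
    """Return (prefix, prefix length in bits, number of addresses in the subnet)."""
    a = int(ipaddress.IPv4Address(addr))
    m = int(ipaddress.IPv4Address(mask))
    prefix = ipaddress.IPv4Address(a & m)   # bitwise AND keeps only the prefix bits
    prefix_len = bin(m).count("1")          # number of 1 bits in the mask
    n_addresses = 2 ** (32 - prefix_len)    # every combination of the host bits
    return str(prefix), prefix_len, n_addresses


print(prefix_and_size("129.132.119.77", "255.255.255.192"))
```

With a 26-bit mask, 6 host bits remain, which gives 64 addresses; two of them (all 0's and all 1's in the host part) are reserved, which leaves 62 usable host ids.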
Prefix Notation
prefix notation: 128.178.156.1/24
  the 24 first bits of the binary representation of the string, interpreted as dotted decimal
  here: the prefix is 128.178.156
  bits in excess are ignored: /24 is the same as /24 and /24
  typically used in routing tables to identify routing prefixes
example 2:
  Q1: write mask in prefix notation
  Q2: are these prefixes different? /28, /28, /28, /28
  how many IP addresses can be allocated to each of the distinct prefixes?
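The statement that bits in excess of the prefix length are ignored can be illustrated with the standard `ipaddress` module. In this sketch the helper name `same_prefix` and the second address (128.178.156.200) are assumptions chosen for the example; only 128.178.156.1/24 comes from the slide.

```python
import ipaddress


def same_prefix(a: str, b: str) -> bool:
    """True if two address/length strings denote the same prefix."""
    # strict=False tells ip_network to ignore (zero out) the host bits
    # instead of rejecting an address that has host bits set.
    return ipaddress.ip_network(a, strict=False) == ipaddress.ip_network(b, strict=False)


net = ipaddress.ip_network("128.178.156.1/24", strict=False)
print(net)                # the prefix with the host bits cleared
print(net.num_addresses)  # 2 ** (32 - 24)
print(same_prefix("128.178.156.1/24", "128.178.156.200/24"))
```

The same call with a /28 length shows that such a prefix covers 16 addresses.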
*IP Address Hierarchies
The prefix of an IP address can itself be structured into subprefixes in order to support aggregation
  For example: x.y represents an AUT host; /24 represents the CEIT subnet at AUT; /16 represents CEIT
  Used between routers by routing algorithms
  This approach is called classless and was first introduced in inter-domain routing under the name of CIDR (classless interdomain routing)
IP address classes
  IP addresses are sorted into classes
  This is an obsolete classification, no longer used
Originally, the prefix of an IP address was defined in a very rigid way. For class A addresses, the prefix was 8 bits. For class B, 16 bits. For class C, 24 bits. The advantage of that scheme was that by simply analyzing the address you could find out what the prefix was.
It was soon recognized that this form was too rigid. Then subnets were added. It was no longer possible to recognize from the address alone where the subnet prefix ends and where the host identifier starts. For example, the host part at AUT is 8 bits; it is 6 bits at TU. Therefore, additional information, called the subnet mask, is necessary.
Class C addresses were meant to be allocated one per network. Today, they are allocated in contiguous blocks.
*IP address classes
Address layout (bit positions 0 to 31, field boundaries at bits 8, 16 and 24):
  class A: 0     | Net Id | Subnet Id | Host Id
  class B: 10    | Net Id | Subnet Id | Host Id
  class C: 110   | Net Id | Host Id
  class D: 1110  | Multicast address
  class E: 11110 | Reserved
Examples: x.x = EPFL host; x.x = ETHZ host; 9.x.x.x = IBM host; x.x.x = MIT host
Class  Range
  A    0.0.0.0 to 127.255.255.255
  B    128.0.0.0 to 191.255.255.255
  C    192.0.0.0 to 223.255.255.255
  D    224.0.0.0 to 239.255.255.255
  E    240.0.0.0 to 247.255.255.255
Class B addresses are close to exhausted; new addresses are taken from class C, allocated as contiguous blocks
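As a small illustration of how the (obsolete) class was read off the leading bits, the sketch below matches the first byte of an address against the bit patterns listed on this slide. The function name `address_class` is made up for the example, and addresses that match none of the five patterns are reported as `None`.

```python
# Leading-bit patterns as listed on the slide, most specific last.
CLASS_PATTERNS = [("A", "0"), ("B", "10"), ("C", "110"), ("D", "1110"), ("E", "11110")]


def address_class(addr: str):
    """Return the historical class letter of a dotted decimal IPv4 address, or None."""
    first_byte = int(addr.split(".")[0])
    bits = format(first_byte, "08b")  # first byte as 8 binary digits
    for name, pattern in CLASS_PATTERNS:
        if bits.startswith(pattern):
            return name
    return None


for sample in ("9.1.2.3", "128.178.156.1", "192.0.2.1", "224.0.0.1"):
    print(sample, address_class(sample))
```

The patterns are mutually exclusive (each longer pattern starts with the 1 bits that the shorter ones exclude), so the order of the tests does not matter.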
*Address allocation
World Coverage
  Europe and the Middle East (RIPE NCC)
  Africa (ARIN & RIPE NCC)
  North America (ARIN)
  Latin America including the Caribbean (ARIN)
  Asia-Pacific (APNIC)
Current allocations of Class C
  /8, /8, 217/8 for RIPE
  /8, /8, 216/8 for ARIN
  /8, /8, 218/8 for APNIC
Simplifies routing
  a short prefix aggregates many subnetworks
  the routing decision is taken based on the short prefix
*Address delegation
Europe: 62/8, 80/8, /8, …
  ISP-1: 62.125/16
    customer 1: banana foods /25
    customer 2: sovkom /24
  ISP-2: 195.44/14
    customer 1: /21
    customer 2: /21
Q. Assume sovkom moves from ISP-1 to ISP-2; comment on the impact.
Special case IP addresses
1. 0.0.0.0: this host, on this network
2. 0.hostId: specified host on this net (initialization phase)
3. 255.255.255.255: limited broadcast (not forwarded by routers)
4. subnetId.all 1’s: broadcast on this subnet
5. subnetId.all 0’s: BSD used it for broadcast on this subnet (obsolete)
6. 127.x.x.x: loopback
7. 10/8, 172.16/12, 192.168/16: reserved networks for internal use (Intranets)
1, 2: source only; 3, 4, 5: destination only
Test Your Understanding (1)
[Figure: hosts and routers interconnected through bridges; most address fields are left blank (__.__.__.__), one interface address ends in .253 and host A is shown with an address ending in .1.]
Q: Can host A have this address? (masks are all )
Test your Understanding (2)
Q1: An Ethernet segment became too crowded; we split it into 2 segments, interconnected by a router. Do we need to change some IP host addresses?
Q2: same with a bridge.
3. IP packet forwarding
The IP packet forwarding algorithm is the core of the TCP/IP architecture. It defines what a system should do with a packet it has to send or forward. The rule is simple:
Rule for sending packets (hosts, routers)
  if the destination IP address has the same prefix as one of my interfaces, send directly to that interface
  otherwise send to a router as given by the IP routing table
It uses the IP routing table; the table can be checked with a command such as “netstat” on Unix or “route” on Windows.
In reality, there are exceptions to the rule. The complete algorithm is in the next slide; the cases should be tested in that order (it is a nested if-then-else statement).
IP packet forwarding algorithm
destAddr = destination address /* unicast! */
if /* case 1 */ a host route exists for destAddr:
    for every entry in routing table
        if (destinationAddr = destAddr) then send to nextHop IP addr; leave
else if /* case 2 */ destAddr is on a directly connected network (= on-link):
    for every physical interface IP address A and subnet mask SM
        if (A & SM = destAddr & SM) then send directly to destAddr; leave
else if /* case 3 */ there is a matching entry in routing table:
    find the longest prefix match for destAddr
    send to nextHop IP addr given by matching entry; leave
    /* this includes as special case the default route, if it exists */
else /* error */
    send ICMP error message “destination unreachable” to source
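The three cases of the algorithm can be sketched in Python as follows. This is a minimal illustration, not a real IP stack: the function name `forward`, the data layout (a dict of host routes, a list of interfaces, a list of (prefix, next hop) pairs) and all addresses, which come from the documentation ranges, are assumptions made for the example.

```python
import ipaddress


def forward(dest: str, host_routes: dict, interfaces: list, routing_table: list):
    """Return (case, next hop) for a unicast destination address."""
    d = ipaddress.ip_address(dest)
    # Case 1: a host route exists for the destination.
    if dest in host_routes:
        return "case 1", host_routes[dest]
    # Case 2: destination is on a directly connected network (on-link).
    for iface in interfaces:
        if d in iface.network:
            return "case 2", dest  # send directly to the destination
    # Case 3: longest prefix match in the routing table
    # (the default route 0.0.0.0/0 matches everything, with length 0).
    matches = [(net, hop) for net, hop in routing_table if d in net]
    if matches:
        net, hop = max(matches, key=lambda entry: entry[0].prefixlen)
        return "case 3", hop
    return "error", "ICMP destination unreachable"


host_routes = {"203.0.113.7": "192.0.2.254"}
interfaces = [ipaddress.ip_interface("192.0.2.10/24")]
routing_table = [
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.1"),
    (ipaddress.ip_network("198.51.100.128/25"), "192.0.2.2"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),
]
for dest in ("203.0.113.7", "192.0.2.55", "198.51.100.200", "198.51.100.5"):
    print(dest, forward(dest, host_routes, interfaces, routing_table))
```

The destination 198.51.100.200 matches both the /24 and the /25 entries; the /25 entry wins because it is the longest match.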
CMD route
host routes (routes with a netmask of ), the loopback network route (routes with a destination of and a netmask of ), or a multicast route (routes with a destination of and a netmask of ).
Add route: route -p add mask
Print routes: route print
*Example
Q1: Fill in the table if an IP packet has to be sent from lrcsuns
Q2: Fill in the table if an IP packet has to be sent from ed2-in
Table columns: final destination | next hop | case number
Routing Tables
Hosts and routers have routing tables, but only routers have significant routing tables
Routing tables at routers are maintained manually or, more usually, by routing protocols
Do not confuse
  Packet forwarding: determine which outgoing interface to use (real time)
  Routing: compute the values in the routing table (background job)
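Test Your Understanding (3)
Q1. What are the MAC and IP addresses at points 1 and 2 for packets sent by M1 to M3? At 2 for packets sent by M4 to M3? (Mx = MAC address)
[Figure: a router connects subnet p (router interface M9, p.1) and subnet q (router interface M8, q.1); an Ethernet concentrator is also shown. Hosts M1 (p.h1) and M2 (p.h2) are on subnet p; hosts M3 (q.h1) and M4 (q.h3) are on subnet q. Observation point 1 is on subnet p, point 2 on subnet q.]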
Direct Packet Forwarding: ARP
Sending to a host on the same subnet = direct packet forwarding; it does not use a router
Requires the knowledge of the MAC address on a LAN (called “physical” address)
There are four types of solutions for that; all exist in some form or another.
  1. Write the ARP table manually: this can always be done on Unix or Windows NT using the arp command.
  2. Derive the MAC address algorithmically from the IP address. This requires that the MAC address fits in the IP address; it is used with IPv6 but not with the current version of IP.
  3. Write the mappings MAC <-> IP in a server (used in special cases like ATM or frame relay).
  4. Use a discovery protocol by broadcast. This is done on all LANs (Ethernet, WiFi).
On LANs: the Address Resolution Protocol (ARP) maps a 32-bit IP address to a 48-bit MAC address
ARP Protocol
1: lrcsuns has a packet to send to (lrcpc1)
  this address is on the same subnet
  lrcsuns sends an ARP request to all systems on the subnet (broadcast)
  target IP address =
  the ARP request is received by all IP hosts on the local network and is not forwarded by routers
[Figure: lrcsuns (08:00:20:71:0D:D4), lrcpc1, lrcpc2 and in-inr on the same LAN; other MAC addresses shown: 00:00:C0:B3:D2:8D, 00:00:0C:02:78:36. Arrow 1 is the broadcast ARP request.]
ARP Protocol
2: lrcpc1 has recognized its IP address
  it sends an ARP reply packet to the requesting host with its IP and MAC addresses
[Figure: lrcsuns (08:00:20:71:0D:D4), lrcpc1, lrcpc2 and in-inr on the same LAN. Arrow 2 is the ARP reply from lrcpc1 to lrcsuns.]
ARP Protocol
3: lrcsuns reads the ARP reply, stores it in a cache and sends the IP packet to lrcpc1
[Figure: lrcsuns (08:00:20:71:0D:D4), lrcpc1, lrcpc2 and in-inr on the same LAN. Arrows 1, 2 and 3 are the ARP request, the ARP reply and the IP packet.]
Systems learn from ARP-REQUESTs. At the end of flow 1, all systems have learnt the mapping IP <-> MAC addr for the source of the ARP-REQUEST, namely, they have updated the following entry in their ARP table: IP addr: MAC addr: 08:00:20:71:0D:D4. As a result, lrcpc1 will not send an ARP-REQUEST to communicate back with lrcsuns.
Gratuitous ARP consists in sending an ARP-REQUEST for one's own address. This is used at bootstrap to test for the presence of a duplicate IP address. It is also used to force ARP cache entries to be changed after an address change (because systems learn from the ARP-REQUEST).
As flow 2 shows, the ARP-REPLY is not broadcast, but sent directly to the system that issued the request.
The “arp” command on Unix can be used to see or modify the ARP table.
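The three flows can be mimicked with a toy simulation. The sketch below follows the slide's description, in which every system that hears the broadcast request learns the requester's mapping. The class `Host`, the function `arp_resolve`, the IP addresses and the MAC addresses of lrcpc1 and lrcpc2 are hypothetical; only the MAC address of lrcsuns is taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    ip: str
    mac: str
    arp_cache: dict = field(default_factory=dict)  # IP address -> MAC address


def arp_resolve(requester: Host, target_ip: str, lan: list):
    """Simulate an ARP exchange on one LAN; return the target MAC or None."""
    target_mac = None
    # Flow 1: the request is broadcast, so every other host on the LAN sees it
    # and learns the requester's IP <-> MAC mapping.
    for host in lan:
        if host is requester:
            continue
        host.arp_cache[requester.ip] = requester.mac
        if host.ip == target_ip:
            target_mac = host.mac  # flow 2: only the target replies (unicast)
    # Flow 3: the requester caches the answer before sending the IP packet.
    if target_mac is not None:
        requester.arp_cache[target_ip] = target_mac
    return target_mac


lan = [
    Host("lrcsuns", "192.0.2.1", "08:00:20:71:0D:D4"),
    Host("lrcpc1", "192.0.2.2", "02:00:00:00:00:02"),
    Host("lrcpc2", "192.0.2.3", "02:00:00:00:00:03"),
]
print(arp_resolve(lan[0], "192.0.2.2", lan))
print(lan[1].arp_cache)  # lrcpc1 already knows lrcsuns, so it needs no request of its own
```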
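Test Your Understanding (3, cont’d)
Q2: What must the router do when it receives a packet from M2 to M3 for the first time?
[Figure: a router connects subnet p (router interface M9, p.1) and subnet q (router interface M8, q.1); an Ethernet concentrator is also shown. Hosts M1 (p.h1) and M2 (p.h2) are on subnet p; hosts M3 (q.h1) and M4 (q.h3) are on subnet q. Observation point 1 is on subnet p, point 2 on subnet q.]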
*Look inside an ARP packet
Ethernet II
  Destination: ff:ff:ff:ff:ff:ff (ff:ff:ff:ff:ff:ff)
  Source: 00:03:93:a3:83:3a (Apple_a3:83:3a)
  Type: ARP (0x0806)
  Trailer:
Address Resolution Protocol (request)
  Hardware type: Ethernet (0x0001)
  Protocol type: IP (0x0800)
  Hardware size: 6
  Protocol size: 4
  Opcode: request (0x0001)
  Sender MAC address: 00:03:93:a3:83:3a (Apple_a3:83:3a)
  Sender IP address: ( )
  Target MAC address: 00:00:00:00:00:00 (00:00:00_00:00:00)
  Target IP address: ( )
Proxy ARP
Proxy ARP = a host answers ARP requests on behalf of others
  example: sic500cs for PPP-connected computers
Allows one to cheat: connect different physical networks that have the same subnet prefix
Price to pay: ad-hoc configuration + single point of failure
Q1: how must the routing table of sic500cs be configured?
Q2: explain what happens when ed2-in has a packet to send to
[Figure: ed2-in (15.221) and ed0-ext (15.13) on the AUT backbone, together with sic500cs, which serves stisun1 (15.7) over modem + PPP.]
*4. IP header
[Figure: IPv4 header layout, one 32-bit word per row]
  Version | H-size | Type of service | Size
  Identification | F M | Offset
  TTL | Protocol | Checksum
  source address
  destination address
  options
Transmitted "big-endian": bit 31 first
Version is always 4 (IPv6 uses a different packet format)
Header size: in 32-bit words; variable because of options
*IP header
Type of service
  Previously used to encode priority; now used by DiffServ (Differentiated Services)
  1 byte codepoint determining the QoS class
    Expedited Forwarding (EF): minimize delay and jitter
    Assured Forwarding (AF): four classes and three drop-precedences (12 codepoints)
  Used only in corporate networks
Packet size
  in bytes, including header
  at most 64 Kbytes; limited in practice by the link-level MTU (Maximum Transmission Unit)
  every subnet should forward packets of 576 bytes
Id
  unique identifier for re-assembling
Flags
  M: more; set in all fragments except the last
  F: prohibits fragmentation
Offset
  position of a fragment in multiples of 8 bytes
TTL (Time-to-live)
  originally in seconds; now: number of hops
  each router decrements it; if it reaches 0, the packet is dropped (an ICMP packet is sent to the source)
Protocol
  identifier of protocol (1 - ICMP, 6 - TCP, 17 - UDP)
Checksum
  only on the header
*IP Checksum
The IP checksum is a simple example of an error-detecting code. It works as follows.
Consider a sequence of bytes and group them into 16-bit words. If the sequence has an odd number of bytes, add an extra 0 byte at the end. Obtain the 16-bit words $W_0$ to $W_j$.
Consider the number $x = 2^{16j} W_j + 2^{16(j-1)} W_{j-1} + \dots + 2^{16} W_1 + W_0$
The checksum is $y = (2^{16}-1) - z$ with $z = x \bmod (2^{16}-1)$
The computation of y is algorithmically simple. Note that $2^{16} = 1 \bmod (2^{16}-1)$ and thus $z = (W_j + W_{j-1} + \dots + W_1 + W_0) \bmod (2^{16}-1)$
The algorithm is:
  compute $z = W_j + W_{j-1} + \dots + W_1 + W_0$
  group the result by blocks of 16 bits; obtain $x' = 2^{16j'} W'_{j'} + 2^{16(j'-1)} W'_{j'-1} + \dots + 2^{16} W'_1 + W'_0$
  start again with x' instead of x until z is a 16-bit word
Comments:
  Addition modulo $(2^{16}-1)$ is called « one’s complement addition »
  The method is the same as the « proof by 9 » used by pupils before calculators existed, with 9 replaced by $2^{16}-1$; e.g. 35 mod 9 = (3 + 5) mod 9 = 8
  See RFC 1624 for how to do the computations in practice with 32-bit arithmetic.
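The algorithm above translates almost line by line into Python. The sketch below is illustrative only; the function names `ones_complement_sum` and `ip_checksum` are made up for the example, and the input is any byte string (for a real packet it would be the IP header with its checksum field set to 0).

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry above bit 15 back into the low 16 bits."""
    total = sum(words)
    while total > 0xFFFF:
        # "group the result by blocks of 16 bits" and add the blocks again
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ip_checksum(data: bytes) -> int:
    """Return the checksum y = (2^16 - 1) - z of a byte string."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length sequence with one 0 byte
    words = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return 0xFFFF - ones_complement_sum(words)


print(format(ip_checksum(bytes.fromhex("01030012")), "04X"))
print(format(ip_checksum(bytes.fromhex("F203F4F5F6F7")), "04X"))
```

For the words 0103 and 0012 the sum is 0115 and the checksum is FEEA. For F203, F4F5, F6F7 the plain sum is 2DDEF, which folds to DDEF + 2 = DDF1, so the checksum is 220E.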
*Examples of IP Checksums
all numbers are written in hexa
data: W1 = 0103, W0 = 0012
  z =
  checksum y =
data: F203 F4F5 F6F7
  z = F203 + F4F5 + F6F7 =
*Verifying a Checksum
The destination receives $W_j \dots W_0$ and $y$
If there is no error we should have: $W_j + \dots + W_0 + y = 0 \bmod (2^{16}-1)$
The destination computes the one’s complement sum of the block including the checksum and verifies that the result is $0 \bmod (2^{16}-1)$, i.e. FFFF
Examples:
  received block 0103 0012 FEEA
    verification: 0103 + 0012 + FEEA = FFFF (correct)
  received block F203 F4F5 F6F7 220E
    verification: F203 + F4F5 + F6F7 + 220E = 2 FFFD; FFFD + 2 = FFFF (correct)
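A receiver-side check can be sketched in the same way: add all received 16-bit words, checksum included, with carry folding, and accept the block if the result is FFFF. The function name `verify` is made up for the example.

```python
def verify(words_with_checksum) -> bool:
    """Return True if the one's complement sum of the block, checksum included, is FFFF."""
    total = sum(words_with_checksum)
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total == 0xFFFF  # FFFF is 0 modulo 2^16 - 1


print(verify([0x0103, 0x0012, 0xFEEA]))
print(verify([0xF203, 0xF4F5, 0xF6F7, 0x220E]))
print(verify([0xF203, 0xF4F5, 0xF6F7, 0x210E]))  # a corrupted checksum is rejected
```

In the second block the plain sum is 2FFFD; folding the carry gives FFFD + 2 = FFFF, so the block is accepted.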
*IP header Options
Options
  strict source routing: all routers
  loose source routing: some routers
  record route
  timestamp route
  router alert: used by IGMP or RSVP for processing a packet
Look inside an IP packet
Ethernet II
  Destination: 00:03:93:a3:83:3a (Apple_a3:83:3a)
  Source: 00:10:83:35:34:04 (HEWLETT-_35:34:04)
  Type: IP (0x0800)
Internet Protocol, Src Addr: ( ), Dst Addr: ( )
  Version: 4
  Header length: 20 bytes
  Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
  Total Length: 1500
  Identification: 0x624d
  Flags: 0x04
  Fragment offset: 0
  Time to live: 64
  Protocol: TCP (0x06)
  Header checksum: 0x82cf (correct)
  Source: ( )
  Destination: ( )
5. ICMP: Internet Control Message Protocol
used by a router or host to send error or control messages to other hosts or routers
  error or control messages relate to layer 3 only
  carried in IP datagrams (protocol type = 1)
ICMP message types
  echo request (reply) -> used by ping
  destination unreachable
  time exceeded (TTL = 0) -> used for traceroute responses
  address mask request/reply
  source quench
  redirect
  router discovery
  timestamps
ICMP error messages are never sent in response to
  an ICMP error message
  a datagram sent to a multicast or broadcast IP or layer 2 address
  a fragment other than the first
*ICMP Redirect
Sent by router R1 to source host A when
  R1 receives a packet from A with destination = B, and
  R1 finds that the next hop is R2, and
  A is on-link with R2 (thus A should not have sent to R1, but directly to R2)
R1 sends an ICMP redirect to A saying that the next hop for destination B is R2
A updates its routing table with a host route
General routing principle of the TCP/IP architecture:
  hosts have minimal routing information and learn host routes from ICMP redirects
  routers have extensive knowledge of routes
ICMP Redirect Format
  | IP datagram header (prot = ICMP)                 |
  | Type= | code | checksum                          |
  | Router IP address that should be preferred       |
  | IP header plus 8 bytes of original datagram data |
*ICMP Redirect Example
[Figure: hosts and routers shown: in-inr, lemas3, ed2-el, inr-el, ed2-in, lrcsuns; address suffixes shown: 156.1, 29.1, 29.9, 29.200, 182.5, 156.24; numbered arrows 1 to 4 mark the packets listed below.]
Packets (dest IP addr, srce IP addr, prot, data part):
  1: udp xxxxxxx
  2: udp xxxxxxx
  3: icmp type=redir code=host cksum xxxxxxx (28 bytes of 1)
  4: udp
ICMP Redirect Example (cont’d)
After 4:
lrcsuns:/export/home1/leboudec$ netstat -nr
Routing Table:
  Destination  Gateway  Flags  Ref  Use  Interface
                        UH               lo0
                        UGHD
                        U                le0
                        U                le0
  default               UG
*6. MTU
Link-layer networks have different maximum frame lengths
  Network                  MTU (bytes)
  Ethernet, WiFi           1500
  802.3 with LLC/SNAP      1492
  Token Ring 4 Mb/s        4464
  Token Ring 16 Mb/s       17914
  FDDI                     4352
  X.25                     576
  Frame Relay              1600
  ATM with AAL5            9180
  Hyperchannel             65535
  PPP                      296 to
MTU (maximum transmission unit) = maximum frame size usable for an IP packet
Q. value of short MTU? of long MTU?
lrcsuns:/export/home1/leboudec$ ifconfig -a
lo0: flags=849
IP Fragmentation
IP hosts or routers may have IP datagrams larger than the MTU
Fragmentation is performed when an IP datagram is too large
Re-assembly is only at the destination, never at intermediate points
Fragmentation is in principle avoided with TCP
[Figure: a datagram (IP header + 1400 bytes of data) crosses routers R1 and R2 over links with MTU = 1500, MTU = 620 and MTU = 1500. Packet 1 is split into fragments 2a, 2b, 2c carrying 600 B, 600 B and 200 B of data, which continue as 3a, 3b, 3c.]
IP Fragmentation (2)
An IP datagram is fragmented if MTU of interface < datagram total length
  all fragments are self-contained IP packets
  fragmentation is controlled by the fields Identification, Flag and Fragment Offset
  IP datagram = original; IP packet = fragments or complete datagram
Fragment data size (here 600) is always a multiple of 8
Identification is given by the source
  Packet  Length  Identification  More Fragment flag  Offset  8 * Offset
  1       1420    567             0                   0       0
  2a      620     567             1                   0       0
  2b      620     567             1                   75      600
  2c      220     567             0                   150     1200
*Fragmentation Algorithm
Repeated fragmentations may occur
The Don’t Fragment flag prevents fragmentation
Fragmentation Algorithm:
procedure sendIPp(P0):
    if P0.totalLength > MTU then
        data1Length = (MTU - P0.HLEN) rounded down to a multiple of 8
        data1 = first data1Length bytes of P0 data part
        data2 = remainder of P0 data part
        header1 = P0.header with
            More bit set
            totalLength = P0.HLEN + data1Length
        P1 = new(IPPacket; header1; data1)
        send P1 on data link layer
        header2 = P0.header with
            totalLength = P0.totalLength - data1Length
            fragmentOffset += data1Length/8
        P2 = new(IPPacket; header2; data2)
        sendIPp(P2)
    else
        send P0 on data link layer
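The recursive procedure can be sketched in Python as follows. Only header fields are modelled (no payload bytes), the Don't Fragment flag and IP options are ignored, and the names `Packet` and `fragment` are made up for the example. The sample values (identification 567, total length 1420, MTU 620) are those of the fragmentation example.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ident: int          # Identification field
    total_length: int   # header + data, in bytes
    more: bool          # More Fragments flag
    offset: int         # Fragment Offset, in units of 8 bytes
    hlen: int = 20      # header length in bytes (no options assumed)


def fragment(p: Packet, mtu: int) -> list:
    """Return the list of packets actually sent on a link with the given MTU."""
    if p.total_length <= mtu:
        return [p]
    data1_length = (mtu - p.hlen) // 8 * 8  # round down to a multiple of 8
    if data1_length <= 0:
        raise ValueError("MTU too small to carry any data")
    first = Packet(p.ident, p.hlen + data1_length, True, p.offset, p.hlen)
    # The remainder keeps the original More flag and moves its offset forward.
    rest = Packet(p.ident, p.total_length - data1_length, p.more,
                  p.offset + data1_length // 8, p.hlen)
    return [first] + fragment(rest, mtu)


for f in fragment(Packet(ident=567, total_length=1420, more=False, offset=0), mtu=620):
    print(f.total_length, f.more, f.offset, 8 * f.offset)
```

Because the remainder keeps the original More flag and offset, the same function also handles repeated fragmentation: a fragment that is itself too large for a later link is split again and its pieces still carry offsets relative to the original datagram.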
IP packets are sorted in fragment lists
  one fragment list per (Identification, source address)
  sorted by increasing Fragment Offset
Fragments F1 and F2 are contiguous iff
  F1.moreBit = 1
  F1.fragmentOffset + F1.dataLength/8 = F2.fragmentOffset
Fragment list F0…Fn is complete iff
  F0.fragmentOffset = 0
  Fi and Fi+1 are contiguous for i = 0…(n-1)
  Fn.moreBit = 0
IP packet arrival (P0) /* and packet is not a complete datagram */ ->
    if (P0.(identification, source address)) is new then
        if (new(fragmentList, P0.(identification, source address), fl)) then
            insert P0 in fl
            start reassemblyTimer(fl)
    else
        fl = fragmentList(P0.(identification, source address))
        insert(fl, P0)
        if fl is complete then deliver IP datagram
        else start reassemblyTimer(fl)
reassemblyTimer(fl) expires ->
    send ICMP error message to source
    delete(fl)
Comments:
  new(fragment list) may fail if there is no buffer left; in that case the datagram is lost
  insert may fail; if insert fails, then the fragment is discarded
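The completeness test on a fragment list can be sketched as a small Python function. Fragments are represented here as hypothetical `(offset, data_length, more)` tuples, with the offset in units of 8 bytes and the data length in bytes; the function name `is_complete` is made up for the example. Timers and buffer handling are left out.

```python
def is_complete(fragments) -> bool:
    """Return True if the fragments of one (identification, source) pair form a whole datagram."""
    frags = sorted(fragments)            # sort by increasing fragment offset
    if not frags or frags[0][0] != 0:    # F0.fragmentOffset must be 0
        return False
    for (off1, len1, more1), (off2, _, _) in zip(frags, frags[1:]):
        # Fi and Fi+1 must be contiguous: More bit set and no gap between them.
        if not more1 or off1 + len1 // 8 != off2:
            return False
    return not frags[-1][2]              # Fn.moreBit must be 0


# Fragments 2b, 2a, 2c of the fragmentation example, arriving out of order.
print(is_complete([(75, 600, True), (0, 600, True), (150, 200, False)]))
# The middle fragment is still missing.
print(is_complete([(0, 600, True), (150, 200, False)]))
```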
*Issues with Fragmentation
Fragmentation requires re-assembly; issues are
  deadlocks
  identification wrapping problem
  unit of loss is smaller than unit of re-transmission: can worsen congestion
  Q. explain why
Solution = avoid fragmentation
  Path MTU = minimum MTU for all links of one path
  Discovery of path MTU
    heuristics: local -> 1500; other: 576 (subnetsarelocal variable)
    Path MTU discovery avoids fragmentation
Path MTU Discovery
Method for Path MTU (PMTU) discovery
1. host sets the Don’t Fragment bit on all datagrams and initially estimates the PMTU as its local MTU
2. routers send an ICMP message: “destination unreachable / fragmentation needed”
3. host reduces the PMTU estimate to the next smaller value
4. after a timeout, host increases the PMTU estimate
route changes may cause 2
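Steps 1 to 3 can be simulated with a few lines of Python. This is a rough sketch under stated assumptions: the candidate values in `PLATEAUS` are simply the MTUs of the MTU table, the function name `discover_pmtu` is made up, the path is given as a list of link MTUs, and step 4 (probing upwards after a timeout) is not modelled.

```python
# Candidate PMTU values, largest first (assumed: the values of the MTU table).
PLATEAUS = [65535, 17914, 9180, 4464, 4352, 1600, 1500, 1492, 576, 296]


def discover_pmtu(local_mtu: int, link_mtus: list):
    """Return (PMTU estimate, number of ICMP 'fragmentation needed' messages received)."""
    estimate = local_mtu  # step 1: start from the local MTU, Don't Fragment set
    icmp_messages = 0
    while True:
        # Step 2: the first router whose outgoing link is too small drops the
        # packet and reports "destination unreachable / fragmentation needed".
        blocking = next((m for m in link_mtus if m < estimate), None)
        if blocking is None:
            return estimate, icmp_messages
        icmp_messages += 1
        # Step 3: reduce the estimate to the next smaller candidate value.
        smaller = [p for p in PLATEAUS if p < estimate]
        if not smaller:
            raise ValueError("no smaller PMTU candidate left")
        estimate = smaller[0]


print(discover_pmtu(1500, [1500, 620, 1500]))
```

With a 620-byte link on the path, the estimate goes from 1500 to 1492 and then to 576, so the result is conservative: it fits the path but is smaller than the true path MTU of 620.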
TCP, UDP and Fragmentation
The UDP service interface accepts a datagram up to 64 KB
  the UDP datagram is passed to the IP service interface as one SDU
  it is fragmented at the source if the resulting IP datagram is too large
The TCP service interface is stream oriented
  packetization is done by TCP
  several calls to the TCP service interface may be grouped into one TCP segment (many small pieces)
  or: one call may cause several segments to be created (one large piece)
  TCP always creates a segment that fits in one IP packet: no fragmentation at source
  fragmentation may occur in a router, if IPv4 is used, and if PMTU discovery is not implemented
Q. If all sources use PMTU discovery, in which cases does a router have to fragment a packet?
7. Terminology
IP router
Architecture:
  a system that forwards packets based on IP addresses
  performs packet forwarding + control method
Implementation:
  any UNIX machine can be configured as an IP router
  normally, a dedicated box with specialized hardware, called a router
What is a “Multiprotocol Router”?
a system that forwards packets based on layer 3 addresses for various protocol architectures (ex: IP, Appletalk)
  CISCO, IBM, etc…
most multiprotocol routers perform both bridging and routing
  architecture: bridge + router
  implementation: one CISCO
IP router boxes also perform other functions: port filtering, DHCP relay, …
Q. In a pure IP world (if all machines run TCP/IP) do we need multiprotocol routers?
A. Yes if both IPv4 and IPv6 are used.
* Example of Combined Functions in One Product
Put a bridge + Ethernet concentrator + router in the same box
  The resulting product is called a “switching router”
  Avoids ARP broadcasts
The words switch and router are used in many different ways. For us, a switch is an intermediate system for connection-oriented network layers such as ATM or Frame Relay. In the commercial literature, it usually means a fast packet forwarder, usually implemented in hardware. In reality, routers can be implemented in exactly the same way and with the same performance as “switches”. The main difference is for multiprotocol routers, which need to understand not just one network layer, but many. In such cases, only software implementations are available. In contrast, IP-only routers are emerging with a performance similar to that of switches.
The “switching router” concept is an example of a product that is new as a product but, from an architecture viewpoint, is nothing new. Since the router is in the same box as the Ethernet concentrator, it can know (by software) the MAC addresses of directly attached systems. Thus, the ARP broadcasts are avoided.
[Figure: hosts H1 and H2 attached to switching routers; a router with interfaces M9 (p.1) and M8 (q.1) connects hosts M1 (p.h1), M2 (p.h2), M3 (q.h1) and M4 (q.h3); observation points 1 and 2.]
Why are Bridges called “Multiprotocol”?
Some network protocols (ex: Appletalk, IPX, IPv6) are not compatible with IPv4
  routers must be multiprotocol
  but bridges work independently of which network layer protocol is used -- they are called “multiprotocol” in the commercial literature!
B (an old Macintosh file server) runs only Appletalk. Only applications using the Appletalk protocols can be used (MacOS file sharing, printing). TCP/IP applications such as the web cannot be used on B.
C (a modern PC) runs only TCP/IP. All TCP/IP applications can be used, but not native MacOS file sharing.
A (a Windows server) runs both in parallel. It can talk to both C and B.
A bridge can be used to interconnect A, B and C; there is nothing special to do. If a router is used instead, it must run Appletalk and IP in parallel.
The protocol stacks shown are all implemented in software. They use the standard Ethernet adapters.
What is a “Non Routable Protocol”?
NetBIOS was originally developed to work only in one bridged LAN
  uses LLC-2, similar to TCP but located in layer 2 (also called NETBEUI)
  in that form, it is not “routable”: it can only be bridged
NetBIOS is an interface for distributed applications that is commonly used with IBM and Microsoft systems. Only MAC addresses are used. In addition, NetBIOS offers a naming service. This version of NetBIOS works only in a bridged environment.
NetBIOS today is offered as a TCP/IP application
  uses the NBT reserved port
  Windows machines at EPFL use TCP/IP only
*Virtual LANs and Subnets
IP requires machines to be organized by subnets -- this is a problem when machines (and people) move
One solution is provided by layer 2: virtual LANs
  What it does: define LANs independent of location
  How: associate (by configuration rules) hosts with virtual LAN labels
The picture shows two virtual LANs: (ACLNV) and (BDMPU). The concentrators perform bridging between the different collision domains of the same virtual LAN. Between two virtual LANs, a router must be used. The figure shows one router that belongs to both VLANs.
Between X1 and X2, the two virtual LANs use the same physical link. This is made possible by adding a label to the Ethernet packet header that identifies the virtual LAN.
Q. How many spanning trees are there in this network?
Q. Can you think of another solution to the same problem?
Facts to Remember
IP is a connectionless network layer
IPv4 addresses are 32-bit numbers
One IP address per interface
Routers scale well because they can aggregate routes
Hosts on the Internet exchange packets with IP addresses
Solutions
1. Why a network layer?
We would like to interconnect all devices in the world. We have seen that we can solve the interconnection problem with bridges and the MAC layer. However, this is not sufficient, as it does not scale to large networks.
Q. Why?
A. 1. Bridges use a tree. This is not efficient in a large network, as the tree concentrates all traffic.
2. Bridges use forwarding tables that are not structured. A bridge must look up the entire table for every packet. The table size and lookup time would be prohibitive.
Solution: connectionless network layer (e.g. Internet Protocol, IP):
  every host receives a network layer address (IP address)
  intermediate systems forward packets based on destination address
Representation of IP Addresses
dotted decimal: group bits in bytes, write the decimal representation of each byte
  example 1: 128.191.151.1
  example 2:
hexadecimal: hexadecimal representation -- fixed size string
  example 1: x80 BF 97 01
  example 2: x81 C
binary: string of 32 bits (2 symbols: 0, 1)
  example 1: b10000000 10111111 10010111 00000001
  example 2: b
A Subnet Prefix is written using one of two Notations: masks / prefixes
example 2: 129.132.119.77 mask 255.255.255.192
Q1: what is the prefix?
A: 129.132.119.64
Q2: how many host ids can be allocated?
A: 64 (minus the reserved addresses: 62)
  address: 129.132.119.77
  mask:    255.255.255.192 (26 prefix bits, 6 host bits: 64 addresses)
  prefix:  129.132.119.64
Prefix Notation
example 2:
Q1: write mask in prefix notation
A: /26 or /26
Q2: are these prefixes different? /28, /28, /28, /28
A: they differ in bits that are not the last 4 ones, thus they are all different prefixes
how many IP addresses can be allocated to each of the distinct subnets?
A: 14 (16 minus 2 reserved)
[Figure: an address beginning with 201.10 split into 28 prefix bits and 4 host bits: 16 addresses.]
Address delegation
Europe: 62/8, 80/8, /8, …
  ISP-1: 62.125/16
    customer 1: banana foods /25
    customer 2: sovkom /24
  ISP-2: 195.44/14
    customer 1: /21
    customer 2: /21
Q. Assume sovkom moves from ISP-1 to ISP-2; comment on the impact.
A. If sovkom keeps the same IP addresses, the set of addresses of ISP-2 is no longer contiguous. It cannot be represented by one single entry in routing tables. Routing tables in the Internet need to represent ISP-2 by two entries: /14 and /24
Test Your Understanding (1)
[Figure: hosts X, Y, Z and A and two routers interconnected through two bridges.]
A: No, host A is on subnetwork
Test your Understanding (2)
Q1: An Ethernet segment became too crowded; we split it into 2 segments, interconnected by a router. Do we need to change some IP host addresses?
A: yes in general. Two different subnets cannot have the same prefix
Q2: same with a bridge
A: no, bridging is transparent.
Q3: compare the two
A: bridging is plug and play but the network performance is more difficult to guarantee (broadcasts + spanning tree)
67 Example back Q: Fill in the table if an IP packet has to be sent from lrcsuns Q: Fill in the table if an IP packet has to be sent from ed2-in final destination next hop case number loopback 3 2 final destination next hop case number loopback 3 2 67
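Test Your Understanding (3)
Q1: What are the MAC and IP addresses at points 1 and 2 for packets sent by M1 to M3? At 2 for packets sent by M4 to M3? (Mx = MAC address)
A:
  at 1 (M1 to M3): IP srce = p.h1, IP dest = q.h1; MACsrce = M1, MACdest = M9
  at 2 (M1 to M3): IP srce = p.h1, IP dest = q.h1; MACsrce = M8, MACdest = M3
  at 2 (M4 to M3): IP srce = q.h3, IP dest = q.h1; MACsrce = M4, MACdest = M3
[Figure: a router connects subnet p (router interface M9, p.1) and subnet q (router interface M8, q.1); an Ethernet concentrator is also shown. Hosts M1 (p.h1) and M2 (p.h2) are on subnet p; hosts M3 (q.h1) and M4 (q.h3) are on subnet q. Observation point 1 is on subnet p, point 2 on subnet q.]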
Test Your Understanding (3)
Q2: What must the router do when it receives a packet from M2 to M3 for the first time?
A: send an ARP request broadcast on LAN q
[Figure: a router connects subnet p (router interface M9, p.1) and subnet q (router interface M8, q.1); an Ethernet concentrator is also shown. Hosts M1 (p.h1) and M2 (p.h2) are on subnet p; hosts M3 (q.h1) and M4 (q.h3) are on subnet q.]
Proxy ARP
Q1: how must the routing table of sic500cs be configured?
A: one host route per host such as
Q2: explain what happens when ed2-in has a packet to send to
  packet sent to ed0-ext
  ARP sent by ed0-ext for target address =
  sic500cs responds with MAC addr = sic500cs’s MAC addr
  packet sent by ed0-ext to sic500cs
  sic500cs reads the host route and forwards to (case 1 of the IP forwarding algorithm)
[Figure: ed2-in (15.221) and ed0-ext (15.13) on the AUT backbone, together with sic500cs, which serves stisun1 (15.7) over modem + PPP.]
Examples of IP Checksums
all numbers are written in hexa
data: W1 = 0103, W0 = 0012
  z = 0103 + 0012 = 0115
  checksum y = FFFF - z = FEEA
data: F203 F4F5 F6F7
  z = F203 + F4F5 + F6F7 = 0002 DDEF
  z = DDEF + 0002 = DDF1
  checksum y = FFFF - DDF1 = 220E
MTU
Q. value of short MTU?
A. reduces queue lengths and delays; on lossy links (radio), reduces the probability of packet error
Q. of long MTU?
A. reduces per-packet processing
Issues with Fragmentation
Fragmentation requires re-assembly; issues are
  deadlocks
  identification wrapping problem
  unit of loss is smaller than unit of re-transmission: can worsen congestion
  Q. explain why
  A. when a network is congested, packets get lost. Assume every datagram is fragmented in 10, and a single loss causes retransmission of the whole datagram. The loss of n packets (belonging to different datagrams) causes 10n retransmissions, which increases the offered traffic and makes congestion worse.
Solution = avoid fragmentation
  Path MTU = minimum MTU for all links of one path
  Discovery of path MTU
    heuristics: local -> 1500; other: 576 (subnetsarelocal variable)
    Path MTU discovery avoids fragmentation
Fragmentation (sol)
The UDP service interface accepts a datagram up to 64 KB
  the UDP datagram is passed to the IP service interface as one SDU
  it is fragmented at the source if the resulting IP datagram is too large
The TCP service interface is stream oriented
  packetization is done by TCP
  several calls to the TCP service interface may be grouped into one TCP segment (many small pieces)
  or: one call may cause several segments to be created (one large piece)
  TCP always creates a segment that fits in one IP packet: no fragmentation at source
  fragmentation may occur in a router, if IPv4 is used, and if PMTU discovery is not implemented
Q. If all sources use PMTU discovery, in which cases does a router have to fragment a packet?
A. 1. UDP packets sent by sources that have a larger local MTU than the path MTU
2. TCP packets where PMTU estimation failed (due to path changes)
Virtual LANs and Subnets
IP requires machines to be organized by subnets -- this is a problem when machines (and people) move
One solution is provided by layer 2: virtual LANs
  What it does: define LANs independent of location
  How: associate (by configuration rules) hosts with virtual LAN labels
The picture shows two virtual LANs: (ACLNV) and (BDMPU). The concentrators perform bridging between the different collision domains of the same virtual LAN. Between two virtual LANs, a router must be used. The figure shows one router that belongs to both VLANs.
Between X1 and X2, the two virtual LANs use the same physical link. This is made possible by adding a label to the Ethernet packet header that identifies the virtual LAN.
Q. How many spanning trees are there in this network?
A. 2 (one per virtual LAN)
Virtual LANs and Subnets
Q. Can you think of another solution to the same problem?
A. DHCP