William Stallings Computer Organization and Architecture 8th Edition


1 William Stallings, Computer Organization and Architecture, 8th Edition. Chapter 2: Computer Evolution and Performance

2 ENIAC - background: Electronic Numerical Integrator And Computer. Eckert and Mauchly, University of Pennsylvania. Built to compute trajectory tables for weapons. Started 1943, finished 1946 (too late for the war effort). Used until 1955.

3 ENIAC - details: Decimal (not binary). 20 accumulators of 10 digits. Programmed manually by switches. 18,000 vacuum tubes; 30 tons; 15,000 square feet; 140 kW power consumption. 5,000 additions per second.

4 Stored Program concept: von Neumann/Turing. Main memory storing programs and data. ALU operating on binary data. Control unit interpreting instructions from memory and executing them. Input and output equipment operated by the control unit. Princeton Institute for Advanced Studies (IAS) machine, completed 1952.

5 Structure of von Neumann machine

6 IAS - details: 1,000 x 40-bit words; binary numbers; 2 x 20-bit instructions per word. Set of registers (storage in CPU): Memory Buffer Register, Memory Address Register, Instruction Register, Instruction Buffer Register, Program Counter, Accumulator, Multiplier Quotient.

7 Structure of IAS – detail

8 Commercial Computers: 1947 - Eckert-Mauchly Computer Corporation. UNIVAC I (Universal Automatic Computer): US Bureau of Census 1950 calculations. Became part of Sperry-Rand Corporation. Late 1950s - UNIVAC II: faster, more memory.

9 IBM: Punched-card processing equipment. 1953 - the 701, IBM's first stored program computer, for scientific calculations. The 702: business applications. Led to the 700/7000 series.

10 Transistors: Replaced vacuum tubes. Smaller, cheaper, less heat dissipation. Solid state device, made from silicon (sand). Invented 1947 at Bell Labs by William Shockley et al.

11 Transistor Based Computers: Second generation machines. NCR & RCA produced small transistor machines. IBM 7000 series. DEC produced the PDP-1.

12 Microelectronics Literally - “small electronics” A computer is made up of gates, memory cells and interconnections These can be manufactured on a semiconductor e.g. silicon wafer

13 Generations of Computer: Vacuum tube. Transistor. Small scale integration: up to 100 devices on a chip. Medium scale integration - to 1971: 100-3,000 devices on a chip. Large scale integration: 3,000-100,000 devices on a chip. Very large scale integration: 100,000-100,000,000 devices on a chip. Ultra large scale integration: over 100,000,000 devices on a chip.

14 Moore’s Law: Gordon Moore, co-founder of Intel, observed the increasing density of components on a chip: the number of transistors on a chip doubles every year. Since the 1970s development has slowed a little: the number of transistors now doubles every 18 months. The cost of a chip has remained almost unchanged. Higher packing density means shorter electrical paths, giving higher performance. Smaller size gives increased flexibility. Reduced power and cooling requirements. Fewer interconnections increase reliability.
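The 18-month doubling rule above can be sketched in a few lines (illustrative only; the 4004 starting figure is a commonly quoted value, not from the slide):

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Project transistor count under Moore's law: the count doubles
    once every doubling_period_years (18 months)."""
    doublings = years / doubling_period_years
    return initial_count * 2 ** doublings

# Intel 4004 (1971) had roughly 2,300 transistors; 30 years gives 20 doublings.
print(round(projected_transistors(2300, 30)))  # about 2.4 billion
```

The exponential form is why the trend dominates: 20 doublings multiply the count by about a million.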

15 Growth in CPU Transistor Count

16 IBM 360 series, 1964: Replaced (and was not compatible with) the 7000 series. First planned "family" of computers: similar or identical instruction sets; similar or identical O/S; increasing speed; increasing number of I/O ports (i.e. more terminals); increasing memory size; increasing cost. Multiplexed switch structure.

17 DEC PDP-8, 1964: First minicomputer (after the miniskirt!). Did not need an air conditioned room; small enough to sit on a lab bench. $16,000, versus $100k+ for an IBM 360. Embedded applications & OEM. Bus structure.

18 DEC - PDP-8 Bus Structure

19 Semiconductor Memory: 1970, Fairchild. The size of a single core (i.e. 1 bit of magnetic core storage), but holds 256 bits. Non-destructive read. Much faster than core. Capacity approximately doubles each year.

20 Intel: 1971 - the 4004, first microprocessor: all CPU components on a single chip; 4 bit. Followed in 1972 by the 8008: 8 bit. Both designed for specific applications. 1974 - the 8080: Intel's first general purpose microprocessor.

21 Speeding it up: Pipelining. On-board L1 & L2 cache. Branch prediction. Data flow analysis. Speculative execution.

22 Performance Balance Processor speed increased Memory capacity increased Memory speed lags behind processor speed

23 Logic and Memory Performance Gap

24 Solutions: Increase the number of bits retrieved at one time (make DRAM "wider" rather than "deeper"). Change the DRAM interface (cache). Reduce the frequency of memory access (more complex cache, and cache on chip). Increase interconnection bandwidth (high speed buses, hierarchy of buses).

25 I/O Devices: Peripherals with intensive I/O demands; large data throughput demands. Processors can handle this; the problem is moving the data. Solutions: caching; buffering; higher-speed interconnection buses; more elaborate bus structures; multiple-processor configurations.

26 Typical I/O Device Data Rates

27 Key is Balance Processor components Main memory I/O devices Interconnection structures

28 Improvements in Chip Organization and Architecture: Increase hardware speed of processor - fundamentally due to shrinking logic gate size: more gates, packed more tightly, increasing clock rate; propagation time for signals reduced. Increase size and speed of caches - dedicating part of the processor chip; cache access times drop significantly. Change processor organization and architecture - increase effective speed of execution; parallelism.

29 Problems with Clock Speed and Logic Density: Power - power density increases with the density of logic and with clock speed; dissipating heat becomes difficult. RC delay - the speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them; delay increases as the RC product increases; wire interconnects get thinner, increasing resistance; wires get closer together, increasing capacitance. Memory latency - memory speeds lag processor speeds. Solution: more emphasis on organizational and architectural approaches.

30 Intel Microprocessor Performance

31 Increased Cache Capacity: Typically two or three levels of cache between processor and main memory. Chip density increased: more cache memory on chip; faster cache access. The Pentium chip devoted about 10% of chip area to cache; the Pentium 4 devotes about 50%.

32 More Complex Execution Logic: Enable parallel execution of instructions. A pipeline works like an assembly line: different stages of execution of different instructions proceed at the same time along the pipeline. Superscalar allows multiple pipelines within a single processor: instructions that do not depend on one another can be executed in parallel.

33 Diminishing Returns: The internal organization of processors is already complex; a great deal of parallelism is already extracted; further significant increases are likely to be relatively modest. Benefits from cache are reaching a limit. Increasing the clock rate runs into the power dissipation problem. Some fundamental physical limits are being reached.

34 New Approach - Multiple Cores: Multiple processors on a single chip, with a large shared cache. Within a processor, the increase in performance is proportional to the square root of the increase in complexity. If software can use multiple processors, doubling the number of processors almost doubles performance. So, use two simpler processors on the chip rather than one more complex processor. With two processors, larger caches are justified; the power consumption of memory logic is less than that of processing logic.

35 x86 Evolution (1): 8080 - first general purpose microprocessor; 8 bit data path; used in the first personal computer, the Altair. 8086 - 5 MHz, 29,000 transistors; much more powerful; 16 bit; instruction cache prefetches a few instructions; the 8088 (8 bit external bus) was used in the first IBM PC. 80286 - 16 Mbyte of memory addressable, up from 1 Mbyte. 80386 - 32 bit; support for multitasking. 80486 - sophisticated, powerful cache and instruction pipelining; built-in maths co-processor.

36 x86 Evolution (2): Pentium - superscalar; multiple instructions executed in parallel. Pentium Pro - increased superscalar organization; aggressive register renaming; branch prediction; data flow analysis; speculative execution. Pentium II - MMX technology; graphics, video & audio processing. Pentium III - additional floating point instructions for 3D graphics.

37 x86 Evolution (3): Pentium 4 - note Arabic rather than Roman numerals; further floating point and multimedia enhancements. Core - first x86 with dual core. Core 2 - 64 bit architecture; Core 2 Quad - 3 GHz, 820 million transistors, four processors on chip. The x86 architecture is dominant outside embedded systems; organization and technology have changed dramatically, while the instruction set architecture evolved with backwards compatibility: ~1 instruction per month added, 500 instructions available. See Intel web pages for detailed information on processors.

38 Embedded Systems and ARM: ARM evolved from RISC design; used mainly in embedded systems. An embedded system is used within a product; not a general purpose computer; it performs a dedicated function, e.g. anti-lock brakes in a car.

39 Embedded Systems Requirements: Different sizes; different constraints, optimization, reuse. Different requirements: safety, reliability, real-time, flexibility, legislation. Lifespan. Environmental conditions. Static v dynamic loads. Slow to fast speeds. Computation v I/O intensive. Discrete event v continuous dynamics.

40 Possible Organization of an Embedded System

41 ARM Evolution: Designed by ARM Inc., Cambridge, England; licensed to manufacturers. High speed, small die, low power consumption. PDAs, hand-held games, phones, e.g. iPod, iPhone. Acorn produced the ARM1 & ARM2 in 1985 and the ARM3 in 1989. Acorn, VLSI and Apple Computer founded ARM Ltd.

42 ARM Systems Categories: Embedded real time. Application platform - Linux, Palm OS, Symbian OS, Windows Mobile. Secure applications.

43 Performance Assessment - Clock Speed: Key parameters: performance, cost, size, security, reliability, power consumption. System clock speed, in Hz or multiples thereof: clock rate, clock cycle, clock tick, cycle time. Signals in the CPU take time to settle down to 1 or 0; signals may change at different speeds; operations need to be synchronised. Instruction execution proceeds in discrete steps: fetch, decode, load and store, arithmetic or logical; these usually require multiple clock cycles per instruction. Pipelining gives simultaneous execution of instructions. So, clock speed is not the whole story.
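The point that clock speed is not the whole story follows from the standard execution-time relation T = Ic x CPI / f (instruction count, average cycles per instruction, clock rate); a minimal sketch with illustrative numbers:

```python
def execution_time(instruction_count, cpi, clock_hz):
    """T = Ic * CPI / f: program time from instruction count,
    average cycles per instruction, and clock rate."""
    return instruction_count * cpi / clock_hz

# 2 million instructions at an average of 2 cycles each on a 400 MHz clock:
print(execution_time(2_000_000, 2, 400_000_000))  # 0.01
```

Halving CPI helps exactly as much as doubling the clock rate, which is why pipelining matters.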

44 System Clock

45 Instruction Execution Rate: Millions of instructions per second (MIPS). Millions of floating point operations per second (MFLOPS). Heavily dependent on instruction set, compiler design, processor implementation, cache & memory hierarchy.
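The MIPS rate can be written two equivalent ways, MIPS = Ic / (T x 10^6) or MIPS = f / (CPI x 10^6); a small sketch (illustrative numbers):

```python
def mips_rate(instruction_count, seconds):
    """MIPS = Ic / (T * 10**6)."""
    return instruction_count / (seconds * 1e6)

def mips_from_clock(clock_hz, cpi):
    """Equivalent form: MIPS = f / (CPI * 10**6)."""
    return clock_hz / (cpi * 1e6)

print(round(mips_rate(2_000_000, 0.01)))       # 200
print(round(mips_from_clock(400_000_000, 2)))  # 200
```

The agreement of the two forms illustrates why MIPS figures are only comparable for the same instruction mix: Ic and CPI both depend on the instruction set and compiler.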

46 Benchmarks - programs designed to test performance: Written in a high level language (portable). Represent a style of task: systems, numerical, commercial. Easily measured; widely distributed. E.g. System Performance Evaluation Corporation (SPEC) CPU2006 for computation-bound tasks: 17 floating point programs in C, C++, Fortran; 12 integer programs in C, C++; 3 million lines of code. Speed and rate metrics: single task and throughput.

47 SPEC Speed Metric - single task: A base runtime is defined for each benchmark using a reference machine. Results are reported as the ratio of reference time to system run time: r_i = Tref_i / Tsut_i, where Tref_i is the execution time of benchmark i on the reference machine and Tsut_i is the execution time of benchmark i on the system under test. Overall performance is calculated by averaging the ratios for all 12 integer benchmarks, using the geometric mean, which is appropriate for normalized numbers such as ratios.
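The averaging step can be sketched as follows (the benchmark times are hypothetical, not SPEC's published reference figures):

```python
import math

def spec_speed_metric(ref_times, sut_times):
    """Geometric mean of the ratios r_i = Tref_i / Tsut_i
    over all benchmarks in the suite."""
    ratios = [tref / tsut for tref, tsut in zip(ref_times, sut_times)]
    return math.prod(ratios) ** (1 / len(ratios))

# hypothetical runtimes in seconds: reference machine vs system under test
print(spec_speed_metric([9000, 6000, 12000], [600, 480, 800]))
```

The geometric mean is used because it is the only mean for which the ranking of machines does not depend on which machine is chosen as the reference.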

48 SPEC Rate Metric: Measures the throughput or rate of a machine carrying out a number of tasks. Multiple copies of each benchmark run simultaneously, typically as many as there are processors. The ratio is calculated as rate_i = (N x Tref_i) / Tsut_i, where Tref_i is the reference execution time for benchmark i, N is the number of copies run simultaneously, and Tsut_i is the elapsed time from the start of execution of the program on all N processors until completion of all copies. Again, a geometric mean is calculated.

49 Amdahl's Law: Gene Amdahl [AMDA67]. Potential speed up of a program using multiple processors. Concluded that: code needs to be parallelizable; speed up is bounded, giving diminishing returns for more processors. Task dependent: servers gain by maintaining multiple connections on multiple processors; databases can be split into parallel tasks.

50 Amdahl’s Law Formula: For a program running on a single processor: fraction f of the code is infinitely parallelizable with no scheduling overhead; fraction (1 - f) is inherently serial. T is the total execution time on a single processor; N is the number of processors that fully exploit the parallel portions of the code. Speedup = T / (T(1 - f) + Tf/N) = 1 / ((1 - f) + f/N). Conclusions: when f is small, parallel processors have little effect; as N -> ∞, speedup is bounded by 1/(1 - f); diminishing returns for using more processors.
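The formula and its limiting behaviour can be checked numerically with a short sketch:

```python
def amdahl_speedup(f, n):
    """Amdahl's law: speedup = 1 / ((1 - f) + f / N) for a program
    whose fraction f is perfectly parallelizable across N processors."""
    return 1 / ((1 - f) + f / n)

print(amdahl_speedup(0.9, 8))      # 90% parallel on 8 processors: well short of 8x
print(amdahl_speedup(0.9, 10**9))  # as N grows, bounded by 1/(1 - f) = 10
```

Even with 90% of the code parallelizable, no number of processors can exceed a 10x speedup: the serial 10% dominates.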

51 Internet Resources: Search for the Intel Museum. Charles Babbage Institute. PowerPC. Intel Developer Home.

52 References: [AMDA67] Amdahl, G. "Validity of the Single-Processor Approach to Achieving Large-Scale Computing Capabilities", Proceedings of the AFIPS Conference, 1967.

53 William Stallings, Computer Organization and Architecture, 8th Edition. Chapter 3: Top Level View of Computer Function and Interconnection

54 Program Concept Hardwired systems are inflexible General purpose hardware can do different tasks, given correct control signals Instead of re-wiring, supply a new set of control signals

55 What is a program? A sequence of steps For each step, an arithmetic or logical operation is done For each operation, a different set of control signals is needed

56 Function of Control Unit: For each operation a unique code is provided, e.g. ADD, MOVE. A hardware segment accepts the code and issues the control signals. We have a computer!

57 Components: The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit. Data and instructions need to get into the system and results out: input/output. Temporary storage of code and results is needed: main memory.

58 Computer Components: Top Level View

59 Instruction Cycle Two steps: Fetch Execute

60 Fetch Cycle: The Program Counter (PC) holds the address of the next instruction to fetch. The processor fetches the instruction from the memory location pointed to by the PC. Increment the PC (unless told otherwise). The instruction is loaded into the Instruction Register (IR). The processor interprets the instruction and performs the required actions.

61 Execute Cycle: Processor-memory - data transfer between CPU and main memory. Processor-I/O - data transfer between CPU and I/O module. Data processing - some arithmetic or logical operation on data. Control - alteration of the sequence of operations, e.g. jump. Or a combination of the above.
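The fetch-execute loop can be sketched with a toy accumulator machine; the opcode encoding here is hypothetical (loosely modeled on the book's worked example, not its actual format):

```python
def run(memory):
    """Toy fetch-execute cycle over a word-addressed memory.
    Hypothetical encoding: each instruction word is opcode*100 + address;
    opcodes: 1 = LOAD, 2 = ADD, 3 = STORE, anything else = HALT."""
    pc, acc = 0, 0
    while True:
        instruction = memory[pc]                 # fetch the word the PC points to
        pc += 1                                  # increment the PC
        opcode, addr = divmod(instruction, 100)  # decode into opcode and address
        if opcode == 1:
            acc = memory[addr]                   # LOAD memory into accumulator
        elif opcode == 2:
            acc += memory[addr]                  # ADD memory to accumulator
        elif opcode == 3:
            memory[addr] = acc                   # STORE accumulator to memory
        else:
            return memory                        # HALT

# program: LOAD 10; ADD 11; STORE 12; HALT -- data 3 and 2 at addresses 10, 11
mem = [110, 211, 312, 0, 0, 0, 0, 0, 0, 0, 3, 2, 0]
print(run(mem)[12])  # 5
```

Each loop iteration is one instruction cycle: a fetch step followed by an execute step of one of the four categories above.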

62 Example of Program Execution

63 Instruction Cycle State Diagram

64 Interrupts: A mechanism by which other modules (e.g. I/O) may interrupt the normal sequence of processing. Program - e.g. overflow, division by zero. Timer - generated by an internal processor timer; used in pre-emptive multi-tasking. I/O - from an I/O controller. Hardware failure - e.g. memory parity error.

65 Program Flow Control

66 Interrupt Cycle: Added to the instruction cycle. The processor checks for an interrupt, indicated by an interrupt signal. If no interrupt, fetch the next instruction. If an interrupt is pending: suspend execution of the current program; save context; set the PC to the start address of the interrupt handler routine; process the interrupt; restore context and continue the interrupted program.

67 Transfer of Control via Interrupts

68 Instruction Cycle with Interrupts

69 Program Timing Short I/O Wait

70 Program Timing Long I/O Wait

71 Instruction Cycle (with Interrupts) - State Diagram

72 Multiple Interrupts: Disable interrupts - the processor ignores further interrupts whilst processing one interrupt; they remain pending and are checked after the first interrupt has been processed; interrupts are handled in sequence as they occur. Define priorities - low priority interrupts can be interrupted by higher priority interrupts; when the higher priority interrupt has been processed, the processor returns to the previous interrupt.

73 Multiple Interrupts - Sequential

74 Multiple Interrupts – Nested

75 Time Sequence of Multiple Interrupts

76 Connecting: All the units must be connected. Different types of connection for different types of unit: memory; input/output; CPU.

77 Computer Modules

78 Memory Connection: Receives and sends data. Receives addresses (of locations). Receives control signals: read; write; timing.

79 Input/Output Connection (1): Similar to memory from the computer's viewpoint. Output: receive data from the computer; send data to the peripheral. Input: receive data from the peripheral; send data to the computer.

80 Input/Output Connection (2): Receive control signals from the computer. Send control signals to peripherals, e.g. spin disk. Receive addresses from the computer, e.g. a port number to identify the peripheral. Send interrupt signals (control).

81 CPU Connection Reads instruction and data Writes out data (after processing) Sends control signals to other units Receives (& acts on) interrupts

82 Buses There are a number of possible interconnection systems Single and multiple BUS structures are most common e.g. Control/Address/Data bus (PC) e.g. Unibus (DEC-PDP)

83 What is a Bus? A communication pathway connecting two or more devices. Usually broadcast. Often grouped: a number of channels in one bus, e.g. a 32 bit data bus is 32 separate single-bit channels. Power lines may not be shown.

84 Data Bus: Carries data; remember that there is no difference between "data" and "instruction" at this level. Width is a key determinant of performance: 8, 16, 32, 64 bit.

85 Address Bus: Identifies the source or destination of data, e.g. the CPU needs to read an instruction (data) from a given location in memory. Bus width determines the maximum memory capacity of the system, e.g. the 8080 has a 16 bit address bus, giving a 64k address space.
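The relationship between address bus width and memory capacity is just 2^width; a one-line sketch:

```python
def address_space(bus_width_bits):
    """Maximum number of addressable units for a given address bus width."""
    return 2 ** bus_width_bits

print(address_space(16))  # 65536, i.e. 64k (as on the 8080)
print(address_space(24))  # 16777216, i.e. 16M
```

Each extra address line doubles the addressable space, which is why widening the address bus was a recurring step in the x86 evolution.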

86 Control Bus: Control and timing information: memory read/write signal; interrupt request; clock signals.

87 Bus Interconnection Scheme

88 What do buses look like? Big and yellow? Parallel lines on circuit boards. Ribbon cables. Strip connectors on motherboards, e.g. PCI. Sets of wires.

89 Physical Realization of Bus Architecture

90 Single Bus Problems: Lots of devices on one bus leads to: propagation delays; long data paths, meaning that co-ordination of bus use can adversely affect performance; a bottleneck if aggregate data transfer approaches bus capacity. Most systems use multiple buses to overcome these problems.

91 Traditional (ISA) (with cache)

92 High Performance Bus

93 Bus Types: Dedicated - separate data & address lines. Multiplexed - shared lines, with an address-valid or data-valid control line; advantage: fewer lines; disadvantages: more complex control, reduced ultimate performance.

94 Bus Arbitration More than one module controlling the bus e.g. CPU and DMA controller Only one module may control bus at one time Arbitration may be centralised or distributed

95 Centralised or Distributed Arbitration: Centralised - a single hardware device (bus controller, arbiter) controls bus access; may be part of the CPU or separate. Distributed - each module may claim the bus; control logic on all modules.

96 Timing: Co-ordination of events on the bus. Synchronous: events determined by clock signals; the control bus includes a clock line; a single 1-0 is a bus cycle; all devices can read the clock line; usually synced on the leading edge; usually a single cycle per event.

97 Synchronous Timing Diagram

98 Asynchronous Timing – Read Diagram

99 Asynchronous Timing – Write Diagram

100 PCI Bus: Peripheral Component Interconnect. Intel released it to the public domain. 32 or 64 bit. 50 lines.

101 PCI Bus Lines (required): System lines, including clock and reset. Address & data: 32 time-multiplexed lines for address/data; interrupt & validate lines. Interface control. Arbitration: not shared; direct connection to the PCI bus arbiter. Error lines.

102 PCI Bus Lines (optional): Interrupt lines - not shared. Cache support. 64-bit bus extension: an additional 32 lines; time multiplexed; 2 lines to enable devices to agree to use 64-bit transfer. JTAG/Boundary Scan - for testing procedures.

103 PCI Commands: A transaction takes place between an initiator (master) and a target. The master claims the bus and determines the type of transaction, e.g. I/O read/write. Address phase. One or more data phases.

104 PCI Read Timing Diagram

105 PCI Bus Arbiter

106 PCI Bus Arbitration

107 Foreground Reading Stallings, chapter 3 (all of it) In fact, read the whole site!

108 William Stallings, Computer Organization and Architecture, 8th Edition. Chapter 4: Cache Memory

109 Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics Organization

110 Location CPU Internal External

111 Capacity: Word size - the natural unit of organization. Number of words, or bytes.

112 Unit of Transfer: Internal - usually governed by data bus width. External - usually a block which is much larger than a word. Addressable unit - the smallest location which can be uniquely addressed: a word internally; a cluster or sector on disks.

113 Access Methods (1): Sequential - start at the beginning and read through in order; access time depends on the location of the data and the previous location; e.g. tape. Direct - individual blocks have unique addresses; access is by jumping to the vicinity plus a sequential search; access time depends on location and previous location; e.g. disk.

114 Access Methods (2): Random - individual addresses identify locations exactly; access time is independent of location or previous access; e.g. RAM. Associative - data is located by comparison with the contents of a portion of the store; e.g. cache.

115 Memory Hierarchy: Registers - in the CPU. Internal or main memory - may include one or more levels of cache; "RAM". External memory - backing store.

116 Memory Hierarchy - Diagram

117 Performance: Access time - time between presenting the address and getting the valid data. Memory cycle time - time may be required for the memory to "recover" before the next access; cycle time is access + recovery. Transfer rate - the rate at which data can be moved.

118 Physical Types: Semiconductor - RAM. Magnetic - disk & tape. Optical - CD & DVD. Others - bubble, hologram.

119 Physical CharacteristicsDecay Volatility Erasable Power consumption

120 Organization Physical arrangement of bits into words Not always obvious e.g. interleaved

121 The Bottom Line: How much? Capacity. How fast? Time is money. How expensive?

122 Hierarchy List Registers L1 Cache L2 Cache L3 Cache Main memory Disk cache Disk Optical Tape

123 So you want fast? It is possible to build a computer which uses only static RAM This would be very fast This would cost a very large amount

124 Locality of Reference During the course of the execution of a program, memory references tend to cluster e.g. loops

125 Cache Small amount of fast memory Sits between normal main memory and CPU May be located on CPU chip or module

126 Cache and Main Memory

127 Cache/Main Memory Structure

128 Cache operation - overview: The CPU requests the contents of a memory location. Check the cache for this data. If present, get it from the cache (fast). If not present, read the required block from main memory into the cache, then deliver it from the cache to the CPU. The cache includes tags to identify which block of main memory is in each cache slot.
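The payoff of this scheme is captured by the standard average-access-time relation (not stated on the slide, but implied by it): hits cost one cache access, misses cost a cache check plus a main-memory access. A minimal sketch with illustrative timings:

```python
def avg_access_time(hit_ratio, t_cache, t_memory):
    """Average access time when every request checks the cache first
    and only misses go on to main memory."""
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_memory)

# e.g. 95% hit ratio, 1-cycle cache, 10-cycle main memory
print(avg_access_time(0.95, 1, 10))
```

With a 95% hit ratio the average cost stays near the cache's speed even though main memory is ten times slower, which is the whole argument for the memory hierarchy.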

129 Cache Read Operation - Flowchart

130 Cache Design Addressing Size Mapping Function Replacement Algorithm Write Policy Block Size Number of Caches

131 Cache Addressing: Where does the cache sit? Between the processor and the virtual memory management unit, or between the MMU and main memory. A logical cache (virtual cache) stores data using virtual addresses: the processor accesses the cache directly, not through the MMU; cache access is faster, since it happens before MMU address translation; but virtual addresses use the same address space for different applications, so the cache must be flushed on each context switch. A physical cache stores data using main memory physical addresses.

132 Size does matter: Cost - more cache is expensive. Speed - more cache is faster (up to a point); checking the cache for data takes time.

133 Typical Cache Organization

134 Comparison of Cache Sizes
Processor | Type | Year of Introduction | L1 cache | L2 cache | L3 cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | - | -
PDP-11/70 | Minicomputer | 1975 | 1 KB | - | -
VAX 11/780 | - | 1978 | 16 KB | - | -
IBM 3033 | - | - | 64 KB | - | -
IBM 3090 | - | 1985 | 128 to 256 KB | - | -
Intel 80486 | PC | 1989 | 8 KB | - | -
Pentium | - | 1993 | 8 KB/8 KB | 256 to 512 KB | -
PowerPC 601 | - | - | 32 KB | - | -
PowerPC 620 | - | 1996 | 32 KB/32 KB | - | -
PowerPC G4 | PC/server | 1999 | - | 256 KB to 1 MB | 2 MB
IBM S/390 G4 | - | 1997 | - | 256 KB | -
IBM S/390 G6 | - | - | - | 8 MB | -
Pentium 4 | - | 2000 | - | - | -
IBM SP | High-end server/supercomputer | - | 64 KB/32 KB | - | -
CRAY MTAb | Supercomputer | - | - | - | -
Itanium | - | 2001 | 16 KB/16 KB | 96 KB | 4 MB
SGI Origin 2001 | High-end server | 2001 | - | - | -
Itanium 2 | - | 2002 | - | - | 6 MB
IBM POWER5 | - | 2003 | - | 1.9 MB | 36 MB
CRAY XD-1 | - | 2004 | 64 KB/64 KB | 1 MB | -

135 Comparison of Cache Sizes (continued)
Processor | Type | Year introduced | L1 cache | L2 cache | L3 cache
AMD 6-core Opteron | High-end server | 2008 | 6 each 128 KB | 6 each 512 KB | 6.1 MB
AMD Phenom II | Desktop | - | 4 each | - | -

136 Mapping memory to cacheCache mapping options – 8 cache slots Direct mapped 19 mod 8 = slot 3 Main memory blocks

137 Mapping memory to cacheCache mapping options – 8 cache slots Direct mapped Memory blocks 3, 11, 19, and 27 all map to cache slot 3 Main memory blocks

138 Mapping memory to cache: cache mapping options - 8 cache slots. 2-way set associative: 19 mod 4 = set 3 (slots 6 or 7). Memory blocks 3, 7, 11, 15, 19, 23, 27, 31 all map to set 3. Main memory blocks.

139 Mapping memory to cache: cache mapping options - 8 cache slots. Fully associative: a single set containing all slots; any memory block can map anywhere in the cache. Main memory blocks.

140 Mapping Function - running example: Cache of 64 KBytes; cache blocks of 4 bytes, i.e. the cache has 16k (2^14) lines of 4 bytes. 16 MBytes of main memory; 24 bit address (2^24 = 16M).

141 Direct Mapping: Each block of main memory maps to only one cache line, i.e. if a block is in the cache, it must be in one specific place. The address is in two parts: the least significant w bits identify a unique word; the most significant s bits specify one memory block. The MSBs are split into a cache line field of r bits and a tag of s-r bits (the most significant).

142 Direct Mapping Address Structure: Tag (s-r) = 8 bits | Line or slot (r) = 14 bits | Word (w) = 2 bits. 24 bit address; 2 bit word identifier (4 byte block); 22 bit block identifier: 8 bit tag (= 22-14) and 14 bit slot or line. No two blocks mapping to the same line have the same tag field. Check the contents of the cache by finding the line and checking the tag.
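The field extraction can be sketched with bit operations, using the running example's split (w = 2 word bits, r = 14 line bits, leaving an 8-bit tag of a 24-bit address); the two addresses below are made up to show a line collision:

```python
WORD_BITS, LINE_BITS = 2, 14  # running example: 4-byte blocks, 16k lines

def direct_map(address):
    """Split a 24-bit address into its (tag, line, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# two addresses that land on the same cache line but carry different tags
print(direct_map(0x000004))  # (0, 1, 0)
print(direct_map(0x160004))  # (22, 1, 0)
```

Both addresses select line 1, so they would evict each other in a direct-mapped cache; only the stored tag distinguishes them.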

143 Direct Mapping from Cache to Main Memory

144 Direct Mapping Cache Line Table: Line 0 holds main memory blocks 0, m, 2m, 3m, … 2^s - m. Line 1 holds blocks 1, m+1, 2m+1, … 2^s - m + 1. … Line m-1 holds blocks m-1, 2m-1, 3m-1, … 2^s - 1.

145 Direct Mapping Cache Organization

146 Direct Mapping Example

147 Direct Mapping Summary: Address length = (s + w) bits. Number of addressable units = 2^(s+w) words or bytes. Block size = line size = 2^w words or bytes. Number of blocks in main memory = 2^(s+w) / 2^w = 2^s. Number of lines in cache = m = 2^r. Size of tag = (s - r) bits.

148 Direct Mapping pros & cons: Simple. Inexpensive. Fixed location for a given block: if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high.

149 Victim Cache: Lowers the miss penalty by remembering what was discarded: it was already fetched, so it can be used again with little penalty. Fully associative; 4 to 16 cache lines. Sits between the direct mapped L1 cache and the next memory level.

150 Associative Mapping A main memory block can load into any line of cache Memory address is interpreted as tag and word Tag uniquely identifies block of memory Every line’s tag is examined for a match Cache searching gets expensive

151 Associative Mapping from Cache to Main Memory

152 Fully Associative Cache Organization

153 Associative Mapping Example

154 Associative Mapping Address Structure: Tag = 22 bits | Word = 2 bits. A 22 bit tag is stored with each 32 bit block of data. Compare the address tag field with every tag entry in the cache to check for a hit. The least significant 2 bits of the address identify which byte is required from the 4 byte data block. E.g. for address FFFFFC the tag is the top 22 bits, 3FFFFF, and the block may reside in any cache line.

155 Associative Mapping Summary: Address length = (s + w) bits. Number of addressable units = 2^(s+w) words or bytes. Block size = line size = 2^w words or bytes. Number of blocks in main memory = 2^(s+w) / 2^w = 2^s. Number of lines in cache = undetermined. Size of tag = s bits.

156 Set Associative Mapping: The cache is divided into a number of sets; each set contains a number of lines. A given block maps to any line in one given set, e.g. block B can be in any line of set i. E.g. with 2 lines per set: 2-way associative mapping; a given block can be in one of 2 lines, in only one set.

157 Set Associative Mapping Example13 bit set number Block number in main memory is modulo 213 000000, 00A000, 00B000, 00C000 … map to same set

158 Mapping From Main Memory to Cache: v Associative

159 Mapping From Main Memory to Cache: k-way Associative

160 K-Way Set Associative Cache Organization

161 Set Associative Mapping Address Structure: Tag = 9 bits | Set = 13 bits | Word = 2 bits. Use the set field to determine which cache set to look in; compare the tag field to see if we have a hit. E.g. an address with tag 1FF and remaining bits 7FFC, and an address with tag 001 and remaining bits 7FFC, both have set number 1FFF: same set, different tags.
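The same bit-slicing idea applies here, with the line field replaced by a set field (2 word bits and 13 set bits, per the structure above); the two example addresses are made up to show a shared set:

```python
WORD_BITS, SET_BITS = 2, 13  # structure above: 4-byte blocks, 8k sets

def set_assoc_map(address):
    """Split a 24-bit address into its (tag, set, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

# two addresses in the same set but with different tags
print(set_assoc_map(0x000004))  # (0, 1, 0)
print(set_assoc_map(0x008004))  # (1, 1, 0)
```

Unlike the direct-mapped case, these two blocks can coexist: in a 2-way cache each occupies one of the two lines of set 1, and the tags tell them apart.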

162 Two Way Set Associative Mapping Example

163 Set Associative Mapping Summary: Address length = (s + w) bits. Number of addressable units = 2^(s+w) words or bytes. Block size = line size = 2^w words or bytes. Number of blocks in main memory = 2^(s+w) / 2^w = 2^s. Number of lines per set = k. Number of sets = v = 2^d. Number of lines in cache = kv = k x 2^d. Size of tag = (s - d) bits.

164 Direct and Set Associative Cache Performance Differences: Significant up to at least 64kB for 2-way. The difference between 2-way and 4-way at 4kB is much less than that from going from 4kB to 8kB. Cache complexity increases with associativity; not justified against simply increasing the cache to 8kB or 16kB. Above 32kB, greater associativity gives no improvement (simulation results).

165 Figure 4.16 Varying Associativity over Cache Size

166 Replacement Algorithms (1) - Direct mapping: No choice; each block maps to only one line; replace that line.

167 Replacement Algorithms (2) - Associative & Set Associative: Hardware implemented algorithm (for speed). Least recently used (LRU), e.g. in a 2-way set associative cache: which of the 2 blocks is LRU? First in first out (FIFO): replace the block that has been in the cache longest. Least frequently used (LFU): replace the block which has had the fewest hits. Random.
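LRU behaviour within one set can be sketched in software (real caches do this in hardware; the class below is purely illustrative):

```python
from collections import OrderedDict

class LRUCacheSet:
    """Sketch of LRU replacement within one k-way cache set."""

    def __init__(self, k):
        self.k = k
        self.lines = OrderedDict()  # tag -> block; least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, fill a line, evicting the LRU tag."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.k:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = object()           # stand-in for the fetched block
        return False

s = LRUCacheSet(2)  # one set of a 2-way cache
print([s.access(t) for t in [1, 2, 1, 3, 2]])  # [False, False, True, False, False]
```

Note the final access to tag 2 misses: accessing tag 1 in between made tag 2 the LRU victim when tag 3 arrived.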

168 Write Policy Must not overwrite a cache block unless main memory is up to date Multiple CPUs may have individual caches I/O may address main memory directly

169 Write through: All writes go to main memory as well as to the cache. Multiple CPUs can monitor main memory traffic to keep their local caches up to date. Lots of traffic; slows down writes. Remember bogus write through caches!

170 Write back: Updates are initially made in the cache only. An update bit for the cache slot is set when an update occurs. If a block is to be replaced, write it back to main memory only if the update bit is set. Other caches can get out of sync. I/O must access main memory through the cache. N.B. 15% of memory references are writes.
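The update-bit mechanism can be sketched for a single cache line (an illustrative model, not a hardware description; the class and field names are invented):

```python
class WriteBackLine:
    """Sketch of write-back policy: writes set a dirty (update) bit,
    and main memory is written only when a dirty line is replaced."""

    def __init__(self, tag, data):
        self.tag, self.data, self.dirty = tag, data, False

    def write(self, data):
        self.data, self.dirty = data, True  # update the cache only

    def replace(self, memory):
        if self.dirty:                      # write back only if modified
            memory[self.tag] = self.data
        self.dirty = False

mem = {}
line = WriteBackLine(tag=7, data=0)
line.write(42)       # no memory traffic yet
line.replace(mem)    # eviction triggers the single write to memory
print(mem)           # {7: 42}
```

Many writes to the same line cost only one memory access at eviction time, which is the traffic saving over write through.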

171 Line Size: Retrieve not only the desired word but a number of adjacent words as well. Increased block size will increase the hit ratio at first (the principle of locality), but the hit ratio will decrease as the block becomes even bigger: the probability of using the newly fetched information becomes less than the probability of reusing the replaced information. Larger blocks reduce the number of blocks that fit in the cache; data is overwritten shortly after being fetched; each additional word is less local, so less likely to be needed. No definitive optimum value has been found; 8 to 64 bytes seems reasonable; for HPC systems, 64- and 128-byte lines are most common.

172 Multilevel Caches High logic density enables caches on chip Faster than bus access Frees bus for other transfers Common to use both on and off chip cache L1 on chip, L2 off chip in static RAM L2 access much faster than DRAM or ROM L2 often uses separate data path L2 may now be on chip Resulting in L3 cache Bus access, or now on chip…

173 Hit Ratio (L1 & L2) For 8 kbyte and 16 kbyte L1

174 Unified v Split Caches One cache for data and instructions, or two: one for data and one for instructions Advantages of unified cache Higher hit rate Balances load of instruction and data fetch Only one cache to design & implement Advantages of split cache Eliminates cache contention between instruction fetch/decode unit and execution unit Important in pipelining

175 Pentium 4 Cache 80386 – no on chip cache 80486 – 8k using 16 byte lines and four way set associative organization Pentium (all versions) – two on chip L1 caches Data & instructions Pentium III – L3 cache added off chip Pentium 4 L1 caches 8k bytes 64 byte lines four way set associative L2 cache Feeding both L1 caches 256k 128 byte lines 8 way set associative L3 cache on chip

176 Intel Cache Evolution (problem → solution; processor on which feature first appears)
External memory slower than the system bus → Add external cache using faster memory technology (386)
Increased processor speed results in external bus becoming a bottleneck for cache access → Move external cache on-chip, operating at the same speed as the processor (486)
Internal cache is rather small, due to limited space on chip → Add external L2 cache using faster technology than main memory
Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache; the Prefetcher is then stalled while the Execution Unit's data access takes place → Create separate data and instruction caches (Pentium)
Increased processor speed results in external bus becoming a bottleneck for L2 cache access → Create separate back-side bus that runs at higher speed than the main (front-side) external bus; the BSB is dedicated to the L2 cache (Pentium Pro) → Move L2 cache on to the processor chip (Pentium II)
Some applications deal with massive databases and must have rapid access to large amounts of data; the on-chip caches are too small → Add external L3 cache (Pentium III) → Move L3 cache on-chip (Pentium 4)

177 Pentium 4 Block Diagram

178 Pentium 4 Core Processor Fetch/Decode Unit Fetches instructions from L2 cache Decodes into micro-ops Stores micro-ops in L1 cache Out of order execution logic Schedules micro-ops Based on data dependence and resources May speculatively execute Execution units Execute micro-ops Data from L1 cache Results in registers Memory subsystem L2 cache and system bus

179 Pentium 4 Design Reasoning Decodes instructions into RISC-like micro-ops before L1 cache Micro-ops fixed length Superscalar pipelining and scheduling Pentium instructions long & complex Performance improved by separating decoding from scheduling & pipelining (More later – ch14) Data cache is write back Can be configured to write through L1 cache controlled by 2 bits in register CD = cache disable NW = not write through 2 instructions to invalidate (flush) cache and write back then invalidate L2 and L3 8-way set-associative Line size 128 bytes

180 ARM Cache Features Core / Cache Type / Cache Size (kB) / Cache Line Size (words) / Associativity / Location / Write Buffer Size (words) ARM720T Unified 8 4 4-way Logical ARM920T Split 16/16 D/I 64-way 16 ARM926EJ-S 4-128/4-128 D/I ARM1022E ARM1026EJ-S Intel StrongARM 32-way 32 Intel XScale 32/32 D/I ARM1136-JF-S 4-64/4-64 D/I Physical

181 ARM Cache Organization Small FIFO write buffer Enhances memory write performance Between cache and main memory Small compared with the cache Data put in write buffer at processor clock speed Processor continues execution External write proceeds in parallel until buffer empty If buffer full, processor stalls Data in write buffer not available until written So keep buffer small

182 ARM Cache and Write Buffer Organization

183 Internet Sources Manufacturer sites Intel ARM Search on cache

184 William Stallings Computer Organization and Architecture 8th Edition Chapter 5 Internal Memory

185 Semiconductor Memory Types (category; erasure; write mechanism; volatility)
Random-access memory (RAM) – read-write memory; erased electrically, byte-level; written electrically; volatile
Read-only memory (ROM) – read-only memory; erasure not possible; written by masks; nonvolatile
Programmable ROM (PROM) – read-only memory; erasure not possible; written electrically; nonvolatile
Erasable PROM (EPROM) – read-mostly memory; erased by UV light, chip-level; written electrically; nonvolatile
Electrically Erasable PROM (EEPROM) – read-mostly memory; erased electrically, byte-level; written electrically; nonvolatile
Flash memory – read-mostly memory; erased electrically, block-level; written electrically; nonvolatile

186 Semiconductor Memory: RAM Misnamed, as all semiconductor memory is random access Read/Write Volatile Temporary storage Static or dynamic

187 Memory Cell Operation

188 Dynamic RAM Bits stored as charge in capacitors Charges leak Need refreshing even when powered Simpler construction Smaller per bit Less expensive Need refresh circuits Slower Used for main memory Essentially analogue Level of charge determines value

189 Dynamic RAM Structure

190 DRAM Operation Address line active when bit read or written Transistor switch closed (current flows) Write Voltage to bit line High for 1, low for 0 Then signal address line Transfers charge to capacitor Read Address line selected, transistor turns on Charge from capacitor fed via bit line to sense amplifier Compares with reference value to determine 0 or 1 Capacitor charge must be restored

191 Static RAM Bits stored as on/off switches No charges to leak No refreshing needed when powered More complex construction Larger per bit More expensive Does not need refresh circuits Faster Used for cache Digital Uses flip-flops

192 Static RAM Structure

193 Static RAM Operation Transistor arrangement gives stable logic state State 1 C1 high, C2 low T1 T4 off, T2 T3 on State 0 C2 high, C1 low T2 T3 off, T1 T4 on Address line transistors T5 T6 act as switches Write – apply value to B & complement to B Read – value is on line B

194 SRAM v DRAM Both volatile Power needed to preserve data Dynamic cell Simpler to build, smaller More dense Less expensive Needs refresh Larger memory units Static cell Faster Used for cache

195 Read Only Memory (ROM) Permanent storage Nonvolatile Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables

196 Types of ROM Written during manufacture Very expensive for small runs Programmable (once) PROM Needs special equipment to program Read "mostly" Erasable Programmable (EPROM) Erased by UV Electrically Erasable (EEPROM) Takes much longer to write than read Flash memory Erase whole memory electrically

197 Organization in Detail A 16Mbit chip can be organized as 1M of 16-bit words A bit-per-chip system has 16 lots of 1Mbit chips, with bit 1 of each word in chip 1 and so on A 16Mbit chip can be organized as a 2048 x 2048 x 4-bit array Reduces number of address pins Multiplex row address and column address 11 pins to address (2^11 = 2048) Adding one more pin doubles the range of values, so x4 capacity
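The pin-count arithmetic on this slide can be checked directly:

```python
import math

rows = cols = 2048
cell_width = 4                               # 4 bits per addressed cell
total_bits = rows * cols * cell_width
assert total_bits == 16 * 2**20              # a 16 Mbit chip

# Row and column addresses are multiplexed over the same pins:
address_pins = int(math.log2(rows))          # 11 pins, since 2^11 = 2048

# One extra pin doubles both the row and the column range: x4 capacity
assert (2 * rows) * (2 * cols) * cell_width == 4 * total_bits
```

Multiplexing is why an extra address pin quadruples rather than doubles capacity: the same pin is reused for both halves of the address.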

198 Refreshing Refresh circuit included on chip Disable chip Count through rows Read & Write back Takes time Slows down apparent performance

199 Typical 16 Mb DRAM (4M x 4)

200 Packaging

201 256kByte Module Organisation

202 1MByte Module Organisation

203 Interleaved Memory Collection of DRAM chips Grouped into memory bank Banks independently service read or write requests K banks can service k requests simultaneously

204 Memory Errors Bit flips are a problem in memory and data communications Causes Marginal or failed component Noise Cosmic rays / alpha particles

205 Error Correction Hard Failure Permanent defect Soft Error Random, non-destructive No permanent damage to memory Detected using Hamming error correcting code

206 Error Types Single-bit error e.g. sent 00000010 (Start of Text, STX), received 00001010 (Line Feed, LF) Multiple-bit errors Two or more non-adjacent bits changed Burst errors Two or more consecutive bits changed

207 Error Detection and Correction Data message of m bits (gives 2^m possible data messages) Add to this r redundant bits that encode some kind of error detection and possibly correction Codeword sent of size n = m + r total bits The method of creating the r redundant bits means not all 2^n codewords are valid Receipt of an invalid codeword indicates an error

208 Hamming Distance Defined as the number of bits by which two codewords differ XOR the two codewords together and count the number of 1's in the result e.g. two codewords whose XOR contains three 1's are Hamming distance 3 apart
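The XOR-and-count rule translates directly to code. The two example codewords here are my own illustration, not the (omitted) ones from the slide:

```python
def hamming_distance(a: int, b: int) -> int:
    # XOR the codewords, then count the 1 bits in the result
    return bin(a ^ b).count("1")

# 10001001 XOR 10110001 = 00111000, which has three 1s
print(hamming_distance(0b10001001, 0b10110001))  # → 3
```

Identical codewords XOR to zero, so their distance is 0, as expected.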

209 Hamming Distance Continued If two codewords are distance d apart, it takes d single-bit errors to change one into the other All 2^m messages are possible, so it is possible to create a list of all 2^m legal codewords From this list, find the two legal codewords whose Hamming distance is the smallest This gives the Hamming distance for the entire code

210 Hamming Distance Continued To detect d errors, we need a code with at least d + 1 Hamming distance To correct d errors, we need a code with at least 2d + 1 Hamming distance

211 Parity Bit for Redundancy Append a single bit Even or odd parity chosen in advance In odd parity, if the count of ones in the message m is an even number, we add a 1 parity bit to make the count odd Odd parity example: m has 4 ones, so parity bit = 1 and the codeword has 5 ones
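Computing the parity bit is a one-liner. A small sketch, using a hypothetical 6-bit message of my own (the slide's binary values were not preserved):

```python
def parity_bit(bits, odd=True):
    """Return the bit that makes the total number of ones odd (or even)."""
    ones = sum(bits)
    if odd:
        return 1 if ones % 2 == 0 else 0
    return ones % 2

msg = [1, 0, 1, 1, 0, 1]                 # 4 ones: an even count
p = parity_bit(msg, odd=True)            # odd parity chosen, so p = 1
codeword = msg + [p]                     # codeword now has 5 ones
```

With even parity the same message would get a 0 parity bit, leaving the count of ones even.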

212 Parity bit continued A single parity bit gives a code with a Hamming distance of 2 Single parity bit can detect a single bit error, nothing more

213 Error Correction Need a code with a larger Hamming distance than a single parity bit gives Suppose a code with a given set of valid codewords: what is the Hamming distance of the code?

214 Error Correction Continued Suppose the code has a Hamming distance of 5 5 = 2d + 1, so 2d = 4 and d = 2 This code can correct double-bit errors On receiving an invalid codeword, we select the nearest valid codeword A triple-bit error would not be corrected properly

215 Hamming Error Correction in Practice Design a code with single-bit error correction Messages of size m divided into blocks with r redundancy bits per block Need a Hamming distance of at least 3 The number of r bits needed depends on size m Hamming devised such a code in 1950 that minimizes r Bits 1, 2, 4, 8, 16, etc. are check bits (the r redundant bits) All remaining bits are message bits (the m message bits)

216 Hamming Code Redundant bits store parity for some group of message bits For each message bit, break its position number down into a sum of powers of 2 Upon message receipt, check each redundant bit's group for parity If a check fails, add the position of its parity bit to a counter Success is counter == 0; on failure the counter contains the position of the failed bit

217 Hamming Code Continued Position k is checked by the parity bits whose positions sum to k: 1 = 1, 2 = 2, 3 = 1 + 2, 4 = 4, 5 = 1 + 4, 6 = 2 + 4, 7 = 1 + 2 + 4, 8 = 8, 9 = 1 + 8, 10 = 2 + 8, 11 = 1 + 2 + 8, 12 = 4 + 8

218 Hamming Code Continued Odd positions have parity at bit 1 Bits 2-3, 6-7, 10-11, 14-15, 18-19, ... have parity at bit 2 Bits 4-7, 12-15, 20-23, ... have parity at bit 4 Bits 8-15, 24-31, 40-47, ... have parity at bit 8, and so on Position 1: check a bit, skip a bit, check a bit, skip a bit Position 2: check 2 bits, skip 2 bits, ...
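The "check k, skip k" pattern is just the binary decomposition of the position number: position k is covered by parity bit p exactly when p appears in k's binary expansion. This sketch generates the groups for a 12-bit codeword:

```python
def covered_positions(parity_pos, n):
    """Positions (1-based) whose binary expansion contains the given power of 2."""
    return [k for k in range(1, n + 1) if k & parity_pos]

print(covered_positions(1, 12))   # → [1, 3, 5, 7, 9, 11]
print(covered_positions(2, 12))   # → [2, 3, 6, 7, 10, 11]
print(covered_positions(4, 12))   # → [4, 5, 6, 7, 12]
print(covered_positions(8, 12))   # → [8, 9, 10, 11, 12]
```

These are exactly the groups listed above: odd positions for bit 1, pairs for bit 2, runs of four for bit 4, and so on.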

219 Hamming Example Choose ASCII 'A' (65 decimal, 0x41) Powers-of-2 positions are parity bits (even parity) Bits 1, 3, 5, 7, 9, 11: odd number of ones, so what is bit 1? Bits 2-3, 6-7, ...: even number of ones, so what is bit 2? Bits 4-7, ...: even number of ones, so what is bit 4? Bits 8-11: odd number of ones, so what is bit 8?

220 Types of Hamming Codes ASCII is really a 7-bit code 7 data bits in an 11-bit codeword 'H' encoded as ... Not very efficient for memory 72/64 frequently used 64 data bits, 8 check bits Allows Single Error Correction, Double Error Detection (SEC-DED)

221 Single Error Correction Example Suppose we have a 12-bit Hamming code Memory value read is 0xE4F Is this valid? If not, what is wrong?
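A decoder for this exercise can be sketched as below. The bit-numbering convention is an assumption on my part (position 1 = leftmost bit, even parity); with that convention the syndrome for 0xE4F comes out non-zero, flagging a single-bit error at the position the syndrome names:

```python
def hamming_syndrome(bits):
    """bits[0] is position 1. Returns 0 if every parity check passes,
    otherwise the 1-based position of the single bit in error (even parity)."""
    n, syndrome, p = len(bits), 0, 1
    while p <= n:
        # parity bit p covers every position whose binary expansion contains p
        ones = sum(bits[k - 1] for k in range(1, n + 1) if k & p)
        if ones % 2:                       # parity check for group p failed
            syndrome += p
        p <<= 1
    return syndrome

word = [int(b) for b in format(0xE4F, "012b")]
pos = hamming_syndrome(word)               # non-zero: the word is not valid
if pos:
    word[pos - 1] ^= 1                     # flip the flagged bit to correct it
assert hamming_syndrome(word) == 0         # corrected word now passes all checks
```

A different bit-ordering convention would give a different syndrome, so treat this as one consistent reading of the exercise rather than the official answer.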

222 Error Correcting Code Function

223 Advanced DRAM Organization Basic DRAM the same since first RAM chips Enhanced DRAM Contains small SRAM as well SRAM holds last line read (c.f. cache!) Cache DRAM Larger SRAM component Used as cache or serial buffer

224 Synchronous DRAM (SDRAM) Access is synchronized with an external clock Address is presented to RAM RAM finds data (CPU waits in conventional DRAM) Since SDRAM moves data in time with the system clock, CPU knows when data will be ready CPU does not have to wait, it can do something else Burst mode allows SDRAM to set up a stream of data and fire it out in a block DDR-SDRAM sends data twice per clock cycle (leading & trailing edge)

225 SDRAM

226 SDRAM Read Timing

227 RAMBUS Adopted by Intel for Pentium & Itanium Main competitor to SDRAM Vertical package – all pins on one side Data exchange over 28 wires < cm long Bus addresses up to 320 RDRAM chips at 1.6Gbps Asynchronous block protocol 480ns access time Then 1.6 Gbps

228 RAMBUS Diagram

229 DDR SDRAM SDRAM can only send data once per clock Double-data-rate SDRAM can send data twice per clock cycle Rising edge and falling edge

230 DDR SDRAM Read Timing

231 Simplified DRAM Read Timing

232 Cache DRAM Mitsubishi Integrates small SRAM cache (16 kb) onto generic DRAM chip Used as true cache 64-bit lines Effective for ordinary random access Can also support serial access of a block of data E.g. refresh of bit-mapped screen CDRAM can prefetch data from DRAM into SRAM buffer Subsequent accesses solely to SRAM

233 Reading The RAM Guide RDRAM

234 William Stallings Computer Organization and Architecture 8th Edition Chapter 6 External Memory

235 Types of External Memory Magnetic Disk RAID Removable Optical CD-ROM CD-Recordable (CD-R) CD-R/W DVD Magnetic Tape

236 Magnetic Disk Disk substrate coated with magnetizable material (iron oxide…rust) Substrate used to be aluminium Now glass Improved surface uniformity Increases reliability Reduction in surface defects Reduced read/write errors Lower flight heights (see later) Better stiffness Better shock/damage resistance

237 Read and Write Mechanisms Recording & retrieval via conductive coil called a head May be single read/write head or separate ones During read/write, head is stationary, platter rotates Write Current through coil produces magnetic field Pulses sent to head Magnetic pattern recorded on surface below Read (traditional) Magnetic field moving relative to coil produces current Coil is the same for read and write Read (contemporary) Separate read head, close to write head Partially shielded magnetoresistive (MR) sensor Electrical resistance depends on direction of magnetic field High frequency operation Higher storage density and speed

238 Inductive Write MR Read

239 Data Organization and Formatting Concentric rings or tracks Gaps between tracks Reduce gap to increase capacity Same number of bits per track (variable packing density) Constant angular velocity Tracks divided into sectors Minimum block size is one sector May have more than one sector per block

240 Disk Data Layout

241 Disk Velocity Bit near centre of rotating disk passes fixed point slower than bit on outside of disk Increase spacing between bits in different tracks Rotate disk at constant angular velocity (CAV) Gives pie shaped sectors and concentric tracks Individual tracks and sectors addressable Move head to given track and wait for given sector Waste of space on outer tracks Lower data density Can use zones to increase capacity Each zone has fixed bits per track More complex circuitry

242 Disk Layout Methods Diagram

243 Finding Sectors Must be able to identify start of track and sector Format disk Additional information not available to user Marks tracks and sectors

244 Winchester Disk Format Seagate ST506

245 Characteristics Fixed (rare) or movable head Removable or fixed Single or double (usually) sided Single or multiple platter Head mechanism Contact (floppy) Fixed gap Flying (Winchester)

246 Fixed/Movable Head Disk Fixed head One read/write head per track Heads mounted on fixed rigid arm Movable head One read/write head per side Mounted on a movable arm

247 Removable or Not Removable disk Can be removed from drive and replaced with another disk Provides unlimited storage capacity Easy data transfer between systems Nonremovable disk Permanently mounted in the drive

248 Multiple Platter One head per side Heads are joined and aligned Aligned tracks on each platter form cylinders Data is striped by cylinder Reduces head movement Increases speed (transfer rate)

249 Multiple Platters

250 Tracks and Cylinders

251 Floppy Disk 8”, 5.25”, 3.5” Small capacity Up to 1.44 Mbyte (2.88M never popular) Slow Universal Cheap Mostly obsolete

252 Winchester Hard Disk (1)Developed by IBM in Winchester (USA) Sealed unit One or more platters (disks) Heads fly on boundary layer of air as disk spins Very small head to disk gap Getting more robust

253 Winchester Hard Disk (2) Universal Cheap Fastest external storage Getting larger all the time 250 Gigabyte to 1+ Terabyte drives now easily available

254 Speed Seek time Moving head to correct track (Rotational) latency Waiting for data to rotate under head Access time = seek time + latency Transfer rate
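The components above add up as follows. The figures in the example (4 ms seek, 7200 rpm, 100 MB/s) are my own illustrative numbers, not from the slides:

```python
def access_time_ms(seek_ms, rpm, nbytes, transfer_mb_per_s):
    """Seek + average rotational latency (half a revolution) + transfer time."""
    latency_ms = 0.5 * 60_000 / rpm                      # half a rotation, in ms
    transfer_ms = nbytes / (transfer_mb_per_s * 1e6) * 1e3
    return seek_ms + latency_ms + transfer_ms

# e.g. 4 ms seek, 7200 rpm, one 4 kB block at 100 MB/s
t = access_time_ms(4, 7200, 4096, 100)    # about 8.2 ms, dominated by seek + latency
```

Note how little of the total is actual transfer: mechanical positioning dominates, which is why sequential access so strongly outperforms random access on disk.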

255 Timing of Disk I/O Transfer

256 RAID Redundant Array of Independent Disks Redundant Array of Inexpensive Disks 7 levels (0 through 6) in common use Not a hierarchy Set of physical disks viewed as single logical drive by O/S Data distributed across physical drives Can use redundant capacity to store parity information

257 RAID 0 No redundancy Data striped across all disks Round-robin striping Increases speed Multiple data requests probably not on same disk Disks seek in parallel A set of data is likely to be striped across multiple disks

258 RAID 1 Mirrored Disks Data is striped across disks 2 copies of each stripe on separate disks Read from either Write to both Recovery is simple Swap faulty disk & re-mirror No down time Expensive

259 RAID 2 Disks are synchronized Very small stripes Often single byte/word Error correction calculated across corresponding bits on disks Multiple parity disks store Hamming code error correction in corresponding positions Lots of redundancy Expensive Not used

260 RAID 3 Similar to RAID 2 Only one redundant disk, no matter how large the array Simple parity bit for each set of corresponding bits Data on failed drive can be reconstructed from surviving data and parity info Very high transfer rates

261 RAID 4 Each disk operates independently Good for high I/O request rate Large stripes Bit-by-bit parity calculated across corresponding stripes on each disk Parity stored on parity disk

262 RAID 5 Like RAID 4 Parity striped across all disks Round robin allocation for parity stripe Avoids RAID 4 bottleneck at parity disk Commonly used in network servers N.B. does not mean 5 disks!
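The parity used by RAID 3 through 5 is plain XOR across the stripe, which is what makes single-drive reconstruction possible: XOR-ing the surviving blocks with the parity block regenerates the missing one. A minimal sketch with made-up one-byte blocks:

```python
def xor_parity(blocks):
    """XOR corresponding bytes across the blocks of one stripe."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Three data blocks on three disks, parity on a fourth
d0, d1, d2 = b"\x0f", b"\xf0", b"\x3c"
p = xor_parity([d0, d1, d2])

# Disk holding d1 fails: rebuild it from the survivors plus parity
rebuilt = xor_parity([d0, d2, p])
assert rebuilt == d1
```

The same identity explains RAID 5's write penalty: every small write must read the old data and old parity to recompute the new parity.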

263 RAID 6 Two parity calculations Stored in separate blocks on different disks User requirement of N disks needs N+2 High data availability Three disks would need to fail for data loss Significant write penalty

264 RAID 0, 1, 2

265 RAID 3 & 4

266 RAID 5 & 6

267 Data Mapping For RAID 0

268 Optical Storage: CD-ROM Originally for audio 650 Mbytes giving over 70 minutes of audio Polycarbonate coated with highly reflective coat, usually aluminium Data stored as pits Read by reflecting laser Constant packing density Constant linear velocity

269 CD Operation

270 CD-ROM Drive Speeds Audio is single speed Constant linear velocity 1.2 m/s Track (spiral) is 5.27 km long Gives 4391 seconds = 73.2 minutes Other speeds are quoted as multiples, e.g. 24x Quoted figure is maximum drive can achieve
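The 73.2-minute figure follows directly from the spiral length and the 1x linear velocity:

```python
track_length_m = 5270        # 5.27 km spiral track
velocity = 1.2               # m/s at 1x (audio) speed, constant linear velocity

seconds = track_length_m / velocity
minutes = seconds / 60
print(int(seconds), round(minutes, 1))   # → 4391 73.2
```

Because the velocity is constant along the spiral, playing time scales linearly with track length, unlike a constant-angular-velocity disk.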

271 CD-ROM Format Mode 0 = blank data field Mode 1 = 2048 bytes data + error correction Mode 2 = 2336 bytes data

272 Random Access on CD-ROM Difficult Move head to rough position Set correct speed Read address Adjust to required location Slow compared with a hard drive

273 CD-ROM for & against Large capacity (?) Easy to mass produce Removable Robust Expensive for small runs Slow Read only

274 Other Optical Storage CD-Recordable (CD-R) WORM Now affordable Compatible with CD-ROM drives CD-RW Erasable Getting cheaper Mostly CD-ROM drive compatible Phase change Material has two different reflectivities in different phase states

275 DVD – what's in a name? Digital Video Disk Used to indicate a player for movies Only plays video disks Digital Versatile Disk Used to indicate a computer drive Will read computer disks and play video disks Dogs Veritable Dinner Officially – nothing!

276 DVD – Technology Multi-layer Very high capacity (4.7G per layer) Full-length movie on single disk Using MPEG compression Finally standardized Movies carry regional coding Players only play correct region films Can be “fixed”

277 DVD – Writable Loads of trouble with standards First generation DVD drives may not read first generation DVD-W disks First generation DVD drives may not read CD-RW disks

278 CD and DVD

279 High Definition Optical Disks Designed for high definition video Much higher capacity than DVD Shorter wavelength laser Blue-violet range Smaller pits Blu-ray Data layer closer to laser Tighter focus, less distortion, smaller pits 25GB on single layer Available read only (BD-ROM), recordable once (BD-R) and re-recordable (BD-RE)

280 Optical Memory Characteristics

281 Magnetic Tape Serial access Slow Very cheap Backup and archive Linear Tape-Open (LTO) Tape Drives Developed late 1990s Open source alternative to proprietary tape systems

282 Linear Tape-Open (LTO) Tape Drives
LTO-1: released 2000; compressed capacity 200 GB; compressed transfer rate 40 MB/s; linear density 4880 bits/mm; 384 tape tracks; tape length 609 m
LTO-2: released 2003; 400 GB; 80 MB/s; 7398 bits/mm; 512 tracks; 609 m
LTO-3: released 2005; 800 GB; 160 MB/s; 9638 bits/mm; 704 tracks; 680 m
LTO-4: released 2007; 1600 GB; 240 MB/s; 13300 bits/mm; 896 tracks; 820 m
LTO-5: TBA; 3.2 TB; 360 MB/s
LTO-6: TBA; 6.4 TB; 540 MB/s
Tape width 1.27 cm; write elements 8 (early generations) or 16 (later generations)

283 Internet Resources Optical Storage Technology Association Good source of information about optical storage technology and vendors Extensive list of relevant links DLTtape Good collection of technical information and links to vendors Search on RAID