1 CSC311: Data Structures Java Review Fall 2006 CSC311: Data Structures
2 The Java Virtual Machine (JVM) Virtual machine -- the logical (non-physical) machine that performs machine functions, typically implemented in software on top of a "real" hardware platform and operating system JVM -- the software implementation of a "CPU" designed to run compiled Java code Runs stand-alone Java applications, as well as "applets" that are downloaded and run in Web browsers such as Netscape Navigator Thanks to the JVM, programs written in Java don't have to be rewritten to run on different computers An interpreter is needed to translate JVM bytecode to machine code Fall 2006 CSC311: Data Structures
3 Class and Object What is a class? A class corresponds to a concept, e.g. People, Student, Faculty, Staff, Rectangle, Circle, etc. In Java, a class is a set of objects with the same behavior What is an object? An object corresponds to an entity, e.g. Jack, Peter In Java, an object is an entity that you can manipulate in your programs (by invoking methods) Each object belongs to a class Object construction Fall 2006 CSC311: Data Structures
4 Class Components Components DeclarationFields – name, type and specifier Methods – signature (prototype) Constructors – object initialization Declaration Variable declaration Object variables w/o initialization Method declaration // method calls Fall 2006 CSC311: Data Structures
5 Data Types Primitive data types – int, long, short, double, float, boolean, char Classes – String, Object Type conversion Explicit type conversion – casting, may cause information loss Implicit type conversion – default, no information loss Fall 2006 CSC311: Data Structures
6 Casting and Autoboxing/Unboxing Casting: (type) expr Autoboxing: implicit casting Number classes – Integer, Double, Float, etc. Anytime a Number object is expected as a parameter to a method, the corresponding base type can be passed: base type → Number Unboxing: anytime a base type is expected in an expression involving a Number reference, that Number object is changed to the corresponding base type: Number → base type Fall 2006 CSC311: Data Structures
7 Enum Types Declaration: modifier enum name {value_name0, value_name1, …, value_name(n-1)} Example: public enum Day {Mon, Tue, Wed, Thu, Fri, Sat, Sun}; Constants -- final Fall 2006 CSC311: Data Structures
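A brief sketch of declaring and using such an enum; the isWeekend helper is illustrative and not from the slides.

```java
public class EnumDemo {
    // Constant names follow the slide's Day example; enum constants
    // are implicitly public static final.
    public enum Day { Mon, Tue, Wed, Thu, Fri, Sat, Sun }

    // Enum values can be compared with == because each constant
    // is a unique object.
    public static boolean isWeekend(Day d) {
        return d == Day.Sat || d == Day.Sun;
    }
}
```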
8 Statements Assignments – location Decisions IterationsIf-then If-then-else Switch-case Iterations while do-while for Restricted gotos break continue return Boolean expressions and short-circuit evaluation Fall 2006 CSC311: Data Structures
9 Simple Input and Output Simple output methods: Built-in static object: System.out An instance of the java.io.PrintStream class print(Object) print(String) print(base_type) println(String) Simple input methods Built-in special object: System.in Related class: java.util.Scanner Scanner sc = new Scanner(System.in); hasNext() next() hasNextType() nextType() hasNextLine() nextLine() findInLine(String) Fall 2006 CSC311: Data Structures
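A minimal sketch of Scanner-based input; here the Scanner reads from a String so the example is self-contained, while a real program would pass System.in as shown in the comment.

```java
import java.util.Scanner;

public class ScannerDemo {
    // Sum whitespace-separated integers from any input source.
    public static int sumInts(String input) {
        Scanner sc = new Scanner(input);
        int sum = 0;
        while (sc.hasNextInt()) {   // hasNextType() guard before nextType()
            sum += sc.nextInt();
        }
        sc.close();
        return sum;
    }

    public static void main(String[] args) {
        // With console input the source would be:
        // Scanner sc = new Scanner(System.in);
        System.out.println(sumInts("3 5 7"));
    }
}
```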
10 Designing classes Choosing classes Principles:A class represents a single concept Concepts from mathematics: Point, Rectangle, Ellipse Concepts from real life: BankAccount, Person, Student Utility classes--no objects, only static methods Math Principles: Responsibilities Independence Behaviors Fall 2006 CSC311: Data Structures
11 Designing classes Cohesion and Coupling Cohesive: all methods are closely related to the single concept that the class represents Coupling: a class depends on another if it calls one of its methods High coupling: many class dependencies Minimize coupling to minimize the impact of interface changes Fall 2006 CSC311: Data Structures
12 Accessor and Mutator Methods Accessor: does not change the state of the implicit parameter (e.g. getBalance()) Mutator: changes the state of the implicit parameter (e.g. deposit(double amount)) Rule of thumb: a mutator should return void Immutable class: all methods are accessors (e.g. String) Fall 2006 CSC311: Data Structures
13 Preconditions/Postconditions Precondition: the condition that must be met before the method executes Publish preconditions so the caller won't call methods with bad parameters Postcondition: the condition that's true after a method has completed If not, the method functioned incorrectly Fall 2006 CSC311: Data Structures
14 Scope Scope of variable: region of program where the variable can be referred to by its name Local variable scope: from definition to end of block Class scope: all methods of the class Overlapping scope: local scope wins over class scope Static scope Static fields Define a field that belongs to a class instead of any object Non-static fields are instance fields, while static fields are class fields Minimize the use of static fields Static methods No implicit parameter Too many static methods are a sign of too little object-oriented design Fall 2006 CSC311: Data Structures
15 Coding and Pseudo-code Coding: programming in a Java IDE Pseudo-code: mixes natural language with standard programming language constructs Constructs: Expressions Method declarations Decision structures Loop structures Array indexing Method calls Method returns Comments Fall 2006 CSC311: Data Structures
16 Object-oriented DesignGoals Robustness Adaptability Reusability Principles Abstraction Encapsulation Modularity Design Patterns Describe a solution to a typical software design problem Some typical design patterns: Position, Adaptor, Iterator, Template method, Composition, Comparator, Amortization, Divide-and-conquer, etc. Fall 2006 CSC311: Data Structures
17 Inheritance Specialization and generalization (extension)Specialization – subclass Generalization -- superclass Method overriding Refinement Replacement The keyword this Fall 2006 CSC311: Data Structures
18 Inheritance HierarchiesHierarchies of classes, subclasses, and sub-subclasses are common Similar to concept hierarchies: Person has subclasses like Faculty, Student, Staff; Faculty may have subclasses FullTimeFaculty and PartTimeFaculty; FullTimeFaculty may have TenuredFaculty and TenureTrackFaculty; Student may have subclasses FullTimeStudent and PartTimeStudent; etc. A superclass can have multiple subclasses, but a subclass should have at most ONE superclass Fall 2006 CSC311: Data Structures
19 Inheritance of Methods Override method: Supply a different implementation of a method that exists in the superclass Inherit method: Don't supply a new implementation of a method that exists in the superclass Add method: Supply a new method that doesn't exist in the superclass A subclass cannot access the private methods of its superclass Calling a superclass method/constructor Fall 2006 CSC311: Data Structures
20 Inheritance of Fields Inherit field: All fields from the superclass are automatically inherited Add field: Supply a new field that doesn't exist in the superclass Fields can't be overridden A subclass cannot access the private fields of its superclass Fall 2006 CSC311: Data Structures
21 Polymorphism Polymorphism (Greek: many shapes): The type of the object determines the method to call Called late binding; resolved at runtime Different from overloading, which is resolved by the compiler Converting from subclasses to superclass The instanceof operator Tests whether an object is of a specified class Example: x instanceof String tests if x is a String Fall 2006 CSC311: Data Structures
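A minimal sketch of late binding and the instanceof operator; the class and method names are illustrative, not from the slides.

```java
public class PolymorphismDemo {
    public static class Person {
        public String describe() { return "person"; }
    }

    public static class Student extends Person {
        @Override
        public String describe() { return "student"; }
    }

    public static void main(String[] args) {
        Person p = new Student();                 // subclass-to-superclass conversion
        System.out.println(p.describe());         // late binding: Student's version runs
        System.out.println(p instanceof Student); // true: p refers to a Student object
    }
}
```

Note how the declared type of `p` is `Person`, but the runtime type `Student` determines which `describe()` is called.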
22 Access control public – accessible anywhereprivate – accessible inside the class protected -- accessible by class, subclasses and package default – no specifier, accessible by package Recommended Access Levels Fields: Always private Exception: public static final constants Methods: public or private Classes: public or package Don't use protected Beware of accidental package access (forgetting public or private) Fall 2006 CSC311: Data Structures
23 The Object Class The Cosmic Superclass All classes extend Object by default Most useful methods: String toString(): converts the object to a string representation boolean equals(Object otherObject): tests if the object is equal to the given object otherObject Object clone(): creates a new copy of the object Fall 2006 CSC311: Data Structures
24 Exceptions Throwing exceptions Catching exceptionsTry-catch-finally block try { … } catch (exception_type1 e1) { … } catch (exception_type2 e2) { … } … finally { … } Fall 2006 CSC311: Data Structures
25 Interfaces Special classes All methods in an interface are abstract – no implementation All methods in an interface are automatically public An interface doesn't have instance fields An interface doesn't have constructors Interface implementation Should be implemented by classes Multiple inheritance in interfaces Fall 2006 CSC311: Data Structures
26 Abstract Class and MethodCannot have direct instances (objects) Must be extended by subclass(es) Can implement some methods (main difference from interfaces) Abstract method Must be overridden by subclass(es) Must be inside abstract classes Non-abstract class cannot have abstract method Fall 2006 CSC311: Data Structures
27 Strong Typing Java is a strongly typed language: all variables must be typed An object can be viewed as being of various types But declared as being of only one type Fall 2006 CSC311: Data Structures
28 Generics Casting Generics Widening conversion: to superclassNarrowing conversion: to subclass Casting exception Casting with interfaces Generics Generic type: a type that is not defined at compilation time, but becomes fully specified at run time Formal type parameters Actual type parameters Fall 2006 CSC311: Data Structures
29 Generics: Example public class Pair<K, V> { … }
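The slide's Pair code was truncated in this copy; a minimal sketch of such a generic class, with hypothetical field and accessor names, illustrating formal type parameters (K, V) and actual type parameters at use:

```java
public class Pair<K, V> {
    private K key;   // field of formal type parameter K
    private V value; // field of formal type parameter V

    public Pair(K key, V value) {
        this.key = key;
        this.value = value;
    }

    public K getKey()   { return key; }
    public V getValue() { return value; }
}
```

Usage with actual type parameters: `Pair<String, Integer> p = new Pair<String, Integer>("height", 36);` — the compiler then checks that `p.getValue()` is an Integer.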
30 Arrays A set of data cells with fixed size, each cell holding a data item, all data items being the same type Purpose: to construct a linear structure to hold a given number of elements Once an array is created, its size is fixed – length Indices start from 0 Common algorithms on arrays Find max/min values Find average value Linear search Copying and cloning arrays Higher-dimensional arrays Fall 2006 CSC311: Data Structures
31 Array List ArrayList is a class – data type Purpose: to construct a linear structure to hold a flexible number of elements Methods: size() – returns the number of elements in the ArrayList: a.size() get(int i) – returns the element at index i: a.get(i) …
32 Recursion A recursive computation solves a problem by using the solution of the same problem with simpler input For recursion to terminate, there must be special cases for the simplest inputs The simplest case can be directly solved The recursive cases should call the method itself Fall 2006 CSC311: Data Structures
33 Chapter 3 Arrays, Linked Lists, and RecursionObjectives Using Arrays Singly Linked Lists Doubly Linked Lists Circularly Linked Lists Linked-List Sorting Recursion Fall 2006 CSC311: Data Structures
34 Arrays Declaration Properties Index: 0 through length − 1 Length: the number of elements Fixed length Elements could be objects Operations Insertion: add(Object e) Deletion: remove(Object e), remove(int i) Search: find(Object e) Fall 2006 CSC311: Data Structures
35 Sorting an Array Insertion-sort algorithm Selection-sort algorithmMerge-sort algorithm Bubble-sort algorithm Quick-sort algorithm Fall 2006 CSC311: Data Structures
36 java.util Methods for ArraysSimple methods equals(A, B) fill(A, x) sort(A) toString(A) Fall 2006 CSC311: Data Structures
37 Pseudo-Random Numbers Random rand = new Random(); rand.setSeed(System.currentTimeMillis()); … rand.nextInt(100); // between 0 and 99 Pseudo-random number generator seed Fall 2006 CSC311: Data Structures
38 Two-Dimensional Arrays Array of arrays Index Length Reference Assignment of arrays Matrix Higher-dimensional arrays Fall 2006 CSC311: Data Structures
39 Singly Linked Lists Linked list: a collection of nodes in a linear order Nodes Link – pointing to the next node Data Head – refers to the first node Operations: Insertion addFirst(Object) -- O(1) addLast(Object) -- O(1) with a tail reference Removal removeFirst() -- O(1) removeLast() -- O(n), since the predecessor of the last node must be found Search find(Object) -- O(n) Implementation Boundary scenarios: insert into and remove from an empty list Fall 2006 CSC311: Data Structures
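A minimal sketch of such a singly linked list; the operation names follow the slide, while the node representation and exception choice are illustrative. Note how both boundary scenarios (empty list) are handled.

```java
public class SinglyLinkedList {
    private static class Node {
        Object element;
        Node next;
        Node(Object e, Node n) { element = e; next = n; }
    }

    private Node head;   // first node, or null if empty
    private Node tail;   // last node, kept so addLast is O(1)
    private int size;

    public void addFirst(Object e) {            // O(1)
        head = new Node(e, head);
        if (tail == null) tail = head;          // boundary: list was empty
        size++;
    }

    public void addLast(Object e) {             // O(1) thanks to tail
        Node n = new Node(e, null);
        if (tail == null) head = n; else tail.next = n;
        tail = n;
        size++;
    }

    public Object removeFirst() {               // O(1)
        if (head == null) throw new IllegalStateException("empty list");
        Object e = head.element;
        head = head.next;
        if (head == null) tail = null;          // boundary: list became empty
        size--;
        return e;
    }

    public int size() { return size; }
}
```

removeLast is deliberately omitted: without a link back from the tail it requires an O(n) walk to find the predecessor, which is exactly the cost noted on the slide.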
40 Doubly Linked Lists Doubly linked list – a linked list with each node having two links pointing to its previous and next nodes respectively Node: DNode Fields: Data – element Link to previous node – prev Link to next node – next Methods: getElement() getNext() getPrev() setElement(Object) setNext(DNode) setPrev(DNode) Fall 2006 CSC311: Data Structures
41 Doubly Linked Lists Header and Trailer Sentinels Operations – at endsSeparate header and trailer One header, one trailer Integrated header and trailer One node with prev pointing to the last node and next pointing to the first node Operations – at ends Insertion Removal Operations – in the middle removal Fall 2006 CSC311: Data Structures
42 Circularly Linked ListsA linked list without head or tail Traversal means circle through all nodes Cursor: current node Operations: add(Object) – immediately after the cursor remove() – immediately after the cursor advance() – go to the next node Fall 2006 CSC311: Data Structures
43 Sorting a Linked List Insertion-sort Using a single linked list Start from the first to the current Find the appropriate position and insert Using two linked lists Remove the first from the source list Insert into the target list Selection-sort Select the maximum from the original list first Insert at the first Select the minimum from the source list Insert at the last of the target list Fall 2006 CSC311: Data Structures
44 Recursion Pattern Recursion: when a method calls itself Classic example--the factorial function: n! = 1 · 2 · 3 · ··· · (n−1) · n Recursive definition: f(n) = 1 if n = 0, and f(n) = n · f(n − 1) otherwise As a Java method: // recursive factorial function public static int recursiveFactorial(int n) { if (n == 0) return 1; // base case else return n * recursiveFactorial(n - 1); // recursive case } Fall 2006 CSC311: Data Structures
45 Linear Recursion Test for base cases. Recur once.Begin by testing for a set of base cases (there should be at least one). Every possible chain of recursive calls must eventually reach a base case, and the handling of each base case should not use recursion. Recur once. Perform a single recursive call. (This recursive step may involve a test that decides which of several possible recursive calls to make, but it should ultimately choose to make just one of these calls each time we perform this step.) Define each possible recursive call so that it makes progress towards a base case. Fall 2006 CSC311: Data Structures
46 A Simple Example of Linear Recursion Algorithm LinearSum(A, n): Input: An integer array A and an integer n ≥ 1, such that A has at least n elements Output: The sum of the first n integers in A if n = 1 then return A[0] else return LinearSum(A, n − 1) + A[n − 1] (Recursion-trace figure for LinearSum(A, 5) omitted.) Fall 2006 CSC311: Data Structures
47 Reversing an Array Algorithm ReverseArray(A, i, j):Input: An array A and nonnegative integer indices i and j Output: The reversal of the elements in A starting at index i and ending at j if i < j then Swap A[i] and A[ j] ReverseArray(A, i + 1, j - 1) return Fall 2006 CSC311: Data Structures
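A direct Java translation of ReverseArray; the class and method names are illustrative. Each call swaps the two ends and recurs inward, so both i and j make progress towards the base case i ≥ j.

```java
public class ReverseDemo {
    // Reverse a[i..j] in place, exactly following ReverseArray(A, i, j).
    public static void reverse(int[] a, int i, int j) {
        if (i < j) {
            int tmp = a[i];        // swap A[i] and A[j]
            a[i] = a[j];
            a[j] = tmp;
            reverse(a, i + 1, j - 1);
        }
        // base case (i >= j): nothing to do
    }
}
```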
48 Defining Arguments for RecursionIn creating recursive methods, it is important to define the methods in ways that facilitate recursion. This sometimes requires we define additional parameters that are passed to the method. For example, we defined the array reversal method as ReverseArray(A, i, j), not ReverseArray(A). Fall 2006 CSC311: Data Structures
49 Computing Powers The power function p(x, n) = x^n can be defined recursively: p(x, n) = 1 if n = 0, and p(x, n) = x · p(x, n − 1) otherwise This leads to a power function that runs in O(n) time (for we make n recursive calls) We can do better than this, however Fall 2006 CSC311: Data Structures
50 Recursive Squaring We can derive a more efficient linearly recursive algorithm by using repeated squaring: For example, 2^4 = (2^(4/2))^2 = (2^2)^2 = 4^2 = 16 2^5 = 2 · (2^(4/2))^2 = 2 · (2^2)^2 = 2 · (4^2) = 32 2^6 = (2^(6/2))^2 = (2^3)^2 = 8^2 = 64 2^7 = 2 · (2^(6/2))^2 = 2 · (2^3)^2 = 2 · (8^2) = 128 Fall 2006 CSC311: Data Structures
51 A Recursive Squaring Method Algorithm Power(x, n): Input: A number x and integer n ≥ 0 Output: The value x^n if n = 0 then return 1 if n is odd then y = Power(x, (n − 1)/2) return x · y · y else y = Power(x, n/2) return y · y Fall 2006 CSC311: Data Structures
52 Analyzing the Recursive Squaring Method Algorithm Power(x, n): Input: A number x and integer n ≥ 0 Output: The value x^n if n = 0 then return 1 if n is odd then y = Power(x, (n − 1)/2) return x · y · y else y = Power(x, n/2) return y · y Each time we make a recursive call we halve the value of n; hence, we make log n recursive calls. That is, this method runs in O(log n) time. It is important that we used a variable twice here rather than calling the method twice. Fall 2006 CSC311: Data Structures
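The Power algorithm, sketched in Java; long arithmetic and the names are assumptions. The single recursive call whose result `y` is reused is what keeps the call count at O(log n).

```java
public class PowerDemo {
    // Computes x^n in O(log n) time: n is halved on every recursive call.
    public static long power(long x, int n) {
        if (n == 0) return 1;                  // base case
        if (n % 2 == 1) {
            long y = power(x, (n - 1) / 2);    // one call; reuse y, don't call twice
            return x * y * y;
        } else {
            long y = power(x, n / 2);
            return y * y;
        }
    }
}
```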
53 Tail Recursion Tail recursion occurs when a linearly recursive method makes its recursive call as its last step. The array reversal method is an example. Such methods can be easily converted to non-recursive methods (which saves on some resources). Example: Algorithm IterativeReverseArray(A, i, j ): Input: An array A and nonnegative integer indices i and j Output: The reversal of the elements in A starting at index i and ending at j while i < j do Swap A[i ] and A[ j ] i = i + 1 j = j - 1 return Fall 2006 CSC311: Data Structures
54 Binary Recursive Method Binary recursion occurs whenever there are two recursive calls for each non-base case. Example problem: add all the numbers in an integer array A: Algorithm BinarySum(A, i, n): Input: An array A and integers i and n Output: The sum of the n integers in A starting at index i if n = 1 then return A[i] return BinarySum(A, i, ⌈n/2⌉) + BinarySum(A, i + ⌈n/2⌉, ⌊n/2⌋) (Example trace on the array 3, 1, 2, 4, 8, 7, 6, 5 omitted.) Fall 2006 CSC311: Data Structures
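A Java sketch of BinarySum; names are illustrative. The left call takes the ceiling of n/2 elements and the right call the remaining floor, matching the algorithm.

```java
public class BinarySumDemo {
    // Sum of the n elements of a starting at index i, via two recursive calls.
    public static int binarySum(int[] a, int i, int n) {
        if (n == 1) return a[i];                // base case: a single element
        int half = (n + 1) / 2;                 // ceiling of n/2
        return binarySum(a, i, half)            // left half: ceil(n/2) elements
             + binarySum(a, i + half, n - half); // right half: floor(n/2) elements
    }
}
```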
55 Computing Fibonacci Numbers Fibonacci numbers are defined recursively: F_0 = 0 F_1 = 1 F_i = F_(i-1) + F_(i-2) for i > 1. As a recursive algorithm (first attempt): Algorithm BinaryFib(k): Input: Nonnegative integer k Output: The kth Fibonacci number F_k if k <= 1 then return k else return BinaryFib(k - 1) + BinaryFib(k - 2) Fall 2006 CSC311: Data Structures
56 Analyzing the Binary Recursion Fibonacci Algorithm Let n_k denote the number of recursive calls made by BinaryFib(k). Then n_0 = 1 n_1 = 1 n_2 = n_1 + n_0 + 1 = 1 + 1 + 1 = 3 n_3 = n_2 + n_1 + 1 = 3 + 1 + 1 = 5 n_4 = n_3 + n_2 + 1 = 5 + 3 + 1 = 9 n_5 = n_4 + n_3 + 1 = 9 + 5 + 1 = 15 n_6 = n_5 + n_4 + 1 = 15 + 9 + 1 = 25 n_7 = n_6 + n_5 + 1 = 25 + 15 + 1 = 41 n_8 = n_7 + n_6 + 1 = 41 + 25 + 1 = 67. Note that n_k at least doubles for every other value of k. That is, n_k > 2^(k/2). It is exponential! Fall 2006 CSC311: Data Structures
57 A Better Fibonacci AlgorithmUse linear recursion instead: Algorithm LinearFibonacci(k): Input: A nonnegative integer k Output: Pair of Fibonacci numbers (Fk, Fk-1) if k <= 1 then return (k, 0) else (i, j) = LinearFibonacci(k - 1) return (i +j, i) Runs in O(k) time. Fall 2006 CSC311: Data Structures
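A Java sketch of LinearFibonacci; representing the pair (F_k, F_(k-1)) as a two-element long array is an assumption, as are the names. One recursive call per level gives the O(k) running time.

```java
public class FibDemo {
    // Returns {F_k, F_(k-1)}; a single recursive call per level -> O(k) time.
    public static long[] linearFib(int k) {
        if (k <= 1) return new long[]{k, 0};   // base cases: (F_1, F_0) or (F_0, ...)
        long[] p = linearFib(k - 1);           // p = {F_(k-1), F_(k-2)}
        return new long[]{p[0] + p[1], p[0]};  // (F_k, F_(k-1))
    }
}
```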
58 Multiple Recursion Motivating example: summation puzzlespot + pan = bib dog + cat = pig boy + girl = baby Multiple recursion: makes potentially many recursive calls (not just one or two). Fall 2006 CSC311: Data Structures
59 Algorithm for Multiple RecursionAlgorithm PuzzleSolve(k,S,U): Input: An integer k, sequence S, and set U (the universe of elements to test) Output: An enumeration of all k-length extensions to S using elements in U without repetitions for all e in U do Remove e from U {e is now being used} Add e to the end of S if k = 1 then Test whether S is a configuration that solves the puzzle if S solves the puzzle then return “Solution found: ” S else PuzzleSolve(k - 1, S,U) Add e back to U {e is now unused} Remove e from the end of S Fall 2006 CSC311: Data Structures
60 Visualizing PuzzleSolve(3, (), {a, b, c}) (Recursion-tree figure omitted: the initial call branches on a, b, c; the second level yields ab, ac, ba, bc, ca, cb; the leaves are abc, acb, bac, bca, cab, cba.) Fall 2006 CSC311: Data Structures
61 Chapter 4 Analysis Tools Objectives Experimental analysis of algorithms and its limitations Theoretical analysis of algorithms Pseudo-code description of algorithms Big-Oh notation Seven functions Proof techniques Fall 2006 CSC311: Data Structures
62 Analysis of Algorithms An algorithm is a step-by-step procedure for solving a problem in a finite amount of time. (Diagram: Input → Algorithm → Output.)
63 Running Time Most algorithms transform input objects into output objects. The running time of an algorithm typically grows with the input size. Average case time is often difficult to determine. We focus on the worst case running time. Easier to analyze Crucial to applications such as games, finance and robotics Fall 2006 CSC311: Data Structures
64 Experimental Studies Write a program implementing the algorithmRun the program with inputs of varying size and composition Use a method like System.currentTimeMillis() to get an accurate measure of the actual running time Plot the results Fall 2006 CSC311: Data Structures
65 Limitations of ExperimentsIt is necessary to implement the algorithm, which may be difficult Results may not be indicative of the running time on other inputs not included in the experiment. In order to compare two algorithms, the same hardware and software environments must be used Fall 2006 CSC311: Data Structures
66 Theoretical Analysis Uses a high-level description of the algorithm instead of an implementation Characterizes running time as a function of the input size, n. Takes into account all possible inputs Allows us to evaluate the speed of an algorithm independent of the hardware/software environment Fall 2006 CSC311: Data Structures
67 Pseudocode Example: find the max element of an array Algorithm arrayMax(A, n) Input array A of n integers Output maximum element of A currentMax ← A[0] for i ← 1 to n − 1 do if A[i] > currentMax then currentMax ← A[i] return currentMax High-level description of an algorithm More structured than English prose Less detailed than a program Preferred notation for describing algorithms Hides program design issues Fall 2006 CSC311: Data Structures
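The arrayMax pseudocode translates directly to Java; a minimal sketch with an illustrative class name:

```java
public class ArrayMaxDemo {
    // Java translation of the arrayMax(A, n) pseudocode.
    public static int arrayMax(int[] a) {
        int currentMax = a[0];                 // currentMax <- A[0]
        for (int i = 1; i < a.length; i++) {   // for i <- 1 to n - 1
            if (a[i] > currentMax) {
                currentMax = a[i];
            }
        }
        return currentMax;
    }
}
```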
68 Pseudocode Details Control flow if … then … [else …] while … do … repeat … until … for … do … Indentation replaces braces Method declaration Algorithm method (arg [, arg…]) Input … Output … Method call var.method (arg [, arg…]) Return value return expression Expressions Assignment (←, like = in Java) Equality testing (=, like == in Java) n^2 Superscripts and other mathematical formatting allowed Fall 2006 CSC311: Data Structures
69 The Random Access Machine (RAM) Model A CPU A potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character Memory cells are numbered (0, 1, 2, …) and accessing any cell in memory takes unit time. Fall 2006 CSC311: Data Structures
70 Seven Important Functions Seven functions that often appear in algorithm analysis: Constant 1 Logarithmic log n Linear n N-Log-N n log n Quadratic n^2 Cubic n^3 Exponential 2^n In a log-log chart, the slope of the line corresponds to the growth rate of the function Fall 2006 CSC311: Data Structures
71 Primitive Operations Basic computations performed by an algorithmIdentifiable in pseudocode Largely independent from the programming language Exact definition not important (we will see why later) Assumed to take a constant amount of time in the RAM model Examples: Evaluating an expression Assigning a value to a variable Indexing into an array Calling a method Returning from a method Fall 2006 CSC311: Data Structures
72 Counting Primitive Operations By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size Algorithm arrayMax(A, n) # operations currentMax ← A[0] 2 for i ← 1 to n − 1 do 2n if A[i] > currentMax then 2(n − 1) currentMax ← A[i] 2(n − 1) { increment counter i } 2(n − 1) return currentMax 1 Total 8n − 2 Fall 2006 CSC311: Data Structures
73 Estimating Running Time Algorithm arrayMax executes 8n − 2 primitive operations in the worst case. Define: a = Time taken by the fastest primitive operation b = Time taken by the slowest primitive operation Let T(n) be the worst-case time of arrayMax. Then a(8n − 2) ≤ T(n) ≤ b(8n − 2) Hence, the running time T(n) is bounded by two linear functions Fall 2006 CSC311: Data Structures
74 Growth Rate of Running TimeChanging the hardware/ software environment Affects T(n) by a constant factor, but Does not alter the growth rate of T(n) The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax Fall 2006 CSC311: Data Structures
75 Constant Factors The growth rate is not affected by constant factors or lower-order terms Examples 10^2·n + 10^5 is a linear function 10^5·n^2 + 10^8·n is a quadratic function Fall 2006 CSC311: Data Structures
76 Big-Oh Notation Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n0 such that f(n) ≤ c·g(n) for n ≥ n0 Example: 2n + 10 is O(n) 2n + 10 ≤ cn (c − 2)n ≥ 10 n ≥ 10/(c − 2) Pick c = 3 and n0 = 10 Fall 2006 CSC311: Data Structures
77 Big-Oh Example Example: the function n^2 is not O(n) n^2 ≤ cn implies n ≤ c The above inequality cannot be satisfied for all n since c must be a constant Fall 2006 CSC311: Data Structures
78 More Big-Oh Examples 7n − 2 is O(n): need c > 0 and n0 ≥ 1 such that 7n − 2 ≤ c·n for n ≥ n0; this is true for c = 7 and n0 = 1 3n^3 + 20n^2 + 5 is O(n^3): need c > 0 and n0 ≥ 1 such that 3n^3 + 20n^2 + 5 ≤ c·n^3 for n ≥ n0; this is true for c = 4 and n0 = 21 3 log n + 5 is O(log n): need c > 0 and n0 ≥ 1 such that 3 log n + 5 ≤ c·log n for n ≥ n0; this is true for c = 8 and n0 = 2 Fall 2006 CSC311: Data Structures
79 Big-Oh and Growth Rate The big-Oh notation gives an upper bound on the growth rate of a function The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n) We can use the big-Oh notation to rank functions according to their growth rate: if g(n) grows more, then f(n) is O(g(n)) but g(n) is not O(f(n)); if f(n) grows more, then g(n) is O(f(n)) but f(n) is not O(g(n)); if they have the same growth, each is big-Oh of the other Fall 2006 CSC311: Data Structures
80 Big-Oh Rules If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e., Drop lower-order terms Drop constant factors Use the smallest possible class of functions Say “2n is O(n)” instead of “2n is O(n^2)” Use the simplest expression of the class Say “3n + 5 is O(n)” instead of “3n + 5 is O(3n)” Fall 2006 CSC311: Data Structures
81 Asymptotic Algorithm Analysis The asymptotic analysis of an algorithm determines the running time in big-Oh notation To perform the asymptotic analysis We find the worst-case number of primitive operations executed as a function of the input size We express this function with big-Oh notation Example: We determine that algorithm arrayMax executes at most 8n − 2 primitive operations We say that algorithm arrayMax “runs in O(n) time” Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations Fall 2006 CSC311: Data Structures
82 Computing Prefix AveragesWe further illustrate asymptotic analysis with two algorithms for prefix averages The i-th prefix average of an array X is average of the first (i + 1) elements of X: A[i] = (X[0] + X[1] + … + X[i])/(i+1) Computing the array A of prefix averages of another array X has applications to financial analysis Fall 2006 CSC311: Data Structures
83 Prefix Averages (Quadratic) The following algorithm computes prefix averages in quadratic time by applying the definition Algorithm prefixAverages1(X, n) Input array X of n integers Output array A of prefix averages of X #operations A ← new array of n integers n for i ← 0 to n − 1 do n s ← X[0] n for j ← 1 to i do 1 + 2 + … + (n − 1) s ← s + X[j] 1 + 2 + … + (n − 1) A[i] ← s / (i + 1) n return A 1 Fall 2006 CSC311: Data Structures
84 Arithmetic Progression The running time of prefixAverages1 is O(1 + 2 + … + n) The sum of the first n integers is n(n + 1)/2 There is a simple visual proof of this fact Thus, algorithm prefixAverages1 runs in O(n^2) time Fall 2006 CSC311: Data Structures
85 Prefix Averages (Linear) The following algorithm computes prefix averages in linear time by keeping a running sum Algorithm prefixAverages2(X, n) Input array X of n integers Output array A of prefix averages of X #operations A ← new array of n integers n s ← 0 1 for i ← 0 to n − 1 do n s ← s + X[i] n A[i] ← s / (i + 1) n return A 1 Algorithm prefixAverages2 runs in O(n) time Fall 2006 CSC311: Data Structures
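Both prefix-average algorithms, sketched in Java with names following the pseudocode; the double return type is an assumption. The quadratic version recomputes each prefix sum from scratch, while the linear one carries a running sum.

```java
public class PrefixAverages {
    // Quadratic: for each i, re-sum X[0..i] from scratch -> O(n^2).
    public static double[] prefixAverages1(int[] x) {
        double[] a = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double s = 0;
            for (int j = 0; j <= i; j++) s += x[j];
            a[i] = s / (i + 1);
        }
        return a;
    }

    // Linear: keep a running sum s across iterations -> O(n).
    public static double[] prefixAverages2(int[] x) {
        double[] a = new double[x.length];
        double s = 0;
        for (int i = 0; i < x.length; i++) {
            s += x[i];
            a[i] = s / (i + 1);
        }
        return a;
    }
}
```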
86 Math you need to Review Summations Logarithms: log_b(xy) = log_b x + log_b y log_b(x^a) = a·log_b x log_b a = log_x a / log_x b Exponentials: a^(b+c) = a^b · a^c (a^b)^c = a^(bc) a^b / a^c = a^(b−c) b = a^(log_a b) b^c = a^(c·log_a b) Fall 2006 CSC311: Data Structures
87 Proof Techniques By counterexample Contrapositive Ex. Let a and b be integers. If ab is even, then a is even or b is even Proof: Consider the contrapositive: if a is odd and b is odd, then ab is odd Contradiction Ex. Let a and b be integers. If ab is odd, then a is odd and b is odd Proof: Assume the opposite of the “then” part; you’ll reach a contradiction where ab is even Induction Two formats Base case Induction case Loop invariants Fall 2006 CSC311: Data Structures
88 Relatives of Big-Oh big-Omega f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≥ c·g(n) for n ≥ n0 big-Theta f(n) is Θ(g(n)) if there are constants c′ > 0 and c″ > 0 and an integer constant n0 ≥ 1 such that c′·g(n) ≤ f(n) ≤ c″·g(n) for n ≥ n0 Fall 2006 CSC311: Data Structures
89 Intuition for Asymptotic Notation Big-Oh f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n) big-Omega f(n) is Ω(g(n)) if f(n) is asymptotically greater than or equal to g(n) big-Theta f(n) is Θ(g(n)) if f(n) is asymptotically equal to g(n) Fall 2006 CSC311: Data Structures
90 Example Uses of the Relatives of Big-Oh 5n^2 is Ω(n^2): f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≥ c·g(n) for n ≥ n0; let c = 5 and n0 = 1 5n^2 is Ω(n): f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≥ c·g(n) for n ≥ n0; let c = 1 and n0 = 1 5n^2 is Θ(n^2): f(n) is Θ(g(n)) if it is Ω(n^2) and O(n^2). We have already seen the former; for the latter recall that f(n) is O(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≤ c·g(n) for n ≥ n0. Let c = 5 and n0 = 1 Fall 2006 CSC311: Data Structures
91 Chapter 5 Stacks and QueuesObjectives Stacks and implementations Queues and implementations Double-ended queues and implementation Stacks and Queues CSC311: Data Structures
92 Abstract Data Types (ADTs)An abstract data type (ADT) is an abstraction of a data structure An ADT specifies: Data stored Operations on the data Error conditions associated with operations Stacks and Queues CSC311: Data Structures
93 ADT Example Example: ADT modeling a simple stock trading systemThe data stored are buy/sell orders The operations supported are order buy(stock, shares, price) order sell(stock, shares, price) void cancel(order) Error conditions: Buy/sell a nonexistent stock Cancel a nonexistent order Stacks and Queues CSC311: Data Structures
94 The Stack ADT The Stack ADT stores arbitrary objectsInsertions and deletions follow the last-in first-out scheme Think of a spring-loaded plate dispenser Main stack operations: push(object): inserts an element object pop(): removes and returns the last inserted element Auxiliary stack operations: object top(): returns the last inserted element without removing it integer size(): returns the number of elements stored boolean isEmpty(): indicates whether no elements are stored Stacks and Queues CSC311: Data Structures
95 Stack Interface in Javapublic interface Stack { public int size(); public boolean isEmpty(); public Object top() throws EmptyStackException; public void push(Object o); public Object pop() throws EmptyStackException; } Java interface corresponding to our Stack ADT Requires the definition of class EmptyStackException Different from the built-in Java class java.util.Stack Stacks and Queues CSC311: Data Structures
96 Exceptions Attempting the execution of an operation of an ADT may sometimes cause an error condition, called an exception Exceptions are said to be “thrown” by an operation that cannot be executed In the Stack ADT, operations pop and top cannot be performed if the stack is empty Attempting the execution of pop or top on an empty stack throws an EmptyStackException
97 Applications of StacksDirect applications Page-visited history in a Web browser Undo sequence in a text editor Chain of method calls in the Java Virtual Machine Indirect applications Auxiliary data structure for algorithms Component of other data structures Stacks and Queues CSC311: Data Structures
98 Method Stack in the JVM The Java Virtual Machine (JVM) keeps track of the chain of active methods with a stack When a method is called, the JVM pushes on the stack a frame containing Local variables and return value Program counter, keeping track of the statement being executed When a method ends, its frame is popped from the stack and control is passed to the method on top of the stack Allows for recursion
main() { int i = 5; foo(i); }
foo(int j) { int k; k = j+1; bar(k); }
bar(int m) { … }
Method stack during the call to bar, top to bottom: bar (PC = 1, m = 6); foo (PC = 3, j = 5, k = 6); main (PC = 2, i = 5)
99 Array-based Stack A simple way of implementing the Stack ADT uses an array S We add elements from left to right A variable t keeps track of the index of the top element
Algorithm size()
  return t + 1
Algorithm pop()
  if isEmpty() then
    throw EmptyStackException
  else
    t ← t − 1
    return S[t + 1]
100 Array-based Stack (cont.) The array storing the stack elements may become full A push operation will then throw a FullStackException Limitation of the array-based implementation Not intrinsic to the Stack ADT
Algorithm push(o)
  if t = S.length − 1 then
    throw FullStackException
  else
    t ← t + 1
    S[t] ← o
101 Performance and LimitationsLet n be the number of elements in the stack The space used is O(n) Each operation runs in time O(1) Limitations The maximum size of the stack must be defined a priori and cannot be changed Trying to push a new element into a full stack causes an implementation-specific exception Stacks and Queues CSC311: Data Structures
102 Array-based Stack in Java
public class ArrayStack implements Stack {
  // holds the stack elements
  private Object S[];
  // index to top element
  private int top = -1;
  // constructor
  public ArrayStack(int capacity) {
    S = new Object[capacity];
  }
  public Object pop() throws EmptyStackException {
    if (isEmpty())
      throw new EmptyStackException("Empty stack: cannot pop");
    Object temp = S[top];
    // facilitates garbage collection
    S[top] = null;
    top = top - 1;
    return temp;
  }
  ……
}
103 Parentheses Matching Each “(”, “{”, or “[” must be paired with a matching “)”, “}”, or “]” correct: ( )(( )){([( )])} correct: ((( )(( )){([( )])})) incorrect: )(( )){([( )])} incorrect: ({[ ])} incorrect: (
104 Parentheses Matching Algorithm
Algorithm ParenMatch(X, n):
  Input: An array X of n tokens, each of which is either a grouping symbol, a variable, an arithmetic operator, or a number
  Output: true if and only if all the grouping symbols in X match
  Let S be an empty stack
  for i ← 0 to n − 1 do
    if X[i] is an opening grouping symbol then
      S.push(X[i])
    else if X[i] is a closing grouping symbol then
      if S.isEmpty() then
        return false {nothing to match with}
      if S.pop() does not match the type of X[i] then
        return false {wrong type}
  if S.isEmpty() then
    return true {every symbol matched}
  else
    return false {some symbols were never matched}
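The ParenMatch pseudocode above can be sketched in plain Java using java.util.ArrayDeque as the stack; the class and method names here are illustrative, not part of the course's datastructures package.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of the ParenMatch algorithm using a Deque as the stack. */
public class ParenMatch {
    public static boolean isMatched(String expression) {
        String opening = "({[";
        String closing = ")}]";
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : expression.toCharArray()) {
            if (opening.indexOf(c) >= 0) {
                stack.push(c);                     // opening symbol: push it
            } else if (closing.indexOf(c) >= 0) {
                if (stack.isEmpty())
                    return false;                  // nothing to match with
                if (opening.indexOf(stack.pop()) != closing.indexOf(c))
                    return false;                  // wrong type
            }                                      // non-grouping tokens are ignored
        }
        return stack.isEmpty();                    // false if symbols were never matched
    }

    public static void main(String[] args) {
        System.out.println(isMatched("( )(( )){([( )])}")); // prints true
        System.out.println(isMatched("({[ ])}"));           // prints false
    }
}
```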
105 HTML Tag Matching In an HTML document, text is delimited by tags such as <body>, <h1>, and <p> For fully-correct HTML, each opening tag <name> must be paired with a matching closing tag </name>
106 Tag Matching Algorithm Is similar to parentheses matching:
import java.util.StringTokenizer;
import datastructures.Stack;
import datastructures.NodeStack;
import java.io.*;
/** Simplified test of matching tags in an HTML document. */
public class HTML {
  /** Nested class to store simple HTML tags */
  public static class Tag {
    String name;      // The name of this tag
    boolean opening;  // Is true iff this is an opening tag
    public Tag() {    // Default constructor
      name = "";
      opening = false;
    }
    public Tag(String nm, boolean op) { // Preferred constructor
      name = nm;
      opening = op;
    }
    /** Is this an opening tag? */
    public boolean isOpening() { return opening; }
    /** Return the name of this tag */
    public String getName() { return name; }
  }
  /** Test if every opening tag has a matching closing tag. */
  public boolean isHTMLMatched(Tag[] tag) {
    Stack S = new NodeStack(); // Stack for matching tags
    for (int i = 0; (i < tag.length) && (tag[i] != null); i++) {
      if (tag[i].isOpening())
        S.push(tag[i].getName()); // opening tag: push its name
      else {
        if (S.isEmpty()) return false;  // nothing to match
        if (!((String) S.pop()).equals(tag[i].getName()))
          return false;                 // wrong match
      }
    }
    if (S.isEmpty()) return true; // every tag matched
    return false;                 // some tags never matched
  }
107 Tag Matching Algorithm, cont.
  public final static int CAPACITY = 1000; // Tag array size upper bound
  /* Parse an HTML document into an array of html tags */
  public Tag[] parseHTML(BufferedReader r) throws IOException {
    String line;                    // a line of text
    boolean inTag = false;          // true iff we are in a tag
    Tag[] tag = new Tag[CAPACITY];  // our tag array (initially all null)
    int count = 0;                  // tag counter
    while ((line = r.readLine()) != null) {
      // Create a string tokenizer for HTML tags (use < and > as delimiters)
      StringTokenizer st = new StringTokenizer(line, "<> \t", true);
      while (st.hasMoreTokens()) {
        String token = (String) st.nextToken();
        if (token.equals("<"))        // opening a new HTML tag
          inTag = true;
        else if (token.equals(">"))   // ending an HTML tag
          inTag = false;
        else if (inTag) {             // we have an opening or closing HTML tag
          if ((token.length() == 0) || (token.charAt(0) != '/'))
            tag[count++] = new Tag(token, true);               // opening tag
          else                                                 // ending tag
            tag[count++] = new Tag(token.substring(1), false); // skip the '/'
        } // Note: we ignore anything not in an HTML tag
      }
    }
    return tag; // our array of tags
  }
  /** Tester method */
  public static void main(String[] args) throws IOException {
    BufferedReader stdr; // Standard Input Reader
    stdr = new BufferedReader(new InputStreamReader(System.in));
    HTML tagChecker = new HTML();
    if (tagChecker.isHTMLMatched(tagChecker.parseHTML(stdr)))
      System.out.println("The input file is a matched HTML document.");
    else
      System.out.println("The input file is not a matched HTML document.");
  }
}
108 Computing Spans We show how to use a stack as an auxiliary data structure in an algorithm Given an array X, the span S[i] of X[i] is the maximum number of consecutive elements X[j] immediately preceding X[i] and such that X[j] ≤ X[i] Spans have applications to financial analysis E.g., stock at 52-week high
X: 6 3 4 5 2 1
S: 1 1 2 3 1 1
109 Quadratic Algorithm
Algorithm spans1(X, n)
  Input: array X of n integers
  Output: array S of spans of X
  S ← new array of n integers               {n}
  for i ← 0 to n − 1 do                     {n}
    s ← 1                                   {n}
    while s ≤ i and X[i − s] ≤ X[i] do      {1 + 2 + … + (n − 1)}
      s ← s + 1                             {1 + 2 + … + (n − 1)}
    S[i] ← s                                {n}
  return S                                  {1}
Algorithm spans1 runs in O(n²) time
110 Computing Spans with a Stack We keep in a stack the indices of the elements visible when “looking back” We scan the array from left to right Let i be the current index We pop indices from the stack until we find an index j such that X[i] < X[j] We set S[i] ← i − j (or S[i] ← i + 1 if the stack became empty) We push i onto the stack
111 Linear Algorithm Each index of the array Is pushed into the stack exactly once Is popped from the stack at most once The statements in the while-loop are executed at most n times Algorithm spans2 runs in O(n) time
Algorithm spans2(X, n)
  S ← new array of n integers
  A ← new empty stack
  for i ← 0 to n − 1 do
    while (¬A.isEmpty() and X[A.top()] ≤ X[i]) do
      A.pop()
    if A.isEmpty() then
      S[i] ← i + 1
    else
      S[i] ← i − A.top()
    A.push(i)
  return S
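The linear-time algorithm above can be sketched directly in Java, with java.util.ArrayDeque playing the role of the stack A of indices; the class name is illustrative.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Sketch of the linear-time span algorithm (spans2) using a stack of indices. */
public class Spans {
    public static int[] spans2(int[] x) {
        int n = x.length;
        int[] s = new int[n];
        Deque<Integer> a = new ArrayDeque<>(); // indices visible when "looking back"
        for (int i = 0; i < n; i++) {
            // pop indices j with X[j] <= X[i]; they are hidden by X[i]
            while (!a.isEmpty() && x[a.peek()] <= x[i])
                a.pop();
            s[i] = a.isEmpty() ? i + 1 : i - a.peek();
            a.push(i);
        }
        return s;
    }

    public static void main(String[] args) {
        // the example array X = (6, 3, 4, 5, 2, 1) has spans (1, 1, 2, 3, 1, 1)
        System.out.println(Arrays.toString(spans2(new int[]{6, 3, 4, 5, 2, 1})));
    }
}
```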
112 The Queue ADT Auxiliary queue operations: ExceptionsThe Queue ADT stores arbitrary objects Insertions and deletions follow the first-in first-out scheme Insertions are at the rear of the queue and removals are at the front of the queue Main queue operations: enqueue(object): inserts an element at the end of the queue object dequeue(): removes and returns the element at the front of the queue Auxiliary queue operations: object front(): returns the element at the front without removing it integer size(): returns the number of elements stored boolean isEmpty(): indicates whether no elements are stored Exceptions Attempting the execution of dequeue or front on an empty queue throws an EmptyQueueException Stacks and Queues CSC311: Data Structures
113 Queue Example Operation Output Q enqueue(5) – (5) enqueue(3) – (5, 3)dequeue() 5 (3) enqueue(7) – (3, 7) dequeue() 3 (7) front() 7 (7) dequeue() 7 () dequeue() “error” () isEmpty() true () enqueue(9) – (9) enqueue(7) – (9, 7) size() 2 (9, 7) enqueue(3) – (9, 7, 3) enqueue(5) – (9, 7, 3, 5) dequeue() 9 (7, 3, 5) Stacks and Queues CSC311: Data Structures
114 Applications of QueuesDirect applications Waiting lists, bureaucracy Access to shared resources (e.g., printer) Multiprogramming Indirect applications Auxiliary data structure for algorithms Component of other data structures Stacks and Queues CSC311: Data Structures
115 Array-based Queue Use an array of size N in a circular fashion Two variables keep track of the front and rear f: index of the front element r: index immediately past the rear element Array location r is kept empty Normal configuration: the elements occupy Q[f] … Q[r − 1] Wrapped-around configuration: the elements occupy Q[f] … Q[N − 1] followed by Q[0] … Q[r − 1]
116 Queue Operations We use the modulo operator (remainder of division)
Algorithm size()
  return (N − f + r) mod N
Algorithm isEmpty()
  return (f = r)
117 Queue Operations (cont.) Operation enqueue throws an exception if the array is full This exception is implementation-dependent
Algorithm enqueue(o)
  if size() = N − 1 then
    throw FullQueueException
  else
    Q[r] ← o
    r ← (r + 1) mod N
118 Queue Operations (cont.) Operation dequeue throws an exception if the queue is empty This exception is specified in the queue ADT
Algorithm dequeue()
  if isEmpty() then
    throw EmptyQueueException
  else
    o ← Q[f]
    f ← (f + 1) mod N
    return o
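The circular-array operations above can be sketched as a small Java class; for self-containment this sketch throws the standard IllegalStateException instead of the course's FullQueueException and EmptyQueueException, and the class name is illustrative.

```java
/** Sketch of the circular array-based queue (one array slot is kept empty). */
public class ArrayQueue {
    private Object[] q;
    private int f = 0; // index of the front element
    private int r = 0; // index immediately past the rear element

    public ArrayQueue(int capacity) {
        q = new Object[capacity + 1]; // extra slot so that f = r means empty
    }

    public int size() { return (q.length - f + r) % q.length; }

    public boolean isEmpty() { return f == r; }

    public void enqueue(Object o) {
        if (size() == q.length - 1)
            throw new IllegalStateException("queue is full");
        q[r] = o;
        r = (r + 1) % q.length; // wrap around
    }

    public Object dequeue() {
        if (isEmpty())
            throw new IllegalStateException("queue is empty");
        Object o = q[f];
        q[f] = null; // facilitates garbage collection
        f = (f + 1) % q.length;
        return o;
    }

    public static void main(String[] args) {
        ArrayQueue queue = new ArrayQueue(3);
        queue.enqueue(5);
        queue.enqueue(3);
        System.out.println(queue.dequeue()); // prints 5 (first in, first out)
    }
}
```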
119 Queue Interface in Java Java interface corresponding to our Queue ADT Requires the definition of class EmptyQueueException No corresponding built-in Java class
public interface Queue {
  public int size();
  public boolean isEmpty();
  public Object front() throws EmptyQueueException;
  public void enqueue(Object o);
  public Object dequeue() throws EmptyQueueException;
}
120 Application: Round Robin Schedulers We can implement a round robin scheduler using a queue, Q, by repeatedly performing the following steps: 1. e = Q.dequeue() 2. Service element e 3. Q.enqueue(e)
121 Chapter 6 Lists and IteratorsObjectives Array Lists and Vectors: ADT and Implementation Node Lists: ADT and Implementations Iterators: ADT and Implementation List ADTs and Collections Design Patterns Adaptor Pattern Position Pattern Iterator Pattern Lists and Iterators CSC311: Data Structures
122 The Array List (Vector) ADT The Array List ADT extends the notion of array by storing a sequence of arbitrary objects An element can be accessed, inserted or removed by specifying its rank (number of elements preceding it) An exception is thrown if an incorrect rank is specified (e.g., a negative rank) Main Array List operations: object get(int r): returns the element at rank r without removing it object set(int r, Object o): replaces the element at rank r with o and returns the old element add(int r, Object o): inserts a new element o to have rank r object remove(int r): removes and returns the element at rank r Additional operations size() and isEmpty()
123 Applications of Array ListsDirect applications Sorted collection of objects (elementary database) Indirect applications Auxiliary data structure for algorithms Component of other data structures Lists and Iterators CSC311: Data Structures
124 Adaptor Pattern Adaptor Pattern: adapting an existing class to a new class Contains an instance of the existing class as an instance field, and implements each method of the new class by calling a method of the instance field object Adapt Array List ADT to Deque ADT
class Deque {
  private ArrayList al;
  public Deque() { al = new ArrayList(); }
  public Object getFirst() { return al.get(0); }
  public void addFirst(Object e) { al.add(0, e); }
  public Object removeFirst() { return al.remove(0); }
  …
}
125 Array-based Implementation Use an array V of size N A variable n keeps track of the size of the array list (number of elements stored) Operation get(r) is implemented in O(1) time by returning V[r]
126 Insertion In operation add(r, o), we need to make room for the new element by shifting forward the n − r elements V[r], …, V[n − 1] In the worst case (r = 0), this takes O(n) time
Algorithm add(r, o)
  for t ← n − 1 downto r do
    V[t + 1] ← V[t]
  V[r] ← o
  n ← n + 1
127 Deletion In operation remove(r), we need to fill the hole left by the removed element by shifting backward the n − r − 1 elements V[r + 1], …, V[n − 1] In the worst case (r = 0), this takes O(n) time
Algorithm remove(r)
  e ← V[r]
  for t ← r to n − 2 do
    V[t] ← V[t + 1]
  n ← n − 1
  return e
128 Performance In the array-based implementation of an Array List The space used by the data structure is O(n) size, isEmpty, get and set run in O(1) time add and remove run in O(n) time If we use the array in a circular fashion, add(0) and remove(0) run in O(1) time In an add operation, when the array is full, instead of throwing an exception, we can replace the array with a larger one
129 Growable Array-based Implementation Consider an array list implemented with an array S In an add operation, when the array is full, instead of throwing an exception, we can replace the array with a larger one How large should the new array be? incremental strategy: increase the size by a constant c doubling strategy: double the size
Algorithm add(r, o)
  if n = S.length then
    A ← new array of size …
    for j ← 0 to r − 1 do
      A[j] ← S[j]
    A[r] ← o
    for j ← r + 1 to n do
      A[j] ← S[j − 1]
    S ← A
    n ← n + 1
130 Comparison of the StrategiesWe compare the incremental strategy and the doubling strategy by analyzing the total time T(n) needed to perform a series of n add operations We assume that we start with an empty array list represented by an array of size 1 We call amortized time of an add operation the average time taken by an add over the series of operations, i.e., T(n)/n Lists and Iterators CSC311: Data Structures
131 Incremental Strategy Analysis We replace the array k = n/c times The total time T(n) of a series of n add operations is proportional to n + c + 2c + 3c + 4c + … + kc = n + c(1 + 2 + 3 + … + k) = n + ck(k + 1)/2 Since c is a constant, T(n) is O(n + k²), i.e., O(n²) The amortized time of an add operation is O(n)
132 Doubling Strategy Analysis We replace the array k = log2 n times The total time T(n) of a series of n add operations is proportional to n + 1 + 2 + 4 + 8 + … + 2^k = n + 2^(k+1) − 1 = 3n − 1 (geometric series) T(n) is O(n) The amortized time of an add operation is O(1)
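The two growth strategies can be compared empirically by counting the element copies that n add operations trigger; this small simulation is illustrative (the class name, the constant c = 16, and n = 2^16 are arbitrary choices, not from the slides).

```java
import java.util.function.IntUnaryOperator;

/** Counts element copies performed by n add operations under a growth strategy. */
public class GrowthStrategies {
    /** Total copies for n adds, starting from capacity 1, growing by the given rule. */
    static long copies(int n, IntUnaryOperator grow) {
        long copies = 0;
        int capacity = 1, size = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {               // array full: allocate and copy over
                copies += size;
                capacity = grow.applyAsInt(capacity);
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        // incremental strategy with c = 16: total copies grow like n^2 / (2c)
        System.out.println("incremental: " + copies(n, c -> c + 16));
        // doubling strategy: total copies stay below 2n
        System.out.println("doubling:    " + copies(n, c -> 2 * c));
    }
}
```

Running it shows the doubling strategy performing a few tens of thousands of copies for 65536 adds, while the incremental strategy performs over a hundred million, matching the O(n) versus O(n²) analysis.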
133 Position ADT The Position ADT models the notion of place within a data structure where a single object is stored It gives a unified view of diverse ways of storing data, such as a cell of an array a node of a linked list Just one method: object element(): returns the element stored at the position Lists and Iterators CSC311: Data Structures
134 Node List ADT The Node List ADT models a sequence of positions storing arbitrary objects It establishes a before/after relation between positions Generic methods: size(), isEmpty() Accessor methods: first(), last() prev(p), next(p) Update methods: set(p, e) addBefore(p, e), addAfter(p, e), addFirst(e), addLast(e) remove(p)
135 Doubly Linked List A doubly linked list provides a natural implementation of the Node List ADT Nodes implement Position and store: element link to the previous node link to the next node Special trailer and header nodes
136 Insertion We visualize operation addAfter(p, X), which returns position q: starting from the list (A, B, C) with p at B, a new node holding X is linked in after p, giving (A, B, X, C) with q at X
137 Insertion Algorithm Algorithm addAfter(p,e): Create a new node vv.setElement(e) v.setPrev(p) {link v to its predecessor} v.setNext(p.getNext()) {link v to its successor} (p.getNext()).setPrev(v) {link p’s old successor to v} p.setNext(v) {link p to its new successor, v} return v {the position for the element e} Lists and Iterators CSC311: Data Structures
138 Deletion We visualize remove(p), where p = last(): starting from the list (A, B, C, D) with p at D, the node at p is linked out, giving (A, B, C)
139 Deletion Algorithm Algorithm remove(p):t = p.element {a temporary variable to hold the return value} (p.getPrev()).setNext(p.getNext()) {linking out p} (p.getNext()).setPrev(p.getPrev()) p.setPrev(null) {invalidating the position p} p.setNext(null) return t Lists and Iterators CSC311: Data Structures
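The insertion and deletion algorithms above can be sketched with a minimal node class; this illustrative DNode exposes its links directly rather than through the setPrev/setNext accessors of the course's node class, and assumes the usual header/trailer sentinels.

```java
/** Minimal doubly linked list node with addAfter and remove, per the algorithms above. */
public class DNode {
    Object element;
    DNode prev, next;

    DNode(Object e) { element = e; }

    /** Insert a new node holding e after position p and return it. */
    static DNode addAfter(DNode p, Object e) {
        DNode v = new DNode(e);
        v.prev = p;            // link v to its predecessor
        v.next = p.next;       // link v to its successor
        p.next.prev = v;       // link p's old successor back to v
        p.next = v;            // link p to its new successor, v
        return v;              // the position for the element e
    }

    /** Unlink position p and return its element. */
    static Object remove(DNode p) {
        Object t = p.element;  // temporary variable to hold the return value
        p.prev.next = p.next;  // linking out p
        p.next.prev = p.prev;
        p.prev = null;         // invalidating the position p
        p.next = null;
        return t;
    }

    public static void main(String[] args) {
        DNode header = new DNode(null), trailer = new DNode(null); // sentinels
        header.next = trailer;
        trailer.prev = header;
        DNode a = addAfter(header, "A");
        addAfter(a, "B");
        System.out.println(remove(a)); // prints A; the list is now just (B)
    }
}
```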
140 Performance In the implementation of the Node List ADT by means of a doubly linked list The space used by a list with n elements is O(n) The space used by each position of the list is O(1) All the operations of the List ADT run in O(1) time Operation element() of the Position ADT runs in O(1) time Lists and Iterators CSC311: Data Structures
141 Sequence ADT The Sequence ADT is the union of the Array List, Deque, and Node List ADTs Elements accessed by index, or Position Generic methods: size(), isEmpty() Bridge methods: atIndex(r) indexOf(p) Lists and Iterators CSC311: Data Structures
142 Applications of SequencesThe Sequence ADT is a basic, general-purpose, data structure for storing an ordered collection of elements Direct applications: Generic replacement for stack, queue, vector, or list small database (e.g., address book) Indirect applications: Building block of more complex data structures Lists and Iterators CSC311: Data Structures
143 Multiple Inheritance in the Sequence ADTSequence has three super interfaces: Node List, Array List, and Deque Node List-based methods: first() last() prev(p) next(p) set(p, o) addBefore(p, o) addAfter(p, o) addFirst(o) addLast(o) remove(p) Array List-based methods: get(r) set(r, o) add(r, o) remove(r) Deque-based methods: addFirst(o) addLast(o) removeFirst() removeLast() getFirst() getLast() Lists and Iterators CSC311: Data Structures
144 Iterator ADT An iterator abstracts the process of scanning through a collection of elements Methods of the Iterator ADT: boolean hasNext() object next() Implementation with an array or singly linked list An iterator is typically associated with another data structure Two notions of iterator: snapshot: freezes the contents of the data structure at a given time dynamic: follows changes to the data structure
145 Design Pattern: Iterators We can augment the Stack, Queue, Vector, List and Sequence ADTs with the following method to iterate all elements: Iterator elements() For ADTs that support the notion of position, such as lists and sequences, we can provide a method to iterate all positions: Iterator positions()
146 Iterators in Java java.util.Iterator interface The Iterable ADT provides a method: iterator(): returns an iterator of the elements in the collection For-Each loop Syntax: for (type name : expression) loop_statement; Equivalence: for (Iterator<type> it = expression.iterator(); it.hasNext(); ) { type name = it.next(); loop_statement; }
147 ListIterator List iterator gives access to elements inside a linked list ListIterator protects the linked list while giving access ListIterator encapsulates a position anywhere in the linked list Lists and Iterators CSC311: Data Structures
148 Conceptual View of the ListIteratorThink of an iterator as pointing between two links The listIterator method of the LinkedList class gets a list iterator LinkedList list = . . . ListIterator iterator = list.listIterator(); Lists and Iterators CSC311: Data Structures
149 ListIterator Methods The next method moves the iterator iterator.next(); next throws a NoSuchElementException if you are already past the end of the list hasNext returns true if there is a next element if (iterator.hasNext()) iterator.next(); The next method returns the object of the link that it is passing while (iterator.hasNext()) { Object obj = iterator.next(); // do something with the object } To move the list position backwards, use: hasPrevious previous
150 Collections in Java Core interfaces of the collections framework: Collection, Iterator, List, ListIterator, Map, Queue, Set
151 Chapter 7: Trees Objectives:Introduce general tree structure and Tree ADT Discuss the depth and height of trees Introduce the tree traversal algorithms Specialize to binary trees Implement binary trees with linked structure and array-list structure Introduce the Template Method Pattern CSC311: Data Structures
152 What is a Tree In computer science, a tree is an abstract model of a hierarchical structure A tree consists of nodes with a parent-child relation Applications: Organization charts File systems Programming environments Computers”R”Us Sales R&D Manufacturing Laptops Desktops US International Europe Asia Canada Trees CSC311: Data Structures
153 Tree Terminology Root: node without parent (A) Internal node: node with at least one child (A, B, C, F) External node (a.k.a. leaf): node without children (E, I, J, K, G, H, D) Ancestors of a node: parent, grandparent, grand-grandparent, etc. Depth of a node: number of ancestors Height of a tree: maximum depth of any node (3) Descendant of a node: child, grandchild, grand-grandchild, etc. Subtree: tree consisting of a node and its descendants
154 Tree ADT We use positions to abstract nodes Generic methods:integer size() boolean isEmpty() Iterator iterator() Iterator positions() Accessor methods: position root() position parent(p) positionIterator children(p) Query methods: boolean isInternal(p) boolean isExternal(p) boolean isRoot(p) Update method: object replace (p, o) Additional update methods may be defined by data structures implementing the Tree ADT Trees CSC311: Data Structures
155 Depth of a Node The depth of a node v in a tree T is defined as: If v is the root, then its depth is 0; Otherwise, its depth is one plus the depth of its parent
Algorithm depth(T, v)
  Input: Tree T and a node v
  Output: an integer that is the depth of v in T
  if v is the root of T then
    return 0
  else
    return 1 + depth(T, parent(v))
156 Height of a Node The height of a node v in a tree T is defined as: If v is an external node, then its height is 0; Otherwise, its height is one plus the maximum height of its children
Algorithm height(T, v)
  Input: Tree T and a node v
  Output: an integer that is the height of v in T
  if v is an external node of T then
    return 0
  else
    h ← 0
    for each child w of v in T do
      h ← max(h, height(T, w))
    return 1 + h
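Both recursive definitions above can be sketched on a minimal linked tree node; this illustrative TreeNode is not the course's Tree ADT, just enough structure to run depth and height.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal tree node for sketching the depth and height algorithms above. */
public class TreeNode {
    TreeNode parent;
    List<TreeNode> children = new ArrayList<>();

    TreeNode addChild() {
        TreeNode c = new TreeNode();
        c.parent = this;
        children.add(c);
        return c;
    }

    /** depth(v): 0 for the root, else 1 + depth of the parent. */
    static int depth(TreeNode v) {
        return v.parent == null ? 0 : 1 + depth(v.parent);
    }

    /** height(v): 0 for an external node, else 1 + max height over the children. */
    static int height(TreeNode v) {
        int h = 0;
        for (TreeNode w : v.children)
            h = Math.max(h, 1 + height(w));
        return h;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode();
        TreeNode b = root.addChild();
        TreeNode e = b.addChild();
        System.out.println(depth(e));      // prints 2
        System.out.println(height(root));  // prints 2
    }
}
```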
157 Features on Height The height of a nonempty tree T is equal to the maximum depth of an external node of T Let T be a tree with n nodes, and let cv denote the number of children of a node v of T. Then, summing over the vertices of T, Σv cv = n − 1
158 Preorder Traversal A traversal visits the nodes of a tree in a systematic manner In a preorder traversal, a node is visited before its descendants Application: print a structured document (e.g., “Make Money Fast!” with sections 1. Motivations, 2. Methods, References visited in reading order)
Algorithm preOrder(v)
  visit(v)
  for each child w of v
    preOrder(w)
159 Postorder Traversal In a postorder traversal, a node is visited after its descendants Application: compute space used by files in a directory and its subdirectories (e.g., cs16/ containing todo.txt, homeworks/, and programs/)
Algorithm postOrder(v)
  for each child w of v
    postOrder(w)
  visit(v)
160 Binary Trees A binary tree is a tree with the following properties: Each internal node has at most two children (exactly two for proper binary trees) The children of a node are an ordered pair We call the children of an internal node left child and right child Alternative recursive definition: a binary tree is either a tree consisting of a single node, or a tree whose root has an ordered pair of children, each of which is a binary tree Applications: arithmetic expressions decision processes searching
161 Arithmetic Expression Tree Binary tree associated with an arithmetic expression internal nodes: operators external nodes: operands Example: arithmetic expression tree for the expression (2 × (a − 1)) + (3 × b)
162 Decision Tree Binary tree associated with a decision processinternal nodes: questions with yes/no answer external nodes: decisions Example: dining decision Want a fast meal? Yes No How about coffee? On expense account? Yes No Yes No Starbucks Spike’s Al Forno Café Paragon Trees CSC311: Data Structures
163 BinaryTree ADT The BinaryTree ADT extends the Tree ADT, i.e., it inherits all the methods of the Tree ADT Additional methods: position left(p) position right(p) boolean hasLeft(p) boolean hasRight(p) Update methods may be defined by data structures implementing the BinaryTree ADT Trees CSC311: Data Structures
164 Properties of Proper Binary Trees Notation: n number of nodes, e number of external nodes, i number of internal nodes, h height Properties: e = i + 1 n = 2e − 1 h ≤ i h ≤ (n − 1)/2 e ≤ 2^h h ≥ log2 e h ≥ log2 (n + 1) − 1
165 Inorder Traversal In an inorder traversal a node is visited after its left subtree and before its right subtree Application: draw a binary tree x(v) = inorder rank of v y(v) = depth of v
Algorithm inOrder(v)
  if hasLeft(v)
    inOrder(left(v))
  visit(v)
  if hasRight(v)
    inOrder(right(v))
166 Print Arithmetic Expressions Specialization of an inorder traversal print operand or operator when visiting node print “(” before traversing left subtree print “)” after traversing right subtree Example output: ((2 × (a − 1)) + (3 × b))
Algorithm printExpression(v)
  if hasLeft(v)
    print("(")
    printExpression(left(v))
  print(v.element())
  if hasRight(v)
    printExpression(right(v))
    print(")")
167 Evaluate Arithmetic Expressions Specialization of a postorder traversal recursive method returning the value of a subtree when visiting an internal node, combine the values of the subtrees
Algorithm evalExpr(v)
  if isExternal(v) then
    return v.element()
  else
    x ← evalExpr(leftChild(v))
    y ← evalExpr(rightChild(v))
    op ← operator stored at v
    return x op y
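The postorder evaluation above can be sketched with a small node class; the ExprNode class and the sample expression (2 × (5 − 1)) + (3 × 2) are illustrative choices, not the course's BinaryTree ADT.

```java
/** Sketch of postorder evaluation of an arithmetic expression tree. */
public class ExprNode {
    char op;             // operator, for internal nodes
    int value;           // operand, for external nodes
    ExprNode left, right;

    ExprNode(int v) { value = v; }                  // external (operand) node
    ExprNode(char op, ExprNode l, ExprNode r) {     // internal (operator) node
        this.op = op;
        left = l;
        right = r;
    }

    static int evalExpr(ExprNode v) {
        if (v.left == null && v.right == null)      // isExternal(v)
            return v.value;
        int x = evalExpr(v.left);                   // evaluate subtrees first
        int y = evalExpr(v.right);
        switch (v.op) {                             // combine with the operator at v
            case '+': return x + y;
            case '-': return x - y;
            case '*': return x * y;
            default: throw new IllegalArgumentException("unknown operator " + v.op);
        }
    }

    public static void main(String[] args) {
        // tree for (2 * (5 - 1)) + (3 * 2)
        ExprNode tree = new ExprNode('+',
            new ExprNode('*', new ExprNode(2),
                new ExprNode('-', new ExprNode(5), new ExprNode(1))),
            new ExprNode('*', new ExprNode(3), new ExprNode(2)));
        System.out.println(evalExpr(tree)); // prints 14
    }
}
```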
168 Euler Tour Traversal Generic traversal of a binary tree Includes as special cases the preorder, postorder and inorder traversals Walk around the tree and visit each node three times: on the left (preorder) from below (inorder) on the right (postorder)
169 Template Method Pattern Generic algorithm that can be specialized by redefining certain steps Implemented by means of an abstract Java class Visit methods that can be redefined by subclasses Template method eulerTour Recursively called on the left and right children A Result object with fields leftResult, rightResult and finalResult keeps track of the output of the recursive calls to eulerTour
public abstract class EulerTour {
  protected BinaryTree tree;
  protected void visitExternal(Position p, Result r) { }
  protected void visitLeft(Position p, Result r) { }
  protected void visitBelow(Position p, Result r) { }
  protected void visitRight(Position p, Result r) { }
  protected Object eulerTour(Position p) {
    Result r = new Result();
    if (tree.isExternal(p)) {
      visitExternal(p, r);
    } else {
      visitLeft(p, r);
      r.leftResult = eulerTour(tree.left(p));
      visitBelow(p, r);
      r.rightResult = eulerTour(tree.right(p));
      visitRight(p, r);
    }
    return r.finalResult;
  }
  …
}
170 Specializations of EulerTourWe show how to specialize class EulerTour to evaluate an arithmetic expression Assumptions External nodes store Integer objects Internal nodes store Operator objects supporting method operation (Integer, Integer) public class EvaluateExpression extends EulerTour { protected void visitExternal(Position p, Result r) { r.finalResult = (Integer) p.element(); } protected void visitRight(Position p, Result r) { Operator op = (Operator) p.element(); r.finalResult = op.operation( (Integer) r.leftResult, (Integer) r.rightResult ); } … } Trees CSC311: Data Structures
171 Linked Structure for Trees A node is represented by an object storing Element Parent node Sequence of children nodes Node objects implement the Position ADT
172 Linked Structure for Binary Trees A node is represented by an object storing Element Parent node Left child node Right child node Node objects implement the Position ADT
173 Array-Based Representation of Binary Trees Nodes are stored in an array Let rank(node) be defined as follows: rank(root) = 1 if node is the left child of parent(node), rank(node) = 2·rank(parent(node)) if node is the right child of parent(node), rank(node) = 2·rank(parent(node)) + 1
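The rank rules above reduce to simple index arithmetic; this illustrative helper class shows them (the class name is a hypothetical choice).

```java
/** Rank arithmetic for the array-based binary tree representation. */
public class BinaryTreeRanks {
    static int leftChild(int rank)  { return 2 * rank; }     // left child of node at rank
    static int rightChild(int rank) { return 2 * rank + 1; } // right child of node at rank
    static int parent(int rank)     { return rank / 2; }     // integer division undoes both

    public static void main(String[] args) {
        // root at rank 1; its children at ranks 2 and 3; rank 5's children at 10 and 11
        System.out.println(leftChild(1) + " " + rightChild(1)); // prints 2 3
        System.out.println(leftChild(5) + " " + rightChild(5)); // prints 10 11
        System.out.println(parent(11));                         // prints 5
    }
}
```

Because parent, left child, and right child are all O(1) arithmetic, no link fields are needed at all; the cost is array space proportional to the maximum rank, as the next slide notes.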
174 Array-Based Representation of Binary Trees: Properties Let n be the number of nodes of a binary tree T Let pM be the maximum value of rank(v) over all the nodes of T The array size N = pM + 1 In the worst case (a degenerate tree), N = 2^n
175 Chapter 8: Priority Queues Objectives: Priority Queue ADT Comparator design pattern Heaps Priority Queue Implementation with List and Heap Adaptable Priority Queues Sorting: Priority Queue-sort Selection-sort Insertion-sort Heap-sort CSC311: Data Structures
176 Priority Queue ADT A priority queue stores a collection of entriesEach entry is a pair (key, value) Main methods of the Priority Queue ADT insert(k, x) inserts an entry with key k and value x removeMin() removes and returns the entry with smallest key Additional methods min() returns, but does not remove, an entry with smallest key size(), isEmpty() Applications: Standby flyers Auctions Stock market Priority Queues CSC311: Data Structures
177 Total Order Relations Keys in a priority queue can be arbitrary objects on which an order is defined Two distinct entries in a priority queue can have the same key Mathematical concept of total order relation ≤ Reflexive property: x ≤ x Antisymmetric property: x ≤ y ∧ y ≤ x ⇒ x = y Transitive property: x ≤ y ∧ y ≤ z ⇒ x ≤ z
178 Entry ADT An entry in a priority queue is simply a key-value pair Priority queues store entries to allow for efficient insertion and removal based on keys Methods: key(): returns the key for this entry value(): returns the value associated with this entry As a Java interface:
/** Interface for a key-value pair entry */
public interface Entry {
  public Object key();
  public Object value();
}
179 Comparator ADT A comparator encapsulates the action of comparing two objects according to a given total order relation A generic priority queue uses an auxiliary comparator The comparator is external to the keys being compared When the priority queue needs to compare two keys, it uses its comparator The primary method of the Comparator ADT: compare(a, b): Returns an integer i such that i < 0 if a < b, i = 0 if a = b, and i > 0 if a > b; an error occurs if a and b cannot be compared. Priority Queues CSC311: Data Structures
180 Example Comparator Lexicographic comparison of 2-D points:
/** Comparator for 2D points under the standard lexicographic order. */
public class Lexicographic implements Comparator {
  int xa, ya, xb, yb;
  public int compare(Object a, Object b) throws ClassCastException {
    xa = ((Point2D) a).getX();
    ya = ((Point2D) a).getY();
    xb = ((Point2D) b).getX();
    yb = ((Point2D) b).getY();
    if (xa != xb) return (xa - xb);
    else return (ya - yb);
  }
}
Point objects:
/** Class representing a point in the plane with integer coordinates */
public class Point2D {
  protected int xc, yc; // coordinates
  public Point2D(int x, int y) { xc = x; yc = y; }
  public int getX() { return xc; }
  public int getY() { return yc; }
}
Priority Queues CSC311: Data Structures
181 Priority Queue Sorting We can use a priority queue to sort a set of comparable elements Insert the elements one by one with a series of insert operations Remove the elements in sorted order with a series of removeMin operations The running time of this sorting method depends on the priority queue implementation
Algorithm PQ-Sort(S, C)
  Input sequence S, comparator C for the elements of S
  Output sequence S sorted in increasing order according to C
  P ← priority queue with comparator C
  while ¬S.isEmpty()
    e ← S.removeFirst()
    P.insert(e, ∅)
  while ¬P.isEmpty()
    e ← P.removeMin().key()
    S.insertLast(e)
Priority Queues CSC311: Data Structures
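The PQ-Sort scheme above can be sketched in Java; here java.util.PriorityQueue stands in for the priority queue P and a LinkedList plays the role of the sequence S (the names PQSort and pqSort are illustrative, not from the text):

```java
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.PriorityQueue;

// PQ-Sort sketch: drain the input sequence into a priority queue,
// then remove the minimum repeatedly.
public class PQSort {
    public static <E> void pqSort(List<E> s, Comparator<E> c) {
        PriorityQueue<E> p = new PriorityQueue<>(c);
        while (!s.isEmpty()) p.offer(s.remove(0)); // phase 1: insert all elements
        while (!p.isEmpty()) s.add(p.poll());      // phase 2: removeMin in sorted order
    }

    public static void main(String[] args) {
        List<Integer> s = new LinkedList<>(List.of(7, 4, 8, 2, 5, 3, 9));
        pqSort(s, Comparator.naturalOrder());
        System.out.println(s); // sorted in increasing order
    }
}
```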
182 Sequence-based Priority QueueImplementation with an unsorted list Performance: insert takes O(1) time since we can insert the item at the beginning or end of the sequence removeMin and min take O(n) time since we have to traverse the entire sequence to find the smallest key Implementation with a sorted list Performance: insert takes O(n) time since we have to find the place where to insert the item removeMin and min take O(1) time, since the smallest key is at the beginning 4 5 2 3 1 1 2 3 4 5 Priority Queues CSC311: Data Structures
183 Selection-Sort Selection-sort is the variation of PQ-sort where the priority queue is implemented with an unsorted sequence Running time of Selection-sort: Inserting the elements into the priority queue with n insert operations takes O(n) time Removing the elements in sorted order from the priority queue with n removeMin operations takes time proportional to 1 + 2 + … + n Selection-sort runs in O(n^2) time Priority Queues CSC311: Data Structures
184 Selection-Sort ExampleSequence S Priority Queue P Input: (7,4,8,2,5,3,9) () Phase 1 (a) (4,8,2,5,3,9) (7) (b) (8,2,5,3,9) (7,4) . . . (g) () (7,4,8,2,5,3,9) Phase 2 (a) (2) (7,4,8,5,3,9) (b) (2,3) (7,4,8,5,9) (c) (2,3,4) (7,8,5,9) (d) (2,3,4,5) (7,8,9) (e) (2,3,4,5,7) (8,9) (f) (2,3,4,5,7,8) (9) (g) (2,3,4,5,7,8,9) () Priority Queues CSC311: Data Structures
185 Insertion-Sort Insertion-sort is the variation of PQ-sort where the priority queue is implemented with a sorted sequence Running time of Insertion-sort: Inserting the elements into the priority queue with n insert operations takes time proportional to 1 + 2 + … + n Removing the elements in sorted order from the priority queue with a series of n removeMin operations takes O(n) time Insertion-sort runs in O(n^2) time Priority Queues CSC311: Data Structures
186 Insertion-Sort ExampleSequence S Priority queue P Input: (7,4,8,2,5,3,9) () Phase 1 (a) (4,8,2,5,3,9) (7) (b) (8,2,5,3,9) (4,7) (c) (2,5,3,9) (4,7,8) (d) (5,3,9) (2,4,7,8) (e) (3,9) (2,4,5,7,8) (f) (9) (2,3,4,5,7,8) (g) () (2,3,4,5,7,8,9) Phase 2 (a) (2) (3,4,5,7,8,9) (b) (2,3) (4,5,7,8,9) (g) (2,3,4,5,7,8,9) () Priority Queues CSC311: Data Structures
187 In-place Insertion-sortInstead of using an external data structure, we can implement selection-sort and insertion-sort in-place A portion of the input sequence itself serves as the priority queue For in-place insertion-sort We keep sorted the initial portion of the sequence We can use swaps instead of modifying the sequence 5 4 2 3 1 Priority Queues CSC311: Data Structures
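A minimal sketch of in-place insertion-sort as described above, using swaps so that the sorted prefix of the array itself serves as the priority queue (the class name is illustrative):

```java
// In-place insertion-sort sketch: a[0..i-1] is kept sorted, and each
// new element a[i] is swapped left until it reaches its place.
public class InPlaceInsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t; // swap, no extra structure
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 4, 2, 3, 1}; // the sequence pictured on the slide
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```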
188 Heaps A heap is a binary tree storing keys at its nodes and satisfying the following properties: Heap-Order: for every internal node v other than the root, key(v) ≥ key(parent(v)) Complete Binary Tree: let h be the height of the heap for i = 0, …, h − 1, there are 2^i nodes of depth i at depth h − 1, the internal nodes are to the left of the external nodes The last node of a heap is the rightmost node of depth h Priority Queues CSC311: Data Structures
189 Height of a Heap Theorem: A heap storing n keys has height O(log n) Proof: (we apply the complete binary tree property) Let h be the height of a heap storing n keys Since there are 2^i keys at depth i = 0, …, h − 1 and at least one key at depth h, we have n ≥ 1 + 2 + 4 + … + 2^(h−1) + 1 Thus, n ≥ 2^h, i.e., h ≤ log n Priority Queues CSC311: Data Structures
190 Heaps and Priority QueuesWe can use a heap to implement a priority queue We store a (key, element) item at each internal node We keep track of the position of the last node For simplicity, we show only the keys in the pictures (2, Sue) (6, Mark) (5, Pat) (9, Jeff) (7, Anna) Priority Queues CSC311: Data Structures
191 Insertion into a Heap Method insertItem of the priority queue ADT corresponds to the insertion of a key k to the heap The insertion algorithm consists of three steps Find the insertion node z (the new last node) Store k at z Restore the heap-order property (discussed next) 2 5 6 z 9 7 insertion node 2 5 6 z 9 7 1 Priority Queues CSC311: Data Structures
192 Upheap After the insertion of a new key k, the heap-order property may be violated Algorithm upheap restores the heap-order property by swapping k along an upward path from the insertion node Upheap terminates when the key k reaches the root or a node whose parent has a key smaller than or equal to k Since a heap has height O(log n), upheap runs in O(log n) time 2 1 5 1 5 2 z z 9 7 6 9 7 6 Priority Queues CSC311: Data Structures
193 Removal from a Heap Method removeMin of the priority queue ADT corresponds to the removal of the root key from the heap The removal algorithm consists of three steps Replace the root key with the key of the last node w Remove w Restore the heap-order property (discussed next) 2 5 6 w 9 7 last node 7 5 6 w 9 new last node Priority Queues CSC311: Data Structures
194 Downheap After replacing the root key with the key k of the last node, the heap-order property may be violated Algorithm downheap restores the heap-order property by swapping key k along a downward path from the root Downheap terminates when key k reaches a leaf or a node whose children have keys greater than or equal to k Since a heap has height O(log n), downheap runs in O(log n) time Priority Queues CSC311: Data Structures
195 Updating the Last Node The insertion node can be found by traversing a path of O(log n) nodes Go up until a left child or the root is reached If a left child is reached, go to the right child Go down left until a leaf is reached Similar algorithm for updating the last node after a removal Priority Queues CSC311: Data Structures
196 Heap-Sort Consider a priority queue with n items implemented by means of a heap the space used is O(n) methods insert and removeMin take O(log n) time methods size, isEmpty, and min take O(1) time Using a heap-based priority queue, we can sort a sequence of n elements in O(n log n) time The resulting algorithm is called heap-sort Heap-sort is much faster than quadratic sorting algorithms, such as insertion-sort and selection-sort Priority Queues CSC311: Data Structures
197 Vector-based Heap Implementation We can represent a heap with n keys by means of a vector of length n + 1 For the node at rank i the left child is at rank 2i the right child is at rank 2i + 1 Links between nodes are not explicitly stored The cell at rank 0 is not used Operation insert corresponds to inserting at rank n + 1 Operation removeMin corresponds to removing at rank 1 Yields in-place heap-sort Priority Queues CSC311: Data Structures
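A hedged sketch of the vector-based heap: cell 0 is unused, insert appends at rank n + 1 and upheaps, and removeMin moves the last key to rank 1 and downheaps, combining the upheap and downheap algorithms from the earlier slides. Class and method names are illustrative:

```java
import java.util.ArrayList;

// Minimal array-based (vector) heap: the node at rank i has children
// at ranks 2i and 2i+1; cell 0 is unused, matching the slide.
public class ArrayHeap {
    private final ArrayList<Integer> a = new ArrayList<>();

    public ArrayHeap() { a.add(null); }            // rank 0 unused
    public int size() { return a.size() - 1; }
    public int min() { return a.get(1); }

    public void insert(int k) {                    // add at rank n+1, then upheap
        a.add(k);
        for (int i = size(); i > 1 && a.get(i) < a.get(i / 2); i /= 2)
            swap(i, i / 2);
    }

    public int removeMin() {                       // move last key to rank 1, then downheap
        int min = a.get(1);
        a.set(1, a.get(size()));
        a.remove(size());
        int i = 1;
        while (2 * i <= size()) {
            int c = 2 * i;                         // pick the smaller child
            if (c + 1 <= size() && a.get(c + 1) < a.get(c)) c++;
            if (a.get(i) <= a.get(c)) break;       // heap-order restored
            swap(i, c);
            i = c;
        }
        return min;
    }

    private void swap(int i, int j) { int t = a.get(i); a.set(i, a.get(j)); a.set(j, t); }
}
```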
198 Merging Two Heaps We are given two heaps and a key k We create a new heap with the root node storing k and with the two heaps as subtrees We perform downheap to restore the heap-order property Priority Queues CSC311: Data Structures
199 Bottom-up Heap Construction We can construct a heap storing n given keys using a bottom-up construction with log n phases In phase i, pairs of heaps with 2^i − 1 keys are merged into heaps with 2^(i+1) − 1 keys Priority Queues CSC311: Data Structures
200 Example [figure: bottom-up construction of a heap from the keys 15, 16, 12, 4, 7, 6, 20, 23, 25, 5, 11, 27] Priority Queues CSC311: Data Structures
201 Example (contd.) [figure: the first merge phases of the construction] Priority Queues CSC311: Data Structures
202 Example (contd.) [figure: further merge phases] Priority Queues CSC311: Data Structures
203 Example (end) [figure: the final heap] Priority Queues CSC311: Data Structures
204 Analysis We visualize the worst-case time of a downheap with a proxy path that goes first right and then repeatedly goes left until the bottom of the heap (this path may differ from the actual downheap path) Since each node is traversed by at most two proxy paths, the total number of nodes of the proxy paths is O(n) Thus, bottom-up heap construction runs in O(n) time Bottom-up heap construction is faster than n successive insertions and speeds up the first phase of heap-sort Priority Queues CSC311: Data Structures
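The bottom-up construction has an equivalent array formulation (a sketch under that assumption; names are illustrative): downheap every internal node from rank n/2 down to the root of a 1-indexed array, which is O(n) total by the proxy-path argument above.

```java
// Bottom-up heap construction sketch on a 1-indexed array a[1..n]:
// downheap each internal node from rank n/2 down to rank 1.
public class BottomUpHeap {
    public static void heapify(int[] a, int n) {
        for (int i = n / 2; i >= 1; i--) downheap(a, n, i);
    }

    private static void downheap(int[] a, int n, int i) {
        while (2 * i <= n) {
            int c = 2 * i;                       // pick the smaller child
            if (c + 1 <= n && a[c + 1] < a[c]) c++;
            if (a[i] <= a[c]) break;             // heap-order holds here
            int t = a[i]; a[i] = a[c]; a[c] = t;
            i = c;
        }
    }

    public static void main(String[] args) {
        // the 12 keys from the construction example; a[0] is unused
        int[] a = {0, 15, 16, 12, 4, 7, 6, 20, 23, 25, 5, 11, 27};
        heapify(a, 12);
        System.out.println(a[1]); // the root now holds the minimum key
    }
}
```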
205 Adaptable Priority Queues Adaptable Priority Queues CSC311: Data Structures
206 Motivating Example Suppose we have an online trading system where orders to purchase and sell a given stock are stored in two priority queues (one for sell orders and one for buy orders) as (p,s) entries: The key, p, of an order is the price The value, s, for an entry is the number of shares A buy order (p, s) is executed when a sell order (p′, s′) with price p′ ≤ p is entered (and, symmetrically, a sell order is executed when a matching buy order arrives) Priority Queues CSC311: Data Structures
207 Methods of the Adaptable Priority Queue ADTremove(e): Remove from P and return entry e. replaceKey(e,k): Replace with k and return the key of entry e of P; an error condition occurs if k is invalid (that is, k cannot be compared with other keys). replaceValue(e,x): Replace with x and return the value of entry e of P. Priority Queues CSC311: Data Structures
208 Example Operation Output P insert(5,A) e1 (5,A)insert(3,B) e2 (3,B),(5,A) insert(7,C) e3 (3,B),(5,A),(7,C) min() e2 (3,B),(5,A),(7,C) key(e2) 3 (3,B),(5,A),(7,C) remove(e1) e1 (3,B),(7,C) replaceKey(e2,9) 3 (7,C),(9,B) replaceValue(e3,D) C (7,D),(9,B) remove(e2) e2 (7,D) Priority Queues CSC311: Data Structures
209 Locating Entries In order to implement the operations remove(k), replaceKey(e), and replaceValue(k), we need fast ways of locating an entry e in a priority queue. We can always just search the entire data structure to find an entry e, but there are better ways for locating entries. Priority Queues CSC311: Data Structures
210 Location-Aware EntriesA locator-aware entry identifies and tracks the location of its (key, value) object within a data structure Intuitive notion: Coat claim check Valet claim ticket Reservation number Main idea: Since entries are created and returned from the data structure itself, it can return location-aware entries, thereby making future updates easier Priority Queues CSC311: Data Structures
211 List Implementation A location-aware list entry is an object storingkey value position (or rank) of the item in the list In turn, the position (or array cell) stores the entry Back pointers (or ranks) are updated during swaps trailer header nodes/positions 2 c 4 c 5 c 8 c entries Priority Queues CSC311: Data Structures
212 Heap Implementation A location-aware heap entry is an object storingkey value position of the entry in the underlying heap In turn, each heap position stores an entry Back pointers are updated during entry swaps 2 d 4 a 6 b 8 g 5 e 9 c Priority Queues CSC311: Data Structures
213 Performance Using location-aware entries we can achieve the following running times (times better than those achievable without location-aware entries are highlighted in red):
Method | Unsorted List | Sorted List | Heap
size, isEmpty | O(1) | O(1) | O(1)
insert | O(1) | O(n) | O(log n)
min | O(n) | O(1) | O(1)
removeMin | O(n) | O(1) | O(log n)
remove | O(1) | O(1) | O(log n)
replaceKey | O(1) | O(n) | O(log n)
replaceValue | O(1) | O(1) | O(1)
Priority Queues CSC311: Data Structures
214 Chapter 9: Maps and DictionariesObjectives: Map ADT Hash tables Hash functions and hash code Compression functions and collisions Dictionary ADT List-based Dictionary Hash table Dictionary Ordered search tables and binary search Skip list CSC311: Data Structures
215 Maps A map models a searchable collection of key-value entriesThe main operations of a map are for searching, inserting, and deleting items Multiple entries with the same key are not allowed Applications: address book student-record database Maps and Dictionaries CSC311: Data Structures
216 The Map ADT Map ADT methods: size(), isEmpty()get(k): if the map M has an entry with key k, return its associated value; else, return null put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return old value associated with k remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null keys(): return an iterator of the keys in M values(): return an iterator of the values in M entries(): return an iterable collection containing all the key-value entries in M Maps and Dictionaries CSC311: Data Structures
217 Example Operation Output Map isEmpty() true Ø put(5,A) null (5,A)put(7,B) null (5,A),(7,B) put(2,C) null (5,A),(7,B),(2,C) put(8,D) null (5,A),(7,B),(2,C),(8,D) put(2,E) C (5,A),(7,B),(2,E),(8,D) get(7) B (5,A),(7,B),(2,E),(8,D) get(4) null (5,A),(7,B),(2,E),(8,D) get(2) E (5,A),(7,B),(2,E),(8,D) size() 4 (5,A),(7,B),(2,E),(8,D) remove(5) A (7,B),(2,E),(8,D) remove(2) E (7,B),(8,D) get(2) null (7,B),(8,D) isEmpty() false (7,B),(8,D) Maps and Dictionaries CSC311: Data Structures
218 Comparison to java.util.Map
Map ADT Methods | java.util.Map Methods
size() | size()
isEmpty() | isEmpty()
get(k) | get(k)
put(k,v) | put(k,v)
remove(k) | remove(k)
keys() | keySet()
values() | values()
entries() | entrySet()
Maps and Dictionaries CSC311: Data Structures
219 A Simple List-Based MapWe can efficiently implement a map using an unsorted list We store the items of the map in a list S (based on a doubly-linked list), in arbitrary order trailer header nodes/positions entries 9 c 6 5 8 Maps and Dictionaries CSC311: Data Structures
220 The get(k) Algorithm
Algorithm get(k):
  Input: A key k
  Output: a value for key k in M, null if k is not in M
  B = S.positions() {B is an iterator of the positions in S}
  while B.hasNext() do
    p = B.next() {the next position in B}
    if p.element().key() = k then
      return p.element().value()
  return null {there is no entry with key equal to k}
Maps and Dictionaries CSC311: Data Structures
221 The put(k,v) Algorithm Algorithm put(k,v):Input: A key-value pair (k, v) Output: the old value for k in M, null if k is new B = S.positions() while B.hasNext() do p = B.next() if p.element().key() = k then t = p.element().value() B.replace(p,(k,v)) return t {return the old value} S.insertLast((k,v)) n = n + 1 {increment variable storing number of entries} return null {there was no previous entry with key equal to k} Maps and Dictionaries CSC311: Data Structures
222 The remove(k) AlgorithmAlgorithm remove(k): Input: A key k Output: the removed value for k, null if k is not in M B =S.positions() while B.hasNext() do p = B.next() if p.element().key() = k then t = p.element().value() S.remove(p) n = n – 1 {decrement number of entries} return t {return the removed value} return null {there is no entry with key equal to k} Maps and Dictionaries CSC311: Data Structures
223 Performance of a List-Based Mapput, get and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key The unsorted list implementation is effective only for maps of small size Maps and Dictionaries CSC311: Data Structures
224 Hash Function and Hash TableA hash function h maps keys of a given type to integers in a fixed interval [0, N - 1] Example: h(x) = x mod N is a hash function for integer keys The integer h(x) is called the hash value of key x A hash table for a given key type consists of Hash function h Array (called table) of size N When implementing a map with a hash table, the goal is to store item (k, o) at index i = h(k) Maps and Dictionaries CSC311: Data Structures
225 Example We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x 1 2 3 4 9997 9998 9999 … Maps and Dictionaries CSC311: Data Structures
226 Hash Functions A hash function is usually specified as the composition of two functions: Hash code h1: keys → integers Compression function h2: integers → [0, N − 1] The hash code is applied first, and the compression function is applied next on the result, i.e., h(x) = h2(h1(x)) The goal of the hash function is to “disperse” the keys in an apparently random way Maps and Dictionaries CSC311: Data Structures
227 Hash Codes Memory address: We reinterpret the memory address of the key object as an integer (default hash code of all Java objects) Good in general, except for numeric and string keys Integer cast: We reinterpret the bits of the key as an integer Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java) Component sum: We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows) Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java) Maps and Dictionaries CSC311: Data Structures
228 Hash Codes Polynomial accumulation: We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) a0 a1 … a(n−1) We evaluate the polynomial p(z) = a0 + a1 z + a2 z^2 + … + a(n−1) z^(n−1) at a fixed value z, ignoring overflows Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words) Polynomial p(z) can be evaluated in O(n) time using Horner’s rule: The following polynomials are successively computed, each from the previous one in O(1) time p0(z) = a(n−1) pi(z) = a(n−i−1) + z·p(i−1)(z) (i = 1, 2, …, n − 1) We have p(z) = p(n−1)(z) Maps and Dictionaries CSC311: Data Structures
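Horner's rule turns polynomial accumulation into a single left-to-right pass. A sketch for string keys (the class name is illustrative); note that with z = 31 this coincides with Java's own String.hashCode:

```java
// Polynomial hash code via Horner's rule; overflow is simply ignored,
// as the slide prescribes.
public class PolyHash {
    public static int hashCodeOf(String s, int z) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = z * h + s.charAt(i);   // p_i = a_(n-i-1) + z * p_(i-1)
        }
        return h;
    }

    public static void main(String[] args) {
        // z = 31 reproduces Java's built-in String hash code
        System.out.println(hashCodeOf("abc", 31) == "abc".hashCode()); // prints true
    }
}
```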
229 Compression Functions Division: h2(y) = y mod N The size N of the hash table is usually chosen to be a prime The reason has to do with number theory and is beyond the scope of this course Multiply, Add and Divide (MAD): h2(y) = (ay + b) mod N a and b are nonnegative integers such that a mod N ≠ 0 Otherwise, every integer would map to the same value b Maps and Dictionaries CSC311: Data Structures
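A sketch of the MAD compression function; Math.floorMod is used here so that negative hash codes still land in [0, N − 1], a detail the slide leaves implicit (class name illustrative):

```java
// MAD (multiply, add and divide) compression sketch: h2(y) = (ay + b) mod N,
// with N prime and a mod N != 0 so distinct codes don't all collapse to b.
public class MADCompress {
    public static int compress(int y, int a, int b, int N) {
        // floorMod keeps the result in [0, N-1] even when y is negative
        return Math.floorMod(a * y + b, N);
    }

    public static void main(String[] args) {
        System.out.println(compress(-7, 3, 5, 13)); // 3*(-7)+5 = -16, which floor-mods to 10
    }
}
```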
230 Collision Handling Collisions occur when different elements are mapped to the same cell Separate Chaining: let each cell in the table point to a linked list of entries that map there 1 2 3 4 Separate chaining is simple, but requires additional memory outside the table Maps and Dictionaries CSC311: Data Structures
231 Map Methods with Separate Chaining used for CollisionsDelegate get and put methods to a list-based map at each cell: Algorithm get(k): Output: The value associated with the key k in the map, or null if there is no entry with key equal to k in the map return A[h(k)].get(k) {delegate the get to the list-based map at A[h(k)]} Algorithm put(k,v): Output: If there is an existing entry in our map with key equal to k, then we return its value (replacing it with v); otherwise, we return null t = A[h(k)].put(k,v) {delegate the put to the list-based map at A[h(k)]} if t = null then {k is a new key} n = n + 1 return t Maps and Dictionaries CSC311: Data Structures
232 Map Methods with Separate Chaining used for CollisionsDelegate the remove method to a list-based map at each cell: Algorithm remove(k): Output: The (removed) value associated with key k in the map, or null if there is no entry with key equal to k in the map t = A[h(k)].remove(k) {delegate the remove to the list-based map at A[h(k)]} if t ≠ null then {k was found} n = n - 1 return t Maps and Dictionaries CSC311: Data Structures
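The delegation idea in the two slides above can be sketched as a small chained table; the bucket here is a LinkedList of (key, value) pairs rather than a full list-based map, and all names (ChainedMap, Pair) are illustrative:

```java
import java.util.LinkedList;

// Separate-chaining sketch: each cell A[h(k)] holds a little list of
// (key, value) pairs; get and put are delegated to that list.
public class ChainedMap {
    private static class Pair {
        int k; String v;
        Pair(int k, String v) { this.k = k; this.v = v; }
    }

    private final LinkedList<Pair>[] a;
    private int n = 0;

    @SuppressWarnings("unchecked")
    public ChainedMap(int N) {
        a = new LinkedList[N];
        for (int i = 0; i < N; i++) a[i] = new LinkedList<>();
    }

    private int h(int k) { return Math.floorMod(k, a.length); } // simple division hash

    public String get(int k) {
        for (Pair p : a[h(k)]) if (p.k == k) return p.v;
        return null;                       // no entry with key k
    }

    public String put(int k, String v) {
        for (Pair p : a[h(k)]) {
            if (p.k == k) { String old = p.v; p.v = v; return old; } // replace
        }
        a[h(k)].add(new Pair(k, v));       // k is a new key
        n++;
        return null;
    }

    public int size() { return n; }
}
```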
233 Linear Probing Open addressing: the colliding item is placed in a different cell of the table Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell Each table cell inspected is referred to as a “probe” Colliding items lump together, so future collisions cause longer sequences of probes Example: h(x) = x mod 13 Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order; the table ends up with 41 in cell 2, 18 in 5, 44 in 6, 59 in 7, 32 in 8, 22 in 9, 31 in 10, and 73 in 11 Maps and Dictionaries CSC311: Data Structures
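The probing sequence above can be reproduced in a few lines; inserting the slide's keys with h(x) = x mod 13 fills the table exactly as pictured (class name illustrative; no table-full check, since this is only a sketch):

```java
// Linear-probing insertion sketch reproducing the slide's example:
// h(x) = x mod N, colliding keys go to the next free cell, circularly.
public class LinearProbing {
    public static Integer[] insertAll(int[] keys, int N) {
        Integer[] table = new Integer[N];        // null means "empty cell"
        for (int k : keys) {
            int i = k % N;
            while (table[i] != null) i = (i + 1) % N; // probe the next cell
            table[i] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] t = insertAll(new int[]{18, 41, 22, 44, 59, 32, 31, 73}, 13);
        // e.g. 44 collides with 18 at cell 5 and lands in 6; 31 probes 5..9 and lands in 10
        System.out.println(java.util.Arrays.toString(t));
    }
}
```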
234 Search with Linear Probing Consider a hash table A that uses linear probing get(k) We start at cell h(k) We probe consecutive locations until one of the following occurs An item with key k is found, or An empty cell is found, or N cells have been unsuccessfully probed
Algorithm get(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return null
    else if c.key() = k
      return c.element()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null
Maps and Dictionaries CSC311: Data Structures
235 Updates with Linear ProbingTo handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements remove(k) We search for an entry with key k If such an entry (k, o) is found, we replace it with the special item AVAILABLE and we return element o Else, we return null put(k, o) We throw an exception if the table is full We start at cell h(k) We probe consecutive cells until one of the following occurs A cell i is found that is either empty or stores AVAILABLE, or N cells have been unsuccessfully probed We store entry (k, o) in cell i Maps and Dictionaries CSC311: Data Structures
236 Double Hashing Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series (i + jd(k)) mod N for j = 0, 1, …, N − 1, where i = h(k) The secondary hash function d(k) cannot have zero values The table size N must be a prime to allow probing of all the cells Common choice of compression function for the secondary hash function: d2(k) = q − k mod q where q < N and q is a prime The possible values for d2(k) are 1, 2, …, q Maps and Dictionaries CSC311: Data Structures
237 Example of Double HashingConsider a hash table storing integer keys that handles collision with double hashing N = 13 h(k) = k mod 13 d(k) = 7 - k mod 7 Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order 1 2 3 4 5 6 7 8 9 10 11 12 31 41 18 32 59 73 22 44 1 2 3 4 5 6 7 8 9 10 11 12 Maps and Dictionaries CSC311: Data Structures
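The same insertion trace can be checked in code: probe cells (h(k) + j·d(k)) mod N for j = 0, 1, … until an empty cell is found, with the slide's N = 13, h(k) = k mod 13 and d(k) = 7 − k mod 7 (class name illustrative):

```java
// Double-hashing sketch matching the slide's example table.
public class DoubleHashing {
    public static Integer[] insertAll(int[] keys) {
        int N = 13;
        Integer[] table = new Integer[N];          // null means "empty cell"
        for (int k : keys) {
            int h = k % N;                         // primary hash
            int d = 7 - k % 7;                     // secondary hash, never zero
            for (int j = 0; j < N; j++) {
                int i = (h + j * d) % N;           // j-th probe
                if (table[i] == null) { table[i] = k; break; }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] t = insertAll(new int[]{18, 41, 22, 44, 59, 32, 31, 73});
        // 31 collides at 5 and 9, then wraps to cell 0; 44 collides at 5 and lands in 10
        System.out.println(java.util.Arrays.toString(t));
    }
}
```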
238 Performance of HashingIn the worst case, searches, insertions and removals on a hash table take O(n) time The worst case occurs when all the keys inserted into the map collide The load factor a = n/N affects the performance of a hash table Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 - a) The expected running time of all the dictionary ADT operations in a hash table is O(1) In practice, hashing is very fast provided the load factor is not close to 100% Applications of hash tables: small databases compilers browser caches Maps and Dictionaries CSC311: Data Structures
239 Dictionary ADT Dictionary ADT methods:find(k): if the dictionary has an entry with key k, returns it, else, returns null findAll(k): returns an iterator of all entries with key k insert(k, o): inserts and returns the entry (k, o) remove(e): remove the entry e from the dictionary entries(): returns an iterator of the entries in the dictionary size(), isEmpty() The dictionary ADT models a searchable collection of key-element entries The main operations of a dictionary are searching, inserting, and deleting items Multiple items with the same key are allowed Applications: word-definition pairs credit card authorizations DNS mapping of host names (e.g., datastructures.net) to internet IP addresses (e.g., ) Maps and Dictionaries CSC311: Data Structures
240 Example Operation Output Dictionary insert(5,A) (5,A) (5,A)insert(7,B) (7,B) (5,A),(7,B) insert(2,C) (2,C) (5,A),(7,B),(2,C) insert(8,D) (8,D) (5,A),(7,B),(2,C),(8,D) insert(2,E) (2,E) (5,A),(7,B),(2,C),(8,D),(2,E) find(7) (7,B) (5,A),(7,B),(2,C),(8,D),(2,E) find(4) null (5,A),(7,B),(2,C),(8,D),(2,E) find(2) (2,C) (5,A),(7,B),(2,C),(8,D),(2,E) findAll(2) (2,C),(2,E) (5,A),(7,B),(2,C),(8,D),(2,E) size() 5 (5,A),(7,B),(2,C),(8,D),(2,E) remove(find(5)) (5,A) (7,B),(2,C),(8,D),(2,E) find(5) null (7,B),(2,C),(8,D),(2,E) Maps and Dictionaries CSC311: Data Structures
241 A List-Based DictionaryA log file or audit trail is a dictionary implemented by means of an unsorted sequence We store the items of the dictionary in a sequence (based on a doubly-linked list or array), in arbitrary order Performance: insert takes O(1) time since we can insert the new item at the beginning or at the end of the sequence find and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation) Maps and Dictionaries CSC311: Data Structures
242 The findAll(k) AlgorithmAlgorithm findAll(k): Input: A key k Output: An iterator of entries with key equal to k Create an initially-empty list L B = D.entries() while B.hasNext() do e = B.next() if e.key() = k then L.insertLast(e) return L.elements() Maps and Dictionaries CSC311: Data Structures
243 The insert and remove Methods
Algorithm insert(k,v):
  Input: A key k and value v
  Output: The entry (k,v) added to D
  Create a new entry e = (k,v)
  S.insertLast(e) {S is unordered}
  return e
Algorithm remove(e):
  Input: An entry e
  Output: The removed entry e, or null if e was not in D
  {We don’t assume here that e stores its location in S}
  B = S.positions()
  while B.hasNext() do
    p = B.next()
    if p.element() = e then
      S.remove(p)
      return e
  return null {there is no entry e in D}
Maps and Dictionaries CSC311: Data Structures
244 Hash Table ImplementationWe can also create a hash-table dictionary implementation. If we use separate chaining to handle collisions, then each operation can be delegated to a list-based dictionary stored at each hash table cell. Maps and Dictionaries CSC311: Data Structures
245 Binary Search Binary search performs operation find(k) on a dictionary implemented by means of an array-based sequence, sorted by key similar to the high-low game at each step, the number of candidate items is halved terminates after a logarithmic number of steps Example: find(7) in (1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19): the low (l), mid (m), and high (h) indices narrow the candidate range around 7 until l = m = h Maps and Dictionaries CSC311: Data Structures
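The l/m/h walk-through corresponds to the usual binary search loop; a sketch (class name illustrative):

```java
// Binary search sketch for find(k) on a sorted array, mirroring the
// low (l), mid (m), high (h) indices of the slide's example.
public class BinarySearch {
    public static int find(int[] a, int k) {
        int l = 0, h = a.length - 1;
        while (l <= h) {
            int m = (l + h) / 2;
            if (a[m] == k) return m;       // found: return the index
            else if (a[m] < k) l = m + 1;  // "high": discard the left half
            else h = m - 1;                // "low": discard the right half
        }
        return -1;                         // k is not in the dictionary
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19};
        System.out.println(find(a, 7));    // index of key 7
    }
}
```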
246 Search Table A search table is a dictionary implemented by means of a sorted array We store the items of the dictionary in an array-based sequence, sorted by key We use an external comparator for the keys Performance: find takes O(log n) time, using binary search insert takes O(n) time since in the worst case we have to shift n/2 items to make room for the new item remove takes O(n) time since in the worst case we have to shift n/2 items to compact the items after the removal A search table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations) Maps and Dictionaries CSC311: Data Structures
247 What is a Skip List A skip list for a set S of distinct (key, element) items is a series of lists S0, S1, …, Sh such that Each list Si contains the special keys +∞ and −∞ List S0 contains the keys of S in nondecreasing order Each list is a subsequence of the previous one, i.e., S0 ⊇ S1 ⊇ … ⊇ Sh List Sh contains only the two special keys We show how to use a skip list to implement the dictionary ADT Maps and Dictionaries CSC311: Data Structures
248 Search We search for a key x in a skip list as follows: We start at the first position of the top list At the current position p, we compare x with y ← key(next(p)) x = y: we return element(next(p)) x > y: we “scan forward” x < y: we “drop down” If we try to drop down past the bottom list, we return null Example: search for 78 Maps and Dictionaries CSC311: Data Structures
249 Randomized Algorithms A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution It contains statements of the type b ← random() if b = 0 do A … else {b = 1} do B … Its running time depends on the outcomes of the coin tosses We analyze the expected running time of a randomized algorithm under the following assumptions the coins are unbiased, and the coin tosses are independent The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give “heads”) We use a randomized algorithm to insert items into a skip list Maps and Dictionaries CSC311: Data Structures
250 Insertion To insert an entry (x, o) into a skip list, we use a randomized algorithm: We repeatedly toss a coin until we get tails, and we denote with i the number of times the coin came up heads If i ≥ h, we add to the skip list new lists Sh+1, …, Si+1, each containing only the two special keys We search for x in the skip list and find the positions p0, p1, …, pi of the items with largest key less than x in each list S0, S1, …, Si For j ← 0, …, i, we insert item (x, o) into list Sj after position pj Example: insert key 15, with i = 2 Maps and Dictionaries CSC311: Data Structures
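The coin-tossing step can be sketched on its own: toss until tails and count the heads, which gives the highest list the new entry joins, so P(level ≥ i) = 1/2^i, matching Fact 1 on a later slide. Names are illustrative:

```java
import java.util.Random;

// Sketch of the randomized level choice used by skip-list insertion.
public class SkipLevel {
    public static int randomLevel(Random rng) {
        int i = 0;
        while (rng.nextBoolean()) i++; // heads: promote the entry one more level
        return i;                      // tails ends the tosses; i = number of heads
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // the expected level is 1 (Fact 4: expected tosses until tails is 2)
        System.out.println(randomLevel(rng));
    }
}
```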
251 Deletion To remove an entry with key x from a skip list, we proceed as follows: We search for x in the skip list and find the positions p0, p1 , …, pi of the items with key x, where position pj is in list Sj We remove positions p0, p1 , …, pi from the lists S0, S1, … , Si We remove all but one list containing only the two special keys Example: remove key 34 S3 - + p2 S2 S2 - 34 + - + p1 S1 S1 - 23 34 + - 23 + p0 S0 S0 - 12 23 34 45 + - 12 23 45 + Maps and Dictionaries CSC311: Data Structures
252 Implementation We can implement a skip list with quad-nodes A quad-node stores: the entry, a link to the previous node, a link to the next node, a link to the node below, and a link to the node above Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them Maps and Dictionaries CSC311: Data Structures
253 Space Usage The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm We use the following two basic probabilistic facts: Fact 1: The probability of getting i consecutive heads when flipping a coin is 1/2^i Fact 2: If each of n entries is present in a set with probability p, the expected size of the set is np Consider a skip list with n entries By Fact 1, we insert an entry in list Si with probability 1/2^i By Fact 2, the expected size of list Si is n/2^i The expected number of nodes used by the skip list is Σi=0..h n/2^i < 2n Thus, the expected space usage of a skip list with n items is O(n) Maps and Dictionaries CSC311: Data Structures
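The geometric series behind the space bound can be checked numerically. `expectedNodes` is a hypothetical helper (a sketch, not part of any skip-list implementation) that sums n/2^i over the levels.

```java
public class SkipSpace {
    // Expected number of nodes in a skip list with n entries and
    // height h: the sum over levels i = 0..h of n / 2^i.
    public static double expectedNodes(int n, int height) {
        double total = 0.0;
        for (int i = 0; i <= height; i++) {
            total += n / Math.pow(2, i);
        }
        return total;
    }

    public static void main(String[] args) {
        // The geometric series is bounded by 2n, so space is O(n).
        System.out.println(expectedNodes(1000, 20)); // just under 2000
    }
}
```

For any height the sum stays below 2n, matching the O(n) expected-space claim on the slide.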
254 Height The running time of the search and insertion algorithms is affected by the height h of the skip list We show that with high probability, a skip list with n items has height O(log n) We use the following additional probabilistic fact: Fact 3: If each of n events has probability p, the probability that at least one event occurs is at most np Consider a skip list with n entries By Fact 1, we insert an entry in list Si with probability 1/2^i By Fact 3, the probability that list Si has at least one item is at most n/2^i By picking i = 3 log n, we have that the probability that S3 log n has at least one entry is at most n/2^(3 log n) = n/n^3 = 1/n^2 Thus a skip list with n entries has height at most 3 log n with probability at least 1 - 1/n^2 Maps and Dictionaries CSC311: Data Structures
255 Search and Update Times The search time in a skip list is proportional to the number of drop-down steps, plus the number of scan-forward steps The drop-down steps are bounded by the height of the skip list and thus are O(log n) with high probability To analyze the scan-forward steps, we use yet another probabilistic fact: Fact 4: The expected number of coin tosses required in order to get tails is 2 When we scan forward in a list, the destination key does not belong to a higher list A scan-forward step is associated with a former coin toss that gave tails By Fact 4, in each list the expected number of scan-forward steps is 2 Thus, the expected number of scan-forward steps is O(log n) We conclude that a search in a skip list takes O(log n) expected time The analysis of insertion and deletion gives similar results Maps and Dictionaries CSC311: Data Structures
256 Summary A skip list is a data structure for dictionaries that uses a randomized insertion algorithm In a skip list with n entries The expected space used is O(n) The expected search, insertion and deletion time is O(log n) Using a more complex probabilistic analysis, one can show that these performance bounds also hold with high probability Skip lists are fast and simple to implement in practice Maps and Dictionaries CSC311: Data Structures
257 Chapter 10: Search Trees Objectives:Binary Search Trees: Search, update, and implementation AVL Trees: Properties and maintenance 2-4 Trees: Properties and maintenance Red-black Trees: Properties and equivalence to 2-4 Trees CSC311: Data Structures
258 Ordered Dictionaries Keys are assumed to come from a total order.New operations: first(): first entry in the dictionary ordering last(): last entry in the dictionary ordering successors(k): iterator of entries with keys greater than or equal to k; increasing order predecessors(k): iterator of entries with keys less than or equal to k; decreasing order Search Trees CSC311: Data Structures
259 Binary Search Binary search can perform operation find(k) on a dictionary implemented by means of an array-based sequence, sorted by key similar to the high-low game at each step, the number of candidate items is halved terminates after O(log n) steps Example: find(7) in 1 3 4 5 7 8 9 11 14 16 18 19 [Figure: low, mid, and high indices converging on 7] Search Trees CSC311: Data Structures
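The find(k) operation described above can be sketched as an iterative binary search; the array and the searched key are taken from the slide's example.

```java
public class BinarySearch {
    // find(k): return the index of k in sorted array a, or -1 if absent.
    public static int find(int[] a, int k) {
        int l = 0, h = a.length - 1;
        while (l <= h) {
            int m = (l + h) / 2;       // midpoint of the candidate range
            if (a[m] == k) return m;
            else if (a[m] < k) l = m + 1;  // k is in the high half
            else h = m - 1;                // k is in the low half
        }
        return -1;  // range empty: key not found
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19};
        System.out.println(find(a, 7));  // index 4, as in the slide's trace
    }
}
```

Each iteration halves the candidate range, so the loop runs O(log n) times, matching the bound on the slide.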
260 Search Tables A search table is a dictionary implemented by means of a sorted sequence We store the items of the dictionary in an array-based sequence, sorted by key We use an external comparator for the keys Performance: find takes O(log n) time, using binary search insert takes O(n) time since in the worst case we have to shift n/2 items to make room for the new item remove takes O(n) time since in the worst case we have to shift n/2 items to compact the items after the removal The lookup table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations) Search Trees CSC311: Data Structures
261 Binary Search Trees A binary search tree is a binary tree storing keys (or key-value entries) at its internal nodes and satisfying the following property: Let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v. We have key(u) ≤ key(v) ≤ key(w) External nodes do not store items An inorder traversal of a binary search tree visits the keys in increasing order [Figure: BST with keys 1, 2, 4, 6, 8, 9] Search Trees CSC311: Data Structures
262 Search To search for a key k, we trace a downward path starting at the root The next node visited depends on the outcome of the comparison of k with the key of the current node If we reach a leaf, the key is not found and we return null Example: find(4): Call TreeSearch(4, root) Algorithm TreeSearch(k, v): if T.isExternal(v) return v; if k < key(v) return TreeSearch(k, T.left(v)) else if k = key(v) return v else { k > key(v) } return TreeSearch(k, T.right(v)) Search Trees CSC311: Data Structures
263 Insertion To perform operation insert(k, o), we search for key k (using TreeSearch) Assume k is not already in the tree, and let w be the leaf reached by the search We insert k at node w and expand w into an internal node Example: insert 5 [Figure: BST with keys 1, 2, 4, 6, 8, 9 after inserting 5 at leaf w] Search Trees CSC311: Data Structures
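A minimal BST sketch covering TreeSearch and insertion. It uses null references where the slides use external placeholder nodes — an implementation simplification, not the book's structure — and the sample keys match the slide's tree.

```java
public class BST {
    static class Node { int key; Node left, right; Node(int k) { key = k; } }
    Node root;

    // Insert k at the leaf position reached by an unsuccessful search,
    // i.e., "expand w into an internal node".
    void insert(int k) { root = insert(root, k); }
    private Node insert(Node v, int k) {
        if (v == null) return new Node(k);       // expand the external position
        if (k < v.key) v.left = insert(v.left, k);
        else if (k > v.key) v.right = insert(v.right, k);
        return v;                                // duplicates ignored
    }

    // TreeSearch: walk down, comparing k with the current node's key.
    boolean find(int k) {
        Node v = root;
        while (v != null) {
            if (k == v.key) return true;
            v = (k < v.key) ? v.left : v.right;  // go left on <, right on >
        }
        return false;                            // reached a leaf: not found
    }

    public static void main(String[] args) {
        BST t = new BST();
        for (int k : new int[]{6, 2, 9, 1, 4, 8}) t.insert(k);  // slide's tree
        t.insert(5);                                            // example: insert 5
        System.out.println(t.find(5) + " " + t.find(7));        // true false
    }
}
```

Both operations walk one root-to-leaf path, so they run in O(h) time, matching the performance slide.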
264 Deletion To perform operation remove(k), we search for key k Assume key k is in the tree, and let v be the node storing k If node v has a leaf child w, we remove v and w from the tree with operation removeExternal(w), which removes w and its parent Example: remove 4 Search Trees CSC311: Data Structures
265 Deletion (cont.) We consider the case where the key k to be removed is stored at a node v whose children are both internal we find the internal node w that follows v in an inorder traversal we copy key(w) into node v we remove node w and its left child z (which must be a leaf) by means of operation removeExternal(z) Example: remove 3 Search Trees CSC311: Data Structures
266 Performance Consider a dictionary with n items implemented by means of a binary search tree of height h the space used is O(n) methods find, insert and remove take O(h) time The height h is O(n) in the worst case and O(log n) in the best case Search Trees CSC311: Data Structures
267 AVL Tree Definition An AVL Tree is a binary search tree such that for every internal node v of T, the heights of the children of v can differ by at most 1. AVL trees are balanced. An example of an AVL tree where the heights are shown next to the nodes. Search Trees CSC311: Data Structures
268 Height of an AVL Tree Fact: The height of an AVL tree storing n keys is O(log n). Proof: Let us bound n(h): the minimum number of internal nodes of an AVL tree of height h. We easily see that n(1) = 1 and n(2) = 2 For h > 2, an AVL tree of height h contains the root node, one AVL subtree of height h-1 and another of height h-2. That is, n(h) = 1 + n(h-1) + n(h-2) Knowing n(h-1) > n(h-2), we get n(h) > 2n(h-2). So n(h) > 2n(h-2), n(h) > 4n(h-4), n(h) > 8n(h-6), … (by induction), n(h) > 2^i n(h-2i) Solving the base case we get: n(h) > 2^(h/2-1) Taking logarithms: h < 2 log n(h) + 2 Thus the height of an AVL tree is O(log n) Search Trees CSC311: Data Structures
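The recurrence n(h) = 1 + n(h-1) + n(h-2) and the bound n(h) > 2^(h/2-1) from the proof can be checked directly; `minNodes` is an illustrative helper name.

```java
public class AVLHeight {
    // n(h): minimum number of internal nodes of an AVL tree of height h,
    // using the base cases and recurrence from the proof.
    public static long minNodes(int h) {
        if (h == 1) return 1;
        if (h == 2) return 2;
        return 1 + minNodes(h - 1) + minNodes(h - 2);
    }

    public static void main(String[] args) {
        // Verify the bound n(h) > 2^(h/2 - 1) for a range of heights.
        for (int h = 3; h <= 25; h++) {
            double bound = Math.pow(2, h / 2.0 - 1);
            if (minNodes(h) <= bound) throw new AssertionError("bound fails at h=" + h);
        }
        System.out.println(minNodes(10)); // 143: sparsest AVL tree of height 10
    }
}
```

Since n(h) grows at least exponentially in h, the height of an AVL tree on n keys is O(log n), as the slide concludes.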
269 Insertion in an AVL Tree Insertion is as in a binary search tree Always done by expanding an external node. Example: insert 54 [Figure: AVL tree on keys 17, 32, 44, 48, 50, 62, 78, 88 before and after the insertion, with nodes labeled a=y, b=x, c=z and new leaf w] Search Trees CSC311: Data Structures
270 Trinode Restructuring let (a,b,c) be an inorder listing of x, y, z perform the rotations needed to make b the topmost node of the three case 1: single rotation (a left rotation about a) case 2: double rotation (a right rotation about c, then a left rotation about a) (other two cases are symmetrical) [Figure: subtrees T0–T3 rearranged by the single and double rotations] Search Trees CSC311: Data Structures
271 Insertion Example, continued [Figure: tree unbalanced at z after inserting 54; restructure(x) on nodes x, y, z restores balance] Search Trees CSC311: Data Structures
272 Restructuring: Single Rotations Search Trees CSC311: Data Structures
273 Restructuring: Double Rotations Search Trees CSC311: Data Structures
274 Removal in an AVL Tree Removal begins as in a binary search tree, which means the node removed will become an empty external node. Its parent, w, may cause an imbalance. Example: [Figure: AVL tree before and after the deletion of 32] Search Trees CSC311: Data Structures
275 Rebalancing after Removal Let z be the first unbalanced node encountered while travelling up the tree from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height. We perform restructure(x) to restore balance at z. As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached [Figure: nodes labeled a=z, b=y, c=x along the rebalancing path] Search Trees CSC311: Data Structures
276 Running Times for AVL Trees a single restructure is O(1) using a linked-structure binary tree find is O(log n): height of tree is O(log n), no restructures needed insert is O(log n): initial find is O(log n), restructuring up the tree, maintaining heights is O(log n) remove is O(log n) Search Trees CSC311: Data Structures
277 Chapter 10: Search Trees Objectives:Binary Search Trees: Search, update, and implementation AVL Trees: Properties and maintenance 2-4 Trees: Properties and maintenance Red-black Trees: Properties and equivalence to 2-4 Trees Fall 2006 CSC311: Data Structures
278 Multi-Way Search Tree A multi-way search tree is an ordered tree such that Each internal node has at least two children and stores d-1 key-element items (ki, oi), where d is the number of children For a node with children v1 v2 … vd storing keys k1 k2 … kd-1 keys in the subtree of v1 are less than k1 keys in the subtree of vi are between ki-1 and ki (i = 2, …, d - 1) keys in the subtree of vd are greater than kd-1 The leaves store no items and serve as placeholders Spring 2006 CSC311: Data Structures
279 Multi-Way Inorder Traversal We can extend the notion of inorder traversal from binary trees to multi-way search trees Namely, we visit item (ki, oi) of node v between the recursive traversals of the subtrees of v rooted at children vi and vi+1 An inorder traversal of a multi-way search tree visits the keys in increasing order [Figure: multi-way search tree with visit order 1–19] Spring 2006 CSC311: Data Structures
280 Multi-Way Searching Similar to search in a binary search tree At each internal node with children v1 v2 … vd and keys k1 k2 … kd-1 k = ki (i = 1, …, d - 1): the search terminates successfully k < k1: we continue the search in child v1 ki-1 < k < ki (i = 2, …, d - 1): we continue the search in child vi k > kd-1: we continue the search in child vd Reaching an external node terminates the search unsuccessfully Example: search for 30 Spring 2006 CSC311: Data Structures
281 (2,4) Trees A (2,4) tree (also called 2-4 tree or 2-3-4 tree) is a multi-way search tree with the following properties Node-Size Property: every internal node has at most four children Depth Property: all the external nodes have the same depth Depending on the number of children, an internal node of a (2,4) tree is called a 2-node, 3-node or 4-node Spring 2006 CSC311: Data Structures
282 Height of a (2,4) Tree Theorem: A (2,4) tree storing n items has height O(log n) Proof: Let h be the height of a (2,4) tree with n items Since there are at least 2^i items at depth i = 0, …, h - 1 and no items at depth h, we have n ≥ 1 + 2 + 4 + … + 2^(h-1) = 2^h - 1 Thus, h ≤ log (n + 1) Searching in a (2,4) tree with n items takes O(log n) time Spring 2006 CSC311: Data Structures
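The counting argument can be checked directly: summing at least 2^i items per depth i gives at least 2^h − 1 items in a (2,4) tree of height h. `minItems` is an illustrative helper name.

```java
public class TwoFourHeight {
    // Minimum number of items in a (2,4) tree of height h:
    // at least 2^i nodes at depth i, each holding at least one item,
    // so n >= 1 + 2 + ... + 2^(h-1) = 2^h - 1.
    public static long minItems(int h) {
        return (1L << h) - 1;
    }

    public static void main(String[] args) {
        int n = 1000;
        // From n >= 2^h - 1 we get h <= log2(n + 1).
        double maxHeight = Math.log(n + 1) / Math.log(2);
        System.out.println(maxHeight); // about 9.97, so height at most 9
    }
}
```

So even the sparsest (2,4) tree on 1000 items has height below 10, consistent with the O(log n) bound.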
283 Insertion We insert a new item (k, o) at the parent v of the leaf reached by searching for k We preserve the depth property but We may cause an overflow (i.e., node v may become a 5-node) Example: inserting key 30 causes an overflow Spring 2006 CSC311: Data Structures
284 Overflow and Split We handle an overflow at a 5-node v with a split operation: let v1 … v5 be the children of v and k1 … k4 be the keys of v node v is replaced by nodes v' and v" v' is a 3-node with keys k1 k2 and children v1 v2 v3 v" is a 2-node with key k4 and children v4 v5 key k3 is inserted into the parent u of v (a new root may be created) The overflow may propagate to the parent node u [Figure: split of the 5-node with keys 12 18 27 30 35] Spring 2006 CSC311: Data Structures
285 Analysis of Insertion Let T be a (2,4) tree with n items Tree T has O(log n) height Step 1 takes O(log n) time because we visit O(log n) nodes Step 2 takes O(1) time Step 3 takes O(log n) time because each split takes O(1) time and we perform O(log n) splits Thus, an insertion in a (2,4) tree takes O(log n) time Algorithm insert(k, o): 1. Search for key k to locate the insertion node v 2. Add the new entry (k, o) at node v 3. while overflow(v): if isRoot(v) create a new empty root above v; v ← split(v) Spring 2006 CSC311: Data Structures
286 Deletion We reduce deletion of an entry to the case where the item is at a node with leaf children Otherwise, we replace the entry with its inorder successor (or, equivalently, with its inorder predecessor) and delete the latter entry Example: to delete key 24, we replace it with 27 (inorder successor) Spring 2006 CSC311: Data Structures
287 Underflow and Fusion Deleting an entry from a node v may cause an underflow, where node v becomes a 1-node with one child and no keys To handle an underflow at node v with parent u, we consider two cases Case 1: the adjacent siblings of v are 2-nodes Fusion operation: we merge v with an adjacent sibling w and move an entry from u to the merged node v' After a fusion, the underflow may propagate to the parent u [Figure: fusion of v with sibling w under parent u, keys 9, 10, 14] Spring 2006 CSC311: Data Structures
288 Underflow and Transfer Case 2: an adjacent sibling w of v is a 3-node or a 4-node Transfer operation: 1. we move a child of w to v 2. we move an item from u to v 3. we move an item from w to u After a transfer, no underflow occurs [Figure: transfer between siblings v and w through parent u, keys 2, 4, 6, 8, 9] Spring 2006 CSC311: Data Structures
289 Analysis of Deletion Let T be a (2,4) tree with n itemsTree T has O(log n) height In a deletion operation We visit O(log n) nodes to locate the node from which to delete the entry We handle an underflow with a series of O(log n) fusions, followed by at most one transfer Each fusion and transfer takes O(1) time Thus, deleting an item from a (2,4) tree takes O(log n) time Spring 2006 CSC311: Data Structures
290 Implementing a Dictionary Comparison of efficient dictionary implementations:
             Search            Insert            Delete            Notes
Hash Table   1 expected        1 expected        1 expected        no ordered dictionary methods; simple to implement
Skip List    log n high prob.  log n high prob.  log n high prob.  randomized insertion
(2,4) Tree   log n worst-case  log n worst-case  log n worst-case  complex to implement
Spring 2006 CSC311: Data Structures
291 Red-Black Trees A red-black tree can also be defined as a binary search tree that satisfies the following properties: Root Property: the root is black External Property: every leaf is black Internal Property: the children of a red node are black Depth Property: all the leaves have the same black depth [Figure: red-black tree on keys 2, 4, 6, 7, 9, 12, 15, 21] Spring 2006 CSC311: Data Structures
292 From (2,4) to Red-Black Trees A red-black tree is a representation of a (2,4) tree by means of a binary tree whose nodes are colored red or black In comparison with its associated (2,4) tree, a red-black tree has the same logarithmic time performance and a simpler implementation with a single node type [Figure: a 3-node with keys 4 and 5 represented by either of two colored binary trees] Spring 2006 CSC311: Data Structures
293 Height of a Red-Black Tree Theorem: A red-black tree storing n entries has height O(log n) Proof: The height of a red-black tree is at most twice the height of its associated (2,4) tree, which is O(log n) The search algorithm for a red-black tree is the same as that for a binary search tree By the above theorem, searching in a red-black tree takes O(log n) time Spring 2006 CSC311: Data Structures
294 Insertion To perform operation insert(k, o), we execute the insertion algorithm for binary search trees and color red the newly inserted node z unless it is the root We preserve the root, external, and depth properties If the parent v of z is black, we also preserve the internal property and we are done Else (v is red) we have a double red (i.e., a violation of the internal property), which requires a reorganization of the tree Example where the insertion of 4 causes a double red: [Figure: node z = 4 inserted as red child of red node v = 3] Spring 2006 CSC311: Data Structures
295 Remedying a Double Red Consider a double red with child z and parent v, and let w be the sibling of v Case 1: w is black The double red is an incorrect replacement of a 4-node Restructuring: we change the 4-node replacement Case 2: w is red The double red corresponds to an overflow Recoloring: we perform the equivalent of a split [Figure: the two cases on nodes 2, 4, 6, 7 with z = 6, v = 7, w = 2] Spring 2006 CSC311: Data Structures
296 Restructuring A restructuring remedies a child-parent double red when the parent red node has a black sibling It is equivalent to restoring the correct replacement of a 4-node The internal property is restored and the other properties are preserved [Figure: restructuring of nodes 2, 4, 6, 7 making 6 the new subtree root] Spring 2006 CSC311: Data Structures
297 Restructuring (cont.) There are four restructuring configurations depending on whether the double red nodes are left or right children [Figure: the four configurations on keys 2, 4, 6, each restructured to 4 with children 2 and 6] Spring 2006 CSC311: Data Structures
298 Recoloring A recoloring remedies a child-parent double red when the parent red node has a red sibling The parent v and its sibling w become black and the grandparent u becomes red, unless it is the root It is equivalent to performing a split on a 5-node The double red violation may propagate to the grandparent u [Figure: recoloring on nodes 2, 4, 6, 7 with z = 6, v = 7, w = 2] Spring 2006 CSC311: Data Structures
299 Analysis of Insertion Recall that a red-black tree has O(log n) height Step 1 takes O(log n) time because we visit O(log n) nodes Step 2 takes O(1) time Step 3 takes O(log n) time because we perform O(log n) recolorings, each taking O(1) time, and at most one restructuring taking O(1) time Thus, an insertion in a red-black tree takes O(log n) time Algorithm insert(k, o): 1. We search for key k to locate the insertion node z 2. We add the new entry (k, o) at node z and color z red 3. while doubleRed(z): if isBlack(sibling(parent(z))) z ← restructure(z); return else { sibling(parent(z)) is red } z ← recolor(z) Spring 2006 CSC311: Data Structures
300 Deletion To perform operation remove(k), we first execute the deletion algorithm for binary search trees Let v be the internal node removed, w the external node removed, and r the sibling of w If either v or r was red, we color r black and we are done Else (v and r were both black) we color r double black, which is a violation of the internal property requiring a reorganization of the tree Example where the deletion of 8 causes a double black: [Figure: removing 8 from the tree on keys 3, 4, 6, 8 leaves sibling r double black] Spring 2006 CSC311: Data Structures
301 Remedying a Double Black The algorithm for remedying a double black node w with sibling y considers three cases Case 1: y is black and has a red child We perform a restructuring, equivalent to a transfer, and we are done Case 2: y is black and its children are both black We perform a recoloring, equivalent to a fusion, which may propagate up the double black violation Case 3: y is red We perform an adjustment, equivalent to choosing a different representation of a 3-node, after which either Case 1 or Case 2 applies Deletion in a red-black tree takes O(log n) time Spring 2006 CSC311: Data Structures
302 Red-Black Tree Reorganization
Insertion (remedy double red):
Red-black tree action   (2,4) tree action                Result
restructuring           change of 4-node representation  double red removed
recoloring              split                            double red removed or propagated up
Deletion (remedy double black):
Red-black tree action   (2,4) tree action                Result
restructuring           transfer                         double black removed
recoloring              fusion                           double black removed or propagated up
adjustment              change of 3-node representation  restructuring or recoloring follows
Spring 2006 CSC311: Data Structures
303 Chapter 13: Graphs I Objectives: Graph ADT: Operations Graph Implementation: Data structures Graph Traversals: DFS and BFS Directed graphs Weighted graphs Shortest paths Minimum Spanning Trees (MST) CSC311: Data Structures
304 Graphs A graph is a pair (V, E), where V is a set of nodes, called vertices E is a collection of pairs of vertices, called edges Vertices and edges are positions and store elements Example: A vertex represents an airport and stores the three-letter airport code An edge represents a flight route between two airports and stores the mileage of the route [Figure: flight network on airports ORD, PVD, SFO, LGA, HNL, LAX, DFW, MIA with mileage-labeled edges] Graphs I CSC311: Data Structures
305 Edge Types Directed edge ordered pair of vertices (u,v) first vertex u is the origin second vertex v is the destination e.g., a flight Undirected edge unordered pair of vertices (u,v) e.g., a flight route Directed graph all the edges are directed e.g., route network Undirected graph all the edges are undirected e.g., flight network [Figure: directed edge flight AA 1206 from ORD to PVD; undirected edge ORD–PVD, 849 miles] Graphs I CSC311: Data Structures
306 Applications Electronic circuits Printed circuit board Integrated circuit Transportation networks Highway network Flight network Computer networks Local area network Internet Web Databases Entity-relationship diagram Graphs I CSC311: Data Structures
307 Terminology End vertices (or endpoints) of an edge U and V are the endpoints of a Edges incident on a vertex a, d, and b are incident on V Adjacent vertices U and V are adjacent Degree of a vertex X has degree 5 Parallel edges h and i are parallel edges Self-loop j is a self-loop [Figure: graph on vertices U, V, W, X, Y, Z with edges a–j] Graphs I CSC311: Data Structures
308 Terminology (cont.) Path sequence of alternating vertices and edges begins with a vertex ends with a vertex each edge is preceded and followed by its endpoints Simple path path such that all its vertices and edges are distinct Examples P1=(V,b,X,h,Z) is a simple path P2=(U,c,W,e,X,g,Y,f,W,d,V) is a path that is not simple [Figure: paths P1 and P2 in the graph on vertices U, V, W, X, Y, Z] Graphs I CSC311: Data Structures
309 Terminology (cont.) Cycle circular sequence of alternating vertices and edges each edge is preceded and followed by its endpoints Simple cycle cycle such that all its vertices and edges are distinct Examples C1=(V,b,X,g,Y,f,W,c,U,a,V) is a simple cycle C2=(U,c,W,e,X,g,Y,f,W,d,V,a,U) is a cycle that is not simple [Figure: cycles C1 and C2 in the graph on vertices U, V, W, X, Y, Z] Graphs I CSC311: Data Structures
310 Properties Notation n number of vertices m number of edges deg(v) degree of vertex v Property 1 Σv deg(v) = 2m Proof: each edge is counted twice Property 2 In an undirected graph with no self-loops and no multiple edges m ≤ n(n - 1)/2 Proof: each vertex has degree at most (n - 1) What is the bound for a directed graph? Example n = 4 m = 6 deg(v) = 3 Graphs I CSC311: Data Structures
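Property 1 can be verified on the slide's example graph (n = 4, m = 6, every degree 3) with a small adjacency-list sketch; the names here are illustrative, not the book's Graph ADT.

```java
import java.util.*;

public class DegreeSum {
    // Sum of all vertex degrees: each adjacency list's length is deg(v).
    public static int degreeSum(Map<Integer, List<Integer>> adj) {
        int sum = 0;
        for (List<Integer> nbrs : adj.values()) sum += nbrs.size();
        return sum;
    }

    public static void main(String[] args) {
        // Complete graph on 4 vertices: n = 4, m = 6, deg(v) = 3 for all v.
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int v = 0; v < 4; v++) {
            adj.put(v, new ArrayList<>());
            for (int w = 0; w < 4; w++) if (w != v) adj.get(v).add(w);
        }
        System.out.println(degreeSum(adj)); // 12 = 2m, since each edge is counted twice
    }
}
```

Every edge {u, w} appears once in u's list and once in w's, which is exactly the "counted twice" argument in the proof.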
311 Main Methods of the Graph ADT Vertices and edges are positions and store elements Accessor methods endVertices(e): an array of the two endvertices of e opposite(v, e): the vertex opposite of v on e areAdjacent(v, w): true iff v and w are adjacent replace(v, x): replace element at vertex v with x replace(e, x): replace element at edge e with x Update methods insertVertex(o): insert a vertex storing element o insertEdge(v, w, o): insert an edge (v,w) storing element o removeVertex(v): remove vertex v (and its incident edges) removeEdge(e): remove edge e Iterator methods incidentEdges(v): edges incident to v vertices(): all vertices in the graph edges(): all edges in the graph Graphs I CSC311: Data Structures
312 Edge List Structure Vertex object element reference to position in vertex sequence Edge object origin vertex object destination vertex object reference to position in edge sequence Vertex sequence sequence of vertex objects Edge sequence sequence of edge objects [Figure: edge list structure for vertices u, v, w, z and edges a, b, c, d] Graphs I CSC311: Data Structures
313 Adjacency List Structure Edge list structure Incidence sequence for each vertex sequence of references to edge objects of incident edges Augmented edge objects references to associated positions in incidence sequences of end vertices [Figure: adjacency list structure for vertices u, v, w and edges a, b] Graphs I CSC311: Data Structures
314 Adjacency Matrix Structure Edge list structure Augmented vertex objects Integer key (index) associated with vertex 2D-array adjacency array Reference to edge object for adjacent vertices Null for nonadjacent vertices The “old fashioned” version just has 0 for no edge and 1 for edge [Figure: adjacency matrix structure for vertices u, v, w and edges a, b] Graphs I CSC311: Data Structures
315 Asymptotic Performance n vertices, m edges no parallel edges no self-loops Bounds are “big-Oh”
                     Edge List   Adjacency List       Adjacency Matrix
Space                n + m       n + m                n^2
incidentEdges(v)     m           deg(v)               n
areAdjacent(v, w)    m           min(deg(v), deg(w))  1
insertVertex(o)      1           1                    n^2
insertEdge(v, w, o)  1           1                    1
removeVertex(v)      m           deg(v)               n^2
removeEdge(e)        1           1                    1
Graphs I CSC311: Data Structures
316 Subgraphs A subgraph S of a graph G is a graph such thatThe vertices of S are a subset of the vertices of G The edges of S are a subset of the edges of G A spanning subgraph of G is a subgraph that contains all the vertices of G Subgraph Spanning subgraph Graphs I CSC311: Data Structures
317 Connectivity A graph is connected if there is a path between every pair of vertices A connected component of a graph G is a maximal connected subgraph of G [Figure: a connected graph, and a non-connected graph with two connected components] Graphs I CSC311: Data Structures
318 Trees and Forests A (free) tree is an undirected graph T such thatT is connected T has no cycles This definition of tree is different from the one of a rooted tree A forest is an undirected graph without cycles The connected components of a forest are trees Tree Forest Graphs I CSC311: Data Structures
319 Spanning Trees and ForestsA spanning tree of a connected graph is a spanning subgraph that is a tree A spanning tree is not unique unless the graph is a tree Spanning trees have applications to the design of communication networks A spanning forest of a graph is a spanning subgraph that is a forest Graph Spanning tree Graphs I CSC311: Data Structures
320 Depth-First Search Depth-first search (DFS) is a general technique for traversing a graph A DFS traversal of a graph G Visits all the vertices and edges of G Determines whether G is connected Computes the connected components of G Computes a spanning forest of G DFS on a graph with n vertices and m edges takes O(n + m) time DFS can be further extended to solve other graph problems Find and report a path between two given vertices Find a cycle in the graph Depth-first search is to graphs what the Euler tour traversal is to binary trees Graphs I CSC311: Data Structures
321 DFS Algorithm The algorithm uses a mechanism for setting and getting “labels” of vertices and edges Algorithm DFS(G, v) Input graph G and a start vertex v of G Output labeling of the edges of G in the connected component of v as discovery edges and back edges setLabel(v, VISITED) for all e ∈ G.incidentEdges(v) if getLabel(e) = UNEXPLORED w ← opposite(v,e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) DFS(G, w) else setLabel(e, BACK) Algorithm DFS(G) Input graph G Output labeling of the edges of G as discovery edges and back edges for all u ∈ G.vertices() setLabel(u, UNEXPLORED) for all e ∈ G.edges() setLabel(e, UNEXPLORED) for all v ∈ G.vertices() if getLabel(v) = UNEXPLORED DFS(G, v) Graphs I CSC311: Data Structures
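A minimal Java rendering of DFS(G, v), assuming an adjacency-list map and using a visited set in place of the pseudocode's vertex and edge labels — a simplification for illustration, not the book's Graph ADT.

```java
import java.util.*;

public class DFS {
    // Recursive DFS from v: marks every vertex in v's connected component.
    public static void dfs(Map<Integer, List<Integer>> adj, int v, Set<Integer> visited) {
        visited.add(v);                            // setLabel(v, VISITED)
        for (int w : adj.getOrDefault(v, List.of())) {
            if (!visited.contains(w)) {
                dfs(adj, w, visited);              // (v, w) is a discovery edge
            }                                      // else (v, w) is a back edge
        }
    }

    public static void main(String[] args) {
        // Undirected graph: edges 0-1, 0-2, 1-3, 2-4 (one component).
        Map<Integer, List<Integer>> adj = new HashMap<>();
        int[][] edges = {{0, 1}, {0, 2}, {1, 3}, {2, 4}};
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        Set<Integer> visited = new HashSet<>();
        dfs(adj, 0, visited);
        System.out.println(visited.size());        // 5: every vertex reached
    }
}
```

Each vertex is visited once and each edge is examined from both endpoints, giving the O(n + m) bound stated on the previous slide.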
322 Example [Figure: DFS on a graph with vertices A, B, C, D, E; legend shows unexplored vertex, visited vertex, unexplored edge, discovery edge, back edge] Graphs I CSC311: Data Structures
323 Example (cont.) [Figure: remaining steps of the DFS traversal on vertices A, B, C, D, E] Graphs I CSC311: Data Structures
324 Properties of DFS Property 1 DFS(G, v) visits all the vertices and edges in the connected component of v Property 2 The discovery edges labeled by DFS(G, v) form a spanning tree of the connected component of v Graphs I CSC311: Data Structures
325 Analysis of DFS Setting/getting a vertex/edge label takes O(1) time Each vertex is labeled twice once as UNEXPLORED once as VISITED Each edge is labeled twice once as UNEXPLORED once as DISCOVERY or BACK Method incidentEdges is called once for each vertex DFS runs in O(n + m) time provided the graph is represented by the adjacency list structure Recall that Σv deg(v) = 2m Graphs I CSC311: Data Structures
326 Path Finding We can specialize the DFS algorithm to find a path between two given vertices u and z using the template method pattern We call DFS(G, u) with u as the start vertex We use a stack S to keep track of the path between the start vertex and the current vertex As soon as destination vertex z is encountered, we return the path as the contents of the stack Algorithm pathDFS(G, v, z) setLabel(v, VISITED) S.push(v) if v = z return S.elements() for all e ∈ G.incidentEdges(v) if getLabel(e) = UNEXPLORED w ← opposite(v,e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) S.push(e) pathDFS(G, w, z) S.pop(e) else setLabel(e, BACK) S.pop(v) Graphs I CSC311: Data Structures
327 Cycle Finding We can specialize the DFS algorithm to find a simple cycle using the template method pattern We use a stack S to keep track of the path between the start vertex and the current vertex As soon as a back edge (v, w) is encountered, we return the cycle as the portion of the stack from the top to vertex w Algorithm cycleDFS(G, v, z) setLabel(v, VISITED) S.push(v) for all e ∈ G.incidentEdges(v) if getLabel(e) = UNEXPLORED w ← opposite(v,e) S.push(e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) cycleDFS(G, w, z) S.pop(e) else T ← new empty stack repeat o ← S.pop() T.push(o) until o = w return T.elements() S.pop(v) Graphs I CSC311: Data Structures
328 Breadth-First Search Breadth-first search (BFS) is a general technique for traversing a graph A BFS traversal of a graph G Visits all the vertices and edges of G Determines whether G is connected Computes the connected components of G Computes a spanning forest of G BFS on a graph with n vertices and m edges takes O(n + m ) time BFS can be further extended to solve other graph problems Find and report a path with the minimum number of edges between two given vertices Find a simple cycle, if there is one Graphs I CSC311: Data Structures
329 BFS Algorithm The algorithm uses a mechanism for setting and getting “labels” of vertices and edges Algorithm BFS(G, s) L0 ← new empty sequence L0.insertLast(s) setLabel(s, VISITED) i ← 0 while ¬Li.isEmpty() Li+1 ← new empty sequence for all v ∈ Li.elements() for all e ∈ G.incidentEdges(v) if getLabel(e) = UNEXPLORED w ← opposite(v,e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) setLabel(w, VISITED) Li+1.insertLast(w) else setLabel(e, CROSS) i ← i + 1 Algorithm BFS(G) Input graph G Output labeling of the edges and partition of the vertices of G for all u ∈ G.vertices() setLabel(u, UNEXPLORED) for all e ∈ G.edges() setLabel(e, UNEXPLORED) for all v ∈ G.vertices() if getLabel(v) = UNEXPLORED BFS(G, v) Graphs I CSC311: Data Structures
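A minimal Java rendering of BFS(G, s) that records each vertex's level. It uses a single queue instead of the pseudocode's explicit sequences Li, and the level map doubles as the VISITED labels — a sketch under those simplifications, not the book's ADT.

```java
import java.util.*;

public class BFS {
    // Returns, for each vertex reachable from s, the number of edges
    // on a shortest path from s (i.e., its BFS level i).
    public static Map<Integer, Integer> levels(Map<Integer, List<Integer>> adj, int s) {
        Map<Integer, Integer> level = new HashMap<>();
        Queue<Integer> queue = new ArrayDeque<>();
        level.put(s, 0);                           // s starts level L0
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.getOrDefault(v, List.of())) {
                if (!level.containsKey(w)) {       // (v, w) is a discovery edge
                    level.put(w, level.get(v) + 1);
                    queue.add(w);
                }                                  // else (v, w) is a cross edge
            }
        }
        return level;
    }

    public static void main(String[] args) {
        // Edges A-B, A-C, A-D, B-E, C-E, D-F with A..F as 0..5.
        Map<Integer, List<Integer>> adj = new HashMap<>();
        int[][] edges = {{0, 1}, {0, 2}, {0, 3}, {1, 4}, {2, 4}, {3, 5}};
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        // Levels: A at 0; B, C, D at 1; E, F at 2.
        System.out.println(levels(adj, 0));
    }
}
```

The level of each vertex equals the minimum edge count of any path from s, which is BFS's Property 3 on the later slide.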
330 Example Legend: unexplored vertex, visited vertex, unexplored edge, discovery edge, cross edge. [Figure: first steps of BFS from vertex A on a graph with vertices A–F; level L0 = {A}, level L1 = {B, C, D}] Graphs I CSC311: Data Structures
331 Example (cont.) [Figure: BFS continues; level L1 = {B, C, D} is processed and level L2 = {E, F} is discovered] Graphs I CSC311: Data Structures
332 Example (cont.) [Figure: BFS completes; every edge is labeled as a discovery or cross edge] Graphs I CSC311: Data Structures
333 Properties Notation Gs: connected component of s Property 1 BFS(G, s) visits all the vertices and edges of Gs Property 2 The discovery edges labeled by BFS(G, s) form a spanning tree Ts of Gs Property 3 For each vertex v in Li The path of Ts from s to v has i edges Every path from s to v in Gs has at least i edges [Figure: example with L0 = {A}, L1 = {B, C, D}, L2 = {E, F}] Graphs I CSC311: Data Structures
334 Analysis Setting/getting a vertex/edge label takes O(1) time Each vertex is labeled twice once as UNEXPLORED once as VISITED Each edge is labeled twice once as UNEXPLORED once as DISCOVERY or CROSS Each vertex is inserted once into a sequence Li Method incidentEdges is called once for each vertex BFS runs in O(n + m) time provided the graph is represented by the adjacency list structure Recall that Σv deg(v) = 2m Graphs I CSC311: Data Structures
335 Applications Using the template method pattern, we can specialize the BFS traversal of a graph G to solve the following problems in O(n + m) time Compute the connected components of G Compute a spanning forest of G Find a simple cycle in G, or report that G is a forest Given two vertices of G, find a path in G between them with the minimum number of edges, or report that no such path exists Graphs I CSC311: Data Structures
336 DFS vs. BFS Applications Spanning forest, connected components, paths, cycles: DFS or BFS Shortest paths: BFS Biconnected components: DFS [Figure: DFS and BFS traversals of the same graph on vertices A–F] Graphs I CSC311: Data Structures
337 DFS vs. BFS (cont.) Back edge (v,w) in DFS: w is an ancestor of v in the tree of discovery edges Cross edge (v,w) in BFS: w is in the same level as v or in the next level in the tree of discovery edges [Figure: DFS and BFS traversals of the same graph on vertices A–F] Graphs I CSC311: Data Structures
338 Digraphs A digraph is a graph whose edges are all directedShort for “directed graph” Applications one-way streets flights task scheduling A C E B D Graphs I CSC311: Data Structures
339 Digraph Properties A graph G=(V,E) such that Each edge goes in one direction: Edge (a,b) goes from a to b, but not from b to a. If G is simple, m ≤ n(n-1). If we keep in-edges and out-edges in separate adjacency lists, we can perform listing of in-edges and out-edges in time proportional to their size. Graphs I CSC311: Data Structures
340 Digraph Application Scheduling: edge (a,b) means task a must be completed before b can be started ics21 ics22 ics23 ics51 ics53 ics52 ics161 ics131 ics141 ics121 ics171 The good life ics151 Graphs I CSC311: Data Structures
341 Directed DFS We can specialize the traversal algorithms (DFS and BFS) to digraphs by traversing edges only along their direction In the directed DFS algorithm, we have four types of edges discovery edges back edges forward edges cross edges A directed DFS starting at a vertex s determines the vertices reachable from s Graphs I CSC311: Data Structures
342 Reachability DFS tree rooted at v: vertices reachable from v via directed paths [Figure: three DFS trees rooted at different vertices of the same digraph on vertices A–F] Graphs I CSC311: Data Structures
343 Strong Connectivity A digraph is strongly connected if each vertex can reach all other vertices [Figure: strongly connected digraph on vertices a–g] Graphs I CSC311: Data Structures
344 Strong Connectivity Algorithm Pick a vertex v in G. Perform a DFS from v in G. If there’s a w not visited, print “no”. Let G’ be G with all edges reversed. Perform a DFS from v in G’. If there’s a w not visited, print “no”. Else, print “yes”. Running time: O(n+m). [Figure: DFS from vertex a in G and in the reversed digraph G’] Graphs I CSC311: Data Structures
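The two-DFS test above can be sketched in Java. This is an illustrative fragment with invented names: it assumes vertices 0..n-1, an adjacency list of out-edges, and uses an iterative DFS to count reachable vertices in G and in the reversed digraph G'.

```java
import java.util.*;

// Hedged sketch of the O(n+m) strong-connectivity test: DFS from a vertex
// in G, then DFS from the same vertex in G with every edge reversed.
public class StrongConnectivity {
    public static boolean isStronglyConnected(List<List<Integer>> adj) {
        int n = adj.size();
        if (n == 0) return true;
        if (countReachable(adj, 0) < n) return false;     // some w not visited
        List<List<Integer>> rev = new ArrayList<>();
        for (int i = 0; i < n; i++) rev.add(new ArrayList<>());
        for (int u = 0; u < n; u++)
            for (int w : adj.get(u)) rev.get(w).add(u);   // G' = G reversed
        return countReachable(rev, 0) == n;               // every vertex reaches v
    }

    private static int countReachable(List<List<Integer>> adj, int s) {
        boolean[] visited = new boolean[adj.size()];
        Deque<Integer> stack = new ArrayDeque<>(List.of(s));
        visited[s] = true;
        int count = 0;
        while (!stack.isEmpty()) {
            int v = stack.pop();
            count++;
            for (int w : adj.get(v))
                if (!visited[w]) { visited[w] = true; stack.push(w); }
        }
        return count;
    }
}
```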
345 Strongly Connected Components Maximal subgraphs such that each vertex can reach all other vertices in the subgraph Can also be done in O(n+m) time using DFS, but is more complicated (similar to biconnectivity). [Figure: digraph on vertices a–g with strongly connected components {a, c, g} and {f, d, e, b}] Graphs I CSC311: Data Structures
346 Transitive Closure Given a digraph G, the transitive closure of G is the digraph G* such that G* has the same vertices as G if G has a directed path from u to v (u ≠ v), G* has a directed edge from u to v The transitive closure provides reachability information about a digraph [Figure: a digraph G and its transitive closure G*] Graphs I CSC311: Data Structures
347 Computing the Transitive Closure We can perform DFS starting at each vertex: O(n(n+m)) If there's a way to get from A to B and from B to C, then there's a way to get from A to C. Alternatively ... Use dynamic programming: the Floyd-Warshall Algorithm Graphs I CSC311: Data Structures
348 Floyd-Warshall Transitive Closure Idea #1: Number the vertices 1, 2, …, n. Idea #2: Consider paths that use only vertices numbered 1, 2, …, k, as intermediate vertices: a path from i to j using only vertices numbered 1, …, k exists iff either a path from i to j uses only vertices numbered 1, …, k-1, or paths from i to k and from k to j each use only vertices numbered 1, …, k-1 (add the edge (i,j) if it’s not already in) Graphs I CSC311: Data Structures
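The Floyd-Warshall recurrence above can be sketched in Java on a boolean adjacency matrix. The class name and 0-based indexing are illustrative; the triple loop is the algorithm itself, with k as the highest-numbered intermediate vertex allowed so far.

```java
// Hedged sketch of the Floyd-Warshall transitive closure: reach[i][j]
// becomes true iff the digraph has a directed path from i to j.
public class TransitiveClosure {
    public static boolean[][] closure(boolean[][] adj) {
        int n = adj.length;
        boolean[][] reach = new boolean[n][n];
        for (int i = 0; i < n; i++) reach[i] = adj[i].clone();
        for (int k = 0; k < n; k++)              // allow k as an intermediate vertex
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (reach[i][k] && reach[k][j]) reach[i][j] = true;
        return reach;
    }
}
```

The running time is O(n^3), which beats the n-DFS approach O(n(n+m)) only on dense digraphs.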
349 Topological Sorting Number vertices, so that (u,v) in E implies u < v A typical student day: 1 wake up 2 eat 3 study computer sci. 4 nap 5 more c.s. 6 work out 7 play 8 write c.s. program 9 make cookies for professors 10 sleep 11 dream about graphs Graphs I CSC311: Data Structures
350 Algorithm for Topological Sorting Note: This algorithm is different than the one in Goodrich-Tamassia Running time: O(n + m). How…? Method TopologicalSort(G) H ← G // Temporary copy of G n ← G.numVertices() while H is not empty do Let v be a vertex with no outgoing edges Label v ← n n ← n - 1 Remove v from H Graphs I CSC311: Data Structures
351 Topological Sorting Algorithm using DFS Simulate the algorithm by using depth-first search O(n+m) time. Algorithm topologicalDFS(G, v) Input graph G and a start vertex v of G Output labeling of the vertices of G in the connected component of v setLabel(v, VISITED) for all e ∈ G.incidentEdges(v) if getLabel(e) = UNEXPLORED w ← opposite(v,e) if getLabel(w) = UNEXPLORED setLabel(e, DISCOVERY) topologicalDFS(G, w) else {e is a forward or cross edge} Label v with topological number n n ← n - 1 Algorithm topologicalDFS(G) Input dag G Output topological ordering of G n ← G.numVertices() for all u ∈ G.vertices() setLabel(u, UNEXPLORED) for all e ∈ G.edges() setLabel(e, UNEXPLORED) for all v ∈ G.vertices() if getLabel(v) = UNEXPLORED topologicalDFS(G, v) Graphs I CSC311: Data Structures
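The DFS-based topological sort above can be sketched in Java. The representation is illustrative (vertices 0..n-1, adjacency list of out-edges, a one-element array as a mutable counter for n); the key point is that a vertex receives its number only after all its successors have received theirs.

```java
import java.util.*;

// Hedged sketch of topologicalDFS: assigns numbers n, n-1, ..., 1 to the
// vertices of a DAG as their recursive calls finish.
public class TopologicalSortDFS {
    public static int[] order(List<List<Integer>> adj) {
        int n = adj.size();
        int[] label = new int[n];
        boolean[] visited = new boolean[n];
        int[] next = { n };                       // next topological number to assign
        for (int v = 0; v < n; v++)
            if (!visited[v]) dfs(adj, v, visited, label, next);
        return label;
    }

    private static void dfs(List<List<Integer>> adj, int v,
                            boolean[] visited, int[] label, int[] next) {
        visited[v] = true;
        for (int w : adj.get(v))
            if (!visited[w]) dfs(adj, w, visited, label, next);
        label[v] = next[0]--;                     // number v after all its successors
    }
}
```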
352 Topological Sorting Example [Figures: ten slides step through the DFS-based algorithm, assigning topological numbers 9, 8, 7, 6, 5, 4, 3, 2, 1 to the vertices as their recursive calls finish] Graphs I CSC311: Data Structures
362 Chapter 13: Graphs II Objectives: Graph ADT: OperationsGraph Implementation: Data structures Graph Traversals: DFS and BFS Directed graphs Weighted graphs Shortest paths Minimum Spanning Trees (MST) CSC311: Data Structures
363 Weighted Graphs In a weighted graph, each edge has an associated numerical value, called the weight of the edge Edge weights may represent distances, costs, etc. Example: In a flight route graph, the weight of an edge represents the distance in miles between the endpoint airports [Figure: flight route graph on airports PVD, ORD, SFO, LGA, HNL, LAX, DFW, MIA with edge weights in miles] Graphs II CSC311: Data Structures
364 Shortest Paths Given a weighted graph and two vertices u and v, we want to find a path of minimum total weight between u and v. Length of a path is the sum of the weights of its edges. Example: Shortest path between Providence and Honolulu Applications Internet packet routing Flight reservations Driving directions [Figure: flight route graph with edge weights in miles] Graphs II CSC311: Data Structures
365 Shortest Path Properties Property 1: A subpath of a shortest path is itself a shortest path Property 2: There is a tree of shortest paths from a start vertex to all the other vertices Example: Tree of shortest paths from Providence [Figure: flight route graph with the shortest-path tree from PVD highlighted] Graphs II CSC311: Data Structures
366 Dijkstra’s Algorithm The distance of a vertex v from a vertex s is the length of a shortest path between s and v Dijkstra’s algorithm computes the distances of all the vertices from a given start vertex s Assumptions: the graph is connected the edges are undirected the edge weights are nonnegative We grow a “cloud” of vertices, beginning with s and eventually covering all the vertices We store with each vertex v a label d(v) representing the distance of v from s in the subgraph consisting of the cloud and its adjacent vertices At each step We add to the cloud the vertex u outside the cloud with the smallest distance label, d(u) We update the labels of the vertices adjacent to u Graphs II CSC311: Data Structures
367 Edge Relaxation Consider an edge e = (u,z) such that u is the vertex most recently added to the cloud z is not in the cloud The relaxation of edge e updates distance d(z) as follows: d(z) ← min{d(z), d(u) + weight(e)} [Figure: before relaxation d(u) = 50, weight(e) = 10, d(z) = 75; after relaxation d(z) = 60] Graphs II CSC311: Data Structures
368 Example [Figure: first steps of Dijkstra’s algorithm on a graph with vertices A–F; distance labels shrink as edges are relaxed] Graphs II CSC311: Data Structures
369 Example (cont.) [Figure: remaining steps of Dijkstra’s algorithm; final distance labels from A] Graphs II CSC311: Data Structures
370 Dijkstra’s Algorithm Algorithm DijkstraDistances(G, s) Q ← new heap-based priority queue for all v ∈ G.vertices() if v = s setDistance(v, 0) else setDistance(v, ∞) l ← Q.insert(getDistance(v), v) setLocator(v,l) while ¬Q.isEmpty() u ← Q.removeMin() for all e ∈ G.incidentEdges(u) { relax edge e } z ← G.opposite(u,e) r ← getDistance(u) + weight(e) if r < getDistance(z) setDistance(z,r) Q.replaceKey(getLocator(z),r) A priority queue stores the vertices outside the cloud Key: distance Element: vertex Locator-based methods insert(k,e) returns a locator replaceKey(l,k) changes the key of an item We store two labels with each vertex: Distance (d(v) label) Locator in priority queue Graphs II CSC311: Data Structures
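A Java sketch of DijkstraDistances follows. One caveat: `java.util.PriorityQueue` has no locator-based `replaceKey`, so this version re-inserts a vertex whenever its label improves and skips stale entries on removal ("lazy deletion"); the computed distances are the same. Vertices 0..n-1 and the `{neighbor, weight}` edge encoding are assumptions for the example.

```java
import java.util.*;

// Hedged sketch of Dijkstra's algorithm with a binary-heap priority queue.
public class Dijkstra {
    // adj.get(u) holds edges as {neighbor, weight}; returns d(v) for every v.
    public static long[] distances(List<List<int[]>> adj, int s) {
        int n = adj.size();
        long[] d = new long[n];
        Arrays.fill(d, Long.MAX_VALUE);           // d(v) = infinity
        d[s] = 0;
        PriorityQueue<long[]> pq =
            new PriorityQueue<>(Comparator.comparingLong((long[] a) -> a[0]));
        pq.add(new long[] { 0, s });
        while (!pq.isEmpty()) {
            long[] top = pq.poll();
            int u = (int) top[1];
            if (top[0] > d[u]) continue;          // stale entry: u already finalized
            for (int[] e : adj.get(u)) {          // relax each edge (u, z)
                int z = e[0];
                long r = d[u] + e[1];
                if (r < d[z]) {
                    d[z] = r;
                    pq.add(new long[] { r, z });  // stands in for replaceKey
                }
            }
        }
        return d;
    }
}
```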
371 Analysis of Dijkstra’s Algorithm Graph operations Method incidentEdges is called once for each vertex Label operations We set/get the distance and locator labels of vertex z O(deg(z)) times Setting/getting a label takes O(1) time Priority queue operations Each vertex is inserted once into and removed once from the priority queue, where each insertion or removal takes O(log n) time The key of a vertex w in the priority queue is modified at most deg(w) times, where each key change takes O(log n) time Dijkstra’s algorithm runs in O((n + m) log n) time provided the graph is represented by the adjacency list structure Recall that Σv deg(v) = 2m The running time can also be expressed as O(m log n) since the graph is connected Graphs II CSC311: Data Structures
372 Shortest Paths Tree Using the template method pattern, we can extend Dijkstra’s algorithm to return a tree of shortest paths from the start vertex to all other vertices We store with each vertex a third label: parent edge in the shortest path tree In the edge relaxation step, we update the parent label Algorithm DijkstraShortestPathsTree(G, s) … for all v ∈ G.vertices() setParent(v, ∅) for all e ∈ G.incidentEdges(u) { relax edge e } z ← G.opposite(u,e) r ← getDistance(u) + weight(e) if r < getDistance(z) setDistance(z,r) setParent(z,e) Q.replaceKey(getLocator(z),r) Graphs II CSC311: Data Structures
373 Why Dijkstra’s Algorithm Works Dijkstra’s algorithm is based on the greedy method. It adds vertices by increasing distance. Suppose it didn’t find all shortest distances. Let F be the first wrong vertex the algorithm processed. When the previous node, D, on the true shortest path was considered, its distance was correct. But the edge (D,F) was relaxed at that time! Thus, so long as d(F)>d(D), F’s distance cannot be wrong. That is, there is no wrong vertex. [Figure: Dijkstra example graph on vertices A–F] Graphs II CSC311: Data Structures
374 Why It Doesn’t Work for Negative-Weight Edges Dijkstra’s algorithm is based on the greedy method. It adds vertices by increasing distance. If a node with a negative incident edge were to be added late to the cloud, it could mess up distances for vertices already in the cloud. [Figure: graph on vertices A–F with an edge of weight -8 into C] C’s true distance is 1, but it is already in the cloud with d(C)=5! Graphs II CSC311: Data Structures
375 Bellman-Ford Algorithm Works even with negative-weight edges Must assume directed edges (for otherwise we would have negative-weight cycles) Iteration i finds all shortest paths that use i edges. Running time: O(nm). Can be extended to detect a negative-weight cycle if it exists How? Algorithm BellmanFord(G, s) for all v ∈ G.vertices() if v = s setDistance(v, 0) else setDistance(v, ∞) for i ← 1 to n-1 do for each e ∈ G.edges() { relax edge e } u ← G.origin(e) z ← G.opposite(u,e) r ← getDistance(u) + weight(e) if r < getDistance(z) setDistance(z,r) Graphs II CSC311: Data Structures
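A Java sketch of Bellman-Ford, including one possible answer to the slide's "How?": after n-1 rounds, if any edge can still be relaxed, a negative-weight cycle is reachable from s. The edge-list encoding `{u, z, weight}` and the class name are assumptions for the example.

```java
import java.util.*;

// Hedged sketch of Bellman-Ford: round i finds all shortest paths that use
// at most i edges, so n-1 rounds suffice when there is no negative cycle.
public class BellmanFord {
    static final long INF = Long.MAX_VALUE / 2;   // "infinity" that won't overflow

    public static long[] distances(int n, int[][] edges, int s) {
        long[] d = new long[n];
        Arrays.fill(d, INF);
        d[s] = 0;
        for (int i = 1; i <= n - 1; i++)
            for (int[] e : edges) {               // relax every edge each round
                if (d[e[0]] == INF) continue;     // origin not yet reached
                long r = d[e[0]] + e[2];
                if (r < d[e[1]]) d[e[1]] = r;
            }
        return d;
    }

    // Extension mentioned on the slide: one extra relaxation round succeeds
    // only if a negative-weight cycle is reachable from s.
    public static boolean hasNegativeCycle(int n, int[][] edges, int s) {
        long[] d = distances(n, edges, s);
        for (int[] e : edges)
            if (d[e[0]] != INF && d[e[0]] + e[2] < d[e[1]]) return true;
        return false;
    }
}
```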
376 Bellman-Ford Example Nodes are labeled with their d(v) values [Figure: successive rounds relax the edges, including negative-weight edges, until all d(v) values are final] Graphs II CSC311: Data Structures
377 DAG-based Algorithm Works even with negative-weight edges Uses topological order Doesn’t use any fancy data structures Is much faster than Dijkstra’s algorithm Running time: O(n+m). Algorithm DagDistances(G, s) for all v ∈ G.vertices() if v = s setDistance(v, 0) else setDistance(v, ∞) Perform a topological sort of the vertices for u ← 1 to n do {in topological order} for each e ∈ G.outEdges(u) { relax edge e } z ← G.opposite(u,e) r ← getDistance(u) + weight(e) if r < getDistance(z) setDistance(z,r) Graphs II CSC311: Data Structures
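A Java sketch of DagDistances follows. It is illustrative, not the slide's exact pseudocode: instead of a separate topological-sort pass it computes the topological order on the fly from in-degrees (a Kahn-style helper, which is an assumption of this example), then relaxes each vertex's out-edges when the vertex is reached in that order.

```java
import java.util.*;

// Hedged sketch of shortest paths on a DAG (negative weights allowed):
// relax out-edges in topological order, O(n + m) overall.
public class DagShortestPaths {
    static final long INF = Long.MAX_VALUE / 2;

    // adj.get(u) holds out-edges as {neighbor, weight}.
    public static long[] distances(List<List<int[]>> adj, int s) {
        int n = adj.size();
        int[] indeg = new int[n];
        for (List<int[]> edges : adj)
            for (int[] e : edges) indeg[e[0]]++;
        Deque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++) if (indeg[v] == 0) ready.add(v);
        long[] d = new long[n];
        Arrays.fill(d, INF);
        d[s] = 0;
        while (!ready.isEmpty()) {
            int u = ready.poll();                 // next vertex in topological order
            for (int[] e : adj.get(u)) {          // relax edge (u, z)
                if (d[u] != INF && d[u] + e[1] < d[e[0]]) d[e[0]] = d[u] + e[1];
                if (--indeg[e[0]] == 0) ready.add(e[0]);
            }
        }
        return d;
    }
}
```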
378 DAG Example Nodes are labeled with their d(v) values [Figure: vertices are processed in topological order; each step relaxes the out-edges of one vertex] Graphs II CSC311: Data Structures
379 Minimum Spanning Trees Spanning subgraph Subgraph of a graph G containing all the vertices of G Spanning tree Spanning subgraph that is itself a (free) tree Minimum spanning tree (MST) Spanning tree of a weighted graph with minimum total edge weight Applications Communications networks Transportation networks [Figure: weighted graph on airports ORD, PIT, DEN, DCA, STL, DFW, ATL] Graphs II CSC311: Data Structures
380 Cycle Property Cycle Property: Let T be a minimum spanning tree of a weighted graph G Let e be an edge of G that is not in T and let C be the cycle formed by e with T For every edge f of C, weight(f) ≤ weight(e) Proof: By contradiction If weight(f) > weight(e) we can get a spanning tree of smaller weight by replacing f with e [Figure: replacing f with e yields a better spanning tree] Graphs II CSC311: Data Structures
381 Partition Property Partition Property: Consider a partition of the vertices of G into subsets U and V Let e be an edge of minimum weight across the partition There is a minimum spanning tree of G containing edge e Proof: Let T be an MST of G If T does not contain e, consider the cycle C formed by e with T and let f be an edge of C across the partition By the cycle property, weight(f) ≤ weight(e) Since e has minimum weight across the partition, weight(f) = weight(e) We obtain another MST by replacing f with e [Figure: replacing f with e yields another MST] Graphs II CSC311: Data Structures
382 Kruskal’s Algorithm A priority queue stores the edges outside the cloud Key: weight Element: edge At the end of the algorithm We are left with one cloud that encompasses the MST A tree T which is our MST Algorithm KruskalMST(G) for each vertex v in G do define a Cloud(v) of {v} let Q be a priority queue. Insert all edges into Q using their weights as the key T ← ∅ while T has fewer than n-1 edges do edge e ← Q.removeMin() Let u, v be the endpoints of e if Cloud(v) ≠ Cloud(u) then Add edge e to T Merge Cloud(v) and Cloud(u) return T Graphs II CSC311: Data Structures
383 Data Structure for Kruskal’s Algorithm The algorithm maintains a forest of trees An edge is accepted if it connects distinct trees We need a data structure that maintains a partition, i.e., a collection of disjoint sets, with the operations: -find(u): return the set storing u -union(u,v): replace the sets storing u and v with their union Graphs II CSC311: Data Structures
384 Representation of a Partition Each set is stored in a sequence Each element has a reference back to the set Operation find(u) takes O(1) time, and returns the set of which u is a member In operation union(u,v), we move the elements of the smaller set to the sequence of the larger set and update their references The time for operation union(u,v) is O(min(nu, nv)), where nu and nv are the sizes of the sets storing u and v Whenever an element is moved, it goes into a set of at least double the size, hence each element is moved at most log n times Graphs II CSC311: Data Structures
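The sequence-based partition above can be sketched in Java. The class and method names are illustrative; sets are `ArrayList`s, the back-reference is an index into the list of sets, and union moves the smaller set into the larger, exactly as the slide describes.

```java
import java.util.*;

// Hedged sketch of the sequence-based partition with O(1) find and
// size-based union (each element moves at most log n times).
public class Partition {
    private final List<List<Integer>> sets = new ArrayList<>();
    private final int[] setOf;                    // back-reference: element -> its set

    public Partition(int n) {
        setOf = new int[n];
        for (int v = 0; v < n; v++) {             // each element starts in its own set
            setOf[v] = v;
            sets.add(new ArrayList<>(List.of(v)));
        }
    }

    public int find(int u) { return setOf[u]; }   // O(1): follow the back-reference

    public void union(int u, int v) {
        int a = setOf[u], b = setOf[v];
        if (a == b) return;
        if (sets.get(a).size() < sets.get(b).size()) { int t = a; a = b; b = t; }
        for (int x : sets.get(b)) {               // move smaller set into larger,
            setOf[x] = a;                         // updating each back-reference
            sets.get(a).add(x);
        }
        sets.get(b).clear();
    }
}
```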
385 Partition-Based Implementation A partition-based version of Kruskal’s Algorithm performs cloud merges as unions and tests as finds. Algorithm Kruskal(G): Input: A weighted graph G. Output: An MST T for G. Let P be a partition of the vertices of G, where each vertex forms a separate set. Let Q be a priority queue storing the edges of G, sorted by their weights Let T be an initially-empty tree while Q is not empty do (u,v) ← Q.removeMinElement() if P.find(u) ≠ P.find(v) then Add (u,v) to T P.union(u,v) return T Running time: O((n+m)log n) Graphs II CSC311: Data Structures
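The partition-based Kruskal can be sketched in Java. This illustration makes two simplifications that are assumptions of the example, not the slide's design: a sorted edge array stands in for the priority queue, and a minimal array-based union-find stands in for the sequence-based partition.

```java
import java.util.*;

// Hedged sketch of Kruskal's algorithm: accept an edge {u, v, weight} in
// increasing weight order whenever its endpoints lie in different clouds.
public class Kruskal {
    public static List<int[]> mst(int n, int[][] edges) {
        int[] parent = new int[n];                // forest-based union-find
        for (int v = 0; v < n; v++) parent[v] = v;
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, Comparator.comparingInt((int[] e) -> e[2]));
        List<int[]> tree = new ArrayList<>();
        for (int[] e : sorted) {
            int ru = find(parent, e[0]), rv = find(parent, e[1]);
            if (ru != rv) {                       // endpoints in distinct clouds
                tree.add(e);
                parent[ru] = rv;                  // merge the two clouds
                if (tree.size() == n - 1) break;  // MST is complete
            }
        }
        return tree;
    }

    private static int find(int[] parent, int v) {
        while (parent[v] != v) v = parent[v];
        return v;
    }
}
```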
386 Kruskal Example [Figure: graph of US airports BOS, PVD, JFK, BWI, ORD, SFO, LAX, DFW, MIA with edge weights in miles] Graphs II CSC311: Data Structures
387 Example (cont.) [Figures: twelve slides step through Kruskal’s algorithm on the airport graph, accepting edges in increasing weight order and merging clouds until the tree has n-1 edges] Graphs II CSC311: Data Structures
399 Example [Figure: final MST of the airport graph on JFK, BOS, MIA, ORD, LAX, DFW, SFO, BWI, PVD] Graphs II CSC311: Data Structures
400 Prim-Jarnik’s Algorithm Similar to Dijkstra’s algorithm (for a connected graph) We pick an arbitrary vertex s and we grow the MST as a cloud of vertices, starting from s We store with each vertex v a label d(v) = the smallest weight of an edge connecting v to a vertex in the cloud At each step: We add to the cloud the vertex u outside the cloud with the smallest distance label We update the labels of the vertices adjacent to u Graphs II CSC311: Data Structures
401 Prim-Jarnik’s Algorithm Algorithm PrimJarnikMST(G) Q ← new heap-based priority queue s ← a vertex of G for all v ∈ G.vertices() if v = s setDistance(v, 0) else setDistance(v, ∞) setParent(v, ∅) l ← Q.insert(getDistance(v), v) setLocator(v,l) while ¬Q.isEmpty() u ← Q.removeMin() for all e ∈ G.incidentEdges(u) z ← G.opposite(u,e) r ← weight(e) if r < getDistance(z) setDistance(z,r) setParent(z,e) Q.replaceKey(getLocator(z),r) A priority queue stores the vertices outside the cloud Key: distance Element: vertex Locator-based methods insert(k,e) returns a locator replaceKey(l,k) changes the key of an item We store three labels with each vertex: Distance Parent edge in MST Locator in priority queue Graphs II CSC311: Data Structures
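A Java sketch of PrimJarnikMST follows. As in the Dijkstra case, `java.util.PriorityQueue` has no locator-based `replaceKey`, so this version re-inserts vertices and skips stale entries instead of using locators; the graph encoding (vertices 0..n-1, `{neighbor, weight}` edges) is an assumption of the example, and it returns the total MST weight rather than the tree itself.

```java
import java.util.*;

// Hedged sketch of Prim-Jarnik: d(v) is the cheapest known edge connecting
// v to the cloud, not a path length as in Dijkstra.
public class PrimJarnik {
    // adj.get(u) holds undirected edges as {neighbor, weight};
    // returns the total MST weight of a connected graph.
    public static long mstWeight(List<List<int[]>> adj, int s) {
        int n = adj.size();
        long[] d = new long[n];
        Arrays.fill(d, Long.MAX_VALUE);
        boolean[] inCloud = new boolean[n];
        d[s] = 0;
        PriorityQueue<long[]> pq =
            new PriorityQueue<>(Comparator.comparingLong((long[] a) -> a[0]));
        pq.add(new long[] { 0, s });
        long total = 0;
        while (!pq.isEmpty()) {
            int u = (int) pq.poll()[1];
            if (inCloud[u]) continue;             // stale entry
            inCloud[u] = true;
            total += d[u];                        // weight of u's parent edge
            for (int[] e : adj.get(u))
                if (!inCloud[e[0]] && e[1] < d[e[0]]) {
                    d[e[0]] = e[1];               // cheaper edge into the cloud
                    pq.add(new long[] { e[1], e[0] });
                }
        }
        return total;
    }
}
```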
402 Example [Figure: first steps of Prim-Jarnik’s algorithm on a graph with vertices A–F; distance labels are updated as vertices join the cloud] Graphs II CSC311: Data Structures
403 Example (contd.) [Figure: remaining steps; the MST is complete] Graphs II CSC311: Data Structures
404 Analysis Graph operations Method incidentEdges is called once for each vertex Label operations We set/get the distance, parent and locator labels of vertex z O(deg(z)) times Setting/getting a label takes O(1) time Priority queue operations Each vertex is inserted once into and removed once from the priority queue, where each insertion or removal takes O(log n) time The key of a vertex w in the priority queue is modified at most deg(w) times, where each key change takes O(log n) time Prim-Jarnik’s algorithm runs in O((n + m) log n) time provided the graph is represented by the adjacency list structure Recall that Σv deg(v) = 2m The running time is O(m log n) since the graph is connected Graphs II CSC311: Data Structures