- g belongs to O(f) if any of the following equivalent conditions holds:
- - g is bounded above by a constant multiple of f
- - g grows no faster than (a constant multiple of) f
- - the ratio g/f is bounded above by a constant
- Formally: there exist C > 0 and n0 > 0 such that g(n) <= C * f(n) for all n > n0
- - Limit test: if lim n->infinity g(n)/f(n) = 0, then for any C > 0 the ratio is below C once n is sufficiently large, so g belongs to O(f). If the limit is infinity, there is no C > 0 with g(n) <= C * f(n) for all sufficiently large n, so g does not belong to O(f)
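- - Example: for g(n) = 100n and f(n) = n^2, lim n->infinity 100n/n^2 = lim 100/n = 0, so 100n belongs to O(n^2)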
- L'Hôpital's Rule:
- If lim n->infinity g(n)/f(n) is an indeterminate form (0/0 or infinity/infinity), take the derivative of the top and the bottom and evaluate lim n->infinity g'(n)/f'(n) instead
- Geometric sum of n terms: 1 + r + r^2 + ... + r^(n-1) = (1 - r^n)/(1 - r)
- If |r| < 1, the geometric sum to infinity = 1/(1 - r)
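- - e.g., with r = 1/2: 1 + 1/2 + 1/4 + ... = 1/(1 - 1/2) = 2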
- Sum of first n integers: n(n+1)/2
- nth Harmonic number: H_n = sum (i = 1 to n) 1/i
- H_n = Theta(log n)
- - In probability theory, an event is a subset of the sample space. (So an event is a set of outcomes)
- Expectation:
- - The expected value, or expectation, of a random variable X represents its "average value".
- - Formally: if X is a random variable with a finite number of possible values, E(X) = sum over values x of x * P(X = x)
- E(X + Y) = E(X) + E(Y) (linearity of expectation; holds even when X and Y are dependent)
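- A minimal sketch of the definition in Python; the fair six-sided die is an illustrative example, not from the notes:
- def expectation(dist):
-     # dist maps each value x to P(X = x)
-     return sum(x * p for x, p in dist.items())
- die = {x: 1/6 for x in range(1, 7)}
- print(expectation(die))  # 3.5; by linearity, two dice give E = 3.5 + 3.5 = 7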
- -----------------------------
- Dynamic Arrays:
- - Similar to arrays, but size can be increased or decreased. ArrayList in java, list in Python
- Linked Lists:
- - Inserting or deleting an item in the middle of a linked list is fast (given a reference to the spot). Accessing a cell given its index (i.e., finding the kth item in the list) is slow
- - Hashing is fast: O(1) expected time for access, insertion
- Hashing Disadvantages:
- - Access time (except for cuckoo hashing) and insertion time (for all strategies) is expected time, not worst-case time, so there is no absolute guarantee on performance
- - Performance depends on the load factor: (number of items stored) / (table size)
- - Hashing cannot tell us about nearby keys: it supports exact-match queries but not successor (or range) queries
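- A minimal sketch of a hash table with separate chaining, showing O(1) expected access and the load factor; the class name and fixed table size are illustrative assumptions:
- class ChainedHashTable:
-     def __init__(self, size=8):
-         self.table = [[] for _ in range(size)]   # one chain (list) per slot
-         self.count = 0
-     def load_factor(self):
-         return self.count / len(self.table)      # items stored / table size
-     def insert(self, key, value):
-         chain = self.table[hash(key) % len(self.table)]
-         for i, (k, _) in enumerate(chain):
-             if k == key:
-                 chain[i] = (key, value)          # overwrite existing key
-                 return
-         chain.append((key, value))
-         self.count += 1
-     def find(self, key):                         # O(1) expected time
-         for k, v in self.table[hash(key) % len(self.table)]:
-             if k == key:
-                 return v
-         return None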
- Facts about binary trees:
- - there are at most 2^k nodes at level k
- - a binary tree with depth d has:
- - at most 2^d leaves and at most 2^(d+1) - 1 nodes
- - a binary tree with n leaves has depth >= ceiling(log n)
- Binary search trees:
- - Find, insert, and remove can all be done in O(h) time, where h is the tree height; for AVL trees h = O(log n), so these operations take O(log n) time
- The lower bound on the worst-case number of 3-way comparisons (<, >, ==) required to find an item in an array of size n is floor(log n) + 1
- Hence any algorithm for locating an item in an array of size n using only comparisons must perform at least floor(log n) + 1 comparisons in the worst case
- Hence binary search is optimal with respect to worst-case performance
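- For reference, a standard binary search on a sorted array; each loop iteration performs one 3-way comparison, so at most floor(log n) + 1 are made:
- def binary_search(A, x):
-     lo, hi = 0, len(A) - 1
-     while lo <= hi:
-         mid = (lo + hi) // 2
-         if A[mid] == x:        # one 3-way comparison per iteration
-             return mid
-         elif A[mid] < x:
-             lo = mid + 1       # discard the left half
-         else:
-             hi = mid - 1       # discard the right half
-     return -1                  # not found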
- --------------------------------------------------
- SORTING: measure of time = number of comparisons
- - an inversion in a sequence or list is a pair of items such that the larger one precedes the smaller one
- Insertion Sort:
- - Work from left to right across array
- - insert each item in correct position with respect to (sorted) elements to its left.
- (Like arranging a hand of cards)
- def insertionSort(A):
-     n = len(A)
-     for k in range(1, n):          # A[0..k-1] is already sorted
-         x = A[k]
-         j = k - 1
-         while j >= 0 and A[j] > x: # shift larger items one cell right
-             A[j + 1] = A[j]
-             j = j - 1
-         A[j + 1] = x               # drop x into its correct position
- Analysis of Insertion Sort:
- - Storage: in place: O(1) extra storage
- - Worst-case running time: Theta(n^2)
- - Average-case time: Theta(n^2) (roughly half the worst-case comparisons)
- Insertion sort is efficient if the input is almost sorted ( Time <= n - 1 + (# inversions) )
- Selection Sort:
- 2 variations:
- - Repeatedly (for i from 0 to n-1) find the minimum value, output it, delete it
- - Repeatedly (for i from n-1 down to 1) find the maximum of A[0] ... A[i] and swap this value with A[i]
- - Both variations run in O(n^2) time if we use the straightforward approach to finding the max/min. They can be improved by treating the items as items in an appropriately designed priority queue
- Quick Sort:
- def quicksort(A, first, last):
-     if first < last:                          # ranges of size 0 or 1 are sorted
-         splitpoint = split(A, first, last)
-         quicksort(A, first, splitpoint - 1)   # pivot is already in final position
-         quicksort(A, splitpoint + 1, last)
- def split(A, first, last):
-     splitpoint = first
-     x = A[first]                              # pivot = first element
-     for k in range(first + 1, last + 1):
-         if A[k] < x:
-             splitpoint = splitpoint + 1
-             A[splitpoint], A[k] = A[k], A[splitpoint]
-     A[first], A[splitpoint] = A[splitpoint], A[first]
-     return splitpoint
- Worst-case running time: Theta(n^2) (worst-case number of comparisons = O(n^2))
- Quicksort Key Fact: During the run of Quicksort, two values x and y get compared if and only if the first key in the range [x..y] to be chosen as pivot is either x or y
- Average time for quicksort is O(nlogn)
- Implementation tricks for improving Quicksort:
- 1. Better choice of pivot element:
- - Instead of a single element, choose the median of 3 (or 5, or 7) elements
- - Choose a random element
- - Combine the two, e.g., the median of 3 randomly chosen elements
- 2. Reduce procedure call overhead
- - Explicitly manipulate the stack in the program (rather than making recursive calls)
- - for small lists, use a nonrecursive sort like insertion sort or selection sort
- 3. Reduce stack space
- - Push the larger sublist onto the stack and immediately work on the smaller sublist (see the sketch after this list)
- - Reduces worst-case stack usage from O(n) to O(log n)
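- A sketch of tricks 2 and 3 combined, assuming the split function above: recursion is replaced by an explicit stack, and the larger sublist is pushed while the smaller one is processed immediately:
- def quicksortIterative(A):
-     stack = [(0, len(A) - 1)]          # explicit stack of (first, last) ranges
-     while stack:
-         first, last = stack.pop()
-         while first < last:
-             splitpoint = split(A, first, last)
-             # push the larger sublist; keep working on the smaller one,
-             # so at most O(log n) ranges are ever on the stack
-             if splitpoint - first < last - splitpoint:
-                 stack.append((splitpoint + 1, last))
-                 last = splitpoint - 1
-             else:
-                 stack.append((first, splitpoint - 1))
-                 first = splitpoint + 1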
- Mergesort
- def mergeSort(A, first, last):
-     if first < last:
-         mid = (first + last) // 2
-         mergeSort(A, first, mid)
-         mergeSort(A, mid + 1, last)
-         merge(A, first, mid, mid + 1, last)
- def merge(A, first1, last1, first2, last2):
-     temp = []
-     index1, index2 = first1, first2
-     while index1 <= last1 and index2 <= last2:
-         if A[index1] <= A[index2]:     # <= keeps the sort stable on ties
-             temp.append(A[index1])
-             index1 += 1
-         else:
-             temp.append(A[index2])
-             index2 += 1
-     temp.extend(A[index1:last1 + 1])   # copy whichever run has items left
-     temp.extend(A[index2:last2 + 1])
-     A[first1:last2 + 1] = temp         # copy merged result back into A
- Mergesort recurrence: T(n) = 2T(n/2) + Theta(n); asymptotic solution: T(n) = Theta(n log n)
- To count inversions using mergesort, add an invCount variable to the merge step; each time an element is taken from the second run while the first run is nonempty, do:
- invCount = invCount + last1 - index1 + 1
- (every element still left in the first run precedes it and is larger, so each forms an inversion), and return the count at the end
- - If we additionally want to report the inversions, the extra running time is proportional to the number reported: report k inversions in O(n log n + k) time
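- A sketch of the counting version; for brevity it returns a new sorted list instead of merging in place:
- def sortAndCount(A):
-     if len(A) <= 1:
-         return A, 0
-     mid = len(A) // 2
-     left, invL = sortAndCount(A[:mid])
-     right, invR = sortAndCount(A[mid:])
-     merged, i, j, inv = [], 0, 0, invL + invR
-     while i < len(left) and j < len(right):
-         if left[i] <= right[j]:
-             merged.append(left[i])
-             i += 1
-         else:
-             merged.append(right[j])
-             j += 1
-             inv += len(left) - i   # everything left in `left` is an inversion
-     merged += left[i:] + right[j:]
-     return merged, inv
- e.g., sortAndCount([2, 4, 1, 3]) returns ([1, 2, 3, 4], 3)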
- ---------------------------------------------------------------------------
- - Priority queues are commonly implemented as a binary heap
- Observation: floor(log n) (where n is the number of nodes in a complete binary tree) is the depth of the tree, with the root at depth 0
- Viewing the array as a binary tree:
- - left child of H[i] = H[2i+1]
- - right child of H[i] = H[2i+2]
- - parent of H[i] = H[floor((i-1)/2)]
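- The same index arithmetic in Python (0-indexed array):
- def left(i): return 2 * i + 1
- def right(i): return 2 * i + 2
- def parent(i): return (i - 1) // 2   # floor division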
- Binary Heap:
- - for a SiftUp, at most 1 comparison at each level, so total time is O(log n)
- - for a SiftDown, at most 2 comparisons at each level, so total time is O(log n)
- - Insert operation: O(log n) (because SiftUp time dominates)
- - Deletion: SiftUp/SiftDown time dominates, so total time is O(log n)
- - ExtractMax: Delete time dominates, so total time is O(log n)
- - To construct a heap in O(n) time, put the data in H in arbitrary order and then run the heapify function (a runnable version appears after the heapsort notes below):
- function heapify(H, n):
-     for i = floor(n/2) - 1 down to 0:   // start with the last parent node
-         SiftDown(H, i)
- def heapsort(A, n):
-     heapify(A, n)
-     for k = n-1 down to 1 do:
-         A[k] = ExtractMax(A)   // the max lands at the end; heap shrinks by one
- Total time is O(n log n)
- - If we use a min-heap instead, we can output the elements directly in sorted order, whereas with a max-heap we have to store the extracted elements back into the array
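- A self-contained runnable version of heapify and heapsort (max-heap, in place); SiftDown is written out since the notes take it as given:
- def siftDown(A, i, n):
-     while 2 * i + 1 < n:                  # while node i has a left child
-         child = 2 * i + 1
-         if child + 1 < n and A[child + 1] > A[child]:
-             child += 1                    # pick the larger of the two children
-         if A[i] >= A[child]:
-             break                         # heap property restored
-         A[i], A[child] = A[child], A[i]
-         i = child
- def heapsort(A):
-     n = len(A)
-     for i in range(n // 2 - 1, -1, -1):   # heapify: O(n) total
-         siftDown(A, i, n)
-     for k in range(n - 1, 0, -1):         # move the max to the end, shrink heap
-         A[0], A[k] = A[k], A[0]
-         siftDown(A, 0, k)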
- Comparison-based sorts: Summary
- Insertion sort
- Worst-case time: Theta(n^2)
- Storage requirement: in-place
- Remarks: good if input is almost sorted
- Stable
- Quicksort
- Worst-case time: Theta(n^2)
- Storage requirement: O(log n) extra for stack
- Remarks: O(n log n) expected time
- Not stable
- Mergesort
- Worst-case time: Theta(n log n)
- Storage requirement: O(n) extra for merge
- Stable
- Heapsort
- Worst-case time: Theta(n log n)
- Storage requirement: in-place
- Remarks: can output the k smallest in sorted order in O(n + k log n) time
- Not stable
- Any algorithm for sorting a list of size n using only comparisons must perform at least ceiling(lg(n!)) comparisons in the worst case (decision-tree argument: a binary tree with n! leaves, one per permutation, has depth >= ceiling(lg(n!)))
- ceiling(lg(n!)) = Theta(n log n), since lg(n!) = lg(1) + ... + lg(n) >= (n/2) lg(n/2); hence any comparison-based sort must perform Omega(n log n) comparisons in the worst case
- Hence heapsort and mergesort are asymptotically optimal, and the lower bound is asymptotically tight (cannot be improved asymptotically)
- Bucket Sort: storage is O(n + b), where b is the number of buckets
- Runtime: O(n + b) for distributing and collecting, plus the intra-bucket sorts; a bucket holding s items costs O(s^2) (e.g., with insertion sort)
- Special case: the number of buckets equals the number of keys and the keys are distributed independently and uniformly over the buckets
- Then the expected total cost of the intra-bucket sorts is O(n)
- - Suppose we have n integers in the range 1..b. Then we can use b buckets and get a runtime of O(n + b) worst case
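- A sketch of that integer special case (keys in 1..b, one bucket per key; counts suffice since equal keys are identical):
- def bucketSortInts(A, b):
-     buckets = [0] * (b + 1)        # buckets[k] = occurrences of key k
-     for x in A:
-         buckets[x] += 1            # distribute: O(n)
-     out = []
-     for k in range(1, b + 1):      # collect in key order: O(b)
-         out.extend([k] * buckets[k])
-     return out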
- Radix sort:
- Useful for sorting multi-field keys (e.g., dates) and multi-digit numbers
- algorithm radix_sort(A, n):
-     for field ranging from rightmost (least significant) to leftmost (most significant) do:
-         sort A on field using a stable sort
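- A minimal runnable sketch for base-10 integers; Python's sorted() is stable, which is exactly what the pass-by-pass approach needs:
- def radixSort(A, numDigits):
-     for d in range(numDigits):   # least significant digit first
-         A = sorted(A, key=lambda x: (x // 10 ** d) % 10)   # stable sort on digit d
-     return A
- e.g., radixSort([170, 45, 75, 90, 802, 24, 2, 66], 3) gives [2, 24, 45, 66, 75, 90, 170, 802]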
- External Sorting:
- When sorting a file larger than available memory, with n records in the file and room for m records in memory:
- - Read in groups of m records
- - Sort each group
- - Write each run (sorted group) to a separate output file
- - Choose f run files (or all that are left if there are fewer than f)
- - Merge the contents of the f input files into a new output file; repeat until a single sorted file remains
- Replacement Selection:
- - Makes the initial runs longer (see the sketch after this list)
- - When a key is written and the next key is read in: if the new key is >= the last key written, make it part of the current run; if not, save it for the next run
- On average, replacement selection doubles the size of the runs, assuming a uniform distribution of the sort keys
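- A minimal in-memory sketch of replacement selection with a min-heap of size m; the function name and the list-of-runs output are illustrative (a real external sort would write each run to a file):
- import heapq
- from itertools import islice
- def replacementSelection(records, m):
-     it = iter(records)
-     heap = list(islice(it, m))        # memory holds at most m records
-     heapq.heapify(heap)
-     pending = []                      # keys saved for the next run
-     runs, run = [], []
-     while heap:
-         smallest = heapq.heappop(heap)
-         run.append(smallest)          # "write" the key to the current run
-         nxt = next(it, None)
-         if nxt is not None:
-             if nxt >= smallest:       # still fits in the current run
-                 heapq.heappush(heap, nxt)
-             else:
-                 pending.append(nxt)   # must wait for the next run
-         if not heap:                  # current run finished; start the next
-             runs.append(run)
-             run, heap, pending = [], pending, []
-             heapq.heapify(heap)
-     return runs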
- - A sorting algorithm is stable if, whenever two keys are equal, the algorithm preserves their order (i.e., does not reverse them)