1 00:00:00,000 --> 00:00:11,000 2 00:00:11,000 --> 00:00:15,000 This presentation is delivered by the Stanford Center for Professional 3 00:00:15,000 --> 00:00:25,000 Development. 4 00:00:25,000 --> 00:00:26,000 Okay. 5 00:00:26,000 --> 00:00:27,000 So this is 6 00:00:27,000 --> 00:00:30,000 the last thing we had talked about at the end of the merge sort was 7 00:00:30,000 --> 00:00:35,000 comparing a quadratic and N-squared sort, this left and right sort right here to the linear 8 00:00:35,000 --> 00:00:37,000 arrhythmic which is the N-log in 9 00:00:37,000 --> 00:00:40,000 kind that merge sort runs in, right, showing you that 10 00:00:40,000 --> 00:00:41,000 11 00:00:41,000 --> 00:00:44,000 really way out-performing even on small values and getting the large values 12 00:00:44,000 --> 00:00:47,000 [inaudible] just getting enormous in terms of what you can accomplish with 13 00:00:47,000 --> 00:00:48,000 a N-log-in algorithm 14 00:00:48,000 --> 00:00:50,000 really vastly 15 00:00:50,000 --> 00:00:52,000 superior to what you're gonna get out of a N-squared algorithm. I said 16 00:00:52,000 --> 00:00:56,000 well, N-log-I told you without proof that you're going to take on faith that I 17 00:00:56,000 --> 00:00:57,000 wouldn't lie to you, 18 00:00:57,000 --> 00:01:01,000 that that is the best we can do in terms of a comparison based sort, 19 00:01:01,000 --> 00:01:04,000 but what we're going to look for is a way of maybe even improving on those 20 00:01:04,000 --> 00:01:06,000 kinds of merge sort by kind of a 21 00:01:06,000 --> 00:01:09,000 one thing we might want to do is avoid that copying of the data out and back that 22 00:01:09,000 --> 00:01:11,000 merge sort does and 23 00:01:11,000 --> 00:01:15,000 maybe have some lower cost in factors in the process, right that can bring our 24 00:01:15,000 --> 00:01:17,000 times down even better. 25 00:01:17,000 --> 00:01:20,000 So the sort we're looking at is quick sort. 26 00:01:20,000 --> 00:01:22,000 And I think if you were gonna come up with an algorithm name, 27 00:01:22,000 --> 00:01:25,000 naming it Quick Sort becomes good marketing here so it kind of inspires you to 28 00:01:25,000 --> 00:01:29,000 believe actually that it's gonna live up to its name, is a divide-and-conquer algorithm 29 00:01:29,000 --> 00:01:33,000 that takes a strategy like merge sort in the abstract which is the divide into 30 00:01:33,000 --> 00:01:36,000 two pieces and [inaudible] sort those pieces and then join them back 31 00:01:36,000 --> 00:01:37,000 together. 32 00:01:37,000 --> 00:01:40,000 But whereas merge sort does an easy split hard join, 33 00:01:40,000 --> 00:01:43,000 this one flips it on its head and said what if we did some work on the front step 34 00:01:43,000 --> 00:01:45,000 in the splitting process 35 00:01:45,000 --> 00:01:49,000 and then that may give us less to do on the joining step 36 00:01:49,000 --> 00:01:53,000 and so the strategy for quick sort is to run to the pile of 37 00:01:53,000 --> 00:01:55,000 papers, whatever we're trying to work at 38 00:01:55,000 --> 00:01:57,000 in a partitioning step 39 00:01:57,000 --> 00:01:58,000 is the first thing that it does 40 00:01:58,000 --> 00:02:01,000 and that's the splitting step and that partitioning is designed to kind of move 41 00:02:01,000 --> 00:02:04,000 stuff to the lower half and the upper half. 42 00:02:04,000 --> 00:02:07,000 So having some idea of what the middle most value would be, 43 00:02:07,000 --> 00:02:11,000 and then actually just walking through the pile of papers and kind of you know, 44 00:02:11,000 --> 00:02:13,000 distributing to the left and right 45 00:02:13,000 --> 00:02:17,000 portions based on their comparison to this middle-most element. 46 00:02:17,000 --> 00:02:21,000 Once I have those two stacks - I've got A through M over here and N through Z over there, 47 00:02:21,000 --> 00:02:23,000 that we're cursively sorting those 48 00:02:23,000 --> 00:02:26,000 takes place, kind of assuming my delegates do their work 49 00:02:26,000 --> 00:02:32,000 and in the process of re-joining them is really kind of simple. They kind of, you know, 50 00:02:32,000 --> 00:02:37,000 joined together with actually no extra work, no looking, no process there. So the one 51 00:02:37,000 --> 00:02:41,000 thing that actually is is where all the trickiness comes into is this idea of how we 52 00:02:41,000 --> 00:02:44,000 do this partitioning. 53 00:02:44,000 --> 00:02:47,000 So I was being a little big disingenuous when I said well if I had this stack of exam papers and 54 00:02:47,000 --> 00:02:48,000 I know that 55 00:02:48,000 --> 00:02:51,000 the names, you know range over the alphabet A through Z then I know where M is and I 56 00:02:51,000 --> 00:02:54,000 can say well, that's a good mid pint around to divide. 57 00:02:54,000 --> 00:02:57,000 Given that we want to make this work for kind of any sort of data, 58 00:02:57,000 --> 00:03:01,000 we're not going to [inaudible] know, what's the minimal most value. If we have a bunch of 59 00:03:01,000 --> 00:03:04,000 numbers, are they test scores, in which case maybe 50 is a good place to move. 60 00:03:04,000 --> 00:03:06,000 Are they instead, you know, 61 00:03:06,000 --> 00:03:09,000 population counts of states in which case millions would be a better 62 00:03:09,000 --> 00:03:11,000 number to pick to kind of divide 63 00:03:11,000 --> 00:03:12,000 them up. So 64 00:03:12,000 --> 00:03:15,000 we have to come up with a strategy that's gonna work, you know, no matter 65 00:03:15,000 --> 00:03:19,000 what the input that's somehow gonna pick something that's kind of reasonable 66 00:03:19,000 --> 00:03:21,000 for how we're gonna do these divisions. There's 67 00:03:21,000 --> 00:03:23,000 this idea of picking, called a pivot. 68 00:03:23,000 --> 00:03:28,000 So given this, you know, region, this set of things to sort, 69 00:03:28,000 --> 00:03:30,000 how do I know what's a reasonable pivot. 70 00:03:30,000 --> 00:03:33,000 We're looking for something that's close to the median is actually ideal. 71 00:03:33,000 --> 00:03:34,000 So if we could 72 00:03:34,000 --> 00:03:37,000 walk through them and compute the median, that would be the best we could 73 00:03:37,000 --> 00:03:38,000 do, we're 74 00:03:38,000 --> 00:03:41,000 gonna try to get around to not doing that much work 75 00:03:41,000 --> 00:03:45,000 about it, though. We're gonna actually try to kind of make an approximation and we're gonna make a really 76 00:03:45,000 --> 00:03:47,000 very simplistic 77 00:03:47,000 --> 00:03:50,000 choice about what my approximation would be. We're gonna come back and re-visit 78 00:03:50,000 --> 00:03:51,000 this a little bit later. 79 00:03:51,000 --> 00:03:56,000 But I want to say, ''Well, I just took the first paper off the stack. If they're in random order, 80 00:03:56,000 --> 00:03:58,000 right. I look at the first one; I say Okay, it's, 81 00:03:58,000 --> 00:04:00,000 you know, somebody king. 82 00:04:00,000 --> 00:04:03,000 Well, everybody less than king goes over here. Everybody greater than king goes over 83 00:04:03,000 --> 00:04:07,000 there. Now king may not be perfectly in the middle and we're gonna come back to see 84 00:04:07,000 --> 00:04:09,000 what that will do to the analysis, 85 00:04:09,000 --> 00:04:12,000 but at least we know it's in the range of the values we're looking at, right, and so it 86 00:04:12,000 --> 00:04:14,000 at least did that for us. 87 00:04:14,000 --> 00:04:17,000 And it means no matter what our data is, we can always use that. It's a strategy that will always work. Let's 88 00:04:17,000 --> 00:04:18,000 just take the thing 89 00:04:18,000 --> 00:04:20,000 that we first see 90 00:04:20,000 --> 00:04:22,000 and use it to be the pivot and them 91 00:04:22,000 --> 00:04:24,000 slot them left and right. 92 00:04:24,000 --> 00:04:28,000 So let me go through looking at what the code actually does. 93 00:04:28,000 --> 00:04:30,000 This is another of those cases where the 94 00:04:30,000 --> 00:04:32,000 code for partition 95 00:04:32,000 --> 00:04:35,000 is a little bit 96 00:04:35,000 --> 00:04:36,000 frightening 97 00:04:36,000 --> 00:04:40,000 and the analysis we're looking at is not actually to get really really worried 98 00:04:40,000 --> 00:04:42,000 about the exact details of the less 99 00:04:42,000 --> 00:04:45,000 than or the less than or equal to. I'm gonna talk you through what it does, 100 00:04:45,000 --> 00:04:48,000 but the more important take-away point is what's the algorithm behind Quick Sort. What's 101 00:04:48,000 --> 00:04:53,000 the strategy it's using to divide and conquer and how that's working out. 102 00:04:53,000 --> 00:04:56,000 So the main Quick Sort algorithm looks like this. We're given this 103 00:04:56,000 --> 00:04:59,000 vector of things we're trying to sort and we have a start index and a stop index 104 00:04:59,000 --> 00:05:04,000 that identifies the sub region of the vector that we're currently trying to sort. I'm gonna 105 00:05:04,000 --> 00:05:08,000 use this in subsequent calls to kind of identify what piece we're recursively 106 00:05:08,000 --> 00:05:09,000 focusing on. 107 00:05:09,000 --> 00:05:12,000 As long as there's a 108 00:05:12,000 --> 00:05:15,000 positive difference between the start and the stop so there's at least 109 00:05:15,000 --> 00:05:16,000 two elements to sort, 110 00:05:16,000 --> 00:05:19,000 then we go through the process of doing the division and the sorting. So then 111 00:05:19,000 --> 00:05:23,000 it makes the call to partition saying sub-region array 112 00:05:23,000 --> 00:05:26,000 do the partition and we'll come back and talk about that code in a second and the thing we're trying to find 113 00:05:26,000 --> 00:05:31,000 the partition is the index at which the pivot was placed. So after we did all the 114 00:05:31,000 --> 00:05:31,000 work, 115 00:05:31,000 --> 00:05:34,000 it moved everything less than the pivot to the left side of that, everything 116 00:05:34,000 --> 00:05:36,000 greater than the pivot to the right side 117 00:05:36,000 --> 00:05:37,000 and then the pivot will tell us 118 00:05:37,000 --> 00:05:41,000 where the division is and then we make two recursive calls 119 00:05:41,000 --> 00:05:43,000 to sort everything up to but not including pivot, 120 00:05:43,000 --> 00:05:46,000 to sort everything past the pivot to the end, 121 00:05:46,000 --> 00:05:49,000 and then those calls are operating on the array in place, so we're not copying it out and 122 00:05:49,000 --> 00:05:54,000 copying it back. So in fact, there actually even isn't really any joint step here. 123 00:05:54,000 --> 00:05:57,000 We said sort the four things on the left, sort the seven things on the right 124 00:05:57,000 --> 00:06:00,000 and then the joining of them was where they're already where they need 125 00:06:00,000 --> 00:06:02,000 to be so in fact we don't have anything to do in the 126 00:06:02,000 --> 00:06:04,000 joined wall. 127 00:06:04,000 --> 00:06:09,000 The tricky part is certainly in partition. The 128 00:06:09,000 --> 00:06:12,000 version of the partition we're using here is one that kind of takes two indices, 129 00:06:12,000 --> 00:06:14,000 two fingers, you might imagine 130 00:06:14,000 --> 00:06:16,000 and it walks a 131 00:06:16,000 --> 00:06:18,000 one end from 132 00:06:18,000 --> 00:06:22,000 the stop position down and one from the start position over 133 00:06:22,000 --> 00:06:26,000 and it's looking for elements that are on the wrong side. 134 00:06:26,000 --> 00:06:30,000 So if we start with our right hand finger on the very end of the array, 135 00:06:30,000 --> 00:06:31,000 that first loop in there 136 00:06:31,000 --> 00:06:35,000 is designed to move the right hand down 137 00:06:35,000 --> 00:06:38,000 looking for something that doesn't belong in left side, so we 138 00:06:38,000 --> 00:06:41,000 identified 45 as the pivot value. It says 139 00:06:41,000 --> 00:06:43,000 find something that's not 140 00:06:43,000 --> 00:06:48,000 greater then 45. That's the first thing we found coming in from the right 141 00:06:48,000 --> 00:06:50,000 downward that doesn't belong, 142 00:06:50,000 --> 00:06:51,000 so only that first step. 143 00:06:51,000 --> 00:06:54,000 So it turns out it takes it just one iteration to find that, 144 00:06:54,000 --> 00:06:57,000 and so okay, this guy doesn't belong on the right-hand side. 145 00:06:57,000 --> 00:07:00,000 Now it does the same thing on the inverse on the left-hand side. 146 00:07:00,000 --> 00:07:03,000 Try to find something that doesn't belong on the left-hand side. 147 00:07:03,000 --> 00:07:06,000 So the left hand moves up. In 148 00:07:06,000 --> 00:07:08,000 this case, it took two iterations to get to that, 149 00:07:08,000 --> 00:07:09,000 to this 92. 150 00:07:09,000 --> 00:07:13,000 So now we have a perfect opportunity to fix both our problems 151 00:07:13,000 --> 00:07:15,000 with one swap, 152 00:07:15,000 --> 00:07:16,000 exchange what's 153 00:07:16,000 --> 00:07:19,000 at the current position of left hand with right hand and now we will have 154 00:07:19,000 --> 00:07:23,000 made both of our problems go away and we'll have kind of more things on the 155 00:07:23,000 --> 00:07:24,000 left and the right that belong there 156 00:07:24,000 --> 00:07:27,000 and then we can walk inward from there, you know, to find subsequent ones 157 00:07:27,000 --> 00:07:29,000 that are out of order. 158 00:07:29,000 --> 00:07:31,000 So those two get exchanged and now 159 00:07:31,000 --> 00:07:33,000 right again 160 00:07:33,000 --> 00:07:35,000 moves into find that 8, 161 00:07:35,000 --> 00:07:37,000 left moves over to find that 74. 162 00:07:37,000 --> 00:07:38,000 We swap again. 163 00:07:38,000 --> 00:07:40,000 And as we just keep doing this, 164 00:07:40,000 --> 00:07:41,000 right, until 165 00:07:41,000 --> 00:07:43,000 they cross, and once they've cross, 166 00:07:43,000 --> 00:07:47,000 right then we know that everything that the right scanned over is greater than the pivot, 167 00:07:47,000 --> 00:07:48,000 everything 168 00:07:48,000 --> 00:07:51,000 in the left was less than and 169 00:07:51,000 --> 00:07:55,000 at that point, right we have divided it, right, everything that's small is kind 170 00:07:55,000 --> 00:07:56,000 of in the left side of the 171 00:07:56,000 --> 00:07:59,000 vector. Everything that's large in the right side. So I 172 00:07:59,000 --> 00:08:03,000 swap these two and now I get to that place and I say okay, 173 00:08:03,000 --> 00:08:05,000 so now big things here, small things there. 174 00:08:05,000 --> 00:08:10,000 The last thing we do is we move the pivot into the place right between them and 175 00:08:10,000 --> 00:08:13,000 know that it is exactly located right in the 176 00:08:13,000 --> 00:08:15,000 slot where they cross. Now 177 00:08:15,000 --> 00:08:17,000 I have two sub arrays to work on. 178 00:08:17,000 --> 00:08:21,000 So the return from partition in this case will be the index four here and it 179 00:08:21,000 --> 00:08:24,000 says Okay, go ahead and quick sort 180 00:08:24,000 --> 00:08:25,000 this front-most part 181 00:08:25,000 --> 00:08:28,000 and then come back and do the second part. 182 00:08:28,000 --> 00:08:31,000 So in recursion, kind of think of that as being postponed and it's waiting; we'll get back to it in a minute, but 183 00:08:31,000 --> 00:08:34,000 while we work on this step, we'll do this same thing. It's a 184 00:08:34,000 --> 00:08:39,000 little big harder to see it in the small kind of what's going on, but in this case, right, it 185 00:08:39,000 --> 00:08:44,000 divided that because the pivot was 8 into 8 and the three things greater than the pivot. In 186 00:08:44,000 --> 00:08:48,000 this case, the 41 is the pivot so it's actually going to move 187 00:08:48,000 --> 00:08:50,000 41 over there. It's going to have the left of that as it keeps 188 00:08:50,000 --> 00:08:53,000 working its way down, it gets smaller and smaller and it's harder to see the workings of 189 00:08:53,000 --> 00:08:56,000 the algorithm in such a small case of it 190 00:08:56,000 --> 00:08:58,000 rearranging it, but 191 00:08:58,000 --> 00:08:59,000 we'll get back to the big 192 00:08:59,000 --> 00:09:01,000 left half in 193 00:09:01,000 --> 00:09:02,000 a second. 194 00:09:02,000 --> 00:09:05,000 And so now that that's all been sorted, 195 00:09:05,000 --> 00:09:09,000 we're revisiting the side that's advancing above the pivot 196 00:09:09,000 --> 00:09:13,000 to get these five guys, six guys in the order, taking the same 197 00:09:13,000 --> 00:09:18,000 strategy. You pivot around 67 [inaudible] doing some swapping 198 00:09:18,000 --> 00:09:28,000 [inaudible] and 199 00:09:28,000 --> 00:09:30,000 200 00:09:30,000 --> 00:09:32,000 we'll see it in slightly bigger case to kind of get the idea, 201 00:09:32,000 --> 00:09:35,000 but very quickly there's something very interesting about the way quick sort is working 202 00:09:35,000 --> 00:09:38,000 is that that partition step 203 00:09:38,000 --> 00:09:41,000 very quickly gets things kind of close to the right place. And 204 00:09:41,000 --> 00:09:42,000 so now that we've 205 00:09:42,000 --> 00:09:45,000 done both the left and the right 206 00:09:45,000 --> 00:09:47,000 we're done. The whole thing is done. Everything kind of got moved into the 207 00:09:47,000 --> 00:09:50,000 right position as part of the recursive, part of the sorting and doesn't need to be 208 00:09:50,000 --> 00:09:52,000 further moved 209 00:09:52,000 --> 00:09:54,000 to solve the whole problem. 210 00:09:54,000 --> 00:09:57,000 That first step of the partition is doing a lot of the kind of throwing things kind 211 00:09:57,000 --> 00:09:59,000 of left and right, 212 00:09:59,000 --> 00:10:03,000 but it's actually quickly moving big things and small things that are out of 213 00:10:03,000 --> 00:10:05,000 place closer to where they're gonna eventually need to go, and it turns 214 00:10:05,000 --> 00:10:07,000 out to be a real advantage 215 00:10:07,000 --> 00:10:11,000 in terms of the running time of this sort because it actually is doing 216 00:10:11,000 --> 00:10:14,000 some quick movement close to where it needs to be and that kind of fixes it up in the recursive calls that 217 00:10:14,000 --> 00:10:21,000 examine the smaller sub sections of the [inaudible]. Let 218 00:10:21,000 --> 00:10:28,000 me show you what it sounds like, because that's really what you want to know. So 219 00:10:28,000 --> 00:10:29,000 let's - 220 00:10:29,000 --> 00:10:34,000 it's gonna kind of [inaudible]. So 221 00:10:34,000 --> 00:10:37,000 you can see the big partitioning happening and 222 00:10:37,000 --> 00:10:40,000 then it kind of jumbling things up and then coming back to revisit them, but you 223 00:10:40,000 --> 00:10:44,000 notice that quite quickly 224 00:10:44,000 --> 00:10:47,000 kind of all the small ones kind of got thrown to the left all. All the large elements to the right 225 00:10:47,000 --> 00:10:51,000 and then the kind of process of coming back and so the big partition that you can see kind 226 00:10:51,000 --> 00:10:53,000 of working across them and then kind of noodling down. 227 00:10:53,000 --> 00:11:21,000 If I turn the sound on for it and I'll probably take it down a little bit. 228 00:11:21,000 --> 00:11:25,000 So the big partition steps making a lot of noise, right and moving things kind of 229 00:11:25,000 --> 00:11:28,000 quickly and it almost appears, you know, to hear this kind of random sort of it's hard to 230 00:11:28,000 --> 00:11:30,000 identify what's going on during the big partition, 231 00:11:30,000 --> 00:11:34,000 but then as you hear it make its way down the recursive tree it's 232 00:11:34,000 --> 00:11:36,000 focusing on these smaller and smaller regions where the numbers have all been 233 00:11:36,000 --> 00:11:39,000 gathered because they are similar in value. And 234 00:11:39,000 --> 00:11:42,000 you hear a lot of noodling in the same pitch range 235 00:11:42,000 --> 00:11:45,000 region as its working on the smaller and smaller sub arrays and then they definitely go 236 00:11:45,000 --> 00:11:49,000 from low to high because of the recursion chooses the left side to operate on 237 00:11:49,000 --> 00:11:50,000 before the right side 238 00:11:50,000 --> 00:11:51,000 that you hear 239 00:11:51,000 --> 00:12:00,000 the work being done in the lower tones before it gets to the finishing up 240 00:12:00,000 --> 00:12:01,000 of the high tones. Kinda cool. Let us 241 00:12:01,000 --> 00:12:02,000 come back here and talk about 242 00:12:02,000 --> 00:12:09,000 how to analyze this guy a little bit. 243 00:12:09,000 --> 00:12:09,000 So the main part 244 00:12:09,000 --> 00:12:13,000 of the algorithm is very simple and deceptively simple 245 00:12:13,000 --> 00:12:15,000 because all the hard work was actually done in partition. 246 00:12:15,000 --> 00:12:17,000 The partition step is linear and 247 00:12:17,000 --> 00:12:18,000 if you 248 00:12:18,000 --> 00:12:21,000 can kind of just go along with me conceptually, you'll see that we're moving this 249 00:12:21,000 --> 00:12:24,000 left finger in; we're moving this right finger in 250 00:12:24,000 --> 00:12:27,000 and we stop when they cross, so that means every element was looked at. 251 00:12:27,000 --> 00:12:28,000 Either 252 00:12:28,000 --> 00:12:31,000 they both matched over here and they matched over here, but eventually they let you 253 00:12:31,000 --> 00:12:33,000 look at every element as you worked your way in 254 00:12:33,000 --> 00:12:37,000 and did some swaps along the way. And so that process is linear and the number of 255 00:12:37,000 --> 00:12:39,000 elements. There's a thousand of them kind of has to 256 00:12:39,000 --> 00:12:41,000 advance maybe 400 here and 600 there, 257 00:12:41,000 --> 00:12:45,000 but we'll look at all the elements and the process of partitioning them to the 258 00:12:45,000 --> 00:12:48,000 lower or upper halves. 259 00:12:48,000 --> 00:12:50,000 Then it makes two calls, quick sort 260 00:12:50,000 --> 00:12:54,000 to the left and the right portions of that. 261 00:12:54,000 --> 00:12:56,000 Well, we don't know exactly what that division was, 262 00:12:56,000 --> 00:13:00,000 was it even, was it a little lop sided and that's certainly going to come back to be 263 00:13:00,000 --> 00:13:01,000 something we're gonna look at. 264 00:13:01,000 --> 00:13:05,000 If we assume kind of this is the ideal 50/50 split 265 00:13:05,000 --> 00:13:07,000 sort of in an ideal world 266 00:13:07,000 --> 00:13:09,000 if that choice we had for the pivot 267 00:13:09,000 --> 00:13:13,000 happened to be the median or close enough to the median that effectively we got an even 268 00:13:13,000 --> 00:13:15,000 division, at every level of recursion, 269 00:13:15,000 --> 00:13:19,000 then the recurrence relation for the whole process is the time required to 270 00:13:19,000 --> 00:13:19,000 sort, an 271 00:13:19,000 --> 00:13:20,000 input of N 272 00:13:20,000 --> 00:13:22,000 is in to partition it 273 00:13:22,000 --> 00:13:24,000 and then 2, 274 00:13:24,000 --> 00:13:27,000 sorting two halves that are each 275 00:13:27,000 --> 00:13:29,000 half again as big as the original input. 276 00:13:29,000 --> 00:13:33,000 We saw that recurrence relationship for merge sort; we solved it down. 277 00:13:33,000 --> 00:13:37,000 Think in terms of the tree, right you have the branching of three, branching of two, branching of two 278 00:13:37,000 --> 00:13:41,000 and at each level the partitioning being done across each level and then it 279 00:13:41,000 --> 00:13:47,000 going to a depth of login, right, the number of times you can [inaudible] by two [inaudible] 280 00:13:47,000 --> 00:13:49,000 single case arrays that are easily handled. 281 00:13:49,000 --> 00:13:53,000 So it is an N login sort in this 282 00:13:53,000 --> 00:13:54,000 perfect best 283 00:13:54,000 --> 00:13:55,000 case. 284 00:13:55,000 --> 00:13:56,000 So that should mean that 285 00:13:56,000 --> 00:13:59,000 in terms of Big O growth curves, right, 286 00:13:59,000 --> 00:14:02,000 merge sort and quick sort should look about the same. 287 00:14:02,000 --> 00:14:05,000 It does have sort of better factors, though. 288 00:14:05,000 --> 00:14:06,000 Let me show you. 289 00:14:06,000 --> 00:14:08,000 I go back here and I just run them against each other. 290 00:14:08,000 --> 00:14:15,000 If I put quick sort versus merge sort here, stop 291 00:14:15,000 --> 00:14:17,000 making noise, 292 00:14:17,000 --> 00:14:19,000 293 00:14:19,000 --> 00:14:21,000 the quick sort 294 00:14:21,000 --> 00:14:22,000 pretty handily 295 00:14:22,000 --> 00:14:27,000 managed to beat merge sort in this case, merge sort was on the top, quick sort was underneath 296 00:14:27,000 --> 00:14:29,000 and certainly in terms of what was going on it was doing, 297 00:14:29,000 --> 00:14:32,000 looks like more comparisons right, to get everything there, that partition 298 00:14:32,000 --> 00:14:33,000 step did a lot of comparison, 299 00:14:33,000 --> 00:14:35,000 but fewer moves, 300 00:14:35,000 --> 00:14:38,000 and that kind of actually does sort of make intuitive sense. You think that quick 301 00:14:38,000 --> 00:14:41,000 sort very quickly moves stuff to where it goes and does a little bit of noodling. 302 00:14:41,000 --> 00:14:44,000 Merge sort does a lot of moving things away and moving them back and kind of like 303 00:14:44,000 --> 00:14:46,000 as you see it growing, 304 00:14:46,000 --> 00:14:47,000 you'll see a lot of 305 00:14:47,000 --> 00:14:50,000 large elements that get 306 00:14:50,000 --> 00:14:53,000 placed up in the front, but then have to be moved again right, because that was 307 00:14:53,000 --> 00:14:57,000 not their final resting place. And so merge sort does a lot of kind of movement away and back 308 00:14:57,000 --> 00:15:00,000 that tends to cost it overall, right. You know, a 309 00:15:00,000 --> 00:15:05,000 higher constant factor of the moves than something like quick sort 310 00:15:05,000 --> 00:15:08,000 does where it more quickly gets to the right place and then doesn't move it 311 00:15:08,000 --> 00:15:13,000 again. So 312 00:15:13,000 --> 00:15:16,000 that was all well and good. That was assuming that everything went perfectly. 313 00:15:16,000 --> 00:15:20,000 That was given our simplistic choice of the pivot, 314 00:15:20,000 --> 00:15:24,000 you can imagine that's really assuming a lot right that went our way. 315 00:15:24,000 --> 00:15:29,000 If we look at a particularly bad split, 316 00:15:29,000 --> 00:15:32,000 let's imagine that the number we have to pick is pretty close to, 317 00:15:32,000 --> 00:15:34,000 you know, one of the extremes. If I were 318 00:15:34,000 --> 00:15:35,000 sorting papers by alphabet 319 00:15:35,000 --> 00:15:38,000 and if the one on top happened to be a C, 320 00:15:38,000 --> 00:15:40,000 right, well then I'm gonna be dividing it pretty lop sided, right, it's just gonna 321 00:15:40,000 --> 00:15:44,000 be the As and the Bs on one side and then everything [inaudible] to the end on the other 322 00:15:44,000 --> 00:15:45,000 side. 323 00:15:45,000 --> 00:15:49,000 If I got a split that was about 90/10, ninety percent went on one side, ten 324 00:15:49,000 --> 00:15:50,000 percent on the other, 325 00:15:50,000 --> 00:15:52,000 you would expect right 326 00:15:52,000 --> 00:15:56,000 for it to kind of change the running time of what's going on here, so I get 327 00:15:56,000 --> 00:15:59,000 one tenth of them over here and the low half and maybe be nine-tenths in the right half. 328 00:15:59,000 --> 00:16:03,000 Let's just say every time it always takes nine-tenths and moves it to the upper half, so I 329 00:16:03,000 --> 00:16:05,000 kept picking something that's artificially close to the front. 330 00:16:05,000 --> 00:16:07,000 These like will kind of peter out 331 00:16:07,000 --> 00:16:11,000 fairly quickly diving N by 10 and 10 and 10 and 10, eventually these are gonna peter 332 00:16:11,000 --> 00:16:14,000 out very soon, but this one arm over there 333 00:16:14,000 --> 00:16:18,000 where I keep taking 90 percent of what remains. I had 90 percent, but I had 334 00:16:18,000 --> 00:16:19,000 81 percent. 335 00:16:19,000 --> 00:16:23,000 I keep multiplying by nine-tenths and I'm still kind of retaining a lot of 336 00:16:23,000 --> 00:16:26,000 elements on this one big portion, 337 00:16:26,000 --> 00:16:29,000 that the depth of this tree isn't gonna bottom out quite as quickly as it 338 00:16:29,000 --> 00:16:33,000 used to, but it used to the number of times you could divide N by 2, the log based 339 00:16:33,000 --> 00:16:35,000 2 of N was where we landed. 340 00:16:35,000 --> 00:16:37,000 In this case, I have to 341 00:16:37,000 --> 00:16:40,000 take N and multiply it by nine-tenths so 342 00:16:40,000 --> 00:16:41,000 instead of one half, 343 00:16:41,000 --> 00:16:43,000 right, to the K, it's nine tenths to the K 344 00:16:43,000 --> 00:16:47,000 and it's like how many iterations how deep will this get before it gets 345 00:16:47,000 --> 00:16:49,000 to the simple case 346 00:16:49,000 --> 00:16:51,000 and so solving for 347 00:16:51,000 --> 00:16:56,000 N equals ten-ninths to the K is taking the log-based ten-ninths of both sides, 348 00:16:56,000 --> 00:17:02,000 then K, the number of levels is the log-based ten-ninths of them. We're kind of 349 00:17:02,000 --> 00:17:05,000 relying on a fact of mathematics here though is that all algorithms are the 350 00:17:05,000 --> 00:17:09,000 same value, so log-based A of N and log-based B of 351 00:17:09,000 --> 00:17:12,000 N, they only differ by some constant that you can 352 00:17:12,000 --> 00:17:15,000 compute if you just rearrange the terms around. So in effect 353 00:17:15,000 --> 00:17:16,000 this means that 354 00:17:16,000 --> 00:17:20,000 there is a constant in front of this. It's like a constant 355 00:17:20,000 --> 00:17:23,000 difference, the ratio between ten-ninths and 2 356 00:17:23,000 --> 00:17:27,000 that distinguishes these, but it still can be logged based 2 of N in 357 00:17:27,000 --> 00:17:30,000 terms of Big O, right, where we can throw away those constants. 358 00:17:30,000 --> 00:17:34,000 So even those this will perform, obviously in, you know, 359 00:17:34,000 --> 00:17:38,000 experimentally you would expect that getting this kind of split would cause it 360 00:17:38,000 --> 00:17:39,000 to 361 00:17:39,000 --> 00:17:40,000 work more slowly than it would have 362 00:17:40,000 --> 00:17:44,000 in a perfect split situation. The Big O [inaudible] is still gonna look in-log-in arrhythmic and 363 00:17:44,000 --> 00:17:47,000 [inaudible]. So 364 00:17:47,000 --> 00:17:50,000 that was pretty good to know. So it makes you feel a little bit confident, like sometimes it might be 365 00:17:50,000 --> 00:17:54,000 getting an even split and sometimes it might be getting one-third, two-thirds, sometimes it might be getting 366 00:17:54,000 --> 00:17:56,000 you know, nine-tenths, but if it was kind of 367 00:17:56,000 --> 00:18:01,000 in the mix still dividing off some fraction is still doing okay. 368 00:18:01,000 --> 00:18:07,000 Now, let's look at the really, really, really worst case split. The 369 00:18:07,000 --> 00:18:10,000 really, really worst case split right, it didn't even take off 370 00:18:10,000 --> 00:18:13,000 a fraction. It just managed to separate one, 371 00:18:13,000 --> 00:18:15,000 but somehow the pivot here 372 00:18:15,000 --> 00:18:16,000 373 00:18:16,000 --> 00:18:17,000 did a really 374 00:18:17,000 --> 00:18:20,000 terribly crummy job, right, of dividing it at all 375 00:18:20,000 --> 00:18:23,000 that in effect, all the elements were greater than the pivot or less than 376 00:18:23,000 --> 00:18:27,000 the pivot since they all landed on one side and the pivot was kind of by itself. 377 00:18:27,000 --> 00:18:29,000 So starting with an input of size N, 378 00:18:29,000 --> 00:18:33,000 we sectioned off 1 and we had N minus 1 remaining. Well, the next time, 379 00:18:33,000 --> 00:18:36,000 we sectioned off another one, so this happened again and again and again. 380 00:18:36,000 --> 00:18:39,000 We just got really bad luck all the way through. 381 00:18:39,000 --> 00:18:42,000 Each of these levels is only making progress for the base case at a snails 382 00:18:42,000 --> 00:18:43,000 pace. 383 00:18:43,000 --> 00:18:47,000 Right, one element is being subtracted each subsequent level 384 00:18:47,000 --> 00:18:50,000 and that's gonna go to a depth of N. 385 00:18:50,000 --> 00:18:53,000 And so now if we're doing N work across the levels, it isn't quite N 386 00:18:53,000 --> 00:18:56,000 work because in fact, some of these bottom out. It's still the Gaussian sum in the 387 00:18:56,000 --> 00:18:57,000 end 388 00:18:57,000 --> 00:19:01,000 is that we are got N levels doing N work each. 389 00:19:01,000 --> 00:19:03,000 We suddenly have 390 00:19:03,000 --> 00:19:05,000 what was a linear arrhythmic algorithm 391 00:19:05,000 --> 00:19:07,000 now performing quadritically. So 392 00:19:07,000 --> 00:19:09,000 in the class the selection sort, 393 00:19:09,000 --> 00:19:11,000 inscription sort 394 00:19:11,000 --> 00:19:14,000 in its worst-case behavior. 395 00:19:14,000 --> 00:19:17,000 So quite a range whereas merge sort is a very reliable performer, merge doesn't 396 00:19:17,000 --> 00:19:18,000 care. 397 00:19:18,000 --> 00:19:21,000 No matter what the order is, no matter what's going on, it's the same 398 00:19:21,000 --> 00:19:22,000 amount of work, 399 00:19:22,000 --> 00:19:26,000 it means that there's some inputs that can cause quick sort to vary between being linear 400 00:19:26,000 --> 00:19:27,000 arrhythmic or being quadratic, 401 00:19:27,000 --> 00:19:30,000 and there's a huge difference as we saw in our earlier charts about 402 00:19:30,000 --> 00:19:32,000 how those things will run for 403 00:19:32,000 --> 00:19:36,000 large values. So what 404 00:19:36,000 --> 00:19:38,000 makes the worst case 405 00:19:38,000 --> 00:19:42,000 - given how we're choosing a pivot right now is to take the first thing in the 406 00:19:42,000 --> 00:19:44,000 sub section to look at as the pivot, 407 00:19:44,000 --> 00:19:47,000 what's the example input that gives you this? Sorted. If 408 00:19:47,000 --> 00:19:49,000 it's already sorted. 409 00:19:49,000 --> 00:19:50,000 Isn't that a charmer? 410 00:19:50,000 --> 00:19:54,000 Here's a sorting algorithm. If you ask it to do something 411 00:19:54,000 --> 00:19:56,000 and in fact if you accidentally [inaudible] twice, you already had sorted 412 00:19:56,000 --> 00:19:59,000 the data and you said, ''Oh, you did something,'' and you passed it back to the sort, 413 00:19:59,000 --> 00:20:03,000 it would suddenly actually degenerate into its very worst case. It's already 414 00:20:03,000 --> 00:20:05,000 sorted, so it would say, 415 00:20:05,000 --> 00:20:06,000 ''I've got this stack of exam papers.'' 416 00:20:06,000 --> 00:20:09,000 I look at the first one and I say, ''Oh, look it's Adam's.'' And I say, ''Okay, well, let me 417 00:20:09,000 --> 00:20:12,000 go find all the ones that belong in the left side.'' None of them do. I said, ''Okay, 418 00:20:12,000 --> 00:20:15,000 well put Adam's over here.'' I'll [inaudible] sort that. That was easy. 419 00:20:15,000 --> 00:20:18,000 Oh, let me look at what I've got left, oh with baker on the top. 420 00:20:18,000 --> 00:20:21,000 Okay, well let me put Baker by itself, but could we find the other ones? Oh, no, 421 00:20:21,000 --> 00:20:24,000 no more. It would just you 422 00:20:24,000 --> 00:20:27,000 know, continue because I'm doing this thing, looking at all N, looking at all N minus 1, looking 423 00:20:27,000 --> 00:20:31,000 at N minus 2, all the way down to the bottom making no, 424 00:20:31,000 --> 00:20:35,000 recognizing nothing about how beautifully the data already was arranged 425 00:20:35,000 --> 00:20:36,000 and 426 00:20:36,000 --> 00:20:38,000 you know, getting its worst case. 427 00:20:38,000 --> 00:20:42,000 There's actually a couple of others that come in, too. That's probably the most obvious one to think 428 00:20:42,000 --> 00:20:46,000 of is it's coming in in increasing order. It's also true if its in decreasing order. 429 00:20:46,000 --> 00:20:49,000 If I happen to have the largest value on the top, then 430 00:20:49,000 --> 00:20:52,000 I'll end up splitting it all into, everything to the left side and nothing 431 00:20:52,000 --> 00:20:53,000 to the right. 432 00:20:53,000 --> 00:20:58,000 There are actually other variations that will also produce this. If at any given 433 00:20:58,000 --> 00:21:01,000 stage that we're looking at a sub array, if the first value happens to be the extreme 434 00:21:01,000 --> 00:21:04,000 of what remains whether it's the small to the large, and it could alternate, which would 435 00:21:04,000 --> 00:21:06,000 be kind of a funny pattern, but if you 436 00:21:06,000 --> 00:21:10,000 have the tallest and the shortest and the tallest and then another tallest and then a shortest, 437 00:21:10,000 --> 00:21:13,000 so those would look a little bit harder to describe, but 438 00:21:13,000 --> 00:21:18,000 there are some other alternatives, that product this bad result. 439 00:21:18,000 --> 00:21:21,000 All of these we could call degenerate cases. 440 00:21:21,000 --> 00:21:25,000 There's a small number of these relative to the N factorial [inaudible] your 441 00:21:25,000 --> 00:21:28,000 data could come in. In fact, there was a huge number, and so there's lots and 442 00:21:28,000 --> 00:21:29,000 lots of ways it could be arranged. 443 00:21:29,000 --> 00:21:32,000 There's a very small number of them that would be arranged in such a way to 444 00:21:32,000 --> 00:21:35,000 trigger this very, very worst-case behaviors. 445 00:21:35,000 --> 00:21:38,000 Some are close to worst case, like if they were almost sorted, they might every now and 446 00:21:38,000 --> 00:21:41,000 then get a good split, but mostly a bad split, 447 00:21:41,000 --> 00:21:44,000 but the number that actually degenerate to the absolute worst case is actually 448 00:21:44,000 --> 00:21:47,000 quite small, but 449 00:21:47,000 --> 00:21:49,000 the truth is 450 00:21:49,000 --> 00:21:51,000 is do we expect that those outputs are somehow 451 00:21:51,000 --> 00:21:56,000 hard to imagine us getting. Are they so unusual 452 00:21:56,000 --> 00:21:59,000 and weird that you might be willing to say it's okay that my algorithm has 453 00:21:59,000 --> 00:22:01,000 this one or a couple of bad cases in it 454 00:22:01,000 --> 00:22:03,000 because it never comes up. 455 00:22:03,000 --> 00:22:06,000 In this case, the only thing that your data came in sorted or almost sorted or reverse sorted 456 00:22:06,000 --> 00:22:09,000 is probably not unlikely. 457 00:22:09,000 --> 00:22:11,000 It might be that you know, you happen to be reading from 458 00:22:11,000 --> 00:22:15,000 a file on disc and somebody has taken the [inaudible] and sort it before they gave the 459 00:22:15,000 --> 00:22:19,000 data to you. If all the sudden that caused your program to behave really badly, 460 00:22:19,000 --> 00:22:20,000 that would really be 461 00:22:20,000 --> 00:22:21,000 a 462 00:22:21,000 --> 00:22:23,000 pretty questionable outcome. 463 00:22:23,000 --> 00:22:27,000 So let's go aback and look at just to see, though. It's kind of interesting to see 464 00:22:27,000 --> 00:22:30,000 it happening in 465 00:22:30,000 --> 00:22:33,000 - this is against quick sort versus merge sort. 466 00:22:33,000 --> 00:22:38,000 If I change the data to be partially sorted 467 00:22:38,000 --> 00:22:41,000 and let it run again, 468 00:22:41,000 --> 00:22:43,000 if I still manage to beat it 469 00:22:43,000 --> 00:22:44,000 doing a little bit more, 470 00:22:44,000 --> 00:22:47,000 if I change it to totally sorted 471 00:22:47,000 --> 00:22:53,000 or reverse sorted, for that matter, and 472 00:22:53,000 --> 00:22:54,000 so they look like they're doing a lot of work, 473 00:22:54,000 --> 00:22:56,000 a lot of work about nothing. 474 00:22:56,000 --> 00:22:57,000 And you can see that quick sort 475 00:22:57,000 --> 00:22:59,000 really is still way back there. 476 00:22:59,000 --> 00:23:02,000 It has looked at the first 10 percent or so and is doing a lot 477 00:23:02,000 --> 00:23:04,000 of frantic partitioning 478 00:23:04,000 --> 00:23:07,000 that never moves anything. 479 00:23:07,000 --> 00:23:11,000 And visibly kind of traipsing it's way up there. Merge sort meanwhile is actually 480 00:23:11,000 --> 00:23:13,000 on the beach in Jamaica with a 481 00:23:13,000 --> 00:23:15,000 daiquiri 482 00:23:15,000 --> 00:23:16,000 mocking quick sort 483 00:23:16,000 --> 00:23:21,000 for all those times that it had said it was so much better than it was. 484 00:23:21,000 --> 00:23:24,000 ''Duh, it was already sorted; you Doofus.'' 485 00:23:24,000 --> 00:23:25,000 Apparently it wasn't listening. Merge 486 00:23:25,000 --> 00:23:28,000 sort almost looked like it did nothing and that's because it took the left half, 487 00:23:28,000 --> 00:23:32,000 which is the smallest half, moved it off, right, kind of copied it back and it ends up copying 488 00:23:32,000 --> 00:23:38,000 each element back to the same location it was already originally present in. There you go, quick sort 489 00:23:38,000 --> 00:23:39,000 taking its time and it if I 490 00:23:39,000 --> 00:23:42,000 ran it against, let's say, 491 00:23:42,000 --> 00:23:46,000 insertion sort and selection sort, why not 492 00:23:46,000 --> 00:23:50,000 [inaudible] nothing better to do than to sort for you all day long. 493 00:23:50,000 --> 00:23:54,000 So there insertion sort actually finishing very early because it does nothing. It sort of 494 00:23:54,000 --> 00:23:57,000 looked at them all and said they were already in the right order, 495 00:23:57,000 --> 00:24:00,000 merge sort doing a little bit more work to move everything back and forth in the same place. 496 00:24:00,000 --> 00:24:02,000 But 497 00:24:02,000 --> 00:24:02,000 498 00:24:02,000 --> 00:24:04,000 in this case, selection sort 499 00:24:04,000 --> 00:24:06,000 and 500 00:24:06,000 --> 00:24:08,000 quick sort here at the bottom and selection sort here at the top 501 00:24:08,000 --> 00:24:11,000 doing really roughly about the same work if you look at the sum total of the comparisons 502 00:24:11,000 --> 00:24:13,000 and moves 503 00:24:13,000 --> 00:24:16,000 and then time spent on those suddenly shows that yeah, in this input 504 00:24:16,000 --> 00:24:19,000 situation quick sort is performing like an N-squared sort, 505 00:24:19,000 --> 00:24:23,000 and obviously very similar constant factors to those present in selection sort, 506 00:24:23,000 --> 00:24:24,000 so really 507 00:24:24,000 --> 00:24:28,000 behaving totally quadratically in that worst-case input. If I make 508 00:24:28,000 --> 00:24:38,000 it reverse sorted, it's a little more interesting because you can kind of see something going on. Everybody kind of doing their thing. 509 00:24:38,000 --> 00:24:40,000 Insertion sort now kind of hitting its worst case, 510 00:24:40,000 --> 00:24:43,000 so it actually coming in dead last because it 511 00:24:43,000 --> 00:24:45,000 did so much extra moving, but 512 00:24:45,000 --> 00:24:49,000 still showing the kind of quadratic terms, right, that selection 513 00:24:49,000 --> 00:24:53,000 sort and quick sort are both having, but higher constant factors there, so taking almost 514 00:24:53,000 --> 00:24:54,000 twice as long because 515 00:24:54,000 --> 00:24:55,000 actually doing a little 516 00:24:55,000 --> 00:25:02,000 bit extra work because of that inverted 517 00:25:02,000 --> 00:25:04,000 situation. [Inaudible] something big, just cause. We'll put it back 518 00:25:04,000 --> 00:25:10,000 into random, back into full speed and kind of watch some things happen. 519 00:25:10,000 --> 00:25:11,000 So quick sort right, just 520 00:25:11,000 --> 00:25:14,000 sort of done in a flash, merge sort finishing behind it 521 00:25:14,000 --> 00:25:17,000 and ordinarily 522 00:25:17,000 --> 00:25:20,000 the quadratic sorts taking quite a much 523 00:25:20,000 --> 00:25:22,000 longer time than our two recursive sorts. But 524 00:25:22,000 --> 00:25:30,000 it's the random input really buying quick sort its speed. Is 525 00:25:30,000 --> 00:25:33,000 it gonna win - oh, come on. Oh, come on. Don't you want selection sort to come behind? I always wanted it to 526 00:25:33,000 --> 00:25:37,000 come from behind. It's kind of like rooting for the underdog. 527 00:25:37,000 --> 00:25:38,000 Yes. 528 00:25:38,000 --> 00:25:42,000 And yet, just remember don't mock insertion sort. What it sorted is 529 00:25:42,000 --> 00:25:46,000 the only one that recognizes it and does something clever about it. [Inaudible] 530 00:25:46,000 --> 00:25:49,000 sort is anti-clever about it, so 531 00:25:49,000 --> 00:25:55,000 it still can hold its head up and be proud. This 532 00:25:55,000 --> 00:25:58,000 kind of [inaudible]. I was 533 00:25:58,000 --> 00:26:01,000 thinking about like what algorithms, right, there's a reason why 534 00:26:01,000 --> 00:26:04,000 there's four that we talked about and there's actually a dozen 535 00:26:04,000 --> 00:26:07,000 more. It's like different situations actually produce different 536 00:26:07,000 --> 00:26:09,000 outcomes for different 537 00:26:09,000 --> 00:26:12,000 sorts and there are reasons that even though they might not be the best sort for all 538 00:26:12,000 --> 00:26:15,000 purposes, there are some situations where it might be the right one, so if you had it, 539 00:26:15,000 --> 00:26:18,000 as data said that you expect it to be mostly sorted, 540 00:26:18,000 --> 00:26:19,000 but had a few [inaudible] in it. 541 00:26:19,000 --> 00:26:22,000 Like if you had a pile or sand papers that were sorted and you dropped them on the floor and they 542 00:26:22,000 --> 00:26:25,000 got scuffed up a little bit, but they're still largely sorted, insertion sort is the 543 00:26:25,000 --> 00:26:26,000 way to 544 00:26:26,000 --> 00:26:29,000 go. It will take advantage of the work that was already done 545 00:26:29,000 --> 00:26:33,000 and not redo it or create kind of havoc the way quick sort 546 00:26:33,000 --> 00:26:36,000 would. But quick sort, the last thing we need to tell you, though, about it is we can't 547 00:26:36,000 --> 00:26:40,000 tolerate this. Like the way quick sort is in its classic form 548 00:26:40,000 --> 00:26:41,000 with this 549 00:26:41,000 --> 00:26:42,000 first element as pivot 550 00:26:42,000 --> 00:26:45,000 would be an unacceptable way to do this algorithm. 551 00:26:45,000 --> 00:26:49,000 So quick sort actually typically is one of the most standard element that's offered in 552 00:26:49,000 --> 00:26:52,000 a library of programming languages, the sort element. 553 00:26:52,000 --> 00:26:53,000 Well, it 554 00:26:53,000 --> 00:26:57,000 has to have some strategy for how to deal with this in a way that does not 555 00:26:57,000 --> 00:26:59,000 degenerate 556 00:26:59,000 --> 00:27:02,000 and so the idea is, what you need to do is just pick a different 557 00:27:02,000 --> 00:27:03,000 choice for the pivot, 558 00:27:03,000 --> 00:27:06,000 a little bit more clever, spend a little bit more time, 559 00:27:06,000 --> 00:27:09,000 do something that's a little less predictable than just picking that first 560 00:27:09,000 --> 00:27:11,000 most element. 561 00:27:11,000 --> 00:27:12,000 So to take 562 00:27:12,000 --> 00:27:16,000 it to the far extreme, one thing you could do is just computer the median, analyze the data and compute the median. 563 00:27:16,000 --> 00:27:19,000 It turns out there is a linear time algorithm for this that 564 00:27:19,000 --> 00:27:22,000 would look at every element of the data set once and then be able to tell 565 00:27:22,000 --> 00:27:23,000 you what the median was 566 00:27:23,000 --> 00:27:25,000 and that would guarantee you that 50/50 split. 567 00:27:25,000 --> 00:27:30,000 So if you go find it and use it at every stage, you will guarantee it. 568 00:27:30,000 --> 00:27:34,000 Most algorithms don't tend to do that. That's actually kind of overkill for the problem. We 569 00:27:34,000 --> 00:27:34,000 want to get it to where 570 00:27:34,000 --> 00:27:38,000 it's pretty much guaranteed to never get the worst case. But we're not 571 00:27:38,000 --> 00:27:41,000 concerned with it getting 50/50. It's got 60/40 or something close enough, that 572 00:27:41,000 --> 00:27:45,000 actually, you know, 60/40, 70/30 and it was 573 00:27:45,000 --> 00:27:47,000 bopping around in that range it'd be fine, 574 00:27:47,000 --> 00:27:50,000 so the other two that are much more commonly used 575 00:27:50,000 --> 00:27:53,000 is some kind of approximation of the median 576 00:27:53,000 --> 00:27:56,000 with a little bit of guessing or something in it. For example, median of three 577 00:27:56,000 --> 00:27:57,000 takes 578 00:27:57,000 --> 00:28:01,000 three elements and typically it takes three from some specified position, it takes 579 00:28:01,000 --> 00:28:02,000 the middle most element, 580 00:28:02,000 --> 00:28:04,000 the last element and the front element and it says, 581 00:28:04,000 --> 00:28:06,000 ''Okay, given those three, 582 00:28:06,000 --> 00:28:10,000 we arrange them to find out what's the middle of those three, and use that.'' 583 00:28:10,000 --> 00:28:12,000 If the data was already sorted, it turns out you've got the median, right 584 00:28:12,000 --> 00:28:14,000 because it wasn't the middle most. 585 00:28:14,000 --> 00:28:17,000 If data was just in random position, then you've got one that you know, at least 586 00:28:17,000 --> 00:28:22,000 there's one element on one side, right, and so the odds that that every single time 587 00:28:22,000 --> 00:28:25,000 would produce a very bad split is pretty low. 588 00:28:25,000 --> 00:28:28,000 There are some inputs that could kind of get it to generate a little bit, but it's 589 00:28:28,000 --> 00:28:32,000 pretty foolproof in most ordinary situations. 590 00:28:32,000 --> 00:28:36,000 Even more unpredictable would be just choose a random element. 591 00:28:36,000 --> 00:28:39,000 So look at the start to stop index you have, pick an random there, 592 00:28:39,000 --> 00:28:44,000 flop in to the front and now use it as the pivot element. 593 00:28:44,000 --> 00:28:45,000 If you 594 00:28:45,000 --> 00:28:47,000 don't know ahead of time how your random number [inaudible] it's impossible 595 00:28:47,000 --> 00:28:51,000 to generate an input and force it into the worst-case behavior. And so the idea is that 596 00:28:51,000 --> 00:28:53,000 you're kind of counting on randomness 597 00:28:53,000 --> 00:28:56,000 and just the probabilistic outcome if 598 00:28:56,000 --> 00:29:00,000 it managing to be such that the way it shows the random element was to pick the 599 00:29:00,000 --> 00:29:02,000 extreme and everything, it is just impossible, 600 00:29:02,000 --> 00:29:04,000 the odds are astronomically against it, 601 00:29:04,000 --> 00:29:06,000 and so it will, 602 00:29:06,000 --> 00:29:08,000 a very simple fix, right 603 00:29:08,000 --> 00:29:09,000 604 00:29:09,000 --> 00:29:12,000 that still leaves the possibility of the worst case in there, 605 00:29:12,000 --> 00:29:17,000 but you know, much, much, much far removed probability sense. So, 606 00:29:17,000 --> 00:29:20,000 just simple things, right, but from there the algorithm operates the 607 00:29:20,000 --> 00:29:22,000 same way it always has which is to say, 608 00:29:22,000 --> 00:29:24,000 ''Pick the median, however, you want to do it, 609 00:29:24,000 --> 00:29:27,000 move it into that front slot and then carry on as before in terms of the left 610 00:29:27,000 --> 00:29:30,000 and the right hand and moving down and swapping and recursing and all that [inaudible].'' 611 00:29:30,000 --> 00:29:34,000 Any 612 00:29:34,000 --> 00:29:40,000 questions; anything about sorting, sorting algorithms? Why don't we 613 00:29:40,000 --> 00:29:42,000 do some coding actually, 614 00:29:42,000 --> 00:29:44,000 which is the next thing I want to show you 615 00:29:44,000 --> 00:29:46,000 because once you know how to write sorts, 616 00:29:46,000 --> 00:29:47,000 617 00:29:47,000 --> 00:29:50,000 sort of the other piece I think you need to have to go with it is how would I write 618 00:29:50,000 --> 00:29:54,000 a sort to be generally useful in a large variety of situations that 619 00:29:54,000 --> 00:29:57,000 knowing about these sorts I might decide to build 620 00:29:57,000 --> 00:30:01,000 myself a really general purpose, good, high performance, quick sort, but I 621 00:30:01,000 --> 00:30:03,000 could use again, and again and again. 622 00:30:03,000 --> 00:30:07,000 I want to make it work generically so I could sort strings, I could sort 623 00:30:07,000 --> 00:30:10,000 numbers, or I could sort students, or I could sort 624 00:30:10,000 --> 00:30:11,000 vectors of students or whatever, 625 00:30:11,000 --> 00:30:14,000 and I'd want to make sure that no matter what I needed to do it was gonna 626 00:30:14,000 --> 00:30:17,000 solve all my problems, this one kid of sorting tool, 627 00:30:17,000 --> 00:30:20,000 and we need to know a little bit of C++ to make that work 628 00:30:20,000 --> 00:30:22,000 by use of the function template. 629 00:30:22,000 --> 00:30:27,000 So if you want to ask about sorting algorithms, now is the time. [Inaudible]. We don't 630 00:30:27,000 --> 00:30:28,000 bubble sort, 631 00:30:28,000 --> 00:30:31,000 which is kind of interesting. It will come up actually in the assignment 632 00:30:31,000 --> 00:30:34,000 I give you next week because it is a little bit of an historical artifact 633 00:30:34,000 --> 00:30:38,000 to know about it, but it turns out bubble sort is one of those sorts that 634 00:30:38,000 --> 00:30:43,000 is dominated by pretty much every other sort you'll know about in every way, as there's 635 00:30:43,000 --> 00:30:45,000 really nothing to recommend it. As I said, each of these four have something 636 00:30:45,000 --> 00:30:49,000 about them that actually has a strength or whatever. Bubble sort really doesn't. 637 00:30:49,000 --> 00:30:52,000 It's a little bit harder to write than insertion sort. It's a little bit slower than insertion 638 00:30:52,000 --> 00:30:56,000 sort; it's a little bit easier to get it wrong than insertion sort. It has higher constant 639 00:30:56,000 --> 00:30:57,000 factors. You know, 640 00:30:57,000 --> 00:31:01,000 it does recognize the data and sort order, but so does insertion sort, so it's hard to come up with 641 00:31:01,000 --> 00:31:04,000 a reason to actually spend a lot of time on it, but I will expose you to it in the 642 00:31:04,000 --> 00:31:07,000 assignment because I think it's a little bit of a - 643 00:31:07,000 --> 00:31:08,000 it's part of 644 00:31:08,000 --> 00:31:10,000 our history right, as a computer science 645 00:31:10,000 --> 00:31:14,000 student to be expose to. 646 00:31:14,000 --> 00:31:16,000 Anything else? I'm gonna show 647 00:31:16,000 --> 00:31:19,000 you some things about function templates. Oh, these 648 00:31:19,000 --> 00:31:21,000 are some numbers; I forgot to give that. 649 00:31:21,000 --> 00:31:23,000 Just to say, in the end, right, 650 00:31:23,000 --> 00:31:26,000 which we saw a little bit in the analysis that the constant factors on quick 651 00:31:26,000 --> 00:31:29,000 sort are 652 00:31:29,000 --> 00:31:33,000 noticeable when compared to merge sort by a factor of 4, 653 00:31:33,000 --> 00:31:35,000 moving things more quickly and not having to 654 00:31:35,000 --> 00:31:39,000 mess with them again. Quick 655 00:31:39,000 --> 00:31:41,000 sort [inaudible] 656 00:31:41,000 --> 00:31:44,000 in totally random order? 657 00:31:44,000 --> 00:31:48,000 Yes, they are. Yes - So it doesn't have them taking - So this is actually using a classic quick 658 00:31:48,000 --> 00:31:52,000 sort with no degenerate protection on random data. 659 00:31:52,000 --> 00:31:55,000 If I had put it in, sorted it in on that case, you would definitely see 660 00:31:55,000 --> 00:31:58,000 like numbers comparable to like selection sorts, eight hours in that 661 00:31:58,000 --> 00:32:00,000 last slot, right. Or 662 00:32:00,000 --> 00:32:01,000 you 663 00:32:01,000 --> 00:32:03,000 [inaudible] degenerate protection, and it would probably 664 00:32:03,000 --> 00:32:07,000 have an in the noise a small slow down for that little extra work that's being 665 00:32:07,000 --> 00:32:11,000 done, but then you'll be getting in log and performance reliable across 666 00:32:11,000 --> 00:32:13,000 all states of inputs. So 667 00:32:13,000 --> 00:32:15,000 [inaudible] protection as it 668 00:32:15,000 --> 00:32:19,000 starts to even out time wise with merge sort, does it still - No, it doesn't. 669 00:32:19,000 --> 00:32:23,000 It actually is an almost imperceptible change in the time because it depends on 670 00:32:23,000 --> 00:32:26,000 which form of it you use because if you use, for example the random or median of three, 671 00:32:26,000 --> 00:32:31,000 the amount of work you're doing is about two more things per level in the tree and 672 00:32:31,000 --> 00:32:34,000 so it turns out that, yeah, 673 00:32:34,000 --> 00:32:37,000 over the space of login it's just not enough to even be noticeable. 674 00:32:37,000 --> 00:32:41,000 Now if you did the full median algorithm, you could actually start to see it because then 675 00:32:41,000 --> 00:32:45,000 you'd see both a linear partition step and a linear median step and that 676 00:32:45,000 --> 00:32:47,000 would actually raise the cause and factors where it would probably be closer in line to 677 00:32:47,000 --> 00:32:49,000 merge sort. Pretty 678 00:32:49,000 --> 00:32:58,000 much nobody writes it that way is the truth, but you could kind of in theory is why we [inaudible]. What I'm 679 00:32:58,000 --> 00:32:59,000 going to show you, kind 680 00:32:59,000 --> 00:33:02,000 of motivate this by starting with something simple, and then we're gonna move towards kind of building 681 00:33:02,000 --> 00:33:04,000 the 682 00:33:04,000 --> 00:33:05,000 fully generic, 683 00:33:05,000 --> 00:33:06,000 684 00:33:06,000 --> 00:33:08,000 you know, one 685 00:33:08,000 --> 00:33:10,000 algorithm does it all, 686 00:33:10,000 --> 00:33:12,000 sorting function. 687 00:33:12,000 --> 00:33:16,000 So what I'm gonna first look at is flop because it's a little bit easier to see it in this case, 688 00:33:16,000 --> 00:33:19,000 is that you know, sometimes you need to take two variables and exchange their values. I mean 689 00:33:19,000 --> 00:33:23,000 you need to go through a temporary to copy one and then copy the other back and what 690 00:33:23,000 --> 00:33:26,000 not - swapping characters, swapping ends, swapping strings, swapping students, you 691 00:33:26,000 --> 00:33:28,000 know, any kind of variables you wanted to swap, 692 00:33:28,000 --> 00:33:29,000 you could write 693 00:33:29,000 --> 00:33:31,000 a specific swap function 694 00:33:31,000 --> 00:33:35,000 that takes two variables by reference of the integer, strain or character type that 695 00:33:35,000 --> 00:33:36,000 you're trying to exchange 696 00:33:36,000 --> 00:33:40,000 and then copies one to a temporary and exchanges the two values. 697 00:33:40,000 --> 00:33:43,000 Because of the way C++ functions work, right, 698 00:33:43,000 --> 00:33:46,000 you really do need to say, ''It's a string, this is a string, you know, what's being 699 00:33:46,000 --> 00:33:49,000 declared here is a string,'' and so you might say, ''Well, if I need more than one swap, sometimes that swaps strings and characters, you know, my choices 700 00:33:49,000 --> 00:33:52,000 are 701 00:33:52,000 --> 00:33:53,000 basically 702 00:33:53,000 --> 00:33:58,000 to duplicate and copy and paste and so on. 703 00:33:58,000 --> 00:34:02,000 That's not good; we'd rather not do that. 704 00:34:02,000 --> 00:34:04,000 But what I want to do is I want to 705 00:34:04,000 --> 00:34:08,000 discuss what it takes to write something in template form. 706 00:34:08,000 --> 00:34:12,000 We have been using templates right, the vector is a template, the set is a 707 00:34:12,000 --> 00:34:15,000 template, the stack and queue and all these things are templates 708 00:34:15,000 --> 00:34:17,000 that are written 709 00:34:17,000 --> 00:34:20,000 in a very generic way using a place 710 00:34:20,000 --> 00:34:24,000 holder, so the code that holds onto your collection of things is written 711 00:34:24,000 --> 00:34:27,000 saying, ''I don't know what the thing is; is it a string; is it an N; is it a car?'' Well, 712 00:34:27,000 --> 00:34:31,000 vectors just hold onto things and you as a client can decide what you're gonna 713 00:34:31,000 --> 00:34:33,000 be storing in this particular kind of vector. 714 00:34:33,000 --> 00:34:37,000 And so what we're gonna see is if I take those flat functions and I try to 715 00:34:37,000 --> 00:34:41,000 distill them down and say, ''Well, what is it that any swap routine looks like?'' 716 00:34:41,000 --> 00:34:43,000 It needs to take two variables by reference, it needs 717 00:34:43,000 --> 00:34:46,000 to declare a template of that type and it needs to do the assignment all around 718 00:34:46,000 --> 00:34:47,000 719 00:34:47,000 --> 00:34:49,000 to exchange those values. 720 00:34:49,000 --> 00:34:51,000 Can I write a version 721 00:34:51,000 --> 00:34:55,000 that is tight unspecified leaving a placeholder in there for 722 00:34:55,000 --> 00:34:59,000 what - 723 00:34:59,000 --> 00:35:00,000 that's really 724 00:35:00,000 --> 00:35:01,000 kind of amazing, 725 00:35:01,000 --> 00:35:03,000 that what we'Q1: gonna be swapping 726 00:35:03,000 --> 00:35:05,000 and then 727 00:35:05,000 --> 00:35:10,000 let there be multiple swaps generated from that one pattern. 728 00:35:10,000 --> 00:35:11,000 So we're gonna do this actually in 729 00:35:11,000 --> 00:35:14,000 the compiler because it's always 730 00:35:14,000 --> 00:35:16,000 nice to see a little bit a code happening. So I 731 00:35:16,000 --> 00:35:19,000 go over here 732 00:35:19,000 --> 00:35:20,000 and 733 00:35:20,000 --> 00:35:24,000 I got some code up there that 734 00:35:24,000 --> 00:35:26,000 I'm just currently gonna [inaudible] because I don't want to deal 735 00:35:26,000 --> 00:35:30,000 with it right now. We're gonna see it in a second. It doesn't [inaudible]. Okay, 736 00:35:30,000 --> 00:35:41,000 so your usual swap looks like this. 737 00:35:41,000 --> 00:35:44,000 And that would only swap exactly integers. If I wanted characters I'd 738 00:35:44,000 --> 00:35:49,000 change it all around. So what I'm gonna change it to is a template. I'm gonna add a template 739 00:35:49,000 --> 00:35:50,000 header at the top 740 00:35:50,000 --> 00:35:53,000 and so the template header starts with a key-word template and then angle brackets that 741 00:35:53,000 --> 00:35:54,000 says, 742 00:35:54,000 --> 00:35:59,000 ''What kind of placeholders does this template depend on?'' It depends on one type name 743 00:35:59,000 --> 00:36:00,000 and then I've chose then name 744 00:36:00,000 --> 00:36:02,000 capital T, type for it. 745 00:36:02,000 --> 00:36:04,000 That's a choice that I'm going to make and it says 746 00:36:04,000 --> 00:36:07,000 in the body of the function that's coming up 747 00:36:07,000 --> 00:36:08,000 where I would have 748 00:36:08,000 --> 00:36:11,000 ordinarily fully committed on the type, I'm gonna use this placeholder 749 00:36:11,000 --> 00:36:13,000 capital T, type 750 00:36:13,000 --> 00:36:22,000 instead. 751 00:36:22,000 --> 00:36:25,000 And now I have written swap in such a way that it doesn't say for sure 752 00:36:25,000 --> 00:36:29,000 what's being exchanged, two strings, two Ns, two doubles. They're all 753 00:36:29,000 --> 00:36:33,000 have to be matching in this form, and I said, ''Well, there's a placeholder, and the placeholder 754 00:36:33,000 --> 00:36:36,000 was gonna be filled in by the client who used this flop, 755 00:36:36,000 --> 00:36:41,000 and on demand, we can instantiate as many versions of the swap function as we need.'' 756 00:36:41,000 --> 00:36:45,000 If you need to swap integers, you can instantiate a swap that then fills in the 757 00:36:45,000 --> 00:36:48,000 placeholder type with Ns all the way through 758 00:36:48,000 --> 00:36:51,000 and then I can use that same template or pattern to generate one that will swap 759 00:36:51,000 --> 00:36:53,000 characters or swap strings or swap whatever. 760 00:36:53,000 --> 00:37:02,000 So if I go down to my main, maybe I'll just pull my main up [inaudible]. And 761 00:37:02,000 --> 00:37:05,000 I will show you how I can do this. I can say 762 00:37:05,000 --> 00:37:08,000 int 1 equals 763 00:37:08,000 --> 00:37:13,000 54, 2 equals 32 and I say swap 1-2. 764 00:37:13,000 --> 00:37:17,000 So if the usage of a function is a little bit different 765 00:37:17,000 --> 00:37:19,000 than it was for 766 00:37:19,000 --> 00:37:21,000 class templates, 767 00:37:21,000 --> 00:37:25,000 in class templates we always did this thing where I said 768 00:37:25,000 --> 00:37:29,000 the name of the template and then it angled back, as I said. And in particular I want 769 00:37:29,000 --> 00:37:33,000 the swap function that operates where the template parameter has been filled in 770 00:37:33,000 --> 00:37:35,000 with int 771 00:37:35,000 --> 00:37:38,000 that in the case of functions, it turns out it's a little bit easier for the compiler 772 00:37:38,000 --> 00:37:39,000 to know what you intended 773 00:37:39,000 --> 00:37:42,000 that on the basis of what my arguments are to this, 774 00:37:42,000 --> 00:37:45,000 is I called swap passing two integers, 775 00:37:45,000 --> 00:37:49,000 it knows that there's only one possibility for what version of swap you were interested 776 00:37:49,000 --> 00:37:50,000 in, 777 00:37:50,000 --> 00:37:54,000 that it must be the one where type has been filled in with int and that's 778 00:37:54,000 --> 00:37:56,000 the one to generate for you. 779 00:37:56,000 --> 00:37:59,000 So you can use that long form of saying swap, angle bracket int, 780 00:37:59,000 --> 00:38:03,000 but typically you will not; you will let the compiler infer it for you on the usage, 781 00:38:03,000 --> 00:38:05,000 so based on the arguments you passed, 782 00:38:05,000 --> 00:38:10,000 it will figure out how to kind of match them to what swap said it was taking and 783 00:38:10,000 --> 00:38:13,000 if it can't match them, for example, if you pass one double and one int, you'll get compiler 784 00:38:13,000 --> 00:38:14,000 errors. 785 00:38:14,000 --> 00:38:19,000 But if I've done it correctly, then it will generate for me a swap 786 00:38:19,000 --> 00:38:23,000 where type's been filled in with int and if I call it again, 787 00:38:23,000 --> 00:38:26,000 passing different kinds of things, so having 788 00:38:26,000 --> 00:38:29,000 string S equals hello and T 789 00:38:29,000 --> 00:38:31,000 equals 790 00:38:31,000 --> 00:38:33,000 world, 791 00:38:33,000 --> 00:38:37,000 that I can write swap S and T and now 792 00:38:37,000 --> 00:38:40,000 the compiler generated for me two different versions of swap, right, 793 00:38:40,000 --> 00:38:43,000 one for integers, one for 794 00:38:43,000 --> 00:38:46,000 strings on the [inaudible] and I didn't do anything with them. I put them [inaudible], so nothing to see there, 795 00:38:46,000 --> 00:38:47,000 but showing 796 00:38:47,000 --> 00:38:50,000 it does compile and build up. That's 797 00:38:50,000 --> 00:38:53,000 a pretty neat idea. 798 00:38:53,000 --> 00:38:55,000 Kind of important because it turns out there are 799 00:38:55,000 --> 00:38:59,000 a lot of pieces of code that you're gonna find yourself wanting to write that don't 800 00:38:59,000 --> 00:39:02,000 depend, in a case like swap, swap doesn't care what it's exchanging, that they're Ns, that 801 00:39:02,000 --> 00:39:06,000 they're strings, that they're doubles. The way you swap two things 802 00:39:06,000 --> 00:39:09,000 is the same no matter what they are. You copy one aside; you overwrite one; 803 00:39:09,000 --> 00:39:11,000 you overwrite the other 804 00:39:11,000 --> 00:39:14,000 and the same thing applies to much more algorithmically interesting problems 805 00:39:14,000 --> 00:39:15,000 like 806 00:39:15,000 --> 00:39:19,000 searching, like sorting, like removing duplicates, like printing, 807 00:39:19,000 --> 00:39:23,000 where you want to take a vector and print all of its contents right that 808 00:39:23,000 --> 00:39:24,000 the 809 00:39:24,000 --> 00:39:27,000 steps you need to do to print a vector of events looks just like the steps you need to do to print a vector 810 00:39:27,000 --> 00:39:31,000 of strings. And so it would be very annoying every time I need to do that to 811 00:39:31,000 --> 00:39:33,000 duplicate that code that there's a real appeal toward 812 00:39:33,000 --> 00:39:35,000 writing it once, as a template 813 00:39:35,000 --> 00:39:37,000 and using it 814 00:39:37,000 --> 00:39:41,000 in multiple situations. So 815 00:39:41,000 --> 00:39:44,000 this shows that if I swap, right, 816 00:39:44,000 --> 00:39:47,000 it infers what I meant 817 00:39:47,000 --> 00:39:50,000 and then this thing basically just shows what happened is when I said 818 00:39:50,000 --> 00:39:53,000 int equals four, 819 00:39:53,000 --> 00:39:56,000 [inaudible] use that pattern to generate the swap int 820 00:39:56,000 --> 00:39:59,000 where the type parameter, that placeholder has been 821 00:39:59,000 --> 00:40:01,000 filled in 822 00:40:01,000 --> 00:40:05,000 and established that it's int for this particular version which is distinct from 823 00:40:05,000 --> 00:40:10,000 the swap for characters and swap for strings and what not. So 824 00:40:10,000 --> 00:40:13,000 let me show you that version I talked about print, and this one I'm 825 00:40:13,000 --> 00:40:15,000 gonna go back to the 826 00:40:15,000 --> 00:40:16,000 compiler for just a second. 827 00:40:16,000 --> 00:40:19,000 Let's say I want to print a vector, printing a vector it like iterating all 828 00:40:19,000 --> 00:40:22,000 the members and using the 829 00:40:22,000 --> 00:40:23,000 screen insertion to print them out 830 00:40:23,000 --> 00:40:26,000 and so here it is taking some vector 831 00:40:26,000 --> 00:40:29,000 where the elements in it are of this type name. In this case, I've just used T as 832 00:40:29,000 --> 00:40:31,000 my short name here. 833 00:40:31,000 --> 00:40:31,000 The iterator 834 00:40:31,000 --> 00:40:35,000 looks the same; the bracketing looks the same. I see the end now and I'd 835 00:40:35,000 --> 00:40:38,000 like to be able to print vectors of strings, vectors of ints, vector of 836 00:40:38,000 --> 00:40:39,000 doubles 837 00:40:39,000 --> 00:40:43,000 with this one piece of code. So if I call print vector and I pas code vector events, 838 00:40:43,000 --> 00:40:43,000 all's good; 839 00:40:43,000 --> 00:40:47,000 vector doubles, all's good; vector strings, all's good. 840 00:40:47,000 --> 00:40:50,000 Here's where I could get a little bit into trouble here. 841 00:40:50,000 --> 00:40:52,000 I've made this [inaudible] chord that has an X 842 00:40:52,000 --> 00:40:57,000 and a Y field. I make a vector that contains these chords. 843 00:40:57,000 --> 00:41:00,000 And then I try to call print vector of C. 844 00:41:00,000 --> 00:41:03,000 So when I do this, right, 845 00:41:03,000 --> 00:41:06,000 it will instantiate a versive print vector 846 00:41:06,000 --> 00:41:11,000 where the T has been matched up to, oh it's chords that are in there; you've got a vector 847 00:41:11,000 --> 00:41:12,000 of chords. 848 00:41:12,000 --> 00:41:15,000 And so okay, we'll go through and iterate over the sides of the vector 849 00:41:15,000 --> 00:41:18,000 and this line is trying to say see out 850 00:41:18,000 --> 00:41:21,000 of a chord type. And 851 00:41:21,000 --> 00:41:22,000 struts 852 00:41:22,000 --> 00:41:26,000 don't automatically know how to out put themselves onto a string, 853 00:41:26,000 --> 00:41:30,000 and so at this point when I try to instantiate, so the code in print vector is 854 00:41:30,000 --> 00:41:33,000 actually fine and works for a large number of types, all the primitive types 855 00:41:33,000 --> 00:41:35,000 will be fine, 856 00:41:35,000 --> 00:41:36,000 string and things like that are fine. 857 00:41:36,000 --> 00:41:40,000 But if I try to use it for a type for which it doesn't have this output 858 00:41:40,000 --> 00:41:43,000 behavior, right I will get a compiler error, 859 00:41:43,000 --> 00:41:46,000 and it may come as a surprise because a print vector appeared to be working fine in all the other 860 00:41:46,000 --> 00:41:49,000 situations and all the sudden it seems like a new error has cropped up 861 00:41:49,000 --> 00:41:51,000 but that error came from 862 00:41:51,000 --> 00:41:54,000 the instantiation of a particular version in this case, that suddenly ran 863 00:41:54,000 --> 00:41:58,000 afoul of what the template expected of the type. 864 00:41:58,000 --> 00:42:02,000 The message you get, let's say in X code looks like this, no match for 865 00:42:02,000 --> 00:42:08,000 operator, less than, less than in standard CL, vector element, operator [inaudible]. 866 00:42:08,000 --> 00:42:12,000 So I'm trying to give you a little clue, but in its very cryptic C++ way of what 867 00:42:12,000 --> 00:42:15,000 you've done wrong. 868 00:42:15,000 --> 00:42:18,000 So it comes back to what the template has to be a little bit clear about 869 00:42:18,000 --> 00:42:20,000 from a client point of view is to say well 870 00:42:20,000 --> 00:42:24,000 what is it that needs to be true. Will it really work for all types or is there 871 00:42:24,000 --> 00:42:27,000 something special about the kind of things that actually need to be true 872 00:42:27,000 --> 00:42:29,000 about what's filled in there with a placeholder to make the rest of the code 873 00:42:29,000 --> 00:42:31,000 work correctly. 874 00:42:31,000 --> 00:42:33,000 So if it uses string insertion 875 00:42:33,000 --> 00:42:36,000 or it compares to elements using equals equals, right, 876 00:42:36,000 --> 00:42:40,000 something like a strut doesn't by default have those behaviors and so you 877 00:42:40,000 --> 00:42:44,000 could have a template that would work for primitive types or types that have these 878 00:42:44,000 --> 00:42:47,000 operations declared, but wouldn't work when 879 00:42:47,000 --> 00:42:51,000 you gave it a more complex type that didn't have that 880 00:42:51,000 --> 00:42:53,000 data support in there. 881 00:42:53,000 --> 00:42:55,000 So I have that peace code over here. I can just 882 00:42:55,000 --> 00:42:59,000 show it to you. I just took it down here. 883 00:42:59,000 --> 00:43:05,000 This is print vector and 884 00:43:05,000 --> 00:43:07,000 it's a little bit of 885 00:43:07,000 --> 00:43:10,000 template. 886 00:43:10,000 --> 00:43:11,000 And 887 00:43:11,000 --> 00:43:14,000 if I were to have a vector 888 00:43:14,000 --> 00:43:17,000 declared of ints, 889 00:43:17,000 --> 00:43:20,000 I say print vector, V 890 00:43:20,000 --> 00:43:24,000 that this compiles, there's nothing to print. It turns out it'll just print empty line because 891 00:43:24,000 --> 00:43:32,000 there's nothing in there, but if I change this to be chord T 892 00:43:32,000 --> 00:43:37,000 and my build is failing, right, and the error message I'm getting here is no match for 893 00:43:37,000 --> 00:43:38,000 trying to 894 00:43:38,000 --> 00:43:41,000 output that thing and it tells me that all the things I can output on a string. Well, it could be that you could output a 895 00:43:41,000 --> 00:43:44,000 [inaudible] but it's not one of those things, right, chord T 896 00:43:44,000 --> 00:43:48,000 doesn't have 897 00:43:48,000 --> 00:43:51,000 a built-in behavior for outputting to a stream. 898 00:43:51,000 --> 00:43:53,000 So this is going to come back to be kind of important because I'm gonna 899 00:43:53,000 --> 00:43:54,000 keep going here 900 00:43:54,000 --> 00:43:58,000 is that 901 00:43:58,000 --> 00:44:00,000 when we get to the sort, right, when 902 00:44:00,000 --> 00:44:02,000 we try to build this thing, we're gonna definitely be depending on 903 00:44:02,000 --> 00:44:05,000 something being true about the elements, right, 904 00:44:05,000 --> 00:44:07,000 that's gonna constrain what the type could be. 905 00:44:07,000 --> 00:44:10,000 So this is selection sort in 906 00:44:10,000 --> 00:44:14,000 its ordinary form, nothing special about it other than the fact that I 907 00:44:14,000 --> 00:44:17,000 changed it as so sorting an int, it's the sort of vector with 908 00:44:17,000 --> 00:44:18,000 placeholder type. So it's [inaudible] with 909 00:44:18,000 --> 00:44:23,000 the template typed in header, vector of type and then in here, 910 00:44:23,000 --> 00:44:26,000 operations like this really are accessing a type 911 00:44:26,000 --> 00:44:28,000 and another type 912 00:44:28,000 --> 00:44:30,000 variable out of 913 00:44:30,000 --> 00:44:33,000 the vector that was passed and then comparing them, which is smaller to decide 914 00:44:33,000 --> 00:44:34,000 which index. 915 00:44:34,000 --> 00:44:36,000 It's using a swap 916 00:44:36,000 --> 00:44:39,000 and so the generic swap would also be necessary for this so that we need to 917 00:44:39,000 --> 00:44:42,000 have a swap for any kind of things we want to exchange. We have a template 918 00:44:42,000 --> 00:44:44,000 up there for swap, 919 00:44:44,000 --> 00:44:47,000 then the sort can build on top of that and use that as part of its workings which is then able 920 00:44:47,000 --> 00:44:48,000 to come under 921 00:44:48,000 --> 00:44:52,000 swap to any two things can be swapped using the same strategy. 922 00:44:52,000 --> 00:44:55,000 And this is actually where templates really start to shine. Like swap in itself isn't that 923 00:44:55,000 --> 00:44:56,000 924 00:44:56,000 --> 00:44:59,000 stunning of an example, because you think whatever, who cares if three lines a code, but every 925 00:44:59,000 --> 00:45:00,000 926 00:45:00,000 --> 00:45:01,000 little bit does count, 927 00:45:01,000 --> 00:45:04,000 but once we get to things like sorting where we could build a really 928 00:45:04,000 --> 00:45:05,000 killer 929 00:45:05,000 --> 00:45:09,000 totally tuned, really high performance, you know, protection against a [inaudible] 930 00:45:09,000 --> 00:45:09,000 quick sort 931 00:45:09,000 --> 00:45:12,000 and we want to be sure that we can use it in all the places we need to sort. I don't to 932 00:45:12,000 --> 00:45:15,000 make it sort just strings and later when I need to sort for integers, I need to 933 00:45:15,000 --> 00:45:18,000 copy and paste it and change string to everywhere, 934 00:45:18,000 --> 00:45:21,000 and then if I find a bug in one, I forget to change it in the other I have 935 00:45:21,000 --> 00:45:23,000 the bug still lurking there. 936 00:45:23,000 --> 00:45:25,000 And so template functions are generally used for 937 00:45:25,000 --> 00:45:29,000 all kinds of algorithmically interesting things, sorting and searching and removing 938 00:45:29,000 --> 00:45:32,000 duplicates and permuting and finding the mode 939 00:45:32,000 --> 00:45:33,000 and 940 00:45:33,000 --> 00:45:36,000 shuffling and all these things that like okay, no matter what the type of 941 00:45:36,000 --> 00:45:38,000 thing you're working on, 942 00:45:38,000 --> 00:45:41,000 the algorithm for how you do it looks the same whether it's ints or 943 00:45:41,000 --> 00:45:43,000 strings or whatever. So 944 00:45:43,000 --> 00:45:46,000 we got this guy 945 00:45:46,000 --> 00:45:46,000 946 00:45:46,000 --> 00:45:49,000 and as is, right, we could 947 00:45:49,000 --> 00:45:53,000 push vectors of ints in there, vectors of strings in there and the right thing 948 00:45:53,000 --> 00:45:57,000 would work for those without any changes to it, 949 00:45:57,000 --> 00:45:59,000 but it does have a lurking 950 00:45:59,000 --> 00:46:02,000 constraint on what much be true about the kind of elements that we're 951 00:46:02,000 --> 00:46:04,000 trying to sort here. 952 00:46:04,000 --> 00:46:05,000 Not every type will work. If 953 00:46:05,000 --> 00:46:07,000 I go back to my chord example, 954 00:46:07,000 --> 00:46:09,000 this XY, 955 00:46:09,000 --> 00:46:13,000 if have a vector of chord and I pass it to sort and if I go back and look at the 956 00:46:13,000 --> 00:46:14,000 code, 957 00:46:14,000 --> 00:46:17,000 you think about instantiating this with vector, is all chord in here, all the way 958 00:46:17,000 --> 00:46:18,000 through here, 959 00:46:18,000 --> 00:46:22,000 this part's fine; this part's fine; this part's fine, but suddenly you get to this line and 960 00:46:22,000 --> 00:46:25,000 that's gonna be taking two coordinate structures and saying is this coordinate 961 00:46:25,000 --> 00:46:27,000 structure less than another. 962 00:46:27,000 --> 00:46:30,000 And that code's not gonna compile. Yes. Back when we were doing sets, we were able to explicitly 963 00:46:30,000 --> 00:46:36,000 say how this works. Yeah, so you're right where I'm gonna be, but I'm probably 964 00:46:36,000 --> 00:46:40,000 not gonna finish today, but I'll have to finish up on Wednesday is we're gonna do exactly what 965 00:46:40,000 --> 00:46:43,000 Seth did, and what did Seth do? Comparison function. 966 00:46:43,000 --> 00:46:46,000 That's right, he used a comparison function, so you have to have some coordination here when you say, 967 00:46:46,000 --> 00:46:49,000 well instead of trying to use the operate or less than on these things, 968 00:46:49,000 --> 00:46:53,000 how about the client tell me as part of calling sort, why don't you give me the 969 00:46:53,000 --> 00:46:57,000 information in the form of a function that says call me back and tell me are 970 00:46:57,000 --> 00:46:59,000 these things less than 971 00:46:59,000 --> 00:47:02,000 and so on and how to order them. So that's exactly what we need to do, but I can't 972 00:47:02,000 --> 00:47:05,000 really show you the code right now, but I'll just kind of like 973 00:47:05,000 --> 00:47:09,000 flash it up there and we will talk about it on Wednesday, what changes make 974 00:47:09,000 --> 00:47:12,000 that happen. We'll pick up 975 00:47:12,000 --> 00:47:22,000 with Chapter 8, actually after the midterm in terms of 976 00:47:22,000 --> 00:47:23,000 reading.