1
00:00:00,000 --> 00:00:11,000


2
00:00:11,000 --> 00:00:15,000
This presentation is delivered by the Stanford Center for Professional 

3
00:00:15,000 --> 00:00:25,000
Development. 

4
00:00:25,000 --> 00:00:26,000
Okay. 

5
00:00:26,000 --> 00:00:27,000
So this is 

6
00:00:27,000 --> 00:00:30,000
the last thing we had talked about at the end of the merge sort was 

7
00:00:30,000 --> 00:00:35,000
comparing a quadratic and N-squared sort, this left and right sort right here to the linear 

8
00:00:35,000 --> 00:00:37,000
arrhythmic which is the N-log in 

9
00:00:37,000 --> 00:00:40,000
kind that merge sort runs in, right, showing you that 

10
00:00:40,000 --> 00:00:41,000


11
00:00:41,000 --> 00:00:44,000
really way out-performing even on small values and getting the large values 

12
00:00:44,000 --> 00:00:47,000
[inaudible] just getting enormous in terms of what you can accomplish with 

13
00:00:47,000 --> 00:00:48,000
a N-log-in algorithm 

14
00:00:48,000 --> 00:00:50,000
really vastly 

15
00:00:50,000 --> 00:00:52,000
superior to what you're gonna get out of a N-squared algorithm. I said 

16
00:00:52,000 --> 00:00:56,000
well, N-log-I told you without proof that you're going to take on faith that I 

17
00:00:56,000 --> 00:00:57,000
wouldn't lie to you, 

18
00:00:57,000 --> 00:01:01,000
that that is the best we can do in terms of a comparison based sort, 

19
00:01:01,000 --> 00:01:04,000
but what we're going to look for is a way of maybe even improving on those 

20
00:01:04,000 --> 00:01:06,000
kinds of merge sort by kind of a 

21
00:01:06,000 --> 00:01:09,000
one thing we might want to do is avoid that copying of the data out and back that 

22
00:01:09,000 --> 00:01:11,000
merge sort does and 

23
00:01:11,000 --> 00:01:15,000
maybe have some lower cost in factors in the process, right that can bring our 

24
00:01:15,000 --> 00:01:17,000
times down even better. 

25
00:01:17,000 --> 00:01:20,000
So the sort we're looking at is quick sort. 

26
00:01:20,000 --> 00:01:22,000
And I think if you were gonna come up with an algorithm name, 

27
00:01:22,000 --> 00:01:25,000
naming it Quick Sort becomes good marketing here so it kind of inspires you to 

28
00:01:25,000 --> 00:01:29,000
believe actually that it's gonna live up to its name, is a divide-and-conquer algorithm 

29
00:01:29,000 --> 00:01:33,000
that takes a strategy like merge sort in the abstract which is the divide into 

30
00:01:33,000 --> 00:01:36,000
two pieces and [inaudible] sort those pieces and then join them back 

31
00:01:36,000 --> 00:01:37,000
together. 

32
00:01:37,000 --> 00:01:40,000
But whereas merge sort does an easy split hard join, 

33
00:01:40,000 --> 00:01:43,000
this one flips it on its head and said what if we did some work on the front step 

34
00:01:43,000 --> 00:01:45,000
in the splitting process 

35
00:01:45,000 --> 00:01:49,000
and then that may give us less to do on the joining step 

36
00:01:49,000 --> 00:01:53,000
and so the strategy for quick sort is to run to the pile of 

37
00:01:53,000 --> 00:01:55,000
papers, whatever we're trying to work at 

38
00:01:55,000 --> 00:01:57,000
in a partitioning step 

39
00:01:57,000 --> 00:01:58,000
is the first thing that it does 

40
00:01:58,000 --> 00:02:01,000
and that's the splitting step and that partitioning is designed to kind of move 

41
00:02:01,000 --> 00:02:04,000
stuff to the lower half and the upper half. 

42
00:02:04,000 --> 00:02:07,000
So having some idea of what the middle most value would be, 

43
00:02:07,000 --> 00:02:11,000
and then actually just walking through the pile of papers and kind of you know, 

44
00:02:11,000 --> 00:02:13,000
distributing to the left and right 

45
00:02:13,000 --> 00:02:17,000
portions based on their comparison to this middle-most element. 

46
00:02:17,000 --> 00:02:21,000
Once I have those two stacks - I've got A through M over here and N through Z over there, 

47
00:02:21,000 --> 00:02:23,000
that we're cursively sorting those 

48
00:02:23,000 --> 00:02:26,000
takes place, kind of assuming my delegates do their work 

49
00:02:26,000 --> 00:02:32,000
and in the process of re-joining them is really kind of simple. They kind of, you know, 

50
00:02:32,000 --> 00:02:37,000
joined together with actually no extra work, no looking, no process there. So the one 

51
00:02:37,000 --> 00:02:41,000
thing that actually is is where all the trickiness comes into is this idea of how we 

52
00:02:41,000 --> 00:02:44,000
do this partitioning. 

53
00:02:44,000 --> 00:02:47,000
So I was being a little big disingenuous when I said well if I had this stack of exam papers and 

54
00:02:47,000 --> 00:02:48,000
I know that 

55
00:02:48,000 --> 00:02:51,000
the names, you know range over the alphabet A through Z then I know where M is and I 

56
00:02:51,000 --> 00:02:54,000
can say well, that's a good mid pint around to divide. 

57
00:02:54,000 --> 00:02:57,000
Given that we want to make this work for kind of any sort of data, 

58
00:02:57,000 --> 00:03:01,000
we're not going to [inaudible] know, what's the minimal most value. If we have a bunch of 

59
00:03:01,000 --> 00:03:04,000
numbers, are they test scores, in which case maybe 50 is a good place to move. 

60
00:03:04,000 --> 00:03:06,000
Are they instead, you know, 

61
00:03:06,000 --> 00:03:09,000
population counts of states in which case millions would be a better 

62
00:03:09,000 --> 00:03:11,000
number to pick to kind of divide 

63
00:03:11,000 --> 00:03:12,000
them up. So 

64
00:03:12,000 --> 00:03:15,000
we have to come up with a strategy that's gonna work, you know, no matter 

65
00:03:15,000 --> 00:03:19,000
what the input that's somehow gonna pick something that's kind of reasonable 

66
00:03:19,000 --> 00:03:21,000
for how we're gonna do these divisions. There's 

67
00:03:21,000 --> 00:03:23,000
this idea of picking, called a pivot. 

68
00:03:23,000 --> 00:03:28,000
So given this, you know, region, this set of things to sort, 

69
00:03:28,000 --> 00:03:30,000
how do I know what's a reasonable pivot. 

70
00:03:30,000 --> 00:03:33,000
We're looking for something that's close to the median is actually ideal. 

71
00:03:33,000 --> 00:03:34,000
So if we could 

72
00:03:34,000 --> 00:03:37,000
walk through them and compute the median, that would be the best we could 

73
00:03:37,000 --> 00:03:38,000
do, we're 

74
00:03:38,000 --> 00:03:41,000
gonna try to get around to not doing that much work 

75
00:03:41,000 --> 00:03:45,000
about it, though. We're gonna actually try to kind of make an approximation and we're gonna make a really 

76
00:03:45,000 --> 00:03:47,000
very simplistic 

77
00:03:47,000 --> 00:03:50,000
choice about what my approximation would be. We're gonna come back and re-visit 

78
00:03:50,000 --> 00:03:51,000
this a little bit later. 

79
00:03:51,000 --> 00:03:56,000
But I want to say, ''Well, I just took the first paper off the stack. If they're in random order, 

80
00:03:56,000 --> 00:03:58,000
right. I look at the first one; I say Okay, it's, 

81
00:03:58,000 --> 00:04:00,000
you know, somebody king. 

82
00:04:00,000 --> 00:04:03,000
Well, everybody less than king goes over here. Everybody greater than king goes over 

83
00:04:03,000 --> 00:04:07,000
there. Now king may not be perfectly in the middle and we're gonna come back to see 

84
00:04:07,000 --> 00:04:09,000
what that will do to the analysis, 

85
00:04:09,000 --> 00:04:12,000
but at least we know it's in the range of the values we're looking at, right, and so it 

86
00:04:12,000 --> 00:04:14,000
at least did that for us. 

87
00:04:14,000 --> 00:04:17,000
And it means no matter what our data is, we can always use that. It's a strategy that will always work. Let's 

88
00:04:17,000 --> 00:04:18,000
just take the thing 

89
00:04:18,000 --> 00:04:20,000
that we first see 

90
00:04:20,000 --> 00:04:22,000
and use it to be the pivot and them 

91
00:04:22,000 --> 00:04:24,000
slot them left and right. 

92
00:04:24,000 --> 00:04:28,000
So let me go through looking at what the code actually does. 

93
00:04:28,000 --> 00:04:30,000
This is another of those cases where the 

94
00:04:30,000 --> 00:04:32,000
code for partition 

95
00:04:32,000 --> 00:04:35,000
is a little bit 

96
00:04:35,000 --> 00:04:36,000
frightening 

97
00:04:36,000 --> 00:04:40,000
and the analysis we're looking at is not actually to get really really worried 

98
00:04:40,000 --> 00:04:42,000
about the exact details of the less 

99
00:04:42,000 --> 00:04:45,000
than or the less than or equal to. I'm gonna talk you through what it does, 

100
00:04:45,000 --> 00:04:48,000
but the more important take-away point is what's the algorithm behind Quick Sort. What's 

101
00:04:48,000 --> 00:04:53,000
the strategy it's using to divide and conquer and how that's working out. 

102
00:04:53,000 --> 00:04:56,000
So the main Quick Sort algorithm looks like this. We're given this 

103
00:04:56,000 --> 00:04:59,000
vector of things we're trying to sort and we have a start index and a stop index 

104
00:04:59,000 --> 00:05:04,000
that identifies the sub region of the vector that we're currently trying to sort. I'm gonna 

105
00:05:04,000 --> 00:05:08,000
use this in subsequent calls to kind of identify what piece we're recursively 

106
00:05:08,000 --> 00:05:09,000
focusing on. 

107
00:05:09,000 --> 00:05:12,000
As long as there's a 

108
00:05:12,000 --> 00:05:15,000
positive difference between the start and the stop so there's at least 

109
00:05:15,000 --> 00:05:16,000
two elements to sort, 

110
00:05:16,000 --> 00:05:19,000
then we go through the process of doing the division and the sorting. So then 

111
00:05:19,000 --> 00:05:23,000
it makes the call to partition saying sub-region array 

112
00:05:23,000 --> 00:05:26,000
do the partition and we'll come back and talk about that code in a second and the thing we're trying to find 

113
00:05:26,000 --> 00:05:31,000
the partition is the index at which the pivot was placed. So after we did all the 

114
00:05:31,000 --> 00:05:31,000
work, 

115
00:05:31,000 --> 00:05:34,000
it moved everything less than the pivot to the left side of that, everything 

116
00:05:34,000 --> 00:05:36,000
greater than the pivot to the right side 

117
00:05:36,000 --> 00:05:37,000
and then the pivot will tell us 

118
00:05:37,000 --> 00:05:41,000
where the division is and then we make two recursive calls 

119
00:05:41,000 --> 00:05:43,000
to sort everything up to but not including pivot, 

120
00:05:43,000 --> 00:05:46,000
to sort everything past the pivot to the end, 

121
00:05:46,000 --> 00:05:49,000
and then those calls are operating on the array in place, so we're not copying it out and 

122
00:05:49,000 --> 00:05:54,000
copying it back. So in fact, there actually even isn't really any joint step here. 

123
00:05:54,000 --> 00:05:57,000
We said sort the four things on the left, sort the seven things on the right 

124
00:05:57,000 --> 00:06:00,000
and then the joining of them was where they're already where they need 

125
00:06:00,000 --> 00:06:02,000
to be so in fact we don't have anything to do in the 

126
00:06:02,000 --> 00:06:04,000
joined wall. 

127
00:06:04,000 --> 00:06:09,000
The tricky part is certainly in partition. The 

128
00:06:09,000 --> 00:06:12,000
version of the partition we're using here is one that kind of takes two indices, 

129
00:06:12,000 --> 00:06:14,000
two fingers, you might imagine 

130
00:06:14,000 --> 00:06:16,000
and it walks a 

131
00:06:16,000 --> 00:06:18,000
one end from 

132
00:06:18,000 --> 00:06:22,000
the stop position down and one from the start position over 

133
00:06:22,000 --> 00:06:26,000
and it's looking for elements that are on the wrong side. 

134
00:06:26,000 --> 00:06:30,000
So if we start with our right hand finger on the very end of the array, 

135
00:06:30,000 --> 00:06:31,000
that first loop in there 

136
00:06:31,000 --> 00:06:35,000
is designed to move the right hand down 

137
00:06:35,000 --> 00:06:38,000
looking for something that doesn't belong in left side, so we 

138
00:06:38,000 --> 00:06:41,000
identified 45 as the pivot value. It says 

139
00:06:41,000 --> 00:06:43,000
find something that's not 

140
00:06:43,000 --> 00:06:48,000
greater then 45. That's the first thing we found coming in from the right 

141
00:06:48,000 --> 00:06:50,000
downward that doesn't belong, 

142
00:06:50,000 --> 00:06:51,000
so only that first step. 

143
00:06:51,000 --> 00:06:54,000
So it turns out it takes it just one iteration to find that, 

144
00:06:54,000 --> 00:06:57,000
and so okay, this guy doesn't belong on the right-hand side. 

145
00:06:57,000 --> 00:07:00,000
Now it does the same thing on the inverse on the left-hand side. 

146
00:07:00,000 --> 00:07:03,000
Try to find something that doesn't belong on the left-hand side. 

147
00:07:03,000 --> 00:07:06,000
So the left hand moves up. In 

148
00:07:06,000 --> 00:07:08,000
this case, it took two iterations to get to that, 

149
00:07:08,000 --> 00:07:09,000
to this 92. 

150
00:07:09,000 --> 00:07:13,000
So now we have a perfect opportunity to fix both our problems 

151
00:07:13,000 --> 00:07:15,000
with one swap, 

152
00:07:15,000 --> 00:07:16,000
exchange what's 

153
00:07:16,000 --> 00:07:19,000
at the current position of left hand with right hand and now we will have 

154
00:07:19,000 --> 00:07:23,000
made both of our problems go away and we'll have kind of more things on the 

155
00:07:23,000 --> 00:07:24,000
left and the right that belong there 

156
00:07:24,000 --> 00:07:27,000
and then we can walk inward from there, you know, to find subsequent ones 

157
00:07:27,000 --> 00:07:29,000
that are out of order. 

158
00:07:29,000 --> 00:07:31,000
So those two get exchanged and now 

159
00:07:31,000 --> 00:07:33,000
right again 

160
00:07:33,000 --> 00:07:35,000
moves into find that 8, 

161
00:07:35,000 --> 00:07:37,000
left moves over to find that 74. 

162
00:07:37,000 --> 00:07:38,000
We swap again. 

163
00:07:38,000 --> 00:07:40,000
And as we just keep doing this, 

164
00:07:40,000 --> 00:07:41,000
right, until 

165
00:07:41,000 --> 00:07:43,000
they cross, and once they've cross, 

166
00:07:43,000 --> 00:07:47,000
right then we know that everything that the right scanned over is greater than the pivot, 

167
00:07:47,000 --> 00:07:48,000
everything 

168
00:07:48,000 --> 00:07:51,000
in the left was less than and 

169
00:07:51,000 --> 00:07:55,000
at that point, right we have divided it, right, everything that's small is kind 

170
00:07:55,000 --> 00:07:56,000
of in the left side of the 

171
00:07:56,000 --> 00:07:59,000
vector. Everything that's large in the right side. So I 

172
00:07:59,000 --> 00:08:03,000
swap these two and now I get to that place and I say okay, 

173
00:08:03,000 --> 00:08:05,000
so now big things here, small things there. 

174
00:08:05,000 --> 00:08:10,000
The last thing we do is we move the pivot into the place right between them and 

175
00:08:10,000 --> 00:08:13,000
know that it is exactly located right in the 

176
00:08:13,000 --> 00:08:15,000
slot where they cross. Now 

177
00:08:15,000 --> 00:08:17,000
I have two sub arrays to work on. 

178
00:08:17,000 --> 00:08:21,000
So the return from partition in this case will be the index four here and it 

179
00:08:21,000 --> 00:08:24,000
says Okay, go ahead and quick sort 

180
00:08:24,000 --> 00:08:25,000
this front-most part 

181
00:08:25,000 --> 00:08:28,000
and then come back and do the second part. 

182
00:08:28,000 --> 00:08:31,000
So in recursion, kind of think of that as being postponed and it's waiting; we'll get back to it in a minute, but 

183
00:08:31,000 --> 00:08:34,000
while we work on this step, we'll do this same thing. It's a 

184
00:08:34,000 --> 00:08:39,000
little big harder to see it in the small kind of what's going on, but in this case, right, it 

185
00:08:39,000 --> 00:08:44,000
divided that because the pivot was 8 into 8 and the three things greater than the pivot. In 

186
00:08:44,000 --> 00:08:48,000
this case, the 41 is the pivot so it's actually going to move 

187
00:08:48,000 --> 00:08:50,000
41 over there. It's going to have the left of that as it keeps 

188
00:08:50,000 --> 00:08:53,000
working its way down, it gets smaller and smaller and it's harder to see the workings of 

189
00:08:53,000 --> 00:08:56,000
the algorithm in such a small case of it 

190
00:08:56,000 --> 00:08:58,000
rearranging it, but 

191
00:08:58,000 --> 00:08:59,000
we'll get back to the big 

192
00:08:59,000 --> 00:09:01,000
left half in 

193
00:09:01,000 --> 00:09:02,000
a second. 

194
00:09:02,000 --> 00:09:05,000
And so now that that's all been sorted, 

195
00:09:05,000 --> 00:09:09,000
we're revisiting the side that's advancing above the pivot 

196
00:09:09,000 --> 00:09:13,000
to get these five guys, six guys in the order, taking the same 

197
00:09:13,000 --> 00:09:18,000
strategy. You pivot around 67 [inaudible] doing some swapping 

198
00:09:18,000 --> 00:09:28,000
[inaudible] and 

199
00:09:28,000 --> 00:09:30,000


200
00:09:30,000 --> 00:09:32,000
we'll see it in slightly bigger case to kind of get the idea, 

201
00:09:32,000 --> 00:09:35,000
but very quickly there's something very interesting about the way quick sort is working 

202
00:09:35,000 --> 00:09:38,000
is that that partition step 

203
00:09:38,000 --> 00:09:41,000
very quickly gets things kind of close to the right place. And 

204
00:09:41,000 --> 00:09:42,000
so now that we've 

205
00:09:42,000 --> 00:09:45,000
done both the left and the right 

206
00:09:45,000 --> 00:09:47,000
we're done. The whole thing is done. Everything kind of got moved into the 

207
00:09:47,000 --> 00:09:50,000
right position as part of the recursive, part of the sorting and doesn't need to be 

208
00:09:50,000 --> 00:09:52,000
further moved 

209
00:09:52,000 --> 00:09:54,000
to solve the whole problem. 

210
00:09:54,000 --> 00:09:57,000
That first step of the partition is doing a lot of the kind of throwing things kind 

211
00:09:57,000 --> 00:09:59,000
of left and right, 

212
00:09:59,000 --> 00:10:03,000
but it's actually quickly moving big things and small things that are out of 

213
00:10:03,000 --> 00:10:05,000
place closer to where they're gonna eventually need to go, and it turns 

214
00:10:05,000 --> 00:10:07,000
out to be a real advantage 

215
00:10:07,000 --> 00:10:11,000
in terms of the running time of this sort because it actually is doing 

216
00:10:11,000 --> 00:10:14,000
some quick movement close to where it needs to be and that kind of fixes it up in the recursive calls that 

217
00:10:14,000 --> 00:10:21,000
examine the smaller sub sections of the [inaudible]. Let 

218
00:10:21,000 --> 00:10:28,000
me show you what it sounds like, because that's really what you want to know. So 

219
00:10:28,000 --> 00:10:29,000
let's - 

220
00:10:29,000 --> 00:10:34,000
it's gonna kind of [inaudible]. So 

221
00:10:34,000 --> 00:10:37,000
you can see the big partitioning happening and 

222
00:10:37,000 --> 00:10:40,000
then it kind of jumbling things up and then coming back to revisit them, but you 

223
00:10:40,000 --> 00:10:44,000
notice that quite quickly 

224
00:10:44,000 --> 00:10:47,000
kind of all the small ones kind of got thrown to the left all. All the large elements to the right 

225
00:10:47,000 --> 00:10:51,000
and then the kind of process of coming back and so the big partition that you can see kind 

226
00:10:51,000 --> 00:10:53,000
of working across them and then kind of noodling down. 

227
00:10:53,000 --> 00:11:21,000
If I turn the sound on for it and I'll probably take it down a little bit. 

228
00:11:21,000 --> 00:11:25,000
So the big partition steps making a lot of noise, right and moving things kind of 

229
00:11:25,000 --> 00:11:28,000
quickly and it almost appears, you know, to hear this kind of random sort of it's hard to 

230
00:11:28,000 --> 00:11:30,000
identify what's going on during the big partition, 

231
00:11:30,000 --> 00:11:34,000
but then as you hear it make its way down the recursive tree it's 

232
00:11:34,000 --> 00:11:36,000
focusing on these smaller and smaller regions where the numbers have all been 

233
00:11:36,000 --> 00:11:39,000
gathered because they are similar in value. And 

234
00:11:39,000 --> 00:11:42,000
you hear a lot of noodling in the same pitch range 

235
00:11:42,000 --> 00:11:45,000
region as its working on the smaller and smaller sub arrays and then they definitely go 

236
00:11:45,000 --> 00:11:49,000
from low to high because of the recursion chooses the left side to operate on 

237
00:11:49,000 --> 00:11:50,000
before the right side 

238
00:11:50,000 --> 00:11:51,000
that you hear 

239
00:11:51,000 --> 00:12:00,000
the work being done in the lower tones before it gets to the finishing up 

240
00:12:00,000 --> 00:12:01,000
of the high tones. Kinda cool. Let us 

241
00:12:01,000 --> 00:12:02,000
come back here and talk about 

242
00:12:02,000 --> 00:12:09,000
how to analyze this guy a little bit. 

243
00:12:09,000 --> 00:12:09,000
So the main part 

244
00:12:09,000 --> 00:12:13,000
of the algorithm is very simple and deceptively simple 

245
00:12:13,000 --> 00:12:15,000
because all the hard work was actually done in partition. 

246
00:12:15,000 --> 00:12:17,000
The partition step is linear and 

247
00:12:17,000 --> 00:12:18,000
if you 

248
00:12:18,000 --> 00:12:21,000
can kind of just go along with me conceptually, you'll see that we're moving this 

249
00:12:21,000 --> 00:12:24,000
left finger in; we're moving this right finger in 

250
00:12:24,000 --> 00:12:27,000
and we stop when they cross, so that means every element was looked at. 

251
00:12:27,000 --> 00:12:28,000
Either 

252
00:12:28,000 --> 00:12:31,000
they both matched over here and they matched over here, but eventually they let you 

253
00:12:31,000 --> 00:12:33,000
look at every element as you worked your way in 

254
00:12:33,000 --> 00:12:37,000
and did some swaps along the way. And so that process is linear and the number of 

255
00:12:37,000 --> 00:12:39,000
elements. There's a thousand of them kind of has to 

256
00:12:39,000 --> 00:12:41,000
advance maybe 400 here and 600 there, 

257
00:12:41,000 --> 00:12:45,000
but we'll look at all the elements and the process of partitioning them to the 

258
00:12:45,000 --> 00:12:48,000
lower or upper halves. 

259
00:12:48,000 --> 00:12:50,000
Then it makes two calls, quick sort 

260
00:12:50,000 --> 00:12:54,000
to the left and the right portions of that. 

261
00:12:54,000 --> 00:12:56,000
Well, we don't know exactly what that division was, 

262
00:12:56,000 --> 00:13:00,000
was it even, was it a little lop sided and that's certainly going to come back to be 

263
00:13:00,000 --> 00:13:01,000
something we're gonna look at. 

264
00:13:01,000 --> 00:13:05,000
If we assume kind of this is the ideal 50/50 split 

265
00:13:05,000 --> 00:13:07,000
sort of in an ideal world 

266
00:13:07,000 --> 00:13:09,000
if that choice we had for the pivot 

267
00:13:09,000 --> 00:13:13,000
happened to be the median or close enough to the median that effectively we got an even 

268
00:13:13,000 --> 00:13:15,000
division, at every level of recursion, 

269
00:13:15,000 --> 00:13:19,000
then the recurrence relation for the whole process is the time required to 

270
00:13:19,000 --> 00:13:19,000
sort, an 

271
00:13:19,000 --> 00:13:20,000
input of N 

272
00:13:20,000 --> 00:13:22,000
is in to partition it 

273
00:13:22,000 --> 00:13:24,000
and then 2, 

274
00:13:24,000 --> 00:13:27,000
sorting two halves that are each 

275
00:13:27,000 --> 00:13:29,000
half again as big as the original input. 

276
00:13:29,000 --> 00:13:33,000
We saw that recurrence relationship for merge sort; we solved it down. 

277
00:13:33,000 --> 00:13:37,000
Think in terms of the tree, right you have the branching of three, branching of two, branching of two 

278
00:13:37,000 --> 00:13:41,000
and at each level the partitioning being done across each level and then it 

279
00:13:41,000 --> 00:13:47,000
going to a depth of login, right, the number of times you can [inaudible] by two [inaudible] 

280
00:13:47,000 --> 00:13:49,000
single case arrays that are easily handled. 

281
00:13:49,000 --> 00:13:53,000
So it is an N login sort in this 

282
00:13:53,000 --> 00:13:54,000
perfect best 

283
00:13:54,000 --> 00:13:55,000
case. 

284
00:13:55,000 --> 00:13:56,000
So that should mean that 

285
00:13:56,000 --> 00:13:59,000
in terms of Big O growth curves, right, 

286
00:13:59,000 --> 00:14:02,000
merge sort and quick sort should look about the same. 

287
00:14:02,000 --> 00:14:05,000
It does have sort of better factors, though. 

288
00:14:05,000 --> 00:14:06,000
Let me show you. 

289
00:14:06,000 --> 00:14:08,000
I go back here and I just run them against each other. 

290
00:14:08,000 --> 00:14:15,000
If I put quick sort versus merge sort here, stop 

291
00:14:15,000 --> 00:14:17,000
making noise, 

292
00:14:17,000 --> 00:14:19,000


293
00:14:19,000 --> 00:14:21,000
the quick sort 

294
00:14:21,000 --> 00:14:22,000
pretty handily 

295
00:14:22,000 --> 00:14:27,000
managed to beat merge sort in this case, merge sort was on the top, quick sort was underneath 

296
00:14:27,000 --> 00:14:29,000
and certainly in terms of what was going on it was doing, 

297
00:14:29,000 --> 00:14:32,000
looks like more comparisons right, to get everything there, that partition 

298
00:14:32,000 --> 00:14:33,000
step did a lot of comparison, 

299
00:14:33,000 --> 00:14:35,000
but fewer moves, 

300
00:14:35,000 --> 00:14:38,000
and that kind of actually does sort of make intuitive sense. You think that quick 

301
00:14:38,000 --> 00:14:41,000
sort very quickly moves stuff to where it goes and does a little bit of noodling. 

302
00:14:41,000 --> 00:14:44,000
Merge sort does a lot of moving things away and moving them back and kind of like 

303
00:14:44,000 --> 00:14:46,000
as you see it growing, 

304
00:14:46,000 --> 00:14:47,000
you'll see a lot of 

305
00:14:47,000 --> 00:14:50,000
large elements that get 

306
00:14:50,000 --> 00:14:53,000
placed up in the front, but then have to be moved again right, because that was 

307
00:14:53,000 --> 00:14:57,000
not their final resting place. And so merge sort does a lot of kind of movement away and back 

308
00:14:57,000 --> 00:15:00,000
that tends to cost it overall, right. You know, a 

309
00:15:00,000 --> 00:15:05,000
higher constant factor of the moves than something like quick sort 

310
00:15:05,000 --> 00:15:08,000
does where it more quickly gets to the right place and then doesn't move it 

311
00:15:08,000 --> 00:15:13,000
again. So 

312
00:15:13,000 --> 00:15:16,000
that was all well and good. That was assuming that everything went perfectly. 

313
00:15:16,000 --> 00:15:20,000
That was given our simplistic choice of the pivot, 

314
00:15:20,000 --> 00:15:24,000
you can imagine that's really assuming a lot right that went our way. 

315
00:15:24,000 --> 00:15:29,000
If we look at a particularly bad split, 

316
00:15:29,000 --> 00:15:32,000
let's imagine that the number we have to pick is pretty close to, 

317
00:15:32,000 --> 00:15:34,000
you know, one of the extremes. If I were 

318
00:15:34,000 --> 00:15:35,000
sorting papers by alphabet 

319
00:15:35,000 --> 00:15:38,000
and if the one on top happened to be a C, 

320
00:15:38,000 --> 00:15:40,000
right, well then I'm gonna be dividing it pretty lop sided, right, it's just gonna 

321
00:15:40,000 --> 00:15:44,000
be the As and the Bs on one side and then everything [inaudible] to the end on the other 

322
00:15:44,000 --> 00:15:45,000
side. 

323
00:15:45,000 --> 00:15:49,000
If I got a split that was about 90/10, ninety percent went on one side, ten 

324
00:15:49,000 --> 00:15:50,000
percent on the other, 

325
00:15:50,000 --> 00:15:52,000
you would expect right 

326
00:15:52,000 --> 00:15:56,000
for it to kind of change the running time of what's going on here, so I get 

327
00:15:56,000 --> 00:15:59,000
one tenth of them over here and the low half and maybe be nine-tenths in the right half. 

328
00:15:59,000 --> 00:16:03,000
Let's just say every time it always takes nine-tenths and moves it to the upper half, so I 

329
00:16:03,000 --> 00:16:05,000
kept picking something that's artificially close to the front. 

330
00:16:05,000 --> 00:16:07,000
These like will kind of peter out 

331
00:16:07,000 --> 00:16:11,000
fairly quickly diving N by 10 and 10 and 10 and 10, eventually these are gonna peter 

332
00:16:11,000 --> 00:16:14,000
out very soon, but this one arm over there 

333
00:16:14,000 --> 00:16:18,000
where I keep taking 90 percent of what remains. I had 90 percent, but I had 

334
00:16:18,000 --> 00:16:19,000
81 percent. 

335
00:16:19,000 --> 00:16:23,000
I keep multiplying by nine-tenths and I'm still kind of retaining a lot of 

336
00:16:23,000 --> 00:16:26,000
elements on this one big portion, 

337
00:16:26,000 --> 00:16:29,000
that the depth of this tree isn't gonna bottom out quite as quickly as it 

338
00:16:29,000 --> 00:16:33,000
used to, but it used to the number of times you could divide N by 2, the log based 

339
00:16:33,000 --> 00:16:35,000
2 of N was where we landed. 

340
00:16:35,000 --> 00:16:37,000
In this case, I have to 

341
00:16:37,000 --> 00:16:40,000
take N and multiply it by nine-tenths so 

342
00:16:40,000 --> 00:16:41,000
instead of one half, 

343
00:16:41,000 --> 00:16:43,000
right, to the K, it's nine tenths to the K 

344
00:16:43,000 --> 00:16:47,000
and it's like how many iterations how deep will this get before it gets 

345
00:16:47,000 --> 00:16:49,000
to the simple case 

346
00:16:49,000 --> 00:16:51,000
and so solving for 

347
00:16:51,000 --> 00:16:56,000
N equals ten-ninths to the K is taking the log-based ten-ninths of both sides, 

348
00:16:56,000 --> 00:17:02,000
then K, the number of levels is the log-based ten-ninths of them. We're kind of 

349
00:17:02,000 --> 00:17:05,000
relying on a fact of mathematics here though is that all algorithms are the 

350
00:17:05,000 --> 00:17:09,000
same value, so log-based A of N and log-based B of 

351
00:17:09,000 --> 00:17:12,000
N, they only differ by some constant that you can 

352
00:17:12,000 --> 00:17:15,000
compute if you just rearrange the terms around. So in effect 

353
00:17:15,000 --> 00:17:16,000
this means that 

354
00:17:16,000 --> 00:17:20,000
there is a constant in front of this. It's like a constant 

355
00:17:20,000 --> 00:17:23,000
difference, the ratio between ten-ninths and 2 

356
00:17:23,000 --> 00:17:27,000
that distinguishes these, but it still can be logged based 2 of N in 

357
00:17:27,000 --> 00:17:30,000
terms of Big O, right, where we can throw away those constants. 

358
00:17:30,000 --> 00:17:34,000
So even those this will perform, obviously in, you know, 

359
00:17:34,000 --> 00:17:38,000
experimentally you would expect that getting this kind of split would cause it 

360
00:17:38,000 --> 00:17:39,000
to 

361
00:17:39,000 --> 00:17:40,000
work more slowly than it would have 

362
00:17:40,000 --> 00:17:44,000
in a perfect split situation. The Big O [inaudible] is still gonna look in-log-in arrhythmic and 

363
00:17:44,000 --> 00:17:47,000
[inaudible]. So 

364
00:17:47,000 --> 00:17:50,000
that was pretty good to know. So it makes you feel a little bit confident, like sometimes it might be 

365
00:17:50,000 --> 00:17:54,000
getting an even split and sometimes it might be getting one-third, two-thirds, sometimes it might be getting 

366
00:17:54,000 --> 00:17:56,000
you know, nine-tenths, but if it was kind of 

367
00:17:56,000 --> 00:18:01,000
in the mix still dividing off some fraction is still doing okay. 

368
00:18:01,000 --> 00:18:07,000
Now, let's look at the really, really, really worst case split. The 

369
00:18:07,000 --> 00:18:10,000
really, really worst case split right, it didn't even take off 

370
00:18:10,000 --> 00:18:13,000
a fraction. It just managed to separate one, 

371
00:18:13,000 --> 00:18:15,000
but somehow the pivot here 

372
00:18:15,000 --> 00:18:16,000


373
00:18:16,000 --> 00:18:17,000
did a really 

374
00:18:17,000 --> 00:18:20,000
terribly crummy job, right, of dividing it at all 

375
00:18:20,000 --> 00:18:23,000
that in effect, all the elements were greater than the pivot or less than 

376
00:18:23,000 --> 00:18:27,000
the pivot since they all landed on one side and the pivot was kind of by itself. 

377
00:18:27,000 --> 00:18:29,000
So starting with an input of size N, 

378
00:18:29,000 --> 00:18:33,000
we sectioned off 1 and we had N minus 1 remaining. Well, the next time, 

379
00:18:33,000 --> 00:18:36,000
we sectioned off another one, so this happened again and again and again. 

380
00:18:36,000 --> 00:18:39,000
We just got really bad luck all the way through. 

381
00:18:39,000 --> 00:18:42,000
Each of these levels is only making progress for the base case at a snails 

382
00:18:42,000 --> 00:18:43,000
pace. 

383
00:18:43,000 --> 00:18:47,000
Right, one element is being subtracted each subsequent level 

384
00:18:47,000 --> 00:18:50,000
and that's gonna go to a depth of N. 

385
00:18:50,000 --> 00:18:53,000
And so now if we're doing N work across the levels, it isn't quite N 

386
00:18:53,000 --> 00:18:56,000
work because in fact, some of these bottom out. It's still the Gaussian sum in the 

387
00:18:56,000 --> 00:18:57,000
end 

388
00:18:57,000 --> 00:19:01,000
is that we are got N levels doing N work each. 

389
00:19:01,000 --> 00:19:03,000
We suddenly have 

390
00:19:03,000 --> 00:19:05,000
what was a linear arrhythmic algorithm 

391
00:19:05,000 --> 00:19:07,000
now performing quadritically. So 

392
00:19:07,000 --> 00:19:09,000
in the class the selection sort, 

393
00:19:09,000 --> 00:19:11,000
inscription sort 

394
00:19:11,000 --> 00:19:14,000
in its worst-case behavior. 

395
00:19:14,000 --> 00:19:17,000
So quite a range whereas merge sort is a very reliable performer, merge doesn't 

396
00:19:17,000 --> 00:19:18,000
care. 

397
00:19:18,000 --> 00:19:21,000
No matter what the order is, no matter what's going on, it's the same 

398
00:19:21,000 --> 00:19:22,000
amount of work, 

399
00:19:22,000 --> 00:19:26,000
it means that there's some inputs that can cause quick sort to vary between being linear 

400
00:19:26,000 --> 00:19:27,000
arrhythmic or being quadratic, 

401
00:19:27,000 --> 00:19:30,000
and there's a huge difference as we saw in our earlier charts about 

402
00:19:30,000 --> 00:19:32,000
how those things will run for 

403
00:19:32,000 --> 00:19:36,000
large values. So what 

404
00:19:36,000 --> 00:19:38,000
makes the worst case 

405
00:19:38,000 --> 00:19:42,000
- given how we're choosing a pivot right now is to take the first thing in the 

406
00:19:42,000 --> 00:19:44,000
sub section to look at as the pivot, 

407
00:19:44,000 --> 00:19:47,000
what's the example input that gives you this? Sorted. If 

408
00:19:47,000 --> 00:19:49,000
it's already sorted. 

409
00:19:49,000 --> 00:19:50,000
Isn't that a charmer? 

410
00:19:50,000 --> 00:19:54,000
Here's a sorting algorithm. If you ask it to do something 

411
00:19:54,000 --> 00:19:56,000
and in fact if you accidentally [inaudible] twice, you already had sorted 

412
00:19:56,000 --> 00:19:59,000
the data and you said, ''Oh, you did something,'' and you passed it back to the sort, 

413
00:19:59,000 --> 00:20:03,000
it would suddenly actually degenerate into its very worst case. It's already 

414
00:20:03,000 --> 00:20:05,000
sorted, so it would say, 

415
00:20:05,000 --> 00:20:06,000
''I've got this stack of exam papers.'' 

416
00:20:06,000 --> 00:20:09,000
I look at the first one and I say, ''Oh, look it's Adam's.'' And I say, ''Okay, well, let me 

417
00:20:09,000 --> 00:20:12,000
go find all the ones that belong in the left side.'' None of them do. I said, ''Okay, 

418
00:20:12,000 --> 00:20:15,000
well put Adam's over here.'' I'll [inaudible] sort that. That was easy. 

419
00:20:15,000 --> 00:20:18,000
Oh, let me look at what I've got left, oh with baker on the top. 

420
00:20:18,000 --> 00:20:21,000
Okay, well let me put Baker by itself, but could we find the other ones? Oh, no, 

421
00:20:21,000 --> 00:20:24,000
no more. It would just you 

422
00:20:24,000 --> 00:20:27,000
know, continue because I'm doing this thing, looking at all N, looking at all N minus 1, looking 

423
00:20:27,000 --> 00:20:31,000
at N minus 2, all the way down to the bottom making no, 

424
00:20:31,000 --> 00:20:35,000
recognizing nothing about how beautifully the data already was arranged 

425
00:20:35,000 --> 00:20:36,000
and 

426
00:20:36,000 --> 00:20:38,000
you know, getting its worst case. 

427
00:20:38,000 --> 00:20:42,000
There's actually a couple of others that come in, too. That's probably the most obvious one to think 

428
00:20:42,000 --> 00:20:46,000
of is it's coming in in increasing order. It's also true if its in decreasing order. 

429
00:20:46,000 --> 00:20:49,000
If I happen to have the largest value on the top, then 

430
00:20:49,000 --> 00:20:52,000
I'll end up splitting it all into, everything to the left side and nothing 

431
00:20:52,000 --> 00:20:53,000
to the right. 

432
00:20:53,000 --> 00:20:58,000
There are actually other variations that will also produce this. If at any given 

433
00:20:58,000 --> 00:21:01,000
stage that we're looking at a sub array, if the first value happens to be the extreme 

434
00:21:01,000 --> 00:21:04,000
of what remains whether it's the small to the large, and it could alternate, which would 

435
00:21:04,000 --> 00:21:06,000
be kind of a funny pattern, but if you 

436
00:21:06,000 --> 00:21:10,000
have the tallest and the shortest and the tallest and then another tallest and then a shortest, 

437
00:21:10,000 --> 00:21:13,000
so those would look a little bit harder to describe, but 

438
00:21:13,000 --> 00:21:18,000
there are some other alternatives, that product this bad result. 

439
00:21:18,000 --> 00:21:21,000
All of these we could call degenerate cases. 

440
00:21:21,000 --> 00:21:25,000
There's a small number of these relative to the N factorial [inaudible] your 

441
00:21:25,000 --> 00:21:28,000
data could come in. In fact, there was a huge number, and so there's lots and 

442
00:21:28,000 --> 00:21:29,000
lots of ways it could be arranged. 

443
00:21:29,000 --> 00:21:32,000
There's a very small number of them that would be arranged in such a way to 

444
00:21:32,000 --> 00:21:35,000
trigger this very, very worst-case behaviors. 

445
00:21:35,000 --> 00:21:38,000
Some are close to worst case, like if they were almost sorted, they might every now and 

446
00:21:38,000 --> 00:21:41,000
then get a good split, but mostly a bad split, 

447
00:21:41,000 --> 00:21:44,000
but the number that actually degenerate to the absolute worst case is actually 

448
00:21:44,000 --> 00:21:47,000
quite small, but 

449
00:21:47,000 --> 00:21:49,000
the truth is 

450
00:21:49,000 --> 00:21:51,000
is do we expect that those outputs are somehow 

451
00:21:51,000 --> 00:21:56,000
hard to imagine us getting. Are they so unusual 

452
00:21:56,000 --> 00:21:59,000
and weird that you might be willing to say it's okay that my algorithm has 

453
00:21:59,000 --> 00:22:01,000
this one or a couple of bad cases in it 

454
00:22:01,000 --> 00:22:03,000
because it never comes up. 

455
00:22:03,000 --> 00:22:06,000
In this case, the only thing that your data came in sorted or almost sorted or reverse sorted 

456
00:22:06,000 --> 00:22:09,000
is probably not unlikely. 

457
00:22:09,000 --> 00:22:11,000
It might be that you know, you happen to be reading from 

458
00:22:11,000 --> 00:22:15,000
a file on disc and somebody has taken the [inaudible] and sort it before they gave the 

459
00:22:15,000 --> 00:22:19,000
data to you. If all the sudden that caused your program to behave really badly, 

460
00:22:19,000 --> 00:22:20,000
that would really be 

461
00:22:20,000 --> 00:22:21,000
a 

462
00:22:21,000 --> 00:22:23,000
pretty questionable outcome. 

463
00:22:23,000 --> 00:22:27,000
So let's go aback and look at just to see, though. It's kind of interesting to see 

464
00:22:27,000 --> 00:22:30,000
it happening in 

465
00:22:30,000 --> 00:22:33,000
- this is against quick sort versus merge sort. 

466
00:22:33,000 --> 00:22:38,000
If I change the data to be partially sorted 

467
00:22:38,000 --> 00:22:41,000
and let it run again, 

468
00:22:41,000 --> 00:22:43,000
if I still manage to beat it 

469
00:22:43,000 --> 00:22:44,000
doing a little bit more, 

470
00:22:44,000 --> 00:22:47,000
if I change it to totally sorted 

471
00:22:47,000 --> 00:22:53,000
or reverse sorted, for that matter, and 

472
00:22:53,000 --> 00:22:54,000
so they look like they're doing a lot of work, 

473
00:22:54,000 --> 00:22:56,000
a lot of work about nothing. 

474
00:22:56,000 --> 00:22:57,000
And you can see that quick sort 

475
00:22:57,000 --> 00:22:59,000
really is still way back there. 

476
00:22:59,000 --> 00:23:02,000
It has looked at the first 10 percent or so and is doing a lot 

477
00:23:02,000 --> 00:23:04,000
of frantic partitioning 

478
00:23:04,000 --> 00:23:07,000
that never moves anything. 

479
00:23:07,000 --> 00:23:11,000
And visibly kind of traipsing it's way up there. Merge sort meanwhile is actually 

480
00:23:11,000 --> 00:23:13,000
on the beach in Jamaica with a 

481
00:23:13,000 --> 00:23:15,000
daiquiri 

482
00:23:15,000 --> 00:23:16,000
mocking quick sort 

483
00:23:16,000 --> 00:23:21,000
for all those times that it had said it was so much better than it was. 

484
00:23:21,000 --> 00:23:24,000
''Duh, it was already sorted; you Doofus.'' 

485
00:23:24,000 --> 00:23:25,000
Apparently it wasn't listening. Merge 

486
00:23:25,000 --> 00:23:28,000
sort almost looked like it did nothing and that's because it took the left half, 

487
00:23:28,000 --> 00:23:32,000
which is the smallest half, moved it off, right, kind of copied it back and it ends up copying 

488
00:23:32,000 --> 00:23:38,000
each element back to the same location it was already originally present in. There you go, quick sort 

489
00:23:38,000 --> 00:23:39,000
taking its time and it if I 

490
00:23:39,000 --> 00:23:42,000
ran it against, let's say, 

491
00:23:42,000 --> 00:23:46,000
insertion sort and selection sort, why not 

492
00:23:46,000 --> 00:23:50,000
[inaudible] nothing better to do than to sort for you all day long. 

493
00:23:50,000 --> 00:23:54,000
So there insertion sort actually finishing very early because it does nothing. It sort of 

494
00:23:54,000 --> 00:23:57,000
looked at them all and said they were already in the right order, 

495
00:23:57,000 --> 00:24:00,000
merge sort doing a little bit more work to move everything back and forth in the same place. 

496
00:24:00,000 --> 00:24:02,000
But 

497
00:24:02,000 --> 00:24:02,000


498
00:24:02,000 --> 00:24:04,000
in this case, selection sort 

499
00:24:04,000 --> 00:24:06,000
and 

500
00:24:06,000 --> 00:24:08,000
quick sort here at the bottom and selection sort here at the top 

501
00:24:08,000 --> 00:24:11,000
doing really roughly about the same work if you look at the sum total of the comparisons 

502
00:24:11,000 --> 00:24:13,000
and moves 

503
00:24:13,000 --> 00:24:16,000
and then time spent on those suddenly shows that yeah, in this input 

504
00:24:16,000 --> 00:24:19,000
situation quick sort is performing like an N-squared sort, 

505
00:24:19,000 --> 00:24:23,000
and obviously very similar constant factors to those present in selection sort, 

506
00:24:23,000 --> 00:24:24,000
so really 

507
00:24:24,000 --> 00:24:28,000
behaving totally quadratically in that worst-case input. If I make 

508
00:24:28,000 --> 00:24:38,000
it reverse sorted, it's a little more interesting because you can kind of see something going on. Everybody kind of doing their thing. 

509
00:24:38,000 --> 00:24:40,000
Insertion sort now kind of hitting its worst case, 

510
00:24:40,000 --> 00:24:43,000
so it actually coming in dead last because it 

511
00:24:43,000 --> 00:24:45,000
did so much extra moving, but 

512
00:24:45,000 --> 00:24:49,000
still showing the kind of quadratic terms, right, that selection 

513
00:24:49,000 --> 00:24:53,000
sort and quick sort are both having, but higher constant factors there, so taking almost 

514
00:24:53,000 --> 00:24:54,000
twice as long because 

515
00:24:54,000 --> 00:24:55,000
actually doing a little 

516
00:24:55,000 --> 00:25:02,000
bit extra work because of that inverted 

517
00:25:02,000 --> 00:25:04,000
situation. [Inaudible] something big, just cause. We'll put it back 

518
00:25:04,000 --> 00:25:10,000
into random, back into full speed and kind of watch some things happen. 

519
00:25:10,000 --> 00:25:11,000
So quick sort right, just 

520
00:25:11,000 --> 00:25:14,000
sort of done in a flash, merge sort finishing behind it 

521
00:25:14,000 --> 00:25:17,000
and ordinarily 

522
00:25:17,000 --> 00:25:20,000
the quadratic sorts taking quite a much 

523
00:25:20,000 --> 00:25:22,000
longer time than our two recursive sorts. But 

524
00:25:22,000 --> 00:25:30,000
it's the random input really buying quick sort its speed. Is 

525
00:25:30,000 --> 00:25:33,000
it gonna win - oh, come on. Oh, come on. Don't you want selection sort to come behind? I always wanted it to 

526
00:25:33,000 --> 00:25:37,000
come from behind. It's kind of like rooting for the underdog. 

527
00:25:37,000 --> 00:25:38,000
Yes. 

528
00:25:38,000 --> 00:25:42,000
And yet, just remember don't mock insertion sort. What it sorted is 

529
00:25:42,000 --> 00:25:46,000
the only one that recognizes it and does something clever about it. [Inaudible] 

530
00:25:46,000 --> 00:25:49,000
sort is anti-clever about it, so 

531
00:25:49,000 --> 00:25:55,000
it still can hold its head up and be proud. This 

532
00:25:55,000 --> 00:25:58,000
kind of [inaudible]. I was 

533
00:25:58,000 --> 00:26:01,000
thinking about like what algorithms, right, there's a reason why 

534
00:26:01,000 --> 00:26:04,000
there's four that we talked about and there's actually a dozen 

535
00:26:04,000 --> 00:26:07,000
more. It's like different situations actually produce different 

536
00:26:07,000 --> 00:26:09,000
outcomes for different 

537
00:26:09,000 --> 00:26:12,000
sorts and there are reasons that even though they might not be the best sort for all 

538
00:26:12,000 --> 00:26:15,000
purposes, there are some situations where it might be the right one, so if you had it, 

539
00:26:15,000 --> 00:26:18,000
as data said that you expect it to be mostly sorted, 

540
00:26:18,000 --> 00:26:19,000
but had a few [inaudible] in it. 

541
00:26:19,000 --> 00:26:22,000
Like if you had a pile or sand papers that were sorted and you dropped them on the floor and they 

542
00:26:22,000 --> 00:26:25,000
got scuffed up a little bit, but they're still largely sorted, insertion sort is the 

543
00:26:25,000 --> 00:26:26,000
way to 

544
00:26:26,000 --> 00:26:29,000
go. It will take advantage of the work that was already done 

545
00:26:29,000 --> 00:26:33,000
and not redo it or create kind of havoc the way quick sort 

546
00:26:33,000 --> 00:26:36,000
would. But quick sort, the last thing we need to tell you, though, about it is we can't 

547
00:26:36,000 --> 00:26:40,000
tolerate this. Like the way quick sort is in its classic form 

548
00:26:40,000 --> 00:26:41,000
with this 

549
00:26:41,000 --> 00:26:42,000
first element as pivot 

550
00:26:42,000 --> 00:26:45,000
would be an unacceptable way to do this algorithm. 

551
00:26:45,000 --> 00:26:49,000
So quick sort actually typically is one of the most standard element that's offered in 

552
00:26:49,000 --> 00:26:52,000
a library of programming languages, the sort element. 

553
00:26:52,000 --> 00:26:53,000
Well, it 

554
00:26:53,000 --> 00:26:57,000
has to have some strategy for how to deal with this in a way that does not 

555
00:26:57,000 --> 00:26:59,000
degenerate 

556
00:26:59,000 --> 00:27:02,000
and so the idea is, what you need to do is just pick a different 

557
00:27:02,000 --> 00:27:03,000
choice for the pivot, 

558
00:27:03,000 --> 00:27:06,000
a little bit more clever, spend a little bit more time, 

559
00:27:06,000 --> 00:27:09,000
do something that's a little less predictable than just picking that first 

560
00:27:09,000 --> 00:27:11,000
most element. 

561
00:27:11,000 --> 00:27:12,000
So to take 

562
00:27:12,000 --> 00:27:16,000
it to the far extreme, one thing you could do is just computer the median, analyze the data and compute the median. 

563
00:27:16,000 --> 00:27:19,000
It turns out there is a linear time algorithm for this that 

564
00:27:19,000 --> 00:27:22,000
would look at every element of the data set once and then be able to tell 

565
00:27:22,000 --> 00:27:23,000
you what the median was 

566
00:27:23,000 --> 00:27:25,000
and that would guarantee you that 50/50 split. 

567
00:27:25,000 --> 00:27:30,000
So if you go find it and use it at every stage, you will guarantee it. 

568
00:27:30,000 --> 00:27:34,000
Most algorithms don't tend to do that. That's actually kind of overkill for the problem. We 

569
00:27:34,000 --> 00:27:34,000
want to get it to where 

570
00:27:34,000 --> 00:27:38,000
it's pretty much guaranteed to never get the worst case. But we're not 

571
00:27:38,000 --> 00:27:41,000
concerned with it getting 50/50. It's got 60/40 or something close enough, that 

572
00:27:41,000 --> 00:27:45,000
actually, you know, 60/40, 70/30 and it was 

573
00:27:45,000 --> 00:27:47,000
bopping around in that range it'd be fine, 

574
00:27:47,000 --> 00:27:50,000
so the other two that are much more commonly used 

575
00:27:50,000 --> 00:27:53,000
is some kind of approximation of the median 

576
00:27:53,000 --> 00:27:56,000
with a little bit of guessing or something in it. For example, median of three 

577
00:27:56,000 --> 00:27:57,000
takes 

578
00:27:57,000 --> 00:28:01,000
three elements and typically it takes three from some specified position, it takes 

579
00:28:01,000 --> 00:28:02,000
the middle most element, 

580
00:28:02,000 --> 00:28:04,000
the last element and the front element and it says, 

581
00:28:04,000 --> 00:28:06,000
''Okay, given those three, 

582
00:28:06,000 --> 00:28:10,000
we arrange them to find out what's the middle of those three, and use that.'' 

583
00:28:10,000 --> 00:28:12,000
If the data was already sorted, it turns out you've got the median, right 

584
00:28:12,000 --> 00:28:14,000
because it wasn't the middle most. 

585
00:28:14,000 --> 00:28:17,000
If data was just in random position, then you've got one that you know, at least 

586
00:28:17,000 --> 00:28:22,000
there's one element on one side, right, and so the odds that that every single time 

587
00:28:22,000 --> 00:28:25,000
would produce a very bad split is pretty low. 

588
00:28:25,000 --> 00:28:28,000
There are some inputs that could kind of get it to generate a little bit, but it's 

589
00:28:28,000 --> 00:28:32,000
pretty foolproof in most ordinary situations. 

590
00:28:32,000 --> 00:28:36,000
Even more unpredictable would be just choose a random element. 

591
00:28:36,000 --> 00:28:39,000
So look at the start to stop index you have, pick an random there, 

592
00:28:39,000 --> 00:28:44,000
flop in to the front and now use it as the pivot element. 

593
00:28:44,000 --> 00:28:45,000
If you 

594
00:28:45,000 --> 00:28:47,000
don't know ahead of time how your random number [inaudible] it's impossible 

595
00:28:47,000 --> 00:28:51,000
to generate an input and force it into the worst-case behavior. And so the idea is that 

596
00:28:51,000 --> 00:28:53,000
you're kind of counting on randomness 

597
00:28:53,000 --> 00:28:56,000
and just the probabilistic outcome if 

598
00:28:56,000 --> 00:29:00,000
it managing to be such that the way it shows the random element was to pick the 

599
00:29:00,000 --> 00:29:02,000
extreme and everything, it is just impossible, 

600
00:29:02,000 --> 00:29:04,000
the odds are astronomically against it, 

601
00:29:04,000 --> 00:29:06,000
and so it will, 

602
00:29:06,000 --> 00:29:08,000
a very simple fix, right 

603
00:29:08,000 --> 00:29:09,000


604
00:29:09,000 --> 00:29:12,000
that still leaves the possibility of the worst case in there, 

605
00:29:12,000 --> 00:29:17,000
but you know, much, much, much far removed probability sense. So, 

606
00:29:17,000 --> 00:29:20,000
just simple things, right, but from there the algorithm operates the 

607
00:29:20,000 --> 00:29:22,000
same way it always has which is to say, 

608
00:29:22,000 --> 00:29:24,000
''Pick the median, however, you want to do it, 

609
00:29:24,000 --> 00:29:27,000
move it into that front slot and then carry on as before in terms of the left 

610
00:29:27,000 --> 00:29:30,000
and the right hand and moving down and swapping and recursing and all that [inaudible].'' 

611
00:29:30,000 --> 00:29:34,000
Any 

612
00:29:34,000 --> 00:29:40,000
questions; anything about sorting, sorting algorithms? Why don't we 

613
00:29:40,000 --> 00:29:42,000
do some coding actually, 

614
00:29:42,000 --> 00:29:44,000
which is the next thing I want to show you 

615
00:29:44,000 --> 00:29:46,000
because once you know how to write sorts, 

616
00:29:46,000 --> 00:29:47,000


617
00:29:47,000 --> 00:29:50,000
sort of the other piece I think you need to have to go with it is how would I write 

618
00:29:50,000 --> 00:29:54,000
a sort to be generally useful in a large variety of situations that 

619
00:29:54,000 --> 00:29:57,000
knowing about these sorts I might decide to build 

620
00:29:57,000 --> 00:30:01,000
myself a really general purpose, good, high performance, quick sort, but I 

621
00:30:01,000 --> 00:30:03,000
could use again, and again and again. 

622
00:30:03,000 --> 00:30:07,000
I want to make it work generically so I could sort strings, I could sort 

623
00:30:07,000 --> 00:30:10,000
numbers, or I could sort students, or I could sort 

624
00:30:10,000 --> 00:30:11,000
vectors of students or whatever, 

625
00:30:11,000 --> 00:30:14,000
and I'd want to make sure that no matter what I needed to do it was gonna 

626
00:30:14,000 --> 00:30:17,000
solve all my problems, this one kid of sorting tool, 

627
00:30:17,000 --> 00:30:20,000
and we need to know a little bit of C++ to make that work 

628
00:30:20,000 --> 00:30:22,000
by use of the function template. 

629
00:30:22,000 --> 00:30:27,000
So if you want to ask about sorting algorithms, now is the time. [Inaudible]. We don't 

630
00:30:27,000 --> 00:30:28,000
bubble sort, 

631
00:30:28,000 --> 00:30:31,000
which is kind of interesting. It will come up actually in the assignment 

632
00:30:31,000 --> 00:30:34,000
I give you next week because it is a little bit of an historical artifact 

633
00:30:34,000 --> 00:30:38,000
to know about it, but it turns out bubble sort is one of those sorts that 

634
00:30:38,000 --> 00:30:43,000
is dominated by pretty much every other sort you'll know about in every way, as there's 

635
00:30:43,000 --> 00:30:45,000
really nothing to recommend it. As I said, each of these four have something 

636
00:30:45,000 --> 00:30:49,000
about them that actually has a strength or whatever. Bubble sort really doesn't. 

637
00:30:49,000 --> 00:30:52,000
It's a little bit harder to write than insertion sort. It's a little bit slower than insertion 

638
00:30:52,000 --> 00:30:56,000
sort; it's a little bit easier to get it wrong than insertion sort. It has higher constant 

639
00:30:56,000 --> 00:30:57,000
factors. You know, 

640
00:30:57,000 --> 00:31:01,000
it does recognize the data and sort order, but so does insertion sort, so it's hard to come up with 

641
00:31:01,000 --> 00:31:04,000
a reason to actually spend a lot of time on it, but I will expose you to it in the 

642
00:31:04,000 --> 00:31:07,000
assignment because I think it's a little bit of a - 

643
00:31:07,000 --> 00:31:08,000
it's part of 

644
00:31:08,000 --> 00:31:10,000
our history right, as a computer science 

645
00:31:10,000 --> 00:31:14,000
student to be expose to. 

646
00:31:14,000 --> 00:31:16,000
Anything else? I'm gonna show 

647
00:31:16,000 --> 00:31:19,000
you some things about function templates. Oh, these 

648
00:31:19,000 --> 00:31:21,000
are some numbers; I forgot to give that. 

649
00:31:21,000 --> 00:31:23,000
Just to say, in the end, right, 

650
00:31:23,000 --> 00:31:26,000
which we saw a little bit in the analysis that the constant factors on quick 

651
00:31:26,000 --> 00:31:29,000
sort are 

652
00:31:29,000 --> 00:31:33,000
noticeable when compared to merge sort by a factor of 4, 

653
00:31:33,000 --> 00:31:35,000
moving things more quickly and not having to 

654
00:31:35,000 --> 00:31:39,000
mess with them again. Quick 

655
00:31:39,000 --> 00:31:41,000
sort [inaudible] 

656
00:31:41,000 --> 00:31:44,000
in totally random order? 

657
00:31:44,000 --> 00:31:48,000
Yes, they are. Yes - So it doesn't have them taking - So this is actually using a classic quick 

658
00:31:48,000 --> 00:31:52,000
sort with no degenerate protection on random data. 

659
00:31:52,000 --> 00:31:55,000
If I had put it in, sorted it in on that case, you would definitely see 

660
00:31:55,000 --> 00:31:58,000
like numbers comparable to like selection sorts, eight hours in that 

661
00:31:58,000 --> 00:32:00,000
last slot, right. Or 

662
00:32:00,000 --> 00:32:01,000
you 

663
00:32:01,000 --> 00:32:03,000
[inaudible] degenerate protection, and it would probably 

664
00:32:03,000 --> 00:32:07,000
have an in the noise a small slow down for that little extra work that's being 

665
00:32:07,000 --> 00:32:11,000
done, but then you'll be getting in log and performance reliable across 

666
00:32:11,000 --> 00:32:13,000
all states of inputs. So 

667
00:32:13,000 --> 00:32:15,000
[inaudible] protection as it 

668
00:32:15,000 --> 00:32:19,000
starts to even out time wise with merge sort, does it still - No, it doesn't. 

669
00:32:19,000 --> 00:32:23,000
It actually is an almost imperceptible change in the time because it depends on 

670
00:32:23,000 --> 00:32:26,000
which form of it you use because if you use, for example the random or median of three, 

671
00:32:26,000 --> 00:32:31,000
the amount of work you're doing is about two more things per level in the tree and 

672
00:32:31,000 --> 00:32:34,000
so it turns out that, yeah, 

673
00:32:34,000 --> 00:32:37,000
over the space of login it's just not enough to even be noticeable. 

674
00:32:37,000 --> 00:32:41,000
Now if you did the full median algorithm, you could actually start to see it because then 

675
00:32:41,000 --> 00:32:45,000
you'd see both a linear partition step and a linear median step and that 

676
00:32:45,000 --> 00:32:47,000
would actually raise the cause and factors where it would probably be closer in line to 

677
00:32:47,000 --> 00:32:49,000
merge sort. Pretty 

678
00:32:49,000 --> 00:32:58,000
much nobody writes it that way is the truth, but you could kind of in theory is why we [inaudible]. What I'm 

679
00:32:58,000 --> 00:32:59,000
going to show you, kind 

680
00:32:59,000 --> 00:33:02,000
of motivate this by starting with something simple, and then we're gonna move towards kind of building 

681
00:33:02,000 --> 00:33:04,000
the 

682
00:33:04,000 --> 00:33:05,000
fully generic, 

683
00:33:05,000 --> 00:33:06,000


684
00:33:06,000 --> 00:33:08,000
you know, one 

685
00:33:08,000 --> 00:33:10,000
algorithm does it all, 

686
00:33:10,000 --> 00:33:12,000
sorting function. 

687
00:33:12,000 --> 00:33:16,000
So what I'm gonna first look at is flop because it's a little bit easier to see it in this case, 

688
00:33:16,000 --> 00:33:19,000
is that you know, sometimes you need to take two variables and exchange their values. I mean 

689
00:33:19,000 --> 00:33:23,000
you need to go through a temporary to copy one and then copy the other back and what 

690
00:33:23,000 --> 00:33:26,000
not - swapping characters, swapping ends, swapping strings, swapping students, you 

691
00:33:26,000 --> 00:33:28,000
know, any kind of variables you wanted to swap, 

692
00:33:28,000 --> 00:33:29,000
you could write 

693
00:33:29,000 --> 00:33:31,000
a specific swap function 

694
00:33:31,000 --> 00:33:35,000
that takes two variables by reference of the integer, strain or character type that 

695
00:33:35,000 --> 00:33:36,000
you're trying to exchange 

696
00:33:36,000 --> 00:33:40,000
and then copies one to a temporary and exchanges the two values. 

697
00:33:40,000 --> 00:33:43,000
Because of the way C++ functions work, right, 

698
00:33:43,000 --> 00:33:46,000
you really do need to say, ''It's a string, this is a string, you know, what's being 

699
00:33:46,000 --> 00:33:49,000
declared here is a string,'' and so you might say, ''Well, if I need more than one swap, sometimes that swaps strings and characters, you know, my choices 

700
00:33:49,000 --> 00:33:52,000
are 

701
00:33:52,000 --> 00:33:53,000
basically 

702
00:33:53,000 --> 00:33:58,000
to duplicate and copy and paste and so on. 

703
00:33:58,000 --> 00:34:02,000
That's not good; we'd rather not do that. 

704
00:34:02,000 --> 00:34:04,000
But what I want to do is I want to 

705
00:34:04,000 --> 00:34:08,000
discuss what it takes to write something in template form. 

706
00:34:08,000 --> 00:34:12,000
We have been using templates right, the vector is a template, the set is a 

707
00:34:12,000 --> 00:34:15,000
template, the stack and queue and all these things are templates 

708
00:34:15,000 --> 00:34:17,000
that are written 

709
00:34:17,000 --> 00:34:20,000
in a very generic way using a place 

710
00:34:20,000 --> 00:34:24,000
holder, so the code that holds onto your collection of things is written 

711
00:34:24,000 --> 00:34:27,000
saying, ''I don't know what the thing is; is it a string; is it an N; is it a car?'' Well, 

712
00:34:27,000 --> 00:34:31,000
vectors just hold onto things and you as a client can decide what you're gonna 

713
00:34:31,000 --> 00:34:33,000
be storing in this particular kind of vector. 

714
00:34:33,000 --> 00:34:37,000
And so what we're gonna see is if I take those flat functions and I try to 

715
00:34:37,000 --> 00:34:41,000
distill them down and say, ''Well, what is it that any swap routine looks like?'' 

716
00:34:41,000 --> 00:34:43,000
It needs to take two variables by reference, it needs 

717
00:34:43,000 --> 00:34:46,000
to declare a template of that type and it needs to do the assignment all around 

718
00:34:46,000 --> 00:34:47,000


719
00:34:47,000 --> 00:34:49,000
to exchange those values. 

720
00:34:49,000 --> 00:34:51,000
Can I write a version 

721
00:34:51,000 --> 00:34:55,000
that is tight unspecified leaving a placeholder in there for 

722
00:34:55,000 --> 00:34:59,000
what - 

723
00:34:59,000 --> 00:35:00,000
that's really 

724
00:35:00,000 --> 00:35:01,000
kind of amazing, 

725
00:35:01,000 --> 00:35:03,000
that what we'Q1: gonna be swapping 

726
00:35:03,000 --> 00:35:05,000
and then 

727
00:35:05,000 --> 00:35:10,000
let there be multiple swaps generated from that one pattern. 

728
00:35:10,000 --> 00:35:11,000
So we're gonna do this actually in 

729
00:35:11,000 --> 00:35:14,000
the compiler because it's always 

730
00:35:14,000 --> 00:35:16,000
nice to see a little bit a code happening. So I 

731
00:35:16,000 --> 00:35:19,000
go over here 

732
00:35:19,000 --> 00:35:20,000
and 

733
00:35:20,000 --> 00:35:24,000
I got some code up there that 

734
00:35:24,000 --> 00:35:26,000
I'm just currently gonna [inaudible] because I don't want to deal 

735
00:35:26,000 --> 00:35:30,000
with it right now. We're gonna see it in a second. It doesn't [inaudible]. Okay, 

736
00:35:30,000 --> 00:35:41,000
so your usual swap looks like this. 

737
00:35:41,000 --> 00:35:44,000
And that would only swap exactly integers. If I wanted characters I'd 

738
00:35:44,000 --> 00:35:49,000
change it all around. So what I'm gonna change it to is a template. I'm gonna add a template 

739
00:35:49,000 --> 00:35:50,000
header at the top 

740
00:35:50,000 --> 00:35:53,000
and so the template header starts with a key-word template and then angle brackets that 

741
00:35:53,000 --> 00:35:54,000
says, 

742
00:35:54,000 --> 00:35:59,000
''What kind of placeholders does this template depend on?'' It depends on one type name 

743
00:35:59,000 --> 00:36:00,000
and then I've chose then name 

744
00:36:00,000 --> 00:36:02,000
capital T, type for it. 

745
00:36:02,000 --> 00:36:04,000
That's a choice that I'm going to make and it says 

746
00:36:04,000 --> 00:36:07,000
in the body of the function that's coming up 

747
00:36:07,000 --> 00:36:08,000
where I would have 

748
00:36:08,000 --> 00:36:11,000
ordinarily fully committed on the type, I'm gonna use this placeholder 

749
00:36:11,000 --> 00:36:13,000
capital T, type 

750
00:36:13,000 --> 00:36:22,000
instead. 

751
00:36:22,000 --> 00:36:25,000
And now I have written swap in such a way that it doesn't say for sure 

752
00:36:25,000 --> 00:36:29,000
what's being exchanged, two strings, two Ns, two doubles. They're all 

753
00:36:29,000 --> 00:36:33,000
have to be matching in this form, and I said, ''Well, there's a placeholder, and the placeholder 

754
00:36:33,000 --> 00:36:36,000
was gonna be filled in by the client who used this flop, 

755
00:36:36,000 --> 00:36:41,000
and on demand, we can instantiate as many versions of the swap function as we need.'' 

756
00:36:41,000 --> 00:36:45,000
If you need to swap integers, you can instantiate a swap that then fills in the 

757
00:36:45,000 --> 00:36:48,000
placeholder type with Ns all the way through 

758
00:36:48,000 --> 00:36:51,000
and then I can use that same template or pattern to generate one that will swap 

759
00:36:51,000 --> 00:36:53,000
characters or swap strings or swap whatever. 

760
00:36:53,000 --> 00:37:02,000
So if I go down to my main, maybe I'll just pull my main up [inaudible]. And 

761
00:37:02,000 --> 00:37:05,000
I will show you how I can do this. I can say 

762
00:37:05,000 --> 00:37:08,000
int 1 equals 

763
00:37:08,000 --> 00:37:13,000
54, 2 equals 32 and I say swap 1-2. 

764
00:37:13,000 --> 00:37:17,000
So if the usage of a function is a little bit different 

765
00:37:17,000 --> 00:37:19,000
than it was for 

766
00:37:19,000 --> 00:37:21,000
class templates, 

767
00:37:21,000 --> 00:37:25,000
in class templates we always did this thing where I said 

768
00:37:25,000 --> 00:37:29,000
the name of the template and then it angled back, as I said. And in particular I want 

769
00:37:29,000 --> 00:37:33,000
the swap function that operates where the template parameter has been filled in 

770
00:37:33,000 --> 00:37:35,000
with int 

771
00:37:35,000 --> 00:37:38,000
that in the case of functions, it turns out it's a little bit easier for the compiler 

772
00:37:38,000 --> 00:37:39,000
to know what you intended 

773
00:37:39,000 --> 00:37:42,000
that on the basis of what my arguments are to this, 

774
00:37:42,000 --> 00:37:45,000
is I called swap passing two integers, 

775
00:37:45,000 --> 00:37:49,000
it knows that there's only one possibility for what version of swap you were interested 

776
00:37:49,000 --> 00:37:50,000
in, 

777
00:37:50,000 --> 00:37:54,000
that it must be the one where type has been filled in with int and that's 

778
00:37:54,000 --> 00:37:56,000
the one to generate for you. 

779
00:37:56,000 --> 00:37:59,000
So you can use that long form of saying swap, angle bracket int, 

780
00:37:59,000 --> 00:38:03,000
but typically you will not; you will let the compiler infer it for you on the usage, 

781
00:38:03,000 --> 00:38:05,000
so based on the arguments you passed, 

782
00:38:05,000 --> 00:38:10,000
it will figure out how to kind of match them to what swap said it was taking and 

783
00:38:10,000 --> 00:38:13,000
if it can't match them, for example, if you pass one double and one int, you'll get compiler 

784
00:38:13,000 --> 00:38:14,000
errors. 

785
00:38:14,000 --> 00:38:19,000
But if I've done it correctly, then it will generate for me a swap 

786
00:38:19,000 --> 00:38:23,000
where type's been filled in with int and if I call it again, 

787
00:38:23,000 --> 00:38:26,000
passing different kinds of things, so having 

788
00:38:26,000 --> 00:38:29,000
string S equals hello and T 

789
00:38:29,000 --> 00:38:31,000
equals 

790
00:38:31,000 --> 00:38:33,000
world, 

791
00:38:33,000 --> 00:38:37,000
that I can write swap S and T and now 

792
00:38:37,000 --> 00:38:40,000
the compiler generated for me two different versions of swap, right, 

793
00:38:40,000 --> 00:38:43,000
one for integers, one for 

794
00:38:43,000 --> 00:38:46,000
strings on the [inaudible] and I didn't do anything with them. I put them [inaudible], so nothing to see there, 

795
00:38:46,000 --> 00:38:47,000
but showing 

796
00:38:47,000 --> 00:38:50,000
it does compile and build up. That's 

797
00:38:50,000 --> 00:38:53,000
a pretty neat idea. 

798
00:38:53,000 --> 00:38:55,000
Kind of important because it turns out there are 

799
00:38:55,000 --> 00:38:59,000
a lot of pieces of code that you're gonna find yourself wanting to write that don't 

800
00:38:59,000 --> 00:39:02,000
depend, in a case like swap, swap doesn't care what it's exchanging, that they're Ns, that 

801
00:39:02,000 --> 00:39:06,000
they're strings, that they're doubles. The way you swap two things 

802
00:39:06,000 --> 00:39:09,000
is the same no matter what they are. You copy one aside; you overwrite one; 

803
00:39:09,000 --> 00:39:11,000
you overwrite the other 

804
00:39:11,000 --> 00:39:14,000
and the same thing applies to much more algorithmically interesting problems 

805
00:39:14,000 --> 00:39:15,000
like 

806
00:39:15,000 --> 00:39:19,000
searching, like sorting, like removing duplicates, like printing, 

807
00:39:19,000 --> 00:39:23,000
where you want to take a vector and print all of its contents right that 

808
00:39:23,000 --> 00:39:24,000
the 

809
00:39:24,000 --> 00:39:27,000
steps you need to do to print a vector of events looks just like the steps you need to do to print a vector 

810
00:39:27,000 --> 00:39:31,000
of strings. And so it would be very annoying every time I need to do that to 

811
00:39:31,000 --> 00:39:33,000
duplicate that code that there's a real appeal toward 

812
00:39:33,000 --> 00:39:35,000
writing it once, as a template 

813
00:39:35,000 --> 00:39:37,000
and using it 

814
00:39:37,000 --> 00:39:41,000
in multiple situations. So 

815
00:39:41,000 --> 00:39:44,000
this shows that if I swap, right, 

816
00:39:44,000 --> 00:39:47,000
it infers what I meant 

817
00:39:47,000 --> 00:39:50,000
and then this thing basically just shows what happened is when I said 

818
00:39:50,000 --> 00:39:53,000
int equals four, 

819
00:39:53,000 --> 00:39:56,000
[inaudible] use that pattern to generate the swap int 

820
00:39:56,000 --> 00:39:59,000
where the type parameter, that placeholder has been 

821
00:39:59,000 --> 00:40:01,000
filled in 

822
00:40:01,000 --> 00:40:05,000
and established that it's int for this particular version which is distinct from 

823
00:40:05,000 --> 00:40:10,000
the swap for characters and swap for strings and what not. So 

824
00:40:10,000 --> 00:40:13,000
let me show you that version I talked about print, and this one I'm 

825
00:40:13,000 --> 00:40:15,000
gonna go back to the 

826
00:40:15,000 --> 00:40:16,000
compiler for just a second. 

827
00:40:16,000 --> 00:40:19,000
Let's say I want to print a vector, printing a vector it like iterating all 

828
00:40:19,000 --> 00:40:22,000
the members and using the 

829
00:40:22,000 --> 00:40:23,000
screen insertion to print them out 

830
00:40:23,000 --> 00:40:26,000
and so here it is taking some vector 

831
00:40:26,000 --> 00:40:29,000
where the elements in it are of this type name. In this case, I've just used T as 

832
00:40:29,000 --> 00:40:31,000
my short name here. 

833
00:40:31,000 --> 00:40:31,000
The iterator 

834
00:40:31,000 --> 00:40:35,000
looks the same; the bracketing looks the same. I see the end now and I'd 

835
00:40:35,000 --> 00:40:38,000
like to be able to print vectors of strings, vectors of ints, vector of 

836
00:40:38,000 --> 00:40:39,000
doubles 

837
00:40:39,000 --> 00:40:43,000
with this one piece of code. So if I call print vector and I pas code vector events, 

838
00:40:43,000 --> 00:40:43,000
all's good; 

839
00:40:43,000 --> 00:40:47,000
vector doubles, all's good; vector strings, all's good. 

840
00:40:47,000 --> 00:40:50,000
Here's where I could get a little bit into trouble here. 

841
00:40:50,000 --> 00:40:52,000
I've made this [inaudible] chord that has an X 

842
00:40:52,000 --> 00:40:57,000
and a Y field. I make a vector that contains these chords. 

843
00:40:57,000 --> 00:41:00,000
And then I try to call print vector of C. 

844
00:41:00,000 --> 00:41:03,000
So when I do this, right, 

845
00:41:03,000 --> 00:41:06,000
it will instantiate a versive print vector 

846
00:41:06,000 --> 00:41:11,000
where the T has been matched up to, oh it's chords that are in there; you've got a vector 

847
00:41:11,000 --> 00:41:12,000
of chords. 

848
00:41:12,000 --> 00:41:15,000
And so okay, we'll go through and iterate over the sides of the vector 

849
00:41:15,000 --> 00:41:18,000
and this line is trying to say see out 

850
00:41:18,000 --> 00:41:21,000
of a chord type. And 

851
00:41:21,000 --> 00:41:22,000
struts 

852
00:41:22,000 --> 00:41:26,000
don't automatically know how to out put themselves onto a string, 

853
00:41:26,000 --> 00:41:30,000
and so at this point when I try to instantiate, so the code in print vector is 

854
00:41:30,000 --> 00:41:33,000
actually fine and works for a large number of types, all the primitive types 

855
00:41:33,000 --> 00:41:35,000
will be fine, 

856
00:41:35,000 --> 00:41:36,000
string and things like that are fine. 

857
00:41:36,000 --> 00:41:40,000
But if I try to use it for a type for which it doesn't have this output 

858
00:41:40,000 --> 00:41:43,000
behavior, right I will get a compiler error, 

859
00:41:43,000 --> 00:41:46,000
and it may come as a surprise because a print vector appeared to be working fine in all the other 

860
00:41:46,000 --> 00:41:49,000
situations and all the sudden it seems like a new error has cropped up 

861
00:41:49,000 --> 00:41:51,000
but that error came from 

862
00:41:51,000 --> 00:41:54,000
the instantiation of a particular version in this case, that suddenly ran 

863
00:41:54,000 --> 00:41:58,000
afoul of what the template expected of the type. 

864
00:41:58,000 --> 00:42:02,000
The message you get, let's say in X code looks like this, no match for 

865
00:42:02,000 --> 00:42:08,000
operator, less than, less than in standard CL, vector element, operator [inaudible]. 

866
00:42:08,000 --> 00:42:12,000
So I'm trying to give you a little clue, but in its very cryptic C++ way of what 

867
00:42:12,000 --> 00:42:15,000
you've done wrong. 

868
00:42:15,000 --> 00:42:18,000
So it comes back to what the template has to be a little bit clear about 

869
00:42:18,000 --> 00:42:20,000
from a client point of view is to say well 

870
00:42:20,000 --> 00:42:24,000
what is it that needs to be true. Will it really work for all types or is there 

871
00:42:24,000 --> 00:42:27,000
something special about the kind of things that actually need to be true 

872
00:42:27,000 --> 00:42:29,000
about what's filled in there with a placeholder to make the rest of the code 

873
00:42:29,000 --> 00:42:31,000
work correctly. 

874
00:42:31,000 --> 00:42:33,000
So if it uses string insertion 

875
00:42:33,000 --> 00:42:36,000
or it compares to elements using equals equals, right, 

876
00:42:36,000 --> 00:42:40,000
something like a strut doesn't by default have those behaviors and so you 

877
00:42:40,000 --> 00:42:44,000
could have a template that would work for primitive types or types that have these 

878
00:42:44,000 --> 00:42:47,000
operations declared, but wouldn't work when 

879
00:42:47,000 --> 00:42:51,000
you gave it a more complex type that didn't have that 

880
00:42:51,000 --> 00:42:53,000
data support in there. 

881
00:42:53,000 --> 00:42:55,000
So I have that peace code over here. I can just 

882
00:42:55,000 --> 00:42:59,000
show it to you. I just took it down here. 

883
00:42:59,000 --> 00:43:05,000
This is print vector and 

884
00:43:05,000 --> 00:43:07,000
it's a little bit of 

885
00:43:07,000 --> 00:43:10,000
template. 

886
00:43:10,000 --> 00:43:11,000
And 

887
00:43:11,000 --> 00:43:14,000
if I were to have a vector 

888
00:43:14,000 --> 00:43:17,000
declared of ints, 

889
00:43:17,000 --> 00:43:20,000
I say print vector, V 

890
00:43:20,000 --> 00:43:24,000
that this compiles, there's nothing to print. It turns out it'll just print empty line because 

891
00:43:24,000 --> 00:43:32,000
there's nothing in there, but if I change this to be chord T 

892
00:43:32,000 --> 00:43:37,000
and my build is failing, right, and the error message I'm getting here is no match for 

893
00:43:37,000 --> 00:43:38,000
trying to 

894
00:43:38,000 --> 00:43:41,000
output that thing and it tells me that all the things I can output on a string. Well, it could be that you could output a 

895
00:43:41,000 --> 00:43:44,000
[inaudible] but it's not one of those things, right, chord T 

896
00:43:44,000 --> 00:43:48,000
doesn't have 

897
00:43:48,000 --> 00:43:51,000
a built-in behavior for outputting to a stream. 

898
00:43:51,000 --> 00:43:53,000
So this is going to come back to be kind of important because I'm gonna 

899
00:43:53,000 --> 00:43:54,000
keep going here 

900
00:43:54,000 --> 00:43:58,000
is that 

901
00:43:58,000 --> 00:44:00,000
when we get to the sort, right, when 

902
00:44:00,000 --> 00:44:02,000
we try to build this thing, we're gonna definitely be depending on 

903
00:44:02,000 --> 00:44:05,000
something being true about the elements, right, 

904
00:44:05,000 --> 00:44:07,000
that's gonna constrain what the type could be. 

905
00:44:07,000 --> 00:44:10,000
So this is selection sort in 

906
00:44:10,000 --> 00:44:14,000
its ordinary form, nothing special about it other than the fact that I 

907
00:44:14,000 --> 00:44:17,000
changed it as so sorting an int, it's the sort of vector with 

908
00:44:17,000 --> 00:44:18,000
placeholder type. So it's [inaudible] with 

909
00:44:18,000 --> 00:44:23,000
the template typed in header, vector of type and then in here, 

910
00:44:23,000 --> 00:44:26,000
operations like this really are accessing a type 

911
00:44:26,000 --> 00:44:28,000
and another type 

912
00:44:28,000 --> 00:44:30,000
variable out of 

913
00:44:30,000 --> 00:44:33,000
the vector that was passed and then comparing them, which is smaller to decide 

914
00:44:33,000 --> 00:44:34,000
which index. 

915
00:44:34,000 --> 00:44:36,000
It's using a swap 

916
00:44:36,000 --> 00:44:39,000
and so the generic swap would also be necessary for this so that we need to 

917
00:44:39,000 --> 00:44:42,000
have a swap for any kind of things we want to exchange. We have a template 

918
00:44:42,000 --> 00:44:44,000
up there for swap, 

919
00:44:44,000 --> 00:44:47,000
then the sort can build on top of that and use that as part of its workings which is then able 

920
00:44:47,000 --> 00:44:48,000
to come under 

921
00:44:48,000 --> 00:44:52,000
swap to any two things can be swapped using the same strategy. 

922
00:44:52,000 --> 00:44:55,000
And this is actually where templates really start to shine. Like swap in itself isn't that 

923
00:44:55,000 --> 00:44:56,000


924
00:44:56,000 --> 00:44:59,000
stunning of an example, because you think whatever, who cares if three lines a code, but every 

925
00:44:59,000 --> 00:45:00,000


926
00:45:00,000 --> 00:45:01,000
little bit does count, 

927
00:45:01,000 --> 00:45:04,000
but once we get to things like sorting where we could build a really 

928
00:45:04,000 --> 00:45:05,000
killer 

929
00:45:05,000 --> 00:45:09,000
totally tuned, really high performance, you know, protection against a [inaudible] 

930
00:45:09,000 --> 00:45:09,000
quick sort 

931
00:45:09,000 --> 00:45:12,000
and we want to be sure that we can use it in all the places we need to sort. I don't to 

932
00:45:12,000 --> 00:45:15,000
make it sort just strings and later when I need to sort for integers, I need to 

933
00:45:15,000 --> 00:45:18,000
copy and paste it and change string to everywhere, 

934
00:45:18,000 --> 00:45:21,000
and then if I find a bug in one, I forget to change it in the other I have 

935
00:45:21,000 --> 00:45:23,000
the bug still lurking there. 

936
00:45:23,000 --> 00:45:25,000
And so template functions are generally used for 

937
00:45:25,000 --> 00:45:29,000
all kinds of algorithmically interesting things, sorting and searching and removing 

938
00:45:29,000 --> 00:45:32,000
duplicates and permuting and finding the mode 

939
00:45:32,000 --> 00:45:33,000
and 

940
00:45:33,000 --> 00:45:36,000
shuffling and all these things that like okay, no matter what the type of 

941
00:45:36,000 --> 00:45:38,000
thing you're working on, 

942
00:45:38,000 --> 00:45:41,000
the algorithm for how you do it looks the same whether it's ints or 

943
00:45:41,000 --> 00:45:43,000
strings or whatever. So 

944
00:45:43,000 --> 00:45:46,000
we got this guy 

945
00:45:46,000 --> 00:45:46,000


946
00:45:46,000 --> 00:45:49,000
and as is, right, we could 

947
00:45:49,000 --> 00:45:53,000
push vectors of ints in there, vectors of strings in there and the right thing 

948
00:45:53,000 --> 00:45:57,000
would work for those without any changes to it, 

949
00:45:57,000 --> 00:45:59,000
but it does have a lurking 

950
00:45:59,000 --> 00:46:02,000
constraint on what much be true about the kind of elements that we're 

951
00:46:02,000 --> 00:46:04,000
trying to sort here. 

952
00:46:04,000 --> 00:46:05,000
Not every type will work. If 

953
00:46:05,000 --> 00:46:07,000
I go back to my chord example, 

954
00:46:07,000 --> 00:46:09,000
this XY, 

955
00:46:09,000 --> 00:46:13,000
if have a vector of chord and I pass it to sort and if I go back and look at the 

956
00:46:13,000 --> 00:46:14,000
code, 

957
00:46:14,000 --> 00:46:17,000
you think about instantiating this with vector, is all chord in here, all the way 

958
00:46:17,000 --> 00:46:18,000
through here, 

959
00:46:18,000 --> 00:46:22,000
this part's fine; this part's fine; this part's fine, but suddenly you get to this line and 

960
00:46:22,000 --> 00:46:25,000
that's gonna be taking two coordinate structures and saying is this coordinate 

961
00:46:25,000 --> 00:46:27,000
structure less than another. 

962
00:46:27,000 --> 00:46:30,000
And that code's not gonna compile. Yes. Back when we were doing sets, we were able to explicitly 

963
00:46:30,000 --> 00:46:36,000
say how this works. Yeah, so you're right where I'm gonna be, but I'm probably 

964
00:46:36,000 --> 00:46:40,000
not gonna finish today, but I'll have to finish up on Wednesday is we're gonna do exactly what 

965
00:46:40,000 --> 00:46:43,000
Seth did, and what did Seth do? Comparison function. 

966
00:46:43,000 --> 00:46:46,000
That's right, he used a comparison function, so you have to have some coordination here when you say, 

967
00:46:46,000 --> 00:46:49,000
well instead of trying to use the operate or less than on these things, 

968
00:46:49,000 --> 00:46:53,000
how about the client tell me as part of calling sort, why don't you give me the 

969
00:46:53,000 --> 00:46:57,000
information in the form of a function that says call me back and tell me are 

970
00:46:57,000 --> 00:46:59,000
these things less than 

971
00:46:59,000 --> 00:47:02,000
and so on and how to order them. So that's exactly what we need to do, but I can't 

972
00:47:02,000 --> 00:47:05,000
really show you the code right now, but I'll just kind of like 

973
00:47:05,000 --> 00:47:09,000
flash it up there and we will talk about it on Wednesday, what changes make 

974
00:47:09,000 --> 00:47:12,000
that happen. We'll pick up 

975
00:47:12,000 --> 00:47:22,000
with Chapter 8, actually after the midterm in terms of 

976
00:47:22,000 --> 00:47:23,000
reading.