1
00:00:00,000 --> 00:00:02,000


2
00:00:02,000 --> 00:00:10,000


3
00:00:10,000 --> 00:00:13,000
This presentation is delivered by the Stanford Center for Professional 

4
00:00:13,000 --> 00:00:21,000
Development. 

5
00:00:21,000 --> 00:00:23,000
Let us 

6
00:00:23,000 --> 00:00:25,000
refresh about where we were at the end of Wednesday. 

7
00:00:25,000 --> 00:00:29,000
As we had talked about the vector form of the buffer, so the editor buffer that's 

8
00:00:29,000 --> 00:00:31,000
backing the 

9
00:00:31,000 --> 00:00:34,000
word processing client, 

10
00:00:34,000 --> 00:00:37,000
and the two stack version, where we're shuffling stuff, often on the before 

11
00:00:37,000 --> 00:00:40,000
and after stacks to get things done, right? 

12
00:00:40,000 --> 00:00:43,000
And when we were looking at the main six operations, we're looking at the four 

13
00:00:43,000 --> 00:00:47,000
cursor movements and the two editing operations, right, that we had traded off. 

14
00:00:47,000 --> 00:00:49,000
Where editing had been slow in the vector, 

15
00:00:49,000 --> 00:00:52,000
to where editing was now fast, but then these long distance movements were slow, 

16
00:00:52,000 --> 00:00:54,000
because of the shuffling. 

17
00:00:54,000 --> 00:00:57,000
And then there was a little discussion at the very end about how this probably 

18
00:00:57,000 --> 00:01:00,000
increases our space requirements, because we're now keeping two stacks that are 

19
00:01:00,000 --> 00:01:02,000
each capable of holding 

20
00:01:02,000 --> 00:01:04,000
the entire contents 

21
00:01:04,000 --> 00:01:06,000
as opposed to one vector here 

22
00:01:06,000 --> 00:01:09,000
with its slop. 

23
00:01:09,000 --> 00:01:10,000
So that was where we were at. 

24
00:01:10,000 --> 00:01:13,000
We are still in kind of the pursuit of this Holy Grail of could we 

25
00:01:13,000 --> 00:01:16,000
get it to where everything is fast. Is it always a matter of tradeoffs or could we just 

26
00:01:16,000 --> 00:01:19,000
invest some more smarts and more 

27
00:01:19,000 --> 00:01:20,000
cleverness 

28
00:01:20,000 --> 00:01:24,000
and hard work and squeeze it down to where we could get everything to perform 

29
00:01:24,000 --> 00:01:26,000
efficiently? So typically when computer scientists say they want something to perform 

30
00:01:26,000 --> 00:01:28,000
efficiently, they're hoping for log 

31
00:01:28,000 --> 00:01:32,000
in time or faster. Constant time is the best. Log in grows so slowly that in fact, 

32
00:01:32,000 --> 00:01:34,000
logarithm time is effectively constant 

33
00:01:34,000 --> 00:01:40,000
for all reasonable values of N up to millions and billions. 

34
00:01:40,000 --> 00:01:43,000
And so typically anything that is linear or higher is often a weak spot that you are 

35
00:01:43,000 --> 00:01:47,000
trying to look at, and certainly things that are quadratic or exponential 

36
00:01:47,000 --> 00:01:51,000
are definite opportunities for improvement. 

37
00:01:51,000 --> 00:01:53,000
So let's look at this idea of the link list, 

38
00:01:53,000 --> 00:01:55,000
so that both 

39
00:01:55,000 --> 00:01:56,000
the 

40
00:01:56,000 --> 00:01:58,000
vector and the stack 

41
00:01:58,000 --> 00:02:03,000
- we're relying on this idea that continuous memory is the 

42
00:02:03,000 --> 00:02:06,000
weak link that was causing some problems. In the vector, that shuffling to 

43
00:02:06,000 --> 00:02:08,000
move things up and down, 

44
00:02:08,000 --> 00:02:12,000
or in the case of the stack version, one side to the other 

45
00:02:12,000 --> 00:02:15,000
was part of what we were working up against. 

46
00:02:15,000 --> 00:02:18,000
And the link list promises to give us this opportunity to have these each 

47
00:02:18,000 --> 00:02:21,000
individual character being managed separately, 

48
00:02:21,000 --> 00:02:26,000
and flexibility might help resolve some of our problems here. 

49
00:02:26,000 --> 00:02:30,000
SO what we're going to look at is the idea of implementing the buffer using a link 

50
00:02:30,000 --> 00:02:32,000
list. So if I have 

51
00:02:32,000 --> 00:02:34,000
characters A, B, C, D, E, 

52
00:02:34,000 --> 00:02:37,000
I could have a link list here where I'm pointing to the head cell, which is A, 

53
00:02:37,000 --> 00:02:40,000
which points to B, and C, and D, and E, and down to the null. 

54
00:02:40,000 --> 00:02:43,000
SO what I'm modeling it as in the private data section will be a 

55
00:02:43,000 --> 00:02:44,000
little 

56
00:02:44,000 --> 00:02:47,000
link list node with a character in there, a pointer to the next one. 

57
00:02:47,000 --> 00:02:52,000
And then I'm going to keep two pointers, one to the front-most cell, and then that's going to 

58
00:02:52,000 --> 00:02:55,000
model where the cursor is within the buffer. 

59
00:02:55,000 --> 00:02:59,000
So rather than doing this like an index, an index was handy in the case of vector where we 

60
00:02:59,000 --> 00:03:00,000
had direct access, right? 

61
00:03:00,000 --> 00:03:02,000
Given the way the link list works, 

62
00:03:02,000 --> 00:03:05,000
knowing that it's at the fifth cell, the fourth cell isn't really very helpful. It would be more 

63
00:03:05,000 --> 00:03:09,000
helpful to have the pointer kind of directly in the midst of things to give us direct 

64
00:03:09,000 --> 00:03:12,000
access to where the cursor is. 

65
00:03:12,000 --> 00:03:14,000
But let's think a little bit about that cursor, 

66
00:03:14,000 --> 00:03:17,000
because there's an important decision to be made early about helping to 

67
00:03:17,000 --> 00:03:20,000
facilitate the later work we have to do. 

68
00:03:20,000 --> 00:03:24,000
I have the contents A, B, C, D, E, 

69
00:03:24,000 --> 00:03:28,000
so the cursor is actually between two characters. 

70
00:03:28,000 --> 00:03:31,000
If the cursor right now is situated 

71
00:03:31,000 --> 00:03:32,000
after the A and before the B, 

72
00:03:32,000 --> 00:03:33,000
what I'm 

73
00:03:33,000 --> 00:03:36,000
modeling is A cursor B, 

74
00:03:36,000 --> 00:03:38,000
C, D, 

75
00:03:38,000 --> 00:03:39,000
then 

76
00:03:39,000 --> 00:03:42,000
it seems like the obvious two choices for where the cursor might point is to 

77
00:03:42,000 --> 00:03:45,000
A or to B. And I 

78
00:03:45,000 --> 00:03:50,000
had said in the case of the vector that if it were index zero or index one 

79
00:03:50,000 --> 00:03:51,000
that 

80
00:03:51,000 --> 00:03:53,000
there wasn't a really strong preference for one over the other. 

81
00:03:53,000 --> 00:03:56,000
There is going to be a really good reason to pick one for the link list 

82
00:03:56,000 --> 00:03:59,000
over the other. Would you rather point to the A 

83
00:03:59,000 --> 00:04:11,000
for where the cursor is or would you rather point to the B? [Inaudible] 

84
00:04:11,000 --> 00:04:14,000
Exactly, so the idea is that when the cursor position is there, the next edit 

85
00:04:14,000 --> 00:04:18,000
actually goes in between the A and the B. So if I insert the character 

86
00:04:18,000 --> 00:04:21,000
Z, it really does go right here. 

87
00:04:21,000 --> 00:04:24,000
And if I am going to, if effect, wire that in and splice it in then list, 

88
00:04:24,000 --> 00:04:29,000
if I'm pointing to B then I'm kind of hosed, because I'm already one past where I need to be able slice in. Because I need to 

89
00:04:29,000 --> 00:04:33,000
know what was behind it to bring it in. So I'm pointing to the character behind the 

90
00:04:33,000 --> 00:04:34,000
cursor 

91
00:04:34,000 --> 00:04:36,000
gives you access to 

92
00:04:36,000 --> 00:04:39,000
wiring this down to here and this into there, and then updating the cursor to 

93
00:04:39,000 --> 00:04:40,000
point to the Z 

94
00:04:40,000 --> 00:04:44,000
without having to do any of the difficult backward looking backward 

95
00:04:44,000 --> 00:04:45,000
traversal. 

96
00:04:45,000 --> 00:04:48,000
So that strongly suggests what it is, it's going to point to 

97
00:04:48,000 --> 00:04:52,000
A when it's between A and B; B when it's between B and C; and so on. 

98
00:04:52,000 --> 00:04:56,000
There is another little point here, which is there are actually five letters 

99
00:04:56,000 --> 00:04:59,000
in the cursor, but there are six cursor positions. It could be at the very 

100
00:04:59,000 --> 00:05:00,000
beginning, 

101
00:05:00,000 --> 00:05:03,000
in between all these, or at the very end. 

102
00:05:03,000 --> 00:05:07,000
And the way we've chosen it right now, I can identify all of the five 

103
00:05:07,000 --> 00:05:08,000
positions 

104
00:05:08,000 --> 00:05:11,000
that have at least one character to the left, so being - 

105
00:05:11,000 --> 00:05:12,000
pointing to the A means it's 

106
00:05:12,000 --> 00:05:15,000
after A; pointing to the B it's after B; and so on, all the way down to the E. 

107
00:05:15,000 --> 00:05:18,000
But I don't have a cursor position for 

108
00:05:18,000 --> 00:05:23,000
representing when the cursor is at the very beginning of the buffer. 

109
00:05:23,000 --> 00:05:25,000
So 

110
00:05:25,000 --> 00:05:28,000
one strategy I could use, right, is to say I'll just use the special cursor 

111
00:05:28,000 --> 00:05:29,000
position of null. But 

112
00:05:29,000 --> 00:05:32,000
I need one more pointer value to hold onto, 

113
00:05:32,000 --> 00:05:36,000
and I could make a special case out of this because there's null. Once I've done that though, 

114
00:05:36,000 --> 00:05:40,000
I've actually committed myself to this thing that's going to require all of my code to be managing a 

115
00:05:40,000 --> 00:05:41,000
special case of, 

116
00:05:41,000 --> 00:05:43,000
well when the cursor's null do something a little different than when 

117
00:05:43,000 --> 00:05:46,000
the cursor is pointing to some cell. I'm going to 

118
00:05:46,000 --> 00:05:50,000
show you a way that we can kind of 

119
00:05:50,000 --> 00:05:54,000
avoid the need for making a special case out of it, 

120
00:05:54,000 --> 00:05:55,000
by 

121
00:05:55,000 --> 00:05:58,000
wasting a little memory and reorganizing our list 

122
00:05:58,000 --> 00:06:01,000
to allow for there to be a sixth cell, 

123
00:06:01,000 --> 00:06:04,000
and extra cell that we call the dummy cell. 

124
00:06:04,000 --> 00:06:07,000
So in this case that cell would go at the very front. 

125
00:06:07,000 --> 00:06:10,000
It would have no character, just let the character field be unstatused. It's 

126
00:06:10,000 --> 00:06:13,000
not really a character. What it is, is just a placeholder. 

127
00:06:13,000 --> 00:06:16,000
And when I created the buffer to begin with, even when it's empty, it's always going to have 

128
00:06:16,000 --> 00:06:20,000
that cell. I'm going to create that cell in advance, I'm going to have it sitting ready and waiting, and that cell 

129
00:06:20,000 --> 00:06:24,000
never gets deleted or moved or changed. In fact, the list always, the head 

130
00:06:24,000 --> 00:06:28,000
of this will always point to the same cell from the beginning of this object's lifetime 

131
00:06:28,000 --> 00:06:32,000
to the end. So it will never change this at all, and the cells that follow are 

132
00:06:32,000 --> 00:06:35,000
the ones that are going to be changing and growing and rearranging in 

133
00:06:35,000 --> 00:06:38,000
response to editing commands. 

134
00:06:38,000 --> 00:06:40,000
What this is going to buy us is a couple of things. First off, it's going to make it 

135
00:06:40,000 --> 00:06:42,000
so there is a sixth cursor position. 

136
00:06:42,000 --> 00:06:45,000
So now when the cursor is at the very beginning, 

137
00:06:45,000 --> 00:06:47,000
we can set the cursor to point to this dummy cell. 

138
00:06:47,000 --> 00:06:48,000
Okay 

139
00:06:48,000 --> 00:06:51,000
that's one thing that we solve by doing that. 

140
00:06:51,000 --> 00:06:55,000
Another thing that we've solved is actually we have made all the other 

141
00:06:55,000 --> 00:06:57,000
cells on the list, the ones that are being edited and manipulated. 

142
00:06:57,000 --> 00:07:00,000
We've made them all totally symmetric. 

143
00:07:00,000 --> 00:07:02,000
Previously there was always going to be a little bit of a special case about, well if you're 

144
00:07:02,000 --> 00:07:06,000
the front of the list, we're seeing that sometimes we're doing inserts on the 

145
00:07:06,000 --> 00:07:08,000
stack in queue link list. That 

146
00:07:08,000 --> 00:07:12,000
inserting later in the list involves kind of wiring in two pointers; one to command and one 

147
00:07:12,000 --> 00:07:13,000
to go out. 

148
00:07:13,000 --> 00:07:16,000
But there was a special case that oh, if you're the very first cell of the list, then 

149
00:07:16,000 --> 00:07:19,000
your outpointer is the same, but your incoming pointer is resetting the head. 

150
00:07:19,000 --> 00:07:22,000
That piece has actually gone away now, 

151
00:07:22,000 --> 00:07:26,000
that now every cell in there always has a predecessor. It always has a previous, 

152
00:07:26,000 --> 00:07:29,000
right? Something behind it so that actually they will always be 

153
00:07:29,000 --> 00:07:31,000
pulling the pointer in and pulling the pointer out 

154
00:07:31,000 --> 00:07:34,000
for every cell with character data that's being modified. 

155
00:07:34,000 --> 00:07:37,000
And that means some of that extra little handing of oh, if you're the 

156
00:07:37,000 --> 00:07:39,000
very first cell, 

157
00:07:39,000 --> 00:07:40,000
have gone away. 

158
00:07:40,000 --> 00:07:44,000
The very first cell is always this kind of dummy cell and doesn't require us to go 

159
00:07:44,000 --> 00:07:46,000
out of our way to do anything special for it. 

160
00:07:46,000 --> 00:07:49,000
So this is going to solve a bunch of problems for us, and it's kind of a neat technique. This thing 

161
00:07:49,000 --> 00:07:51,000
is used actually 

162
00:07:51,000 --> 00:07:52,000
fairly commonly 

163
00:07:52,000 --> 00:07:57,000
in situations just managing a link list just to buy yourself some symmetry in the 

164
00:07:57,000 --> 00:08:02,000
remaining cells. 

165
00:08:02,000 --> 00:08:05,000
So let me look at some code with you, and I'm going to actually draw some pictures as I 

166
00:08:05,000 --> 00:08:06,000
go. I'm 

167
00:08:06,000 --> 00:08:10,000
not going to write this code in real time, because I think it's actually more important to 

168
00:08:10,000 --> 00:08:11,000
see it 

169
00:08:11,000 --> 00:08:15,000
acting on things than it is to see me typing out the characters. 

170
00:08:15,000 --> 00:08:18,000
But the mechanism, right, in the case of the buffer here 

171
00:08:18,000 --> 00:08:21,000
is it's going to have a head point, 

172
00:08:21,000 --> 00:08:23,000
which points to the front most cell; and a 

173
00:08:23,000 --> 00:08:24,000
cursor, 

174
00:08:24,000 --> 00:08:28,000
which points to the character that proceeds the cursor. So if right now the 

175
00:08:28,000 --> 00:08:31,000
contents of the buffer 

176
00:08:31,000 --> 00:08:34,000
are 

177
00:08:34,000 --> 00:08:37,000
the contents A, D, C. 

178
00:08:37,000 --> 00:08:39,000
And let's say that right now 

179
00:08:39,000 --> 00:08:40,000
the 

180
00:08:40,000 --> 00:08:42,000
cursor is pointing to A. 

181
00:08:42,000 --> 00:08:45,000
Actually, I need my 

182
00:08:45,000 --> 00:08:50,000
dummy cell. Let me move everybody down one, so you just have dummy 

183
00:08:50,000 --> 00:08:52,000
cell and my A and my B. 

184
00:08:52,000 --> 00:08:56,000
And so the cursor right now is at the very front of the buffer. 

185
00:08:56,000 --> 00:08:59,000
This is kind of what it looks like. It has two characters, the A and the B. 

186
00:08:59,000 --> 00:09:03,000
I'm going to trace through the process of inserting a new character 

187
00:09:03,000 --> 00:09:04,000
into the buffer. It 

188
00:09:04,000 --> 00:09:07,000
is that my local variable CP 

189
00:09:07,000 --> 00:09:11,000
is going to point to a new cell allocated out in the heap. 

190
00:09:11,000 --> 00:09:15,000
Let's say the character I'm going to insert is the Z, so I'll 

191
00:09:15,000 --> 00:09:16,000
assign 

192
00:09:16,000 --> 00:09:18,000
CH to the character, the slot right there. 

193
00:09:18,000 --> 00:09:20,000
And then the next for this 

194
00:09:20,000 --> 00:09:24,000
gets to - what the cursor is - kind of following the cursor. So the 

195
00:09:24,000 --> 00:09:26,000
cursor is pointing to the cell before it, 

196
00:09:26,000 --> 00:09:29,000
and the next field of that cell tells us what follows it; 

197
00:09:29,000 --> 00:09:31,000
so if I trace my cursor's 

198
00:09:31,000 --> 00:09:33,000
next field, it points to A, 

199
00:09:33,000 --> 00:09:35,000
and so I'm going to go ahead and 

200
00:09:35,000 --> 00:09:41,000
set up this new cell's next field to also point to A. 

201
00:09:41,000 --> 00:09:44,000
So now I've got two different ways to get to the A cell, tracing off the dummy cell or 

202
00:09:44,000 --> 00:09:46,000
tracing off this new cell. 

203
00:09:46,000 --> 00:09:48,000
And then I 

204
00:09:48,000 --> 00:09:51,000
update the cursor's next field to point to CP. 

205
00:09:51,000 --> 00:09:55,000
So I just copied right there that CP's next field was pointing to A. I copied it down here 

206
00:09:55,000 --> 00:09:57,000
to kind of do the splice out. 

207
00:09:57,000 --> 00:10:02,000
Now I'm going to do the splice in, 

208
00:10:02,000 --> 00:10:05,000
causing it to no longer point to A, 

209
00:10:05,000 --> 00:10:06,000
but instead to 

210
00:10:06,000 --> 00:10:08,000
point to my new cell down here, CP. 

211
00:10:08,000 --> 00:10:10,000
And then I update the cursor 

212
00:10:10,000 --> 00:10:13,000
to point to this new cell. That just means that subsequent inserts, like the 

213
00:10:13,000 --> 00:10:18,000
C is kind of behind the cursor after I've made this 

214
00:10:18,000 --> 00:10:21,000


215
00:10:21,000 --> 00:10:23,000
edit. 

216
00:10:23,000 --> 00:10:25,000
That goes away when that one ends, 

217
00:10:25,000 --> 00:10:28,000
and so now the new structure that I have here is 

218
00:10:28,000 --> 00:10:31,000
head still points to the dummy cell that never changes. 

219
00:10:31,000 --> 00:10:33,000
Dummy cell now points to Z, 

220
00:10:33,000 --> 00:10:36,000
Z points to A, Z points to B, 

221
00:10:36,000 --> 00:10:41,000
and then the cursor in this case is between the Z and the A, 

222
00:10:41,000 --> 00:10:44,000
by virtue of the fact that the cursor is pointing to Z. 

223
00:10:44,000 --> 00:10:50,000
A subsequent one would kind of do the same thing. If I typed out ZY 

224
00:10:50,000 --> 00:10:51,000
after I did this. We'd 

225
00:10:51,000 --> 00:10:55,000
make a new cell over here. It would get what 

226
00:10:55,000 --> 00:10:58,000
was currently coming out of the cursor's field. The next field, right, we would 

227
00:10:58,000 --> 00:11:02,000
update cursor's next field to point 

228
00:11:02,000 --> 00:11:05,000
to this new one. And then we would update cursor 

229
00:11:05,000 --> 00:11:08,000
to point to this guy. And 

230
00:11:08,000 --> 00:11:10,000
so now following the list, right, dummy cell, 

231
00:11:10,000 --> 00:11:11,000
Z, 

232
00:11:11,000 --> 00:11:16,000
Y, A, B, cursor pointing to the Y, which is the 

233
00:11:16,000 --> 00:11:20,000
character directly to the left of the cursor. 

234
00:11:20,000 --> 00:11:22,000
And so inserting in any position now, 

235
00:11:22,000 --> 00:11:25,000
front cell, middle cell, end cell, 

236
00:11:25,000 --> 00:11:29,000
doesn't require any special case handling. So you don't see any if of things. 

237
00:11:29,000 --> 00:11:33,000
And you notice that you'll never see an update to the head directly here. 

238
00:11:33,000 --> 00:11:37,000
That the head was set up to point to the dummy cell in the constructor 

239
00:11:37,000 --> 00:11:39,000
and never needs any adjusting. 

240
00:11:39,000 --> 00:11:42,000
It's only the cells after it that requires 

241
00:11:42,000 --> 00:11:43,000


242
00:11:43,000 --> 00:11:44,000
the new 

243
00:11:44,000 --> 00:11:47,000
arrangement of the pointers coming in and 

244
00:11:47,000 --> 00:11:49,000
out. When I'm ready to delete a cell, 

245
00:11:49,000 --> 00:11:53,000
then the mechanism for delete in the buffer is to delete the thing that follows 

246
00:11:53,000 --> 00:11:55,000
the cursor. So if I have Z, Y, A, B, 

247
00:11:55,000 --> 00:11:58,000
this is the character that delete is supposed to delete, the one after the 

248
00:11:58,000 --> 00:11:59,000
cursor. 

249
00:11:59,000 --> 00:12:00,000
And so, 

250
00:12:00,000 --> 00:12:04,000
the first thing it looks at is to make sure that we're not already at the end. 

251
00:12:04,000 --> 00:12:07,000
In the case of the link list form of this, if the cursor was pointing to 

252
00:12:07,000 --> 00:12:09,000
the last most cell, 

253
00:12:09,000 --> 00:12:12,000
and then we looked at it's next field and saw that it was null, that would be a clue. Oh, it's 

254
00:12:12,000 --> 00:12:15,000
pointing to the very last cell, there's nothing that follows it. So we have a 

255
00:12:15,000 --> 00:12:17,000
quick test to make sure that there's something there to delete. 

256
00:12:17,000 --> 00:12:19,000
There is something following the cursor, and 

257
00:12:19,000 --> 00:12:24,000
in this case since the cursor is pointing to the Y, we're good. It says look at its 

258
00:12:24,000 --> 00:12:25,000
next field, it points to A. 

259
00:12:25,000 --> 00:12:27,000
It says okay, 

260
00:12:27,000 --> 00:12:29,000
call this thing old 

261
00:12:29,000 --> 00:12:30,000
and point to that A. 

262
00:12:30,000 --> 00:12:33,000
Now do a wire around it. 

263
00:12:33,000 --> 00:12:34,000
So take the cursor's 

264
00:12:34,000 --> 00:12:36,000
next field, and 

265
00:12:36,000 --> 00:12:39,000
so this thing is where I'm targeting my overwrite, 

266
00:12:39,000 --> 00:12:44,000
and copy out of the old its next. It's 

267
00:12:44,000 --> 00:12:49,000
no longer pointing to the A, but instead it's 

268
00:12:49,000 --> 00:12:50,000
pointing to the B. 

269
00:12:50,000 --> 00:12:52,000
And deleting the old, which 

270
00:12:52,000 --> 00:12:54,000
causes this piece of memory to 

271
00:12:54,000 --> 00:12:56,000
be reclaimed, 

272
00:12:56,000 --> 00:13:01,000
and now the contents of my buffer have been shifted over to 

273
00:13:01,000 --> 00:13:06,000
where it goes Z, Y, and straight to D. The cursor in this case doesn't move. It 

274
00:13:06,000 --> 00:13:07,000
actually deleted to the right, and so 

275
00:13:07,000 --> 00:13:11,000
whatever was to the left of the cursor just stays in place. 

276
00:13:11,000 --> 00:13:15,000
So again, no need to update anything special for the first or the last or any of those 

277
00:13:15,000 --> 00:13:15,000
cells, 

278
00:13:15,000 --> 00:13:19,000
but the symmetry of all the cells past the dummy kind of buys us some convenience 

279
00:13:19,000 --> 00:13:22,000
of handling it. What does a dummy cell look like? The 

280
00:13:22,000 --> 00:13:26,000
dummy cell looks like just any other cell. It's actually a cell, T, and what you do is 

281
00:13:26,000 --> 00:13:29,000
you just don't assign the Christian field, the character field, because it has nothing in it. 

282
00:13:29,000 --> 00:13:32,000
So it actually looks just like all the others, it's just another node in the 

283
00:13:32,000 --> 00:13:33,000
chain. 

284
00:13:33,000 --> 00:13:36,000
But this one, you know you put there purposefully just as the placeholder. 

285
00:13:36,000 --> 00:13:40,000
So it doesn't contain any character data. So if you're ready to print the 

286
00:13:40,000 --> 00:13:42,000
contents of the buffer, you have to remember the dummy cell is there, 

287
00:13:42,000 --> 00:13:43,000
and just skip over it. You 

288
00:13:43,000 --> 00:13:45,000
tend to start your list 

289
00:13:45,000 --> 00:13:49,000
at heads next, the first interesting cell to look at, then work 

290
00:13:49,000 --> 00:13:51,000
your way down to the end. So it's 

291
00:13:51,000 --> 00:13:54,000
just like all the others, but you just know 

292
00:13:54,000 --> 00:14:00,000
that the character data here is useless, it's not important. You didn't even write anything in there. When you delete, you just go left? It turns out 

293
00:14:00,000 --> 00:14:02,000
the delete in this 

294
00:14:02,000 --> 00:14:04,000
case goes forward. 

295
00:14:04,000 --> 00:14:07,000
So you can imagine that in some other world it does, but all the ones we 

296
00:14:07,000 --> 00:14:11,000
have done so far have always deleted forward. So the delete - for example, the stack case, 

297
00:14:11,000 --> 00:14:13,000
grabbed from the after stack 

298
00:14:13,000 --> 00:14:16,000
- it's done that way, because it happens to simplify some other 

299
00:14:16,000 --> 00:14:19,000
things that you can imagine that delete and reverse. What would it take? Well, you'd have to back 

300
00:14:19,000 --> 00:14:22,000
the cursor up. And we're going to find out pretty soon that would make it 

301
00:14:22,000 --> 00:14:24,000
challenging for the link list. So partly we've picked a delete that 

302
00:14:24,000 --> 00:14:26,000
made certain things work out a 

303
00:14:26,000 --> 00:14:29,000
little better for us. 

304
00:14:29,000 --> 00:14:31,000
So both of these operations 

305
00:14:31,000 --> 00:14:33,000
are O of 

306
00:14:33,000 --> 00:14:33,000
1; 

307
00:14:33,000 --> 00:14:37,000
this is where the link list really shines, right, on this sort of stuff. Because 

308
00:14:37,000 --> 00:14:41,000
if you have access to the point at which you want to make a modification, so if you are already kind of 

309
00:14:41,000 --> 00:14:44,000
in the midst of the thing you're trying to rearrange, 

310
00:14:44,000 --> 00:14:47,000
it's often to the link list's advantage, because it actually can do these this rewiring 

311
00:14:47,000 --> 00:14:51,000
in the local context. And even if there's thousands of characters that 

312
00:14:51,000 --> 00:14:52,000
follow the cursor. 

313
00:14:52,000 --> 00:14:55,000
It doesn't matter. And there can be thousands of characters preceding the cursor 

314
00:14:55,000 --> 00:14:56,000
for that matter, too. 

315
00:14:56,000 --> 00:15:00,000
The idea that there's a much larger context that's working and doesn't impact 

316
00:15:00,000 --> 00:15:03,000
its performance; it's really just doing this local little rearrangement of the pointers. 

317
00:15:03,000 --> 00:15:05,000
And that's a real strength of the link list design 

318
00:15:05,000 --> 00:15:11,000
is that flexibility that doesn't rely on everything living in contiguous memory. 

319
00:15:11,000 --> 00:15:15,000
So let's talk about the movement options for this guy. 

320
00:15:15,000 --> 00:15:19,000
So right now I have the contents 

321
00:15:19,000 --> 00:15:21,000
Z, Y with 

322
00:15:21,000 --> 00:15:22,000
the cursor 

323
00:15:22,000 --> 00:15:27,000
between that, and the B following that. So three characters worth, 

324
00:15:27,000 --> 00:15:29,000


325
00:15:29,000 --> 00:15:30,000
and then 

326
00:15:30,000 --> 00:15:31,000
the 

327
00:15:31,000 --> 00:15:33,000
initial two operations 

328
00:15:33,000 --> 00:15:36,000
are pretty simple. The later two are going to be a little bit messier. So let's look at 

329
00:15:36,000 --> 00:15:37,000
the easy ones first. 

330
00:15:37,000 --> 00:15:40,000
Moving the cursor to the beginning, so that jump that pulls you back to the very 

331
00:15:40,000 --> 00:15:41,000
beginning, 

332
00:15:41,000 --> 00:15:44,000
very easy in a link list form. If I want to move the cursor to the beginning, all 

333
00:15:44,000 --> 00:15:47,000
I need to do is take that cursor pointer, 

334
00:15:47,000 --> 00:15:49,000
and reset it 

335
00:15:49,000 --> 00:15:51,000
to the head, which is where the dummy cell is. And so, by 

336
00:15:51,000 --> 00:15:54,000
virtue of doing that, 

337
00:15:54,000 --> 00:15:57,000


338
00:15:57,000 --> 00:15:59,000
the character it points to is the one that's to the left, 

339
00:15:59,000 --> 00:16:02,000
right of that, and to the left of that the dummy cell is one we're not 

340
00:16:02,000 --> 00:16:05,000
counting. In fact, it's like nothingness that's there, 

341
00:16:05,000 --> 00:16:07,000
and that means that the first character that follows the cursor is Z, which 

342
00:16:07,000 --> 00:16:11,000
is the initial character of the buffer. So that reset, easy to do, we have 

343
00:16:11,000 --> 00:16:13,000
easy access to the front of the list, 

344
00:16:13,000 --> 00:16:14,000
and so 

345
00:16:14,000 --> 00:16:16,000
no 

346
00:16:16,000 --> 00:16:17,000
special 

347
00:16:17,000 --> 00:16:18,000
crazy code required. 

348
00:16:18,000 --> 00:16:21,000
Moving the cursor forward, also an easy thing to do, link 

349
00:16:21,000 --> 00:16:24,000
list are designed to make it easy to work your way from the front to 

350
00:16:24,000 --> 00:16:28,000
the back. And so moving the cursor forward is advancing that cursor by one step. So 

351
00:16:28,000 --> 00:16:32,000
in the array form of this, what was a cursor plus, plus, the 

352
00:16:32,000 --> 00:16:36,000
comparable form of that in the link list is cursor equals cursor next. 

353
00:16:36,000 --> 00:16:37,000
It has a little 

354
00:16:37,000 --> 00:16:40,000
if test there, all of the versions actually have something that is 

355
00:16:40,000 --> 00:16:42,000
comparable to this. If you're already at the end, then there's 

356
00:16:42,000 --> 00:16:46,000
nothing to be done, so you don't advance the cursor off into space. 

357
00:16:46,000 --> 00:16:47,000
You 

358
00:16:47,000 --> 00:16:51,000
check to make sure that there is something for it to point to. So in this case, 

359
00:16:51,000 --> 00:16:54,000
seeing if the cursor's next field is null, it's not, it points to Z. Then we go 

360
00:16:54,000 --> 00:16:58,000
ahead and update the cursor to there, 

361
00:16:58,000 --> 00:17:00,000
which has the effect of 

362
00:17:00,000 --> 00:17:03,000
changing things to where the Z is the character 

363
00:17:03,000 --> 00:17:07,000
to the left of the cursor, and then the YB follows it. 

364
00:17:07,000 --> 00:17:10,000
So moving this way, getting back to the beginning, and kind of walking our way down is kind of easy 

365
00:17:10,000 --> 00:17:11,000
to do. 

366
00:17:11,000 --> 00:17:15,000
Now we start to see where the link list causes us a little bit of grief, 

367
00:17:15,000 --> 00:17:18,000
moving the cursor to the end. 

368
00:17:18,000 --> 00:17:21,000
So now, starting at the beginning or the middle, or wherever I am, 

369
00:17:21,000 --> 00:17:25,000
I want to advance my way to the end. I don't know where the end is. 

370
00:17:25,000 --> 00:17:28,000
Link lists in general don't keep track of that, 

371
00:17:28,000 --> 00:17:31,000
that in the simple form of the single link list, I'm going to have to work to 

372
00:17:31,000 --> 00:17:34,000
find it. So I'm going to take where I am and walk my way down. 

373
00:17:34,000 --> 00:17:37,000
And this one actually just makes use of an existing 

374
00:17:37,000 --> 00:17:40,000
public number function to move cursor forward. It's like while the cursor next does 

375
00:17:40,000 --> 00:17:44,000
not equal null, which is to say while we are not pointing at the very last cell of the link list, 

376
00:17:44,000 --> 00:17:46,000
then keep going. 

377
00:17:46,000 --> 00:17:49,000
So advance cursor equals cursor. 

378
00:17:49,000 --> 00:17:50,000
Some will say, 

379
00:17:50,000 --> 00:17:54,000
"Is the cursor's next field null?" No, then set the cursor to be the cursor's 

380
00:17:54,000 --> 00:17:56,000
next field. 

381
00:17:56,000 --> 00:17:58,000
Is the cursor's next field null? 

382
00:17:58,000 --> 00:17:59,000


383
00:17:59,000 --> 00:18:02,000
No it's not, so advance the cursor to the next field. 

384
00:18:02,000 --> 00:18:04,000
Is the cursor's next field null? Yes. 

385
00:18:04,000 --> 00:18:06,000
So that would 

386
00:18:06,000 --> 00:18:08,000
the last thing in the buffer, 

387
00:18:08,000 --> 00:18:10,000
and so if we have 

388
00:18:10,000 --> 00:18:13,000
hundreds or thousands or 

389
00:18:13,000 --> 00:18:15,000
tens of characters, whichever it is, right, 

390
00:18:15,000 --> 00:18:18,000
and depending on how close we already were, it will walk through 

391
00:18:18,000 --> 00:18:21,000
everything from where the current editing position was to find the end of the 

392
00:18:21,000 --> 00:18:22,000
buffer 

393
00:18:22,000 --> 00:18:25,000
one by one, working it's way down. 

394
00:18:25,000 --> 00:18:26,000
Even more painful 

395
00:18:26,000 --> 00:18:28,000
operation 

396
00:18:28,000 --> 00:18:32,000
is the simple one of just moving backwards. 

397
00:18:32,000 --> 00:18:34,000
If I'm pointing right now to that B, 

398
00:18:34,000 --> 00:18:37,000
and I'd like to get back to that Y, the 

399
00:18:37,000 --> 00:18:41,000
link list is oblivious about how you got there. It knows where to go from 

400
00:18:41,000 --> 00:18:45,000
here, but that backing up is not supported in the simple form. 

401
00:18:45,000 --> 00:18:48,000
In order to find out what preceded the V, 

402
00:18:48,000 --> 00:18:51,000
our only option right now is to go back to the beginning, 

403
00:18:51,000 --> 00:18:53,000
and find it. 

404
00:18:53,000 --> 00:18:56,000
So starting this pointer CP at the head, 

405
00:18:56,000 --> 00:18:57,000
and saying, 

406
00:18:57,000 --> 00:18:58,000
do you point to; 

407
00:18:58,000 --> 00:19:01,000
is your next field the cursor? 

408
00:19:01,000 --> 00:19:04,000
And this one says, no. And then it says, well okay advance. Does your next field point to 

409
00:19:04,000 --> 00:19:08,000
the cursor? No it's not. Go to this one, does your next field point to the same place 

410
00:19:08,000 --> 00:19:12,000
as the cursor, yes. Wise next field, points to the same place the cursor 

411
00:19:12,000 --> 00:19:13,000
does. 

412
00:19:13,000 --> 00:19:18,000
That must mean that Y was the one that preceded the cursor in the link 

413
00:19:18,000 --> 00:19:22,000
list. So after having gone back to the beginning and walked all the way down, especially if I'm near 

414
00:19:22,000 --> 00:19:24,000
the end, that's a long traversal. 

415
00:19:24,000 --> 00:19:26,000
And when I get there, I can say "Oh yeah 

416
00:19:26,000 --> 00:19:34,000
that's the one, back it up to there." 

417
00:19:34,000 --> 00:19:37,000
Walking it all the way down; 

418
00:19:37,000 --> 00:19:41,000
some good link list coding, good practice in there, but 

419
00:19:41,000 --> 00:19:42,000
again a funny, 

420
00:19:42,000 --> 00:19:44,000
funny set of trade offs, right? 

421
00:19:44,000 --> 00:19:47,000
Does anybody have any questions about any part of the code? 

422
00:19:47,000 --> 00:19:49,000
At the end 

423
00:19:49,000 --> 00:19:51,000
could you just 

424
00:19:51,000 --> 00:19:53,000
save the address of the pointer, and then the 

425
00:19:53,000 --> 00:19:54,000
person 

426
00:19:54,000 --> 00:19:56,000
who wants to go to the end could just 

427
00:19:56,000 --> 00:19:59,000
access that? That was a great idea. It's like, so what about 

428
00:19:59,000 --> 00:20:02,000
improving this? Right now all we have is a pointer to the front, and each 

429
00:20:02,000 --> 00:20:06,000
pointer has only a pointer to the next. It's like, maybe we need to add some 

430
00:20:06,000 --> 00:20:07,000
more stuff, track 

431
00:20:07,000 --> 00:20:10,000
some more things. What if we tracked the 

432
00:20:10,000 --> 00:20:12,000
tail? Would that help us? 

433
00:20:12,000 --> 00:20:16,000
If we tracked the tail, we could move to the end quickly. Would 

434
00:20:16,000 --> 00:20:19,000
that help us moving backwards? 

435
00:20:19,000 --> 00:20:22,000
No. It solves one of our problems though, so you're on to something. 

436
00:20:22,000 --> 00:20:25,000
Let's look at what we've got, and then we'll start thinking about ways to fix it, because that's going to be 

437
00:20:25,000 --> 00:20:27,000
one of the pieces we're going to think about. 

438
00:20:27,000 --> 00:20:29,000
So if I move to link list 

439
00:20:29,000 --> 00:20:31,000
relative to what I have, 

440
00:20:31,000 --> 00:20:33,000
again, it's like a shell game. The old ends moved. 

441
00:20:33,000 --> 00:20:36,000
It used to be that editing was slow and we didn't like that. 

442
00:20:36,000 --> 00:20:38,000
Then we made big movement fast, 

443
00:20:38,000 --> 00:20:41,000
and now we have this funny thing were moving in certain ways is easy and 

444
00:20:41,000 --> 00:20:45,000
moving other ways is bad. So moving 

445
00:20:45,000 --> 00:20:46,000
down the document 

446
00:20:46,000 --> 00:20:49,000
is good, moving backwards in the document not good. 

447
00:20:49,000 --> 00:20:53,000
Moving back to the beginning is good, but moving to the end is bad. 

448
00:20:53,000 --> 00:20:56,000
But then editing in all situations is good, 

449
00:20:56,000 --> 00:20:59,000
another sort of quirky set of things. It just fells like it's like that game where the 

450
00:20:59,000 --> 00:21:00,000


451
00:21:00,000 --> 00:21:03,000
gophers pop there heads up, and you pound on them. And they just show up somewhere else. As 

452
00:21:03,000 --> 00:21:06,000
soon as you get one thing fixed, it seems like something else had 

453
00:21:06,000 --> 00:21:08,000
to give. 

454
00:21:08,000 --> 00:21:11,000
This, I think, would actually be my ideal editor, because it turns out I 

455
00:21:11,000 --> 00:21:14,000
have exactly this working style, which is like, I'll be down at the end of 

456
00:21:14,000 --> 00:21:17,000
the document, and I'll have to be generating new things. And 

457
00:21:17,000 --> 00:21:20,000
I'm getting tired and I don't like writing, and what I keep wanting to 

458
00:21:20,000 --> 00:21:24,000
do is I go back to the beginning. And I read that again, because I worked on that the most and 

459
00:21:24,000 --> 00:21:27,000
it's nice. And so I go back to the beginning, and I'm like oh yeah, look at this. Look at how lovely this 

460
00:21:27,000 --> 00:21:28,000
is, 

461
00:21:28,000 --> 00:21:31,000
and then I never want to go back to the end where all the trouble is. So I just 

462
00:21:31,000 --> 00:21:34,000
keep going back to the beginning and editing in the front, and then slowly get 

463
00:21:34,000 --> 00:21:37,000
back to the end. And then I don't like it again, and then I go back to the beginning. 

464
00:21:37,000 --> 00:21:39,000
But I never actually willingly go to the end 

465
00:21:39,000 --> 00:21:42,000
to face my fears. So 

466
00:21:42,000 --> 00:21:46,000
this is the editor I need. 

467
00:21:46,000 --> 00:21:47,000
Also 

468
00:21:47,000 --> 00:21:51,000
has a bit of kicker in terms of space. 

469
00:21:51,000 --> 00:21:56,000
So if the space for a character is 1 byte, which it typically is, 

470
00:21:56,000 --> 00:21:58,000
and the space for a pointer is 4 bytes, then 

471
00:21:58,000 --> 00:22:03,000
each of those link list cells is costing us 5 bytes, right? 

472
00:22:03,000 --> 00:22:03,000
Which is 

473
00:22:03,000 --> 00:22:07,000
sort of five times the amount of storage that the character itself 

474
00:22:07,000 --> 00:22:10,000
would have taken alone, 

475
00:22:10,000 --> 00:22:14,000
so in the vector it's fairly tight, a little bit more going up on that. But we've actually kind of blown 

476
00:22:14,000 --> 00:22:18,000
it up by another factor of two even beyond the stack, which is not quite a 

477
00:22:18,000 --> 00:22:22,000
good direction to be going. 

478
00:22:22,000 --> 00:22:24,000
So let's talk about what we can do to fix that. 

479
00:22:24,000 --> 00:22:27,000
I'm not going to go through the process of this. I'm just going to kind of give you a thought 

480
00:22:27,000 --> 00:22:29,000
exercise, so you can just kind of 

481
00:22:29,000 --> 00:22:30,000
visualize it. 

482
00:22:30,000 --> 00:22:34,000
What we have here, right, is an information problem. We 

483
00:22:34,000 --> 00:22:37,000
only are keeping track of certain 

484
00:22:37,000 --> 00:22:39,000
pathways through that link list. 

485
00:22:39,000 --> 00:22:42,000
What if we actually increased our access to the list? 

486
00:22:42,000 --> 00:22:46,000
We don't need to have the full access of a vector, being able to know the nth 

487
00:22:46,000 --> 00:22:49,000
character by number is not actually as far as we need to go. But we 

488
00:22:49,000 --> 00:22:53,000
do just need a little bit more coordination between a cell and its neighbors. 

489
00:22:53,000 --> 00:22:56,000
So the two things that would buy us out of a lot of the things that we're seeing 

490
00:22:56,000 --> 00:22:58,000
here is if we added a tail pointer. 

491
00:22:58,000 --> 00:23:01,000
So if I just add a tail pointer that points to the last cell, then I can move to the end 

492
00:23:01,000 --> 00:23:02,000
quickly. 

493
00:23:02,000 --> 00:23:05,000
You say cursor equals tail; the same that cursor equals head. So what was an O of N 

494
00:23:05,000 --> 00:23:07,000
problem now is an O of 1. 

495
00:23:07,000 --> 00:23:08,000
The backing up, 

496
00:23:08,000 --> 00:23:12,000
not as easily solved by just having one thing that's going on; 

497
00:23:12,000 --> 00:23:13,000
I also am going to need, 

498
00:23:13,000 --> 00:23:17,000
on a per cell basis this ability to move backwards. 

499
00:23:17,000 --> 00:23:19,000


500
00:23:19,000 --> 00:23:22,000
Well, it's just symmetry, right? If A knows where B is, and B knows where C is, and C knows where D is, what's to stop 

501
00:23:22,000 --> 00:23:25,000
it from also just tracking the information going in the other way, 

502
00:23:25,000 --> 00:23:30,000
so that D knows that it came from C, and C knows it came from B, and so on. 

503
00:23:30,000 --> 00:23:35,000
And so having this completely symmetric parallel set of links that just run the other 

504
00:23:35,000 --> 00:23:35,000
direction, 

505
00:23:35,000 --> 00:23:39,000
then buys you the ability to move backwards with ease. 

506
00:23:39,000 --> 00:23:42,000
The other thing that this would give you is the tail pointer 

507
00:23:42,000 --> 00:23:46,000
is almost not quite enough to give you the exact place of the end. It gives it to 

508
00:23:46,000 --> 00:23:47,000
you, but you also have to be 

509
00:23:47,000 --> 00:23:49,000
careful about what is it going to take to update it? 

510
00:23:49,000 --> 00:23:53,000
And so inserting can easily update the tail pointer. It's the deleting that kind of 

511
00:23:53,000 --> 00:23:55,000
would get you into trouble. If you were 

512
00:23:55,000 --> 00:23:58,000
here and deleting, you would have to kind of keep track of making sure if you were deleting 

513
00:23:58,000 --> 00:24:01,000
the thing that was the tail pointer, so you have to be a little about making sure 

514
00:24:01,000 --> 00:24:04,000
you don't lose track of your tail. 

515
00:24:04,000 --> 00:24:06,000
But once I have both of these in place, 

516
00:24:06,000 --> 00:24:09,000
I suddenly have something that can move 

517
00:24:09,000 --> 00:24:10,000
forward easily, O of 1; 

518
00:24:10,000 --> 00:24:15,000
move to the front easily, O of 1; move to the tail easily, O of 1; move backwards O of 1. 

519
00:24:15,000 --> 00:24:20,000
So I can make all of my operations 

520
00:24:20,000 --> 00:24:22,000
O of 1. 

521
00:24:22,000 --> 00:24:24,000
Deleting, you're still going to have to delete the individual cells, but there's one place where 

522
00:24:24,000 --> 00:24:27,000
it suffers, but that's a much more rare occurrence. 

523
00:24:27,000 --> 00:24:31,000
And I can get all six; it moves fast, in every dimension; 

524
00:24:31,000 --> 00:24:32,000
inserts and deletes 

525
00:24:32,000 --> 00:24:35,000
all good. 

526
00:24:35,000 --> 00:24:37,000
And there are two places we paid for it. 

527
00:24:37,000 --> 00:24:40,000
One is certainly the memory. 

528
00:24:40,000 --> 00:24:45,000
So a 4 byte pointer one direction, a 4 byte pointer another direction, 

529
00:24:45,000 --> 00:24:49,000
plus the 1 byte for the character means we have a 9 byte per 

530
00:24:49,000 --> 00:24:50,000
character 

531
00:24:50,000 --> 00:24:52,000
storage usage. 

532
00:24:52,000 --> 00:24:55,000
And the other way, which is a little bit hard to represent with some number 

533
00:24:55,000 --> 00:24:57,000
in a quantitative way, but actually it's also an important factor, which is 

534
00:24:57,000 --> 00:25:00,000
your code got a lot more complicated. 

535
00:25:00,000 --> 00:25:02,000
If you looked at the code for the vector of the stack version, it was a few 

536
00:25:02,000 --> 00:25:03,000
lines here a few lines there. 

537
00:25:03,000 --> 00:25:06,000
If you looked at the code for the link list singly we looked at, it's like you've got 

538
00:25:06,000 --> 00:25:07,000
to wire up these pointers. 

539
00:25:07,000 --> 00:25:10,000
Now imagine you've got to wire up twice as many pointers. 

540
00:25:10,000 --> 00:25:14,000
Not only do you have the in and out going one direction, you'll have the in and out going the 

541
00:25:14,000 --> 00:25:15,000
other direction, 

542
00:25:15,000 --> 00:25:17,000
and just the 

543
00:25:17,000 --> 00:25:21,000
opportunity to make errors with pointers has now gone up by a factor of two. 

544
00:25:21,000 --> 00:25:24,000
The idea that you could get a vector version of this up and running in 

545
00:25:24,000 --> 00:25:26,000
no time, 

546
00:25:26,000 --> 00:25:30,000
compared to several days, let's say, of getting the doubly link one working. It 

547
00:25:30,000 --> 00:25:32,000
might be a factor to keep in mind. 

548
00:25:32,000 --> 00:25:36,000
If you're kind of investing for the long haul, sure, work hard, but know 

549
00:25:36,000 --> 00:25:41,000
that it wasn't for free. 

550
00:25:41,000 --> 00:25:42,000
So you've 

551
00:25:42,000 --> 00:25:45,000
got 89 percent overhead. 

552
00:25:45,000 --> 00:25:49,000
That's a lot of overhead for small bits of data. 

553
00:25:49,000 --> 00:25:53,000
More likely, and this is actually the way that most modern word processors 

554
00:25:53,000 --> 00:25:54,000
do do this, 

555
00:25:54,000 --> 00:25:58,000
is they don't just make one or the other; it's not just a array or a stack or a link list. 

556
00:25:58,000 --> 00:25:59,000
They 

557
00:25:59,000 --> 00:26:04,000
actually look at ways to combine the features of both to get a hybrid, 

558
00:26:04,000 --> 00:26:05,000
where you can take 

559
00:26:05,000 --> 00:26:10,000
some of the advantages of one pipe of data, but also try to avoid its worst 

560
00:26:10,000 --> 00:26:13,000
weaknesses. So the most common way they're going to do this is some kind of 

561
00:26:13,000 --> 00:26:15,000
chunking strategy, 

562
00:26:15,000 --> 00:26:17,000
where you have a segment of the 

563
00:26:17,000 --> 00:26:21,000
buffer that kind of is moved aside and worked on in one little array. It 

564
00:26:21,000 --> 00:26:25,000
might be that those arrays are ten or 20 or a sentence long, or paragraph long. 

565
00:26:25,000 --> 00:26:28,000
It might be that they are actually dictated by when you change styles. 

566
00:26:28,000 --> 00:26:30,000
All of the code that's in one font 

567
00:26:30,000 --> 00:26:33,000
kind of gets moved into one chunk, and as soon as you change styles, it breaks 

568
00:26:33,000 --> 00:26:37,000
into a new chunk that follows to kind of keep track of the formatting information. It 

569
00:26:37,000 --> 00:26:40,000
depends on some of the other features that are present in the processor 

570
00:26:40,000 --> 00:26:43,000
here, but it's likely that rather than having all 1,000 or all 10 million 

571
00:26:43,000 --> 00:26:45,000
characters in one 

572
00:26:45,000 --> 00:26:46,000
data structure, 

573
00:26:46,000 --> 00:26:47,000
they're actually kind of separated 

574
00:26:47,000 --> 00:26:50,000
to where you have a little bit of both. So the most likely thing is that 

575
00:26:50,000 --> 00:26:53,000
there is some kind of array strategy on a chunk basis. 

576
00:26:53,000 --> 00:26:56,000
There's a linking between those chunks. 

577
00:26:56,000 --> 00:26:59,000
And what this allows you do is to 

578
00:26:59,000 --> 00:27:01,000
share that overhead cost. 

579
00:27:01,000 --> 00:27:05,000
So in this case, if I had every sentence in a chunk by itself, 

580
00:27:05,000 --> 00:27:08,000
and then if I had next and previous pointer, which pointed to the next sentence and the previous 

581
00:27:08,000 --> 00:27:09,000


582
00:27:09,000 --> 00:27:11,000
sentence, when I need to move the cursor forward or backwards 

583
00:27:11,000 --> 00:27:15,000
within a chunk, it's just in changing this index. Oh, it's in index two, but 

584
00:27:15,000 --> 00:27:16,000
now it's moving to three, or four, or five. 

585
00:27:16,000 --> 00:27:20,000
And then when it gets to the end of that chunk, to the end of that sentence, and you move forward, well then it 

586
00:27:20,000 --> 00:27:23,000
follows the next link forward. 

587
00:27:23,000 --> 00:27:25,000
Similarly moving backwards, right, so 

588
00:27:25,000 --> 00:27:29,000
within a chunk doing array-like manipulations, and then as it crosses the 

589
00:27:29,000 --> 00:27:32,000
boundary using a link list steps to get further down. 

590
00:27:32,000 --> 00:27:35,000
So when it needs to move to the front or the back, those things are easily 

591
00:27:35,000 --> 00:27:37,000
accessible using head and tail pointers. 

592
00:27:37,000 --> 00:27:39,000
When I need to do an edit, 

593
00:27:39,000 --> 00:27:42,000
it operates in this array-like kind of 

594
00:27:42,000 --> 00:27:44,000
way 

595
00:27:44,000 --> 00:27:46,000
when there's space within that chunk. So you start editing in a sentence, then it 

596
00:27:46,000 --> 00:27:49,000
would kind of do some shifting on that array. But the array itself being pretty 

597
00:27:49,000 --> 00:27:51,000
small 

598
00:27:51,000 --> 00:27:53,000
means that although you are paying that cost of shuffling, 

599
00:27:53,000 --> 00:27:56,000
you are doing it on a smaller chunk. It's not 1,000 characters that have to move, it's 

600
00:27:56,000 --> 00:27:59,000
20 or 40 that have to move within that sentence. 

601
00:27:59,000 --> 00:28:02,000
And that when 

602
00:28:02,000 --> 00:28:04,000
you need to insert a new sentence or a new chunk in the middle, you are doing 

603
00:28:04,000 --> 00:28:08,000
link list-like manipulations to build a new chunk and insert and splice, 

604
00:28:08,000 --> 00:28:11,000
so you have that flexibility, where the many characters aren't 

605
00:28:11,000 --> 00:28:14,000
all affected by it. So you have kind of local array editing 

606
00:28:14,000 --> 00:28:16,000
and global link list editing 

607
00:28:16,000 --> 00:28:18,000
that gives you kind of the best of both worlds. 

608
00:28:18,000 --> 00:28:19,000


609
00:28:19,000 --> 00:28:23,000
But again, the factor is cost showing up in code complexity. 

610
00:28:23,000 --> 00:28:27,000
That what you are looking at is kind of space/time tradeoff. Well, I can throw space at 

611
00:28:27,000 --> 00:28:29,000
this, and most people will say that in computer science, "Well I can throw space and this and 

612
00:28:29,000 --> 00:28:31,000
get better times." So 

613
00:28:31,000 --> 00:28:35,000
the double link list is a good example of something that throws space at a problem to get better 

614
00:28:35,000 --> 00:28:37,000
time. There's a lot of overhead. 

615
00:28:37,000 --> 00:28:39,000
Moving to the chunk list says I want to get back some of that space. I'm 

616
00:28:39,000 --> 00:28:43,000
not willing to invest 90 percent overhead to get this. Can I find a way to 

617
00:28:43,000 --> 00:28:44,000
take a 

618
00:28:44,000 --> 00:28:47,000
whole sentence worth of things and add a few pointers here? That means that 

619
00:28:47,000 --> 00:28:48,000
there's only two 

620
00:28:48,000 --> 00:28:51,000
4 byte pointers per each sentence as opposed to each character, 

621
00:28:51,000 --> 00:28:53,000
and that really way reduces the overhead. 

622
00:28:53,000 --> 00:28:55,000
But then now, 

623
00:28:55,000 --> 00:28:58,000
the problem is still back on your shoulders in terms of effort, 

624
00:28:58,000 --> 00:29:02,000
now I have both an array, and a link list, and double links, and 

625
00:29:02,000 --> 00:29:03,000
a 

626
00:29:03,000 --> 00:29:07,000
lot of complicated stuff flying around to make it work. 

627
00:29:07,000 --> 00:29:08,000
But it does work out. 

628
00:29:08,000 --> 00:29:11,000
It's kind of the process that designers go through when 

629
00:29:11,000 --> 00:29:12,000
they're building these kinds of 

630
00:29:12,000 --> 00:29:16,000
key structures. It like, what are the options, and do any of the basic 

631
00:29:16,000 --> 00:29:18,000
things we know about work? What about 

632
00:29:18,000 --> 00:29:21,000
building even fancier things that kind of take the best of both world and mix 

633
00:29:21,000 --> 00:29:23,000
them together and have a little bit of the weaknesses of this a little of that, but none of 

634
00:29:23,000 --> 00:29:26,000
the really awful 

635
00:29:26,000 --> 00:29:27,000
worst cases 

636
00:29:27,000 --> 00:29:28,000
present in the end result. 

637
00:29:28,000 --> 00:29:31,000
So in this case you'd have a lot of things that were constant time where constant was 

638
00:29:31,000 --> 00:29:33,000
defined by the chunk size. 

639
00:29:33,000 --> 00:29:37,000
So O of 100, let's say, if that was the maximum chunk size 

640
00:29:37,000 --> 00:29:39,000
to do these things, as opposed to O of 

641
00:29:39,000 --> 00:29:42,000
N for the whole block net, 

642
00:29:42,000 --> 00:29:43,000
so kind of a neat 

643
00:29:43,000 --> 00:29:45,000
little thing to think through. 

644
00:29:45,000 --> 00:29:48,000
There was a time, just so you can feel 

645
00:29:48,000 --> 00:29:51,000
gratified about how I've worked you very hard, but actually in the past I've been 

646
00:29:51,000 --> 00:29:53,000
even harsher. 

647
00:29:53,000 --> 00:29:57,000
We actually had them build the doubly linked chunked list editor buffer, 

648
00:29:57,000 --> 00:30:00,000
and now we have you build the singly linked chunk list PQ, and it's 

649
00:30:00,000 --> 00:30:02,000
actually 

650
00:30:02,000 --> 00:30:06,000
still very complicated, as you'll find out, but not nearly quite as hairy as the 

651
00:30:06,000 --> 00:30:09,000
full in both directions every way 

652
00:30:09,000 --> 00:30:11,000
craziness. 

653
00:30:11,000 --> 00:30:14,000
And you can kind of imagine in your mind. 

654
00:30:14,000 --> 00:30:16,000
Any questions about edited buffer? 

655
00:30:16,000 --> 00:30:18,000
We?ve got two more data structures to implement, 

656
00:30:18,000 --> 00:30:20,000
two more things that we've been using, 

657
00:30:20,000 --> 00:30:25,000
and happy to have in our arsenal that we need to know how they work. 

658
00:30:25,000 --> 00:30:29,000
Let's talk about map first, and in 

659
00:30:29,000 --> 00:30:31,000
fact the strategy that I'm going to end up showing you for map is actually going to be 

660
00:30:31,000 --> 00:30:35,000
the same one we use for set, then we'll come back and think of an even more clever 

661
00:30:35,000 --> 00:30:37,000
way to do map, too. So 

662
00:30:37,000 --> 00:30:40,000
we've got two things to talk about, which are binary search trees and hashing. We'll 

663
00:30:40,000 --> 00:30:41,000
do binary search trees first, 

664
00:30:41,000 --> 00:30:44,000
and then we'll get back to the hashing the second time around. 

665
00:30:44,000 --> 00:30:46,000
So 

666
00:30:46,000 --> 00:30:49,000
map is, next to vector, probably the most useful thing you've got going on in 

667
00:30:49,000 --> 00:30:50,000
your class library, 

668
00:30:50,000 --> 00:30:53,000
all kinds of things that do look up based 

669
00:30:53,000 --> 00:30:56,000
activities; looking up students by ID number, 

670
00:30:56,000 --> 00:30:57,000
phone numbers, 

671
00:30:57,000 --> 00:31:00,000
by name, any of these kinds of things where you need to have this key at associated satellite 

672
00:31:00,000 --> 00:31:02,000
information. And 

673
00:31:02,000 --> 00:31:05,000
you'd like to be able to do that retrieval and update quickly. 

674
00:31:05,000 --> 00:31:08,000
The map is the one for you. 

675
00:31:08,000 --> 00:31:11,000
So if you remember it's key value fears, it's all about key being the 

676
00:31:11,000 --> 00:31:15,000
identifying mark that allows you to find the associated value. And then the value 

677
00:31:15,000 --> 00:31:18,000
is actually just being stored and kind of retrieved without using it 

678
00:31:18,000 --> 00:31:21,000
or examining it in any kind of interesting way. 

679
00:31:21,000 --> 00:31:25,000
And what we're really interested in here is how to make this guy work really efficiently. So efficiently 

680
00:31:25,000 --> 00:31:27,000
being hopefully log in or better, 

681
00:31:27,000 --> 00:31:31,000
if we could get that on both the update operations that add and remove things 

682
00:31:31,000 --> 00:31:33,000
from the map, as well as the 

683
00:31:33,000 --> 00:31:34,000
look up operation, then 

684
00:31:34,000 --> 00:31:37,000
that would please probably all our clients immensely, if we could get both of 

685
00:31:37,000 --> 00:31:41,000
those going well. 

686
00:31:41,000 --> 00:31:45,000
So I'm going to build you on that. I probably have enough time to do that. 

687
00:31:45,000 --> 00:31:46,000
That 

688
00:31:46,000 --> 00:31:48,000
attacks it from 

689
00:31:48,000 --> 00:31:53,000
using simple tools to start with, just to kind of get a base line for what we can do. 

690
00:31:53,000 --> 00:31:54,000


691
00:31:54,000 --> 00:31:56,000
Vector, our old friend vector, 

692
00:31:56,000 --> 00:31:59,000
can't get too much mileage out of vector, apparently, because you can always find a 

693
00:31:59,000 --> 00:32:00,000
way to make it do something 

694
00:32:00,000 --> 00:32:01,000
useful for you. 

695
00:32:01,000 --> 00:32:02,000
It 

696
00:32:02,000 --> 00:32:03,000
gives us the convenience 

697
00:32:03,000 --> 00:32:05,000


698
00:32:05,000 --> 00:32:07,000
of managing that thing with low overhead, 

699
00:32:07,000 --> 00:32:09,000
direct access is good. 

700
00:32:09,000 --> 00:32:11,000
We make a parastruct 

701
00:32:11,000 --> 00:32:13,000
that has the key and the value together. 

702
00:32:13,000 --> 00:32:16,000
We store it into a vector, 

703
00:32:16,000 --> 00:32:18,000
and then we 

704
00:32:18,000 --> 00:32:21,000
may or may not sort that vector. But if it's sorted, there's probably some good reasons. Think 

705
00:32:21,000 --> 00:32:22,000
about 

706
00:32:22,000 --> 00:32:26,000
what order to sort it in, what information to be using to sort that, 

707
00:32:26,000 --> 00:32:28,000
and then we'll get value to add. We're just going to go 

708
00:32:28,000 --> 00:32:30,000
through doing it, rather than talk about it. 

709
00:32:30,000 --> 00:32:32,000
We'll just go over and 

710
00:32:32,000 --> 00:32:36,000
make a vector happen. 

711
00:32:36,000 --> 00:32:38,000
Let's get over here in X-code. 

712
00:32:38,000 --> 00:32:40,000
Let's take a look at 

713
00:32:40,000 --> 00:32:42,000
my map dot H, 

714
00:32:42,000 --> 00:32:43,000
and all 

715
00:32:43,000 --> 00:32:46,000
I'm going to put in there is actually add and get value. 

716
00:32:46,000 --> 00:32:48,000
And 

717
00:32:48,000 --> 00:32:50,000
the other one contains key and remove." 

718
00:32:50,000 --> 00:32:52,000


719
00:32:52,000 --> 00:32:55,000
But these are these two operations that really matter to us, getting something in there, and 

720
00:32:55,000 --> 00:32:57,000
getting something back out. 

721
00:32:57,000 --> 00:32:58,000


722
00:32:58,000 --> 00:33:00,000
And so I'm going to start by 

723
00:33:00,000 --> 00:33:02,000
building this thing on vector. So let me go ahead and 

724
00:33:02,000 --> 00:33:14,000
indicate that I plan to use vector as part of 

725
00:33:14,000 --> 00:33:15,000
what I'm doing. So I defined a little structure 

726
00:33:15,000 --> 00:33:20,000
that's the pair. I'm going to call that "pair" 

727
00:33:20,000 --> 00:33:23,000
even, that's a nice word for it. So the pair of the key and the value that go together, 

728
00:33:23,000 --> 00:33:29,000
and then what I will store here 

729
00:33:29,000 --> 00:33:31,000
is a vector of those pairs. They'll 

730
00:33:31,000 --> 00:33:36,000
give me a key to value; I'll stick them together in that little struct, stick them into my vector, and 

731
00:33:36,000 --> 00:33:38,000
hopefully be able to retrieve them later. 

732
00:33:38,000 --> 00:33:42,000
So given that I'm depending only on vector, I don?t have any other 

733
00:33:42,000 --> 00:33:45,000
information, I don't technically need to go out of my way to disallow then 

734
00:33:45,000 --> 00:33:48,000
memory-wise copying, because vector does a deep copy. But I'm going to leave it in there, because I 

735
00:33:48,000 --> 00:33:52,000
plan on going some place that is eventually going to need it. I might as well be safe from 

736
00:33:52,000 --> 00:33:54,000
the get go. 

737
00:33:54,000 --> 00:33:55,000
The vector 

738
00:33:55,000 --> 00:34:00,000
will be set up and destroyed automatically as part of management of my 

739
00:34:00,000 --> 00:34:04,000
data member. So I actually don't have any explicit allocation/deallocation, so if I look at my 

740
00:34:04,000 --> 00:34:06,000
constructor and destructor they'll be totally empty. 

741
00:34:06,000 --> 00:34:11,000
And then add and get value are where I'm going to actually get to do some work. 

742
00:34:11,000 --> 00:34:14,000
Before I implement them, I'm just going to think for a second about 

743
00:34:14,000 --> 00:34:16,000
how this is going to play out. 

744
00:34:16,000 --> 00:34:20,000
Let's say I'm doing a vector that's storing 

745
00:34:20,000 --> 00:34:26,000
strings and their length. Let's say, it's just a map of 

746
00:34:26,000 --> 00:34:27,000
string, 

747
00:34:27,000 --> 00:34:30,000


748
00:34:30,000 --> 00:34:34,000
and if I do M dot add 

749
00:34:34,000 --> 00:34:37,000
car. I'm going to store a three with it, 

750
00:34:37,000 --> 00:34:39,000
so on the other side of the wall, 

751
00:34:39,000 --> 00:34:41,000
what we're going to do is make a little struct 

752
00:34:41,000 --> 00:34:43,000
that has the word car 

753
00:34:43,000 --> 00:34:47,000
and the number three with it, and we package it into our ray. 

754
00:34:47,000 --> 00:34:51,000
And so then we get a second call to put something in like 

755
00:34:51,000 --> 00:34:55,000
dog, also with a three, then we'll make a second one of these and stick it in 

756
00:34:55,000 --> 00:34:58,000
our ray. 

757
00:34:58,000 --> 00:35:01,000
And so on, so the idea is to have the vector really kind of manage where the storage is going. And 

758
00:35:01,000 --> 00:35:02,000
then what 

759
00:35:02,000 --> 00:35:05,000
we'll just do is package it up and stick it in there. 

760
00:35:05,000 --> 00:35:08,000
The one thing, though, that I have to think a little bit about, because it sounds 

761
00:35:08,000 --> 00:35:11,000
like add should be pretty easy. It almost seems like what I need to do is something 

762
00:35:11,000 --> 00:35:13,000
as simple as this. And 

763
00:35:13,000 --> 00:35:15,000
I'll start to write the code, and then you can tell me why 

764
00:35:15,000 --> 00:35:17,000
this isn't what I want to do. 

765
00:35:17,000 --> 00:35:20,000
I say pair TP, and I say P dot 

766
00:35:20,000 --> 00:35:23,000
key equals key P dot 

767
00:35:23,000 --> 00:35:30,000
value equals val, and then I say entries dot add P. 

768
00:35:30,000 --> 00:35:32,000
Almost, but not quite what I want, 

769
00:35:32,000 --> 00:35:34,000
what have I forgotten 

770
00:35:34,000 --> 00:35:39,000
to take care of? 

771
00:35:39,000 --> 00:35:42,000
If it's not already in there; 

772
00:35:42,000 --> 00:35:46,000
this has to do with just what's the interface for nap. What does nap say about it if you 

773
00:35:46,000 --> 00:35:48,000
try to add 

774
00:35:48,000 --> 00:35:51,000
for a key a different value. So in the case of this one, it didn't quite make sense that I would add 

775
00:35:51,000 --> 00:35:55,000
something, but let's say I accidentally put car in there with a link to the first time that I went 

776
00:35:55,000 --> 00:35:57,000
to go fix it, right, 

777
00:35:57,000 --> 00:36:01,000
that my subsequent to ask it store car with the proper value, 

778
00:36:01,000 --> 00:36:03,000
three, 

779
00:36:03,000 --> 00:36:05,000
doesn't create a totally new entry. 

780
00:36:05,000 --> 00:36:09,000
It has to find the existing entry that already has car and overwrite it. 

781
00:36:09,000 --> 00:36:13,000
So the behavior of mad was it could add when it was not previously existing. 

782
00:36:13,000 --> 00:36:16,000
It also could be an overwrite of an existing value. So we do need 

783
00:36:16,000 --> 00:36:17,000
to find 

784
00:36:17,000 --> 00:36:20,000
if there is an entry that already has that key, and just 

785
00:36:20,000 --> 00:36:21,000
replace its value 

786
00:36:21,000 --> 00:36:23,000
rather than 

787
00:36:23,000 --> 00:36:26,000
add a second entry with the same key. So at any given point, right, in this whole 

788
00:36:26,000 --> 00:36:28,000
vector, there should be one entry 

789
00:36:28,000 --> 00:36:30,000
for each unique key. 

790
00:36:30,000 --> 00:36:33,000
So I have to do a search. And 

791
00:36:33,000 --> 00:36:35,000
let me 

792
00:36:35,000 --> 00:36:38,000
write this code, and then I'm going to mention why I'm going to decompose it in a 

793
00:36:38,000 --> 00:36:39,000
second. 

794
00:36:39,000 --> 00:36:42,000
If I do entries dot size, I plus plus 

795
00:36:42,000 --> 00:36:44,000
then if 

796
00:36:44,000 --> 00:36:49,000
entries of I dot key equals key. 

797
00:36:49,000 --> 00:36:51,000
Then 

798
00:36:51,000 --> 00:36:52,000
let's break. Let me pull out 

799
00:36:52,000 --> 00:37:00,000
I so I can get access to it when I'm done. So after this 

800
00:37:00,000 --> 00:37:01,000
loop exits, 

801
00:37:01,000 --> 00:37:06,000
if I is less than entries dot size, 

802
00:37:06,000 --> 00:37:06,000
then 

803
00:37:06,000 --> 00:37:09,000
I know that it found a match. 

804
00:37:09,000 --> 00:37:12,000
In the case where it went through all of them, and it never hit the break. Then I will 

805
00:37:12,000 --> 00:37:15,000
have gone all the way to where it is exactly equal to entries dot size; if it didn't then I 

806
00:37:15,000 --> 00:37:16,000
just 

807
00:37:16,000 --> 00:37:19,000
have a place to overwrite. And I can say entries of 

808
00:37:19,000 --> 00:37:21,000
I dot val 

809
00:37:21,000 --> 00:37:23,000
equals val. 

810
00:37:23,000 --> 00:37:27,000
Otherwise I go through this process of adding a new one. I had to 

811
00:37:27,000 --> 00:37:30,000
go hunt that thing down 

812
00:37:30,000 --> 00:37:33,000
and replace it. 

813
00:37:33,000 --> 00:37:35,000
That code, 

814
00:37:35,000 --> 00:37:38,000
when you look at it, you can say, "Gosh I feel like that code's going to be familiar." 

815
00:37:38,000 --> 00:37:43,000
Because that idea of searching to find a match to the key 

816
00:37:43,000 --> 00:37:45,000
probably is going to come in really handy 

817
00:37:45,000 --> 00:37:46,000
when I need to do get value, 

818
00:37:46,000 --> 00:37:50,000
but it has the same exact problem of oh, I've got to go find that match. So in fact, 

819
00:37:50,000 --> 00:37:55,000
that kind of motivates the idea of why don't I just take this little piece of code 

820
00:37:55,000 --> 00:37:59,000
and separate it out and do a helper that they can both use. I 

821
00:37:59,000 --> 00:38:02,000
may not have planned for this from the beginning, but once I see it happening, I might as well fix 

822
00:38:02,000 --> 00:38:05,000
it. So I can say find index 

823
00:38:05,000 --> 00:38:07,000
for key that 

824
00:38:07,000 --> 00:38:10,000
given a key will return to you 

825
00:38:10,000 --> 00:38:11,000


826
00:38:11,000 --> 00:38:15,000
the vector index at which 

827
00:38:15,000 --> 00:38:19,000
that key is or a negative one. I don't know if I need to keep that 

828
00:38:19,000 --> 00:38:22,000
one anymore. 

829
00:38:22,000 --> 00:38:24,000
If it ever found it, it returns I otherwise 

830
00:38:24,000 --> 00:38:28,000
it returned negative one. That can be our not found. 

831
00:38:28,000 --> 00:38:37,000
I can actually use that up here, 

832
00:38:37,000 --> 00:38:40,000
and then I can say, if found 

833
00:38:40,000 --> 00:38:43,000
does not equal negative one. 

834
00:38:43,000 --> 00:38:44,000
Then we do that, 

835
00:38:44,000 --> 00:38:46,000
and then that 

836
00:38:46,000 --> 00:38:48,000
tells us that 

837
00:38:48,000 --> 00:38:51,000
this little piece of code is going to come in handy right up here, 

838
00:38:51,000 --> 00:38:55,000
get value if found does not equal 

839
00:38:55,000 --> 00:38:56,000
negative one, then we can return that 

840
00:38:56,000 --> 00:39:02,000
and what is the behavior of get value if it didn't find it? Does anybody remember? 

841
00:39:02,000 --> 00:39:06,000
I ask it to get the value in this case for lollipop and it's not 

842
00:39:06,000 --> 00:39:07,000
there, 

843
00:39:07,000 --> 00:39:13,000
does it turn zero, what does it 

844
00:39:13,000 --> 00:39:22,000
do? Untracked forget value, remember? 

845
00:39:22,000 --> 00:39:25,000
This is an error. 

846
00:39:25,000 --> 00:39:26,000
There is not sort of a 

847
00:39:26,000 --> 00:39:30,000
general case return value you can produce here 

848
00:39:30,000 --> 00:39:33,000
that will really sort of sufficiently describe to someone who's made 

849
00:39:33,000 --> 00:39:37,000
this call in error what happened, right? I can't return zero, because it happens to be 

850
00:39:37,000 --> 00:39:40,000
that sometimes I?m putting in, and sometimes I'm putting strings or structs 

851
00:39:40,000 --> 00:39:44,000
kinds of things. There's no general zero or negative one to use, 

852
00:39:44,000 --> 00:39:47,000
so the behavior of get value is if you ask it to retrieve a value for a key, it ought to be 

853
00:39:47,000 --> 00:39:48,000
there. So 

854
00:39:48,000 --> 00:39:52,000
the corresponding implementation would likely also just go through 

855
00:39:52,000 --> 00:39:56,000
the process of contains key, which we call find index for key and check. 

856
00:39:56,000 --> 00:39:58,000
It could be used by the client 

857
00:39:58,000 --> 00:40:01,000
if they need to know in advance that it's really there. 

858
00:40:01,000 --> 00:40:04,000
So let's see how I'm 

859
00:40:04,000 --> 00:40:06,000
doing. Let's go back to my 

860
00:40:06,000 --> 00:40:11,000
code here, 

861
00:40:11,000 --> 00:40:13,000
and this 

862
00:40:13,000 --> 00:40:15,000
being 

863
00:40:15,000 --> 00:40:16,000
not my specialty, 

864
00:40:16,000 --> 00:40:20,000
and just see that I can put something in and 

865
00:40:20,000 --> 00:40:25,000
get it back out would be a good first test to see that things are 

866
00:40:25,000 --> 00:40:29,000
going as they should. Oh, no it didn't like that. Oh yeah, find 

867
00:40:29,000 --> 00:40:30,000
index for key. 

868
00:40:30,000 --> 00:40:32,000
Find index for key 

869
00:40:32,000 --> 00:40:34,000
was not 

870
00:40:34,000 --> 00:40:38,000
populated clear in my dot H, so let me go move it there. I'll put 

871
00:40:38,000 --> 00:40:42,000
it in the private section, because I don?t want this to be something that is 

872
00:40:42,000 --> 00:40:45,000
exposed outside of the implementation. The idea that there is an index and 

873
00:40:45,000 --> 00:40:48,000
there is a vector is not really something that I would want to make part of my 

874
00:40:48,000 --> 00:40:54,000
public interface. It's really just an internal detail of how I'm managing stuff. 

875
00:40:54,000 --> 00:40:56,000
Something about what I did, oh yeah, I 

876
00:40:56,000 --> 00:41:00,000
used the right name for that. 

877
00:41:00,000 --> 00:41:05,000
One more case let 

878
00:41:05,000 --> 00:41:06,000
me just take a look at them both 

879
00:41:06,000 --> 00:41:09,000
before I let the compiler tell me what is and what isn't right about them. Their both 

880
00:41:09,000 --> 00:41:13,000
doing find index for key using linear search. If it comes back at not negative one 

881
00:41:13,000 --> 00:41:17,000
pulling out the matching value and overriding it, and then in the case of adding 

882
00:41:17,000 --> 00:41:23,000
or replacing the error - 

883
00:41:23,000 --> 00:41:24,000
so the 

884
00:41:24,000 --> 00:41:26,000
previously stored value for that is three, 

885
00:41:26,000 --> 00:41:29,000
and if I go and I do one of those tests it's always good to know what 

886
00:41:29,000 --> 00:41:34,000
happens if I ask it for something that doesn't exist. 

887
00:41:34,000 --> 00:41:37,000
Making sure that the error handling that I put in there does 

888
00:41:37,000 --> 00:41:43,000
what it was supposed to do, right? Are 

889
00:41:43,000 --> 00:41:46,000
there any questions about the code that I wrote here? 

890
00:41:46,000 --> 00:41:49,000
The vector is always kind of a good starting place for these things, because actually it just 

891
00:41:49,000 --> 00:41:53,000
tends to use very simple things, and it tends to lend itself to easy code. 

892
00:41:53,000 --> 00:41:57,000
But because of vector constraints being at a contiguous back structure, right, there's likely to 

893
00:41:57,000 --> 00:41:59,000
be some 

894
00:41:59,000 --> 00:42:01,000
performance implication. And in 

895
00:42:01,000 --> 00:42:02,000
particular, 

896
00:42:02,000 --> 00:42:04,000
in the unsorted case here, 

897
00:42:04,000 --> 00:42:07,000
both add and get value 

898
00:42:07,000 --> 00:42:09,000
involve doing a linear search. 

899
00:42:09,000 --> 00:42:12,000
In some cases it will find it quickly at the very beginning or even half way 

900
00:42:12,000 --> 00:42:13,000
through. 

901
00:42:13,000 --> 00:42:17,000
In other cases it won't find it at all and have to search through the whole thing. So on 

902
00:42:17,000 --> 00:42:20,000
average, if you figure it's equally likely to be in any position or not there at all, 

903
00:42:20,000 --> 00:42:23,000
it's at going to at least have to look at half of them or more. 

904
00:42:23,000 --> 00:42:25,000
And so we can say they're both linear. 

905
00:42:25,000 --> 00:42:28,000
If we change it to be sorted, 

906
00:42:28,000 --> 00:42:31,000
which is the first obvious improvement to make here, 

907
00:42:31,000 --> 00:42:34,000
then get value 

908
00:42:34,000 --> 00:42:37,000
gets a lot faster, because we can take advantage of binary search. 

909
00:42:37,000 --> 00:42:39,000
I sort it by key, 

910
00:42:39,000 --> 00:42:43,000
so ignoring the value, values are just satellite data. But stored in order of 

911
00:42:43,000 --> 00:42:44,000
key, 

912
00:42:44,000 --> 00:42:48,000
then all the A words, B words, C words, D words, what not, so 

913
00:42:48,000 --> 00:42:51,000
walking it down the middle and seeing which half you look at, then looking at the 

914
00:42:51,000 --> 00:42:53,000
quarter of there and the eight of there. It's going to very quickly narrow down on 

915
00:42:53,000 --> 00:42:57,000
where it had to be if it was there or if it wasn't there at all. 

916
00:42:57,000 --> 00:43:00,000
So we can implement that in logarithmic time, meaning that if you 1,000 

917
00:43:00,000 --> 00:43:04,000
entries, it takes ten comparisons to find it or agree that it's not there. 

918
00:43:04,000 --> 00:43:08,000
If you double that it will take one more comparison. If 

919
00:43:08,000 --> 00:43:12,000
you go to 2,000, negligibly faster, even numbers in the millions 

920
00:43:12,000 --> 00:43:15,000
are still just a handful 

921
00:43:15,000 --> 00:43:19,000
of comparisons to work it out, and say, 

922
00:43:19,000 --> 00:43:22,000
to the twentieth for example is roughly a million; so 20 comparisons in or out, 

923
00:43:22,000 --> 00:43:25,000
and so it didn't look at a million things a million times over. 

924
00:43:25,000 --> 00:43:27,000
However, 

925
00:43:27,000 --> 00:43:30,000
that didn't improve add, 

926
00:43:30,000 --> 00:43:31,000
why not? 

927
00:43:31,000 --> 00:43:33,000
Why does keeping an assorted order 

928
00:43:33,000 --> 00:43:37,000
not have the same boost for add as it did for get value? 

929
00:43:37,000 --> 00:43:46,000
Can I do the same logarithmic search? You have to 

930
00:43:46,000 --> 00:43:50,000
move all the elements ? 

931
00:43:50,000 --> 00:43:51,000
Yeah, 

932
00:43:51,000 --> 00:43:55,000
you can find the place fast, right, you can say returning the word 

933
00:43:55,000 --> 00:43:59,000
apple it's like oh, okay. I can very quickly narrow in on where it needed to be. If it 

934
00:43:59,000 --> 00:44:02,000
was there I could override it quickly, but in the case where it really needed to do 

935
00:44:02,000 --> 00:44:03,000
an add, 

936
00:44:03,000 --> 00:44:04,000
it's got to move them down. 

937
00:44:04,000 --> 00:44:05,000
So the old add 

938
00:44:05,000 --> 00:44:09,000
did all its work in the search and then just tacked it on the end if it didn't find 

939
00:44:09,000 --> 00:44:12,000
it, or overrode the middle. But the new add 

940
00:44:12,000 --> 00:44:16,000
also has to modify the array to put it in the right place, and putting it in the right place means 

941
00:44:16,000 --> 00:44:18,000
shuffling to make space. 

942
00:44:18,000 --> 00:44:20,000
So even though it found it quickly, 

943
00:44:20,000 --> 00:44:25,000
it was unable to update that quickly, so it still ends up being 

944
00:44:25,000 --> 00:44:26,000
linear in that case. 

945
00:44:26,000 --> 00:44:29,000
Now this actually not a terrible result, 

946
00:44:29,000 --> 00:44:30,000
as it stands 

947
00:44:30,000 --> 00:44:33,000
it's probably much more likely that you're going to load up the data 

948
00:44:33,000 --> 00:44:36,000
structure once and then do a lot 

949
00:44:36,000 --> 00:44:40,000
of searches on it, then infrequent additions. That's a pretty common usage 

950
00:44:40,000 --> 00:44:41,000
pattern for a map. It's kind of like you 

951
00:44:41,000 --> 00:44:43,000
build a dictionary, right, 

952
00:44:43,000 --> 00:44:46,000
loading the dictionary once could be expensive, but then people get tons and tons of 

953
00:44:46,000 --> 00:44:49,000
look ups of words later. But you don't change the definitions a lot later or add a 

954
00:44:49,000 --> 00:44:51,000
lot of new words. 

955
00:44:51,000 --> 00:44:51,000
So 

956
00:44:51,000 --> 00:44:56,000
this might actually be quite tolerable in many situations 

957
00:44:56,000 --> 00:44:59,000
that the building of it was expensive, but it gave 

958
00:44:59,000 --> 00:45:00,000
you 

959
00:45:00,000 --> 00:45:02,000
fast search access. 

960
00:45:02,000 --> 00:45:05,000
But really what you want to do is say, can we get both of those? So driving to say 

961
00:45:05,000 --> 00:45:08,000
what we do with the editor buffer, you say, "Well, what if I just want both of those operations 

962
00:45:08,000 --> 00:45:11,000
to be logarithmic or better, what can I do?" So 

963
00:45:11,000 --> 00:45:12,000
I'll leave you with kind of just 

964
00:45:12,000 --> 00:45:16,000
a little bit of a taste of where we're heading. 

965
00:45:16,000 --> 00:45:19,000
Just to think a little bit about what the link 

966
00:45:19,000 --> 00:45:21,000
list does and doesn't buy us, 

967
00:45:21,000 --> 00:45:24,000
right, that the link list gave us flexibility 

968
00:45:24,000 --> 00:45:29,000
at the editing, the local editing site, but it doesn't give us fast searching. 

969
00:45:29,000 --> 00:45:31,000
Well maybe we can take this link list and try to 

970
00:45:31,000 --> 00:45:33,000
add searching 

971
00:45:33,000 --> 00:45:37,000
to the idea of this flexibility to kind of get the best of both 

972
00:45:37,000 --> 00:45:41,000
worlds. So that's what we're going to start on Monday, building a tree. So if 

973
00:45:41,000 --> 00:45:44,000
you have time today, I'd love to have you come to Truman, and otherwise have a good 

974
00:45:44,000 --> 00:45:45,000
weekend, and 

975
00:45:45,000 --> 00:45:50,000
get your PQ work in.