1 00:00:00,000 --> 00:00:02,000 2 00:00:02,000 --> 00:00:10,000 3 00:00:10,000 --> 00:00:13,000 This presentation is delivered by the Stanford Center for Professional 4 00:00:13,000 --> 00:00:21,000 Development. 5 00:00:21,000 --> 00:00:23,000 Let us 6 00:00:23,000 --> 00:00:25,000 refresh about where we were at the end of Wednesday. 7 00:00:25,000 --> 00:00:29,000 As we had talked about the vector form of the buffer, so the editor buffer that's 8 00:00:29,000 --> 00:00:31,000 backing the 9 00:00:31,000 --> 00:00:34,000 word processing client, 10 00:00:34,000 --> 00:00:37,000 and the two stack version, where we're shuffling stuff, often on the before 11 00:00:37,000 --> 00:00:40,000 and after stacks to get things done, right? 12 00:00:40,000 --> 00:00:43,000 And when we were looking at the main six operations, we're looking at the four 13 00:00:43,000 --> 00:00:47,000 cursor movements and the two editing operations, right, that we had traded off. 14 00:00:47,000 --> 00:00:49,000 Where editing had been slow in the vector, 15 00:00:49,000 --> 00:00:52,000 to where editing was now fast, but then these long distance movements were slow, 16 00:00:52,000 --> 00:00:54,000 because of the shuffling. 17 00:00:54,000 --> 00:00:57,000 And then there was a little discussion at the very end about how this probably 18 00:00:57,000 --> 00:01:00,000 increases our space requirements, because we're now keeping two stacks that are 19 00:01:00,000 --> 00:01:02,000 each capable of holding 20 00:01:02,000 --> 00:01:04,000 the entire contents 21 00:01:04,000 --> 00:01:06,000 as opposed to one vector here 22 00:01:06,000 --> 00:01:09,000 with its slop. 23 00:01:09,000 --> 00:01:10,000 So that was where we were at. 24 00:01:10,000 --> 00:01:13,000 We are still in kind of the pursuit of this Holy Grail of could we 25 00:01:13,000 --> 00:01:16,000 get it to where everything is fast. Is it always a matter of tradeoffs or could we just 26 00:01:16,000 --> 00:01:19,000 invest some more smarts and more 27 00:01:19,000 --> 00:01:20,000 cleverness 28 00:01:20,000 --> 00:01:24,000 and hard work and squeeze it down to where we could get everything to perform 29 00:01:24,000 --> 00:01:26,000 efficiently? So typically when computer scientists say they want something to perform 30 00:01:26,000 --> 00:01:28,000 efficiently, they're hoping for log 31 00:01:28,000 --> 00:01:32,000 in time or faster. Constant time is the best. Log in grows so slowly that in fact, 32 00:01:32,000 --> 00:01:34,000 logarithm time is effectively constant 33 00:01:34,000 --> 00:01:40,000 for all reasonable values of N up to millions and billions. 34 00:01:40,000 --> 00:01:43,000 And so typically anything that is linear or higher is often a weak spot that you are 35 00:01:43,000 --> 00:01:47,000 trying to look at, and certainly things that are quadratic or exponential 36 00:01:47,000 --> 00:01:51,000 are definite opportunities for improvement. 37 00:01:51,000 --> 00:01:53,000 So let's look at this idea of the link list, 38 00:01:53,000 --> 00:01:55,000 so that both 39 00:01:55,000 --> 00:01:56,000 the 40 00:01:56,000 --> 00:01:58,000 vector and the stack 41 00:01:58,000 --> 00:02:03,000 - we're relying on this idea that continuous memory is the 42 00:02:03,000 --> 00:02:06,000 weak link that was causing some problems. In the vector, that shuffling to 43 00:02:06,000 --> 00:02:08,000 move things up and down, 44 00:02:08,000 --> 00:02:12,000 or in the case of the stack version, one side to the other 45 00:02:12,000 --> 00:02:15,000 was part of what we were working up against. 46 00:02:15,000 --> 00:02:18,000 And the link list promises to give us this opportunity to have these each 47 00:02:18,000 --> 00:02:21,000 individual character being managed separately, 48 00:02:21,000 --> 00:02:26,000 and flexibility might help resolve some of our problems here. 49 00:02:26,000 --> 00:02:30,000 SO what we're going to look at is the idea of implementing the buffer using a link 50 00:02:30,000 --> 00:02:32,000 list. So if I have 51 00:02:32,000 --> 00:02:34,000 characters A, B, C, D, E, 52 00:02:34,000 --> 00:02:37,000 I could have a link list here where I'm pointing to the head cell, which is A, 53 00:02:37,000 --> 00:02:40,000 which points to B, and C, and D, and E, and down to the null. 54 00:02:40,000 --> 00:02:43,000 SO what I'm modeling it as in the private data section will be a 55 00:02:43,000 --> 00:02:44,000 little 56 00:02:44,000 --> 00:02:47,000 link list node with a character in there, a pointer to the next one. 57 00:02:47,000 --> 00:02:52,000 And then I'm going to keep two pointers, one to the front-most cell, and then that's going to 58 00:02:52,000 --> 00:02:55,000 model where the cursor is within the buffer. 59 00:02:55,000 --> 00:02:59,000 So rather than doing this like an index, an index was handy in the case of vector where we 60 00:02:59,000 --> 00:03:00,000 had direct access, right? 61 00:03:00,000 --> 00:03:02,000 Given the way the link list works, 62 00:03:02,000 --> 00:03:05,000 knowing that it's at the fifth cell, the fourth cell isn't really very helpful. It would be more 63 00:03:05,000 --> 00:03:09,000 helpful to have the pointer kind of directly in the midst of things to give us direct 64 00:03:09,000 --> 00:03:12,000 access to where the cursor is. 65 00:03:12,000 --> 00:03:14,000 But let's think a little bit about that cursor, 66 00:03:14,000 --> 00:03:17,000 because there's an important decision to be made early about helping to 67 00:03:17,000 --> 00:03:20,000 facilitate the later work we have to do. 68 00:03:20,000 --> 00:03:24,000 I have the contents A, B, C, D, E, 69 00:03:24,000 --> 00:03:28,000 so the cursor is actually between two characters. 70 00:03:28,000 --> 00:03:31,000 If the cursor right now is situated 71 00:03:31,000 --> 00:03:32,000 after the A and before the B, 72 00:03:32,000 --> 00:03:33,000 what I'm 73 00:03:33,000 --> 00:03:36,000 modeling is A cursor B, 74 00:03:36,000 --> 00:03:38,000 C, D, 75 00:03:38,000 --> 00:03:39,000 then 76 00:03:39,000 --> 00:03:42,000 it seems like the obvious two choices for where the cursor might point is to 77 00:03:42,000 --> 00:03:45,000 A or to B. And I 78 00:03:45,000 --> 00:03:50,000 had said in the case of the vector that if it were index zero or index one 79 00:03:50,000 --> 00:03:51,000 that 80 00:03:51,000 --> 00:03:53,000 there wasn't a really strong preference for one over the other. 81 00:03:53,000 --> 00:03:56,000 There is going to be a really good reason to pick one for the link list 82 00:03:56,000 --> 00:03:59,000 over the other. Would you rather point to the A 83 00:03:59,000 --> 00:04:11,000 for where the cursor is or would you rather point to the B? [Inaudible] 84 00:04:11,000 --> 00:04:14,000 Exactly, so the idea is that when the cursor position is there, the next edit 85 00:04:14,000 --> 00:04:18,000 actually goes in between the A and the B. So if I insert the character 86 00:04:18,000 --> 00:04:21,000 Z, it really does go right here. 87 00:04:21,000 --> 00:04:24,000 And if I am going to, if effect, wire that in and splice it in then list, 88 00:04:24,000 --> 00:04:29,000 if I'm pointing to B then I'm kind of hosed, because I'm already one past where I need to be able slice in. Because I need to 89 00:04:29,000 --> 00:04:33,000 know what was behind it to bring it in. So I'm pointing to the character behind the 90 00:04:33,000 --> 00:04:34,000 cursor 91 00:04:34,000 --> 00:04:36,000 gives you access to 92 00:04:36,000 --> 00:04:39,000 wiring this down to here and this into there, and then updating the cursor to 93 00:04:39,000 --> 00:04:40,000 point to the Z 94 00:04:40,000 --> 00:04:44,000 without having to do any of the difficult backward looking backward 95 00:04:44,000 --> 00:04:45,000 traversal. 96 00:04:45,000 --> 00:04:48,000 So that strongly suggests what it is, it's going to point to 97 00:04:48,000 --> 00:04:52,000 A when it's between A and B; B when it's between B and C; and so on. 98 00:04:52,000 --> 00:04:56,000 There is another little point here, which is there are actually five letters 99 00:04:56,000 --> 00:04:59,000 in the cursor, but there are six cursor positions. It could be at the very 100 00:04:59,000 --> 00:05:00,000 beginning, 101 00:05:00,000 --> 00:05:03,000 in between all these, or at the very end. 102 00:05:03,000 --> 00:05:07,000 And the way we've chosen it right now, I can identify all of the five 103 00:05:07,000 --> 00:05:08,000 positions 104 00:05:08,000 --> 00:05:11,000 that have at least one character to the left, so being - 105 00:05:11,000 --> 00:05:12,000 pointing to the A means it's 106 00:05:12,000 --> 00:05:15,000 after A; pointing to the B it's after B; and so on, all the way down to the E. 107 00:05:15,000 --> 00:05:18,000 But I don't have a cursor position for 108 00:05:18,000 --> 00:05:23,000 representing when the cursor is at the very beginning of the buffer. 109 00:05:23,000 --> 00:05:25,000 So 110 00:05:25,000 --> 00:05:28,000 one strategy I could use, right, is to say I'll just use the special cursor 111 00:05:28,000 --> 00:05:29,000 position of null. But 112 00:05:29,000 --> 00:05:32,000 I need one more pointer value to hold onto, 113 00:05:32,000 --> 00:05:36,000 and I could make a special case out of this because there's null. Once I've done that though, 114 00:05:36,000 --> 00:05:40,000 I've actually committed myself to this thing that's going to require all of my code to be managing a 115 00:05:40,000 --> 00:05:41,000 special case of, 116 00:05:41,000 --> 00:05:43,000 well when the cursor's null do something a little different than when 117 00:05:43,000 --> 00:05:46,000 the cursor is pointing to some cell. I'm going to 118 00:05:46,000 --> 00:05:50,000 show you a way that we can kind of 119 00:05:50,000 --> 00:05:54,000 avoid the need for making a special case out of it, 120 00:05:54,000 --> 00:05:55,000 by 121 00:05:55,000 --> 00:05:58,000 wasting a little memory and reorganizing our list 122 00:05:58,000 --> 00:06:01,000 to allow for there to be a sixth cell, 123 00:06:01,000 --> 00:06:04,000 and extra cell that we call the dummy cell. 124 00:06:04,000 --> 00:06:07,000 So in this case that cell would go at the very front. 125 00:06:07,000 --> 00:06:10,000 It would have no character, just let the character field be unstatused. It's 126 00:06:10,000 --> 00:06:13,000 not really a character. What it is, is just a placeholder. 127 00:06:13,000 --> 00:06:16,000 And when I created the buffer to begin with, even when it's empty, it's always going to have 128 00:06:16,000 --> 00:06:20,000 that cell. I'm going to create that cell in advance, I'm going to have it sitting ready and waiting, and that cell 129 00:06:20,000 --> 00:06:24,000 never gets deleted or moved or changed. In fact, the list always, the head 130 00:06:24,000 --> 00:06:28,000 of this will always point to the same cell from the beginning of this object's lifetime 131 00:06:28,000 --> 00:06:32,000 to the end. So it will never change this at all, and the cells that follow are 132 00:06:32,000 --> 00:06:35,000 the ones that are going to be changing and growing and rearranging in 133 00:06:35,000 --> 00:06:38,000 response to editing commands. 134 00:06:38,000 --> 00:06:40,000 What this is going to buy us is a couple of things. First off, it's going to make it 135 00:06:40,000 --> 00:06:42,000 so there is a sixth cursor position. 136 00:06:42,000 --> 00:06:45,000 So now when the cursor is at the very beginning, 137 00:06:45,000 --> 00:06:47,000 we can set the cursor to point to this dummy cell. 138 00:06:47,000 --> 00:06:48,000 Okay 139 00:06:48,000 --> 00:06:51,000 that's one thing that we solve by doing that. 140 00:06:51,000 --> 00:06:55,000 Another thing that we've solved is actually we have made all the other 141 00:06:55,000 --> 00:06:57,000 cells on the list, the ones that are being edited and manipulated. 142 00:06:57,000 --> 00:07:00,000 We've made them all totally symmetric. 143 00:07:00,000 --> 00:07:02,000 Previously there was always going to be a little bit of a special case about, well if you're 144 00:07:02,000 --> 00:07:06,000 the front of the list, we're seeing that sometimes we're doing inserts on the 145 00:07:06,000 --> 00:07:08,000 stack in queue link list. That 146 00:07:08,000 --> 00:07:12,000 inserting later in the list involves kind of wiring in two pointers; one to command and one 147 00:07:12,000 --> 00:07:13,000 to go out. 148 00:07:13,000 --> 00:07:16,000 But there was a special case that oh, if you're the very first cell of the list, then 149 00:07:16,000 --> 00:07:19,000 your outpointer is the same, but your incoming pointer is resetting the head. 150 00:07:19,000 --> 00:07:22,000 That piece has actually gone away now, 151 00:07:22,000 --> 00:07:26,000 that now every cell in there always has a predecessor. It always has a previous, 152 00:07:26,000 --> 00:07:29,000 right? Something behind it so that actually they will always be 153 00:07:29,000 --> 00:07:31,000 pulling the pointer in and pulling the pointer out 154 00:07:31,000 --> 00:07:34,000 for every cell with character data that's being modified. 155 00:07:34,000 --> 00:07:37,000 And that means some of that extra little handing of oh, if you're the 156 00:07:37,000 --> 00:07:39,000 very first cell, 157 00:07:39,000 --> 00:07:40,000 have gone away. 158 00:07:40,000 --> 00:07:44,000 The very first cell is always this kind of dummy cell and doesn't require us to go 159 00:07:44,000 --> 00:07:46,000 out of our way to do anything special for it. 160 00:07:46,000 --> 00:07:49,000 So this is going to solve a bunch of problems for us, and it's kind of a neat technique. This thing 161 00:07:49,000 --> 00:07:51,000 is used actually 162 00:07:51,000 --> 00:07:52,000 fairly commonly 163 00:07:52,000 --> 00:07:57,000 in situations just managing a link list just to buy yourself some symmetry in the 164 00:07:57,000 --> 00:08:02,000 remaining cells. 165 00:08:02,000 --> 00:08:05,000 So let me look at some code with you, and I'm going to actually draw some pictures as I 166 00:08:05,000 --> 00:08:06,000 go. I'm 167 00:08:06,000 --> 00:08:10,000 not going to write this code in real time, because I think it's actually more important to 168 00:08:10,000 --> 00:08:11,000 see it 169 00:08:11,000 --> 00:08:15,000 acting on things than it is to see me typing out the characters. 170 00:08:15,000 --> 00:08:18,000 But the mechanism, right, in the case of the buffer here 171 00:08:18,000 --> 00:08:21,000 is it's going to have a head point, 172 00:08:21,000 --> 00:08:23,000 which points to the front most cell; and a 173 00:08:23,000 --> 00:08:24,000 cursor, 174 00:08:24,000 --> 00:08:28,000 which points to the character that proceeds the cursor. So if right now the 175 00:08:28,000 --> 00:08:31,000 contents of the buffer 176 00:08:31,000 --> 00:08:34,000 are 177 00:08:34,000 --> 00:08:37,000 the contents A, D, C. 178 00:08:37,000 --> 00:08:39,000 And let's say that right now 179 00:08:39,000 --> 00:08:40,000 the 180 00:08:40,000 --> 00:08:42,000 cursor is pointing to A. 181 00:08:42,000 --> 00:08:45,000 Actually, I need my 182 00:08:45,000 --> 00:08:50,000 dummy cell. Let me move everybody down one, so you just have dummy 183 00:08:50,000 --> 00:08:52,000 cell and my A and my B. 184 00:08:52,000 --> 00:08:56,000 And so the cursor right now is at the very front of the buffer. 185 00:08:56,000 --> 00:08:59,000 This is kind of what it looks like. It has two characters, the A and the B. 186 00:08:59,000 --> 00:09:03,000 I'm going to trace through the process of inserting a new character 187 00:09:03,000 --> 00:09:04,000 into the buffer. It 188 00:09:04,000 --> 00:09:07,000 is that my local variable CP 189 00:09:07,000 --> 00:09:11,000 is going to point to a new cell allocated out in the heap. 190 00:09:11,000 --> 00:09:15,000 Let's say the character I'm going to insert is the Z, so I'll 191 00:09:15,000 --> 00:09:16,000 assign 192 00:09:16,000 --> 00:09:18,000 CH to the character, the slot right there. 193 00:09:18,000 --> 00:09:20,000 And then the next for this 194 00:09:20,000 --> 00:09:24,000 gets to - what the cursor is - kind of following the cursor. So the 195 00:09:24,000 --> 00:09:26,000 cursor is pointing to the cell before it, 196 00:09:26,000 --> 00:09:29,000 and the next field of that cell tells us what follows it; 197 00:09:29,000 --> 00:09:31,000 so if I trace my cursor's 198 00:09:31,000 --> 00:09:33,000 next field, it points to A, 199 00:09:33,000 --> 00:09:35,000 and so I'm going to go ahead and 200 00:09:35,000 --> 00:09:41,000 set up this new cell's next field to also point to A. 201 00:09:41,000 --> 00:09:44,000 So now I've got two different ways to get to the A cell, tracing off the dummy cell or 202 00:09:44,000 --> 00:09:46,000 tracing off this new cell. 203 00:09:46,000 --> 00:09:48,000 And then I 204 00:09:48,000 --> 00:09:51,000 update the cursor's next field to point to CP. 205 00:09:51,000 --> 00:09:55,000 So I just copied right there that CP's next field was pointing to A. I copied it down here 206 00:09:55,000 --> 00:09:57,000 to kind of do the splice out. 207 00:09:57,000 --> 00:10:02,000 Now I'm going to do the splice in, 208 00:10:02,000 --> 00:10:05,000 causing it to no longer point to A, 209 00:10:05,000 --> 00:10:06,000 but instead to 210 00:10:06,000 --> 00:10:08,000 point to my new cell down here, CP. 211 00:10:08,000 --> 00:10:10,000 And then I update the cursor 212 00:10:10,000 --> 00:10:13,000 to point to this new cell. That just means that subsequent inserts, like the 213 00:10:13,000 --> 00:10:18,000 C is kind of behind the cursor after I've made this 214 00:10:18,000 --> 00:10:21,000 215 00:10:21,000 --> 00:10:23,000 edit. 216 00:10:23,000 --> 00:10:25,000 That goes away when that one ends, 217 00:10:25,000 --> 00:10:28,000 and so now the new structure that I have here is 218 00:10:28,000 --> 00:10:31,000 head still points to the dummy cell that never changes. 219 00:10:31,000 --> 00:10:33,000 Dummy cell now points to Z, 220 00:10:33,000 --> 00:10:36,000 Z points to A, Z points to B, 221 00:10:36,000 --> 00:10:41,000 and then the cursor in this case is between the Z and the A, 222 00:10:41,000 --> 00:10:44,000 by virtue of the fact that the cursor is pointing to Z. 223 00:10:44,000 --> 00:10:50,000 A subsequent one would kind of do the same thing. If I typed out ZY 224 00:10:50,000 --> 00:10:51,000 after I did this. We'd 225 00:10:51,000 --> 00:10:55,000 make a new cell over here. It would get what 226 00:10:55,000 --> 00:10:58,000 was currently coming out of the cursor's field. The next field, right, we would 227 00:10:58,000 --> 00:11:02,000 update cursor's next field to point 228 00:11:02,000 --> 00:11:05,000 to this new one. And then we would update cursor 229 00:11:05,000 --> 00:11:08,000 to point to this guy. And 230 00:11:08,000 --> 00:11:10,000 so now following the list, right, dummy cell, 231 00:11:10,000 --> 00:11:11,000 Z, 232 00:11:11,000 --> 00:11:16,000 Y, A, B, cursor pointing to the Y, which is the 233 00:11:16,000 --> 00:11:20,000 character directly to the left of the cursor. 234 00:11:20,000 --> 00:11:22,000 And so inserting in any position now, 235 00:11:22,000 --> 00:11:25,000 front cell, middle cell, end cell, 236 00:11:25,000 --> 00:11:29,000 doesn't require any special case handling. So you don't see any if of things. 237 00:11:29,000 --> 00:11:33,000 And you notice that you'll never see an update to the head directly here. 238 00:11:33,000 --> 00:11:37,000 That the head was set up to point to the dummy cell in the constructor 239 00:11:37,000 --> 00:11:39,000 and never needs any adjusting. 240 00:11:39,000 --> 00:11:42,000 It's only the cells after it that requires 241 00:11:42,000 --> 00:11:43,000 242 00:11:43,000 --> 00:11:44,000 the new 243 00:11:44,000 --> 00:11:47,000 arrangement of the pointers coming in and 244 00:11:47,000 --> 00:11:49,000 out. When I'm ready to delete a cell, 245 00:11:49,000 --> 00:11:53,000 then the mechanism for delete in the buffer is to delete the thing that follows 246 00:11:53,000 --> 00:11:55,000 the cursor. So if I have Z, Y, A, B, 247 00:11:55,000 --> 00:11:58,000 this is the character that delete is supposed to delete, the one after the 248 00:11:58,000 --> 00:11:59,000 cursor. 249 00:11:59,000 --> 00:12:00,000 And so, 250 00:12:00,000 --> 00:12:04,000 the first thing it looks at is to make sure that we're not already at the end. 251 00:12:04,000 --> 00:12:07,000 In the case of the link list form of this, if the cursor was pointing to 252 00:12:07,000 --> 00:12:09,000 the last most cell, 253 00:12:09,000 --> 00:12:12,000 and then we looked at it's next field and saw that it was null, that would be a clue. Oh, it's 254 00:12:12,000 --> 00:12:15,000 pointing to the very last cell, there's nothing that follows it. So we have a 255 00:12:15,000 --> 00:12:17,000 quick test to make sure that there's something there to delete. 256 00:12:17,000 --> 00:12:19,000 There is something following the cursor, and 257 00:12:19,000 --> 00:12:24,000 in this case since the cursor is pointing to the Y, we're good. It says look at its 258 00:12:24,000 --> 00:12:25,000 next field, it points to A. 259 00:12:25,000 --> 00:12:27,000 It says okay, 260 00:12:27,000 --> 00:12:29,000 call this thing old 261 00:12:29,000 --> 00:12:30,000 and point to that A. 262 00:12:30,000 --> 00:12:33,000 Now do a wire around it. 263 00:12:33,000 --> 00:12:34,000 So take the cursor's 264 00:12:34,000 --> 00:12:36,000 next field, and 265 00:12:36,000 --> 00:12:39,000 so this thing is where I'm targeting my overwrite, 266 00:12:39,000 --> 00:12:44,000 and copy out of the old its next. It's 267 00:12:44,000 --> 00:12:49,000 no longer pointing to the A, but instead it's 268 00:12:49,000 --> 00:12:50,000 pointing to the B. 269 00:12:50,000 --> 00:12:52,000 And deleting the old, which 270 00:12:52,000 --> 00:12:54,000 causes this piece of memory to 271 00:12:54,000 --> 00:12:56,000 be reclaimed, 272 00:12:56,000 --> 00:13:01,000 and now the contents of my buffer have been shifted over to 273 00:13:01,000 --> 00:13:06,000 where it goes Z, Y, and straight to D. The cursor in this case doesn't move. It 274 00:13:06,000 --> 00:13:07,000 actually deleted to the right, and so 275 00:13:07,000 --> 00:13:11,000 whatever was to the left of the cursor just stays in place. 276 00:13:11,000 --> 00:13:15,000 So again, no need to update anything special for the first or the last or any of those 277 00:13:15,000 --> 00:13:15,000 cells, 278 00:13:15,000 --> 00:13:19,000 but the symmetry of all the cells past the dummy kind of buys us some convenience 279 00:13:19,000 --> 00:13:22,000 of handling it. What does a dummy cell look like? The 280 00:13:22,000 --> 00:13:26,000 dummy cell looks like just any other cell. It's actually a cell, T, and what you do is 281 00:13:26,000 --> 00:13:29,000 you just don't assign the Christian field, the character field, because it has nothing in it. 282 00:13:29,000 --> 00:13:32,000 So it actually looks just like all the others, it's just another node in the 283 00:13:32,000 --> 00:13:33,000 chain. 284 00:13:33,000 --> 00:13:36,000 But this one, you know you put there purposefully just as the placeholder. 285 00:13:36,000 --> 00:13:40,000 So it doesn't contain any character data. So if you're ready to print the 286 00:13:40,000 --> 00:13:42,000 contents of the buffer, you have to remember the dummy cell is there, 287 00:13:42,000 --> 00:13:43,000 and just skip over it. You 288 00:13:43,000 --> 00:13:45,000 tend to start your list 289 00:13:45,000 --> 00:13:49,000 at heads next, the first interesting cell to look at, then work 290 00:13:49,000 --> 00:13:51,000 your way down to the end. So it's 291 00:13:51,000 --> 00:13:54,000 just like all the others, but you just know 292 00:13:54,000 --> 00:14:00,000 that the character data here is useless, it's not important. You didn't even write anything in there. When you delete, you just go left? It turns out 293 00:14:00,000 --> 00:14:02,000 the delete in this 294 00:14:02,000 --> 00:14:04,000 case goes forward. 295 00:14:04,000 --> 00:14:07,000 So you can imagine that in some other world it does, but all the ones we 296 00:14:07,000 --> 00:14:11,000 have done so far have always deleted forward. So the delete - for example, the stack case, 297 00:14:11,000 --> 00:14:13,000 grabbed from the after stack 298 00:14:13,000 --> 00:14:16,000 - it's done that way, because it happens to simplify some other 299 00:14:16,000 --> 00:14:19,000 things that you can imagine that delete and reverse. What would it take? Well, you'd have to back 300 00:14:19,000 --> 00:14:22,000 the cursor up. And we're going to find out pretty soon that would make it 301 00:14:22,000 --> 00:14:24,000 challenging for the link list. So partly we've picked a delete that 302 00:14:24,000 --> 00:14:26,000 made certain things work out a 303 00:14:26,000 --> 00:14:29,000 little better for us. 304 00:14:29,000 --> 00:14:31,000 So both of these operations 305 00:14:31,000 --> 00:14:33,000 are O of 306 00:14:33,000 --> 00:14:33,000 1; 307 00:14:33,000 --> 00:14:37,000 this is where the link list really shines, right, on this sort of stuff. Because 308 00:14:37,000 --> 00:14:41,000 if you have access to the point at which you want to make a modification, so if you are already kind of 309 00:14:41,000 --> 00:14:44,000 in the midst of the thing you're trying to rearrange, 310 00:14:44,000 --> 00:14:47,000 it's often to the link list's advantage, because it actually can do these this rewiring 311 00:14:47,000 --> 00:14:51,000 in the local context. And even if there's thousands of characters that 312 00:14:51,000 --> 00:14:52,000 follow the cursor. 313 00:14:52,000 --> 00:14:55,000 It doesn't matter. And there can be thousands of characters preceding the cursor 314 00:14:55,000 --> 00:14:56,000 for that matter, too. 315 00:14:56,000 --> 00:15:00,000 The idea that there's a much larger context that's working and doesn't impact 316 00:15:00,000 --> 00:15:03,000 its performance; it's really just doing this local little rearrangement of the pointers. 317 00:15:03,000 --> 00:15:05,000 And that's a real strength of the link list design 318 00:15:05,000 --> 00:15:11,000 is that flexibility that doesn't rely on everything living in contiguous memory. 319 00:15:11,000 --> 00:15:15,000 So let's talk about the movement options for this guy. 320 00:15:15,000 --> 00:15:19,000 So right now I have the contents 321 00:15:19,000 --> 00:15:21,000 Z, Y with 322 00:15:21,000 --> 00:15:22,000 the cursor 323 00:15:22,000 --> 00:15:27,000 between that, and the B following that. So three characters worth, 324 00:15:27,000 --> 00:15:29,000 325 00:15:29,000 --> 00:15:30,000 and then 326 00:15:30,000 --> 00:15:31,000 the 327 00:15:31,000 --> 00:15:33,000 initial two operations 328 00:15:33,000 --> 00:15:36,000 are pretty simple. The later two are going to be a little bit messier. So let's look at 329 00:15:36,000 --> 00:15:37,000 the easy ones first. 330 00:15:37,000 --> 00:15:40,000 Moving the cursor to the beginning, so that jump that pulls you back to the very 331 00:15:40,000 --> 00:15:41,000 beginning, 332 00:15:41,000 --> 00:15:44,000 very easy in a link list form. If I want to move the cursor to the beginning, all 333 00:15:44,000 --> 00:15:47,000 I need to do is take that cursor pointer, 334 00:15:47,000 --> 00:15:49,000 and reset it 335 00:15:49,000 --> 00:15:51,000 to the head, which is where the dummy cell is. And so, by 336 00:15:51,000 --> 00:15:54,000 virtue of doing that, 337 00:15:54,000 --> 00:15:57,000 338 00:15:57,000 --> 00:15:59,000 the character it points to is the one that's to the left, 339 00:15:59,000 --> 00:16:02,000 right of that, and to the left of that the dummy cell is one we're not 340 00:16:02,000 --> 00:16:05,000 counting. In fact, it's like nothingness that's there, 341 00:16:05,000 --> 00:16:07,000 and that means that the first character that follows the cursor is Z, which 342 00:16:07,000 --> 00:16:11,000 is the initial character of the buffer. So that reset, easy to do, we have 343 00:16:11,000 --> 00:16:13,000 easy access to the front of the list, 344 00:16:13,000 --> 00:16:14,000 and so 345 00:16:14,000 --> 00:16:16,000 no 346 00:16:16,000 --> 00:16:17,000 special 347 00:16:17,000 --> 00:16:18,000 crazy code required. 348 00:16:18,000 --> 00:16:21,000 Moving the cursor forward, also an easy thing to do, link 349 00:16:21,000 --> 00:16:24,000 list are designed to make it easy to work your way from the front to 350 00:16:24,000 --> 00:16:28,000 the back. And so moving the cursor forward is advancing that cursor by one step. So 351 00:16:28,000 --> 00:16:32,000 in the array form of this, what was a cursor plus, plus, the 352 00:16:32,000 --> 00:16:36,000 comparable form of that in the link list is cursor equals cursor next. 353 00:16:36,000 --> 00:16:37,000 It has a little 354 00:16:37,000 --> 00:16:40,000 if test there, all of the versions actually have something that is 355 00:16:40,000 --> 00:16:42,000 comparable to this. If you're already at the end, then there's 356 00:16:42,000 --> 00:16:46,000 nothing to be done, so you don't advance the cursor off into space. 357 00:16:46,000 --> 00:16:47,000 You 358 00:16:47,000 --> 00:16:51,000 check to make sure that there is something for it to point to. So in this case, 359 00:16:51,000 --> 00:16:54,000 seeing if the cursor's next field is null, it's not, it points to Z. Then we go 360 00:16:54,000 --> 00:16:58,000 ahead and update the cursor to there, 361 00:16:58,000 --> 00:17:00,000 which has the effect of 362 00:17:00,000 --> 00:17:03,000 changing things to where the Z is the character 363 00:17:03,000 --> 00:17:07,000 to the left of the cursor, and then the YB follows it. 364 00:17:07,000 --> 00:17:10,000 So moving this way, getting back to the beginning, and kind of walking our way down is kind of easy 365 00:17:10,000 --> 00:17:11,000 to do. 366 00:17:11,000 --> 00:17:15,000 Now we start to see where the link list causes us a little bit of grief, 367 00:17:15,000 --> 00:17:18,000 moving the cursor to the end. 368 00:17:18,000 --> 00:17:21,000 So now, starting at the beginning or the middle, or wherever I am, 369 00:17:21,000 --> 00:17:25,000 I want to advance my way to the end. I don't know where the end is. 370 00:17:25,000 --> 00:17:28,000 Link lists in general don't keep track of that, 371 00:17:28,000 --> 00:17:31,000 that in the simple form of the single link list, I'm going to have to work to 372 00:17:31,000 --> 00:17:34,000 find it. So I'm going to take where I am and walk my way down. 373 00:17:34,000 --> 00:17:37,000 And this one actually just makes use of an existing 374 00:17:37,000 --> 00:17:40,000 public number function to move cursor forward. It's like while the cursor next does 375 00:17:40,000 --> 00:17:44,000 not equal null, which is to say while we are not pointing at the very last cell of the link list, 376 00:17:44,000 --> 00:17:46,000 then keep going. 377 00:17:46,000 --> 00:17:49,000 So advance cursor equals cursor. 378 00:17:49,000 --> 00:17:50,000 Some will say, 379 00:17:50,000 --> 00:17:54,000 "Is the cursor's next field null?" No, then set the cursor to be the cursor's 380 00:17:54,000 --> 00:17:56,000 next field. 381 00:17:56,000 --> 00:17:58,000 Is the cursor's next field null? 382 00:17:58,000 --> 00:17:59,000 383 00:17:59,000 --> 00:18:02,000 No it's not, so advance the cursor to the next field. 384 00:18:02,000 --> 00:18:04,000 Is the cursor's next field null? Yes. 385 00:18:04,000 --> 00:18:06,000 So that would 386 00:18:06,000 --> 00:18:08,000 the last thing in the buffer, 387 00:18:08,000 --> 00:18:10,000 and so if we have 388 00:18:10,000 --> 00:18:13,000 hundreds or thousands or 389 00:18:13,000 --> 00:18:15,000 tens of characters, whichever it is, right, 390 00:18:15,000 --> 00:18:18,000 and depending on how close we already were, it will walk through 391 00:18:18,000 --> 00:18:21,000 everything from where the current editing position was to find the end of the 392 00:18:21,000 --> 00:18:22,000 buffer 393 00:18:22,000 --> 00:18:25,000 one by one, working it's way down. 394 00:18:25,000 --> 00:18:26,000 Even more painful 395 00:18:26,000 --> 00:18:28,000 operation 396 00:18:28,000 --> 00:18:32,000 is the simple one of just moving backwards. 397 00:18:32,000 --> 00:18:34,000 If I'm pointing right now to that B, 398 00:18:34,000 --> 00:18:37,000 and I'd like to get back to that Y, the 399 00:18:37,000 --> 00:18:41,000 link list is oblivious about how you got there. It knows where to go from 400 00:18:41,000 --> 00:18:45,000 here, but that backing up is not supported in the simple form. 401 00:18:45,000 --> 00:18:48,000 In order to find out what preceded the V, 402 00:18:48,000 --> 00:18:51,000 our only option right now is to go back to the beginning, 403 00:18:51,000 --> 00:18:53,000 and find it. 404 00:18:53,000 --> 00:18:56,000 So starting this pointer CP at the head, 405 00:18:56,000 --> 00:18:57,000 and saying, 406 00:18:57,000 --> 00:18:58,000 do you point to; 407 00:18:58,000 --> 00:19:01,000 is your next field the cursor? 408 00:19:01,000 --> 00:19:04,000 And this one says, no. And then it says, well okay advance. Does your next field point to 409 00:19:04,000 --> 00:19:08,000 the cursor? No it's not. Go to this one, does your next field point to the same place 410 00:19:08,000 --> 00:19:12,000 as the cursor, yes. Wise next field, points to the same place the cursor 411 00:19:12,000 --> 00:19:13,000 does. 412 00:19:13,000 --> 00:19:18,000 That must mean that Y was the one that preceded the cursor in the link 413 00:19:18,000 --> 00:19:22,000 list. So after having gone back to the beginning and walked all the way down, especially if I'm near 414 00:19:22,000 --> 00:19:24,000 the end, that's a long traversal. 415 00:19:24,000 --> 00:19:26,000 And when I get there, I can say "Oh yeah 416 00:19:26,000 --> 00:19:34,000 that's the one, back it up to there." 417 00:19:34,000 --> 00:19:37,000 Walking it all the way down; 418 00:19:37,000 --> 00:19:41,000 some good link list coding, good practice in there, but 419 00:19:41,000 --> 00:19:42,000 again a funny, 420 00:19:42,000 --> 00:19:44,000 funny set of trade offs, right? 421 00:19:44,000 --> 00:19:47,000 Does anybody have any questions about any part of the code? 422 00:19:47,000 --> 00:19:49,000 At the end 423 00:19:49,000 --> 00:19:51,000 could you just 424 00:19:51,000 --> 00:19:53,000 save the address of the pointer, and then the 425 00:19:53,000 --> 00:19:54,000 person 426 00:19:54,000 --> 00:19:56,000 who wants to go to the end could just 427 00:19:56,000 --> 00:19:59,000 access that? That was a great idea. It's like, so what about 428 00:19:59,000 --> 00:20:02,000 improving this? Right now all we have is a pointer to the front, and each 429 00:20:02,000 --> 00:20:06,000 pointer has only a pointer to the next. It's like, maybe we need to add some 430 00:20:06,000 --> 00:20:07,000 more stuff, track 431 00:20:07,000 --> 00:20:10,000 some more things. What if we tracked the 432 00:20:10,000 --> 00:20:12,000 tail? Would that help us? 433 00:20:12,000 --> 00:20:16,000 If we tracked the tail, we could move to the end quickly. Would 434 00:20:16,000 --> 00:20:19,000 that help us moving backwards? 435 00:20:19,000 --> 00:20:22,000 No. It solves one of our problems though, so you're on to something. 436 00:20:22,000 --> 00:20:25,000 Let's look at what we've got, and then we'll start thinking about ways to fix it, because that's going to be 437 00:20:25,000 --> 00:20:27,000 one of the pieces we're going to think about. 438 00:20:27,000 --> 00:20:29,000 So if I move to link list 439 00:20:29,000 --> 00:20:31,000 relative to what I have, 440 00:20:31,000 --> 00:20:33,000 again, it's like a shell game. The old ends moved. 441 00:20:33,000 --> 00:20:36,000 It used to be that editing was slow and we didn't like that. 442 00:20:36,000 --> 00:20:38,000 Then we made big movement fast, 443 00:20:38,000 --> 00:20:41,000 and now we have this funny thing were moving in certain ways is easy and 444 00:20:41,000 --> 00:20:45,000 moving other ways is bad. So moving 445 00:20:45,000 --> 00:20:46,000 down the document 446 00:20:46,000 --> 00:20:49,000 is good, moving backwards in the document not good. 447 00:20:49,000 --> 00:20:53,000 Moving back to the beginning is good, but moving to the end is bad. 448 00:20:53,000 --> 00:20:56,000 But then editing in all situations is good, 449 00:20:56,000 --> 00:20:59,000 another sort of quirky set of things. It just fells like it's like that game where the 450 00:20:59,000 --> 00:21:00,000 451 00:21:00,000 --> 00:21:03,000 gophers pop there heads up, and you pound on them. And they just show up somewhere else. As 452 00:21:03,000 --> 00:21:06,000 soon as you get one thing fixed, it seems like something else had 453 00:21:06,000 --> 00:21:08,000 to give. 454 00:21:08,000 --> 00:21:11,000 This, I think, would actually be my ideal editor, because it turns out I 455 00:21:11,000 --> 00:21:14,000 have exactly this working style, which is like, I'll be down at the end of 456 00:21:14,000 --> 00:21:17,000 the document, and I'll have to be generating new things. And 457 00:21:17,000 --> 00:21:20,000 I'm getting tired and I don't like writing, and what I keep wanting to 458 00:21:20,000 --> 00:21:24,000 do is I go back to the beginning. And I read that again, because I worked on that the most and 459 00:21:24,000 --> 00:21:27,000 it's nice. And so I go back to the beginning, and I'm like oh yeah, look at this. Look at how lovely this 460 00:21:27,000 --> 00:21:28,000 is, 461 00:21:28,000 --> 00:21:31,000 and then I never want to go back to the end where all the trouble is. So I just 462 00:21:31,000 --> 00:21:34,000 keep going back to the beginning and editing in the front, and then slowly get 463 00:21:34,000 --> 00:21:37,000 back to the end. And then I don't like it again, and then I go back to the beginning. 464 00:21:37,000 --> 00:21:39,000 But I never actually willingly go to the end 465 00:21:39,000 --> 00:21:42,000 to face my fears. So 466 00:21:42,000 --> 00:21:46,000 this is the editor I need. 467 00:21:46,000 --> 00:21:47,000 Also 468 00:21:47,000 --> 00:21:51,000 has a bit of kicker in terms of space. 469 00:21:51,000 --> 00:21:56,000 So if the space for a character is 1 byte, which it typically is, 470 00:21:56,000 --> 00:21:58,000 and the space for a pointer is 4 bytes, then 471 00:21:58,000 --> 00:22:03,000 each of those link list cells is costing us 5 bytes, right? 472 00:22:03,000 --> 00:22:03,000 Which is 473 00:22:03,000 --> 00:22:07,000 sort of five times the amount of storage that the character itself 474 00:22:07,000 --> 00:22:10,000 would have taken alone, 475 00:22:10,000 --> 00:22:14,000 so in the vector it's fairly tight, a little bit more going up on that. But we've actually kind of blown 476 00:22:14,000 --> 00:22:18,000 it up by another factor of two even beyond the stack, which is not quite a 477 00:22:18,000 --> 00:22:22,000 good direction to be going. 478 00:22:22,000 --> 00:22:24,000 So let's talk about what we can do to fix that. 479 00:22:24,000 --> 00:22:27,000 I'm not going to go through the process of this. I'm just going to kind of give you a thought 480 00:22:27,000 --> 00:22:29,000 exercise, so you can just kind of 481 00:22:29,000 --> 00:22:30,000 visualize it. 482 00:22:30,000 --> 00:22:34,000 What we have here, right, is an information problem. We 483 00:22:34,000 --> 00:22:37,000 only are keeping track of certain 484 00:22:37,000 --> 00:22:39,000 pathways through that link list. 485 00:22:39,000 --> 00:22:42,000 What if we actually increased our access to the list? 486 00:22:42,000 --> 00:22:46,000 We don't need to have the full access of a vector, being able to know the nth 487 00:22:46,000 --> 00:22:49,000 character by number is not actually as far as we need to go. But we 488 00:22:49,000 --> 00:22:53,000 do just need a little bit more coordination between a cell and its neighbors. 489 00:22:53,000 --> 00:22:56,000 So the two things that would buy us out of a lot of the things that we're seeing 490 00:22:56,000 --> 00:22:58,000 here is if we added a tail pointer. 491 00:22:58,000 --> 00:23:01,000 So if I just add a tail pointer that points to the last cell, then I can move to the end 492 00:23:01,000 --> 00:23:02,000 quickly. 493 00:23:02,000 --> 00:23:05,000 You say cursor equals tail; the same that cursor equals head. So what was an O of N 494 00:23:05,000 --> 00:23:07,000 problem now is an O of 1. 495 00:23:07,000 --> 00:23:08,000 The backing up, 496 00:23:08,000 --> 00:23:12,000 not as easily solved by just having one thing that's going on; 497 00:23:12,000 --> 00:23:13,000 I also am going to need, 498 00:23:13,000 --> 00:23:17,000 on a per cell basis this ability to move backwards. 499 00:23:17,000 --> 00:23:19,000 500 00:23:19,000 --> 00:23:22,000 Well, it's just symmetry, right? If A knows where B is, and B knows where C is, and C knows where D is, what's to stop 501 00:23:22,000 --> 00:23:25,000 it from also just tracking the information going in the other way, 502 00:23:25,000 --> 00:23:30,000 so that D knows that it came from C, and C knows it came from B, and so on. 503 00:23:30,000 --> 00:23:35,000 And so having this completely symmetric parallel set of links that just run the other 504 00:23:35,000 --> 00:23:35,000 direction, 505 00:23:35,000 --> 00:23:39,000 then buys you the ability to move backwards with ease. 506 00:23:39,000 --> 00:23:42,000 The other thing that this would give you is the tail pointer 507 00:23:42,000 --> 00:23:46,000 is almost not quite enough to give you the exact place of the end. It gives it to 508 00:23:46,000 --> 00:23:47,000 you, but you also have to be 509 00:23:47,000 --> 00:23:49,000 careful about what is it going to take to update it? 510 00:23:49,000 --> 00:23:53,000 And so inserting can easily update the tail pointer. It's the deleting that kind of 511 00:23:53,000 --> 00:23:55,000 would get you into trouble. If you were 512 00:23:55,000 --> 00:23:58,000 here and deleting, you would have to kind of keep track of making sure if you were deleting 513 00:23:58,000 --> 00:24:01,000 the thing that was the tail pointer, so you have to be a little about making sure 514 00:24:01,000 --> 00:24:04,000 you don't lose track of your tail. 515 00:24:04,000 --> 00:24:06,000 But once I have both of these in place, 516 00:24:06,000 --> 00:24:09,000 I suddenly have something that can move 517 00:24:09,000 --> 00:24:10,000 forward easily, O of 1; 518 00:24:10,000 --> 00:24:15,000 move to the front easily, O of 1; move to the tail easily, O of 1; move backwards O of 1. 519 00:24:15,000 --> 00:24:20,000 So I can make all of my operations 520 00:24:20,000 --> 00:24:22,000 O of 1. 521 00:24:22,000 --> 00:24:24,000 Deleting, you're still going to have to delete the individual cells, but there's one place where 522 00:24:24,000 --> 00:24:27,000 it suffers, but that's a much more rare occurrence. 523 00:24:27,000 --> 00:24:31,000 And I can get all six; it moves fast, in every dimension; 524 00:24:31,000 --> 00:24:32,000 inserts and deletes 525 00:24:32,000 --> 00:24:35,000 all good. 526 00:24:35,000 --> 00:24:37,000 And there are two places we paid for it. 527 00:24:37,000 --> 00:24:40,000 One is certainly the memory. 528 00:24:40,000 --> 00:24:45,000 So a 4 byte pointer one direction, a 4 byte pointer another direction, 529 00:24:45,000 --> 00:24:49,000 plus the 1 byte for the character means we have a 9 byte per 530 00:24:49,000 --> 00:24:50,000 character 531 00:24:50,000 --> 00:24:52,000 storage usage. 532 00:24:52,000 --> 00:24:55,000 And the other way, which is a little bit hard to represent with some number 533 00:24:55,000 --> 00:24:57,000 in a quantitative way, but actually it's also an important factor, which is 534 00:24:57,000 --> 00:25:00,000 your code got a lot more complicated. 535 00:25:00,000 --> 00:25:02,000 If you looked at the code for the vector of the stack version, it was a few 536 00:25:02,000 --> 00:25:03,000 lines here a few lines there. 537 00:25:03,000 --> 00:25:06,000 If you looked at the code for the link list singly we looked at, it's like you've got 538 00:25:06,000 --> 00:25:07,000 to wire up these pointers. 539 00:25:07,000 --> 00:25:10,000 Now imagine you've got to wire up twice as many pointers. 540 00:25:10,000 --> 00:25:14,000 Not only do you have the in and out going one direction, you'll have the in and out going the 541 00:25:14,000 --> 00:25:15,000 other direction, 542 00:25:15,000 --> 00:25:17,000 and just the 543 00:25:17,000 --> 00:25:21,000 opportunity to make errors with pointers has now gone up by a factor of two. 544 00:25:21,000 --> 00:25:24,000 The idea that you could get a vector version of this up and running in 545 00:25:24,000 --> 00:25:26,000 no time, 546 00:25:26,000 --> 00:25:30,000 compared to several days, let's say, of getting the doubly link one working. It 547 00:25:30,000 --> 00:25:32,000 might be a factor to keep in mind. 548 00:25:32,000 --> 00:25:36,000 If you're kind of investing for the long haul, sure, work hard, but know 549 00:25:36,000 --> 00:25:41,000 that it wasn't for free. 550 00:25:41,000 --> 00:25:42,000 So you've 551 00:25:42,000 --> 00:25:45,000 got 89 percent overhead. 552 00:25:45,000 --> 00:25:49,000 That's a lot of overhead for small bits of data. 553 00:25:49,000 --> 00:25:53,000 More likely, and this is actually the way that most modern word processors 554 00:25:53,000 --> 00:25:54,000 do do this, 555 00:25:54,000 --> 00:25:58,000 is they don't just make one or the other; it's not just a array or a stack or a link list. 556 00:25:58,000 --> 00:25:59,000 They 557 00:25:59,000 --> 00:26:04,000 actually look at ways to combine the features of both to get a hybrid, 558 00:26:04,000 --> 00:26:05,000 where you can take 559 00:26:05,000 --> 00:26:10,000 some of the advantages of one pipe of data, but also try to avoid its worst 560 00:26:10,000 --> 00:26:13,000 weaknesses. So the most common way they're going to do this is some kind of 561 00:26:13,000 --> 00:26:15,000 chunking strategy, 562 00:26:15,000 --> 00:26:17,000 where you have a segment of the 563 00:26:17,000 --> 00:26:21,000 buffer that kind of is moved aside and worked on in one little array. It 564 00:26:21,000 --> 00:26:25,000 might be that those arrays are ten or 20 or a sentence long, or paragraph long. 565 00:26:25,000 --> 00:26:28,000 It might be that they are actually dictated by when you change styles. 566 00:26:28,000 --> 00:26:30,000 All of the code that's in one font 567 00:26:30,000 --> 00:26:33,000 kind of gets moved into one chunk, and as soon as you change styles, it breaks 568 00:26:33,000 --> 00:26:37,000 into a new chunk that follows to kind of keep track of the formatting information. It 569 00:26:37,000 --> 00:26:40,000 depends on some of the other features that are present in the processor 570 00:26:40,000 --> 00:26:43,000 here, but it's likely that rather than having all 1,000 or all 10 million 571 00:26:43,000 --> 00:26:45,000 characters in one 572 00:26:45,000 --> 00:26:46,000 data structure, 573 00:26:46,000 --> 00:26:47,000 they're actually kind of separated 574 00:26:47,000 --> 00:26:50,000 to where you have a little bit of both. So the most likely thing is that 575 00:26:50,000 --> 00:26:53,000 there is some kind of array strategy on a chunk basis. 576 00:26:53,000 --> 00:26:56,000 There's a linking between those chunks. 577 00:26:56,000 --> 00:26:59,000 And what this allows you do is to 578 00:26:59,000 --> 00:27:01,000 share that overhead cost. 579 00:27:01,000 --> 00:27:05,000 So in this case, if I had every sentence in a chunk by itself, 580 00:27:05,000 --> 00:27:08,000 and then if I had next and previous pointer, which pointed to the next sentence and the previous 581 00:27:08,000 --> 00:27:09,000 582 00:27:09,000 --> 00:27:11,000 sentence, when I need to move the cursor forward or backwards 583 00:27:11,000 --> 00:27:15,000 within a chunk, it's just in changing this index. Oh, it's in index two, but 584 00:27:15,000 --> 00:27:16,000 now it's moving to three, or four, or five. 585 00:27:16,000 --> 00:27:20,000 And then when it gets to the end of that chunk, to the end of that sentence, and you move forward, well then it 586 00:27:20,000 --> 00:27:23,000 follows the next link forward. 587 00:27:23,000 --> 00:27:25,000 Similarly moving backwards, right, so 588 00:27:25,000 --> 00:27:29,000 within a chunk doing array-like manipulations, and then as it crosses the 589 00:27:29,000 --> 00:27:32,000 boundary using a link list steps to get further down. 590 00:27:32,000 --> 00:27:35,000 So when it needs to move to the front or the back, those things are easily 591 00:27:35,000 --> 00:27:37,000 accessible using head and tail pointers. 592 00:27:37,000 --> 00:27:39,000 When I need to do an edit, 593 00:27:39,000 --> 00:27:42,000 it operates in this array-like kind of 594 00:27:42,000 --> 00:27:44,000 way 595 00:27:44,000 --> 00:27:46,000 when there's space within that chunk. So you start editing in a sentence, then it 596 00:27:46,000 --> 00:27:49,000 would kind of do some shifting on that array. But the array itself being pretty 597 00:27:49,000 --> 00:27:51,000 small 598 00:27:51,000 --> 00:27:53,000 means that although you are paying that cost of shuffling, 599 00:27:53,000 --> 00:27:56,000 you are doing it on a smaller chunk. It's not 1,000 characters that have to move, it's 600 00:27:56,000 --> 00:27:59,000 20 or 40 that have to move within that sentence. 601 00:27:59,000 --> 00:28:02,000 And that when 602 00:28:02,000 --> 00:28:04,000 you need to insert a new sentence or a new chunk in the middle, you are doing 603 00:28:04,000 --> 00:28:08,000 link list-like manipulations to build a new chunk and insert and splice, 604 00:28:08,000 --> 00:28:11,000 so you have that flexibility, where the many characters aren't 605 00:28:11,000 --> 00:28:14,000 all affected by it. So you have kind of local array editing 606 00:28:14,000 --> 00:28:16,000 and global link list editing 607 00:28:16,000 --> 00:28:18,000 that gives you kind of the best of both worlds. 608 00:28:18,000 --> 00:28:19,000 609 00:28:19,000 --> 00:28:23,000 But again, the factor is cost showing up in code complexity. 610 00:28:23,000 --> 00:28:27,000 That what you are looking at is kind of space/time tradeoff. Well, I can throw space at 611 00:28:27,000 --> 00:28:29,000 this, and most people will say that in computer science, "Well I can throw space and this and 612 00:28:29,000 --> 00:28:31,000 get better times." So 613 00:28:31,000 --> 00:28:35,000 the double link list is a good example of something that throws space at a problem to get better 614 00:28:35,000 --> 00:28:37,000 time. There's a lot of overhead. 615 00:28:37,000 --> 00:28:39,000 Moving to the chunk list says I want to get back some of that space. I'm 616 00:28:39,000 --> 00:28:43,000 not willing to invest 90 percent overhead to get this. Can I find a way to 617 00:28:43,000 --> 00:28:44,000 take a 618 00:28:44,000 --> 00:28:47,000 whole sentence worth of things and add a few pointers here? That means that 619 00:28:47,000 --> 00:28:48,000 there's only two 620 00:28:48,000 --> 00:28:51,000 4 byte pointers per each sentence as opposed to each character, 621 00:28:51,000 --> 00:28:53,000 and that really way reduces the overhead. 622 00:28:53,000 --> 00:28:55,000 But then now, 623 00:28:55,000 --> 00:28:58,000 the problem is still back on your shoulders in terms of effort, 624 00:28:58,000 --> 00:29:02,000 now I have both an array, and a link list, and double links, and 625 00:29:02,000 --> 00:29:03,000 a 626 00:29:03,000 --> 00:29:07,000 lot of complicated stuff flying around to make it work. 627 00:29:07,000 --> 00:29:08,000 But it does work out. 628 00:29:08,000 --> 00:29:11,000 It's kind of the process that designers go through when 629 00:29:11,000 --> 00:29:12,000 they're building these kinds of 630 00:29:12,000 --> 00:29:16,000 key structures. It like, what are the options, and do any of the basic 631 00:29:16,000 --> 00:29:18,000 things we know about work? What about 632 00:29:18,000 --> 00:29:21,000 building even fancier things that kind of take the best of both world and mix 633 00:29:21,000 --> 00:29:23,000 them together and have a little bit of the weaknesses of this a little of that, but none of 634 00:29:23,000 --> 00:29:26,000 the really awful 635 00:29:26,000 --> 00:29:27,000 worst cases 636 00:29:27,000 --> 00:29:28,000 present in the end result. 637 00:29:28,000 --> 00:29:31,000 So in this case you'd have a lot of things that were constant time where constant was 638 00:29:31,000 --> 00:29:33,000 defined by the chunk size. 639 00:29:33,000 --> 00:29:37,000 So O of 100, let's say, if that was the maximum chunk size 640 00:29:37,000 --> 00:29:39,000 to do these things, as opposed to O of 641 00:29:39,000 --> 00:29:42,000 N for the whole block net, 642 00:29:42,000 --> 00:29:43,000 so kind of a neat 643 00:29:43,000 --> 00:29:45,000 little thing to think through. 644 00:29:45,000 --> 00:29:48,000 There was a time, just so you can feel 645 00:29:48,000 --> 00:29:51,000 gratified about how I've worked you very hard, but actually in the past I've been 646 00:29:51,000 --> 00:29:53,000 even harsher. 647 00:29:53,000 --> 00:29:57,000 We actually had them build the doubly linked chunked list editor buffer, 648 00:29:57,000 --> 00:30:00,000 and now we have you build the singly linked chunk list PQ, and it's 649 00:30:00,000 --> 00:30:02,000 actually 650 00:30:02,000 --> 00:30:06,000 still very complicated, as you'll find out, but not nearly quite as hairy as the 651 00:30:06,000 --> 00:30:09,000 full in both directions every way 652 00:30:09,000 --> 00:30:11,000 craziness. 653 00:30:11,000 --> 00:30:14,000 And you can kind of imagine in your mind. 654 00:30:14,000 --> 00:30:16,000 Any questions about edited buffer? 655 00:30:16,000 --> 00:30:18,000 We?ve got two more data structures to implement, 656 00:30:18,000 --> 00:30:20,000 two more things that we've been using, 657 00:30:20,000 --> 00:30:25,000 and happy to have in our arsenal that we need to know how they work. 658 00:30:25,000 --> 00:30:29,000 Let's talk about map first, and in 659 00:30:29,000 --> 00:30:31,000 fact the strategy that I'm going to end up showing you for map is actually going to be 660 00:30:31,000 --> 00:30:35,000 the same one we use for set, then we'll come back and think of an even more clever 661 00:30:35,000 --> 00:30:37,000 way to do map, too. So 662 00:30:37,000 --> 00:30:40,000 we've got two things to talk about, which are binary search trees and hashing. We'll 663 00:30:40,000 --> 00:30:41,000 do binary search trees first, 664 00:30:41,000 --> 00:30:44,000 and then we'll get back to the hashing the second time around. 665 00:30:44,000 --> 00:30:46,000 So 666 00:30:46,000 --> 00:30:49,000 map is, next to vector, probably the most useful thing you've got going on in 667 00:30:49,000 --> 00:30:50,000 your class library, 668 00:30:50,000 --> 00:30:53,000 all kinds of things that do look up based 669 00:30:53,000 --> 00:30:56,000 activities; looking up students by ID number, 670 00:30:56,000 --> 00:30:57,000 phone numbers, 671 00:30:57,000 --> 00:31:00,000 by name, any of these kinds of things where you need to have this key at associated satellite 672 00:31:00,000 --> 00:31:02,000 information. And 673 00:31:02,000 --> 00:31:05,000 you'd like to be able to do that retrieval and update quickly. 674 00:31:05,000 --> 00:31:08,000 The map is the one for you. 675 00:31:08,000 --> 00:31:11,000 So if you remember it's key value fears, it's all about key being the 676 00:31:11,000 --> 00:31:15,000 identifying mark that allows you to find the associated value. And then the value 677 00:31:15,000 --> 00:31:18,000 is actually just being stored and kind of retrieved without using it 678 00:31:18,000 --> 00:31:21,000 or examining it in any kind of interesting way. 679 00:31:21,000 --> 00:31:25,000 And what we're really interested in here is how to make this guy work really efficiently. So efficiently 680 00:31:25,000 --> 00:31:27,000 being hopefully log in or better, 681 00:31:27,000 --> 00:31:31,000 if we could get that on both the update operations that add and remove things 682 00:31:31,000 --> 00:31:33,000 from the map, as well as the 683 00:31:33,000 --> 00:31:34,000 look up operation, then 684 00:31:34,000 --> 00:31:37,000 that would please probably all our clients immensely, if we could get both of 685 00:31:37,000 --> 00:31:41,000 those going well. 686 00:31:41,000 --> 00:31:45,000 So I'm going to build you on that. I probably have enough time to do that. 687 00:31:45,000 --> 00:31:46,000 That 688 00:31:46,000 --> 00:31:48,000 attacks it from 689 00:31:48,000 --> 00:31:53,000 using simple tools to start with, just to kind of get a base line for what we can do. 690 00:31:53,000 --> 00:31:54,000 691 00:31:54,000 --> 00:31:56,000 Vector, our old friend vector, 692 00:31:56,000 --> 00:31:59,000 can't get too much mileage out of vector, apparently, because you can always find a 693 00:31:59,000 --> 00:32:00,000 way to make it do something 694 00:32:00,000 --> 00:32:01,000 useful for you. 695 00:32:01,000 --> 00:32:02,000 It 696 00:32:02,000 --> 00:32:03,000 gives us the convenience 697 00:32:03,000 --> 00:32:05,000 698 00:32:05,000 --> 00:32:07,000 of managing that thing with low overhead, 699 00:32:07,000 --> 00:32:09,000 direct access is good. 700 00:32:09,000 --> 00:32:11,000 We make a parastruct 701 00:32:11,000 --> 00:32:13,000 that has the key and the value together. 702 00:32:13,000 --> 00:32:16,000 We store it into a vector, 703 00:32:16,000 --> 00:32:18,000 and then we 704 00:32:18,000 --> 00:32:21,000 may or may not sort that vector. But if it's sorted, there's probably some good reasons. Think 705 00:32:21,000 --> 00:32:22,000 about 706 00:32:22,000 --> 00:32:26,000 what order to sort it in, what information to be using to sort that, 707 00:32:26,000 --> 00:32:28,000 and then we'll get value to add. We're just going to go 708 00:32:28,000 --> 00:32:30,000 through doing it, rather than talk about it. 709 00:32:30,000 --> 00:32:32,000 We'll just go over and 710 00:32:32,000 --> 00:32:36,000 make a vector happen. 711 00:32:36,000 --> 00:32:38,000 Let's get over here in X-code. 712 00:32:38,000 --> 00:32:40,000 Let's take a look at 713 00:32:40,000 --> 00:32:42,000 my map dot H, 714 00:32:42,000 --> 00:32:43,000 and all 715 00:32:43,000 --> 00:32:46,000 I'm going to put in there is actually add and get value. 716 00:32:46,000 --> 00:32:48,000 And 717 00:32:48,000 --> 00:32:50,000 the other one contains key and remove." 718 00:32:50,000 --> 00:32:52,000 719 00:32:52,000 --> 00:32:55,000 But these are these two operations that really matter to us, getting something in there, and 720 00:32:55,000 --> 00:32:57,000 getting something back out. 721 00:32:57,000 --> 00:32:58,000 722 00:32:58,000 --> 00:33:00,000 And so I'm going to start by 723 00:33:00,000 --> 00:33:02,000 building this thing on vector. So let me go ahead and 724 00:33:02,000 --> 00:33:14,000 indicate that I plan to use vector as part of 725 00:33:14,000 --> 00:33:15,000 what I'm doing. So I defined a little structure 726 00:33:15,000 --> 00:33:20,000 that's the pair. I'm going to call that "pair" 727 00:33:20,000 --> 00:33:23,000 even, that's a nice word for it. So the pair of the key and the value that go together, 728 00:33:23,000 --> 00:33:29,000 and then what I will store here 729 00:33:29,000 --> 00:33:31,000 is a vector of those pairs. They'll 730 00:33:31,000 --> 00:33:36,000 give me a key to value; I'll stick them together in that little struct, stick them into my vector, and 731 00:33:36,000 --> 00:33:38,000 hopefully be able to retrieve them later. 732 00:33:38,000 --> 00:33:42,000 So given that I'm depending only on vector, I don?t have any other 733 00:33:42,000 --> 00:33:45,000 information, I don't technically need to go out of my way to disallow then 734 00:33:45,000 --> 00:33:48,000 memory-wise copying, because vector does a deep copy. But I'm going to leave it in there, because I 735 00:33:48,000 --> 00:33:52,000 plan on going some place that is eventually going to need it. I might as well be safe from 736 00:33:52,000 --> 00:33:54,000 the get go. 737 00:33:54,000 --> 00:33:55,000 The vector 738 00:33:55,000 --> 00:34:00,000 will be set up and destroyed automatically as part of management of my 739 00:34:00,000 --> 00:34:04,000 data member. So I actually don't have any explicit allocation/deallocation, so if I look at my 740 00:34:04,000 --> 00:34:06,000 constructor and destructor they'll be totally empty. 741 00:34:06,000 --> 00:34:11,000 And then add and get value are where I'm going to actually get to do some work. 742 00:34:11,000 --> 00:34:14,000 Before I implement them, I'm just going to think for a second about 743 00:34:14,000 --> 00:34:16,000 how this is going to play out. 744 00:34:16,000 --> 00:34:20,000 Let's say I'm doing a vector that's storing 745 00:34:20,000 --> 00:34:26,000 strings and their length. Let's say, it's just a map of 746 00:34:26,000 --> 00:34:27,000 string, 747 00:34:27,000 --> 00:34:30,000 748 00:34:30,000 --> 00:34:34,000 and if I do M dot add 749 00:34:34,000 --> 00:34:37,000 car. I'm going to store a three with it, 750 00:34:37,000 --> 00:34:39,000 so on the other side of the wall, 751 00:34:39,000 --> 00:34:41,000 what we're going to do is make a little struct 752 00:34:41,000 --> 00:34:43,000 that has the word car 753 00:34:43,000 --> 00:34:47,000 and the number three with it, and we package it into our ray. 754 00:34:47,000 --> 00:34:51,000 And so then we get a second call to put something in like 755 00:34:51,000 --> 00:34:55,000 dog, also with a three, then we'll make a second one of these and stick it in 756 00:34:55,000 --> 00:34:58,000 our ray. 757 00:34:58,000 --> 00:35:01,000 And so on, so the idea is to have the vector really kind of manage where the storage is going. And 758 00:35:01,000 --> 00:35:02,000 then what 759 00:35:02,000 --> 00:35:05,000 we'll just do is package it up and stick it in there. 760 00:35:05,000 --> 00:35:08,000 The one thing, though, that I have to think a little bit about, because it sounds 761 00:35:08,000 --> 00:35:11,000 like add should be pretty easy. It almost seems like what I need to do is something 762 00:35:11,000 --> 00:35:13,000 as simple as this. And 763 00:35:13,000 --> 00:35:15,000 I'll start to write the code, and then you can tell me why 764 00:35:15,000 --> 00:35:17,000 this isn't what I want to do. 765 00:35:17,000 --> 00:35:20,000 I say pair TP, and I say P dot 766 00:35:20,000 --> 00:35:23,000 key equals key P dot 767 00:35:23,000 --> 00:35:30,000 value equals val, and then I say entries dot add P. 768 00:35:30,000 --> 00:35:32,000 Almost, but not quite what I want, 769 00:35:32,000 --> 00:35:34,000 what have I forgotten 770 00:35:34,000 --> 00:35:39,000 to take care of? 771 00:35:39,000 --> 00:35:42,000 If it's not already in there; 772 00:35:42,000 --> 00:35:46,000 this has to do with just what's the interface for nap. What does nap say about it if you 773 00:35:46,000 --> 00:35:48,000 try to add 774 00:35:48,000 --> 00:35:51,000 for a key a different value. So in the case of this one, it didn't quite make sense that I would add 775 00:35:51,000 --> 00:35:55,000 something, but let's say I accidentally put car in there with a link to the first time that I went 776 00:35:55,000 --> 00:35:57,000 to go fix it, right, 777 00:35:57,000 --> 00:36:01,000 that my subsequent to ask it store car with the proper value, 778 00:36:01,000 --> 00:36:03,000 three, 779 00:36:03,000 --> 00:36:05,000 doesn't create a totally new entry. 780 00:36:05,000 --> 00:36:09,000 It has to find the existing entry that already has car and overwrite it. 781 00:36:09,000 --> 00:36:13,000 So the behavior of mad was it could add when it was not previously existing. 782 00:36:13,000 --> 00:36:16,000 It also could be an overwrite of an existing value. So we do need 783 00:36:16,000 --> 00:36:17,000 to find 784 00:36:17,000 --> 00:36:20,000 if there is an entry that already has that key, and just 785 00:36:20,000 --> 00:36:21,000 replace its value 786 00:36:21,000 --> 00:36:23,000 rather than 787 00:36:23,000 --> 00:36:26,000 add a second entry with the same key. So at any given point, right, in this whole 788 00:36:26,000 --> 00:36:28,000 vector, there should be one entry 789 00:36:28,000 --> 00:36:30,000 for each unique key. 790 00:36:30,000 --> 00:36:33,000 So I have to do a search. And 791 00:36:33,000 --> 00:36:35,000 let me 792 00:36:35,000 --> 00:36:38,000 write this code, and then I'm going to mention why I'm going to decompose it in a 793 00:36:38,000 --> 00:36:39,000 second. 794 00:36:39,000 --> 00:36:42,000 If I do entries dot size, I plus plus 795 00:36:42,000 --> 00:36:44,000 then if 796 00:36:44,000 --> 00:36:49,000 entries of I dot key equals key. 797 00:36:49,000 --> 00:36:51,000 Then 798 00:36:51,000 --> 00:36:52,000 let's break. Let me pull out 799 00:36:52,000 --> 00:37:00,000 I so I can get access to it when I'm done. So after this 800 00:37:00,000 --> 00:37:01,000 loop exits, 801 00:37:01,000 --> 00:37:06,000 if I is less than entries dot size, 802 00:37:06,000 --> 00:37:06,000 then 803 00:37:06,000 --> 00:37:09,000 I know that it found a match. 804 00:37:09,000 --> 00:37:12,000 In the case where it went through all of them, and it never hit the break. Then I will 805 00:37:12,000 --> 00:37:15,000 have gone all the way to where it is exactly equal to entries dot size; if it didn't then I 806 00:37:15,000 --> 00:37:16,000 just 807 00:37:16,000 --> 00:37:19,000 have a place to overwrite. And I can say entries of 808 00:37:19,000 --> 00:37:21,000 I dot val 809 00:37:21,000 --> 00:37:23,000 equals val. 810 00:37:23,000 --> 00:37:27,000 Otherwise I go through this process of adding a new one. I had to 811 00:37:27,000 --> 00:37:30,000 go hunt that thing down 812 00:37:30,000 --> 00:37:33,000 and replace it. 813 00:37:33,000 --> 00:37:35,000 That code, 814 00:37:35,000 --> 00:37:38,000 when you look at it, you can say, "Gosh I feel like that code's going to be familiar." 815 00:37:38,000 --> 00:37:43,000 Because that idea of searching to find a match to the key 816 00:37:43,000 --> 00:37:45,000 probably is going to come in really handy 817 00:37:45,000 --> 00:37:46,000 when I need to do get value, 818 00:37:46,000 --> 00:37:50,000 but it has the same exact problem of oh, I've got to go find that match. So in fact, 819 00:37:50,000 --> 00:37:55,000 that kind of motivates the idea of why don't I just take this little piece of code 820 00:37:55,000 --> 00:37:59,000 and separate it out and do a helper that they can both use. I 821 00:37:59,000 --> 00:38:02,000 may not have planned for this from the beginning, but once I see it happening, I might as well fix 822 00:38:02,000 --> 00:38:05,000 it. So I can say find index 823 00:38:05,000 --> 00:38:07,000 for key that 824 00:38:07,000 --> 00:38:10,000 given a key will return to you 825 00:38:10,000 --> 00:38:11,000 826 00:38:11,000 --> 00:38:15,000 the vector index at which 827 00:38:15,000 --> 00:38:19,000 that key is or a negative one. I don't know if I need to keep that 828 00:38:19,000 --> 00:38:22,000 one anymore. 829 00:38:22,000 --> 00:38:24,000 If it ever found it, it returns I otherwise 830 00:38:24,000 --> 00:38:28,000 it returned negative one. That can be our not found. 831 00:38:28,000 --> 00:38:37,000 I can actually use that up here, 832 00:38:37,000 --> 00:38:40,000 and then I can say, if found 833 00:38:40,000 --> 00:38:43,000 does not equal negative one. 834 00:38:43,000 --> 00:38:44,000 Then we do that, 835 00:38:44,000 --> 00:38:46,000 and then that 836 00:38:46,000 --> 00:38:48,000 tells us that 837 00:38:48,000 --> 00:38:51,000 this little piece of code is going to come in handy right up here, 838 00:38:51,000 --> 00:38:55,000 get value if found does not equal 839 00:38:55,000 --> 00:38:56,000 negative one, then we can return that 840 00:38:56,000 --> 00:39:02,000 and what is the behavior of get value if it didn't find it? Does anybody remember? 841 00:39:02,000 --> 00:39:06,000 I ask it to get the value in this case for lollipop and it's not 842 00:39:06,000 --> 00:39:07,000 there, 843 00:39:07,000 --> 00:39:13,000 does it turn zero, what does it 844 00:39:13,000 --> 00:39:22,000 do? Untracked forget value, remember? 845 00:39:22,000 --> 00:39:25,000 This is an error. 846 00:39:25,000 --> 00:39:26,000 There is not sort of a 847 00:39:26,000 --> 00:39:30,000 general case return value you can produce here 848 00:39:30,000 --> 00:39:33,000 that will really sort of sufficiently describe to someone who's made 849 00:39:33,000 --> 00:39:37,000 this call in error what happened, right? I can't return zero, because it happens to be 850 00:39:37,000 --> 00:39:40,000 that sometimes I?m putting in, and sometimes I'm putting strings or structs 851 00:39:40,000 --> 00:39:44,000 kinds of things. There's no general zero or negative one to use, 852 00:39:44,000 --> 00:39:47,000 so the behavior of get value is if you ask it to retrieve a value for a key, it ought to be 853 00:39:47,000 --> 00:39:48,000 there. So 854 00:39:48,000 --> 00:39:52,000 the corresponding implementation would likely also just go through 855 00:39:52,000 --> 00:39:56,000 the process of contains key, which we call find index for key and check. 856 00:39:56,000 --> 00:39:58,000 It could be used by the client 857 00:39:58,000 --> 00:40:01,000 if they need to know in advance that it's really there. 858 00:40:01,000 --> 00:40:04,000 So let's see how I'm 859 00:40:04,000 --> 00:40:06,000 doing. Let's go back to my 860 00:40:06,000 --> 00:40:11,000 code here, 861 00:40:11,000 --> 00:40:13,000 and this 862 00:40:13,000 --> 00:40:15,000 being 863 00:40:15,000 --> 00:40:16,000 not my specialty, 864 00:40:16,000 --> 00:40:20,000 and just see that I can put something in and 865 00:40:20,000 --> 00:40:25,000 get it back out would be a good first test to see that things are 866 00:40:25,000 --> 00:40:29,000 going as they should. Oh, no it didn't like that. Oh yeah, find 867 00:40:29,000 --> 00:40:30,000 index for key. 868 00:40:30,000 --> 00:40:32,000 Find index for key 869 00:40:32,000 --> 00:40:34,000 was not 870 00:40:34,000 --> 00:40:38,000 populated clear in my dot H, so let me go move it there. I'll put 871 00:40:38,000 --> 00:40:42,000 it in the private section, because I don?t want this to be something that is 872 00:40:42,000 --> 00:40:45,000 exposed outside of the implementation. The idea that there is an index and 873 00:40:45,000 --> 00:40:48,000 there is a vector is not really something that I would want to make part of my 874 00:40:48,000 --> 00:40:54,000 public interface. It's really just an internal detail of how I'm managing stuff. 875 00:40:54,000 --> 00:40:56,000 Something about what I did, oh yeah, I 876 00:40:56,000 --> 00:41:00,000 used the right name for that. 877 00:41:00,000 --> 00:41:05,000 One more case let 878 00:41:05,000 --> 00:41:06,000 me just take a look at them both 879 00:41:06,000 --> 00:41:09,000 before I let the compiler tell me what is and what isn't right about them. Their both 880 00:41:09,000 --> 00:41:13,000 doing find index for key using linear search. If it comes back at not negative one 881 00:41:13,000 --> 00:41:17,000 pulling out the matching value and overriding it, and then in the case of adding 882 00:41:17,000 --> 00:41:23,000 or replacing the error - 883 00:41:23,000 --> 00:41:24,000 so the 884 00:41:24,000 --> 00:41:26,000 previously stored value for that is three, 885 00:41:26,000 --> 00:41:29,000 and if I go and I do one of those tests it's always good to know what 886 00:41:29,000 --> 00:41:34,000 happens if I ask it for something that doesn't exist. 887 00:41:34,000 --> 00:41:37,000 Making sure that the error handling that I put in there does 888 00:41:37,000 --> 00:41:43,000 what it was supposed to do, right? Are 889 00:41:43,000 --> 00:41:46,000 there any questions about the code that I wrote here? 890 00:41:46,000 --> 00:41:49,000 The vector is always kind of a good starting place for these things, because actually it just 891 00:41:49,000 --> 00:41:53,000 tends to use very simple things, and it tends to lend itself to easy code. 892 00:41:53,000 --> 00:41:57,000 But because of vector constraints being at a contiguous back structure, right, there's likely to 893 00:41:57,000 --> 00:41:59,000 be some 894 00:41:59,000 --> 00:42:01,000 performance implication. And in 895 00:42:01,000 --> 00:42:02,000 particular, 896 00:42:02,000 --> 00:42:04,000 in the unsorted case here, 897 00:42:04,000 --> 00:42:07,000 both add and get value 898 00:42:07,000 --> 00:42:09,000 involve doing a linear search. 899 00:42:09,000 --> 00:42:12,000 In some cases it will find it quickly at the very beginning or even half way 900 00:42:12,000 --> 00:42:13,000 through. 901 00:42:13,000 --> 00:42:17,000 In other cases it won't find it at all and have to search through the whole thing. So on 902 00:42:17,000 --> 00:42:20,000 average, if you figure it's equally likely to be in any position or not there at all, 903 00:42:20,000 --> 00:42:23,000 it's at going to at least have to look at half of them or more. 904 00:42:23,000 --> 00:42:25,000 And so we can say they're both linear. 905 00:42:25,000 --> 00:42:28,000 If we change it to be sorted, 906 00:42:28,000 --> 00:42:31,000 which is the first obvious improvement to make here, 907 00:42:31,000 --> 00:42:34,000 then get value 908 00:42:34,000 --> 00:42:37,000 gets a lot faster, because we can take advantage of binary search. 909 00:42:37,000 --> 00:42:39,000 I sort it by key, 910 00:42:39,000 --> 00:42:43,000 so ignoring the value, values are just satellite data. But stored in order of 911 00:42:43,000 --> 00:42:44,000 key, 912 00:42:44,000 --> 00:42:48,000 then all the A words, B words, C words, D words, what not, so 913 00:42:48,000 --> 00:42:51,000 walking it down the middle and seeing which half you look at, then looking at the 914 00:42:51,000 --> 00:42:53,000 quarter of there and the eight of there. It's going to very quickly narrow down on 915 00:42:53,000 --> 00:42:57,000 where it had to be if it was there or if it wasn't there at all. 916 00:42:57,000 --> 00:43:00,000 So we can implement that in logarithmic time, meaning that if you 1,000 917 00:43:00,000 --> 00:43:04,000 entries, it takes ten comparisons to find it or agree that it's not there. 918 00:43:04,000 --> 00:43:08,000 If you double that it will take one more comparison. If 919 00:43:08,000 --> 00:43:12,000 you go to 2,000, negligibly faster, even numbers in the millions 920 00:43:12,000 --> 00:43:15,000 are still just a handful 921 00:43:15,000 --> 00:43:19,000 of comparisons to work it out, and say, 922 00:43:19,000 --> 00:43:22,000 to the twentieth for example is roughly a million; so 20 comparisons in or out, 923 00:43:22,000 --> 00:43:25,000 and so it didn't look at a million things a million times over. 924 00:43:25,000 --> 00:43:27,000 However, 925 00:43:27,000 --> 00:43:30,000 that didn't improve add, 926 00:43:30,000 --> 00:43:31,000 why not? 927 00:43:31,000 --> 00:43:33,000 Why does keeping an assorted order 928 00:43:33,000 --> 00:43:37,000 not have the same boost for add as it did for get value? 929 00:43:37,000 --> 00:43:46,000 Can I do the same logarithmic search? You have to 930 00:43:46,000 --> 00:43:50,000 move all the elements ? 931 00:43:50,000 --> 00:43:51,000 Yeah, 932 00:43:51,000 --> 00:43:55,000 you can find the place fast, right, you can say returning the word 933 00:43:55,000 --> 00:43:59,000 apple it's like oh, okay. I can very quickly narrow in on where it needed to be. If it 934 00:43:59,000 --> 00:44:02,000 was there I could override it quickly, but in the case where it really needed to do 935 00:44:02,000 --> 00:44:03,000 an add, 936 00:44:03,000 --> 00:44:04,000 it's got to move them down. 937 00:44:04,000 --> 00:44:05,000 So the old add 938 00:44:05,000 --> 00:44:09,000 did all its work in the search and then just tacked it on the end if it didn't find 939 00:44:09,000 --> 00:44:12,000 it, or overrode the middle. But the new add 940 00:44:12,000 --> 00:44:16,000 also has to modify the array to put it in the right place, and putting it in the right place means 941 00:44:16,000 --> 00:44:18,000 shuffling to make space. 942 00:44:18,000 --> 00:44:20,000 So even though it found it quickly, 943 00:44:20,000 --> 00:44:25,000 it was unable to update that quickly, so it still ends up being 944 00:44:25,000 --> 00:44:26,000 linear in that case. 945 00:44:26,000 --> 00:44:29,000 Now this actually not a terrible result, 946 00:44:29,000 --> 00:44:30,000 as it stands 947 00:44:30,000 --> 00:44:33,000 it's probably much more likely that you're going to load up the data 948 00:44:33,000 --> 00:44:36,000 structure once and then do a lot 949 00:44:36,000 --> 00:44:40,000 of searches on it, then infrequent additions. That's a pretty common usage 950 00:44:40,000 --> 00:44:41,000 pattern for a map. It's kind of like you 951 00:44:41,000 --> 00:44:43,000 build a dictionary, right, 952 00:44:43,000 --> 00:44:46,000 loading the dictionary once could be expensive, but then people get tons and tons of 953 00:44:46,000 --> 00:44:49,000 look ups of words later. But you don't change the definitions a lot later or add a 954 00:44:49,000 --> 00:44:51,000 lot of new words. 955 00:44:51,000 --> 00:44:51,000 So 956 00:44:51,000 --> 00:44:56,000 this might actually be quite tolerable in many situations 957 00:44:56,000 --> 00:44:59,000 that the building of it was expensive, but it gave 958 00:44:59,000 --> 00:45:00,000 you 959 00:45:00,000 --> 00:45:02,000 fast search access. 960 00:45:02,000 --> 00:45:05,000 But really what you want to do is say, can we get both of those? So driving to say 961 00:45:05,000 --> 00:45:08,000 what we do with the editor buffer, you say, "Well, what if I just want both of those operations 962 00:45:08,000 --> 00:45:11,000 to be logarithmic or better, what can I do?" So 963 00:45:11,000 --> 00:45:12,000 I'll leave you with kind of just 964 00:45:12,000 --> 00:45:16,000 a little bit of a taste of where we're heading. 965 00:45:16,000 --> 00:45:19,000 Just to think a little bit about what the link 966 00:45:19,000 --> 00:45:21,000 list does and doesn't buy us, 967 00:45:21,000 --> 00:45:24,000 right, that the link list gave us flexibility 968 00:45:24,000 --> 00:45:29,000 at the editing, the local editing site, but it doesn't give us fast searching. 969 00:45:29,000 --> 00:45:31,000 Well maybe we can take this link list and try to 970 00:45:31,000 --> 00:45:33,000 add searching 971 00:45:33,000 --> 00:45:37,000 to the idea of this flexibility to kind of get the best of both 972 00:45:37,000 --> 00:45:41,000 worlds. So that's what we're going to start on Monday, building a tree. So if 973 00:45:41,000 --> 00:45:44,000 you have time today, I'd love to have you come to Truman, and otherwise have a good 974 00:45:44,000 --> 00:45:45,000 weekend, and 975 00:45:45,000 --> 00:45:50,000 get your PQ work in.