1 00:00:00,000 --> 00:00:10,000 2 00:00:10,000 --> 00:00:11,000 3 00:00:11,000 --> 00:00:15,000 This presentation is delivered by the Stanford Center for Professional 4 00:00:15,000 --> 00:00:23,000 Development. 5 00:00:23,000 --> 00:00:25,000 Hey there! Oh, 6 00:00:25,000 --> 00:00:29,000 we're back. We're back and now we're getting down and dirty. We'll do more pointers, and link 7 00:00:29,000 --> 00:00:33,000 list, and implementations. It's actually going to be like our life for the next 8 00:00:33,000 --> 00:00:36,000 week and half. There's just even more implementation, trying to think about how those 9 00:00:36,000 --> 00:00:37,000 things work on the backside. So 10 00:00:37,000 --> 00:00:41,000 I will do another alternate implementation of the stack today. 11 00:00:41,000 --> 00:00:44,000 So we did one based on vector and we're going to do one based on a link list. 12 00:00:44,000 --> 00:00:47,000 And then I'm going to do a cube based on link list, as well, 13 00:00:47,000 --> 00:00:48,000 to kind of just mix 14 00:00:48,000 --> 00:00:51,000 and match a little bit. And then, we're still at the end of the case study, which the 15 00:00:51,000 --> 00:00:55,000 material is covered in Chapter 9. I probably won't get thorough all of that today. What we 16 00:00:55,000 --> 00:00:57,000 don't finish today we'll be talking about on Friday, 17 00:00:57,000 --> 00:01:00,000 and then we'll go on to start talking about map implementation and look at 18 00:01:00,000 --> 00:01:02,000 binary search trees. How many 19 00:01:02,000 --> 00:01:05,000 people started [inaudible]? How many of 20 00:01:05,000 --> 00:01:08,000 you got somewhere feeling good? 21 00:01:08,000 --> 00:01:09,000 Not just yet. One thing I will 22 00:01:09,000 --> 00:01:12,000 tell you is that there are two implementations you're working on 23 00:01:12,000 --> 00:01:16,000 and I listed them in the order of the one that the material we've seen the most, right. 24 00:01:16,000 --> 00:01:19,000 We've talked about link list more than we have talked about things that are tree and 25 00:01:19,000 --> 00:01:22,000 heat based. So I listed that one first. But, in fact, actually, in terms of difficulty of coding, 26 00:01:22,000 --> 00:01:27,000 I think, the second one is little bit more attractable than the first. So 27 00:01:27,000 --> 00:01:30,000 there's no reason you can't do them in either order if it feels good to you to start on the second one 28 00:01:30,000 --> 00:01:32,000 before you come back to the first one. 29 00:01:32,000 --> 00:01:35,000 Or just trade off between working on them if one of them is giving you fits just look 30 00:01:35,000 --> 00:01:37,000 at the other one to clear your head, 31 00:01:37,000 --> 00:01:41,000 so just a thought about how to approach it. Okay. 32 00:01:41,000 --> 00:01:46,000 So I am going to do some coding. I'm going to come back to my slides in a little bit here. 33 00:01:46,000 --> 00:01:50,000 But to pick up where we left off last time, was we had written a 34 00:01:50,000 --> 00:01:55,000 vector based implementation of the stack 35 00:01:55,000 --> 00:01:56,000 abstraction 36 00:01:56,000 --> 00:02:01,000 and so looking at the main operations here for a stack or push and pop array 37 00:02:01,000 --> 00:02:04,000 that its mechanism here is just to add to the end of the vector that we've 38 00:02:04,000 --> 00:02:08,000 got. So I'm letting the vector handle all the growing, and resizing, 39 00:02:08,000 --> 00:02:11,000 and memory, and manipulations that are needed for that. 40 00:02:11,000 --> 00:02:14,000 And then, when asked to pop one, we just take the one that was last most added, 41 00:02:14,000 --> 00:02:16,000 which will be at the far end of the array. 42 00:02:16,000 --> 00:02:19,000 And so given the way vector behaves, knowing that we know it's a continuous 43 00:02:19,000 --> 00:02:21,000 array based 44 00:02:21,000 --> 00:02:25,000 backing behind it, then we're getting this O of one access to that last element 45 00:02:25,000 --> 00:02:27,000 to either add a new one down there 46 00:02:27,000 --> 00:02:30,000 or to retrieve one and remove it from the end 47 00:02:30,000 --> 00:02:33,000 so that push and pop operations will both being in constant time no matter how many elements are in the 48 00:02:33,000 --> 00:02:36,000 stack, 10, 15, 6 million, 49 00:02:36,000 --> 00:02:40,000 the little bit of work we're doing is affecting only the far end of the array 50 00:02:40,000 --> 00:02:44,000 and not causing the rest of the configured storage to be jostled whatsoever so 51 00:02:44,000 --> 00:02:46,000 pretty easy to get up and running. 52 00:02:46,000 --> 00:02:50,000 So in general, once you have one abstraction you can start leveraging it into 53 00:02:50,000 --> 00:02:52,000 building other attractions and the vector manages a lot of the things that 54 00:02:52,000 --> 00:02:56,000 any array based implementation would need. So rather than using a raw array, 55 00:02:56,000 --> 00:02:57,000 there's very little reason to kind of, 56 00:02:57,000 --> 00:03:00,000 you know, ditch vector and go the straight raw array in this case because all it basically does 57 00:03:00,000 --> 00:03:03,000 it open up opportunities for error and kind of management and 58 00:03:03,000 --> 00:03:08,000 the kind of efficiency you can gain back by removing vectors intermediate 59 00:03:08,000 --> 00:03:11,000 stuff is really not very profound so not worth, 60 00:03:11,000 --> 00:03:12,000 probably, 61 00:03:12,000 --> 00:03:13,000 taking on. 62 00:03:13,000 --> 00:03:16,000 What we do is consider the only link limitation. We had talked about a link 63 00:03:16,000 --> 00:03:20,000 list for a vector and we kind of discarded it as likely not to lead anywhere 64 00:03:20,000 --> 00:03:21,000 interesting. 65 00:03:21,000 --> 00:03:24,000 I actually, am going to fully pursue it for this stack because it actually turns out that a link list 66 00:03:24,000 --> 00:03:28,000 is a good fit for what the behavior that a stack needs in terms of the 67 00:03:28,000 --> 00:03:29,000 manipulations and flexibility. 68 00:03:29,000 --> 00:03:33,000 So if I have a stack of enterers where I push 10, 20, and 30. I'm going to 69 00:03:33,000 --> 00:03:37,000 do that same diagram we did for the vector one as I did for the stack is 70 00:03:37,000 --> 00:03:38,000 71 00:03:38,000 --> 00:03:40,000 one possibility 72 00:03:40,000 --> 00:03:42,000 is that I put them in the list 73 00:03:42,000 --> 00:03:47,000 in this order. 74 00:03:47,000 --> 00:03:50,000 Another possibility 75 00:03:50,000 --> 00:03:51,000 is that I put them in, 76 00:03:51,000 --> 00:03:58,000 in this order. We'll call this Strategy A; we'll call this Strategy B. 77 00:03:58,000 --> 00:04:01,000 They seem, you know, fairly symmetric, right. 78 00:04:01,000 --> 00:04:05,000 If they're going to be a strong reason that 79 00:04:05,000 --> 00:04:09,000 leaves us to prefer Strategy A over B, or are they just equally 80 00:04:09,000 --> 00:04:12,000 easy to get the things done that we need to do? 81 00:04:12,000 --> 00:04:16,000 Anybody want to make an argument for me. 82 00:04:16,000 --> 00:04:18,000 Help me out. Last in, first out. 83 00:04:18,000 --> 00:04:22,000 Last in, first out. 84 00:04:22,000 --> 00:04:25,000 [Inaudible]. Be careful here. So 10, 20, 30 means the way the stack works is like 85 00:04:25,000 --> 00:04:26,000 this, 10, 86 00:04:26,000 --> 00:04:28,000 20, 30 87 00:04:28,000 --> 00:04:31,000 and so it's at the top that we're interested in, right? Um hm. 88 00:04:31,000 --> 00:04:34,000 That the top is where activity is taking place. Oh. 89 00:04:34,000 --> 00:04:37,000 So we want to be adding things and losing from the top. So do we want to put the top at 90 00:04:37,000 --> 00:04:41,000 the end of our list or at the front of our list? 91 00:04:41,000 --> 00:04:46,000 Which is the easier to access in the link list design? [Inaudible]. 92 00:04:46,000 --> 00:04:48,000 The front, right? 93 00:04:48,000 --> 00:04:52,000 So in the array format we prefer this, putting the top of the vector on the far end 94 00:04:52,000 --> 00:04:54,000 because then there is the only jostling of 95 00:04:54,000 --> 00:04:58,000 that. But needing access to the top here in the front is actually going to be the better way to get the link 96 00:04:58,000 --> 00:05:00,000 list strategy 97 00:05:00,000 --> 00:05:04,000 you work in. Let me actually go through both the push and the pop to convince 98 00:05:04,000 --> 00:05:05,000 you why this is true. 99 00:05:05,000 --> 00:05:09,000 So if I have access in the 10, 20, 30 case I could actually keep a separate 100 00:05:09,000 --> 00:05:12,000 additional pointer, one to the front and one to the back. So it could actually 101 00:05:12,000 --> 00:05:13,000 become maintaining at 102 00:05:13,000 --> 00:05:15,000 all times as pointing to the back. If I 103 00:05:15,000 --> 00:05:19,000 did that, right, then subsequent push up to a 40, it's pretty easy to tack on a 40. 104 00:05:19,000 --> 00:05:20,000 So it turns out it would be that 105 00:05:20,000 --> 00:05:24,000 the adding it's the push operation that actually isn't really a problem on this 106 00:05:24,000 --> 00:05:26,000 Strategy A. We can still kind of easily, and over 107 00:05:26,000 --> 00:05:30,000 time, if we have that tail pointer just tack one on to the end. As 108 00:05:30,000 --> 00:05:32,000 for the pop operation, 109 00:05:32,000 --> 00:05:33,000 110 00:05:33,000 --> 00:05:34,000 if I have to pop that 30, 111 00:05:34,000 --> 00:05:39,000 I need to adjust that tail pointer and move it back a cell. And that's where we 112 00:05:39,000 --> 00:05:41,000 start to get into trouble. If I have a pointer to 113 00:05:41,000 --> 00:05:45,000 the tail cell there is no mechanism in the singly linked list to back 114 00:05:45,000 --> 00:05:46,000 up. If I 115 00:05:46,000 --> 00:05:48,000 say it's time to pop 30, 116 00:05:48,000 --> 00:05:49,000 then I would actually need 117 00:05:49,000 --> 00:05:52,000 to delete the 30, get the value out, but then I need to update this tail pointer to 118 00:05:52,000 --> 00:05:55,000 point to 20, and 30 doesn't know 119 00:05:55,000 --> 00:05:57,000 where 20 is, 30's oblivious about these things. 120 00:05:57,000 --> 00:06:00,000 And so the only way to find 20 in singly linked list would be go back to the very 121 00:06:00,000 --> 00:06:03,000 beginning, walk your way down, and find the one that pointed to the cell you just took out of 122 00:06:03,000 --> 00:06:05,000 the list 123 00:06:05,000 --> 00:06:08,000 and that's going to be an operation we want to avoid. 124 00:06:08,000 --> 00:06:11,000 In the Strategy B case, right, adding a cell, this 125 00:06:11,000 --> 00:06:15,000 means putting a 40 in allocating a new cell and then attaching it, 126 00:06:15,000 --> 00:06:18,000 splicing it in, right between wiring in two pointers, and we can do these links in 127 00:06:18,000 --> 00:06:19,000 constant time, 128 00:06:19,000 --> 00:06:24,000 and then popping one is taking that front row cell off. Easy to get to, easy to update, and 129 00:06:24,000 --> 00:06:25,000 splice out 130 00:06:25,000 --> 00:06:29,000 so no traversal, no extra pointers needed. 131 00:06:29,000 --> 00:06:31,000 The idea that there are, 132 00:06:31,000 --> 00:06:34,000 you know, 100 or 1,000 elements following the one that's on 133 00:06:34,000 --> 00:06:36,000 top is irrelevant. It 134 00:06:36,000 --> 00:06:38,000 doesn't have any impact, whatsoever, on the running time 135 00:06:38,000 --> 00:06:42,000 to access the front, or stuff another front, or taking it on or off the front. 136 00:06:42,000 --> 00:06:45,000 So it is Strategy B 137 00:06:45,000 --> 00:06:47,000 that gives us the best setup when it's 138 00:06:47,000 --> 00:06:50,000 in the back of the linked list. 139 00:06:50,000 --> 00:06:55,000 So let me go ahead and do it. I think it's kind of 140 00:06:55,000 --> 00:06:58,000 good to see. Oh, like, what is it like to just write code and make stuff work. 141 00:06:58,000 --> 00:07:04,000 And so I'm kind of fond of this and you can tell me. 142 00:07:04,000 --> 00:07:07,000 I make a cell C 143 00:07:07,000 --> 00:07:08,000 it has an L type 144 00:07:08,000 --> 00:07:12,000 value and it has a soft T next 145 00:07:12,000 --> 00:07:15,000 right there. 146 00:07:15,000 --> 00:07:19,000 And then I have a pointer to the front linked list cell. 147 00:07:19,000 --> 00:07:21,000 So in addition, you know, 148 00:07:21,000 --> 00:07:25,000 it might that in order to make something like size operate 149 00:07:25,000 --> 00:07:26,000 efficiently, 150 00:07:26,000 --> 00:07:27,000 I might also want to catch the 151 00:07:27,000 --> 00:07:30,000 number of cells that update, or added or renew. I can also decide to 152 00:07:30,000 --> 00:07:33,000 just leave it out for now. Maybe 153 00:07:33,000 --> 00:07:36,000 I'll actually even change this to be the easier function, right, which 154 00:07:36,000 --> 00:07:38,000 is empty form. 155 00:07:38,000 --> 00:07:40,000 That way I don't even have to go down that road 156 00:07:40,000 --> 00:07:44,000 for now. 157 00:07:44,000 --> 00:07:46,000 Go back over here to 158 00:07:46,000 --> 00:07:49,000 this side and I've got to make everything now consistent with what's going on 159 00:07:49,000 --> 00:07:50,000 there. 160 00:07:50,000 --> 00:07:51,000 Okay. 161 00:07:51,000 --> 00:07:54,000 Let's start off with our constructor, which is a good place to make sure you get 162 00:07:54,000 --> 00:07:56,000 into a valid state. 163 00:07:56,000 --> 00:07:59,000 In this case, we're going to use the empty list so the head pointer points and tells us 164 00:07:59,000 --> 00:08:03,000 we have no cells whatsoever in the list. So we don't pre allocate anything, 165 00:08:03,000 --> 00:08:04,000 we don't get ready. 166 00:08:04,000 --> 00:08:07,000 What's kind of nice with this linked list is to allocate on demand 167 00:08:07,000 --> 00:08:10,000 each individual cell that's asked to be added to the list. 168 00:08:10,000 --> 00:08:13,000 The delete operation actually does show we need to do some work. I'm actually not 169 00:08:13,000 --> 00:08:15,000 going to implement it but I'm going to mention here that I would need to delete the 170 00:08:15,000 --> 00:08:16,000 entire list, 171 00:08:16,000 --> 00:08:19,000 which would involve, either iterates or recursion my way down to kind of 172 00:08:19,000 --> 00:08:21,000 do all the work. 173 00:08:21,000 --> 00:08:25,000 And then I changed this from size to iterate. 174 00:08:25,000 --> 00:08:28,000 I'd like to know if my list is empty. 175 00:08:28,000 --> 00:08:30,000 So I can just test for head 176 00:08:30,000 --> 00:08:33,000 is equal, equal 177 00:08:33,000 --> 00:08:36,000 to null. If it's null we have no list. Let 178 00:08:36,000 --> 00:08:41,000 me work on push before I work on pop. I'll 179 00:08:41,000 --> 00:08:45,000 turn them around. The push operation is, make myself a new cell. All 180 00:08:45,000 --> 00:08:47,000 right, let's go through the steps for that. New 181 00:08:47,000 --> 00:08:50,000 cell equals new cell 182 00:08:50,000 --> 00:08:54,000 T. [Inaudible] the size to hold a value 183 00:08:54,000 --> 00:08:56,000 and an X link. I'm going to set 184 00:08:56,000 --> 00:08:59,000 the new styles val to be the 185 00:08:59,000 --> 00:09:01,000 perimeter that came in. 186 00:09:01,000 --> 00:09:05,000 And now, I'm going to splice this guy onto the front of my list. 187 00:09:05,000 --> 00:09:10,000 So the outgoing pointer I'm doing first to the new cell is allocated and 188 00:09:10,000 --> 00:09:15,000 leads to what was previously the front row cell. 189 00:09:15,000 --> 00:09:19,000 And then it is now updated to point to this 190 00:09:19,000 --> 00:09:21,000 one. We've got pointer wired, one value 191 00:09:21,000 --> 00:09:23,000 assigned, and we've got a new 192 00:09:23,000 --> 00:09:28,000 cell on the front. The 193 00:09:28,000 --> 00:09:29,000 inverse of that, 194 00:09:29,000 --> 00:09:32,000 taking something off my list, I'm going to need 195 00:09:32,000 --> 00:09:35,000 to kind of detach that front row cell, delete its memory, get its value, things like 196 00:09:35,000 --> 00:09:38,000 that. So the error checking is still the same at the very beginning. Like, if I've 197 00:09:38,000 --> 00:09:41,000 got an empty staff I want to be sure I don't just kind of do reference no 198 00:09:41,000 --> 00:09:42,000 and get into 199 00:09:42,000 --> 00:09:48,000 some bad situations. So I report that error back using error, which will halt the program 200 00:09:48,000 --> 00:09:49,000 there. 201 00:09:49,000 --> 00:09:51,000 Otherwise, right, 202 00:09:51,000 --> 00:09:52,000 the element on 203 00:09:52,000 --> 00:09:57,000 top is the one that's in the value field of the head cell. So knowing that head's 204 00:09:57,000 --> 00:09:59,000 null is a safety reference to do there. 205 00:09:59,000 --> 00:10:03,000 And then I'm going to go ahead and 206 00:10:03,000 --> 00:10:06,000 get a hold of what that thing is and I'm 207 00:10:06,000 --> 00:10:09,000 going update 208 00:10:09,000 --> 00:10:13,000 the head to point to the cell after it. So splicing it out 209 00:10:13,000 --> 00:10:14,000 210 00:10:14,000 --> 00:10:17,000 and then deleting the 211 00:10:17,000 --> 00:10:20,000 old cell. 212 00:10:20,000 --> 00:10:23,000 Just take a look at these. Once we start doing 213 00:10:23,000 --> 00:10:25,000 pointers, right, there's just opportunities for error. 214 00:10:25,000 --> 00:10:28,000 I feel kind of good about what I did here but I am going to just do a little bit of 215 00:10:28,000 --> 00:10:30,000 tracing to kind of 216 00:10:30,000 --> 00:10:33,000 confirm for myself that the most likely situations that get you in trouble with link list is you'll 217 00:10:33,000 --> 00:10:36,000 handle the mainstream case but somehow forget one of the edge 218 00:10:36,000 --> 00:10:40,000 cases. Such as, what if the list was totally empty or just had a single cell? 219 00:10:40,000 --> 00:10:43,000 Is there some special case handling that needs to be dealt with? 220 00:10:43,000 --> 00:10:45,000 So if I think about it in terms of the push case, 221 00:10:45,000 --> 00:10:48,000 you know, I might say, well, if there's an existing set of cells, 222 00:10:48,000 --> 00:10:50,000 so we're using Strategy B here, let me 223 00:10:50,000 --> 00:10:54,000 erase A so I know what I'm looking at, 224 00:10:54,000 --> 00:10:58,000 that its head is currently pointing to a set of cells, right, that my 225 00:10:58,000 --> 00:11:00,000 allocation of the new cell out here, 226 00:11:00,000 --> 00:11:02,000 right, assigning its value, 227 00:11:02,000 --> 00:11:07,000 assigning its next field to be where head points two, and then sending head to 228 00:11:07,000 --> 00:11:08,000 this new guy, it 229 00:11:08,000 --> 00:11:10,000 looks to be good. 230 00:11:10,000 --> 00:11:14,000 If I also do that tracing on the - what if the list that we were looking at was 231 00:11:14,000 --> 00:11:17,000 just null to begin with. 232 00:11:17,000 --> 00:11:20,000 So there were no cells in there. This is the very first cell. It isn't going to have any kind of trouble 233 00:11:20,000 --> 00:11:24,000 stumbling over this case. So I would allocate that new cell 40, assign 234 00:11:24,000 --> 00:11:25,000 its value. 235 00:11:25,000 --> 00:11:28,000 Set the next field to be what head is, so that just basically sets the 236 00:11:28,000 --> 00:11:32,000 trail coming out of 40 to be null and then updates the head there. So we've 237 00:11:32,000 --> 00:11:35,000 produced a single linked list, right, with one cell. 238 00:11:35,000 --> 00:11:37,000 It looks like 239 00:11:37,000 --> 00:11:39,000 we're doing okay on the push behaviors. 240 00:11:39,000 --> 00:11:43,000 Now, the pop behavior, in this sort of case may be relevant to think about, if it's 241 00:11:43,000 --> 00:11:46,000 totally empty. All right, we come in and we've got an empty cell, 242 00:11:46,000 --> 00:11:49,000 then the empty test should cause it to error and get down to the rest of the 243 00:11:49,000 --> 00:11:50,000 code. 244 00:11:50,000 --> 00:11:52,000 Let's say that B points to 245 00:11:52,000 --> 00:11:55,000 some sequence of things, 30, 20, 246 00:11:55,000 --> 00:11:56,000 10, 247 00:11:56,000 --> 00:12:02,000 that 248 00:12:02,000 --> 00:12:05,000 taking the head value off and kind of writing it down somewhere, saying, okay, 249 00:12:05,000 --> 00:12:07,000 30 was the top value. 250 00:12:07,000 --> 00:12:10,000 The old cell is currently the head so we're 251 00:12:10,000 --> 00:12:13,000 keeping a temporary on this thing. 252 00:12:13,000 --> 00:12:17,000 And then head equals head error next, 253 00:12:17,000 --> 00:12:19,000 which splices out 254 00:12:19,000 --> 00:12:23,000 the cell around so we no longer have a pointer into this 30. The only place 255 00:12:23,000 --> 00:12:27,000 we can get back to it is with this temporary variable we assigned right here of old 256 00:12:27,000 --> 00:12:30,000 and then we delete that 257 00:12:30,000 --> 00:12:31,000 old to reclaim that storage 258 00:12:31,000 --> 00:12:33,000 and then our list now is 259 00:12:33,000 --> 00:12:37,000 the values 20 and 10, which followed it later in the list. To, also, 260 00:12:37,000 --> 00:12:41,000 do a little check what if it was the very last cell 261 00:12:41,000 --> 00:12:44,000 in the list that we're trying to pop? Does that 262 00:12:44,000 --> 00:12:46,000 have any special handling that we have overlooked 263 00:12:46,000 --> 00:12:50,000 if we just have a ten here with a null after it? I'll go 264 00:12:50,000 --> 00:12:55,000 through the process of writing down the ten. 265 00:12:55,000 --> 00:12:59,000 Assigning old to where head is, 266 00:12:59,000 --> 00:13:05,000 and assigning head-to-head error next, where head error of next cell is null 267 00:13:05,000 --> 00:13:08,000 but pointing to that, and then we set our list to null, and then we delete this cell. And so after 268 00:13:08,000 --> 00:13:11,000 we're done, right, we have the empty list again, the 269 00:13:11,000 --> 00:13:12,000 head pointer back to null. 270 00:13:12,000 --> 00:13:16,000 So it seems like we're doing pretty good job. Have a bunch of cells. Have one cell. 271 00:13:16,000 --> 00:13:25,000 No cells. Different things kind of seem to be going through here doing the right thing. Let's do it. A little bit 272 00:13:25,000 --> 00:13:34,000 of code on this side. [Inaudible]. Okay. Pop empty cell. I've 273 00:13:34,000 --> 00:13:37,000 got that. Oh, 274 00:13:37,000 --> 00:13:40,000 I think I may have proved, oh, it deliberately makes the error, in 275 00:13:40,000 --> 00:13:41,000 fact. 276 00:13:41,000 --> 00:13:43,000 That it 277 00:13:43,000 --> 00:13:48,000 was designed to test on that. [Inaudible] up 278 00:13:48,000 --> 00:13:52,000 to the end. So let's 279 00:13:52,000 --> 00:13:54,000 280 00:13:54,000 --> 00:13:57,000 see if 281 00:13:57,000 --> 00:14:03,000 we get back a three, two, one out of this guy. Okay. So 282 00:14:03,000 --> 00:14:08,000 we manage to do okay even on that. 283 00:14:08,000 --> 00:14:10,000 So looking at this piece of code, all 284 00:14:10,000 --> 00:14:14,000 right, the two operations we're most interested in, push and pop, right, 285 00:14:14,000 --> 00:14:19,000 are each oh of one, right. The number of elements out there, right, aren't 286 00:14:19,000 --> 00:14:21,000 changing anything about its performance. It does a little bit of 287 00:14:21,000 --> 00:14:23,000 imagination and a little bit of pointer arrangement up here to the front to 288 00:14:23,000 --> 00:14:24,000 the end 289 00:14:24,000 --> 00:14:28,000 and so it has a very nice tight allocation strategy, too, which is appealing relative 290 00:14:28,000 --> 00:14:29,000 to what 291 00:14:29,000 --> 00:14:33,000 the vector-based strategy was doing. It's like, now, it's doing things in advance on our behalf like 292 00:14:33,000 --> 00:14:36,000 extending our capacity so that every now and then we had some 293 00:14:36,000 --> 00:14:39,000 little glitch right when you were doing that big resize operation that happened infrequently, 294 00:14:39,000 --> 00:14:41,000 you won't have that same 295 00:14:41,000 --> 00:14:44,000 concern with this one because it does exactly what it needs at any given 296 00:14:44,000 --> 00:14:47,000 time, which is allocate a single cell, deletes a single cell, 297 00:14:47,000 --> 00:14:51,000 and also keeping the allocation very tidy in that way. So they'll both 298 00:14:51,000 --> 00:14:54,000 end of being oh of one. And both of these are 299 00:14:54,000 --> 00:14:58,000 commonly used viable strategies for how a stack is implemented. Right. Depending on the, 300 00:14:58,000 --> 00:14:59,000 301 00:14:59,000 --> 00:15:02,000 you know, just the inclination of the program or they may have decided 302 00:15:02,000 --> 00:15:05,000 one way was the way to go versus another 303 00:15:05,000 --> 00:15:07,000 that 304 00:15:07,000 --> 00:15:09,000 you will see both used in common practice because there really isn't 305 00:15:09,000 --> 00:15:12,000 one of them that's obviously better or obviously worse, right. 306 00:15:12,000 --> 00:15:16,000 Now, this uses a little bit more memory per cell, but then you have excess capacity 307 00:15:16,000 --> 00:15:21,000 and vector has excess capacity but no overhead per cell. 308 00:15:21,000 --> 00:15:23,000 There's a few other things to kind of think about trading off. You say, well, the code, 309 00:15:23,000 --> 00:15:27,000 itself, is a little harder to write if you're using only [inaudible]. 310 00:15:27,000 --> 00:15:30,000 That means it's a little bit more error prone 311 00:15:30,000 --> 00:15:33,000 and more likely, you'll make a mistake and have to do debug your way through it and the way with the 312 00:15:33,000 --> 00:15:35,000 vector it was pretty easy to get without tearing out your 313 00:15:35,000 --> 00:15:36,000 hair. Now, if 314 00:15:36,000 --> 00:15:40,000 we can 315 00:15:40,000 --> 00:15:42,000 do 316 00:15:42,000 --> 00:15:44,000 stack and queue that fast - 317 00:15:44,000 --> 00:15:45,000 I mean, stack that fast, then you figure queue 318 00:15:45,000 --> 00:15:47,000 must be 319 00:15:47,000 --> 00:15:51,000 not so crazy either. Let me 320 00:15:51,000 --> 00:15:59,000 draw you a picture of queue. 321 00:15:59,000 --> 00:16:02,000 Okay. 322 00:16:02,000 --> 00:16:08,000 I've got my 323 00:16:08,000 --> 00:16:12,000 queue and I can use the 324 00:16:12,000 --> 00:16:14,000 End queue operation on it to put in a 10, to put 325 00:16:14,000 --> 00:16:16,000 in a 20, to put in a 30. 326 00:16:16,000 --> 00:16:18,000 And 327 00:16:18,000 --> 00:16:20,000 I'm going to try and get back out what was 328 00:16:20,000 --> 00:16:23,000 front most. 329 00:16:23,000 --> 00:16:25,000 Okay. So queue is FIFO, first in 330 00:16:25,000 --> 00:16:27,000 first out, 331 00:16:27,000 --> 00:16:30,000 and waiting in line, whoever's been in the queue longest is the first one to come 332 00:16:30,000 --> 00:16:31,000 out, 333 00:16:31,000 --> 00:16:33,000 very fair handling of things. 334 00:16:33,000 --> 00:16:37,000 And so if were to think about this in terms of let's go straight across the 335 00:16:37,000 --> 00:16:39,000 linked list. We just got our head around the link list with the stack. Let's see if there's a link list for the 336 00:16:39,000 --> 00:16:40,000 queue 337 00:16:40,000 --> 00:16:44,000 provide the same kind of easy access to the things we need to do the work. 338 00:16:44,000 --> 00:16:48,000 So it seems like the 339 00:16:48,000 --> 00:16:51,000 two strategies that we looked at for stack 340 00:16:51,000 --> 00:16:55,000 are probably the same two to look at for this one, right, which we go in the front 341 00:16:55,000 --> 00:16:57,000 ways orders. I mean the front of the queue accessible 342 00:16:57,000 --> 00:17:01,000 in there and reversing our way down to the tail 343 00:17:01,000 --> 00:17:05,000 or the other way around. Right. I mean, the tail of the queue up 344 00:17:05,000 --> 00:17:09,000 here in the front 345 00:17:09,000 --> 00:17:15,000 leading the way to the head of the queue. So this is head - I'll call it 346 00:17:15,000 --> 00:17:17,000 front of queue, 347 00:17:17,000 --> 00:17:20,000 and this is back, 348 00:17:20,000 --> 00:17:24,000 this is the back 349 00:17:24,000 --> 00:17:27,000 and that's the front. Okay. So that will be GA that will be GB. But first off, 350 00:17:27,000 --> 00:17:29,000 ask yourself, 351 00:17:29,000 --> 00:17:30,000 where does the queue do its work? 352 00:17:30,000 --> 00:17:38,000 Does it do it at the front or does it do it at the back? 353 00:17:38,000 --> 00:17:41,000 It does it at both. 354 00:17:41,000 --> 00:17:44,000 But when you're end queuing, right, you're manipulating the back of the queue, adding 355 00:17:44,000 --> 00:17:47,000 something to the tail end of the line. When 356 00:17:47,000 --> 00:17:51,000 you are de queuing, you're adding something to the front of the queue. So like unlike the stack where both the 357 00:17:51,000 --> 00:17:53,000 actions were happening in the same place, and that kind of unified your thinking 358 00:17:53,000 --> 00:17:56,000 about where to 359 00:17:56,000 --> 00:17:57,000 allow for that, 360 00:17:57,000 --> 00:18:00,000 you know, easy access to the top, the one thing the stack cared 361 00:18:00,000 --> 00:18:03,000 about, but the bottom of the stack, right, was, you know, 362 00:18:03,000 --> 00:18:04,000 buried and it didn't matter. 363 00:18:04,000 --> 00:18:06,000 The queue actually needs accesses to both. 364 00:18:06,000 --> 00:18:07,000 So that 365 00:18:07,000 --> 00:18:10,000 sort of sounds like that both these strategies, right, have the potential to be 366 00:18:10,000 --> 00:18:14,000 trouble for us. Because it's like in the link list form, right, you know the idea that 367 00:18:14,000 --> 00:18:17,000 access to the head is easy, access to the tail that is a little bit more tricky. 368 00:18:17,000 --> 00:18:20,000 Now, I said we could add a tail pointer. And maybe that's actually going to be part of the 369 00:18:20,000 --> 00:18:22,000 answer to this, right, 370 00:18:22,000 --> 00:18:26,000 but as it is, with no head pointers in here, that adding an item to A would 371 00:18:26,000 --> 00:18:30,000 involve traversing that queue to find the end to put a new one on. 372 00:18:30,000 --> 00:18:34,000 Similarly, if indeed, the inverse operation is going to be our trouble, which 373 00:18:34,000 --> 00:18:38,000 is de queuing, we have to kind of walk its way down to find the last element 374 00:18:38,000 --> 00:18:41,000 in the link list ordering, which is [inaudible] queue. 375 00:18:41,000 --> 00:18:43,000 So what say we add a 376 00:18:43,000 --> 00:18:44,000 tail pointer in both cases? So we 377 00:18:44,000 --> 00:18:46,000 have a head that 378 00:18:46,000 --> 00:18:48,000 points to here and a tail that points to there, it's 379 00:18:48,000 --> 00:18:50,000 like that will make our 380 00:18:50,000 --> 00:18:53,000 problem a little bit easier to deal with. So 381 00:18:53,000 --> 00:18:55,000 let me look at B first. 382 00:18:55,000 --> 00:18:58,000 That adding to the back of the queue is easy. We saw how we did that with the stack, 383 00:18:58,000 --> 00:18:59,000 right, so that 384 00:18:59,000 --> 00:19:01,000 the End queue 385 00:19:01,000 --> 00:19:05,000 operation doesn't need that tail pointer and it already kind of wasn't a problem for us. 386 00:19:05,000 --> 00:19:08,000 Now, the D queue operation would mean, okay, using our tail pointer to know where the 387 00:19:08,000 --> 00:19:11,000 last close value is tells us it's ten. 388 00:19:11,000 --> 00:19:14,000 But we have the same problem I had pointed out with the stack, which is, 389 00:19:14,000 --> 00:19:18,000 we can get to this value, we can return this value, you know, we can delete this 390 00:19:18,000 --> 00:19:23,000 cell, but what we need to do, also, in the operation, update our tail pointer so a 391 00:19:23,000 --> 00:19:25,000 subsequent D queue can do its work efficiently. 392 00:19:25,000 --> 00:19:28,000 And it's that backing up of the tail pointer that 393 00:19:28,000 --> 00:19:29,000 is the sticky point 394 00:19:29,000 --> 00:19:33,000 that given that ten doesn't know who points into it, the single link list if very 395 00:19:33,000 --> 00:19:35,000 asymmetric that way. 396 00:19:35,000 --> 00:19:37,000 But you only know what's follows you, not what proceeds, 397 00:19:37,000 --> 00:19:40,000 but backing up this pointer is not easy. It 398 00:19:40,000 --> 00:19:44,000 involves the reversal of going back to the beginning, walking your way down to find somebody who 399 00:19:44,000 --> 00:19:46,000 points to the last [inaudible]. 400 00:19:46,000 --> 00:19:50,000 So it seems like this tail pointer didn't buy us a lot. It helped a little bit, you know, 401 00:19:50,000 --> 00:19:53,000 to get some of the operation up and running, but in the end, keeping and maintaining 402 00:19:53,000 --> 00:19:56,000 that tail pointer sort of came back to bite us. 403 00:19:56,000 --> 00:19:58,000 In the case of the Strategy A, 404 00:19:58,000 --> 00:20:02,000 the tail pointer gives us immediate access to the back, which we're going to need for 405 00:20:02,000 --> 00:20:05,000 the end queue. If I go to end queue of 40, in this case, 406 00:20:05,000 --> 00:20:10,000 then what I'm going to need to do is make a new cell, attach it off of the 407 00:20:10,000 --> 00:20:11,000 current cell, 408 00:20:11,000 --> 00:20:13,000 and then update the tail, 409 00:20:13,000 --> 00:20:17,000 right, to point to that cell while that update's moving forward down the list, 410 00:20:17,000 --> 00:20:20,000 right. If you're on the third cell and you add a fourth cell, you need to move the tail 411 00:20:20,000 --> 00:20:24,000 from the third to the fourth. Moving that direction's easy. It's the backing 412 00:20:24,000 --> 00:20:27,000 up where we got into trouble. So in fact, this 413 00:20:27,000 --> 00:20:31,000 suggests that Strategy A is the one we can make both these operations 414 00:20:31,000 --> 00:20:33,000 manipulate in constant time in a way 415 00:20:33,000 --> 00:20:35,000 that this one got 416 00:20:35,000 --> 00:20:37,000 us into trouble. 417 00:20:37,000 --> 00:20:41,000 I think we can do it. 418 00:20:41,000 --> 00:20:44,000 Let's switch it over. I've got an 419 00:20:44,000 --> 00:20:49,000 empty queue here. 420 00:20:49,000 --> 00:20:53,000 Also, it is empty in size. This is the same problem with size, which is either you have to go count them 421 00:20:53,000 --> 00:20:55,000 or you have to 422 00:20:55,000 --> 00:20:55,000 423 00:20:55,000 --> 00:20:57,000 cache the size. 424 00:20:57,000 --> 00:21:01,000 I will build the same exactly structure, in fact. 425 00:21:01,000 --> 00:21:11,000 Now, I'll type 426 00:21:11,000 --> 00:21:12,000 next. It's going to have both a 427 00:21:12,000 --> 00:21:14,000 head and a tail pointer 428 00:21:14,000 --> 00:21:18,000 so we can keep track of the two ends we've got going. I think I'm 429 00:21:18,000 --> 00:21:24,000 missing my [inaudible]. Slip over. 430 00:21:24,000 --> 00:21:29,000 Okay. And 431 00:21:29,000 --> 00:21:32,000 then I will make the same comment here about, yeah, you need to delete all cells. 432 00:21:32,000 --> 00:21:34,000 So I set the head and the tail to null. I'm 433 00:21:34,000 --> 00:21:41,000 now in my empty state that tells me I've got nothing in the queue, nothing at all, so far. And 434 00:21:41,000 --> 00:21:48,000 my Y is empty. Oh. 435 00:21:48,000 --> 00:21:50,000 We'll check that 436 00:21:50,000 --> 00:21:53,000 return equals, equals null, just like 437 00:21:53,000 --> 00:21:54,000 it did. In fact, it looks a lot like 438 00:21:54,000 --> 00:21:55,000 the stack, actually. 439 00:21:55,000 --> 00:21:59,000 And here I'm going to show you something kind of funny. If I go take 440 00:21:59,000 --> 00:22:04,000 my stack CCPs pop. 441 00:22:04,000 --> 00:22:08,000 So if you remember what pop did over here, is it took the front most cell off the 442 00:22:08,000 --> 00:22:11,000 list. And 443 00:22:11,000 --> 00:22:12,000 in the case of the queue 444 00:22:12,000 --> 00:22:16,000 that was the 445 00:22:16,000 --> 00:22:18,000 popping operation. And 446 00:22:18,000 --> 00:22:21,000 it turns out the D queue 447 00:22:21,000 --> 00:22:25,000 operation is exactly the same. 448 00:22:25,000 --> 00:22:29,000 If we're empty, right, then we want to raise there. Otherwise, we take the head's 449 00:22:29,000 --> 00:22:29,000 value. 450 00:22:29,000 --> 00:22:33,000 We move around the head, we delete the old memory. So it's not 451 00:22:33,000 --> 00:22:36,000 actually the top element. I should, actually, be a little bit more 452 00:22:36,000 --> 00:22:39,000 careful about my copying and pasting here. I could really call that 453 00:22:39,000 --> 00:22:41,000 front. It's the front-most element on the top. 454 00:22:41,000 --> 00:22:43,000 But it's like the exact same mechanics work 455 00:22:43,000 --> 00:22:48,000 to take a cell off the front in that way. Well, that's sort of 456 00:22:48,000 --> 00:22:48,000 nice. 457 00:22:48,000 --> 00:22:51,000 And it's like, yeah, well, since I have some code around I want to go try it. 458 00:22:51,000 --> 00:22:55,000 My stack, also, kind of does some things useful in push that I might be able to use. It's not 459 00:22:55,000 --> 00:22:58,000 exactly the same because it's adding it to the front, but it is 460 00:22:58,000 --> 00:23:00,000 setting up a new cell. So I'm going to 461 00:23:00,000 --> 00:23:05,000 make a new cell and I set its value. Then, it's attaching 462 00:23:05,000 --> 00:23:07,000 looks a little different. Let's take a look. 463 00:23:07,000 --> 00:23:11,000 We omitted a cell, copied the value in, 464 00:23:11,000 --> 00:23:16,000 now the goal is giving our pointer to the tail, we want to attach onto the tail. 465 00:23:16,000 --> 00:23:22,000 So we know that it's always going to have a next of null, the new cell. So no 466 00:23:22,000 --> 00:23:23,000 matter what, 467 00:23:23,000 --> 00:23:26,000 it will be the new last cell and it defiantly needs that value that tells 468 00:23:26,000 --> 00:23:28,000 us we're at the end of the list. 469 00:23:28,000 --> 00:23:32,000 And then we have to attach it to our tail. I'm going to write this code first and it's going to be 470 00:23:32,000 --> 00:23:35,000 wrong. So just go along with me in this case. 471 00:23:35,000 --> 00:23:37,000 If the tail's next is the new cell. 472 00:23:37,000 --> 00:23:40,000 So if I'm wiring in the pointer from the tail onto the cell we have 473 00:23:40,000 --> 00:23:41,000 there, 474 00:23:41,000 --> 00:23:48,000 and then we want to update the tail to point to the cell. So 475 00:23:48,000 --> 00:23:48,000 almost, 476 00:23:48,000 --> 00:23:53,000 but not quite, everything I need to do. 477 00:23:53,000 --> 00:23:56,000 Someone want to point out to me what it is I have failed 478 00:23:56,000 --> 00:24:05,000 to consider in this situation. [Inaudible]. 479 00:24:05,000 --> 00:24:09,000 It's the what? [Inaudible]. Yeah. ? trying to make null point to something [inaudible]. Exactly. So if my queue is totally empty. So in these cases like 480 00:24:09,000 --> 00:24:12,000 - one thing you can often think about whenever you see yourself with this 481 00:24:12,000 --> 00:24:15,000 error, and you're dereferencing something, you have to considered, well, is there a possibility 482 00:24:15,000 --> 00:24:20,000 that that value is null. And if so I'd better do something 483 00:24:20,000 --> 00:24:23,000 to protect against it. So in the case of the new cell here, I'm setting aside and 484 00:24:23,000 --> 00:24:27,000 not clearing that cell. We know it's valid, [inaudible], we've got good memory. 485 00:24:27,000 --> 00:24:30,000 That part's good but the access to the tail, and that's assuming 486 00:24:30,000 --> 00:24:34,000 there is a tail, that there is at least one cell already in the queue that the tail 487 00:24:34,000 --> 00:24:35,000 is pointing to. 488 00:24:35,000 --> 00:24:37,000 When that's not true, 489 00:24:37,000 --> 00:24:39,000 this is going to blow up. It's going to try to dereference it. 490 00:24:39,000 --> 00:24:43,000 So what we can do here is we can just check for that. We can say 491 00:24:43,000 --> 00:24:45,000 if the queue is empty 492 00:24:45,000 --> 00:24:48,000 then 493 00:24:48,000 --> 00:24:50,000 what we want to do is say that head 494 00:24:50,000 --> 00:24:53,000 equals tail equals new 495 00:24:53,000 --> 00:24:54,000 cell. 496 00:24:54,000 --> 00:25:00,000 Otherwise, we've got some existing tail and we 497 00:25:00,000 --> 00:25:02,000 just need to attach to it. 498 00:25:02,000 --> 00:25:05,000 Now, this is what I meant about that often there are little bit 499 00:25:05,000 --> 00:25:10,000 of special cases for certain 500 00:25:10,000 --> 00:25:12,000 different inputs. In particular, the most common ones are 501 00:25:12,000 --> 00:25:16,000 an empty list or a single link list being distinguished from a list that has 502 00:25:16,000 --> 00:25:20,000 two or more elements. So sometimes it's all about - a single link list has problems and 503 00:25:20,000 --> 00:25:23,000 so does the empty list. In this case, it's exactly the empty list. Like a 504 00:25:23,000 --> 00:25:27,000 single cell is actually fine, you know, one, ten, 6,000 505 00:25:27,000 --> 00:25:31,000 all work equally well. It's that empty case that we have to work on 506 00:25:31,000 --> 00:25:36,000 setting our head and tail to that. So with 507 00:25:36,000 --> 00:25:38,000 this plan we go back to my 508 00:25:38,000 --> 00:25:40,000 use of this. And 509 00:25:40,000 --> 00:25:47,000 stop talking about my stack and change into my queue, 510 00:25:47,000 --> 00:25:56,000 end queuing instead 511 00:25:56,000 --> 00:25:58,000 of pushing. [Inaudible]. Oh, 591 errors. 512 00:25:58,000 --> 00:26:01,000 Well, 513 00:26:01,000 --> 00:26:15,000 that's really - 591, what is that? [Inaudible]. 514 00:26:15,000 --> 00:26:17,000 Oh, yeah, this is the year I was going 515 00:26:17,000 --> 00:26:20,000 to talk about but I 516 00:26:20,000 --> 00:26:23,000 failed to do. 517 00:26:23,000 --> 00:26:26,000 So this is the kind of thing you'll get 518 00:26:26,000 --> 00:26:30,000 from failing to protect your [inaudible] from 519 00:26:30,000 --> 00:26:31,000 being revisited. 520 00:26:31,000 --> 00:26:33,000 And so I'll 521 00:26:33,000 --> 00:26:36,000 be - if I haven't deft this 522 00:26:36,000 --> 00:26:39,000 funny little symbol I just made up then I want to find it in the 523 00:26:39,000 --> 00:26:42,000 end deft. So if it came back around and it saw this same header file, it was 524 00:26:42,000 --> 00:26:45,000 supposed to say I already seen that symbol and I won't do it. But, in fact, I had 525 00:26:45,000 --> 00:26:47,000 accidentally named one of them 526 00:26:47,000 --> 00:26:50,000 with a capital H and one with a lowercase H. It says if this symbol's not been 527 00:26:50,000 --> 00:26:53,000 identified then define this other symbol and keep going. And so the next time it saw 528 00:26:53,000 --> 00:26:57,000 the header files it said if that this lowercase H hasn't been 529 00:26:57,000 --> 00:27:00,000 defined, well it hadn't so it made it again and then it kept doing that until it got really 530 00:27:00,000 --> 00:27:03,000 totally twisted up about it. So I will 531 00:27:03,000 --> 00:27:06,000 make them match, with any 532 00:27:06,000 --> 00:27:07,000 luck, 533 00:27:07,000 --> 00:27:11,000 without the caps lock on. Three times is a charm. 534 00:27:11,000 --> 00:27:12,000 And then one, two, three 535 00:27:12,000 --> 00:27:21,000 pop out the other side of queue, as I was hoping. Okay. 536 00:27:21,000 --> 00:27:24,000 So you know what I'm going to do, actually? I'm actually not going 537 00:27:24,000 --> 00:27:27,000 to go through the alternate implementation of queue. I'm just saying 538 00:27:27,000 --> 00:27:30,000 like given that it works so well for stack, right, to have this kind of 539 00:27:30,000 --> 00:27:34,000 single link list, you know, efficient allocation, it's not surprising that queue, 540 00:27:34,000 --> 00:27:35,000 like, given to us. And what 541 00:27:35,000 --> 00:27:37,000 this is, like I said earlier, I said, well, 542 00:27:37,000 --> 00:27:41,000 you know you can do things with vector, you can do things with 543 00:27:41,000 --> 00:27:44,000 stack, anything you do with stack you can do with vector if you were just being 544 00:27:44,000 --> 00:27:48,000 careful. Right. The pushing, and popping, and the queue D queue, are operations that you could express 545 00:27:48,000 --> 00:27:52,000 in other terms of vector adds and removes and inserts that things. 546 00:27:52,000 --> 00:27:55,000 But one of the nice things about providing a structure like a stack or queue is it says 547 00:27:55,000 --> 00:27:58,000 these are the only things you're allowed to add at this end, remove from that end, 548 00:27:58,000 --> 00:28:02,000 it gives you options as a implementer that don't make sense for the general 549 00:28:02,000 --> 00:28:06,000 purpose case. That the vector as a link list had some compromises 550 00:28:06,000 --> 00:28:10,000 that probably weren't worth making. But in the case of stack and queue it actually came out 551 00:28:10,000 --> 00:28:13,000 very cleanly and very nicely, like this tidy little access to one end, or the 552 00:28:13,000 --> 00:28:15,000 other end, or both, right, 553 00:28:15,000 --> 00:28:17,000 was easily managed with a link list and then it didn't have any of the constraints of 554 00:28:17,000 --> 00:28:20,000 the continuous memory to fight with. 555 00:28:20,000 --> 00:28:21,000 And so often having a structure 556 00:28:21,000 --> 00:28:25,000 that actually offers less gives you some different options as an 557 00:28:25,000 --> 00:28:28,000 implementer because you know you don't have to support access into the middle in 558 00:28:28,000 --> 00:28:31,000 an efficient way because 559 00:28:31,000 --> 00:28:33,000 the stack and queue don't let you. They don't let you riffle through the 560 00:28:33,000 --> 00:28:35,000 middle of the stack or the queue to do things 561 00:28:35,000 --> 00:28:38,000 and that gives you some freedom as an implementer, which is neat to take advantage 562 00:28:38,000 --> 00:28:43,000 of. Yes, sir. [Inaudible] in Java 563 00:28:43,000 --> 00:28:55,000 that, you know, we supposed to use an Array whenever we could because Array uses more 564 00:28:55,000 --> 00:29:00,000 memory ? Um hm. Does the queue 565 00:29:00,000 --> 00:29:03,000 and stack use more memory than an Array, for example? So typically they'll be about the same, because of exactly this one thing, which is it allocates tightly so it asks for one cell per entry that's in there, right. And that cell has this overhead in the form of this extra pointer. 566 00:29:03,000 --> 00:29:06,000 Right. So in effect it means that everything got a little bit bigger because 567 00:29:06,000 --> 00:29:08,000 everything's carrying a little overhead. 568 00:29:08,000 --> 00:29:11,000 Okay. So there's a little bit more memory. The thing the Array is doing is it tends to be 569 00:29:11,000 --> 00:29:15,000 allocating extra slots that aren't being used. So it's not usually tightly allocated 570 00:29:15,000 --> 00:29:18,000 either. So in any given situation it could be as much as twice as big 571 00:29:18,000 --> 00:29:19,000 because it had that extra space. 572 00:29:19,000 --> 00:29:22,000 So kind of in the tradeoff they tend to be in the same 573 00:29:22,000 --> 00:29:25,000 general range. This one has about twice the space, I think it's about four 574 00:29:25,000 --> 00:29:28,000 bites it has a four-byte pointer. The thing's much bigger 575 00:29:28,000 --> 00:29:30,000 and it turns out actually this four bite pointer might be a smaller percentage of 576 00:29:30,000 --> 00:29:35,000 the overhead and will be a larger amount of the capacity in the excess case. 577 00:29:35,000 --> 00:29:37,000 So for a lot of things they're going to be about the same range. And then there's 578 00:29:37,000 --> 00:29:40,000 a few isolated situations where, 579 00:29:40,000 --> 00:29:43,000 yeah, if you have very big things being stored having a lot of excess capacity 580 00:29:43,000 --> 00:29:45,000 is likely to be a more expensive 581 00:29:45,000 --> 00:29:48,000 part of the vector strategy than the link list. But the 582 00:29:48,000 --> 00:29:51,000 link list does have this built in overhead for every element. 583 00:29:51,000 --> 00:29:54,000 And so it's not the case where you would say right off the bat, 584 00:29:54,000 --> 00:29:55,000 well, 585 00:29:55,000 --> 00:29:58,000 definitely an array over link list because that the allocation space. 586 00:29:58,000 --> 00:30:01,000 It's like, hey, you have to think about the whole picture. 587 00:30:01,000 --> 00:30:05,000 That is going to be the whole next lectures. Well, you 588 00:30:05,000 --> 00:30:06,000 know, it's about tradeoffs. 589 00:30:06,000 --> 00:30:09,000 It's not that one strategy's always going to be 590 00:30:09,000 --> 00:30:12,000 the clearly better one. If it is, then we don't need to think about 591 00:30:12,000 --> 00:30:14,000 anything else. There are times when 592 00:30:14,000 --> 00:30:16,000 you know 593 00:30:16,000 --> 00:30:16,000 594 00:30:16,000 --> 00:30:19,000 that one is just great and there's no reason to think about anything else. And then 595 00:30:19,000 --> 00:30:20,000 there's other times when 596 00:30:20,000 --> 00:30:24,000 there are reasons to think about different ways to solve the same problem. 597 00:30:24,000 --> 00:30:26,000 So let me 598 00:30:26,000 --> 00:30:27,000 propose a case study 599 00:30:27,000 --> 00:30:31,000 that has some 600 00:30:31,000 --> 00:30:34,000 meat. You may think along the way, but you're going to really have to pretend that 601 00:30:34,000 --> 00:30:38,000 this is going to be relevant at first because it's going to seem like you've been 602 00:30:38,000 --> 00:30:41,000 transferred back to 603 00:30:41,000 --> 00:30:44,000 1970, which was a very bad year, long before you born. 604 00:30:44,000 --> 00:30:49,000 I want you to think about the idea of a text setter. So Microsoft Word, 605 00:30:49,000 --> 00:30:50,000 or 606 00:30:50,000 --> 00:30:53,000 you know your email send window, or 607 00:30:53,000 --> 00:30:57,000 BB edit, or X code. All these things, right, have some backing of 608 00:30:57,000 --> 00:31:00,000 what usually is called a buffer, 609 00:31:00,000 --> 00:31:02,000 a buffer, sort of a funny word, right, 610 00:31:02,000 --> 00:31:06,000 but a buffer of characters behind it. 611 00:31:06,000 --> 00:31:07,000 Okay. And that buffer 612 00:31:07,000 --> 00:31:11,000 is used as the kind of document storage for all the characters 613 00:31:11,000 --> 00:31:12,000 614 00:31:12,000 --> 00:31:15,000 that you're editing and moving around. As you kind of bop around and move around there's 615 00:31:15,000 --> 00:31:19,000 something often called the cursor where you type characters 616 00:31:19,000 --> 00:31:21,000 and characters get moved as you insert. 617 00:31:21,000 --> 00:31:25,000 You can select things, and delete them, and cut, copy, and paste, and all that sort of stuff. 618 00:31:25,000 --> 00:31:28,000 So what does that data structure look like? What is the kind of thing that wants to 619 00:31:28,000 --> 00:31:29,000 back 620 00:31:29,000 --> 00:31:30,000 a buffer? 621 00:31:30,000 --> 00:31:33,000 What kinds of things are important in that 622 00:31:33,000 --> 00:31:35,000 context to be able to support 623 00:31:35,000 --> 00:31:40,000 and what operations needs to be optimized for that tool to field [inaudible]? 624 00:31:40,000 --> 00:31:44,000 So what we're going to look at is a real simplification kind of taking that problem, kind of 625 00:31:44,000 --> 00:31:48,000 distilling it down to its essences and looking at just six operations that are kind 626 00:31:48,000 --> 00:31:50,000 of at the core of any text editor 627 00:31:50,000 --> 00:31:54,000 about moving the cursor forward and backwards with these commands. 628 00:31:54,000 --> 00:31:57,000 Jumping to the front and the back of the entire buffer and then inserting and 629 00:31:57,000 --> 00:32:00,000 deleting in relative to the cursor 630 00:32:00,000 --> 00:32:01,000 within that buffer. 631 00:32:01,000 --> 00:32:04,000 And we're going to do this with an extremely old school interface that's 632 00:32:04,000 --> 00:32:08,000 actually command base. It's like VI comes back from the dead for those of you who 633 00:32:08,000 --> 00:32:11,000 have ever heard of such things as VI and E Max. 634 00:32:11,000 --> 00:32:13,000 And so we're going to look at this because there's actually a lot of ways 635 00:32:13,000 --> 00:32:15,000 you can approach that, 636 00:32:15,000 --> 00:32:18,000 that take advantage of some of the things that we've already built like the vector, and stacking queue, 637 00:32:18,000 --> 00:32:21,000 and the link list. And sort of think about what 638 00:32:21,000 --> 00:32:24,000 kind of options play out well. 639 00:32:24,000 --> 00:32:28,000 What I'm going to show you first off is just what the tool looks like 640 00:32:28,000 --> 00:32:31,000 so you can really feel 641 00:32:31,000 --> 00:32:33,000 schooled in the old way of doing things. 642 00:32:33,000 --> 00:32:34,000 So I insert 643 00:32:34,000 --> 00:32:38,000 the characters A, B, C, D, E, F, G into the editor. 644 00:32:38,000 --> 00:32:41,000 And then I can use the command B to backup. 645 00:32:41,000 --> 00:32:44,000 I can use the command J to jump to the beginning, E to jump to the end, 646 00:32:44,000 --> 00:32:46,000 and so F and B, 647 00:32:46,000 --> 00:32:50,000 B moving me backwards, F moving me forwards, and then once I'm at a place, let's say I jump 648 00:32:50,000 --> 00:32:53,000 to the end, I can insert Z, Z, 649 00:32:53,000 --> 00:32:58,000 Y, X here at the beginning and it'll slide things over, and then I can delete characters 650 00:32:58,000 --> 00:33:00,000 shuffling things down. 651 00:33:00,000 --> 00:33:04,000 So that's what we've got. Imagine now, it's like building 652 00:33:04,000 --> 00:33:07,000 Microsoft Word on top of this. There's a lot of stuff that goes, you know, between here 653 00:33:07,000 --> 00:33:11,000 and there. but it does give you an idea that the kind of core data requirements 654 00:33:11,000 --> 00:33:12,000 every word processor 655 00:33:12,000 --> 00:33:16,000 needs some tool like this some abstraction underneath it to manage the 656 00:33:16,000 --> 00:33:23,000 character data. Okay. Where do we start? Where do 657 00:33:23,000 --> 00:33:31,000 we start? 658 00:33:31,000 --> 00:33:33,000 Anyone want to propose an implementation. 659 00:33:33,000 --> 00:33:37,000 What's the easiest thing to do? [Inaudible]. In a 660 00:33:37,000 --> 00:33:40,000 way. Something like that. You know, and if you think Array, 661 00:33:40,000 --> 00:33:43,000 then you should immediately think after it, no, I don't want to deal with Array, I want a vector. 662 00:33:43,000 --> 00:33:44,000 I'd like a vector. 663 00:33:44,000 --> 00:33:45,000 Vector please. 664 00:33:45,000 --> 00:33:47,000 Right. So I'm going to show you what the interface looks like, here, that we 665 00:33:47,000 --> 00:33:50,000 have a buffer in these six operations. We're going 666 00:33:50,000 --> 00:33:52,000 to think about what the private part looks like, 667 00:33:52,000 --> 00:33:54,000 and we say, what about vector. 668 00:33:54,000 --> 00:33:57,000 Vector handles big, you know, 669 00:33:57,000 --> 00:34:00,000 chunky things indexed 670 00:34:00,000 --> 00:34:01,000 by slots. 671 00:34:01,000 --> 00:34:05,000 That might be a good idea. So if I had the buffer contents A, B, C, D, E, 672 00:34:05,000 --> 00:34:08,000 right, and if I could slam those into a vector, 673 00:34:08,000 --> 00:34:10,000 and then I would need one other little bit of information here, which is 674 00:34:10,000 --> 00:34:14,000 where is the cursor in any given point because the cursor is actually where the 675 00:34:14,000 --> 00:34:16,000 uncertain lead operations take place relative to the cursor. 676 00:34:16,000 --> 00:34:19,000 The buffer's also going to manage that, knowing where the insertion point is 677 00:34:19,000 --> 00:34:22,000 and then deleting it and inserting relative to that. 678 00:34:22,000 --> 00:34:23,000 So I say, okay. 679 00:34:23,000 --> 00:34:25,000 I need some index. 680 00:34:25,000 --> 00:34:29,000 The cursor actually is between two character positions. 681 00:34:29,000 --> 00:34:31,000 682 00:34:31,000 --> 00:34:31,000 683 00:34:31,000 --> 00:34:33,000 This is like slot zero, one, and two. 684 00:34:33,000 --> 00:34:37,000 And the cursor is between, actually, one 685 00:34:37,000 --> 00:34:41,000 and two. And there's just a minor detail to kind of have worked out 686 00:34:41,000 --> 00:34:46,000 before we start trying to write this code, which is do we want the 687 00:34:46,000 --> 00:34:49,000 cursor to hold the index on the character that's to the left of it or to the 688 00:34:49,000 --> 00:34:50,000 right of it. 689 00:34:50,000 --> 00:34:52,000 It's totally symmetric, right. 690 00:34:52,000 --> 00:34:55,000 If I have these five characters, it could be that my cursor then goes from 691 00:34:55,000 --> 00:34:56,000 zero 692 00:34:56,000 --> 00:34:58,000 to four, it could be that it goes 693 00:34:58,000 --> 00:35:01,000 from zero to five, or it could be it goes from negative one to four, depending on 694 00:35:01,000 --> 00:35:05,000 which way I'm doing it. I'm going to happen to use the one that does zero 695 00:35:05,000 --> 00:35:07,000 to five 696 00:35:07,000 --> 00:35:12,000 where it's actually recorded as the character that's after it. Okay. 697 00:35:12,000 --> 00:35:16,000 So that's just kind of the staring point. I'm going to write some code and 698 00:35:16,000 --> 00:35:20,000 make it happen. 699 00:35:20,000 --> 00:35:23,000 So I'm going 700 00:35:23,000 --> 00:35:25,000 to remove 701 00:35:25,000 --> 00:35:34,000 this. Reference it, that's okay. 702 00:35:34,000 --> 00:35:36,000 And then I'm going to add the 703 00:35:36,000 --> 00:35:37,000 things I want to 704 00:35:37,000 --> 00:35:39,000 replace it with. So I put in the 705 00:35:39,000 --> 00:35:45,000 editor code and to put in the buffer code. Was buffer already in here? I think buffer's already here. 706 00:35:45,000 --> 00:35:51,000 Okay. Let's take a look. I get my buffer. 707 00:35:51,000 --> 00:36:02,000 So 708 00:36:02,000 --> 00:36:08,000 right now, let's take a look what's in buffered. I think I have the 709 00:36:08,000 --> 00:36:11,000 starting set information. So it has move cursor over, you know move the 710 00:36:11,000 --> 00:36:14,000 cursor around, it has the delete, and it has - let 711 00:36:14,000 --> 00:36:18,000 me make it have 712 00:36:18,000 --> 00:36:20,000 some variables that I want. I've got 713 00:36:20,000 --> 00:36:23,000 my cursor. I've got my vector of characters. Right now, it has 714 00:36:23,000 --> 00:36:26,000 a lot of copying, I think, just in anticipation of going places 715 00:36:26,000 --> 00:36:29,000 where copying is going to be required. 716 00:36:29,000 --> 00:36:31,000 Right now, because it's using vector and vector has deep copying behavior, I'm 717 00:36:31,000 --> 00:36:37,000 actually okay if I let windows copy and go through. But I'm actually going to 718 00:36:37,000 --> 00:36:39,000 not do that right away. 719 00:36:39,000 --> 00:36:45,000 All right. So then let's look at the implementation side of this and I have an 720 00:36:45,000 --> 00:36:47,000 implementation in here I want to get rid of. Sorry about this but I 721 00:36:47,000 --> 00:36:54,000 left the old one in here. That will do all the things that will need to 722 00:36:54,000 --> 00:36:56,000 get done. 723 00:36:56,000 --> 00:36:58,000 Okay. So let's deal with 724 00:36:58,000 --> 00:37:06,000 at the beginning. Starting off, right, we will set the cursor to zero. So that's 725 00:37:06,000 --> 00:37:08,000 the vector gets started, 726 00:37:08,000 --> 00:37:11,000 initialize empty, and nothing else I need to do with the vector. 727 00:37:11,000 --> 00:37:16,000 A character [inaudible] cursor position in that, and then moving the cursor position 728 00:37:16,000 --> 00:37:17,000 forward, 729 00:37:17,000 --> 00:37:21,000 it's mostly just doing this, right, cursor plus, plus, right. It's just 730 00:37:21,000 --> 00:37:23,000 an easy index to move it down. But 731 00:37:23,000 --> 00:37:26,000 I do need to make sure that the cursor isn't already at the end. So if 732 00:37:26,000 --> 00:37:29,000 the cursor is less than 733 00:37:29,000 --> 00:37:36,000 the size of the vector then I will let it advance. But it never gets beyond the 734 00:37:36,000 --> 00:37:38,000 735 00:37:38,000 --> 00:37:40,000 size itself. So 736 00:37:40,000 --> 00:37:43,000 like all of these are going to have same problems. Like, if I'm already at the end and somebody asks me to advance, I don't 737 00:37:43,000 --> 00:37:46,000 want to suddenly get my cursor in some lax stated that 738 00:37:46,000 --> 00:37:52,000 sort of back that. Similarly, on this one, if the cursor 739 00:37:52,000 --> 00:37:56,000 is greater than zero, all right, then we can back it up and 740 00:37:56,000 --> 00:37:58,000 move the cursor around. 741 00:37:58,000 --> 00:38:01,000 Moving the cursor at the start is very easy. 742 00:38:01,000 --> 00:38:03,000 Moving the cursor to the end is, also, 743 00:38:03,000 --> 00:38:05,000 very easy, right. We have a 744 00:38:05,000 --> 00:38:08,000 convention about what it is. 745 00:38:08,000 --> 00:38:11,000 And so then inserting a character 746 00:38:11,000 --> 00:38:15,000 is, also, pretty easy because all the hard work is done 747 00:38:15,000 --> 00:38:17,000 by the vector. 748 00:38:17,000 --> 00:38:18,000 749 00:38:18,000 --> 00:38:20,000 When I say insert this new character 750 00:38:20,000 --> 00:38:23,000 at the cursor position, right, then this character advance the 751 00:38:23,000 --> 00:38:26,000 cursor to be kind of past it, and then 752 00:38:26,000 --> 00:38:31,000 deleting the character, also, pretty easy. Let 753 00:38:31,000 --> 00:38:34,000 me show you, before I go finishing the cursor, I just want to show 754 00:38:34,000 --> 00:38:38,000 you this little diagram of what's happening while I'm doing stuff. So this is kind of a 755 00:38:38,000 --> 00:38:41,000 visualization of the things that are happening. So this is showing the vector and 756 00:38:41,000 --> 00:38:45,000 its length. And then the cursor position 757 00:38:45,000 --> 00:38:48,000 as we've done things. Inserted six characters, the cursor's currently sitting 758 00:38:48,000 --> 00:38:52,000 at the end, and for the length of the cursor, actually, the same value right now. If 759 00:38:52,000 --> 00:38:56,000 I issue a back command, right, that deducts the cursor by 760 00:38:56,000 --> 00:38:58,000 one, it shows another one backs up by one, 761 00:38:58,000 --> 00:39:01,000 and then eventually the cursor gets to the position zero and subsequent backs don't 762 00:39:01,000 --> 00:39:04,000 do anything to it, it just kind of bounces off the edge. And 763 00:39:04,000 --> 00:39:08,000 moving forward, things are easier to deal with. It will advance until the gets to the length, 764 00:39:08,000 --> 00:39:10,000 the size of that text, and it won't go any further. 765 00:39:10,000 --> 00:39:14,000 And that jumping to the front in the end are just a matter of updating that 766 00:39:14,000 --> 00:39:17,000 cursor position from zero to the size and whatnot. 767 00:39:17,000 --> 00:39:19,000 The operation where 768 00:39:19,000 --> 00:39:21,000 things get a little 769 00:39:21,000 --> 00:39:22,000 more 770 00:39:22,000 --> 00:39:25,000 bogged down is on this insert 771 00:39:25,000 --> 00:39:30,000 where I need to insert a position. If I insert the X, Y, Z 772 00:39:30,000 --> 00:39:32,000 into the middle there and you can see it kind of 773 00:39:32,000 --> 00:39:37,000 chopping those characters down, making that space to insert those things in the 774 00:39:37,000 --> 00:39:37,000 middle. 775 00:39:37,000 --> 00:39:41,000 Similarly, doing delete operations, if I jump 776 00:39:41,000 --> 00:39:43,000 to the 777 00:39:43,000 --> 00:39:45,000 beginning I start doing deletes here. 778 00:39:45,000 --> 00:39:48,000 It has to copy all those characters over and 779 00:39:48,000 --> 00:39:53,000 deduct that length, right, so that the vectors do that work for us on remove that. 780 00:39:53,000 --> 00:39:55,000 That means that if the vector's very large, which is 781 00:39:55,000 --> 00:39:58,000 not a typical in a word processor situation, you could have you PhD theses 782 00:39:58,000 --> 00:40:00,000 which has hundreds of pages, 783 00:40:00,000 --> 00:40:03,000 if you go back to the beginning and you start deleting characters you'd hate to think it was just 784 00:40:03,000 --> 00:40:05,000 taking this massive shuffle time 785 00:40:05,000 --> 00:40:09,000 to get the work done. So 786 00:40:09,000 --> 00:40:13,000 let's see my - I'm going 787 00:40:13,000 --> 00:40:14,000 to 788 00:40:14,000 --> 00:40:24,000 bring in the display that 789 00:40:24,000 --> 00:40:38,000 - 790 00:40:38,000 --> 00:40:40,000 whoops, 791 00:40:40,000 --> 00:40:42,000 I don't like something here. [Inaudible]. Now, what do we have? Oh, look at this. Well, well, we will continue on. 792 00:40:42,000 --> 00:41:00,000 This inside - okay. Hello. [Inaudible]. I have no idea. I'm not going to try to [inaudible]. 793 00:41:00,000 --> 00:41:02,000 Okay. 794 00:41:02,000 --> 00:41:08,000 It's inside. 795 00:41:08,000 --> 00:41:21,000 That's got me totally upset. Okay. We still have our old code somewhere. Okay. Why do I still have the wrong - what is - okay. Oh, 796 00:41:21,000 --> 00:41:25,000 well. Today's not my lucky day. I'm not going to worry about that too much. I'm just going to show [inaudible]. Okay. 797 00:41:25,000 --> 00:41:27,000 So 798 00:41:27,000 --> 00:41:29,000 seeing what's actually happening, right, it's like just mostly 799 00:41:29,000 --> 00:41:31,000 800 00:41:31,000 --> 00:41:34,000 kind of moving that stuff back and forth, and then having the vector do the big 801 00:41:34,000 --> 00:41:37,000 work, right, of inserting and deleting those characters, and doing all the copying 802 00:41:37,000 --> 00:41:40,000 to get the thing done. 803 00:41:40,000 --> 00:41:43,000 We'll come back over here 804 00:41:43,000 --> 00:41:47,000 and kind of see what good it does and what things it's not so hot 805 00:41:47,000 --> 00:41:48,000 at, right, 806 00:41:48,000 --> 00:41:51,000 is that it is easy to move the cursor wherever you want. 807 00:41:51,000 --> 00:41:54,000 You know, that moving it a long distance or a short distance, moving it is a little 808 00:41:54,000 --> 00:41:55,000 bit 809 00:41:55,000 --> 00:41:56,000 810 00:41:56,000 --> 00:41:59,000 or to the beginning to the end, equally easy as by just resetting 811 00:41:59,000 --> 00:42:01,000 some variable. 812 00:42:01,000 --> 00:42:05,000 It's the enter and delete, right where this one actually really suffers, right, 813 00:42:05,000 --> 00:42:08,000 based on how many characters, right, follow that cursor, 814 00:42:08,000 --> 00:42:11,000 right you could potentially be moving the entire contents of the buffer. 815 00:42:11,000 --> 00:42:14,000 Adding at the end is actually going to be relatively fast. So if you have 816 00:42:14,000 --> 00:42:17,000 to type in order, like you just type you theses out 817 00:42:17,000 --> 00:42:19,000 and never go back to edit anything, 818 00:42:19,000 --> 00:42:22,000 then it would be fine, adding those characters at the end. But at any 819 00:42:22,000 --> 00:42:25,000 point, if you move the cursor back into the mid of the body, somewhere in 820 00:42:25,000 --> 00:42:28,000 the front, somewhere in the middle, right, a lot of characters get shuffled as you 821 00:42:28,000 --> 00:42:30,000 change, makes edits in the middle there. 822 00:42:30,000 --> 00:42:35,000 And so what this seems that it actually has good movement but bad editing 823 00:42:35,000 --> 00:42:38,000 as a buffer strategy. 824 00:42:38,000 --> 00:42:39,000 That's 825 00:42:39,000 --> 00:42:40,000 probably, 826 00:42:40,000 --> 00:42:43,000 you know, given that we have these six operations we'll all interested in, 827 00:42:43,000 --> 00:42:46,000 this is probably not the mix that seems to make the most sense for something 828 00:42:46,000 --> 00:42:49,000 that's called an editor, right. It is the editing operations are the ones that is 829 00:42:49,000 --> 00:42:52,000 weakest on, it has its most 830 00:42:52,000 --> 00:42:53,000 831 00:42:53,000 --> 00:42:54,000 debilitating performance, 832 00:42:54,000 --> 00:42:58,000 it wouldn't make that good of an editor, right, if that was going to be your behavior. It 833 00:42:58,000 --> 00:43:02,000 has one advantage that we're going see, actually, we're going to have to kind of 834 00:43:02,000 --> 00:43:06,000 make compromises on it, which it actually uses very little extra space, though, but 835 00:43:06,000 --> 00:43:08,000 it's using the vector, 836 00:43:08,000 --> 00:43:12,000 which potentially might have excess capacity up to twice that. So maybe it's 837 00:43:12,000 --> 00:43:14,000 about two bites per char in the very worse case. 838 00:43:14,000 --> 00:43:18,000 But it actually has no per character overhead that's already imbedded 839 00:43:18,000 --> 00:43:23,000 in the data structure from this side. 840 00:43:23,000 --> 00:43:31,000 Now, I'm just going to go totally somewhere else. Okay. 841 00:43:31,000 --> 00:43:34,000 So instead of 842 00:43:34,000 --> 00:43:37,000 thinking of it as vector, thinking of it as continuous block, 843 00:43:37,000 --> 00:43:41,000 is to kind of realize the editing operations, right, 844 00:43:41,000 --> 00:43:44,000 that comes back to the idea, like if I were working at the very end of my document that I 845 00:43:44,000 --> 00:43:47,000 can edit there efficiently. 846 00:43:47,000 --> 00:43:48,000 847 00:43:48,000 --> 00:43:51,000 It kind of inspires me to think of, how do I think I can make it so 848 00:43:51,000 --> 00:43:55,000 the insertion point is somehow in the efficient place. Like the 849 00:43:55,000 --> 00:43:58,000 edits that are happening on the insertion point, can I somehow make it to where 850 00:43:58,000 --> 00:44:01,000 access to the insertion point is easy? 851 00:44:01,000 --> 00:44:04,000 That rather than bearing that in the middle of my vector, if I can expose that 852 00:44:04,000 --> 00:44:08,000 to make it accessible easily in the way the code manipulates the internal of the 853 00:44:08,000 --> 00:44:09,000 buffer. 854 00:44:09,000 --> 00:44:14,000 And so the idea here is to break up the text into two pieces. 855 00:44:14,000 --> 00:44:17,000 Things that are to the left of the cursor, things that are to the right, and I'm calling this 856 00:44:17,000 --> 00:44:19,000 the before and the after. 857 00:44:19,000 --> 00:44:20,000 And then 858 00:44:20,000 --> 00:44:24,000 organize those so they're actually averted from each other to where all the 859 00:44:24,000 --> 00:44:27,000 characters that lead up to the cursor 860 00:44:27,000 --> 00:44:31,000 are set up to where B is very close, assessable at the top of, in this 861 00:44:31,000 --> 00:44:32,000 case, a stack, 862 00:44:32,000 --> 00:44:36,000 and that the things that are farther away are buried further down in the stack in that 863 00:44:36,000 --> 00:44:37,000 inaccessible region. 864 00:44:37,000 --> 00:44:41,000 And similarly over here, but inverted, that C is the top of this stack and D 865 00:44:41,000 --> 00:44:44,000 and E, and things are further down, they are buried 866 00:44:44,000 --> 00:44:46,000 away from me, 867 00:44:46,000 --> 00:44:50,000 figuring the things I really need access to are right next to the cursor 868 00:44:50,000 --> 00:44:55,000 that I'm willing to kind of move those other things father away to gain that access. 869 00:44:55,000 --> 00:45:00,000 And so what this buffer does is it uses two stacks, a 870 00:45:00,000 --> 00:45:01,000 before stack, 871 00:45:01,000 --> 00:45:03,000 an after stack, 872 00:45:03,000 --> 00:45:07,000 and that the operations for moving the data around 873 00:45:07,000 --> 00:45:09,000 become 874 00:45:09,000 --> 00:45:10,000 transferring things 875 00:45:10,000 --> 00:45:13,000 from the before to the after stack. 876 00:45:13,000 --> 00:45:16,000 Where's the cursor? 877 00:45:16,000 --> 00:45:19,000 How does the cursor represent it? Do I need some more data here? 878 00:45:19,000 --> 00:45:25,000 Some integer some ? 879 00:45:25,000 --> 00:45:28,000 It's really kind of odd to think about but there is no explicit cursor, right, 880 00:45:28,000 --> 00:45:31,000 being stored here but the cursor is, in this case, right, like 881 00:45:31,000 --> 00:45:34,000 modeled by what's on the before stack or all the things to the left of the cursor. 882 00:45:34,000 --> 00:45:38,000 What's on the after stack is following the cursor. So in 883 00:45:38,000 --> 00:45:39,000 fact, 884 00:45:39,000 --> 00:45:42,000 the cursor is represented as the empty space between these two 885 00:45:42,000 --> 00:45:45,000 where you have, 886 00:45:45,000 --> 00:45:48,000 you know, how many ever characters wrong before is actually the index of where 887 00:45:48,000 --> 00:45:51,000 the cursor is. So kind of a 888 00:45:51,000 --> 00:45:54,000 wacky way of thinking about it, but one that actually have some pretty neat properties 889 00:45:54,000 --> 00:45:57,000 in terms of making it run efficiently. 890 00:45:57,000 --> 00:46:01,000 So let me show you 891 00:46:01,000 --> 00:46:02,000 the 892 00:46:02,000 --> 00:46:03,000 893 00:46:03,000 --> 00:46:05,000 diagram of it happening. 894 00:46:05,000 --> 00:46:09,000 So insert A, B, C, D, E, F, G. 895 00:46:09,000 --> 00:46:12,000 And so, the operation of inserting 896 00:46:12,000 --> 00:46:15,000 a new character into this form of the buffer 897 00:46:15,000 --> 00:46:19,000 is pushing something on the before stack. 898 00:46:19,000 --> 00:46:23,000 So if I want to move the cursor, let's say I want to move it back one, 899 00:46:23,000 --> 00:46:25,000 then I pop 900 00:46:25,000 --> 00:46:27,000 the top off of the before push it on the after. 901 00:46:27,000 --> 00:46:32,000 I keep doing that and it transfers the G to the H, you know, the E and so on, as I 902 00:46:32,000 --> 00:46:33,000 back up. 903 00:46:33,000 --> 00:46:37,000 Moving forward does that same operation in the inverse. If I want to move forward I take 904 00:46:37,000 --> 00:46:40,000 something off the top of the after and push it on before. So it's kind of shuffling it 905 00:46:40,000 --> 00:46:42,000 from one side to the other. 906 00:46:42,000 --> 00:46:46,000 Given that these push and pop operations are constant on both implementations of 907 00:46:46,000 --> 00:46:47,000 the stack we saw, 908 00:46:47,000 --> 00:46:52,000 then that means that our cursor movement is also of one so I can do 909 00:46:52,000 --> 00:46:55,000 this. And if I do deletes it's actually just popping from the after stack and 910 00:46:55,000 --> 00:46:57,000 throwing it away. 911 00:46:57,000 --> 00:47:00,000 So also easy to do edits 912 00:47:00,000 --> 00:47:04,000 because I'm talking about adding things to the before and deleting things from after, 913 00:47:04,000 --> 00:47:06,000 both of which can be done very efficiently. 914 00:47:06,000 --> 00:47:07,000 The one operation 915 00:47:07,000 --> 00:47:10,000 that this guy has trouble with 916 00:47:10,000 --> 00:47:14,000 is big cursor movement. 917 00:47:14,000 --> 00:47:17,000 If I want to go all the way to the beginning 918 00:47:17,000 --> 00:47:19,000 or all the way to the end, 919 00:47:19,000 --> 00:47:22,000 then everything's got to go. No 920 00:47:22,000 --> 00:47:23,000 matter 921 00:47:23,000 --> 00:47:28,000 how many things I have to empty out the entire after stack to position the 922 00:47:28,000 --> 00:47:29,000 cursor at the end, 923 00:47:29,000 --> 00:47:31,000 or empty out the entire before stack 924 00:47:31,000 --> 00:47:35,000 to get it all the way over here. 925 00:47:35,000 --> 00:47:38,000 So I could of sort of talk you though this code, and actually, I won't type it 926 00:47:38,000 --> 00:47:41,000 just because I actually want to talk about its analysis more than I want to see its code. Each of 927 00:47:41,000 --> 00:47:43,000 these actually becomes a one liner. 928 00:47:43,000 --> 00:47:46,000 Insert is push on the before stack. 929 00:47:46,000 --> 00:47:49,000 Delete is pop from the after stack. The cursor movement is popping 930 00:47:49,000 --> 00:47:53,000 from one and pushing on the other depending on the direction and then that same code 931 00:47:53,000 --> 00:47:56,000 just for the Y loop, like wild or something on the one stack, just keep 932 00:47:56,000 --> 00:47:59,000 pushing and popping from one empty to the other. 933 00:47:59,000 --> 00:48:02,000 So very little code to write, right, 934 00:48:02,000 --> 00:48:05,000 depending a lot on the stacks abstractions coming through for us 935 00:48:05,000 --> 00:48:09,000 and then making this ultra fit that we've managed to get editing suddenly, 936 00:48:09,000 --> 00:48:10,000 like, a lot more efficient 937 00:48:10,000 --> 00:48:15,000 but the cost of sacrificing one of our other operations. Let's take a look at what that 938 00:48:15,000 --> 00:48:19,000 look like, right. 939 00:48:19,000 --> 00:48:22,000 Suddenly I can do all this insert and delete 940 00:48:22,000 --> 00:48:23,000 at the insertion point 941 00:48:23,000 --> 00:48:24,000 942 00:48:24,000 --> 00:48:25,000 with very little trouble, 943 00:48:25,000 --> 00:48:29,000 and I suddenly made this other operation, oddly enough, 944 00:48:29,000 --> 00:48:30,000 slow. 945 00:48:30,000 --> 00:48:33,000 All a sudden it's like if you go back to the beginning of your document you're in the 946 00:48:33,000 --> 00:48:36,000 middle of editing, you're at the bottom. You say, oh, I want to go back to the beginning 947 00:48:36,000 --> 00:48:39,000 and you can imagine how that would feel if you were typing 948 00:48:39,000 --> 00:48:42,000 it would actually be the act of going up and clicking up in the top right by where you made 949 00:48:42,000 --> 00:48:44,000 the insertion and all a sudden you'd see the white cursor 950 00:48:44,000 --> 00:48:45,000 and you'd be like, 951 00:48:45,000 --> 00:48:49,000 why is that taking so long. How could that possibly be, you know, 952 00:48:49,000 --> 00:48:51,000 that it's doing this kind of rearrangement, but yet it made 953 00:48:51,000 --> 00:48:54,000 editing even on a million character document fast. But 954 00:48:54,000 --> 00:48:55,000 that movement 955 00:48:55,000 --> 00:48:58,000 long distances, you know, jumping a couple of pages or 956 00:48:58,000 --> 00:48:59,000 front to back, 957 00:48:59,000 --> 00:49:05,000 suddenly, having to do this big transfer behind the scenes, so 958 00:49:05,000 --> 00:49:08,000 different, right. So now I had six operations, I had four fast, 959 00:49:08,000 --> 00:49:09,000 two slow, 960 00:49:09,000 --> 00:49:12,000 now I have four fast, two slow. 961 00:49:12,000 --> 00:49:16,000 It still might be that, actually, this is an interesting improvement, though, because 962 00:49:16,000 --> 00:49:19,000 we have decided those are operations that we're willing to tolerate, 963 00:49:19,000 --> 00:49:23,000 being slower, especially, if it means some operation that we're more interested in, if 964 00:49:23,000 --> 00:49:24,000 performance being faster. 965 00:49:24,000 --> 00:49:28,000 And that likely seems like it's moving in the right direction because editing, right, being 966 00:49:28,000 --> 00:49:31,000 the whole purpose, right, behind what we're trying to do is 967 00:49:31,000 --> 00:49:32,000 a text setter, 968 00:49:32,000 --> 00:49:34,000 that seems like that might be a good call. 969 00:49:34,000 --> 00:49:38,000 What I'm going to start looking at and it's going to be something I want to talk about 970 00:49:38,000 --> 00:49:39,000 next time, 971 00:49:39,000 --> 00:49:42,000 is what can a link list do for us? 972 00:49:42,000 --> 00:49:46,000 Both of those are fighting against that continuous thing, the copying and the 973 00:49:46,000 --> 00:49:48,000 shuffling, and the inserting 974 00:49:48,000 --> 00:49:51,000 is there something about that flexibility of the rewiring 975 00:49:51,000 --> 00:49:53,000 that can kind of get us out of this 976 00:49:53,000 --> 00:49:55,000 some operations having to be slow 977 00:49:55,000 --> 00:50:00,000 because other operations are fast. Question. [Inaudible]. Yeah. 978 00:50:00,000 --> 00:50:03,000 Why is space used for stacks when 979 00:50:03,000 --> 00:50:04,000 those are half? Yeah. 980 00:50:04,000 --> 00:50:08,000 So in this case, right, depending on how you're 981 00:50:08,000 --> 00:50:09,000 modeling your stack, if it is with 982 00:50:09,000 --> 00:50:12,000 vectors, right, then you're likely to have the before and the after stack, both 983 00:50:12,000 --> 00:50:16,000 have capacity for everything, even when they're not used, right. And so it's very 984 00:50:16,000 --> 00:50:17,000 likely that either right 985 00:50:17,000 --> 00:50:20,000 give 100 characters they're either on one stack or the other, but the other one might be kind of 986 00:50:20,000 --> 00:50:23,000 harboring the 100 spots for them when they come back. 987 00:50:23,000 --> 00:50:27,000 Or it could just be that you have the link list, which are allocating and de allocating but 988 00:50:27,000 --> 00:50:30,000 has the overhead of that. So it's likely that no matter which implementation the stack 989 00:50:30,000 --> 00:50:33,000 you have, you probably have about twice the storage that you need because of the 990 00:50:33,000 --> 00:50:38,000 two sides and the overhead that's kind of coming and going there. 991 00:50:38,000 --> 00:50:41,000 So we will see the link list and we'll talk that guy through on 992 00:50:41,000 --> 00:50:46,000 Friday. I'll see you then.