1 00:00:00,000 --> 00:00:07,000 2 00:00:07,000 --> 00:00:12,000 3 00:00:12,000 --> 00:00:15,000 This presentation is delivered by the Stanford Center for Professional 4 00:00:15,000 --> 00:00:25,000 Development. 5 00:00:25,000 --> 00:00:27,000 How are we doing? 6 00:00:27,000 --> 00:00:28,000 Spring apparently is here; 7 00:00:28,000 --> 00:00:33,000 just like California, one day it's winter, one day it's spring. Hopefully it's here to say. 8 00:00:33,000 --> 00:00:38,000 So we had talked about doing it, we did this last time, we did an unsorted vector 9 00:00:38,000 --> 00:00:40,000 implementation of the map, we just did a little struct of 10 00:00:40,000 --> 00:00:42,000 the key value pair. 11 00:00:42,000 --> 00:00:44,000 And then we went through 12 00:00:44,000 --> 00:00:48,000 that both add and get value are running at linear time in that form because of the need 13 00:00:48,000 --> 00:00:52,000 to search; search to find a value to override it, and search to find a value to 14 00:00:52,000 --> 00:00:54,000 pull out the existing 15 00:00:54,000 --> 00:00:55,000 match to it. 16 00:00:55,000 --> 00:00:58,000 And if we went to the sorted form of that, we could get get value to be able to 17 00:00:58,000 --> 00:01:02,000 capitalize on that binary search property and be able to quickly find the 18 00:01:02,000 --> 00:01:06,000 key if it exists in there or to determine it doesn't exist. 19 00:01:06,000 --> 00:01:09,000 But the add still suffered from the shuffling problem. We can quickly 20 00:01:09,000 --> 00:01:12,000 find, using the binary search to find where the 21 00:01:12,000 --> 00:01:15,000 new element would need to go. But if it doesn't already go there, it doesn't 22 00:01:15,000 --> 00:01:17,000 already exist. We're going to have to shuffle to make that space. 23 00:01:17,000 --> 00:01:21,000 So we have still a linear factor in there that we're working on getting rid of. 24 00:01:21,000 --> 00:01:24,000 And then I know at the bottom that none of them have any 25 00:01:24,000 --> 00:01:27,000 built in overhead per element. There may be some capacity, excess capacity in the 26 00:01:27,000 --> 00:01:30,000 vector that we're absorbing, but there's no 27 00:01:30,000 --> 00:01:33,000 four, ten, eight-byte chunk that every 28 00:01:33,000 --> 00:01:36,000 element is taking as a cost. 29 00:01:36,000 --> 00:01:39,000 So we started to talk about this at the end on Friday, and we're going to go with this and 30 00:01:39,000 --> 00:01:41,000 see where this takes 31 00:01:41,000 --> 00:01:45,000 us. We had the idea of the vector, and we know what vector constraints are, right? This 32 00:01:45,000 --> 00:01:48,000 idea of the contiguous memory, and we were going to think about the link list for a minute to 33 00:01:48,000 --> 00:01:51,000 see if it was going to buy us anything. 34 00:01:51,000 --> 00:01:54,000 It tends to have an inverse relationship, that once you know where you 35 00:01:54,000 --> 00:01:57,000 want to insert something it's easy to do that rearrangement because it's just a 36 00:01:57,000 --> 00:02:01,000 little bit of pointer rewiring that needs to happen in that situation. 37 00:02:01,000 --> 00:02:03,000 But it's hard to find a position insert, 38 00:02:03,000 --> 00:02:06,000 even if the link list is in sorted order, 39 00:02:06,000 --> 00:02:10,000 knowing where the S's are, or the M's are, or the B's are is still a 40 00:02:10,000 --> 00:02:13,000 matter of starting from the beginning; that your only access in the 41 00:02:13,000 --> 00:02:16,000 singly linked list is to start at the beginning and work your way down. 42 00:02:16,000 --> 00:02:20,000 And so I'm going to take you through a little bit of a thought exercise about well, what if 43 00:02:20,000 --> 00:02:23,000 we try to do something about that? Right now having a pointer coming into 44 00:02:23,000 --> 00:02:26,000 Bashful and then subsequent ones going in sorted order 45 00:02:26,000 --> 00:02:27,000 isn't really very helpful. 46 00:02:27,000 --> 00:02:30,000 But if we were willing to try to rearrange our pointers 47 00:02:30,000 --> 00:02:33,000 to give us the access that we have in the 48 00:02:33,000 --> 00:02:36,000 sorted array form; for example, if I had a pointer 49 00:02:36,000 --> 00:02:39,000 to the middlemost element, in 50 00:02:39,000 --> 00:02:42,000 this case the Grumpy that's 51 00:02:42,000 --> 00:02:43,000 midway down the list, if I had 52 00:02:43,000 --> 00:02:45,000 that pointer 53 00:02:45,000 --> 00:02:49,000 and from there I could say, is the element I'm looking for 54 00:02:49,000 --> 00:02:53,000 ahead of Grumpy or behind Grumpy. So having Grumpy split my input in 55 00:02:53,000 --> 00:02:54,000 half. 56 00:02:54,000 --> 00:02:56,000 If I had that, right, then that would actually get me 57 00:02:56,000 --> 00:02:59,000 sort of moving towards that direction that I wanted to get, which is using the binary 58 00:02:59,000 --> 00:03:03,000 search property to facilitate throwing away half the data 59 00:03:03,000 --> 00:03:06,000 and look into just the half remaining that's interesting. 60 00:03:06,000 --> 00:03:09,000 Well, if I had a pointer to Grumpy, that doesn't quite solve my problem. A pointer to Grumpy 61 00:03:09,000 --> 00:03:12,000 could get me halfway down the list, but then the pointer's coming in this way. If I 62 00:03:12,000 --> 00:03:16,000 inverted the pointers leading away from it, 63 00:03:16,000 --> 00:03:17,000 that's also kind of helping 64 00:03:17,000 --> 00:03:20,000 to facilitate what I'm trying to do. So if I could point toward the middle 65 00:03:20,000 --> 00:03:24,000 and then kind of move away, and then if I actually kind of applied the same strategy not just at the topmost 66 00:03:24,000 --> 00:03:28,000 level, but at every subsequent division down, from the half, to the 67 00:03:28,000 --> 00:03:29,000 quarter to the eight, 68 00:03:29,000 --> 00:03:32,000 to the sixteenth. And instead of having a pointer 69 00:03:32,000 --> 00:03:34,000 to the front of the list, what I really maintained was a pointer to the 70 00:03:34,000 --> 00:03:36,000 middlemost element, and then 71 00:03:36,000 --> 00:03:38,000 that element maintained pointers 72 00:03:38,000 --> 00:03:40,000 to the middle of the 73 00:03:40,000 --> 00:03:44,000 upper quarter and the middle of the lower quarter, and all the way down. 74 00:03:44,000 --> 00:03:46,000 Then I could pull up this list 75 00:03:46,000 --> 00:03:50,000 into something that we call a binary search tree. 76 00:03:50,000 --> 00:03:54,000 So the way I would be storing this is having a pointer to a node, which 77 00:03:54,000 --> 00:03:58,000 is expected to the median or the middle, or somewhere right along 78 00:03:58,000 --> 00:03:59,000 the 79 00:03:59,000 --> 00:04:00,000 values being stored. 80 00:04:00,000 --> 00:04:04,000 And that it has two pointers, not just a next or previous or something like that, that are related to all 81 00:04:04,000 --> 00:04:08,000 the elements that precede this one in the ordering, and all the elements that 82 00:04:08,000 --> 00:04:09,000 follow it are in 83 00:04:09,000 --> 00:04:11,000 these two pointers. 84 00:04:11,000 --> 00:04:14,000 They call these subtrees coming off the smaller trees over here with three 85 00:04:14,000 --> 00:04:17,000 elements on this side, and three elements on that side. 86 00:04:17,000 --> 00:04:19,000 And that each of those nodes 87 00:04:19,000 --> 00:04:22,000 has recursively the same formulation; that it has two 88 00:04:22,000 --> 00:04:25,000 subtree hanging off of it 89 00:04:25,000 --> 00:04:27,000 that contain, 90 00:04:27,000 --> 00:04:32,000 given any particular node, that there's this binary search ordering here that says 91 00:04:32,000 --> 00:04:36,000 everything that's in that left subtree precedes this node in ordering. 92 00:04:36,000 --> 00:04:38,000 And everything that's in that right subtree 93 00:04:38,000 --> 00:04:39,000 follows it. 94 00:04:39,000 --> 00:04:44,000 And that's true for every node throughout the entire arrangement here. 95 00:04:44,000 --> 00:04:47,000 So this is called a tree, and 96 00:04:47,000 --> 00:04:50,000 this particular form is actually called the binary search tree. I'm 97 00:04:50,000 --> 00:04:53,000 going to throw a little bit of vocabulary at you, because I'm going to start using these words, and I just want to get 98 00:04:53,000 --> 00:04:54,000 them out there 99 00:04:54,000 --> 00:04:58,000 early so that you understand what those words mean when I start using them. 100 00:04:58,000 --> 00:05:01,000 Is that each of the cells in here, 101 00:05:01,000 --> 00:05:02,000 like the cells that are 102 00:05:02,000 --> 00:05:04,000 allocated for a link list, 103 00:05:04,000 --> 00:05:07,000 are individually heap allocated, we're going to have pointers that allow us to get back 104 00:05:07,000 --> 00:05:08,000 to 105 00:05:08,000 --> 00:05:12,000 them. We typically call those nodes; in terms of tree I'll call it a cell or node in the list kind 106 00:05:12,000 --> 00:05:16,000 of interchangeably. I'm more likely to use node when I'm talking about tree. 107 00:05:16,000 --> 00:05:19,000 And a tree is really just a pointer to a node. 108 00:05:19,000 --> 00:05:21,000 The empty tree 109 00:05:21,000 --> 00:05:24,000 would be a pointer whose value is null, so pointing at nothing. 110 00:05:24,000 --> 00:05:27,000 A singleton tree would be a pointer to a single node 111 00:05:27,000 --> 00:05:29,000 that has left and right subtrees 112 00:05:29,000 --> 00:05:31,000 that are both empty or both null. 113 00:05:31,000 --> 00:05:35,000 So Bashful by itself is a singleton, a pointer to 114 00:05:35,000 --> 00:05:38,000 Bashful. When we talk about subtrees, it just kind of means looking at the top; Grumpy has two 115 00:05:38,000 --> 00:05:40,000 subtrees, a left and a right, 116 00:05:40,000 --> 00:05:41,000 117 00:05:41,000 --> 00:05:42,000 that 118 00:05:42,000 --> 00:05:45,000 themselves are just smaller forms of the same recursive structure. 119 00:05:45,000 --> 00:05:49,000 And then we would say that Grumpy is the parent of Doc and Sleepy. 120 00:05:49,000 --> 00:05:52,000 So those are the two children nodes, and we'll call them the left child and the right 121 00:05:52,000 --> 00:05:53,000 child. 122 00:05:53,000 --> 00:05:56,000 And the node that's at the very top, 123 00:05:56,000 --> 00:06:00,000 the one that our external pointer is coming into and then kind of leads the way, is called 124 00:06:00,000 --> 00:06:02,000 the root node. The 125 00:06:02,000 --> 00:06:02,000 first node 126 00:06:02,000 --> 00:06:05,000 of the tree where we first start our work is going to be called the root. 127 00:06:05,000 --> 00:06:09,000 And these nodes down here at the bottom that have empty subtrees are called leaf nodes. 128 00:06:09,000 --> 00:06:14,000 So a node that has no further subtrees, so by itself has null 129 00:06:14,000 --> 00:06:16,000 left and right trees, is called a leaf. 130 00:06:16,000 --> 00:06:18,000 So that actually means the tree is kind of upside down 131 00:06:18,000 --> 00:06:21,000 if you look at it. That Grumpy's the root and these things are the 132 00:06:21,000 --> 00:06:25,000 leaves shows you that apparently most computer scientists don't get out very much, 133 00:06:25,000 --> 00:06:28,000 because they haven't noticed that trees actually grow the other direction. 134 00:06:28,000 --> 00:06:29,000 135 00:06:29,000 --> 00:06:32,000 But just kind of go with it, and 136 00:06:32,000 --> 00:06:33,000 you'll be 137 00:06:33,000 --> 00:06:36,000 speaking tree-speak in no time. 138 00:06:36,000 --> 00:06:43,000 Anybody have any questions about what we've got basically set up here? 139 00:06:43,000 --> 00:06:45,000 So there is a 140 00:06:45,000 --> 00:06:48,000 whole family of things that come under the heading of tree in computer 141 00:06:48,000 --> 00:06:52,000 science. We're going to look in particular at this one embodiment, the binary search 142 00:06:52,000 --> 00:06:53,000 tree. 143 00:06:53,000 --> 00:06:56,000 But before I get deep down in the details of that, I just want to back 144 00:06:56,000 --> 00:06:59,000 up for a second and just talk about what trees in general mean, and what kind of 145 00:06:59,000 --> 00:07:01,000 things in general are represented in manipulative trees. 146 00:07:01,000 --> 00:07:06,000 So that you have a sense of what is the broad notion of tree as a 147 00:07:06,000 --> 00:07:08,000 computer scientist would see it, 148 00:07:08,000 --> 00:07:10,000 is just that a tree is a recursive branching structure 149 00:07:10,000 --> 00:07:14,000 where there are nodes, which have pointers to subtrees. 150 00:07:14,000 --> 00:07:17,000 There may be one of those subtrees. There may be ten of those subtrees. There 151 00:07:17,000 --> 00:07:20,000 may be two; depending on the nature of the tree you're working on there may be a 152 00:07:20,000 --> 00:07:22,000 constraint on exactly how many children they have 153 00:07:22,000 --> 00:07:26,000 or some flexibility in the number of children they have 154 00:07:26,000 --> 00:07:27,000 155 00:07:27,000 --> 00:07:29,000 depending on what's being modeled here. 156 00:07:29,000 --> 00:07:31,000 There's always a single root node. 157 00:07:31,000 --> 00:07:35,000 There's a node that kind of sits at the top of this hierarchy that leads down to 158 00:07:35,000 --> 00:07:36,000 these things, 159 00:07:36,000 --> 00:07:41,000 and that to form a tree there has to be a reachable path from the root 160 00:07:41,000 --> 00:07:43,000 to every single node. 161 00:07:43,000 --> 00:07:46,000 So there aren't some nodes that exist way out here, and there aren't two different ways to 162 00:07:46,000 --> 00:07:48,000 get to the same 163 00:07:48,000 --> 00:07:50,000 node in a tree; that if the 164 00:07:50,000 --> 00:07:53,000 midterm is kept in the folder called exams, which is in the folder 165 00:07:53,000 --> 00:07:56,000 called cs106, which is in my home folder. There's not some other way that 166 00:07:56,000 --> 00:07:59,000 you can get to the midterm; the midterm lives exactly that part in the tree, 167 00:07:59,000 --> 00:08:01,000 and there's no 168 00:08:01,000 --> 00:08:04,000 circularity or loops in the structure. 169 00:08:04,000 --> 00:08:07,000 And so a lot of things do get modeled tree, the one I've drawn here is just the file system. 170 00:08:07,000 --> 00:08:10,000 Think about how your files are maintained on a computer, there tends to be a 171 00:08:10,000 --> 00:08:11,000 structure of an 172 00:08:11,000 --> 00:08:14,000 outermost folder, which has inner folders, 173 00:08:14,000 --> 00:08:17,000 which have inner folders. And along the way there are these files, which you can think of as leaf 174 00:08:17,000 --> 00:08:21,000 nodes that have no further subtrees underneath them. 175 00:08:21,000 --> 00:08:24,000 And you can talk about, say, what is the size of 176 00:08:24,000 --> 00:08:27,000 all the folders in my home directory? Well it would kind 177 00:08:27,000 --> 00:08:28,000 of be recursively defined as summing over 178 00:08:28,000 --> 00:08:30,000 all the folders within it, 179 00:08:30,000 --> 00:08:33,000 how much storage is being used by those; which when it gets down to a leaf node can say, how 180 00:08:33,000 --> 00:08:38,000 much space does this file take up and kind of add those together. 181 00:08:38,000 --> 00:08:42,000 Things like your family trees, right, sort of modeling genealogy. 182 00:08:42,000 --> 00:08:44,000 Things like the decomposition tree in your program; 183 00:08:44,000 --> 00:08:47,000 main calling, 184 00:08:47,000 --> 00:08:50,000 read file calling, set up boggle board calling, 185 00:08:50,000 --> 00:08:52,000 give instructions. 186 00:08:52,000 --> 00:08:56,000 There is a tree structure to that, about what pieces use what pieces and 187 00:08:56,000 --> 00:08:57,000 descendants from there. 188 00:08:57,000 --> 00:09:01,000 The ones that we're going to look at in particularly are the ones that are in the family of the binary trees. So a binary 189 00:09:01,000 --> 00:09:05,000 tree has two children, exactly two children. There's a left and a right 190 00:09:05,000 --> 00:09:07,000 pointer coming out of each node. 191 00:09:07,000 --> 00:09:11,000 One or both of those can be null, but there can never be more than two 192 00:09:11,000 --> 00:09:11,000 children. 193 00:09:11,000 --> 00:09:15,000 And there is sort of space for recording two children, as opposed to recording things which 194 00:09:15,000 --> 00:09:17,000 are trinary, which have three 195 00:09:17,000 --> 00:09:20,000 possibilities for children, or where there may be some four or eight or 196 00:09:20,000 --> 00:09:22,000 some other number there. 197 00:09:22,000 --> 00:09:24,000 And in particular it's the search tree 198 00:09:24,000 --> 00:09:26,000 that's going to help us get 199 00:09:26,000 --> 00:09:28,000 the work done that we want, which is 200 00:09:28,000 --> 00:09:32,000 there's not just any old arrangement of what's on the left, and what's on the right side. 201 00:09:32,000 --> 00:09:35,000 That the nodes are going to be organized in here to facilitate binary search by 202 00:09:35,000 --> 00:09:36,000 keeping 203 00:09:36,000 --> 00:09:37,000 one half 204 00:09:37,000 --> 00:09:40,000 over on the left and one half over on the right, and that we'll know which 205 00:09:40,000 --> 00:09:44,000 half something goes in by virtue of looking at the key. So 206 00:09:44,000 --> 00:09:47,000 let's take a look at a simple example of a binary search tree with 207 00:09:47,000 --> 00:09:49,000 numbers. So 208 00:09:49,000 --> 00:09:53,000 if I have a struct node that keeps track of an integer value and has two pointers, the 209 00:09:53,000 --> 00:09:55,000 left and the right, 210 00:09:55,000 --> 00:09:59,000 that point away from it. And in this case my root node is 57, 211 00:09:59,000 --> 00:10:03,000 and looking at 57 I can tell you that every value in this tree 212 00:10:03,000 --> 00:10:07,000 that is less than 57 will be in the left subtree. There will be no values 213 00:10:07,000 --> 00:10:09,000 that are 56 or lower that are on the 214 00:10:09,000 --> 00:10:10,000 right side. 215 00:10:10,000 --> 00:10:14,000 And that will be true at every step down the tree. So if I'm looking for 216 00:10:14,000 --> 00:10:15,000 something 217 00:10:15,000 --> 00:10:18,000 that actually gives me that immediate access we were using the vector's 218 00:10:18,000 --> 00:10:19,000 sorting property for. Which is if 219 00:10:19,000 --> 00:10:24,000 I'm looking for the number 70, and I start at the root node of 57, then I know 220 00:10:24,000 --> 00:10:25,000 it's not in the left subtree, 221 00:10:25,000 --> 00:10:28,000 so assuming that about half the nodes are 222 00:10:28,000 --> 00:10:30,000 pointed to over on that side, 223 00:10:30,000 --> 00:10:34,000 I've now discarded half of the input even from consideration. If 224 00:10:34,000 --> 00:10:36,000 I start at the 70, 225 00:10:36,000 --> 00:10:39,000 and I say okay well if 57 must be over here, then again recursively apply the 226 00:10:39,000 --> 00:10:44,000 same search property here. It's like if I'm looking for 70, is it going to be 227 00:10:44,000 --> 00:10:45,000 before or after 79? 228 00:10:45,000 --> 00:10:49,000 It's going to be before, so in fact I throw away anything that leads away on 229 00:10:49,000 --> 00:10:52,000 the right subtree of 79, and work my way one level down. 230 00:10:52,000 --> 00:10:53,000 So each step 231 00:10:53,000 --> 00:10:58,000 in the recursive search is represented by sort of taking a step down 232 00:10:58,000 --> 00:11:02,000 in that search tree; working my way down from the root to the leaf nodes. And 233 00:11:02,000 --> 00:11:05,000 I either will find the value I'm looking for along the way, 234 00:11:05,000 --> 00:11:08,000 or I will fall off the bottom of that tree 235 00:11:08,000 --> 00:11:10,000 when I haven't been able to find it. 236 00:11:10,000 --> 00:11:13,000 And that will tell me; oh it couldn't have been in the tree, because the only place 237 00:11:13,000 --> 00:11:16,000 it could have been is the one that followed the binary search property. 238 00:11:16,000 --> 00:11:19,000 And so if I look for 70, I say is it greater or less than 239 00:11:19,000 --> 00:11:21,000 62? It's greater. Is it 240 00:11:21,000 --> 00:11:24,000 greater or less than 71? It's less then. And then I work off there I get to the empty 241 00:11:24,000 --> 00:11:26,000 tree. Then I say well if there was a 60, that's where it was 242 00:11:26,000 --> 00:11:31,000 going to be, and it wasn't there so there much not be a 70 in this tree. 243 00:11:31,000 --> 00:11:32,000 So it is exactly designed 244 00:11:32,000 --> 00:11:36,000 to support one kind of search, and one kind of search only, which is this 245 00:11:36,000 --> 00:11:40,000 efficient divide and conquer binary search, work its way down. 246 00:11:40,000 --> 00:11:40,000 247 00:11:40,000 --> 00:11:42,000 The other kinds of things, let's say, that 248 00:11:42,000 --> 00:11:44,000 trees 249 00:11:44,000 --> 00:11:45,000 are used for 250 00:11:45,000 --> 00:11:48,000 may not need this kind of support, so they may not go out of their way to do this. But the one 251 00:11:48,000 --> 00:11:52,000 we're actually very interested in is because of maps doing a lot of 252 00:11:52,000 --> 00:11:55,000 additions into a sorted facility and 253 00:11:55,000 --> 00:11:57,000 searches in that sort of facility. If we can 254 00:11:57,000 --> 00:12:00,000 bottle it as a binary search tree, we're able to get the best of both worlds. To 255 00:12:00,000 --> 00:12:03,000 have that super fast logarithmic search, 256 00:12:03,000 --> 00:12:07,000 as well as have the flexibility of the pointer based structure 257 00:12:07,000 --> 00:12:11,000 to help us do an insert. So if I'm ready to put a 70 into this tree, 258 00:12:11,000 --> 00:12:14,000 the same path that led to me to where it should be 259 00:12:14,000 --> 00:12:16,000 tells us where to put that new node 260 00:12:16,000 --> 00:12:20,000 in place. And if all I need to do is to wire that in place; and if all I need to do to wire that in place is to kind of create a new node in the 261 00:12:20,000 --> 00:12:21,000 heap and 262 00:12:21,000 --> 00:12:24,000 put the pointers coming in and out of it the way that I would in a link list, 263 00:12:24,000 --> 00:12:27,000 then I should be able to do both searches and inserts 264 00:12:27,000 --> 00:12:30,000 in time proportional to the height of the tree, 265 00:12:30,000 --> 00:12:35,000 which for a relatively balanced tree would be logarithmic. So 266 00:12:35,000 --> 00:12:37,000 it sounds like we can get everything we wanted here 267 00:12:37,000 --> 00:12:40,000 with a little bit of work. So 268 00:12:40,000 --> 00:12:43,000 let me tell you - just do a little bit of playing with trees, just to kind of get used to thinking 269 00:12:43,000 --> 00:12:46,000 about how we do stuff on them, and then I'm going to go forward with 270 00:12:46,000 --> 00:12:49,000 implementing map using a binary search tree. 271 00:12:49,000 --> 00:12:52,000 Trees are very, very recursive. So for those of you who still feel like 272 00:12:52,000 --> 00:12:54,000 recursion is a little bit hazy, 273 00:12:54,000 --> 00:12:58,000 this is another great area to strengthen your thinking about 274 00:12:58,000 --> 00:13:01,000 recursive. In fact, I think a lot of people find operating on trees very, very 275 00:13:01,000 --> 00:13:03,000 natural. Because trees are so recursive, the 276 00:13:03,000 --> 00:13:07,000 idea that you write recursive algorithms to them is almost like you do it without even thinking, you don't 277 00:13:07,000 --> 00:13:08,000 have to think hard. You're like, 278 00:13:08,000 --> 00:13:11,000 you typically need to do something to your current node, 279 00:13:11,000 --> 00:13:14,000 and then recursively operate on your left and your right. 280 00:13:14,000 --> 00:13:17,000 And it's so natural that you don't even have stop and think hard about 281 00:13:17,000 --> 00:13:18,000 what that means. So 282 00:13:18,000 --> 00:13:21,000 your base case tends to be getting to an empty tree and having a 283 00:13:21,000 --> 00:13:23,000 pointer that points to null, 284 00:13:23,000 --> 00:13:28,000 and in the other cases have some node operate on your children. 285 00:13:28,000 --> 00:13:31,000 Some of the simplest things are just traversals. 286 00:13:31,000 --> 00:13:33,000 If I tree structure and I'd like to 287 00:13:33,000 --> 00:13:35,000 do something to every node, 288 00:13:35,000 --> 00:13:37,000 go through it and print them all, 289 00:13:37,000 --> 00:13:39,000 go through it and write them to a file, 290 00:13:39,000 --> 00:13:42,000 go through and add them all up to determine the average score, I 291 00:13:42,000 --> 00:13:46,000 need these things. I'm going to need to do some processing that visits every node of the 292 00:13:46,000 --> 00:13:48,000 tree once and exactly once, 293 00:13:48,000 --> 00:13:50,000 and recursion is exactly the way to do that. 294 00:13:50,000 --> 00:13:53,000 And there's a little bit of some choices, and when you're handling a 295 00:13:53,000 --> 00:13:55,000 particular part of a tree, are you handling the current node and then 296 00:13:55,000 --> 00:13:59,000 recurring on the left and right subtrees? Are you recurring first and then handling 297 00:13:59,000 --> 00:14:02,000 the nodes. So some of those things that we saw in terms of the link list 298 00:14:02,000 --> 00:14:03,000 that 299 00:14:03,000 --> 00:14:06,000 change the order of the processing, but they don't change 300 00:14:06,000 --> 00:14:08,000 whether you visit everybody. It's just a matter of which one you want to do 301 00:14:08,000 --> 00:14:09,000 before. 302 00:14:09,000 --> 00:14:12,000 I'm going to look at a couple of the simple traversals here, 303 00:14:12,000 --> 00:14:14,000 and I'll point out 304 00:14:14,000 --> 00:14:16,000 why that mattered. 305 00:14:16,000 --> 00:14:18,000 If we look at in order, 306 00:14:18,000 --> 00:14:22,000 so we'll consider that an in order traversal is for you to operate on your 307 00:14:22,000 --> 00:14:24,000 left subtree recursively, 308 00:14:24,000 --> 00:14:27,000 handle the current node right here, and then operate on your right subtree 309 00:14:27,000 --> 00:14:29,000 recursively. 310 00:14:29,000 --> 00:14:31,000 The base case, which is 311 00:14:31,000 --> 00:14:34,000 kind of hidden in here, is if T is null, we get to an empty subtree we 312 00:14:34,000 --> 00:14:37,000 don't do anything. So only in the nonempty case are we actually 313 00:14:37,000 --> 00:14:39,000 processing and making recursive calls. 314 00:14:39,000 --> 00:14:43,000 So if I start with my node 57 being the root of my tree, and I say 315 00:14:43,000 --> 00:14:45,000 print that tree starting from the root. 316 00:14:45,000 --> 00:14:48,000 It will print everything out of the left subtree, 317 00:14:48,000 --> 00:14:52,000 then print the 57, and then print everything out of the right subtree. 318 00:14:52,000 --> 00:14:55,000 And given that gets applied everywhere down, 319 00:14:55,000 --> 00:14:58,000 the effect will be 57 says first print my left subtree. 320 00:14:58,000 --> 00:15:01,000 And 37 says, well print my left subtree. And 15 says, well first 321 00:15:01,000 --> 00:15:03,000 print my left subtree. 322 00:15:03,000 --> 00:15:06,000 And then seven says, first print my left subtree. Well, that ones empty, 323 00:15:06,000 --> 00:15:09,000 and so nothing gets printed. 324 00:15:09,000 --> 00:15:13,000 As the recursion unwinds, it comes back to the last most call. The last most call was 325 00:15:13,000 --> 00:15:17,000 seven. It says, okay, print seven and now print seven's right subtree. Again that's an empty, 326 00:15:17,000 --> 00:15:19,000 so nothing gets printed there. It 327 00:15:19,000 --> 00:15:22,000 will come back and then do the same thing with 15. I've done everything 328 00:15:22,000 --> 00:15:23,000 to the left of 15; 329 00:15:23,000 --> 00:15:26,000 let's print 15 and its right. 330 00:15:26,000 --> 00:15:29,000 And doing that kind of all the way throughout the tree means that we will get the numbers out 331 00:15:29,000 --> 00:15:31,000 in sorted order; 332 00:15:31,000 --> 00:15:34,000 all right, five, 17, seven, 333 00:15:34,000 --> 00:15:37,000 15, 32, 57, 59, 62, 71, 334 00:15:37,000 --> 00:15:40,000 79, 93. An in order traversal 335 00:15:40,000 --> 00:15:42,000 in a binary search tree 336 00:15:42,000 --> 00:15:50,000 rediscovers, visits all the nodes from smallest to largest. [Inaudible] 337 00:15:50,000 --> 00:15:54,000 So duplicates, you just have to have a strategy. It turns out in the map, the duplicates 338 00:15:54,000 --> 00:15:54,000 overwrite. 339 00:15:54,000 --> 00:15:58,000 So any time you get a new value you print there. Typically you just need to decide, 340 00:15:58,000 --> 00:16:00,000 do they go to the left or do they go to the right? 341 00:16:00,000 --> 00:16:03,000 So that you know when you're equal to if you wanted to find another one 342 00:16:03,000 --> 00:16:04,000 of these, where would it go? 343 00:16:04,000 --> 00:16:07,000 In a lot of situations you just don't have duplicates, so it's not actually a big deal. 344 00:16:07,000 --> 00:16:10,000 You can't throw them arbitrarily on either side because you would end up having to search 345 00:16:10,000 --> 00:16:14,000 both sides to find them. So you want to just know that's it's less than or equal to on the 346 00:16:14,000 --> 00:16:18,000 left and greater than on the right. So that's 347 00:16:18,000 --> 00:16:22,000 pretty neat, right, that that means that this 348 00:16:22,000 --> 00:16:22,000 349 00:16:22,000 --> 00:16:25,000 for purposes of being able to iterate for something, if 350 00:16:25,000 --> 00:16:26,000 you were to walk 351 00:16:26,000 --> 00:16:30,000 the structure in order, then you can produce them back out in sorted order, which tends to be 352 00:16:30,000 --> 00:16:31,000 a convenient way 353 00:16:31,000 --> 00:16:33,000 354 00:16:33,000 --> 00:16:36,000 to print the output. There are a lot of situations where you do want that browsable 355 00:16:36,000 --> 00:16:43,000 list in alphanumerical order. And the tree makes that easy to do. 356 00:16:43,000 --> 00:16:45,000 When it comes time to delete the tree, I'm 357 00:16:45,000 --> 00:16:48,000 going to need to go through and delete each of those nodes individually, the 358 00:16:48,000 --> 00:16:51,000 same way I would on a link list to delete the whole structure. 359 00:16:51,000 --> 00:16:54,000 The 360 00:16:54,000 --> 00:16:57,000 order of traversal that's going to be most convenient for doing that is going to be to do it 361 00:16:57,000 --> 00:16:58,000 post order. 362 00:16:58,000 --> 00:17:00,000 Which is to say, when I'm looking at the root node 57, 363 00:17:00,000 --> 00:17:05,000 I want to delete my entire left subtree recursively, delete my entire right 364 00:17:05,000 --> 00:17:10,000 subtree recursively, and only after both of those pieces have been cleaned up, 365 00:17:10,000 --> 00:17:12,000 delete the node we're on; 366 00:17:12,000 --> 00:17:14,000 so deleting down below to the left, down below to the right 367 00:17:14,000 --> 00:17:17,000 and then delete this one. If I did it in the other order, if 368 00:17:17,000 --> 00:17:21,000 I tried to delete in between in the in order or in the pre order case where I did 369 00:17:21,000 --> 00:17:23,000 my node and then to my subtrees, 370 00:17:23,000 --> 00:17:26,000 I'd be reaching into parts of my memory I've deleted. 371 00:17:26,000 --> 00:17:31,000 So we do want to do it in that order, so 372 00:17:31,000 --> 00:17:32,000 recursion 373 00:17:32,000 --> 00:17:35,000 coming back to visit you again. 374 00:17:35,000 --> 00:17:42,000 Let's talk about how to make map work as a tree. 375 00:17:42,000 --> 00:17:46,000 So if you think about what maps hold, maps hold a key, which is string type; a 376 00:17:46,000 --> 00:17:48,000 value, which is client 377 00:17:48,000 --> 00:17:51,000 some sort of template parameter there. 378 00:17:51,000 --> 00:17:55,000 Now that I've built a structure that raps those two things with a pair and 379 00:17:55,000 --> 00:17:58,000 adds to it, right these two additional pointers to the left and to the right 380 00:17:58,000 --> 00:18:01,000 subtrees that point back to a recursive struct, 381 00:18:01,000 --> 00:18:05,000 the one data member that the map will have at the top is just the root 382 00:18:05,000 --> 00:18:09,000 pointer. The pointer to the node at the top of the tree, 383 00:18:09,000 --> 00:18:12,000 and then from there we'll traverse pointers to find our way to other ones. 384 00:18:12,000 --> 00:18:15,000 And we will organize that tree for the binary search property. 385 00:18:15,000 --> 00:18:16,000 So using the 386 00:18:16,000 --> 00:18:20,000 key as the discriminate of what goes left and right, 387 00:18:20,000 --> 00:18:25,000 we'll compare each new key to the key at the root and decide is it going to go in the left subtree 388 00:18:25,000 --> 00:18:27,000 root or the right subtree root? And so on down to the 389 00:18:27,000 --> 00:18:31,000 matching position when we're doing a find, or to the place to insert if we're ready to 390 00:18:31,000 --> 00:18:33,000 put a new node in. 391 00:18:33,000 --> 00:18:36,000 So get value and add will have very similar structures, in terms of 392 00:18:36,000 --> 00:18:38,000 searching that tree 393 00:18:38,000 --> 00:18:39,000 394 00:18:39,000 --> 00:18:40,000 from the root 395 00:18:40,000 --> 00:18:45,000 down to the leaf to find that match along the way, or to 396 00:18:45,000 --> 00:18:49,000 report where to put a new one in. So 397 00:18:49,000 --> 00:18:52,000 let's look at the actual code; it has 398 00:18:52,000 --> 00:18:54,000 my struct, it 399 00:18:54,000 --> 00:18:57,000 has the strike key, it has the value type node, 400 00:18:57,000 --> 00:19:00,000 two pointers there, and then the one data member for the map 401 00:19:00,000 --> 00:19:03,000 itself here is the pointer to the root node. I 402 00:19:03,000 --> 00:19:09,000 also put a couple of helper number functions that I'm going to need. 403 00:19:09,000 --> 00:19:10,000 404 00:19:10,000 --> 00:19:13,000 Searching the tree and entering a new node in the tree are going to require a little 405 00:19:13,000 --> 00:19:18,000 helper, and we're going to see why that is. That basically that the add and get value public 406 00:19:18,000 --> 00:19:20,000 number functions are going to be just wrappers 407 00:19:20,000 --> 00:19:23,000 that turn and pass the code off to these recursive helpers. 408 00:19:23,000 --> 00:19:26,000 And if these cases, the additional information that each of these is 409 00:19:26,000 --> 00:19:28,000 tracking is what node we're at 410 00:19:28,000 --> 00:19:32,000 as we work our way down the tree. But the initial call to add is just going to say, well 411 00:19:32,000 --> 00:19:34,000 here's a key to value go put it in the right place. 412 00:19:34,000 --> 00:19:37,000 And that while we're doing the recursion we need to know where we are in the tree. So 413 00:19:37,000 --> 00:19:39,000 we're adding that extra parameter. 414 00:19:39,000 --> 00:19:43,000 What a lot of recursive code needs is just a little more housekeeping, and the 415 00:19:43,000 --> 00:19:44,000 housekeeping is 416 00:19:44,000 --> 00:19:51,000 what part of the tree we're currently searching or trying to enter into. 417 00:19:51,000 --> 00:19:54,000 So this is the outer call for get value. 418 00:19:54,000 --> 00:19:57,000 So get value is really just a wrapper, 419 00:19:57,000 --> 00:19:59,000 it's going to call tree search. It's going to 420 00:19:59,000 --> 00:20:03,000 ask the starting value for root. It's the 421 00:20:03,000 --> 00:20:04,000 beginning point for the search, 422 00:20:04,000 --> 00:20:07,000 looking for a particular key. And what tree searches are going to return is a 423 00:20:07,000 --> 00:20:10,000 pointer to a node. If 424 00:20:10,000 --> 00:20:14,000 that node was not found, if there was no node that has a string 425 00:20:14,000 --> 00:20:17,000 key that matches the one that we're looking for, then we'll return null. Otherwise, we'll return 426 00:20:17,000 --> 00:20:18,000 the node. 427 00:20:18,000 --> 00:20:20,000 And so it either will error 428 00:20:20,000 --> 00:20:23,000 when it didn't find it or 429 00:20:23,000 --> 00:20:25,000 pull out the value that was out of 430 00:20:25,000 --> 00:20:25,000 that 431 00:20:25,000 --> 00:20:27,000 node structure. 432 00:20:27,000 --> 00:20:30,000 Well, this part looks pretty okay. The template, right, there's always just a 433 00:20:30,000 --> 00:20:33,000 little bit of goo up there, right, seeing kind of 434 00:20:33,000 --> 00:20:36,000 the template header on the top and the use of the val type, and all this other stuff. 435 00:20:36,000 --> 00:20:39,000 It's going to get a little bit worse here. So just kind of 436 00:20:39,000 --> 00:20:42,000 hopefully take a deep breath and say, okay that part actually 437 00:20:42,000 --> 00:20:46,000 doesn't scare me too much. We're going to look at what happens when I write the tree search code, where 438 00:20:46,000 --> 00:20:48,000 it really actually goes off and does the search down the tree. And there's going to be 439 00:20:48,000 --> 00:20:49,000 440 00:20:49,000 --> 00:20:52,000 a little bit more machinations required on this. So let's just talk about it first, 441 00:20:52,000 --> 00:20:55,000 just what the code does. And then I'm going to come back and talk about some of the syntax that's 442 00:20:55,000 --> 00:20:58,000 required to make this work correctly. 443 00:20:58,000 --> 00:21:02,000 So the idea of tree search is, given a pointer to a particular 444 00:21:02,000 --> 00:21:02,000 subtree, it 445 00:21:02,000 --> 00:21:05,000 could be the root of the original tree or some 446 00:21:05,000 --> 00:21:07,000 smaller subtree further down, and a key that we're looking for, we're 447 00:21:07,000 --> 00:21:10,000 going to return a match when we find that node. 448 00:21:10,000 --> 00:21:15,000 So looking at the step here, where if the key that we have matches the key of the node 449 00:21:15,000 --> 00:21:17,000 we're looking at, then this is the node. 450 00:21:17,000 --> 00:21:19,000 This is the one we wanted. 451 00:21:19,000 --> 00:21:23,000 Otherwise, right, we look at two cases. If the key is less than this node's key, 452 00:21:23,000 --> 00:21:26,000 then we're just going to return the result from searching in the left subtree. 453 00:21:26,000 --> 00:21:31,000 Otherwise it must be greater than, and we're going to search in the right subtree. 454 00:21:31,000 --> 00:21:34,000 So this case, right, duplicates aren't allowed, so there will be exactly one match, if 455 00:21:34,000 --> 00:21:37,000 at all. 456 00:21:37,000 --> 00:21:39,000 And when we find it we return. 457 00:21:39,000 --> 00:21:43,000 The other case that needs to be handled as part of base case is if what if we ever get to an 458 00:21:43,000 --> 00:21:44,000 empty tree. 459 00:21:44,000 --> 00:21:47,000 So we're looking for 32 and the last node we just passed was 460 00:21:47,000 --> 00:21:50,000 35. We needed to be the left of it, and it turns out there was nothing down there, we 461 00:21:50,000 --> 00:21:52,000 have an empty tree. 462 00:21:52,000 --> 00:21:54,000 We'll hit this case where T is null, and we'll return null. That means there was never 463 00:21:54,000 --> 00:21:58,000 a match to that. There's no other place in the tree 464 00:21:58,000 --> 00:22:01,000 where that 32 could have been. The binary search property completely dictates the 465 00:22:01,000 --> 00:22:04,000 series of left and right turns. There's nowhere else to look. 466 00:22:04,000 --> 00:22:06,000 So that's actually, really solving 467 00:22:06,000 --> 00:22:07,000 468 00:22:07,000 --> 00:22:11,000 a lot of our - cleaning up our work for us, streamlining where we need to go; 469 00:22:11,000 --> 00:22:14,000 because the tree is organized through the 470 00:22:14,000 --> 00:22:16,000 mid point and the quarter point and then the eighth point 471 00:22:16,000 --> 00:22:19,000 to narrow down to where we could go. So as 472 00:22:19,000 --> 00:22:21,000 it is, this code works correctly. 473 00:22:21,000 --> 00:22:25,000 So do you have any questions about how it's thinking about doing its work? 474 00:22:25,000 --> 00:22:30,000 And then I'm going to show you why the syntax is a little bit wrong though. 475 00:22:30,000 --> 00:22:32,000 Okay, so let's talk about this 476 00:22:32,000 --> 00:22:34,000 return type coming out of 477 00:22:34,000 --> 00:22:36,000 the function. 478 00:22:36,000 --> 00:22:38,000 So let's go back to this for a second. 479 00:22:38,000 --> 00:22:41,000 That node is defined 480 00:22:41,000 --> 00:22:43,000 as a private member 481 00:22:43,000 --> 00:22:46,000 of the map class. 482 00:22:46,000 --> 00:22:48,000 So node is actually not a global type. 483 00:22:48,000 --> 00:22:53,000 It does not exist outside the map class. Its real name is 484 00:22:53,000 --> 00:22:56,000 485 00:22:56,000 --> 00:22:58,000 map 486 00:22:58,000 --> 00:23:00,000 val type:: 487 00:23:00,000 --> 00:23:03,000 node. 488 00:23:03,000 --> 00:23:05,000 That's its full name, 489 00:23:05,000 --> 00:23:08,000 is that. Now, 490 00:23:08,000 --> 00:23:08,000 inside 491 00:23:08,000 --> 00:23:10,000 the implementation of map, 492 00:23:10,000 --> 00:23:14,000 which is to say inside the class map, or inside member functions that we're 493 00:23:14,000 --> 00:23:17,000 defining later, you'll not that we can just kind of drop 494 00:23:17,000 --> 00:23:21,000 the big full name. The kind of my name is mister such and such, and such and 495 00:23:21,000 --> 00:23:26,000 such; it's like okay, you can just call it node, the short name internal to the class 496 00:23:26,000 --> 00:23:28,000 because it's inside that scope. And 497 00:23:28,000 --> 00:23:31,000 it's able to say, okay in this scope is there something called node? Okay there is. Okay, that's the 498 00:23:31,000 --> 00:23:34,000 one. Outside of that scope, though, if you were trying to use that outside of the 499 00:23:34,000 --> 00:23:37,000 class or some other places, it's going to 500 00:23:37,000 --> 00:23:40,000 have a little bit more trouble with this idea of node by itself. It's going to need to see 501 00:23:40,000 --> 00:23:42,000 that full name. Let 502 00:23:42,000 --> 00:23:44,000 me show you on this piece of code, 503 00:23:44,000 --> 00:23:46,000 that node here 504 00:23:46,000 --> 00:23:51,000 happens to be in the slight quirk of the way sequel plus defines things, that that 505 00:23:51,000 --> 00:23:56,000 use of node is not considered in class scope yet. 506 00:23:56,000 --> 00:23:57,000 That 507 00:23:57,000 --> 00:24:01,000 the compiler leads from left to write, so we'll look at the first one. It says, template 508 00:24:01,000 --> 00:24:04,000 type name val type, and so what the compiler sees so far is, I'm about to 509 00:24:04,000 --> 00:24:07,000 see something that's templated on val type. 510 00:24:07,000 --> 00:24:10,000 So it's a pattern for which val type is a placeholder in. 511 00:24:10,000 --> 00:24:12,000 And then it sees val type 512 00:24:12,000 --> 00:24:16,000 as the return value, and it says, okay, that's the placeholder. And then it sees this map {val type}:: and 513 00:24:16,000 --> 00:24:18,000 514 00:24:18,000 --> 00:24:21,000 it says, okay, this is within the - the thing 515 00:24:21,000 --> 00:24:26,000 that I'm defining here is within scope of the map class. And so it sees the map 516 00:24:26,000 --> 00:24:26,000 {val type}:: 517 00:24:26,000 --> 00:24:29,000 and that as soon as it crosses those double colons, 518 00:24:29,000 --> 00:24:31,000 from that point forward 519 00:24:31,000 --> 00:24:33,000 it's inside the map scope. 520 00:24:33,000 --> 00:24:34,000 And so, 521 00:24:34,000 --> 00:24:37,000 when I use things like tree search, or I use the name node, it knows what I'm 522 00:24:37,000 --> 00:24:40,000 talking about. It says, oh there's a tree search in map class. There's a node struct in map 523 00:24:40,000 --> 00:24:43,000 class, it's all good. 524 00:24:43,000 --> 00:24:47,000 In the case of this one I've seen template type and val type, and 525 00:24:47,000 --> 00:24:50,000 at this point it doesn't know anything yet. 526 00:24:50,000 --> 00:24:53,000 And so it sees me using node, and it say, 527 00:24:53,000 --> 00:24:54,000 node I've 528 00:24:54,000 --> 00:24:55,000 never heard of node. 529 00:24:55,000 --> 00:24:58,000 I'm in the middle of making some template based on val type, but I haven't heard 530 00:24:58,000 --> 00:25:01,000 anything about node. Node isn't something that exists in the global name space. 531 00:25:01,000 --> 00:25:03,000 You must have made some 532 00:25:03,000 --> 00:25:04,000 tragic error. 533 00:25:04,000 --> 00:25:08,000 What it wants you to say is 534 00:25:08,000 --> 00:25:09,000 to give it the full name 535 00:25:09,000 --> 00:25:12,000 of map {value type}:: node. 536 00:25:12,000 --> 00:25:14,000 That the real full 537 00:25:14,000 --> 00:25:16,000 this is my name, 538 00:25:16,000 --> 00:25:21,000 and otherwise it will be quite confused. 539 00:25:21,000 --> 00:25:24,000 If that weren't enough, 540 00:25:24,000 --> 00:25:26,000 we'd like to think that we were done 541 00:25:26,000 --> 00:25:27,000 with 542 00:25:27,000 --> 00:25:30,000 placating the C compiler. But it turns out there's one other little 543 00:25:30,000 --> 00:25:32,000 thing that has to happen here, 544 00:25:32,000 --> 00:25:34,000 which is there is an obscure 545 00:25:34,000 --> 00:25:38,000 reason why, even though I've given it the full name, and I feel like I've done my 546 00:25:38,000 --> 00:25:40,000 duty in describing this thing, 547 00:25:40,000 --> 00:25:43,000 that the compiler still is a little bit confused 548 00:25:43,000 --> 00:25:45,000 by this usage. 549 00:25:45,000 --> 00:25:48,000 And because the C language is just immensely complicated, 550 00:25:48,000 --> 00:25:50,000 there are a lot of funny interactions, 551 00:25:50,000 --> 00:25:52,000 there are a couple of situations it can get in to, 552 00:25:52,000 --> 00:25:53,000 where 553 00:25:53,000 --> 00:25:56,000 because it's trying hard to read left to right and see the things. 554 00:25:56,000 --> 00:25:59,000 Sometimes it needs to know some things in advance that it doesn't have all the 555 00:25:59,000 --> 00:26:00,000 information for. 556 00:26:00,000 --> 00:26:03,000 And it turns out in this case there's actually some obscure situation where 557 00:26:03,000 --> 00:26:06,000 it might not realize this is really a type. 558 00:26:06,000 --> 00:26:08,000 It might think it was 559 00:26:08,000 --> 00:26:10,000 more of a global constant 560 00:26:10,000 --> 00:26:14,000 that we actually have to use the type name key word again here 561 00:26:14,000 --> 00:26:16,000 on this one 562 00:26:16,000 --> 00:26:17,000 construction. 563 00:26:17,000 --> 00:26:20,000 It is basically a hint to the compiler 564 00:26:20,000 --> 00:26:22,000 that says, the next thing coming up 565 00:26:22,000 --> 00:26:25,000 really is a type name. 566 00:26:25,000 --> 00:26:28,000 So given that you can't tell that, you'd think it would be obvious; in 567 00:26:28,000 --> 00:26:31,000 this case it doesn't seem like there's a lot for it to be confused at, but there's 568 00:26:31,000 --> 00:26:33,000 actually some other constructions that we can't rule out. 569 00:26:33,000 --> 00:26:37,000 So the compiler is a little bit confused in this situation, and it wants you to 570 00:26:37,000 --> 00:26:38,000 tell it in advance, 571 00:26:38,000 --> 00:26:41,000 and so I'll tell you that the basic rules for when you're going to need to use this is 572 00:26:41,000 --> 00:26:45,000 exactly in this and only this situation. 573 00:26:45,000 --> 00:26:48,000 And I'll tell you what the things are that are required for type name to 574 00:26:48,000 --> 00:26:50,000 become part of your 575 00:26:50,000 --> 00:26:51,000 repertoire. 576 00:26:51,000 --> 00:26:53,000 You have a return type 577 00:26:53,000 --> 00:26:54,000 on a member function 578 00:26:54,000 --> 00:26:58,000 that is defined as part of a class that is a template, 579 00:26:58,000 --> 00:27:00,000 and that type that you are using 580 00:27:00,000 --> 00:27:03,000 is defined within the scope of that template. 581 00:27:03,000 --> 00:27:07,000 So it's trying to use something that's within a template class scope 582 00:27:07,000 --> 00:27:08,000 outside of that scope 583 00:27:08,000 --> 00:27:09,000 584 00:27:09,000 --> 00:27:11,000 that requires this type name. 585 00:27:11,000 --> 00:27:14,000 And the only place where we see things used out of their scope is on the 586 00:27:14,000 --> 00:27:15,000 return value. 587 00:27:15,000 --> 00:27:19,000 So this is kind of the one 588 00:27:19,000 --> 00:27:21,000 configuration that causes you to have to do this. 589 00:27:21,000 --> 00:27:22,000 590 00:27:22,000 --> 00:27:25,000 It's really just, you know, a little bit of a C quirk. 591 00:27:25,000 --> 00:27:28,000 It doesn't change anything about what the code's doing or anything interesting about binary 592 00:27:28,000 --> 00:27:31,000 search trees, but it will come up in any situation where you have this helper member 593 00:27:31,000 --> 00:27:34,000 function that wants to return something that's templated. You 594 00:27:34,000 --> 00:27:35,000 have 595 00:27:35,000 --> 00:27:37,000 to placate the compiler there. 596 00:27:37,000 --> 00:27:39,000 The error message when you don't do it, actually turns out to be very good in 597 00:27:39,000 --> 00:27:45,000 GCC and very terrible in visual studio. GCC says type name required here, 598 00:27:45,000 --> 00:27:47,000 and visual studio says something completely 599 00:27:47,000 --> 00:27:48,000 mystical. 600 00:27:48,000 --> 00:27:49,000 So it 601 00:27:49,000 --> 00:27:52,000 is something to be just a little bit careful about when you're trying to write 602 00:27:52,000 --> 00:27:56,000 that kind of 603 00:27:56,000 --> 00:27:58,000 code. Question? The asterisk? The asterisk is just because it returns a pointer. 604 00:27:58,000 --> 00:28:00,000 605 00:28:00,000 --> 00:28:03,000 In all the situations, I'm always returning a pointer to the node back 606 00:28:03,000 --> 00:28:04,000 there. 607 00:28:04,000 --> 00:28:16,000 So here's the pointer to the match I found or node when I didn't find one. All right, that's a little bit of grimy code. Let's keep going. 608 00:28:16,000 --> 00:28:17,000 So how is it 609 00:28:17,000 --> 00:28:18,000 that add 610 00:28:18,000 --> 00:28:22,000 works? Well, it starts like get value, that 611 00:28:22,000 --> 00:28:23,000 the 612 00:28:23,000 --> 00:28:27,000 strategy we're going to use for inserting into the tree is the really simple one that say, 613 00:28:27,000 --> 00:28:30,000 well don't rearrange anything that's there. For example, if I were trying to put 614 00:28:30,000 --> 00:28:32,000 the number 54 into this tree, 615 00:28:32,000 --> 00:28:33,000 you could say - well 616 00:28:33,000 --> 00:28:36,000 you could imagine putting it at the root and moving things around. Imagine 617 00:28:36,000 --> 00:28:40,000 that your goal is to put it in with a minimal amount of rearrangement. 618 00:28:40,000 --> 00:28:44,000 So leaving all the existing nodes connected the way they are, 619 00:28:44,000 --> 00:28:47,000 we're going to follow the binary search path of 620 00:28:47,000 --> 00:28:49,000 the get value code 621 00:28:49,000 --> 00:28:51,000 to find the place where 54 would have been 622 00:28:51,000 --> 00:28:54,000 if we left everything else in tact. 623 00:28:54,000 --> 00:28:57,000 Looking at 57 you could say, well it has to be to the left of 624 00:28:57,000 --> 00:29:01,000 57. Looking to the 32 you say, it has to be to the right of 625 00:29:01,000 --> 00:29:01,000 32. 626 00:29:01,000 --> 00:29:05,000 Here's where we get some empty tree, so we say, well that's the place where we'll put it. 627 00:29:05,000 --> 00:29:06,000 We're going to tack it off, kind 628 00:29:06,000 --> 00:29:12,000 of hang it off one of the empty trees 629 00:29:12,000 --> 00:29:15,000 along the front tier, the bottom of that tree. 630 00:29:15,000 --> 00:29:18,000 And so, we'll just walk our way down to where we fall off the tree, and put the 631 00:29:18,000 --> 00:29:19,000 new node in place there. 632 00:29:19,000 --> 00:29:22,000 So if I want the node 58 in, I'll say it 633 00:29:22,000 --> 00:29:25,000 has to go to the right; it has to go to the left; it has to go to the left; it has to go to the left; it 634 00:29:25,000 --> 00:29:27,000 will go right down there. If 635 00:29:27,000 --> 00:29:30,000 I want the node 100 in, it has to go right, right, right, right, right. 636 00:29:30,000 --> 00:29:35,000 That just kind of following the path, leaving all the current pointers and 637 00:29:35,000 --> 00:29:40,000 nodes in place, kind of finding a new place. And there's exactly one 638 00:29:40,000 --> 00:29:43,000 that given any number in a current configuration 639 00:29:43,000 --> 00:29:46,000 and the goal of not rearranging anything, there's exactly one place to put 640 00:29:46,000 --> 00:29:48,000 that new node in there 641 00:29:48,000 --> 00:29:49,000 that will 642 00:29:49,000 --> 00:29:51,000 keep the binary search property in tact. 643 00:29:51,000 --> 00:29:55,000 So it actually does a lot of decision making for us. 644 00:29:55,000 --> 00:29:58,000 We don't have to make any hard decisions about where to go. It just all this 645 00:29:58,000 --> 00:30:05,000 left, right, left, right based on less than and greater than. What if the values are equal? If 646 00:30:05,000 --> 00:30:09,000 the values are equal you have to have a plan, and it could be 647 00:30:09,000 --> 00:30:11,000 that you're using all the less than or equal to or greater than 648 00:30:11,000 --> 00:30:14,000 or equal to. You just have to have a plan. In terms of our map it turns out to 649 00:30:14,000 --> 00:30:17,000 equal to all the rights. 650 00:30:17,000 --> 00:30:20,000 So we handle it differently, which we don't every put a duplicate into the 651 00:30:20,000 --> 00:30:23,000 tree. 652 00:30:23,000 --> 00:30:26,000 So the add call is going to basically just be a wrapper 653 00:30:26,000 --> 00:30:29,000 that calls tree enter, starting from the root 654 00:30:29,000 --> 00:30:31,000 with the key and the value 655 00:30:31,000 --> 00:30:34,000 that then goes and does that work, walks its way down to the bottom to 656 00:30:34,000 --> 00:30:40,000 find the right place to stick the new one in. 657 00:30:40,000 --> 00:30:43,000 Here is 658 00:30:43,000 --> 00:30:44,000 a 659 00:30:44,000 --> 00:30:49,000 very dense little piece of code that will put a node into a tree 660 00:30:49,000 --> 00:30:52,000 using the strategy that we've just talked about of, 661 00:30:52,000 --> 00:30:55,000 don't rearrange any of the pointers, just walk your way down to where you find an empty 662 00:30:55,000 --> 00:30:59,000 tree. And once you find that empty tree, replace the empty tree with a singleton node 663 00:30:59,000 --> 00:31:02,000 with the new key and value that you have. 664 00:31:02,000 --> 00:31:06,000 So let's not look at the base case first. Let's actually look at the 665 00:31:06,000 --> 00:31:09,000 recursive cases, because I think that they're a little bit easier to think about. 666 00:31:09,000 --> 00:31:13,000 Is if we every find a match, so we have a key coming in, and it matches 667 00:31:13,000 --> 00:31:14,000 the current node, then we just overwrite. 668 00:31:14,000 --> 00:31:17,000 So that's the way the map handles 669 00:31:17,000 --> 00:31:18,000 insertion of a duplicate value. 670 00:31:18,000 --> 00:31:21,000 Now, in some other situation you might actually insert one. But in our case it turns out we 671 00:31:21,000 --> 00:31:24,000 never want a duplicate. We always want to take the existing value and replace 672 00:31:24,000 --> 00:31:26,000 it with a new one. 673 00:31:26,000 --> 00:31:28,000 If it matches, we're done. 674 00:31:28,000 --> 00:31:29,000 Otherwise, 675 00:31:29,000 --> 00:31:32,000 we need to decide whether we go left or right. 676 00:31:32,000 --> 00:31:36,000 Based on the key we're attempting to insert or the current node value, it either 677 00:31:36,000 --> 00:31:39,000 will go in the left of the subtree, left of the right, in which case we make 678 00:31:39,000 --> 00:31:44,000 exactly one recursive call either to the left or to the right depending on the 679 00:31:44,000 --> 00:31:46,000 relationship of key to the current 680 00:31:46,000 --> 00:31:48,000 key's value. 681 00:31:48,000 --> 00:31:49,000 And then 682 00:31:49,000 --> 00:31:52,000 the one base case that everything eventually gets down to, that's going to create a 683 00:31:52,000 --> 00:31:53,000 new node, 684 00:31:53,000 --> 00:31:57,000 is when you have an empty tree, 685 00:31:57,000 --> 00:31:59,000 then that says that you have followed the path that 686 00:31:59,000 --> 00:32:04,000 has lead off the tree and fallen off the bottom. It says replace that with 687 00:32:04,000 --> 00:32:05,000 this new singleton node. 688 00:32:05,000 --> 00:32:07,000 So we make a new node out in the heap. 689 00:32:07,000 --> 00:32:09,000 We're assigning the key 690 00:32:09,000 --> 00:32:12,000 based on this. We're assigning the value based on the parameter coming in. And then we set 691 00:32:12,000 --> 00:32:14,000 its left and right null to do 692 00:32:14,000 --> 00:32:16,000 693 00:32:16,000 --> 00:32:20,000 a new leaf node, right, it has left and right null subtrees. 694 00:32:20,000 --> 00:32:22,000 And the way this got wired in the tree, 695 00:32:22,000 --> 00:32:26,000 you don't see who is it that's now pointing to this tree, this new 696 00:32:26,000 --> 00:32:28,000 singleton node that I placed into here? 697 00:32:28,000 --> 00:32:31,000 You don't typically see that wire that's coming into it. Who 698 00:32:31,000 --> 00:32:33,000 is pointing to it 699 00:32:33,000 --> 00:32:34,000 is actually being done 700 00:32:34,000 --> 00:32:39,000 by T being passed by reference. 701 00:32:39,000 --> 00:32:42,000 The node right above it, 702 00:32:42,000 --> 00:32:44,000 the left or the right 703 00:32:44,000 --> 00:32:48,000 of what's going to become the parent of the new node being entered, 704 00:32:48,000 --> 00:32:51,000 was passed by reference. It previously was null, 705 00:32:51,000 --> 00:32:55,000 and it got reassigned to point to this new node. So the kind of wiring in of 706 00:32:55,000 --> 00:32:58,000 that node is happening by the pass by reference of the pointer, which is a 707 00:32:58,000 --> 00:33:04,000 tricky thing to kind of watch and get your head around. Let's 708 00:33:04,000 --> 00:33:06,000 look at the code operating. Do you have a 709 00:33:06,000 --> 00:33:13,000 question? Yes. [Inaudible]. Can you 710 00:33:13,000 --> 00:33:14,000 explain 711 00:33:14,000 --> 00:33:16,000 712 00:33:16,000 --> 00:33:17,000 that little step? 713 00:33:17,000 --> 00:33:22,000 Instructor: Well this step is just something in the case of the map, if we every find an existing entity that has that key, and 714 00:33:22,000 --> 00:33:23,000 we overwrite. 715 00:33:23,000 --> 00:33:27,000 So in the map you said previously, Bob's phone number is this, and then you install a new 716 00:33:27,000 --> 00:33:30,000 phone number on Bob. If it finds a map to Bob saying Bob was previously entered, it just 717 00:33:30,000 --> 00:33:34,000 replaces his value. So no nodes are created, no pointers are rearranged; 718 00:33:34,000 --> 00:33:38,000 it just commandeers that node and changes its value. 719 00:33:38,000 --> 00:33:43,000 That's the behaviors specified by map. 720 00:33:43,000 --> 00:33:45,000 So if we have this 721 00:33:45,000 --> 00:33:46,000 right now, we've got 722 00:33:46,000 --> 00:33:49,000 some handful of fruits that are installed 723 00:33:49,000 --> 00:33:53,000 in our tree with some value. It's doesn't matter what the value is, so I didn't even bother to fill 724 00:33:53,000 --> 00:33:57,000 it in to confuse you. So this is what I have, papaya is my root node, and then 725 00:33:57,000 --> 00:34:01,000 the next level down has banana and peach, and the pear and what not. So 726 00:34:01,000 --> 00:34:03,000 root is pointing to papaya right now. 727 00:34:03,000 --> 00:34:06,000 When I make my call to insert a new value, 728 00:34:06,000 --> 00:34:09,000 let's say that the one that I want to put in the tree now is orange, 729 00:34:09,000 --> 00:34:13,000 the very first call to tree enter looks like this, 730 00:34:13,000 --> 00:34:16,000 T is the reference parameter that's coming in, and 731 00:34:16,000 --> 00:34:20,000 it refers to root. 732 00:34:20,000 --> 00:34:24,000 So the very first call to tree enter says please insert 733 00:34:24,000 --> 00:34:27,000 orange and its value 734 00:34:27,000 --> 00:34:28,000 into the tree 735 00:34:28,000 --> 00:34:31,000 who is pointed to by 736 00:34:31,000 --> 00:34:34,000 root. So it actually has that by reference. It's going to need that by 737 00:34:34,000 --> 00:34:37,000 reference because it's actually planning on updating that 738 00:34:37,000 --> 00:34:39,000 when it finally installs the new node. 739 00:34:39,000 --> 00:34:43,000 So the first call to tree enter says, okay well is orange less than or greater than 740 00:34:43,000 --> 00:34:45,000 papaya. 741 00:34:45,000 --> 00:34:47,000 Orange precedes papaya 742 00:34:47,000 --> 00:34:48,000 in the alphabet, 743 00:34:48,000 --> 00:34:50,000 so okay, well call tree enter 744 00:34:50,000 --> 00:34:52,000 the second time 745 00:34:52,000 --> 00:34:53,000 passing 746 00:34:53,000 --> 00:34:58,000 the left subtree of the current value of tee. And so T 747 00:34:58,000 --> 00:35:00,000 was a reference to root right then, 748 00:35:00,000 --> 00:35:04,000 now T becomes a reference to the left subtree coming out of papaya. So it says, now 749 00:35:04,000 --> 00:35:06,000 okay, for the tree 750 00:35:06,000 --> 00:35:10,000 that we're talking about here, which is the one that points to the banana and 751 00:35:10,000 --> 00:35:14,000 below. That's the new root of that subtree is banana, it says, does orange go 752 00:35:14,000 --> 00:35:15,000 to the left or right 753 00:35:15,000 --> 00:35:19,000 of the root node 754 00:35:19,000 --> 00:35:21,000 banana? And it goes to the right. So the third stack frame 755 00:35:21,000 --> 00:35:24,000 of tree enter now says, well the reference pointer 756 00:35:24,000 --> 00:35:28,000 is now looking at the right subtree coming out of banana. And it 757 00:35:28,000 --> 00:35:29,000 says, well does melon go to 758 00:35:29,000 --> 00:35:31,000 the left or the right of 759 00:35:31,000 --> 00:35:32,000 that? 760 00:35:32,000 --> 00:35:34,000 It goes to the right, 761 00:35:34,000 --> 00:35:36,000 so now on this call, right, 762 00:35:36,000 --> 00:35:40,000 the current value of T is going to be null. It is a reference, so a synonym for 763 00:35:40,000 --> 00:35:44,000 what was the right field coming out of the melon node, 764 00:35:44,000 --> 00:35:47,000 and in this is the case where it says, if orange was in the tree, 765 00:35:47,000 --> 00:35:50,000 then this is where it would be hanging off of. It would be coming off of the 766 00:35:50,000 --> 00:35:51,000 right 767 00:35:51,000 --> 00:35:52,000 768 00:35:52,000 --> 00:35:53,000 subtree of melon. 769 00:35:53,000 --> 00:35:55,000 That part is null, 770 00:35:55,000 --> 00:35:58,000 and so that's our place. That's the place where we're going to take out the 771 00:35:58,000 --> 00:35:59,000 existing null, 772 00:35:59,000 --> 00:36:00,000 wipe over it, 773 00:36:00,000 --> 00:36:04,000 and put in a new pointer to the node orange 774 00:36:04,000 --> 00:36:06,000 that's out here. 775 00:36:06,000 --> 00:36:09,000 And so that's how the thing got wired in. It's actually kind of 776 00:36:09,000 --> 00:36:12,000 subtle to see how that happened, because you don?t ever see a 777 00:36:12,000 --> 00:36:16,000 direct descendant, like T is left equals this, equals that new node; or T is 778 00:36:16,000 --> 00:36:17,000 right equals that new 779 00:36:17,000 --> 00:36:20,000 node. It happened by virtue of passing T right 780 00:36:20,000 --> 00:36:22,000 by reference into the 781 00:36:22,000 --> 00:36:26,000 calls further down the stack. That eventually, right when it hit the base 782 00:36:26,000 --> 00:36:26,000 case, 783 00:36:26,000 --> 00:36:30,000 it took what was previously a pointer to null and overwrote it with a pointer to this 784 00:36:30,000 --> 00:36:31,000 new cell, 785 00:36:31,000 --> 00:36:37,000 that then got attached into the tree. Let's take a 786 00:36:37,000 --> 00:36:38,000 look at this piece 787 00:36:38,000 --> 00:36:40,000 of code. 788 00:36:40,000 --> 00:36:44,000 That by reference there turns out to be really, really important. 789 00:36:44,000 --> 00:36:46,000 If that 790 00:36:46,000 --> 00:36:49,000 T was just being passed as an ordinary node star, 791 00:36:49,000 --> 00:36:52,000 nodes would just get lost, dropped on the floor like crazy. 792 00:36:52,000 --> 00:36:56,000 For example, consider the very first time when you're passing in root and root 793 00:36:56,000 --> 00:36:58,000 is empty. If root is null, 794 00:36:58,000 --> 00:36:59,000 it would say, well if root is null 795 00:36:59,000 --> 00:37:02,000 and then it would change this local parameter T; T is some new node of these 796 00:37:02,000 --> 00:37:06,000 things. But if it really wasn't making permanent changes to the root, 797 00:37:06,000 --> 00:37:08,000 really wasn't making root pointer of that new node, root 798 00:37:08,000 --> 00:37:12,000 would continue pointing to null, and that node would just get orphaned. And 799 00:37:12,000 --> 00:37:14,000 every subsequent one, the same thing would happen. 800 00:37:14,000 --> 00:37:17,000 So in order to have this be a persistent change, 801 00:37:17,000 --> 00:37:19,000 the pointer that's coming in, 802 00:37:19,000 --> 00:37:22,000 either the root in the first case or T left or T 803 00:37:22,000 --> 00:37:23,000 right 804 00:37:23,000 --> 00:37:25,000 in subsequent calls, 805 00:37:25,000 --> 00:37:30,000 needed to have the ability to be permanently modified when we attach that new 806 00:37:30,000 --> 00:37:33,000 subtree. 807 00:37:33,000 --> 00:37:37,000 That's really very tricky code, so I would be happy to 808 00:37:37,000 --> 00:37:39,000 try to answer 809 00:37:39,000 --> 00:37:41,000 questions or go through something to try to get your head around what 810 00:37:41,000 --> 00:37:44,000 it's supposed to be doing. If you could ask a question I 811 00:37:44,000 --> 00:37:47,000 could 812 00:37:47,000 --> 00:37:49,000 help you. Or you could just nod your head and say, I'm just totally fine. 813 00:37:49,000 --> 00:37:52,000 Could you use the tree search algorithm that you programmed before? 814 00:37:52,000 --> 00:37:53,000 They're 815 00:37:53,000 --> 00:37:58,000 very close, right, and 816 00:37:58,000 --> 00:37:59,000 where did I put it? 817 00:37:59,000 --> 00:38:02,000 It's on this slide. 818 00:38:02,000 --> 00:38:05,000 The problem with this one right now is it does really stop and return the 819 00:38:05,000 --> 00:38:07,000 null. So I could actually unify them, but it would take a little more 820 00:38:07,000 --> 00:38:11,000 work, and at some point you say, wow I could unify this 821 00:38:11,000 --> 00:38:15,000 at the expense of making this other piece of code a little more complicated. So in this case I chose 822 00:38:15,000 --> 00:38:18,000 to keep them separate. It's not impossible to unify them, but you end up 823 00:38:18,000 --> 00:38:19,000 having to point to 824 00:38:19,000 --> 00:38:22,000 where the node is or a null where you would insert a new node. Tree 825 00:38:22,000 --> 00:38:26,000 search doesn't care about, but tree enter does. It sort of would 826 00:38:26,000 --> 00:38:29,000 modify it to fit the needs of both at the expense of making it a little more awkward 827 00:38:29,000 --> 00:38:38,000 on both sides. Question here. 828 00:38:38,000 --> 00:38:43,000 [Inaudible] the number example. 829 00:38:43,000 --> 00:38:47,000 If you were inserting 54 where you said, if you called print, 830 00:38:47,000 --> 00:38:49,000 wouldn't it print 831 00:38:49,000 --> 00:38:51,000 out of order? So 832 00:38:51,000 --> 00:38:56,000 54 would go left of 57, right of 32, so it'd go over here. 833 00:38:56,000 --> 00:38:59,000 But the way in order traversals work is says, 834 00:38:59,000 --> 00:39:03,000 print everything in my left subtree, print everything in my left subtree, left subtree. So it would kind of print seven 835 00:39:03,000 --> 00:39:07,000 as it came back five, 32, and then it would print 54 before it got to the 836 00:39:07,000 --> 00:39:08,000 57. So 837 00:39:08,000 --> 00:39:12,000 an in order traversal of a binary search tree, as long as the binary search tree 838 00:39:12,000 --> 00:39:16,000 is properly maintained, will print them in sorted order. 839 00:39:16,000 --> 00:39:19,000 It's almost like, don't even think too hard about it, if 840 00:39:19,000 --> 00:39:22,000 everything in my left subtree is smaller, and everything in my right subtree is greater than me, 841 00:39:22,000 --> 00:39:26,000 if I print all of those I have to come out in sorted order. 842 00:39:26,000 --> 00:39:30,000 If the property is correctly maintained in the tree, then that 843 00:39:30,000 --> 00:39:31,000 by definition 844 00:39:31,000 --> 00:39:34,000 is a sorted way to pass through all the numbers. So if 845 00:39:34,000 --> 00:39:38,000 at any point an in order traversal is hitting numbers out of order, it would mean that the tree wasn't 846 00:39:38,000 --> 00:39:40,000 properly 847 00:39:40,000 --> 00:39:51,000 organized. 848 00:39:51,000 --> 00:39:55,000 So let's talk about how that works, there are a 849 00:39:55,000 --> 00:39:59,000 couple of other operations on the tree, things like doing the size, 850 00:39:59,000 --> 00:40:01,000 which would count the number of members. Which you could do by just doing a traversal, or you 851 00:40:01,000 --> 00:40:04,000 might do by just caching a number 852 00:40:04,000 --> 00:40:07,000 that you updated as you added and moved nodes. And 853 00:40:07,000 --> 00:40:10,000 the contains keys, that's basically just a tree search 854 00:40:10,000 --> 00:40:14,000 that determines whether it found a node or not. 855 00:40:14,000 --> 00:40:17,000 The rest of the implementation is not interest. There's also deleting, but I 856 00:40:17,000 --> 00:40:18,000 didn't show you 857 00:40:18,000 --> 00:40:23,000 that. The space use, how does this relate to the other known 858 00:40:23,000 --> 00:40:24,000 implementations that we have for math? 859 00:40:24,000 --> 00:40:27,000 It adds certainly some space overhead. We're 860 00:40:27,000 --> 00:40:31,000 going to have two pointers, a left and a right off every entry. So every entry is 861 00:40:31,000 --> 00:40:35,000 guaranteed to have an excess 8-byte capacity 862 00:40:35,000 --> 00:40:37,000 relative to the data being stored. 863 00:40:37,000 --> 00:40:40,000 It does actually add those tightly to the allocation. So like a link 864 00:40:40,000 --> 00:40:43,000 list, it allocates nodes on a per 865 00:40:43,000 --> 00:40:46,000 entry basis, so there's actually not a lot of 866 00:40:46,000 --> 00:40:49,000 reserve capacity that's waiting around unused the way it might been in a 867 00:40:49,000 --> 00:40:51,000 vector situation. 868 00:40:51,000 --> 00:40:53,000 And the performance of it is that it's based on the 869 00:40:53,000 --> 00:40:55,000 height of the tree, 870 00:40:55,000 --> 00:41:00,000 that both the add and the get value are doing a traversal starting from the root and 871 00:41:00,000 --> 00:41:03,000 working their way down to the bottom of the tree, either finding what it wanted 872 00:41:03,000 --> 00:41:05,000 along the way 873 00:41:05,000 --> 00:41:07,000 or falling off the bottom of the tree 874 00:41:07,000 --> 00:41:11,000 when it wasn't found or when it needs to be inserting a new node. 875 00:41:11,000 --> 00:41:15,000 That height of the tree, you're expecting it to be logarithmic. And so 876 00:41:15,000 --> 00:41:17,000 if I look at, for example, the values, I've got 877 00:41:17,000 --> 00:41:22,000 seven values here, two, eight, 14, 15, 20, and 21. 878 00:41:22,000 --> 00:41:23,000 Those seven, 879 00:41:23,000 --> 00:41:27,000 eight values can be inserted in a tree to get very 880 00:41:27,000 --> 00:41:28,000 different results. 881 00:41:28,000 --> 00:41:30,000 There's not one binary search tree 882 00:41:30,000 --> 00:41:33,000 that represents those values 883 00:41:33,000 --> 00:41:37,000 uniquely. Depending on what order you manage to insert them you'll have 884 00:41:37,000 --> 00:41:38,000 different things. 885 00:41:38,000 --> 00:41:39,000 For example, 886 00:41:39,000 --> 00:41:41,000 given the way we're doing it, 887 00:41:41,000 --> 00:41:44,000 whatever node is inserted first will be the root, and we never move the 888 00:41:44,000 --> 00:41:48,000 root. We leave the root along. So if you look at this as a historical 889 00:41:48,000 --> 00:41:50,000 artifact and say, well they entered a bunch of numbers, 890 00:41:50,000 --> 00:41:52,000 and this is the tree I got. 891 00:41:52,000 --> 00:41:55,000 I can tell you for sure that 15 was the very first number that they entered. 892 00:41:55,000 --> 00:41:57,000 There's no other possibility, 15 got entered first. 893 00:41:57,000 --> 00:42:01,000 The next number that got inserted was one of eight or 20, I don't know 894 00:42:01,000 --> 00:42:03,000 which. 895 00:42:03,000 --> 00:42:06,000 I can tell you, for example, that eight got inserted before 14. In terms 896 00:42:06,000 --> 00:42:09,000 of a level-by-level arrangement, the ones on level one 897 00:42:09,000 --> 00:42:13,000 were always inserted before the ones on level two. 898 00:42:13,000 --> 00:42:16,000 But one possibility I said here was well maybe 15 went in first, 899 00:42:16,000 --> 00:42:18,000 then eight, and then two, 900 00:42:18,000 --> 00:42:21,000 and then 20, and then 21, and then 14, and then 18. There 901 00:42:21,000 --> 00:42:25,000 are some other rearrangements that are also possible that would create the exact same 902 00:42:25,000 --> 00:42:26,000 tree. 903 00:42:26,000 --> 00:42:30,000 And there are other variations that would produce a different, but still valid, 904 00:42:30,000 --> 00:42:32,000 binary search tree. 905 00:42:32,000 --> 00:42:33,000 So with these seven nodes, right, 906 00:42:33,000 --> 00:42:36,000 this tree is height three, 907 00:42:36,000 --> 00:42:39,000 and is what's considered perfectly balanced. Half the nodes are on the 908 00:42:39,000 --> 00:42:42,000 left and the right of each tree all the way through it, 909 00:42:42,000 --> 00:42:45,000 leading to that just beautifully balanced 910 00:42:45,000 --> 00:42:46,000 911 00:42:46,000 --> 00:42:47,000 result we're hoping for 912 00:42:47,000 --> 00:42:52,000 where both our operations, both add and get value, will be logarithmic. So 913 00:42:52,000 --> 00:42:54,000 if I have 914 00:42:54,000 --> 00:42:56,000 1,000 915 00:42:56,000 --> 00:42:59,000 nodes inserted, that height tree should be about ten. And if everything kind of went 916 00:42:59,000 --> 00:43:03,000 well and we got kind of went well, and we got an even split all the way around, then it would take ten 917 00:43:03,000 --> 00:43:06,000 comparisons to find something in that tree or determine it wasn't there' and 918 00:43:06,000 --> 00:43:07,000 also to insert, 919 00:43:07,000 --> 00:43:10,000 which was the thing we were having trouble with at the vector case, that 920 00:43:10,000 --> 00:43:13,000 because that search leads us right to the place where it needs to go, 921 00:43:13,000 --> 00:43:16,000 and the inserting because of the flexibility of the pointer base structure is 922 00:43:16,000 --> 00:43:17,000 very easy, 923 00:43:17,000 --> 00:43:21,000 we can make both that insert and add operate logarithmically. 924 00:43:21,000 --> 00:43:24,000 Let's think about some cases that aren't so good. 925 00:43:24,000 --> 00:43:28,000 I take those same seven numbers and I insert them 926 00:43:28,000 --> 00:43:32,000 not quite so perfectly, and I get a not quite so even result. 927 00:43:32,000 --> 00:43:36,000 So on this side I can tell you that 20 was the first one inserted. 928 00:43:36,000 --> 00:43:38,000 It was the root and the root doesn't move, 929 00:43:38,000 --> 00:43:41,000 but then subsequent ones was inserted an eight, and then a 21, and then an 18, 930 00:43:41,000 --> 00:43:46,000 and then a 14, and then a 15, and then it went back and got that two, 931 00:43:46,000 --> 00:43:47,000 the one over there. 932 00:43:47,000 --> 00:43:50,000 Another possibility for how it got inserted 933 00:43:50,000 --> 00:43:51,000 934 00:43:51,000 --> 00:43:53,000 18, 14, 15, and so on. 935 00:43:53,000 --> 00:43:54,000 These are 936 00:43:54,000 --> 00:43:58,000 not quite as balanced as the last one, but they're not terribly unbalanced. 937 00:43:58,000 --> 00:44:00,000 They're like one or two 938 00:44:00,000 --> 00:44:03,000 more than the perfectly balanced case. 939 00:44:03,000 --> 00:44:09,000 So at logarithmic plus a little constant still doesn't look 940 00:44:09,000 --> 00:44:11,000 that bad. Okay, how bad can it really get? 941 00:44:11,000 --> 00:44:15,000 It can get really bad. 942 00:44:15,000 --> 00:44:18,000 I can take those seven numbers and insert them in the worst 943 00:44:18,000 --> 00:44:21,000 case, and produce a completely degenerate tree 944 00:44:21,000 --> 00:44:24,000 where, for example, if I insert them in sorted order 945 00:44:24,000 --> 00:44:25,000 or in reverse sorted order, 946 00:44:25,000 --> 00:44:26,000 947 00:44:26,000 --> 00:44:28,000 I really have a link list. 948 00:44:28,000 --> 00:44:32,000 I don't have a tree at all. It goes two, 18, 14, 15, 20, 21, 949 00:44:32,000 --> 00:44:34,000 that's still a valid binary search tree. 950 00:44:34,000 --> 00:44:35,000 If you 951 00:44:35,000 --> 00:44:38,000 print it in order, it still comes out in sorted order. 952 00:44:38,000 --> 00:44:41,000 It still can be searched using a binary search strategy. 953 00:44:41,000 --> 00:44:45,000 It's just because the way it's been divided up here, it's not buying 954 00:44:45,000 --> 00:44:48,000 us a lot. So if I'm looking for the number 22, I'll say, well 955 00:44:48,000 --> 00:44:52,000 is it to the left or the right of two? It's to the right. Is it to the left or right of 956 00:44:52,000 --> 00:44:55,000 eight? Oh, it's to the right. Is it to the left or right, you know like all the way down, so finding 957 00:44:55,000 --> 00:44:57,000 22 looking at every single one 958 00:44:57,000 --> 00:45:01,000 of the current entries in the tree, before I could conclude, as I fell off the 959 00:45:01,000 --> 00:45:04,000 bottom, that it wasn't there. 960 00:45:04,000 --> 00:45:06,000 Similarly kind of over there; 961 00:45:06,000 --> 00:45:09,000 it didn't buy us a lot. It has a lot to do with 962 00:45:09,000 --> 00:45:13,000 how we inserted it as to how good a balance we're going to get. 963 00:45:13,000 --> 00:45:16,000 And there are some cases that will produce just drastically unbalanced 964 00:45:16,000 --> 00:45:17,000 965 00:45:17,000 --> 00:45:21,000 things that require linear searches, totally linear to walk that way 966 00:45:21,000 --> 00:45:22,000 down 967 00:45:22,000 --> 00:45:25,000 to find something. 968 00:45:25,000 --> 00:45:26,000 There's a couple of other 969 00:45:26,000 --> 00:45:29,000 degenerate trees, just to mention it's not only the sorted or reverse sorted. 970 00:45:29,000 --> 00:45:33,000 There's also these other kind of saw tooth inputs, they're called, 971 00:45:33,000 --> 00:45:35,000 where you just happen to at any given point to 972 00:45:35,000 --> 00:45:39,000 have inserted the max or the min of the elements that remain, so it 973 00:45:39,000 --> 00:45:42,000 keeps kind of dividing the input into one and 974 00:45:42,000 --> 00:45:44,000 N-1. So having the 975 00:45:44,000 --> 00:45:47,000 largest and the smallest, and the largest and the smallest, and then 976 00:45:47,000 --> 00:45:48,000 subsequent smallest down there 977 00:45:48,000 --> 00:45:52,000 can also produce a linear line. It's not quite all just down the left or 978 00:45:52,000 --> 00:45:55,000 down the right, but it does mean that every single node has an empty 979 00:45:55,000 --> 00:45:59,000 left or right subtree is what's considered the degenerate case here. And 980 00:45:59,000 --> 00:46:01,000 both have the same property where 981 00:46:01,000 --> 00:46:04,000 half of the pointers off 982 00:46:04,000 --> 00:46:08,000 of each cell are going nowhere. 983 00:46:08,000 --> 00:46:10,000 There is an interesting relationship here 984 00:46:10,000 --> 00:46:12,000 between these worst-case inputs for transversion and quick sort, and they're 985 00:46:12,000 --> 00:46:14,000 actually kind of exactly one to one. 986 00:46:14,000 --> 00:46:18,000 That what made quick sort deteriorate was the element of the input, 987 00:46:18,000 --> 00:46:21,000 the max or the min of what remain. And that's exactly the same case 988 00:46:21,000 --> 00:46:25,000 for transversion, max or min of what remains means that you are not dividing your 989 00:46:25,000 --> 00:46:29,000 input into two partitions of roughly equal size. You're dividing them into the 990 00:46:29,000 --> 00:46:33,000 one, N-1, which is getting you nowhere. What do you 991 00:46:33,000 --> 00:46:37,000 get to do about it? 992 00:46:37,000 --> 00:46:40,000 You have to decide first of all if it's a problem worth solving. 993 00:46:40,000 --> 00:46:41,000 994 00:46:41,000 --> 00:46:44,000 So that's what the first one say. Sometimes you just say that 995 00:46:44,000 --> 00:46:46,000 it will sometimes happen in some 996 00:46:46,000 --> 00:46:47,000 inputs 997 00:46:47,000 --> 00:46:50,000 that I consider rare enough that it will degenerate. I don't think they're 998 00:46:50,000 --> 00:46:52,000 important enough or common enough to make a big deal out of. 999 00:46:52,000 --> 00:46:56,000 I don't think that applies here, because it turns out that getting your data in sorted order is 1000 00:46:56,000 --> 00:46:59,000 probably not unlikely. That somebody might be refilling a map 1001 00:46:59,000 --> 00:47:02,000 with data they read from a file that they wrote out in sorted order last time. 1002 00:47:02,000 --> 00:47:05,000 So saying that if it is coming in sorted, 1003 00:47:05,000 --> 00:47:09,000 then I'm probably going to tolerate that, is probably not going to work. 1004 00:47:09,000 --> 00:47:12,000 And there's two main strategies then, if you've agreed to take action about it, 1005 00:47:12,000 --> 00:47:14,000 about what you're going to do. 1006 00:47:14,000 --> 00:47:15,000 1007 00:47:15,000 --> 00:47:19,000 The first one is to wait until 1008 00:47:19,000 --> 00:47:21,000 the problem gets bad enough 1009 00:47:21,000 --> 00:47:23,000 that you decide to do something about it. 1010 00:47:23,000 --> 00:47:28,000 So you don't do any rebalancing as you go. You just let stuff go, left, right, left, 1011 00:47:28,000 --> 00:47:30,000 right, whatever's happening. And 1012 00:47:30,000 --> 00:47:31,000 1013 00:47:31,000 --> 00:47:34,000 you try to keep track of the height of your tree, 1014 00:47:34,000 --> 00:47:37,000 and you realize when the height of your tree seems sufficiently out of whack with 1015 00:47:37,000 --> 00:47:39,000 your expected logarithmic height, 1016 00:47:39,000 --> 00:47:42,000 to do something about it. Then you just fix the whole tree. 1017 00:47:42,000 --> 00:47:45,000 You can clean the whole thing up. Pull them all out into a vector, rearrange them, 1018 00:47:45,000 --> 00:47:49,000 put them back in, so the middlemost node is the root all the way down. And kind of 1019 00:47:49,000 --> 00:47:50,000 create the perfectly balanced tree, 1020 00:47:50,000 --> 00:47:52,000 and then keep going 1021 00:47:52,000 --> 00:47:54,000 and wait for it to get out of whack, and fix it again; 1022 00:47:54,000 --> 00:47:56,000 kind 1023 00:47:56,000 --> 00:47:58,000 of wait and see. 1024 00:47:58,000 --> 00:48:01,000 The other strategy, and this happens to be a more common one in 1025 00:48:01,000 --> 00:48:06,000 practice, is you just constantly do a little bit of fixing up all the time. 1026 00:48:06,000 --> 00:48:09,000 Whenever you notice that the tree is looking just a little bit lopsided, you let 1027 00:48:09,000 --> 00:48:12,000 it get a little bit; usually you'll let it be out of balance by one. 1028 00:48:12,000 --> 00:48:15,000 So you might have height four on that side and height three on that side. 1029 00:48:15,000 --> 00:48:17,000 But as soon as it starts to get to be three and five, 1030 00:48:17,000 --> 00:48:20,000 you do a little shuffle to bring them back to both four. And it 1031 00:48:20,000 --> 00:48:21,000 involves just a little bit of work 1032 00:48:21,000 --> 00:48:23,000 all the time 1033 00:48:23,000 --> 00:48:25,000 to where you just don't let 1034 00:48:25,000 --> 00:48:28,000 the tree turn into a real mess. 1035 00:48:28,000 --> 00:48:31,000 But there's a little bit of extra fixing up being done all the time. 1036 00:48:31,000 --> 00:48:35,000 And so the particular kind of tree that's discussed in the text, which you can read about, 1037 00:48:35,000 --> 00:48:37,000 but we won't cover 1038 00:48:37,000 --> 00:48:40,000 as part of our core material, is called the ADL tree. It's interesting to 1039 00:48:40,000 --> 00:48:43,000 read about it to see what does it take, what kind of 1040 00:48:43,000 --> 00:48:47,000 process is required to monitor this and do something about it. And 1041 00:48:47,000 --> 00:48:50,000 so if we have that in place, 1042 00:48:50,000 --> 00:48:51,000 1043 00:48:51,000 --> 00:48:53,000 we can get it to where 1044 00:48:53,000 --> 00:48:57,000 the binary search tree part of this is logarithmic on both 1045 00:48:57,000 --> 00:48:59,000 inserting new values and searching for values, 1046 00:48:59,000 --> 00:49:03,000 which is that Holy Grail of efficiency boundary. You can say, yeah logarithmic is 1047 00:49:03,000 --> 00:49:04,000 something we can tolerate 1048 00:49:04,000 --> 00:49:08,000 by virtue of adding 8 bytes of pointer, potentially some balance factor, 1049 00:49:08,000 --> 00:49:12,000 and some fanciness to guarantee that performance all over, 1050 00:49:12,000 --> 00:49:15,000 but gets us somewhere that will scale very, very well for thousands 1051 00:49:15,000 --> 00:49:18,000 and millions of entries, right, logarithmic is still a very, very fast 1052 00:49:18,000 --> 00:49:20,000 function. So it 1053 00:49:20,000 --> 00:49:24,000 creates a map that could really actually go the distance in scale. So, that's 1054 00:49:24,000 --> 00:49:28,000 your talk about binary search trees. We'll come back and talk about 1055 00:49:28,000 --> 00:49:38,000 hashing in graphs and stuff. I will see you on Wednesday.