1 00:00:00,000 --> 00:00:08,000 2 00:00:08,000 --> 00:00:11,000 3 00:00:11,000 --> 00:00:22,000 This presentation is delivered by the Stanford Center for Professional Development. 4 00:00:22,000 --> 00:00:25,000 Okay, so thank you for coming out in the rain. 5 00:00:25,000 --> 00:00:26,000 I 6 00:00:26,000 --> 00:00:27,000 appreciate you 7 00:00:27,000 --> 00:00:29,000 trudging over here and 8 00:00:29,000 --> 00:00:30,000 risking 9 00:00:30,000 --> 00:00:31,000 your health and life. 10 00:00:31,000 --> 00:00:37,000 So I'm gonna do a little diversion here to explain this idea of functions as data before I come back 11 00:00:37,000 --> 00:00:40,000 to fixing the problem we had kind of hit upon at the very end of the last lecture 12 00:00:40,000 --> 00:00:42,000 about set. 13 00:00:42,000 --> 00:00:44,000 So what I'm just gonna show you is 14 00:00:44,000 --> 00:00:45,000 a little bit about 15 00:00:45,000 --> 00:00:46,000 16 00:00:46,000 --> 00:00:49,000 the design of something in C++ 17 00:00:49,000 --> 00:00:52,000 that is gonna helpful in solving the problem we'd run into. 18 00:00:52,000 --> 00:00:55,000 What I'm gonna show you here is two pieces of code that plot 19 00:00:55,000 --> 00:00:57,000 a 20 00:00:57,000 --> 00:01:01,000 single value function across the number line. So in this case, the function is that it's the 21 00:01:01,000 --> 00:01:05,000 top one is applying the sine function, the sine wave as it varies across, and then the 22 00:01:05,000 --> 00:01:06,000 square root function 23 00:01:06,000 --> 00:01:09,000 which goes up and has a sloping curve to it, 24 00:01:09,000 --> 00:01:13,000 and that both of these have exactly the same behaviors. They're designed to kind of go into the graphics 25 00:01:13,000 --> 00:01:17,000 window, and they're just using a kind of a very simple kind of hand moving strategy. It 26 00:01:17,000 --> 00:01:21,000 starts on a particular location based on what the value of sine is here, 27 00:01:21,000 --> 00:01:23,000 and then over a particular interval 28 00:01:23,000 --> 00:01:24,000 at 0.1 29 00:01:24,000 --> 00:01:28,000 of one tenth of an inch, it plots the next point and connects a line between 30 00:01:28,000 --> 00:01:31,000 them. So it does a simple kind of line approximation of what 31 00:01:31,000 --> 00:01:34,000 that function looks like over the interval you asked it to plot. 32 00:01:34,000 --> 00:01:37,000 The same code is being used here for plotting the square root. 33 00:01:37,000 --> 00:01:41,000 And the thing to note and I tried to highlight by putting it in blue here was that every other 34 00:01:41,000 --> 00:01:45,000 part of the code is exactly the same except for those two calls where I need to know what's 35 00:01:45,000 --> 00:01:48,000 the value of square root starting at this particular 36 00:01:48,000 --> 00:01:51,000 X. So starting at Value 1, 37 00:01:51,000 --> 00:01:53,000 what is sine of one? What is square root of one? As I 38 00:01:53,000 --> 00:01:57,000 work my way across the interval, what's a square root of 1.1, 1.2, 1.3 and 39 00:01:57,000 --> 00:01:59,000 sine of those same values? 40 00:01:59,000 --> 00:02:03,000 And so everything about the code is functionally identical. It's kind of frustrating 41 00:02:03,000 --> 00:02:07,000 to look at it, though, and realize if I wanted to plot a bunch of other things -I also wanna be able to plot 42 00:02:07,000 --> 00:02:09,000 the cosine function, or 43 00:02:09,000 --> 00:02:12,000 some other function of my own creation 44 00:02:12,000 --> 00:02:13,000 -across this interval 45 00:02:13,000 --> 00:02:17,000 that I keep having to copy and paste this code and duplicate it. 46 00:02:17,000 --> 00:02:21,000 And so one of the things that hopefully your 106A and prior experiences really 47 00:02:21,000 --> 00:02:24,000 heightened your attention to is 48 00:02:24,000 --> 00:02:26,000 if I have the same piece of code 49 00:02:26,000 --> 00:02:30,000 in multiple places, I ought to be able to unify it. I ought to be able to make there be a plot 50 00:02:30,000 --> 00:02:31,000 function 51 00:02:31,000 --> 00:02:36,000 that can handle both sine and square root without actually having to distinguish it by copy and pasting, 52 00:02:36,000 --> 00:02:38,000 so I wanna unify these two. 53 00:02:38,000 --> 00:02:43,000 And so there is -the mechanism that we're gonna use kinda follows naturally if you don't let yourself get 54 00:02:43,000 --> 00:02:45,000 too tripped up by what it means. 55 00:02:45,000 --> 00:02:48,000 Just imagine that the parameter for example going into the function right now are the start and the stop, 56 00:02:48,000 --> 00:02:49,000 the interval 57 00:02:49,000 --> 00:02:52,000 from X is one to five. 58 00:02:52,000 --> 00:02:55,000 What we'd like to do is further parameterize the function. 59 00:02:55,000 --> 00:02:58,000 We'd like to add a third argument to it, which is to say 60 00:02:58,000 --> 00:03:03,000 and when you're ready to know what function you're plotting, here's the one to use. 61 00:03:03,000 --> 00:03:04,000 I'd like you to plot 62 00:03:04,000 --> 00:03:08,000 sine over the interval start to stop. I'd like you to plot square root over that. So we added the third argument, 63 00:03:08,000 --> 00:03:11,000 which was the function we wanted to invoke there. 64 00:03:11,000 --> 00:03:16,000 Then we would be able to unify this down to where we had one generic plot function. 65 00:03:16,000 --> 00:03:19,000 The good news is that this does actually 66 00:03:19,000 --> 00:03:20,000 have 67 00:03:20,000 --> 00:03:23,000 support features in the C++ language that let us do this. 68 00:03:23,000 --> 00:03:26,000 The syntax for it is a little bit unusual, and 69 00:03:26,000 --> 00:03:29,000 if you think about it too much, it can actually make your head hurt a little bit, think about what 70 00:03:29,000 --> 00:03:33,000 it's doing. But you can actually use a function, a function's name 71 00:03:33,000 --> 00:03:35,000 and the code that is associated with that name, 72 00:03:35,000 --> 00:03:37,000 as a piece of data 73 00:03:37,000 --> 00:03:41,000 that not just -you think of the code as we're calling this function, and 74 00:03:41,000 --> 00:03:44,000 we're moving these things around, and executing things. 75 00:03:44,000 --> 00:03:47,000 The things that we tend to be executing and operating on you think of as being integers, and strings, 76 00:03:47,000 --> 00:03:49,000 and characters, and files. 77 00:03:49,000 --> 00:03:52,000 But you can also extend your notion of what's data 78 00:03:52,000 --> 00:03:54,000 to include the code you wrote 79 00:03:54,000 --> 00:03:56,000 as part of the possibilities. 80 00:03:56,000 --> 00:03:59,000 So in this case, I've added that third argument that I wanted to plot, 81 00:03:59,000 --> 00:04:02,000 and this 82 00:04:02,000 --> 00:04:06,000 syntax here that's a little bit unusual to you, and I'll kind of identify it, is that the 83 00:04:06,000 --> 00:04:09,000 name of the parameter is actually FN. 84 00:04:09,000 --> 00:04:12,000 Its name is enclosed in parentheses there. 85 00:04:12,000 --> 00:04:16,000 And then to the right would be a list of the arguments 86 00:04:16,000 --> 00:04:17,000 or the prototype 87 00:04:17,000 --> 00:04:21,000 information about this function, and on the left is its return value. 88 00:04:21,000 --> 00:04:24,000 So what this says is you have two doubles, the start and stop, 89 00:04:24,000 --> 00:04:26,000 and the third thing in there isn't a double at all. It is a 90 00:04:26,000 --> 00:04:27,000 function 91 00:04:27,000 --> 00:04:32,000 of one double that returns a double, so a single value function that operates on doubles 92 00:04:32,000 --> 00:04:33,000 here. 93 00:04:33,000 --> 00:04:36,000 That is the syntax in C++ for specifying what you want coming in here 94 00:04:36,000 --> 00:04:37,000 is 95 00:04:37,000 --> 00:04:41,000 not a double, not a ray of doubles, anything funky like that. It is a function of a 96 00:04:41,000 --> 00:04:43,000 double that returns a double. 97 00:04:43,000 --> 00:04:45,000 And then in the body of this code, 98 00:04:45,000 --> 00:04:47,000 when I say FN here 99 00:04:47,000 --> 00:04:50,000 where I would've said sine, or square root, or identified 100 00:04:50,000 --> 00:04:51,000 a particular function, 101 00:04:51,000 --> 00:04:53,000 it's using the parameter 102 00:04:53,000 --> 00:04:57,000 that was passed in by the client, so it says call the client's function 103 00:04:57,000 --> 00:04:58,000 passing these values 104 00:04:58,000 --> 00:05:01,000 to plot over the range 105 00:05:01,000 --> 00:05:03,000 using their function 106 00:05:03,000 --> 00:05:07,000 in. So the idea is that valid calls to plot now become things like plot -and then give it an 107 00:05:07,000 --> 00:05:09,000 interval, zero to two, 108 00:05:09,000 --> 00:05:11,000 and you give the name of a function: 109 00:05:11,000 --> 00:05:15,000 the sine function, which comes out of the math library, the square root function, 110 00:05:15,000 --> 00:05:16,000 also in the math library. 111 00:05:16,000 --> 00:05:19,000 It could be that the function is something you wrote yourself, 112 00:05:19,000 --> 00:05:21,000 the my function. 113 00:05:21,000 --> 00:05:25,000 In order for this to be valid though, you can't just put any old function name there. It is 114 00:05:25,000 --> 00:05:27,000 actually being quite specific about it 115 00:05:27,000 --> 00:05:30,000 that plot was to find [inaudible]. It took a double, returned a double. That's the kind of function 116 00:05:30,000 --> 00:05:33,000 that you can give it, so any function that has that 117 00:05:33,000 --> 00:05:36,000 prototype, so it matches that 118 00:05:36,000 --> 00:05:40,000 format is an acceptable one to pass in. If you try to pass something that actually 119 00:05:40,000 --> 00:05:42,000 just does some other kind of function, 120 00:05:42,000 --> 00:05:45,000 it doesn't have the same prototype, so the get line that takes no arguments and returns a string just 121 00:05:45,000 --> 00:05:47,000 doesn't match on any front. 122 00:05:47,000 --> 00:05:51,000 And if I try to say here, plot the get line function of the integral two to five, 123 00:05:51,000 --> 00:05:53,000 it will quite rightfully complain to me 124 00:05:53,000 --> 00:05:54,000 125 00:05:54,000 --> 00:05:56,000 that that just doesn't make sense. 126 00:05:56,000 --> 00:05:57,000 That's because it doesn't match. 127 00:05:57,000 --> 00:06:02,000 So a little bit of syntax here, but actually kind of a very powerful thing, and it allows us to write to 128 00:06:02,000 --> 00:06:05,000 an addition to kind of parameterizing on these things you think of as traditional 129 00:06:05,000 --> 00:06:07,000 data, integers, and strings, and whatnot. It's 130 00:06:07,000 --> 00:06:12,000 also say as part of your operations, you may need to make a call out to some other function. 131 00:06:12,000 --> 00:06:15,000 Let's leave it open what that function is and allow the client to specify what function 132 00:06:15,000 --> 00:06:20,000 to call at that time. So in this case for a plot, what function that you're trying to plot, let the client 133 00:06:20,000 --> 00:06:21,000 tell you, 134 00:06:21,000 --> 00:06:29,000 and then based on what they ask you to plot you can plot different things. All right. Way in the back. Is 135 00:06:29,000 --> 00:06:34,000 there a similar setup for multivariable functions [inaudible]? Certainly. So all I would 136 00:06:34,000 --> 00:06:37,000 need to do if this was a function that took a couple arguments, I would say double comma double comma int. If it 137 00:06:37,000 --> 00:06:42,000 returned void, [inaudible] returned. Sometimes it looks a little bit like the prototype kinda taken out of context 138 00:06:42,000 --> 00:06:43,000 and stuffed in there, 139 00:06:43,000 --> 00:06:46,000 and then those parens around the function are a very important part of that 140 00:06:46,000 --> 00:06:50,000 which is telling you yeah, this is a function of these 141 00:06:50,000 --> 00:06:52,000 with this prototype information. 142 00:06:52,000 --> 00:06:54,000 Behind you. No? You're good now. Somebody else? Over here? Is that FN a fixed 143 00:06:54,000 --> 00:06:56,000 name 144 00:06:56,000 --> 00:06:57,000 145 00:06:57,000 --> 00:07:01,000 [inaudible]? No. It's just like any parameter name. You get to pick it. So I could've called it plot function, my function, 146 00:07:01,000 --> 00:07:08,000 your function, whatever I wanted. Here 147 00:07:08,000 --> 00:07:12,000 in the front. [Inaudible] Java? So Java doesn't really have a similar mechanism that looks like this. C does, so C++ inherits it 148 00:07:12,000 --> 00:07:14,000 from 149 00:07:14,000 --> 00:07:18,000 C. There are other ways you try to accomplish this in Java. It tries to support the same functionality in the end, but it 150 00:07:18,000 --> 00:07:19,000 151 00:07:19,000 --> 00:07:22,000 uses a pretty different approach than a functions as 152 00:07:22,000 --> 00:07:27,000 data approach. [Inaudible]? 153 00:07:27,000 --> 00:07:28,000 Can you pass like a 154 00:07:28,000 --> 00:07:31,000 method, like an operator? So typically not. 155 00:07:31,000 --> 00:07:35,000 This syntax that's being used here is for a free function, a function 156 00:07:35,000 --> 00:07:38,000 that's kind of out in the global namespace, that level. 157 00:07:38,000 --> 00:07:42,000 There is a different syntax for passing a method, which is a little bit more messy, 158 00:07:42,000 --> 00:07:46,000 and we won't tend to need it, so I won't go there with you, but it does exist. It just as it stands 159 00:07:46,000 --> 00:07:50,000 does not want a method. It wants a function. So 160 00:07:50,000 --> 00:07:54,000 a method meaning member function of a class. 161 00:07:54,000 --> 00:07:56,000 Okay, so let me -that was kind of just to 162 00:07:56,000 --> 00:08:00,000 set the groundwork for the problem we were trying to solve in set, 163 00:08:00,000 --> 00:08:05,000 which was set is holding this collection of elements that the client has stuffed in there. It's 164 00:08:05,000 --> 00:08:05,000 a generic 165 00:08:05,000 --> 00:08:07,000 templative class, so it doesn't 166 00:08:07,000 --> 00:08:11,000 have any preconceived notion about what's being stored. Are the strings? Are they student structures? 167 00:08:11,000 --> 00:08:13,000 Are they integers? 168 00:08:13,000 --> 00:08:17,000 And that in order to perform its operations efficiently, it actually is using this 169 00:08:17,000 --> 00:08:22,000 notion of ordering, keeping them in an order so that it can iterate an order. It can quickly find on the basis 170 00:08:22,000 --> 00:08:27,000 of using order to quickly decide where something has to be and if it's present in the collection. 171 00:08:27,000 --> 00:08:29,000 So how does it know how to compare 172 00:08:29,000 --> 00:08:31,000 something that's of unknown type? 173 00:08:31,000 --> 00:08:36,000 Well, what it does is it has a default strategy. It makes an assumption that 174 00:08:36,000 --> 00:08:39,000 if I used equals equals and less than that would tell me kinda where to go. 175 00:08:39,000 --> 00:08:42,000 And so it's got this idea that it wants to know. We'll give it two things. 176 00:08:42,000 --> 00:08:47,000 Are they the same thing? In which case, it uses kinda the zero to show the ordering between them. If one 177 00:08:47,000 --> 00:08:51,000 precedes the other, it wants to have some negative number that says well this one precedes it, or some 178 00:08:51,000 --> 00:08:53,000 positive number 179 00:08:53,000 --> 00:08:54,000 if it follows it. 180 00:08:54,000 --> 00:08:57,000 So it applies this operation to the [inaudible] that 181 00:08:57,000 --> 00:09:02,000 are there, and for strings, and ints, and doubles, and characters, 182 00:09:02,000 --> 00:09:04,000 this works perfectly well. 183 00:09:04,000 --> 00:09:08,000 And so that's how without us going out of our way, we can have sets of things 184 00:09:08,000 --> 00:09:11,000 that respond to the built in relational operators 185 00:09:11,000 --> 00:09:14,000 without any special effort as a client. 186 00:09:14,000 --> 00:09:17,000 But what we can get into trouble with right is when 187 00:09:17,000 --> 00:09:20,000 equals equals less than don't make sense for type. 188 00:09:20,000 --> 00:09:24,000 So let me just go actually type some code, and I'll show you. I have it on the slide, but I'm gonna -wow, 189 00:09:24,000 --> 00:09:29,000 this chair suddenly got really short. [Inaudible] fix that. Okay. 190 00:09:29,000 --> 00:09:30,000 191 00:09:30,000 --> 00:09:33,000 We'll go over here because I think it's better just to see it really happening, 192 00:09:33,000 --> 00:09:36,000 so I'm gonna ignore this piece of code because it's not what I wanted. 193 00:09:36,000 --> 00:09:40,000 But if I make some student T structure, 194 00:09:40,000 --> 00:09:40,000 and it's got 195 00:09:40,000 --> 00:09:44,000 the first and last name, and it's got the ID number, 196 00:09:44,000 --> 00:09:48,000 and maybe that's all I want for now 197 00:09:48,000 --> 00:09:48,000 -that if 198 00:09:48,000 --> 00:09:51,000 down in my main -can't 199 00:09:51,000 --> 00:09:54,000 find my main. There it is. I'm 200 00:09:54,000 --> 00:09:56,000 gonna need that piece of code later, so I'm gonna leave it there. 201 00:09:56,000 --> 00:09:58,000 If I make a set, 202 00:09:58,000 --> 00:10:05,000 and I say I'd like a set of students -my class 203 00:10:05,000 --> 00:10:09,000 -and I do this. And so I feel like I haven't gotten [inaudible]. I made this structure. I 204 00:10:09,000 --> 00:10:14,000 say I'd like to make a set of students, that each student is in the class exactly once and I don't need any duplicates, 205 00:10:14,000 --> 00:10:18,000 and I go through the process of trying to compile this, it's gonna give me some complaints. 206 00:10:18,000 --> 00:10:21,000 And its complaint, which is a little bit hard to see up here, is 207 00:10:21,000 --> 00:10:26,000 there's no match for operator equals equals in one, equals equals two, and operator less than in 208 00:10:26,000 --> 00:10:27,000 one equals 209 00:10:27,000 --> 00:10:30,000 two. So it's kind of showing me another piece of code that's kind of a 210 00:10:30,000 --> 00:10:35,000 hidden piece of code I haven't actually seen directly, which is this operator compare function. 211 00:10:35,000 --> 00:10:37,000 And that is the one that the set is using, as 212 00:10:37,000 --> 00:10:40,000 it has this idea of what's the way it should compare two things. It says well I have some of this 213 00:10:40,000 --> 00:10:46,000 operator compare that works generically on any type of things using equals equals and less than. And 214 00:10:46,000 --> 00:10:50,000 if I click up here on the instantiate from here, it's gonna help me to understand what caused 215 00:10:50,000 --> 00:10:52,000 this problem. 216 00:10:52,000 --> 00:10:53,000 The problem was caused by 217 00:10:53,000 --> 00:10:57,000 trying to create a set who was holding student T. 218 00:10:57,000 --> 00:11:00,000 And so this gives you a little bit of an insight on how the template 219 00:11:00,000 --> 00:11:02,000 operations are handled by the compiler, that 220 00:11:02,000 --> 00:11:06,000 I have built this whole set class, and it depends on there being equals equals and less than 221 00:11:06,000 --> 00:11:08,000 working for the type. 222 00:11:08,000 --> 00:11:13,000 The fact that it didn't work for student T wasn't a cause for alarm until 223 00:11:13,000 --> 00:11:15,000 I actually tried to instantiate 224 00:11:15,000 --> 00:11:18,000 it. So at the point where I said I'd like to make a set holding student 225 00:11:18,000 --> 00:11:22,000 T, the first point where the compiler actually goes through the process of generating a whole set 226 00:11:22,000 --> 00:11:25,000 class, the set angle brackets student T, 227 00:11:25,000 --> 00:11:29,000 filling in all the details, kinda working it all out, making the add, and contains, and whatever 228 00:11:29,000 --> 00:11:32,000 operations, and then in the process of those, those 229 00:11:32,000 --> 00:11:36,000 things are making calls that try to take student T objects and compare them using less than and 230 00:11:36,000 --> 00:11:37,000 equals equals. 231 00:11:37,000 --> 00:11:39,000 And that causes that code to fail. 232 00:11:39,000 --> 00:11:42,000 So the code that's really failing is kind of somewhere in the class library, 233 00:11:42,000 --> 00:11:44,000 but it's failing because 234 00:11:44,000 --> 00:11:49,000 the things that we're passing through and instantiating for don't work with that 235 00:11:49,000 --> 00:11:50,000 setup. 236 00:11:50,000 --> 00:11:52,000 So it's something we've gotta 237 00:11:52,000 --> 00:11:54,000 fix, we've gotta do something about. 238 00:11:54,000 --> 00:11:59,000 Let me go back here and say what am I gonna do. Well, so 239 00:11:59,000 --> 00:12:01,000 there's my error message. Same one, 240 00:12:01,000 --> 00:12:04,000 right? Saying yeah, you can't do that 241 00:12:04,000 --> 00:12:05,000 with a [inaudible]. Well, what 242 00:12:05,000 --> 00:12:06,000 we do 243 00:12:06,000 --> 00:12:09,000 is we use this notion of functions as data 244 00:12:09,000 --> 00:12:12,000 to work out a solution for this problem 245 00:12:12,000 --> 00:12:15,000 that if you think about kinda what's going on, the set actually knows everything about how 246 00:12:15,000 --> 00:12:19,000 to store things. It's a very fancy efficient structure that says given your things, keeps them 247 00:12:19,000 --> 00:12:20,000 in order, and 248 00:12:20,000 --> 00:12:22,000 it manages to update, and 249 00:12:22,000 --> 00:12:24,000 insert, and search that thing very efficiently. 250 00:12:24,000 --> 00:12:28,000 But it doesn't know given any two random things how to compare them other than this assumption 251 00:12:28,000 --> 00:12:32,000 it was making about less than and equals equals being a way to tell. 252 00:12:32,000 --> 00:12:37,000 If it wants to have sort of a more sophisticated handling of that, what it needs to do is cooperate 253 00:12:37,000 --> 00:12:38,000 with the client -that 254 00:12:38,000 --> 00:12:43,000 the implementer of the set can't do it all. So there's these two programmers that 255 00:12:43,000 --> 00:12:44,000 need to work in harmony. 256 00:12:44,000 --> 00:12:50,000 So what the set does is it allows for the client to specify by providing a function. 257 00:12:50,000 --> 00:12:54,000 It says well when I need to compare two things, how about you give me the name of a function 258 00:12:54,000 --> 00:12:58,000 that when given two elements will return to me their ordering, this integer zero, 259 00:12:58,000 --> 00:13:00,000 negative, positive that tells me how to put them in place. 260 00:13:00,000 --> 00:13:05,000 And so the set kind of writes all of its operations in terms of well there's some function I can call, this function that 261 00:13:05,000 --> 00:13:07,000 will compare two things. 262 00:13:07,000 --> 00:13:11,000 If they don't specify one, I'll use this default one that maps them to the relationals, but if they do give 263 00:13:11,000 --> 00:13:15,000 me one, I'll just ask them to do the comparison. And so then as its doing its searching and inserting 264 00:13:15,000 --> 00:13:16,000 and whatnot, 265 00:13:16,000 --> 00:13:19,000 it's calling back. We call that calling back to the client. 266 00:13:19,000 --> 00:13:22,000 So the client writes a function. If I wanna put 267 00:13:22,000 --> 00:13:25,000 student Ts into a set, 268 00:13:25,000 --> 00:13:28,000 then I need to say when you compare two student Ts, what do you look at to know if they're 269 00:13:28,000 --> 00:13:29,000 the same or how to order them. 270 00:13:29,000 --> 00:13:31,000 So maybe I'm gonna order them by ID number. 271 00:13:31,000 --> 00:13:33,000 Maybe I'm gonna use their first and last name. 272 00:13:33,000 --> 00:13:37,000 Whatever it means for two things to be equal and have some sense of order, 273 00:13:37,000 --> 00:13:38,000 I supply 274 00:13:38,000 --> 00:13:40,000 -I write the function, 275 00:13:40,000 --> 00:13:43,000 and then I pass it to the set constructor. 276 00:13:43,000 --> 00:13:44,000 I say here's the function to use. 277 00:13:44,000 --> 00:13:49,000 The set will hold on to that function. So I say here's the compare student structure function. 278 00:13:49,000 --> 00:13:52,000 It holds onto that name, and when needed it calls back. 279 00:13:52,000 --> 00:13:53,000 It says I'm 280 00:13:53,000 --> 00:13:57,000 about to go look for a student structure. Is this the one? Well, I don't know if two student structures are 281 00:13:57,000 --> 00:14:01,000 the same. I'll ask the client. Here's two student structures. Are they the same? 282 00:14:01,000 --> 00:14:02,000 And then 283 00:14:02,000 --> 00:14:06,000 as needed, it'll keep looking or 284 00:14:06,000 --> 00:14:09,000 insert and add and do whatever I need to do. 285 00:14:09,000 --> 00:14:13,000 So let's go back over here. 286 00:14:13,000 --> 00:14:19,000 I'll write a little function. So the 287 00:14:19,000 --> 00:14:21,000 prototype for it is it takes two elem Ts 288 00:14:21,000 --> 00:14:24,000 and it returns an int. 289 00:14:24,000 --> 00:14:31,000 That int is expected to have a value zero if they are the same, 290 00:14:31,000 --> 00:14:33,000 and a value that is negative if 291 00:14:33,000 --> 00:14:37,000 the first argument precedes the second. So if A is less than B, returns some negative 292 00:14:37,000 --> 00:14:41,000 thing. You can return negative one, or negative 100, or negative one million, 293 00:14:41,000 --> 00:14:42,000 but 294 00:14:42,000 --> 00:14:45,000 you need to return some negative value. And then if A [cuts 295 00:14:45,000 --> 00:14:47,000 out] some ordering, so they're not equal 296 00:14:47,000 --> 00:14:51,000 [cuts out] later, it will return some positive value: 1, 10, 6 million. 297 00:14:51,000 --> 00:14:55,000 So if I do [cuts out] say I use ID num as my comparison. 298 00:14:55,000 --> 00:14:56,000 Based on [cuts 299 00:14:56,000 --> 00:14:58,000 out] are the same, if they are 300 00:14:58,000 --> 00:15:01,000 I can return zero. And if 301 00:15:01,000 --> 00:15:03,000 ID num of A 302 00:15:03,000 --> 00:15:07,000 is less than the ID num of B, I can 303 00:15:07,000 --> 00:15:08,000 return negative one, and then in 304 00:15:08,000 --> 00:15:11,000 the other case, I'll return one. 305 00:15:11,000 --> 00:15:14,000 So it will compare them on the basis [cuts out] 306 00:15:14,000 --> 00:15:16,000 figuring that the name field 307 00:15:16,000 --> 00:15:17,000 at that point is 308 00:15:17,000 --> 00:15:19,000 nothing new. 309 00:15:19,000 --> 00:15:24,000 And then the way I use that is over here when I'm constructing it is there is 310 00:15:24,000 --> 00:15:27,000 [cuts out] to the constructor, and so that's how I would pass [cuts out] add 311 00:15:27,000 --> 00:15:29,000 parens to the [cuts 312 00:15:29,000 --> 00:15:35,000 out] as I'm declaring it, and then I pass the name. Do I call it compare student or compare students? I can't remember. Compare student. Okay. 313 00:15:35,000 --> 00:15:38,000 And 314 00:15:38,000 --> 00:15:42,000 this then -so now there's nothing going on in the code -causes it to compile, 315 00:15:42,000 --> 00:15:45,000 and that if you were to put 316 00:15:45,000 --> 00:15:50,000 let's say a see out statement in your comparison function just for fun, you would find out as 317 00:15:50,000 --> 00:15:53,000 you were doing adds, and compares, and removes, and whatnot on this set 318 00:15:53,000 --> 00:15:58,000 that you would see that your call kept being made. It kept calling back to you as the 319 00:15:58,000 --> 00:16:01,000 client saying I need to compare these things. I need to compare these things to decide where to put something, 320 00:16:01,000 --> 00:16:03,000 whether it had something, 321 00:16:03,000 --> 00:16:04,000 and whatnot. 322 00:16:04,000 --> 00:16:08,000 And then based on your ordering, that would control for example how the iterator worked, that 323 00:16:08,000 --> 00:16:12,000 the smallest one according to your function would be the one first returned by your 324 00:16:12,000 --> 00:16:14,000 iterator, and it would move through larger 325 00:16:14,000 --> 00:16:17,000 or sort of later in the ordering ones until the end. So 326 00:16:17,000 --> 00:16:20,000 it's a very powerful mechanism 327 00:16:20,000 --> 00:16:24,000 that's at work here because it means that for 328 00:16:24,000 --> 00:16:28,000 anything you wanna put in a set, as long as you're willing to say how it is you compare 329 00:16:28,000 --> 00:16:28,000 it, 330 00:16:28,000 --> 00:16:33,000 then the set will take over and do the very efficient storing, and searching, and organizing 331 00:16:33,000 --> 00:16:34,000 of that. 332 00:16:34,000 --> 00:16:38,000 But you, the only piece you need to supply is this one little thing it can't figure 333 00:16:38,000 --> 00:16:42,000 out for itself, which is given your type of thing, how do you compare 334 00:16:42,000 --> 00:16:46,000 it. For the built in types string, and int, and double, and car, 335 00:16:46,000 --> 00:16:51,000 it does have a default comparison function, that one that was called operator compare. Let 336 00:16:51,000 --> 00:16:55,000 me go open it for you 337 00:16:55,000 --> 00:16:56,000 [inaudible] the 106. 338 00:16:56,000 --> 00:16:58,000 So there is a 339 00:16:58,000 --> 00:17:00,000 compare function dot H, 340 00:17:00,000 --> 00:17:02,000 and this is actually what 341 00:17:02,000 --> 00:17:05,000 the default version of it looks. It's actually written as a template itself 342 00:17:05,000 --> 00:17:07,000 that given two things it 343 00:17:07,000 --> 00:17:08,000 just 344 00:17:08,000 --> 00:17:11,000 turns around and asks the built in operators to help us out with that. 345 00:17:11,000 --> 00:17:15,000 And that is the name that's being used 346 00:17:15,000 --> 00:17:17,000 if I open the set 347 00:17:17,000 --> 00:17:20,000 and you look at its constructor call. I had said that I would come back and tell you about 348 00:17:20,000 --> 00:17:22,000 what this was 349 00:17:22,000 --> 00:17:25,000 that the argument going into the set constructor 350 00:17:25,000 --> 00:17:31,000 is one parameter whose name is CMP function that takes two elem type things -so here's the 351 00:17:31,000 --> 00:17:33,000 one in the example of the two argument prototype 352 00:17:33,000 --> 00:17:38,000 -returns an int, and then it uses a default assignment for that of operator compare, the one we just 353 00:17:38,000 --> 00:17:39,000 looked at, 354 00:17:39,000 --> 00:17:41,000 so that if you don't specify it, it goes 355 00:17:41,000 --> 00:17:46,000 through and generates the standard comparison function for the type, 356 00:17:46,000 --> 00:17:47,000 which 357 00:17:47,000 --> 00:17:49,000 for built-ins will work fine, but for 358 00:17:49,000 --> 00:17:53,000 user defined things is gonna create compiler errors. 359 00:17:53,000 --> 00:17:56,000 So you can also choose if you don't like the default ordering works. 360 00:17:56,000 --> 00:18:01,000 So for example, if you wanted to build a set of strings that was case insensitive 361 00:18:01,000 --> 00:18:04,000 -so the default string handling would be to use equals equals and less than, which actually 362 00:18:04,000 --> 00:18:08,000 does care about case. It doesn't think that capital [inaudible] is the same as lower case. 363 00:18:08,000 --> 00:18:10,000 If you wanted it to consider them the same, 364 00:18:10,000 --> 00:18:14,000 you could supply your own. A compare case insensitively function took two strings, 365 00:18:14,000 --> 00:18:16,000 converted their case, and then compared them. 366 00:18:16,000 --> 00:18:19,000 And then when you establish a set of strings, instead of letting the default argument take over, 367 00:18:19,000 --> 00:18:24,000 go ahead and use your case insensitive compare, and then now you have a set of strings 368 00:18:24,000 --> 00:18:26,000 that operates case insensitively. 369 00:18:26,000 --> 00:18:28,000 So you can change 370 00:18:28,000 --> 00:18:31,000 the ordering, adapt the ordering, whatever you like for the primitives, as well as supply the necessary 371 00:18:31,000 --> 00:18:32,000 one for the 372 00:18:32,000 --> 00:18:34,000 things the built-ins 373 00:18:34,000 --> 00:18:35,000 don't have 374 00:18:35,000 --> 00:18:40,000 properties for. So then that's that piece of code right 375 00:18:40,000 --> 00:18:41,000 there. All right. So 376 00:18:41,000 --> 00:18:44,000 does that make sense? 377 00:18:44,000 --> 00:18:45,000 Well, now you know 378 00:18:45,000 --> 00:18:49,000 kind of the whole range of things that are available in the class library. All 379 00:18:49,000 --> 00:18:53,000 right, so we saw the four sequential containers, the vector, stack, queue, 380 00:18:53,000 --> 00:18:55,000 and the grid that you kinda indexed ordering, 381 00:18:55,000 --> 00:18:58,000 and kinda allowed you to throw things in and get them back out. 382 00:18:58,000 --> 00:19:02,000 We went through the map, which is the sort of fancy heavy lifter that does that key value 383 00:19:02,000 --> 00:19:04,000 lookup, and then we've seen the set, which does kind of 384 00:19:04,000 --> 00:19:06,000 aggregate collection management, 385 00:19:06,000 --> 00:19:10,000 and very efficient operations for kind of searching, retrieving, ordering, 386 00:19:10,000 --> 00:19:12,000 387 00:19:12,000 --> 00:19:13,000 joining with other kind of sets, and stuff 388 00:19:13,000 --> 00:19:16,000 that also has a lot of high utility. I wanna do 389 00:19:16,000 --> 00:19:19,000 one quick little program with you before I start recursion 390 00:19:19,000 --> 00:19:21,000 just because I think it's kinda cool 391 00:19:21,000 --> 00:19:23,000 is to talk a little about this idea of like once 392 00:19:23,000 --> 00:19:26,000 you have these ADTs, you can solve a lot of cool problems, and that's certainly what this 393 00:19:26,000 --> 00:19:29,000 week's assignment is about. It's like well here are these tasks that 394 00:19:29,000 --> 00:19:31,000 if you didn't have 395 00:19:31,000 --> 00:19:34,000 -So ADTs, just a reminder -I say this word as though everybody knows exactly what it means 396 00:19:34,000 --> 00:19:37,000 -is just the idea of an abstract data type. 397 00:19:37,000 --> 00:19:39,000 So an abstract data type is a data type 398 00:19:39,000 --> 00:19:42,000 that you think of in terms of what it provides as an abstraction. 399 00:19:42,000 --> 00:19:44,000 So a queue is this FIFO line, 400 00:19:44,000 --> 00:19:47,000 and how it works internally, what it's implemented as, we're not worried 401 00:19:47,000 --> 00:19:51,000 about it at all. We're only worried about the abstraction of enqueue and dequeue, and it coming first 402 00:19:51,000 --> 00:19:53,000 in first out. So we talk 403 00:19:53,000 --> 00:19:56,000 about ADTs, we say once we have a queue, a stack, or a vector, 404 00:19:56,000 --> 00:19:58,000 we know what those things do, 405 00:19:58,000 --> 00:20:00,000 what a mathematical set is about. 406 00:20:00,000 --> 00:20:03,000 We build on top of that. We write code 407 00:20:03,000 --> 00:20:06,000 that leverages those ADTs to do cool things 408 00:20:06,000 --> 00:20:10,000 without having to also manage the low level details of where the memory came from for these things, 409 00:20:10,000 --> 00:20:13,000 how they grow, how they search, how 410 00:20:13,000 --> 00:20:15,000 they store and organize the data. 411 00:20:15,000 --> 00:20:17,000 You just get to do real cool things. 412 00:20:17,000 --> 00:20:21,000 So you probably got a little taste of that at the end of 106A when you get to use the array list and the 413 00:20:21,000 --> 00:20:24,000 hashmap to do things. This set kinda just expands out 414 00:20:24,000 --> 00:20:26,000 to fill out some other niches 415 00:20:26,000 --> 00:20:28,000 where you can do a lot of really cool things 416 00:20:28,000 --> 00:20:30,000 because you have these things around to build on. 417 00:20:30,000 --> 00:20:34,000 So one of the things that happens a lot is you tend to do layered things, and you'll see a little bit of this 418 00:20:34,000 --> 00:20:38,000 in the assignment you're doing this week where it's not just a set of something, it's a set of a map 419 00:20:38,000 --> 00:20:39,000 of something, or 420 00:20:39,000 --> 00:20:43,000 a vector of queues, a map of set. So I gave you a couple of examples here of 421 00:20:43,000 --> 00:20:44,000 the things 422 00:20:44,000 --> 00:20:48,000 that might be useful. Like if you think of what a smoothie is, it's a set of things 423 00:20:48,000 --> 00:20:52,000 mixed together, some yogurt, and some different fruits, some wheatgrass, whatever it is you have in 424 00:20:52,000 --> 00:20:53,000 it. 425 00:20:53,000 --> 00:20:57,000 And that the menu for a smoothie shop is really just a bunch of those sets, so each set 426 00:20:57,000 --> 00:21:00,000 of ingredients is a particular smoothie they have, and then 427 00:21:00,000 --> 00:21:04,000 the set of all those sets is the menu that they post up on the board you can come 428 00:21:04,000 --> 00:21:06,000 in and order. 429 00:21:06,000 --> 00:21:07,000 The 430 00:21:07,000 --> 00:21:09,000 compiler tends to use a map 431 00:21:09,000 --> 00:21:13,000 to keep track of all the variables that are in scope. As you declare variables, it 432 00:21:13,000 --> 00:21:17,000 adds them to the map so that when you later use that name, it knows where to find it. 433 00:21:17,000 --> 00:21:19,000 Well, it also has to manage though 434 00:21:19,000 --> 00:21:21,000 not just one map, 435 00:21:21,000 --> 00:21:25,000 but the idea is as you enter and exit scopes, there is this 436 00:21:25,000 --> 00:21:27,000 layering of open scopes. 437 00:21:27,000 --> 00:21:32,000 So you have some open scope here. You go into a for loop where you open another scope. You add some 438 00:21:32,000 --> 00:21:33,000 new variables 439 00:21:33,000 --> 00:21:37,000 that when you look it actually shadows then at the nearest definition, so if you had two variables 440 00:21:37,000 --> 00:21:41,000 of the same, it needs to look at the one that's closest. 441 00:21:41,000 --> 00:21:44,000 And then when you exit that scope, it needs those variables to go away and 442 00:21:44,000 --> 00:21:48,000 no longer be accessible. So one model for that could be very much a stack of 443 00:21:48,000 --> 00:21:51,000 maps where each of those maps represents the scope 444 00:21:51,000 --> 00:21:52,000 that's active, 445 00:21:52,000 --> 00:21:56,000 a set of variable names, and maybe their types and information is stored in that map. And then 446 00:21:56,000 --> 00:21:58,000 you stack them up. As you 447 00:21:58,000 --> 00:22:01,000 open a scope, you push on a new empty map. 448 00:22:01,000 --> 00:22:03,000 You put things into it, and then 449 00:22:03,000 --> 00:22:06,000 maybe you enter another scope. You push on another new empty map. You stick things into it, but as you 450 00:22:06,000 --> 00:22:11,000 exit and hit those closing braces, you pop those things from the stack to get to 451 00:22:11,000 --> 00:22:13,000 this previous 452 00:22:13,000 --> 00:22:16,000 environment you were in. So let me do a 453 00:22:16,000 --> 00:22:19,000 little program with you. I just have 454 00:22:19,000 --> 00:22:22,000 this idea of how this would work, and we'll see if 455 00:22:22,000 --> 00:22:26,000 I can code something up. So I have -let's go 456 00:22:26,000 --> 00:22:28,000 back over 457 00:22:28,000 --> 00:22:32,000 here. I'm going to -this is the piece of code that just reads 458 00:22:32,000 --> 00:22:34,000 words, so that's a 459 00:22:34,000 --> 00:22:36,000 fine piece of code to have here. 460 00:22:36,000 --> 00:22:38,000 I have a file here -let 461 00:22:38,000 --> 00:22:39,000 me open 462 00:22:39,000 --> 00:22:43,000 it up for you -that just contains the contents of the Official Scrabble Players' Dictionary, 463 00:22:43,000 --> 00:22:45,000 Edition 2. 464 00:22:45,000 --> 00:22:47,000 It's got a lot of words it. 465 00:22:47,000 --> 00:22:48,000 It's pretty long. 466 00:22:48,000 --> 00:22:49,000 It's still loading. 467 00:22:49,000 --> 00:22:53,000 Let's go back and do something else while it's loading. 468 00:22:53,000 --> 00:22:54,000 It happened to have 469 00:22:54,000 --> 00:22:55,000 470 00:22:55,000 --> 00:22:58,000 about 120,000 words I think is what it would be right now. 471 00:22:58,000 --> 00:22:59,000 So it's busy loading, 472 00:22:59,000 --> 00:23:01,000 and I have this question for you. 473 00:23:01,000 --> 00:23:04,000 There are certain words that are anagrams of each other. 474 00:23:04,000 --> 00:23:06,000 475 00:23:06,000 --> 00:23:08,000 The word 476 00:23:08,000 --> 00:23:10,000 cheap can be anagrammed into the word peach, 477 00:23:10,000 --> 00:23:11,000 things like that. 478 00:23:11,000 --> 00:23:12,000 479 00:23:12,000 --> 00:23:13,000 And so I 480 00:23:13,000 --> 00:23:15,000 am curious for the 481 00:23:15,000 --> 00:23:16,000 Official Scrabble Players' Dictionary 482 00:23:16,000 --> 00:23:20,000 what -so if you imagine that some words can be anagrammed a couple times, five 483 00:23:20,000 --> 00:23:24,000 or six different words just on how you can rearrange the letters. I'm curious to know what the largest anagram 484 00:23:24,000 --> 00:23:28,000 cluster is in the Official Scrabble Players' Dictionary. So I'd like to know across 485 00:23:28,000 --> 00:23:30,000 all 127,000 words 486 00:23:30,000 --> 00:23:34,000 that they form little clusters, and I'd like to find out what's the biggest of 487 00:23:34,000 --> 00:23:39,000 those clusters. 488 00:23:39,000 --> 00:23:41,000 Okay. That's my goal. So 489 00:23:41,000 --> 00:23:45,000 here's what I've got going. I've got something that's gonna read them one by one. 490 00:23:45,000 --> 00:23:48,000 So let's brainstorm for a second. 491 00:23:48,000 --> 00:23:49,000 492 00:23:49,000 --> 00:23:51,000 I want a way to take a particular word and kinda stick 493 00:23:51,000 --> 00:23:55,000 it with its other anagram cluster friends. 494 00:23:55,000 --> 00:23:57,000 What's a way I might do that? 495 00:23:57,000 --> 00:24:03,000 Help me design my data structure. 496 00:24:03,000 --> 00:24:06,000 Help me out. [Inaudible]. I've 497 00:24:06,000 --> 00:24:08,000 got the word peach. 498 00:24:08,000 --> 00:24:10,000 Where should I stick it so I can ? You could treat each string as like a set of 499 00:24:10,000 --> 00:24:13,000 ? So I have this 500 00:24:13,000 --> 00:24:17,000 string, 501 00:24:17,000 --> 00:24:21,000 which represents the letters. I've 502 00:24:21,000 --> 00:24:22,000 got the word peach. I wanna be able to stick peach with cheap, 503 00:24:22,000 --> 00:24:26,000 so where should I stick peach in such a way that I could find it. And you've got this idea that 504 00:24:26,000 --> 00:24:27,000 505 00:24:27,000 --> 00:24:29,000 the letters there are a set. 506 00:24:29,000 --> 00:24:32,000 They're not quite a set, though. Be careful because the word banana has a 507 00:24:32,000 --> 00:24:36,000 couple As and a couple Ns, and so it's not that I'd really want it to come down to be 508 00:24:36,000 --> 00:24:39,000 the set BAN. I wouldn't wanna coalesce the duplicates on that, so I really do wanna preserve all the letters that 509 00:24:39,000 --> 00:24:43,000 are in there, 510 00:24:43,000 --> 00:24:46,000 but your idea's getting us somewhere. It's like there is this idea of kind of like for any 511 00:24:46,000 --> 00:24:49,000 particular word there's the 512 00:24:49,000 --> 00:24:53,000 collection of letters that it is formed from. 513 00:24:53,000 --> 00:24:56,000 And somehow if I could use that as an identifier 514 00:24:56,000 --> 00:24:58,000 in a way that was reliable 515 00:24:58,000 --> 00:25:01,000 -anybody 516 00:25:01,000 --> 00:25:02,000 got any 517 00:25:02,000 --> 00:25:03,000 ideas 518 00:25:03,000 --> 00:25:08,000 about what to do with that? If you did that a vector, each letter and then the frequency of each letter in the word? So I could certainly do that. I could build kind of a vector that 519 00:25:08,000 --> 00:25:12,000 had kinda frequencies, that had this little struct that maybe it was number of times it occurs. 520 00:25:12,000 --> 00:25:15,000 Then I could try to build something on the basis of that vector that was like 521 00:25:15,000 --> 00:25:18,000 here is -do these two vectors 522 00:25:18,000 --> 00:25:22,000 match? Does banana match apple? And you'd say well no. It turns out they don't 523 00:25:22,000 --> 00:25:23,000 have the same letters. I'm trying to 524 00:25:23,000 --> 00:25:27,000 think of a really easy way to represent that. Your idea's good, 525 00:25:27,000 --> 00:25:31,000 but I'm thinking really lazy. So somebody help me who's lazy. Could you 526 00:25:31,000 --> 00:25:36,000 have a map where the key is all the letters in alphabetical order and the value is a 527 00:25:36,000 --> 00:25:41,000 vector of all the ? Yeah. That is a great idea. You're taking his idea, and you're making it easier. 528 00:25:41,000 --> 00:25:43,000 You're capitalizing on lazy, which is yeah 529 00:25:43,000 --> 00:25:45,000 -I wanna keep track of all the letters that are in the word, 530 00:25:45,000 --> 00:25:48,000 but I wanna do it in a way that makes it really easy for 531 00:25:48,000 --> 00:25:52,000 example to know whether two have the same -we'll call it the signature. The signature of 532 00:25:52,000 --> 00:25:54,000 the word is the letter frequency across it. 533 00:25:54,000 --> 00:25:57,000 If I could come up with a way to represent the signature that was really to compare two 534 00:25:57,000 --> 00:26:01,000 signatures quickly to see if they're the same, then I will have less work to do. And so your idea is a great 535 00:26:01,000 --> 00:26:04,000 one. We take the letters and we alphabetize them. 536 00:26:04,000 --> 00:26:08,000 So cheap and peach both turn into the same ACEHP. It's 537 00:26:08,000 --> 00:26:13,000 a nonsense thing, but it's a signature. It's unique for any anagram, and 538 00:26:13,000 --> 00:26:14,000 if I use a map 539 00:26:14,000 --> 00:26:15,000 where that's the key, 540 00:26:15,000 --> 00:26:17,000 then I can associate with every word 541 00:26:17,000 --> 00:26:20,000 that had that same signature. 542 00:26:20,000 --> 00:26:23,000 So 543 00:26:23,000 --> 00:26:27,000 let's start building that. So let me write 544 00:26:27,000 --> 00:26:29,000 -I wanna call this the signature. 545 00:26:29,000 --> 00:26:31,000 Given a string, 546 00:26:31,000 --> 00:26:33,000 it's going to 547 00:26:33,000 --> 00:26:34,000 alphabetize it. 548 00:26:34,000 --> 00:26:37,000 I'm going to write the 549 00:26:37,000 --> 00:26:40,000 dumbest version of this ever, 550 00:26:40,000 --> 00:26:47,000 but I'm just gonna use a simple sorting routine on this. So 551 00:26:47,000 --> 00:26:51,000 smallest is gonna small index -we'll call it min index. That's a better name for it. 552 00:26:51,000 --> 00:26:53,000 Min index equals I, 553 00:26:53,000 --> 00:26:57,000 and then I'm gonna look through the string, and I'm gonna find the smallest letter that's 554 00:26:57,000 --> 00:26:58,000 there and 555 00:26:58,000 --> 00:27:03,000 move it to the front. That's basically gonna be my strategy. I've 556 00:27:03,000 --> 00:27:05,000 got the wrong place to start, though. I'm gonna 557 00:27:05,000 --> 00:27:11,000 look from I to the one. So if 558 00:27:11,000 --> 00:27:13,000 S sub J 559 00:27:13,000 --> 00:27:15,000 is less than S sub min index, 560 00:27:15,000 --> 00:27:20,000 so it is a smaller letter, then the min index gets to be J. 561 00:27:20,000 --> 00:27:23,000 So far, what I've done is I've run this loop that kind 562 00:27:23,000 --> 00:27:27,000 of starts in this case on the very first iteration and says the first character in Slot 0, that 563 00:27:27,000 --> 00:27:31,000 must be the min. And then it looks from there to the end. Is there anything smaller? Whenever it 564 00:27:31,000 --> 00:27:34,000 find anything smaller, it updates the min index, so when it's done 565 00:27:34,000 --> 00:27:39,000 after that second loop has fully 566 00:27:39,000 --> 00:27:40,000 operated, 567 00:27:40,000 --> 00:27:44,000 then min index will point to the smallest alphabetically 568 00:27:44,000 --> 00:27:45,000 character in the 569 00:27:45,000 --> 00:27:49,000 string that we have. Then I'm gonna swap it to the front. 570 00:27:49,000 --> 00:27:52,000 So S sub I 571 00:27:52,000 --> 00:27:55,000 with S sub min index, and 572 00:27:55,000 --> 00:28:08,000 I'll write a swap because swap is pretty easy to write. 573 00:28:08,000 --> 00:28:10,000 See about how we do this, okay. 574 00:28:10,000 --> 00:28:12,000 So 575 00:28:12,000 --> 00:28:13,000 if I 576 00:28:13,000 --> 00:28:16,000 -I like how this works, and then let me stop and say 577 00:28:16,000 --> 00:28:17,000 right here, now 578 00:28:17,000 --> 00:28:19,000 is a good time to test this thing. 579 00:28:19,000 --> 00:28:23,000 I just wrote signature. It probably works, but it potentially 580 00:28:23,000 --> 00:28:24,000 could not. 581 00:28:24,000 --> 00:28:27,000 And I'm just gonna show you a little bit of how I write code. This is good to know. As I 582 00:28:27,000 --> 00:28:28,000 say, 583 00:28:28,000 --> 00:28:29,000 584 00:28:29,000 --> 00:28:32,000 I put some code in here that I plan on throwing away. 585 00:28:32,000 --> 00:28:35,000 Enter word 586 00:28:35,000 --> 00:28:42,000 -and then I throw it through my function. I think I called it 587 00:28:42,000 --> 00:28:44,000 signature, didn't I? 588 00:28:44,000 --> 00:28:47,000 Signature S 589 00:28:47,000 --> 00:28:51,000 -so the idea being if it doesn't work, I wanna find out sooner rather than 590 00:28:51,000 --> 00:28:52,000 later. It 591 00:28:52,000 --> 00:28:55,000 doesn't like my use of get line. Is that because it's not 592 00:28:55,000 --> 00:28:57,000 included? 593 00:28:57,000 --> 00:29:00,000 Yes, it's not. So let's go get the right header files. This is half the 594 00:29:00,000 --> 00:29:03,000 battle sometimes is figuring out what 595 00:29:03,000 --> 00:29:04,000 headers 596 00:29:04,000 --> 00:29:05,000 are needed to make 597 00:29:05,000 --> 00:29:10,000 your compiler happy. So now it's up 598 00:29:10,000 --> 00:29:14,000 here at enter word, and I say what does cheap come out as? 599 00:29:14,000 --> 00:29:16,000 It goes into the debugger. That's good. 600 00:29:16,000 --> 00:29:17,000 So it wasn't right. 601 00:29:17,000 --> 00:29:19,000 Now we're gonna get to find out 602 00:29:19,000 --> 00:29:21,000 what does it not like about that. 603 00:29:21,000 --> 00:29:27,000 It says 604 00:29:27,000 --> 00:29:29,000 -did we forget to return something? 605 00:29:29,000 --> 00:29:30,000 Let's go look at our code. 606 00:29:30,000 --> 00:29:34,000 So it's complaining about -oh yeah, I can see where we're gonna get in trouble here. It's 607 00:29:34,000 --> 00:29:38,000 complaining that the return -it was saying that I'm trying to print something and it looks like garbage, that 608 00:29:38,000 --> 00:29:40,000 what I'm trying to print didn't make sense at all. 609 00:29:40,000 --> 00:29:47,000 I could say well that's funny. Let's go look at signature. [Inaudible] pass the [inaudible]. 610 00:29:47,000 --> 00:29:51,000 That's probably a good idea. We ought to fix that while we're there. 611 00:29:51,000 --> 00:29:54,000 Let me leave that bug in for a minute because I'm gonna fix my first bug, and then I'll come back 612 00:29:54,000 --> 00:29:55,000 to that one. 613 00:29:55,000 --> 00:29:57,000 So it turns out what it's really complaining about though is it has to do with 614 00:29:57,000 --> 00:30:01,000 -I said well what does signature return. Somehow, what's being returned by signature is being interpreted 615 00:30:01,000 --> 00:30:04,000 as total crap when it got back to main, and there's a very good reason for that 616 00:30:04,000 --> 00:30:06,000 because I never returned 617 00:30:06,000 --> 00:30:09,000 anything. So maybe if I had been less cavalier about the fact that it was giving me a warning 618 00:30:09,000 --> 00:30:11,000 over here that said 619 00:30:11,000 --> 00:30:14,000 control reaches the end of non-void function, but I was being lazy and I didn't look 620 00:30:14,000 --> 00:30:17,000 -was that I 621 00:30:17,000 --> 00:30:18,000 didn't pay attention to the fact. 622 00:30:18,000 --> 00:30:20,000 So let's leave my other bug in that you've already pointed out 623 00:30:20,000 --> 00:30:24,000 because this is exactly what it's like when you're doing it. You enter words and you say cheap, 624 00:30:24,000 --> 00:30:25,000 and it says cheap, and you're like 625 00:30:25,000 --> 00:30:28,000 no. 626 00:30:28,000 --> 00:30:32,000 And then you're like about how about an. Hey look, that's in alphabetical order. And then you spend all your time thinking about words 627 00:30:32,000 --> 00:30:37,000 for a while. Well, tux. That's in alphabetical order. It seems to work. You could do that for a while, but 628 00:30:37,000 --> 00:30:39,000 then 629 00:30:39,000 --> 00:30:41,000 you're like this whole cheap, 630 00:30:41,000 --> 00:30:42,000 not good. 631 00:30:42,000 --> 00:30:46,000 Okay, so I come back here and my friend who's already one step ahead of me has pointed out that my swap 632 00:30:46,000 --> 00:30:50,000 is missing the all important pass by reference 633 00:30:50,000 --> 00:30:53,000 that as it is it what swapping the copies that got passed to 634 00:30:53,000 --> 00:30:56,000 the function, but of course nothing was happening back here in signature land. 635 00:30:56,000 --> 00:30:57,000 So if I fix that, and 636 00:30:57,000 --> 00:30:59,000 I come back in here, 637 00:30:59,000 --> 00:31:01,000 I'm gonna feel better about this. Oh, 638 00:31:01,000 --> 00:31:04,000 my gosh. And even tux still works. 639 00:31:04,000 --> 00:31:08,000 And if I say banana, I should get a bunch of As, a bunch of B and an N, so there's various cases to test out if you have 640 00:31:08,000 --> 00:31:09,000 multiple letters, and if 641 00:31:09,000 --> 00:31:12,000 you have letters that are the same letters like 642 00:31:12,000 --> 00:31:16,000 apple so that it doesn't lose your duplicates, and it seems to come out right. 643 00:31:16,000 --> 00:31:17,000 So given this, 644 00:31:17,000 --> 00:31:19,000 the word peach 645 00:31:19,000 --> 00:31:22,000 should come down to the same signature as cheap, and so that seems to indicate we're on 646 00:31:22,000 --> 00:31:24,000 the path 647 00:31:24,000 --> 00:31:24,000 toward 648 00:31:24,000 --> 00:31:26,000 building the thing we wanted to build. So I 649 00:31:26,000 --> 00:31:28,000 have this read file thing that I 650 00:31:28,000 --> 00:31:29,000 have left over from last time that 651 00:31:29,000 --> 00:31:31,000 reads each word, 652 00:31:31,000 --> 00:31:35,000 and I wanna change my vector 653 00:31:35,000 --> 00:31:37,000 into a map of 654 00:31:37,000 --> 00:31:40,000 set of string. 655 00:31:40,000 --> 00:31:48,000 What capitalization do you not like? [Inaudible]. 656 00:31:48,000 --> 00:31:52,000 Yeah. So it turns out this file happens to all be 657 00:31:52,000 --> 00:32:00,000 lower case, but there's no harm in doing this. 658 00:32:00,000 --> 00:32:03,000 That way even if they weren't, it'll 659 00:32:03,000 --> 00:32:06,000 take care of that problem for us if we wanted it to. 660 00:32:06,000 --> 00:32:09,000 [Inaudible]. 661 00:32:09,000 --> 00:32:11,000 I want lower case. Oh 662 00:32:11,000 --> 00:32:12,000 yeah, I do. Well, your 663 00:32:12,000 --> 00:32:15,000 case. You guys are so picky. 664 00:32:15,000 --> 00:32:18,000 All right. Here's my deal. I'm gonna take my map, 665 00:32:18,000 --> 00:32:21,000 and this is gonna be a line of code that's gonna make your head spin. 666 00:32:21,000 --> 00:32:31,000 Just go with it. 667 00:32:31,000 --> 00:32:38,000 This is the do all 668 00:32:38,000 --> 00:32:40,000 craziness. Okay. 669 00:32:40,000 --> 00:32:41,000 So 670 00:32:41,000 --> 00:32:43,000 I've got this map, 671 00:32:43,000 --> 00:32:46,000 and what I wanna do is under the signature of this word, 672 00:32:46,000 --> 00:32:50,000 I want to look up the set of strings that's associated with it and tack this one in. 673 00:32:50,000 --> 00:32:54,000 And the add in this case with the set, I know that's gonna do non-duplication. In fact, 674 00:32:54,000 --> 00:32:58,000 the file doesn't contain duplicates, but if it did, I certainly don't want it to record it 675 00:32:58,000 --> 00:33:00,000 twice anyway, so I might as well do this. 676 00:33:00,000 --> 00:33:04,000 Now this form of this is a heavy lifter for a small piece of 677 00:33:04,000 --> 00:33:04,000 code. 678 00:33:04,000 --> 00:33:09,000 The signature then went and converted into the ACEHP form. 679 00:33:09,000 --> 00:33:12,000 I used that as the key into the table. 680 00:33:12,000 --> 00:33:16,000 If it was already there, it's gonna retrieve me an existing set that I'm gonna just go ahead and 681 00:33:16,000 --> 00:33:21,000 add a word onto. If it wasn't there, the behavior for the brackets is to create 682 00:33:21,000 --> 00:33:26,000 kind of a new empty 683 00:33:26,000 --> 00:33:28,000 value for that. So it'll use the key 684 00:33:28,000 --> 00:33:32,000 and create a default value for type. Well, the default value for set of string -so if you just create a 685 00:33:32,000 --> 00:33:32,000 set 686 00:33:32,000 --> 00:33:36,000 without any other information in the default constructor will always create you a 687 00:33:36,000 --> 00:33:40,000 nice clean empty set. So in fact, it will get me exactly what I want which is to put it in there with an empty 688 00:33:40,000 --> 00:33:43,000 set that I will immediately add the word into. So after 689 00:33:43,000 --> 00:33:46,000 I do this, I should have this fully populated map. And 690 00:33:46,000 --> 00:33:49,000 then I'm 691 00:33:49,000 --> 00:33:56,000 gonna do this just as a little test. I'm gonna say num words when I'm done 692 00:33:56,000 --> 00:33:58,000 to feel 693 00:33:58,000 --> 00:34:00,000 a little bit confident about what 694 00:34:00,000 --> 00:34:04,000 got put in there. How 695 00:34:04,000 --> 00:34:06,000 about I call it -that's a good idea. It's 696 00:34:06,000 --> 00:34:11,000 so fussy. C++ never does what you want. 697 00:34:11,000 --> 00:34:15,000 I think I called this thing OSPD2.txt that has the 698 00:34:15,000 --> 00:34:20,000 words in it. 699 00:34:20,000 --> 00:34:29,000 And then I need to declare the variable that I'm sticking all this stuff into is a set. So 700 00:34:29,000 --> 00:34:32,000 go in, load stuff, 701 00:34:32,000 --> 00:34:35,000 702 00:34:35,000 --> 00:34:36,000 doing it's thing, the 703 00:34:36,000 --> 00:34:38,000 number of words 112,882. Okay. 704 00:34:38,000 --> 00:34:42,000 That's close enough. I can't remember the number that's in there, but that sounds like a fine 705 00:34:42,000 --> 00:34:43,000 706 00:34:43,000 --> 00:34:47,000 approximation of it. So I feel like it did sort of manage to do something for it. And I can actually do this if I 707 00:34:47,000 --> 00:34:51,000 just wanna get a little glimpse of it is to use my iterator to look 708 00:34:51,000 --> 00:34:53,000 at 709 00:34:53,000 --> 00:35:04,000 something that's in there. Wait, is map a set of string? Iterator -- 710 00:35:04,000 --> 00:35:05,000 file iter.hasNext. I'm gonna 711 00:35:05,000 --> 00:35:07,000 say key 712 00:35:07,000 --> 00:35:10,000 equals iter.next, 713 00:35:10,000 --> 00:35:13,000 714 00:35:13,000 --> 00:35:18,000 and I'm going to print that key, and I'm going to print the size of the 715 00:35:18,000 --> 00:35:28,000 set because I'm at this 716 00:35:28,000 --> 00:35:33,000 point -I should see gobs of printing come out of this thing. It 717 00:35:33,000 --> 00:35:37,000 takes a little bit of a while to process the thing, and then 718 00:35:37,000 --> 00:35:42,000 see gobs and gobs of stuff going by. It looks like a lot of things are ones if you can 719 00:35:42,000 --> 00:35:45,000 imagine reading them as they go by because a lot of words are really just not anagrams 720 00:35:45,000 --> 00:35:48,000 of anything else. But 721 00:35:48,000 --> 00:35:52,000 some of the shorter ones have sort of a better chance. 722 00:35:52,000 --> 00:35:56,000 So you can find out here at the end that there are EEIKLPST. I don't know what that is. Leakiest? No. I don't 723 00:35:56,000 --> 00:35:58,000 know 724 00:35:58,000 --> 00:35:59,000 what that word is. I 725 00:35:59,000 --> 00:36:03,000 should write it in for any of them. 726 00:36:03,000 --> 00:36:09,000 This dictionary has a bunch of really crazy words in it too, so it makes it especially challenging. What is that? I don't know. That one 727 00:36:09,000 --> 00:36:10,000 almost looks 728 00:36:10,000 --> 00:36:14,000 like beginners, but it's got an F in it. It's the F beginners, a very famous 729 00:36:14,000 --> 00:36:16,000 word. You guys have probably heard 730 00:36:16,000 --> 00:36:19,000 of it. So I've seen that, and now I wanna do the thing where I will pick the 731 00:36:19,000 --> 00:36:21,000 largest one. I'd like to know. 732 00:36:21,000 --> 00:36:23,000 Somebody should tell me. 733 00:36:23,000 --> 00:36:25,000 Int max size [cuts 734 00:36:25,000 --> 00:36:28,000 out] 735 00:36:28,000 --> 00:36:30,000 max key, so I'll 736 00:36:30,000 --> 00:36:33,000 set this to be zero, and max 737 00:36:33,000 --> 00:36:34,000 key is that. 738 00:36:34,000 --> 00:36:35,000 And then I'm gonna do this. If 739 00:36:35,000 --> 00:36:38,000 the 740 00:36:38,000 --> 00:36:40,000 size of this key is 741 00:36:40,000 --> 00:36:43,000 greater than my max size, 742 00:36:43,000 --> 00:36:55,000 then it's gonna get to be the new key. 743 00:36:55,000 --> 00:37:06,000 So after I did all of that then [cuts out] 744 00:37:06,000 --> 00:37:08,000 key. 745 00:37:08,000 --> 00:37:13,000 And then I probably wanna see what they are, so why don't I go ahead and take a look. Ah, I have to go to 746 00:37:13,000 --> 00:37:17,000 the type, though. [Inaudible] 747 00:37:17,000 --> 00:37:21,000 to equals 748 00:37:21,000 --> 00:37:22,000 M of 749 00:37:22,000 --> 00:37:27,000 key -max key, I guess, dot 750 00:37:27,000 --> 00:37:29,000 iterator, 751 00:37:29,000 --> 00:37:30,000 [inaudible] IT 752 00:37:30,000 --> 00:37:37,000 has next, CLIT.next [inaudible]. 753 00:37:37,000 --> 00:37:43,000 So it went back. It found the maximum, in this case using the size of the sets as the distinguishing 754 00:37:43,000 --> 00:37:47,000 feature. And 755 00:37:47,000 --> 00:37:49,000 then max is AEPRS, 756 00:37:49,000 --> 00:37:51,000 which 757 00:37:51,000 --> 00:37:53,000 it's got a big old list of about 12. 758 00:37:53,000 --> 00:37:55,000 I think that's 759 00:37:55,000 --> 00:37:58,000 12, actually. Maybe 13. 760 00:37:58,000 --> 00:37:59,000 So now you know. 761 00:37:59,000 --> 00:38:02,000 You can impress your friends at parties. This is the kind of thing you can win bar bets on. 762 00:38:02,000 --> 00:38:03,000 Oh, 763 00:38:03,000 --> 00:38:06,000 yeah. What's the size of the largest 764 00:38:06,000 --> 00:38:07,000 anagram cluster? 765 00:38:07,000 --> 00:38:10,000 Everybody wants to know this kind of stuff. I can't believe you guys can sleep at night without 766 00:38:10,000 --> 00:38:12,000 actually knowing this. 767 00:38:12,000 --> 00:38:16,000 And what's neat to know though [inaudible] just to point out a couple of things that -you can use a little decomposition 768 00:38:16,000 --> 00:38:18,000 on this code, 769 00:38:18,000 --> 00:38:23,000 but there's kind of a very small amount of things we're having to do. For example, one of the things that's really 770 00:38:23,000 --> 00:38:27,000 powerful, things like the map where we can just figure out how to key the things 771 00:38:27,000 --> 00:38:29,000 to store the collection right under that key, 772 00:38:29,000 --> 00:38:33,000 then looking it up and adding something is a very 773 00:38:33,000 --> 00:38:36,000 efficient sort of direct operation, just building on these things and it going through and doing all 774 00:38:36,000 --> 00:38:37,000 the work of 775 00:38:37,000 --> 00:38:41,000 storing them, sorting them, making it efficient for us to retrieve them and look them up such that I can 776 00:38:41,000 --> 00:38:44,000 process 100,000 words 777 00:38:44,000 --> 00:38:49,000 in the blink of an eye, and then go back through, look at them all, find the biggest one, 778 00:38:49,000 --> 00:38:51,000 get my information out. 779 00:38:51,000 --> 00:38:55,000 When you make the call to M.size, is 780 00:38:55,000 --> 00:39:00,000 that the number of words? [Inaudible]. That is the number of keys. Keys, right. So that's not actually [inaudible]. 781 00:39:00,000 --> 00:39:01,000 Yeah. So it doesn't know anything about 782 00:39:01,000 --> 00:39:04,000 everything else that was [inaudible], but in fact that's why it's 783 00:39:04,000 --> 00:39:07,000 [inaudible]. I know the dictionary has about 127,000 words. It turns out they form about 784 00:39:07,000 --> 00:39:08,000 785 00:39:08,000 --> 00:39:11,000 112 unique 786 00:39:11,000 --> 00:39:14,000 signatures, and so there's actually another 20,000 words that are 787 00:39:14,000 --> 00:39:17,000 clung onto some existing signature. That's the number of unique signatures across 788 00:39:17,000 --> 00:39:20,000 the dictionary, not the number of words, so that's probably the wrong name to call 789 00:39:20,000 --> 00:39:22,000 it. For the M 790 00:39:22,000 --> 00:39:24,000 791 00:39:24,000 --> 00:39:28,000 signature word thing where [inaudible] of the default just to create a new stack, that 792 00:39:28,000 --> 00:39:32,000 works as well for vectors [inaudible]? Yeah. It works for anything if you were just to declare 793 00:39:32,000 --> 00:39:34,000 it on the stack and the right thing happened, 794 00:39:34,000 --> 00:39:37,000 so vectors, set, maps. All those things do. 795 00:39:37,000 --> 00:39:41,000 But the primitive types like int and double, it doesn't. So it would work for string. String 796 00:39:41,000 --> 00:39:45,000 is an empty string. So for some of the fancier, more modern types tend to actually 797 00:39:45,000 --> 00:39:49,000 know how to just default construct themselves into a good state, but the primitives don't do that. 798 00:39:49,000 --> 00:39:53,000 So if you were having a map of ints and you wanted to have them start at zero, you need to really 799 00:39:53,000 --> 00:39:58,000 start them at zero. You can call just M sub this. It would be garbage, and it would just be operating with 800 00:39:58,000 --> 00:40:02,000 garbage from that way forward. 801 00:40:02,000 --> 00:40:06,000 All right. Well, we're good. What 802 00:40:06,000 --> 00:40:08,000 I'm gonna give you 803 00:40:08,000 --> 00:40:10,000 is the 804 00:40:10,000 --> 00:40:11,000 eight minute 805 00:40:11,000 --> 00:40:12,000 discussion of recursion 806 00:40:12,000 --> 00:40:14,000 that whets your appetite for 807 00:40:14,000 --> 00:40:18,000 the things we're gonna be doing next week. 808 00:40:18,000 --> 00:40:21,000 So recursion is one of those things I think that when you haven't yet 809 00:40:21,000 --> 00:40:24,000 had a chance to explore it first hand and other people tell you about it, it 810 00:40:24,000 --> 00:40:30,000 has sort of an awe inspiring sort of mystery, some fear, and whatnot. 811 00:40:30,000 --> 00:40:31,000 So 812 00:40:31,000 --> 00:40:34,000 first off, I wanna kinda shake that fear off. It is a little bit 813 00:40:34,000 --> 00:40:35,000 hard to wrap your head around 814 00:40:35,000 --> 00:40:38,000 the first time you see it, but we're gonna have a whole week's worth of time to spend on it, so 815 00:40:38,000 --> 00:40:41,000 we're gonna 816 00:40:41,000 --> 00:40:42,000 try to give you a lot of 817 00:40:42,000 --> 00:40:46,000 different ways to think about it, and different problems to see to kinda help you do it. And I think 818 00:40:46,000 --> 00:40:48,000 once you do get your head around it, it turns out then 819 00:40:48,000 --> 00:40:53,000 you'll discover how infinitely powerful it is, that there is kind of a simple idea in it 820 00:40:53,000 --> 00:40:57,000 that once you kinda fully get your head around, you can explore and solve lots of different 821 00:40:57,000 --> 00:41:00,000 problems using this just one technique again and again. 822 00:41:00,000 --> 00:41:02,000 So in itself, it's 823 00:41:02,000 --> 00:41:04,000 a little bit mysterious at first glance, but then 824 00:41:04,000 --> 00:41:06,000 once you kind of 825 00:41:06,000 --> 00:41:09,000 master it, you'll be amazed at the kind of neat things you can do with it. 826 00:41:09,000 --> 00:41:15,000 So it is certainly what I'd consider an indispensable tool in a programmer's tool kit. The kind of 827 00:41:15,000 --> 00:41:19,000 problems you can solve using the techniques you have so far is fundamentally limited. And 828 00:41:19,000 --> 00:41:22,000 part of what we need to do in this class is expose you to these new ways of solving 829 00:41:22,000 --> 00:41:26,000 harder, more sophisticated problems that the old techniques don't work for. One of the 830 00:41:26,000 --> 00:41:31,000 cool things about recursion is it actually lends very simple, elegant, short solutions 831 00:41:31,000 --> 00:41:35,000 to problems that at first glance seem completely unsolvable. 832 00:41:35,000 --> 00:41:36,000 That if you can 833 00:41:36,000 --> 00:41:40,000 formulate a structure for [inaudible], you will discover that the code is not long 834 00:41:40,000 --> 00:41:45,000 to write. It's not tedious to write. The tricky part is to figure out how 835 00:41:45,000 --> 00:41:47,000 to express it, 836 00:41:47,000 --> 00:41:50,000 so it's more of a thinking puzzle than it is a coding puzzle. I 837 00:41:50,000 --> 00:41:55,000 certainly like thinking puzzles as much as coding puzzles if not more. 838 00:41:55,000 --> 00:41:58,000 The general sort of idea is that 839 00:41:58,000 --> 00:42:02,000 you are going to try to solve a problem 840 00:42:02,000 --> 00:42:05,000 -instead of sort of breaking it down into component tasks like if I need to make dinner, I need 841 00:42:05,000 --> 00:42:08,000 to go to the store and buy things, and I need to come home, and chop them, and get 842 00:42:08,000 --> 00:42:12,000 the recipe. You think of what your standard decomposition is all about 843 00:42:12,000 --> 00:42:16,000 -breaking down your tasks into A, B, and C, and D, and then you add them all together to get 844 00:42:16,000 --> 00:42:17,000 845 00:42:17,000 --> 00:42:20,000 the whole task done. Recursion has this kind of very different way of thinking about the problem, which is like 846 00:42:20,000 --> 00:42:22,000 well if I needed to get Task A done, 847 00:42:22,000 --> 00:42:26,000 and I had Task A prime, which was somehow a lot like the task I was trying to solve, but it somehow 848 00:42:26,000 --> 00:42:29,000 was a little bit simpler, a little bit easier, 849 00:42:29,000 --> 00:42:32,000 a little bit more manageable than the one I started out to solve, 850 00:42:32,000 --> 00:42:34,000 and if I had that solution 851 00:42:34,000 --> 00:42:37,000 -so if somehow I could delegate it off to some 852 00:42:37,000 --> 00:42:39,000 853 00:42:39,000 --> 00:42:43,000 minion who works for me, and then I could use that to solve my problem, then my 854 00:42:43,000 --> 00:42:45,000 job would be made much easier by using that result of 855 00:42:45,000 --> 00:42:48,000 solving a similar problem that's a little bit [inaudible]. 856 00:42:48,000 --> 00:42:51,000 Okay, that seems a little bit wacky. 857 00:42:51,000 --> 00:42:55,000 Let me give you sort of an example of how this might work. 858 00:42:55,000 --> 00:42:58,000 So your standard problem, I said yeah, it's like you do these dissimilar sub problems. 859 00:42:58,000 --> 00:43:04,000 Let's imagine I had this goal where I wanted to survey the Stanford student body. 860 00:43:04,000 --> 00:43:06,000 I don't want just like a 861 00:43:06,000 --> 00:43:08,000 haphazard 862 00:43:08,000 --> 00:43:11,000 most of the people involved. Let's say I really wanted to get input from every single person 863 00:43:11,000 --> 00:43:16,000 on campus whether they think having cardinal as your mascot is a ridiculous choice or not. So 864 00:43:16,000 --> 00:43:19,000 let's imagine I really wanna hear from all 10,000 students. 865 00:43:19,000 --> 00:43:21,000 Now I can stand out in White Plaza with a big 866 00:43:21,000 --> 00:43:26,000 note pad and try to accost people and sort of work my way down the list. 867 00:43:26,000 --> 00:43:29,000 And then I'd be there for eons and never solve my problem. 868 00:43:29,000 --> 00:43:31,000 Instead, what I decide to do is I say well I'm gonna 869 00:43:31,000 --> 00:43:32,000 recruit some people to help me 870 00:43:32,000 --> 00:43:34,000 because I'm lazy 871 00:43:34,000 --> 00:43:35,000 as we've already established, 872 00:43:35,000 --> 00:43:37,000 and I would like to get 873 00:43:37,000 --> 00:43:41,000 some other people to join in my quest to answer these burning questions and to 874 00:43:41,000 --> 00:43:43,000 solve the survey. 875 00:43:43,000 --> 00:43:45,000 So what I do is I round up ten people let's say, 876 00:43:45,000 --> 00:43:49,000 and I say would you help me, and I decide to divide the campus into kind of ten 877 00:43:49,000 --> 00:43:51,000 partitions. And I say if you 878 00:43:51,000 --> 00:43:54,000 could survey all the people whose names begin with A, B, and C, that would really help. 879 00:43:54,000 --> 00:43:56,000 And if you could do [inaudible] Gs, and 880 00:43:56,000 --> 00:44:00,000 if you would do -and if I divide up the alphabet that way, give each of them two or three letters, and I 881 00:44:00,000 --> 00:44:02,000 say if you would go get the data, it'd be really 882 00:44:02,000 --> 00:44:05,000 easy for me to do my job then. If I just took all their 883 00:44:05,000 --> 00:44:08,000 data [inaudible]. Well, being 884 00:44:08,000 --> 00:44:12,000 the kind of lazy person that I am, it's likely that the ten people I recruit would have similar lazy qualities 885 00:44:12,000 --> 00:44:15,000 because lazy people hang out with other lazy people. 886 00:44:15,000 --> 00:44:18,000 And so the person who was in charge of A, B, C, the first thing they do is turn around and find ten 887 00:44:18,000 --> 00:44:22,000 more friends, and then they divide it up and say could you do the 888 00:44:22,000 --> 00:44:23,000 AA through AM and so on. 889 00:44:23,000 --> 00:44:25,000 If they divide it into these pools of 890 00:44:25,000 --> 00:44:30,000 one tenth of what they were responsible for, and say you can go get the information from these people, 891 00:44:30,000 --> 00:44:31,000 and if they did the same thing 892 00:44:31,000 --> 00:44:35,000 -so if everybody along the road. We started with 10,000. Now each person had 1,000 893 00:44:35,000 --> 00:44:39,000 to survey. They asked their friend to do 100. Their friend asked ten people to do 894 00:44:39,000 --> 00:44:39,000 ten. 895 00:44:39,000 --> 00:44:43,000 And then at some point, the person who has ten says well I just need to ask these ten people. 896 00:44:43,000 --> 00:44:45,000 897 00:44:45,000 --> 00:44:47,000 Once I get their data, we don't need to do anything further. 898 00:44:47,000 --> 00:44:50,000 So at some point the problem becomes so small, so simple, 899 00:44:50,000 --> 00:44:55,000 even though it was kind of the same problem all along. I just reproduced the same problem we had, but in 900 00:44:55,000 --> 00:44:58,000 a slightly more tractable form, but then I divided it around. Divide and conquer 901 00:44:58,000 --> 00:45:00,000 sometimes they call this to where 902 00:45:00,000 --> 00:45:01,000 I spread out the work 903 00:45:01,000 --> 00:45:05,000 around a bunch of people to where each person's contribution is just a little part of the whole. 904 00:45:05,000 --> 00:45:09,000 You had to find the ten volunteers around underneath you and 905 00:45:09,000 --> 00:45:11,000 get their 906 00:45:11,000 --> 00:45:14,000 help in solving the problem, but nobody had to do much work, and that's kind of a really 907 00:45:14,000 --> 00:45:18,000 interesting way to solve a problem. It sounds like a very big problem of surveying 908 00:45:18,000 --> 00:45:19,000 10,000 people, 909 00:45:19,000 --> 00:45:23,000 but by dividing and conquer, everybody has a little tiny role to play. You leverage a lot 910 00:45:23,000 --> 00:45:24,000 of people getting it done, 911 00:45:24,000 --> 00:45:29,000 and there is this self-similarity to it, which is kind of intriguing that 912 00:45:29,000 --> 00:45:34,000 everybody is trying to solve the same problem but just at different levels of scale. 913 00:45:34,000 --> 00:45:36,000 And so this idea 914 00:45:36,000 --> 00:45:38,000 applies to things like phone trees rather than 915 00:45:38,000 --> 00:45:42,000 trying to get the message out to everybody in my class, it might be that I call two people who call two 916 00:45:42,000 --> 00:45:45,000 friends who call two friends until everybody gets covered. 917 00:45:45,000 --> 00:45:48,000 Nobody does the whole job. Everybody just does a little part of it. Sometimes you'll see these 918 00:45:48,000 --> 00:45:50,000 fractal drawings where 919 00:45:50,000 --> 00:45:55,000 there is a large leaf which when you look closer actually consists of littler leaves, 920 00:45:55,000 --> 00:45:57,000 which themselves are littler leaves, so that at 921 00:45:57,000 --> 00:45:58,000 every level of scale 922 00:45:58,000 --> 00:46:01,000 the same image is being reproduced, 923 00:46:01,000 --> 00:46:05,000 and the result kind of on the outside is something that in itself if you look deeper has the 924 00:46:05,000 --> 00:46:08,000 same problem but smaller embedded in it. 925 00:46:08,000 --> 00:46:10,000 So it's a pretty neat 926 00:46:10,000 --> 00:46:11,000 sort of 927 00:46:11,000 --> 00:46:13,000 way of solving things. 928 00:46:13,000 --> 00:46:14,000 I am 929 00:46:14,000 --> 00:46:18,000 gonna tell you a little bit about how the code looks, and then I really am not gonna be able to show you much code today. 930 00:46:18,000 --> 00:46:23,000 I think it's actually even better to show you this code on Monday when we can come back fresh, 931 00:46:23,000 --> 00:46:27,000 but that it involves taking -So we're looking at functional recursion first, 932 00:46:27,000 --> 00:46:29,000 and functional recursion is 933 00:46:29,000 --> 00:46:33,000 taking some sort of functions, a function that takes an input and returns an answers, returns 934 00:46:33,000 --> 00:46:36,000 non-void thing that comes back. 935 00:46:36,000 --> 00:46:39,000 And we're gonna be looking at problems where you're trying to solve this big problem, 936 00:46:39,000 --> 00:46:44,000 and that if you had the answer to making a call to yourself on a smaller version of 937 00:46:44,000 --> 00:46:45,000 the problem 938 00:46:45,000 --> 00:46:48,000 -maybe one call, maybe two calls, maybe several calls 939 00:46:48,000 --> 00:46:52,000 -that you could add those, or multiply those, or combine those in some way to answer the 940 00:46:52,000 --> 00:46:53,000 bigger problem. So 941 00:46:53,000 --> 00:46:56,000 if I were trying to solve 942 00:46:56,000 --> 00:47:03,000 this campus survey, having the answers to these smaller campus surveys gives me that total result. And 943 00:47:03,000 --> 00:47:05,000 so the -I really should not try to do this in two minutes. What 944 00:47:05,000 --> 00:47:09,000 I should do is try to tell you on Monday. We'll come back and we'll talk about recursion, and it 945 00:47:09,000 --> 00:47:11,000 will be an impressive week. Meanwhile, work on 946 00:47:11,000 --> 00:47:14,000 your ADT homework.