2 00:00:10,000 --> 00:00:11,000 3 00:00:11,000 --> 00:00:14,000 This presentation is delivered by the Stanford Center for Professional 4 00:00:14,000 --> 00:00:23,000 Development. 5 00:00:23,000 --> 00:00:25,000 So we've seen vector, grid, stack, and queue, 6 00:00:25,000 --> 00:00:28,000 which is what we were doing on Friday, 7 00:00:28,000 --> 00:00:31,000 and these four are the group we call the sequential containers 8 00:00:31,000 --> 00:00:34,000 where they're storing and retrieving things on the basis of sequence. 9 00:00:34,000 --> 00:00:38,000 You place things LIFO into the stack, last in first out. You put things in 10 00:00:38,000 --> 00:00:39,000 the vector 11 00:00:39,000 --> 00:00:43,000 by index, retrieved by index. There's a sequence that's being 12 00:00:43,000 --> 00:00:46,000 maintained behind the scenes to store and retrieve the data. 13 00:00:46,000 --> 00:00:48,000 All of these containers 14 00:00:48,000 --> 00:00:51,000 have the property that they don't really examine your elements. They just store them. 15 00:00:51,000 --> 00:00:54,000 So you can put things into a stack, but it never really looks at them. It never looks 16 00:00:54,000 --> 00:00:56,000 at the strings you put in or the 17 00:00:56,000 --> 00:00:58,000 students that you put in the vector. 18 00:00:58,000 --> 00:01:02,000 It just holds onto it, and then when asked to give you something back on N queue -I mean a 19 00:01:02,000 --> 00:01:05,000 D queue operation or a get at or something like that, it retrieves what you 20 00:01:05,000 --> 00:01:07,000 earlier placed there. 21 00:01:07,000 --> 00:01:11,000 These are not very fancy. They do things that are very useful, 22 00:01:11,000 --> 00:01:14,000 but the bang for the buck is 23 00:01:14,000 --> 00:01:18,000 mostly in terms of just modeling a particular behavior like the stack and the queue, 24 00:01:18,000 --> 00:01:21,000 and then allowing you to kind of use it easily without error. 25 00:01:21,000 --> 00:01:24,000 The associative containers, which is what map and set belong to, 26 00:01:24,000 --> 00:01:27,000 are the ones where you're really getting a lot of power out of them, 27 00:01:27,000 --> 00:01:31,000 where they do something that actually is hard, or inconvenient, or 28 00:01:31,000 --> 00:01:33,000 inefficient for you to do manually. 29 00:01:33,000 --> 00:01:36,000 They're not about sequencing. They're about relationships. The 30 00:01:36,000 --> 00:01:40,000 set is a little bit like a vector in the sense that it's a collection of things 31 00:01:40,000 --> 00:01:41,000 without duplication, but 32 00:01:41,000 --> 00:01:45,000 it has fast operations for checking membership, adding things and removing 33 00:01:45,000 --> 00:01:47,000 things, checking to see if something's in the set 34 00:01:47,000 --> 00:01:50,000 in a way that the vector you'd end up kind of trudging through it to see 35 00:01:50,000 --> 00:01:52,000 do I already have this value in here. 36 00:01:52,000 --> 00:01:55,000 There's no easy way to do that with a vector as it stands. 37 00:01:55,000 --> 00:01:59,000 And so set gives you this kind of access for is this in the set, is it not, 38 00:01:59,000 --> 00:02:03,000 as well as a bunch of operations that allow you to do high level 39 00:02:03,000 --> 00:02:06,000 combinations like take this set and union it with another, or intersect it, or 40 00:02:06,000 --> 00:02:07,000 subtract this one from that, 41 00:02:07,000 --> 00:02:11,000 and get the mathematical properties of sets expressed in code. 42 00:02:11,000 --> 00:02:12,000 The map, 43 00:02:12,000 --> 00:02:16,000 even fancier in some sense, is a key value pairing 44 00:02:16,000 --> 00:02:19,000 that you want to be able to do some kind of look up by key. I wanna have a 45 00:02:19,000 --> 00:02:22,000 license plate number, and I wanna know what car that's assigned to, and what 46 00:02:22,000 --> 00:02:25,000 its make, and model, and owner is. 47 00:02:25,000 --> 00:02:28,000 The map is exactly the thing you want for that where you have 48 00:02:28,000 --> 00:02:32,000 some sort of key, or some sort of tag, or identifying ID 49 00:02:32,000 --> 00:02:36,000 that you wanna associate some larger thing with, or some other thing with for that 50 00:02:36,000 --> 00:02:39,000 matter, and be able to look that up, and make that pairing, and retrieving that 51 00:02:39,000 --> 00:02:41,000 pairing information quickly. 52 00:02:41,000 --> 00:02:45,000 Both of these are designed for very efficient access. It's gonna be possible 53 00:02:45,000 --> 00:02:46,000 to do 54 00:02:46,000 --> 00:02:52,000 these lookups, and additions, and combinations in 55 00:02:52,000 --> 00:02:55,000 much faster than you would be able to do if you were doing it manually. We're gonna 56 00:02:55,000 --> 00:02:58,000 peek under the hood in a couple weeks and see how that implementation works 57 00:02:58,000 --> 00:02:59,000 out and why, 58 00:02:59,000 --> 00:03:02,000 but for now what's pretty cool as a client is you just get to do neat things with 59 00:03:02,000 --> 00:03:04,000 it, stick stuff into it, check things, 60 00:03:04,000 --> 00:03:05,000 build things on top of it, 61 00:03:05,000 --> 00:03:09,000 taking advantage of all the power that's there through the kind of simple 62 00:03:09,000 --> 00:03:11,000 interface without having to learn 63 00:03:11,000 --> 00:03:14,000 what crazy internals behind it are. 64 00:03:14,000 --> 00:03:18,000 It's not something we have to worry about in the first pass. So 65 00:03:18,000 --> 00:03:20,000 we'll talk about map. As 66 00:03:20,000 --> 00:03:24,000 I said, it's a collection of key value pairs sort of modeling a dictionary, a word 67 00:03:24,000 --> 00:03:26,000 and its definition. 68 00:03:26,000 --> 00:03:29,000 It could be that it's the student ID and it's their transcript. 69 00:03:29,000 --> 00:03:33,000 It could be that it's an index which tells you what a word -and what page as it 70 00:03:33,000 --> 00:03:35,000 appears in this text. 71 00:03:35,000 --> 00:03:37,000 So there's a key. There's a value. 72 00:03:37,000 --> 00:03:41,000 You have operations that once you create a map where you can add a new pair, so you add 73 00:03:41,000 --> 00:03:44,000 a key and a value together and stick them in there. 74 00:03:44,000 --> 00:03:47,000 If you attempt to add a 75 00:03:47,000 --> 00:03:49,000 different value for an existing key, it will actually overwrite. 76 00:03:49,000 --> 00:03:54,000 So at any given time, there's exactly one value associated per 77 00:03:54,000 --> 00:03:58,000 key. You can access the elements that you stored earlier saying I've got a key, give 78 00:03:58,000 --> 00:03:59,000 me the associated value. 79 00:03:59,000 --> 00:04:03,000 I can check to see whether a key is contained in the map, just if I wanna 80 00:04:03,000 --> 00:04:05,000 know is there some entry already there. 81 00:04:05,000 --> 00:04:07,000 And a couple of the fancy 82 00:04:07,000 --> 00:04:10,000 features we'll get to in a second have some shorthand operators, and some 83 00:04:10,000 --> 00:04:13,000 access that allow you browsing. 84 00:04:13,000 --> 00:04:17,000 Some examples of things that maps are really good for -so maps turns out to be just 85 00:04:17,000 --> 00:04:19,000 ubiquitous across computer science 86 00:04:19,000 --> 00:04:22,000 problem solving. You probably saw the hashmap in your 106A 87 00:04:22,000 --> 00:04:23,000 class, and 88 00:04:23,000 --> 00:04:24,000 you got a little bit of work 89 00:04:24,000 --> 00:04:26,000 out of that toward the end of the quarter. 90 00:04:26,000 --> 00:04:31,000 This guy, there's just a thousand uses for which a map is a great tool. 91 00:04:31,000 --> 00:04:35,000 A dictionary is an obvious one, a thesaurus where you have a word mapped to 92 00:04:35,000 --> 00:04:39,000 synonyms, the DMV which has license plate tags 93 00:04:39,000 --> 00:04:42,000 that tell you the three cars I've owned in my life 94 00:04:42,000 --> 00:04:44,000 and what their license plate numbers were 95 00:04:44,000 --> 00:04:46,000 96 00:04:46,000 --> 00:04:47,000 -any sort of thing that looks like that 97 00:04:47,000 --> 00:04:51,000 fits in the map's goal. 98 00:04:51,000 --> 00:04:54,000 Let's take a look at what the interface actually looks like. 99 00:04:54,000 --> 00:04:56,000 This is a little bit 100 00:04:56,000 --> 00:04:59,000 simplified. I removed a couple of the advance features I'm gonna get to in a second. 101 00:04:59,000 --> 00:05:01,000 The class name is map. 102 00:05:01,000 --> 00:05:04,000 It's templatized on the value type. 103 00:05:04,000 --> 00:05:07,000 If you look at the add operation, 104 00:05:07,000 --> 00:05:08,000 it takes a string 105 00:05:08,000 --> 00:05:11,000 type for the key and a value type for the value, 106 00:05:11,000 --> 00:05:15,000 and that means that the map is templatized on only one 107 00:05:15,000 --> 00:05:17,000 of the half of the pair, that 108 00:05:17,000 --> 00:05:20,000 the pair is always a key which is a string mapped to some other thing, whether 109 00:05:20,000 --> 00:05:25,000 it's a student structure, or a vector of 110 00:05:25,000 --> 00:05:28,000 synonyms, or a string which represents a definition. That value 111 00:05:28,000 --> 00:05:32,000 type can vary based on what the client chooses to fill in the template parameter 112 00:05:32,000 --> 00:05:35,000 with, but the keys are always strings. 113 00:05:35,000 --> 00:05:37,000 I'll kinda get to why that is a little bit later but just to note it while we're 114 00:05:37,000 --> 00:05:38,000 here. 115 00:05:38,000 --> 00:05:42,000 So things like remove, and contains key, and get value that all operate on a key 116 00:05:42,000 --> 00:05:45,000 always take a string, and then what they return 117 00:05:45,000 --> 00:05:48,000 or access is 118 00:05:48,000 --> 00:05:51,000 templatized by this value type. 119 00:05:51,000 --> 00:05:54,000 So there's one little thing to need to know a little about how the interface works. 120 00:05:54,000 --> 00:05:58,000 When you create a map, it is empty. It has no pairs. If you ask it for the size, it tells 121 00:05:58,000 --> 00:06:00,000 you the number of pairs in it. 122 00:06:00,000 --> 00:06:05,000 If you ever add something to a key that was already present 123 00:06:05,000 --> 00:06:07,000 -so if earlier I'd said Julie's phone number is this, 124 00:06:07,000 --> 00:06:10,000 and then I later went in and tried to add under Julie another phone number, it will 125 00:06:10,000 --> 00:06:14,000 replace, so there will not be entries for Julie. There's always exactly one. 126 00:06:14,000 --> 00:06:15,000 So add 127 00:06:15,000 --> 00:06:17,000 sometimes is incrementing the size, 128 00:06:17,000 --> 00:06:20,000 and other times the size will be unchanged but will have overwritten 129 00:06:20,000 --> 00:06:22,000 some previous pair. 130 00:06:22,000 --> 00:06:23,000 Remove, 131 00:06:23,000 --> 00:06:26,000 if there was a matching value will decrement the size. If actually there wasn't a 132 00:06:26,000 --> 00:06:28,000 matching entry, it makes no changes. 133 00:06:28,000 --> 00:06:30,000 Contains key can tell you whether something's in there. 134 00:06:30,000 --> 00:06:32,000 And then get value 135 00:06:32,000 --> 00:06:35,000 has a little bit of a quirk in its interface that you'll need to be aware of. If you 136 00:06:35,000 --> 00:06:39,000 ask it to retrieve a key that doesn't exist, 137 00:06:39,000 --> 00:06:42,000 it's got a little bit of a problem there. If you say give me the phone number for 138 00:06:42,000 --> 00:06:44,000 Julie, 139 00:06:44,000 --> 00:06:45,000 it's supposed to return to you 140 00:06:45,000 --> 00:06:49,000 the previously associated value, something of value type. 141 00:06:49,000 --> 00:06:53,000 Well, if you ask it to get a value for something and there was no pair that 142 00:06:53,000 --> 00:06:54,000 matched that, 143 00:06:54,000 --> 00:06:56,000 it has a little bit of a 144 00:06:56,000 --> 00:06:59,000 conundrum about what to return. For example, 145 00:06:59,000 --> 00:07:04,000 if this table were storing 146 00:07:04,000 --> 00:07:07,000 numbers -maybe you were storing the latitude and longitude of a city. Well, when 147 00:07:07,000 --> 00:07:10,000 if you asked it to return the value for a city it doesn't know about, 148 00:07:10,000 --> 00:07:13,000 what is it supposed to return? What is the default -some 149 00:07:13,000 --> 00:07:17,000 sort of sentinel value that says no latitude, no longitude. It says, gWell, what 150 00:07:17,000 --> 00:07:20,000 is that?h If you were getting strings -if you had definitions, you might return 151 00:07:20,000 --> 00:07:23,000 the empty string. If you were returning numbers, it might be that what you 152 00:07:23,000 --> 00:07:28,000 wanna return was zero or negative one. There is not for all types of values 153 00:07:28,000 --> 00:07:29,000 some 154 00:07:29,000 --> 00:07:32,000 clear identifiable sentinel that would be appropriate to return. 155 00:07:32,000 --> 00:07:34,000 So what happens actually in get value 156 00:07:34,000 --> 00:07:38,000 is if you ask it to retrieve a value for something that doesn't exist, it throws 157 00:07:38,000 --> 00:07:39,000 an error message. 158 00:07:39,000 --> 00:07:41,000 There is no other sensible 159 00:07:41,000 --> 00:07:43,000 return value it knows of, 160 00:07:43,000 --> 00:07:46,000 and so it treats it actually as a drastic situation. 161 00:07:46,000 --> 00:07:50,000 That means if you are probing the table for something 162 00:07:50,000 --> 00:07:53,000 that you traditionally wanna be checking contains key if you're not 163 00:07:53,000 --> 00:07:54,000 positive that it's already in there. 164 00:07:54,000 --> 00:07:57,000 If you do know, it's fine to go ahead and get value through, but if you 165 00:07:57,000 --> 00:08:00,000 are querying it, 166 00:08:00,000 --> 00:08:05,000 it's a two-step process to verify it's there before you go get it. 167 00:08:05,000 --> 00:08:09,000 So I got a little bit of code. I'm gonna go actually just type this code rather than show it to 168 00:08:09,000 --> 00:08:12,000 you prewritten because I think it's easier to learn something from when I actually 169 00:08:12,000 --> 00:08:14,000 type it, so I'm gonna go ahead and just show you. 170 00:08:14,000 --> 00:08:19,000 So what I have here is a little bit of code I wrote before you got here 171 00:08:19,000 --> 00:08:23,000 which is just designed to take a particular input text file and break it 172 00:08:23,000 --> 00:08:23,000 apart 173 00:08:23,000 --> 00:08:27,000 using stream extraction into words, in this 174 00:08:27,000 --> 00:08:31,000 case tokens that are separated by white space using the default stream extraction. 175 00:08:31,000 --> 00:08:32,000 So it has a 176 00:08:32,000 --> 00:08:35,000 while true loop that keeps reading until it fails, and in this case it just 177 00:08:35,000 --> 00:08:37,000 prints them back out. 178 00:08:37,000 --> 00:08:39,000 What I'm gonna do is I'm actually gonna gather those words, 179 00:08:39,000 --> 00:08:43,000 and I'm gonna put them into a map, 180 00:08:43,000 --> 00:08:45,000 so this'll be my plan. 181 00:08:45,000 --> 00:08:46,000 So let's build a map 182 00:08:46,000 --> 00:08:47,000 that's gonna map 183 00:08:47,000 --> 00:08:49,000 184 00:08:49,000 --> 00:08:51,000 words to counts, so 185 00:08:51,000 --> 00:08:55,000 the number of times a word occurs in a document 186 00:08:55,000 --> 00:08:58,000 will be stored under that key. So the first time I see gthe,h I'll put a one in, and 187 00:08:58,000 --> 00:09:01,000 the next time I see a gthe,h I'll update that one into a two, 188 00:09:01,000 --> 00:09:04,000 and so on. And when I'm done, I should have a complete 189 00:09:04,000 --> 00:09:07,000 table of all the words that occurred in the document 190 00:09:07,000 --> 00:09:10,000 with the counts of the number of times they occurred. 191 00:09:10,000 --> 00:09:14,000 So let me go ahead and I'm gonna 192 00:09:14,000 --> 00:09:17,000 pass it up here by reference so I can fill it in with something, 193 00:09:17,000 --> 00:09:20,000 and then what I'm gonna do here 194 00:09:20,000 --> 00:09:23,000 -I'm gonna say if M contains key 195 00:09:23,000 --> 00:09:23,000 word 196 00:09:23,000 --> 00:09:26,000 -so this means that there was an existing 197 00:09:26,000 --> 00:09:28,000 entry. 198 00:09:28,000 --> 00:09:30,000 I'm gonna add to it 199 00:09:30,000 --> 00:09:43,000 under the word -let's 200 00:09:43,000 --> 00:09:44,000 take a look at what we did here. 201 00:09:44,000 --> 00:09:46,000 If it was already there, 202 00:09:46,000 --> 00:09:50,000 then I'm gonna get the value that was previously stored underneath it, add one 203 00:09:50,000 --> 00:09:54,000 to it, and then stick it back into the table, so that add is gonna overwrite the 204 00:09:54,000 --> 00:09:57,000 previous value. If it was four, it's gonna retrieve the four, add one to it, and then 205 00:09:57,000 --> 00:09:58,000 overwrite it with a five. 206 00:09:58,000 --> 00:10:02,000 In the case where it didn't contain that key, so no occurrence has been seen 207 00:10:02,000 --> 00:10:06,000 yet, we go ahead and establish a new entry in the able using m.add 208 00:10:06,000 --> 00:10:10,000 of the word with the count one. 209 00:10:10,000 --> 00:10:11,000 Yeah? Could you 210 00:10:11,000 --> 00:10:14,000 try the remove number function if there's no value there to develop the [inaudible]? 211 00:10:14,000 --> 00:10:17,000 It does not, actually. So remove can actually -partly it has to do with a 212 00:10:17,000 --> 00:10:20,000 little bit about can you do something reasonable in that case. 213 00:10:20,000 --> 00:10:22,000 Remove in this case can say there's nothing to remove. Get value 214 00:10:22,000 --> 00:10:25,000 just can't do anything reasonable. It's supposed to return you something. It's 215 00:10:25,000 --> 00:10:28,000 like there's nothing to return. I can't make it up, 216 00:10:28,000 --> 00:10:34,000 and that's why that's considered a more drastic situation. Question? You 217 00:10:34,000 --> 00:10:37,000 passed the map 218 00:10:37,000 --> 00:10:39,000 on the wrong 219 00:10:39,000 --> 00:10:39,000 line. 220 00:10:39,000 --> 00:10:42,000 Oh, you're right. I totally did. Thank you very much. That was gonna give me quite an 221 00:10:42,000 --> 00:10:46,000 error when I got that. 222 00:10:46,000 --> 00:10:47,000 There we go. 223 00:10:47,000 --> 00:10:48,000 Okay, fixed that. Thank you very much. 224 00:10:48,000 --> 00:10:50,000 And so then maybe when I'm done I could say 225 00:10:50,000 --> 00:10:53,000 num unique words, 226 00:10:53,000 --> 00:10:57,000 and then m.size. 227 00:10:57,000 --> 00:10:59,000 So if we let this guy go, it's 228 00:10:59,000 --> 00:11:01,000 gonna complain all over the place because it says 229 00:11:01,000 --> 00:11:03,000 what's this map of which you speak. Then we'll tell 230 00:11:03,000 --> 00:11:03,000 it 231 00:11:03,000 --> 00:11:09,000 here's a header file you'd like to derive. Okay. 232 00:11:09,000 --> 00:11:13,000 So there were 512 unique words in this document. It's 233 00:11:13,000 --> 00:11:17,000 the text of some handout we had earlier that I just quickly put in a text file. 234 00:11:17,000 --> 00:11:21,000 So this shows sort of the basic usage pattern will typically be I'm sticking stuff in 235 00:11:21,000 --> 00:11:24,000 there, and intentionally updating and changing, 236 00:11:24,000 --> 00:11:28,000 and then I in this case was using a count at the end. 237 00:11:28,000 --> 00:11:31,000 I'm gonna note one funky little thing about the map while I'm here 238 00:11:31,000 --> 00:11:33,000 because 239 00:11:33,000 --> 00:11:36,000 map has a shorthand access that 240 00:11:36,000 --> 00:11:40,000 allows you to retrieve the value associated with a key 241 00:11:40,000 --> 00:11:44,000 using a syntax that looks a little bit bizarre, but it's kinda becoming 242 00:11:44,000 --> 00:11:45,000 common 243 00:11:45,000 --> 00:11:48,000 -is to instead of saying m.get value word, 244 00:11:48,000 --> 00:11:49,000 that you say m 245 00:11:49,000 --> 00:11:52,000 of square brackets of word. 246 00:11:52,000 --> 00:11:53,000 247 00:11:53,000 --> 00:11:57,000 So that looks a little bit like an array access, and the first time I saw 248 00:11:57,000 --> 00:11:58,000 this I have to say I was 249 00:11:58,000 --> 00:12:00,000 just shocked, and I thought, gOh my god. What an abuse 250 00:12:00,000 --> 00:12:04,000 of the notational system,h but it has become so ubiquitous in languages now 251 00:12:04,000 --> 00:12:06,000 that I actually don't consider it so frightening anymore. 252 00:12:06,000 --> 00:12:08,000 It's basically saying 253 00:12:08,000 --> 00:12:09,000 reach into the table, 254 00:12:09,000 --> 00:12:13,000 and instead of using an index to identify which element you're doing, you're using a key and 255 00:12:13,000 --> 00:12:15,000 saying reach into the bucket 256 00:12:15,000 --> 00:12:18,000 that's tagged with this index and get the value back out. So this is effectively an 257 00:12:18,000 --> 00:12:20,000 m.get 258 00:12:20,000 --> 00:12:21,000 value of word but 259 00:12:21,000 --> 00:12:22,000 using this syntax. 260 00:12:22,000 --> 00:12:25,000 There's something that's a little bit 261 00:12:25,000 --> 00:12:26,000 wacky bout 262 00:12:26,000 --> 00:12:31,000 this syntax though. It's not exactly equivalent to 263 00:12:31,000 --> 00:12:34,000 get value in that it 264 00:12:34,000 --> 00:12:37,000 doesn't mind 265 00:12:37,000 --> 00:12:40,000 if you try to access something that doesn't exist. 266 00:12:40,000 --> 00:12:41,000 And what it will do 267 00:12:41,000 --> 00:12:45,000 is its strategy is if you access a word 268 00:12:45,000 --> 00:12:47,000 m of some word where this is not an existing key, 269 00:12:47,000 --> 00:12:49,000 it will create a pair, 270 00:12:49,000 --> 00:12:52,000 it will tag it with the key you asked for, 271 00:12:52,000 --> 00:12:56,000 and then it will kind of leave the associated value as to whatever the default value 272 00:12:56,000 --> 00:12:59,000 for that type is. So in this case it's almost like you declared an int variable on 273 00:12:59,000 --> 00:13:02,000 the stack, what do you get? It turns out you'll get garbage. 274 00:13:02,000 --> 00:13:07,000 You'll get kind of an uninitialized number which is of very little value, but 275 00:13:07,000 --> 00:13:10,000 the side effect of then having set this pair was up that we could immediately 276 00:13:10,000 --> 00:13:12,000 go in, and write on top of it, and 277 00:13:12,000 --> 00:13:15,000 make the assignment into the one. And so whatever junk condits were 278 00:13:15,000 --> 00:13:18,000 there were only temporary, and then I immediately overwrote it 279 00:13:18,000 --> 00:13:19,000 with that. 280 00:13:19,000 --> 00:13:25,000 In effect, that also means I can do things like this 281 00:13:25,000 --> 00:13:26,000 where I 282 00:13:26,000 --> 00:13:28,000 do m.word 283 00:13:28,000 --> 00:13:32,000 plus equals one, 284 00:13:32,000 --> 00:13:35,000 and now I'm using it both for the add and the get value part. So the m 285 00:13:35,000 --> 00:13:37,000 bracket form kind of 286 00:13:37,000 --> 00:13:40,000 handles both the things you think of as being add -set adding something into 287 00:13:40,000 --> 00:13:46,000 there, overwriting or adding a new value, as well as 288 00:13:46,000 --> 00:13:48,000 just retrieving it when you wanna read it like a get value. 289 00:13:48,000 --> 00:13:52,000 So you'll see both of these used, but they do have a little bit of a quirky 290 00:13:52,000 --> 00:13:53,000 behavior that 291 00:13:53,000 --> 00:13:56,000 using the get value without the 292 00:13:56,000 --> 00:13:58,000 existing key throws an error. Using it 293 00:13:58,000 --> 00:14:01,000 this way actually kinda sets up an empty error that the assumption is that you're using it 294 00:14:01,000 --> 00:14:06,000 in this context where you're trying to immediately create it and overwrite it. 295 00:14:06,000 --> 00:14:09,000 Question? Does the case not matter 296 00:14:09,000 --> 00:14:11,000 297 00:14:11,000 --> 00:14:15,000 for map? If it's ? Case does matter. If I had the word capital T gtheh 298 00:14:15,000 --> 00:14:17,000 and lowercase gthe,h those are different words, so it 299 00:14:17,000 --> 00:14:23,000 really does -basically what you think of a string equals comparisons at the base level. If I 300 00:14:23,000 --> 00:14:26,000 wanted it to be case insensitive, then I would need to actually go out of my way to 301 00:14:26,000 --> 00:14:28,000 convert the keys to 302 00:14:28,000 --> 00:14:29,000 uniform case 303 00:14:29,000 --> 00:14:31,000 to guarantee that they would match other 304 00:14:31,000 --> 00:14:37,000 forms of the word. 305 00:14:37,000 --> 00:14:39,000 So let me go back -and 306 00:14:39,000 --> 00:14:41,000 that's a little bit of code there. 307 00:14:41,000 --> 00:14:43,000 So let me 308 00:14:43,000 --> 00:14:46,000 just review a little bit about where we're at. That shorthand operator 309 00:14:46,000 --> 00:14:47,000 that we have 310 00:14:47,000 --> 00:14:52,000 -I said I would explain why is it that keys have to be string type. 311 00:14:52,000 --> 00:14:55,000 That if we're trying really hard to build these general purpose containers, it seems like it 312 00:14:55,000 --> 00:14:59,000 would be extra convenient if we could say the key is an integer, 313 00:14:59,000 --> 00:15:01,000 and the mapping is 314 00:15:01,000 --> 00:15:03,000 -we could use our 315 00:15:03,000 --> 00:15:05,000 student ID numbers, and it's a 316 00:15:05,000 --> 00:15:08,000 student's record that's there. Why does it have to be string type? 317 00:15:08,000 --> 00:15:12,000 Well, it turns out that it really does use the known structure of strings 318 00:15:12,000 --> 00:15:16,000 to store them efficiently. It's capitalizing on certain properties that 319 00:15:16,000 --> 00:15:18,000 strings and strings only have 320 00:15:18,000 --> 00:15:22,000 to decide how to store those things so that it can actually do this very fast lookup 321 00:15:22,000 --> 00:15:26,000 that does not generally apply to other types of things. I couldn't say 322 00:15:26,000 --> 00:15:29,000 the key is a student structure and you have to match student structures. It's like 323 00:15:29,000 --> 00:15:33,000 there's not easy efficient ways to match any kind of value type, but there 324 00:15:33,000 --> 00:15:35,000 are certain things you can do with strings and strings only that it's 325 00:15:35,000 --> 00:15:37,000 capitalizing on. 326 00:15:37,000 --> 00:15:40,000 So in the case where you have something and you're trying to use a key that isn't already in a 327 00:15:40,000 --> 00:15:42,000 string type, 328 00:15:42,000 --> 00:15:45,000 the recommended way to get around that is you have to figure out a way to convert it to a 329 00:15:45,000 --> 00:15:47,000 string. So if you have an 330 00:15:47,000 --> 00:15:50,000 int, a student ID number that you wanna use, or a phone number, something like that, 331 00:15:50,000 --> 00:15:54,000 you just convert it into a string form. You take the integer; you make it 332 00:15:54,000 --> 00:15:55,000 into a string. 333 00:15:55,000 --> 00:15:56,000 334 00:15:56,000 --> 00:15:59,000 If you have a first name and a last name, and you wanna use them both, then you concatenate 335 00:15:59,000 --> 00:16:03,000 them together, and use that string as the key -just ways of building a string out 336 00:16:03,000 --> 00:16:05,000 of what you have. 337 00:16:05,000 --> 00:16:07,000 It's a little bit of a hack, but 338 00:16:07,000 --> 00:16:07,000 it 339 00:16:07,000 --> 00:16:12,000 solves the problem without too much trouble in most situations. The 340 00:16:12,000 --> 00:16:16,000 other question that comes up is what if I have more than one value for a key? 341 00:16:16,000 --> 00:16:18,000 The thesaurus example I gave earlier, 342 00:16:18,000 --> 00:16:20,000 the key is the word. 343 00:16:20,000 --> 00:16:25,000 The thing I want to map it to is the synonyms, that list of words. 344 00:16:25,000 --> 00:16:26,000 If I did add a 345 00:16:26,000 --> 00:16:28,000 word with each synonym, 346 00:16:28,000 --> 00:16:32,000 each subsequent synonym would just replace any previous pairing I'd put in. 347 00:16:32,000 --> 00:16:35,000 If what I really want is a 348 00:16:35,000 --> 00:16:37,000 one to many relationship, 349 00:16:37,000 --> 00:16:39,000 one word many options, 350 00:16:39,000 --> 00:16:41,000 then what I can do is just use vector as the value, 351 00:16:41,000 --> 00:16:43,000 so build a map 352 00:16:43,000 --> 00:16:47,000 containing vectors containing strings. So I'll get a nested template out of that, 353 00:16:47,000 --> 00:16:51,000 and then when I wanna add a new synonym to the map, it's a matter of 354 00:16:51,000 --> 00:16:55,000 pulling out the existing map that's there and adding it to the end of that vector, so it actually is kind of a two 355 00:16:55,000 --> 00:16:59,000 step add to 356 00:16:59,000 --> 00:17:02,000 get it into the vector and get that vector into the table. 357 00:17:02,000 --> 00:17:04,000 So the last thing that's missing 358 00:17:04,000 --> 00:17:08,000 is one that I'll go back to my code and we can do together. 359 00:17:08,000 --> 00:17:10,000 Once I've put the information in the table, 360 00:17:10,000 --> 00:17:14,000 I hope you believe me that I can get it back fast. I can check to see if there's an 361 00:17:14,000 --> 00:17:17,000 existing entry and do the lookup by key, 362 00:17:17,000 --> 00:17:21,000 put new entries in, replace it, overwrite -all of that in the interface so far 363 00:17:21,000 --> 00:17:22,000 seems just fine. 364 00:17:22,000 --> 00:17:25,000 But let's say I just wanna see that whole table. 365 00:17:25,000 --> 00:17:29,000 I wanna print my class list. I wanna see the thesaurus or the dictionary 366 00:17:29,000 --> 00:17:32,000 just spewed out, or just actually walk through it and look for new words. 367 00:17:32,000 --> 00:17:37,000 As it stands, the interface doesn't give you that access. They're not indexed. 368 00:17:37,000 --> 00:17:40,000 They're not in there under Slot 0, Slot 1, or Slot 2. It's not like a 369 00:17:40,000 --> 00:17:43,000 stack or a queue where you can just dequeue them back out to see all the things you put in 370 00:17:43,000 --> 00:17:46,000 there. They kinda go in and so far it's the roach motel. You can put them in, and if you 371 00:17:46,000 --> 00:17:49,000 know who you're looking for, you can get them back, 372 00:17:49,000 --> 00:17:52,000 but there's no way to kind of scan the contents. 373 00:17:52,000 --> 00:17:56,000 What we're gonna see is that map provides access to the elements 374 00:17:56,000 --> 00:17:58,000 using a tool called an iterator. 375 00:17:58,000 --> 00:18:02,000 This is the same tool that Java uses, so if you've seen that, you're already ahead of 376 00:18:02,000 --> 00:18:03,000 the game. 377 00:18:03,000 --> 00:18:04,000 Where 378 00:18:04,000 --> 00:18:07,000 you would like to see the entries in it, and rather than giving you a whole vector or some 379 00:18:07,000 --> 00:18:10,000 other kind of random access version of it, 380 00:18:10,000 --> 00:18:13,000 it provides you with a little intermediary. 381 00:18:13,000 --> 00:18:15,000 In this case, this class is called iterator. 382 00:18:15,000 --> 00:18:17,000 You create an iterator 383 00:18:17,000 --> 00:18:18,000 384 00:18:18,000 --> 00:18:20,000 on 385 00:18:20,000 --> 00:18:22,000 the class. Here, let me just go 386 00:18:22,000 --> 00:18:25,000 write the code. It's better to see that. So if I got here and I said I wanna see what 387 00:18:25,000 --> 00:18:26,000 they are, 388 00:18:26,000 --> 00:18:31,000 what I do is I create an iterator. The map's iterator name is a little bit 389 00:18:31,000 --> 00:18:34,000 wacky. 390 00:18:34,000 --> 00:18:37,000 There are iterators on both the map and the set, and to distinguish them because 391 00:18:37,000 --> 00:18:42,000 they're similar but not exactly the same thing, the iterator type is actually 392 00:18:42,000 --> 00:18:46,000 declared within the map class itself. So it's actually map 393 00:18:46,000 --> 00:18:50,000 of the type in angle brackets colon colon iterator is the name of the 394 00:18:50,000 --> 00:18:51,000 iterator for 395 00:18:51,000 --> 00:18:55,000 an iterator that walks over a map containing integers. 396 00:18:55,000 --> 00:18:57,000 We get one of those 397 00:18:57,000 --> 00:18:58,000 from the map 398 00:18:58,000 --> 00:19:02,000 by asking for it with the iterator member function. 399 00:19:02,000 --> 00:19:04,000 So that sets you up an iterator that's positioned to kind of work its way 400 00:19:04,000 --> 00:19:05,000 through it. 401 00:19:05,000 --> 00:19:09,000 The map iterator makes no guarantee about 402 00:19:09,000 --> 00:19:12,000 what order it will visit them, 403 00:19:12,000 --> 00:19:14,000 but it will visit them all. It 404 00:19:14,000 --> 00:19:21,000 uses a syntax that looks like this. And 405 00:19:21,000 --> 00:19:22,000 I say 406 00:19:22,000 --> 00:19:26,000 iter.hasNext -are there more things to retrieve? So the iterator's kind of keeping track of 407 00:19:26,000 --> 00:19:29,000 where we are as we're working our way through the map. 408 00:19:29,000 --> 00:19:32,000 If there are more keys we haven't yet visited, haven't yet retrieved from the 409 00:19:32,000 --> 00:19:36,000 iterator, it will return true in which case the call to iter.next will 410 00:19:36,000 --> 00:19:39,000 retrieve the next key to be visited, 411 00:19:39,000 --> 00:19:41,000 and then advance the iter, 412 00:19:41,000 --> 00:19:45,000 so it will have moved past that and kind of moved that to the already visited side, 413 00:19:45,000 --> 00:19:48,000 and now it will continue to work on the things that are 414 00:19:48,000 --> 00:19:51,000 unvisited. As I keep doing that, iter.next is 415 00:19:51,000 --> 00:19:55,000 both retrieving it and advancing it, so you typically wanna call that once inside the 416 00:19:55,000 --> 00:19:55,000 loop 417 00:19:55,000 --> 00:19:57,000 to get the value, 418 00:19:57,000 --> 00:20:01,000 and then check that hasNext again to see whether there's more and keep going. 419 00:20:01,000 --> 00:20:02,000 It gives you just the key. 420 00:20:02,000 --> 00:20:06,000 It doesn't give you the pair. The pair is just an easy call away, 421 00:20:06,000 --> 00:20:07,000 though. 422 00:20:07,000 --> 00:20:08,000 So I can say 423 00:20:08,000 --> 00:20:12,000 key 424 00:20:12,000 --> 00:20:13,000 equals 425 00:20:13,000 --> 00:20:17,000 and then M of key. 426 00:20:17,000 --> 00:20:22,000 And so using the shorthand syntax or the M -the get value, 427 00:20:22,000 --> 00:20:25,000 either way once I have a key, it's very easy to look up the value. In fact, it's so 428 00:20:25,000 --> 00:20:28,000 efficient, it means there's actually really no harm in 429 00:20:28,000 --> 00:20:31,000 just getting the key and going back in to get the value separately. It's still works out 430 00:20:31,000 --> 00:20:33,000 just fine. 431 00:20:33,000 --> 00:20:36,000 So when I do this, this should actually iterate through, 432 00:20:36,000 --> 00:20:38,000 show me every entry in the 433 00:20:38,000 --> 00:20:40,000 map, 434 00:20:40,000 --> 00:20:41,000 and the 435 00:20:41,000 --> 00:20:42,000 count for 436 00:20:42,000 --> 00:20:45,000 it. So 437 00:20:45,000 --> 00:20:48,000 one thing I will note is the sequence by which it's traveling through 438 00:20:48,000 --> 00:20:49,000 seems to be 439 00:20:49,000 --> 00:20:52,000 effectively random. 440 00:20:52,000 --> 00:20:56,000 You see my reasonable experience few following somewhere using 441 00:20:56,000 --> 00:20:58,000 the numbers 1, 4, 2, 442 00:20:58,000 --> 00:21:02,000 so there is no pattern. There is no rhyme or reason. It turns out it is 443 00:21:02,000 --> 00:21:06,000 actually internally you're doing something kinda sensible, but it does not guarantee as 444 00:21:06,000 --> 00:21:07,000 for the iterator 445 00:21:07,000 --> 00:21:11,000 anything predictable to you about which sequence it will visit them in. It 446 00:21:11,000 --> 00:21:14,000 says I will get them all before it's all done and said, 447 00:21:14,000 --> 00:21:18,000 but don't count on seeing them in alphabetical order, reverse alphabetical order, in 448 00:21:18,000 --> 00:21:20,000 order of increasing 449 00:21:20,000 --> 00:21:22,000 value or anything clever like that. 450 00:21:22,000 --> 00:21:23,000 It will get to them 451 00:21:23,000 --> 00:21:35,000 all. That's all you can be sure of. 452 00:21:35,000 --> 00:21:38,000 That is the [inaudible] that's up here. Question? On the previous 453 00:21:38,000 --> 00:21:41,000 page, on the previous slide, you mentioned something about 454 00:21:41,000 --> 00:21:43,000 [inaudible] 455 00:21:43,000 --> 00:21:49,000 finding the [inaudible], you would have to use a vector for that? Yeah. So if I had a 456 00:21:49,000 --> 00:21:52,000 one to many in that case, so maybe it was synonym to vector of strings, then I 457 00:21:52,000 --> 00:21:56,000 could use the iterator to go through and say how big is this vector compared 458 00:21:56,000 --> 00:21:59,000 to the other vectors I've seen so far. So I keep track of let's say the [inaudible] so 459 00:21:59,000 --> 00:22:00,000 far. So you'd run an iterator 460 00:22:00,000 --> 00:22:03,000 that -so let's just say I wanna find the most frequent word. That's 461 00:22:03,000 --> 00:22:04,000 about the same operation. Let 462 00:22:04,000 --> 00:22:06,000 me come back over here. 463 00:22:06,000 --> 00:22:10,000 If I just wanna see the most frequent, I could do this. 464 00:22:10,000 --> 00:22:12,000 I could say 465 00:22:12,000 --> 00:22:14,000 string max, 466 00:22:14,000 --> 00:22:17,000 and I'll say int 467 00:22:17,000 --> 00:22:19,000 max count equals zero. 468 00:22:19,000 --> 00:22:23,000 So the max is empty here. When I get this thing, I say if 469 00:22:23,000 --> 00:22:26,000 M of key 470 00:22:26,000 --> 00:22:30,000 is greater than my max count, 471 00:22:30,000 --> 00:22:35,000 then I want my max to be the key, and my max count to be M 472 00:22:35,000 --> 00:22:40,000 of the key. So I've 473 00:22:40,000 --> 00:22:42,000 started right with no assumptions about what the max is, 474 00:22:42,000 --> 00:22:45,000 and any time I see something that's greater than the max I have so far, I 475 00:22:45,000 --> 00:22:48,000 update that. So if this were a vector, I would be checking to see that the size 476 00:22:48,000 --> 00:22:52,000 of a value is bigger than the size of the previous vector. 477 00:22:52,000 --> 00:22:53,000 When I'm done, 478 00:22:53,000 --> 00:22:56,000 I can say max 479 00:22:56,000 --> 00:22:58,000 equals 480 00:22:58,000 --> 00:23:04,000 max 481 00:23:04,000 --> 00:23:08,000 count. It turns out gthe.h The most common word in there? What a 482 00:23:08,000 --> 00:23:10,000 surprise, right? It shows up 483 00:23:10,000 --> 00:23:12,000 42 times. So the same sort of [inaudible] would be used 484 00:23:12,000 --> 00:23:15,000 if there was some other thing I wanna do. So what the iterator is just giving us 485 00:23:15,000 --> 00:23:17,000 is a general-purpose way of 486 00:23:17,000 --> 00:23:19,000 walking through all the values. And then what you wanna do with them when you get 487 00:23:19,000 --> 00:23:22,000 there is your business. Do you wanna print them? Do you wanna put them in a file? Do you wanna 488 00:23:22,000 --> 00:23:26,000 compare them to find the top two, or the smallest, or the biggest, or 489 00:23:26,000 --> 00:23:28,000 whatever is 490 00:23:28,000 --> 00:23:30,000 up to you. So it's just giving you 491 00:23:30,000 --> 00:23:33,000 access to that data that you stuck in there 492 00:23:33,000 --> 00:23:39,000 and get it back out. So what does iter.next do? 493 00:23:39,000 --> 00:23:40,000 Iter.next is -think of 494 00:23:40,000 --> 00:23:44,000 it as an abstraction. The idea is that it's almost like it's got a little 495 00:23:44,000 --> 00:23:46,000 pointer. It's kind of like if I were in charge of walking 496 00:23:46,000 --> 00:23:48,000 through this class and returning each student, it 497 00:23:48,000 --> 00:23:50,000 is a pointer 498 00:23:50,000 --> 00:23:53,000 that given a certain student will return you the student I'm currently pointing 499 00:23:53,000 --> 00:23:53,000 at, 500 00:23:53,000 --> 00:23:56,000 and then move to another random student. And at any given point when I'm done them 501 00:23:56,000 --> 00:24:00,000 all, that hasNext returns false. And so next does two things, which is 502 00:24:00,000 --> 00:24:03,000 return to you the next key that you haven't yet seen, 503 00:24:03,000 --> 00:24:07,000 but advances kind of the iterator as a side effect so you can now test for are 504 00:24:07,000 --> 00:24:11,000 there any more, and if so, the next call will return 505 00:24:11,000 --> 00:24:12,000 a different one. 506 00:24:12,000 --> 00:24:15,000 So every time you call, it returns a new unvisited 507 00:24:15,000 --> 00:24:16,000 508 00:24:16,000 --> 00:24:17,000 key in the map until 509 00:24:17,000 --> 00:24:21,000 they're all gone. [Inaudible] or 510 00:24:21,000 --> 00:24:22,000 the next one? 511 00:24:22,000 --> 00:24:26,000 Well, it returns the next one in the iteration. You can imagine that it's almost like 512 00:24:26,000 --> 00:24:29,000 a caret that's between characters at any 513 00:24:29,000 --> 00:24:32,000 given point. It will return the next character, 514 00:24:32,000 --> 00:24:36,000 but if you call it again, it will return a new one. So it never returns the same 515 00:24:36,000 --> 00:24:38,000 value unless there was the same 516 00:24:38,000 --> 00:24:39,000 thing being iterated over. 517 00:24:39,000 --> 00:24:44,000 So it advances and returns in one step. 518 00:24:44,000 --> 00:24:47,000 That is key. There are some iterators that don't have that design. They tend to 519 00:24:47,000 --> 00:24:50,000 return the current one, and then you side effect, the STL iterators for example. 520 00:24:50,000 --> 00:24:53,000 In this one is kind of more of a Java style iterator. It does both. 521 00:24:53,000 --> 00:24:57,000 So if you needed to use the value a couple times now, you'd probably store it in a local variable, and 522 00:24:57,000 --> 00:24:59,000 then use it throughout that thing, and then 523 00:24:59,000 --> 00:25:07,000 only call iter.next once in each loop. 524 00:25:07,000 --> 00:25:12,000 There is an alternate way of doing something to every 525 00:25:12,000 --> 00:25:12,000 pair in a map. I'm 526 00:25:12,000 --> 00:25:15,000 actually just not gonna show it to you 527 00:25:15,000 --> 00:25:16,000 because I think actually it's 528 00:25:16,000 --> 00:25:20,000 something that's overkill. So there is something you can read about in Handout 529 00:25:20,000 --> 00:25:23,000 14 and look at which is called the mapping function. 530 00:25:23,000 --> 00:25:26,000 I'm gonna actually just skip it because I think it confuses the 531 00:25:26,000 --> 00:25:30,000 issue more than it helps. When you know how to do iteration, it's a great way to get 532 00:25:30,000 --> 00:25:32,000 access to all things in a map, and since the 533 00:25:32,000 --> 00:25:34,000 other function is actually just a more complicated way to get at the same 534 00:25:34,000 --> 00:25:39,000 functionality, I won't bore you with it. 535 00:25:39,000 --> 00:25:41,000 Then let me tell you about set. 536 00:25:41,000 --> 00:25:42,000 Once you have set, 537 00:25:42,000 --> 00:25:44,000 you've got it all. 538 00:25:44,000 --> 00:25:46,000 So map I think is the heavy lifter 539 00:25:46,000 --> 00:25:48,000 of the class library. The set 540 00:25:48,000 --> 00:25:53,000 is kind of a close second best in terms of just everyday utility 541 00:25:53,000 --> 00:25:54,000 value. 542 00:25:54,000 --> 00:25:57,000 What set is is a little bit like a vector in some ways, 543 00:25:57,000 --> 00:26:01,000 but it has the added features that 544 00:26:01,000 --> 00:26:05,000 there is no sequencing implied or 545 00:26:05,000 --> 00:26:07,000 maintained by it. If you have the set that you've added 3 and 5 to, 546 00:26:07,000 --> 00:26:10,000 if you add 5 and 3 in the other order, you end up with the same set when you're 547 00:26:10,000 --> 00:26:13,000 done. Those two sets are equal. They have the same contents, so 548 00:26:13,000 --> 00:26:17,000 it doesn't consider the order of insertion as important at all. It's just about 549 00:26:17,000 --> 00:26:20,000 membership. Does it contain these values? 550 00:26:20,000 --> 00:26:23,000 There are never any duplicate elements. If you have a set containing 3 and 5, and you 551 00:26:23,000 --> 00:26:26,000 attempt to add 3 again, you don't get the set 3, 3, 5. You just get 3, 5. So kind of like 552 00:26:26,000 --> 00:26:29,000 the mathematical property that it derives its name from, there 553 00:26:29,000 --> 00:26:33,000 I just existence. Is the number in or not? 554 00:26:33,000 --> 00:26:37,000 And adding it again does not change 555 00:26:37,000 --> 00:26:39,000 its status if it was already there. 556 00:26:39,000 --> 00:26:41,000 So your typical usage is you make an empty set. 557 00:26:41,000 --> 00:26:45,000 You can use add, remove, and contains to operate on individual elements. Add this one. 558 00:26:45,000 --> 00:26:46,000 Remove that one. 559 00:26:46,000 --> 00:26:48,000 Contains 560 00:26:48,000 --> 00:26:52,000 -remove has the same behavior that it did on the 561 00:26:52,000 --> 00:26:53,000 562 00:26:53,000 --> 00:26:57,000 map which is it kind of silently fails if you ask it to remove something that wasn't there. 563 00:26:57,000 --> 00:27:00,000 If you ask add to add something that was already there, it silently doesn't change 564 00:27:00,000 --> 00:27:02,000 anything, so both of those operations just kind of 565 00:27:02,000 --> 00:27:06,000 quietly deal with the cases that might be a little bit unusual. And then contains will give 566 00:27:06,000 --> 00:27:10,000 you information about does this number exist in the set. All of those are very efficient, 567 00:27:10,000 --> 00:27:14,000 can be done many, many times a microsecond without taking any 568 00:27:14,000 --> 00:27:15,000 processor time at all, so 569 00:27:15,000 --> 00:27:18,000 that means it's very handy for just throwing things in a set and then later wanting 570 00:27:18,000 --> 00:27:22,000 to know if it's there, as opposed to walking your way through a vector 571 00:27:22,000 --> 00:27:25,000 piecemeal to see if something was there. So for example, a vector doesn't have 572 00:27:25,000 --> 00:27:28,000 anything like this. If you wanna know if 573 00:27:28,000 --> 00:27:29,000 574 00:27:29,000 --> 00:27:32,000 Julie is in your list of students, and you have them in a vector, the only way to 575 00:27:32,000 --> 00:27:35,000 know is to walk through it Sub 1, Sub 2, Sub 3 until you 576 00:27:35,000 --> 00:27:38,000 find it or you run out of entries to check. 577 00:27:38,000 --> 00:27:41,000 The contains operation does a blinding fast sort of check 578 00:27:41,000 --> 00:27:44,000 and can almost immediately tell you is Julie in there or not, whether your entries 579 00:27:44,000 --> 00:27:46,000 are a thousand, 580 00:27:46,000 --> 00:27:49,000 a million, a billion, 581 00:27:49,000 --> 00:27:53,000 pretty much immediate access to dig it out. 582 00:27:53,000 --> 00:27:56,000 It also offers these high level operations, and 583 00:27:56,000 --> 00:27:59,000 these actually are where sets are particularly valuable is this 584 00:27:59,000 --> 00:28:02,000 idea of unioning, intersection, subtracting, checking to see whether two sets are equal, so whether they 585 00:28:02,000 --> 00:28:05,000 have all the same numbers, or whether one has a subset of another 586 00:28:05,000 --> 00:28:08,000 -that allow you to model -a lot of times the thing you want to 587 00:28:08,000 --> 00:28:12,000 take your code to do is something that in the real world needs a subset or an 588 00:28:12,000 --> 00:28:14,000 intersect kind of operation. 589 00:28:14,000 --> 00:28:17,000 You'd like to know if the courses you have taken 590 00:28:17,000 --> 00:28:20,000 meet the requirements you need to graduate. Well, if the requirements 591 00:28:20,000 --> 00:28:23,000 to graduate are a subset of what you take, whether they're a proper subset or not, if they're 592 00:28:23,000 --> 00:28:25,000 all in there, then you can graduate. 593 00:28:25,000 --> 00:28:27,000 If you wanna know 594 00:28:27,000 --> 00:28:30,000 if you and I are taking any classes together, or have any requirements 595 00:28:30,000 --> 00:28:33,000 that we both need to do, we can take the requirements, see what we have, subtract 596 00:28:33,000 --> 00:28:35,000 what's left, intersect that 597 00:28:35,000 --> 00:28:38,000 to find out what we have left to see if there's any classes we could take 598 00:28:38,000 --> 00:28:41,000 together and both satisfy requirements. 599 00:28:41,000 --> 00:28:44,000 Any kind of compound Boolean queries like when you're trying to do 600 00:28:44,000 --> 00:28:47,000 searches on an index, you wanna see pages that have this word, and this word, 601 00:28:47,000 --> 00:28:49,000 or that word, and not that 602 00:28:49,000 --> 00:28:51,000 word are very easily modeled in terms of sets. 603 00:28:51,000 --> 00:28:54,000 You have the set that have A and the set that have B. If you 604 00:28:54,000 --> 00:28:57,000 intersect them, it will tell you about which things have both. 605 00:28:57,000 --> 00:29:01,000 If you wanna see the gor,h you can use the union and find out 606 00:29:01,000 --> 00:29:02,000 which things have either. 607 00:29:02,000 --> 00:29:04,000 Sometimes you use sets 608 00:29:04,000 --> 00:29:07,000 simply for this idea of coalescing duplicates. 609 00:29:07,000 --> 00:29:10,000 If I just wanted to see the set of words that were in my file, and I don't care about 610 00:29:10,000 --> 00:29:12,000 how many times a word occurs, 611 00:29:12,000 --> 00:29:15,000 I could just dump them all into a set, and then when I'm done, I will know 612 00:29:15,000 --> 00:29:18,000 what the 512 unique words are, 613 00:29:18,000 --> 00:29:19,000 and I will have 614 00:29:19,000 --> 00:29:22,000 not had to chase down did I already have a copy of that word because the set's add 615 00:29:22,000 --> 00:29:28,000 just automatically coalesces with any existing entry that matches. 616 00:29:28,000 --> 00:29:29,000 So let me show 617 00:29:29,000 --> 00:29:31,000 you its interface. This is probably 618 00:29:31,000 --> 00:29:34,000 the biggest of all our classes in terms of number of 619 00:29:34,000 --> 00:29:36,000 member functions there, 620 00:29:36,000 --> 00:29:38,000 and features of them. 621 00:29:38,000 --> 00:29:41,000 I'm not gonna talk about the constructor just yet because it has 622 00:29:41,000 --> 00:29:45,000 something scary in it that we're gonna come back to which is a little bit fancy 623 00:29:45,000 --> 00:29:48,000 in terms of how it works. 624 00:29:48,000 --> 00:29:49,000 625 00:29:49,000 --> 00:29:52,000 There's a default argument over there. I'll note that, so that means that actually if 626 00:29:52,000 --> 00:29:53,000 you don't specify, 627 00:29:53,000 --> 00:29:57,000 in some situations the set doesn't need that argument and can kind of figure out for 628 00:29:57,000 --> 00:29:58,000 itself what to do. 629 00:29:58,000 --> 00:30:02,000 So we can create a set. We can ask for the size. Is empty 630 00:30:02,000 --> 00:30:04,000 -just checking to see if the size is zero. 631 00:30:04,000 --> 00:30:06,000 The things that operate on a single element here, 632 00:30:06,000 --> 00:30:09,000 adding, removing, containing 633 00:30:09,000 --> 00:30:12,000 -and then ones that operate on the set as a whole comparing it to another 634 00:30:12,000 --> 00:30:14,000 set. 635 00:30:14,000 --> 00:30:18,000 That set there is one of the few places where you're allowed to use the name set 636 00:30:18,000 --> 00:30:22,000 without the qualification. That's because we're in the template, 637 00:30:22,000 --> 00:30:25,000 and in this situation, this set without qualification is assumed to be whatever 638 00:30:25,000 --> 00:30:27,000 set is currently being built. 639 00:30:27,000 --> 00:30:30,000 So if I have a set of int, 640 00:30:30,000 --> 00:30:33,000 the equals method for set of int 641 00:30:33,000 --> 00:30:35,000 expects another set of int as the argument. 642 00:30:35,000 --> 00:30:39,000 Same for is subset, and union, and intersect, so it actually kinda matches it out 643 00:30:39,000 --> 00:30:42,000 correctly all the way through, so you can only union sets of integers 644 00:30:42,000 --> 00:30:46,000 with other sets of integers. That should make sense to you. 645 00:30:46,000 --> 00:30:51,000 These operations return the Boolean. These guys actually destructively 646 00:30:51,000 --> 00:30:55,000 modify the receiver. So you have a set that you're a 647 00:30:55,000 --> 00:30:56,000 messaging with a union with. 648 00:30:56,000 --> 00:31:00,000 You give it another set. It will join into the receiver set, so the 649 00:31:00,000 --> 00:31:03,000 receiver set will be enlarged by whatever new members are contributed by 650 00:31:03,000 --> 00:31:04,000 the other set. 651 00:31:04,000 --> 00:31:08,000 Intersect could potentially shrink this set 652 00:31:08,000 --> 00:31:11,000 down to just those elements that are in common with this other one. And then 653 00:31:11,000 --> 00:31:15,000 subtract takes all the elements that are in there that are present in here and removes them. 654 00:31:15,000 --> 00:31:17,000 So you can think of those as 655 00:31:17,000 --> 00:31:18,000 656 00:31:18,000 --> 00:31:20,000 just modeling what the 657 00:31:20,000 --> 00:31:23,000 [inaudible] formula for those operations are. 658 00:31:23,000 --> 00:31:25,000 It also uses an iterator. 659 00:31:25,000 --> 00:31:28,000 Like the map, once you stick them in there, they're not indexed. They're not by 660 00:31:28,000 --> 00:31:30,000 number. There's no way to say give me the nth element. 661 00:31:30,000 --> 00:31:32,000 662 00:31:32,000 --> 00:31:32,000 663 00:31:32,000 --> 00:31:36,000 There's no sequencing to them, so you use an iterator if you wanna walk through a set and 664 00:31:36,000 --> 00:31:39,000 see what's in there. So let me 665 00:31:39,000 --> 00:31:41,000 write you a little set 666 00:31:41,000 --> 00:31:45,000 code. Any questions about its basic interface? 667 00:31:45,000 --> 00:31:51,000 Why would somebody use a vector over a set? Sometimes you actually do care about 668 00:31:51,000 --> 00:31:54,000 ordering, or you do want duplicates. You might have a bunch of names 669 00:31:54,000 --> 00:31:56,000 where maybe some people are gonna have the same names, 670 00:31:56,000 --> 00:31:58,000 or the ordering really is -that 671 00:31:58,000 --> 00:32:01,000 you wanna know who's in what room in a dorm, you might be using the index to say it's 672 00:32:01,000 --> 00:32:04,000 in Room 100. That's at what index? So definitely when you care about that index, 673 00:32:04,000 --> 00:32:05,000 674 00:32:05,000 --> 00:32:07,000 and where you also want that random access 675 00:32:07,000 --> 00:32:10,000 that I know a particular number, and I wanna see what's in that box. There really 676 00:32:10,000 --> 00:32:14,000 isn't that same feature in a set. There's no way to get the nth member. 677 00:32:14,000 --> 00:32:17,000 If you just wanted to pick a random member out of a set, 678 00:32:17,000 --> 00:32:20,000 the only way to do it would be to open up the iterator and take a random 679 00:32:20,000 --> 00:32:23,000 number of steps forward, whereas in a vector you can say just pick from zero 680 00:32:23,000 --> 00:32:24,000 to N minus one. So there 681 00:32:24,000 --> 00:32:28,000 are definitely things that vector can do that set doesn't, 682 00:32:28,000 --> 00:32:31,000 and it just depends on the task which you have. Do you need 683 00:32:31,000 --> 00:32:31,000 684 00:32:31,000 --> 00:32:35,000 duplicates? You're definitely stuck with vector. Do you care about random access? Do you 685 00:32:35,000 --> 00:32:37,000 use the index in some interesting way? 686 00:32:37,000 --> 00:32:40,000 Do you care about recording the sequence, like knowing who was first or last that 687 00:32:40,000 --> 00:32:42,000 you added? 688 00:32:42,000 --> 00:32:42,000 Set 689 00:32:42,000 --> 00:32:47,000 doesn't track those things in the same way. So let me write 690 00:32:47,000 --> 00:32:51,000 a little code that uses a set, 691 00:32:51,000 --> 00:32:54,000 and 692 00:32:54,000 --> 00:32:56,000 693 00:32:56,000 --> 00:33:00,000 let's write this. 694 00:33:00,000 --> 00:33:03,000 I'm gonna write something that tests the random number generator. Every time I do this it 695 00:33:03,000 --> 00:33:06,000 alarms me, but I think it's good to know. 696 00:33:06,000 --> 00:33:08,000 697 00:33:08,000 --> 00:33:12,000 I'm gonna keep a list of the numbers I've seen, 698 00:33:12,000 --> 00:33:14,000 and I'm going to write a for loop 699 00:33:14,000 --> 00:33:16,000 that then 700 00:33:16,000 --> 00:33:23,000 -I don't know. Let's just run it until we see a duplicate. 701 00:33:23,000 --> 00:33:25,000 I'm going to 702 00:33:25,000 --> 00:33:29,000 generate a number between 1 and 100. If 703 00:33:29,000 --> 00:33:32,000 seen.contains that number, 704 00:33:32,000 --> 00:33:34,000 then I'm gonna break. 705 00:33:34,000 --> 00:33:37,000 Let me do a count. No, they'll be fine. 706 00:33:37,000 --> 00:33:41,000 Otherwise, I'm going to add it. 707 00:33:41,000 --> 00:33:43,000 And then I'm gonna 708 00:33:43,000 --> 00:33:44,000 print 709 00:33:44,000 --> 00:33:46,000 710 00:33:46,000 --> 00:33:48,000 found 711 00:33:48,000 --> 00:33:51,000 seen.size 712 00:33:51,000 --> 00:33:55,000 or repeat. 713 00:33:55,000 --> 00:33:59,000 So what you might be hoping is that if random number 714 00:33:59,000 --> 00:34:01,000 generator was still truly random 715 00:34:01,000 --> 00:34:01,000 716 00:34:01,000 --> 00:34:04,000 but still a little bit predictable -in this case, if I asked it for the numbers between 1 and 717 00:34:04,000 --> 00:34:07,000 100, you'd like to think it would take about 100 iterations before it would get 718 00:34:07,000 --> 00:34:08,000 back to a number that's seen. 719 00:34:08,000 --> 00:34:12,000 But certainly given it's random, it's not actually picking and choosing 720 00:34:12,000 --> 00:34:13,000 without replacement. 721 00:34:13,000 --> 00:34:17,000 They're just kinda bopping around the space 1 to 100, and it may very well 722 00:34:17,000 --> 00:34:20,000 light back on a number it's already seen before it would get around to some of the 723 00:34:20,000 --> 00:34:21,000 other numbers. 724 00:34:21,000 --> 00:34:24,000 So when you run this, you're kinda curious to see what is it like. 725 00:34:24,000 --> 00:34:26,000 Let's find out. 726 00:34:26,000 --> 00:34:31,000 Every time I do this I always think, gWow. The random number generator really not that random,h but 727 00:34:31,000 --> 00:34:33,000 we'll see what today says. I'm 728 00:34:33,000 --> 00:34:36,000 including random to get access to random integer. 729 00:34:36,000 --> 00:34:38,000 And I'm going to put 730 00:34:38,000 --> 00:34:39,000 my set here. 731 00:34:39,000 --> 00:34:42,000 And I'm gonna add one called randomize here at the top. 732 00:34:42,000 --> 00:34:46,000 That randomize initialized the library so that we have 733 00:34:46,000 --> 00:34:50,000 a new random sequence for this particular run which will allow us to have different 734 00:34:50,000 --> 00:34:51,000 results from run to run. It 735 00:34:51,000 --> 00:34:54,000 said 24 numbers before repeat. 736 00:34:54,000 --> 00:34:56,000 Okay. Let's run that again. 737 00:34:56,000 --> 00:35:00,000 Seven before repeat. Let's 738 00:35:00,000 --> 00:35:03,000 try it again. 20 before repeat. 739 00:35:03,000 --> 00:35:08,000 So showing you a little bit about the -seven again. Six. 740 00:35:08,000 --> 00:35:09,000 Ten. 741 00:35:09,000 --> 00:35:11,000 742 00:35:11,000 --> 00:35:12,000 Like is it ever going to get close to 100? 743 00:35:12,000 --> 00:35:14,000 No. 744 00:35:14,000 --> 00:35:15,000 So 745 00:35:15,000 --> 00:35:19,000 it just tells you that it bops around, but it actually does not 746 00:35:19,000 --> 00:35:22,000 -it is pseudorandom, which is one of the words that computer scientists use to say, 747 00:35:22,000 --> 00:35:26,000 gYeah, don't count on it being really random.h I would not wanna run my casino 748 00:35:26,000 --> 00:35:30,000 on the basis of numbers being generated by your average C++ 749 00:35:30,000 --> 00:35:31,000 random library. 750 00:35:31,000 --> 00:35:34,000 There you go. So in this case, just using it for 751 00:35:34,000 --> 00:35:34,000 containment 752 00:35:34,000 --> 00:35:38,000 so that each time through I can add that new number and then check 753 00:35:38,000 --> 00:35:42,000 the contains. Both those operations act operating very quickly. If I did this 754 00:35:42,000 --> 00:35:45,000 instead with a vector, I could accomplish the same thing, but it'd just be more 755 00:35:45,000 --> 00:35:45,000 manual. 756 00:35:45,000 --> 00:35:47,000 I'd stick it on the end, 757 00:35:47,000 --> 00:35:50,000 and if I wanted to see if it was there, contains is not an operation 758 00:35:50,000 --> 00:35:53,000 vector supports, so I'd have to walk through the vector from zero to the length minus one, 759 00:35:53,000 --> 00:35:56,000 and try to do the matching myself, 760 00:35:56,000 --> 00:35:59,000 which gets slower and slower as the vector gets longer. 761 00:35:59,000 --> 00:36:02,000 It's just a hassle to write that code, so having contains around 762 00:36:02,000 --> 00:36:05,000 saved us a little bit. 763 00:36:05,000 --> 00:36:06,000 If I want to print 764 00:36:06,000 --> 00:36:10,000 my sets, why don't go ahead and just at the bottom show that the iterator gets 765 00:36:10,000 --> 00:36:11,000 used on the set, 766 00:36:11,000 --> 00:36:13,000 767 00:36:13,000 --> 00:36:17,000 has the similar kind of formed name as the map, so the iterator -in this 768 00:36:17,000 --> 00:36:21,000 case capital I is a nested class declared within the set. 769 00:36:21,000 --> 00:36:25,000 I ask the set to give me its iterator 770 00:36:25,000 --> 00:36:28,000 while there are things left 771 00:36:28,000 --> 00:36:31,000 to pull out, 772 00:36:31,000 --> 00:36:39,000 then I will pull out it and print it. 773 00:36:39,000 --> 00:36:43,000 And so there's the sequence of let's say 20 or so numbers. 774 00:36:43,000 --> 00:36:45,000 So this 775 00:36:45,000 --> 00:36:47,000 highlights something that's a little bit 776 00:36:47,000 --> 00:36:52,000 distinctive about the iterator as it applies to the set is that those numbers 777 00:36:52,000 --> 00:36:55,000 -15, 16, 22, 24, 37 -are coming out in increasing 778 00:36:55,000 --> 00:36:57,000 order 779 00:36:57,000 --> 00:37:00,000 all the way down to the bottom. And if I run it again, 780 00:37:00,000 --> 00:37:01,000 781 00:37:01,000 --> 00:37:04,000 I will get the same 782 00:37:04,000 --> 00:37:07,000 effect, but perhaps on a shorter list. 783 00:37:07,000 --> 00:37:09,000 That the sets iterator 784 00:37:09,000 --> 00:37:11,000 is not 785 00:37:11,000 --> 00:37:14,000 as unpredictable as the map iterator was 786 00:37:14,000 --> 00:37:18,000 -that the set iterator -that in fact part of how the set is able 787 00:37:18,000 --> 00:37:21,000 to supply its operations so efficiently is that internally it is using some notion of 788 00:37:21,000 --> 00:37:22,000 sorting. 789 00:37:22,000 --> 00:37:26,000 It is keeping track of things by ordering. In this case, the increasing 790 00:37:26,000 --> 00:37:28,000 order is the default 791 00:37:28,000 --> 00:37:30,000 strategy for how it lines things up 792 00:37:30,000 --> 00:37:33,000 -and that the iterator takes advantage of that. It is internally 793 00:37:33,000 --> 00:37:36,000 storing in sorted order. It might as well just use the iterator to walk them in sorted order, and that tends to 794 00:37:36,000 --> 00:37:39,000 be convenient for somebody who wants to process this set. 795 00:37:39,000 --> 00:37:43,000 So as a result, you can count on that, that when you're using a set, 796 00:37:43,000 --> 00:37:47,000 that the iterator will put out the values from kinda smallest to largest. So 797 00:37:47,000 --> 00:37:51,000 for example, for strings it would be the lexicographic ordering, that kind of slightly 798 00:37:51,000 --> 00:37:54,000 alphabetical thing based on ASCII codes 799 00:37:54,000 --> 00:37:58,000 that it would produce them in. In numbers, they'll go from smallest to largest 800 00:37:58,000 --> 00:38:02,000 as a convenience, and it happens to be kinda nice for just browsing it in alphabetical 801 00:38:02,000 --> 00:38:04,000 order. Often it's something that's handy 802 00:38:04,000 --> 00:38:10,000 to be able to do. We 803 00:38:10,000 --> 00:38:14,000 head back over here. 804 00:38:14,000 --> 00:38:15,000 And so the 805 00:38:15,000 --> 00:38:16,000 code we just 806 00:38:16,000 --> 00:38:18,000 wrote, printing the set, 807 00:38:18,000 --> 00:38:20,000 doing the random test, 808 00:38:20,000 --> 00:38:25,000 I can apparently reproduce my own code on demand. And then here 809 00:38:25,000 --> 00:38:29,000 it just shows a little bit of some of the fancier features you can start going 810 00:38:29,000 --> 00:38:31,000 once you have sets in play. 811 00:38:31,000 --> 00:38:33,000 So if I were keeping track of 812 00:38:33,000 --> 00:38:36,000 for the friends that I know who they consider to be their friends, and 813 00:38:36,000 --> 00:38:39,000 who they consider to be their enemies -of course, those are very small lists I'm sure 814 00:38:39,000 --> 00:38:40,000 for you indeed, 815 00:38:40,000 --> 00:38:42,000 but if I were putting together a party, 816 00:38:42,000 --> 00:38:45,000 what I might wanna do is say let's invite everybody friends with and everybody 817 00:38:45,000 --> 00:38:48,000 I'm friends with, but then let's make sure we don't invite anybody that one of 818 00:38:48,000 --> 00:38:50,000 us doesn't like, that's on our enemy list. 819 00:38:50,000 --> 00:38:52,000 And so by starting with 820 00:38:52,000 --> 00:38:57,000 a copy of the friends that one has, unioning that with the two's friends, right 821 00:38:57,000 --> 00:39:02,000 now I've got the enlarged set which includes -note that I did a set string 822 00:39:02,000 --> 00:39:06,000 result equals one friends. That actually does a complete deep copy, so the map and the 823 00:39:06,000 --> 00:39:09,000 set just like our other containers know how to copy themselves and produce new things. So 824 00:39:09,000 --> 00:39:11,000 I got a new set that was 825 00:39:11,000 --> 00:39:13,000 initialized with the contents of one's friends. 826 00:39:13,000 --> 00:39:17,000 I took that set and destructively modified it to add two's friends, so now 827 00:39:17,000 --> 00:39:20,000 that result has an enlarged circle that 828 00:39:20,000 --> 00:39:22,000 -and then I'm further destructively modifying it 829 00:39:22,000 --> 00:39:27,000 to remove anyone who was on my enemy list and your enemy list 830 00:39:27,000 --> 00:39:28,000 to get us down to kind of the 831 00:39:28,000 --> 00:39:31,000 happy 832 00:39:31,000 --> 00:39:35,000 set of folks that either of us likes and neither of us 833 00:39:35,000 --> 00:39:38,000 dislikes to do that. So things that you can imagine if you had 834 00:39:38,000 --> 00:39:40,000 to do this with vectors would start to get pretty ugly. 835 00:39:40,000 --> 00:39:44,000 Walking down your list and my list to see who's matching, to see who you have, 836 00:39:44,000 --> 00:39:45,000 I 837 00:39:45,000 --> 00:39:48,000 don't, to make sure that everybody got one and only one invitation. The set is 838 00:39:48,000 --> 00:39:51,000 automatically handling all the coalescing of duplicates for us. 839 00:39:51,000 --> 00:39:53,000 And then allowing you this kind of 840 00:39:53,000 --> 00:39:55,000 high level expression of your 841 00:39:55,000 --> 00:39:58,000 actions is very powerful. If in the 842 00:39:58,000 --> 00:40:02,000 end what you're trying to model is this kind of rearrangement, then being 843 00:40:02,000 --> 00:40:04,000 able to union, subtract, intersect 844 00:40:04,000 --> 00:40:07,000 really helps the clarity of your code. Somebody can just read it and say, gOh, I see what 845 00:40:07,000 --> 00:40:10,000 they're doing here,h doing set operations 846 00:40:10,000 --> 00:40:16,000 to get to where we wanna be. Question? [Inaudible] 847 00:40:16,000 --> 00:40:18,000 the same enemies, so when you subtract 848 00:40:18,000 --> 00:40:20,000 one's enemies, 849 00:40:20,000 --> 00:40:27,000 and then you [inaudible] again with two it's not there. It turns out subtract will say remove these things if they're present. So if 850 00:40:27,000 --> 00:40:30,000 you and I have some of the same enemies, after I've removed them from your list, I'll 851 00:40:30,000 --> 00:40:33,000 remove them from my list. It turns out they won't be there, but it doesn't complain. 852 00:40:33,000 --> 00:40:37,000 Like the remove operation, if you ask it to subtract some things, 853 00:40:37,000 --> 00:40:38,000 854 00:40:38,000 --> 00:40:39,000 it doesn't get upset about it. 855 00:40:39,000 --> 00:40:46,000 It just skips over them basically. 856 00:40:46,000 --> 00:40:48,000 The set though 857 00:40:48,000 --> 00:40:52,000 is a little bit more quirky than the others in one kind of peculiar way. 858 00:40:52,000 --> 00:40:56,000 I'm gonna get you -foreshadow this. I'm actually gonna do more 859 00:40:56,000 --> 00:40:58,000 serious work on this on Friday 860 00:40:58,000 --> 00:41:02,000 -is that the other containers kinda store and retrieve things. They're putting them in buckets, or 861 00:41:02,000 --> 00:41:05,000 in slabs, or boxes, or whatever, and returning them back to you. But set is 862 00:41:05,000 --> 00:41:09,000 actually really has a much more intimate relationship with the data that it's 863 00:41:09,000 --> 00:41:10,000 storing 864 00:41:10,000 --> 00:41:13,000 that it really is examining the things that you give it 865 00:41:13,000 --> 00:41:16,000 to make sure for example that any existing copy 866 00:41:16,000 --> 00:41:18,000 is 867 00:41:18,000 --> 00:41:21,000 coalesced with this one, that when you ask it to go find something it has to do 868 00:41:21,000 --> 00:41:23,000 some kind of matching operation to say do I have something that looks like 869 00:41:23,000 --> 00:41:27,000 this. It's doing this sorting internally, as I said. It's keeping them ordered. 870 00:41:27,000 --> 00:41:28,000 Well, it needs to know something about 871 00:41:28,000 --> 00:41:29,000 how to do that ordering, 872 00:41:29,000 --> 00:41:32,000 what it means for two elements to be the same, what it means for one element to 873 00:41:32,000 --> 00:41:36,000 precede another or follow another in some ordering. 874 00:41:36,000 --> 00:41:37,000 Okay. 875 00:41:37,000 --> 00:41:39,000 But set is written as a template. 876 00:41:39,000 --> 00:41:43,000 Set takes any kind of elem type thing. 877 00:41:43,000 --> 00:41:46,000 How is it you compare two things 878 00:41:46,000 --> 00:41:47,000 generically? 879 00:41:47,000 --> 00:41:49,000 880 00:41:49,000 --> 00:41:50,000 That actually is not 881 00:41:50,000 --> 00:41:52,000 an easy problem. 882 00:41:52,000 --> 00:41:57,000 I'll show you a little bit off this code, and then we'll talk about this on Friday more -is that 883 00:41:57,000 --> 00:42:00,000 one way you could do it is you could assume that if I take the two elements, I could just 884 00:42:00,000 --> 00:42:03,000 use equals equals and less than. If I just 885 00:42:03,000 --> 00:42:06,000 plug them into the built-in operators and say, gTell me. Are they equal? What does equal 886 00:42:06,000 --> 00:42:07,000 equal say? 887 00:42:07,000 --> 00:42:10,000 Is one of them less than the other? Tell me that.h 888 00:42:10,000 --> 00:42:14,000 Using the built-in operators will work for some types of things you would store. It'll 889 00:42:14,000 --> 00:42:17,000 work for ints. It'll work doubles. It'll work for characters. 890 00:42:17,000 --> 00:42:19,000 So all the primitive types 891 00:42:19,000 --> 00:42:24,000 have relational operator behavior. Even things like string, which is a sort of 892 00:42:24,000 --> 00:42:26,000 fancier type 893 00:42:26,000 --> 00:42:29,000 also has behavior for equals equals and less than. But 894 00:42:29,000 --> 00:42:32,000 you start throwing things like student structures into a set, 895 00:42:32,000 --> 00:42:35,000 and I say take two students and say if they're equal equal, we're gonna run into 896 00:42:35,000 --> 00:42:36,000 some trouble. 897 00:42:36,000 --> 00:42:39,000 So I'll show you that error message, and we'll talk more about what we can do 898 00:42:39,000 --> 00:42:43,000 about it when I see you again on Friday. If you 899 00:42:43,000 --> 00:42:46,000 have something you need to talk to me about, now would be a good time before I run away, so if you wanted to see me today at