2
00:00:10,000 --> 00:00:11,000


3
00:00:11,000 --> 00:00:14,000
This presentation is delivered by the Stanford Center for Professional 

4
00:00:14,000 --> 00:00:23,000
Development. 

5
00:00:23,000 --> 00:00:25,000
So we've seen vector, grid, stack, and queue, 

6
00:00:25,000 --> 00:00:28,000
which is what we were doing on Friday, 

7
00:00:28,000 --> 00:00:31,000
and these four are the group we call the sequential containers 

8
00:00:31,000 --> 00:00:34,000
where they're storing and retrieving things on the basis of sequence. 

9
00:00:34,000 --> 00:00:38,000
You place things LIFO into the stack, last in first out. You put things in 

10
00:00:38,000 --> 00:00:39,000
the vector 

11
00:00:39,000 --> 00:00:43,000
by index, retrieved by index. There's a sequence that's being 

12
00:00:43,000 --> 00:00:46,000
maintained behind the scenes to store and retrieve the data. 

13
00:00:46,000 --> 00:00:48,000
All of these containers 

14
00:00:48,000 --> 00:00:51,000
have the property that they don't really examine your elements. They just store them. 

15
00:00:51,000 --> 00:00:54,000
So you can put things into a stack, but it never really looks at them. It never looks 

16
00:00:54,000 --> 00:00:56,000
at the strings you put in or the 

17
00:00:56,000 --> 00:00:58,000
students that you put in the vector. 

18
00:00:58,000 --> 00:01:02,000
It just holds onto it, and then when asked to give you something back on N queue -I mean a 

19
00:01:02,000 --> 00:01:05,000
D queue operation or a get at or something like that, it retrieves what you 

20
00:01:05,000 --> 00:01:07,000
earlier placed there. 

21
00:01:07,000 --> 00:01:11,000
These are not very fancy. They do things that are very useful, 

22
00:01:11,000 --> 00:01:14,000
but the bang for the buck is 

23
00:01:14,000 --> 00:01:18,000
mostly in terms of just modeling a particular behavior like the stack and the queue, 

24
00:01:18,000 --> 00:01:21,000
and then allowing you to kind of use it easily without error. 

25
00:01:21,000 --> 00:01:24,000
The associative containers, which is what map and set belong to, 

26
00:01:24,000 --> 00:01:27,000
are the ones where you're really getting a lot of power out of them, 

27
00:01:27,000 --> 00:01:31,000
where they do something that actually is hard, or inconvenient, or 

28
00:01:31,000 --> 00:01:33,000
inefficient for you to do manually. 

29
00:01:33,000 --> 00:01:36,000
They're not about sequencing. They're about relationships. The 

30
00:01:36,000 --> 00:01:40,000
set is a little bit like a vector in the sense that it's a collection of things 

31
00:01:40,000 --> 00:01:41,000
without duplication, but 

32
00:01:41,000 --> 00:01:45,000
it has fast operations for checking membership, adding things and removing 

33
00:01:45,000 --> 00:01:47,000
things, checking to see if something's in the set 

34
00:01:47,000 --> 00:01:50,000
in a way that the vector you'd end up kind of trudging through it to see 

35
00:01:50,000 --> 00:01:52,000
do I already have this value in here. 

36
00:01:52,000 --> 00:01:55,000
There's no easy way to do that with a vector as it stands. 

37
00:01:55,000 --> 00:01:59,000
And so set gives you this kind of access for is this in the set, is it not, 

38
00:01:59,000 --> 00:02:03,000
as well as a bunch of operations that allow you to do high level 

39
00:02:03,000 --> 00:02:06,000
combinations like take this set and union it with another, or intersect it, or 

40
00:02:06,000 --> 00:02:07,000
subtract this one from that, 

41
00:02:07,000 --> 00:02:11,000
and get the mathematical properties of sets expressed in code. 

42
00:02:11,000 --> 00:02:12,000
The map, 

43
00:02:12,000 --> 00:02:16,000
even fancier in some sense, is a key value pairing 

44
00:02:16,000 --> 00:02:19,000
that you want to be able to do some kind of look up by key. I wanna have a 

45
00:02:19,000 --> 00:02:22,000
license plate number, and I wanna know what car that's assigned to, and what 

46
00:02:22,000 --> 00:02:25,000
its make, and model, and owner is. 

47
00:02:25,000 --> 00:02:28,000
The map is exactly the thing you want for that where you have 

48
00:02:28,000 --> 00:02:32,000
some sort of key, or some sort of tag, or identifying ID 

49
00:02:32,000 --> 00:02:36,000
that you wanna associate some larger thing with, or some other thing with for that 

50
00:02:36,000 --> 00:02:39,000
matter, and be able to look that up, and make that pairing, and retrieving that 

51
00:02:39,000 --> 00:02:41,000
pairing information quickly. 

52
00:02:41,000 --> 00:02:45,000
Both of these are designed for very efficient access. It's gonna be possible 

53
00:02:45,000 --> 00:02:46,000
to do 

54
00:02:46,000 --> 00:02:52,000
these lookups, and additions, and combinations in 

55
00:02:52,000 --> 00:02:55,000
much faster than you would be able to do if you were doing it manually. We're gonna 

56
00:02:55,000 --> 00:02:58,000
peek under the hood in a couple weeks and see how that implementation works 

57
00:02:58,000 --> 00:02:59,000
out and why, 

58
00:02:59,000 --> 00:03:02,000
but for now what's pretty cool as a client is you just get to do neat things with 

59
00:03:02,000 --> 00:03:04,000
it, stick stuff into it, check things, 

60
00:03:04,000 --> 00:03:05,000
build things on top of it, 

61
00:03:05,000 --> 00:03:09,000
taking advantage of all the power that's there through the kind of simple 

62
00:03:09,000 --> 00:03:11,000
interface without having to learn 

63
00:03:11,000 --> 00:03:14,000
what crazy internals behind it are. 

64
00:03:14,000 --> 00:03:18,000
It's not something we have to worry about in the first pass. So 

65
00:03:18,000 --> 00:03:20,000
we'll talk about map. As 

66
00:03:20,000 --> 00:03:24,000
I said, it's a collection of key value pairs sort of modeling a dictionary, a word 

67
00:03:24,000 --> 00:03:26,000
and its definition. 

68
00:03:26,000 --> 00:03:29,000
It could be that it's the student ID and it's their transcript. 

69
00:03:29,000 --> 00:03:33,000
It could be that it's an index which tells you what a word -and what page as it 

70
00:03:33,000 --> 00:03:35,000
appears in this text. 

71
00:03:35,000 --> 00:03:37,000
So there's a key. There's a value. 

72
00:03:37,000 --> 00:03:41,000
You have operations that once you create a map where you can add a new pair, so you add 

73
00:03:41,000 --> 00:03:44,000
a key and a value together and stick them in there. 

74
00:03:44,000 --> 00:03:47,000
If you attempt to add a 

75
00:03:47,000 --> 00:03:49,000
different value for an existing key, it will actually overwrite. 

76
00:03:49,000 --> 00:03:54,000
So at any given time, there's exactly one value associated per 

77
00:03:54,000 --> 00:03:58,000
key. You can access the elements that you stored earlier saying I've got a key, give 

78
00:03:58,000 --> 00:03:59,000
me the associated value. 

79
00:03:59,000 --> 00:04:03,000
I can check to see whether a key is contained in the map, just if I wanna 

80
00:04:03,000 --> 00:04:05,000
know is there some entry already there. 

81
00:04:05,000 --> 00:04:07,000
And a couple of the fancy 

82
00:04:07,000 --> 00:04:10,000
features we'll get to in a second have some shorthand operators, and some 

83
00:04:10,000 --> 00:04:13,000
access that allow you browsing. 

84
00:04:13,000 --> 00:04:17,000
Some examples of things that maps are really good for -so maps turns out to be just 

85
00:04:17,000 --> 00:04:19,000
ubiquitous across computer science 

86
00:04:19,000 --> 00:04:22,000
problem solving. You probably saw the hashmap in your 106A 

87
00:04:22,000 --> 00:04:23,000
class, and 

88
00:04:23,000 --> 00:04:24,000
you got a little bit of work 

89
00:04:24,000 --> 00:04:26,000
out of that toward the end of the quarter. 

90
00:04:26,000 --> 00:04:31,000
This guy, there's just a thousand uses for which a map is a great tool. 

91
00:04:31,000 --> 00:04:35,000
A dictionary is an obvious one, a thesaurus where you have a word mapped to 

92
00:04:35,000 --> 00:04:39,000
synonyms, the DMV which has license plate tags 

93
00:04:39,000 --> 00:04:42,000
that tell you the three cars I've owned in my life 

94
00:04:42,000 --> 00:04:44,000
and what their license plate numbers were 

95
00:04:44,000 --> 00:04:46,000


96
00:04:46,000 --> 00:04:47,000
-any sort of thing that looks like that 

97
00:04:47,000 --> 00:04:51,000
fits in the map's goal. 

98
00:04:51,000 --> 00:04:54,000
Let's take a look at what the interface actually looks like. 

99
00:04:54,000 --> 00:04:56,000
This is a little bit 

100
00:04:56,000 --> 00:04:59,000
simplified. I removed a couple of the advance features I'm gonna get to in a second. 

101
00:04:59,000 --> 00:05:01,000
The class name is map. 

102
00:05:01,000 --> 00:05:04,000
It's templatized on the value type. 

103
00:05:04,000 --> 00:05:07,000
If you look at the add operation, 

104
00:05:07,000 --> 00:05:08,000
it takes a string 

105
00:05:08,000 --> 00:05:11,000
type for the key and a value type for the value, 

106
00:05:11,000 --> 00:05:15,000
and that means that the map is templatized on only one 

107
00:05:15,000 --> 00:05:17,000
of the half of the pair, that 

108
00:05:17,000 --> 00:05:20,000
the pair is always a key which is a string mapped to some other thing, whether 

109
00:05:20,000 --> 00:05:25,000
it's a student structure, or a vector of 

110
00:05:25,000 --> 00:05:28,000
synonyms, or a string which represents a definition. That value 

111
00:05:28,000 --> 00:05:32,000
type can vary based on what the client chooses to fill in the template parameter 

112
00:05:32,000 --> 00:05:35,000
with, but the keys are always strings. 

113
00:05:35,000 --> 00:05:37,000
I'll kinda get to why that is a little bit later but just to note it while we're 

114
00:05:37,000 --> 00:05:38,000
here. 

115
00:05:38,000 --> 00:05:42,000
So things like remove, and contains key, and get value that all operate on a key 

116
00:05:42,000 --> 00:05:45,000
always take a string, and then what they return 

117
00:05:45,000 --> 00:05:48,000
or access is 

118
00:05:48,000 --> 00:05:51,000
templatized by this value type. 

119
00:05:51,000 --> 00:05:54,000
So there's one little thing to need to know a little about how the interface works. 

120
00:05:54,000 --> 00:05:58,000
When you create a map, it is empty. It has no pairs. If you ask it for the size, it tells 

121
00:05:58,000 --> 00:06:00,000
you the number of pairs in it. 

122
00:06:00,000 --> 00:06:05,000
If you ever add something to a key that was already present 

123
00:06:05,000 --> 00:06:07,000
-so if earlier I'd said Julie's phone number is this, 

124
00:06:07,000 --> 00:06:10,000
and then I later went in and tried to add under Julie another phone number, it will 

125
00:06:10,000 --> 00:06:14,000
replace, so there will not be entries for Julie. There's always exactly one. 

126
00:06:14,000 --> 00:06:15,000
So add 

127
00:06:15,000 --> 00:06:17,000
sometimes is incrementing the size, 

128
00:06:17,000 --> 00:06:20,000
and other times the size will be unchanged but will have overwritten 

129
00:06:20,000 --> 00:06:22,000
some previous pair. 

130
00:06:22,000 --> 00:06:23,000
Remove, 

131
00:06:23,000 --> 00:06:26,000
if there was a matching value will decrement the size. If actually there wasn't a 

132
00:06:26,000 --> 00:06:28,000
matching entry, it makes no changes. 

133
00:06:28,000 --> 00:06:30,000
Contains key can tell you whether something's in there. 

134
00:06:30,000 --> 00:06:32,000
And then get value 

135
00:06:32,000 --> 00:06:35,000
has a little bit of a quirk in its interface that you'll need to be aware of. If you 

136
00:06:35,000 --> 00:06:39,000
ask it to retrieve a key that doesn't exist, 

137
00:06:39,000 --> 00:06:42,000
it's got a little bit of a problem there. If you say give me the phone number for 

138
00:06:42,000 --> 00:06:44,000
Julie, 

139
00:06:44,000 --> 00:06:45,000
it's supposed to return to you 

140
00:06:45,000 --> 00:06:49,000
the previously associated value, something of value type. 

141
00:06:49,000 --> 00:06:53,000
Well, if you ask it to get a value for something and there was no pair that 

142
00:06:53,000 --> 00:06:54,000
matched that, 

143
00:06:54,000 --> 00:06:56,000
it has a little bit of a 

144
00:06:56,000 --> 00:06:59,000
conundrum about what to return. For example, 

145
00:06:59,000 --> 00:07:04,000
if this table were storing 

146
00:07:04,000 --> 00:07:07,000
numbers -maybe you were storing the latitude and longitude of a city. Well, when 

147
00:07:07,000 --> 00:07:10,000
if you asked it to return the value for a city it doesn't know about, 

148
00:07:10,000 --> 00:07:13,000
what is it supposed to return? What is the default -some 

149
00:07:13,000 --> 00:07:17,000
sort of sentinel value that says no latitude, no longitude. It says, “Well, what 

150
00:07:17,000 --> 00:07:20,000
is that?” If you were getting strings -if you had definitions, you might return 

151
00:07:20,000 --> 00:07:23,000
the empty string. If you were returning numbers, it might be that what you 

152
00:07:23,000 --> 00:07:28,000
wanna return was zero or negative one. There is not for all types of values 

153
00:07:28,000 --> 00:07:29,000
some 

154
00:07:29,000 --> 00:07:32,000
clear identifiable sentinel that would be appropriate to return. 

155
00:07:32,000 --> 00:07:34,000
So what happens actually in get value 

156
00:07:34,000 --> 00:07:38,000
is if you ask it to retrieve a value for something that doesn't exist, it throws 

157
00:07:38,000 --> 00:07:39,000
an error message. 

158
00:07:39,000 --> 00:07:41,000
There is no other sensible 

159
00:07:41,000 --> 00:07:43,000
return value it knows of, 

160
00:07:43,000 --> 00:07:46,000
and so it treats it actually as a drastic situation. 

161
00:07:46,000 --> 00:07:50,000
That means if you are probing the table for something 

162
00:07:50,000 --> 00:07:53,000
that you traditionally wanna be checking contains key if you're not 

163
00:07:53,000 --> 00:07:54,000
positive that it's already in there. 

164
00:07:54,000 --> 00:07:57,000
If you do know, it's fine to go ahead and get value through, but if you 

165
00:07:57,000 --> 00:08:00,000
are querying it, 

166
00:08:00,000 --> 00:08:05,000
it's a two-step process to verify it's there before you go get it. 

167
00:08:05,000 --> 00:08:09,000
So I got a little bit of code. I'm gonna go actually just type this code rather than show it to 

168
00:08:09,000 --> 00:08:12,000
you prewritten because I think it's easier to learn something from when I actually 

169
00:08:12,000 --> 00:08:14,000
type it, so I'm gonna go ahead and just show you. 

170
00:08:14,000 --> 00:08:19,000
So what I have here is a little bit of code I wrote before you got here 

171
00:08:19,000 --> 00:08:23,000
which is just designed to take a particular input text file and break it 

172
00:08:23,000 --> 00:08:23,000
apart 

173
00:08:23,000 --> 00:08:27,000
using stream extraction into words, in this 

174
00:08:27,000 --> 00:08:31,000
case tokens that are separated by white space using the default stream extraction. 

175
00:08:31,000 --> 00:08:32,000
So it has a 

176
00:08:32,000 --> 00:08:35,000
while true loop that keeps reading until it fails, and in this case it just 

177
00:08:35,000 --> 00:08:37,000
prints them back out. 

178
00:08:37,000 --> 00:08:39,000
What I'm gonna do is I'm actually gonna gather those words, 

179
00:08:39,000 --> 00:08:43,000
and I'm gonna put them into a map, 

180
00:08:43,000 --> 00:08:45,000
so this'll be my plan. 

181
00:08:45,000 --> 00:08:46,000
So let's build a map 

182
00:08:46,000 --> 00:08:47,000
that's gonna map 

183
00:08:47,000 --> 00:08:49,000


184
00:08:49,000 --> 00:08:51,000
words to counts, so 

185
00:08:51,000 --> 00:08:55,000
the number of times a word occurs in a document 

186
00:08:55,000 --> 00:08:58,000
will be stored under that key. So the first time I see “the,” I'll put a one in, and 

187
00:08:58,000 --> 00:09:01,000
the next time I see a “the,” I'll update that one into a two, 

188
00:09:01,000 --> 00:09:04,000
and so on. And when I'm done, I should have a complete 

189
00:09:04,000 --> 00:09:07,000
table of all the words that occurred in the document 

190
00:09:07,000 --> 00:09:10,000
with the counts of the number of times they occurred. 

191
00:09:10,000 --> 00:09:14,000
So let me go ahead and I'm gonna 

192
00:09:14,000 --> 00:09:17,000
pass it up here by reference so I can fill it in with something, 

193
00:09:17,000 --> 00:09:20,000
and then what I'm gonna do here 

194
00:09:20,000 --> 00:09:23,000
-I'm gonna say if M contains key 

195
00:09:23,000 --> 00:09:23,000
word 

196
00:09:23,000 --> 00:09:26,000
-so this means that there was an existing 

197
00:09:26,000 --> 00:09:28,000
entry. 

198
00:09:28,000 --> 00:09:30,000
I'm gonna add to it 

199
00:09:30,000 --> 00:09:43,000
under the word -let's 

200
00:09:43,000 --> 00:09:44,000
take a look at what we did here. 

201
00:09:44,000 --> 00:09:46,000
If it was already there, 

202
00:09:46,000 --> 00:09:50,000
then I'm gonna get the value that was previously stored underneath it, add one 

203
00:09:50,000 --> 00:09:54,000
to it, and then stick it back into the table, so that add is gonna overwrite the 

204
00:09:54,000 --> 00:09:57,000
previous value. If it was four, it's gonna retrieve the four, add one to it, and then 

205
00:09:57,000 --> 00:09:58,000
overwrite it with a five. 

206
00:09:58,000 --> 00:10:02,000
In the case where it didn't contain that key, so no occurrence has been seen 

207
00:10:02,000 --> 00:10:06,000
yet, we go ahead and establish a new entry in the able using m.add 

208
00:10:06,000 --> 00:10:10,000
of the word with the count one. 

209
00:10:10,000 --> 00:10:11,000
Yeah? Could you 

210
00:10:11,000 --> 00:10:14,000
try the remove number function if there's no value there to develop the [inaudible]? 

211
00:10:14,000 --> 00:10:17,000
It does not, actually. So remove can actually -partly it has to do with a 

212
00:10:17,000 --> 00:10:20,000
little bit about can you do something reasonable in that case. 

213
00:10:20,000 --> 00:10:22,000
Remove in this case can say there's nothing to remove. Get value 

214
00:10:22,000 --> 00:10:25,000
just can't do anything reasonable. It's supposed to return you something. It's 

215
00:10:25,000 --> 00:10:28,000
like there's nothing to return. I can't make it up, 

216
00:10:28,000 --> 00:10:34,000
and that's why that's considered a more drastic situation. Question? You 

217
00:10:34,000 --> 00:10:37,000
passed the map 

218
00:10:37,000 --> 00:10:39,000
on the wrong 

219
00:10:39,000 --> 00:10:39,000
line. 

220
00:10:39,000 --> 00:10:42,000
Oh, you're right. I totally did. Thank you very much. That was gonna give me quite an 

221
00:10:42,000 --> 00:10:46,000
error when I got that. 

222
00:10:46,000 --> 00:10:47,000
There we go. 

223
00:10:47,000 --> 00:10:48,000
Okay, fixed that. Thank you very much. 

224
00:10:48,000 --> 00:10:50,000
And so then maybe when I'm done I could say 

225
00:10:50,000 --> 00:10:53,000
num unique words, 

226
00:10:53,000 --> 00:10:57,000
and then m.size. 

227
00:10:57,000 --> 00:10:59,000
So if we let this guy go, it's 

228
00:10:59,000 --> 00:11:01,000
gonna complain all over the place because it says 

229
00:11:01,000 --> 00:11:03,000
what's this map of which you speak. Then we'll tell 

230
00:11:03,000 --> 00:11:03,000
it 

231
00:11:03,000 --> 00:11:09,000
here's a header file you'd like to derive. Okay. 

232
00:11:09,000 --> 00:11:13,000
So there were 512 unique words in this document. It's 

233
00:11:13,000 --> 00:11:17,000
the text of some handout we had earlier that I just quickly put in a text file. 

234
00:11:17,000 --> 00:11:21,000
So this shows sort of the basic usage pattern will typically be I'm sticking stuff in 

235
00:11:21,000 --> 00:11:24,000
there, and intentionally updating and changing, 

236
00:11:24,000 --> 00:11:28,000
and then I in this case was using a count at the end. 

237
00:11:28,000 --> 00:11:31,000
I'm gonna note one funky little thing about the map while I'm here 

238
00:11:31,000 --> 00:11:33,000
because 

239
00:11:33,000 --> 00:11:36,000
map has a shorthand access that 

240
00:11:36,000 --> 00:11:40,000
allows you to retrieve the value associated with a key 

241
00:11:40,000 --> 00:11:44,000
using a syntax that looks a little bit bizarre, but it's kinda becoming 

242
00:11:44,000 --> 00:11:45,000
common 

243
00:11:45,000 --> 00:11:48,000
-is to instead of saying m.get value word, 

244
00:11:48,000 --> 00:11:49,000
that you say m 

245
00:11:49,000 --> 00:11:52,000
of square brackets of word. 

246
00:11:52,000 --> 00:11:53,000


247
00:11:53,000 --> 00:11:57,000
So that looks a little bit like an array access, and the first time I saw 

248
00:11:57,000 --> 00:11:58,000
this I have to say I was 

249
00:11:58,000 --> 00:12:00,000
just shocked, and I thought, “Oh my god. What an abuse 

250
00:12:00,000 --> 00:12:04,000
of the notational system,” but it has become so ubiquitous in languages now 

251
00:12:04,000 --> 00:12:06,000
that I actually don't consider it so frightening anymore. 

252
00:12:06,000 --> 00:12:08,000
It's basically saying 

253
00:12:08,000 --> 00:12:09,000
reach into the table, 

254
00:12:09,000 --> 00:12:13,000
and instead of using an index to identify which element you're doing, you're using a key and 

255
00:12:13,000 --> 00:12:15,000
saying reach into the bucket 

256
00:12:15,000 --> 00:12:18,000
that's tagged with this index and get the value back out. So this is effectively an 

257
00:12:18,000 --> 00:12:20,000
m.get 

258
00:12:20,000 --> 00:12:21,000
value of word but 

259
00:12:21,000 --> 00:12:22,000
using this syntax. 

260
00:12:22,000 --> 00:12:25,000
There's something that's a little bit 

261
00:12:25,000 --> 00:12:26,000
wacky bout 

262
00:12:26,000 --> 00:12:31,000
this syntax though. It's not exactly equivalent to 

263
00:12:31,000 --> 00:12:34,000
get value in that it 

264
00:12:34,000 --> 00:12:37,000
doesn't mind 

265
00:12:37,000 --> 00:12:40,000
if you try to access something that doesn't exist. 

266
00:12:40,000 --> 00:12:41,000
And what it will do 

267
00:12:41,000 --> 00:12:45,000
is its strategy is if you access a word 

268
00:12:45,000 --> 00:12:47,000
m of some word where this is not an existing key, 

269
00:12:47,000 --> 00:12:49,000
it will create a pair, 

270
00:12:49,000 --> 00:12:52,000
it will tag it with the key you asked for, 

271
00:12:52,000 --> 00:12:56,000
and then it will kind of leave the associated value as to whatever the default value 

272
00:12:56,000 --> 00:12:59,000
for that type is. So in this case it's almost like you declared an int variable on 

273
00:12:59,000 --> 00:13:02,000
the stack, what do you get? It turns out you'll get garbage. 

274
00:13:02,000 --> 00:13:07,000
You'll get kind of an uninitialized number which is of very little value, but 

275
00:13:07,000 --> 00:13:10,000
the side effect of then having set this pair was up that we could immediately 

276
00:13:10,000 --> 00:13:12,000
go in, and write on top of it, and 

277
00:13:12,000 --> 00:13:15,000
make the assignment into the one. And so whatever junk condits were 

278
00:13:15,000 --> 00:13:18,000
there were only temporary, and then I immediately overwrote it 

279
00:13:18,000 --> 00:13:19,000
with that. 

280
00:13:19,000 --> 00:13:25,000
In effect, that also means I can do things like this 

281
00:13:25,000 --> 00:13:26,000
where I 

282
00:13:26,000 --> 00:13:28,000
do m.word 

283
00:13:28,000 --> 00:13:32,000
plus equals one, 

284
00:13:32,000 --> 00:13:35,000
and now I'm using it both for the add and the get value part. So the m 

285
00:13:35,000 --> 00:13:37,000
bracket form kind of 

286
00:13:37,000 --> 00:13:40,000
handles both the things you think of as being add -set adding something into 

287
00:13:40,000 --> 00:13:46,000
there, overwriting or adding a new value, as well as 

288
00:13:46,000 --> 00:13:48,000
just retrieving it when you wanna read it like a get value. 

289
00:13:48,000 --> 00:13:52,000
So you'll see both of these used, but they do have a little bit of a quirky 

290
00:13:52,000 --> 00:13:53,000
behavior that 

291
00:13:53,000 --> 00:13:56,000
using the get value without the 

292
00:13:56,000 --> 00:13:58,000
existing key throws an error. Using it 

293
00:13:58,000 --> 00:14:01,000
this way actually kinda sets up an empty error that the assumption is that you're using it 

294
00:14:01,000 --> 00:14:06,000
in this context where you're trying to immediately create it and overwrite it. 

295
00:14:06,000 --> 00:14:09,000
Question? Does the case not matter 

296
00:14:09,000 --> 00:14:11,000


297
00:14:11,000 --> 00:14:15,000
for map? If it's ? Case does matter. If I had the word capital T “the” 

298
00:14:15,000 --> 00:14:17,000
and lowercase “the,” those are different words, so it 

299
00:14:17,000 --> 00:14:23,000
really does -basically what you think of a string equals comparisons at the base level. If I 

300
00:14:23,000 --> 00:14:26,000
wanted it to be case insensitive, then I would need to actually go out of my way to 

301
00:14:26,000 --> 00:14:28,000
convert the keys to 

302
00:14:28,000 --> 00:14:29,000
uniform case 

303
00:14:29,000 --> 00:14:31,000
to guarantee that they would match other 

304
00:14:31,000 --> 00:14:37,000
forms of the word. 

305
00:14:37,000 --> 00:14:39,000
So let me go back -and 

306
00:14:39,000 --> 00:14:41,000
that's a little bit of code there. 

307
00:14:41,000 --> 00:14:43,000
So let me 

308
00:14:43,000 --> 00:14:46,000
just review a little bit about where we're at. That shorthand operator 

309
00:14:46,000 --> 00:14:47,000
that we have 

310
00:14:47,000 --> 00:14:52,000
-I said I would explain why is it that keys have to be string type. 

311
00:14:52,000 --> 00:14:55,000
That if we're trying really hard to build these general purpose containers, it seems like it 

312
00:14:55,000 --> 00:14:59,000
would be extra convenient if we could say the key is an integer, 

313
00:14:59,000 --> 00:15:01,000
and the mapping is 

314
00:15:01,000 --> 00:15:03,000
-we could use our 

315
00:15:03,000 --> 00:15:05,000
student ID numbers, and it's a 

316
00:15:05,000 --> 00:15:08,000
student's record that's there. Why does it have to be string type? 

317
00:15:08,000 --> 00:15:12,000
Well, it turns out that it really does use the known structure of strings 

318
00:15:12,000 --> 00:15:16,000
to store them efficiently. It's capitalizing on certain properties that 

319
00:15:16,000 --> 00:15:18,000
strings and strings only have 

320
00:15:18,000 --> 00:15:22,000
to decide how to store those things so that it can actually do this very fast lookup 

321
00:15:22,000 --> 00:15:26,000
that does not generally apply to other types of things. I couldn't say 

322
00:15:26,000 --> 00:15:29,000
the key is a student structure and you have to match student structures. It's like 

323
00:15:29,000 --> 00:15:33,000
there's not easy efficient ways to match any kind of value type, but there 

324
00:15:33,000 --> 00:15:35,000
are certain things you can do with strings and strings only that it's 

325
00:15:35,000 --> 00:15:37,000
capitalizing on. 

326
00:15:37,000 --> 00:15:40,000
So in the case where you have something and you're trying to use a key that isn't already in a 

327
00:15:40,000 --> 00:15:42,000
string type, 

328
00:15:42,000 --> 00:15:45,000
the recommended way to get around that is you have to figure out a way to convert it to a 

329
00:15:45,000 --> 00:15:47,000
string. So if you have an 

330
00:15:47,000 --> 00:15:50,000
int, a student ID number that you wanna use, or a phone number, something like that, 

331
00:15:50,000 --> 00:15:54,000
you just convert it into a string form. You take the integer; you make it 

332
00:15:54,000 --> 00:15:55,000
into a string. 

333
00:15:55,000 --> 00:15:56,000


334
00:15:56,000 --> 00:15:59,000
If you have a first name and a last name, and you wanna use them both, then you concatenate 

335
00:15:59,000 --> 00:16:03,000
them together, and use that string as the key -just ways of building a string out 

336
00:16:03,000 --> 00:16:05,000
of what you have. 

337
00:16:05,000 --> 00:16:07,000
It's a little bit of a hack, but 

338
00:16:07,000 --> 00:16:07,000
it 

339
00:16:07,000 --> 00:16:12,000
solves the problem without too much trouble in most situations. The 

340
00:16:12,000 --> 00:16:16,000
other question that comes up is what if I have more than one value for a key? 

341
00:16:16,000 --> 00:16:18,000
The thesaurus example I gave earlier, 

342
00:16:18,000 --> 00:16:20,000
the key is the word. 

343
00:16:20,000 --> 00:16:25,000
The thing I want to map it to is the synonyms, that list of words. 

344
00:16:25,000 --> 00:16:26,000
If I did add a 

345
00:16:26,000 --> 00:16:28,000
word with each synonym, 

346
00:16:28,000 --> 00:16:32,000
each subsequent synonym would just replace any previous pairing I'd put in. 

347
00:16:32,000 --> 00:16:35,000
If what I really want is a 

348
00:16:35,000 --> 00:16:37,000
one to many relationship, 

349
00:16:37,000 --> 00:16:39,000
one word many options, 

350
00:16:39,000 --> 00:16:41,000
then what I can do is just use vector as the value, 

351
00:16:41,000 --> 00:16:43,000
so build a map 

352
00:16:43,000 --> 00:16:47,000
containing vectors containing strings. So I'll get a nested template out of that, 

353
00:16:47,000 --> 00:16:51,000
and then when I wanna add a new synonym to the map, it's a matter of 

354
00:16:51,000 --> 00:16:55,000
pulling out the existing map that's there and adding it to the end of that vector, so it actually is kind of a two 

355
00:16:55,000 --> 00:16:59,000
step add to 

356
00:16:59,000 --> 00:17:02,000
get it into the vector and get that vector into the table. 

357
00:17:02,000 --> 00:17:04,000
So the last thing that's missing 

358
00:17:04,000 --> 00:17:08,000
is one that I'll go back to my code and we can do together. 

359
00:17:08,000 --> 00:17:10,000
Once I've put the information in the table, 

360
00:17:10,000 --> 00:17:14,000
I hope you believe me that I can get it back fast. I can check to see if there's an 

361
00:17:14,000 --> 00:17:17,000
existing entry and do the lookup by key, 

362
00:17:17,000 --> 00:17:21,000
put new entries in, replace it, overwrite -all of that in the interface so far 

363
00:17:21,000 --> 00:17:22,000
seems just fine. 

364
00:17:22,000 --> 00:17:25,000
But let's say I just wanna see that whole table. 

365
00:17:25,000 --> 00:17:29,000
I wanna print my class list. I wanna see the thesaurus or the dictionary 

366
00:17:29,000 --> 00:17:32,000
just spewed out, or just actually walk through it and look for new words. 

367
00:17:32,000 --> 00:17:37,000
As it stands, the interface doesn't give you that access. They're not indexed. 

368
00:17:37,000 --> 00:17:40,000
They're not in there under Slot 0, Slot 1, or Slot 2. It's not like a 

369
00:17:40,000 --> 00:17:43,000
stack or a queue where you can just dequeue them back out to see all the things you put in 

370
00:17:43,000 --> 00:17:46,000
there. They kinda go in and so far it's the roach motel. You can put them in, and if you 

371
00:17:46,000 --> 00:17:49,000
know who you're looking for, you can get them back, 

372
00:17:49,000 --> 00:17:52,000
but there's no way to kind of scan the contents. 

373
00:17:52,000 --> 00:17:56,000
What we're gonna see is that map provides access to the elements 

374
00:17:56,000 --> 00:17:58,000
using a tool called an iterator. 

375
00:17:58,000 --> 00:18:02,000
This is the same tool that Java uses, so if you've seen that, you're already ahead of 

376
00:18:02,000 --> 00:18:03,000
the game. 

377
00:18:03,000 --> 00:18:04,000
Where 

378
00:18:04,000 --> 00:18:07,000
you would like to see the entries in it, and rather than giving you a whole vector or some 

379
00:18:07,000 --> 00:18:10,000
other kind of random access version of it, 

380
00:18:10,000 --> 00:18:13,000
it provides you with a little intermediary. 

381
00:18:13,000 --> 00:18:15,000
In this case, this class is called iterator. 

382
00:18:15,000 --> 00:18:17,000
You create an iterator 

383
00:18:17,000 --> 00:18:18,000


384
00:18:18,000 --> 00:18:20,000
on 

385
00:18:20,000 --> 00:18:22,000
the class. Here, let me just go 

386
00:18:22,000 --> 00:18:25,000
write the code. It's better to see that. So if I got here and I said I wanna see what 

387
00:18:25,000 --> 00:18:26,000
they are, 

388
00:18:26,000 --> 00:18:31,000
what I do is I create an iterator. The map's iterator name is a little bit 

389
00:18:31,000 --> 00:18:34,000
wacky. 

390
00:18:34,000 --> 00:18:37,000
There are iterators on both the map and the set, and to distinguish them because 

391
00:18:37,000 --> 00:18:42,000
they're similar but not exactly the same thing, the iterator type is actually 

392
00:18:42,000 --> 00:18:46,000
declared within the map class itself. So it's actually map 

393
00:18:46,000 --> 00:18:50,000
of the type in angle brackets colon colon iterator is the name of the 

394
00:18:50,000 --> 00:18:51,000
iterator for 

395
00:18:51,000 --> 00:18:55,000
an iterator that walks over a map containing integers. 

396
00:18:55,000 --> 00:18:57,000
We get one of those 

397
00:18:57,000 --> 00:18:58,000
from the map 

398
00:18:58,000 --> 00:19:02,000
by asking for it with the iterator member function. 

399
00:19:02,000 --> 00:19:04,000
So that sets you up an iterator that's positioned to kind of work its way 

400
00:19:04,000 --> 00:19:05,000
through it. 

401
00:19:05,000 --> 00:19:09,000
The map iterator makes no guarantee about 

402
00:19:09,000 --> 00:19:12,000
what order it will visit them, 

403
00:19:12,000 --> 00:19:14,000
but it will visit them all. It 

404
00:19:14,000 --> 00:19:21,000
uses a syntax that looks like this. And 

405
00:19:21,000 --> 00:19:22,000
I say 

406
00:19:22,000 --> 00:19:26,000
iter.hasNext -are there more things to retrieve? So the iterator's kind of keeping track of 

407
00:19:26,000 --> 00:19:29,000
where we are as we're working our way through the map. 

408
00:19:29,000 --> 00:19:32,000
If there are more keys we haven't yet visited, haven't yet retrieved from the 

409
00:19:32,000 --> 00:19:36,000
iterator, it will return true in which case the call to iter.next will 

410
00:19:36,000 --> 00:19:39,000
retrieve the next key to be visited, 

411
00:19:39,000 --> 00:19:41,000
and then advance the iter, 

412
00:19:41,000 --> 00:19:45,000
so it will have moved past that and kind of moved that to the already visited side, 

413
00:19:45,000 --> 00:19:48,000
and now it will continue to work on the things that are 

414
00:19:48,000 --> 00:19:51,000
unvisited. As I keep doing that, iter.next is 

415
00:19:51,000 --> 00:19:55,000
both retrieving it and advancing it, so you typically wanna call that once inside the 

416
00:19:55,000 --> 00:19:55,000
loop 

417
00:19:55,000 --> 00:19:57,000
to get the value, 

418
00:19:57,000 --> 00:20:01,000
and then check that hasNext again to see whether there's more and keep going. 

419
00:20:01,000 --> 00:20:02,000
It gives you just the key. 

420
00:20:02,000 --> 00:20:06,000
It doesn't give you the pair. The pair is just an easy call away, 

421
00:20:06,000 --> 00:20:07,000
though. 

422
00:20:07,000 --> 00:20:08,000
So I can say 

423
00:20:08,000 --> 00:20:12,000
key 

424
00:20:12,000 --> 00:20:13,000
equals 

425
00:20:13,000 --> 00:20:17,000
and then M of key. 

426
00:20:17,000 --> 00:20:22,000
And so using the shorthand syntax or the M -the get value, 

427
00:20:22,000 --> 00:20:25,000
either way once I have a key, it's very easy to look up the value. In fact, it's so 

428
00:20:25,000 --> 00:20:28,000
efficient, it means there's actually really no harm in 

429
00:20:28,000 --> 00:20:31,000
just getting the key and going back in to get the value separately. It's still works out 

430
00:20:31,000 --> 00:20:33,000
just fine. 

431
00:20:33,000 --> 00:20:36,000
So when I do this, this should actually iterate through, 

432
00:20:36,000 --> 00:20:38,000
show me every entry in the 

433
00:20:38,000 --> 00:20:40,000
map, 

434
00:20:40,000 --> 00:20:41,000
and the 

435
00:20:41,000 --> 00:20:42,000
count for 

436
00:20:42,000 --> 00:20:45,000
it. So 

437
00:20:45,000 --> 00:20:48,000
one thing I will note is the sequence by which it's traveling through 

438
00:20:48,000 --> 00:20:49,000
seems to be 

439
00:20:49,000 --> 00:20:52,000
effectively random. 

440
00:20:52,000 --> 00:20:56,000
You see my reasonable experience few following somewhere using 

441
00:20:56,000 --> 00:20:58,000
the numbers 1, 4, 2, 

442
00:20:58,000 --> 00:21:02,000
so there is no pattern. There is no rhyme or reason. It turns out it is 

443
00:21:02,000 --> 00:21:06,000
actually internally you're doing something kinda sensible, but it does not guarantee as 

444
00:21:06,000 --> 00:21:07,000
for the iterator 

445
00:21:07,000 --> 00:21:11,000
anything predictable to you about which sequence it will visit them in. It 

446
00:21:11,000 --> 00:21:14,000
says I will get them all before it's all done and said, 

447
00:21:14,000 --> 00:21:18,000
but don't count on seeing them in alphabetical order, reverse alphabetical order, in 

448
00:21:18,000 --> 00:21:20,000
order of increasing 

449
00:21:20,000 --> 00:21:22,000
value or anything clever like that. 

450
00:21:22,000 --> 00:21:23,000
It will get to them 

451
00:21:23,000 --> 00:21:35,000
all. That's all you can be sure of. 

452
00:21:35,000 --> 00:21:38,000
That is the [inaudible] that's up here. Question? On the previous 

453
00:21:38,000 --> 00:21:41,000
page, on the previous slide, you mentioned something about 

454
00:21:41,000 --> 00:21:43,000
[inaudible] 

455
00:21:43,000 --> 00:21:49,000
finding the [inaudible], you would have to use a vector for that? Yeah. So if I had a 

456
00:21:49,000 --> 00:21:52,000
one to many in that case, so maybe it was synonym to vector of strings, then I 

457
00:21:52,000 --> 00:21:56,000
could use the iterator to go through and say how big is this vector compared 

458
00:21:56,000 --> 00:21:59,000
to the other vectors I've seen so far. So I keep track of let's say the [inaudible] so 

459
00:21:59,000 --> 00:22:00,000
far. So you'd run an iterator 

460
00:22:00,000 --> 00:22:03,000
that -so let's just say I wanna find the most frequent word. That's 

461
00:22:03,000 --> 00:22:04,000
about the same operation. Let 

462
00:22:04,000 --> 00:22:06,000
me come back over here. 

463
00:22:06,000 --> 00:22:10,000
If I just wanna see the most frequent, I could do this. 

464
00:22:10,000 --> 00:22:12,000
I could say 

465
00:22:12,000 --> 00:22:14,000
string max, 

466
00:22:14,000 --> 00:22:17,000
and I'll say int 

467
00:22:17,000 --> 00:22:19,000
max count equals zero. 

468
00:22:19,000 --> 00:22:23,000
So the max is empty here. When I get this thing, I say if 

469
00:22:23,000 --> 00:22:26,000
M of key 

470
00:22:26,000 --> 00:22:30,000
is greater than my max count, 

471
00:22:30,000 --> 00:22:35,000
then I want my max to be the key, and my max count to be M 

472
00:22:35,000 --> 00:22:40,000
of the key. So I've 

473
00:22:40,000 --> 00:22:42,000
started right with no assumptions about what the max is, 

474
00:22:42,000 --> 00:22:45,000
and any time I see something that's greater than the max I have so far, I 

475
00:22:45,000 --> 00:22:48,000
update that. So if this were a vector, I would be checking to see that the size 

476
00:22:48,000 --> 00:22:52,000
of a value is bigger than the size of the previous vector. 

477
00:22:52,000 --> 00:22:53,000
When I'm done, 

478
00:22:53,000 --> 00:22:56,000
I can say max 

479
00:22:56,000 --> 00:22:58,000
equals 

480
00:22:58,000 --> 00:23:04,000
max 

481
00:23:04,000 --> 00:23:08,000
count. It turns out “the.” The most common word in there? What a 

482
00:23:08,000 --> 00:23:10,000
surprise, right? It shows up 

483
00:23:10,000 --> 00:23:12,000
42 times. So the same sort of [inaudible] would be used 

484
00:23:12,000 --> 00:23:15,000
if there was some other thing I wanna do. So what the iterator is just giving us 

485
00:23:15,000 --> 00:23:17,000
is a general-purpose way of 

486
00:23:17,000 --> 00:23:19,000
walking through all the values. And then what you wanna do with them when you get 

487
00:23:19,000 --> 00:23:22,000
there is your business. Do you wanna print them? Do you wanna put them in a file? Do you wanna 

488
00:23:22,000 --> 00:23:26,000
compare them to find the top two, or the smallest, or the biggest, or 

489
00:23:26,000 --> 00:23:28,000
whatever is 

490
00:23:28,000 --> 00:23:30,000
up to you. So it's just giving you 

491
00:23:30,000 --> 00:23:33,000
access to that data that you stuck in there 

492
00:23:33,000 --> 00:23:39,000
and get it back out. So what does iter.next do? 

493
00:23:39,000 --> 00:23:40,000
Iter.next is -think of 

494
00:23:40,000 --> 00:23:44,000
it as an abstraction. The idea is that it's almost like it's got a little 

495
00:23:44,000 --> 00:23:46,000
pointer. It's kind of like if I were in charge of walking 

496
00:23:46,000 --> 00:23:48,000
through this class and returning each student, it 

497
00:23:48,000 --> 00:23:50,000
is a pointer 

498
00:23:50,000 --> 00:23:53,000
that given a certain student will return you the student I'm currently pointing 

499
00:23:53,000 --> 00:23:53,000
at, 

500
00:23:53,000 --> 00:23:56,000
and then move to another random student. And at any given point when I'm done them 

501
00:23:56,000 --> 00:24:00,000
all, that hasNext returns false. And so next does two things, which is 

502
00:24:00,000 --> 00:24:03,000
return to you the next key that you haven't yet seen, 

503
00:24:03,000 --> 00:24:07,000
but advances kind of the iterator as a side effect so you can now test for are 

504
00:24:07,000 --> 00:24:11,000
there any more, and if so, the next call will return 

505
00:24:11,000 --> 00:24:12,000
a different one. 

506
00:24:12,000 --> 00:24:15,000
So every time you call, it returns a new unvisited 

507
00:24:15,000 --> 00:24:16,000


508
00:24:16,000 --> 00:24:17,000
key in the map until 

509
00:24:17,000 --> 00:24:21,000
they're all gone. [Inaudible] or 

510
00:24:21,000 --> 00:24:22,000
the next one? 

511
00:24:22,000 --> 00:24:26,000
Well, it returns the next one in the iteration. You can imagine that it's almost like 

512
00:24:26,000 --> 00:24:29,000
a caret that's between characters at any 

513
00:24:29,000 --> 00:24:32,000
given point. It will return the next character, 

514
00:24:32,000 --> 00:24:36,000
but if you call it again, it will return a new one. So it never returns the same 

515
00:24:36,000 --> 00:24:38,000
value unless there was the same 

516
00:24:38,000 --> 00:24:39,000
thing being iterated over. 

517
00:24:39,000 --> 00:24:44,000
So it advances and returns in one step. 

518
00:24:44,000 --> 00:24:47,000
That is key. There are some iterators that don't have that design. They tend to 

519
00:24:47,000 --> 00:24:50,000
return the current one, and then you side effect, the STL iterators for example. 

520
00:24:50,000 --> 00:24:53,000
In this one is kind of more of a Java style iterator. It does both. 

521
00:24:53,000 --> 00:24:57,000
So if you needed to use the value a couple times now, you'd probably store it in a local variable, and 

522
00:24:57,000 --> 00:24:59,000
then use it throughout that thing, and then 

523
00:24:59,000 --> 00:25:07,000
only call iter.next once in each loop. 

524
00:25:07,000 --> 00:25:12,000
There is an alternate way of doing something to every 

525
00:25:12,000 --> 00:25:12,000
pair in a map. I'm 

526
00:25:12,000 --> 00:25:15,000
actually just not gonna show it to you 

527
00:25:15,000 --> 00:25:16,000
because I think actually it's 

528
00:25:16,000 --> 00:25:20,000
something that's overkill. So there is something you can read about in Handout 

529
00:25:20,000 --> 00:25:23,000
14 and look at which is called the mapping function. 

530
00:25:23,000 --> 00:25:26,000
I'm gonna actually just skip it because I think it confuses the 

531
00:25:26,000 --> 00:25:30,000
issue more than it helps. When you know how to do iteration, it's a great way to get 

532
00:25:30,000 --> 00:25:32,000
access to all things in a map, and since the 

533
00:25:32,000 --> 00:25:34,000
other function is actually just a more complicated way to get at the same 

534
00:25:34,000 --> 00:25:39,000
functionality, I won't bore you with it. 

535
00:25:39,000 --> 00:25:41,000
Then let me tell you about set. 

536
00:25:41,000 --> 00:25:42,000
Once you have set, 

537
00:25:42,000 --> 00:25:44,000
you've got it all. 

538
00:25:44,000 --> 00:25:46,000
So map I think is the heavy lifter 

539
00:25:46,000 --> 00:25:48,000
of the class library. The set 

540
00:25:48,000 --> 00:25:53,000
is kind of a close second best in terms of just everyday utility 

541
00:25:53,000 --> 00:25:54,000
value. 

542
00:25:54,000 --> 00:25:57,000
What set is is a little bit like a vector in some ways, 

543
00:25:57,000 --> 00:26:01,000
but it has the added features that 

544
00:26:01,000 --> 00:26:05,000
there is no sequencing implied or 

545
00:26:05,000 --> 00:26:07,000
maintained by it. If you have the set that you've added 3 and 5 to, 

546
00:26:07,000 --> 00:26:10,000
if you add 5 and 3 in the other order, you end up with the same set when you're 

547
00:26:10,000 --> 00:26:13,000
done. Those two sets are equal. They have the same contents, so 

548
00:26:13,000 --> 00:26:17,000
it doesn't consider the order of insertion as important at all. It's just about 

549
00:26:17,000 --> 00:26:20,000
membership. Does it contain these values? 

550
00:26:20,000 --> 00:26:23,000
There are never any duplicate elements. If you have a set containing 3 and 5, and you 

551
00:26:23,000 --> 00:26:26,000
attempt to add 3 again, you don't get the set 3, 3, 5. You just get 3, 5. So kind of like 

552
00:26:26,000 --> 00:26:29,000
the mathematical property that it derives its name from, there 

553
00:26:29,000 --> 00:26:33,000
I just existence. Is the number in or not? 

554
00:26:33,000 --> 00:26:37,000
And adding it again does not change 

555
00:26:37,000 --> 00:26:39,000
its status if it was already there. 

556
00:26:39,000 --> 00:26:41,000
So your typical usage is you make an empty set. 

557
00:26:41,000 --> 00:26:45,000
You can use add, remove, and contains to operate on individual elements. Add this one. 

558
00:26:45,000 --> 00:26:46,000
Remove that one. 

559
00:26:46,000 --> 00:26:48,000
Contains 

560
00:26:48,000 --> 00:26:52,000
-remove has the same behavior that it did on the 

561
00:26:52,000 --> 00:26:53,000


562
00:26:53,000 --> 00:26:57,000
map which is it kind of silently fails if you ask it to remove something that wasn't there. 

563
00:26:57,000 --> 00:27:00,000
If you ask add to add something that was already there, it silently doesn't change 

564
00:27:00,000 --> 00:27:02,000
anything, so both of those operations just kind of 

565
00:27:02,000 --> 00:27:06,000
quietly deal with the cases that might be a little bit unusual. And then contains will give 

566
00:27:06,000 --> 00:27:10,000
you information about does this number exist in the set. All of those are very efficient, 

567
00:27:10,000 --> 00:27:14,000
can be done many, many times a microsecond without taking any 

568
00:27:14,000 --> 00:27:15,000
processor time at all, so 

569
00:27:15,000 --> 00:27:18,000
that means it's very handy for just throwing things in a set and then later wanting 

570
00:27:18,000 --> 00:27:22,000
to know if it's there, as opposed to walking your way through a vector 

571
00:27:22,000 --> 00:27:25,000
piecemeal to see if something was there. So for example, a vector doesn't have 

572
00:27:25,000 --> 00:27:28,000
anything like this. If you wanna know if 

573
00:27:28,000 --> 00:27:29,000


574
00:27:29,000 --> 00:27:32,000
Julie is in your list of students, and you have them in a vector, the only way to 

575
00:27:32,000 --> 00:27:35,000
know is to walk through it Sub 1, Sub 2, Sub 3 until you 

576
00:27:35,000 --> 00:27:38,000
find it or you run out of entries to check. 

577
00:27:38,000 --> 00:27:41,000
The contains operation does a blinding fast sort of check 

578
00:27:41,000 --> 00:27:44,000
and can almost immediately tell you is Julie in there or not, whether your entries 

579
00:27:44,000 --> 00:27:46,000
are a thousand, 

580
00:27:46,000 --> 00:27:49,000
a million, a billion, 

581
00:27:49,000 --> 00:27:53,000
pretty much immediate access to dig it out. 

582
00:27:53,000 --> 00:27:56,000
It also offers these high level operations, and 

583
00:27:56,000 --> 00:27:59,000
these actually are where sets are particularly valuable is this 

584
00:27:59,000 --> 00:28:02,000
idea of unioning, intersection, subtracting, checking to see whether two sets are equal, so whether they 

585
00:28:02,000 --> 00:28:05,000
have all the same numbers, or whether one has a subset of another 

586
00:28:05,000 --> 00:28:08,000
-that allow you to model -a lot of times the thing you want to 

587
00:28:08,000 --> 00:28:12,000
take your code to do is something that in the real world needs a subset or an 

588
00:28:12,000 --> 00:28:14,000
intersect kind of operation. 

589
00:28:14,000 --> 00:28:17,000
You'd like to know if the courses you have taken 

590
00:28:17,000 --> 00:28:20,000
meet the requirements you need to graduate. Well, if the requirements 

591
00:28:20,000 --> 00:28:23,000
to graduate are a subset of what you take, whether they're a proper subset or not, if they're 

592
00:28:23,000 --> 00:28:25,000
all in there, then you can graduate. 

593
00:28:25,000 --> 00:28:27,000
If you wanna know 

594
00:28:27,000 --> 00:28:30,000
if you and I are taking any classes together, or have any requirements 

595
00:28:30,000 --> 00:28:33,000
that we both need to do, we can take the requirements, see what we have, subtract 

596
00:28:33,000 --> 00:28:35,000
what's left, intersect that 

597
00:28:35,000 --> 00:28:38,000
to find out what we have left to see if there's any classes we could take 

598
00:28:38,000 --> 00:28:41,000
together and both satisfy requirements. 

599
00:28:41,000 --> 00:28:44,000
Any kind of compound Boolean queries like when you're trying to do 

600
00:28:44,000 --> 00:28:47,000
searches on an index, you wanna see pages that have this word, and this word, 

601
00:28:47,000 --> 00:28:49,000
or that word, and not that 

602
00:28:49,000 --> 00:28:51,000
word are very easily modeled in terms of sets. 

603
00:28:51,000 --> 00:28:54,000
You have the set that have A and the set that have B. If you 

604
00:28:54,000 --> 00:28:57,000
intersect them, it will tell you about which things have both. 

605
00:28:57,000 --> 00:29:01,000
If you wanna see the “or,” you can use the union and find out 

606
00:29:01,000 --> 00:29:02,000
which things have either. 

607
00:29:02,000 --> 00:29:04,000
Sometimes you use sets 

608
00:29:04,000 --> 00:29:07,000
simply for this idea of coalescing duplicates. 

609
00:29:07,000 --> 00:29:10,000
If I just wanted to see the set of words that were in my file, and I don't care about 

610
00:29:10,000 --> 00:29:12,000
how many times a word occurs, 

611
00:29:12,000 --> 00:29:15,000
I could just dump them all into a set, and then when I'm done, I will know 

612
00:29:15,000 --> 00:29:18,000
what the 512 unique words are, 

613
00:29:18,000 --> 00:29:19,000
and I will have 

614
00:29:19,000 --> 00:29:22,000
not had to chase down did I already have a copy of that word because the set's add 

615
00:29:22,000 --> 00:29:28,000
just automatically coalesces with any existing entry that matches. 

616
00:29:28,000 --> 00:29:29,000
So let me show 

617
00:29:29,000 --> 00:29:31,000
you its interface. This is probably 

618
00:29:31,000 --> 00:29:34,000
the biggest of all our classes in terms of number of 

619
00:29:34,000 --> 00:29:36,000
member functions there, 

620
00:29:36,000 --> 00:29:38,000
and features of them. 

621
00:29:38,000 --> 00:29:41,000
I'm not gonna talk about the constructor just yet because it has 

622
00:29:41,000 --> 00:29:45,000
something scary in it that we're gonna come back to which is a little bit fancy 

623
00:29:45,000 --> 00:29:48,000
in terms of how it works. 

624
00:29:48,000 --> 00:29:49,000


625
00:29:49,000 --> 00:29:52,000
There's a default argument over there. I'll note that, so that means that actually if 

626
00:29:52,000 --> 00:29:53,000
you don't specify, 

627
00:29:53,000 --> 00:29:57,000
in some situations the set doesn't need that argument and can kind of figure out for 

628
00:29:57,000 --> 00:29:58,000
itself what to do. 

629
00:29:58,000 --> 00:30:02,000
So we can create a set. We can ask for the size. Is empty 

630
00:30:02,000 --> 00:30:04,000
-just checking to see if the size is zero. 

631
00:30:04,000 --> 00:30:06,000
The things that operate on a single element here, 

632
00:30:06,000 --> 00:30:09,000
adding, removing, containing 

633
00:30:09,000 --> 00:30:12,000
-and then ones that operate on the set as a whole comparing it to another 

634
00:30:12,000 --> 00:30:14,000
set. 

635
00:30:14,000 --> 00:30:18,000
That set there is one of the few places where you're allowed to use the name set 

636
00:30:18,000 --> 00:30:22,000
without the qualification. That's because we're in the template, 

637
00:30:22,000 --> 00:30:25,000
and in this situation, this set without qualification is assumed to be whatever 

638
00:30:25,000 --> 00:30:27,000
set is currently being built. 

639
00:30:27,000 --> 00:30:30,000
So if I have a set of int, 

640
00:30:30,000 --> 00:30:33,000
the equals method for set of int 

641
00:30:33,000 --> 00:30:35,000
expects another set of int as the argument. 

642
00:30:35,000 --> 00:30:39,000
Same for is subset, and union, and intersect, so it actually kinda matches it out 

643
00:30:39,000 --> 00:30:42,000
correctly all the way through, so you can only union sets of integers 

644
00:30:42,000 --> 00:30:46,000
with other sets of integers. That should make sense to you. 

645
00:30:46,000 --> 00:30:51,000
These operations return the Boolean. These guys actually destructively 

646
00:30:51,000 --> 00:30:55,000
modify the receiver. So you have a set that you're a 

647
00:30:55,000 --> 00:30:56,000
messaging with a union with. 

648
00:30:56,000 --> 00:31:00,000
You give it another set. It will join into the receiver set, so the 

649
00:31:00,000 --> 00:31:03,000
receiver set will be enlarged by whatever new members are contributed by 

650
00:31:03,000 --> 00:31:04,000
the other set. 

651
00:31:04,000 --> 00:31:08,000
Intersect could potentially shrink this set 

652
00:31:08,000 --> 00:31:11,000
down to just those elements that are in common with this other one. And then 

653
00:31:11,000 --> 00:31:15,000
subtract takes all the elements that are in there that are present in here and removes them. 

654
00:31:15,000 --> 00:31:17,000
So you can think of those as 

655
00:31:17,000 --> 00:31:18,000


656
00:31:18,000 --> 00:31:20,000
just modeling what the 

657
00:31:20,000 --> 00:31:23,000
[inaudible] formula for those operations are. 

658
00:31:23,000 --> 00:31:25,000
It also uses an iterator. 

659
00:31:25,000 --> 00:31:28,000
Like the map, once you stick them in there, they're not indexed. They're not by 

660
00:31:28,000 --> 00:31:30,000
number. There's no way to say give me the nth element. 

661
00:31:30,000 --> 00:31:32,000


662
00:31:32,000 --> 00:31:32,000


663
00:31:32,000 --> 00:31:36,000
There's no sequencing to them, so you use an iterator if you wanna walk through a set and 

664
00:31:36,000 --> 00:31:39,000
see what's in there. So let me 

665
00:31:39,000 --> 00:31:41,000
write you a little set 

666
00:31:41,000 --> 00:31:45,000
code. Any questions about its basic interface? 

667
00:31:45,000 --> 00:31:51,000
Why would somebody use a vector over a set? Sometimes you actually do care about 

668
00:31:51,000 --> 00:31:54,000
ordering, or you do want duplicates. You might have a bunch of names 

669
00:31:54,000 --> 00:31:56,000
where maybe some people are gonna have the same names, 

670
00:31:56,000 --> 00:31:58,000
or the ordering really is -that 

671
00:31:58,000 --> 00:32:01,000
you wanna know who's in what room in a dorm, you might be using the index to say it's 

672
00:32:01,000 --> 00:32:04,000
in Room 100. That's at what index? So definitely when you care about that index, 

673
00:32:04,000 --> 00:32:05,000


674
00:32:05,000 --> 00:32:07,000
and where you also want that random access 

675
00:32:07,000 --> 00:32:10,000
that I know a particular number, and I wanna see what's in that box. There really 

676
00:32:10,000 --> 00:32:14,000
isn't that same feature in a set. There's no way to get the nth member. 

677
00:32:14,000 --> 00:32:17,000
If you just wanted to pick a random member out of a set, 

678
00:32:17,000 --> 00:32:20,000
the only way to do it would be to open up the iterator and take a random 

679
00:32:20,000 --> 00:32:23,000
number of steps forward, whereas in a vector you can say just pick from zero 

680
00:32:23,000 --> 00:32:24,000
to N minus one. So there 

681
00:32:24,000 --> 00:32:28,000
are definitely things that vector can do that set doesn't, 

682
00:32:28,000 --> 00:32:31,000
and it just depends on the task which you have. Do you need 

683
00:32:31,000 --> 00:32:31,000


684
00:32:31,000 --> 00:32:35,000
duplicates? You're definitely stuck with vector. Do you care about random access? Do you 

685
00:32:35,000 --> 00:32:37,000
use the index in some interesting way? 

686
00:32:37,000 --> 00:32:40,000
Do you care about recording the sequence, like knowing who was first or last that 

687
00:32:40,000 --> 00:32:42,000
you added? 

688
00:32:42,000 --> 00:32:42,000
Set 

689
00:32:42,000 --> 00:32:47,000
doesn't track those things in the same way. So let me write 

690
00:32:47,000 --> 00:32:51,000
a little code that uses a set, 

691
00:32:51,000 --> 00:32:54,000
and 

692
00:32:54,000 --> 00:32:56,000


693
00:32:56,000 --> 00:33:00,000
let's write this. 

694
00:33:00,000 --> 00:33:03,000
I'm gonna write something that tests the random number generator. Every time I do this it 

695
00:33:03,000 --> 00:33:06,000
alarms me, but I think it's good to know. 

696
00:33:06,000 --> 00:33:08,000


697
00:33:08,000 --> 00:33:12,000
I'm gonna keep a list of the numbers I've seen, 

698
00:33:12,000 --> 00:33:14,000
and I'm going to write a for loop 

699
00:33:14,000 --> 00:33:16,000
that then 

700
00:33:16,000 --> 00:33:23,000
-I don't know. Let's just run it until we see a duplicate. 

701
00:33:23,000 --> 00:33:25,000
I'm going to 

702
00:33:25,000 --> 00:33:29,000
generate a number between 1 and 100. If 

703
00:33:29,000 --> 00:33:32,000
seen.contains that number, 

704
00:33:32,000 --> 00:33:34,000
then I'm gonna break. 

705
00:33:34,000 --> 00:33:37,000
Let me do a count. No, they'll be fine. 

706
00:33:37,000 --> 00:33:41,000
Otherwise, I'm going to add it. 

707
00:33:41,000 --> 00:33:43,000
And then I'm gonna 

708
00:33:43,000 --> 00:33:44,000
print 

709
00:33:44,000 --> 00:33:46,000


710
00:33:46,000 --> 00:33:48,000
found 

711
00:33:48,000 --> 00:33:51,000
seen.size 

712
00:33:51,000 --> 00:33:55,000
or repeat. 

713
00:33:55,000 --> 00:33:59,000
So what you might be hoping is that if random number 

714
00:33:59,000 --> 00:34:01,000
generator was still truly random 

715
00:34:01,000 --> 00:34:01,000


716
00:34:01,000 --> 00:34:04,000
but still a little bit predictable -in this case, if I asked it for the numbers between 1 and 

717
00:34:04,000 --> 00:34:07,000
100, you'd like to think it would take about 100 iterations before it would get 

718
00:34:07,000 --> 00:34:08,000
back to a number that's seen. 

719
00:34:08,000 --> 00:34:12,000
But certainly given it's random, it's not actually picking and choosing 

720
00:34:12,000 --> 00:34:13,000
without replacement. 

721
00:34:13,000 --> 00:34:17,000
They're just kinda bopping around the space 1 to 100, and it may very well 

722
00:34:17,000 --> 00:34:20,000
light back on a number it's already seen before it would get around to some of the 

723
00:34:20,000 --> 00:34:21,000
other numbers. 

724
00:34:21,000 --> 00:34:24,000
So when you run this, you're kinda curious to see what is it like. 

725
00:34:24,000 --> 00:34:26,000
Let's find out. 

726
00:34:26,000 --> 00:34:31,000
Every time I do this I always think, “Wow. The random number generator really not that random,” but 

727
00:34:31,000 --> 00:34:33,000
we'll see what today says. I'm 

728
00:34:33,000 --> 00:34:36,000
including random to get access to random integer. 

729
00:34:36,000 --> 00:34:38,000
And I'm going to put 

730
00:34:38,000 --> 00:34:39,000
my set here. 

731
00:34:39,000 --> 00:34:42,000
And I'm gonna add one called randomize here at the top. 

732
00:34:42,000 --> 00:34:46,000
That randomize initialized the library so that we have 

733
00:34:46,000 --> 00:34:50,000
a new random sequence for this particular run which will allow us to have different 

734
00:34:50,000 --> 00:34:51,000
results from run to run. It 

735
00:34:51,000 --> 00:34:54,000
said 24 numbers before repeat. 

736
00:34:54,000 --> 00:34:56,000
Okay. Let's run that again. 

737
00:34:56,000 --> 00:35:00,000
Seven before repeat. Let's 

738
00:35:00,000 --> 00:35:03,000
try it again. 20 before repeat. 

739
00:35:03,000 --> 00:35:08,000
So showing you a little bit about the -seven again. Six. 

740
00:35:08,000 --> 00:35:09,000
Ten. 

741
00:35:09,000 --> 00:35:11,000


742
00:35:11,000 --> 00:35:12,000
Like is it ever going to get close to 100? 

743
00:35:12,000 --> 00:35:14,000
No. 

744
00:35:14,000 --> 00:35:15,000
So 

745
00:35:15,000 --> 00:35:19,000
it just tells you that it bops around, but it actually does not 

746
00:35:19,000 --> 00:35:22,000
-it is pseudorandom, which is one of the words that computer scientists use to say, 

747
00:35:22,000 --> 00:35:26,000
“Yeah, don't count on it being really random.” I would not wanna run my casino 

748
00:35:26,000 --> 00:35:30,000
on the basis of numbers being generated by your average C++ 

749
00:35:30,000 --> 00:35:31,000
random library. 

750
00:35:31,000 --> 00:35:34,000
There you go. So in this case, just using it for 

751
00:35:34,000 --> 00:35:34,000
containment 

752
00:35:34,000 --> 00:35:38,000
so that each time through I can add that new number and then check 

753
00:35:38,000 --> 00:35:42,000
the contains. Both those operations act operating very quickly. If I did this 

754
00:35:42,000 --> 00:35:45,000
instead with a vector, I could accomplish the same thing, but it'd just be more 

755
00:35:45,000 --> 00:35:45,000
manual. 

756
00:35:45,000 --> 00:35:47,000
I'd stick it on the end, 

757
00:35:47,000 --> 00:35:50,000
and if I wanted to see if it was there, contains is not an operation 

758
00:35:50,000 --> 00:35:53,000
vector supports, so I'd have to walk through the vector from zero to the length minus one, 

759
00:35:53,000 --> 00:35:56,000
and try to do the matching myself, 

760
00:35:56,000 --> 00:35:59,000
which gets slower and slower as the vector gets longer. 

761
00:35:59,000 --> 00:36:02,000
It's just a hassle to write that code, so having contains around 

762
00:36:02,000 --> 00:36:05,000
saved us a little bit. 

763
00:36:05,000 --> 00:36:06,000
If I want to print 

764
00:36:06,000 --> 00:36:10,000
my sets, why don't go ahead and just at the bottom show that the iterator gets 

765
00:36:10,000 --> 00:36:11,000
used on the set, 

766
00:36:11,000 --> 00:36:13,000


767
00:36:13,000 --> 00:36:17,000
has the similar kind of formed name as the map, so the iterator -in this 

768
00:36:17,000 --> 00:36:21,000
case capital I is a nested class declared within the set. 

769
00:36:21,000 --> 00:36:25,000
I ask the set to give me its iterator 

770
00:36:25,000 --> 00:36:28,000
while there are things left 

771
00:36:28,000 --> 00:36:31,000
to pull out, 

772
00:36:31,000 --> 00:36:39,000
then I will pull out it and print it. 

773
00:36:39,000 --> 00:36:43,000
And so there's the sequence of let's say 20 or so numbers. 

774
00:36:43,000 --> 00:36:45,000
So this 

775
00:36:45,000 --> 00:36:47,000
highlights something that's a little bit 

776
00:36:47,000 --> 00:36:52,000
distinctive about the iterator as it applies to the set is that those numbers 

777
00:36:52,000 --> 00:36:55,000
-15, 16, 22, 24, 37 -are coming out in increasing 

778
00:36:55,000 --> 00:36:57,000
order 

779
00:36:57,000 --> 00:37:00,000
all the way down to the bottom. And if I run it again, 

780
00:37:00,000 --> 00:37:01,000


781
00:37:01,000 --> 00:37:04,000
I will get the same 

782
00:37:04,000 --> 00:37:07,000
effect, but perhaps on a shorter list. 

783
00:37:07,000 --> 00:37:09,000
That the sets iterator 

784
00:37:09,000 --> 00:37:11,000
is not 

785
00:37:11,000 --> 00:37:14,000
as unpredictable as the map iterator was 

786
00:37:14,000 --> 00:37:18,000
-that the set iterator -that in fact part of how the set is able 

787
00:37:18,000 --> 00:37:21,000
to supply its operations so efficiently is that internally it is using some notion of 

788
00:37:21,000 --> 00:37:22,000
sorting. 

789
00:37:22,000 --> 00:37:26,000
It is keeping track of things by ordering. In this case, the increasing 

790
00:37:26,000 --> 00:37:28,000
order is the default 

791
00:37:28,000 --> 00:37:30,000
strategy for how it lines things up 

792
00:37:30,000 --> 00:37:33,000
-and that the iterator takes advantage of that. It is internally 

793
00:37:33,000 --> 00:37:36,000
storing in sorted order. It might as well just use the iterator to walk them in sorted order, and that tends to 

794
00:37:36,000 --> 00:37:39,000
be convenient for somebody who wants to process this set. 

795
00:37:39,000 --> 00:37:43,000
So as a result, you can count on that, that when you're using a set, 

796
00:37:43,000 --> 00:37:47,000
that the iterator will put out the values from kinda smallest to largest. So 

797
00:37:47,000 --> 00:37:51,000
for example, for strings it would be the lexicographic ordering, that kind of slightly 

798
00:37:51,000 --> 00:37:54,000
alphabetical thing based on ASCII codes 

799
00:37:54,000 --> 00:37:58,000
that it would produce them in. In numbers, they'll go from smallest to largest 

800
00:37:58,000 --> 00:38:02,000
as a convenience, and it happens to be kinda nice for just browsing it in alphabetical 

801
00:38:02,000 --> 00:38:04,000
order. Often it's something that's handy 

802
00:38:04,000 --> 00:38:10,000
to be able to do. We 

803
00:38:10,000 --> 00:38:14,000
head back over here. 

804
00:38:14,000 --> 00:38:15,000
And so the 

805
00:38:15,000 --> 00:38:16,000
code we just 

806
00:38:16,000 --> 00:38:18,000
wrote, printing the set, 

807
00:38:18,000 --> 00:38:20,000
doing the random test, 

808
00:38:20,000 --> 00:38:25,000
I can apparently reproduce my own code on demand. And then here 

809
00:38:25,000 --> 00:38:29,000
it just shows a little bit of some of the fancier features you can start going 

810
00:38:29,000 --> 00:38:31,000
once you have sets in play. 

811
00:38:31,000 --> 00:38:33,000
So if I were keeping track of 

812
00:38:33,000 --> 00:38:36,000
for the friends that I know who they consider to be their friends, and 

813
00:38:36,000 --> 00:38:39,000
who they consider to be their enemies -of course, those are very small lists I'm sure 

814
00:38:39,000 --> 00:38:40,000
for you indeed, 

815
00:38:40,000 --> 00:38:42,000
but if I were putting together a party, 

816
00:38:42,000 --> 00:38:45,000
what I might wanna do is say let's invite everybody friends with and everybody 

817
00:38:45,000 --> 00:38:48,000
I'm friends with, but then let's make sure we don't invite anybody that one of 

818
00:38:48,000 --> 00:38:50,000
us doesn't like, that's on our enemy list. 

819
00:38:50,000 --> 00:38:52,000
And so by starting with 

820
00:38:52,000 --> 00:38:57,000
a copy of the friends that one has, unioning that with the two's friends, right 

821
00:38:57,000 --> 00:39:02,000
now I've got the enlarged set which includes -note that I did a set string 

822
00:39:02,000 --> 00:39:06,000
result equals one friends. That actually does a complete deep copy, so the map and the 

823
00:39:06,000 --> 00:39:09,000
set just like our other containers know how to copy themselves and produce new things. So 

824
00:39:09,000 --> 00:39:11,000
I got a new set that was 

825
00:39:11,000 --> 00:39:13,000
initialized with the contents of one's friends. 

826
00:39:13,000 --> 00:39:17,000
I took that set and destructively modified it to add two's friends, so now 

827
00:39:17,000 --> 00:39:20,000
that result has an enlarged circle that 

828
00:39:20,000 --> 00:39:22,000
-and then I'm further destructively modifying it 

829
00:39:22,000 --> 00:39:27,000
to remove anyone who was on my enemy list and your enemy list 

830
00:39:27,000 --> 00:39:28,000
to get us down to kind of the 

831
00:39:28,000 --> 00:39:31,000
happy 

832
00:39:31,000 --> 00:39:35,000
set of folks that either of us likes and neither of us 

833
00:39:35,000 --> 00:39:38,000
dislikes to do that. So things that you can imagine if you had 

834
00:39:38,000 --> 00:39:40,000
to do this with vectors would start to get pretty ugly. 

835
00:39:40,000 --> 00:39:44,000
Walking down your list and my list to see who's matching, to see who you have, 

836
00:39:44,000 --> 00:39:45,000
I 

837
00:39:45,000 --> 00:39:48,000
don't, to make sure that everybody got one and only one invitation. The set is 

838
00:39:48,000 --> 00:39:51,000
automatically handling all the coalescing of duplicates for us. 

839
00:39:51,000 --> 00:39:53,000
And then allowing you this kind of 

840
00:39:53,000 --> 00:39:55,000
high level expression of your 

841
00:39:55,000 --> 00:39:58,000
actions is very powerful. If in the 

842
00:39:58,000 --> 00:40:02,000
end what you're trying to model is this kind of rearrangement, then being 

843
00:40:02,000 --> 00:40:04,000
able to union, subtract, intersect 

844
00:40:04,000 --> 00:40:07,000
really helps the clarity of your code. Somebody can just read it and say, “Oh, I see what 

845
00:40:07,000 --> 00:40:10,000
they're doing here,” doing set operations 

846
00:40:10,000 --> 00:40:16,000
to get to where we wanna be. Question? [Inaudible] 

847
00:40:16,000 --> 00:40:18,000
the same enemies, so when you subtract 

848
00:40:18,000 --> 00:40:20,000
one's enemies, 

849
00:40:20,000 --> 00:40:27,000
and then you [inaudible] again with two it's not there. It turns out subtract will say remove these things if they're present. So if 

850
00:40:27,000 --> 00:40:30,000
you and I have some of the same enemies, after I've removed them from your list, I'll 

851
00:40:30,000 --> 00:40:33,000
remove them from my list. It turns out they won't be there, but it doesn't complain. 

852
00:40:33,000 --> 00:40:37,000
Like the remove operation, if you ask it to subtract some things, 

853
00:40:37,000 --> 00:40:38,000


854
00:40:38,000 --> 00:40:39,000
it doesn't get upset about it. 

855
00:40:39,000 --> 00:40:46,000
It just skips over them basically. 

856
00:40:46,000 --> 00:40:48,000
The set though 

857
00:40:48,000 --> 00:40:52,000
is a little bit more quirky than the others in one kind of peculiar way. 

858
00:40:52,000 --> 00:40:56,000
I'm gonna get you -foreshadow this. I'm actually gonna do more 

859
00:40:56,000 --> 00:40:58,000
serious work on this on Friday 

860
00:40:58,000 --> 00:41:02,000
-is that the other containers kinda store and retrieve things. They're putting them in buckets, or 

861
00:41:02,000 --> 00:41:05,000
in slabs, or boxes, or whatever, and returning them back to you. But set is 

862
00:41:05,000 --> 00:41:09,000
actually really has a much more intimate relationship with the data that it's 

863
00:41:09,000 --> 00:41:10,000
storing 

864
00:41:10,000 --> 00:41:13,000
that it really is examining the things that you give it 

865
00:41:13,000 --> 00:41:16,000
to make sure for example that any existing copy 

866
00:41:16,000 --> 00:41:18,000
is 

867
00:41:18,000 --> 00:41:21,000
coalesced with this one, that when you ask it to go find something it has to do 

868
00:41:21,000 --> 00:41:23,000
some kind of matching operation to say do I have something that looks like 

869
00:41:23,000 --> 00:41:27,000
this. It's doing this sorting internally, as I said. It's keeping them ordered. 

870
00:41:27,000 --> 00:41:28,000
Well, it needs to know something about 

871
00:41:28,000 --> 00:41:29,000
how to do that ordering, 

872
00:41:29,000 --> 00:41:32,000
what it means for two elements to be the same, what it means for one element to 

873
00:41:32,000 --> 00:41:36,000
precede another or follow another in some ordering. 

874
00:41:36,000 --> 00:41:37,000
Okay. 

875
00:41:37,000 --> 00:41:39,000
But set is written as a template. 

876
00:41:39,000 --> 00:41:43,000
Set takes any kind of elem type thing. 

877
00:41:43,000 --> 00:41:46,000
How is it you compare two things 

878
00:41:46,000 --> 00:41:47,000
generically? 

879
00:41:47,000 --> 00:41:49,000


880
00:41:49,000 --> 00:41:50,000
That actually is not 

881
00:41:50,000 --> 00:41:52,000
an easy problem. 

882
00:41:52,000 --> 00:41:57,000
I'll show you a little bit off this code, and then we'll talk about this on Friday more -is that 

883
00:41:57,000 --> 00:42:00,000
one way you could do it is you could assume that if I take the two elements, I could just 

884
00:42:00,000 --> 00:42:03,000
use equals equals and less than. If I just 

885
00:42:03,000 --> 00:42:06,000
plug them into the built-in operators and say, “Tell me. Are they equal? What does equal 

886
00:42:06,000 --> 00:42:07,000
equal say? 

887
00:42:07,000 --> 00:42:10,000
Is one of them less than the other? Tell me that.” 

888
00:42:10,000 --> 00:42:14,000
Using the built-in operators will work for some types of things you would store. It'll 

889
00:42:14,000 --> 00:42:17,000
work for ints. It'll work doubles. It'll work for characters. 

890
00:42:17,000 --> 00:42:19,000
So all the primitive types 

891
00:42:19,000 --> 00:42:24,000
have relational operator behavior. Even things like string, which is a sort of 

892
00:42:24,000 --> 00:42:26,000
fancier type 

893
00:42:26,000 --> 00:42:29,000
also has behavior for equals equals and less than. But 

894
00:42:29,000 --> 00:42:32,000
you start throwing things like student structures into a set, 

895
00:42:32,000 --> 00:42:35,000
and I say take two students and say if they're equal equal, we're gonna run into 

896
00:42:35,000 --> 00:42:36,000
some trouble. 

897
00:42:36,000 --> 00:42:39,000
So I'll show you that error message, and we'll talk more about what we can do 

898
00:42:39,000 --> 00:42:43,000
about it when I see you again on Friday. If you 

899
00:42:43,000 --> 00:42:46,000
have something you need to talk to me about, now would be a good time before I run away, so if you wanted to see me today at