2
00:00:12,868 --> 00:00:16,128
This presentation is delivered by the Stanford Center for Professional 

3
00:00:16,129 --> 00:00:23,129
Development. 

4
00:00:24,939 --> 00:00:26,178
So with that said, 

5
00:00:26,178 --> 00:00:28,899
any questions about strings? We're gonna do a bunch more stuff today 

6
00:00:28,899 --> 00:00:32,058
with strings and characters. But if there's any questions before we actually 

7
00:00:32,058 --> 00:00:33,449
dive in to things, 

8
00:00:33,450 --> 00:00:36,630
let me know now. And if you could use the microphone, that would be great. One more time, take those 

9
00:00:36,630 --> 00:00:40,570
microphones out, hold them close to your heart. Air and gear, they're lots of fun, they're your 

10
00:00:40,570 --> 00:00:41,479
friend. Keep 

11
00:00:41,479 --> 00:00:43,279
the microphone with you. 

12
00:00:43,280 --> 00:00:48,490
Actually, sorry, about the mid-term, what's the cutoff for the mid-term in terms of, like, classes? Right,

13
00:00:48,490 --> 00:00:51,800
so the mid-term for stuff you need to know, the cutoff will be Wednesday's 

14
00:00:51,799 --> 00:00:52,628
class. 

15
00:00:52,628 --> 00:00:55,939
So, basically, you'll have a whole week of material that you won't need to 

16
00:00:55,939 --> 00:00:57,250
be responsible for 

17
00:00:57,250 --> 00:01:00,058
that will be from this Wednesday up until the mid-term. 

18
00:01:00,058 --> 00:01:03,058
The other thing, though, is to keep in mind that a few people have asked, well, do I need the 

19
00:01:03,058 --> 00:01:05,959
book versus your lectures for the mid-term. 

20
00:01:05,959 --> 00:01:07,399
You need to know 

21
00:01:07,400 --> 00:01:10,780
the lectures, and you need to know all the material from the book that is 

22
00:01:10,780 --> 00:01:13,549
covered with respect to lectures, which is most of the material from the 

23
00:01:13,549 --> 00:01:16,989
book. But there's a few cases where we go over something very quickly in 

24
00:01:16,989 --> 00:01:19,298
class, or I say refer to this page of 

25
00:01:19,299 --> 00:01:21,909
the book or whatever. That stuff you're responsible for knowing. 

26
00:01:21,909 --> 00:01:25,780
Stuff that I've, like, explicitly told you you don't need to know, like, polar coordinates, 

27
00:01:25,780 --> 00:01:27,700
aren't gonna be on the exam, okay. 

28
00:01:27,700 --> 00:01:30,990
So the exam will be more heavily geared towards stuff from lecture, but you still 

29
00:01:30,989 --> 00:01:33,769
should know all the stuff from the book that we've kind of referred to 

30
00:01:33,769 --> 00:01:38,199
in lecture as we've gone along, all righty. All right,

31
00:01:38,200 --> 00:01:42,159
so let's dive into our next great topic. Actually, it's a continuation of our last great 

32
00:01:42,159 --> 00:01:43,909
topic, which is strings. 

33
00:01:43,909 --> 00:01:46,119
And so, if we think about strings a little bit, one 

34
00:01:46,120 --> 00:01:49,120
of the things we might want to do with strings, is we want to do some string 

35
00:01:49,120 --> 00:01:51,120
processing that also involves some characters. 

36
00:01:51,120 --> 00:01:52,740
So how are we gonna do that? 

37
00:01:52,739 --> 00:01:55,519
One thing we might want to do is let's just do a simple example to begin with, 

38
00:01:55,519 --> 00:01:58,810
which is going through a string, and counting the number of uppercase characters in 

39
00:01:58,810 --> 00:02:01,370
the string. And the reason why I'm gonna harp on strings a whole bunch - we talked 

40
00:02:01,370 --> 00:02:02,219
about it last time - 

41
00:02:02,218 --> 00:02:05,178
we're gonna talk about it this time - guess what your next assignment's gonna be. It's

42
00:02:05,179 --> 00:02:08,748
gonna be all about string processing. So it's good stuff to know, okay. 

43
00:02:08,748 --> 00:02:10,718
So we might want to have some function. 

44
00:02:10,718 --> 00:02:12,739
Count uppercase. 

45
00:02:12,739 --> 00:02:15,539
And that's a function I've actually given to you in one of the handouts, so you 

46
00:02:15,539 --> 00:02:18,308
don't need to worry about jotting down all my code real quickly, but you might want 

47
00:02:18,308 --> 00:02:19,829
to pay close attention. 

48
00:02:19,829 --> 00:02:23,670
And what this does, is it gets passed some string, STR,

49
00:02:23,669 --> 00:02:26,969
and it's gonna count how many uppercase characters are in that string. So 

50
00:02:26,969 --> 00:02:28,649
it's gonna return an int. 

51
00:02:28,649 --> 00:02:31,739
And let's just say this is part of some other program, so we'll call this private, 

52
00:02:31,739 --> 00:02:33,500
although you could make it public if it was 

53
00:02:33,500 --> 00:02:36,908
in some class that you wanted to make available for other people to use. 

54
00:02:36,908 --> 00:02:39,229
So if we want to count the number of uppercase characters, 

55
00:02:39,229 --> 00:02:42,289
what do we want to think about doing? What's the kind of standard idiom that 

56
00:02:42,289 --> 00:02:48,868
we use for strings? Anyone remember? 

57
00:02:48,868 --> 00:02:52,598
What? We want to have a for loop, [inaudible] somewhere over here.

58
00:02:52,598 --> 00:02:56,588
Yeah, it's just raining candy on you. We want to have a for loop that goes through all the

59
00:02:56,588 --> 00:02:59,019
characters of the string, sort of counting through the character. 

60
00:02:59,019 --> 00:03:04,229
So we can do that by just saying for int i equals zero, "i" less than

61
00:03:04,229 --> 00:03:06,019
the length of the string, 

62
00:03:06,019 --> 00:03:09,989
right. So STR.length is the method we use to get the length of the string, 

63
00:03:09,989 --> 00:03:13,609
and then i++. And this is gonna loop through all the characters of the 

64
00:03:13,609 --> 00:03:14,549
string. Okay, 

65
00:03:14,549 --> 00:03:15,729
where actually it's gonna loop 

66
00:03:15,729 --> 00:03:18,650
through some number of times, which is the number of characters in the string. 

67
00:03:18,650 --> 00:03:21,769
Now we want to pull out each one of the characters, individually, to check to see if it's an uppercase 

68
00:03:21,769 --> 00:03:22,860
character. 

69
00:03:22,860 --> 00:03:27,009
What method might we use to do that? 

70
00:03:27,008 --> 00:03:31,628
Get a character out of a string in a particular position. 

71
00:03:31,628 --> 00:03:37,278
Come on, I'm begging for it. 

72
00:03:37,278 --> 00:03:37,799
Char at - it's like 

73
00:03:37,800 --> 00:03:39,900
where'd it go? It's just gone, 

74
00:03:39,900 --> 00:03:41,088
char at, and we'll just 

75
00:03:41,088 --> 00:03:43,959
for the delayed reaction, we'll do it in slow 

76
00:03:43,959 --> 00:03:44,729
mo. 

77
00:03:44,729 --> 00:03:49,729
Anyone remember "The Six Million Dollar Man," that show? No, all right. 

78
00:03:49,729 --> 00:03:50,940
Get another 

79
00:03:50,939 --> 00:03:51,408
man. 

80
00:03:51,408 --> 00:03:54,938
I'm just getting so old, I gotta hang it up. And the thing is, I'm not that much 

81
00:03:54,938 --> 00:03:55,609
older than you. But it's just amazing 

82
00:03:55,610 --> 00:03:59,180
what big a difference a few years makes. So 

83
00:03:59,180 --> 00:03:59,979


84
00:03:59,979 --> 00:04:01,249
char CH 

85
00:04:01,248 --> 00:04:06,769
is going to be from this string. We're gonna pull out the char at position

86
00:04:06,769 --> 00:04:09,979
i. So now, we've actually - we're gonna loop through each

87
00:04:09,979 --> 00:04:11,879
character of the string, pulling out that character, 

88
00:04:11,878 --> 00:04:14,248
and we want to check to see if the character's uppercase. 

89
00:04:14,248 --> 00:04:17,848
We could actually have an if statement in here to check to see if that CH is in 

90
00:04:17,848 --> 00:04:21,949
between uppercase A and uppercase Z, which is kind of how you saw last time we could do 

91
00:04:21,949 --> 00:04:23,210
some math on characters. 

92
00:04:23,209 --> 00:04:26,060
We're gonna use the new funky way, which is to actually use one of the 

93
00:04:26,060 --> 00:04:29,370
methods from the character class, and just say if. 

94
00:04:29,370 --> 00:04:33,098
And the way we use the methods from the character class, we specify the name of the 

95
00:04:33,098 --> 00:04:35,188
class here as opposed to the name of an object 

96
00:04:35,189 --> 00:04:38,789
because the methods from the character class are what we refer to as static methods.

97
00:04:38,788 --> 00:04:42,110
There is no object associated with them. They're just methods that you call and

98
00:04:42,110 --> 00:04:43,819
pass into character. 

99
00:04:43,819 --> 00:04:48,699
isUpperCase, because this returns a boolean, and we'll pass it CH to see if CH

100
00:04:48,699 --> 00:04:51,030
is an uppercase character, okay. If it is

101
00:04:51,029 --> 00:04:54,029
an uppercase character, we want to somehow keep track of the number of

102
00:04:54,029 --> 00:04:57,758
uppercase characters we have. So how might we do that? 

103
00:04:57,759 --> 00:05:00,720
Counter, right. So have some int 

104
00:05:00,720 --> 00:05:04,770
count equals zero, up here, that I want initialized. Who said that? It came from 

105
00:05:04,769 --> 00:05:06,158
somewhere over here. 

106
00:05:06,158 --> 00:05:07,379
Come on, raise your hand. 

107
00:05:07,379 --> 00:05:08,360
Don't be shy. 

108
00:05:08,360 --> 00:05:10,199
It's a candy extravaganza. 

109
00:05:10,199 --> 00:05:13,689
So if Character.isUpperCase of CH, then count,

110
00:05:13,689 --> 00:05:15,629
we're just gonna add 1 to. 

111
00:05:15,629 --> 00:05:18,800
Otherwise we're not gonna increment the count. It's not an uppercase character.

112
00:05:18,800 --> 00:05:19,660
And then, 

113
00:05:19,660 --> 00:05:22,630
we end the for loop. So this is gonna go through all the characters of the string.

114
00:05:22,629 --> 00:05:26,490
For every character, check to see if it's uppercase. If it is, increment our count,

115
00:05:26,490 --> 00:05:28,168
and at the end, what we want to do 

116
00:05:28,168 --> 00:05:29,959
is, basically, return 

117
00:05:29,959 --> 00:05:30,818
that count, 

118
00:05:30,819 --> 00:05:33,399
which tells us how many uppercase characters were 

119
00:05:33,399 --> 00:05:35,290
actually in the string, okay. Is there 

120
00:05:35,290 --> 00:05:37,220
any questions about this? This 

121
00:05:37,220 --> 00:05:39,869
is kind of like an example of the sort of vanilla string processing 

122
00:05:39,869 --> 00:05:43,309
you might do. You have some string. You go through all the characters of the string. You 

123
00:05:43,309 --> 00:05:46,189
do some kind of thing for a character of the string. In this case, we're not

124
00:05:46,189 --> 00:05:47,600
creating a new resulting 

125
00:05:47,600 --> 00:05:50,810
string. We're just counting up some number of characters that might be in 

126
00:05:50,810 --> 00:05:53,048
the string. 
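
The counting function described above might look roughly like the following Java sketch. The method name countUppercase and the parameter str follow the lecture; the wrapper class UppercaseCounter and the main method are assumptions added only so the sketch runs on its own, and the method is left package-private rather than private so main and other code can call it.

```java
class UppercaseCounter {
    // Returns how many uppercase characters appear in str.
    static int countUppercase(String str) {
        int count = 0;
        // Standard idiom: visit every index from 0 up to length - 1.
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            // Static method: called on the Character class, not on an object.
            if (Character.isUpperCase(ch)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countUppercase("Stanford loves Cal")); // prints 2
    }
}
```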

127
00:05:53,048 --> 00:05:56,188
So we can do something a little bit more funky. This is kind of fun, but it's sort of 

128
00:05:56,189 --> 00:05:57,120
like, yeah, 

129
00:05:57,120 --> 00:05:58,788
just basic kind of string 

130
00:05:58,788 --> 00:06:02,459
and character stuff. Let's see something a little bit more funky, 

131
00:06:02,459 --> 00:06:03,978
which is actually to do 

132
00:06:03,978 --> 00:06:07,620
some string manipulation to break the string up into smaller pieces. And so what we 

133
00:06:07,620 --> 00:06:09,389
want to do is replace 

134
00:06:09,389 --> 00:06:12,740
some occurrence of a substring in a larger string 

135
00:06:12,740 --> 00:06:15,960
with some other string, sort of like when you work on [inaudible], when you do Find/Replace.

136
00:06:15,959 --> 00:06:16,558
You say, 

137
00:06:16,559 --> 00:06:20,669
hey, find me some little string, or some little word that's actually in my bigger

138
00:06:20,668 --> 00:06:22,370
document. I'm gonna replace it with some other word. We're actually 

139
00:06:22,370 --> 00:06:25,300
gonna implement that as a little function, okay. So 

140
00:06:25,300 --> 00:06:26,560
what this is gonna do, 

141
00:06:26,560 --> 00:06:27,930
we'll call this 

142
00:06:27,930 --> 00:06:29,569
replace 

143
00:06:29,569 --> 00:06:33,150
occurrence just to keep the name short, but, in fact, all we're gonna do is 

144
00:06:33,149 --> 00:06:35,589
replace the very first occurrence in a string. 

145
00:06:35,589 --> 00:06:37,399
So we're gonna get passed in

146
00:06:37,399 --> 00:06:38,609
some string, 

147
00:06:38,610 --> 00:06:39,759
STR, 

148
00:06:39,759 --> 00:06:42,678
and what we want to do is, basically, 

149
00:06:42,678 --> 00:06:44,209
have some 

150
00:06:44,209 --> 00:06:47,029
original string, which is the thing that we want to replace, 

151
00:06:47,029 --> 00:06:50,418
with some replacement string. So we're gonna get passed three parameters

152
00:06:50,418 --> 00:06:51,058
here. 

153
00:06:51,059 --> 00:06:54,338
We'll call this RPL for replace, okay. Which is

154
00:06:54,338 --> 00:06:57,790
the large string, a piece of text that I want to replace some word in, 

155
00:06:57,790 --> 00:07:01,360
the original word that I want to replace, and the thing that I want to replace it 

156
00:07:01,360 --> 00:07:02,620
with, okay. 

157
00:07:02,620 --> 00:07:05,810
And so what I want to do because strings are immutable, right. I can't change the string 

158
00:07:05,810 --> 00:07:06,709
in place. 

159
00:07:06,709 --> 00:07:09,120
I have to actually return a new string, 

160
00:07:09,120 --> 00:07:12,129
which has this original replaced by this string. So, 

161
00:07:12,129 --> 00:07:14,528
this puppy's gonna return the string, 

162
00:07:14,528 --> 00:07:17,749
and we'll just make this private again, although we could have made it public if 

163
00:07:17,749 --> 00:07:20,830
we wanted to have it in the library that other people would use, or a class that other 

164
00:07:20,829 --> 00:07:23,038
people would use, okay. So 

165
00:07:23,038 --> 00:07:25,199
how might we think about the algorithm 

166
00:07:25,199 --> 00:07:28,430
for replacing this original string with the replacement. What's the first thing we 

167
00:07:28,430 --> 00:07:35,269
might want to think about that we want to do with the original string. Do, do, do, do, a 

168
00:07:35,269 --> 00:07:37,709
little concentration music. 

169
00:07:37,709 --> 00:07:40,788
We want to find it, right. We want to see if this original string 

170
00:07:40,788 --> 00:07:44,098
appears somewhere in that string, right, because if it doesn't, we're done.

171
00:07:44,098 --> 00:07:46,769
Thanks for playing, right, but that's actually the good kind of thanks for playing. It's sort of like

172
00:07:46,769 --> 00:07:48,438
you got no more work to do. 

173
00:07:48,439 --> 00:07:51,809
And there's, actually, some methods from the string class that we can use to do 

174
00:07:51,809 --> 00:07:56,529
that. So there's a method in the string class called "indexOf."

175
00:07:56,528 --> 00:08:00,689
And what index of does is I can pass it some string, like the original 

176
00:08:00,689 --> 00:08:01,959
string I want to look up, 

177
00:08:01,959 --> 00:08:03,949
and it will return to me a number. 

178
00:08:03,949 --> 00:08:07,028
That number is the index of the position of the 

179
00:08:07,028 --> 00:08:09,449
first character of this string 

180
00:08:09,449 --> 00:08:12,379
if it appears in the larger string. So 

181
00:08:12,379 --> 00:08:15,499
the larger string is the one that I'm sending the message to, and I'm asking it do 

182
00:08:15,499 --> 00:08:18,899
you have this original string somewhere inside you. If you do, 

183
00:08:18,899 --> 00:08:21,519
return me the index of its first occurrence. 

184
00:08:21,519 --> 00:08:25,758
And if you don't, it returns a negative 1. So I'm gonna assign this thing 

185
00:08:25,759 --> 00:08:29,559
to some variable I'll call index, and first of all, I want to check to see if I have any work to 

186
00:08:29,559 --> 00:08:31,629
do. If 

187
00:08:31,629 --> 00:08:34,399
index is not equal to negative 1, 

188
00:08:34,399 --> 00:08:35,970
then I have some work to do. 

189
00:08:35,970 --> 00:08:39,190
If it is equal to negative 1, that means hey, you know what, you wanted to

190
00:08:39,190 --> 00:08:39,930
replace 

191
00:08:39,929 --> 00:08:43,819
this original string, inside string STR. That original string doesn't 

192
00:08:43,820 --> 00:08:47,140
exist, so I got no work to do. You just called, like, find and replace in the word 

193
00:08:47,139 --> 00:08:47,588
processor, 

194
00:08:47,589 --> 00:08:50,470
and the thing you wanted to find wasn't there, okay. 

195
00:08:50,470 --> 00:08:52,950
So in that case, all I would do 

196
00:08:52,950 --> 00:08:54,860
is I would just return 

197
00:08:54,860 --> 00:08:56,139
STR, 

198
00:08:56,139 --> 00:09:00,120
right. Sort of unchanged, if I assume that I'm not doing what's inside the braces. 

199
00:09:00,120 --> 00:09:02,480
If I do find that string, though, 

200
00:09:02,480 --> 00:09:06,090
I'm gonna get some index, which is not negative 1, which is the position of 

201
00:09:06,090 --> 00:09:09,170
this original string. So let's do a little example just to make this a 

202
00:09:09,169 --> 00:09:11,829
little bit more clear what's going on. 

203
00:09:11,830 --> 00:09:15,310
So if we were to call this function - do, do, do, do, do - 

204
00:09:15,309 --> 00:09:17,779
and 

205
00:09:17,779 --> 00:09:19,600
pass in the string, 

206
00:09:19,600 --> 00:09:22,409
STR. So here's STR that we're gonna pass in. 

207
00:09:22,409 --> 00:09:24,159
We'll just put it in a big box, 

208
00:09:24,159 --> 00:09:28,610
and we'll say, at this point in life, everyone's just friendly. So we say 

209
00:09:28,610 --> 00:09:30,330
Stanford 

210
00:09:30,330 --> 00:09:31,840
loves 

211
00:09:31,840 --> 00:09:33,019
Cal, 

212
00:09:33,019 --> 00:09:36,409
right. Sometimes you have to distort reality in order to make an example. All

213
00:09:36,409 --> 00:09:39,600
right, so we have Stanford loves Cal. That's our original string, STR, and we might 

214
00:09:39,600 --> 00:09:40,360
want to say, 

215
00:09:40,360 --> 00:09:42,669
well, you know, this is, really, not 

216
00:09:42,669 --> 00:09:48,799
always the way life is. Really, the way life is, is we want to replace the occurrence 

217
00:09:48,799 --> 00:09:51,679
in STR

218
00:09:51,679 --> 00:09:54,019
of the word "loves" 

219
00:09:54,019 --> 00:10:00,379
with kind of a more realistic example, like the word "beats," 

220
00:10:00,379 --> 00:10:01,350
right. 

221
00:10:01,350 --> 00:10:03,350
So what we want to do - and then we're gonna - 

222
00:10:03,350 --> 00:10:06,370
this is gonna be some string that comes back; we'll assign it back to STR. And the

223
00:10:06,370 --> 00:10:08,330
question is, when we call this, 

224
00:10:08,330 --> 00:10:12,070
what index are we actually gonna find in here of the original string. So strings we start 

225
00:10:12,070 --> 00:10:16,390
counting from zero. Zero, 1, 2, 3, 4, 5, 

226
00:10:16,389 --> 00:10:18,460
6, 7, 8. 

227
00:10:18,460 --> 00:10:20,600
The nine is where the L is at. 

228
00:10:20,600 --> 00:10:25,830
And it keeps going. And 11, 12, 13, just put these all 

229
00:10:25,830 --> 00:10:30,000
together - 15, 16, 17 is the L and that would be the end of the string. Sorry, the 

230
00:10:30,000 --> 00:10:34,860
numbers are a little bit small. But the key is this L is at 9, okay. 

231
00:10:34,860 --> 00:10:39,620
So when I call STR dot indexOf original, it says, is there the word,

232
00:10:39,620 --> 00:10:42,909
or the string, "loves," up here somewhere in the larger string. Yeah, 

233
00:10:42,909 --> 00:10:46,709
it does. It appears at Index 9 so that's what you get. 
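
As a quick check of the index arithmetic above, here is a small Java sketch using the lecture's example string. The class name IndexOfDemo is an assumption for illustration, and "hates" is just a made-up word that does not occur in the string.

```java
class IndexOfDemo {
    public static void main(String[] args) {
        String str = "Stanford loves Cal";
        // Indexes start at 0: "Stanford" fills 0-7, the space is 8, 'l' is 9.
        System.out.println(str.indexOf("loves")); // prints 9
        // A string that does not occur gives -1.
        System.out.println(str.indexOf("hates")); // prints -1
    }
}
```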

234
00:10:46,710 --> 00:10:51,590
So if I've just gotten Index 9, and what I want to do is construct some new string 

235
00:10:51,590 --> 00:10:55,700
that, essentially, is going to have this portion removed from it, 

236
00:10:55,700 --> 00:10:57,560
how do I want to do that. 

237
00:10:57,559 --> 00:11:01,099
What I want to think about is the way I construct that string, it's from three 

238
00:11:01,100 --> 00:11:01,840
pieces. 

239
00:11:01,840 --> 00:11:04,600
The first piece is everything up to 

240
00:11:04,600 --> 00:11:06,229
the word I want to replace. 

241
00:11:06,229 --> 00:11:07,520
That's Piece No. 1. 

242
00:11:07,519 --> 00:11:11,078
The second piece is the thing that I actually want to replace it with,

243
00:11:11,078 --> 00:11:12,889
the string I'm replacing with, 

244
00:11:12,889 --> 00:11:14,980
right. So this becomes Piece No. 2. 

245
00:11:14,980 --> 00:11:18,970
And then, everything else after the piece I've replaced is Piece No. 3. So 

246
00:11:18,970 --> 00:11:21,509
if I can concatenate those three pieces together, 

247
00:11:21,509 --> 00:11:23,720
I'm going to essentially get the new string, 

248
00:11:23,720 --> 00:11:25,399
which has this part replaced. 

249
00:11:25,399 --> 00:11:29,199
And the question is how do I find the appropriate indexes inside my larger 

250
00:11:29,200 --> 00:11:32,090
string to be able to actually do the replacement, okay. 

251
00:11:32,090 --> 00:11:34,810
So first thing that I'm gonna do here is say 

252
00:11:34,809 --> 00:11:38,559
get me the first portion. So what is, essentially, the substring of the 

253
00:11:38,559 --> 00:11:41,569
original string up to this L position. 

254
00:11:41,570 --> 00:11:44,770
So the way I can do that is I can say STR, 

255
00:11:44,769 --> 00:11:46,960
substring, and 

256
00:11:46,960 --> 00:11:50,009
I'm gonna get the substring starting at zero 'cause I want to start at the 

257
00:11:50,009 --> 00:11:50,899
beginning of the string, 

258
00:11:50,899 --> 00:11:55,129
and I want to go all the way up, but not including the L. That means 

259
00:11:55,129 --> 00:11:58,700
the last position in substring. Remember, in substring you give it two 

260
00:11:58,700 --> 00:12:02,160
indexes. You give it the starting point, and the position up to, but not including 

261
00:12:02,159 --> 00:12:03,569
that last character.

262
00:12:03,570 --> 00:12:04,959
That's Position 9. 

263
00:12:04,958 --> 00:12:10,229
Where am I getting Position 9 from this thing? 

264
00:12:10,230 --> 00:12:13,889
From index, right. Index says where does love start. It starts at Position 9. I'm, like, 

265
00:12:13,889 --> 00:12:16,449
hey, that's fantastic. So 

266
00:12:16,450 --> 00:12:20,440
zero up to index, or zero up to 9 is "Stanford" and the space. It does not

267
00:12:20,440 --> 00:12:22,240
include the L. So I get that portion. 

268
00:12:22,240 --> 00:12:24,940
Then I say well, to that - I'm not done yet, so premature [inaudible] in 

269
00:12:24,940 --> 00:12:29,470
there. Always gotta watch out for that, bad time. So 

270
00:12:29,470 --> 00:12:32,420
what we're gonna add to that is we're gonna add the string that we want to 

271
00:12:32,419 --> 00:12:35,329
replace in here, "beats," which happens to be the string called 

272
00:12:35,330 --> 00:12:36,350
the replacement, 

273
00:12:36,350 --> 00:12:37,779
or RPL. 

274
00:12:37,779 --> 00:12:40,418
And then to that, we want to add one more string. 

275
00:12:40,418 --> 00:12:45,799
And that's, essentially, everything from after "loves" over, to get that third piece, okay. 

276
00:12:45,799 --> 00:12:49,990
So what I want to know is what's the index of the position at which I need to 

277
00:12:49,990 --> 00:12:52,740
get characters over to the end. 

278
00:12:52,740 --> 00:12:54,389
That happens to be Position 

279
00:12:54,389 --> 00:12:55,949
14. 

280
00:12:55,950 --> 00:12:58,720
What is 14 equal to, 

281
00:12:58,720 --> 00:13:04,870
relative to the kinds of things I have over here? 

282
00:13:04,870 --> 00:13:07,450
It's index 'cause 

283
00:13:07,450 --> 00:13:10,120
I have to first get over to the 9, 

284
00:13:10,120 --> 00:13:12,730
then I need to jump over 

285
00:13:12,730 --> 00:13:14,060
the length of this thing, 

286
00:13:14,059 --> 00:13:17,039
which is the length of my original string. 

287
00:13:17,039 --> 00:13:19,169
So if I add to index, what's 

288
00:13:19,169 --> 00:13:19,919


289
00:13:19,919 --> 00:13:21,288
my original 

290
00:13:21,288 --> 00:13:22,580
dot length, what

291
00:13:22,580 --> 00:13:26,810
that gives me is the index from which I want to take a substring over to the end 

292
00:13:26,809 --> 00:13:27,829
of the string. 

293
00:13:27,830 --> 00:13:29,528
So if I want to take a substring, 

294
00:13:29,528 --> 00:13:31,278
this becomes an index 

295
00:13:31,278 --> 00:13:32,639
to the 

296
00:13:32,639 --> 00:13:36,210
substring function, or the substring method. 

297
00:13:36,210 --> 00:13:37,050
And so 

298
00:13:37,049 --> 00:13:40,289
from the string, what I do is I take the substring, starting at Position 

299
00:13:40,289 --> 00:13:41,208
14. 

300
00:13:41,208 --> 00:13:45,188
Notice I haven't given a second index here. In this case I gave two indexes. I 

301
00:13:45,188 --> 00:13:46,919
gave a start and end position. 

302
00:13:46,919 --> 00:13:51,519
Here I just gave one index, and what happens if I only give one index? 

303
00:13:51,519 --> 00:13:52,799
It goes to the end. 

304
00:13:52,799 --> 00:13:55,949
So that's part of the beauty is a lot of times you just say, hey, from this position go 

305
00:13:55,950 --> 00:13:56,870
to the end. 

306
00:13:56,870 --> 00:14:00,209
And so that's what I get when I put all these string things together. 

307
00:14:00,208 --> 00:14:04,039
And what I need to do is these three things are just pieces. I'm concatenating 

308
00:14:04,039 --> 00:14:04,808
them together. 

309
00:14:04,808 --> 00:14:07,759
I assigned them back to STR. 
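
Putting the three pieces together, the function built on the board could be sketched in Java as below. The parameter names orig and repl and the wrapper class Replacer are assumptions for illustration; the structure (indexOf, then two substring calls around the replacement) follows the steps just described.

```java
class Replacer {
    // Returns a new string with the first occurrence of orig replaced by repl.
    // Strings are immutable, so the result is built from three pieces.
    static String replaceFirstOccurrence(String str, String orig, String repl) {
        int index = str.indexOf(orig);
        if (index != -1) {
            str = str.substring(0, index)                // piece 1: everything before orig
                + repl                                   // piece 2: the replacement
                + str.substring(index + orig.length()); // piece 3: everything after orig
        }
        return str; // unchanged if orig was not found
    }

    public static void main(String[] args) {
        // prints Stanford beats Cal
        System.out.println(replaceFirstOccurrence("Stanford loves Cal", "loves", "beats"));
    }
}
```

With the example string, index is 9 and orig.length() is 5, so the third piece starts at position 14.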

310
00:14:07,759 --> 00:14:12,509
And then, when I return STR here, I've gotten those three pieces concatenated together. Is there 

311
00:14:12,509 --> 00:14:18,340
any questions about that? Uh huh. If

312
00:14:18,340 --> 00:14:21,920
love appears more than once, indexOf just returns the index of the very first

313
00:14:21,919 --> 00:14:22,870
occurrence. 

314
00:14:22,870 --> 00:14:26,600
There's actually a version of indexOf that takes two parameters. One is the thing you're

315
00:14:26,600 --> 00:14:30,090
looking for, and the second is from which position you should start looking for it at. 

316
00:14:30,090 --> 00:14:30,940


317
00:14:30,940 --> 00:14:34,420
And so you could actually say look for love starting at Position, you know, 

318
00:14:34,419 --> 00:14:37,199
13, and then it wouldn't actually find love in the remainder of the string. So 

319
00:14:37,200 --> 00:14:39,310
there's a different version of index of, 

320
00:14:39,309 --> 00:14:41,479
but index of always returns the 

321
00:14:41,480 --> 00:14:42,570
index of the 

322
00:14:42,570 --> 00:14:48,560
very first occurrence of the string you're looking for in 

323
00:14:48,559 --> 00:14:52,289
that string. So let's actually do a little example of this in a running program. Do, do, 

324
00:14:52,289 --> 00:14:54,419
do, do, do. 
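
The two-parameter version of indexOf mentioned a moment ago can be tried out with a short Java sketch. The class name IndexOfFromDemo and the single-letter search for "l" are assumptions added for illustration.

```java
class IndexOfFromDemo {
    public static void main(String[] args) {
        String str = "Stanford loves Cal";
        // One argument: search from the start of the string.
        System.out.println(str.indexOf("loves"));     // prints 9
        // Two arguments: start looking at position 13, past the only "loves".
        System.out.println(str.indexOf("loves", 13)); // prints -1
        // Searching for "l" from position 10 skips the 'l' at 9 and finds the one in "Cal".
        System.out.println(str.indexOf("l", 10));     // prints 17
    }
}
```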

325
00:14:54,419 --> 00:14:58,829
And we'll do replace occurrence. And one thing that actually goes on at Stanford, 

326
00:14:58,830 --> 00:15:01,290
which I thought was an interesting thing when I got here professionally, is 

327
00:15:01,289 --> 00:15:05,230
we don't like to speak in full terms. So if we want to 

328
00:15:05,230 --> 00:15:06,759
Stanfordize some strings, 

329
00:15:06,759 --> 00:15:08,799
we do all these string replacements. We sort of say, 

330
00:15:08,799 --> 00:15:12,979
you know what, if you have Florence Moore in your string, that's really FloMo. 

331
00:15:12,980 --> 00:15:17,490
And Memorial Church is MemChu; AmerSc, [inaudible]; psychology is psych;

332
00:15:17,490 --> 00:15:19,310
economics, econ; your 

333
00:15:19,309 --> 00:15:20,429
most fun class, 

334
00:15:20,429 --> 00:15:21,689
CS 106A. So it's 

335
00:15:21,690 --> 00:15:24,320
just what Stanford's all about. 

336
00:15:24,320 --> 00:15:27,530
And so if we go ahead and run this, right. Here's the function we just wrote. Here's 

337
00:15:27,529 --> 00:15:30,870
our little friend, replace first occurrence. Over here we called it 

338
00:15:30,870 --> 00:15:34,100
replace occurrence. I'm being explicit and saying it's only replacing the first occurrence. 

339
00:15:34,100 --> 00:15:37,350
You could think of a way to generalize this to replace all occurrences in a 

340
00:15:37,350 --> 00:15:39,870
string if you wanted to. But I didn't give you that version 'cause 

341
00:15:39,870 --> 00:15:41,850
I might give you that version on another 

342
00:15:41,850 --> 00:15:43,230
problem set at some point. 

343
00:15:43,230 --> 00:15:47,240
So what we're gonna do is we're gonna ask the user, enter a line to Stanfordize. 

344
00:15:47,240 --> 00:15:50,779
Notice I want to put Stanfordize inside double quotes. So I put it in these 

345
00:15:50,779 --> 00:15:51,759
characters, 

346
00:15:51,759 --> 00:15:55,109
backslash-quote, which just means a single double-quote character. That's how I print

347
00:15:55,109 --> 00:15:55,980
double quotes. 

348
00:15:55,980 --> 00:15:58,778
So it says read line for Stanfordize in quotes. 
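
A small Java sketch of the escaped double quote follows. The exact prompt text and the names QuoteDemo and prompt are assumptions, since the on-screen code is not part of the transcript.

```java
class QuoteDemo {
    // Inside a string literal, \" stands for one double-quote character.
    static String prompt() {
        return "Enter a line to \"Stanfordize\": ";
    }

    public static void main(String[] args) {
        System.out.println(prompt()); // prints Enter a line to "Stanfordize":
    }
}
```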

349
00:15:58,778 --> 00:16:03,099
I want to keep reading lines and Stanfordizing them until the user gives me an 

350
00:16:03,099 --> 00:16:04,009
empty line. 

351
00:16:04,009 --> 00:16:08,539
How do I do that? I check to see if the line the user gives me is equal 

352
00:16:08,539 --> 00:16:09,098


353
00:16:09,099 --> 00:16:10,420
to a quote-quote. 

354
00:16:10,419 --> 00:16:14,519
So if it's equal to a quote-quote, it's equal to the empty string. That means, hey, 

355
00:16:14,519 --> 00:16:17,899
you entered in - if we ask the user for a string, they just hit enter. They 

356
00:16:17,899 --> 00:16:21,139
didn't enter any characters. That's the empty string, so we would break out of the loop. It's

357
00:16:21,139 --> 00:16:23,659
our little loop and a-half concept. Otherwise, we say 

358
00:16:23,659 --> 00:16:27,129
at Stanford we say, and we Stanfordize the line. And when someone's finally 

359
00:16:27,129 --> 00:16:31,450
done, we say thank you for visiting Stanford, ha, ha, ha. That'll be $45,000.00. 

360
00:16:31,450 --> 00:16:34,080
[Laughter]. All right, 

361
00:16:34,080 --> 00:16:38,660
so it's money well spent, trust me. Really. 

362
00:16:38,659 --> 00:16:41,709
Okay, so replace occurrence string we want to run, 

363
00:16:41,710 --> 00:16:45,290
and we come along, and it's running, it's running, it's running. 

364
00:16:45,289 --> 00:16:48,110
Sometimes my computer's running a little bit slow. I notice this weird thing last night. I'm gonna tell 

365
00:16:48,110 --> 00:16:51,300
you a story while the computer's actually running. 

366
00:16:51,299 --> 00:16:55,199
I couldn't type N's on my keyboard for some reason. And then I reset my computer, and I 

367
00:16:55,200 --> 00:16:59,360
could. So at this point, I don't know if I can type N's. So let's just hope we can. So 

368
00:16:59,360 --> 00:17:02,480
I live in - oh, I got the 

369
00:17:02,480 --> 00:17:05,860
N - Florence - you should have been here last night. I was like, N, N, and I wasn't getting it - 

370
00:17:05,859 --> 00:17:07,759
Florence Moore, 

371
00:17:07,759 --> 00:17:10,279
major in 

372
00:17:10,279 --> 00:17:13,559
economics - I can't even type 

373
00:17:13,559 --> 00:17:15,159
today - 

374
00:17:15,160 --> 00:17:16,750
and spend 

375
00:17:16,750 --> 00:17:22,699
all my time on my most fun class. And so, 

376
00:17:22,699 --> 00:17:26,660
at Stanford we say I live in FloMo, major in Econ, and spend all my time on CS 106A, 

377
00:17:26,660 --> 00:17:28,360
okay. 

378
00:17:28,359 --> 00:17:33,809
And now, I hit return, Thank you for visiting Stanford. Go home. All right, so 

379
00:17:33,809 --> 00:17:36,730
that's kind of a simple version of 

380
00:17:36,730 --> 00:17:39,670
replace first occurrence. And notice you can actually replace multiple things in 

381
00:17:39,670 --> 00:17:42,700
the same string, as long as the string that you're doing the replacement on 

382
00:17:42,700 --> 00:17:46,319
you assign back to itself. And then we kind of do a whole bunch of these replacements in a row, okay. Is

383
00:17:46,319 --> 00:17:48,039
there 

384
00:17:48,039 --> 00:17:50,389
any questions about that? 

385
00:17:50,390 --> 00:17:53,720
Are you feeling okay about doing replacement. All right. 
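A minimal sketch of the replacement idea just described, assuming a helper named replaceFirstOccurrence and replacement pairs inferred from the demo (neither is shown in this part of the lecture, so treat both as hypothetical). The point it illustrates is assigning the result back to the same variable so several replacements can be chained:

```java
class ReplaceSketch {
    // Replaces only the first occurrence of orig in str.
    // Returns str unchanged if orig does not appear.
    static String replaceFirstOccurrence(String str, String orig, String repl) {
        int index = str.indexOf(orig);
        if (index == -1) {
            return str;
        }
        return str.substring(0, index) + repl + str.substring(index + orig.length());
    }

    // Chains several replacements by assigning the result back to line each time.
    // The three phrase pairs are assumptions based on the demo input and output.
    static String stanfordize(String line) {
        line = replaceFirstOccurrence(line, "Florence Moore", "FloMo");
        line = replaceFirstOccurrence(line, "economics", "Econ");
        line = replaceFirstOccurrence(line, "my most fun class", "CS 106A");
        return line;
    }

    public static void main(String[] args) {
        System.out.println(stanfordize(
            "I live in Florence Moore, major in economics, and spend all my time on my most fun class"));
    }
}
```

Strings in Java are immutable, so each call builds a new string; without the assignment back to line, the earlier replacements would be lost.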

386
00:17:53,720 --> 00:17:56,319
So now, it's time for something completely different. Although it's not 

387
00:17:56,319 --> 00:17:58,889
completely different, it's just kind of different. 

388
00:17:58,890 --> 00:18:00,549
And the idea is sometimes - 

389
00:18:00,548 --> 00:18:03,480
and I always say that - sometimes you want to do this. Yeah, 'cause 

390
00:18:03,480 --> 00:18:06,360
sometimes you want to do it, and other times you don't. 

391
00:18:06,359 --> 00:18:08,099
Sometimes you feel like a nut. 

392
00:18:08,099 --> 00:18:11,099
Sometimes you don't. Oh, man, 

393
00:18:11,099 --> 00:18:14,399
I gotta start watching TV in this decade. 

394
00:18:14,400 --> 00:18:15,580


395
00:18:15,579 --> 00:18:18,369
So, tokenizers. 

396
00:18:18,369 --> 00:18:22,629
What is a tokenizer? A tokenizer is something, as they say it's a 

397
00:18:22,630 --> 00:18:27,280
computer science term. All a tokenizer is, is we have some string of text. What we 

398
00:18:27,279 --> 00:18:30,829
want to do is break it up into tokens. That's called tokenization. So you 

399
00:18:30,829 --> 00:18:31,609
might say, 

400
00:18:31,609 --> 00:18:34,879
Mehran, what is a token? Like, last time I remember what a token was, is when I gave

401
00:18:34,880 --> 00:18:39,470
a dollar at the arcade and I got back, like, ten tokens instead of quarters. And you're like, 

402
00:18:39,470 --> 00:18:42,680
yeah, Mehran, I never did that. I had

403
00:18:42,680 --> 00:18:46,259
an XBox. All right, so a tokenizer - anyone ever go to an arcade? All right, 

404
00:18:46,259 --> 00:18:47,549
just checking. 

405
00:18:47,549 --> 00:18:52,940
All right a token, basically, is a piece of string - a piece of string - is a 

406
00:18:52,940 --> 00:18:55,900
string that has on the two sides of it, 

407
00:18:55,900 --> 00:18:57,929
white space. So if I say, 

408
00:18:57,929 --> 00:18:59,790
hello 

409
00:18:59,789 --> 00:19:00,980
there, 

410
00:19:00,980 --> 00:19:02,169
Mary, 

411
00:19:02,169 --> 00:19:03,020
hello 

412
00:19:03,019 --> 00:19:06,529
there and Mary are tokens. They are something that we refer to as delimited 

413
00:19:06,529 --> 00:19:11,200
by white space, which means there is either spaces, or tabs, or returns, or whatever,

414
00:19:11,200 --> 00:19:14,700
in between the individual tokens. We like to think of tokens as 

415
00:19:14,700 --> 00:19:15,420
words, 

416
00:19:15,420 --> 00:19:18,690
but computer scientists say token. Token is a more general term 'cause if I actually said 

417
00:19:18,690 --> 00:19:20,210
hello there 

418
00:19:20,210 --> 00:19:20,970
comma 

419
00:19:20,970 --> 00:19:21,740
Mary, 

420
00:19:21,740 --> 00:19:25,509
the "there comma" might actually be considered one token by itself 'cause it's 

421
00:19:25,509 --> 00:19:27,519
just delimited by space. 

422
00:19:27,519 --> 00:19:30,538
Here's a space here and has a space there, so the comma's in there. And you would think why 

423
00:19:30,538 --> 00:19:34,509
comma's not part of the word. Yeah, that's why we call them tokens and not words. 

424
00:19:34,509 --> 00:19:38,309
So if we want to tokenize, there is a library that we can use in Java that 

425
00:19:38,309 --> 00:19:40,899
actually has some fun stuff in it for tokenization. 

426
00:19:40,900 --> 00:19:47,759
And that's java.util, so we would import java.util.*,

427
00:19:47,759 --> 00:19:50,109
and what we get for doing that, 

428
00:19:50,109 --> 00:19:53,029
is we get something called the string tokenizer, 

429
00:19:53,029 --> 00:19:58,889
which is a class that we can use to tokenize text. All right, so 

430
00:19:58,890 --> 00:20:02,080
we get this thing called the string tokenizer. 

431
00:20:02,079 --> 00:20:05,369
How do I create one of these? Well, I say StringTokenizer as the type

432
00:20:05,369 --> 00:20:09,439
'cause that's the class that I have, and I'll call it tokenizer 

433
00:20:09,440 --> 00:20:14,570
equals I want to create a new tokenizer. So I say new 

434
00:20:14,569 --> 00:20:16,220
string tokenizer, 

435
00:20:16,220 --> 00:20:19,299
and the question that comes up here is well, 

436
00:20:19,299 --> 00:20:21,450
what is the string you're gonna tokenize? 

437
00:20:21,450 --> 00:20:22,920
That is the string that we 

438
00:20:22,920 --> 00:20:26,850
pass to the string tokenizer's constructor when we create a new one. So

439
00:20:26,849 --> 00:20:29,558
we might have some line here that we passed in. 

440
00:20:29,558 --> 00:20:31,609
And now, line is just some string 

441
00:20:31,609 --> 00:20:35,490
that maybe we got from the user for example by doing a read 

442
00:20:35,490 --> 00:20:39,200
line. Maybe we were unfriendly and didn't give the user a prompt. It's just like, a

443
00:20:39,200 --> 00:20:42,058
blinking cursor comes up, and they're like oh, I gotta type in something.

444
00:20:42,058 --> 00:20:44,339
It's just like when you're writing a paper, right. 

445
00:20:44,339 --> 00:20:47,639
The blinking cursor comes up and there's nothing there. You just gotta fill it in. 

446
00:20:47,640 --> 00:20:50,000
So you write some line, and then we can say, hey, 

447
00:20:50,000 --> 00:20:53,429
string tokenizer, I'm gonna create a new one of you, and the line I want you to tokenize is 

448
00:20:53,429 --> 00:20:56,310
this line that I'm giving you to begin with. 

449
00:20:56,309 --> 00:20:59,849
So once you get that line, there's a couple of things you can ask the string tokenizer. 

450
00:20:59,849 --> 00:21:02,689
One of them is a method that returns a boolean,

451
00:21:02,690 --> 00:21:07,019
which is called hasMoreTokens.

452
00:21:07,019 --> 00:21:09,288
And the way this puppy works is you just ask 

453
00:21:09,288 --> 00:21:12,480
this string tokenizer, like you would say tokenizer dot 

454
00:21:12,480 --> 00:21:15,329
has more tokens, like; do you have more tokens? 

455
00:21:15,329 --> 00:21:18,730
Have you processed the whole string yet? So if you've just created the new line, and this 

456
00:21:18,730 --> 00:21:20,079
line is kind of sitting here 

457
00:21:20,079 --> 00:21:23,528
like that, and it's saying do you have any more tokens. Yeah, I got tokens, 

458
00:21:23,528 --> 00:21:26,549
man. I got tokens up the wazoo. You want tokens, I'll give you tokens. 

459
00:21:26,549 --> 00:21:30,700
And so, has more tokens [inaudible] true. If you process the whole string, as

460
00:21:30,700 --> 00:21:32,259
you will see when we get there, 

461
00:21:32,259 --> 00:21:34,799
it'll say no, I don't have any more tokens. 

462
00:21:34,799 --> 00:21:36,500
How do you get each token? 

463
00:21:36,500 --> 00:21:40,079
Well, you ask for nextToken.

464
00:21:40,079 --> 00:21:43,799
And what next token does, when you call the tokenizer with next token, is it gives 

465
00:21:43,799 --> 00:21:45,239
you the next token 

466
00:21:45,239 --> 00:21:48,009
of the string that it's processing, 

467
00:21:48,009 --> 00:21:52,798
as a separate string. So if I started off the tokenizer with this line, I say hey, do you 

468
00:21:52,798 --> 00:21:56,160
have more tokens. It says yeah. Well, give me the next token. So what it will 

469
00:21:56,160 --> 00:21:58,820
return to you is hello. 

470
00:21:58,819 --> 00:22:02,189
And it will be sort of sitting here waiting to give you the next token. You can 

471
00:22:02,190 --> 00:22:04,140
ask if you have more tokens. It says yeah, give 

472
00:22:04,140 --> 00:22:07,579
me the next token. It will give you "there" and the comma 

473
00:22:07,578 --> 00:22:11,200
'cause the default version of the tokenizer, the only thing that delimits

474
00:22:11,200 --> 00:22:14,558
tokens - delimit is a funky word for splits between tokens - 

475
00:22:14,558 --> 00:22:18,980
are spaces, or tabs, or return characters. But for a single line, you won't have returns 

476
00:22:18,980 --> 00:22:19,750
in that. 

477
00:22:19,750 --> 00:22:23,069
And then you said you had more tokens. Yeah, give me the next token. It will give you "Mary" as 

478
00:22:23,069 --> 00:22:24,638
a token that's sitting here. 

479
00:22:24,638 --> 00:22:28,539
And then when you say do you have more tokens, that's all, okay. And 

480
00:22:28,539 --> 00:22:31,039
at that point, you shouldn't call next token. 

481
00:22:31,039 --> 00:22:33,950
You can if you want. You can experiment with this if you want to 

482
00:22:33,950 --> 00:22:36,140
experiment with random error messages, but 

483
00:22:36,140 --> 00:22:37,500
there's no more tokens to give 

484
00:22:37,500 --> 00:22:42,910
you. It's all out of love. It's so lost without you. It has no more tokens. Yeah, 

485
00:22:42,910 --> 00:22:44,110
Air Supply. Not that I 

486
00:22:44,109 --> 00:22:47,178
would recommend that you have to listen to Air Supply, but sometimes you hear a song and you 

487
00:22:47,179 --> 00:22:48,680
can't get it out of your head 

488
00:22:48,680 --> 00:22:50,789
as much as you wish you could. 
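The hasMoreTokens and nextToken walk-through above can be sketched as follows. The class and method names here are made up for illustration; the only library pieces used are java.util.StringTokenizer and the NoSuchElementException it throws when asked for a token it does not have:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

class TokenWalkthrough {
    // Collects every whitespace-delimited token of line, in order.
    static List<String> allTokens(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Returns true if asking an exhausted tokenizer for another token throws.
    static boolean throwsWhenExhausted(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
        }
        try {
            tokenizer.nextToken();   // no tokens left at this point
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The comma stays attached to "there" because only white space delimits.
        System.out.println(allTokens("hello there, Mary"));
        System.out.println(throwsWhenExhausted("hello there, Mary"));
    }
}
```

Checking hasMoreTokens before every nextToken call, as the loop does, is what keeps the exception from ever being raised in normal use.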

489
00:22:50,789 --> 00:22:53,039
Sometimes selective brain surgery would not be a bad thing, but that's not important right now. What is

490
00:22:53,039 --> 00:22:54,579
important 

491
00:22:54,579 --> 00:22:55,980
right now 

492
00:22:55,980 --> 00:22:59,170
is how do we put all this together to tokenize a line. So let me show you an

493
00:22:59,170 --> 00:23:01,140
example of the tokenizer. 

494
00:23:01,140 --> 00:23:03,429
This one's very simple. All we're gonna do here 

495
00:23:03,429 --> 00:23:05,970
is we're gonna ask the user - I'll just scroll over a 

496
00:23:05,970 --> 00:23:08,390
little 

497
00:23:08,390 --> 00:23:12,040
bit. We're gonna ask the user to enter some lines to tokenize and we're gonna 

498
00:23:12,039 --> 00:23:15,609
write out the tokens of the string R, and then we're gonna call the message 

499
00:23:15,609 --> 00:23:16,559
sprint token. 

500
00:23:16,559 --> 00:23:19,329
What sprint token's gonna do, it's gonna take in the string you want to 

501
00:23:19,329 --> 00:23:20,579
tokenize. It 

502
00:23:20,579 --> 00:23:27,579
creates one of these string tokenizers - I'm so lost without you. [Laughter]. 

503
00:23:28,430 --> 00:23:30,390
Can 

504
00:23:30,390 --> 00:23:31,288
we make 

505
00:23:31,288 --> 00:23:35,640
Mehran snap? No. I know it's

506
00:23:35,640 --> 00:23:38,630
like great fun to listen to when you're, like, 14, and you just broke up with 

507
00:23:38,630 --> 00:23:40,850
a girlfriend for the first time. 

508
00:23:40,849 --> 00:23:44,789
And then, after that, you want to kill the next time you hear it. Fine, 

509
00:23:44,789 --> 00:23:46,809
So, tokenizer. 

510
00:23:46,809 --> 00:23:50,309
I'm glad we're having fun though. 

511
00:23:50,309 --> 00:23:53,990
So what I'm gonna do is I'm gonna count through all the tokens. So I'm gonna have a for loop

512
00:23:53,990 --> 00:23:56,069
interestingly enough. Here's something funky. I'm gonna have a 

513
00:23:56,069 --> 00:23:59,939
for loop, but the thing I'm gonna do in my for loop, my test, is not to check to see if

514
00:23:59,940 --> 00:24:01,710
I've reached some maximum number. 

515
00:24:01,710 --> 00:24:05,539
But my test is actually gonna be to see if tokenizer has more tokens. 

516
00:24:05,539 --> 00:24:08,889
So I have a for loop that's just like a regular for loop, but I start off with a count

517
00:24:08,890 --> 00:24:11,720
that's equal to zero, and you're like that looks okay. 

518
00:24:11,720 --> 00:24:15,190
I do a count ++ over here, and you're like that's okay, what are you counting up 

519
00:24:15,190 --> 00:24:19,430
to, Mehran. And I say I'm counting up to however many tokens you have. And you go,

520
00:24:19,430 --> 00:24:23,769
oh, interesting. So my condition to leave, or to continue on with the

521
00:24:23,769 --> 00:24:27,210
loop, is tokenizer has more tokens. If 

522
00:24:27,210 --> 00:24:28,548
it has more tokens, 

523
00:24:28,548 --> 00:24:31,058
then I'm gonna do something here to get the next token. I'm gonna keep 

524
00:24:31,058 --> 00:24:34,410
doing this loop. But what the counter's gonna give me is a way to count 

525
00:24:34,410 --> 00:24:36,048
through all my tokens. 

526
00:24:36,048 --> 00:24:38,480
So I can write out token number count, 

527
00:24:38,480 --> 00:24:42,620
and then a colon, and then write out the next token that the tokenizer gives me. 
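Here is a rough sketch of that counting loop. It assumes the output label reads "Token #" followed by the count and a colon (the exact wording on screen is not captured in the transcript), and it returns the text instead of printing line by line so it is easy to check:

```java
import java.util.StringTokenizer;

class PrintTokensSketch {
    // Builds one "Token #n: text" line per token. The label text is an assumption.
    static String formatTokens(String str) {
        StringTokenizer tokenizer = new StringTokenizer(str);
        StringBuilder out = new StringBuilder();
        // The loop test is hasMoreTokens(), not a comparison against a maximum count.
        for (int count = 0; tokenizer.hasMoreTokens(); count++) {
            out.append("Token #").append(count).append(": ")
               .append(tokenizer.nextToken()).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(formatTokens("I, for one, love CS"));
    }
}
```

The counter plays no part in deciding when the loop ends; it only numbers the tokens as they come out.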

528
00:24:42,619 --> 00:24:44,689
Is there any questions about there? Let's 

529
00:24:44,690 --> 00:24:47,730
actually run this puppy [inaudible]. Do, 

530
00:24:47,730 --> 00:24:49,430
de, do. 

531
00:24:49,430 --> 00:24:53,450
You can feel free to keep singing now if you want, if 

532
00:24:53,450 --> 00:24:56,069
you want. All right, 

533
00:24:56,069 --> 00:24:58,308
so we're gonna do our friend. 

534
00:24:58,308 --> 00:25:01,000
What's our friend called? The tokenizer example. 

535
00:25:01,000 --> 00:25:04,789
Do, do, do, do, we're running the tokenizer, enter line to tokenize, so I might say

536
00:25:04,789 --> 00:25:05,779
"I, 

537
00:25:05,779 --> 00:25:07,450
for one, 

538
00:25:07,450 --> 00:25:09,970
love CS." We're very formal here. 

539
00:25:09,970 --> 00:25:13,370
And it says the tokens of the string are, and notice, it got the "I" and the comma

540
00:25:13,369 --> 00:25:17,379
together as one token because as we talked about, spaces are the delimiter. 

541
00:25:17,380 --> 00:25:21,240
And so "for" and then "one" with the comma, and "love" and "CS," and that's all the 

542
00:25:21,240 --> 00:25:23,759
tokens we got. And so at this point you might be thinking, yeah, man, 

543
00:25:23,759 --> 00:25:27,299
that's great, but you know what. I really don't like punctuation. 

544
00:25:27,299 --> 00:25:30,440
And sometimes I don't like punctuation, but I can't stop the user from 

545
00:25:30,440 --> 00:25:33,350
using punctuation because even though I don't like to be grammatically correct, 

546
00:25:33,349 --> 00:25:34,589
they do. 

547
00:25:34,589 --> 00:25:37,829
So how do I prevent them from being grammatically correct as well, 

548
00:25:37,829 --> 00:25:40,220
which is kind of a fun thing to do. What you can say is hey, 

549
00:25:40,220 --> 00:25:43,829
what I want to do is change my tokenizer, so that it not only 

550
00:25:43,829 --> 00:25:47,689
stops at spaces, but it's gonna stop or consider a delimiter, 

551
00:25:47,690 --> 00:25:51,179
any of this list of characters that I give it. So you give it a list of characters as

552
00:25:51,179 --> 00:25:53,798
a string. So here I'm gonna give it a comma 

553
00:25:53,798 --> 00:25:55,668
and a space, okay. 

554
00:25:55,669 --> 00:25:59,620
And this version of the string tokenizer constructor, what it will do is 

555
00:25:59,619 --> 00:26:01,939
it will actually tokenize the string. 

556
00:26:01,940 --> 00:26:06,070
But think of the thing that you're using as your delimiter, or what chops up your 

557
00:26:06,069 --> 00:26:07,189
individual tokens 

558
00:26:07,190 --> 00:26:10,749
as either a comma, or a space, or anything you want to put in that string there. 

559
00:26:10,749 --> 00:26:13,649
Each of the individual characters in that string is treated as a potential 

560
00:26:13,648 --> 00:26:18,529
delimiter. So if you say "I for one love CS," 

561
00:26:18,529 --> 00:26:20,009
ah, no commas. 

562
00:26:20,009 --> 00:26:22,700
Why? Because commas are considered delimiter. So it just gives you 

563
00:26:22,700 --> 00:26:26,559
everything up to a comma or a space, and you could imagine you could put in period, and 

564
00:26:26,559 --> 00:26:28,178
exclamation point, and all that other stuff, 

565
00:26:28,179 --> 00:26:32,860
if you just want to get out the non punctuation here. So 

566
00:26:32,859 --> 00:26:35,349
tokenizing is something that's oftentimes useful if you get a bigger 

567
00:26:35,349 --> 00:26:38,349
piece of text, and you want to break it up into any individual words, and then maybe do 

568
00:26:38,349 --> 00:26:40,558
something on those individual words, okay. 
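A small sketch of the two-argument constructor described above, where every character of the second string is treated as a delimiter. The helper name tokens and the second sample sentence are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterSketch {
    // Splits str on any single character that appears in delimiters.
    static List<String> tokens(String str, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(str, delimiters);
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Comma and space as delimiters: the commas no longer stick to the words.
        System.out.println(tokens("I, for one, love CS", ", "));
        // Adding period and exclamation point strips those as well.
        System.out.println(tokens("Wow! I, for one, love CS.", ", .!"));
    }
}
```

Note that once you supply your own delimiter string, white space is only a delimiter if you include it, which is why the space is listed explicitly.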

569
00:26:40,558 --> 00:26:42,629
Any questions about tokenization? 

570
00:26:42,630 --> 00:26:46,490
Hopefully, it's not too painful or scary. All right. So 

571
00:26:46,490 --> 00:26:50,298
the next thing I want to do, while we're at the smorgasbord of strings,

572
00:26:50,298 --> 00:26:53,408
is I want to teach you about something that's really gotten to be an important 

573
00:26:53,409 --> 00:26:55,650
thing about computer science these 

574
00:26:55,650 --> 00:26:56,559
last few years, 

575
00:26:56,558 --> 00:26:57,430
which is, 

576
00:26:57,430 --> 00:27:00,120
basically, this idea known as encryption. 

577
00:27:00,119 --> 00:27:03,889
And encryption is something that's been around for thousands of years. All 

578
00:27:03,890 --> 00:27:07,000
encryption is, is it's kind of like sending secret messages. 

579
00:27:07,000 --> 00:27:08,808
You have some particular message. 

580
00:27:08,808 --> 00:27:11,749
You want to send it to someone else, but you want to send a secret version of 

581
00:27:11,749 --> 00:27:15,568
that message. And people have been doing this for thousands of years, actually, 

582
00:27:15,568 --> 00:27:16,339
interestingly enough. 

583
00:27:16,339 --> 00:27:20,269
They just didn't have very good methods of doing it until about the last, oh, 50 years. 

584
00:27:20,269 --> 00:27:23,240
But you know they did it for a long time, and people broke encryption. As a matter of fact, 

585
00:27:23,240 --> 00:27:26,759
there's this really interesting book by Simon Singh. I'll bring in a copy, 

586
00:27:26,759 --> 00:27:29,398
perhaps next class, if you're really interested, 

587
00:27:29,398 --> 00:27:33,439
about the whole history of encryption. It goes back thousands of years, and how, like, 

588
00:27:33,440 --> 00:27:37,410
wars, and queenships, and kingships, and stuff, were, basically, lost and won on the

589
00:27:37,410 --> 00:27:38,200
strength of 

590
00:27:38,200 --> 00:27:41,269
how well someone could break a piece of code. 

591
00:27:41,269 --> 00:27:44,408
But the basic idea of encryption, and it probably dates back even further than this, but 

592
00:27:44,409 --> 00:27:47,390
one of the most well-known ones is something that's known as the Caesar 

593
00:27:47,390 --> 00:27:48,470


594
00:27:48,470 --> 00:27:50,129
cipher, not to be confused with the salad. 

595
00:27:50,128 --> 00:27:52,759
But the basic idea with the Caesar cipher - I 

596
00:27:52,759 --> 00:27:54,190
picked up the wrong newspaper - 

597
00:27:54,190 --> 00:27:55,960
the Caesar cipher 

598
00:27:55,960 --> 00:27:58,240
is that what we want to do 

599
00:27:58,240 --> 00:28:02,808
is, basically, take our alphabet and rotate it by some number of letters to get a 

600
00:28:02,808 --> 00:28:03,220
replacement. 

601
00:28:03,220 --> 00:28:05,920
What does that mean? That's just a whole bunch of words. 

602
00:28:05,920 --> 00:28:10,070
So let me show you a little slide that just makes that clear. 

603
00:28:10,069 --> 00:28:11,240
So in Caesar's day - 

604
00:28:11,240 --> 00:28:13,779
I will now play the role of Caesar. I actually considered wearing 

605
00:28:13,779 --> 00:28:18,170
a toga to class today. I just thought that was fraught with way too much peril. 

606
00:28:18,170 --> 00:28:21,670
So I just decided to bring my little Caesar crown. And I was trying to find my little

607
00:28:21,670 --> 00:28:23,970
crown of wreaths and stuff, but I couldn't.

608
00:28:23,970 --> 00:28:26,288
So I just got a little hat. 

609
00:28:26,288 --> 00:28:28,519
[Laughter]. 

610
00:28:28,519 --> 00:28:32,359
And so the basic idea - say you are Caesar - well, I did crown myself, actually. 

611
00:28:32,359 --> 00:28:35,699
I knew someone here could actually take the crown from [inaudible]. That was Napoleon, 

612
00:28:35,700 --> 00:28:38,170
a whole different story. I really 

613
00:28:38,170 --> 00:28:42,660
like to take history and mix it up. It's just to see if you're actually paying attention. 

614
00:28:42,660 --> 00:28:46,460
All right, the basic way the Caesar cipher works is we take our original 

615
00:28:46,460 --> 00:28:48,600
alphabet. Here's all of our letters from A through Z. 

616
00:28:48,599 --> 00:28:52,048
We take that whole alphabet, and we shift it over some number of letters. Like let's 

617
00:28:52,048 --> 00:28:55,980
say we shift it over three letters. So I take this whole thing, I shift it over 

618
00:28:55,980 --> 00:29:00,360
three letters, so now the D lines up over here where the A should have been so I've 

619
00:29:00,359 --> 00:29:03,649
shifted over these bottom characters. And the characters that kind of went off the 

620
00:29:03,650 --> 00:29:07,410
end here like the A, B, and C, were kind of like whoa, we're going off the end. Where do we go? 

621
00:29:07,410 --> 00:29:09,450
We just kind of shuffle them 

622
00:29:09,450 --> 00:29:10,950
back around over here. 

623
00:29:10,950 --> 00:29:15,169
So the basic idea is we're gonna rotate our alphabet by N letters, 

624
00:29:15,169 --> 00:29:18,709
and N is 3 in the example here, and N is called the key. So the key of the Caesar 

625
00:29:18,709 --> 00:29:19,919
cipher 

626
00:29:19,919 --> 00:29:23,380
is how many letters you're actually shifting. 

627
00:29:23,380 --> 00:29:27,139
And then we wrap around it again. And now, once we've done this little wraparound, 

628
00:29:27,138 --> 00:29:30,678
we take our original message that we want to encrypt. That's something that's referred to 

629
00:29:30,679 --> 00:29:35,060
as the plain text. The plain text is your actual original message. And 

630
00:29:35,059 --> 00:29:39,179
we want to encrypt that or change it to our cipher text, which is what the 

631
00:29:39,180 --> 00:29:43,150
encrypted message is, by using this mapping. So every time an A appears in the original, 

632
00:29:43,150 --> 00:29:47,320
we replace it by a D. And a D appears in the original, we replace it by G, and a C 

633
00:29:47,319 --> 00:29:51,369
appears in the original, we replace it by an F, etc., for the whole alphabet. Is there 

634
00:29:51,369 --> 00:29:54,579
any questions about the Caesar cipher? This is actually an actual cipher that, 

635
00:29:54,579 --> 00:29:57,928
evidently, historians tell us that Caesar used in the days of yore. 

636
00:29:57,929 --> 00:30:01,980
And you know, evidently, he was killed, so it didn't work that 

637
00:30:01,980 --> 00:30:04,528
well. But you know most people, that's one of the things that when you were a little kid, and 

638
00:30:04,528 --> 00:30:07,538
you had like the Super Secret Decoder Ring, you were 

639
00:30:07,538 --> 00:30:09,480
probably getting a Caesar cipher. All 

640
00:30:09,480 --> 00:30:12,509
right, any questions about the basics of the Caesar cipher. 
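To make the slide concrete, here is a sketch that builds the shifted row of the alphabet and looks letters up in it. This table-lookup approach is just one way to picture the mapping, not necessarily how the lecture's program does it; the names ALPHABET, rotatedAlphabet, and lookup are hypothetical, and it assumes uppercase letters and a key from 0 to 26:

```java
class CaesarTableSketch {
    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Rotates the alphabet left by key letters; the letters that fall off
    // the front wrap around to the end. Assumes 0 <= key <= 26.
    static String rotatedAlphabet(int key) {
        return ALPHABET.substring(key) + ALPHABET.substring(0, key);
    }

    // Finds the cipher letter for an uppercase plain letter by position.
    static char lookup(char plain, int key) {
        return rotatedAlphabet(key).charAt(ALPHABET.indexOf(plain));
    }

    public static void main(String[] args) {
        System.out.println(ALPHABET);            // plain text row
        System.out.println(rotatedAlphabet(3));  // cipher text row for key 3
        System.out.println(lookup('A', 3));
        System.out.println(lookup('X', 3));
    }
}
```

With a key of 3, the second row starts at D and ends with A, B, C, which is the wraparound shown on the slide.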

641
00:30:12,509 --> 00:30:15,910
So what we're gonna do is let's write a program that actually can be able to 

642
00:30:15,910 --> 00:30:18,980
encrypt and decrypt text according to a Caesar cipher, 

643
00:30:18,980 --> 00:30:22,509
and we'll do it doing top-down design. So we'll actually just do it on the computer

644
00:30:22,509 --> 00:30:23,379
together 

645
00:30:23,380 --> 00:30:25,560
'cause it's more fun that way. And 

646
00:30:25,559 --> 00:30:29,470
because I'm Caesar, I will drive. So 

647
00:30:29,470 --> 00:30:31,279
we're gonna have my Caesar cipher, all 

648
00:30:31,279 --> 00:30:34,308
right. And I just gave you a little bit of a run method here. It's kind of

649
00:30:34,308 --> 00:30:36,879
the very beginnings of the program. But all this does - 

650
00:30:36,880 --> 00:30:38,410
it's not a big deal. It says 

651
00:30:38,410 --> 00:30:41,290
this program uses a Caesar cipher for encryption. 

652
00:30:41,289 --> 00:30:45,178
It's going to ask for the encryption key. That means it's asking for the number 

653
00:30:45,179 --> 00:30:48,860
by which it's gonna rotate the alphabet to create your Caesar key, 

654
00:30:48,859 --> 00:30:51,528
or to create your Caesar cipher, and that's just our key 

655
00:30:51,528 --> 00:30:52,460
that's an integer. 

656
00:30:52,460 --> 00:30:56,230
So our plain text, that's the original message that we want to encrypt. 

657
00:30:56,230 --> 00:30:59,779
We ask the user for the plain text, so we just get a line from the user. And then

658
00:30:59,779 --> 00:31:03,428
what we're gonna do is we're gonna create our cipher text, or the encrypted 

659
00:31:03,429 --> 00:31:03,870
text 

660
00:31:03,869 --> 00:31:08,079
by calling a function called encrypt Caesar. We're sort of giving a directive. It's 

661
00:31:08,079 --> 00:31:10,949
kind of like an imperative tense. Encrypt Caesar,

662
00:31:10,950 --> 00:31:14,840
and we give it the plain text, and we give it the number for the key that we want it to 

663
00:31:14,839 --> 00:31:15,909
encrypt using. 

664
00:31:15,910 --> 00:31:19,090
And then, hopefully, that will give us back the encrypted string, and we're just gonna write that out, okay.

665
00:31:19,089 --> 00:31:20,909
So

666
00:31:20,910 --> 00:31:23,350
how do we do this encryption? All right, so 

667
00:31:23,349 --> 00:31:26,559
at this point, and it should be clear that the thing we want to write is probably 

668
00:31:26,559 --> 00:31:28,009
encrypt Caesar. 

669
00:31:28,009 --> 00:31:31,019
So what we're gonna do is we're gonna write a private method,

670
00:31:31,019 --> 00:31:34,109
and what is this puppy gonna return to us? 

671
00:31:34,109 --> 00:31:36,709
String, right 'cause that's what we're expecting, the encoded version of this 

672
00:31:36,710 --> 00:31:37,788
particular 

673
00:31:37,788 --> 00:31:39,378
message as a string. 

674
00:31:39,378 --> 00:31:41,368
So we'll call this encrypt Caesar. 

675
00:31:41,368 --> 00:31:44,259
And what's it getting passed? It's getting passed

676
00:31:44,259 --> 00:31:47,430
the string, which we'll just call str, and it's getting passed an integer,

677
00:31:47,430 --> 00:31:50,700
which we will refer to as the key. So

678
00:31:50,700 --> 00:31:53,259
if I want to think about doing the encryption, 

679
00:31:53,259 --> 00:31:54,789
right, what I'm gonna do is, 

680
00:31:54,789 --> 00:31:57,928
on a character-by-character basis, I want to do this replacement. I want to 

681
00:31:57,929 --> 00:32:00,679
say for every character that I see in my original string, 

682
00:32:00,679 --> 00:32:04,369
there is some shifted version of that character that I want to use in my 

683
00:32:04,368 --> 00:32:05,809
encrypted string. 

684
00:32:05,809 --> 00:32:09,759
So in order to do that, I'm gonna use my standard kind of string building idiom, which 

685
00:32:09,759 --> 00:32:13,680
says I start off with a string, which I'll call results, which starts off

686
00:32:13,680 --> 00:32:16,180
empty, right. It says, quote, quote, empty string. 

687
00:32:16,180 --> 00:32:17,798
And I'm gonna do a for loop

688
00:32:17,798 --> 00:32:19,509
through my string 

689
00:32:19,509 --> 00:32:25,599
that I'm giving to encrypt. So up to the string's length, I'm just

690
00:32:25,599 --> 00:32:28,138
gonna count through and get each character. So I'll 

691
00:32:28,138 --> 00:32:31,819
do sort of a standard thing. I'm gonna say ch, and

692
00:32:31,819 --> 00:32:35,369
I'm gonna essentially get the character that I want to get from the string, so 

693
00:32:35,369 --> 00:32:37,739
I'll 

694
00:32:37,740 --> 00:32:39,109
say 

695
00:32:39,109 --> 00:32:39,629
str.charAt(i).

696
00:32:39,630 --> 00:32:40,450


697
00:32:40,450 --> 00:32:41,909
So I've now gotten my character. 

698
00:32:41,909 --> 00:32:45,540
I want to figure out how to encrypt that character, okay. 

699
00:32:45,539 --> 00:32:48,970
So I think to myself, wow, gee, well, encrypting the character involves all this

700
00:32:48,970 --> 00:32:51,069
stuff, doing the shift and all that, 

701
00:32:51,069 --> 00:32:52,619
that's kind of complicated. 

702
00:32:52,619 --> 00:32:54,639
Maybe I should just create a function to do it. 

703
00:32:54,640 --> 00:32:57,520
All right, that's the old notion of top-down design. Any time you get somewhere, well,

704
00:32:57,519 --> 00:32:58,470
you're, like, 

705
00:32:58,470 --> 00:33:01,960
wow, that's kind of complicated. Maybe I don't want to stick this all in here and 

706
00:33:01,960 --> 00:33:02,750
figure it out. 

707
00:33:02,750 --> 00:33:05,890
But it's the smaller piece, which is just dealing with a single character instead 

708
00:33:05,890 --> 00:33:07,340
of dealing with the whole string. 

709
00:33:07,339 --> 00:33:10,429
Let me write a function that will actually do it, or a method that'll actually do it. So what I'm 

710
00:33:10,430 --> 00:33:13,419
gonna do, is I'm gonna append to my results 

711
00:33:13,419 --> 00:33:17,759
what I get by calling encrypt, 

712
00:33:17,759 --> 00:33:20,860
a single character. So I'll just call it encrypt char 

713
00:33:20,859 --> 00:33:24,238
and what I'm gonna pass to it is the character that I want to encrypt, and I need

714
00:33:24,239 --> 00:33:26,900
to also pass to it the key so it knows 

715
00:33:26,900 --> 00:33:30,559
how to do the appropriate shifting to encrypt that character. 

716
00:33:30,558 --> 00:33:35,589
And after it does this encryption, I'm just gonna say hey, if you've 

717
00:33:35,589 --> 00:33:38,928
successfully encrypted all of your strings, what I want to do is return, 

718
00:33:38,929 --> 00:33:40,000


719
00:33:40,000 --> 00:33:40,910
RTN, 

720
00:33:40,910 --> 00:33:43,009
my results, right. 

721
00:33:43,009 --> 00:33:46,339
That's your standard string idiom. I start off with an empty string. 

722
00:33:46,339 --> 00:33:49,709
I do some kind of loop through every character of the string. I'm gonna do 

723
00:33:49,710 --> 00:33:52,919
the processing one character at a time, and return my results. 

724
00:33:52,919 --> 00:33:54,660
Everything in that 

725
00:33:54,660 --> 00:33:58,080
function that you see, or in that method that you see, except for that one line, should 

726
00:33:58,079 --> 00:34:01,319
be something you can do in your sleep now. You've seen it, like, over and over. We 

727
00:34:01,319 --> 00:34:04,210
just did it a couple of times today. We did it a couple times last time. 

728
00:34:04,210 --> 00:34:07,759
It's the standard kind of thing for going through a string one character at a time. 
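The string-building idiom just described might look roughly like this. The encryptCaesar method follows the steps from the lecture; the body of encryptChar is a stand-in of my own so the sketch runs, since at this point in the lecture that method has not been written yet. The methods are static here only so they can be called from main, and the sketch assumes a key from 0 to 26:

```java
class EncryptCaesarSketch {
    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Standard idiom: start with an empty string, walk the input one
    // character at a time, append the processed character, return the result.
    static String encryptCaesar(String str, int key) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            result += encryptChar(ch, key);
        }
        return result;
    }

    // Stand-in (an assumption, not the lecture's code): looks the letter up
    // in a rotated alphabet and leaves anything that is not A-Z unchanged.
    static char encryptChar(char ch, int key) {
        int index = ALPHABET.indexOf(ch);
        if (index == -1) {
            return ch;
        }
        String rotated = ALPHABET.substring(key) + ALPHABET.substring(0, key);
        return rotated.charAt(index);
    }

    public static void main(String[] args) {
        System.out.println(encryptCaesar("HELLO", 3));
    }
}
```

The outer method never needs to know how a single character is shifted, which is the top-down design point: the hard part is pushed into one small method.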

729
00:34:07,759 --> 00:34:10,929
And now, we reduced the whole problem of encrypting a whole string 

730
00:34:10,929 --> 00:34:13,239
to the problem of just encrypting a single letter. 

731
00:34:13,239 --> 00:34:14,619
So what I'm gonna have in here 

732
00:34:14,619 --> 00:34:15,940
is private, 

733
00:34:15,940 --> 00:34:18,739
and this is gonna return a single character called - 

734
00:34:18,739 --> 00:34:21,639
and this puppy's called encrypt char. 

735
00:34:21,639 --> 00:34:25,499
And it's gonna get passed in some character to encrypt as well as the key 

736
00:34:25,498 --> 00:34:28,018
that it's gonna use to encrypt it. 

737
00:34:28,018 --> 00:34:29,408
And now I want to figure 

738
00:34:29,409 --> 00:34:33,898
out how do I encrypt that single character. So 

739
00:34:33,898 --> 00:34:37,168
what's something I could do to think about how this character actually gets 

740
00:34:37,168 --> 00:34:38,708
encrypted. 

741
00:34:38,708 --> 00:34:41,778
How do I want to do the appropriate shifting of the character. So let's say I've 

742
00:34:41,778 --> 00:34:42,548
gotten 

743
00:34:42,548 --> 00:34:43,940
an uppercase A. 

744
00:34:43,940 --> 00:34:47,068
Let's assume for right now all my characters are uppercase. As a matter of fact, that's a perfectly fine 

745
00:34:47,068 --> 00:34:48,070
assumption to make. 

746
00:34:48,070 --> 00:34:50,780
The solution you've gotten to, it assumes all the characters are uppercase, so 

747
00:34:50,780 --> 00:34:52,540
assume all the plain text is uppercase, 

748
00:34:52,539 --> 00:34:56,808
and I want to return the encrypted cipher text also in uppercase. Let's say 

749
00:34:56,809 --> 00:34:59,430
I've gotten an uppercase A, okay. 

750
00:34:59,429 --> 00:35:00,919
And my 

751
00:35:00,920 --> 00:35:06,528
key is 3. So what I want to do is take that A somehow, and convert it to a D. 

752
00:35:06,528 --> 00:35:13,528
How do I do that? [Inaudible]. Uh huh. 

753
00:35:14,498 --> 00:35:17,578
I want to add 3 to the character. 

754
00:35:17,579 --> 00:35:21,068
Now the only problem is I might go off the end of the character. 

755
00:35:21,068 --> 00:35:24,248
If I just add 3, and I have a Z, I'm gonna 

756
00:35:24,248 --> 00:35:27,248
- if I just have the A and go to D, that works perfectly fine, but if I have 

757
00:35:27,248 --> 00:35:30,048
a Z, I'm gonna get something like an exclamation point, or something I 

758
00:35:30,048 --> 00:35:32,278
don't know 'cause I go off the end of the character. 

759
00:35:32,278 --> 00:35:35,039
So I need to do slightly a little bit more math. And what I'm gonna do is 

760
00:35:35,039 --> 00:35:36,450
say take this character, 

761
00:35:36,449 --> 00:35:38,889
and subtract from it uppercase A. 

762
00:35:38,889 --> 00:35:42,759
That's gonna tell me which character in the alphabet it is, which number 

763
00:35:42,759 --> 00:35:44,259
character it is, right. 

764
00:35:44,259 --> 00:35:46,019
Now, if I add the key, 

765
00:35:46,018 --> 00:35:47,868
what I get is the 

766
00:35:47,869 --> 00:35:50,930
number, or the index, of the shifted character. 

767
00:35:50,929 --> 00:35:54,079
So if I had an uppercase A, and I subtract off uppercase A, I'm gonna 

768
00:35:54,079 --> 00:35:55,018
get a zero. 

769
00:35:55,018 --> 00:35:57,888
I now add the key, so I get 3. 

770
00:35:57,889 --> 00:36:01,349
And you might say, well, if you just convert that to a character, you get a D. That's perfectly fine. 

771
00:36:01,349 --> 00:36:05,649
Yeah, but if I had a Z and I subtract off an uppercase A, I get 25. 

772
00:36:05,648 --> 00:36:09,328
If I add 3 to 25, I get 28, which is now outside the 

773
00:36:09,329 --> 00:36:12,778
bounds of the alphabet. How do I wrap around that 28 back to the 

774
00:36:12,778 --> 00:36:16,389
beginning of the alphabet. 

775
00:36:16,389 --> 00:36:20,759
Mod it by 26, or we do with the remainder operator by 26, 

776
00:36:20,759 --> 00:36:23,630
right. So what that does is it says if you've gone off the end, 

777
00:36:23,630 --> 00:36:26,440
basically, when you divide by 26 and take the remainder, if you've 

778
00:36:26,440 --> 00:36:31,869
gone off the end, it kind of gets rid of the first 26, and wraps you back around the beginning. 

779
00:36:31,869 --> 00:36:33,630
So if I do that, 

780
00:36:33,630 --> 00:36:35,548
this will actually work 

781
00:36:35,548 --> 00:36:38,639
to get me the position of the character 

782
00:36:38,639 --> 00:36:41,650
wrapped around, and once I've gotten the position of the character, here's the 

783
00:36:41,650 --> 00:36:45,568
funky thing. I need to add the A back in 

784
00:36:45,568 --> 00:36:48,808
because if I have, let's say, an uppercase A to begin with, and I subtract out 

785
00:36:48,809 --> 00:36:50,380
uppercase A, that gives me zero. 

786
00:36:50,380 --> 00:36:55,970
I add the key. That gives me 3. I do the remainder by 26. Three 

787
00:36:55,969 --> 00:36:59,558
divided by 26 as the remainder is still 3. So 

788
00:36:59,559 --> 00:37:03,499
now I have the number 3. I need to get that 3 converted to the letter D. 

789
00:37:03,498 --> 00:37:05,848
How do I do that? I add the letter A 

790
00:37:05,849 --> 00:37:08,649
to that 3, okay. Is there 

791
00:37:08,648 --> 00:37:10,848
any questions about that? 
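
[Editor's sketch] The arithmetic just described, spelled out one step at a time for 'A' and 'Z' with a key of 3. The helper names are hypothetical; only the subtract, add, remainder, add-back sequence comes from the lecture.

```java
class ShiftArithmeticSketch {
    // Position of an uppercase letter in the alphabet: 'A' -> 0, 'Z' -> 25.
    static int alphabetIndex(char ch) {
        return ch - 'A';
    }

    // Index after shifting by key, wrapped around with the remainder operator.
    static int shiftedIndex(char ch, int key) {
        return (ch - 'A' + key) % 26;
    }

    public static void main(String[] args) {
        int key = 3;
        for (char ch : new char[] {'A', 'Z'}) {
            int index = alphabetIndex(ch);   // 0 for 'A', 25 for 'Z'
            int shifted = index + key;       // 3 for 'A', 28 for 'Z'
            int wrapped = shifted % 26;      // 3 for 'A', 2 for 'Z'
            int code = 'A' + wrapped;        // numeric code of the new letter
            System.out.println(ch + ": index=" + index + " shifted=" + shifted
                    + " wrapped=" + wrapped + " code=" + code);
        }
    }
}
```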

792
00:37:10,849 --> 00:37:13,939
Now, the final funky thing that I need to do, 

793
00:37:13,938 --> 00:37:17,438
is if I want to assign this to a character, I can't do this directly. Notice if I 

794
00:37:17,438 --> 00:37:20,239
try to do this directly, I get this little thingy here. And you might say 

795
00:37:20,239 --> 00:37:23,959
Mehran, what's going on? Like you told me characters were the same as numbers, and everything 

796
00:37:23,960 --> 00:37:27,889
I've done so far has to do with numbers, so why can't I assign that to a character? 

797
00:37:27,889 --> 00:37:29,969
And this little error message comes up. 

798
00:37:29,969 --> 00:37:33,268
And this has to do with the same thing when we talked about converting from real 

799
00:37:33,268 --> 00:37:36,498
values to integers. Remember when we went from a real value to an integer. We said you'll 

800
00:37:36,498 --> 00:37:39,768
lose some information if you try to truncate a real value, like a double to an 

801
00:37:39,768 --> 00:37:40,199
integer. 

802
00:37:40,199 --> 00:37:43,728
So you explicitly have to cast it from being a double to an integer. 

803
00:37:43,728 --> 00:37:46,899
Same thing with characters and integers. The set of possible integers is 

804
00:37:46,900 --> 00:37:48,628
huge. It's like billions and billions. 

805
00:37:48,628 --> 00:37:51,739
The set of characters is much smaller than that. So if you want to go from 

806
00:37:51,739 --> 00:37:55,460
an integer back to a character, you need to explicitly say 

807
00:37:55,460 --> 00:37:57,510
convert that integer back to a character. 

808
00:37:57,510 --> 00:37:59,610
So we need to explicitly do a 

809
00:37:59,610 --> 00:38:00,360
cast here 

810
00:38:00,360 --> 00:38:01,999
back to a character. 
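
[Editor's sketch] A small illustration of the cast. In Java, arithmetic on a char produces an int, so the result has to be cast back explicitly, while going from char to int needs no cast. The class and method names are assumptions for illustration, and the method assumes an uppercase letter and a non-negative key.

```java
class CharCastSketch {
    static char encryptChar(char ch, int key) {
        // The expression on the right is an int, so this line would not compile:
        // char shifted = 'A' + (ch - 'A' + key) % 26;
        return (char) ('A' + (ch - 'A' + key) % 26);
    }

    public static void main(String[] args) {
        int code = 'D';                // char to int: no cast needed
        char letter = (char) code;     // int to char: explicit cast required
        System.out.println(letter + " " + encryptChar('Z', 3));
    }
}
```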

811
00:38:01,998 --> 00:38:03,039
And if we do that, 

812
00:38:03,039 --> 00:38:06,429
then we're happy and friendly. Did 

813
00:38:06,429 --> 00:38:11,690
I get all my parens right? One, two, three, one two three. All right, 

814
00:38:11,690 --> 00:38:13,989
why is this still unhappy? Oh, 

815
00:38:13,989 --> 00:38:17,749
duplicate variable CH, yeah. Let me call this C. 

816
00:38:17,748 --> 00:38:20,399
Actually, let me make my life easier. This is a thing I just want to return, so 

817
00:38:20,400 --> 00:38:22,119
I'm just gonna return it. Do, do, 

818
00:38:22,119 --> 00:38:26,499
do, do, do. I won't even assign it to any temporary variable. We'll just return it 'cause 

819
00:38:26,498 --> 00:38:28,509
now I'm upset. No, 

820
00:38:28,509 --> 00:38:31,688
I'm really not upset. We're just gonna return it. 

821
00:38:31,688 --> 00:38:34,679
So, hopefully, that will give us our little Caesar cipher. 

822
00:38:34,679 --> 00:38:37,788
So let's go ahead and run this, and see if, in fact, it's working. 

823
00:38:37,789 --> 00:38:40,790
Any questions about this while this is running? I'll sort of scroll this 

824
00:38:40,789 --> 00:38:42,708
down a little bit so you can see 

825
00:38:42,708 --> 00:38:44,588
what's going on for that single character. 

826
00:38:44,588 --> 00:38:49,139
So this was my Caesar cipher. 

827
00:38:49,139 --> 00:38:52,798
So we say, et tu, brute. 

828
00:38:52,798 --> 00:38:56,409
Illegal number format. Yeah 'cause that's not the thing I wanted to encrypt. 

829
00:38:56,409 --> 00:38:58,519
My encryption here is 3, 

830
00:38:58,518 --> 00:39:02,139
then I will give it the plain text I was to - everyone's like what did 

831
00:39:02,139 --> 00:39:03,929
he do. Sometimes 

832
00:39:03,929 --> 00:39:05,759
it's the obvious that's wrong, 

833
00:39:05,760 --> 00:39:08,059
and you just need to read. 

834
00:39:08,059 --> 00:39:11,249
All right, there we actually go. Now, there's a little problem here. See, the 

835
00:39:11,248 --> 00:39:14,708
little problem is the spaces actually got encrypted. 

836
00:39:14,708 --> 00:39:18,448
We don't want to encrypt spaces. We only want to encrypt things that are actually 

837
00:39:18,449 --> 00:39:21,688
valid characters. So we're not quite done yet. What we need to do is come back over 

838
00:39:21,688 --> 00:39:22,598
here and say, 

839
00:39:22,599 --> 00:39:26,009
hey, you know what, for my encrypt character, I wasn't quite as bright as I thought I 

840
00:39:26,009 --> 00:39:26,259
was. 

841
00:39:26,250 --> 00:39:30,170
I need to make sure this thing's actually an uppercase character before I try to encrypt 

842
00:39:30,170 --> 00:39:33,430
it. So we can sort of do that if 

843
00:39:33,429 --> 00:39:36,528
I just call my little friend Character. 

844
00:39:36,528 --> 00:39:39,079
And the thing we want to say is, 

845
00:39:39,079 --> 00:39:41,369


846
00:39:41,369 --> 00:39:45,039
is uppercase, 

847
00:39:45,039 --> 00:39:49,829
and I'll pass it CH. So if it's already - if it's an uppercase character, then I'll 

848
00:39:49,829 --> 00:39:51,859
return this. 

849
00:39:51,858 --> 00:39:54,088
Otherwise, what I'll do - I'll tab this in - 

850
00:39:54,088 --> 00:39:56,688
is I will just return CH 

851
00:39:56,688 --> 00:40:03,688
unchanged. So if I've gotten something that isn't actually a character, then I'll return - do, do - why 

852
00:40:04,849 --> 00:40:10,649
is this unhappy again. Oh, semicolon, thank you. All 

853
00:40:10,648 --> 00:40:14,268
right. [Inaudible] Now I got an extra one. Notice it doesn't give me an error on the extra one 'cause, actually, semicolon without a statement 

854
00:40:14,268 --> 00:40:17,448
is the empty statement. It's perfectly fine, but thank you for catching the stray 

855
00:40:17,449 --> 00:40:18,469
semicolon. 
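
[Editor's sketch] A tiny example of the empty statement. The method is hypothetical and unrelated to the lecture's code; it only shows that an extra semicolon after a statement compiles and does nothing.

```java
class EmptyStatementSketch {
    static int countUppercase(String s) {
        int count = 0;;   // the second ';' is an empty statement
        for (int i = 0; i < s.length(); i++) {
            if (Character.isUpperCase(s.charAt(i))) {
                count++;; // same here: legal, and has no effect
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countUppercase("Et Tu"));
    }
}
```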

856
00:40:18,469 --> 00:40:20,259
So we'll go ahead and run this, 

857
00:40:20,259 --> 00:40:23,639
and we'll try our friend, et tu, Brute, again. 

858
00:40:23,639 --> 00:40:26,588
Sometimes it's all about testing, and so we have 

859
00:40:26,588 --> 00:40:27,759
et tu, 

860
00:40:27,759 --> 00:40:28,938
Brute, 

861
00:40:28,938 --> 00:40:29,918
and now we're okay 

862
00:40:29,918 --> 00:40:33,588
'cause we're not encrypting anything that is not a letter. 
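
[Editor's sketch] The guarded version next to the unguarded one, to show why the spaces were being changed. The names are assumptions for illustration, and the sketch assumes plain ASCII text, since `Character.isUpperCase` also accepts uppercase letters outside A to Z.

```java
class UppercaseGuardSketch {
    // First attempt: shifts every character, including spaces and punctuation.
    static char encryptCharUnguarded(char ch, int key) {
        return (char) ('A' + (ch - 'A' + key) % 26);
    }

    // Fixed version: only uppercase letters are shifted.
    static char encryptChar(char ch, int key) {
        if (Character.isUpperCase(ch)) {
            return (char) ('A' + (ch - 'A' + key) % 26);
        }
        // No else is needed: if the return above ran, we never get here.
        return ch;
    }

    public static void main(String[] args) {
        // A space is code 32, so (32 - 65 + 3) % 26 is -4 in Java,
        // and 'A' - 4 is the character '='.
        System.out.println("[" + encryptCharUnguarded(' ', 3) + "]");
        System.out.println("[" + encryptChar(' ', 3) + "]");
    }
}
```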

863
00:40:33,588 --> 00:40:36,719
So sometimes we think we're okay. We need to go back and just make sure we actually do 

864
00:40:36,719 --> 00:40:41,188
the testing. Any questions about this? 

865
00:40:41,188 --> 00:40:44,658
If this all made sense to you, nod your head. 

866
00:40:44,659 --> 00:40:48,969
If this didn't make sense to you, shake your head. Feel no qualms about shaking your head. 

867
00:40:48,969 --> 00:40:50,539
If you're someone in the middle, just 

868
00:40:50,539 --> 00:40:52,170
stare 

869
00:40:52,170 --> 00:40:59,170
and stare at me. No, if you're someone in the middle, shake your head. Okay, uh huh. [Inaudible]. 

870
00:41:01,139 --> 00:41:03,708
Why don't I need an else statement, like, say here? 

871
00:41:03,708 --> 00:41:07,048
'Cause if I hit the return, I return from the function immediately 

872
00:41:07,048 --> 00:41:10,798
and I never, actually, get down to this return. So if I hit this return 

873
00:41:10,798 --> 00:41:11,300
statement, 

874
00:41:11,300 --> 00:41:14,200
I'm done with the method. As soon as I hit that return, it doesn't matter if there's 

875
00:41:14,199 --> 00:41:19,478
any more lines in the method. I'm done. I actually return out. So the 

876
00:41:19,478 --> 00:41:21,659
one other thing we might like to do with this that doesn't 

877
00:41:21,659 --> 00:41:24,500
quite actually work right now. Let's actually try running this, then I'll show 

878
00:41:24,500 --> 00:41:27,030
you what happens, just to show you that it's bad times. 

879
00:41:27,030 --> 00:41:30,339
If I actually encrypt something like et tu, Brute, and I want to decrypt it, I might 

880
00:41:30,338 --> 00:41:31,250
say hey, 

881
00:41:31,250 --> 00:41:35,108
try to use minus 3 as your key, 

882
00:41:35,108 --> 00:41:38,568
and if I try to put in the text - I don't even remember what the text was that I wanted to 

883
00:41:38,568 --> 00:41:41,748
encrypt. I get this funky thing with question marks, and it's just not working 

884
00:41:41,748 --> 00:41:43,448
to move in the negative direction. 

885
00:41:43,449 --> 00:41:47,789
So I want to allow for my Caesar cipher to also be able to decrypt information, 

886
00:41:47,789 --> 00:41:51,700
which means if I got a Caesar cipher by encrypting with a key of 3, 

887
00:41:51,699 --> 00:41:55,839
if I give it the text that's been encoded, and I give it the key minus 3, it 

888
00:41:55,840 --> 00:41:59,278
should shift it back three letters and actually work for me. So 

889
00:41:59,278 --> 00:42:01,318
how do I do that? Well, 

890
00:42:01,318 --> 00:42:04,909
it's something that has to deal with each individual character. If I want to 

891
00:42:04,909 --> 00:42:07,849
encrypt each individual character, I need to figure out what's the right way of 

892
00:42:07,849 --> 00:42:09,709
using the key, okay. 

893
00:42:09,708 --> 00:42:16,518
Think about a key of minus 3. What's a key of minus 3 equivalent to? 

894
00:42:16,518 --> 00:42:19,788
A key of 23, right. A few people mumbled it, so we'll just throw out 

895
00:42:19,789 --> 00:42:20,699
some candy. 

896
00:42:20,699 --> 00:42:24,278
If I want to go 3 in the opposite direction, if I want to go 3 

897
00:42:24,278 --> 00:42:26,260
sort of this way, as opposed to this way. 

898
00:42:26,260 --> 00:42:27,890
It's the same thing as going 

899
00:42:27,889 --> 00:42:30,648
23 characters in the opposite direction. 
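
[Editor's sketch] A quick check that shifting forward by 23 undoes a forward shift of 3 for every position in the alphabet. The helper name is hypothetical.

```java
class KeyEquivalenceSketch {
    // True if shifting by 'backward' restores every index shifted by 'forward'.
    static boolean undoesShift(int forward, int backward) {
        for (int index = 0; index < 26; index++) {
            int encrypted = (index + forward) % 26;
            int decrypted = (encrypted + backward) % 26;
            if (decrypted != index) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(undoesShift(3, 23));
    }
}
```

This works because 3 + 23 is 26, and adding 26 before taking the remainder by 26 changes nothing.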

900
00:42:30,648 --> 00:42:32,389
So if I want to think about doing that, 

901
00:42:32,389 --> 00:42:34,309
I can say, if my key 

902
00:42:34,309 --> 00:42:38,469
is a negative number, so if my key is less than zero, there's some shifting I 

903
00:42:38,469 --> 00:42:41,568
need to do of the key to actually get this puppy to work. 

904
00:42:41,568 --> 00:42:44,829
So if my key is less than zero - as a matter of fact, I'm gonna do this once 

905
00:42:44,829 --> 00:42:45,609
down here. 

906
00:42:45,608 --> 00:42:48,288
So rather than doing it, and encrypting each character, I'm gonna do it over here 

907
00:42:48,289 --> 00:42:51,809
by saying you know what, once I shift my key over, I want to use that 

908
00:42:51,809 --> 00:42:56,019
same key to encrypt all my characters. So I want to do the shifting just once up 

909
00:42:56,018 --> 00:42:59,598
here. It makes sense to do it once for the string, and then I'll use my updated version of key. 

910
00:42:59,599 --> 00:43:01,919
So here's what I'm gonna do. I'm 

911
00:43:01,918 --> 00:43:03,478
gonna say take the key, 

912
00:43:03,478 --> 00:43:07,118
and the way I'm gonna update key is I'm gonna say it's 26. And this 

913
00:43:07,119 --> 00:43:10,568
looks a little bit funky, but I'll explain it to you in just a second - 

914
00:43:10,568 --> 00:43:12,369
modded 

915
00:43:12,369 --> 00:43:14,499
by 26. 

916
00:43:14,498 --> 00:43:18,129
And you might say Mehran, why do you need all this math to actually pull it off 

917
00:43:18,130 --> 00:43:20,838
'cause if you do something, you could say why can't you just take 26 

918
00:43:20,838 --> 00:43:22,358
and subtract from it 

919
00:43:22,358 --> 00:43:26,498
your key. So if you want to say minus - or add toward your key. So if 

920
00:43:26,498 --> 00:43:30,688
you want to have a key of minus 3, isn't that just the same as adding minus 3 

921
00:43:30,688 --> 00:43:33,379
to 26. You'll get 23, aren't you fine. 

922
00:43:33,380 --> 00:43:37,130
Yeah, that's fine for sufficiently small values of key. So if this thing 

923
00:43:37,130 --> 00:43:39,710
actually is minus 3, minus, minus 3 

924
00:43:39,710 --> 00:43:40,829
gives me 3, 

925
00:43:40,829 --> 00:43:43,199
and if I were to - oh, missing 

926
00:43:43,199 --> 00:43:44,190
a minus in here. 

927
00:43:44,190 --> 00:43:46,349
Sorry, my bad. I had two 

928
00:43:46,349 --> 00:43:49,829
minuses. I want to have another minus 

929
00:43:49,829 --> 00:43:50,260


930
00:43:50,260 --> 00:43:53,620
right there. So if key is minus 3, and I take a negative of minus 3, that gives 

931
00:43:53,619 --> 00:43:57,190
me 3. Twenty-six minus 3 by itself would give me 23, which is 

932
00:43:57,190 --> 00:44:00,519
the value I care about and that's perfectly fine. 

933
00:44:00,518 --> 00:44:04,258
But what happens if this key that someone gives me is something, for example, that's 

934
00:44:04,259 --> 00:44:06,009
larger than 

935
00:44:06,009 --> 00:44:08,119
26. That's kind of bad times 

936
00:44:08,119 --> 00:44:10,449
because if I subtract a number that's 

937
00:44:10,449 --> 00:44:14,338
larger than 26 from the 26, so if this happens to be minus, 

938
00:44:14,338 --> 00:44:18,599
let's say 27, and I say minus minus 27 is positive 27. And I 

939
00:44:18,599 --> 00:44:22,499
subtract 27 from 26, I get minus 1. That's bad times. 

940
00:44:22,498 --> 00:44:23,058


941
00:44:23,059 --> 00:44:26,940
So the reason why I have this 26 in here, is it says first take the key. They 

942
00:44:26,940 --> 00:44:28,619
gave me some negative value. 

943
00:44:28,619 --> 00:44:33,039
Take the negative of that, which gives you some positive value. When you mod it by 26, you 

944
00:44:33,039 --> 00:44:37,669
will guarantee that that value they've given you is less than 

945
00:44:37,668 --> 00:44:39,478
26 'cause if it was 

946
00:44:39,478 --> 00:44:42,588
26, 26 mod 26 is zero. Something larger than 26 gives me 

947
00:44:42,588 --> 00:44:43,978
a remainder. 

948
00:44:43,978 --> 00:44:46,699
So as long as I mod by 26, I will always get back 

949
00:44:46,699 --> 00:44:52,328
the appropriately mapped value, less than 26, and then I will subtract that [inaudible]. 
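
[Editor's sketch] The key adjustment described above, next to the simpler "just add 26" idea that breaks for keys below minus 26. The method names are assumptions for illustration; the formula is the one the lecture builds up.

```java
class NegativeKeySketch {
    // Simple idea: fine for -3, but still negative for keys below -26.
    static int naiveKey(int key) {
        return 26 + key;
    }

    // Negate the key, reduce it with the remainder by 26, then subtract
    // from 26. Non-negative keys are returned unchanged.
    static int normalizeKey(int key) {
        if (key < 0) {
            key = 26 - (-key % 26);
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(naiveKey(-3) + " " + normalizeKey(-3));
        System.out.println(naiveKey(-27) + " " + normalizeKey(-27));
    }
}
```

A key of minus 26 maps to 26 rather than 0 here, which is still fine because the per-character step takes the remainder by 26 again.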

950
00:44:52,329 --> 00:44:54,739
So just to make sure this actually works, 

951
00:44:54,739 --> 00:44:55,759
what I'm gonna do 

952
00:44:55,759 --> 00:44:57,909
is in my main program, 

953
00:44:57,909 --> 00:45:03,689
I'm gonna say encrypt Caesar using this key. 

954
00:45:03,688 --> 00:45:05,778
And then, do, do, do, so I have some cipher text. 

955
00:45:05,778 --> 00:45:08,599
I'm going to now - well, actually, let 

956
00:45:08,599 --> 00:45:11,640
me write out the cipher text so I'll still use this println. 

957
00:45:11,639 --> 00:45:12,699
And then, 

958
00:45:12,699 --> 00:45:17,838
I'm going to have some other string, 

959
00:45:17,838 --> 00:45:19,150
new plain. 

960
00:45:19,150 --> 00:45:23,180
A new plain is just going to be doing encrypt 

961
00:45:23,179 --> 00:45:24,239


962
00:45:24,239 --> 00:45:25,278
Caesar 

963
00:45:25,278 --> 00:45:28,289
on my cipher text, so that should be my encrypted text 

964
00:45:28,289 --> 00:45:30,199
with the negative of the key. 

965
00:45:30,199 --> 00:45:32,449
So I want to, essentially, switch back 

966
00:45:32,449 --> 00:45:33,548
to what I've got. 

967
00:45:33,548 --> 00:45:35,900
And so I'll have println 

968
00:45:35,900 --> 00:45:40,599
new plain quote dot 

969
00:45:40,599 --> 00:45:42,709
whatever the new - man, 

970
00:45:42,708 --> 00:45:50,909
I cannot type to save my life - L, println. Thank you.

972
00:45:50,909 --> 00:45:54,759
So now I run this puppy 

973
00:45:54,759 --> 00:45:57,960
in our final moment together, 3 

974
00:45:57,960 --> 00:46:00,918
et tu, Brute. 

975
00:46:00,918 --> 00:46:03,248
Well, at least I got it back even though I misspelled it. 

976
00:46:03,248 --> 00:46:04,778
I got my 

977
00:46:04,778 --> 00:46:07,978
mixed-up characters, and then I got my new plain text, which is the same as my 

978
00:46:07,978 --> 00:46:11,998
original text, which I got just by, essentially, shifting in the negative direction. 
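
[Editor's sketch] Putting the pieces together, a round trip might look like this. It is a plain-Java approximation, not the lecture's exact program: the key and text are hard-coded instead of read from the console, and the class name and static methods are assumptions.

```java
class CaesarRoundTripSketch {
    static String encryptCaesar(String str, int key) {
        // Adjust a negative key once, up front, then reuse it for every character.
        if (key < 0) {
            key = 26 - (-key % 26);
        }
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            result += encryptChar(str.charAt(i), key);
        }
        return result;
    }

    static char encryptChar(char ch, int key) {
        if (Character.isUpperCase(ch)) {
            return (char) ('A' + (ch - 'A' + key) % 26);
        }
        return ch; // spaces, punctuation, and lowercase pass through unchanged
    }

    public static void main(String[] args) {
        int key = 3;
        String plain = "ET TU, BRUTE";
        String cipher = encryptCaesar(plain, key);
        String newPlain = encryptCaesar(cipher, -key);
        System.out.println("Cipher text: " + cipher);   // HW WX, EUXWH
        System.out.println("New plain: " + newPlain);   // ET TU, BRUTE
    }
}
```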

979
00:46:11,998 --> 00:46:14,858
So any questions about that. 

980
00:46:14,858 --> 00:46:17,818
Alrighty, then we're done with strings for the time being, and I'll see you on 

981
00:46:17,818 --> 00:46:18,108
Wednesday.