This presentation is delivered by the Stanford Center for Professional Development.

So with that said, any questions about strings? We're gonna do a bunch more stuff today with strings and characters. But if there's any questions before we actually dive into things, let me know now. And if you could use the microphone, that would be great. One more time, take those microphones out, hold them close to your heart. Air and gear, they're lots of fun, they're your friend. Keep the microphone with you.

Actually, sorry, about the mid-term, is it going - what's the cutoff of the mid-term in terms of, like, classes?

Right, so the mid-term, for stuff you need to know, the cutoff will be Wednesday's class. So, basically, you'll have a whole week of material that you won't need to be responsible for, that will be from this Wednesday up until the mid-term. The other thing, though, is to keep in mind that a few people have asked, well, do I need the book versus your lectures for the mid-term? You need to know the lectures, and you need to know all the material from the book that is covered with respect to lectures, which is most of the material from the book.
But there's a few cases where we go over something very quickly in class, or I say refer to this page of the book or whatever. That stuff you're responsible for knowing. Stuff that I've, like, explicitly told you you don't need to know, like polar coordinates, isn't gonna be on the exam, okay. So the exam will be more heavily geared towards stuff from lecture, but you still should know all the stuff from the book that we've kind of referred to in lecture as we've gone along, all righty.

All right, so let's dive into our next great topic. Actually, it's a continuation of our last great topic, which is strings. And so, if we think about strings a little bit, one of the things we might want to do with strings is some string processing that also involves some characters. So how are we gonna do that? Let's just do a simple example to begin with, which is going through a string and counting the number of uppercase characters in the string. And the reason why I'm gonna harp on strings a whole bunch - we talked about it last time, we're gonna talk about it this time - guess what your next assignment's gonna be. It's gonna be all about string processing. So it's good stuff to know, okay. So we might want to have some function, countUppercase.
And that's a function I've actually given to you in one of the handouts, so you don't need to worry about jotting down all my code real quickly, but you might want to pay close attention. And what this does is it gets passed some string, str, and it's gonna count how many uppercase characters are in that string. So it's gonna return an int. And let's just say this is part of some other program, so we'll call this private, although you could make it public if it was in some class that you wanted to make available for other people to use. So if we want to count the number of uppercase characters, what do we want to think about doing? What's the kind of standard idiom that we use for strings? Anyone remember? What? We want to have a for loop, [inaudible] somewhere over here. Yeah, it's just raining candy on you. We want to have a for loop that goes through all the characters of the string, sort of counting through the characters. So we can do that by just saying for int i equals zero, i less than the length of the string, right. So str.length() is the method we use to get the length of the string, and then i++. And this is gonna loop through all the characters of the string.
Okay, well, actually it's gonna loop through some number of times, which is the number of characters in the string. Now we want to pull out each one of the characters, individually, to check to see if it's an uppercase character. What method might we use to do that? Get a character out of a string in a particular position. Come on, I'm begging for it. charAt - it's like, where'd it go? It's just gone. charAt, and we'll just, for the delayed reaction, we'll do it in slow mo. Anyone remember "The Six Million Dollar Man," that show? No, all right. Get another man. I'm just getting so old, I gotta hang it up. And the thing is, I'm not that much older than you. But it's just amazing how big a difference a few years makes. So char ch is going to be, from this string, we're gonna pull out the charAt position i. So now we've actually got each one. We're gonna loop through each character of the string, pulling out that character, and we want to check to see if the character's uppercase.
We could actually have an if statement in here to check to see if that ch is in between uppercase A and uppercase Z, which is kind of how you saw last time we could do some math on characters. We're gonna use the new funky way, which is to actually use one of the methods from the Character class, and just say if. And the way we use the methods from the Character class, we specify the name of the class here as opposed to the name of an object, because the methods from the Character class are what we refer to as static methods. There is no object associated with them. They're just methods that you call and pass a character into. So Character.isUpperCase, because this returns a boolean, and we'll pass it ch to see if ch is an uppercase character, okay. If it is an uppercase character, we want to somehow keep track of the number of the uppercase characters we have. So how might we do that? Counter, right. So have some int count equals zero, up here, that I want initialized. Who said that? It came from somewhere over here. Come on, raise your hand. Don't be shy. It's a candy extravaganza. So if Character.isUpperCase(ch), then count, we're just gonna add 1 to.
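To make the two ways of testing a character concrete, here is a small sketch. The class name UppercaseCheckDemo and the helper isUpperByRange are made up for illustration and are not from the handout; Character.isUpperCase is the standard static method the lecture refers to.

```java
class UppercaseCheckDemo {
    // "Math on characters" version: true only for the letters 'A' through 'Z'.
    // (Hypothetical helper, written here just for comparison.)
    static boolean isUpperByRange(char ch) {
        return ch >= 'A' && ch <= 'Z';
    }

    public static void main(String[] args) {
        char ch = 'Q';
        // Static method: called on the class name Character, not on an object.
        System.out.println(Character.isUpperCase(ch));   // true
        System.out.println(isUpperByRange(ch));          // true
        System.out.println(Character.isUpperCase('q'));  // false
    }
}
```

One difference worth knowing: the range test only covers the 26 English capitals, while Character.isUpperCase also recognizes uppercase letters outside that range.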
Otherwise we're not gonna increment the count. It's not an uppercase character. And then, we end the for loop. So this is gonna go through all the characters of the string. For every character, check to see if it's uppercase. If it is, increment our count, and at the end, what we want to do is, basically, return that count, which tells us how many uppercase characters were actually in the string, okay. Is there any questions about this? This is kind of like an example of the sort of vanilla string processing you might do. You have some string. You go through all the characters of the string. You do some kind of thing for each character of the string. In this case, we're not creating a new resulting string. We're just counting up some number of characters that might be in the string.

So we can do something a little bit more funky. This is kind of fun, but it's sort of like, yeah, just basic kind of string and character stuff. Let's see something a little bit more funky, which is actually to do some string manipulation to break the string up into smaller pieces.
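Putting the uppercase-counting walkthrough together, a minimal sketch might look like the following. The lecture declares the method private inside a larger program; here it is static and wrapped in a made-up class, UppercaseCounter, only so that it runs on its own.

```java
class UppercaseCounter {
    // Returns how many characters in str are uppercase.
    static int countUppercase(String str) {
        int count = 0;                           // the counter, initialized before the loop
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);             // pull out the character at position i
            if (Character.isUpperCase(ch)) {
                count++;                         // only uppercase characters are counted
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countUppercase("Hello There, Mary")); // prints 3
    }
}
```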
And so what we want to do is replace some occurrence of a substring in a larger string with some other string, sort of like when you work in [inaudible], when you do Find/Replace. You say, hey, find me some little string, or some little word that's actually in my bigger document. I'm gonna replace it with some other word. We're actually gonna implement that as a little function, okay. So what this is gonna do, we'll call this replaceOccurrence just to keep the name short, but, in fact, all we're gonna do is replace the very first occurrence in a string. So we're gonna get passed in some string, str, and what we want to do is, basically, have some original string, which is the thing that we want to replace, with some replacement string. So we're gonna get passed three parameters here. We'll call this rpl for replace, okay. Which is the large string, a piece of text that I want to replace some word in, the original word that I want to replace, and the thing that I want to replace it with, okay. And so what I want to do, because strings are immutable, right - I can't change the string in place.
I have to actually return a new string, which has this original replaced by this string. So, this puppy's gonna return a String, and we'll just make this private again, although we could have made it public if we wanted to have it in a library that other people would use, or a class that other people would use, okay. So how might we think about the algorithm for replacing this original string with the replacement? What's the first thing we might want to think about that we want to do with the original string? Do, do, do, do, a little concentration music. We want to find it, right. We want to see if this original string appears somewhere in that string, right, because if it doesn't, we're done. Thanks for playing, right, but that's actually the good "thanks for playing." It's sort of like you got no more work to do. And there's, actually, some methods from the String class that we can use to do that. So there's a method in the String class called indexOf. And what indexOf does is I can pass it some string, like the original string I want to look up, and it will return to me a number. That number is the index of the position of the first character of this string if it appears in the larger string.
So the larger string is the one that I'm sending the message to, and I'm asking it, do you have this original string somewhere inside you? If you do, return me the index of its first occurrence. And if you don't, it returns a negative 1. So I'm gonna assign this thing to some variable I'll call index, and first of all, I want to check to see if I have any work to do. If index is not equal to negative 1, then I have some work to do. If it is equal to negative 1, that means hey, you know what, you wanted to replace this original string inside string str. That original string doesn't exist, so I got no work to do. You just called, like, find and replace in the word processor, and the thing you wanted to find wasn't there, okay. So in that case, all I would do is I would just return str, right. Sort of unchanged, if I assume that I'm not doing what's inside the braces. If I do find that string, though, I'm gonna get some index, which is not negative 1, which is the position of this original string. So let's do a little example just to make this a little bit more clear what's going on.
So if we were to call this function - do, do, do, do, do - and pass in the string, str. So here's str that we're gonna pass in. We'll just put it in a big box, and we'll say, at this point in life, everyone's just friendly. So we say "Stanford loves Cal," right. Sometimes you have to distort reality in order to make an example. All right, so we have "Stanford loves Cal." That's our original string, str, and we might want to say, well, you know, this is, really, not always the way life is. Really, the way life is, is we want to replace the occurrence in str of the word "loves" with kind of a more realistic example, like the word "beats," right. So what we want to do - and then we're gonna - this is gonna be some string that comes back, and we'll assign it back to str. And the question is, when we call this, what index are we actually gonna find in here of the original string? So with strings we start counting from zero. Zero, 1, 2, 3, 4, 5, 6, 7, 8. Nine is where the L is at. And it keeps going.
And 11, 12, 13, just put these all together - 15, 16, 17 is the L and that would be the end of the string. Sorry, the numbers are a little bit small. But the key is this L is at 9, okay. So when I call str.indexOf on the original, it asks, does the word, or the string, "loves," show up somewhere in the larger string? Yeah, it does. It appears at index 9, so that's what you get. So if I've just gotten index 9, and what I want to do is construct some new string that, essentially, is going to have this portion removed from it, how do I want to do that? What I want to think about is the way I construct that string, it's from three pieces. The first piece is everything up to the word I want to replace. That's Piece No. 1. The second piece is the thing that I actually want to replace it with, the string I'm replacing with, right. So this becomes Piece No. 2. And then, everything else after the piece I've replaced is Piece No. 3. So if I can concatenate those three pieces together, I'm going to essentially get the new string, which has this part replaced. And the question is how do I find the appropriate indexes inside my larger string to be able to actually do the replacement, okay.
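The index lookup in the "Stanford loves Cal" example can be checked with a few lines. The class name IndexOfDemo and the helper hasOccurrence are made up for illustration; indexOf and charAt are the standard String methods.

```java
class IndexOfDemo {
    // Mirrors the "do I have any work to do?" check: indexOf returns -1 when orig is absent.
    // (Hypothetical helper; "orig" is shorthand for the original string.)
    static boolean hasOccurrence(String str, String orig) {
        return str.indexOf(orig) != -1;
    }

    public static void main(String[] args) {
        String str = "Stanford loves Cal";
        System.out.println(str.indexOf("loves"));        // 9: positions are counted from zero
        System.out.println(str.charAt(9));               // l
        System.out.println(str.indexOf("hates"));        // -1: not found
        System.out.println(hasOccurrence(str, "loves")); // true
    }
}
```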
So first thing that I'm gonna do here is say get me the first portion. So what is, essentially, the substring of the original string up to this L position? So the way I can do that is I can say str.substring, and I'm gonna get the substring starting at zero 'cause I want to start at the beginning of the string, and I want to go all the way up to, but not including, the L. That means the last position in substring - remember, in substring you give it two indexes. You give it the starting point, and the position up to, but not including, that last character. That's position 9. Where am I getting position 9 from in this thing? From index, right. Index says where does "loves" start? It starts at position 9. I'm, like, hey, that's fantastic. So zero up to index, or zero up to 9, is "Stanford" and the space. It does not include the L. So I get that portion. Then I say well, to that - I'm not done yet, so premature [inaudible] in there. Always gotta watch out for that, bad time. So what we're gonna add to that is we're gonna add the string that we want to replace in here, "beats," which happens to be the string called the replacement, or rpl.
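As a quick check of the first two pieces, here is a sketch that builds only the prefix plus the replacement. The class and method names are invented for illustration, and the helper assumes that orig really does occur in str (otherwise substring would be handed -1 and throw an exception).

```java
class FirstTwoPieces {
    // Piece 1 (everything before orig) followed by piece 2 (the replacement).
    // Assumes orig occurs in str; this sketch does no -1 check.
    static String prefixPlusReplacement(String str, String orig, String rpl) {
        int index = str.indexOf(orig);
        return str.substring(0, index) + rpl;   // substring(0, index) stops just before index
    }

    public static void main(String[] args) {
        String str = "Stanford loves Cal";
        System.out.println("[" + str.substring(0, 9) + "]");              // [Stanford ]
        System.out.println(prefixPlusReplacement(str, "loves", "beats")); // Stanford beats
    }
}
```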
And then to that, we want to add one more string. And that's, essentially, everything from after "loves" over, to get that third piece, okay. So what I want to know is what's the index of the position at which I need to get characters over to the end. That happens to be position 14. What is 14 equal to, relative to the kinds of things I have over here? It's index, 'cause I have to first get over to the 9, then I need to jump over the length of this thing, which is the length of my original string. So if I add to index my original's .length(), what that gives me is the index from which I want to take a substring over to the end of the string. So if I want to take a substring, this becomes an index to the substring function, or the substring method. And so from the string, what I do is I take the substring, starting at position 14. Notice I haven't given a second index here. In this case I gave two indexes. I gave a start and end position.
Here I just gave one index, and what happens if I only give one index? It goes to the end. So that's part of the beauty, is a lot of times you just say, hey, from this position go to the end. And so that's what I get when I put all these string things together. And what I need to do is, these three things are just pieces. I'm concatenating them together. I assign them back to str. And then, when I return str here, I've gotten those three pieces concatenated together. Is there any questions about that? Uh huh. If "loves" appears more than once, indexOf just returns the index of the very first occurrence. There's actually a version of indexOf that takes two parameters. One is the thing you're looking for, and the second is from which position you should start looking for it at. And so you could actually say look for "loves" starting at position, you know, 13, and then it wouldn't actually find "loves" in the remainder of the string. So there's a different version of indexOf, but indexOf always returns the index of the very first occurrence of the string you're looking for in that string. So let's actually do a little example of this in a running program. Do, do, do, do, do.
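Assembled from the three pieces described above, the whole method might be sketched like this. The parameter name orig is shorthand for "the original string" and the wrapping class ReplaceDemo is made up; the lecture's version is a private method in a larger program.

```java
class ReplaceDemo {
    // Returns a new string with the first occurrence of orig replaced by rpl.
    // If orig does not occur, str is returned unchanged.
    static String replaceFirstOccurrence(String str, String orig, String rpl) {
        int index = str.indexOf(orig);
        if (index != -1) {
            str = str.substring(0, index)                  // piece 1: before orig
                + rpl                                      // piece 2: the replacement
                + str.substring(index + orig.length());    // piece 3: after orig, to the end
        }
        return str;
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstOccurrence("Stanford loves Cal", "loves", "beats"));
        // prints: Stanford beats Cal

        // Two-parameter indexOf: start searching at a given position.
        System.out.println("Stanford loves Cal".indexOf("loves", 13)); // -1
        System.out.println("love, love".indexOf("love", 1));           // 6
    }
}
```

Because strings are immutable, the assignment to str inside the method only rebinds the local variable; the caller sees the change only through the returned value.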
And we'll do replace occurrence. And one thing that actually goes on at Stanford, which I thought was an interesting thing when I got here professionally, is we don't like to speak in full terms. So if we want to Stanfordize some strings, we do all these string replacements. We sort of say, you know what, if you have Florence Moore in your string, that's really FloMo. And Memorial Church is MemChu; AmerSc, [inaudible]; psychology is psych; economics, econ; your most fun class, CS 106A. So it's just what Stanford's all about. And so if we go ahead and run this, right. Here's the function we just wrote. Here's our little friend, replaceFirstOccurrence. Over here we called it replaceOccurrence. I'm being explicit and saying it's only replacing the first occurrence. You could think of a way to generalize this to replace all occurrences in a string if you wanted to. But I didn't give you that version 'cause I might give you that version on another problem set at some point. So what we're gonna do is we're gonna ask the user, enter a line to Stanfordize. Notice I want to put Stanfordize inside double quotes. So I put it in these characters, backslash-quote, which just means a single double-quote character.
That's how I print double quotes. So it says readLine for "Stanfordize" in quotes. I want to keep reading lines and Stanfordizing them until the user gives me an empty line. How do I do that? I check to see if the line the user gives me is equal to a quote-quote. So if it's equal to a quote-quote, it's equal to the empty string. That means, hey, you entered in - if we ask the user for a string, they just hit enter. They didn't enter any characters. That's the empty string, so we would break out of the loop. It's our little loop-and-a-half concept. Otherwise, we say "At Stanford we say," and we Stanfordize the line. And when someone's finally done, we say thank you for visiting Stanford, ha, ha, ha. That'll be $45,000.00. [Laughter]. All right, so it's money well spent, trust me. Really.

Okay, so replace occurrence is what we want to run, and we come along, and it's running, it's running, it's running. Sometimes my computer's running a little bit slow. I noticed this weird thing last night. I'm gonna tell you a story while the computer's actually running. I couldn't type N's on my keyboard for some reason. And then I reset my computer, and I could. So at this point, I don't know if I can type N's. So let's just hope we can.
So I live in - oh, I got the N - Florence - you should have been here last night. I was like, N, N, and I wasn't getting it - Florence Moore, major in economics - I can't even type today - and spend all my time on my most fun class. And so, at Stanford we say I live in FloMo, major in Econ, and spend all my time on CS 106A, okay. And now, I hit return: Thank you for visiting Stanford. Go home. All right, so that's kind of a simple version of replace first occurrence. And notice you can actually replace multiple things in the same string, as long as the string that you're doing the replacement on you assign back to itself. And then we kind of do a whole bunch of these replacements in a row, okay. Is there any questions about that? Are you feeling okay about doing replacement? All right.

So now, it's time for something completely different. Although it's not completely different, it's just kind of different. And the idea is sometimes - and I always say that - sometimes you want to do this. Yeah, 'cause sometimes you want to do it, and other times you don't. Sometimes you feel like a nut. Sometimes you don't.
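The Stanfordize program demonstrated above could be sketched roughly as follows. Several things here are assumptions rather than the handout's code: the course program reads input with readLine, whereas this standalone version uses java.util.Scanner; the class and method names are invented; and the replacement pairs are a partial list reconstructed from what is said in lecture, with exact spellings guessed.

```java
import java.util.Scanner;

class Stanfordizer {
    // Same three-piece replacement described in lecture.
    static String replaceFirstOccurrence(String str, String orig, String rpl) {
        int index = str.indexOf(orig);
        if (index != -1) {
            str = str.substring(0, index) + rpl + str.substring(index + orig.length());
        }
        return str;
    }

    // Each result is assigned back to line so the replacements accumulate.
    // The pairs below are illustrative guesses, not the handout's exact list.
    static String stanfordize(String line) {
        line = replaceFirstOccurrence(line, "Florence Moore", "FloMo");
        line = replaceFirstOccurrence(line, "Memorial Church", "MemChu");
        line = replaceFirstOccurrence(line, "economics", "econ");
        line = replaceFirstOccurrence(line, "my most fun class", "CS106A");
        return line;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (true) {                                       // loop-and-a-half
            System.out.print("Enter a line to \"Stanfordize\": "); // \" prints a double quote
            if (!in.hasNextLine()) break;                    // stop cleanly at end of input
            String line = in.nextLine();
            if (line.equals("")) break;                      // empty line ends the loop
            System.out.println("At Stanford we say: " + stanfordize(line));
        }
        System.out.println("Thank you for visiting Stanford.");
    }
}
```

Note that the empty-line test uses equals rather than ==, since == on strings compares object identity rather than contents.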
Oh, man, 393 00:18:11,099 --> 00:18:14,399 I gotta start watching TV in this decade. 394 00:18:14,400 --> 00:18:15,580 395 00:18:15,579 --> 00:18:18,369 So, tokenizers. 396 00:18:18,369 --> 00:18:22,629 What is a tokenizer? A tokenizer is something, as they say it's a 397 00:18:22,630 --> 00:18:27,280 computer science term. All a tokenizer is, is we have some string of text. What we 398 00:18:27,279 --> 00:18:30,829 want to do is break it up into tokens. That's called tokenization. So you 399 00:18:30,829 --> 00:18:31,609 might say, 400 00:18:31,609 --> 00:18:34,879 Mehran, what is a token? Like, last time I remember what a token was, is when I gave 401 00:18:34,880 --> 00:18:39,470 a dollar at the arcade and I got back, like, ten tokens instead of quarters. And you're like, 402 00:18:39,470 --> 00:18:42,680 yeah, Mehran, I never did that. I had 403 00:18:42,680 --> 00:18:46,259 an XBox. All right, so a tokenizer - anyone ever go to an arcade? All right, 404 00:18:46,259 --> 00:18:47,549 just checking. 405 00:18:47,549 --> 00:18:52,940 All right, a token, basically, is a piece of string - a piece of string - is a 406 00:18:52,940 --> 00:18:55,900 string that has on the two sides of it, 407 00:18:55,900 --> 00:18:57,929 white space. So if I say, 408 00:18:57,929 --> 00:18:59,790 hello 409 00:18:59,789 --> 00:19:00,980 there, 410 00:19:00,980 --> 00:19:02,169 Mary, 411 00:19:02,169 --> 00:19:03,020 hello 412 00:19:03,019 --> 00:19:06,529 there and Mary are tokens. They are something that we refer to as delimited 413 00:19:06,529 --> 00:19:11,200 by white space, which means there are either spaces, or tabs, or returns, or whatever, 414 00:19:11,200 --> 00:19:14,700 in between the individual tokens. We like to think of tokens as 415 00:19:14,700 --> 00:19:15,420 words, 416 00:19:15,420 --> 00:19:18,690 but computer scientists say token.
Token is a more general term 'cause if I actually said 417 00:19:18,690 --> 00:19:20,210 hello there 418 00:19:20,210 --> 00:19:20,970 comma 419 00:19:20,970 --> 00:19:21,740 Mary, 420 00:19:21,740 --> 00:19:25,509 the "there comma" might actually be considered one token by itself 'cause it's 421 00:19:25,509 --> 00:19:27,519 just delimited by space. 422 00:19:27,519 --> 00:19:30,538 There's a space here and a space there, so the comma's in there. And you would think, well, a 423 00:19:30,538 --> 00:19:34,509 comma's not part of the word. Yeah, that's why we call them tokens and not words. 424 00:19:34,509 --> 00:19:38,309 So if we want to tokenize, there is a library that we can use in Java that 425 00:19:38,309 --> 00:19:40,899 actually has some fun stuff in it for tokenization. 426 00:19:40,900 --> 00:19:47,759 And that's java.util, so we would import java.util.*, 427 00:19:47,759 --> 00:19:50,109 and what we get for doing that, 428 00:19:50,109 --> 00:19:53,029 is we get something called the StringTokenizer, 429 00:19:53,029 --> 00:19:58,889 which is a class that we can use to tokenize text. All right, so 430 00:19:58,890 --> 00:20:02,080 we get this thing called the StringTokenizer. 431 00:20:02,079 --> 00:20:05,369 How do I create one of these? Well, I say StringTokenizer as the type 432 00:20:05,369 --> 00:20:09,439 'cause that's the class that I have, and I'll call it tokenizer 433 00:20:09,440 --> 00:20:14,570 equals I want to create a new tokenizer. So I say new 434 00:20:14,569 --> 00:20:16,220 StringTokenizer, 435 00:20:16,220 --> 00:20:19,299 and the question that comes up here is well, 436 00:20:19,299 --> 00:20:21,450 what is the string you're gonna tokenize? 437 00:20:21,450 --> 00:20:22,920 That is the string that we 438 00:20:22,920 --> 00:20:26,850 pass to the StringTokenizer's constructor when we create a new one. So 439 00:20:26,849 --> 00:20:29,558 we might have some line here that we pass in.
440 00:20:29,558 --> 00:20:31,609 And now, line is just some string 441 00:20:31,609 --> 00:20:35,490 that maybe we got from the user, for example, by doing a read 442 00:20:35,490 --> 00:20:39,200 line. Maybe we were unfriendly and didn't give the user a prompt. We're just like, a 443 00:20:39,200 --> 00:20:42,058 blinking cursor comes up, and they're like, oh, I gotta write something. 444 00:20:42,058 --> 00:20:44,339 It's just like when you're writing a paper, right. 445 00:20:44,339 --> 00:20:47,639 The blinking cursor comes up and there's nothing there. You just gotta fill it in. 446 00:20:47,640 --> 00:20:50,000 So you write some line, and then we can say, hey, 447 00:20:50,000 --> 00:20:53,429 string tokenizer, I'm gonna create a new one of you, and the line I want you to tokenize is 448 00:20:53,429 --> 00:20:56,310 this line that I'm giving you to begin with. 449 00:20:56,309 --> 00:20:59,849 So once you get that line, there's a couple of things you can ask the string tokenizer. 450 00:20:59,849 --> 00:21:02,689 One of them is a method that returns a boolean, 451 00:21:02,690 --> 00:21:07,019 which is called hasMoreTokens. 452 00:21:07,019 --> 00:21:09,288 And the way this puppy works is you just ask 453 00:21:09,288 --> 00:21:12,480 this string tokenizer, like you would say tokenizer dot 454 00:21:12,480 --> 00:21:15,329 hasMoreTokens, like, do you have more tokens? 455 00:21:15,329 --> 00:21:18,730 Have you processed the whole string yet? So if you've just created the new tokenizer, and this 456 00:21:18,730 --> 00:21:20,079 line is kind of sitting here 457 00:21:20,079 --> 00:21:23,528 like that, and it's saying do you have any more tokens. Yeah, I got tokens, 458 00:21:23,528 --> 00:21:26,549 man. I got tokens up the wazoo. You want tokens, I'll give you tokens. 459 00:21:26,549 --> 00:21:30,700 And so, hasMoreTokens returns true.
If you've processed the whole string, as 460 00:21:30,700 --> 00:21:32,259 you will see when we get there, 461 00:21:32,259 --> 00:21:34,799 it'll say no, I don't have any more tokens. 462 00:21:34,799 --> 00:21:36,500 How do you get each token? 463 00:21:36,500 --> 00:21:40,079 Well, you ask for nextToken. 464 00:21:40,079 --> 00:21:43,799 And what nextToken does, when you call nextToken on the tokenizer, is it gives 465 00:21:43,799 --> 00:21:45,239 you the next token 466 00:21:45,239 --> 00:21:48,009 of the string that it's processing, 467 00:21:48,009 --> 00:21:52,798 as a separate string. So if I started off the tokenizer with this line, I say hey, do you 468 00:21:52,798 --> 00:21:56,160 have more tokens. It says yeah. Well, give me the next token. So what it will 469 00:21:56,160 --> 00:21:58,820 return to you is hello. 470 00:21:58,819 --> 00:22:02,189 And it will be sort of sitting here waiting to give you the next token. You can 471 00:22:02,190 --> 00:22:04,140 ask if you have more tokens. It says yeah. Give 472 00:22:04,140 --> 00:22:07,579 me the next token. It will give you "there" and the comma 473 00:22:07,578 --> 00:22:11,200 'cause in the default version of the tokenizer, the only things that delimit 474 00:22:11,200 --> 00:22:14,558 tokens - delimit is a funky word for splits between tokens - 475 00:22:14,558 --> 00:22:18,980 are spaces, or tabs, or return characters. But for a single line, you won't have returns 476 00:22:18,980 --> 00:22:19,750 in that. 477 00:22:19,750 --> 00:22:23,069 And then you say, do you have more tokens. Yeah. Give me the next token. It will give you "Mary" as 478 00:22:23,069 --> 00:22:24,638 a token that's sitting here. 479 00:22:24,638 --> 00:22:28,539 And then when you say do you have more tokens, it says no, that's all, okay. And 480 00:22:28,539 --> 00:22:31,039 at that point, you shouldn't call nextToken. 481 00:22:31,039 --> 00:22:33,950 You can if you want.
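The question-and-answer exchange with the tokenizer described above can be sketched like this. It is a small illustration rather than the lecture's code; the class name and the nextOrMessage helper are made up for the example. StringTokenizer, hasMoreTokens, and nextToken are the real java.util API, and calling nextToken with nothing left throws a NoSuchElementException, which is why the sketch checks hasMoreTokens first.

```java
import java.util.StringTokenizer;

class TokenizerBasics {
    // Hypothetical helper: asks before taking, so it never triggers
    // the NoSuchElementException that nextToken() throws when empty.
    static String nextOrMessage(StringTokenizer tokenizer) {
        if (tokenizer.hasMoreTokens()) {
            return tokenizer.nextToken();
        }
        return "(no more tokens)";
    }

    public static void main(String[] args) {
        // Default delimiters are whitespace, so the comma stays attached.
        StringTokenizer tokenizer = new StringTokenizer("hello there, Mary");
        System.out.println(nextOrMessage(tokenizer)); // hello
        System.out.println(nextOrMessage(tokenizer)); // there,
        System.out.println(nextOrMessage(tokenizer)); // Mary
        System.out.println(nextOrMessage(tokenizer)); // (no more tokens)
    }
}
```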
You can experiment with this if you want to 482 00:22:33,950 --> 00:22:36,140 experiment with random error messages, but 483 00:22:36,140 --> 00:22:37,500 there's no more tokens to give 484 00:22:37,500 --> 00:22:42,910 you. It's all out of love. It's so lost without you. It has no more tokens. Yeah, 485 00:22:42,910 --> 00:22:44,110 Air Supply. Not that I 486 00:22:44,109 --> 00:22:47,178 would recommend that you have to listen to Air Supply, but sometimes you hear a song and you 487 00:22:47,179 --> 00:22:48,680 can't get it out of your head 488 00:22:48,680 --> 00:22:50,789 as much as you wish you could. 489 00:22:50,789 --> 00:22:53,039 Sometimes selective brain surgery would not be a bad thing, but that's not important right now. What is 490 00:22:53,039 --> 00:22:54,579 important 491 00:22:54,579 --> 00:22:55,980 right now 492 00:22:55,980 --> 00:22:59,170 is how do we put all this together to tokenize a line. So let me show you an 493 00:22:59,170 --> 00:23:01,140 example of the tokenizer. 494 00:23:01,140 --> 00:23:03,429 This one's very simple. All we're gonna do here 495 00:23:03,429 --> 00:23:05,970 is we're gonna ask the user - I'll just scroll over a 496 00:23:05,970 --> 00:23:08,390 little 497 00:23:08,390 --> 00:23:12,040 bit. We're gonna ask the user to enter some lines to tokenize and we're gonna 498 00:23:12,039 --> 00:23:15,609 write out "the tokens of the string are", and then we're gonna call the method 499 00:23:15,609 --> 00:23:16,559 printTokens. 500 00:23:16,559 --> 00:23:19,329 What printTokens is gonna do, it's gonna take in the string you want to 501 00:23:19,329 --> 00:23:20,579 tokenize. It 502 00:23:20,579 --> 00:23:27,579 creates one of these string tokenizers - I'm so lost without you. [Laughter]. 503 00:23:28,430 --> 00:23:30,390 Can 504 00:23:30,390 --> 00:23:31,288 we make 505 00:23:31,288 --> 00:23:35,640 Mehran snap? No.
I know it's 506 00:23:35,640 --> 00:23:38,630 like great fun to listen to when you're, like, 14, and you just broke up with 507 00:23:38,630 --> 00:23:40,850 a girlfriend for the first time. 508 00:23:40,849 --> 00:23:44,789 And then, after that, you want to kill the next time you hear it. Fine. 509 00:23:44,789 --> 00:23:46,809 So, tokenizer. 510 00:23:46,809 --> 00:23:50,309 I'm glad we're having fun though. 511 00:23:50,309 --> 00:23:53,990 So what I'm gonna do is I'm gonna count through all the tokens. So I'm gonna have a for loop, 512 00:23:53,990 --> 00:23:56,069 interestingly enough. Here's something funky. I'm gonna have a 513 00:23:56,069 --> 00:23:59,939 for loop, but the thing I'm gonna do in my for loop, my test, is not to check to see if 514 00:23:59,940 --> 00:24:01,710 I've reached some maximum number. 515 00:24:01,710 --> 00:24:05,539 But my test is actually gonna be to see if tokenizer has more tokens. 516 00:24:05,539 --> 00:24:08,889 So I have a for loop that's just like a regular for loop, but I start off with a count 517 00:24:08,890 --> 00:24:11,720 that's equal to zero, and you're like that looks okay. 518 00:24:11,720 --> 00:24:15,190 I do a count ++ over here, and you're like that's okay, what are you counting up 519 00:24:15,190 --> 00:24:19,430 to, Mehran. And I say I'm counting up to however many tokens you have. And you go, 520 00:24:19,430 --> 00:24:23,769 oh, interesting. So my condition to leave, or to continue on with the 521 00:24:23,769 --> 00:24:27,210 loop, is tokenizer has more tokens. If 522 00:24:27,210 --> 00:24:28,548 it has more tokens, 523 00:24:28,548 --> 00:24:31,058 then I'm gonna do something here to get the next token. I'm gonna keep 524 00:24:31,058 --> 00:24:34,410 doing this loop. But what the counter's gonna give me is a way to count 525 00:24:34,410 --> 00:24:36,048 through all my tokens.
526 00:24:36,048 --> 00:24:38,480 So I can write out token number count, 527 00:24:38,480 --> 00:24:42,620 and then a colon, and then write out the next token that the tokenizer gives me. 528 00:24:42,619 --> 00:24:44,689 Are there any questions about that? Let's 529 00:24:44,690 --> 00:24:47,730 actually run this puppy [inaudible]. Do, 530 00:24:47,730 --> 00:24:49,430 de, do. 531 00:24:49,430 --> 00:24:53,450 You can feel free to keep singing now if you want, if 532 00:24:53,450 --> 00:24:56,069 you want. All right, 533 00:24:56,069 --> 00:24:58,308 so we're gonna do our friend. 534 00:24:58,308 --> 00:25:01,000 What's our friend called? The tokenizer example. 535 00:25:01,000 --> 00:25:04,789 Do, do, do, do, we're running the tokenizer. Enter lines to tokenize, so I might say 536 00:25:04,789 --> 00:25:05,779 "I, 537 00:25:05,779 --> 00:25:07,450 for one, 538 00:25:07,450 --> 00:25:09,970 love CS." We're very formal here. 539 00:25:09,970 --> 00:25:13,370 And it says the tokens of the string are, and notice it got the "I" and the comma 540 00:25:13,369 --> 00:25:17,379 together as one token because as we talked about, spaces are the delimiter. 541 00:25:17,380 --> 00:25:21,240 And so "for" and then "one" with the comma, and "love" and "CS," and that's all the 542 00:25:21,240 --> 00:25:23,759 tokens we got. And so at this point you might be thinking, yeah, man, 543 00:25:23,759 --> 00:25:27,299 that's great, but you know what. I really don't like punctuation. 544 00:25:27,299 --> 00:25:30,440 And sometimes I don't like punctuation, but I can't stop the user from 545 00:25:30,440 --> 00:25:33,350 using punctuation because even though I don't like to be grammatically correct, 546 00:25:33,349 --> 00:25:34,589 they do. 547 00:25:34,589 --> 00:25:37,829 So how do I prevent them from being grammatically correct as well, 548 00:25:37,829 --> 00:25:40,220 which is kind of a fun thing to do.
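The for loop with the unusual test might look something like the sketch below. It is an approximation of the printTokens idea, not the lecture's exact code: the class and method names and the "Token #" output format are assumptions, and it builds a string instead of printing inside the loop so the result is easy to inspect. The loop counts with count++ as usual, but the condition asks the tokenizer whether it has more tokens instead of comparing against a maximum.

```java
import java.util.StringTokenizer;

class TokenizerExample {
    // Returns one "Token #n: text" line per whitespace-delimited token.
    static String formatTokens(String str) {
        StringTokenizer tokenizer = new StringTokenizer(str);
        String result = "";
        // The loop test is the tokenizer itself, not a fixed upper bound.
        for (int count = 0; tokenizer.hasMoreTokens(); count++) {
            result += "Token #" + count + ": " + tokenizer.nextToken() + "\n";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("The tokens of the string are:");
        System.out.print(formatTokens("I, for one, love CS"));
    }
}
```

With whitespace as the only delimiter, "I," and "one," keep their commas, which matches what the demo shows.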
What you can say is hey, 549 00:25:40,220 --> 00:25:43,829 what I want to do is change my tokenizer, so that it not only 550 00:25:43,829 --> 00:25:47,689 stops at spaces, but it's gonna stop at, or consider a delimiter, 551 00:25:47,690 --> 00:25:51,179 any of this list of characters that I give it. So you give it a list of characters as 552 00:25:51,179 --> 00:25:53,798 a string. So here I'm gonna give it a comma 553 00:25:53,798 --> 00:25:55,668 and a space, okay. 554 00:25:55,669 --> 00:25:59,620 And this version of the string tokenizer constructor, what it will do is 555 00:25:59,619 --> 00:26:01,939 it will actually tokenize the string. 556 00:26:01,940 --> 00:26:06,070 But think of the thing that you're using as your delimiter, or what chops up your 557 00:26:06,069 --> 00:26:07,189 individual tokens, 558 00:26:07,190 --> 00:26:10,749 as either a comma, or a space, or anything you want to put in that string there. 559 00:26:10,749 --> 00:26:13,649 Each of the individual characters in that string is treated as a potential 560 00:26:13,648 --> 00:26:18,529 delimiter. So if you say "I, for one, love CS," 561 00:26:18,529 --> 00:26:20,009 ah, no commas. 562 00:26:20,009 --> 00:26:22,700 Why? Because commas are considered a delimiter. So it just gives you 563 00:26:22,700 --> 00:26:26,559 everything up to a comma or a space, and you could imagine you could put in period, and 564 00:26:26,559 --> 00:26:28,178 exclamation point, and all that other stuff, 565 00:26:28,179 --> 00:26:32,860 if you just want to get out the non-punctuation here. So 566 00:26:32,859 --> 00:26:35,349 tokenizing is something that's oftentimes useful if you get a bigger 567 00:26:35,349 --> 00:26:38,349 piece of text, and you want to break it up into individual words, and then maybe do 568 00:26:38,349 --> 00:26:40,558 something on those individual words, okay. 569 00:26:40,558 --> 00:26:42,629 Any questions about tokenization?
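The two-argument constructor described here can be tried out with a sketch like the following. The class and method names are illustrative assumptions; the constructor StringTokenizer(String, String) is the real java.util one, and every character of its second argument counts as a delimiter on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterExample {
    // Collects the tokens of str, splitting on any character in delimiters.
    static List<String> tokenize(String str, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(str, delimiters);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Comma and space as delimiters: the commas disappear.
        System.out.println(tokenize("I, for one, love CS", ", "));
        // Add period and exclamation point to strip those too.
        System.out.println(tokenize("I, for one, love CS!", ", .!"));
    }
}
```

Runs of adjacent delimiters, such as the comma followed by a space, do not produce empty tokens.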
570 00:26:42,630 --> 00:26:46,490 Hopefully, it's not too painful or scary. All right. So 571 00:26:46,490 --> 00:26:50,298 the next thing I want to do, will just pay for the smorgasbord of string, 572 00:26:50,298 --> 00:26:53,408 is I want to teach you about something that's really gotten to be an important 573 00:26:53,409 --> 00:26:55,650 thing in computer science these 574 00:26:55,650 --> 00:26:56,559 last few years, 575 00:26:56,558 --> 00:26:57,430 which is, 576 00:26:57,430 --> 00:27:00,120 basically, this idea known as encryption. 577 00:27:00,119 --> 00:27:03,889 And encryption is something that's been around for thousands of years. All 578 00:27:03,890 --> 00:27:07,000 encryption is, is it's kind of like sending secret messages. 579 00:27:07,000 --> 00:27:08,808 You have some particular message. 580 00:27:08,808 --> 00:27:11,749 You want to send it to someone else, but you want to send a secret version of 581 00:27:11,749 --> 00:27:15,568 that message. And people have been doing this for thousands of years, actually, 582 00:27:15,568 --> 00:27:16,339 interestingly enough. 583 00:27:16,339 --> 00:27:20,269 They just didn't have very good methods of doing it until about the last, oh, 50 years. 584 00:27:20,269 --> 00:27:23,240 But you know they did it for a long time, and people broke encryption. As a matter of fact, 585 00:27:23,240 --> 00:27:26,759 there's this really interesting book by Simon Singh. I'll bring in a copy, 586 00:27:26,759 --> 00:27:29,398 perhaps next class, if you're really interested, 587 00:27:29,398 --> 00:27:33,439 about the whole history of encryption. It goes back thousands of years, and how, like, 588 00:27:33,440 --> 00:27:37,410 wars, and queenships, and kingships, and stuff, were, basically, lost and won on the 589 00:27:37,410 --> 00:27:38,200 strength of 590 00:27:38,200 --> 00:27:41,269 how well someone could break a piece of code.
591 00:27:41,269 --> 00:27:44,408 But the basic idea of encryption, and it probably dates back even further than this, but 592 00:27:44,409 --> 00:27:47,390 one of the most well-known ones is something that's known as the Caesar 593 00:27:47,390 --> 00:27:48,470 594 00:27:48,470 --> 00:27:50,129 cipher, not to be confused with the salad. 595 00:27:50,128 --> 00:27:52,759 But the basic idea with the Caesar cipher - I 596 00:27:52,759 --> 00:27:54,190 picked up the wrong newspaper - 597 00:27:54,190 --> 00:27:55,960 the Caesar cipher 598 00:27:55,960 --> 00:27:58,240 is that what we want to do 599 00:27:58,240 --> 00:28:02,808 is, basically, take our alphabet and rotate it by some number of letters to get a 600 00:28:02,808 --> 00:28:03,220 replacement. 601 00:28:03,220 --> 00:28:05,920 What does that mean? That's just a whole bunch of words. 602 00:28:05,920 --> 00:28:10,070 So let me show you a little slide that just makes that clear. 603 00:28:10,069 --> 00:28:11,240 So in Caesar's day - 604 00:28:11,240 --> 00:28:13,779 I will now play the role of Caesar. I actually considered wearing 605 00:28:13,779 --> 00:28:18,170 a toga to class today. I just thought that was fraught with way too much peril. 606 00:28:18,170 --> 00:28:21,670 So I just decided to bring my little Caesar crown. And that's what I'm trying to find my little 607 00:28:21,670 --> 00:28:23,970 crown of reason stuff, but I couldn't. 608 00:28:23,970 --> 00:28:26,288 So I just got a little hat. 609 00:28:26,288 --> 00:28:28,519 [Laughter]. 610 00:28:28,519 --> 00:28:32,359 And so the basic idea - say you are Caesar - well, I did crown myself, actually. 611 00:28:32,359 --> 00:28:35,699 I knew someone here could actually take the crown from [inaudible]. That was Napoleon, 612 00:28:35,700 --> 00:28:38,170 a whole different story. I really 613 00:28:38,170 --> 00:28:42,660 like to take history and mix it up. It's just to see if you're actually paying attention. 
614 00:28:42,660 --> 00:28:46,460 All right, the basic way the Caesar cipher works is we take our original 615 00:28:46,460 --> 00:28:48,600 alphabet. Here's all of our letters from A through Z. 616 00:28:48,599 --> 00:28:52,048 We take that whole alphabet, and we shift it over some number of letters. Like let's 617 00:28:52,048 --> 00:28:55,980 say we shift it over three letters. So I take this whole thing, I shift it over 618 00:28:55,980 --> 00:29:00,360 three letters, so now the D lines up over here where the A should have been so I've 619 00:29:00,359 --> 00:29:03,649 shifted over these bottom characters. And the characters that kind of went off the 620 00:29:03,650 --> 00:29:07,410 end here like the A, B, and C, were kind of like whoa, we're going off the end. Where do we go? 621 00:29:07,410 --> 00:29:09,450 We just kind of shuffle them 622 00:29:09,450 --> 00:29:10,950 back around over here. 623 00:29:10,950 --> 00:29:15,169 So the basic idea is we're gonna rotate our alphabet by N letters, 624 00:29:15,169 --> 00:29:18,709 and N is 3 in the example here, and N is called the key. So the key of the Caesar 625 00:29:18,709 --> 00:29:19,919 cipher 626 00:29:19,919 --> 00:29:23,380 is how many letters you're actually shifting. 627 00:29:23,380 --> 00:29:27,139 And then we wrap around it again. And now, once we've done this little wraparound, 628 00:29:27,138 --> 00:29:30,678 we take our original message that we want to encrypt. That's something that's referred to 629 00:29:30,679 --> 00:29:35,060 as the plain text. The plain text is your actual original message. And 630 00:29:35,059 --> 00:29:39,179 we want to encrypt that or change it to our cipher text, which is what the 631 00:29:39,180 --> 00:29:43,150 encrypted message is, by using this mapping. So every time an A appears in the original, 632 00:29:43,150 --> 00:29:47,320 we replace it by a D. 
If a D appears in the original, we replace it by G, and if a C 633 00:29:47,319 --> 00:29:51,369 appears in the original, we replace it by an F, etc., for the whole alphabet. Are there 634 00:29:51,369 --> 00:29:54,579 any questions about the Caesar cipher? This is actually an actual cipher that, 635 00:29:54,579 --> 00:29:57,928 evidently, historians tell us that Caesar used in the days of yore. 636 00:29:57,929 --> 00:30:01,980 And you know, evidently, he was killed, so it didn't work that 637 00:30:01,980 --> 00:30:04,528 well. But you know most people, that's one of the things that when you were a little kid, and 638 00:30:04,528 --> 00:30:07,538 you had like the Super Secret Decoder Ring, you were 639 00:30:07,538 --> 00:30:09,480 probably getting a Caesar cipher. All 640 00:30:09,480 --> 00:30:12,509 right, any questions about the basics of the Caesar cipher. 641 00:30:12,509 --> 00:30:15,910 So what we're gonna do is let's write a program that can actually 642 00:30:15,910 --> 00:30:18,980 encrypt and decrypt text according to a Caesar cipher, 643 00:30:18,980 --> 00:30:22,509 and we'll do it using top-down design. So we'll actually just do it on the computer 644 00:30:22,509 --> 00:30:23,379 together 645 00:30:23,380 --> 00:30:25,560 'cause it's more fun that way. And 646 00:30:25,559 --> 00:30:29,470 because I'm Caesar, I will drive. So 647 00:30:29,470 --> 00:30:31,279 we're gonna have my Caesar cipher, all 648 00:30:31,279 --> 00:30:34,308 right. And I just gave you a little bit of a run method here. It's kind of 649 00:30:34,308 --> 00:30:36,879 the very beginnings of the program. But all this does - 650 00:30:36,880 --> 00:30:38,410 it's not a big deal. It says 651 00:30:38,410 --> 00:30:41,290 this program uses a Caesar cipher for encryption. 652 00:30:41,289 --> 00:30:45,178 It's going to ask for the encryption key.
That means it's asking for the number 653 00:30:45,179 --> 00:30:48,860 by which it's gonna rotate the alphabet to create your Caesar key, 654 00:30:48,859 --> 00:30:51,528 or to create your Caesar cipher, and that's just our key 655 00:30:51,528 --> 00:30:52,460 that's an integer. 656 00:30:52,460 --> 00:30:56,230 So our plain text, that's the original message that we want to encrypt. 657 00:30:56,230 --> 00:30:59,779 We ask the user for the plain text, so we just get a line from the user. And then 658 00:30:59,779 --> 00:31:03,428 what we're gonna do is we're gonna create our cipher text, or the encrypted 659 00:31:03,429 --> 00:31:03,870 text 660 00:31:03,869 --> 00:31:08,079 by calling a function called encryptCaesar. We're sort of giving a directive. It's 661 00:31:08,079 --> 00:31:10,949 kind of like an imperative: encrypt, Caesar, 662 00:31:10,950 --> 00:31:14,840 and we give it the plain text, and we give it the number for the key that we want it to 663 00:31:14,839 --> 00:31:15,909 encrypt using. 664 00:31:15,910 --> 00:31:19,090 And then, hopefully, that will give us back the encrypted string, and we're just gonna write that out, okay. 665 00:31:19,089 --> 00:31:20,909 So 666 00:31:20,910 --> 00:31:23,350 how do we do this encryption? All right, so 667 00:31:23,349 --> 00:31:26,559 at this point, it should be clear that the thing we want to write is probably 668 00:31:26,559 --> 00:31:28,009 encryptCaesar. 669 00:31:28,009 --> 00:31:31,019 So what we're gonna do is we're gonna write a private method, 670 00:31:31,019 --> 00:31:34,109 and what is this puppy gonna return to us? 671 00:31:34,109 --> 00:31:36,709 String, right, 'cause that's what we're expecting, the encoded version of this 672 00:31:36,710 --> 00:31:37,788 particular 673 00:31:37,788 --> 00:31:39,378 message as a string. 674 00:31:39,378 --> 00:31:41,368 So we'll call this encryptCaesar. 675 00:31:41,368 --> 00:31:44,259 And what's it getting passed?
It's getting passed 676 00:31:44,259 --> 00:31:47,430 the string, which we'll just call str, and it's getting passed an integer, 677 00:31:47,430 --> 00:31:50,700 which we will refer to as the key. So 678 00:31:50,700 --> 00:31:53,259 if I want to think about doing the encryption, 679 00:31:53,259 --> 00:31:54,789 right, what I'm gonna do is, 680 00:31:54,789 --> 00:31:57,928 on a character-by-character basis, I want to do this replacement. I want to 681 00:31:57,929 --> 00:32:00,679 say for every character that I see in my original string, 682 00:32:00,679 --> 00:32:04,369 there is some shifted version of that character that I want to use in my 683 00:32:04,368 --> 00:32:05,809 encrypted string. 684 00:32:05,809 --> 00:32:09,759 So in order to do that, I'm gonna use my standard kind of string building idiom, which 685 00:32:09,759 --> 00:32:13,680 says I start off with a string, which I'll call result, which starts off 686 00:32:13,680 --> 00:32:16,180 empty, right. It says, quote, quote, empty string. 687 00:32:16,180 --> 00:32:17,798 And I'm gonna do a for loop 688 00:32:17,798 --> 00:32:19,509 through my string 689 00:32:19,509 --> 00:32:25,599 that I'm giving to encrypt. So up to the string's length, I'm just 690 00:32:25,599 --> 00:32:28,138 gonna count through and get each character. So I'll 691 00:32:28,138 --> 00:32:31,819 do sort of a standard thing. I'm gonna say char ch, and 692 00:32:31,819 --> 00:32:35,369 I'm gonna essentially get the character that I want to get from the string, so 693 00:32:35,369 --> 00:32:37,739 I'll 694 00:32:37,740 --> 00:32:39,109 say 695 00:32:39,109 --> 00:32:39,629 str.charAt(i). 696 00:32:39,630 --> 00:32:40,450 697 00:32:40,450 --> 00:32:41,909 So I've now gotten my character. 698 00:32:41,909 --> 00:32:45,540 I want to figure out how to encrypt that character, okay.
699 00:32:45,539 --> 00:32:48,970 So I think to myself, wow, gee, well, encrypting the character involves all this 700 00:32:48,970 --> 00:32:51,069 stuff, doing the shift and all that, 701 00:32:51,069 --> 00:32:52,619 that's kind of complicated. 702 00:32:52,619 --> 00:32:54,639 Maybe I should just create a function to do it. 703 00:32:54,640 --> 00:32:57,520 All right, that's the old notion of top-down design. Any time you get somewhere where 704 00:32:57,519 --> 00:32:58,470 you're, like, 705 00:32:58,470 --> 00:33:01,960 wow, that's kind of complicated. Maybe I don't want to stick this all in here and 706 00:33:01,960 --> 00:33:02,750 figure it out. 707 00:33:02,750 --> 00:33:05,890 But it's a smaller piece, which is just dealing with a single character instead 708 00:33:05,890 --> 00:33:07,340 of dealing with the whole string. 709 00:33:07,339 --> 00:33:10,429 Let me write a function that will actually do it, or a method that'll actually do it. So what I'm 710 00:33:10,430 --> 00:33:13,419 gonna do, is I'm gonna append to my result 711 00:33:13,419 --> 00:33:17,759 what I get by calling encrypt, 712 00:33:17,759 --> 00:33:20,860 a single character. So I'll just call it encryptChar 713 00:33:20,859 --> 00:33:24,238 and what I'm gonna pass to it is the character that I want to encrypt, and I need 714 00:33:24,239 --> 00:33:26,900 to also pass to it the key so it knows 715 00:33:26,900 --> 00:33:30,559 how to do the appropriate shifting to encrypt that character. 716 00:33:30,558 --> 00:33:35,589 And after it does this encryption, I'm just gonna say hey, if you've 717 00:33:35,589 --> 00:33:38,928 successfully encrypted all of your string, what I want to do is return, 718 00:33:38,929 --> 00:33:40,000 719 00:33:40,000 --> 00:33:40,910 RTN, 720 00:33:40,910 --> 00:33:43,009 my result, right. 721 00:33:43,009 --> 00:33:46,339 That's your standard string idiom. I start off with an empty string.
722 00:33:46,339 --> 00:33:49,709 I do some kind of loop through every character of the string. I'm gonna do 723 00:33:49,710 --> 00:33:52,919 the processing one character at a time, and return my results. 724 00:33:52,919 --> 00:33:54,660 Everything in that 725 00:33:54,660 --> 00:33:58,080 function that you see, or in that method that you see, except for that one line, should 726 00:33:58,079 --> 00:34:01,319 be something you can do in your sleep now. You've seen it, like, over and over. We 727 00:34:01,319 --> 00:34:04,210 just did it a couple of times today. We did it a couple times last time. 728 00:34:04,210 --> 00:34:07,759 It's the standard kind of thing for going through a string one character at a time. 729 00:34:07,759 --> 00:34:10,929 And now, we reduced the whole problem of encrypting a whole string 730 00:34:10,929 --> 00:34:13,239 to the problem of just encrypting a single letter. 731 00:34:13,239 --> 00:34:14,619 So what I'm gonna have in here 732 00:34:14,619 --> 00:34:15,940 is private, 733 00:34:15,940 --> 00:34:18,739 and this is gonna return a single character called - 734 00:34:18,739 --> 00:34:21,639 and this puppy's called encrypt char. 735 00:34:21,639 --> 00:34:25,499 And it's gonna get passed in some character to encrypt as well as the key 736 00:34:25,498 --> 00:34:28,018 that it's gonna use to encrypt it. 737 00:34:28,018 --> 00:34:29,408 And now I want to figure 738 00:34:29,409 --> 00:34:33,898 out how do I encrypt that single character. So 739 00:34:33,898 --> 00:34:37,168 what's something I could do to think about how this character actually gets 740 00:34:37,168 --> 00:34:38,708 encrypted. 741 00:34:38,708 --> 00:34:41,778 How do I want to do the appropriate shifting of the character. So let's say I've 742 00:34:41,778 --> 00:34:42,548 gotten 743 00:34:42,548 --> 00:34:43,940 an uppercase A. 744 00:34:43,940 --> 00:34:47,068 Let's assume for right now all my characters are uppercase. 
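The state of the program at this point in the top-down design might look like the sketch below. The method names encryptCaesar and encryptChar follow the lecture, but the class name is made up, and encryptChar is deliberately a stub that returns its argument unchanged, an assumption made so the string-building idiom can be compiled and run before the single-character logic is worked out.

```java
class CaesarSkeleton {
    // Standard string-building idiom: start empty, walk the string
    // one character at a time, append the processed character.
    static String encryptCaesar(String str, int key) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            result += encryptChar(ch, key);
        }
        return result;
    }

    // Stub only: returns the character unchanged. The actual shifting
    // is the smaller problem still to be solved.
    private static char encryptChar(char ch, int key) {
        return ch;
    }

    public static void main(String[] args) {
        // With the stub in place, the "cipher text" equals the plain text.
        System.out.println(encryptCaesar("HELLO", 3));
    }
}
```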
As a matter of fact, that's a perfectly fine 745 00:34:47,068 --> 00:34:48,070 assumption to make. 746 00:34:48,070 --> 00:34:50,780 The solution you've gotten, too, assumes all the characters are uppercase, so 747 00:34:50,780 --> 00:34:52,540 assume all the plain text is uppercase, 748 00:34:52,539 --> 00:34:56,808 and I want to return the encrypted cipher text also in uppercase. Let's say 749 00:34:56,809 --> 00:34:59,430 I've gotten an uppercase A, okay. 750 00:34:59,429 --> 00:35:00,919 And my 751 00:35:00,920 --> 00:35:06,528 key is 3. So what I want to do is take that A somehow, and convert it to a D. 752 00:35:06,528 --> 00:35:13,528 How do I do that? [Inaudible]. Uh huh. 753 00:35:14,498 --> 00:35:17,578 I want to add 3 to the character. 754 00:35:17,579 --> 00:35:21,068 Now the only problem is I might go off the end of the alphabet. 755 00:35:21,068 --> 00:35:24,248 If I just add 3, and I have a Z, I'm gonna 756 00:35:24,248 --> 00:35:27,248 - if I just have the A and go to D, that works perfectly fine, but if I have 757 00:35:27,248 --> 00:35:30,048 a Z, I'm gonna get something like an exclamation point, or something I 758 00:35:30,048 --> 00:35:32,278 don't know, 'cause I go off the end of the alphabet. 759 00:35:32,278 --> 00:35:35,039 So I need to do a little bit more math. And what I'm gonna do is 760 00:35:35,039 --> 00:35:36,450 say take this character, 761 00:35:36,449 --> 00:35:38,889 and subtract from it uppercase A. 762 00:35:38,889 --> 00:35:42,759 That's gonna tell me which character in the alphabet it is, which number 763 00:35:42,759 --> 00:35:44,259 character it is, right. 764 00:35:44,259 --> 00:35:46,019 Now, if I add the key, 765 00:35:46,018 --> 00:35:47,868 what I get is the 766 00:35:47,869 --> 00:35:50,930 number, or the index, of the shifted character. 767 00:35:50,929 --> 00:35:54,079 So if I had an uppercase A, and I subtract off uppercase A, I'm gonna 768 00:35:54,079 --> 00:35:55,018 get a zero.
769 00:35:55,018 --> 00:35:57,888 I now add the key, so I get 3. 770 00:35:57,889 --> 00:36:01,349 And you might say, well, if you just convert that to a character, you get a D. That's perfectly fine. 771 00:36:01,349 --> 00:36:05,649 Yeah, but if I had a Z and I subtract off an uppercase A, I get 25. 772 00:36:05,648 --> 00:36:09,328 If I add 3 to 25, I get 28, which is now outside the 773 00:36:09,329 --> 00:36:12,778 bounds of the alphabet. How do I wrap around that 28 back to the 774 00:36:12,778 --> 00:36:16,389 beginning of the alphabet. 775 00:36:16,389 --> 00:36:20,759 Mod it by 26, or what we do with the remainder operator by 26, 776 00:36:20,759 --> 00:36:23,630 right. So what that does is it says if you've gone off the end, 777 00:36:23,630 --> 00:36:26,440 basically, when you divide by 26 and take the remainder, if you've 778 00:36:26,440 --> 00:36:31,869 gone off the end, it kind of gets rid of the first 26, and wraps you back around the beginning. 779 00:36:31,869 --> 00:36:33,630 So if I do that, 780 00:36:33,630 --> 00:36:35,548 this will actually work 781 00:36:35,548 --> 00:36:38,639 to get me the position of the character 782 00:36:38,639 --> 00:36:41,650 wrapped around, and once I've gotten the position of the character, here's the 783 00:36:41,650 --> 00:36:45,568 funky thing. I need to add the A back in 784 00:36:45,568 --> 00:36:48,808 because if I have, let's say, an uppercase A to begin with, and I subtract out 785 00:36:48,809 --> 00:36:50,380 uppercase A, that gives me zero. 786 00:36:50,380 --> 00:36:55,970 I add the key. That gives me 3. I do the remainder by 26. Three 787 00:36:55,969 --> 00:36:59,558 divided by 26, the remainder is still 3. So 788 00:36:59,559 --> 00:37:03,499 now I have the number 3. I need to get that 3 converted to the letter D. 789 00:37:03,498 --> 00:37:05,848 How do I do that? I add the letter A 790 00:37:05,849 --> 00:37:08,649 to that 3, okay. Is there 791 00:37:08,648 --> 00:37:10,848 any questions about that?
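The arithmetic just walked through can be sketched in a few lines. This is only an illustration, assuming plain Java and non-negative keys; the names ShiftIndexSketch and shiftedIndex are made up here and are not the code on the lecture screen. Everything stays an int at this point, so the last line prints a number rather than a letter.

```java
class ShiftIndexSketch {
    // Position of the shifted letter in the alphabet (0 = A, 25 = Z).
    // Assumes ch is an uppercase letter A..Z and key is not negative.
    static int shiftedIndex(char ch, int key) {
        return (ch - 'A' + key) % 26;
    }

    public static void main(String[] args) {
        System.out.println(shiftedIndex('A', 3));       // 0 + 3 = 3, prints 3
        System.out.println(shiftedIndex('Z', 3));       // 25 + 3 = 28, wraps to 2
        System.out.println(shiftedIndex('A', 3) + 'A'); // adding 'A' back in: prints 68, the code for D
    }
}
```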
792 00:37:10,849 --> 00:37:13,939 Now, the final funky thing that I need to do, 793 00:37:13,938 --> 00:37:17,438 is if I want to assign this to a character, I can't do this directly. Notice if I 794 00:37:17,438 --> 00:37:20,239 try to do this directly, I get this little thingy here. And you might say 795 00:37:20,239 --> 00:37:23,959 Mehran, what's going on? Like you told me characters were the same as numbers, and everything 796 00:37:23,960 --> 00:37:27,889 I've done so far has to do with numbers, so why can't I assign that to a character? 797 00:37:27,889 --> 00:37:29,969 And this little error message comes up. 798 00:37:29,969 --> 00:37:33,268 And this has to do with the same thing when we talked about converting from real 799 00:37:33,268 --> 00:37:36,498 values to integers. Remember when we went from a real value to an integer. We said you'll 800 00:37:36,498 --> 00:37:39,768 lose some information if you try to truncate a real value, like a double to an 801 00:37:39,768 --> 00:37:40,199 integer. 802 00:37:40,199 --> 00:37:43,728 So you explicitly have to cast it from being a double to an integer. 803 00:37:43,728 --> 00:37:46,899 Same thing with characters and integers. The set of possible integers is 804 00:37:46,900 --> 00:37:48,628 huge. It's like billions and billions. 805 00:37:48,628 --> 00:37:51,739 The set of characters is much smaller than that. So if you want to go from 806 00:37:51,739 --> 00:37:55,460 an integer back to a character, you need to explicitly say 807 00:37:55,460 --> 00:37:57,510 convert that integer back to a character. 808 00:37:57,510 --> 00:37:59,610 So we need to explicitly do a 809 00:37:59,610 --> 00:38:00,360 cast here 810 00:38:00,360 --> 00:38:01,999 back to a character. 811 00:38:01,998 --> 00:38:03,039 And if we do that, 812 00:38:03,039 --> 00:38:06,429 then we're happy and friendly. Did 813 00:38:06,429 --> 00:38:11,690 I get all my parens right? One, two, three, one two three.
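A minimal sketch of the cast being described, assuming plain Java; the class name CastSketch and the local variable shifted are illustrative names, not taken from the lecture code. The commented-out line shows the kind of assignment the compiler rejects, because the arithmetic produces an int.

```java
class CastSketch {
    // Assumes ch is an uppercase letter A..Z and key is not negative.
    static char encryptChar(char ch, int key) {
        // char shifted = (ch - 'A' + key) % 26 + 'A';  // rejected: the right side is an int
        char shifted = (char) ((ch - 'A' + key) % 26 + 'A'); // explicit cast back to char
        return shifted;
    }

    public static void main(String[] args) {
        System.out.println(encryptChar('A', 3)); // prints D
        System.out.println(encryptChar('Z', 3)); // prints C
    }
}
```

Note the parentheses: the cast has to wrap the whole expression, otherwise it would apply only to the first operand.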
All right, 814 00:38:11,690 --> 00:38:13,989 why is this still unhappy? Oh, 815 00:38:13,989 --> 00:38:17,749 duplicate variable CH, yeah. Let me call this C. 816 00:38:17,748 --> 00:38:20,399 Actually, let me make my life easier. This is a thing I just want to return, so 817 00:38:20,400 --> 00:38:22,119 I'm just gonna return it. Do, do, 818 00:38:22,119 --> 00:38:26,499 do, do, do. I won't even assign it to any temporary variable. We'll just return it 'cause 819 00:38:26,498 --> 00:38:28,509 now I'm upset. No, 820 00:38:28,509 --> 00:38:31,688 I'm really not upset. We're just gonna return it. 821 00:38:31,688 --> 00:38:34,679 So, hopefully, that will give us our little Caesar cipher. 822 00:38:34,679 --> 00:38:37,788 So let's go ahead and run this, and see if, in fact, it's working. 823 00:38:37,789 --> 00:38:40,790 Any questions about this while this is running? I'll sort of scroll this 824 00:38:40,789 --> 00:38:42,708 down a little bit so you can see 825 00:38:42,708 --> 00:38:44,588 what's going on for that single character. 826 00:38:44,588 --> 00:38:49,139 So this was my Caesar cipher. 827 00:38:49,139 --> 00:38:52,798 So we say, et tu, brute. 828 00:38:52,798 --> 00:38:56,409 Illegal number format. Yeah 'cause that's not the thing I wanted to encrypt. 829 00:38:56,409 --> 00:38:58,519 My encryption here is 3, 830 00:38:58,518 --> 00:39:02,139 then I will give it the plain text I was to - everyone's like what did 831 00:39:02,139 --> 00:39:03,929 he do. Sometimes 832 00:39:03,929 --> 00:39:05,759 it's the obvious that's wrong, 833 00:39:05,760 --> 00:39:08,059 and you just need to read. 834 00:39:08,059 --> 00:39:11,249 All right, there we actually go. Now, there's a little problem here. See, the 835 00:39:11,248 --> 00:39:14,708 little problem is the spaces actually got encrypted. 836 00:39:14,708 --> 00:39:18,448 We don't want to encrypt spaces. We only want to encrypt things that are actually 837 00:39:18,449 --> 00:39:21,688 valid characters. 
So we're not quite done yet. What we need to do is come back over 838 00:39:21,688 --> 00:39:22,598 here and say, 839 00:39:22,599 --> 00:39:26,009 hey, you know what, for my encrypt character, I wasn't quite as bright as I thought I 840 00:39:26,009 --> 00:39:26,259 was. 841 00:39:26,250 --> 00:39:30,170 I need to make sure this thing's actually an uppercase character before I try to encrypt 842 00:39:30,170 --> 00:39:33,430 it. So we can sort of do that if 843 00:39:33,429 --> 00:39:36,528 I just call my little friend Character. 844 00:39:36,528 --> 00:39:39,079 And the thing we want to say is, 845 00:39:39,079 --> 00:39:41,369 846 00:39:41,369 --> 00:39:45,039 is uppercase, 847 00:39:45,039 --> 00:39:49,829 and I'll pass it CH. So if it's already - if it's an uppercase character, then I'll 848 00:39:49,829 --> 00:39:51,859 return this. 849 00:39:51,858 --> 00:39:54,088 Otherwise, what I'll do - I'll tab this in - 850 00:39:54,088 --> 00:39:56,688 is I will just return CH 851 00:39:56,688 --> 00:40:03,688 unchanged. So if I've gotten something that isn't actually a character, then I'll return - do, do - why 852 00:40:04,849 --> 00:40:10,649 is this unhappy again. Oh, semicolon, thank you. All 853 00:40:10,648 --> 00:40:14,268 right. [Inaudible] Now I got an extra one. Notice it doesn't give me an error on the extra one 'cause, actually, semicolon without a statement 854 00:40:14,268 --> 00:40:17,448 is the empty statement. It's perfectly fine, but thank you for catching the stray 855 00:40:17,449 --> 00:40:18,469 semicolon. 856 00:40:18,469 --> 00:40:20,259 So we'll go ahead and run this, 857 00:40:20,259 --> 00:40:23,639 and we'll try our friend, et tu, Brute, again.
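Putting the pieces so far together, a sketch of the string loop plus the guarded single-character step might look like the following. It assumes plain Java with System.out.println instead of the course's console program, ASCII input, and non-negative keys; the class name CaesarSketch is made up, and the method names follow the spoken "encrypt Caesar" and "encrypt char".

```java
class CaesarSketch {
    // Standard walk through the string, one character at a time.
    static String encryptCaesar(String str, int key) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            result += encryptChar(str.charAt(i), key);
        }
        return result;
    }

    // Shifts uppercase letters; returns everything else unchanged.
    // Assumes ASCII input and a non-negative key.
    static char encryptChar(char ch, int key) {
        if (Character.isUpperCase(ch)) {
            return (char) ((ch - 'A' + key) % 26 + 'A');
        }
        return ch; // spaces, punctuation, digits pass through untouched
    }

    public static void main(String[] args) {
        System.out.println(encryptCaesar("ET TU, BRUTE", 3)); // prints HW WX, EUXWH
    }
}
```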
858 00:40:23,639 --> 00:40:26,588 Sometimes it's all about testing, and so we have 859 00:40:26,588 --> 00:40:27,759 et tu, 860 00:40:27,759 --> 00:40:28,938 Brute, 861 00:40:28,938 --> 00:40:29,918 and now we're okay 862 00:40:29,918 --> 00:40:33,588 'cause we're not encrypting anything that is not a letter. 863 00:40:33,588 --> 00:40:36,719 So sometimes we think we're okay. We need to go back and just make sure we actually do 864 00:40:36,719 --> 00:40:41,188 the testing. Any questions about this? 865 00:40:41,188 --> 00:40:44,658 If this all made sense to you, nod your head. 866 00:40:44,659 --> 00:40:48,969 If this didn't make sense to you, shake your head. Feel no qualms about shaking your head. 867 00:40:48,969 --> 00:40:50,539 If you're someone in the middle, just 868 00:40:50,539 --> 00:40:52,170 stare 869 00:40:52,170 --> 00:40:59,170 and stare at me. No, if you're someone in the middle, shake your head. Okay, uh huh. [Inaudible]. 870 00:41:01,139 --> 00:41:03,708 Why don't I need an else statement, like, say here? 871 00:41:03,708 --> 00:41:07,048 'Cause if I hit the return, I return from the function immediately 872 00:41:07,048 --> 00:41:10,798 and I never, actually, get down to this return. So if I hit this return 873 00:41:10,798 --> 00:41:11,300 statement, 874 00:41:11,300 --> 00:41:14,200 I'm done with the method. As soon as I hit that return, it doesn't matter if there's 875 00:41:14,199 --> 00:41:19,478 any more lines in the method. I'm done. I actually return out. So the 876 00:41:19,478 --> 00:41:21,659 one other thing we might like to do with this that doesn't 877 00:41:21,659 --> 00:41:24,500 quite actually work right now. Let's actually try running this, then I'll show 878 00:41:24,500 --> 00:41:27,030 you what happens, just to show you that it's bad time.
879 00:41:27,030 --> 00:41:30,339 If I actually encrypt something like et tu, Brute, and I want to decrypt it, I might 880 00:41:30,338 --> 00:41:31,250 say hey, 881 00:41:31,250 --> 00:41:35,108 try to use minus 3 as your key, 882 00:41:35,108 --> 00:41:38,568 and if I try to put in the text - I don't even remember what the text was that I wanted to 883 00:41:38,568 --> 00:41:41,748 encrypt. I get this funky thing with question marks, and it's just not working 884 00:41:41,748 --> 00:41:43,448 to move in the negative direction. 885 00:41:43,449 --> 00:41:47,789 So I want to allow for my Caesar cipher to also be able to decrypt information, 886 00:41:47,789 --> 00:41:51,700 which means if I got a Caesar cipher by encrypting with a key of 3, 887 00:41:51,699 --> 00:41:55,839 if I give it the text that's been encoded, and I give it the key minus 3, it 888 00:41:55,840 --> 00:41:59,278 should shift it back three letters and actually work for me. So 889 00:41:59,278 --> 00:42:01,318 how do I do that? Well, 890 00:42:01,318 --> 00:42:04,909 it's something that has to deal with each individual character. If I want to 891 00:42:04,909 --> 00:42:07,849 encrypt each individual character, I need to figure out what's the right way of 892 00:42:07,849 --> 00:42:09,709 using the key, okay. 893 00:42:09,708 --> 00:42:16,518 Think about a key of minus 3. What's a key of minus 3 equivalent to? 894 00:42:16,518 --> 00:42:19,788 A key of 23, right. A few people mumbled it, so we'll just throw out 895 00:42:19,789 --> 00:42:20,699 some candy. 896 00:42:20,699 --> 00:42:24,278 If I want to go 3 in the opposite direction, if I want to go 3 897 00:42:24,278 --> 00:42:26,260 sort of this way, as opposed to this way. 898 00:42:26,260 --> 00:42:27,890 It's the same thing as going 899 00:42:27,889 --> 00:42:30,648 23 characters in the opposite direction.
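One likely reason the negative key produces junk such as question marks: in Java the % operator keeps the sign of the left operand, so a negative sum stays negative instead of wrapping around. The sketch below is an assumption about what is happening on screen, not something the lecture states, and the names NegativeKeySketch and rawIndex are hypothetical.

```java
class NegativeKeySketch {
    // Same index arithmetic as before, with no special handling of negative keys.
    static int rawIndex(char ch, int key) {
        return (ch - 'A' + key) % 26;
    }

    public static void main(String[] args) {
        int index = rawIndex('B', -3);             // (1 - 3) % 26 is -2 in Java, not 24
        System.out.println(index);                 // prints -2
        System.out.println((char) (index + 'A'));  // code 63, prints ?
        System.out.println(rawIndex('B', 23));     // the equivalent key 23 gives 24, i.e. Y
    }
}
```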
900 00:42:30,648 --> 00:42:32,389 So if I want to think about doing that, 901 00:42:32,389 --> 00:42:34,309 I can say, if my key 902 00:42:34,309 --> 00:42:38,469 is a negative number, so if my key is less than zero, there's some shifting I 903 00:42:38,469 --> 00:42:41,568 need to do of the key to actually get this puppy to work. 904 00:42:41,568 --> 00:42:44,829 So if my key is less than zero - as a matter of fact, I'm gonna do this once 905 00:42:44,829 --> 00:42:45,609 down here. 906 00:42:45,608 --> 00:42:48,288 So rather than doing it, and encrypting each character, I'm gonna do it over here 907 00:42:48,289 --> 00:42:51,809 by saying you know what, once I shift my key over, I want to use that 908 00:42:51,809 --> 00:42:56,019 same key to encrypt all my characters. So I want to do the shifting just once up 909 00:42:56,018 --> 00:42:59,598 here. It makes sense to do it once for the string, and then I'll use my updated version of key. 910 00:42:59,599 --> 00:43:01,919 So here's what I'm gonna do. I'm 911 00:43:01,918 --> 00:43:03,478 gonna say take the key, 912 00:43:03,478 --> 00:43:07,118 and the way I'm gonna update key is I'm gonna say it's 26. And this 913 00:43:07,119 --> 00:43:10,568 looks a little bit funky, but I'll explain it to you in just a second - 914 00:43:10,568 --> 00:43:12,369 modded 915 00:43:12,369 --> 00:43:14,499 by 26. 916 00:43:14,498 --> 00:43:18,129 And you might say Mehran, why do you need all this math to actually pull it off 917 00:43:18,130 --> 00:43:20,838 'cause if you do something, you could say why can't you just take 26 918 00:43:20,838 --> 00:43:22,358 and subtract from it 919 00:43:22,358 --> 00:43:26,498 your key. So if you want to say minus - or add toward your key. So if 920 00:43:26,498 --> 00:43:30,688 you want to have a key of minus 3, isn't that just the same as adding minus 3 921 00:43:30,688 --> 00:43:33,379 to 26. You'll get 23, aren't you fine.
922 00:43:33,380 --> 00:43:37,130 Yeah, that's fine for sufficiently small values of key. So if this thing 923 00:43:37,130 --> 00:43:39,710 actually is minus 3, minus, minus 3 924 00:43:39,710 --> 00:43:40,829 gives me 3, 925 00:43:40,829 --> 00:43:43,199 and if I were to - oh, missing 926 00:43:43,199 --> 00:43:44,190 a minus in here. 927 00:43:44,190 --> 00:43:46,349 Sorry, my bad. I had two 928 00:43:46,349 --> 00:43:49,829 minuses. I want to have another minus 929 00:43:49,829 --> 00:43:50,260 930 00:43:50,260 --> 00:43:53,620 right there. So if key is minus 3, and I take a negative of minus 3, that gives 931 00:43:53,619 --> 00:43:57,190 me 3. Twenty-six minus 3 by itself would give me 23, which is 932 00:43:57,190 --> 00:44:00,519 the value I care about and that's perfectly fine. 933 00:44:00,518 --> 00:44:04,258 But what happens if this key that someone gives me is something, for example, that's 934 00:44:04,259 --> 00:44:06,009 larger than 935 00:44:06,009 --> 00:44:08,119 26. That's kind of bad time 936 00:44:08,119 --> 00:44:10,449 because if I subtract a number that's 937 00:44:10,449 --> 00:44:14,338 larger than 26 from the 26, so if this happens to be minus, 938 00:44:14,338 --> 00:44:18,599 let's say 27, and I say minus minus 27 is positive 27. And I 939 00:44:18,599 --> 00:44:22,499 subtract 27 from 26, I get minus 1. That's bad time. 940 00:44:22,498 --> 00:44:23,058 941 00:44:23,059 --> 00:44:26,940 So the reason why I have this mod 26 in here, is it says first take the key. They 942 00:44:26,940 --> 00:44:28,619 gave me some negative value. 943 00:44:28,619 --> 00:44:33,039 Take the negative of that, which gives you some positive value. When you mod it by 26, you 944 00:44:33,039 --> 00:44:37,669 will guarantee that that value they've given you is less than 945 00:44:37,668 --> 00:44:39,478 26 'cause if it was 946 00:44:39,478 --> 00:44:42,588 26, 26 mod 26 is zero. Something larger than 26 gives me 947 00:44:42,588 --> 00:44:43,978 a remainder.
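The key update being described can be sketched on its own. The expression 26 - (-key % 26) is a reconstruction from the spoken description ("26 ... minus ... minus key ... modded by 26"), so treat it as an assumption about the on-screen code; KeyNormalizeSketch and normalizeKey are made-up names.

```java
class KeyNormalizeSketch {
    // Maps a negative key onto an equivalent non-negative shift.
    // Non-negative keys are returned unchanged.
    static int normalizeKey(int key) {
        if (key < 0) {
            key = 26 - (-key % 26); // -key is positive; % 26 brings it below 26 first
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(normalizeKey(-3));  // 26 - 3 = 23
        System.out.println(normalizeKey(-27)); // 27 % 26 = 1, so 26 - 1 = 25
        System.out.println(normalizeKey(-26)); // 26 % 26 = 0, so 26 - 0 = 26
    }
}
```

A key of -26 comes out as 26 rather than 0, which is still harmless as long as the per-character step takes the remainder by 26 afterwards.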
948 00:44:43,978 --> 00:44:46,699 So as long as I mod by 26, I will always get back 949 00:44:46,699 --> 00:44:52,328 the appropriately mapped value, less than 26, and then I will subtract that [inaudible]. 950 00:44:52,329 --> 00:44:54,739 So just to make sure this actually works, 951 00:44:54,739 --> 00:44:55,759 what I'm gonna do 952 00:44:55,759 --> 00:44:57,909 is in my main program, 953 00:44:57,909 --> 00:45:03,689 I'm gonna say encrypt Caesar using this key. 954 00:45:03,688 --> 00:45:05,778 And then, do, do, do, so I have some cipher text. 955 00:45:05,778 --> 00:45:08,599 I'm going to now - well, actually, let 956 00:45:08,599 --> 00:45:11,640 me write out the cipher text so I'll still use this println. 957 00:45:11,639 --> 00:45:12,699 And then, 958 00:45:12,699 --> 00:45:17,838 I'm going to have some other string, 959 00:45:17,838 --> 00:45:19,150 new plain. 960 00:45:19,150 --> 00:45:23,180 And new plain is just going to be doing encrypt 961 00:45:23,179 --> 00:45:24,239 962 00:45:24,239 --> 00:45:25,278 Caesar 963 00:45:25,278 --> 00:45:28,289 on my cipher text, so that should be my encrypted text 964 00:45:28,289 --> 00:45:30,199 with the negative of the key. 965 00:45:30,199 --> 00:45:32,449 So I want to, essentially, switch back 966 00:45:32,449 --> 00:45:33,548 to what I've got. 967 00:45:33,548 --> 00:45:35,900 And so I'll have println 968 00:45:35,900 --> 00:45:40,599 new plain quote dot 969 00:45:40,599 --> 00:45:42,709 whatever the new - man, 970 00:45:42,708 --> 00:45:50,909 I cannot type to save my life - L println. Thank you. 972 00:45:50,909 --> 00:45:54,759 So now I run this puppy 973 00:45:54,759 --> 00:45:57,960 in our final moment together, 3 974 00:45:57,960 --> 00:46:00,918 et tu, Brute. 975 00:46:00,918 --> 00:46:03,248 Well, at least I got it back even though I misspelled it.
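The round trip just run can be sketched end to end as follows. This assumes plain Java with a hard-coded message and key in place of the console input used in lecture; RoundTripSketch is a made-up class name, newPlain follows the spoken "new plain", and the key update is the same reconstructed expression as before.

```java
class RoundTripSketch {
    static String encryptCaesar(String str, int key) {
        // Shift a negative key once, up front, then reuse it for every character.
        if (key < 0) {
            key = 26 - (-key % 26);
        }
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            result += encryptChar(str.charAt(i), key);
        }
        return result;
    }

    // Assumes ASCII input; key is non-negative by the time it gets here.
    static char encryptChar(char ch, int key) {
        if (Character.isUpperCase(ch)) {
            return (char) ((ch - 'A' + key) % 26 + 'A');
        }
        return ch;
    }

    public static void main(String[] args) {
        int key = 3;
        String cipher = encryptCaesar("ET TU, BRUTE", key);
        System.out.println(cipher);   // prints HW WX, EUXWH
        String newPlain = encryptCaesar(cipher, -key);
        System.out.println(newPlain); // prints ET TU, BRUTE
    }
}
```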
976 00:46:03,248 --> 00:46:04,778 I got my 977 00:46:04,778 --> 00:46:07,978 mixed-up characters, and then I got my new plain text, which is the same as my 978 00:46:07,978 --> 00:46:11,998 original text, which I got just by, essentially, shifting in the negative direction. 979 00:46:11,998 --> 00:46:14,858 So any questions about that. 980 00:46:14,858 --> 00:46:17,818 Allrighty, then we're done with strings for the time being, and I'll see you on 981 00:46:17,818 --> 00:46:18,108 Wednesday.