Harvard CS50: Intro to Computer Science
my name is David Malan and I actually took this course myself back in 1996 I was
a sophomore at the time I was actually concentrating in government because
a year prior as a first year I'd come into Harvard thinking that I liked history
and constitutional law and sort of similar classes in high school and so when I
got here I rather gravitated toward that which was familiar I figured if I liked
and if I were good at that particular subject in high school then that's
presumably who I'm supposed to be
here but it wasn't until sophomore year that I got up the nerve to step foot in
the cs50 classroom and even then it was only out of curiosity like I had no
intention of studying computer science or even taking cs50 when I got to
campus but people were talking about it and there was a lot of like beware
and it was perhaps for the initiated only and I didn't really know ultimately
what computer science was but for me the sort of light bulb went off and I
found that contrary to what I had seen in high
school where I saw friends of mine like programming away in the computer
lab heads down sort of anti-socially just doing whatever it was they were
doing it really wasn't that once I got to this particular class in this particular
place it was much more about problem solving more generally and just
learning how to express yourself in code in different languages so that you
can actually solve problems of interest to you even if you have no intention
of being a computer scientist or an engineer but just want to be able to
solve problems analyze data do interesting things in the Arts Humanity social
sciences physical sciences or really any other field and indeed this particular
path led me to computer science but the hope for cs50 more generally is
that indeed you just find your way to applying principles that you'll learn
over the coming months to whatever field is of interest to you with that said
it was definitely a lot of work and not without its frustrations for me but there
was no better feeling than like banging your head proverbially
against the wall for some number of hours even days trying to fix a bug a
mistake in your code and then oh my God the rush of emotion of
accomplishment of pride of exhaustion when you finally solve some problem
that's really been weighing on you it's just so incredibly gratifying but also
empowering because unlike a lot of fields computer science was built by
humans themselves and so if a human built this surely you another human
can understand it as well and so even though there's going to be
some distractions along the way you're going to see what looks
incredibly cryptic if you've never programmed before over time and with
practice everything just starts to make more sense and with time and with
practice you just get better at this particular field and indeed really the key
to success in programming in general is just to allow yourself enough time
and so at least thankfully I quickly got into the habit of starting early in the
week for instance when writing actual code why because you're going to like
run up against a wall you're not going to see some bug something's not
going to jump out at you and that's fine that's when you sort of call it a day
take a break move on to something else and then just come back to it
and that's what keeps programming fun for me even all of these years later
whether it's teaching or actually applying it but there's down the road a
history of an MIT hack and it looked a little something like this in
yesteryear and there was a little sign the MIT students
when they made this hack on the wall that says getting an education from
MIT is like drinking from a fire hose which indeed they have connected to
what should have been otherwise just a water fountain and that's going to be
what it feels like sometimes not just in computer science per se but just in
any unfamiliar field if you're not from STEM if you're not from CS that's fine
but so much of it ultimately is going to be absorbed by you and going to be
within your grasp by term's end so just
keep in mind that's very much the intent but you'll be amazed what
you're able to create to accomplish just three or so months hence indeed
two-thirds of you contrary to what you might think or assume have never taken a
CS class before so it's absolutely not the case that the person to the left and
to the right surely must know more than you indeed it's quite the opposite
and as you'll see in the coming weeks as you write your own code and solve
your own problems what ultimately matters in this
course is not so much where you end up relative to your classmates but
where you end up relative to yourself when you began and it really is all
about that Delta whether you've programmed or not just getting something
out of a class like this and if it does take time and if you do feel those
frustrations but you simultaneously eventually feel that sense of
accomplishment that just means it's all working and indeed hopefully all the
more worthwhile and gratifying ultimately as a result so
what are we going to do in the coming weeks so here we are in week zero
we'll soon see why computers and computer scientists start counting if you
will from zero but week zero is one in which we explore
computational thinking sort of thinking like a computer and starting to clean
up your thought processes getting you to think to solve problems more
methodically and then ultimately translating that into code and some of you
might recognize this environment here AKA Scratch coincidentally also from
MIT you might
have used it in grade school we'll use it today and a little bit this weekend
in the course's first homework assignment or problem set but not so much to
kind of play around in a way that you might have if you did use it in
yesteryear but to explore ideas of computer science and programming that we're
going to use and reuse every week hereafter as well thereafter we're
going to transition just next week to week one so to speak whereby we'll
introduce you to a more traditional
language a lower level language an older language called C and in C you're
going to use your keyboard not so much your mouse and pointing and
clicking but you're going to write code that soon is going to look a little
something like this and if you've programmed before you can probably glean what
this is going to do if you've never programmed before which is the case for
most of you this too will soon make sense but this is the most canonical
program that most any programmer ever writes called hello
world and indeed that and all of the surrounding syntax above and below just
that sentence hello world will soon make all the more sense you'll learn how
to use industry standard tools so to speak pictured here something called
Visual Studio Code or VS Code you'll use a cloud-based version of it initially so
you don't have to suffer with any technical difficulties or headaches like that
it'll just work right off the bat but we'll use that tool and others ultimately to
then explore ideas in
computer science principles that you can apply and we'll take a look
underneath the hood so to speak of your computer at your memory or RAM
random access memory where all of the data is ultimately going to be stored
we'll also take a look thereafter at bugs a bug is a mistake in a program here
is an actual bug in an actual computer in yesteryear but we'll teach you how
to debug programs find your own mistakes find others' mistakes and improve
that code as well we'll transition then to algorithm
bars over there well odds are if you're like me you would probably kind of
eyeball it and if you could physically interact you might just start grabbing
the smallest elements first put them over on the left maybe grab the biggest
elements put them over on the right but what's your algorithm there like how
would you teach someone younger than you who's never done that before
how to do it how would you compel your Mac or PC or phone to do something
like that you can't just kind of wave your hand and
say oh you know figure it out move things around you have to express
yourself more methodically so we'll translate even ideas like this into code too
and that's what the Googles and others of the world are doing constantly as
they sort and organize the world's information we'll use metaphors along the
way if it helps we'll talk about your computer's memory as kind of being like a
postal address like every mailbox in the world has some form of postal
address Street city state country and
the like and it turns out that's how your Mac your PC and your phone also
work you've got a whole bunch of memory like the picture before but you can
think of it really as individual mailboxes and you can put anything you want
in those mailboxes and you can go to a mailbox to grab information from
it so at the end of the day that's really all your computer's doing with
information it's just kind of organizing it not into mailboxes per se but a term
you probably know called bytes for
instance instead we'll talk about problems that arise even nowadays in fact
most of you are familiar with your Mac PC even phone like spontaneously
rebooting sometimes crashing the little annoying spinning beach ball or
hourglass icon that happens like what is with that well those are just bugs in
programs that humans at Apple and Google and Microsoft and others they
screwed up and they wrote buggy code and your computer when it
encounters those mistakes doesn't know what to do and so nine times out of
10 so to speak it just
crashes or freezes or the like but that kind of stuff will make more sense so
even the real world will make sense and pictured here are some lower level
terms we'll eventually get to mid-semester but generally speaking when
something is going this way as per this arrow and something is going this
way as per this Arrow like that does not end well and that often is what
happens when your computer crashes someone's using memory up here but
someone else is using memory down here and then they're not
really talking left hand and right hand so that is just a high level overview of
some of the problems we'll encounter but we'll focus too on data ultimately so
pictured here is something fairly technical called a hash table it's an amalgam
of something we're going to soon call an array and also something we call a
linked list and these are just fancy terms for describing how you can organize
information even more flexibly than just putting individual values in
mailboxes like how could you build
chain them together well you'll learn how to do that in code so that even if
you get more data than you expect if your business is booming and you're
some web-based business how do you keep adding and adding information
to your software to actually keep up with it but this again is what code's
going to soon look like as soon as next week in week one this here being C
but we'll transition in a few weeks to a more modern higher level
language so to speak called Python indeed the course very
deliberately back in my day and now introduces you first to C which
funny enough many people don't tend to program in certainly every
day I use C generally September October November December when
teaching CS50 itself but it's everywhere nonetheless in fact even today's
other languages with which you might be familiar like Python and Java and
uh yet others still you see this same primitive language underneath the hood
because it's so darn fast and as you'll learn over the coming weeks it
and even mobile devices nowadays well pictured here is a language called
HTML it's not a programming language it's a markup language and some of
you might have made homepages or portfolios in the past but you'll
understand what's going on here but more powerfully you'll understand how
the computer sees that same kind of code builds up a hierarchical family tree
type structure in memory and then you can manipulate that tree with code
to actually add more and more information chat messages
anything on the screen that you like and finally we'll tie all of this together by
introducing what are called Frameworks and libraries third-party code that
makes it a lot easier to solve problems of interest to you and so in particular
here this is the very first web app that I myself made back in like 1997 I was
part of the first year intramural sports program not as an athlete but as the
programmer and I was teaching myself how to build web applications I only
knew C and maybe a little bit of something else at the time
but this became for Harvard at least the very first website for the first year
uh intramural sports program and it wasn't just a static website with links
and images and the like it was interactive you could register for sports we
could input exactly who is in a tournament bracket or the like and it could
actually automatically keep track of this data so there too after just three
months of a class like this you'll go from writing quite simply this week and
next hello world to building things like this for
whether it's web mobile or other platforms as well if you so choose but
we'll get you off of the course's infrastructure by the end of the term you
won't be using any toy environments along the way we'll empower you
ultimately to write code after cs50 especially if this is the only CS class you
ever take on your own Mac or PC using the same software but not the cloud-
based version thereof but all of this software is itself free and can be used by
you powerfully after the course's own end but along the way as
you may know there is this tradition within the class particularly in
healthy times of a number of events that really bring people together
not just collaboratively academically but to just solve problems and generally
engage with each other as well coming up first cs50 puzzle day which is
meant to be not jigsaw puzzles but logic puzzles that require no prior
experience with computer science or programming but it's just an
opportunity to sort of quietly work on a packet of puzzles with some
number of friends for prizes and more later in the semester once you tackle
your final projects the Capstone of the course where we don't give you a
homework to write you yourself come up with something to build we'll get
together generally around 7:00 p.m. in the evening wrap up around
7 a.m. if you so choose and it's an evening a 12-hour opportunity to
collaborate with classmates on your very own final project in a large space
on campus that ends if you're awake with
us at 5:00 a.m. we can hop on some cs50 shuttles and go down the road
for some pancakes at IHOP around 6 of course this is 6 7
a.m. at that point but it's an opportunity finally to lead into What's called the
cs50 fair which is an end of semester celebration an exhibition of everything
that you'll accomplish over the coming months and in fact pictured here are
some of your predecessors in healthy times the cs50 fair allows you to
come with your laptop or phone and
exhibit to students faculty and staff across campus put together something in
person and on video that people can delight in seeing as you exhibit what it
is you create and what you learned over the course of the several weeks and
ultimately a chance to just share and inspire others as well and you'll all
walk home ultimately with your own I took cs50 t-shirt saying as much as well
so with that high-level overview of the course I propose that we begin to take
a look at what computer science itself is
and what it is we're going to be doing over the next several weeks at this
lower level then too so what is computer science right if you're maybe like
me or knew people like my friends in high school you probably assume that
it means programming and that's absolutely a big part of it for a lot of people
because with code you can write you can express ideas and solve actual
problems especially involving data but computer science itself is really the
study of information if you will how do you
ultimately follow and I'd propose that this is problem solving you've got some
input which is like the problem you want to solve the goal is to solve it so
that's the so-called output and then somewhere in here the proverbial black
box is some kind of Secret Sauce that gets the work done and in the coming
months we'll have to decide well how are we going to represent these inputs and
outputs and really how do we code up how do we write solutions for what it
is that's solving the problem of interest
if you have to count so high but unary is a very simple system of using a
single symbol a human finger in this case to just solve some problem like
counting the number of people in the room let's make this slightly more
technical for a moment a little more mathy that's just called base one where
the base under which you're operating has one digit in it like literally a
human finger and maybe multiple such fingers if you need to count higher
but of course most of you if
not all of you generally vaguely know that computers use something other
than unary and even you and I probably don't use this that often they use
what language or alphabet instead yeah so binary so binary is indeed the
system that computers somehow use so bi in this case implying two and so
computers have two digits it turns out at their disposal zero and one in fact if you've
ever heard the technical term bit which is like a smaller version of a byte
more on that soon well a binary digit is the origin of that term bit
because if you get rid of some of the letters you're left from binary digit
with just bit thus a bit is just a zero or a one it's one more digit than
you might have on your own finger and of course it's fewer though than you
and I have you and I typically use as humans the decimal system dec
meaning 10 because you and I generally use zero through nine so on the one
hand no pun intended you've got unary computers use binary we
humans generally think and talk in terms of decimal
but at the end of the day these are fundamentally going to be the same
thing which is to say that it's all pretty accessible to us even if you're not a
computer person I dare say you're about to be so what is a bit well a bit then
is a zero or a one that is a so-called binary digit but how do computers only
speak in binary how do they solve problems represent information using only
binary well at the end of the day if they want to represent zero and one we
need to do so physically somehow and
I dare say that maybe the simplest way to think about a bit a zero or one is
like a light bulb and so by human convention let's just assume that if you are
a computer be it a laptop desktop phone or the like and you want to
represent the number zero you know what you just keep a light switch off
you keep a light bulb off if by contrast you're that same computer and you
want to represent the number one you take that same switch that same light
bulb and just turn it on so a light bulb that's on represents a one and a light
bulb that's off represents a zero so why is this relevant to computers well
at the end of the day you and I are charging our laptops or phones at night
so there's some like physical resource being replenished there whether
you're on battery or some power cord and so inside of a computer are just
thousands millions of tiny little switches nowadays you can think of them
metaphorically as light bulbs but they don't actually shine light but
there are tiny tiny little switches and
those switches if you've ever heard the term are just called transistors so like
computers have millions of transistors that can either be flipped on to
represent ones or flipped off to represent zeros and from that very simple
mechanism electricity is there or it's not a one or a zero computers can
actually count obviously from zero to one but it turns out even higher if they
use a little more electricity as well so how might I do this well let me go
ahead and propose that I just grab one of our
own light bulbs here on stage this one's off so for instance If This Were
miniaturized inside of your Mac PC or phone this would be a transistor and
indeed here's the little switch on the bottom and if your computer wants to
represent a zero it just leaves the switch off and the light is not shining if you
want to represent a one well now I've counted as high as one because the
switch is now on I've grabbed a little electricity I'm holding on to it inside of
the computer and so now I see that
this is a one all right but unfortunately with just one switch one light bulb I
can only count from zero to one how do I count higher might you think
intuitively say again yeah so more light bulbs so let me do this let me just
grab something to put these on so I can use a few of them at a time and
let me propose that here instead of having just one light bulb let me give
myself maybe three in total so all of them are initially off and if you think of
this in miniature form in your mind's eye
this is like a computer with three transistors three switches representing now
the number you and I know as zero why they're just all off so how does a
computer go about representing the number one well it turns on one of these
light bulbs and how does the computer represent the number two well you
might think if I may you just turn on a second light bulb and you might
think how does a computer represent three you just turn on the third light
bulb and so as such with three bits a computer would
seem to be able to count from zero on up to 1 two three but it turns out if I'm
a little smarter here I can actually count higher than that why well I'm just
considering like the combination of bulbs being on here what if I do
something like this this is still zero I will claim but what if I propose now that
this will be how a computer represents one on off off this though will be
how the computer represents two notice I didn't turn on the same two I'm
just turning on the one in the middle
this I now claim will be how a computer represents three this is going to be in
just a second how a computer represents the number we know as four and
yet I'm still only using three bulbs this is going to be the number the
computer represents as five this is going to be how the computer represents
the number six and then lastly it turns out with three light bulbs if you're
smart about it you can count it seems as high as seven now even if you kind
of lost track of like what I was turning on and why I
claimed there were eight different patterns from all of them off to all of them
on but notice that I started to permute them I took into account which ones
were on and which ones were off why though do these represent the
numbers we know as 0 through 7 well let me go ahead and maybe
let's do this instead of just considering there to be light bulbs let's assign
some special significance to each of them based on where it is and maybe
for this could we get maybe three volunteers three
volunteers okay yeah who's you're being volunteered okay come on up if you
want to go over to the stage there yeah you want to come on up as well and
over here as well so there's some stairs on either end maybe a round of
applause for our first volunteers of the term all right so you want to be you are
number one and if you want to go ahead and stand roughly right here
how about you want to be number two okay come on over and right uh to
the right of here and you'll be number four it
turns out and if you want to come over here on this end let's give you all a
moment to uh introduce yourselves uh briefly to your classmates if you'd like
hi I'm Ellie I'm a senior nice to meet I'm rahana and I'm a first year welcome
hi I'm Joseph and I am a first year welcome all right so so glad to have all
three of you up here thank you let me propose now that we'd like you you
three to represent how about the number zero and I claim now that if each of
you now represents a switch you have
sort of fancier light bulbs now one is a one one is a two one is a four but each
of you still just has a switch on the bottom in fact of your plastic devices I
claim these three volunteers are representing the number zero let me ask
you all now how might you represent the number one how should you
cooperate here okay so we would have on off off which I think matches
what I did with my three light bulbs as well if you want to go ahead and turn
yours off how might you three represent the number
two okay so off on off now from right to left how would you three represent the
number three ah so that's why my two light bulbs went on at the end how
would you three represent the number four perfect number five number six
and number seven all right and give us one more how would you represent
eight okay you can't uh how about then one more volunteer one more
volunteer okay come on up all right what's your name my name is Moen if
you want to say it there my name is Moen all right Moen you're gonna be
number eight and if now you all actually let's make this uh how would you
represent number eight all collectively as four bits or four switches okay
eight and now lastly give me 15 everyone's awkwardly doing arithmetic in
their head oh using unary hey yeah is everyone yes okay round of applause
okay thank you all if you want to leave your numbers over here we have a
CS50 stress ball for you but thank you for volunteering you can turn those
numbers off and leave them over here so thank you so how do we go
from there to creating these patterns well even though we had three
bits initially and three switches later four bits and four switches ultimately we
still used the same approach fundamentally to actually representing
information and now why were they those patterns and why did I very
deliberately have our volunteers line up in that way well I wanted
them using base 2 AKA binary but with binary there come certain rules and
even if you're not familiar with binary beyond that it
exists and relates somehow to computers it's actually pretty much identical
to the system you and I use every day known as base 10 AKA decimal so
let's consider if you will by rewinding to primary school for just a moment like
how decimal works and you'll see that even if you're not a computer person
you actually are you just have to tweak your mental model ever so slightly so
here is the number that you're probably viewing as 123 but why is that well
it's not really 123 these are just uh this is
just a pattern of three symbols on the screen 1 2 three and your mind is sort
of rapidly assigning mathematical meaning to them 123 but why is that well
if you're like me you probably learned back in the day when you have a
three-digit number like this the rightmost number is in the ones place the
middle digit is in the tens place the leftmost digit is in the hundreds place and
why is that relevant well if you then quickly do some mental math as you and
I just do uh instantly nowadays that just means 100 * 1 plus 10 * 2 plus 1 *
3 of course 100 + 20 + 3 gives us the number you and I know as 123 but
beyond that how do we get to just two digits instead of as many as ten in
the decimal system well let's generalize this in the decimal system you and I
know if we've got three digits represented by these hashes here yes it's the
ones place tens place hundreds place and if we keep going thousands 10
thousands and so forth but why is that well the terminology is now a little
more generic that's technically the 10 to
the zero column the 10 to the 1 10 to the two so these are powers of 10
where 10 is your base computers just simplify things a little bit because
computers at the end of the day only have access to electricity on or off they
don't have access to like 10 different types of electricity just two on or off if
you will well they just use a different base and the rightmost digit would be
in the so-called 2 to the zero then the middle digit is 2 to the one the
leftmost is 2 to the 2 AKA ones place twos place
fours place and as we kept going 8 and if we keep going 16 32 64 128 and
so forth but the idea is fundamentally the same so why is this how the
computer represents the number you and I know as zero well off off off from
right to left or in this case left to right is just zero why because that's 4 * 0 + 2 * 0
+ 1 * 0 which is of course 0 this is why 0 0 1 represents one this is why 0 1 0
represents two and three and four and five and six and seven on up and why
did we need a fourth bit to represent eight
well we kind of needed to carry the one so to speak using our
familiar human terminology but for that we need a fourth bit another
transistor and this now represents the number eight and that's why we
ended with on off off off from left to right so I keep saying on and off or the
light bulb is on or off but really I just mean one or zero and so computers and
we humans think of things digitally as just being zeros and ones but
mechanically you can think of it indeed as these light bulbs
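
The place-value idea described above can be sketched in a few lines of Python. This is only an illustration and not code from the lecture: the function names place_value and to_bits are made up for this example, and patterns are written with the ones place on the right, as on a slide.

```python
def place_value(digits, base):
    """Add up each digit times its column's weight (base ** position)."""
    total = 0
    # The rightmost digit sits in the base**0 column, the next in base**1, ...
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total


def to_bits(number, switches):
    """Return the on/off pattern (1s and 0s) for number using a fixed count of switches."""
    if not 0 <= number < 2 ** switches:
        raise ValueError("not enough switches to represent that number")
    pattern = ""
    # Walk the columns from the largest power of two down to the ones place.
    for position in range(switches - 1, -1, -1):
        weight = 2 ** position
        if number >= weight:
            pattern += "1"   # switch on
            number -= weight
        else:
            pattern += "0"   # switch off
    return pattern


print(place_value("123", 10))   # 100*1 + 10*2 + 1*3
for n in range(8):              # the eight patterns of three light bulbs
    print(n, to_bits(n, 3))
print(8, to_bits(8, 4))         # eight needs a fourth switch
```

With three switches the largest pattern is 111, which is seven, so to_bits(8, 3) raises an error in this sketch. That mirrors why a fourth volunteer was needed on stage.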
now a bit not very useful even three bits four bits not that useful you can
count to seven or 15 generally speaking bytes are a more useful unit of
measure and anyone familiar how many bits are in a byte yeah so eight bits
are in a byte you can think of it as an octet equivalently in some contexts there
are nuances there but think of a byte as just being eight bits and that's just a
more useful measure so what does this mean in real terms well if you've ever
downloaded like a music file or a photograph or a
video those are measured in bytes probably not small numbers of bytes
probably kilobytes for thousands of bytes megabytes for millions of bytes
gigabytes for billions of bytes especially for video that just means you have a
lot of patterns of 8 bits some combination of zeros and ones on your
computer's hard drive here then with a byte of bits eight bits is how a
computer would typically represent the number zero and if that same
computer uses all eight of its bits its full byte to change them all to ones
anyone who's quick with math or has seen this before how high can a
computer count with eight bits or one byte yeah 255 why is that well we're not
going to turn this into a constant math exercise indeed after today we're not
really going to think about or talk about bits at this low level but this is the
ones place twos fours 8 16 32 64 128 and if I do all of that math from left to right
that indeed gives me 255 it ignores how we might represent negative
numbers but perhaps more on those some other day but computers of
course do so much more than numbers and math and all this low-level stuff
we send text messages write documents emails and the like so how might a
computer represent something like the letter A I claim at the end of the day
your Mac your PC your phone just has lots of transistors lots of switches that
it can use in units of eight in units of bytes how though if it's already using
those patterns of zeros and ones apparently to represent numbers from zero
on up how do you go about representing letters of the
alphabet might you think yeah okay so we could assign a number to every
letter okay so let me just conjecture well let's just call A 0 for simplicity B 1
C 2 and now let me play devil's advocate okay how do I now represent zero
or one or two well we've maybe created a problem for ourselves if now we have
to steal some numbers to represent letters we kind of have to pick a lane but
there's a solution to that too that we'll see and it turns out the world is not
quite as simple as a being zero a
describing what really you proposed a mapping between numbers and letters
not not quite as simple as 012 starts at 65 66 67 for capital letters but here
are most of the letters in use today at least with this system so this is just a
big chart from online and you'll see it in the middle of this chart here here's
my 65 A here's my 66 B C and let's see 72 is H 73 is I and so forth so there's a
mapping at least for English between all of these numbers and all of these
letters and if we focus here those are
the beginning of our uppercase alphabet so suppose then that today tomorrow
you receive a text message from someone and underneath the hood now
that you're a computer person you figure out a way to see what pattern of
zeros and ones was sent in this case it's wireless as opposed to wired but it's
still some pattern of zeros and ones and your phone is turning some switches
of its own on and off to represent that message from a friend suppose that
the three patterns you received were these three bytes from
left to right spelling out a three-letter word well if we do out the math ones
place twos place and so forth I'll spoil it for you suppose that you received a
text message that doesn't literally say 72 73 33 but you've received a
pattern of 8 plus 8 plus 8 24 bits that if you do out the math represent the
decimal numbers 72 73 33 anyone recall what message you might have
received from the green and white charts yeah hi but yes hi is the message
but 72 73 gives H and I what's 33 any
guesses to 33 yeah over here yeah so it's an exclamation point how would
you know that well you really do need some kind of cheat sheet AKA ASCII in
this case and if we look elsewhere let me highlight the left of the chart you
can see that next to 33 in decimal is indeed the exclamation point so back in
the day a bunch of humans got in a room decided that hey when we start
building PCs and later Macs and phones we all just have to agree on this
form of representation of letters of the English alphabet
in this case we just need to agree on this mapping but somewhat curiously
notice this it turns out that once you paint yourself into this corner and start
using 65 for a 66 for B well how do you represent 65 the number and 66 the
number if you want to do math or use Excel or something like that does
anyone kind of see the solution perhaps how do you represent the number
one in ASCII yeah in the middle yeah so this is getting a little maybe Inception
or something but you could represent numbers with other
numbers and so if you want to represent the number you and I know as one
like when you type it on your keyboard turns out the computer stores that as
the decimal number 49 if you hit two on your keyboard the computer is not
storing two per se it's storing the decimal number 50 now thankfully the
paradox kind of stops there we just have a mapping now of
numbers to numbers but really at the end of the day and you're going to
learn this when we start writing code in that other
language C next week it's just context dependent at the end of the day
inside of your Mac PC and phone there's just all of these permutations of bits
all of these patterns of zeros and ones and generally speaking when you
open up a text message that you've received from someone it's zeros and
ones but obviously if it's a text message the whole point of text messages is
to send text and so those patterns of zeros and ones by default will typically
be interpreted as letters of the alphabet
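The example above, three bytes that spell a short message, can be sketched in a few lines of Python. This is only an illustration of the idea that the same bits can be read as text or as numbers; the variable names are made up for this sketch and are not from the lecture.

```python
# The three bytes from the example, written as decimal numbers.
message = bytes([72, 73, 33])

# The same bytes shown as patterns of eight zeros and ones.
patterns = [format(b, "08b") for b in message]

# Interpreted as ASCII text, the bytes are letters and punctuation.
text = message.decode("ascii")

# Interpreted as plain numbers, the bytes are just 72, 73 and 33.
numbers = list(message)

# The characters "1" and "2" on a keyboard are themselves stored as numbers.
digit_codes = [ord("1"), ord("2")]

print(patterns)     # ['01001000', '01001001', '00100001']
print(text)         # HI!
print(numbers)      # [72, 73, 33]
print(digit_codes)  # [49, 50]
```

Nothing in the bytes themselves says "text" or "number"; the program decides which interpretation to apply.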
so you won't see zeros and ones you won't see decimal numbers you'll see
the English message that your friend intended by contrast if you open up
something like Excel that same pattern of zeros and ones might indeed work
out to be 72 73 33 you might see cells in your spreadsheet with literally
those three numbers why because spreadsheets are all about numbers and
number crunching and math uh in many cases if by contrast you open up
Photoshop and try to look at that same pattern of
zeros and ones it's not going to be 72 73 33 it's not going to be zeros and
ones it's not going to be hi it's going to be some color of the rainbow
you're going to use those patterns of zeros and ones it turns out too to
represent colors and indeed so long as you and I just agree as humans long
have what these patterns are going to be all of our systems many of our
systems nowadays are indeed interoperable but I'm being very biased here
and indeed the A in ASCII is very American-centric
what do you not see in this chart if you speak any language other than English
odds are you're not seeing characters you know and love and need every
day to type or send messages well there's a huge character set that's not
supported here whether it's accented characters and a lot of Asian alphabets
you have many more symbols than can fit even on this screen here and so
humans kind of painted themselves into a corner early on or really
Americans did but on a typical keyboard us English keyboard
yeah you have A's and B's and C's uppercase and lowercase but you also
have accented characters here and nowadays not sure if this is maybe
necessary but nowadays you have other characters on your keyboard like
these and these are kind of a playful incarnation of what's actually a
technical solution to this problem if I claim for the moment that ASCII
historically used seven bits to represent letters and let's just round that up to
a byte eight bits to represent letters ASCII can represent as
many as 255 or really 256 total characters why 256 well if you have them all
zero that's zero and the highest number I claimed a moment ago was 255
so that's 256 total possibilities that's not many letters it's fine for English but
not a lot of human languages so what might the intuitive solution be if you
want to represent accented characters um Asian characters Emoji even like
these which are just keys on a keyboard nowadays what's the intuitive
solution if a byte's too few
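The counting argument above (eight bits top out at 255, which makes 256 possibilities including zero) can be checked with a short sketch. The helper name `patterns_for` is hypothetical and exists only for this illustration.

```python
BITS_PER_BYTE = 8

# Place values for one byte: ones, twos, fours, ... up to 128.
place_values = [2 ** i for i in range(BITS_PER_BYTE)]

# With every bit switched on, the byte's value is the sum of all place values.
largest_value = sum(place_values)

# Counting zero as well, that gives one more pattern than the largest value.
total_patterns = 2 ** BITS_PER_BYTE

# Seven bits, as ASCII historically used, allow half as many patterns.
seven_bit_patterns = 2 ** 7


def patterns_for(num_bytes):
    """Return how many distinct patterns fit in the given number of bytes."""
    return 2 ** (BITS_PER_BYTE * num_bytes)


print(place_values)        # [1, 2, 4, 8, 16, 32, 64, 128]
print(largest_value)       # 255
print(total_patterns)      # 256
print(seven_bit_patterns)  # 128
print(patterns_for(2))     # 65536
print(patterns_for(4))     # 4294967296
```

Each extra bit doubles the number of patterns, so a second byte already leaves far more room than a single byte does.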
yeah yeah so add another digit just like we had a fourth volunteer come on
up to give us a fourth bit let's just throw hardware at the problem and
use a few more bits so maybe instead of one byte let's use two or heck let's
use three or four bytes even though it's getting a little expensive we're going
from 8 to 16 to 24 or 32 bits that's how computers do things these days and
thankfully we have so much memory inside of our computers and phones we
can certainly spare a few to represent these
things and the solution then to ASCII is what we'll call Unicode so Unicode is
also just a mapping of numbers to letters but in many many
different languages and indeed the Unicode Consortium is a bunch of people
from a lot of different companies
and countries and cultures whose mission as an organization is to capture
digitally all forms of human language in this case and to ensure that
especially smaller demographics of humans speaking lesser
and ones or if we do out the math suppose you receive a text message that if
you do out the math in decimal is 4,036,991,106 anyone know what emoji
you're looking at this would be weird if you do but what is this Well turns out
that as of this past year this is the most popular Emoji to be sent uh by many
measures face with tears of joy so that is the pattern that a bunch of humans
in the Unicode Consortium decided would represent this but you'll notice
many of you might have iPhones some of
you might have Android devices too and sometimes these don't actually look
quite the same this happens to be the current version of face with tears of
joy on iOS on Android it tends to look a little something more like this and
here's kind of a curiosity even though you and I look at these things and they
look like images they're not images they're characters at least as we've
defined them now in Unicode and iOS and Android and Windows and
Facebook and other companies and apps nowadays
really just have different fonts if you will so just like fonts with English and
other languages can give you different characters with serifs or not emoji
are themselves yes drawings that someone made but they're really just a
font and so that same pattern of zeros and ones might just render slightly
differently on someone's phone or another if you've ever gotten like an icon
on your phone that's broken and you've been sent an emoji but it's like a
square or something arbitrary and not
sensible it might just mean that you have not updated to the latest version of
iOS or Android which just updates the font of supported Emoji because those
folks at Unicode pretty much every year nowadays are adding more and
more emoji to that particular character set now I went down the rabbit
hole figuring out the other day just which are the most popular Emoji these
days on Twitter specifically this past year the most popular Emoji by contrast
uh was loudly crying face I don't know if that says
more about 2021 or about Twitter but you'll see different trends certainly in
how these are used but even humans themselves didn't necessarily think
two steps ahead and now a lot of the Emoji are the sort of default yellow
color but there's a lot of emoji that aren't sort of these cartoon characters
but they're meant to represent humans in various professions or gestures or
the like and nowadays too you've probably noticed on your phone and Macs
and PCs there are different skin tones that you can assign
to certain emojis if it's supported by the company and by Unicode you can
actually like touch and hold on a certain emoji and then you can choose the
appropriate skin tone to represent yourself or someone else and that then
modifies the display well let's just think for a moment here how did Apple
and Google and Microsoft and others go about implementing support for
emoji with different skin tones how could you do this if you want to represent
some smiling Emoji but in five in this case
different skin tones you could come up with what five different patterns that
are identical structurally except for the skin tone used in places in that image
but that's a little inefficient right to kind of just do copy paste paste paste
paste and like change the color in Photoshop if you will that's going to use
more bits more information than you might need to how else if you now start
to think a little bit more like a computer scientist if at the end of the day all
you have are zeros and
ones how else could you implement skin tones might you think yeah okay so
RGB and we'll come to that in just a moment that stands for red green blue
that's one way in this case though I'm seeking an alternative to just using
five different patterns of zeros and ones to represent the same Emoji but
different skin tones so not quite RGB yeah okay so store one copy of the
Emoji and then store different variants of the color that you want to assign to
that emoji yeah so this is actually an
example of do you want to elaborate okay so you can use a loop to actually
output these things more on that in a moment let me go down this this road
for just a moment this would be in some sense sort of a a better design if you
will uh but why yeah filter okay so filter if we think sort of in the Instagram
sense you can sort of change the color of something and that could be
related here too could it be oh interesting so maybe it could be just a
completely different font and you have five different fonts that are
almost identical except for the various interpretations of skin tone for those
same Emoji let me spoil I think if we go down this one particular Road the
way the Unicode folks decided to do this some years ago was that the
first byte or bytes that you receive via text or email just represent like the
structure of the emoji the default yellow version thereof but if it's
immediately followed by a certain pattern of bits that these humans
standardized to represent each of these different shades of skin tone then the
phone the Mac the PC will change that default color yellow in most cases to
whatever the more apt human tone is so you just use twice as many bits but
you don't use five times as many bits so what do I mean you don't have uh
five completely distinct patterns per se uh for each of these possible variants
you have a representation of just the Emoji itself structurally and then uh
reusable patterns for those five skin tones unfortunately that
wasn't quite versatile enough for other
features that were in the pipeline and nowadays too and there's a double
meaning now to representation emojis had tended to focus on certain
professions and early on too were certain professions associated with certain
genders and vice versa and you couldn't necessarily be one gender or
another in a certain profession or another there were these combinatorics
that just weren't possible but nowadays as you might have seen you can
have couples in love for instance that actually look a
little more like three emojis but just kind of combined into one and indeed
this is just one key press on your phone and you can combine different emoji
on the left and in the right with the Emoji in the middle and so it turns out
how computers nowadays represent these patterns are one set of bits for the
character on the left one set of bits for a character on the right one set of
bits for whatever Emoji you want in the middle and then you assemble more
complicated compositions of emoji by just reusing those same patterns and
bits and bits Emoji doesn't the Unicode folks don't have to come up with a
whole new representation for some very specific Incarnation they can create
one for person for male for female for other uh characters that you might
want to display and reuse those same patterns of zeros and ones and so here
you see sort of the imperfection of or lack of foresight of humans for building
a system early on that was entirely American Centric no characters Emoji or
the like that's evolved too and so
that's an important detail in computing nowadays it too has evolved and
the languages you're about to learn in the coming days those too are
evolving as well and new features are getting added and even programming
languages have version numbers you might have a different version of an
app on your phone programming languages too have different versions as
well questions then thus far on how information is represented using ASCII or
Unicode or anything in between yeah so you can use a string of a good
question
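The emoji examples above can be sketched in Python. This assumes the bytes are UTF-8 encoded, which the lecture does not name explicitly, and the specific emoji chosen for the skin-tone and combined examples (a thumbs up, a woman, a heart, a man) are picked here only for illustration.

```python
# Face with tears of joy is one Unicode character, stored as four bytes in UTF-8.
face = "\U0001F602"
face_bytes = face.encode("utf-8")
face_number = int.from_bytes(face_bytes, "big")

# A skin tone is a second character placed immediately after the base emoji.
thumbs_up = "\U0001F44D"
medium_tone = "\U0001F3FD"  # one of the five standardized tone modifiers
toned_thumbs_up = thumbs_up + medium_tone

# Combined emoji reuse existing characters, glued with a zero-width joiner.
ZERO_WIDTH_JOINER = "\u200d"
woman = "\U0001F469"
man = "\U0001F468"
heart = "\u2764\ufe0f"
couple = woman + ZERO_WIDTH_JOINER + heart + ZERO_WIDTH_JOINER + man

print(face_number)                           # 4036991106
print(len(thumbs_up.encode("utf-8")))        # 4
print(len(toned_thumbs_up.encode("utf-8")))  # 8
print([hex(ord(ch)) for ch in couple])
```

The toned thumbs up takes twice as many bytes as the plain one, not five times as many, and the couple is built entirely from patterns that already existed. Whether it is drawn as a single picture depends on the font installed on the device.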
so to recap why can't you just um well let me summarize that as why can't
you similarly use different patterns to change the context of what these
patterns of bits represent whether it's a number or a letter or a graphic in
actuality that's kind of what's Happening underneath the hood it's not
standardized in quite the same way but starting next week when we
transition from scratch to C you'll learn about types data types where the
onus initially is going to be on you the
programmer to tell the program whether or not this pattern of bits should be
interpreted as a number or as a letter or as a color or something else
nowadays though and toward the end of the semester you'll use languages like
Python where the computer just figures it out for you by context which
makes it even easier and faster to program as well other questions on
Unicode ASCII or the like all right well how about just a few other forms of
information RGB was called out earlier red green blue How do
much blue it should show red green blue respectively so for instance if a DOT
on your screen were using these three numbers these three values or bytes
72 73 33 in a text message email that would be interpreted as I claimed hi
but in Photoshop or in some graphical program that same pattern would be
interpreted as let's call it a medium amount of red a medium
amount of green and a little bit of blue and why medium and little turns out
that each of these are bytes the smallest value you can
have in a byte we said is zero the largest value you can have in a byte is 255
so I'm just kind of spitballing here this is like medium medium and a low
amount of red green blue uh specifically those three colors um like uh
wavelengths of light are combined in such a way that you would have this
dot on the screen a sort of murky shade of yellow or brown that is how a
computer would store precisely that color and in fact we've seen this color
when you type in face with tears of joy generally on
your screen it looks like this typically much smaller but let's zoom in or let's
zoom in a little more what are you starting to see if you know the term so
pixels it's getting very pixelated a pixel is just a dot on the screen and if you
really zoom in on it you can literally see all of the dots that compose an emoji
in this case on iOS in the font that Apple's using to represent this particular
pattern of zeros and ones so one of those yellow dots and there's many of
them all that kind of
blend together here each dot on the screen I claim is three bytes how much
red green blue for this dot how much red green blue for this dot how much
red green blue for this Dot and you'll notice too that when it gets to be sort
of brownish here the dots really stand out the three values the three bytes
AKA 24 bits are just slightly different and so underneath the hood this is why
images photographs that you take or GIFs that you download get so darn big
potentially because you have a number
representing every dot on the screen well if this I claim is indeed how images
are typically represented using patterns of bits that are assigned to some
amount of red green or blue how do you get video what is a video if at the
end of the day all we have are zeros and ones what's a video perhaps yeah
oh let's go here way in back yeah pixels really changing values over time
and do you want to confirm or deny the hand that went up here yeah or
equivalently a sequence of images that over time are
changing on the screen so both of those are valid interpretations and you
know just for fun if you uh grew up with these sort of picture books you
might remember a little something like this if we could dim the [Music] lights
[Music] so that's sort of the old school analog way to implement a video in
the sense that um that artist wrote out like hundreds of pieces of paper with
almost identical images but where the ink from their pencil or pen was
slightly moving and if you digitize that such that each
of those Strokes are represented with dots instead that's really what you're
seeing as a sequence of all these images flying across the screen and if we
dive into the real world if you've ever watched a film a Hollywood movie
is typically 24 FPS frames per second that really means you're seeing 24
images per second or on TV or in soap operas it's often 30 frames per second
that makes things look a little more smooth so it's not actual motion picture
if you will it's sequences of pictures and your brain and
mind are kind of interpolating that oh this is smooth movement even though
we're just seeing a lot of pictures really fast now that gets really big and we'll
talk later in the semester how you can compress information so that you're
not using way more bits than you actually need to and there's fancy
algorithms that folks have developed but at the end of the day that's really
all a video might be is a sequence of images conversely if you want to
represent the music that accompanies that or something
else if any of you play an instrument and can read sheet music How Could
You digitize this like how could you represent musical notes in a computer
you and I hear them when we play files but what's really going on
underneath the hood any musicians piano players anyone yeah Val okay
so Hertz value so some frequency so sound is some frequency and it's kind
of hitting your eardrum and that's what makes it sound low or high or
somewhere in between so maybe we could assign just like there's
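Going back to the red, green, blue values and the frames-per-second figures above, here is a rough sketch of the arithmetic. The 1920 by 1080 resolution is an assumption chosen for illustration, the function names are hypothetical, and the sizes are for uncompressed data only.

```python
# The same three bytes that spelled a message, now read as one pixel's color.
pixel = bytes([72, 73, 33])
red, green, blue = pixel

# Written the way colors often appear in graphics tools.
hex_color = "#{:02x}{:02x}{:02x}".format(red, green, blue)

BYTES_PER_PIXEL = 3  # one byte each for red, green and blue


def image_bytes(width, height):
    """Bytes needed for one uncompressed image of the given size."""
    return width * height * BYTES_PER_PIXEL


def video_bytes(width, height, frames_per_second, seconds):
    """Bytes needed for uncompressed video: one image per frame."""
    return image_bytes(width, height) * frames_per_second * seconds


one_frame = image_bytes(1920, 1080)
one_second_of_film = video_bytes(1920, 1080, 24, 1)

print(hex_color)           # #484921
print(one_frame)           # 6220800
print(one_second_of_film)  # 149299200
```

At these assumed dimensions a single second of 24 frames per second video is roughly 149 million bytes before any compression, which is why images and video get so big.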
outputs and thankfully humans have standardized a lot of this they don't
always agree and this is why we have different file formats for Apple
numbers and Microsoft Excel and Google spreadsheets and sort of stupid
incompatibilities like that but generally speaking humans have standardized
how we represent the inputs and outputs to and from problems but let's now
focus on this black box so to speak in the middle this abstraction so
abstraction is technically a term that you'll see all over the place in
computer science and really problem solving that just refers to the
simplification of something so that you don't focus on the lower level
implementation details you really just focus on the high level goals or the
process itself therefore your car if you have a license and
have driven or have been in a car a car so far as you're concerned is
probably an abstraction most of us if you're like me probably don't really
know or care how the engine works and all the parts that
are moving to you it's just a way of getting from point A to point B it's an
abstraction but someone hopefully the mechanic does know those lower
level implementation details if you had to understand how a car works every
time you want to go to school or to the store it's probably going to be a
pretty slow process you just want to think and operate at this higher level of
abstraction and we're going to do this all the time when writing code and
solving problems so what then is in this black box this
some kind of contacts pictured here here and it's alphabetized typically by
first name and last name and odds are you and I are in the habit if they're
not already a favorite of like clicking on search and then using autocomplete
and what happens when you start typing autocomplete well if you type in the
letter H you'll see only presumably Hagrid Harry Hermione and so forth if you
type in ha that shows you only Hagrid and Harry and it all happens super fast
so how is that happening well
typically you could just start at the top and look to the bottom searching for
all the H's or all of the HA's but for larger data sets that's going to get slow for
the Googles of the world that's going to get really slow and even on our
phones when you have hundreds thousands of contacts eventually even that
kind of approach that algorithm step by step might be slow so how
might we go about searching for someone in a phone book like this uh like
say uh John Harvard well here's an old school
incarnation of this and uh odds are you might not have had occasion to even
physically use this thing nowadays and in fact this is a bit of a white lie cuz
this is the Yellow Pages which means this is a book of companies not people
but this is all you can find and at that it's even hard to find this but this
is the same thing in analog form physical form so if I wanted to search for
someone like John Harvard how could I do that well I could start on page one
and I could start searching for page two
page three page four page five little hard to do physically especially since no
one's used this phone book in a lot of years but uh is this algorithm correct
turning page by page very inelegantly is this correct will I find John Harvard
if he's in here all right so yes I mean this is a little stupidly tedious because if
there's like a thousand Pages he might be a few hundred pages into this but
it's correct at some point I will find him and if he's on the page I'll be able to
call
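The page-by-page search just described can be sketched as follows. The tiny phone book and its names are hypothetical stand-ins, and each page is modeled as a short list of names.

```python
def search_page_by_page(pages, name):
    """Look at every page in order; return (page number, pages looked at)."""
    steps = 0
    for page_number, names_on_page in enumerate(pages, start=1):
        steps += 1  # one page turn
        if name in names_on_page:
            return page_number, steps
    # Ran out of pages without finding the name.
    return None, steps


# A hypothetical four-page phone book, alphabetized by first name.
phone_book = [
    ["Draco", "Ginny"],
    ["Hagrid", "Harry"],
    ["Hermione", "John Harvard"],
    ["Luna", "Ron"],
]

print(search_page_by_page(phone_book, "John Harvard"))  # (3, 3)
print(search_page_by_page(phone_book, "Voldemort"))     # (None, 4)
```

The approach is correct because no page is ever skipped, but in the worst case it looks at every page in the book.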
why because presumably the names are alphabetized in here and there's no
like cheat sheet on the edge so I have to search for John Harvard from left to
right searching for H if it's alphabetized by last name well what would be
marginally better well how about two pages at a time it's hard to do with a
20-year-old phone book where the pages are kind of grown together
but 2 4 6 8 10 12 this algorithm is this correct all right so no why yeah so I'm
skipping every other page so if I don't consider that and I
find myself in like the I section or the J section well I might accidentally
conclude nope I haven't found John Harvard yet just because I skipped them
because it was sandwiched between two pages now I can fix this I think if I
do hit the I section well let me just double back one page just in case he was
in that last page so it's recoverable but it's almost twice as fast minus
that hiccup there but what most of us would do and what your phones are
doing albeit digitally is they open up roughly
to the middle of the phone book and they look down and they say oh I'm in
roughly the M section so I'm roughly halfway through this thousand page
phone book but what do I now know about John Harvard where is he to my
left or to my right all right so alphabetically he's presumably to my left and
so here I can both metaphorically and physically tear the
problem in half you don't need to be impressed it's really easy down the
spine that way but I know that John Harvard is to
the left here but now I can throw unnecessarily dramatically half and page
one out of the way and what do I now know I've gone from a thousand pages
to like 500 I can kind of repeat roughly the same algorithm go to the half of
this and so this time I went back a little too far I'm in now the E section
so what do I know is John Harvard to my left or to my right to my right so I
can again tear the problem in half throw this half away and now I'm really
flying I'm doing it verbally slowly but that went from
1000 pages to 500 to now 250 and now I can do it again 125 I'd do it again
roughly like 67 and keep doing it again and again and again until I get left
with hopefully just one single page or in this case an ad for ironically a
mechanic okay so what is the implication for our performance well let's just
do this sort of in the abstract if you will if that first algorithm were to be
plotted just quickly on a chart without even numbers here's my x-axis size of
problem on the x-axis so the bigger the problem the
farther out that way time to solve the problem the higher you go up on the
y-axis the more time you're taking to solve it how would we draw the
running time The amount of time taken to run that first algorithm well it's
going to be a straight line why cuz if you add one more page next year
because more people move to Cambridge you're going to add one more
page turn potentially so one more second one more unit of time so it's a
straight line and we'll abstract it away as n if there's n pages in the
phone book the slope of this line is essentially n the second algorithm
wherein I was doing two pages at a time was twice as fast but it's still a
straight line and in fact let me just draw some dotted lines here if the phone
book is this big with my first algorithm it might take this many steps this
many units of time this many steps this many page turns but with that
second algorithm notice that the intersection is much lower on the
yellow line than on the red so n/2 means there's
half as many pages here if n is the number of pages so indeed that algorithm
the second one is twice as fast minus the little hiccup that I have to double
back one page but that's not a big deal if I'm still doing things twice as fast
but the third algorithm looks fundamentally different it looks like this
logarithms if you recall from high school or prior if you don't that's fine too
it's just a fundamentally different function a different shape and notice that
the green line is going up and up
and up but a much slower rate of increase which means crazy things are
possible if two towns in Massachusetts like Cambridge and Allston across the
river merge next year for instance in terms of their phone book their phone
book just got twice as big for the first algorithm that's going to take me twice
as many steps to go through the second algorithm almost Twi it's going to
take me 50% more steps to go through two at a time but the third algorithm
that I ended with tearing things again and
again dividing and conquering if you will in half and in half and in half how many
more steps will my third algorithm take if Cambridge and Allston merge into a
phone book that's twice as big just one more step right no big deal you just
take a really big bite out of the problem once you decide if John Harvard is to
the left or to the right and so you've made much faster progress and so this
in essence is what your computer your phone is probably doing underneath
the hood when searching for
Harry or Hermione or Hagrid or anyone else because it's that much faster
especially when you have large data if you don't have that many contacts
probably doesn't matter if you search from top to bottom or more in
the form of this divide and conquer algorithm but if you're the Googles of the
world or you're analyzing large data sets indeed this is going to add up quite
quickly so where do we go with this well we're going to introduce next
something called pseudocode how can I
translate what I did verbally there sort of intuitively to actual code well this
won't be Scratch this won't be C or Python just yet it's just going to be an
English-like syntax and this is how many programmers would start solving a
problem they don't start typing out code in C or python or the like they use
English or whatever their human language is to jot down an outline for their
ideas my first step really was picking up the phone book my second step was
opening to the middle of the phone book
my third step was somewhat different look at the page because why my
fourth step was if person I'm looking for is on the page I then do what never
happened in my example but I call the person so I'm done else if the person
is earlier in the book alphabetically as John Harvard was in the case of my H
then I should open to the middle of the left half of the phone book and then I
should go back to step three step three is look at the page thereby repeating
the same process again and again step nine though
might be else if the person is later in the book then let's go ahead and open
to the middle of the right half of the book and then go back to line three else
there's a fourth scenario we should probably consider lest my search process
freeze or crash or give me one of those spinning beach balls with a bug yeah
yeah what if John Harvard isn't in the phone book I'd prefer that my
algorithm my phone not just reboot or freeze I should handle that with some
kind of catchall else so to speak let's
just quit the program so there's well-defined behavior for every possible
scenario of the four now let's call out a few of these Salient terms it turns out
if I highlight in yellow here there's a pattern to what I've been doing here
these are all of my English verbs and in a moment we're going to start
calling those verbs functions when you program or write code and you want
the program or the computer to do something for you some action or verb
we're going to refer to those actions or
verbs as these things called functions like those here by contrast I've just
highlighted instead my if my else if my else if and else this is going to represent
what we're going to start calling a conditional a proverbial fork in the road
where you can either go this way or that way do this thing or this other thing
and you're going to decide which of those things to do based on what I've
now highlighted here which are going to be called Boolean expressions Boole
referring to a mathematician last
named Boole a Boolean expression is just a question with a yes no a true false
a one or a zero answer if you will and it governs whether you do this thing or
this thing or this thing or that the indentation in this case is important the
fact that I've indented line five implies by convention in programming that I
should only do line five if the answer to line four is a yes or true and same for
these other indented lines as well and the last characteristic here is this here
uh someone called this out
earlier in fact these lines eight and 11 are now highlighted and represent
what might we call these in code if you've done that yeah so these are
loops some kind of cycles that result in my doing the same thing again and
again but there's a key detail with this algorithm in pseudocode even though
it's telling me to go back to line three why is this algorithm eventually going to
stop why do I not constantly keep looking for John Harvard forever by nature
of these Loops telling me to keep going back to line
three good eventually he'll be on the page or to your point earlier he won't
be at all and we're out of pages and so we just quit and that's the key about
going to the left half or the right half it doesn't matter if you do the same
thing again and again you're not going to get stuck in a so-called infinite
loop so long as you keep dividing the problem and shrinking it into
something smaller and smaller eventually there's going to be no problem
left to solve so even if you
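The phone book pseudocode described above can be sketched in Python. This is only an illustration under assumptions: the phone book is modeled as an already sorted list of names, and the function name `search_phone_book` and the sample names are made up here, not taken from the lecture. The `if` / `elif` / `else` lines are the conditionals, the comparisons are the Boolean expressions, and the `while` line plays the role of "go back to line three".

```python
def search_phone_book(names, target):
    """Look for target in a sorted list of names, halving the problem each time.

    Returns the index of target, or None if the name is not in the book.
    """
    low, high = 0, len(names) - 1
    while low <= high:                 # loop: keep going while pages remain
        middle = (low + high) // 2     # open to the middle of what is left
        if names[middle] == target:    # Boolean expression: is the person on this page?
            return middle              # found: this is the "call person" step
        elif target < names[middle]:   # person is earlier in the book
            high = middle - 1          # keep only the left half
        else:                          # person is later in the book
            low = middle + 1           # keep only the right half
    return None                        # catchall: out of pages, so quit


book = ["Alice", "Bob", "Carol", "John Harvard", "Zed"]  # hypothetical, already sorted
print(search_phone_book(book, "John Harvard"))  # 3
print(search_phone_book(book, "Nobody"))        # None
```

Because `low` and `high` move toward each other on every pass, the range being searched shrinks each time, which is why the loop cannot run forever.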
bit and we'll take a break in a moment uh to take a breather that you will see
these same ideas in a moment in the context of scratch an actual
programming language via which we drag and drop puzzle pieces to make
actual code work we'll see some variant of these ideas things called
arguments and return values and variables but we'll ultimately convert it into
this somehow anyone want to wager what this program will do if fed to your
Mac or PC or phone here's just a massive pattern of
zeros and ones it will indeed say rather disappointingly apparently just hello
world and indeed baked into all of these zeros and ones are not just the h e l l
o but also the verbs the action of printing something to the screen and
there's other stuff too so that the program knows how to start and how to
stop a lot of stuff that we won't have to worry about that whoever designed
the computer or the language did but at the end of the day you're never
going to be writing these zeros and ones yourselves
though our ancestors once upon a time did in some form we'll be using a
much higher-level language like this in C or better yet in just a moment like
in scratch like this and indeed this is why today we focus on and
begin with scratch this graphical programming language so we have a way of
expressing ourselves with functions conditionals loops and more but in a way
that doesn't have stupid parentheses and curly braces and all these visual
distractions in the way and we'll translate that thereafter
to this lower level language but for now that was a lot that was definitely a
fire hose let's go ahead and take a 10-minute break feel free to get up or
stay here and we'll resume in a bit with some actual code uh so this then is
scratch a graphical programming language from our friends down the road at
MIT's Media Lab that indeed some of you might have used in grade school or the
like for playing and writing code and the like but you maybe didn't
necessarily think about how some of these Primitives ultimately worked
and in fact everything you've done if you've used scratch before and
everything you'll see today is going to apply to all of the weeks to come as
we explore these things called functions and loops and conditionals Boolean
expressions and more with scratch because it's so graphical and animated
can you create animations like this one interactive art and
software more generally but you'll do so by dragging and dropping puzzle
pieces that only lock together if it makes
logical sense to do so and what you won't have to deal with in this first week
of class is curly braces parentheses all of the weird symbology that you
might recall seeing when we just wanted to say hello world now this
particular program um Raining Men was written by a former cs50 teaching
fellow Andrew Berry who's actually now the general manager of the Cleveland
Browns the American football team and so these are just some of the
programs that some of your predecessors in the class have created and
you'll see in the remainder
of class here a couple of others as well and more in the course's first
assignment namely problem set zero so how do we get there well first a
quick tour of what it is we're going to do this in scratch is perhaps the
simplest program you can write and even if you've never seen scratch or any
programming language before you can probably guess that this just says on the
screen somehow hello world but what you don't have to do is type esoteric
commands and weird syntax those curly braces and
parentheses I keep alluding to you just drag this yellow puzzle piece you
drag this purple puzzle piece let them magnetically lock together so to speak
click a button and boom with those same building blocks and several others
can you make well exactly the sorts of things that Andrew brought to life as
well so here's what we're about to see at [Link] is a cloud-based
programming environment on MIT servers you can also download it offline on
your own Mac or PC and it gives you an
interface like this on the left hand side of the screen you'll see a blocks palette
these puzzle pieces AKA blocks come in different colors which rather
categorize them so pictured here for instance in blue are a whole bunch of
motion related blocks so Andrew used a whole bunch of those to have the
singer and the men moving around on the screen um in synchronicity with
the song that was playing in the background meanwhile in the middle of this
interface is going to be the code area and this is where
Andrew and soon you will drag and drop some of those puzzle pieces and other
colors as well and lock them together to get your character soon to be
invented to do something on the screen indeed at the bottom right here will
you see ultimately a Sprite area where a Sprite is a technical term for like a
character in a video game or a programming environment like this by default
historically scratch uh is the cat the mascot if you will for this programming
environment and so here we see by default just one Sprite selected because
on the top right of the screen is the stage for that Sprite and you can click
and zoom in to make it full screen but this is the world in which Scratch
by default the cat will live but you can change scratch's costume so that it
looks like a singer or a man falling from the sky or the like or anything else
either creating the art yourself or importing some of the things that come
with it or elsewhere online so what is this world that scratch rather lives in
well generally speaking we won't have to
care too much about numbers because we'll be able to ask questions like
interactive ones like is scratch the cat or any character otherwise touching
the edge of the screen touching something else but scratch does exist in this
two-dimensional uh coordinate system world so when the cat or any
character is dead center in the middle that would be XY location 0 comma 0
if you will meanwhile over here is 240 pixels or dots all the way to the right
so this would be 240 comma 0 where y is zero because it's right on that midline
so
it's neither above nor below over here to the left of course would be negative 240 and 0
above the cat would be x equals 0 cuz it's right on that vertical midline and 180
and then down here as you might guess would be 0 comma negative 180
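The stage coordinates just described can be written down as a small Python sketch. The bounds (x from -240 to 240, y from -180 to 180) come from the lecture; the helper `touching_edge` and the `landmarks` dictionary are hypothetical names for illustration, and the helper treats a character as a single point, which is a simplification of whatever Scratch's own edge sensing does.

```python
# Stage bounds as described above: x runs from -240 to 240, y from -180 to 180.
X_MAX, Y_MAX = 240, 180


def touching_edge(x, y):
    """Return True if the point (x, y) lies on or beyond the stage boundary."""
    return abs(x) >= X_MAX or abs(y) >= Y_MAX


# The five positions mentioned in the lecture, as (x, y) pairs.
landmarks = {
    "center": (0, 0),
    "right edge": (240, 0),
    "left edge": (-240, 0),
    "top edge": (0, 180),
    "bottom edge": (0, -180),
}

for name, (x, y) in landmarks.items():
    print(f"{name}: ({x}, {y}) touching edge? {touching_edge(x, y)}")
```

Only the center prints `False`; the other four points sit exactly on the boundary.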
generally speaking we don't have to care about those precise pixel
coordinates but it's helpful ultimately if you do want the cat to move up
down left or right having some sense of direction according to the x-axis and
y-axis as well can help you express your ideas
ultimately so what might some of those ideas be well let's do this I'm going
to go ahead and create on [Link] just an empty screen like this one
here and so this is the exact same interface but now I'm in my browser uh
full screen so that I can start writing some code and let's get that cat to say
something actually on the screen now this takes a little bit of practice but
honestly just by scrolling through these puzzle pieces can you quickly get a
sense of what's possible not just categorically but
specifically and I'll jump around because I've done this of course but I'm
going to go to events in yellow first and I'm going to drag and drop this first
block called when green flag clicked and I've zoomed in there just to make it
a little more legible and notice that the shape of this green flag just so
happens to mirror this green flag here at top next to this red stop sign of
sorts and the green flag is going to mean go and the red stop sign's going to
mean stop to start or stop our program
something curious and different about this purple block it says of course say
in purple but then there's this white oval and some text that by default is
hello cuz MIT just decided that by default the placeholder will be hello but
anytime you see this white oval it's an opportunity to provide an input into
the function called say and so here I'm borrowing terminology from before
problem solving again is all about inputs producing outputs and in between
there is some algorithm in a moment
human clicked on that green flag I triggered what we're going to start calling
now an event an event is generally something graphical or interactive that
just happens in a computer program you and I trigger events on our phones
all day long whenever you tap or drag or long press or pinch or any of those
gestures in Vogue nowadays on phones you are triggering events and people
at Apple and Google and elsewhere have written code that listen for those
events and do something when that event happens that's
what I just did when green flag is clicked I want something to happen namely
I want this purple function this verb this action called say to do something
what do I want it to do I want it to say what this input is and I'm going to
introduce another vocabulary term the white ovals here are yes inputs very
generically but in the programmer's terminology they're called arguments
otherwise known as parameters and that just means an input to a function
that modifies Its Behavior in some way when I
click stop that's just another event and that one is just built into scratch
scratch knows that when you click the red stop sign uh everything should
just stop automatically I don't have to write code to support that feature so
that's all fine and good hello world but if I keep doing stop and start and stop
and start it's going to do the same thing again and again and it's really not
that interesting at the end of the day maybe gratifying once but it'd be nice if
this were a little more
interactive so it turns out that we can do that too but we need a different
mental model instead so in this case here when we think about this function
say and this input hello world this actually maps pretty cleanly to this model
earlier that I proposed is problem solving is computer science if you will the
input to the current problem is going to be in white here hello world the
algorithm is the say algorithm now I don't know how MIT got it to print out
the little pretty speech bubble on the
screen but they wrote those underlying low-level implementation details and
they gave me and you a purple function called say that just does that for you
you and I don't have to reinvent that wheel the output of say is another
technical term now called a side effect a side effect is usually something
visual that happens like as a side effect of you calling a function and so the
side effect here is that the cat has this speech bubble magically appear
inside of which is hello world so we have an input we have an output we
have
an algorithm but now we're talking about these ideas in the context of
programming so now the input is an argument the algorithm is a function
and the output in this case is a side effect terminology that you'll just hear
more and more and it'll eventually sink in but not to worry if the terminology
doesn't come naturally early on so what more might I do with this let me go
back to scratch here and make this maybe perhaps more interactive and
actually get the cat to say something a little
more dynamically so instead of hello world why don't I get it to say hello to
me or to you or anyone else so let me do this let me go under say uh let me
get rid of this first and you'll notice this neat trick as soon as you start
dragging on a block if it gets close to it it kind of goes gray and it can be
magnetically snapped together you don't have to do it very precisely
conversely if I want to get rid of a puzzle piece I just drag it anywhere on the
left let go and that deletes it or you can right
click or control click and a little menu will let you delete it as well well let me
do this instead under sensing which I know is there because I've done this
before are a whole bunch of things related to sensing whereby the cat can
kind of feel out its World in some sense it can do things like ask this question
am I touching the mouse pointer like the user's cursor am I touching a
specific color that you can override to be anything you want is the distance
to the mouse pointer some specific value but
for now I'm going to focus on this blue puzzle piece that asks a
question which itself is this white oval that I can apparently change and then
it's going to wait for a response but this puzzle piece is a little different it's a
little special it comes with a freebie it comes with what we're going to call
technically a return value so some functions don't just do something on the
screen they hand you back so to speak a value that you can do anything that
you want with nothing happens
immediately unless you do something with that so-called return value so let
me go ahead and drag this thing over here ask what's your name and I'll use
the default question that seems a reasonable place to start I'm not going to
override that default and now let me go ahead and zoom out let me go back
to looks let me go to say and let me just form the English sentence I want so
let me zoom in here and type in hello maybe comma space I could do David
but that's that's obviously not right because I'm asking
for a name and then I'm like in advance hardcoding my name that's not what
I want I just want hello comma and now let me zoom out and grab one more
say block let me maybe say here okay I don't want to say hello hello I don't
want to just type in my own name cuz again then what's the point of asking
the user for their name but notice this if I go back to the sensing block this is
where that oval that's blue called answer is useful this will be the so-called
return value of that function so I'm just going to go
ahead and do this and drag and drop even though it's not the right size it is
the right shape and so scratch will be smart about it and grow to fill that
puzzle piece for you let me zoom out now and now let me click the green flag
you'll see that scratch is indeed prompting me with the speech bubble what's
your name notice the little text box below the cat is asking what's your name
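The vocabulary so far (argument, side effect, return value) can be sketched in Python as a rough analogue of the say and ask blocks. The function names `say` and `ask` mirror the Scratch blocks but are defined here only for illustration; `print` and `input` are Python built-ins. The `read` parameter is an assumption added so the sketch runs without a keyboard; leave it out to be prompted for real.

```python
def say(message):
    """Side effect: display the message. Nothing useful is handed back."""
    print(message)


def ask(question, read=input):
    """Show a question, wait for a reply, and hand the reply back as a return value.

    read defaults to Python's built-in input(); it is a parameter only so the
    sketch can be run with a canned reply instead of real typing.
    """
    return read(question + " ")


# A canned reply stands in for the user typing their name.
answer = ask("What's your name?", read=lambda prompt: "David")

say("hello,")   # argument: the text to display
say(answer)     # the return value of ask becomes the argument to say
```

Nothing visible happens with `answer` until it is passed to another function, which matches the point that a return value does nothing on its own.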
so I'm going to type in D-A-V-I-D and hit enter or I can click the blue check
enter okay it's a little weird I wanted
him to say hello not just my name so let me stop let me start it again all right
hello what's your name D-A-V-I-D enter huh kind of rude uh why is there this
bug like I wanted it to say hello David not just David and yet twice it has
failed to do so uh yeah yeah the computer is processing my directions my
actions really quickly and so it actually is doing it it's just you and I in the
room are just way too slow to notice that it said hello David it just seems to
have just said David so all right how can I fix this well here's
where you start to poke around and think about how you might solve this let
me go back under looks maybe there's a smarter way to do this maybe I
could do okay I could do this how about instead of just say hello there's
apparently another puzzle piece where I can time it so I can maybe slow
things down a little bit so let me do this let me throw away all of this let me
drag a say hello for two seconds let me drag another say hello for two
seconds let me change the first one to indeed hello comma and then let
me go back to sensing let me grab that same answer because I threw it away
a second ago and I'll just change it I don't even have to delete hello I can just
overwrite it like this so now I think we'll kind of pump the brakes and see
things more slowly let me stop let me start D-A-V-I-D enter hello David okay so
it's better like it seems to be working I think your hypothesis was right just
looks kind of stupid right like the fact that it's saying hello David like we can
do better and
adding subtracting and so forth you can Generate random numbers which
might be useful and if I keep scrolling down there's this join apple and
banana but that's just placeholder text you can join one piece of text with
another piece of text by default apple and banana but let's change it to hello
and my name so this too is the wrong size but right shape so let me let it snap
into place let me go ahead now and do hello comma and now I think I just
want to go grab that answer return value let me
drag the same oval as before clobber that is overwrite banana so now I'm
kind of composing functions the output of one function join is going to be the
input of another function say so let's see what happens now that they're kind
of stacked on top of each other or nested so to speak click the green
flag D-A-V-I-D enter hello David all right that was pretty fast let's just
do it once more stop start here we go D-A-V-I-D enter okay right it's not the
most exciting program in the world but
it's more correct it's better design just because that's what you would kind of
expect the software to do and not be some kind of lame user interface that's
just inserting random delays to just make it kind of work like that's a
workaround a hack if you will but there's some cool things you can do with
scratch and we won't really go down the rabbit hole of all of the fun and
family-friendly features that it has but there is one that's kind of cool here let
me go into the extensions button at
the bottom left of my screen and this one's kind of cool let me go to text to
speech and you'll notice that this one requires internet because it's cloud-based
but this just gave me some new puzzle pieces in a new category text
to speech and these green ones do exactly what they say so let me do this
let me zoom out again let me keep the join block and I'm just going to
temporarily toss it over here it's not going to delete itself cuz I didn't drag it
over to the other side but I'm going
to get rid of the say block in purple I'm going to do the speak block here in
green and let it snap into place and then I'm going to drag and drop this onto
the input to speak and now perhaps a little more adorably let's try this green
flag what's your name D-A-V-I-D enter and hello David okay it's a little it's a little
robotic but at least now it has synthesized speech and I've kind of got my
own like Siri or Google assistant or Alexa thing going on here now where it's
now recognized whatever text it is and it's played it well let's make this an
actual cat that doesn't talk in that weird human voice let me go ahead and
get rid of most of this stuff and let's get the cat to actually meow like a cat
tends to and let me go under the sounds block now MIT gives you a few
sounds for free because it's designed around a cat by default and I'm going
to go ahead and grab this one play sound meow until done and now and we
heard a teaser for
this earlier in the crowd it's a little piercing admittedly we can lower the
volume a little bit there but notice if I want the cat to meow a second time I'll
just click it again okay and over there too I hear okay all right so it's kind of
cute now right so it's just meow okay yes echo echo so it's meowing now
every time I hit the green flag now that's great but even a kid is probably
going to prefer that it just meow perhaps like again and again
without having to
keep hitting the button so well how might we do this all right well if I want it
to meow multiple times why don't I just like grab it another time and another
time alternatively you can right click or control click a puzzle piece and just
duplicate it from a little menu that drops down so here we go three meows
all right that's not really a happy cat it sounds maybe hungry so can we slow
that down well maybe in fact if I poke around let me go under control looks
like there's a wait block wait 1
and it did so I'd wager this is correct but it's not the best design and this is where
things get more subjective right like you could write accurate sentences in
an essay for an English class but otherwise just it's just completely a mess
like your arguments here and there and you don't say anything wrong but
you don't say it well in the context of code we can do better than this and
copy paste or repeating yourself again and again tends to be bad practice
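As a point of comparison, the copy-pasted version of the three meows might look like this in Python. `play_sound_meow` is a hypothetical stand-in for Scratch's "play sound meow until done" block (it only prints), and `time.sleep(1)` stands in for the "wait 1 seconds" block.

```python
import time


def play_sound_meow():
    """Hypothetical stand-in for Scratch's 'play sound meow until done' block."""
    print("meow")


# Correct, but the same two steps are pasted three times,
# so the 1-second delay is written out in three separate places.
play_sound_meow()
time.sleep(1)
play_sound_meow()
time.sleep(1)
play_sound_meow()
time.sleep(1)
```

The program behaves as intended; the concern is only that each repeated pair of lines has to be kept in sync by hand.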
why suppose that you want to change the
wait to two seconds instead of one it's admittedly not a big deal fine I click there I
change it to two I click there I change it to two but what if you meow five times
10 times now I have to change the wait like in five 10 different places like
that's just stupid it's taking unnecessary human time and you're going to
screw up eventually especially if your program is getting longer you're going
to miss one of the inputs you're going to leave the number wrong and
that's a bug so just based on
what you've seen already or if you've programmed before which a few of you
have what's the term of art here that will solve this how can we design this
better I heard it here here yeah so a loop a loop some kind of cycle that says
do that again do that again not infinitely many times necessarily but some
finite number well you can perhaps see a spoiler on the screen under the
same uh orange control category is a repeat block and by default it's
proposing 10 but we can change that so
let me do this I'm going to throw away most of this copy paste as redundant
I'm going to detach this temporarily just to make room for something else
and I'm going to drag a repeat block over here and let that snap into place
and I'm going to change it for now just to be three for consistency and this is
the correct shape even though it's too small but scratch will accommodate
that for us and now same uh same output but arguably better designed why
because if I want to change the number of meows I change it
in one place no copy paste messiness if I want to change the waiting one
place I don't have to change it in multiple places and not screw up so let me
hit the green flag all right so nice now it would have been nice if MIT had just
given us a meow block that just automates all of this for us let me wager
they gave us the low-level implementation details they gave us the play
sound meow but I had to implement like a decent number of blocks just to
get a cat to meow again and again I feel like we should have
gotten that for free from MIT well they don't have to be the only ones that
invent blocks for us to use you can write your own functions your own verbs
or actions so how can we do this let's make our own puzzle piece called meow
that uses this code but creates it in such a way that it's reusable elsewhere
so let me do this under my blocks in pink here I'm going to go ahead and
click literally make a block now here's an interface via which I can give the
block a name meow will be the name of
this block and I'm just going to go ahead and quickly click okay that just
gives me a very generic pink puzzle piece that starts with the word Define
because scratch is asking me to Define that is implement or create this new
puzzle piece for me well what does it mean to meow I'm going to claim that
it means to do these two steps to play the sound meow and then just wait for
one second but what's powerful about this idea is look at this up top now
that I've made a block it exists in scratch
MIT didn't need to create this for me I created it for myself and even you if
we end up sharing code so I can now drag meow up in here and what's nice
about meow is that itself is yes a function but it's also an abstraction like
never again do I or even you need to worry or care about what it means to
meow or implement it I can sort of drag it out of the way I didn't delete it
drag it out of the way out of sight out of mind why because my code is now
even better designed in some sense because it's more
readable what is it doing when the green flag is clicked repeat three times
meow it just says what it means and so it's a lot easier to read it and it's a lot
easier to think about it especially if you're using Meow in other uh projects
too now let me go ahead and hit play same thing so it's not really
fundamentally any different but I can make this custom puzzle piece this own
function of mine meow even more powerful let me kind of rewind a bit and go to
my meow puzzle piece and I am going to
control click or right click on my pink puzzle piece and I'm going to edit it so I
kind of regret making meow so simple wouldn't it be nice if meow took an
input AKA an argument that tells meow how many times to meow then I can
get rid of that Loop and just tell meow how many meows I actually want so
I'm going to click on another button here called literally add an input and it's
going to have placeholder here so I'm just going to put a placeholder there I
keep using n for number which is a go-to in
computer scientist terms um and I'm going to add some descriptive text just
so that it's a little more self-explanatory I'm just going to say meow n times
but there's only one oval times is just going to be explanatory text and now
notice what has happened now my puzzle piece takes an input AKA an
argument that will tell that function to meow some number of times but it's
not just going to work magically I need to implement that lower
level detail so let me zoom out I have
to remind myself what this function was so I'm going to drag it higher up just
so they're on the screen at the same time I'm going to go ahead now and
temporarily move this over here I'm going to temporarily detach this over
here why because what I think I want to do is move my Loop into the
function itself move the play and the wait into the loop but I don't want to
hardcode three notice that n here is its own oval I can drag a copy of N and
just let it go there so now I have a new version of
meow that takes an argument n that tells meow how many times to meow
and now let me again drag this out of sight out of mind because who cares
how I implemented it once it's implemented it's sort of done now my
program is even better designed in some sense why because now it really
just says what it means there's no loop there's no repeat no like
implementation details when green flag clicked meow three times
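The refactoring just described, with the loop moved inside a custom block that takes n as its input, can be sketched in Python roughly as follows. As before, `play_sound_meow` is a hypothetical stand-in that only prints, and the `wait_seconds` parameter is an addition for illustration so the delay lives in exactly one place; the Scratch version in the lecture takes only n.

```python
import time


def play_sound_meow():
    """Hypothetical stand-in for Scratch's 'play sound meow until done' block."""
    print("meow")


def meow(n, wait_seconds=1):
    """Meow n times, pausing after each meow; the loop lives inside the function."""
    for _ in range(n):              # like Scratch's "repeat n" block
        play_sound_meow()
        time.sleep(wait_seconds)


# "when green flag clicked": the main program now just says what it means.
meow(3)
```

Changing the number of meows or the delay is now a one-place edit, and a caller only needs to know the name `meow`, not how it is implemented.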
functions indeed let you implement algorithms like they're just code that
does something for you but they're also themselves abstractions why because
once a function exists it has a name and you can think about it in those terms
and you can use it by its name you don't have to care or remember how the
function itself was built whether it's by you or even MIT so again here I'll click
the green flag it's the same thing so still correct but better and better
designed and so anytime from here on out with scratch or soon C and eventually
python when you find yourself
doing anything resembling copy paste or again and again grabbing the same
code probably an opportunity to say wait a minute let me refactor this so to
speak that is rip out the code that seems to be repeated again and again and
put it in its own function so you can give it a descriptive name and use and
reuse it any questions just yet on now meowing or these loops or these
functions that we're using yeah how did I make it so it meows three
times so I originally only had a puzzle piece called meow and I decided to
improve it so I held down control and I right clicked or control clicked on the
pink puzzle piece at top left and I clicked edit and that brought back the
original interface that lets me add some arguments to the puzzle piece itself
and I clicked add an input on the left here and then I clicked on add a label
over here so that just lets you customize it even further all right so we've
done this let's add one of those other Primitives too to do something
optionally so how about we make the cat
meow only if it's being petted by a human as by moving the mouse to hover
over the cat like a human would pet a cat well let me go ahead and throw uh
away the meowing uh for now and let me simplify it by just using a sound I'm
going to go ahead and do this I'm going to go ahead and have a control block
that says if because I want to implement the idea of if the cursor is touching
the cat then play sound meow or I could use my same pink puzzle piece but
I'm going to throw that away and focus only now on
the sounds and I'm going to do this uh if uh touching Mouse pointer so I need
to sense something about the world and we saw this earlier so if touching
Mouse pointer so notice this shape here way too big but it is the right shape
so if I hover just right it'll snap into place and this now in blue is my Boolean
expression a yes no question true false uh if is a conditional and what do I
want to do well if the Cat is being uh is touching the mouse pointer I want to
go ahead and play sound meow until done
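The conditional just assembled can be sketched in Python as a function that runs once. The name `on_green_flag` and its parameter are made up for illustration; the Boolean value passed in stands for the answer to "touching mouse pointer?" at the moment the program starts, and returning the string "meow" stands in for playing the sound.

```python
def on_green_flag(touching_mouse_pointer):
    """Run once when the program starts; meow only if the condition is true."""
    if touching_mouse_pointer:      # Boolean expression: a yes/no answer
        return "meow"               # stands in for "play sound meow until done"
    return None                     # condition was false, so nothing happens


print(on_green_flag(True))   # meow
print(on_green_flag(False))  # None
```

Note that the question is asked exactly once per call, at whatever moment the function happens to run.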
so let's do this I'm going to hit green flag click now nothing's happened yet
because it's a conditional right it's only supposed to do something if I'm
touching the cat let me move the cursor over to the cat and and and wait for
it h another bug why is the cat not meowing even though I very explicitly
said if touching mouse pointer meow yeah in the middle yeah this is again
my computer's so darn fast like yours I click the green flag it asked the
question am I touching the mouse pointer well no cuz
my cursor was up there not touching the cat it's too late the cat's out of the
bag and so we have to instead solve this by some other
means how can we fix this how do we fix that sort of race yeah yeah so why
don't we just keep asking the question until I eventually am actually
petting the cat so let me detach this temporarily let me go under control let
me go under instead of repeat some finite number of times let's just do it
forever so sometimes loops that do work
forever are a good thing like the clock on your phone that's in a loop forever
because you want it to always tell time and not stop at the end of the day
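The fix described above, asking the question again and again instead of once, can be sketched in Python. This is a simulation under assumptions: `meows_while_running` is a hypothetical name, and the stream of True/False readings stands in for Scratch repeatedly sensing whether the cursor is touching the cat. With an endless stream of readings the loop really does run forever; `itertools.islice` is used here only to stop the demonstration.

```python
import itertools


def meows_while_running(sensor_readings):
    """Check the condition on every pass of the loop rather than only once.

    sensor_readings is any iterable of True/False values, one per pass.
    Yields "meow" each time the condition holds.
    """
    for touching in sensor_readings:   # "forever": repeat as long as readings keep coming
        if touching:                   # the same conditional, now asked repeatedly
            yield "meow"


# Hypothetical readings: the cursor starts away from the cat, then reaches it twice.
readings = [False, False, True, False, True]
print(list(meows_while_running(readings)))   # ['meow', 'meow']

# An endless stream of readings; take only the first three meows so the demo ends.
endless = itertools.repeat(True)
print(list(itertools.islice(meows_while_running(endless), 3)))
```

The first two `False` readings correspond to the moment right after the green flag is clicked, which is exactly when the one-shot version gave up.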
sometimes you do want code to Loop forever as in this case so let me go
ahead and drag and drop it there let me again click the green flag nothing's
happening yet but notice the program's still running and so if if I move my
cursor move my cursor move my cursor and okay so maybe we could add
some waiting but the cat does not want to be
pet in this case but it's indeed conditional so there we have an incarnation in
scratch of doing something conditionally now we can make this really cool
really fast if you will let me stop this version let me go ahead and do this uh
let me go ahead and throw all of this away let me go into my little uh
extensions Bucket over here and let me do video sensing since most uh
laptops or phones these days have cameras and there indeed I am with
Sanders behind me and let me do this um when video motion and let me get
out of
the way when video motion is greater than some value so 10 is the default
this is just a number that measures how much motion there is or isn't so
small number is like no motion big number is lots of motion so I'm going to
choose 50 somewhat arbitrarily here so 50 this is not normal to program off
to the side but I'm now going to say this when video motion is greater than 50 go ahead
and play sound meow like this so the cat is still in that world I'm going to
stop the program and rerun it so here we go green flag
and now here come all right this is a little creepy the way I'm petting the cat
but and ah okay there we go okay so 50 was too big of a number I have to
pet the cat faster whereas this if I don't know yeah so okay so you can make
things even more interactive in this way by just assembling different puzzle
pieces and honestly there are so many different puzzle pieces in here we're
not going to even scratch the surface of a lot of them but they generally just
do what they say and indeed when you see on the
screen here um this palette of puzzle pieces really a lot of programming
especially early on when learning a language is just trying different things
and try and fail and if it doesn't work quite right
look for an alternative solution there too as even I just had to do a moment
ago well let's go ahead and use actually how about another example of
something a predecessor of yours made let me go ahead and grab a
program I opened in advance here uh called wacka um might we
get a brave volunteer to come up who is willing to whack a mole with their
head virtually maybe okay let's see how about in way back you want to come
on down all right come on down and in just a sure Round of Applause for our
volunteer all right so here we have come on down there what's your name
I'm Josh oh actually say it into the microphone hi I'm Josh okay nice welcome
Josh come on over all right so same idea here I'll take the mic back you can
you'll have to stand in front of the camera in just a
moment you're going to have to position your head in a box that your
classmate from yester year created and we'll start with beginner okay so line
your head up in the Box in a moment all right all [Music] [Applause] right
[Music] nice 12 seconds 5 Seconds notice the score is up to 18 already pretty
good all right a round of applause for Josh if you can so notice how using
some fairly simple Primitives things do get interesting pretty fast and how
was that implemented well there were probably at
least four Sprites so you're not confined to just one cat you can create
more and more Sprites change what they look like so they actually look like a
mole in this case there's probably some conditionals in there Some Loops for
30 seconds that's checking if Josh's head's movement is exceeding some
value over this way or over this way then increment something called a
variable we'll see those too just like in algebra you might have X and Y and Z
storing values like numbers so can computer programs have
examples we just did might also be implemented let me go ahead now and
click the green flag so some trash is moving presumably in some kind of loop
from the Top If I'm touching the mouse cursor it follows me if I hover over the
trash can it responds if I let go in some kind of loop Oscar pops out creates a
variable with the current score and it happens again pretty easy at first but I
don't need to keep playing this up on stage in front of everyone so my score
is already now up to some six or
so but in a moment too you'll see that it's going to escalate so I'm taking
into account some time apparently so now so more and more Sprites are
suddenly appearing and notice that each time they're appearing from a
different part of the screen that's an illusion perhaps do that pick a random
number between X and Y so you can actually pick some range of values to
have the game constantly changing and indeed I'm going to go ahead and
click stop since I spent like 8 hours plus years ago making
this and I can never listen to the song again not that I should be anyway at
this point in my life but this song is uh synchronized in with a lot of the
actions that are happening and ultimately there's just a lot of building blocks
but I didn't sit down and Implement Oscar time as I called it all at once I
really did take baby steps so to speak and I figured out well how could I
decompose this Vision I had at the time to create this game ultimately and
how do I bite off maybe the easiest Parts
first and honestly the first thing I did was I found this image and I just like
dragged and dropped it into scratch okay done like lamp post is installed it
doesn't do anything it's not interactive but I at least set the stage so to
speak for the program then what else might I have done well let me do this
let me go ahead and open up uh in another editor here an early incarnation of
Oscar Time by doing this let me go into Oscar time here let me full screen
this and here you have let me hide the trash for just
that code well maybe the first version would have been something like this
where by my very first version of Oscar time might have said something like
oh this how about let me control the program as before or rather events
when the green flag is clicked what do I want to do well I want to go ahead
and forever do something like this uh forever so I want the lid to open up if I
touch it so if the cursor gets near the lid I want the lid to open up and then if
I move away I want it to close so how can I do that I want an if but I
just don't want one question I really want two a fork in the road that goes left
or right so to speak and let me grab this puzzle piece here as I did long ago
so notice it grows to fill what's the question I want to ask well under sensing
I'm going to go ahead here and say if this trash can is touching the mouse
pointer what do I want to do well I want to change what the trash can looks
like and this part I did in advance of class if you go up here to costumes this
is where all the graphical
stuff happens and you'll see that I imported a whole bunch of different
costumes that effectively much like a video when you play them quickly
creates the illusion of movement some animation but it's really just dot dot
dot dot dot different images showing on the screen well some of these
costumes are called like Oscar 1 Oscar 2 Oscar 1 is closed Oscar 2 is open so
let's just deal with those first so if I'm touching the mouse pointer let me go
under how about looks and we didn't use this before but
there's this block switch costume to something else I'm going to drag and
drop this inside of the if and notice it's a little bit indented I'm going to
change it not to Oscar 8 but Oscar 2 otherwise if not touching the mouse
pointer this is the other direction in the fork in the road let's go ahead and
switch the costume back to what I described as Oscar 1 so let me run this
program and not much of interest is happening yet but notice if I move the
cursor up down but how is that working
it's just changing the costume that's being overlaid on the Sprite so it looks
like interactivity but you're really just changing the Aesthetics and we
humans are just kind of you know assuming oh it's opening up well no it's
just changing a costume so here's the difference the high level abstraction
trash can opening the lower level implementation detail costume changing
creating that illusion and if I wanted to look prettier I could just have many
other costumes and go
boom boom boom boom boom to create more frames per second if you will
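That fork in the road, and the idea that the lid opening is really just a costume swap, can be sketched in Python. This is a rough sketch rather than Scratch's actual implementation: the function name `costume_for` and the list of per-frame sensor readings are hypothetical, while the costume names Oscar 1 (closed) and Oscar 2 (open) come from the walkthrough above.

```python
def costume_for(touching_mouse_pointer):
    """Pick the costume for one pass through the forever loop."""
    if touching_mouse_pointer:
        return "Oscar 2"  # lid open
    else:
        return "Oscar 1"  # lid closed


# Hypothetical sensor readings, one per pass through the loop.
readings = [False, True, True, False]

for touching in readings:
    print(costume_for(touching))
```

Nothing in the sketch opens anything. It only chooses which image to show, which is the lower-level detail behind the higher-level idea of a trash can opening.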
so I need to do um one other thing maybe if I accidentally leave the trash
can open let me make one change here let me make sure that the very first
thing I do when the green flag is clicked is always start with the trash can
closed because otherwise you might accidentally leave it open so this gets
me into some default state so now it's always closed until I manually hover
over it instead well what might I have done next well if
I wanted to introduce something like the trash I need a second Sprite and
here in advance I grabbed the image already let me pretend that this never
happened let me drag this away here and now I have nothing in my code
area for this piece of trash but it is a second Sprite and all I did was I clicked
on the little cat plus icon here created a second Sprite I named it trash I
added a costume for it sort of the aesthetic stuff I did in advance but here
I'll do now the code how do I want to do this
well how about when the green flag is clicked for the trash can I want the
trash can in parallel to do or I want the trash the piece of trash to do its own
thing so what I wanted to do is maybe let's do motion how about and let's go
to a specific coordinate now there's a lot of options here there's turning go to
a random position go to x comma y Glide more elegantly there's a lot of
different ways to implement movement I just wanted to go to a very specific
location first so I'm just
going to go to x comma y first and I'm going to say x how about will be um
uh let's not hardcode this let's just have it be well let's do it at zero initially
and then 240 so whoops let's do 0 comma 240 so that this piece of trash
always starts at the top middle of the screen if you think back to that
coordinate system 0 0 is in the middle 240 is straight above it all right now
after I do that what do I want to do well how about I control this thing by
forever falling now how do I make the trash move
we haven't seen this puzzle piece yet but under motion the very first thing is
called move some number of steps by default it's 10 but we'll do it more
simply let me go ahead and move uh oh sorry move is going to move it uh in
whatever Direction it's facing I only want it to move down so here even I'm
getting confused as to how many different ways there are to do things what I
think I want to do is this let me only change my y axis as follows so here's
another puzzle piece called
stop this let me go under operators and let's pick our random number so let
me change the hardcoded the manually inputted zero and let's make X be
somewhere between zero so in the middle and all the way over to uh what
was it one oh I got my numbers wrong 240 and my y will be 180 sorry I got
my X and my my y confused so let me play this again and now we have a
game that's more like games you might have played growing up or even now
like there's some Randomness to it so the CPU so to speak is doing
something more interesting let me run it again now it's a little to the left let
me run it again now it's a little more to the left again now it's back to the
right so Randomness just makes games more interesting and this is why
when you play any video game if different things are happening there's
probably just some Randomness and it's quantized as just a simple number
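Picking a random starting position can be sketched in Python with the standard `random` module. The range 0 to 240 for x and the fixed y of 180 are the numbers settled on above. The function name `random_start` is an assumption for illustration, and Scratch's own random block may behave differently in its details.

```python
import random


def random_start():
    """Return an (x, y) starting position for the piece of trash."""
    x = random.randint(0, 240)  # randint includes both endpoints
    y = 180                     # always start at the top of the stage
    return x, y


# Each run of the game would start the trash somewhere different.
for _ in range(3):
    print(random_start())
```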
now I think I just need one final flourish here if I may let me go ahead and
add this how about uh events or rather yes events
when green flag is clicked I can do multiple things within the same Sprite
they don't all have to be attached to the same one let me go ahead and
forever go ahead and do something else how about whenever the trash is
how about touching the trash can so forever if let's see I need a sensing
block so how about is touching uh not the mouse pointer this time but
touching Oscar himself there now let's see what happens all right so let's go
ahead and click the green flag now I go down over here and let
go okay that's I kind of want it to go into the trash can how do I make it go
into the trash can how can we take this high level idea put trash into the
trash can and make it seem to disappear logically what could we do yeah
okay so when it touches it let's have it disappear so I could hide it or
honestly if the game's going to be ongoing like it was letting me drop
more and more trash let me just have it go ahead and pick a new random
location so let me do this let me go ahead and
copy this puzzle piece up here and duplicate and I don't want the whole thing
sorry let me get rid of this let me just do this let me go back to some random
location at the top so now notice what happens if I click and drag on it here it
goes and I let go it looks like it's going into the trash can because it snaps
back up to some random location now the only thing I'm not doing really is
keeping track of any kind of score and it turns out if I full screen this it's not
going to be draggable by
default so just as a corner case so to speak something that you might trip
over otherwise let me go ahead and under uh let's see uh sensing it turns
out I also need this for the piece of trash there's this way of setting in
scratch a Sprite to be draggable or not draggable I need to explicitly
make it draggable so that when I do full screen this thing now it Still Remains
draggable and someone like myself can play it again and again well how
about we supplement this with one final
flourish why don't we keep track now of the user score so how about when
the user actually drags the piece of trash to the trash can let me go under
variables here where in advance I've already made myself a variable called
score I could have called it X or Y or Z or ABC but that's not very descriptive
in programming you typically give things a more descriptive English or some
other language name so I called this one score so how do I want to do this in
my score well let me go ahead and initially set
this game score to zero at the very top of one of these Scripts or one of
these programs up here and then anytime my cursor my piece of trash is
touching Oscar let's not just jump to the top let's change the score by one up
here so now notice if touching Oscar change the score that is add one to the
score and then pick a new random location and now green flag let's do this
slowly here it goes it's the trash can opens I let go and now notice at the top
left of my program notice the score
is now two notice the score if I do this again is about to become three and so
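The scoring logic just described (set the score to zero up front, then add one and jump back to a random spot at the top whenever the trash touches Oscar) might look roughly like this in Python. The dictionary `state`, the function name `update_trash`, and the list of touch events are all hypothetical stand-ins for what Scratch tracks on its own.

```python
import random


def update_trash(state, touching_oscar):
    """One pass through the trash sprite's forever loop."""
    if touching_oscar:
        state["score"] += 1                  # change score by 1
        state["x"] = random.randint(0, 240)  # back to a random spot
        state["y"] = 180                     # at the top of the stage
    return state


state = {"score": 0, "x": 0, "y": 180}  # set score to 0 at the very top

# Hypothetical sequence: dropped in the can, missed, dropped in again.
for touching in [True, False, True]:
    update_trash(state, touching)

print(state["score"])  # 2
```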
here we have building blocks literally of making this program better and
better and better and so indeed that's how you generally approach solving
any large program uh any problem with code be it in scratch or C or python
or some other you take this Vision you might have or some Vision you've
been assigned in a homework assignment and try to break it down into these
constituent parts and just pluck off the easy ones first put
the lamp post there first and at least feel like you're making some progress
then pluck off something like the trash can and just make it do a little thing
and it doesn't have to be in some same order here I could have done this in a
million different ways but figure out what the small pieces are that ultimately
like a few of the problems we've solved today assemble into a greater
solution there too uh so that you have now a mental model for these types of
blocks and others let's return
for a moment to this we saw a moment ago that when I started saying hello
David and nesting those puzzle pieces we had a whole different Paradigm
altogether my input for that second version of hello world was to now pass in
for instance what's your name into my function called ask that gave me not a
side effect but what I called again a return value called answer by default in
scratch and now notice and recall when I had that same output become the
input to my next block it looked a little something like
this say so how does this type of block and this nesting the stacking of blocks
fit into the same mental model well same idea my input for that part of the
story is now taking in not one input but two two arguments hello and the
answer from before the function in this case is that new block called join the
output thereof is hello David which itself became if we sort of animate this
the input to my final function which indeed was still say and this is only to
say no pun intended that almost everything
that you do with these puzzle pieces be it in the context of Oscar time or the
mole whacking or even just something simple like hello world will ultimately
fit into that relatively simple mental model there now I thought we'd End by
taking a look at just a couple final examples these ones too made by some
of your predecessors and for this I thought we would not write code together
but read it instead and so allow me to open up one other example here that
will show us a few different versions of a program
that a predecessor made give me just a moment here and we'll see how we
might build up to something even more interactive and in just a moment
we'll see something they called Ivy's uh hardest game focused here on these
particular mechanics so here is version zero so to speak of this program
where in the goal is to create a game where you have to like get out of some
kind of Maze and you have to get out in this case the Harvard Crest from this
maze let me go ahead and just hit play on
this green flag so you can see what what the first building block for this
program might have been notice that my hand here is actually on the arrow
keys of my keyboard and it seems that by moving up down left or right this
little Crest on the screen responds in exactly that way now let's hypothesize
for just a moment even though we've not done anything quite like this before
how might this code be implemented how do you get a Sprite be it a cat or a
Crest to respond to keys on a keyboard might
you think intuitively yeah there could be something that's sensing what key
you're pressing on yeah there could be something sensing what key you're
pressing on and if you do it again in a forever Loop you'll just constantly be
listening for keystrokes and this is how like every piece of software nowadays
works it's constantly waiting for your phone to be tapped or something to be
typed on the screen so let me go ahead and look inside of this existing
program here and there's more going on but we'll
take a quick glance what's actually going on well up here at top left notice
we just have go to x equals 0 and Y equals 0 that means put the Harvard
Crest dead center in the middle of the stage then we have forever two
functions that we made in advance as custom functions uh listen for
keyboard feel for walls so it's doing two things at once it's forever listening
for the keyboard up down left right and feeling for the walls in the sense that
if I get too far to the left I don't want it to
keep moving past that black wall and if it moves too far to the right I don't
want it to blow through that wall either so it's going to do two things
constantly listening for keyboard and feeling for walls so to speak and how
are those implemented well this one's a bit long but on the left here is listen
for keyboard so this pink puzzle piece listen for keyboard first checks if the
key up arrow is pressed question mark Boolean expression in a conditional
change y by one that means move it up
one else if the key down arrow is pressed then change y by negative one and
similar for left Arrow similar for right arrow and even though there's not a
loop in this pink function there is where I'm using it so it's constantly being
asked again and again how about feeling for walls well over here to the right
it's a little cut off but here you have if touching left wall change X by one so if
you hit the wall it's too late you're kind of blowing through it already so I
want to move it one
pixel so it's no longer touching that wall similarly if it's touching the right wall
I want to back it up one pixel so it's no longer touching that wall so it's kind
of like bouncing off ever so slightly so that it doesn't slip through that actual
wall and what are those walls well notice down here it's just a simple Sprite
with a black line that I've oriented vertically instead of horizontally and
that's just so that I can ask questions of these other two Sprites now that
gives me that
form of interactivity what more can I now do well what if we make things a
little more interactive here let me go ahead and see inside version one our
second and let me propose what's going to happen here well how might we
add a little something like Yale into the mix well what's Yale going to do when
I I hit the green flag now based on this code any hunches here is the code for
my Yale Sprite yeah yeah it's kind of got to be an adversary by blocking my
path theoretically if I keep writing more
code so Yale goes to the middle of the screen it points in Direction 90 degrees
so similarly there's a whole degree system as well and it forever asks this if
touching the left wall or notice the green block touching the right wall then
just turn around 180° and if you think this through logically that just
means you're bouncing this way and this way by just flipping yourself around
180° for just this Yale Sprite so if I go ahead and zoom in on this and click the
green flag I can still move up and
down but Yale is just kind of doing this all day long back and forth and back
and forth forever nothing bad happens if I try to go through it but we could
add that certainly to the mix in fact let's add one final feature before we play
this particular game and let me go ahead and open up the final version of
these building blocks that adds MIT to the mix so here's MIT someone want
to explain what this code does and this is what we're doing this itself is a skill
reading someone else's code and understanding it is half
of the part of programming besides writing yeah yeah it's chasing down the
Harvard logo outline so this is apparently the name of the costume that this
student made Harvard logo outline and apparently it goes to a
random position first but then it forever points to Harvard so no matter where I'm
moving it up down left or right MIT is being a little more strategic than Yale
bouncing back and forth like this so let's go ahead and play this one in full
screen and here we have a green
the actual version bit written by one of your predecessors that I'll full screen
here it's going to stitch together all of these same Primitives and more but add
the notion of score and lives so that there's actually a goal which in this case
is to move the Harvard Crest to constantly pursue the character on the right
hand side so that your Sprite touches that one would you like to introduce
yourself uh hi my name is Muhammad all right wonderful welcome aboard
and here we come with some
instructions and final flourish if we want to keep the lights up but perhaps
increase the music [Music] all right this is cs50
and this is week one wherein we continue programming but we do it in a
different language because recall last time we focused on this graphical
language called scratch but we use scratch uh not only because it's sort of
fun and accessible but because it allows us to explore a lot of these Concepts
here namely functions and conditionals Boolean Expressions Loops
something like this and now even if you can't quite distinguish what all of the
various symbols mean in this code turns out that at the end of the day it's
indeed going to do what you expect it's just going to say hello world on the
screen just like we did in scratch so let's start to apply some terminology to
these uh tokens first so what we're about to see what we're about to
write henceforth we're going to start calling source code code that you the
human programmer write is just
henceforth called source code doesn't matter if it's scratch doesn't matter if
it's C doesn't matter if it's python before source code is the general term for
really what you and I as human programmers will ultimately write of course
computers don't understand source code it turns out computers don't
understand scratch and puzzle pieces per se or C code like we're about to
see they only understand this which we called what last week yeah so this is
binary zeros and ones but really it's
just information represented in binary and in fact the technical term now for
patterns of zeros and ones that a computer not only
understands how to interpret as letters or numbers or colors or images or
more but knows how to execute as well henceforth is going to be called
machine code to contrast it with source code so whereas you and I the
humans write source code it's the computer that ultimately only understands
machine code and even though we won't get into the details of exactly what
pattern of
symbols means what you'll see that in this kind of pattern of zeros and ones
there's going to be numbers there's going to be letters but there's also going
to be instructions because indeed computers are really good at doing things
addition subtraction moving things in and out of memory and suffice it to say
that the Macs the PCS the other computers of the world have just decided as
a society what certain patterns of zeros and ones mean when it comes to
operations as well so not just
data but instructions but those patterns are not something we're going to
focus on in a class like this we're going to focus on the higher level software
side of things simply assuming that we need to somehow output machine
code so it turns out then that this problem we have to solve getting from
source code to machine code actually fits into the same Paradigm as last
time but the input in this case is going to be source code on the one hand
like that's what you and I ideally will write so that we don't have
to write zeros and ones but we need to somehow output machine code
because that's what your Macs PCS phones are actually going to understand
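The same input-and-output picture, source code in and something executable out, can be sketched in Python. This is an analogy only. Python's built-in `compile()` produces bytecode for Python's own interpreter, not the machine code a Mac or PC runs directly, and it is not the tool this course uses for C. The one-line `source_code` string is made up for the example.

```python
# Source code: text that a human wrote.
source_code = 'print("hello, world")'

# compile() is a Python built-in: source text goes in and a code object comes out.
code_object = compile(source_code, "<hello>", "exec")

# The compiled instructions are stored as raw bytes, not as readable text.
print(type(code_object.co_code))  # <class 'bytes'>

# Viewed as numbers, those bytes are just patterns a machine knows how to follow.
print(list(code_object.co_code[:8]))

# Executing the compiled form produces the program's actual output.
exec(code_object)
```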
well it turns out there's special programs in life whose purpose is to do
exactly this conversion convert the source code you and I write to the
machine code that our phones and computers understand and that type of
program is going to be called a compiler so indeed today we'll introduce you
to another piece of software and these come in many forms we'll use a
popular one here that allows you to convert source code in C to machine
code in uh zeros and ones now you didn't have to do this with scratch in the
world of scratch it was as simple as clicking the green flag because
essentially MIT did all of the heavy lifting there figuring out how to convert
these graphical puzzle pieces to the underlying machine code but now
starting today as we begin to study programming and computer science
proper now that power moves to you and it's up to you now to do that
kind of conversion but thankfully the fact that these compilers exist means
that you and I don't have to program in machine code like our ancestors
Once Upon a Time did be it virtually or with physical Punch Cards like pieces
of paper with holes in them you and I get to focus uh on our keyboard as
such but it's not just going to be a matter today of like writing code it's going
to be a matter ultimately today onward of good code as well and this is the
kind of thing that you don't just learn
overnight it takes time it takes practice just like writing an essay in any
subject might take time and practice and iteration over time but in a
programming class like cs50 we're going to Aspire to evaluate the quality of
code along these three axes generally is it correct first and foremost like
does the code do what it's supposed to do after all if it doesn't well what was
the point of writing it in the first place so it sort of goes without saying that
you want code you write to be
correct and it's obviously not always again anytime your Mac or PC or
phone has crashed some human somewhere wrote buggy code that is code with
mistakes but correctness is going to be the first and foremost goal but then
there's a more subjective goal we'll see in time a matter of design and we
saw a little bit of this last week when I proposed that we could design even
scratch programs better maybe by using Loops instead of just by copying
and pasting the same blocks again and again so design is more
subjective it's more of a a learned art whereby two people might ultimately
disagree as to which version of a program is better designed but we'll give
you building blocks and principles over the coming weeks so that you can
have a better sense for yourself if your own code is well designed and why is
that valuable well the better designed your code is often the faster it's going to
run the more maintainable it's going to be by you or colleagues if you're
working with others in the real world so
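The design point about loops versus copying and pasting can be made concrete with a small Python sketch. Both functions below are correct in the sense that they produce the same result, and the difference is one of design. The function names and the choice of three meows are assumptions for illustration.

```python
def meow_copy_paste():
    """Correct, but the same line is pasted three times."""
    sounds = []
    sounds.append("meow")
    sounds.append("meow")
    sounds.append("meow")
    return sounds


def meow_loop(times=3):
    """Same result, but changing the count means editing one number."""
    sounds = []
    for _ in range(times):
        sounds.append("meow")
    return sounds


print(meow_copy_paste() == meow_loop())  # True
print(len(meow_loop(10)))                # 10
```

The looped version is arguably easier to maintain: asking for ten meows is a change to one argument rather than seven more pasted lines.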
good design is a good thing it helps you communicate your ideas just like a
typical English essay and then lastly we'll talk this week onward about style
and this is really just the Aesthetics of your code it turns out that computers
often don't care how sloppy your actual code is um where uh in the world of
code it turns out that you don't really need to indent things in a beautiful
way you don't need to paginate things like you might in an essay the
computer generally does not care but the human does the
teaching assistant does you will care the next day when you're just trying to
understand what your code does so we'll focus lastly on Style the Aesthetics
of the code that you're writing so where are we going to write code where
are we going to compile code so for this class not only with C but the other
languages we use later in the term we're going to use a free text editor that
is a program called Visual Studio code AKA vs code it's super popular
nowadays not just for C but for C++ and Python and Java and
any number of other languages it's a text editor in the sense that it lets you
edit text and that's all code is going to be now strictly speaking you
could write code on paper pencil in fact in high school if you took a class you
might have done that one or more times as sort of an in-class exercise you
can't run it on paper of course but you could write it certainly you could use
something like Microsoft Word or [Link] or text edit on the Mac but
none of those programs are
really designed to format the code in the best way for you nor are they
designed to let you compile and run the code so VS code is going to be a
tool via which you can do all that and more write the code compile the code
run the code so that you all don't have to wrestle with stupid technical
support headaches at the beginning of the course by installing this software
and that on your Macs or PCS we'll use a cloud based version of VS code at
code. cs50. and that's going to be the exact same tool
and the goal then is by the end of the semester to sort of uh migrate you off
of that cloud-based environment to your own Mac and PC so that even if
cs50 is the only CS class you ever take you're 100% equipped to continue
writing code after the class using not something that's even cs50 specific but
a de facto industry standard at least for some time so what's this program
VS code going to look like be it on your Mac PC or initially in your browser
and it's going to look a little something like this and
there's going to be several different regions to the screen and pictured here is
that very same code I keep proposing is the simplest program you can write
in C and what are these different regions of the screen well there's
essentially these four here so first highlighted up top is going to be one or
more tabs where you're going to actually write code so much like in Google
Docs or Microsoft Word you can have tabs open with files similarly in VS
code or really any programming environment do
generally nowadays have tabs of some sort and this is going to be a tab
containing a file it seems called hello.c and that's going to be the very first
file we write in just a moment uh down here though is going to be an
interface that many of you might not know this is what's called a terminal
window and a terminal window provides what's generally called a
commandline interface or CLI and this is in contrast with a graphical user
interface or GUI now you and I every day are using GUIs
on our phones on our PCS and a GUI is literally graphical so menus and
buttons and icons and you generally use your finger or a trackpad or a
mouse or something like that to interact with it but it turns out that many
programmers dare say most programmers at least over time come to prefer
not a GUI but a CLI a command line interface where you actually do
everything somewhat uh somewhat arcanely via keyboard alone why well it
turns out there's just more features built in to most computers if you can
access them with a keyboard
turns out most of us can type faster than we can point and click and
so that ends up being an efficiency gain over time so in time you will get
comfortable using this terminal window to do things like compile your code
or make your program as well as run it so you won't be in the habit initially of
just double clicking icons like we do in our typical real world you'll do it sort
of the programmer's way but it's not to the exclusion of adding icons and
clickability and more on the left hand
region of the screen that we're actually going to type most of our commands
and in general in class I'm going to hide all of the graphical stuff that's just
not of all that uh that much interest so with that said let me actually change
over to a live version of VS Code and I've indeed hidden the activity bar I've
indeed hidden the file explorer so what I have here for visibility's sake is a really
big area for writing code and a really big terminal window at the bottom
you'll see in the terminal window
there's a dollar sign and this doesn't mean any form of currency this is just
the standard symbol that represents type commands here so the fact that
there's just a dollar sign and a cursor means eventually that's where I'm going
to type commands but first I'm going to actually create some code so how
might I program using vs code be it on my Mac PC or in this cloud-based
environment that you'll get set up for problem set one go about writing my
first file well perhaps
the easiest way is this literally run the command code and then the name of
the file you want to create notice that I deliberately end the file name with .c in
lowercase notice that I've deliberately lowercased the whole file name and
these are just conventions you could use a capital H you kind of could use a
Capital C but just don't do that follow best practices so that it's consistent
with what most everyone else would do when I hit enter I just get an empty
tab just like the screenshot a moment ago
and it's in this tab where I can now write my very first program in C
unfortunately it's not quite as user friendly as scratch where you drag and
drop a couple of puzzle pieces and boom it's done so I'm going to do this from
memory but this too will become familiar to you over time I'm going to
include something called stdio.h I'm going to type int main void and
parentheses on a new line I'm going to insert some curly braces as we'll call
them and then I'm going to type printf and then some
parentheses and then in quotes hello comma world then a backslash then a
lowercase n then a close quote and then a semicolon at the very end of the
line so all I've done is recreate just from memory that very first program in a
little bit we'll make clear what most of this does but for now let's just actually
run this thing and just like I clicked the green flag last week for the first time
let's actually compile and run this program if it were your Mac PC and Google
or Microsoft or someone else
had made the software at this point in the story we'd be double clicking an
icon but we can't do that yet this is still source code so I'm going to click
back down in my terminal window notice I have a second dollar sign below
the first which just means it's ready for a second command and now the
command via which to make this an actual program to compile it from source
code to machine code is going to be quite simply make and then the name of
the program I want to make slight subtlety I'm deliberately
omitting .c because the program I want to make I just want to call hello
so don't write make hello.c just write make hello and this program make is
essentially our compiler technically speaking it's a program that automates
the compilation of my program for me but it is going to see that I've typed
the word hello it's going to automatically look now for a file on the hard drive
called hello.c and convert it from source code in C to machine code in zeros
and ones so if I didn't make any
typos enter nothing seems to happen and that's a good thing almost always
if nothing gets outputted on the screen like you did good like you didn't make
any mistakes you didn't get yelled at there's no error messages so this is
actually a good thing how do I now run this program well notice I've got a
third dollar sign which just means I'm ready for a third command and now I'm
going to go ahead and run dot slash hello and this is admittedly a little weird
that you have
to do dot slash but for now just take on faith that this is how you run a
program called hello in your current folder in your current directory in this
cloud-based environment all right crossing my fingers again hitting enter and
voila my very first program in C hello world and now let me go ahead and
reveal the file explorer that I proposed exists earlier I'm just going to use a
keyboard shortcut to reveal that and generally I keep it closed because I don't
really need to know constantly
what files are in my account but you'll see now in the file explorer similar
in spirit to a Mac or PC but graphically a little different here's my file hello.c it's
highlighted because I have that tab open but now there's a second file here
called just hello that's the name of my program so if you were on a Mac or PC
you would ideally double click that thing you can't do that in a command line
environment you have to run it down here but that's all we've done we've
created a file called hello.c and then my compiler made the program from
that let me pause here and see if there's any questions because it's a lot of
magical phrases yeah yeah so if you're currently following along uh playing
along at home and you're getting some kind of error message part of today
will be for me to deliberately induce some of those error messages for now
let me just propose that if you literally did what I did you must have made a
typo somewhere and notice that it's indeed standard IO
stdio.h uh maybe you typed studio.h okay super common mistake if I
could call you out um like it is not studio.h it is standard io dot h so common but
this is exactly representative of like the kind of stupid headaches you're
going to run into this week probably for a few weeks probably honestly for a
few years but you start to see past these sort of stupid mistakes over time
and it just gets easier and easier because the computer is going to be so
regimented like it will only do what you tell
it to do and if you say because it verbally sounds like studio.h it's not going
to know what the file is so actually thank you for tripping over that so early
that's super common to happen yeah so why do I have two
hello files one is the one I created as the
human called hello.c and it's pictured right here but then when I ran make
hello that process compiled my source code into machine code so this
second file just called hello is the
file that contains all of those zeros and ones that the server actually
understands all right so yeah question if you try clicking on the hello file
you'll see in this environment of vs code quote unquote the file is not
displayed in the editor because it is either binary AKA zeros and ones or uses
an unsupported text encoding in this case it's binary it's zeros and ones now
you could use software to see those zeros and ones it won't be intellectually
enlightening to most any human so VS Code just makes
the choice of not showing it to you at all so that would be a common mistake
too clicking on a file you don't intend but the source code is indeed going to
be editable by us all right all right so I've written this program it seems to
sort of magically work at least with some effort if you get every single
keystroke right well what is it that's going on and how is this working well
first of all notice that even without my highlighting things or choosing
buttons from menus notice that it's already kind
of color-coded and yet I wasn't highlighting along the way in sort of Google
Docs style changing the color certainly well it turns out what VS Code and
most programming environments nowadays do for you automatically is
syntax highlighting so syntax highlighting is just this feature of typical text
editors nowadays that analyzes the code that you've typed and when it
notices certain types of keystrokes things that represent functions or
conditionals or Loops or variables a lot of the vocab from last
week it just highlights it ever so differently for you so main for instance
which we'll soon see is in purple here int and void and include are in red hello
world is in blue my parentheses are in green this will totally vary by
programmer too in fact if you do want to change these colors for problem set
one for your own environment you can poke around vs code settings via the
gear icon you can change to a different color theme
syntax highlighting isn't some specific color
a when green flag clicked icon uh puzzle piece roughly an orange and then a
purple uh say block beneath it so whereas this is the C version if we rewind to
last week this was the same program in scratch but what's happening now is
exactly the same so if you think back to last week and you've got some
function like the say function in purple that might take one or more
arguments like inputs that influence what it says on the screen and then
functions recall can sometimes have side effects right like
the speech bubble appears on the screen so last week when we used the say
block and we passed in an argument of hello world at left we got this visual
side effect on the screen that says now hello world in the speech bubble and
that's exactly what just happened in VS Code but much much more
textually and let's look a little closer now at the code itself let me wave my
hand at the equivalent of the when green flag clicked part of my code and
let's focus only on the say block in scratch and the
corresponding function in C so if I step through this and I wanted to convert
what we did last week with the say block to C I would first use the print
function although that's actually a bit of a white lie it's actually the printf
function the f in printf means formatted and it's just a function that allows you to
format text on the screen there is no say function in C there's a printf
function what MIT did down the road years ago was they took what
existed historically as printf and they simplified it for a
broader audience by just calling it essentially say instead but notice that now
if I want to convert the scratch code at left to C code at right it's sort of the
same shape so MIT deliberately used this white oval if only because it kind of
conjures this idea of having parentheses too so on the right if I want
to pass an argument or an input to the printf function I use an open
parenthesis and a close parenthesis in those parentheses I then type whatever
it is I want to print on the screen in this case hello comma
world but notice I've deliberately left some room because you need some
extra keystrokes in the world of C anytime you type out some text otherwise
known as a string of text to use computer science jargon you need to quote
it in this case with double quotes double quote at the left double quote at the
right and notice too I'm going to include some slightly cryptic symbol here too
backslash n which I also typed and said verbally earlier and then one last
nuisance at the end of this which is a semicolon so
suffice it to say this is why we start with scratch this drag and drop you're
good to go in a language like C printf parentheses double quotes the text
you want backslash n and semicolon at the end there's just so much
syntactic overhead but at the end of the day it's just a function and you'll get
used to these sort of nuisances like the parentheses the quotes the
semicolon and the like but things can very easily go wrong and it's very easy
to make mistakes even with lines of code like
this so let me do this let me go back to vs code where I have the exact same
code notice that on line five is exactly that line of code so this is the
equivalent of the say block and let's consider what mistakes I may make early
on or even now 20 years later after learning this that are quite common um
in general suppose I forget the semicolon there so easy to do you will do this
eventually let's see what happens now when I go back to my terminal
window and try to compile my code again just to keep things tidy I'm
going to clear my screen but that's just for lecture sake so that we can focus
only on the most recent command but I'm going to go ahead now and rerun
make hello this will ensure that my program is recompiled and this is a
manual process I changed my code the zeros and ones on the hard drive
have not changed I need to recompile it to output the latest machine code
so here we go I'm going to hit enter crossing my fingers as before but again I
removed the semicolon by accident oh my God there's like more
lines of errors now than there are of actual code and this too takes some
getting used to um the programs we're using were not necessarily written
with the least comfortable audience in mind but really professional
programmers back in the day but through practice and through experience
and through mistakes you'll start to notice patterns here too so here's what I
typed make hello after the dollar sign prompt now I get yelled at with as
follows hello.c colon 5 colon 29 well what's that referring to I've
in green here this semicolon at the end of that line one error generated built
in so some esoteric stuff there but my program did not compile when you
see an error like this it means it did not work so what's the fix well obviously
the fix is to go back up here put the semicolon there and now if I recompile
my code with make hello I won't clear my screen just yet just to show you
the difference now it just worked so we're back in business as before all right
let me pause here though and ask if there's any
questions about what I just did these error messages will become
frequent initially yeah really good question do you need a semicolon after
every line or just some it turns out just some uh this is something you'll learn
through practice through demonstrations and examples today generally you
put a semicolon after a statement so to speak and this is the technical term
for this line of code it's a statement and think of it as kind of the code
equivalent of like an English sentence so the semicolon
in code is sort of like a period in English when you're done with that
particular thought you don't need semicolons for now anywhere else and
we'll see examples of where else you put them but it usually is at the end of
a line of code that hasn't that isn't purely syntactic like uh curly braces
instead other questions on the mistake I just fixed and created for myself
yeah uh correct so line five is where the error is most likely character
29 means it's sort of 29 characters that
way and then it's actually in this case giving me a suggestion the compiler
won't always know how to advise me especially if I've made a real mess of
my code but often it will do its best to give you the answer like this yeah ah
so how come I first typed code space hello.c and now I'm typing make hello
two different processes so when I typed code space hello.c that was because
I wanted to open VS Code and create a new file called hello.c it's like
going to file new on a
Mac or PC thereafter though once the file exists and is actually open here
and it does autosave you don't need to hit command s or control s all the
time I can now compile it with make hello again and again so theoretically I
should never need to type code space hello.c again unless I want to create a
brand new file called the same thing all right so what about this other piece
of syntax here let me clear my terminal window here you can also hit control
L just to throw everything away just to
clean it up aesthetically suppose that I omit whatever this sequence of
symbols is backslash n since I'm not really sure at first glance why that's even
there does anyone want to conjecture especially if you've never programmed
before what might happen now if I recompile and rerun this version of the
program I left the semicolon but I took away the backslash n any instincts
all right well yeah will the next dollar sign appear straight after it will
the next dollar sign will appear right after my
hello world but what makes you think that backslash n creates a line exactly
backslash n is actually a special sequence of symbols that creates a new line and so
to your point if I recompile this program make hello enter no syntax error so
it did compile this time so you don't need the backslash n you do need the
semicolon but if you don't have the backslash n watch what happens when I do
./hello this time now indeed I see hello comma world and then a weird dollar
sign and this is still a prompt I can still type
commands at it like clear and everything gets cleaned up but it just looks
kind of stupid if I run it again here with ./hello you know it's just not very user
friendly it is convention that when you're done running your program you
should ideally clean things up move the cursor to the next line for the user
and so the backslash n is simply the special symbol otherwise known as an
escape sequence that C knows means move the cursor to the next line in
other languages python among them uses the
same symbology as well now if I go back to the code here and for instance I
try to do this differently like suppose I don't put the backslash n I just hit
enter like a normal person would in Google Docs or Microsoft Word let me go
ahead and try compiling this program and this you would hope would work
right you would hope this would print out hello world and then a blank line
because I move the cursor to the next line but no if I run make hello now and
try to compile that C does not like this now I
get a different error still on line five this time starting at character 12 uh
error missing terminating double quote character and then some other
esoteric stuff and then this does not sound good fatal error this time too
many errors emitted stopping now so I really screwed up here so why can't I
do this just because like the humans who designed C decided that if you
have a string of text it must stay on the same line it can get really long it can
soft wrap that is without you hitting enter
but you can't hit enter to create a new line if you deliberately want a new
line you have to indeed use this backslash n escape character so let me
go ahead and do this let me put it back let me go back to my terminal
window I'll clear the screen again let me go ahead now and do make hello to
recompile to that version do ./hello and voila we're back in business with uh
hello all right so now let's tease apart some other aspects of this code
because there's a lot going on just to get us to say hello
world on the screen for today we're largely going to ignore this int main void
and these curly braces here we'll come back to that before long as to why it's
there but for now just think of int main void and these curly braces here as
really being the C equivalent of when green flag clicked like why you just
need it there that's how you get your program going and main is indeed
going to be some special function but more on that another time but why do
I have this line of code here the correct spelling is
indeed standard io dot h stdio.h and there are angle brackets this time so that's a
little new there's a hash and then an include keyword you know if you don't
know what something is you know there's not really that much harm in just
getting rid of it and see what happens so let me delete that line let me go
back to my terminal window clear the screen and then run make hello again
and let's try compiling this program now without that first line why I don't
understand it so let's see what
happens all right here's yet another error but let's see hello.c line five
character 5 so it's pretty early on error implicitly declaring library function
printf with type int and then dot dot dot so implicitly declaring library
function printf so this is very cryptic sounding you'll get better at
understanding phrases like these but apparently I do need the include line
for stdio.h but why based on this symptom what might your instinct be
for what that first line of code is doing for us in
the first place why intuitively must it be there exactly it's like importing a
library so that you can do things like print things out on the screen now in
scratch you didn't have to do this for most of the puzzle pieces but you might
recall that partway in through week zero I went to the extensions button at
the bottom left of the scratch screen and I imported some extra puzzle
pieces for text to speech that gave us the sort of creepy uh humanized voice
that actually came out of the cat's mouth well that
was like adding a library code that someone else wrote in that case it was a
third party then but I gave myself access to it same here turns out that you
don't really get printf automatically in C you have to include a so-called
header file that declares that function to exist now the reason for this
historically is just efficiency back in the day when computers were much
slower and resource constrained you don't want to just give yourself access
to the entire kitchen sink of functionality you want to include
only the functions you actually care about nowadays it's sort of a
copy paste step because you almost always want to print something out on
the screen at least when writing programs like these but these so-called
header files contain enough information about all of the functions in What's
called the standard I/O library and standard I/O just means standard input and
output and that's appropriate right because printing is pretty basic output
turns out there's other functions for
getting input from the human's keyboard more on that in a bit but anytime
you want to print something on the screen in C you indeed need to
include this header file at the top of your code and that's going to essentially
inform the compiler hey compiler I want to use functionality from the
standard IO Library including printf in this case and if you omit the header
file by accident it's just not going to work because it doesn't know what
printf is it's sort of some unrecognized symbol in
that case all right questions then about this line of code this line of code here
or what these header files are all right you might wonder well how do you
know what functions exist how do you know what files you might indeed
want to include well it turns out that C is a many-year-old language and it has
ample documentation a caveat is that its documentation isn't necessarily all
that user-friendly but what we have for the course is a simplified version of
the official documentation for C at this URL
here manual. cs50. so in the world of c and other languages too there are
what are called manual pages and these are just like text-based
documentation that honestly is typically written in a voice that you kind of
have to be an experienced programmer to understand some of it so what
we've done in this version of the same documentation is we've imported all
of the original official documentation but we've added sort of less
comfortable translations in English for a lot of the functionality
that you might use in class just to help onboard you so at the end of the day
you don't need this documentation long term but just to get started we'll
translate it into terminology uh that you might appreciate from a teaching
assistant for instance as opposed to the original author of these documents
and so for instance if you were interested in reading up on what functions
exist in the stdio.h um header file well you could go to a URL like this or
you could search for it at manual. cs50. that
would show you a list of all of the available functions in that library and
printf indeed would be one of them and then you could click further on that
reaching a URL like this that's just going to give you all of the documentation
for how to use printf it turns out you can do even more than uh than
just printing out hello world and we'll scratch the surface of that today but it
turns out that the documentation will always be your authoritative Source
ultimately for questions like what can I do and how can
I do it meanwhile it turns out that cs50 has its own library accessible via a
header file called cs50.h it turns out in C that output is actually pretty easy
relatively speaking once you get used to all the curly braces parentheses
quote marks and the like but input is a little more difficult and if you have
programmed before input's not that hard to do in Python it's not that hard to
do in Java it's more difficult to do in C and we'll see why in a couple of weeks
but for the first couple of weeks of the
class we actually provide you with some training wheels of sorts whereby we
have a number of functions that are declared in this file cs50.h its
documentation lives at a URL like this and in a moment we'll use a few of
these you'll see that cs50 provides you with some functions like get_char
to get a single character from the user's keyboard uh get_int to get an
integer from the user's keyboard uh get_string to get a sequence of text from
the user's keyboard and a bunch of others as
well so let's actually use some of these functions how about by revisiting
really the second program we wrote in scratch last time which adds some
input to the output so first version of scratch was just hello world said the
same thing every time you click the green flag version two recall though did
this it asked the user what's your name and then that somehow gave it back
a return value we called it and we then joined hello and that
name to say something a little more interesting on
the screen so what did that model look like same thing as before we've got a
function in the middle where function is like the code implementation of our
algorithm that takes in one or more arguments like what is it you want to uh
say on the screen ultimately and return value in this case is going to be
actually a value that comes back so in the case of getting input we can
consider this ask block again like last week the input to it is whatever words
of English you want to ask the user and
then it returns a value and this was called by default in MIT's world answer
though we'll see in C you can call these return values anything you want
ultimately in variables but this is different from a side effect a side effect is
just something visual often that happens on the screen like the speech
bubble or hello world a return value is actually a value you get back from a
function that you can use or reuse so how do we convert this scratch block
from last week to C code this week well if you want to ask the user for
something like their name you can do this you use a cs50 function called
get_string and you use the parentheses to represent here come the inputs there
too you can then put the sentence you want to ask the user quote unquote
what's your name but you do indeed need the quotes literally in C so I'll go
ahead and add those as well subtle but I've deliberately included a
space after the question mark but before the double quote just so that the
cursor moves one step over because in this case
we're not going to get a special speech box like we did in scratch it's just
going to leave the cursor where it is so we'll see that aesthetically this just
moves the blinking cursor one space after the sentence on the screen all
right but the catch is with scratch we just automatically got back the answer
from the user in a special variable called answer in C you're going to have to
be a little more specific in C if you want to get back a return value from a
function like get_string you have to use
an equal sign and then the name of a variable on the left the choice of
variables is up to you I could have called this anything x y z I'm going to
more descriptively call it answer for parity with what MIT did with scratch
but notice that this doesn't represent equality per se this is assignment in
this case so in C when you use a single equal sign that means copy
the value on the right over to the variable on the left from right to left so what
does this do for us well if get_string is a
function that prompts the user with quote unquote what's your name and it
has I claim a return value that means it kind of hands me back some value
but it's up to me in C to do something with that value so if I want to copy
that value into a variable that I can use and reuse I use an equal sign and I
invent on the left hand side of that equal sign any variable name I want
there's certain rules certain conventions but generally if you use a single
word with all lowercase you're in good shape but C is
a little more pedantic than that and those of you who have
programmed before might not be used to this for instance in Python which is
a world we'll get to in a few weeks you also have to tell C what type of
value you're storing so if I do want a string of text from the user so not an
integer not a single character I want a whole string of text like a phrase a
sentence a name in this case I have to tell C that this variable is of type
string so it's a little wordy but you get used
to it and you just have to be precise you're informing the computer what
type of value is going in this variable all right it's so close to being correct
but I have omitted something that's annoyingly important still what's missing
still yeah so semicolon this is a statement this is like a full thought if you
will in code I do need to end it ultimately with the semicolon at the end there
all right so this was more of a mouthful but let's try using this now in my
code let me go
back to VS Code where I have version zero of my code here let me go ahead
and include one other file at the top of hello.c namely include cs50.h so that I
have access now to get_string and anything else I might want now let me go
ahead and add a line of code here inside of these curly braces and let me go
ahead and do this string answer equals get_string quote unquote what's your
name question mark I'm going to add an extra space before the double
quote I'm going to indeed end my thought with a
semicolon and now let me deliberately make a mistake just to make a point
here let me now try changing hello world to hello comma answer all right
now perhaps even though this is some new lines of code you can see where
I've erred already but let me try making this program now so far so good so
no error messages so that's a good thing let me go ahead and run ./hello and
you'll see the prompt what's your name question mark and notice the cursor
indeed one space to the right just because I
thought it would look prettier to put a little Blank Space there as opposed to
leaving it right after the question mark let me type my name but even if
you've never programmed before I have screwed up here what are we going
to see on the screen when I hit enter yeah hello answer most likely why cuz
the computer is going to take me literally and if I say quote unquote hello
answer that is the string of text followed by a new line that's going to be
outputted to the screen so we need some way of actually plugging answer
into this line of code it's not quite as simple as scratch where you could just
grab like a second say block and drag and drop the variable there we
actually need a new syntax and it's going to look weird at first but it is
everywhere in software nowadays especially in the world of c and certain
other languages so let me go ahead and propose that I solve it as follows
Well, back when we did this in Scratch, remember that the most elegant
solution was this here. We used the say block still, which is going to be
analogous to printf today, but I used the join puzzle piece in Scratch to
combine hello, comma, space and then the name of the human. So how do we
translate this code to C? Well, it's going to look a little different now. I'm
going to start with printf, with some parentheses and a semicolon,
representing the say block. But how do I now do this joining? This is where
the puzzle pieces don't quite translate perfectly. This would be the way to do
this: you put hello, comma, and then a placeholder. So this is what's known as
a format code in C, specifically for printf, and it just means this is a
placeholder for a string. Again, a string is just text. So this means: hey,
computer, print out literally hello, comma, space, and then not literally
percent s. Percent s is treated specially to mean plug in some value here. All right,
so what else do I still need? Well, this is still some text, so I'm still going
to surround the whole thing with double quotes. I'm still going to include my
backslash n, just to keep things tidy and move the cursor to the next line. So
the last step here in C is to somehow join the answer with that word hello,
and the way you do this is with printf, passing it not one argument, which is
what I keep doing; I keep passing it one string of text, quote unquote. I'm
going to now add a comma and then the name of the value that I want printf to
go back and plug into that percent s. And printf is just smart about this. If
you have one percent s and one additional argument after a comma, it just goes
from right to left: it plugs it in. If you have two percent s's and two
variables after the comma, that's okay, too. If you separate them with commas,
it'll plug the first into the first percent s and the second variable into the
second percent s. So it's just left-to-right order of operations. It's not as
pretty or as simple as this, but this is how it's done in C. All right, let me
pause, because this is a lot of symbology. Any questions on this technique
here? Yeah. Yeah, really good question: why did I
Good catch. Yeah, can I show an example with two percent s's? Surely. So let
me, in VS Code, do this. Let me clear my terminal window to clean things up,
and let me do this: instead of calling the variable answer all over the place,
let me call it first, and I'll ask two questions. What's your first name? And
now let me do string last equals get_string, quote, whoops, capitalization
matters, so let me fix my capital S there. Quote unquote, what's your last
name, question mark, semicolon. And now we'll plug in one percent s and a
second percent s, and now I'm going to plug in first first and last last,
coincidentally. And now I'm going to go back to the terminal window: make
hello, crossing my fingers. All good. ./hello. Here's my first question:
David. Here's my second question: Malan. And again, hello, David Malan.
So it just inserts them left to right. All I was doing was for parity with
Scratch, though, and let me go ahead and undo this. Again, I'll go back to
answer, like this. I'll go back to just asking for the person's name. I'm
going to delete mention of last. I'm going to delete mention of the second
percent s. And now if I recompile this simpler version, I did screw up. Didn't
intend it. What did I do wrong? Yeah, so, just newbie mistakes. So I changed
my variable back to answer, just to be consistent with week zero, but I didn't
change it here. So I have a use of undeclared identifier, first. It's
undeclared in the sense that I declared answer a line prior; I didn't declare
first. So indeed, intuitively, I want to just change that to that. Let me now
do make hello again, ./hello, type in just my first name this time, and there
it is: hello, David. Questions on this, then, syntax with printf? Yeah. The
placeholder, I'll zoom in, is just a single percent and then an s. So inside
of my string here is percent s, and then I have a comma outside the quotes,
and then the name of the variable whose value I want to plug in for that
percent s. And now notice there are technically two commas inside of these
parentheses on line seven, and yet I claim that printf at the moment is only
taking in two arguments. Why are there then two commas but only two arguments?
If there were two commas, you would think there would be three arguments,
right? Exactly. The comma in between the quotes is just an English thing. It's
separating the hello from the name. So not only is it in quotes; that's also
why programs like VS Code tend to syntax highlight it a little differently,
just so that it sort of jumps out as different to you, even though in this
case it's a little subtle, a light blue versus white. But indeed, it's trying
its best. Other questions
now on this placeholder? Yeah. Ah, good question. If I wanted to add an
exclamation point after the name, would I have to add another placeholder and
so forth? I could actually do that much more simply. I can just put the
exclamation point right after the percent s. I don't need an additional
placeholder per se. If I zoom out now and run make hello again, ./hello, and
type in just my name, no exclamation point, now you'll see, more excitedly,
hello, David! So printf is smart. It will figure out where the percent s is
and then go and replace it. Now let me propose that a common thing in
programming is that as soon as we make a decision as to how to design
something, we often paint ourselves into a corner and sort of regret a
decision. Can anyone think of a problem that arises from using percent s as a
placeholder in this string to printf? What could go wrong if we're using
percent in this special way? Yeah, if you literally want to say, for whatever
weird reason, percent s on the screen, or, honestly, even just a single
percent. It turns out that a percent sign is treated specially inside of
printf strings. So what's the solution here? There are different patterns of
solutions to problems like these, but suppose you wanted to say I got 100%,
for instance. Let me go ahead and change this completely. So, I got 100% on
your test, or whatever. All right, let me go ahead and run make hello. Enter.
All right, so: invalid conversion specifier. I mean, I have no idea what this
means, but it's underlining the percent sign as problematic. Well, it turns
out that humans years ago decided: all right, damn it, we already used
percent. Well, two percent signs will mean one percent sign, literally.
So now if I rerun make hello, aha: hello, I got 100%. So there's going to be
things like that, honestly, that you have to ask someone, you have to Google,
you have to look it up in the documentation, but there's always a solution to
those kinds of problems, and thankfully they don't come up all that often.
Yeah? Oh, just pointing. Other questions? Yeah. If you have multiple
variables, it is in left-to-right order. So printf will analyze the first
string of text that you pass in between quotes, and wherever the first percent
s is, the first variable that's passed in after a comma gets plugged in there,
and then the second gets plugged into the second, the third into the third,
and so forth. So it's just based on left to right. Yeah, it's just a
placeholder. It's called a format code, and it just means, colloquially, plug
in some value here. And the humans who wrote printf decades ago decided to
treat percent s specially. Why? Just because they needed some placeholder.
They decided that, eh, no one's ever going to really want to type percent s,
and if they do, they can just do percent percent s. So they decided to
implement printf in such a way that they have code that analyzes whatever text
comes in, looks for percent s, and then somehow plugs in the subsequent values into that
placeholder. And just this, question? Sorry? Ah, so what if you wanted to do
single characters, like initials, like DM or DJM for first, middle, last?
Absolutely, and that, too, is a perfect segue from the two of you to what in
general are going to be called data types in C. So it turns out in C there's
not only strings as text, and we'll see in more detail over the next couple of
weeks what a string really is underneath the hood, but strings of text are not
the only thing that programs can output. They can indeed output single
characters, as for initials. They can output integers as well. It turns out
that printf has different format codes for all sorts of different data types,
and just some of the data types we'll see in the coming weeks will be this
list here, which, you'll notice, almost perfectly lines up with the CS50
functions that I rattled off earlier, like get_char, get_int, get_string. The
reason we called those functions that is that those can all be different data
types, dot dot dot. But for now we'll focus really on just these primitives.
That was a lot. Let's go ahead and take a five-minute break here. No cookies
yet, but in five minutes we'll come back, dive into more detail, and at our
second break today we'll have cookies. All right, we are back,
and so if you have been playing along at home but hitting some bumps in the
road, that's totally normal. And indeed, the goals of lecture generally will
be to give you a sense, conceptually, of where we'll be going during the
course of the week, but it's indeed through the hands-on labs and problem sets
that you'll really have an opportunity, at your own pace, to work through some
of those same bumps in the road. But for today, let me give you a few more
building blocks, and these, too, will translate from Scratch initially:
namely, conditionals. Like, how, now in C, after knowing now how we can use
functions, at least get_string and printf, and we can use variables, like the
string I created earlier, how can I now add to the mix things like
decision-making, and conditionals at that? Well, with conditionals in Scratch,
we had this kind of syntax. On the left here, in Scratch, is how you might
express if two variables x and y have this relationship: if x is less than y,
then say on the screen x is less than y. Well, let me translate that to the
right now in C code. So in C, the corresponding code is going to look like
this, assuming x and y already exist. More on that later. And notice a pattern
we're going to see again and again. There is going to be the word if, then a
space, and then parentheses. When using a function like printf or get_string,
you shouldn't put a space there. Both will work, but you'll find that these
are conventions, stylistically, that most people adhere to. So: a space when
using an if here. All right, now, inside of the curly braces is where the
actual code goes that you want to execute conditionally. So if you want to
print out x is less than y only if x is actually less than y, in C you use
this open curly brace, which up until now you've probably rarely used on your
keyboard, and the close curly brace down here, and those are kind of hugging,
if you will, the one or more lines of code underneath the if, very similar in
spirit to how the orange block here kind of hugs the purple puzzle piece here.
So there's no graphics in C; it's all text. So you can think of those curly
braces as really representing the same idea. As a side note, if you only have
one line of code inside of the if condition, if you will, you, strictly
speaking, don't need the curly braces, but as a matter of good style, do
include them. It will make more
obvious what your intent is. How about, in Scratch, if you wanted to express
this two-way fork in the road, where you might go left or right, so to speak?
Well, if x is less than y, I want to say x is less than y; else, I want to say
the opposite, x is not less than y, in this case. So I'm making a decision
based on that Boolean expression. In C, it's almost the same, but you're
adding to the mix the keyword else (so MIT borrowed for Scratch the same
keyword there) and a second pair of curly braces, open and close respectively,
and you might guess now what goes inside of those. Well, you print out x is
less than y, or you print out x is not less than y. All right, what if there's
a three-way fork in the road? In Scratch, this actually gets a little
unwieldy, graphically, if you will, but notice that in Scratch this is how we
could express: if x is less than y, say x is less than y; else if x is greater
than y, say x is greater than y; else if x equals y, then say x is equal to y.
Now, minor inconsistency here. Just a little bit ago I claimed in C that an
equal sign represents what operation? Assignment, from right to left. Insofar
as Scratch is really meant for kids, and they didn't really want to get into
the weeds of this kind of semantics, the equal sign in Scratch means equality.
However, we're going to need to fix this in C in just a moment. In C, the
equal sign means assignment, right to left. In Scratch, it literally means
what you would expect. All right, let's translate this code, then, to C. On
the right, this code would
correspond really to this, and you can perhaps see, somewhat goofily, what the
solution was, not unlike the percent percent solution earlier, when humans
painted themselves into one other corner. You say if, you say else if, and you
say else if. And how did we resolve the use of a single equal sign already in
C? When you want to express equality, is the thing on the left equal to the
thing on the right, you literally use two equal signs, right next to each
other, no space in between them. But now this code would be correct on both
the left and the right, whether you're doing this in Scratch or C
respectively. But now we can kind of nitpick our code, specifically the design
thereof. Logically, can anyone critique the design of this code, either in
Scratch or C? Like, I feel like we could do better. How about in back?
Perfect. Logically, it's got to be the case that x is less than y, or x is
greater than y, or, by conclusion, it's got to be equal to y. So why are you
wasting my time, or the computer's time, asking a third question? You don't
need to ask this final else if, because logically, as you note, it should go
without saying. So it's a minor tweak. Like, you're doing extra work,
potentially, in cases where x equals y. So we can just refine that, and just
like in Scratch you could just use an else block, similarly in C could we
simplify this code to just an else, a sort of catchall, logically, that just
handles the reality that, of course, that's going to be the final situation instead.
All right, so we have this ability now to express conditionals with Boolean
expressions. Let's actually do something with this next here. So let me go
back to VS Code. I've closed hello.c, and I want to create a second file for
the sake of some demos. Now recall that you can create new files by typing
code, space, and then the name of the file you want to create. For instance, I
might do compare.c. I want to write a program that's going to start comparing
some values, for demonstration's sake. But before I do that, let me just show
you, by opening the file explorer at right: this is similar in spirit to a Mac
or PC. Like, you can go up here and click on an icon, and you can click on the
plus icon, and you'll get a blue box, and I can type in compare.c, and I can
just manually create it that way. Notice that opens the tab even without my
having typed code. So again, on the left you have a GUI, a graphical user
interface, albeit a simplistic one; on the right and at the bottom here you
have a command-line interface. But they're one and the same. What's nice,
though, is that if I close this file, accidentally, intentionally, whatnot, I
can reopen it without creating a new one by just running that same command,
code, space, compare.c. So code is a VS Code thing. It's just a user-friendly
shortcut, but it's just creating a file or opening an existing file like that.
I'm going to hide the file explorer just to make more room for code here, and
let's go ahead and do this. Let's write a program that compares two values
that the human inputs, but not strings this time. Let's use some actual
integers. All right, I'm going to go ahead and include
the CS50 library's header file at top, cs50.h. I'm going to also include
stdio.h. Why? One gives me user-friendly input via get_string, get_int, and so
forth; one gives me user-friendly output via printf, in the case of stdio.h.
Now I'm just going to kind of blindly type this line of code, which we'll come
back to in future weeks, but for now that's analogous to the when green flag
clicked code in Scratch. And now let's go ahead and do this. Let me go ahead
and get an int from the user and ask the user, what's x, question mark. I'm
not going to bother with a new line; I want to keep it all on one line, just
for aesthetics' sake. But when I get back an int, just like when I get back a
string, I get back a return value. So if I want to store the result of get_int
somewhere, I had better put it in a variable, and I can call the variable
anything I want. Previously I used answer or first or last; now I'm going to
use x. But there's still two things left to do here, logically. Even though we
haven't technically done this yet, what do I still need to do? So I need the
semicolon at the end and the int at the beginning. You, the programmer,
starting today, kind of need to decide what you're going to be storing in your
variables, and you just need to tell the computer that so that it knows. Now,
as a teaser for languages like Python, more modern languages: turns out humans
realized, well, gee, this is stupid. Like, why can't the computer just figure
out that I'm putting an int there? Why do I have to tell it proactively? So in
some languages nowadays, like Python, we'll get rid of some of this syntax.
We'll get rid of the semicolons. But for now we're looking at really the
origins of how this all worked. All right, so I've done this one line ending with
semicolon. Let me do one other, and let me get a second int, asking the user,
what's y, question mark. So almost identical, but different responses from the
user, hopefully. And let me just ask, simply, if x is less than y, in
parentheses, then some curly braces. Let me go ahead and print out, quote
unquote, x is less than y, backslash n. And now, just as a side note: I seem
to be typing kind of fast. Some of that is because VS Code is helping me. Let
me go back to this first line with the if, hit enter, and now I'm only, on my
keyboard, going to type the open curly brace. This is a feature of many text
editors nowadays: it finishes part of your thought. Why? Just to save yourself
a keystroke, to make sure you don't accidentally forget the closing one. So
you'll notice sometimes that things are happening that you didn't type. It's
just VS Code, or future programs you use, trying to be helpful for you. I'll
go ahead and manually type out now printf, x is less than y, backslash n,
close quote, semicolon. So let me go ahead now and try to run this, and we'll
see. Let's see. So make, not hello, but make compare, because this file is
called compare.c. Hitting enter. Okay, no output is good, because it means I
haven't messed up. Let me do ./compare instead of ./hello. Enter. What's x?
How about one. What's y? How about two. x is less than y. Well, let's try it
again, and here I'll save you some keystrokes, too. Let me clear my screen.
Instead of constantly typing dot slash this and dot slash that, you can also
use your keyboard's arrow keys in VS Code to scroll back through time. So if I
hit up once, there's the last command I wrote. If I do up twice, there's the
second-to-last command I wrote. So sometimes if you see me doing things fast,
it's just because I'm kind of cheating and going through my history
like that. All right, let me go ahead, though, and rerun compare. Enter. Let's
reverse it this time: two for x, one for y. And now, of course, there's no
output. All right, well, that's logically to be expected, because we didn't
have an else here. So let's add that else now. Let's open my curly braces,
letting VS Code do one of them for me. printf, quote unquote, x is not less
than y, backslash n, semicolon. Let me go ahead and try this again: ./compare,
enter. Again, two for x, one for y, and we should see, huh. What did I do
wrong? Why am I not seeing any else output? Yeah, exactly. You've got to get
into the habit, after you change your code, of recompiling it. Otherwise, the
zeros and ones in the server are the old ones until you manually compile. So
let's fix this. make compare, enter. No error messages; that's good.
./compare, two, one, and now I get back the output. So x is not less than y.
How about if I go and add in the third condition? Well, we can do this either
efficiently or inefficiently. Let me go ahead and refine this. So else if x is
greater than y, let's literally say x is greater than y. And now I could do
else if x equals equals y, but I think we already claimed that that's
unnecessarily inefficient, so let's just have our catchall, and here I'm going
to say, quote unquote, x is equal to y, backslash n, close quote there. So I
think now with this code we've handled all three scenarios. Let me go ahead
and recompile it properly: make compare, ./compare. And now one and two: x is
less than y. Let me run it again: two and one, x is greater than y. And
lastly, one and one, and x is equal to y. So for the most part, our code's
getting longer. We're up to, like, 21 lines of code, though some of them are
just single characters on the screen. Almost everything else is the same. I'm
using the CS50 library's header file for my get_int function, stdio.h for my
printf function, and the rest of this is just now new syntax for conditionals as well.
Questions, then, on this C implementation of just some basic comparisons like
this? Any questions?
We'd recommend that you keep the curly braces on their own line, if only
because it rather resembles, like, the hugging nature of Scratch's blocks and
just makes clear that they're balanced, open and closed. However, another
common paradigm in some languages and with some programmers is to do something
like this on each of them, so you have the opening curly brace on the same
line, as here. We do not recommend this. This is in vogue in the JavaScript
world and some others, but ultimately, in the real world, it's up to
I'm going to write code agree.c, just to give myself a new tab. I'm going to
start, as always now: include cs50.h, let's include stdio.h, and then let me
do my int main(void), which again, for today's purposes, we'll take at face
value as just copy-paste. And if I just want to get y or n, for instance,
instead of yes or no, we can just use a simpler variable here. How about just
a char, a character, a single character? So I can use get_char to ask the
user, for instance, do you agree, question mark. But, as before, I need to
store this somewhere. So I don't want a string, because it's a single char. I
don't want an int. I just want a char, and it's literally c-h-a-r. And then I
could call this thing anything I want. It's conventional, if you have a simple
program with just a single variable and it's of type char, to call it c; if
it's an int, call it i; if it's a string, call it s. For now I'm just going to
keep it simple and call it c. And now I'm going to ask a question. So if c
equals equals, how about, quote unquote, y, then let me go ahead and print out
agreed, backslash n, as though they agreed to my terms and conditions.
Otherwise, let's see, else if the character equals equals, quote unquote, n,
then let me go ahead and print out, say, not agreed, as though they didn't,
quote unquote. And let's leave it at that, I think, here initially. Now you'll
notice one curiosity, one inconsistency, perhaps. Does anyone want to call it
out, though it's somewhat subtle? I've done something ever so slightly
differently without explaining it yet. Do you see it? Single, single. Yeah, so
I've suddenly used single quotation marks for my single characters and double
quotes for my actual strings of text. This is a necessity in C. When you're
dealing with strings, like strings of text, like someone's name, a sentence, a
paragraph, anything really more than one character, you typically use double
quotes, and indeed you must. When dealing with deliberately single characters,
like I am here for y or n, you must use single quotes instead. Why? Because
that makes sure that the computer knows that it's indeed a char and not a
string. So double quotes are for strings; single quotes are for chars. So with
that said, let
me go ahead and zoom out. Let me go ahead, in my terminal window, and run make
agree. Enter. Seems to work okay. So let me go ahead and do ./agree. Let me go
ahead now and type in y. Here we go. Enter. Huh. Let me try that again. Rerun
./agree. How about no? Enter. Why is it not behaving as I would have expected?
Because of the capital Y? Yeah, I kind of cheated there, and I hit the caps
lock key just as I started typing in input. Why? Because I deliberately wanted
to type in uppercase instead of lowercase, which is kind of reasonable, right?
It's a little obnoxious if you force the user to toggle their caps lock key on
or off when you just need a simple answer. That's not the best user
experience, or UX. But it would work if I cooperated. Let me run this again
without caps lock on. y, lowercase, for yes: ah, that worked. And lowercase
for no: that worked. But how could I get it to work for both? Well, how about
this? Let me go ahead and just add two possibilities. So else if c equals
equals, quote unquote, capital Y, then also do printf agreed, backslash n. And
down here, else if c equals equals, single quote, capital N, then go ahead and
print out again not agreed. Okay, this, I will claim now, is correct, and I'll
do make agree real fast, ./agree, and I'll use capital Y: it now works. I'll
use capital N: it again works. But this is perhaps not the best design. Let me
hide the terminal window and pull this up on the screen all at once. Why might
this arguably not be the best design, even though it's correct? There's
another term of art we can toss here: like, something smells kind of funky
about this code. This is an actual term of art. Like, there's code smell here.
Like, something smells a little off. Why? What do you think? Yeah, there's the
same output again and again. I mean, I manually typed it, but honestly, I
might as well have just copied and pasted most of my original code to do it
again and again for the two capital letters. So if line 10 and 14 are the
same, and line 18 and 22 are the same, and then the rest of these ifs and else
ifs are almost the same, like, there's some code smell there. Like, it's not
well designed. Why? Because if I want to change things now, just like last
week in Scratch, I might have to change my code in multiple places. Copy-paste
is never a good thing. And God forbid I want to add support for yes and no as
full words; it's really going to get long.
So how can we solve this? Well, it turns out we can combine some of these
thoughts. So let me try to improve the yeses first. It turns out, if I delete
that clause, I can actually or things together. In Scratch, there's a couple
of puzzle pieces, if you didn't discover them, that literally have the word or
and the word and on them, which allow you to combine Boolean expressions so
that either this or this is true, or this and this is true. In C, you can't
just say the word or. You instead use two vertical bars, and two vertical bars
together mean or, logically. And so I can say c equals equals, quote unquote,
capital Y: agreed. And now I can get rid of this code down here, and let me go
ahead and say vertical bar twice, c equals equals, quote unquote, N in all
caps. And now my program's, like, you know, roughly a third smaller, which is
good. There's less redundancy. And if I reopen my terminal window, rerun make
agree, ./agree, now I can type little y or big Y, and same thing for lowercase
and uppercase n. Any questions, then, on this syntax, whereby now you can
combine thoughts and just kind of tighten things up? And there'll be other
such tricks, too. Yeah, a really good question: is there not a function to
just ignore the case? Short answer: there is, and we'll see how to do that in
actually just about a week's time. And in other languages there's even more
ways to just canonicalize the user's input, throwing away any space characters
they might have accidentally hit, forcing everything to lowercase. In C, it's
going to be a little more work on our part to do that, but in fact, as early
as next week, we'll see how we can do that. But for now we're comparing,
indeed, just these literal values. Other questions? Really good question.
So we are assuming, with this program and all of my last ones, that the
human's cooperating: when I asked for their name, they typed in David and not
one, two, three; or, in this case, they typed in a single character and not a
full word. So this is one of the features, often, of using a library. So for
instance, if I run agree again and I say something like sure, enter, it
rejects it altogether. Why? Because s-u-r-e is a string of characters; it's
not a single character. Now, I could just say something like x, which is
neither y nor n, of course, but it tolerates that because it's a single
character. But built into CS50's library are some built-in rejections of input
that's not expected. So if you use get_int and the user types in not the
number one or two but cat, c-a-t, it will just prompt them again, prompt them
again. And this is where, too, if you were to do this manually in C, you end
up writing this much code just to check for all of these errors. That's why we
use these training wheels for a few weeks, just to make the code more robust.
But in a few weeks' time, we'll take the library away, and you'll see and
understand how it's indeed doing all that. All right,
so how about this: let's now transition to something a little more
Scratch-like, literally, by creating, how about, another program here called
meow. So meow.c. We won't have any audio capabilities for this one; we'll just
rely on printf. And suppose that I wanted to write a program in C that just
simulates, like, a cat meowing. So I don't need any user input just yet, so
I'm just going to use stdio.h. I'm going to do my usual int main(void) up
here, and then I'm just going to go ahead and do printf, meow, backslash n.
And let's have this cat meow three times, like last week. So I'm going to do
meow, meow, meow. Notice, as an aside, whenever you highlight the lines,
you'll see little dots appear. This is just a visual cue to you, to let you
figure out how many spaces you've indented. VS Code, like a lot of editors,
will automatically indent your code for you. I've not been hitting the space
bar four times every time. I've not even been hitting tab. However, in C, the
convention is indeed to indent lines, where appropriate, by four spaces, so
not three, not five. And these dots help you see things, so that they just
line up, as a matter of good style.
all right so this program I'm just going to stipulate right now is indeed going
to work make meow which is kind of cute and now meow there three times
correct it's meowing three times but of course this is not well designed it
wasn't well designed in scratch last week why what should I be doing
differently yeah yeah it's a perfect like opportunity for a loop why because if
you wanted to change maybe the capitalization of these words you wanted
to change the sound to like woof or a dog or something like
you'd have to change it one two three places and that's just kind of stupid
right in code you should ideally change things in one place so how might I do
that well we could introduce a loop yes but we're going to need another
building block as well that we had in scratch namely those things called
variables so we're call that a variable like in algebra x y z whatever can store
a value for you and a variable in scratch might have looked like this you use
this orange puzzle piece to set a variable of
any name not just X Y or Z but you could call it something more descriptive
like counter and you can set it equal to some value in C the way to do this is
similar to to Spirit to some of the syntax we've seen thus far you start by
saying the name of the variable you want a single equal sign and then the
value you want to initialize it to copying therefore from right to left why
because the equal sign denotes again assignment from right to left this isn't
enough though you might have the intuition already what's
missing probably from this line of code just to create a variable so we need
int to make sure the computer knows that this is indeed an int and then
lastly semicolon as well and that now completes the thought so a little more
annoying than scratch but we're starting to see patterns here so not every
piece of syntax will be new all right if you wanted to increment the counter
by one scratch uses the verb change and they mean add the value to
counter so if I want to increment an existing variable
called counter this syntax is a little more interesting it turns out the code
looks like this which almost seems like a paradox like how can counter equal
counter plus one like that's not how math works but again a single equal sign
is assignment from right to left so this is saying take whatever the value of
counter is add one to it and copy that value from right to left into counter
itself you still need the semicolon but I claim you do not need to mention the
keyword int when updating an existing
variable so only when you create a variable in C do you use the word string
or the word int or any of the others we'll eventually see only when creating it
or initializing it for the first time thereafter if you want to change it it just
exists it's the word you gave it the computer's smart enough to at least
remember what type it is so this line is now complete turns out in code as
we'll see it's pretty common to want to add things together increase
increment things by one so there's actually
different syntax for the same idea the term of art here is syntactic sugar like
there's often in code many ways to do the same thing even though at the
end of the day they have exactly the same functionality so for instance if after
a few days of cs50 you find this a little tedious to keep typing in some
program you can simplify it to just this this is the syntactic sugar you can use
plus equals and only mention the variable name once on the left and it just
knows that that means the previous thing it's
just slightly more succinct this too is such a common thing to add
one to a value and it doesn't have to be one but in this case it is but if it is
indeed one you can further tighten the code up to just do this counter plus plus
so anytime in C you see plus plus it means literally adding one to that
particular variable there's other ways to do this in the other direction if you
want to subtract one from a variable you can use any of the previous syntax
using a minus sign instead of plus or you can more succinctly write counter minus minus
this is kind of ridiculous right like we went from two super simple puzzle
pieces like this to my God like it's 1 2 3 four five six lines of code all of which
are pretty involved so like that escalated quickly but what's each line doing
and we'll see other ways to do this more simply so we're initializing a variable
called counter to three just like before why well what does it mean to Loop or
to repeat something three times well it's kind of like doing something three
times and then do it and then count down and
then do it and then count down and then do it until you're all out of counts so
this is declaring a variable called counter setting it equal to three then I'm
inducing a loop in C which is similar in spirit to repeat three but you have to
do more of the math yourself so I'm asking the question in parentheses while
counter is greater than zero what do I want to do well per the indentation
inside the curly braces I want to meow one time and then to be clear what's
this last line of code doing if counter
starts off at three this makes it two by subtracting one from it then what
happens by nature of a loop just like in scratch it kind of knows to go back
and forth even though there's a nice pretty arrow in scratch and there isn't
here C knows to do this again and again and again constantly asking this
question and then updating this value at the end so if I highlight just a few of
these steps the variable starts off at three and actually let me simplify too I
claimed earlier that when using
single variables people very often just call it i for int or c for char or s for
string unless you have multiple variables so let me tighten the code up and
this already makes it look a little more tolerable let me actually tighten it up
further and one more step so now this is about as tight and as succinct as you
can make this code at the moment so what's actually going to happen here
well the first line of code executes and that initializes I to 3 then we check
the condition while I is greater than
zero is i greater than zero well per my three fingers obviously so we print out
meow on the screen then we subtract one from I at which point now we have
two as the value of I then the code goes back to the condition and notice the
condition there is in parentheses that's another Boolean expression so Loops
can use Boolean Expressions just like conditionals use Boolean Expressions
to make decisions the loop though is deciding not whether to do this thing
or that but whether to do the same thing
again and again and again and as it ticks through the code one line after the
other it's ultimately going to get down to one and then zero and then stop
so put another way I came with some props here so suppose this bowl here
is your variable and you initialize it to three with like three stress balls you
can do something three times right if I want to give out three stress balls
here's your chance for free stress ball without having to answer any
questions any okay there we go so here we go
subtracting one from my variable I'm left with two uh oh my God all right uh
don't tell Sanders oh I'm sorry oh okay that ended poorly apologies all right
but now the educational point though is that my variable has been uh
decremented further to just have I'm not throwing that far again I can't do
this in here we go all right here we go and one final subtraction and now our
variable is left so we have three stress balls there and that's all a variable is
right it's some
questions on Loops all right so it turns out this is kind of ugly and like this
really starts to take the fun out of programming uh when you have to like
write out this uh sequence of steps so it turns out there's other ways to do
this but first let's see logically how else you might Express this because it's a
little weird that we keep using zero so the one other way to do this would be
to invert the logic you could absolutely start with your variable call it I equal
to one and then you ask the question is
I less than or equal to three and notice a bit of new syntax here on your uh
typical keyboard there is no less than or equal sign or greater than or equal
sign like you would write in math class with one over the other and so in C
you use two characters less than followed by an equal sign or if appropriate
greater than followed by an equal sign and that logically captures that idea
so notice that I'm kind of changing my questions I'm initializing I to one and
then I'm going to increment It ultimately to two
and then three but because I'm doing less than or equal to it's still going to
go from 1 to three so that works too we could similarly do this yet another
way we could initialize i to zero and then we could say while i is less than
three and keep incrementing it and I'd say this last form is actually the
most canonical like it might be the most humanlike to think in terms of 1 to
three it might be the most stress ball like to think in terms of three to zero
counting down but typically the go-to
syntax for most programmers once you get comfortable counting from zero
is to always start counting from zero and count up to less than the value
you're counting up to so why would it be incorrect to change this to less than
or equal to three here what would happen if I changed the less than to less
than or equal to it'll meow twice yeah it'll meow an extra a fourth time in fact
in total right because you'll start at zero then one then two then three and
three less than or equal to three will give you
the fourth time so we do want it indeed to be just a single less than
so now that we have those options let me just give you one other and this
one takes a little more getting used to as well but it's probably the more
common way to write this let me go ahead and propose that we implement
this as follows let me go back to my code here let me go into my several
printfs getting rid of all but one of them ultimately and let's implement this in
code so let's do int i gets zero
how about then while i is less than three then let's go ahead and say printf
quote unquote meow backslash n and then we have to do i
minus minus or plus plus so plus plus because we're starting at zero and
going up to but not through three so let me go ahead now and make meow
after clearing my terminal ./meow and it's still just as correct but
it's a little better designed why because now if I want to change it
from three to 30 times for instance I can change it there I can
recompile my code I can do ./meow and done I don't have to copy and paste
it 27 more times to get that effect and I can even change what the word is by
changing it in just one location but it turns out there's other ways to do this
too and let me propose that we introduce you to what's called a for loop as
well so if you want to repeat something three times you can absolutely take
the while loop approach that we just saw or you can do this and this one
takes a little more getting used to but
it kind of consolidates into one line all of the same logic so notice we have
the keyword for here and for is just a preposition in this case that
generally implies here comes a loop inside of the parentheses here is not just a
Boolean expression and this is where things get a little weird there's three
things one to the left of the first semicolon one in the middle of the two semicolons and one to
the right of the second semicolon this is really the only other context we'll see
semicolons and it's weird normally it's
been at the end of the line now it's two of them in the middle of the line but
this is the way humans decided years ago to do it so what is this doing
almost the same thing it is going to initialize a variable called i to zero it's
going to then check if it's less than three it's then going to do whatever's in
the curly braces and it's lastly going to increment i and repeat so just
highlighting in turn at first I is initialized to zero just like before then this
condition is checked this is a
Boolean expression yes or no true or false will be its answer and if I is less
than three which it should be once it starts at zero well then we're going to
go ahead and print out meow then I is going to get incremented so it starts
at zero it goes now to one at that point the Boolean expression is checked
again so you don't keep changing I back to zero that first step happens only
once but now you repeat through those three other highlights I check if I is
less than three it is so I print out meow it
then increments I I check if I now two is less than three it is I print out meow I
gets incremented I now check is I less than three no it's not because three is
not less than three and so the whole thing stops and whatever code is below
this curly brace if any starts executing instead just like in scratch you break
out of the loop and the puzzle piece being hugged questions then about
this alternative syntax for loops AKA a for loop sorry say again yeah can I
explain again why it doesn't reset to zero honestly just
because like this was the syntax they chose this first part before the first
semicolon is only executed once just because that's how it's designed
everything else cycles again and again and this is just an alternative syntax
to using slightly more lines of code it was like six lines of code using
the while loop logically it's the same thing programmers once they get more
comfortable tend to prefer this because it just expresses all your same
thoughts more succinctly that's all
let's just work this into my meow example let me go back to the code here
and notice indeed if I highlight all these lines I think we can tighten this up
let me get rid of all of those and instead do for int i equals 0 and I'm saying
equals most programmers would say gets so int i gets zero means
assignment that's the word gets now I'm going to do i is less than three i plus plus now
in here I'm going to do my printf quote unquote meow backslash n and so it's
indeed a little tighter I mean two of the lines
are just curly braces there's really only two juicy lines of code now let me go
ahead and do make meow ./meow and again we're back in business with three
of them printing only all right there's one last structure we should explore
just cuz it's sometimes useful this was a forever block and this would be a
little weird in scratch to just say meow Forever at least without waiting but
there is indeed a forever Block in scratch which means do the following
forever and I proposed I think verbally last week at
least one example where this is useful meowing forever little annoying but
can you think of common cases where you might want to write code or use a
program that Loops forever yeah yeah playing music like Spotify playlist just
repeating again and again would be some kind of loop for collisions checking
for collisions in scratch so seeing if something's bouncing off the wall or
another Sprite yeah checking for input so yeah get string is essentially just
waiting there forever for me to type in some input
until I do the time checking the time and actually maintaining like human
time like a wall clock behind you was that the same okay checking the time
and one more detecting a key press too like in scratch just waiting for some
kind of event to happen just like on a phone or a browser and so there's so
many examples where you might want to do something forever just so
you've seen the corresponding C building block it's a little weird but this is
probably the most canonical way to do it in C if you
want to print meow forever which would be a little crazy because it would literally
print and take over your computer printing meow forever you would
generally do it like this why well a while loop expects in parentheses a
Boolean expression and a Boolean expression is again a yes no a true false
question but if you want the answer to that question always to be yes or
really always to be true turns out in C in a lot of languages well then just say
true because true T-R-U-E is never going to change magically to false I mean it's
was zero and one nowadays you could say true or false but true and false are
themselves special words that you have to include and it turns out if you
want to use special Boolean values like this there's another header file we
haven't seen called stdbool.h that essentially creates true and false as
keywords alternatively cs50.h includes that same file so it's more common in
cs50 to see it like this now if I clear my terminal window and do make meow
and then do ./meow and hit enter well unfortunately
this isn't the best thing to do infinitely when you're in the cloud using a
browser this is indeed a browser full screened here this means I'm
sending millions of meows over the internet to my computer here so this
will happen to you at some point probably not with meow but you'll lose
control over your terminal window why because you screwed up and like you
have an infinite Loop you didn't really intend it or maybe you did you were
curious to see what happens what do you
do like when when does the meowing stop what recourse do we have here all
right well control C will be your friend sometimes you have to hit it a bunch in
a cloud environment but control C for cancel will interrupt a program
that's running and I promise that almost all of you will at some point
accidentally introduce an infinite loop because your math is slightly off
when in doubt click in the terminal window and hit control C sometimes
multiple times and that will indeed cancel whatever is happening there in
this case
I might have intended it but sometimes it's not in fact intended
we've been taking for granted this whole graphical user interface for some
time and indeed uh the uh commands that I'm typing and the buttons I'm
clicking and let me just give you a better sense of what it is we are using
underneath the hood this whole time um namely an operating system called
Linux so I keep alluding verbally of course to Macs and PCs because almost
all of us are running Mac OS or Windows on our desktops or
laptops nowadays but there's lots of other operating systems out there and
one of the most popular ones is called Linux and Linux is very often used on
servers nowadays companies that host email companies that host websites
or apps more generally um certain computer scientists or computer science
students often like to brag that they run Linux just because that's a thing um
but it is really just an alternative to Mac OS or windows that provides you
with both a GUI if you want it but also and
especially a command line environment now fun fact Windows and Mac OS
do have terminal windows or the equivalent thereof and eventually you
might uh use it on your own Mac or PC to solve some problem but Linux is
really known for along with other operating systems its command line
environment which again I distinguished earlier from GUI as a command line
interface or CLI and that refers really to the terminal window so if I go back
to VS Code here and let me in fact go ahead and close my tab and
focus entirely on the terminal window this terminal window is really just your
command line interface to your very own server in the cloud the term of art
here is you each will have your own container in the cloud which is like your
own computer running somewhere on the internet with your own username
and password to which you have access and your own hard drive if you will
your own home folder that has all of your files for the class and it's only
accessible to you unless you enable live sharing
thereof so when you're typing commands here it looks like you're typing
them of course on your own Mac or PC but they're actually being sent over
the browser to uh some server in the cloud where you are controlling really
your own account therein so it turns out that there are other commands that
are worth knowing and we'll give you just a few of these today and over the
coming weeks you will have opportunities to play with others as well but these
are kind of some of the basics and they're all incredibly
succinct because indeed for things you're typing at the command line
humans generally have not wanted to type out long commands so a lot of
these are abbreviations here now perhaps the most common one I'll start
with first is ls a lowercase l and a lowercase s that stands for succinctly list
so if I go to my terminal window now where up until now I've only typed code
which is a VS Code thing for creating and opening files and make which
triggers the compilation of my code what if I now type ls
this will list all of the files in my current folder my hard drive in the cloud if
you will so if I hit enter you'll see a whole bunch of results now they're color-
coded too the white ones here ending in .c those are the source code files I've
written during class today agree.c compare.c hello.c and meow.c and you
can perhaps guess the green ones here that just by convention have an
asterisk on the end to denote that they're special represent what what are
the four others yeah yeah the machine code so those are
my actual programs that are identically named minus the C extension and
the asterisk means that they're executable that is in the world of Mac OS or
Windows you would double click but in the world of a command line
environment that means you do dot slash and then the name without the
asterisk to execute or run the code therein so if I open up my file explorer
and I'm hitting command B on my computer here just as a keyboard shortcut
you'll see the exact same thing so ls is the command line interface for
listing the files in your account but here because I'm using vs code or any
program like it I also get a graphical user interface as well so it's just two
different places to be you're welcome to use whatever you're comfortable
with but over time you will naturally get more comfortable and capable with
the terminal window alone well what else is on this list here well during
the break I saw that at least one of you for instance had created a file called
hello instead of hello.c so you were in a
situation where you did this accidentally and hit enter and then you went
ahead and typed in all of your code like this and then down in your terminal
window you were trying to do make hello enter and this now didn't actually
do anything like I can't I'm trying to run the command I got
permission denied as at least one of you did now why is that well let's just do
a quick check if I do ls I see now hello but hello has no asterisk next to it
which means it's not executable that's
my code why well notice the top of my tab confirms oh I screwed up I didn't
name my file hello.c which it just has to be so what do you do well you could
very hackishly like copy this create a new file paste it in or no no no like we
know how to rename things now here because that's one of our options let
me do this let me do mv for move hello and then hello.c and hit enter you'll
see the tab closes cuz hello no longer exists but if I now type ls you'll
see ah there is hello.c and if I open
that file now there's all of my same code and now if I do make hello make
hello now I do get an executable file wherein the world is restored so mv is
just a command not just for renaming but it also turns out eventually for
moving files as well you can also create directories or folders so for instance
if I go into vs code again and suppose I hover over here and click not on the
plus file icon but plus folder I can create a folder called for instance like
pset1 for problem set one in the
class and you'll see now that it's empty cuz all of my other files are in the
default folder of my account but I could also go in there like this and I could
click on file and now I can create a new file called like mario.c which is
one of the first problems for instance but you'll notice now that mario.c is
inside of the pset1 folder so if I zoom out and I type ls at my terminal
window I won't see mario.c anywhere but I do see a pset1 folder and it's
in light blue followed by a slash which you don't
have to type it just indicates that's a folder now I can visually at top left
obviously see pset1 contains mario.c but if I try to do something like make
mario here no rule to make target mario like it just doesn't seem to exist and
that's because you're in the wrong directory so in a command line interface
it's not quite as simple as just clicking on a folder and voila it opens you have
to change into the directory or folder and cd is going to be the command
there so if I want to actually change
into that directory I can do cd space pset1 enter and now you'll see my
prompt changes and this is just a common convention but it's not the only
one out there now I still have a dollar sign which indicates where I can type
commands but before it I see a reminder constantly what folder I'm in and
we put that there deliberately like a lot of Linux users do just to remind
themselves where they are cuz unlike Mac OS where you or Windows where
you have a nice big window telling you where you are at the
command line you kind of need to be reminded textually but now if I type ls
and hit enter what should I see yeah mario.c and now if I want to open it or if
I want to actually compile it I can run make mario in this directory once I
actually type out all the code rest assured that in problem sets and labs we
almost always certainly in the first weeks of the class give you exactly the
commands to type odds are because it's new to many of you you will
accidentally type the wrong commands no big deal just remember that
you have different ways to solve these problems you've got like the graphical
file explorer which should feel a little more familiar but in time you'll start to
know and honestly probably prefer commands like these so cd for change
directory cp for copy a file ls for list mkdir to make a directory create a
new folder at the command line instead of with the button mv for move or
rename rm for remove so be careful with that one rmdir for remove directory
and there's dozens hundreds of other commands you
won't need many of them but we'll start to scratch the surface all the more
over time but ultimately this command line interface is going to be a more
powerful mechanism a more capable mechanism and ultimately a more
efficient mechanism for writing code running commands uh solving problems
analyzing data more generally even though there's going to be
some growing pains early on just because it's probably so new for many of
you so with that said we have some problems still to solve but we
how you go about approaching it when it's not obvious what the point of the
exercise is so one of my favorite games from yester year is this one here
Super Mario Brothers that has come in so many different forms since but in
this original two-dimensional side-scroller game there was a lot of artwork
like this so for instance up here in the sky were four question marks and we'll
find that in C in a lot of programming languages initially it's a lot easier a lot
more accessible to focus really on
with in problem set one indeed in problem set one you'll be challenged to
build a little something like this albeit with hashtags for ASCII art instead
of graphics and in mario.c I want to just solve this simple problem first so it's
all involving output so I'll do include stdio.h so I can use printf I'll do my
int main void more on why we keep doing that in future weeks and I'm just
going to do something simple initially like printf of one two three four question marks backslash n this is about the
simplest way I can
implement four question marks in the sky like these here using pure text like
this so let me go ahead and do make mario ./mario and voila we have those
four question marks but we've seen of course that there are better ways to
do this and if you wanted to generalize this to be five question marks six 60
different question marks you know a loop was always the answer for not
repeating ourselves so maybe I should rewrite this a little bit more flexibly
and say something like this for int i gets zero
i less than four i plus plus and then inside of the for loop now I can just do a
single question mark but I don't think what I've just done is correct anyone
spot the aesthetic bug already yeah why is this wrong if I want to
print the same thing yeah yeah so I don't think I want the backslash
n after every question mark because the goal is again this like row of
question marks in the sky so if I now recompile this make mario ./mario okay
it's almost there but
now I have that regression to where the dollar sign's not on its own line so I
think I need a new line but I don't think I want it here cuz that was not going
to end well where do I want it instead any Instinct yeah yeah so outside the
for Loop so indeed I can just go below line eight and above line nine creating
a new one and now it's totally fine to just print a new line like that
you don't have to print anything else with it it's indeed a character unto itself
so let's do make mario one last time ./mario okay so now we're back in
business there well what if we wanted to do some other scene from Mario uh
such as this one here where there's a lot of vertical obstacles like these
bricks here if I wanted to print out now a column of three bricks and I'll use
hashtags for these instead of anything graphical well I think we're almost
there right like I think I can now it's almost maybe a little easier I can go back
here change the question mark to something that
looks more like a brick like this hash symbol and I think now I do want the
new line character because when I now do make mario ./mario okay there's my
wall of four oh but wait I didn't want four I wanted to be consistent just with
this particular scene here so I just want three so I can still change it in one
place and here again is that Paradigm even whether you're using four or
three if you get into the habit of starting counting from zero you go on up to
but not through the value you want to count
up to so that's why I'm using less than instead of less than or equals to there
so this would be the common Paradigm though you could certainly count it
like we saw earlier in different ways but what if things escalate one level
further and when you're in the underground version of Super Mario Brothers
there's a lot of these underground obstructions including like grids of bricks
like this and let me conjecture that if you slice this up it's roughly a 3X3 grid
of bricks that
all interlock uh prettily to give us just one big large brick like this so if I want
to print out a 3X3 grid now things are getting a little more interesting
because up until now I printed either one row horizontally or one column
vertically but we haven't really seen any code where I'm sort of printing or
living in two different dimensions like the game would imply but let me
propose that we could do this let me go ahead and say all right suppose I
want to print a 3X3 grid of bricks it's
really that I want to print what three rows of bricks like a grid is three rows so
if I take the high-level idea and reduce it to something a little simpler how do I
do that well let me get rid of the printf for a moment as I did and let me just
stipulate that this for Loop even though it doesn't do anything useful yet will
do something how many times just by Design all right three times right this
for Loop is good to go it will do something three times by just using I to do
the counting all right well if I want
to print out now a row of three bricks all on the same line that's pretty similar
to what we did earlier when I just wanted to print out four question marks in
the sky so we've kind of seen a solution there and I dare say we can
compose one into the other so if I want to print out a row of bricks I could
just do this for int i gets zero i less than three i++ and then inside of this
inner loop if you will let me print out a single brick like this and then I don't
like where this is going yet but I
think I've taken two ideas and I've combined them but what might be
problematic about lines five and seven at the moment what might be bad
here yeah in back yeah I'm using the same integer i which I feel like
could get me into trouble right if I'm sort of trying to count three things here
but then I'm hijacking this variable and using it inside of the loop like I feel
like I should avoid this this Collision of names and so what's a good
alternative to I well a programmer if nesting Loops
in this way would pretty commonly go with J you could certainly change this
to be like rows and columns if you want more descriptive variables but I and J
is pretty canonical so I'm going to go ahead and do this j++ instead of i++
everywhere and let me try compiling this so make mario enter ./mario okay so
a couple of things are wrong here this is not a 3X3 grid but if you count these
things how many did I indeed print at least you can probably just guess logically
yeah there's nine hashes there
unfortunately they're all on the same line instead of on three different lines
so where logically can I fix this I'm definitely printing all the bricks they're
just not on the right levels yeah yeah so put a new line after the first Loop
this inner loop if you will the nested Loop if you will so let me go ahead and
print out just a back slash n here and what's this doing well I think that's
going to solve it by just moving the cursor to the next line after you've done
one row so let me go ahead and do
make mario enter ./mario and now we're in business so it's a very simplistic
version of the same graphic but I'm leveraging two different ideas now the
same or the same idea twice rather now I'm using one Loop to kind of control
my cursor going row by row by row but then within that Loop I'm doing left to
right do do dot dot dot with printing out each of these individual bricks like
this now there's a little sloppiness here still like if I want this to always be a
square just because that's what it looks
like in the game well I could change it to be a 4x4 uh Square by doing this or
a 5x5 grid whoops by doing this why is this perhaps not the best design to
just keep changing the numbers when I want to change the size where could
this go awry yeah yeah if it's always going to be a square and height is going
to be the same as width I'm just inviting trouble here right eventually I'm
going to screw up I'm going to change one but not the other then it's going
to come out to be a rectangle instead of a proper Square
so I should probably solve this a little differently so let me do that at the top
of my main function here let me go ahead and give myself a variable called
maybe uh n for the number of bricks I want horizontally and vertically and I'll
just initialize that to three initially and instead of putting three here I'll
literally just use n but I'll do it in both places so that now henceforth if I ever
want to change this and change it to four or five or anything else like I'm all
done like it's better designed
instead of just declaring a simple variable like we did in scratch I can further
Harden my code so to speak by declaring it to be a constant using the
keyword const now this is just a feature of c and some other languages to
protect you against Yourself by proactively saying N is a constant specifically
the number five or previously the number three you cannot accidentally
write code elsewhere that changes it the computer will throw an error and
catch that error so it's just a way of programming a
little more defensively um some languages have this some languages don't
but in general it's a good practice it makes your code better designed
because it just is less vulnerable to Mistakes by you colleagues or anyone
else using the code so let me change this back to three just to be our default
but now I'm using n in both places and if I do make mario ./mario we're back
to where we originally started but the code's a little better designed and
let me note this too all this time I've been
mentioning that uh correctness is important design is important there
was also this matter of style I've been very deliberately writing pretty code if
you will not just the syntax highlighting which is automatic but notice that I
keep indenting everything nicely anytime I have curly braces like on lines 4
and 14 everything is indented one level when I have additional curly braces
on line 7 and 13 everything is nicely indented uh as well technically speaking
the computer does not care
about that kind of white space so to speak and you could really make a mess
of things like this because you have a strange sense of style or just because
you're being a little sloppy but this code is actually still correct if I recompile
it let me open up my terminal window make uh mario no errors ./mario it works
perfectly fine but you can imagine just how annoying this now is to read like
certainly for a TA but certainly for you the next day certainly for a
colleague who has to read your code this
is just bad style like it still works and it's well designed in that like you're uh
writing code defensively you're using a constant but my God the style is
atrocious now you'll often find that there's tools that can help you format
your code for you in a manner consistent with a course's or a company's uh
style but this is the kind of muscle memory you'll want to develop over time
to take vs code suggestions as it's outputting lines of code for you because
it's trying to format your code in a
readable way and oh my God if and when you do have bugs in your code and
things aren't even indented properly there's no way you the human are going
to be able to wrap your mind around what's happening and where like you're
just making the problem harder for yourself so do get into this habit too of
manifesting good style as well all right well let me propose that we don't only
want a 3X3 grid we want this to be a little more Dynamic so suppose we
moved away from a constant to just using an
integer called n and let's ask the user for the size of this grid as by
prompting them with get int as we've done before and I'll store it in N here
and then I can go ahead and more dynamically run make Mario to compile it
whoops oh I screwed up accidentally what is it suggesting I do albeit
cryptically yeah I forgot to include the cs50 header file up top and that's why
it doesn't know that get int is in fact valid so that's an easy fix uh I'm just
going to go up here and include
cs50.h now I'm going to clear my terminal and rerun make Mario now we're
good ./mario and now notice I'm prompted for size so if I type in three it's the
same as before if I type in 10 it's even bigger but it happens all now
automatically but there are some things that we're not detecting for instance
suppose I type in cat well that's handled by the get int function as I claimed
earlier that's one of the features of using a library you don't have to deal
with erroneous input but we
only designed a function called get int to get you an integer we don't know if
you want it to be positive negative zero or some combination thereof and it's
kind of weird to allow the user to type in like negative one for the size of the
grid or you know negative 3 for the size of the grid and indeed your code does
nothing so at least it's not crashing but that's kind of stupid right like it'd be
nice to force the user if they want a grid to give us a positive value so how
could we
do this well I could go up here and I could say something like if n is less than
one so if it's zero or negative which I don't want what could I do well I could
say well prompt the user again for the size and now notice I'm not declaring
n again because once it exists you don't have to mention the data type
again we said that earlier but this is kind of stupid why because now when
you've given the user a second chance okay now maybe I'll do all right if this
version of n is less than one well let's
just go and prompt the user a third time I mean you can kind of see where
this is stupidly going like this can't be the right solution to keep typing
recursively the same thing again and again like where would it stop you'd
have to give them a finite number of chances or just you know make a mess
of your code so what would be intuitively a better solution here yeah so
some kind of loop we've seen a while loop we've seen a for loop so maybe
one of those so let me try this let me delete this messiness and just go
back to the first question and let me do this so while n is less than one so
while the number is not what we want let's just prompt the user in a loop this
time for the size again now here too this is better because it's only two
requests for information but clearly lines six and nine are pretty much identical
other than the int and if I went in and changed the size you know uh if I add
this if I change the wording here change it to a different language like I have
to change it in two places that's bad
copy paste bad so what might be better well it turns out there's another
Paradigm in C that you can use that gets around this problem this duplication
of code it would be much nicer if I just write this code once and I can do that
using a third type of loop called a do while loop so it turns out in C you can
do this if you want to get the value of a variable like n first just declare create
the variable without an initial value so int n semicolon means we don't
know what value it has yet but
that's okay we're going to add a value to it eventually then I'm going to say
this do literally I'm going to open my curly braces and what do I want to do I
want to assign to n the return value of get int prompting the user for size well
when do you want to do that I want to do that while n is less than one and
this code now achieves the exact same goal but by never repeating myself
why well notice on these lines of code now I'm literally saying on line six give
me a variable called n of type integer it
doesn't have a value initially but that's fine you can do that line seven says
do the following what do you want to do get int prompting the user with the
word size and just store that value in N but because code runs top to bottom
left to right now it's reasonable on line 11 to ask that question okay is the
current value of n which it definitely got on line eight less than one and if the
user didn't cooperate they typed in zero or negative 1 or negative 3 what's going to
happen it's going to go back up here and repeat
repeat repeat everything in the do while loop so a do while loop in C which is
not something some other languages have like python if you know it does
not have a do while loop this is perhaps the cleanest way to achieve this
even though it's a little weird that you have to declare your variable create
your variable up top and then check it down below but other wise it's similar
to a while loop it just flips the order in which you're asking the question any
questions on this construct and do while in
general is super useful when you want to get input from the user and make
sure it meets certain requirements all right so now that we have this building
block after that interlude how can I go about cleaning up this code and then
let's conclude by taking a look at things that our code can't do or can't do
very well or correctly let me propose that in a final version of Mario let me
just add what are called now some comments so it turns out in code in C you
can Define what are called comments which are just
notes to self some of you discovered these in scratch there's little yellow
sticky notes you can use to add citations or explanations in C there's a
couple of ways to write comments and in general comments are notes for
yourself for your ta for your colleague as to what your code is doing and why
or how it's a little explanatory note in English or whatever your human
language might be so for instance what I might do here in my
implementation of this version of Mario I might first ask
myself a question like I might first make a note to self like this on a new line
above this first block of code uh get size of grid it's just an explanatory
remark in terse English that generally explains the next six or so lines the
next chunk or block of code if you will it would be a little excessive to
comment every single line at some point the programmer should know what
individual lines of code do but it's nice to be able to kind of glance at
this comment on line six that starts
with two slashes and it gets grayed out because of syntax highlighting it's not
logic it's just a note to self it generally gives me a little cheat sheet as to
what the following lines of code should be doing and or why and then down
here well there's a second block of code that's a bunch of lines but together
this just prints a grid of bricks and so it's another comment to myself
that just makes it a little more understandable what these 20 some odd lines
of code are doing by adding some
English explanations thereof but now that I have these you know wouldn't it
be nice if I could kind of abstract these pieces of functionality away this
getting of the size and this printing of the Grid in other words suppose that
you didn't know where to begin with this problem and the problem at hand
were literally Implement a program that prints a grid of bricks of some
variable size three or four or five or whatever the human types in if you have
really no idea where to start comments are
massive placeholders there like I still have work to be done but at least I
have a high-level solution to the problem in comments and now I can even go
this far I could say well let's suppose that there's just a function already that
exists called get size I could do something like this I could do int n equals get
size and now I just have to assume for the moment that some abstraction
called get size exists it doesn't this does not come with the CS50 library but I
could invent it I bet how else might I proceed well let's
just assume for the moment that there's also a function called print grid that
just prints a grid of that size n so here too is an abstraction these puzzle
pieces don't exist these functions don't yet exist but in C just like in scratch I
can create my own functions how do I do that well let me go down later in
the file and by convention you generally want to leave main at the top of
your code why because it's the main function and it's just where the human
eye is going to look to
see what some file of code does and let me do this I want to create a
function of my own called get size whose purpose in life is to get the size
that the user wants I want this function to return an integer and the Syntax
for doing that is this write similar to a variable the data type that this
function returns I don't need this function to take any inputs and so I'm going
to use a new keyword that we've actually been using thus far more on it
another time just called void which just means this get
size function does not take any inputs it does have an output it outputs an
INT and this is just a weird order in which you write it you write the output
format the name of the function and then the inputs if any inside of
parentheses and now I can Implement get size but I've already implemented
get size or at least now at this point in the story I at least know concretely
what to do and I could figure out eventually with some trial and error perhaps
all right if I declare a variable and I do the
following n equals get int prompting the user for size and I keep doing that
while n is less than one once that block of code is done here is a new
keyword in C where you can return that value n so I keep referring to these
values that some functions return as return values in C there's literally a
keyword called return that will hand back to any function that uses that
function the value in question so in a nutshell between lines 15 and 21 now
here is some code identical to our solution earlier
that gets a value n from the user that is positive it's one or two or higher it's
not zero or it's not less than one and as soon as we've got that value we
hand it back as a return value notice how I'm using this function on line
seven just like with get int just like with get string I'm calling the function
nothing in the parenthesis in this case but then I'm using the assignment
operator to copy whatever its return value is into my variable n and so now I
have a function that didn't used to
exist called get size that gets me a positive integer no matter what and now
for the grid how do I do this how do I invent a function called print grid that
takes a single argument a number and prints a grid of that size well let's go
down here I'm going to write the name of this function print grid this function
just needs to print it has a side effect as we keep saying so I'm just going to
say it has no return value it's just void it doesn't have an output per se it's
just an aesthetic side effect but
it does take an argument an argument is an input and the syntax for this
in C is to name the type of the input it takes and the name of the variable
and I could call this anything I want I'll call it size I could call it n and it's okay
to use the same variable in different functions but I'll call it size just to be
distinct and then in this function I'm just going to copy from memory the
same code as before for int i gets zero i less than size instead of three i++
inside of this for uh int j gets zero j
is less than size j++ and inside of that print out with printf a single
hash print out after that Loop a single new line and that's it now I did this
fast admittedly but it's the same code that I wrote earlier but now just like I
did with scratch let me just arbitrarily hit enter a bunch of times to like move
the code out of sight out of mind now I have abstractions I have puzzle
pieces that now exist called get size and print grid Syntax for which takes
some getting used to but they now just exist except I
I could do this I could all right fine well let me just kind of highlight all of this
cut with to my clipboard and paste it up here this would solve the problem I
could just move all of those functions at the top of my file that's kind of
annoying because now main is like the bottom of the file you're not it's going
to take longer to find it it's just that's not a clean solution so let me put it
back where it was at the bottom and let me do this this is the only time in
cs50 and really in C
programming where copy paste is reasonable if you copy and paste the first
line of code from each function and then end it with a semicolon you can
tease the compiler by giving it just enough of a hint at the top of the file that
okay these functions don't exist till down later but here's a hint that they will
exist this is how you can uh convince the compiler to trust you so those other
functions can still be lower in the file below main but now when I do make
Mario oh damn it oh I said print
instead of print F that's my bad print f so if I do make Mario Mario now I can
type in three and we're back in business now this was a very heavy-handed
way and long way to get to a much more complicated solution but this
solution in some sense is better designed why because now especially
without the comments I mean look how short my code is my main function is
literally two lines of code why well I kind of factored out the juicy stuff into its
own functions and now especially if I'm working with
we go ahead and use these in a very simple program and make our very own
calculator so let me go over here here to vs code let me go ahead and create
a new file called calculator. C and in this file let's go ahead and first include a
couple of now familiar header files cs50.h as well as standard i.h let's go
ahead then and declare main with int main void and then inside of main let's
do something relatively simple let's declare an INT and call it X and set it
equal to whatever the return value is of
get int prompting the user for a value for x let's then give ourselves a second
variable we'll call it say y set that equal to the return value of another call to
get int prompting the user this time for that value Y and then let's very
simply go ahead at the very end and just print out say the sum of X Plus y a
super simple calculator so I'll use print F quote unquote percent I for integer
backslash n to give me the new line then I'm going to go ahead and do x + y to
indeed print out the sum let me go
same for y and of course now the answer of 2 billion plus 2 billion should
of course be 4 billion and yet it's not so curiously we see of all things a
negative number here which suggests that somehow the plus operator
doesn't quite work as well as we might like now why might this actually be
well it turns out that inside of your computer is of course memory or Ram
random access memory and depending on the size of your computer and the
type of computer it might very well look a little something
like this a little circuit board with these black little modules on it that actually
contain all of the bytes of your computer's memory unfortunately you and I
only have a finite amount of this memory inside of our computers which
means no matter how high we want to count there's ultimately going to be a
limitation on how high we can count because we only have a finite amount of
memory we don't have an infinite number of zeros and ones to play with we
have to actually be bounded ultimately so
what's the implication of this well it turns out that computers typically use as
many as 32 bits zeros or ones to represent something like an integer or in C
an int so for instance the smallest number we could represent
using 32 bits of course would be zero 32 zeros like this here and
the biggest number we could represent is by changing all of those zeros to
ones which in this case will ideally give us a number that equals roughly 4
billion in total it's actually 4,294,967,295
maximally if you set all 32 of those bits to ones and then do out the
actual math the catch though is that we humans and computers in general also
sometimes want to and need to be able to represent negative numbers so if
you want to represent negative numbers as well as positive numbers and zero
you can't really just start counting at zero and go all the way up to roughly
four billion you got to kind of split the difference and maybe allocate half of
those patterns of zeros and ones to negative numbers and the other half
convey where we'd like to uh put an additional bit ultimately if this of course
is zero per week zero's discussion this is 1 2 3 4 5 6 7 now ideally in binary if
you want to add one more to this value seven you're going to have to carry
the one mathematically and that would ideally give you 1 0 0 0 but if you
don't have four bits and your computer's only sophisticated enough to have
three bits not even 32 but three the implication is that you're effectively
representing not 1 0 0 0 but
rather 0 0 0 there's just no room to store that fourth bit that I've grayed out here
which is to say that your integer might overflow and as soon as you get to
seven the next number once you add one is actually going to be zero or
worse as we've seen here in my code a negative value instead so what could
we do to perhaps address this kind of concern well C does not have just
integers or ints it also has longs which as the name suggests are just longer
integers which means they have more bits available to
them so let me go back into my code here I'll clear the terminal window and
let me go ahead and change my integers to literally long here long here I'm
going to have to change my function in uh CS50's library to be not get int but
get long and that's indeed another function we provide in the library let me
change this get int to get long as well I'll keep my variable names the same
but I do need to make one other change it turns out that printf also
supports other format codes so
not just percent I for integers or percent s for Strings but also for instance
percent Li for a long integer as well as percent f for floating Point values with
decimals so with that said let's go ahead and change my print F line to be
not percent i but percent li now let me go ahead and do make calculator again
enter no apparent errors now do ./calculator and 2 + 2 still equals 4 as before
but now if I do calculator again and let's do two billion again as well as 2
billion for y previously we overflowed the size
of an integer and got some weird negative number because the pattern was
misinterpreted if you will as a negative number instead but a long instead of
using 32 bits conventionally uses 64 bits which means we have more
than enough spare bits to go when we add 2 billion plus 2 billion and now in
fact we get the correct answer of four billion which does fit inside of the size
of a long now a long can count up quite high and in fact it can count as high
as this nine quintillion and so that will give us quite a bit more runway but
decimal point so in fact let me go back to vs code here I'll clear my terminal
window and let's still use Longs but let's go ahead and use division instead of
addition here so let me change this plus to a divide operator let me go ahead
and recompile the code down here with make calculator let me go ahead and
run ./calculator and let me go ahead and do something like 1 for x and 3 for y
and we'll see that well wait a minute 1 divided by 3 I learned should be 1/3 but in
a floating point value that should be
0.33333 you know maybe with a little line over it in grade
school but really an infinite number of Threes And yet we seem to have lost
even one of those threes after the decimal point because the answer is
coming back here as just zero so why might that be well if I know that two
integers when divided one by the other is supposed to give me a fraction a
floating point value with a decimal point I can't continue to use integers or
even in this case Longs which do not have support for decimal
points so let me go ahead and change this format code here from percent li
to percent F which is again going to represent a floating point value instead
of a long integer or even an integer and let me go ahead further and Define
maybe a third variable Z as a float itself so I'll give myself a variable Z equals
x / y and now rather than print x divided by y let's just go ahead and print z so
now I'm operating in a world of floating Point values because I know
proactively that a long or an int divided by
you the program know that you're dealing in a world that's going to give you
floating Point values with decimal points you might very well need to use
what's a feature known as type casting that is convert one data type to
another by explicitly telling the compiler that you want to do so now how do I
do this well let's go back to my code here and if the issue fundamentally is
that c is still treating X and Y as integers or technically Longs with no decimal
point and dividing one by the
other therefore has no room so to speak for any numbers after a decimal
point why don't I proactively do this let me using a slightly new syntax with
parenthesis specify that I want to convert X proactively from a long to a float
let me specify proactively that I want to convert y from a long to a float as
well and now let me go ahead and trust that in Z should be the result of
dividing not a long by a long or an INT by an INT but rather a float by a float
let me clear my terminal window run make
calculator again seems to work okay ./calculator and now 1 3 and hopefully
now we actually see that my code has outputted 0.333333 and I think if we
kept showing more numbers after the decimal point we'd theoretically see as
many of those threes as we want but there is still one more catch and
especially when we're manipulating numbers in this way in a computer using
a finite amount of memory another challenge we might run up against
besides integer overflow besides truncation is this known as floating
let's go ahead and show me 20 decimal point numbers after the decimal
point and the weird syntax for this is to do not percent f but percent .20f
to indicate to C that I want to see 20 digits not the default after
the decimal point let me rerun make calculator let me do ./calculator
again and let's do one let's do three and now this is even weirder right from
grade school you presumably learned that 1 divided by 3 is of course 1/3 but
that should be 0.33333 infinitely many times or on
paper with a little line over it but the computer is just doing some weird
approximation here it's a whole bunch of threes and then 4326744079590
well what's really happening under the hood well again it's this issue of
floating point imprecision if you only have a finite number of bits and in turn
a finite amount of memory the computer can really only be so precise
intuitively alternatively or equivalently the computer has decided on some
way of representing floating Point values but the catch is per grade school
math there's an infinite number of numbers out there and an infinite number
of floating Point values because you can keep adding more and more digits if
you want so the computer given the way it's implementing these floating
Point values is essentially giving us the closest approximation that it can now
how can we go about improving the situation well there is one alternative
instead of using float I can use something called a double which as the name
suggests uses twice as many bits as a float so instead
of 32 typically it will use 64 and that's just like the difference between a long
and an INT which gave us more bits but in this case this will be used for more
Precision let's go ahead and Cast X to a double let's cast y to a double and
now let's go ahead and using the same format code percent .20f is still okay
for doubles let me do make calculator let me do ./calculator and now
let me do 1 / 3 and we still have some of that imprecision and we'd see even
more of it if we looked at more
than just 20 digits but now we have more threes after the decimal point so
it's at least more precise but it's not perfect but it's at least
more precise so these kinds of issues then are going to be necessary to keep
in mind anytime you do something numerically scientifically at least with a
language like C where you're going to bump up against these real world
limitations of hardware and in turn language now later in the semester we'll
transition to a language called Python
and that's actually going to solve at least one of these problems for us by
just automatically giving us more bits so to speak as we need them at least
for integers but even the issue of floating point imprecision is going to
remain now just how real world are these issues well back in the year 1999
we got a taste of this when the world realized in the Years leading up to that
date that it might not have been the best idea to implement computers and
software therein by storing years using just two digits
like instead of storing 1999 to represent the year 1999 a lot of computers for
reasons of space and cost were in the habit of kind of cutting a corner and
just using two digits to keep track of the year the problem with that is that if
systems were not updated by the year 1999 to support the year 2000 2001
and so forth is that just like before with integer overflow some computers
might add one to the year in their memory 99 it should be the year 2000 but
if they're only using two digits to represent years they might
mistake the year as some systems may very well have for the year 1900
instead taking literally a big step backwards if you will now you'd like to think
that kind of issue is behind us especially as we understand all the more
about the limitations of code and Computing but we're actually going to run
up against this very same type of issue again in just a few years on January
19th in the year 2038 we will have run out of bits in most computers right
now to keep track of time it turns out years ago humans
decided to use a 32-bit integer to keep track of how many seconds had
elapsed over time they chose a somewhat arbitrary date in the past January
1st 1970 and they just started counting seconds from there on out and so if a
computer stores some number of seconds that tells the computer how many
seconds have passed since that particular date January 1st 1970
unfortunately using a 32-bit integer as we've seen you can only count so
high at which point you overflow the size of that variable and so potentially if
we don't get ahead of
is week two wherein we're going to take a look at a lower level at how things
work and indeed among the goals of the course is this bottom-up
understanding so that in a couple of weeks time even a few years time when
you encounter some new technology you'll be able to think back hopefully on
some of this week's and this course's basic building blocks and primitives and
really just deduce how tomorrow's Technologies work but along the way it's
going to seem it's going to be a little
hard perhaps to see the forest for the trees so to speak and so the goal at the
end of the day still is going to be problem solving and so we thought we'd
begin today with a look at some of the problems we'll talk about or solve this
coming week uh and for that we have some Brave volunteers who have
already come up if we could turn on some dramatic lighting and meet
today's volunteers so on my left here we have hi my name is Alex I'm a first
year at the college and I'm from Chapel Hill North
Carolina Welcome to Alex and to Alex's right um I'm Sarah I'm from Toronto
Canada and I'm also a first year student at the college wonderful well
welcome to both Alex and Sarah so one of the problems you'll perhaps solve
this week for problem set two is to analyze the reading level of a body of text
whether someone reads at a first grade level second grade level third grade
level all the way up to 12 or 13 or Beyond but you've perhaps never quite
thought about certainly in terms of code like how you
would analyze some text some book and figure out what reading level is it at
and yet surely our teachers growing up kind of knew or had an intuitive
sense of this so let's consider some sample text for instance Alex what have
you been reading lately um One Fish Two Fish Red Fish Blue Fish wonderful
so given that what grade level would you say Alex is currently reading at feel
free to just shout it out first first so indeed you'll see this week if you run
your code on Alex's text it actually turns out he reads
below a first grade reading level but but why might that be what might your
intuition be for why we've uh why we've accused Alex of reading at this level
feel free to shout out yeah so very few syllables short words short sentences
and so there's some heuristic perhaps we can infer from that short text that
that probably means that it's best for younger children now Sarah by
contrast what have you been reading Mr and Mrs Dursley of number four
Privet Drive were proud to say that
they were perfectly normal thank you very much they were the last people
you'd expect to be involved in anything strange or mysterious because
they just didn't hold with such nonsense all right now irrespective of what
grade you were in when you might have read that text what grade level does
Sarah seem to be reading at so eighth grade second grade okay so hearing a
bit of everything so that at least according to code would actually be seventh
grade and what might the intuition there be why is that a higher
grade level even though we might disagree exactly which grade it is comp
yeah so complicated sentences longer sentences so indeed a lot more
words were being spoken by Sarah because there was so much more there
on the page so we'll translate these ideas this coming week in problem set
two if you tackle this one to code so that you can ultimately infer things like
these quantitatively but to do so we're going to have to understand text so
let's first thank our volunteers and then
we'll dive in to that lower level stress balls sure you can keep those yeah all
right so besides that let's consider one other body of text perhaps that you
might see this week which is namely a little something like this what I have
here on the screen is what we'll start calling today Cipher text it's the result
of encrypting some piece of information and encryption or more generally
the Art and Science of cryptography is all around us it's what you're using on
the web on your phones
with your Banks and anything that tries to keep data secure is using
encryption but there's going to be different levels of encryption strong
encryption weak encryption and what you see here on the screen isn't all
that strong but we'll see later today how we might decrypt this and actually
reveal what the plain text is that corresponds to that Cipher text but in order
to do so we have to start taking off some training wheels so to speak and
believe it or not even though your time with C this past week
for the first time probably might have been rather in the weeds and much
more complicated seemingly than Scratch it turns out that along the way we have
been providing and will continue to provide certain training wheels for
instance the cs50 library is one of them and even some of the explanations
we give of topics for now in these early weeks will be somewhat simplified
abstracted away if you will but the goal ultimately is for you to understand
each and every one of those details so that after cs50 you
really can stand on your own and understand and wrap your mind around
any future Technologies as well so let's consider first the very first program
with which we began last week which was this one so hello world in C at
the end of the day it was really the printf function that was doing the
interesting part of the work but there was a lot of technical stuff above and
below it the curly braces the parentheses words like void and include and
then of course the angled brackets and more but at the
end of the day we needed to convert that source code in C to machine code
the zeros and ones in binary that the computer understood and to do that of
course we ran we compiled the code we ran make and then we were able to
actually run that code there so let me actually go over here to VS Code and
really quickly recreate that hello.c pretty much by transcribing the same so I
have here uh include stdio.h uh int main void and then in here I had
quite simply hello comma world with my backslash n quotes and more now
last
time to compile this I indeed ran make hello followed by enter hopefully you
see no errors and that's a good thing and if you do ./hello you see in fact
the results of that program but it turns out that make is not actually a
compiler as I alluded to last week it's a program that clearly makes your
program but it itself just automates the process of using an actual compiler
and there's lots of different compilers out there and the one that it's actually
using underneath the hood is a little
something called clang for C language and clang is a pretty popular compiler
nowadays there's another one that's been around for ages called GCC but
these are just specific names for types of compilers that different people
different companies different groups have actually created but if you use in
week one a compiler yourself manually you have to know you have to
understand a little more about what's going on because it's even more
cryptic than with just make alone so in fact let me go
back to my terminal window here let me go ahead and clear the screen a
little bit and just run really the raw compiler command so what make is
automating for me let me actually do this manually for just a moment so if I
want to compile uh hello.c into an executable program I can run I can do this
uh clang space hello.c and then enter and now there's no output which is a
good thing in this case no errors but notice this if I go ahead and type LS it
turns out there's a uh a file that's been created suddenly
in my current folder weirdly called a.out that stands for assembler output
and long story short that's actually the default name of a program that's
created when you just run clang by itself now that's a pretty uh bad name for a
program because it doesn't describe what it does so better would be
here to perhaps do well instead of a.out which yes still prints hello, world but
isn't really a clearly named program it'd be nice to name this hello so what
could I do I could do like we learned last week well
type after a command at your prompt in your terminal window that just
modifies the behavior of that command it configures it a little more
specifically so what you're seeing here on the screen is a summary of a
better command with which to run clang so that now I can specify the output
of this command this -o so what do I mean by that well let me go ahead and
clear my terminal window again and more explicitly type clang -o hello
hello.c and then enter nothing again appears to happen but that's a
good thing when you see no errors and now the program I just created is
indeed called hello so it achieves really the same exact effect as make did
but what I don't have to do with make is type and remember something as
long as this command and this too is a bit of a white lie it turns out we have
preconfigured vs code in the cloud for you to also use some other features of
clang that would be even more tedious for you to write yourselves and so
really this is why we distill this as ultimately just running
make so let me pause here to see first if there's any questions on what I've
done by taking my very first program in C and just now compiling it first with
make but then starting over and now manually compiling it with clang with
what we'll call command line arguments -o space hello and then the name
of the file yeah yeah so a.out is a historical name it refers to assembler
output more on that soon and it's just the default file name that you get
automatically if you just run the compiler on any file so
that you have just a standard name for it but it's not a very well-named
program instead of running Microsoft Word on your Mac or PC it would be
like double clicking on a.out so instead with these command line arguments
you can customize the output of clang and call it hello or anything you want
other questions on what I've done here with clang itself the compiler yeah
so -o and you would only know this from reading the manual taking a class
means output so -o means change clang's output to be a file called hello
instead
of the default which is a.out and this too is again a detail you would have to
uh look up on a web page read the manual hear someone like me tell you
about it and in fact there's even more than these options but we'll just
scratch the surface here all right so if we now know this what more is
actually happening underneath the hood well let's take a a closer look at not
just this version of my code but my slightly more complicated version last
week which looked a little something like this
wherein I added in some Dynamic input from the user so I could say not Hello
World to everyone but hello David or hello to whoever actually runs this
program so in fact let me go ahead and change my code here in vs code just
to match that same code from last week so no new code yet I'm just going to
in a moment compile it in a slightly different way so I did last week string uh I
think answer equals get string quote unquote what's your name just like in
scratch and then down here instead of
doing world I initially wrote answer but that didn't go well what did I
ultimately do instead to print out hello David or hello so and so yeah sorry a
little louder yeah so percent s the so-called format code that printf just
knows how to deal with and I had to add one other thing someone else
besides percent s yeah the name of the variable that I want to plug into that
placeholder percent s and in this case it's answer now let me make one
refinement only because now we're in week two and we're
going to start writing more lines of code even though scratch called the
return value of the ask puzzle piece answer always in C we have full
control over what our variables are called and now it's probably good not to
just generically always call my variable answer if I'm using get string let's
call it what it is so this is now just a matter of style if you will let me change
the variable to be name just so that it's a little clearer to me to you to a TF or
ta exactly what that variable
represents instead of more generically answer all right so that said let me go
down to my terminal window and last week again I ran make to compile this
exact same program now though let me go ahead and just use clang so
clang -o I'll still call this version hello space hello.c so exact same command
as before the only thing that's different is I've added a couple of more lines
of code to get the user's input let me hit enter and now darn it our first error
so output from clang and make is not a good
thing and here we're seeing something particularly cryptic uh so something
in function main undefined reference to get string and then Linker command
failed with exit code one so there's actually a lot of jargon in there that will
tease apart today but my hint is that clearly my problem's in main although
that's not surprising because there's nothing else going on here get string is
an issue and the uh issue is that it's an undefined reference and yet notice I
was pretty good I added the cs50 header file
and I said last week that that's enough to teach the compiler that functions
exist but the problem is that even though this does in fact teach clang that
get string exists it is not sufficient information for clang to go find on the
hard drive of the computer the zeros and ones that actually Implement get
string itself so in other words this include line per last week is a little bit of a
hint it's a teaser to clang that you're about to see and use this function
somewhere but if you actually want to use the zeros and ones
that cs50 wrote some time ago and bake those into your program so your
program actually knows how to get input from the user well then I'm going to
have to go ahead and run a slightly different command so let me do this let
me clear my terminal window just to get rid of that distraction and let me
propose now that we run this command instead almost the same as before
clang -o space hello then hello.c but with one additional command line
argument at the end and this is a hyphen lowercase L not a number one so -l
cs50 with no space in between those two now the L is going to result in all of
those zeros and ones that actually were written by cs50 being linked into
your code your few lines of code or mine here but that's the second step that
the compiler requires in order to know how to actually execute and rather
compile your code and cs50's and cs50 is not the only one that does this if
you use any third-party library in C that doesn't come with the language you
would do -l such and such or however
they've named their own library but you don't have to do it for built-in things
like uh like we've been using thus far all right so let me go ahead and try this
I'll go back to VS Code here and let me go ahead now and run clang -o hello
then hello.c and now instead of just hitting enter -lcs50 with no space
between the L and the cs50 enter now nothing bad happens and now I can
do ./hello what's your name I'll type in David enter and now we see hello
David now honestly this is where we're really
getting into the weeds and now this is really just adding new steps to
the process of compiling and running your code and so the reality is even
though this is indeed what is happening this is why we used last week and
we're going to continue using this week onward make because it just
automates that whole process for you but it's ideal to understand what's
going wrong because any of the error messages you saw for problem set one
any of the error messages you see for the next few weeks
probably aren't coming from make they're coming from clang underneath the
hood because make is just automating the process but with make you
literally just write make and then the name of the program you don't have to
worry about any of those command line arguments questions then on
compiling with -lcs50 or anything else yeah sorry what is the benefit of what
is the benefit of using clang manually none really in fact all
make is doing is saving us some keystrokes um if you prefer
though and you just like to be more in control you can totally run clang
manually if you remember the various command line arguments yeah exp
exactly why did I have to explain that is provide a hint to cs50 with the
cs50.h header file but I didn't have to do that with stdio.h just because
stdio.h comes with C just like a few other libraries come with C that
we'll start seeing today um cs50 though is not built into C everywhere and so
you do have to explicitly add that one there
yeah a command line argument is a a word or phrase that you type at the
command line AKA your terminal in order to influence the behavior of a
program for whatever you're doing yeah it changes the defaults right in our
GUI world graphical user interface you and I would probably click some
boxes we would select some menu options to configure a program to behave
in the same way at a command line interface you have to just say everything
all at once and that's why we have command line arguments
yeah no make is not just for cs50 it's used globally in any project really
nowadays using C C++ even other languages as well in fact most every
command you see in this class unless it has 50 at the end of it is globally
used only those suffixed with 50 are indeed course specific and even those we'll
gradually take training wheels off of so that you know exactly what those
commands are doing as well all right so what is it that we've just done
everything we've just done of course I keep calling compiling but let's just go
down one Rabbit Hole so that you understand that when you compile code
there's actually a whole bunch of steps happening and this is going to enable
uh a lot of features like companies can write code and then convert it to run
it on Macs and PCs alike or phones or the like so it's not just a matter of
converting source code to machine code there's actually four steps involved
in what you and I as of last week know as compiling and these aren't terms
that you'll have to keep in mind constantly
because again we're going to abstract a lot of this away but just so we've
gone down the rabbit hole once let's consider each of these four steps that
have been happening for you for a week automatically uh the first of which is
called pre-processing so what does this mean well let's consider that same
program as before notice that two of the lines of code start with a hash
mark that is a special symbol in C and it's a so-called pre-processor directive
you don't need to memorize terms like that
but it just means that it's a little different from every other line and anything
with a hash symbol here should be pre-processed that is analyzed initially
before anything else happens so let's consider these two lines up top what
exactly is happening well it turns out with these two lines you have two
header files of course cs50.h and stdio.h where are those files because
they've never been in VS Code for you seemingly if you type LS if you open
up the file explorer in the GUI you
have never seen probably cs50.h or stdio.h they just work but that's
because there's a folder somewhere on the uh the hard drive that you're
using on your Mac or PC or somewhere in the cloud as in our case and inside
of this folder traditionally called /usr/include and usr is deliberately
misspelled it's just slightly more succinct although it's a little weird why we
drop that one letter but /usr/include is just a folder on the server that
contains cs50.h stdio.h and a
bunch of other things as well so in fact if you type in uh VS Code in your
terminal window uh when you're using Codespaces in the cloud and type ls
space /usr/include you can see all of the files in that folder but we've
pre-installed all of that stuff for you so let's consider what's actually in those
files here where if I highlight these two lines up top that start with hash
include well I kind of hinted last week that what's in that first file is a hint as
to what
functions cs50 wrote for you so you can kind of think of these include lines as
being temporary placeholders for what's going to become like a global find
and replace that is the first thing clang is going to do is pre-process this file
it's going to look for any line that starts with hash include and if it sees that
it's going to essentially go into that file like cs50.h and then just copy and
paste the contents of that file magically there for you you don't see it visually
on the screen but it's
happening behind the scenes and so really what's happening with this first
line is that somewhere in cs50.h is the declaration of get string like we talked
about last week and it probably looks a little something like this and we didn't
spend much time on this yet this past week but we will in time more notice
that this is how a function is declared that is it is decreed to exist the
name of the function of course is get string inside of the parenthesis are its
arguments in this case there's
one argument to get string I claim today but you've known this implicitly and
it's a prompt it's the prompt that the human sees when you use get string
what is that prompt well it's a string of text like quote unquote what's your
name or anything else that I asked last week meanwhile get string as
we know from last week has a return value it returns something to you and
that too is a string so again this is also called a function's prototype it's the
thing toward the end of last week that I just
copied and pasted from the bottom of my file to the top just so that it was
like this teaser for clang as to what would exist later so you can think then of
these include lines as just kind of uh combining all of those function
declarations in some separate file called cs50.h so that you yourself don't
have to type them every time you use the library or worse so that you
yourself don't have to copy and paste those lines this is what clang is doing
for you in its first step of pre-processing second
and last in this example what happens when clang pre-processes this second
include line well the only other function we care about in this story is printf of
course which comes with C so essentially you can think of printf's prototype
or declaration as just being this printf is the name of the function it takes a
string that you want to format like hello comma world or hello comma
percent s and then with dot dot dot this actually has technical meaning it
means of course that you can plug in
zero variables one variable two or 10 so dot dot dot means some number of
variables now we haven't talked about this yet and we won't really in general
printf actually returns a value a number that is an integer but more on that
perhaps another time it's generally not something the programmer tends to
look at but that's all we mean by pre-processing so that at the end of this
process even though there's more lines of code in cs50.h and stdio.h
what's really just happening is that
clang in pre-processing the file copies and pastes the contents of those files
into your code so that now your code knows about everything get string
printf and anything else any questions then on that first step pre-processing
yes good question when you include a file does it only include what you need
or Does it include everything think of it as including everything so if it's a big
file that's a lot of code at the very top and that's why if you think back to all
of the zeros and ones I showed a
little bit ago as well as last week there's a lot of zeros and ones that end up
on the screen as a result of just writing hello world a lot of those zeros and
ones are perhaps coming from code that you didn't actually necessarily need
but some of it is perhaps there but there are ways to optimize that as well all
right so step two of compiling is confusingly called compiling it's just this is
the term that most everyone uses to describe the whole process instead of
just this one step but once a program
has been pre-processed uh behind the scenes by the compiler for you it looks
now a little something like this and I've put dot dot dots just to imply that yes
to your question there's more stuff above it there's more stuff below it it's
just not interesting right now for us so now we have just C code there's no
more pre-processor directives at this point all of the hash symbols and those
lines of code have been pre-processed and converted to something else and
so now and this is where things get a
little spooky looking uh here now is what happens when clang or any
compiler literally compiles code like this it converts it from this in C to this in
assembly code so this is among the scarier languages I myself don't really
have fond memories this is not a language that many people program in if you
take a subsequent class in computer science in systems uh a higher level
class you might actually learn this or some variant there of but there's at
least a few people out there that need to know
this stuff because this is closer to what the computers themselves nowadays
understand like the Intel CPUs or the AMD CPUs the brains of today's
computers and phones understand stuff that looks more like this and less like
C now it's completely uh esoteric but let me just highlight a few phrases
there's some stuff that's a little familiar there is mention of main at the top
there in yellow there is mention of get string toward the bottom there is
mention of printf down below so this is just another
compiles your code but of course this is still not zeros and ones so we got two
steps to go so when a compiler proceeds to step three this is where things
get converted to machine code and when a compiler assembles your code
for you it converts what we just saw on the screen here to actual zeros and
ones the so-called machine code that your phone or your computer
understands but it's worth noting that these are not necessarily all of the
zeros and ones of your program yes they are the
zeros and ones that correspond to your hello program or printf and get string
and the like but notice that here we need one final step those zeros and
ones are only your lines of code but what about cs50's lines of code that we
wrote to Implement get string what about the lines of code that humans
wrote decades ago to implement printf those are somewhere on this hard
drive like on my Mac my PC or somewhere in the cloud but we need to
combine all of those zeros and ones together and link my code
with cs50's code with uh stdio's code all together and so what happens
in the last step ultimately is that if we have my code here in yellow and then
the code that cs50 wrote and the code that the authors of C itself wrote what
really is happening is that somewhere we have not only hello.c which
obviously I wrote and wrote with us live here there's also let's assume
somewhere on the computer a cs50.c file that coincidentally I and cs50 staff
wrote years ago and also somewhere on the
computer there's another file let me oversimplify by just calling it stdio.c
in practice it's probably specifically called printf.c but there's
somewhere these two other files and so this last step called linking takes my
zeros and ones from the code I just wrote namely this code on the screen
here it then grabs the zeros and ones that cs50 wrote and it grabs the zeros
and ones that the authors of C wrote In order to implement the standard IO
library and lastly voila links them
all together and this is the same blob of zeros and ones that we saw earlier
it's just now the result of pre-processing your code compiling your code
assembling your code linking your code and my God it's at this point like if
there were any fun in programming for you yet we've just taken it all away
we just call this whole process compiling why because now that we know
those steps exist and smart people solve that problem for us you and I can
kind of operate at this level of abstraction and
just assume that compiling converts source code to machine code questions
though on any of these intermediate steps yeah a good question so where
are all of these zeros and ones stored because you and I we've been using a
browser at code. cs50. of course is this web-based user interface but again
recall from last week even though you're using a web browser to access VS
code that web-based version of vs code is connected to an actual server
somewhere in the cloud and on that server you have
your own account and your own file and really your own hard drive virtually
in the cloud think of it a little like Dropbox or box or Google drive or one drive
or something like that so you have a hard drive somewhere out there that
we've provisioned for you and it's on that hard drive that we have uh your
code that you just wrote or I just wrote cs50.c stdio.c and all of the
other code that implements the math functions and everything else that c
supports good question yeah good question if that uh hash include
cs50.h line at the top of my code if I just replace that with the contents of
cs50.c would that work short answer yes that would work you could copy all
of the code there however there's some order of operations that might come
into play and so it's probably not quite as simple as copy paste but
conceptually yes that's what's happening now with that said in cs50.h
are only the prototypes of the functions the hints as to how the functions
look what their return type is what their name is and what their
arguments are it's in the C file that actual code tends to be written and this is
a little confusing now because you and I have only written code in C files but
in the next few weeks you'll actually start writing some of your own files as
well just like cs50 just like standard iio but in essence that line of code just
makes it easier to use and reuse code that's already been written and that's
the whole point of a library I say that little louder yes does linking happen
when you
compile your code yes when you run make as we have been doing the past
week now all four of these steps are happening pre-processing converts the
hash include lines to something else compiling technically converts it to
assembly code which the Mac the PC the server more closely understands
assembling converts that language to binary machine code that this computer
actually understands and then linking combines everything together and in
fact if you think back a few minutes ago to when I did this -l
cs50 the reason I had to add that and the reason my code did not compile at
first was because I forgot to tell clang to link in cs50's zeros and ones per
that last step I don't need to do -l stdio because it comes with C so
that would just be tedious for everyone in the world but cs50 does not come
with C so we link that in and to be clear too we won't always use cs50's
Library that'll be yet another pair of training wheels we take off in the coming
weeks but for now it makes a few things
simpler yeah short answer yes so what do the zeros and ones the machine
code translate to yes there is a one-to-one relationship between the machine
code and the assembly code assembly code it's not really English but at least
it's symbols I recognize it's not zeros and ones machine code of course is just
zeros and ones so back in the day before c existed people were programming
only in assembly code before assembly code existed people were coding in
zeros and ones and you can imagine just how
painful that was and so each of these languages makes life for us sort of
easier and easier in a few weeks we'll transition to python which will in turn
make C even uh simpler or coding in general simpler to do too all right so with
that said what now can we uh what could go wrong with this well it turns out
that besides compiling technically speaking there's decompiling and we've
not done this and we won't do this but it's worth considering for just a
moment uh if you were to not compile your code
but decompile it as the word suggests this just means reversing the process
converting it ideally from machine code zeros and ones maybe back to C now
this would be cool perhaps if all you have is a program you can convert it and
see the actual source code what might a downside be if anyone on the
Internet is able to decompile code on their machine yeah okay so it's easier
to find bugs in the code that oh to exploit so it might be easier to uh hack
into the software by finding mistakes you and I made
because literally they're staring at you in code whereas the zeros and ones
make it way less obvious other downsides of what I call decompiling yeah
yeah yeah if your code your work is your intellectual property copyrighted or
otherwise you know that's kind of obnoxious that someone can just like run a
command and boom they can see the original code that you wrote now it
turns out it's not quite as simple as that and so even though yes you could
take a program like hello or even Microsoft Word and convert it from zeros
and ones back to C the zeros and ones might not know so to speak whether it was a for loop or
a while loop so maybe decompiling will show you one or the other and
honestly decompiling while possible and it's one way of reverse engineering
someone's product odds are if you're good enough to start reading code
that's been decompiled and reading through the messiness of it odds are you
have the talent probably to just write that same program from scratch
yourself now that's an overstatement perhaps but it's not
engineer it so same kind of idea in the physical world any questions then on
compiling or even decompiling in these forms all right so odds are at this
point not only I but you have made mistakes and you've written buggy code
a bug in code is just a mistake a logical error or otherwise where the code
just does not behave correctly as you intend and up until now odds are your
debugging techniques have been to maybe look back at what I did in class
maybe ask a question online or in person but
ultimately it'd be nice if you had some tools of your own with which to debug
code and this honestly is a lifelong skill you're going to emerge from cs50
and even 20 years from now you're not going to be writing if you're writing
code at all correct code all of the time like all of us on the staff continue to
write bugs hopefully they get a little more sophisticated and not sort of like
oops I missed a semicolon but even those kinds of mistakes we make too but
there's tools out there and techniques
that can make your life easier when it comes to solving those problems now
the term bug has actually been around for decades but a fun story to tell is
that the first documented actual bug was actually somehow connected to
Harvard in fact this is the log book relating to the Harvard Mark II computer
from 1947 whereby if you read the notes here and if I zoom in this was an
actual moth discovered inside of this big Mainframe computer that was
causing some kind of problems and the engineers at the time
actually thought it was funny that wow physical bug actually explains the
issue and it's been forever uh taped to the sheet of paper which I believe
now is on display in the Smithsonian with that said this is just representative
too of a logical bug and that story was often retold
by a famous mathematician then computer scientist really Dr Grace
Hopper who actually worked not only on the Harvard Mark II computer but its
predecessor the Harvard Mark I and if you ever spend
time yet in the engineering building across the river here you can actually
see much of this computer which is along the wall when you first walk into
the science and engineering complex and indeed as you've probably heard
growing up this is a Mainframe computer like this is what Macs and PCs so to
speak looked like back in the day with very physical things that essentially
implemented the zeros and ones that you and I take for granted now being
miniaturized in our laptops and phones so there's a piece of history there if
you visit campus that side of Campus sometime do take a look but let's
consider then how we solve not of course physical bugs but logical bugs and
let's consider something like this from last week whereby we were trying
very simply to print like this column of three bricks using hashes of
sorts so let me go over here in just a moment to VS Code and I'm going to
go ahead and open a program I wrote in advance and I'm bringing it to class
because there's a bug in it and I'd like to figure out how
to solve this bug so let me open up buggy0.c which is version zero of my
code and let's just take a quick peek at what's here it's pretty short it
includes only stdio.h it uses printf it uses a for loop and the goal quite
simply is to print out that column of three bricks now it's short enough that
some of you if you're getting comfy already with C you might already see
the logical bug it's not a syntax error like it will compile and run but there's
a bug there and suppose that I'm very
new to C I'm very uncomfortable with C it's 2 a.m. and I just can't see the
bug what are my recourses here for actually finding a mistake like this well
first let's look at the symptom let me go down to my terminal window I'm
going to use make buggy0 because again the file is called buggy0.c
I'm not going to use clang in fact I'm never really going to use clang
manually from here on out I'm just going to use make because it makes our lives
easier it does compile no
errors so it's not syntax it's not something silly like a missing semicolon but
when I run ./buggy0 I of course see 1 2 3 4 hashes and this of course does not match
the one two three bricks that I actually intended for that column and
yet I'm starting counting at zero as I usually do I've got three I'm going up to
three so where is my logical error if it hasn't obviously jumped out at you
already well how can I solve this well first and foremost perhaps the best
technique for solving bugs at least
early on is just use printf like thus far we've used printf to say hello and other
things on the screen but printf is just a function for printing anything and
there's no reason you can't temporarily use printf to like print out the
contents of variables what's going on inside of your program just to figure
out where your mistake is and then you can delete that line of code later it
doesn't have to stay there forever so let me do this instead of just printing
out in vs code the hash
symbol let me do a little safety check here and print out the value of i so let
me go ahead and say something like i is now I want to say i is this but of
course this is not how I print out the value of i if I want to print out the value
of i what should I put here so %i for integer instead of %s for
string so they're still placeholders but we use %i for integers and now
if I want to print out i I just need the comma as the second argument and
then I all right let me go
a hash well wait a minute that's one two three four so clearly I'm printing it
one too many times so let me look back at the code here by shrinking my
terminal window and let me just ask the group where is in fact the mistake or
what equivalently would be the solution yeah in the middle yeah instead of
less than or equal to use just less than so you got to kind of pick a lane here
like if you're going to start counting from zero you generally use less than
and go up to but not through the value or if you
prefer like in the human world counting from one on up you can use
less than or equal to but you have to be consistent and in general as a
programmer just always start counting from zero if you're doing something
canonical like this but the solution is indeed just to change this by changing
the less than or equal to to less than if I recompile this program with
make buggy0 and then do ./buggy0 again and let me increase the size
of my terminal window now you see okay almost the same output but indeed
i starts at zero goes up to but not through three all right so printf in short
should be or can be your first diagnostic tool instead of just staring at the
screen or raising your hand I mean use printf to see literally what's going on
inside of your program by just printing out things of interest and then once
you've solved the problem you can go back into your code as I'll do here by
shrinking my terminal window I'll delete the printf line and now I'm ready to
share this program with the
little complicated to just start using the debugger you have to like create a
configuration file and do like some annoying steps that just get in the way of
solving real problems so we have automated the process for you of just
starting the debugger and thereafter it's sort of Industry standard how you
use it but we save you the headache of having to create those configuration
files so suppose I want to do this suppose I want to try to debug this program
step by step using special software well how can I do that well let
me propose that if I revert this back to the original version where i was less
than or equal to three I'm pretty sure that I was printing too many hashes so I'm
going to do this and you might have done this accidentally or never at all but
notice if you hover over the gutter so to speak in vs code the part of it all the
way to the left of the editor you see this sort of grayed out uh Red Dot if you
click there it becomes a brighter Red Dot and this represents what we're
going to call a break point and this is
just a visual indicator that you've put like a stop sign equivalent there and
you're telling the debugger in a moment stop running my code there why
because I prefer to step through my code at sort of a human speed and not
at computer speed where it runs all at once so I've set my breakpoint which
is step one and then step two is quite simply this instead of running the
program itself run a command called debug50 and then do ./buggy0 and now
this will start your program but inside of the debugger which
is a special program that smart people wrote that will Empower you to now
step through your code line by line again at your own comfortable pace I'm
going to hit enter some stuff's going to happen on the screen whoops
notice this is a common mistake that I made accidentally here looks like I've
changed my code I did because I went in and changed the less than or equal
to sign so let me go ahead and rerun make buggy0 enter good now let
me rerun debug50 enter and now some stuff just happened on the
screen and it takes a moment to get started but once it's started you'll see
this you'll still see your code but you'll see this yellow highlight which you've
probably not seen before and notice that it's specifically highlighted in the
same line that I set a breakpoint on why that just means the
debugger has executed all of the lines before line seven it has
broken not in a bad way but it has paused execution on line seven so it
hasn't yet printed any
hashes and you can see that no hashes in the terminal window yet it's
paused execution but what's interesting with the debugger is the stuff over
here on the left hand side in the debugger here you'll see under variables all
of your so-called local variables and we haven't really made a distinction
between local and something called Global but for now local variables just
means all of the variables that exist in your function so i currently has a
value of zero okay and that makes sense so now how do I step
through my code and see what it's doing well at the top of the screen here
you'll see some playback icons kind of like a video player but they have
special meaning this first one will just play the rest of your program all the
way to the end so you only click that if you sort of solved a problem and you
just want to run it to completion like before but the next
two are really the juiciest the second one here if you hover over it
eventually you'll see that it's called
step over step over means that the debugger will run this currently
highlighted line of code but it's not going to dive into it so if it's a function
like printf it's not going to start stepping through printf line by line why
because I can pretty much assume printf written decades ago is correct
the problem's probably with me but this next icon if I did really want to step
into the printf code to figure out how it works or find some problem in it all
these years later you can step
into printf and then the screen would change and you'd see each of the lines
for printf line by line at least if you have the source code for printf installed
all right I'm going to use the first one step over and watch as the yellow
highlight moves and watch as in the terminal window there's a hash symbol
here we go there's one hash now notice line five is highlighted that means it
has paused on line five line five has not yet been executed so what
does that mean the value of i per
the top left hand corner is still zero but as soon as I click step over again
watch what happens at the top left where i is a variable on the screen now i
and it flashed briefly has a value of one and now if I step over again watch
the terminal window there's my second hash now let me click step over on
the for Loop watch the variable at top left now one goes to two now let me
click it again third hash and here's where the logical error is perhaps
revealed let me go ahead and step over the loop now i is three wait a minute
I'm still going to print out a hash there it is there's the fourth hash and at this
point hopefully the light bulb proverbially has gone off I realize oh I screwed
up I can either stop the program altogether with the red square or I can just
let it run all the way to the end which just terminates everything at this point
I just want to get back into my code and start fixing things and you can close
for instance as I will here the file explorer just to hide the panel that opened so
that's debug50 but the debugger itself is
not a CS50 thing debug50 just starts the debugger for you which is something
you'd find in most any programming environment nowadays questions on
debugging questions yeah good question where does it tell you where it went
wrong so sadly it does not tell you any of that the onus is still on you the
human to use this tool productively to walk through your code at a saner
pace but your brain is the one that still needs to solve it and I don't doubt
down the line with artificial intelligence and more
programs like this will get all the more helpful and start answering questions
like that for us and there are other tools we'll introduce you to this semester
that are even more powerful than this but for now it's just a tool really to
slow things down and not have to change your code the fact that I had that
panel on the left that just showed me i's changing value is just an
alternative to printf and I can step through it a little more slowly other
questions on debugging now let me show you one final
example with this debugger here and this one too I wrote in advance let me
close buggy0.c and let me open up buggy1.c my second version thereof let
me close my terminal window for a second and give you a quick tour of this
program which similarly has a mistake now at the top of this program some
familiar includes cs50.h and stdio.h this is not something we've seen
before it's specific to this example a function called get_negative_int takes no
arguments and it returns an integer what
does it do it literally gets a negative integer ideally from the user fun fact
though it doesn't do so correctly that's the bug get_negative_int is broken at the
moment so what does main do well main just calls this function passing in
nothing in parentheses no inputs and it stores the return value in i and then it
just prints out i on the screen so honestly just by eyeballing this you know I
feel comfortable enough with programming and C I think main is correct let
me just stipulate main is
correct but there is going to be a bug down here now what's the bug down
here well let me look at get_negative_int's implementation notice this first
line 12 is identical to the prototype up here the prototype is sort of stupidly
required up here because C reads things top to bottom left to right the
compiler technically does so if you reference get_negative_int here but you
don't implement it until down here and you haven't told C in advance that it
will exist again you get the error we saw last week all
right so how does get_negative_int work we declare a variable called n we've
got a do while loop that does what it uses get_int which comes with the CS50
library per last week it prompts the user for negative integer quote unquote
and stores the value in n I then do all of this while n is less than zero right
remember we used a do while loop last week to make sure the human
cooperates and doesn't give us the wrong type of value be it positive or
negative or something else and then we return n
and there's some subtleties anyone recall or have an intuition for why I've
declared n on line 14 instead of on line 17 this is a C specific thing
exactly there's this notion of scope in C and we'll continue to see this over
time whereby a variable only exists inside of the most recent curly braces
that you've opened so if I've declared n here on line 14 I can use it
anywhere between lines 13 and 21 because those are the nearest curly
braces if by contrast as you note if I
instead said this int n equals get int and so forth and didn't have the current
line 14 well n would exist inside of these curly braces but not here which is
too late and definitely not here so you just have to declare it first and then
use and reuse it as such now let me just show you how I can debug this but
let me show you the symptoms first let me open my terminal window let me
run make buggy1 compiles okay so it's not something silly like a
semicolon ./buggy1 and I'm asked for a negative integer all
right let me give it negative one enter well the main function's supposed to
print out what I typed but it clearly didn't it's prompting me again all right so
maybe it'll like -2 no maybe -3 no 50 okay so it's definitely broken right it kind of
seems logically to be doing the opposite now you can perhaps see why this is
happening already these are deliberately simple programs for
demonstration sake but let's do this let me go ahead and set a break point in
main even though I'm pretty sure main is
correct but it just helps me start my thought process start with Main and
then take it from there let me run now debug50 ./buggy1 enter and let's
see with that breakpoint now the GUI is going to reconfigure itself it's
going to pause on line eight because that's the first interesting line inside of
main so I could have just put the breakpoint on line eight too it's smart
enough to know that if I set it on six eh you really mean line eight because
that's the first actual line of code and
watch now what happens if I step over this line notice that i which at
the moment seems to have a default value of zero more on that another
time but if I click step over like before I'm prompted for a negative integer let
me type negative 1 enter and now notice there's no additional yellow
highlight why where am I currently stuck logically yeah just logically I must
be in that do while loop and even if you don't understand it like that's the
only explanation if you keep getting prompted
surely there's a loop going on there's only one Loop in my code so there's
probably a problem there so okay I can't just set a breakpoint in Main and
then wait for this to work so let me just uh let me stop this with the red
square and let me think all right instead of I can still set my break point in
main but let me rerun the debugger instead and this time not step over that
line of code let me step into that line of code so Watch What Happens now
instead of clicking the second icon here let me click the third
whose name is indeed step into and watch as the yellow highlight does not
move to line nine it dives into the function on line eight thereby bringing
me whoosh down to line 17 it's kind of going down into that next function
now it didn't bother pausing on line 12 or 13 or 14 because there's nothing
intellectually interesting there happening yet the juicy part really starts it
would seem in line 17 so now notice n is my variable at the top left if I click I
don't want to click step into now
though what would go wrong if I click on step into or what would it do
that I don't think I want to do yeah yeah it would step into get_int but I'd like
to think that the staff's version of get_int is correct and that's not our problem
today so I want to step over it and watch now at top left that nothing
happens yet to the value of n until I go to the terminal window now and type
in something like negative 1 now notice it jumps to line 19 which is the next
interesting line top left n indeed
is -1 and here's where I can now pause as a human and think all right so while
n is less than zero all right n per the top left corner is negative 1 so all right
while negative 1 is less than zero well obviously that's true mathematically
so what's going to happen it's a do while loop so when I click on step over
again it's going to go to this line cuz it's at the end of the inside of that Loop
and now here it's looping through again and again all right let me do this
once more I'm going to step over
all right I'm going to type in -2 and it's the exact same thing now is my
chance on the yellow line okay wait a minute -2 is obviously less than zero
let me try this one more time click it once here and now all right let me give
it 50 and now okay while 50 is less than zero that's not true so the loop is
over because it's not going to do it while 50 is less than zero that's not true
so now watch when I click step over once more it then finishes the loop even
though there's nothing more to do it's now
about to return n it jumps back up to main where I left off on line nine it now
prints in my terminal window the number 50 and hopefully at this point to
your question earlier my human brain has realized oh I'm an idiot like I
flipped my sign there so I probably let me stop this I probably want to do
something like this if the goal is to get a negative integer I probably want to
say while n is for instance greater than or equal to zero would work so while n
is greater than or equal to zero
keep doing this and that's the logic I wanted to express so the debugger just
saves me from staring at the screen raising a hand sort of asking someone
else at least in this case it allows me to go through it at a healthier Pace
questions now on debug50 which should be your new friend even if it's not
your first instinct after printf any questions on debug50 no all right all right
well there's one last Technique we can equip you with here um and that is in
addition to printf and a
really helping honestly maybe it would help to just sound out what problem
you're having similar to going to office hours talking to a a TA or a professor
just walking through your problems because in sort of talking to the duck
about you know the fact that you're doing this while uh n is less than zero
and then if it is I wait a minute I'm an idiot not just for talking to the rubber
duck you realize hopefully in expressing yourself literally verbally you
probably will hear with non-zero probability like
some illogic in your statement and just by sounding things out you'll realize
like oh that's my problem and so frankly if you have roommates you can also
use a roommate for this but the rubber duck is just sort of a go-to when your
roommates have no interest in your you know C problem set talking
something through as such and this is an invaluable
technique I admittedly tend not to do it so much with a rubber duck but
ideally with colleagues human colleagues but just talking through
things often will help you just realize oh I said something illogical now I can go
back to the code so don't solve problems by staring at your screen endlessly
for minutes for hours at that point it's time for a break time to walk away
time to talk to the duck if you've already exhausted some of those other
tools as an aside on your way out today at the end of class we have
clearly plenty of rubber ducks for you and it's become a thing over
the years at least among some to bring the duck with
them when they travel and send us photos here for instance is CS50's
rubber duck debugger AKA DDB for duck debugger which is a pun on a
geekier program called GDB the GNU Debugger which is an actual piece of
software for debugging this is CS50's debugger in the hills of Puerto Rico
also here on the sea it made its way to San Francisco here also
down by Fisherman's Wharf by the sea lions if familiar here at Stanford
where there's a William Gates computer science building for computer
science down the road in SF at Google and this is the Trevi Fountain in
Rome and lastly the Colosseum so we'll be curious to see in the coming
years where your duck too travels so that then was quite a bit why don't we
go ahead here and take a short five minute break no snacks yet you're
welcome to get up or sit down we'll return in about five all right so we are
back and if the goal ultimately today is to have a better understanding of
things like strings so that we can solve problems with text
let's consider some simpler types of data first how we might represent those
and then see if that doesn't lead us to a discovery as to how strings in
today's modern software are using things like that so when we talked in
week zero about representation of data we had different ways of doing it in
terms of binary and decimal and uh unary even when we started talking
about the same last week in code we started talking about uh data types
instead and these data types were a way of telling
the computer like do you want an integer do you want a character do you
want a floating point value like a real number or even a string as we've seen
but it turns out that computers of course only have finite amounts of
resources your computer only has a fixed amount of memory or RAM and
that actually has very real world implications so for instance here are some
of the data types we've seen thus far and it turns out that each of these in C
has a specific number of bits allocated to it now admittedly this
can vary by system it's not so much the case nowadays but for many
years for decades computers were getting better and better the earliest
computers might have used fewer bits for some of these data types more
modern computers might use more bits so the numbers you're about to see
are pretty much where we are present day so when it comes to these data
types a bool which is true or false somewhat curiously uses a whole byte even
though that's way overkill because for a bool true or false you of
course only need one bit but it turns out even though it's wasteful to use
eight bits or one byte just to represent true or false it's just easier for
computers so a bool tends to be one byte an int which we've been using a lot
uses four bytes typically or 32 bits and if I do some quick math from week
zero with 32 bits you have four billion possible values roughly but if you want
to represent positive and negative that means you can represent roughly -2
billion all the way up to positive 2
billion so that's the range typically with an int if that's too few numbers for you
turns out there's things called longs and longs use 64 bits which allow you to
have like quintillions of possibilities which is a lot certainly a lot
more than 4 billion so sometimes you might use a long but even that's finite
and so as we discussed at the end of last week bad things can happen if
you make certain assumptions about the data because of things like integer
overflow or the like where things wrap around then
digits of precision they eventually get imprecise per the example we looked
at last week but it at least gets you further down the line as an aside in really
really important applications in finance and medicine and Military operations
and the like where you really can't have rounding errors long story short
humans have developed libraries in C and other languages that use more
even than 8 bytes so there are solutions to these problems but they're
always finite you have to pick uh an upper bound then
there's Char which we saw briefly last week when I asked the user for y or n
for yes or no and then there's string which I'm going to propose as a question
mark because a string totally depends like hi h-i-exclamation point would
seem to be three bytes David would seem to be five so strings clearly are
variable based on what you or the human type in so we'll see what this
means though in just a bit this though is the thing inside of your Mac your PC
your phone might not look exactly like this
but this is a a memory module for a modern computer and let's go ahead
and use this really as just representative of the finite amount of memory that
any computer indeed has let's zoom in on one of these little black chips on
the circuit board here zoom in and let me propose that this rectangle
really represents some number of bytes like tucked inside of this little black
circuit on the board is maybe I don't know a gigabyte a billion bytes
maybe it's 100 bytes some number of bytes it
totally depends on the computer and how much you paid for the Stick of
memory but if there's a finite number of bytes physically implemented
somehow digitally inside of this Hardware well then it stands to reason that
we could number those bytes we can just arbitrarily decide that the top left
corner is byte number one or really byte number zero the one next to it is
number one then number two number three dot dot dot number two billion
or whatever it is however big this memory is so if
you use a variable in a C program that's only one byte like a char it might
literally be stored in that top left hand corner of the memory like in practice
you don't care where physically it is but really the artist's rendition would be
this a char might use one of those single bytes somewhere in the computer's
memory if you use an INT which is four bytes it would give you four bytes
contiguous that is left to right top to bottom but all 32 bits would be next to
each other so the computer knows that those indeed all
belong to the same int if you need a long or a double for that matter then
you might use a full eight bytes in this case and you just keep using and
using this memory kind of like a canvas you know almost in Photoshop or a
spreadsheet where you can just move uh pixels or you can move data
around that's really what your computer's memory is a canvas for storing uh
information in units of bytes or 8 Bits now we don't need to keep looking at
these circuit boards we can abstract it away as we often do and let's go ahead
and zoom in on this grid just to consider some very specific variables so let
me zoom in and now I see fewer but larger uh boxes on the screen Each of
which again represents a byte and now let me propose that we play with
some actual code so here in C albeit without a full program are three ints score1
score2 score3 I have coincidentally given myself two
scores at around 72 and 73 and then a pretty low score at 33 of
course last week or two weeks ago this would have been HI! but now we're
program to average my three test scores together something like that so let
me do printf quote unquote my average is and I'm going to go ahead and
do say %i \n and now let me plug in the results and this is kind
of grade school math now how do I compute the average of three values well
just like on paper I can do score1 plus score2 plus score3 in
parentheses because of order of operations divided by three since there's
three total scores all right so I think this checks out and indeed you
can use parentheses and operators like plus in your code like this in C let
me go ahead now and do make scores no syntax error so that's good nothing
missing there and now let me do ./scores and see what my test average is
all right you know it's not great but I think I still passed and indeed my
average here is 59 is it precisely 59 though well let's see let's actually
instead of using an INT how about we go ahead and use something like a
floating point value here and let me
go ahead and do this so let me recompile my code make scores huh all right
I've got an issue let me zoom in on my terminal window we've not seen this
one necessarily before but error on line nine format specifies type double
which is a lot of precision but the argument has type int so what does this
mean well it's showing me with these green squigglies that something's bad
between the percent f and this thing over here well on the left I'm implying a
float or a double for that matter on the right
though what data type are score one score two score three all right so
they're ints so clang does not like this the compiler just doesn't like that I'm
using ints on the right but I want floats on the left so there's going to be
different ways of solving this one way would be to just ignore the problem
like I originally did and just go back to percent i or as an aside percent d is
often an alternative to percent i for a decimal number but we use percent i because
it sounds like int so percent i is fine
here too but I don't want to just avoid the problem I want to actually display
a floating point value so how can I fix this well it turns out I can solve this in a
few different ways the simplest is just to make sure that at least one number
on the right is a floating point value like 3.0 instead of just three now I think
clang will be happier let me do make scores enter and indeed it's okay why
as soon as you have at least one more precise data type on the right it
just treats everything at that point as
a floating point value so that the math works out so dot slash scores enter and now
there we go right you know some of us might really want that third of a
point our average was not 59 it's 59 and a third as in this case here
so we've solved that there as an aside though there's one other technique
to show here if you didn't want to change it to 3.0 because that's a little
weird because there were literally three scores it's not like that needs to
have a decimal point you could also
explicitly convert the three to a float by saying in parentheses float this is
what's called type casting and this will just convert the thing right after it to
that data type if it's possible so if I do this again make scores no errors now
dot slash scores and I get in fact the same result there's a bit of a rounding issue
here but we know the rounding relates to the imprecision from last week for
now let me just be happy with my 59.3 something I'll take that for now but
this is you know close to a good
enough correct answer for me now but how do I think about now what's
going on inside of the computer's memory well let's consider here's that
same grid of memory each box represents a byte where are score one score
two and score three in my memory well score one let me just propose is at
the top left but it's taking up four boxes for four bytes score two probably
ends up right next to it in memory though this isn't always going to be the
case but I've chosen simple examples 73 is next to it also
taking up four bytes and then lastly 33 is in score three down there
underneath now if we really look at the computer's memory look
at it with some kind of microscope or the like there's actually 32 bits 32 bits
32 bits in each of those groups of four bytes representing those
values but again for today's purposes onwards we don't really need to think
again and again in binary it's just indeed these decimal numbers being
stored there but I claim now this isn't the best design
even if you have never programmed before cs50 what you're looking at
here on the screen as an excerpt in what sense is this perhaps bad design
even though it's a correct way of storing three test scores what's kind of bad
here yeah yeah always do exactly what you did extrapolate to four scores
five scores 50 scores this can't be that well designed because now you're
going to have four lines of code five lines of code 50 lines of code that are
almost identical except for this like arbitrary
number that we're updating at the end of the variable so indeed there's
probably going to be a better way even though at least in C we haven't yet
seen that technique but the solution today onward is going to be something
called an array an array is a way of storing your data back to back to back
in the computer's memory in such a way that you can access each individual
member easily put another way with an array you can instead do something
like this instead of saying int score one int score two int
score three giving each a value you can first tell the computer please give
me a variable called scores plural though you can call it anything you want of
size three each of which will be an integer that is to say this is how you
declare an array in C that will have enough room to store three integers put
another way this is the technical way of telling the computer please give me
12 bytes in total three times four each for an int so give me 12 bytes in total and
what the computer will do is guarantee that they're back
the first int is at bracket zero second int is at bracket 1 third int is at bracket
two so it's not 1 2 3 it's literally 0 1 2 and this is not something you
have control over you must start at zero so these lines now create an array
of size three and then insert one two three values into that array but the
upside now is that you only have one name of the variable to remember it's
just called scores yes you need to go into the array to get individual values
you need to index into it using those
square brackets but at least you don't have this hackish approach of
declaring a separate variable for each and every one of these values so let
me go back to scores dot c here and let me propose that I do this let me just
kind of use that same idea to do the following let me get rid of these three
separate integers let me give myself an int scores array of size three and
then scores bracket zero will as before be 72 scores bracket 1 will be 73 and
scores bracket 2 will be 33 and let me get rid of the little dot
there all right so now if I go ahead and run this again with make scores
enter huh what did I do wrong here I think I got a little too ahead of myself
let me increase my terminal window let's focus on line 10 here first error use
of undeclared identifier score one what did I do here that was dumb yeah
right so I didn't declare score one I've got old code right so I just kind of
honestly got ahead of myself here not even intentionally so let me go ahead
and shrink my terminal window again I
special program just to check your average of three test scores like 72 73 33
why don't I actually make the program Dynamic and ask the human for that
average or rather for those scores so instead let me do this how about we get
rid of the 72 and change this to get int and I'll just prompt the user for a
score let me get rid of the 73 and change this to be get int
quote unquote score and then lastly get rid of the 33 and replace it with get int
quote unquote score get int is a cs50 thing
for now so I need to include cs50.h as always but I think now it's sort of a
better program because now I can compile it once I can even share it with
my friends and now any of us can average three scores on some class's test
they don't need to know the code or rewrite the code they just type in their
scores so make scores worked dot slash scores now I can type anything I want maybe
it's a 72 73 33 still get the same answer or maybe I'm having a better
semester 100 100 maybe 99 and now we get still a pretty
high score there but now it's Dynamic now you don't need the source code
you don't need to recompile the program it's just going to work again
and again but this too let me propose that this code is correct if I want to get
three scores from the user but these highlighted lines now 6 through n are
they well designed would you say yeah yeah right this is we can use a loop
is the spoiler here why I mean my God it's like the same code again and
again and again the only thing that's
changing is the number and it should have kind of had some code smell
again because if I keep typing the same thing again and again and again like
that's clearly an opportunity to better design something so let me do this let
me go ahead and still create my array of size three but let me
use our old friend the for loop for int i equals 0 i less than 3 i plus plus and then in
here let me do scores bracket we haven't seen this before but any intuition
scores bracket i because that will use whatever i is be it 0 or 1 or
2 in each iteration and then I can get an int asking the user for score without
having to repeat myself again and again so hopefully if I didn't make any
typos make scores all good dot slash scores 72 73 33 and we're back in business
but the code is arguably now better designed because now I haven't actually
I haven't actually hardcoded the scores and I haven't actually copied and
pasted any of that code well if we consider now what's going on inside of the
computer's memory it's pretty much the same in
terms of the values but instead of the variables being literally score one
score two score three there's just one variable it's an array called scores but
you can index into its three locations by using scores bracket zero to get the
first scores bracket one to get the second scores bracket two to get the third
but this is key the memory is contiguous the screen is only so large
so it wraps around but physically digitally the memory is contiguous top to
bottom left to right and that's important why because the
brackets indicate 0 1 2 that each of these integers is just one integer away
from the next it can't be randomly down here all of a sudden it's got to be
back to back to back all right now equipped with that paradigm what more
could we actually do here well it turns out it's worth knowing that it's
possible in code to even pass arrays around as arguments and let me just
whip this program up somewhat quickly just so you've seen it before long but
let me go ahead and do this let me propose that I create a
function that does this averaging for me so I'm going to create a function
called average that returns a float and the arguments this thing is going
to take let's see it's going to be the array so it turns out if you want to
take in an array of numbers you can call it anything you want this is how you
tell C that a function takes not an integer but an array of integers and you
don't have to call it array I'm doing that just for the sake of discussion it can
be called X it can be numbers it can be anything
else I'm just calling it array to be super explicit as to what it is there now
how do I change my code down here what I think I'm going to do for the
moment is just this I'm going to get rid of this code here where I manually
computed the average and let me just call the average function here by
passing in the whole array of scores so this is just an example of abstraction
like now I have a function called average I don't care I don't have to
remember how it works once I implement
it it just kind of tightens up my main code a little bit but I do still have to
implement this so later in my file let me repeat myself as before the only time
it's okay in C to repeat yourself again and again by typing out again
average and then int array open bracket but now not a semicolon now I have
to implement this thing and I can implement this in a bunch of different ways
but I don't know in advance I can't just do this I can't just do array
bracket 0 plus array bracket 1 plus array bracket
2 unless this program is only ever going to work on three numbers so
let me go ahead and do this let me first propose that there's a
poor design here in my main function what value have I repeated twice
among the highlighted lines what jumps out as used twice length of the array yeah
the length of the array is just three now it's not a huge deal that I type the
number three on line eight on line nine but this is exactly the kind of like
shortcut that's going to get you in trouble eventually why because
eventually you or someone else is going to go in make the array bigger or
smaller and you're not going to realize that magically that same number is in
two places and indeed this is what a programmer would often call a magic
number a magic number is one that just kind of appears magically and
you're on the honor system to change it here if you change it here and then
you change it over here like that's not going to end well if the onus is on
the programmer to remember where they hardcoded that is wrote out three
explicitly so anytime you reuse a value like this you know what we should
probably do what we did last week which was to declare a variable
perhaps at the very top of my program so it's super obvious what it is called
maybe n and set that equal to three better yet what did I do last week to
make sure that I can't screw up and accidentally change that value yeah
constant and the keyword there was just const for short and now I have a
global variable global in the sense that I can access it anywhere that
is called n it's an int and it's always going to be three and now I can improve
my main function a little bit by just changing the threes to n so now if a
colleague realizes oh wait a minute there's four tests this year you change n
to four recompile the code and it just works everywhere else except in my
average function let me change it back to three just for consistency this is
not going to fly now to just sum up things like this for instance and then
return this divided by three why will this
not work now as I've defined it yeah okay I might be returning an integer
value when I intend to return a float per this but I think I'm okay because I used that
little trick where I made sure that at least one of the numbers in my
arithmetic expression is in fact a floating point value and just by adding the
point zero I make sure that everything gets treated as a float so I think that's
okay sorry a little louder exactly so left hand's not talking to the right hand
here in that my current implementation of average is
still assuming that there's only going to be three tests or whatever but wait a
minute I just went through the trouble of modifying this to be n generically
and if I change this to four I'm not going to be happy perhaps with my
average because now I'm going to ignore one of my test scores altogether
so let me change this back to three and unfortunately if it's a variable now n
and therefore I have literally a variable number of scores how do I take the
average of a variable number of
things I mean what's my building block there yeah yeah why don't I use a
loop that goes through the array and adds things up as you go I mean kind of
like grade school as you take the average on your calculator or paper pencil
you just keep adding the numbers together and then you divide at the end
by the total number of things so how can I do this well let me change my
implementation of average to first declare a variable called sum or
whatever set it equal to zero so this is like me on my piece of paper getting
divided by the total number of things now this I can tighten slightly recall
that this is syntactic sugar for just adding things I can't use plus plus
because that only literally adds one but I can use here plus equals questions
on this implementation here really the only takeaway or the most important
takeaway is that this is the syntax for how you tell a function that it
expects a whole array not a single variable like an int or the like you literally
use square brackets but you don't specify the length inside
there yeah what about the variable at the top good question what do I have
it defined as at the top this variable n it must be an integer if you're going to
use it inside of an array's square brackets here so this line 10 notice no
longer says three it says n and so whatever n is three or four or something
else that's how many integers I will get in that array and it must be by
definition of an array an integer that goes in those square brackets and
here's a common source of confusion when you
create the array that is declare it you use square brackets like this where you
put the total number of elements you want when you subsequently use the
array like I'm doing here you don't mention int again just like you don't
mention int again and again once a variable exists you use the square brackets
still but you don't use n you use zero or one or two or generically here i so
when C was designed they sometimes used the same syntax for two different
ideas or contexts yeah good question do I have to include
line six short answer yes because of the reason we ran into last week C or
clang really reads your code top to bottom left to right and so if the compiler
sees some mention of this function average on line 16 but you haven't told
the compiler that average exists you're going to get an error on the screen
so the conventional way to do that is you just copy paste the one first line of
code from the function its so-called prototype or declaration
good question and a perfect segue is there a library you can
use if you don't know the size of the array no and so if any of you have
programmed in Java or python or other languages you can
actually just ask the array like how big is it in C you and I the programmers
have to remember it and so short answer no there's no function that will just
automatically do this for us and in fact let me make a more subtle claim that
it's fine to use Global variables like this if they're really for configuration
options why it's just convenient to put
them at the very top of the file because everyone you your colleagues your
TAs are going to see them at the top of the code but you really shouldn't be
using them everywhere throughout your code it'd be better if the average
function itself were independent of that special variable so by that I mean
this you know what I should really do if I really want to be well designed I
should pass in the length of the array to the average function I should give
the average function a second argument I'll call it
length for instance but I could call it anything I want and so rather than
putting n all the way down here at the bottom of my file let me just
dynamically say length instead and this is a subtlety and no need to
get too tripped up over this but this now is just an example of how the same
function can take not one but two arguments but indeed in C you must
remember yourself what the length of an array is you can't just ask the array
via some syntax like you can those of you who've programmed
alluding here to not having that information now just to make sure I didn't
screw up anywhere let me compile this final version of scores suspense all
good dot slash scores 72 73 33 and we're still back in business so this version is more
complicated and as always we'll have this version on the course's website for
reference but the point really is that arrays not only can be used as
containers to store multiple values three or more in this case you can
also even pass them around as arguments as such all right now besides
that let's simplify for just a moment and consider now the world of chars
if we've just got single bytes where does this lead us and how does
this get us ultimately to strings to solve problems like readability and
cryptography and the like well here for instance are three lines of code out of
context that simply store three chars and you can already see where this is
going having three variables called C1 C2 C3 is clearly going to end up being
bad design because of all the silly
redundancy here but notice I'm using single quotes like last week because
these are single chars what does this look like in the computer's memory
well it looks a little something like this if we clear out the old memory C1 C2
C3 probably will end up here maybe not literally in the top left-hand corner this
is just an artist's rendition but C1 C2 C3 will probably end up like that now
what's really there it's really those same three numbers 72 73 33 but how
many bits does a byte have just eight so if we were to look at
the binary representation of these characters it would only be eight bits each
that's enough to store small numbers like 72 73 33 we're not dealing with
Unicode and emoji and the like but the point is the same you don't have to
use four bytes to store these numbers you can use a different data type like
chars and underneath the hood it's indeed going to use just single bytes for
each but this isn't really how we implement strings right
when you wanted to say hi last
week or this we used double quotes and we wrote all of the things together
and used one variable not three right when I typed in David I didn't have a
variable for d a v i d i had one variable called name that stored the whole
thing so in C we keep talking about these things called strings we'll see
eventually that strings are not necessarily what they seem to be but for now
the key thing about strings is that they're variable length so to speak right
they might be three characters hi or five
eight bytes if I draw a character it always takes up one byte but how many
bytes does a string take up yeah I mean that's kind of the right answer in this
case three it would seem but if it's David that's a good five characters but
where do we put the number three where do you put the number five right
this is literally all that's in your computer this is all our building blocks in
front of us so how can we where does the three go where does the five go
well it turns out you can solve this in a couple of
different ways but the way humans decided to implement strings years ago
is indeed an array but they added one extra byte at the end of every such
string array just to make clear with a so-called sentinel value that the string
ends here why so that if you have two strings in the computer's memory like
hi and bye you know where the barrier is between like the exclamation point
of one and the letter B in the next right you need some kind of delimiter and
so what really is underneath the hood is
this when you store a string in memory when you type in a string as the user
if you type in three characters it's going to use 3 + 1 = 4 bytes in total if
you type in David it's going to use 5 + 1 equals 6 bytes in total why because C
automatically adds this special zero at the end of the string I've drawn it with
backslash zero because this is how you represent zero as a char as a character
but this is literally just zero as we'll soon see so anytime there's a string in
memory it always takes up one more byte
stop printing decimal numbers are not that enlightening we'll generally write
the characters like this and again backslash zero is just special symbology like
it's what the programmer types to make clear that you're not saying hi zero
you're saying hi and then it's a special zero specifically it is eight zero bits
that indicate that it's the end of the string technically that backslash zero if you
want to be fancy it's called NUL N-U-L and it turns out you've seen this before
though we didn't call it out here's
that same ASCII chart from the past couple of weeks if I highlight this what is
decimal number zero mapping to NUL which is just programmer speak for the
special null character all zero bits that means the string ends here this all
happens automatically for you you do not need to create these null
characters or these zeros any questions then on this implementation thus far
any questions here no well let me do this let me go back to vs code in a
second and let's actually corroborate this with some code let me go ahead
and
create a small program called hi.c and how about we do this let me
include stdio.h let me type out int main void as
always and now let me do something simple and kind of bad but char c1
equals quote unquote H in single quotes char c2 equals quote unquote I in
single quotes and lastly char c3 equals exclamation point in single quotes
and now let me just print this out I can't use percent s cuz that is not a string
that's literally three chars cuz that's the design decision I
characters are just numbers and strings are just characters you can kind of
poke around let me change all three placeholders to percent i instead and
this is totally fine too let me rerun this make hi actually let me make
one change just so we can see this let me add spaces just for aesthetics' sake
let me do make hi dot slash hi enter and voila like now you can actually see
the numbers that I claimed back in week zero were in fact happening
underneath the hood well this is not how you would make
strings it'd be incredibly tedious to have three variables for three-letter words five
variables for five-letter words we've been using of course strings since last
week so let's do that instead string s equals quote unquote in double
quotes hi for this now because of these training wheels I need to include the
cs50 library but we'll come back to that in the coming week but for now I'm
going to go ahead and create a string s called quote unquote hi and now
I'm going to change this to be my familiar percent s
and now just print out s itself this of course is the same thing as last week
hi gives me the exact same thing but now we're dealing of course with
strings but how can we see a little beyond that well how about this let's poke
around further with today's Primitives even though s is a string I could
technically print out its first character with percent c by doing s bracket zero I
could technically print out its second character with percent c by doing s
bracket one I could print out its third
character with percent c and printing out s bracket 2 so again this just
derives logically from my understanding now that strings are arrays as you
note let me do make hi dot slash hi and no visual change but
I'm just kind of now tinkering around and in fact if you're really curious let me
do this let me change these back to percent i and let me
add a fourth one because if I'm really curious now let's see what's s bracket 3
this is the fourth byte and even though the
string itself is hi I think we can corroborate this whole null thing make hi dot slash
hi enter and there it is you could have done this last week if you really
wanted to geek out on strings but like for now it's just revealing what's going
on underneath the hood questions then on what these strings are yeah why
do we need the bracket why do you not need brackets good question
why do I not need brackets on line six because s is a string we'll see
in a couple of weeks that s is essentially implemented
underneath the hood indeed as an array but that happens automatically for
you you can treat S as just a variable name without square brackets you will
use square brackets when you have arrays of ints or you manually create
arrays of chars or doubles or floats or anything else but strings are special
why I mean every program you write seems to use strings text in some form
we're humans we like text not just numbers and such so this is just
treated a little specially in C and many other languages
as well other questions on this here no let's add then one other string to the
mix so instead of just saying hi why don't we consider a version of the program
that says both hi and bye and I claim now that that backslash zero that null
character is going to be ever more important now if we've got two strings in
memory so that C knows how to distinguish one from the other so let me go
ahead and just get rid of these two lines for the moment let me recreate
string s equals quote unquote in double
quotes hi let me give myself another one and because I'm just playing
around I'll choose very short variable names string t equals quote unquote
bye exclamation point and then let me just print them both out let me go
ahead and print out percent s backslash n comma s and then printf percent s
backslash n and then t so very simple demonstration of just these two
variables make hi dot slash hi and of course it prints out two lines one after
the other what's actually going on underneath the hood
well let's go back to the computer's memory hi I think it's going to be I
claim pretty much the same so s I'll claim is in the top left followed by the
backslash zero and that's important now because bye probably is going to end up
there and visually it wraps just by nature of how I've drawn this grid of bytes
but it's contiguous B Y E exclamation point null aka backslash zero this is now helpful
to printf because now printf knows where one begins and ends by way of
that special null character
but we can poke around now too what else can I do here how about this how
about I go into my code here back to VS Code and let me go ahead and say
something like well if I've got two of these strings you know let's put them
in an array let's kind of do this sort of arrays in arrays sort of inception style
here so string words bracket two so give me an array of two strings is what
I'm saying here in code even though we've not done it with strings yet we
only did it with ints and now let me do this the first
word aka words bracket zero will equal as before hi and now words
bracket one will equal quote unquote bye and now I've done the
exact same thing but again I'm just avoiding having s t q r and like all these
different variables in my code I just now I'm treating them as one single
array of strings how do I change my code down here well if I want to print the
first word I do words bracket zero and if I want to print the second word I do
words bracket one this is not a useful
exercise at the moment because I'm just making my code more complicated
but again it allows us to poke around and see what's going on because
there is that hi and bye but watch this if I really want to be cool I can do this
let's print out percent c percent c percent c backslash n and then here percent
c percent c percent c percent c so four of those and now here's where
things get interesting words is an array of strings but again if I may what's a
string an array of characters so just
use the same logic if words is an array of strings you get at the first string
with words bracket zero how do you get at the first character in the first
string words bracket zero bracket zero then words bracket zero bracket one and lastly words bracket zero
bracket two and now down here words bracket one but the first character is
there words bracket one the second character is here words bracket one the
third character is here whoops third character is here and words bracket
one the fourth character is here like this is not how people program this is
only for demonstration sake my God it's so tedious and verbose already but
if I make hi now dot slash hi now I'm like manually reinventing percent s if I
forgot it existed using percent C alone but you can indeed manipulate arrays
in this way but because strings are arrays of characters you can manipulate
strings in this way too any question now on this syntax any questions
here no no all right well let's go ahead and propose that we solve a couple of
other problems we might not have as before but first a quick visual
of what's been going on underneath the hood here if here again is where we
left off on the screen hi and bye back to back here is really how I just
treated these things s bracket 0 1 2 3 and then t bracket 0 1 2 3 4 but really once I put
them in an array the picture becomes this words bracket zero is the whole hi words
bracket one is the whole bye but if I really get into the weeds and start
indexing into individual characters in those strings all I'm using is new syntax
in order to represent these same values here
as before and let me do this int main void and the first thing I'll do is just
get a string from the user I'll ask the user as always for their name so I'll call
get string and say what's your name question mark as always and then down
here if I want to figure out the length of this string and
print the length out on the screen well I can kind of do this similar in spirit to
the average where I'm accumulating something let me go ahead and
initialize n to zero let me give
myself huh it's not a for loop because I don't know in
advance how long it is but what if I do this while the value at name bracket n
does not equal single quote backslash zero crazy syntax at the moment but it's just
the culmination of these various building blocks let me just finish the thought
here n plus plus and then down here let's just print out with printf and percent i
that value of n so I claim this is going to show me the length of any string I
type in whether
going to check name bracket one well if I typed in David name bracket one is
going to be a and a does not equal backslash zero and so it's going to go again and again
and again but five steps in total later it's going to get to the byte after David and
realize wait a minute that is a backslash zero the loop finishes and I print out
the total length arrays in general do not have this null character however
strings do again strings are special versus all of the other data types we've
talked about thus far but how could I
for instance uh do this differently well let's actually Factor this out as a
function as I've commonly done but rather than implement it myself you
know what it turns out what's nice about strings being so common there are
many other people who have solved these problems before and in fact
there's a whole string library in C it is used by way of a header file called
string.h and what string.h is is a library of string related functions in fact you
can see in CS50's manual pages for C the
n equals strlen of the human's name and now I'll just use printf as
before with percent i backslash n and output the value of n but there's a bug
at the moment what have I forgotten to do yeah I have to include the header
file at the top of the code so let me also
include string.h at the top of my file so that C knows that in fact strlen exists
let me go ahead and make length as before ./length or actually really for
the first time what's your name David
and hopefully I'm going to see in fact five by contrast if I run it again and
type in HI exclamation point now I see three so strlen is just one of the
functions in that library and there's so many more in fact yet another library
that might be useful moving forward is this one ctype which relates to C data
types and lots of functions therein that can be useful for instance if you
review its documentation in the manual pages online you'll see that there are
functions via which we can solve problems like this
let me go ahead and propose here let me see let's do an example here
involving how about checking if something is uppercase or lowercase and
converting it to uppercase only let me go back to VS Code and code a
program called uppercase.c in this file I'm going to start by including now
as always cs50.h I'm going to include stdio.h and I'm going to add one
other to the mix which is string.h now too so I can access the length of things
as needed int main void comes next and then within my main function
I'm going to go ahead and declare a string called s I'm going to call get string
as before and I'm going to go ahead and just ask the user for a string called
before I want to do it before and after whatever the user types in is before
but I want to force everything to uppercase thereafter let me now in this
loop here do this let me printf quote unquote after just so we can see this
on the screen and let me do for int i gets zero i is less than strlen of s i plus
plus what am I about to do I'm about to
iterate over every character in the string from left to right from zero on
up to but not through the length of s and how do I check if something is
lowercase so that I can actually force it to uppercase well it turns out I could
do this literally if the character in s at location i is greater than or equal to
lowercase a ampersand ampersand which means and instead of or which we
saw in the past s bracket i is less than or equal to lowercase z that means logically in
English that this is indeed a lowercase how do I now convert it to
uppercase this character well I could just literally print out the same
character but that would not be the answer here because that's not changing
the value but what could I do instead well let me actually pull up here real
fast the ASCII chart as before and let's see if we can't glean some insight if I
pull up the same ASCII chart and suppose the human has typed in a
lowercase a that's 97 I want to convert it to uppercase A so what
number do I want to convert the 97 to per week zero so 65 we
keep coming back to that one what if the user types in lowercase b I want to
change the 98 value to 66 and so forth and any quick math how far apart are
those so it's always 32 like uppercase to lower case is always wonderfully
good design 32 away one from the other so what does this mean well I think
we saw earlier that underneath the hood a Char is just a number you can
certainly do arithmetic on it and here again if you understand these lower
level Primitives what if I do this whatever s bracket I
on occasion literally that's what Microsoft and Google have done they iterate
over every character in the document check if it's lowercase and if so they
subtract 32 from it and show you the new value what if though it is not a
lowercase letter I think I can keep it easy and just print out the current letter
unchanged if my goal is to Simply Force things to all uppercase and that
letter then would be S bracket I so let me go ahead now and make uppercase
hopefully no errors ./uppercase and
I'll now type in David with an uppercase D but lowercase everything else
but now the after version is DAVID an aesthetic bug notice here I forgot to
include just for prettiness' sake a backslash n at the end no problem I'll add
that let me fix my mistake make uppercase ./uppercase enter David enter
and voila and I I deliberately added another space after the after just so they
would line up pretty even though before and after have different numbers of
letters questions then on this implementation of forcing something to
you so how can I now use this uh Library ctype.h well let me go back into my
code let me include this among my header files here I tend just so I can skim
things easily I tend to alphabetize my headers but that's not strictly
necessary but it allows me at a glance to realize did I or did I not include
something I need now let me go ahead and do this it turns out if you read the
documentation for the ctype library there is a function wonderfully called
islower that takes in a character as
up I can get rid of the whole else I can get rid of the whole if and arguably
now Implement a program that's just as correct but better designed why
fewer lines of code easier to read lower probability of mistakes assuming
the library is correct it just makes it easier and faster for me now to write
code so if I now do one last time make uppercase enter ./uppercase and type
in my name still working but now notice we've whittled this down to far fewer
lines of code albeit using now this additional
library questions then on how we did this no well even though this
code I dare say is correct it's not necessarily well designed just yet in fact
there's one line of code one function call in this current implementation
that's more inefficient than it needs to be and allow me to draw our attention
to this here line 10 wherein we're calling strlen but we're calling it inside of
this for Loop specifically inside of the condition and why might that not
necessarily be the
best idea well is the length of the string s changing ever I mean certainly not
within the span of this Loop and so here we are within our for loop on line 10
11 12 and 13 asking on every iteration that same question what's the length
of s what's the length of s what's the length of s and in turn we're calling
strlen every time even though we're getting back the same answer so I
dare say a better solution here would be to maybe figure out the length of s
earlier on in my code and maybe declare a variable or
blocks for the day so we started by talking about those command line
arguments that clang uses whereby anything after the command that you
type at a prompt be it make or clang or even cd in Linux any word thereafter
or something cryptic like dash o is a command-line argument it's an input to the
command it's different from a function argument because a function
argument of course is an input to a function but it's the same idea it's just
different syntax after the dollar sign at the prompt well it turns out
that command line arguments are something you can now use in your own
programs by accessing words after the prompt and let me propose that
we implement this as follows let me propose that we switch back to VS
Code here and I'll open a new file here called greet.c so greet.c is
going to be a program that very simply greets the user had we written this
last week we would have done this include cs50.h and then include
stdio.h and then int main void and then we might do something simple
like string
name equals get string quote unquote what's your name question mark and
then we would have printed out as always hello comma percent s and then
plugging in that name so this is the same program we've implemented many
times just to make sure it works although nope that's not quite the same
program semicolon in the wrong place this now is the same program so
make greet ./greet and I'll type in my own name hello David so we're back
there now what's arguably a little Annoying about this program if I type in
something else like Carter enter you know I have to run the program wait for
the prompt type in my name hit enter and that's fine but imagine if every
program worked like this like make suppose you could only type make then
you wait for a prompt then you type the name of the program you want to
make then you hit enter or worse in Linux when you have to change
directories as you might have for problem set one what if you had to type CD
enter now type the name of the folder you want to change into enter I mean
it
just slows life down and so it just gets annoying quickly so commandline
arguments just let you express your whole thought all at once so how can I
do this well if I want to express the notion of command line arguments in my
code I could do something like this I could for the very first time go up and
get rid of this void which as of today means this program takes no command
line arguments and I can change it to exactly this int argc string argv with
brackets now it's cryptic admittedly and let me
zoom in but I think we can perhaps infer now what's going on if main Now
does not have void as its input which means it takes no arguments surely the
spoiler here is that now main will take command line argument somehow any
guesses as to what argv is or will be what might this represent it's an array of
strings right by way of the syntax yeah exactly it will be all of the characters
or really all of the words that you type at the prompt argc is an int any
guess argument count is what it generally
stands for though technically you could call these things anything but
this is the convention because I claimed earlier that arrays don't keep track
of their own length if you want to know how many words the human typed at
the prompt after your program's name you have to be told not just the array
of the words but the length of that array the strings you can figure out the
length of using strlen but you can't figure out the length of the array of
strings the collection of words that the human typed
in so how can I now use this well let me go ahead and do this let me go
ahead and change this program now just to be printf quote unquote hello
comma percent s backslash n then argv bracket 1 so this is not the best
version of my code yet but it's my first make greet and now let me ./greet
David all at once enter hello David now let me run it again ./greet Carter enter
hello Carter you know it's a marginal Improvement but I don't have to wait
for get string to prompt me to hit enter
it's just speeding things up you know it's fast one less command to type in but
I deliberately did bracket one but what's the beginning of argv it would
be bracket zero well what's that this is sometimes useful though for now it's
not suppose I recompile my code and run this program now ./greet David
anyone want to guess what's in argv zero say again greet enter hello ./greet
so if you want sort of inception style your program to figure out what its own
name is or at least how it was executed at
the command line at the terminal you can look at argv zero in general probably
not that useful probably better to start looking at bracket one which was the
first word after the program name and if there were more I could do this how
about argv bracket 2 let me add in a second percent s let me recompile greet
let me do ./greet David Malan enter and that too now works taking in two
words at the prompt if I really want to be smart at this now I could do
something like this though how about if the count of arguments AKA argc
equals equals 2
then assume that the human typed in only their first name and do printf
hello comma percent s backslash n and then argv bracket 1 else if the human
did not provide exactly two arguments the name of the program and their
own name let's just print out a default value if they forgot their name or they
typed in two names or three names let's just do hello comma world as a
default and we'll just ignore what the human typed in if I recompile this make
greet I can ./greet and David again enter oops
sorry what am I missing yeah so newbie mistake else all right make greet
again ./greet David enter there's my hello David but if I omit my name I
just get the generic like a default value and if I get a little curious and I type
in both names then I get ignored too why because I just haven't built in
support for argc of three I could do anything I want but now we have access
to these kinds of building blocks all right what else might I do here well it
turns out there might be some final features for us to
now execute um notice though that in C despite what you might see in books
or online tutorials nowadays the two official formats for defining a main
function are either this which we've been using now for Two Plus weeks or
now this whereby you change the void to int argc and then for now string
argv and then empty brackets and we'll see that this too is a
simplification some training wheels if you will but for now those are the two
forms even though you will see in online tutorials and even
books some people use main in different ways these are the two now to
keep in mind and I'll note that these command line arguments are kind of all
over the place didn't probably expect to see this word on the screen here
and what does it mean well it turns out that for decades there's actually this
program that comes with Linux systems in particular called cowsay why
probably because someone had too much free time once and decided to
write a program that creates ASCII art out of a cow saying something textually
on
the screen but you use cowsay just for fun by way of command-line
arguments so for instance let me propose that uh I go back to vs code here
not because I want to write any code but I just want to use my terminal
window and let me uh maximize my terminal window here and let me go
ahead and type in something like how about cowsay space moo so cowsay is not a
program I wrote it's been around for decades but we installed it in VS Code
for you in the cloud it takes at least one command-line argument what do
you
want the cow to say I can say cowsay moo and hit enter and voila there's
my ASCII art of a cow saying moo on the screen it can say multiple words so I
can say hello world and enter and now it says hello world so this is just an
example of a silly program that uses command line arguments but it takes
others too just like clang it uses this convention of hyphens to change the
output of the program dash something is just a super common convention
with command-line arguments when you want a very terse notation for some
option like
screen here so this too is just an example of what you can do with these
command line arguments now that we have this building block and there's
one final thing we can now do with code there's one last feature today that
we'll introduce before we now connect all of these dots to readability and
encryption by talking lastly about something called exit status it turns out
that whenever your main function exits it returns a secret integer that you
can figure out as the programmer or
an advanced user what it was and these exit codes exit statuses are typically
used to indicate errors so for instance over the past couple of years if
you used Zoom and you ever got some kind of error you might have seen a
screen like this it's usually not that helpful maybe tells you to click report
problem or contact support but very often in our human world on Macs PCs
and phones you see cryptic error codes like literally numbers that probably
only Zoom knows or Microsoft or Google or whatever company
wrote the software you're using but that number corresponds to a specific
error that some human somewhere knows might very well happen these are
used similarly although under a different name that we'll talk about later in
the term uh on the web as well have you ever seen this maybe not character
but number so 404 means what so error yes but really not found so why I
mean this is the most Arcane thing and we'll talk in a few weeks about like
what this and other numbers mean but numbers are all around us in
technology and they very often mean something to the technical people who
wrote the software less so to humans like you and me why so many of us
recognize 404 is kind of weird that like that's been around long enough that
we all know it but it really is just a special number that represents an error of
some sort so it turns out the last thing we'll reveal today about what we've
been taking for granted for two weeks is what the int is in main we've seen
just a moment ago that the thing in
the parentheses which up until now has been void which means no command
line arguments now int argc string argv brackets just means yes command-line
arguments and we've seen how to access them so the last piece of the
puzzle honestly of all the cryptic syntax the past two weeks is just what int
means int is always there for main and it indicates that main will always
return an integer even though you and I have never done so explicitly usually
main returns zero by default but it would be weird if you saw
an error message saying zero so zero is just hidden you would never see it
on the screen but it's happening automatically by way of how C is designed
so let me write one final program here I'll call it for instance status.c to
show you these exit statuses code status.c and then up here let me do
something simple like include cs50.h then include stdio.h and then int
main let's do actually let's use a command-line argument int argc string
argv so that's copy paste but now
let's do this if argc does not equal two why don't we do something like this
let's not just default to hello world like last time let's yell at the user so
let's say something like printf missing command-line argument so that they
know they screwed up and they need to run the program again correctly else
let's go ahead and print out as before hello comma percent s and
then plug in argv bracket one so the human's name from the prompt now at
this point let me go ahead and run
make status ./status and I'll type nothing first I get yelled at this time I'll type
it again ./status David and it works properly but now let me show you a
somewhat secret cryptic command you can type this at your prompt and it's
just a coincidence that there's another dollar sign echo dollar sign question
mark totally arcane but it allows you to see what exit status your program
has ended with so let me run this again the wrong way ./status okay I get the
error message what was secretly returned I can't
see it there's obviously no error screen but by typing echo dollar sign
question mark I can see that oh my program automatically by default returned
zero however if I run it again correctly ./status David enter this is the correct
version but if I run echo dollar sign question mark again it still exited with zero
and long story short this is just a missed opportunity when something goes
wrong why don't I return a value other than zero zero by default means
success and it's always there automatically
but you can control this I can go into my code here and return one else if
something works fine I can return zero by default and honestly if I omit the
return zero again zero automatically is returned so let me go ahead
though and be explicit just so I know what's going on make status again ./status
and let's do this correctly with David enter hello David
echo dollar sign question mark zero so all is well but now if I do ./status
and nothing or multiple things but not just David enter
I get the error message but now if I do echo dollar sign question mark voila
there now is the one so what does this now mean this is in the graphical
world we would just show something like this on the screen which is a little
more informative to the user but even in the Linux world we don't have a
GUI necessarily even for the programs we've written you can check these
exit statuses and in fact more comfortable more advanced programmers
when they write code that calls programs be it cowsay or anything else you
can in code
check what the exit status is of a program and then decide did my program
work or did it not and now let's connect the final dots uh before we adjourn
for some fruit snacks cryptography namely one of the applications this
week via which you'll be able to send if you will secret messages and better
yet decrypt secret messages this will be in addition to perhaps analyzing the
readability of text using heuristics like we identified at the start of class too so
cryptography is just the art the science
claim here is to take some plain text like the message you want to send think
back to grade school if you ever passed a note to a friend or to your crush
saying I love you it's a little awkward if the teacher or someone else
intercepts the paper and in English it just says I love you or whatever it is it'd
be nice if you had at least encrypted it in some way but the other person
needs to know what algorithm you used and what inputs you used to that
algorithm so that ultimately they can decode the so-called Cipher text which
to send and then some Secret so for instance suppose that the simplest thing
I could think of as a kid was instead of sending the letter A why don't I write
the letter b instead of the letter B why don't I write the letter c so I can kind
of shift the English alphabet by one space so a becomes b b becomes C dot
dot dot Z becomes a you can wrap around at the end and let's assume no
punctuation in this part of the story so that's a very simple algorithm add a
value to each letter and send the value as the
cipher text and now the teacher the classmate they have to know that you
use not only this rotational algorithm also known as a Caesar Cipher they
also need to know what number you use did you add one to every letter two
to every letter 25 to every letter now if they're super smart and probably not
the young age in this story they could also just try all possibilities and
that would be an attack on the algorithm this is not a sophisticated algorithm
but it's enough to send a message in class so if the two
inputs now are HI! as the plain text message and one as the so-called key
the secret number that only you and the other person know you might be
able to encrypt a message from one way to the other and so in this case for
instance HI! would become IJ exclamation point in this version of the
algorithm we're not going to bother with numbers or punctuation we'll only
operate on A through Z be it uppercase or lowercase so now if you were to
receive a slip of paper in class with IJ on it you
know you the recipient would know what it is so long as you know that the
sender used one because you just reverse the algorithm and you subtract
one instead the teacher you know they probably don't know what this means
and they're not going to spend time hacking the message so it just looks
scrambled to them and that's what we get from encryption someone who
intercepts it be it in class or in the real world on the Internet or anywhere
else can't actually figure out ideally what it is you have
sent the opposite of course is indeed called decryption but the process is the
same we now pass in negative one and so how about this why don't we end
with a demonstration here UIJT XBT DT50 there's a bit of a tell there if we
pass that in and do negative one well how do we get out the plain text originally
well if this is the cipher text and we subtract one from each letter I think U
becomes T I becomes H J becomes I T becomes S X becomes W B becomes A T
becomes S D becomes C T becomes S and
this was indeed CS50 have a duck on your way out and some snacks in the
lobby
all right this is cs50 and this is week three already wherein we'll take a look
back actually at week zero where we first began and in week zero recall that
everything was very intuitive in a sense we talked not just about
representation of information but algorithms and we talked about tearing a
phone book again and again and that somehow got us to a better solution
but
today we'll try to start formalizing some of those ideas and capturing some
of those same ideas not just in pseudocode but in actual code as well
but we'll also consider the efficiency of those algorithms like just how
good how well designed our algorithms actually are and if you recall when
we did the phone book example wherein I first had an algorithm searching
one page at a time and then second went two pages at a time and then third
started tearing the thing in half recall that we with a wave of
the hand kind of analyzed it as follows we proposed that if the x-axis here is
the size of the problem like number of pages in a phone book and the y- axis
is the time required to solve the problem in seconds minutes page tears
whatever your unit of measuring is recall that the first algorithm was this
straight line such that if you had n pages in the phone book it might have
this slope of n and there's this one-to-one relationship between pages and
tears two pages at a time of course was twice as fast but
still really the same shape the yellow line here indicating that yeah it's n /
two maybe plus one if you have to double back as we discussed but it's
really still fundamentally the same algorithm one or two pages at a time but
the third algorithm recall was this one here in green where we called it
logarithmic in terms of how fast or how slow it was and indeed the
implication of this algorithm was that we could even double the size of the
phone book and no big deal one additional page tear and we take yet
another thousand page bite out of the phone book so today we'll revisit some
of these ideas formalize them a bit but also translate some of them
ultimately to code and all of that now is possible because we have this lower
level understanding perhaps of like what's actually inside of your computer
this of course is your computer's Ram or memory and recall that if we kind of
start to abstract this away your computer's memory is really just a
grid of bytes in fact we don't have to look
at the hardware anymore and we looked at a grid of bytes like this whereby
each of these bytes could be used to store a char an int a long or even an
entire string at that but let's focus perhaps just on a subset of this because
last week of course we emphasized really arrays storing things in arrays and
that allowed us to start storing entire strings sequences of characters and
even arrays of integers if we want to have multiple ones and not just multiple
variables as well but the catch is that
if you look inside of an array in the computer's memory and for instance
suppose these integers here are stored it's pretty easy for us humans to
glance at this and immediately find the number 50 you sort of have this
bird's eye view from where you're seated of everything on the screen and so
it's pretty obvious how you get to the number 50 but in the world of
computers of course it turns out that this is hardware and computers for
today's purposes can only do one thing at a time they can't just take it
all in and find instantly some number like 50 so perhaps a decent metaphor
is to consider the array of memory inside of your computer really is a
sequence of closed doors and if the computer wants to find some value in
an array it has to do the digital equivalent of opening each of these doors
one at a time now how can code do that well of course we introduced indices
or indexes last week whereby we by convention call the first element of an
array location zero the second location one the Third
location two and so forth so-called zero indexed and this allowed us to now
bridge this conceptual world of like what's going on in memory with actual
code because now we had this square bracket syntax via which we could go
searching for something if we so choose and it turns out if I now uh paint
these red instead of yellow it would seem that we actually have a pretty
good physical metaphor here standing in place for what would be a
computer's uh array of memory if for instance you're storing some
seven numbers like that and so today we begin with a look at a specific type
of algorithm that is for searching like searching is all over the place all of
us have probably gone to [Link] or some equivalent like already
multiple times per day and getting back answers fast is what companies like
Google are really good at so how are they doing that how are they storing
information in computers' memory well let's consider what this really is it's
really just a problem as it was back in week zero the
input though to the problem for now might be this array of seven lockers so
that's the input to the problem inside of which is a number and maybe for
simplicity now we just want a yes no a true false answer a bool that is to say
of whether or not some number like 50 is in that array it's not quite as fancy
as Google that doesn't just tell you yes we have search results it actually
gives you the search results but for now we'll keep it simple and just output
as part of this problem yes or no true or false
we have found the number we're looking for given an input like that array
but it turns out inside of this black box that we keep coming back to there's
all sorts of possible algorithms and we talked about this at a high level
conceptually in week zero with the phone book but today let's consider it a
little more concretely uh by way of a game that some of you might have
grown up with namely Monopoly and so behind these doors it turns out will
be hidden some denominations of Monopoly money but for
this we now have two volunteers if you'd like to greet the world hi I'm
Jackson yay hi my name is Stephanie Steph and you want to say a little
something about yourselves years house first year living in Matthews nice
and I'm a first year in Canada nice well Welcome to our two volunteers so
why don't we do this would one of you like to volunteer the other to go first
I'll go first okay all right so Stephanie's up first and behind one of these doors
here we've hidden the Monopoly money 50 and so we'd like you
to find the 50 we'll tell you nothing more about the lockers but we would like
you to execute a certain algorithm and in fact I'm going to give you some
pseudo code for this and I'm going to give you the name for it it's called
linear search and as the name implies you're pretty much going to end up
walking in sort of a straight line but how are you going to do this well let me
propose that in a moment your first step will be to think kind of like a loop for
each door from left to right what do we
want you to do on each iteration well if 50 is behind that door then we want
to go ahead and have you return true and sort of hold up the 50 proudly if
you will for the group otherwise if you get through that whole loop and you
haven't found the number 50 you can just throw up your hands in
disappointment false you've not found the number 50 so to be clear step one
is going to be for each uh door from left to right how would you like to begin
yep oh and then yep there we go yep oh and if you'd like to at least
tell oh good good acting here what have you found instead it's not 50 but 20
oh okay so step one was a fail so let's move on to step two inside of the Loop
what are you going to do next I'm going to move to the next door okay
almost okay almost sort of a 500 instead next Locker I would rather take no
okay we're not telling the audience oh okay so keep going this is step three
now oh [Music] man five okay few more lockers to check a little sad guys all
right second to last step this one kind of close all right
and finally the last clearly you've been perhaps set up here let's go all right
so the number 50 and Stephanie if I may let me ask you a question here so
on the screen this is the pseudo code you just executed suppose though I
had done what many of us have gotten into the habit of doing when you
have a if condition you often have an else Branch as well suppose that I had
done this now and I'm marking it in red to be clear this is wrong but what
would have been bad about this code using an if and an else might you
say any instincts um then you would end up like canceling the
code before you found the 50 yeah exactly just be eternally sad indeed when
Stephanie had opened the first Locker she had found 20 20 of course is not
50 she would have decreed false but of course she hadn't checked all of the
rest of the lockers so that would seem to be a key detail that with this
implementation of the pseudo code we actually do go through as we did and
only return false not even with an else but just at the end of the loop such
that we only reach that line if we don't return true earlier than that well
let's go ahead and do this let me take the mic from you if you'd like to take a
seat next to Jackson Jackson in just a moment we'll have you come up Carter
if you don't mind reorganizing the uh lockers for us but in the meantime let
me point out how we might now translate that same idea to code pretty high
level pretty English oriented with that pseudo code but really now as of last
week we have syntax via which Stephanie and soon
Jackson could treat this Locker the set of lockers as really indeed an array
using bracket notation so we can now get a little closer in our pseudo code to
actual code and the way a computer scientist for instance would translate
fairly high level English pseudo code like this to something that's a little
closer to C or any language that supports arrays would be a little more
cryptically like this but you'll see more of this syntax in the coming days for i
from 0 to n minus one this is still pseudo code but that's kind of
like the English-like way of expressing what we've now come to know as a
for loop if 50 is behind doors bracket i so I'm assuming for the sake of
discussion that doors now is the name of my variable this array of seven
doors but then the rest of the logic the rest of the pseudo code really is the
same way and so you'll find in time that programmers computer scientists
more generally when you start expressing ideas algorithms to someone else
instead of maybe operating at this level here
you now have in your vocabulary really a new syntax with which you can be a
little more specific not getting so into the weeds of writing actual C code but
at least now doing something that's a little closer to manipulating an array
like this so Jackson would you like to uh stand on up all right yes yes support
for Jackson here too nice and here now I'm going to allow you an assumption
that Stephanie did not have Stephanie clearly was really doing her best
searching from left to right using linear search as we'll now call it
but they were pretty much in random order right there was a 20 over there
there was a one over there and then a 50 so we deliberately jumbled things
up and did not sort the numbers for her but Carter kindly has just come up to
give you a leg up Jackson by sorting the numbers in advance and we'd like
you this time much like in week zero to do something again and again but
this time using what we'll Now call binary search it's exactly the same
algorithm conceptually as we did in week zero but
amend my pseudo code here and just say Jackson if we don't hand you
any doors at all or eventually as he's dividing and conquering if he's left with
no more doors we have to handle that situation so that the behavior is
defined all right so with that said Jackson do you want to go ahead and
find us the number 50 and walk us through verbally what you're doing and
finding all right so it looks like this one is the middle door so I'm going to
open it but it's 20 not 50 oh sad okay what's
going through your head now so now I'm looking because 50 is higher than
20 I want to look to the right good um and look for the new middle door
which would be here nice and it's 100 sad um but 50 is less than 100 so now
we know to look left which would be here and Tada nice very well done this
time around too so thank you first to our volunteers here and in fact um
since you're fans of Monopoly as we're so informed we have the Cambridge
edition of Monopoly with all your Harvard favorites here you go
thank you so thank you to our volunteers for finding us 50 so kind of was
more popular than we expected so here we can translate this one more time
into something a little closer to code and again still pseudo code but here
now might be another formulation of exactly what Jackson just did just using
the nomenclature now of arrays where you can be a little more precise with
your instructions and still leave it to someone else to translate this finally to
code but here we have same question at the beginning if no
doors left return false if 50 is behind doors bracket middle so I'm assuming
here because this is pseudo code that somewhere I've done the mental math
or the actual math to figure out what the index of middle is for instance if
these are seven doors in an array this would be location zero 1 2 3 4 5 6 so
somehow I've taken the total number of doors seven divided by two to find
the middle that's three and a half we have to deal with rounding but suffice it
to say there's a well-defined formula for
finding the middle index given the total number of lockers divide by two and
then round accordingly so that's presumably what Jackson did just by
counting in his head to find us door number three not the third door the
fourth door but doors bracket three so this is just saying if 50 is behind doors
bracket middle return true that was not the case he found a $20 bill instead
else if 50 is less than doors bracket middle go ahead and now it gets
interesting search doors zero through doors middle minus
one so it's getting a little more into the weeds now but if middle is three this
one here well we want to now have Jackson search if 50 had been uh if the
number had been less we want to start at bracket zero and go up through
this one where we deliberately subtract one because what's the point of looking in
the same Locker again we might as well do zero through middle minus one
else if 50 is greater than doors bracket middle which it was what did we then
do Jackson intuitively searched for doors middle
plus one through door n minus one and honestly it gets a little Annoying
having the pluses and the minuses here but just think of what it means this
is the middle door and Jackson then did proceed to search through doors
middle plus one because there's no point in searching this one again and
then the last element in any array of size n where n is just our go-to number
uh for the size is always going to be n minus one it's not going to be n it's
going to be n minus one because we always start counting arrays
at zero so here then we have a translation into pseudo code that's a little
closer to C of this exact same idea and here we come full circle to week zero
like in week zero it's pretty intuitive to imagine dividing and conquering a
problem like this but if you now think back to your actual iPhone your
Android phone or the like when you're doing autocomplete and searching the
list it's possible if you don't have many friends or family or colleagues in the
phone you know what linear search
just checking every name for the person you're searching for might be
perfectly fine but odds are your phone is being smarter than that especially if
you start to have dozens hundreds thousands of people in your contacts over
the years what would be better than linear search well perhaps binary search
but but but there's an assumption a requirement which is what why was
Jackson ultimately able to find the 50 in just like three steps instead of a full
seven like Stephanie because the array was sorted
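
The binary search pseudo code above might be sketched in C roughly as follows. This is a minimal sketch, not the lecture's own code: the function name binary_search and the parameters low and high are hypothetical, and it assumes the array is already sorted in increasing order.

```c
#include <stdbool.h>

// Recursive binary search over doors[low..high], both ends inclusive.
// Assumes doors is sorted in increasing order; all names are illustrative.
bool binary_search(const int doors[], int low, int high, int target)
{
    // If no doors left, return false
    if (low > high)
    {
        return false;
    }

    // Middle index, rounding down (for 7 doors: low 0, high 6, middle 3)
    int middle = low + (high - low) / 2;

    if (doors[middle] == target)
    {
        return true;
    }
    else if (target < doors[middle])
    {
        // Search doors[low] through doors[middle - 1]
        return binary_search(doors, low, middle - 1, target);
    }
    else
    {
        // Search doors[middle + 1] through doors[high]
        return binary_search(doors, middle + 1, high, target);
    }
}
```

For seven sorted doors, a call such as binary_search(doors, 0, 6, 50) looks at doors[3] first, which is the same middle door Jackson opened.
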
and so this is sort of a teaser for what we'll have to come back to later today
well you know how much effort did it take someone like Carter how much
effort does it take your phone to sort all of those names and numbers in
advance because maybe it's not actually worth the amount of time now
someone like Google probably somehow keeps the database of web pages
sorted you could imagine it being super slow if when you type in cats or
something else into [Link] if they searched linearly
over their entire data set ideally they're doing something a little smarter
than that so we'll formalize now exactly this kind of analysis and it's not
going to be so much mathy as it still will be intuitive but we'll introduce
you to some jargon some terminology that most any programmer or
computer scientist might use when analyzing their own algorithms let's
formalize now what this kind of analysis is so right now I claim binary
search is better than linear search but how much better and
why exactly well it all comes back to this kind of graph so this recall is how
we analyzed the phone book back in week zero and recall that indeed we
had these formulas rough formulas that describe the running time of
those three algorithms one page at a time two pages at a time and then
tearing the thing again and again in half and precisely if you count it up the
number of pages I was touching or the number of pages I was tearing it's fair
to say that the first algorithm in the worst case might
have taken n total Pages it didn't because I was searching for John Harvard at
the time which is somewhat early in the alphabet but if I were searching
for someone with a last name starting with Z I would have had to keep going
and going in the worst case through all n Pages not as bad for the second
algorithm and that's why we do n divided by two and even that's a bit of a
white lie right it's probably n / 2 + 1 in case I have to double back but again
I'm sort of doing this more generally to
capture the essence of these things and then we really got into the weeds
with like log base 2 of n for that third and final algorithm and at the time we
claimed anytime you're dividing something in half in half in half odds
are there's going to be some kind of logarithm involved and we'll see that
today but today we're going to actually start using computer science
terminology and we're going to sort of formalize this imprecision if you will
we are not going to care generally about exactly
how many steps some algorithm takes because that's not going to be that
enlightening especially if maybe you have a faster computer tomorrow than
you did today it wouldn't really be fair to compare numbers too precisely we
really kind of want to with a wave of the hand just get a sense of roughly
how slow or how fast an algorithm is so the notation here is deliberate that is
literally a capital O often italicized referred to as Big O and so the first
algorithm is in Big O of n the second algorithm is in
Big O of n / two the third algorithm is in Big O of log base 2 of n but even that
is kind of unnecessary detail when using Big O notation you really don't care
about as we'll see the smaller order terms right we're not going to care about
the divided by two because you know what the shape of these algorithms is
almost the same and really the idea the algorithm itself is sort of
fundamentally the same okay and instead of one page at a time I'm doing
two but if you throw millions of pages billions
of pages at me those algorithms are really going to kind of perform the same
as n gets really large goes off toward infinity and the same is true for
logarithms even if you're a little rusty it turns out that whether you do the
math with log base 2 log base 3 log base 10 you can just multiply one by a
constant to get the other this is only to say a computer scientist
would generally say that the first two algorithms are on the order of n steps the
third algorithm is on the order of log
n steps and we don't really care precisely what we mean beyond that and
this Big O notation as we'll see and actually let me let me zoom out if you
can imagine suddenly making the x-axis much longer so more pages on the
screen at once it is indeed going to be the shapes of these curves that
matter because imagine in your mind's eye as you zoom out zoom out zoom
out zoom out and as n gets much much much bigger on the x-axis the red
and the yellow line are essentially going to look the same
once n is sufficiently large but the green line is never going to look the same
it's going to be a fundamentally different shape and so that's the intuition of
Big O to get a sense of these rates of performance like this so here then is
Big O here is perhaps a cheat sheet of like the common formulas that a
computer scientist certainly in an introductory context might use when
analyzing algorithms and let's consider for a moment which of our first two
algorithms linear search and binary
search fall into these categories so I've ordered them from sort of slowest to
fastest so order of n squared it's not something we've actually seen yet but it tends
to be slow because it's quadratic you're doing n * n that's got to add up to a
lot of steps better today is going to be n log n even better is going to be n even
better than that is log n and best is so-called order of one like one step or maybe
two steps maybe even a thousand steps but a fixed finite number of steps
that never changes no matter how big
n is so given this chart just to be clear linear search let's consider the worst
case in the worst case how many steps did it take someone like Stephanie to
find the uh solution to the problem assuming not seven doors but n doors
yeah so on the order of n and in this case it's exactly n but you know what
you know maybe it's arguably 2n right because it took Stephanie a couple
of steps like she had to lift the latch she had to open the door maybe it's
three steps she had to show the money so now
it's 3n or 2n but there we don't really care about that level of precision we
really just care about the fundamental number of operations so we'll say yes
on the order of n so that might be an upper bound we'll call this for linear
search and how about binary search in Jackson's case or in general me in
week zero if there's n doors how many steps did it take Jackson or me
using binary search in this case it was literally three but that's not a formula
yeah so it's on the order of log
n and indeed if there's seven doors well that's almost eight if you just do a
little bit of rounding and indeed if you take log base 2 of eight okay so that
does actually give us three so the math actually checks out and if you're not
comfy with logarithms no big deal just think about it intuitively uh logarithm
of base two is just dividing something in half again and again so on this chart when
we consider Big O which to be clear allows you to describe the order of an
algorithm's running time like the
over time some algorithms might always take a minimum of n squared steps or
on the order of n steps some might only take n log n or n or log n or one so
something like uh linear search when Stephanie started with linear search
she didn't get lucky this time on stage but what if she had and the first door
she opened were 50 how might you then describe the lower bound on linear
search in this so-called best case using this list of possible answers yeah
yeah so Omega of one so in the best case the lower bound on how
many steps it might take uh linear search to find something might just be
one step why because maybe Stephanie had gotten lucky and we had
pre-filled these lockers with the numbers in some other order such that she
might have opened the first locker and voila the number 50 could have been
there so a lower bound arguably could indeed be Omega of one for linear
search and how about now for Jackson he used binary search so he dived
right into the middle of the problem but what would be a lower bound
on binary search using this logic yeah yeah so again Omega of one why
because maybe he just gets lucky and indeed right in the middle of the
lockers could have been the number 50 it wasn't and so more germane in
Jackson's uh actual practice would have been the Big O discussion but Big O
and Omega upper bound and lower bound just allow a computer scientist to
kind of wrestle with what could happen maybe in the worst case what can
happen in the best case and you can get even more precise like the
average case or with Theta the
coincidence of both upper bound and lower bound that is they are one and the
same that was not the case for our discussion a second ago of linear search
not the case for binary search but you could use the same kinds of formulas
if it turns out that your upper bound and lower bound are the same so for
instance if I were to count everyone like literally in this room 1 2 3 4 five six
and so forth you could actually say that counting in that way is in Theta of n
right because in the best case it's going to take me
n points at the uh people in the audience in the worst case it's going to
take me n it's always going to take me n steps if I want to count everyone
in the room you can't really do better than that unless you skip people so
that would be an example off the cuff of something where Theta is instead
germane are there any questions now on Big O on Omega or Theta which are now
just more formal tools in the toolkit for talking about the design of our
algorithms any questions no seeing none yeah oh is this yes no
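
One way to see these upper and lower bounds concretely is to count how many elements each search examines. The following is a minimal sketch under stated assumptions: linear_steps and binary_steps are hypothetical helper names that do not appear in the lecture, and binary_steps assumes its array is sorted in increasing order.

```c
// Count how many elements linear search examines before it stops.
int linear_steps(const int numbers[], int n, int target)
{
    int steps = 0;
    for (int i = 0; i < n; i++)
    {
        steps++;
        if (numbers[i] == target)
        {
            break;
        }
    }
    return steps;
}

// Count how many elements binary search examines before it stops.
// Assumes numbers is sorted in increasing order.
int binary_steps(const int numbers[], int n, int target)
{
    int steps = 0;
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        int middle = low + (high - low) / 2;
        steps++;
        if (numbers[middle] == target)
        {
            break;
        }
        else if (target < numbers[middle])
        {
            high = middle - 1;
        }
        else
        {
            low = middle + 1;
        }
    }
    return steps;
}
```

With seven sorted values, either function can finish in 1 step in the best case (Omega of 1). In the worst case linear_steps examines all 7 elements (on the order of n), while binary_steps examines at most 3 (on the order of log n).
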
okay so we're good so let's go ahead and translate this perhaps to
some actual code let me go over to VS Code here and let's see if we can't
now translate some of these ideas to some actual code not so much using
new syntax yet we're going to still operate in this world of arrays like last
week so let me go ahead and create a program called search.c by executing
code space search.c in my terminal and then up here let's go ahead and
include our usual so include cs50.h so I can get
some input include stdio.h so I can print some output we'll do int main
void the meaning of which we did start to tease apart last week the
fact that it's void again today just means no command line arguments and
let me go ahead and do this let me go ahead and declare just for discussion's
sake a static array like an array that never changes and the Syntax for this is
going to be give me an array called numbers using the square bracket
notation and I'm going to immediately initialize it
new there and now let me go ahead and Implement linear search and the
pseudo code we had for this before used some array like notation let me go
ahead then and start similarly for int i and you almost always start
counting with i by convention so that's perhaps a good starting point I'm going
to do this so long as I is less than seven not the best design to hard code the
seven but this is just for demonstration sake for now because I know how
many numbers I put in there and then I'm going to do
i++ so now I have the beginnings of a loop that will just allow me to iterate
over the entire array and let me ask this if the current number at location I
equals equals n which is the number the human typed in then let's go ahead
and do something simple like printf quote unquote found backslash n and then
per our discussion last week to indicate that this is successful I'm
going to return zero if I found it and if I don't find it I'm just going to go down
here and by default say not found backslash n
and just for convention whoops just for good measure per convention I'll
return one or really any value other than zero zero recall means success and
any other integer tends to mean error of some sort irrespective of the
number I'm looking for so just to revisit the only thing that's new here is the
syntax for creating an array of seven numbers these numbers and then
after that we have really highlighted here an implementation of linear search
I mean this is the C version I dare say of what
Stephanie did on the board whereas now the array is called numbers instead
of doors but I think it's pretty much the same let me go ahead and open my
terminal window and run make search seems to compile okay dot slash search and
let's go ahead and search for a number we'll start with what we did before 50
and it's found let's go ahead and run it again dot slash search let's search for maybe
20 at the beginning that one too is found let's run it one more time searching
for like 1,000 which is not among the
denominations and that one indeed is not found so we've taken an idea
from week zero now formalized in week three and just translated it now to
code questions on this implementation of linear search nothing
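
The linear search just described might look roughly like this in C. This is a sketch rather than the lecture's exact file: it moves the loop into a hypothetical helper called search_numbers so that it does not depend on cs50.h, and the caller is assumed to supply the array, its length, and the number to look for.

```c
#include <stdio.h>

// Linear search over an array of ints. Prints the result and returns
// 0 on success or 1 on failure, mirroring the value main would return.
int search_numbers(const int numbers[], int count, int n)
{
    for (int i = 0; i < count; i++)
    {
        if (numbers[i] == n)
        {
            printf("Found\n");
            return 0;
        }
    }

    // Reached only if the loop never returned early
    printf("Not found\n");
    return 1;
}
```

A main function could declare the seven denominations, read a number from the user, and simply return whatever search_numbers returns.
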
oh so successful so far today okay so let's see if we can't maybe make this a
little more interesting and see if we can't trip over a detail that's going to be
important in C instead of doing numbers let me go ahead and do this we'll
stay on theme with Monopoly and I went down
the rabbit hole of reading the Wikipedia article on Monopoly and the original
uh pieces or tokens that came with Monopoly and it turns out we can
represent those with strings so I'm going to create an array called strings
plural of whatever size I defined here and the very first Monopoly pieces
back in the day were a battleship that you could play with a boot a cannon
an iron a thimble and a top hat some of which you might know from the
game nowadays turns out they've been changing these uh had
no idea over the years so here is now an array of strings let me go ahead and
prompt the user now not for an integer anymore I want to Now search for
one of these strings still using linear search so let me create a string s set it
equal to get string prompt the user for a string to search for and then I think my
code here is almost the same except for one detail I now have an array
called strings I now have a variable called s but it turns out for reasons we'll
explore in more detail next week this
line of code is not going to work and it turns out the reason has to do with
what we discussed last week of like what a string really is and what is a
string again a string is an array of characters and it turns out though that equals equals is
not going to generously compare all of the characters in an array for you just
because you use equal equals it turns out it's not going to compare every
letter and so thankfully there is in the uh string library that we introduced
last week a solution to this problem the
reason for the problem we'll explore in more detail next week but for now
just know that when you want to compare strings in C especially if you've
come into the class knowing a bit of Java or python or some other language
you cannot use equals equals even though you could in scratch you cannot
in C so what I have to actually do here is this I have to ask the question does
the return value of a function called str compare or strcmp equal zero
when passed in the current string and that user input so if
you read the documentation for this function called strcmp you'll see
that it takes two strings as input first one and second one then someone
decades ago wrote the code that probably uses a for loop or a while loop to
compare every character in each of those strings and it turns out it returns
zero if they are in fact equal turns out too it will return a positive number or a
negative number in other situations any intuition for why it might actually be
useful to have a function that allows
you to check if two strings are equal if they're not equal what else might be
interesting to know when comparing two strings are if certain values are
okay possibly maybe you want to know just how similar they are um and
that's indeed an algorithm unto itself but strcmp is a little simpler than
that exactly if you're trying to like alphabetize a whole list of strings just like
your phone probably is for your contacts or address book it turns out that
strcmp will actually return a positive or a negative number depending on
which of the two strings comes first
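
A minimal sketch of linear search over strings using strcmp follows. It is not the lecture's exact code: find_string and string_count are hypothetical names, const char * stands in for the CS50 string type so that the example does not need cs50.h, and the element count is computed from the array itself rather than typed in by hand.

```c
#include <string.h>

// The original Monopoly tokens named in the lecture.
static const char *strings[] = {"battleship", "boot", "cannon",
                                "iron", "thimble", "top hat"};

// Number of elements, derived from the array so it cannot drift out of sync.
static const int string_count = sizeof(strings) / sizeof(strings[0]);

// Return the index of s in list, or -1 if it is not there.
// strcmp returns 0 only when every character of both strings matches.
int find_string(const char *list[], int count, const char *s)
{
    for (int i = 0; i < count; i++)
    {
        if (strcmp(list[i], s) == 0)
        {
            return i;
        }
    }
    return -1;
}
```

A call such as find_string(strings, string_count, "thimble") compares character by character, which == on two strings would not do. The sign of strcmp's result is what makes it usable for alphabetizing: it is negative when the first argument sorts before the second and positive when it sorts after.
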
there like a race car which was there when I grew up but huh segmentation
fault core dumped like and actually some of you have tripped over this error
before anyone want to admit seeing this so yeah not something we've talked
about and um honestly not something I intended just now but that too we'll
see next week any intuition for why my program just broke I didn't really
change the logic it's still linear search let me hide the terminal so you can
see all of the code at once the only thing I did was
switched from integers to strings and I switched to strcmp here but
segmentation fault happened and the teaser is that that somehow relates to
the computer's memory yeah yeah and this is subtle but spot on so 1 2 3 4 5
six elements total in this array versus the seven number of Monopoly
denominations that we had earlier and this is where sort of case in point
this came back to bite me the fact that I hardcoded this value as opposed
to maybe separating it out as a constant or declaring it higher up
kind of bit me here because now I'm iterating over an array of size six but
clearly I'm going one step too far because I'm literally going to iterate seven
times not six so it's as though I'm looking at memory that's over here and
indeed next week we'll focus on memory and that's just a bad thing so odds
are not even seeing your code from this past week if any of you have
had segmentation faults odds are you touched memory that you shouldn't
have you maybe looped too many times you
might have uh used a negative number to get into your array in general you
touched memory that you shouldn't have and you touched a segment of
memory that you shouldn't have the fix though at least in my case is simple
just don't do that so let me go ahead and recompile this make search dot slash
search and I'll search again for uh race car and now it does not crash but
it does tell me it's not found so subtle but something you might yourself
have tripped over already questions then on
what I just did intentionally or otherwise yeah in front return don't return Z
return so a really good question so the program will still work even
if I don't return zero or return one in fact let me go ahead and do that
and just hide my terminal window for a second let's get rid of the return here
let's get rid of the return here however watch what happens here uh let me
go ahead and recompile this make search let me scroll up in my code here
let me go ahead and do dot slash search and let me go ahead and
search for the first thing in the list Battleship so I know that this should be
found I hit enter huh interesting so it's saying found not found but do you see
why logically in this case exactly so the loop is still running so there's a
couple of solutions to this I could for instance somehow break out of the code
here but that's going to still result in line 18 executing I could then instead
just return here I don't strictly need to return one down at the bottom but I
made this claim last week
lower level way of signaling H it didn't really find what I was looking for and
remember from last week you can see this as follows if I recompile this again
now that I've reverted those changes so make search and if I do dot slash search
and search for Battleship which is indeed found recall I can execute this
magical command echo dollar sign question mark which you're not going to
often execute but it shows you what main returned if I run search again and
search for race car which is not found I see not found but I
can also run this command again and see that oh it returned one so now if
you fast forward a few months a few years when you're actually writing code
in a company or for larger projects you might want to be automating
software you might not want the human to necessarily be running it
manually you might want um code to be automated by some nightly process
or something like that using these exit codes can a program determine yes
or no that other code succeeded or failed other questions on linear search
in this way no all right well let's translate this to one other feature of uh C
here by incorporating these ideas now into one other program so I'm going to
create a phone book in C by doing code space phone [Link] a phone book
for an actual name and getting back a number so I'm going to go ahead and
quickly include some of the same things cs50.h so we can get input uh
stdio.h so we can print output and I'm going to preemptively include
string.h in case we need that one as well uh int main void no need for
uh command line arguments today and let me give myself now an array
of names for this phone book so string names equals and then in curly
braces how about Carter will be one person in the phone book and David
myself will be the other so we'll keep it short so we don't have to type too
many names but this is a phone book with two people thus far suppose now
we want to also store Carter's phone number and mine so it's not just saying
found or not found it's literally looking up our phone numbers
like a proper phone book well at the moment there's really no way to do this I
could do something hackish like I could put a number like 617495 1000 after
Carter I could maybe do something like 949 uh 468 2750 after me but now
you're kind of doing the whole apples and oranges thing right like now it's
not strings it's a string int string int all right so maybe I could just make all of
these strings but now it's just a conceptual mixing of apples and oranges like
yes that's an
array of four strings but now you're on the honor System to know that the
first string is a name the second string is a number the third string is like you
can do it but it's a bit of a hack so to speak so what might be cleaner than
this instead of combining our phone numbers into the same array as our
names what else might we do that's perhaps a little better say a little louder a
2D array possibly something we could do I'm going to keep it even simpler
now because we haven't used
those by name even though that is as we saw last week technically what argv is
what else could I do if I want to store names and numbers yeah yeah let me
go with this suggestion just it's a little simpler rather than complicate things
in literally different dimensions let me go ahead and do string well I could do
int numbers but you know what so that we can support punctuation like
dashes or even parentheses or country codes I'm going to do this instead I'm
going to do string numbers[] so that I can represent
Carter's number as quote unquote +1 for the US 617-495-1000
complete with hyphens as is US convention and then for mine I'll go ahead
and do +1-949 how about 468-2750 semicolon and now down below let's
actually enable the user to search this phone book just like in week zero we
did string name equals get_string and let's ask the user for a name
presumably David or Carter or someone else and now let's re-implement
linear search so for int i gets zero i is less than two and do as I
say not as I do I think we should beware this hardcoding but we'll keep it simple
for now i++ and then in this for loop I think we have all of the ingredients to
solve this so if the return value of strcmp of names bracket i
comparing against the name that the human typed in if all of that equals
equals zero that is all of the characters in those two strings are equal then I
think we can go ahead and say found just like last time but you know what
let's actually print Carter's or David's number instead so I'm searching one
of the arrays of strings and then I'm printing from the other array the answer so let me
go ahead here and run the compiler make phonebook enter okay that's
promising no errors ./phonebook now and let's search for instance for
Carter enter all right so we found Carter's number all right let me do that
again ./phonebook let's search for David all right we seem to have found
David's number all right let's do it one last time ./phonebook enter and now
we'll search for like John Harvard enter not
found all right so I dare say albeit with minimal testing this code is correct
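As a rough sketch of the two-array phone book just demonstrated, the lookup could be written as below. Some assumptions to flag: it uses plain `const char *` rather than the `string` type from cs50.h so that it compiles without the course library, and it wraps the loop in a helper named `lookup`, which is a hypothetical name and not part of the lecture's code. The lecture version instead prompts with `get_string` inside `main` and prints the result.

```c
#include <stdio.h>
#include <string.h>

// Two parallel arrays: names[i] and numbers[i] describe the same person
// only by convention, which is the weakness critiqued right after this demo.
static const char *names[] = {"Carter", "David"};
static const char *numbers[] = {"+1-617-495-1000", "+1-949-468-2750"};

// Linear search over names; returns the matching number or NULL if absent.
// (lookup is a hypothetical helper introduced for illustration.)
const char *lookup(const char *name)
{
    // The 2 is hardcoded as in the lecture and must be kept in sync
    // with the arrays by hand.
    for (int i = 0; i < 2; i++)
    {
        // strcmp returns 0 when the two strings are identical
        if (strcmp(names[i], name) == 0)
        {
            return numbers[i];
        }
    }
    return NULL;
}
```

Calling `lookup("Carter")` from `main` and printing the result with `printf("Found %s\n", ...)` would mirror the found case, and a NULL return corresponds to printing not found.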
would anyone now like to critique the design does something rub you the
wrong way perhaps about this approach here and as always think about how
if the program maybe gets longer more complicated how decisions like this
might unfold yeah okay so i is less than two so technically if I change the
number of people in this phone book I'm going to have to update that 2 and we've
already seen that I get myself into trouble so that's
bad design good yeah so again I'm sort of trusting myself not to screw up
if I add John or anyone else to the first array but I forget to add their number
to the second array you know eventually things are going to drift and be
inconsistent and then the code will just be incorrect at that point so sort of
a poor design setting me up for future failure if you will other thoughts yeah
yeah really good we're assuming the same order from left to right the names
go and from left to right the numbers go
but that's kind of just the honor system like there's literally nothing in code
preventing me from reversing the order for whatever reason or maybe
sorting the names like they're sorted now and maybe that's deliberate but
maybe it's not so this honor system here too is just not good right I could put
a comment in here to remind myself to you know note to self always
update arrays the same way but like something's going to happen
eventually especially when we
have not two but three but 30 300 names and numbers it would be nice to
keep all of the related data together and so in fact the one new feature of C
we'll introduce today is one that actually allows us to implement our very
own data structures you can think of an array as a very lightweight data
structure in that it allows you to cluster related data back to back to back to
back and this is how strings are implemented they are a data structure
effectively implemented with an array but with C and with other
languages it turns out you can invent your own data types whether they're
one dimensional two-dimensional even or beyond and with C you can
specifically create your own types that have their own name so for instance
wouldn't it have been nice if C came with not just char and int and float and
long and others wouldn't it be nice if C came with a data type
called person and ideally a person would have a name and a number now
that's a little naive and unrealistic like why would
they define a person to have just those two Fields certainly people could
have disagreed what a person is so they leave it to us like the authors of C
gave us all of these primitives ints and floats and strings and so forth but it's
up to us now to use those in a more interesting way so that we can create an
array of person variables if you will inside of an array called people just to
pluralize it here so how are we going to do this well for now let's just
stipulate that a person in the world
will have a name and a number and we could argue all day long what else a
person should have and that's fine you can invent your own person
eventually at the moment I'm using just two variables to define a person's
name and number but wouldn't it be nice to encapsulate that is combine
these two data types into a new and improved data type called person and
the syntax for that is going to be this so it's a bit of a mouthful but you can
perhaps infer what some of this is doing here so it
turns out C has a keyword called typedef as the name kind of suggests this
allows you to define your own type struct is an indication that it's a structure
it's like a structure that has multiple values inside of it that you are trying
to define and then at the very bottom here outside of the curly braces
is the name of the type that you want to create so you don't have discretion
over using typedef or struct in this particular case but you can name the
thing whatever you want and
you can put anything in the structure that you want as well and as soon as
the compiler has read this semicolon at the bottom of the code every line thereafter
can now have access to a person data type whether as a single variable or
as an entire array so if I want to build on this then let me go ahead and do
this let me go back to my C code here and I'm going to go ahead and uh
change just a couple of things let's go ahead and do this I'm going to go
ahead and first get rid of those two hardcoded arrays and
let me go ahead and at the top of my file invent this type so typedef struct
inside of it will be a string name and then a string number and then the
name of this structure will be person and best practice would have me define it
at the very top of my file so that any of my functions in fact could use it even
though I just have main in this case now if I wanted I could do this like person
P1 and person P2 but we know from last week like that already is bad design
if you want to have multiple instances of
the same type of variable we should probably use what instead and yeah an
array so let me not even go down that road let me instead just do this person
uh will be the type of the array but I'm going to call it I could call it persons
but in English we typically say people so I'll call the array people and I want
two people to exist in this array though I could certainly change that number
to be anything I want how now do you put a name inside of a person and
then put the number inside of that same person well
slightly new syntax today I'm going to go ahead and say this people bracket
zero just gives me the first person in the array that's not new but if you want
to go inside of that person in memory you use a DOT and then you just
specify the name of the attribute therein so if I want to set the first person's
name to Carter I just use that so-called dot notation and then if I want to set
Carter's number using dot notation I would do this quote unquote +1-617-495-1000 and
then if I want to do the
same for myself I would now do people bracket one dot name equals quote unquote
David and then people bracket one dot number equals quote unquote
+1-949-468-2750 and now at the bottom of my file I think my logic can pretty
much stay the same I can still on this line here prompt the user for the name
of the person they want to look up for now even though I admit it's not the
best design I'm just doing this for demonstration sake I'm going to leave the
two there because I know I have two people but
down here this is going to have to change I don't want to compare names
bracket i anymore what do I want to type here as the first argument to
strcmp what do I want to do here yeah so people bracket i dot name yeah so I
want to go into the people array at the I location because that's what my
Loop is doing it's updating I again and again and then look at name and
that's good I think now I need to change this too what do I want to print if the
person is found someone else what do I want to print here if I
found the person's name yeah say a little louder perfect so people bracket i dot
number if indeed I want to print the corresponding number to this person
and then I think the rest of my code can stay the same so let me go ahead
and run make phonebook to recompile this version so far so good
./phonebook let's go ahead and type in Carter's name found all right let's go
ahead and run it again David's name found let's go ahead and run it one
more time type in John Harvard for instance not found in this
case so fundamentally the code isn't all that different linear search is still
behaving the same way and I admit this is kind of ugly looking like we've
kind of made a two-line solution like five lines of code now but if we fast
forward a week or two when we start saving information to files uh we'll
introduce you to files like CSV files comma separated values or
spreadsheet files which you've surely opened on your Mac or PC at some
point in the past suffice it to say we'll soon learn
techniques for storing information like names and numbers in files and at that
point we're not going to do any of this hackish sort of hardcoding of the
number two and manually typing my name and Carter's name and number
into our program we'll read the information dynamically from a file and in a
few weeks we'll read it dynamically from a database instead but this is for
now just syntactically how we can create an array of size two containing one
person each we can update the name and number
of the first person update the name and the number of the second person
and then later search across those names and print out the corresponding
numbers and in this sense this is a better design why because my person
data type encapsulates now everything that it means to be a person at least
in this narrow world and if I want to add something to the notion of a person
for instance I could go up to my typedef and tomorrow add an address to
every person and start reading that in as well and now it's not the honor
system it's
not a names array a numbers array and an addresses array and everything
else you might imagine related to a person it's all encapsulated which is a
term of art inside of the same type reminiscent if some of you have
programmed before of something called object-oriented programming but
we're not there yet C is not that questions on this use of struct or this new
syntax the dot operator being really the juicy part here any questions yeah
on what line number 16 so yes so syntactically we introduced the square
brackets last week
so doing people bracket zero just means go to the first person in the array
that was like when Stephanie literally opened this door that's uh that's doors
bracket zero but this is of course people bracket zero instead today the dot is
a new piece of syntax it means go inside of that person in memory and look
at the name therein and set it equal to Carter and do the same for number
so that's all it's like open the locker door go inside of it and check or set the
name and the number
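Putting the typedef, the array of people, and the dot operator together, a minimal sketch might look like the following. As assumptions: `const char *` stands in for the `string` type from cs50.h, and `load_people` and `find_number` are hypothetical helper names used here so the pieces can be exercised separately, whereas the lecture does all of this directly inside `main`.

```c
#include <stdio.h>
#include <string.h>

// A person bundles a name and a number into one type, so the two
// values can no longer drift apart the way parallel arrays can.
typedef struct
{
    const char *name;
    const char *number;
} person;

// An array of two person values, as in the lecture.
static person people[2];

// Fill in each person with dot notation: people[i] selects the person,
// .name or .number selects the field inside it.
// (load_people is a hypothetical helper, not from the lecture.)
void load_people(void)
{
    people[0].name = "Carter";
    people[0].number = "+1-617-495-1000";

    people[1].name = "David";
    people[1].number = "+1-949-468-2750";
}

// Linear search by name; returns that person's number or NULL.
// (find_number is likewise a hypothetical helper.)
const char *find_number(const char *name)
{
    for (int i = 0; i < 2; i++)
    {
        if (strcmp(people[i].name, name) == 0)
        {
            return people[i].number;
        }
    }
    return NULL;
}
```

Adding, say, an address later would mean one new field in the struct rather than a third array to keep in step.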
yeah attributes is fine uh good question in the struct can you set default
values short answer no and this is where C becomes less featureful than
more modern languages like Python and Java and others where you can
in fact do that so when we transition to python in a few weeks time we'll see
how we can start solving problems like that but for now it's up to you to
initialize name and number to something yeah really good question how can
we adjust or critique the design of what I'm doing
this is one of the few situations where I would say hypocritically do as I say
not as I do I am using pretty ugly lines like this just to introduce the syntax
but my claim pedagogically today is that eventually when we start storing
names and numbers or other things in files or in databases you won't have
this redundancy you'll have one line of code or two lines of code that read
the information from the file or database and then fill the entire array with
that data for now I'm just doing it manually
so as to keep our focus only on the new syntax but that's it so forgive the bad
design by design today other questions on this all right that's been a lot
already why don't we go ahead and take our 10-minute break with snacks
first we have some delightful brownies in the lobby all right we are back and
up until now it clearly seems to be a good thing if your data is sorted
because you can use binary search you know a little
something more about the data but it turns out that sorting in and of
itself is kind of a problem to solve too and you might think well if sorting is
going to be pretty fast we absolutely should do it before we start searching
because that'll just speed up all of our searches but if sorting is slow that
kind of invites the question well should we bother sorting our data if we're
only going to search the data maybe once maybe twice and so here is going
to be potentially a trade-off so let's consider what it means really to sort data
in our case we're just going to keep it
simple and use numbers but it might in the case of the Googles of the world
be actual web pages or persons or the like so here is our typical picture for
solving any problem input at left and output at right the
input to our sort problem is going to be some unsorted set of values and
the output ideally will be the same set of values sorted and if we do this
concretely let's suppose that we want to go about sorting this list of numbers
7 2 5 4 1 6 0 3 so it's all of the numbers
right no uh-oh all right there we go number three all right so let's just do a
quick check we have 7 2 5 4 1 6 0 3 very good so far do you want to just
scooch a little this way just to make a little more room all right and let's
consider now who we have here on stage you want to each say a quick hello
to the audience hi my name is Ryan I'm a first year from Pennypacker hi
my name is Cel I'm a first year at Straus hi my name is Lucy I'm a first
year from Greenough hi my name is Shiloh I'm a first
they are already then sorted so let me propose that we first consider an
algorithm that actually has a name called selection sort and selection sort is
going to be one that literally has me or really you as the programmer
selecting the smallest element again and again and then putting them into
the appropriate place so let me go ahead and start this here uh starting with
the number seven at the moment seven is the smallest number I found so
I'm going to make mental note of that with a mental
variable if you will I'm going to move on now oh number two is obviously
smaller so I'm just going to update my mental reminder that two is now the
smallest effectively forgetting for now number seven uh five not smaller four
not smaller one smaller and I'm going to make mental note of that six not
smaller zero smaller I'll make mental note of that having forgotten now
everything else and now number three is not smaller so what's your name
again Michael so Michael is number zero he belongs of
course way down there but unfortunately Ryan is in the
way so what should we do how should we start to sort this list where should
number zero go yeah do you want to say it louder yeah so let's just go ahead
and swap so if you want to go ahead and zero go on where seven is we need
to make room for number seven it would kind of be cheating if maybe
everyone kind of politely stepped over to the side why because if we imagine
all of our volunteers here to be an array like that's a crazy amount of work to
have
every element in the array shift to the left just to make room so we're going
to keep it simple and just evict whoever is there now maybe we get lucky
and number seven is actually closer to its destination maybe we get unlucky
and it goes farther away but we've at least solved one problem if we had n
problems at first now we have n minus one because number zero is indeed in
the right place so if I continue to act this out let me go ahead and say two
okay currently the smallest five no four no one currently
the smallest I'll make mental note 6 7 3 and now let me pause one is
obviously the now smallest element so did I need to keep going well it turns
out at least as I've defined selection sort I do need to keep going because I
only claim that I'm using one variable in my mind to remember the then
smallest element I'm not smart enough like us humans to remember oh wait
a minute one is definitely the smallest now I don't have that whole
recollection so I just am keeping track of the now smallest so
number one what was your name Jack Jack where should Jack go probably
there and what's your name itel itel okay so Jack and itel if you want to swap
places we've now solved two of the n total problems and now we'll do it a
little faster if each of you want to sort of start to swap as I find the right
person so five smallest four is smaller two is smaller got to keep checking
okay two was smaller all right now I'm going to go back to the beginning all
right four is small five is not six is not seven is
oh three is small where do you want to go okay good I'm going to go back
here and I can be a little smart I don't have to go all the way to the end
because I know these folks are already sorted so I can at least optimize
slightly so now five is small six is small seven is four four is smaller if you
want to go in place there and now here things get interesting I can optimize
by not looking at these folks anymore because they're obviously problem
solved but now five is small six is not seven is not
okay five you can stay where you are now a human in the room is obviously
going to question why I'm wasting any more time but with selection sort as
I've defined it thus far I still have to now check six is smallest not seven and
now my final step okay they're all in place so here too is this dichotomy
between what we all have is this bird's eye view of the whole problem where
it's obvious where everyone needs to go but a computer implementing this
with an array really has to be more methodical and
we're actually saving a step here if we were really doing this none of these
numbers would be visible all eight of our volunteers would be inside of a
locked door and only then could we see them one at a time but we're
focusing now just on the sorting aspect so let me just before we do one other
demonstration here propose that what I really just did here in pseudocode
was something like this for i from 0 to n minus 1 keeping in mind that zero
is always the left end of the array n minus 1
is always the right end of the array for i from 0 to n minus 1 find the
smallest number between numbers bracket i and numbers bracket n minus
1 and that's the very geeky way of expressing this optimization I'm always
starting from numbers bracket i wherever I am and then everything else to
the right and that's what was allowing me to ignore the already sorted
volunteers if though my last line says swap smallest number with numbers bracket i I
think that implements what our humans were doing by physically
walking to another spot all right so that then is what we'll call selection
sort let's go ahead and take a second approach here using an algorithm that
I'm going to call bubble sort but to do this we need you all to reset to your
original locations we have a little cheat sheet on the board if you'd like to go
back to this position here and let me take a fundamentally different approach
because I'm not really liking selection sort as is because it's kind of a lot of
walking back and forth and a
lot of walking suggests like a lot of steps again and again so what
might I do instead well bubble sort is going to have me focus a little more
intuitively on just smaller problems and let's see if this gets me somewhere
else so if I just look at this list without looking at everyone else seven and
two this is obviously a problem why because you're out of order so let's just
solve one tiny problem first so seven and two why don't you swap I know two
is in a better place now because she's
definitely lower less than seven so I think I can now move on seven and
five problem so let's solve that seven and four problem let's solve that seven and
one let's solve that seven and six let's solve that seven and zero solve that seven and
three solve that okay done sorted right obviously not if you just glance at
these numbers here but we have fundamentally taken a bite out of the
problem seven is indeed in the right place so we maximally have n minus
one other problems to solve so how do I do
this I think I can just repeat the same logic let me go over here two and five
good five and four no five and one no five and six yes 6 and zero no six and
three no so now we've solved two of the problems and what's nice about
bubble sort at least at first glance is it's nice and simple it's nice and local
and you just keep incrementally solving more and more problems so let's go
ahead and do this again and I'll do it we can do it faster two and four we
know are good four and one four and five five and
zero five and three five and six six and seven good so we go back two and
one ah now another problem solve two and four four and zero four and three
four and five five and six six and seven and so notice too as per its name the
largest elements have bubbled their way up to the top and that's what
seems to be happening just as we're fixing some remaining problems so
almost done one and two two and zero two and three three and four four and
five five and six six and seven almost done obviously to us
humans it looks done how do I know as the computer for sure what would be
the most surefire way for me to now oh it's not done sorry that's a bug
okay one and zero okay one and two two and three three and four four
and five five and six six and seven okay so now it's obviously sorted to the
rest of us on stage how could I confirm as much in code right you're doing it
with your mind just glancing at this how would the computer the code know
for sure that this list is now sorted
yeah let's do one more time and look uh draw what conclusion yeah let's do
it one more time even though it's a little wasteful but logically if I go through
the whole list comparing pairs again and again and I don't do any work
that time now it's obviously logically safe to just stop because otherwise I'm
wasting my time doing the same thing again and again if no one's actually
moving so I'm afraid we don't have Monopoly games for all of you but we do
have eight stress
balls and a round of applause if we could for our volunteers if you want to
put your numbers on the Shelf there so if we consider for a moment thank
you thank you so much sure thank you thanks sure so if we consider now
these two algorithms which one is better any intuition for whether selection
sort the first is better or worse than bubble sort the second any thoughts
yeah okay so bubble sort seemed like less work especially since I was
focusing on those localized problems other intuition selection sort versus
bubble
sort well let me propose that we try to like quantify this so we can actually
analyze it in some way and this is not an exercise we'll do constantly for lots
of algorithms but these are pretty representative of algorithms so we can
wrap our minds around indeed the performance or the design of these things
so here is my pseudocode for selection sort whereby as per its name I
just iteratively select the next smallest element again and again so how
can we go about analyzing something like this well we could just
do it on paper pencil and count up the number of steps that seem to be
implied logically by the code we could literally count like the number of
steps I was taking again and again left to right we could also just
count the number of comparisons I was making with each of the persons
involved and I was doing it kind of quickly in selection sort but every time I
was looking at a person trying to decide do I want to remember that number
as smallest I was comparing two values with
an equals equals or less than or greater than sign at least if we had done this
in code so that tends to be the norm when analyzing algorithms like these
counting the number of comparisons because it's kind
of a global unit of measure we can use to compare different algorithms
entirely so think too that in the general case when we have more than eight
volunteers more than seven doors we can generalize our array
as this is the first element at bracket zero and the end of it is always
stage how many total comparisons did I do like if there's eight people I
compared these folks then then like this person this person yeah yeah so
seven total right because if there's eight people on stage you can only do
seven comparisons total because otherwise you'd be comparing one number
to itself so it seems like in the general case if you've got n numbers that
you're trying to sort finding the smallest element first takes n minus one
comparisons maybe n it's total steps left or right
but the number of comparisons which I claim is just a useful unit of measure
is n minus one how about finding the next smallest person how many steps
did it take me to find the next smallest number which ended up being the
number one yeah yeah so just n minus two why because I'd already solved
one problem someone was already in the right position it would be silly to
keep counting them again and again so I can whittle down my number of
comparisons for the next pass to n minus 2 the third pass to find the third
smallest number
would be n minus 3 and then dot dot dot presumably this story this formula
ends when you have just one final pair the people at the end to compare so if
this is looking a little reminiscent of some kind of recurrence from
high school math or physics or the like let me just stipulate that if you
actually do out this math and generalize it that is the same thing as
n(n - 1)/2 and if you're rusty on that no big deal just kind of commit to
memory that anytime you add up this kind of series
really large which of these symbols which of these terms is really going to
dominate become the biggest influencer on the total value of steps right it's
the square right like it's definitely not n divided by two that's shaving some
time off but n squared as n gets big is going to get really big if n is 100 then
n squared is bigger if n is a million n squared is really bigger and so at the end of
the day when we're really just talking about sort of a wave of the hand
analysis an upper bound if you will let's just say that
selection sort as analyzed here is on the order of n squared steps it's not precisely
n squared steps but you know what n squared divided by two the intuition here might be
that well it's half of that but n squared is what really matters as n gets really
really large and that's when you start thinking about and trying to solve the
Google problems of the world when n gets large that's when you have to be
smarter than just sort of naive implementations of any algorithm so where
then does this algorithm fall into
this categorization here well selection sort it turns out is on the order of n squared steps in
the worst case whether it's sorted or not it turns out though lower bound if
we consider this same code suppose the best case scenario like our eight
volunteers came up on stage and just because they already sorted
themselves 0 through 7 suppose they just happen to be in that state
how many steps would selection sort take to sort an already sorted list of
volunteers any intuition yeah would it still be n so for the first pass it would
still
be seven for the first pass across the humans because even though
yeah I'm claiming zero is here I don't know that zero is the smallest until I
make my way all the way over there doing all seven comparisons okay fine
first pass took seven more generally n minus one steps what if I look for the
next smallest element and the humans in this story are already sorted 0
through 7 well yes the number one's here and I see them first but I don't
know they're the smallest until I compare against
everyone else get to the end of the list oh well that was stupid I already had
the smallest person in hand then and so this pseudo code this
implementation of selection sort is sort of fixed like this there's no special
case that says if already sorted quit early it's always going to take n squared
steps and so in this case if we borrow our jargon from earlier using omega
notation just to be clear selection sort is also going to be in this incarnation
in omega of n squared because even in the best case where
the list is already sorted you're going to waste a huge amount of time
essentially verifying as much or discovering as much even though we
humans of course could see it right away so selection sort would seem to
take both n squared steps in the worst case n squared steps in the best case and
so you know what we can use our Theta terminology for that here would be
an algorithm just like counting earlier that always takes N squared steps no
matter whether the array is sorted or not from the get-go all right so
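To make the counting argument concrete, here is a sketch of selection sort in C that also tallies comparisons. The function name `selection_sort` and the decision to return the comparison count are assumptions made for illustration; the lecture only shows pseudocode at this point.

```c
// Sorts numbers[0..n-1] in ascending order using selection sort and
// returns how many element comparisons were made.
// (Returning the count is an illustrative choice, not from the lecture.)
long selection_sort(int numbers[], int n)
{
    long comparisons = 0;

    // For i from 0 to n - 1, as in the pseudocode
    for (int i = 0; i < n; i++)
    {
        // Find the smallest number between numbers[i] and numbers[n - 1],
        // remembering only the index of the smallest seen so far
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            comparisons++;
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // Swap smallest number with numbers[i]
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
    return comparisons;
}
```

For eight values the count is 7 + 6 + ... + 1 = 28, which is n(n - 1)/2 with n = 8, and it is the same whether or not the input starts out sorted, since nothing in the loop can quit early.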
hopefully we can do better and someone proposed earlier that bubble sort
felt like it was using fewer steps well let's consider that next with bubble sort
we had this pseudo code I claim whereby let's focus on the inside of the code
first down here what was I doing for I from 0 to n minus 2 that's curious
we've never seen n minus 2 before but I asked this question if numbers
bracket I and numbers bracket I + 1 are out of order swap them so that was
when I was pointing at our first two volunteers
here I saw that they were out of order so I swapped them how come I'm
doing that again and again up to n minus 2 though instead of n minus 1 which
we've always used up until now as our rightmost boundary any intuition for
why I'm doing this from 0 to n minus 2 yeah exactly because I'm looking at
the i-th person per this pseudocode here and the i plus first person I better
make sure I don't step beyond the boundaries of my array so if you
think of like my left hand when my back was to you here
pointing at the current person at the first position my right hand for this if
condition is essentially pointing at the person next to them and you want
to iterate with your left hand all through these people but you don't want
your left hand to point at the last person you want it to point at the second to
last person but we know that the last person is always at n minus one so the
second to last person just mathematically is at n minus 2 so it's a subtlety
but this is like a segfault waiting to happen if you implemented
bubble sort using n minus one my right hand would go beyond the
boundaries of the array so just bad all right so why am I saying this n
times well we did it very organically with humans but each time someone uh
each pass I did through the array someone bubbled their way up to the end
number seven then number six then number five so if on each pass through
the array of volunteers I was solving at least one problem it seems like
bubble sort can just run n times total to solve all
n problems cuz the first pass will get at least one number into place
second pass second number into place you might get lucky and it would do
more but worst case this feels like enough just do this blindly n times and
they'll all line up together well technically all right now we're getting into the
weeds technically you can just repeat it n minus one times because if you
solve all n minus one other problems and you're left with one like literally
that person's where they need to be just
logically if you've already sorted everything else and you've got just the one
left it's already bubbled up so how do we analyze this well in bubble sort we
might do something like this I'm essentially doing n minus one things n
minus one times now let me back up to the pseudo code because this one's
a little less obvious this is where you can actually mathematically infer from
your Loop uh how many steps you're taking so this first line literally says
repeat the following n minus one times
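
A minimal sketch of the bubble sort pseudocode described above, assuming an array of ints. The function name `bubble_sort` and the returned comparison count are additions for illustration, not part of the lecture's code. The inner loop stops at n - 2 so that `numbers[i + 1]` never reaches past the last element at n - 1.

```c
// Sorts numbers[0..n-1] in ascending order by repeating n - 1 passes,
// each comparing adjacent pairs from i = 0 to i = n - 2.
// Returns how many comparisons were made (added here for illustration).
int bubble_sort(int numbers[], int n)
{
    int comparisons = 0;

    // Repeat the following n - 1 times
    for (int pass = 0; pass < n - 1; pass++)
    {
        // i stops at n - 2 so that numbers[i + 1] is at most numbers[n - 1]
        for (int i = 0; i <= n - 2; i++)
        {
            comparisons++;
            if (numbers[i] > numbers[i + 1])
            {
                // Out of order, so swap the neighbors
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
            }
        }
    }
    return comparisons;
}
```

With 8 numbers this version always makes 7 passes of 7 comparisons each, which is the "n minus one things n minus one times" count.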
large honestly you're barely going to notice the difference it would seem
between these two algorithms but what about um the lower bound if the
upper bound on bubble sort is also big O of n squared what about the lower bound
here well with this pseudo code what would the lower bound be on bubble
sort even in the best case when all of the volunteers are sorted any intuition
in this pseudo code yeah in the middle good question isn't bubble sort
designed such that you wouldn't need to compare numbers that have
already uh
bubbled up that's what's happening here in the middle implicitly I'm always
going from left to right but remember that even when I screwed up at the
end and the last two people were out of order I do always need to restart at
the beginning because the big numbers are going that way and the small
numbers are coming this way so that is true there are some slight
optimizations that I'm kind of glossing over here let me stipulate that it
would still end up being on the order of n squared but that would definitely shave
off some actual running time here but what if the list is already sorted
our pseudo code at the moment has no allowance for if list is already sorted
quit early so we're going to blindly do n minus one Things N minus one times
unless we modify our pseudo code as I did verbally earlier I propose this
inside of that outer loop if you make a pass across all of the volunteers and
your mental counter has made no swaps you have to keep track with some
kind of variable well then you might as well
stop because if you do a whole pass and make no swaps why would you
waste time doing it again expecting different Behavior so to help visualize
these whereby now bubble sort can be advantageous if the data is already
sorted or mostly sorted why because it does have this short circuit detail at
least if we implement it like that how can we go about um visualizing these
things a little more clearly well let me go ahead and do this let me pull up
here a visualization of exactly these algorithms thanks to a third party tool
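
The early-exit idea described above (keep track of whether a pass made any swaps, and stop if it made none) might be sketched as follows. The name `bubble_sort_early_exit` and the returned pass count are assumptions made for this example.

```c
#include <stdbool.h>

// Bubble sort that quits as soon as a full pass makes no swaps.
// Returns the number of passes actually performed (added for illustration).
int bubble_sort_early_exit(int numbers[], int n)
{
    int passes = 0;

    for (int pass = 0; pass < n - 1; pass++)
    {
        bool swapped = false;
        passes++;

        for (int i = 0; i <= n - 2; i++)
        {
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
                swapped = true;
            }
        }

        // A whole pass with no swaps means the array is already sorted
        if (!swapped)
        {
            break;
        }
    }
    return passes;
}
```

On an already sorted array this makes a single pass of n - 1 comparisons, which is where the better best-case behavior comes from.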
code need to do so but it's these redundant comparisons that kind of explain
why n squared is indeed the case so now it's done small bars here big bars there
and I had to just keep talking there to kill time because it's relatively slow
well let me randomize the array just so we start with a different order and
now let me click on bubble sort and you'll see similar idea but different
algorithm so now the two bars in pink are the two that are being compared
and fixed potentially if they're out of order and you can see
already that the biggest bars are bubbling their way up to the top but now
you can also see like this redundancy like we keep swooping through the list
again and again just like I kept walking back and forth and this is n squared this
is not that many bars what 10 20 there's like 40 or something bars I'm
guessing that's pretty slow already just to sort 40 numbers and I think it's
going to get tedious if I keep talking over this so let's just assume that this
too is relatively slow had I gotten lucky and
the list were almost sorted already bubble sort would have been pretty fast
but this was a truly random array so we did not get lucky so indeed the worst
case might be what's kicking in here so I don't I feel like it'll be anticlimactic
like holding in a sneeze if I don't let you see the end of this so here we go
nothing interesting is about to happen almost done ah okay done all right so
thank you thank you so still somewhat slow though how though can we
perhaps do a little better
true and here's the interesting part if number is less than middle door search
the left half else if number is greater than middle door search the right half
this pseudo code earlier was itself recursive why because here is an
algorithm for searching but what's the algorithm telling us well on this line
and this line it's telling us to search something else so even though it's not
explicitly defined in code as having a name if this is a search algorithm and
yet the search algorithm is using a
search algorithm this pseudo code is recursive now that could quickly get
you into trouble if a function just calls itself again and again and again but
why intuitively is it not problematic that this code this pseudo code calls
itself why will the algorithm still stop yeah exactly it has some exit condition
like if no doors are left and more importantly anytime you search the left
half you're searching a smaller version of the problem anytime you search
the right half you're searching a smaller
version of the problem literally half the size so this is why in the phone book
obviously I couldn't tear the phone book in half uh infinitely many times
because it was literally getting smaller each time so recursion is this ability
to call yourself if you will but what's important is that you do it on a
smaller and smaller problem so that eventually you have no more problems to
solve or no more data no more doors at all so these two lines here would be
the recursive elements here but if we go
back to week zero we could have used recursion in some other way so this
was our pseudo code for the phone book back in week zero and recall that we
described these yellow lines as really representing a loop some kind of cycle
again and again but there was a missed opportunity here what if I had
reimplemented this code to do this instead instead of saying open
to middle of left half of book and then go back to line three like literally
inducing a loop or open to middle of right half of book and go
back to line three inducing another loop why don't I just recognize that what
I'm staring at now is an algorithm for searching a phone book and if you want
to search a smaller phone book like a through M or n through Z we'll just use
the same algorithm so I can replace these yellow lines with just this casually
speaking search left half of book search right half of book this would be
implicitly and now I can shorten the whole thing a recursive implementation
of the phone book pseudo code from week
zero and it's recursive because if this is a search algorithm and you're saying
go search something else that's fine that's recursive but because you're
searching half of the phone book it's indeed going to get smaller and smaller
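
The recursive search described above can be sketched in C, assuming the doors (or phone book pages) are a sorted array of ints. The function name `search` and its `low`/`high` parameters are hypothetical choices for this example. Each recursive call looks at a range that is at most half the size, and the empty range is the exit condition.

```c
#include <stdbool.h>

// Returns true if target appears in the sorted range doors[low..high].
bool search(const int doors[], int low, int high, int target)
{
    // Base case: no doors left to search
    if (low > high)
    {
        return false;
    }

    int middle = low + (high - low) / 2;

    if (doors[middle] == target)
    {
        return true;
    }
    else if (target < doors[middle])
    {
        // Search the left half, a smaller version of the same problem
        return search(doors, low, middle - 1, target);
    }
    else
    {
        // Search the right half, also a smaller version of the same problem
        return search(doors, middle + 1, high, target);
    }
}
```

To search a whole array of 7 doors, one would call `search(doors, 0, 6, target)`.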
even in the real world or the real virtual world you can see recursive data
structures in the wild or at least in Super Mario Brothers like this let me get
rid of all the distractions here and focus on this pyramid where you have one
block then two then three then four well
this itself is technically recursively defined in the sense that well what is a
pyramid of height four well it's really what how would you describe a pyramid
of height four is actually the same thing as a pyramid of height three
plus one additional layer well what's a pyramid of height three well it's
technically a pyramid of height two plus one additional layer and so even
physical structures can be recursive if you can define them in terms of themselves
now at some point you have to say that if the
pyramid is of height one there's just one block you can't forever say it's
defined in terms of a height negative 1 negative 2 you would never stop so
you have to kind of have a special case there but let's go ahead and
translate something like this in fact to code let me go back to uh vs code
here and let me Implement a program called iteration that refers to a loop
iterating and let me Implement a very simple pyramid like that so let me go
ahead and include the cs50 library I'll include our stdio.h
int main void no command line arguments today and let's go ahead and do
this let's declare a variable called height ask the human for the height of this
pyramid and then let's go ahead and draw a pyramid of that height now of
course draw does not yet exist so I'm going to need to invent the draw
function let me go ahead and define a function that doesn't have a return
value it's just going to have side effects it's just going to print bricks on the
screen call draw and it takes in
an integer n as its input and how am I going to implement this well again I
want to print one block then two then three then four that's pretty
straightforward at least once you're comfortable with loops let me go back to
the code here let me go ahead and say for int i gets zero i is less than n i++
and that's going to iterate essentially row by row and on each row I want to
print out one then two then three then four bricks but I'm iterating from zero to
one to two to three so I think that's okay
I can just say something like for int j gets zero j let's be clever about this is
less than i j++ and now let me go ahead and inside of this loop I think I can
get away with just printing out a single hash sign but then outside of that
Loop similar to last week I'm going to print my new line separately so a little
non-obvious at first but this outer loop iterates row by row line by line if you
will and then the inner loop just makes sure that when i equals zero um let's
see oh nope there's
a bug I need to make sure that it's j is less than i + 1 so when i is zero on my
first line of output I'm going to print out one brick when I is one I'm going to
print out two bricks and so forth so let me go ahead and run make iteration
all right and now seems to compile uhoh huh implicit Declaration of function
draw so I'm making week one mistakes again what say again yeah the the
Prototype is missing I didn't declare it at the top that's an easy fix and the
only time really it's
okay and necessary to copy paste let me copy the function's declaration
there end it with a semicolon so that clang now knows that draw will exist
make iteration now it works thank you ./iteration we'll type in something like
four and there we have it our pyramid of height 1 2 3 4 that looks pretty
similar to this albeit using hashes so that's how we would have implemented
this like two weeks ago in week one maybe last week but just using arrays
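
A sketch of the iterative draw function as narrated above, with the corrected inner bound of `j < i + 1`. Unlike the void function in the lecture, this version returns the total number of bricks printed, purely so the result can be checked; the prompt for the height with the CS50 library is left out.

```c
#include <stdio.h>

// Prints a pyramid of the given height, one more brick on each row.
// Returns the total number of bricks printed (an addition for illustration).
int draw(int n)
{
    int bricks = 0;

    // Outer loop: one iteration per row
    for (int i = 0; i < n; i++)
    {
        // Inner loop: row i (counting from 0) gets i + 1 bricks
        for (int j = 0; j < i + 1; j++)
        {
            printf("#");
            bricks++;
        }
        // New line is printed separately, after the row's bricks
        printf("\n");
    }
    return bricks;
}
```

Calling `draw(4)` prints rows of 1, 2, 3, and 4 hashes, 10 bricks in total.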
but let me propose that we could do something
recursively instead let me close this version of the code and let me go back
to VS Code and open up recursion.c just to demonstrate something
recursively and I'll do it incorrectly deliberately the first time so let me
include cs50.h let me include stdio.h let me do uh int main void and let
me just blindly draw a pyramid initially of height one but now in my draw
function let me reimplement it a little differently so my draw function this
time is still going to take a number n but that's how many hashes it's going
to
print so let's do for int i gets zero i is less than n i++ then let's go ahead
and print out a single hash mark here and then after that let's print out the
end of the line just as before but now this of course is only going to draw a
single um row it's going to print out one hash or two hashes or three hashes
but only on one line let me now incorrectly but just kind of curiously say all
right well if this draws a pyramid of height one let's just use ourself to draw a
pyramid of
height n plus one so the first time I call draw it will print out one hash then
the second time I call draw it will print out two hashes then three then four
so we're kind of laying These Bricks down from top to bottom uh make
recursion uh oops I screwed up again so let's copy the Prototype here let's
put this down over here semicolon let's do this again uh make recursion all
right all good ./recursion and now let me increase the size of my terminal
window just so you can see more of the output
and here we have okay bad but thank you so we have an infinitely tall
pyramid and it's just flying across the screen which is why it looks kind of like
a mess but I printed out a pyramid of height one and then two and then
three and then four and unfortunately what am I lacking any sort of quit
condition any kind of condition that says wait a minute when it's too tall stop
all together so this is an infinite Loop but it's not a loop it's a recursive call
and actually doing this
in general is very bad we'll see next week that if you call a function too many
times you can actually trigger yet another of those segmentation
faults cuz you're using too much memory essentially but for now I haven't
triggered that yet control C is your friend to cancel and as an aside if you're
playing along at home or play with this code later I actually cheated here we
have a special clang configuration feature that prevents you from calling a
function like that and creating a problem I overrode it just
for demonstration's sake but odds are at home you wouldn't be able to
compile this code yourself but let me do a proper version recursively of this
code as follows let me go back into the code here let me go ahead and not
just blindly start drawing one then two then three layers of bricks let me
prompt the human as before for the height of the pyramid they want using
our get_int function and now let me call draw of height again so now I'm going
back to the looplike version but instead of using a loop now this is where
recursion
gets rather elegant if you will let me go ahead and execute and code uh the
draw function as follows per your definition if a pyramid of height four is
really just a pyramid of height three plus another row well let's take that
literally let me go back to my code and if you want to draw a pyramid of
height four well go right ahead and draw a pyramid of height uh three first or
more generally n minus one but what's the second step well once you've
drawn a pyramid of height three draw an extra row so I at least have to
bite off that part of the problem myself so let me just do for int i gets zero i is
less than n i++ and let me the programmer of this function print out my
hashes and then at the very bottom print out a new line so the cursor moves
to the next line but this is kind of elegant now I dare say in that draw is
recursive because I'm literally translating from English to code this idea that
a pyramid of height four is really just a pyramid of height three so I do that
first and I'm sort of trusting
that this will work then I just have to lay one more layer of bricks four of
them so if n is four this is just a simple for loop a la week one that will print
out an additional layer but this of course is going to be problematic
eventually why it's not done yet this program how many times will draw call
itself in this model infinitely many times why yeah there's no there's no
equivalent of quit like if you've printed enough already then quit well how do
we capture that well I don't
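
A hedged sketch of the recursive draw function with a stopping condition added. The `n <= 0` check is an assumption based on the walkthrough in this section, where a call with zero returns immediately. As in the sketch of the iterative version, the returned brick count is an addition for checking, and the line numbers mentioned in the lecture refer to the on-screen code, not to this sketch.

```c
#include <stdio.h>

// Draws a pyramid of height n by first drawing a pyramid of height n - 1
// and then adding one more row of n bricks.
// Returns the total number of bricks printed (an addition for illustration).
int draw(int n)
{
    // Base case: nothing to draw, so stop calling ourselves
    if (n <= 0)
    {
        return 0;
    }

    // Recursive case: draw the smaller pyramid first
    int bricks = draw(n - 1);

    // Then lay one more layer of n bricks
    for (int i = 0; i < n; i++)
    {
        printf("#");
        bricks++;
    }
    printf("\n");

    return bricks;
}
```

Because the recursive call comes before the loop, the row for height 1 is printed first and the row for height n last, so the pyramid still appears from top to bottom.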
suppose that draw uh is called with an argument of four four is of course not
less than zero so we don't return but we do draw a pyramid of height three
and here's where things get a little mentally tricky you don't move on to line
20 until draw has been called so when draw is called with an argument of
three it's as though you're executing from the top of this function again three
is not less than zero so what do you do you draw two okay how do you draw
two well two is not less than zero so
you don't return so you draw one got to be careful here draw one and now
we go ahead back to the beginning how do you draw One well one is not less
than zero so you don't return you draw height zero how do you draw height
zero wait a minute zero is less than or equal to zero and you return and so it's
kind of like this mental stack this to-do list you keep postponing executing
these lower lines of code because you keep restarting restarting restarting
the draw function until finally one of those
function calls says there's nothing to do return and now the whole thing
starts to unravel if you will and you pick back up where you left off and this is
perhaps the best uh scenario we won't do it in class but if you'd like to
wrestle through this on your own using debug50 to keep stepping into step
into step into each of those lines logically you'll see exactly what's actually
happening so let me go to my terminal and do make recursion which is now
this correct version of the code
./recursion let's type in a height of four and voila now we have that same
pyramid not using iteration per se though admittedly we're using iteration to
print the additional layer we're now using draw recursively to print all of the
smaller pyramids that need to come before it now the question is can you only use
recursion with a void function no not at all in fact it's very common to have a
return value like an integer or something else so that uh you can actually do
something constructively
with that actual value other questions on this say a little louder when is line 21
getting executed so if you continue to unravel let me uh scroll down a bit more
so you can see the top of the code so line 21 will be executed once line 19 is
done executing itself now in the story I told we kept calling draw again again
again but as soon as one of those function calls where n equals zero returns
immediately then we don't keep drawing again and again so now if you kind
of think of the process as reversing then you continue to line
21 then line 21 again then line 21 again as the sort of logic unravels
and next week we'll actually paint a picture of what's actually happening in
the computer's memory but for now it's just it's very similar to the pseudo
code for the phone book you're just searching again and again but you're
waiting until the very end to get back the final result uh Google now
whom I keep mentioning by coincidence today is full of programmers of
course um here's a fun exercise let me uh go back
to a browser I'm going to go ahead and search for recursion because I want
to learn a little something about recursion uh here is kind of an internet
meme or joke if I zoom in here the engineers at Google are kind of
funny see why ah there we go yes yes this is recursion and there's going to
be so many memes you'll come across now where recursion like if you've
ever pointed a camera at the TV that's showing the camera and you sort of
see yourself or the image again and again that's really
recursion and in that case it only stops once you hit the base case of a single
pixel um but this is a very funny joke in some circles uh when it comes to
recursion uh and Google so how can we actually use Google or rather how
can we actually use recursion constructively well let me propose that we
actually introduce a third and final algorithm for sorting that hopefully does
better than the two sorts thus far we've done selection sort and bubble sort
bubble sort we liked a little better at least
insofar as in the best case where the list is already sorted bubble sort is at
least smarter and it will actually terminate early giving us a better lower
bound in terms of our Omega notation but it turns out that recursion and this
is not necessarily a feature of recursion but something we can now leverage
it turns out using recursion we can take a fundamentally different approach
to sorting a whole bunch of numbers in such a way that we can do far fewer
comparisons and ideally speed up our final results so here is the pseudo
code
for what we're about to see for something called merge sort and it really is
this terse sort the left half of numbers sort the right half of numbers merge the
sorted halves right this is almost sort of nonsensical because if you're
asked for an algorithm to sort and you respond with well sort the left half sort
the right half like that's being sort of difficult because well I'm asking you for
a sorting algorithm you're just telling me to sort the left half and the right
half but implicit in
that last line merging is a pretty powerful feature of this sort now we do need
another base case at the top so let me add this if we find ourselves with a list
an array of size one well that array is obviously sorted if there's only one
element in it there's no work to be done so that's going to be our base case
but allowing us now in just these what uh four six lines of pseudo code to
actually sort some elements but let's focus first on just a subset of this let's
consider for a moment what it
merging these two halves well because they're sorted already and you want
to merge them in order I think we can flip down we can hide all but the first
numbers in each of these sublists so here we have a half that starts with
two and I don't really care what the other numbers are because they're
clearly larger than two I can focus only on two and zero zero also we know
that zero is the smallest there so let's just ignore the numbers that Carter
kindly flipped down so how do I merge these two lists
into a new sorted larger list well I compare the two on my left with the zero
on my right obviously which comes first the zero so let me put this down
here and Carter if you want to give us the next element now I have two
sorted halves but I've already plucked one off so now I compare the two
against the one one obviously comes next so I'm going to take out the one
and put it in place here now I'm going to compare the two halves again two
and three which do I merge first obviously the two comes next
and now notice each time I do this my hands are theoretically making
forward progress I'm not doubling back like I kept doing with selection sort or
bubble sort back and forth back and forth my fingers are constantly
advancing forward and that's going to be a key detail so I compare four and
three three obviously I compare three and uh I compare four and six four
obviously I compare five and six five obviously and then I compare seven
and six six of course and then lastly we have just one element
left and even though I'm kind of moving awkwardly as a human my hands
technically were only moving to the right I was never looping back doing
something again and again and that's perhaps the intuition and just enough
room for the seven so that then is how you would merge two sorted halves
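
The merge step walked through above can be sketched as a C function, assuming two already sorted arrays of ints and a separate output array with room for both. The name `merge` and its parameters are hypothetical. The indexes `i` and `j` play the role of the left and right hands, and they only ever move forward.

```c
// Merges two sorted arrays into merged[], which must have room for
// left_len + right_len ints. Each element is touched exactly once.
void merge(const int left[], int left_len,
           const int right[], int right_len,
           int merged[])
{
    int i = 0; // position in the left half ("left hand")
    int j = 0; // position in the right half ("right hand")
    int k = 0; // next free slot in the merged output

    // Compare the smallest remaining element of each half, take the smaller
    while (i < left_len && j < right_len)
    {
        if (left[i] <= right[j])
        {
            merged[k++] = left[i++];
        }
        else
        {
            merged[k++] = right[j++];
        }
    }

    // One half has run out, so copy whatever remains of the other
    while (i < left_len)
    {
        merged[k++] = left[i++];
    }
    while (j < right_len)
    {
        merged[k++] = right[j++];
    }
}
```

With the halves from the demonstration, 2 4 5 7 on the left and 0 1 3 6 on the right, the output is 0 through 7 in order, and the 7 is the last element copied.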
we started with left half sorted right half sorted and merging is just like what
you would do as a human and Carter just flips the numbers down so our
Focus was only on the smallest elements in each any questions before we
Forge ahead with
what it means then to be merged in this way so now here is an original list
we deliberately put it at the top because there's one detail of merge sort that's
key merge sort is technically going to use a little more space and so whereas
previously we just kept moving our humans around and swapping people and
making sure they stayed ultimately in the original positions with merge sort
pretend that here's our original array of uh memory I'm going to need
at least one other array of memory and I'm
going to cheat and I'm going to use even more memory but technically I
could actually go back and forth between one array and a secondary array
but it is going to take me more space so how do I go about
implementing merge sort on this code well let's consider this here's
an array of size eight if only one number quit obviously not applicable so let's
focus on the juicy part there sort the left half of the numbers all right how do
I sort the left half of the numbers I'm
going to just nudge them over just to be clear which is the left half here
is now a sublist of size four how do I sort the left half well do I have an
algorithm for sorting yeah what do I do here's a list of size four how do I sort
it what's step one sort the left half so I now sort of conceptually in my mind
take this sublist of size four and I sort it by first sorting the left half focusing
now on the seven and two all right here's a list of size two how do I sort a list
of size
two sorry I think we just keep following our instruction sort the left half all
right here is a list of size one how do I sort a list of size one I'm done like it's
done so I leave this alone what was the next step in the story I've just sorted
the left half of the left half of the left half what comes next I sort the right
half of the left half of the left half and I'm done cuz it's just a list of size one
what comes after this merge so this is where it gets a little trippy because
you have
left half of it and the right half of it but then merging is really where the
magic happens all right again if you rewind now in your mind if
I've just sorted the left half of the left half what happens next sort the right
half of the left half so again you kind of rewind in time so how do I do this I've
got a list of size two I sort the left half just the five done sort the right
half four done now the interesting part I merge the left half and the right half
of the right half of the left half so what do I
do four comes down here five comes down here and now notice what I have
left half is sorted right half is sorted if you rewind in time where is my next
step three merge the two halves and so this is what Carter helped me do
before let's focus only on the smallest elements just so there's less
distraction I compare the two and the four two comes first so let's obviously
put that here now I compare the new beginning of this list and the old
beginning of this list four obviously comes next and now I compare
the seven against the five five obviously comes next and now lastly I'm left
with one number so now I'm down to the seven so even if you've kind of lost
track of some of the nuances here if you just kind of take a step back we
have the original right half here still untouched but the left half of the original
input is now indeed sorted all by way of doing sorting left half right half left
half right half but with those merges in between all right so if we've just
sorted the left half we rewind all
the way to the beginning what do I now do all right so sort the right half so
sort the right half how do I sort a list of size four well I first sort the left half
the one and the six how do I sort a list of size two you sort the left half just
the number one obviously there's no work to be done done sorting the left
half six done sorting the right half now what do I do I merge the left half here
with the right half here and that one's pretty straightforward now what do I
do I've just merged so now I sorted I've
just sorted the left half of the right half so now I sort the right half of the right
half so I consider the zero done I consider the three done I now merge these
two together zero of course comes first then comes the three and now I'm at
the point of the story where I've sorted the left half of the right half and the
right half of the right half so step three is merge and I'll do it again like we
did with Carter all right one and zero obviously the zero comes first now
compare the one and the three
obviously the one comes first compare the six and the three obviously the
three and then lastly the six so now where are we we've taken the left half
of the whole thing and sorted it we then took the right
half of the whole thing and sorted it so now we're at lastly step three for the
last time what do we do merge and so just to be consistent let me push
these down and let's compare left hand to right hand noticing that they only
make forward progress none of this back
and forth comparisons two and zero of course the zero so we'll put that in
place two and one of course the one so we put that in place two and three
we merge in of course the two in this case four and three we now merge in
the three in this case four and six we now merge of course the four in place
and now we compare five and six we keep the five bug okay well pretend
that the five is on uh oh this is why all right so now we compare the seven
and the six six is gone and lastly seven is the last one in
place and even though I grant that of all the algorithms this is probably the
hardest one to stay on top of especially when I'm doing it as a voiceover
realize that what we've just done is only those three steps recursively we
started with a list of size eight we sorted the left half we sorted the right half
and then we merged the two together but if you go down each of those
rabbit holes so to speak sorting the left half involves sorting the left half of
the left half and the right half of the left half and
so forth but this germ of an idea of really dividing and conquering the
problem not such that you're halving the problem and only dealing with one
half clearly we're sorting one half and the other half and merging them
together ultimately it does still lead us to the same solution and if we
visualize the remnants of this now if I depict this as follows where on the
screen here you see where the numbers originally started in the top row from
left to right essentially even though this is in a
different order I divided that list of size eight ultimately into eight lists of size
one and that's where the base case kicked in and just said okay we're done
sorting that and after that logically I then sorted I merged two lists of size
one into many lists of size two and those lists of size two into lists of size four
and then finally the list of size four into one big list sorted of size eight and
so I put forth this picture with the little line indicators here because how
many times
did I divide divide divide in half or really double double double so exponent is
the opposite oh spoiler uh how many times did I divide so three concretely
but if there's eight elements total and there's n more generally it really is a
matter of dividing and conquering log n times you start this and you can
divide one two three times log n times or conversely you can start here and
exponentially uh double double double three times which is log n but on
every row every shelf literally I made a fuss
about pointing my hands only from the left to the right constantly advancing
them such that every time I did those merges I touched every element once
and only once there was none of this back and forth back and forth on stage
so if I'm doing something log n times if I'm doing rather n things log n times
what would be our big O formula perhaps n things log n times yeah so n log
n the order of n log n is indeed how we would describe the running time of
merge sort and so of all of the sorts thus far we've seen that
merge sort here actually is n log n which is strictly better than n squared which is
where both selection sort and bubble sort landed but it's also
slower than linear search for instance but you would rather expect that if you
have to do a lot of work up front sorting some elements versus just searching
them you're going to have to put in more effort and so the question of
whether or not you should just search something blindly with linear search
and not bother sorting it really boils down
to can you afford to spend this amount of time and if you're the Googles of
the world odds are you don't want to be searching their database linearly
every time why because you can sort it once and then benefit millions
billions of people subsequently using something like binary search or frankly
in practice something even fancier and faster than binary search but there's
always going to be this tradeoff you can achieve binary search only if the
elements are sorted how much does it cost you to sort
them well maybe n squared if you use some of the earlier algorithms but it
turns out n log n is pretty fast as well so at the end of the day these running
times involve tradeoffs and indeed in merge sort too I should note that the
lower bound on merge sort is also going to be Omega of n log n as such we
can describe it in terms of our Theta notation saying that merge sort is
indeed in Theta of n log n so generally speaking probably better to use
something like merge sort or some other algorithm that's in n log n in
like space and as these shelves suggest that too is one of the key details of
merge sort you can't just have the elements swapping in place you need at
least an auxiliary array so that when you do the merging you have a place to
put them and this is excessive this amount of memory I could have just gone
back and forth between top shelf and bottom shelf but it's a little more
interesting to go top down but you do need more space back in the day
decades ago space was really expensive and so
you know what it might have been better to not use merge sort use bubble
sort or uh selection sort even or some other algorithm altogether
nowadays space is relatively cheap and so these are more acceptable tradeoffs
but it totally depends on the application the very last thing we thought
we'd do is show you an actual comparison of some of these sorting
algorithms it's about 60 seconds long and it will compare for you uh selection
sort bubble sort and merge sort in parallel simultaneously uh with some fun
sorting
as this grid of pixels and each pixel has like some pattern of bits that defines
its color well it turns out today we'll take a deeper look underneath the hood
at how things like images and so much more is actually implemented using
just these zeros and ones and how now as a programmer you can actually
harness that for better for worse to better understand and better manipulate
what's going on inside of a computer's memory using a language like C in
fact even this bowl of stress balls
that we keep happening is just a photograph of course but if you think back
to week zero if you sort of enhance enhance enhance this image like they do
in the movies it actually doesn't work out the way you would think from
Hollywood as I continue to zoom in and zoom in and zoom in on a
screen like this you'll see that yes it gets bigger but if it gets too big what do
you start to notice the so-called pixelation and indeed you can see the
individual dots so next time you watch some uh show or
movie on uh TV that has this sort of notion of enhancing you know there's
actually a finite limit there you can only enhance so far as there's actually
information there but once you zoom in to a certain level like this like that's
all that's there you're not going to see the glint of the suspect in some crime
drama in their eye just because you've enhanced the image there's only a
finite amount of information actually there but we'll see today too that by
understanding what's going on inside of
a computer's memory we can start to represent and even create and code
more interesting things so for instance here is a bitmap if you will which is a
term of art a bitmap is a type of image and it's a map of bits in the sense
that you have this coordinate system of up top down left right at least in this
artist's representation here and suppose that maybe we all decide as
the world that one shall represent the color white and zero shall represent
the color black what might this map of bits this
bitmap actually be can you see through it yeah it is indeed a smiley face so
an amazing eye if I actually turn all of the ones to White just to visualize this
you'll see indeed this is what was embedded there but of course on our
computer monitors and phones we have this grid of squares this grid of
pixels so indeed if you were to actually see on your screen a smiley face like
a black and white one at that what's probably going on underneath the hood
is just some pattern of zeros and ones and
maybe single bits one bit color if you will where one here represents white
and zero represents black so if you kind of like this thing it turns out you can
do pretty uh pretty beautiful pretty interesting pretty artistically inclined
things if you go to this URL at your leisure cs50. lart it'll actually redirect you
to a Google spreadsheet that we've made in advance and we've kind of
Shrunk the rows and columns to resemble a grid of pixels tiny little squares
all of which are white by
default not unlike this easel here that we have a couple of volunteers
working away at in fact would you guys like to come forward for a moment
and say a quick hello before we come back to you uh hello my name is
Daniel I'm from Chicago welcome to Daniel and hi everyone I'm Adam and
I'm from Trinidad and Tobago nice well welcome to you both thank you you'll see
that in their hands are actually a whole bunch of pixels uh Post-it notes that
we've handed them in advance so if you don't mind we'll
come back to you in a couple of minutes and see what they've created if you
will on this grid of white paper much like you could create on this Google
spreadsheet in fact feel free to send us your Creations if so inclined uh via
the URL you'll get at cs50. lart now let's come back to week zero where we
defined some of the building blocks for images we talked about RGB which is
just red green blue and it's just one of the systems a popular system via
which you can represent any color of the rainbow
using some combination of red and green and blue and if any of you are
artistically inclined or have used Photoshop or similar programs you might
typically have some means of selecting a color by some grid like this but
down here notice there's explicit mentions of the types of color systems in
use RGB and in fact here you see 0 0 0 and up here under new you see
the color black and that implies that if you have no red no green no blue well
that indeed would represent by convention the color black
green no blue if we change it instead to 255 for green but zero for red and
blue of course we get green and then lastly if we crank up the blue but leave
red and green as zero we of course get blue but all this while down here
highlighted is something that maybe some of you have seen before like
some combination of numbers and letters if any of you have made personal
web pages or use programs like Photoshop you might have used these so-
called color codes so indeed the world has this convention whereby using
six digits or sometimes three you can represent a little more succinctly some
amount of red green blue and you'll see here maybe by inference that if RGB
is 0 0 255 respectively perhaps where we're going with this is that 0000FF is
just an alternative way of expressing the exact same idea no red no green
and a lot of blue but why is that and in fact we'll come full circle here to
introducing something that we could have done in week zero but it doesn't
really solve a problem then but today as
1 1 2 and so forth but in other systems not binary not decimal but systems
called hexadecimal hex implying 16 there are actually more digits than these
which might come as a surprise um it's not pairs of digits but like in decimal
single digits and frankly it doesn't really matter what the digits are because
at the end of the day these are just symbols that you and I immediately
associate with some notion of math but just strokes on the screen that
represent some actual value so it turns out that by
convention when you want more than 10 digits 0 through 9 you start using
letters of the English alphabet A B C D E and F and you can represent them
in lower case it's case insensitive so it doesn't really matter you might see it
in upper case or lower case but this is how you can count beyond 9 not
using decimal but using indeed something called hexadecimal if we get
really technical this is also known as base 16 and it's the same idea as week
zero where instead of using base 2 for binary base 10 for decimal you use 16
as the base for hexadecimal and so if we run through just some simple examples
here in the world of hexadecimal your columns are just powers of 16 16 to the 0
16 to the 1 16 to the 2 and so forth but in the world of hex we usually at
least thus far and today we'll see just pairs of digits like this so here for
instance is the ones column and the 16s column if we multiply that out so if
you wanted to represent the number you and I know in uh the real world
as zero in hexadecimal it would just be 0 0 if you want
some of the past math well once you get to 0 F in hexadecimal if F is the
highest you can count just like in decimal nine is the highest you can count
what comes next if this is 15 I claim how do I represent 16 in hexadecimal with
what pattern of symbols yeah so one zero
not 10 even though you might read it like that as a typical human but one
zero because why well even if this is completely new to you the whole
column system the places are exactly the same intuitively so you need
a one in the 16s place and a zero in the ones place and we won't count all
the way up to 255 but if we count a little higher this would be 1 0
AKA 16 in decimal this would be 1 1 AKA 17 in decimal and then 18 19 20
and so forth dot dot dot and we can count all the way up to FF because if F is the
biggest digit in hexadecimal FF is indeed as high as we can count with two digits and if each
F represents 15 well let's just do the math like in week zero so 16 * F + 1 * F
is how all of us learned to do uh
beyond but like why is hexadecimal useful like why are we complicating the world
and adding on top of decimal something else well it turns out that a single
hexadecimal digit like F the biggest one for instance is 15 and here let me just
propose a bit of Mental Math how many bits do you need to represent the
number 15 in binary if you've got the ones place two's Place fours and so
forth how many bits total so fewer than five to count this highest 15 I think
but close someone else sing in hand yeah so four bits I
bits is a byte which is again just a convention we've seen and so the reason
that you see hexadecimal in the world of Photoshop and eventually web
pages is it actually just maps really nicely to expressing binary numbers
more succinctly with a fixed number of digits so for instance anytime you see
1111 1111 in the world as binary you know what that's a little
tedious to both say and write you can represent any
group of four one bits more succinctly in hexadecimal as just F so 1111
1111 in binary more commonly now in the world of
Photoshop memory images and the like is represented more succinctly as FF
and that's why because it just maps really nicely to four bits and so we can
be a little more succinct so any questions on hexadecimal which is just another
way of representing information but using the same grade school approach
yeah good question if you represent 15 with f it would use four bits so base
systems are really just a way for us humans on paper
255 of blue and it's just way more succinct than writing out like what 8 plus 8
plus 8 24 zeros and ones and it's just cleaner than even using decimal when
you're using units of eight which again computers just use everywhere so it's
just another system it's not one you need to dwell on very much but again
it's fundamentally no different from binary or decimal we're just using a
slightly different base no all right well we had this blank canvas here and I
think uh are you two perhaps ready to
reveal for the world what you've created do you want to go ahead and I'll I'll
swivel it around for you all right here we go big reveal and today's pixel art a
round of applause if we could very nicely done well thank you both if you
want to come up after and tear this off and bring it home you're welcome to
and keep the Post-it notes too well thank you to our volunteers there let's
now translate this to really more technical worlds where we're going to see
and consider it more often
because in fact sometimes when you've had error messages over the past
few weeks from clang the compiler you might have even seen evidence of
hexadecimal we didn't call it out it wasn't useful to know at the time but it turns
out a lot of programs use and a lot of code uses hexadecimal for those reasons of
more succinct representation so for instance where else might we see it
well here's that picture we keep pulling up of our computer's memory and
each of these squares in this grid represents a byte
sort of top left to bottom right in the computer's memory but again just an
artist's representation a few weeks ago I claimed that each of these bytes
can be numbered of course like this is byte zero at top left then byte one then
byte two then byte two billion if you have two gigabytes of memory and so we
could just number them like this 0 through 15 on up 16 17 18 and so forth
but per the reasons earlier it's just more common in computer systems and
in software to actually use hexadecimal just to
potential problem here with using hexadecimal in this way there's an ambiguity
can anyone imagine like what can go wrong if we use hex to just simply
describe locations in memory like this yeah yeah yeah like one zero might
also be 10 and you know maybe if you're you know really thorough okay wait
a minute it can't be 10 cuz here's F over here so it's obviously not decimal
but why create potential confusion especially when you're collaborating
building something with someone we want to avoid
that ambiguity and so the convention humans decided on years ago is that if
you want to make clear that a number is in hexadecimal just by convention
you prefix all of the digits with 0x the x is not like another character it's
not like a 17th character it's just a human convention of putting 0x to imply here
comes hexadecimal and now it's unambiguous so now we see 0x10 obviously is
not 10 as we know it in decimal but rather it's the number that comes after a
single F so it's really the number in decimal 16 so
0x anytime you see it that's just a visual cue that what is ahead is actually
hexadecimal so let's now start playing around with this information so here's a
super simple line of code from like week one where I'm just declaring a
variable n and I'm defining it to be the value 50 and this is out of context we
probably need a main function and all of that but let's just rewind to week
one where we actually saw code like this and do something useful with a line
of code like this so let me go over here
to VS Code and in VS Code I'll create a program called how about addresses
since the goal here is to just play around ultimately with a
variable like n and let me go ahead and do this I'll include how about
stdio.h I'll do int main void so no command line arguments for now int n
gets 50 and now so that we can do something mildly useful with it let's just
go use printf and print out with percent i and then a new line whatever that
value of n is so this is not going to be interesting per
se it's just week one stuff where I'm defining a variable and printing it out to
the screen so let me go down to my terminal window and do make addresses
no errors so that's good I'll do ./addresses and of course I should see the
number 50 here now what's going on underneath the hood let's translate
now code to really what's going on underneath the hood of the
computer so if this is our grid of memory I don't necessarily know as the
programmer and I definitely don't care
as the programmer where exactly it's ending up in memory that's the whole
point of using Code let the computer figure this out but at least conceptually
I know that by declaring a line of code like that the number 50 ends up
somewhere in the computer's memory and it's assigned the name n a
symbol n via which I the programmer can refer to it and I very deliberately
used four of these squares for what reason what might be the reason for
using four squares specifically yeah yeah so an integer is four bytes at
least most of the time on Modern systems an integer is four bytes on an
older computer it might just use one or maybe even uh two bytes But Here
by convention we're almost always going to see four bytes I don't know if it's
going to end up here it might end up over here but for now who cares I just
know that the computer can store the information in this way underneath
the hood so let's now introduce another feature of C that we haven't had
occasion to use just yet that's going to allow us to start poking
around the computer's memory For Better or For Worse and this is one of
those situations where you're about to acquire a skill a power that
can actually come back to bite you because once you know how to start
poking around a computer's memory you can do very powerful things and
next week we'll see what you can build in a computer's memory but you can
also screw up pretty easily and cause more of those segmentation faults that
a few of you have already suffered so with that said
let's just stipulate that you know what I don't care necessarily where the 50
is in memory but I know it exists at some address in memory and just so I
have an easy address to pronounce let's just suppose it lives at 0x123 so
that's the address in memory in hexadecimal by convention and that just happens
to be where it ends up when I write that line of code but it turns out C has
some other operators we can use we've seen the asterisk before the
star and we've used it for multiplication but
today we're going to use it for something more powerful and we're also going
to introduce an ampersand which allows us to do something as well the
ampersand operator is going to allow us to get the address of a piece of
data in memory like by literally putting ampersand before the name of a
variable C will tell you what address that variable lives at maybe it's
0x123 maybe it's 0x456 who knows but that will give you back the answer
the star does the opposite it sort of means go
there so using the star otherwise known as the dereference operator I can
actually go to a specific address if I want and we'll see what this means in
code so how can I leverage this in some mildly interesting way to start
poking around but eventually we'll use this primitive to build more
interesting things so let me go back to VS Code here and let me go
ahead and do this I'll clear my terminal to start fresh and I'll introduce
another format code for printf percent p and for now just
take on faith that it is percent p but percent p is going to allow
me to print the address of a variable if I additionally tell C get the address of
n so I'm changing percent i to percent p and that's just something you have
to do when printing addresses for now but I need to add an ampersand in
front of the variable name so I don't print the number 50 I print out
something like 0x123 and it's not going to be as simple as that we'll see on
the screen though
as small as the thing on my slide so this at the moment isn't that useful yet
but it introduces us to a concept that we'll Now call pointers and pointers are
admittedly one of the more challenging aspects of c and if in future life you
tell friends that oh I took a class called cs50 and we learned C like you'll
probably get kind of a look from people like why did you learn C or like oh C was
hard and it's largely because of this topic which isn't to say that it's that hard
to wrap your mind around but
it's definitely very different and it's not a feature that you can harness in
higher level languages that we'll see in class too like Python and Java and
the like C is about as close to the computer's Hardware so to speak that you
can get before things get actually scary the so-called Assembly Language we
saw in week two when I had a link and compile and assemble and all of that
like that gets really lowlevel and you really have to be an expert with the
computer's CPU or brain to understand
that but with C you can actually poke around the computer's memory and do
powerful things with that but again with great power comes responsibility it's
very easy to break programs by misusing memory or just having a bug that
touches memory in some way that you don't intend so pointers at the end of
the day are pretty much what we just saw a pointer is really just a variable
that contains the address of some value a pointer is a variable that contains
the address of some value or more simply it's fine to
because it's an address it's not int P it has to be int star P so to speak and
the star here on the left hand side of the equal sign is just a clue to C that
means p is going to be a pointer that is p is going to be the address of what
the address of an integer now technically it's still an integer itself right
because an address is just a number whether it's 123 or 0x123 so this is
really just a semantic difference so int star p just means that this variable
doesn't contain any old number like 50
access the addresses of things in memory means that we'll be able to build
things and construct things and Link things together by knowing where they
live so to speak so any questions on this technique thus far [Music] yeah a
good question on line six must it be star p and ampersand n in this case
yes because what am I doing on the left and I'll get rid of the equal sign for
now this would give me a variable called P that's not an integer per se but
that's the address of an integer but
without the equal sign I'm not storing anything in that variable so by adding
the equal sign and then Ampersand N I am explicitly figuring out with
Ampersand what the address of n is which already exists per line five and
tucking it away in this new variable called P other questions yeah every time
you run good question every time I run the program it uses up a different
piece of memory short answer yes computers though long story short also
have something called virtual memory so if you run it
again and again you might actually see the same addresses on the same
Mac or PC or cloud-based server but we'll see in a bit where uh at a high
level it's laid out but it will always exist at some address good question yeah
correct ampersand n is the address of n and int star p is a pointer
called p and honestly in an ideal world if C were made today and not
decades ago when humans were first creating languages you know ideally
we would just have a data type called pointer and then this would be a little
less complicated because it would literally be what it says you know the
humans who invented C didn't do this but this is the idea so pointer is not a
legitimate word in the code it is a term of art in English but this is really just
the idea but the way you express pointer as a data type is a little more
cryptic as int star p here but notice in line seven when I print out p I don't
use a star I don't use an ampersand why I literally just want to print the value of
p and we've been
doing that since week one if you want to print a variable just describe the
variable by its name no special syntax any other questions on this thus far
uh what's the advantage of using pointers with pointers we'll see today some
applications of them really the idea is going to come to fruition next week
when we're going to create what are called uh data structures in memory
where we can build not just uh for instance uh one-dimensional data
structures like an array we'll see next
of discussion we're only dealing with integers like the number 50 uh you
mentioned strings or characters absolutely we're about to go there soon so
you can use the address of anything you want in the computer's memory so
in fact let's translate this now to just the same picture just to help you wrap
your minds around what these two lines of code really fundamentally are
doing so if I come back to my grid of memory here let's plop the number 50
in the variable n at the bottom right like it
was before so this is that first line of code as before but with the new second
line of code as soon as I create P what do I do well first remember that n lives
somewhere in the computer's memory usually I don't care precisely where it
is but for the sake of discussion let's suppose it's at 0x123 which is easier to
say than where it actually ended up and now what is p well p is just another
variable and variables live in memory too so let me just hypothesize that P
lives up here and it turns out that P
once you assign it the value of ampersand n means that C will take a look at
the variable n realize oh it lives at 0x123 and what goes in the value of p is
literally 0x123 so again it's still an integer which is confusing but it's
technically an integer being used as an address and now just a prompt
here notice that this pointer is pretty darn big it's like eight squares what's
the implication of that because I did that deliberately how big must a pointer
apparently be in most modern systems would you say okay
good computers today are very big you have gigabytes of RAM in your
computer you therefore need big pointers to be able to point at memory
that's conceptually pretty far away so to be clear how many bytes does a
pointer apparently take up well it seems to take up eight in total integers by
convention nowadays are usually four pointers though nowadays are
typically eight in this case so I'm drawing it in a manner consistent with the
reality even though at the end of the day it's not really
that interesting what values are in here in fact let's emerge from these
weeds I don't really care what else is going on in my computer's memory at
the moment because I've only got those two lines of Juicy code defining n
and defining P so let's hide all of the other squares and honestly I mean it
when I say that programmers need to know that a variable exists somewhere
in memory and need to be able to get that address using like the
ampersand but you're never going to printf like I did the actual address
like it's not generally interesting unless you're debugging your code but
you're not going to like start typing out crazy 0x numbers in your code to
move things around you just need to know that the computer can figure out
where things are so frankly by that logic who cares that it's 0x123 right
tomorrow it could be 0x456 or something else so one of the ways to think of
a pointer is literally as a variable that points at something else and indeed in
this case p yeah technically it has an address and
yeah technically it's 0x123 in the story but honestly who cares I just need to
know that using p I can get to the value n and so what are these
addresses and in fact if Carter wouldn't mind joining me up here for a
moment what are these addresses well just like in our human world we have
mailboxes even though you might not check it very frequently nowadays but
to get physical mail every uh home every business has a unique address the
uh science and engineering complex is 150 Western
twice as big because of the number of bytes it's using but Home Depot only had
identical size mailboxes but here is p one variable there is n another variable
if I open up this mailbox what should I find inside of it based on our story
thus far like what value will I pull out dramatically in just a moment yeah I
think 0x123 now using this you can kind of think of this as like x marks the
spot no pun intended where I can now like walk around the computer's
memory and find my way to that location by sort of following the
Carter's waiting for Applause so like really well nicely done thank you so
that's just like a physical metaphor of what's going on here in one variable
we have an address and that variable by convention is called a pointer in the
other variable per week one we just have a value like n and you can yes
follow the map and walk yourself to that particular address and we'll see how
to do that in code but what's really interesting is this abstraction that
pointers literally or really I guess
figuratively point at some other value in memory all right questions then on
pointers in this form pointers point to each other can pointers point to each
other so yes there's things called double pointers we're not going to see
them anytime soon but using star star you can express an address of an
address um but we won't see that just yet other questions on pointers yeah
in front are arrays so to summarize are arrays then pointers so short answer
there's a relationship and we'll come
back to that in a little bit but arrays are technically different from pointers
but we're going to be able to blur the lines a little bit by using one like the
other but let me come back to that in just a bit of time all right so if we have
now this mental model if you will of like what a pointer is in memory I think
we can start to peel back a layer of uh simplification that we've been
assuming for the past few weeks since week one so a string recall is a
sequence of characters and so if you
want to create a string that says HI in all caps and an exclamation point
we do string s equals quote unquote HI! and we can hard code it like this or
we could use get string but for now just assume that I hardcoded it into my
code to always say hi in all caps with an exclamation point well what does
that look like in the computer's memory well let's stop looking at the entire
memory let's just focus on really what's going on once you create a string
called s and store in it hi you know that a couple of
things are happening H and I and the exclamation point are ending up in the
computer's memory we know from week two that this thing the so-called null
character NUL AKA backslash zero is also being added for you and it's somewhere in
memory at the moment I don't really care where I drew it at the bottom right
yes it has an address but for now it just ends up somewhere and in fact
here's a little visual cue as to how this happens in C anytime you use double
quotes to give you a string you can imagine that
the double quotes are like a clue to not only store HI exclamation point but
also put the null character there for you and this is in contrast to what chars
if you want individual characters what syntax did we use instead so single
quotes single quotes do not magically add a backslash zero they literally just
store one character so again strings have always been a little special you get
an extra byte for free so that you know where the string ends and
functions like strcmp can then find their way there so
in memory it might indeed look a little like this and if we assume that there's
going to be somewhere in memory these things are going to be somewhere
in memory we can address them per week two by way of the name of the
variable so if s is the name of the variable S braet 0 is how you would refer to
the first letter s bracket 1 s braet two and if you really want s bracket 3
would get you at the uh null character at the very end but what is s so
technically in this line of code here not only is the
computer giving you memory for H I exclamation point backslash zero it turns out
that s itself must take up some amount of space right because s is the
variable and every time we've talked about variables thus far I've given you a
rectangle on the screen in which to store its value so let's assume for the
sake of discussion that the H is at 0x123 and I is at 0x124 the exclamation
point is at 0x125 and the null character is at 0x126 well what then is s well s
is just going to be some other variable and
I'll draw it somewhat abstractly without all the other boxes up here and I'll
claim that the name of this variable is s but it turns out what is s really how
do strings really work well s is a variable and has been since week one but
when you define it what the computer is doing for you automatically is when
it knows you want to store Hi exclamation point it puts that somewhere in
memory the computer then figures out for you what's the address of the very
first character and it stores that address and
only that address in the variable you created on the left hand side of the
equal sign and that's enough to represent a string with three letters of
the alphabet or punctuation you don't need three variables you just
need one you just need to know the beginning of the string why why is it
sufficient for a variable to only store the first byte's address and not all of the
bytes' addresses exactly because of the design of strings per week two we
always null terminate them so it suffices to only
remember the first byte's address because from there you can sort of follow
the breadcrumbs byte after byte after byte and until you see the
null character you know that all of those characters are
apparently part of the same string so this is what's been going on in the
computer's memory since week one and in fact if we abstract this away
you can really think of s as being just this really a pointer to that chunk of
memory so in fact what do we have here well in the left to
recap on the code here on the left hand side string that's what ensures that
we'll actually be able to store a string in a variable called s we're going to
have on the uh right hand side though the actual value so let me switch back
to VS Code here and let me change my code to no longer involve integers
alone so I'm going to add the CS50 library just so that I can use some
shortcuts in there cs50.h and then in my main function I'm going to go ahead
and do this string s equals quote unquote HI
in all caps exclamation point and then I'm going to go ahead and print out
using percent s as always backslash n the value of s so this program at the
moment not interesting at all it's just week one stuff again ./addresses indeed
prints out HI! but it turns out that now that I know this what's really been
going on underneath the hood all this time well here's that same line of code
that defines the variable called s and it turns out anyone want to guess what
string is actually a synonym
for string it turns out is kind of a white lie we've been telling since week one
there is no such thing as string as a keyword in C it's technically a CS50
thing yeah it's a pointer to a character so really all this time we've kind of
been lying to you there is no string quote unquote it's actually char star and
if I may dramatically here go the training wheels like okay that didn't land
very well so uh what have we been doing well it turns out that string is a
much
easier way conceptually to think about what a string of characters is like my
God if we had to start in week one by having you type char star like yeah you
might get past it but like this is just way too much ugly syntax not
intellectually interesting at all so we abstract away what a char star was in
the first week of C by telling you it's actually called string now string is a
term of art like C programmers or programmers in any language will use
the word string to mean a sequence of characters but in C it's not
technically a word unto itself it's rather a synonym that we ourselves created
in some form so in fact how did we do this well think back to just last week
last week I proposed that it'd be really nice if we had a person data type
which the creators of C did not think of decades ago but that's okay we can
Define it ourselves what did we do here well using syntax like this recall that
we defined a person to be what to be this structure this structure using the
new keyword last week struct means that
a person is just a name and a number and it could have been other things
we just kept it simple but how did I associate person with that structure well
we claimed that it was this value here typedef which as you might expect
defines a data type so what did we do as cs50 back in week one without
telling you well we could have done something like this like int itself is a little
cryptic and maybe we should have to keep things even simpler said hey
everyone turns out you can define integers in C and if you
wanted to do this well if you want to create the keyword integer as a data
type you can just typedef it to int so typedef creates the word on the far
right integer and makes it a synonym for the existing type int so what did
we do in week one without telling you we have a line of code like this in the
CS50 library that associates quote unquote string with more cryptically
char star and this is why in week one onward anytime you use the CS50 library
you can write the word string as though it's a real C data type and that's just
because we wanted to have this abstraction these training wheels on for the
first weeks so we don't have to get into the weeds of all this crazy memory stuff we
can sort of talk about strings at a higher level but that's all they are strings
are the address of the first character in that sequence of characters
questions now on any of these details yeah the string library good question what
about the string library which we have used unrelated so it does not
define the word string everything in there actually
relates to char stars and so in fact if you've used the CS50 manual
which is just our user-friendly version of the actual manual pages for the
official language C you'll see throughout that now if you start poking around
or turning off less comfortable mode you'll actually see that we've changed
any mentions of char star in the official documentation for these first weeks to
just string to simplify it but underneath the hood C does not know the word
string per se as a keyword but it's absolutely a concept
that like every program in the world knows about and in fact in other
languages in Python for instance there will actually be a proper string
although it's not going to be called string it's going to be called str S-T-R for
short questions on these strings here well let me propose there's one other
feature of this syntax that we can now leverage as follows let me propose
that if we go back to the previous version of my code here wherein let me
switch back to vs code in just a moment
I'm going to rewind in vs code to the integer version of my code from before
and most recently it looked like this before when we were using
integers only and not in fact strings at all let me propose that there's this
other feature of C that we can use that actually allows us to go to an address
so at the moment let me just rewind and do make addresses to remind you
what this program did when it was using integers alone and there's that
address why because on line seven notice I'm
printing out the value of P which is a pointer so of course it's going to look
like an address but let me zoom out now and make one change and instead
of printing out P how can I use today's second new operator not the
Ampersand but the star to actually go to that address well what I can
actually do on this line of code is this if I want to print out the actual integer
50 that's in that variable or equivalently at that address I can go to P here
and not print P literally because that's just an
address I can now say star p and star p means go there more technically
dereference p that is follow the treasure map to the actual address and do
what Carter did open the mailbox and print whatever was in the mailbox
which recall was the actual number 50 so let me try this let me recompile the
code so make addresses okay let me clear my terminal window do
./addresses this time I shouldn't see the 0x anything I should see just the
number 50 in this case and here too is kind of an unfortunate design
when you use a pointer you just use the star in an Ideal World this would be
a completely different symbol but again this is what we have questions now
on that syntax yeah why can't we just do the ampersand here are
you saying it was still a little quiet so strictly speaking we do not need line
six so this is really for pedagogical sake that I am um defining a separate
variable p and then printing it out at this point though I'm just kind of you
know going in circles if you will
because simpler would have been what I would have done in week one
which would be get rid of p altogether get rid of p here and just print out n
right but today we're just giving you this new building block this new
syntax via which you can figure out the address of something and then
reverse the process later and actually go to it as well other questions on
what we've done here with these pointers all right well let's context switch
back to the string now and see what more we can do with this here in
the case of our strings here let me uh refine this to zoom out let me delete
the integer related code here let me do string s equals quote unquote HI! in
all caps let me go ahead and for the moment include cs50.h at the top so
that indeed I can use the keyword s or string rather and let me go ahead now
and do something more than I did last time last time I did printf of percent s
backslash n and then I printed out s and again I'll recompile this just for clarity
make addresses ./addresses that just prints out
HI! so that's again week one stuff but now that we have this other bit of
syntax we can do some interesting things too so for instance suppose I want
to print out not s itself but what if I want to print out the address of s like at
what memory location is s well I can change my percent s to percent P which
now we know P is for pointer so percent p means print out the value of a
pointer that is an address and here I can actually print out s itself but why
that is we'll see in a moment let me do this
here go the training wheels string does not technically exist but it does if
I'm using the CS50 library but if I get rid of the CS50 library as I'm
metaphorically doing by taking off the training wheels I can't use the word
string anymore and in fact let me make this mistake deliberately as you
might have accidentally in past weeks here is the error message I get if I
forget the CS50 library use of undeclared identifier string did you mean
stdin it's trying to be helpful but it's
char technically you can put the literal star here the asterisk or you can put it
there or you can put it here but the convention is to do what I've done from the
beginning put the star next to the name of the variable as opposed to
anywhere else uh let me go ahead now and or sorry I meant to add the
spaces there you could do this too but this would be the most normal
convention so now let's do this make addresses compiles okay now do ./addresses
what should I see HI! or something else feel free to just
call it out so still HI! you say someone else memory location a memory
location all right so could be one of the two options right either I'm going to
see the string or I'm going to see a memory address though I do in fact see a
memory address and this one's quite different from the integer one but does
anyone now want to explain why you were correct why am I seeing the
address down here and not HI! it's subtle yeah exactly because I left my
percent p there which means hey printf show me a
pointer but this is where printf is smart and has been smart since week zero
humans who invented printf decades ago uh wrote code that notices that
okay percent s means to treat the following value not as just an address per
se that gets printed literally but printed as with the mailbox demo is sort of a
treasure map that leads you to the address of a character so simply by
changing one character percent p to percent s and if I now do make
addresses again and ./addresses this now is identical to week one but
hopefully
makes sense because percent s is just a clue to print f that means go to this
address in s print out every character there and thereafter until you see what
the null character and then stop printing anything more and this is why hi
has printed since week one today we can see the address percent P but this
combination of having access to addresses and the null terminator is all the
information printf needs to actually do something more useful by like printing
the actual strings any questions now on this
approach to percent s yeah in back oh so why is it traditionally being used in
this way honestly like the word string has been around for decades it's not a
keyword you should be able to type in C unless you're using a library like
cs50 um and so percent s just means string so even though it doesn't exist
as a keyword percent s connotes string and humans decades ago like today
just kind of know what that means so they could have chosen any letter of
the alphabet but s sort of makes the most
sense all right well let's in back other question good question before let me
zoom in I did not use a star before the s why well it's subtle here but printf
was invented years ago to know given an address like in the variable s printf
knows to go there so if we looked at the source code that some human wrote
years ago for C we would likely see the actual uh asterisk that you're
referring to printf is taking on the responsibility for going to S if you were to
do uh star s here instead an asterisk and then s
our syntax in another way let me print out with percent s how about uh not s
here but let's print out some addresses percent s back sln close quote and
then let's print out how about this the first character in the string s would be
called s bracket Z but how do I get the address of the first character in s well
I could technically just use today's new primitive I can just add an ampersand
that always gives me the address of some value so when I end this thought
and clear my terminal window and run make
addresses still compiles when I run addresses in just a moment any guesses
as to what I will see line by line this will print out two things and you don't
have to remember what the actual number was but at a high level what will
be printed now the same thing twice why well when I run this what I'm
printing here and let me zoom in at the bottom I indeed see two really long
addresses but they're in fact the same why well that's because again if s is
the address of a character as implied
now by either the CS50 word string or the actual phrase char star well then s
is just an address by contrast per week two s bracket 0 is a char always has
been a char a specific char but if you want the address of that char you just
add the ampersand well it turns out that a string per the definition we keep
emphasizing is just the address of the first character in a string so of course
if you do this you're going to see the exact same thing and if I do this a bit
more generally you don't want to copy
paste but this is just for uh visualization sake let me print out all the
characters so another another another and let me change this to print out
the address of bracket 1 bracket 2 and bracket 3 so all four characters
H I exclamation point and the null character notice I'm using percent p for all
of them so if I now do make addresses and do ./addresses now notice and this
is kind of cool the first two are indeed still the same but what's
noteworthy about the other values on the
screen yeah they're consecutive each of these is just one byte away even if
you're not good at hex yet and there's a crazy number of digits here who
cares they're all the same except for the last ones four four and then five six
seven and this confirms what I've been claiming for weeks is that in an array
all of the characters are back to back to back contiguous one byte away so
with just this Ampersand with just this star like it's actually a pretty cool tool
in the toolkit to have because you can start to
poke around what's actually going on inside of the computer's memory and
in fact if we do this I can introduce one other cool trick here if you will let me
propose that we can actually now do arithmetic on pointers and you don't
have to you'll see a simpler way to do this but now that you have perhaps
this underlying understanding of where things are in memory and it's just
addresses we can actually do something kind of neat we can do something
like this uh let me go back to how about uh the string
version of this with high and let me do this instead let me um clean this up a
bit get rid of some of these lines of code and let me do this let me print out
percent c percent c percent c let me get rid of all these ampersands we're
going to roll back to like week two stuff just to be clear when I compile and
run this version of the program and I'll zoom in what should get printed on
the screen this is just week two stuff now no pointers per se yeah mhm just
Hi exclamation point one per
line because I have all of these backslash n's so let me do that let me go
down here make addresses enter okay pretty good ./addresses and indeed HI
exclamation point But now if you're getting a little more comfortable and it's
fine if you're not yet today but over the coming week or weeks as you get a
little more comfortable with the equivalence of addresses with our definition
in the past of arrays and strings and all of this you can start to play around
and I can do this instead if
I I want to print out the first character in the string I could do like week two s
bracket zero like that will always work and you can keep using that that's not
a cs50 thing it's just a convenience in C but I could technically print out not s
because s is an address but what would be the syntax I could use to say print
out the character at s any Instinct how can I say go to the address in s it's
one of two possible answers today so of our two new
operators today we have the
ampersand and the star which one will lead us to what is at an address so
the star so in fact if I want to print out what is at the
address s I can just do star s and if you really want to get fancy how do you
print out the second character that's immediately to the right of it so to
speak well you can go to it with the dereference operator and do you want to
answer this one s plus one ergo pointer arithmetic like you can do math simple
addition subtraction whatever on pointers if you
want and you can do this here too so star you want to pluck this one off too
how do I print out the last character the third s plus 2 right because if you
know and understand that like a string is just a sequence of characters every
character is just a byte and these bytes are back to back to back you can just
go wherever you want in the computer's memory and here I can do make
addresses again ./addresses and voila we now have HI exclamation point so
we haven't printed out anything new but again just
by using these two new operators the ampersand and the star you can figure out
the address of something and you can go to the address of something okay
question in back indeed it ends up being the exact same and so I might have
used this term before the ampersand technique sorry the square
bracket technique where you do s bracket 0 s bracket 1 s bracket 2
that's actually what we would really call syntactic sugar like it works and you
can use it you should use it it's nice and simple
but the square bracket notation underneath the hood is essentially being
converted to this which this is not fun right like this is when you want to
show off to your friends like you know how to do cool stuff in code but this is
not as readable as just s bracket Z and one and two but that's all that's
happening underneath the hood and so again this is why in cs50 we spend
time on some of these lower level building blocks because if you assume that
indeed your computer's memory is just this grid of
bytes and you now have the ability in code to get an address and go to
an address you can start doing anything you want and you can poke
around a computer's memory at any location and herein lies the danger like
I'm kind of on the honor system right now that if my string is HI
exclamation point it's kind of up to me to go to the first byte the second and
the third but I could get kind of crazy now and if I want to see what's going
on in the computer's memory I mean there's nothing stopping me from
doing like s plus 50 and let's see what's there so make addresses do ./addresses
HI and then okay nothing it seems well how about 5,000 bytes away let's
poke around what's inside of the computer's memory so make addresses
again ./addresses enter okay still nothing there let's try
50,000 all right do make addresses do ./addresses okay there we see it so
you've probably done this some of you by accident because you probably
went too far to the left or to the right in an
array touching memory that you shouldn't suffice it to say I should not go
blindly touching 50,000 bytes away cuz who knows what's there and indeed
in your computer when a program is running the computer
segments it into different segments of memory and if you get a little too
greedy and you touch another segment of memory that technically was not
allocated to you by macOS or Windows or Linux or the operating system bad
things happen and you get a segmentation fault and that
means it's a bug in your code so you can now do this and this means hackers
too can do things like this if they can somehow inject code into your C
program maybe they can poke around the computer's memory and indeed
this is kind of the technique whereby maybe a really sophisticated hacker
can jump to this memory this memory this memory looking for something
like your password or your financial information or anything that's in the
program but at some other address there's nothing
stopping an adversary at least right now from poking around if they can
execute code on your computer from doing this kind of thing so there and
again is the power of C but also the danger and you'll absolutely suffer more
segfaults in the coming days but ultimately the goal is going to be to help you
solve them and fix things but for now I think that
was quite a bit so let me propose that we go ahead and take our longer
break here maybe 10 minutes and have ourselves some whoopie
pies in the transept we'll be back in 10 all right so we're back and to recap
where we left off you now have this new capability in code to do pointer
arithmetic like treat addresses as numbers which they really are in hexadecimal
or otherwise and like add to them and kind of poke around a
computer's memory and it was asked during break actually how we might
further harness this in the context of string so I didn't change the code we
wrote just before break recall that we last broke
the program by checking out bytes 50,000 bytes away but let's not do that
and let's actually try printing out not individual characters like I did per the
percent C but why don't we try printing out strings and substrings if you will
so let me clear my terminal window let me change all of these percent C's to
percent s percent s percent s and then let me rewind to what we've been
doing since week one with strings which is just print them out for instance
with that first line and the only difference
at the moment is that now I took off the training wheels I got rid of cs50.h
wherein string is typedef'd to char star for you got rid of that so now on line
five I'm declaring s as being a char star which just means the address of a
character and printf is smart enough to know that the end of a string is
wherever that null character is but now that I can do pointer arithmetic
notice that I could do something like this if I want to print out s i just print out
s suppose I do s plus 1 here and s plus 2 here
again after changing percent C to percent s any intuition around what this
code will now print on the screen line by line yeah thoughts okay reasonable
conjecture maybe the memory address of H that of I that of exclamation
point but other thoughts yeah I think it's going to do the latter it's
going to print HI! in the usual way because honestly line five or rather
line six is the same as like week one stuff except we took off the training
wheel of string and we're calling it char star but I think line
seven is indeed going to print out I exclamation point and line eight is just
going to print out the exclamation point printf will still be smart enough to
know where each of those substrings or portions of the string end by the
same logic as always but let me go ahead and zoom out run make addresses
enter compiles okay ./addresses and now indeed this is all a string is it's a
sequence of characters identified by its first byte if you then start poking
around and tell printf to print what's at the
next byte or the next next byte it's going to do its same thing printing out that
character and everything after it up until that null character so again even
though there's like a lot going on we've introduced these two new operators
like there's nothing that's happening today that hasn't been happening for
weeks but hopefully through this week's lecture this week's
problem set and Beyond you'll start to realize that now you just have more
tools via
which to harness those lower level implementation details so last week too
recall one other implementation detail I claimed that you could not compare
two strings quite as easily as you could compare two integers for instance
and I told you to use a different function instead that you probably used one
or more times with the past problem set how are you supposed to compare
strings apparently yeah so string compare strcmp that additional function
that we said eh you just have to use it for now
but you might have a little intuition already as to like why we have to use
strcmp and we can't just use equals equals to compare strings like any
intuition for this already why was strcmp necessary last week
perfect equals equals would compare literally the two memory addresses
instead of the actual strings character by character and unless the memory
addresses are literally the same so you compare that exact same
memory address two different strings probably are not going to be
considered equal
even if to us humans they indeed look equal so let's see this let me go ahead
and close addresses.c and actually before I do one last mention one of the
powerful things about pointer arithmetic as an aside is that C and really the
compiler is smart enough to know how many bytes to keep adding and
adding and by that I mean this right now we got lucky because a string is a
sequence of characters and by definition every character is a single byte
you can poke around and do s plus 1 to get the
next byte s plus 2 to get the third byte however if we weren't dealing
with strings suppose we were dealing with integers that were in an array
back to back to back if you wanted to get at the next integer you could still
do plus one or plus two to get at the next or the next next integer you would
not start to get into the weeds of doing plus four and then plus eight you
don't have to know or care how big the data types are in the computer C and
the compiler will figure that out for you
based on the data type in question so keep that in mind if ever doing this on
a different data type uh than chars all right so let me go ahead and open up
a file that I wrote in advance most of and let me hide my terminal window
and show you this so here is a program called compare.c whose purpose in
life is to compare two strings I'm back to using the CS50 library because at
least for now and probably a couple more weeks it is so much easier to get
input from the user using CS50's function get_int but
we'll conclude today by taking off those training wheels as well so you can
see how you can actually get user input with nothing CS50 specific so line six
and seven pretty boring week one stuff get an int called i get an int called j
and store them in two variables i and j respectively if i equals equals j print
out the same else print out that they're different let me just stipulate for time
sake I'm pretty sure this code is correct this will get two integers from the
human it will compare them and tell
to maybe be string s equals get_string asking the user for s then let's
change this second line here to be string t just to keep the variable name
short for now and t is a good choice after s for something like this
get_string prompt the human for t and then let's change our i and j here to do
the wrong thing per the intuition earlier if s equals equals t then print out the
same else print out that they're different now if I want I could take off at least
some of the training wheels I
could change this to char star I could change this to char star either is fine I
still need the CS50 library though because I'm using get_string because it's
actually hard as we'll see today to get strings manually without using a
library but I'll keep it using string just for now with the library all right make
compare again ./compare and now let me go ahead and type in for instance HI
exclamation point enter and HI exclamation point enter and they're different
although they're obviously not visually but they are
underneath the hood and you probably do have the intuition for this already
whereby what's going on underneath the hood is that we're comparing
accidentally the two memory addresses so in fact let's go there let's consider
the memory and let me zoom out now so I can just have more bytes to play
with so the squares are a little smaller than before just so we can fit more in
them and let me propose that when I declare s on what was line six a
moment ago it ends up somewhere in memory like the top
left-hand corner of my picture for discussion sake and when I execute that
same line of code and get_string is called and I type in HI exclamation
point we know from week one that get_string puts it somewhere in the
computer's memory and I'll propose that it's in like the bottom left-hand
corner of the screen here what happens after that well I know even though I
don't generally care that HI exclamation point and the null character exist at
some address like 0x123 0x124 0x125 0x126 for
discussion sake and what's in s same as before break 0x123 so that's all
that's happening again on line six which is pretty much the same as when we
were getting an S earlier but notice now with line seven when I get a second
variable called t and I call get_string again and by coincidence as the human
I type the same thing well what happens here t gets its own chunk of
memory maybe at the top right that second version of HI! gets put
somewhere else in memory you know the computer could be smart and
notice
it's the same but C doesn't generally do that for you it just plops it
somewhere else in memory and maybe it's at address 0x456 0x457 0x458 0x459
or wherever but you can perhaps see where this is going already t now of
course contains the address of that first byte and so in my code on line nine
when I compare s and t for equality suffice it to say they are not equal
because of the way the strings are laid out in the computer's memory it
indeed looks the same the same values are there but if we abstract
away further you can really see that s and t are not the same themselves
and so how did we fix this or really how did we avoid this last week without
spilling the beans and going down this rabbit hole explaining like why you
have to use strcmp well if I go back to my code here let's do it now the
right way let me go ahead and include a line of code that says strcmp
of s comma t both as inputs and then if you recall what does
strcmp return when two strings are equal there's three possible
check HI in all caps maybe hi in lowercase those are in fact different why
well strcmp which was written by some other human decades ago is
just smart enough to know that it should go to s and go to t start comparing
them left to right stopping once it hits one or both null characters and return
zero only if everything in s and t is exactly the same are there any questions
then on this here any questions on why we're using strcmp
no yeah oh in the middle yes so why is it not
the case with integers so it turns out it's not the case with integers with
floats with bools with doubles with longs like literally every other data
type works correctly strings though are special they are useful enough in
programming and have been for decades that the authors of printf and the
authors of strcmp and bunches of other functions strlen for that
matter just kind of treat strings special because they're just useful right we
humans interact using language be it English or anything else and so it's
just useful to have in the language C just sort of first-class support for
this notion of strings of human text so the short answer is just because like it
just is necessary strings are different they're implemented with this
address and the null character everything else though is just a value but a
string again is a white lie it's an address it's not a thing unto itself good
question yeah in front oh really good question so in my code here in VS Code
what if I do this instead of strcmp and instead of
if s equals equals t what if I start playing around using star s and star t
really interesting case to consider let's go back to our sort of deductive logic
here so star the asterisk operator today means go there so when I've typed in
HI once and then HI again both uppercase for instance what is at the
address s literally someone else what is at the address s yeah so not quite
at the address so not what is the address what is at the address 0x123 H
and what is at the address
0x456 H also and so here you're kind of cheating like you're comparing the first
character of both strings but not every other one now you could be really
pedantic and here again this is like a good use of code but you could do
this if that and how about this craziness so *(s + 1) equals equals *(t + 1)
and you could do this for every character manually but that's why
strcmp exists it does all of this for you and that's the
intuition so I would encourage you too anytime there's
rid of all of this comparison stuff and let's just see what's going on as you are
welcome to in your own code let's print out for instance as we might have in
week one the value of s itself and a new line comma s and then let's just print
out t just to make sure it compiles and I'm not doing anything wrong but this
is not going to be that interesting and frankly I don't need string.h anymore
because I'm not using strcmp so make addresses ./addresses there's my
oh sorry that's fun okay
not percent t percent s here too ignore that let's do this again make oh and
that's the wrong program okay let's do make compare ./compare and
let's type in HI again and HI again and now we just see the two
strings I'm not comparing but now we can kind of play around right instead
of printing out percent s which prints the string how do I print the address in
s I just need to make a slight change if I want to see not what's at s but I
want to see s the address
almost the same but this one ends in b0 this one ends in f0 so they're
indeed separated by some number of bytes not just one but a few because
these strings are indeed longer all right so once you've seen this here how
can we now maybe leverage this to solve other problems well let me propose
that we do this let me zoom out here let me close compare and let me
open up another program I wrote part of in advance called copy.c so copy.c
in theory makes a copy of a string how on line eight I'm using
the same thing as before get_string storing it in a string or char star and asking
the user for it then I'm not asking get_string again I'm just making a copy
super simply with line 10 here string t equals s now intuitively I think that's
how I would copy a variable right that's how we've copied variables every
week thus far in C but something's going to go wrong in line 12 in English
does someone want to explain what you think line 12 does don't worry about
finding any bugs or mistakes but
what does line 12 seem to be doing using toupper which is thanks to
the ctype library which I've included the header file for yeah yeah right it's
kind of like ugly syntax but this would seem to be capitalizing the first letter
of t specifically and just changing it so we have t bracket zero here because
we want to save the change and we're passing to toupper the first
character here so this is how we did uppercase in the past and now I print
out s and t respectively using percent s so this feels like it
should work I copied s and stored it in t on line 10 and then I change t and
only t on line 12 but you can perhaps if you're comfy thus far see where this
is going if I do make copy ./copy and let me type in lowercase hi exclamation
point this time just once so I'm going to hit enter and watch what we see for
the value of s and t huh the new value of s and t at the end of my program
seems to be what it seems to be the same hi is capitalized both times so
what's the intuition then for this
why did this just happen yeah in back yeah I assigned s and t the same
memory address so it did copy s into t but C takes this very literally what is s
it's an address what is t it's a copy of that address if you want to copy the
whole string like a normal human would expect you or someone has to
do a lot more work you have to go to that address copy this character this
one this one this one and copy it to a new location in memory that does not
happen automatically here for you in C it does
in memory well let's go back to the big grid this time focusing on the
copying of values and let's do this here's s as in this new program just
declared to be a char star here is where my lowercase hi maybe ended
up in the computer's memory that's probably at 0x123 0x124 0x125 whatever
something like that and that's of course what ends up in s as a value when I
declare t I do get a second variable called t just like before but when I copy
s into t what happens it's really just literally 0x12
you two functions one of which is called malloc one of which is called free and
these are used all of the time by like every piece of software you and I use on
our Macs PCs and phones whether it's written in C or some equivalent other
language malloc is for memory allocation it's a function that you can use to ask
the operating system macOS Linux Windows anything for some number of
bytes one byte 100 bytes a gigabyte of memory you can ask malloc for
however much memory you want in advance it will
return to you the address of the first byte of memory that it found free for you
unlike a string it is not null terminated and so the danger with malloc is that
it's on the honor system if you ask it for one byte or 10 bytes you the
programmer in like a variable have to remember how many bytes you
requested one or 10 or the like strings do that for you not when we're getting
now to this low level malloc is just going to give you some memory and it's up to
you to manage it free does the opposite when you're
done with some chunk of memory you can free it by passing in that same
address and just hand it back to macOS Windows or Linux and say I'm done
with this you can let me use this for something else later as an aside if
your computer has ever like frozen or hung like the whole thing maybe just
spontaneously reboots yet another reason for a bug like that might be if you
write a program with a bug that keeps mallocing mallocing mallocing
that is asking for more and more and more memory but you make a mistake
and you
never free it well eventually the computer is going to literally run out of
memory and something's going to go wrong and that's often when
computers freeze like they're just out of memory it has the memory there but
the program was trying to use too much of it endlessly so this too will be a
mistake that some of us will surely make in the coming weeks but hopefully
you'll now see the solution so let me go back to VS Code here and let me
propose that we do the following I'll hide my terminal
window for a moment and I'm going to introduce another header file up here
and I promise there's not going to be too many more of these but this one is
called stdlib.h for standard library and in this file are the declarations the
prototypes for malloc and free and a bunch of other stuff as well it lets me
now manage my own memory so let's focus now on line 11 line 11 is where I
went wrong before because conceptually I want to copy the whole string but
of course I'm only copying modestly the individual
address so how do I copy the whole darn thing well what I need to do is this
when I declare t to be the address of something in memory why don't I
set t to be the address of a free chunk of memory so let me ask the
operating system give me this many bytes tell me what the address is and
I'm going to store that in t initially just so I know where there's free space for
me so how do I do that well quite simply I call malloc and then I pass in the
number of bytes that I need now for hi exclamation
point I think I need three although wait no I really need four because of the
null character but I don't think I should be hardcoding numbers like this cuz
who knows what the human's going to type in so I can actually use strlen
of s and then plus one this will ask malloc then for however many bytes
corresponds to the number of characters the human typed in plus one for
again the null character so it's just being smart and defensive rather than
choosing a number myself but now all t is is a
pointer if you will to some random chunk of free space so there's nothing
there yet or there's you know bits there but who knows what values they are
they're certainly not identical to what the human typed in I now have to do
this so how can I copy one string into the other well let me do this instead of
capitalizing something just yet let me do this how about for int i gets zero
i is less than the length of s and then i++ so I'm going to iterate for the
whole length of the
string and in here I'm just going to do this the ith character in t should be
identical to the ith character in s so I'm just literally copying from right to left
each and every character in s and I can trust that there's enough memory in
t why cuz I asked for that many bytes plus one now there's technically a bug
here I actually should probably do this I should do plus one here or if you
prefer I should do less than or equal to the strlen but I think it's a little
clearer to do the plus one why do I for
the first time want to go just beyond the boundary of s and copy one more
byte yeah yeah I need the null character like I could technically manually add
it with some additional line of code but I might as well just copy it because
backslash zero is backslash zero so this time and probably only this time it's
reasonable
and correct to go just beyond the boundary of your string so you copy the
null terminating character so that the computer also knows where t ends
and now I think what I can do a little more
safely is this let me go down here and say t bracket zero equals toupper of
t bracket zero so same line of code as before if I actually want to
be really safe I should probably do this so if the strlen of t is greater than zero
so there's at least one byte there okay now it's safe to blindly capitalize the
first character and I think that now puts me in better shape so let me try this
now let me open up my terminal make copy ./copy I'm going to
type in hi exclamation point in all
four bytes total what is now happening well t is defined as pointing to that
because that's what malloc gives us the address of the first byte of the free
memory and now with my for loop I'm just iterating over it copying the h
then the i then the exclamation point and then for good measure the
backslash zero instead questions then on this process here a really good question
if I omitted in my code the plus one and I didn't do less than or equal
to so that I'm copying the fourth
byte odds are in this program because it's so short you wouldn't notice that
there's an actual error but what could happen is when I call printf on t if
there's no null byte there it might print h i exclamation point some random
value some random value some random value some random value until it
gets lucky and there happens to be a zero byte a null byte by chance for
instance so if you don't include the backslash zero some way that's going to
happen and I say some way I could even do this I
could technically just copy the length of the string s and at the very bottom
here I could do something like t bracket i sorry t bracket strlen of t
I could do this but this is just not necessary like I could manually add it at the
end of the string but again I'd claim that it's just simpler to borrow that is copy
the one that's already in s because it's the same thing at the end of the day
good question other questions on this copying correctly now all right
is there any room for
general calling a function inside of your condition is probably not very good
design like why is it bad for me to be calling a function like strlen in
this condition in the middle of my for loop yeah yeah you're just calling it
again and again for no reason like the length of s never changes so like why
are you wasting everyone's time by calling strlen of s again again again
again just to check this inequality whether i is less than that value so it turns
out if you haven't discovered this already
there's a slight optimization we can do here that has nothing to do
fundamentally with strings or pointers just with better design I can actually
define two variables at once I could do this let me remove this whole
condition and let me add a comma after i equals 0 set n or any variable
equal to the strlen of s plus 1 and then after the semicolon just ask the question
while i is less than n so it's almost the same but notice now my condition in
the very middle of this loop is at least comparing two static values n never
changes sorry one static value n never changes all that changes is i but I'm
not foolishly calling strlen strlen strlen again and again why well how
does strlen work similar in spirit to printf strlen given the name of a
string looks at the first character and then starts looking through the entire
string looking for the null character and we saw this in week two counting up
how many characters are there so it's just a waste of time again and
again totally if you wanted to use n
multiple times you could absolutely take it out of the for loop put it right
after s is defined and reuse n again and again absolutely but in general
consider this when designing your for loops even though modern compilers
like clang can actually fix this problem this inefficiency for you good practice
would be don't call functions unnecessarily especially if the answer is always
going to be the same all right so what else should I perhaps refine here well
how about I do one last thing and just
comment on what exactly could go wrong here well a couple of things well
actually this is just silly too like surely someone before me in the world has
had to copy a string before surely there's a function like called string copy
maybe like strcmp like strlen and indeed there is so let me propose
that we actually get rid of this whole for loop and we actually just call a
function called strcpy no o just s-t-r-c-p-y and pass in the destination which is
t first and then the source that you
want to copy into the destination and that takes the place entirely of that
whole loop so again I demonstrated the loop first just to be very pedantic
about it but that's wasting time you're wasting time writing lines of code you
don't need to strcpy is what you can use here instead and so this has now
always existed and what more can I do well as one final point it turns out
that there's actually things that can go wrong in this code even besides the
string being too short like if the human
just hits enter and there are no characters I don't want to blindly capitalize
the first character that doesn't exist that's why I added that if condition but
there's other things that can go wrong and we introduced those to you today
it turns out that functions like get_string and functions like malloc return
potentially a special value and wonderfully confusingly it's also called NULL
but with two L's all right so left hand and right hand weren't talking so well
like decades ago NUL is a backslash
zero it's a single character as it always has been for a couple of weeks now NULL
is technically a pointer it's an address but it's address zero it's like the top
left-hand corner if you will of your computer's memory that just nothing is
ever supposed to go in by convention so NULL is a synonym for zero but it's
specifically an address now why is this useful well suppose that in my code
here something goes wrong with get_string suppose you're being a little
crazy and you type in way too long of a string
it's not just hi but it's like an entire essay of text and there's not enough
memory in the computer how does get_string signal to the programmer whoa
like that's way too big of a string I can't fit it in memory well we never told
you this but all of this time it turns out that get_string will return this
special value called NULL if something goes wrong so to be really careful now
you should do something like this if s equals equals literally NULL then you
better exit the program
entirely and return like one or two or three to signify that something went
wrong don't go any further similarly with malloc it's possible if you ask for
way too much memory that could fail especially if you're asking now for
double the memory after the human typed something in so if t equals equals
NULL then you know what let's also return one or some other value to just
get out before something crashes or freezes on the human as well
honestly I tend not to do this always in class because the
code just gets so bloated and complicated but you absolutely in practice
need to start doing this otherwise you will be responsible for the freezes and
the crashes and the reboots that users in the real world might actually
encounter otherwise of course if we get to the bottom of this program now I
should probably return zero explicitly or implicitly to just signify that
everything was successful but there's one other thing I haven't done we
introduced malloc but what did I claim also existed so free I'm also being a
little
reckless now here I am not practicing what I'm preaching I'm asking the
computer for memory via get_string I'm asking the computer for more
memory via malloc and I'm never technically handing it back so really
what I should be doing at the very bottom of my program too is freeing the
memory I've asked for so henceforth it is a rule a law if you will in C
whenever you allocate memory with malloc or certain other functions as well
you the programmer must free it when you're all
free memory that comes from get_string because the CS50 library
automatically frees it for you but you anytime you use malloc henceforth as
you did or I did here you must free that by just passing in the same address
you got back questions now on malloc and free questions yeah oh really good
question so what does free do so free just lets the
computer know that you are done with that chunk of memory which means
that if you have another line of code elsewhere that same memory might
be
reused and can be used again and again and that's going to be necessary
certainly for any long running program you can't ask for memory constantly
you'll eventually run out so you need to free it in this way other languages as
an aside Python you get another motivation in a couple of weeks for it is
going to be Python and certain other languages manage all this headache for
you but in C the goal here is to really harness these capabilities ourselves all
right so it turns out like almost
everyone in the room everyone in the room myself included you're going to
screw up when it comes to anything memory related if you haven't already
segfaults are in your future but hopefully there's tools via which you can detect
these things and fix them proactively and not just use printf or debug50
or rubber duck we actually have another tool we can equip you with now
that helps you find some mistakes so let me do this let me close copy.c let me
open a program I wrote in advance called
memory.c that doesn't do anything really interesting but it's going to have
two bugs in it notice that I've included stdio.h as always I've also
included stdlib.h which is necessary now for anything related to malloc
and/or free and the like line six it's a little weird what I've done here but this
is like the manual way of asking for enough memory for an array in week two
how do we ask for memory for an array you very simply say int x bracket 3 and
that gives you an array called x of size
three but if you do it manually now using malloc what you have to do is use
syntax like this you call malloc you ask for three things times however big an
int is now we know it's four so you could literally write 12 here but this is
more generic so three times the size of an integer will give you 12
dynamically and what does malloc return the address of the first byte you get
back where do I want to put that well I want to put it in a variable now the
variable can't just be int x because that's a
number it's not an address per se if I want to store this address in a variable I
could call it x I could call it p but int star x just means that x is now the
address of a chunk of memory specifically a chunk of memory that's big
enough not for one but for three ints in total all right now I'm just sort of
naively putting our old friends 72 73 and 33 at the first second and third
locations in memory but perhaps based on week two or week four I'm clearly
screwing up here in a couple of ways
someone want to identify at least one bug what did I do wrong yeah like this
is now you know amateur stuff like I should be zero indexing not one
indexing so this has got to be 0 1 2 ultimately and other bugs that are
maybe more week four specific other bugs it's more subtle yeah I'm not
freeing the memory right so I'm not practicing what I'm preaching by freeing
this memory now suppose these are non-obvious and honestly after like an
hour or two of this like this shouldn't be obvious yet it will be over
time how could I find these bugs with software as opposed to just staring
at the thing or asking someone for help well let me propose this let me first
go ahead and run make memory to compile the program and it seems to
work fine there's no syntax errors at least ./memory notice seems to
work fine too now this program doesn't do anything interesting there's no
printf or anything like that but it didn't crash there's no segmentation fault
but that doesn't mean there aren't bugs latent in
the software and this is true sadly of all of today's software like Chrome and
Microsoft Word and other programs surely have memory related bugs that
people at Google and Microsoft haven't yet found but there are tools at least
to find the most obvious of those bugs and we're going to introduce you now
to a program called valgrind so valgrind it's a fairly fancy program but we'll
use it in very simple ways it will look at your code and find memory errors as it's
executing and
try to help you understand where they are so let me go back to VS Code here
memory seems to be fine you know I feel like okay I'm going to submit this
homework all is good no error messages that's no longer the case now you
need to poke a little more at your code to see if maybe there's still some bug
there so let me do this valgrind and then space ./memory so just like debug50
you run it on a program you already compiled valgrind I'm going to run on
a program I already compiled
let me zoom in on my terminal window so we can see more at once and
enter all right the output is crazy cryptic for no good reason there's lots of
numbers and equal signs it's a lot of clutter but there is some juicy
information here and let me start from the top down invalid write of size four
so write means to change a value read means to like access a value and this
is again esoteric like a lot of our error messages are but it looks like after
a block of size 12 alloc'd and then there's this weird hex
notation there's some mention of malloc but honestly the juicy part here is
memory.c line six that's probably my fault so let's look at line six per that
output let me shrink the terminal window look at line six okay 12 is now
germane right if you did the mental math of the size of an int times three 12
is somehow involved here but line six that's just
where the memory came from what is this let me zoom back in where is
there invalid write of size four like what's perhaps
going wrong here invalid write of size four what does that mean it's like a
very technical way of explaining the bug is actually one line later on line
seven as we already identified yeah indeed and I misspoke a moment ago
the bug actually arises here with line nine so after the allocation of memory
I'm somehow writing four bytes incorrectly and unfortunately the onus is kind
of on you to sort of think through deductively like what could that mean but
I'm clearly touching four bytes
of memory in these few lines of code that I shouldn't be and hopefully here is
where the light bulb already went off earlier oh I'm not zero indexing okay that
must mean that x bracket 3 as you note is just too far past the chunk of memory
so I'm invalidly writing to four bytes that I shouldn't be so again it's not super
obvious this is not super user-friendly but at least it does give you a clue as to
where that bug is so the fix there is going to be quite simply to change the
one to a zero the two to a one and the
three to a two that'll fix that but there's still a second error and let me look at
the cryptic output again heap summary some stuff there okay this does not
sound good down here 12 bytes in one blocks are definitely lost in loss
record one of one very arcane output too but clearly related to line six again
our allocation of memory now here too it's not obvious what the solution is
but memory is lost aka this is a memory leak and now the deduction's kind of
up to you what is leaked oh wait I didn't
call free and so the second solution here is probably to free x at the very end
of the program and if you really want to be careful you should probably check
like I proposed earlier if x is NULL just get out now while you still can and
don't even touch those other lines of code but if you get to the bottom return
zero but really the takeaways are I fixed my zero indexing of the array to avoid
the invalid write of size four and now I'm freeing the memory that I asked for so
there should be no leak
right let's try this again make memory ./memory no visible errors yet but let
me now increase my terminal window again do valgrind of ./memory crossing
my fingers and now all heap blocks were freed no leaks are possible I don't
see any invalid writes there's still a crazy amount of output but none of it is
erroneous it's not bad now I fixed my memory bugs and so now my TA my
TF they're not going to find them either because at least valgrind has proactively
done that for me questions then on
valgrind generally it's those two types of errors you might trip over there's
not too much else in the way of arcane output questions then on this no all
right well what else might be going on so someone alluded to this earlier
what happens when you for instance forget the null terminator or you
generally start poking around memory that you yourself didn't ask for or
looking at values you didn't put there well let me go ahead and open this
code of garbage.c in honor of Oscar the
Bute yeah I didn't initialize any values for that array back in week two we
didn't do 1024 we did like three and I typed in like three test scores or
something like that here I'm allocating memory even more than that just
because I really want to be dramatic with what I'm demonstrating but I'm not
initializing those values to anything and so here it turns out in C generally if
you do not initialize a variable or you do not initialize an array with explicit
values there are
just random positive and negative numbers interspersed among the zeros
well that's because I'm literally poking around a random 1024 bytes of the
computer's memory who knows what's there so the lesson here is that
garbage values are indeed this like term of art it means that a variable that
you might have defined that you might have declared if you don't give it
an explicit value who knows what's going to be there and the lesson here is
just don't do that always initialize
variables to something either yourself or prompting the human for it
questions about garbage values you'll see them sometimes if you print
things you shouldn't or touch arrays beyond their boundaries all right so
maybe to make this a little visual too it turns out that a lot of things can go
wrong unfortunately with pointers and we've seen some of them and here's
another program that's a little contrived it's very simple and it just is about
manipulating values it doesn't do anything useful per
Malo finds for me in X then I go to x and put the number 42 there all right
why it's the sort of meaning of life the universe and everything here but star
X again just means go to that address and put a value there so why I don't
know but it's just uh correct at this point but what about this line here star y
equals 13 unlucky in this case what's bad about this line here star y It's a
combination now of today's Primitives and that point here yeah yeah we
didn't ask the computer to
allocate any space so why was not initialized with an equal sign at any point
to anything and so what is inside y so to speak like a garbage value maybe
it's zero which isn't bad because at least it's nice and simple but maybe it's
some crazy large uh positive number some crazy large negative number
either way odds are if I go to this address or that address randomly with star
y bad things are going to happen and so let me go ahead and propose well
let's not do that let's let's actually do this
instead assign y equal to X and we've done that before and then I can go to
Y now and change what was a 42 to a 13 again why this is just for
educational sake but for now this does not crash because I only dereference y
with star y after actually giving it a value albeit a duplicate value similar to
our copy example earlier so our friends at Stanford have put together a
wonderful visual it's about 2 minutes long allow me to dramatically dim the
lights if we could and play with what happens with
memory when you do bad things like [Music] this hey Binky wake up it's time
for pointer fun what's that learn about pointers Oh goodie well to get started
I guess we're going to need a couple pointers okay this code allocates two
pointers which can point to integers okay well I see the two pointers but they
don't seem to be pointing to anything that's right initially pointers don't point
to anything the things they point to are called pointees and setting them up is
a separate step oh right
right I knew that the pointees are separate er so how do you allocate a pointee
okay well this code allocates a new integer pointee and this part sets x to point
to it hey that looks better so make it do something okay I'll dereference the
pointer x to store the number 42 into its pointee for this trick I'll need my
magic wand of dereferencing your magic wand of dereferencing uh that
that's great this is what the code looks like I'll just set up the number and
hey look there it goes so doing a
dereference on x follows the arrow to access its pointee in this case to store 42
in there hey try using it to store the number 13 through the other pointer
y okay I'll just go over here to y and get the number 13 set up and then
take the wand of dereferencing and just oh hey that didn't work say uh Binky
I don't think dereferencing y is a good idea cuz uh you know setting up the
pointee is a separate step and uh I don't think we ever did it good point yeah
we we allocated the pointer y but
we never set it to point to a pointee very observant hey you're looking good
there Binky can you fix it so that y points to the same pointee as x sure I'll use
my magic wand of pointer assignment is that going to be a problem like
before no this doesn't touch the pointees it just changes one pointer to point
to the same thing as another oh I see now y points to the same place as x so
so wait now y is fixed it has a pointee so you can try the wand of
dereferencing again to send the 13 over uh okay here it
goes hey look at that now dereferencing works on y and because the
pointers are sharing that one pointee they both see the 13 yeah sharing uh
whatever so are we going to switch places now oh look we're out of time but
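The steps Binky acts out in the video can be written as a few lines of C. This is a minimal sketch rather than the code shown on screen: the function name `pointer_fun` and its out-parameter are assumptions made so the result can be checked, and the `NULL` check on `malloc` is added for safety.

```c
#include <assert.h>
#include <stdlib.h>

// Replays the steps from the video. Returns 0 on success, 1 if malloc fails.
// On success, *seen_through_x receives the value read back through x.
int pointer_fun(int *seen_through_x)
{
    assert(seen_through_x != NULL);

    int *x; // two pointers, no pointees yet
    int *y;

    x = malloc(sizeof(int)); // allocate one integer pointee, point x at it
    if (x == NULL)
    {
        return 1;
    }

    *x = 42; // dereference x to store 42 in its pointee

    // *y = 13; would be the bug here: y does not have a pointee yet

    y = x;   // pointer assignment: y now shares x's pointee
    *y = 13; // safe now, and the change is visible through x as well

    *seen_through_x = *x;
    free(x); // one pointee, so one call to free
    return 0;
}
```

Because `x` and `y` share a single pointee, the value read back through `x` is the 13 that was stored through `y`.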
thanks to Professor Nick Parlante of Stanford for spending a huge amount
of time doing stop motion animation for that but hopefully now you have a
sense of what too can go wrong when you misuse memory in this way but at
the end of the day we really only have these four new building
blocks today like the star operator the ampersand operator malloc and free
and really with that and the underlying understanding of what your computer
is doing underneath the hood we have this way now to really manipulate
things in memory For Better or For Worse and eventually we'll see how we
can build things but we can also now use today's Primitives to better explain
some things that we've been asking you to take for granted over the past
several weeks so for instance let me propose that we uh
do one volunteer up here if we could could we get one volunteer who's you
want to come straight up yep right in the middle come on you'll have to take
a left right there all right so we have two empty glasses here and two colors
of liquid and we have let me give you the mic if you'd like to say hello to the
group hello um I'm Moen uh I'm in the and first year all right welcome well
well welcome here I'm going to go ahead and fill these two glasses with this
colored liquid um purple here on my
right let's fill up a glass here yeah don't drink uh and now we'll put some
orange in here and what we'd like you to do for the audience if you don't
mind is swap the two values we've got a purple value and orange value and
I'd like the purple liquid in this glass and the orange liquid in that glass
please can I have another glass oh okay good intuition but for the
microphone uh can I have another glass so you can and just in fact I brought
one here for you why are you asking for this though cuz
if I just pour this into this and it'll get mixed up right so obviously we need
like a temporary variable if you will so here is your temporary variable and
you want yeah there's yeah all right so pouring the value of the orange glass
into this temporary variable if you will all right and now pouring the value of
the purple glass into the former orange glass and now the temporary value
goes back into the original purple glass and now I think we give you a round
of applause for having done that very well
okay thank you all right so so it should go without saying that in the real
world like that's how you do this and in fact in code that's pretty much how
you have to do this although ask us some time for a super fancy way of
doing it without a temporary variable turns out that is possible using bits but
for now let's suppose that indeed this demonstrates what is the reality in
code if you want to swap two values you need to have have something like a
temporary variable so for instance on the screen here is a uh
the beginning of a function called swap whose purpose in life is to as you just
did swap two values call it a and b so orange and purple respectively are now
just a and b and integers to keep things simple well here is the
corresponding code if I may to what you just enacted as a human you
declared a temporary variable called temp in this case which was like me
handing you the empty glass and you stored the orange liquid in it aka a you
then changed the value of the formerly orange glass to be equal to
the purple by pouring one into the other and then you did the opposite there
now at the end of this you still have a temporary variable that's now empty
so it's temporary in literally that sense like you just don't need it anymore
but it was necessary along the way so I dare say this code is correct logically
like this will swap two values A and B thanks to the use of that temporary
variable unfortunately though if I actually do this in practice let me go over
to VS Code here and open a program I wrote in
advance called swap.c which does this as follows in here notice I have my
prototype for a swap function at the very top and let me scroll down to the
very bottom there is that exact same code so I'm uh the same code for
swapping two values A and B which I'm claiming for now is correct now if I go
back up here what is main going to do for us main is really just meant to be a
demonstration of the correctness of your algorithm so here I declare on line
seven and eight two variables X and Y
being 1 and two arbitrarily respectively I then on line 10 just print out what
the value of x is and Y is just so I can see it on the screen I then call the swap
function on line 11 and then I literally print the exact same thing again I print
X and Y hopefully it'll obviously be the opposite so I think logically swap is
indeed correct let me do make swap and then do ./swap and I should see x is
1 y is 2 and then hopefully x is 2 y is 1 enter but I don't and it did work in
the sense that
the code compiled the code ran so it's not like some bug in that sense but
because I don't quite understand what's going on underneath the hood at
least as of right now or prior weeks this code here is indeed buggy in some
way but does anyone have an intuition perhaps based on today's discussion
as to why this code while logically correct and clearly working in reality
apparently does not work in C any intuition yeah perfect and to summarize
here's that term of art I promised when you call a function and pass in two
values they are passed by value as copies literally one literally two and not by
another term of art by reference AKA by
their addresses swap has no capability in C to go to those locations swap the
actual locations just like we did successfully in reality but I think we really
have the syntax already for solving this if we consider that really this is just
an issue of scope and we've talked a bit about scope in the past whereby
scope refers to the context in which a variable lives and generally I've
claimed that a variable exists
between the most recent curly braces and that's pretty much true for the
swap function because a and b i now claim again exist only in the context of
these curly braces they have no effect on Main up top which has different
variables X and Y but we can consider now what's really going on underneath
the hood and here's that same picture of memory as we've seen in the past
if we zoom in and see on these little black chips this is a bunch of bytes of
memory if I create a grid out of it just to kind of highlight
that we can address each of these bytes throw away the plastic circuit board
and focus only on those bytes what's going on underneath the hood when
functions are called in C which you've been doing for weeks now well this
rectangle of memory if we kind of abstract it away further is generally broken
up into different regions or segments like I called them earlier and different
things get put in different parts of the computer's memory and without
getting too into the weeds when you double click
a program on your Mac or PC or when you do ./something on Linux you are
loading your machine code into the computer's memory from the computer's
hard drive so all the zeros and ones that compose Microsoft Word or Chrome
or whatever are loaded into the computer's memory or RAM and by
convention it's put up top in the so-called machine code area and that's how
the CPU has access to them quickly below that are what are going
to be our globals so global variables which we haven't used very much in C
but
you can declare them outside of main at the very top of your files if you have
globals they end up up there as well just FYI and then there's this big chunk
of memory that we saw valgrind mention indirectly earlier called the heap
and it's kind of like heap literally like it's a heap of memory that you can use
as you see fit and the heap is where malloc grabs memory from so initially
there's nothing in the heap it's just a big chunk of free space anytime you
call malloc malloc kind of carves out from the
heap area more and more bytes and malloc keeps track of essentially which
bytes have already been allocated so initially it looks empty but different
bytes squares if you will keep getting requested again and again as a
program runs thanks to functions like malloc and it grows if you will
conceptually down so the more and more memory you request from malloc it
starts up here but then the next chunk you get is down here conceptually the
next chunk is down here down here so it kind of fills the available space in
the computer's
overall memory but there's this other chunk of memory called the stack and
just like a stack of trays in like Annenberg or a cafeteria kind of grows upward
so does the stack of memory and it turns out the stack is where functions
have their variables and arguments stored temporarily so whenever you call
a function and it has variables inside of it or it has arguments there too this
is the chunk of memory in the computer's overall block of memory that is
used for functions but anytime you
call malloc it's memory up here right at the end of the day they just had to
pick a direction top bottom and technically it's an artist's rendition you could
rotate this thing around to any orientation you want but you're just using a finite
amount of memory in this conventional way malloc starts here functions start
here now you can kind of see where like bad things can happen and indeed
one of the other reasons programs computers can crash is if you ask for way
too much memory from the heap by calling malloc
many many many times or if you call way too many functions or accidentally
per last week you recurse infinitely many times you might have a
segmentation fault and that's because you're using too much stack memory
so this is bound to be a problem eventually and the onus is on the
programmer to just minimize the probability of doing that and really avoid
the possibility of doing that by just checking return values like checking if
malloc or get_string return NULL because you can proactively with
conditionals make sure that these two things do not Collide by just making
sure that you get back non-null values so let's consider the stack in the
context of Swap and what's really happening here and Carter if you wouldn't
mind helping me animate the screen here when I call the main function of
any program it is allocated a slice of memory called a frame at the bottom of
this stack so if Carter you want to go ahead and uh Advance here here's like
the First Slice of memory that will always be used by main whether
memory is just freed up automatically you don't call free you don't undo
malloc this just all happens automatically and has been since week one now
technically it's still there even though we've removed it from the picture and
there's your first hint of garbage values right like there's still zeros and ones
there and they're left in the original the previous configuration and so the
reason you get random values in the memory is because even though we
haven't drawn swap
here there was stuff there a moment ago it's going to be there the next time
you use that same memory now let's go ahead and step through this a little
more methodically main has two variables called X and Y one and two so
let's advance and represent X is one y is two taking up these two chunks of
memory when we call swap now swap gets a new slice of memory that
then gives us three variables A and B technically the arguments and temp so
what happens well because functions automatically pass in
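The version of swap that compiles and runs but leaves x and y untouched can be sketched like this. The names `swap_by_value` and `demo_swap_by_value` are assumptions (the lecture's function is simply called swap), and the `printf` is added only to show that the copies really do get swapped inside the function's own frame.

```c
#include <assert.h>
#include <stdio.h>

// Receives copies of the caller's values. a, b and temp all live in this
// function's stack frame and are gone once it returns.
void swap_by_value(int a, int b)
{
    int temp = a;
    a = b;
    b = temp;
    printf("inside swap: a is %i, b is %i\n", a, b);
}

// Plays the role of main in swap.c.
void demo_swap_by_value(void)
{
    int x = 1;
    int y = 2;
    swap_by_value(x, y);

    // The copies were swapped, but x and y in this frame never changed.
    assert(x == 1 && y == 2);
}
```

The function body is logically correct. The only problem is that it operates on its own copies rather than on the caller's variables.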
just swap the things by value because you're only changing it in the scope of
the swap function but I think if we change it to this and add some
annoying syntax we can solve the problem just like you can declare variables
as storing addresses you can declare arguments to functions AKA
parameters as taking addresses this new version of swap means that a shall
be the address of an integer B shall be the address of an integer and now it
gets a little cryptic here temp is the same because
it's just an integer like it was in week one nothing special about temp but if
you want to get the value at a you do star a and that goes to the address
grabs the number one presumably if you want to change the value of a you
go to that address you follow the treasure map to the other mailbox and you
set it equal to whatever is at the value of B you go to B as well last line you
go to B now and change it to be whatever the temporary variable was which
happened to be the same as a so that's where the
final value gets swapped but here there's a lot more like crisscrossing
metaphorically across the stage where you're going to all of these different
addresses in the swap function to make these changes so if we advance now
to the pictorial version of this here's the same story as before with main and
X and Y are 1 and two respectively when swap gets called now notice and I'll
do it with arrows here a is effectively pointing to X B is effectively pointing to
Y if we really get into the weeds
these are actually like addresses but who cares about the specifics it's really
just the concept here so now what happens in temp gets star a star a means
start at a and go there follow the arrow if you will sort of Chutes and Ladders
style and then that's one so we put one in temp all right star a equals star
B so let's do it from right to left star b means Follow the arrow it's two and
then what do you do Follow the arrow it's now two because you copy one to
the other from right to left and then lastly star
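The pointer-based swap walked through with the arrows can be sketched as follows. The three assignments match the steps just described. The `assert` on `NULL` is an added safeguard and not part of the lecture's code.

```c
#include <assert.h>
#include <stddef.h>

// a and b are addresses of integers, so *a and *b reach back into the
// caller's frame instead of working on copies.
void swap(int *a, int *b)
{
    assert(a != NULL && b != NULL);

    int temp = *a; // go to a, remember what is there
    *a = *b;       // go to b, copy its value to where a points
    *b = temp;     // put the remembered value where b points
}
```

A caller hands over addresses rather than values, for example `swap(&x, &y);`, after which x and y hold each other's original values.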
me go back to my swap code here and let me change the function ever so
slightly in VS Code so let me scroll down leaving main the same and let me
change swap's parameters to take in addresses let me go to a here let me go
to a here let me go to b here and let me go to b here as well but nothing else
changes this change here in particular is enough of a clue to C that means
when you call swap and pass in two values I'm expecting addresses now not
integers but now that I've made this change I do need to go up
to main and make one change does anyone have the intuition for what now
needs to change in main so that I pass in x and y by reference that is by address
rather than by value or copy oh yeah in back so close so on
the swap line it's not star that I want in front of the x and the y it's instead
what's the other one it's the ampersand why because if I want to enable
swap to go somewhere just like Carter and I played this game with the
mailboxes I need to inform swap of the
address of X and the address of Y and again per the beginning of today's
class ampersand is the syntax via which we do that so I add an ampersand here
to get the address of X Ampersand here to get the address of Y and now this
code lines up with the picture that Carter just helped us walk through and so
when I run make swap here I have a mistake oh what did I do wrong not
intentional but I guess worth pointing out I screwed up here it doesn't like
Ampersand X because of something on line three
which is way early in the code what did I screw up yeah in the middle yeah
so this is why you should not copy paste even though it's necessary for
things like function prototypes if I change swap at the bottom I need to
change its prototype so let me add the star there add the star there or just
recopy paste it at the top of the file now let me do make swap again let me
now do ./swap and I should now see x is 1 y is 2 and hopefully x is 2 y is
1 which I now do so the logic is the same
the algorithm is the same all the week zero stuff is the same except now in week
four you just have a bit more expressiveness via which you can tell the
computer exactly what you want to manipulate and how any questions then
on this technique here no all right well when we fix this there's still going to
be problems and just so you've seen some terms of art here this is bad
whenever you have two arrows pointing at one another certainly if you might
use and reuse more and more memory and it turns out there's some
terms of art that might suddenly now make sense especially if you've
programmed before bad things Can Happen by this design but there's really
only this kind of design because it's a finite amount of memory so at some
point bad things are going to happen no matter what if a computer runs out
of memory so it's not that this was a poor decision it's just sort of a
necessary one given finite amounts of memory in a computer but a heap
overflow so to speak is when you actually overflow the Heap and touch
memory that you shouldn't up there stack Overflow is when you somehow
overflow the stack and touch memory that you shouldn't down there so with
that said these are really just problems that can happen and they're specific
incarnations of what are generally called buffer overflows a buffer like in the
YouTube sense is just like a chunk of memory that in the case of YouTube
stores like the next few seconds or minutes of video but generally speaking a
buffer is just a chunk of memory that the computer is
using for some purpose be it the stack be it the Heap be it an array in the
computer and so buffer overflows are what happens when you just have
logical bugs in your code but with these Primitives now in mind we wanted to
conclude with a final revelation and that's how some functions like these
here work the other thing in the cs50 library besides the typedef for quote
unquote string is of course all of these functions and we give you these
functions because honestly in C it is hard it's annoying it's painful it's
difficult to get user input correctly it's very easy when you don't know how
how much the human's going to type to write buggy code when it
comes to it and indeed it's really hard to store it correctly without
accidentally having some kind of buffer overflow so for instance let me show
you a program here I'm going to go ahead and write this one from scratch so
let me go ahead and open a file called get.c where I'm going to go ahead
and mimic the idea of getting integers manually without the cs50
library so I'm going to include stdio.h only I'm going to define main as
not taking any command line arguments and then I'm going to do something
like this give me a variable x with no value yet and normally I would do
something like get int but let me take that away no more training wheels for
get int either so let me just Define the int X let me then just print out
something like uh a prompt and I'll just do x colon just to make it obvious to the
human what we're waiting for and now I'm going to use a
built-in C function to get user input I'm going to call a function called scanf
which sort of scans the user's keyboard for input I'm going to scan it for an
integer so just like printf I'm going to use percent i because I expect an int
and then I want to tell scanf where to put the human's integer from the
keyboard it is not correct though to say x because if I say x I run into the
same swap problem scanf no function can change the value of X unless I
pass it not by value but
by reference so we're back to our Ampersand friend and now it has like a a
treasure map to the actual location of X and can therefore change it and so
now at the very end of this program let me do something simple like let's just
go ahead and print out with printf uh the value of x using percent i as
always plugging in X not Ampersand X this is now week one stuff I want to
print the actual integer value of x so the only change here is that instead of
using get_int I'm now using this new function that
as of today exists called scanf so let me go ahead and run make get to
create this program ./get and let's go ahead and type in a value for x 50
enter and it just works so it turns out get_int is pretty simple to implement
however notice what does not work if I type in cat for instance cat gets
converted to zero and meanwhile get_int recall will reprompt the user if the
human does not type an actual integer you get automatically reprompted so that's
one of the features we added for cs50 to get_int just to make
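Reading an integer by address, as get.c does, can be sketched as below. The helper name `read_int` and the `FILE *` parameter are assumptions, there so the function can be exercised without a keyboard; passing `stdin` makes it behave like the lecture's `scanf("%i", &x)`. The sketch also checks the return value, which the lecture's version does not: when the input is something like cat, the scanf family stores nothing and returns 0, so the variable keeps whatever it held before.

```c
#include <assert.h>
#include <stdio.h>

// Reads one integer from `in` into *x.
// Returns 1 on success, 0 if the next input is not an integer or input ended.
int read_int(FILE *in, int *x)
{
    assert(in != NULL && x != NULL);

    // x is already an address here, so it is passed along as is;
    // the caller is the one who writes &variable.
    return fscanf(in, "%i", x) == 1;
}
```

A caller might write `int x = 0; if (!read_int(stdin, &x)) { /* prompt again */ }`, which is roughly the reprompting behavior that get_int provides.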
although you know what there's no cs50 library so we do char star s today
instead and that gives me not a string per se but a pointer that will point
presumably to a string ideally I would use get_string but again we've
taken that training wheel away so now that I have a pointer s suppose I
prompt the human for a value for s just like before let me use scanf now
and tell scanf that I expect to read a string percent s from the keyboard
uh and store it in s now this is subtle I don't technically need an ampersand
here even though I did for an int and I would for a float and a double and a
long and a bool and a char why do I not need an ampersand in this story to
pass by reference because s is already an address again strings are just
special strings now are always addresses so you don't need to additionally
add an ampersand here that's the only subtle difference here but now if I go
ahead and print out at the very end what the value of s is using percent s as
before this program looks like it's almost the
same as the int version but let's do make get and okay so this is not good all
right so it doesn't like an uninitialized value so let me make it happy I said
earlier to always initialize my variables so let's initialize it to NULL so that at
least something is there that's your good default value nowadays now if I do
./get now we're good and let me type in something like cat okay cat is not x
well let me try another word maybe it's just cat is wrong dog okay let me try
David it just
not actually request of the computer like actual memory to store the c-a-t the
d-o-g the d-a-v-i-d right nowhere have I asked the computer for some
amount of memory and so technically it might be reading it into some
garbage location and that's really the problem here s is initialized to NULL now
and so in fact it is printing zero as null but I'm not seeing any of the other
letters because there was nowhere to put them c-a-t d-o-g d-a-v-i-d because I didn't
ask for three bytes four bytes five bytes 100 bytes there's
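One way to actually ask for those bytes is a small array, with the read capped so it cannot run past the end. This is a sketch under stated assumptions: the helper name `read_short_word` and the `FILE *` parameter are hypothetical, and the width in `"%3s"` is an addition that the lecture's plain `"%s"` does not have.

```c
#include <assert.h>
#include <stdio.h>

// Reads one word of at most 3 characters into buf, which must have room
// for at least 4 bytes (3 letters plus the terminating '\0').
// Returns 1 if a word was read, 0 otherwise.
int read_short_word(FILE *in, char buf[4])
{
    assert(in != NULL && buf != NULL);

    // buf is already an address, so no ampersand is needed.
    // The width 3 caps how many characters are stored before the '\0'.
    return fscanf(in, "%3s", buf) == 1;
}
```

With a 4-byte buffer, cat fits exactly, while a longer word such as David is cut off after three letters instead of spilling into memory the program never asked for.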
four enough for one two three letters plus a null character here's where to
someone's question earlier it turns out that in some contexts you can treat
arrays as though they are pointers themselves C will sort of do the
conversion for you but for now just assume that s is just an array of size four
and if you pass it in to scanf that's like a treasure map that leads to those
four bytes so scanf can now successfully fill it with c a t backslash zero but let's try
this again let's type in
David and here okay we got lucky but I technically touched memory that I
should not and in fact if I typed in a long enough string and I don't think I
could do it very easily like without typing this thousands or hundreds of
times still okay but you'll notice that it's forgotten the rest of it now right so
somewhere we went beyond the boundary of the array and we just don't
have enough storage space for that entire thing so what do you do in your
program if you don't know how long the person's name or
the animal name is going to be what do you do 40 400 4,000 40,000
like at some point you have to draw a line in the sand and that's why like
getting user input is so annoying in a language like C and that's why get
string exists what we do if you're curious is we look at the user's input and
we take baby steps we look at it one character at a time and every time we
see another character we actually call malloc again and say no I need more
than one byte I need two oh wait they typed in three
letters I need three instead of two oh I need four instead of three and we have
this crazy loop essentially that keeps asking for more and more memory but
by taking baby steps and honestly if you all had to do that in week one my
God like we couldn't even write hello world anymore and so that's why these
training wheels exist at least early on and that's why in higher level
languages like in uh python you don't have to do this at all it just works as
you'd expect so what more can we do well
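The baby-steps idea, asking for one more byte each time another character arrives, might look roughly like this. It is a hypothetical sketch, not the CS50 library's actual get_string: the name `read_line` is made up, and it uses `realloc` to grow the existing block, where the lecture loosely speaks of calling malloc again.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Reads one line from `in` into heap memory, growing the buffer by one
// byte per character. Returns NULL at end of input or if memory runs out.
// The caller is responsible for calling free on the result.
char *read_line(FILE *in)
{
    assert(in != NULL);

    char *buffer = NULL;
    size_t length = 0;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Room for the characters so far, this one, and a trailing '\0'.
        char *bigger = realloc(buffer, length + 2);
        if (bigger == NULL)
        {
            free(buffer);
            return NULL;
        }
        buffer = bigger;
        buffer[length] = (char) c;
        length++;
    }

    if (length == 0 && c == EOF)
    {
        return NULL; // nothing left to read
    }

    if (buffer == NULL)
    {
        // Empty line: one byte is still needed for the terminator.
        buffer = malloc(1);
        if (buffer == NULL)
        {
            return NULL;
        }
    }

    buffer[length] = '\0';
    return buffer;
}
```

Growing one byte at a time keeps the sketch close to the description. A real implementation would more likely grow in larger steps to avoid so many allocations.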
you'll see in problem set four this coming week if I open up an example like
this phonebook.c you'll see that you can manipulate files now that you
have a vocabulary for pointers it's going to be new quickly but here we have
an example of how I have a program using some familiar libraries here but as
I claim in my comment this saves names and numbers to a CSV file all of my
examples thus far I type in some words I type in some names and some
phone numbers and they disappear because we only store
them in memory but if you want to store data in like a CSV file comma
separated values which is like a simple spreadsheet like Excel and apple
numbers and Google Sheets can open you can actually do this yourself so
just as a teaser for this week here on line n I'm using a new data type not a
cs50 thing this is a C thing called FILE but if you want to manipulate files you
need to use addresses that is pointers so here is me creating a variable
called file that's going to point to an actual file on the
hard drive on the server or your Mac or PC fopen is going to be a new function
you'll use that will open a file and it will return effectively a pointer
thereto in memory the file name I want to open is phonebook.csv and in
this example it's going to be uh append mode it will keep allowing me to add
more and more names and numbers to this file here's some old get_string
stuff because I'm not going to reinvent get_string with scanf but down here
is a slightly new function it's not printf but
fprintf and it turns out it's very easy to print things not to the screen but to a
file with fprintf and it takes an additional argument instead of starting with
the quoted string you'll have to like say what file you want to write to and
fprintf will figure out how to get the uh the bits into that file passing in
something like name comma number so if I run this somewhat quickly here
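The core of that file-writing step can be sketched as a small helper. The function name `save_contact` and its parameters are assumptions for illustration. The lecture's phonebook.c does this work inline in main and gets the name and number from get_string.

```c
#include <assert.h>
#include <stdio.h>

// Appends one "name,number" row to the CSV file at `path`.
// Returns 1 on success, 0 if the file could not be opened or written.
int save_contact(const char *path, const char *name, const char *number)
{
    assert(path != NULL && name != NULL && number != NULL);

    FILE *file = fopen(path, "a"); // "a" is append mode: keep existing rows
    if (file == NULL)
    {
        return 0;
    }

    // Same idea as printf, with the destination file as the first argument.
    int written = fprintf(file, "%s,%s\n", name, number);

    // Closing flushes the row to disk; a failed close counts as a failed save.
    if (fclose(file) != 0 || written < 0)
    {
        return 0;
    }
    return 1;
}
```

Each call adds one new line to the file, so calling it repeatedly builds up the spreadsheet row by row.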
let me do this let me pre-create a file called uh phonebook.csv and in
phonebook.csv I'm going to create a
temporary row here name comma number just so that there's something in
this file and now let me go ahead and do this and split my screen here if I
have [Link] and phonebook.c on the left let me compile make
phonebook which is the C version phonebook and now I'm prompted for a
name and a number so I'll type in David and then for instance + one 949 uh
what is it 468 275 o enter oh damn it bug uh pretend that didn't happen I
forgot to hit enter in the file so let's do this again if I run the
program Again David and plus one 94 9 468 2750 enter it's been saved now
to the file and if I close this file and I reopen code of [Link] or the
like and I've actually created an actual CSV file uh if you're smiling because I
keep repeating my phone number out loud I would encourage you to call or
text that number sometime it might very well be an Easter egg of
sorts but via these functions here do we have now the ability to write files
uh input and output and among the goals
then for this week as we'll see are to actually play with images in the spirit of
something like Instagram filters or the like and we'll introduce you for
instance to a file format called BMPs which to come full circle to the start of
class are just maps of bits but more than just single bits for white and black
but rather colorful patterns as well and we'll give you images like this of the
Weeks Bridge here across the river at Harvard and after writing
your own code in C and
understanding how the data is stored in the computer's memory you'll be able
to apply your own Instagram like filters to make things uh grayscale instead
or sepia in this case you can even flip the bits around so that the thing is a
mirror image you can blur things further or if you really are feeling more
comfortable you can even write code that finds the edges of the image and
creates works of art like these so all that and more in problem set four we will
see you next time [Music]
[Music] all right this is cs50 and this is week
five which is going to be our last week in C uh but what that means is that
we'll have okay but with this week with last week and really all of the weeks
prior have you been hopefully acquiring if slowly and with some challenge
like some fundamental building blocks that are still going to underlie
everything we continue doing in the semester even as we transition to so-
called higher level languages next week indeed we'll
introduce python a very popular language that does not have pointers does
not have memory management at this very low level but that's really just
because someone else wrote the code that will do that for you and it's going
to make our lives easier because it means when you want to solve a problem
concept up here to just get real work done or build something amazing you
don't have to really get into the same weeds as we have been deliberately
this week and now last but the goal ultimately is that you
better understand and can better harness then all that a computer can do
even in those higher level languages So today we're going to focus
particularly on data structures how you might structure your data in memory
which really amounts to building things digitally stitching together ideas and
Concepts in memory using this new building block from last week which of
course are these pointers pointers allow you to store addresses in memory
like in variables but with those simple addresses we can sort of leave
these breadcrumbs we can point from here to there and we can conceptually
stitch pieces of data together but there's going to be different ways of doing
that and today we'll focus first on what's generally known as an abstract data
type and similar to a type in C more generally it does actually have some
properties in it but the underlying implementation details of an abstract data
type are ultimately up to you that is to say an abstract data type can
represent one thing and can do something
but how you implement it allows you some discretion underneath the hood
so for instance in the world of computer science a queue is actually a technical
term this is a type of data structure that we could theoretically build in code
in C or really any other language but a queue has properties just like queues in the
real world for instance if you've ever lined up for something to get food in a
restaurant or go into a store or wait for the airport to
clear you well you've lined up in a queue a queue
being some sort of line but what's noteworthy about queues are specific
properties they are first in first out data structures either virtually or in the
human world which is to say the first person in the line should ideally be
served first at the restaurant or the first person in the line should get through
airport security first by contrast if it weren't first in first out you can imagine
how frustrated you would be if you have this sort of inherent unfairness in
fact if you've ever been in line at a store a
supermarket or the like and all of a sudden they maybe open a new line and
now the person behind you gets to kind of cut ahead and go forward that's
because they've broken the concept of the queue so it has this inherent
potential for unfairness unless you maintain this first in first out property this
would be true too for like a to-do list just for productivity sake if you're in the
habit on paper or virtually making a to-do list ideally you probably want to go
through that list top to bottom so that you actually
get the first stuff you thought of done first rather than always focusing on
your most recent thought now within the world of queues there's generally two
operations two functions if you will that any queue would have either in the real
world or in the virtual enqueue is usually the technical term to mean adding
something to a queue but specifically it means adding it to the end of the queue so
that someone isn't cutting in line for instance to go up front and then dequeue is
just the opposite when it's time for the
first person in line to be served or time for the first person in line to go
through security they are dequeued so to speak so technical concept ultimately as
it's implemented in code but it's really just a real world concept and these
are in contrast to another abstract data type that we might call a stack and a
stack much like a stack of trays in the cafeteria has sort of fundamentally
different properties you can still add and remove things from them but
consider what happens whenever they clean all the
trays in the cafeteria or the dining hall they put one of the trays down here
and then the next one on top of it and then the next one on top of it and
so forth but of course which tray do you presumably take as a user
of that physical stack the top one presumably right like you're not going to
fuss down here and try to pull one out and so that would seem to have an
opposite property lifo last in first out is what characterizes something like a
stack and that just makes sense certainly in the physical
world of stacking all of those cafeteria trays because it just makes more
sense to grab the most recently added one the last added one first and at
least in that context the trays don't necessarily care what order they're
used in but even then you could imagine that maybe there's some old dirty
nasty tray at the very bottom that like never gets used because you're
constantly replenishing the stack so there might very well be side effects of
these kinds of features um you might be familiar
with using Gmail for instance or really any email program what you're
looking at in your inbox is technically a stack at least if you've left the
defaults configured why you get a new email where does it end up not like
five pages of emails later presumably right at the top and the next one's
right at the top right at the top right at the top and so if you're like me
you're guilty of eventually losing track of some emails why because you've
pushed so many more emails onto the stack that you sort
of lose track of the things you got earliest last in first out though is
maintained the most recent email you get might very well be the first one
you reply to but that's not necessarily good for responsiveness to everyone
else out there uh similarly if you store like all of your sweaters in a stack like
this uh the uh like a pile of black ones below which is a red and then a blue
stack might be perfectly fine for keeping things organized it's sort of the
sane way to do it if you just have a shelf in
your dorm room or home but what's going to be a side effect of using a stack
to store your sweaters if they're these in this way as opposed to a queue
yeah it's harder to get the red and blue one so presumably you're going to
much more often wear for instance if you will black instead there now the
operations for adding things to a stack are similar in spirit but just different
vocabulary you push something on top of a stack um even though it's more
like in the tray world you sort of place it or rest it
but pushing means adding something to the top of the stack and popping
means removing something also from the top of the stack so it's not a
matter of enqueuing at the end and dequeuing at the beginning with a stack
everything's happening on top you're pushing onto the top and then popping
off of the top now when it comes to actual code how might we Implement
something like this well let's just focus on really how you might implement
the data structure itself and we won't Implement any functions you
might implement the notion of a stack like this we've seen typedef before it
just means define a new type of my own struct means here comes a
structure AKA a data structure of one or more variables within and let's
suppose like last time we defined already like a person data type
using a separate typedef and every person has like a name and a number or
whatever let me just stipulate that person exists already so here you might
have to implement a stack an array called people
there for those sweatshirts in my closet for instance whereas size is just
literally at this moment in time how many sweatshirts are in the stack it's
either capacity or fewer presumably in total there so what is capacity well we
could implement this in you know perhaps a familiar way I might just Define
a const somewhere else in my code of type int that just defines it to be
capacity 50 but what perhaps is going to be the downside of implementing a
stack in this way of how using an array inside here
like what's a downside now of implementing a stack using an array and this
size variable within what's a caveat here perhaps any hands yeah okay so it's
going to be harder to reach elements that aren't the last one that is the most
recently added one so there could be some sweatshirts so to speak way
down below so we've seen that before too but at some point too a limitation
of this design is is what it's going to be finite right I can maximally fit in this
example 50 sweatshirts or 50 emails or 50 cafeteria
trays which is fine but at least it's indeed finite and at least in the
computer's memory it might be nice to use more and more and more maybe
as more things are getting added to the computer so maybe I make this 500
or heck why don't I make it 5,000 or 50,000 well what's the tradeoff there if I
want to have enough room to grow seems like I should just crank up the
value of capacity endlessly but why might I not want to change the 50 to 500
or 5,000 or 50,000 what's the trade-off there
perhaps just intuitively yeah okay you don't want to touch memory that
you're not supposed to be touching and in this case it wouldn't be that
wouldn't be a risk per se unless you indeed overflow the stack but there's a
related issue in asking for that much memory what would another downside
be yeah okay exactly so if you've got a capacity of 5,000 but you're only
using one of those elements it's it's awkward to say it non-technically which
is just to say very very wasteful right that's
just bad design it's correct it will work for up to 5,000 elements but my gosh
you're wasting 4,999 extra spots and that's not going to end well especially if
you're using more data structures in memory like your Mac your PC your
phone is surely going to run out of memory if you ask for that much so it'd be
nice if there were a bit more dynamism there whether it's a stack or a queue both
of which might be implemented a little similarly in spirit but let's conclude
this abstraction by
guy he knew he went up to Lou and asked what do I do Lou saw that his
friend was really distressed well Lou began just look how you're dressed
don't you have any clothes with a different look yeah yes said Jack I sure do
come to my house and I'll show them to you so they went off to Jacks and
Jack showed Lou the box where he kept all his shirts and his pants and his
socks Lou said I see you have all your clothes in a pile why don't you wear
some others once in a while Jack said well when I remove
clothes and socks I wash them and put them away in the box then comes the
next morning and up I hop I go to the box and get my clothes off the top Lou
quickly realized the problem with Jack he kept clothes CDs and books in a
stack when he reached for something to read or to wear he chose the top
book or underwear then when he was done he would put it right back back it
would go on top of the stack I know the solution said a triumphant Lou you
need to learn to start using a queue Lou took Jack's
clothes and hung them in a closet and when he had emptied the box he just
tossed it then he said now Jack at the end of the day put your clothes on the
left when you put them away then tomorrow morning when you see
the sun shine get your clothes from the right from the end of the line don't
you see said Lou it will be so nice you'll wear everything once before you
wear something twice and with everything in queues in his closet and shelf Jack
started to feel quite sure of himself all thanks to Lou and his wonderful
queue all right so sure so that paints a picture of these two abstract data
structures but if we really were to dive underneath the hood we could
Implement them in a number of different ways but we really I think need
some building blocks via which we could solve problems like those and we'll
see today too some others as well so let's rewind back to week two where
we introduced you to your very first data structure that is an array and
an array recall was just a chunk of memory whereby
you've been running it for a while there's a lot going on in your memory is
being used and reused so for instance somewhere in memory might be
immediately adjacent to this like hello comma world backslash zero the null character
just because maybe you have another variable somewhere in there that is
storing that particular string alongside your existing array of size three and
all of these Oscar the grouches here really just represent what we called last
week garbage values like there's obviously
bits there because they don't disappear they're always going to be inside of
the computer somehow implemented but we don't really know or care what
they are they're the remnants of those bytes having been used for other
older variables previous function calls or the like but the problem clearly here
is that okay 1 two 3 is there but the H is here and unless I want to start uh
taking a bite out of my string by overwriting the H with a four like we just can't
fit it right there and yet
even though there's Oscars all over the place those are indeed garbage
values and therefore we could use that space because it's technically unused
we just don't know or care what the values are so where could I put 1 2 3 4
well my gosh like I have all this memory down here that's unused I could
certainly change those garbage values to be 1 2 3 4 but to do that I might
need to do a bit of work here right it's not just a matter of just saying boom
and it happens now with c and with code I'd
to new and then ultimately we can get rid of the old memory those three
original bytes could now look like Oscar the Grouch and just be garbage
values for all intents and purposes but now I have room for a fourth byte
wherein I can put the number four so this is nice but what's a downside of
this approach what's a downside of solving the problem in this way where
the problem at hand is just to grow the array so to speak to increase its size
to fit one or more numbers seems pretty straightforward but
yeah okay maybe it's out of order but I think that's okay because the order
just matters in that it's relative so so long as it's still contiguous back to back
to back in a different chunk of memory I think we're okay there it's not like I
changed it to 4 3 2 1 but a reasonable hunch yeah yeah like I don't really
plan ahead here like if I have to add another number like five or anything
else well I might have to jump through these hoops again maybe I get lucky
and maybe
there's space there but not if I have other variables and other things going
on that too might be used at some point other thoughts yeah slow efficiency
slow efficiency why you have to copy again yeah I mean it's just
inefficient it's sort of bad design arguably why because I had to copy all of
my original work down here and as you note if I want to add a fifth number
I'm going to have to copy it again and again and again and do things n
times again and again now maybe that's necessary
we'll soon see for sure but it feels like this is not going to end well especially
if the array isn't of size three or four but 300 400 your computer ends up
spending so much time just spinning its Wheels I mean honestly better might
be this like if this is my same array physically incarnated Now 1 two 3 it's
literally on the edge of the shelf so there's no room for the number four you
know maybe where we could take this story is well let's just find room for the
four like let's just put the
four for instance over here replacing some available garbage value some
spare byte over here but now wait a minute I've broken the definition of an
array right it's I can't have one two three and then four over here so maybe
there maybe there could be a mechanism if I put this thing on again where
when you get to the end of the existing elements maybe I just somehow
digitally point to the fourth array and maybe we can kind of stitch together
all of these different values in memory so that if
you follow the arrows so to speak we can reconstruct exactly what the order
is even without having to find or make room here or pick up all of these
numbers and move all of them over there so that's perhaps the direction in
which we'll go here so let's see how we might get to that spot as follows let
me go ahead and open up say VS Code here let me open up a program
called list.c in my terminal and let me go ahead and whip up a relatively
simple program that just demonstrates what we did back in week
into code what we just had uh pictorially on the screen and also physically
here with these numbers on the desk now let's just do something mildly
useful for this how about we do for int i gets zero i is less than three i++
let's just print each of these numbers out just to make sure they're indeed in
memory as I intended so percent i backslash n comma i and then a semicolon
and I think that's it for now so nothing interesting no problem solved just yet
just a proof of concept so that
now when I clear my terminal and run make list no apparent errors at the
terminal and so when I now do ./list I should see hopefully from left to right
1 two 3 but of course if I want to add a fourth number now there's no
mechanism for such certainly in the code that I just wrote I could go back in
here and change this to a four I could go down here and change list bracket
3 equals 4 I could just manually change the code recompile the code but of
course that doesn't give me any additional run way for the fifth
this syntactic sugar this convenience of just using square brackets and
indexing into it it's just making it easier to manipulate a chunk of memory
that's contiguous all together back to back to back so today just like last
week we can take those sort of training wheels off and maybe be a little
more deliberate in how we allocate memory let me go for instance and do
this let me delete my contents of my main function here go back into Main
and let me propose now that I declare for instance how about my
list no longer as an array but as a pointer so int star list and I'm going to go
ahead and initialize this to be how about a chunk of three integers for now so
I'm still going to hardcode it but I'm taking a step toward more dynamism for
now so let me allocate three times whatever the size is of an int but it's
usually going to be four bytes as we know so this is really going to be 3 * 4 equals
12 but it's a little more Dynamic and now what can I do down here well this is
just a chunk of memory
this it's just too cryptic it's a little too far over the line at least for most
people and so I think the syntactic sugar as I keep describing it just the more
user friendly square bracket notation does the exact same thing figures out
the pointer arithmetic and puts each of these integers in the right chunks
therein now just to be super pedantic let me make sure if something went
wrong so if list equals equals null that means that something went wrong like
my computer is out of memory which
we should check for typically so let me just immediately return one signaling
anything other than zero which means success typically just to get out of this
program because something's wrong but now let me propose that I've had a
uh well let's do this for int i gets zero i less than three i++ though a better
design would always be to use a constant but I'm just doing this for demonstration
sake let's print out each of these ints too and just make sure I didn't mess
anything up and let me open my terminal
window again let me do make list again okay huh implicitly declaring library
function malloc with type void star something something implicitly
declaring is the operative words there what did I mess up yeah yeah I forgot
the header file in which malloc is declared I remember now okay that's in
stdlib.h and it's fine to look stuff like that up if you forget so let me
include stdlib.h now let me clear my terminal run make list again okay
now we're good ./list and now what did I do
wrong oh okay not intended but teachable moment what did I do
wrong yeah yeah I'm printing the values of i instead of what is at location I in
the array so what I actually meant to do was print this out thank you you so
now let me recompile make list ./list and now okay those are the three values I
was expecting not the indices thereof now let me propose that for the sake of
discussion that I I regret having only allocated space for three integers and
maybe I really should have allocated enough space for four now
this is not how you would do this in practice because presumably if you have
a change of thought just go back in and correct the code but let me propose
that somewhere in here is a more complicated program and time passes dot
dot dot there's a lot of other interesting code there but at some point I might
want to give myself more memory so how can I do this well let me just ask
the operating system now for four new bytes of memory so that we can at
least in version one implement the idea on the board where I
just copied the three bytes into the new four bytes and then added a fourth
value so I'm going to use malloc again and I'm going to say here's a new
pointer I'll call it temp tmp for short which is quite common when you just
need it briefly I'm going to then call malloc again I'm going to say give me
four integers using size of let me again make sure so if temp equals equals
null something went wrong so let me just immediately return one and for
good measure before I return one let me
free the original list so that I don't leak memory so I'm not just immediately
returning one I'm being a good citizen and remembering well if this malloc call
did succeed and indeed I got as far as line 18 but then line 18 failed I should
free the memory that I previously malloc'd so again that's the rule of thumb if
you allocate it you should be the one to free it even before you're about to
quit now once I've done that I think I need to do what we did pictorially on
the screen where I need to copy the One the
two the three from the old array into the new so how might I do this well let
me give myself a loop so for int i gets zero i less than three i++ because the
size of the original is still the same let me go ahead and treat the new chunk
of memory called temp as an array itself and so I can absolutely use
these square brackets just like before it's just a chunk of memory I'm treating
it like an array and let me assign to that location whatever is at the original list at
location I as well so this again is
just this exercise of copying from right uh from old to new step by step the
one the two and the three but I still need one additional step if my goal at
hand now is to have ultimately a fourth value here well I'm just going to hard
code this for demonstration sake and I'm going to go to the very last
location of temp which is of size four which means the last element in temp
is temp bracket three because it's zero indexed but there's four total spaces
there and I'm just going to arbitrarily for the
sake of discussion put the number four there and that is what happened
when we proposed uh changing the final garbage value there to that four but
now I need to do what the slide did for us sort of magically on the screen I
should now do a couple of final things I should free the original list which I've
not done yet because I only called free earlier in cases of error and that was just to
be safe I can now free the list and now if I want to inform the computer that I
want list quote unquote my variable
called list to point at not the old chunk like it originally did but the new chunk
I think I can just do this list equals TMP and again that's just saying that if list
is a pointer which it was cuz look at the very top line here on line eight on
line six I declared list to be a pointer uh to a chunk of memory temp
meanwhile is a separate pointer to a chunk of memory so down here this line
33 is just a matter of my saying okay now henceforth because I've already
freed the old chunk of memory my list
variable should point not at this chunk of three ints but this chunk of four
ints which was 12 bytes in total or rather 16 now because we have four
such ints questions now on this code the point of which was quite simply to
demonstrate how we could implement in code this idea of fairly correctly
but inefficiently allocating a new array of sufficient size and then populating
it with a new fourth value questions on what we've just done here no yeah
good question at this point in the story
with line 33 do I not have two different variables pointing at the same chunk
of memory short answer yes but here's where the semantics are perhaps
compelling list is the variable that I intend to use longer term and keep
around in memory and again assume that there's even more code going on
here that we just didn't write yet so it's useful to have that that variable
temp was just kind of a necessary evil because up here it would not have
been correct to do this it would not have been correct to
say list on line 18 equals the new chunk of memory because this would have
represented a memory leak if I prematurely change list to point not
at the old chunk but the new chunk at that point no one's pointing at the old
chunk and so I've lost those three bytes valgrind for instance would yell at you
for having lost as many bytes in memory so in this case here I do leave this as
temp yes it's duplicative at this point but it's it's not a huge deal if it was just
meant semantically to be a temporary
value but down here at the risk of one more line of code I still want to to be a
good citizen free list and maybe just for good measure return zero explicitly
but notice it's not doing it twice per se on line 31 what am I freeing the
original address of list the three integer version then I change what list
points at so it's pointing at a completely different chunk of memory this one
of size four So eventually when I'm all done using this memory for this
demonstration I still need to free list
but at this point in the story line 40 it's pointing at the new chunk of memory
which I similarly need to hand back to the operating system by free yeah
when would temp equal null so let me scroll back up slightly this is being a
good citizen and a good programmer whenever it comes to using malloc
malloc can return null if the computer's out of memory so this is maybe a
much bigger program you've got other things going on in it and so you just
don't have enough memory available to be handed malloc
needs to signal to you that there's some error and so it will by convention per
the documentation per the manual pages return null so this is just me being
a good citizen otherwise here's another error that might cause your program
to crash with a segmentation fault if you get back null but you assume that
it's good memory uh going to address zero AKA null will crash your program
intentionally yeah correct if I were to change my final line 40 here to
be free temp this would also work as well and here this is
really a matter of design it's a very nitpicky thing we could probably debate it
but because at this point in the story my main variable for remembering
where the list is is called list this is sort of the more responsible way to do it
freeing the list just so that my colleagues my ta doesn't sort of wonder why
are you freeing temporary memory that you already freed like it just is a
semantic thing at this point but good Instinct it would also work correct
maybe just not good design all right so
it turns out that this gets annoying quickly as it did in the picture of doing all
of this duplication and even though technically it's necessary to copy those
values if you need a newer bigger chunk of memory there is at least a
function in C that simplifies a lot of this for us and in fact let me go ahead
and do this instead of using malloc this second time on line 18 in addition to
the first time I used it on line six I'm actually going to try and introduce
another function called realloc which as the name suggests tries
to reallocate memory for you and it works a little differently from malloc
realloc expects two arguments the first one is what is the chunk of memory
that you want to try to grow or Shrink that is reallocate to be a different size
and then you specify what size you would want and indeed in this case I
want four times size of int and that will now give me hopefully a new address
of a chunk of memory that's big enough to fit all four numbers but what's
wonderful about realloc is that it will handle all of
the copying for me so in fact I'm going to go down here I'm going to get rid
of all of this this extra for Loop and what I'm simply going to do instead is
this once I can trust after lines 18 through 23 that realloc worked and it didn't
return null because I'm out of memory I can just say okay just immediately
remember that the new list points at this new chunk of memory instead and
then I can still now do this line but I can tweak the semantics here and just
say list bracket 3 the new final int in the new list
is four I don't need to free this here I don't need to do this all I need now at
the bottom is the final for Loop to just print out these values so in short even
though that was somewhat quick using realloc just moves the entire copying
process that I implemented myself a moment ago using a for Loop it just
moves it to realloc and lets it deal with the copying for me it's no more
efficient but at least means uh I'm writing less code which is more pleasant
and hopefully the people who wrote malloc
or realloc are smarter than me and they just will introduce bugs with lower
probability too all right that was a lot any questions good question
why do you still need to make list equal temp as I did on line 24 so ideally I
would do this ideally I would just change this line 18 to be list that is to say
call re or actually even better ideally I would just say realloc this list to be of
this new size but again things can go wrong when allocating memory you
need to check a return value to see if it was
successful or not and so we need to use a return value okay so let's not
introduce temp let's just use list but here's where a memory leak might
happen in the off chance realloc fails and doesn't have enough memory for
your four bytes therefore it returns by definition null you can't overwrite the
original value of list with null to then check it why because now who
remembers where the original three bytes were if you prematurely change
the value of list you've lost you've uh leaked
memory in that sense and so that's why let me undo this change I declare a
temporary pointer for the sole purpose of making sure I can check the return
value and then once it's good now I'll update the value of list so it's sort of
doing a Switcheroo by making sure first that you have a new value to swap
with the old other question on this code yeah indeed realloc automatically
frees the previous memory for you and better yet it's even smarter than that
if you get lucky and there happens to be space
right after your existing chunk of memory so one two three garbage value
instead of one two three hello world realloc won't even bother copying things
from old to new it will just say okay I'm going to now reserve for you more
bytes than you originally asked for so it doesn't have to waste time doing that
copying and so in that sense this version is now not only still correct it's even
better designed because we're not wasting time with that for Loop we might
have to resort to it if there is
in fact hello world or something else in the way but hopefully we'll get lucky
and save those steps other questions on this this manipulation of code here
yeah in the middle what if you want to resize a two-dimensional array so
very similar in Spirit uh whereby you can use the same trickery let me wave
my hand at that for now just because I think that's going to sort of
significantly increase the complexity but very same Primitives ultimately a
two-dimensional array is essentially just a doubly long or
a chunk of memory so is to make room for things we now have that ability
memory addresses and pointers just give us the ability to like Point around at
things and move things around in memory but now that we have malloc and
even realloc you can imagine maybe rewinding and you could implement
that stack that queue using not an array per se because you have to commit to
an array size in advance but if you implement your stack or your queue using a
pointer and then malloc and realloc and maybe someone else writes
all that code for you perhaps now you can imagine that okay now the stack
can grow or shrink by using realloc accordingly you don't have to
preemptively say give me five bytes or 50 or 500 or 5,000 you can say
just give me one initially and if I need more I'll realloc realloc realloc and if you
keep popping things off the stack you can realloc in the other direction and
ask for fewer and fewer bytes and the operating system can take that
memory back as well so we now have this
building block let's see what we can do with it so we've had a few pieces of
syntax in recent weeks all of which we're going to combine now in just a
slightly more clever way so struct is this keyword in C that lets us build our
own structure in memory like a collection of two or three or more variables
like a person that we've seen before the dot operator recall we've used when
you do have a struct like a person and you want to go inside of it so like
person.name or person.number
we did this a few weeks ago now but the dot Operator just allows
you to go inside of a structure and get the individual variables within and
then the star operator unfortunately has a lot of uses now one was
multiplication like my God that was easy back in the day now it's used to
declare pointers it's also used to dereference pointers so to make one exist
and then go to that address unfortunately it's the same symbol for all of
those but it's all related but with these three symbols it turns out
you're going to get one last one today and my God it finally looks like the
concept it turns out there's a clever way anytime you want to use the dot
and the star together that is to go somewhere and go to an address and then
look inside of a structure you can actually literally use an arrow symbol on
your keyboard it's not a single keystroke it's a hyphen and then a right
angle bracket but at least it looks like an arrow and we'll see indeed in code
today the things I was drawing
pictorially on the screen last time with yellow arrows you can actually now
Express as well in code and so here we have our next data structure called a
linked list and this is one of the most useful powerful Concepts in C it's the
kind of thing that you can take for granted in Java and Python and higher
level languages but today we'll see how we or others can actually build these
things just using these same Primitives so a linked list is going to allow us to
actually do what we you know used a foam
finger for last week allow us to link together for instance these three values
maybe with that fourth value over there and then if there's a fifth you know
maybe this other foam finger points even farther away to that fifth value
the key being that you can stitch together fancier data structures without
having to like pick all of these up and find new space you just have to at
least connect the dots somehow we just need to somehow point from one to
the other and that's going to make things much more
efficient it would seem so how do we get there so here's my computer's
memory as always suppose that I'm storing the value one somewhere in
there and it's at 0x123 address whatever and I'm storing the number two
somewhere else in memory 0x456 and number three at address 0x789 this
is not an array by definition why even though it's the only three things on the
screen what makes this not an array it's not contiguous so this violates the
definition of an array but you know especially since they're
there's no more training wheels to take off here this is what we've got
underneath the hood of a computer so if all I have is memory I think the
solution to this problem of stitching together those values in a list must be to
spend a bit more memory that's literally the only resource we have right now
so let me propose that if we want to create a list conceptually out of three
values that are in random although pictorially pretty positions in memory let
me just add a little bit more
memory to the picture so in addition to storing the one I'm going to leave
myself some room a little scratch pad if you will to use some other bits
as well same for the two same for the three and you can perhaps see where
this is going based on last week if I want to somehow connect the one to the
two any instincts as to what I should write in this box here that would lead
me effectively from one to the two what could go here yeah we could store
the address of Two And so specifically what would you have
me write here perfect ideally I would just put in this box another integer one
that happens to be represented in hexadecimal but that's just a base system it's
just a human thing for us to look at I'm going to put the value 0x456 here so
let me go ahead and reveal that 0x456 goes there you can perhaps see
further where this is going well if I want to get from the two to the three I
think I need to put below the two the address of the three which gives me
0x789 now if three is the end of the list I don't want to
let it be some garbage value because that would imply that it who knows
where it's pointing I need some definitive value and just what would your
instincts be if I want to make clear with some special Sentinel value that the
buck stops here what do I put what might my options be yeah so null not N-U-L per
se but N-U-L-L which was the new keyword we introduced last week which just
represents an empty pointer if you will technically the address 0x0 so
literally the zero address and what humans did years ago they just decided
you know what nothing should ever live at address zero in memory we're just
going to reserve that one special byte to be a special signal a sentinel value
such that if you ever see a zero address in a pointer it just means it's
invalid it does not exist now we write that though a little more
pleasantly for the eyes as just N-U-L-L in all caps and that's a key word in C as
well but of course last week I claimed that who cares where things are in
memory and honestly like this quickly
gets tedious even worrying about these values so let me abstract this away
and propose that if we want to remember where all of these numbers are in
memory let's give ourselves one final piece of memory that just allows us to
start the whole process let me allocate on the left hand side here not room
for a number like 1 2 3 just room for a pointer that henceforth I think I'll
call list by convention and then store in that one additional pointer a value
that just kickstarts the whole process
this is the sort of treasure map if you will that you get handed and this has
the address of the very first actual node in memory now technically we could
just start with this but it turns out we'll see it's just a little cleaner to use a
simple single pointer that leads to the things you care about as opposed to
just starting with the first element why well if you ever want to get rid of this
element it'd be nice if you could at least still hang on to an empty sheet of
paper that indicates that the list is
empty would be one argument for that so again who cares about these
addresses now now with the wave of the hand let's just abstract it away and
there are our pointers each of those addresses in the squares at
the bottom are simply pointing to the next element in the list the jargon to
introduce here would be that now that we have these integers 1 2 3 but
they're in these like wrappers if you will these structures that have metadata
that is additional data that is related to but not the data
you actually care about this is data this is metadata this thing here
rectangularly we'll call a node N-O-D-E and it's just a term of art that means it's
like a container in code for storing some values this then is a linked list and
this then is the sort of graphical incarnation of like one node pointing to the
other in this case they happen to be by chance and by design of this
desk contiguous initially but there's no requirement that they be such the
one could be over there the two over there
the three over there I would just need more foam fingers to point at one to
the next questions on this concept of a linked list yeah in back can
you say that again a good question do traditional arrays start with a pointer
that's outside of the structure short answer no arrays are special in C and
certain other languages and the name of an array is technically a symbol if
you will that the computer the program knows maps to a specific location in
memory it's just a label a synonym for a memory address it
does not take up space so to be clear the name of an array does not take up
space like that extra Square on the left but you do need that extra Square on
the left when implementing a linked list so that you can determine if the list is
of size zero there's nothing being pointed at or size three in this case we're
sort of taking on more responsibility ourselves yeah how do you point to the
next element can you elaborate ah good question if each of these
elements is pointing to the next
how does three point to the others short answer it doesn't at least in this design
we have more technically what's called a singly linked list and as the arrows
imply it only goes in One Direction so if you somehow find in code maybe a
for Loop maybe a while loop somehow you're sort of in code over here you
have no way in code to go backwards unless we changed this to a doubly linked
list where I add another box that lets me have arrows in both directions or
maybe I just kind of make it
circular and I connect the three back to the one which you can totally do but
that tends to you know make life harder because now you have to figure out
when you're stuck in a loop in your data structure but it's doable as well but
as is it's a dead end by Design other questions on this design here all right
well how might we implement this structure in code well let me just connect
the dots to something like we've seen before here like this is how a couple of
weeks ago we introduced the notion of a person
and we claimed a person might have a name and a number last week of
course we took off some of these training wheels and a string is really
technically a Char star in both cases but really there's no conceptual
difference beyond that but let's use this same Paradigm to implement a node
as I described it in that picture so let me get rid of the name and the number
because that's related only to a person and let me rename this structure for
discussion sake to node that then invites the
question well what needs to go inside of a node well minimally an integer but
this is now where we need to think a little harder just conceptually even if
you have no idea how to type it at the keyboard what else needs to be part
of a node based on these rectangular pictures that we've drawn what more
do we need yeah yeah we need a pointer to another node so if I don't know
how to implement this yet you know it could be something like you know
pointer to another node how do I do that well you know what but
it turns out you would ideally say this if you know that the next node is itself
a node by definition well anytime we've needed a pointer we just use the
data type and a star and I'm going to arbitrarily but I think reasonably call
this second square at the bottom of those rectangles next as the name of my
attribute here but node star just connotes that the next variable is going to
be not a node per se but the address of a node and that's exactly what we
did you had me put 0x456 0x789 in that box
which is the address of another node so the way we would Express this in
code would be node star next but we could call the variable anything we
want now this is a bit of a white lie but we'll fix this right now this code won't
actually compile C takes you pretty literally recall and if you use some term
at the top of your file that you don't Define until later in your file you're going
to see some error message right we've seen this when I've messed up and
forgot to include the function
here and so we have this sort of Catch 22 like how can a structure be self-
referential that is point to another version of itself if the word doesn't yet
exist so the solution to this in C which we didn't need for a person because
there was no notion of listing connecting as a list we need one more keyword
here that we didn't need for a person and we reuse that keyword here so
kind of an annoying detail but if we preemptively call this whole thing struct
node you can now refer to the
thing on the inside as a struct node star but then you can shorten the name
of the whole thing from struct node to just node sort of an annoying
sequence of steps but in short anytime you're building a node a linked list in
memory this is just the Paradigm you use typedef struct the name of the
thing you want to Define like node you use that name on the inside if you
want to point from one to another and then you can shorten it down here to
just be called node questions then on this code here questions on what we
just did well
if I rewind just a moment to that final picture what would be the upside to be
clear of having jumped through these hoops and added this complexity if you
will what problem did we just solve by linking together these three values to
be clear yeah making lists that are not contiguous if you will
so making lists that are not contiguous in memory the upside of which is that
if I want to add the number four to this list it looks like I could choose from
any chunks of available memory on the
screen I just need to sort of point from the end of the current list to wherever
that other one is in memory what I don't need to do to be clear is copy the
One the two or the three everything can just stay put which means TimeWise
I can do this much more quickly it would seem without copying things again
and again and even without using realloc to let it do all of the copying
potentially for me all right but as we'll start seeing even more in the coming
weeks every time we benefit and solve some problem we pay a price
this weird shape on the board where the bottom square is even bigger than
the top square but technically we're using even more than twice as much
space for these pointers so there's that trade-off now thankfully decades
after C was invented memory is generally much cheaper nowadays and so
it's okay to sort of spend more of it if you need to and it depends on what
you want to optimize for but that's absolutely here a downside what's
another downside of having transitioned to a linked list
you can't index into it now I haven't even tried in code but when you have a
linked list you can no longer use square bracket notation because why well
square bracket notation just assumes the contiguousness of memory location
zero is here location one is literally one to the right location two is literally
one to the right one to the right these things even though I've drawn it from
right to left to just keep things pretty there are gaps here and this is just my
interpretation of this these gaps could
be big they could be narrow they could be down here up here they could be
anywhere so long as we're linking things together in this list the computer
can't just use bracket zero bracket 1 bracket two anymore because it can't
do simple arithmetic and jump to like the middle and now here's perhaps the
worst price we've paid if you don't have square bracket notation or really you
don't have contiguousness what algorithm did we just sacrifice for this
dynamism if you rewind even back to week
zero and we gave it a name in week three what algorithm can we not use
now if we can't assume that the memory is back to back to back to back
binary search why because binary search just like the phone book back in
the first week requires being able to arithmetically jump right to the middle
right take the total length of it divide by two and boom you're right there in
the Middle with some simple arithmetic here they might be laid out again
with these big or small gaps there's no simple math I can do to just jump
immediately to the one in the middle and in fact again if this TV were bigger
the two could technically be in memory be way down here or even way over
here the foam finger could be pointing in any number of directions
depending on where malloc put the thing there's just no way to do binary
search and so it would seem that we've paid another price indeed in terms of
performance we're now talking about linear time again so that's a
regression now that's a lot so this feels like a good time
for some muffins and fruit out in the lobby and when we come back we'll try
to solve the problem we just created so see you in 10 so we are back and
let's see if we can't now take some of these higher level concepts of like
stitching together these nodes in memory and translate it to some actual
code but we'll do it step by step first before I actually start writing it in vs
code so if Carter you wouldn't mind helping me step through with some
visuals let me propose that line by line we solve some
of the problems that we've just created for ourselves in building this thing in
memory so let's go ahead and first consider how we could build a linked list
containing the numbers indeed one then two then three and let's translate
each of those steps to code and then we'll put it all together into something
that actually runs so how about first step here will just be this to declare a
pointer called list that initially has no value at least at this point in the story
list is the name of the variable
node star just means that this is essentially going to be our little square over
here that points to the beginning of the list of course it's ideal If It ultimately
has a value because when we initially I'll call this line of code it just gives us
indeed that square over here on the left but it's got a garbage value because
there's no equal sign on the other side there so let's propose that we do one
more step here and actually initialize it to null so that if only we know that
it's not
garbage it at least has some known value and null is a good way of signifying
that at this point in the story The List is empty indeed null indicates there's
no nodes in the list so that picture would now look like this whereby let's just
draw instead of writing null everywhere I'll just leave the squares blank when
it's not a garbage value per se it's literally 0x0 or null all right so that's it for
building a linked list of size zero like we're sort of done then but we want to
now add a one and then a
two then a three so next step here might be this if I want to allocate the first
of my rectangles on our previous picture I'm going to call malloc and I'm
going to ask for enough memory to fit a whole node now technically I think
that's going to be like four bytes for the int and eight bytes for the pointer
even though I did not draw it to scale on the board so that's technically going
to be what at least 12 bytes possibly more with padding but again size of node just figures out how many bytes I
actually need dynamically that's going
to return to me the address of that chunk of memory which apparently I'm
going to store inside of a temporary variable called n for short for node but
let's see what this does pictorially so when this line of code is executed I first
get on the left that variable n it's got a garbage value by default because I
haven't executed the whole thing from right to left meanwhile on the right
hand side of the expression I've got now a node somewhere in memory it
happened to be free here this is
where malloc put it for me but it does have two garbage values initially but
because it's a node per my typedef earlier every node I proposed is going to
have a number and a next pointer so we can see those labeled here but
they've got two garbage values initially but all I care about initially is that
ultimately n is pointing at that chunk of memory so initially if we could back up
two steps we have two steps so we have initi one step forward we have this
line of code gives us this variable here which
has garbage when this side of the expression is executed that allocates the
memory and then when we copy from right to left the address of that chunk
of memory that's what gives us conceptually this arrow and the garbage
goes away because it's a valid pointer now of course there's still two garbage
values there because we haven't set this node to store a number like the
number one so let's go ahead and execute one other line of code like this
which while cryptic looking is just an application
of ideas we've seen in week four and prior star N means to start at this
variable and go there Follow the arrow is what the star or the dereference
operator does for us and then the dot operator recall when we first introduce
structs like for a person struct allows us to go to the number field or the next
field so if I do star n in parentheses to make sure order of
operations is preserved then dot number and then assign it the actual number one
that puts the one in the top of that rectangle now admittedly this syntax is
not very user friendly it's annoying to remember you have to use the
parentheses so there's another Syntax for this whenever you're doing two
things like this in code dereferencing a pointer that is going to an address
and then further using the dot notation to go inside of the structure you find
that wonderfully C gives us this syntax whereby you can just change the star
and the parentheses and the dot to just be an arrow and again it's not a
single character on your keyboard it's a hyphen and then a
right angle bracket but I kind of like the semantics of this because this code
now pretty much matches the picture n arrow leads you to the value that you
want to access or ultimately change in this way there's one step though
we've forgotten of course which is that we can't leave this garbage value
here because the garbage value is some unknown value that effectively is
pointing who knows where and we don't want to accidentally misinterpret
that garbage value as being a valid address
and risk going there so of course what value should we put here instead our
old friend null just to signify that this is indeed the end of the list and we
could do that with a line of code like this and again we'll connote as much by
just leaving that empty box blank so now we have a list of size one let's go
ahead and add the second number to it as with these lines here list equals n
allows us to remember that indeed we have this list here so if we can step
one step forward here's what the picture now
looks like and technically let's go one step further here this is now really
what's going on in memory once my list of size one exists my main variable
called list is pointing at exactly that first node at this point in the story I don't
need to know or care about the temporary variable that I called n even
though it might very well still be there but indeed this now represents that
linked list let's now indeed add the number two so with the same line of code
as before I'm going to allocate another node size
of node ideally I would be checking for null here but we're doing the juicy
Parts only on the slides let's now go ahead and depict that so what happens
with this this brings back our n pointer which might have been there the
whole time but we're doing this step by step it's a garbage value though
because we haven't yet copied from right to left malloc of course gives us a
second chunk of memory which maybe ends up there with two garbage
values by default I've omitted the labels now just because
they're still going to be number and next respectively once we copy from
right to left the garbage value indeed becomes an arrow Oscar disappears
because it's now indeed a valid pointer pointing here now the values
themselves number and next are invalid garbage values so here is where we
can now start using our new syntax like the arrow notation or the star and
the dot if you prefer and we can change the value of n Follow the arrow to
number and that becomes two similarly we can do this
again and set n arrow next so start at n Follow the arrow access the next
field and set that equal to null now we're not quite done yet because we
haven't actually linked things together so here's now where things get
interesting how do I combine these two well let me propose this let me
propose on our next line here we actually update for Now list equal to n that
is to say whatever address this is whatever it's pointing at change list to be
the same address that is point at the same thing
so if n is pointing here let's change list to point here and go ahead and do
that Carter if you could I don't like this can you go one further step this is bad
what is wrong about my sequence of operations here where I updated list to
point my new node yeah yeah we lost the pointer to the other node so I don't
even care about the ordering 21 or one two the bigger problem now as the
lack of arrows over there suggests is that I have a memory leak I have
orphaned my original node in the sense that nothing is pointing at it
anymore now absolutely I could fix this by adding some temporary variables
I could add it to the mix but at this point in the story I have not done any
such uh recollection thereof so let me back this up and let's go forward in the
slides this is where we left off a moment ago I think I need to take into
account order of operations and I'm going to keep this simple I'm not going
to care about the order of the numbers for now I'm fine with a list that is two
and then one so with that said let me go
ahead and update I think this box here to point at my original node so let's
see how we can do this in code okay n arrow next so n arrow next should
equal the current list and this is a little weird again but recall what list is list
is this pointer here that just contains the address of the original address of
the list or equivalently it contains this Arrow whatever it's pointing at so what
this means in this line of code n arrow next means start at n Follow the
arrow access the next pointer and set it
equal to whatever list equals so if list is pointing here then next should point
there as well this I think is safe because now we have redundancy now we've
got two pointers pointing at the original list and now I think we can do
another step whereby we update list to equal n same line of code before that
got us into trouble but I'm doing it second now instead of first when I execute
list equals n this now sets list equal to the same thing that n equals and so
now I have successfully inserted
my new node containing two into the list and in fact if we advance one more
we can just clear up the Clutter assume that the temporary variable is gone
from the story now we have a linked list where admittedly ordering is wrong
it's 21 instead of one two but at least it's linked correctly and I didn't orphan
or leak any memory questions on this sequence of steps here yeah in [Music]
back yeah spot on so this would fall under that category of a stack if you will
although I've not called it that by
name because I just pushed the number two onto this data structure if you
will and indeed it ended up at the beginning of the list instead of the end and
so here's where we see a distinction between an abstract data structure
which is where we began a stack is a thing like the pile of sweaters that just
has push and pop properties and LIFO access like last in first out how do
you implement something like that in memory well it would seem that you
could implement the notion of a stack here not
for sweaters but for numbers using a linked list so long as you implement
insertion AKA pushing by prepending new values to the list by prepending
again and again and if Carter you don't mind hitting the keyboard one more
time if I wanted to add the number three now you could imagine
prepending it to the list why well honestly especially as this list gets longer
and longer I kind of like the appeal of prepending these elements why
because even if this list gets crazy long and way way out here you
didn't notice me following all of the arrows earlier to do the insert if I want to
insert a fourth number a fifth number a sixth number all I have to do is like
insert it here if you will point it at the original uh start of the list then update
this pointer and done and I would say that's like two steps give or take it's
not going to be n steps as it would be if I had to append the new nodes to
the end of the list now of course we've sacrificed ordering of these numbers
they're literally in the
opposite order or whatever order they were inserted in but that might very
well be okay depending on the goal at hand all right thank you to Carter for
stepping through this what if now we wanted to translate this oh sure thank
you it's all for you none for me in this example so here we have perhaps a
way of translating this now to some actual code and this will be the last of
like the sort of intense code here just to give you a sense of how we can
translate this idea now to actual steps so this is list.c
in VS Code here let me go ahead and make a couple of changes up top
let me go ahead and how about declaring a node using typedef struct
node using our new framing as before I'm going to give every node a number
as I proposed and every node a pointer to the next element which is going to
be implemented just as before and I'm going to simplify the whole name as
just node so all of that is the exact same typedef that we proposed
earlier now let me go ahead and get rid of all of this code
which we wrote earlier and recall that this was the most recent version that
was not a linked list this was just an array that we allocated and then
reallocated so this is sort of the old way of doing things but it was inefficient
because we might have to lean on a for Loop or lean on realloc to copy
everything around we're now going to reimplement the notion of a list as an
actual linked list not as an array so my main function now might do
something like this and I'm going to really just
copy the lines of code that we just stepped through on the board so let me
give myself a uh special variable called list that's going to be initialized to
null and this is just my pointer the square on the left hand side of the screen
that represents the start of the list and if it's null it means the list is empty so
done I'm done implementing a linked list of size zero well now how do I want
to run this code well let me propose for the sake of discussion that this
version of the program will take
code here so let's do this int argc string argv but you know what we
know that strings are not actually a thing anymore so I can change my
command line argument definition to be what it really is it's really char star but
it's the exact same thing as in week two just strings are no more at least
without the training wheels on anymore like last week and now let me do this
for int i equals 1 i is less than argc i++ so what I'm doing with this
Loop is I just want to iterate over the command line
argument so I have one number at a time from The Prompt what else do I
want to do here well let's go ahead and how about do this let's get a
number so int number equals argv bracket i so a couple of notes Here one I'm
starting my for loop at one instead of zero but I'm going up to argc argc is
argument count how many words are at the prompt why am I starting at one
instead of zero though given my goal why am I starting at one yeah yeah so
the first value in argv is actually the name of the program that's
obviously not a number so I want the second value so I'm going to start
iterating over those command line arguments at I equals 1 so that's all I just
want to get the actual numbers at the prompt um unfortunately argv bracket
I is a string AKA char star that is not an int so this line of code won't work
but can anyone think back to like week two where we had a function for
converting strings to integers anyone yeah so atoi is a function that
converts ASCII to an integer assuming what you give it as an
argument looks like a number like one or two or three so let me fix this let
me actually do the conversion if I were really being careful I would error
check this make sure that there are only digits just like you might have in
problem set two but for today's purposes I'm just going to assume the honor
System that the user me is going to run the program correctly all right so
now that I have a variable containing the number from the command line
let's just allocate a node for it so let me do node star n just
like we did in the visualization and let's malloc enough space for the size of
one such node here I now need to just be super safe so if n equals equals
null like if I'm out of memory you know what let me go ahead and just
immediately return one here otherwise if that's not the case let me go ahead
and update the number field of this new node which at line 24 does exist
because it did not return null so I did not exit early with return and let me
just store whatever number that human typed in first so the
is let's change it so that this new node points to that existing list and now
step two as before was to update the actual list to point at this node so recall
in red on the screen before I screwed up originally and I only did this line by
moving the pointer too early if you will but I fixed that once Carter helped me
rewind and we got rid of the red line which indicated error and I just do n
arrow next to change the next field of this new node to point to the existing
list so I'm not orphaning
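The two prepend steps described here (point the new node at the existing list, then update the list to point at the new node) can be sketched roughly as follows. This is a minimal sketch, not the lecture's exact file: the helper name `prepend` is an assumption, and in the lecture these lines sit directly inside the for loop in main, where the number comes from `atoi(argv[i])`.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper (not named in the lecture): prepend one number.
// Returns the new first node, or NULL if malloc fails, in which case
// the existing list is left untouched.
node *prepend(node *list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->next = list; // first, point the new node at the existing list
    return n;       // then the caller stores this as the new list
}
```

A caller would check the return value for NULL before overwriting its own list pointer, so that a failed allocation does not orphan the nodes that already exist.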
anything all right at this point in the story I think my code is correct not
batting very well though today but I think my code is correct but the program
doesn't do anything interesting so it would be nice to kind of now iterate over
this linked list in memory whatever its order is and print things out well how do
we do that well it turns out if you want to iterate over a linked list the
general paradigm is to do something like this to define a temporary variable I
could call it temp but another convention that you might as
well see is called pointer PTR for short but you can call it anything you want
and you can have a temporary variable first point at the first node in the list
and then in some kind of loop like a while loop you point it at the second
node in the list and then you keep iterating you point it at the last node in
the list and then eventually you iterate too far effectively pointing at null at
which point your while loop can presumably terminate so how do I
Implement that idea of allocating a temporary pointer that just points at
each node in the list and lets me print out ultimately each of those numbers
well let's go back to my code here and let me do this let me go ahead and
declare this temporary pointer which is going to be a node star also why
because it's the address of a node the first the second the third and I'm
going to set that equal to whatever the beginning of the list is so that is
going to be equivalent to this version of the picture here where pointer is just
temporarily pointing at the first node
in the list it's not pointing at list per se it's pointing at the first node in the list
which list is also pointing at itself all right once I've done this I think I can
translate this to code that's a little new but it's conceptually familiar perhaps
now while that pointer does not equal null so while I have a valid pointer like
my finger or that arrow is pointing at an actual node in memory well let me
go ahead and print it out so let me print out with percent i backslash n whatever
is
in the current node at the number field within and again this is going to have
the effect hopefully of first printing the three and I think I just need to Now
update the pointer so that on the next iteration it's pointing at the next value
so if this is where the story is how do I update pointer to point at the second
element of the list well I want pointer to point at the two and I want pointer
to eventually point at the one well how do I do that well the way in code I
can follow these arrows is as
follows if I currently have pointer pointing at this node but I want to point it
at the next node I can borrow this pointer here so whatever this address is in
the first node aka the next field I can copy that into pointer because then
pointer will point at whatever this is pointing at by just setting one equal to
the other so once I've done that the picture will become this and how do I
translate that to code well while new syntax it is surprisingly straightforward all I
need do is say pointer after printing it
equals whatever pointer currently is but grab its next field instead and this is
a very common paradigm when iterating over a linked list and you're using
some temporary variable like pointer you can simply set pointer equal to
pointer next and what that means here is as follows if this is pointer pointing
from here down to here pointer next is Follow the arrow grab the next field so
if you set pointer equal to this thing that's the same thing as pointing this at
this same box and indeed if I advance to the next
slide even though the arrows are technically pointing at different parts of the
rectangles that's just for graphic sake pointer is now pointing at the second
node and when I do this again on my next iteration it points at this and then
this last step notice when I keep doing pointer equals pointer next this will
become eventually this value but what's this value in this link list it's null
technically so this Arrow will eventually take on this value when I set pointer
equal to pointer next and at
well if pointer finally equals null three steps later the for the while loop is
now done and so what I can do at the end of this program once I've printed
out those values well first let's go ahead and open my terminal window let's
make list okay it compiled dot slash list and let me try the same values one and two
and three that's going to again allocate one node two node three nodes by
prepending prepending prepending each of those values and it's then going
to iterate over them from
left to right and so when I hit enter now what should I see on the screen if my
code is correct what will I see feel free to just call it out 3 2 1 because I've
prepended presumably and here we go I indeed see 3 2 1 so the list is
backwards but all of the elements are there now technically if I ran valgrind
on this valgrind would not be happy because I have never freed any of my
memory so I should probably now have a second Loop here that does
something like this let me again set pointer equal to list I
don't need to redeclare it because I've already created this thing on line 31 I
just want to reset it to be the beginning of the list again and now I can do the
same kind of thing while PTR not equals null go ahead and do this well I don't
want to just do free pointer and then do pointer gets pointer next why my goal
is to free all of my memory but I think this is going to get me in trouble
pointer equals list just gives me a temporary pointer that points at the three
and then eventually the two
and then the one how well while pointer not equal null I'm freeing the pointer
so this is like saying to malloc free that node free that node free that node but
what's the problem with what I've just done here this code is technically
buggy yeah exactly after you call free on pointer you are by social
contract with C not allowed to touch pointer anymore it is invalid now it's still
going to be a number it's still going to be a pattern of bits but it's invalid and
you'll very often get a segmentation
fault if you tempt fate in that way so I can't free the pointer and then use it
literally the next line the solution here kind of like our swapping of the liquids
last time was to maybe just have a temporary variable so I can do a
Switcheroo and so a common way to solve this problem to get the order of
operations right would be to do something like this give yourself a temporary
pointer like node star next set it equal to the place you want to go next so
one step ahead now you can free pointer and then you can update pointer
to be that next value so essentially you need like two hands now you
create on line 41 another pointer that if this is pointing at the first node the
three your new pointer is pointing at the two temporarily so now you can tell
malloc via free release this memory but I haven't forgotten where I want to
go next and so I can now continue on so a common Paradigm for just
iterating over these nodes and then freeing them a couple of observations
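The order of operations described above (remember the next node, free the current one, then move on) can be sketched as below. This is an illustrative sketch: wrapping the loop in a function called `free_list` and returning a count are assumptions made here so the loop is easy to call and check; the lecture writes the loop inline in main.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Frees every node in the list.
// The returned count is only for illustration; it is not part of the lecture's code.
int free_list(node *list)
{
    int freed = 0;
    node *ptr = list;
    while (ptr != NULL)
    {
        node *next = ptr->next; // remember where to go before freeing
        free(ptr);              // ptr must not be dereferenced after this line
        ptr = next;             // safe: next was saved while ptr was still valid
        freed++;
    }
    return freed;
}
```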
strictly speaking I could have Consolidated this I don't
need two Loops to print the nodes and then free the nodes I could do that all
at once but let's assume that there's other stuff of interest in my program
and I don't want to just immediately free it there's one other bug that I
should probably address here there is still a potential memory leak up here
and this one is super subtle but valgrind would help you find it notice that in
this loop here when I'm calling malloc this line of code is fine if the first call
to malloc fails and returns null
because I immediately return and I'm done but what if the second call but
not the first or the third call but not the first or second fail this line of code
has me returning immediately you really need to do some garbage
collection so to speak whereby you really need to go in and free any nodes
that you did allocate successfully earlier honestly that's going to be a pain in
the neck we won't do that here but probably what I'd want to do is write a
function called free list or something like that and
call that function to free any nodes I had previously created so it's not quite
at the finish line but the building blocks are indeed here questions on this
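The cleanup that is described but not written in class might look something like the following. Treat it as a sketch under assumptions: the function names `free_list` and `build_list`, the `node **` output parameter, and the 0/1 return codes are all choices made here for illustration, with the return code mirroring the `return 1` used in main.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Frees every node, saving each next pointer before calling free.
void free_list(node *list)
{
    while (list != NULL)
    {
        node *next = list->next;
        free(list);
        list = next;
    }
}

// Builds a list by prepending argv[1] through argv[argc - 1].
// Returns 0 on success. If any malloc fails, frees the nodes that
// were already allocated, sets *list to NULL, and returns 1.
int build_list(int argc, char *argv[], node **list)
{
    *list = NULL;
    for (int i = 1; i < argc; i++)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            free_list(*list); // release the earlier, successful allocations
            *list = NULL;
            return 1;
        }
        n->number = atoi(argv[i]); // assumes the argument looks like a number
        n->next = *list;
        *list = n;
    }
    return 0;
}
```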
code and I think it's safe for me to promise that it won't escalate further from
that questions on this no well let me show you one alternative that
you might prefer and I'm pretty sure this isn't an escalation it's just an
alternative formulation another way you can iterate over nodes in a list could
be this instead of a
while loop for instance let me actually show you one other piece of syntax
here you could technically use a for Loop you could give yourself a node
pointer here that is initialized to the list you can then check in your
for Loop that it's not equal to null and then you can do your update as usual
like this either of these are equivalent even though this one I suspect looks
scarier it's doing the exact same thing in one line instead of two but there's
no reason we can't use for loops instead
of while Loops to achieve the same idea but I'll leave these two as
demonstrations of one approach or the other but that's just like in week one
for loops while loops whatever looks simpler to you even though
admittedly neither of these probably looks super clean all right so let's take
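The two equivalent formulations, a while loop and a for loop that both walk a temporary pointer down the list, can be sketched side by side. The function names and the returned counts are assumptions added here for illustration; the lecture writes these loops inline in main.

```c
#include <stdio.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// While-loop version: print each number, then follow the next field.
int print_while(const node *list)
{
    int printed = 0;
    const node *ptr = list;
    while (ptr != NULL)
    {
        printf("%i\n", ptr->number);
        ptr = ptr->next;
        printed++;
    }
    return printed;
}

// For-loop version: same initialization, condition, and update on one line.
int print_for(const node *list)
{
    int printed = 0;
    for (const node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        printf("%i\n", ptr->number);
        printed++;
    }
    return printed;
}
```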
a step back to things more conceptual here up until now we've been inserting
elements into this linked list by prepending them let's consider what the
running time then is of these operations so if I've got a
linked list of size three or size n more generally time has passed and I've added
a lot of things to it what's going to be the running time for instance of
searching a linked list for some value and I'll tell you already it's not log n
because again binary search is off the table as per before break so what
might the running time be of searching a linked list for some value like two or
three or 1 or 50 what might the running time be o of I heard it over here o of
n and y who was that oh in the middle here why
memory so big O of one is possible with these linked lists if I indeed prepend
things of course if I prepend things everything's going to get out of order
potentially and we're going to have maybe the stack property instead of a queue
property so we might want to do things slightly differently so instead of doing
this whereby we kept prepending prepending prepending suppose we
append to the end of the list instead so if we now insert the one the two and
the three as we might want to for a queue to maintain that
fairness property we might start with an empty list we might add the one we
might append the two append the three and so it just is sort of laid out
differently in memory and again if I can come to you in the middle what's the
running time of search again when the linked list uses this append
implementation yeah still Big O of n because in the worst case you're going
to have to go through the whole list just to find it and notice it doesn't matter
if you have an intuition now that the bigger numbers might very well be at
the end you have no way to jump to the end you have no way to jump to the
middle or do anything resembling binary search every search has to start
from the left and follow the arrows again and again all right so I don't think
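Why search is big O of n regardless of insertion order can be seen in a short sketch: the only way to reach a node is to start at the first one and follow next pointers. The function name `search` and its signature are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Linear search: in the worst case every node is visited once,
// because there is no way to jump to the middle or the end.
bool search(const node *list, int number)
{
    for (const node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        if (ptr->number == number)
        {
            return true;
        }
    }
    return false;
}
```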
we've done any better there and in fact what is insertions running time now
in Big O when we're appending to the list in this way as we might to
implement a queue instead of a stack what's the running time of inserting a new
value Big O of so not Big O of one in this case but
insert numbers like this starting from an empty list we might have a two then
we might try inserting a one but we want to keep it sorted so now we're
going to prepend in our code but then you might want to insert a four so you
would append the four because you're probably going to look for the right
spot to insert it then we're going to insert a three and this one's getting a
little annoying because now you have to like iterate over the list look for the
right spot and then do a little smarter of a
splice but it's possible but you don't want to Orphan the four for instance and
then ultimately we get back to this question what would the performance be
of your linked list if you're trying to maintain sorted order well search I think
is going to be Big O of n for the same reasons as before what about insertion
big O of what for inserting into a sorted linked list yeah in the worst case
yeah it's still big O of n so it's no worse than but it's not really any better
than appending but we gain the
through as well as this sorted order one but I think I'll avoid showing it live
just because I do think that starts to escalate quickly but I think we have
enough of a building block if we're comfortable with prepending to at least
solve some real world problems with these linked lists questions then on linked
lists which we'll now leave behind on their own but now use this technique to
solve fancier problems but much less code questions on linked list all right so
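The sorted insertion that is described but not shown live (prepend when the value is smallest, otherwise walk to the right spot and splice without orphaning anything) could be sketched like this. It is one possible formulation, not the course's own code: the name `insert_sorted`, the `node **` parameter, and the bool return value are assumptions.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Inserts number so that the list stays in ascending order.
// Returns false if malloc fails, leaving the list unchanged.
bool insert_sorted(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;

    // Empty list, or new smallest value: prepend
    if (*list == NULL || number < (*list)->number)
    {
        n->next = *list;
        *list = n;
        return true;
    }

    // Otherwise stop at the last node whose successor is not smaller
    node *ptr = *list;
    while (ptr->next != NULL && ptr->next->number < number)
    {
        ptr = ptr->next;
    }
    n->next = ptr->next; // point at the rest of the list first, so nothing is orphaned
    ptr->next = n;       // then splice the new node in (or append, if ptr was last)
    return true;
}
```

The walk to find the right spot is what makes insertion big O of n in the worst case.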
to recap we've kind of taken a
sidestep with linked lists like we have this dynamism now where we can grow
and shrink our chunks of memory without over allocating or accidentally
under allocating as in the world of an array we don't have to worry about
copying values endlessly because once you allocate the node it can just stay
wherever it is in memory and you can just maintain uh you can just Stitch it
together somehow but unfortunately we've sacrificed what we started the
class with in week zero which was like binary search divide and conquer
which
gave us that log n running time which was really compelling if you think
back to the demonstrations and the visuals can we get the best of both
worlds can we get the sort of uh speed of binary search something
logarithmic but the dynamism of something like a linked list well we can
actually I think if we start to think not in a single Dimension just the x-axis if
you will but two Dimensions such that our data structures can maybe now
have width and height if you will and so a tree is perhaps the right term here
much like a
family tree if you have sort of your elders up here in the tree and then the
branches below them for their children and grandchildren and the like that's
actually what a computer scientist means when they talk about trees not a
tree that grows up like this but really one that typically is depicted growing
down although this is just an artist's depiction no matter what but there are
certain types of trees in the world called binary search trees that are
structured on paper and visually like
a uh family tree but they have a special property that lends themselves to
exactly that feature binary search so for instance here is an array back from
week two and I've sorted a whole bunch of numbers here from 1 to 7
we know we can do binary search on this structure if it's implemented as an
array but what feature do arrays to be clear not have that linked lists do today's
kind of a seesaw like what did we just gain by adding linked lists that arrays do
not allow yeah yeah you can insert more elements
without having to copy or moving everything else around like right now in
this single Dimension if these values to the left and or right are already used
then you have to move everything and that's where we started today's story
so arrays kind of paint you into a corner because you have to by definition
decide in advance how big they are well couldn't we have some kind of array
that can still grow but still is contiguous so we can do binary search in some
way well yes if we sort of rethink how we
Implement binary search Let Me propose that this I've chosen these seven
elements in the array much like the lockers from uh week two to be ordered
from smallest to largest I've highlighted now in yellow the middle elements
here and if we were telling the story of week two going left or going right let
me highlight in red the middle elements of the left half and the right half and
then let me further highlight in green the other elements in between those
and there's a pattern here as you
might notice whereby there's one yellow in the middle and then there's the
two red and the four green there's kind of an implicit structure there if you
will and what if I do start to think in two dimensions and instead of laying out
an array of lockers like this on the x-axis only what if I kind of like Slide the
four up and pull the uh the one the three the five down and kind of draw
this in two dimensions instead well let me do that by separating these
things like this such that now let me propose
abstracting away what a node is but let me claim that each of these squares
now is a node and a node might have a number but it might also have a
pointer heck maybe even two or more pointers and let me draw those now I
don't care about addresses like 0x123 0x456 0x789 anymore let's just draw
our pointers with arrows but now let me propose that we could very well
think about this as a tree storing what was previously array data but
now each of these nodes can be anywhere in memory
and moreover even though I've kind of painted myself into a corner visually
on the screen so long as there's more memory in the computer I could put
the number zero over here I could put the number eight over here and if
I'm smart I could probably if I want to insert other numbers like 2.5 or 1.5 or
values in between you know I bet we could kind of make room by swiveling
things around and just kind of hanging things off of these branches slightly
differently and so what does this gain
process this is the so-called root all right here I am at the number four I
want to find the number five what decision can I make when I see that I'm
currently at the number four just like the phone book from week zero where
is five not it's not to the left and if I had you know built a little mobile
here or something we could very dramatically snip off this Branch this is this
is like very lowbudget animation these nodes could like fall to the ground and
we're left with half of essentially a
tree but what do I now know it's obviously the five to the right so let me go
to the right six is obviously not the one I'm looking for but what do I now
know about the five well five is less than the six so I can sort of snip this off
here because I know it's not going to be down there and I can follow the
remaining Arrow here and voila I just found it and now without getting into
the weeds of the math I've got here what Seven Elements that's roughly
eight if I round up and if I do
some log base two I actually get 1 2 3 that is the key detail here the height of
this tree is three because I took a list of size seven and halved it and halved it in
order to let it dangle in these two dimensions plus or minus one for rounding
sake so what do I get back I now have binary search but it's not like uh the
middle of the middle of the middle I now follow these arrows in one of two
directions so each of these nodes now has an INT and maybe a left pointer
and a right pointer but you can call them anything you want
and so I've gotten back binary search and dynamism because if you want to
add zero or eight or 9 or 10 we can just dangle them at the bottom of the
binary search tree so what would this look like in code but we won't actually
implement it line by line well here was previously our definition of a node for
a link list which was onedimensional if you will even though it might bounce
up and down on the screen it was still just a line if you will well let me get rid
of the single pointer in the linked list let me
make a little bit of room here in this typedef and let me propose that we
just add two pointers each of which is a struct node star one will be called
left by convention one will be called right by convention and so long as
someone not me not today not in class writes the code that stitches together
this data structure too handling both the left child and the right child so to
speak I think we can indeed stitch together that two-dimensional structure
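The node definition described above, with the single next pointer replaced by left and right pointers, can be sketched as follows. The typedef follows the description in the lecture; the helper `make_node` is a hypothetical addition here, included only to show how such nodes might be allocated before being stitched together.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;  // subtree of smaller values, by convention
    struct node *right; // subtree of larger values, by convention
} node;

// Hypothetical helper: allocate a node with no children yet.
// Returns NULL if malloc fails.
node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->left = NULL;
    n->right = NULL;
    return n;
}
```

Stitching a tiny tree by hand would then be a matter of creating a node for 2 and assigning nodes for 1 and 3 to its left and right fields.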
and moreover once you have this in memory you can
translate pretty elegantly to code binary search itself using a principle we
talked about recently too here is for instance a function that I'll write by just
clicking through steps called search whose purpose in life is to return a
Boolean true or false the number I'm looking for is in the tree this search
function therefore takes two arguments the number I'm looking for called
number and then a pointer to the tree the so-called root of the tree now how
can I Implement binary search in
code well recall our brief discussion of recursion it turns out recursion is a
beautiful technique and honestly more obvious technique when you have
two dimensional structures which finally after five plus weeks we now do
here's maybe my first line of code here if the tree is null then obviously
return false you've handed me an empty tree there's nothing going on
obviously the number you're looking for is not going to be here so that's my
like safe base case to make sure I don't screw up and recurse
infinitely well what else might be the case well if the number I'm looking for
is less than the tree's own number and now recall that tree is a node star so
even though I'm calling it a tree it's really the current node that's been
passed in so if the number I'm looking for is less than the current nodes
number then I must know that the number I'm looking for is to the left so to
speak so how can I solve that well this is where the magic of recursion just
return whatever the answer is to calling
search again but on a sub tree if you will this is the sort of equivalent of
snipping off half of the tree pass in the left sub tree if you will with the same
number else if the number you're looking for isn't less than the current nodes
number but greater than snip off the other subtree instead and just return
whatever search says it finds in the right subtree here and then there's a
fourth and final case what else could be true logically yeah perfect if
the number you're
looking for equals equals the number in this node then I'm just going to
return true and you might recall from our recurring discussions of design I
don't strictly need to ask that explicitly either there's no node it's to the left
it's to the right or you found it so I can just Whittle that down as usual to an
else and this now returns my true so here too this is where recursion once
you get comfy with it sort of gets pretty elegant and cool in the sense that
wow even though there's a lot of
lines here I mean there's only a few interesting lines a lot of it's like Curly
braces at that which strictly speaking I could get rid of and so recursion lends
itself to Elegance when it comes to traversing these two-dimensional data
structures as well so that is in code how you might Implement something like
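The recursive search walked through above can be sketched as follows. The four cases match the description (empty tree, go left, go right, found); the exact parameter order is an assumption, since the spoken description does not pin it down.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Binary search on a binary search tree, expressed recursively.
bool search(node *tree, int number)
{
    if (tree == NULL)
    {
        return false; // base case: an empty tree cannot contain the number
    }
    else if (number < tree->number)
    {
        return search(tree->left, number); // snip off the right half
    }
    else if (number > tree->number)
    {
        return search(tree->right, number); // snip off the left half
    }
    else
    {
        return true; // not less, not greater, so it must be equal
    }
}
```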
search questions then on these trees we have dynamism we can insert more
nodes to them they're faster because we get binary search back but but but
there's got to be a price paid any
downsides or question or downside okay let me come back to that in just one
sec downside though what price have we paid for this dynamism and for this
binary searchability even though I've abstracted it away in the picture say
again we're using a lot of memory right I'm kind of misleading you now
because I'm just drawing these little squares with the simple numbers but
there's actually three things in there a four byte integer an 8 byte left pointer an
8 byte right pointer so we're already up to 16
20 bytes now to store individual ints that's probably okay though if memory is
relatively cheap and voluminous as it nowadays is but these are the kinds of
trade-offs and here too you see a hint of why some people still do like and
use C and in fact it's so omnipresent because when you have C you can
really fine-tune how much memory is being used for Better or For Worse
under the hood as we transition soon to python these decisions get made for
you and you have much much less control about
how much memory is being used by your program because someone else
made those design decisions for you question is it bad if we don't know
the parent node uh not necessarily there's no reason why you need to have
pointers in both directions however that can lend itself to efficiency by
spending more space and having arrows go up too you can actually save
more time when searching the tree in other contexts this though would be the
canonical way the typical way to implement it um but absolutely just like a
doubly linked list
that could help you solve other problems too all right so turns out I'm kind of
overselling binary search trees there are perversions of them so to speak
whereby they won't actually behave as advertised for instance here's a
good situation suppose you've got an empty tree initially and you insert the
number two well it's got to go somewhere so it might as well become the
root of this binary search tree and let's assume that someone wrote the code
to do this now you want to insert the number one and
you want to maintain the searchability of this tree well it's important to note
that binary search tree is different from tree if you just got a tree in memory
there is no social contract with where the numbers need to go they can be
completely random all over the place binary search tree means that you can
do binary search means that any node here is going to be greater than every
node here and less than every node here and that's a definition it's a
recursive structural definition that must be true
the right because it's larger all right now the user inserts three where does it
go okay it goes there logically and how does this story unfold the user
inserts four five six it's wonderfully sorted in advance by luck but this is a
perversion of the structure in what sense it's still technically a binary search
tree but what does it look more like it really is devolving if you will into a
linked list and so if you the programmer don't Implement a binary search
tree with some kind of repairs
going on such that as soon as something gets whoa a little too long and stringy
I think I can fix this it's going to be an annoying number of lines of code
which we're not going to write here or in a pset but we could kind of pivot
this thing right and we could just rejigger things so that the two becomes the
new root the one becomes the left child the three becomes the right child
but that's what like two three plus lines of code it's possible it's doable but
it's it's extra work it's extra code
so unless you write that code though and maintain balance of these trees
just because it's a binary search tree does not mean its height is going to be
log base 2 of n the height could be n in which case you don't
get those properties so when it comes to looking up in a balanced binary
search tree yes it's log n but if it's unbalanced if you don't add that
additional logic and those repairs so to speak you could it could devolve into
Big O of N and this is a whole category
of algorithms and fanciness that you would explore in a higher level course
on algorithms and data structures there's lots of ways to do that sort of fixing
that I'm alluding to in the picture there on the screen all right a few
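How an unbalanced binary search tree can devolve into something like a linked list can be illustrated with a plain insertion routine that does no rebalancing. This is a sketch under assumptions: the functions `insert` and `height` are written here for illustration, were not written in class, and duplicates are simply ignored.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Inserts number with no rebalancing and returns the root of the tree.
// If malloc fails, the tree is returned unchanged (NULL for an empty tree).
node *insert(node *tree, int number)
{
    if (tree == NULL)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            return NULL;
        }
        n->number = number;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (number < tree->number)
    {
        node *child = insert(tree->left, number);
        if (child != NULL)
        {
            tree->left = child;
        }
    }
    else if (number > tree->number)
    {
        node *child = insert(tree->right, number);
        if (child != NULL)
        {
            tree->right = child;
        }
    }
    return tree; // equal values are ignored in this sketch
}

// Number of nodes on the longest path from the root down to a leaf.
int height(const node *tree)
{
    if (tree == NULL)
    {
        return 0;
    }
    int left = height(tree->left);
    int right = height(tree->right);
    return 1 + (left > right ? left : right);
}
```

Inserting 1 through 6 in sorted order with this routine yields a chain whose height equals the number of nodes, while inserting the same kind of values in an order such as 4, 2, 6, 1, 3, 5, 7 yields a tree of height three.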
other data structures if you will toward an end of a sort of computer science
Holy Grail so log n is repeatedly a really good place to end up we started in
week zero when we got log n we lost it earlier today by introducing linked
lists but we just got it back albeit at the price of
spending more space but the Holy Grail so to speak when it comes to
algorithms would not be big O of n certainly definitely not n squared like our
bubble sorts and selection sorts and not even big O of log n
what's better than all of those big O of one constant time right that's the
Holy Grail because if we could store huge amounts of data but find it
instantly in one step or two steps or heck even 10 or 20 steps but
independent of the size of the data structure that's pretty powerful I mean
that's the secret sauce of the Googles and the twitters of the world trying to
get back results really really fast well it turns out another abstract data type
or abstract data structure might be something called a dictionary just like the
Merriam-Webster or Oxford English dictionaries that you might know which
associate say words with definitions well you can think of a dictionary really
abstractly as this like two columns maybe on a spreadsheet of sorts where
the left column represents something and the right column
represents something else like the word is on the left and its definition is on
the right and that's almost literally what a dictionary is on paper you've got
all the words and all the definitions right next to it but more generally in
Computing a dictionary really just has not words and definitions per se but
key value pairs this is a term of Art and we're going to see this again and
again especially as we transition to web programming keys and values key is
what you use to look for something the value
is what you find ultimately via that key so that's the generic term there
we've seen key value pairs really in the past in week zero we talked about
your contacts in your iPhone or Android phone uh being an app that has a
whole bunch of contacts presumably alphabetized by first name or last name
or the like well one one of those contact cards ultimately has someone's
number for instance like John Harvard in this case so in that type of
application the key is the name like John Harvard that you
use to find information and the value is the number that you find there or if
there's more information like where he lives and uh email address and the
like the whole contact card could be the value thereof the key is what you
use to look up John Harvard now back in week zero oh and rather the
corresponding table then if we draw this in two columns wouldn't be word
and definition or key value generically it would be name and number for
instance so we're just slapping some new terminology on
this old contact problem well this is the picture we drew way back in week
zero whereby I claimed that log of n was really really good and indeed it was
and has been since but the Holy Grail would indeed be something more like
this in this dashed Green Line constant time and maybe not literally one step
but a fixed number of steps that even as the problem gets huge and you go
way way out on the right of the x-axis the time to solve the problem does not
depend at all on the size of the problem itself you can have a thousand
contacts or 100,000 contacts constant time means it takes the same number
of steps no matter what well how can we get to that point well there's a
couple of final building blocks today and there's one called hashing and this
is something that will recur a few times but for now hashing is all about
taking as input some value and outputting a simpler version thereof so for
instance here's a gratuitously large deck of cards which are all the
more visible as a result and in a deck of cards typically you've got like what
52 cards plus maybe the Jokers and whatnot and each of those cards has a
number of sorts and a suit on it and here are literally four buckets on the
stage and how might I go about sorting these cards not just by number but
also by suit well you could certainly like spread them all out and sort of make
a mess of things and just kind of reason your way through it and get
everything in order according to suit and ordered by
number but most of us even if you don't have four buckets at home probably
are going to do something a little more intuitive that feels like an optimization
where if I find like the nine of Hearts I'm going to put that into the hearts
bucket the King of Spades I'm going to put that into the Spades bucket the
jack of diamonds over here and I'll do this with the Queen of Diamonds and
the Ace of clubs here and the three here and the 10 here and even
though it's still going to be 52 steps why am I and maybe
at home why would you perhaps do this step first what's the value of
bucketization yeah it reduces the probability of errors or the like and what I'm doing
here to give it a technical term is that I'm hashing the values I'm taking as
input a card like this and I'm reducing it more simply from a larger domain to
a much smaller range if you will so here's a domain of like 52 possibilities I
want to map that to a range of four possible outcomes the Diamonds the
clubs the hearts or the
Spades here and by doing that I'm just shrinking the size of the problem so
hashing does that it's like literally an f of x type arrangement whereby you
pass something in and you get back a simpler known value well a hash
function more technically is the algorithm or even the math or even the code
that implements that idea converting something bigger to something smaller
to this indeed finite range of values and it turns out that hash tables are a
wonderful application of arrays and linked lists to try to
Leverage The Best of Both Worlds the goal being theoretically to achieve that
Holy Grail of constant time and that's going to be a bit of an overstatement
because you're not always going to achieve it exactly but at least we can get
a little closer there too so with hash tables you have something that looks
like this this is just an array this is an artist's rendition drawing it vertically
instead of horizontally but that's just a detail graphically and this array for
instance maybe is
of size 26 and where am I going with this well how does Apple how does
Google store your contacts alphabetically in your phone and search for
things quickly well they probably alphabetize at least in English
according to a through z or if we convert that to numbers it's like what 65
through whatever or really 0 through 25 suffices if we're using an array of
size 26 we start counting at zero and we count up to 25 but let's abstract
that away as just letters of the alphabet so
maybe what Google and apple are doing in your phone is storing all of the
A's up there all of the Z's down there and everything else in between and so
this works pretty well if you start adding your friends and your family so for
instance and I'll get rid of the letters so as to not distract Albus might go in
that first spot because A once you subtract the 65 maps to zero so we put him in
the first bucket the A bucket maybe Zacharias ends up all the way at the
end there and then in the middle
might be Hermione here and if we do this dot dot dot you keep adding all of
your classmates you might get a contact database that has all of this data
herein now each of these nodes is drawn differently because this is just
another artist's rendition these rectangles these long rectangles represent a
contact card like John Harvard's that's got the name maybe email definitely
phone number and things like that so this seems great why how can I find
Albus well I go to the a bucket how do I find Zacharias I go to
the Z bucket how do I find Hermione I go to the H bucket but but but I've
done this very deliberately what problem will arise eventually assuming you
have enough classmates yeah there'll be too many people too many
contacts for all of the available spaces in the array there's still some room
here but I'm pretty sure if I think back to this particular class we've got
not just Hermione but also Harry who's also an H Hagrid who's also an H so where do
I put them I could just put them
arbitrarily in any of the open spots but then you lose the immediacy of
jumping right to the H right to the A right to the Z but now that we have linked
lists we can kind of combine these ideas right use an array to get to the first
letter of the name you care about and then if you have a collision so to speak
whereby someone's already there you don't do something stupid like put
Harry down here just because it's available or maybe Hagrid down here just
because it's available because then you're losing the
immediacy of the lookup why don't you just kind of stitch them together in a
linked list now what does this mean this means for most of the characters
here you have constant time lookup you look up Albus boom you're done
Zacharias boom you're done okay Harry Hermione Hagrid it might be one
two or three steps so that's actually devolving into something linear but here
we make a distinction today between theoretical running times which we
keep talking about and honestly clock-on-the-wall running times that
actual humans care about this is way faster than a linked list because you
don't have to search every name it's even faster than an array because
you don't need to do binary search you can literally for most of the names
find them in constant time one step and again it's not theoretically constant
because if you only befriend people who have H names it's going to be
a crazy long linked list anyway so again it really kind of depends on what the
nature of the data is here but this is
pretty close to constant time and in fact how could we get even closer how
could we reduce the probability of collisions for the H's or any other letters
how could we avoid putting too many H names together say a little louder okay
yeah so we could add another dimension if you will but let's not add a third
dimension per se but let's indeed look at not just the first letter of everyone's
name but the first and the second and in fact let's see if that gets us a little
uh further along so let
me go ahead and propose if you go through the whole Harry Potter Universe
there's actually a lot of collisions if we keep going and so we've got the L's
here the RS the S's and so forth well let's clean this up here Hermione
originally went to the H location but let's decrease the probability of
collisions there and everywhere else instead of putting Hermione Harry and
Hagrid all together let's go ahead and do this instead instead of labeling
these buckets A through Z let's just give
of Harry and Hagrid colliding yeah so we could look at the third letter okay so
let me try that instead of Ha let's look at Haa Hab Hac dot dot dot Haq dot dot
dot Heq Her Hes and so forth and now I think those names and probably
all the others we saw are now much more cleanly distributed there's much
lower probability of collisions unless two people have like almost the same
names or one is like a prefix of the other but but but even though we're now
closer than ever to constant time because the odds that we
hit a collision and have to devolve to a linked list are much lower what's the
downside that's not completely obvious from how I've depicted this on
screen what's the price I'm paying here yeah this is a huge amount of
memory the number of cells here in the array is now what 26 * 26 * 26 for
the first the second and the third possible characters all combinatorically
combined here that's a lot I didn't even draw them I have the dot dot dot to
evoke that instead that's a huge amount of memory
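To make that arithmetic concrete, here is a rough sketch of a hash function that looks at the first one, two, or three letters of a name. The function names hash_prefix and bucket_count are hypothetical, and the sketch assumes every name has at least that many letters, all of them A through Z.

```python
def hash_prefix(name, letters):
    """Map the first `letters` letters of name to a bucket index.

    The prefix is read as a base-26 number with A=0 ... Z=25.
    Assumes the name is at least `letters` long and purely alphabetic.
    """
    index = 0
    for ch in name[:letters].upper():
        index = index * 26 + (ord(ch) - ord("A"))
    return index


def bucket_count(letters):
    """Number of array cells needed to cover every possible prefix."""
    return 26 ** letters


# With one letter, Harry and Hagrid collide in the H bucket.
print(hash_prefix("Harry", 1), hash_prefix("Hagrid", 1))
# With three letters they land in different buckets...
print(hash_prefix("Harry", 3), hash_prefix("Hagrid", 3))
# ...but the array grows from 26 cells to 26 * 26 * 26.
print(bucket_count(1), bucket_count(3))
```

Fewer collisions are bought with a much larger and mostly empty array.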
this is a very sparse data set now and odds are you're going to waste so
much memory even for the names like Hae like Haq like I can't even think
of names so many of those buckets are going to be empty not to mention the
AAA and the zzz and everything else in between so it's a trade-off and it
might be too expensive a trade-off and so you might have to tolerate
something like the collisions we had earlier whereby even though they might
very well happen at least you uh are decreasing the probability by
perhaps having more buckets like this and in fact if I rewind now to where we
might have gone with this here here's how we might represent these nodes
in the tree previously in the past we've had a person who had a string name
and a string number a.k.a. now char star and so here now might be how in
this hash table we represent someone's name and number as well as a
pointer to the next element in the list let me rewind just to the picture
here we keep drawing different shapes because again these are
abstractions who really cares if they're to scale now we've got enough room
for the person's name not pictured on the screen is Hermione's number that's
somewhere in this rectangle but yes pictured here in this little square is a
pointer to the next node in the list so by storing name and number maybe
her address maybe her mailing address whatever in addition to a pointer
allows each of these nodes to be connectable just like the nodes in a linked
list but where they're starting
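The lecture's own nodes are written in C; the following is only a Python sketch of the same idea, an array of 26 buckets where each bucket holds a linked list of nodes with a name, a number, and a pointer to the next node. The phone numbers are invented, and the function names are hypothetical.

```python
class Node:
    """One contact card plus a link to the next card in the same bucket."""

    def __init__(self, name, number, next=None):
        self.name = name
        self.number = number
        self.next = next


def hash_name(name):
    """Hash on the first letter only: A maps to 0, Z maps to 25."""
    return ord(name[0].upper()) - ord("A")


table = [None] * 26  # every bucket starts out empty


def insert(name, number):
    """Prepend a new node to the list in the name's bucket."""
    i = hash_name(name)
    table[i] = Node(name, number, table[i])


def search(name):
    """Return (number, steps) where steps counts the nodes visited."""
    node = table[hash_name(name)]
    steps = 0
    while node is not None:
        steps += 1
        if node.name == name:
            return node.number, steps
        node = node.next
    return None, steps


# Hypothetical numbers, for illustration only.
for name, number in [
    ("Albus", "555-0100"),
    ("Hermione", "555-0101"),
    ("Harry", "555-0102"),
    ("Hagrid", "555-0103"),
    ("Zacharias", "555-0104"),
]:
    insert(name, number)

print(search("Albus"))     # found in one step
print(search("Hermione"))  # found after walking the H list
```

Because new nodes are prepended here, Hermione, who was inserted first among the H names, ends up at the end of that list; a different insertion policy would change the step counts but not the overall shape.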
just go in and free them or delete them or just kind of shrink the array and
not have Aaa and Aab and only have the prefixes of two or three
characters that you need you absolutely could do that but now what you
lose is the arithmetic benefit of being able to map each letter to a number if
you start freeing up unused space you don't know that Zacharias is
necessarily at location 25 Albus is still going to be at location zero but if
you've deleted some of the elements in the middle
values therein at the end of the day it technically is Big O of n because in the
craziest case you might have a huge fancy hash table but everyone in the
universe has a name starting with H and then it just devolves into a really long
linked list just like a binary search tree could do the same but if you choose a
smarter hash function maybe you mitigate that and you don't rely only on
the first letter but on the second or the third as well or some other
combination of that input and make your hash
function smarter odds are if you get a good hash function you want to get it
to be more like order of n divided by k where k is a constant
mathematically namely the number of buckets so ideally you want like a
uniform distribution you want like this many people here this many people
here you don't want there to be some or no people you want a uniform
statistical distribution and maybe you get that from Human names maybe
you don't but that's kind of the challenge of a hash function
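One way to see how far a hash function is from that uniform ideal is to count how many names land in each bucket. This is only a sketch with a handful of names from the lecture and a first-letter hash; the function name chain_lengths is hypothetical.

```python
from collections import Counter


def chain_lengths(names, buckets=26):
    """Return a list with the number of names that hash to each bucket."""
    counts = Counter((ord(n[0].upper()) - ord("A")) % buckets for n in names)
    return [counts.get(i, 0) for i in range(buckets)]


names = ["Albus", "Hagrid", "Harry", "Hermione", "Zacharias"]
lengths = chain_lengths(names)

ideal = len(names) / 26  # n divided by k if the spread were uniform
longest = max(lengths)   # the chain a worst-case lookup has to walk

print(f"ideal chain length: {ideal:.2f}")
print(f"longest chain: {longest}")
```

With these names the H bucket holds three of the five entries, so the longest chain is well above n divided by k.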
like this kind of thing in that it does not devolve into Big O of n it is truly
constant time but there's going to be a price there's going to be a gotcha a
trie is sort of a fancier tree and it's short for retrieval but pronounced try for
weird historical reasons but a trie is a tree each of whose nodes is an array
right so this is all like crazy mashups now people started inventing data
structures just by combining different ones unfortunately a lot of the good
ideas are taken but you just have
benefits from certain aspects of those data structures and combining them
ideally gives you the best of both worlds so to speak so here might be the
root of a trie it's literally a big node a big rectangle but it's actually an array
so there's like 26 locations in this picture here and here's how you use a trie
for instance to store names just like the hash table you treat each of the
elements of that array in that node as like a letter of the alphabet so a
through z or 0 through 25 and if you
want to store someone's name in here you do so as follows if you want to
store like an H you index into the H location and if you want to store the
second letter of someone's name like an a well you add another node below
it such that one is connected to the other and you then identify the a in that
array and then you go on and maybe put a g if the goal is to store spoiler
now Hagrid in this data structure and then the r and the I and then the D but
when you get to the D the end of the name you
have to somehow flag that this is the end of a name that we've embedded
into this data structure so whereas all of these are called out in white just to
make obvious what we're connecting to what the green has to be like a bool
that's true that just indicates like the buck stops here like d is the last letter
in someone's actual name and what's kind of cool now about a trie is that we
can repeat this for other names as well so for instance here is where we
might put Harry as well and notice they share a
common prefix ha for Hagrid ha for Harry so we're reusing some of these
nodes some of these arrays we can even slip Hermione in here too borrowing
only the H but she gets the H then the E then r m i o n e and so forth and
we mark at the end of her name too that she's in there now what's the
takeaway here well what is the running time of a trie how many steps does it
take to find someone in this data structure and let me zoom out so that it
sort of suddenly becomes a massive data structure with even more
in it uh maybe it looks sorry no I'll keep it on this one maybe it looks a little
something like this with just these three names but how many steps does it
take to find Hagrid or Harry or Hermione no matter how many names are in
this data structure there's three at the moment but it takes what h a g r i d
so six steps to find Hagrid h a r r y five steps to find Harry h e r m i o n e
eight steps to find Hermione but notice that those steps are only dependent
on the lengths of the humans' names
and let's assume that no one's going to have an infinitely long name it's going
to max out at what like eight no maybe 18 maybe 20 30 there's
actually some pretty long human names out there but it's going to be finite
you know it's bounded whereas in most contexts n could grow forever so
what's compelling here is if you assume that the longest name is I don't
know 50 for the sake of a theme here then you know that finding anyone in
this data structure will take
you no more than 50 steps 50 is thus a constant which means you have big
O of one running time it doesn't matter if there's a million people in this
phone book or a billion people in this phone book that's going to definitely
add more nodes to it but it's still going to take you h a g r i d
six steps to find Hagrid h a r r y five steps to find Harry even if there's a
billion other people in that data structure so now we actually do seem to
have constant time if you assume that there's going to
be a bound on the length of the name why don't we use tries for everything
then what's the price we're paying for this data structure even though we've
represented just three characters here yeah it's a lot of memory yeah and
you can see it even with these three names most of the squares on the
screen are empty like bytes and bits that are there and are allocated and they
need to be there because you need to be able to do that arithmetic thing of
this being zero this being 25 so you can jump from boom
boom boom boom based on each of the letters but it's a hugely sparse data
structure which means it takes up a crazy amount of memory now maybe
that's tolerable especially for short names but that's going to be the trade-off
as well and this is such a tension in Computing almost any time you want to
improve time you want to speed up the efficiency the speed of your
algorithm you're going to spend space if by contrast you want to decrease
the amount of space you might very well have to increase the running
time it is indeed this seesaw back and forth and you your colleagues your
company need to decide what resource is the most precious heck it might be
much harder to code one of these data structures than another you're a
human your time is valuable do you really want to spend hours implementing
a trie when you know hey in 30 minutes I can bang out an array nowadays or
a linked list even there too development time is going to be yet another
resource and why sometimes there's good code or bad code
in the path that you take to find them so that's a minor optimization but it
saves us some space but this would be just a different data structure we
could use to actually solve this problem as well albeit at a very expensive
cost and what do we need our variable to be that stores the trie just like before
we just need a single pointer that hangs on to the root of this structure that's
null if it's empty or non-null if it's actually pointing at something any
questions then on tries and if it's feeling like a lot like the
fire hydrant it is we started with arrays then linked lists then tries but
questions on how we've just assembled from these basic building blocks
yeah a good question why is this not size 26 it's just like with
the trie just like with the linked list before it just tends to be in code convenient
to have a separate additional pointer that's small that just points to the
beginning of the data structure because that way it can be null thereby
clearly indicating there are no nodes the whole
structure is empty if you allocated one of those nodes you absolutely could
but then you'd be just wasting space even if it's empty and it creates an
ambiguity so just having a single pointer linked to the beginnings of all of
these things is a good thing other questions now on tries or trees or hash
tables or arrays so what problems might arise well here's a counter example
what names are manifest in this trie here feel free to just call it out what do
you see Daniel and Danielle so presumably
if these are two names here one of which is a prefix of another notice that
the data structure still works and I chose you know a friend's name and then
appended a couple of more characters to it that's also a name because we
have here d a n i e l and the green technically is implemented as a bool or
something like that that indicates a word stops here but we don't want to
preclude storing Danielle as well who's a superstring if you will of Daniel and
so that's okay too so long as the
structure allows for the pointers to keep going so even that works out okay
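Here is a compact Python sketch of a trie along the lines just described: each node is an array of 26 child pointers plus a boolean flag marking where a name ends. The class and function names are hypothetical, and the sketch assumes names contain only the letters A through Z.

```python
class TrieNode:
    """A node is an array of 26 children plus an end-of-name flag."""

    def __init__(self):
        self.children = [None] * 26
        self.is_word = False  # plays the role of the green marker


def index_of(letter):
    """Map A/a to 0 through Z/z to 25."""
    return ord(letter.upper()) - ord("A")


def insert(root, name):
    """Walk one node per letter, creating nodes as needed."""
    node = root
    for letter in name:
        i = index_of(letter)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_word = True


def contains(root, name):
    """Take len(name) steps at most, however many names are stored."""
    node = root
    for letter in name:
        node = node.children[index_of(letter)]
        if node is None:
            return False
    return node.is_word


root = TrieNode()
for name in ["Hagrid", "Harry", "Hermione", "Daniel", "Danielle"]:
    insert(root, name)

print(contains(root, "Daniel"))    # a stored name that is also a prefix
print(contains(root, "Danielle"))  # the longer name sharing that prefix
print(contains(root, "Dan"))       # a prefix only, never marked as a name
```

The loop in contains runs once per letter of the key, which is why the cost depends on the length of the name rather than on how many names are stored.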
whereas it might not have otherwise and in terms of the running time just to
be clear at the end of the day tries do give you actual constant time for
insertions lookups deletions and the like because it's dependent only on the
length of the input the key if you will and not on how many other people are
in your phone or address book and now I thought we'd conclude with a visual if
you've gotten out into the square anyone
recognize this okay okay Sweetgreen a local salad place what are we looking
at here and what's its connection to today you're about to become all the
geekier in the real world because you will start to see data structures everywhere
what is this or how does this work maybe in salad form who's been to
Sweetgreen okay either of you so how does this work okay good so if you
order a salad for someone named L when it's ready they put it in the l
section here and so this is kind of a set of key value pairs
right if L is the first letter of someone's name the value hopefully is the salad
and so what you kind of have here is a dictionary key value pairs where it's
not words and definitions it's names and salads and you can think of this too
as kind of a hash table why even though it actually doesn't fit on one long
shelf because the store is only so big this is really an array and apparently A is
missing or maybe it's around the corner but this array just happens
to wrap onto multiple lines
but it's still conceptually a single Dimension but suppose two people have
the name L what do they do typically yeah so maybe they well if they
really put that much effort into it they might look at the second letter and then
the third letter odds are this is not that interesting a problem to solve
optimally in that way but they probably do start stacking the salads on top of
each other maybe scooching it over just a little bit and so what do you have
there well now you start to view the world through like CS50 glasses like okay
you have an array and then you have like these linked lists that are sort of
growing here but even then you run into a problem why because it's not
really a linked list because at some point you're going to hit the boundary here
so it's kind of like an array of arrays because you can only fit what like three
or four salads here and so long story short we started today deliberately
talking about real world things like stacks and queues and even though it did
escalate quickly into binary search trees and hash tables
and tries even those things are everywhere even though they don't call them
as such these are just solutions to problems and now with this final week of
C under your belt you have all the more of a technical toolkit via which to
implement these things in code next week we'll be able to trust that
someone else solved all these problems we'll introduce Python and lines of
code like this will finally become lines of code like that so that's the promise
ahead and we'll see you next
yourself new languages in the future and so indeed what we'll do today what
we'll do this coming week is sort of prepare you to stand on your own and
once python is passé and the world has moved on to some other language in
some number of years you'll be well equipped to figure out how to wrap your
mind around some new syntax some new language and solve problems as
well now you recall in week zero this is where we started just saying hello to
the world and that quickly escalated just a week later in C
to be something much much more cryptic and if you've still sort of
struggled with some of the syntax find yourself checking your notes or your
previous code like that's totally normal and that's one of the reasons why
there are languages besides C out there among them this language called
python humans over the decades have realized gee that wasn't necessarily
the best design decision or humans have realized wow you know what now
that computers have gotten faster with more memory and more faster
it's going to be literally this none of the crazy syntax above or below fewer
semicolons if any fewer curly braces and really a lot of the distractions get
out of the way so to get there let's consider exactly how we've been
programming up until now so you write a program in C and you've got
hopefully no syntax errors so you're ready to build it that is compile it and so
you've run make and then you've run the program like ./hello or if you think
back to week two where we took a peek underneath the hood
of what make is doing it's really running the actual compiler something
called clang maybe with some command line arguments creating a program
called hello and then you could do ./hello so today you're going to start
doing something similar in spirit but fewer steps no longer will you have to
compile your code and then run it and then maybe fix or change it and then
compile your code and run it and then repeat repeat the process of running
your code is going to be distilled into just a single
step and the way to think of this for now is that whereas C is frequently used
as indeed a compiled language whereby you convert it first to zeros and
ones Python's going to let you speed things up whereby you the human
programmer don't have to compile it you're just going to run what's called an
interpreter which by Design is named the exact same thing as the language
itself and by running this program installed in VS Code or eventually on your
own Macs or PCs this is just going to tell your
computer to interpret this code and figure out how to get down to that lower
level of zeros and ones but you don't have to compile the code yourself
anymore so with that said let's consider what the code is going to look like
side by side in fact let's look back at some scratch blocks just like we did with
C in week one and do some side by sides because even though some of the
syntax this week and Beyond's going to be different like the ideas are truly
going to be the same there's not all that much
intellectually new just yet so whereas in week zero we might have said hello
to the world with this purple puzzle piece today of course uh or rather in
week one it looked like this in C but today moving forward it's going to quite
simply look like this instead and if we go back and forth for just a moment
here again is the version in C noticing the very C-like characteristics and just
at a glance here in Python I claim it's now this what do you apparently need
not worry about anymore what's gone so semicolon is gone
and indeed you don't need those to finish most of your thoughts anymore
anything else so the backslash n is absent and that's kind of curious because
we're still going to get a new line but we'll see that it's become the default
and this one's a little more subtle but now the function is called print instead
of printf so it's a little more familiar in that sense all right so when it comes
to using libraries that is code that other people have written in the past
we've done things like hash include
cs50.h to use cs50's own header file or standard IO or standard lib or string
or any number of other header files you have all used well moving forward
we're going to give you for this first week a similar cs50 library just very
short-term training wheels that we'll quickly take off because in reality it's
a lot easier to do things in python as we'll see but the Syntax for this now is
going to be to import the cs50 library in this way and when we have now this
ability we can actually start writing
some code right away in fact let me switch over to vs code here and just as
in the past I'll create a new file but instead of creating something called .c I'm
going to go ahead and create my first program called hello.py using code
space hello.py that of course gives me this new tab and let me actually quite
simply do what I proposed print quote unquote hello world without the back
slash without the semicolon without the F in print and now let me go down to
my terminal window and I don't have to
compile it I don't have to do dot slash i instead run a program called python
whose purpose in life is now to interpret my code top to bottom left to right
and if I run python of hello.py crossing my fingers as always voila now I have
printed out hello world so we seem to have gotten the new line for free in
this sense where it's automatically happening the dollar sign isn't weirdly on
the same line like it once was in week one but that's just a minor detail
here if we switch back to
now some other capabilities well indeed with the cs50 library you can also
not just import the library itself but specific functions and you'll see that
temporarily we're going to give you a helper function called get string just
like in C that just makes it work exactly the same way as in C and we'll see a
couple of other functions that will just make life easier initially but quickly
we will take those training wheels off so that nothing is indeed cs50 specific all
right well how about
functions more generally in Python let's do a whirlwind tour if you will much
like we did in that first week of C comparing one to the other so back in our
world of scratch one of the first programs we wrote was this one here
whereby we asked the human their name we then used the return value that
was sort of automatically stored in this answer variable as a second
argument to join so that we could say hello David or hello Carter so this was
back in week zero in week one we converted it to
this and here's a perfect example of things like escalating quickly and again
this is why we start in scratch there's just so much distraction here to
achieve the same idea but even today we're going to chip away at some of
that syntax so in C we had to declare
the variable as a string here we of course had the semicolon and more well
in Python the comparable code now is going to look more simply like this so
semicolon is again gone on both lines for that matter
so that's good what else appears to have changed or disappeared yeah type
of variable yeah so I didn't have to specifically say that answer is now a
string and indeed python is dynamically typed and in fact it will infer from
context exactly what it is you are storing in that variable other details that
seem a little bit different what else jumps out at you here
I'll go back this was the C version and maybe Focus now on the second line
because we've rather
exhausted the first here's now the python version what's different here yeah
yeah there's no percent s anymore there's no second argument at the
moment per se to print now it is still a little weird it's as though I've like
deployed some addition here arithmetically but that's not the case some of
you have programmed before and plus some of you might know means what
in this context so to combine or more technically anyone know the buzz word
yeah to concatenate so to concatenate is like
the fancy way of what scratch calls joining which is to take one string on the
left one string on the right and to join them together to glue them together if
you will so this is not addition it would be if it were numbers involved instead
but because we've got a string hello comma and another string implicitly in
this variable based on what the human typed in in response to this get string
function that's going to concatenate hello comma space and then David or
Carter or whatever the human
has typed in but turns out there's going to be different ways to do this in
Python and we'll show you a few different ones and here too try not to get
too hung up on or frustrated by like all of the different ways you can solve
problems odds are you're going to be picking up tips and techniques for
years to come if you continue programming so let's just give you a few of the
possible ways so here's a second way you could print out hello comma David
or hello comma Carter but what has changed
in the previous version I used concatenation explicitly and the space here is
important grammatically just so we get that in the final phrase now I'm
proposing to get rid of that space to add a comma outside of the double
quotes as well but if you think back to C this probably just means that print
similar in spirit to printf can take not just one argument but even two and in
fact because of this comma in the middle that's outside of the double quotes
it's hello comma and then it will
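The two approaches described so far, concatenating with plus and passing two arguments to print, can be sketched as follows. The value of answer is hard-coded here as a stand-in for whatever the human would type, so that the example does not depend on the cs50 library.

```python
answer = "David"  # stand-in for the human's input

# Way 1: concatenation. The space after the comma is part of the string.
greeting = "hello, " + answer
print(greeting)

# Way 2: two arguments. print separates its arguments with a single
# space by default, so no trailing space is needed inside the quotes.
print("hello,", answer)
```

Both calls print the same line.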
and probably the weirdness jumps out we've suddenly
introduced these like Curly braces which I promised were mostly gone and
they are but inside of this string here I've done a curly brace which might
mean what just intuitively and here is sort of an example of how you learn a
new language just kind of infer from Context how python probably works
what might this mean yeah able to tell that this is not one of the actual stat
inside yeah so this is an indication because the curly braces because this is
the way python was designed that we want to plug in the value of answer
not literally a-n-s-w-e-r and the fancy word here is that the answer variable
will be interpolated that is substituted with its actual value but but but and
this is actually weird looking and this was introduced a few years ago to
python what else did I have to change to make these curly braces work
apparently yeah yeah there's this weird F and so it's sort of like part of print
F but now it's inside the curly it's inside
the parenthesis there this is just the way python designed this so a few years
ago when they introduced what are called format strings or F strings you
literally prefix your quoted string with the uh letter F and then you can use
trickery like this like putting curly braces so that the value will be substituted
automatically if you forget the F you're going to literally see hello comma
curly brace answer close curly brace if you add the F it's indeed interpolated
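To make the effect of the f prefix concrete, here is a small sketch that builds the same string with and without it. The name is hard-coded as a stand-in for user input, and the variable names without_f and with_f are only for illustration.

```python
answer = "David"  # stand-in for what the user typed

without_f = "hello, {answer}"   # plain string: the braces are kept literally
with_f = f"hello, {answer}"     # f-string: {answer} is replaced by its value

print(without_f)
print(with_f)
```

Only the second string has the value plugged in; the first is printed exactly as typed.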
the value is plugged in all
right questions on how we can just say hello to the world via python in this
case yeah if you do this without the without the F if you omit the F you will
literally see h-e-l-l-o comma curly brace a-n-s-w-e-r close curly brace so in fact
let's do this let me go back to vs code here quickly I've still got my file called
hello.py open and let me go ahead and change this ever so slightly so I'm
going to go ahead and uh let's say from cs50 import get string and that's just
the new syntax I propose using to import
a function from someone else's Library I'm going to now go ahead and ask
the question uh let's go ahead and use get string storing the result in
answer so get string quote unquote what's your name question mark and
then on this line I'm going to deliberately make a mistake here exactly to
your question let me just say hello comma answer and just this now even
though answer is a variable Python's not going to be so presumptuous as to
just plug in the value of a variable called answer what
it's going to do of course is if I type in my name whoops I typed too fast let
me go ahead and rerun that again if I run python of hello.py type in my name
and hit enter I get hello comma answer well let me do one better let me
apply these curly braces as before let me rerun python of hello.py what's
your name D-a-v-i-d and here's again the answer to your question now we get
literally the curly braces so the fix here ultimately is just going to be to add
the F there rerun my program again with
D-a-v-i-d and now hello comma David so this is admittedly a little more cryptic
than the ones with the plus or the comma but this is just increasingly common
why because you can read it left to right it's nice and convenient it's less
cryptic than the percent s's so it's sort of a new and improved version if you
will of printf in C based on Decades of experience of programmers doing
things like this questions on printing in this way we're now on our way to
programming in Python anything all right well what more
can we do with this language here well let me propose that we consider that
we have for instance a few other features that we can add to the mix as well
namely let's say some data types as well so let me flip over here to um back
to the slides and there's different data types in python as we'll soon see but
they're not as explicit as we already saw by using a string from get string you
don't have to explicitly state what it is but you'll recall in C all of
these various data types and then in
python kind of nicely enough this list is about to get shorter and so here is
our list in C here is an abbreviated list in Python so we're still going to have
strings but they're going to be more succinctly called strs now s-t-r we're
still going to have ints for integers we're still going to have floats for floating
point values we're even going to have bools for true and false but what's
missing now from the list is long and floats and why is that or rather long and
double well recall
that in C those used more bits well in Python the smaller data types
previously int and Float themselves just use more bits for you and so you
don't need to distinguish between small and large you just use one data type
and the language gives you a bigger range than before it turns out though
there's going to be some other features as well of python these data types
one of which will be called range another of which will be list So Gone will be
arrays we'll actually use something literally called
a list tuple sort of like x y pairs for coordinates and things like that uh dict for
dictionaries so we have built-in capabilities for storing keys and values we'll
see and even a set sort of mathematically a set is like a collection of values
but it automatically gets rid of duplicates for you so all of these things we
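As a rough tour of the types just listed, the sketch below creates one value of each. The specific values and variable names are made up for illustration only.

```python
big = 2 ** 100                         # int: grows as needed, so no separate long
point = (3, 4)                         # tuple: a fixed pair, like x y coordinates
numbers = [0, 1, 2]                    # list: takes the place of C arrays
counts = {"cat": 2, "dog": 1}          # dict: keys mapped to values
unique = set(["cat", "dog", "cat"])    # set: duplicates are dropped automatically

print(big)
print(point[0], point[1])
print(numbers[0])
print(counts["cat"])
print(len(unique))
```

The int on the first line is far larger than anything a 64-bit long in C could hold, yet no special type is needed.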
could absolutely Implement in C if we wanted and indeed in problem set five
you've been implementing your very own spell checker using some form of
hashtable well it turns out that in Python you can
solve those same problems but perhaps a little more readily in fact let me go
back over here to vs code and let me propose that I do the following let me
go ahead and create a file called dictionary.py let me propose that I try to
implement say problem set five our spell checker in Python instead of C and
achieve ultimately the same kind of behavior uh whereby I'll be able to spell
check a whole bunch of words so this is jumping the gun a little bit because
you're about to see syntax we'll
revisit over the course of today but for now I've got a new file called
dictionary.py and let me begin to create uh some placeholders for
functions we'll see in just a bit that in Python you can define a function called
check and that check function can take a word as its input and I'll come back
to this in just a moment in Python I can define a second function like load
which itself will take a whole dictionary just like in problem set five and I'll go
ahead and come back to the
code and indeed it's been by design a challenge but one of the reasons for
these higher level languages like python is that you can stand on the
shoulders of programmers before you and solve very common problems
much more quickly so that you can focus on building your new app or your
web application or your own project to solve problems of interest to you so at
the risk of crushing some Spirits let me propose that in Python if you want a
dictionary for something like a spell checker well that's fine go
ahead and give yourself a variable like words to store all of those words and
just assign it equal to a dictionary or dict for short in Python that will give
you a hashtable now it turns out in speller recall you don't need to worry
about words and definitions it's just about spellchecking the words
so strictly speaking we don't need keys and values we just need keys so I'm
going to save myself a few more keystrokes by just saying that technically in
Python using a set suffices again a
set is just a collection of values with no duplicates but they don't necessarily
have uh keys and values it's just one or the other but now that I have on line
one I claim the equivalent in Python of a hash table I can actually do
something like this here's how I might implement the check function in
Python if the word passed into this function is in my variable called words
well return true else go ahead and return false done wait you're thinking if
anything at all maybe we want to handle lowercase instead of
just uppercase and lowercase well you know what in Python if you want to
force a whole word to lowercase you don't have to iterate over it with a loop
you don't have to use any of those ctype functions or anything just say
word.lower and that will convert the whole thing to lowercase for parity with the
dictionary all right how about something like the load function in Python well
in Python you can open files just like in C for instance in Python I might do
open the dictionary argument in read mode just
like fopen in Python I might do something like this for each line in that file let
me go ahead and add to my words variable that line and then let me go
ahead and close that file and I think I'm done I'm just going to go ahead and
return true just because I think think I'm already done now here too I could
nitpick a little bit technically if I'm reading in every line from the file every
line in the dictionary ends with technically a backslash n but there's an easy way
to get rid of that uh just like
you might in C with an alternative syntax what I'm actually going to do is
this let me grab from the current line the current word by stripping off with
right strip rstrip a function we'll again see that just gets rid of the trailing
new line the backslash n at the end of that line and what I really want to do then
is add this word to that dictionary meanwhile if I want to figure out what the
size is of my dictionary well in C you're probably writing code to iterate
over all of those lines
and you're just going to uh count them up using a variable not so in Python
you can just return the length of those words and better still in Python you
don't have to manage your own memory no more malloc no more free no more
manual thinking about memory the language just deals with all of that for
you so you know what it suffices for me to just return true and claim that
unloading is done for me and that's it again whether you're in the middle of
or already finished this might perhaps suggest some
frustration but also Enlightenment in this in that this is why higher level
languages exist you can build on top of the same principles the same ideas
with which you've been dealing struggling even this past week but you can
now express yourself all the more succinctly like this one line implements a
hash table for you and all of this now now just uses that hash table in a
simpler way any questions now on this keeping in mind that the point
nonetheless of speller and pset 5 is to understand
what's really going on underneath the hood and better still to notice this this
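Pieced together from the narration above, dictionary.py might look roughly like the sketch below. The function names check, load, size and unload follow problem set 5; the exact layout is a reconstruction from the spoken description, not the file shown on screen.

```python
# A set serves as the hash table of known words (keys only, no values).
words = set()


def check(word):
    """Return True if word is in the dictionary, ignoring case."""
    if word.lower() in words:
        return True
    else:
        return False


def load(dictionary):
    """Read one word per line from the file at the path `dictionary`."""
    file = open(dictionary, "r")
    for line in file:
        word = line.rstrip()  # strip the trailing newline
        words.add(word)
    file.close()
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free: Python manages memory for us."""
    return True
```

A call such as load on a file of lowercase words, followed by check on an uppercase word, should return True because check lowercases its argument before looking it up.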
might seem all rather amazing but let me go ahead and do this I've actually
got a couple of versions of speller written here and I've got a version written
in C that I won't show the source code for but I'm going to go ahead and
make that version of speller in C and I'm going to go ahead here and let's say
split my window here for just a moment and I'm going to go into a python
version of speller really that I
just wrote and on the left hand side here let me go ahead and run speller the
version I compiled in C using a big text like uh the Sherlock Holmes text
which has a whole lot of words in it and on the right hand side let me run
python of speller.py which is a separate file I wrote in advance just like we give
you speller.c and I'll similarly run this on the Sherlock Holmes text and I'm
going to do my best to hit enter on the left and the right of my screen at the
same time but we should see hopefully the
same list of misspelled words and the timings thereof so here we go on the
right here we go on the left all right sort of a race to see which one wins here
C is on the left python is on the right okay interesting hopefully Python's
close behind note that some of this is internet delay and so it might not
necessarily be a crazy number of seconds but the system is indeed using if
we measure at a low level how much time the CPU spent executing my code
C took a total of 1.64 seconds that was pretty fast even though
it took a moment more for all of the bytes to come over the Internet the
python version though took what 2.44 seconds so what might an inference
be I mean one maybe I'm just better at programming in C than I am in Python
which is probably not true but what else might you infer from this example
should we maybe give up on python stick with C no so where what might be
going on here like why is the python version that I claim is correct and I think
the numbers all line up just not the times where's the trade-off here well
here again is sort of this design tradeoff yeah yeah exactly in order to save
the human programmer time there's a lot more features built into python
more functions more automatic management of memory and so forth and
you have to pay a price like someone else's code is doing all of that work for
you but if they've written some number of lines of code those are just more
lines of code that need to be executed for you whereas here the computer is
at the risk of oversimplifying only running my lines of
you down to zeros and ones and then the second the third the fourth time
you run that program it might very well be faster so this is a bit of a head
fake here in that I'm running them once and only once but we could get
benefit over time if we kept running the python version again and again and
perhaps fine-tune the performance but in general there's going to be this
trade-off now would you rather spend the 60 seconds I spent implementing a
spell checker or the 6 hours 16 hours you might be or
have spent implementing the same in C you know probably not for
productivity sake this is why we have these additional languages just for fun
let me flip over to another screen here and open up a version of python
that's actually on on my in just a second on my own uh Mac instead of the
cloud so that I can actually do something with Graphics so here I just have a
black and white terminal window on my very own Mac and I've pre-installed
python just like we've done so for VS Code in the cloud
for you uh notice that I've got this uh photo of uh perhaps one of your
favorite TV shows here with the cast of The Office notice all of the faces in
this image here and let me propose that we try to find one face in the crowd
sort of CSI style whereby we want to find perhaps the Scranton Strangler so
to speak and so here is an example of this this guy's face now how do we go
about finding this specific face in the crowd well our human eyes obviously
can pluck him out especially if you're familiar
with the show but let me go ahead and do this instead let me go ahead and
propose that we run code that I already wrote in advance here this is a
Python program with more lines of code that we won't dwell on for today but
it's meant to motivate what we can do from the Pillow uh library implying a
Python imaging library I want to import
some feature called Image so that I can manipulate images not unlike
our own problem set 4 and this is kind of
powerful you in Python you can just import face recognition as a library that
someone else wrote from there I'm going to create a variable called image
I'm going to use this face recognition library's load image file function it's a
little verbose but it's similar in spirit to fopen and I'm going to open
office.jpeg I'm going to then declare a second variable called face locations plural
because what I'm expecting to get back per the documentation for this
library is a list of all of the face
locations that are detected all right then I'm going to iterate over each of
those uh faces using a for Loop that we'll see in more detail I'm going to then
infer what the top right bottom and left Corners are of that face and then
what I'm going to do here is show that face alone if I've detected the face in
question so let me go ahead here and run detect.py and we'll see not just the
one face we're looking for but if I run python of detect.py it's going to do all
of the analysis I'll see a big
opening here now of all of the faces that were detected in this here program
okay some better than others I guess if you zoom in on catching someone
typical Angela if you now want to find that one face I think we need to
train the software a bit more so let me actually open up a second program
called recognize that's got more going on but let me with a wave of a hand
point out that I'm now loading not only the office.jpeg but also toby.jpeg to
sort of train the algorithm to find that
specific face and so now if I run this second version recognize.py with python of
recognize.py hold my breath for just a moment it's analyzing presumably
all of the faces you see the same original photo but do you see one such face
highlighted here this version of the code found Toby highlighted him with
this green and voila we have face recognition so for better for worse this is
what's happening increasingly societally nowadays and honestly even
though I didn't write the code live
because it's a good dozen or more lines of code it's not terribly many and
literally all the authorities all we have to do is import face recognition and
voila you have access like these technologies are here already but let's consider
for just a moment how did we find Toby like how might that Library even
though we're not going to look at its implementation details how does it find
Toby and distinguish him from all of these other faces in the crowd what
might it be doing intuitively think back even to pset 4
like what you yourselves have access to data wise yeah you know pixels in
one area and a lot of that it's a lot of similar yeah exactly and to
summarize for for camera here we have trained the software if you will by
giving it a photo of Toby's face by so by looking for the same or really similar
pixels especially if it's a slightly different image of Toby we can perhaps
identify him in the crowd and what really is a human face well at the end of
the day the computer only knows it as a pattern
various measurement shapes colors and sizes and the like and indeed that
might might be the intuition but what's powerful here again is just how easy
and readily available this technology now is all right so with that said let's
propose to consider what more we can do with python itself get back to the
fundamentals so that you yourselves can start to implement something
along those same lines so besides having access to things like a get string
function um the cs50 library provides a few other things
as well namely in C we had these but in Python we're going to have fewer in
Python our library short term is going to give you not only get string but also
get int and get float why it's actually just kind of annoying as we'll soon
see to get back an integer or a float from a user and just make sure that it's
an INT and a float and not like a word like cat or dog or some string that's not
actually a number well we can import not just the specific function get string
but we can actually import all of
these functions one at a time like this as we'll soon see or you can even in
Python import specific functions from a file one of you asked a while back
when you import when you include something like cs50.h or stdio.h
you're actually getting all of the code in that file which potentially can add
bulk to your own program or time in this case when you import specific
functions from python you can be a little more narrowly uh precise as to what
it is you want to have access to all right so with that
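The two import styles mentioned above can be compared in a short sketch. Python's built-in math module is used here purely as a stand-in for the cs50 library, so that the example runs without installing anything.

```python
# Style 1: import one specific function by name, then call it directly.
from math import sqrt

# Style 2: import the whole module, then reach functions through its name.
import math

root_a = sqrt(16)        # no prefix needed
root_b = math.sqrt(16)   # prefixed with the module name

print(root_a, root_b)
```

In either form Python still loads the module; what changes is which names become available in your own file.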
said let's go ahead and see what conditionals look like in Python so in the left
hand side again here we'll see Scratch and here for instance was just kind of a
contrived example asking if x is less than y then say x is less than y in C it
looked like this in Python now it's going to look like this instead and here's
before in C and here's after and just to call out a few of the obvious
differences what has changed in Python for conditionals it would seem sort of
what's the difference
yeah yeah so there's no more curly braces and indeed you don't use those
what appears to be taking their place if you might infer what seems to have
taken their place what do you think so the colon at the end of this line here
but also even more important now is this indentation below it so some of you
and we know this from Office hours have a habit of like uh indenting
everything on the left right and it's just kind of this crazy mess to look at
frustrating for you surely but C and clang is pretty
tolerant when it comes to things like whitespace in a program python uh-uh
they realized years ago that let's help humans help themselves and just
require standard indentation so four spaces would be the norm here but
because it's indented below that colon that indeed indicates that this now is
part of that condition something else has gone missing versus C in this
conditional what else is a little simplified yeah so no more parentheses you
can still use them especially when you need to logically to do order of
operations like in math but in this case if you just want to ask a simple
question like if x less than y you can just do it like that how about when you
have an if else well this is almost the same here with these same changes in
C this looked like this and it's starting to get a bit bulky at least if we use our
curly braces in this way in Python we can tighten things up further even
though strictly speaking in C you don't always need the curly braces but here
gone are the parentheses again gone are
the curly braces indentation is consistent and we've just added another
keyword else with a colon but no more semicolons as well how about
something larger like this and if else if else this one's a little curious but in C
it looked like this if else if else in Python it now looks like this and there's
perhaps one curiosity here that honestly all these years later I still can't
remember how to spell it half the time what's weird about this what do you
spot as different uh yeah over
here yeah instead of else if it's elif why apparently else space if was just too
many keystrokes for humans to type so they condensed it into this way
probably means it's a little more distinguishable too for the computer
between the if and the else too but just something to remember now it's
indeed elif and not else if all right so what about variables in Python I've used
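The three-way conditional described above can be written out as follows. This is a small sketch; wrapping it in a function named compare is an addition made here so the result can be returned and checked, and is not part of the slide.

```python
def compare(x, y):
    # No parentheses around the condition, no curly braces:
    # the colon and the four-space indentation mark each branch.
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))
print(compare(2, 1))
print(compare(1, 1))
```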
a couple of them already but let's let's um distill exactly how you define uh
and declare these things as well so in
incremented by one you have a few different ways here in C we saw syntax
like this where you can say counter equals counter plus one which again
feels like illogical how can counter equal counter plus one but again we read
this code really right to left updating its value by one um in Python it's
almost the same you just get rid of the semicolon so that logic is there but
recall in C we could do something slightly different that we can also do in
Python in Python you can also more succinctly do this plus equals and then
whatever number you want to add or you can even change it to subtract if
you prefer sadly gone is something you've probably typed a whole lot what
was the other way you can add one plus plus is no more sadly in Python just too
many ways to do the same thing so they got rid of it in favor of just this
syntax here so keep that in mind as well what about loops when you want to
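The ways of updating a counter mentioned above look like this in Python. A minimal sketch, starting the counter at zero for illustration:

```python
counter = 0

counter = counter + 1   # same logic as in C, just without the semicolon
counter += 1            # shorthand for the line above
counter -= 1            # the same shorthand works for subtraction

# There is no ++ or -- operator in Python; use += 1 and -= 1 instead.
print(counter)
```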
do something in Python again and again well in scratch in week zero here's
how we meowed three times specifically in C
we had a couple of ways of doing this this was like the more mechanical
approach where you create a variable called I you set it equal to zero you
then do while I is less than three the following and then you yourself
increment I again and again mechanical in the sense that like you have to
implement all of these gears and make them turn yourself but this was a
correct way to do that in Python we can still achieve the same idea but we
don't need the int keyword we don't need any of the semicolons we don't
need the
parentheses we don't need the curly braces we can't use the plus plus so maybe
that's a minor step backwards if you're a fan but otherwise the code the logic
is exactly the same but there's other ways to achieve this same idea recall
that in C we could also do this you could use a for loop which does exactly
the same thing both are correct both are arguably well designed it's kind of
to each their own when it comes to choosing between these in Python
though we're going to have to think
through how to do this so you don't do the same for loop as in C the closest I
could come up with is this where you say for i or whatever variable you
want to do the counting in literally the preposition and then you use square
brackets here and we've used square brackets before in the context of like
arrays and things like that and the 0 1 2 looks like an array in some sense
even though we've also seen arrays with curly braces but these square
brackets for now denote a list python does not have
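The two loops described above, the mechanical while loop and the for loop over a hard-coded list, can be sketched as follows. The count variable is an addition here, only so that the number of iterations can be checked.

```python
# Mechanical version: create, test and increment the counter yourself.
i = 0
while i < 3:
    print("meow")
    i += 1

# List version: i takes each value in the square-bracketed list in turn.
count = 0
for i in [0, 1, 2]:
    print("meow")
    count += 1
```

Each loop prints meow three times; after the second loop, i holds the last value in the list.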
never seen python before how does this example not end well yeah yeah like
if you're making a large list you have to type out each one of these numbers
like comma three comma four comma 5 comma dot dot dot 50 comma dot
dot dot 500 like surely that's not the best solution to have all of these
numbers on the code on the screen wrapping endlessly on the screen so in
Python another way to do this would be to use a function called range which
technically is a data type unto itself and this returns to you as many values
as you ask for it range takes some other arguments as well but the simplest
use case here is if you want back the numbers 0 1 and two a total of three
values you say hey python please give me a range of three values and by
default they start at zero on up but this is more efficient than it would be to
hardcode the entire list at once and the best metaphor I could come up with is
something like this like here for instance is a deck of cards this is sort of
normal human size and there's presumably 52 cards here so writing out
0 through 51 in code would be a little ridiculous for the reasons you know it
would just be very unwieldy and ugly and wrapping and all of that it would
be the phys it would be the virtual equivalent of me like handing you all of
these cards at once to just deal with and right you know they're not that big
but like it's a lot of cards to hold on to it requires a lot of memory or physical
storage if you will what range does metaphorically is if you ask me for three
cards I hand you them one at a
time like this so that at any point in time you only have one number in the
computer's memory until you're handed the next the alternative the previous
version would be to hand me all three cards at once or all 52 cards at once
but in this case range is just way more efficient you can do range of a
thousand that's not going to give you a list of a thousand values all at once
it's going to give you a thousand values one at a time reducing memory
significantly in the computer itself all right so besides
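The card metaphor above can be made a little more concrete. The sketch below loops with range and then compares the memory footprint of a range object against an equivalent list using sys.getsizeof; exact byte counts vary by Python version and platform, so only the comparison is shown, not specific numbers.

```python
import sys

# range(3) hands back 0, 1 and 2 one at a time.
for i in range(3):
    print("meow")

# Forcing all the values into a list at once, for comparison.
numbers = list(range(3))
print(numbers)

# A range object stores only its start, stop and step,
# so it stays small no matter how many values it represents.
small = sys.getsizeof(range(1000))
large = sys.getsizeof(list(range(1000)))
print(small < large)
```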
this what about doing something forever in scratch well we could do this
literally with a forever block which didn't quite exist in C in C we had had to
hack it together by saying while true because true is by definition always
true so this just in uh deliberately induces an infinite Loop for us in Python
the logic's going to be almost the same and infinite Loops in Python tend to
actually be even more common because you can always break out of them
as you could in C in Python it looks
like this and this is slightly more subtle but gone are the curly braces gone are
the parentheses but ever so slight difference it's a capital T for True and it's
going to be a capital f for false stupid little differences eventually you're
going to mistype one or the other but these are the kinds of things to keep
an eye out and to start recognizing in your mind's eye when you read code
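A deliberately infinite loop, and one way out of it, might look like the sketch below. Stopping after three meows is an arbitrary choice made here so the example terminates.

```python
i = 0
while True:          # capital T; lowercase true is not defined in Python
    print("meow")
    i += 1
    if i == 3:
        break        # leave the otherwise infinite loop
```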
questions now on any of these building blocks yeah I in the for Loop was I re
uh it was set to zero on the first iteration then one
on the next then two on the third and the same thing for range it just doesn't
use up as much memory all at once other questions now on any of these
building blocks of python no all right well let's go ahead and build something
a little more than hello let me propose that over here we Implement maybe
the the simplest of calculators here so let me go back to vs code here open
my terminal uh window and open up say a file called calculator.py and in
calculator.py we'll have an opportunity to explore some of
these building blocks but we'll allow things to escalate pretty quickly to more
interesting examples so that we can do the same thing ultimately as well and
in fact let me go ahead and do this moreover I've brought some code with
me in advance uh for instance something called calculator0.c from the first
week of C and let me go ahead and split my window here in fact so that I can
now do something like uh this let me move this over here here calculator.py
so now I have on the left of my screen
calculator.c or calculator0.c because that's the first version I made and
calculator.py on the right let me go ahead and implement really the same
idea here so on the right hand side the analog of including cs50.h would be
from cs50 import get int if I want to indeed use this function now I'm going to
go ahead and give myself a variable X without defining its type I'm going to
use this get int function and I'm going to prompt the user for X just like in C
I'm then going to go ahead and prompt
the user for another int like y here just like in C and at the very end I'm going
to go ahead and do print X Plus Y and that's it now granted I have some
comments in my C version of the code just to remind you of what each line is
doing but I've still distilled this into like six lines or really four if I get rid of
the blank line so it's already perhaps a bit tighter here here but there's also
it's tighter because something really important historically is missing what
did I seem to Omit
altogether that we haven't really highlighted yet yeah yeah the main
function is gone and in fact maybe you took for granted that it just worked a
moment ago when I wrote hello but I didn't have a main function in hello
either and this too is a feature of python and a lot of other languages as well
instead of having to adhere to these long-standing Traditions if you just want
to write code and get something done fine just write code and get something
done without necessarily all of the same boiler plate so whatever
is in your python file left indented if you will by default is just going to be the
code that The Interpreter runs top to bottom left to right well let me go
ahead now and run code like this let me go ahead and open that back up my
terminal window run python of calculator.py and I'll do X is one y is 2 and as
you might expect it gives me three slight aesthetic bug I put my space in the
wrong place here so that's a new mistake let me fix that aesthetically let me
rerun python of calculator.py type in one type in two
and voila there is now my same version again but let me propose now that
we get rid of this training wheel we don't want to keep taking one step
forward and then two steps back by adding these training wheels so let me
instead do this in my version of calculator.py suppose that we take away
already the training wheel that is the cs50 library here and let me instead
then use just Python's built-in function called input which literally does just
that it gets input from the user and it stores it as before in X and
Y so this is not cs50 specific this is real world Python Programming well let
me go ahead and run again python of calculator.py and of course if x is one
and Y is 2 X + Y should of course still be three hm it's apparently 12
according to python until cs50's Library gets involved but does anyone want
to infer what just went wrong yeah exactly the input function by design
always returns a string of text after all that's what the human typed in and
even though yes I type the number
keys on the keyboard it's still coming back as all text now maybe we should
use like a get_int function well that doesn't exist in Python all you can do is
get textual input a string from the user but we can convert one to the other
and so a fix for this so that we don't accidentally concatenate that is join x + y
together would be to do something like this let me go back to my python
code here and whereas in C we could previously do type casting we could
convert one type to another that
generally wasn't the case when you were doing something complex like a
string to an int you could do a char to an int and vice versa but for a string
recall there was a special function in the ctype library called atoi like ASCII
to integer that's the closest analog here and in fact the way to do this in
Python would be to use a function called int which indeed is the name of the
data type too even though I have not yet had to type it and I can convert the
output of the input function automatically from a
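A minimal sketch of the bug and the fix just described. It assumes hard-coded strings in place of what input would return, so that it runs without anyone typing:

```python
# Stand-ins for what input("x: ") and input("y: ") hand back when the
# user types 1 and 2. This is an assumption so the sketch runs unattended.
x = "1"
y = "2"

# Both values are strings, so + joins (concatenates) them.
print(x + y)            # 12

# Passing each string through int() gives integers, so + adds them.
print(int(x) + int(y))  # 3
```

In the interactive program the same conversion would wrap each call, for example int(input("x: ")).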
with our calculator instead of just addition let me go ahead and do how
about uh division instead of addition let's do division instead so z equals x / y
thereby giving me a third variable z let me go ahead and run python of
calculator.py again I'll type in one I'll type in three this time and
what problem do you think we're about to see or is it gone what happened
when I did this in C albeit with some slightly more cryptic syntax when I
divided one number like one divided by
three anyone recall yeah yeah so it would round down to the nearest integer
whereby you experience truncation so if you take an integer like one you divide it
by another integer like three that technically should be 0.33333 infinitely
long but in C recall you truncate the value if you divide an int by an int
you get back an int which means you get only the integer part which was
the zero now python actually handles this for us and avoids the truncation
but it leaves us still with one other problem here which is
going to be for instance not necessarily visible at a glance this looks correct this
has solved the problem in C so truncation does not happen the integers are
automatically converted to a float a floating point value but what other
problem did we trip over back in week uh one what else got a little dicey
when dealing with simple arithmetic anyone recall well the syntax in
Python's a little different but let me go ahead and do this it turns out in
Python if you want to see more significant digits than what I'm seeing
here by default which is a dozen or so let me go ahead and print out z as
follows let me first print out a format string because I want to format Z in an
interesting way and notice this would have no effect on the difference this is
just a format string that for no compelling reason at the moment is
interpolating z in those curly braces using an F string or format string if I run
this again with one and three we'll see indeed the exact same thing but
when you use an F string you indeed have the
ability to format that string more precisely just like with percent f in C
you could start to fine-tune how many significant digits you see
in Python you can do the same but the syntax is a little different if you
want the computer to interpolate z and show you 50 significant digits that is
50 numbers after the decimal point syntax is similar to C but it's a little
different you literally put a colon after the variable's name point 50 means show
me the decimal point and then 50 digits to the
right and the f just indicates please treat this as a floating point value so
now if I rerun python of calculator.py divide 1 by 3 unfortunately python has
not solved all of the world's problems for us this again was an example of
floating point imprecision so that problem is still latent so just because the
world has advanced doesn't necessarily mean that all of our problems from C
have gone away there are solutions using third-party libraries for scientific
calculations and the like
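A short sketch of the division example, assuming the values 1 and 3 are hard-coded rather than typed in. The names default and precise are illustrative only, and the // line is an aside that the lecture does not show:

```python
x = 1
y = 3

# Dividing an int by an int with / gives a float in Python; no truncation.
z = x / y

default = str(z)         # the digits print(z) shows by default
precise = f"{z:.50f}"    # colon, then .50f: 50 digits after the decimal point

print(default)
print(precise)           # the trailing digits are not all 3s: imprecision

# Aside: C-style truncating division is still available via //.
print(x // y)            # 0
```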
but out of the box floating point imprecision is still an issue meanwhile there
was one other problem in C that we ran into involving numbers and that was
this integer overflow recall that an integer in C only took up what like 32 bits
typically which meant you could count as high as four billion or maybe if
you're doing positive and negatives as high as two billion after which weird
things would happen the number would go to zero or negative or just it
would overflow or wrap back around well
wonderfully in Python they did at least address this whereby you can count
as high as you want and python will just use more and more and more and
more bits and bytes to store really big numbers so integer overflow is not a
thing with that said python is limited to how many digits it will show you on
the screen at once as a string but mathematically your math will be correct
now so we've taken a couple steps forward One Step sideways but indeed
we've solved some of our problems here all right questions now
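To see the absence of integer overflow concretely, here is a small sketch. The particular numbers are just illustrative choices:

```python
# A 32-bit int in C tops out near 2 billion (signed) or 4 billion (unsigned).
big = 2 ** 32
print(big)          # 4294967296, already past the 32-bit unsigned maximum

# Python keeps going, using more memory as the value grows.
print(big * big)    # 18446744073709551616

# Doubling 100 times never wraps around to zero or a negative number.
n = 1
for _ in range(100):
    n *= 2
print(n == 2 ** 100)  # True
```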
now on any of these examples thus far question all right well how about uh
how about another problem that we encountered in C let's revisit it here in
python as well so let me go ahead and on the left hand side here let me open
up a file called say compare let's see uh how about a file called compare3.c
on the left and let me go ahead and create a new file on the right called
compare.py because recall that bad things happened when we needed to
compare two values in C so on the left here is a reminder of what we once
did
in C whereby if we want to compare values we can get an int in C stored in x
get an int in C stored in y we then have our familiar conditional logic here just
printing out if x is less than y or not well we can certainly do the same thing
ultimately in Python by using some fairly familiar syntax and let's just
demonstrate this one quickly let me go over here too I'll do from cs50 import
uh get_int even though I could do this instead with the input function itself x
equals get_int and I'll prompt the user
for that y equals get_int and I'll prompt the user for that after that recall that I
can say without parentheses if x is less than y then print out without the f uh
x is less than y then I can go ahead and say else if x is greater than y I can
print out uh quote unquote x is greater than y if you'd like to interject now
what did I screw up anyone yeah elif right so elif x is greater than y else
this part's the same print x is equal to y so there's not all that much new
there's no new
logic going on here but at least syntactically it's a little cleaner indeed this
program is only 11 lines long albeit without any comments let me go ahead
and run python of compare.py let's see is 1 less than two indeed let's run it
again is 2 less than 1 no it's greater than and let's lastly type in one and one
twice x is equal to y so we've got a pretty side by side one-to-one conversion
here let's do something a little more interesting then and see how about I
open instead something where we
actually going to be a little bit different here let me go ahead and in the
python version of this let me do something like this uh we'll use get_string uh
actually no we'll just use input in this case so let's do uh s equals input and
we'll ask the user the same thing do you agree question mark then let's go
ahead and say if s equals equals how about uh capital Y huh how do I do this well a
few things turns out I'm going to do this or s equals equals little y then I'm
going to go
ahead and print out agreed and elif s equals equals capital N or s equals
equals lowercase n I'm going to go ahead and print out not agreed and I
claimed for the moment that this is identical now to the program on the right
the program on the left in C but what's different so we're still doing the same
kind of logic these equal equals for comparing for equality but notice that
nicely enough python got rid of the two vertical bars and it's just literally the
word or if you recall seeing
if you want to manipulate an individual character you use a string that is to say
a str of size one now in Python you can use single quotes or double quotes
I'm deliberately using double quotes everywhere just for consistency with
how we treat strings in C it's pretty common though to use single quotes
instead if only because on most keyboards you don't have to hold the shift
key anymore I mean humans have really started to optimize just how quickly
they want to be able to code so using a single quote
tends to be pretty popular in Python and other languages as well they are
fundamentally the same uh single or double unlike in C where they have
meaning so this is correct I claim and in fact let me run this real quick I'll
open up my terminal window here let me get rid of the version in C run
python of agree.py and I'll type in capital Y okay I'll run it again and type in little y
and I'll stipulate it's going to work for no as well but this isn't necessarily the
only way we can do this
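A sketch of this first version of agree.py. To keep it runnable without a keyboard it wraps the logic in a hypothetical function agree that takes the answer as an argument, and the exact message strings are assumptions:

```python
def agree(s):
    """Return the message for the answer s, or None for anything else."""
    # == compares strings for equality; the word "or" replaces C's ||.
    if s == "Y" or s == "y":
        return "Agreed."
    elif s == "N" or s == "n":
        return "Not agreed."
    return None


print(agree("Y"))  # Agreed.
print(agree('n'))  # Not agreed. (single quotes make the same kind of string)
```

In the interactive program the argument would come from input("Do you agree? ").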
there are other ways to implement the same idea and in fact I can go about
doing this this instead let me go back up to my code here and we saw a hint
of this earlier we know that lists exist in Python and you can create them just
by using square brackets so what if I simplify the code a little bit and just say
if s is in the following list of values capital Y or lowercase y it's not all that
different logically but it's a little tighter it's a little more compact so elif s is in
capital N or
lowercase n I can express that same idea too so here again it's just getting a
little more pleasant to write code there's less like hitting of the keyboard you
can express yourself a little more succinctly and using the keyword in, Python
will figure out how to search the entire list for whatever the value of s is and
if it finds it it will return true automatically else it will return false so if I run
agree.py again and type in capital Y or lowercase y that still now works well I
can
tighten this up further if I want to add more features well what if I want to
support not just y big Y and little y but how about yes or Yes or in case the
user's yelling or you know someone who isn't really good with caps
lock types in YES wait a minute but it could be weird like do we want to
support this or this I mean this just gets really tedious quickly
combinatorically if you consider all of these possible permutations what
would be smarter than doing something like this if you want to
just be able to tolerate yes in any form of capitalization like logically what
would be nice maybe whatever the input is you just force it to all lower and then
exactly super common paradigm why don't we just force the user's input to
all lowercase or all uppercase doesn't matter so long as we're self-consistent
and just compare against all uppercase or all lowercase and that will get rid
of all of the possible permutations otherwise now in C we might have done
something like this we might have
simplified this whole list and just said let's say uh we'll do how about
lowercase so y or yes and we'll just leave it at that but we need to force now
s to lowercase well in C we would have used the ctype library we would have
done like tolower and called that function passing it in although not really
cuz in ctype those operate on individual characters or chars not whole
strings we actually didn't see a function that could convert the whole
string in C to lowercase but in Python
we're going to benefit from some other feature as well it turns out that
python supports what's called object-oriented programming and we're only
going to scratch the surface of this in cs50 but if you take a higher level
course in programming or CS you explore this as a different paradigm up
until now in C we've been focusing on what's called really procedural
programming you write procedures you write functions top to bottom uh left
to right and when you want to change some value we were in the
habit of using a procedure that is a function you would pass something like a
variable into a function like toupper or tolower and it would do its thing and
hand you back a value well it turns out that it would be nicer programming
wise if some data types just had built-in functionality like why do we have
our variables over here and all of our helper functions like toupper and
tolower over here such that we constantly have to pass one into the other it
would be nice to sort of bake into our data types built-in
functionality so that you can change variables using their own
built-in functionality and so object-oriented programming otherwise known as
OOP is a technique whereby certain types of values like a string AKA str not
only have properties inside of them attributes just like a struct in C your data
can also have functions built into them as well so whereas in C which is not
object-oriented you have structs and structs can only store data like a name and a
number when implementing a person in Python you can for instance
have not just a structure otherwise known as a class storing a name and a
number you can have a function like call that person or email that person or
actual verbs or actions associated with that piece of data now in the context
of strings it turns out that strings come with a lot of useful functionality and
in fact this URL here which is in docs. [Link] which is the official
documentation for python you'll see a whole list of methods that is functions
that come with strings that you can
actually use to modify their values and what I mean by this is the following if
we go through the documentation poke around it turns out that strings come
with a function called lower and if you want to use that function you just
have to use slightly different syntax than in C you do not do tolower and you
do not say as I just did lower because this function is built into s itself and
just like in C when you want to go inside of a variable like a structure and
access a piece of data inside of it like name or
number when you also have functions built into data types AKA methods a
method is just a function that is built into a piece of data you can do s dot
lower open paren close paren in this case and I can do this down here as well
elif s dot lower in quote unquote n or no the whole thing I can force this
whole thing to lowercase so the only difference here now is in object-oriented
programming instead of constantly passing a value into a function you just
access a function that's inside of the value it just works
because of how the language itself is defined and the only way you know
that these functions exist is the documentation a class a book a website or
the like questions now on this technique all right I claim this is correct now
even though you've never programmed most of you in Python before not
super well-designed there's a subtle inefficiency now on lines three and five
together what's dumb about how I've used lower might you think yeah yeah
if you're going to use the same function twice and ask the same
question expecting the same answer why are you calling the function itself
twice maybe we should just store the result in a variable so we could do this
in a couple of different ways we for instance could go up here and create
another variable called t and set that equal to s.lower and then we could
just change this to be t here but honestly I don't think we technically need
another variable altogether here I could just do something like this let's
change the value of s to be the lowercase version
thereof and so now I can quite simply refer to S again and again like this this
reusing that same value now to be sure I have now just lost the user's
original input and if I care about that if they typed in all caps I have no idea
anymore so maybe I do want to use a separate variable altogether but a
takeaway here too is that strings in Python are technically what we'll call
immutable that is they cannot be changed this was not true in C once we
gave you arrays in week two or memory in week
four you could go to town on a string and change any of the characters you
want upper casing lower casing changing it shortening it and so forth but in
this case uh this returns a copy of s forced to lowercase it doesn't change the
original string that is the memory the bytes in the computer's memory when
you assign it back to S you're essentially forgetting about the old version of s
but because python does memory management for you there's no malloc
there's no free python automatically frees up the original
bytes like YES and hands them back to the operating system for you all right
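A sketch pulling together the list membership test, the lower method, and the point about immutability. As an assumption for runnability, the logic lives in a hypothetical function agree and the input is hard-coded; the message strings are also assumptions:

```python
def agree(s):
    """Return the message for the answer s, ignoring capitalization."""
    # lower() returns a lowercase copy; assigning it back rebinds the name s.
    s = s.lower()
    # "in" searches the list for the value of s.
    if s in ["y", "yes"]:
        return "Agreed."
    elif s in ["n", "no"]:
        return "Not agreed."
    return None


answer = "YES"
print(agree(answer))   # Agreed.
print(answer)          # YES (the original string was never modified)
```

Calling lower once and storing the result avoids calling the method twice, which is the inefficiency pointed out above.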
questions now on this technique questions on this in general I'll call out the
python documentation will start to be your friend because in class we'll only
scratch the surface with some of these things but in docs. [Link] for
instance there's a whole reference of all of the built-in functions that come
with the language as well as for instance those with the string all right well
let me go ahead and before we take
a break let's go ahead and create something a little familiar too based on our
week here in C let me propose that we revisit those examples involving
some meow so for instance when we had our cat meow back in the first
week and then second in C we did something that was a little stupid at first
whereby we created a file as I'll do here this time called meow.py and if I
want a cat to meow three times I could run it once like this little copy paste
and now python of meow.py and I'm done now
we've visited this example like two times at least now in Scratch and in C
it's correct I'll stipulate but what's obviously poorly designed what's the fault
here yeah it should just be a loop right like why type it three times literally
copying and pasting is almost always a bad thing except in C when you have
the function prototypes that you need to borrow but in this case this is just
inefficient so what could we do better here in Python well in Python we could
probably change this in a few different
ways we could borrow some of the syntax we proposed in slide form earlier
like give me a variable called I set it to zero no semicolon while I is less than
three if I want to do this three times I can go ahead and print out meow and
then I can do i plus equals one and I think this would do the trick python of
meow.py and we're back in business already well if I wanted to change this to a
for loop well in Python it would be a little tighter but this would not be the
best approach so for i in uh 0 1 2 I could just do print meow like this
and that too would get the job done but to my to our discussion earlier this
would get stupid pretty quickly if you had to keep enumerating all of these
values like what did we introduce instead the the range function exactly so
that hands me back way more efficiently just the values I want indeed one at
a time so even this if I run it a third a third or fourth time we've got the same
result but now let's transition to where we went with this back in the day how
can we start to modularize this like just like it would
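The three loop versions of meow.py described so far can be sketched side by side. Each one prints meow three times:

```python
# 1. A while loop with an explicit counter, as in the earlier slides.
i = 0
while i < 3:
    print("meow")
    i += 1

# 2. A for loop over a hand-written list: correct, but tedious to extend.
for i in [0, 1, 2]:
    print("meow")

# 3. A for loop over range, which hands back 0, 1, 2 one at a time.
for i in range(3):
    print("meow")
```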
you literally say def to define a function you give it a name like meow and
now I'm going to go ahead and in this function just print out meow and
this lets me change it to anything else I want in the future but for now it's an
abstraction and in fact I can uh move it out of sight out of mind just going to
hit enter a bunch of times to pretend like now it exists but I don't care how it
is implemented and up here now I can do something like this for i in range of
three let me go ahead and not print meow
anymore let me just call meow and tightening up my code further but I think
let's see python of meow.py this is I think going to be the first time it does
not work correctly okay so here we have sadly our first python error and let's
see the syntax is going to be different from C or clang's output traceback is
like the term of art here this is like a trace back of all of the lines of code that
were just executed or really functions you called the file name is
uninteresting this is like my codespace
specifically but the file name is important here meow.py uh line two
is the issue okay I didn't get very far before I screwed up and then there's a
name error and you'll see in Python there's typically these capitalized uh
keywords that hint at what the issue is it's something related to names of
variables meow is not defined all right you're programming python for the
first time you've screwed up you're following some online tutorial you're
seeing this
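A sketch that reproduces the error message at this point in the lecture. As an assumption to keep it runnable, the failing call is wrapped in try and except so the message is printed instead of halting the program with a traceback; the name caught is illustrative:

```python
try:
    for i in range(3):
        meow()  # this line runs before Python has seen any def meow
except NameError as error:
    caught = str(error)
    print(caught)  # name 'meow' is not defined


# The definition only appears down here, further along in the file.
def meow():
    print("meow")
```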
reason through it like why might meow not be defined what can we infer
about python how to troubleshoot logically is it defined after maybe is it
because meow is defined after you know as smart as python seems to be
vis-a-vis C they have some similar design characteristics so let's try that so let
me scroll all the way back down to where I move this earlier let me uh get rid
of it way down there I'll copy it to my clipboard and let me just kind of hack
something together let me just put it up
here and let's see if this works so now let me clear my terminal run python of
meow.py okay we're back in business so that was actually really good intuition
good debugging technique to sort of reason through it now this is kind of
contradicting what I claimed back in week one which was that you know the
main part of your program ideally should just be at the top of the file like
don't make me look for it it's not a huge deal with like a four-line program but
if you've got 40 lines 400 lines you
don't want like the juicy part of your program to be way down here and
all of these functions way up here so it would be nice maybe if we actually
have a main function and so it actually turns out to be a convention in
Python to define a main function it's not a special function that's
automatically called like in C but humans realized you know what that was a
pretty useful feature let me Define a function called main let me indent these
lines underneath it let me practice what I'm
preaching which is put the main code at the top of the file and wonderfully in
Python now you do not need prototypes there's none of that hackish copying
and pasting of the return type the name and the arguments to a function like
we needed in C this is now okay instead except for one Minor Detail let me
go ahead and run python of meow.py hopefully now I've solved this problem by
having a main function but now nothing has happened all right even if you've
never programmed in Python before What might explain this
behavior and how do I fix again when you're off in the real world learning
some new language all you have is deductive logic to debug yeah I
remember right so the solution to be clear in C was that we had to put the
Prototype up here otherwise we'd get an error message in this case I'm
actually not getting an error message and indeed I'll claim that you don't
need the prototypes in Python just not necessary because that was annoying
if nothing else but what else might explain yeah
I'm back yeah maybe you have to call main itself if main does not have some
special status in Python maybe just because it exists isn't enough and indeed
if you want to call main the new convention is actually going to be as the
very last line of your program typically to literally call main it's a little stupid
looking but you know they made a design decision and this is how
now we work around it python of meow.py now we're back in business but now
logically why does this work the way it does well in
this case top to bottom line one is telling python to define a function called
main and then define it as follows lines two and three but it's not calling
main yet line six is telling python how to define a function called meow but
it's not calling these lines yet now line 10 you're telling python call main and
at that point python has been trained if you will to know what main is on line
one to know what meow is on line six and so it's now perfectly okay for
main to be above meow because
you never called them yet you defined and then you called and
that's the logic behind this any questions now on the structure of this
technique here now let's do one more then recall that the last thing we did in
Scratch and in C was to actually parameterize
this same function so suppose that you don't want main to be responsible
for the loop here you instead want to very simply do something like meow
three times and be done with it well in Python it's going to be
similar in spirit to C but again we don't need to keep mentioning data types if
you want me now to take some argument like a number n you can just
specify n as the name of that argument or you could call it anything else of
course that you want you don't have to specify int or anything else in your
code now inside of meow you can do something like for I in let's say I
definitely now can't do this because like that would be weird to start the list
and end it with n so if I can come back over here what's
the solution how can I do something n times yeah using range so range is
nice cuz I can pass in now this variable n and now I can meow whoops now I
can print out quote unquote meow so it's almost the same as in scratch
almost the same as in C but it's a little simpler and if now I run meow.py I'll
have the ability now to do this here as well all right questions on any of this
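A sketch of where meow.py ends up, with main at the top, a parameterized meow below it, and the call to main as the last line:

```python
def main():
    meow(3)


def meow(n):
    # n needs no declared type; range(n) yields 0 through n - 1.
    for i in range(n):
        print("meow")


# Both functions are defined by now, so calling main here works even
# though main appears above meow in the file.
main()
```

No prototype is needed: the two def statements only define the functions, and nothing runs until the final line calls main.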
right now we're sort of like taking the stroll through week one we're going to
momentarily escalate things to look not
only at some of these basics but also other features like we saw with face
recognition with the speller or the like um because of how many of us are
here we have a huge amount of candy out in the lobby so why don't we go
ahead and take a 10-minute break and we come back we'll do even fancier
more powerful things with python in 10 all right so we are back among our
goals now are to introduce a few more building blocks so that we can solve
more interesting problems at the end much like those that
we began with you recall from a few weeks ago we played with this sort of
two-dimensional Super Mario World and we tried to print a vertical column of
like three or more bricks well let me propose that we use this as an
opportunity to now Tinker with some of Python's more uh useful more
user-friendly functionality as well so let me code a file called mario.py and
let's just print out like that the equivalent of that vertical column so it's of
height three each one is a hash so let's do for I in range of
three initially and let's just print out a single hash and I think now python of
mario.py voila we're in business printing out just that same pyramid there or
just that same column there what if though we want to print a column of like
some variable height where the user tells us how tall they want it to be well
let me go up here for instance and instead how about we'll use um let's do
this how about uh from cs50 import how about the get_int function as before
so it will deal with making sure
the user gives us an integer and now in the past whenever we wanted to get
a number from a user we've actually followed a certain Paradigm in fact if I
open up here for instance uh how about mario1.c from a
while back you might recall that we had code like this and we specifically use
the do while loop in C whenever we want to like get something from the
user maybe again and again and again until they cooperate at which point
we finally break out of the loop so it turns out
python does have while loops does have for loops does not have do while
loops and yet pretty much anytime you've gotten user input you've probably
used this Paradigm so it turns out that the python equivalent of this is to do
similar in spirit but using only a while loop and a common Paradigm in
python as I alluded earlier is to actually deliberately induce an infinite Loop
while True capital T and then do what you want to do like get an int from the
user and prompt them for the height for
instance in question and then if you're sure that the user has given you what
you want like n is greater than zero which is what I want in this case cuz I
want a positive integer otherwise there's nothing to print you literally just
break out of the loop and so we could actually use this technique in C it's just
not really done in C you could absolutely in C have done a while true loop
with the parentheses lowercase true you could break out of it and so forth
but in Python this is like the python
way and this is actually a term of art this way in Python is pythonic like this is
the way everyone does it quote unquote doesn't mean you have to but that's
sort of the way like the cool python programmers would Implement an idea
like this trying to do something again and again and again until the user
actually cooperates but all we've done is take away the do while loop but still
logically we can implement the same idea now below this let me go ahead
and just print out for I in range of n this time
because I want it to be variable and not three I can go ahead and print out
the hash let me go ahead and get rid of the C version here open my terminal
window and I'll run again python of mario.py I'll type in three and I get back
those three hashes but if I instead type in four I now get four hashes instead
so the takeaway here is quite simply that this would be the way for instance
to actually get back a value in Python that is consistent with some parameter
like greater than zero how about this let's
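A sketch of the while True idiom that stands in for C's do while loop. The lecture imports get_int from the cs50 library; here, as an assumption so the sketch runs unattended, a hypothetical stand-in with the same name returns canned answers instead of prompting:

```python
# Canned answers playing the role of an uncooperative, then cooperative, user.
answers = iter([-1, 0, 3])


def get_int(prompt):
    """Hypothetical stand-in for cs50's get_int: returns the next canned answer."""
    return next(answers)


# Deliberately loop forever, then break once the input is acceptable.
while True:
    n = get_int("Height: ")
    if n > 0:
        break

# Print a column of n bricks.
for i in range(n):
    print("#")
```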
and implement that abstraction so define a function now called get_height
it's not going to take any arguments in this design while True I can go ahead
and do the same thing as before assign a variable n the return value of
get_int prompting the user for that height and then if n is greater than zero I can
go ahead and break but if I break here I logically just like in C end up
executing below the loop in question but there's nothing there but if I want
get height to return the height what should
tighten this up a little bit logically and this is true in C I don't really need to
break out of the loop by using break recall that or know that I can actually
once I'm ready to go I can just return the value I care about even inside of
the loop and that will have the side effect of breaking me out of the loop and
also breaking me out of and returning from the entire function so nothing too
new here in terms of C versus python except for this issue of scope and I
indeed returned n at the
bottom there just to make clear that n would still exist so either of those are
correct now I just have a Python program that I think is going to allow me to
implement this same Mario idea so let's run python of mario.py and okay so
nothing happened uh python of mario.py what did I do wrong yeah I have to
call main so at the bottom of my code I have to call main here and this is a
stylistic detail that's been subtle um generally speaking when when you are
writing in Python um there's not a cs50
style guide per se there's actually a python style guide that most people
adhere to and in this case double blank lines between functions is the
norm I'm doing that deliberately although uh it might otherwise not be
obvious but now that I've called main on line 16 let's run mario.py once more
aha now we get there now we see it type in three and I'm back in business
printing out the values there yeah sure why do I need the if condition at all
why can't I just return n here as by
propose now to take away get int I claimed earlier that if you're not using get
int you can just use the input function itself from python but that always
returns a string or a str and so recall that you have to pass the output of the
input function to int either on the same line or if you prefer on another
line instead but it turns out what I didn't do was show you what happens if
you as the user don't cooperate with the program so if I run python
of mario.py now works great
even without the get int function and I can do it with four still works great but
let me clear my terminal and be difficult now as the user and type in cat for
the height instead enter now we see one of those tracebacks again this one
is different this isn't a name error but apparently a value error and if I kind of
ignore the stuff I don't understand I can see invalid literal for int() with base
10: 'cat' that's a super cryptic way of saying that cat is not a number in
decimal notation and so I would seem to
have to somehow handle this case and if you want to be more Curious you'll
see that this is indeed a trace back and um C tends to do this too or the
debugger would do this for you too you can see all of the functions that have
been called to get you to this point so apparently my problem is initially in
line 14 but line 14 if I keep scrolling is uninteresting it's main but line 14
leads me to execute line two which is indeed in main that leads me to
execute line nine which is in get height and
okay here's the issue so the closest line number to the error message is the
one that probably reveals the most line nine is where my issue is so I can't
just blindly ask the user for input and then convert it to an int if they're not
going to give me an int now how do we deal with this well back in problem
set two you might recall validating that the user typed in a number and using
a for loop and the like well it turns out there's a better way to do this in
Python and the semantics are kind of there if you
want to try to convert something to a number that might not
actually be a number turns out Python and certain other languages literally
have a keyword called try and if only this existed for the past few weeks I
know but like you can try to do the following with your code what do I want
to try to do well I want to try to execute those few lines except if there's an
error so I can say except if there's a value error specifically the one I screwed
up and created a moment ago and if there is a value error I can
print out an informative message to the user like not an integer or anything
else and what's happening here now is literally this operative word try the
python is going to try to get input and try to convert it to an int and it's
going to try to check if it's greater than zero and then try to return it why
all three of those lines are indented underneath the try block
except if something goes wrong specifically a value error happens then it
prints this but it doesn't return
anything and because I'm in a loop that means it's going to do it again and
again and again until the human actually cooperates and gives me an actual
number and so this too is what the world would call pythonic in Python you
don't necessarily rigorously try to validate the user's input make sure they
haven't screwed up you honestly take a more lackadaisical approach and just
try to do something but catch an error if it happens so catch is also a term of
art even though it's not a keyword here
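
The try-and-except loop described here can be sketched roughly as follows. This is a hedged reconstruction, not the lecture's exact file: the `read` parameter is an assumption added so the function can be fed scripted answers instead of waiting at a keyboard, and the prompt text is illustrative.

```python
def get_height(read=input):
    """Keep asking until the reader supplies a positive integer."""
    while True:
        try:
            n = int(read("Height: "))  # raises ValueError for text like "cat"
            if n > 0:
                return n  # returning also breaks out of the loop
        except ValueError:
            # Nothing is returned here, so the loop simply asks again.
            print("Not an integer")


# Hypothetical session: the user types cat, then dog, then 5.
answers = iter(["cat", "dog", "5"])
height = get_height(lambda prompt: next(answers))
print(height)
```

Calling `get_height()` with no argument falls back to the built-in `input`, which is how it would behave in an interactive run.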
except if something happens you handle it so you try and you handle it it's
sort of best effort programming if you will but this is baked into the mindset
of the Python uh programming community so now if I do python of mario.py
and I cooperate works great as before try and succeed three Works four
works if though I try and fail by typing in cat it doesn't crash per se it doesn't
show me an error it shows me something more user friendly like not an
integer and then I can try again with dog not an integer I
can try again with five and now it works so we won't generally have you
write much in the way of these try except blocks only because they get a
little sophisticated quickly but that is to reveal what the get int function is
doing this is why we give you the training wheels so that when you want to
get an INT you don't have to jump through all these annoying Hoops to do so
but that's all the library is really doing for you is just try and except you won't
be left with any training wheels
ultimately questions now on getting inputs and trying in this way anything at
all yeah the try block say that again oh could you put the condition outside of the
try block short answer yes and in fact I struggled with this last night when
tweaking this example to show the simplest version I will disclaim that really
I should only be trying literally to do the fragile part and then down here
I should be really doing what you're proposing which is do the condition out
here the problem is though
that logically this gets messy quickly right because except if there's a value
error I want to print out not an integer I can't compare n against zero then
because n doesn't exist because there was an error so it turns out and I'll
show you this this is now the advanced version of python there's actually an
else keyword you can use in Python that does not accompany if or elif it
accompanies try and except which I think is weirdly confusing a different
word would have been better but if you really
prefer I could have done this instead and this is one of these design
things where like reasonable people will disagree generally speaking you
should only try to do the one line that might very well fail but honestly this
looks kind of stupid now it's just unnecessarily complicated and so my own
preference was actually the original which was yeah I'm trying a few extra
lines that really aren't going to fail mathematically but it's just tighter it's
cleaner this way and here's again
the sort of like you know arguments you'll start to make yourself as you get
more comfortable with programming you'll have an opinion you'll disagree
with someone and so long as you can back your argument up pretty probably
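
For comparison, the alternative discussed above, where only the fragile conversion sits inside `try` and the comparison moves to an `else` clause, might look like the sketch below. As an assumption for the sake of a self-contained example, the function takes a `read` parameter so that scripted answers can stand in for keyboard input.

```python
def get_height(read=input):
    """Variant that tries only the one line that can actually fail."""
    while True:
        try:
            n = int(read("Height: "))
        except ValueError:
            print("Not an integer")
        else:
            # Runs only if the try block raised nothing, so n is defined here.
            if n > 0:
                return n


# Hypothetical session: the user types cat, then 0, then 4.
answers = iter(["cat", "0", "4"])
height = get_height(lambda prompt: next(answers))
print(height)
```

Both versions behave the same for the user; the difference is only in how narrowly the `try` block is drawn.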
all right so how about we now take away some piece of magic that's been
here for a while let me go ahead and uh Delete all of this here and let me
propose that we revisit uh not that vertical column and the exceptions that
might result from getting input but these like horizontal question marks
that we saw a while ago so I want all of those question marks on the same
line and yet I worry we're about to see a challenge here because print up
until now has been putting new lines everywhere automatically even without
those backslash n's well let me propose that we do this for i in the range of
four if I want four question marks let me just print four question marks
unfortunately I don't think this is correct yet let me run python of mario.py and
of course this gives me a column instead of the row of question marks
that I want so how do we do this well it turns out if you read the
documentation for the print function it turns out that print not surprisingly
perhaps takes a lot of different arguments as well and in fact if you go to the
documentation for it you'll see that it takes not just positional arguments that
is from left to right separated by commas turns out python supports a
fancier feature with arguments where you can pass the names of arguments
to functions too so what do I mean by this if I go back to
vs code here and I've read the documentation it turns out that yes as before
you can pass multiple arguments to print like this like hello comma David
comma Malan that will just automatically concatenate all three of those
positional arguments together they're positional in the sense that they
literally flow from left to right separated by commas but if you don't want to
just pass in values like that you want to actually print out as I did before a
question mark but you want to override the default behavior of print
by changing the line ending you can actually do this you can use the name of
an argument that you know exists from the documentation set it equal to
some alternative value and in fact even though this looks cryptic this is how I
would override the end of each line to be quote unquote that is nothing
because if you read the documentation the default value for this end
argument does someone want to guess is backslash n so if you read the
documentation you'll see that backslash n is the implied default for
this end argument and so if you want to change it you just say end equals
something else and so here I can change it to nothing and now rerun
python of mario.py and now they're all on the same line now looks a little
stupid cuz I made that sort of week one mistake where I still need to move
the cursor to the next line that's just a different problem I'm just going to go
over here and print nothing I don't even need to print backslash n because if
print automatically gives you a backslash n
just call print with nothing and you'll get that for free so let me rerun python
of mario.py and now it looks a little prettier at the prompt and to be super
clear as to what's going on suppose I want to sort of make an exclamation
here I could change the backslash n default to like an exclamation point just for
kicks and if I run python of mario. py again now I get this sort of you know
exclamation with question marks and exclamation points as well so that's all
that's going on here and this is what's
called a named argument it literally has a name that you can specify when
calling it in and it's different from positional in that you're literally using the
name let me propose something else though and this is why people kind of
like python there's just kind of cool ways to do things that's kind of a you
know it's a three line verbose way of printing out four question marks
you know I could certainly take the you know shortcut and just do this but
that's not really that interesting for anyone
especially if I want to do it a variable number of times but python does let
you do this if you want to uh multiply a character some number of times not
only can you use Plus for concatenation you can use star or an asterisk for
multiplication if you will that is concatenation again and again and again so if
I just print out quote unquote question mark times 4 that's actually going to
be the tightest way the most succinct way I can print four question marks
instead and if I don't use four I
use n where I get n from the user bang like now I've gotten rid of the for
loop entirely and I'm using the star operator to manipulate it instead and
to be super clear here insofar as python does not have malloc or free or
memory management that you have to do guess what python also doesn't
have anything on your minds the past couple of weeks doesn't have pointers
yes so python does not have pointers which just means that all of that
happens for you automatically underneath the hood Again
by way of code that someone else wrote how about one more throwback with
Mario we've talked about in week one this sort of two-dimensional structure
where it's like I claim like 3x3 a grid of bricks if you will well how can we do
this in Python we can do this in a couple of ways now let me go back to my
mario. py and let me do something like for I in range of we'll just do three
even though I know now I could use get int or I could use input and int and if
I want to do something two-dimensionally just like
in C you can nest your for loops so maybe I could do for j in range of three and
then in here I could print out a uh hash symbol and then let's see if that
gives me nine total so if I've got a nested loop like this python of mario.py
hopefully gives me a grid no it gave me a column of nine why logically even
though I've got my row and my columns yeah yeah the line ending so in my
row I can't let print just keep adding new line adding new lines so I just have
to override this here and let me not screw
up like before let me print one at the end of the whole row just to move the
cursor down and I think now together now we've got our 3x3 of course we
could tighten this up further like if I don't like the nested loop I probably
could go in here and just print out for instance a uh a brick times three or I
could change the three to a variable if I've gotten it from the user so I can
tighten this up further so again just different ways to solve the same problem
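
The Mario variations walked through above can be collected into one sketch. It is a hedged recap rather than the lecture's exact file; `grid` is a hypothetical helper name introduced here so the multiplied-string version can be checked on its own.

```python
# One row of four question marks, printed one character at a time.
for i in range(4):
    print("?", end="")  # end="" overrides the default "\n" line ending
print()  # print with no arguments emits just the newline

# The same row using * to repeat a string.
print("?" * 4)


def grid(size, brick="#"):
    """Return the rows of a size-by-size grid as a list of strings."""
    return [brick * size for _ in range(size)]


# A 3x3 grid with nested loops: suppress the newline inside each row,
# then print one newline when the row is finished.
for i in range(3):
    for j in range(3):
        print("#", end="")
    print()

# The tightened version: one multiplied string per row.
for line in grid(3):
    print(line)
```

Swapping the literal 3 or 4 for a variable obtained from the user gives the variable-size versions mentioned in the lecture.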
and again sort of evidence of why a lot of people
like python there's just some more pleasant ways to solve problems without
getting into the weeds constantly of doing things like with um uh for
loops and while loops endlessly all right well how about some other building
blocks lists are going to be so incredibly useful in Python just as arrays were
in C but arrays are annoying because you have to manage the memory
yourself you have to know in advance how big they are or you have to use
pointers and malloc or realloc to resize them like
oh my God like the past two weeks have been painful in that sense but
python does this all for free for you in fact there's a whole bunch of functions
that come with python that involve lists and they'll allow us ultimately um to do
things again and again and again uh with uh within the same data structure
and for instance we'll be able to get the length of a list you don't have to
remember it yourself in a variable you can just ask python how many
elements are in this list and with this I think we can solve
some old problems too so let me go back here to VS Code let me close
Mario and give us a new program called scores.py and rather than show the
C and the python now let's just focus on Python and in scores.c way back
when we just averaged like three test scores or something like that 72 73
and 33 a few weeks ago so if I want to create a list in this python version of
72 73 33 I just use my square bracket notation C let you use curly braces if
you know the values in advance but Python's just this
write all this darn code just to do something that you know Excel and Google
spreadsheets can just do like that well python is closer to those kinds of tools
but more powerful in that you can manipulate the data yourself how about
though if I want to um get a bunch of scores manually from the user and
then sum them together well let's combine a few ideas here how about this
first let me go ahead and uh import the get_int function from the
cs50 library just so we don't have to deal with try
and except or all of that and let me go ahead and give myself an empty list
and this is powerful in Python in C there's really there's no point to an empty
array because if you create an empty array with square bracket notation like
it's not useful for anything but in Python you can create it empty because
python will grow and shrink the list for you automatically as you add things
to it so if I want to get three scores from the user I could do something like
this for I in range of three and then I can
grab a variable called score or anything I could call get int prompt the human
for the score that they want to type in and then once they do I can do this
thinking back to our object-oriented programming capability now I could do
scores.append and I can append that score to it and you would only know this
from having read the documentation heard it in class in a book or whatnot
but it turns out that just like strings have functions like lower built into them
lists have functions like append built into them
that just literally appends to the end of the list for you and python will grow
or shrink it as needed no more malloc or calloc or realloc or the like so this just
appends to the scores array the scores list that score and then again and
again and again so the array starts at sorry the list starts at size zero then
grows to one then two then three without you having to do anything else and
so now down here I can compute an average with the sum of those scores
divided by the length of the total number of scores
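
A minimal sketch of the scores program as described so far. As an assumption made to keep the example self-contained, the three values 72, 73 and 33 from the earlier C version stand in for three calls to `get_int`, and `average` is a hypothetical helper name.

```python
def average(scores):
    """Sum of the elements divided by how many elements there are."""
    return sum(scores) / len(scores)


scores = []  # starts at length 0 and grows as values are appended
for score in (72, 73, 33):  # stand-ins for three get_int("Score: ") calls
    scores.append(score)

print(f"Average: {average(scores)}")
```

Because `len` asks the list for its own size, nothing in the program has to remember how many scores were entered.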
and to be clear length is the total number of elements in the list doesn't
matter how big the values themselves are now I can go ahead and print out
an f-string uh with something like average colon average in curly braces
and if I run python of scores.py I'll type in just for the sake of discussion
the three values I still get the same answer but that would have been painful
to do in C unless you committed in advance to a fixed size array which we
already decided weeks ago was annoying or uh you
uh grew it dynamically using malloc or realloc or the like all right what else
can I do well there's some nice things you might as well know exist um
instead of scores.append you can do slight fanciness like this like if you want to
append something to a list you can actually do plus equals and then put that
thing in a temporary list of its own and just use what is essentially
concatenation but not concatenation of strings but concatenation of lists so
this new line six appends to the scores
list this tiny little list I'm temporarily creating with just the current new score
so just another piece of syntax that's worth seeing that allows you to do
something like that as well all right well how about we go back to strings for
a moment and all these examples as always are on the course's website
afterward suppose we want to do something like converting characters to
uppercase well to be clear I could do something like this let me create a
program called uppercase.py let me
prompt the user for a before string as by using the input function or get
string which is almost the same and I'll prompt the user for a string
before and then let me go ahead and print out uh how about the word
after and then end the new line with nothing just so that I can see before on
one line and after on the next line and then let me do this and here's where
python gets pleasant too with loops for c in before print c.upper end equals
quote unquote and then I'll print this here
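
The program just typed can be sketched as below. As an assumption, it is wrapped in a hypothetical function `shout` and called with a fixed string so that it runs without waiting for keyboard input; in the lecture's version the string comes from `input`, so the line numbers referred to in the walkthrough will not match this sketch.

```python
def shout(before):
    """Print the uppercased version of before, one character at a time."""
    print("After:  ", end="")  # stay on the same line as the label
    for c in before:  # c is each character of the string in turn
        print(c.upper(), end="")
    print()  # finally move the cursor to the next line


shout("david")
```

In an interactive run, `shout(input("Before: "))` would reproduce the prompt-then-print behavior.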
all right that was fast but let's try to infer what's going on so line one just
gets input from the user stores it in a variable called before line two literally
just prints after but doesn't move the new line to uh the cursor to the next
line what it then does is this and in C this was a little more annoying you
needed a for loop with i you needed array uh notation with the square
brackets but python if you say for variable in string so for c for character in
string Python's going to
automatically assign c to the first letter that the user types in then on
the next iteration the second letter the third letter and the fourth so you
don't need any square bracket notation you just use c and python will do it
for you and just hand you back one at a time each of the letters that the user
has typed in so if I go back over here and I run for instance python of
uppercase.py and I'll type in how about uh David in all lowercase and hit
enter you'll now see that it's all uppercase instead
which indeed exists just like lower exists and then what I can go ahead and
print out is for instance uh let's get rid of this print line here and do it at the
end after and print the value of that variable so now if I rerun uppercase.py
type in David in all lowercase I can just uppercase the whole thing all at
once because again in Python you don't have to operate on characters
individually questions on any of these tricks up until now all right how
about a few other techniques that we saw in C that we'll
bring back now in Python so it turns out in Python there are other libraries you
can use too that unlock even more functionality so in C if you wanted
command line arguments you just change the prototype the signature for main to
be instead of void int argc comma string argv open brackets for
an array or char star eventually well it turns out in Python that if you want to
access command line arguments it's a little simpler but they're tucked away
in a library otherwise known as a module
called sys the sys or system module now this is similar in spirit to the cs50
library in that it's got a bunch of functionality built in but this one comes
with python itself so if I want to create a program like greet.py in VS Code
here let me go ahead and do this from the sys library let's import argv and
that's just a thing that exists it's not built into main because there is no main
per se anymore so it's tucked away in that library and now I can do
something like this if the length of
argv equals equals 2 well let's go ahead and print out something friendly like
hello comma argv bracket 1 and then close quotes else if the length of argv is
not equal to two let's just go ahead and print out hello world now at a glance
this might look a little cryptic but it's identical to what we did a few weeks
ago when I run this python of greet with no arguments it just says hello world
but if I instead add a command line argument like my first name and hit
enter now the length of argv is
no longer one it's going to be two and so it prints out hello David instead so
the takeaway here is that whereas in C argv technically contained the name
of your program like ./hello or ./greet and then everything the human typed
Python's a little different in that because we're using the interpreter in this
way technically when you run python of greet.py the length of argv is only one
it contains only greet.py so the name of the file it does not unnecessarily
contain python itself because what's the point
of that being there nonetheless it does contain the number of words that the
human typed after python itself so argv is length one here argv is length two
here and that's why when it did equal two I saw hello David instead of the
default hello world so same ability to access command line arguments add
these kinds of inputs to your functions but you have to unlock it by way of
using argv uh instead in this way if you want to see all of the words you
could do something like this uh just as if we combine ideas
here for i in range of how about length of argv then I can do this print argv
bracket I all right a little cryptic but line three is just a for Loop iterating over
the range of length of argv so if the human types in two words the length of
argv will be two so this is just a way of saying iterate over all of the words in
argv printing them one at a time so python of greet.py enter just prints out
the name of the program python of greet.py with David prints out greet.py
and then David I can keep
running it though with more words and they'll each get printed one at a time
but what's nice too about Python and this is the point of this exercise
honestly this looks pretty cryptic this is not very pleasant to look at if you
just want to iterate over every word in a list which argv is watch what I can
do I can do for arg or any variable name in argv let me just now print out
that argument I could keep calling it i but i seems weird when it's not a
number so I'm changing to arg as a word instead if
of lists even though I get the terminology confused if argv is a list then it's
going to print out everything in it but if I want a slice of it that starts at
location one all the way to the end you can use this funky syntax in between
the square brackets which we've not seen yet that's going to start at item
one and go all the way to the end and so this is a nice clever way of slicing
off if you will the very first element because now when I run greet.py David
Malan I should only see David and
Malan if I only want one element I could do one to two if I want all of them I
could do zero onward I could give myself just two of one of them in this way
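
The command-line ideas so far (the greeting, the two loops, and slicing) might be sketched as follows. As an assumption, a hard-coded list stands in for `sys.argv` so the example runs without real command-line arguments, and `greeting` is a hypothetical helper name.

```python
# Stand-in for sys.argv after running: python greet.py David Malan
argv = ["greet.py", "David", "Malan"]


def greeting(argv):
    """Greet by name when exactly one word follows the file name."""
    if len(argv) == 2:
        return f"hello, {argv[1]}"
    return "hello, world"


# Index-based loop, as in C.
for i in range(len(argv)):
    print(argv[i])

# Looping over the list directly.
for arg in argv:
    print(arg)

# Slices: start index inclusive, end index exclusive.
print(argv[1:])   # everything after the file name
print(argv[1:2])  # a list holding only the element at index 1
print(argv[0:])   # the whole list

print(greeting(["greet.py", "David"]))
```

In a real script the list would come from `from sys import argv` rather than being written out by hand.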
so you can play with the start value and the end value in this way to sort of
slice and dice these lists in different ways that would have been a pain in C
just because we didn't really have the built-in support for manipulating
arrays as cleanly as this all right just so you've seen it too though this one is
less uh exciting to see live if I go
ahead and create a quick program here it turns out there's something else in
the sys library the ability to exit programs either exiting with status code
one or zero as we've been doing anytime something goes right or wrong so
for instance let me whip up a quick program that just says if the length of sys.
argv uh does not equal two then let's yell at the user and say missing
command line argument and let's then
call sys.exit one else let's go ahead and
logically just say print a formatted string that says hello as before sys.argv 1
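
A rough sketch of the program just dictated. As an assumption, the logic is wrapped in a `main` function that takes the argument list as a parameter, so it can be exercised with a fixed list; a real script would pass `sys.argv` instead.

```python
import sys


def main(argv):
    if len(argv) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # raises SystemExit; the shell sees exit status 1
    else:
        print(f"hello, {argv[1]}")


# Stand-in for running: python exit.py David
main(["exit.py", "David"])
```

Importing the whole module and writing `sys.argv` and `sys.exit` keeps it visible where each name comes from, at the cost of a little extra typing.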
now things look different all of a sudden but I'm doing something deliberately
first let's see what this does so on line one I'm importing not argv specifically
I'm importing the whole sys library and we'll see why in a second well it turns
out that the sys library has not only the argv list it also has a
function called exit which I'd like to be able to use as well so it turns out that
if you import a
whole library in this way that's fine but you have to refer to the things inside
of it by using that same library's name and a DOT to sort of namespace it so
to speak so here I'm just saying if the user types in does not type in two
words yell at them with missing command line argument and then exit with
one just like in C when you do exit one just means something went wrong
otherwise print out hello to this and this is starting to look cryptic but it's just
a combination of ideas the
curly braces mean interpolate this value plug it in here sys.argv is just the
verbose way of saying go into the sys library and get the argv variable
therein and bracket one of course just like arrays in C is just the second
element at the prompt so when I run this version now python of exit.py with
no arguments I get yelled at in this way if however I type in two arguments
total the name of the file and my own name now I get greeted with hello
David and it's the same idea before this was a very
low-level technique but same thing here if you do echo dollar sign question
mark enter you'll see the exit code of your program so if I do this incorrectly
again let me rerun it without my name enter I get yelled at but if I do echo
dollar sign question mark there's the secret one that's returned again just to
show you parity with C in this case questions now on any of these
techniques here all right how about something that's a little more powerful
too we spent so much time in week zero and one
doing searching and then eventually sorting in week three well it turns out
python can help with some of this too let me go ahead and create a program
called names. py that's just going to be an opportunity to maybe search over
a whole bunch of names let me go ahead and import sys just so I
have access to exit and let me go ahead and create a variable called names
that's going to be a list with a whole bunch of names uh how about here
Charlie and Fred and George and Ginny and Percy and
lastly Ron so a whole bunch of names here and you know it'd be a little
annoying to implement code that iterates over that from left to right in C
searching for one of those names in fact what name well let's go ahead and
ask the user to input the name that they want to search for so that we can
tell them if the name is there or not and we could do this similar to C in
Python doing something like this so for n in names where n is just a variable
to iterate over each name if how about the
name I'm looking for equals the current name in the list AKA n let's print out
something friendly like found and then let's do sys.exit 0 to indicate that we
found whoever that is otherwise if we get all the way to the bottom here
outside of this Loop let's just print not found because if we haven't exited yet
and then let's just exit with one just to be clear I can continue importing all of
sys or I could do from sys import exit and then I could get rid of sys dot
everywhere else but you know
sometimes it's helpful to know exactly where functions came from so this
too is just a matter of style in this case all right so let's go ahead and run
this python of names.py and let's look for like Ron all the way at the end
you know all right he's found and let's search for someone outside of the
family here like Hermione not found okay so it seems to be working in this way
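
The search just demonstrated can be sketched as below. As an assumption, the functions return `True` or `False` rather than calling `sys.exit`, and the two names to look up are fixed instead of read with `input`, so the example runs unattended. The second function shows the same check written with Python's `in` operator for comparison.

```python
names = ["Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]


def linear_search(names, name):
    """Walk the list from left to right, stopping at the first match."""
    for n in names:
        if name == n:
            return True
    return False


def contains(names, name):
    """Same question asked with in; for a list this is also a linear scan."""
    return name in names


for name in ("Ron", "Hermione"):
    print("Found" if linear_search(names, name) else "Not found")
```

Both functions give the same answers; the second simply leaves the loop to Python.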
but I've essentially implemented what algorithm what algorithm would this
seem to be per lines seven and eight and nine
and 10 yeah so it's just linear search it's a loop even though the syntax is a
little more succinct today and it's just iterating over the whole thing well
honestly we've seen an even more terse way to do this in Python and this
again is what makes it a more pleasant language sometimes why don't I just
do this instead of iterating one at a time why don't I just say this let me go
ahead and change my condition to just be How about if the name we're
looking for is in the names list we're done we found
it use the in preposition that we've seen a couple of times now that itself
asks the question is something in something else and python will take care of
linear search for us and it's going to work exactly the same if I do python of
names.py search for Ron it's still going to find him and it's still going to do it
linearly in this case but I don't have to write all of the lower level code myself
in this case questions now on any of this the code's just getting shorter and
shorter now what about uh let's see what
else might we have here how about this it turns out let's go ahead and
Implement that phone book that we started metaphorically with in the
beginning of the course let's code up a program called phonebook.py and
in this case let's go ahead and let's create a dictionary this time recall that a
dictionary is a little something that implements something like this like a two
column table that's got keys and values words and definitions names and
numbers and let's focus on the last of
those names and numbers in this case well I claimed earlier that python has
built-in support for dictionaries dict objects that you can create with one line
I didn't need it for speller because a set is sufficient when you only want one
of the keys or the values not both but now I want some names and numbers
so it turns out in python you can create an empty dictionary by saying
dict open parenthesis close and that just gives you essentially a chart that
looks like this with nothing in it
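
To make the two-column picture concrete, here is a minimal sketch of a dictionary of names and numbers. The entries are the ones used in this lecture's phone book; `lookup` is a hypothetical helper name, and filling the dictionary by assignment is an assumption chosen for illustration.

```python
people = dict()  # an empty table: no keys, no values yet

# Each assignment adds one row: the key on the left, its value on the right.
people["Carter"] = "+1-617-495-1000"
people["David"] = "+1-949-468-2750"


def lookup(people, name):
    """Return a formatted line if name is a key, otherwise None."""
    if name in people:  # in checks the keys of a dictionary
        return f"Number: {people[name]}"
    return None


print(lookup(people, "Carter"))
```

Indexing with a string such as `people["Carter"]` works the way indexing a list with a number does, except that the key can be a word.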
or there's more succinct syntax you can alternatively do uh this with two
curly braces instead and in fact I've been using a shortcut all this time when I
had a list earlier where my variable uh was called scores and I did this that
was actually the shorthand version of this hey python give me an empty list
so there's different Syntax for achieving the same goal in this case if I want a
dictionary for people I can either do this or more commonly just two curly
braces like that all right well
what do I want to put in this well let me actually put some things in this and
I'm going to just move my close curly brace to a new line if I want to
implement this idea of keys and values the way you do this in Python is key
colon value comma key colon value so you'd implement it more in code so
for instance if I want Carter to be the first key in my phone book and I want
his number to be +1-617-495-1000 I can put that as the corresponding
value the colon is in between both are strings or
strs so I've quoted both deliberately if I want to add myself I can put a
comma and then just to keep things pretty I'm moving the cursor to the next
line but that's not strictly required aesthetically it's just good style and here I
might do +1-949-468-2750 and now I have a dictionary that essentially
has two rows here Carter and his number and David and his
number as well and if I kept adding to this this chart would just get
longer and longer suppose I want to
search for one of our numbers well let's prompt the user for the name for
whose number you want to search by getting string or you know what we
don't need the cs50 library let's just use input and prompt the user for a
name and now we can use this super syntax and just say if name in people
print the formatted string number colon and here we can do this people
bracket name okay so this is getting kind of cool kind of quickly kind of
confusingly so let me run this python of phonebook.py let's type in Carter and
indeed I
see his number let's run it again with David and I see my number here so
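The phone book just described might look like the sketch below. One assumption: the lecture reads the name with `input`, whereas this version hard-codes it and wraps the lookup in a hypothetical helper called `lookup` so the example runs without a keyboard.

```python
# A dictionary literal: key: value pairs separated by commas.
people = {
    "Carter": "+1-617-495-1000",
    "David": "+1-949-468-2750",
}


def lookup(name):
    """Return the number stored under name, or None if the name is absent."""
    if name in people:
        # Index into the dictionary with a string instead of a number.
        return people[name]
    return None


name = "Carter"  # the lecture obtains this with input("Name: ")
number = lookup(name)
if number is not None:
    print(f"Number: {number}")
```

Indexing with a key that is not present raises a KeyError, which is why the `name in people` check comes first.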
what's going on well it turns out that a dictionary is very similar in spirit to a
list it's actually very similar in spirit to an array in C but instead of being
limited to keys that are numbers like bracket 0 bracket 1 bracket 2 you
can actually use words and that's all I'm doing here on line eight if I want to
check for the name Carter which is currently in this variable called name I
can index into my people
dictionary using not a number but using literally a string the name Carter or
David or anything else to make this clear too notice that I'm at the moment
using this format string which is adding some undue complexity but I
could clarify this perhaps further as this I could give myself another variable
called number set it equal to the people dictionary indexing into it using the
current name and now I can shorten this to make it clear that all I'm doing is
printing the value of that and in fact I
can do this even more cryptically if I this would be weird to do but if I only
ever want to show David's phone number and never Carter's I can literally
quote unquote index into the people dictionary because now when I run this
even if I type Carter I'm going to get back my number instead but that's all
that's happening if I undo that because that's now a bug but I index into it
using the value of name dictionaries are just so wonderfully convenient
because now you can associate anything with anything
else not using numbers but entire keywords instead so here's how if in
speller we gave you not just words but hundreds of thousands of definitions as
well you could essentially store them as this and then when the human
wants to look up a definition in a proper dictionary not just for spellchecking
you could index into the dictionary using square brackets and get back the
definition in English as well questions on this yeah location a really good
question so to summarize how is python finding that
name within that dictionary this is where honestly speller in pset 5 is what
Python's all about so you have struggled are struggling with implementing
your own spell checker and implementing your own hash table and recall
that per last week the goal of a hash table is to ideally get constant time
access not something linear which is slow and even better than something
logarithmic like log base 2 of n so Python and the really smart people
who invented it they have written the code that does its best
to give you constant time searches of dictionaries and they're not always
going to succeed just as you in your own problem set are probably going to
have some collisions once in a while and start to have chains or linked lists
of words but this is where again you defer to someone else someone smarter
than you someone with more time than you to solve these problems for you
and if you read Python's documentation you'll see that it doesn't guarantee
constant time but it's going to ideally optimize the data
structure for you to get as fast as possible and of all of the data structures
like a dictionary a hash table is really like the Swiss army knife of
computing because it just lets you associate something with something else
and even though we keep focusing on names and numbers that's a really
powerful thing because it's more powerful than lists and arrays which are
only numbers and something else now you can have any sorts of
relationships instead all right let me show a few other examples before we
culminate with
some more powerful techniques in python thanks to libraries how about this
problem we encountered in week four which was this let me code up a
program called again compare.py here but this time compare two strings and
not numbers so let me for instance do uh get one string from the user called
s just for the sake of discussion let me get another string from the user uh
called T so that we can actually do some comparison here and if s equals
equals T let's go ahead and print out that they're the same else let's go
ahead and
print out that they're different so this is very similar to what we did in week
four but in week four recall we did this specifically because we had
encountered a problem for instance if I run whoops uh if I run what's going
on uh input T come on oh the okay wow okay long day all right if I run the
proper command python of compare.py then let's go ahead and type in
something like cat in all lowercase cat in all lowercase and they're the same
uh if though I do this again with dog and dog they're the same
and of course cat and dog they're different but does anyone recall from two
weeks ago when I typed in my name twice both identically capitalized what
did it say that they were in fact different and why was that like why were two
strings in C different even though I typed literally the same thing two
different places in memory so each string might look the same aesthetically
but of course was stored elsewhere in memory and yet python appears to be
using the equality operator equals equals like you and I
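A minimal sketch of the comparison program follows. The lecture reads both strings with `input`; here a hypothetical helper named `compare` takes them as parameters so the behavior can be checked directly.

```python
def compare(s, t):
    """Report whether two strings hold the same characters."""
    # On Python strings, == compares contents rather than memory addresses.
    if s == t:
        return "Same"
    return "Different"


print(compare("cat", "cat"))
print(compare("dog", "dog"))
print(compare("cat", "dog"))
```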
something like this how about give myself a second variable T set it equal to
s.capitalize which note is not the same as upper capitalize by design per
Python's documentation will only capitalize the first letter for you I can now
print out say two F strings here what the value of s is and then let me print
out with another F string what the value of t is and recall that in C this was a
problem because if you capitalize s and store it in t we accidentally
capitalized both s and t but in this case in Python when I
actually run this and type in cat in all lowercase the original s is unchanged
because when I use capitalize on line three this is indeed capitalizing s but
it's returning a copy of the result it cannot change S itself because again for
that technical term s is immutable strings once they exist cannot be
changed themselves but you can return copies and modified mutated copies
of those same strings so in short all of those headaches we encountered in
week four are now solved really in the way
you might expect and here's another one that we dwell on in week four with
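The copy-on-capitalize behavior described above can be seen in a few lines. As an assumption for runnability, `s` is hard-coded here rather than read with `input`.

```python
s = "cat"  # the lecture reads this from the user

# capitalize() returns a new string with only the first letter uppercased;
# it cannot modify s, because str objects are immutable.
t = s.capitalize()

print(f"s: {s}")
print(f"t: {t}")
```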
the colored liquid in glasses let me code up a program called swap.py and
in swap.py let me set x equal to 1 y equal to 2 and then let me just print out
an F string here so how about X X is this comma Y is that and then let me do
that twice just for the sake of demonstration and in here recall that we had to
create a swap function but then we had to pass it in by reference with the
ampersand and like oh my God like that was kind of
peak complexity in C well if you want to swap x and y in Python you
could do x comma y equals y comma x and now python of swap whoops python of
swap.py and there we go all of that's handled for you sort of like a shell game
without even a temporary variable in mind so what more can we do here how
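The swap described above, written out as a sketch of swap.py:

```python
x = 1
y = 2

print(f"x is {x}, y is {y}")

# The right-hand side is evaluated first, then unpacked into x and y,
# so no explicit temporary variable is needed.
x, y = y, x

print(f"x is {x}, y is {y}")
```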
about a few final building blocks and these related now to files from that
week four suppose that I want to save some names and numbers in like a a
CSV file comma separated values which is like a very lightweight
spreadsheet well
first let me create a phonebook.csv file that just has name comma
number as like the first row there but after that I'm going to go ahead
now and code up a phonebook.py program that actually allows me to add
things to this phone book so let me split my screen here so that we can see
the old and the new and down here in my code for phonebook.py in this new
and improved version I'm going to actually import a whole other Library this
one called CSV and here too especially for people in data
science and the like who really like being able to manipulate files and data that
might very well be stored in spreadsheets or csvs comma separated values
which we saw briefly in week four in phonebook then it suffices to just
import CSV after reading the documentation therefore because this is going
to give me functionality in code related to CSV files so here's how I might
open a file in Python I literally call open it's not fopen now it's just open and I
open this file called phonebook.csv and I'm
going to open it in append mode not write where it would change the whole
thing I want to append a new line at a time after this I want to get maybe
how about a name from the user so let's prompt the user for some input for
their name and then let's prompt the user for a number as well using input
prompting for number all right and now this is a little cryptic and you'd only
know this from the documentation but if you want to write rows to a CSV
file that you can then view in Excel or the like
you can do this give me a variable called writer but I could call it anything I
want let me use a csv.writer function that comes with this csv library
passing in the file this is like saying hey python treat this open file as a CSV
file so that things are separated with commas and nicely formatted in rows
and columns now I'm going to do this use that writer to write a row well what
do I want to write I want to write a short list namely the current name and
the current number to that file but I don't want to use
fprintf and percent s and all of that stuff that we might have had in the past
and now I just want to close the file let me reopen my terminal let me run
python of phonebook.py and let me type in how about David and then +1
949 468 2750 and crossing my fingers watching the actual CSV at top
left my code has just added me to the file and if I were to run it again for
instance with Carter and +1 617 495 1000 crossing my fingers again we've
updated the file and it turns out there's code
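The append-a-row steps above could be sketched like this. Several things are assumptions for the sake of a runnable example: the file lives in a temporary directory rather than next to the program, the name and number are passed to a hypothetical `add_entry` helper instead of being read with `input`, and `newline=""` is passed to `open` as the csv module's documentation recommends.

```python
import csv
import os
import tempfile


def add_entry(path, name, number):
    """Append one name,number row to the CSV file at path."""
    file = open(path, "a", newline="")  # "a" appends; "w" would overwrite
    writer = csv.writer(file)           # treat the open file as a CSV file
    writer.writerow([name, number])     # one short list becomes one row
    file.close()


# Create the file with its header row first, as in the lecture.
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
file = open(path, "w", newline="")
csv.writer(file).writerow(["name", "number"])
file.close()

add_entry(path, "David", "+1-949-468-2750")
add_entry(path, "Carter", "+1-617-495-1000")
```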
now via which I can even read that file but I can first tighten this up just so
you've seen it it turns out in python it's so common to open files and close
them you know humans make mistakes and they often forget to close files
which might then end up using more memory than you intend so you can
alternatively do this in Python so that you don't have to worry about closing
files you can use this keyword instead you can say with the opening of this
file as a variable called file do all of the following
underneath so I'm indenting most of my code I'm using this new python
specific keyword called with and this is just a matter of saying with the
following opening of the file do those next four lines of code and then
automatically close it for me at the end of the indentation it's a minor
optimization but this again is sort of the pythonic way to do things instead
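Here is a sketch of the same append step rewritten with the `with` keyword. As before, the temporary path and the hypothetical `add_entry` helper are assumptions made so the example runs on its own; the helper returns `file.closed` purely to show that the file really is closed once the indented block ends.

```python
import csv
import os
import tempfile


def add_entry(path, name, number):
    """Append one row; the file is closed automatically after the block."""
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, number])
    # No explicit close() call: leaving the with block closed the file.
    return file.closed


path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
closed_afterwards = add_entry(path, "David", "+1-949-468-2750")
print(closed_afterwards)
```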
how else might I do this too well it turns out that the code I've written here
on line n especially is a little fragile right if
any human opens the spreadsheet the CSV file in Excel Google spreadsheets
Apple numbers and maybe like moves the columns around just because
maybe they're fussing they save it and they don't realize they've now changed
my assumptions I don't want to necessarily write name and number always
in that order CU what if someone screws up and flips those two columns by
literally dragging and dropping so it turns out that instead of using a list here
we can use another feature of this Library as
number so if you flip them no big deal it's going to notice oh wait the
columns changed and it's going to insert the columns correctly so just again
another more powerful feature that lets you focus on real
work as opposed to actually uh getting tied up in the weeds of writing code
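The transcript skips the name of the feature here. A plausible candidate, and this is an assumption, is `csv.DictWriter`, which writes each row from a dictionary keyed by column name. In the sketch below the temporary path and the `add_entry` helper are hypothetical. One caveat worth stating: `DictWriter` places values according to the `fieldnames` list it is given and does not itself inspect the header already in the file, so that list has to match the file's column order.

```python
import csv
import os
import tempfile

FIELDNAMES = ["name", "number"]  # must match the column order in the file


def add_entry(path, name, number):
    """Append one row, matching values to columns by key, not by position."""
    with open(path, "a", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=FIELDNAMES)
        # The order of keys in this dictionary does not matter.
        writer.writerow({"number": number, "name": name})


path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
with open(path, "w", newline="") as file:
    csv.DictWriter(file, fieldnames=FIELDNAMES).writeheader()

add_entry(path, "David", "+1-949-468-2750")
```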
like this otherwise questions on this one as well but what we will do now is
come full circle to some of the more uh sophisticated examples with which
we began and I'm going to go back over to
my own Mac laptop here where I've got my own terminal window up and
running and I was just going to introduce a couple of final libraries that really
speak to just how powerful python can be and how quickly you can get
up and running to be fair you can't necessarily do all of these things in the cloud
like in codespaces because you need access to your own speakers or
microphone or the like so that's why I'm doing it on my own Mac here but let
me go ahead and open up a program called
speech.py and I'm not using VS Code here I'm using a program called vi that's
entirely terminal window based but it's going to allow me for instance to
import the python text to speech version three library pyttsx3 I'm going to give
myself a variable called engine that's going to be set equal to the
pyttsx3 library's init method which is just going to initialize this
library that relates to text to speech I'm going to then use the engine's say
function to say something like how about hello comma
world and then as my last line I'm going to say engine.runAndWait
capitalized as such to tell my program now to run that speech and wait until
it's done all right I'm going to save this file I'm going to run python of
speech.py and I'm going to cross my fingers as always and hello world all right so
now I have a program that's actually synthesizing speech using a library like
this how can I now modify this to be a little more interesting well how about
this let me go ahead and prompt the user for their
name like we've done several times here using Python's built-in input
function and now let me go ahead and use a format string in conjunction
with this library interpolating the value of name there and at least if my
name is somewhat phonetically pronounceable let's go ahead and run
python of speech.py type in my name and hello David okay it's a sort of weird
choice of inflection but we're starting to synthesize voice not unlike Siri or
Google assistant or Alexa or the like now we can maybe do something a little
more advanced too in addition to synthesizing speech in this way we could
synthesize for instance uh an actual graphic let me go ahead now and do
something like this let me create a program called qr.py I'm going to go ahead
and import a library called os which gives you access to operating system
related functionality in Python I'm going to import a library I've pre-installed
called qrcode where a QR code is a two-dimensional barcode that you might have seen
in the real world I'm going to
go ahead and create an image variable using this qrcode library's make
function which per its documentation takes a URL like one of cs50's own
videos so we'll do this with YouTube uh /xv f zj5 p g g0 so hopefully that's the
right lecture and now we've got image.save which is going to allow me to
create a file called qr.png think back now on problem set 4 and how painful
it was to save files we'll just use the save function now in Python and save
this as a PNG file portable network graphic and
then lastly let's just go ahead and open the file with the command open
qr.png on my Mac so that hopefully this just automatically opens all right I'm
going to go ahead and just double check my syntax here so that I haven't
made any mistakes I'm going to go ahead and run python of qr.py enter that
opens up this let me go ahead and zoom in if you've got a phone handy and
you'd like to scan this code here whether in person or online I apologize you
won't appreciate it amazing okay and lastly let me go
back into our speech example here uh create a final ending here on our final
moments and how about we just say something like this was cs50 like this
let's go ahead here fix my capitalization just for tidiness get rid of the name
and now with our final flourish and your introduction to python equipped
here we go this was cs50 all right we'll see you next time [Applause]
all right this is cs50 and this is already week
seven and this is the week where we'll continue where we left off
with python introducing you to a bit more syntax and capabilities of the
language so you can solve like interesting problems but a lot of those
problems increasingly are now going to involve data in some form after all if
you think of most any website or mobile app or process nowadays that
involves solving problems it almost always involves some amount of data
and often data at scale lots and lots of data and so what we're going to see
first today is that yes you can use Python to solve all the problems past
that we've seen and also some data specific ones but sometimes it's just
going to be annoying it's going to be a little painful it's just going to be more
work than you might like to just get to some answer and so today we'll also
introduce you to a new language called SQL structured query language and
this is a language that is actually much smaller relatively speaking than
C and python it sort of does less but it does it really well and it's a language
for querying databases storing data in
it updating it inserting it deleting it and so much more and it's the kind of
Technology that's used nowadays in indeed web apps and mobile apps data
science analytics and and so much more it's really good at storing lots and
lots of data now this is yet another language and believe it or not next week
we'll introduce you to three more languages HTML and CSS which are not
technically programming languages they're all about Aesthetics and markup
of information but also JavaScript which
is in fact a programming language but the goals here in cs50 really are going
to be to empower you to program more generally and indeed when you're
out there in the real world some years from now invariably there's going to
be some new other popular language out there and hopefully in this week
and next week and Beyond among the goals is not just to teach you these
languages specifically but again like how to teach yourself the future
languages that we've not even heard about just yet so with that said
let's begin with a survey of sorts if you go to this URL on your phone or
laptop cs50.ly/favorites a very simple Google form awaits you that's just
going to ask you a couple of multiple choice questions so go to
cs50.ly/favorites and that should lead you to a Google form that looks a little
something like this asking you first as of now in week seven what is your
favorite language among those options here and then further down one more
question if you think back on problem sets 0 through six what was if any your
favorite uh problem set problem be it in scratch or C or python so answer
those two questions and in a moment I'll flip over to my screen here where
you'll see and anyone who's used Google forms knows the spreadsheet that's
collecting now this data um Microsoft Office 365 can do the same if you use
one of those forms and what you see here now is a spreadsheet in Google
Sheets enumerating all of the audience's questions language is in column B
problem is in column C and each row represents one student who has
responded uh a few of you were super eager for class today at 8:33 a.m.
eastern time 10:32 11:10 okay so now we're getting into the actual class
time here and if I scroll down we'll probably see a few dozen a couple hundred
answers by now and yeah so we're getting a whole lot of answers here and
I'm seeing some patterns emerge but it's not necessarily obvious to the
human eyes what those patterns are now of course you can use Google
spreadsheets you can like highlight the data and you can
create charts magically out of it but you can only do what Google lets you do
with the data and same thing for Microsoft Excel or apple numbers but
wouldn't it be nice to just be able to manipulate the raw data relatively
simple though it is to just answer questions about the data Maybe long-term
create your own charts customize it just the way you want rather than be
beholden to like software that's off the shelf like this well how could we go
about doing this well let me propose that we
treat this data set now as what we're going to call for now a flat file database
we'll see today that there's fancier databases but the simplest database in
the world is really just like a CSV file and we saw that a couple of weeks ago
in C we wrote a bit of C code that used fprintf to write data to a file using
commas as the separator we didn't really do much more with csvs at the
time though because it's really annoying painful time consuming not fun to
use C for something like that
because of malloc and memory and all that stuff but with python it's going to
be much easier and so anytime you have access to some data set where you
can just like download it to your own Mac or PC or your Cloud environment
it's sort of a candidate for now writing code to do something with the data
maybe analyze it right away if it's been input manually by humans maybe you
have to clean it up by doing a lot of find and replace but not with your
keyboard but rather with code and so let me go ahead and do this
let me go back to my uh Google sheet here that has all of the data that's
come in now and let me go ahead and download this via uh the file menu
here and let's see download and you can see a whole bunch of options of
most formats might be familiar but today we'll focus just on this one comma
separated values or CSV that's going to go ahead and download it on my Mac
here into my own downloads folder and now I'm going to go ahead and do
this let me go ahead and pull up VS Code in the cloud
here and if you've never done this before there's a couple of ways to do it
but the simplest way to upload a file to your codespace so to speak is just a
sort of drag and drop that's going to magically upload it to the server there
and we'll see that one it has a very long file name which I'm actually
going to clean this up because that's going to be very tedious to type in my
code so I could either right-click of course up here but I'm going to use my
Linux command so let's move this file
called cs50 2022 something or other and let's just name it more simply
favorites.csv so all lowercase no spaces sort of good basics and let me go
ahead now and open up this file with code favorites.csv I'll close my file
explorer and we'll see exactly the same data as before but not quite as
pretty as Google Sheets makes it be rather we see here that I still have three
columns timestamp language problem and then all of the values down below
including the timestamps and the answers therefore but
it doesn't have proper columns it just has commas separating them now we
could write C code just like we wrote code to manipulate
files like this either to write or read but instead let's do something that's a
little more pleasant which is indeed in the form of python so python actually
comes with Native support for csvs it has indeed a package called CSV that
just lets you read and write and do a whole bunch of useful stuff when it
comes to CSV files so let's go ahead and
do something with this file let me go back here to VS Code I'm going to
close favorites.csv for now but just remember in your mind that timestamp
was the first column language was the second column and problem was the
third and notice because we're using commas they don't again line up
perfectly but that's not a problem there are two commas in every line
presumably and I'm going to go ahead and now create a file called how
about favorites.py so that I can start writing some code to
manipulate this data and let's do something simple let's just write a simple
program in Python that opens this file reads it and prints something out just
as like a safety check that I know what I'm doing even though it's not going
to be useful so in Python if you want CSV support you import CSV and that
gives you access to all the magical capabilities thereof let me now go ahead
and use this technique to open a file in Python which is similar to C but with
python I'm going to do this the keyword
with I'm going to open a file called favorites.csv which was the shorter
name I gave it this is optional but just for explicitness I'm going to open it in
read mode explicitly just like fopen took a second argument as well and I'm
going to name this file once open quite simply file though I could call it
anything I want and now it's just an open file so far as python knows at this
moment it's just text or better yet it's just zeros and ones if you want this
python package called CSV to actually do
something useful with it you have to load this file now into the library and
the simplest way to do this is to give myself like a variable called reader
because I want to read this file though this too I could call anything else I'm
going to then set that equal to the return value of a function called csv.reader
and I pass to that per the documentation the open file so step one I
open the file and this just gives me access to the bytes therein step two
now with csv.reader tells the python
package called csv to do something useful with it and start analyzing the
commas and allow me to parse it further so let's go ahead and do this
let me go ahead now and within this loop let's say this with sorry within this
open file let's do this for every row if you will or line in the file AKA reader
let's go ahead and print out now just how about row bracket 1 now what's
going on here well it turns out if you read the documentation for the CSV
reader function what it hands you back
contain all of the data from the current row but better yet what the reader
function does for me is it hands me each row not just as a big string or str of
text in Python it gives me what apparently based on the syntax on line six
any instinct yeah it's giving me back indeed a list and I presume the
visual clue for you was the fact that we're using square brackets here and
indeed row bracket 1 is going to be not the first but the second element in
that list and so just take a guess when
I run this code in a moment What's going to get printed the timestamp the
language or the problem the yeah oh the language the language because it's
the second column that is in the file delimited by those commas so let
me go ahead and do this let me clear my terminal down here let me run
python of favorites.py and enter and there's everything it was super fast but
there's a really long list here and in fact if I increase the size of my terminal
and start scrolling up you'll just see all of the raw data
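A sketch of this first version of favorites.py follows. The real program reads the downloaded form responses; since those are not reproduced here, the example writes a tiny made-up `favorites.csv` (invented timestamps and answers, and an assumed header spelling) into a temporary directory first.

```python
import csv
import os
import tempfile

# Made-up sample rows standing in for the real form responses.
SAMPLE = (
    "Timestamp,language,problem\n"
    "10/24/2022 8:33:26,C,Credit\n"
    "10/24/2022 10:32:26,Python,Runoff\n"
    "10/24/2022 11:10:47,Scratch,Mario\n"
)

path = os.path.join(tempfile.mkdtemp(), "favorites.csv")
with open(path, "w", newline="") as file:
    file.write(SAMPLE)

printed = []
with open(path, "r", newline="") as file:
    reader = csv.reader(file)  # parses each line into a list of strings
    for row in reader:
        print(row[1])          # index 1 is the second column
        printed.append(row[1])
```

Because `csv.reader` treats every line alike, the header row's `language` cell is printed along with the real answers.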
now this isn't that useful yet I could have just glanced at the CSV but clearly
now I have the ability to open the file parse it so to speak that is break it up
into its constituent parts and do something with specific Parts therein all
right so if I want to do this a little more pleasantly though let me at
least make this semantically a little cleaner and you know what just for
clarity let me just give myself a variable it's not strictly necessary but I know
that this is the favorite uh for
instance language so let's just call it favorite set it equal to row bracket 1
and now just to be more explicit in my code even though again we don't
need the variable per se this code's of course going to do the same thing
it's just using an additional variable called favorite if I go down here scroll up
run the program again I get back the exact same data but this is a stepping
stone to something that's even more powerful about python support for CSV
files is that you don't have to just treat the
return value as a list with zero and one and two so just thinking intuitively
here why is this maybe not the best design to hand you the programmer
back the data in a list that's numerically indexed with 0 1 2 it clearly works but
critique this what could go wrong what's a little poorly designed yeah you
have to always remember what are yeah exactly so if yeah so it's up to you
to repeat it's up to you to remember like what column the data is actually in
and you know God forbid you're you're
that the very first line in this file is actually this and I paused the output this
time so that we can see more optionally I just reran favorites.py and notice
one of these things is not like the other every output was either scratch or C
or python except for this first one why am I seeing the word language here
where did language come from you didn't have the ability to manual input oh
no where did it come from yeah yeah the header the very first row in the file
which by human convention
generally just defines what the columns represent so that there's some
human useful information there now that's not really intended to be part of
my output at the moment so there is a way to skip this if you want to skip
the first row you can actually do something like this you can say next of reader
and that will just ignore that row so that I'm starting really with the every row
thereafter but there's a better way to handle this than that that will get rid of
the row in the output but let me go ahead and use a
different feature of the CSV package that's just going to make this a little
cleaner altogether so let me clear my terminal window here let me undo this
next thing that I just added and instead of using a reader let me go ahead
and use a dictionary reader abbreviated dict reader that's going to now
return me the equivalent of all of the rows one at a time so I can still call it
Reader just as before but as the name implies what this reader is going to
return is not a list after list after list but a
using square bracket notation in strings or stirs on the inside just like lists
allow for numbers but this now I think is going to be a little more robust if I
run this again python of favorites. py all of that worked out fine and let me
pause the output too by using this program called more now I don't even see
the header so now whoever works with python wrote the code for this
package to just analyze that first line of code use the header as you just
called it as the keys and then every
time you iterate through this Loop it updates the values the values the
values but the keys stay the same any questions then on this technique
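The two ways of dealing with the header row can be compared side by side in a sketch like the following. The sample file is made up for illustration, and the header spelling `language` is an assumption about the real file.

```python
import csv
import os
import tempfile

# Made-up sample rows standing in for the real form responses.
SAMPLE = (
    "Timestamp,language,problem\n"
    "10/24/2022 8:33:26,C,Credit\n"
    "10/24/2022 10:32:26,Python,Runoff\n"
    "10/24/2022 11:10:47,Scratch,Mario\n"
)

path = os.path.join(tempfile.mkdtemp(), "favorites.csv")
with open(path, "w", newline="") as file:
    file.write(SAMPLE)

# Option 1: a plain reader, skipping the header by hand.
by_index = []
with open(path, "r", newline="") as file:
    reader = csv.reader(file)
    next(reader)  # consume and discard the first (header) row
    for row in reader:
        by_index.append(row[1])

# Option 2: a DictReader, which uses the header row as the keys.
by_name = []
with open(path, "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        by_name.append(row["language"])

print(by_index)
print(by_name)
```

The second form keeps working if someone reorders the columns in the file, since each value is found by its header name rather than by position.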
suffice to say this would be painful in C yes associated exactly so the keys
are always going to be quote unquote timestamp language and
problem but on each iteration of this Loop here uh the row is going to contain
a different row of values different row values different row values so you're
going to get back one dictionary for every student who submitted the Google
form if you will while iterating through it there all right so once we have this
ability here why don't we go ahead and transition into how about not just
using that dictionary reader which makes the code a little more robust
because now if you move the columns around no big deal it doesn't matter if
the numeric indices change you can still use those keywords instead well
let's actually analyze the data now I'm just spitting it out which is not solving
any problems for anyone so let's go ahead and and count the
popularity of scratch C and Python and see what everyone's uh been thinking
here all right so how might I do this well let me go ahead and do this up here
before I start iterating let me give myself let's say three variables and to
keep things simple I'll say one variable called scratch set it equal to zero for
zero students so far C is going to equal zero and python is going to equal
zero there's a slightly prettier way of doing this just because this is like three
lines of code to do something very
together so the more defensive sort of better way to write this code I agree
would be elif favorite equals equals python then let's go ahead and
increment python plus equals 1 and if there's a new language next week
we're obviously going to have to update the code but at least we're not
miscounting we're just missing the new language so I think that's slightly
more robust all right now at the very bottom of this program and outside of
the loop when I'm all done counting let me go ahead and print
out using some f strings how about the total number of people who uh whose
favorite is scratch so this is just uh week six F string syntax let me go ahead
and print out another F string for C and I'm of course putting the variables in
curly braces all lowercase but the English words I'm doing capitalization for
let's do a final one with f uh python colon and then in curly braces python
close quote and I think I'm done so let me just hide my terminal for a second
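A sketch of the three-variable version as narrated, assuming the language column holds the exact strings Scratch, C, and Python. The sample rows are made up so the snippet runs on its own:

```python
import csv
import io

# Hypothetical sample standing in for favorites.csv
SAMPLE = """Timestamp,language,problem
10/30/2023 10:00:00,Python,Mario
10/30/2023 10:01:00,C,Filter
10/30/2023 10:02:00,Scratch,Scratch
10/30/2023 10:03:00,Python,Hello
"""

# One counter per language, each starting at zero
scratch = 0
c = 0
python = 0

for row in csv.DictReader(io.StringIO(SAMPLE)):
    favorite = row["language"]
    if favorite == "Scratch":
        scratch += 1
    elif favorite == "C":
        c += 1
    elif favorite == "Python":
        # An explicit elif, not a bare else, so an unknown language
        # is skipped rather than miscounted as Python
        python += 1

print(f"Scratch: {scratch}")
print(f"C: {c}")
print(f"Python: {python}")
```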
here's the total program same
stuff as before open favorites.csv open it further with the dictionary reader
to do that processing for us initialize three variables to zero just so we have
something to count with and then iterate over the file row by row and this is
just some sort of week one style conditional logic albeit in Python counting
things all right so how can we now execute this let me go back to my
terminal python of favorites.py and here we go uh as of today everyone who's
reporting in live via the Google form
their favorite languages are interesting that's pretty interesting too after just
one week of python no less so but scratch is a healthy Contender there a lot
of C so a pretty good mix here so is this going to be the best way to write
this program long term well as you noted if there's a new language next
week this week we're going to have to constantly update this and here's
where you should let your mind wander to like the future like if we have a
fourth language fifth language
sixth seventh eighth which aspect here might kind of have some code smell to it like
this probably isn't the best design to set us up for the future what might be
better than this yeah add language line yeah we have to keep adding a
language to line five and okay not a big deal we could add like SQL today
and maybe JavaScript next week but you know anytime a a line of code a line
of logic is just going to kind of grow out of control and we've had this chat a
couple of times with different syntax there's probably a better way
than that so let's do that instead of using these individual variables we could
maybe use a list but a list would be a little confusing because like what
does bracket zero mean what is bracket 1 bracket two but a dictionary recall is
like this Swiss army knife of data structures whereby you can associate
anything with anything else keys with values so I dare say a cleaner way to
solve this problem that sets us up for Less work or confusion later would be
to create like a new variable called counts if that's what we're doing
counting things up and just set it equal to an empty dictionary and you can
literally say dict with the open parenthesis closed parenthesis nothing or the
more pythonic just use open and close curly braces with nothing inside that
gives me an empty dictionary just like square brackets gives me a list now
my logic down here has to change a little bit but what's nice is I don't need
one conditional for every language because again if we have a fourth a fifth
a sixth that chunk of code is also
going to grow a bit out of control too so I can get rid of this here and what I
think I'm going to do is say this whatever the current favorite is from the
current Row in the file why don't we go into our counts variable at that key
and again favorite is a variable it's not quote unquote favorite it's going to be
scratch or C or Python and then why don't we go ahead and just increment
whatever the value of that count is at that key now this is technically buggy
we're really close but there is a
bug does anyone want to conjecture what the bug is yeah a good question
that answers my question in uh nonetheless so no like the magic you
describe will not happen and to repeat the the hypothesis will this
automatically create a key for every uh language that we try plugging into
those square brackets short answer no odds are this is going to create a key
error one of those traceback error messages that you've probably seen by
now either in class or in problem sets whereby if scratch hasn't appeared in
the dictionary before or C or python like then the dictionary has no clue what
you're talking about so I think we actually still need some conditional logic
but not logic that's going to grow longer and longer with each language what I
think we probably want to do is this if the current favorite is in the counts
dictionary and this is the pythonic way of just saying is this key in this
dictionary then go ahead and safely do counts favorite plus equals 1 else to
your conjecture now else what do
I want to do counts favorite equals yeah one so initialize a brand new key to
a brand new value of one because I've obviously just seen this language
otherwise increment again and again and now down here I just need to
tweak my syntax a little bit I don't need to print out all of these things one at
a time manually I can actually get away I think with another loop at the very
bottom here so how about I do this for each favorite in those counts and this
is again the pythonic way to iterate over all of the keys in a
dictionary go ahead and print out using an F string whatever the current
favorite is scratch or C or Python and then a colon and then figure out what
its count is and you can do that by going into the counts dictionary looking at
the favorite key and get back its value so close my curly braces I close my
quotes and even though this looks ugly at the moment now this is much
more dynamic because if we go and add SQL to the CSV file tomorrow or
we add JavaScript next week this will just work it will keep working now
automatically
all I change is the Google form not my actual code all right let's try python of
favorites.py cross my fingers as always and there now is the data as of now
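The dictionary-based version described above can be sketched like this. The sample rows are made up, and a real run would open favorites.csv instead:

```python
import csv
import io

# Hypothetical sample standing in for favorites.csv
SAMPLE = """Timestamp,language,problem
10/30/2023 10:00:00,Python,Mario
10/30/2023 10:01:00,C,Filter
10/30/2023 10:02:00,Scratch,Scratch
10/30/2023 10:03:00,Python,Hello
"""

counts = {}  # empty dictionary: language -> number of votes

for row in csv.DictReader(io.StringIO(SAMPLE)):
    favorite = row["language"]
    if favorite in counts:
        counts[favorite] += 1  # seen before: increment
    else:
        counts[favorite] = 1   # first sighting: avoids a KeyError

# Iterating over a dictionary iterates over its keys
for favorite in counts:
    print(f"{favorite}: {counts[favorite]}")
```

No line of this code names a specific language, so a new option on the form needs no code change.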
questions on this code here yeah really good good question what if you
wanted to print it in a particular order uh well I could give you a couple of
solutions like if you want to print it out in it's already coincidentally in
alphabetical order so you got that for free although that's just by chance
here but there is a way to do this and let me
propose that we go down here to my Loop and I explicitly use a function you
might not have seen in Python yet but it's literally called sorted which is
going to take either a list or in this case a dictionary and by default sort it by
key alphabetically now if my intuition is correct this is not going to change
the output because it's already alphabetical but if you read the
documentation for the sorted function it takes multiple parameters
potentially some of which are named parameters and
so you can actually do this if you want to sort the counts but you want to
reverse the order for whatever reason here so that it's reverse alphabetical
order now let me go ahead and rerun this and I'll keep the previous output on
the screen enter and now it's backwards uh alphabetically if you will other
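A small sketch of sorted applied to a dictionary, with and without the named parameter reverse. The tallies are made up for illustration:

```python
# Hypothetical tallies, not the real form data
counts = {"Scratch": 1, "C": 2, "Python": 3}

# sorted() on a dictionary returns a list of its keys in sorted order
alphabetical = sorted(counts)
backwards = sorted(counts, reverse=True)

for favorite in backwards:
    print(f"{favorite}: {counts[favorite]}")
```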
questions on this here here no how about then how about then we transition
to changing sorting by value and let me this is going to escalate a little
quickly briefly but then we'll we'll
tone it down again notice that right now this is indeed sorting by key what if
especially if I have lots of data it'd be nice to make like a top 10 list or in this
case a top three list and actually see in order of the counts the values uh
what these popular ones are so it's not C python scratch it should ideally be
python then C then scratch because of the values and the magnitude thereof
so how can I do this well it turns out there's another key another uh
parameter that you can pass to the sorted function
count is thereof in that uh in that dictionary called counts but what I can do
now down here in my newly introduced call to sorted is I can tell it what to
use as its key instead of using literally the key scratch C python I can sort of
override that behavior and say you know what to figure out what to sort by
go ahead and call this function called get value notice that I have not put
parentheses after get value because I don't want to call get value right then
and there I want to pass the get
value function itself as an argument to the sorted function so that the sorted
function written years ago by the people at python can call my version of get
value again and again and again when they try to sort this actual data
so now if I add that and I leave reverse equals true let's see what happens
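A sketch of sorting by value with a helper passed as the key. The name get_value follows the lecture; its body is an assumption based on the one-line description in the transcript, and the tallies are made up:

```python
# Hypothetical tallies, not the real form data
counts = {"Scratch": 1, "C": 2, "Python": 3}


def get_value(language):
    # Given a key, return the number sorted() should compare
    return counts[language]


# Pass the function itself (no parentheses) so sorted() can call it per key
ranked = sorted(counts, key=get_value, reverse=True)

for favorite in ranked:
    print(f"{favorite}: {counts[favorite]}")
```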
python of favorites.py enter and now I get my top 10 or in this case top
three list and if I had more sophisticated data with like more columns
altogether that I actually care about I could even
sort this more uh powerfully as well but let me clean this up a little bit just so
you've seen it even though we won't use these that often in cs50 until the
end of the class will they come up again technically this is a little bit eh this
isn't necessarily the best design to spend all this time implementing a
function and then only use it in one place in general we've argued that eh you
don't necessarily need a variable if you're only going to use it in one place
you don't really need a function if
you're only going to use it in one place and here we kind of have a good
candidate for that and so it turns out in Python if you don't want to bother
creating a function just to use it once you can create what's called an
anonymous function AKA a lambda function like the lambda symbol if
familiar and a Lambda function the syntax is a little strange looking but you
say this you literally say Lambda you literally then say the name of the
argument that you want this Anonymous function with no
name to take then you have a colon and then quite simply you write what
you want the return value of this function to be you don't even say return
literally these lambda functions are meant to be used super tersely so that you
can in one line Express something like this and I admit this looks more
cryptic I think than the previous version but as you get more comfortable
with python or other languages that support this feature it allows you to not
bother with lines of code like that and just tighten
up your code a little bit so this line here lambda language colon counts
language is the oneline version of this and you don't even need to bother
picking a name for it Lambda tells python I didn't waste any time thinking of
a name for this function so questions then on this technique of
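The lambda version might look like the sketch below, again with made-up tallies. The expression after the colon is the return value, with no return keyword:

```python
# Hypothetical tallies, not the real form data
counts = {"Scratch": 1, "C": 2, "Python": 3}

# The anonymous function takes one argument, language, and returns its count
ranked = sorted(counts, key=lambda language: counts[language], reverse=True)

for favorite in ranked:
    print(f"{favorite}: {counts[favorite]}")
```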
using python to analyze data like this any questions we're almost done with
python questions no okay so why don't we make things a little more
interesting because we had a much juicier data set with the
problems that we've assigned over the past several weeks why don't we go
ahead and quite simply you know I think we wrote pretty darn good code
here so I think we can pretty much just change a bit of it to say let's see if I
don't want language I want problem and if I want to sort by not language but
problem I think that's it I think if I didn't Overlook something here just by
changing what column I'm reading the data from and then just to be
consistent renaming my variables just so I know what I'm
looking at what will this program now do after those minor changes what will
I see when I run this what would be the first thing I see when I run this tough
crowd today yes problem yeah the problem this the top problem so the most
popular problem which I'm a little worried it might be hello or just scratch but
let's go ahead and see so let me go ahead and open my terminal window I'll
even maximize my terminal window so we can see a lot let me go ahead and
run python of favorites.py
I'm going to go ahead now and cross my fingers that I didn't mess up and hit
enter and okay great we peaked early so scratch was the most popular
program according to the data at the time I downloaded it I'm sure other
votes have come in since filter uh in week four was tied then with tan as well
Mario is a close third there and so forth so this is helpful for us on staff that
uh not so much love down here at the bottom of the list so it was a bunch of
code to write but now that we've written it in
this very versatile Dynamic way it's pretty good for just like crunching data
and doing some analytics but it's still a decent number of lines to have had
to write manually and this is where sometimes python isn't necessarily the right
tool for the job and the job is rather a candidate for using
some other language altogether especially when it's not just a one-time
program that you run and you want to see the answer what if you want to
take input from the user and ask uh answer
questions dynamically like a mobile app would like a website would like
Microsoft Excel or Apple Numbers or Google Sheets would for you well
let's make one final change for now to this version of the program and
actually take in some user input so besides just loading all of the data into
memory let's go ahead and down below here not just print out the top 10 list
if you will but prompt the user for their favorite so I'm going to use Python's
input function and I'm just going to
prompt them with favorite quote unquote like tell me what your favorite
problem what problem uh rather uh you're interested in and now let me go
ahead and say if that favorite is in the counts variable so you didn't type in
something random that we didn't actually assign as a problem then let me
go ahead and print with a format string whatever that favorite is of yours and
show you the actual popularity thereof by indexing into counts using that
favorite as the key and printing this so now it's a
dynamic program it doesn't dump all of the data and all of the summations
rather it's going to allow me to see what my choice of favorite is and I'm
going to go ahead and say uh let's see I'm a fan of Mario here so enter and
indeed we see the same value we saw a moment ago but just for Mario but
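A sketch of the lookup step, with made-up tallies. The helper name lookup is hypothetical, and the call to input is shown only as a comment so the snippet runs without waiting for a keyboard:

```python
# Hypothetical tallies keyed by problem name, not the real form data
counts = {"Scratch": 3, "Mario": 2, "Filter": 2}


def lookup(favorite):
    """Return a formatted line for a known problem, or None otherwise."""
    if favorite in counts:
        return f"{favorite}: {counts[favorite]}"
    return None


# In the interactive program this would be: favorite = input("Favorite: ")
favorite = "Mario"

result = lookup(favorite)
if result is not None:
    print(result)
```

Dictionary keys are case-sensitive, so lookup("mario") finds nothing even though "Mario" is present.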
the point now is that one all of this is possible two it's way easier and more
pleasant than this would have been in C this is still only 15 lines of code and
in C again there's the memory management
there's the iterating over the strings trying to find the commas there's just a
lot more work but honestly even when you just want to answer a question
like this in Excel in Apple Numbers Google sheets you know generally you
can just highlight things you can click a button and boom you get your
answer for summation or Max or Min or any of those sort of Basics wouldn't it
be nice if we weren't taking a step backwards as programmers and being
sort of more powerful and yet we now have to do more
of the work so maybe sometimes Python's not or any language is not the
best tool for the job and that's going to now allow us to introduce more
generally something called a relational database graduating from Mere flat
file databases like text files or binary files in which all of your data is stored
to something more proper but first questions really good question to
reiterate if I were to is this case sensitive so if I were to type in Mario in all
lowercase and hit enter I actually get no such response now that
other questions now on python before we leave it behind for the coming
week all right well then let's introduce these relational databases so
relational database is what like every is a super popular way of storing lots of
data like this is what the Twitters of the world the Googles of the world the
metas of the world use to store some of their data at scale there are
alternatives to relational databases um indeed today we'll talk about a
language called SQL there's also a movement if you will or an
alternative generally called NoSQL which is just the opposite you don't use
SQL there are things called object-oriented databases and the like but if
you've ever heard of MySQL or PostgreSQL or uh Microsoft SQL Server or
Oracle or MariaDB or a bunch of other products both free and commercial
this is what they're talking about databases that are designed to store lots of
data and what's nice about relational databases is that they're really similar
to the spreadsheets with
which you were presumably familiar long before today's class so a relational
database is going to store as you'll see all of the data in rows and columns
now the terminology will thereafter be a little different instead of having
sheets you're going to have tables but those tables are still going to have
rows and columns and you're going to have even more control over the
performance of your data when you start to access it using this structured
query language or SQL this is a language you
can use for web apps mobile apps uh a lot of analysts would sit down at
their Mac or PC and actually ask questions of data to get back the answer
and wonderfully even though there will be some new syntax today SQL really
just does four basic things crud is the sort of crude acronym here crud is a
way of remembering that a relational database supports ultimately creating
data reading data updating data and deleting data so even if you're feeling
like wow this is a lot of new syntax which it
isn't relative to our past languages the only things you're doing really are
creating data reading data updating and deleting the same now a little
confusingly in SQL the corresponding functions or commands that exist that
map to crud are actually this so it's still create but there's another one called
insert uh it's not read which is more of the computer scientist way of saying
it but select which is a little more explicit like select data you care about
update is still update delete is
still delete but there's another command called drop which lets you drop that
is delete entire tables as well so you can create tables using syntax that's
generally going to look like this you'll say create table you'll give the name of
the table uh which you can call almost anything you want but generally all
lowercase no spaces is best then in parentheses you can specify a comma
separated list of the columns that you might want in this table so this is the
code equivalent in the SQL language of
like manually opening Google Sheets or Excel or numbers and like clicking in
the top left cell and like typing timestamp and then in the next typing
language and then in the third typing problem this is the way to sort of
Define what your headers are if you will in a spreadsheet but now it's called
a table now we won't use this command manually first let's do something a
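As a rough illustration of the create table syntax, here is a sketch that runs it through Python's built-in sqlite3 module against a throwaway in-memory database. The TEXT column types are an assumption at this point in the lecture:

```python
import sqlite3

# ":memory:" creates a temporary database that vanishes when closed
db = sqlite3.connect(":memory:")

# Table name, then a comma-separated list of columns in parentheses
db.execute("CREATE TABLE favorites (timestamp TEXT, language TEXT, problem TEXT)")

# PRAGMA table_info lists one row per column; index 1 holds the column name
columns = [row[1] for row in db.execute("PRAGMA table_info(favorites)")]
print(columns)

db.close()
```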
little simpler we're going to start off by just importing this data ourselves and
I'm going to go ahead and do this
let me go back to VS Code here I'm going to leave behind favorites.py for
now because now we're going to transition to this other language called SQL
and to do this I am going to create a new database file and I'm going to do so
using a command called sqlite3 which is just the third version thereof
and I'm going to give the database a name of favorites.db there's different
conventions but this is one of the most common when I hit enter this is going
to create for me a new empty database just
you how you can use SQL in Python code so that you still write python code
to do whatever you want but you can talk to databases using Python and this
is exactly how web apps mobile apps work for instance on iOS uh on an
iPhone an iPad or the like if you want to store data it's very often stored in a
SQL database as we're about to do um but you might use a language called
Swift or Objective-C and same exists in the world of Android using Java or
Kotlin or something else to query the database so
we're going to see SQL in isolation for now like an analyst might just use at
their Mac or PC but we're going to tie it together by Day end so at this
terminal uh SQLite let me go ahead and execute uh this command first I'm
going to first put SQLite into CSV mode because I'm going to cut some
Corners initially and I'm just going to automatically import all of the data that
was submitted via that Google form which I exported as a CSV and uploaded
to my code space and I'm just going to
automatically say turn this CSV file into a SQL database for me just so I don't
have to figure out what those create table commands are so to do this I'm
going to say dot mode csv so that SQLite knows that this is the command uh
knows that this is a CSV file it's literally dot mode so the dot comes before the
keyword there and now I'm going to say dot import and then the name of the
file I want to import which is favorites.csv and now the name of the table
that I want to create with that
data and just for consistency I'm going to call it favorites I could change
these things to be anything I want but I'm going to do that and voila nothing
seems to have happened but just like in C and in Python in Linux when
nothing seems to happen that's usually a good thing it means I didn't mess
up so if I want to see what just happened there's this other command and
these commands that start with dots these are SQLite specific which is
indeed a lightweight version of SQL they're not SQL per se so
if you're using Oracle or something else like that you're not going to use
these exact commands you'll see the ones we use in just a moment and
here's the first when I type dot schema the schema of a database is the design
of the database what are the tables what are the columns and all of that so
when I type dot schema this actually in this case shows me that create table
command that was automatically run for me by just doing this import line
once I get more comfortable with SQL I could literally
type this out myself or use some program to generate that as well but what
it's creating for me is this create table if it doesn't exist even though it's
more terse than that I want to create a table called favorites and then the
columns for that table are going to be timestamp which is going to be text
comma language which is also going to be text comma problem which is also
going to be text that was just inferred very trivially by the dot import
command to just figure out that yes just give me a three column
database table based on the Google form okay questions on this these are
commands you run once to get up and running you don't run these
commands frequently but we have them on the slide just for reference all
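The dot commands belong to the sqlite3 shell and are not available from Python. A rough equivalent of the import step, sketched with Python's csv and sqlite3 modules, an in-memory database, and made-up sample rows, might be:

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for favorites.csv
SAMPLE = """Timestamp,language,problem
10/30/2023 10:00:00,Python,Mario
10/30/2023 10:01:00,C,Filter
10/30/2023 10:02:00,Scratch,Scratch
"""

db = sqlite3.connect(":memory:")  # a real run might use "favorites.db"
db.execute(
    "CREATE TABLE IF NOT EXISTS favorites "
    "(timestamp TEXT, language TEXT, problem TEXT)"
)

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header line; the table already names its columns

# One INSERT per CSV row; the ? placeholders are filled from each row
db.executemany("INSERT INTO favorites VALUES (?, ?, ?)", reader)
db.commit()

imported = db.execute("SELECT * FROM favorites").fetchall()
print(len(imported), "rows imported")
```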
right so now let's do something a little more interesting I'm going to clear my
SQLite terminal here but I'm still in SQLite I'm going to now use some of
my first SQL commands which recall were uh were among them uh select so
CRUD C R U D the R was select this is maybe the
most common the most useful the most powerful thing to use with a SQL
database selecting data to answer questions akin to the ones we were trying
to answer with python this is the general syntax anytime you want to select
data from a SQL database you literally say select you then specify the
column or columns that you want to select data from you literally write the
word from and then you specify the name of the table you want to get that
data from semicolon in this case everything that's in capitals here is a SQL
keyword strictly speaking you don't have to capitalize things but we would
encourage you to do so stylistically and especially as you're learning and
even as you're writing it it just helps to distinguish SQL from like words you
chose like the names of the columns and the data therein so uh do adopt
early on this convention so let me go back now to my code space here I'm
running my terminal window with sqlite3 inside of it suppose that I just
want to get all of the data from the
favorites table which was automatically imported let's do this select I want
everything well I can do timestamp comma language comma problem but
you know what here's a uh convenience already if you want everything
there's what's called a wildcard character in SQL which is just a star an
asterisk which means give me every column without my knowing even what
they're called let me go ahead now and say from favorites semicolon and
this is the SQL way of opening the database iterating over every row
therein printing out every row therein done so those three steps which was
like nine lines of python code give or take earlier is now one line of SQL I hit
enter there is all of the data so I see now all of the data just outputted as a
CSV here but it's not the CSV file it's now actually the table and in fact just
for good measure let me do this cause you'll see the behavior a little different
the next time we open the file I've just exited out of sqlite3 I'm going to
rerun it but I'm not going to
reimport the data or do anything like that because my file now exists in fact
let me take one step back if I type LS at my Linux prompt there's my
favorites.py from before there's my favorites.csv from before and here's a
third file that I did create a moment ago when I first ran sqlite3 so the
data is persistent it's not using RAM or memory anything I do now is saved
there so let's go ahead and rerun sqlite3 with the same file but I'm not
going to I don't have to reimport everything because the
file already exists let me now do that same thing again select star from
favorites to get all of the data and what you'll see now is the same data but
it's a little prettier now because I reran it I effectively disabled CSV mode this
time and what I'm now seeing is the entire contents of this database table
called favorites now there's nothing new here but you're just seeing now like an
ASCII or Unicode version of all of the same data from that database well
suppose I want to get a subset of the
data well let me clear my screen and just like in Linux I can hit contrl L just to
clean things up aesthetically suppose I want to get just the languages so I
could do select language from favorites and this will now select not all three
columns AKA star this will only select the language column and all of the
data therein if I hit enter voila now I just see those there no timestamps
no problems it's just a slice of the table if you will all right not that interesting
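The two select queries so far, sketched through Python's sqlite3 module so they can be run anywhere. The in-memory table and its rows are made up; the lecture runs the same SQL directly in the sqlite3 shell:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (timestamp TEXT, language TEXT, problem TEXT)")
# Hypothetical rows, not the real form data
db.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [
        ("10/30/2023 10:00:00", "Python", "Mario"),
        ("10/30/2023 10:01:00", "C", "Filter"),
        ("10/30/2023 10:02:00", "Scratch", "Scratch"),
    ],
)

# The * wildcard selects every column
everything = db.execute("SELECT * FROM favorites;").fetchall()

# Naming one column returns just that slice of the table
languages = db.execute("SELECT language FROM favorites;").fetchall()

for row in everything:
    print(row)
for row in languages:
    print(row[0])
```

Each result row comes back as a tuple, so a one-column query yields one-element tuples.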
still because it's
just a big column of data but now things get more interesting it turns out in
SQL that there are functions that come with this language just like C just like
python in SQL some of the more useful ones some of the simpler ones are
these here average count distinct lower Max Min upper which pretty much uh
do what they say and count is a particularly useful one let's start with that
you know it's a reasonable question to be asked uh how many people
submitted the Google form by the time I actually
downloaded the CSV well why don't we go ahead and do this let me go back
to VS Code here in my terminal window let me select not star but the count
of star so give me the count of the rows that are being returned from the
database called uh the database table called favorites now when I hit enter
I'm not going to get all the data I'm just going to get simply a number 430
rows came back so that's pretty good I now know how much data is in there
well what languages were in there well I could do select
language from favorites just as before but that's not that useful especially if I'm
inheriting the data like I'm the analyst who's been handed a data set by my
boss and they want me to like crunch some numbers okay I could like load
this into Excel I could sort it but you can use SQL now to answer pretty basic
questions too if you want to select the distinct languages in the data set
because you didn't you weren't privy to the Google form let me go ahead
and select only the
distinct languages from the favorites table and now I hit enter and I get back
a much more succinct answer just the three languages in question not really
that useful since I created the Google form but certainly if you're inheriting
data from someone else you've just downloaded a data set at least now I'm
arguably wrapping my mind around what's going on now this is not
necessary for such a small data set but I can combine these things select the
count of the distinct languages in this data set
called favorites and now I should get back what answer so hopefully indeed
an answer called three and what you're getting back notice aesthetically too
is like a mini temporary table when I asked for uh just the distinct languages
what SQL hands me back is this temporary table in memory that has one
column called language and then two row uh three rows now this is not
saved anywhere it's just executed ephemerally like this but that's why it's
depicted in this way what you're getting is subsets of your
data smaller tables containing some of your data and same thing down here
this is like a crazy long uh column name you can rename it if you really want
uh but uh that's all we're seeing there and in fact if that's a little ugly we can
actually Alias These Things N is a common uh name for a variable a number
in any programming language so I can actually alias this to be a column
called n hit enter and now I'm getting a tiny tiny table whose column is
called n that just has the one
value there all right questions on these application of these functions here no
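The count, distinct, and alias queries above can be tried with the following sketch, which uses Python's sqlite3 module and a made-up in-memory table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (timestamp TEXT, language TEXT, problem TEXT)")
# Hypothetical rows, not the real form data
db.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [
        ("10/30/2023 10:00:00", "Python", "Mario"),
        ("10/30/2023 10:01:00", "C", "Filter"),
        ("10/30/2023 10:02:00", "Scratch", "Scratch"),
        ("10/30/2023 10:03:00", "Python", "Hello"),
    ],
)

# How many rows are in the table
total = db.execute("SELECT COUNT(*) FROM favorites;").fetchone()[0]

# Each language once, duplicates removed
distinct = db.execute("SELECT DISTINCT language FROM favorites;").fetchall()

# Count of distinct languages, with the result column aliased to n
cursor = db.execute("SELECT COUNT(DISTINCT language) AS n FROM favorites;")
column_name = cursor.description[0][0]  # name of the first result column
n = cursor.fetchone()[0]

print(total, distinct, column_name, n)
```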
questions yeah say a little louder as oh as as literally in English so name this
column rename this column as this technically it creates an alias for the
column so that's all yeah exactly distinct will operate on whatever you
handed in parentheses and get rid of all of the duplicates giving you back
just the unique correct other questions here yeah good question when you
define an alias like n which I just did does it
become like a variable you can reuse short answer no in this case but you
can reuse it within your same query even though these these queries are
getting a little longer admittedly statements that they are uh you can
actually reuse n in even longer queries so later in your query and we'll see a
few that are going to start to grow in length so it's a nice way of
nicknaming things just to be a little more terse in your query so we can
transition to some of these more sophisticated queries because
it turns out there's some other uh techniques we can introduce as well here
are some other keywords in SQL and again even though like this is another
list of things there's only four things fundamentally we're doing creating
reading updating and deleting data these are just allowing us to like fine-
tune how we do it exactly so where is going to allow us to filter data as we'll
do in just a moment like select data where this conditional is true uh like is
going to be an alternative to an equal
just a couple of these as well let me go back to vs code here I'll clear my
screen I'm still in the same SQLite instance and let's count how many of
you liked C without writing python code as before so let me go ahead and
select the count of the rows from favorites where the language in each row
equals C and the convention in SQLite is to use single quotes anytime
you're surrounding a string that's meant to represent a literal piece of text uh
as opposed to C which was double quotes or python which
was either so this is selecting the count of rows from favorites table where
the language in question is C enter and this gives me 98 notice though if I
omit that predicate like we did before you'll get back the total number of
rows that were in the table so where is what's called a predicate that just
allows me to filter things just like an if condition or the like in a language that
we've seen before you can be a little more specific like how many people
really liked C and the Mario program uh
problem specifically well let's do this uh let's go ahead and do select the
number of rows from the favorites table where the language is C and so it's
uh still literally the words and and or just like in Python but not like in C uh
and problem equals Mario so let's see if there's any fans of both C and the Mario
problem and three of us really like those two things together in this case all
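
A minimal sketch of the WHERE queries above, run through Python's built-in sqlite3 module. The in-memory table, its column names (Timestamp, language, problem), and the four rows are made-up stand-ins for the class's favorites data, so the counts will not match the lecture's.

```python
import sqlite3

# Hypothetical stand-in for the favorites table; these rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
conn.executemany(
    "INSERT INTO favorites (Timestamp, language, problem) VALUES (?, ?, ?)",
    [
        ("t1", "C", "Mario"),
        ("t2", "C", "Tideman"),
        ("t3", "Python", "Mario"),
        ("t4", "Scratch", "Starting from Scratch"),
    ],
)

# No predicate: every row in the table is counted.
total = conn.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]

# WHERE filters rows, much like an if condition; text literals use single quotes.
c_fans = conn.execute(
    "SELECT COUNT(*) FROM favorites WHERE language = 'C'"
).fetchone()[0]

# AND combines two conditions inside one predicate.
c_mario_fans = conn.execute(
    "SELECT COUNT(*) FROM favorites WHERE language = 'C' AND problem = 'Mario'"
).fetchone()[0]

print(total, c_fans, c_mario_fans)  # 4 2 1
```
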
right what else can we do well more compelling might be to see kind of like
in Python for each language what was the
popularity thereof and at the moment we don't really have a way of doing
that except in Python where we had the loop and we had those variables
with the dictionary that did all that counting for us you know totally doable
but tedious especially if your job is to analyze data my God like even writing
15 lines of code to answer simple questions is kind of ridiculous SQL can do
better for us so let me go ahead and do this let me go ahead and select
every language and the count thereof from the
favorites table but this time Group by language so this was another one of
the keywords that we can use in this abbreviated list of extra features of SQL
and this one's a little takes a moment to wrap your mind around but this is
going to give me a two column temporary table where the First Column is a
language and the second column is the count thereof from this data set and
group by language just means that only show me scratch once only show me
c once only show me python once that is group
all of the identical values together but keep track of how many of them there
are and so now if I go over to SQLite and I hit enter now I have in SQL
version the exact same output that I had from python that took me what 15
plus lines before now we're down to just one because SQL structured query
language is all about constructing queries like this to answer questions and
get back answers quickly if we want to clean this up a little bit you asked
earlier about sorting order well we can do that too
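
To make the comparison with the earlier Python approach concrete, here is a small sketch that counts languages twice: once with a loop and a dictionary, once with GROUP BY. The rows and the two-column table are assumptions made for illustration, not the class's data.

```python
import sqlite3

# Invented sample rows: (language, problem).
rows = [
    ("C", "Mario"),
    ("Python", "Mario"),
    ("Python", "Cash"),
    ("Scratch", "Starting from Scratch"),
    ("Python", "Readability"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
conn.executemany("INSERT INTO favorites (language, problem) VALUES (?, ?)", rows)

# Python version: a loop plus a dictionary that does the counting.
counts = {}
for language, _problem in rows:
    counts[language] = counts.get(language, 0) + 1

# SQL version: GROUP BY collapses identical languages into one row each
# and COUNT(*) reports how many rows fell into each group.
grouped = dict(
    conn.execute(
        "SELECT language, COUNT(*) FROM favorites GROUP BY language"
    ).fetchall()
)

print(grouped == counts)  # True
```
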
there's another uh key phrase we can use here we can order by the count of
those rows and then run that query here so now unfortunately they're from
smallest to biggest but we can reverse that it turns out and my query is
starting to wrap here I'll I'll zoom out for a moment if you want to order by
count the default is in ascending order abbreviated ASC if you want to
reverse the sort in SQL instead of using reverse equals true like we did in
Python you say DESC for descending order and now we get almost
the same output but flipped in Reverse so it's just a lot faster to answer
questions once of course you get some muscle memory and some comfort
with it well what else can I do you know what if I just care about the most
popular language I don't care about the second place or the third place
languages or anything else well let me add one more Clause here limit the
answer to one and no matter how many rows should come back now I just
get the number one language as of the data set we collected with 27
uh 270 votes for it questions on this any questions here no well what if uh
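
The ordering and limiting steps can be sketched the same way. The table and rows below are invented, with no ties in the counts so the ordering is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT)")
# Invented votes: Python 3, C 2, Scratch 1.
conn.executemany(
    "INSERT INTO favorites (language) VALUES (?)",
    [("Python",), ("C",), ("Python",), ("Scratch",), ("C",), ("Python",)],
)

base = "SELECT language, COUNT(*) FROM favorites GROUP BY language "

# ASC is the default direction, so writing it out is optional.
ascending = conn.execute(base + "ORDER BY COUNT(*) ASC").fetchall()

# DESC reverses the sort, playing the role of reverse=True in Python.
descending = conn.execute(base + "ORDER BY COUNT(*) DESC").fetchall()

# LIMIT 1 keeps only the first row of the sorted result.
top = conn.execute(base + "ORDER BY COUNT(*) DESC LIMIT 1").fetchone()

print(ascending)   # [('Scratch', 1), ('C', 2), ('Python', 3)]
print(descending)  # [('Python', 3), ('C', 2), ('Scratch', 1)]
print(top)         # ('Python', 3)
```
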
you know we're starting to introduce SQL and it was kind of too late to make
it into the Google form so it turns out there's syntax for this too you
can create data of course not just the tables but the data therein and
here's like the typical Syntax for inserting data into a SQL database you
literally say insert into the name of the table and then in parentheses you
specify one or more columns for which you have values that
you want to insert this is to say you don't have to give values for every
column in the given row if you only have answers to some of those questions
you can enumerate them here like this but the values you insert are going to
be these so you literally say after the closed parenthesis values and then in a
second set of parentheses with the same length comma separated list you
specify what values do you want to insert so it's a little verbose and frankly
longer term you're going to use like python
code to automatically do these kinds of insertions but let's go ahead and try
this right now if I do select distinct uh language from favorites again we see
this just these three candidates but we've now taught you a bit of SQL so
let's do insert into favorites the column called language uh and you know
what I'm going to give a problem here the values for which and let me
zoom back out are going to be quote unquote SQL and quote unquote Fiftyville
you'll soon see what
that's all about semicolon nothing seems to happen but that's usually a good
thing and now if I scroll back up in my queries in sqlite3 you can
scroll back and forth in time and uh to avoid retyping things now I should see
indeed four candidate languages here now suppose that you were never
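
A sketch of that insertion, assuming the favorites table has Timestamp, language, and problem columns and using invented starting rows. The problem name is an assumed spelling of what is said in the lecture. Because only two columns are listed, the third is left as NULL, which Python reports as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
conn.executemany(
    "INSERT INTO favorites (Timestamp, language, problem) VALUES (?, ?, ?)",
    [("t1", "Scratch", "Starting from Scratch"), ("t2", "C", "Mario"), ("t3", "Python", "Mario")],
)

before = {row[0] for row in conn.execute("SELECT DISTINCT language FROM favorites")}

# Name only the columns we have values for; the VALUES list must be the same length.
conn.execute("INSERT INTO favorites (language, problem) VALUES ('SQL', 'Fiftyville')")

after = {row[0] for row in conn.execute("SELECT DISTINCT language FROM favorites")}

# The column we skipped (Timestamp) is NULL for the new row.
new_row = conn.execute(
    "SELECT Timestamp, language, problem FROM favorites WHERE language = 'SQL'"
).fetchone()

print(sorted(after - before))  # ['SQL']
print(new_row)                 # (None, 'SQL', 'Fiftyville')
```
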
really a fan of c and maybe you uh programmed a little bit in high school or
in the real world and you liked C++ well there's a whole lot of answers for C
so select star from favorites where language
equals quote unquote C so here's everyone who submitted the answer for C
let's presume that no they didn't really want C they wanted C++ which is not
a language we teach in the class but I could also now do this you can use the
update command to set a column or columns to different values where some
condition is met so if I do update table name set column name equal to some
value filtering it perhaps by where some condition is true so suppose I've
changed my mind or you know what let's
look at the contents of my database now you see that indeed C++ is
comingled with all the other data this is not what you all intended of course
so I can undo this let me go ahead and undo what I just did let me set my
favorite language to C where language equals C plus plus but the predicate
is important this I'm not going to do what if I accidentally omitted this
predicate the where clause how would that screw things up might you think
uh yeah in back it would set every row's language to
indeed C and this is dangerous and if you start googling around for like
SQL mistakes or the like people in the real world have accidentally run
commands like this and without naming names a former member of our
teaching staff at one point accidentally reran a command like this and
changed every student's name in our database to Bobby I think it was the
same name for every row because they simply forgot a predicate so here
too like there's dangers in code and you should adopt the
habit quite quickly of always one backing up your data like with cp for
instance in Linux or any other technique or just making sure before you hit
enter that yes this is indeed the query I want to execute and generally
speaking in the real world there should be process controls in place like the
intern should not have access to the production database the live
database and the like but you have a lot of power now with these queries so
just be all the more careful cuz very easily can you
do bad things so let me undo this where language equals quote unquote C++
and I'll zoom back out enter and now I think we're back in business C is
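
One way to see the difference the predicate makes without damaging anything is to run both versions inside a transaction and roll each one back. This is only a sketch on invented rows; the rollback is an addition for safety here, not something done in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
conn.executemany(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)",
    [("C", "Mario"), ("C", "Tideman"), ("Python", "Mario")],
)
conn.commit()  # save the starting state so the rollbacks below return to it

# With a predicate, only the matching rows change.
cur = conn.execute("UPDATE favorites SET language = 'C++' WHERE language = 'C'")
changed_with_where = cur.rowcount
conn.rollback()  # undo the change

# Without a predicate, every row in the table is overwritten.
cur = conn.execute("UPDATE favorites SET language = 'C++'")
changed_without_where = cur.rowcount
conn.rollback()  # undo again

languages = [row[0] for row in conn.execute("SELECT language FROM favorites ORDER BY rowid")]

print(changed_with_where, changed_without_where)  # 2 3
print(languages)  # ['C', 'C', 'Python']
```

Checking the number of affected rows before committing is one simple guard against a forgotten WHERE clause.
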
among the answers yeah is essentially doing what at the end replace it's
essentially find and replace yes in like lay person's terms this is find and
replace implemented with SQL and in fact the authors of Microsoft Word or
Google Docs might very well be using a language like this SQL when you go
to the nice graphical user-friendly find and replace box this
may very well be what they're doing underneath the hood or of course they
could be using some other language altogether there's one last uh syntax
that's worth knowing delete which for better for worse is even more
destructive whereby it allows you to delete rows from tables it's distinct from
drop which lets you delete tables themselves this focuses on rows so
suppose that you really really didn't like let's say uh Tideman was a little
challenging if you tackle
that more comfortable problem so if you really don't want to even think
about Tideman anymore so why don't we do uh delete from favorites where
problem equals and I won't execute it for real Tideman this would have the
effect of deleting every row including the language therein and the time
stamp where the student answered Tideman worse than this would be this why
might this be bad okay chuckling because like there's no predicate there's no
filter which means literally this would delete all of the
data so again with great power here comes great responsibility now this has
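
A hedged sketch of a filtered DELETE on invented rows. Previewing the match count with a SELECT that uses the same predicate is an extra precaution added here, not a step from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
conn.executemany(
    "INSERT INTO favorites (Timestamp, language, problem) VALUES (?, ?, ?)",
    [("t1", "C", "Tideman"), ("t2", "C", "Mario"), ("t3", "Python", "Tideman")],
)

# Preview: how many rows would the predicate match?
would_delete = conn.execute(
    "SELECT COUNT(*) FROM favorites WHERE problem = ?", ("Tideman",)
).fetchone()[0]

# DELETE removes whole rows (timestamp and language included), not just one cell.
deleted = conn.execute(
    "DELETE FROM favorites WHERE problem = ?", ("Tideman",)
).rowcount

remaining = conn.execute("SELECT language, problem FROM favorites").fetchall()

print(would_delete, deleted)  # 2 2
print(remaining)              # [('C', 'Mario')]
```
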
just been a data set of what 430 rows by us dynamically created there's of
course some really juicy data sets in the real world and one website you
might have heard of or an app you might have used is IMDb the internet movie
database which wonderfully makes some of their data available for download
as CSV files or technically tsv files tab separated values but what we did in
advance of class was download some of that data for
both TV shows in the real world and movies in the real world and what's
wonderful about this data set is it's not just dozens or hundreds or even
thousands of lines there are millions of rows of Juicy data TV shows and
movies with which most folks are probably familiar at least with the subset
and we'll see in just a little bit that this data comes in the form of now six
different tables that we've given you and the tables in question for today
are going to be the people in the TV
business the stars therein the shows that people are producing and the like
this is a picture we'll revisit to enable you to wrap your minds around
what the actual data is this feels like a good opportunity though for a snack
in fact in just a moment we have a whole lot of Rice Krispie treats out in the
lobby but if folks could perhaps acknowledge uh this mini wedding cake here
CS50's own Carter Zenke is getting married this week so congratulations to
Carter as well congrats all right there's
only okay there's only one piece of cake in that box but a lot of Rice Krispie
treats in the transept let's take 10 minutes and we'll be back with Internet
Movie Database in 10 all right we are back so if you've never been like you
can actually go to I [Link] right now and play around or download the
mobile app and it's just a big database of a lot of TV shows and movies and
actors and the like but what indeed is nice is you can download some of that
data and that's indeed what I've done in advance and what we've
done is we wrote some python code to convert some of the uh flat file
databases that they let you download and we converted it into a SQL
database with six tables so not just one but six that ultimately are these here
and let me just help you wrap your minds around what this picture is which is
an entity relationship diagram which is just to say each of these boxes on the
screen represents a table and each of the arrows or edges represents some
kind of relationship across the tables because up until now the only data we
had were
those three columns in the favorites table but what gets really useful about
SQL databases just like a Google spreadsheet or an Excel file is you can have
multiple sheets or in a database multiple tables and so what we're about to
see is that in this IMDb database for TV shows there's going to be a
dedicated table for all the people in the TV business there's going to be a
dedicated table for all of the TV shows that are in their database as of right
now there's going to be a dedicated table
for writers in that industry for the ratings of uh shows for the genres to which
shows belong comedy and the like and then lastly there's going to be this
table which somehow Associates people with the TV shows that they star in
and vice versa and so let's consider first what this looks like in code and we'll
see that it's going to overwhelm intentionally at first but I'm going to do this
I'm going to go back to my terminal window and during the break I
downloaded from the course's website a
file called shows.db which we made in advance for you and if I type ls I'll
see all of my favorites files from before the CSV the DB and the python file
but now there's shows.db so I'm going to go ahead in my full screen
terminal window here I'm not using actual tabs or code files now I'm going to
run sqlite3 on the file called shows.db and I'm just going to see this
version information here let me clear my screen and run the one command I
ran earlier to show us the schema of the
favorites database now we'll see the schema for the shows database and
there's a lot going on here but let me scroll back up to the very top the
beginning and we see this here so when I run .schema we see a dump really
of all of the SQL create table commands that were run in order to create this
database for you and one of those tables is called genres and another people
ratings shows stars and so forth and the columns therein even though it's
formatted a little more prettily than the automatically generated create table
statement for favorites whereby we have one column per line of output here
uh in the for instance people table there's going to be an ID column like
unique identifier like a Harvard ID a Yale ID or the like uh a name column a
birth year and then some other stuff if I scroll down to shows every show in
the world is going to have a unique ID as well a title of course the year in
which it debuted and the total number of episodes as of the time we
downloaded the data and then what else is there
some of these are a little less obvious like ratings here so ratings don't have
an ID column but they have a show ID column and a rating like a five-point
scale or 10-point scale or the like and then the total number of votes that were
collected to contribute to that rating IMDb allows people to like up vote and
down vote uh shows and movies and the like and then similarly are genres
structured there's a show ID and then there's a genre which is going to be
like an English word like comedy or
drama or something else and then what else let's go a little further at the
bottom here for stars and writers if we go to the very bottom here stars and
writers are similarly structured too they have a show ID and a person ID so
show and person and then this writers table has a show ID and a person ID
and there's a whole lot of other words that we'll come to in just a moment
but what are the what is this code hinting at well if I go back to the picture
from earlier here you'll see that this
picture captures the relationships among these various tables so for instance
if we focus on shows for just a moment a show again has a unique ID a title a
year in which it debuted and a total number of episodes if you want to figure
out what genre or genres a show belongs to cuz some shows are just comedies
some shows are just dramas but you know some shows are arguably
comedies and dramas depending on the episode or the like so you can
imagine wanting to associate two or three or even more
genres with a show this line here in this second table allows us to do that
every Row in the genres table we'll see has uh two items a show ID which
relates to the ID of a show and that's why these lines literally line up with
that specific column name and genre which is going to be like quote unquote
comedy quote unquote drama or something else now with that said design
question why have we deliberately not just gotten rid of this genres table
and made our lives simpler by just adding a genre column to
this shows table and again a table is just like a sheet with rows and columns
at the moment shows only have four columns ID title year episodes why not
just add a fifth column called genre and put the show genre there any
intuition here why not just keep things simple like yeah in back exactly if
you add a fifth column here and call it genre then you have to pick a genre
specifically you have to put in that cell presumably comedy or drama or
music or something else now you could write multiple words in the cell
but generally speaking that would be sloppy bad design like every cell just
like in a spreadsheet should really have one value it might have multiple
words but it shouldn't be like a weirdly comma separated list of multiple
things it should just be in a different cell in that case so if you instead were
to design this with just a single column called genre you're imposing what a
computer scientist would call a one-to-one relationship every show has one
genre and that's not necessarily a good thing
or strictly speaking it would be a many to one because the same genre could
belong to multiple shows but each show could only have one genre in that
case what a relational database allows you to do and relational is indeed the
operative word it allows you to factor out some of your information and then
have maybe one show here in one row but then in this genres table you
could have one row for that one show genre or you could have two rows in
the genres table for comedy and for drama or if it has a
third genre you could just add another row here so you still have one row for
the show itself with all the juiciest details but a variable number of rows by
having this relationship with another table meanwhile ratings work the same
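
A sketch of that factoring-out: one row for the show itself and a variable number of rows in a separate genres table. The column names (id, show_id, genre) and the example show are assumptions for illustration; the real schema in shows.db may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shows (
        id INTEGER,
        title TEXT NOT NULL,
        year NUMERIC,
        episodes INTEGER,
        PRIMARY KEY(id)
    );
    CREATE TABLE genres (
        show_id INTEGER NOT NULL,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    """
)

# One row holds the show's own details (a made-up show).
conn.execute(
    "INSERT INTO shows (id, title, year, episodes) VALUES (1, 'Example Show', 2005, 12)"
)

# Each genre gets its own row, so a show can have one, two, or more.
conn.executemany(
    "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
    [(1, "Comedy"), (1, "Drama")],
)

show_rows = conn.execute("SELECT COUNT(*) FROM shows WHERE id = 1").fetchone()[0]
show_genres = [
    row[0]
    for row in conn.execute("SELECT genre FROM genres WHERE show_id = 1 ORDER BY genre")
]

print(show_rows, show_genres)  # 1 ['Comedy', 'Drama']
```
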
way at least in this case a show has ID title year and episodes but if you
want to figure out its rating you have to kind of Follow the arrow here so to
speak and look up the corresponding show ID in this table find the rating of
that show and the total number of ratings so
that's been factored out too for better or for worse um now let's consider
people people have just three columns ID name and birth but there's no
mention of the TV show in which people have starred or the TV shows that a
person has written well why is that well if you just had a fourth column here
called show well you would have to decide what show is that person in and
no one could ever act again in another show because there's no room to
store the data but if someone of course a popular actor can
star in multiple shows well we could have one ID for that person one name
one birth year obviously like there's only one Steve Carell as an actor in the
world of people but Steve Carell in this example could have his person ID
whatever his Harvard ID equivalent Yale ID equivalent is appear in multiple
rows in this table so that it can be associated with multiple shows and this
allows you to create what's called a one to many relationship or technically
it's bidirectional it's a many to many
relationship why well one show can certainly have multiple people in
it and multiple people writing for it just in the real world but conversely one
uh person could certainly act in multiple shows or write multiple shows so
this is what you get with relational databases you put your sort of canonical
data for people in one place for for shows in another place and then you use
these additional tables to relate one thing to another so we won't dwell on
the pictures that's just if you sort of
uh can wrap your mind around the data set better that way that's one way of
thinking about it but recall that the code we just saw for the schema again
escalated quickly like there's a lot of keywords I haven't mentioned yet but
some of these are perhaps familiar they're capitalized differently here but
integer is on the list here null is on the list albeit technically not null so let's
tease apart some of these key words and consider what they're actually
doing for your database because now
crazy amounts of data in the world of SQLite specifically there's these five
data types so just like in C we had int and Char and the like in SQL we have
these uh BLOB which is kind of funny but it just means binary large object so it's
like a binary data type zeros and ones that aren't necessarily uh fitting into
the other categories integer which of course is an integer as we know it
numeric which is kind of a catchall for numbers that are formatted specially
so like a date uh would be like year year
year year dash month month dash day day um and this is actually a wonderful
thing depending on the country you're from you might think your date
system in your country is great or it's horrible the US system is horrible
because we have month day and then year which is impossible to sort it is
the wrong way objectively to store data and yet here we are using this at
scale other countries have gotten this better numeric and SQL itself
standardizes that stuff so it doesn't matter what country you're from
you're storing your data in this particular way for instance times are
standardized and other types of numeric data as well real is synonymous
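
The sorting point about date formats can be checked in a few lines of plain Python. The three dates are arbitrary examples; the sketch only shows that year-month-day strings sort chronologically as text while month/day/year strings do not.

```python
from datetime import datetime

# The same three arbitrary dates written two ways.
iso_dates = ["2024-01-15", "2023-12-31", "2024-02-01"]  # YYYY-MM-DD
us_dates = ["01/15/2024", "12/31/2023", "02/01/2024"]   # MM/DD/YYYY

# Plain text sorting of YYYY-MM-DD is already chronological.
iso_sorted = sorted(iso_dates)

# Plain text sorting of MM/DD/YYYY compares the month first,
# so December 2023 lands after both dates from 2024.
us_sorted = sorted(us_dates)

# Converting to the year-first form restores a sortable representation.
converted = sorted(
    datetime.strptime(d, "%m/%d/%Y").date().isoformat() for d in us_dates
)

print(iso_sorted)  # ['2023-12-31', '2024-01-15', '2024-02-01']
print(us_sorted)   # ['01/15/2024', '02/01/2024', '12/31/2023']
print(converted == iso_sorted)  # True
```
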
with float so something with a decimal point and some number of uh digits
thereafter and then text is just uh for strings and the like with other even
fancier databases like MySQL PostgreSQL Oracle and other products you
might have heard of there's even more data types where you have to make
even finer-grained decisions but for SQLite it's indeed
pretty lightweight and you or we just have to decide the data types for each
column in a table but there's these additional constraints in the world of SQL
you can additionally say that cells in this column may or may not be null so if
you want to protect yourself from yourself so you don't screw up and insert a
null that is a blank value you can explicitly design a table to have a column
that cannot be null and so in fact someone came up during the break to ask
me about my having manually inserted
SQL quote unquote SQL into our favorites database you might recall that I
kind of cheated I just inserted uh SQL quote unquote and Fiftyville the name of a
new problem quote unquote but what did I not insert into the database a
timestamp and I could have I could have put like the current date and time a few
minutes ago but I didn't and that's fine if it's uh if it's acceptable to you and
the product you're building but I could have prevented that if we had defined
the table to have a
timestamp column that isn't just text but it's text that's not null SQL would
have complained and would not have let me complete that insertion so
there's these kinds of built-in defenses that you don't necessarily get with a
spreadsheet alone and unique means exactly that if you want to make sure
that every Row in that column is unique maybe for email addresses or in the
US Social Security numbers or anything that you want to make sure you
don't have two versions of you can specify that the
column is unique and there's other such constraints as well but again this is
just a list of features that you get from a proper relational database but
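
A sketch of those two constraints in action. The favorites table here is a hypothetical variant declared with a NOT NULL timestamp (the class's actual table was not), and the users table with a UNIQUE email column is invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical variant of favorites whose Timestamp column may not be NULL.
conn.execute(
    "CREATE TABLE favorites (Timestamp TEXT NOT NULL, language TEXT, problem TEXT)"
)
try:
    # No Timestamp is supplied, so the NOT NULL constraint is violated.
    conn.execute("INSERT INTO favorites (language, problem) VALUES ('SQL', 'Fiftyville')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

# Invented table: UNIQUE forbids two rows with the same email.
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES ('someone@example.com')")
try:
    conn.execute("INSERT INTO users (email) VALUES ('someone@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(null_rejected, duplicate_rejected)  # True True
```
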
perhaps the most intellectually interesting one and the most powerful one is
what's called here a primary key and a foreign key and let me go back now
to this output if we look at shows you'll see that a show again has an ID a
title a year and a number of episodes and now the data types might make
sense the ID it turns out just like a Harvard
mentioned a few lines earlier this just means that the database will use the
ID column as the unique identifier so it's similar to the unique keyword but
primary key just means the database is going to treat it as special too and
make sure that it is uniquely identifying your data but what's interesting is
this notice if I scroll back up to people people were sort of similarly
structured but with different attributes like up here we had a person has an
ID a name a birth year and a primary key of ID so ID is again
integer name is text but not null because it'd be weird to have a human with
absolutely no name textually birth is going to be numeric but the primary
key of people is ID as well so those are the unique columns that the database
will just treat special why well we just looked at shows we just looked at
people let's focus now on this one down here Stars how do you determine
who starred in a TV show well we had two columns the show ID and the person
ID this is the incarnation of a many-to-many
relationship one person could be in many shows one show could certainly
have many people in it or writing for it but notice this within this table of two
columns show ID and person ID there's what's going to be called a foreign
key called show ID that references the shows table's ID column and then
another foreign key called person ID though I could call these things in
parentheses anything I want that references the people table's ID column
now you're not going to often have to type commands
like this again you set the database up once in the beginning typically maybe
with some help from a TF maybe with help of Google or the like but once
your database is designed it's back to the crud like create read update delete
the selects the inserts the deletions and the like but what's this implying
these keywords like primary key and foreign key are what are doing in code
what this picture was painting a moment ago these lines here are drawn
literally to line up with the corresponding things
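
A minimal sketch of how primary and foreign keys look in code, using assumed column names (id, show_id, person_id) and made-up rows. One detail worth knowing: SQLite only enforces foreign keys on a connection after `PRAGMA foreign_keys = ON`, so the sketch turns that on first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE people (
        id INTEGER, name TEXT NOT NULL, birth NUMERIC,
        PRIMARY KEY(id)
    );
    CREATE TABLE shows (
        id INTEGER, title TEXT NOT NULL, year NUMERIC, episodes INTEGER,
        PRIMARY KEY(id)
    );
    CREATE TABLE stars (
        show_id INTEGER NOT NULL,
        person_id INTEGER NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id),
        FOREIGN KEY(person_id) REFERENCES people(id)
    );
    """
)

# Made-up person and show, then one row in stars linking them by ID.
conn.execute("INSERT INTO people (id, name, birth) VALUES (1, 'Example Actor', 1970)")
conn.execute("INSERT INTO shows (id, title, year, episodes) VALUES (10, 'Example Show', 2005, 12)")
conn.execute("INSERT INTO stars (show_id, person_id) VALUES (10, 1)")

# A primary key must be unique, so reusing id 1 fails.
try:
    conn.execute("INSERT INTO people (id, name) VALUES (1, 'Someone Else')")
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

# A foreign key must point at an existing row; person 999 does not exist.
try:
    conn.execute("INSERT INTO stars (show_id, person_id) VALUES (10, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

star_rows = conn.execute("SELECT show_id, person_id FROM stars").fetchall()
print(duplicate_id_rejected, orphan_rejected, star_rows)  # True True [(10, 1)]
```
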
pretty picture or uh you know a three-hour lecture to explain what the data
set is rather you just have the data set in your own knowledge of SQL so let
me play around so schema shows me all the tables that might be a starting
point okay this is interesting I know what people are let's go ahead and
show me all the people so select star from people I'm just trying to wrap my
mind around what this data set looks like in a more user-friendly way okay
okay that's already a lot of people as you see the
years flying by there's been a lot of people in the TV business so this was
maybe not the best query to run but this is indicative of just how large this
data set is from IMDb okay when in doubt and whenever you lose
control over your computer control C is your friend to interrupt what would
have been better because I don't think I need to know all the million people
in the world I could do like limit me to 10 people all right and that's enough
now to get a sense of like Fred Astaire has an ID of one
first person ever um birth year of 1899 Lauren Bacall and all of these other uh
people from yesteryear you see that they are the first 10 people in the
database so there's an example of some of the data now if I want to wrap my
mind around what a show is you know I know it technically I know it from the
picture but let's just look at some raw data so instead of saying select star
from people let me go ahead and select star from shows limit 10 and okay
I've only heard of or seen a couple of these
but these are older shows at that but I see that every show has an ID a title a
year in which it debuted and a number of episodes but perhaps most opaque
is going to be this select star from Stars where this is the table that
Associates people with shows am I going to see any names or show titles
here not according to the definition we saw earlier oh I should have done my
limit let me interrupt that let me do that again limit 10 no and this is where
now you're definitely in the programmer world
because like this would be the most annoying spreadsheet to use on your
Mac or PC ever if you just had like a sheet with all of these numbers that
Associates one thing with the other like my God how do you figure out who
this is or what this is you have to like manually control F or command F
looking for the data but a database doesn't care once you know SQL you can
sort of Stitch these things back together so what you're seeing here are
foreign Keys foreign Keys why because show ID corresponds to the same
numbers from
that other table called shows that has a proper primary key called ID person
ID is a foreign key in this context because it refers to numbers that belong to
really the people table and its ID column so this is just a way of somehow
linking them and so if you think of I always like think of um this in my mind's
eye as this if this is like the people table this is the shows table and there's
this middle table in between the Stars table there's some way of like
stitching those two together by lining
up the IDs of one with the other and getting back some more data so let's
actually play with some of this data how about we start where we
emphasized earlier genres so let me go ahead and take a quick look at all of
the genres in this database so select star from genres star is usually going to
be a little overwhelming but it just gives me a sense of what the data is but
let's actually look at um uh let's go look at all of them there okay that's a lot
these are all official genres from IMDb
let me oh okay it went okay it wasn't terribly long let me filter that down so
from genres where genre equals comedy uh Capital C just based on the data
I'm seeing okay so what am I seeing now and in fact let me limit this
arbitrarily to 10 though I could limit it to anything I want here are 10
comedies what are they well who the heck knows like all I know are the 10
show IDs now I could do something like this as we've seen before with SQL I
could do all right well let's figure out what this show ID is select
star from shows where the ID of the show I'm looking for equals what
62614 semicolon so I could like manually look it up by cross referencing the
other table okay so that was the show in question there the first comedy in
the data set let me look up the second one so instead of that let's do 63881
enter okay so that's that show and let's do one more and suffice it to say this
is just getting tedious and vulnerable to mistakes quickly this is not this
surely can't be the way to do this and indeed
SQL is going to let us do this a little more powerfully instead let's do this
instead of getting this table temporarily with all these show IDs and all these
genres let's refine the query so let's just select the show ID from the genres
table where the genre equals quote unquote comedy now I have a big list
of show IDs all of which are comedy how many well I can combine ideas from
earlier I can just count all of those show IDs or star if I want to just do
that too but I can count all those
show IDs 4876 comedies in IMDb's database for TV shows so feels like a lot
but how can I now use that information and get back the titles of comedies in
the database without doing it manually well let's do this I have a moment
ago this query select the show ID from genres where the current genre is
quote unquote comedy um what if I kind of Nest these queries kind of like
grade school math in parenthesis what if I combine this whole thing in
parenthesis and now let me select what I really want let me go
ahead and select how about uh the title of all shows where the ID of the
show is in this list of show IDs so if you agree that the shows table has an ID
column which is otherwise known as its primary key the unique ID that
identifies it just like our Harvard IDs our Yale IDs and you agree that per a
moment ago this shorter query will give me back just the show IDs of all of
the comedies in the database you can actually combine or Nest these
queries together uh it's going to respect SQLite's
order of operations with parentheses just like grade school math so the
thing in parentheses will be executed first that gives it back a list of IDs like
what 48,000 IDs and then this query the outer query is going to get the title
from all of the shows where the ID of the show is in that big list of 48,000 so
if I now execute these together I think the list is still going to be a little long
but let me execute it together now I see this long list of outputs a little
overwhelming let's go
ahead and maybe limit it to just 10 as before for discussion sake and now I
see 10 comedies ordered arbitrarily from however they're in the database
that happen to indeed have comedy as their genre if I want to do this a
little more cleanly I could do this let's see uh why don't I order by title
ascending order which is alphabetically or the default is also uh ascending
limit 10 now I see the top 10 I mean weirdly named things with hash symbols
presumably to get their titles up to the beginning or
maybe these are hashtags uh here now we have alphabetically the top first
10 shows that are comedies any questions on these kinds of queries it's kind
of a lot but at the same time it's just like composing the smaller ideas from
before into slightly more useful queries yeah do foreign keys have to set the
relationship when you create the table the programmer or the database
administrator would create that relationship by using those keywords
primary key and foreign key that teaches the database what is
related to what per the picture so you do that once and now I being the sort
of programmer who's familiar with the database I am just using these foreign
keys in a manner consistent with their design but and this is where
it's useful at some point even if no one hands you a picture to make sure
you understand the database because that's going to inform literally what
you type in SQL to get the data you care about well let's do something
a little more precise how about very reasonable
question and honestly this is exactly what [Link] in the app are for what
if you want to find all of the shows that Steve Carell is in like kind of a
reasonable query like literally something someone might type into Google or
more specifically IMDb it's not really obvious at first glance how to do that
though because right like from my database if these are my six tables well I
can pretty easily get Steve Carell from here but I can really only get his ID
number whatever that is his name which I know already and his birth year
okay interesting but has nothing to do with the shows that he's in I can look
at shows over here but there's no mention of Steve Carell right because
there's no person ID here where is that relationship implemented well it's
implemented down here so how do we do this well here's the perfect
example of a lesson we've been trying to emphasize for weeks of
taking these baby steps like break larger problems down into smaller ones
and let's do something like this let's just get everything I
know about Steve Carell from the database let's select star from people
where the name of the person is quote unquote Steve Carell I just want to
see what data we've got and here's what we have okay there's only one
Steve Carell born in 1962 and his unique ID is 136797 according to IMDb this
isn't some like Global uh actor identifier per se all right well how do I get now
all of the shows that Steve Carell is in well I could do this select star from
Stars not to confuse the two one's the symbol one's the table
name uh where person ID equals 136797 so I think this will now give me
everything from the Stars table that relates to Steve Carell okay and you'll
see person ID is the same because I'm literally searching for just Steve Carell
but there are what like 20 or so shows that he's been in all right well here's
where things would get tedious what are those shows well I could do select
title from shows where the ID of the show equals and here's you know
whenever you copy paste you're probably
doing something wrong okay he was in The Dana Carvey Show familiar with
that let's do another one we'll copy paste this uh where ID equals this Over
the Top another and if we keep digging we'll probably find The Office but my
God like that's going to take forever to do 20 queries manually it's not very
Dynamic but what if we just Nest these queries a little more dynamically so
let me start from the beginning again what if we go ahead and select
everything we want we know about uh people whose name
equals Steve Carell that gave us earlier this data I don't need all of that data
I know his name I don't care about his birth year so let's change this to just
be give me the ID of Steve Carell and that gives me back now this smaller
temporary data set all right can I now use this uh inside of another query
well let me wrap the whole thing with parentheses and now let me say select
star from the stars table where the person ID equals this so I'm deliberately not
using in because I'm assuming there's indeed only
one Steve Carell in the world so I'm not getting back a list of Steve Carells I'm
getting back the one and only in this case so equal is fine in is when you
have multiple equal is when you have one let me go ahead and hit enter now
okay that's more data than I need I don't need like 20 copies of Steve Carell's
person ID so let me hit up let me go back and let me just get show ID from
Steve Carell and now I have a list of just the 20 or so show IDs that he has
been in all right how can I now use this
well let me hit up let me put the whole thing in parentheses and now let me
select what I really want select title from shows where and here's the final
flourish the shows table has an ID has a title has a year and has episodes
and what I really want though is to check which shows have an ID that is what
anyone want to finish the thought I just want to yeah exactly ID in this and
this is getting ugly and when you actually write your queries in like a text file
you can format them nicely and indent them my
font is just getting I don't want to make it too small to fit everything but now
we have three queries one is in doubly nested parentheses then there's the
middle one then there's the outer one so this last query is going to get me
the title from shows where the ID of the show is in this big list of 20 or so
show IDs that Steve Carell is in and I knew that because I looked up his name
here and notice what I did not do this time is I didn't manually hardcode his
ID number there's no need that would be
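
The three nested queries described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration, assuming a tiny in-memory stand-in for the IMDb tables: the column names show_id and person_id and every row inserted here are assumptions made up for the example, not the real data set.

```python
import sqlite3

# Toy in-memory stand-in for the people, shows and stars tables.
# All rows below are placeholder values invented for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birth INTEGER);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, episodes INTEGER);
    CREATE TABLE stars (
        show_id INTEGER REFERENCES shows(id),
        person_id INTEGER REFERENCES people(id)
    );
    INSERT INTO people VALUES (1, 'Steve Carell', 1962), (2, 'Someone Else', 1970);
    INSERT INTO shows VALUES (10, 'The Office', 2005, 188),
                             (11, 'The Dana Carvey Show', NULL, NULL),
                             (12, 'Unrelated Show', NULL, NULL);
    INSERT INTO stars VALUES (10, 1), (11, 1), (12, 2);
""")

# Innermost query: one ID, so = is fine.
# Middle query: many show IDs, so the outer query uses IN.
query = """
    SELECT title FROM shows WHERE id IN (
        SELECT show_id FROM stars WHERE person_id = (
            SELECT id FROM people WHERE name = 'Steve Carell'
        )
    )
    ORDER BY title
"""
titles = [row[0] for row in db.execute(query)]
print(titles)  # ['The Dana Carvey Show', 'The Office']
```

Nothing in the outer query hardcodes the person's ID; it is looked up by name each time the query runs.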
consider how else we might combine data suppose that the next question
actually perhaps appropriately would be focusing in on not just like people
and shows and these stars but how do we kind of like gather more
information about the shows themselves like the genres the ratings or the
like so indeed let's focus on just these two tables here recall that every show
has an ID a title a year and episodes but it also might have one or more
relationships with rows in this other table called genres and this is
so that a show can be a comedy can be a drama can be any number of other
things one row per genre so you would see the same show ID again and again and
again with a different genre written in English like comedy drama or
the like well how do I kind of reconstitute that data well turns out there's
a few different ways to do this and let me propose that we introduce this
keyword here join and this is really the most powerful of the keywords in SQL
itself it doesn't have to be used we've seen with uh nested
queries that you can still select data across multiple tables but here is
another way so let me do this let me go back to my SQLite database and
let me select sort of in one uh breath exactly the data I want select star from
shows and let's just limit this initially to 10 to see what it looks like all right
that's again the shows data select star from genres let's limit that to 10 too just
to wrap our minds around it and now this is not that useful however the data
in the leftmost column here is the primary
key in the shows table these are just unique IDs the data here in the genres
table recall show ID is the foreign key so it's the same numbers but just
copied into another table so that we can have this relationship across them
how do I kind of line up these numbers with these numbers to get back like a
wider table that has title and year and episodes and genre and heck ratings
and all of that too if we want well you can join these tables by just telling the
database what to join on what so let me do this select
star from shows and join that table with the genres table well how do you
want to join those two tables and again the two tables from the picture
looked like this how do you tell SQL programmatically to sort of you know put
one of them right next to the other line up all of the IDs so that you just get
one larger data set well we can use indeed this uh syntax
called join so back to VS Code here and let me join these two tables sorry
typo here join genres on the shows table's ID column AKA its
primary key equaling the genres table's show ID column AKA the foreign key
so in other words it looks a little cryptic but I'm just telling SQL how to line up
these two tables and what column to match with the other so that the
numbers line up and I get essentially a wider table let me go ahead and hit
uh semicolon and enter and this is now going to give me a lot of data we
might have to interrupt it but notice even at a glance we're getting the ID the
title the year the number of episodes the ID
again redundantly but that's to be expected if I'm joining them and the genre
all the way on the right let me hit control C to interrupt let me just limit this to
the office so where title equals quote unquote the office so we can focus on
just one sample uh datum and here fun fact there's been more than one
office the one that you all probably like is this one that started in 2005 with
188 episodes its ID in the shows table is 386676 that's confirmed over here
too so again we've just joined the
two tables how by lining up those fields but now that we can see that almost
all of the offices produced over the decades are comedies except for this one
there was a version of the office produced in 2001 that was considered more
of uh a drama not sure if it's related to the other how can we link in other
data well let's go ahead and link in ratings too or instead so instead of
joining this with genres let me go ahead and rewind here and join shows with
ratings on shows.id equals ratings.show_id and let's
limit it to The Office too for discussion sake where uh title equals quote unquote
the office semicolon and now you can see that among the various offices it
looks like the one that most of us probably know and love is the highest
rated also with a 9.0 with like 585,000 people having cast votes for it whereas
these other shows seem to have been less popular perhaps that's why
indeed you see fewer episodes for them as well so even though we've put
the data in multiple places you can still kind of
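
A minimal sketch of the two-table join described above, again using Python's built-in sqlite3 module against a toy in-memory database. The column name show_id and the rows themselves are assumptions for illustration only.

```python
import sqlite3

# Toy data: two shows with the same title, one of which has two genres.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, episodes INTEGER);
    CREATE TABLE genres (show_id INTEGER REFERENCES shows(id), genre TEXT);
    INSERT INTO shows VALUES (1, 'The Office', 2005, 188), (2, 'The Office', 2001, NULL);
    INSERT INTO genres VALUES (1, 'Comedy'), (2, 'Comedy'), (2, 'Drama');
""")

# Line up the primary key shows.id with the foreign key genres.show_id,
# which yields one wider row per (show, genre) pair.
rows = db.execute("""
    SELECT shows.id, title, year, genre
    FROM shows JOIN genres ON shows.id = genres.show_id
    WHERE title = 'The Office'
    ORDER BY shows.id, genre
""").fetchall()

for row in rows:
    print(row)
```

A show with two genres appears twice in the joined result, once per matching row in genres.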
let me go back into VS Code here and let's just find out um Steve Carell's
information again last time we did it with this nested query by getting
his ID and then the show IDs and then the titles for those show IDs with join
you can do it a little differently and any of these ways are fine one might
become easier to you mentally than another let's go ahead and select
the titles from what let's select the title from uh the people table but and I'm
going to hit enter and when you're using
sqlite3 interactively if you ever find yourself with a prompt that says dot
dot dot angle bracket it means you're continuing your thought onto the next
line if you didn't intend that you can sometimes hit uh semicolon to just end
the thought and hit enter even if it triggers an error but this is one
way of formatting my queries now a little more nicely I'm just going to add
some whitespace so that it's a little easier to read what do I want to select well
I want to select the title of shows
from the people table joined with the stars table on the people table's ID
column equaling the stars table's person ID column so in other words if you
think back to what people are and what stars are one has an ID one has a
person ID I'm just now connecting those two tables I'm joining those two but
I want to do this as well with another table let me additionally join in so
now I only have two hands but now I'm putting a third table joined in
together here join shows on stars.show_id equals shows.id
so this is now linking three tables together but I only care about this for one
person so where the name of the person equals quote unquote Steve Carell
so more cryptic to be sure but what we're doing with this query is just
taking all three tables that we care about and we're joining them all
together at once using this new join syntax literally telling the database what
columns to line up with what and then we filter at the very end just like
before to get back if I hit enter the
answer we want which in this case is a little slower at the moment but that
same list of uh 20 or so shows that he's been in there's one other way to do
this and again uh these are all in the slides online so you can repeat them
without having to jot down everything and we'll put them in the notes too
but there's another way to do this I could also use an implicit join so that was
an explicit join because I literally typed the word join multiple times at that
but let me go ahead and select the title from these
three tables people stars and shows and this might just be nicer because if
you know what tables you want to select data from just enumerate them
separated by commas which you might prefer in your mind where the people
ID equals the Stars person ID and the Stars show ID equals the shows ID and
the name of the person equals Steve Carell so this is an implicit join and
honestly I constantly reference my notes for some of this stuff too it's not the
kind of thing that's going to come like this to you
after just one day but it's just a different way of expressing the same thing I
want to select data from three different tables and hey SQL here is how I
want you to line those tables up so that I can get like related data for Steve
Carell and this now will achieve the same results ultimately let me hit enter
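
To make the comparison concrete, here is a small sketch that runs the explicit three-table join and the implicit comma-separated version side by side and checks that they agree. It uses Python's built-in sqlite3 module; the toy rows and the column names show_id and person_id are assumptions for illustration.

```python
import sqlite3

# Toy stand-in for the people, stars and shows tables (made-up rows).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (1, 'Steve Carell'), (2, 'Someone Else');
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'The Dana Carvey Show'), (12, 'Unrelated Show');
    INSERT INTO stars VALUES (10, 1), (11, 1), (12, 2);
""")

# Explicit join: the word JOIN appears once per extra table.
explicit = db.execute("""
    SELECT title FROM people
        JOIN stars ON people.id = stars.person_id
        JOIN shows ON stars.show_id = shows.id
    WHERE name = 'Steve Carell'
    ORDER BY title
""").fetchall()

# Implicit join: list the tables, then say how to line them up in WHERE.
implicit = db.execute("""
    SELECT title FROM people, stars, shows
    WHERE people.id = stars.person_id
      AND stars.show_id = shows.id
      AND name = 'Steve Carell'
    ORDER BY title
""").fetchall()

print(explicit)
print(explicit == implicit)
```

Both forms express the same matching conditions, so which one to use is largely a matter of readability.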
and there we go so a little slower and performance might vary based on
computer based on uh implementation of SQL but I think I still have the
same answers now suppose as I often do and I had to
look it up again last time suppose you forget uh how to spell Steve
Carell's name is it two RS two L's or the like well I could also do something
like this select well let's just keep this simple select star from people where
name equals I've been deliberately getting it right so as to not embarrass
myself that's the Steve Carell I keep querying if you forget well you could try
searching for just Steves but there interestingly there's a bunch of Steves we
don't know when they were born uh but
that's probably not the Steve Carell we want if we don't have his last name
so I could alternatively do well it's Steve and then it starts with a C I think
well it turns out there's another wild card you can use in SQL we used the
asterisk to select all of the columns you can in quotes use a percent sign to
say C something so there's some zero or more characters after the letter C
and now this doesn't work cuz now I would be literally looking for Steve
space C percent sign but recall earlier I
mentioned that one other keyword which is for fuzzier matching so to speak
where it's like not exactly what you're looking for but it's like what you're
looking for if you instead say where his name is like Steve space c something
now we'll get back a whole bunch of Steves but I think now I could probably
find the one I'm actually looking for if I don't remember his name you can
use multiple percent signs if you forget what his first name is you could
reverse the order but that too is a uh very
powerful SQL feature at that questions on these queries here yeah sorry
what about it oh yeah sure so the query I used here there's a lot of Steves
whose last name starts with C oops too far uh the last query I executed was
this one here so where the name is like quote unquote Steve C percent sign
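
The difference between equals and like with a percent sign can be seen in a short sketch using Python's built-in sqlite3 module. The table and the handful of names in it are assumptions for illustration only.

```python
import sqlite3

# Toy people table with a few names, made up for this example.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO people (name) VALUES
        ('Steve Carell'), ('Steve Coogan'), ('Steve Martin'), ('Stephen Colbert');
""")

# With =, the percent sign is just a literal character, so nothing matches.
exact = db.execute(
    "SELECT name FROM people WHERE name = 'Steve C%'"
).fetchall()

# With LIKE, % matches zero or more characters.
fuzzy = db.execute(
    "SELECT name FROM people WHERE name LIKE 'Steve C%' ORDER BY name"
).fetchall()

print(exact)
print(fuzzy)
```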
so that's just another tool for your toolkit here but you'll perhaps have
noticed that prior to that query the joins I did were sort of slow
and honestly this database isn't even that
big like yes it has tens of thousands of rows in it but like in the real world and
most of the apps you and I use a lot every day or websites like there's
Millions even billions of rows of data and like if I had to wait on like my
computer here or my code space like a second or two to get the data like
that's not going to work for millions of users or customers certainly so how
can we actually improve things well it turns out another upside of a proper
relational database is that it's not
just a spreadsheet where the onus is on like you to like find the data you're
looking for you can also tell the database to index the data for you an index
is an efficient uh cheat sheet for finding data fast like books in the real
world often have indices at the end of the book where you
can look things up alphabetically and then you can cross reference it for the
pages that that topic appears on same idea in a database if you tell the
database in advance that you want to
search on a certain column frequently you can tell it to build a fancy index
that will just allow you to search that column Faster by default these columns
are going to be searched most likely via linear search like not even binary
search because the data might not be sorted because it came in in any order
but if you create an index you're probably going to get something closer to
logarithmic than linear and that's going to be a big plus overall so let me do
something simple here first let me
turn on a SQLite-specific feature that just is going to time all of my queries
by writing dot timer on I just want to keep track of uh how long each of these
commands takes this one is not a slow command so this is just going to be
relative but let's just select everything from the shows table where the title
thereof is the office let's see how long this relatively simple query takes all
right not very long at all in real terms like less than a second .035 seconds so
not slow by any means but if you've got hundreds
thousands millions of users like every one of those milliseconds could very
well add up so can we do better well we can if I do this if I use syntax like this
once in the beginning of the design of my database I create not a table but
an index with some name on a specific table on one or more columns I can
give a clue a hint to the database in advance saying please optimize with
some Secret Sauce searching or selecting on this column in this table so that
my searches are faster so let me do this let me go
back to VS Code here let me create an index called how about title index I
could call it anything I want but I want to search faster on titles so I'm going
to call this title index uh on the table called
shows and then in parentheses is the syntax the column called title so again
I've just borrowed this canonical syntax and I've just translated into
something that's TV show specific all right what is this going to do for me
once I hit enter this is going to create
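
As a rough sketch of what creating the index changes, the snippet below asks SQLite for its query plan before and after the index exists, using Python's built-in sqlite3 module. Timings are deliberately not measured here because they vary by machine; the index name title_index and the generated rows are assumptions for illustration.

```python
import sqlite3

# Toy shows table with 1,000 filler rows plus the one row we search for.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, episodes INTEGER)"
)
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [(f"Show {i}",) for i in range(1000)],
)
db.execute("INSERT INTO shows (title, year, episodes) VALUES ('The Office', 2005, 188)")


def plan(sql):
    """Return SQLite's human-readable plan for a query (last column of each row)."""
    return " ".join(row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM shows WHERE title = 'The Office'"

before = plan(query)  # without an index, the table is scanned row by row
db.execute("CREATE INDEX title_index ON shows (title)")
after = plan(query)   # with the index, the plan searches using title_index

print(before)
print(after)
```

The exact wording of the plan text differs between SQLite versions, so the example prints it rather than assuming a specific string.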
children or more but the effect of that if you have a very wide tree the upside
is that it's like very short like it pulls the data higher up closer to the node to
the root node and recall that the root node is where we began our searches
in the past whether it was a BST a binary search tree even a trie or data
structures we always began at the top so the higher up you can pull the data
even if it makes the data structure very wide you're going to be able to do
boom boom boom look up queries look up
data probably much faster certainly than if it's just a very long list like a
column by default so with that said let me go back to VS Code I didn't create
the index yet let me go ahead and hit enter and create it all right it took a
moment it took like half a second which obviously is not that slow
but with more data that could have been even slower but it's a onetime
operation as of now and now let me hit up and let me select the same data
from shows where title equals the office last time just a
moment ago it took 0.035 seconds no not slow but also that's going to add
up if I have lots of users of IMDb let's go ahead now and execute the same
query again how long did that take 0.01 seconds now I mean practically
nothing and so that's the sort of opportunity now when you've got lots of
data and you want to really speed up these searches these indexes these
indices that just create for you these magical data structures in the
database's memory it allows you to search on columns that you are pretty
sure you
want to search on more effectively Now by contrast if you've ever used like
Google or Bing or some search engine that has advanced search some of
those text boxes that you can search more precisely in might very well be
slower why well probably you don't want to go crazy and just index every
column on every table why well what might be the intuition like if logically
indexes speed things up why not index everything there's always going to be
a tradeoff here what might that be yeah yeah it's going to take a lot of
storage right this is just a slide on the screen but like this has to go
somewhere like this needs space in the computer's memory or on the hard
drive or the like and that's fine if you have unlimited space but odds are like
you don't and that's going to get expensive for different reasons so maybe
you only want to index certain columns and certain tables and not all of them
because you know what even if a user really wants to search maybe via
advanced search on some other column or table
altogether fine if once in a while a query is slow like we're probably getting
the bigger bang for our buck by optimizing the common cases the more
popular queries that people actually care about too all right so let's come full
circle and bring this now back to uh how we actually began which was
with some uh python code so it turns out these are not either or decisions it
turns out in the real world developers are constantly using one two three
languages at once in fact next week I rattled off HTML CSS
and JavaScript one of which is a proper programming language but those
languages are often used together totally normal and common to use Python
and SQL or Java and SQL or Swift and SQL or any number of different
combinations with a database language you might use your preferred
programming language Java Python C++ to create the user interface and the
logic that implements the program itself but for your
data like SQL is a really good candidate and indeed we've seen already
that SQL can just speed up certain operations you can change you can uh
collapse 15 lines of code into just one and you can use these things together
so let me come back to I'm going to quit out of SQLite I'm going to
minimize my terminal window and here's where we left off before with
favorites.py with favorites.py everything was being stored in uh favorites.csv
and recall that we eventually imported that CSV file into favorites.db
automatically with dot import just so we could start playing around
with SQL but we can now tie these two together and a way to do that is as
follows um cs50 has a library for python you might recall having available uh
get string get int get float you don't strictly need to use them in Python
because it's much easier to just use the input function and then try except
and convert things to int or float or the like but it's a lot more work to use
SQL in Python without a third-party Library a lot of the commercial options or
popular open source options are actually
just complicated to use so cs50 does have a very useful function inside of its
library for python that you should use and must use for the problem set that
just makes it easy to execute SQL inside of your Python
code but it's built on top of a very popular open source alternative so you
can use that too in the real world so the documentation for that is at this URL
here but I'll show you what we need to know here by focusing back on
favorites.py so what I'm going to do here
as follows is this let me delete everything from favorites.py except for let's
say uh this from cs50 import SQL in all caps so that's importing a SQL feature
from cs50's library that's going to allow me to open a DB file in code how do I
do that well let me create a variable called DB for database though I could
call it anything I want let me call this SQL function and pass in using special
syntax that's not cs50 specific it's an industry thing sqlite colon slash
slash slash unlike every other URL
you type this one literally has three slashes in this context here and then the name
of the database which in this case is favorites.db so this is just a way of
telling this SQL library that we wrote but that works exactly like third party
alternatives open favorites.db using the SQLite technology if you will
all right let's just ask the user a question give me your favorite um uh
problem so we're going to use input instead of get string but we could use
get string but they're pretty much the same for our
purposes let's ask the user for their favorite and now in Python code let us
select from favorites.db all of the rows where students specify that problem
as their favorite so in SQL alone it would be this select uh star from favorites
where problem equals and I'll do um well whatever my favorite's going to be
like problem equals Mario for instance so if I were just using SQL I would
literally write something like that but I'm in a .py file now like I have to use
Python syntax but python
supports strings SQL is just text it's just a string so I could certainly just put
my SQL code in a string perhaps and then pass it to a python function and
here's the bridge between the two if you just treat SQL as any old text we
can put it in a string and execute it so let me actually do this let me go
ahead and create a variable called rows which is eventually going to contain
all the rows from the database let me go ahead and uh call db.execute
this is the one function you need to know about inside
of cs50's library and it literally executes a SQL statement and then in quotes
you pass it literally what you want to execute and let me go ahead and close
the parenthesis at the end there and now let me just try this so for row in
rows let's iterate over all of the rows let me go ahead and print out how
about uh row bracket quote unquote and what do I want here uh let's print out the
timestamp of that person for kicks all right let me open my terminal
window python of favorites.py crossing my
fingers here enter uh there we go favorite so I'll type in Mario okay
so I got back it's not very interesting but I got back all of the timestamps of
students who typed in Mario that we imported into this database well what I
really care about is how popular Mario is so let me change this a little bit let
me change this to count the number of rows and let me keep it simple let me
give an alias like I proposed earlier like as n where N is a number so that now
down here I can actually just do this print out the
to the very first row and only row that came back and now print out that rows
n column let me rerun the program I'll type in Mario again enter and I still
see 39 so this of course I don't strictly need to do this I don't really need a
variable I can do rows bracket zero instead but let me focus on what this library
is now doing so per the documentation what the cs50 execute
function always does for you is it returns a list of dictionaries so if your query
returns nothing like no matches you get back an
empty list like Open Bracket close bracket nothing in it any Loop is not going
to execute anything useful because there's nothing in it if though you get
back one row you're going to get back a list of size one inside of which is a
single dictionary that dictionary is going to have keys that correspond to
whatever you selected be it the columns or the count so when I selected star
before I could have like I would have gotten all of the columns that's how I
was able to access timestamp here I'm
just selecting count and I don't want to have to type this down here that
would just look kind of atrocious it would work but it would look weird to just
keep retyping count paren star close paren so I just created an alias called n
just to like make my life easier or cleaner down here so to be clear the
cs50 execute function returns a list of dictionaries when you're using
select and that is how I can now get back the first and only row and then
print out that Row's n value it is identical to
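
The list-of-dictionaries shape described above can be imitated with Python's built-in sqlite3 module. To be clear, this is a stand-in and not the cs50 library: the helper named execute below is a hypothetical function written for this sketch, and the favorites rows are made up.

```python
import sqlite3

# Toy favorites table; the column names and rows are assumptions.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be converted to a dict by column name
db.executescript("""
    CREATE TABLE favorites (Timestamp TEXT, problem TEXT);
    INSERT INTO favorites VALUES
        ('10/24 09:00', 'Mario'), ('10/24 09:05', 'Scratch'), ('10/24 09:10', 'Mario');
""")


def execute(sql, *args):
    """Hypothetical helper: run a SELECT and return a list of dictionaries."""
    return [dict(row) for row in db.execute(sql, args)]


# One row comes back, so the list has size one; the alias n names its only key.
rows = execute("SELECT COUNT(*) AS n FROM favorites WHERE problem = 'Mario'")
row = rows[0]
print(rows)
print(row["n"])

# No matches means an empty list, so a loop over it simply does nothing.
nothing = execute("SELECT * FROM favorites WHERE problem = 'No Such Problem'")
print(nothing)
```

Even when the query returns a single row, the result is still a list, which is why the first element has to be indexed explicitly.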
let me do this let me highlight this whole line of text let me in my terminal
window run sqlite3 of favorites.db like we did before the break let me
just copy paste this query enter that that's the table I got back earlier when
we played with SQL manually and so when I get back this table here's the
key here's the value and I only have one row which is why I'm just blindly
indexing into rows bracket zero because I know there's always going to be an
answer there it's going to be zero or one or
more but I know now it's going to be called n because of this here so what
have I just done well this is SQL down here and this is just me being like a
data scientist asking questions about my data just using like black and white
SQL queries this is me now being a python programmer who wants to talk to
a SQL database using Python and The Bridge we're using happens to be the
cs50 library but again there's third party free libraries you can also use as
well ours is just very simple and indeed the
documentation will explain how execute behaves a little differently for inserts
updates and deletes you don't get back a list because you're not selecting
anything but you do get back some return values questions on this that's the
last of our python code that ties everything together in spirit yeah uh what
this this one here yes so db. execute by definition returns a list of rows and
each of those rows happens to be a dictionary because it's convenient key
value pairs if I'm
selecting the count of rows I just know from having learned SQL an hour
ago that this is always going to give me a single row whose column in this
case is called n so if I know it's a single row I can just blindly just like in C go
into that list or an array in C and go to the first location and then treat that
as the single row what you don't want to do is this even if you the human
know the query returns one row you can't just magically change the variable
name to be singular and expect to have only
one value you will always have a list so even if there is only one value in it
it's up to you to do something like this to get at it or if you prefer more
succinctness you can do rows bracket zero bracket n that'll achieve the same
thing without a variable yeah good so I have been misleading this whole time
and cheating because this is only ever going to return Mario I'm ignoring the
favorite that the human typed in here on line five so let me fix that and
that's going to lead us to some
of the problems that arise ultimately with SQL the right way to solve that
problem let me get rid of my terminal window here the right way to solve this
problem is not to use an F string like we did in Python generally because SQL
queries as we'll see in a moment can be dangerous when you want to plug in
users uh data into a query that you've written most of in advance you should
you must you had better use a placeholder namely a question mark in this
case this is somewhat specific to cs50's library but
we just borrowed the convention that like every other Library uses too in the
world of SQL single question marks are used as placeholders and the way
you do this is as follows if you want to plug in a value for that question mark
just like in print F in C you specify as a second or a third or fourth argument
all of the values you want plugged into this so in C weeks ago we were using
percent s same exact idea in SQL it's a question mark that you use instead
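
Here is a minimal sketch of the question mark placeholder, using Python's built-in sqlite3 module rather than the cs50 library; sqlite3 happens to use the same question mark convention. The function name count_favorites, the table contents, and the hardcoded inputs standing in for what a user would type are all assumptions for illustration.

```python
import sqlite3

# Toy favorites table with made-up rows.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE favorites (Timestamp TEXT, problem TEXT);
    INSERT INTO favorites VALUES
        ('10/24 09:00', 'Mario'), ('10/24 09:05', 'Scratch'), ('10/24 09:10', 'Mario');
""")


def count_favorites(problem):
    """Count rows for a problem, passing the user's value separately from the SQL."""
    cursor = db.execute(
        "SELECT COUNT(*) AS n FROM favorites WHERE problem = ?",
        (problem,),  # values for the placeholders, in order
    )
    return cursor.fetchone()[0]


print(count_favorites("Mario"))
print(count_favorites("Scratch"))

# Input containing quotes is treated as plain data, not as part of the query,
# so this odd-looking string just fails to match any problem.
print(count_favorites("Mario' OR '1'='1"))
```

Because the value travels separately from the query text, the library handles quoting and the query itself never changes shape based on what the user types.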
this now if I open back my terminal window and I run
uh python of favorites.py type in Mario I should still get 39 but now I can also
type in scratch perhaps and get 44 for that very first piece at zero and that
one is even more popular here so this now is correct it would work to use an
F string here and then plug in a value like favorite here but you'll see in just a
moment don't do that you will expose yourself to potential hacks or attacks
by trusting the user's input and so in fact let's transition from that to
exactly some of these kinds of
challenges namely two before we wrap up so in the world of SQL especially
when it's used at scale at the Twitters and the Googles of the world
lots of data is probably coming into the database all at once because
multiple people are opening their phones at the same time around the world
they're clicking on the same links roughly at the same time around the world
when you have thousands of people all using your site at once like order of
operations is going to be important but unfortunately
in SQL and in other contexts of computing there's this risk of what's known
as a race condition so for instance has anyone ever seen or liked this this is
like yes the world record egg or it's like this thing that was very popular a while
back it's still kind of going strong but if you go to the Instagram profile for
world record egg uh the goal was to make the most liked Instagram
post ever and they did pretty well it's just this it's just a picture of an egg
now at the height of
the popularity like there might have been hundreds thousands tens of
thousands of people clicking pretty much at the same time on this egg so it
actually creates a potential problem with the Integrity of Instagram's data
why well if you have all these requests coming in at once how do you
possibly keep track of all of them and update your counter in a way that can
keep up with all of that traffic why well let's just hypothesize what meta
formerly Facebook was doing underneath the hood with Instagram if this
were
their code so suppose for the sake of discussion that Instagram servers are
using a mix of python and SQL probably not using the cs50 library but they
could absolutely be using those two languages or two others together um
suppose they do this in order to update the number of likes for that post they
first execute a SQL query like select the current number of likes from a table
called posts where the ID of the post equals whatever the unique identifier is
for that specific egg in the table
and then they store the result in this rows variable just like I did and then they
do this they create a variable called likes they set it equal to
rows bracket zero so the very first row in the result set and they get the likes
key so this is literally what I just did with the count let me hypothesize that
Instagram does something similar with the total number of likes why are they
doing this because they then want to execute a third line of code that
executes update the posts table set the
new number of likes equal to something where the ID of the post equals this
other thing now notice just like in printf there's the comma separated list of
values they want to update the current number of likes from the current
value to the current value plus one so it's likes plus one and then we plug in
the ID for this so suppose this is what Instagram's doing unfortunately
whenever you execute multiple lines of code independently and you're so
popular like Instagram that you have thousands
human might but at a super speed here the problem though is if these lines
of code get interrupted what could go wrong well suppose that Carter and I
both click the egg at the same time and suppose the current number of likes
back in the day is 100 that stores in this variable the value 100 but if we click
so close in time we might get back the same answer to this select query as
of that moment in time when David and Carter clicked it had a 100 likes but
then this last line of code is executed for me and then maybe Carter because
that answer the state of the database was stored in this variable then both
Carter and I will result in this line of code being executed with the same
value update the post table setting the likes equal to 101 for that post's ID
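A minimal sketch of that lost update, assuming Python's built-in sqlite3 module rather than cs50's library. The posts table with a single row is hypothetical, and the interleaving of the two clicks is forced by hand here rather than produced by real concurrent servers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
conn.execute("INSERT INTO posts (id, likes) VALUES (1, 100)")


def read_likes(post_id):
    rows = conn.execute(
        "SELECT likes FROM posts WHERE id = ?", (post_id,)
    ).fetchall()
    return rows[0][0]


def write_likes(post_id, likes):
    conn.execute("UPDATE posts SET likes = ? WHERE id = ?", (likes, post_id))


# Both clicks read the counter before either one writes it back.
david_sees = read_likes(1)
carter_sees = read_likes(1)

# Each request then adds one to the stale value it read.
write_likes(1, david_sees + 1)
write_likes(1, carter_sees + 1)  # overwrites with the same number

final_likes = read_likes(1)
print(final_likes)  # 101, not 102: one like was lost
```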
why because again if each of these lines of code running on different servers
are checking the value of the current number of likes but then getting
interrupted because Carter clicked the darn thing too and then resuming
their work on my behalf we might have a race condition where the
code is sort of racing to finish but getting interrupted by other users clicks
and the problem with that is that if you are inspecting the value of some
variable or in this case a database cell and making a decision based on it like
how to update it you might now lose data and Instagram probably not good
for advertising if they're losing likes and so that's probably a problem not to
retain the value 102 and instead insert the number 101 twice it's actually
similar in spirit to a story that uh was
told in a databases course I took myself years ago whereby uh it's somewhat
analogous to kind of a contrived scenario involving like a refrigerator and this
is the closest thing to a refrigerator we could get on stage but imagine
you've got like one of these little dorm fridges in your dorm too and you and your
roommate maybe both of you as the story was told to me really like milk
and one of you is at class but the other of you comes home and you open
your dorm fridge and you're like oh darn it we're out of milk and so
you close the fridge you walk across the street to CVS or some other store
and you get in line to buy some milk Meanwhile your roommate gets out of
class they come back to your dorm room they're really thirsty for some milk
they open up the fridge they say oh we're out of milk and then they take a
different route perhaps to CVS or some other store nearby get in line to buy
some milk fast forward some amount of time in this very contrived story and
what happens oh damn it we now ended up
with two gallons of milk and there's no way we can fit a gallon of milk in there
let alone two of them so that's a problem but what's the relationship to this
here well both of us yeah did what exactly exactly so to summarize both of
us had a very similar thought process made a similar decision based on the
same information not realizing that the information the fridge was in the
process of being updated and of course in the Instagram World happens like
this in the fridge World it might take a few
minutes but the problem is ultimately the result of our having made a
decision about the state of the world and the state of the world was in the
middle of being updated the queries got mingled with others or in this case
someone was already on their way to the store so what's the solution in the
real world well you could you know very simply like take a Post-It note and
put like gone for milk so as to communicate to your roommate that they
should not inspect the value of that variable and make a
decision on it why because it's not yet consistent with the outcome that's
about to happen you could be more dramatic and you could actually lock
the fridge somehow put a padlock around it or the like so they can't even
get in there and that would achieve the same effect too and
that is actually pretty much the solution to this problem in code too it's not
safe it's not sufficient to only execute three lines of code like this rather what
you probably want to do is use additional
SQL keywords that we won't spend much time on in the class itself but
there are solutions to this problem you can begin what's called a transaction
and you can more explicitly commit to making a decision like updating the
database to 101 or 102 or if you realize wait a minute Carter's query
is interrupting mine let me roll back to the previous state and just
rewind let me undo control Z if you will there's also another keyword that's not
so much used anymore in SQL which is locking you
could literally back in the day lock the entire database table preventing
anyone from updating it or making changes or even reading it while
someone else was accessing it that was a very heavy-handed solution
because it slowed everything down but in short transactions are now a
feature of SQL that you won't necessarily need to use yourselves that do
solve this problem by doing the equivalent of saying while David's like
counter is in the process of being updated keep Carter at Bay ideally briefly
and then let his data go
through too it's equivalent too to putting a note or a lock on the fridge and
indeed I mean literally they were once upon a time called and still
are in some texts called locks on databases too and the code
for which you might do this is almost the same you simply wrap the three
queries with a transaction statement and a commit and the term of art
here is that this makes your statements atomic so atomic means
they're either all executed or not at all that is they're all very tightly
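A minimal sketch of wrapping the three queries in a transaction, assuming Python's built-in sqlite3 module and a hypothetical posts table. BEGIN IMMEDIATE is SQLite's way of taking the write lock up front; other databases spell this differently, and with a single connection this only shows the shape of the code, not real contention.

```python
import sqlite3

# isolation_level=None turns off the module's implicit transactions,
# so BEGIN, COMMIT and ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
conn.execute("INSERT INTO posts (id, likes) VALUES (1, 100)")


def like(post_id):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute(
            "SELECT likes FROM posts WHERE id = ?", (post_id,)
        ).fetchall()
        likes = rows[0][0]
        conn.execute(
            "UPDATE posts SET likes = ? WHERE id = ?", (likes + 1, post_id)
        )
        conn.execute("COMMIT")  # all of the steps take effect together
    except Exception:
        conn.execute("ROLLBACK")  # undo everything if any step failed
        raise


like(1)  # David's click
like(1)  # Carter's click
total = conn.execute("SELECT likes FROM posts WHERE id = 1").fetchall()[0][0]
print(total)
```

In this particular case a single statement such as UPDATE posts SET likes = likes + 1 would also avoid the gap between reading and writing; the transaction form generalizes to cases where several statements have to succeed or fail together.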
implemented like Harvard key login allow you to type in your email address
of course and your password but suppose that they are using SQL
underneath the hood to check your username and password to make sure
that you are David Malan or Carter Zenke or whoever you claim to be I haven't
shown you the syntax yet but it turns out that in SQL dash dash is a special
way of indicating a comment it means ignore everything to the right so it's
just like slash slash in C or the hash symbol in Python dash dash just
means ignore everything to the right and we've of course seen single quotes
So one way to wage a SQL injection attack is to try to inject malicious SQL
code into someone else's database without them realizing it how do you do
this well suppose I log in as M [Link] single quote dash dash I'm not
double quoting anything clearly and there's nothing to the right of the dash
dash anyway but this imbalance is going to be useful why because if I'm a
hacker and I'm presuming you know someone at
Harvard probably is using SQL uh single quotes to wrap the user's email
address and wrap the user's password what if I try to like complete their
thought for them and close one of those quotes for them what might happen
well we could do this here for instance let me hypothesize this is the code that
Harvard wrote hopefully not underneath the hood so they're using cs50's library
and Python and they're using SQL inside suppose that they have a query like
this select star from users where username equals uh
question mark and password equals question mark and then suppose they
just plug in whatever username and password was typed in and then if they
get back some number of rows dot dot dot they assume I am David they
assume Carter is Carter if both the username and password are in the
database just end of story there this is good this has the question mark
placeholders we discussed earlier but what if you don't quite remember that
you don't quite take that to heart and you use your more familiar last week
F strings whereby we use these curly braces to plug in values what if you do
this instead so it's almost the same idea it's still DB execute but now it's
select star from users where username equals and now notice I'm doing the
single quotes which is required by SQL but I'm using F strings with the curly
braces and the password equals single quote password and then close single
quote the problem is if you're just blindly pasting effectively the user's
input into that web form into the
username field and the password field there's nothing stopping a malicious
user student faculty staff from including a single quote in their name or
maybe even you know benevolently if their name happens to have
a single quote as some last names in particular do so this is very fragile why
well suppose that if we plug in my malicious value Ma at [Link] single
quote-- notice what happens to username here the username variable inside
of the curly braces will get replaced with this
why because I provided the single quote that's going to finish the thought of
that first single quote and now I would only know how to do this if I saw the
code or if I just randomly try putting apostrophes into web forms and see if
things break that's often how adversaries attack systems they type in
potentially dangerous characters hit enter if something breaks they're not
necessarily into the system but they know that there might be a vulnerability
and then they start trying more
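A minimal sketch contrasting the two ways of building the login query, assuming Python's built-in sqlite3 module rather than cs50's library. The users table, the address alice@example.com, and the passwords are all hypothetical, and storing plain-text passwords is done here only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice@example.com', 'correct-password')")


def login_unsafe(username, password):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return len(conn.execute(query).fetchall()) > 0


def login_safe(username, password):
    # Placeholders keep the input as data, never as SQL syntax.
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return len(conn.execute(query, (username, password)).fetchall()) > 0


# The single quote closes the string; -- comments out the password check.
attack = "alice@example.com'--"
unsafe_result = login_unsafe(attack, "anything")
safe_result = login_safe(attack, "anything")
print(unsafe_result, safe_result)
```

The unsafe version lets the attacker in without knowing the password, while the placeholder version treats the whole input, quote and dashes included, as a username that simply does not exist.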
character in the username or password so the library takes care of this for
you because you're plugging in the username and password as separate
arguments and then we or the third party you're using actually sanitize that
is uh clean up the data and prevent those bad characters now this is kind of
an internet meme that went around for a while um if you've ever uh driven a
car been in a car where there's like the automatic reader for tolls uh this
person thought it might be funny to try
doing something like this what are they presumably doing the presumption
here whether or not it worked is unclear is that here's like the end of an
actual license plate number but here's an interesting single quote and a
semicolon that's especially bad because it means you can maybe execute a
second query on the database this is someone having fun trying to drop the
entire database table for whatever municipality is scanning through cameras
uh their license plate code and I would be remiss if we didn't
end on this note at least in computer science circles um there is someone
named no relation to the TF name we put in the database earlier um little
Bobby tables and we'll end with this XKCD comic and if you chuckle if you
laugh you're now legit SQL programmers nice nice like every CS student out
there knows about little Bobby tables so if you name drop little Bobby tables
now like you're in all right that's it though for today we will see you next
time
all right this is cs50 welcome to week 8 last week we learned how to create
read update and delete databases using SQL but this week Adam everyone
happy Halloween all right so this is cs50 and this is week eight already my
thanks to Adam on this happy Halloween uh in
the coming moments we're going to learn all about how the internet itself
Works which of course is a technology that we all use every
day probably using in some form right now but we'll see that if you start to
understand some of the underlying building blocks that power the internet
itself we can actually start to build interesting things on top of it and a lot of
the apps the websites that you all use every day should become all the more
familiar things that you yourselves
can create and honestly when things go wrong you'll have all the more
of a mental model for how things work or are not in fact working so that you
can ultimately diagnose all the more issues yourselves so if we take
a look at the internet in the early days it pretty much was just this this
happens to be of course the geography of the United States and just some of
the first uh points on the internet were these here this was so-called arpanet
back in 1969 and indeed the internet had
its Origins here in the United States with just a few computers interconnected
somehow initially that of course began to grow over time such that we
eventually had the West Coast connected to the east coast and nowadays
what you can think of these dots on the screen as representing are these
things called routers sort of computers or really servers that somehow have
wires or maybe wireless connections between them that allow data to flow
from point A to B to C and then this of course has been now
magnified across the entire Globe um and even above ground as well so that
we can connect all the more readily uh to systems anywhere now in order to
Route the data from one router to another we need to somehow make
routing decisions and this is the kind of thing that the internet service
providers the isps of the world just handle for us you and I plug our Macs our
PCS into the network here at Harvard or equivalently at Yale or we somehow
get online via Wi-Fi or cellular technology and then some of
these larger entities these bigger companies or countries handle most of the
data getting from point A to point B and if you think about what these routers
represent they're indeed just servers somehow interconnected not unlike this
grid of tiles here for instance back in the zoom days and in fact here we have
I claim a grid of routers implemented here by the courses teaching fellows
and course assistants and Tas and if the goal at hand for instance is for
Phyllis to Route some piece of information maybe it's an email
maybe it's a request for a web page in the bottom right hand corner all the
way up to say Brian here in the top left hand corner suffice it to say
each of these tiles represents a router a server that can move the data back and
forth left and right that packet of information so to speak from Phyllis to
Brian could take any number of different possible routes up down left right to
go from the one corner to another so let me go ahead and hit play on this
video here wherein the teaching fellows play the
same [Music] role all right so in this particular case the data was routed
pretty straightforwardly up and then to the left but suppose that one or more
of the staff were a bit busy maybe one of the routers is congested that is to
say just got way more envelopes at a moment in time than it can handle
thankfully the design of the internet is such that there's often multiple ways
that data can get from point A to point B maybe going through Point C or
Point D instead and so there's a resilience there even as some
certainly the more common of the two perhaps in common culture so what
does TCP and IP do for us well really two primary things any computer or any
teaching staff member who understands TCP/IP knows how to get data from
point A to point B but how well let's break down what that problem to be
solved is IP otherwise known as Internet Protocol is a protocol that computers
speak that allows them to know how to address computers on the internet
and a protocol is just a set of conventions that computers adhere to so
someone wrote
code that probably has a whole lot of conditionals that tells the computer
what to do if something happens like if I receive a packet then send it to the
next server or something like that in the human world we have protocols too
you know in healthier times it was quite common to sort of extend your
hand to another human in order to greet them and if they're following human
protocol they would presumably grab your hand and shake it at least in a
culture like this one here on campus and now that is
dictates that every computer on the internet have a unique address of this
form and this too is probably something you've seen in the real world even if
you haven't thought too hard about it it's a number and what's called dotted
decimal notation which means it's a decimal number do something do
something do something so four digits separated by convention by decimal
points although there are newer and bigger versions of the same and these
so-called IP addresses that might be as simple as
think back to week zero which gives us eight bits plus another eight bits
another eight bits and another eight bits which is to say an IP address
typically is 32 bits in total now if we do another bit of quick mental math or
think back to week zero if every IP address is 32 bits how many computers
can we have on the internet at once give or take roughly 4 billion is the
ballpark and we don't need to be super precise for discussion's sake but
roughly 4 billion is how high you can count assuming no
negative numbers if you have 32 bits in total now that's not terribly many
addresses especially considering the number of
humans in the world the number of us that do have laptops or desktops or
devices more generally phones in our pockets and the like so let me just
stipulate for today's purposes that there's even a newer and improved
version of IP otherwise known as version 6 this is version four but still super
popular version six uses 128 bits which is a huge number of possible
permutations I dare say I can't even pronounce that number it's so big so
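The arithmetic behind those figures can be sketched as follows, using Python's standard ipaddress module. The address 5.6.7.8 is just an example value, and the variable names are illustrative.

```python
import ipaddress

ipv4_bits = 4 * 8            # four numbers, eight bits each
ipv4_total = 2 ** ipv4_bits  # 4,294,967,296, roughly 4 billion
ipv6_total = 2 ** 128        # about 3.4 followed by 38 more digits

# Each number in dotted-decimal notation is one byte, so it ranges from 0 to 255.
address = ipaddress.IPv4Address("5.6.7.8")
as_integer = int(address)  # the same address viewed as one 32-bit number

print(ipv4_total, ipv6_total, as_integer)
```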
there are ways around even this limitation already so every computer has an
address like this what does that really mean well suppose that I was Phyllis in
the story told visually earlier and I want to send a message to Brian well both
Phyllis and Brian have IP addresses and suppose that Brian's IP address
happens to be [Link] in that top left hand corner well Phyllis's Mac or PC or
phone would essentially do the
equivalent on this human envelope by writing the to address in the middle
of the envelope as is our human convention like this so this is an envelope a
piece of information an email a text message whatever destined for Brian
and so she would have her computer put Brian's IP address in the middle her
IP address is maybe 5.6.7.8 so just like our human convention I might write
5.6.7.8 at the top of the envelope thereby indicating what the return
address is and this is helpful because if Brian's computer needs to
acknowledge
receipt if he needs to reply in some form this way the envelope has all the
information we need but in the real world servers do a lot of things nowadays
not just email but maybe chat Maybe video conferencing maybe any number
of other services as well and so it turns out that an address alone might not
be sufficient because how does Brian's computer know when he
opens the envelope so to speak that this should be interpreted as an email or
interpret it as a chat message or interpret it as like a video attachment
that Phyllis has sent well we need some other mechanism some other hint on
this envelope to distinguish one type of Internet service from another and so
that's where the other acronym in TCP/IP comes in which is TCP so this
stands for transmission control protocol which is just a different set of
conventions that computers adhere to in order to solve a couple of different
problems one is this problem of distinguishing one type of service from
another now what does that mean well humans decades ago
decided as they started inventing all of these various internet services the
web now being one of the most popular ones
they decided to assign different services that can be used
on the internet unique numbers and so two of the most common are these
80 is the number that a bunch of humans decided years ago will represent
what you and I know as HTTP and we'll talk more technically in a bit about
what HTTP is but obviously it's the thing that's in
the beginning of every URL nowadays or https which of course has the S
added to it and that has its own unique number and for now the S just means
secure one is encrypted or scrambled somehow for privacy sake and the
other is unencrypted it's a little more vulnerable to interception so these two
numbers are what the world decided when implementing TCP shall uniquely
identify those services so what does this mean well this means that if Brian's
computer in the story from before is hosting not
like an email server but maybe he has a website and Phyllis is requesting
Brian's homepage or something like that she would have her Mac or PC or
phone not only write Brian's IP address in the middle of the envelope but also
the number otherwise known as a port number that she wants this envelope
to be routed to now 80 would be insecure nowadays http colon is sort of passé and
we almost always see https colon now so I'm just going to go with best
practice and I'm going to add a colon and then the
number 443 at the end of Brian's IP address so now I have an IP address for
Brian the port number for the service that this is relevant to and I'm not
going to bother writing it but it turns out that phyllis's computer would also
choose a port number maybe a random port number so that Brian can
conversely reply and then the computer can know which response is
coming back for which request but the most important one is this one in the
to field whereby this distinguishes this from like an email a
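The envelope with its to and from fields can be sketched as a small data structure. This is only an illustration of the metaphor, not how packets are laid out on the wire; the Envelope class, the destination address 192.0.2.1, and the sender's port 50000 are hypothetical stand-ins, while 5.6.7.8, 80, and 443 come from the discussion above.

```python
from dataclasses import dataclass

# Port numbers for the two web protocols mentioned above.
WELL_KNOWN_PORTS = {80: "HTTP", 443: "HTTPS"}


@dataclass
class Envelope:
    to_ip: str
    to_port: int
    from_ip: str
    from_port: int  # chosen by the sender so replies can be matched up

    def destination(self):
        # Address, then a colon, then the port number.
        return f"{self.to_ip}:{self.to_port}"

    def service(self):
        return WELL_KNOWN_PORTS.get(self.to_port, "unknown")


envelope = Envelope("192.0.2.1", 443, "5.6.7.8", 50000)
print(envelope.destination(), envelope.service())
```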
you're downloading a really big file meant that no one else in your dorm
room or your household could actually download anything until you're
actually done so of course multiple people nowadays can be on the internet
at once even if all of the connections are a little slower but like one person's
usage does not block someone else's now how does this work well TCP in
conjunction with IP can also allow you to take like a really big image of a cat
which the internet of course is
filled with and take a big image of a cat or a big video file of a cat and
fragment it into multiple pieces so I'm just going to sort of roughly tear it
down the middle and then maybe tear it down the middle again so now it's
four different fragments and I'm I'm sorry but the computer will be
reassembling these for us and what phyllis's computer could do now if she's
like uploading this picture of a cat to Brian's web server well she could put
one fragment in this envelope and then have three
separate envelopes for the other three fragments and what you could then
do on the outside of this envelope is just kind of number them somehow and
in fact this is something else that TCP and IP together would do for us this
first envelope now might say something like one out of four in the memo
field so to speak of the uh metaphorical envelope here now this should be
enough information because now if Brian gets all four of these envelopes he
presumably knows how to reassemble the picture of the cat in order top to
bottom left to right but more importantly suppose that one of the routers one
of the TFS in the video is sort of distracted and they sort of drop one of the
packets and that's a metaphor actually in practice for when a router gets
really busy it's got way too much data coming in it might metaphorically drop
packets what does that mean in practice I mean it literally just ignores the
zeros and ones it doesn't save them to its memory because there's just no
room left so it's equivalent to
sort of dropping the packet so suppose now that Brian gets one of four three
of four and four of four what can his computer infer now after receiving those
three packets one of four three of four and four of four what's the use there
yeah I think you're you're signaling with your fingers which one did which
one can I call on you yeah so he's missing two out of four the second of the
packets and this is useful now because you could imagine he can send some
message back to Phyllis saying hey please retransmit
number two of four without having to redownload the entirety of the cat so
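The one of four, two of four numbering can be sketched in a few lines. This mirrors the envelope metaphor only: real TCP numbers bytes rather than whole fragments, and the function names and the stand-in bytes for the cat picture are hypothetical.

```python
def split_into_packets(data, count):
    """Split data into `count` numbered fragments: (number, total, chunk)."""
    size = -(-len(data) // count)  # ceiling division
    return [
        (i + 1, count, data[i * size:(i + 1) * size])
        for i in range(count)
    ]


def missing_packets(received):
    """Return the sequence numbers that never arrived."""
    total = received[0][1]
    seen = {number for number, _, _ in received}
    return [n for n in range(1, total + 1) if n not in seen]


def reassemble(received):
    """Put fragments back in order, whatever order they arrived in."""
    return b"".join(chunk for _, _, chunk in sorted(received))


cat = b"pretend these bytes are a picture of a cat"
packets = split_into_packets(cat, 4)

arrived = [packets[0], packets[3], packets[2]]  # two of four was dropped
to_resend = missing_packets(arrived)

arrived.append(packets[1])  # the retransmitted packet arrives last
restored = reassemble(arrived)
print(to_resend, restored == cat)
```

Because every fragment carries its number and the total, the receiver can both ask for exactly the missing piece and put the pieces back in order even when they arrive shuffled.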
there's an efficiency there as well so TCP/IP allows data really to go
from point A to point B while solving a bunch of these problems along the
way so nowadays if you ever see mention on your Mac or PC of your
so-called IP address that is the sort of problem that's being solved questions
now on these protocols these conventions called TCP and IP that's the extent
to which we'll need to understand them won't have to
implement them per se we'll just take them Hereafter for granted any
questions that you've ever been wondering about your home
network yeah a really good question uh how does TCP know that a user got a
message another aspect another feature of TCP is that Brian's computer by
design of this protocol will also acknowledge the packets that he's received
and it will do it efficiently if Brian receives all four packets in a pretty narrow
window of time his computer will send to
phyllis's computer a quick message saying essentially received all four
otherwise he'll say the opposite which is that I'm missing for instance two
out of four and that just ensures ultimately that all of the data has indeed
arrived so that you're not missing like uh a quarter a quadrant of the cat in
question all right but that's not the only problem that needs to be solved
ultimately we also need to make the internet user-friendly if you will and it
would be really tedious if
you had to visit websites for instance by way of their IP addresses right
[Link] is pretty memorable but there's like 4 billion other possible addresses
available and it would be super tedious to remember those it would be bad
marketing to advertise those in fact most of you probably don't even know
the phone numbers of your closest friends and family members anymore
because you instead store them in your contacts in your address book
associating numbers that are completely opaque
with actual names or strings if you will the same goes for the internet too
even though every computer does have and must have a unique IP address
numerically why well routers or computers just crunch numbers
very readily but we humans work better with strings of text we need some
system for converting user-friendly strings like [Link] or [Link] or
[Link] to the underlying IP addresses and that's where the next
acronym comes in today which is DNS domain name system so this
is just another technology that's been in use for some time now and it's a
collection of servers on the internet whose purpose in life is to convert
domain names to IP addresses and maybe vice versa as well so let me
stipulate for today's purposes there are some root DNS servers in the world
that long story short know about all of the dot-coms all of the dot-edus all of the
dot dot dot all of the other top level domains around the world as well as in
the US and then there are some smaller DNS servers owned
DNS servers all over if you poke around your settings in Windows or Mac OS
or Android or iOS you'll see mention of DNS and you'll probably see the IP
addresses of the servers whose purpose in life is to do this conversion for
you but this is a requisite feature if we just want the internet to be user
friendly and allow us to use words instead of numbers alone what's inside of
these DNS servers you know it's essentially a spreadsheet or if we can say it
more geeky it's essentially like a hash table of some
sort which has keys and values like the key is the domain name
[Link] yale.edu [Link] and the value is the corresponding IP
address or in many cases IP addresses plural of the corresponding servers so
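A minimal sketch of that keys-and-values idea, using a Python dict as the hash table. The names and addresses below are hypothetical: the domains are reserved example names and the IP addresses come from ranges set aside for documentation, so none of them describes a real server.

```python
# Hypothetical records: domain name (key) -> list of IP addresses (value).
DNS_TABLE = {
    "www.example.com": ["192.0.2.10", "192.0.2.11"],
    "www.example.org": ["198.51.100.7"],
}


def resolve(domain_name):
    """Look up a fully qualified domain name; return its IP addresses."""
    # Domain names are case-insensitive, so normalize before the lookup.
    return DNS_TABLE.get(domain_name.lower(), [])


print(resolve("www.example.com"))
```

On a real machine a program would not keep its own table like this; a call such as Python's socket.gethostbyname asks the DNS servers the system is configured to use.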
here already even though I've drawn it fairly abstractly like you would on a
chalkboard it's really probably implemented as some kind of table maybe a
hash table maybe a database table maybe SQL or something like that or
maybe it's even just a linked list or an array we just have to somehow enable
this computer to convert one to the other now just to be super precise DNS
servers actually convert what are called fully qualified domain names which
is generally not just [Link] but more verbosely [Link] and
[Link] so the whole thing that you would see as a substring of the
URL so that's what DNS does and that's what your University your company
your home router are doing for you let me pause here to see if there are any
questions this too is just a technology that now we'll take for granted just
works questions at all all right so let's now transition among our protocols
really to the last for today which will set the stage for actually solving
problems with these and writing some code ultimately um HTTP this is
something that you see or hear all day long even though you rarely have to
bother typing it anymore odds are if you go to [Link] [Link]
[Link] you don't bother typing HTTP let alone https manually anymore
why because your browser autocompletes that kind of thing just to make life
easier but it is
officially at the beginning of every URL you visit either HTTP or the more
secure https whenever you're using your browser to access some website so
HTTP stands for hyper text transfer protocol and it's uh easily one of the
most popular dare say one of the most powerful features of the internet
nowadays but the mental model to have here is that HTTP or the web more
generally is kind of a service that runs on top of the internet and maybe
Zoom or Microsoft teams is another service that runs on top of the internet
and iMessage
and Technologies like it is another service that runs on top of the internet so
the internet is really like the lower level plumbing the TCP/IP stuff the DNS
stuff that just gets data from point A to point B but now and we're in a
software development class ultimately here in cs50 HTTP is the application
Level protocol it's sort of what programmers use what companies use what
uh developers use ultimately to use the underlying Plumbing to build
interesting and Powerful things so what does this
mean when it comes to accessing Services via HTTP or the more secure https
well here is a representative URL even though you might not type the whole
thing if you poke around your address bar this is what's up there with that
said a lot of browsers nowadays are kind of simplifying if not dumbing down
what you see with your human eyes just to shorten the strings especially on
mobile devices but almost always if you click the URL or highlight it then you
see the whole thing but on many browsers you
might only ever see [Link] but all of this information is there it's just
getting more and more hidden just for user interface's sake well it turns out
when you visit a URL by default especially if you type nothing after the dot
com in this case you're technically implicitly adding a single slash so a single
slash denotes the root of the server that is the default page or folder in the
server and the slash whether you type it or not is implicitly going to be
there and that just means give me the default whatever
is at [Link] give me that page or that folder but URLs can be
longer than this and more generally there can be a path so to speak and this
is a term of art a path is some sequence of folder and/or file names after a
URL like this and so you might see more specifically that a URL contains a
very specific file this isn't as common nowadays anymore though we will
begin to today by using this technique but if there is a file called literally
file.html or something else on the server that file is going to
be what this URL pulls up on the computer meanwhile you might have slash
folder slash which just means show me whatever is inside of this folder or
you might have more verbosely /folder/file.html which will show you that file
in that folder and meanwhile just to give some other terms of art this is the
so-called fully qualified domain name and again these vocab terms don't matter all
that much but you'll hear or see them over time we generally colloquially
just refer to this as the domain name which
is a little less precise but gets the job done certainly in conversation and this
part here I described briefly earlier what's the name for this suffix at the very
end of the fully qualified domain name the yeah yeah top level domain or
TLD and this is just some form of categorization of the URLs now the internet
got its start within the United States and a lot of the first websites of course
came from the US and so for better or for worse the sort of stake was
planted in the ground so generally do indicated at least early on
it actually is a TLD from another country that lets anyone on the internet um
pay for and on an annual basis use that domain .tv for instance you might
see in some cases like [Link] and the like um that too is owned by another
country that allows others in the English-speaking World in this case to use it
as though it connotes TV but those are just different types of TLDs that
roughly categorize where the domain lives but it doesn't necessarily mean
it's commercial anymore it doesn't
necessarily mean it's a Network anymore for the most part there are
hundreds of TLDs now for better for worse most of which are less common
than these big ones um but most anyone can buy most of them with just
some restrictions on things like .edu and .gov that are still very much
regulated this meanwhile is what we might call the host name www it's
obviously a super common convention like almost every website uses www
as its host name but that's a human convention it's not a requirement and
indeed some websites don't even bother having a host name they just use
their domain to advertise their websites this now is going to be the scheme
or the protocol and this is just going to indicate via what protocol the
computer your Mac your PC your phone should use when accessing content
at that address because indeed there are other protocols you can use but for
the most part we'll only focus on HTTP or equivalently https all right any
questions now on those just definitions building blocks
of URLs just so we all sort of share a common vocabulary any questions at all
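
The vocabulary above (scheme, host name, domain name, top level domain, path) can be sketched in Python with the standard library's urllib.parse. This is only an illustration: the URL uses the placeholder www.example.com because the transcript's own URLs are elided, the function name url_parts is made up for this sketch, and treating the last dot-separated label as the TLD is a simplifying assumption that ignores multi-part suffixes.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Break a URL into the pieces named in the lecture (naive sketch)."""
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    labels = hostname.split(".")
    return {
        "scheme": parts.scheme,                                # e.g. https
        "fully_qualified_domain_name": hostname,               # e.g. www.example.com
        "host_name": labels[0] if len(labels) > 2 else "",     # e.g. www (may be absent)
        "domain_name": ".".join(labels[-2:]),                  # e.g. example.com
        "top_level_domain": labels[-1],                        # e.g. com (assumption: last label)
        # An empty path implicitly means the root of the server, a single slash.
        "path": parts.path or "/",
    }


if __name__ == "__main__":
    for key, value in url_parts("https://www.example.com/folder/file.html").items():
        print(f"{key}: {value}")
```

Calling url_parts on a URL with nothing after the domain, such as https://example.com, reports an empty host name and a path of a single slash, which matches the idea that the slash is there whether you type it or not.
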
yeah what is the local sure we'll come back to this actually later today
there's a technical term known as localhost which is a generic name for
your computer your Mac your PC your phone especially when you're doing
software development and by convention your own computer has not only
whatever IP address you get from your University or your internet service
provider it also has a reflexive IP address one that just always refers to
itself which is [Link] and that's just a human convention humans decided
that shall refer always to your computer and it's actually going to be useful
today and onward because we can use that when developing on our own
computers ultimately other questions on URLs IP DNS or any of these
building blocks all right so what do we mean by HTTP being a protocol when I
extended my hand earlier as a human handshake you know a typical human
in healthy times would know to respond in turn well how
can start to see in our own Mac or PC some of these very same messages for
instance if Phyllis were visiting not Brian's but [Link] that web server
inside of her metaphorical envelope there would be a textual message that
literally starts with get slash then the word HTTP then the version she's using
1.1 is very common two and three are becoming more common but HTTP
generally looks like this the next line of text in her envelope would probably
say host colon then literally the fully qualified
domain name of the server she's accessing just in case and this happens
super commonly especially on small websites if one server is hosting
multiple domain names multiple websites this just distinguishes which one
she actually wants and then there's usually a whole bunch of other lines of
text as well so where can you actually see this well let me actually go ahead
and do this give me just a moment and I'm going to open up on my
computer here uh an empty Chrome window in incognito mode
generally speaking incognito mode or private mode is used when you don't
want there to be left remnants of what websites you visited and it has the
effect for software developers of just forgetting any things you might have
tried already within your browser including things called cookies more on
those another time uh your autocomplete history and the like so for
development purposes incognito mode is especially helpful because it's sort
of like starting with a clean slate every time you open a new private or
incognito mode
so there's not going to be like any remnants of previous testing or code that
you've been playing with and I'm going to go ahead and do this I'm going to
go ahead and uh right click or control click on Chrome I'm going to choose
inspect and it's going to pull up this window sometimes on the side
sometimes on the bottom I'm going to move it to the bottom just so we can
see it a little more readily and I'm going to zoom in and it's going to look a
little Arcane at first and I'm going to
just highlight a few of these tabs we'll see here along the top that there's
elements console sources Network and a whole bunch of other things as well
this is sort of the advanced mode in Chrome and Safari and Firefox and Edge
have their own equivalent of these features they've always been there even
if you've never clicked the right button to enable these features and I'm
going to focus for a moment on network like this this is a feature of the
browser that's going to allow me the programmer in this
case so the engineer to just kind of look at what messages my browser is
actually sending to a server so let me go ahead and do something like this
let me go ahead and visit uh for instance uh in my browser here and I'm
going to shrink the window just a little bit so we can see it exactly I'm going
to visit https uh [Link] and now I'm going to hit enter and a whole
bunch of stuff just happened along the bottom of my screen and I'm going to
try to pull my window up just a
little bit so we can focus on a subset of this let me pull this up covering up
really the content of the page focusing on these lower level details down
here and what I want to see first is let me oh sorry let me go ahead and
reload this page here after retaining the log so that we can see absolutely
everything on the screen and to be clear I just checked because I forgot
earlier preserve log because I wanted to preserve everything on the screen I
want to see everything all at once and we'll
see this the very first line of output is completely overwhelming with detail at
first glance but what you'll see here if I start to scroll down and down and
down and down are the so-called request headers and let me zoom in here
and what you're seeing inside of Chrome inside of its Network tab in its
so-called developer tools again this is just for engineering types you'll see all of
the headers all of the lines of text that magically were sent by my Mac to
[Link] much like from Phyllis to
Brian server in that story so I can see exactly what messages are being sent
and a lot of this we haven't talked about yet but we do see some mention of
get and we see some mention of Slash and a bunch of other Arcane details
but notice they're all sort of key value pairs with the colon here indicating what the
corresponding value is now most of this is not going to be interesting and
we're not going to focus too much on the weeds of all of this but it indeed
gives us a sense of what's inside of that virtual
back this so-called status code just a numeric code that indicates in this case
that everything's okay and it includes this header this HTTP header which
again is just a key value pair saying that the type of this content that's
coming back from the server is text/html more on HTML in just a little bit but
for our purposes now this just means that [Link] is sending me back a
web page and indeed if we hide all of this technical stuff that's the web
page that we saw up here with all of the
usual imagery and the like and in fact I can see this if I scroll back up not to
request headers but response headers you'll see up here that we get back
responses including the date that the server responded and a whole bunch of
other details as well and honestly this has always been under your fingertips
and it will soon be useful as we start making web-based applications
ultimately but this very quickly gets overwhelming and so better than
this might actually be a tool that we can use
within our code space itself so let me go back to vs code here I didn't open
any code tabs I'm just going to use my terminal window for a moment and
I'm going to run a couple of commands that are going to allow me to actually
see what is going on when I request one website let me go ahead and use a
command called curl for client URL and this is like a command line black
and white program that's going to pretend to be a browser and it's going to
connect to the URL show me the headers but it's
not going to show me the images or the graphics which might very well be
useful to the humans but not to me right now as the developer so I'm going
to do curl I'm going to do -I and then I'm going to do https
[Link] as though I'm pretending to be a browser requesting the
home page and what's nice about curl is albeit overwhelming too you'll get
back a whole response from the server containing only those header values
the key value pairs inside of the envelope
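
To make the envelope idea concrete, here is a minimal Python sketch of the text a browser sends and of how the first lines of a reply break down into a status code and key value pairs. It does not touch the network: the host www.example.com and the sample replies are assumptions made up for illustration, not output captured from any real server, and the function names are hypothetical.

```python
def build_request(host, path="/", method="GET"):
    """Return the text of a minimal HTTP/1.1 request for the given host and path."""
    lines = [
        f"{method} {path} HTTP/1.1",  # e.g. GET / HTTP/1.1
        f"Host: {host}",              # which website on the server is wanted
        "",                           # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response_head(raw):
    """Split a response into (version, status code, reason, headers dict)."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    pieces = status_line.split(" ", 2)
    version, code = pieces[0], int(pieces[1])
    # Newer HTTP versions may omit the human-readable reason after the code.
    reason = pieces[2] if len(pieces) > 2 else ""
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return version, code, reason, headers


if __name__ == "__main__":
    print(build_request("www.example.com"))
    sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n"
    print(parse_response_head(sample))
```

Header names are lowercased in the sketch because HTTP treats them as case-insensitive, so Content-Type and content-type land under the same key.
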
and we'll ignore almost all of these but here is the response from the server
it responded using a new and improved version of HTTP in this case version
two and it gave me back a 200 there's my content type text/html and then
this charset has to do with the encoding if it's Unicode or ASCII or
something else and then there's all this other overwhelming detail for now
but this is the beginnings of my ability to just kind of poke around and see
how the server works and it turns out too that
we'll be able to see other potential responses as well so for instance uh HTTP
might not only return 200 what if I do this instead let me go ahead and visit
curl -I http colon uh [Link] so notice I deliberately use the insecure
version of the URL which maybe Harvard's Administration system
administration doesn't like anymore well how can they ensure that I the end
user the student nonetheless use https even if I didn't type it myself well let
me run just that command with just HTTP not https and you'll see that
everything is not okay it didn't come back with a 200 it came back with 301
in this message saying Harvard moved permanently but here's where you
can look for another clue among all of these lines most of which I don't care
about there's a location header colon that's a little hint to me that says
where Harvard University has apparently moved to on the web and what's
different about this URL just to be clear it has the S included and what your
browser will do by default because
Google and Microsoft and Mozilla programmed it this way whenever it sees a
301 response instead of 200 it won't show you any web page it will look for a
location header find that URL and then automatically quote unquote redirect
you there too so this is why it doesn't matter what we type in the browser
Harvard can have its server send these semi-secret messages to our
browsers and then it will just visit a second URL all automatically and you
can do this with host names as well suppose that Harvard
does not want to standardize on [Link] why they just want it to always
be www maybe it's a branding thing maybe it's a technical thing we can see
the exact same response here this first tells me when I visit HTTP
[Link] with no www Harvard minimally wants me to be using a secure
connection if I then okay fine cooperate let me go ahead and clear my screen
let me add the s but not the www you can see here that it again responded
with 301 up here and the location now adds the www
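
The bouncing described here can be sketched as a small loop in Python. This is a simulation under stated assumptions, not how any real browser is implemented: the SIMULATED_SERVER table stands in for real servers, the example.edu URLs are hypothetical placeholders, and only the 301 code is treated as a redirect.

```python
# Hypothetical stand-in for real servers: each URL maps to the status code and
# headers that would come back in the envelope.
SIMULATED_SERVER = {
    "http://example.edu/": (301, {"Location": "https://example.edu/"}),
    "https://example.edu/": (301, {"Location": "https://www.example.edu/"}),
    "https://www.example.edu/": (200, {"Content-Type": "text/html"}),
}


def follow_redirects(url, max_hops=5):
    """Follow 301 responses via their Location header; return (status, URLs visited)."""
    visited = [url]
    for _ in range(max_hops):
        status, headers = SIMULATED_SERVER[url]
        if status != 301:
            return status, visited   # not a redirect, so this is the final answer
        url = headers["Location"]    # the hint about where the page moved to
        visited.append(url)
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    status, visited = follow_redirects("http://example.edu/")
    print(status)
    for hop in visited:
        print(hop)
```

The max_hops limit is a design choice for the sketch so that two URLs redirecting to each other cannot loop forever.
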
so it's just a way of bouncing users from one place to another and this is all
thanks to HTTP boiling down to relatively simple messages inside of the
envelope that tell the computer the browser in this case how to respond now
odds are you've seen others besides 301 even though you've probably never
seen that actual number unless you've done this kind of thing before but
there probably is a number that like everyone in this room has seen even if you've
never really wondered why is it that number I
think you're smiling what number are you thinking of yeah so 404 why is 404
well 404 indicates by convention not found and now why the world decided
years ago to show us normal humans on the internet 404 as anything
significant is unclear that's sort of like bad design like what do I care if the
status code is 404 but it's common enough on the internet that probably all
of us have seen it but that just means that some server when you visit a URL
that's incorrect maybe it's outdated the URL has been changed if you
see a 404 it just means that the virtual envelope that came from the server
back to your Mac or PC or phone contains not 200 okay not 301 moved
permanently but 404 not found instead and it's usually accompanied by a
technical message maybe a cute picture of a cat sort of hiding because it
means not found or something like that the Aesthetics are entirely up to the
server but that's what the 404 means and there's other codes too a few of
which you'll use in the coming weeks as we transition from commandline
means like you or I screwed up when writing some code so we're going to
see that but it's just going to be an opportunity for us to fix it if a server's
overloaded you often see 503 like something's unavailable because
something's too popular uh or is maybe worse getting attacked um this is an
old uh um April Fool's joke 418 is not actually used in practice but someone
like took the time to write up an entire formal text technical proposal so that
servers can respond saying I'm a teapot
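
For reference, the status codes mentioned so far can be looked up in Python's standard http module, which ships a table of the codes and their conventional phrases. This is a small sketch; the describe function and its category labels are made up for illustration, with the categories keyed on the first digit of the code.

```python
from http import HTTPStatus

# Broad meaning of a status code, keyed by its first digit.
CATEGORIES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}


def describe(code):
    """Return a one-line description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes Python does not know
    category = CATEGORIES.get(code // 100, "other")
    return f"{code} {status.phrase} ({category})"


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(describe(code))
```
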
so it's kind of part of Internet lore and there's other ones of these status
codes as well but this is useful because eventually we'll see in code you can
use this understanding uh high level as it might be of HTTP to do some
interesting and Powerful things so for instance we can even send fancier
URLs to servers it turns out as we'll soon see if you send a message like this
get /search question mark q equals cats and then HTTP 1.1 or
whatever version and you send that message to Google server
[Link] this is how you can specify not just the path of a web page
that you want /search in this case the question mark it turns out is going to
be a convention in the internet in the web specifically for passing
human uh user input to the server as well in fact you've probably never paid
close attention to URLs but they very often have question marks
they very often have equal signs and indeed even [Link] supports a
certain key Q in this case for query and you can put
anything you want after that in order to search for actual cats so if I actually
go back to Chrome itself here for a moment let me pull back my uh pull back
open my Chrome browser here previously I was using uh incognito mode for
[Link] I've gone ahead and closed that window and opened a new one so
we can start fresh by visiting Google normally you and I are in the habit of
going to [Link] and searching via the form or nowadays you just type
like your search query in the browser itself
and it brings you automatically to Google or Bing or something else but I can
really be pedantic here let me go ahead and zoom in and I'll manually go to
https [Link] search question mark Q equals maybe cats now this
would not be a very user-friendly experience if all of us had to manually type
out something crazy like that but that's what the form is redirecting you to
when you type in more user friendly cats into like a text box if I hit enter here
we'll get back indeed a whole bunch of search results
about cats if I zoom back in and maybe I change it from cats to dogs that too
is going to change and notice it's pre-populating the text box because Google
has written its code in order to do so as well now apropos of the video
with which we began today from yesteryear one of the better uh Yale pranks
over the years um has anyone actually ever been to uh [Link]
and to our friends at Yale watching live hi [Link] so it's kind of fun
if you actually visit it uh depending on
of this amazing prank that was you know against Harvard and at that point I
felt I had to interrupt and said well actually I can tell you a lot more about
that okay okay the idea was perfected in a dorm room came up with the idea
actually to prank them with signs at the football game we threw some ideas
out there as far as what what the signs would say we uh eventually settled
on we suck and my immediate reaction was no this will never work however
the problem solver in me started thinking well maybe
we can make this work the problem they had to infiltrate Harvard Stadium
without getting caught sneak in 1,800 placards distribute them to
unsuspecting Harvard fans and then convince those fans to prank
themselves it's great we thought about basically every possible thing that
could go wrong and tried to come up with a solution for it and then you put
two Reds on top of it they made fake Harvard IDs and fake backstories fake
placard designs and a 28 member fake Pep Squad on November 20th 2004 a
fake
Harvard student smuggled the placards into the game what do you think
[Music] [Applause] [Applause] of but then trouble what houses how many
how many extra are I you know just showed him the front of this ID and all of
a sudden he just ran away and he felt so embarrassed having escaped one
confrontation they couldn't risk another it was time this just looks like a total
mess we have absolutely no idea if this is going to work look at the it's going
to happen it's actually going to happen I can't believe this what was
HTML has like two features and this is a language that we spend a very brief
amount of time on because it really boils down to just a couple of basic ideas
and then vocabulary that you'll build out over time just by Googling looking
up references looking at other pages' source code but tags and attributes are
what characterize HTML now what do I mean by that here for instance is the
HTML code via which you can make probably the simplest of all web pages
one that quite simply says in the uh browser window
hello title and hello body for instance now what does this actually mean if
you imagine opening up uh this code in a browser be it on a Mac or PC or
phone you'll see typically like some kind of rectangular window and there's
usually a tab that has the title of that page and then most of the rectangular
region is the web page itself what you're looking at then is the code that's
going to put hello title in the title bar in the tab at the very top and down at
the bottom hello body is going to be all that's in
the big black and white box that composes the rest of the browser window
itself now what are the salient characteristics here that we'll now start to
take for granted well first whoops uh first let's go ahead and give me just a
moment here um and actually do something with this code so I'm going to go
ahead and do this back in vs code here I'm going to first create a file called
say [Link] and in this tab I'm going to go ahead and really repeat exactly
that same code now I had this
line first doctype html then I had this line html lang equals quote
unquote en close quote then I had inside of that head then I had inside of that
title then inside of that I had hello title and I'm doing this quickly because
we'll tease apart in a moment what it actually all means and then down here
below that so-called head I had just the text hello body so at the moment I
that I claim is the entirety of a web page but it currently lives in my code
space so to speak in a file called [Link] that's
fine if I want to create it but how do I how do you how does anyone on the
internet actually view it well to serve a web page you indeed need a web
server and it turns out that codespaces comes with one of these pre-installed
because we cs50 staff uh did so for you and what you can do in a terminal
window once you have an HTML file ready to go that you want the world to
see you can literally run in your terminal window http-server single command
and what that's going to do for you is start a
web server that is to say a program whose purpose in life is just to serve web
pages and even though probably up until now for years you probably if
you're like me equate server quote unquote with a physical device server is
really a piece of software it just tends to run on big fancy devices so when
we say server we often all think of in our mind's eye you know big expensive
devices perhaps but a server is just a program whose purpose in life is to
respond to requests with responses and
that's the vernacular there now once you run http-server and I'm going to
do a bit of magic because I set this up before class just to make sure it goes
smoothly you'll see some output like this whereby your server is now
available on a very long URL mine here uh uh is a very long URL that will be
different from yours but what this is is a unique identifier that your code
space has temporarily generated so that you can now access and ideally
only you can access that file using your browser
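
The claim that a server is just a program that responds to requests with responses can be tried out with Python's built-in http.server module. To be clear, this is a swapped-in technique and not the http-server command used in the lecture. The file name page.html, the loopback address 127.0.0.1, and the helper name fetch_from_local_server are assumptions for this sketch, and port 0 simply asks the operating system for any free TCP port.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading

PAGE = "<!DOCTYPE html>\n<html lang=\"en\"><head><title>hello, title</title></head><body>hello, body</body></html>\n"


def fetch_from_local_server(filename, content):
    """Serve one file from a temporary folder, request it once, and return the reply."""
    with tempfile.TemporaryDirectory() as directory:
        pathlib.Path(directory, filename).write_text(content, encoding="utf-8")
        handler = functools.partial(
            http.server.SimpleHTTPRequestHandler, directory=directory
        )
        # Port 0 lets the operating system pick any free TCP port.
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        port = server.server_address[1]
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            # Play the role of the browser: send a GET request for the file.
            connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            connection.request("GET", "/" + filename)
            response = connection.getresponse()
            result = (
                response.status,
                response.getheader("Content-Type"),
                response.read().decode("utf-8"),
            )
            connection.close()
            return result
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    status, content_type, body = fetch_from_local_server("page.html", PAGE)
    print(status, content_type)
    print(body)
```

The server here lives only for the length of one request and is reachable only from the same computer, which is fine for development but is not a way to host a website for the world.
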
now if I flip the URL or you flip the URL to public by right clicking or
control clicking the right features of vs code you can enable anyone in the
world to visit it but we're not going to ultimately host our websites in your
code space because as soon as you log off for the night and the thing shuts
down like the website will go down but at the end of the semester
particularly for final projects we'll show you ways that you can put your own
website your own code on the actual
internet 24/7/365 even with your own domain name if you want to get one so
that it lives uh independent of your own sleep schedule and usage schedule
of vs code here so I'm going to go ahead now and visit um this URL in
another tab of my browser and what I'll see here is this this is the output of
that program called http-server and essentially what it is doing is it's using
TCP and IP in conjunction with HTTP to just run your very own web server on
GitHub's own servers as well and that's because of different ports
again we won't go too much into the weeds of the TCP the IP and all of that
stuff but recall that different port numbers can allow you to distinguish one
service from another now one of the services is of course your code space
VS Code in the cloud that we've been using for weeks but if you want to use
the same physical server that GitHub controls but actually visit your own web
server that I just ran in my terminal window in another tab that's fine they're
just going to be using different
TCP ports and you and I don't have to care what they are but just that this is
a feature that TCP supports so what you see here is somewhat Arcane this is
not like a thing that most people on the internet should ever see I'm just
doing this for development purposes but this is the index that is the directory
the the folder contents of my code space and because I deleted everything
from prior weeks already all we see right now is [Link] which I just
created so if I click on [Link] within this folder listing
to the slide version of the same and let me just highlight a few of these lines
the very first line is what's called your document type declaration doesn't
really matter to remember that phrasing and this is just something you
copy paste or do from memory at the top of any HTML file that you create
when making your own web page it's an implicit indicator to the browser that
you're using the very latest version of HTML which is version five you don't
mention the number five just browsers
nowadays are programmed to look for this to know that you're using the very
latest version of the language languages just like human languages evolve
over time we're up to version five of HTML but new features get added every
few years so indeed this lecture this class has been evolving over time too so
let's now focus on the next line as well as the bottom line and you'll notice
some deliberate symmetry here this here is what we're going to call a tag
and it's technically different from this this is
a document type declaration it's got the weird exclamation point that's the
only anomaly everything else follows a pattern this is a tag in HTML and it's
the HTML tag and a tag generally both starts and stops or opens and closes
at some point so this is the so-called start tag or open tag and this just
means essentially to the browser hey browser here comes some HTML the
language in which web pages are written this here with the forward slash
after the angled bracket means hey browser that's it for the HTML
probably guess just means that hey browser assume that everything
Hereafter is in English and that might be useful for like Google translate or
just search engine optimization so that just the server the browser know like
what human language you have actual content in like hello title hello body
even though a good computer can probably infer from Context often all right
so that's an attribute that's a tag and the whole thing here everything in
between the start tag and end tag we would also call
an HTML element that just means everything related to that open and close
tag all right now notice indented inside of so to speak the HTML open and
close tag are another pair of tags the head tag and the body tag or the head
element collectively and the body element collectively and same idea hey
browser here comes the head of my page hey browser that's it for the head
hey browser here comes the body of my page hey browser that's it for the
body the head is essentially the tiny little strip at the very top including the
tab
itself the body is like 95% of everything else the big rectangular region
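
The page being described, and the way its tags sit inside one another, can be sketched in Python with the standard html.parser module. The HTML text follows the hello title and hello body page from the lecture as closely as the transcript allows. The OutlineParser class and the outline function are hypothetical helpers for this illustration only, and the indented outline they produce is a simplification of what a browser actually builds in memory.

```python
from html.parser import HTMLParser

HELLO_PAGE = """<!DOCTYPE html>

<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class OutlineParser(HTMLParser):
    """Record each start tag and piece of text, indented by how deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Attributes such as lang="en" arrive as (name, value) pairs.
        attributes = "".join(f' {name}="{value}"' for name, value in attrs)
        self.lines.append("  " * self.depth + tag + attributes)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip the whitespace used only for indentation
            self.lines.append("  " * self.depth + repr(text))


def outline(page):
    """Return the nested structure of an HTML page as a list of indented lines."""
    parser = OutlineParser()
    parser.feed(page)
    parser.close()
    return parser.lines


if __name__ == "__main__":
    print("\n".join(outline(HELLO_PAGE)))
```

The sketch assumes every start tag has a matching end tag, which holds for this page but not for every HTML tag.
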
what's inside of the head of the web page at the
moment just the title so this indicates hey browser Here Comes My Title hey
browser that's it for the title the title of course is literally quote unquote hello
comma title meanwhile if we bounce back out here is the uh second element
inside of the HTML tag uh this says Hey browser here comes the body hey
browser that's it for the body and hey browser this is the
contents these are the contents of the body itself now the indentation is a
stylistic thing I did it just to be sort of neat and tidy because it suggests
what is inside of what but it also suggests a sort of hierarchy and in fact we'll
use terminology from like the world of family trees if this is like a parent so to
speak head and body would be the child elements of the HTML tag
meanwhile title is a child of the head tag or equivalently title is a grandchild
of HTML so you can use the same sort of vernacular as in the human
world when it comes to uh familial relationships too and that just H set again
the same hierarchy so we have tags and they include HTML head title body
and that's it for now we have attributes we've seen one example of them
Lang but we'll see many other examples of the same idea but these building
blocks are exactly the same generally you start a thought you finish a
thought and you might do something in between questions on this basic
structure of any web page any questions at all no all right so let's now
allow
used week five's terminology if this special node here represents the whole
document well the root element as I called it is HTML HTML has two children
head and body the head tag has in turn a title uh child and in turn has some
text just as the body has some text and so this is what your browser is doing
you and I the programmers write this stuff the browser reads this code top to
bottom left to right whenever you visit a website and inside of the
computer's memory Chrome Edge Firefox Safari
whatnot they build this data structure in the computer's memory so as to
know what it is you have told them to do and we'll see over time at the end
of today you can write code in an actual programming language JavaScript to
maybe dynamically add or remove things from this tree and this is how
things like Gmail work when you open up your Gmail inbox if you're a Gmail
user if you just stay there long enough you'll probably get more and more
mail and what happens you don't have to like reload the page
or rebuild the tree per se it just all of a sudden appears at the top at the top
at the top what's happening there is that Google wrote some code that just
keeps adding more nodes to this tree every time they realize you've got a
new message again and again so that's the relationship now even with this
world of HTML with all of the programmatic ideas we looked at in the past all
right so let's go ahead and do something with this that's a little more
interesting than just hello itself I'm
going to go ahead and hide my terminal window because the server is now
running and all I want to do now is experiment with uh hello and other
examples as well let me go ahead and actually before I do that let me go
ahead and run uh code of paragraphs.html just so I can keep my code
separate and now I'll hide the terminal window again um paragraphs.html
I'm going to do almost exactly the same let me go ahead and start with
something familiar and eventually I'll start copying and pasting just to save
time so doc type HTML is always there open the HTML tag and now notice I
didn't type the rest of that just like with C just like with python we try to save
you some keystrokes by closing parentheses adding quotes the HTML
support in VS Code is pretty good too and it tries to finish your thought
when it comes to tags as well it can screw things up if it does
something you don't want it to do so sometimes you have to delete but it's
just autocomplete as we've seen before uh
let's go ahead and let me add lang equals en as all of my examples today will
be let's add the head tag let's go and proactively add the body tag and now
let's go ahead and give this a title tag uh which has a I'll just call this
paragraphs just so I remember which example is what now notice all of this
white space and all of this neat and tidy indentation the browser ultimately is
not going to care about this is just for us humans to kind of keep ourselves
sane when we look at the code it's
just easier to read but strictly speaking I could minimally delete all of this
white space and I could just move all of this tag up to the same line both I
think are fine I'm just going to follow a certain convention but this too would
have the exact same meaning but we'll see where that detail about whitespace
could potentially get us into trouble later in my paragraphs tag let's do
this in advance I've written up some Latin like text a really long paragraph of
Latin like text like this it's
actually random nonsense it's not real Latin even though a couple of the
words might look familiar and so here we have three paragraphs of text and
I've deliberately hit enter in between them so that just like an essay in
Google Docs or Microsoft Word hopefully I'll see three separate paragraphs
let me now change tabs and I'll close [Link] from before I'm going to go
back to my other tab here I'm going to click back to go back to that index of
all of my files which I started at earlier and
you'll see now that I have two files because I obviously just created a second
file called paragraphs.html so let's click on this to see our three paragraphs
of Latin like text and voila I'll zoom out all right first bug if you will this just
looks like one massive blob of text not three Blobs of text and why might that
be borrowing the hint I offered a moment ago why are we not seeing
break yeah so we need some kind of line breaks here because the browser
turns out is only going to take us literally and if you
just give it text text text it's just going to show you text and anytime there's
more than a single white space whether it's two or 20 or 200 it's going to just
assume that you did that just to be neat and tidy and it's going to collapse
them into just one space visually like this so there are in fact a couple of
solutions one is this here I could add some explicit line breaks and it turns
out that there's a br tag like this and just for uh visibility sake let me do two
of them so like hitting enter
enter on my keyboard I'll do it here too BR for break break and now let me go
back to my other tab nothing's changed yet but that's because I have to
reload I've changed it on the server but now I need to change it in the
browser by reloading and now it looks a little better albeit nonsensical but
you'll notice as a curiosity per this BR tag this is kind of poorly designed it's a
little hackish to just say enter enter and make the browser do this line
breaks don't actually require close tags or end
tags so not all tags need to be closed at least those that it just makes no
semantic sense to close them right like the break is there or it's not you can't
imagine like starting to move to the next line and then eventually getting
around to finishing like it's either there or it's not so some tags do not have
close tags as necessary but there's a more elegant way here I dare say not
just sort of hackish putting in these line breaks let me do this instead I'll
delete those and let me go ahead
and as the name of this file suggests let me add a paragraph tag now here I
need to fight with VS Code's autocomplete because I don't want to finish the
sentence uh the thought there let me go ahead and open the paragraph tag
and close a paragraph tag and just to keep things tidy I'll go ahead and
indent too even though the indentation itself doesn't matter let me go ahead
and create another tag for opening this paragraph and I'll close this one here
and now let me see sometimes it's
fighting with my auto complete but that's fine because I did this sort of the
wrong way at first and now let me go ahead and finish this thought by
closing this paragraph tag here and I've manually fixed all of my indentation
so now on line 10 I have the equivalent of hey browser start a paragraph and
then it does the Latin like text then on line 12 hey browser that's it for this
paragraph and repeat repeat repeat if I now go back to my other Tab and
reload again shouldn't be all that different
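The paragraph version just described might look something like the sketch below. The paragraph text is placeholder filler rather than the Latin-like text used in the demo, and the comment shows the earlier line-break approach for comparison.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- Earlier approach: text, then <br><br>, then more text -->
        <p>
            First paragraph of placeholder text.
        </p>
        <p>
            Second paragraph of placeholder text.
        </p>
        <p>
            Third paragraph of placeholder text.
        </p>
    </body>
</html>
```

Each p element opens and closes around one paragraph, whereas br stands alone with no closing tag.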
but semantically it's a little bit better why because just saying break break
doesn't really mean anything but by saying paragraph paragraph paragraph
now there's some more semantic information there now if like Google is
analyzing your page or if the programmer is trying to understand what it is
you did in the past when writing this code you just know semantically oh this
is a paragraph this is a paragraph this is a paragraph just like in a book or
an essay so it's a little more clear
focusing more on what it is not how you want to display it any questions then
on these paragraphs now all right so a few more tags and indeed these first
few examples will really just be like sort of bang bang bang just a bunch of
different vocabulary words in the form of these new tags but we won't go
through the entire laundry list of tags this is indeed the thing for which web
references and books and the like are ultimately helpful just like a dictionary
in the real world so I'm going to go ahead and do this let me go
ahead and copy this let me create a new file called headings.html just so
we have a new file for this to save time I'm just going to paste that exact
same code just to get me started I'm going to change the title of it for clarity
for the code online to headings and now just like a book or an essay or a
thesis let me actually put some actual headings here now if my first heading
like chapter one I could do something like this up here I could have a
paragraph like I just learned and I could say
something like chapter one here but that's not really a paragraph and so it's
sort of better designed to tell the browser and really tell the world what it is
so it turns out there's another tag I can use like H1 for heading and like most
important heading and in here I'm just going to keep it simple and I'm going
to say something like one and in fact this is so short here's a good candidate
for just keeping this all on the same line but this has no functional difference
but it'll just make it a
little more terse on the screen now let me go ahead down here and I could
have multiple headings so H1 2 and down here I could have another one H1
three and if I go back to my other tab I reload it now we should see just like a
book or an essay now we have some proper now we have some oops I have
to go to the right file sorry if I go back to the index now we see the new third
file called headings.html and now we indeed see some fairly pretty if
simple headings as well now if these aren't
three chapters 1 two three but maybe it's a chapter then a section then a
subsection such that just visually you want things to get smaller and smaller
well those exist too and in fact you can do H1 through H6 H1 a little
paradoxically is the biggest and boldest H6 is the smallest but still bold so it
might make sense to make this H2 both open and close and maybe this H3
open and close if again this is a section or a subsection inside of that chapter
if I reload now notice just gets a little
smaller so it's more similar to what you'd see on the printed page but this
now is just another three tags that I might use in my own code all right well
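A minimal sketch of the headings example, assuming the chapter, section, subsection arrangement just mentioned. The heading text and the paragraph filler are placeholders.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>headings</title>
    </head>
    <body>
        <!-- h1 is the largest heading; h2 through h6 get progressively smaller -->
        <h1>One</h1>
        <p>Placeholder text for the chapter.</p>

        <h2>Two</h2>
        <p>Placeholder text for the section.</p>

        <h3>Three</h3>
        <p>Placeholder text for the subsection.</p>
    </body>
</html>
```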
how about lists of things I have three paragraphs here but let's do this let me
go back to vs code I'm going to copy this code so I have a starting point I'm
going to create a new file called say list.html here I'm going to copy paste
I'm going to change my title to be list just for clarity and in here I'm going
to go ahead and get rid of this whole body because let's move away from
these massive paragraphs and keep it simpler for now if I want to have a list
of things uh for instance uh if you haven't seen these already a computer
scientist when they're fishing for just some arbitrary meaningless words uh
they often use foo bar and baz just as their go-to just like a mathematician
might use x y and z for variables so foo bar and baz are on three separate lines
and maybe this is like my to-do list or my
shopping list but you can probably imagine if I go back to my other tab go
back to the index I now see my new file list.html but it's probably going to
look wrong I think it's just going to say yeah foo bar baz all in one breath if
you will not on new lines and you can try to fight this like you can be like I
really want to put some line breaks there go back and reload it's still not
going to make any change how do I want to fix this well I can do this in a few
ways I could make them paragraphs but they're
not really paragraphs they're a list so I'm going to use a different tag instead
I'm going to create for instance an unordered list using the UL tag open and
close inside of that I'm going to use the list item tag li and I'm going to say
Foo inside of another tag I'm going to say bar inside of a third tag open and
close I'm going to say baz so it's getting a little verbose but it's still relatively
succinct li is all you need to type for list item ul is all you need
to type for unordered list so there's some shorthand syntax here that's adopted
if I now reload you're going to see a so-called unordered list like not sorted
which means by convention to show it as bullets though it could be displayed
in different ways visually as well if you actually change your mind and you
realize oh I'd really like to number this well you could obviously like just add
one and two and three but that's going to get annoying especially if the list
grows you want to change something
insert something in the middle then you have to reorder it I mean we're using
computers here they can do this for us so we can change the ul to any
guesses ol maybe for ordered list which is sort of the opposite here so let's
try changing that to ol let me go back to my browser and I'm just hitting
command R or control R to reload the page instead of clicking the button every
time now I automatically get 1 2 and three and you can even override the
aesthetics using different numerals or symbols
instead but that would be perhaps the most common there as well all right
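The two kinds of list just described can be sketched side by side. Showing both in a single page is a choice made here for comparison; in the demo the ul was simply changed to ol.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>list</title>
    </head>
    <body>
        <!-- Unordered list: shown as bullets by convention -->
        <ul>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ul>

        <!-- Ordered list: the browser numbers the items automatically -->
        <ol>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ol>
    </body>
</html>
```

Inserting a new li in the middle of the ol renumbers the rest automatically, with no manual editing.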
it's a lot of tags quickly but any questions on lists paragraphs headings or
the like no all right so let me go ahead and propose this here um let's go
ahead and create what we'll call a table so let me copy and paste this into a
new file code of table.html and in table.html let me again rename the title
to table let me get rid of that ordered list from before and let me now use a
table tag open and close this one's a little weird but
inside of a table you typically have a head of the table so uh I'll say well let's
say the first row we'll keep this one simple a table row or TR inside of a table
row you would ideally have columns but that's not the nomenclature instead
you have data so TD for table data and let me go ahead and just have the
first datum be one I'm just going to arbitrarily do 1 two three just so we have
something to play with and you know what just for demonstration sake I am
going to deliberately copy paste this
twice and I'm just going to manually change the numbers just so we can see
what I'm creating 789 and then maybe just for good measure if you're seeing
where this is going let me copy this one more time and give myself a final
row with an asterisk a zero and a pound sign if maybe you see where this is
going let me go back to my other tab let me go back to the index there's my
new file table.html I'll click that and while it's not very pretty I'll zoom in it's
indeed a table of data I happen to mimic
like a key uh a telephone keypad but you could imagine this being much
juicier much more interesting scientific or financial data or the like laid out
into these rows TRS and these columns AKA table data as well so we have
that ability as well for structured data now of course the internet um has lots
of images on it and in fact this is all just text how can we introduce images
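Before moving on to images, the telephone-keypad table built a moment ago might look like the sketch below. The 4 5 6 row is an assumption based on the keypad layout, since only 1 2 3, 7 8 9 and the final row are named explicitly.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>table</title>
    </head>
    <body>
        <table>
            <!-- Each tr is a row; each td is one cell of table data -->
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```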
well let me go ahead and do this let me first uh sort of semi secretly copy an
image file that I brought from uh earlier just so we have something to
play with and I have in my account here now an image called harvard.jpeg
and I uploaded this semi- secretly a second ago into my account so that I can
reference a second file let me go ahead and copy this HTML just to save
myself some keystrokes let me go ahead and do code of image.html and let
me paste that code and hide my terminal window I'm going to get rid of all of
this table as just uninteresting now we're going to make this even simpler by
changing my title to image to keep all these
demonstrations uh separate and now if I want to make a web page that when
visited shows us a picture of Harvard well there's an image tag abbreviated
IMG for short I can specify what the source of that image is and if my file a
JPEG in this case is literally in the same folder I can just say quote unquote
harvard.jpeg if it's in a folder I should mention the folder name and a slash or
something like that if the image is on the internet somewhere with a URL I
could also have a whole URL https colon and then the URL of the
image but I uploaded it in advance now this is just going to visually display
on the screen but not everyone of course can see images screen readers
might need a bit of assistance and even search engines might want to
analyze the page and know what this is an image of now machine learning
and artificial intelligence are maybe getting better to be fair at figuring out
just by analyzing images what they are but they're certainly imperfect I am a
human I know pretty well what I took a photo of for instance so maybe what I
should
do proactively which would be good for accessibility is have this alt attribute for
alternative text and then literally say like Harvard University so that
someone who can't see or so that a server can actually know with higher
probability what it is uh they're looking at and I could be even more detailed
than just a phrase I could describe the image as well all right let me go back
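The image example described here could be sketched as follows, assuming harvard.jpeg sits in the same folder as the HTML file.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- alt supplies alternative text for screen readers and search engines;
             img has no closing tag -->
        <img alt="Harvard University" src="harvard.jpeg">
    </body>
</html>
```

If the file lived in a subfolder, src would include the folder name and a slash, and for an image hosted elsewhere it would be a full URL.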
to my index in the second tab let me go back and zoom back out there's my
new file and there's my new JPEG that I quickly
uploaded before I can click now on image.html and albeit a little
overwhelming that is a really big image of Harvard now apparently it's too big
to fit on the screen so this isn't the best user experience to have to scroll up
okay so there's the image horrible horrible design if you will at least in terms
of my code but there's going to be ways where I can sort of rein that in and
affect the height or the width as well but for now it's just deliberately a little
overwhelming instead now we can
do something a little more fun and topical today uh which might be to use
a video instead so let me go ahead here and very quickly grab another file
today which uh I brought in advance and that you might have seen briefly
earlier which is an MP4 an actual video file and what I'm going to do here by
revealing VS Code again is I'm going to code a file called video.html yet
another demonstration here I'm going to change my title to video just to
keep these things straight and instead of the
image tag you might imagine using now indeed a video tag and this is a
relatively newer tag that has increasing support among browsers so it's good
now to use and inside of this the syntax is a little different you specify and
this is weirdly annoyingly inconsistent not src for source you literally say
source and then in source you use a src attribute horrible design
semantically but like this is what we're stuck with halloween.mp4 is the
name of the video we uploaded in advance made by some of
Harvard's digital artists and the type of this video so that the browser
knows for sure is video/mp4 that's a so-called content type that you just
know or you look up to figure it out and just so that this is as animated as
possible I'm going to tell the video tag with a few attributes to autoplay and
it turns out that attributes often have key value pairs whereby it's the key
the attribute name equals quote unquote some value just like lang equals
quote unquote en for English but not all
attributes need values in fact if you read the documentation for html's video
tag there's an autoplay attribute where you can literally just say the key and
it needs no value it's just going to mean autoplay and if you don't want to
autoplay you just omit it altogether so you don't necessarily need a
value on or off uh I want the thing to Loop just so it keeps going I want it to
be muted so that we don't hear any sound in fact there is no sound but
browsers nowadays for anti-spam and advertising
reasons often will not play a video if it has sound because it's just kind of
obnoxious if you visit a page and all of a sudden your speakers start blaring
so I know this from having read up on this that I should mute it so if I want it
to actually autoplay for real and then I'll set the width manually for now to be
like 128 pixels across just from some trial and error earlier and that width
attribute does have a value now I'm being a little um a little uptight here by
alphabetizing all of my attributes
not at all necessary I do it just so I can skim things faster and know if
something is there or not so for me it's just a matter of style let me go back
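Putting the pieces of the video example together gives roughly the sketch below. It assumes halloween.mp4 is in the same folder, and the width value is illustrative only; any pixel count that suits the screen would do.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>video</title>
    </head>
    <body>
        <!-- autoplay, loop and muted are attributes that need no value;
             width does take one -->
        <video autoplay loop muted width="1280">
            <!-- The file is named by a source element with an src attribute -->
            <source src="halloween.mp4" type="video/mp4">
        </video>
    </body>
</html>
```

The attributes are alphabetized here only as a matter of style, as noted above.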
to my other tab go back to my index and you'll see two new files again the
mp4 file and video.html I'll click on the latter and if I did this well here we
have thanks to our friends our artistic friends at Harvard
very like an ooh would help with the drama here but okay but we have a very
dramatic nice Halloween type view
here as well so we have videos embedded as well and suffice it to say there's
ways to embed YouTube videos or Vimeo or other services as well using yet
more tags too but the web is of course all about hyperlinks HyperText Markup
Language where you click on something and you end up somewhere else and
this is how the web is so powerfully interconnected so how do we start
creating links from one website or web page to another either that I made or
someone else well let me go ahead and open back up my
terminal window and let's create a file called link.html just to demonstrate
what you and I know as a link I'll hide my terminal window now let me copy
paste just to save myself some keystrokes and let me get rid of the video tag
so we can focus now on links suppose that I want you to visit Harvard
virtually well I could say something like visit uh Harvard period this is
uninteresting because it's just going to be text I probably want you to
actually visit [Link] instead more specifically
and I'll lower case it just to be consistent with what browsers do in the
address bar all right let me go now to the video uh back to this video Tab and
go back where we now see my index I'll zoom back in and there's link.html
unfortunately when I click this and I'll zoom in you literally just see the text
that I wrote and yet on every social media platform nowadays except like
Instagram when you type a URL or what looks like a URL even if you didn't
bother with the HTTP or https it usually
automatically links it for you on Facebook on Twitter and other sites as well
that's just a convenience Discord and slack do that too but they're just doing
it to make things more user-friendly but they have to generate HTML with the
proper tags and attributes so to get this to actually work it's not even good
enough to say https [Link] because if I go back now and reload
now you'll literally just see all of that as text if you want the browser to treat
this as a link you
need to use the anchor tag it'd be great if it were called the link tag but it's
not it's called the anchor tag or a for short and the way you reference the
URL to which you want to lead the user is via href for hyper reference this is
one of the earliest tags perhaps among the most arcane now but if I then put
that whole URL in quotes and close my tag I now have the opportunity to
finish my thought in between the start tag and the end tag for this anchor
and what I put in between the start and end tag is
whatever the human's going to see so here I can say Harvard I can go back
to my other tab I can reload the page and now you see the familiar blue
underline this now is an actual link and if I click it I'll be whisked away to the
actual Harvard website but there's a risk here can anyone imagine pretty
simply after like what 60 seconds of the link tag of the anchor tag how could
someone an adversary misuse this tag alone How could a website run by an
adversary How could a spammer misuse
this tag do you think yeah yeah absolutely you could have it say one thing
but lead elsewhere so I could say Yale in here nothing stopping
me as the developer go back to the page reload now it says visit Yale you
click on Yale and voila you end up applying to the wrong place instead now
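The honest link and the misleading one differ only in the text between the tags, which the sketch below tries to show. The URL is an assumption for illustration, since the address used in the demo is not reproduced in the text.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- The visible text matches the destination -->
        <p>Visit <a href="https://www.harvard.edu">Harvard</a>.</p>

        <!-- The visible text says one thing while href leads somewhere else -->
        <p>Visit <a href="https://www.harvard.edu">Yale</a>.</p>
    </body>
</html>
```

Nothing in HTML checks that the text between the start and end tags agrees with the href value.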
there's some hints of this I could hover over this and super small like this
isn't very good for your anti-hacking techniques but way down here you can
actually see the URL that it's going to go to and most browsers
indeed do this at least on desktops and laptops so it's a little bit of a hint but
what you're seeing here even though this is kind of a silly um uh playful
example this is exactly how phishing attacks work p-h-i-s-h-i-n-g whereby an
adversary tells you to log into your PayPal account but it doesn't go to
[Link] it goes to some other random website that they bought and built
that then tries to collect your username and password and store it in their
database so now they can log into your PayPal
account as you and it boils down to that simple primitive and you can be
even more manipulative too you can even say the whole URL for Yale like
[Link] or worse https [Link] reload that and now who I mean who
among you and people in your lives are necessarily going to be so paranoid
as to not just blindly click on that URL this is why just being a defensive real
world person nowadays digitally is just ever more so important so these
same things that can be used for good or uh benign use cases can also
be used for ill purposes too and it is literally that simple questions now on
any of these tags thus far just a few more to offer up any questions on this
here no well let me open up a couple that I brought in advance just so we
don't have to type all of them here um if you for instance have a web page
that's got quite a bit of code let me go ahead and grab from the website a
couple of examples real fast here namely one that we'll call how about uh
meta.html and in this example here give me
just a moment to full screen it we're going to have a file so code meta.html
I'll open this up next no relationship to what we now know as Meta the
company but rather this is going to be a page that I copied and pasted the
same chunk of Latin like text from earlier so it's going to be a really big
paragraph of text and this is an example where if you were to open this web
page not on my own Mac or your PC but on your phone the font might
actually be really annoying and difficult to read why because your
phone's going to try to squeeze all of the content onto the tiny viewport the
rectangular region of your phone instead so it turns out there are ways pretty
easy ways to make your website mobile friendly as well Otherwise Known
technically as responsive and the easiest way to do this is to include this tag
here a meta tag again no relationship to Facebook this has been here much
longer and in this case here this meta tag on line five has its own sort of
approach to key value pairs this is a good example of where it'd be nice if it
looked just like everything else but this is what we have historically you can
have a meta tag with an attribute called name that refers to the name of
some feature of the browser in this case viewport is the technical term for
like the big rectangular region to which I keep referring the body really of
your page the content for the viewport you can say some esoteric details
like this the initial scale should be one that is no matter who visits your site it
shouldn't start zoomed in it shouldn't start zoomed out
it should start at just the default sizing and then this here width equals
device-width is a very arcane way of saying if the user has a small screen show the
text proportional to that size don't just try to cram it all into a tiny little
window so it's super simple but if for the next problem set or future projects
as well you find that just things look really bad on mobile like this kind of tag
is the place to start meta there aren't terribly many of these that you'll use
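The viewport tag being described might look like this in context. The title and body text are placeholders; the name and content values are the standard ones for this purpose.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <meta name="viewport" content="initial-scale=1, width=device-width">
        <title>meta</title>
    </head>
    <body>
        A long paragraph of placeholder text would go here.
    </body>
</html>
```

Here initial-scale=1 asks the browser not to start zoomed in or out, and width=device-width sizes the layout to the device's screen.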
but they're useful
for other mechanisms as well in fact let me go ahead and semi secretly pull
up one other example as well whereby I'm going to grab another example
that uses more of these tags and in just a moment I'll reveal it here give me
just one second here I'll propose that in this example of meta we
now add these properties instead so I copy pasted this from an existing file
just so as to not waste time typing all of these out if you've ever shared a
URL on Facebook or Twitter or Slack or
Discord Facebook Twitter and other such sites can
use nowadays what are called Open Graph tags which is to say there's other
uses of the meta tag and you just look these things up even I had to look this
up to remember what the key value pairs are the meta tag can also have a
property attribute that can be these very specific strings og:title og:description
og:image where og denotes Open Graph which again is this standard
that's evolved in recent years and what you can do here is
tell browsers and in turn servers what you want them to show as the default
title of the page the description of the page and even the default image
just so you can exercise more control when sharing things socially nowadays
as well again it just boils down to these key value pairs this is absolutely the
kind of thing you look up as needed to cross check but those capabilities are
there and so literally the next time you paste a link into slack or Discord or
any online site that then displays it in
embedded fashion just know that all this time a little bit of textual code like
this in HTML has been there by whoever authored the site all right let's do
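As a rough illustration of those Open Graph tags, a head section might carry something like the following. All three content values, including the image URL on example.com, are hypothetical placeholders and not taken from the file shown in the demo.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <!-- Hypothetical values: what a site should show when this page is shared -->
        <meta property="og:title" content="Example Page">
        <meta property="og:description" content="A short description of the page.">
        <meta property="og:image" content="https://example.com/preview.jpg">
        <title>meta</title>
    </head>
    <body>
        Placeholder body text.
    </body>
</html>
```

Unlike the viewport tag, these use a property attribute rather than name, paired with content.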
one final example in HTML alone before we transition to just cleaning up the
Aesthetics and improving the visuals of everything we've been creating let
me go ahead here and close meta.html let me code up a new file called how
about search.html and see if we can't draw some inspiration from our
cursory understanding earlier of how
see a lot of this a lot of noise too and distractions but there's going to be
some equal signs most likely and/or some ampersands as well and those are
just separating key value pairs now what can I do here well if you think back
to how we manually searched for cats earlier let me quickly do this I'll do this
one manually doc type HTML as my very first line HTML tag with how about
my Lang attribute for English up here and then inside of this I'll have a head
tag inside of this I'll have a title I'll
call this example uh search and then down here I'll have my beginning of a
body tag and now let me introduce you to really a final tag for now uh a form
tag which will create a web form the thing with text boxes and buttons that
you and I use every day on any number of websites inside of this form I'm
going to have an input like a text box whose name is going to be Q for query
because I'm trying to re-implement Google here uh the type of that I want
to be a text box or if I know I'm using this for
search I can actually change this to a search box and it's going to let me it's
going to generally put a little X there so you can clear it quickly that's a nice
little enhancement as well and then I'm going to give myself a submit button
by doing input whoops I'm going to give myself a submit button by doing
input type equals submit and then I'll leave that as such here all right now I
need to do a little bit more but let's see how this looks let me go over to my
other tab let me go back to my index and
if I zoom out there is search.html I'll click it and there's not much going on
here even if I zoom in but I do indeed have a really big text box and a submit
button but I haven't in my HTML told anyone anywhere that I want this input
whether I type cat or dog to go to [Link] so for that I need a couple of
more attributes and I know this from having done this before and any online
reference will say the same you can add an action attribute like what do you
want the action of this form to be and
you can put the URL to which you want this form to be submitted and I know
from tinkering that it should be https [Link] search I don't need to
put any question marks here myself but I do want the uh browser to do that
for me so let me go back to my other tab let me reload and nothing visually
has happened but watch this when I now type in cats but before I hit enter
notice that I'm currently at some long crazy URL search.html as expected if I
now go down to the submit button and click submit watch
what happens to the URL and the page itself I'm whisked away to the actual
[Link] and indeed there are those same cats and if I zoom in here you'll
see that my URL has changed to be indeed slash search question mark q
equals cats so this is just how web forms work when you submit any form on
the web in this way the browser automatically goes to that action URL adds a
question mark puts any key value pairs that you manually typed into the text
boxes and lets the server do its thing now here's where Chrome is
starting to simplify things Safari does this too if you double click on your url
now you see the full URL but if any parts are missing that's just a UI thing to
eliminate visual distractions nowadays meanwhile if I go back to my own
form if I search this time for dogs and hit enter now again the URL changes
to be q equals dogs and it all reduces to this basic building block of using
a form tag now I can be more explicit if I know I want to use get which is
actually the default I can literally say
quote unquote get in all lowercase even though the verb earlier was by
Design in uppercase but here now I'm just being ever more explicit if I don't
want the label of this button to be very generically submit maybe I want it to
be Google search quote unquote well if you read the documentation for
forms you can actually change the value of the button to be quote unquote
Google search and if I now go back here and reload I get a fresh form and
now I get a button that literally says Google Search and if I
tinker with this further because this isn't very user friendly there are even more
attributes I can use I can add on my search input an autocomplete
equals off if I don't want to see my own history for whatever reason I don't
want people knowing I'm searching for cats and dogs on this page I can
autofocus on the text box so that it shows the cursor blinking in that box by
default and I can even do something like this I can have a placeholder
attribute that says something like query or some
other documentation for the user and if I now go back and reload you'll see
notice it says query and it's subtle but my cursor is already positioned there
it gave it focus and I can type cats now without having to click in the Box
manually which is just marginally better for the user's experience any
questions now on all of this here any questions all right that too was a lot
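Put together, the search form built up above might look something like the sketch below. The action URL is a placeholder (the real address is elided in this transcript), the name q is taken from the q equals cats that shows up in the address bar, and the type of the text input and the capitalization of the placeholder are assumptions.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>search</title>
    </head>
    <body>
        <!-- action is where the browser sends the form; example.com is only a stand-in -->
        <!-- method="get" is the default, written out here just to be explicit -->
        <form action="https://www.example.com/search" method="get">
            <!-- name="q" becomes the key in ?q=cats; the value is whatever the user types -->
            <input autocomplete="off" autofocus name="q" placeholder="Query" type="text">
            <!-- value changes the button's label from the generic Submit -->
            <input type="submit" value="Google Search">
        </form>
    </body>
</html>
```

On submit the browser appends a question mark and the key value pair to the action URL, so typing cats would request the placeholder address followed by ?q=cats.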
why don't we take a casual five minute break and when we resume we'll take
a look at CSS add in some JavaScript and
then wrap up so 5 minutes only for now all right we are back so that's
technically it for HTML like here on out it'll be up to like online resources and
references we point you to just to fill in your vocabulary for more tags and
attributes but like conceptually that's it there are tags
and there are attributes and the rest of it really is just kind of a laundry list of
possible features but it turns out too you'll see over time that you
can even see the HTML for any web
page on the internet and so for instance here is the underlying HTML for
Harvard's homepage as of right now and aesthetically some of it's been
collapsed so if I click on these various triangles I'll see what's actually inside
of that is the children of some of these HTML tags but here on out if you're
ever curious as to like how a web page uh made some feature visually you
can just literally use these developer tools built into your own browser just to
see what the uh web developer
actually did and you can do things too like this like if you really like maybe uh
let's see if you really like this menu in the top right hand corner of Harvard's
website you can even right-click that or control click that specifically choose
inspect and what browsers will do is jump to the HTML corresponding to that
visual element on the page and here you can see though we've not talked
about this tag before there's a button tag there's an ID attribute and there's
some other
attributes as well that Define that button um you can do other things too in
the web page let me scroll down for instance here and let's go actually let's
go to another one like [Link] here in today's theme and suppose we want
to do something like uh change the Aesthetics of this website well let's do
how about this over here life at Yale let's right-click on this choose inspect
that's going to jump to that part of the page and notice what you can do here
in this elements tab we can be a
little playful in return today life at Harvard and voila we've now changed
Yale's website it would seem so have we really like hopefully hacking is not
actually this easy what did we actually do based on today's mental model
like I have changed the page but yeah just changed how it is for me right
because my browser just like with Phyllis and Brian from the get-go
requested Yale's web page I got back a virtual envelope containing that HTML
as we've now called it my browser has a
local copy it's got its own tree otherwise known as a DOM document object
model built up in its memory and yeah I went to town and changed my copy
of it but of course hopefully I've not changed the actual server and in fact if I
reload Yale's website now hopefully it will revert back to indeed yep what it
should be instead life at Yale but this ability in your own browser be it
Chrome or Firefox or Edge or Safari to have these built-in developer tools is
very powerful because it's going to enable
you to not only diagnose problems that will invariably arise in the coming
weeks with your own code but is also going to allow you to learn from other
sites like how you can do things and Tinker as well but up until now we
focused only on tags and attributes and on the structure of a web page let's
now focus more on the Aesthetics and fine-tuning that it turns out that HTML
has very limited support for anything aesthetic like font sizes and colors and
so forth and in recent years people have
properties is the new word in CSS for what a moment ago we called
attributes in HTML but it's the same idea just different vocabulary that you get
used to over time a few phrases I might use now and you'll hear in the
coming days would be these type selector class selector ID selector attribute
selector which just refer to different techniques we're about to see that are
going to allow you to control more precisely the Aesthetics of specific things
on the page and the way we're going to do this
is we're going to take our basic HTML like we saw earlier and we're going to
introduce in the next few minutes just a couple of more tags and or
attributes one we're going to introduce you to a tag called style which nicely
named allows you to control the style the Aesthetics the
visuals of the web page or we're going to introduce you to a link tag which
very confusingly does not give you a link that you can click on it just links to
another file that then gets automatically included or
imported to borrow our language from C or in Python but same idea this will
allow us to include secondary files and we're going to ultimately show you
how you can leverage third-party frameworks libraries that other
people wrote so as to not get stuck in the weeds of all the fine tuning of
Aesthetics and just make pretty things fast so you can focus really on the
intellectually interesting part if that's your choice of building the content the
site out the application out yourself all right so with that said let
before reload my index there's my new file [Link] and I'll click that and
you'll see okay I mean this is sort of 1636 style web page super simple all
text nothing really interesting going on there but we can start to style it a
little differently like if the title of the page is John Harvard and then it's
welcome to my homepage and then this less important footer why don't we
have the text be large then medium then small so something arbitrary but a
little more nuanced so let me go back to vs code
here and in my [Link] file let me introduce not yet the style tag but what
I'm going to call temporarily the style attribute both indeed exist this one's
simpler and it's going to be correct but we'll see in a moment not as well
designed arguably as is often our narrative so inside of the style attribute you can
put this language called CSS key value pairs otherwise known as properties
the only way you know what properties exist what keys exist is by taking a
class reading a book looking at
an online reference and we're going to give you just a sampling of what's out
there so suppose I want to control the font size of this first paragraph I can
literally say font-size in all lower case colon and then a word like large or
I can specify 12 point or 18 point or something more precise like that like
from Google Docs or Microsoft Word and suppose I want to make this text
down here medium well I'll do quote unquote font-size colon medium
and down here I'll do style equals font-size
small so I'm going to start with just these three key value pairs same key
but different values I'll go back to my page and in a moment I'll reload and
it's going to be somewhat subtle but watch how the font sizes do change
when I reload now all right so got a little bigger middle one's about the same
and the last one is a little smaller what if I want to center it just like many
web pages have the text like this centered well I can separate these key
value pairs with semicolons and I'm sorry
semicolons are kind of sort of back with CSS but I can do text-align colon
center strictly speaking I don't need the last semicolon if there are no more key
value pairs but I'll just do it to be consistent text-align colon center and
then down here after another semicolon text-align colon center all right let's
go back reload now it's going to be much more obvious the change and we
now have the beginnings of a homepage still pretty basic but at least it's a
little more interesting turns out
we can do a little better with the copyright symbol like most computers
actually have support for a circle with a c in it but you can't just do that with
uh text like this there's different ways to do this you could copy paste it from
like a website that already has it so you don't have to figure out the magical
keystroke on your Mac or PC but there's also in HTML what are called entities
and you can actually specify using hexadecimal or decimal codes numbers like
this hash 169 semicolon after an
ampersand and this is a special symbol that you can look up in any online
reference for like special characters that are hard or impossible to type
manually at your keyboard and this let me zoom in just so it's obvious if I
reload now instead of being two parentheses and a c character now it's a
proper Copyright symbol so you'll see these out there they're not necessarily
that frequently used nowadays but it's good to know that they exist but let
me go back now to my code and propose that while correct uh this
is arguably not very well designed and even if you've never seen HTML
never seen CSS before what Instinct might you have for why this is poorly
designed yeah there's repetition right in general in the past several weeks
C Python SQL like repetition generally bad and sloppy and it's not
going to scale well so the repetition I think you're probably alluding to is
text-align center text-align center text-align center well we can factor that out
in CSS the C in CSS means cascading and this means that
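For reference, the repetitive version being critiqued here, with a style attribute on each paragraph, might look roughly like this. The exact wording of the footer is an assumption; the three pieces of text, the font sizes, the centering and the numeric entity for the copyright symbol come from the walkthrough above.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>
    </head>
    <body>
        <!-- The same text-align: center appears three times, which is the design smell -->
        <p style="font-size: large; text-align: center;">
            John Harvard
        </p>
        <p style="font-size: medium; text-align: center;">
            Welcome to my home page!
        </p>
        <p style="font-size: small; text-align: center;">
            <!-- &#169; is the decimal entity for the copyright symbol -->
            Copyright &#169; John Harvard
        </p>
    </body>
</html>
```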
thing first it turns out that these two are arguably not paragraphs right this is
like a header the body the essence of the page and then the footer so if a
paragraph isn't quite the right English semantics you can actually use more
generically a tag that's all over the internet called div for division of the page
and this is just a very generic term for like a big rectangular region that
divides the page again and again just so that you can think about different
regions now that I have div
which really has no more meaning than that it's a division of the page
interpret as you will now I can have multiple ones of these and let me go
ahead and open a div tag here let me close a new div tag here and then just
to keep everything tidy I'm going to highlight everything in between and hit
Tab and that just automatically indents everything for me now I have three
divs inside of another div and that's totally fine this is very commonly done
now I'm going to do this style equals
quote unquote text-align colon center semicolon or not and now I have some
cascading capabilities now the parent of those three children John Harvard
welcome to my homepage and the copyright will now all inherit that property
so when I hit reload nothing aesthetically has changed whoops sorry um I
should have done reload slightly earlier when you use a div instead of a
paragraph it actually gets rid of the space between those
paragraphs it just sandwiches them a little closer together I can fix this in
another way but that aside everything is still centered and the text is still
large medium and small but I should have called out that change in the
paragraph spacing but we could bring that back before long if we wanted
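A minimal sketch of that cascading step, assuming the same three pieces of text as before: the parent div carries text-align once, and the three child divs inherit it while keeping their own font sizes.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>
    </head>
    <body>
        <!-- text-align is set once on the parent and inherited by all three children -->
        <div style="text-align: center;">
            <div style="font-size: large;">
                John Harvard
            </div>
            <div style="font-size: medium;">
                Welcome to my home page!
            </div>
            <div style="font-size: small;">
                Copyright &#169; John Harvard
            </div>
        </div>
    </body>
</html>
```

Unlike paragraphs, divs come with no default vertical margin, which is why the three lines sit closer together in this version.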
now what more could I do to maybe improve this well strictly speaking I don't
really need that parent div right because these three divs inside already had
a parent so let me actually get rid of that new div just undo what I did I'll
highlight this and if you haven't seen this trick
shift tab will unindent nicely which is perhaps helpful I could just put that text
align center on the body tag so text-align colon center quote unquote this
too would work as well so long as you go up the family tree so to speak
reload and now indeed there's nothing aesthetically that has changed this
time but it turns out nowadays the web is getting a little more sophisticated
and even though you will see so many examples online and tutorials and
books using div div div div all over the place
there are newer semantic tags semantic just means they have more
meaning than this generic notion of a division and if you look up the
documentation for HTML you'll see that if you want to have a header on a
page not a heading like H1 H2 but a header there's literally nowadays a
header tag and this is marginally better because it now says what it is search
engines like Google and Bing can detect oh that's the header of the page
maybe we should use this and give it more prominence in the search
results you can then have a main part of the page so literally a tag called
main nowadays you can literally have a footer of the page and again these
are often useful for screen readers to help recite things verbally for folks who
might otherwise not be able to read them and probably these screen readers
might highlight the header and the main part but might not
spend time for the user on the footer which is arguably a little less important
semantically usually or search engines again now
know what's the header what's the footer what's the main part of the page
so they know what to search and analyze so this would arguably be a
better design nowadays as well but what else remains as a problem well this
is now getting a little bit more subtle and takes some experience but this
practice of putting HTML and CSS all in the same file it's a little sloppy why
because it means I'm co-mingling my data with the presentation thereof like
the juicy stuff I care about like John Harvard and
the phrase welcome to my homepage and all of the Aesthetics that I might
want to change over time and honestly because everything is currently in
one big file it's going to make it really hard for me to collaborate with a
classmate or a colleague at work so that maybe I do the HTML they do the
CSS like uhuh not if you're all working in the same file it would be a
nightmare even if you use VS Code's sharing feature like Google Docs and both
are typing at the same time like you're going to mess up somehow
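The page as it stands at this point, with the semantic tags swapped in for the divs and the styling still mixed into the markup, might look like the following sketch (the text content is assumed to be the same as before).

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>
    </head>
    <!-- Centering now lives on body, the common ancestor of all three elements -->
    <body style="text-align: center;">
        <header style="font-size: large;">
            John Harvard
        </header>
        <main style="font-size: medium;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```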
it'd be nice if we could separate these two languages well one way to do that
would be as follows let me get rid of all of the style tags sorry style attributes
that I've added up until now on all four now of these tags and let me
introduce the style tag that we saw on the slide earlier instead I'm going to
go up here into the head of the page which is where technically these style
tags must go so that they're already loaded into memory before the body is
even analyzed by the browser and
inside of the style tag I'm actually going to select the HTML elements
that I want to stylize if you will so if I want to change the body's Aesthetics
I'm going to literally type the name of that tag body and then I'm sorry curly
braces are back also from C inside of these curly braces I'm going to put
text-align center so the key value pairs are the same the only new thing I've
done is I've moved some of the syntax up to this new style tag in the head if
I want to
now control the header tag as well I can use the same curly braces this is
convention to put the open curly brace on the same line the closed curly
brace on another the browser doesn't really care but this is a common CSS
style convention I'm going to do font size large semicolon then for the main
tag I'm going to do font size medium and then for the footer tag I'm going to
do font size small so same exact thing and it's admittedly a little bit more
verbose it's taking up more lines of
code it doesn't all quite fit on the screen but if you scroll back down now and
you'll acquire an eye for this this is just better like it's just more compact it's
more readable the the content the data jumps out and there's no visual
distractions like the CSS properties as before and
this doesn't actually change the Aesthetics if I reload the
same page it still looks the same but I've taken a step toward some slightly
better design but let me
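A sketch of the version just described, with the properties moved out of the tags and into a style tag in the head, each rule selecting elements by tag name:

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>

            /* Type selectors: each rule applies to every element with that tag name */
            body {
                text-align: center;
            }

            header {
                font-size: large;
            }

            main {
                font-size: medium;
            }

            footer {
                font-size: small;
            }

        </style>
        <title>home</title>
    </head>
    <body>
        <header>
            John Harvard
        </header>
        <main>
            Welcome to my home page!
        </main>
        <footer>
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```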
propose that there's other ways to do this too we just selected things by way
of their uh type so that was a so-called type selector when I literally just
specified the type of tag body header main footer but there's other ways that
now we can lay the foundation for making reusable CSS that you and
colleagues and classmates can use and reuse in multiple files and even in
multiple projects so let me actually go ahead and do this instead of just very
explicitly saying I want the body to be centered let me
invent an adjective if you will and let me change this to dot centered and this
new vocabulary word centered will literally mean text-align center let
me go ahead here and I'm just going to create a new adjective called large a
new adjective called medium and a new adjective called small they are
deliberately consistent with what the properties do but these are now my
own vocabulary words and they are called classes so a class is just a
collection of key value a collection of properties that you get
to invent for yourself and what it lets you do now is this now if I want the
whole body to be centered I can add this attribute which we actually saw briefly in
Yale's HTML class equals centered down here in the header if I want this to
be large I can say class equals quote unquote large down here on Main I can
say class equals quote unquote medium and down here I can have class
equals quote unquote small now I have taken one step backward by
re-adding some of the Aesthetics to the page but it's not
the actual properties it's not the key value pairs it's now more semantically
nice because now I just know from reading the HTML what these things are
going to look like whereas the implementation details for all four of those
adjectives is now relegated up above and these are literally my words I could
change it to foo and use class equals quote unquote foo but obviously that
would not be the best choice of words in this case all right any questions on
this this now is what we would call a class selector by using
literally the dot even though the dot does not appear elsewhere but dot
means this is a class these are not always the best syntactic design decisions
that the world makes all right well one last trick then notice that this is a
little Annoying that I'm still working in the same file and if my classmate
wants to clean up my Aesthetics make my homepage look way better if my
colleague wants to do the same wouldn't it be nice if we could actually move
all of this code to a different file like a Python
library or a c header file well you can let me go ahead and delete that whole
style tag let me add a confusingly named link tag the href of which let's call
a new file [Link] and let's say that the relationship of that file is that of
stylesheet so this is a term of Art in the world of web development a
stylesheet is a text file that contains lots of styles lots of CSS properties let
me open my terminal real fast and let me do code of [Link] enter and in
this file I'm going to
paste all of those same lines as earlier but now they're in a separate file and
indeed if I hide my terminal window and I give this file to a colleague they
can now work on the Aesthetics of the page and make things a lot prettier
than this maybe use specific font sizes maybe add colors and the like
whereas I can focus entirely on the HTML because this file now will reference
that other and if I go back to my other tab and reload the content's
going to be exactly the same but now I'm using some separate
file instead any questions now about these techniques here no all right so
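As a rough sketch of where this ends up: the HTML below uses the four invented class names and pulls its rules from a separate file via the link tag. The file name styles.css is hypothetical (the actual name is elided in this transcript), and its assumed contents are shown in a comment so the example stays in one place.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!--
            styles.css (hypothetical name) would hold the class selectors,
            each starting with a dot:

            .centered { text-align: center; }
            .large { font-size: large; }
            .medium { font-size: medium; }
            .small { font-size: small; }
        -->
        <link href="styles.css" rel="stylesheet">
        <title>home</title>
    </head>
    <!-- The dot appears only in the CSS, not in the class attribute's value -->
    <body class="centered">
        <header class="large">
            John Harvard
        </header>
        <main class="medium">
            Welcome to my home page!
        </main>
        <footer class="small">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

Because the rules are keyed to class names rather than tag names, the same stylesheet could be linked from other pages that reuse those classes.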
with that said let me show just one example now of what I called a moment
ago Frameworks and this is where web development gets kind of fun at least
if you like this especially if you like the sort of logical design the presentation
of information you care about but you really don't want to struggle with like
font sizes and colors and getting everything Pixel Perfect so to speak let me
propose that I open up here an example in just a moment in vs
interesting header for your table there's another tag called thead there's
another tag called tbody these are not all that intellectually interesting I just
read the documentation and realized oh to make things prettier I need a
thead a tbody and so forth but what's interesting here is that if I go to my
index here and open this file called favorites.html here is all of the data
from last week's Google spreadsheet which we exported as CSV and I
manually before class converted
to just HTML it's indeed a table but it's really not pretty like the columns are
really close together it's kind of hard to distinguish one row from another but
this is just raw HTML written by me now I could use CSS and some of the
tricks we just saw to maybe change font size there's ways to change color
background color and a lot of things like that but honestly surely other
people in the world have presented tabular data in pretty ways right I've
been to many websites that have prettier
tables than mine can I maybe use someone else's framework someone else's
CSS include it in my page but then stand on their shoulders and just make
my stuff look prettier well I dare say I can let me go ahead here and semi
secretly open up vs code again and let me grab a slightly different version of
favorites.html that I also opened in advance wherein I add this line of code
instead give me just a moment to foreground this version and the data is
all the same as before but I've added one of these
link tags and I'm not linking to my own [Link] I'm using a popular Library
called bootstrap and bootstrap is just one of many popular libraries out there
free at that that has a whole bunch of CSS files and soon JavaScript files that
you can just use for free in your own projects personally or professionally
that just make things look and behave better without you having to reinvent
Wheels now to access their CSS I had to read their documentation and grab
this very long URL here but it's the same
idea link href equals quote unquote something and I read their
documentation and they told me to add this they told me that if I want my
tables to be prettier I have to add a class attribute to my own table tag and
specify a little weirdly but this is what bootstrap told me to do a class called
table and that will make it a prettier bootstrap table and if I want to stripe it
like every other row is gray instead of white just to make it pop a little more
visually I can also add a second class separated by
a space called table-striped that's all I did I added line five and I changed line
nine and that is it the rest of the hundreds of lines in favorites.html are the
same but if I go back here now and reload the browser now thanks to
bootstrap voila like it's much prettier now I can zoom out and that changes
the font size just locally for me and even if you don't love their Aesthetics I
mean this is easily better than my own there and it turns out we can do
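A small sketch of the two changes described above, a link tag pointing at Bootstrap's CSS and the classes table and table-striped on the table tag. The URL and version number are assumptions (the transcript only says it was a very long URL from Bootstrap's documentation), and the column names and rows are made up for illustration rather than taken from the actual favorites data.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Assumed CDN address and version; check Bootstrap's documentation for the current one -->
        <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet">
        <title>favorites</title>
    </head>
    <body>
        <!-- Two classes, separated by a space: Bootstrap's base table look plus striped rows -->
        <table class="table table-striped">
            <thead>
                <tr>
                    <!-- Hypothetical column names -->
                    <th>Language</th>
                    <th>Problem</th>
                </tr>
            </thead>
            <tbody>
                <!-- Hypothetical rows -->
                <tr>
                    <td>Python</td>
                    <td>Hello</td>
                </tr>
                <tr>
                    <td>C</td>
                    <td>Mario</td>
                </tr>
                <tr>
                    <td>SQL</td>
                    <td>Movies</td>
                </tr>
            </tbody>
        </table>
    </body>
</html>
```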
even better by adding interactivity to
this too but to do that we're going to need one final language for today and
this one is an actual programming language and we won't use it all that
much in cs50 but we introduce it here as we begin web stuff because there's
just so many free libraries and professional libraries that you can use just to
make your web applications fancier and more interactive mobile applications
as well increasingly use HTML CSS and JavaScript to power our iPhones and
Android devices as well so a quick tour some syntax and
then we'll conclude with just some hopefully inspiring examples to give you a
taste of what JavaScript can do so JavaScript supports conditionals just like C
and python before it if we rewind to our scratch days here of course is a
conditional here is the corresponding JavaScript code as of today it's pretty
much identical to C with the syntax here if we had an if else in
scratch it looked like this in JavaScript it's going to look like this instead so
it's a bit of a regression vis-a-vis
Python like the parentheses are back the curly braces are back the
semicolons I mentioned in CSS are also back in JavaScript potentially but it's
familiar is the point here and it's a different language that's frequently used
for the web whereas you can't use python in the ways we're about to use
JavaScript it just wasn't designed for that purpose meanwhile if you have an
if else if else in scratch well in JavaScript just like in C it's going to look
like this instead variables in
JavaScript of course are a thing too and in scratch we might have initialized a
counter variable to zero in JavaScript there are a few different ways to do this and just
for now the keyword is let it's sort of a polite way of asking for a variable let
counter equal zero semicolon so you don't mention the type but you do
use a keyword here in this case called let if you want to increment counter by
one few different ways in JavaScript you can do this just like in C in JavaScript
you can do this just like in C and in
Python in JavaScript you can also get this so plus plus is back so maybe that
counterbalances the other syntax as well that was not the case in
Python Loops are back of course in JavaScript whereas in scratch you could
repeat three times like this in JavaScript it's pretty much just like C the only
difference here is that you say let instead of int for an example like this
meanwhile if you want to do something forever in scratch in JavaScript just
like in C you say while true in this case so this is to say we're sort of
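The slides themselves are not captured in this transcript, so the following is only a sketch of the syntax being toured, wrapped in a script tag so it can run in a browser. The variables x and y and the messages are hypothetical stand-ins for whatever the Scratch blocks showed; output goes to the browser's developer console.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <script>

            // Hypothetical values so the conditional has something to compare
            let x = 1;
            let y = 2;

            // if, else if, else: parentheses and curly braces, as in C
            if (x < y) {
                console.log('x is less than y');
            } else if (x > y) {
                console.log('x is greater than y');
            } else {
                console.log('x is equal to y');
            }

            // A variable is declared with let and no type
            let counter = 0;

            // Three ways to add one to it
            counter = counter + 1;
            counter += 1;
            counter++;

            // Repeat three times: like C, but with let instead of int
            for (let i = 0; i < 3; i++) {
                console.log('meow');
            }

            // A forever loop would be written as while (true) { ... }
            // It is left as a comment here so the page does not hang.

        </script>
        <title>syntax</title>
    </head>
    <body>
    </body>
</html>
```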
tree the browser automatically creates in memory or RAM for you JavaScript
is now a proper programming language that lets us dynamically
manipulate like read data from this change this and this is how Google for instance
implements your inbox they might have uh in your inbox it's like a table so
TR TR TR TR probably something like that or heck maybe div div div div using
JavaScript anytime they realize someone sent you new mail they can create
a new node a new rectangle in memory and you the
human see a new div or a new TR again and again and again so with
JavaScript you just have the ability to control the user's experience and
instead of like I've been doing constantly hitting reload in the page to see
some new content JavaScript can be running 24/7
so that you can actually see all of these changes live all right let's go about
writing some JavaScript code now instead of writing it on the server and
executing it on the server we're going to actually use a very common
Paradigm
hardcoded hello body and let's actually go ahead here and use a form tag
but we're not going to use this form in the usual way whereby the data gets
sent all the way back to the server we're going to Leverage control over this
form client side instead so I'm going to go ahead and create this open form
tag close form tag inside of that let me give myself a text input that's going
to have autocomplete equals quote unquote off just to ensure that what I
previously type in my examples doesn't
label how about something like uh how about we'll call this greet so that's
what the button will actually say well let me actually go back into my
browser tab let me reload this page and we should now see a relatively
simple form whereby I have the cursor blinking on a text input prompting
the user for their name and then a greet button that I can click but if I click
this button now it's not going to do anything useful because I haven't written
any code to tell the browser what to do when I click
that button but it turns out there's all sorts of events in the world of
JavaScript that you can listen for so to speak in fact here's just a list of some
of them anytime something changes in a form field anytime the user clicks
or drags on something anytime the user presses a key and maybe lifts their
finger up anytime the mouse goes down or over or up on top of something or
anytime a form is submitted those are events in the same way that we talked
about events back in week zero in scratch and in JavaScript just like in
scratch where you can do something when green flag clicked in JavaScript
you can write code that actually listens for any of these events or more so
with that said let's go back to vs code here and let's make a couple of
changes instead let's go ahead and add to this form a new attribute that's
not the best way to do it but it's perhaps the simplest for version one here
and let's say onsubmit do the following so onsubmit is an HTML attribute and
curiously its value inside
of the quotes there can actually be some JavaScript code and let's go ahead
now and let's assume there exists a function in the world called greet and
what I want to do is call that function right then and there well now in
JavaScript how do I go about making that function exist it doesn't come out
of the box just like print might or say might in python or scratch respectively
but I can do this let me go up into the head of this page inside of a script tag
here both open and close let me actually
write some JavaScript code and just so it stands out I'm going to give myself
a couple of blank lines though not strictly necessary and let me Define a new
function in JavaScript called greet and this is the syntax in JavaScript for
creating your own function similar to Python instead of saying def in
JavaScript you just say function then the name of the function and any
arguments within the parentheses thereafter but I'm not going to pass in any
here then inside of curly braces what I'm going to do is use a built-in
JavaScript function that comes with any browser called alert it's not the best
or prettiest user interface but for now it's going to get the job done what do I
want to say to the user well let's first just say something simple like hello
comma World close quote semicolon thereby alerting the user with precisely
that message now what I'm going to do down here is make one other change
I don't want this form to actually get submitted to the server just like we've
seen in the past when you submit a form
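Part of the explanation is missing from the transcript at this point, so the following is a hedged sketch of the page as described so far. It assumes that the lost step is the common idiom of adding return false after the call in the onsubmit attribute, which tells the browser not to submit the form; the placeholder text and the use of an input for the button are also assumptions.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <script>

            // Called when the form is submitted; takes no arguments
            function greet()
            {
                alert('hello, world');
            }

        </script>
        <title>hello</title>
    </head>
    <body>
        <!-- Call greet, then return false so the form is not actually sent to a server -->
        <form onsubmit="greet(); return false;">
            <input autocomplete="off" autofocus placeholder="Name" type="text">
            <input type="submit" value="Greet">
        </form>
    </body>
</html>
```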
browser please don't actually submit the form only call the Greet function all
right well let me go back to my browser here let me reload this because I
need to download the latest version of the JavaScript code and I'm just going
to go ahead without even typing my name I'm going to click on the Greet
button and you'll see that albeit a little cryptically at the top we see an alert
that says hello world there's my ugly URL of my code space there at the
moment but we do indeed see that string but
what I haven't of course done is taken any actual name from the user so how
can we go about doing that well ideally I want to alert the user with hello
comma David or hello comma Carter whatever name I type into that box so
how can I go about doing that well let me create a variable called name and
let me set it equal to this function call document.querySelector that comes
with JavaScript itself let me then in parenthesis pass in an argument that is
going to be huh the ID the unique I need a unique
identifier for the thing I want to select so let me actually go back to my HTML
code here and instead of giving this form field a name like Q for query let me
actually use another HTML attribute called ID where now I can call this
anything I want and for clarity I'm just going to call this input element
uniquely name now up here in query selector just like in CSS where you can
use hashes and dots and other symbology in order to select certain nodes in
your Dom that is rectangles in that memory
tree well I can go ahead and select hash name which again is just the Syntax
for uniquely selecting the element whose ID is in this case name so you have
the hash up here you don't need the hash as the value of the attribute down
here on line 20 and now if I want to actually get the value of that text box I
literally just say dot value so document refers to the whole web page itself
querySelector is a function that's built into that object so to speak and the
value accessible via dot value just like
a C struct or even a Python class allows me to go inside of that text field and
get whatever value the user has typed in now as I've been able to do
in languages like Python fairly readily I can concatenate this name
onto the string hello comma space so as to form a complete phrase and
you'll notice here that I'm actually using single quotes in my JavaScript
double quotes in my HTML this is perhaps a common convention in
JavaScript the language does not care if you use double
quotes or single quotes but I dare say single quotes are just more common
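As a rough sketch of the function just described: greet reads the text box whose id is name and concatenates its value onto 'hello, '. The lecture's actual file is not reproduced in this transcript, so the layout is an assumption, and greeting is a hypothetical helper added here only so the string-building step can be tried outside a browser.

```javascript
// Hypothetical helper (not from the lecture): builds the message so the
// concatenation can be tried without a browser.
function greeting(name) {
    return 'hello, ' + name;
}

// Sketch of the greet function described above. It is only defined here;
// calling it requires a browser, where document and alert exist.
function greet() {
    // '#name' selects the element whose id attribute is "name"
    let name = document.querySelector('#name').value;
    alert(greeting(name));
}
```

The hash appears only in the selector passed to querySelector; the id attribute in the HTML is just name.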
and so that's what I've done here all right now as always I'm going to cross
my fingers go back to this page I'm going to reload because I've changed the
JavaScript and I need my browser to download it and now I'm going to type in
my name for instance David click greet with fingers crossed and voila now I
see hello comma David all right so it turns out that while functional this isn't
the best design and commingling your HTML
with your JavaScript code as with this onsubmit attribute isn't
particularly clean it's better as with CSS to keep your HTML over here your
CSS over here and your JavaScript now over here so to speak and better still
perhaps even in some separate files so how can I go about changing this a
little bit well let me go ahead and actually let's go ahead and delete all of
this code for just a moment and let me go and get rid of this on submit
Handler down here and really just distill my HTML only into the HTML and
the attributes therefore and what I'm instead going to do now is do this I can
use JavaScript to achieve the listening for that submit event or that onsubmit
event I don't need to actually use HTML for that I can use JavaScript entirely
so it turns out I can access some other member of this document by doing
document.querySelector again but this time let's select the actual form tag
and it doesn't have an ID because it has no ID in its HTML but it does have a
tag name so just like in CSS when you can
Target Elements by way of their name I'm just going to select the one and
only form on this page by using that same query selector function and now
I'm going to use another function that just comes with JavaScript in the
context of browsers whereby once you select an element like that form I can
call addEventListener which is similar in spirit to Scratch's when green flag
clicked or any block like that you can then tell the browser what event you
want to listen for I want to listen for
the submit event so you don't say onsubmit here now that we're in pure
JavaScript you just say submit and now I can do something like this I can go
ahead and say call the following function and I'm not even going to bother
giving this function a name and that is allowed too in JavaScript as we saw
briefly in Python and what I'm going to do now inside of curly braces after
that keyword function is the same kind of code as before I'm going to do let
name equals document.querySelector I'm
going to select that same name ID as before and get its value
and then I'm going to do alert and then pass in hello comma a single quote
again after that concatenate with that the name and then semicolon but I
need to do one other thing it turns out that this function and if you read the
documentation for this technique actually takes automatically a special
argument called by convention event and this is just a variable if you will
that refers to whatever event just happened in this case it's of course
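A minimal sketch of the listener being described might look like the following. The wrapper listenForGreeting and its parameters doc and show are hypothetical, added so the logic can be exercised with stand-in objects. The call to event.preventDefault() is the standard DOM method for stopping a form submission; using it here is an assumption, since the transcript breaks off at this point. The sketch also assumes the script runs after the form exists in the page.

```javascript
// Hypothetical wrapper (not from the lecture): takes the document as a
// parameter so the same logic can be run against a stand-in object.
function listenForGreeting(doc, show) {
    // Select the one and only form by tag name and listen for 'submit'
    // (not 'onsubmit', which is the name of the HTML attribute).
    doc.querySelector('form').addEventListener('submit', function (event) {
        let name = doc.querySelector('#name').value;
        show('hello, ' + name);
        // Assumption: ask the browser not to actually submit the form.
        event.preventDefault();
    });
}

// In a browser, wire it to the real page and to alert. This assumes the
// form element has already been parsed when this code runs.
if (typeof document !== 'undefined') {
    listenForGreeting(document, function (message) {
        alert(message);
    });
}
```

The function passed to addEventListener has no name; the browser calls it, passing in the event, each time the form is submitted.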
of my HTML as pure HTML down here and I've put all of my JavaScript code
as pure JavaScript up here this sort of separation of concerns similar to what
we started doing with CSS just a bit ago in order to keep those two
languages separate too well let me go back to my browser here reload the
page and unfortunately there's a subtle mistake I've made here let me go
ahead and type in David and click greet and unfortunately nothing actually
seems to happen well maybe it's just my name
Carter greet and nothing seems to happen that alert does not come up well
why is that well let me go back to vs code here and point out that order of
operations in JavaScript matters similar in spirit to C because on line seven I'm
selecting the form and trying to add an event listener for submitting that
form unfortunately the form had better exist at that moment in time but it
doesn't because just like in C and in some cases python where the compiler
or The Interpreter reads the code top to bottom
notice that the form doesn't actually exist and therefore get loaded into the
computer's memory until line 19 so we've got to kind of reorder these
somehow now maybe the simplest way to do this would just be to perhaps
do something like this let me scroll back up to my script tag and perhaps a
little more explicitly move it into the order in which I want it to be executed
so I'm going to go below my form and inside of my body which is actually
okay for JavaScript here and just use that same code and
assuming I didn't make any typos let's go back to the browser click reload
again to get the latest type in my name again using that purely JavaScript
solution and the only change I made was I moved the code from up here to
down here clicking greet now and wow it's now back we get the alert with
hello comma David so those kinds of things those kinds of principles matter
at least when we're back in this world but there's other Solutions too and just
so that you've seen it because it's a common
convention in libraries as well let me undo that change and put that script
tag back in the head or really anywhere else in the page where it might be
and let me propose that there's one other way to solve this problem to
postpone that code on line 7 through 11 getting executed until really the
whole Dom the tree is ready to go and the Syntax for this might be as follows
I can do document and I can add to the document an event listener that's
going to listen for something a little special and I always have to look this
up myself to remember the spelling and the capitalization but it turns out
that the browser itself once it's done loading all of your HTML top to bottom
left right it will raise an event called DOMContentLoaded capitalized exactly
as such and if you want to call some function and I don't even need an event
argument in this case you can open curly braces just as before and put inside
of those curly braces the code that you want to execute only once the DOM's
content has been loaded top to bottom
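The pattern just described can be sketched as follows. The helper whenReady and its doc parameter are hypothetical names introduced here so the registration step can be tried with a stand-in object; in a page you would more likely call document.addEventListener directly. The form and #name selectors are carried over from the earlier example.

```javascript
// Hypothetical helper (not from the lecture): defers `setup` until the
// browser has finished parsing the HTML into the DOM.
function whenReady(doc, setup) {
    // Spelling and capitalization matter: 'DOMContentLoaded'
    doc.addEventListener('DOMContentLoaded', function () {
        setup();
    });
}

if (typeof document !== 'undefined') {
    whenReady(document, function () {
        // Safe even if this script sits in the head: the form exists by now.
        document.querySelector('form').addEventListener('submit', function (event) {
            alert('hello, ' + document.querySelector('#name').value);
            event.preventDefault();
        });
    });
}
```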
and now let me just finish my thought with a close curly brace close
parenthesis and semicolon it gets a little Annoying to visually line all of this
up but I think I'm still good and now even though this code is at the top of
my file or really above the form tag itself I think we're okay so let's go back
to the browser here reload the page type in David and click greet and we still
get the same correct behavior and so this is just a very common Paradigm to
use these kinds of events to listen
and listen and listen for something to happen and then only do something
once that thing has transpired all right well let's take one more step with
JavaScript code before we take a look at what's really fun about this
language and what you can do with browsers in particular by just cleaning
things up a little bit further I'm going to go back into the code here and I'm
actually going to remove or cut all of this code out of the [Link] file
itself and I'm going to change my script tag to have nothing
in between the open and close tag but I am going to give it a source attribute
and let's go ahead and call this for instance [Link] so .js would be the
convention for the file extension for a Javascript file and even though this is a
little weird that we have the script tag and a source attribute then nothing in
between the open and close tag this is indeed the convention when you want
to put all of your code in a separate file and let me go ahead and do that let
me go ahead and open my terminal window
create a new file called [Link] and then in that file I'm just going to paste
the very code that I just cut from the previous file so no changes to the code
all I'm doing is factoring it out and now I'm doing something just like our CSS
factorization before which confusingly used the link tag this uses the script tag
this just now allows me to collaborate with someone like Carter or someone
else so that they can do the JavaScript code I can do the HTML maybe a third
person can do the CSS and indeed
maybe we can build even grander things by designing things in this way
all right well let me go back to my browser again reload the page I shouldn't
see any visual changes but if I type in my name again David and click greet
this still now works and what my browser has just done underneath the hood
is not only download the [Link] file as always because there's now
this script tag that's referencing the source of another file just like an image
tag might reference the source of an image
the browser is automatically helping me out by loading that into its memory
as well and now how about one final example and for this one I'm going to go
ahead and not write it live but open it up as prepared in advance just to show
you what you can do by listening for some of these other events as well like
the key up the finger going down the finger going up and listening for exactly
that so as the user is typing something you can do something interesting as
well I'm going to go back into my directory
listing here and I click on this Source 8 directory which has all of the
examples that I wrote here in advance and I'm going to scroll down to one
called hello 5. HTML and in hello 5 now we've gotten rid of the button and we
just have this text box but notice now what happens if I start typing my
name as d a v i d D I'm not typing enter at all and in fact if I start deleting
and I change my mind and start typing Carter's name notice now that the
web page the Dom inside of the computer's
creating a variable called input and selecting from the document the one and
only input tag that we saw just a moment ago I'm then adding on line 11 an
event listener for keyup which is exactly that gesture so that I can execute
some additional code anytime the human lifts their finger from the keyboard
after typing a key what do I then do well I'm going to go ahead it seems and
declare another variable called name and I'm just going to select some P tag
on the page and now we didn't really see a P
tag so I think it's time to look at the HTML if I scroll down to the bottom of the
page where my actual HTML is you'll see that there's just a form tag and no
onsubmit Handler anymore as before there's just an input tag and no button
at all but there is on line 29 here an open and close P tag just so I have an
empty placeholder in which to put something like hello David or hello Carter
so that's why now on line 12 I can define a variable called name and I can
select that P tag so that what do I
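The listener being walked through here might be sketched as follows. This is not the lecture's file: liveGreeting is a hypothetical helper, and the fallback text is an assumption, since the transcript only says that a default value appears when the box is empty. The sketch assumes the page contains one input tag and one p tag.

```javascript
// Hypothetical helper: decides what the paragraph should say.
// The fallback string is an assumption made for illustration.
function liveGreeting(typed) {
    if (typed) {
        return 'hello, ' + typed;
    }
    return 'hello, whoever you are';
}

if (typeof document !== 'undefined') {
    document.addEventListener('DOMContentLoaded', function () {
        // Assumes the page has exactly one input tag and one p tag.
        let input = document.querySelector('input');
        input.addEventListener('keyup', function () {
            // Runs each time a finger lifts off a key
            let name = document.querySelector('p');
            name.textContent = liveGreeting(input.value);
        });
    });
}
```

An empty string is falsy in JavaScript, which is why deleting every character falls through to the default text.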
delete delete delete delete delete and nothing's there now that if condition is
no longer true and so we see this default value instead so this is only to say
that by harnessing these various events that are constantly happening on
most any web page we can now register code just like we did way back in
scratch to actually listen for those events and do something with them now it
turns out we can do some interesting things even using third party code and
just as we used bootstrap a bit ago to
make our table prettier allow me to propose that we also take a look at this
version of favorites as well let me go back into my Source 8 directory and
open up favorites 2 which I made in advance which looks almost the same
though I've zoomed in here a bit but you'll notice somewhat subtly over the
leftmost column in this table you'll see now this arrow in Gray pointing up
and pointing down previously those were not there all I had was a static
HTML table with all of this data sorted in whatever order it
was inputted the other day in that form but now notice what I can do if I want
to sort in one order I can click this Arrow or the other order I can sort in
this Arrow so essentially doing it chronologically forward or backward now
how is that sorting happening it's presumably based on all of the timestamps
that were registered when we submitted that Google form just a bit ago
when it was live but now using JavaScript it turns out that we can use
some logic somehow and sort this data by
the same and you don't get that automatically just by using HTML alone now
how did I achieve that well it turns out if I go ahead and close these hello
files and in vs code let's open up favorites 2. HTML you'll see that all of
the HTML is actually the same if I scroll down and down through this file but I
added a little something interesting at the top I copied and pasted the
appropriate URLs and HTML tags from Bootstrap's documentation and you'll
see here that I have a file called not only
adding now another HTML attribute called data-toggle whose value is table
and I know that only from the documentation of these libraries indicating
that's how I can now enable this table to be interactive as I can too by
adding data-sortable equals quote unquote true on specifically the
timestamp column and the only thing unfamiliar here perhaps is I'm using th
for table heading as opposed to td as I do elsewhere but that's all that it
takes to now focus on the raw data you want to present and let
someone else do the heavy lifting of actually implementing the logic well
let's end with just a look at what more you can do with Java script and just
how powerful it is when you combine a language like this with the data and
the user interface you want to convey let's go ahead and open up within the
Source 8 directory something called background.html now this interface
here is quite simple it just has three buttons R G and B and the background of
course notice by default is just white but when I
click on the R the background turns red when I click on the G it turns
green and the blue it turns blue and again and again so how is this working
well if you think back again to the available events perhaps I'm just listening
for a click on those buttons and then doing something with maybe the CSS of
the page to allow me to see those different colors so in fact let's go back to
vs code here and let's open up background.html and in here you'll see
some simple HTML at the top just three buttons but
I've given each a unique ID so that I can reference it in code and then inside
of a script tag here below because I didn't bother with the DOMContentLoaded
event here notice that I'm doing the following I'm creating a variable
called body that lets me select the body tag I then have in these three lines
some code that handles red what am I doing well I'm selecting from the
document whatever HTML tag has the unique ID of red and then I'm adding an
event listener for any click on that button and anytime
someone clicks on that red button I call this function anonymously it doesn't
even have or need a name and this syntax here is powerful because now in
JavaScript I can alter the CSS of my page by doing body which is the tag that
I selected two lines ago accessing its style accessing its background color
property and setting it equal to quote unquote red and I do the same down
below for green I do the same down below for blue and the only thing worth
noting here is that in CSS it turns out it's
the case that the CSS property for the background color of a page is actually
background-color in all lower case with a hyphen in between
unfortunately in the world of JavaScript a hyphen would be mistaken for
subtraction like background minus color which would be wrong so the
convention in JavaScript is when you're trying to manipulate CSS you take
whatever the property name is font-size background-color and you
change it into so-called camel case here you get rid of the hyphen and you
capitalize the subsequent words like color in this case here all right how
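To make the naming rule concrete, here is a small sketch. The function toCamelCase is a hypothetical helper written for illustration; browsers do not require you to call anything like it, you simply write the camelCase name yourself. The click handler afterwards assumes a button with id red exists, as in the page described above.

```javascript
// Hypothetical helper: converts a hyphenated CSS property name to the
// camelCase form used on an element's style object in JavaScript.
function toCamelCase(property) {
    return property.replace(/-([a-z])/g, function (match, letter) {
        return letter.toUpperCase();
    });
}

if (typeof document !== 'undefined') {
    let body = document.querySelector('body');
    // Assumes a button with id="red"; skipped if the page has none.
    let redButton = document.querySelector('#red');
    if (redButton) {
        redButton.addEventListener('click', function () {
            // background-color in CSS becomes backgroundColor here
            body.style.backgroundColor = 'red';
        });
    }
}
```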
about another well it turns out back in the day back in my day there was an
HTML tag that would actually allow you to do this create blinking text on a
screen it's rather unpleasant at this rate certainly but how might this work
well it turns out in JavaScript if we take a look at the blink.html file here
we'll see that you can in your HTML do something literally as simple as hello
world in the body but then you can call
this function here turns out just like document there's another Global special
variable you can use in JavaScript and browsers called window which refers
to all things related to the window itself the window comes with a setInterval
function which lets you do exactly that set an interval in milliseconds and
after every expiration of that interval some function will be called for you so
in this case it's saying every 50 milliseconds but let's actually slow that down
now to 500 milliseconds or for
one half a second call a function called blink notice I do not have
parentheses after the blink name because I don't want to call blink now I
want to tell JavaScript to call the function called blink every 500 milliseconds
now we'll see in a moment what that code looks like but let's go back to the
page and reload and you'll see now that it's a more pleasant blinking if that's
even the case every half second because I'm now firing that event that is I'm
calling that function now every 500
milliseconds instead how am I doing that well in this same script tag I've
invented my own blink function this has the distinction back in the day of
actually being an HTML tag and among the few tags that the world removed
and got rid of so that it's no longer used because it's not all that user friendly
but down here what am I doing I'm getting the body of the document itself
with this variable and then I'm checking two CSS properties that we didn't
talk about before but it
turns out that you can check and set the visibility of an element in JavaScript
by going into that tag checking its style and getting its visibility property
and if it happens to equal hidden the next line of code here 13 sets it equal
to visible instead else if it's not hidden it must already be visible and so line
17 flips it the other way and changes it to Hidden here left hand right hand
clearly not talking no idea why the opposite of visible Is Not Invisible it's
indeed visible and
hidden but this just allows you every time this function is called to change
the property from one value to another achieving that blinking effect you can
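The blink logic described above can be sketched like this. The helper toggleVisibility is a hypothetical name, split out so the flip between hidden and visible can be tried on a stand-in object; the timer is only started when the code runs in a browser.

```javascript
// Hypothetical helper: flips an element between hidden and visible.
function toggleVisibility(element) {
    if (element.style.visibility == 'hidden') {
        element.style.visibility = 'visible';
    } else {
        element.style.visibility = 'hidden';
    }
}

// Sketch of the blink function: toggles the whole body of the page.
function blink() {
    toggleVisibility(document.querySelector('body'));
}

if (typeof window !== 'undefined' && typeof document !== 'undefined') {
    // Pass blink itself (no parentheses) so the browser calls it later,
    // once every 500 milliseconds.
    window.setInterval(blink, 500);
}
```

Writing blink() with parentheses would call the function once immediately and hand its return value to setInterval, which is not what is wanted here.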
do even more powerful things that you and I take for granted every day let's
go into Source 8 and go to autocomplete.html which I wrote in
advance and this is a page that also loads into memory a really big
dictionary that you might recall from problem set five and if I go ahead and
type in something like c a t you'll notice dynamically an
unordered bulleted list appearing below the text box that shows you all of
the words in that dictionary from pset 5 that start with c and then CA and then
T just like the autocomplete you see every day on your phone in Google or
websites like it how is that working well probably I'm listening for the key
press going up as soon as that key is pressed I'm probably searching through
a big array really of all of those words maybe using linear search maybe
binary search or something faster than that and then looking for any
string in that array that starts with C or CA or c a t and then I'm generating
automatically the HTML therefor but perhaps most familiar nowadays is
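One way that search could work is sketched below, using linear search over a tiny stand-in word list. The names WORDS, matches, and toListItems are hypothetical, and the real page may well be implemented differently.

```javascript
// A tiny stand-in for the large dictionary (hypothetical data).
const WORDS = ['car', 'cat', 'catalog', 'cater', 'dog', 'dot'];

// Linear search: keep every word that starts with the typed prefix.
// An empty prefix returns no suggestions.
function matches(words, prefix) {
    if (prefix === '') {
        return [];
    }
    return words.filter(function (word) {
        return word.startsWith(prefix);
    });
}

// Generate the list items that would go inside an unordered list.
function toListItems(found) {
    return found.map(function (word) {
        return '<li>' + word + '</li>';
    }).join('');
}
```

In a page, a keyup listener on the text box could call matches with the box's current value and place the result of toListItems inside a ul element.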
just how much your phone and your laptop know about you and let me go
into a final example here in Source 8 called geolocation.html which is a term
of art that just refers to figuring out your geography for instance your GPS
coordinates now here we have a third and final Global variable that your
browser provides you with called Navigator and
and you can do anything you want with the position that comes back the
latitude and longitude respectively so I'm going to use a function that's not
often that helpful but for our purposes today it's just going to write to the
document itself to my big rectangular region whatever the latitude is then a
comma and then the longitude as well so if we go back with this final flourish
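A rough sketch of that idea follows. It uses navigator.geolocation.getCurrentPosition, the standard browser call for requesting a position, and document.write for output; formatPosition is a hypothetical helper. The code inside the if only runs in a browser, and only after the user grants permission.

```javascript
// Hypothetical helper: formats a position as "latitude, longitude".
function formatPosition(position) {
    return position.coords.latitude + ', ' + position.coords.longitude;
}

// Only browsers expose navigator.geolocation, so check before using it.
if (typeof navigator !== 'undefined' && navigator.geolocation &&
    typeof document !== 'undefined') {
    navigator.geolocation.getCurrentPosition(function (position) {
        // Called later, once the device has worked out where it is
        document.write(formatPosition(position));
    });
}
```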
into Source 8 open up geolocation.html you'll see that my browser first
wants my permission to let this website my own
know my location I'm going to go ahead and click allow crossing my fingers
because it might take a moment for the phone or the laptop to figure it out
and it looks like according to Google I am right this moment with my Mac at
latitude longitude 42375 comma 7111 let's go ahead and highlight and copy
that let's go to a website like Googl [Link] crossing our fingers if you've
never done this you can search for GPS coordinates too let's hit enter and
amazing we are indeed in Sanders
Theater roughly there standing on this stage on Halloween and that then is
week eight we will see you next time happy Halloween
all right this is cs50 and this is already week nine which is our second
to last indeed this is really the last week
where you'll learn in this class how to program but indeed it's this week
that's really meant to be the pedagogical climax of like all of these
various languages we've been looking at all of these various techniques all of
this syntax so that at the end of cs50 in just a few weeks you indeed feel
that you didn't take a class on C and you didn't take a class on python but
you really more generally took a class on programming because indeed we
know already about half of you will go on
to study computer science further but half of you will not and indeed all of
your programming chops here on out theoretically will have a foundation in
what we've been doing these past many weeks but here on out it's really going
to be up to you to learn some newfangled language when it comes out or to
follow some new trend when some language eclipses the ones we've been
using as more popular as more appropriate for problems you want to solve
and so today really is about synthesizing so many of
the past few weeks but doing it in the context of web programming which For
Better or For Worse is so very much in vogue nowadays both on our laptops
and phones and indeed the languages we looked at in recent weeks are used
not only to make websites but also full fledged applications and app stores
and the like so this really will be the culmination of those past several weeks
and indeed we'll even talk about some familiar Concepts like shopping carts
when you're on Amazon and these things
called cookies when you're visiting websites all of those topics too will come
into play and you'll have an understanding of what all that means from the
ground up so how did we get here well just last week we focused on HTML
and CSS primarily which are not programming languages they're just about
Aesthetics structuring your data presenting your data and so forth and
we served the web pages we wrote using this program HTTP server this is
just one such program there's dozens hundreds
of different web servers that you can use out there this is just a super simple
one we pre-installed in your codespace for you in vs code so that you can
just serve up web pages at the end of last week too though we teased
JavaScript a full-fledged programming language that you can use to
manipulate the user's experience for the better to make things more Dynamic
and interactive by actually running code in the user's browser on their
Mac their PC their phone as opposed to server side
which up until now is where all of our code in C and python has been written
so you're writing code on a server you're serving code from a server but now
with HTML CSS and JavaScript it's getting executed in a browser but today
we're going to give you one final feature of python or really languages like it
that you can also use code on the server to generate automatically
dynamically the HTML the JavaScript the CSS that you actually want the
user to receive you don't have to hardcode everything as
you have when making your own homepage well let's consider what
some of the building blocks were last week so here's a sample URL and over
here slash is sort of the default page on any web server it might be
[Link] it might be something else that's just a convention but it refers to
whatever the default actually is you can visit of course in any browser
a URL that ends in file.html or something else.html and that literally
means your browser wants this file on this server
or of course we saw that it can be a folder and inside of that folder is
presumably some default file name like again [Link] or you can be more
explicit like folder slash file.html and these more generally we just called paths
and indeed a path is just a location on your Mac your PC or on a server of
some piece of information but today we're just going to rename this only to
use other common terminology but they're really just synonyms today we're
going to refer to those same things as
routes because now today we're going to ultimately replace HTTP server
which just serves up static content that you all write with your own web
server like now you will be the ones controlling what it is the server does in
response to the user's requests so that you can respond interactively and
dynamically but we're still going to see techniques like this these were our
so-called HTTP parameters they're everything that comes after a question
mark in a URL and it can be like key
equals value and an example was what when we played with Google what
was the key and what was the value that I first tried any recollection I was
searching for cats and so the key I figured that was Q because that's what
Larry and Sergey who created Google years ago decided the name would be
of the HTML text box that you type your query into and if I type cat for cat
the value of that would end up in the URL for Google as being question mark
q equals cat and I mentioned that it's often the case that
you want to send two different inputs to a server and this is why I propose
that you just keep an eye out for ampersands and ampersands separate
these key value pairs but again this is the same darn Paradigm as before and
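To see the key value structure concretely, here is a short sketch that takes a URL apart with the standard URL and URLSearchParams objects. The address itself and the second parameter lang are made up for illustration; only q comes from the lecture.

```javascript
// Hypothetical URL for illustration; everything after ? is parameters.
const exampleUrl = new URL('https://www.example.com/search?q=cats&lang=en');

// Ampersands separate the pairs; within each pair, = joins key and value.
const pairs = Array.from(exampleUrl.searchParams.entries());

// Look up a single value by its key, much like a dictionary.
const query = exampleUrl.searchParams.get('q');
```

Here pairs holds one two-element array per key value pair, in the order they appear in the URL, and query holds the value that was sent under the key q.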
we've seen this so many times right key value pairs in dictionaries in Python
we've seen HTML attributes and their values we've seen CSS properties
and their values it's all the same thing associating something with something
else even though every language every person seems
to have their own vernacular for it it really is just the same idea this
associating of something with something else we'll continue to see and here
to be concrete were the HTTP lines of text that were in those virtual
envelopes if you will if I were indeed selecting trying to search for something
like cats on Google this recall was the message that got sent to the server by
my browser in order to tell Google to please search for not dogs but in this
case cats now what has HTTP server been doing for us well it's
just been serving up HTML files CSS files maybe some js or JavaScript files
but it has been ignoring any HTTP parameters like HTTP server does not take
user input why well what's it going to do with it because you already wrote
the HTML you already wrote the CSS like there's no decisions to be made
until we introduce a proper programming language on the server and so
we're going to move away now from this simple HTTP server program and
introduce you to your own server that's going to handle the
parsing that is the extraction of these key value pairs so that you and I don't
have to write python code all of a sudden that like analyzes this stuff figures
out what page is requested the key value pairs all of that we're still going to
get for free by just using the right framework and so today we revisit python
which we've now used in some form the past few weeks and indeed it's
kind of been the glue that allows us to stitch together some of our own logic
we saw it with SQL we're going to now see it with
HTML CSS and even JavaScript if we want and we're also going to see
another language today not a programming language called Jinja and this
is going to be a common Paradigm in the real world whereby different
languages different libraries different Frameworks often like borrow from
each other or they use technologies that someone else wrote just so they
don't have to reinvent that wheel so flask is just a framework that is a third
party Library it's pretty popular nowadays it's
these are not the interesting ideas the interesting ideas are the ones we'll
focus on in code but starting today instead of running HTTP server to serve
up a static website we'll have you start running literally flask space run in
your terminal window to run your own web server that's implemented in
Python using this flask framework so bootstrap was a library for making your
CSS and JavaScript prettier and more interactive flask is a framework or
library for just making your python code more pleasant to
use since you're borrowing features from someone else all right so how can
we go about doing this well if you are to write your very own web application
your own [Link] your own [Link] in Python using flask minimally
you need to have a file called [Link] by convention which is where all your
python code goes and then a folder called templates which is where all of
your templates go and for now your templates are just your HTML files so if
we're going to now start building more interesting
convention so that the server can like automatically install things for you
without you having to do it manually and then static is going to be where
literally your static content goes so if you've got images for your web
application if you've got JavaScript files CSS files by convention that goes in
static these are just conventions like all of this can be changed but this is like
the way to do things so we'll introduce you to the defaults all right so what
does this mean how for instance
could I go about implementing my own web application using python that
somehow spits out a message like Hello World all right well turns out just this
now we'll tease this apart in just a moment but this is the content of a
sample app.py file that apparently uses some Library stuff like familiar
syntax from something import something else we've seen that before with
csvs and other libraries this is somewhat new syntax but it's kind of copy
paste for now this is definitely new syntax and kind of
weird with the at sign here but we'll see this again and again today and it's
just copy paste initially until you understand what it's doing for you but at
least there's some familiar stuff here like [Link] is still going to be with
us but it's going to be up to us when and how to show it to the user so let's
make this more real let me go over to vs code here and let me go ahead and
create a how about we'll do this in hello let me do mkdir hello to make a
new folder called hello and I'm going to
CD into it just to isolate all of these files to the same directory so that we
have different apps today and different folders and now I'm going to do code
of let's do this actually let's do mkdir templates.html ah sorry not
templates.html let me rename that to templates using the mv command
this has nothing to do with web programming this is me making typos so if I
type LS now I've got a folder called templates all right in there let's create a
file called index.html that is going to be
super simple and pretty much copy paste from last week let me hide my
terminal window and let me just very quickly whip up a simple hello
world page using my HTML tag Lang will equal English then inside of this I'm
going to have a head tag inside of this I'm going to have a title tag and I'm
just going to call this thing hello uh I'm going to then have a body and in this
I'm only going to say something simple like hello comma world and just so
this is mobile friendly recall that we touched on these
meta tags so just in case you after class play with your mobile device instead
of your laptop I'll do name equals quote unquote viewport
and content equals and I never remember this I'm literally reading it off of a
cheat sheet initial scale equals 1 width equals device width and this is just
this magical incantation that says to the browser like size things
appropriately for the size of the device it blows up the font sizes a bit all right
so that's what I would have done
last week and I would have served this web page by running HTTP server in
the same directory and boom I would see that HTML but let's now start to
take some control over the user's experience and for now it's going to be
underwhelming it's just going to always say hello world but in a moment
version two is going to say hello David or hello Carter a bit more dynamically
and we'll quickly escalate from there to just more interesting applications as
well culminating with things like cookies and
shopping carts and the like so let me go back into my terminal window and
as promised let me create another file called app.py and this is where now I
need to implement the web server I'm going to run using this flask
framework and for now I'm just going to kind of do some copy paste from uh
what we saw on the slide a moment ago from the flask Library which we've
pre-installed for you I'm going to import a uh function called flask capital F
it's subtle but it's important there and I'm also going
to import a few other things a function called render template and another
variable called request and the only way I know this is from having taught
this before read the documentation followed a tutorial like you wouldn't know
this unless someone told you or you read how to do this this but what this
means is that this Library called flask has three things in it a function called
flask capital f a function called render template and a variable built into it
called request and this is going to be
all the building blocks I need to implement my own web server the
convention in flask when you want to create a web app in Python is you
create a variable by convention called app and then you assign it the return
value of that flask function capital F and pass into it underscore underscore name
underscore underscore which is weird but we have seen this before a few weeks
ago anyone recall when and why we mentioned underscore underscore name underscore
underscore yeah I think was it name or something yeah if we wanted
to check if the name of the file was itself main so that we avoided a situation
where if you're writing your own Library code you don't want your code to be
executed automatically you want to potentially execute the main function
and that was a solution to that problem here for today's purposes this is just
the way you do it underscore underscore name underscore underscore refers to
the current file and so this is just a little trick that says turn this file into a
flask application that's all it
is and for now uh that line suffices all right what do I want to do after that
well now I'm in charge of the web server I need to write the code that
decides based on the browsers request what file or files I'm going to send
from the server to the browser last week http server did all of this for us just
based on the file name but today I'm going to take over control over that
process and the way I do that is as follows I say app.route with weirdly an at
sign in front of it this is known in python as a
decorator and it's a feature of python not of flask that we just didn't introduce
in weeks past but it's a handy trick to do what we're about to do
the route I want to Define is quote unquote slash so that is here is code I
want the server to execute whenever a user visits forward slash the default
page of the website well what code do I want them to execute well I want
them to execute a function and I can therefore Define in Python a function I
can technically call this
my terminal window let me go ahead and do flask run in the same directory
that has app.py and hit enter I'm going to see some cryptic output but
including a URL of my code space and if I open that URL after hovering over
it I'll indeed see hello world as you might hope but let me do this let me go
ahead and rightclick on the page and click view page source which if you
haven't done before shows you all of the HTML for a page however pretty or
messy it is and that's it there's no HTML that I've spit out it's
just quote unquote hello world well if I actually want to spit out a full web
page which is not a big deal here because who cares it's just the text anyway
but if I want to spit out a whole file let me do this I want to return essentially
the contents of index.html which have all of the tags I want the mobile friendly
stuff and all of that well I can't just return index.html but I can return this
render template quote unquote index.html and per the documentation for
flask this render template function will
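A hedged sketch of that step, returning a whole HTML file instead of a bare string. In a real project index.html would simply live in the templates folder next to app.py; the temporary directory and the template_folder argument below are assumptions made only so the example runs on its own.

```python
import tempfile
from pathlib import Path

from flask import Flask, render_template

# Stand-in for the templates folder (assumption: created on the fly here).
templates = Path(tempfile.mkdtemp())
(templates / "index.html").write_text(
    """<!DOCTYPE html>
<html lang="en">
    <head>
        <meta name="viewport" content="initial-scale=1, width=device-width">
        <title>hello</title>
    </head>
    <body>
        hello, world
    </body>
</html>
"""
)

app = Flask(__name__, template_folder=str(templates))


@app.route("/")
def index():
    # Look up index.html in the templates folder and send its contents.
    return render_template("index.html")
```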
the puzzle pieces if you will via which to now store all of our HTML in one
place and presumably CSS JavaScript and so forth but then serve up
whatever we want even though I'm just blindly spitting out index.html so
before we proceed any questions on this which again I claim is like my
manual version of what HTTP server was doing for us automatically last week
but this is how you do it yourself any questions all right well let's make it
more interesting which we could not do with HTTP server and HTML alone
why
don't we go ahead and do this let me visit the same URL and I'm going to
zoom in and your url will differ from my code space but it's going to end
similarly here I'm going to do slash question mark name equals David for
instance or Q equals cats or name equals Carter any key value pair I want
I'm going to append after a slash and a question mark thereby providing user
input to the server albeit in a very user unfriendly way no one's going to
normally do this in their browser enter nothing changes
here it just says hello world but wouldn't it be nice if it says hello David or
equivalently if I zoom in here again and change David to Carter and hit enter
wouldn't it be nice if it says hello Carter instead so we need some dynamism
there and here's where python is going to be our friend if I want to access the
HTTP parameters that the user has provided via the URL be it Q equals cats
or name equals David I can use this special variable I already preemptively
imported earlier and I can do this if
there is an HTTP parameter called name in what I'm going to call
request.args then I'm going to go ahead and create a variable called name and I'm
going to set it equal to request.args bracket name else if there is no quote
unquote name key in this special variable called request.args I'm going to
just assume that the user's name is World by default now what's going on
here well it turns out that flask provides us with this special variable called
request.args and in there are all of the key value pairs
that might have come in via the URL so if you had to guess what type of data
or what data type is request.args that's its name and here is in context line
n might provide a clue in Python what data type might request.args be yeah
uh it's not going to be an array or a list because those are always in every
language we've seen numerically indexed but you're close someone else it's
a dictionary so a dictionary is similar syntactically to a list in Python but
instead of numeric
indices like 012 you can literally use strings like quote unquote name now
that's a bit of a white lie it is a dictionary but it's flask's special fancy version
of a dictionary but the syntax via which you can access it is exactly the same
and I actually this is a typo I didn't mean to say names there I meant to say
name singular but otherwise I think the code is correct this is going to on line
eight check if there's a key called name in request.args and if so it's going to
set it equal
to that value otherwise it's going to default to world I deliberately did not do
this I added this if else and did not do this why what error might happen if I just
blindly grab name exactly if there was nothing at the end of the URL that was
of the form question mark name equals someone then there would be no
name key and this is uh you know a couple weeks back but this would give
you one of those annoying key errors when you get a trace back because you
screwed up because you used a string that doesn't exist that's why I'm just
proactively trying to avoid that situation just like I might have a couple of
weeks ago so even though it's more verbose this is just much more
defensive so that I don't accidentally index into a dictionary where there is
no key but we'll see how we can tighten this up to be not four lines but one
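The difference between the blind lookup and the defensive one can be sketched without Flask at all. Here a plain dict stands in for request.args, and the function names are hypothetical, chosen only for illustration.

```python
def blind_lookup(args):
    """Index without checking; raises KeyError when "name" is missing."""
    return args["name"]


def defensive_lookup(args):
    """Check for the key first and fall back to "world" if it is absent."""
    if "name" in args:
        return args["name"]
    else:
        return "world"


# A URL ending in ?q=cats supplies no "name" key at all.
args = {"q": "cats"}

print(defensive_lookup(args))

try:
    blind_lookup(args)
except KeyError as error:
    print("KeyError:", error)
```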
but I think now I can do this wouldn't it be nice if now in my index.html file
which recall is in my templates folder wouldn't it be nice if I could do the
equivalent in C of like a percent s here
for instance or in Python something like this name well it's close and this is
just because different humans invent different languages invent different uh
Frameworks the Syntax for this in flask is to actually do whoops two curly
braces and then name of the variable inside of it why it's just probably
someone figured what are the odds that a normal person is ever going to use
two curly braces at once versus just one so this is probably decreasing the
probability that people actually want to
Output literal curly braces like this so it's similar in spirit to Python's F strings
it's similar in spirit to C's percent s it's similar in spirit to sql's question marks
same idea slightly different syntax and this then is Jinja so it's not
programming code per se it's just a template and indeed that's why this
folder is called templates it is sort of like a a blueprint for what I want to be
spit out to the user but I've got these placeholders like this variable that I
want to plug into that value now this alone is not enough watch what
happens if I go back to my other browser and I reload the page after
changing up here let's do name equals David again enter nothing outputs
after the hello comma so it seems that the name variable doesn't exist yet
and that's why indeed if I do view page Source you can see what was sent to
the browser something's wrong with my placeholder but I just need to be a
little more explicit as to what I want to send where so it turns out that
the render template function takes not just one argument the name of the
template you want to spit out but it takes after that with commas all of the
placeholders you want to plug in so for instance if you want the placeholder
to be this literally placeholder inside of those curly braces you can then
specify as the second argument to rep uh render template a placeholder
named argument equals whatever the name is so name is the variable in the
lines above placeholder is the name of my literal placeholder in the curly
braces and so
now if I go back to my browser and reload this with still quote unquote with
still question mark name equals David in the URL now I indeed see hello
comma David and if I zoom in here and let me move over here let me type in
Carter and hit enter now I see Hello Carter instead now this is a little
unnecessary to explicitly call the placeholder placeholder especially if you
want to have two or three of them so you can actually call this anything you
want and I'm going to change it back to name which is a little more
straightforward the only weird thing here is that now you'll see that you're
writing code like this and this is correct and this is the norm it just looks
weird but the thing on the left of the equal sign is the placeholder you're
using in the template the thing on the right can be any value you want
including a variable so even though I'm naming them exactly the same which
looks stupid admittedly like this is what people tend to do just because it's
uh simpler than introducing another word
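Putting those pieces together, a sketch of the route with a placeholder might look like this. One assumption to flag: the template is an inline string passed to Flask's render_template_string so that the example is self-contained, whereas the lecture keeps it in templates/index.html and calls render_template.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Stand-in for the body of templates/index.html (assumption: inline string).
INDEX = "hello, {{ name }}"


@app.route("/")
def index():
    # Defensive lookup of the ?name=... parameter from the URL.
    if "name" in request.args:
        name = request.args["name"]
    else:
        name = "world"
    # Left of the equals sign: the placeholder used in the template.
    # Right of the equals sign: the Python variable defined above.
    return render_template_string(INDEX, name=name)
```

Visiting /?name=David should then produce hello, David, and visiting / alone should fall back to hello, world.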
explicit default value so if you don't want none to be on the screen like hello
comma blank or I mean that would be weird too you can just put in a default
value per the documentation of this function like world so now we've gone
from four lines to just one so arguably it's better designed and if I go back to
the browser now still with Carter in the URL and hit reload same thing
happens but we notice this suppose I uh get rid of the name parameter
altogether and hit enter now it goes to the default instead
world so it's just a little better a little better designed than doing it the other
way instead all right how about we take things up one more Notch and how
about we introduce multiple routes and actually introduce perhaps a form to
the mix because again no normal person is going to like visit a URL and add
a slash and a question mark and their name like that's not how browsers
work uh well that's how browsers work that's not how humans interact with
browsers you and I use a form to quickly instead so
now things can get a little more interesting when making our own web
application cuz maybe we could do something like this let me go and zoom
out again let me go back to my code here and let me move this around and
focus now on the index.html file instead of just this placeholder why don't we
go ahead and give ourselves a form like we've played with a little bit in the
past be it for Google or something else and let's do this uh form and inside of
this form let's have an input and the
name of this input will be quote unquote name so that too is confusing but
inputs have name attributes but this is a person's name so I'm saying name
equals name here so just a messy world of semantics and let me go ahead
and make this a text box by default and then let me give myself a button
whose default type will be submit and the name of this button will be greet
for instance so let's see what happens here but let me change app.py to just
be the original simpler I'm not passing in any
placeholders now and I'm going to even get rid of this I'm just going to
rewind to the first version of this for Simplicity let's now change the url to get
rid of Carter and myself so we just go to slash and hit enter and now we have
a super simple form again all right this is not super userfriendly but there's
some nice enhancements we can make for instance like we can uh for
instance turn off autocomplete especially if I want to type David and Carter
manually and I don't want it
and I'm sort of ready to go but this form hasn't been wired up to go
anywhere yet and so let's do this let's for instance say that the action of this
form is not going to be something like [Link] which we did last time
with for cats I am now going to be both the front end and the back end of
this website the front end is what the human sees the web page the graphics
the forms the back end is the stuff the human typically doesn't see the
python code the SQL code the server itself but now
I'm in control of both sides of the experience the HTML and also the routes so
let's just propose that we invent our own route and instead of calling it /search
like Google does let's call it /greet and let me specify that the
method this form will use which is technically the default will be get and
confusingly it is lowercase get even though in the envelope we keep talking
about virtually it's actually capitals again left hand wasn't talking to right
hand when these things were decided all
right so all I've done is create a web form that's going to submit whatever
the text box value is to a route called /greet by default because there's
no HTTP or HTTPS or no domain name /greet is going to be assumed to be
not at [Link] but whatever my own server's URL is so whatever my code
space's URL is that's going to be the implicit prefix this /greet is just the
route so now let's go back to VS Code's app.py file how do I now Stitch this
together well I think we're good to go with index.html
if index. html's purpose in life is just to spit out this form we're done with one
of my routes but if I want to have a second route greet that actually spits out
some greeting to the user well let's prepare that template too let me go
ahead and highlight all of this HTML let me go back into my terminal window
and into my hello directory and then into my templates directory and let me
create another template called greet.html whose purpose in life will not be
to show a form but to greet the user with hello so
and so so in this file I'm going to paste all that same HTML but I'm going to
get rid of the form and essentially revert to our previous version hello
comma and then using the Jinja syntax name so one template index.html
is for the form the second template now is for the greeting of hello comma so
and so but otherwise these files notice are almost the same except one has
the form one has just the hello so now let's finish this up in app.py let
me go down here after a couple of blank
lines stylistically let me do app.route quote unquote /greet but I could
call this route anything I want I'm just using a a reasonable verb then let's
define another function I could call the function anything I want X Y or Z I'm
going to call it more reasonably greet no arguments and then now is the
code where I want to render the template so I do return render template
greet.html but I need to do one more thing what else do I want to do if I
want greet.html to have access to the
human's name just to recap I think we solved this already but I deleted it but
what do I have to add back yeah yeah so I got to pass in the placeholder
somehow so I can do this a couple of different ways I'm going to keep it a
little more elegant this time I'm just going to put my name uh argument
there and I'm going to set it equal to request.args.get quote unquote name
comma world before I used a separate variable but I only used it in one place
so that's not strictly necessary so this is fine too but if
this gets a little overwhelming notice that I can alternatively do this I can
create an actual variable called name and then I can pass in an argument
called name with a value that is that variable but again what's really the
point here it was kind of prettier all on one line so these are the exact same
things I'm just trying to tighten things up further here all right so what just
happened if I go back to my form this is still index.html if I reload it nothing
has changed if I type in my name to this
form notice again the URL I'm currently at this is Chrome hiding things it's
technically slash by default even though many browsers are just hiding
unnecessary uh characters these days but Watch What Happens now if I
scroll over here and I click greet on this new form notice my URL my route
changed to /greet question mark name equals David and the body of the
page at top left says hello comma David so this is exactly how [Link]
works and it's how we implemented search.html last
time but instead of submitting the form to Google via the form I'm submitting
it to myself my very own route so I'm implementing my own backend for this
same front end all right any questions just yet much less interesting than
Google certainly but we kind of have all of the wiring now any questions no
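The wiring just described, one route that shows a form and a second route that greets, can be sketched as follows. The two templates are inline strings here (an assumption, to keep the example in one file); in the lecture they are templates/index.html and templates/greet.html, rendered with render_template.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Stand-in for templates/index.html: the form submits to /greet via GET.
FORM = """
<form action="/greet" method="get">
    <input autocomplete="off" name="name" type="text">
    <button type="submit">Greet</button>
</form>
"""

# Stand-in for templates/greet.html.
GREETING = "hello, {{ name }}"


@app.route("/")
def index():
    return render_template_string(FORM)


@app.route("/greet")
def greet():
    # .get returns the default "world" when no name key is in the URL.
    return render_template_string(GREETING, name=request.args.get("name", "world"))
```

Because the form uses GET, the browser appends ?name=... to /greet, which is why the name shows up in the address bar.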
all right so what can we do to further uh tighten this up and adhere to some
conventions well let me propose that in this version we solve one problem
and even if you've never done this sort of
thing before I dare say we have enough weeks of CS50 where if I show you
index.html again and greet.html again odds are to someone's mind there's an
opportunity for improvement why is this web app super simple though it is
arguably poorly designed at the moment and the answer lies somewhere in
these two templates index.html and greet.html probably did
necessarily for your homepage why because when you have HTML only
maybe CSS and even JavaScript that's all you can do is copy paste copy
paste and just make sure that you have the same structure maybe you have
the same CSS file the same Javascript file the same third party libraries but it
makes it very very annoying as you might have realized already to just
make a change that affects everything so wouldn't it be nice to like factor out
all of this and all of this and just let the body change so here too is
something that flask and really other equivalent Frameworks let us do it
allows us to create what we're going to call
other file so instead of just using curly braces two of them left and right I
have to use slightly different syntax to say I want a whole block of HTML here
from some other file and the way to do this even though the syntax is a little
non-obvious is you use open curly brace percent sign block then you can call
the next word anything you want it just has to be a special type of
placeholder for an actual file not for just a variable I'm going to call it body
only because I'm in the body so I want a
placeholder to be the entire body and then outside of this you then say in
one word no space end block so it looks kind of stupid honestly and why do
we have yet more ugly syntax again just different software developers in the
world are all choosing their own Syntax for their own libraries so they all kind
of look different but are all kind of similar in spirit and you just get used to
seeing the different syntax this now is not nearly as pretty as the pair of
curly braces for variables but this is
how I can say plug the contents of an entire file Here and Now what does this
let me do I can now go back into my index.html file which at the moment still
looks like this but almost all of this is copy paste the only lines that are
interesting and different are these four lines here in the body so what I can
actually do now is I'm going to highlight that and cut it and then I'm going to
highlight everything else and just delete the entire file and I'm going to use
some of that same syntax
and say curly brace percent sign extends quote unquote layout. HTML and
then I close my thought with a percent sign and close curly brace so this
syntax as you might just be inferring is now saying please extend whatever
layout. HTML looks like that's the original blueprint the mold out of which I
want to make this web page and now here the syntax is a little weird too but
similar at least from before I can now say the block the body block that I
want you to plug into that layout is going to be everything
between these two tags which we already saw earlier but in layout. HTML
they're sort of giving a placeholder in index.html this is what I'm going to
plug in to those other placeholders as well so I'm just going to give myself
some extra white space I'm going to paste the HTML that was there if I want
to make clear what's going on I can indent it although this has no
functional impact but it just makes clear that just like in HTML you can open
a Jinja tag and close it but in
Jinja here we have this here hey uh Hey python here comes the body of
this page hey python that's it for the body of this page and all of this stuff
should be plugged into this main parent layout if you will so super ugly
admittedly but now at least things get way less redundant because I'm going
to do the exact same thing over here in greet.html it looks like this but now
I'm going to do this extends layout. HTML also just as before uh the body the
uh the body that I want to plug in is going to be
everything inside of these tags here and this body is just going to be hello
comma name in curly braces like that so again ugly syntax got really ugly
fast but it's really just following these patterns now and we have two types of
placeholders two curly braces for variables and now this kind of syntax with
the percent signs and the single curly braces for like contents of actual files
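To see both kinds of placeholder working together, here is a small sketch that uses the Jinja library directly. The DictLoader holding the templates in memory is an assumption made for a self-contained example; in a Flask app, render_template would load layout.html and greet.html from the templates folder instead.

```python
from jinja2 import DictLoader, Environment

templates = {
    # The shared blueprint: everything except the body is written once.
    "layout.html": """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        {% block body %}{% endblock %}
    </body>
</html>""",
    # A page that plugs its own content into the layout's body block.
    "greet.html": """{% extends "layout.html" %}

{% block body %}
    hello, {{ name }}
{% endblock %}""",
}

env = Environment(loader=DictLoader(templates))

# Render the child template; Jinja fills the layout's block with its body.
html = env.get_template("greet.html").render(name="David")
print(html)
```

The printed page contains the full layout with the greeting plugged into the body, which is what view page source shows in the browser.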
and so now in this world or in the world of a homepage if you were using
flask and python to make your personal
homepage with all of those various Pages you would probably design one
main layout with all of your pretty logos and colors and fonts and like what
you want the site to look like and then each of your smaller Pages would now
be distilled into just these smaller fragments and whether you're using
python or Java or JavaScript or other languages too all different
programming languages have popular Frameworks that do things like this
the idea is the same across all of them all right let's see
if it works let's go back into the browser let me go back to my slash route
there's that same form let me type in David and click greet and
indeed I see hello comma David I see that greet was automatically added to
the URL by the browser when I submitted the form followed by the key value
Pairs and if I view the page source as I did earlier you'll see that you have the
entirety of that layout with hello David plugged in meanwhile if I go back to
the form and view this page Source you'll
see the exact same layout but with the form tag plugged in and here's where
you can be a little less uh nitpicky with styling okay yes this isn't technically
indented inside of the body but it was relative to the original file so at this
point in the game you don't need to worry about your outputted HTML looking
super pretty you want your source code that the humans see to be pretty not the
browser this is not a stylistic concern okay questions on these capabilities
then of flask or problems that we've just solved
and why yeah uh okay so if the files in question are in different folders for
instance if I go back into my uh index page which has the form um the routes
here are entirely dependent on what is in app.py there's no notion of a
folder when it comes to implementing a web application anymore they are
more generically routes however and we've not done this yet you can put
your static content your images your video files your CSS files in a folder
called Static and there can be subfolders therein and that would affect
what you use as your Source attributes for images or your Source uh tags for
video or any of those kinds of assets and we'll see that eventually in the
home in the uh the problem set next other questions on what we've just
done here yeah good question how do I how did I ensure that the web app
starts on the form and then goes to the hello page so whatever you decide
your default index route is like the implicit slash that is what is going to be
pulled up when a user visits the domain name where your
sense might this be bad design or in what kinds of web apps might you not
want the name to show up in the URL like that because this is what Google
does this is what my app does yeah yeah so if I'm logging in with a username
and password I I could imagine that they show up in the URL after the
question mark where username equals malan and password equals 1 2 3 4 5
but then all my you know like nosy siblings need to do is go through
my browser history and boom like it's right there for them to copy paste so
that
Source method is now post so let me go ahead and type in David now and
click greet and before we saw hello David but now I get method not allowed
and this is somewhat subtle but in the title of the tab notice that it's a 405
error which is not familiar probably almost all of us have seen 404 file not
found turns out 405 a little more Arcane is the method the HTTP verb is not
allowed why because by default my app.py only currently supports get by
default how do I support post well I just need a little
bit more syntax so let me go back into vs code here let me go into app.py
now and after changing the form I just need to inform flask that you know
what the method I want this greet route to use should not be the default
which is only get I want it to use these methods and it takes a second
argument called Methods the value of which is a list the default of which is
quote unquote get so that's the default this has not made any changes but if
I want to support post instead I can explicitly pass a list
with one string in it post instead and now what does this mean we didn't talk
about this in any detail last week but inside of this virtual envelope typically
is that line like get slash search with Q equals cats after the question
mark if you want to hide that kind of information for privacy sake or because
you want to upload like an image which just doesn't make sense to put in the
URL essentially the part of the story would be well the computer looks
deeper inside of that virtual envelope and
anything submitted via post goes below the HTTP headers like deeper in that
envelope so they're still there they're just not obviously visible uh for prying
eyes in the user's own browser so just by making that change in the HTML
telling the browser to submit the data via post and changing app.py to tell
the route to expect the data via post I can now go back to my other tab let
me go back to the original page let me reload just so I've got the latest HTML
and indeed view page Source it's still yep
it's still post but now when I type in David and click greet now it works but
notice the Privacy implication I'm at the /greet route but where's
my name it's not actually there it still went to the server but it's not in your
autocomplete or your history now for privacy sake questions now on post
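A sketch of a POST-only /greet route follows. Two things here go beyond what the passage spells out and should be read as assumptions: the template is an inline string rather than greet.html, and the name is read from request.form, which is Flask's counterpart to request.args for form fields sent in the body of the request rather than in the URL.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)


# Only POST is listed, so a plain GET to /greet gets a 405 response.
@app.route("/greet", methods=["POST"])
def greet():
    # Form fields submitted via POST arrive in request.form, not request.args.
    name = request.form.get("name", "world")
    return render_template_string("hello, {{ name }}", name=name)
```

The matching form would use method="post" with action="/greet", so the name travels inside the request instead of after a question mark in the URL.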
yeah oh No Just scratching all right can you the programmer see this well let
me show you a couple of other features of Chrome's uh Chrome and Safari
and other browsers as well I keep
going to view page source which just shows you like a read-only version of your
HTML but recall that last time I actually right-clicked and went to inspect or
viewed developer tools and this brings up a much fancier version of the
developer tools and under elements here you see everything and it's nice
and pretty printed it's hierarchical it collapses things into these clickable
triangles but it's the exact same thing it's just more interactive but notice
what I can do today is this if I go to the network tab
here and let me zoom out a little bit let me go ahead and reload the form
here and type in David again and click greet notice now in the network tab of
Chrome's developer tools I see a few things as we saw before one I see that
the request method is post two I see that the server automatically without
me writing any code for this returns 200 when it's successful but I can scroll
down down down down down and you'll see that eventually after all these
cookies more on those later if I click on
payload the second tab next to headers you can see as the developer what
was actually sent to the server so indeed this is going to be super useful like
when doing problem set 9 maybe your final projects if you want to see
what's going from browser to server you have complete control over all of
that information even if you're using HTTPS because your browser and you the
developer can certainly see all of this so again these developer tools even
though there's a lot of tabs and buttons
you probably won't need anytime soon some of them like elements and
network and with JavaScript the console are going to be super useful to start to
get familiar with all right any questions now on this implication of post
anything at all no okay how about one final hello example that ties a few of
these things together how about now we try to tighten things up further only
in anticipation of something like problem set 9 or really more complicated
web apps where you might have not two but 20 or maybe
even more different routes it might be ideal to just minimize how
many total routes we have so we don't get a little too overwhelmed and I
dare say that these two routes are so short maybe I can combine them into
one and maybe I can keep the user at what seems to be the same URL but
just to kind of tidy things up so let me propose that we do this instead let me
get rid of my greet route and let me go into my form in index.html and let
me go ahead and just have the action of this form be slash
so I want the form to be visible at slash the index of the site but I also want
the form to submit to itself if only because I don't want to introduce another
route like /greet which eventually indeed will be compelling so you don't
have one route for everything you want your website to do so technically this
is the default too and if I omit action the exact same thing would happen as
well but let me rewind and let me now go into app.py to see how we can
make this happen well if I want
my one and now only route to support both methods I can say methods
equals and then a list with both get and post in any order but I'll keep them
alphabetical like this this now tells python hey this route should handle both
get and post requests at the same place let's now go into this function I kind
of want to say the equivalent of this if get then I want to return the form else
if POST I want to then return render template of greet.html with the
user's name but this is not yet complete
code but I think I can do this I'm going to go ahead and say the following I'm
going to go ahead and say if request.method equals equals GET then indeed
return index.html elif request.method equals equals POST then go ahead and
return greet.html this isn't quite enough though because I still want to pass
in that placeholder so let me again add back name equals request.args
dot get quote unquote name and then a default value of world what does this
Now do for me well let me go back to my other tab
here let me close the developer tools let me go back to the form here let me
reload to make sure I have the latest let me view page Source just to make
sure I have the latest and yep I have the latest because it still says post but
it now says slash and let's see what happens now if I type in my name David
previously this submitted via POST so I didn't see any name or value thereof
in the URL but I did end up at /greet but if the action is now slash and I
click greet notice that it still kind of
works I see hello comma World although that didn't quite work so we'll come
back to that issue in a moment but notice the URL ends in just slash and
again Chrome is hiding the slash because that's all that's there but it does
not end in name equals David in this case or name equals world now
notice this too if I reload I'm going to get this warning do you want to confirm
form resubmission the page you're looking for used information that you
entered returning to that page might cause any
action you took to be repeated do you want to continue you might have seen
this on websites you've actually visited where you hit reload and you're
prompted wait a minute do you want to do that odds are you've been
prompted to reload explicitly because why whatever you just did was post
instead of get and by convention besides post being used for privacy to like
hide your username your password your credit card number or the like
besides being used to upload bigger files like images or videos post is also
is a bug though here it says hello comma World instead of hello comma
David and it actually would have said the same a moment ago and I just
didn't retest the code and reveal as much to you or if I did I didn't even
notice it said hello world instead of hello David it turns out that request.args
is only used for GET when using GET request.args is a dictionary that contains
all of your key value pairs but somewhat confusingly when using POST with
Flask you have to go into request.form I have no idea why
these are not sort of more obvious opposites like request.get or request.form
and sorry request.get and request.post would be sort of sensible names in
this case though we have request.args for GET and request.form for POST all
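Here is a minimal sketch of the single-route idea, assuming Flask is installed. To keep it in one file, the two template strings FORM and GREETING are stand-ins (an assumption) for index.html and greet.html, and Flask's test client plays the part of the browser.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for index.html and greet.html so the sketch is one file.
FORM = (
    '<form action="/" method="post">'
    '<input autocomplete="off" autofocus name="name" type="text">'
    '<button type="submit">Greet</button>'
    "</form>"
)
GREETING = "hello, {{ name }}"


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Data submitted via POST lives in request.form, not request.args.
        name = request.form.get("name", "world")
        return render_template_string(GREETING, name=name)
    # Any GET request just shows the form.
    return render_template_string(FORM)


# Flask's test client stands in for the browser here.
client = app.test_client()
get_response = client.get("/")
post_response = client.post("/", data={"name": "David"})

print(get_response.status_code)
print(post_response.get_data(as_text=True))
```

Swapping request.form for request.args in the POST branch would reproduce the bug from the demo: the lookup finds nothing in the URL and falls back to the default of world.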
right that's an easy fix though if I go back to vs code here let's change
request.args to request.form let's go back to my other tab let me just reload
and you know what I'm going to say okay continue to resubmit the same
form because the form was okay it was my
python code that was buggy hitting enter now it's accessing David okay but
watch this again if I hit reload command r or control r i get the same warning
are you sure you want to submit the form yes if I do it manually with the
reload icon I get the same warning as before but if I want to manually induce
a get request well that's fine don't hit reload and send the same request
instead go up to your URL and just put the cursor up there and hit enter and
now notice same URL is a get by default so anytime you
and I have typed URLs into browsers get is always the default only when you
click on a button typically that the programmer has configured to use post
are you actually adding things to your shopping cart or the like all right so we
are back and if I go way back in time myself like this is actually like the first
web application I made back in 1997 I believe so at the time this
would have been what my sophomore or so year I had taken CS50 I took a
follow-on class called CS51 which is a different type of programming and
then I
pretty much taught myself a language called Perl which is somewhat less
popular nowadays but it's another language like python like Java like
JavaScript like others that can be used to make web-based applications and
the web was very young at the time and the process via which students my
classmates could register for the first-year intramural sports program AKA
frosh IMs was to grab a piece of paper and like write your name and email
address on it and walk it across the yard to Wigglesworth I believe
where the Proctor
lived and you'd Slide the piece of paper under the door and like that was how
we submitted forms in my day um so this was an opportunity even back in
1997 to like move things online and the website went on to live on until I
think like 2007 I found this online and then it's become something else since
um but this was a website via which people could register for sports and
people could log in the scores for various games and whatnot and so
underneath the hood I didn't even know anything anything about
databases at the time it was just like CSV files that I was storing the data in
but there were HTML forms and there was with Perl the language at the
time the way to do the exact kind of stuff that we've just been doing already
with flask and so what I thought we'd do is Implement a slightly less ugly
version of this repeating graphical backgrounds were in vogue in like
1997 as you can see here but these were the aesthetics of
the day including the so-called blink tag so
let's at least focus on the functionality of this website and not so much the
Aesthetics and see if we can't Implement some of the plumbing for actually
solving like a real world representative problem be it for freshman intramural sports
or something else like it where you're getting data from users and processing
it somehow so let me go over here to VS Code let me create a new directory
called froshims just so we can keep all of this code in its own directory let me
cd into froshims let me proactively make another
directory called templates in which our templates our HTML files do need to
live and eventually I'm going to go ahead and create two files minimally
app.py and index.html so let's do the first of those app.py will live in my froshims
directory and I'm just going to recreate something very simple like we
have previously so from flask in lowercase import Flask capitalized render
template and also request so same first line as before let me then give myself
a variable called app set it equal to
calling the Flask function capital F with underscore underscore name underscore
underscore and then let me give myself a route for slash as before with an index
function though again I could call that anything I want and just for now let's
return render template of quote unquote index.html as though that exists so
this is not really a web application as much as it is at the moment just a
recreation of http-server for one file let's now in another tab create a
templates file called index.html and I'm going to save myself
a few keystrokes let me copy paste from earlier almost all of the layout
from before I've changed the title in advance to froshims instead of hello but this
is essentially the same template and for now though because I'm in
index.html I'm not going to use extends or any of that fancy block stuff yet
I'm just going to go ahead and create a relatively simple form via which back
in the day my classmates could have registered for intramural sports so let's go
ahead here and I'll propose that we
do this um in this page we'll have a form the action of which will be a route
called /register though I could call that anything I want it'll be somewhat
private so I'm going to use POST instead of GET just so that people don't
accidentally maybe register twice by hitting reload without warning
inside of this form let's go ahead and give them an input where
autocomplete will be off as always for demonstration's sake autofocus so the
cursor goes there initially the name of this field will be
unless you dabbled further on with forms on your own but I can create a
select menu otherwise known as a drop-down menu in HTML inside of which
are a whole bunch of options and each option typically follows this Paradigm
the value of the option and then the actual text that the human sees so the
value of these options will be how about we do uh basketball as one and I
want the human to see literally the same thing though just like with a link in
HTML they could be different but I'm going to keep them
the same another option will be uh let's say soccer and whoops let me fix my
quotes and this human will see the exact same thing though it could say
something else and then lastly the value will be quote unquote Ultimate
Frisbee and the humans will see the same thing there Ultimate Frisbee all right
so this is going to create as we'll soon see just a drop-down menu with three
separate options if I want the students to be able to submit this now let me
give them a button the type of which is submit and this button
will have like the word register on it so I think we're pretty much good to go like
this is all just HTML no python no flask per se except for the rendering of the
same template so let me go into my terminal window let me do flask run
inside of this directory because I need to serve this app instead I'm going to
see some ugly output including my own URL and if I hover over that and then
open that URL I should now see a more interesting form it's got not only a
field for their name but also this
drop-down menu with all three sports now this isn't maybe the best user
experience thus far because I feel like I'm biasing people toward registering for
basketball maybe because it's checked by default I mean a lot of forms
nowadays have like a blank placeholder for the form so this is just an
aesthetic thing but I can do this let me go back to the same form and let me
give myself just a a blank option at the top that in fact I'm going to disable so
you technically can't select it proactively but I am
alone and if I click on this you'll see that sport is grayed out and therefore not
manually selectable but I can select any of these other three still all right
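A sketch of the form described so far, served from a one-file Flask app. The template string INDEX is a stand-in (an assumption) for templates/index.html, and the placeholder text on the input is a guess rather than something the transcript preserves. Note that this version already gives the select menu a name attribute, which the version typed live omits at this point in the demo.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Stand-in for templates/index.html. The first option is disabled and selected,
# so it shows up as a grayed-out placeholder that can't be chosen manually.
INDEX = """
<form action="/register" method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <select name="sport">
        <option disabled selected value="">Sport</option>
        <option value="Basketball">Basketball</option>
        <option value="Soccer">Soccer</option>
        <option value="Ultimate Frisbee">Ultimate Frisbee</option>
    </select>
    <button type="submit">Register</button>
</form>
"""


@app.route("/")
def index():
    return render_template_string(INDEX)


# Fetch the page the way a browser would, via Flask's test client.
html = app.test_client().get("/").get_data(as_text=True)
print(html)
```

Each option's value is what gets sent to the server, while the text between the tags is what the human sees; here they are deliberately the same.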
well unfortunately if I type in David and I try registering for instance for
soccer and click register I do end up at /register and there's no question mark or
name or sport so it's probably indeed POST instead of GET those are hints but
not found notice the tab here very succinctly says 404 not found
well why is that just to be clear why did /register
give me a 404 what's the logic here perhaps just state the obvious
or it doesn't exist right we haven't done that step yet all right so something
as simple as that and so I actually sort of belabor that point because as
you're learning like a lot of these conventions and some of this new syntax
like honestly you're just going to make stupid mistakes something's not
going to work but again go back to First principles why is it not found all
right /register should be a template
maybe called register.html oh I forgot my app.route so that should be the
kind of thinking as you try to diagnose these problems moving forward all
right so let me go into app.py and let me give myself a second route here so
app.route quote unquote /route then let me define a function called
anything I want but I'm going to call it oh sorry not /route /register let
me call the function just to be consistent register but I could call that
anything I want and just for now let's not do anything
too interesting let's just return the rendering of a template called
success.html let's just pretend for now that registration is successful no
matter who you are or what you do now I need that template and I only have
index.html at this point so let me actually now do my best practices let me
copy all of that let me in a separate terminal window let me do code let
me go into my froshims directory and let me create a new template called
layout.html just like before let me paste all that same code
let me delete the form and just put in that big placeholder so block body
and then endblock is all I did earlier this is just kind of boilerplate now
convention everything else I'm going to leave the same but if I wanted to
make it prettier I could add my CSS up top if I wanted to add like this crazy
repeating background I could probably do that up top too so I could make
every page look as ugly as it did back in my day but we'll focus just today on
the text all right so now that I have
layout.html let me clean up index.html I don't need all this redundancy I
don't need all of these tags at the top instead recall I think I just need
extends quote unquote layout.html with the appropriate percent signs and
curly braces I then have the appropriate block body though I could call body
anything I want but I'm going to stick with my convention earlier and I'm
going to delete the tags down here that I no longer need why because if I go
into layout.html I already have all my open
tags all my close tags the only stuff I want in index.html is going to be that
which belongs in the body so endblock down here and just to be pedantic let
me go ahead and highlight all that hit shift Tab and that will like unindent it
just to line things up just to be tidy all right so better even though it looks a
little cryptic now but now I've laid the foundation for making a third page a
fourth page that don't have all of that same copy paste all right so now let's
go back into app.py success.html is
where I left off so okay let me open my terminal window let me code up a
template called success.html whose purpose in life is literally just going to
be like to say you are registered just so that we see some informative
message on the screen so this part I do still need extends layout.html so
there's a little bit of copy paste still which is a little ugly but so be it block
body for this template and I'm just going to say you are registered
exclamation point all right and then endblock so super simple it's just an
informative message claiming that the student is registered all right
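The layout and success templates can be sketched without any files on disk by handing Jinja an in-memory dictionary of templates. This uses the jinja2 package directly (the templating library Flask relies on); the exact markup in the layout is an assumption, since only its title and body block are described.

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for templates/layout.html and templates/success.html.
templates = {
    "layout.html": (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head><title>froshims</title></head>\n"
        "<body>{% block body %}{% endblock %}</body>\n"
        "</html>"
    ),
    "success.html": (
        '{% extends "layout.html" %}\n'
        "{% block body %}You are registered!{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))

# Rendering the child template fills the parent's body block.
page = env.get_template("success.html").render()
print(page)
```

The child template contributes only what goes inside the body block; every other tag comes from the layout, so a third or fourth page costs just an extends line and a block.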
let's go back to the original form which is this let me reload to make sure my
HTML has reloaded type in David I'm going to register again for soccer and
click register and oh interesting method not allowed so I'm not getting a 404
anymore I'm getting 405 at /register what's the deduction here how did I
screw up this time 405 is progress yeah the placeholder so it's not the
placeholder I think that is okay this is now
about the underlying HTTP stuff the method was not allowed
say again so a GET versus POST thing too so by default all of these routes in
Flask just by default assume GET because it's safe it doesn't allow you to send
information to the server in quite the same way but if I do want to support
POST recall that we changed this to be methods equals and then a list with
quote unquote POST in it so I just need to enable support for that method
that is that HTTP verb all right let's go back to the form reload just to
make sure I haven't screwed up type in my name David select soccer from
the dropdown click register and now I'm not only at /register in the URL it
claims that I am indeed registered now of course I'm not I've done nothing
interesting there's no database there's no CSV file we'll get to that in a bit
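The three outcomes seen so far (404, then 405, then success) can be reproduced in a few lines with Flask's test client. This is a sketch: the /signup path is a hypothetical URL chosen only because no route exists for it, and the route returns a plain string rather than rendering success.html.

```python
from flask import Flask

app = Flask(__name__)


# Without methods=["POST"], Flask would accept only GET on this route.
@app.route("/register", methods=["POST"])
def register():
    return "You are registered!"


client = app.test_client()

missing = client.post("/signup")        # no such route exists
wrong_method = client.get("/register")  # route exists but only allows POST
ok = client.post("/register", data={"name": "David", "sport": "Soccer"})

print(missing.status_code, wrong_method.status_code, ok.status_code)
```

A 404 means the server has nothing at that path at all, whereas a 405 means the path exists but not for the HTTP verb that was used.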
but at least I now have the plumbing in place to do something Dynamic
based on that sport all right well how can I now improve upon this how about
we go ahead and implement store the actual
sorry the key is going to be the student's name and the value is going to be
whatever sport they registered for so David and soccer and Carter and
basketball and so it kind of makes sense for like a two column dictionary so
to speak as we often depict it on screen so how can I use this dictionary well
let me go ahead and do this down here under /register let me go ahead
and initially do this how about we get the user's name from request.form
dot get and set it equal to whatever the value of
name is and I'm not going to give a default value now because I don't want
to call the student world or something strange like that I'm just going to
assume for now that it's there let's then create another variable called sport
and do request.form dot get quote unquote sport to get the student's sport
and then let's go ahead and do this in the registrants dictionary let's index
into it using the student's name and let's set it equal to whatever the sport is
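A sketch of the route as described, storing each registration in a dictionary that lives in the server's memory. It uses the all-caps name REGISTRANTS that the demo settles on a little later, and returns a plain string as a stand-in for rendering success.html.

```python
from flask import Flask, request

app = Flask(__name__)

# Global dictionary: student's name -> sport. It lives only in memory,
# so it is lost whenever the server stops.
REGISTRANTS = {}


@app.route("/register", methods=["POST"])
def register():
    name = request.form.get("name")    # None if the field is missing
    sport = request.form.get("sport")
    REGISTRANTS[name] = sport
    return "You are registered!"


client = app.test_client()
client.post("/register", data={"name": "David", "sport": "Soccer"})
client.post("/register", data={"name": "Carter", "sport": "Basketball"})

print(REGISTRANTS)
```

Because names are the keys, a second registration under the same name would overwrite the first rather than add a row.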
so
I've got these variables just to keep my code tidy and I'm now putting a key
value pair into that dictionary all right well what do I want to now
do and I'll go ahead and say success.html sure let's go ahead and do
that but now I think success.html means that so let me go back to the form
reload let me type in David and soccer register okay let me go back and say
Carter and basketball registered okay now let's see what I want to do next
how about I go into let me give myself another route
and let's play around here so app.route let's give myself another third route
called /registrants whose purpose in life is just to show me who all of those
registrants are just like you would expect from a website like this and then
let me define a function called registrants or anything else and then let me
return the rendering of a template called registrants.html and let me pass in
this is kind of neat I can do registrants equals registrants which again looks
weird but what am I doing I'm presuming
this An Li uh and then like the students's name and then an L uh maybe Li
yeah like that and then maybe like sport something like this but I didn't pass
in a name I didn't pass in a sport I passed in the entire dictionary of
registrants now in Python if we were just doing something at the black and
white terminal window and doing a command line program you know I'd
probably have some kind of for loop in Python Jinja does allow you to do
this so a templating language tends to come with very lightweight
mechanisms
for doing placeholders doing simple loops doing simple conditions so Python
like syntax and it's almost identical so watch what I can do inside of this
unordered list let me not start to manually output a single li let me use this
syntax the same Jinja syntax that I used for block so curly brace percent
sign and I'm going to say this for name in registrants so this is just like Python
syntax for iterating over a dictionary and now this is going to look stupid but
the opposite of that is endfor
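A minimal sketch of that for and endfor pairing, rendered with the jinja2 package directly so it runs without a template file. The sample dictionary is made up for illustration.

```python
from jinja2 import Template

# The for ... endfor pair wraps whatever should repeat once per key.
template = Template(
    "<ul>\n"
    "{% for name in registrants %}"
    "    <li>{{ name }}</li>\n"
    "{% endfor %}"
    "</ul>"
)

html = template.render(registrants={"David": "Soccer", "Carter": "Basketball"})
print(html)
```

As in Python, iterating over a dictionary yields its keys, so this prints one list item per student's name and nothing about their sports.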
so in HTML you use the slash in Jinja you literally use the word end no
space and then the name of the keyword so endfor is how you close this but
this is where templating gets really cool you can now do Li and in here I can
do something like that student's name and that's it I'm going to leave it like
that and what I'm doing here is using really a template as templates are
intended I've got like the basic building blocks of what I want this output to
look like but thanks to this little for Loop here
thanks to Jinja syntax the curly brace and the percent sign I'm going to
iterate over every key in the dictionary printing out name name name name and so if
I've got two kids registered now I'm going to see two li's David and Carter
respectively so let's see let me go back to my froshims tab here and I don't
have a link yet so I got to do this manually like a developer would let me
go to slash registrants and I'll zoom out and hit enter and you'll see what
you'll probably see too when making mistakes
for the first time in this world so where is the error message unfortunately
internal server error is not all that useful but we do tell you to see the
terminal window so if I go to the terminal window I haven't been paying
attention to this for quite some time and in fact I have two terminal windows
open so that I can still use commands at the prompt but if I go back to my
first terminal window AKA bash there you'll see in your terminal window
When developing web applications like all of
the mistakes you made in the terminal itself this is one of those python
tracebacks that's related to me screwing up here now let me go ahead here
and let's see TypeError function is not iterable and in the block for name
function is not iterable all right so what mistake did I make well this is what
happens when I don't follow my notes and make changes on the fly so I have
this variable on line five called registrants in all lowercase but what did I then
do on the fly here in line 22 I defined a function called
registrants so like newbie mistake like I shouldn't have done this I can't have a
variable and a function of the same name because the symbols are
literally identical so just to make clear that this variable up here is actually
global we'll use our convention like we did in C often when we had a global
variable we'll capitalize it all just to make it stand out like a constant value
up there and so down here what I'm going to do is pass in REGISTRANTS in all
caps so that was stupid didn't mean to confuse
there but the reason for that error to be clear is that you can't have a
function that's the same name as a variable I could just change the variable
name altogether I'm going to go ahead and just capitalize it to make it
really stand out that this is in fact a global variable up top all right now I'm
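The name collision can be reproduced in plain Python with no Flask at all. In this sketch the function body is a placeholder (an assumption); what matters is that the def statement rebinds the name that used to refer to the dictionary.

```python
registrants = {}  # global dictionary, as on line five of the demo


def registrants():  # same name: from here on it refers to this function
    return "registrants.html"


try:
    # What the template's for loop effectively attempted.
    for name in registrants:
        print(name)
except TypeError as error:
    message = str(error)
    print(message)
```

Renaming either one fixes it; capitalizing the dictionary as REGISTRANTS has the side benefit of flagging it as a global.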
going to go back to my browser let's do David and soccer all right but there's
going to be some other mistakes here so on line 17 let me go ahead and
change this variable to be capitalized there because indeed I
want to put the key and the value in this newly named variable as all capitals
REGISTRANTS let me now go back to VS Code here let me go back to the form and
let me start adding some data fresh let me register David for soccer clicking
register now and we should see you are registered but hopefully now it's
indeed in the computer's memory let me go back and register now Carter for
basketball clicking register again and hopefully it's now registered if I now
change my route manually to be /registrants
which is this newly added route that I made and hit enter now I
see thank God now I see the unordered list containing everything in the
computer's memory so when I say you are registered I kind of mean it now
because the server is still running and in the computer's memory is in this
REGISTRANTS global variable a dictionary of key value pairs of course we're only
seeing the keys at the moment so it might be nice to actually see the values
as well so let me go back to VS Code and
let me go into registrants.html and I'll just do something a little messy I'll
just say how about let's just make it a sentence is registered for and now
another placeholder I'm going to say registrants bracket name so just like in
Python if registrants is itself a dictionary registrants bracket and then the key you
want to index into is perfectly valid syntax as well so now let me go back to
/registrants let me click reload again so why isn't it working everyone
what's the bug that I
introduced earlier if David is registered for none and Carter is registered for
none but David and Carter are in the dictionary like that's a good thing so
some of the data is in there so why are there no Sports Associated well the
first thing I literally just did in front of you all was I went to app.py and I
stared at line 17 thinking like how did I screw this up I'm putting sport as the
value of the key which is the student's name all right line 17 looked fine to
me a few seconds ago so I
looked then with my eyes at line 16 and this too looked okay my first thought
was oh did I use request.args instead of request.form because that
would have assumed get instead of post but no like that looks okay too so
then my final Instinct was oh my god did I screw up the HTML form and so
that's why I went back over to my tab here I went to the original form here I
then view page source and this might not be as obvious to you if you've
never seen the select menu before what is apparently
missing here that might explain my mistake yeah yeah I didn't name this
form field quote unquote sport now to be fair you haven't seen me do this as
a select menu before and it's different from this input when you have an
input tag you literally say name equals whatever on the input tag it turns out
I don't know why I skipped this earlier I probably meant to come back to it
the select tag also can take a name attribute so if I go back to the name
attribute here and go back and add the
name attribute let me go into that template which is index.html let me add
name equals quote unquote sport in all lowercase which is different from the
visual aesthetic of this temporary disabled option that's just there to make
things prettier for the human now let me go ahead here and first I'm going to
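Why did every sport come back as None? A browser only submits form fields that have a name, so a select menu without one sends nothing, and request.form.get then returns None. The sketch below imitates both submissions with Flask's test client; the seen list is a hypothetical helper added just to record what the server received.

```python
from flask import Flask, request

app = Flask(__name__)
seen = []  # records (name, sport) pairs exactly as the server received them


@app.route("/register", methods=["POST"])
def register():
    seen.append((request.form.get("name"), request.form.get("sport")))
    return "You are registered!"


client = app.test_client()

# What the browser sends when <select> has no name: only the name field.
client.post("/register", data={"name": "David"})

# What it sends once the menu is <select name="sport">.
client.post("/register", data={"name": "David", "sport": "Soccer"})

print(seen)
```

The key in request.form is whatever the name attribute says, which is why it has to match the lowercase string the route asks for.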
go into my terminal window and I'm actually going to hit control C to stop the
server altogether because I want to throw away the contents of memory
and therefore get rid of that dictionary that had David
and Carter and those None values so this is sort of me clearing the computer's
memory I'm going to rerun flask run I get that same URL as before so I'm
going to hover over that and open the new tab and just to be sure I'm going
to do view page source and here I see now okay now the form has both a
name and a sport in it all right now I'm really going to cross my fingers
because I intend for this now to work David will register again for soccer
register claims we are registered I'm going to go back and do
it again for Carter and basketball register we still don't have a link so I'm
going to manually go up to the URL and change /register to /registrants as
before zooming out and hit enter and thank God now I'm actually registered
properly for this so oh thank you so what is it like 20 years later I'm still
struggling to implement this site okay so um so here now we have for the
first time in Python and web stuff like now we have a proper web application
and it's not just echoing
back hello David hello Carter this could now work for any of you and it's
currently served privately but if I made this URL public I could put this on the
web now and let anyone in the world register but there's kind of some issues
here there's some security flaws potentially and so for instance let me go
back to the web form here and let me open up the inspect tab the developer
tools and just remind you that anyone on the internet not only you the
developer but a an adversary can see all of your
HTML see all of your CSS see all of your JavaScript but more importantly
because this is all client side in the browser there is literally nothing
technically stopping them from changing the HTML or at least their copy of it
and I did that last week with Yale I changed their website but no I changed
my copy of their website but when forms get involved you could maybe be
actually malicious now because even though this drop-down menu only has
basketball soccer and Ultimate Frisbee suppose I really want to register for
how about uh
let's say uh name your favorite sport volleyball we really want to register for
volleyball but like this website won't let me well there's nothing stopping me
from going under the elements tab in my browser going into this select menu
here and you know what no one ultimate for me let's change this to
volleyball and let's change this to volleyball enter I'm going to close the
inspector now and as requested now we support volleyball in the form now
it's not changed on the server to be
fair but think about how HTTP works when I fill out this with uh say let's see
Bernie's name Bernie really wants to register for volleyball as well at the
moment my code is just going to trust that what's in request. form is what
was in the original form itself no matter whether the human adversarially
actually changed it so if I actually submit this form and click register for
Bernie and volleyball even though that's not one of the supported available
sports if I now go to /registrants my website nonetheless
has trusted that Bernie and perhaps you are registered for volleyball so
what's the implication of this this has surely happened in the past when like
really poorly implemented websites allow you to specify the price of
an item for instance in your shopping cart and they just trust that when you
click submit or add to cart it adds the price to the backend server if you're
not validating the price and making sure as with a database that wait a
minute that price is valid or wait a minute
those sports are valid like who knows what people are going to do to your
site and it's that simple to actually hack a website accordingly now we can
very easily fix this with some week six style Python we really just
need to do a bit of logic here and so let me propose this let me go into
app.py here and at the very top let me also create how about a global
variable called SPORTS in all caps and I'm going to set that equal to in square
brackets the list of Sports I actually want to
support so I'm going to put in basketball here I'm going to put in soccer here
and I'm sorry no volleyball officially I'm going to put in ultimate frisbee here
so I've got this Global list of supported Sports now think about how I made
this form a while ago I just hardcoded these sports here well I don't have to
do that I can sort of draw upon my own official list of sports instead so let me
scroll down to my index.html rendering template here let me say that the
sports I want to support
are these so just using the same placeholder trick as before but I'm now
telling the template what sports we currently support now if I go back into
index.html I don't have to manually do any of this let me get rid of all three
of those options which I manually inputted earlier let me use my new trick
with Jinja syntax and say for sport in sports then let me proactively say
endfor just to finish that thought and then in here let me do option value equals
in curly braces sport and then so that the
human also sees the same words I'm going to say sport out here so I've
completely changed what was hardcoded manually typed to something now
that's completely dynamic so now it's not going to stop someone adversarial
like me from changing the HTML but watch this the behavior on the form if
we go back is still now the same drop down as before so aesthetically it looks
the same but you know what why don't we be clever now and let's go into
app.py and the /register route and why don't we say this if
how about sport not in SPORTS then let's return render template failure.html
now this template doesn't exist yet so let me just quickly make this
real fast I'm going to copy that code from before let me create a file called
failure.html I'm just going to paste this here so I have a super simple
error message and I'm going to say you are not registered that's what
we mean by failure and now in app.py consider what logic I've added SPORTS
in all caps on line 22 is that same global list as
before by asking pythonically if sport not in SPORTS well then you hacked me like
you tried to inject volleyball or some other sport into request.form so I'm just
going to say no failure not letting you register and I can do this a little more
verbosely too why don't I also say this if not name so if the name is
blank let's similarly return a render template of failure.html in other words
if you didn't give me a name you left it blank that's not useful for me running
the sports program let's also consider
that to be a failure so if I go back to this tab now I'm going to reload just to
make sure I have the latest client side let me be lazy and just click register
enter you are not registered because I didn't give it an actual name all right
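The two server-side checks just described can be sketched without any web framework. This is a minimal sketch, not the lecture's actual route: `register` is a hypothetical plain function, the `form` dictionary stands in for `request.form`, and the returned strings (including `success.html`) are assumed template names used only for illustration.

```python
# Framework-free sketch of the server-side checks described above.
# SPORTS mirrors the lecture's global list; register() and the template
# names it returns are hypothetical stand-ins for a Flask route.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def register(form):
    """Validate a submitted form (a dict standing in for request.form)."""
    name = form.get("name")
    sport = form.get("sport")

    # A blank or missing name counts as a failure.
    if not name:
        return "failure.html"

    # Anything outside the official list is rejected, no matter what the
    # HTML in the user's browser was edited to offer.
    if sport not in SPORTS:
        return "failure.html"

    return "success.html"
```

Because the check runs on the server against the server's own list, editing the drop-down in the browser's inspector changes what gets submitted but not what gets accepted.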
well let's go back how about I now type David but no no I'm not going to
choose a sport I just want to register myself nope that did not work now let me
go ahead here and choose soccer this I think does work let me go back now
and try this hacker trick whereby I go into the drop-
down menu I go into the select menu I change as before Ultimate Frisbee to
volleyball and I'll change this one here to volleyball let me close the tab now
this looks like it's available now but when I click register this this time it says
you are not registered and this is much better than relying on other
techniques you might see or have seen online with regard to HTML because
there's also this trick let me go back to the screen here let me go back to
index.html and you might have seen online or you might
eventually see online that there are other attributes you can use like required
you can literally tell the browser uh-uh this field is required you cannot leave
it blank if I go back to the browser now reload and I again presume to be lazy
and I don't type in any name and click register okay so that's kind of nice like
now the browser is being a little more helpful for me saying no no no this is
required you have to fill this out but again if you know what you're doing
okay well I disagree with your requiring
a name of me let me go in here let me go over to this tag let me delete the
required attribute and now I slip through but I didn't slip through on the
server and so there's a difference here and an important distinction and so
many people in the real world still screw this up there's client side validation
like actually checking that the data is as you expect on the client side the
browser and there's server side validation and even though client side
validation like adding that required attribute
makes things more user friendly right like that was a pretty little popup it
tells me that it's required it just looks better than the previous version it is
not trustable you cannot trust any input that ever comes from the user
because clearly with like an hour or so of cs50 they can learn how to
turn all of these defenses off so even if you like the user interface better
client side you have to have to have to do server side validation always users
are not to be trusted and as soon as any
app or website you make becomes popular unfortunately then you have to
deal with all of the adversarial possibilities as well oh good question could
the adversary potentially access sensitive things like app.py theoretically no
if flask itself is buggy then sure maybe or if you're running some other
software on your server or on your laptop then sure it's possible however if
your server is properly configured theoretically they should not be able to get
access to that with that said we'll
soon see or You might with your final project if you do something web based
you're never going to want to write like usernames and passwords in your
actual code you can put them in what are called environment variables so
sort of in the computer's memory but not in your code just in case you or
someone screws up there are uh still ways to defend against those kinds of
possibilities however slim yeah a good question and this comes back to First
principles just like in C and in python as soon as you return from a
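The earlier advice about keeping usernames and passwords out of your code and in environment variables can be sketched as follows. This is only an illustration: `DATABASE_PASSWORD` and `get_database_password` are hypothetical names, not anything the course's code defines.

```python
import os


def get_database_password(environ=os.environ):
    """Read a secret from the environment instead of hardcoding it.

    DATABASE_PASSWORD is a hypothetical variable name chosen for this sketch.
    """
    password = environ.get("DATABASE_PASSWORD")
    if password is None:
        # Fail loudly rather than fall back to a value baked into the code.
        raise RuntimeError("DATABASE_PASSWORD is not set")
    return password
```

The secret then lives in the server's configuration rather than in a file that might be shared or leaked along with the rest of the code.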
place to store the frosh's registration data what's the implication of this okay
it might slow the whole process down but actually RAM is good
memory is generally a good thing so that's not going to be a deal breaker
here why might I not want to store that data though in that variable you can
perhaps infer from how I fixed something earlier yeah in back yeah
the memory gets deleted garbage collected if you will as soon as flask stops
running so if you so much as hit control C you've just
lost all of your freshmen who registered for the sport probably not a good
thing I did this deliberately a moment ago and I hit control C because I did
want to clear the dictionary but trusting that your server will never crash and
your code will always work and the power will never go out like that's not the
right way to build any kind of web application with persistent data so what
we probably want to do is reintroduce CSVs and we've played with those in C
and in Python we could totally use CSVs but we also now
have SQL at our disposal and let me propose that we do this in SQL
instead and for this let me go ahead and open up a version of the program
that I wrote in advance so let me go ahead and close these templates which
will look very similar but a little different from the ones I wrote in advance
and let me go ahead and open up in today's code let me go into source 9 let
me go into froshims how about version four technically in the versions
online and let me go ahead and open up app.py as follows so
here is an already made version that does just a little something different at
the very top I'm importing cs50's SQL library which you might recall we used a
couple of weeks past just to write python that talks to a SQL database and
this feels like an opportune moment to bring that idea back down here on
line eight I'm creating a db variable that opens up a file called froshims.db
using syntax that we've seen before I did create this froshims.db file in advance
of class just so that we have a couple of
columns in which to store names and sports and such here's that same
global list called SPORTS and let's just see what's going on
down below if I scroll down to Index this is the same as before as we wrote
together on the Fly let's skip deregister for a moment and go now into
register so this one's a little different but let's see what I've done I've got
some comments in here because I wrote it in advance and I think this logic is
pretty much the same
though I tightened it up and I'm asking two questions at once using on line
38 the or keyword here just to say if there's not a name or the sport is not in
sports that is what we'll call now a failure but what's fun now is that on line
42 I'm using the cs50 SQL library to execute some actual SQL and I'm going
to insert into a table called registrants two columns name and sport what
name and sport well these two values with placeholders plugging in name
and sport notice I'm using the question marks
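The question-mark placeholders can be tried out with Python's built-in `sqlite3` module, which is swapped in here for the CS50 SQL library used in lecture. A minimal sketch, assuming a table named `registrants` with `id`, `name` and `sport` columns; `add_registrant` is a hypothetical helper.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE registrants ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
)


def add_registrant(db, name, sport):
    """Insert one registrant (hypothetical helper for this sketch)."""
    # The ? placeholders keep user input separate from the SQL text, so the
    # values are stored as data and never interpreted as SQL.
    db.execute(
        "INSERT INTO registrants (name, sport) VALUES (?, ?)",
        (name, sport),
    )
    db.commit()


add_registrant(db, "David", "Soccer")
# Even a hostile-looking name is stored verbatim as a string.
add_registrant(db, "Robert'); DROP TABLE registrants; --", "Soccer")

rows = db.execute("SELECT name, sport FROM registrants ORDER BY id").fetchall()
```

Note that `sqlite3` takes the values as a single tuple, whereas the lecture's library takes them as separate arguments after the query string.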
after the user registers if I want to automatically show them now everyone
who is registered at /registrants I don't have to expect that they'll manually change
the URL like I've been doing for the past few minutes I can just redirect them
anywhere I want on my app or heck I could redirect them to any URL on the
internet using this function call and it's just a nice way to send them to a
different route if you want them to see in this case those registrants so let me
do this in the same directory let me
increase my terminal window size let me do sqlite3 of froshims.db and
let me type .schema and you can indeed see it wraps onto two lines here
that each registrant has an id which will be automatically assigned one two
three on up a name which is not null text and a sport which is also the same
and the primary key is just going to be this unique identifier so that I made in
advance but if I do select star from registrants semicolon there's no one
currently registered for any sports but let's try now running this let me go
ahead and close my old version which we wrote together and I'll close that
tab let me do flask run in this version four here all right I'm going to see
some similar output I'm going to open the URL now and you'll see that I
made a couple of changes before instead of using a select menu I
used what are called radio buttons now which is a reference to Old School
radio buttons that were mutually exclusive in cars back in the day and we'll
see how to do this but it's just an alternative to a select
menu and I'm going to go ahead and type in my name again here so I'll do
David I'll do soccer by selecting this radio button and I'm going to click
register now and notice what happened the formatting is a little ugly but so
again was this 20 years ago here I have now at the /registrants
route instead of an unordered list I'm just using a simple HTML table so I'll
show you what this looks like in just a moment too and I'll show you this
deregister button which is sort of
unnecessarily large I also have functionality we'll soon see for how you can
unregister someone from a sport as well so take your name out of contention
well let me go back to my terminal window here and I'm going to click the
plus to give myself a second terminal so I can go back into source 9 froshims
4 I'm going to do sqlite3 of froshims.db I'm going to do select star from
registrants now and now you'll see that indeed there's David registered for
soccer and in fact if I quit the flask
program with control C and rerun it again no big deal because that next
run of flask will just use the database as well so I'm persisting keeping
the data in SQLite whereas I'm actually grabbing it using my python code
in flask all right let's put one more person in here so we can delete one of us
too Carter for basketball register and now we see both of us here all right so
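The difference between keeping registrants in a variable and keeping them in a database file can be sketched like this. It is a simulation under stated assumptions: `first_run` and `second_run` are hypothetical functions standing in for two separate launches of the server, and the database lives in a throwaway temporary directory rather than the lecture's file.

```python
import os
import sqlite3
import tempfile


def first_run(path):
    """Simulate one launch of the server that registers David."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE registrants ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
    )
    db.execute(
        "INSERT INTO registrants (name, sport) VALUES (?, ?)",
        ("David", "Soccer"),
    )
    db.commit()
    db.close()
    # A plain dictionary like this one disappears when the program stops.
    in_memory = {"David": "Soccer"}
    return in_memory


def second_run(path):
    """Simulate a later launch: only what was written to disk is left."""
    db = sqlite3.connect(path)
    rows = db.execute("SELECT name, sport FROM registrants").fetchall()
    db.close()
    return rows


# Hypothetical throwaway location for the database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
first_run(path)
survivors = second_run(path)
```

The second function never sees the dictionary from the first, but it does see the row in the file, which is the point of moving the data into SQLite.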
let's see how we did this let's go back over to VS Code let me shrink down
my terminal window let me go into the
actually let's go into the templates directory and let's look at for instance
index.html so previously we were using a select menu turns out radio
buttons use the input tag but instead of having input type equals text
like for the human's name you have type equals radio and so long as each of
your radio buttons has the same name the same name the same name that's
what makes them mutually exclusive so checking one radio button turns off
the others because they have the same name the value I want to
assign to each of these radio buttons is just the sport placeholder this is what
the human sees on the screen so it's almost the same as the select menu it
just looks aesthetically different but there's my same button so that's all the
difference I made there and I added a heading tag H1 just to say register to
make clear what it is but let's take a look at another file this one now
being how about for the /registrants route so if I open up registrants.html
here now it's way more verbose than my
unordered list but this is just kind of boring HTML here's my table tag table
head table row table heading this makes things bold as the first row of the
table name sport are my two columns I've got a third empty column just so I
can fit that button as we'll soon see again tbody for table body here's the
same for loop trick again so that I can output for every registrant a whole
table row and there's this weird form in there but we'll come back to that but
there's the registrant's name there's
the registrant's sport but notice the slightly different syntax here recall that
cs50's execute function returns to you a list of
dictionaries and you can then get at the individual columns by way of those keys
so let's go to the /registrants route let me go back to app.py scroll down here
and it's actually super simple here I have a /registrants route that first
executes select star from registrants so just old SQL stuff give me everyone
from the registrants table let me then render the
template called registrants.html and just pass in this list of dictionaries
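The idea of selecting every row as a list of dictionaries and turning each one into a table row can be sketched with the standard library alone. This is a stand-in, not the lecture's code: `sqlite3` with a row factory replaces the CS50 library, and `render_rows` is a hypothetical function doing by hand what the template's loop does.

```python
import sqlite3
from html import escape

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be converted to a dictionary
db.execute(
    "CREATE TABLE registrants ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
)
db.executemany(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    [("David", "Soccer"), ("Carter", "Basketball")],
)

# A list of dictionaries, one per row, keyed by column name.
registrants = [dict(row) for row in db.execute("SELECT * FROM registrants")]


def render_rows(registrants):
    """Build one <tr> per registrant (hypothetical stand-in for the template)."""
    return "\n".join(
        f"<tr><td>{escape(r['name'])}</td><td>{escape(r['sport'])}</td></tr>"
        for r in registrants
    )


table_rows = render_rows(registrants)
```

In a template the same lookup is usually written with a dot rather than square brackets and quotes, but the data being walked over is the same list of dictionaries.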
and we haven't quite done this yet but if you go back to registrants.html how
do you iterate over each dictionary in that list well the syntax is just for
registrant in registrants that makes this a dictionary one at a time in the list
just like in Python so registrant.name and registrant.sport is just another
syntax for using the square bracket notation it's just a little cleaner and
slightly more succinct than
having quotes and square brackets everywhere and then the rest of this is
just HTML so what happens now if I want to uh like Carter has been cut from
the basketball team if you will so how do we do that well we want to click this
button deregister next to Carter's name but how does this work and think
about now any website you visited that has something like a shopping cart
uh where you can remove things from your cart or update quantities or add
more quantities to your shopping cart on Amazon or
anything else well let's actually look at the HTML that my app has spit out
let's actually look at this here and we'll see the following we'll see that we
have here in the HTML that reached the user not only is David in the first
column soccer in the second notice that my registrants.html template is also
spitting out a tiny little web form of its own it's ugly but I only care about its
functionality for now and notice what I'm doing here every registrant in this
database gets their very own deregister
button and that form has a button that says deregister but notice what else
each of those forms have there's no text box there's no drop down menu
there's no radio buttons rather you have a hidden input field here so there is
a way with HTML to have a form that will submit information but you don't
have to give the user the ability to change that information you can just go
ahead and tuck it inside of the form invisibly if you will hidden in fashion and
so what's going to happen is if I click the
deregister button next to Carter his primary key is two mine is instead one so
what's going to happen if I click his deregister button it submits a form with an
id parameter whose value is two and it submits it to the /deregister route so
what does that mean well if I go to VS Code and I go to app.py let's look at the
/deregister route that I skipped over so if you access the /deregister route via
post this code gets called I grab from request.form the id that was submitted
in hidden fashion if there's
indeed an id that is it's not blank it's not zero it's an actual number like one two
three or more I execute delete from registrants where id equals that value
with a question mark placeholder and then I redirect the user back to
/registrants now if I go back to this form here and click deregister we'll see that in
action gone is now Carter and in fact if I go back to my terminal window here
and open up sqlite3 of froshims.db and rerun select star from
registrants Carter is now gone so again using very
simple HTML forms you can get buttons and links and other such UI
mechanisms to like do things on the server that you want but there is a
danger here this example is really meant to be like an administrative
website like it was some 20 years ago just for us internal staff to be doing
things technically this is dangerous what I've just done too even though
Carter's id is two and hidden and mine is one and hidden what could this allow
an adversary to do if they had admin access to the same
site any thoughts yeah yeah they could change the value of that
hidden attribute by opening up Chrome's developer tools change the number
in the HTML and they could delete anyone deregister anyone they want from the
database now in this case I claim this is fine because this is only meant for us
staff who were running Sports back in the day but it's indeed a risk so
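The risk just described can be made concrete with a small sketch. It assumes a `registrants` table like the one above, uses `sqlite3` in place of the CS50 library, and treats `deregister` as a hypothetical plain function whose `form` argument stands in for `request.form`; the digit check is an addition for this sketch, not something the lecture shows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE registrants ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
)
db.executemany(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    [("David", "Soccer"), ("Carter", "Basketball")],  # ids 1 and 2
)


def deregister(db, form):
    """Delete whichever id the form submits, then name the redirect target."""
    registrant_id = form.get("id")
    # Only act on a non-empty, numeric id (extra check added for this sketch).
    if registrant_id and registrant_id.isdigit():
        db.execute("DELETE FROM registrants WHERE id = ?", (int(registrant_id),))
        db.commit()
    return "/registrants"


# The button next to Carter was built to send id=2, but the hidden field is
# just HTML in the browser: editing it to 1 removes David instead.
deregister(db, {"id": "1"})
remaining = [row[0] for row in db.execute("SELECT name FROM registrants")]
```

Nothing in this function asks who is making the request, which is exactly the gap that logins and sessions are meant to close.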
wouldn't it be nice if we could actually ensure that only those users who are
authorized are allowed to execute certain actions I
think for this capability we're actually going to need to introduce something
a bit more and so here of course is an opportunity to talk briefly about really
what you and I do all day long every day we log into one or more websites or
apps or at least until uh you're logged out automatically and you have to do
it again so here for instance is a screenshot of Gmail when you type in your
username you type in your password maybe your two factor code that gets
texted or sent to your phone then you're
logged in and thankfully you're not prompted to log in again typically for
a number of hours or days or weeks depending on the website like Gmail
keeps you logged in for ages your bank probably logs you out within an hour
or so for safety's sake so that is completely configurable on the server but how
does Gmail know how does Google know that even as you're checking
different mails again and again and again how do they know that you're still
the same person who logged in well it turns out that
using these same building blocks as today HTTP and HTML and more you can
actually implement the notion of a login feature by way of
something called cookies essentially what happens when you
first log into a website for the very first time successfully with your username
and password a cookie so to speak is planted on your computer and
metaphorically this is kind of like taking a hand stamp and your hand is now
stamped in this case with a smiley face so that every other time you
constantly re-present it to the site every time you click a link or make
another request to that website and mechanically how this works not just
metaphorically in ink essentially this is what happens here is an example of
an HTTP request to something like Gmail and suppose for instance that
you've logged in typically as of last week we said that coming back from the
server would be another virtual envelope containing like a 200 okay message
and then like the actual web page or the picture of a cat or whatever
it may be but Google can also if they've verified that you have some username
and password correctly inputted they can do the equivalent of stamping your
hand and the way they do this is they send an additional line of text in that
virtual envelope from the server to your browser literally using another HTTP
header not content type which just mundanely tells you what kind of content
has come back they literally send an HTTP header called Set-Cookie and then
they set a key value pair on your Mac or PC this is
the technical equivalent of this smiley face hand stamp and what your
computer is designed to do because your computer and in turn browser are
supposed to implement HTTP anytime you click another link on Gmail or click
on another mail or the like your browser unbeknownst to you presents that
handstamp and how it does it technically is in the envelope it sends to
Google from your browser it doesn't send set cookie it just sends cookie
colon and the exact same thing and so long as Google is smart and they
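The exchange of headers just described can be sketched with Python's standard `http.cookies` module. The key `session` and the value `abc123` are made-up placeholders; a real site would send a long random value.

```python
from http.cookies import SimpleCookie

# Server side: after a successful login, stamp the visitor's hand.
response_cookie = SimpleCookie()
response_cookie["session"] = "abc123"  # hypothetical key and value
set_cookie_header = response_cookie.output()

# Browser side: on every later request to the same site, the browser sends
# the same key and value back, this time in a Cookie header.
cookie_header = "Cookie: " + "; ".join(
    f"{key}={morsel.value}" for key, morsel in response_cookie.items()
)

# Server side again: parse what the browser presented.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header.split(": ", 1)[1])
returned_value = request_cookie["session"].value
```

The server sets the value once with `Set-Cookie`, and from then on recognizes the visitor by the matching `Cookie` header that arrives with each request.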
that I stamp his hand or really put a cookie on his computer and only if his
cookie lines up with the user ID he's trying to deregister should he be
allowed to in fact do so so all of this is quite possible and indeed the
technical term for this is session and what we thought we'd do in our
remaining time today is show you some examples of exactly how some of the
most familiar web functionality is implemented today some of which you'll
use in your own problem set 9 which itself will be a
web app or perhaps even your final project so let me go ahead and do this
let me close my previous tabs and all things froshims and let's move on to
implementing some notion of login so in just a moment I'll switch over here
to VS Code and what I'm going to do is indeed in my source 9 directory I'm
going to go into a login directory and if I type ls here you'll see app.py
requirements.txt which just refers to libraries I want to automatically install
and a templates folder as well
I'm going to go ahead and stop the previous server and close that terminal
window and I'm going to open up this version of app.py so there's a few new
lines here and we'll give you these lines for problem set 9 but I've got
some of the familiar stuff up here including this new redirect function we just
used and I have a session variable that comes with flask too so what's nice
again about flask is that it deals with all of this cookie stuff for you it sets the
cookie it checks the cookie and what flask does
for you is it gives you the abstraction of a variable called session so that
anything you put in the session variable which itself is a dictionary will be
there again and again and again whenever that same user comes back a
session is how you implement essentially like the proverbial shopping cart uh
if I'm logged into Amazon you're logged into Amazon Amazon knows which of
us is which by way of that cookie and Amazon if they're using flask provides
the programmer with a dictionary called
session and flask makes sure that when Carter is visiting the site the code
uses his session object when I'm visiting the site it uses my session object
but it's all implemented with those same cookies this is the same as before
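One way to picture the session abstraction is as a server-side table of dictionaries keyed by each visitor's cookie value. This is a conceptual sketch only, not how Flask is actually implemented; `_sessions` and `get_session` are hypothetical names.

```python
import secrets

# Server-side storage: one dictionary per visitor, keyed by cookie value.
_sessions = {}


def get_session(cookies):
    """Return (session_id, session_dict) for a request's cookies."""
    session_id = cookies.get("session")
    if session_id not in _sessions:
        # Unknown or missing cookie: issue a fresh, hard-to-guess id.
        session_id = secrets.token_hex(16)
        _sessions[session_id] = {}
    return session_id, _sessions[session_id]


# Two first-time visitors each get their own dictionary.
david_id, david = get_session({})
david["name"] = "David"
carter_id, carter = get_session({})
carter["name"] = "Carter"

# When David's browser presents his cookie again, his data is still there.
_, david_again = get_session({"session": david_id})
```

The code that reads and writes the dictionary is written once, and the cookie decides whose dictionary it is handed on any given request.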
these lines are new and you'll see these in problem set 9 this is how we
enable sessions in a web application and I'll just wave my hands at the detail
there's different ways to implement sessions whether you use cookies on the
server cookies on the browser or other things these just
ensure that we're storing the session information the shopping cart on
the server itself now down here let's go ahead and do this let's go ahead and
run flask run so I can see what this app does if I do this and visit the URL that
gets outputted you'll see a very simple web page here and if I type in for
instance my name I'm not going to bother with a password and click log in
you'll see that you are logged in as David and now I can log out so I'm going
to go ahead and click log out and now it seems
to know that I'm not logged in again I can log in as Carter because I didn't
bother implementing passwords for Simplicity but now the site knows I'm
logged in as Carter better yet if I reload reload reload or click this button
again and again notice it still knows that I'm Carter uh until such time as I log
out all right well how is this working well let's go back to vs code here and let
me scroll down to first this route this is a very common Paradigm here
whereby I'm checking for
the index route if there is not a name in the session redirect the user to
/login now what does that mean well let me go back to vs code here let me
go to the slash route so again your URL will be different but I'm just going to
go to slash and hit enter notice that I got automatically redirected to /login
and so many websites do this if you go to a website and you're not logged in
you're very often redirected to /login or /account or something like that
where you're prompted the code for doing that
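The flow of the index, login and logout routes discussed in this part of the lecture can be sketched as plain functions. This is a framework-free approximation: the `session` dictionary is passed in explicitly, and the `("redirect", ...)` and `("render", ...)` tuples are hypothetical stand-ins for Flask's redirect and template rendering.

```python
def index(session):
    """Home page: send anyone without a name in their session to /login."""
    if not session.get("name"):
        return ("redirect", "/login")
    return ("render", "index.html")


def login(session, method, form=None):
    """On POST remember the submitted name; on GET show the login form."""
    if method == "POST":
        session["name"] = (form or {}).get("name")
        return ("redirect", "/")
    return ("render", "login.html")


def logout(session):
    """Forget who was logged in, then go back to the home page."""
    session["name"] = None
    return ("redirect", "/")
```

A short walk through the flow, using an empty dictionary as a new visitor's session:

```python
session = {}
index(session)                             # redirected to /login
login(session, "POST", {"name": "Carter"})
index(session)                             # now renders the home page
logout(session)
```

No password is checked here, matching the simplified demonstration; a real login would verify credentials before storing anything in the session.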
generic HTML here's the body block I have this and this too is Jinja
because of the curly brace and the percent sign if there's a name in the
session this is just python syntax then say this sentence you are logged in as
whatever name is in the session in the shopping cart if you will and then I
just have this HTML link for logging the user out else if there is no name in
the session logically just say you are not logged in and give them a manual
link for logging in instead so that's all this particular
template does but how does /login work well let's go into this other
template login.html to which I'm redirected super simple this is just
copy paste from HTML before I've got a login form that's going to have an
action of /login it submits for privacy's sake just via post and then the rest
of this is just a simple form and I'm using an input type equals submit
instead of button type equals submit but same idea here too and if I
go back to app.py well let's see
how login works all right it's a lot all at once but they're relatively simple
reapplications of the same idea so if the user visits /login via get or post
call this function login if the user has submitted via post and we saw this
technique before go ahead and do this on line 23 store in this special session
variable that comes with flask a name key and store in it the user's own
name so quote unquote name will have a value of David or Carter or the like
and as soon as you do that redirect the user
back to slash just so they see the homepage again and this is how Amazon
and all these other websites work too otherwise if they visit this page
implicitly via get and even though I didn't say equals equals get anywhere
that's sort of the implication because if you can only get here via get or post
and we already handled post logically All That Remains is get well then just
show them the login screen instead but there's half a dozen ways we could
express that same logic and then for log
out this is kind of straightforward if the user clicks that log out link and ends
up at /logout this route well just change the value of that key in the
session to be None effectively now Carter is gone David's gone there's no one
logged in so that is all that's required to actually implement the notion of
logging in and logging out of a website plus the password thing which should
probably involve a database but one thing at a time and really session is sort
of like the code version of a
shopping cart whereby if I visit the same code I get my own session object if
Carter visits the website he gets his own session object and the way flask
keeps us straight is it puts one cookie on my computer a different cookie
on his computer and uses those to line up and make sure the right session
gets shown to the right actual user questions on this notion of sessions no all
right how about a couple final examples just to tie this all together let me go
back into vs code here let me quit my previous version of
flask let's go into source 9 and go into store which is a separate app
altogether and let's start by just running flask to see what it does let's hover
over the URL and open it in another Tab and this is pretty ugly too let me
zoom in but it's a very simple bookstore like an early [Link] for each of
these seven books here Each of which seems to be like maybe this is H1 this
is H2 H2 H2 H2 and then there's a button underneath each well now let's use
this as an opportunity to kind of infer like
for any website how this thing works let me go ahead and do view page
source and you can do this for any website on the internet let's try to figure
out how this bookstore adds things to a cart well here's the H1 tag
uninteresting H2 H2 H2 so the juicy part is in these forms each of these
forms has an action of /cart so that's the route that's going to be
interesting in a moment and it uses post for privacy's sake each of these forms
like the deregister feature for Carter has an id
parameter that's hidden visually that has a value of one or two or three so
like the unique barcodes for the books if you will but super small
numbers in our case and then
each of these other books has an identical form except for the value of this here now
in this case this isn't such a big deal that a user could technically hack the
HTML of this bookstore [Link] and change the IDS because whoa
what's the worst they're going to do like buy more books
by adding more IDs to their shopping cart like that's not a problem there's no
prices here it's just the unique ids of books so whereas the deregister was
maybe worrisome because you're changing the server I think this is okay
because the user can only at worst buy more books than they might
via the buttons alone so how does this now work well let's go into vs code
again here let me give myself another terminal window and in source 9/
store let me open up app.py which is where all the
logic is so I'll flip through most of this quickly because we've seen this before
these Imports are pretty much the same as before this line is the same this
line is almost the same but the database now is called store.db instead of
froshims.db this is the boilerplate code for just enabling sessions this notion of a
shopping cart and so let's see how the index works how is it that I'm seeing all
seven books at once well in this index function I'm using on line 19 select
star from books to get all the
books from the database and then I'm rendering a template called books.html
passing in as a placeholder all of those books all right let's go down
that rabbit hole for a second let me open up books.html in my templates
directory and here again even though it's you know new today it's probably
increasingly familiar syntactically here's the H1 here is my Loop here's the
H2 which is going to Output the current book's title in that Loop here's the
form here's the
in some boilerplate rewrote we're going to use the session object to store a
variable called cart that in this case is going to be a list so session again is
just a dictionary you can put anything in it you want
previously we put the user's name in it now I'm going
to actually store a cart key whose value is a list why because I want to
aggregate more and more books in this list all
right so that just makes sure that I have at least an empty shopping cart the
very first time the user does this if they visit this form via post let's go ahead
and get the ID of the book that they posted if it is not empty if there is a
number like one or two or three let's go ahead and go into the shopping
cart which is a list per this line and just append that ID so this list of books in
your shopping cart is going to contain like one comma 2 comma four comma
six whatever books you're actually
buying and then the user gets redirected to cart what if though the user got
here via get and not post well this one's relatively straightforward if you just
visit /cart we select star from books where ID is in okay so this is interesting
this list and this is a syntax you might not have seen before but if you
read the documentation for cs50's Library if you select something and use a
question mark placeholder and the placeholder itself is a list we output a
comma separated list of values
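
The cart logic and the "where ID is in" query described above can be sketched roughly as follows. This is a minimal illustration, not the lecture's actual code: it uses a plain dictionary as a stand-in for Flask's session and Python's built-in sqlite3 module instead of CS50's library, so the list placeholder is expanded by hand. The function names `add_to_cart` and `books_in_cart`, the table columns, and the sample titles are all assumptions.

```python
import sqlite3

# Stand-in for Flask's per-user session, which behaves like a dictionary.
session = {}


def add_to_cart(session, book_id):
    """Append a posted book id to this user's cart, creating the cart if needed."""
    if "cart" not in session:
        session["cart"] = []
    if book_id:
        session["cart"].append(book_id)


def books_in_cart(db, cart):
    """Select only the books whose ids appear in the cart."""
    if not cart:
        return []
    # sqlite3 needs one "?" per value, so build "?, ?, ..." by hand.
    placeholders = ", ".join("?" for _ in cart)
    query = f"SELECT id, title FROM books WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(query, cart).fetchall()


# Hypothetical catalog, just for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO books (id, title) VALUES (?, ?)",
    [(1, "Book A"), (2, "Book B"), (7, "Book G")],
)

add_to_cart(session, 1)
add_to_cart(session, 7)
print(books_in_cart(db, session["cart"]))
```

Only the ids travel in the session; the titles are looked up from the database each time the cart is shown.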
just like you would use maybe in problem set 7 for doing SQL queries
on your own so this just means show me only those books in my shopping
cart not Carter's not someone else's not in the whole database only show me
the books in my shopping cart and then render it as such so we only saw
what the catalog here looks like at slash books Let's go ahead and in slash
let's go ahead and add maybe the first book to my cart and now I see at /cart
only that first book whose ID is one let me
now go back to the bookstore here scroll down to maybe the seventh book
and add that to my cart and now I see this here too meanwhile all of this
information is stored in my session and so when I reload this cart again and
again the reason I'm only seeing my two is because we're checking only the
list in my session and Flask makes sure again that my session is different
from your session is different from Carter's session as well but you write the
code once and it works for thousands millions
of people in parallel any questions on this yes in the back sorry say a
little louder uh so to recap so users will never have the same session values
theoretically the cookie that gets planted does not look like a smiley face for
everyone each of us gets a big random number that's assigned to us so it'd be
like each of us gets a completely unique hand stamp that no one else can
see the reason no one else can see it is because if the website's using https
every time this hand stamp is shown every time this
cookie is sent back and forth It's all encrypted as well so each of us can even
if we have the same contents by coincidence because we like the same
books they will be separate cookies separate memory separate sessions
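
As a rough sketch of the idea that each visitor gets a big random number and therefore a separate session, consider the following. It is not how Flask itself is implemented; the `sessions` dictionary, the `new_session` helper, and the two example visitors are hypothetical, and the only library call used is `secrets.token_hex` from Python's standard library.

```python
import secrets

# Server-side storage: one dictionary of session data per session id.
sessions = {}


def new_session():
    """Create a session under a large random id, the value a cookie would carry."""
    session_id = secrets.token_hex(16)  # 32 hex characters, hard to guess
    sessions[session_id] = {"cart": []}
    return session_id


user_a = new_session()
user_b = new_session()

# Both visitors like the same book, but their sessions remain separate.
sessions[user_a]["cart"].append(1)
sessions[user_b]["cart"].append(1)
sessions[user_a]["cart"].append(7)

print(sessions[user_a]["cart"])
print(sessions[user_b]["cart"])
```

The code is written once, and the random id is what keeps one visitor's cart apart from another's.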
behind you yeah really good question when does the session end totally
configurable typically it ends when you close the tab or when you quit the
browser or you can also configure cookies to themselves be persistent for a
day for a week for longer so for instance when you log into
um uh say Gmail they plant a cookie on your computer probably for a week a
month a year something like that because it would be annoying and probably
drive you to like Outlook or something else if you kept having to log into your
account whereas your bank account might actually wait for you to just close
the tab and then for your own Financial safety they just automatically delete
the session far sooner but totally configurable by default as I'm using it it will
typically be thrown away when the
browser itself quits and here too is another reason to develop websites using
incognito mode because if you want to just throw away all of your
cookies you close the incognito window open a new one and now
you're starting from scratch you don't have to manually delete all your
cookies which could log you out of websites you actually care about yeah a
good question when using sessions if someone maliciously changes the
value of some forms could it affect other people theoretically no because the
worst you can do is like add books to your own shopping cart that you don't
want there so at that point even though it's on the server it doesn't affect
you or Carter or anyone else unless there is something more globally
happening like registering or deregistering for a sport or removing books
from the Amazon database that would be problematic but in this case we're
removing things only from my own session that the website is giving me all
right the last topic for today is this thing here which is sort
web-based services that you can use to get back data like the weather or the
current time or the database of Amazon books for instance all might have
apis often web-based that allow you using URLs or some other technology to
just get data from someone else as though it's a function you're calling
remotely but HTTP is very often the mechanism that's used to actually get
data from servers and the way the data can come back can be as follows let
me end with one final example using some of our
familiar shows from weeks past let me go ahead and close the old Flask
version go back into Source 9 and go into how about an example
called shows and the first version of this zero I'm just going to go ahead and
run with flask run I'll hover over my URL and open it here and you'll see now
that I have a very simple form as we keep doing today I'm going to type in
like o f f i c office into this search box and click search and you'll see now
that I ended up at a URL ending in /search?q=office
so this is like my own baby version of [Link] but I
implemented it myself and for any title of a TV show from a couple of weeks
past that matches office I spit it out into an unordered list how is this working
you can maybe imagine even if you might not be able to program this off the
top of your head certainly so soon let me go into Source 9 let me go into
shows zero let me open up [Link] and in this file you'll see that I'm
grabbing a file called shows.db which is like a simpler
version of the one from a couple of weeks past uh here is why I see the web
form my first route my index is super simple it just spits out that form and
my search route like you can think of this as [Link] there are only like four
lines of code so if the user sends data to /search this function called search
is called I declare a variable called shows I execute a SQL command that is
Select star from shows where title like question mark and the syntax here is
a little crazy but I want to prefix to the
user's input a percent sign and suffix it with a percent sign as well putting in
between those two values the actual input why in SQL what does it mean if
you have a percent sign to the left and to the right nothing to do with Jinja
today yeah it's a it's a wild card so it means match zero or more characters
on the left or match zero or more characters on the right you have to do the
concatenation as the second argument to this function you can't do
something clever like put it here around the question mark the question mark is the
placeholder that you plug these values into but this just means hey SQL
show me all of the titles that have office somewhere in them that gives me
back a list of dictionaries I pass that in as a placeholder for a variable
called shows and if we look at search.html let's look at that in my
templates directory there's something called search.html super simple I
mean this is like the essence of [Link] search results I'm using an
unordered list to keep things simple but I iterate over
every show in the shows list that came back and I output an li with each of
those shows' titles and that's it now Google has like blue links and like little
previews and other text the first sentence or so from each page but like
that's the idea like this is really similar in spirit to what [Link] search
does for you now how is this working there's no API involved here yet this is
just very basic HTTP I submit the form I go to another route and I get
back the results but check out this
version Let me close these tabs here and open my first terminal window let
me go into shows one from today's Source 9 directory and do flask run this
time let me go ahead and hover over that URL and open it here and gone
now is the submit button now I'm going to make a user interface that uses a
technique called AJAX for asynchronous JavaScript and XML which is
somewhat of a dated term because we're not using something called XML
anymore but AJAX is a technique whereby you don't have to submit forms
anymore to get more data from the server you can use JavaScript per last
week listen for an event like the key press coming down or up and as soon as
you hear such an event you can secretly in JavaScript code send a request to
the server to get back more data and then plug it into the DOM the tree in
the computer's memory and this just makes for more seamless experiences
like autocomplete on any website so now let me try typing o okay we got
auto complete super fast f f i c e and you'll
see every time I add more keys to my input I'm doing another search another
search another search and the data is changing now how is this working well
let me go back to vs code here and in my other terminal window let me open
up [Link] and in [Link] you'll see that there's still a search route down
below that returns a search template but watch this let me go into templates
search.html and notice here that we're indeed getting back an unordered
list of shows again and again and again and this HTML that's coming
back let me go here let me open my terminal oh sorry this is the wrong
version sorry I was in the wrong folder let's fix this and in shows one code
[Link] it's almost the same thing in search here okay well I changed this
slightly let me show you this version of search if I open up [Link] here's my
search route I'm getting a variable called q giving it the value of whatever
request.args has from the user like q equals office and then I'm checking if
the user actually typed something in execute this
SQL query select star from shows where title is like that using the same and
this time just to keep things efficient I limited the total results to 50 instead
of an infinite number otherwise if the user typed nothing just to be super safe
here I'm setting shows equal to an empty list so if you don't type anything
there's nothing to show and no matter what I render this template called
search.html well let's look at that if I open up templates search.html this
time there's no layout there's no
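
The search route just described (wildcards on both sides of the input, a limit of 50 results, and an empty list when nothing was typed) can be sketched as follows. This is an illustration under assumptions rather than the lecture's code: it uses Python's built-in sqlite3 module instead of CS50's library, and the `search` helper, the table layout, and the sample titles are hypothetical.

```python
import sqlite3


def search(db, q):
    """Return up to 50 (id, title) rows whose title contains q; [] if q is blank."""
    if not q:
        return []
    # The wildcards are concatenated into the argument, not into the SQL text,
    # so the "?" placeholder still receives a single safely bound value.
    pattern = "%" + q + "%"
    return db.execute(
        "SELECT id, title FROM shows WHERE title LIKE ? ORDER BY id LIMIT 50",
        (pattern,),
    ).fetchall()


# Hypothetical data, just for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("Office Hours",), ("Parks and Recreation",)],
)

print(search(db, "office"))
print(search(db, ""))
```

SQLite's LIKE ignores case for ASCII letters by default, which is why a lowercase query still matches the capitalized titles here.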
give you a whole bunch of Li tags which almost looks the same but let me
view the page source only am I going to hand you back a fragment of HTML
I'm not giving you an HTML tag a body tag a title tag a head tag I'm not
giving you a web page I'm giving you a fragment of HTML that you can now
do whatever you want including insert this into your own unordered list so
notice what happens in this actual app if I go back to vs code here let me
open up my index template here and you'll see some JavaScript
magic so in JavaScript here in my form that only had the text box and no
button what am I doing in a script tag here I am creating a variable called
input and I'm using this function called querySelector that just gets me a
reference to the input text box on the form so I can see what the human
typed this is a little different today but I'm using input.addEventListener which
is a way in JavaScript to tell it just like in scratch listen for something to
happen like the green flag being clicked but in
this case listen for an event that involves input that is like typing on the
keyboard whether it's by a copy paste manual input or anything else then
whenever that happens call this function and async stands for asynchronous
this is a term of art which means that this function might take like a split
second maybe even a second or two to execute so it's going to do it behind
the scenes like in the background so to speak and what is it going to do well
it's going to call a JavaScript function
that all browsers now support called fetch which is a function that uses HTTP
to go fetch more data from the server it's going to fetch data from a
route called /search?q= and whatever the value is
that the user typed in so I'm just sort of manually creating my own mini URL
and telling JavaScript go fetch me that HTML when it comes back via this line
of text here called response.text and let me wave my hand at await await
just means this might not come back immediately
let's await the response and when it does come then let's execute this
code I'm going to do this I'm going to search the document the whole web
page for this UL tag which is somewhere in this page that we'll see in a
moment change its innerHTML to be that fragment of li li li of all of those
matching shows and where does this all go well if we scroll up here you'll
notice that there's my usual HTML up at top head tag body tag and all of that
there's the text box that we've talked
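
The lecture's listener is written in JavaScript, but the same flow (build a small URL from the typed value, await the response, then replace the list's contents with the returned fragment) can be sketched in Python with asyncio. Everything here is a stand-in chosen for illustration: `fake_server` plays the role of the /search route, the `page` dictionary plays the role of the ul element, and `TITLES` is hypothetical data.

```python
import asyncio
from html import escape
from urllib.parse import quote, unquote

TITLES = ["The Office", "Office Hours", "Parks and Recreation"]  # hypothetical


async def fake_server(url):
    """Stand-in for the HTTP request: returns a fragment of <li> elements."""
    await asyncio.sleep(0.01)  # the response does not come back immediately
    q = unquote(url.split("?q=", 1)[1]).lower()
    matches = [t for t in TITLES if q and q in t.lower()]
    return "".join(f"<li>{escape(t)}</li>" for t in matches)


async def on_input(page, value):
    """What the listener does each time the text box changes."""
    url = "/search?q=" + quote(value)   # build the mini URL by hand
    fragment = await fake_server(url)   # wait for the response text
    page["ul"] = fragment               # like replacing the list's inner HTML


page = {"ul": ""}
asyncio.run(on_input(page, "off"))
print(page["ul"])
```

The `await` keyword marks the one place where the function pauses until the data arrives; the rest of the function runs only after that.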
browser and I'm going to manually visit after zooming in let's do again
/search?q=office enter and this is what JSON looks
like now at a glance this does not seem like an improvement like this looks
crazy that it's just this Big Blob of text but it's just enough text for the
computer to be able to process it reliably notice that there's a square
bracket here and if I actually scroll to the bottom there'd be a closed
square bracket
like way down there inside of that square bracket is a curly brace then ID
colon and then a number then a title quote unquote colon and then the title
and then the closed curly brace so what you're seeing in JavaScript object
notation is a very standard super popular format that's just text that still
uses square brackets for lists AKA arrays that still uses curly braces for
dictionaries key value pairs so what you see here is a massive list up to 50 I
think shows that came back from
this API Each of which has a dictionary if you will an object of key value pairs
what keys and values an ID key and a title key Each of which has a value
respectively and this is the same data from IMDb some of which you might
be recalling visually this is just a very raw computer friendly way of returning
a whole bunch of data that we humans don't need to see but I can use this
data by going back into vs code let me open another terminal window and go
into Source 9/ shows 2 and in here let me go ahead
and open up how about uh templates [Link] which previously just used
that inner HTML trick and this is not going to impress you're not going to be
pleased with this syntax but let me just at least explain what we're doing it
turns out that JSON is just the better way in general the more generic
multi-purpose user agnostic language agnostic way of returning data from a server
because it's just text so it doesn't matter if you're using python or C or C++
or JavaScript or Ruby or PHP or
something else like all of those languages can process Json information and
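
To illustrate the point that JSON is just text that many languages can process, here is a minimal Python sketch using the standard json module. The sample text mirrors the shape described above (a list of objects, each with an id key and a title key), but the ids and titles themselves are made up for illustration.

```python
import json

# Hypothetical response text: square brackets for a list, curly braces for objects.
text = '[{"id": 1, "title": "The Office"}, {"id": 2, "title": "Office Hours"}]'

shows = json.loads(text)  # JSON list -> Python list, JSON object -> Python dict
for show in shows:
    print(show["id"], show["title"])

# Going the other way: serialize Python data back into JSON text.
round_trip = json.dumps(shows)
print(round_trip)
```

Because the format is plain text, the same string could be parsed just as easily by JavaScript, C, or any other language with a JSON library.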
indeed here is some JavaScript that does just that same code as before
initially I declare a variable called input that gives me access to the user's
text box I listen for input like key strokes going up and down and when they
happen I call this Anonymous function uh I fetch data from the server using
the exact same code as before search question mark Q equals office or
anything else and then this is just now new code that I use to convert that
Json
data into my own HTML format be it an unordered list an ordered list a table
or anything else what am I doing I've got a variable called HTML initialized to
nothing so I've got no HTML initially I then iterate over every ID in those
shows so every one of IMDB's unique identifiers I iterate over them one at a
time and then I go into the show at its ID location and I grab its title and this
is forget this for just a moment I then take this HTML variable concatenate or
join onto it my own Li tag plus the
title plus the close Li tag and I skipped this because it got scary pretty fast
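
The conversion loop just described, including the escaping step that was skipped over, can be sketched in Python as follows. The lecture's version is JavaScript and iterates over indices; this sketch iterates over the parsed list directly and leans on `html.escape` from the standard library. The `shows_to_html` name and the sample titles are assumptions for illustration.

```python
import json
from html import escape


def shows_to_html(json_text):
    """Turn a JSON list of {"id", "title"} objects into a string of <li> elements."""
    shows = json.loads(json_text)
    html = ""  # start with no HTML at all
    for show in shows:
        # escape() replaces <, >, & (and quotes) with entities such as &lt; and &amp;
        title = escape(show["title"])
        html += "<li>" + title + "</li>"
    return html


sample = '[{"id": 1, "title": "The Office"}, {"id": 2, "title": "Tom & <Jerry>"}]'
print(shows_to_html(sample))
```

Without the escaping, a title containing an angle bracket or an ampersand could be misread by the browser as the start of a tag or an entity.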
but it turns out that if some TV shows actually have angle brackets in them that
could break my HTML entirely so it turns out you might recall super briefly
last week we had the copyright symbol using an HTML entity using the
Ampersand and the hash symbol and 169 semicolon it turns out there are
other such cryptic sequences of characters that represent otherwise
dangerous or untypable characters like this which could confuse
the computer into thinking it's at the beginning of a tag and an ampersand
which could similarly trick the computer into thinking that it's an entity which
it isn't but long story short there are libraries thankfully that handle much of
this for you for our purposes the takeaway is that now that you understand a
bit of HTTP uh now that you understand a bit of HTML CSS and JavaScript all
of which they have their roles you can use them ultimately to start
assembling your own web applications as you will for problem
set nine stitching together all of those languages and building full-fledged
web applications mobile applications or anything more and for that I
think we are all set and if the first one up here uh can have these cookies as
well we'll see you next time for our very last cs50 lecture
[Music] good afternoon my name is Sarah and my
name is Grant and we are the Harvard Krokodiloes and Pitches now Sarah and
I understand that today is the final lecture of cs50 it's been a tough
semester we made it through psets four five and even Finance now I know this
is an unpopular opinion but I particularly enjoyed Finance I spent a lot of time
with my flask the P set there was a p set well um at least things are looking
up um today is our last lecture and look how far we've come if I were an
emoji right now I'd be the face with tears of joy sorry about that we're just
trying to work some cs50 references into the uh intro I I mean uh boy I sure
hope this Tideman doesn't run off my Mario
filter you could say that for Loop x equals Open Bracket one comma 2 close
bracket boy I sure wish we had checked to see if these jokes were funny
when we wrote them any who we hope that you'll enjoy this brief [Music]
serenade your boot [Music] KN House of well there's friers Andy truck
barbecue ribs Tri [Music] the boy boy bo boy bo bo boy boy down at the
house the house of well your root and you walk on down to a knock down
Shack on the edge of town There's a l there just quit you see
fall down the house the house [Applause] of up your boots and you walk on
down at the house the House of Blue Light to your down at the of light
[Applause] good afternoon everyone we are the Harvard Krokodiloes and it is
such an honor to be here with the Radcliffe Pitches performing for cs50's final
lecture congratulations to everyone and we hope you'll enjoy this our tribute
to cs50 one two one 1 2 3 4 C is for the language I once knew O is for all
notation I must do D is for dynamic flask run and finance it is even
more than David May in canor so code is all that I can give to you to youe
debugging it since PE soon deadlines I'll make it hit compile and please don't
break it code was made by [Music] foru I for notation I must do is Dam even
more is [Music] I is for the language 0 1 0 1 0 oh is for otation 0 1 1 D is for
dynamic flas run and finance e is even more than David mail in canor so C is
all that I can give to you debugging it since PE soon deadlines I'll make it hit
compile and please don't break it code
was made by me for COD was made by me for COD was [Music] by all right
all right this is cs50 and cs50 this was the Harvard Krokodiloes and the
Radcliffe Pitches if one more time we could thank them for joining us today so
this is already week 10 our last and indeed among the goals for today are to
hopefully give you all the more of an appreciation of truly just how far you've
come recall that in week zero we began with this visual here whereby
this class was described as a bit of a fire hose whereby drinking from
that fire hose or really a fire hose from a water fountain is not unlike
getting an education down the road too and so this is to say that if you're
feeling like you didn't quite get it all down like that's actually okay and that's
to be expected and even if you felt that with each passing week 0 1 two all
the way up now till 10 it never really ever got easier perhaps just consider
that what was once hard like Mario and like getting hello world to compile is
indeed the right measure of
the delta between week zero and now in week 10 in fact you might recall that
again in week zero two-thirds of your classmates had never taken a CS course
before now of course you all have and indeed if you think back too to this
final sentiment from week zero that indeed what ultimately matters in this
course is not where you end up relative to your classmates but where you
end up relative to where you yourself began so I would take some pride take
some satisfaction take some relief even though a little bit more work does
remain at really just how far you've come since that week zero and recall
that in week zero we literally started with just zeros and ones and by now
many of you might have gleaned that these 64 zeros and ones have been
spelling something week by week in fact today is our very last message here
in binary uh encoded on stage but then quickly we introduced scratch and
we started to assemble some building blocks of programming
loops conditions functions and the like but without all the
distractions of
semicolons and curly braces and all of that which admittedly we introduced
the next week when we introduced you to C but even now that we've
transitioned to python hopefully even those kinds of Curiosities or confusions
are hopefully starting to just get more familiar and so you finally start to see
the missing semicolon as opposed to spending time on that kind of struggle
recall too that in week two we started talking already about memory and like
how you can manage things in arrays that later became of
course in Python lists the week after we talked not only about
bugs in code but how to debug those same programs
thereafter we started talking about algorithms and we took a step
back from code and looked at the bubble sorts and the selection sorts and
the merge sorts and all of the searches as well that go hand in hand with
that and indeed this ultimately is what a lot of problem solving moving
forward is going to be about just solving problems with some form of
algorithm and you have so many different
languages um in your toolkit now with which to approach problems like those
we talked thereafter about pointers which are not likely to come back in any
modern languages that you now use but hopefully you have all the better
of a sense underneath the hood of like what's going on inside of the
computer so that when you're designing something you're using something
something crashes you at least have a mental model for what's going on and
it's no longer that week zero black box as it once was I mean you built
things like this think back to week five when you built your own hash table
and those things are everywhere key value pairs whether it's in python or in
C or if it's now in CSS and JavaScript and even HTML like that principle of key
value pairs is really everywhere and so of course now code doesn't
necessarily have to look like this it now wonderfully looks a little something
more like this but eventually you're probably not going to use Python
anymore something new and better is
going to come along but odds are like a lot of the building blocks from these
past 11 weeks are still going to be useful for wrapping your mind around
those new worlds and indeed SQL we introduced you to a little bit too and even
if you don't feel yourself yet an expert hopefully you have a sense of like what
you can do with it and what problems you can solve it's of course a better
alternative to something simple like a spreadsheet and now of
course like web stuff is everywhere
whether it's on your laptop or desktop or a lot of the mobile apps that you
use on your phone even though they're native applications like you install
them from Google Play or the Apple App Store like they're implemented
increasingly with HTML CSS and JavaScript but they're put in a little
rectangular window so you don't even notice that that's actually really just
an embedded browser and then of course you can build things as you might
for your final project that too might very well be web based if you go
that route I mean I'm still clinging to like the very first like web app I ever
made years ago um but honestly I do that in part because I was just so darn
proud that like I taught myself how to do something and it actually worked
and was used by other people so whether it's just used by you or your
classmates or your roommates or your family or your company down the line
there's a great sense of satisfaction that comes despite all of the the pain
that might be along the way when you just can't see or fix
that bug now of course we'll transition as you'll see in the coming days to try
empowering you to code client side as well up until now you've been using
our own VS Code installation in the cloud which is nice because you got up and
running super fast in week one focusing only on code challenges not on
technical difficulties but among the goals now if you so choose and want to
program after this class even if you never take another CS course
you can use these same real world de facto standard
attention back to your final project there of course will be several meals
during the day culminating with 5:00 a.m. shuttles to IHOP the local
Pancake Place uh if you are so awake at that point or even if you get there uh
this is not an uncommon sight as you might recall from week zero and then
lastly is the CS50 Fair which is finally back after a couple of years now of it
not being on campus and this will be an opportunity for everyone to present
their final projects to passers by classmates faculty and staff and really
just Delight in what it is you created on your Mac your PC your phone in the
cloud or anywhere else and indeed it's just going to be an opportunity to
bring your laptop to a shared space or your phone and introduce your project
to passers by such as might appear and ultimately celebrate what you all uh
accomplished and indeed you will be handed at the CS50 Fair your very own I
took CS50 t-shirt which I dare say I'm still wearing all of these years later and
so you too will have that ahead of you as well
so for what's on the agenda today we thought we would not only look back
but look forward but first we wanted to thank so many of the team members
that have been helping both on stage and off who've made this course and
these sections and so much more about cs50 possible of course um the
building that we are now in there's a whole team downstairs in Memorial Hall
who helps us get set up and organized each day our thanks to them there's
the education Support Services team who makes
everything look and sound so well down here especially when we have all of
the more microphones as well our friends the Harvard Krokodiloes and the
Radcliffe Pitches most recently and then of course cs50's own team uh but
butter cs50's own favorite restaurant Changsho down the road indeed if
you find yourself in Cambridge for the next one two 3 four years or visiting
from out of town uh do pay a visit to our friends just down the road and in
fact we'll have our very last cs50 lunch this Friday if you're
able locally to partake and then there's cs50's own team both on stage
and off in thanks truly because not only do they make everything run so
smoothly they capture it for students here who might not be physically
present here for our friends down in New Haven at Yale and certainly for
anyone online who might be tuning in as well and then lastly wanted to
thank of course the huge team of your classmates your peers that make
cs50 possible in sections and office hours tutorials and more allow me to
share
with you the outtakes so that even we the teaching staff sometimes struggle
with computer science here are some of the clips that we captured when just
passing packets via TCP/IP a while back you saw finally the nicely
polished version but here are if I may if we could dim the lights some of
the outtakes nothing go buffering okay Josh nice [Music] Helen no oh wait that
was amazing Josh uh um [Music] Sophie amazing that was perfect B I think I
what you oh nice [Music] guy that was amazing thank you all so
good indeed and that moment if we could just one round of applause for
everyone who's helped out this semester so back in week zero uh we
introduced you of course to this idea of computational thinking which is to
think a little more methodically a little more algorithmically and by way of
these various languages hopefully that is something you notice maybe not in
the moment but in the months and the years to come that you do find that
your thoughts are indeed a little more cleaned up and you're just able to
express yourself a little more precisely and even spot illogic in someone
else's document or statements as well but at the end of the day really a
course like this is also about critical thinking and indeed rewind again to
week zero when we framed the entirety of computer science as really just this
like problem solving and any problem in the world be it CS or otherwise has
some input and we decided how to represent those inputs it needs some
output the solution there too and then all of what you focused on doing
and learning and applying these past several weeks is in that proverbial
black box which hopefully is not such an abstraction anymore but is indeed
something that you know how to harness and know what could be going on
underneath the hood even if it's in some technology or some language that
maybe we ourselves didn't cover because a lot of those first principles
remain the same now along the way we talked about the quality of solutions
to those problems we happen to focus on correctness just does it work
design
which is a bit more qualitative and subjective and then style the Aesthetics
of it all and these too are characteristic maybe not with these same words of
just how you might write or evaluate other creations in life be it physical or
written or the like so think about too as you solve problems just how you can
sort of frame for yourself like am I doing a good job or not by quantizing it
along these or perhaps other axes as well and we thought we'd highlight just
two topics from that week zero that have really been manifest for
the past several weeks namely abstraction like taking complicated things
and ideas and trying to simplify them so that we can sort of operate at this
level and like solve problems we care about without getting into the weeds of
implementation detail so to speak but there's this tension because you know
now from all of these different languages that code is fairly unforgiving I
mean even omitting a stupid semicolon sometimes breaks everything and so
precision is sort of at odds sometimes with this idea of
like a lot of hands are going up in this okay a lot of hands how about I saw
the first hand there uh yes yes who's yes who's pointing at herself now come
on down we just need the one hand for now but oh wait oh wait uh you'll be
our number two well okay we have way too many volunteers now no no
please please come down yes in the black shirt and if you guys we will okay
we'll do pair programming in just a bit if you want to hang out in the wings
here we'll have our second demonstration as well so okay
now maybe a round of applause for our three volunteers oh come on up first
oh second and third okay you come first we'll do order no this is a queue okay
queue here okay what's your name I'm Danny Danny okay take this mic okay so
we will dequeue you momentarily all right so Danny come on over to the middle
here, and in a moment I'm going to hand Danny a sheet of paper that has a
picture on it, and this picture is going to be something that I'd like you to
verbally program the audience to draw. You can use any words, any
abstractions, any level of precision that you want, but you just can't make
hand gestures or otherwise show them what to draw. But first, do you want to
tell us, including everyone here, a little something about yourself? "I'm
Danny, and I took CS50." Okay, wonderful, wonderful. So I'm going to reveal
the picture only to Danny, and if each of you would like to take out that
sheet of paper. And just make sure that no one else can see this, if you want
to hold it up this way. Everyone here is now holding their pen or pencil, and
in some number of steps, give them a verbal algorithm for drawing what you
see. And you can say anything you want, but
no gestures. "Okay, so you're going to want to draw a square in the center of
the paper with the diagonal pointing to the center of the edge. Wait, no,
actually, scratch that. Draw a rhombus in the center of your paper, and for
those who forget what a rhombus is, a diamond, a square that's on its side.
And then from the bottom vertex, draw a straight line down, but not all the
way to the edge of the paper. Okay, and then keep your pencil or pen at that
point, and you're going to want to draw a line that's parallel to the line of
the original rhombus, to the right. And then keep your pencil or pen at that
point and draw a line straight up, connecting to the side vertex. Yes. And
then go back to the line that you drew from the bottom vertex to the bottom of
the paper, and then draw a line parallel to the left edge of the rhombus. And
then keep your pencil at that point and draw a line up to the vertex of the
rhombus again. The end." The end. All right, well, thank you to Danny. Hang
on to your paper. Thank you so much, and if you want to step off to the stage
there, we will reveal. Thank you. A round of applause, if we
could, for Danny. That is not an easy task, I'm sure. And if Carter wouldn't
mind just grabbing a few samples here, let's actually take a look on the
overhead, if we could. I'm going to pop down over here real fast. We don't
need to collect them all, but if you're feeling either good or bad with what
you drew, happy to collect a few of them. Okay, okay, thank you, thank you.
Okay, hope you won't mind if I can't reach everyone. Just a couple more.
Okay, over here. Okay, all right, that's okay. This one's really funny.
Okay, I'm going to go with this one, if I may, and Carter has some too. Okay,
thank you so much. Okay, so just a random assortment here. Let me turn on a
camera so I can show you what I see. Here, for instance, is one classmate's
drawing, which might resemble perhaps what you drew. Here is the beginnings
of a house, it seems. Nice. This one, okay, this one is larger. And how
about a couple of others that were getting closer, I think. Okay, so more
edges and vertices there. This one seems a little similar in spirit, if not
proportional. This is Zach's, the best one. But it turns out, if I may,
Zach, you're not all that far off. Here, Danny, is what you were reciting to
everyone algorithmically. Indeed, it was this here cube. And so, Danny, can
you
come on back up for a moment? So if you'd like to share for just a moment,
what were some of the thoughts going through your head, and why did you choose
the words that you did? "Okay, so what was going through my head when I saw
the cube: I didn't know if I could say 'draw a cube,' so I decided to start
with the top. And so, draw a rhombus in the center of your paper, and then
draw a line down, and just do the first part, then the second part, then the
third part, and then you would get a cube, like Zach." Yeah, and so had you
said, and you could have said, "draw a cube," which
yourselves in the middle here? "Hi, I'm..." "Hey there, I'm Sadik from
Turkey. Nice to meet you all." Wonderful, welcome. And this time we're going
to flip it around, so as to have the audience do what Danny just did for us.
The only catch here is that the only means we have for showing the audience
what they need to tell you to draw is, like, literally right above the
chalkboard. So, honor system here: your eyes must stay on the chalkboard and
not look up. And in just a moment, if you guys want to both stand in front of
the chalkboard, back to the audience, and as you're talking with each other,
verbalize it through the microphone, if you will. I'm going to show everyone
else in the room a second and final drawing, and we'll just go rapid-fire
around the room. Give us one step at a time, collectively, and we'll see if
these guys can't draw exactly that same outcome. "Is there another chalk?"
What's that? "Is there another chalk?" Just the one, so you'll have to
collaborate. And let's give you a clean slate here, literally. All right, so
no looking up; that's the only rule for you guys. Here we go. For the
audience, here is what we'd like them,
ironically, to draw. Step one, from anyone in the audience? Yes. "Draw a
circle." Draw a circle. "Anywhere?" Not anywhere, not anywhere. Okay, that's
step one. Step two, someone else? Yeah. "In the middle, draw a line down from
the bottom of the circle, about halfway down." I think there was a hand in
front of you, too. Number three? [Music] Okay, okay. "The overarching goal
here, for those unable to hear, is to draw a person." Okay, it may be a stick
figure. "Draw the left leg of this person." Okay, good job. All right, next,
step four? Yeah. "Of line, up, circle." Okay. "The left, oh, sorry, to the
right, draw a V." Okay, to the right of the vertex at the bottom of the
circle, draw a V. Draw a V. "Like what V?" Nope, not interactive. Draw a V.
No, no. Well, yeah, it seems weird. Let's get ready for maybe something like
step five. Okay, we'll go with that. Step five, someone else? Step five,
someone else? Someone else? Yeah. "Draw the right side of the leg." Okay,
nice. Step six? Step six? Happy face. Six? Six? Yes. "Erase the line that
you have on the left." Okay, okay. Step seven? Yes. "Instead of that line
that was before going up, make it go down." Instead of that line before going
up, make it go down. Mhm. Okay, step eight? Step eight? Yes. "Connect that
line to the hip." Connect that line to the hip. "Not touching." Not touching.
Something like this, maybe. Okay, compromise. Not touching. Okay,
not touching. Okay, all right, step nine. Almost there, I think. Step nine?
Step nine? Yes, in back. "Write the word 'hi' on the top left of the circle."
Here. Okay, and step 10. Almost there. "Draw a line pointing to 'hi.'" So
like a speech bubble, basically? Yeah, okay. And step 11? Yeah. "Erase the
exclamation point." Nice. 12? Do we want to give them one more, 12, or are we
good? Yeah, last one. "Erase the right arm." Okay, I think we're going to
need a 13, then. And then, yeah? "Repeat the left arm, but rotate it by 90
degrees." That feels wrong. Wait, how would you, as an organic human being,
put your arms? Would you ever structure your arm like that? That would not
be a stick figure, because would you do this or would you do that? Or a
little hint, maybe. Give me a step 14. Step 14, and final. Step 14, I think
we've just got to tell them what to do. Step 14? Yes. "Think of a walking
man, and have the right hand walking to your right." All right, so it's like,
where could the hand go? Where should the hand go on that arm? But yeah,
yeah, okay. Yes, no, yeah. I mean, look, look, look, right here, look right
here. Yes. Sorry, thank you. 14 steps; that's pretty close. So
congratulations to you guys, and thank you as well. All right, so I mean,
these things, too, are not, yes, round of applause, then,
sure. So this is to say that these ideas of abstraction and precision, and
really every other term of art that we explored this term, are sort of
omnipresent and can be easier or harder to implement depending on exactly what
the problem is. But what we thought we'd do now, in our final day, is try to
similarly prepare you for life after CS50. And this is really going to be a
list of potential to-dos, so that you can stand on your own after the class,
after the class's infrastructure, and write actual code. And then we'll come
full circle one final time with our friend Jennifer 8. Lee, to look at the
world of emojis and how they relate to all forms of representation that we've
talked about up until now. So, one, how can you go about programming after
CS50? So, one, you can actually install command-line tools on your own Mac or
PC. Perhaps unbeknownst to you, Windows has what's generally called a
Command Prompt; macOS literally comes with a Terminal program in your
Applications/Utilities folder. And so even if you've never run those
programs, you've actually had a sort of blinking-cursor, black-and-white
prompt available to you. It might not have all of the same software installed
as your codespace in the cloud, but you have that command-line interface even
within today's graphical tools. And among the tools you can install within
that command-line interface would be something called Xcode on the Mac, which
comes not only with a GUI IDE, an integrated development environment, but also
those command-line tools, and Microsoft, for Windows, has something similar as
well. Learning Git: so we've used Git only unbeknownst to you, underneath the
hood, for the most part, but Git is a very, very popular tool, if challenging
to pick up for the first time, that makes it easy to push code to a website
called GitHub, or any equivalent, and then collaborate more effectively with
classmates. There's definitely a bit of a learning curve, but thanks to
CS50's own Brian Yu, you can start, for instance, with a video like this. And
this indeed is going to be one of these de facto standards in the real world,
at least for the next several years, that you'll probably encounter if you
work in tech or really any company where you're doing some programming. VS
Code itself: we'll walk you through this process in the coming days, but you
can indeed install it on your own Mac or PC. And what can you do when you
write code? Well, you can certainly write
software for your Mac, for your PC, for your phone, or of course, per week
nine, you can host your own website, be it static, as in week eight, hosting
it at websites like these, which generally have free or student-friendly
accounts via which you can put something statically on the web at a real
domain name that you might choose. Or you can host a full-fledged web app,
using student tiers on Amazon's, Microsoft's, and Google's cloud services or
others. You can sign up for, being a student, certainly a whole lot of free
software and free hosting, so as to, if nothing else, experiment, and perhaps
maximally get your own app or website up and running. So know that those are
resources available to you, and this is certainly a non-exhaustive list. If
you'd like to geek out in the coming months, in the coming years, these are
just some of the places that people who take computer science classes, who
write code, might tend to hang out and ask and answer questions of each other.
So keep an eye, for instance, on these here. And then CS50 has its own
communities, as you'll see if you go to this URL here. Via the OpenCourseWare
version of CS50, which is open to the world, there is a vibrant community,
thanks to time zones, that's pretty much active 24/7/365, talking about not
only CS50's goings-on in problem sets and projects but really technology more
generally as well. So certainly feel welcome to partake, either asking or
answering questions. Now, speaking of
asking and answering questions, a couple of weeks ago you kindly gave us a
whole bunch of review questions, which we culled through and picked out our
favorite 20 of them. These, of course, were multiple-choice questions, and in
preparation for this week, in preparation for life ahead, we thought we would
choreograph a bit of a quiz show here. And indeed, as you came in at the
start of class, you might recall being invited to go to this URL here, cs50.
Either here in person, or if you're watching live from home, at this URL here
you can use a phone or a laptop, and if it's easier on a phone, you can point
your camera at this 2D barcode here. We'll give folks a moment to pull that
up, and again, that URL was cs50. l/p. And once it looks like most folks have
it up and running, our friend Carter here will help us dive into this review
session, if you will, with a bit of fun along the way. All right, Carter, if
you'd like to take it away, what do we have as our
first question? You should see on your phone or laptop this same question
being asked. The first question is: how do you print, quote unquote, "hello
world" in Python? So among the possible answers are these here. Buzz in on
your phone or your laptop. We've got a few hundred responses already. Seven
seconds to make your decision. This is question one of 20. Go to it with some
confidence. I think we're down to zero on the clock, and Carter, it looks
like 98% of you indeed said "hello world," and, Carter, per the check mark,
that's indeed the correct answer here. Now, to make things interesting, know
that you'll see some number of points, and we've deliberately anonymized it so
only you know what number you are. So a whole lot of guests have a perfect
score of 1,000 at the moment. Hopefully we'll see over the next several
questions things start to spread out, but know that the speed with which you
buzz in will also factor into how many points you now get. So the faster you
move, the more points you get.
Question two, if we could: what does DNS stand for, from just a couple of
weeks back? Domain number system, domain name system, data numbering
structure, or there's no such thing as DNS? A few hundred responses are in.
Eight seconds remain. Fewer points now, but still a chance to buzz in. And
now, as we hit zero, the responses are these: domain name system, which is
indeed correct, and 84% of you got that one correct. And it indeed exists; we
talked about it a couple of weeks ago. So we're still seeing a whole lot of
ties at 2,000. We'll see if
someone starts to pull away before long. Question three: what is the upper
bound of merge sort's runtime? So that escalated quickly. Big O of n log n,
big O of log n, Theta or Omega of log n, or big O of 1? What is the upper
bound of merge sort's runtime? That was the last of the algorithms we saw for
sorting, and in one second we'll see that the correct answer is just edging
out everyone else. Indeed, at 46%, it is n log n. Now, if I may, as the
teacher: it can't be log n, because log n is strictly less than n, and you
can't possibly sort n elements unless you minimally look at or touch each of
them, so it's got to be at least n, intuitively. We still have a whole bunch
of ties. Let's move on to number four: what is stored in argc? Back to the
language C. Is it an array of arguments, the maximum size of an array, the
count of arguments given to a program when first run, or how much memory is
allocated to a function? Again, you wrote all of these questions, and we have
five seconds for the reveal. argc is indeed the count of arguments given to a
program when first run. Think back to C: when we did command-line arguments,
there was argc and argv. argv was the array, but argc was indeed the count,
the c in argc. All right, we still have a whole bunch of ties at the top
here, but let's move on then to number five: what is the duck debugger's
favorite hobby?
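As an aside on question four: Python has no separate argc, which makes the
relationship easy to see. The sketch below is not from the lecture, and
count_args is a hypothetical helper name. It treats sys.argv as the
counterpart of C's argv, so the counterpart of argc is simply the list's
length, program name included.

```python
import sys


def count_args(argv):
    """Return the equivalent of C's argc for a given argument list.

    As in C, the program's own name counts as the first argument.
    """
    return len(argv)


if __name__ == "__main__":
    # Run as, for example: python args.py David
    # sys.argv plays the role of argv; its length plays the role of argc.
    print(f"argc = {count_args(sys.argv)}")
    for i, arg in enumerate(sys.argv):
        print(f"argv[{i}] = {arg}")
```

Run with one extra word on the command line, the count is 2, not 1, for the
same reason argc would be 2 in C.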
The next question now is six: what is the function used to open a file in C?
fopen, open, file open, file. What is the function used to open a file in C?
Seven seconds. There are some differences between C and Python here. And the
reveal: it is indeed fopen, with 77% correct, too. All right, let's see the
rankings now. If you are guest 15, 9715, 6171, 3753, or 3273, you're now in
the lead as we move on to question seven: how does strlen compute the
average, sorry, how does strlen compute the length of a string in C? It looks
at how much memory the string uses; it counts the number of characters until
it reaches \0; it counts the number of bits in the string; or it creates
pointers for each character and counts them. Ten seconds. strlen in C:
recall that we implemented this ourselves in class, but then we used the
library thereafter. And indeed, with 85%, it simply counts the number of
characters until it reaches that sentinel, \0, a.k.a. NUL. And in this case
we have five, four, four of you tied now for first. All right, question
eight: where
does malloc allocate memory from? The stack, the heap, the pointers, or the
temp? Where does malloc allocate memory from? Responses are coming in.
Eight seconds. A good review question at that. In two seconds we'll see that
malloc allocates memory from, woo, close one, the heap is correct. The heap
is correct. The stack, recall, is where functions store their local variables
and their arguments, and that just happens automatically. The heap,
represented in our pictures up top, is where malloc draws from. Now guest 15
has made its way to the top here, but others can catch up if they don't buzz
in fast enough. So number nine: how many people flew from Fiftyville to New
York on the day of the crime? 16, 29, 8, or 3? Anyone with a laptop perhaps
has an advantage here. Five seconds, and the answers are... but the answer is
16. Let's see if guest 15 got this. They did not. Goodbye to guest 15 at the
top. All right, question 10, we're about halfway there: what are meta tags
used for in HTML? To describe a web page, to define parameters for an
element, to group
find the address of a variable in C? Think back a few weeks. Star, dollar
sign, ampersand, or ask? From one of your own classmates: how do you find the
address of a variable in C? And the number one answer is ampersand, which is
indeed the address-of operator, at 62%. Nicely done. Let's see who's at the
top of the list now. Guest 4669 has retained their lead. So we move on to 12:
what does the arrow operator mean in C, a hyphen and a greater-than sign?
Nothing, starts a comment, replaces a star and dot operator, or declares a
pointer? What does this arrow operator mean in C? Again, from a few weeks
back. Three seconds. A harder assortment, perhaps. And it's, oh, "replaces a
star and dot operator." The number two answer was indeed correct. This was
just a cleaner way, syntactic sugar, for collapsing what would be a star and
then some parentheses and then a dot into, quite simply, something that looks
like an arrow itself. All right, Carter, who's in the lead now? Still that
same guest.
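An aside on questions eleven and twelve. C is the natural home for & and ->,
but the same ideas can be mirrored, as a rough sketch only, with Python's
standard ctypes module, which builds C-compatible structs and pointers. The
Node struct and its number field are made up for illustration; each Python
line is annotated with the C expression it stands in for.

```python
import ctypes


class Node(ctypes.Structure):
    # Rough equivalent of: typedef struct { int number; } node;
    _fields_ = [("number", ctypes.c_int)]


n = Node(50)

# C: node *p = &n;   (& is the address-of operator)
p = ctypes.pointer(n)

# C: (*p).number     (dereference with *, then select the field with .)
via_star_and_dot = p.contents.number

# C: p->number       (same access; ctypes has no arrow, so p[0] stands in)
via_arrow_like = p[0].number

# Writing through the pointer changes the original struct, as it would in C.
p.contents.number = 51

print(hex(ctypes.addressof(n)))  # the address itself; varies from run to run
print(via_star_and_dot, via_arrow_like, n.number)
```

Both reads go through the pointer to the same field, which is the sense in
which the arrow replaces a star and a dot.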
And let's see what 13 has for us: which of these is not a data type in
SQLite? BLOB, STRING, INTEGER, TEXT? We used a few of these more commonly
than others, but not all of these are for real. Five seconds to make your
decision, and the results are: BLOB is a thing; STRING is not in SQLite. It's
of course called TEXT, as we've seen it. BLOB, as goofy as it sounds, is just
"binary large object," but indeed it's how you might store a binary file in
your database. All right, the rankings now. Oh, guest 8444 has edged ahead.
So we move on to 14:
which of the following is a valid
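An aside on question thirteen. One way to check which types SQLite actually
has is to ask it: the built-in typeof() SQL function reports a value's
storage class. A minimal sketch using Python's standard sqlite3 module and a
throwaway in-memory database; the sample values are arbitrary.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
db = sqlite3.connect(":memory:")

# Ask SQLite which storage class it assigns to a few literal values.
# x'CAFE' is SQLite's syntax for a BLOB literal (raw bytes, written in hex).
row = db.execute(
    "SELECT typeof(50), typeof(3.14), typeof('hello'), typeof(x'CAFE'), typeof(NULL)"
).fetchone()
db.close()

print(row)  # ('integer', 'real', 'text', 'blob', 'null')
```

Every result is one of SQLite's five storage classes, and the quoted 'hello'
comes back as text, never as "string".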
is still at the top and pulling ahead. Two final questions. 19: what is the
difference between NUL (one L) and NULL (two L's)? NUL and NULL mean the same
thing; NUL refers to \0, whereas NULL, two L's, is the zero address; NUL is
the zero address, whereas NULL, two L's, refers to \0; or NUL is NULL but
lazier? Five seconds. Subtle, not the best design perhaps to have in
technical terms, but indeed 62% of you got that NUL is the first thing we
talked about when we talked about \0, and NULL is a pointer. It's the zero
pointer. Same thing, same number, but different context. All right.
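An aside on question nineteen (and on question seven's strlen). This sketch,
leaning on Python's standard ctypes module rather than C itself, puts the two
side by side: NUL is a byte with value 0 that ends a string, while NULL is a
pointer to address zero. my_strlen is a hypothetical name for illustration,
not the C library's implementation.

```python
import ctypes

# NUL: the character '\0', a single byte whose value is 0, ending a C string.
buf = ctypes.create_string_buffer(b"HI!")  # ctypes appends the NUL terminator
raw = buf.raw                              # b"HI!\x00": four bytes, not three


def my_strlen(data):
    """Count bytes until the first NUL, the way strlen walks a C string."""
    n = 0
    while data[n] != 0:
        n += 1
    return n


# NULL: a pointer whose value is address zero, i.e. it points at nothing.
null_ptr = ctypes.POINTER(ctypes.c_int)()  # a pointer created with no target

print(len(raw), my_strlen(raw))  # 4 3
print(bool(null_ptr))            # False: ctypes treats NULL pointers as falsy
```

The number is zero in both cases; what differs is whether it is being read as
a character or as an address.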
Carter, guest 8444 is the person to beat with our final, 20th question: what
do the binary bulbs on stage spell today? And these are your four choices.
Different from usual: we usually use 8-bit ASCII; today we are using UTF-8,
which is a form of Unicode, which is the larger superset that uses one or two
or three or even four bytes to spell a single character. And the answer, wow,
close, close, is indeed a cupcake. Indeed a cupcake. Well done. And let's
see the final results. 8444 is the winner. Are they here in person, perhaps?
8444, you're 844? Come on down. [Applause] Thank you. Here you go.
Congratulations. Oh, you're
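An aside on the final question. The bulbs' exact on/off pattern isn't
recorded here, so this sketch makes an assumption: that they spelled the
single cupcake emoji, U+1F9C1. It shows how UTF-8 spends one to four bytes
per character, using sample characters chosen for illustration and a
hypothetical helper named utf8_bits.

```python
def utf8_bits(text):
    """Return each character of text with its UTF-8 bytes written as bits."""
    result = []
    for ch in text:
        encoded = ch.encode("utf-8")  # 1 to 4 bytes, depending on the character
        bits = " ".join(f"{byte:08b}" for byte in encoded)
        result.append((ch, len(encoded), bits))
    return result


# "A" is plain ASCII; e-acute, the euro sign, and the cupcake emoji need
# two, three, and four bytes respectively.
for ch, size, bits in utf8_bits("A\u00e9\u20ac\U0001F9C1"):
    print(f"U+{ord(ch):04X}  {size} byte(s)  {bits}")
```

Under that assumption, the stage would need 32 bulbs for the cupcake, versus
8 for a single ASCII letter.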
All right, so today, if we may, give me just one moment. All right, so today
we are so pleased to be joined by Jennifer 8. Lee, who's an alumna of the
college, a dear friend, and is actually really the reason why there's evidence
of Muppets in CS50. In fact, some years ago I was visiting her, and she had
on her shelf this custom Muppet. It wasn't one that appears on TV, but she
had somehow gone on the website of a former toy store, called FAO Schwarz at
the time, and you're allowed to configure your own Muppet Whatnot: choose the
eyes, the nose, the face, and the torso. And I just thought this was the
coolest thing, and so in the taxi on the way home I was going on the website
trying to purchase our very first Muppet. I then woke up the next morning
thinking, why did I just buy a puppet in the back of a taxi? And so it sat on
the shelf for really two years, and then a colleague of mine within CS50's
team decided, after I brought it into the office to sit on a shelf there, to
actually bring it to life. And indeed, if you Google around CS50 Muppet and
puppetry online, you'll see in fact these as characters, not only over the
past couple of years in COVID times, when really there was next to no one
actually here, and so they were instead. But indeed she's brought not only
this educational element, this pedagogical element, this playful element to
CS50, and we have her here today to speak to exactly the sorts of encodings
that are here on stage. Jenny is the former vice chair of the Unicode
subcommittee on emoji, which is to say that she and her colleagues have been
influential in taking emoji from what was a very limited character set early
on, and by far unrepresentative of much human emotion and speech, into really
an initiative now to capture digitally all of the world's languages, past,
present, and future, as well as the range of emotions that you might see here
in the form of that pillow or even in the cake that awaits. So allow me to
introduce Jennifer 8. Lee. Thank you. [Music] I much drink.
Okay, clicker. Hi. All right, hold on, I have to hide my drinks. I might
need more water. All right, I'm really excited to speak here. Last time,
last year, I was here, one, I was wearing a mask, which is a real bummer if
you're lecturing, and then the entire front part was all Muppets. So I'm
really happy to see humans, actually. And it's always an honor to speak at
Sanders. And then Dave and I were
Chinese women, and we like to text about food. And so I sent her this picture
of dumplings. She was like, "yum, yum, yum, yum, yum, yum, yum, yum." And
then she was like, "oh, Apple doesn't have a dumpling emoji." And I was like,
oh, that's kind of interesting, and didn't really think anything about it,
because, you know, people point things out to you all the time, and then you
just forget, you just move on. But then half an hour later, on my phone
appears this dumpling with heart eyes. And you don't see it, because it's a
still shot, but it actually had blinking eyes, so she liked to call it "bling
bling dumpling." So she, as a designer, had decided to go in and make her own
dumpling emoji, because she was like, "I'm a designer, I can fix it." But
that actually got me thinking. I was like, where do emoji come from? And how
is there not a dumpling emoji? Because from my perspective, dumplings are
this kind of universal food, right? And there are a lot of Japanese foods on
the emoji keyboard. And this was back in 2015; I was not a big emoji user at
all. So I mean, you have things like ramen, you have bento boxes, you have
curry, you have tempura. You even have kind of obscure foods, like these
things on a stick, which turn out to be fish things on a stick. Then this
pink-and-white swirly thing is also a fish thing. And there's even that
triangle rice ball that looks like it's had a bikini wax. All well
represented on the emoji keyboard, but no dumplings. And it's very strange,
because all cultures kind of have their dumpling, right? Whether it's
khinkali or ravioli or empanadas, essentially everyone sort of discovered the
idea of yummy goodness inside a carbohydrate shell, whether baked or fried or
steamed. So I was like, okay, I literally Googled, I was like,
who controls emoji? And you discover that they're actually regulated by a
nonprofit called the Unicode Consortium. And I just went on their website,
and I discovered that they had 12 full voting members as of 2015. So this is
2015, and they were mostly US multinational tech companies. It was Oracle,
it was IBM, Microsoft, Adobe, Google, Apple, Facebook, and Yahoo. And of the
three that were not multinational US tech companies, they were, let's see, a
German company called SAP, a Chinese telecom company called Huawei, and then
the government of Oman. Those were basically the 12 full voting members, the
US multinational tech companies. So they at that point paid $18,000 a year to
have full voting power on the Unicode committee, and I was like, oh, that's a
lot of money. And I kind of felt indignant about this. But then, if you kind
of keep on digging on their website, you found there was this kind of
interesting loophole, which is you could join as an individual for $75. You
don't get voting power, but it gave you the right to put yourself on the
email list and also to attend
the quarterly Unicode meeting. So I was like, I'll do that. I had no idea
what I was doing, but I'm like, I'm going to go fight for this dumpling emoji,
because from my perspective, dumplings are universal, emoji are kind of
universal, so the fact there was no dumpling emoji meant something was wrong
in the universe, and I was determined to fix this. So, you know, I was on
this email list, and then maybe even a couple of weeks later, they kind of
sent out this note that's like, "hey, who's coming to the quarterly meeting?"
And I looked at the calendar, I looked at my schedule, I was like, oh, I'll be
in Silicon Valley that time. So I basically RSVP'd, and I was like, I will be
there, and took Caltrain to an Apple building. It's a legal building in, I
think it was, Sunnyvale. So I just show up, and I don't know what I was
expecting with the Unicode. I think I maybe thought it was going to be like a
baby Congress, you know, with very formal seats, people with gavels. That is
not what I found. Basically, it was a conference room full of people who
skewed whiter, skewed older, skewed male, skewed engineer. And this is
basically the room where it happens. So this is 2015; these were the people
who decided your emoji. All very nice, and one even had a daughter who had a
sense of humor and made him a shirt that said "Shadowy Emoji Overlord." So I
just kind of listened to them debate things like milk emoji and beans emoji,
and it just seemed not quite right to me that this global visual language
would be basically decided by a small group of people inside a conference
room in Silicon Valley. So I decided to form a group called Emojination,
whose motto is "emoji by the people, for the people," and it basically
advocates for more representative, inclusive emoji. You know, we started with
a Kickstarter campaign, dumpling emoji
process, trying to, you know, right the wrong in this world, and made this
little cute video sort of advocating for one of the most universal
cross-cultural foods in the world. Georgia has khinkali. Japan has gyoza.
Korea has mandu. Italy has ravioli. Poland has pierogi. Russia has pelmeni.
Argentina has empanadas. Jewish people have kreplach. China has potstickers.
Nepal and Tibet have momos. Yet somehow, despite their popularity, there is
no dumpling emoji in the standard set. Why is that? Emoji exist for pizza,
tempura, sushi, spaghetti, hot dog, and now tacos, which Taco Bell takes
credit for. We need to right this disparity. Dumplings are global. Emoji
are global. Isn't it time we brought them together? Oh yeah, and while we're
at it, how about an emoji for Chinese takeout? So I did put together a
dumpling emoji proposal. I wrote this, I remember, Thanksgiving Day 2015, on
a plane. And actually, we got it passed: basically dumpling, takeout box,
chopsticks, and fortune cookie. I have to say, I don't think fortune cookie
would have made it on its own merits, but it kind of slid in on the coattails
of the other ones. And so these were the proposals as we submitted them, and
then these are the ones that exist now on the Apple keyboard. And I have to
say, the dumpling looks really, really realistic, oddly realistic, whereas
the fortune cookie is, I think, a big fail, because first of all, it has no
gap. It looks like a dead 3D Pac-Man. So I'm very disappointed in the
manifestation of that, but that's okay, that's okay. And
so it's kind of interesting, what is the process of getting an emoji passed,
and I will sort of walk you through it. So first of all, you come up with
your idea, right? And then you write this proposal, and then you submit it to
the Unicode emoji subcommittee, who then gives you comments and sends it back
to you, and you kind of go around and around in the circle. And so these are
things that we consider. So somewhere in there I also fought my way onto the
emoji committee and then also became a vice chair, sort of an extracurricular
that's completely run amok in my life. So, things that matter: popular
demand, is it a frequently requested emoji? Multiple usages and meanings, so
that's actually very important for something like, you know, certain animals
have meaning. So we did sloth a while ago, and that has not only the literal
meaning but also connotations. There are visually
pumpkin in, so you'd, you know, have a rainbow heart thing with a little
pumpkin stuck in the middle. So orange heart obviously should be added, and
give a sense of completeness. And then something else is existing vendor
compatibility, and so a good example for that was, many years ago, WhatsApp
decided to add the gender non-binary emoji, and then once it did that, all the
other vendors jumped on. So what kind of knocks out an emoji? So, too
specific or narrow:
so we'll often see that with like very specific animals or a very specific group
it's redundant so one year oh my God who makes that Butterball Butterball
makes the turkeys Butterball submitted um a an emoji proposal that was like
a cooked turkey but we already had a live turkey so it seem kind of
redundant have both like a cooked turkey for Thanksgiving and a live turkey
so so not visually discernible um this is a struggle for things like I don't my
friends have kind of proposed kimchi kimchi is is really hard on emoji
sizes for many reasons and part of that but part of that tension is because
it's not visually discernable then there are no logos Brands deities or
celebrities so no Nike swoosh no McDonald's M and then this is one that we
kind of decided in the last year or so which is no more flags flags are a
very complicated thing and as a result Unicode does not want to be in the
business of deciding what is a country or not a country so like you know
when you get a proposal from like Kurdistan you're like yeah so
right now the way that the emoji flags are decided is they kind of depend on
what the UN recognizes and then those get passed down to the International
Organization for Standardization and then Unicode just does that like it does not
want to be in the business of kind of you know geopolitical affairs
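For the curious, here is a minimal Python sketch of why Unicode can simply follow the ISO list for flags. As background that is not from the talk: a flag emoji is stored as two regional indicator characters that spell a two-letter ISO 3166-1 country code, and the device decides whether to draw the pair as a flag. The helper name flag_for is made up for this illustration.

```python
# Sketch: build a flag emoji from a two-letter ISO 3166-1 country code.
# flag_for is a hypothetical helper name, not part of any library.

REGIONAL_INDICATOR_A = 0x1F1E6  # code point of REGIONAL INDICATOR SYMBOL LETTER A


def flag_for(country_code: str) -> str:
    """Map each ASCII letter of the code to its regional indicator symbol."""
    code = country_code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError("expected a two-letter country code")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


if __name__ == "__main__":
    jp = flag_for("JP")
    # Whether this prints as one flag or as two boxed letters depends on the font.
    print(jp, [f"U+{ord(c):04X}" for c in jp])
```

Nothing in the text itself says which codes are countries; that decision lives in the ISO list, which is the hands-off position described above.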
once it comes out of the subcommittee it goes to the full Unicode technical
committee UTC those were the people that were in the room that I showed
you and they vote once a year basically to pass all the emoji and
Emoji Nation you know kind of has done its thing so one of the weird things is
how did Unicode this kind of nonprofit organization based in Mountain
View California end up controlling this global visual language so a lot
of it has to do with the fact that emoji started in Japan
back in the late 1990s this set from Docomo in 1999 is
widely considered sort of the first color emoji set it has been
collected by the Museum of Modern Art
and so the Japanese telecom companies would basically
have their own sets of emoji and they were different companies so they
would have different sets so you could basically only send these visual
characters to someone who was on your same carrier so it's
basically the equivalent of if you were on like Verizon you can only text people on
Verizon with emoji or if you were on T-Mobile you could only do that
so at a certain point Apple
and Google came into Japan they wanted to start selling smartphones and
they realized that it was a hodgepodge of systems and they wanted to unify
it and so in 2007 they went to Unicode and they're like okay help us unify
the emoji basically all the emoji so that we have one
standard system and part of the reason why Unicode is because Unicode
basically has this mission to enable everyone speaking every language on
Earth to be able to use their languages on computers
and smartphones so it basically unifies all written languages into one
ginormous set and that was not the case actually when I was growing up
there was a point where like if you were using Japanese on Apple that would be
different than Japanese on DOS or like Chinese or Arabic so it drove
everyone crazy and they basically decided around the late 80s early
1990s that they were going to come up with one standardized system that
sort of encoded all characters in one ginormous set so there's three
main
projects for Unicode if you care so one is encoding characters including emoji
now there are about 100,000 characters assigned so that includes
Chinese Japanese Korean Arabic Cyrillic actually all the hieroglyphics
all of the emoji a lot of things like the Bitcoin symbol or like copyleft or
whatever those are all assigned we're at about 100,000 characters even those
languages that are basically out of use so the other thing it does is it creates
localization resources so
things so that you know like oh if you're in this country
you're using the euro or you're using pesos or something so there's a lot
of localization data that is needed depending on which
geography you're using your device from or like you know
that the time is used this way or the dates are shown that way so that is
called the Common Locale Data Repository or CLDR as they call it and the
other thing they do is they kind of maintain libraries
a number back and forth and then locally your phone or your laptop
decides oh this number correlates with which image in terms of our
emoji font and then pulls it up so this is really key to know why different
emoji look different on different platforms so 2007 to 2010 it took about
three years but Unicode 6.0 came out with our first little baby set of
emoji and it just kind of hung out there for a year like it wasn't
doing anything so 2011 though Apple starts adding the emoji keyboard and
it just like explodes like I feel like in some ways emoji were not
invented they were discovered they obviously touch something very very
primal to our human desire to communicate in little
colorful glyphs on electronic devices and what's kind of
really interesting is the ambiguity that comes with what emoji kind of
mean and so this one my favorite emoji is sort of like an
upside down smiley face very very ambiguous
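The point a little earlier, that only a number travels between devices and each platform's font supplies the picture, can be sketched in Python. This is only an illustration using the upside-down face; the variable names are made up, and the standard unicodedata module only knows characters that exist in the Unicode version bundled with your Python.

```python
import unicodedata

# The upside-down smiley is one code point. Only this number is sent;
# the receiving device picks the matching picture from its own emoji font.
upside_down = "\U0001F643"

code_point = ord(upside_down)                 # the number that travels
official_name = unicodedata.name(upside_down) # the name Unicode assigns to it
wire_bytes = upside_down.encode("utf-8")      # how UTF-8 serializes that number

print(f"U+{code_point:04X}", official_name, wire_bytes.hex(" "))
```

Nothing in those bytes says what the face should look like, which is why the same message can look different on different phones.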
clearly very ambiguous because if you start typing into Google like the top
hits are like what does it mean from a guy what does it mean from a girl like
it's clearly something that a lot of people are are using in like complex
situationships between each other so one of the fun things is who
can propose emoji technically anyone can anyone here can normal
humans can we have basically a Google Doc or Google Form that we throw
up historically between April and August in the last two
years and
so this is one of our favorite examples this is Rayouf Alhumedhi she was a
15-year-old Saudi Arabian girl who was living in Vienna at the time that she
proposed the hijab emoji and then you know then she was like Time
Magazine like you know coolest teens she got like a whole bunch of different
things she got into Harvard and Stanford and she went to Stanford and
so this is a PR we got then there was a group of folks from Argentina who
got the mate emoji similarly their national drink then we
worked with
drop so there was like red wine there was like kind of that Rose um with the
like falling petal and then my favorite is actually um my friend who used a
Japanese flag as a way to indicate that she was having her period so um one
of the biggest contributors this skin tone emoji appeared I think in 2015 it
was amazing and it was proposed by a mom Katrina Parrott she is an
entrepreneur and a mom who was just like at home one day and her daughter
comes home and is like you know I wish there were emoji
that looked like me and her mom was like that's great honey what's an emoji
and so she like me I guess Googled and just figured out that uh Unicode
controlled emoji and she just came up with a proposal saying we should not
only have the yellow you know skin tones at that time everything was sort of
Simpsons yellow um it's really interesting to see how race and like nationality
are like depicted in different parts of the world so originally in Japan
everyone was yellow um but this these were the non
you know by default everyone was just like like you know human or Japanese
but they had like a couple things that were like not like one was you had a
blonde person so there's an emoji called like blonde blonde woman or
whatever that represents all westerners okay so that was one and then they
have one that's like an Indian guy with a turban so that's supposed to
represent Indian people and then there's like a like a guy with a little like um
little hat that's supposed to represent like
Chinese people so that was like that was the view Japanese view of race
which was like default or then you were like blonde Western
Chinese or Indian and that is all there was and obviously in the
United States we care a lot about race and so she came up with this
system with five skin tones and just like normal people some guy in
Germany decided that he wanted to do a face with one eyebrow raised or
as we call it the Colbert emoji and oh this one's fun so woman's flat
shoe I have to say not highly used statistically at this point but I really kind of
appreciate it because it was a mom who was very offended that all women's
shoes had heels even the sandals so this is her she had like three kids at the
time now it's four she was very fertile and she also did the
women's flat okay she also did one piece bathing suit because she
was also um offended by the fact that the only kind of bathing suit you had
was like this like little itsy-bitsy polka dot
bikini thing which is not great if you're like taking your six-year-old so I have
to say that got passed but like it didn't go over like super well with everyone
so you know Unicode because it's very public submits things for
comments and we got this comment back so one piece bathing suit why
can't a person who wants to indicate the use of swimwear use the existing bikini is this
really necessary what about a Victorian bathing costume or a wet suit or
water rings this is like literally in the
records and like do not encode and so the person who did it is actually
very impressive he's actually the person who created the middle finger emoji
and actually you ever seen the Vulcan Emoji he he the Vulcan hand emoji
he's actually very active and I have to say this is actually I think one of the
more impressive Emoji so so obviously obviously we have a lot of active
debate um sometimes you get like whole countries submitting so literally the
government of Finland as in like their
towel it's like super dicey but we wanted to help them because it
was like literally a foreign government coming to Unicode advocating
for the sauna emoji on behalf of their entire country so then this
evolved into just basically person in steamy room which is the most
sort of like the PG version of sauna no and there's no spoon they're all
dressed it's very odd but so you can see the evolution of what it started
out as what we submitted and what it ended up as so
there's a lot of like Evolution throughout the entire process um and like
companies can submit Emoji proposals too so Google actually worked on this
one I love this one okay so just to give you some context as of 2015 there
are many ways you could be or have an occupation as a male on the Emoji
Keyboard right like like for example you could be a police officer you could be
a detective you could be a Buckingham Palace guard you could even be
Santa Claus like these are so many jobs that you could have but if you were
a woman
as of 2015 there were four things that you could be you could be a princess
you could be a bride you could be a dancer or you could be a Playboy Bunny
these were the sum total of all the occupations that we could have so
there was sort of this movement at that time there was like this video on
YouTube that went viral there was like a New York Times op-ed that was like
where are the women with professions so basically they came up with a
set of emoji for professions and what's nice is
not only did women have these professions now men have them too so of
Emoji Nation emoji these are some of the ones that we've worked on I think
about 130 of the emoji on your keyboard probably came through or touched our
system in some way including I have to say microbe or virus I think I
have the opinion that every Emoji has its day right like it might not be like
today it might not be next year but I have to say virus was not doing
anything then came 2020 and that was like such a good
moment for it um along with soap we had also done soap so um you know
among the other emoji that we have worked on are sari moon cake llama like
uh teddy bear there were no toys I felt really sad for toys we have like giraffe
there was hut bubble tea bubble tea was very controversial actually I have
to say we tried to slide it in originally with the takeout box and the
dumplings and people were not having it I understand that cuz there's not
a lot like compared to like beer or wine
like bubble tea does not have a long history on this
planet but I will say that they submitted again actually kind of
originally proposing that it was not just bubble tea but but like a black ball
and milk and tea it was It was kind of cool and I I have to say there was
definitely a generational divide between like the Asian women who sit in that
room and are like this absolutely is a thing that we consume like almost like
every week of our life and people who are a little bit older who are like that
looks like a parfait how do you not know that's a parfait and we're like we
absolutely know it is uh not a parfait and so so it got in eventually so it does
sort of influence it kind of shows like who is in the room influences you know
the decisions um that get made or sometimes in the room now sometimes
more likely in the Zoom I actually I'll just say beaver emoji if you see
beaver emoji that's one of the ones I'm most proud of so that is
actually co-authored by a professor here at
Harvard who is both lesbian and was married to a woman from Canada
so it was very important to her to get the beaver emoji passed and she
promised me it would always be the first line of her bio and indeed if you
go to her Twitter handle it's like Joan Donovan creator of the beaver emoji
comma head of research at the Shorenstein Center at the Harvard
Kennedy School it's pretty impressive and then we did greens
actually greens was really interesting because people this
was also like a generational cultural thing people were like why do we need
greens we have salad and I was like we're Chinese we don't eat raw greens
cuz like you don't know where it's been or if it's clean so we cook our greens
so salad is not something that we have So eventually I got my greens and
um so that was kind of fun and then so these are some of the people who
sort of have contributed to our little Emoji Nation things including a number
of Native Americans who help get feather so why do
Moon Sun you can mix and match them which is super fun um so you know
two trees kind of make a forest oops sorry oh well okay then the um Moon
and Sun together means bright which I like um I like this one so if you stop
and you think about this so this is a basically a pig under a roof and what
does that mean it does not mean Farm as you might think it um it actually
means home or family so in the Chinese kind of structure and outlook on The
View it's like where you keep your pigs is actually where your
home is and what your farm is um so it gets kind of weird in all kinds of ways
so one of my favorite radicals so this character means woman uh KN and as I
was learning Chinese you kind of notice how it shows up so this is a
woman underneath a roof and you're like oh it means mom or wife or like
whatever like home it does not it means peace because
things are at peace when the woman is at home
underneath a roof which I always thought was a little bit odd then there
is
also uh woman plus child so you're like oh and actually specifically Boy Child
the connotation there is a little unclear so you're like a woman plus child
family mom you know whatever um it is not it means good so the standard
for goodness in ancient China was a woman who had a male child child so
that kind of just irked me growing up and then you
know three women together means evil which is like very Macbeth this
character means greed this character means slave this marriage
let's see I think this one is jealousy and this means adultery or
betrayal so like definitely not loving the way women were
portrayed on the emoji keyboard so in case you're wondering we just
came out with a kids book called Hanmoji that kind of compares emoji and
Chinese and I think they sent a bunch of books so that you guys
can do some kind of contest later on with cs50 so but the
mixing and matching is really interesting right for example the
skin tones are actually the same yellow character plus a layer of skin tone on
top of this so I kind of took my lessons from Chinese in terms of seeing how
things can be combined so there's something you should know about which is
ZWJ this is an invisible character it stands for zero width joiner and
it was actually originally created mostly for I think Arabic where you would
basically kind of force something to be in the beginning
of a word or an end of the word by kind of having this
invisible character so the rainbow flag for example is actually a rainbow plus
the white flag and we could have all kinds of fun combinations if you
look at polar bear if you have an older device or it breaks apart
it is bear plus snow which is really cute originally I had bear plus white
and then we decided that bear plus snow made a lot more sense so
another one this is new I think you guys should have it if you've updated
your phones in the last year or so so mending heart is
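To make the mixing and matching concrete, here is a small Python sketch that spells out a few emoji as sequences of code points. The specific sequences (white flag plus rainbow, bear plus snowflake, thumbs up plus a skin tone modifier) follow the commonly documented emoji sequences rather than anything shown in the talk, and the helper name describe is made up. An older device or font may show the pieces separately.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER, the invisible glue character
VS16 = "\uFE0F"  # VARIATION SELECTOR-16, asks for the colorful emoji style

# Two ZWJ sequences: separate emoji glued into one picture.
rainbow_flag = "\U0001F3F3" + VS16 + ZWJ + "\U0001F308"  # white flag + rainbow
polar_bear = "\U0001F43B" + ZWJ + "\u2744" + VS16        # bear + snowflake

# Skin tones need no joiner: the modifier simply follows the yellow base.
thumbs_up_medium = "\U0001F44D" + "\U0001F3FD"


def describe(sequence: str) -> list[str]:
    """Return the official Unicode name of every code point in the sequence."""
    return [unicodedata.name(ch) for ch in sequence]


for label, seq in [("rainbow flag", rainbow_flag),
                   ("polar bear", polar_bear),
                   ("thumbs up, medium tone", thumbs_up_medium)]:
    print(label, seq, describe(seq))
```

If a font does not know a sequence, it typically falls back to drawing the parts side by side, which is the breaking apart described above.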
because you have two genders plus a third gender plus two people plus
five skin tones plus yellow like how many emoji couples can you come
up with when you introduce this factor so underneath it it's
just a ZWJ sequence it's like two people standing together that are like glued
together now this actually gets interesting from a cs50 perspective because
in many cases even though you only see one character underneath the hood
depending on how your system works they're counting each one of these as
an
individual character so your string length actually might be five instead
of one and this kind of became a problem with things like Twitter where
things had a hard limit on length so gender inclusivity is actually one of
the things that we've been dealing most with in the last couple years so
it's kind of interesting if you think about both what a pictorial language looks
like versus the abstractness of a spoken language so because you know
we had boy and we had girl but there was no way to say a
generic child right like if you wanted to say child you had to pick
a boy or a girl but there was not a way to say just some little person and that's
really key cuz in English at least there is no gender implied by child so
how do we mimic that and it also is key for something like doctor right doctor
and teacher those don't have gender implied but when we
have them on the emoji keyboard you had to pick a male teacher or you
know a female doctor or whatnot
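Going back to the string length point from a moment ago, here is a minimal Python sketch. It assumes the glued-together couple is the sequence woman, ZWJ, handshake, ZWJ, man, and it shows how one visible emoji can count as several characters depending on what a system chooses to count. The variable names are illustrative only.

```python
# One picture on screen, but five code points underneath:
# woman + ZWJ + handshake + ZWJ + man
couple = "\U0001F469\u200D\U0001F91D\u200D\U0001F468"

# Python's len counts code points.
code_points = len(couple)

# Systems built on UTF-16 (JavaScript, Java) count 16-bit units instead;
# each emoji outside the basic plane takes two units, each ZWJ takes one.
utf16_units = len(couple.encode("utf-16-le")) // 2

# Storage or network limits are often in bytes: four per emoji, three per ZWJ.
utf8_bytes = len(couple.encode("utf-8"))

print(couple, code_points, utf16_units, utf8_bytes)
```

A hard length limit therefore has to decide which of these numbers it means, or count what the user actually sees, which needs grapheme cluster rules that Python's standard library does not provide.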
there was actually a guy at Adobe who considers himself gender non-binary
also the man behind the orange heart he fought and
got basically the first three gender non-binary emojis so
child adult and older adult so those were created and then we started
having to propagate them through actually all the occupations so these are
the gender neutral versions of many of those um but then uh we got into this
whole thing where every emoji that had a gender originally had to be
mirrored so
originally we had bearded man and then we're like okay well we actually
have to get bearded woman so that is on your keyboard there is pregnant
woman there is now pregnant man which is interesting um there is you know
woman in a bridal gown there's now man in a bridal gown um and then there
were ones that actually had to be created that were neither man or woman
so this is a merperson so there was merman there was mermaid and there
was merperson it was really interesting to figure out like how do
you draw a gender neutral merperson like a bunch of them in the beginning
actually had the arms crossed around sort of the chest monarch so there
was prince and princess and now there is monarch and one of my favorites
actually is there was Santa Claus and there was Mrs Claus and now there's
Mx Claus like the name of this character literally in Unicode is Mx Claus so I
feel like it's sort of a very official enshrining of gender non-binary in like
the world not everyone loved it New York Post did
not love this that they like you know we're we're cing into like Emoji woke
Wars so some emoji stats for you this is very fascinating this is sort of
the general distribution by far the single emoji that's used more
than anything else is the face with tears of joy about 10% of all emoji sent
is that one character and then number two is red heart and then it kind
of goes down so there's a frequency of emoji use this is sort of
done by order of magnitude so
two is half of one so it's really interesting it's a very very steep drop off after
the first couple in case you ever want to go onto the emoji kind of
Unicode website you too can see all the frequency things so I think
it's really funny so basically if it's green going this way it increased in usage
between 2019 and 2021 and if it's red going this way it dropped and so
pleading face which is a relatively new emoji just sort of shot up on the
charts whereas actually
smiling face with heart eyes kind of slipped which is interesting
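The halving idea behind the frequency chart can be sketched in Python like this. The counts below are invented purely for illustration and are not real usage data, and tier is a made-up helper name. The assumption is that tier one holds the most used emoji and each later tier is roughly half as frequent as the one before.

```python
import math
from collections import Counter

# Invented counts, for illustration only; these are not real usage numbers.
sample_counts = Counter({
    "\U0001F602": 1000,    # face with tears of joy
    "\u2764\uFE0F": 400,   # red heart
    "\U0001F643": 130,     # upside-down face
    "\U0001F418": 60,      # elephant
})


def tier(count: int, top: int) -> int:
    """Return 1 for the top emoji; every halving of the count adds one tier."""
    return 1 + math.floor(math.log2(top / count))


top = max(sample_counts.values())
tiers = {emoji: tier(n, top) for emoji, n in sample_counts.items()}

for emoji, n in sample_counts.most_common():
    print(emoji, n, "tier", tiers[emoji])
```

Binning on a doubling scale like this is one way to make a very steep drop off readable: a few emoji sit in the first tiers and the long tail spreads across the rest.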
so we just closed our Emoji proposal round for 2022 these were sort of the
breakdowns people love submitting Smileys and food and beverages animals
and nature da da da I don't I mean these are very googly colors um so what
is the future of emoji I will um I will tell you because we just had a meeting
two weeks ago so I can now publicly talk about it so historically there was
this whole idea like Unicode doesn't want to be
in the world of encoding glyphs for devices everywhere like it
was very controversial when it started doing that because mostly what
Unicode used to do was take an existing language could be dead you
know and then it would just take it and digitize it right it took languages that
existed and just digitized them and then when it kind of wandered into
emoji world suddenly it's deciding what deserves to be an emoji
what deserves to be digitized so it's trying to kind of get out of it
and they have proposals over time where it's like oh maybe we should like
come up with a a way to just send pictures back and forth where you you it's
a fixed picture and you like use a hash so that you know like we would look
at the picture and then like go do a lookup somewhere like that did not go
over well then there was actually a really interesting proposal that I kind of liked
that didn't go over well which is using something called the QID which is in
the Wikipedia world so in
Wikipedia world items all have numbers across the different language
Wikipedias so Obama human Earth they will have an ID number so that the
page in English and the page in German and the page in Chinese all know that
they're pointing to the same thing so one idea came up like
why don't we use that numbering system so we can use like Eiffel Tower
you know see the number and then people know like oh you're trying
to say Eiffel Tower that did not go over well so
both of those proposals seem dead as of yet and it's too bad cuz
and you'll see kind of what's happening okay so what's coming in 22 so these
are the Emoji that I actually sort of thought they would be on your phones by
now but cuz we're in mid November and they usually update early November
so three more hearts people love hearts wing black bird goose birds
also purple flowers jellyfish moose face donkey donkey was a little bit late for
the kind of elections ginger pea pod
wireless khanda shaking face folding hand fan that one was interesting cuz
when people first proposed it they proposed it as like an electric fan and that
didn't like who knows what electric fans will look like in 100 years because
the thing is once an emoji always an emoji they never retire so they're
always looking for things that have a long visual longevity floppy disk did not
actually do that so there's always like we don't want another floppy disk
and then hair pick is interesting so
there was a whole debate about how to convey like afro African hair like
the curly hair that they introduced a couple years ago was supposed to
do that and most of the vendors actually have it in a sort of afro way except
for Apple so there's a lot of complaints but hair pick was sort of an
interesting way it means both comb but also has sort of an interesting
historic connotation and it's been around for about 2,000 years a couple
music things maracas and flute beyond 2022 one of the
things that's going to die oops can I go back no I can't oh well we're going
to retire the family emoji they didn't go over so well there were so
many of them combinatorially if you had everything you
know all the races because you wanted to have skin tones
because you didn't want to imply that families can only be one race it was
such an ordeal essentially we're all like no one uses them and there's so
many and it's
like the fonts in terms of the load is too large so they're just going to
make them all into basically little bathroom symbol type folks
so I think that is those will disappear what what's actually really interesting
about the family Emoji is they had gay when they introduced the family they
had gay family emoji and the Russian government went berserk and actually
you can Google this in 2015 you'll see a bunch of articles about like the
Russian government considering
reason why it matters is because not all languages run in the same direction
so for example Arabic so we are used to left to right but a lot of languages go
right to left um so and this kind of It kind of changes the meaning of emoji
for example right to left I send this a lot to my friends when I'm going flying
from the Bay Area to New York um if you do it from left to right however that
is what it looks like so it looks like um you know you're in NE Bay but the
plane is still going that you know kind of up and
to the right and then now it looks like you're going from New York to the Bay
Area the other case is like oh it's a girl and she's running really fast right
that is left to right in our world sorry
that's supposed to be left to right and in here it would be she's like
behind like pollution or something like that so an example
this is actually in the proposal like in one case if it's left to right you're
running away from a line of cars and the other one it's a warning to not run
behind car fumes so they are trying to figure out like how do we mirror a
bunch of the emoji but the main thing that I think sort of I don't
know when it's going to happen I'm really hopeful is going to happen is trying
to come up with a system that supports little stickers in line that don't need
Unicode so this is like Slack or on Twitch you can embed little pictures
in line and all the vendors have
to get together and agree to come up with a standard way to do that they
have not yet come up with that but that is sort of one of the ways that
Unicode wants to back away from actually being like a global
regulator for little colorful glyphs and so if you ever need to reach me
on in my emoji world you can find me um Jenny EmojiNation org there will be
um the it will be actually a while I think before we see a next generation of
emoji showing up in it used to be like every year they would get new code
points it might be a little bit less than every year now as they work on things
like directionality over time so that uh if anyone has questions you can ask
questions you can find me afterwards I think I've I feel like there's supposed
to be some hubbub right now about maybe microphones but
maybe not but maybe I'm just done and if if there are questions or if David is
around I'm happy you know he can I'm I'm happy to to answer any questions
that folks have yes hi yes I was wondering what were
your thoughts on the Emoji Movie so the question is what are my thoughts on the
Emoji Movie you're talking about the Sony animated one yes okay my
thought on that is it is better than a 6% rating on Rotten Tomatoes would
lead you to believe so that's my one thought and my next thought is that
that was a rush job from an animation perspective that was
about 18 months whereas a typical animated movie takes four years so in
my spare time I also produce movies and documentaries so one thing that is
key
to know about about movies and animated movies and this is very important
they take a very long time but you can always fix it because you haven't shot
anything and a very good example of that is I assume you guys have seen
Frozen if you're of the age you would have seen Frozen
I do not understand why it was such
a huge phenomenon but they actually did an original cut of Frozen and it
did so I don't know if you guys know the sort of
Snow Queen thing but she's like super dark and like not fun and like kind of
evil and like not someone you want to like get behind as a character so they
actually did sort of a rough cut of that of Frozen and they came out of that
with um it's just storyboarding and they're like that is not good and they
killed it so they were like we can't go with this and then started from scratch
more or less again starting with the song Let It Go which is actually
written by a kid from my elementary school Bobby Lopez or
co-written by Bobby Lopez also fun fact I also took
the school bus with Lin-Manuel Miranda so I was a fourth grader when he
was like a kindergartener so we had a very musical Elementary School in
New York City but the thing is they could fix it because they had enough time
and have enough money not like movies where you shoot humans much
harder to fix so you have the footage that you have and you can do little
pickups but you can't fix it so essentially what happened in
that case I think it's 18 months and it could have gotten better and a lot
of the movies that you see with Pixar like it's actually sort of
emotionally similar to the movie called Inside Out but Inside Out they just
had more time and so it's better as opposed to 18 months which is not
long enough to make an animated movie good but the other fun thing is
it was the it it was so weird cuz they sold sponsorships it was like like oh my
God here comes the Bots and the malware let's go into
Dropbox and protect ourselves and was so I think that it got a lot of like bad
kind of um kind of uh vibes from the from the press for doing things like that
but from a kids perspective it's it's fine I think if um I don't know that I would
like put into my top 10 of animated pictures but it's better than 6% on Rotten
Tomatoes and then actually if you guys ever care we we have done uh I did a
documentary about Emoji so and all the people that create helped create
emoji and uh we did
a cs50x movie night I think during the pandemic was it during the pandemic
everything sort of like blurred together but it was during the pandemic yeah
are there more questions yes I wanted to know you mentioned that one of
the criteria for acceptance is that there's demand yeah that's a very good
question so the question is one of our criteria for
getting an emoji accepted is to demonstrate demand and how
do we demonstrate demand and I would say in a pretty clumsy way
actually at
this point so the main thing that you have in our in our current proposal
process is we have um a median Emoji which is elephant so elephant is like if
you stack ranked all the emoji for popularity elephant is like right there in the
middle and it's also a concept that's like universally understood across all
languages so um elephant shows up somewhere between 500 million and
700 million in Google search results like if you type it into a laptop you'll see
you know elephant 500 million
search results and generally when you're comparing
your term to elephant you want to see very roughly how many Google search
results Bing search results sometimes Instagram so actually something that was
really surprising to me was someone proposed hummingbird I think
hummingbird is uh a good proposal and um but if you look at Hummingbird
it's only like 21 million in terms of the stat so which I thought was like very
surprisingly low so that's one of the main ways that we
kind of we kind of see like is it also visually used and all of that yeah any
other questions are we good I didn't even need my water or anything oh can
I take I'm going to take a picture because now
you guys are actually human and not Muppets so I'm very very excited about
this so I will send this to my blockmates and be like I just lectured at CS
you know in Sanders Theatre my thanks to Jenny Lee thank you yeah you
can stay up here for give
weekend wherein not only are cs50's own students here in the audience but
also some family members as well now you're showing up in the semester a
little bit late we've just tackled week eight which is really our ninth week
since computer scientists start counting from zero so we've done a whole lot
of work over the past few weeks as you might have heard via emails or text
messages home including a language known here as binary so on the screen
here of course is a lot of zeros and ones and suffice it to say
let me sum up the past nine weeks with this is what's going on underneath
the hood but of course today we thought we'd make things a little more
accessible a little more broadly applicable and indeed our Focus today will
not be on what these patterns of zeros and ones represent which an astute
eye might notice are replicated visually with these light bulbs being in a
pattern on and off and as your child might have hinted before class or
perhaps now this might very well spell a word up to
eight characters long because you can encode even in the real world things
digital too but today we'll focus on things much more high level this notion of
cybersecurity like the security of our data the privacy of our systems
particularly on the internet nowadays because presumably all of us are
carrying Technologies around in our pocket using laptops and desktops every
day and so the goal today is to stipulate that this is what's going on
underneath the hood but let's solve some
that you and I are choosing for our system so as of this past year according
to one measure the most commonly used password in systems everywhere
was 123456 all right number two password in our top 10 list here was
only slightly longer 123456789 after that we took a turn in the other
direction 12345 alone after that got a little more interesting qwerty which
might sound pretty cryptic but not if you look down at your US keyboard and
it's the top left-hand row of the keys on an American
keyboard so also not all that hard perhaps not surprisingly a little
disconcertingly number five was password meanwhile number six returns us
to digits 12345678 after that really less effort 111111 after that a
little more variation but not all that much 123123 after that it's
getting even less interesting 1234567890 and then lastly rounding out the list
is just 1234567 so this is not a good top 10 list to be on so among
today's first takeaways is if you see your
password on the screen like you didn't make the list in a good way this
means hundreds thousands millions of other people probably have that
password of yours now in and of itself that's not necessarily worrisome because I
don't know who has these passwords in a room as large as this but just
intuitively why is this a bad thing either parent or child welcome to raise a
hand here why might this be intuitively yeah access to it so access to it like I
mean we literally as computer scientists now have a database
of really common passwords and your thoughts password and can just
find it out quickly yeah you can just find it out quickly I mean you could
imagine trying to guess someone's password by just typing in random letters
random numbers random words but not if you have a top 10 list like the the
adversaries in the world might as well just start with this list now you'll
notice that even absent from this are slight variants like some of you might
be thinking I'm not on the list cuz I do
something clever like I use an exclamation point for the number one or a
three for an e or a five for an S and based on the smiles in the room right
now you're not all that clever it turns out because other people are smiling
too which is to say that an adversary can take those same heuristics that you
might think are making things more secure by just tweaking some letters to
numbers or vice versa but if you're doing it and other people are doing it the
bad guys so to speak are going to be doing it as
browser and if you can't that's fine we'll share some aggregate data
nonetheless but you should have an opportunity to tap one of your answers
and we'll give folks a few more seconds if you'd like to play along at home
and here in just a moment probably have many people reporting but why
don't we go ahead and take a look at some percentages it looks like most of
you 67% are proposing just a few seconds so that's not all that good news if
it's a four-digit passcode
some of you are hoping it's a few minutes 8% are hoping a few hours and
more than 4% of you are really hoping perhaps it's a few days well let's
actually consider how we can answer this question and make today not just
conceptual but a little quantitative too and see if we can't slap some
numbers on questions like these so ultimately you can make more informed
decisions with your system security so for instance when it comes to
four-digit passcodes rather than just consider how secure it is well
let's make it a more precise question like what are the forms of attack well
the simplest attack might be just someone grabbing your phone be it in your
family or maybe at Starbucks or the airport or the like and just trying all
possible combinations maybe 0000 then 0001 and 0002 we could
maybe automate this a little bit so for instance I might potentially be able to
do something like uh roboticize this here let me go ahead and full screen a
quick video here that's just going to
paint a picture in just a moment on the screen of how if we're a really clever
adversary and know how to build things well at least maybe we could
automate some of that process so here's an Android phone on a counter
here's a very simple tripod and a little touch device robotically doing all of
that hacking for you starting at 0000 Z probably all the way up to 9999 now
that too wasn't necessarily all of that all that fast but at least you the
adversary can step away and doesn't actually have to be
bothered with the time involved the cost involved in actually hacking that
particular device well let's go one level deeper a little more interestingly and
consider here how much time really this so-called brute-force
attack would take and that's actually a term of art much like in yesteryear
when maybe there was a battering ram trying to brute force their way into a
castle or something like that a Brute Force attack digitally is just someone
trying manually all possible codes or
maybe robotically trying all possible codes but generally automating the
process in some way to go through all possibilities well if you've got for
instance a four-digit passcode let's ask maybe a follow-up question here not
how long it will take but how many possible four-digit passcodes are there
because then maybe we can do some quick math and if every passcode
takes me a second or a few milliseconds or the like then I think we can try to
extrapolate from that whether the first answer was seconds or minutes or
days or hours or
something else so how many four-digit passcodes are possible if you take out
your same device it should have just changed automatically if it doesn't
seem to have maybe reload your browser with some menu option and then
tap in here how many four-digit passcodes are possible 4 total 40 9,999
10,000 or unsure is okay too so let's see we'll give you a few more moments
how many four-digit passcodes are possible and shall we reveal the results
so now it looks like a few of you 2% of you are
saying just four passcodes 40 9,999 there's definitely some contention here
and 6% are unsure well how do we wrap our minds around this well let's just
kind of do this real simply here let me switch back over to doing a bit of math
and if we have here 10 possibilities for each digit if there's four digits each
digit can be 0 1 2 3 4 5 6 7 8 9 so that's 10 possibilities so if you think
about the number of permutations that's 10 possibilities for the first digit
times 10 for the next times 10 for
the next times 10 for the next and so if we do that out 10 * 10 * 10 * 10 or
10 to the 4th there are indeed as 66% of you found 10,000 possibilities
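
[Editor's sketch] The counting argument above can be checked in a few lines of Python. The one-guess-per-second rate is only an illustrative assumption for a person or robot tapping codes by hand, and the variable names are hypothetical.

```python
# Count four-digit passcodes and estimate a worst-case brute-force time.
# GUESSES_PER_SECOND is an assumed, illustrative rate, not a measurement.
DIGITS_PER_POSITION = 10   # each position can be 0 through 9
PASSCODE_LENGTH = 4
GUESSES_PER_SECOND = 1

total_passcodes = DIGITS_PER_POSITION ** PASSCODE_LENGTH
worst_case_seconds = total_passcodes / GUESSES_PER_SECOND
worst_case_hours = worst_case_seconds / 3600

print(f"{total_passcodes:,} possible passcodes")
print(f"{worst_case_seconds:,.0f} seconds, about {worst_case_hours:.1f} hours")
```

At that assumed rate the whole space takes under three hours to exhaust, and on average a guesser would need only about half of that.
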
and so now we can kind of work backwards and decide how long is it going to
take for an adversary to hack into this phone because if it's one attack one
guess per second well that's going to map out to 10,000 seconds but maybe
not if the adversary isn't a roboticist or a human what if they're like a
software programmer someone who's taken even an
introductory class like cs50 and learned a little bit of programming well a
little bit frighteningly it's not all that hard to hack into systems if you just
know how to code too and really have the computer do your work for you so in
fact let me go ahead and change over to another screen on my computer
here um this is different for students in the group from vs code this is just a
black and white version of it that we've used briefly in the past and I'm just
going to go ahead and create a program called
USB cable or a lightning cable surely we could figure out how to connect like
laptop or desktop to phone and just like automate the process nowadays by
just sending all of the numbers into the phone until one does the trick
just like in the movies or TV well in Python I could write a program that does
this as follows I can import so to speak all of the decimal digits 0 through 9
and this for students in the room is just a slightly better version of typing out
10 different numbers manually I can
also import from a library so to speak called itertools for iteration tools
which means to do something again and again I can import a function called
product which means the cross product like combine this with this some
number of times and then it's just two more lines of code I can use what's
called a loop in programming so for every passcode in the cross product of
all 10 of those digits repeated a total of four times let me go ahead and
rather than bother connecting my phone and hacking
my own phone let me just print out every one of those 10,000 codes on the
screen and we'll see how fast the hacker could do this let me go ahead and
print and with an asterisk which is a little trick to format it nicely I'm going to
print out each of those passcodes and that's it four lines of code maybe 40
seconds of talking but maybe really four seconds of code if I actually did this
without the audience and now let me go ahead and save the file and I'm
going to run as we do in every class of late python of
[Link] and when I hit enter I should see on the screen all 10,000
possibilities from 0000 to 9999 so let's see is it a few seconds minutes
hours or days done so barely even seconds plural if that so that should be a
little disconcerting because all that adversary needs to do is grab your phone
off the counter plug in a cable and boom they're done like there's no ticking
clock or worries as in the movies or TV that maybe you're going to come into
the room you don't need that much of a
window of time so what would be better than this well let's consider what our
options might be if we don't want to just use four-digit passcodes some of you
indeed might have better passcodes than that and maybe you use four-letter
passcodes instead so A through Z maybe uppercase and lowercase that
starts to make things a little more interesting so should we poll this question
too if we upgrade from four digits to just four letters English letters a through
z uppercase and lowercase why don't we go
ahead and poll the group here and ask how many four-letter passcodes are
there instead so this time the range starts at four still not the right answer
though this time how many four-letter passcodes are possible not just just
take a couple more seconds all right almost a couple hundred responses in
already few more seconds and why don't we go ahead and reveal now the
answers which are okay so we've solved a couple problems at least
well okay someone's just messing with us now all right so it looks like
most of you 76% of you have claimed it's 7 million plus possibilities so that's
encouraging because that's orders of magnitude more than before well
let's figure out how we might do this mathematically so if we've got 26
lowercase 26 uppercase that's 52 possibilities now for each of those four
characters so that's 52 times itself four times which indeed either off the top of
your head a good guess or a calculator on the same device you're using right
now indeed gives us 7 million instead well
what might be slightly better than that well maybe four characters and this
indeed is what your Macs PCS and phones are urging us to do nowadays not
just numbers not just letters but like really annoying punctuation so it really
looks cryptic not just to the adversary but also to you and me unfortunately
and that's the downside but here now we have a mental model and really a
computational framework via which we can evaluate the security of these
and I'll go ahead and spoil some of the math here
point out just how easy it is to make these changes instead of importing
digits as before I can import as your child might know ASCII letters which
are a through z uppercase and lowercase and I can just change this here ascii_letters
and so this was that first version where we just changed to letters let
me now rerun the code and instead of seeing numbers we'll see letters flying
across the screen and if I walk over here to the screen we'll see that by the
time I get here we're halfway through the
entire alphabet lowercase if I now start walking away I think yep we're
already done now with uppercase as well if I upgrade this slightly further let's
go ahead and take it one more level and perhaps do let's say ASCII letters and
digits and punctuation and this would be the Pythonic way to say that and
I'm going to add to those letters those same digits those same punctuation
symbols let me shrink my font just so the code still fits on the screen and
what we now have is with two seconds of changes a
program that if I run this version whoops without the typographical error this
is what we call in cs50 a bug so now we run the same this is what we call in
cs50 a second bug punctuation okay this is where I cross my fingers okay so
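
[Editor's sketch] The program described in this demo is not shown in the transcript, so the following is a reconstruction under assumptions rather than the lecturer's exact file. It uses the same standard-library pieces the lecture names (digits, ascii_letters and punctuation from string, and product from itertools); the passcodes helper and the alphabets dictionary are hypothetical names added for illustration, and the codes are joined into strings instead of printed with an asterisk.

```python
from itertools import product
from string import ascii_letters, digits, punctuation


def passcodes(alphabet, length):
    """Yield every passcode of the given length drawn from alphabet, in order."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)


# The three alphabets used over the course of the demo.
alphabets = {
    "digits": digits,                                                    # 10 symbols
    "letters": ascii_letters,                                            # 52 symbols
    "letters+digits+punctuation": ascii_letters + digits + punctuation,  # 94 symbols
}

# Print only the first few four-digit codes rather than all 10,000.
for count, code in enumerate(passcodes(digits, 4)):
    if count == 3:
        break
    print(code)

# The size of each search space is the alphabet size raised to the length.
for name, alphabet in alphabets.items():
    print(f"{name}: {len(alphabet) ** 4:,} four-character passcodes")
```

Swapping the alphabet is the only change needed between versions, which is why the upgrade from digits to letters to punctuation takes seconds to type while the search space grows from 10,000 to more than 78 million.
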
now it's going to be a little hard to see as it flies across the screen but you
probably are seeing glimpses of some weird punctuation characters as well
and I won't waste our time trying to talk through this because this is going to
take longer we're still
in the lowercase I'm still over here already we've not even gotten to n now o
then p so this is going to run longer but let's end with one final question on
the security of all these systems I'm going to cancel that by hitting Control-C
on my keyboard and let's ask the question instead if we use eight character
passwords so twice as many characters but even that is not terribly long
right this is eight characters alone on the stage eight characters using letters
numbers and punctuation
might be better let's do one final vote here if we could on your same device
how many eight character possibilities are there now for these passcodes
and now four didn't even make the list this time all right few more seconds
about 100 responses so far how about we go ahead and Carter if you wouldn't
mind let's reveal the results based on the vote a pretty decent spread here
although the quadrillions are quickly buzzing in and they're contending with
the others here looks like 44% of you said quintillion
34% said quadrillion and this time for the first time you overbid so indeed
if we go back to the math here at least the majority overbid if we have eight
character passcodes that gives us 94 times itself 8 times or 94 to the eighth
power and in fact that gives us roughly 6 quadrillion 95 trillion 689 billion 385
million 410,816 possible passcodes now what does that mean well the
adversary's algorithm the step-by-step code that they write to try to hack
into your phone is no different and honestly if your passcode is eight
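
[Editor's sketch] The 94-to-the-eighth figure quoted above can be checked directly. The attacker speed below is a purely hypothetical rate chosen to make the arithmetic concrete; it does not come from the lecture.

```python
from string import ascii_letters, digits, punctuation

# 52 letters + 10 digits + 32 punctuation symbols = 94 characters.
alphabet = ascii_letters + digits + punctuation
eight_char_passcodes = len(alphabet) ** 8

# Hypothetical attacker speed, assumed only for illustration.
ASSUMED_GUESSES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

worst_case_years = eight_char_passcodes / ASSUMED_GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"{len(alphabet)} symbols, {eight_char_passcodes:,} eight-character passcodes")
print(f"about {worst_case_years:,.0f} years to try them all at the assumed rate")
```

Even at a million guesses per second, exhausting the space would take on the order of two centuries, which is the sense in which the same loop that finished a four-digit search almost instantly stops being practical.
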
because you want someone else's passcode to be the one that the adversary
does something with just like in the physical world even though it's a bit
uh uncomfortable to consider your house doesn't need to be 100% secure
and indeed it's difficult to make it such there's always going to be a a point
of weakness maybe it's that window the door or something like that but if
your home is more secure than the next door home just probabilistically you
are more secure you're not secure and indeed any
website you see down the road that says we are secure because we do X Y or
Z like that's nonsense security is really about comparisons and evaluating
things quantitatively relative to some other system relative to some other
code so what's the takeaway here well hopefully a non-trivial number of you
will go home this weekend or on Monday and change at least one passcode
um but there's going to be a trade-off here and we talk about this all the
time in cs50 anytime we improve something we pay some price
in time and performance and cost somewhere else so what's the downside
then of this advice that you should use minimally eight character passcodes
why might you want to say nay and not do this say again you have to
remember you have to remember it right and so here there's sort of some
sociology there's some human behavior you know some of you might have
colleagues if you're working in the real world at least back in healthier times
when you had colleagues with desks and cubicles and there's
probably one person in the office with like a Post-It note on their monitor with
their passcode you know it's a bit of a cyber security offense but it's also a
sort of real world side effect maybe of corporate policies that aren't really
calibrated for human behavior so we'll see if there's some other defenses
and indeed let me propose that we talk briefly about one that actually tends
to kick in automatically even if your passcode is not as strong as we've just
seen one of these six quadrillion
possibilities well what could we do instead well has anyone and I'll zoom in on
this here accidentally locked themselves out of their own phone before like when
does that happen yeah when you try the password too many times yeah so
too many times maybe your finger's slightly off maybe you're slightly off and
you just don't input the same passcode correctly after like five times 10
times there's some reasonable threshold and why does that happen well
Apple and Google equivalently figure just
usability downsides too so security is really just about finding The Sweet Spot
among these various tradeoffs here but there's other mechanisms too and
some of you might recognize this screen from Gmail via which of course you
log in but after you log into Gmail or similar websites or apps or systems at
work nowadays especially you might be presented with what's called
two-factor authentication and what is this in a nutshell in layperson's terms
many of you if you do anything digitally at work might have to
do this now yeah exactly you get texted at your phone an additional code
that's not your same password it's typically a numeric code maybe six digits
long it expires after a minute or 10 minutes but why is this a good thing well
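
[Editor's sketch] A texted one-time code of the kind described here could be issued and checked roughly as follows. This is a minimal illustration, not how any particular service implements it: the function names, the 60-second lifetime, and the in-memory handling are all assumptions.

```python
import secrets
import time

CODE_LIFETIME_SECONDS = 60  # assumed expiry window


def issue_code(now=None):
    """Return a random six-digit code and the time at which it expires."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"  # zero-padded, 000000 to 999999
    return code, now + CODE_LIFETIME_SECONDS


def verify_code(submitted, issued, expires_at, now=None):
    """Accept the code only if it matches and has not yet expired."""
    now = time.time() if now is None else now
    return now <= expires_at and secrets.compare_digest(submitted, issued)


code, expires_at = issue_code()
print(code, verify_code(code, code, expires_at))
```

Because a fresh code is drawn for every login and goes stale quickly, guessing or stealing an old one is of little use.
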
one it's no longer just a piece of information that you know or that you might
have written down it's information that changes every time you try to log in
but more importantly it's a fundamentally second factor which means it's not
just something you know now it's something
you have so you for instance are the only one theoretically that should be
receiving that code and so now the adversary if they want to get into your
account not only have to guess or brute force or maybe read off of a Post-It
note your password they also have to physically have access now to that
phone so there's still a threat absolutely but it's not everyone on the internet
with an internet connection now it's only the people in Starbucks now it's
only the people at work now it's only the people
in your home who might have access to that second Factor so there too it
just raises the bar to the adversary making it harder more time-consuming
more geographically impossible for them to attack you but what's the
downside of two-factor authentication whether it's a device or even
nowadays it's in software whether it's on your keychain or on your phone
where you're prompted for this code what's a downside as some of us have
probably experienced too you forget your cell phone
absolutely right the the factor that you have you don't have with you or
maybe you're in a basement somewhere don't have reception you're on a
plane you can't get the code and so there too are these tradeoffs and even IT
departments need to keep that in mind because what does that mean for
them well if you don't have your phone with you and you are in the habit of
calling IT to help you fix this now there's a cost a human cost maybe even a
financial cost and so IT policy nowadays is really just
about finding the right balance and where we want to spend our resources
but at least raise the bar to the adversary but of course there's other ways
too and this is going to be one of our homework assignments if you will after
today there's this software called password managers and no need to buzz in
on your phone but maybe with a physical hand how many folks here use a
password manager okay let me ballpark this at 10 20% perhaps Okay so
we've got 80% upside here and a lesson learned potentially so
as for every other now you use the password manager software to generate
something difficult to guess for you that is you tell the password manager
give me an eight-character random passcode not 0000 but something with
punctuation with numbers with letters and better yet the password manager
as the name suggests remembers that password for you and the next time
you go to another website you do it again with a completely different
password maybe same username maybe two-factor authentication but
different password different
password different password and it doesn't have to be eight I mean I'm in the
habit of using a dozen two dozen characters in total and at that point I can't
even pronounce the number of possibilities because it goes well beyond
the quadrillions so the probability that someone's going to get into one of
those accounts for me now is very very very low and they're going to take
less interest in me and maybe more interest in someone else that's not using
as good of a password now what
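
[Editor's sketch] What a password manager does when asked for a random password can be approximated with Python's standard secrets module. This is an illustration under assumptions, not any specific product's algorithm; generate_password and its default length of 24 are hypothetical.

```python
import secrets
from string import ascii_letters, digits, punctuation

ALPHABET = ascii_letters + digits + punctuation  # 94 symbols


def generate_password(length=24):
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
print(f"{len(ALPHABET) ** 24:.2e} possible 24-character passwords")
```

Calling the function once per site gives every account its own unrelated password, so one leaked password reveals nothing about the others.
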
does this mean in real terms well when you go to log into that managed site
you don't manually type your password anymore in fact you don't generally
even need to know it nowadays I probably don't know 90-plus maybe 99% of my
passwords I entrust them to this password manager now of course you'd like
to think that the password manager itself is secure so what might that mean
well those of you who do use a password manager how do you access that
software itself what's protecting your data in your
and put it in a safe deposit box or a safe or just kind of hide it somewhere
physically that there's very low probability someone's going to find the
backup copy that might be alone but of course the flip side is now if you
forget that primary password you've now lost all of the eggs in the
basket if someone gets that primary password now they have access to
everything so that's rather the trade-off but I dare say you're probably less
threatened depending on your family
by the people immediately around you than the billions of other people
on the internet that have access potentially to those same systems so there
too it's a trade-off and it's up to you to decide whether or not to manage your
passwords in this way but if you were on that top 10 list or even if you're not
but you can think of several accounts that all have the same password you're
probably going to benefit from something like this and why is it bad to be
clear to use the same password on
multiple sites in case that's never sort of dawned on you why is that a
bad thing to reuse a password on different websites different apps any
intuition yeah hacked exactly once it's attacked the adversary
presumably by transitivity can see oh well if this user's username is malan
[Link] on this website and their password is foolishly 123456 or
even something way more complicated they can probably just assume with
high probability that if I'm being a little reckless let's try accessing
malan at [Link] on other accounts other apps using that exact same
password and so by transitivity essentially you're putting your other
accounts at risk so what's maybe a takeaway minimally here I would start to
reconsider your passcodes on your most important data maybe it's medical
maybe it's financial maybe it's email anything remotely personal that you
really wouldn't want others to have access to do you necessarily need the same
level of security on e-commerce sites or sites
that you don't really care about or that you signed up for once and after that
eh that's it probably not so you can decide for yourself but again software like
a password manager and these are just some of the possibilities out there
are probably going to be your friend a couple of these are free they come with
Windows or macOS a couple are commercial Harvard has a site license for
students for uh one of these as well so there are options out there but what
else do people use what else can people use to
wanted to send this message out to someone in this room or out on the
Internet or maybe equivalently back in the day maybe write a message down
on a scrap of paper in grade school and pass a secret note a secret love note
to someone in class with hopes that the teacher or any other students in the
class can't intercept it and read it well you probably don't want to say this is
cs50 or I love you or anything remotely sensitive but rather maybe you want
to encrypt it and let's change the T to
relatively simplistic but back in the day it's not so simplistic if you're the first
person in the world to ever use it or think of it but nowadays this is not
actually what we use but it's similarly mathematical in nature it's not quite as
simple as just adding one or subtracting one to go from what we'd now call
ciphertext to plaintext but it's similarly math that's involved and let me just
stipulate that the way the math works is that the sender and the receiver
just have to have in mind some kind of secret
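
[Editor's sketch] The add-one scheme in the note-passing example is a shift cipher, which can be written in a few lines. The shift function and the sample message below are illustrative assumptions; as the lecture says, modern encryption is mathematically related in spirit but far more involved.

```python
from string import ascii_uppercase


def shift(text, key):
    """Shift each letter by key positions, wrapping around the alphabet."""
    result = []
    for char in text.upper():
        if char in ascii_uppercase:
            index = (ascii_uppercase.index(char) + key) % 26
            result.append(ascii_uppercase[index])
        else:
            result.append(char)  # leave spaces, digits and punctuation alone
    return "".join(result)


ciphertext = shift("THIS IS CS50", 1)   # sender encrypts with the shared key 1
print(ciphertext)                        # UIJT JT DT50
print(shift(ciphertext, -1))             # receiver decrypts by shifting back

# Without the key, an eavesdropper can still try every other shift.
candidates = [shift(ciphertext, -k) for k in range(1, 26)]
```

With only 25 useful keys, trying them all is trivial, which is why real systems rely on secrets drawn from a vastly larger space.
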
and the Secret in this case would very trivially be one but it could be a much
bigger much more unguessable number or maybe some other secret we
share the presumption being that my classmates my teacher in that grade
school classroom if they don't know what that secret is that number is yeah
they could try to brute force it and try all possible mathematics plus one plus
2 plus three but that's going to take them some time and they probably don't
care enough and so my data might be therefore relatively
secure but we use encryption all the time nowadays and so for instance this
is at the start of most URLs nowadays even if you don't type it yourself with
that said Safari and even Chrome now are kind of simplifying if not dumbing
down user interfaces to just hide details that you and I as sort of normal
users don't need to see 24/7 but it is there and if in fact on your phone or
laptop you click on the URL even if it's super short initially you'll probably see
the whole thing starting with this and the s
means secure the s means that encryption is being used but there's other
forms of this not just when you visit websites there's this end-to-end
encryption which is being talked about more nowadays especially during COVID
times with so many more of us on video and talking about more sensitive
things telemedicine talking to doctors things that you also wouldn't want to
verbally or visually get out into the wild just like text what's different about
end-to-end encryption versus https and the type
of encryption that most of us use every day on websites alone end-to-end
encryption is sort of a better feature that you want to increasingly seek when
using services like Zoom or Microsoft Teams or WhatsApp or the like any
instincts here yeah over on the right good so the encryption the scrambling
of information happens at the source the sender and the destination the
receiver without a so-called middleman in between and this is actually very
different from most contexts nowadays that use just https
because when you're using https to buy something on Amazon securely with
your credit card well of course Amazon needs to be able to decrypt the
message at the end of the day and so that's fine but even when you're using
services like video conferencing or maybe text messaging nowadays well if
you're using WhatsApp that's owned by Meta and if you're using Instagram
that's owned by Meta there's a lot of middlemen in these apps that we're
using and if they were only using encryption period or only
using something like https yes your connection from you to
WhatsApp and in turn to the recipient might very well be secure on each end
of that channel but Meta in between the company and any other
company in between could theoretically for better or for worse be looking at
that data whether it's to mine it for advertising purposes whether it's to
snoop on data that you're sending that is not end-to-end encryption if the
middleman a company typically has technically access to that
data now Zoom and Microsoft Teams and WhatsApp and iMessage and other
services with which you're familiar increasingly are offering stronger
guarantees of encryption whereby it's indeed between parties A and B and
not the one in the middle now there's downsides here and you can actually
see this kind of functionality manifest in certain settings for instance besides
iMessage uh which just does this for you on iPhones or Macs besides Zoom
um you can actually fine-tune these settings indeed
within Zoom itself so here's a screenshot that I took last night of just what
the user interface looks like today to create a new Zoom meeting with the
latest version of Zoom software and maybe unbeknownst to you there's a
choice of buttons down here and most likely yours is by default on enhanced
encryption which is brilliant marketing speak because it's just encryption it's
not enhanced it actually ironically means worse than this um but they want
you using it most likely why well it's a
little easier to implement it's a little less expensive for them computationally
and to be fair enhanced encryption does scramble the data but not in a way
that Zoom can't see it Zoom can indeed see it but that's actually a plus in
some contexts because if you want to do like cloud recordings and you want a
meeting recorded not on your Mac or PC but like let Zoom deal with that if
you want automatic transcription nowadays so the words appear whether
it's English or something else on the screen well you
can't really lock Zoom or any other middleman out of that because someone
needs to save it to the cloud someone needs to translate the voice to those
English or some other language words so enhanced encryption enables those
features but it also allows a bad actor a malicious employee someone who's
just nosy at Zoom or the equivalent middleman to just kind of poke around
your video conference and hear what you've said or see what you've typed
as well unless you instead check this box so
increasingly look for mentions of end-to-end encryption or give that some
thought when you choose a technology via which to communicate with
someone whether it's within your family or without as well now last but not
least there's other applications of encryption too and this too might be a
lesson learned as well full-disk encryption so a disk is like where your data is
stored in your Mac or PC or even your phone and full-disk encryption just
means ideally that all of your data is encrypted so that
even if someone else steals that device and opens the lid
unless they have your passcode they can't even plug in fancy cables to the
device and just rip the zeros and ones off of the device and see what's
actually there full-disk encryption means they could do that but they would
just see seemingly random zeros and ones now there's a downside here too
this might slow things down potentially but it is a feature increasingly that's
offered and is absolutely something you should consider
enabling in general especially if your laptop or phone travels with you
and certainly your phone does or if you plan to donate or sell or give away a
device you don't want to leave all of the zeros and ones the remnants of your
own sensitive data passed on there so Windows has a feature called
BitLocker macOS has a feature called FileVault there's commercial options as
well but generally we're at the point now in 2022 where clicking a button is
sufficient to enable these features with
that said don't rush into all of these decisions I would make backups of your
data and don't maybe email cs50 if something goes wrong with that process
but I would do your own due diligence but this too would be a menu of
possibilities and now the bad side the downside of what seems to be great
this notion of full-disk encryption unfortunately just as we can encrypt our
data to protect it from the adversaries so can the adversaries if they get into
our devices encrypt our data and do what
not tell us that secret key and so this is generally applied in the context of
ransomware which tragically you increasingly hear about in hospital systems
school systems municipalities where systems are getting attacked and the
data is not just getting stolen because what does the adversary typically need
with like local municipal or even hospital data the value to the adversary is
encrypting all of the hospital's all of the municipality's data preventing them from
accessing it if they have no backups or the like and so ransomware is literally
about trying to convince someone to pay you money or pay you Bitcoin
or something like that in exchange for that secret key and this key in this case is
surely more sophisticated than the number one but it's really the same idea
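To make "the same idea" concrete, here is a minimal sketch in Python, assuming the simple letter-shifting cipher with a key of 1 that the lecture refers to. The function name `shift`, the sample message, and the 256-bit key size used for comparison are assumptions chosen for illustration and are not taken from the lecture. It shows only that whoever holds the key can undo the scrambling, and why a key as small as 1 offers no real protection.

```python
def shift(text: str, key: int) -> str:
    """Rotate each ASCII letter by `key` positions; leave everything else alone."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            result.append(ch)
    return "".join(result)


KEY = 1                              # the toy key: trivially guessable
message = "HELLO"                    # hypothetical sample message
scrambled = shift(message, KEY)      # encrypt: plus one
recovered = shift(scrambled, -KEY)   # decrypt: minus one, only possible with the key

# Only 25 shifts actually change the text, so the toy key falls to guessing.
# A 256-bit key (an assumed size, for comparison) has 2**256 possibilities.
TOY_KEY_COUNT = 25
MODERN_KEY_COUNT = 2 ** 256

if __name__ == "__main__":
    print(scrambled)
    print(recovered)
```

The principle is the same at any key size: the data is only as recoverable as the key is obtainable, which is why backups matter more than guessing.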
so here too yet again a trade-off just as we sort of invent something for good
it can also be used for evil so to speak as well but it's really the same
underlying principles even though we keep seeing it and hearing about it in
these different forms and lastly if only
because folks are generally familiar but don't necessarily know what it is that
it's doing for them browsers nowadays have what's often called incognito
mode or private mode which has nothing to do with encryption but does
have to do with cybersecurity or really cyber privacy keeping your data from
prying eyes incognito mode if you open it in Chrome for instance looks a
little something like this and we use it in cs50 when introducing students as
we did last week to web programming because it in effect
lets you start with a clean slate like a brand new browser that has never
visited any websites before which is good for just diagnosing problems but
it's also commonly used if you want to log into maybe your Gmail account
on someone else's computer and you don't want your password being saved
or you want to visit some website where you don't want the URL or the
search terms ending up in your autocomplete history so there's multiple
uses for incognito mode but what does it really do well it doesn't
stop your company it doesn't stop your university your internet service
provider be it Comcast Verizon or the like from knowing what websites you
go to because as you saw a couple weeks ago actually
a week ago we talked about how the internet works and unfortunately your
computer has an IP address which is a unique identifier which goes out
anytime you go anywhere incognito mode or not so this isn't really covering
your tracks outside of your office or outside of your home or outside of your
company but
it is at least throwing away local information and so we'll talk in fact in cs50's
week nine this coming Monday about cookies which you might generally
know about and what are called sessions and so long story short what
incognito mode does is it throws away when you close the window any
locally stored information like these things called cookies which are sort of
like virtual hand stamps that just remember what you've logged in as or
what's in your shopping cart or the like but it doesn't
hide any information from anyone outside of your own Mac or PC it only
prevents those local prying eyes so there too even though we have tools that
many of you are probably in the habit of using or thinking you should use to
be more private be more secure on the internet what we do really in cs50
both weeks past and future is talk about how these technologies work
so that ultimately we have all the more of an educated citizenry here
among undergrads and here as well as online so that you can apply these
same lessons
like but at least seeking out that feature at least for accounts that you really
care about your email social media financial medical anything where you'd
be embarrassed at best or really violated at worst if that kind of information
got out and then increasingly using not just encryption which you kind of get
automatically for most technologies today but increasingly choosing
technologies that offer stronger guarantees that keep those middlemen
those companies out of the way if only so that you can trust with
higher probability that only party B knows what party A has said or sent now
this of course was a whirlwind tour there's so much more that you can do
online indeed this course cs50 can be taken for free online via platforms like
edX at [Link] cs50 I thought it might be appropriate to end on this note if
anyone would like to conjecture before we start playing music and adjourn
for lunch what our final message here is if we reverse the plus one and
maybe start minus one here minus one here and