rule 1: any probability P(A), the chance that event A will happen, is between 0 and 1
rule 2: the probabilities of all possible outcomes in the sample space must add up to 1
rule 3: the probability that an event DOES NOT occur is 1 minus the probability that it does occur. this is called the complement
rule 4: two events A and B are DISJOINT or MUTUALLY EXCLUSIVE if they cannot occur at the same time in the same sample space
if and ONLY if A and B are disjoint:
P(A ∪ B) = P(A) + P(B)
A OR B = ADDITION
can't be a sophomore and a freshman at the same time
Two events A and B are independent if knowing that one occurs does NOT change the probability that the other occurs
if A and B are independent: P(A ∩ B) = P(A) x P(B)   (∩ is the upside-down U)
we should begin with the sample space
- DON'T assume events are independent
dependent events: the probability of one event changes if we know another event has occurred
conditional probability: probability obtained with the additional information that some other event has already occurred (conditional probability of B given A)
B|A is read B GIVEN A, or as event B occurring AFTER A has happened
rule 7: P(B|A) represents the conditional probability of event B occurring after A: P(B|A) = P(A ∩ B) / P(A)
A NOT HAPPENING given that B already occurred: P(not A | B) = 1 - P(A | B)
disjoint is when 2 events can't happen at the same time
other events overlap
They're like, oh no, call again tomorrow. I was like, oh, I didn't have to get into class. But so yeah, sorry. Hopefully you watched the video and were able to grasp most things.
I probably said this in the video, but I'll say it today several times.
These first two lectures of this unit, I realized they were a little dry.
It's a lot of A's and B's, probability stuff flying around.
It seems sort of nebulous. But come Friday next week, we're really going to ground it in some real public health medical stuff.
And I think it will mean a lot more to us then. Some other professors who teach this course spend a lot of time on these probability rules, but I just kind of scratch the surface on them and make sure we know enough of them so we can get to actually using them.
And some of the stuff we'll go over next week, I think, shows up on the MCAT, which I guess is nice for those MCAT people.
We have a lab due Monday night. We have another lab due Friday. And if I remember correctly, the lab due Friday is the one that tends to be tricky for people.
I actually haven't looked at it since the summer, but I'll probably pull it up at the end of class today.
I might take some questions on it. This lab is on data cleaning and as an analyst data cleaning is actually the hardest part because it involves kind of the most meticulous coding.
So you get data from, let's say, maybe it's a clinician, maybe it's a couple of study sites.
It all comes to you, you, an analyst, the biostatistician, epidemiologist, whatever.
You have to make sense of this data, right? Or if you're the data scientist in New York, not in public health.
And oftentimes data comes in like nasty formats and you really have to do a lot of like trickery with the coding to get it into a workable data set that you can actually make sense out of or answer your question with.
So this lab, I think people have a lot of issues with it because some people don't understand what it's actually asking you.
But it's essentially asking you to change a lot of different variable types.
Like with R, there's a few variable types and sometimes it doesn't get read in as the variable type you need it to be.
So you have to convert it as such. It's quite annoying, data cleaning. A lot of people think that as a data analyst you just come in crunching numbers, but you actually spend most of the time getting the data into working order.
So we'll look at that a little bit in class. But for now, I just want to kind of bridge in with a couple of review bullet points from the video that you hopefully watched Monday or yesterday.
So we had our first like three or four rules we went over.
I'm not going to go into all the examples of them, but I will talk about them briefly.
Rule one, any probability is a number between zero and one.
So in this class, for the rest of this class, if you calculate a probability and for some reason it shakes out as less than 0 or greater than 1, you've got a problem, right?
So any probability is going to be between 0 and 1. Zero would mean that it never happens.
One would be that it always happens. Rule two, all possible outcomes together must have probability of 1.
This one refers to something that starts with an S, sample space, right?
So all possible events are going to add up to one and that's going to be our sample space for whatever phenomena that we are studying.
Like, let's say we're studying the survival of cancer in a certain group. The options would be survived or did not.
That would be our sample space. Those are the two possible outcomes in that sample space.
And we usually denote that with this big S. This next one's super important. This is going to come up time and time again for the rest of the semester.
And this is the complement rule. The probability that the event does not occur is just 1 minus the probability that it does occur.
So there will be a lot of instances maybe in homework, maybe even down the line, we're actually calculating the statistics, the values, where you'll have to kind of use this complement.
So if an event occurs 70% of the time, it fails to occur 30% of the time.
And once again, we call this the complement. And so when you see a letter or maybe a word with a line on top of it, you wanna read that as whatever it is, complement.
So if you see like, let's say survival, what would the complement of survival be?
Yes, right, passing away, in a survival study.
So that's kind of morbid. So A complement: for any event, the probability of not A is 1 minus the probability of A. And I know we didn't meet on Monday, so I'll just talk through this real quick too.
When you see this P in parentheses next to it, that just means the probability of this happening, whatever's in parentheses.
Could be a capital letter, that's one way we denoted it a lot.
It could also be some actual instance, like the probability of flipping heads on a coin and whatnot.
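The complement rule is easy to check numerically. Here is a minimal Python sketch using the 70% example from above; the function name `complement` is just an illustrative choice, not something from the lecture.

```python
def complement(p):
    """Return P(not A) given P(A) = p, after checking Rule 1."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must be between 0 and 1")
    return 1 - p

p_occurs = 0.70                 # the event occurs 70% of the time
p_fails = complement(p_occurs)  # so it fails to occur 30% of the time
print(round(p_fails, 2))        # 0.3
```

The range check is Rule 1 in code form: a value below 0 or above 1 signals a mistake somewhere upstream.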
And then finally we get to rule four. So two events, A and B, are disjoint or mutually exclusive if they cannot occur at the same time in the same sample space.
So what this means is, let's say we are running a study and we are looking at what year people in this class are in.
And so we have four options: freshman, sophomore, junior, senior.
For the purposes of the study, we're not having somebody be both a freshman and a sophomore.
Those are mutually exclusive. Okay. So if that's the case, if that's the type of thing we're looking at, we can calculate the union of two probabilities.
That is the probability of being a freshman or a sophomore.
We write this as the probability of A cup B. And we just add, right? So whenever you see probability of A or B, or anything with an or in between them, what operator are we thinking about?
Addition, right? And this little mathematical symbol is called cup.
It stands for union. And this is the addition rule for disjoint events.
And disjoint is a weird word, but all that disjoint means is that the events can't occur in the same sample space at the same time.
You can't be a freshman and a sophomore at the same time.
Not everything is going to be so nice. There's going to be some instances where we have some crossover happening.
And in that situation, it'll get a little more complicated.
But we'll get to that today. I think this is where it ended, right? This is as far as the lecture went. So this is the new stuff now. It's a good place to stop and say if you have any questions about the lecture from Monday.
Cool. All right, so does anybody remember what independent events mean?
I think that did end up in the lecture, Monday's lecture.
What are independent events? What does that mean? One event happens, and then the next event is completely independent of it.
Exactly, yeah. So one event happens, and the other one does not depend on it at all.
Like, let's say we were tracking the sex of infants in the hospital. If a pregnant person has a female baby in one room, and a pregnant person has a male baby in another room, those two things are not going to be connected at all.
They're going to be truly independent events. So if, and only if, two events A and B are independent, we can calculate their intersection, that is, the probability of A and B, by simply multiplying those two probabilities together.
So this little cap, right, sort of looks like an N.
When you see this, what operator are you going to think about doing?
Multiplication. All right, so when you see this, you can read it as the probability of A and B, and that's just talking about where these two intersect.
If you have a probability of A and a probability of B, any time you calculate an intersection, that intersection is not going to be bigger than either A or B.
Because probabilities are all fractions between 0 and 1. So if you multiply them to get that intersection, it can't come out bigger.
So a really simple example. If we toss the coin twice, what's the probability that we would get heads both times?
Say again? I think you said it right. It's one fourth, right? Yeah. So how we would get that is we would do 0.5 times 0.5, and we would just get to 0.25.
So we can take this intersection rule And if we have truly independent events, we can extend it to several other independent events.
So right now we're just talking about two, A and B.
Well, let's say we have A, B, C, D going all down the line.
We can multiply all these together. And you can imagine that probability is going to get very, very small for that intersection as we multiply in more and more events.
And this is something I don't think we'll really deal with too much in our homework or whatnot, but it does exist.
And, actually, we will deal with this when we talk about discrete distributions.
So for this, let's say before we looked at tossing the coin twice and the probability of getting heads twice. Well, now we're going to do three times. Same thing: 0.5 times 0.5 times 0.5 is 0.125. Notice it gets even less, right?
So we're looking at a smaller intersection. Kind of think about it on that Venn diagram: the area on the Venn diagram where they all cross is going to be even smaller.
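The multiplication rule and its extension to several independent events can be sketched in a few lines of Python. The helper name `prob_all` is an illustrative assumption, and the rule only applies when the events really are independent.

```python
from math import prod

def prob_all(independent_probs):
    """P(A and B and ...) for events ASSUMED to be independent."""
    return prod(independent_probs)

two_heads = prob_all([0.5, 0.5])         # toss a fair coin twice
three_heads = prob_all([0.5, 0.5, 0.5])  # toss it three times
print(two_heads, three_heads)            # 0.25 0.125
```

Each extra event multiplies in another number between 0 and 1, which is why the intersection keeps shrinking.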
Okay, so maybe difficult because not a lot of people showed up, but we're going to try it.
We have a little aside here. So how many people do you think we need to put in a room before it becomes more likely than not that at least two people in the room have the same birthday? And that's the same month and day, not the actual year.
So how many people do we need to have for it to be just like 50-50, like a coin flip, that we'd have two people with a birthday on the same day?
It would be like if you do 24 for 12 months and then...
You're getting on the right path, right? So what are some things we might need to consider with this?
For it to be two people, it has to be twice as likely.
So you need at least twice as many from each month and each day, you said?
Yeah, we want someone to be, we want just two people, same day.
Just throw something out, let's just start, no wrong answers here.
So, say, maybe 365, all right? So 365 people, for this to be a coin flip, for it to be more likely than not to have two people.
So what do we think about that? Maybe we need double. I'm thinking 754 because that's 365 times 2 and then 12 times 2.
Okay, so we have 754, 365, what else? Anybody, any other ideas? So I think you all are considering some great things.
How many days we have in a year? Kind of starts getting at the sample space. How could we go about actually answering this question?
I'm going to walk you down how we would do this. So, think of this as the sample space, right? And within the sample space, we'll have the probability of two people having the same birthday, right?
And then we have the probability of everybody having a different birthday.
And essentially what we're trying to find is the point at which that becomes 50-50, right?
That is like 50-50 chance whether we have two people with the same birthday or not.
So this probability that any two people in a room have the same birthday is gonna be pretty difficult to figure out.
There's a lot going on there. The probability that everybody has a different birthday, I think we can start chewing on that for different numbers, right?
You're both getting at the thing that we're trying to do here, but it's a little more complicated.
So how could we find the probability that everyone in a room has a different birthday?
How many days are in a year? 365. 365, right? So the probability that two people have different birthdays would be 365 over 365, that's the first person, who can have any birthday.
Then the second person can't share that birthday, so that's 364 over 365. Right, a pretty high probability that they would have different birthdays. We don't expect just any random two people to have the same birthday.
So think about that extension of the multiplication rule.
I'm going to consider that people in the biostatistics class here at UGA aren't being selected or registering for this class because they have similar birthdays or anything, right?
Everyone just needs to take a stats class. So how could we extend this to more people? What if we wanted to figure out the probability that five people have different birthdays?
365, 364, 363. You got it. So this is four people here. So the probability that persons one, two, three, and four have different birthdays.
We extend the algorithm and we get to 0.98. And if we went one to ten, we would just do the same thing, right?
Person one, two, three, four. And keep in mind, this is probability that everybody has a different birthday, right?
We're gonna have to use another rule in a second here to get the probability that two people have the same birthday.
Which rule are you going to have to use for that? Ah, exactly. So I'll ask again, how many people do you think will need for this probability to reach 0.5?
Let's find out. So it's only 23 people. So if you actually do this algorithm all the way out, you get 0.4927.
And then using the complement rule, we just do 1 minus that, which is about 0.507.
So, when you have 23 people in a room, it becomes a coin flip on whether or not we have two people with the same birthday.
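The calculation walked through above can be sketched as follows. This is a Python illustration rather than the R script used in class, and it assumes 365 equally likely birthdays and people who are independent of one another; the function names are illustrative.

```python
def prob_all_different(n, days=365):
    """P(n people all have different birthdays): 365/365 * 364/365 * ..."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

def prob_shared(n, days=365):
    """P(at least two of n people share a birthday), via the complement rule."""
    return 1 - prob_all_different(n, days)

# Smallest room size where a shared birthday is more likely than not.
n = 1
while prob_shared(n) <= 0.5:
    n += 1

print(round(prob_all_different(4), 2))                               # 0.98
print(n, round(prob_all_different(n), 4), round(prob_shared(n), 4))  # 23 0.4927 0.5073
```

Working with "everybody different" first and then taking the complement avoids having to enumerate every way two or more people could match.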
So let's see this in action. Well, we have to go use our software. I actually just downloaded R on this computer for the first time.
I had not even used it yet on this computer. So here's a little algorithm that's going to walk us through this.
So how many people do we have in class today? Anybody want to do a quick count? 17, 18, maybe? So 18, including me. It's probably not going to work. It's OK. So it's a little vector, number of people in the room.
This is a numeric vector to create the probabilities.
And this is a crazy algorithm that we've put together, but it'll pretty much do what was being done on that sheet with all those 365s and so on.
So we run this code and solve for the probability. We are at a .35 chance that we have two people with the same birthday.
Usually when I do this, we have classes of 40, so I don't have a problem clearing the 23 thing, so it usually works.
For some reason they gave me a class of 28, maybe it's because they put me in this funny small room.
But, who was born in January? February? March? Maybe we're on to something. What day? Eighth. Second. Fifth. Ninth. We were so close. We're only at a 34% chance. April. Okay. May. June. Fourth. Sixth. Close again. July. Okay. August. September. We have three people. All right, go ahead. November. December. That's okay.
So, usually there's more of a payoff here. One time we had three people who all had the same birthday.
And it was a class of, I think, 35 people showed up that day.
It was a great day for mathematics at the University of Georgia.
And I like to think I'm pretty good with segues in this class.
Even though I've been doing this activity for years, at this point I still don't know that I have a good way to wrap it up.
So usually I can say AMF, but I can't even say that now because it didn't work.
So sorry for wasting time. But yeah, my first guess when I first saw this activity was something like 365, 700 or whatever, or maybe even up into the thousands. You think about getting a big group of people in a room and it just seems like there's so many days in a year.
But, surprisingly, with 23 people, having two with the same birthday is pretty likely.
I think maybe there's been one class I've had with over 23 people where it hasn't happened.
In the 40s it always happens because, I'll show you this real quick, if I change this vector to my full class of 40, or the full classes I usually have, it becomes about 90%.
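The class script itself is not shown here, so the following is only a Python stand-in for the "numeric vector" idea: build the running product once, then read off any class size. The names `factors` and `all_different` are assumptions, as is the 365-equally-likely-days model.

```python
from itertools import accumulate
from operator import mul

DAYS = 365
MAX_PEOPLE = 40

# factors[k] is the chance that person k+1 avoids the k birthdays already taken.
factors = [(DAYS - k) / DAYS for k in range(MAX_PEOPLE)]
# all_different[i] is P(the first i+1 people all have different birthdays).
all_different = list(accumulate(factors, mul))

def prob_shared(n):
    """P(at least two of n people share a birthday), for 1 <= n <= MAX_PEOPLE."""
    return 1 - all_different[n - 1]

for n in (18, 23, 40):
    print(n, round(prob_shared(n), 2))  # 18 0.35, then 23 0.51, then 40 0.89
```

This matches the figures quoted in class: about 35% for 18 people and roughly 90% for a full class of 40.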
So it's usually wild. Yeah, it's wild. I don't really know what else to do with it. So you guys have to say, hey, it's wild, and then we have to move on. All right. Back to rules.
OK, so if we roll the die and flip the coin, what's the probability of getting a 3 on the face of the die and getting a head on the coin?
Again, we're talking about the intersection of events.
So what type of operator are we going to be using here?
Multiply, right? Yeah, I'm not familiar with the definition yet, but I was going to say just multiply.
You got it. Yeah. So what's the probability of getting a 3 on the face of the die?
Three over six times 1 over 2. So work on that first one. Wait, wait, wait, sorry. One over six times one over two. Yep, so for a number on a die, it would just be one over six, right, six sides on a die.
And the probability that the coin turns up heads is one over two. So these are independent events, right? What happens with the coin and the die isn't going to influence each other.
So we can just multiply those two things together and we get one in 12.
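As a quick check on that one in 12, here is a small Python sketch that applies the multiplication rule and then confirms it by listing the sample space. It assumes a fair six-sided die and a fair coin; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
coin = ("H", "T")

# Multiplication rule, assuming the die and the coin are independent.
p_three = Fraction(1, 6)
p_heads = Fraction(1, 2)
p_both = p_three * p_heads

# Cross-check by listing the 12 equally likely (die, coin) outcomes.
sample_space = list(product(die, coin))
favorable = [outcome for outcome in sample_space if outcome == (3, "H")]
p_by_counting = Fraction(len(favorable), len(sample_space))

print(p_both, p_by_counting)  # 1/12 1/12
```

Starting from the sample space, as the notes suggest, gives the same answer as the formula.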
A little aside, I think I said this in the video before, but I always get students during this part of the course who really overthink some of these probabilities.
They'll ask, what if the coin lands on its side perfectly, you know, and I definitely encourage that type of hyperanalytic thinking in life.
But for this part of the course, you can take a lot of these things at face value in these controlled probability problems, and don't waste your time overthinking stuff like this.
I think that's pretty funny because wouldn't there technically be an infinite number of probabilities with that case?
Yeah, it would be, it would get up there. Yeah. So independence is tricky. You can't always just assume two events are independent.
Sometimes you can apply logic and reasoning, like with the coin flip, roll a die thing.
But really, if you do not know that two events are independent, you have to assume that they're not, and you can't use this formula.
This is a funny day because I showed you all these formulas on Monday and now I'm showing you when you can't use them.
But sometimes we have two events that aren't independent.
So sometimes the probability of an event can change if we know some other event has occurred.
So these are dependent events. And then the probability of the second event needs to take into account the first one.
So can anybody think of any examples of dependent events?
The marbles, yeah. We did a little intro to probability with the marbles. We did a little thing at the end of the lecture, the probability of getting certain color marbles out of the bag.
If you take one out, you don't put it back in. It's going to change the probability of your next selection.
So this next example is going to be sort of similar to that.
A biologist experiments with a sample of two vascular plants and four non-vascular plants.
The vascular plants are denoted by this V, and the non-vascular plants are denoted by N.
She wants to randomly select two of the plants for further experimentation.
It's assumed that the selections are made without replacement: after a plant is selected, it is not put back in the pool.
So take the plant away, not going back. Find the probability that the first selected plant is non-vascular and the second is non-vascular.
So how do we go about this? What operator are we still going to be using? We're still going to be multiplying. And what will we multiply? What's the probability that the first selected plant is non-vascular?
Two thirds. And what's the probability that the second plant is non-vascular?
You got it. So, since the two plants are selected without replacement, they are dependent.
On the first selection, four out of six plants are non-vascular.
And after selecting a non-vascular plant on the first selection, we are left with five plants, including three that are non-vascular.
So we do four out of six times three out of five, and we get 0.4.
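The plant problem can be sketched in Python two ways: with the multiplication rule for dependent events, and by brute-force counting of every ordered pick. The list layout and variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

plants = ["V", "V", "N", "N", "N", "N"]  # 2 vascular, 4 non-vascular

# Multiplication rule for dependent events: P(A) * P(B | A).
p_first_n = Fraction(4, 6)
p_second_n_given_first_n = Fraction(3, 5)  # 3 N left among the 5 remaining plants
p_both_n = p_first_n * p_second_n_given_first_n

# Cross-check: enumerate every ordered pick of two distinct plants.
picks = list(permutations(range(len(plants)), 2))
both_n = [(i, j) for i, j in picks if plants[i] == "N" and plants[j] == "N"]
p_by_counting = Fraction(len(both_n), len(picks))

print(p_both_n, float(p_both_n), p_by_counting)  # 2/5 0.4 2/5
```

Using 4/6 for the second pick as well would treat the draws as independent and give 4/9, which is wrong here because the pool shrinks after the first selection.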
We have to take into account that something has changed. What has changed in this problem? And I'm looking for a word that we introduced Monday and reviewed in the lecture today.
Sample space. Sample space. So the sample space has changed. So we have to account for that. Any questions on that? All right. A little bit more on dependent events. If two events are dependent, or if you're not sure, you have to use a formula that accounts for the dependence kind of like we did before.
So if we're looking for an intersection, we can't just multiply the two probabilities.
Let's say you think A and B are dependent.
We multiply the probability of B given that A has occurred times the probability of A, or equivalently, the probability of A given that B has occurred times the probability of B.
So this line, what this stands for, is a conditional probability.
And I'm going to get to that in a second, so hold on real quick.
But in terms of the last problem, it's the probability that the first selected plant is non-vascular and the second plant is non-vascular.
You could write it out like this: probability of B given A times probability of A.
So essentially a conditional probability accounts for that sample space changing.
So the conditional probability of an event is a probability obtained with the additional information that some other event has already occurred.
So the conditional probability of B given A can be found by assuming that A has occurred.
So we're operating in this new world where A has occurred and we calculate probability of B occurring under that world.
So an example of this would be, let's say we had a data set of some cancer outcomes.
What's the probability that someone survives given that they have been diagnosed with pancreatic cancer?
So in that situation, we would stratify, we would parse out our data set to just those with pancreatic cancer.
And then we'd calculate the probability of surviving pancreatic cancer.
And the important thing I kind of want you to take away for today is when you see this line, right, this B vertical line A, I want you to read that as B given A or even further, event B occurring after A has already occurred, right?
So probability of B happening after A has already occurred.
So there's sort of a lot to unpack in that line. And once again, that's called a conditional probability.
And that brings us to rule 7. And I promise you there's only a few more rules. Probability of B given A represents the conditional probability of event B occurring after it is assumed that event A has already occurred.
So the probability of B given A. And we can calculate that as the probability of the intersection of A and B over the probability of A.
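Rule 7 can be checked against the plant example from earlier, with A as "first plant is non-vascular" and B as "second plant is non-vascular". This is a minimal Python sketch; the helper `prob` and the event functions are illustrative names.

```python
from fractions import Fraction
from itertools import permutations

plants = ["V", "V", "N", "N", "N", "N"]            # 2 vascular, 4 non-vascular
picks = list(permutations(range(len(plants)), 2))  # 30 equally likely ordered picks

def prob(event):
    """Share of the ordered picks for which `event` is true."""
    return Fraction(sum(1 for pick in picks if event(pick)), len(picks))

def first_is_n(pick):   # event A
    return plants[pick[0]] == "N"

def second_is_n(pick):  # event B
    return plants[pick[1]] == "N"

p_a = prob(first_is_n)
p_a_and_b = prob(lambda pick: first_is_n(pick) and second_is_n(pick))
p_b_given_a = p_a_and_b / p_a  # Rule 7: P(B | A) = P(A and B) / P(A)

print(p_a, p_a_and_b, p_b_given_a)  # 2/3 2/5 3/5
```

The result, 3/5, is the same "three out of the five remaining plants" used in the multiplication, so the formula and the shrinking-sample-space reasoning agree.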
Once again, on Friday, we're going to ground this in some tables with some real data.
And we're going to look at how this works within contingency tables.
In two-by-two tables, I think it's going to make a lot more sense.
I think the calculations are easier when we look at real data in real tables.
So what's the probability that someone survives given that they have been diagnosed with pancreatic cancer?
Once again, the denominator here would be that thing that's conditioned upon.
It would be only those diagnosed with pancreatic cancer.
And just like with regular events, we have those complements, right?
The complement is the probability of something not happening, just one minus the probability of it happening.
We can have complements for conditional events as well.
It just looks a little funny and a little bit trickier.
So with conditional probabilities, you can't change the event that's already happened.
So the complement rule is modified a little bit. The complement would be the probability of A not happening given B has already happened.
It's just one minus the probability that A has happened given B has already happened.
So sort of a labyrinth there, but going back to our example, what's the probability that someone passes away given that they have been diagnosed with pancreatic cancer?
So the denominator would still only be those with pancreatic cancer, but we would just do one minus whatever we calculated before.
So it would be the complement of that. Does that make sense? Any questions about that? All right. I think this is the last rule. It's not the end of the lecture, it is the last rule.
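To make the conditional complement concrete, here is a small Python sketch. The counts are made up purely for illustration and are NOT real survival data; the point is only that the denominator stays fixed at the conditioned-upon group.

```python
from fractions import Fraction

# Made-up counts for illustration only (NOT real survival data).
counts = {
    ("pancreatic", "survived"): 2,
    ("pancreatic", "died"): 8,
    ("other", "survived"): 30,
    ("other", "died"): 10,
}

def conditional(outcome, given_diagnosis):
    """P(outcome | diagnosis): the denominator is only that diagnosis group."""
    in_group = sum(n for (dx, _), n in counts.items() if dx == given_diagnosis)
    return Fraction(counts[(given_diagnosis, outcome)], in_group)

p_survive = conditional("survived", "pancreatic")
p_die = conditional("died", "pancreatic")

print(p_survive, p_die, 1 - p_survive)  # 1/5 4/5 4/5
```

The event after the bar never changes: both probabilities are computed among the same ten hypothetical pancreatic cancer patients, so they sum to 1.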
So we talked about disjoint events before. What does disjoint events mean? So to go back to the example we talked about at the beginning of class, survey everybody in the class, see what year in college we're all in.
We can't have, can't be a freshman and a sophomore together, right?
So completely cannot be correlated? Yeah, they can't, you can't like, you can't be in both groups, right, at the same time.
So that's disjoint. Well, not everything is disjoint. Some things are going to be non-disjoint. Some things are going to have some overlap, right?
So we're going to talk about how we do a union of events when that happens.
So probability of A, cup B, that tells you to just add the two, right?
A or B. But if we have an intersection between the two and we add them, we're going to double up on that intersection.
We're going to end up with too much area. So we have to kind of alter our formula a bit when we have non-disjoint events.
And non-disjoint events, again, is when there's some sort of crossing between them.
So the union rule for any two events, disjoint or not, is given by the probability of A plus the probability of B, and you have to subtract the probability of that intersection.
So when you subtract that intersection, you're accounting for this area that's getting added twice.
The reason you can use this for disjoint or not is, if there is no intersection, you can just zero this out.
And just add the two. So, I think this does the best job showing it. And it's called a line. Sorry. So, purple section is what we want to calculate. The union of these two events, right? The red over here, that's the probability of A. The blue section is the probability of B. And then here's our intersection, the probability of A and B.
So we want to add both of these. But when we add both of these, we add this intersection twice.
So then we just subtract that intersection. And that gives us the union of those two non-disjoint events.
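The general union rule can be sketched with Python sets. The fair-die events below are an illustrative choice that does not come from the lecture; the same function handles the disjoint case because the intersection is then empty.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die (illustrative choice)

def prob(event):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def prob_union(a, b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a) + prob(b) - prob(a & b)

even = {2, 4, 6}
over_three = {4, 5, 6}
# Overlapping events: 4 and 6 would be counted twice without the subtraction.
print(prob_union(even, over_three), prob(even | over_three))  # 2/3 2/3

# Disjoint events: the intersection is empty, so the subtraction is zero.
print(prob_union({1}, {2}), prob({1}) + prob({2}))  # 1/3 1/3
```

Adding P(A) and P(B) without the correction would give 1 for the overlapping pair, which overstates the union.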
And I realize non-disjoint is a weird way to say it, but that's how statisticians like to do things.
You'll see when we get to unit three, there's a whole slew of other weird ways statisticians like to talk.
I didn't make up statistics, I'm just a little steward.
Okay, so just a quick review. Once again, if it isn't clear, I'm sorry, you can talk to me after this, and I can break this down a different way.
Probability of A and B, the intersection, that's just given by this little sliver right here.
The blue circle is the probability of B. The green circle is the probability of A. The purple section, the Pac-Man looking thing, is the probability of A and not B, right?
So it's A here and it takes away B. The red section is the probability of not A and B, right?
So we have this little background thing here. And then the pink section is just not A and not B, everything outside.
I'm confused about the little U, whether it's facing upwards or downwards. I'm so sorry, I know that I'm not using the correct terminology.
It's nothing, really. So like... So you're looking at these symbols? Yes. So these symbols... Yeah, I'm glad you, I'm glad you brought this up. So see that like bottom one at least, it's like going down and then on the other side it's going up.
Yeah. And I think that's for the opposite because the line above it is probably not happening, right?
Yeah, I think, let me back up and make sure I understand what you're asking.
So this cap, what it's stating is AND. That always stands for intersection. And that tells you to multiply. Right? This guy, this cup, that's a union. And that's always telling you to add two things. Okay. Right, so if you see the one that looks like a coffee cup, it tells you to add, and the terminology with that one is probability of A or B.
Coffee cup, cup, or. The one that wants you to multiply is this one that looks more like an N.
And the terminology with that one is A and B. And the way you kind of remember that is it looks like an N.
So an N is an "and", if that helps a little bit. Yeah, sure, Ian, that's one of those things that if we all would have been here Monday, I would have probably made that crystal clear.
But I decided to just give you a video. And that's not something I've traditionally done.
But I didn't want to knock our whole schedule off for the semester because of jury duty.
So yeah, with that, maybe it's a good time for me to say, if anybody has any questions about the lecture from Monday, please come talk to me.
And I can make sure we're all up to speed. So at the beginning of the lecture, we talked about that multiplication rule extending to more events and multiplying a bunch together.
We had so much fun with the birthdays. It wasn't as illuminating as I was hoping it to be.
That's okay. So this is actually used a lot in engineering. Any engineering people in here?
Every now and then I get a few like bioengineering students.
So we can use this type of math, this type of thought, to build redundancy into critical components.
So like in different machinery or in different transportation equipment, we might have multiple different radars, multiple different computer functions.
And the idea is that there's some sort of probability of one of them failing that's very low.
But if we have multiple in there and they're independent events, we can calculate this probability of failure, like, you know, losing a plane in the sky due to the radar failing.
It will be very low if they are in fact independent.
So one plane built largely of carbon fiber had to carry two radar transponders.
That's because if a single transponder failed, the plane was nearly invisible to the radar.
So if one component has this 0.001 probability of failure, and we think they're truly independent, we can multiply.
Assuming I can get my mouse in here. This is a new trackpad on the new Mac, it's so sensitive.
So we can just multiply both of them and see that there's a very low likelihood that we would have failure.
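The transponder arithmetic is short enough to sketch directly in Python. The function name is illustrative, and the whole calculation rests on the assumption that the two failures are independent.

```python
def prob_all_fail(p_single, n_components):
    """P(every one of n components fails), ASSUMING independent failures."""
    return p_single ** n_components

p_single = 0.001                     # failure probability of one transponder
p_both = prob_all_fail(p_single, 2)  # both transponders fail
print(f"{p_both:.0e}")               # 1e-06, about one in a million
```

Adding a second independent component takes the failure chance from one in a thousand to about one in a million.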
Great, that's where we want to be. We don't want planes falling out of the sky.
So here's an instance where a plane, fortunately, did not fall.
So just after leaving for Miami, Eastern Airlines flight 855 had one engine shut down because of a low oil pressure warning light.
After turning back, both of the remaining engines failed with the same warning lights.
When the plane reached 4,000 feet, the crew was finally able to restart one engine and they landed safely.
So pretty scary stuff, right? So with independent jet engines, the probability of all three of these failing is one in a trillion.
So it's very low, like if these events are truly independent, right?
So thinking about this, do you have any ideas on what happened?
Or any ideas on why this 1 in a trillion might not hold?
Because if it's, in fact, 1 in a trillion, we just saw something very unlikely.
Right? Is that what you're going to say? And any ideas on why they might not be independent? If one goes down, did the other engines have to take on more load?
That's a good idea, yeah. That could be what happened. Any other ideas? Yeah, like the probability increases after one goes down.
Oh, no. They can't really all be independent if it's one working kind of object.
They fail through like the same things. They don't want to be lost apart. Yeah, we're all starting to kind of get at it. With the oil pressure? So we're assuming independence of all three of these engines.
And who works on the engines? Oh. Mechanics, right? And if the same knucklehead worked on all three engines and didn't do something right, the failures of the engines are going to be dependent on each other, right?
So the FAA found that the same mechanic changed the oil and failed to replace the oil plug sealing rings in all three engines.
So was the probability of all three failing actually one in a trillion?
No, because they were dependent events. And that's why it happened. So now the FAA has a new policy where they actually have different mechanics work on different engines on the planes.
So like, mechanic one would replace the, or would do the oil change on one, mechanic two would do the oil change on two, and so on to keep these, you know, hopefully independent.
Because we don't want planes falling out of the sky.
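To put rough numbers on why the shared mechanic matters, here is a sketch in R. The per-engine probability of 1e-4 is back-calculated from the one-in-a-trillion figure, and the chance of one mechanic making the same mistake on every engine (`p_error`) is a made-up value, not something from the FAA report.

```r
# Per-engine failure probability, back-calculated from 1 in a trillion (assumption)
p_engine <- 1e-4

# If the three engines really are independent, multiply
p_indep <- p_engine^3

# Hypothetical chance that one mechanic makes the same error on all three engines
p_error <- 1e-5

# Law of total probability: either the shared error takes out all three engines,
# or there is no shared error and the engines fail independently
p_shared <- p_error * 1 + (1 - p_error) * p_indep

# How many times more likely the triple failure is with a shared cause
p_shared / p_indep
```

With these illustrative numbers the triple failure is about ten million times more likely than the independent calculation suggests, which is the argument for giving each engine its own mechanic.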
I'm guessing they did that after they fired the guy.
Yeah, I guess he had to get a job elsewhere. It's been a tough day at the office. And real quick before we leave, we're not gonna really see much of this, but I just wanna make sure you know it.
So we can extend that union rule as well. It doesn't just have to be two events we're looking at.
So if we want to find a union of disjoint events, that is ones that don't have any crossing, right?
We can just add them. So probability of A plus probability of B plus probability of C.
So an example here, the probability that a person selected has stage 0 cancer or stage 1 or stage 3.
We can just add all those up. Going back to the example we looked at earlier, the year in school, we can add up the people who are sophomores, juniors, and seniors.
Just add them up. And this is only for disjoint events. When there starts being some crossing in there,
that can be a little confusing. But like I said, I don't think we'll use this all that much.
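A small R sketch of adding disjoint events, using the year-in-school example. The proportions below are made up for illustration and are not the numbers from the earlier slide.

```r
# Hypothetical class-year proportions (illustrative only; they sum to 1)
p_year <- c(freshman = 0.30, sophomore = 0.25, junior = 0.25, senior = 0.20)

# A student is in exactly one year, so the events are disjoint and we can add
p_soph_or_later <- sum(p_year[c("sophomore", "junior", "senior")])
p_soph_or_later

# Same answer through the complement rule: 1 minus P(freshman)
1 - p_year[["freshman"]]
```

If the events could overlap, the plain sum would double count the overlap, so it would have to be subtracted back out.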
That's it for today, but I am gonna pull up the lab just real quick.
And once again, on Friday, and really the rest of the next week, we're gonna use this stuff in a way that I think will hopefully be more meaningful to us.
We're not just going to be talking about A's and B's and birthdays.
So before I get into it, does anybody have any specific questions about the lab?
So I've already gotten a few via email. So this part is your friend, right? I mean you should read the whole thing, but this whole section is really important.
So we know our data types, hopefully by now. We all feel great about that. Well, if you have a data type in R, and you know it should be continuous, you're gonna need to convert it to being numeric, right?
You don't want a continuous data type being read in as something different.
If you have discrete data, and you know it should be discrete,
it should be converted to an integer in R. And if you have a categorical data type, like a binary data type, it should be a factor, right?
But when you read data in, it doesn't always come in as such.
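Those three conversions might look something like this in R. The data frame and its column names are hypothetical stand-ins, not the actual lab data.

```r
# Hypothetical data frame standing in for data as it was read in
heart <- data.frame(
  age = c(63L, 41L, 57L),            # continuous, but read in as integer
  n_visits = c(2, 0, 5),             # discrete count, but read in as numeric
  sex = c("M", "F", "M"),            # categorical, but read in as character
  high_fbs = c(TRUE, FALSE, FALSE),  # binary, but read in as logical
  stringsAsFactors = FALSE
)

heart$age <- as.numeric(heart$age)            # continuous -> numeric
heart$n_visits <- as.integer(heart$n_visits)  # discrete -> integer
heart$sex <- factor(heart$sex)                # categorical -> factor
heart$high_fbs <- factor(heart$high_fbs)      # binary -> factor

# Show the type of every column after converting
str(heart)
```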
So there's going to be a lot of questions on this, like asking you how is this read and what variables need to be changed, right?
So this data dictionary is going to kind of tell you what things should be.
So like age, which one of these three should that be when you're preparing for analysis?
Should be continuous. So its R data type will be called numeric, right?
Sex. What R data type will that be? Factor. Factor, right? Resting blood pressure on admission to the hospital.
Numeric. Numeric, right? Cholesterol. Numeric. Exactly. Fasting blood sugar over 120 milligrams per deciliter.
Numeric. That's actually true or false. So it's going to be a factor. So, kind of keep that in mind, right? So it's going to kind of ask you to check, it's going to say which ones need to have their variable types changed.
So you're going to want to look at how they were read in.
So here it says age was read in as an integer. So that's not looking good because if we apply a statistical model to an integer, it's not going to work.
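One way to do that check in R is to compare how each column was read in against what the data dictionary says it should be. This is only a sketch: the data frame, the column names, and the `should_be` lookup are all hypothetical.

```r
# Hypothetical stand-in for the lab data as it was read in
dat <- data.frame(
  age = c(63L, 41L, 57L),   # read in as integer
  sex = c(1L, 0L, 1L),      # read in as integer
  chol = c(233, 204, 354)   # read in as numeric
)

# How R read each column in
read_as <- sapply(dat, class)

# What the data dictionary says each column should be (assumed here)
should_be <- c(age = "numeric", sex = "factor", chol = "numeric")

# Columns whose type does not match the data dictionary
needs_change <- names(read_as)[read_as != should_be[names(read_as)]]
needs_change
```

In this toy example, age and sex are flagged and cholesterol is left alone.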
But I'm going to stop talking now because class is over.
You all are great. You all are awesome. Please let me know if you have-