0% found this document useful (0 votes)
6 views7 pages

Understanding Random Variables and Distributions

Uploaded by

kainamates
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Random Variables

Bernoulli requirements:
1. There are only two possible outcomes, success or failure
2. The probability of success, p, is constant

Discrete variable:
- A variable that can take on individual distinct values such that no further value exists between 2 subsequent values, so the set of values is finite or countable

Random variable:
- A variable is considered random if the numerical values it can take occur as the outcomes of a probability experiment.

P(X)
- X = a statistical variable, a characteristic of a population or sample that can vary

Probability mass function
- A function made of probabilities, e.g. for x = 1, 2, 3, 4
· $P(X = x) = f(x)$ for x in the domain, and 0 otherwise
· $\sum P(X = x) = 1$

Linear changes of origin and scale
· $E[aX + b] = aE[X] + b$
· $\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$
· $SD[aX + b] = |a|\,SD[X]$
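The rules above can be checked numerically. This is a minimal sketch, assuming a made-up distribution (the `outcomes`, `probs`, `a` and `b` values are illustrative only, not from the notes):

```python
# Hypothetical discrete distribution: outcomes and their probabilities.
outcomes = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]


def expected_value(xs, ps):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in zip(xs, ps))


def variance(xs, ps):
    """Var[X] = sum of (x - mu)^2 * P(X = x)."""
    mu = expected_value(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))


# Linear change of origin and scale: Y = aX + b (a and b are arbitrary choices).
a, b = 3, 5
shifted = [a * x + b for x in outcomes]

mean_x = expected_value(outcomes, probs)
var_x = variance(outcomes, probs)

print("sum of probabilities:", sum(probs))
print("E[X] =", mean_x, " Var[X] =", var_x)
print("E[aX+b] =", expected_value(shifted, probs), "vs a*E[X]+b =", a * mean_x + b)
print("Var[aX+b] =", variance(shifted, probs), "vs a^2*Var[X] =", a ** 2 * var_x)
```

The probabilities of `shifted` are the same as for `outcomes`, because changing origin and scale relabels the outcomes without changing how likely each one is.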

Expected values + others
· $E[X] = \sum x_i \cdot P(X = x_i)$, essentially the mean of the function
· $\mathrm{Var}[X] = \sum (x_i - \mu)^2 \cdot P(X = x_i)$ or $E[X^2] - (E[X])^2$

ClassPad:
· BinomialPDf(x, n, p) and BinomialCDf(lower, upper, n, p): must state the distribution before using
· Solve E[X] on Main: multiply the list of outcomes, e.g. {1, 2, 3, 4}, by the list of their probabilities, then sum to get E[X]
· Solve all on Stats: enter the outcomes in list 1 and the probabilities in list 2, then Calc, One-Variable, with the Freq list set to list 2; $\bar{x}$ = E[X], $\sigma_x$ = SD
· Solve a binomial distribution with a table in graph mode

Mean, median, mode and range
· mean = E[X]
· median = the value x for which $P(X \le x) = 0.5$
· mode = the value x with the highest probability $P(X = x)$
· range = max − min of the outcomes
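A short sketch of these four measures for a discrete random variable. The distribution is made up for illustration, and the median uses one common convention (an assumption here): the smallest x whose cumulative probability reaches 0.5.

```python
# Hypothetical discrete distribution, listed in ascending order of outcome.
outcomes = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]


def mean(xs, ps):
    return sum(x * p for x, p in zip(xs, ps))


def median(xs, ps):
    """Smallest outcome x with P(X <= x) >= 0.5 (assumed convention)."""
    cumulative = 0.0
    for x, p in zip(xs, ps):
        cumulative += p
        if cumulative >= 0.5:
            return x
    return xs[-1]


def mode(xs, ps):
    """Outcome with the highest probability."""
    return max(zip(xs, ps), key=lambda pair: pair[1])[0]


value_range = max(outcomes) - min(outcomes)

print("mean:", mean(outcomes, probs))
print("median:", median(outcomes, probs))
print("mode:", mode(outcomes, probs))
print("range:", value_range)
```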

Binomially distributed random variable

3 assumptions / things needed to be a binomially distributed random variable:
1) there is a known and constant number of trials = n
2) each Bernoulli trial is identical and independently distributed
3) there is an identical and known probability of success = p

Denoted X ~ Bin(n, p)
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, where $\binom{n}{x}$ accounts for the different orders in which the x successes can occur
· $E[X] = np$
· $\mathrm{Var}[X] = np(1 - p)$
· $SD[X] = \sqrt{np(1 - p)}$

Probability histograms of X ~ Bin(n, p)
1) as n increases the smoothness of the histogram increases
2) when p increases the histogram translates to the right and as p decreases the histogram translates to the left
3) the peak of the histogram is at (or next to) the expected value
4) the histogram is symmetrical on both sides of the expected value when p = 0.5, and close to symmetrical when n is large
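The binomial formulas can be sketched in a few lines. The function names below mirror the calculator commands but are hypothetical Python helpers, and n = 10, p = 0.3 are illustrative values only:

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(lower, upper, n, p):
    """P(lower <= X <= upper) for X ~ Bin(n, p)."""
    return sum(binomial_pmf(x, n, p) for x in range(lower, upper + 1))


n, p = 10, 0.3  # assumed example values
mean = n * p
var = n * p * (1 - p)
sd = sqrt(var)

# E[X] computed from the definition, to compare with the shortcut np.
mean_from_definition = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

print("P(X = 3) =", binomial_pmf(3, n, p))
print("P(2 <= X <= 5) =", binomial_cdf(2, 5, n, p))
print("E[X] =", mean, " from definition:", mean_from_definition)
print("Var[X] =", var, " SD[X] =", sd)
```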
Class interval: a range of values for the continuous random variable X, e.g. a ≤ x < b
Frequency: the number of times an outcome / data point in a class interval occurs
Relative frequency: the frequency of an outcome / class interval out of the total frequency. This is also referred to as the experimental probability.
Cumulative frequency: the addition of frequencies in ascending order, such that P(X ≤ max x) = 1.
Probability density function: a continuous random variable, X, has a probability density function (p.d.f.), f(x), which can be used to describe the likelihood for a range of possible values of the continuous random variable.

Axioms of probability
For a p.d.f. f(x) defined on a ≤ x ≤ b:
1) $f(x) \ge 0$
2) $\int_a^b f(x)\,dx = 1$
3) $P(X = x) = 0$ for a CRV (not actually an axiom)

Cumulative distribution function
$F(x) = P(X \le x) = \int_a^x f(t)\,dt$ for a ≤ x ≤ b, with F(x) = 0 for x < a and F(x) = 1 for x > b
The fundamental theorem of calculus says $\frac{d}{dx}F(x) = f(x)$

Histogram probability
[Worked example: class intervals 40 ≤ x < 45 and 45 ≤ x < 50, reading f(x) from the frequency of each interval to find P(42 ≤ X ≤ 48)]
Mean
$E[X] = \int_a^b x\,f(x)\,dx$

Standard deviation
$SD[X] = \sqrt{\mathrm{Var}[X]}$

Variance
$\mathrm{Var}[X] = \int_a^b (x - \mu)^2 f(x)\,dx$
or $\mathrm{Var}[X] = \int_a^b x^2 f(x)\,dx - \left[\int_a^b x\,f(x)\,dx\right]^2$
or $\mathrm{Var}[X] = E\left[(X - E[X])^2\right]$

Percentile
We are identifying the score k for which K% of the data lies below,
e.g. finding the 99th percentile: $P(X \le k) = 0.99$, so $\int_a^k f(x)\,dx = 0.99$

Median
The value m of X for which there is 50% of the data on either side of the score:
$P(a \le X \le m) = P(m \le X \le b) = 0.5$
or using the CDF: $F(m) = P(X \le m) = 0.5$, so $\int_a^m f(x)\,dx = 0.5$
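These integrals can be approximated numerically when they are awkward by hand. This is a rough sketch, assuming a made-up p.d.f. f(x) = 2x on 0 ≤ x ≤ 1 and a basic midpoint rule (neither comes from the notes):

```python
def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width


def f(x):
    """Hypothetical p.d.f.: f(x) = 2x on [0, 1]."""
    return 2 * x


a, b = 0.0, 1.0

total_area = integrate(f, a, b)                       # should be 1
mean = integrate(lambda x: x * f(x), a, b)            # E[X]
variance = integrate(lambda x: (x - mean) ** 2 * f(x), a, b)


def cdf(x):
    """F(x) = P(X <= x), the area under f from a to x."""
    return integrate(f, a, x, steps=2_000)


def percentile(q):
    """Find k with F(k) = q by bisection on [a, b]."""
    lo, hi = a, b
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


median = percentile(0.5)

print("area:", total_area)
print("E[X]:", mean, " Var[X]:", variance)
print("median:", median, " 99th percentile:", percentile(0.99))
```

For this p.d.f. the exact answers are E[X] = 2/3, Var[X] = 1/18 and median = √0.5, so the printed values give a quick check on the method.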

Linear changes of origin and scale
$E[aX + b] = aE[X] + b$
$\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$
$SD[aX + b] = |a|\,SD[X]$

Uniform continuous random variables
Denoted X ~ U(a, b), where f(x) = k for a ≤ x ≤ b and 0 otherwise, with $k = \frac{1}{b - a}$

Expected value, variance and median of a uniform CRV
$E[X] = \text{median} = \frac{a + b}{2}$, $\mathrm{Var}[X] = \frac{(b - a)^2}{12}$

Triangular distribution
Cumulative distribution function: $F(x) = P(T \le x) = \int f(t)\,dt$, integrated piece by piece over each side of the triangle
Normal distribution
Def: we say that a continuous random variable X is normally distributed with two parameters: the mean μ and the variance σ².

Denoted: X ~ N(μ, σ²). Must define at the beginning of questions, e.g. for P(X > a) = p

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ for $-\infty < x < \infty$

Properties of the normal distribution curve

1) The curve is symmetrical about a given value of x. This value of x is the mean, μ, but also the median and mode.
2) The curve has asymptotic behaviours, that is, f(x) → 0 as x → ±∞
3) There are two points of inflection, either side of the mean. It can be shown using calculus techniques that the points of inflection occur at μ − σ and μ + σ
4) The total area under the curve is equal to 1, making it a valid probability density function.
5) The probability at a point is 0, i.e. P(X = k) = 0.

Changing μ and σ
· μ is going to change the location of the centre
· σ is going to change the spread, that is, a greater standard deviation will create a greater spread.

Standard score
$z = \frac{x - \mu}{\sigma}$
· z-score = how many SD away from the mean

68% · 95% · 99.7% rule
P(−1 < Z < 1) = 0.68
P(−2 < Z < 2) = 0.95
P(−3 < Z < 3) = 0.997

Standard score distribution
Z ~ N(0, 1), $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$ for $-\infty < z < \infty$
Must define at the beginning of questions, e.g. for P(Z > k) = p

Properties:
1) the peak of the bell curve is at the mean value of 0, which is also the median and mode.
2) the curve is asymptotic to the z-axis and continues infinitely in both directions
3) the points of inflection are at z = ±1

Tails
· Left tail: P(X < k)
· Centre tail: P(−k < X < k)
· Right tail: P(X > k)

InvNormCDf: identify the tail, insert the probability of the CDF, insert the SD, insert the mean
Comparing scores
· calculate z-scores for each score and compare which is the greatest

Standardisation + predictions
· Standardise test scores by calculating the z-score of each test
· Calculate a predicted score by solving for x using $z = \frac{x - \mu}{\sigma}$

Quantile
· The kth quantile is the equivalent decimal representation of the kth percentile
· finding the x or z score, e.g. P(X ≤ x) = c
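A small sketch of comparing scores and predicting a score with z-scores. The test names, marks, means and standard deviations are all invented for illustration:

```python
def z_score(x, mu, sigma):
    """How many standard deviations x is from the mean."""
    return (x - mu) / sigma


def score_from_z(z, mu, sigma):
    """Rearranged z = (x - mu) / sigma, solved for x."""
    return mu + z * sigma


# Hypothetical results: maths 72 (mean 60, SD 8), english 80 (mean 70, SD 10).
z_maths = z_score(72, 60, 8)
z_english = z_score(80, 70, 10)

# The greater z-score is the better result relative to the rest of the class.
better = "maths" if z_maths > z_english else "english"

# Predicted english score if the student performed as well as in maths.
predicted_english = score_from_z(z_maths, 70, 10)

print("z maths:", z_maths, " z english:", z_english)
print("better relative result:", better)
print("predicted english score:", predicted_english)
```

Even though 80 is the higher raw mark, the maths result sits further above its mean in SD units, which is why standardising matters before comparing.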
Random Sampling
Population: The set of all eligible members of a group which statisticians intend to study is called a population

Census: Data collection from every member of the population

Parameter: a numerical measure of a population, such as mean, median, mode, standard deviation etc. It can only be calculated using data from a census. However, an entire population can be very large or difficult to access, so it would be very impractical to conduct a census every time we wish to collect data.

Sample: A select subset of the population is called a sample.

Survey: can then be used to obtain the same information from each member of our sample, which is much quicker and cheaper than dealing with the
whole population.

Statistic: a numerical measure of a sample of a target population, e.g. mode, median, mean, range, max, min. These sample statistics can be used to make inferences about the population parameters.

Point estimate: a single-value estimate for a given parameter, e.g. the mean

Representative sample: one that is representative of the population, as a result we would expect a fair and representative sample to produce
sample statistics close to the population parameters

Bias: A biased sample will not be representative of the population, as it will favour some section of the population.

Sampling bias
· Spatial bias: location bias
· Temporal bias: time bias
· Under-coverage bias: includes an under or overrepresentation of the population
· Self-selection bias: as a result of an opt-in process

Response bias
· Voluntary bias: the sample is randomly selected, but a member chooses whether or not to respond
· Non-response bias: due to an unwillingness or an inability to respond
· Leading-question bias: when a question is asked in a way to prompt a certain response

Probability / random sampling

1) Simple random sampling
· every member of the population has an equal chance of being included in the sample
· can be done using RandList(n, a, b): n integers generated from the values a to b

2) Systematic sampling
· The population is sorted in an order and every nth member of the population is chosen to make the desired sample size
· with this method it is important there is no hidden bias in the list order
· RandList(1, a, b) gives a random starting member, then every nth member is taken from that random number (interval rounded down)

3) Stratified sampling
· The population is divided into strata based on common characteristics
· Then relevant proportions are taken from each subgroup
· e.g. year groups 7, 8, 9, 10, 11, 12 with s = strata size: $n_7 + n_8 + n_9 + n_{10} + n_{11} + n_{12} = n$

4) Cluster sampling
· The population is divided into clusters, with each subgroup having similar characteristics to the whole population
· from the cluster a simple random sampling method is used to make the sample
· The method is good for dealing with large and dispersed populations, but we must be cautious when there are large differences between clusters. As a result it can be hard to guarantee a representative sample
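The first three probability sampling methods can be sketched with Python's `random` module. The population of 100 numbered members, the three year-group strata and the helper names are all assumptions made for this example:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical population, members 1..100

# 1) Simple random sampling: every member has an equal chance.
simple = random.sample(population, 10)


# 2) Systematic sampling: random start, then every step-th member.
def systematic_sample(pop, n):
    step = len(pop) // n             # interval, rounded down
    start = random.randrange(step)   # random starting position
    return [pop[start + i * step] for i in range(n)]


# 3) Stratified sampling: each stratum contributes in proportion to its size.
def stratified_sample(strata, n):
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        # Rounding can make the shares miss n slightly for awkward sizes.
        share = round(n * len(members) / total)
        sample.extend(random.sample(members, share))
    return sample


strata = {
    "year 7": population[:50],
    "year 8": population[50:80],
    "year 9": population[80:],
}

systematic = systematic_sample(population, 10)
stratified = stratified_sample(strata, 10)

print("simple:", sorted(simple))
print("systematic:", systematic)
print("stratified:", sorted(stratified))
```

With strata of 50, 30 and 20 members, a sample of 10 takes 5, 3 and 2 members respectively, matching the proportional rule in the notes.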
Non-probability sampling

1) Convenience sampling
· easily accessible sampling, e.g. data for a primary school study collected from a local school

2) Quota sampling
· data is collected until the quota is filled

3) Volunteer response sampling
· members opted in for it

4) Purposive / judgement sampling
· the researcher chooses a sample that is useful to the research, e.g. research on workplace health using a sample of old workers
Simulating samples

Variability of samples: each of the different samples taken from a population is likely to produce variable statistics, simply due to the randomness of sampling. We can observe variation in samples through a process of simulation.

Simulation: an equivalent situation used to model the events of a random probability experiment, without actually conducting the experiment

Parent distribution: the assumed distribution from which the sample is being taken.

Simulating uniform distribution
a + (b − a) × RandList(m)
· RandList(m) gives m values between 0 and 1
· (b − a) creates the interval from which the sample may be selected
· a creates a lower bound

Law of large numbers
As n → N, the sample statistics should tend towards the population parameters, but may still vary due to the random nature of sampling

Simulating normal distribution
RandNorm(σ, μ, m)

Suitability of normal distribution
· data must be symmetrical about the mean
· the proportion of data must lie within a given range similar to what we should expect from a normal distribution

Simulating Bernoulli distribution
RandBin(1, p, m)
· Represents the distribution of m Bernoulli trials

Simulating binomial distribution
RandBin(n, p, m)
· n = number of Bernoulli trials
· p = probability of success for a given trial
· m = number of simulations
· creates a set where each value is the number of successes

Normal approximations to binomial
If X ~ Bin(n, p) then X ≈ N(np, np(1 − p)), where p ≈ 0.5

Capture recapture
$\frac{\text{captured and tagged}}{N} = \frac{\text{number tagged in second observation}}{\text{size of second observation}}$, where N = population size
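The calculator commands above can be imitated in Python to see the law of large numbers in action. The function names `rand_uniform`, `rand_bin`, `rand_norm` and `capture_recapture` are hypothetical stand-ins written for this sketch, not the calculator's own commands, and all numbers are examples:

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def rand_uniform(a, b, m):
    """m values from U(a, b), built as a + (b - a) * (value between 0 and 1)."""
    return [a + (b - a) * random.random() for _ in range(m)]


def rand_bin(n, p, m):
    """m simulations, each counting the successes in n Bernoulli trials."""
    return [sum(random.random() < p for _ in range(n)) for _ in range(m)]


def rand_norm(sigma, mu, m):
    """m values from a normal distribution with mean mu and SD sigma."""
    return [random.gauss(mu, sigma) for _ in range(m)]


def capture_recapture(tagged_first, second_size, tagged_in_second):
    """Population estimate N from tagged_first / N = tagged_in_second / second_size."""
    return tagged_first * second_size / tagged_in_second


# Law of large numbers: more simulations bring the sample mean closer to np.
small = rand_bin(20, 0.5, 10)
large = rand_bin(20, 0.5, 10_000)
mean_small = sum(small) / len(small)
mean_large = sum(large) / len(large)

print("expected value np =", 20 * 0.5)
print("mean of 10 simulations:", mean_small)
print("mean of 10000 simulations:", mean_large)
print("uniform sample:", rand_uniform(2, 5, 3))
print("normal sample:", rand_norm(10, 100, 3))
print("population estimate:", capture_recapture(50, 40, 10))
```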
Sample Proportions
Sample proportion: experimental probability from a sample of size n

Denoted $\hat{p} = \frac{x}{n}$, where x = number of successes and n = sample size
· can be used as an estimator of p, the population proportion

Note: p is a constant true value; however, due to the variability of sampling, $\hat{p}$ is a sample statistic that will vary from sample to sample

Sample proportion as a random variable:

Denoted $\hat{P} = \frac{X}{n}$, where X ~ Bin(n, p), $E[X] = np$, $\mathrm{Var}[X] = np(1 - p)$
· This distribution is used as the sample is interviewed via Bernoulli trials (independent, answer = yes/no)

$E[\hat{P}] = \frac{1}{n}E[X] = \frac{1}{n}(np) = p$
$\mathrm{Var}[\hat{P}] = \left(\frac{1}{n}\right)^2 \mathrm{Var}[X] = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$
$SD[\hat{P}] = \sqrt{\frac{p(1 - p)}{n}}$

If p is unknown, $\hat{p} = \frac{x}{n}$ may be used as a point estimate for p
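To see that the sample proportion really does centre on p with standard deviation √(p(1 − p)/n), repeated samples can be simulated. The values p = 0.4, n = 100 and 5000 repeats are arbitrary choices for this sketch:

```python
import random
from math import sqrt

random.seed(42)  # fixed seed so the sketch is repeatable

p, n, repeats = 0.4, 100, 5000  # assumed true proportion, sample size, samples


def sample_proportion(n, p):
    """One sample of n Bernoulli trials, returned as p-hat = x / n."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n


p_hats = [sample_proportion(n, p) for _ in range(repeats)]

mean_p_hat = sum(p_hats) / repeats
sd_p_hat = sqrt(sum((v - mean_p_hat) ** 2 for v in p_hats) / repeats)
theory_sd = sqrt(p * (1 - p) / n)

print("mean of p-hat:", mean_p_hat, " (theory:", p, ")")
print("SD of p-hat:", sd_p_hat, " (theory:", theory_sd, ")")
```

Each individual p-hat varies from sample to sample, but their average and spread settle close to the theoretical values.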

Sampling distribution of sample proportions: Distributions created by repeated samples of the population

Central limit theorem: given that p is approximately 0.5 (meaning symmetry) and n is sufficiently large, or given that p is not 0.5 but np > 10 and n(1 − p) > 10, the sampling distribution of sample proportions is approximately normally distributed. So as n → ∞ the distribution becomes more normal

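$\hat{P} \sim N\left(p, \frac{p(1 - p)}{n}\right)$ can be used to determine the probability that the sample proportion lies in a given range
Note: if the conditions aren't met, must use X ~ Bin(n, p)

How many SD away from true p
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$

Probability interval
$P\left(p - z\sqrt{\frac{p(1 - p)}{n}} \le \hat{P} \le p + z\sqrt{\frac{p(1 - p)}{n}}\right)$
· The area between 2 points on a p.d.f.

Prac problem
· Find the value of z that corresponds to the probability, typically P(−k < Z < k) or P(Z > k) = 0.01, and solve for k via InvNorm
· If p and n are known: do an inverse normal (right tail) on $\hat{P}$ directly, e.g. $P(\hat{P} > k) = 0.01$
· Otherwise: do an inverse normal (right tail) for P(Z > z) = 0.01, then k = p + z × SD with the SD known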

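Confidence Interval
Provides a range of values within which the population proportion could lie, built from a sample proportion

z-score for:
· 90% CI: 1.645
· 95% CI: 1.960
· 99% CI: 2.576

So range: $\hat{p} - z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Margin of error form: $\hat{p} \pm E$, where $E = z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Finding values
· find $\hat{p}$ given a CI (a, b): $\hat{p} = \frac{a + b}{2}$
· find E given a CI (a, b): w = b − a, E = w/2, or E = b − $\hat{p}$, or E = $\hat{p}$ − a
· find z given a CI (a, b) and the SD: z = E / SD, and the confidence level = P(−z < Z < z) with Z ~ N(0, 1)
· find the minimum n given a max value of E (z and E constant): $n \ge \left(\frac{z}{E}\right)^2 \hat{p}(1 - \hat{p})$
  ① use historical data for $\hat{p}$
  ② use $\hat{p}$ = 0.5 as it creates the largest sample size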
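The confidence interval steps can be sketched as follows. The helper names and the example inputs (p-hat = 0.6, n = 200, margin 0.05) are assumptions for illustration; `statistics.NormalDist` supplies the z-scores instead of a table:

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_for_confidence(level):
    """z such that P(-z < Z < z) = level, e.g. level = 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def confidence_interval(p_hat, n, level):
    """(lower, upper) bounds of p-hat +/- z * sqrt(p-hat(1 - p-hat) / n)."""
    margin = z_for_confidence(level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def min_sample_size(max_margin, level, p_hat=0.5):
    """Smallest n keeping the margin of error at or below max_margin.

    p_hat defaults to 0.5, the value that gives the largest sample size.
    """
    z = z_for_confidence(level)
    return ceil(z ** 2 * p_hat * (1 - p_hat) / max_margin ** 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI z-score: {z_for_confidence(level):.3f}")

low, high = confidence_interval(0.6, 200, 0.95)
recovered_p_hat = (low + high) / 2   # p-hat = (a + b) / 2
recovered_margin = (high - low) / 2  # E = w / 2

print("95% CI:", (low, high))
print("recovered p-hat:", recovered_p_hat, " margin:", recovered_margin)
print("min n for E <= 0.05 at 95%:", min_sample_size(0.05, 0.95))
```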

CI interpretations: do in context

Comment on the likelihood that the true value of p lies within a single CI: likelihood cannot be inferred from a single observed sample, as the CI either contains or doesn't contain the true value, but we never know for certain due to the nature of random sampling

If n samples are taken and a c% confidence interval is built from each, how many intervals are expected to contain the true value: n × c (with c written as a decimal)

Precision: The precision of a confidence interval is a qualitative measure of how close the estimate is to the true value of the parameter. To obtain a better interval estimate the width of the CI should be decreased whilst preserving the confidence level

So the smaller the width at the same confidence level, the more precise the interval

Comparing 2 samples: do in context

When given 2 samples you may be asked to comment on whether something has affected the sample proportions

Case 1: If one or both of the sample proportions are contained in the other sample's confidence interval, then there is insufficient evidence to suggest that a changed condition had an impact on the population

Case 2: If there is no overlap at all, then there is sufficient evidence to suggest that a changed condition may have had an impact on the population

Case 3: If there is partial overlap between the 2 confidence intervals such that neither sample proportion is contained, then there is insufficient evidence to conclude anything definitive about the 2 samples
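The three cases can be turned into a simple decision rule. This is a sketch only: the function name `compare_samples` and the example proportions and intervals are invented, and the wording of the conclusion should still be written in the context of the question.

```python
def compare_samples(p1, ci1, p2, ci2):
    """Classify two samples by how their confidence intervals overlap.

    p1, p2 are sample proportions; ci1, ci2 are (lower, upper) tuples.
    """
    low1, high1 = ci1
    low2, high2 = ci2

    # Case 2: the intervals do not overlap at all.
    if high1 < low2 or high2 < low1:
        return "Case 2: sufficient evidence that the change may have had an impact"

    # Case 1: at least one sample proportion lies inside the other interval.
    if low2 <= p1 <= high2 or low1 <= p2 <= high1:
        return "Case 1: insufficient evidence that the change had an impact"

    # Case 3: partial overlap, but neither proportion is contained.
    return "Case 3: insufficient evidence to conclude anything definitive"


print(compare_samples(0.50, (0.45, 0.55), 0.52, (0.47, 0.57)))
print(compare_samples(0.30, (0.25, 0.35), 0.50, (0.45, 0.55)))
print(compare_samples(0.40, (0.33, 0.47), 0.50, (0.43, 0.57)))
```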

Example
