R Programming: Probability Distributions

All notes for 4th unit

Uploaded by

rakupatil999

R Programming Notes – Unit IV

Common Probability Distributions


• Common Probability Mass Functions – Discrete Random Variables

– Binomial Distribution

– Poisson Distribution

• Common Probability Density Functions – Continuous Random Variables

– Normal Distribution

• Other Distribution Function

Binomial Distribution
 The binomial distribution is a discrete probability distribution.
 It describes the outcome of n independent trials in an experiment.
 Each trial is assumed to have only two outcomes, either success or failure.
 If the probability of a successful trial is p, then the probability of having x successful
outcomes in an experiment of n independent trials is as follows:

\[ P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n \]

where 1-p is the probability of not having success (failure) in a single trial and n-x is the
number of unsuccessful outcomes.

We can use the dbinom(x, size, prob) function in R to compute binomial probabilities.

Here "x" is the number of successes whose probability is to be calculated, "size" is the number
of trials, and "prob" is the probability of success of each trial.
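
As a minimal sketch, the call can be checked against the binomial mass function choose(n, x) * p^x * (1 - p)^(n - x). The numbers below (10 tosses of a fair coin, 3 heads) are assumed purely for illustration and are not from the class notes; pbinom() is the companion base R function for cumulative probabilities.

```r
# Hypothetical example: n = 10 tosses of a fair coin (p = 0.5).
n <- 10
p <- 0.5
x <- 3

# Probability of exactly x successes from the built-in mass function.
p_builtin <- dbinom(x, size = n, prob = p)

# The same probability computed directly from the binomial formula.
p_manual <- choose(n, x) * p^x * (1 - p)^(n - x)

# Cumulative probability P(X <= x) from the companion function pbinom().
p_at_most <- pbinom(x, size = n, prob = p)

print(c(builtin = p_builtin, manual = p_manual, at_most = p_at_most))
```

Passing a vector such as 0:n as the first argument returns the whole distribution at once, and those probabilities sum to 1.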

Refer to the class notes for problems and solutions.

KLS GCC BCA SEM V Page 1

Poisson Distribution
The Poisson distribution is the probability distribution of the number of independent event
occurrences in an interval. If λ is the mean number of occurrences per interval, then the
probability of having x occurrences within a given interval is

\[ P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots \]

Examples:

1. The number of defective electric bulbs manufactured by a reputed company.
2. The number of telephone calls per minute at a switchboard.
3. The number of cars passing a certain point in one minute.
4. The number of printing mistakes per page in a large text.

We can use the dpois(x, lambda) function in R to compute Poisson probabilities.

Here "x" is the number of occurrences whose probability is to be calculated and "lambda" is the
mean number of occurrences per interval.
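
A short sketch comparing dpois() with the Poisson mass function lambda^x * exp(-lambda) / x!. The rate of 12 cars per minute is an assumed value for illustration only; ppois() is the companion base R function for cumulative probabilities.

```r
# Hypothetical example: on average 12 cars pass a point per minute.
lambda <- 12
x <- 10

# Probability of exactly x cars in one minute from the built-in function.
p_builtin <- dpois(x, lambda = lambda)

# The same probability computed directly from the Poisson formula.
p_manual <- lambda^x * exp(-lambda) / factorial(x)

# Cumulative probability P(X <= x) from the companion function ppois().
p_at_most <- ppois(x, lambda = lambda)

print(c(builtin = p_builtin, manual = p_manual, at_most = p_at_most))
```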

Refer to the class notes for problems and solutions.

Normal Distribution
The normal distribution is one of the most well-known and commonly applied probability
distributions in modeling continuous random variables. Characterized by a distinctive “bell-shaped”
curve, it’s also referred to as the Gaussian distribution.

For a continuous random variable –∞ < X < ∞, the normal density function f is

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right) \]

where µ and σ are the mean and standard deviation, π ≈ 3.142, and "exp" is the exponential
function.
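
In R the normal density is available as dnorm(x, mean, sd). The sketch below checks it against the density formula written out by hand; the mean of 70 and standard deviation of 10 are assumed values chosen only for illustration. Because X is continuous, the density at a point is not itself a probability, so probabilities come from the cumulative function pnorm().

```r
# Hypothetical parameters: mean 70, standard deviation 10.
mu <- 70
sigma <- 10
x <- 85

# Density at x from the built-in function.
f_builtin <- dnorm(x, mean = mu, sd = sigma)

# The same density computed directly from the normal density formula.
f_manual <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))

# Probability that X falls below x, from the cumulative function pnorm().
p_below <- pnorm(x, mean = mu, sd = sigma)

print(c(builtin = f_builtin, manual = f_manual, below = p_below))
```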

Standard Normal Distribution


 It is the distribution that occurs when a normal random variable has a mean of zero and a
standard deviation of one.
 The normal random variable of a standard normal distribution is called a standard score or a Z
score.
 Every normal random variable X can be transformed into a Z score via the following equation:

\[ Z = \frac{X - \mu}{\sigma} \]

where µ and σ are the mean and standard deviation of X.

We convert normal distributions into the standard normal distribution for several reasons:

 To find the probability of observations in a distribution falling above or below a given
value.
 To find the probability that a sample mean significantly differs from a known population
mean.
 To compare scores on different distributions with different means and standard
deviations.
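
The first and third uses can be sketched in a few lines of R. The two tests and their means and standard deviations are hypothetical values, assumed only for illustration; the Z score is computed as (score - mean) / sd.

```r
# Hypothetical scores on two tests with different scales.
score_a <- 85;  mean_a <- 70;  sd_a <- 10    # test A
score_b <- 620; mean_b <- 500; sd_b <- 100   # test B

# Z score: how many standard deviations each score lies from its own mean.
z_a <- (score_a - mean_a) / sd_a
z_b <- (score_b - mean_b) / sd_b

# Probability of scoring above score_a on test A, via the standard normal.
p_above_a <- pnorm(z_a, lower.tail = FALSE)

print(c(z_a = z_a, z_b = z_b, p_above_a = p_above_a))
```

The larger Z score marks the score that is further above its own group's mean, even though the raw scores are on different scales.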

Characteristics of the standard normal curve:

 mean = median = mode
 symmetric about the center
 50% of values are less than the mean and 50% are greater than the mean
 approximately 68% of values are within 1 standard deviation of the mean.
 approximately 95% of values are within 2 standard deviations of the mean.
 approximately 99.7% of values are within 3 standard deviations of the mean.
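
The three percentages can be reproduced from the standard normal cumulative function pnorm(). This is a small sketch; the helper name within_k is introduced here for illustration and is not part of R.

```r
# Hypothetical helper: probability that a standard normal value lies
# within k standard deviations of the mean, P(-k < Z < k).
within_k <- function(k) pnorm(k) - pnorm(-k)

# Evaluate for 1, 2 and 3 standard deviations.
coverage <- sapply(1:3, within_k)

print(round(coverage, 4))
```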

Refer to the class notes for problems and solutions.

Hypothesis Testing
A hypothesis is an assumption that is proposed for the sake of argument so that it can be tested to
see whether it is TRUE or FALSE. Hypothesis testing uses sample data to draw a conclusion about a
hypothesis concerning the larger population from which the sample was drawn. A test is framed in
terms of two hypotheses:
 Null hypothesis (H0) - It is interpreted as a no-difference or no-change hypothesis.
 Alternate hypothesis (H1 or Ha) - It contradicts the null hypothesis and is often stated as an
inequality (for example, μ ≠ μ0, μ < μ0 or μ > μ0).

t-Test
A t-test is a statistical hypothesis test that is used to determine whether there is a significant
difference between the means of two groups. It helps you assess whether any observed
differences between the groups are likely to have occurred by chance or if they are statistically
significant.

One-sample t-test
While performing this test, the mean of one group is compared against a set average or a standard
value.

Independent two-sample t-test


A two-sample t-test is used to compare the means of two different samples. This test can be used
when the data values are independent and are sampled from independent groups.

Steps to perform t-test:

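1. Calculate the t-statistic. For a one-sample test,

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

where x̄ is the sample mean, µ is the standard (hypothesized) mean, s is the sample standard
deviation and n is the sample size.
2. Find the t-critical value from the t-table.
The degrees of freedom are n-1 and the confidence level is 95% (significance level 0.05).
3. If the absolute value of the t-statistic is less than the t-critical value, then we fail to
reject the null hypothesis: the data do not show that the sample mean differs from the standard mean.
The difference in means we observe is attributable to chance and not to any significant difference.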
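
The three steps can be sketched by hand for a one-sample, two-sided test. The sample values and the standard mean of 5 are made up for illustration; qt() takes the place of the printed t-table.

```r
# Hypothetical sample and standard (hypothesized) mean.
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.2)
mu0 <- 5
n <- length(x)

# Step 1: t-statistic = (sample mean - standard mean) / standard error.
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Step 2: two-sided critical value at the 0.05 level with n - 1 degrees of freedom.
df <- n - 1
t_crit <- qt(1 - 0.05 / 2, df)

# Step 3: reject H0 only if |t| exceeds the critical value.
reject_h0 <- abs(t_stat) > t_crit

print(c(t_stat = t_stat, t_crit = t_crit))
print(reject_h0)
```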

t.test()
In R programming, the t.test() function is used to perform one- and two-sample t-tests.
Syntax for one-sample t-test:
t.test(x, mu = 0, alternative = c("two.sided", "less", "greater"))

Syntax for two-sample t-test:
t.test(x, y, alternative = "two.sided")
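
A brief usage sketch of both forms on simulated data. The group names, sample sizes and means are assumptions made for illustration, and set.seed() is used only to make the run repeatable.

```r
# Simulated, hypothetical measurements for two independent groups.
set.seed(42)
group_a <- rnorm(20, mean = 50, sd = 5)
group_b <- rnorm(20, mean = 53, sd = 5)

# One-sample test: does the mean of group_a differ from 50?
one_sample <- t.test(group_a, mu = 50, alternative = "two.sided")

# Two-sample test: do the means of group_a and group_b differ?
two_sample <- t.test(group_a, group_b, alternative = "two.sided")

# Each result is a list; the p-value and t-statistic can be extracted by name.
print(one_sample$p.value)
print(two_sample$statistic)
```

By default the two-sample form runs Welch's test, which does not assume equal variances; passing var.equal = TRUE requests the pooled-variance version instead.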

ANOVA (ANalysis Of VAriance)


It is a statistical method used to analyse the differences between the means of two or more groups
or treatments. It is often used to determine whether there are any statistically significant
differences between the means of different groups.

The hypotheses of interest in an ANOVA are as follows:


H0: μ1 = μ2 = μ3 = ... = μk
H1: μ1, μ2, μ3, ..., μk are not all equal.

where k = the number of independent comparison groups.

aov()
In R programming, the aov() function is used to perform analysis of variance.

Syntax:
aov(formula, data)
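
A minimal sketch of a one-way ANOVA with aov(). The data frame, its column names and the scores are invented for illustration: three teaching methods (A, B, C) with four scores each.

```r
# Hypothetical data: scores for three groups of four students each.
scores <- data.frame(
  score  = c(78, 82, 85, 80, 88, 90, 92, 86, 70, 72, 75, 68),
  method = factor(rep(c("A", "B", "C"), each = 4))
)

# The formula reads "score explained by method"; data names the data frame.
fit <- aov(score ~ method, data = scores)

# The summary table reports Df, sums of squares, the F value and its p-value.
anova_table <- summary(fit)
print(anova_table)
```

A small p-value in the Pr(>F) column leads to rejecting H0, meaning the group means are not all equal. ANOVA by itself does not say which groups differ.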
