Mech 262: Data Analysis Fundamentals

Uploaded by sisitrash

Introduction

Terminology:
● Population: Entire data set
● Sample: Subset of population
● Sample space: set of all possible outcomes
● Discrete: countable number of possible values
● Continuous: infinite number of possible values within a range
● Random/Stochastic variable: number assigned to each outcome
● Distributions
○ Symmetric
○ Uniform
○ Bimodal
○ Skewed
○ J-Shaped
● Stochastic process
○ Random process
Describing data set:
● Central tendency
○ Mean
○ Median
○ Mode
● Dispersion
○ Standard deviation (root-mean-square deviation from the mean)
○ $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2}$
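As a quick illustration of these measures, here is a minimal Python sketch using the standard `statistics` module. The data values are made up for the example and are not from the course.

```python
import statistics

# Illustrative data (an assumption for this example, not course data).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value
pop_sd = statistics.pstdev(data)    # population standard deviation (divides by N)
sample_sd = statistics.stdev(data)  # sample standard deviation (divides by N - 1)

print(mean, median, mode, pop_sd, round(sample_sd, 3))
```

The sample standard deviation is slightly larger than the population one because it divides by N - 1 instead of N.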
Normal distribution:
● Also called gaussian distribution or bell curve
● Fully defined by its mean (µ) and standard deviation (σ)
● $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Probability axioms:
● Probability: Likelihood that event will happen
● Axiom 1: probability of any event is between 0 and 1
● Axiom 2: P=1 means event must happen
● Axiom 3: probabilities of all outcomes in the sample space sum to 1
Probability rules:
● Mutually exclusive
○ Events cannot occur at same time
○ RULE: 𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
○ eg. Flipping coin and getting heads or tails
● Mutually inclusive
○ Two events may or may not occur together
○ RULE: 𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴∩𝐵)
■ So we don’t double count overlap
● Independent events
○ Outcome of A doesn’t influence B
○ RULE: 𝑃(𝐴∩𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
○ eg. Flipping two coins separately
● Dependent events:
○ Also called conditional probabilities
○ Probability of A given B happening
○ Denoted: 𝑃(𝐴|𝐵)
○ RULE: 𝑃(𝐴∩𝐵) = 𝑃(𝐵) 𝑃(𝐴|𝐵)
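The four rules above can be checked on a small example. The following is a minimal Python sketch for one roll of a fair six-sided die; the event names (`even`, `high`, `one`, `low`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative example).
SAMPLE_SPACE = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(set(event) & SAMPLE_SPACE), len(SAMPLE_SPACE))

even = {2, 4, 6}
high = {4, 5, 6}   # overlaps with `even`
one = {1}          # mutually exclusive with `even`
low = {1, 2}       # independent of `even`

# Mutually exclusive: P(A ∪ B) = P(A) + P(B)
assert prob(even | one) == prob(even) + prob(one)

# Mutually inclusive: subtract the overlap so it is not counted twice
assert prob(even | high) == prob(even) + prob(high) - prob(even & high)

# Independent: P(A ∩ B) = P(A) × P(B)
assert prob(even & low) == prob(even) * prob(low)

# Conditional: P(A|B) = P(A ∩ B) / P(B), so P(A ∩ B) = P(B) P(A|B)
p_even_given_high = prob(even & high) / prob(high)
assert prob(even & high) == prob(high) * p_even_given_high

print(p_even_given_high)  # 2/3
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.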
Probability distribution:
● Probability mass functions (PMF)
○ For discrete random variables
○ Mean: $\mu = \sum_i x_i\, P(x_i)$
○ Variance: $\sigma^2 = \sum_i (x_i - \mu)^2\, P(x_i)$
● Probability density function (PDF)
○ Since infinite number of outcomes, probability of given outcome is 0
■ ∴intervals must be used
○ Use integral instead of sums
○ $P(a \le x \le b) = \int_a^b f(x)\,dx$
○ Mean: $\mu = \int_{-\infty}^{\infty} x\, f(x)\,dx$
○ Variance: $\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$
● Cumulative distribution function (CDF)
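○ Gives the probability that the variable is at most x: $F(x) = P(X \le x)$
○ Use it to find the probability of an event in a certain range: $P(a \le x \le b) = F(b) - F(a)$
■ Equivalent to adjusting the bounds of integration of the PDF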
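To make the PMF formulas concrete, here is a minimal Python sketch that applies the mean, variance, and CDF definitions to a fair six-sided die. The die and the helper name `cdf` are assumptions made for the example.

```python
from fractions import Fraction

# PMF of a fair six-sided die (illustrative): each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: mu = sum of x_i * P(x_i)
mu = sum(x * p for x, p in pmf.items())

# Variance: sigma^2 = sum of (x_i - mu)^2 * P(x_i)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

def cdf(x):
    """Cumulative probability P(X <= x) for the discrete PMF above."""
    return sum(p for xi, p in pmf.items() if xi <= x)

# P(3 <= X <= 5) from the CDF: F(5) - F(2)
p_3_to_5 = cdf(5) - cdf(2)

print(mu, var, p_3_to_5)  # 7/2 35/12 1/2
```

For a continuous variable the sums would become integrals of the PDF over the same ranges.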

Discrete probability distributions

● Types
○ Binomial distribution
○ Poisson distribution
Choose notation:
● Permutations
○ Denoted: nPk
○ Order matters
○ Number of unique ordered sets
● Combinations
○ Denoted: nCk
○ Order independent
○ Number of unique non-ordered sets
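Both counts can be computed directly. This short Python sketch uses `math.perm` and `math.comb` (available from Python 3.8); the values of n and k are arbitrary.

```python
import math

n, k = 5, 3  # illustrative: choose 3 items from 5

permutations = math.perm(n, k)  # n! / (n - k)!, order matters
combinations = math.comb(n, k)  # n! / (k! (n - k)!), order does not matter

# Each unordered set of k items can be arranged in k! ordered ways.
assert permutations == combinations * math.factorial(k)

print(permutations, combinations)  # 60 10
```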
Binomial distribution:
● Used when each trial has only 2 possible outcomes
○ Eg. pass or fail
○ Probability of getting a certain outcome k times if the process is repeated n times
● $P(k) = \binom{n}{k}\, p^k (1-p)^{n-k}$ where p is probability of desired event, k is # of desired events, n is total number of events
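A minimal Python sketch of the binomial probability, assuming the standard form P(k) = nCk · p^k · (1 − p)^(n − k). The function name `binomial_pmf` and the coin-flip numbers are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k desired events in n independent trials, each with probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative: probability of exactly 5 heads in 10 fair coin flips.
p_five_heads = binomial_pmf(5, 10, 0.5)
print(p_five_heads)  # 0.24609375
```

Summing `binomial_pmf(k, n, p)` over k = 0..n gives 1, which is a useful sanity check.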
Poisson distribution:
● Probability of given events in fixed interval of time or space
● Used for
○ Constant mean rate
○ Events independent of time since last event
○ Time or spatially discrete problems
● Assumptions:
○ Probability of event is independent of time interval (given same length of time)
○ Probability of event is independent of other events
● $P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$
where k is number of times the desired event occurs, λ is the mean number of events in the given time interval of interest
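A minimal Python sketch of the Poisson formula, with λ taken as the mean number of events per interval. The function name `poisson_pmf` and the rate of 3 events per hour are assumptions for the example.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events in the interval) when lam is the mean number of events per interval."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Illustrative: an average of 3 events per hour; probability of exactly 2 in one hour.
p_two = poisson_pmf(2, 3.0)
print(round(p_two, 4))  # 0.224
```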

Continuous probability distributions

Types:
● Normal/gaussian distribution
● Standard log-normal distribution
● Exponential distribution
Normal/gaussian distribution:
● Symmetric; values can be negative
● Models random error

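● $P(x_1 \le x \le x_2) = \int_{x_1}^{x_2} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$
○ But hard to solve so use change of variables
○ $z = \frac{x-\mu}{\sigma}$
■ How many standard deviations away from mean
○ Set standard integral:
■ $P(z_1 \le z \le z_2) = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-z^2/2}\,dz$
■ Set z1=0 to make it a single variable function
■ Use tables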
● Using matlab to find probability
○ p=normcdf(z) where z is change of variable OR
■ From -∞ to point (NOT 0)
○ p=normcdf(x, μ, σ)
■ From -∞ to point (NOT 0)
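The same lookups can be sketched in Python with `statistics.NormalDist` from the standard library, used here as a stand-in for the MATLAB cumulative-probability calls. The values of `mu`, `sigma`, and `x` are made up for illustration.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0  # illustrative population parameters
x = 13.0

# Change of variables: how many standard deviations x is from the mean.
z = (x - mu) / sigma

# Cumulative probability from -infinity up to the point, computed two equivalent ways.
p_from_z = NormalDist().cdf(z)           # standard normal, using z
p_from_x = NormalDist(mu, sigma).cdf(x)  # same result, using x, mu, sigma directly

# Probability between two points is a difference of cumulative values.
p_within_1_sd = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(round(p_from_z, 4), round(p_within_1_sd, 4))  # 0.9332 0.6827
```

Because the cumulative value runs from -∞, a probability between two points always needs a subtraction of two lookups.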
Standard lognormal distribution:
● Strictly positive and occasionally very large
○ eg. Lifetime of equipment
● Variable whose logarithm is normally distributed
○ Take ln of the variable, then apply the normal distribution
Exponential distribution
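● Models the time between events that occur at a constant rate; the probability density decays exponentially with time
● PDF function: $f(x, \lambda) = \lambda e^{-\lambda x}$ where λ is rate parameter
● Mean (μ) = standard deviation (σ) = $\frac{1}{\lambda}$
● CDF: $P(x_1 \le x \le x_2) = e^{-\lambda x_1} - e^{-\lambda x_2}$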
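A minimal Python sketch of the interval probability for the exponential distribution. The function name `exp_interval_prob` and the rate value are hypothetical.

```python
import math

def exp_interval_prob(x1, x2, lam):
    """P(x1 <= x <= x2) for an exponential distribution with rate lam (0 <= x1 <= x2)."""
    return math.exp(-lam * x1) - math.exp(-lam * x2)

lam = 0.5       # illustrative rate: 0.5 events per unit time
mean = 1 / lam  # mean equals standard deviation for this distribution

# Probability that the event occurs within the first `mean` time units.
p_before_mean = exp_interval_prob(0.0, mean, lam)
print(round(p_before_mean, 4))  # 0.6321
```

Regardless of the rate, about 63% of events occur before the mean waiting time, since 1 − e^(−1) ≈ 0.632.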

Test for normality:

● Test how normal an experimental distribution is
● 2 measures
○ Skewness: asymmetry of data
■ Denoted S
■ 𝑆 = 0 is perfectly symmetric
■ |𝑆| > 1 ⇒ very skewed
○ Kurtosis: sharpness of data (also indication of tails/extremes)
■ Denoted K
■ 𝐾 = 3 for a perfect normal distribution
■ note: Some programs subtract 3 automatically (excess kurtosis), so a normal distribution then gives 0
■ |𝐾 − 3| > 3 ⇒ very sharp/blunt
● Standardized moment chart to describe data relative to normal distribution
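Both measures are standardized moments, which can be sketched in a few lines of Python. This version assumes population (divide-by-N) moments and does not subtract 3 from the kurtosis; statistics packages may apply bias corrections, so their values can differ slightly. The function names and data are illustrative.

```python
def central_moment(data, order):
    """Population central moment of the given order."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    """Third standardized moment: 0 for perfectly symmetric data."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth standardized moment: 3 for a normal distribution (no 3 subtracted)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]         # illustrative data
right_skewed = [1, 1, 2, 2, 3, 10]  # one large value pulls the tail to the right

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed) > 1)
```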

Population and samples

Distribution parameter estimation:
● Impractical to measure the whole population
○ ∴usually measure only a sample
● How do we determine if our sample mean accurately represents the population mean?
● Central limit theorem
○ Distribution of sample means is approximately normally distributed
■ For n>30 (# of data points per sample), as a rule of thumb
■ Regardless of population distribution
○ ↗ # of data points (n) per sample → ↘ standard deviation of the sample means
○ Properties
■ If population is normally distributed → sample mean ($\bar{x}$) is normally distributed
■ If # of data points per sample (n)>30 → sample mean ($\bar{x}$) is approximately normally distributed

■ If n>30 →
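The central limit theorem can be seen in a small simulation. The sketch below draws many samples from a uniform population, which is far from bell-shaped, and compares the spread of the sample means with σ/√n. The sample size, number of samples, and seed are arbitrary choices for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 36              # data points per sample (above the n > 30 rule of thumb)
num_samples = 2000  # number of samples drawn

# Population: uniform on [0, 1). Its mean is 0.5 and its standard deviation is sqrt(1/12).
population_sd = (1 / 12) ** 0.5

sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

observed_sd = statistics.stdev(sample_means)
expected_sd = population_sd / n ** 0.5  # sigma / sqrt(n)

print(round(statistics.fmean(sample_means), 3))
print(round(observed_sd, 4), round(expected_sd, 4))
```

Increasing `n` shrinks both values, which is the "more data points per sample, smaller standard deviation of the sample means" effect described above.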
Interval estimation of mean:
● Determining error in our sample mean
● $\mu = \bar{x} \pm \delta$
○ δ is half-width of the confidence interval
○ Standard is 95% confidence interval (ASME standard)
● Confidence level (C)
○ Probability that population mean (μ) lies within confidence interval
○ 𝐶 = 1 − α where α is level of significance
○ C is % chance that the interval contains μ
○ α is % chance that the interval does not contain μ
● Assume standard deviation of sample is equal to standard deviation of population
○ 𝑆= σ
● $\delta = \frac{z_{\alpha/2}\, S}{\sqrt{n}}$ where α is significance
○ To find zα/2 reverse the z process using tables
○ Using matlab
■ z = norminv(p)
■ Links probability to z value
○ z1 is at norminv(α/2) AND z2 is at norminv(C+α/2)
● One-sided intervals
○ Only interested in upper or lower limit

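■ Upper: $\mu \le \bar{x} + \frac{z_{\alpha}\, S}{\sqrt{n}}$
■ Lower: $\mu \ge \bar{x} - \frac{z_{\alpha}\, S}{\sqrt{n}}$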
○ DON’T divide α by 2 since all area (probability) is on one side
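The two-sided and one-sided calculations can be sketched in Python, with `NormalDist().inv_cdf` from the standard library standing in for the inverse normal lookup. The sample statistics (`x_bar`, `s`, `n`) are assumed values, not course data.

```python
from statistics import NormalDist

# Illustrative sample statistics (assumed values, not course data).
x_bar, s, n = 50.0, 4.0, 64
C = 0.95       # confidence level
alpha = 1 - C  # level of significance

# Two-sided: alpha is split between both tails, so look up C + alpha/2.
z_two_sided = NormalDist().inv_cdf(C + alpha / 2)
delta = z_two_sided * s / n ** 0.5
two_sided = (x_bar - delta, x_bar + delta)

# One-sided: all of alpha is in one tail, so alpha is NOT divided by 2.
z_one_sided = NormalDist().inv_cdf(C)
upper_limit = x_bar + z_one_sided * s / n ** 0.5

print(round(z_two_sided, 3), round(delta, 3))        # 1.96 0.98
print(round(z_one_sided, 3), round(upper_limit, 3))  # 1.645 50.822
```

The one-sided z value is smaller than the two-sided one at the same confidence level because the whole α sits in a single tail.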
Student’s t-distribution:
● Use when n<30
● Same procedure as normal distribution BUT
○ Use t instead of z
○ Matlab:
■ tinv(p, nu) where $p_{upper} = C + \frac{\alpha}{2}$
■ ν (nu) is degree of freedom
● As ν→∞, distribution approaches normal distribution
● As ν decreases, distribution flattens and widens (heavier tails)
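Written out, the small-sample interval has the same form as the normal one with t in place of z. The choice ν = n − 1 is the usual one for a single sample mean and is an assumption here, since the notes do not state it; $t_{\alpha/2,\nu}$ denotes the value returned by tinv(C + α/2, ν).

```latex
\mu = \bar{x} \pm \delta, \qquad
\delta = \frac{t_{\alpha/2,\,\nu}\, S}{\sqrt{n}}, \qquad
\nu = n - 1
```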
Estimation of population variance:
● Use chi-squared ($\chi^2$) distribution
● Use matlab
○ chi2inv(p, nu)
● $\chi^2$ is only positive therefore bounds are:
○ $p_1 = \frac{\alpha}{2}$
○ $p_2 = C + \frac{\alpha}{2}$
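For reference, the standard interval for the population variance built from these two bounds is sketched below. It assumes a normally distributed population and ν = n − 1 degrees of freedom, neither of which is stated in the notes; $\chi^2_{p,\nu}$ denotes the value returned by chi2inv(p, ν).

```latex
\frac{(n-1)\,S^2}{\chi^2_{p_2,\,\nu}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)\,S^2}{\chi^2_{p_1,\,\nu}},
\qquad \nu = n - 1
```

The larger χ² value ($p_2$) gives the lower bound on σ² because it appears in the denominator.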

Correlation
Linear correlation:
● Linear correlation coefficient ($r_{x,y}$)
○ 𝑟 = 1 → perfect positive correlation
○ 𝑟 = −1 → perfect negative correlation
○ |𝑟| ≤ 0.1 → little to no correlation
● Only provides data on correlation
○ NO slope
○ NO non-linear correlation
● Matlab:
○ corr(x, y) where x and y are arrays of values
● Significance of linear correlation coefficient
○ ↗ data points → ↗ significance
○ Table gives minimum correlation coefficient needed to accept correlation
○ Depends on
■ #of data points sampled
■ Significance level wanted (α) (% chance that the correlation is due to pure chance)
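A minimal Python sketch of the linear correlation coefficient, written out by hand so each term is visible. The function name `pearson_r` and the data are illustrative; the second data set shows why the coefficient says nothing about non-linear relationships.

```python
import math

def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                   # illustrative data
y_linear = [2 * a + 1 for a in x]     # exactly linear, positive slope
y_curved = [(a - 3) ** 2 for a in x]  # strong but non-linear relationship

print(pearson_r(x, y_linear))  # 1.0
print(pearson_r(x, y_curved))  # 0.0
```

Note that r = 1 for the linear data even though the slope is 2: the coefficient carries no slope information.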

Correlation and causation:

● Correlation is NOT causation

Common questions

Q: How is the Central Limit Theorem significant in the estimation of population parameters based on sample data?

The Central Limit Theorem (CLT) is significant because it states that for sufficiently large sample sizes (n>30 is the usual rule of thumb), the distribution of sample means approximates a normal distribution, regardless of the population's distribution. This allows statisticians to make inferences about population parameters using sample data. It also implies that as the number of data points per sample increases, the standard deviation of the sample mean decreases, which enhances the precision of estimated population parameters.

Q: Why is it often impractical to measure an entire population, and how does statistical sampling address this issue while ensuring reliable inferences?

Measuring an entire population is often impractical because of its size and the time and cost involved. Statistical sampling provides a solution by selecting a representative subset, allowing for reliable inferences about the population. The Central Limit Theorem ensures that the distribution of sample means approximates a normal distribution for large samples, facilitating estimation of population parameters. This approach maintains efficiency and accuracy in statistical analysis and decision-making while reducing unnecessary expenditures.

Q: In what way does the exponential distribution differ in its application compared to the normal and log-normal distributions, particularly regarding the type of data it models?

The exponential distribution models the time between independent events that occur at a constant average rate, which suits data like the time until the next phone call is received. The normal distribution, being symmetric, models random error and data like the distribution of heights. The log-normal distribution models multiplicative growth processes or variables that cannot take on negative values, like asset prices. Hence, while the normal distribution is for symmetric data and the log-normal for strictly positive, right-skewed data, the exponential is for waiting times between events.

Q: Discuss the role and significance of permutations and combinations in probability theory, particularly in the context of discrete distributions.

Permutations and combinations are fundamental in probability theory for counting the possible ways events can occur. Permutations (nPk) are used when the order of outcomes matters, which is crucial for scenarios like scheduling or sequencing tasks. In contrast, combinations (nCk) are used when order does not matter, such as when selecting team members from a group. These concepts are particularly useful in discrete distributions where outcomes must be counted explicitly: in the binomial distribution, for example, nCk counts the ways k desired events can occur in n trials.

Q: How does the calculation of a confidence interval change when using the Student's t-distribution instead of the normal distribution?

When using the Student's t-distribution, which is appropriate for sample sizes less than 30, the confidence interval is calculated using the t-value instead of the z-value from the normal distribution. This is because the t-distribution accounts for the additional uncertainty due to the smaller sample size. As the degrees of freedom increase, the t-distribution approaches the normal distribution. The procedure involves using MATLAB commands like tinv(p, nu) to find critical t-values.

Q: What assumptions underlie the use of a Poisson distribution, and how does it differ from a binomial distribution in modeling events?

The Poisson distribution is used under the assumptions that events occur independently, with a constant mean rate, and that the probability of an event happening is independent of the time since the last event. It is suitable for modeling events over a fixed interval, like the number of calls at a call center in an hour. In contrast, the binomial distribution is used for a fixed number of independent trials with only two possible outcomes (e.g., pass or fail). The Poisson distribution models the number of events in a time frame, while the binomial models the number of successes in a fixed number of trials.

Q: What is the difference between discrete and continuous random variables in probability distributions, and how are probability mass functions and probability density functions related to them?

Discrete random variables have a countable number of possible outcomes, and their probabilities are described using probability mass functions (PMFs), which assign a probability to each possible outcome; these probabilities sum to 1. Continuous random variables, on the other hand, have an infinite number of possible values within a range, requiring the use of probability density functions (PDFs) to determine probabilities over intervals. Unlike with a PMF, the probability of any single point under a PDF is zero, and so integrals are used to calculate probabilities over ranges.

Q: What is the significance of correlation coefficients in interpreting relationships between variables, and what are some limitations of using them?

Correlation coefficients quantify the strength and direction of a linear relationship between two variables; a value of 1 or -1 indicates a perfect positive or negative correlation, respectively. However, they only reflect linear relationships and do not imply causation. Furthermore, high correlation might be due to a third underlying variable or chance, and outliers can disproportionately affect the coefficient. Therefore, while useful for initial data exploration, correlation coefficients must be interpreted with caution to avoid misleading conclusions.

Q: How do skewness and kurtosis contribute to the evaluation of normality in a data set, and why are these measures important?

Skewness measures the asymmetry of the data distribution. A skewness of zero indicates perfect symmetry, whereas significant positive or negative values suggest a lopsided distribution. Kurtosis measures the sharpness of the peak of the data distribution and the weight of its tails. A kurtosis of three matches a normal distribution, while deviations indicate heavier or lighter tails than normal. These measures are crucial because anomalies in skewness or kurtosis suggest deviations from normality, which affects statistical analyses that assume a normal distribution.

Q: Explain how mutually exclusive and mutually inclusive events differ in probability and the rules that govern their probability calculations.

Mutually exclusive events cannot occur simultaneously, and their probability calculation follows the rule P(A∪B) = P(A) + P(B) because there is no overlap. On the other hand, mutually inclusive events can occur together, and their probability is computed with P(A∪B) = P(A) + P(B) − P(A∩B), accounting for the overlap to avoid double counting. These differences underscore the importance of understanding the relationship between events when calculating probabilities.

