Probability and Statistics Tutorial for Engineers
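A trimmed mean is an average calculated after removing a fixed percentage of the smallest and largest values, which reduces the influence of extreme outliers. For a data set of total marks obtained by 10 students, a 20% trimmed mean (20% trimmed in total) removes the bottom 10% and the top 10% of scores, that is, the single lowest and the single highest mark, and averages the remaining eight. Conventions differ: some texts and software treat the percentage as the share cut from each tail (R's `mean(x, trim = 0.2)` drops 20% from each end), so state which one is meant. The result is a measure of central tendency that is less swayed by outliers than the ordinary mean.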
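The calculation can be sketched in Python as follows. This is a minimal illustration, assuming the "total percentage" convention used above; the function name `trimmed_mean` and the marks are hypothetical and not taken from the original exercise.

```python
def trimmed_mean(values, total_percent):
    """Mean after dropping total_percent/2 percent of the values from each tail."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= total_percent < 100:
        raise ValueError("total_percent must be in [0, 100)")
    ordered = sorted(values)
    # Number of values to drop from each end, rounded down (integer arithmetic).
    k = (len(ordered) * total_percent) // 200
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical total marks for 10 students; 5 and 100 are the extreme scores.
marks = [5, 52, 55, 58, 60, 61, 63, 65, 70, 100]

plain_mean = sum(marks) / len(marks)   # uses all ten marks
trimmed = trimmed_mean(marks, 20)      # drops the lowest and the highest mark

print(plain_mean, trimmed)
```

With these made-up marks, the trimmed mean is computed from the middle eight values only, so the two extreme scores no longer affect it.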
The survey process involves seven steps: (1) setting up an administrative organization to establish clear roles and protocols; (2) designing forms to ensure consistent data collection; (3) selecting, training, and supervising field investigators to maintain data integrity; (4) controlling the quality of fieldwork and performing field edits for data accuracy; (5) following up non-responses to maximize the response rate; (6) processing the data for analysis; (7) preparing the final report to communicate the findings effectively. This sequence gives a logical progression from planning to reporting and protects data quality at each stage.
The harmonic mean is particularly useful when averaging rates or ratios, such as average speed over equal distances or financial ratios like price-to-earnings. For positive values it is never greater than the arithmetic mean, and the two are equal only when all values are identical, because the harmonic mean gives more weight to the smaller values in the data set. This matters when small values dominate the overall result, as with the slower leg of a trip, which takes up more of the total travel time.
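A short Python sketch of the equal-distance case, using the standard library's `statistics.harmonic_mean`. The speeds and the leg distance are hypothetical values chosen for illustration.

```python
from statistics import harmonic_mean, mean

# Hypothetical trip: two legs of equal length, driven at 40 km/h and 60 km/h.
speeds_kmh = [40, 60]

average_speed = harmonic_mean(speeds_kmh)  # correct average speed for equal distances
naive_average = mean(speeds_kmh)           # arithmetic mean, overstates the speed

# Cross-check from first principles: total distance divided by total time.
leg_distance_km = 120
total_time_h = sum(leg_distance_km / v for v in speeds_kmh)
check = (leg_distance_km * len(speeds_kmh)) / total_time_h

print(average_speed, naive_average, check)
```

The harmonic mean agrees with the distance-over-time calculation, while the arithmetic mean is higher because it ignores the extra time spent on the slower leg.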
Qualitative variables describe non-numeric characteristics such as categories or labels, while quantitative variables take numeric values. Among the examples, 'time to travel to work,' 'price for a canteen meal,' 'delivery time for a parcel,' and 'height of a child' are quantitative. 'Shoe size,' 'wavelength of light,' and 'customer satisfaction on a scale from 1 to 10' are also treated as quantitative; shoe size and the satisfaction score are discrete (the score is strictly an ordinal rating), whereas the time-related variables, height, and wavelength are continuous. 'Preferred political party,' 'eye color,' 'gender,' and 'blood type' are qualitative.
Histograms represent the distribution of numerical data: the range is divided into adjacent intervals (bins), and bars drawn without gaps show the frequency in each bin. Bar charts display categorical data, with gaps between the bars to signal that the categories are separate. A histogram is therefore more suitable for visualizing the distribution of a numerical data set such as exam scores, while a bar chart is better for comparing discrete categories, such as the number of students in different clubs.
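The difference shows up in how the counts are prepared before anything is drawn. The sketch below uses only the standard library and prints text bars instead of a real chart; the scores, club names, and bin width of 10 are hypothetical.

```python
from collections import Counter

# Hypothetical data: exam scores (numerical) and club memberships (categorical).
scores = [42, 55, 58, 61, 64, 67, 70, 73, 78, 85, 91, 99]
clubs = ["chess", "drama", "chess", "robotics", "drama", "chess"]

bin_width = 10
# Histogram: each score is assigned to the interval [lower, lower + bin_width).
histogram = Counter((s // bin_width) * bin_width for s in scores)
# Bar chart: one count per distinct category; the categories have no numeric scale.
bar_counts = Counter(clubs)

# Bins are adjacent and ordered, so empty bins in the range are still listed.
for lower in range(min(histogram), max(histogram) + bin_width, bin_width):
    print(f"{lower:3d}-{lower + bin_width - 1:3d} | {'#' * histogram[lower]}")

# Categories are listed separately; their order is a presentation choice.
for club, count in sorted(bar_counts.items()):
    print(f"{club:<9} | {'#' * count}")
```

Changing `bin_width` changes the shape of the histogram, whereas the bar chart counts depend only on which categories occur.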
The sample space, the set of all possible outcomes, frames a probability problem by fixing its scope. When four cards are drawn from a standard 52-card deck, the sample space consists of all C(52, 4) combinations of four cards, and this is the starting point for computing the probability of drawing at least one ace. The favorable outcomes are the four-card hands that contain one or more aces; they are easiest to count through the complement, by subtracting the C(48, 4) hands with no ace from the total. Dividing the number of favorable hands by the size of the sample space gives the probability of drawing at least one ace.
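The count can be checked with a few lines of Python. This sketch assumes a standard 52-card deck with four aces and equally likely hands; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

DECK_SIZE = 52
ACES = 4
HAND_SIZE = 4

# Size of the sample space: all four-card hands.
total_hands = comb(DECK_SIZE, HAND_SIZE)
# Complement event: hands drawn entirely from the 48 non-aces.
hands_without_ace = comb(DECK_SIZE - ACES, HAND_SIZE)
favorable_hands = total_hands - hands_without_ace

# Direct count as a cross-check: exactly k aces, for k = 1..4.
direct_count = sum(
    comb(ACES, k) * comb(DECK_SIZE - ACES, HAND_SIZE - k)
    for k in range(1, HAND_SIZE + 1)
)

p_at_least_one_ace = Fraction(favorable_hands, total_hands)
print(total_hands, favorable_hands, float(p_at_least_one_ace))
```

The complement route and the direct count give the same number of favorable hands, and the resulting probability is roughly 0.28.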
Distinguishing discrete (countable) from continuous (measurable) variables is crucial for choosing appropriate statistical methods and visualizations, for instance histograms for continuous data and bar charts for discrete data. A wrong classification can lead to misinterpretation or to inappropriate tests, which undermines the reliability of the results. For example, treating shoe size (discrete) as continuous could lead to incorrect assumptions about the data distribution.
In probability, the intersection of two events is the set of outcomes common to both, that is, the outcomes that satisfy both event criteria simultaneously. For example, take X = {3n − 1 : n ∈ N, n < 3} and Y = {y : y is a prime number < 7}. With N = {1, 2, ...}, X = {2, 5} and Y = {2, 3, 5}, so X ∩ Y = {2, 5} (if N is taken to include 0, X also contains −1, which is not prime, and the intersection is unchanged). Only elements that meet both membership conditions belong to the intersection, which is the basis for computing probabilities of compound events.
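The same sets can be built and intersected in Python. This is a small sketch, assuming N starts at 1; the helper `is_prime` is written here for illustration only.

```python
from math import isqrt


def is_prime(k):
    """Return True if k is a prime number (trial division)."""
    if k < 2:
        return False
    return all(k % d != 0 for d in range(2, isqrt(k) + 1))


# X = {3n - 1 : n in N, n < 3}, taking N = {1, 2, ...}, so n is 1 or 2.
X = {3 * n - 1 for n in range(1, 3)}
# Y = {y : y is a prime number below 7}.
Y = {y for y in range(7) if is_prime(y)}

# The intersection keeps only the elements that belong to both sets.
common = X & Y
print(sorted(X), sorted(Y), sorted(common))
```

Here every element of X is also in Y, so the intersection equals X; in general it can be smaller than either set or empty.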
Bayes' theorem applies here by starting from prior probabilities for a student's performance in Mathematics, Physics, and Chemistry under the given passing criteria. If the observed evidence is that the student's total score passes the threshold while a specific subject is failed, the theorem gives the conditional (posterior) probability of each competing hypothesis: the prior multiplied by the likelihood of the evidence under that hypothesis, divided by the total probability of the evidence. These posterior probabilities can then be compared with the known distributions of student scores across the three subjects.
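The update step can be sketched in Python. All numbers below are hypothetical and chosen only to make the arithmetic easy to follow; the original problem's actual priors and likelihoods are not given here, and the function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply Bayes' theorem to a set of mutually exclusive hypotheses.

    priors[h]      = P(h)
    likelihoods[h] = P(evidence | h)
    Returns P(h | evidence) for every hypothesis h.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # total probability of the evidence
    return {h: joint[h] / evidence for h in joint}


# HYPOTHETICAL priors: which single subject the student failed.
priors = {
    "Mathematics": Fraction(1, 2),
    "Physics": Fraction(3, 10),
    "Chemistry": Fraction(1, 5),
}
# HYPOTHETICAL likelihoods: P(total score still passes | failed that subject).
likelihoods = {
    "Mathematics": Fraction(1, 5),
    "Physics": Fraction(2, 5),
    "Chemistry": Fraction(3, 5),
}

posteriors = posterior(priors, likelihoods)
for subject, p in posteriors.items():
    print(subject, p, float(p))
```

With these made-up inputs, the evidence shifts weight away from the hypothesis with the highest prior, because passing overall is least likely under it.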
To add a new variable 'product' to the DNase data set in R, first load and inspect the data, for example with `read.table()` and `View()`. Then create the variable as the product of the existing variables 'concentration' and 'density' with the assignment `data$product <- data$concentration * data$density` (in R's built-in `DNase` data frame the concentration column is named `conc`, so adjust the name to match your data). The new variable can then be used in further statistical analysis, such as a regression model, where it acts as an interaction term between concentration and density.