Descriptive Stats & Probability Analysis
1. Look at the data given below. Plot the data, identify the outliers, and find μ, σ, and σ².
Name of company Measure X
Allied Signal 24.23%
Bankers Trust 25.53%
General Mills 25.41%
ITT Industries 24.14%
[Link] & Co. 29.62%
Lehman Brothers 28.25%
Marriott 25.81%
MCI 24.39%
Merrill Lynch 40.26%
Microsoft 32.95%
Morgan Stanley 91.36%
Sun Microsystems 25.99%
Travelers 39.42%
US Airways 26.71%
Warner-Lambert 35.00%
Ans:
The mean of the given data is μ = 0.332713.
The median of the given data is 0.2671.
The standard deviation (population formula, dividing by n = 15) is σ = 0.163708.
The variance is σ² = 0.0268.
The outlier is Morgan Stanley (91.36%).
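The figures above can be reproduced with a short Python sketch. It uses only the standard library and the population formulas (dividing by n = 15). The answer does not say how the outlier was identified, so the 1.5 × IQR fence rule used here is an assumption, as are the variable names. The company name "[Link] & Co." is kept exactly as it appears in the table.

```python
from statistics import mean, median, pstdev, pvariance, quantiles

# Measure X from the table, written as fractions (24.23% -> 0.2423).
measure_x = {
    "Allied Signal": 0.2423,
    "Bankers Trust": 0.2553,
    "General Mills": 0.2541,
    "ITT Industries": 0.2414,
    "[Link] & Co.": 0.2962,  # name kept as it appears in the table
    "Lehman Brothers": 0.2825,
    "Marriott": 0.2581,
    "MCI": 0.2439,
    "Merrill Lynch": 0.4026,
    "Microsoft": 0.3295,
    "Morgan Stanley": 0.9136,
    "Sun Microsystems": 0.2599,
    "Travelers": 0.3942,
    "US Airways": 0.2671,
    "Warner-Lambert": 0.3500,
}
values = list(measure_x.values())

mu = mean(values)             # arithmetic mean
med = median(values)          # middle value of the sorted data
sigma = pstdev(values)        # population standard deviation
sigma_sq = pvariance(values)  # population variance

# Assumed outlier rule: flag points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [name for name, x in measure_x.items()
            if x < lower_fence or x > upper_fence]

print(f"mean={mu:.6f} median={med:.4f} sigma={sigma:.6f} variance={sigma_sq:.4f}")
print("outliers:", outliers)
```

Under this rule only Morgan Stanley falls outside the fences, which agrees with the answer above.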
2.
c. If it was found that the data point with the value 25 is actually 2.5, how would
the new box-plot be affected?
Ans:
If the data point with the value 25 is actually 2.5, the point that was plotted as an
outlier at 25 moves to the other side of the boxplot, toward the lower extreme.
3. Answer the following three questions based on the histogram below.
c. Suppose that the above histogram and the box-plot in question 2 are plotted for
the same dataset. Explain how these graphs complement each other in providing
information about any dataset.
Ans:
The histogram shows whether the data are symmetric, while the boxplot shows the
outliers and the IQR of the data. The median can be read from the boxplot and the
mode from the histogram.
The histogram gives the frequency distribution, so we can see how often values occur,
whereas the boxplot gives the quantile distribution, e.g., 50% of the data lies
between 5 and 12.
The boxplot uses whisker length to identify outliers; the histogram gives no such
information. From the histogram alone, we can only guess from the gap that 25 may
be an outlier.
4. AT&T was running commercials in 1990 aimed at luring back customers who had
switched to one of the other long-distance phone service providers. One such
commercial shows a businessman trying to reach Phoenix and mistakenly getting Fiji,
where a half-naked native on a beach responds incomprehensibly in Polynesian. When
asked about this advertisement, AT&T admitted that the portrayed incident did not
actually take place but added that this was an enactment of something that “could
happen.” Suppose that one in 200 long-distance telephone calls is misdirected. What is
the probability that at least one in five attempted telephone calls reaches the wrong
number? (Assume independence of attempts.)
Ans:
Suppose 1 in 200 long-distance telephone calls is misdirected.
Probability that a call is misdirected: p = 1/200
Probability that a call is not misdirected: q = 1 - 1/200 = 199/200
Number of calls: n = 5
Let X be the number of misdirected calls among the five attempts. Because the
attempts are independent, X follows a binomial distribution with n = 5 and p = 1/200.
P(at least one of the five calls reaches the wrong number)
= 1 - P(none of the calls reaches the wrong number)
= 1 - P(X = 0)
= 1 - C(5, 0) (1/200)^0 (199/200)^5
= 1 - (199/200)^5
= 0.02475.
The probability that at least one of five attempted telephone calls reaches the wrong
number is 0.02475.
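The complement calculation can be checked with a minimal Python sketch. The function name prob_at_least_one is a hypothetical helper introduced here for illustration. It assumes independent attempts, as stated in the question.

```python
from math import comb

def prob_at_least_one(n: int, p: float) -> float:
    """P(X >= 1) for X ~ Binomial(n, p), computed as 1 - P(X = 0)."""
    p_none = comb(n, 0) * p**0 * (1 - p)**n  # binomial term for zero misdirected calls
    return 1 - p_none

result = prob_at_least_one(n=5, p=1 / 200)
print(round(result, 5))  # 0.02475
```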
5. Returns on a certain business venture, to the nearest $1,000, are known to follow the
following probability distribution
x        P(x)
-2,000   0.1
-1,000   0.1
0        0.2
1,000    0.2
2,000    0.3
3,000    0.1
a. What is the most likely monetary outcome of the business venture?
Ans:
The largest probability is P(2,000) = 0.3, so the most likely outcome is $2,000.
c. What is the long-term average earning of business ventures of this kind? Explain.
Ans:
Weighted average = Σ x·P(x)
= (-2,000)(0.1) + (-1,000)(0.1) + (0)(0.2) + (1,000)(0.2) + (2,000)(0.3) + (3,000)(0.1)
= 800. This means the average earnings expected over a long period of time would be $800.
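The long-run average is the probability-weighted sum of the outcomes in the table for this question. A minimal Python sketch follows; the variable names are illustrative. For this table the sum evaluates to 800.

```python
# Probability distribution of returns from the table: outcome -> probability.
returns = {-2000: 0.1, -1000: 0.1, 0: 0.2, 1000: 0.2, 2000: 0.3, 3000: 0.1}

# The probabilities should sum to 1 (small tolerance for floating point).
assert abs(sum(returns.values()) - 1.0) < 1e-9

# Expected value: each outcome weighted by its probability.
expected = sum(x * p for x, p in returns.items())
print(round(expected))  # 800
```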
d. What is a good measure of the risk involved in a venture of this kind? Compute
this measure.
Ans:
P(loss) = P(x = -2,000) + P(x = -1,000) = 0.1 + 0.1 = 0.2.
So, the risk associated with this venture, measured as the probability of a loss, is 20%.
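The probability of loss can be computed directly from the table, as in the sketch below. The sketch also computes the standard deviation of the returns. That is another commonly used, spread-based measure of risk; it is added here as an assumption and is not part of the answer above. Variable names are illustrative.

```python
from math import sqrt

# Probability distribution of returns from the table: outcome -> probability.
returns = {-2000: 0.1, -1000: 0.1, 0: 0.2, 1000: 0.2, 2000: 0.3, 3000: 0.1}

# Risk measure used in the answer: total probability of a negative return.
p_loss = sum(p for x, p in returns.items() if x < 0)

# Alternative measure (an assumption, not from the answer): standard deviation.
mean_return = sum(x * p for x, p in returns.items())
variance = sum(p * (x - mean_return) ** 2 for x, p in returns.items())
std_dev = sqrt(variance)

print(f"P(loss) = {p_loss:.1f}")
print(f"standard deviation = {std_dev:.2f}")
```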
Advertisements, such as AT&T's depiction of a possible miscommunication, play a significant role in shaping consumer perceptions by dramatizing potential benefits and drawbacks of services. They are designed to captivate attention and elicit emotional responses, often exaggerating scenarios for effect. Ethically, such portrayals must be carefully managed to avoid misleading the public, as overstatement or false representation might lead to consumer distrust and regulatory scrutiny. Companies must balance creativity and ethical responsibility, ensuring honest communication while engaging audiences, to maintain credibility and consumer trust.
The boxplot and histogram together offer a detailed view of the dataset’s distribution characteristics by serving complementary roles. The histogram shows the frequency distribution of the data, revealing its shape, central tendency, and spread, which includes insight into skewness and modality. Meanwhile, the boxplot focuses on the data’s quartiles and outliers, providing insights into its spread, such as the interquartile range, and its median. Together, they give a fuller picture: the histogram identifies the mode, and the boxplot pinpoints outliers and divides the data into quantiles, offering a robust analysis of typical and extreme values.
The calculated probability of 0.02475 means there is a 2.475% chance that at least one of five attempted long-distance telephone calls will be misdirected. This low probability suggests that misdirected calls are rare events, which supports the reliability of the service, but rare errors still matter because their consequences can be disruptive when they do occur. This probability analysis is useful for assessing operational quality and areas for improvement in communication services.
Identifying outliers like Morgan Stanley (91.36%) is significant because they can disproportionately influence the results of statistical analyses, such as mean and standard deviation, by skewing them towards the outlier's direction. In practical terms, eliminating or treating outliers is crucial for accurate data analysis, as they may indicate data entry errors, or represent special cases that need to be analyzed separately. Outliers can also mask underlying trends in the data if not appropriately managed, potentially leading to incorrect conclusions.
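The influence of the outlier can be illustrated by recomputing the mean and population standard deviation of the Measure X data from question 1 with and without the Morgan Stanley value. This is a minimal sketch with illustrative variable names. Dropping the point is done only for comparison, not as a recommendation.

```python
from statistics import mean, pstdev

# Measure X values from question 1, as fractions; 0.9136 is Morgan Stanley.
values = [0.2423, 0.2553, 0.2541, 0.2414, 0.2962, 0.2825, 0.2581, 0.2439,
          0.4026, 0.3295, 0.9136, 0.2599, 0.3942, 0.2671, 0.3500]
without_outlier = [v for v in values if v != 0.9136]

mean_all, sd_all = mean(values), pstdev(values)
mean_without, sd_without = mean(without_outlier), pstdev(without_outlier)

print(f"with outlier:    mean={mean_all:.4f}  sigma={sd_all:.4f}")
print(f"without outlier: mean={mean_without:.4f}  sigma={sd_without:.4f}")
```

Removing the single point lowers the mean from about 0.3327 to about 0.2912 and also reduces the standard deviation.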
The assumption of independence is crucial for calculating the probability of misdirected calls, as it implies that the outcome of one call does not affect another, allowing the use of binomial probability models. This independence assumption makes the calculation straightforward by using the formula: P(at least one misdirected call in five) = 1 - P(no misdirected calls). Independence simplifies the computation and analysis, supporting accurate predictions and resource allocation in telecommunications, where dependent events would require more complex modeling.
A measure of risk, such as the probability of loss (20% in this data), is important because it quantifies the likelihood of negative financial outcomes, essential for investors and managers to assess potential downsides and prepare risk management strategies. It is computed by summing the probabilities of outcomes that result in losses, which in this data are -$2,000 and -$1,000. Accurate risk assessment helps in making informed decisions, identifying potential challenges, and optimizing resource allocation to mitigate risks.
In a positively skewed dataset, the mean is typically greater than the median: most data points have lower values and occur more frequently, while a few extreme high values pull the mean upwards. This skewness suggests caution in using the mean as a measure of central tendency, as it might be misleading compared to the median, which better represents the central location of most data points. Consequently, for data interpretation, the median might offer a more robust insight into the typical values within the dataset, as it is less affected by outliers.
The probability of making a profit (60% in this dataset, the sum of the probabilities of the $1,000, $2,000, and $3,000 outcomes) is a useful evaluation metric because it directly measures how often the venture is expected to succeed financially. A higher probability of profit implies better return expectations and a lower chance of loss, which matters to stakeholders and investors. This metric allows for comparison between different ventures and aids in decision-making processes, demonstrating the venture's potential to achieve positive returns more often than not, which would attract investment interest and strategic partnerships.
Changing a data point from 25 to 2.5 would significantly affect the dataset's interpretation. The point at 25, previously an outlier, would move towards the lower extreme of the dataset on the boxplot. This would reduce the overall range and potentially alter the interquartile range, adding more data to the lower quartile. Such a change would suggest a more pronounced left skewness and potentially affect interpretations related to variability and significant deviations from typical values. It highlights how sensitive visual representations like boxplots are to changes in individual data points.
The weighted average, calculated as $800, represents the expected long-term earnings from the business venture, accounting for all possible outcomes weighted by their probabilities. This average provides a single measure that captures the central tendency of possible financial returns, offering insights into expected profitability over time. It is particularly useful in evaluating risk versus reward, guiding strategic planning and investment decisions. This calculation helps in understanding how different outcomes contribute to overall performance, reflecting the business’s potential future earnings.