OTHER
Research hypothesis = testable statement that predicts whether, and in what way, two or more
variables are related
o Those with severe traumatic brain injuries will have higher depressive symptoms than those
with mild or moderate injuries
Statistical hypothesis = mathematically precise statement that corresponds to a specific claim
about a population, usually absolute (there is or isn't an effect); when non-directional it doesn't
tease out which group is affected, so it might not fully answer the research hypothesis
o There is (or is not) a difference in depression scores between individuals with mild,
moderate and severe traumatic brain injuries
o H0 = No trend or pattern
o H1 = Distinct trend or pattern (difference between variables)
o Directional hypotheses use one tailed tests, non-directional use two-tailed (only looking
at two-tailed in this course, take probability of one tail and double it)
Step by step process
o Background reading to identify RQ and hypotheses > design study/identify methods and
measures > carry out study > general descriptive statistics > identify variables involved >
identify the appropriate test > check assumptions for test > run test and conclude
Types of Research Design
o Longitudinal vs Cross-sectional
Longitudinal considerations; test-retest effects, participant attrition (drop-out),
time investment, retrospective data (recall bias)
Time-points (waves) either predicting over time or measuring change, related
groups t-test (numeric) or McNemar's test (categorical)
o Experimental vs Non-experimental
Criteria for a cause and effect: covariance rule (relationship), temporal
precedence (cause must come first), internal validity (excluding other causes)
Non-experimental methods; single-variable, correlational research, quasi-
experimental
Experimental methods; between-subjects (researcher control, non-naturally
occurring groups, distinct), within subjects (same group of people but with
different 'conditions')
Between subjects needs more participants, statistically less powerful,
shorter experimental time good for participants, no carry-over effects
Within subjects needs half the participants, statistically powerful,
double experimental time, carry-over effect potential
o Survey vs Observational
Probability sampling
o Simple random, systematic (unbiased rule for selection, e.g. every k-th case), stratified random
(population split into groups by a characteristic, random selection within each group), cluster
sampling (only certain clusters are selected)
Non-probability sampling
o Convenience sampling, snowball sampling
Reliability = consistency/dependability of measurement, e.g. reliability of a scale: does it measure
the construct consistently
Validity = does it measure what it claims to measure: face validity, content validity, criterion
validity (concurrent vs predictive), convergent, discriminant
Types of data = Qualitative vs Quantitative (or mixed) / Discrete vs Continuous
o Data analysis for qualitative is often described/explained, coded for common themes
(inter-rater reliability important) and then turned into variables for analysis
o Independent (predict/cause an outcome) vs dependent (outcome) = E.g. PAL attendance
(IV) on final grade (DV)
o Extraneous variable (anything other than IV/DV) and confounding variable (potentially
explain relationship)
o Nominal (unordered, categorical, arbitrary with no hierarchy, one example is
binary/dichotomous variable with only two groups/levels). Gender is nominal,
woman/man/other, totally distinct
o Ordinal (ordered categorical, hierarchy). Highest level of education.
o Interval (distance is meaningful, numeric scale with consistent differences between
points). Pain on a scale from 0-10.
o Ratio (numeric scale with consistent differences between points AND an absolute zero, where 0
means complete absence of the thing you're measuring; by contrast 0 Celsius doesn't mean there's
no temperature, so Celsius is interval). Distance between two friends sitting next to each other (cm).
One categorical variable = Bar chart (frequency of categories, greater detail), pie chart (shows
each category as a share of the whole)
o Two categorical variables = Contingency table (numeric summary), clustered bar chart
(graph)
One numerical variable = Boxplot (comparative boxplots compare the variable across groups),
histogram (no comparison, shape of distribution shown)
o Two numerical variables = Correlation (numeric summary), scatterplot (graph)
Numeric summaries: frequency tables with count, percentage or proportion. Mean or median,
SD or IQR, variance and range.
o With two categorical variables we create a cross-tabulation of one categorical variable
by another, allows comparison
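The numeric summaries above can be sketched in a few lines of Python. This is only an illustration with made-up data (the names `gender` and `scores` are hypothetical, not course data), and the quartile rule used by `statistics.quantiles` can differ slightly from other software on small samples.

```python
import statistics
from collections import Counter

# Hypothetical categorical variable: frequency table with counts and percentages
gender = ["woman", "man", "woman", "other", "woman", "man"]
counts = Counter(gender)
percent = {level: 100 * n / len(gender) for level, n in counts.items()}

# Hypothetical numeric variable: centre and spread
scores = [2, 4, 4, 5, 7, 9, 11]
mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.stdev(scores)            # sample SD (divides by n - 1)
variance = statistics.variance(scores)   # sample variance (SD squared)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartiles, default 'exclusive' rule
iqr = q3 - q1
data_range = max(scores) - min(scores)

print(counts, percent)
print(mean, median, sd, variance, iqr, data_range)
```

Mean with SD suits roughly symmetric data, median with IQR suits skewed data.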
Interval estimate gives us a range of believable values for the parameter, the interval/range is
called a confidence interval. The conventional confidence level is 95%
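As a rough sketch of how a confidence interval for a mean is built, the Python below takes the sample mean plus or minus a critical value times the standard error. It assumes a normal (z) critical value for simplicity; with small samples a t critical value would normally be used, so treat the numbers as approximate. The function name and the data are made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Approximate CI for a mean using a normal critical value (an assumption)."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    # For a 95% level this is about 1.96
    critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return centre - critical * standard_error, centre + critical * standard_error

scores = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data
low, high = mean_confidence_interval(scores)
print(low, high)
```

Raising the confidence level (e.g. 99%) widens the interval; a larger sample narrows it.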
Why 5% significance level? It caps the chance of a type 1 error at 5% (we reject the null
hypothesis when we shouldn't, false positive result, the sample effect is due to chance)
o Type 2 error is not rejecting the null hypothesis when we should, our sample isn't
detecting a population effect, false negative result
Power is the probability that we correctly reject the null hypothesis when it is false (1 minus the
type 2 error rate), influenced by significance level (a higher significance level, e.g. 0.10 rather
than 0.05, gives higher power but more type 1 errors). Also impacted by sample size (power
increases with larger size) and the variability in the DV (the more variable the DV, the harder it
is to reject)
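To see how significance level, sample size and variability each move power, here is a minimal Python sketch for a two-tailed one-sample z-test. It assumes a known population SD and a hypothetical true difference (`effect`) from the null value; the function name and all numbers are made up for illustration.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test for a given true mean difference."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true mean sits away from the null value
    shift = effect / (sd / math.sqrt(n))
    # Probability the test statistic lands in either rejection region
    return std_normal.cdf(-critical + shift) + std_normal.cdf(-critical - shift)

baseline = z_test_power(effect=5, sd=15, n=30)
print(baseline)
print(z_test_power(effect=5, sd=15, n=30, alpha=0.10))  # higher alpha, higher power
print(z_test_power(effect=5, sd=15, n=100))             # larger sample, higher power
print(z_test_power(effect=5, sd=25, n=30))              # more variable DV, lower power
```

With no true effect (`effect=0`) the function returns the significance level itself, which is the type 1 error rate.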
TESTS
One sample t-test = single numeric variable
One sample z-test = single numeric variable (population standard deviation known)
o Z value close to 0 gives a large p-value, far from 0 gives a small p-value
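A small Python sketch of the one-sample z-test, showing the two-tailed p-value obtained by taking the probability in one tail and doubling it. The function name and the numbers are hypothetical; it assumes the population standard deviation is known.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, pop_sd, n):
    """Return the z statistic and the two-tailed p-value."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    one_tail = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return z, 2 * one_tail                   # double it for a two-tailed test

# Hypothetical example: sample mean 103 from n = 25, null mean 100, population SD 15
z, p = one_sample_z_test(103, 100, 15, 25)
print(z, p)
```

Here z is 1.0 and p is roughly 0.32, so a z this close to 0 is not surprising under the null hypothesis.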
Chi-square goodness of fit test = single categorical variable
Correlation = two numeric variables
Independent samples/two-sample t-test = one numeric variable, one categorical variable
Paired samples t-test = one numeric variable, one categorical variable (related groups)
Chi-square test of independence = two categorical variables
McNemar's test = two categorical variables (related groups, e.g. the same binary variable measured at two time points)
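For the chi-square test of independence, the expected count in each cell is (row total x column total) / grand total, and the statistic sums (observed - expected)^2 / expected over all cells. The Python below is a minimal sketch with a made-up 2x2 table and hypothetical function names; it stops at the statistic and does not compute a p-value.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

observed = [[20, 30], [30, 20]]  # hypothetical contingency table
expected = expected_counts(observed)
assumption_ok = all(cell >= 5 for row in expected for cell in row)
print(expected, assumption_ok, chi_square_statistic(observed))
```

The `assumption_ok` check mirrors the rule that every expected frequency should be at least 5.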
Effect sizes
o Cohen's d = 0.2 (small), 0.5 (medium), 0.8 (large)
o Cohen's w = 0.1 (small), 0.3 (medium), 0.5 (large)
o Pearson's r = 0.1 (small), 0.3 (medium), 0.5 (large)
o Pearson's correlation coefficient (absolute value) = 0-.10 (none to weak), .10-.30 (weak), .30-.50
(moderate), .50-1.00 (strong)
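As an illustration of the benchmarks above, this Python sketch computes Cohen's d for two independent groups using the pooled standard deviation, then labels it. The data and function names are made up, and the "negligible" label for values under 0.2 is an assumption rather than part of the listed benchmarks.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)

def label_d(d):
    """Label |d| with the 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumed label for anything under 0.2

d = cohens_d([2, 4, 6], [1, 3, 5])  # hypothetical scores
print(d, label_d(d))
```

The sign of d only shows which group scored higher; the benchmarks apply to its absolute value.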
Descriptive statistics, understanding the sample (and comparison to population), phenomena
and the statistical analyses needed
o Demographics = tab1 Gender Ethnicity (frequency table), summarize Age (mean, range,
std)
o Key Variables = tab1 control group, tab1 experimental group
o Histogram frequency charts = histogram variable, frequency
STATA
label define yesno 1 "Yes" 2 "No"
o Defines a value label (yesno) that gives text labels to values that were previously just numbers
label values variable variable variable yesno
o Attaches the yes/no label to specific variables
tab1 variable
o Frequency table
tabulate variable1 variable2, expected row chi2
o Assumption check that expected frequency is at least 5 for Chi square test of
independence
recode variable (min/9 = 0) (10/max = 1), generate(new_variable_name)
o Separates data into 0 and 1 (under/over the limit) rather than a whole range of numbers
to summarize
summarize variable, detail
o Descriptive statistics (mean, standard deviation, percentiles including the median and the
25th/75th for the interquartile range, min and max)
swilk variable
o Shapiro-Wilk test of normality, the assumption check for the one sample t-test; null hypothesis
is that the variable is normally distributed
reshape wide tol, i(id) j(age)
o This will restructure the data from long to wide, so that each person (id) has one row with a
separate tolerance (tol) column for each age.
robvar Mean_general_ds, by(Gender)
o Levene's test of equality of variances; null hypothesis is that the group variances are equal, so
a p-value below .05 means we reject equal variances