Boost Your Potential Workshops:
Manipulating Stats and Generating Reports
(Basic Level)
Session 2
Prepared and Presented by:
Razan Youssef, Daniel Bou Najm, and Malak Al Bourji
M2 Neuroscience, LU
Under the supervision of Dr. Sarine El Daouk
Descriptive Statistics and Data Visualization
Variable Types
Categorical variables
• Nominal: Blood type (A, B, AB, O); Gender (Male, Female)
• Ordinal: Education level (No formal education, Elementary / Middle School, High School, Bachelor's Degree, Master's Degree, Ph.D.)
Numeric variables
• Discrete (can’t be halved): Number of people; Number of correct answers (MCQ)
• Continuous: Age; Height; Weight
CODING OF VARIABLES
CODING
Coding is the process of translating data gathered from questionnaires or other sources into something that can be analyzed. It involves assigning a value to each response; often, the value is also given a label.
SPSS analyses work on numbers, not on text labels!
We need to code categorical variables in SPSS. There is no need to code numeric data.
CODING
Nominal Variables
For coding nominal variables, the order makes no difference
Example: variable RESIDENCY
1 = Nabatieh
2 = South
3 = Beirut
4 = North
5 = Bekaa
6 = Akkar
7 = Baalbeck – Hermel
8 = Mount Lebanon
Order does not matter: no ordered value is associated with each response (codes can be assigned at random).
CODING
Nominal Variables
Common coding systems (code and label) for dichotomous variables:
0=No and 1=Yes
OR 1=No and 2=Yes
Order does not matter: no ordered value is associated with each response.
(For yes and no, it is better to start coding from “No”.)
CODING
Ordinal Variables
Coding process is similar to other categorical variables
Example: variable EDUCATION, possible coding:
1 = Did not graduate from high school
2 = High school graduate
3 = Some college or post-high school education
4 = College graduate
It could also be coded in reverse order (from 1 = College graduate to 4 = Did not graduate from high school).
CODING
Ordinal Variables
Example of BAD coding:
0 = Very satisfied
1 = Not satisfied
2 = Satisfied
3 = Neutral
The data have an inherent order, but the coding does not follow that order. This is NOT appropriate coding for an ordinal categorical variable.
Correct way (or in reverse order):
0 = Very satisfied
1 = Satisfied
2 = Neutral
3 = Not Satisfied
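The coding rules above can also be tried outside SPSS. The following is a minimal Python sketch, not part of the SPSS workflow: the code book `SATISFACTION_CODES`, the helper `code_responses`, and the sample answers are all hypothetical and only illustrate how labels are translated into ordered numeric codes.

```python
# Hypothetical code book: the numeric codes follow the natural order of the answers.
SATISFACTION_CODES = {
    "Very satisfied": 0,
    "Satisfied": 1,
    "Neutral": 2,
    "Not satisfied": 3,
}


def code_responses(responses, codebook):
    """Translate text answers into numeric codes; unknown or blank answers become None (missing)."""
    coded = []
    for answer in responses:
        # Tolerate stray spaces and different capitalization.
        label = answer.strip().capitalize()
        coded.append(codebook.get(label))
    return coded


# Made-up answers for illustration only.
raw_answers = ["Satisfied", "very satisfied", "Neutral ", "", "Not satisfied"]
coded_answers = code_responses(raw_answers, SATISFACTION_CODES)
print(coded_answers)
```

Blank or unrecognized answers are kept as missing values rather than being given a code, which mirrors how missing data is reported separately in the frequency tables.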
HOW TO CODE ON SPSS
Code on Excel Code on SPSS
1. First, identify the variable you want to code and copy its categories from Excel or SPSS (so you can paste them directly into SPSS).
2. Open SPSS > Transform > Recode into Different Variables.
3. Select the variable you want to recode and click the arrow to move it into the box.
4. Enter a name for the new variable (it is best to add NEW to the original name), then give it a label. Click Change.
5. Click Old and New Values.
6. Enter the category name as the old value and its code (1, 2, etc.) as the new value. Click Add for each category.
7. Once you are done, click Continue, then OK.
Once you name and label the new variable and click Change, click “Old and New Values”. It is better to give the new variable a distinctive name, such as one ending in “NEW” or “CODED”.
You can copy the value from Excel.
Remember the codes for a later step!
Male = 1
Female = 2
Then click OK.
You will see in the variable view the new variable you created:
Now let us add the values…
Click on the blue button in the Values cell of the new variable.
Recall the code numbers you assigned to each category.
Add the number in “Value”.
Add the name of the category in “Label”.
Click Add.
Repeat for each category, then click OK.
To confirm, go to Data View and turn on View > Value Labels.
Descriptive Statistics and Data Visualization
RUNNING DESCRIPTIVE STATISTICS: CATEGORICAL VARIABLES
Select Analyze > Descriptive Statistics > Frequencies
In this example, there are 17 missing values in the data.
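To make the frequency output easier to read, here is a minimal Python sketch of what a frequency table computes: counts per category, the percentage among valid answers, and the number of missing values. The helper `frequency_table` and the sample `gender` data are hypothetical; the percentage shown is computed over valid cases only, which is an assumption about which percentage you want to report.

```python
from collections import Counter


def frequency_table(values):
    """Return (rows, n_missing); each row is (category code, count, percent of valid cases)."""
    valid = [v for v in values if v is not None]
    n_missing = len(values) - len(valid)
    counts = Counter(valid)
    rows = [
        (category, n, round(100 * n / len(valid), 1))
        for category, n in sorted(counts.items())
    ]
    return rows, n_missing


# Made-up data: 1 = Male, 2 = Female, None = missing answer.
gender = [1, 2, 2, 1, None, 2, 2, None, 1, 2]
rows, n_missing = frequency_table(gender)

for category, n, percent in rows:
    print(f"code {category}: n = {n}, {percent}% of valid cases")
print(f"missing: {n_missing}")
```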
Descriptive Statistics and Data Visualization
RUNNING DESCRIPTIVE STATISTICS: CONTINUOUS (NUMERIC) VARIABLES
MEASURES OF CENTRAL TENDENCY
MEAN, MEDIAN, and MODE
• Mean: The "average" number; found by adding all data points and dividing by the number of data points.
• Median: The middle number; found by ordering all data points and picking out the one in the middle.
• Mode: The most frequent number—that is, the number that occurs the highest number of times.
• Results are reported as Mean ± SD or Median ± IQR.
The mean is usually reported when the data are roughly symmetric (normally distributed), which is more likely with a large sample, while the median is preferred when the data are skewed or the sample size is small.
MEASURES OF DISPERSION
VARIANCE AND STANDARD DEVIATION
• Standard deviation (SD): how spread out the data are. You can think of it as roughly “the typical distance of the data from the mean”.
• Variance: the square of the standard deviation; it also represents how dispersed the data are.
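The definitions above can be checked on a small example. This is a minimal sketch using Python's standard `statistics` module; the `ages` list is made up, and the quartile rule used by `statistics.quantiles` may differ slightly from the one SPSS applies, so the IQR can differ a little between tools.

```python
import statistics

# Made-up ages for illustration only.
ages = [22, 24, 25, 25, 27, 30, 31]

mean_age = statistics.mean(ages)          # sum of values divided by their number
median_age = statistics.median(ages)      # middle value after sorting
mode_age = statistics.mode(ages)          # most frequent value
sd_age = statistics.stdev(ages)           # sample standard deviation
variance_age = statistics.variance(ages)  # sample variance = SD squared

# Quartiles: the interquartile range (IQR) is Q3 - Q1.
q1, _, q3 = statistics.quantiles(ages, n=4)
iqr_age = q3 - q1

print(f"Mean ± SD: {mean_age:.1f} ± {sd_age:.1f}")
print(f"Median ± IQR: {median_age} ± {iqr_age}")
print(f"Mode: {mode_age}, Variance: {variance_age:.1f}")
```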
THERE ARE TWO WAYS:
1st way to calculate the mean, median, and mode:
Select Analyze > Descriptive Statistics > Frequencies
Once you have added your variable(s), click the “Statistics” button and check the statistics you need (mean, median, mode).
Click Continue
2nd way (to calculate mean, median, SD, IQR, and variance)
Select Analyze > Descriptive Statistics > Explore > Add your variable of interest to the “Dependent List” > Click OK
Mean ± SD = 25.8 ± 4.1
Variance = 17.2
Median ± IQR = 26.1 ± 6.8
Exploring Data
Descriptive Statistics
Categorical Data
- Frequency
- Percentage/proportion
Continuous Data
- Mean ± Standard Deviation
- Median ± IQR
Graphical Illustrations
Categorical Data
- Bar charts
- Pie graphs (preferably %)
They can be in frequency or percentages
Continuous Data
- Histogram
• Bar charts and pie charts for categorical variables can be done in Excel
(after cleaning the data).
• You can copy and paste the frequency table output from SPSS into Excel and arrange it with your categories in rows and Frequency or Percentage in columns. It is important to label the axes!
• To generate histograms for numeric variables in SPSS, click
Graphs > Legacy Dialogs > Histogram > Add the variable of interest
It is best to use text wrapping when pasting the graph into your report.
INFERENTIAL STATISTICS
Plan
A- Main Statistical Tests:
Two-way Chi-square Test
Two-independent sample T Test
One-way ANOVA Test
Pearson Correlation
B- Linear Regression:
Simple Linear Regression
Multiple Linear Regression
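Flow Chart
Outcome/Dependent Variable → Exposure/Independent Variable → Test
• Continuous outcome, categorical exposure with 2 groups → Two-independent sample T Test
• Continuous outcome, categorical exposure with >2 groups → One-way ANOVA Test
• Continuous outcome, continuous exposure → Pearson Correlation
• Categorical outcome, categorical exposure with >= 2 groups → Two-way Chi-square Test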
1
Chi-Square:
χ2 Test of Independence
Chi-Square, χ2 Test of Independence
The chi-square test is used to determine whether there is an association between two
categorical variables.
Conditions:
• Both variables should be categorical (nominal or ordinal)
Yes/No, Male/Female, …
• Each variable can contain two or more groups
Low/Moderate/High, …
Chi-Square, χ2 Test of Independence
Steps:
1. Analyze → Descriptive Statistics → Crosstabs
2. Transfer the “exposure” variable into the row(s) and the “outcome” variable into the
column(s)
3. In “Statistics”, select “Chi-square”. Then click Continue
4. In “Cells”, select “Observed” in Counts, & “Row” in Percentages
Chi-Square, χ2 Test of Independence
Output & Interpretation:
3 tables will appear in the output sheet:
1) Case Processing Summary:
This table highlights the number of valid and missing
cases
2) Crosstabulation:
It is a descriptive statistics table
How to read the results?
Chi-Square, χ2 Test of Independence
Output & Interpretation:
3 tables will appear in the output sheet:
3) Chi-square tests
The p-value should be <0.05 to conclude that there is a significant association between both variables.
In this case, the p-value = 0.000 (which is reported as “<0.001”).
This means that the percentage of the outcome (Depression) differs significantly between the different groups of the exposure (Social media use).
P-value
➢ The p-value indicates whether there is a statistically significant difference between 2 groups, or whether there is an association between 2 variables
➢ The lower the p-value, the higher the statistical significance
➢ A p-value lower than 0.05 (5%) is considered significant
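To show where the chi-square value in the output comes from, here is a minimal Python sketch for a 2x2 table only. The counts in `observed` are made up, the helper `chi_square_2x2` is a hypothetical name, and the calculation is the plain Pearson chi-square without continuity correction. The p-value shortcut used here is valid only for 1 degree of freedom (a 2x2 table).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if exposure and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed_count - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: rows = exposure groups, columns = outcome (Yes, No).
observed = [[30, 20], [10, 40]]
chi2, p_value = chi_square_2x2(observed)

print(f"Chi-square = {chi2:.3f}")
print("p < 0.001" if p_value < 0.001 else f"p = {p_value:.3f}")
```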
2
Two Independent Sample T-test
Two Independent Sample T-test
This test is used to determine whether the mean value of an outcome variable is
significantly different between two groups of participants.
Conditions:
1. Dependent Variable (Outcome) is a continuous variable
Ex: Blood Sugar Level, Blood Pressure, Scale total score, …
2. Independent Variable (Exposure) is a categorical variable with 2 groups
Ex: Gender, …
Two Independent Sample T-test
Steps:
1. Analyze → Compare Means → Independent-Samples T Test
2. Transfer the dependent/continuous variable into “Test Variable(s)” and the independent/categorical variable into “Grouping Variable”
3. Click on “Define Groups” and enter the code of each group based on how you coded it
In this example, 1=Male and 2=Female (you can verify in the Variable View → “Gender” → Values)
Two Independent Sample T-test
Output & Interpretation:
2 Tables will appear:
1) Group Statistics: Descriptive statistics table
showing the frequency (N) with the mean and
standard deviation
2) Independent Samples Tests: It is divided into 2
parts:
• Levene’s Test for Equality of Variances
• t-test for Equality of Means
How to read the results?
Two Independent Sample T-test
Output & Interpretation:
A) Levene’s test for equality of variances:
• If the Levene’s Test p-value is > 0.05, read the p-value on the top line
(as there is no significant difference in the variances between both groups → “Equal variances assumed”)
• If the Levene’s Test p-value is ≤ 0.05, read the p-value on the bottom line
(as there is a significant difference in the variances between both groups → “Equal variances not assumed”)
B) T-test results:
• If p-value > 0.05 → no significant difference between groups
• If p-value ≤ 0.05 → significant difference between groups
How to read the results?
Two Independent Sample T-test
Output & Interpretation:
In this example:
A) P-value (Sig.) of Levene’s Test is 0.438 > 0.05 ➔ So, we continue reading the top row
B) P-value (Sig. 2-tailed) of the t-test is 0.029 < 0.05 ➔ So, there is a significant difference in the mean of the outcome (Memory Satisfaction) between both groups (Gender)
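The two rows of the t-test output correspond to two ways of computing the t statistic. The sketch below, in plain Python, shows both under stated assumptions: the scores are made up, the helper names `pooled_t` and `welch_t` are hypothetical, and only the t statistic and degrees of freedom are computed (the p-value requires the t distribution, which is left to SPSS here).

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """t statistic and degrees of freedom when equal variances are assumed (top row)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n_a + n_b - 2


def welch_t(group_a, group_b):
    """t statistic and degrees of freedom when equal variances are not assumed (bottom row)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se_squared = var_a / n_a + var_b / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se_squared ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


# Made-up memory satisfaction scores for two groups (1 = Male, 2 = Female).
male_scores = [52, 48, 55, 60, 47, 51]
female_scores = [45, 50, 43, 47, 49, 44]

t_pooled, df_pooled = pooled_t(male_scores, female_scores)
t_welch, df_welch = welch_t(male_scores, female_scores)
print(f"Equal variances assumed:     t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.2f}")
```

When both groups have the same size, as in this made-up example, the two t statistics are identical and only the degrees of freedom differ.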
3
One-Way ANOVA
ANalysis Of VAriance
One-way ANOVA Test
Same objective as the two independent sample T-test, but it differs by the number of
groups of the categorical variable:
• Two independent sample T-test is used to show if there is a difference between the
mean values of two independent groups.
• ANOVA is used to compare differences in the mean values of three or more
independent groups
Conditions:
• Dependent variable (Outcome) is continuous
• Independent variable (Exposure) is categorical with three or more groups
One-way ANOVA Test
Steps:
1. Analyze → Compare Means → One-Way ANOVA
2. Transfer the dependent/continuous variable into “Dependent List” and the independent/categorical variable into “Factor”
3. Click on “Options”, then select “Descriptive” and “Homogeneity of variance test”
How to read the results?
One-way ANOVA Test
Output & Interpretation:
3 tables will appear:
1) Descriptives: It indicates the frequency (N),
mean, SD, 95% confidence interval, …
2) Test of Homogeneity of Variances:
A p-value > 0.05 (no significant difference in variances between groups) is required to use the ANOVA test.
In this example, p-value = 0.721, so the ANOVA test can be used.
How to read the results?
One-way ANOVA Test
Output & Interpretation:
3 tables will appear:
3) ANOVA:
P-Value < 0.05 means that the mean of the
dependent variable differs significantly among
the different groups of the independent
variable
In this example, P-value <0.001 which means
that there is a significant difference in the
mean of memory satisfaction among at least 2
stages of depressive symptoms
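The F value in the ANOVA table compares the variability between group means with the variability within groups. Here is a minimal Python sketch of that calculation; the helper `one_way_anova` is a hypothetical name, the three groups of scores are made up, and the p-value is not computed (it requires the F distribution, which SPSS provides).

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric scores."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variability of the individual scores around their own group mean.
    ss_within = sum(
        (x - statistics.mean(group)) ** 2 for group in groups for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up memory satisfaction scores for three stages of depressive symptoms.
scores_by_stage = [[50, 52, 54], [44, 46, 48], [38, 40, 42]]
f_value, df_between, df_within = one_way_anova(scores_by_stage)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```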
4
Pearson Correlation
Pearson Correlation
Correlation is used to determine the relationship between two continuous variables.
➢ Positive correlation coefficient ➔ Both variables increase in value together
➢ Negative coefficient ➔ One variable decreases in value while the other increases
This test calculates a coefficient called “Pearson’s correlation coefficient (r)” that will give
an idea about the strength of the association between the two variables.
Pearson Correlation
Steps:
1. Analyze → Correlate → Bivariate
2. Transfer both continuous variables to the “Variables” section
3. Select “Pearson” under Correlation Coefficients
How to read the results?
Pearson Correlation
Output & Interpretation:
1 table will appear:
1) Read the p-value (Sig. 2-tailed):
• If >0.05: The 2 variables are not significantly correlated
• If <0.05: There is a significant correlation between both variables
2) Read the “Pearson Correlation” coefficient:
• Only if p-value < 0.05
• This coefficient indicates the strength of the correlation
• The strength is assessed by comparing the absolute value of the coefficient to the Pearson coefficient table
Pearson Coefficient “r”: Correlation
0.00 – 0.19: Very Weak
0.20 – 0.39: Weak
0.40 – 0.59: Moderate
0.60 – 0.79: Strong
0.80 – 1.00: Very Strong
In this example:
• P-value <0.0001, Pearson coefficient r = -0.321
• Memory satisfaction and somatic symptoms variables are negatively correlated. However, this correlation is weak (r=-0.321, p<0.0001)
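The coefficient and the strength table can be combined in a short Python sketch. The helper names `pearson_r` and `strength_label` are hypothetical and the two data lists are made up; the cut-offs are the ones from the table above, applied to the absolute value of r.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength_label(r):
    """Translate |r| into the strength category used in the table above."""
    size = abs(r)
    if size < 0.20:
        return "Very Weak"
    if size < 0.40:
        return "Weak"
    if size < 0.60:
        return "Moderate"
    if size < 0.80:
        return "Strong"
    return "Very Strong"


# Made-up scores: satisfaction falls as symptoms rise (a perfect negative relationship).
symptoms = [1, 2, 3, 4]
satisfaction = [8, 6, 4, 2]
r = pearson_r(symptoms, satisfaction)
print(f"r = {r:.3f} ({strength_label(r)})")

# The workshop example: r = -0.321 falls in the "Weak" band.
print(strength_label(-0.321))
```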
5
Simple Linear Regression
Simple Linear Regression
The simple linear regression is used to predict the value of a dependent variable
(outcome) based on the value of an independent variable
(Predictor/Explanatory factor).
It makes predictions about the values of one variable based on the values of a
second variable by generating a regression equation.
Conditions:
• Dependent Variable (Outcome) should be a continuous variable
• Independent Variable (Exposure) should also be a continuous variable
Simple Linear Regression
Steps:
1. Analyze → Regression → Linear
2. Transfer the dependent and independent variables
How to read the results?
Simple Linear Regression
Output & Interpretation:
4 tables will appear
3) ANOVA:
Determines if this model is a good fit to predict the outcome
P-value < 0.05 is essential. If not, the linear regression model should not be used
4) Coefficients
If this regression model was shown to be a good fit, this table allows you to generate a regression equation:
Y = B0 + B1*X
Memory_Satisfaction = 48.23 - 1.2 * (Somatic_Symptoms)
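To illustrate how the equation is obtained and used, here is a minimal Python sketch. The helper names `fit_line` and `predict_memory_satisfaction` are hypothetical; the default coefficients are the ones from the example equation above, and the data points are generated from that same equation purely to show that a least-squares fit recovers B0 and B1.

```python
def fit_line(x, y):
    """Least-squares estimates of the intercept (B0) and slope (B1) for Y = B0 + B1*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict_memory_satisfaction(somatic_symptoms, b0=48.23, b1=-1.2):
    """Apply the example regression equation to one somatic symptoms score."""
    return b0 + b1 * somatic_symptoms


# Illustrative points lying exactly on the example line (not real data).
somatic = [0, 5, 10, 15, 20]
satisfaction = [predict_memory_satisfaction(s) for s in somatic]

b0, b1 = fit_line(somatic, satisfaction)
print(f"B0 = {b0:.2f}, B1 = {b1:.2f}")
print(f"Predicted score for somatic symptoms = 10: {predict_memory_satisfaction(10):.2f}")
```

The negative B1 means that each additional point on the somatic symptoms score lowers the predicted memory satisfaction by 1.2 points.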
6
Multiple Linear Regression
Multiple Linear Regression
A standard multiple regression allows you to predict a dependent variable (outcome)
based on multiple independent variables.
It is an extension to simple linear regression
Conditions:
• Dependent Variable (Outcome) should be a continuous variable
• Independent Variables can be both continuous and categorical (unlike the
simple linear regression)
Multiple Linear Regression
Steps:
1. Analyze → Regression → Linear
2. Transfer the dependent variable and all the independent variables that had a p-value < 0.05 in the simple linear regression.
3. In Statistics, select “Confidence Intervals” in addition to “Estimates” and “Model fit”
How to read the results?
Multiple Linear Regression
Output & Interpretation:
1) Variables Entered/Removed:
Indicates the dependent variable along the independent
variables that were entered
2) Model Summary:
R Square is the proportion of the variability of the outcome that is explained by the independent variables.
In this example, R Square = 0.194 which means that all the
independent variables combined explain 19.4% of the
variability of the dependent variable (Satisfaction)
How to read the results?
Multiple Linear Regression
Output & Interpretation:
3) ANOVA:
Determines if this model is a good fit to predict the outcome
P-value < 0.05 is essential. If not, the multiple linear regression model should not be used
4) Coefficients
If this regression model was shown to be a good fit, this table allows you to generate a regression equation.
Only variables with a p-value < 0.05 are included in the equation:
Y = B0 + B1*X1 + B2*X2 + …
How to read the results?
Multiple Linear Regression
Output & Interpretation:
In this example:
- Gender, Health evaluation and Depressive symptoms are the only variables included in the equation, as their p-values are < 0.05
Equation:
Memory_Satisfaction = 48.63 - 4.244*Gender + 2.714*Health_Evaluation - 4.451*Depressive_Symptoms
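The equation can be applied to one participant as in the following minimal Python sketch. The intercept and coefficients are taken from the equation above; the helper `predict_memory_satisfaction` is a hypothetical name, and the participant's values (including how Health_Evaluation and Depressive_Symptoms are coded) are assumptions made only for illustration.

```python
# Coefficients (B) copied from the example equation.
INTERCEPT = 48.63
COEFFICIENTS = {
    "Gender": -4.244,
    "Health_Evaluation": 2.714,
    "Depressive_Symptoms": -4.451,
}


def predict_memory_satisfaction(values):
    """Return B0 + B1*X1 + B2*X2 + ... for one participant's coded values."""
    return INTERCEPT + sum(
        coefficient * values[name] for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical participant: Gender = 1 (Male); the other two codes are assumed.
participant = {"Gender": 1, "Health_Evaluation": 3, "Depressive_Symptoms": 2}
predicted = predict_memory_satisfaction(participant)
print(f"Predicted memory satisfaction: {predicted:.3f}")
```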
Report Writing
How to present data results?
What is a Report?
• A structured document that presents information, findings, or results in a clear and organized manner.
• It typically includes an introduction, a main body with details or analysis, and a conclusion.
• It is used to communicate data, research, analysis, or recommendations to inform decision-making or provide insights on a specific topic.
Organizing and presenting statistical results
The statistical results (numbers) should be presented with appropriate visual aids.
Histogram
• Continuous variable
Bar chart
• Categorical variable
• Vertical or horizontal
Pie chart
• Categorical variable
DO NOT “COPY-PASTE” tables and charts DIRECTLY from the output of SPSS.
Parts of a Report:
1. Introduction
• Background for the topic: definitions, statistics, scale overview, etc.
• Objective: the aim of the report specifically (Often, it can be known from the title of the scale used).
2. Methodology = Materials and Methods.
Case of a SCALE:
• Identify the scale if used.
• How the scale was generated (details such as how the scale was shared and organized).
• General interpretation of the total score: how should the result of the data analysis be read?
In any other general study:
• Ethical consideration/approval.
• Target population (common with the scale case).
• Variables (conditions, exposures), outcomes and their types (categorical or continuous).
• Statistical tests conducted.
• Significance threshold for the P-value (0.05).
3. a. Results:
• Mention the total sample size (N).
• Report the results of the main variables (variables of interest).
• Report the results of the outcomes.
Use graphs, charts and tables.
• Write a title for each figure (below it) and each table (above it).
• Figures and tables must be numbered in order.
• Write a simple and direct interpretation (significant differences or interpretation of values) of the most important results in each visual aid.
• Case of total and individual scores (scales):
Split the results into 2 parts.
3. b. What numbers must be reported in the tables?
Descriptive Statistics:
• Categorical Variables: Frequency (n) and percentage (%).
• Continuous Variables: mean and Standard Deviation (SD).
Statistical Tests:
• Two-Sample T-Test: mean and SD for the 2 groups, mean difference, 95% Confidence Interval (CI), and P-value.
• One-Way ANOVA Test: frequency (n), mean, SD and P-value.
• Pearson & Spearman Correlation: correlation coefficient and P-value.
• Two-way Chi-Square Test: frequency (n), percentage (%) and P-value.
Scale:
• Total Score
• Individual scores for each item/question of the scale.
• Simple & Multiple Linear Regression: B coefficient, P-value, 95% CI and adjusted R square.
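When building report tables, small formatting helpers keep the numbers consistent. The following Python sketch is one possible convention, not a required one: the helper names are hypothetical, the choice of one decimal for means and three for p-values is an assumption, and a p-value shown by SPSS as 0.000 is written as “<0.001”.

```python
def format_p(p_value):
    """Format a p-value with 3 decimals; very small values are reported as '<0.001'."""
    if p_value < 0.001:
        return "<0.001"
    return f"{p_value:.3f}"


def format_mean_sd(mean, sd, decimals=1):
    """Format a continuous variable as 'mean ± SD'."""
    return f"{mean:.{decimals}f} ± {sd:.{decimals}f}"


def format_n_percent(n, total):
    """Format a categorical variable as 'n (%)'."""
    return f"{n} ({100 * n / total:.1f}%)"


print(format_p(0.029))
print(format_p(0.0004))
print(format_mean_sd(25.8, 4.1))
print(format_n_percent(47, 141))
```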
Scale
Total Score:
• Total score of the scale (overall)
• Total score of the scale in each academic year
Domains:
• Total score of each domain
• Total score of each subdomain in each academic year
Presented with:
• Tables
• Bar graphs
• Dot plots
[Example tables: categorical variable reporting and continuous variable reporting (N=141)]
4. Conclusion.
• General conclusion: can be recognized from the total score in case of scale.
• Strengths of the results.
• Limitations of the results (or data processing).
• Recommendations based on the limitations.
5. Tips for Reporting
THANK YOU
If you have any questions, do not hesitate to contact us!
razanbyoussef@[Link]
daniel.bounajm496@[Link]
malakbourji060@[Link]