AP Statistics Winter Project
Project Overview
You will design an experiment or select a real-world dataset and perform a complete statistical
analysis that uses at least one inference method from AP Statistics. You have two paths:
1. Experimental Path: Design and execute a randomized experiment (e.g., A/B testing),
then perform inference on the data you collect.
2. Observational Path: Collect or find data to analyze trends or relationships.
Ideally, your dataset should contain at least 100 data points. However, if you choose the
Experimental Path, where data collection is often more challenging, a smaller sample size is
acceptable.
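For the Experimental Path, random assignment can be done in a few lines of code instead of by hand. The sketch below is only an illustration: the participant IDs, the function name `randomly_assign`, and the seed value are all made up, and you would substitute your own roster.

```python
import random


def randomly_assign(units, seed=None):
    """Shuffle the units and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible
    shuffled = list(units)      # copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs; replace these with your own roster.
participants = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(participants, seed=2024)

print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Recording the seed in your report lets a reader reproduce the exact assignment, which helps when you justify the Random condition.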
Group Composition
• Group Size: You may work individually or in groups of up to 3 people.
• Role Accountability: Each group must submit a brief Contribution Statement with their
report, outlining who was responsible for:
o Data collection and cleaning
o Data analysis
o Report writing
o Presentation
(Note: Everyone is expected to understand the whole project for the Q&A portion of
the presentation!)
• Peer Evaluation: At the end of the project, each member will fill out a confidential
peer-review form and email it to the teacher. Individual grades may be adjusted based on
how consistently each member contributed.
Deliverables
• A 5-Minute Presentation: A concise summary of your question, data, inference
method(s), and conclusion.
• A Report: Your choice of a formal paper, a Jupyter Notebook/R Markdown file (showing both
code and output), or a creative poster (A3 or larger). The report must include a background
introduction, the dataset, a data summary, methodology, inference procedures, and a
conclusion.
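If your report is a notebook, the data summary for a quantitative variable might look something like the sketch below. It uses the median-of-halves quartile rule and the 1.5 × IQR outlier rule; the function names and the sample values are invented for illustration, and other software may use a slightly different quartile definition.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values.

    Quartiles are the medians of the lower and upper halves of the sorted
    data, with the middle value left out of both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # values below the middle position
    upper = data[half + n % 2:]     # values above the middle position
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Made-up sample values, for illustration only.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 30]

print("Five-number summary:", five_number_summary(sample))
print("Mean:", round(statistics.mean(sample), 2))
print("Standard deviation:", round(statistics.stdev(sample), 2))
print("Possible outliers:", iqr_outliers(sample))
```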
Project Deadlines
Project Proposal Submission (Topic, Group Members, and chosen Path): 11:59 p.m., Feb 7
o Email it to the teacher, titled “Path option+member names”.
Report Submission: 11:59 p.m., March 1
o If digital, compress all files and email them to the teacher, titled “Project name+member
names”.
Presentation: March 3
Technical Requirements
1. Exploratory Data Analysis (EDA): You must include basic summary statistics and at least
one visual display (histogram, boxplot, scatterplot, etc.) that describes your data.
2. The Inference:
o State your hypotheses ($H_0$ vs. $H_a$) with defined parameters.
o You must check all “Exam Standard” conditions (Random, Independent/10%,
Normal/Large Counts, etc.). If a condition is not met (e.g., your sample size is too
small), you must explain the potential impact on your results and justify why you
are proceeding anyway.
o Calculate the test statistic and 𝑝-value.
o State a conclusion in the context of your problem.
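As one possible illustration of the inference steps above, the sketch below runs a two-sided two-proportion z-test, the kind of test an A/B experiment often leads to. The counts, the function name `two_proportion_z_test`, and the 0.05 significance level are assumptions made for this example, not project requirements. The Random and Independent/10% conditions depend on how the data were collected, so they still have to be argued in writing.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 against Ha: p1 != p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # combined proportion, assuming H0 is true

    # Large Counts: expected successes and failures are at least 10 per group.
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    large_counts_ok = all(count >= 10 for count in expected)

    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails
    return {"p1_hat": p1_hat, "p2_hat": p2_hat,
            "large_counts_ok": large_counts_ok, "z": z, "p_value": p_value}


# Hypothetical counts: 45 of 100 successes in group A, 30 of 100 in group B.
result = two_proportion_z_test(45, 100, 30, 100)
alpha = 0.05   # assumed significance level

print("Large Counts condition met:", result["large_counts_ok"])
print(f"z = {result['z']:.2f}, p-value = {result['p_value']:.4f}")
if result["p_value"] < alpha:
    print("Reject H0: there is convincing evidence that the two proportions differ.")
else:
    print("Fail to reject H0: there is not convincing evidence of a difference.")
```

In your own report, replace the generic conclusion with one stated in the context of your question, and explain the impact of any condition the check flags as not met.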
Academic & Data Integrity
To ensure the validity of your research and the fairness of the project, all groups must adhere to
the following standards. Failure to cite sources will result in a significant point deduction or a
zero for the project.
• Data Sources: If you are using an existing dataset (e.g., from Kaggle, [Link], or a
research paper), you must provide a full URL and a brief description of how the data was
originally collected.
• AI & Code Attribution: If you used AI (like ChatGPT or Gemini) or external libraries to
generate your Jupyter Notebook code or to help structure your report, you must include
an "AI & Code Appendix."
o Briefly describe what you asked the AI to do (e.g., "Generated the code for the side-
by-side boxplots").
o State how you verified that the code/output was statistically correct.
Alternative Project: Inference Procedure Poster
If you feel the full original research project is too challenging at this stage, you may opt for an
individual project: creating a poster that summarizes the logic and execution of one inference
category (z-tests, t-tests, or Chi-Square tests). The poster requires a Chinese New Year
aesthetic and a full demonstration of the inference procedures applied to standard textbook
problems.
Note: You must include all the specific test methods learned under your chosen category. (For
example, the "t-test" category includes the one-sample t-test, two-sample t-test, and t-test for the
slope). Because this project focuses on summarizing existing problems rather than original data
collection and research, the maximum score achievable is 80/100.
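As a reference point for the t-test example, the test statistics for the three procedures named above can be written as follows. The notation is a common textbook convention and is assumed here rather than taken from this handout; use the symbols from your own class notes if they differ.

```latex
% One-sample t-test for a mean, H_0: \mu = \mu_0
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1

% Two-sample t-test for a difference in means, H_0: \mu_1 - \mu_2 = 0
% (df from technology, or the conservative \min(n_1 - 1,\, n_2 - 1))
t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}

% t-test for the slope of a regression line, H_0: \beta = \beta_0 (usually \beta_0 = 0)
t = \frac{b - \beta_0}{SE_b}, \qquad df = n - 2
```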
Project Rubrics
| Category | Emerging (Low) | Proficient (Mid) | Mastery (High) |
| Data Cleaning & Design | Data is messy or design has obvious confounding variables. | Clear description of sampling/experiment or data source. | Evidence of "Tidying" data or using advanced design (Blocking/Stratifying). |
| Visual Communication | Graphs are present but lack labels or are inappropriate types. | Professional, labeled graphs that clearly show patterns/outliers. | Use of Multivariate visuals (e.g., color/size as a 3rd variable) or advanced libraries. |
| Statistical Inference | Test performed but conditions/assumptions are ignored. | Correct test chosen; all conditions (Random, 10%, Normal, etc.) verified. | Precise interpretation of P-value and Confidence Interval in context. |
| The "Expansion" | No attempt at expansion. | Attempted expansion but with minor conceptual errors. | Successful multivariate linear model or outside-scope test with high-level justification. |
| Deliverable Quality | Disorganized; difficult to follow the logic. | Clear, professional flow (Intro → Methods → Results → Conclusion). | "Publication-ready" (Clean Jupyter Notebook, R Markdown, or high-impact Poster). |