0% found this document useful (0 votes)

11 views15 pages

Data Analyst Interview Questions Guide

This document provides a comprehensive guide for data analyst interviews, including over 50 questions categorized into SQL, Python/Pandas, Statistics, Business/Case Studies, and Behavioral questions. Each section contains common interview questions along with detailed answers and explanations to help candidates prepare effectively. The guide emphasizes the importance of understanding concepts and articulating thought processes during interviews.

Uploaded by

nehasonis2424

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views15 pages

Data Analyst Interview Questions Guide

Uploaded by

nehasonis2424

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Data Analyst Interview Questions

SQL + Python + Statistics + Business

50+ questions with answers to crack your next interview

What's Inside:
• SQL Questions (Most asked - 15 questions)

• Python/Pandas Questions (10 questions)

• Statistics Questions (10 questions)

• Business/Case Study Questions (10 questions)

• Behavioral Questions (5 questions)

PART 1: SQL Questions
SQL is asked in 95% of data analyst interviews. Master these.

Q1: What's the difference between WHERE and HAVING?

Answer:

• WHERE filters rows BEFORE grouping (works on raw data)

• HAVING filters AFTER grouping (works on aggregated results)

-- WHERE example: Filter before grouping

SELECT department, AVG(salary)
FROM employees
WHERE status = 'Active' -- filters individual rows first
GROUP BY department;

-- HAVING example: Filter after grouping

SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000; -- filters grouped results

Q2: Find the second highest salary from Employee table.

Answer: Multiple approaches:

-- Method 1: Subquery
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- Method 2: LIMIT OFFSET (MySQL/PostgreSQL)

SELECT DISTINCT salary FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 3: DENSE_RANK (works for Nth highest)

SELECT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as rnk
FROM employees
) ranked
WHERE rnk = 2;

Q3: Find duplicate records in a table.

SELECT email, COUNT(*) as count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- To see all duplicate rows:

SELECT * FROM customers
WHERE email IN (
SELECT email FROM customers
GROUP BY email
HAVING COUNT(*) > 1
);

Q4: What are different types of JOINs?

Answer:

• INNER JOIN - Only matching rows from both tables

• LEFT JOIN - All rows from left + matching from right (NULL if no match)

• RIGHT JOIN - All rows from right + matching from left

• FULL OUTER JOIN - All rows from both (NULL where no match)

• CROSS JOIN - Every row from A with every row from B (Cartesian product)
Q5: Difference between RANK, DENSE_RANK, and ROW_NUMBER?
-- Data: Salaries = 100, 100, 90, 80

-- ROW_NUMBER: 1, 2, 3, 4 (unique numbers, no gaps)

-- RANK: 1, 1, 3, 4 (same value = same rank, gaps after ties)
-- DENSE_RANK: 1, 1, 2, 3 (same value = same rank, no gaps)

SELECT
name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num,
RANK() OVER (ORDER BY salary DESC) as rank,
DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees;

Q6: Write a query to find employees who earn more than their manager.
SELECT [Link] as employee, [Link] as emp_salary,
[Link] as manager, [Link] as mgr_salary
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id
WHERE [Link] > [Link];

Q7: What's the difference between DELETE, TRUNCATE, and DROP?

• DELETE - Removes specific rows, can use WHERE, can rollback, slower

• TRUNCATE - Removes ALL rows, cannot use WHERE, faster, resets auto-increment

• DROP - Removes entire table structure + data, cannot recover

Q8: Find customers who placed orders in both 2023 AND 2024.
-- Using INTERSECT
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2023
INTERSECT
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024;

-- Using JOIN
SELECT DISTINCT o1.customer_id
FROM orders o1
JOIN orders o2 ON o1.customer_id = o2.customer_id
WHERE YEAR(o1.order_date) = 2023 AND YEAR(o2.order_date) = 2024;

Q9: Calculate running total of sales.

SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM sales;

-- Running total by customer

SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) as customer_running_total
FROM sales;
Q10: Find the top 3 products by sales in each category.
SELECT * FROM (
SELECT
category,
product_name,
total_sales,
DENSE_RANK() OVER (
PARTITION BY category
ORDER BY total_sales DESC
) as rank
FROM products
) ranked
WHERE rank <= 3;
Quick SQL Questions:

Q11: UNION vs UNION ALL?

UNION removes duplicates (slower). UNION ALL keeps all rows (faster).

Q12: What is a Primary Key vs Foreign Key?

Primary Key: Unique identifier for each row, cannot be NULL.

Foreign Key: References primary key of another table, creates relationship.

Q13: What is normalization?

Organizing data to reduce redundancy. 1NF (no repeating groups), 2NF (no partial dependencies), 3NF
(no transitive dependencies).

Q14: What is an index? When to use?

Index speeds up data retrieval (like a book index). Use on columns used in WHERE, JOIN, ORDER BY.
Don't over-index - slows down INSERT/UPDATE.

Q15: NULL vs empty string vs 0?

NULL = unknown/missing value. Empty string = blank text (exists but empty). 0 = actual numeric value.

■ SQL Interview Tip: Always talk through your approach before writing. 'First I'll filter X, then join with Y,
then aggregate by Z.' Shows you can think, not just memorize.
PART 2: Python/Pandas Questions
Q1: How do you handle missing values in pandas?
# Check missing values
[Link]().sum()

# Drop rows with any missing values

[Link]()

# Drop rows where specific columns are missing

[Link](subset=['email', 'phone'])

# Fill missing values

df['salary'].fillna(df['salary'].mean(), inplace=True)
df['category'].fillna('Unknown', inplace=True)

# Forward fill (use previous value)

df['value'].fillna(method='ffill')

Q2: Difference between loc and iloc?

• loc - Label-based selection (uses column names, index labels)

• iloc - Integer-based selection (uses position numbers)

# loc (by label)

[Link][0:5, ['name', 'salary']] # rows 0-5 (inclusive), specific columns

# iloc (by position)

[Link][0:5, 0:3] # rows 0-4 (exclusive end), columns 0-2

Q3: How to merge two DataFrames?

# Inner join (default)
merged = [Link](df1, df2, on='customer_id')

# Left join
merged = [Link](df1, df2, on='customer_id', how='left')

# Join on different column names

merged = [Link](df1, df2, left_on='cust_id', right_on='customer_id')

# Multiple join columns

merged = [Link](df1, df2, on=['customer_id', 'product_id'])

Q4: Explain groupby in pandas.

# Basic groupby
[Link]('category')['sales'].sum()

# Multiple aggregations
[Link]('category')['sales'].agg(['sum', 'mean', 'count'])

# Group by multiple columns

[Link](['region', 'category'])['sales'].sum()

# Different aggregations for different columns

[Link]('category').agg({
'sales': 'sum',
'quantity': 'mean',
'customer_id': 'nunique'
})

Q5: How to create a new column based on conditions?

# Simple condition
df['status'] = [Link](df['score'] >= 50, 'Pass', 'Fail')

# Multiple conditions
df['grade'] = [Link](
[df['score'] >= 90, df['score'] >= 80, df['score'] >= 70],
['A', 'B', 'C'],
default='F'
)

# Using apply with function

def categorize(row):
if row['sales'] > 10000:
return 'High'
elif row['sales'] > 5000:
return 'Medium'
return 'Low'

df['tier'] = [Link](categorize, axis=1)

Q6: List vs Tuple vs Set vs Dictionary?

List: Ordered, mutable, allows duplicates [1, 2, 3]

Tuple: Ordered, immutable, allows duplicates (1, 2, 3)

Set: Unordered, mutable, NO duplicates {1, 2, 3}

Dict: Key-value pairs, keys unique {'a': 1, 'b': 2}

Q7: What is a lambda function?

# Lambda = anonymous one-line function
square = lambda x: x**2

# Common use with pandas

df['column'].apply(lambda x: [Link]())
df.sort_values(key=lambda x: [Link]())

Q8: How to remove duplicates in pandas?

df.drop_duplicates() # All columns
df.drop_duplicates(subset=['email']) # Based on email
df.drop_duplicates(keep='last') # Keep last occurrence

Q9: Difference between apply, map, and applymap?

• apply: Works on rows or columns (Series or DataFrame)

• map: Works element-wise on a Series only

• applymap: Works element-wise on entire DataFrame

Q10: How to pivot a DataFrame?

# Pivot table
pivot = df.pivot_table(
values='sales',
index='region',
columns='category',
aggfunc='sum'
)

# Melt (unpivot)
melted = [Link](df, id_vars=['id'], value_vars=['Q1', 'Q2', 'Q3'])
PART 3: Statistics Questions
Q1: Mean vs Median vs Mode - when to use what?
Mean: Average of all values. Use for normally distributed data.

Median: Middle value. Use when data has outliers (salary, house prices).

Mode: Most frequent value. Use for categorical data.

Example: Salaries of 50K, 55K, 60K, 65K, 500K → Mean=146K (misleading), Median=60K (better)

Q2: What is standard deviation?

Measures how spread out data is from the mean. Low SD = data clustered around mean. High SD = data
spread out.

68% of data within 1 SD, 95% within 2 SD, 99.7% within 3 SD (for normal distribution).

Q3: What is correlation? What does correlation of 0.8 mean?

Correlation measures linear relationship between two variables (-1 to +1).

+1 = Perfect positive (both increase together)

-1 = Perfect negative (one increases, other decreases)

0 = No linear relationship

0.8 = Strong positive correlation - when one increases, the other likely increases too.

■■ Correlation ≠ Causation. Ice cream sales and drowning are correlated (both high in summer), but
one doesn't cause the other.

Q4: What is a p-value?

Probability of getting your result if the null hypothesis is true.

p < 0.05: Result is 'statistically significant' - unlikely due to random chance.

p > 0.05: Cannot reject null hypothesis - result might be due to chance.

Q5: Type I vs Type II error?

Type I (False Positive): Rejecting true null hypothesis. Example: Saying drug works when it doesn't.

Type II (False Negative): Failing to reject false null. Example: Saying drug doesn't work when it does.

Q6: What is the Central Limit Theorem?

When you take many samples from ANY distribution, the sample means will be normally distributed.

This is why normal distribution is so important - it shows up everywhere.

Q7: What is A/B testing?

Comparing two versions to see which performs better.

• Split users randomly into control (A) and treatment (B)

• Measure metric (conversion, clicks, revenue)

• Use statistical test to determine if difference is significant

Q8: Explain the difference between population and sample.
Population: Entire group you want to study (all customers).

Sample: Subset you actually measure (1000 surveyed customers).

We use samples because measuring entire population is often impossible.

Q9: What is a confidence interval?

Range where true value likely lies. '95% confidence interval of 45-55' means:

If we repeated this study 100 times, 95 of those intervals would contain the true value.

Q10: Normal distribution vs skewed distribution?

Normal: Symmetric bell curve, mean = median = mode.

Right-skewed: Long tail to right (income, house prices). Mean > Median.

Left-skewed: Long tail to left (age at retirement). Mean < Median.

PART 4: Business/Case Study Questions
Q1: How would you measure the success of a new product feature?
Framework:

1. Define the goal - What problem does this feature solve?

2. Choose metrics - Primary (e.g., conversion rate) + Secondary (e.g., time spent)

3. Set baseline - What's the current performance?

4. Run A/B test - Compare with control group

5. Analyze results - Statistical significance + practical significance

6. Consider tradeoffs - Did improving one metric hurt another?

Q2: Revenue dropped 10% this month. How would you investigate?
1. Verify data quality - Is the data correct? Any logging issues?

2. Segment analysis:

• By region - Is it all regions or specific ones?

• By product - Which products declined?

• By customer type - New vs returning?

• By channel - Web vs mobile vs app?

3. Time analysis - When did it start? Gradual or sudden?

4. External factors - Seasonality? Competitor actions? Economic conditions?

5. Internal changes - Price changes? Marketing spend? Site changes?

Q3: What metrics would you track for an e-commerce business?

Revenue metrics: GMV, Revenue, AOV (Average Order Value)

Conversion funnel: Visitors → Product views → Add to cart → Purchase

Customer metrics: CAC, LTV, Retention rate, Repeat purchase rate

Operational: Cart abandonment rate, Delivery time, Return rate

Q4: How would you explain correlation vs causation to a non-technical person?

'Just because two things happen together doesn't mean one causes the other.'

Example: Ice cream sales and drowning deaths both increase in summer. Ice cream doesn't cause
drowning - hot weather causes both.

To prove causation, you need controlled experiments where you change one thing and keep everything
else the same.

Q5: How do you prioritize what to analyze when you have many possible questions?
1. Business impact - What decisions will this inform?

2. Urgency - Is this blocking something important?

3. Feasibility - Do we have the data? How long will it take?

4. Stakeholder needs - Who asked for this and why?

Focus on high-impact + feasible items first.

PART 5: Behavioral Questions
Q1: Tell me about yourself.
Structure (2 minutes max):

• Present: Current role/what you're doing now

• Past: Relevant experience/education

• Future: Why this role/company

End with why you're excited about THIS opportunity.

Q2: Tell me about a project you're proud of.

Use STAR method:

• Situation: Context and problem

• Task: Your specific responsibility

• Action: What YOU did (be specific)

• Result: Quantified outcome (increased X by Y%)

Q3: How do you handle conflicting priorities?

'I clarify urgency and impact with stakeholders, communicate tradeoffs, and if needed, escalate to get
alignment on priority.'

Q4: Describe a time when your analysis was challenged.

Show you can:

• Listen without being defensive

• Validate concerns by re-checking your work

• Either defend with evidence OR acknowledge the error

• Learn from the experience

Q5: Why data analytics?

Be genuine. Talk about:

• What excites you about working with data

• A specific moment that sparked your interest

• How you see data driving real impact

Final Interview Tips

✓ Think out loud - Interviewers want to see your thought process

✓ Ask clarifying questions - It's expected and shows you're thorough

✓ Admit what you don't know - 'I'm not sure, but I would approach it by...'

✓ Practice SQL on paper - Many interviews don't have a computer

✓ Prepare 2-3 project stories - Know them inside out

✓ Research the company - Know their product, data challenges, recent news

You've got this. Prepare well, stay calm, and show them how you think. Good luck! ■

Markdown To PDF
No ratings yet
Markdown To PDF
50 pages
Top SQL & DSA Interview Questions
No ratings yet
Top SQL & DSA Interview Questions
16 pages
SQL and DSA Interview Questions Guide
No ratings yet
SQL and DSA Interview Questions Guide
15 pages
SQL Interview QA
No ratings yet
SQL Interview QA
11 pages
Wipro Data Analyst Interview Guide
No ratings yet
Wipro Data Analyst Interview Guide
29 pages
10 SQL Questions Interview PDF
No ratings yet
10 SQL Questions Interview PDF
5 pages
SQL Functions and Queries Overview
No ratings yet
SQL Functions and Queries Overview
9 pages
SQL Interview Questions for Data Analysts
No ratings yet
SQL Interview Questions for Data Analysts
14 pages
American Express Data Analyst Interview QnA 1763981443
No ratings yet
American Express Data Analyst Interview QnA 1763981443
30 pages
SQL Interview Preparation List
No ratings yet
SQL Interview Preparation List
36 pages
100 SQL Interview Questions & Answers
No ratings yet
100 SQL Interview Questions & Answers
33 pages
SQL Interview Questions for Data Analysts
No ratings yet
SQL Interview Questions for Data Analysts
15 pages
Essential Data Analyst Interview Questions
No ratings yet
Essential Data Analyst Interview Questions
15 pages
SQL Interview Questions for 10-20 LPA
No ratings yet
SQL Interview Questions for 10-20 LPA
15 pages
Difference Between HAVING and GROUP BY
No ratings yet
Difference Between HAVING and GROUP BY
15 pages
SQL Interview Questions & Answers Guide
No ratings yet
SQL Interview Questions & Answers Guide
9 pages
SQL Interview Questions for Data Analysts
No ratings yet
SQL Interview Questions for Data Analysts
17 pages
KPMG Data Analyst Interview SQL Guide
No ratings yet
KPMG Data Analyst Interview SQL Guide
30 pages
American Express Data Analyst SQL Guide
No ratings yet
American Express Data Analyst SQL Guide
24 pages
SQL Basics and Top 10 Questions Guide
No ratings yet
SQL Basics and Top 10 Questions Guide
8 pages
Advanced SQL Interview Questions Guide
No ratings yet
Advanced SQL Interview Questions Guide
6 pages
SQL Interview Questions
No ratings yet
SQL Interview Questions
22 pages
Interview Questions
No ratings yet
Interview Questions
39 pages
SQL Interview Questions & Answers Guide
No ratings yet
SQL Interview Questions & Answers Guide
18 pages
Data Analyst Interview Questions Guide
No ratings yet
Data Analyst Interview Questions Guide
14 pages
SQL Subquery Examples and Insights
No ratings yet
SQL Subquery Examples and Insights
4 pages
50 Essential SQL Questions & Answers
No ratings yet
50 Essential SQL Questions & Answers
9 pages
Top 50 SQL Interview Questions & Answers
No ratings yet
Top 50 SQL Interview Questions & Answers
6 pages
Intermediate SQL Problems Explained
No ratings yet
Intermediate SQL Problems Explained
8 pages
SQL Interview Questions for Data Analysts
No ratings yet
SQL Interview Questions for Data Analysts
17 pages
SQL Joins Interview War Notes
No ratings yet
SQL Joins Interview War Notes
105 pages
100 SQL Interview Questions & Answers
No ratings yet
100 SQL Interview Questions & Answers
12 pages
American Express SQL Interview Questions
No ratings yet
American Express SQL Interview Questions
15 pages
First Youtube Video
No ratings yet
First Youtube Video
13 pages
Top SQL Interview Questions for Beginners
No ratings yet
Top SQL Interview Questions for Beginners
16 pages
Azure Data Engineering Interview Questions
No ratings yet
Azure Data Engineering Interview Questions
29 pages
Understanding Clustered Index in SQL
No ratings yet
Understanding Clustered Index in SQL
12 pages
SQL Interview Questions
No ratings yet
SQL Interview Questions
8 pages
SQL Interview Questions & Answers Guide
No ratings yet
SQL Interview Questions & Answers Guide
9 pages
SQL Optimization Techniques for Data Analysts
No ratings yet
SQL Optimization Techniques for Data Analysts
11 pages
Barclays Data Analyst Interview Prep
No ratings yet
Barclays Data Analyst Interview Prep
5 pages
SQL Queries for Data Analysis Techniques
No ratings yet
SQL Queries for Data Analysis Techniques
17 pages
Tech Mahindra SQL Interview Insights
No ratings yet
Tech Mahindra SQL Interview Insights
11 pages
SQL Interview Questions & Answers Guide
No ratings yet
SQL Interview Questions & Answers Guide
167 pages
SQL Interview Questions for Data Analysts
No ratings yet
SQL Interview Questions for Data Analysts
26 pages
SQL Interview Questions for Analysts
No ratings yet
SQL Interview Questions for Analysts
9 pages
Essential SQL Questions for Interviews
No ratings yet
Essential SQL Questions for Interviews
8 pages
Cognizant SQL Interview Questions List
No ratings yet
Cognizant SQL Interview Questions List
4 pages
SQL Interview Questions and Answers Guide
No ratings yet
SQL Interview Questions and Answers Guide
9 pages
SQL Interview Questions Guide
No ratings yet
SQL Interview Questions Guide
13 pages
SQL Interview Questions for Data Analysts
No ratings yet
SQL Interview Questions for Data Analysts
12 pages
Essential SQL Interview Q&A Guide
No ratings yet
Essential SQL Interview Q&A Guide
4 pages
American Express SQL Interview Questions
No ratings yet
American Express SQL Interview Questions
14 pages
Basic SQL Interview Questions Guide
No ratings yet
Basic SQL Interview Questions Guide
18 pages
SQL Analytics Study Guide
No ratings yet
SQL Analytics Study Guide
30 pages
SQL Interview Questions for Data Engineers
No ratings yet
SQL Interview Questions for Data Engineers
15 pages
Data Analyst Interview Q&A: SQL Basics
No ratings yet
Data Analyst Interview Q&A: SQL Basics
26 pages
CELPIP Writing Task 1 Email Samples
No ratings yet
CELPIP Writing Task 1 Email Samples
8 pages
NACH Debit Mandate Cancellation Form
No ratings yet
NACH Debit Mandate Cancellation Form
1 page
Admissions 2023-24: PG Programmes Guide
No ratings yet
Admissions 2023-24: PG Programmes Guide
30 pages
Accounts Officer & Assistant Roles in Nepal
No ratings yet
Accounts Officer & Assistant Roles in Nepal
2 pages
Grade 5 Teacher's Program SY 2025-2026
No ratings yet
Grade 5 Teacher's Program SY 2025-2026
9 pages
Key Components of Environmental Management
No ratings yet
Key Components of Environmental Management
8 pages
Additional Premium Debit Note Details
No ratings yet
Additional Premium Debit Note Details
1 page
Laplace Transforms Overview
No ratings yet
Laplace Transforms Overview
21 pages
Articles of Partnership for Brixton Coffee
No ratings yet
Articles of Partnership for Brixton Coffee
5 pages
Financial Accounting III Assignment 2018
No ratings yet
Financial Accounting III Assignment 2018
4 pages
2023 Harford County Crash Report
No ratings yet
2023 Harford County Crash Report
2 pages
BE Project Report in Civil Engineering
No ratings yet
BE Project Report in Civil Engineering
7 pages
TEMS Investigation 16 0 Connectable Devices
No ratings yet
TEMS Investigation 16 0 Connectable Devices
8 pages
Overview of PPST Indicators: COIs & NCOIs
100% (2)
Overview of PPST Indicators: COIs & NCOIs
52 pages
Understanding Object-Oriented Programming
No ratings yet
Understanding Object-Oriented Programming
26 pages
COVID-19 Impact on Amazon's Economics
No ratings yet
COVID-19 Impact on Amazon's Economics
12 pages
Community Health Nursing Overview 2024
No ratings yet
Community Health Nursing Overview 2024
120 pages
Calculus and Its Applications 14th Edition Goldstein Solutions Manual No Waiting Time
100% (18)
Calculus and Its Applications 14th Edition Goldstein Solutions Manual No Waiting Time
74 pages
PIC/S GMP Guidelines Overview
No ratings yet
PIC/S GMP Guidelines Overview
4 pages
Abrogar v. Cosmos: Marathon Negligence Case
No ratings yet
Abrogar v. Cosmos: Marathon Negligence Case
3 pages
Ebook Is Your Part Numbering Scheme Costing You Millions
No ratings yet
Ebook Is Your Part Numbering Scheme Costing You Millions
15 pages
Hypebeast Economy: Trends and Insights
No ratings yet
Hypebeast Economy: Trends and Insights
11 pages
Database Sheet
No ratings yet
Database Sheet
10 pages
Akhil Eda Frontend Developer
No ratings yet
Akhil Eda Frontend Developer
6 pages
Ferroptosis in Gastric Cancer Review
No ratings yet
Ferroptosis in Gastric Cancer Review
34 pages
Loan Packages and Complaints Overview
No ratings yet
Loan Packages and Complaints Overview
10 pages
Transform Your Business with Burning Desire
No ratings yet
Transform Your Business with Burning Desire
10 pages
Tech Discovery Guide
No ratings yet
Tech Discovery Guide
11 pages
Understanding GPU in Computer Systems
No ratings yet
Understanding GPU in Computer Systems
2 pages
Multimodal Transport in India: Overview and Benefits
No ratings yet
Multimodal Transport in India: Overview and Benefits
7 pages