Top 20 Data Analyst Interview Questions Asked at Infosys in 2025
- By Ananya Laha
Top 20 Data Analyst Interview Questions and Answers
commonly asked at Infosys in 2025:
1. How would you identify and handle missing data in a dataset?
Answer:
To handle missing data:
• Identification: Use isnull() in pandas (Python) or IS NULL in
SQL to detect missing values.
• Handling:
o Removal: If only a small share of the data is missing,
remove the affected rows or columns.
o Imputation: Fill missing values using the mean or median
(numeric data), the mode (categorical data), or a
predictive model.
o Flagging: Create an indicator variable to flag missing
values for further analysis.
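The three handling options can be sketched in a few lines. This is a minimal illustration using only the Python standard library, with None standing in for a missing value; the records and the column names (id, age, city) are made up for the example. In pandas, isnull() and fillna() cover the same ground.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Mysuru"},
    {"id": 3, "age": 29, "city": None},
    {"id": 4, "age": 41, "city": "Pune"},
]

# Identification: count missing values per column.
missing = {col: sum(r[col] is None for r in rows) for col in rows[0]}

# Removal: keep only rows with no missing values.
complete_rows = [r for r in rows if all(v is not None for v in r.values())]

# Imputation + flagging: fill missing ages with the median of the
# observed ages and record which rows were filled.
age_median = median(r["age"] for r in rows if r["age"] is not None)
imputed = [
    {
        **r,
        "age": r["age"] if r["age"] is not None else age_median,
        "age_was_missing": r["age"] is None,
    }
    for r in rows
]
```

Keeping the age_was_missing flag lets later analysis check whether the imputed rows behave differently from the observed ones.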
2. Explain the difference between INNER JOIN and LEFT JOIN in SQL.
Answer:
• INNER JOIN: Returns records that have matching values in both
tables.
• LEFT JOIN: Returns all records from the left table and the
matched records from the right table. Where there is no match,
the right table's columns are returned as NULL.
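A small runnable comparison makes the difference concrete. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the employees and departments tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE departments (id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Meera', NULL);
    INSERT INTO departments VALUES (10, 'Analytics'), (30, 'Finance');
""")

# INNER JOIN: only employees whose dept_id matches a department.
inner_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN: every employee; dept_name is NULL (None in Python)
# when no department matches.
left_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

print(inner_rows)  # [('Asha', 'Analytics')]
print(left_rows)   # [('Asha', 'Analytics'), ('Ravi', None), ('Meera', None)]
```

Ravi points at a department that does not exist and Meera has no department at all, so both drop out of the INNER JOIN but stay in the LEFT JOIN.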
3. Describe a situation where you optimized a slow-running SQL
query.
Answer:
In a previous project, a report was taking over 10 minutes to run. I
optimized it by:
• Indexing: Added indexes to frequently filtered columns.
• Query Refactoring: Rewrote subqueries as JOINs.
• Limiting Data: Retrieved only necessary columns and filtered
data early in the query.
This reduced execution time to under a minute.
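The effect of an index can be checked by comparing query plans before and after creating it. The following is a rough sketch using SQLite through Python's sqlite3 module as a stand-in for whatever database the report ran on; the orders table, its columns, and the index name idx_orders_customer are assumptions made for the example. The exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount, notes) VALUES (?, ?, ?)",
    [(i % 100, i * 1.5, "n/a") for i in range(1000)],
)

# Limiting data: select only the needed columns and filter in the query.
query = "SELECT id, amount FROM orders WHERE customer_id = ?"

def plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last field is the plan detail

plan_before = plan(query)  # expected to be a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # expected to search via the new index

result = conn.execute(query, (42,)).fetchall()
print(plan_before)
print(plan_after)
```

Most databases offer an equivalent (for example, EXPLAIN in MySQL and PostgreSQL), and reading the plan is usually the first step before changing a query.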
4. What is normalization? Why is it important in databases?
Answer:
Normalization is the process of organizing data to reduce redundancy
and improve data integrity. It involves splitting large tables into
smaller, related tables (following normal forms such as 1NF, 2NF, and
3NF) and linking them with keys. This ensures:
• Efficient data storage.
• Easier maintenance.
• Prevention of data anomalies.
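As an illustration, the sketch below takes a flat list of orders in which customer details repeat on every row and splits it into a customers table and an orders table linked by a key. It uses Python's sqlite3 module; the table layout and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical denormalized rows: customer details repeat on every order.
flat_orders = [
    (101, "Asha", "asha@example.com", "Laptop"),
    (102, "Asha", "asha@example.com", "Mouse"),
    (103, "Ravi", "ravi@example.com", "Monitor"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        product TEXT NOT NULL
    );
""")

for order_id, name, email, product in flat_orders:
    # Store each customer once; a repeated email is skipped.
    conn.execute(
        "INSERT OR IGNORE INTO customers (name, email) VALUES (?, ?)",
        (name, email),
    )
    customer_id = conn.execute(
        "SELECT id FROM customers WHERE email = ?", (email,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, product)
    )

# An email change is now a single-row update instead of one per order.
conn.execute(
    "UPDATE customers SET email = ? WHERE name = ?", ("asha@example.org", "Asha")
)

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
asha_orders = conn.execute("""
    SELECT o.product, c.email
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.name = 'Asha'
    ORDER BY o.id
""").fetchall()
```

Because the email lives in one place, both of Asha's orders pick up the new address, which is the kind of update anomaly normalization is meant to prevent.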
5. How would you explain a complex data analysis project to a non-
technical stakeholder?
Answer:
I would:
• Simplify Language: Avoid technical jargon.
• Use Visuals: Present findings through charts and graphs.
• Relate to Business Impact: Explain how the analysis affects
business decisions or outcomes.
6. What are the different types of data analytics?
Answer:
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What is likely to happen?
• Prescriptive Analytics: What should be done?
7. How do you ensure data quality in your analysis?
Answer:
• Validation: Check for accuracy, completeness, and consistency.
• Cleaning: Remove duplicates, handle missing values, and
correct errors.
• Documentation: Maintain clear records of data sources and
transformations.
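A simple version of the validation and cleaning steps might look like the sketch below. The records and the two rules (country must be present, amount must not be negative) are assumptions chosen for illustration; real rules depend on the dataset.

```python
# Hypothetical order records with typical quality problems.
records = [
    {"order_id": 1, "amount": 250.0, "country": "IN"},
    {"order_id": 1, "amount": 250.0, "country": "IN"},  # exact duplicate
    {"order_id": 2, "amount": -40.0, "country": "IN"},  # negative amount
    {"order_id": 3, "amount": 120.0, "country": ""},    # empty country
    {"order_id": 4, "amount": 75.5, "country": "US"},
]

def validate(record):
    """Return a list of rule violations for one record (empty if clean)."""
    problems = []
    if not record["country"]:
        problems.append("missing country")
    if record["amount"] < 0:
        problems.append("negative amount")
    return problems

# Cleaning: drop exact duplicates while preserving the original order.
seen = set()
deduped = []
for r in records:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Validation: separate clean rows from rows that need review.
clean = [r for r in deduped if not validate(r)]
rejected = {r["order_id"]: validate(r) for r in deduped if validate(r)}
```

Keeping the rejected rows together with the reason for rejection also supports the documentation step, since it records what was excluded and why.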
8. Explain the concept of a data warehouse.
Answer:
A data warehouse is a centralized repository that stores integrated
data from multiple sources. It supports analytical reporting,
structured queries, and decision-making processes.
9. What is the difference between OLAP and OLTP?
Answer:
• OLAP (Online Analytical Processing): Optimized for complex,
read-heavy queries that aggregate large volumes of historical
data for analysis.
• OLTP (Online Transaction Processing): Optimized for many short,
concurrent transactions such as insertions, updates, and deletions.
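The two workloads differ mainly in the shape of their queries. The sketch below shows both shapes against one small SQLite database through Python's sqlite3 module. SQLite is not an OLAP engine, and real systems usually keep the two workloads in separate databases, so this only illustrates the query patterns; the stock and sales tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (product TEXT PRIMARY KEY, quantity INTEGER);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount REAL
    );
    INSERT INTO stock VALUES ('pen', 100), ('notebook', 50);
    INSERT INTO sales (product, region, amount) VALUES
        ('pen', 'South', 20.0), ('pen', 'North', 30.0),
        ('notebook', 'South', 120.0);
""")

# OLTP-style: a short transaction that touches a few rows.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute(
        "INSERT INTO sales (product, region, amount) VALUES (?, ?, ?)",
        ("pen", "South", 10.0),
    )
    conn.execute(
        "UPDATE stock SET quantity = quantity - 1 WHERE product = ?", ("pen",)
    )

# OLAP-style: a read-only query that aggregates over many rows.
revenue_by_region = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

pen_stock = conn.execute(
    "SELECT quantity FROM stock WHERE product = 'pen'"
).fetchone()[0]

print(revenue_by_region)  # [('North', 30.0), ('South', 150.0)]
print(pen_stock)          # 99
```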
10. Describe a time when you had to learn a new tool or technology
quickly.
Answer:
In a previous role, I was required to use Power BI for dashboard
creation. I took online courses and practiced with sample datasets,
enabling me to build interactive dashboards within a week.
11. How do you handle outliers in your data?
Answer:
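• Identification: Use statistical methods such as Z-scores or the
interquartile range (IQR).
• Handling:
o Removal: Drop outliers only when they are confirmed errors
(e.g., data-entry mistakes).
o Transformation: Apply a log transformation to reduce the
influence of extreme values in skewed, positive data.
o Segmentation: Analyze them separately if they represent a
meaningful segment.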
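Both detection methods fit in a few lines of standard-library Python. The values below are made up, and the cutoffs (|z| > 2.5 and 1.5 × IQR) are common conventions rather than fixed rules.

```python
import math
from statistics import mean, pstdev, quantiles

# Hypothetical measurements with one extreme value.
values = [12, 14, 15, 13, 16, 14, 15, 13, 14, 95]

# Z-score method: distance from the mean in standard deviations.
# In small samples an extreme value inflates the standard deviation,
# so the usual |z| > 3 cutoff can miss it; 2.5 is used here.
mu, sigma = mean(values), pstdev(values)
z_outliers = [v for v in values if abs(v - mu) / sigma > 2.5]

# IQR method: flag points beyond 1.5 * IQR outside the quartiles.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [v for v in values if v < lower or v > upper]

# Transformation: a log compresses large values (positive data only).
logged = [math.log(v) for v in values]

print(z_outliers)    # [95]
print(iqr_outliers)  # [95]
```

The IQR method relies on quartiles rather than the mean and standard deviation, so it is less affected by the very outliers it is trying to find.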
12. What is the purpose of A/B testing?
Answer:
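A/B testing randomly splits users between two versions of a single
element (for example, a page layout or an email subject line) and
compares a chosen metric to determine which performs better. A
statistical test then checks whether the difference is likely real
rather than due to chance. It's used to make data-driven decisions in
areas like marketing strategies or UI changes.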
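One common way to judge whether a difference in conversion rates is more than chance is a two-proportion z-test. The sketch below is a minimal version using the standard library; the function name and the visitor and conversion counts are hypothetical, and it assumes samples large enough for the normal approximation to hold.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both versions perform equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: version A converts 200 of 5000 visitors,
# version B converts 260 of 5000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A p-value below the significance level chosen before the test (often 0.05) suggests the difference is unlikely to be due to chance alone.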
13. How do you prioritize tasks when handling multiple projects?
Answer:
• Assessment: Evaluate the urgency and impact of each task.
• Planning: Use tools like Gantt charts or Kanban boards.
• Communication: Regularly update stakeholders on progress
and any changes.
14. Explain the difference between structured and unstructured
data.
Answer:
• Structured Data: Follows a predefined schema of rows and columns
(e.g., relational database tables, spreadsheets).
• Unstructured Data: Lacks a predefined format (e.g., emails,
social media posts, images).
15. What is a KPI, and how do you determine which KPIs to track?
Answer:
A Key Performance Indicator (KPI) is a quantifiable metric that
measures performance against a specific objective. To determine KPIs:
• Align with business goals.
• Ensure they are measurable and actionable.
• Regularly review and adjust as needed.
16. Describe a challenging data analysis problem you faced and how
you solved it.
Answer:
I encountered inconsistent data formats from multiple sources. I
standardized the data using transformation scripts and created a
validation process to ensure consistency, which improved data
reliability.
17. How do you stay updated with the latest trends in data
analytics?
Answer:
• Continuous Learning: Enroll in online courses and certifications.
• Networking: Attend industry conferences and webinars.
• Reading: Follow relevant blogs, journals, and publications.
18. What is the significance of data visualization in analytics?
Answer:
Data visualization helps in:
• Simplifying complex data.
• Identifying patterns and trends.
• Communicating insights effectively to stakeholders.
19. How do you handle data from different sources with varying
formats?
Answer:
• Integration: Use ETL (Extract, Transform, Load) processes.
• Standardization: Convert data into a common format.
• Validation: Ensure data consistency and accuracy post-
integration.
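A compact example of the extract, standardize, and validate steps is sketched below. It assumes two hypothetical sources: a CSV export with day/month/year dates and a JSON feed with different field names. Both are mapped to one common record layout.

```python
import csv
import io
import json
from datetime import datetime

# Extract: two hypothetical sources with different layouts.
csv_source = io.StringIO(
    "order_id,order_date,amount\n"
    "1,15/01/2025,250.50\n"
    "2,16/01/2025,99\n"
)
json_source = '[{"id": 3, "date": "2025-01-17", "total": "120.00"}]'

# Transform: map each source to a common schema with ISO dates.
def from_csv(row):
    return {
        "order_id": int(row["order_id"]),
        "order_date": datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat(),
        "amount": float(row["amount"]),
    }

def from_json(obj):
    return {
        "order_id": int(obj["id"]),
        "order_date": datetime.strptime(obj["date"], "%Y-%m-%d").date().isoformat(),
        "amount": float(obj["total"]),
    }

# Load: here the target is just an in-memory list.
unified = [from_csv(r) for r in csv.DictReader(csv_source)]
unified += [from_json(o) for o in json.loads(json_source)]

# Validate: every record has the same fields and order IDs are unique.
expected_keys = {"order_id", "order_date", "amount"}
schema_ok = all(set(r) == expected_keys for r in unified)
ids = [r["order_id"] for r in unified]
ids_unique = len(ids) == len(set(ids))
```

Parsing dates with an explicit format per source, instead of guessing, avoids silently swapping day and month when formats differ.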
20. What tools and technologies are you proficient in for data
analysis?
Answer:
I am proficient in:
• Programming Languages: Python, SQL.
• Visualization Tools: Power BI, Tableau.
• Databases: MySQL, PostgreSQL.
• Others: Excel, SAS, R.