FAANG SQL Questions & Solutions Guide
FAANG SQL Questions & Solutions Guide
To find the top spender among customers in the last 90 days, a CTE is used to filter transactions within this timeframe and calculate the total spent by each customer. The query uses SUM(amount) to aggregate spending and DENSE_RANK() to rank customers based on total spending in descending order. The CTE's result filters for customers with a rank of 1, indicating the highest spender: WITH cte AS (SELECT customer_id, SUM(amount) AS total_spent, DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS rnk FROM Orders WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)) SELECT customer_id, total_spent FROM cte WHERE rnk = 1 .
The retrieval of the second highest salary per department uses a Common Table Expression (CTE) with DENSE_RANK as the primary SQL function. The query assigns a rank to salaries within each department using DENSE_RANK(), ordered by salary in descending order. The CTE filters out the salaries to keep only those ranked second. The final selection joins the results with a Departments table to fetch department names via a join operation: WITH cte AS (SELECT name, salary, department_id, DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk FROM Employees) SELECT e.name, e.salary, d.dept_name FROM cte e JOIN Departments d ON e.department_id = d.id WHERE rnk = 1 .
Window functions like DENSE_RANK() provide significant benefits over traditional sorting and grouping because they allow for more nuanced data manipulation and analysis without reducing the dataset's size. Unlike grouping (GROUP BY), which collapses rows, window functions calculate rankings and other aggregates across the entire dataset within specified partitions. This permits retaining complete row details while adding additional analytical layers, such as rankings or moving averages. DENSE_RANK() specifically helps address ties more effectively by assigning the same rank to equal values, unlike ROW_NUMBER(), which would differentiate them sequentially, thus providing more meaningful insights in scenarios requiring rank-based analysis .
Effective SQL strategies for dealing with large datasets to identify top performers include using index optimizations to expedite data retrieval, applying window functions like RANK() or DENSE_RANK() to dynamically analyze data, and employing partitioning to manage data subsets efficiently. Implementing CTEs for logical separation of query components also aids in managing performance. Strategic use of WHERE clauses narrows datasets before processing, decreasing computation loads. Finally, leveraging analytical functions over raw data aggregation permits insight without limiting dataset visibility, which is crucial in performance and high-value entry analysis .
To calculate the longest order streak for each customer, a window function combined with the concept of grouping sequential data through row numbering is used. A CTE is created to assign each order a group number by subtracting the row number from the order date, ensuring orders on consecutive dates group together. The query counts the number of orders per group per customer to determine the length of each streak. The result is then ordered by customer and grouped sequence: WITH streaks AS (SELECT customer_id, order_date, order_date - INTERVAL ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) DAY AS grp FROM Orders) SELECT customer_id, COUNT(*) AS longest_streak FROM streaks GROUP BY customer_id, grp ORDER BY customer_id .
SQL can differentiate between tied scores within a dataset while maintaining a consistent ranking sequence using the DENSE_RANK() function. Unlike RANK(), which creates gaps in ranks for ties, DENSE_RANK() assigns the same rank to identical scores but does not skip subsequent ranks. This ensures a continuous sequence where each score ranks correctly without leaving gaps. The syntax typically used is DENSE_RANK() OVER (ORDER BY [score_column] DESC), and the dataset must be ordered based on the scoring criteria to apply ranks appropriately .
Potential issues with using LEAD() and LAG() functions include performance hits due to large data volumes and logic errors when partitioning is improperly configured. Since these functions access subsequent or previous rows, they can cause inefficiencies without proper indexing or limiting datasets using WHERE clauses. Logical errors occur when incorrect partitioning or ordering happens, leading to unintended cross-reference across unrelated groupings. Mitigation strategies involve optimizing query conditions to limit unnecessary calculations, using appropriate indices for partition sorting, and ensuring correct partition and order criteria to maintain logical consistency in time series or sequential comparison tasks .
The LEAD() function is crucial for identifying consecutive login streaks as it allows the query to compare each user's login date with the subsequent two login dates. It computes the next two activity dates for each user ordered by the activity_date field. The logic then checks if the difference between these dates is exactly one and two days, respectively, confirming a consecutive streak. When both conditions are met, the user ID is selected as part of the result. This window function is essential for prefetching the next row's data and simplifies comparing consecutive rows: SELECT DISTINCT user_id FROM (SELECT user_id, activity_date, LEAD(activity_date,1) OVER (PARTITION BY user_id ORDER BY activity_date) AS next1, LEAD(activity_date,2) OVER (PARTITION BY user_id ORDER BY activity_date) AS next2 FROM Activity) t WHERE DATEDIFF(next1, activity_date) = 1 AND DATEDIFF(next2, activity_date) = 2 .
SQL's CTE enhances the readability and manageability of complex queries by allowing the breaking down of queries into modular parts. It provides a way to write more organized and understandable queries by creating temporary result sets referenced within the main query. Through CTEs, particularly when handling recursive or deeply nested SQL logic, a programmer can define separate logic units that clarify each step's function and contribute to overall query clarity and maintenance. Furthermore, CTEs can be used to avoid repeated subquery computations, which optimizes the query execution plan by focusing on necessary data transformations in a clear, orderly manner .
The ORDER BY clause enables ordering of results by their aggregated totals when calculating total sales per customer, using the syntax ORDER BY total_amount DESC. This is beneficial because it allows for prioritization of customers based on their contribution to sales, thereby facilitating insights into customer spending behavior or identifying significant revenue sources. In cases where total amounts are aggregated, such as SUM(amount), this clause ensures that the most critical data is presented prominently, which supports targeted marketing or prioritization in business strategies .