LLM Systems

The document is a comprehensive guide on LLM systems, covering architecture, training, inference, and system design. It is aimed at candidates preparing for interviews and engineers building LLM systems, providing insights into interview expectations and technical foundations. The content includes chapters on various aspects of LLM systems, from transformer architecture to evaluation metrics.

Uploaded by

aggarwal.kapil.2013

LLM System Interview

A Guide to Architecture, Training, Inference, and System Design

Author: Hao Hoang

Institute: AI Interview Prep
Date: April 25, 2026
Version: First edition, 2026
Focus: Applied AI/ML and LLM systems

Contents

Preface vi

How to Use This Book vii

For Candidates Preparing for Interviews x

For Engineers Building LLM Systems xiii

Notation and Symbols xvi

Acknowledgments xxi

About the Author xxii

I Overview and Interview Landscape 1

Chapter 1 The LLM Systems Interview 2

1.1 What Companies Actually Ask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Roles That Require LLM Systems Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Chapter 2 How to Approach an LLM System Design Question 8

2.1 A Repeatable Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Trade-Off Axes You Must Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Common Pitfalls in Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

II Foundations: Transformer Architecture and Training Signals 16

Chapter 3 Transformer Architecture Interview Essentials 17

3.1 The Baseline Decoder-Only Transformer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Normalization Choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Activation Functions in Modern LLMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Position Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Chapter 4 Hyperparameters You Will Be Asked to Justify 40

4.1 Width and Depth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2 Attention Head Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Vocabulary Size and Tokenization Impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Regularization in Pre-Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Chapter 5 Stability Tricks in Large-Scale Training 63

5.1 Softmax Instabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Numerical Precision and Training Health . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

III Mixture of Experts and Sparse Architectures 76

Chapter 6 Why MoE Is Now Standard in Frontier Systems 77

6.1 Dense vs. Sparse Models for Fixed FLOPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Routing: The Heart of Every MoE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

Chapter 7 MoE Training and Systems Considerations 89

7.1 Fine-Grained and Shared Experts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.2 Load Balancing and Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.3 Expert Parallelism Gotchas for Interviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

IV GPUs, Kernels, and Single-Device Performance 107

Chapter 8 GPU Architecture for LLM Engineers 108

8.1 The Compute and Memory Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8.2 Arithmetic Intensity and the Roofline Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

Chapter 9 Making a Single GPU Go Fast 121

9.1 Reducing Memory Movement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
9.2 Exploiting Memory Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
9.3 Lower Precision and Quantization-Aware Training . . . . . . . . . . . . . . . . . . . . . . . . 133

Chapter 10 Writing and Benchmarking Custom Kernels 139

10.1 Benchmarking and Profiling Discipline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
10.2 Implementing Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
10.3 FlashAttention as a Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

V Distributed Training: Parallelism at Scale 159

Chapter 11 Multi-GPU and Multi-Node Fundamentals 160

11.1 Interconnect Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
11.2 Collective Communication Primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
11.3 The Software Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

Chapter 12 Parallelism Strategies 178

12.1 Data Parallelism and ZeRO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
12.2 Model Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
12.3 Activation and Sequence Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
12.4 Putting It All Together: 3D and 4D Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . 196

VI Scaling Laws and Training Economics 203

Chapter 13 Predictable Scaling for Interview Answers 204

13.1 The Three Canonical Scaling Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

13.2 Using Scaling Laws to Make Design Decisions . . . . . . . . . . . . . . . . . . . . . . . . . 210

Chapter 14 Chinchilla and Beyond 216

14.1 Compute-Optimal Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
14.2 Inference-Aware Token-to-Parameter Ratios . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
14.3 Maximal Update Parameterization (muP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

VII Inference Systems 234

Chapter 15 The Inference Workload 235

15.1 Prefill vs. Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
15.2 Latency, Throughput, and Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

Chapter 16 Reducing the KV Cache 247

16.1 Attention Variants for Cheaper Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
16.2 Cross-Layer and Local Attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

Chapter 17 Going Beyond the Transformer for Inference 258

17.1 State-Space and Linear-Attention Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
17.2 Non-Autoregressive Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

Chapter 18 Speculative Decoding and Serving Optimizations 270

18.1 Speculative Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
18.2 Serving System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

Chapter 19 Compression Techniques for Deployment 283

19.1 Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
19.2 Pruning and Distillation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

VIII Data: The Real Differentiator 294

Chapter 20 Pre-Training Data Pipelines 295

20.1 Where Training Data Comes From . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
20.2 Evolution of Open Pre-Training Corpora . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
20.3 Legal and Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306

Chapter 21 Data Filtering and Deduplication Algorithms 312

21.1 Quality Filtering Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
21.2 Targeted Filtering Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
21.3 Deduplication at Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323

Chapter 22 Mid-Training and Post-Training Data 330

22.1 Instruction and Chat Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
22.2 Long-Context and Domain Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
22.3 Data Quality Heuristics That Matter in Interviews . . . . . . . . . . . . . . . . . . . . . . . . 342

IX Evaluation 349

Chapter 23 Designing an Evaluation 350

23.1 Goals of Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
23.2 Metrics and Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

Chapter 24 Benchmarks You Must Know 362

24.1 Knowledge and Reasoning Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
24.2 Instruction Following and Chat Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
24.3 Agentic and Safety Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374

Chapter 25 Validity, Contamination, and Real-World Use 380

25.1 Train-Test Overlap and Contamination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
25.2 Real-World Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385

X Alignment and Post-Training 391

Chapter 26 Supervised Fine-Tuning 392

26.1 What SFT Can and Cannot Teach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
26.2 SFT Data in Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398

Chapter 27 Preference-Based Alignment (RLHF) 405

27.1 The RLHF Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
27.2 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
27.3 Pitfalls of RLHF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

Chapter 28 Reinforcement Learning from Verifiable Rewards 423

28.1 From RLHF to RLVR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
28.2 Policy Gradient Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
28.3 Case Studies in Reasoning Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435

XI End-to-End System Design Drills 441

Chapter 29 Building a Production LLM Serving Stack 442

29.1 Reference Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
29.2 Capacity Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448

Chapter 30 Designing a Pre-Training Run from Scratch 454

30.1 From Compute Budget to Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
30.2 Operations and Failure Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460

Chapter 31 Designing a Fine-Tuning and Alignment Pipeline 466

31.1 Scoping the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
31.2 Operational Concerns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472

Appendix A Napkin Math for LLM Interviews 478

A.1 Parameter Counts and Memory Footprints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
A.2 FLOPs per Forward and Backward Pass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
A.3 KV-Cache Sizing and Latency Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481

Appendix B Common Interview Questions 483

B.1 Architecture and Training Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
B.2 Inference and Serving Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
B.3 Alignment and Evaluation Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487

Appendix C Checklists for System Design Answers 490

C.1 The 10-Minute LLM System Design Checklist . . . . . . . . . . . . . . . . . . . . . . . . . . 490
C.2 The Pre-Training Run Readiness Checklist . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
C.3 The Inference Deployment Readiness Checklist . . . . . . . . . . . . . . . . . . . . . . . . . 492

Appendix D Further Reading 495

D.1 Canonical Papers by Topic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
D.2 Blogs, Talks, and Engineering Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501

Preface

This book exists because the interview and the job have converged in a way they did not five years ago.
Until recently, most software engineering interviews tested general systems-design intuition: databases,
caches, load balancers, message queues. The underlying assumption was that specialized domain knowledge
could be learned on the job once a strong generalist was hired. That assumption has broken down at frontier
AI companies. A candidate who cannot reason about KV cache memory pressure, arithmetic intensity on a
GPU, or the difference between compute-bound prefill and memory-bandwidth-bound decoding will not pass a
senior-level systems interview at the teams building today’s most important models. The domain has become
so operationally specific that generalist intuition is no longer sufficient, and no single paper, blog post, or course
covers it end to end.
The gap this book fills. Academic courses teach the theory of transformers; they rarely explain why GQA
with N_kv = N/8 reduces KV cache memory by 8× and how that changes the feasible batch size on a single
H100. Research papers present results; they do not explain how to answer “walk me through how you would
size the hardware for a 70B model serving 5,000 QPS” in the 45 minutes of a live interview. Engineering blogs
cover individual components in depth; they do not provide the unified arithmetic framework that lets you move
fluidly from model parameters to memory budgets to parallelism strategies to cost estimates. This book does.
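As a rough illustration of the GQA claim above, the following sketch computes per-sequence KV cache size under the standard accounting (keys and values, for every layer and every KV head) and the batch size that fits in a fixed memory budget. The layer count, head counts, context length, and the 40 GB budget are illustrative assumptions, not values taken from this book.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Per-sequence KV cache: 2 tensors (K and V) per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


def max_batch(free_bytes, per_seq_bytes):
    """Largest whole number of sequences whose KV cache fits in free_bytes."""
    return free_bytes // per_seq_bytes


# Hypothetical configuration: 32 layers, 32 query heads, head_dim 128, 4,096 tokens.
LAYERS, Q_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 4096
mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, SEQ_LEN)       # N_kv = N
gqa = kv_cache_bytes(LAYERS, Q_HEADS // 8, HEAD_DIM, SEQ_LEN)  # N_kv = N/8

FREE = 40 * 10**9  # assumed memory left for the KV cache after weights, in bytes
print(f"MHA: {mha / 1e9:.2f} GB per sequence, batch <= {max_batch(FREE, mha)}")
print(f"GQA: {gqa / 1e9:.2f} GB per sequence, batch <= {max_batch(FREE, gqa)}")
print(f"reduction: {mha // gqa}x")
```

The reduction factor depends only on the ratio of KV heads, while the feasible batch size also depends on the assumed memory budget and context length.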
What this book is. It is a technical reference and an interview preparation guide written for engineers who
already know how to code and who already understand basic ML. Every chapter develops the arithmetic behind
a topic from first principles, shows how that arithmetic connects to real hardware constraints, and then frames
the result as an interview question with a worked answer. The appendices collect the most important formulas
for fast reference during active preparation.
What this book is not. It is not an introduction to deep learning. It does not teach backpropagation or
explain what a transformer is from scratch. It does not survey the research literature comprehensively. Readers
who need that foundation first should work through a course on neural networks before returning here. It is also
not a prescriptive guide to any specific company’s interview process: the frameworks and derivations are generic
because the physics of hardware and the economics of inference are generic.
Coverage. The book follows a deliberate progression. Part I maps the interview landscape and introduces
the four-question framework that structures every design answer. Parts II through IV build the technical
foundation: transformer architecture, training hyperparameters, GPU hardware, and kernel-level performance. Parts V
and VI scale that foundation across distributed training and scaling laws. Part VII covers the inference stack
in depth, from the prefill-decode split through KV cache optimizations, speculative decoding, and
compression. Parts VIII through X address the three domains that interviewers probe as differentiators: data pipelines,
evaluation design, and alignment methods. Part XI synthesizes everything into three full end-to-end design
drills.
Using derivations, not memorization. Every number in this book is derived, not asserted. The estimate
that a 7B model in bf16 requires approximately 13 GB of inference memory for its weights follows from 2P bytes, with P =
6.7 × 10^9. The estimate that a single H100 decode step takes roughly 8 ms at batch size 1 follows from dividing
M_weights by the HBM bandwidth. These derivations are the point. An interviewer who changes one constraint
(“now make it a 13B model”) expects a candidate who can update the arithmetic in real time, not one who has
memorized a table. First principles produce that flexibility; memorization does not.
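The two estimates above can be re-derived in a few lines, which also makes the “now make it a 13B model” update mechanical. This is a minimal sketch: the bandwidth constant is an assumption (a peak figure often quoted for H100 SXM parts), and the step time scales inversely with whatever effective bandwidth is substituted, so a lower sustained bandwidth gives a proportionally larger estimate.

```python
def weight_bytes(num_params, bytes_per_param=2):
    """Weight memory: bf16 stores each parameter in 2 bytes, hence 2P."""
    return num_params * bytes_per_param


def decode_step_seconds(weights_bytes, hbm_bytes_per_s):
    """Bandwidth-bound estimate at batch size 1: each weight is read once per token."""
    return weights_bytes / hbm_bytes_per_s


HBM_BW = 3.35e12  # assumed bytes/s; replace with the effective bandwidth you measure

for num_params in (6.7e9, 13e9):  # the 7B case, then the changed constraint
    m_weights = weight_bytes(num_params)
    t_step = decode_step_seconds(m_weights, HBM_BW)
    print(f"P = {num_params:.1e}: weights {m_weights / 1e9:.1f} GB, "
          f"decode step {t_step * 1e3:.1f} ms")
```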
How to Use This Book

The book supports three distinct uses: interview preparation, on-the-job reference, and systematic study of
LLM systems. Read it linearly if you are building up the subject from scratch. Follow a targeted path if you are
preparing for a specific role or plugging specific gaps.

Linear Reading Order

The parts are ordered by dependency. Each part assumes the vocabulary and arithmetic developed in all
prior parts.

Part  Topic                                  Prerequisite
I     Interview landscape and framework      None
II    Transformer architecture and training  None
III   Mixture of Experts                     Part II
IV    GPU architecture and custom kernels    Part II
V     Distributed training and parallelism   Parts II, IV
VI    Scaling laws and training economics    Parts II, V
VII   Inference systems                      Parts II, IV, VI
VIII  Data pipelines                         Part II
IX    Evaluation                             Parts II, VIII
X     Alignment and post-training            Parts II, VIII, IX
XI    End-to-end design drills               All prior parts

Appendix A (parameter counts, FLOPs, KV cache formulas) and Appendix C (checklists) are designed to be
printed and kept at your desk during active preparation. Appendix B is a curated question bank organized by
topic. Appendix D is an annotated reading list of papers and engineering reports.

Targeted Reading Paths

Path 1 - Candidate Preparing for an Inference / Serving Interview

Focus: memory arithmetic, KV cache, latency-throughput trade-offs, speculative decoding, quantization.
1. Chapter 2 (framework) - read fully, internalize the four-axis requirement template.
2. Chapter 3, Sections 1-2 (transformer essentials, parameter count).
3. Appendix A (reference formulas) - commit to memory.
4. Chapter 8 (GPU architecture, roofline model, arithmetic intensity).
5. Chapter 15 (the inference workload: prefill vs. decode, latency vs. throughput).
6. Chapter 16 (KV cache reduction: GQA, MQA, MLA).
7. Chapter 18 (speculative decoding and continuous batching).
8. Chapter 19 (quantization and compression).
9. Chapter 29 (full end-to-end serving design drill).

Path 2 - Candidate Preparing for a Training / Infrastructure Interview

Focus: distributed training, parallelism strategies, memory budgets, scaling laws, resilience.
1. Chapter 2 (framework).
2. Chapter 3 (architecture) and Chapter 4 (hyperparameters).
3. Chapter 5 (training stability).
4. Chapter 8 (GPU architecture).
5. Chapter 11 (multi-GPU fundamentals: collectives, bandwidth).
6. Chapter 12 (parallelism strategies: TP, PP, DP, ZeRO/FSDP).
7. Chapter 13 (scaling laws and napkin math).
8. Chapter 14 (Chinchilla and compute-optimal training).
9. Appendix A (FLOPs and memory formulas).
10. Chapter 30 (end-to-end pre-training design drill).

Path 3 - Candidate Preparing for an Applied / Product Engineering Interview

Focus: serving economics, fine-tuning, alignment, evaluation, end-to-end system design.
1. Chapter 1 (interview landscape, role definitions).
2. Chapter 2 (framework).
3. Chapter 15 (inference workload fundamentals).
4. Chapter 19 (compression and deployment).
5. Chapters 20-22 (data pipelines and data quality).
6. Chapters 23-25 (evaluation design, benchmarks, contamination).
7. Chapter 26 (supervised fine-tuning).
8. Chapter 27 (RLHF and preference alignment).
9. Chapter 31 (fine-tuning and alignment pipeline drill).

Chapter Difficulty
Chapters are written at three levels. The level is not labeled explicitly because the same content serves both
preparation and reference; the depth at which you engage with it is the variable.
Conceptual (most of Parts I, II, VIII, IX, X): Develops vocabulary, mental models, and interview framing.
Suitable as first-pass reading and as review before an interview day.
Arithmetic (most of Parts IV, VI, VII; Appendix A): Derives quantitative estimates from first principles.
Requires pencil and paper. Work through the examples actively rather than reading passively.
Implementation (most of Parts III, V; Chapter 10): Discusses kernel-level, compiler-level, or systems-level
mechanics. Assumed background: familiarity with PyTorch, CUDA concepts, and distributed training
frameworks.

Conventions in Each Chapter

Every chapter follows the same internal structure:
The Take - the single most important insight for an interview, stated in one paragraph.
Technical content - derivations, diagrams, or system descriptions.
How This Shows Up - two or three representative interview questions with annotated strong answers.
Key Takeaways - a short bullet list of the points an interviewer will probe.
Questions in How This Shows Up appear in italics. Strong-answer annotations focus on structure (what to say
first, what to derive, what follow-up to anticipate) rather than providing a script. Interviewers at frontier labs
probe depth by changing constraints; the annotations show which numbers to re-derive when that happens.

For Candidates Preparing for Interviews

The LLM systems interview rewards candidates who derive answers, not candidates who recall them. Every
technique in this section is aimed at building that derivation habit before you sit down with an interviewer.

What the Interview Actually Tests

In a senior-level LLM systems interview, interviewers collect three observable signals simultaneously.
First: vocabulary precision. “The KV cache grows with sequence length” is not an answer. “At 4,096
tokens, a single sequence in a 7B GQA model with 8 KV heads and head dimension 128 costs approximately
0.17 GB in bf16” is. The arithmetic formulas in Appendix A are the vocabulary; the chapters
explain where each term comes from.
Second: constraint-driven reasoning. Every design question has a binding constraint: usually KV cache
memory pressure for serving, usually communication overhead for training. Strong candidates name the
binding constraint before proposing any optimization. Candidates who list optimizations without identifying the
bottleneck signal that they have read blog posts, not reasoned from hardware physics.
Third: adaptability under constraint changes. The most reliable signal in a live interview is what
happens when the interviewer says “what if context length doubles?” or “what if you now need to support
five model variants?” A candidate who derived their original answer updates the arithmetic immediately. A
candidate who memorized an architecture has to restart. This is why every technical chapter in this book teaches
derivation, not recall.
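To make the first and third signals concrete, here is a sketch of the per-token KV cache cost for the configuration quoted above, re-derived when the context length doubles. The layer count is not stated in the text; the value 32 used here is an assumption, and the result scales linearly with it, so the printed figures depend on that choice.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    """K and V (factor 2) for every layer and KV head, bytes_per_elem each."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


LAYERS = 32  # assumption: not given in the text
per_token = kv_bytes_per_token(LAYERS, kv_heads=8, head_dim=128)
print(f"per token: {per_token / 1024:.0f} KiB")

# The "what if context length doubles?" follow-up: only seq_len changes.
for seq_len in (4096, 8192):
    per_seq_gb = per_token * seq_len / 1e9
    print(f"{seq_len} tokens: {per_seq_gb:.2f} GB per sequence")
```

Because the cost is linear in sequence length, doubling the context doubles the per-sequence footprint and halves the number of sequences that fit in a fixed memory budget.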

A Four-Week Preparation Plan

This plan assumes roughly two hours of active study per day. Adjust the pacing to your timeline; the
sequence of topics is fixed, but the time budget per topic is flexible.
Week 1 - Foundation: Read Chapters 1-5 and Appendix A fully. After each chapter, close the book and
re-derive the key formulas from memory on paper. Target: be able to estimate model memory footprint
(inference and training), forward-pass FLOPs, and KV cache size for any model configuration given to
you verbally.
Week 2 - Hardware and Distributed Systems: Read Chapters 8-12. Work through every numerical example
with a calculator. Practice explaining the roofline model out loud to an imaginary interviewer. Spend extra
time on Chapter 12 (parallelism strategies): draw the TP/PP/DP combination diagram from memory until
it takes less than two minutes.
Week 3 - Inference and Scaling: Read Chapters 13-19. Focus on Chapter 15 (the inference workload) and
Chapter 18 (speculative decoding and serving). For each optimization technique in Chapters 16-19, write
one sentence explaining which constraint it addresses and what it trades away. Read Appendix C
(checklists) and internalize the ten-minute system design checklist.
Week 4 - Integration and Drills: Read Chapters 20-28 at pace (one per day). Spend the last three days on
Part XI (Chapters 29-31), working each drill as a timed mock interview: 45 minutes, whiteboard or
paper, no references. After each drill, compare your answer against the chapter’s annotated response and
identify which constraints you named late or which numbers you could not derive.

How to Drill a Chapter

Passive reading does not build the derivation habit. Use this sequence for every chapter in the arithmetic
and implementation tiers.
1. Read the chapter fully once, following along with any derivations.
2. Cover the page. Write down the key formula from memory.
3. Plug in a different model configuration (change d, change L, change Nkv ) and re-derive the result.
4. Answer the “How This Shows Up” questions from the chapter out loud, targeting 90 seconds per answer
before expanding to the full 5-minute version.
5. Read the Key Takeaways bullet list and verify you can explain each point without the book.

Timing Your Answers

A 45-minute design interview has roughly the following pacing for a strong candidate:

Phase                        Time       What you are doing
Requirements clarification   3-5 min    Name TTFT, throughput, quality, cost; ask which is binding
End-to-end pipeline sketch   5-8 min    Tokenize → prefill → decode → scheduler; label bottlenecks
Binding constraint analysis  5-7 min    Derive KV cache footprint or memory budget; identify the limit
Targeted optimizations       10-15 min  Propose GQA, speculative decoding, quantization as constraint responses
Trade-off discussion         5-8 min    Answer the interviewer’s constraint-change follow-ups
Wrap-up                      2-3 min    Summarize and invite feedback

The requirement clarification phase is the most commonly skipped and the most damaging to skip. Interviewers
at staff and principal level watch for it explicitly.

Common Failure Modes

Cargo-culting numbers without derivation: Stating “Chinchilla says 20 tokens per parameter” without being
able to derive where the 20 comes from, or what changes when inference cost is amortized over more
queries. Chapter 14 derives this ratio and explains when it does not apply.
Conflating latency and throughput: Describing TTFT and tokens-per-second as if they optimize together.
They do not; Chapter 15 derives why and what each responds to.
Listing optimizations before naming the bottleneck: Proposing speculative decoding, FlashAttention, and
quantization in the same breath before establishing whether the system is KV-cache-constrained,
compute-constrained, or communication-constrained. Chapter 2 provides the framework for identifying the
constraint first.
Treating batch size as a free variable: Failing to recognize that batch size is bounded by VRAM and that
both data parallelism and pipeline parallelism consume it as a resource. Appendix A and Chapter 12
derive the precise constraints.
Ignoring inference economics during training design: Designing a training run without considering the
serving cost of the resulting model. Chapter 13 establishes why inference cost is a first-class training
constraint.
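For the first failure mode above, the quoted ratio can at least be turned into a token and compute budget. The sketch below does not derive the ratio itself (the text assigns that to Chapter 14); it only applies it. The FLOP estimate uses the common approximation of about 6 FLOPs per parameter per training token, which is an assumption not stated in this section.

```python
TOKENS_PER_PARAM = 20  # the ratio quoted in the text


def compute_optimal_tokens(num_params, ratio=TOKENS_PER_PARAM):
    """Training tokens implied by a fixed tokens-per-parameter ratio."""
    return ratio * num_params


def training_flops(num_params, num_tokens):
    """Assumed approximation: ~6 FLOPs per parameter per token (forward + backward)."""
    return 6 * num_params * num_tokens


for num_params in (7e9, 70e9):
    tokens = compute_optimal_tokens(num_params)
    flops = training_flops(num_params, tokens)
    print(f"{num_params / 1e9:.0f}B params: {tokens / 1e12:.2f}T tokens, {flops:.2e} FLOPs")
```

Changing the ratio argument shows how the budget shifts when a model is deliberately trained past the compute-optimal point to reduce serving cost.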

Using the Appendices

Appendix A (formulas) is the most important reference during active preparation. Print it. Put it next to
your desk. After two weeks of preparation you should not need to look at it; if you still do, spend another session
re-deriving each formula from scratch.
Appendix B (question bank) is organized by topic. Use it to simulate a 45-minute interview: pick one
design question and one deep-dive question from the same topic area, set a timer, and answer both without
references.
Appendix C (checklists) provides a ten-minute LLM system design checklist, a pre-training run
readiness checklist, and an inference deployment readiness checklist. Internalize the design checklist until you can
reproduce it verbally in under two minutes.
Appendix D (reading list) is for candidates preparing at the staff or principal level, where interviewers
expect familiarity with the primary literature. Papers are annotated with the specific claims you are expected to
reproduce, not just name.

For Engineers Building LLM Systems

This book is organized around interview questions, but its technical content is not interview-specific.
The arithmetic behind batch-size constraints, the parallelism trade-offs in distributed training, the
memory-bandwidth math behind KV cache sizing: these are the same calculations you do on the job. This section maps
the book’s chapters to the decisions you encounter in production.

When to Reach for This Book

You are sizing a serving fleet for a new model: Start with Appendix A (KV cache formula, memory footprint)
and Chapter 15 (the inference workload, latency-throughput trade-offs). The KV-crossover batch
size B* = M_weights / C_kv is the first number to compute; it determines whether you are in the superlinear
or sublinear throughput regime before you touch any configuration.
You are choosing a parallelism strategy for a training run: Chapter 12 derives the memory and communication
cost of every combination: tensor-parallel, pipeline-parallel, data-parallel, ZeRO stages 1-3, and
FSDP. The worked examples are parameterized so you can substitute your own model size and cluster
topology.
You are debugging a training run that is not hitting expected MFU: Chapter 8 (arithmetic intensity and the
roofline model) and Chapter 11 (collective communication bandwidth and all-reduce timing) are the
diagnostic starting points. Low MFU is almost always communication-bound, memory-bound, or
pipeline-bubble-bound; the chapters give you the arithmetic to distinguish the three cases.
You are evaluating a KV cache optimization (GQA, MLA, paged attention): Chapter 16 derives the memory
reduction and the resulting change in maximum batch size for each technique. The formulas let you
estimate the throughput gain on your specific hardware and model before running any experiment.
You are designing an evaluation harness for a fine-tuned model: Chapters 23-25 cover evaluation design
from first principles: metric selection, contamination detection, benchmark validity, and the gap between
benchmark performance and deployment behavior. Chapter 25 covers domain-specific evaluation design
(medical, legal, code) and deployment telemetry.
You are deciding between SFT, DPO, and RLVR for a post-training objective: Chapter 26 (SFT), Chapter 27
(DPO and RLHF), and Chapter 28 (GRPO and RLVR) each cover the method’s mechanics, its data
requirements, its failure modes, and the scenarios where it is the right choice. Chapter 31 synthesizes the
decision into an end-to-end alignment pipeline design.
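The KV-crossover batch size named in the first item above can be computed directly. In this sketch, C_kv is read as the KV cache bytes of one sequence at the target context length; that reading is an interpretation, not a definition given in this section, and the model shape values are hypothetical.

```python
def kv_crossover_batch(m_weights_bytes, c_kv_bytes):
    """B* = M_weights / C_kv, with C_kv taken as KV cache bytes per sequence (assumed)."""
    return m_weights_bytes / c_kv_bytes


# Hypothetical 7B-class model in bf16: 6.7e9 parameters, 32 layers,
# 8 KV heads, head dimension 128, 4,096-token context.
m_weights = 2 * 6.7e9
c_kv = 2 * 32 * 8 * 128 * 4096 * 2  # K and V, all layers and KV heads, 2 bytes each

b_star = kv_crossover_batch(m_weights, c_kv)
print(f"B* ~= {b_star:.1f} sequences")
```

Under this reading, batches smaller than B* hold fewer KV cache bytes than weight bytes, and larger batches hold more.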

Chapter Map by Engineering Decision

Decision / Problem                        Primary chapters
Model memory footprint (inference)        Appendix A, Ch. 3
Model memory footprint (training)         Appendix A, Ch. 9
Forward-pass and training FLOPs           Appendix A, Ch. 13
KV cache sizing and batch limits          Appendix A, Ch. 15, 16
GPU roofline and arithmetic intensity     Ch. 8
FlashAttention and kernel fusion          Ch. 9, 10
Collective communication cost             Ch. 11
Parallelism strategy selection            Ch. 12
Compute-optimal token budget              Ch. 13, 14
Inference latency-throughput curve        Ch. 15
GQA / MQA / MLA trade-offs                Ch. 16
State-space and hybrid architectures      Ch. 17
Speculative decoding setup                Ch. 18
Quantization (INT8, INT4, FP8)            Ch. 19
Pre-training data pipeline design         Ch. 20, 21
Instruction and preference data curation  Ch. 22
Evaluation metric selection               Ch. 23
Benchmark selection and contamination     Ch. 24, 25
SFT data and training recipe              Ch. 26
Reward modeling and RLHF                  Ch. 27
GRPO and verifiable reward RL             Ch. 28
End-to-end serving stack design           Ch. 29
End-to-end pre-training design            Ch. 30
End-to-end fine-tuning pipeline           Ch. 31

The Arithmetic Is the Point

Every number in production LLM engineering is derivable. When a colleague states that “a 70B model
needs at least 4 H100s to serve,” the correct response is to ask whether that accounts for KV cache at the target
batch size and context length, not to accept it as a fact. The derivation: $M_{\text{weights}} = 2 \times 70 \times 10^{9}$ bytes
$\approx 140$ GB in bf16, spread across $\lceil 140/80 \rceil = 2$ H100s for weights alone, but the KV cache at a
realistic batch size and context window can double that figure.
Chapter 15 and Appendix A derive these estimates in full, including the formulas for $B_{\max}$ (VRAM-limited
batch size), $B^{*}$ (KV-crossover batch size), and $t_{\text{step}}$ (decode step latency as a function of batch size and model
size). These are not rules of thumb; they are consequences of memory capacity and bandwidth, and they update
correctly when you change the model, the hardware, or the context length.
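
As a minimal sketch of the 70B derivation above, the snippet below recomputes the weight footprint and then adds a KV cache term. The layer count, KV-head count, head dimension, context length, and batch size are illustrative assumptions (roughly a 70B model with grouped-query attention), not values taken from the text, and activation memory and framework overhead are ignored.

```python
import math

GB = 1e9  # SI gigabyte (10^9 bytes), the convention this book uses


def weight_bytes(params: float, bytes_per_param: int = 2) -> float:
    """Weight memory in bytes; 2 bytes per parameter corresponds to bf16."""
    return params * bytes_per_param


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache for ONE sequence: keys and values (factor 2) in every layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


def gpus_needed(total_bytes: float, gpu_bytes: float = 80 * GB) -> int:
    """Smallest GPU count whose combined VRAM holds total_bytes."""
    return math.ceil(total_bytes / gpu_bytes)


m_weights = weight_bytes(70e9)  # 2 bytes x 70e9 parameters
# Assumed shape: 80 layers, 8 KV heads, head dim 128, 8192-token context.
c_kv = kv_cache_bytes(80, 8, 128, 8192)
batch = 64  # assumed target batch size
total = m_weights + batch * c_kv

print(f"weights only : {m_weights / GB:.1f} GB -> {gpus_needed(m_weights)} GPUs")
print(f"KV cache     : {batch * c_kv / GB:.1f} GB at batch {batch}")
print(f"weights + KV : {total / GB:.1f} GB -> {gpus_needed(total)} GPUs")
```

Under these assumptions the KV cache is larger than the weights themselves, which is why the GPU count for weights alone understates what serving requires. Changing `batch` or the context length moves the answer, so the "4 H100s" claim only makes sense once those two numbers are stated.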


Connecting the Book to the Engineering Literature

The book’s technical content is grounded in papers and engineering reports that have shaped the current
state of production LLM systems. Where a chapter’s content derives from a specific paper, the source is cited
inline. Appendix D provides an annotated reading list organized by topic, with each entry annotated to indicate
which specific claims are worth understanding in depth rather than simply citing by name.
The FlashAttention algorithm (Chapters 9 and 10), the Chinchilla scaling law (Chapter 14), continuous
batching (Chapter 18), and the GRPO training algorithm (Chapter 28) are examples where reading the original
paper alongside the corresponding chapter will deepen your understanding of the design choices behind the
method. The reading list indicates which papers repay that deeper reading and which are sufficient to know at
the result level.

Staying Current
The LLM systems field moves fast. The core arithmetic and hardware physics in this book (memory bandwidth,
arithmetic intensity, roofline analysis, parallelism trade-offs) are stable. Specific techniques (attention
variants, serving schedulers, compression methods) continue to evolve. When a new technique is announced,
the most reliable way to evaluate it is to ask: which constraint does it address, and what does it trade away? That
question is the same one this book trains you to ask, and it does not go stale.

Notation and Symbols

This chapter standardizes the symbols, abbreviations, and conventions used throughout the book. Defini-
tions are precise; where a symbol carries multiple meanings in the literature, the book’s chosen convention is
stated explicitly. Units are always written out on first use in each chapter; the table below lists the canonical
forms.

Model and Architecture

Symbol                Meaning                                     Notes

$P$                   Total parameter count                       e.g. $P = 7 \times 10^{9}$ for a 7B model
$L$                   Number of transformer layers (depth)
$d$                   Hidden / residual stream dimension          also written $d_{\text{model}}$
$d_{\text{model}}$    Hidden dimension (explicit form)            synonym for $d$
$d_{\text{head}}$     Attention head dimension                    typically 128
$d_{\text{ff}}$       Feed-forward (MLP) intermediate dimension   $\lfloor 8d/3 \rfloor$ for SwiGLU; $4d$ for ReLU/GeLU
$N$                   Number of query attention heads
$N_{\text{kv}}$       Number of key-value heads                   $N_{\text{kv}} = N$ (MHA); $N_{\text{kv}} < N$ (GQA/MQA)
$H$                   Head dimension; also $d_{\text{head}}$      context disambiguates
$H_Q$                 Query head count                            used in GQA ratio $H_Q / H_{KV}$
$H_{KV}$              Key-value head count                        $H_{KV} = 1$ (MQA); $H_{KV} = H_Q / G$ (GQA)
$G$                   GQA group size ($N / N_{\text{kv}}$)        reduction factor for KV cache
$V$                   Vocabulary size                             tokens in the tokenizer
$S$                   Sequence length in tokens                   also written $T$ in some chapters
$T$                   Sequence length in tokens                   synonym for $S$; also used for training tokens
$K_e$                 Expert utilization rate in MoE              fraction of experts activated per token
$r$                   LoRA rank                                   dimension of low-rank adapters
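
To make the "reduction factor for KV cache" note on $G$ concrete, here is a small sketch comparing the per-token KV cache size for MHA, GQA, and MQA. The layer count, head counts, and head dimension are hypothetical values chosen only for illustration, and bf16 storage (2 bytes per element) is assumed.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache added per token: K and V for each layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def gqa_group_size(query_heads: int, kv_heads: int) -> int:
    """G = N / N_kv; query heads must divide evenly among the KV heads."""
    if query_heads % kv_heads != 0:
        raise ValueError("query_heads must be a multiple of kv_heads")
    return query_heads // kv_heads


# Hypothetical configuration, chosen only for illustration.
LAYERS, QUERY_HEADS, HEAD_DIM = 32, 32, 128

baseline = kv_bytes_per_token(LAYERS, QUERY_HEADS, HEAD_DIM)  # MHA: N_kv = N
report = {}
for name, kv_heads in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    group = gqa_group_size(QUERY_HEADS, kv_heads)
    per_token = kv_bytes_per_token(LAYERS, kv_heads, HEAD_DIM)
    report[name] = (group, per_token, baseline / per_token)
    print(f"{name}: G={group:2d}  {per_token:7d} bytes/token  "
          f"{baseline / per_token:.0f}x smaller than MHA")
```

The reduction relative to MHA equals $G$ exactly, because the only term that changes in the per-token footprint is the KV head count.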

Training and Optimization

Symbol              Meaning                                                               Notes

$C$                 Total compute budget (FLOPs)
$C_{\text{fwd}}$    FLOPs for one forward pass                                            $\approx 2 N_T P$
$C_{\text{step}}$   FLOPs for one training step                                           $\approx 6 N_T P$ (fwd + bwd)
$N_T$               Total tokens in a batch: $B \times T$
$D$                 Training dataset size (tokens)
$D^{*}$             Chinchilla-optimal token count                                        $D^{*} \approx 20P$ at compute-optimal
$N^{*}$             Chinchilla-optimal parameter count
$\eta$              Learning rate
$\alpha$            Adam $\beta_1$ decay (first moment) or generic scaling coefficient
$\beta$             Adam $\beta_2$ decay (second moment); also KL penalty coefficient     context disambiguates
$\epsilon$          Adam numerical stability term; also PPO clip ratio tolerance
$m_t$               Adam first moment (mean) at step $t$
$v_t$               Adam second moment (variance) at step $t$
$\mu$               Momentum coefficient; also mean of a distribution
$\gamma$            Gradient clipping threshold; also discount factor in RL
$\lambda$           L2 regularization weight; also GAE discount in RL
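
The compute symbols in this table combine into a quick training-budget estimate. The sketch below is one way to wire them together, assuming total training compute is $6 P D$ (the per-step figure $6 N_T P$ summed over all steps). The GPU count, the peak FLOP/s figure, and the MFU value are hypothetical round numbers, not specifications of any particular accelerator.

```python
SECONDS_PER_DAY = 86_400


def forward_flops(params: float, batch_size: int, seq_len: int) -> float:
    """C_fwd ~ 2 * N_T * P, with N_T = batch_size * seq_len tokens."""
    return 2 * batch_size * seq_len * params


def step_flops(params: float, batch_size: int, seq_len: int) -> float:
    """C_step ~ 6 * N_T * P (forward plus backward)."""
    return 6 * batch_size * seq_len * params


def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """D* ~ 20 * P, the compute-optimal token count quoted in the table."""
    return tokens_per_param * params


def training_days(params: float, tokens: float, num_gpus: int,
                  peak_flops_per_gpu: float, mfu: float) -> float:
    """Wall-clock days, assuming total compute of 6 * P * D at a fixed MFU."""
    total_flops = 6 * params * tokens
    achieved_flops_per_s = num_gpus * peak_flops_per_gpu * mfu
    return total_flops / achieved_flops_per_s / SECONDS_PER_DAY


P = 7e9
D_star = chinchilla_tokens(P)
# Hypothetical cluster: 64 GPUs, 1e15 peak FLOP/s each, 40% MFU.
days = training_days(P, D_star, num_gpus=64, peak_flops_per_gpu=1e15, mfu=0.4)

print(f"D* = {D_star:.3g} tokens")
print(f"C  = {6 * P * D_star:.3g} FLOPs")
print(f"estimated wall-clock: {days:.2f} days")
```

Because every input is explicit, swapping in a different model size, cluster, or measured MFU updates the estimate without changing the code.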

Inference and Serving

Symbol                  Meaning                                               Notes

$B$                     Batch size (number of concurrent sequences)
$B_{\max}$              Maximum batch size given VRAM                         $\lfloor (M_{\text{GPU}} - M_{\text{weights}}) / C_{\text{kv}} \rfloor$
$B^{*}$                 KV-crossover batch size                               $M_{\text{weights}} / C_{\text{kv}}$; throughput shifts from superlinear to s
$C_{\text{kv}}$         KV cache memory per sequence                          $2 \times L \times N_{\text{kv}} \times H \times S \times 2$ bytes (bf16)
$C_{\text{seq}}$        KV cache memory per sequence (alternate notation)     synonym for $C_{\text{kv}}$
$M_{\text{weights}}$    Model weight memory                                   $2P$ bytes in bf16
$M_{\text{GPU}}$        Total GPU VRAM                                        e.g. 80 GB for H100 SXM
$t_{\text{step}}$       Decode step latency                                   $(M_{\text{weights}} + B \cdot C_{\text{kv}}) / \text{BW}$
TPS                     Tokens per second (throughput)                        aggregate across all concurrent sequences
TTFT                    Time to first token                                   prefill latency; compute-bound
TBT                     Time between tokens (decode latency)                  per-token; memory-bandwidth-bound
TPOT                    Time per output token                                 synonym for TBT
ITL                     Inter-token latency                                   synonym for TBT / TPOT
$\pi_\theta$            Policy (language model) parameterized by $\theta$     used in RLHF/RL chapters
$\pi_{\text{ref}}$      Reference policy (frozen pre-RLHF model)
$r_\phi$                Reward model parameterized by $\phi$
$\theta$                Model parameters (generic)
$G$                     Number of rollouts per prompt in GRPO                 distinct from GQA group size $G$; context disambiguates
$\hat{A}_i$             Group-normalized advantage for response $i$           GRPO: $(r_i - \mu_R) / \sigma_R$
$y_w, y_l$              Preferred / rejected response in a preference pair    DPO/RLHF notation
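
The serving formulas in this table are simple enough to evaluate directly. The following sketch implements $C_{\text{kv}}$, $B_{\max}$, $B^{*}$, and $t_{\text{step}}$ as written above. The model shape (a 7B-class MHA model at a 4096-token context) and the bandwidth figure are assumptions made for illustration, and the latency model counts only memory traffic, as the table's formula does.

```python
import math


def kv_cache_per_seq(layers: int, kv_heads: int, head_dim: int,
                     seq_len: int, bytes_per_elem: int = 2) -> int:
    """C_kv = 2 * L * N_kv * H * S * bytes_per_elem (2 for bf16)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


def max_batch(m_gpu: float, m_weights: float, c_kv: float) -> int:
    """B_max: sequences that fit in the VRAM left over after the weights."""
    return math.floor((m_gpu - m_weights) / c_kv)


def crossover_batch(m_weights: float, c_kv: float) -> float:
    """B*: batch size at which KV cache traffic equals weight traffic."""
    return m_weights / c_kv


def decode_step_latency(m_weights: float, c_kv: float,
                        batch: int, bandwidth: float) -> float:
    """t_step in seconds: bytes read per decode step divided by bandwidth."""
    return (m_weights + batch * c_kv) / bandwidth


# Assumed setup: one 80 GB GPU, 3.35e12 bytes/s of memory bandwidth,
# a 7B-parameter bf16 model with 32 layers, 32 KV heads, head dim 128.
M_GPU = 80e9
BW = 3.35e12
M_WEIGHTS = 2 * 7e9
C_KV = kv_cache_per_seq(32, 32, 128, 4096)

b_max = max_batch(M_GPU, M_WEIGHTS, C_KV)
b_star = crossover_batch(M_WEIGHTS, C_KV)
print(f"C_kv = {C_KV / 1e9:.2f} GB, B_max = {b_max}, B* = {b_star:.2f}")

for batch in (1, 8, b_max):
    t = decode_step_latency(M_WEIGHTS, C_KV, batch, BW)
    print(f"B={batch:3d}  t_step={t * 1e3:6.2f} ms  TPS={batch / t:8.0f}")
```

Aggregate TPS keeps rising with batch size while per-step latency also rises, which is the latency-throughput tension the table's TTFT, TBT, and TPS rows describe.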

Hardware and Performance


Symbol                Meaning                                                   Notes

BW                    Memory bandwidth (HBM)                                    GB/s or TB/s
MFU                   Model FLOPs Utilization                                   achieved FLOP/s ÷ peak FLOP/s; target ≥ 0.4
MBU                   Memory Bandwidth Utilization                              achieved BW ÷ peak BW
$P_{\text{attn}}$     Parameters in attention layers                            $\approx 4d^{2}$ per layer (MHA)
$P_{\text{mlp}}$      Parameters in MLP layers                                  $3\, d\, d_{\text{ff}}$ per layer (SwiGLU)
$B_r, B_c$            FlashAttention tile (block) sizes for rows and columns    SRAM tile dimensions
$A_k, B_k$            Left and right matrix tiles in tiled GEMM
$t_{\text{wave}}$     Time to execute one SM wave                               GPU scheduling unit
$N_{\text{total}}$    Total expert count in MoE
$N_{\text{active}}$   Active experts per token in MoE                           $N_{\text{active}} \ll N_{\text{total}}$
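
The per-layer parameter formulas and the MFU definition above can be checked numerically. This is a rough sketch under stated assumptions: the embedding term (input and output matrices of size $V \times d$) is an addition not listed in the table, norms and biases are ignored, the MHA figure of $4d^{2}$ is used for attention, and the throughput and peak FLOP/s values in the MFU example are hypothetical.

```python
def swiglu_d_ff(d_model: int) -> int:
    """d_ff = floor(8d / 3), the SwiGLU convention from the notation table."""
    return (8 * d_model) // 3


def layer_params(d_model: int, d_ff: int) -> int:
    """P_attn + P_mlp for one layer: 4 d^2 (MHA) plus 3 d d_ff (SwiGLU)."""
    p_attn = 4 * d_model ** 2
    p_mlp = 3 * d_model * d_ff
    return p_attn + p_mlp


def model_params(layers: int, d_model: int, vocab_size: int,
                 tied_embeddings: bool = False) -> int:
    """Approximate total P; the embedding term is an assumption of this sketch."""
    embed = vocab_size * d_model * (1 if tied_embeddings else 2)
    return layers * layer_params(d_model, swiglu_d_ff(d_model)) + embed


def mfu(params: float, tokens_per_second: float, peak_flops: float) -> float:
    """Training MFU: achieved model FLOP/s (6 P per token) over peak FLOP/s."""
    return 6 * params * tokens_per_second / peak_flops


# Hypothetical 7B-class shape: 32 layers, d = 4096, 32k vocabulary.
P = model_params(layers=32, d_model=4096, vocab_size=32_000)
print(f"estimated parameters: {P / 1e9:.2f}B")

# Hypothetical measurement: 10,000 tokens/s per GPU against 1e15 peak FLOP/s.
print(f"MFU = {mfu(7e9, 1e4, 1e15):.2f}")
```

The estimate lands close to the nominal model size from three architecture numbers, which is usually enough precision for memory and FLOP budgeting.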

Alignment and Post-Training

Symbol                                            Meaning                                       Notes

$D_{\text{KL}}(\cdot \| \cdot)$                   Kullback-Leibler divergence
$D_{\text{KL}}(\pi_\theta \| \pi_{\text{ref}})$   Policy drift from reference                   regularization term in RLHF/DPO
$\sigma$                                          Standard deviation; also sigmoid activation   context disambiguates
$p, q$                                            Generic probability distributions
$x$                                               Input token sequence
$i, j, k$                                         Generic indices
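
The group-normalized advantage listed earlier, $\hat{A}_i = (r_i - \mu_R) / \sigma_R$, is short enough to write out. The sketch below is one possible reading of that formula, not a reference GRPO implementation: it assumes the population standard deviation over the $G$ rollouts of a single prompt, and the small `eps` guard against a zero-variance group is a hypothetical addition that does not appear in the notation table.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards: list[float],
                                eps: float = 1e-8) -> list[float]:
    """Return (r_i - mu_R) / (sigma_R + eps) for each rollout in one group.

    mu_R and sigma_R are computed over the rollouts sampled for a single
    prompt. eps is an assumed guard for groups where every reward is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt with a verifiable 0/1 reward.
advantages = group_normalized_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])

# A group where every rollout gets the same reward carries no signal.
print(group_normalized_advantages([0.5, 0.5, 0.5]))
```

Normalizing within the group means a response is rewarded only for being better than its siblings, so a prompt on which every rollout succeeds or every rollout fails contributes zero advantage.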

Common Abbreviations


Term      Meaning

MHA       Multi-Head Attention
GQA       Grouped-Query Attention
MQA       Multi-Query Attention (GQA with $N_{\text{kv}} = 1$)
MLA       Multi-Head Latent Attention
KV cache  Key-Value cache stored between decode steps
HBM       High-Bandwidth Memory (GPU DRAM)
SRAM      Static RAM; on-chip shared memory / L1 cache
SM        Streaming Multiprocessor (GPU compute unit)
FLOP      Floating-point operation (one multiply-add = 2 FLOPs)
MFU       Model FLOPs Utilization
TP        Tensor Parallelism (intra-layer, intra-node)
PP        Pipeline Parallelism (inter-layer, inter-node)
DP        Data Parallelism
ZeRO      Zero Redundancy Optimizer (stages 1-3)
FSDP      Fully Sharded Data Parallelism (ZeRO stage 3 in PyTorch)
SFT       Supervised Fine-Tuning
RLHF      Reinforcement Learning from Human Feedback
DPO       Direct Preference Optimization
PPO       Proximal Policy Optimization
GRPO      Group Relative Policy Optimization
RLVR      Reinforcement Learning from Verifiable Rewards
LoRA      Low-Rank Adaptation
QLoRA     Quantized LoRA
PEFT      Parameter-Efficient Fine-Tuning
MoE       Mixture of Experts
CoT       Chain of Thought
BF16      Brain Float 16 (1 sign, 8 exponent, 7 mantissa bits)
FP16      IEEE Float 16 (1 sign, 5 exponent, 10 mantissa bits)
INT8      8-bit integer quantization
SLO       Service Level Objective
QPS       Queries per second

Conventions
Tokens vs. sequences. “Tokens” refers to individual vocabulary elements; “sequence” or “context” refers
to an ordered list of tokens. Batch size $B$ counts sequences, not tokens; $N_T = B \times T$ counts tokens.
FLOPs counting. One multiply-accumulate (MAC) operation = 2 FLOPs. The formula $C_{\text{fwd}} \approx 2 N_T P$
follows this convention. “FLOP/s” (floating-point operations per second) uses the same factor.
Memory units. 1 GB = $10^{9}$ bytes throughout (SI prefix, not binary). Bandwidth is reported in GB/s or
TB/s.
Latency units. Milliseconds (ms) for per-token and per-request latencies; seconds (s) for end-to-end
generation time; microseconds (µs) for kernel-level timings.
Throughput units. Tokens per second (tok/s or tokens/s) for generation throughput; queries per second
(QPS) for request-level throughput.

Precision notation. Memory footprints assume bf16 (2 bytes/parameter) unless otherwise stated. The
training-memory figure of 16P bytes assumes AdamW in fp32.
Prefill vs. decode. Prefill encodes the full prompt in one parallel forward pass (compute-bound). Decode
generates tokens one at a time in an autoregressive loop (memory-bandwidth-bound). These two phases
have distinct bottlenecks and optimizations.
Latency vs. throughput. TTFT optimizes differently from aggregate TPS. Minimize TTFT with small
batches and fast prefill hardware; maximize TPS with large batches and high memory bandwidth. A single
serving configuration cannot simultaneously optimize both without workload-specific scheduling.
Overloaded symbols. G denotes both the GQA group size and the GRPO rollout count; T denotes
both sequence length and training token count; β denotes both the Adam second-moment decay and the
KL penalty coefficient. In each case, context and the surrounding equation make the intended meaning
unambiguous.
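
The precision convention above quotes a training-memory figure of 16P bytes. The sketch below shows one common accounting that reaches that total. The per-component split is an assumption of this sketch (standard mixed-precision training with bf16 weights and gradients plus fp32 master weights and AdamW moments); the text itself states only the total. Activations and temporary buffers are excluded.

```python
# Bytes of persistent training state per parameter (assumed breakdown).
BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}


def training_state_bytes(params: float) -> dict[str, float]:
    """Per-component model-state memory in bytes for a model with `params`."""
    return {name: per_param * params for name, per_param in BYTES_PER_PARAM.items()}


P = 7e9  # a 7B-parameter model
state = training_state_bytes(P)
total = sum(state.values())

for name, size in state.items():
    print(f"{name:<24} {size / 1e9:6.1f} GB")
print(f"{'total':<24} {total / 1e9:6.1f} GB  (= 16P bytes)")
```

For a 7B model the state alone exceeds a single 80 GB device, which is why the training-side memory estimate is eight times the bf16 inference figure of 2P bytes.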

Acknowledgments

A book about systems is itself a system, and this one had many contributors.

The technical content in this book was shaped by countless conversations with engineers and researchers
working on real production systems. Several people gave detailed feedback on draft chapters, caught errors in
derivations, and pointed out places where the framing did not match how the work actually gets done in practice.

A number of colleagues reviewed early drafts and offered both technical corrections and perspective on
what candidates actually encounter in interviews at frontier labs. Their input made the interview framing in
Part I and the end-to-end drills in Part XI significantly sharper.

The open-source research community deserves specific acknowledgment. The derivations in this book
stand on the published work of the teams behind FlashAttention, the Chinchilla scaling law study, the DeepSeek
architecture and training reports, the LLaMA model series, the Mixtral MoE architecture, the RLHF and DPO
preference alignment papers, and the GRPO and RLVR training algorithms. Each of these is cited in the relevant
chapter and listed in Appendix D. The existence of detailed technical reports from frontier labs, a relatively
recent norm, made it possible to ground the book’s engineering content in actual production practice rather than
academic approximations.

Finally, thank you to everyone who read early versions and offered encouragement at the moments when
the project was hardest to continue. You know who you are.
About the Author

I am Hao Hoang, an applied AI/ML engineer and technical writer. I am based in Los Angeles, California,
and I am originally from Quang Tri, Viet Nam.
I own AI Interview Prep and write for the community every day on Substack and on LinkedIn. On Substack
I publish Daily AI Interview Questions and longer notes on LLM system design, reinforcement learning, vision-
language models, RAG-style retrieval, and other high-signal topics for engineers and researchers preparing for
rigorous AI interviews. On LinkedIn I share the same thread of ideas in shorter, daily posts so people can follow
along between editions. That public writing reaches more than 55,000 LinkedIn followers and 12,000 Substack
readers, and it is the main place I teach, learn in public, and answer what the community is asking about.
About this book. I am the sole author: I researched and wrote every chapter myself, without co-authors
or ghostwriters. I wrote it because I wanted a single, opinionated reference for the patterns I kept having to
re-derive when preparing for interviews and when shipping LLM systems in production, at the intersection of
research, systems engineering, and deployment.

You can reach me at:

Email: [Link]@[Link]
Website: [Link]
Substack: [Link]
LinkedIn: [Link]
I publish daily posts on LinkedIn and the newsletter at [Link]. Errata
or corrections are welcome at the email above.
