– deploying to virtual machines, Deploying to Virtual
Machines-Deploying to Virtual Machines
– deploying with containers, Deploying with Containers-
Deploying with Containers
development environment setup, FastAPI, Setting Up Your
Development Environment-Creating a Simple FastAPI Web
Server
– creating a simple FastAPI web server, Creating a Simple
FastAPI Web Server-Creating a Simple FastAPI Web Server
– installing Python/FastAPI/required packages, Installing
Python, FastAPI, and Required Packages
device authorization flow, Device authorization flow
diffusion models, What Is Generative AI?
directional expectation tests (DETs), Directional expectation
tests (DETs)
Django, Why Build Generative AI Services with FastAPI?,
Comparing FastAPI to Other Python Web Frameworks-
Comparing FastAPI to Other Python Web Frameworks
Docker Compose tool
– basics, Docker Compose-Docker Compose
– enabling GPU access in, Enabling GPU Access in Docker
Compose
Docker Hub, Container Registries-Container Registries
Docker, containerization with, Containerization with Docker-
docker init
– bridge network driver
– configuring user-defined bridge networks, Configure
user-defined bridge networks-Configure user-defined
bridge networks
– embedded DNS, Embedded DNS
– publishing ports, Publishing ports
– building Docker images, Building Docker Images-Building
Docker Images
– container filesystem and Docker layers, Container
Filesystem and Docker Layers-Container Filesystem and
Docker Layers
– container registries, Container Registries-Container
Registries
– containerization versus virtualization, Deploying with
Containers
– Docker architecture, Docker Architecture
– Docker networking, Docker Networking-None network
driver
– bridge network driver, Bridge network driver-
Publishing ports
– host network driver, Host network driver
– none network driver, None network driver
– Docker storage mechanisms, Docker Storage-Handling
filesystem permissions
– bind mounts, Bind mounts
– Docker volumes, Docker volumes
– handling filesystem permissions, Handling filesystem
permissions-Handling filesystem permissions
– interpreting Linux filesystem permissions, Handling
filesystem permissions
– temporary mounts, Temporary mounts (tmpfs)
– optimizing Docker images, Optimizing Docker Images-
Multi-stage builds
– avoiding GPU inference runtimes, Avoid GPU inference
runtimes-Avoid GPU inference runtimes
– externalizing application data, Externalize application
data
– layer ordering and caching, Layer ordering and caching-
Use external cache
– multi-stage builds, Multi-stage builds-Multi-stage builds
– using minimal base image, Use minimal base image
– Unionfs, Container Filesystem and Docker Layers
Dockerfile, Building Docker Images-Building Docker Images
documentation, automatic generation of, Creating a Simple
FastAPI Web Server, Automatic Documentation
dummies, Dummies
dynamic (continuous) batching, Request batching and
continuous batching
dynamic few-shot prompting, In-context learning
dynamically typed languages, Introduction to Type Safety
E
EBMs (energy-based models), What Is Generative AI?
embeddings, Tokenization and embedding, Project: Talk to
Documents (RAG)
end-to-end testing, End-to-End Testing-Horizontal E2E tests
– drawbacks, Types of Tests
– horizontal E2E tests, Horizontal E2E tests-Horizontal E2E
tests
– overview, Types of Tests
– vertical E2E tests, Vertical E2E tests-Vertical E2E tests
energy-based models (EBMs), What Is Generative AI?
ensembling, prompting, Ensembling
environment variables, parsing with Pydantic Settings,
Parsing Environment Variables with Pydantic-Parsing
Environment Variables with Pydantic
ephemeral storage, Docker Storage
event loop, Synchronous Versus Asynchronous (Async)
Execution, Event Loop and Thread Pool in FastAPI-Event
Loop and Thread Pool in FastAPI
EventSource interface, Server-Sent Events
eviction policies, Eviction policies
exception handling, for WS, Handling WebSocket Exceptions
exponential backoff, Async Programming with Model
Provider APIs, SSE with POST Request
external authorization service, Hybrid Authorization Models-
Hybrid Authorization Models
external model serving, Externalizing Model Serving-Paged
attention
F
fakes, Fakes
FastAPI (basics), Getting Started with FastAPI-Summary
– automatic documentation, Creating a Simple FastAPI Web
Server, Automatic Documentation
– background tasks, Built-In Support for Background Tasks
– bidirectional WebSocket support, Bidirectional
WebSocket, GraphQL, and Custom Response Support-
Bidirectional WebSocket, GraphQL, and Custom Response
Support
– comparison to other Python web frameworks, Comparing
FastAPI to Other Python Web Frameworks-Comparing
FastAPI to Other Python Web Frameworks
– custom middleware and CORS support, Custom
Middleware and CORS Support
– data validation/serialization, Data Validation and
Serialization-Data Validation and Serialization
– dependency injection system, Dependency Injection
System-Dependency Injection System
– development environment setup, Setting Up Your
Development Environment-Creating a Simple FastAPI Web
Server
– creating a simple FastAPI web server, Creating a Simple
FastAPI Web Server-Creating a Simple FastAPI Web
Server
– installing Python/FastAPI/required packages, Installing
Python, FastAPI, and Required Packages
– event loop and thread pool, Event Loop and Thread Pool in
FastAPI-Event Loop and Thread Pool in FastAPI
– features and advantages, FastAPI Features and
Advantages-Modern Python and IDE Integration with
Sensible Defaults
– handling asynchronous/synchronous operations, Handling
Asynchronous and Synchronous Operations
– lifespan events, Lifespan Events
– limitations, FastAPI Limitations-Lack of Support for
Resource-Intensive AI Workloads
– dependency conflicts, Dependency Conflicts
– Global Interpreter Lock, Restricted to Global Interpreter
Lock
– inefficient model memory management, Inefficient
Model Memory Management
– inefficient splitting of AI workloads between CPU and
GPU, Cannot Efficiently Split AI Workloads Between CPU
and GPU
– lack of support for micro-batch processing inference
requests, Lack of Support for Micro-Batch Processing
Inference Requests
– lack of support for resource-intensive AI workloads,
Lack of Support for Resource-Intensive AI Workloads
– limited number of threads, Limited Number of Threads
– modern Python/IDE integration with sensible defaults,
Modern Python and IDE Integration with Sensible Defaults
– onion (layered) application design pattern, Onion/Layered
Application Design Pattern-Onion/Layered Application
Design Pattern
– plug-ins for, Rich Ecosystem of Plug-Ins
– project structures, FastAPI Project Structures-Progressive
Reorganization of Your FastAPI Project
– flat structure, Flat Structure
– modular structure, Modular Structure-Modular
Structure
– nested structure, Nested Structure
– progressive reorganization of project, Progressive
Reorganization of Your FastAPI Project-Progressive
Reorganization of Your FastAPI Project
– reasons to build GenAI services with, Why Build
Generative AI Services with FastAPI?-Why Build
Generative AI Services with FastAPI?
– security/authentication components, Security and
Authentication Components
– service layer customization, Freedom to Customize Any
Service Layer
– setting up a managed Python environment/tooling, Setting
Up a Managed Python Environment and Tooling-Setting Up
a Managed Python Environment and Tooling
few-shot prompting, In-context learning
fine-tuning (optimization technique), Fine-Tuning-How to
fine-tune a pretrained model
– for pretrained model, How to fine-tune a pretrained
model-How to fine-tune a pretrained model
– when to consider, When should you consider fine-tuning?-
When should you consider fine-tuning?
firewall rules, Docker Networking
fixture functions, Fixtures and scope
fixtures (input data), Fixtures and scope
Flask, Why Build Generative AI Services with FastAPI?,
Comparing FastAPI to Other Python Web Frameworks-
Comparing FastAPI to Other Python Web Frameworks
flat project structure, Flat Structure
formatters (Python packages), Setting Up a Managed Python
Environment and Tooling
G
G-Eval framework, Implementing a Moderation Guardrail-
Implementing a Moderation Guardrail
generative adversarial networks (GANs), What Is Generative
AI?
generative AI (GenAI) basics, Introduction-Summary
– building a GenAI service, How to Build a Generative AI
Service-How to Build a Generative AI Service
– capstone project overview, Overview of the Capstone
Project
– challenges when adopting, What Prevents the Adoption of
Generative AI Services
– defined, What Is Generative AI?-What Is Generative AI?
– reasons to use FastAPI to build GenAI services, Why Build
Generative AI Services with FastAPI?-Why Build
Generative AI Services with FastAPI?
– serving generative models (see serving GenAI models)
– why GenAI services will power future applications, Why
Generative AI Services Will Power Future Applications-
Scaling and Democratizing Content Generation
– acting as interface to complex systems, Acting as an
Interface to Complex Systems
– automating manual administrative tasks, Automating
Manual Administrative Tasks
– facilitation of creative process, Facilitating the Creative
Process-Facilitating the Creative Process
– minimizing delay in resolving customer queries,
Minimizing Delay in Resolving Customer Queries-
Minimizing Delay in Resolving Customer Queries
– personalization of user experience, Personalizing the
User Experience
– role of context-rich prompts in generative models,
Suggesting Contextually Relevant Solutions
– scaling/democratizing content generation, Scaling and
Democratizing Content Generation
– suggesting contextually relevant solutions, Suggesting
Contextually Relevant Solutions-Suggesting Contextually
Relevant Solutions
given-when-then (GWT) model, Test Phases
Global Interpreter Lock (GIL), Restricted to Global Interpreter
Lock, Optimizing GenAI Services for Multiple Users