0% found this document useful (0 votes)

2 views10 pages

Building Generative AI

The document discusses the complexities and pitfalls of asynchronous programming, particularly in the context of FastAPI, which supports both synchronous and asynchronous operations. It emphasizes the importance of using async-compatible tools and managing resources properly to avoid performance issues and memory leaks. Additionally, it outlines the benefits of using an event loop for concurrency and provides guidance on implementing features like web scraping and document processing in a FastAPI service.

Uploaded by

xiaowang198808

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views10 pages

Building Generative AI

Uploaded by

xiaowang198808

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Understanding and debugging errors can be more complex

due to the nonlinear execution flow of concurrent tasks.

Some libraries, like aiohttp , require nested async context
managers for proper implementation. This can get confusing
pretty fast.
Mixing asynchronous and synchronous code can negate any
performance benefits, such as if you forget to mark functions
with the async and await keywords.
Not using async-compatible tools and libraries can also
cancel out any performance benefits; for example, using the
requests package instead of aiohttp for making async
API calls.
Forgetting to await coroutines within any async function or
awaiting non-coroutines can lead to unexpected behavior. All
async keywords must be followed by an await .
Improperly managing resources (e.g., open API/database
connections or file buffers) can cause memory leaks that
freeze your computer. You can also leak memory if you don’t
limit the number of concurrent operations in async code.
You might also run into concurrency and race condition
issues where the thread-safety principle is violated, causing
deadlocks on resources leading to data corruption.

This list is not exhaustive, and as you can see, there are several
pitfalls to using asynchronous programming. Therefore, I
recommend starting with writing synchronous programs first,
to understand the basic flow and logic of your code, before
dealing with the complexities of migrating to an async
implementation.

Event Loop and Thread Pool in FastAPI

Under the hood, FastAPI can handle both async and sync
blocking operations. It does this by running sync handlers in its
thread pool so that blocking operations don’t stop the event loop
from executing tasks.

As I mentioned in Chapter 2, FastAPI runs on the ASGI web

framework via Starlette. If it didn’t, the server would effectively
run synchronously, so you would have to wait for each process
to finish before it could serve the next. However, using ASGI, the
FastAPI server supports concurrency via both multithreading
(via a thread pool) and asynchronous programming (via an
event loop) to serve multiple requests in parallel, while keeping
the main server process from being blocked.

FastAPI sets up the thread pool by instantiating a collection of

threads at application startup to reduce the runtime burden of
4
thread creation. It then delegates background tasks and
synchronous workloads to the thread pool to prevent the event
loop from being blocked by any blocking operations inside the
synchronous handlers. The event loop is also referred to as the
main FastAPI server thread that is responsible for orchestrating
the processing of requests.

As I mentioned, the event loop is the core component of every

application built on top of asyncio , including FastAPI that
implements concurrency. Event loops run asynchronous tasks
and callbacks, including performing network I/O operations,
and running subprocesses. In FastAPI, the event loop is also
responsible for orchestrating the asynchronous processing of
requests.

If possible, you should run handlers on the event loop (via

asynchronous programming) as it can be even more efficient
than running them on the thread pool (via multithreading).
This is because each thread in the thread pool has to acquire
the GIL before it can execute any code bytes, and that requires
some computational effort.

Imagine if multiple concurrent users were using both the

synchronous and asynchronous OpenAI GPT-3.5 handlers
(endpoints) of your FastAPI service, as shown in Example 5-4.
FastAPI will run the async handler requests on the event loop
since that handler uses a nonblocking async OpenAI client. On
the other hand, FastAPI has to delegate the synchronous
handler requests to the thread pool to protect the event loop
from blocking. Since delegating requests (to threads) and
switching between threads in a thread pool is more work, the
synchronous requests will finish later than their async
counterparts.

NOTE

Remember that all of this work—processing both synchronous and async handler
requests—is running on a single CPU core within the same FastAPI Python process.

This is so that the CPU idle time is minimized while waiting for responses from
OpenAI API.

The differences in performance are shown in Figure 5-5.

Figure 5-5. How multithreading and Async IO handle I/O blocking operations

Figure 5-5 shows that with I/O-bound workloads, async

implementations are faster and should be your preferred
method if you need concurrency. However, FastAPI does still do
a solid job of serving multiple concurrent requests even if it has
to work with a synchronous OpenAI client. It simply sends the
synchronous API calls within threads of the thread pool to
implement some form of concurrency for you. That’s why the
FastAPI official documentation tells you to not worry too much
about declaring your handler functions as async def or def .

However, keep in mind that when you declare handlers with

async def , FastAPI trusts you with performing only
nonblocking operations. When you break that trust and execute
blocking operations inside async routes, the event loop will be
blocked and can no longer continue with executing tasks until
the blocking operation is finished.

Blocking the Main Server

If you’re using the async keyword when defining your

functions, make sure you’re also using the await keyword
somewhere inside your function and that none of the package
dependencies you use inside the function are synchronous.

Avoid declaring route handler functions as async if their

implementation is synchronous. Otherwise, requests to the
affected route handlers will block the main server from
processing other requests while the server is waiting for the
blocking operation to complete. It won’t matter if the blocking
operation is I/O-bound or compute-bound. Therefore, any calls
to databases or AI models can still cause the blockage if you’re
not careful.

This is an easy mistake to make. For instance, you may use a

synchronous dependency inside handlers you’ve declared as
async, as shown in Example 5-5.

Example 5-5. Incorrect implementation of asynchronous

handlers in FastAPI

import os
from fastapi import FastAPI
from openai import AsyncOpenAI, OpenAI

app = FastAPI()

@[Link]("/block")
async def block_server_controller():
completion = sync_client.[Link]
return [Link][0].[Link]

@[Link]("/slow")
def slow_text_generator():
completion = sync_client.[Link]
return [Link][0].[Link]
@[Link]("/fast")
async def fast_text_generator():
completion = await async_client.[Link]
return [Link][0].[Link]

I/O blocking operation to get ChatGPT API response.

Because the route handler is marked async, FastAPI trusts
us to not run blocking operations, but as we are, the
request will block the event loop (main server thread).
Other requests are now blocked until the current request
is processed.

A simple synchronous route handler with blocking

operation that doesn’t leverage asynchronous features.
Sync requests are handed off to the thread pool to run in
the background so that the main server is not blocked.

An asynchronous route that is nonblocking.

The request won’t block the main thread and doesn’t need to be
handed off to the thread pool. As a result, the FastAPI event loop
can process the request much faster using the async OpenAI
client.
You now should feel more comfortable implementing new
features in your FastAPI service that require performing I/O-
bound tasks.

To help solidify your understanding of the I/O concurrency

concepts, in the next few sections you will build several new
features using concurrency into your FastAPI service. These
features include:

Talk to the web

Build and integrate a web scraper module that allows you

to ask questions to your self-hosted LLM about the content
of a website by providing an HTTP URL.

Talk to documents

Build and integrate a RAG module to process documents

into a vector database. A vector database stores data in a
way that supports efficient similarity searches. You can
then use semantic search, which understands the
meaning of queries, to interact with uploaded documents
using your LLM.

Both projects will give you a hands-on experience interacting

asynchronously with external systems such as websites, a
database, and a filesystem.
Project: Talk to the Web (Web Scraper)

Companies often host a series of internal web pages for

manuals, processes, and other documentation as HTML pages.
For longer pages, your users may want to provide URLs when
asking questions and expect your LLM to fetch and read the
content. This is where having a built-in web scraper can come
in handy.

There are many ways to build a web scraper for your self-
hosted LLM. Depending on your use case, you can use a
combination of the following methods:

Fetch web pages as HTML and feed the raw HTML (or inner
text content) to your LLM to parse the content into your
desired format.
Use web scraping frameworks such as BeautifulSoup and
ScraPy to parse the content of web pages after fetching.
Use headless web browsers such as Selenium and Microsoft
Playwright to dynamically navigate nodes in pages and parse
content. Headless browsers are great for navigation single-
page applications (SPAs).

Building Generative AI
No ratings yet
Building Generative AI
10 pages
FastAPI Async Techniques Explained
No ratings yet
FastAPI Async Techniques Explained
10 pages
Building Generative AI
No ratings yet
Building Generative AI
10 pages
Building Generative AI
No ratings yet
Building Generative AI
10 pages
Fastapi Interview
No ratings yet
Fastapi Interview
11 pages
FastAPI Architecture Handbook Overview
No ratings yet
FastAPI Architecture Handbook Overview
10 pages
Building Generative AI
No ratings yet
Building Generative AI
10 pages
Aiohttp Performance: 1 Million Requests
No ratings yet
Aiohttp Performance: 1 Million Requests
9 pages
1743686744module 7 Advanced Topics in API Development
No ratings yet
1743686744module 7 Advanced Topics in API Development
19 pages
Xplain How You Would Design A Backend Service in P
No ratings yet
Xplain How You Would Design A Backend Service in P
23 pages
Building Generative AI
No ratings yet
Building Generative AI
10 pages
Production Grade API Architectures FastAPI Pydantic 2026
No ratings yet
Production Grade API Architectures FastAPI Pydantic 2026
16 pages
Synchronous vs Async Python Performance
No ratings yet
Synchronous vs Async Python Performance
3 pages
Fastapi Documentation
No ratings yet
Fastapi Documentation
15 pages
Python Async
No ratings yet
Python Async
18 pages
Python API Development Guide
No ratings yet
Python API Development Guide
34 pages
Mastering Python Asyncio Essentials
100% (1)
Mastering Python Asyncio Essentials
141 pages
FastAPI API Development Guide
No ratings yet
FastAPI API Development Guide
10 pages
High-Performance Web Apps With FastAPI: The Asynchronous Web Framework Based On Modern Python 1st Edition Malhar Lathkar Ebook All Formats Available
100% (3)
High-Performance Web Apps With FastAPI: The Asynchronous Web Framework Based On Modern Python 1st Edition Malhar Lathkar Ebook All Formats Available
39 pages
Python Asynchronous Programming Guide
No ratings yet
Python Asynchronous Programming Guide
15 pages
Task-Based Asynchronous Programming in .NET
No ratings yet
Task-Based Asynchronous Programming in .NET
15 pages
Building Generative AI
No ratings yet
Building Generative AI
10 pages
Building Generative AI
No ratings yet
Building Generative AI
10 pages
High-Performance Web Apps With FastAPI: The Asynchronous Web Framework Based On Modern Python 1st Edition Malhar Lathkar Ebook Instantly Openable
100% (2)
High-Performance Web Apps With FastAPI: The Asynchronous Web Framework Based On Modern Python 1st Edition Malhar Lathkar Ebook Instantly Openable
47 pages
FastAPI: Features, Setup, and Comparison
No ratings yet
FastAPI: Features, Setup, and Comparison
9 pages
Building Generative AI
No ratings yet
Building Generative AI
10 pages
FastAPI Model Deployment Guide
No ratings yet
FastAPI Model Deployment Guide
4 pages
aiohttp Web Server Guide
No ratings yet
aiohttp Web Server Guide
17 pages
Flutter StreamBuilder Insights
No ratings yet
Flutter StreamBuilder Insights
458 pages
Building Generative AI Services With FastAPI51
No ratings yet
Building Generative AI Services With FastAPI51
10 pages
Python Async Features Quiz Guide
No ratings yet
Python Async Features Quiz Guide
26 pages
Async Programming Notes
No ratings yet
Async Programming Notes
3 pages
Mastering Async Programming in Python
No ratings yet
Mastering Async Programming in Python
86 pages
Official Python Asyncio Documentation
No ratings yet
Official Python Asyncio Documentation
39 pages
Fastapi Module3
No ratings yet
Fastapi Module3
22 pages
Python Concurrency: Sync, Async, Threads
No ratings yet
Python Concurrency: Sync, Async, Threads
7 pages
Asynchronous Programming in Django
No ratings yet
Asynchronous Programming in Django
34 pages
Mastering LangChain A Applications
No ratings yet
Mastering LangChain A Applications
5 pages
Learning Flask Framework
No ratings yet
Learning Flask Framework
3 pages
Learning Flask Framework
No ratings yet
Learning Flask Framework
3 pages
Learning Flask Framework
No ratings yet
Learning Flask Framework
3 pages
Learning Flask Framework
No ratings yet
Learning Flask Framework
3 pages
Learning Flask Framework
No ratings yet
Learning Flask Framework
3 pages
Building Generative AI
No ratings yet
Building Generative AI
10 pages
Mastering 2025
No ratings yet
Mastering 2025
5 pages
Building Generative AI Services With FastAPI51
No ratings yet
Building Generative AI Services With FastAPI51
10 pages
Mastering 2025
No ratings yet
Mastering 2025
5 pages
Mastering 2025
No ratings yet
Mastering 2025
5 pages
Mastering 2025
No ratings yet
Mastering 2025
5 pages
Mastering 2025
No ratings yet
Mastering 2025
5 pages
Mastering 2025
No ratings yet
Mastering 2025
5 pages
Mastering 2025
No ratings yet
Mastering 2025
5 pages
Serving Static Files: Deploying Your Application
No ratings yet
Serving Static Files: Deploying Your Application
10 pages
Test - Py, Exampletest: Testing Flask Apps
No ratings yet
Test - Py, Exampletest: Testing Flask Apps
10 pages
Login and Logout Views: Begin Building Our Actual Login View, Let's Start With The
No ratings yet
Login and Logout Views: Begin Building Our Actual Login View, Let's Start With The
10 pages
Sessions: Authenticating Users
No ratings yet
Sessions: Authenticating Users
10 pages
Mastering 2025
No ratings yet
Mastering 2025
5 pages
Preprocessors and Postprocessors: Ajax and Restful Apis
No ratings yet
Preprocessors and Postprocessors: Ajax and Restful Apis
10 pages
Creating A URL Scheme: Templates and Views
No ratings yet
Creating A URL Scheme: Templates and Views
10 pages
Cleaning Up: - Deleted Status - Deleted
No ratings yet
Cleaning Up: - Deleted Status - Deleted
10 pages
Reading Values From The Request: Specifying A Name? As You Can See, The Flask Development Server Will Return A
No ratings yet
Reading Values From The Request: Specifying A Name? As You Can See, The Flask Development Server Will Return A
10 pages
Creating The Entry Table 29 Working With The Entry Model 30
No ratings yet
Creating The Entry Table 29 Working With The Entry Model 30
10 pages
Creating Your First Flask Application: WWW - It-Ebooks - Info
No ratings yet
Creating Your First Flask Application: WWW - It-Ebooks - Info
10 pages
Com/submit Errata
No ratings yet
Com/submit Errata
10 pages
Find Second Largest Number in Array
No ratings yet
Find Second Largest Number in Array
20 pages
Managing Schema Evolution in ADF
No ratings yet
Managing Schema Evolution in ADF
99 pages
JSP Basics and Examples
No ratings yet
JSP Basics and Examples
19 pages
Decompiler Microcode Architecture Overview
No ratings yet
Decompiler Microcode Architecture Overview
58 pages
CS506 Java Servlet and JSP Overview
100% (3)
CS506 Java Servlet and JSP Overview
37 pages
libpenguin.so Not Found Error
No ratings yet
libpenguin.so Not Found Error
39 pages
Understanding Redux and Redux Toolkit
No ratings yet
Understanding Redux and Redux Toolkit
3 pages
Game Save Error: Serialization Issue
No ratings yet
Game Save Error: Serialization Issue
2 pages
Understanding Malbolge Programming
100% (1)
Understanding Malbolge Programming
2 pages
Axiom: FFmpeg GUI for Windows
No ratings yet
Axiom: FFmpeg GUI for Windows
7 pages
Wireframing & Prototyping Tools Guide
No ratings yet
Wireframing & Prototyping Tools Guide
8 pages
Week 4
No ratings yet
Week 4
17 pages
Start Here
No ratings yet
Start Here
16 pages
Python Fundamentals for Data Science
No ratings yet
Python Fundamentals for Data Science
62 pages
AR App Development with Dr. Mehendale
No ratings yet
AR App Development with Dr. Mehendale
9 pages
OOAD Laboratory Manual for CS Students
No ratings yet
OOAD Laboratory Manual for CS Students
52 pages
CFW5XX FwUpdate G1
No ratings yet
CFW5XX FwUpdate G1
3 pages
SQL Basics Quiz for Online Shopping
100% (1)
SQL Basics Quiz for Online Shopping
4 pages
Overview of Operating System Types
No ratings yet
Overview of Operating System Types
12 pages
R: Software Development Life Cycle A Description of R's Development, Testing, Release and Maintenance Processes
No ratings yet
R: Software Development Life Cycle A Description of R's Development, Testing, Release and Maintenance Processes
15 pages
Micro Project on ATM Management System
No ratings yet
Micro Project on ATM Management System
16 pages
C++ Developer & QA Jobs at Raghunandan Capital
No ratings yet
C++ Developer & QA Jobs at Raghunandan Capital
2 pages
Static Website Design Assignment
No ratings yet
Static Website Design Assignment
7 pages
Sanic Framework Documentation 19.12.2
No ratings yet
Sanic Framework Documentation 19.12.2
147 pages
Python Programming Exercises
No ratings yet
Python Programming Exercises
8 pages
STM8 Development Tools Overview
No ratings yet
STM8 Development Tools Overview
17 pages
Python Full Stack Development Guide
No ratings yet
Python Full Stack Development Guide
14 pages
PHP Homework Assignments Overview
No ratings yet
PHP Homework Assignments Overview
25 pages
Correlated Subqueries in SQL Explained
No ratings yet
Correlated Subqueries in SQL Explained
18 pages
Android JIT Compiler Optimizations
No ratings yet
Android JIT Compiler Optimizations
35 pages