0% found this document useful (0 votes)
2 views10 pages

Building Generative AI

The document discusses the complexities and pitfalls of asynchronous programming, particularly in the context of FastAPI, which supports both synchronous and asynchronous operations. It emphasizes the importance of using async-compatible tools and managing resources properly to avoid performance issues and memory leaks. Additionally, it outlines the benefits of using an event loop for concurrency and provides guidance on implementing features like web scraping and document processing in a FastAPI service.

Uploaded by

xiaowang198808
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views10 pages

Building Generative AI

The document discusses the complexities and pitfalls of asynchronous programming, particularly in the context of FastAPI, which supports both synchronous and asynchronous operations. It emphasizes the importance of using async-compatible tools and managing resources properly to avoid performance issues and memory leaks. Additionally, it outlines the benefits of using an event loop for concurrency and provides guidance on implementing features like web scraping and document processing in a FastAPI service.

Uploaded by

xiaowang198808
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Understanding and debugging errors can be more complex

due to the nonlinear execution flow of concurrent tasks.


Some libraries, like aiohttp , require nested async context
managers for proper implementation. This can get confusing
pretty fast.
Mixing asynchronous and synchronous code can negate any
performance benefits, such as if you forget to mark functions
with the async and await keywords.
Not using async-compatible tools and libraries can also
cancel out any performance benefits; for example, using the
requests package instead of aiohttp for making async
API calls.
Forgetting to await coroutines within any async function or
awaiting non-coroutines can lead to unexpected behavior. All
async keywords must be followed by an await .
Improperly managing resources (e.g., open API/database
connections or file buffers) can cause memory leaks that
freeze your computer. You can also leak memory if you don’t
limit the number of concurrent operations in async code.
You might also run into concurrency and race condition
issues where the thread-safety principle is violated, causing
deadlocks on resources leading to data corruption.

This list is not exhaustive, and as you can see, there are several
pitfalls to using asynchronous programming. Therefore, I
recommend starting with writing synchronous programs first,
to understand the basic flow and logic of your code, before
dealing with the complexities of migrating to an async
implementation.

Event Loop and Thread Pool in FastAPI

Under the hood, FastAPI can handle both async and sync
blocking operations. It does this by running sync handlers in its
thread pool so that blocking operations don’t stop the event loop
from executing tasks.

As I mentioned in Chapter 2, FastAPI runs on the ASGI web


framework via Starlette. If it didn’t, the server would effectively
run synchronously, so you would have to wait for each process
to finish before it could serve the next. However, using ASGI, the
FastAPI server supports concurrency via both multithreading
(via a thread pool) and asynchronous programming (via an
event loop) to serve multiple requests in parallel, while keeping
the main server process from being blocked.

FastAPI sets up the thread pool by instantiating a collection of


threads at application startup to reduce the runtime burden of
4
thread creation. It then delegates background tasks and
synchronous workloads to the thread pool to prevent the event
loop from being blocked by any blocking operations inside the
synchronous handlers. The event loop is also referred to as the
main FastAPI server thread that is responsible for orchestrating
the processing of requests.

As I mentioned, the event loop is the core component of every


application built on top of asyncio , including FastAPI that
implements concurrency. Event loops run asynchronous tasks
and callbacks, including performing network I/O operations,
and running subprocesses. In FastAPI, the event loop is also
responsible for orchestrating the asynchronous processing of
requests.

If possible, you should run handlers on the event loop (via


asynchronous programming) as it can be even more efficient
than running them on the thread pool (via multithreading).
This is because each thread in the thread pool has to acquire
the GIL before it can execute any code bytes, and that requires
some computational effort.

Imagine if multiple concurrent users were using both the


synchronous and asynchronous OpenAI GPT-3.5 handlers
(endpoints) of your FastAPI service, as shown in Example 5-4.
FastAPI will run the async handler requests on the event loop
since that handler uses a nonblocking async OpenAI client. On
the other hand, FastAPI has to delegate the synchronous
handler requests to the thread pool to protect the event loop
from blocking. Since delegating requests (to threads) and
switching between threads in a thread pool is more work, the
synchronous requests will finish later than their async
counterparts.

NOTE

Remember that all of this work—processing both synchronous and async handler
requests—is running on a single CPU core within the same FastAPI Python process.

This is so that the CPU idle time is minimized while waiting for responses from
OpenAI API.

The differences in performance are shown in Figure 5-5.


Figure 5-5. How multithreading and Async IO handle I/O blocking operations

Figure 5-5 shows that with I/O-bound workloads, async


implementations are faster and should be your preferred
method if you need concurrency. However, FastAPI does still do
a solid job of serving multiple concurrent requests even if it has
to work with a synchronous OpenAI client. It simply sends the
synchronous API calls within threads of the thread pool to
implement some form of concurrency for you. That’s why the
FastAPI official documentation tells you to not worry too much
about declaring your handler functions as async def or def .

However, keep in mind that when you declare handlers with


async def , FastAPI trusts you with performing only
nonblocking operations. When you break that trust and execute
blocking operations inside async routes, the event loop will be
blocked and can no longer continue with executing tasks until
the blocking operation is finished.

Blocking the Main Server

If you’re using the async keyword when defining your


functions, make sure you’re also using the await keyword
somewhere inside your function and that none of the package
dependencies you use inside the function are synchronous.

Avoid declaring route handler functions as async if their


implementation is synchronous. Otherwise, requests to the
affected route handlers will block the main server from
processing other requests while the server is waiting for the
blocking operation to complete. It won’t matter if the blocking
operation is I/O-bound or compute-bound. Therefore, any calls
to databases or AI models can still cause the blockage if you’re
not careful.

This is an easy mistake to make. For instance, you may use a


synchronous dependency inside handlers you’ve declared as
async, as shown in Example 5-5.

Example 5-5. Incorrect implementation of asynchronous


handlers in FastAPI

import os
from fastapi import FastAPI
from openai import AsyncOpenAI, OpenAI

app = FastAPI()

@[Link]("/block")
async def block_server_controller():
completion = sync_client.[Link]
return [Link][0].[Link]

@[Link]("/slow")
def slow_text_generator():
completion = sync_client.[Link]
return [Link][0].[Link]
@[Link]("/fast")
async def fast_text_generator():
completion = await async_client.[Link]
return [Link][0].[Link]

I/O blocking operation to get ChatGPT API response.


Because the route handler is marked async, FastAPI trusts
us to not run blocking operations, but as we are, the
request will block the event loop (main server thread).
Other requests are now blocked until the current request
is processed.

A simple synchronous route handler with blocking


operation that doesn’t leverage asynchronous features.
Sync requests are handed off to the thread pool to run in
the background so that the main server is not blocked.

An asynchronous route that is nonblocking.

The request won’t block the main thread and doesn’t need to be
handed off to the thread pool. As a result, the FastAPI event loop
can process the request much faster using the async OpenAI
client.
You now should feel more comfortable implementing new
features in your FastAPI service that require performing I/O-
bound tasks.

To help solidify your understanding of the I/O concurrency


concepts, in the next few sections you will build several new
features using concurrency into your FastAPI service. These
features include:

Talk to the web

Build and integrate a web scraper module that allows you


to ask questions to your self-hosted LLM about the content
of a website by providing an HTTP URL.

Talk to documents

Build and integrate a RAG module to process documents


into a vector database. A vector database stores data in a
way that supports efficient similarity searches. You can
then use semantic search, which understands the
meaning of queries, to interact with uploaded documents
using your LLM.

Both projects will give you a hands-on experience interacting


asynchronously with external systems such as websites, a
database, and a filesystem.
Project: Talk to the Web (Web Scraper)

Companies often host a series of internal web pages for


manuals, processes, and other documentation as HTML pages.
For longer pages, your users may want to provide URLs when
asking questions and expect your LLM to fetch and read the
content. This is where having a built-in web scraper can come
in handy.

There are many ways to build a web scraper for your self-
hosted LLM. Depending on your use case, you can use a
combination of the following methods:

Fetch web pages as HTML and feed the raw HTML (or inner
text content) to your LLM to parse the content into your
desired format.
Use web scraping frameworks such as BeautifulSoup and
ScraPy to parse the content of web pages after fetching.
Use headless web browsers such as Selenium and Microsoft
Playwright to dynamically navigate nodes in pages and parse
content. Headless browsers are great for navigation single-
page applications (SPAs).

You might also like