MarkTechPost@AI · September 28, 06:34
Python asyncio: An Asynchronous Programming Guide to Boost AI Application Performance

This article takes a deep dive into Python's asyncio library and shows how asynchronous programming can significantly improve the performance of AI applications. It explains the core idea of asyncio: using the async/await syntax to achieve concurrent execution, which is especially suited to handling large numbers of I/O-bound tasks such as LLM API calls. Through side-by-side comparisons with synchronous code, it demonstrates asyncio's substantial advantage in cutting wait times and improving execution efficiency. The article also offers guidance on applying asyncio in practical AI scenarios (such as generating content for multiple users or aggregating data from several APIs), underscoring its importance for faster responses, lower costs, and better scalability.

✨ Python's asyncio library is the key tool for asynchronous programming: it lets developers write efficient concurrent code with the async/await syntax. This is crucial for I/O-bound tasks such as API calls, because the program can keep working on other tasks while waiting for responses, avoiding blocking and greatly improving overall execution efficiency.

🚀 In AI applications, especially those interacting with Large Language Models (LLMs), asyncio delivers a significant performance boost. By issuing multiple API requests concurrently, total runtime drops from the sum of all request times in synchronous mode (for example 49.76 seconds) to roughly the duration of the slowest single request (for example 8.25 seconds), which is critical for applications that must handle many requests.

💡 Asynchronous programming is valuable across many AI scenarios, including generating content for multiple users simultaneously (chatbots, recommendation engines), calling an LLM several times within one workflow (summarization, refinement, multi-step reasoning), and fetching data from multiple APIs in parallel (vector databases, external services). All of these benefit from asyncio's advantages in performance, cost efficiency, user experience, and scalability.

📊 By comparing synchronous and asynchronous execution of LLM API calls, the article clearly demonstrates asyncio's advantage: in synchronous mode, requests execute in order and their durations accumulate, while in asynchronous mode all requests are issued almost simultaneously and total runtime approaches the longest single request. This concurrent processing is key to accelerating AI applications.

In many AI applications today, performance is a big deal. You may have noticed that while working with Large Language Models (LLMs), a lot of time is spent waiting—waiting for an API response, waiting for multiple calls to finish, or waiting for I/O operations.

That’s where asyncio comes in. Surprisingly, many developers use LLMs without realizing they can speed up their apps with asynchronous programming.

This guide will walk you through:

What asyncio is, and why it matters for I/O-bound work
Getting started with asynchronous Python
Using asyncio in an AI application with an LLM
Why this matters in real-world AI applications

What is Asyncio?

Python’s asyncio library enables writing concurrent code using the async/await syntax, allowing multiple I/O-bound tasks to run efficiently within a single thread. At its core, asyncio works with awaitable objects—usually coroutines—that an event loop schedules and executes without blocking.

In simpler terms, synchronous code runs tasks one after another, like standing in a single grocery line, while asynchronous code runs tasks concurrently, like using multiple self-checkout machines. This is especially useful for API calls (e.g., OpenAI, Anthropic, Hugging Face), where most of the time is spent waiting for responses, enabling much faster execution.
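To make the mechanics concrete, here is a minimal sketch (our addition, not from the original article; the greet coroutine is purely illustrative) showing that calling a coroutine function only creates a coroutine object, and that the event loop is what actually runs it:

import asyncio

async def greet(name: str) -> str:
    # The body does not run when greet() is called; it runs only when
    # the event loop drives the coroutine.
    await asyncio.sleep(0.1)  # yield control to the event loop while "waiting"
    return f"Hello, {name}!"

coro = greet("asyncio")   # creates a coroutine object; nothing has executed yet
print(type(coro))         # <class 'coroutine'>
coro.close()              # discard the unawaited coroutine to avoid a RuntimeWarning

print(asyncio.run(greet("asyncio")))  # the event loop runs the coroutine to completion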

Getting Started with Asynchronous Python

Example: Running Tasks With and Without asyncio

In this example, we ran a simple function three times synchronously. The output shows that each call to say_hello() prints "Hello…", waits 2 seconds, then prints "…World!". Since the calls happen one after another, the wait time adds up: 2 seconds × 3 calls = 6 seconds total. Check out the FULL CODES here.

import time

def say_hello():
    print("Hello...")
    time.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

def main():
    say_hello()
    say_hello()
    say_hello()

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Finished in {time.time() - start:.2f} seconds")

The code below shows that all three calls to the say_hello() function start almost at the same time. Each prints "Hello…" immediately, then waits 2 seconds concurrently before printing "…World!".

Because these tasks ran concurrently rather than one after another, the total time is roughly the longest single wait (~2 seconds) instead of the sum of all waits (6 seconds in the synchronous version). This demonstrates the performance advantage of asyncio for I/O-bound tasks. Check out the FULL CODES here.

import nest_asyncio, asyncio
nest_asyncio.apply()  # allow asyncio.run() inside notebooks that already run an event loop
import time

async def say_hello():
    print("Hello...")
    await asyncio.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

async def main():
    # Run tasks concurrently
    await asyncio.gather(
        say_hello(),
        say_hello(),
        say_hello()
    )

if __name__ == "__main__":
    start = time.time()
    asyncio.run(main())
    print(f"Finished in {time.time() - start:.2f} seconds")
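As a side note for Python 3.11 and newer, the same fan-out can be written with asyncio.TaskGroup, which also cancels the remaining tasks if one of them fails. This variant is our addition rather than part of the original tutorial:

import asyncio
import time

async def say_hello() -> None:
    print("Hello...")
    await asyncio.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

async def main() -> None:
    # TaskGroup (Python 3.11+) schedules all tasks and waits for them;
    # if any task raises, the remaining tasks are cancelled.
    async with asyncio.TaskGroup() as tg:
        for _ in range(3):
            tg.create_task(say_hello())

if __name__ == "__main__":
    start = time.time()
    asyncio.run(main())
    print(f"Finished in {time.time() - start:.2f} seconds")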

Example: Download Simulation

Imagine you need to download several files. Each download takes time, but during that wait, your program can work on other downloads instead of sitting idle.

import asyncio
import random
import time

async def download_file(file_id: int):
    print(f"Start downloading file {file_id}")
    download_time = random.uniform(1, 3)  # simulate variable download time
    await asyncio.sleep(download_time)    # non-blocking wait
    print(f"Finished downloading file {file_id} in {download_time:.2f} seconds")
    return f"File {file_id} content"

async def main():
    files = [1, 2, 3, 4, 5]
    start_time = time.time()

    # Run downloads concurrently
    results = await asyncio.gather(*(download_file(f) for f in files))

    end_time = time.time()
    print("\nAll downloads completed.")
    print(f"Total time taken: {end_time - start_time:.2f} seconds")
    print("Results:", results)

if __name__ == "__main__":
    asyncio.run(main())

This demonstrates the power of asyncio: when tasks spend most of their time waiting, those waits can overlap instead of running back to back, greatly improving efficiency.
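One design note: asyncio.gather returns results in the order the tasks were submitted. If you would rather handle each download the moment it finishes, asyncio.as_completed does that. The following sketch is our addition, with a trimmed-down download_file standing in for the one above:

import asyncio
import random

async def download_file(file_id: int) -> str:
    await asyncio.sleep(random.uniform(1, 3))  # simulate variable download time
    return f"File {file_id} content"

async def main() -> None:
    tasks = [asyncio.create_task(download_file(f)) for f in (1, 2, 3, 4, 5)]
    # as_completed yields awaitables in completion order, not submission order
    for next_done in asyncio.as_completed(tasks):
        content = await next_done
        print("Ready:", content)

asyncio.run(main())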

Using asyncio in an AI application with an LLM

Now that we understand how asyncio works, let’s apply it to a real-world AI example. Large Language Models (LLMs) such as OpenAI’s GPT models often involve multiple API calls that each take time to complete. If we run these calls one after another, we waste valuable time waiting for responses.

In this section, we’ll compare running multiple prompts with and without asyncio using the OpenAI client. We’ll use 15 short prompts to clearly demonstrate the performance difference. Check out the FULL CODES here.

!pip install openai
import asyncio
from openai import AsyncOpenAI
import os
from getpass import getpass

os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
import time
from openai import OpenAI

# Create sync client
client = OpenAI()

def ask_llm(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = []
    for prompt in prompts:
        results.append(ask_llm(prompt))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Synchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    main()

The synchronous version processed all 15 prompts one after another, so the total time is the sum of each request's duration. Because every request blocks until it completes, the overall runtime stretched to 49.76 seconds in this case. Check out the FULL CODES here.

import asyncio
import time
from openai import AsyncOpenAI

# Create async client
client = AsyncOpenAI()

async def ask_llm(prompt: str):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = await asyncio.gather(*(ask_llm(p) for p in prompts))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Asynchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    asyncio.run(main())

The asynchronous version processed all 15 prompts concurrently, starting them almost at the same time instead of one by one. As a result, the total runtime was close to the duration of the slowest single request: 8.25 seconds, rather than the sum of all request times.

The large difference arises because, in synchronous execution, each API call blocks the program until it finishes, so the durations add up. In asynchronous execution with asyncio, the calls are in flight at the same time: while one request waits on the network, others can be sent and their responses processed, drastically reducing total execution time.
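In practice, firing dozens or hundreds of LLM calls at once can run into provider rate limits. A common pattern, shown below as a hedged sketch rather than anything from the original post, is to cap concurrency with asyncio.Semaphore. The names ask_llm_limited, run_all, and MAX_CONCURRENT are our own; ask_llm is assumed to be the async coroutine defined above:

import asyncio

MAX_CONCURRENT = 5  # illustrative cap; tune it to your provider's rate limits

async def ask_llm_limited(semaphore: asyncio.Semaphore, prompt: str) -> str:
    # At most MAX_CONCURRENT calls are in flight at any moment;
    # the rest wait here until a slot frees up.
    async with semaphore:
        return await ask_llm(prompt)  # ask_llm: the async coroutine defined earlier

async def run_all(prompts: list[str]) -> list[str]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(ask_llm_limited(semaphore, p) for p in prompts))

# Usage inside an async context:
# results = await run_all(prompts)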

Why This Matters in AI Applications

In real-world AI applications, waiting for each request to finish before starting the next can quickly become a bottleneck, especially when dealing with multiple queries or data sources. This is particularly common in workflows such as:

Generating content for many users at once (chatbots, recommendation engines)
Calling an LLM several times within a single workflow (summarization, refinement, multi-step reasoning)
Fetching data from multiple sources in parallel (vector databases, external APIs and services)

Using asyncio in these cases brings significant benefits:

Performance: total latency shrinks toward the slowest single call instead of the sum of all calls
Cost efficiency: higher throughput per process means less idle compute while waiting on I/O
User experience: faster responses keep interactive applications feeling responsive
Scalability: one service can handle many concurrent requests without extra threads
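One more hedged sketch, not from the original post: when user experience and scalability matter, it is common to bound how long any single call may take, so one slow request cannot stall a whole batch. asyncio.wait_for cancels an awaitable after a deadline; the wrapper name ask_llm_with_timeout and the timeout_s parameter are our own, and ask_llm is again assumed to be the async coroutine defined earlier:

import asyncio

async def ask_llm_with_timeout(prompt: str, timeout_s: float = 10.0) -> str | None:
    # wait_for cancels the underlying call once the deadline passes.
    try:
        return await asyncio.wait_for(ask_llm(prompt), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # caller decides whether to retry, fall back, or skip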


Check out the FULL CODES here.

The post What is Asyncio? Getting Started with Asynchronous Python and Using Asyncio in an AI Application with an LLM appeared first on MarkTechPost.
