High Concurrency Tuning
This document provides best practices for high-concurrency scenarios.
Connection Pooling
Reuse HTTP connections to reduce connection establishment overhead:
import httpx
from openai import OpenAI
client = OpenAI(
    api_key="sk-real200-xxx",
    base_url="https://real200.com/v1",
    http_client=httpx.Client(limits=httpx.Limits(max_connections=100)),
)
Concurrency Control
Use a semaphore to cap the number of requests in flight at once:
import asyncio
from openai import AsyncOpenAI
client = AsyncOpenAI(api_key="sk-real200-xxx", base_url="https://real200.com/v1")
semaphore = asyncio.Semaphore(20)  # At most 20 requests in flight
async def call_api(prompt):
    # Wait for a free slot; it is released when the request finishes
    async with semaphore:
        return await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
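To show how a semaphore-guarded call is typically driven over a batch, here is a minimal sketch. The names `run_all`, `guarded`, and `fake_worker` are illustrative and not part of any SDK; `fake_worker` stands in for a real request so the example runs without network access.

```python
import asyncio

MAX_CONCURRENT = 20  # assumed cap, matching the semaphore size used above


async def run_all(prompts, worker, limit=MAX_CONCURRENT):
    """Run worker(prompt) for every prompt with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt):
        # Each task waits here until one of the `limit` slots is free
        async with semaphore:
            return await worker(prompt)

    # Results come back in input order; with return_exceptions=True a failed
    # call is returned as an exception object instead of being raised
    return await asyncio.gather(
        *(guarded(p) for p in prompts), return_exceptions=True
    )


async def fake_worker(prompt):
    # Hypothetical stand-in for an API request
    await asyncio.sleep(0.01)
    return prompt.upper()


if __name__ == "__main__":
    print(asyncio.run(run_all(["hello", "world"], fake_worker, limit=2)))
```

In real use, `worker` would be a coroutine that issues the API request. Because the semaphore is already applied inside `run_all`, the worker itself should not acquire a second one.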
Rate Limit Handling
High concurrency makes it more likely that requests hit rate limits (HTTP 429 errors). Recommendations:
- Retry failed requests with exponential backoff
- Spread requests across different time windows
- Contact your administrator to raise the rate limit thresholds
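The first recommendation can be sketched as follows. This is one possible shape, not the SDK's own retry logic: `RateLimited` is a hypothetical placeholder for whatever exception your client raises on a 429 (with the `openai` package this is likely `openai.RateLimitError`), and `with_backoff`, `base_delay`, and `max_delay` are names chosen here for illustration.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical stand-in for the client's 429 error."""


async def with_backoff(make_call, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Await make_call(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return await make_call()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            # The ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Sleep a random time up to the ceiling so that concurrent
            # callers do not all retry at the same moment
            await asyncio.sleep(random.uniform(0, ceiling))
```

A call would then look like `await with_backoff(lambda: call_api(prompt))`. The random jitter also helps with the second recommendation, since retries from many concurrent tasks land at different times.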