High Concurrency Tuning

This document provides best practices for high-concurrency scenarios.

Connection Pooling

Reuse HTTP connections to reduce connection establishment overhead:

import httpx
from openai import OpenAI

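# Create one client and share it across requests so pooled connections are reused
client = OpenAI(
    api_key="sk-real200-xxx",
    base_url="https://real200.com/v1",
    # Allow up to 100 simultaneous connections in the pool
    http_client=httpx.Client(limits=httpx.Limits(max_connections=100)),
)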

Concurrency Control

Use a semaphore to cap the number of requests in flight at any one time:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="sk-real200-xxx", base_url="https://real200.com/v1")
semaphore = asyncio.Semaphore(20)  # At most 20 requests in flight at once

async def call_api(prompt):
    async with semaphore:
        return await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}]
        )
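
The pattern above can be checked without calling the real API. The following is a minimal sketch, not part of the original example: `fake_call` is a hypothetical stand-in for `client.chat.completions.create`, and `run_batch` and `MAX_CONCURRENCY` are illustrative names. It fans out a batch of prompts with `asyncio.gather` and records the peak number of simultaneous calls, which should never exceed the semaphore limit.

```python
import asyncio

MAX_CONCURRENCY = 5  # illustrative value; the example above uses 20


async def fake_call(prompt, state):
    # Hypothetical stand-in for client.chat.completions.create(...)
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # simulate network latency
    state["in_flight"] -= 1
    return f"echo: {prompt}"


async def run_batch(prompts, limit=MAX_CONCURRENCY):
    semaphore = asyncio.Semaphore(limit)
    state = {"in_flight": 0, "peak": 0}

    async def guarded(prompt):
        # Only `limit` coroutines can be inside this block at once
        async with semaphore:
            return await fake_call(prompt, state)

    # gather returns results in the same order as the input prompts
    results = await asyncio.gather(*(guarded(p) for p in prompts))
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_batch([f"prompt {i}" for i in range(20)]))
    print(len(results), "responses, peak concurrency:", peak)
```

Replacing `fake_call` with the real `call_api` body keeps the same structure: all tasks are created up front, but the semaphore decides how many actually run at once.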

Rate Limit Handling

High concurrency makes it more likely that requests hit rate limits (HTTP 429 errors). Recommendations:

Retry failed requests with exponential backoff
Spread requests over time instead of sending them in bursts
Contact your administrator to raise the rate limit thresholds
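
The first two recommendations can be combined in one retry helper. What follows is a minimal sketch under stated assumptions, not a definitive implementation: `RateLimited` is a hypothetical stand-in for whatever exception your client raises on a 429 (with the `openai` package this would likely be `openai.RateLimitError`), and `with_backoff`, `backoff_delay`, and the default values are illustrative. Each retry waits a random time up to an exponentially growing ceiling, so concurrent callers do not all retry in the same instant.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical stand-in for the client's 429 error."""


def backoff_delay(attempt, base=1.0, cap=30.0):
    # "Full jitter": pick a random delay in [0, min(cap, base * 2**attempt)]
    return random.uniform(0, min(cap, base * 2 ** attempt))


async def with_backoff(make_call, max_retries=5, base=1.0, cap=30.0,
                       retry_on=(RateLimited,)):
    # make_call is a zero-argument function returning a fresh coroutine
    for attempt in range(max_retries + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            await asyncio.sleep(backoff_delay(attempt, base, cap))
```

Usage would look something like `await with_backoff(lambda: call_api(prompt), retry_on=(openai.RateLimitError,))`, assuming `call_api` from the concurrency example above. Passing a lambda matters because a coroutine object can only be awaited once, so each retry needs a new one.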
