Switching from Python mock libraries to aimock

pytest-mockllm, openai-responses-python, and evalcraft work great for single-process Python tests. When your AI app spans multiple services—or you want to test from any language—aimock gives you a real mock server accessible from anywhere.

The libraries

Library | Approach | Scope
pytest-mockllm | pytest fixture + monkey-patching | OpenAI and Anthropic, in-process
openai-responses-python | Decorator that intercepts httpx | OpenAI API responses only
evalcraft | Mock + evaluation framework | OpenAI completions + eval metrics

All three work by intercepting HTTP calls within the same Python process. This is convenient for unit tests, but it breaks down when your AI application spans multiple services (API server, agent worker, background jobs) or when you need to test from Playwright, a Node.js frontend, or another language entirely.
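To make the limitation concrete, here is a minimal, library-agnostic sketch of why in-process patching can't cross a process boundary. It uses unittest.mock and the stdlib json module as a stand-in for an SDK; the same applies to patched OpenAI clients.

```python
# A patch applied in this interpreter is invisible to a child process,
# which imports the real, unpatched module.
import subprocess
import sys
from unittest import mock

import json as target  # stand-in for an SDK module you'd monkey-patch

with mock.patch.object(target, "dumps", return_value="patched"):
    assert target.dumps({}) == "patched"  # patched in this process
    out = subprocess.run(
        [sys.executable, "-c", "import json; print(json.dumps({}))"],
        capture_output=True, text=True,
    ).stdout.strip()
    assert out == "{}"  # the child process sees the real function
```

A real mock server sidesteps this: every process (or language) that can make an HTTP request sees the same mocked behavior.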

Honest assessment

Python teams have two paths. If Node.js is available, npx aimock starts a mock server in one command, with no Docker needed. An aimock-pytest pip package is in development to provide native pytest fixture integration with automatic server lifecycle management. For Docker-based CI environments, the ghcr.io/copilotkit/aimock image works from any language.

Code comparison

Here's what the switch looks like in practice. The Python decorator becomes a Docker container + conftest.py fixture.

pytest-mockllm (before)

test_agent.py py
import pytest
from pytest_mockllm import mock_openai

@mock_openai(response="Hello from the mock")
def test_my_agent():
    result = my_agent.run("hello")
    assert result == "Hello from the mock"

openai-responses-python (before)

test_completions.py py
from openai_responses import mock_completions

@mock_completions(content="Hello from the mock")
def test_chat():
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hello"}]
    )
    assert resp.choices[0].message.content == "Hello from the mock"

aimock (after)

conftest.py py
import pytest
import socket
import subprocess, time, os

@pytest.fixture(scope="session")
def aimock_server():
    # Start aimock via Docker
    proc = subprocess.Popen([
        "docker", "run", "--rm",
        "-p", "4010:4010",
        "-v", f"{os.getcwd()}/fixtures:/fixtures",
        "ghcr.io/copilotkit/aimock:latest",
        "-f", "/fixtures"
    ])
    # Poll until the port accepts connections; a fixed sleep is flaky on slow CI
    for _ in range(50):
        try:
            with socket.create_connection(("localhost", 4010), timeout=0.5):
                break
        except OSError:
            time.sleep(0.2)
    else:
        proc.terminate()
        pytest.fail("aimock did not start within 10 seconds")

    # Point OpenAI SDK at the mock
    os.environ["OPENAI_BASE_URL"] = "http://localhost:4010/v1"
    os.environ["OPENAI_API_KEY"] = "mock-key"

    yield "http://localhost:4010"
    proc.terminate()
    proc.wait()

test_agent.py py
import openai

def test_chat_completion(aimock_server):
    client = openai.OpenAI(
        base_url=f"{aimock_server}/v1",
        api_key="mock-key"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hello"}]
    )
    assert response.choices[0].message.content == "Hello from the mock"

fixtures/hello.json json
{
  "match": { "userMessage": "hello" },
  "response": { "content": "Hello from the mock" }
}
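
Each fixture is a plain JSON file, so new cases are just new files dropped into the fixtures directory. Assuming the same match/response fields shown above, a second fixture for a different prompt might look like:

fixtures/goodbye.json json
```json
{
  "match": { "userMessage": "goodbye" },
  "response": { "content": "Goodbye from the mock" }
}
```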

What you gain

🌐 Cross-process, cross-language

Your Python tests, Node.js frontend, Go microservices, and Playwright E2E tests all hit the same mock server. No per-language patching.

📡 10+ LLM providers

OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere. The Python libraries only cover OpenAI (and sometimes Anthropic).

Record & replay

Proxy real APIs, save responses as fixtures, replay forever. No manual response construction.

🧩 MCP / A2A / Vector

Mock your entire AI stack—MCP tool servers, A2A agent endpoints, vector databases—not just LLM calls.

🔌 WebSocket + streaming

Built-in SSE streaming and WebSocket protocol support (OpenAI Realtime, Gemini Live). The Python libraries don't handle streaming.

💥 Chaos testing

Inject latency, drop chunks, corrupt payloads mid-stream. Test your error handling under realistic failure conditions.

What you lose (honestly)

Capability | Python mocks | aimock | Notes
In-process decorator convenience | ✅ | ❌ | Coming with aimock-pytest pip package
Native pytest integration | ✅ | conftest.py fixture | Works, but more boilerplate today
Zero infrastructure | ✅ | Docker or npx | Requires Docker or Node.js runtime
Cross-process mocking | ❌ | ✅ | aimock's key advantage
Multi-provider | 1–2 providers | 10+ |
Streaming SSE | ❌ | Built-in |
WebSocket protocols | ❌ | 3 protocols |
Record & replay | ❌ | ✅ |
MCP / A2A / Vector | ❌ | ✅ |
Chaos testing | ❌ | ✅ |

CLI / Docker quick start

Install & run sh
# Run the mock server (requires Node.js)
npx aimock -p 4010 -f ./fixtures

# Point your Python app at the mock
export OPENAI_BASE_URL=http://localhost:4010/v1
export OPENAI_API_KEY=mock-key

# Run your tests
pytest

Docker (no Node.js required) sh
# Pull and run
docker run -d -p 4010:4010 \
  -v $(pwd)/fixtures:/fixtures \
  ghcr.io/copilotkit/aimock:latest \
  -f /fixtures

# Point your Python app at the mock
export OPENAI_BASE_URL=http://localhost:4010/v1
export OPENAI_API_KEY=mock-key

# Run your tests
pytest

Docker is the recommended path for Python teams since it doesn't require Node.js in your development environment. Add the container to your docker-compose.yml or CI pipeline alongside your Python services.
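
As a sketch, wiring the container into docker-compose.yml might look like the following. The image, port, and volume settings mirror the docker run command above; the app service name and build context are illustrative, and inside the Compose network your service reaches the mock via the aimock service name rather than localhost.

docker-compose.yml yaml
```yaml
services:
  aimock:
    image: ghcr.io/copilotkit/aimock:latest
    command: ["-f", "/fixtures"]
    ports:
      - "4010:4010"
    volumes:
      - ./fixtures:/fixtures

  app:                # your Python service (name and build context illustrative)
    build: .
    environment:
      OPENAI_BASE_URL: http://aimock:4010/v1
      OPENAI_API_KEY: mock-key
    depends_on:
      - aimock
```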

Alternative: npx fixture (no Docker)

If Node.js is available in your environment, you can skip Docker entirely and use npx aimock directly from your conftest.py.

conftest.py (npx) py
import pytest
import socket
import subprocess, time, os

@pytest.fixture(scope="session")
def aimock_server():
    # DEVNULL avoids the process blocking if server output fills a PIPE buffer
    proc = subprocess.Popen(
        ["npx", "aimock", "-p", "4010", "-f", "./fixtures"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # Poll until the port accepts connections; a fixed sleep is flaky on slow CI
    for _ in range(50):
        try:
            with socket.create_connection(("localhost", 4010), timeout=0.5):
                break
        except OSError:
            time.sleep(0.2)
    else:
        proc.terminate()
        pytest.fail("aimock did not start within 10 seconds")

    os.environ["OPENAI_BASE_URL"] = "http://localhost:4010/v1"
    os.environ["OPENAI_API_KEY"] = "mock-key"

    yield "http://localhost:4010"
    proc.terminate()
    proc.wait()