Switching from Python mock libraries to aimock

pytest-mockllm, openai-responses-python, and evalcraft work great for single-process Python tests. When your AI app spans multiple services—or you want to test from any language—aimock gives you a real mock server accessible from anywhere.

The libraries

Library	Approach	Scope
pytest-mockllm	pytest fixture + monkey-patching	OpenAI and Anthropic in-process
openai-responses-python	Decorator that intercepts `httpx`	OpenAI API responses only
evalcraft	Mock + evaluation framework	OpenAI completions + eval metrics

All three work by intercepting HTTP calls within the same Python process. This is convenient for unit tests, but it breaks down when your AI application spans multiple services (API server, agent worker, background jobs) or when you need to test from Playwright, a Node.js frontend, or another language entirely.
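To make the process boundary concrete, here is a minimal sketch using only the standard library. It patches `urllib.request.urlopen` in the test process, much as an in-process mock library would patch an HTTP client, and then asks a child Python process whether it sees the patch. The name `patched_in_parent_and_child` is illustrative and does not come from any of the libraries above.

```python
import subprocess
import sys
import urllib.request
from unittest import mock

# Code run by the child process: report whether urlopen is a mock object there.
CHILD_CODE = (
    "import urllib.request, unittest.mock;"
    "print(isinstance(urllib.request.urlopen, unittest.mock.Mock))"
)


def patched_in_parent_and_child():
    """Return (parent_sees_patch, child_sees_patch)."""
    with mock.patch("urllib.request.urlopen"):
        # Inside this process the HTTP entry point is now a mock object.
        parent = isinstance(urllib.request.urlopen, mock.Mock)
        # A separate process imports its own, unpatched copy of the module.
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            capture_output=True, text=True, check=True,
        )
        child = result.stdout.strip() == "True"
    return parent, child


if __name__ == "__main__":
    print(patched_in_parent_and_child())  # (True, False)
```

The patch lives in the memory of one interpreter, so a worker, API server, or browser-driven frontend started separately still talks to the real network unless it is pointed at a mock server.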

Honest assessment

Python teams have two paths today. If Node.js is available, npx @copilotkit/aimock starts a mock server in one command, with no Docker needed. For Docker-based CI environments, the ghcr.io/copilotkit/aimock image works with any language. The aimock-pytest pip package, which will provide native pytest fixture integration with automatic server lifecycle management, is still in development.

Code comparison

Here's what the switch looks like in practice. The Python decorator becomes a Docker container + conftest.py fixture.

pytest-mockllm (before)

test_agent.py py

import pytest
from pytest_mockllm import mock_openai

import my_agent  # placeholder: your own application module under test

@mock_openai(response="Hello from the mock")
def test_my_agent():
    result = my_agent.run("hello")
    assert result == "Hello from the mock"

openai-responses-python (before)

test_completions.py py

from openai import OpenAI
from openai_responses import mock_completions

@mock_completions(content="Hello from the mock")
def test_chat():
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hello"}]
    )
    assert resp.choices[0].message.content == "Hello from the mock"

aimock (after)

conftest.py py

import os
import subprocess
import time

import pytest
import requests

@pytest.fixture(scope="session")
def aimock_server():
    # Start aimock via Docker
    proc = subprocess.Popen([
        "docker", "run", "--rm",
        "-p", "4010:4010",
        "-v", f"{os.getcwd()}/fixtures:/fixtures",
        "ghcr.io/copilotkit/aimock:latest",
        "-f", "/fixtures", "-h", "0.0.0.0"
    ])
    # Save originals so we don't clobber real credentials in the test process
    prev_base = os.environ.get("OPENAI_BASE_URL")
    prev_key = os.environ.get("OPENAI_API_KEY")

    try:
        # Wait for health endpoint; fail loudly if aimock never comes up
        for _ in range(30):
            if proc.poll() is not None:
                raise RuntimeError(f"aimock exited early with code {proc.returncode}")
            try:
                # Short timeout so one hung request cannot stall the session
                if requests.get("http://localhost:4010/health", timeout=1).ok:
                    break
            except requests.RequestException:
                pass
            time.sleep(0.2)
        else:
            raise RuntimeError("aimock did not become healthy after 30 attempts")

        os.environ["OPENAI_BASE_URL"] = "http://localhost:4010/v1"
        os.environ["OPENAI_API_KEY"] = "mock-key"
        yield "http://localhost:4010"
    finally:
        # Also runs when startup fails, so the server process is never leaked
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait(timeout=5)
        # Restore originals (or remove if there were none)
        for name, val in (("OPENAI_BASE_URL", prev_base), ("OPENAI_API_KEY", prev_key)):
            if val is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = val

test_agent.py py

import openai

def test_chat_completion(aimock_server):
    client = openai.OpenAI(
        base_url=f"{aimock_server}/v1",
        api_key="mock-key"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hello"}]
    )
    assert response.choices[0].message.content == "Hello from the mock"

fixtures/hello.json json

{
  "match": { "userMessage": "hello" },
  "response": { "content": "Hello from the mock" }
}
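If you prefer to generate fixture files from Python rather than write them by hand, a small helper along these lines may be enough. It is a sketch that assumes only the fixture shape shown in hello.json above (`match.userMessage` and `response.content`). The function name `write_fixture` is hypothetical and not part of aimock.

```python
import json
from pathlib import Path


def write_fixture(fixtures_dir, name, user_message, content):
    """Write one fixture file in the shape shown above and return its path.

    Assumption: the only fields used are match.userMessage and
    response.content, mirroring fixtures/hello.json.
    """
    path = Path(fixtures_dir) / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    fixture = {
        "match": {"userMessage": user_message},
        "response": {"content": content},
    }
    path.write_text(json.dumps(fixture, indent=2) + "\n", encoding="utf-8")
    return path


# Usage: write_fixture("fixtures", "hello", "hello", "Hello from the mock")
```

Generating fixtures before the server starts keeps the expected responses next to the tests that rely on them.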

What you gain

🌐

Cross-process, cross-language

Your Python tests, Node.js frontend, Go microservices, and Playwright E2E tests all hit the same mock server. No per-language patching.
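Because the mock is an ordinary HTTP server, a client does not need the OpenAI SDK or any patching to reach it. The sketch below uses only the standard library and assumes the server answers on the OpenAI-compatible `/v1/chat/completions` path with the usual `choices[0].message.content` response shape. The helper name `chat_once` is illustrative.

```python
import json
import urllib.request


def chat_once(base_url, user_message, model="gpt-4"):
    """POST one chat completion request and return the assistant text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer mock-key",  # any placeholder key
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, with the mock server running on port 4010:
#   chat_once("http://localhost:4010", "hello")
```

The same request can be issued from curl, Go, or a browser test, which is the point of moving the mock out of the Python process.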

📡

12 LLM providers

OpenAI (Chat, Responses, Realtime), Claude, Gemini (REST, Live, and Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere. The Python libraries only cover OpenAI (and sometimes Anthropic).

⏺

Record & replay

Proxy real APIs, save responses as fixtures, replay forever. No manual response construction.

🧩

MCP / A2A / AG-UI / Vector

Mock your entire AI stack — LLM, MCP, A2A, AG-UI, vector — on one port.

🔌

WebSocket + streaming

Built-in SSE streaming and WebSocket protocol support (OpenAI Realtime, Gemini Live). The Python libraries don't handle streaming.
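For tests that assert on streamed output, it helps to reassemble the text from the event stream. The following sketch assumes the mock emits OpenAI-style chat completion chunks (`data: {...}` lines carrying `choices[].delta.content`, ending with `data: [DONE]`). The function name `collect_sse_content` is hypothetical.

```python
import json


def collect_sse_content(lines):
    """Join streamed delta text; also report whether [DONE] was seen."""
    parts = []
    finished = False
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            finished = True
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), finished


sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hello "}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"from the mock"}}]}',
    "",
    "data: [DONE]",
]

if __name__ == "__main__":
    print(collect_sse_content(sample))  # ('Hello from the mock', True)
```

Returning the `finished` flag alongside the text lets a test distinguish a complete stream from one that was cut off before `[DONE]`.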

💥

Chaos testing

Inject latency, drop chunks, corrupt payloads mid-stream. Test your error handling under realistic failure conditions.

What you lose (honestly)

Capability	Python mocks	aimock	Notes
In-process decorator convenience	✓	✗	Coming with `aimock-pytest` pip package
Native pytest integration	✓	conftest.py fixture	Works, but more boilerplate today
Zero infrastructure	✓	Docker or npx	Requires Docker or Node.js runtime
Cross-process mocking	✗	✓	aimock's key advantage
Multi-provider	1–2 providers	12
Streaming SSE	✗	Built-in
WebSocket protocols	✗	3 protocols
Record & replay	✗	✓
MCP / A2A / AG-UI / Vector	✗	✓
Chaos testing	✗	✓

CLI / Docker quick start

Install & run sh

# Run the mock server (requires Node.js; uses the package's flag-driven llmock bin)
npx -p @copilotkit/aimock llmock -p 4010 -f ./fixtures

# Point your Python app at the mock
export OPENAI_BASE_URL=http://localhost:4010/v1
export OPENAI_API_KEY=mock-key

# Run your tests
pytest

Docker (no Node.js required) sh

# Pull and run
docker run -d -p 4010:4010 \
  -v "$(pwd)/fixtures:/fixtures" \
  ghcr.io/copilotkit/aimock:latest \
  -f /fixtures -h 0.0.0.0

# Point your Python app at the mock
export OPENAI_BASE_URL=http://localhost:4010/v1
export OPENAI_API_KEY=mock-key

# Run your tests
pytest

Docker is the recommended path for Python teams since it doesn't require Node.js in your development environment. Add the container to your docker-compose.yml or CI pipeline alongside your Python services.

Alternative: npx fixture (no Docker)

If Node.js is available in your environment, you can skip Docker entirely and use npx @copilotkit/aimock directly from your conftest.py.

conftest.py (npx) py

import os
import subprocess
import time

import pytest
import requests

@pytest.fixture(scope="session")
def aimock_server():
    proc = subprocess.Popen(
        ["npx", "-p", "@copilotkit/aimock", "llmock", "-p", "4010", "-f", "./fixtures"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # Save originals so we don't clobber real credentials in the test process
    prev_base = os.environ.get("OPENAI_BASE_URL")
    prev_key = os.environ.get("OPENAI_API_KEY")

    try:
        # Wait for health endpoint; fail loudly if aimock never comes up
        for _ in range(30):
            if proc.poll() is not None:
                raise RuntimeError(f"aimock exited early with code {proc.returncode}")
            try:
                # Short timeout so one hung request cannot stall the session
                if requests.get("http://localhost:4010/health", timeout=1).ok:
                    break
            except requests.RequestException:
                pass
            time.sleep(0.2)
        else:
            raise RuntimeError("aimock did not become healthy after 30 attempts")

        os.environ["OPENAI_BASE_URL"] = "http://localhost:4010/v1"
        os.environ["OPENAI_API_KEY"] = "mock-key"
        yield "http://localhost:4010"
    finally:
        # Also runs when startup fails, so the server process is never leaked
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait(timeout=5)
        # Restore originals (or remove if there were none)
        for name, val in (("OPENAI_BASE_URL", prev_base), ("OPENAI_API_KEY", prev_key)):
            if val is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = val
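
Both fixtures poll the same /health endpoint, so the wait loop could be factored into a shared helper. The sketch below is one way to do that with only the standard library, which also removes the need for `requests` in conftest.py. The name `wait_for_health` and its default values are assumptions that mirror the loops above, not part of aimock.

```python
import time
import urllib.request


def wait_for_health(url, attempts=30, delay=0.2):
    """Poll url until it returns a 2xx status; return False if it never does."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=1) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections, and timeouts are
            # all OSError subclasses, so any of them means "not ready yet".
            pass
        time.sleep(delay)
    return False


# Usage inside a fixture, after starting the server process:
#   if not wait_for_health("http://localhost:4010/health"):
#       raise RuntimeError("aimock did not become healthy")
```

Unlike the inline loops, this helper does not check whether the server process has already exited, so a fixture using it may still want to call `proc.poll()` before raising.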