Switching from Python mock libraries to aimock
pytest-mockllm, openai-responses-python, and evalcraft work great for single-process Python tests. When your AI app spans multiple services—or you want to test from any language—aimock gives you a real mock server accessible from anywhere.
The libraries
| Library | Approach | Scope |
|---|---|---|
| pytest-mockllm | pytest fixture + monkey-patching | OpenAI and Anthropic in-process |
| openai-responses-python | Decorator that intercepts httpx | OpenAI API responses only |
| evalcraft | Mock + evaluation framework | OpenAI completions + eval metrics |
All three work by intercepting HTTP calls within the same Python process. This is convenient for unit tests, but it breaks down when your AI application spans multiple services (API server, agent worker, background jobs) or when you need to test from Playwright, a Node.js frontend, or another language entirely.
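The process boundary can be seen with nothing but the standard library. The sketch below is an illustration, not code from any of the three libraries: it patches `socket.gethostname` (a stand-in for an LLM client call) in the test process, then asks a freshly spawned Python process for the same value. The function names are hypothetical.

```python
import socket
import subprocess
import sys
from unittest import mock


def hostname_in_child() -> str:
    """Ask a separate Python process for socket.gethostname()."""
    result = subprocess.run(
        [sys.executable, "-c", "import socket; print(socket.gethostname())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def compare_patch_visibility() -> tuple:
    """Return (value seen in this process, value seen in a child process)."""
    # The patch rewrites an attribute in this interpreter's memory only.
    with mock.patch("socket.gethostname", return_value="patched-in-parent"):
        in_process = socket.gethostname()
        in_child = hostname_in_child()
    return in_process, in_child


if __name__ == "__main__":
    parent_value, child_value = compare_patch_visibility()
    print("this process sees:", parent_value)
    print("child process sees:", child_value)
```

The parent sees the patched value while the child still gets the real one. An agent worker or background job started as its own process is in the same position as the child here.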
Honest assessment
There are two paths for Python teams today. If you have Node.js available, npx @copilotkit/aimock starts a mock server in one command, with no Docker needed. For Docker-based CI environments, the ghcr.io/copilotkit/aimock image works with any language. The aimock-pytest pip package is still in development; it is intended to provide native pytest fixture integration with automatic server lifecycle management.
Code comparison
Here's what the switch looks like in practice. The Python decorator becomes a Docker container plus a conftest.py fixture.
pytest-mockllm (before)
import pytest
from pytest_mockllm import mock_openai

@mock_openai(response="Hello from the mock")
def test_my_agent():
    # my_agent is your own application code under test
    result = my_agent.run("hello")
    assert result == "Hello from the mock"
openai-responses-python (before)
from openai import OpenAI
from openai_responses import mock_completions

@mock_completions(content="Hello from the mock")
def test_chat():
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hello"}]
    )
    assert resp.choices[0].message.content == "Hello from the mock"
aimock (after)
import os
import subprocess
import time

import pytest
import requests

@pytest.fixture(scope="session")
def aimock_server():
    # Start aimock via Docker
    proc = subprocess.Popen([
        "docker", "run", "--rm",
        "-p", "4010:4010",
        "-v", f"{os.getcwd()}/fixtures:/fixtures",
        "ghcr.io/copilotkit/aimock:latest",
        "-f", "/fixtures", "-h", "0.0.0.0"
    ])

    # Wait for health endpoint; fail loudly if aimock never comes up
    for _ in range(30):
        if proc.poll() is not None:
            raise RuntimeError(f"aimock exited early with code {proc.returncode}")
        try:
            # Bound each probe so a hung connection cannot stall the loop
            if requests.get("http://localhost:4010/health", timeout=1).ok:
                break
        except requests.RequestException:
            pass
        time.sleep(0.2)
    else:
        proc.terminate()  # don't leave the container running on failure
        raise RuntimeError("aimock did not become healthy after 30 attempts")

    # Save originals so we don't clobber real credentials in the test process
    prev_base = os.environ.get("OPENAI_BASE_URL")
    prev_key = os.environ.get("OPENAI_API_KEY")
    os.environ["OPENAI_BASE_URL"] = "http://localhost:4010/v1"
    os.environ["OPENAI_API_KEY"] = "mock-key"

    try:
        yield "http://localhost:4010"
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait(timeout=5)
        # Restore originals (or remove if there were none)
        for name, val in (("OPENAI_BASE_URL", prev_base), ("OPENAI_API_KEY", prev_key)):
            if val is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = val
import openai

def test_chat_completion(aimock_server):
    client = openai.OpenAI(
        base_url=f"{aimock_server}/v1",
        api_key="mock-key"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hello"}]
    )
    assert response.choices[0].message.content == "Hello from the mock"
Fixture file (in the mounted fixtures directory)
{
  "match": { "userMessage": "hello" },
  "response": { "content": "Hello from the mock" }
}
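If you are migrating many decorator-based tests, it may be easier to generate fixture files than to write them by hand. The helper below is a hypothetical sketch that writes the match/response shape from the fixture example into a directory. The function name, the one-fixture-per-file layout, and the file naming are assumptions, so check the aimock fixture documentation for the exact file format it loads.

```python
import json
from pathlib import Path


def write_fixture(directory: str, name: str, user_message: str, content: str) -> Path:
    """Write one fixture as <directory>/<name>.json and return its path.

    Hypothetical helper: the JSON shape mirrors the fixture example in this
    guide; the file layout is an assumption.
    """
    fixture = {
        "match": {"userMessage": user_message},
        "response": {"content": content},
    }
    path = Path(directory) / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(fixture, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    print(write_fixture("fixtures", "hello", "hello", "Hello from the mock"))
```

Each `@mock_openai(response=...)` or `@mock_completions(content=...)` decorator in the old tests would then map to one call with the prompt and the expected reply.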
What you gain
Cross-process, cross-language
Your Python tests, Node.js frontend, Go microservices, and Playwright E2E tests all hit the same mock server. No per-language patching.
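Because the mock is reached over HTTP, a client does not need the OpenAI SDK or any patching hook. As a rough sketch, the function below sends a chat request using only the Python standard library. It assumes the server exposes the OpenAI-style `/v1/chat/completions` route (the same route the SDK uses with a `/v1` base URL) and returns the usual `choices[0].message.content` shape. The function name `chat` is illustrative.

```python
import json
import urllib.request


def chat(base_url: str, user_message: str, model: str = "gpt-4",
         api_key: str = "mock-key") -> str:
    """POST a single-turn chat request and return the assistant text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    # A timeout keeps a misconfigured test from hanging indefinitely.
    with urllib.request.urlopen(request, timeout=5) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from this guide running, `chat("http://localhost:4010", "hello")` would be expected to return the fixture's content. The same request can be issued from any language with an HTTP client.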
12 LLM providers
Supported providers include OpenAI (Chat, Responses, Realtime), Claude, Gemini (REST, Live, and Interactions), Bedrock, Azure, Vertex AI, Ollama, and Cohere. The Python libraries cover only OpenAI and, in the case of pytest-mockllm, Anthropic.
Record & replay
Proxy real APIs, save responses as fixtures, replay forever. No manual response construction.
MCP / A2A / AG-UI / Vector
Mock your entire AI stack — LLM, MCP, A2A, AG-UI, vector — on one port.
WebSocket + streaming
Built-in SSE streaming and WebSocket protocol support (OpenAI Realtime, Gemini Live). The Python libraries don't handle streaming.
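To test streaming, your code has to reassemble text from server-sent events. The sketch below assumes the OpenAI chat streaming wire format: `data:` lines carrying JSON chunks with `choices[0].delta.content`, terminated by `data: [DONE]`. It only looks at the first choice, and the function name is illustrative.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from an OpenAI-style SSE chat stream."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if choices:
            delta = choices[0].get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts)
```

A test can feed this function the decoded lines of a streamed response from the mock and assert on the joined text.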
Chaos testing
Inject latency, drop chunks, corrupt payloads mid-stream. Test your error handling under realistic failure conditions.
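Chaos scenarios are only useful if the client has error handling for them to exercise. As one example of what such a test might target, here is a minimal retry wrapper with exponential backoff. It is a generic sketch: the names are hypothetical, the exception types are placeholders for whatever your HTTP client raises, and how aimock is configured to inject failures is not shown.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on the given exceptions with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets a test substitute a recorder so the backoff schedule can be asserted without real waiting.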
What you lose (honestly)
| Capability | Python mocks | aimock | Notes |
|---|---|---|---|
| In-process decorator convenience | ✓ | ✗ | Coming with aimock-pytest pip package |
| Native pytest integration | ✓ | conftest.py fixture | Works, but more boilerplate today |
| Zero infrastructure | ✓ | Docker or npx | Requires Docker or Node.js runtime |
| Cross-process mocking | ✗ | ✓ | aimock's key advantage |
| Multi-provider | 1–2 providers | 12 | |
| Streaming SSE | ✗ | Built-in | |
| WebSocket protocols | ✗ | 3 protocols | |
| Record & replay | ✗ | ✓ | |
| MCP / A2A / AG-UI / Vector | ✗ | ✓ | |
| Chaos testing | ✗ | ✓ | |
CLI / Docker quick start
# Run the mock server (requires Node.js, flag-driven llmock bin)
npx -p @copilotkit/aimock llmock -p 4010 -f ./fixtures
# Point your Python app at the mock
export OPENAI_BASE_URL=http://localhost:4010/v1
export OPENAI_API_KEY=mock-key
# Run your tests
pytest
# Pull and run (quote the volume path so directories with spaces work)
docker run -d -p 4010:4010 \
  -v "$(pwd)/fixtures:/fixtures" \
  ghcr.io/copilotkit/aimock:latest \
  -f /fixtures -h 0.0.0.0
# Point your Python app at the mock
export OPENAI_BASE_URL=http://localhost:4010/v1
export OPENAI_API_KEY=mock-key
# Run your tests
pytest
Docker is the recommended path for Python teams since it doesn't require Node.js in your development environment. Add the container to your docker-compose.yml or CI pipeline alongside your Python services.
Alternative: npx fixture (no Docker)
If Node.js is available in your environment, you can skip Docker entirely and use npx @copilotkit/aimock directly from your conftest.py.
import os
import subprocess
import time

import pytest
import requests

@pytest.fixture(scope="session")
def aimock_server():
    proc = subprocess.Popen(
        ["npx", "-p", "@copilotkit/aimock", "llmock", "-p", "4010", "-f", "./fixtures"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )

    # Wait for health endpoint; fail loudly if aimock never comes up
    for _ in range(30):
        if proc.poll() is not None:
            raise RuntimeError(f"aimock exited early with code {proc.returncode}")
        try:
            # Bound each probe so a hung connection cannot stall the loop
            if requests.get("http://localhost:4010/health", timeout=1).ok:
                break
        except requests.RequestException:
            pass
        time.sleep(0.2)
    else:
        proc.terminate()  # don't leave the server running on failure
        raise RuntimeError("aimock did not become healthy after 30 attempts")

    # Save originals so we don't clobber real credentials in the test process
    prev_base = os.environ.get("OPENAI_BASE_URL")
    prev_key = os.environ.get("OPENAI_API_KEY")
    os.environ["OPENAI_BASE_URL"] = "http://localhost:4010/v1"
    os.environ["OPENAI_API_KEY"] = "mock-key"

    try:
        yield "http://localhost:4010"
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait(timeout=5)
        # Restore originals (or remove if there were none)
        for name, val in (("OPENAI_BASE_URL", prev_base), ("OPENAI_API_KEY", prev_key)):
            if val is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = val
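The Docker and npx fixtures share the same health-polling loop, so it could be factored into a small helper. The version below is a sketch that uses only the standard library instead of requests. The name `wait_for_health` and its defaults are illustrative, and it leaves the check for an early process exit to the caller.

```python
import time
import urllib.request


def wait_for_health(url: str, attempts: int = 30, delay: float = 0.2,
                    timeout: float = 1.0) -> bool:
    """Poll url until it answers with a 2xx status or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections, and socket timeouts
            # all derive from OSError; treat each as "not ready yet".
            pass
        time.sleep(delay)
    return False
```

Inside a fixture this would be called as `wait_for_health("http://localhost:4010/health")`, raising `RuntimeError` when it returns `False`.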