CrewAI

Test your CrewAI crews without API keys. Each agent in a crew makes its own LLM calls — aimock handles them all with fixture-based responses.

Quick Start

CrewAI agents make OpenAI-compatible LLM calls by default. Point them at aimock and every agent in your crew will send requests to the mock server instead of the real API. The recommended approach is to use CrewAI's LLM class with an explicit base_url, shown in the examples below.

Start aimock, then run the crew (shell)
# Terminal 1 — start the mock server
npx aimock --fixtures ./fixtures

# Terminal 2 — run your CrewAI script
export OPENAI_BASE_URL=http://localhost:4010/v1
export OPENAI_API_KEY=test
python crew.py
crew.py (python)
from crewai import Agent, Task, Crew, LLM

# Recommended: use the LLM class with an explicit base_url
llm = LLM(
    model="openai/gpt-4o",
    base_url="http://localhost:4010/v1",
    api_key="test",
)

researcher = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    llm=llm,
)

task = Task(
    description="Research the history of testing",
    expected_output="A short summary",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
print(result)
Environment variables vs LLM(base_url=...) — Setting OPENAI_BASE_URL works when agents use the default OpenAI provider, but the LLM(base_url=...) approach is more reliable across all configurations and is the recommended way to point CrewAI at aimock.
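If you do use the environment-variable route, a common pattern is to resolve the endpoint once and fall back to the real API when the variables are unset. A minimal sketch in plain Python — the fallback URL and the `llm_config` helper name are illustrative, not part of CrewAI or aimock:

```python
import os

def llm_config() -> dict:
    """Resolve LLM connection settings, preferring environment overrides."""
    return {
        "model": "openai/gpt-4o",
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "api_key": os.environ.get("OPENAI_API_KEY", "test"),
    }

# With OPENAI_BASE_URL exported as in the Quick Start, tests hit aimock:
os.environ["OPENAI_BASE_URL"] = "http://localhost:4010/v1"
print(llm_config()["base_url"])
```

Passing the resulting dict to `LLM(**llm_config())` lets the same crew code run against aimock locally and the real API in production.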

With aimock-pytest

The aimock-pytest plugin starts and stops the server automatically per test, so you never need to manage a background process.

Install (shell)
pip install aimock-pytest
conftest.py (python)
import pytest

@pytest.fixture(autouse=True)
def mock_llm(aimock, monkeypatch):
    """aimock-pytest provides the `aimock` fixture automatically.
    Because this wrapper is autouse, every test gets a fresh server
    (the fixture is function-scoped). Set OPENAI_BASE_URL yourself so
    CrewAI agents route to aimock; monkeypatch restores the original
    environment after each test."""
    monkeypatch.setenv("OPENAI_BASE_URL", aimock.url + "/v1")
    monkeypatch.setenv("OPENAI_API_KEY", "test")
    aimock.load_fixtures("./fixtures/crewai-crew.json")
    yield aimock
test_crew.py (python)
from crewai import Agent, Task, Crew

def test_researcher_crew():
    researcher = Agent(
        role="Researcher",
        goal="Research topics",
        backstory="Expert researcher",
    )
    task = Task(
        description="Summarize recent AI breakthroughs",
        expected_output="A short summary",
        agent=researcher,
    )
    crew = Crew(agents=[researcher], tasks=[task])
    result = crew.kickoff()
    assert "AI" in str(result)

Multi-Agent Crews

In a CrewAI crew, each agent makes independent LLM calls. The researcher agent sends its own chat completion request, then the writer agent sends a separate one. Because aimock matches on the userMessage field, you can write fixtures that target each agent's prompt pattern independently.

fixtures/crewai-crew.json (json)
{
  "fixtures": [
    {
      "match": { "userMessage": "research" },
      "response": {
        "content": "Based on my research, the key findings are:\n\n1. LLM testing with fixture-based mocks eliminates flaky tests caused by non-deterministic API responses.\n2. Proxy recording captures real interactions for replay in CI without API keys.\n3. Multi-agent frameworks like CrewAI benefit most because each agent multiplies the number of LLM calls per run."
      }
    },
    {
      "match": { "userMessage": "write" },
      "response": {
        "content": "# Testing LLMs in CI\n\nFixture-based mocking brings determinism to AI-powered applications. By replacing real API calls with recorded responses, teams ship faster with confidence.\n\n## Why It Matters\n\nEvery agent in a CrewAI crew makes independent LLM calls. Without mocking, a two-agent crew means two sources of non-determinism per run. With aimock, every call returns the exact same response every time."
      }
    }
  ]
}
Two-agent crew (python)
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Researcher",
    goal="Research topics thoroughly",
    backstory="Senior research analyst",
)

writer = Agent(
    role="Writer",
    goal="Write compelling articles",
    backstory="Technical content writer",
)

research_task = Task(
    description="Research LLM testing best practices",
    expected_output="Key findings as bullet points",
    agent=researcher,
)

write_task = Task(
    description="Write a blog post from the research",
    expected_output="A short blog post in markdown",
    agent=writer,
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,
)
result = crew.kickoff()

The researcher's prompt contains "research", matching the first fixture. The writer's prompt contains "write", matching the second. Each agent gets its own deterministic response.

Tool Calls

CrewAI agents can use tools. When an agent has tools attached, CrewAI includes their definitions in the chat completion request and acts on any tool calls in the response. aimock fixtures produce tool-call responses via the toolCalls response field.

fixtures/crewai-tools.json (json)
{
  "fixtures": [
    {
      "match": { "userMessage": "search", "sequenceIndex": 0 },
      "response": {
        "toolCalls": [
          {
            "name": "web_search",
            "arguments": "{\"query\": \"LLM testing frameworks 2025\"}"
          }
        ]
      }
    },
    {
      "match": { "userMessage": "search", "sequenceIndex": 1 },
      "response": {
        "content": "Based on the search results, the top LLM testing frameworks are aimock, promptfoo, and deepeval."
      }
    }
  ]
}
Agent with tools (python)
from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool

search_tool = SerperDevTool()

researcher = Agent(
    role="Researcher",
    goal="Search the web for information",
    backstory="Expert web researcher",
    tools=[search_tool],
)

task = Task(
    description="Search for the latest LLM testing frameworks",
    expected_output="A ranked list of frameworks",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()

The first fixture triggers the tool call. After CrewAI processes the tool result and sends it back to the LLM, the second fixture matches the follow-up message and returns the final answer.

CI with GitHub Action

Use the CopilotKit/aimock GitHub Action to run aimock as a background service in your CI pipeline.

.github/workflows/test.yml (yaml)
name: Test CrewAI Crew
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - uses: CopilotKit/aimock@v1
        with:
          fixtures: ./fixtures
      - run: pip install crewai pytest aimock-pytest
      - run: pytest
        env:
          OPENAI_BASE_URL: http://127.0.0.1:4010/v1
          OPENAI_API_KEY: test

The action starts aimock on port 4010, loads your fixtures, and keeps the server running for the duration of the job. No real API keys needed.

Record & Replay

Record a full crew execution against a real LLM, then replay it deterministically in tests and CI. This is especially useful for capturing complex multi-agent interactions.

Record a crew run (shell)
# Start aimock in record mode — unmatched requests go to OpenAI
npx aimock --fixtures ./fixtures \
  --record \
  --provider-openai https://api.openai.com

# Run your crew with the real API key (proxied through aimock)
export OPENAI_BASE_URL=http://localhost:4010/v1
export OPENAI_API_KEY=sk-your-real-key
python crew.py

# New fixtures appear in ./fixtures/recorded/
# Commit them to your repo for deterministic replay

On subsequent runs without --record, aimock replays the recorded fixtures. Every agent in the crew gets the exact same response it received during the original recording, making your tests fully reproducible.
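In CI it can help to fail fast when recorded fixtures were never committed, instead of letting unmatched requests error mid-run. A minimal sketch, assuming every *.json file under ./fixtures/recorded/ is a recorded fixture (the `recorded_fixtures` helper is illustrative, not part of aimock):

```python
from pathlib import Path

def recorded_fixtures(root: str = "fixtures/recorded") -> list[str]:
    """List recorded fixture filenames, sorted for a stable replay order."""
    return sorted(p.name for p in Path(root).glob("*.json"))

# Run this as a pre-test step; an empty list means recording never happened.
if not recorded_fixtures():
    print("No recorded fixtures found - run the crew once with --record")
```

Wiring a check like this into the test suite (or the workflow above) turns a missing-fixtures mistake into one clear message rather than a cascade of unmatched-request failures.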