# LlamaIndex
Test your LlamaIndex RAG pipelines end-to-end. aimock mocks both the LLM and the vector database — retriever and generator in one server.
## Quick Start
Point the LlamaIndex OpenAI LLM at aimock instead of the real API. No code changes to your RAG pipeline — just swap the base URL.
```python
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Point at aimock instead of api.openai.com
llm = OpenAI(
    api_base="http://localhost:4010/v1",
    api_key="test",
)

# Configure LlamaIndex to use aimock for both LLM and embeddings
Settings.llm = llm
Settings.embed_model = OpenAIEmbedding(api_base="http://localhost:4010/v1", api_key="test")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is gravity?")
```
Start aimock with fixtures that match the queries your pipeline will send:
```bash
npx aimock --fixtures ./fixtures/llamaindex
```
## Mock Both LLM and Vector DB
This is where aimock shines for RAG testing. A LlamaIndex RAG pipeline has two external dependencies: the retriever (vector database) and the generator (LLM). aimock serves both on one port, so a single server replaces Pinecone/Qdrant and OpenAI/Anthropic.
In `fixtures/rag-pipeline.json`, define the LLM and embedding fixtures:

```json
{
  "fixtures": [
    {
      "match": { "userMessage": "What is gravity?" },
      "response": {
        "content": "Based on the retrieved documents, gravity is a fundamental force of nature that attracts objects with mass toward one another. It is described by Newton's law of universal gravitation and Einstein's general theory of relativity."
      }
    },
    {
      "match": { "inputText": "What is gravity?", "endpoint": "embedding" },
      "response": {
        "embedding": [0.9, 0.1, 0.05]
      }
    }
  ]
}
```
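To build an intuition for how a request selects a fixture, here is a toy model of the matching step. It is illustrative only, not aimock's actual implementation, and it assumes `userMessage` is matched by substring the same way `inputText` is:

```python
# Toy model of fixture matching (illustrative only; the real matcher
# lives inside aimock). Assumes substring matching on userMessage.
fixtures = [
    {
        "match": {"userMessage": "What is gravity?"},
        "response": {"content": "Gravity is a fundamental force of nature."},
    },
]

def match_fixture(fixtures, user_message):
    """Return the response of the first fixture whose userMessage
    pattern appears inside the incoming message, else None."""
    for fixture in fixtures:
        pattern = fixture["match"].get("userMessage")
        if pattern and pattern in user_message:
            return fixture["response"]
    return None  # aimock would fall back to a default response here

response = match_fixture(fixtures, "What is gravity?")
print(response["content"])
```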
In `aimock.json`, wire up both the LLM fixtures and a vector collection:

```json
{
  "llm": {
    "fixtures": "./fixtures/rag-pipeline.json"
  },
  "vector": {
    "collections": [
      {
        "name": "knowledge-base",
        "dimension": 3,
        "vectors": [
          {
            "id": "doc-gravity",
            "values": [0.9, 0.1, 0.05],
            "metadata": { "source": "physics.pdf", "page": 12 }
          }
        ],
        "queryResults": [
          {
            "id": "doc-gravity",
            "score": 0.97,
            "metadata": { "source": "physics.pdf", "page": 12 }
          }
        ]
      }
    ]
  }
}
```
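A mismatch between a collection's declared `dimension` and its stored vectors is an easy config mistake to make. A quick sanity check you might run over a config like the one above (schema assumed from the example; aimock may well validate this itself):

```python
import json

# Sanity-check an aimock-style vector config: every stored vector's
# length should equal its collection's declared dimension.
config = json.loads("""
{
  "vector": {
    "collections": [
      {
        "name": "knowledge-base",
        "dimension": 3,
        "vectors": [
          { "id": "doc-gravity", "values": [0.9, 0.1, 0.05] }
        ]
      }
    ]
  }
}
""")

for coll in config["vector"]["collections"]:
    for vec in coll["vectors"]:
        assert len(vec["values"]) == coll["dimension"], (
            f'{vec["id"]}: {len(vec["values"])} dims, '
            f'expected {coll["dimension"]}'
        )
print("config OK")
```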
Load both with `npx aimock --config aimock.json`. The config points to the fixture file via `llm.fixtures`, so aimock handles both legs of the RAG pipeline:

- `/v1/chat/completions` — matches LLM fixtures for the generator
- `/vector` — serves vector query results for the retriever
```python
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Generator: LLM pointed at aimock
llm = OpenAI(
    api_base="http://localhost:4010/v1",
    api_key="test",
)

# Embeddings: also served by aimock
embed_model = OpenAIEmbedding(
    api_base="http://localhost:4010/v1",
    api_key="test",
)

# Retriever: aimock's vector endpoint.
# Point your vector store client at localhost:4010/vector —
# aimock implements the Qdrant-compatible REST API.
# Now your entire RAG pipeline runs against one mock server.
```
## Embedding Fixtures
LlamaIndex indexes documents by generating embeddings. Use `inputText` matching to return deterministic embedding vectors for specific inputs, ensuring your indexing and retrieval paths produce consistent results in tests.
```json
{
  "fixtures": [
    {
      "match": { "inputText": "What is gravity?", "endpoint": "embedding" },
      "response": {
        "embedding": [0.9, 0.1, 0.05]
      }
    },
    {
      "match": { "inputText": "Gravity is a fundamental force", "endpoint": "embedding" },
      "response": {
        "embedding": [0.88, 0.12, 0.07]
      }
    }
  ]
}
```
The `inputText` matcher performs substring matching, so `"gravity"` matches any input containing that word. Use exact strings when you need precision.
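Deterministic embeddings also make retrieval scoring predictable. The two fixture vectors above are deliberately close, so under cosine similarity (the default metric in most vector stores) the document chunk ranks highly against the query. A quick check by hand:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.9, 0.1, 0.05]    # "What is gravity?" fixture
chunk_vec = [0.88, 0.12, 0.07]  # "Gravity is a fundamental force" fixture

score = cosine_similarity(query_vec, chunk_vec)
print(f"{score:.4f}")  # very close to 1.0, so the chunk ranks highly
```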
## With aimock-pytest
The `aimock-pytest` plugin starts and stops the server automatically per test. Install it with `pip install aimock-pytest`.
```python
from llama_index.llms.openai import OpenAI

def test_rag_query(aimock):
    # Load fixtures before making LLM calls
    aimock.load_fixtures("./fixtures/llamaindex/rag.json")

    llm = OpenAI(
        api_base=f"{aimock.url}/v1",
        api_key="test",
    )
    response = llm.complete("What is gravity?")
    assert "force" in str(response).lower()
```
## CI with GitHub Action
Run your LlamaIndex test suite in CI with the aimock GitHub Action. The action starts aimock as a background service and exposes it on the default port.
```yaml
name: LlamaIndex Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start aimock
        uses: CopilotKit/aimock@v1
        with:
          fixtures: ./fixtures/llamaindex
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
        env:
          OPENAI_BASE_URL: http://127.0.0.1:4010/v1
          OPENAI_API_KEY: test
```
No API keys needed in CI. Your LlamaIndex pipeline talks to aimock, which returns deterministic responses from fixtures.
## Record & Replay
Record a RAG query end-to-end against real services, then replay it in tests. aimock captures both the LLM completions and the embedding calls, so the full pipeline is reproducible.
```bash
# Record LLM and embedding calls from a live session
npx aimock \
  --record \
  --provider-openai https://api.openai.com \
  --fixtures ./fixtures/llamaindex

# Run your LlamaIndex pipeline against aimock
python my_rag_pipeline.py

# aimock saves fixtures to ./fixtures/llamaindex/
# Next run replays them without hitting the real API
```
```python
from llama_index.llms.openai import OpenAI

def test_rag_query(aimock):
    # Load the recorded fixtures
    aimock.load_fixtures("./fixtures/llamaindex/recorded.json")

    llm = OpenAI(api_base=f"{aimock.url}/v1", api_key="test")
    # ... run your RAG pipeline, assert on results
```