# LlamaIndex
Test your LlamaIndex RAG pipelines end-to-end. aimock mocks both the LLM and the vector database — retriever and generator in one server.
## Quick Start
Point the LlamaIndex OpenAI LLM at aimock instead of the real API. No code changes to your RAG pipeline — just swap the base URL.
```python
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Point at aimock instead of api.openai.com
llm = OpenAI(
    api_base="http://localhost:4010/v1",
    api_key="test",
)

# Configure LlamaIndex to use aimock for both LLM and embeddings
Settings.llm = llm
Settings.embed_model = OpenAIEmbedding(api_base="http://localhost:4010/v1", api_key="test")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is gravity?")
```
Start aimock with fixtures that match the queries your pipeline will send:
```bash
npx aimock --fixtures ./fixtures/llamaindex
```
## Mock Both LLM and Vector DB
This is where aimock shines for RAG testing. A LlamaIndex RAG pipeline has two external dependencies: the retriever (vector database) and the generator (LLM). aimock serves both on one port, so a single server replaces Pinecone/Qdrant and OpenAI/Anthropic.
In `fixtures/rag-pipeline.json`, define the LLM and embedding fixtures:

```json
{
  "fixtures": [
    {
      "match": { "userMessage": "What is gravity?" },
      "response": {
        "content": "Based on the retrieved documents, gravity is a fundamental force of nature that attracts objects with mass toward one another. It is described by Newton's law of universal gravitation and Einstein's general theory of relativity."
      }
    },
    {
      "match": { "inputText": "What is gravity?", "endpoint": "embedding" },
      "response": {
        "embedding": [0.9, 0.1, 0.05]
      }
    }
  ]
}
```
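To build an intuition for how a request selects a fixture, here is a toy model of the matching step. It is illustrative only, not aimock's actual implementation, and it assumes `userMessage` is matched by substring the same way `inputText` is:

```python
# Toy model of fixture matching (illustrative only; the real matcher
# lives inside aimock). Assumes substring matching on userMessage.
fixtures = [
    {
        "match": {"userMessage": "What is gravity?"},
        "response": {"content": "Gravity is a fundamental force of nature."},
    },
]

def match_fixture(fixtures, user_message):
    """Return the response of the first fixture whose userMessage
    pattern appears inside the incoming message, else None."""
    for fixture in fixtures:
        pattern = fixture["match"].get("userMessage")
        if pattern and pattern in user_message:
            return fixture["response"]
    return None  # aimock would fall back to a default response here

response = match_fixture(fixtures, "What is gravity?")
print(response["content"])
```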
In `aimock.json`, wire up both the LLM fixtures and a vector collection:

```json
{
  "llm": {
    "fixtures": "./fixtures/rag-pipeline.json"
  },
  "vector": {
    "collections": [
      {
        "name": "knowledge-base",
        "dimension": 3,
        "vectors": [
          {
            "id": "doc-gravity",
            "values": [0.9, 0.1, 0.05],
            "metadata": { "source": "physics.pdf", "page": 12 }
          }
        ],
        "queryResults": [
          {
            "id": "doc-gravity",
            "score": 0.97,
            "metadata": { "source": "physics.pdf", "page": 12 }
          }
        ]
      }
    ]
  }
}
```
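A mismatch between a collection's declared `dimension` and its stored vectors is an easy config mistake to make. A quick sanity check you might run over a config like the one above (schema assumed from the example; aimock may well validate this itself):

```python
import json

# Sanity-check an aimock-style vector config: every stored vector's
# length should equal its collection's declared dimension.
config = json.loads("""
{
  "vector": {
    "collections": [
      {
        "name": "knowledge-base",
        "dimension": 3,
        "vectors": [
          { "id": "doc-gravity", "values": [0.9, 0.1, 0.05] }
        ]
      }
    ]
  }
}
""")

for coll in config["vector"]["collections"]:
    for vec in coll["vectors"]:
        assert len(vec["values"]) == coll["dimension"], (
            f'{vec["id"]}: {len(vec["values"])} dims, '
            f'expected {coll["dimension"]}'
        )
print("config OK")
```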
Load both with `npx aimock --config aimock.json`. The config points to the fixture file via `llm.fixtures`, so aimock handles both legs of the RAG pipeline:

- `/v1/chat/completions` — matches LLM fixtures for the generator
- `/vector` — serves vector query results for the retriever
```python
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Generator: LLM pointed at aimock
llm = OpenAI(
    api_base="http://localhost:4010/v1",
    api_key="test",
)

# Embeddings: also served by aimock
embed_model = OpenAIEmbedding(
    api_base="http://localhost:4010/v1",
    api_key="test",
)

# Retriever: aimock's vector endpoint.
# Point your vector store client at localhost:4010/vector —
# aimock implements the Qdrant-compatible REST API.
# Now your entire RAG pipeline runs against one mock server.
```
## Embedding Fixtures
LlamaIndex indexes documents by generating embeddings. Use `inputText` matching to return deterministic embedding vectors for specific inputs, ensuring your indexing and retrieval paths produce consistent results in tests.
```json
{
  "fixtures": [
    {
      "match": { "inputText": "What is gravity?", "endpoint": "embedding" },
      "response": {
        "embedding": [0.9, 0.1, 0.05]
      }
    },
    {
      "match": { "inputText": "Gravity is a fundamental force", "endpoint": "embedding" },
      "response": {
        "embedding": [0.88, 0.12, 0.07]
      }
    }
  ]
}
```
The `inputText` matcher performs substring matching, so `"gravity"` matches any input containing that word. Use exact strings when you need precision.
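Deterministic embeddings also make retrieval scoring predictable. The two fixture vectors above are deliberately close, so under cosine similarity (the default metric in most vector stores) the document chunk ranks highly against the query. A quick check by hand:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.9, 0.1, 0.05]    # "What is gravity?" fixture
chunk_vec = [0.88, 0.12, 0.07]  # "Gravity is a fundamental force" fixture

score = cosine_similarity(query_vec, chunk_vec)
print(f"{score:.4f}")  # very close to 1.0, so the chunk ranks highly
```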
## With aimock-pytest
The `aimock-pytest` plugin starts and stops the server automatically per test. Install it with `pip install aimock-pytest`.
```python
from llama_index.llms.openai import OpenAI

def test_rag_query(aimock):
    # Load fixtures before making LLM calls
    aimock.load_fixtures("./fixtures/llamaindex/rag.json")

    llm = OpenAI(
        api_base=f"{aimock.url}/v1",
        api_key="test",
    )
    response = llm.complete("What is gravity?")
    assert "force" in str(response).lower()
```
## CI with GitHub Action
Run your LlamaIndex test suite in CI with the aimock GitHub Action. The action starts aimock as a background service and exposes it on the default port.
```yaml
name: LlamaIndex Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start aimock
        uses: CopilotKit/aimock@v1
        with:
          fixtures: ./fixtures/llamaindex
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
        env:
          OPENAI_BASE_URL: http://127.0.0.1:4010/v1
          OPENAI_API_KEY: test
```
No API keys needed in CI. Your LlamaIndex pipeline talks to aimock, which returns deterministic responses from fixtures.
## Record & Replay
Record a RAG query end-to-end against real services, then replay it in tests. aimock captures both the LLM completions and the embedding calls, so the full pipeline is reproducible.
```bash
# Record LLM and embedding calls from a live session
npx aimock \
  --record \
  --provider-openai https://api.openai.com \
  --fixtures ./fixtures/llamaindex

# Run your LlamaIndex pipeline against aimock
python my_rag_pipeline.py

# aimock saves fixtures to ./fixtures/llamaindex/
# Next run replays them without hitting the real API
```
```python
from llama_index.llms.openai import OpenAI

def test_rag_query(aimock):
    # Load the recorded fixtures
    aimock.load_fixtures("./fixtures/llamaindex/recorded.json")

    llm = OpenAI(api_base=f"{aimock.url}/v1", api_key="test")
    # ... run your RAG pipeline, assert on results
```