TLDR
- Control the environment where integration tests run.
- Control the test data to ensure consistency and reliability.
- Test only what’s necessary; maintaining control is key.
- Focus on catching real breaking changes (prioritize usefulness over coverage percentage).
- To skip right to the sample code: fastapi-opensearch-integration-test
Introduction
I’ve worked with many codebases where testing becomes more about hitting quality gate numbers than actually catching breaking changes. In my view, it’s far better to have 25% code coverage with meaningful, maintainable tests than 100% coverage filled with fragile, low-value ones.
Here, I’ll focus on integration testing, which tends to get more complex than unit testing. These tests often work fine at first but start breaking down as systems grow. I’ll cover a few key issues I’ve seen repeatedly:
- Uncontrolled environments: A common mistake is running integration tests against a shared database where people are constantly adding, changing, or deleting data. This leads to flakiness because the environment isn’t stable.
- Uncontrolled data flow: Even in a controlled environment, you need to manage your data. In unit testing, you typically follow an arrange-act-assert pattern; for integration tests, the “arrange” step should include seeding your data in a predictable way (see the sketch after this list).
- Over-testing: Many teams try to test too much. If your system consists of a service and a database, focus on those. Once you start adding assertions for external calls, you’re no longer testing your system. If integration tests are a deployment gate and your authentication service goes down in the test environment, your entire deployment process can be blocked for reasons outside your control.
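Here’s a minimal sketch of that “arrange” step for an integration test, seeding a throwaway index directly with the OpenSearch client. The index name, document shape, and cleanup strategy are illustrative only, not part of the example project below:

# Minimal sketch of "arrange" as explicit seeding; index name, document shape,
# and cleanup are assumptions for illustration.
import pytest
from opensearchpy import OpenSearch

@pytest.fixture
def seeded_books():
    client = OpenSearch(hosts=["http://localhost:9200"])
    docs = [{"id": "test-1", "title": "Seeded Title", "category": "scifi"}]
    for doc in docs:
        # Arrange: put the exact documents this test depends on in place.
        client.index(index="books-test", id=doc["id"], body=doc, refresh=True)
    yield docs
    # Clean up so the next test starts from a known state.
    client.indices.delete(index="books-test", ignore=[404])

def test_search_returns_seeded_doc(seeded_books):
    client = OpenSearch(hosts=["http://localhost:9200"])
    # Act: query the system under test; Assert: only on data we seeded ourselves.
    resp = client.search(index="books-test", body={"query": {"match": {"category": "scifi"}}})
    assert resp["hits"]["total"]["value"] >= 1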
Example
In this example, we’ll use OpenSearch as the backend data store for a vector search index, and Python with uv and FastAPI for the service layer. Your datastore doesn’t need to be a search index or use vectors to apply the patterns discussed here; these are just the technologies I’m currently working with.
I’m less familiar with integration testing in Python than in other stacks. There may be more efficient ways to run these tests, but the focus here is on principles that apply regardless of language or framework.
Step 1: Prepare an Ephemeral OpenSearch
To spin up a local OpenSearch instance for testing, I added the following compose.yml file using Docker Compose:
services:
  opensearch:
    image: opensearchproject/opensearch:3
    container_name: opensearch-node
    environment:
      - discovery.type=single-node
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_PASSWORD}
      - DISABLE_SECURITY_PLUGIN=true
    ports:
      - 9200:9200
      - 9600:9600
    volumes:
      - opensearch-data:/usr/share/opensearch/data
volumes:
  opensearch-data:
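Locally, after bringing the stack up (for example, docker compose up -d with OPENSEARCH_PASSWORD set), a quick throwaway check confirms the node is reachable before moving on. This assumes the default port mapping above and that httpx (used later for the tests) is installed:

# Throwaway local check; assumes port 9200 is mapped and the security plugin is disabled.
import httpx

resp = httpx.get("http://localhost:9200", timeout=5.0)
resp.raise_for_status()
print(resp.json()["version"]["number"])  # prints the OpenSearch version the container is running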
After verifying this setup locally, I reused the same configuration in GitHub Actions:
services:
  opensearch:
    image: opensearchproject/opensearch:3.0.0
    ports:
      - 9200:9200
    env:
      discovery.type: single-node
      DISABLE_SECURITY_PLUGIN: "true"
      DISABLE_INSTALL_DEMO_CONFIG: "true"
      cluster.routing.allocation.disk.watermark.low: "95%"
      cluster.routing.allocation.disk.watermark.high: "97%"
      cluster.routing.allocation.disk.watermark.flood_stage: "98%"
    options: >-
      --health-cmd="curl -fsS http://localhost:9200 >/dev/null || exit 1"
      --health-interval=3s
      --health-retries=40
      --health-timeout=3s
Step 2: Create the Index
With OpenSearch running, the next step is to set up an index. In this example, we’ll use a simple index for books.
NOTE: This code defines an index schema for OpenSearch, but the same concept applies to creating a database schema or other backend storage structures.
We’ll use the opensearch-py package to create the index and wait until the cluster reaches a usable state (yellow or green):
# scripts/create_index.py
import os
import time
import json
from opensearchpy import OpenSearch, RequestsHttpConnection
from opensearchpy.helpers import bulk
from opensearch.schema import BOOKS_INDEX, BOOKS_INDEX_SCHEMA
from typing import Dict, Generator, Optional
from fastembed import TextEmbedding

OS_HOST = os.getenv("OS_HOST", "http://localhost:9200")
OS_USER = os.getenv("OS_USER", "admin")
OS_PASSWORD = os.getenv("OS_PASS", "admin")
# Parse the flag into a real boolean; os.getenv returns a string when the variable is set.
IS_LOCAL = os.getenv("IS_LOCAL", "false").lower() == "true"

def get_client() -> OpenSearch:
    return OpenSearch(
        hosts=[OS_HOST],
        http_auth=(OS_USER, OS_PASSWORD),
        use_ssl=IS_LOCAL,
        verify_certs=False,
        connection_class=RequestsHttpConnection,
        timeout=30,
        max_retries=3,
        retry_on_timeout=True,
    )

def wait_for_yellow(client: OpenSearch, seconds: int = 60) -> str:
    deadline = time.time() + seconds
    while time.time() < deadline:
        status = client.cluster.health().get("status")
        if status in {"yellow", "green"}:
            return status
        time.sleep(1)
    raise TimeoutError("Cluster did not reach yellow/green")

def create_index(client: OpenSearch, index: str, schema: dict) -> None:
    if client.indices.exists(index=index):
        return
    client.indices.create(index=index, body=schema)
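The BOOKS_INDEX and BOOKS_INDEX_SCHEMA imported above live in opensearch/schema.py, which isn’t shown here. Purely as an illustration of what such a mapping might contain (the sample repo is the source of truth), a books index with a 384-dimension knn_vector field to match bge-small-en-v1.5 and keyword fields for filtering could look like this:

# opensearch/schema.py: illustrative sketch, not necessarily the repo's actual mapping.
BOOKS_INDEX = "books"

BOOKS_INDEX_SCHEMA = {
    "settings": {
        "index": {
            "knn": True  # enable k-NN search on this index
        }
    },
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "isbn": {"type": "keyword"},
            "chapter": {"type": "integer"},
            "title": {"type": "text"},
            "chunk_text": {"type": "text"},
            "author": {"type": "keyword"},
            "category": {"type": "keyword"},
            "vector": {
                "type": "knn_vector",
                "dimension": 384  # matches BAAI/bge-small-en-v1.5 used in Step 4
            }
        }
    }
}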
Step 3: Create Sample Data
I generated a simple NDJSON file with about 20 documents. You can expand this over time or create multiple files for different test scenarios.
Here’s an example record:
{ "index": { "_index": "books", "_id": "9780000000001-1" } }
{ "id": "9780000000001-1", "isbn": "9780000000001", "chapter": 1, "title": "Orbital Shadows", "chunk_text": "Orbital Shadows — Chapter 1. The corridor hummed with a soft resonance.", "author": "Alex Stone", "category": "scifi", "vector": [0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08] }
NOTE: Ignore the vector field here; these placeholder values get regenerated in Step 4. If you’re not building a vectorized index, you can seed your data with simple text instead.
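If hand-writing NDJSON gets tedious, a small helper can emit the action/document line pairs in the bulk format shown above. The file path here is a placeholder and the single record reuses the sample above:

# Illustrative helper for producing bulk-format seed data; paths are hypothetical.
import json

def write_seed_file(path: str, docs: list[dict], index: str = "books") -> None:
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            # Each document is an action line followed by its source line.
            f.write(json.dumps({"index": {"_index": index, "_id": doc["id"]}}) + "\n")
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")

write_seed_file("data/books_seed.ndjson", [
    {"id": "9780000000001-1", "isbn": "9780000000001", "chapter": 1,
     "title": "Orbital Shadows", "chunk_text": "The corridor hummed with a soft resonance.",
     "author": "Alex Stone", "category": "scifi"},
])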
Step 4: Generate Embeddings
Next, we’ll add embeddings for each chunk in the sample file:
# scripts/create_index.py
def generate_embeddings(in_path: str, out_path: str, model_name: str = "BAAI/bge-small-en-v1.5") -> None:
    embedder = TextEmbedding(model_name=model_name)  # 384-D
    with open(in_path, "r", encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        while True:
            action = fin.readline()
            if not action:
                break
            doc = fin.readline()
            if not doc:
                break
            d = json.loads(doc)
            vec = next(iter(embedder.embed([d["chunk_text"]])))
            d["vector"] = [float(x) for x in vec]
            fout.write(action)
            fout.write(json.dumps(d, ensure_ascii=False) + "\n")
This step takes each chunk_text and adds embeddings. I used fastembed to keep integration tests fast and lightweight, but you can use a more robust model and generate embeddings offline if needed. The script outputs a new file with embedded data ready for indexing.
Step 5: Index Data
With our dataset ready, we can now index it into OpenSearch, either locally or in the CI/CD pipeline. This step is OpenSearch-specific, but the general pattern applies to any backend that supports bulk inserts.
# scripts/create_index.py
def index_ndjson(filepath: str, index_name: str, refresh: bool = True) -> tuple[int, int]:
    client = get_client()
    success, errors = bulk(client, generate_actions(filepath, index_name), request_timeout=60)
    if refresh:
        client.indices.refresh(index=index_name)
    error_count = len(errors) if isinstance(errors, list) else 0
    return success, error_count
This function bulk-loads the NDJSON data into the target index, optionally refreshing it afterward so it’s immediately searchable.
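Two pieces referenced above but not shown are the generate_actions helper and the script’s entry point that ties the steps together. A plausible sketch follows; the file paths and exact orchestration are placeholders, not the sample repo’s actual code:

# scripts/create_index.py (continued): sketch of the pieces not shown above;
# file paths and orchestration order are assumptions.
def generate_actions(filepath: str, index_name: str) -> Generator[Dict, None, None]:
    with open(filepath, "r", encoding="utf-8") as f:
        while True:
            action = f.readline()
            if not action:
                break
            doc = f.readline()
            if not doc:
                break
            d = json.loads(doc)
            # Yield one bulk action per document for opensearchpy.helpers.bulk.
            yield {"_index": index_name, "_id": d["id"], "_source": d}

if __name__ == "__main__":
    client = get_client()
    wait_for_yellow(client)
    create_index(client, BOOKS_INDEX, BOOKS_INDEX_SCHEMA)
    generate_embeddings("data/books_seed.ndjson", "data/books_seed_embedded.ndjson")
    success, errors = index_ndjson("data/books_seed_embedded.ndjson", BOOKS_INDEX)
    print(f"Indexed {success} docs, {errors} errors")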
Step 6: Add Seed Data Through GitHub Actions
With the code in place, we can now use GitHub Actions to seed data into our ephemeral OpenSearch instance.
steps:
  - name: Check out repository
    uses: actions/checkout@v4
  - name: Set up uv
    uses: astral-sh/setup-uv@v3
    with:
      enable-cache: true
  - name: Install deps
    run: uv sync
  - name: Create index
    env:
      OS_HOST: "http://localhost:9200"
    run: |
      # Ensure your package imports work when running a script directly
      export PYTHONPATH="$PWD"
      uv run python scripts/create_index.py
This workflow checks out the repository, installs dependencies, and runs the index creation script, which creates the index and seeds the sample data into the ephemeral OpenSearch instance spun up in your CI environment.
Step 7: Write Integration Tests
We’re using pytest for integration testing. I prefer keeping things simple by using the same framework for both unit and integration tests when possible.
# tests/conftest.py
import os
import pytest_asyncio
import httpx

API = os.getenv("API_BASE_URL", "http://127.0.0.1:8000")

@pytest_asyncio.fixture
async def client():
    async with httpx.AsyncClient(base_url=API, timeout=15.0) as c:
        yield c
# tests/test_vector_search.py
import pytest

@pytest.mark.asyncio
async def test_vector_search_with_category_filter(client):
    payload = {
        "query": "derelict space station",
        "filters": {"category": "scifi"},
        "size": 3,
        "num_candidates": 100
    }
    r = await client.post("/vector-search", json=payload)
    assert r.status_code == 200, r.text
    data = r.json()
    assert "results" in data
    assert len(data["results"]) >= 1
    assert all(item["source"]["category"] == "scifi" for item in data["results"])

@pytest.mark.asyncio
async def test_vector_search_with_author_filter(client):
    payload = {
        "query": "echoes and memories",
        "filters": {"author": "Alex Stone"},
        "size": 2,
        "num_candidates": 100
    }
    r = await client.post("/vector-search", json=payload)
    assert r.status_code == 200
    data = r.json()
    # This query may legitimately return no hits; assert only on shape and filtering.
    assert "results" in data
    if data["results"]:
        assert all(hit["source"]["author"] == "Alex Stone" for hit in data["results"])

@pytest.mark.asyncio
async def test_bad_isbn_rejected(client):
    payload = {
        "query": "anything",
        "filters": {"isbn": "bad-isbn"},
        "size": 1,
        "num_candidates": 10
    }
    r = await client.post("/vector-search", json=payload)
    assert r.status_code in (400, 422)
This setup keeps your integration tests lightweight and readable. Using pytest with httpx.AsyncClient allows async tests to run cleanly against your running API.
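For context, the /vector-search endpoint these tests hit lives in the FastAPI app and isn’t shown in this post. In rough outline, it embeds the query text and runs a filtered k-NN search. The request model, validation rule, and query shape below are assumptions; the sample repo is the source of truth:

# main.py: rough sketch of the endpoint under test; model names, validation
# rules, and query shape are assumptions, not the sample repo's exact code.
import os
from typing import Dict, Optional

from fastapi import FastAPI, HTTPException
from fastembed import TextEmbedding
from opensearchpy import OpenSearch
from pydantic import BaseModel

app = FastAPI()
embedder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
client = OpenSearch(hosts=[os.getenv("OS_HOST", "http://localhost:9200")])

class VectorSearchRequest(BaseModel):
    query: str
    filters: Optional[Dict[str, str]] = None
    size: int = 5
    num_candidates: int = 100

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/vector-search")
def vector_search(req: VectorSearchRequest):
    # Hypothetical validation so malformed filters fail fast (mirrors the bad-ISBN test).
    if req.filters and "isbn" in req.filters and not req.filters["isbn"].isdigit():
        raise HTTPException(status_code=400, detail="invalid isbn")
    vector = [float(x) for x in next(iter(embedder.embed([req.query])))]
    knn_clause = {"knn": {"vector": {"vector": vector, "k": req.size}}}
    query = knn_clause
    if req.filters:
        # Combine the k-NN clause with exact-match filters on keyword fields.
        query = {
            "bool": {
                "must": [knn_clause],
                "filter": [{"term": {k: v}} for k, v in req.filters.items()],
            }
        }
    resp = client.search(index="books", body={"size": req.size, "query": query})
    return {"results": [{"score": h["_score"], "source": h["_source"]}
                        for h in resp["hits"]["hits"]]}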
Step 8: Enhance Our GitHub Actions to Run Integration Tests
name: CI
on:
  push:
    branches:
      - main
  pull_request:
jobs:
  integration-tests:
    runs-on: ubuntu-latest
    services:
      opensearch:
        image: opensearchproject/opensearch:3.0.0
        ports:
          - 9200:9200
        env:
          discovery.type: single-node
          DISABLE_SECURITY_PLUGIN: "true"
          DISABLE_INSTALL_DEMO_CONFIG: "true"
          cluster.routing.allocation.disk.watermark.low: "95%"
          cluster.routing.allocation.disk.watermark.high: "97%"
          cluster.routing.allocation.disk.watermark.flood_stage: "98%"
        options: >-
          --health-cmd="curl -fsS http://localhost:9200 >/dev/null || exit 1"
          --health-interval=3s
          --health-retries=40
          --health-timeout=3s
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up uv
        uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true
      - name: Install deps
        run: uv sync --extra test
      - name: Create index
        env:
          OS_HOST: "http://localhost:9200"
        run: |
          # Ensure your package imports work when running a script directly
          export PYTHONPATH="$PWD"
          uv run python scripts/create_index.py
      - name: Start FastAPI (background)
        env:
          OS_HOST: "http://localhost:9200"
        run: |
          set -e
          uv run uvicorn main:app --host 0.0.0.0 --port 8000 > server.log 2>&1 &
          echo $! > uvicorn.pid
          for i in {1..60}; do
            if curl -fsS "http://127.0.0.1:8000/health" >/dev/null; then
              echo "FastAPI is up"
              break
            fi
            sleep 1
          done
      - name: Run Integration Tests
        run: uv run pytest -q
      - name: Stop FastAPI
        if: always()
        run: |
          if [ -f uvicorn.pid ]; then kill "$(cat uvicorn.pid)" || true; fi
Other Enhancements / Patterns
You can easily make this setup configurable to run certain tests against a live environment for sanity or smoke checks. The main advantage of this approach is that you don’t need to wait for full infrastructure to come online before running integration tests. You’re still testing a realistic setup: OpenSearch running in Docker, your actual index mapping, and seeded test documents. I prefer this over using live environments, since it reduces the ongoing maintenance of managing test data over time.
If you do test against a live environment, keep it limited to basic sanity checks; production or shared data is rarely consistent. Ideally, use a dedicated space (such as a separate index or database schema) to maintain control. If that’s not possible, create specific test data for these runs and clean up afterward to prevent your datastore from growing unnecessarily over time.
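One simple way to wire up that kind of configurable run is an environment flag plus a pytest marker, so only a small, explicitly marked subset executes against a live target. The SMOKE_ONLY variable and “smoke” marker below are just one possible convention, not part of the sample repo:

# tests/conftest.py additions: one possible convention for switching targets;
# the SMOKE_ONLY flag and "smoke" marker name are assumptions.
import os
import pytest

def pytest_configure(config):
    config.addinivalue_line("markers", "smoke: minimal checks safe to run against a live environment")

def pytest_collection_modifyitems(config, items):
    # When pointed at a live environment, run only tests explicitly marked as smoke.
    if os.getenv("SMOKE_ONLY", "false").lower() == "true":
        skip = pytest.mark.skip(reason="only smoke tests run against live environments")
        for item in items:
            if "smoke" not in item.keywords:
                item.add_marker(skip)

Combined with the API_BASE_URL variable the client fixture already reads, the same suite targets the ephemeral setup by default and a live environment for occasional smoke runs.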
Conclusion
We now have a fully functional integration testing setup that runs directly in our CI/CD pipeline with minimal effort. Remember that simplicity scales; before building a complex testing framework, consider what it will be like to maintain hundreds of tests over time.