Before BugViper can review pull requests or answer questions about your codebase, it needs to index the repository. Indexing clones the repository, parses every source file using Tree-sitter, and writes the result into a Neo4j knowledge graph. Once indexed, BugViper can search your code in milliseconds, trace call chains, detect security issues in PRs, and answer natural-language questions about your entire codebase.
What happens during ingestion
When you submit a repository for indexing, BugViper runs the following pipeline:
- Clone — BugViper downloads the repository from GitHub at the specified branch.
- Parse — Tree-sitter parsers run against all recognized source files. BugViper supports 17 languages out of the box.
- Graph write — The parser extracts Function, Class, Variable, File, Module, and Repository nodes and writes them to Neo4j along with relationships (CONTAINS, DEFINES, CALLS, IMPORTS, INHERITS). Cyclomatic complexity is calculated and stored on every Function node at this stage.
- Embed (optional) — If an OpenRouter API key is configured, BugViper batch-embeds all nodes using text-embedding-3-small and stores 1536-dimension vectors in Neo4j vector indexes for semantic search.
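The graph produced by the pipeline above can be pictured as a small set of labeled nodes plus typed relationships. A hedged sketch in Python — the node labels and relationship types come from the list above, but the dict representation and example names are illustrative, not BugViper's actual Neo4j schema:

```python
# Illustrative shape of the knowledge graph after ingestion.
# Labels and relationship types are from the docs; the data is made up.
nodes = [
    {"label": "Repository", "name": "acme-corp/my-api"},
    {"label": "File", "name": "app/auth.py"},
    {"label": "Class", "name": "AuthService"},
    {"label": "Function", "name": "login", "cyclomatic_complexity": 4},
]
relationships = [
    ("acme-corp/my-api", "CONTAINS", "app/auth.py"),
    ("app/auth.py", "DEFINES", "AuthService"),
    ("AuthService", "DEFINES", "login"),
]

# Every Function node carries a cyclomatic-complexity score,
# computed during the graph-write stage.
functions = [n for n in nodes if n["label"] == "Function"]
print(functions[0]["cyclomatic_complexity"])  # 4
```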
Ingest a repository via the API
Send a POST request to /api/v1/ingest/github with the repository details. All API requests require a Firebase ID token in the Authorization header.
```bash
curl -X POST https://your-bugviper-instance/api/v1/ingest/github \
  -H "Authorization: Bearer YOUR_FIREBASE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "owner": "acme-corp",
    "repo_name": "my-api",
    "branch": "main",
    "clear_existing": false
  }'
```
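The same request can be built from Python using only the standard library. A sketch: the helper name, base URL, and token are placeholders, and the request is constructed but not sent, so the snippet runs without a live instance:

```python
import json
import urllib.request

def build_ingest_request(base_url, token, owner, repo_name,
                         branch="main", clear_existing=False):
    """Build (but do not send) the POST to /api/v1/ingest/github."""
    body = json.dumps({
        "owner": owner,
        "repo_name": repo_name,
        "branch": branch,
        "clear_existing": clear_existing,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/ingest/github",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # Firebase ID token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_ingest_request("https://your-bugviper-instance",
                           "YOUR_FIREBASE_TOKEN", "acme-corp", "my-api")
print(req.full_url)
# Send with urllib.request.urlopen(req) against a real instance.
```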
The API responds immediately with a job ID you can use to track progress:
```json
{
  "job_id": "3f2a1b4c-8d9e-4f0a-b1c2-d3e4f5a6b7c8",
  "status": "pending",
  "message": "Ingestion started for acme-corp/my-api",
  "poll_url": "/api/v1/ingest/jobs/3f2a1b4c-8d9e-4f0a-b1c2-d3e4f5a6b7c8"
}
```
If an ingestion job is already running for the same repository, BugViper returns the existing job’s ID and status instead of starting a duplicate. Jobs that have been stuck in pending or dispatched for more than 10 minutes are automatically marked as failed so you can resubmit.
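The 10-minute staleness rule can be expressed as a small predicate. A sketch — the field names mirror the job object returned by the API, but the helper itself is hypothetical, not part of BugViper:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)

def is_stale(job, now=None):
    """True if a job has sat in pending/dispatched for more than
    10 minutes, i.e. BugViper would mark it failed for resubmission."""
    if job["status"] not in ("pending", "dispatched"):
        return False
    created = datetime.fromisoformat(job["created_at"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - created > STALE_AFTER

job = {"status": "pending", "created_at": "2026-04-10T10:00:00Z"}
print(is_stale(job, now=datetime(2026, 4, 10, 10, 15, tzinfo=timezone.utc)))  # True
print(is_stale(job, now=datetime(2026, 4, 10, 10, 5, tzinfo=timezone.utc)))   # False
```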
Setting "clear_existing": true deletes the entire knowledge graph for that repository before re-ingesting. All previously stored nodes, relationships, and embeddings are permanently removed. Use this only when you need a clean re-index — for example, after a major refactor where stale nodes would otherwise remain in the graph.
Poll for ingestion progress
Ingestion runs in the background. Poll the job endpoint with the job_id from the previous response to check progress:
```bash
curl https://your-bugviper-instance/api/v1/ingest/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_FIREBASE_TOKEN"
```
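A polling loop only needs to stop on the two terminal states, completed and failed. A sketch with the HTTP call injected as a function so the loop runs offline — fetch_job is a stand-in for the GET request above:

```python
import time

TERMINAL = {"completed", "failed"}

def wait_for_job(fetch_job, job_id, interval=5.0, timeout=600.0):
    """Poll fetch_job(job_id) until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")

# Fake fetcher standing in for the curl request above.
responses = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "completed", "stats": {"files_processed": 84}},
])
done = wait_for_job(lambda _id: next(responses), "JOB_ID", interval=0.01)
print(done["status"])  # completed
```

In a real client, fetch_job would issue the authenticated GET and return the decoded JSON body.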
A job in progress looks like this:
```json
{
  "job_id": "3f2a1b4c-8d9e-4f0a-b1c2-d3e4f5a6b7c8",
  "owner": "acme-corp",
  "repo_name": "my-api",
  "branch": "main",
  "status": "running",
  "created_at": "2026-04-10T10:00:00Z",
  "updated_at": "2026-04-10T10:01:23Z",
  "started_at": "2026-04-10T10:00:05Z",
  "completed_at": null,
  "stats": null,
  "error_message": null
}
```
The status field moves through the following states:
| Status | Meaning |
|---|---|
| pending | Job is queued and waiting to start. |
| running | Ingestion is actively processing the repository. |
| completed | Ingestion finished successfully. |
| failed | Ingestion encountered an unrecoverable error. Check error_message. |
Ingestion stats
When a job completes, the stats object is populated with a summary of everything BugViper found in the repository:
```json
{
  "status": "completed",
  "stats": {
    "files_processed": 84,
    "files_skipped": 12,
    "classes_found": 47,
    "functions_found": 312,
    "imports_found": 209,
    "total_lines": 18540,
    "errors": [],
    "embedding_status": "completed",
    "nodes_embedded": 359,
    "embedding_error": null
  }
}
```
| Field | Description |
|---|---|
| files_processed | Source files successfully parsed and written to the graph. |
| files_skipped | Files ignored because their language is unsupported or they exceeded size limits. |
| classes_found | Total Class nodes created across all processed files. |
| functions_found | Total Function nodes created, including class methods. |
| imports_found | Total Module import relationships recorded. |
| total_lines | Aggregate line count across all processed files. |
| embedding_status | completed if vectors were generated, skipped if embedding was not run, or failed if it errored. |
| nodes_embedded | Number of nodes that received vector embeddings. |
| embedding_error | Error detail when embedding_status is failed; otherwise null. |
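The stats object lends itself to a quick post-run sanity check, such as computing parse coverage and flagging embedding failures. A hedged sketch — summarize_stats is an illustrative helper, not part of BugViper:

```python
def summarize_stats(stats):
    """Reduce an ingestion stats object to a few health indicators."""
    total = stats["files_processed"] + stats["files_skipped"]
    coverage = stats["files_processed"] / total if total else 0.0
    return {
        "coverage": round(coverage, 3),           # share of files parsed
        "definitions": stats["classes_found"] + stats["functions_found"],
        "embedding_ok": stats["embedding_status"] == "completed",
        "errors": len(stats["errors"]),
    }

# Stats values from the example response above.
stats = {
    "files_processed": 84, "files_skipped": 12,
    "classes_found": 47, "functions_found": 312,
    "imports_found": 209, "total_lines": 18540,
    "errors": [], "embedding_status": "completed",
    "nodes_embedded": 359, "embedding_error": None,
}
print(summarize_stats(stats))
# {'coverage': 0.875, 'definitions': 359, 'embedding_ok': True, 'errors': 0}
```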
Add semantic search after ingestion
If embedding was skipped during the initial ingest — or if it failed partway through — you can trigger a standalone embedding pass without re-cloning the repository:
```bash
curl -X POST https://your-bugviper-instance/api/v1/ingest/acme-corp/my-api/embed \
  -H "Authorization: Bearer YOUR_FIREBASE_TOKEN"
```
This endpoint is safe to call multiple times. Nodes that already have embeddings are skipped, so only new or un-embedded nodes are processed.
```json
{
  "repo_id": "acme-corp/my-api",
  "nodes_embedded": 147,
  "breakdown": {
    "Function": 98,
    "Class": 31,
    "Variable": 18
  },
  "message": "Embedded 147 nodes successfully"
}
```
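The idempotency described above amounts to filtering out nodes that already carry a vector before batching the rest. A sketch of that selection step — the node shape and helper are illustrative, not BugViper internals:

```python
def select_unembedded(nodes):
    """Keep only nodes with no stored vector, mirroring the endpoint's
    skip-already-embedded behavior (illustrative sketch)."""
    return [n for n in nodes if n.get("embedding") is None]

nodes = [
    {"id": "fn:login", "label": "Function", "embedding": [0.1, 0.2]},
    {"id": "fn:logout", "label": "Function", "embedding": None},
    {"id": "cls:AuthService", "label": "Class"},  # never embedded
]
todo = select_unembedded(nodes)
print([n["id"] for n in todo])  # ['fn:logout', 'cls:AuthService']
```

Because already-embedded nodes are excluded, repeated calls converge: once every node has a vector, the endpoint has nothing left to process.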