Before BugViper can review pull requests or answer questions about your codebase, it needs to index the repository. Indexing clones the repository, parses every source file using Tree-sitter, and writes the result into a Neo4j knowledge graph. Once indexed, BugViper can search your code in milliseconds, trace call chains, detect security issues in PRs, and answer natural-language questions about your entire codebase.
What happens during ingestion
When you submit a repository for indexing, BugViper runs the following pipeline:
- Clone — BugViper downloads the repository from GitHub at the specified branch.
- Parse — Tree-sitter parsers run against all recognized source files. BugViper supports 17 languages out of the box.
- Graph write — The parser extracts Function, Class, Variable, File, Module, and Repository nodes and writes them to Neo4j along with relationships (CONTAINS, DEFINES, CALLS, IMPORTS, INHERITS). Cyclomatic complexity is calculated and stored on every Function node at this stage.
- Embed (optional) — If an OpenRouter API key is configured, BugViper batch-embeds all nodes using text-embedding-3-small and stores 1536-dimension vectors in Neo4j vector indexes for semantic search.
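The graph produced by the pipeline above can be pictured as a small set of labeled nodes plus typed relationships. A hedged sketch in Python — the node labels and relationship types come from the list above, but the dict representation and example names are illustrative, not BugViper's actual Neo4j schema:

```python
# Illustrative shape of the knowledge graph after ingestion.
# Labels and relationship types are from the docs; the data is made up.
nodes = [
    {"label": "Repository", "name": "acme-corp/my-api"},
    {"label": "File", "name": "app/auth.py"},
    {"label": "Class", "name": "AuthService"},
    {"label": "Function", "name": "login", "cyclomatic_complexity": 4},
]
relationships = [
    ("acme-corp/my-api", "CONTAINS", "app/auth.py"),
    ("app/auth.py", "DEFINES", "AuthService"),
    ("AuthService", "DEFINES", "login"),
]

# Every Function node carries a cyclomatic-complexity score,
# computed during the graph-write stage.
functions = [n for n in nodes if n["label"] == "Function"]
print(functions[0]["cyclomatic_complexity"])  # 4
```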
Ingest a repository via the API
Send a POST request to /api/v1/ingest/github with the repository details. All API requests require a Firebase ID token in the Authorization header.
```bash
curl -X POST https://your-bugviper-instance/api/v1/ingest/github \
  -H "Authorization: Bearer YOUR_FIREBASE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "owner": "acme-corp",
    "repo_name": "my-api",
    "branch": "main",
    "clear_existing": false
  }'
```
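The same request can be built from Python using only the standard library. A sketch: the helper name, base URL, and token are placeholders, and the request is constructed but not sent, so the snippet runs without a live instance:

```python
import json
import urllib.request

def build_ingest_request(base_url, token, owner, repo_name,
                         branch="main", clear_existing=False):
    """Build (but do not send) the POST to /api/v1/ingest/github."""
    body = json.dumps({
        "owner": owner,
        "repo_name": repo_name,
        "branch": branch,
        "clear_existing": clear_existing,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/ingest/github",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # Firebase ID token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_ingest_request("https://your-bugviper-instance",
                           "YOUR_FIREBASE_TOKEN", "acme-corp", "my-api")
print(req.full_url)
# Send with urllib.request.urlopen(req) against a real instance.
```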
The API responds immediately with a job ID you can use to track progress:
```json
{
  "job_id": "3f2a1b4c-8d9e-4f0a-b1c2-d3e4f5a6b7c8",
  "status": "pending",
  "message": "Ingestion started for acme-corp/my-api",
  "poll_url": "/api/v1/ingest/jobs/3f2a1b4c-8d9e-4f0a-b1c2-d3e4f5a6b7c8"
}
```
If an ingestion job is already running for the same repository, BugViper returns the existing job’s ID and status instead of starting a duplicate. Jobs that have been stuck in pending or dispatched for more than 10 minutes are automatically marked as failed so you can resubmit.
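The 10-minute staleness rule can be expressed as a small predicate. A sketch — the field names mirror the job object returned by the API, but the helper itself is hypothetical, not part of BugViper:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)

def is_stale(job, now=None):
    """True if a job has sat in pending/dispatched for more than
    10 minutes, i.e. BugViper would mark it failed for resubmission."""
    if job["status"] not in ("pending", "dispatched"):
        return False
    created = datetime.fromisoformat(job["created_at"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - created > STALE_AFTER

job = {"status": "pending", "created_at": "2026-04-10T10:00:00Z"}
print(is_stale(job, now=datetime(2026, 4, 10, 10, 15, tzinfo=timezone.utc)))  # True
print(is_stale(job, now=datetime(2026, 4, 10, 10, 5, tzinfo=timezone.utc)))   # False
```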
Setting "clear_existing": true deletes the entire knowledge graph for that repository before re-ingesting. All previously stored nodes, relationships, and embeddings are permanently removed. Use this only when you need a clean re-index — for example, after a major refactor where stale nodes would otherwise remain in the graph.
Poll for ingestion progress
Ingestion runs in the background. Poll the job endpoint with the job_id from the previous response to check progress:
```bash
curl https://your-bugviper-instance/api/v1/ingest/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_FIREBASE_TOKEN"
```
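A polling loop only needs to stop on the two terminal states, completed and failed. A sketch with the HTTP call injected as a function so the loop runs offline — fetch_job is a stand-in for the GET request above:

```python
import time

TERMINAL = {"completed", "failed"}

def wait_for_job(fetch_job, job_id, interval=5.0, timeout=600.0):
    """Poll fetch_job(job_id) until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")

# Fake fetcher standing in for the curl request above.
responses = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "completed", "stats": {"files_processed": 84}},
])
done = wait_for_job(lambda _id: next(responses), "JOB_ID", interval=0.01)
print(done["status"])  # completed
```

In a real client, fetch_job would issue the authenticated GET and return the decoded JSON body.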
A job in progress looks like this:
```json
{
  "job_id": "3f2a1b4c-8d9e-4f0a-b1c2-d3e4f5a6b7c8",
  "owner": "acme-corp",
  "repo_name": "my-api",
  "branch": "main",
  "status": "running",
  "created_at": "2026-04-10T10:00:00Z",
  "updated_at": "2026-04-10T10:01:23Z",
  "started_at": "2026-04-10T10:00:05Z",
  "completed_at": null,
  "stats": null,
  "error_message": null
}
```
The status field moves through the following states:
| Status | Meaning |
|---|---|
| pending | Job is queued and waiting to start. |
| running | Ingestion is actively processing the repository. |
| completed | Ingestion finished successfully. |
| failed | Ingestion encountered an unrecoverable error. Check error_message. |
Ingestion stats
When a job completes, the stats object is populated with a summary of everything BugViper found in the repository:
```json
{
  "status": "completed",
  "stats": {
    "files_processed": 84,
    "files_skipped": 12,
    "classes_found": 47,
    "functions_found": 312,
    "imports_found": 209,
    "total_lines": 18540,
    "errors": [],
    "embedding_status": "completed",
    "nodes_embedded": 359,
    "embedding_error": null
  }
}
```
| Field | Description |
|---|---|
| files_processed | Source files successfully parsed and written to the graph. |
| files_skipped | Files ignored because their language is unsupported or they exceeded size limits. |
| classes_found | Total Class nodes created across all processed files. |
| functions_found | Total Function nodes created, including class methods. |
| imports_found | Total Module import relationships recorded. |
| total_lines | Aggregate line count across all processed files. |
| embedding_status | completed if vectors were generated, skipped if embedding was not run, or failed if it errored. |
| nodes_embedded | Number of nodes that received vector embeddings. |
| embedding_error | Error detail when embedding_status is failed; otherwise null. |
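The stats object lends itself to a quick post-run sanity check, such as computing parse coverage and flagging embedding failures. A hedged sketch — summarize_stats is an illustrative helper, not part of BugViper:

```python
def summarize_stats(stats):
    """Reduce an ingestion stats object to a few health indicators."""
    total = stats["files_processed"] + stats["files_skipped"]
    coverage = stats["files_processed"] / total if total else 0.0
    return {
        "coverage": round(coverage, 3),           # share of files parsed
        "definitions": stats["classes_found"] + stats["functions_found"],
        "embedding_ok": stats["embedding_status"] == "completed",
        "errors": len(stats["errors"]),
    }

# Stats values from the example response above.
stats = {
    "files_processed": 84, "files_skipped": 12,
    "classes_found": 47, "functions_found": 312,
    "imports_found": 209, "total_lines": 18540,
    "errors": [], "embedding_status": "completed",
    "nodes_embedded": 359, "embedding_error": None,
}
print(summarize_stats(stats))
# {'coverage': 0.875, 'definitions': 359, 'embedding_ok': True, 'errors': 0}
```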
Add semantic search after ingestion
If embedding was skipped during the initial ingest — or if it failed partway through — you can trigger a standalone embedding pass without re-cloning the repository:
```bash
curl -X POST https://your-bugviper-instance/api/v1/ingest/acme-corp/my-api/embed \
  -H "Authorization: Bearer YOUR_FIREBASE_TOKEN"
```
This endpoint is safe to call multiple times. Nodes that already have embeddings are skipped, so only new or un-embedded nodes are processed.
```json
{
  "repo_id": "acme-corp/my-api",
  "nodes_embedded": 147,
  "breakdown": {
    "Function": 98,
    "Class": 31,
    "Variable": 18
  },
  "message": "Embedded 147 nodes successfully"
}
```
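The idempotency described above amounts to filtering out nodes that already carry a vector before batching the rest. A sketch of that selection step — the node shape and helper are illustrative, not BugViper internals:

```python
def select_unembedded(nodes):
    """Keep only nodes with no stored vector, mirroring the endpoint's
    skip-already-embedded behavior (illustrative sketch)."""
    return [n for n in nodes if n.get("embedding") is None]

nodes = [
    {"id": "fn:login", "label": "Function", "embedding": [0.1, 0.2]},
    {"id": "fn:logout", "label": "Function", "embedding": None},
    {"id": "cls:AuthService", "label": "Class"},  # never embedded
]
todo = select_unembedded(nodes)
print([n["id"] for n in todo])  # ['fn:logout', 'cls:AuthService']
```

Because already-embedded nodes are excluded, repeated calls converge: once every node has a vector, the endpoint has nothing left to process.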