What is the BugViper knowledge graph?

When you index a repository, BugViper doesn’t just store your files — it parses every function, class, variable, and import into a Neo4j property graph. Each code element becomes a node, and the relationships between them (calls, inheritance, imports) become edges. The result is a structured representation of your codebase that the AI review agent and search features query directly, giving every feature full context about how your code is connected — not just the files that changed.

How the graph is built

BugViper builds the graph in three steps at ingestion time:

1. Clone: BugViper downloads your repository from GitHub.
2. Parse: Tree-sitter AST parsers run over every supported file, extracting symbols and their relationships without executing any code.
3. Write: Nodes and relationships are written to Neo4j. Cyclomatic complexity is calculated and stored on every Function node. Optionally, all nodes are batch-embedded and their vectors stored in Neo4j vector indexes for semantic search.
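
As a rough illustration of the complexity metric stored on Function nodes, the sketch below counts decision points in Python source. It uses Python's built-in ast module as a stand-in for Tree-sitter, and the set of branching constructs it counts is an assumption made for this example; BugViper's actual rules may differ.

```python
import ast

# Constructs that each add one decision point.
# This rule set is a simplifying assumption, not BugViper's documented behavior.
_BRANCH_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Return 1 plus the number of decision points found under `func`."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths, one per additional operand.
            complexity += len(node.values) - 1
    return complexity


def complexity_by_function(source: str) -> dict:
    """Map each function name in `source` to its complexity score.

    Branches inside nested functions are also counted toward the outer one.
    """
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SAMPLE = '''
def load_batch(items, limit):
    out = []
    for item in items:
        if item and len(out) < limit:
            out.append(item)
    return out

def noop():
    return None
'''

scores = complexity_by_function(SAMPLE)
print(scores)  # {'load_batch': 4, 'noop': 1}
```

Here load_batch scores 4: the base path, the for loop, the if, and the extra operand of the and expression.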

Node types

Every code element is stored as a typed node in the graph:

Node type	What it represents
`Repository`	The top-level repository (owner/name)
`File`	A source file in the repository
`Function`	A function or method definition
`Class`	A class definition
`Variable`	A module-level or class-level variable
`Module`	An external or internal package referenced by an import statement

Each node carries properties extracted directly from the source: name, source_code, docstring, line_number, path, and (for functions) cyclomatic_complexity.
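
To make these properties concrete, here is a minimal sketch that pulls the same fields out of a Python file. The FunctionNode dataclass and extract_function_nodes helper are hypothetical names for this example, and Python's ast module stands in for the Tree-sitter parsers BugViper uses.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FunctionNode:
    """Hypothetical in-memory mirror of the properties on a Function node."""
    name: str
    source_code: str
    docstring: Optional[str]
    line_number: int
    path: str


def extract_function_nodes(source: str, path: str) -> List[FunctionNode]:
    """Collect one FunctionNode per function or method defined in `source`."""
    tree = ast.parse(source)
    nodes = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            nodes.append(
                FunctionNode(
                    name=node.name,
                    # get_source_segment returns None if positions are missing.
                    source_code=ast.get_source_segment(source, node) or "",
                    docstring=ast.get_docstring(node),
                    line_number=node.lineno,
                    path=path,
                )
            )
    return nodes


SAMPLE = (
    "def embed_texts(texts):\n"
    '    """Embed a list of strings."""\n'
    "    return [len(t) for t in texts]\n"
)

# "loader.py" is an illustrative path, not one taken from a real repository.
nodes = extract_function_nodes(SAMPLE, "loader.py")
print(nodes[0].name, nodes[0].line_number)  # embed_texts 1
```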

Graph schema

The relationships between nodes follow a consistent schema:

(Repository)-[:CONTAINS]──►(File)
(File)-[:CONTAINS]─────────►(Function | Class | Variable)
(File)-[:IMPORTS]──────────►(Module)
(Class)-[:CONTAINS]────────►(Function)
(Class)-[:INHERITS]────────►(Class)
(Function)-[:CALLS]────────►(Function)

For example, when BugViper sees a file that defines a class DataLoader with a method load_batch that calls embed_texts, the graph contains:

A File node connected via CONTAINS to a Class node (DataLoader)
That Class node connected via CONTAINS to a Function node (load_batch)
That Function node connected via CALLS to another Function node (embed_texts)

The AI review agent traverses these relationships to understand call chains, inheritance hierarchies, and blast radius before generating findings.
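
The kind of traversal described above can be sketched without a database. The example below models the DataLoader relationships as plain triples and walks CALLS edges backwards to find every function that could be affected by a change. The file name loader.py, the extra train caller, and the transitive_callers helper are assumptions added for illustration; in BugViper the equivalent lookup runs against Neo4j.

```python
from collections import defaultdict, deque

# (source, relationship, target) triples following the schema above.
EDGES = [
    ("File:loader.py", "CONTAINS", "Class:DataLoader"),
    ("Class:DataLoader", "CONTAINS", "Function:load_batch"),
    ("Function:load_batch", "CALLS", "Function:embed_texts"),
    # Hypothetical extra caller, added so the walk has more than one hop.
    ("Function:train", "CALLS", "Function:load_batch"),
]


def transitive_callers(edges, target):
    """Return every function that reaches `target` through CALLS edges."""
    callers_of = defaultdict(set)
    for src, rel, dst in edges:
        if rel == "CALLS":
            callers_of[dst].add(src)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers_of[current]:
            if caller not in seen:  # guards against cycles such as recursion
                seen.add(caller)
                queue.append(caller)
    return seen


blast_radius = transitive_callers(EDGES, "Function:embed_texts")
print(sorted(blast_radius))  # ['Function:load_batch', 'Function:train']
```

A change to embed_texts therefore touches load_batch directly and train indirectly, even though neither appears in the diff.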

Vector embeddings

Vector embeddings are optional but unlock semantic search. When enabled, BugViper embeds every node using text-embedding-3-small (via OpenRouter) and stores the resulting 1536-dimensional vectors in Neo4j vector indexes configured for cosine similarity.

Enable embeddings during ingestion if you want to search your codebase by intent — for example, querying “embedding model configuration” to find nodes related to that concept, even if those words don’t appear verbatim in symbol names.
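
To show what ranking by cosine similarity means, the following sketch scores a query vector against a handful of node vectors in plain Python. The three-dimensional vectors and node names are toy values chosen for readability; real embeddings have 1536 dimensions, and in BugViper the lookup is served by the Neo4j vector index rather than by code like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query, node_vectors, k=2):
    """Return the names of the k nodes most similar to `query`."""
    scored = [
        (cosine_similarity(query, vec), name)
        for name, vec in node_vectors.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy vectors standing in for stored node embeddings (illustrative values).
NODE_VECTORS = {
    "embed_texts": [0.9, 0.1, 0.0],
    "load_batch": [0.2, 0.8, 0.1],
    "parse_args": [0.0, 0.1, 0.9],
}

# Toy vector standing in for the embedded search query.
QUERY = [1.0, 0.2, 0.0]

print(top_k(QUERY, NODE_VECTORS))  # ['embed_texts', 'load_batch']
```

Because the score depends only on direction, not vector length, a node can rank highly even when its name shares no words with the query.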

Supported languages

BugViper uses Tree-sitter parsers to support 17 programming languages, grouped into five categories: Systems, JVM & .NET, Scripting, Web & TypeScript, and Other.

Systems: C, C++, Rust, Go

Files in unsupported languages are skipped during ingestion. Supported files in any of the 17 languages above are fully parsed into the graph.

Why the graph matters

Many code review tools see only the diff: the lines that changed in a pull request. BugViper’s AI agent has access to the full graph of your codebase, including every caller of a function, the full inheritance chain of a class, and all files that import a given module. This means the reviewer can identify issues that are invisible from the diff alone, such as a change that breaks a caller in a completely different file, or a pattern that conflicts with how the same logic is implemented elsewhere.

A repository must be fully indexed before code search, PR review, and the Ask Agent feature are available. See Index a repository to get started.
