When you index a repository, BugViper doesn't just store your files: it parses every function, class, variable, and import into a Neo4j property graph. Each code element becomes a node, and the relationships between them (calls, inheritance, imports) become edges. The AI review agent and the search features query this structured representation directly, so every feature has full context about how your code is connected, not just the files that changed.

How the graph is built

BugViper builds the graph in three steps at ingestion time:
  1. Clone — BugViper downloads your repository from GitHub.
  2. Parse — Tree-sitter AST parsers run over every supported file, extracting symbols and their relationships without executing any code.
  3. Write — Nodes and relationships are written to Neo4j. Cyclomatic complexity is calculated and stored on every Function node. Optionally, all nodes are batch-embedded and their vectors stored in Neo4j vector indexes for semantic search.
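BugViper runs Tree-sitter parsers for the Parse step. As a rough, Python-only illustration of the same idea (extract symbols and their call relationships without executing any code, then score each function's cyclomatic complexity), the sketch below swaps in Python's built-in `ast` module. The `Symbol` class, the `extract_symbols` and `cyclomatic_complexity` helpers, and the counting rule (1 plus one per branch, loop, exception handler, or extra boolean operand) are assumptions for illustration, not BugViper's implementation.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional

# Decision points counted by this sketch (a common McCabe-style approximation).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


@dataclass
class Symbol:
    kind: str                                        # "Function" or "Class"
    name: str
    line_number: int
    docstring: Optional[str]
    cyclomatic_complexity: Optional[int] = None      # only set for functions
    calls: list[str] = field(default_factory=list)   # names of called functions


def cyclomatic_complexity(func: ast.AST) -> int:
    """Return 1 + the decision points in `func`, excluding nested definitions."""
    score = 1
    stack = list(ast.iter_child_nodes(func))
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # nested definitions are scored separately
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra `and` / `or` operand
        stack.extend(ast.iter_child_nodes(node))
    return score


def extract_symbols(source: str) -> list[Symbol]:
    """Parse (never execute) Python source and return its classes and functions."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            symbols.append(Symbol("Class", node.name, node.lineno, ast.get_docstring(node)))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = [
                c.func.attr if isinstance(c.func, ast.Attribute) else c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, (ast.Name, ast.Attribute))
            ]
            symbols.append(Symbol("Function", node.name, node.lineno,
                                  ast.get_docstring(node), cyclomatic_complexity(node), calls))
    return symbols


SOURCE = '''
class DataLoader:
    """Loads batches."""
    def load_batch(self, items):
        if not items:
            return []
        return embed_texts(items)
'''

for sym in extract_symbols(SOURCE):
    print(sym.kind, sym.name, sym.line_number, sym.cyclomatic_complexity, sym.calls)
```

A real Tree-sitter pipeline would do the same walk over a language-agnostic syntax tree, which is what lets one ingestion path cover many languages.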

Node types

Every code element is stored as a typed node in the graph:
Node type   What it represents
Repository  The top-level repository (owner/name)
File        A source file in the repository
Function    A function or method definition
Class       A class definition
Variable    A module-level or class-level variable
Module      An external or internal package referenced by an import statement
Each node carries properties extracted directly from the source: name, source_code, docstring, line_number, path, and (for functions) cyclomatic_complexity.
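To make writing a node concrete, here is a hypothetical helper that turns one extracted element into a parameterized Cypher `MERGE` statement. Node labels are not ordinary query parameters, so the sketch interpolates the label into the query text only after checking it against the six node types in the table; all property values travel as parameters. The helper name, the choice of `path` and `name` as the identifying key, and the query shape are assumptions, not BugViper's actual write logic.

```python
# The six node labels from the table above.
NODE_LABELS = {"Repository", "File", "Function", "Class", "Variable", "Module"}


def node_merge_query(label: str, key: dict, props: dict) -> tuple[str, dict]:
    """Build a parameterized MERGE for one node (hypothetical helper).

    `key` holds the properties that identify the node; `props` holds the rest
    (source_code, docstring, line_number, cyclomatic_complexity, ...).
    """
    if label not in NODE_LABELS:
        raise ValueError(f"unknown node label: {label!r}")
    # The label (and the key names, which come from trusted code) are interpolated
    # into the text; every property value is sent separately as a query parameter.
    pattern = ", ".join(f"{k}: $k_{k}" for k in key)
    query = f"MERGE (n:{label} {{{pattern}}}) SET n += $props"
    params = {f"k_{k}": v for k, v in key.items()}
    params["props"] = props
    return query, params


query, params = node_merge_query(
    "Function",
    {"path": "loader.py", "name": "load_batch"},
    {"line_number": 4, "cyclomatic_complexity": 2},
)
print(query)
print(params)
```

The returned pair could be passed to the official `neo4j` Python driver via `session.run(query, params)`. Using `MERGE` rather than `CREATE` keeps re-indexing idempotent: running the same statement twice updates one node instead of creating a duplicate.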

Graph schema

The relationships between nodes follow a consistent schema:
(Repository)-[:CONTAINS]──►(File)
(File)-[:CONTAINS]─────────►(Function | Class | Variable)
(File)-[:IMPORTS]──────────►(Module)
(Class)-[:CONTAINS]────────►(Function)
(Class)-[:INHERITS]────────►(Class)
(Function)-[:CALLS]────────►(Function)
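One way to read this schema is as a fixed set of allowed (source label, relationship, target label) triples. The hypothetical `invalid_edges` check below encodes the arrows above, with `Function | Class | Variable` expanded into three triples, and flags anything outside them, such as a `Function` that `CONTAINS` a `Variable`, which the schema does not define. Whether BugViper enforces its schema this way is an assumption.

```python
# Allowed (source label, relationship type, target label) triples, one per schema arrow;
# (File)-[:CONTAINS]->(Function | Class | Variable) expands into three entries.
SCHEMA = {
    ("Repository", "CONTAINS", "File"),
    ("File", "CONTAINS", "Function"),
    ("File", "CONTAINS", "Class"),
    ("File", "CONTAINS", "Variable"),
    ("File", "IMPORTS", "Module"),
    ("Class", "CONTAINS", "Function"),
    ("Class", "INHERITS", "Class"),
    ("Function", "CALLS", "Function"),
}


def invalid_edges(edges):
    """Return the edges whose (source, relationship, target) triple is not in SCHEMA."""
    return [edge for edge in edges if edge not in SCHEMA]


edges = [
    ("File", "CONTAINS", "Class"),         # file -> class
    ("Class", "CONTAINS", "Function"),     # class -> method
    ("Function", "CALLS", "Function"),     # method -> called function
    ("Function", "CONTAINS", "Variable"),  # not part of the schema
]
print(invalid_edges(edges))
```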
For example, when BugViper sees a file that defines a class DataLoader with a method load_batch that calls embed_texts, the graph contains:
  • A File node connected via CONTAINS to a Class node (DataLoader)
  • That Class node connected via CONTAINS to a Function node (load_batch)
  • That Function node connected via CALLS to another Function node (embed_texts)
The AI review agent traverses these relationships to understand call chains, inheritance hierarchies, and blast radius before generating findings.
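The traversal can be sketched with an in-memory edge list. This is an illustration only: the node ids, the extra `get_client` function, and the `neighbors` and `call_chain` helpers are hypothetical, and in Neo4j the same walk would be a variable-length pattern such as `(:Function {name: 'load_batch'})-[:CALLS*1..]->(f:Function)`. Starting from `DataLoader`, the sketch follows `CONTAINS` to the method and then every transitive `CALLS` edge.

```python
from collections import deque

# Hypothetical in-memory stand-in for the graph: (source id, relationship, target id).
EDGES = [
    ("loader.py", "CONTAINS", "DataLoader"),
    ("DataLoader", "CONTAINS", "DataLoader.load_batch"),
    ("DataLoader.load_batch", "CALLS", "embed_texts"),
    ("embed_texts", "CALLS", "get_client"),
]


def neighbors(node, rel):
    """Targets of outgoing `rel` edges from `node`."""
    return [dst for src, r, dst in EDGES if src == node and r == rel]


def call_chain(start):
    """Breadth-first walk over CALLS edges; returns every function reachable from `start`."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        for callee in neighbors(queue.popleft(), "CALLS"):
            if callee not in seen:  # guards against recursion and call cycles
                seen.add(callee)
                order.append(callee)
                queue.append(callee)
    return order


# Class -> CONTAINS -> method, then the method's transitive CALLS.
for method in neighbors("DataLoader", "CONTAINS"):
    print(method, "->", call_chain(method))
```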

Vector embeddings

Vector embeddings are optional but unlock semantic search. When enabled, BugViper embeds every node using text-embedding-3-small (via OpenRouter) and stores the resulting 1536-dimensional vectors in Neo4j vector indexes using cosine similarity.
Enable embeddings during ingestion if you want to search your codebase by intent — for example, querying “embedding model configuration” to find nodes related to that concept, even if those words don’t appear verbatim in symbol names.
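Conceptually, a semantic query embeds the search text with the same model and ranks nodes by cosine similarity to that vector. The brute-force sketch below shows only the ranking, with toy 3-dimensional vectors standing in for real 1536-dimensional embeddings; the node names and vector values are made up. Inside Neo4j the vector index does this work (for example through the `db.index.vector.queryNodes` procedure) using approximate nearest-neighbour search rather than a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, index, k=2):
    """Rank every stored vector against the query and keep the k most similar."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for stored node embeddings (hypothetical values).
INDEX = {
    "EMBEDDING_MODEL": [0.9, 0.1, 0.0],
    "embed_texts": [0.8, 0.3, 0.1],
    "parse_pr_comment": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "embedding model configuration"
print(top_k(query_vec, INDEX))
```

Because cosine similarity ignores vector length, only the direction of each embedding matters, which is why a query phrased differently from a symbol's name can still land close to it.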

Supported languages

BugViper uses Tree-sitter parsers to support 17 programming languages:
  • C
  • C++
  • Rust
  • Go
Files in unsupported languages are skipped during ingestion; files in any supported language are fully parsed into the graph.
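Skipping unsupported files amounts to choosing a parser for each path before parsing. The sketch below assumes extension-based detection, which may not be how BugViper decides, and its hypothetical `EXTENSION_TO_LANGUAGE` map covers only the four languages listed above.

```python
from pathlib import PurePosixPath

# Hypothetical extension map for the four languages listed above only.
# ".h" is ambiguous between C and C++; this sketch simply treats it as C.
EXTENSION_TO_LANGUAGE = {
    ".c": "c", ".h": "c",
    ".cc": "cpp", ".cpp": "cpp", ".cxx": "cpp", ".hpp": "cpp",
    ".rs": "rust",
    ".go": "go",
}


def partition_files(paths):
    """Split paths into {path: language} to parse and a list of paths to skip."""
    parsed, skipped = {}, []
    for path in paths:
        language = EXTENSION_TO_LANGUAGE.get(PurePosixPath(path).suffix.lower())
        if language is None:
            skipped.append(path)
        else:
            parsed[path] = language
    return parsed, skipped


to_parse, to_skip = partition_files(
    ["src/main.rs", "cmd/app.go", "README.md", "lib/util.cpp", "include/x.h"]
)
print(to_parse)
print(to_skip)
```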

Why the graph matters

Traditional static analysis and code review tools only see the diff — the lines that changed in a pull request. BugViper’s AI agent has access to the full graph of your codebase: every caller of a function, the full inheritance chain of a class, all files that import a given module. This means the reviewer can identify issues that are invisible from the diff alone, such as a change that breaks a caller in a completely different file, or a pattern that conflicts with how the same logic is implemented elsewhere.
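The blast-radius idea can be sketched as a reverse walk over `CALLS` edges: start at the changed function, collect every transitive caller, and report the file each one lives in. All function names, file paths, and the `blast_radius` helper are hypothetical. In this toy graph, a diff that touches only `embeddings.py` still affects three other files, none of which appear in the diff.

```python
from collections import deque

# Hypothetical CALLS edges (caller, callee) and the file each function lives in.
CALLS = [
    ("load_batch", "embed_texts"),
    ("reindex_repo", "load_batch"),
    ("search_handler", "embed_texts"),
    ("format_result", "truncate"),
]
FILE_OF = {
    "embed_texts": "embeddings.py",
    "load_batch": "loader.py",
    "reindex_repo": "jobs/reindex.py",
    "search_handler": "api/search.py",
    "format_result": "api/search.py",
    "truncate": "utils.py",
}


def blast_radius(changed):
    """Return every transitive caller of `changed`, mapped to its file."""
    callers_of = {}
    for caller, callee in CALLS:
        callers_of.setdefault(callee, []).append(caller)  # reverse the edges
    seen, queue = set(), deque([changed])
    while queue:
        for caller in callers_of.get(queue.popleft(), []):
            if caller not in seen:  # each caller is visited once, even with cycles
                seen.add(caller)
                queue.append(caller)
    return {fn: FILE_OF[fn] for fn in sorted(seen)}


print(blast_radius("embed_texts"))
```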
A repository must be fully indexed before code search, PR review, and the Ask Agent feature are available. See Index a repository to get started.