Full-text and semantic code search in BugViper

BugViper exposes two complementary search modes for indexed repositories: full-text search for finding exact identifiers and keywords, and semantic search for finding code by intent. Both modes query your Neo4j knowledge graph directly — no code needs to leave your environment, and results are anchored to precise line numbers in the original source.

Full-text search

Full-text search is backed by Apache Lucene indexes inside Neo4j — the same engine that powers Elasticsearch. When you submit a query, BugViper runs a three-tier strategy to find the best results:

Tier 1 — Lucene fulltext index on symbols

The first tier queries a code_search Lucene index built over Function, Class, and Variable nodes, searching across three fields: name, docstring, and source_code.

Index	Node types	Fields searched
`code_search`	`Function`, `Class`, `Variable`	`name`, `docstring`, `source_code`
`file_content_search`	`File`	`source_code`

For clean identifiers (like parse_unified_diff), BugViper uses phrase search to find the exact symbol instantly. For queries containing special characters (like "Authorization: Bearer"), the query is tokenised and joined with AND so Lucene can match across fields.
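The query-building rule above can be sketched in a few lines of Python. This is a minimal illustration, not BugViper's implementation. The helper names `to_lucene_query` and `build_tier1_query`, the regex that defines a "clean identifier", and the `limit` parameter are all assumptions. The Cypher uses Neo4j's `db.index.fulltext.queryNodes` procedure against the `code_search` index named in the table.

```python
import re
from typing import Any

# Assumed definition of a "clean identifier": one word of letters, digits and underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Tier 1 Cypher: query the code_search fulltext index and keep the best-scoring symbols.
TIER1_CYPHER = """
CALL db.index.fulltext.queryNodes('code_search', $query) YIELD node, score
RETURN node, score
ORDER BY score DESC
LIMIT $limit
"""


def to_lucene_query(raw: str) -> str:
    """Turn a user search string into a Lucene query string (hypothetical helper)."""
    text = raw.strip()
    if _IDENTIFIER.match(text):
        # Clean identifier: quote it so Lucene runs a phrase search for the exact symbol.
        return f'"{text}"'
    # Special characters present: split on non-word characters (which also drops
    # Lucene operators such as ':'), then require every remaining token.
    tokens = [t for t in re.split(r"\W+", text) if t]
    return " AND ".join(tokens)


def build_tier1_query(raw: str, limit: int = 20) -> tuple[str, dict[str, Any]]:
    """Return the Cypher statement and its parameters for a Tier 1 search."""
    return TIER1_CYPHER, {"query": to_lucene_query(raw), "limit": limit}


print(to_lucene_query("parse_unified_diff"))     # "parse_unified_diff" (quoted phrase)
print(to_lucene_query("Authorization: Bearer"))  # Authorization AND Bearer
```

Splitting on non-word characters means the sketch never has to escape Lucene's special characters, at the cost of ignoring punctuation entirely.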

Tier 2 — Name CONTAINS fallback

If Tier 1 returns no results — for example, when a symbol hasn’t been indexed with a docstring and the name doesn’t phrase-match — BugViper falls back to a name CONTAINS query against the primary identifier extracted from your search string. This catches partial matches that Lucene misses.
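How BugViper extracts the "primary identifier" is not specified here. The sketch below assumes it is the longest identifier-like token in the search string, and pairs that token with a `CONTAINS` match on the `name` property of the three symbol labels. Both the heuristic and the helper names (`primary_identifier`, `build_tier2_query`) are hypothetical.

```python
import re
from typing import Any, Optional

_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Tier 2 Cypher: substring match on symbol names when the fulltext tier found nothing.
TIER2_CYPHER = """
MATCH (n)
WHERE (n:Function OR n:Class OR n:Variable) AND n.name CONTAINS $ident
RETURN n
LIMIT $limit
"""


def primary_identifier(raw: str) -> Optional[str]:
    """Assumed heuristic: the longest identifier-like token wins (the first one on ties)."""
    tokens = _TOKEN.findall(raw)
    return max(tokens, key=len) if tokens else None


def build_tier2_query(raw: str, limit: int = 20) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (cypher, params) for the CONTAINS fallback, or None if nothing is usable."""
    ident = primary_identifier(raw)
    if ident is None:
        return None
    return TIER2_CYPHER, {"ident": ident, "limit": limit}


print(primary_identifier("where is parse_unified_diff called?"))  # parse_unified_diff
```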

Tier 3 — File content line search

If both symbol-level tiers return empty results, BugViper searches raw file content line-by-line using the file_content_search Lucene index (with a source_code CONTAINS fallback). This tier returns individual matching lines with their path and line number rather than full symbol bodies, keeping the response lean.
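Conceptually, the `CONTAINS` fallback in Tier 3 reduces to scanning a file's text and reporting each matching line with its 1-based line number. The sketch below does that in plain Python over an in-memory string. The function name, the example path, and the `text` key for the matching line are assumptions; the other keys follow the documented result fields, and matching is case-sensitive, like Cypher's `CONTAINS`.

```python
from typing import Iterator


def line_matches(path: str, source: str, needle: str) -> Iterator[dict]:
    """Yield one line-level result per line of `source` that contains `needle`."""
    for number, text in enumerate(source.splitlines(), start=1):
        if needle in text:  # case-sensitive substring test
            yield {
                "type": "line",
                "name": "",  # line-level results carry no symbol name
                "path": path,
                "line_number": number,
                "text": text.strip(),  # hypothetical key for the matching line
            }


example = "import os\n\nAUTH = 'Authorization: Bearer'\nprint(AUTH)\n"
for hit in line_matches("src/config.py", example, "Bearer"):
    print(hit["path"], hit["line_number"], hit["text"])
```

Returning only the matching line, rather than the whole file, is what keeps this tier's responses small.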

Your query
    │
    ├─► Tier 1 — Lucene fulltext (name, docstring, source_code)
    │       Clean identifiers → phrase search
    │       Special characters → AND-keyword tokenisation
    │
    ├─► Tier 2 — Name CONTAINS fallback (if Tier 1 empty)
    │
    └─► Tier 3 — File content line-by-line (if Tier 2 empty)
            Returns: path + line_number + matching line
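The fallback order in the diagram amounts to "return the results of the first tier that produces anything". Here is a minimal, backend-agnostic sketch. Each tier is modelled as a plain callable, so the control flow can be shown without a Neo4j connection. The `tiered_search` name and the callable interface are illustrative assumptions.

```python
from typing import Callable, Sequence

# A tier takes the raw query and returns a (possibly empty) list of results.
Tier = Callable[[str], list]


def tiered_search(query: str, tiers: Sequence[Tier]) -> tuple[int, list]:
    """Return (tier_number, results) from the first tier that finds anything.

    Later tiers are never called once an earlier one succeeds; (0, []) means
    every tier came back empty.
    """
    for number, tier in enumerate(tiers, start=1):
        results = tier(query)
        if results:
            return number, results
    return 0, []


# Stand-in tiers: only the line-level search finds a match in this example.
def fulltext_tier(query: str) -> list:
    return []


def name_contains_tier(query: str) -> list:
    return []


def line_tier(query: str) -> list:
    return [{"type": "line", "path": "src/app.py", "line_number": 7}]


print(tiered_search("Authorization: Bearer", [fulltext_tier, name_contains_tier, line_tier]))
```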

Search result fields

Every result from the /api/v1/query/search endpoint includes:

Field	Description
`type`	`function`, `class`, `variable`, or `line`
`name`	Symbol name (empty for line-level results)
`path`	Repo-relative file path
`line_number`	Line in the source file
`score`	Lucene relevance score (symbol results score higher than line results)

Symbol results (functions, classes, variables) are returned before line-level matches because they carry higher Lucene scores.
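A client consuming `/api/v1/query/search` can model each result with the five documented fields. The sketch below assumes that the results arrive as a JSON array of objects with exactly those keys; the response envelope is not documented here. It orders results by `score`, best first. The payload is made-up example data, and `SearchResult`/`parse_results` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    type: str         # "function", "class", "variable" or "line"
    name: str         # empty for line-level results
    path: str         # repo-relative file path
    line_number: int  # line in the source file
    score: float      # Lucene relevance score

    @property
    def is_symbol(self) -> bool:
        return self.type != "line"


def parse_results(payload: str) -> list[SearchResult]:
    """Parse a JSON array of result objects (assumed shape), highest score first."""
    results = [SearchResult(**item) for item in json.loads(payload)]
    return sorted(results, key=lambda r: r.score, reverse=True)


payload = """[
  {"type": "line", "name": "", "path": "src/app.py", "line_number": 12, "score": 1.1},
  {"type": "function", "name": "parse_unified_diff", "path": "src/diff.py", "line_number": 40, "score": 7.9}
]"""
for r in parse_results(payload):
    print(r.type, r.name or "-", f"{r.path}:{r.line_number}", r.score)
```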

The peek viewer

Line-level search results return only the matching line — not a full file dump. To read context around any result, use the /api/v1/query/code-finder/peek endpoint, which returns a configurable window of lines above and below the anchor line. The anchor line is flagged with is_anchor: true so your UI can highlight it.

GET /api/v1/query/code-finder/peek
  ?path=src/embedder.py
  &line=42
  &above=10
  &below=10
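The same request can be built from Python using only the standard library. The endpoint path and the four query parameters come from the example above. The base URL is an assumption, any authentication BugViper requires is not shown, and `fetch_peek` simply returns the decoded JSON without assuming its shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PEEK_PATH = "/api/v1/query/code-finder/peek"


def peek_url(base_url: str, path: str, line: int, above: int = 10, below: int = 10) -> str:
    """Build a peek URL; base_url (your BugViper API host) is an assumption."""
    query = urlencode({"path": path, "line": line, "above": above, "below": below})
    return f"{base_url.rstrip('/')}{PEEK_PATH}?{query}"


def fetch_peek(base_url: str, path: str, line: int, above: int = 10, below: int = 10):
    """GET the peek window and return the decoded JSON body (authentication not shown)."""
    with urlopen(peek_url(base_url, path, line, above, below)) as response:
        return json.load(response)


print(peek_url("http://localhost:8000", "src/embedder.py", 42))
```

`urlencode` percent-encodes the `/` in the file path (`src%2Fembedder.py`), which standard servers decode back to the original value.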

You can expand or collapse the window (up to 200 lines in either direction) without re-fetching the whole file. This keeps responses fast regardless of file size.
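The windowing rule (an anchor line, a bounded number of lines above and below it, and an `is_anchor` flag) can be sketched as a pure function over a file's lines. The 200-line cap comes from the text. Clamping larger values rather than rejecting them is an assumption, as are the function name and the `line_number`/`text` keys; this is not BugViper's actual response schema.

```python
MAX_CONTEXT = 200  # documented cap on context lines in either direction


def peek_window(lines: list[str], anchor: int, above: int = 10, below: int = 10) -> list[dict]:
    """Return lines around a 1-based anchor, flagging the anchor with is_anchor=True."""
    if not 1 <= anchor <= len(lines):
        raise ValueError(f"anchor {anchor} is outside 1..{len(lines)}")
    # Assumption: out-of-range requests are clamped to 0..MAX_CONTEXT.
    above = max(0, min(above, MAX_CONTEXT))
    below = max(0, min(below, MAX_CONTEXT))
    start = max(1, anchor - above)         # first line number included
    end = min(len(lines), anchor + below)  # last line number included
    return [
        {"line_number": n, "text": lines[n - 1], "is_anchor": n == anchor}
        for n in range(start, end + 1)
    ]


source = [f"line {i}" for i in range(1, 501)]
window = peek_window(source, anchor=42, above=2, below=1)
print([(row["line_number"], row["is_anchor"]) for row in window])
```

Because the function only slices a list, the cost of a response depends on the window size, not the file size.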

Semantic search

Semantic search lets you find code by intent rather than exact identifier. Instead of matching keywords, BugViper embeds your query using text-embedding-3-small (via OpenRouter) and queries the Neo4j vector indexes using cosine similarity.
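Ranking by cosine similarity can be sketched without any dependencies. The three-dimensional vectors below are made-up stand-ins for `text-embedding-3-small` output, which has 1536 dimensions by default, and the symbol names are hypothetical. The `(1 + cos) / 2` step assumes the normalisation Neo4j documents for cosine vector similarity, which maps scores into the 0.0–1.0 range. Inside Neo4j the equivalent lookup is done with `db.index.vector.queryNodes`; the actual index name is not given here.

```python
import math
from typing import Sequence

TOP_K = 10  # semantic results are capped at 10


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: Sequence[float], candidates: dict[str, Sequence[float]],
         k: int = TOP_K) -> list[tuple[str, float]]:
    """Score every candidate against the query and keep the k best (name, score) pairs."""
    # Assumed normalisation: map cosine from [-1, 1] onto [0, 1].
    scored = [(name, (1 + cosine(query_vec, vec)) / 2) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


query = [1.0, 0.0, 0.0]
nodes = {
    "config_loader": [0.9, 0.1, 0.0],
    "diff_parser": [0.0, 1.0, 0.0],
    "unrelated_helper": [-1.0, 0.0, 0.0],
}
for name, score in rank(query, nodes):
    print(f"{name}: {score:.2f}")
```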

Semantic search works only if embeddings were enabled when the repository was indexed. If it returns no results, check whether the repository was ingested with the embedding option enabled.

For example, the query “embedding model configuration” returns conceptually related nodes such as EmbeddingModelName and RoastResponse, with the top results scoring 73%, 69%, and 68%, even though none of those symbol names contains the words “embedding model configuration” verbatim. Results are ranked by similarity score and capped at 10. Each result includes:

Field	Description
`name`	Symbol name
`type`	`function`, `class`, `variable`, or `unknown`
`path`	File path
`line_number`	Line in the source file
`source_code`	Full body of the matching symbol
`docstring`	Docstring if present
`score`	Cosine similarity score (0.0–1.0)
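The example above quotes scores as percentages, while the `score` field is a 0.0–1.0 value. A client that wants the same presentation only has to scale the value. In this sketch the `SemanticResult` dataclass mirrors the table, and `docstring` is optional because it is present only when the symbol has one. The class name, the `label` helper, and the sample values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SemanticResult:
    name: str
    type: str                        # "function", "class", "variable" or "unknown"
    path: str
    line_number: int
    source_code: str                 # full body of the matching symbol
    score: float                     # cosine similarity, 0.0-1.0
    docstring: Optional[str] = None  # present only if the symbol has one

    def label(self) -> str:
        """One-line summary with the score shown as a whole-number percentage."""
        return f"{self.name} ({self.type}) {self.path}:{self.line_number} {self.score:.0%}"


hit = SemanticResult("load_config", "function", "src/config.py", 12, "def load_config(): ...", 0.73)
print(hit.label())  # load_config (function) src/config.py:12 73%
```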

Choosing between full-text and semantic search

Use full-text search when:
You know the name of a function, class, or variable
You’re searching for a specific string literal or pattern (e.g., an API endpoint path or a configuration key)
You want exact matches ranked by relevance score
You’re looking for where a specific error message or log string appears

Use semantic search when:
You know what the code does but not what it is called or how it is worded, so an exact identifier or keyword search would miss it
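If a client wants to pick a mode automatically, the guidance above can be approximated with a rough heuristic. A single identifier-like token, or a query containing quotes, paths, or key/value punctuation, suggests full-text search. A plain multi-word description suggests semantic search. The `choose_mode` helper and its patterns are hypothetical and not part of BugViper.

```python
import re

# Hypothetical signals that a query names something exact.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")            # e.g. parse_unified_diff, os.path
_LITERALISH = re.compile(r"""^["'].*["']$|[/:=(){}\[\]]""")  # quoted text, paths, key: value, calls


def choose_mode(query: str) -> str:
    """Return "fulltext" or "semantic" for a search string (rough heuristic)."""
    q = query.strip()
    if _IDENT.match(q) or _LITERALISH.search(q):
        return "fulltext"
    return "semantic"


for q in ["parse_unified_diff", "/api/v1/query/search", "embedding model configuration"]:
    print(q, "->", choose_mode(q))
```

An unquoted error message such as `connection refused by upstream` would be routed to semantic search by this heuristic, which is one reason to let users override the choice.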

Both modes are available from the Query tab in the BugViper dashboard or directly via the REST API.
