tRAGar playground

initializing…
chunks:  |  dim:  |  model:
Ingested documents
No documents yet.
Embedder
Fast 6-gram hash. Deterministic, no download, but not truly semantic — synonyms won't score well.
Chunking
Split on one or more blank lines. Natural for prose and markdown.
Filter out chunks shorter than this
Chunk preview
Paste text in the Ingest tab, then click preview.
Display
Dim results below X% of top score
30%
Instance (read-only)
namespace: playground
store:
embedder: playground-hash-v1
dim: 384
To change namespace or store, use reset storage in the Ingest tab.
k = 5
Results will appear here.
No chunks yet. Ingest some text first.
Run a query first, then switch here to see the score distribution.
Chunks

Before indexing, each document is split into chunks — smaller pieces of text that can be matched independently. Smaller chunks give more precise results; larger chunks preserve more context.

Strategy   | How it splits               | Best for
Blank-line | One or more empty lines     | Prose, markdown, docs
Sentence   | Sentence-ending punctuation | Dense text, articles
Fixed size | Every N characters          | Code, logs, structured data

Configure the strategy in the Config tab. Use Preview current text to see how your document would be split before ingesting.
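The blank-line strategy (with the minimum-length filter from Config) can be sketched like this — function name and defaults are illustrative, not the playground's actual API:

```typescript
// Split text on one or more blank lines, then drop chunks that are too short.
function chunkByBlankLines(text: string, minLength = 20): string[] {
  return text
    .split(/\n\s*\n+/)                      // one or more blank lines
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length >= minLength); // min-length filter
}
```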

Embeddings

Each chunk is converted into a vector — an array of numbers (here, 384 dimensions). Similar texts produce similar vectors. This encoding captures meaning, not just keywords.

The playground supports a hash embedder (fast, no download, not semantic) and real transformer models via the Config → Embedder selector. The hash embedder counts 6-character n-gram hashes, which captures surface patterns but not synonyms or paraphrases. Switching to all-MiniLM-L6-v2 or better will dramatically improve recall. Each model uses its own namespace so your data isn't lost when switching.
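A character n-gram hash embedder can be sketched as follows — the hash function (FNV-1a here) and bucketing are illustrative, not the playground's exact scheme:

```typescript
// Hash each 6-character n-gram into one of `dim` buckets and count hits,
// then L2-normalize so cosine similarity reduces to a dot product.
function hashEmbed(text: string, dim = 384, n = 6): Float32Array {
  const vec = new Float32Array(dim);
  const s = text.toLowerCase();
  for (let i = 0; i + n <= s.length; i++) {
    let h = 2166136261; // FNV-1a offset basis
    for (let j = i; j < i + n; j++) {
      h = Math.imul(h ^ s.charCodeAt(j), 16777619);
    }
    vec[(h >>> 0) % dim] += 1; // bucket count for this n-gram
  }
  let normSq = 0;
  for (let i = 0; i < dim; i++) normSq += vec[i] * vec[i];
  const norm = Math.sqrt(normSq);
  if (norm > 0) for (let i = 0; i < dim; i++) vec[i] /= norm;
  return vec;
}
```

Because identical n-grams always hash to the same bucket, the embedder is deterministic — but "car" and "automobile" share no n-grams, which is exactly why synonyms score poorly.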

Click any result or chunk to expand its embedding bar chart — the top-64 active dimensions, showing which features the embedder fired on.
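Selecting the most active dimensions for that chart amounts to ranking by absolute value — a simple sketch (names illustrative):

```typescript
// Return the top-n dimensions of a vector, ranked by absolute magnitude.
function topDims(vec: number[], n = 64): { dim: number; value: number }[] {
  return vec
    .map((value, dim) => ({ dim, value }))
    .sort((a, b) => Math.abs(b.value) - Math.abs(a.value))
    .slice(0, n);
}
```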

Cosine similarity (score)

Retrieval ranks chunks by the cosine similarity between the query vector and each chunk vector. The score is between −1 and 1; higher is more similar.

Scores near 1.0 mean near-identical vectors. Scores near 0.0 mean orthogonal (unrelated). Negative scores never occur with the hash embedder: all of its vector values are non-negative, so no dot product can be negative.
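The scoring formula itself is small enough to show in full:

```typescript
// Cosine similarity: dot product divided by the product of the norms.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

If both vectors are already L2-normalized (as embedders typically emit them), the denominator is 1 and the score is just the dot product.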

The Score dim threshold in Config dims results that score less than X% of the top hit — useful for filtering out low-relevance chunks without hard-coding a cutoff.
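An illustrative version of that relative threshold (the playground's internals may differ):

```typescript
// Mark results scoring below `pct` of the top hit as dimmed.
function dimBelow<T extends { score: number }>(
  results: T[],           // assumed sorted best-first
  pct = 0.3               // the 30% default from Config
): (T & { dimmed: boolean })[] {
  const top = results[0]?.score ?? 0;
  return results.map((r) => ({ ...r, dimmed: r.score < pct * top }));
}
```

Because the cutoff is relative to the best hit, it adapts to each query instead of requiring a fixed absolute score.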

k (top-k)

The k slider in the Results tab controls how many results to return. Larger k = higher recall (more chunks returned), but may include lower-relevance hits.
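Top-k selection is just a rank-and-truncate — a minimal sketch:

```typescript
// Sort chunks by score (descending) and keep the best k.
function topK<T extends { score: number }>(items: T[], k: number): T[] {
  return [...items].sort((a, b) => b.score - a.score).slice(0, k);
}
```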

The Histogram tab shows the full score distribution across all chunks — not just top-k — which helps you choose a good k value and understand how well-separated the relevant chunks are.
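Binning scores for such a histogram can be sketched like this (bin count and range are assumptions, not the playground's exact settings):

```typescript
// Count scores into `bins` equal-width buckets over [0, 1],
// clamping out-of-range values into the edge buckets.
function histogram(scores: number[], bins = 10): number[] {
  const counts = new Array(bins).fill(0);
  for (const s of scores) {
    const i = Math.min(bins - 1, Math.max(0, Math.floor(s * bins)));
    counts[i]++;
  }
  return counts;
}
```

A clear gap between a small cluster of high bins and the bulk of low bins suggests the relevant chunks are well separated, and k can be set to the size of that cluster.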

Storage (OPFS / IndexedDB)

tRAGar persists chunks to the Origin Private File System (OPFS) — a fast, sandboxed, browser-native file store. Chunks survive page reloads within the same origin.

If OPFS is unavailable (an older browser, or a non-secure HTTP context), the library automatically falls back to IndexedDB and emits a StoreFallback warning shown in the log.
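The feature detection behind that fallback might look roughly like this — a hedged sketch, not the library's actual check:

```typescript
// Probe for OPFS support; fall back to IndexedDB if the probe fails.
async function pickStore(): Promise<"opfs" | "indexeddb"> {
  try {
    const nav = (globalThis as any).navigator;
    if (nav?.storage?.getDirectory) {
      await nav.storage.getDirectory(); // throws when OPFS is unavailable
      return "opfs";
    }
  } catch {
    // fall through to the IndexedDB fallback
  }
  return "indexeddb";
}
```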

The store namespace (here: playground) isolates data. Two pages with different namespaces don't share chunks. Use reset storage to wipe the current namespace.

Keyboard shortcuts
Key   | Action
Enter | Run query (when query input focused)