| Title: | Lightweight Vector Database with Embedded Machine Learning Models |
|---|---|
| Description: | A lightweight vector database for text retrieval in R with embedded machine learning models and no external API (Application Programming Interface) keys. Supports dense and hybrid search, optional HNSW (Hierarchical Navigable Small World) approximate nearest-neighbor indexing, faceted filters with ACL (Access Control List) metadata, command-line tools, and a local dashboard built with 'shiny'. The HNSW method is described by Malkov and Yashunin (2018) <doi:10.1109/TPAMI.2018.2889473>. |
| Authors: | Kwadwo Daddy Nyame Owusu Boakye [aut, cre] |
| Maintainer: | Kwadwo Daddy Nyame Owusu Boakye <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 1.1.2 |
| Built: | 2026-05-22 18:51:35 UTC |
| Source: | https://github.com/knowusuboaky/vectrixdb-r |
Create ACLConfig from list of ACL strings
acl_config_from_list(acl_list)
acl_list |
Character vector of ACL strings |
ACLConfig object
ACL configuration for a document or collection
read_principalsWho can read
deny_principalsWho cannot read (takes precedence)
is_publicIs public access allowed
new()
Create a new ACLConfig
ACLConfig$new( read_principals = list(), deny_principals = list(), is_public = FALSE )
read_principalsList of ACLPrincipal objects
deny_principalsList of ACLPrincipal objects
is_publicLogical
clone()
The objects of this class are cloneable with this method.
ACLConfig$clone(deep = FALSE)
deepWhether to make a deep clone.
Access Control List filter for security-aware search
acl_fieldMetadata field containing ACLs
new()
Create a new ACLFilter
ACLFilter$new(acl_field = "_acl")
acl_fieldField name for ACLs (default: "_acl")
filter()
Filter documents based on user's ACL principals
ACLFilter$filter(documents, user_principals, default_allow = FALSE)
documentsList of documents with metadata
user_principalsCharacter vector or list of ACLPrincipal
default_allowAllow if no ACL defined (default: FALSE)
Filtered documents
add_acl()
Add ACL to document metadata
ACLFilter$add_acl(metadata, principals)
metadataDocument metadata
principalsCharacter vector of principal strings
Updated metadata
create_filter_condition()
Create ACL filter condition for query
ACLFilter$create_filter_condition(user_principals)
user_principalsCharacter vector of principals
Filter condition list
clone()
The objects of this class are cloneable with this method.
ACLFilter$clone(deep = FALSE)
deepWhether to make a deep clone.
    ## Not run:
    acl_filter <- ACLFilter$new()
    filtered <- acl_filter$filter(
      documents = results,
      user_principals = c("user:alice", "group:engineering")
    )
    ## End(Not run)
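The precedence rules described for ACLConfig and ACLFilter (deny entries win over read entries, public documents are readable, and `default_allow` decides what happens when no ACL is present) can be sketched in base R. This is only an illustration: the helper `acl_allows()` and the plain-list ACL layout are hypothetical, and checking deny before `is_public` is an assumption rather than documented package behavior.

```r
# Hypothetical helper: decide whether a user may read one document.
# `acl` is a plain list with read, deny (character vectors) and is_public;
# this layout is an assumption made for illustration only.
acl_allows <- function(acl, user_principals, default_allow = FALSE) {
  if (is.null(acl)) return(default_allow)                 # no ACL defined
  if (any(user_principals %in% acl$deny)) return(FALSE)   # deny takes precedence
  if (isTRUE(acl$is_public)) return(TRUE)                 # public access
  any(user_principals %in% acl$read)                      # explicit read grant
}

docs <- list(
  list(id = "a", acl = list(read = "group:engineering", deny = character(0), is_public = FALSE)),
  list(id = "b", acl = list(read = "group:engineering", deny = "user:alice", is_public = FALSE)),
  list(id = "c", acl = list(read = character(0), deny = character(0), is_public = TRUE)),
  list(id = "d", acl = NULL)
)
user <- c("user:alice", "group:engineering")

keep <- vapply(docs, function(d) acl_allows(d$acl, user), logical(1))
visible_ids <- vapply(docs[keep], function(d) d$id, character(1))
```

Document "b" is hidden even though the user's group is granted read access, because the deny entry for `user:alice` is evaluated first; document "d" is hidden because `default_allow` is `FALSE`.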
ACL matching operators
ACLOperator
An object of class list of length 5.
An ACL principal (user, group, or role)
typePrincipal type
valuePrincipal value
new()
Create a new ACLPrincipal
ACLPrincipal$new(type, value)
typePrincipal type (user, group, role)
valuePrincipal value
matches()
Check if this principal matches another
ACLPrincipal$matches(other)
otherAnother ACLPrincipal
Logical
to_string()
Convert to string
ACLPrincipal$to_string()
clone()
The objects of this class are cloneable with this method.
ACLPrincipal$clone(deep = FALSE)
deepWhether to make a deep clone.
Enterprise-grade search capabilities:
Faceted search with aggregations
ACL/Security filtering
Text analyzers (stemming, synonyms, stopwords)
Combines multiple signals for better reranking:
Semantic similarity (word vectors)
BM25/keyword overlap
Query coverage
Position bias
Length normalization
weightsFeature weights
new()
Create a new AdvancedReranker
AdvancedReranker$new( semantic_weight = 0.4, bm25_weight = 0.3, coverage_weight = 0.2, position_weight = 0.1, sentence_embedder = NULL )
semantic_weightWeight for semantic similarity (0-1)
bm25_weightWeight for BM25 score (0-1)
coverage_weightWeight for query term coverage (0-1)
position_weightWeight for position bias (0-1)
sentence_embedderOptional SentenceEmbedder for semantic scoring
set_embedder()
Set sentence embedder
AdvancedReranker$set_embedder(embedder)
embedderSentenceEmbedder object
rerank()
Rerank results
AdvancedReranker$rerank( query, query_vector = NULL, results, doc_vectors = NULL, limit = 10 )
queryQuery text
query_vectorQuery embedding vector
resultsList of result objects with id, text, score
doc_vectorsMatrix of document vectors (optional)
limitNumber of results to return
Reranked list of results
learn_weights()
Learn optimal weights from relevance judgments
AdvancedReranker$learn_weights( queries, results_list, relevance_list, iterations = 100 )
queriesCharacter vector of queries
results_listList of result lists (one per query)
relevance_listList of relevance scores (1=relevant, 0=not)
iterationsNumber of optimization iterations
clone()
The objects of this class are cloneable with this method.
AdvancedReranker$clone(deep = FALSE)
deepWhether to make a deep clone.
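One plausible way to combine the reranking signals listed for AdvancedReranker is a weighted sum. The sketch below uses the default weights from `AdvancedReranker$new()`, but the feature values are made up and the linear scoring rule is an assumption, not the package's confirmed formula.

```r
# Default weights from AdvancedReranker$new(); feature values are invented.
weights <- c(semantic = 0.4, bm25 = 0.3, coverage = 0.2, position = 0.1)

# One row per candidate, one column per signal, each already scaled to [0, 1].
features <- rbind(
  doc1 = c(semantic = 0.9, bm25 = 0.2, coverage = 0.5, position = 1.0),
  doc2 = c(semantic = 0.6, bm25 = 0.9, coverage = 1.0, position = 0.5),
  doc3 = c(semantic = 0.3, bm25 = 0.1, coverage = 0.0, position = 0.0)
)

# Assumed scoring rule: a weighted sum of the signals per candidate.
combined <- drop(features[, names(weights)] %*% weights)
ranking <- names(sort(combined, decreasing = TRUE))
```

Here `doc2` outranks `doc1` despite its lower semantic similarity, because its keyword and coverage signals carry enough combined weight.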
Chain multiple analyzers together
analyzersList of TextAnalyzer objects
new()
Create a new AnalyzerChain
AnalyzerChain$new(analyzers)
analyzersList of TextAnalyzer objects
analyze()
Run text through all analyzers
AnalyzerChain$analyze(text)
textInput text
Character vector of tokens
clone()
The objects of this class are cloneable with this method.
AnalyzerChain$clone(deep = FALSE)
deepWhether to make a deep clone.
Abstract base class for cache backends
configCache configuration
statsCache statistics
new()
Create a new cache
BaseCache$new(config = NULL)
configCacheConfig object
get()
Get a value from cache
BaseCache$get(key)
keyCache key
Cached value or NULL
set()
Set a value in cache
BaseCache$set(key, value, ttl = NULL)
keyCache key
valueValue to cache
ttlTime to live (optional)
delete()
Delete a key from cache
BaseCache$delete(key)
keyCache key
Logical success
exists()
Check if key exists
BaseCache$exists(key)
keyCache key
Logical
clear()
Clear all cache entries
BaseCache$clear()
size()
Get cache size
BaseCache$size()
Integer count
get_many()
Get multiple values
BaseCache$get_many(keys)
keysCharacter vector of keys
Named list of values
set_many()
Set multiple values
BaseCache$set_many(items, ttl = NULL)
itemsNamed list of values
ttlTime to live
delete_many()
Delete multiple keys
BaseCache$delete_many(keys)
keysCharacter vector of keys
Integer count of deleted keys
make_key()
Make a prefixed key
BaseCache$make_key(key)
keyRaw key
Prefixed key
clone()
The objects of this class are cloneable with this method.
BaseCache$clone(deep = FALSE)
deepWhether to make a deep clone.
High-performance caching for low latency
Supports multiple cache backends:
InMemory LRU: Ultra-fast, limited by RAM
File-based: Persistent cache using RDS files
Create config from environment variables
cache_config_from_env()
CacheConfig object
Available cache backends
CacheBackend
An object of class list of length 3.
Configuration for cache layer
backendCache backend type
memory_max_sizeMax items in memory
memory_ttl_secondsDefault TTL in seconds
file_cache_dirDirectory for file cache
file_ttl_secondsFile cache TTL
prefixCache key prefix
compressionUse compression
new()
Create a new CacheConfig
CacheConfig$new( backend = "memory", memory_max_size = 10000, memory_ttl_seconds = 3600, file_cache_dir = NULL, file_ttl_seconds = 86400, prefix = "vectrix:", compression = TRUE )
backendBackend type
memory_max_sizeMax memory items
memory_ttl_secondsMemory TTL
file_cache_dirFile cache directory
file_ttl_secondsFile TTL
prefixKey prefix
compressionUse compression
clone()
The objects of this class are cloneable with this method.
CacheConfig$clone(deep = FALSE)
deepWhether to make a deep clone.
A cached entry with metadata
valueCached value
created_atCreation timestamp
ttlTime to live in seconds
hitsNumber of hits
new()
Create a new CacheEntry
CacheEntry$new(value, ttl)
valueThe value to cache
ttlTime to live in seconds
is_expired()
Check if entry is expired
CacheEntry$is_expired()
Logical
clone()
The objects of this class are cloneable with this method.
CacheEntry$clone(deep = FALSE)
deepWhether to make a deep clone.
Cache statistics for monitoring
hitsCache hits
missesCache misses
setsCache sets
deletesCache deletes
evictionsCache evictions
record_hit()
Record a cache hit
CacheStats$record_hit()
record_miss()
Record a cache miss
CacheStats$record_miss()
record_set()
Record a cache set
CacheStats$record_set()
record_delete()
Record a cache delete
CacheStats$record_delete()
record_eviction()
Record a cache eviction
CacheStats$record_eviction()
hit_rate()
Get hit rate
CacheStats$hit_rate()
Numeric hit rate
to_list()
Convert to list
CacheStats$to_list()
List representation
reset()
Reset statistics
CacheStats$reset()
clone()
The objects of this class are cloneable with this method.
CacheStats$clone(deep = FALSE)
deepWhether to make a deep clone.
CLI tools for VectrixDB operations
Provides command-line style functions for:
Creating and managing collections
Adding and searching documents
Exporting and importing data
Database statistics and info
Configuration for CLI behavior
verbosePrint verbose output
colorUse colored output
data_dirDefault data directory
new()
Create CLI config
CLIConfig$new(verbose = TRUE, color = TRUE, data_dir = NULL)
verboseVerbose output
colorColored output
data_dirData directory
clone()
The objects of this class are cloneable with this method.
CLIConfig$clone(deep = FALSE)
deepWhether to make a deep clone.
Vector collection with indexing and search
nameCollection name
dimensionVector dimension
metricDistance metric
languageLanguage setting ("en" or "ml")
new()
Create a new Collection
Collection$new( name, dimension, metric = "cosine", storage = NULL, language = "en" )
nameCollection name
dimensionVector dimension
metricDistance metric
storageStorage backend
languageLanguage behavior ("en" = ASCII-focused, "ml" = Unicode-aware)
add()
Add documents to collection
Collection$add(ids, vectors, metadata = NULL, texts = NULL)
idsDocument IDs
vectorsMatrix of vectors
metadataList of metadata
textsCharacter vector of texts
search()
Search collection
Collection$search(query, limit = 10, filter = NULL, include_vectors = FALSE)
queryQuery vector
limitNumber of results
filterMetadata filter
include_vectorsInclude vectors in results
Results object
keyword_search()
Keyword search
Collection$keyword_search(query_text, limit = 10, filter = NULL)
query_textQuery text
limitNumber of results
filterMetadata filter
Results object
hybrid_search()
Hybrid search (dense + sparse)
Collection$hybrid_search( query, query_text, limit = 10, vector_weight = 0.5, text_weight = 0.5, filter = NULL, include_vectors = FALSE, rrf_k = 60, prefetch_multiplier = 10 )
queryQuery vector
query_textQuery text
limitNumber of results
vector_weightWeight for vector search
text_weightWeight for text search
filterMetadata filter
include_vectorsInclude vectors in results
rrf_kRRF constant
prefetch_multiplierPrefetch multiplier
Results object
get()
Get documents by ID
Collection$get(ids)
idsDocument IDs
List of results
delete()
Delete documents by ID
Collection$delete(ids)
idsDocument IDs to delete
count()
Get document count
Collection$count()
Integer count
clear()
Clear collection
Collection$clear()
clone()
The objects of this class are cloneable with this method.
Collection$clone(deep = FALSE)
deepWhether to make a deep clone.
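The `rrf_k`, `vector_weight`, and `text_weight` arguments of `hybrid_search()` suggest a weighted reciprocal rank fusion (RRF) of the dense and keyword result lists. A minimal base R sketch of that technique follows; the helper `rrf_fuse()` is hypothetical and the exact way the package applies its weights is an assumption.

```r
# Hypothetical helper: weighted reciprocal rank fusion over two ranked ID lists.
rrf_fuse <- function(vector_ids, text_ids, vector_weight = 0.5,
                     text_weight = 0.5, rrf_k = 60, limit = 10) {
  ids <- union(vector_ids, text_ids)
  # match() gives the 1-based rank, or NA when the ID is absent from a list.
  contribution <- function(ranked, w) {
    r <- match(ids, ranked)
    ifelse(is.na(r), 0, w / (rrf_k + r))
  }
  score <- contribution(vector_ids, vector_weight) +
    contribution(text_ids, text_weight)
  names(score) <- ids
  head(sort(score, decreasing = TRUE), limit)
}

fused <- rrf_fuse(
  vector_ids = c("d1", "d2", "d3"),
  text_ids = c("d3", "d1", "d4")
)
```

Documents that appear in both lists ("d1" and "d3") rise to the top, since each list contributes `weight / (rrf_k + rank)` to their score.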
A community of entities
idCommunity ID
levelHierarchy level
entity_idsMember entity IDs
summaryCommunity summary
parent_idParent community ID
child_idsChild community IDs
new()
Create a new Community
Community$new( id, level = 0, entity_ids = character(0), summary = NULL, parent_id = NULL )
idID
levelLevel
entity_idsMembers
summarySummary
parent_idParent
size()
Get size
Community$size()
Integer
clone()
The objects of this class are cloneable with this method.
Community$clone(deep = FALSE)
deepWhether to make a deep clone.
Detects communities using connected components
min_sizeMinimum community size
max_levelsMaximum hierarchy levels
new()
Create a new CommunityDetector
CommunityDetector$new(min_size = 5, max_levels = 3)
min_sizeMin size
max_levelsMax levels
detect()
Detect communities in graph
CommunityDetector$detect(graph)
graphKnowledgeGraph object
List of Community objects
clone()
The objects of this class are cloneable with this method.
CommunityDetector$clone(deep = FALSE)
deepWhether to make a deep clone.
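CommunityDetector is described as using connected components. The following base R sketch shows that technique on a small undirected edge list; `connected_components()` and the two-column edge matrix are hypothetical and do not reflect the KnowledgeGraph class's internal storage.

```r
# Hypothetical helper: label connected components of an undirected graph.
# `nodes` is a character vector; `edges` is a two-column character matrix.
connected_components <- function(nodes, edges) {
  label <- setNames(rep(NA_integer_, length(nodes)), nodes)
  current <- 0L
  for (start in nodes) {
    if (!is.na(label[[start]])) next
    current <- current + 1L
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      if (!is.na(label[[node]])) next
      label[[node]] <- current
      # Neighbours in either direction, since the graph is treated as undirected.
      nb <- c(edges[edges[, 1] == node, 2], edges[edges[, 2] == node, 1])
      queue <- c(queue, nb[is.na(label[nb])])
    }
  }
  split(names(label), label)
}

edges <- rbind(c("a", "b"), c("b", "c"), c("d", "e"))
groups <- connected_components(c("a", "b", "c", "d", "e", "f"), edges)

# Keep only groups that reach a minimum size, mirroring the min_size idea.
big_enough <- Filter(function(g) length(g) >= 2, groups)
```

The isolated node "f" forms its own component and is then dropped by the size threshold.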
Factory function to create cache backend
create_cache(config = NULL)
config |
CacheConfig object or NULL for defaults |
Cache object
Create default config with regex extractor
create_default_graphrag_config(...)
... |
Additional options |
GraphRAGConfig object
Factory function to create HNSW index
create_hnsw_index(dimension, metric = "angular", n_trees = 50)
dimension |
Vector dimension |
metric |
Distance metric |
n_trees |
Number of trees |
HNSWIndex object
Factory function for GraphRAGPipeline
create_pipeline(config = NULL)
config |
GraphRAGConfig (optional) |
GraphRAGPipeline object
Convenience function to create a SentenceEmbedder with GloVe vectors
create_sentence_embedder(model = "glove-100", use_idf = TRUE)
model |
Model name (default: "glove-100") |
use_idf |
Use IDF weighting |
SentenceEmbedder object
    ## Not run:
    # Downloads GloVe if not present
    embedder <- create_sentence_embedder("glove-100")

    # Embed texts
    vectors <- embedder$embed(c("Hello world", "Machine learning is cool"))
    ## End(Not run)
Create a VectorCache with specified backend
create_vector_cache(backend = "memory", ...)
backend |
Backend type: "memory", "file", or "none" |
... |
Additional config options |
VectorCache object
Generates dense vector embeddings using pre-trained word vectors
dimensionEmbedding dimension
model_typeType of model being used
languageLanguage setting ("en" or "ml")
new()
Create a new DenseEmbedder
DenseEmbedder$new( dimension = 100, model_path = NULL, model_type = "tfidf", sentence_embedder = NULL, auto_download = FALSE, language = "en" )
dimensionVector dimension (default: 100 for word2vec, 50/100/200/300 for GloVe)
model_pathOptional path to pre-trained model file
model_typeType: "word2vec", "glove", "glove-pretrained", or "tfidf"
sentence_embedderOptional SentenceEmbedder object to use
auto_downloadAuto-download GloVe vectors if model_type is glove-pretrained
languageLanguage behavior ("en" = ASCII-focused, "ml" = Unicode-aware)
set_sentence_embedder()
Set a SentenceEmbedder to use for embeddings
DenseEmbedder$set_sentence_embedder(embedder)
embedderSentenceEmbedder object
embed()
Embed texts to vectors
DenseEmbedder$embed(texts)
textsCharacter vector of texts
Matrix of embeddings (rows are documents)
fit()
Train embedder on corpus (for TF-IDF)
DenseEmbedder$fit(texts)
textsCharacter vector of training texts
clone()
The objects of this class are cloneable with this method.
DenseEmbedder$clone(deep = FALSE)
deepWhether to make a deep clone.
Available distance metrics for vector comparison
DistanceMetric
An object of class list of length 4.
Splits documents into text units
chunk_sizeTarget chunk size
chunk_overlapOverlap size
by_sentencePreserve sentences
new()
Create a new DocumentChunker
DocumentChunker$new(chunk_size = 1200, chunk_overlap = 100, by_sentence = TRUE)
chunk_sizeTarget size
chunk_overlapOverlap
by_sentencePreserve sentences
chunk()
Chunk a document
DocumentChunker$chunk(text, document_id = NULL)
textDocument text
document_idDocument ID
List of TextUnit objects
clone()
The objects of this class are cloneable with this method.
DocumentChunker$clone(deep = FALSE)
deepWhether to make a deep clone.
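The interaction between `chunk_size` and `chunk_overlap` can be illustrated with a sliding window. This sketch measures sizes in characters and ignores `by_sentence`; both choices are assumptions for illustration, and `chunk_text()` is a hypothetical helper rather than the DocumentChunker implementation.

```r
# Hypothetical helper: fixed-size character windows with overlap.
chunk_text <- function(text, chunk_size = 1200, chunk_overlap = 100) {
  stopifnot(chunk_overlap < chunk_size)
  n <- nchar(text)
  if (n == 0) return(character(0))
  step <- chunk_size - chunk_overlap            # distance between chunk starts
  starts <- seq(1, max(1, n - chunk_overlap), by = step)
  substring(text, starts, pmin(starts + chunk_size - 1, n))
}

text <- paste(rep("abcdefghij", 3), collapse = "")  # 30 characters
chunks <- chunk_text(text, chunk_size = 12, chunk_overlap = 4)
```

Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so the last four characters of one chunk reappear at the start of the next.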
Download GloVe or other pre-trained word vectors
download_vectors(model = "glove-50", dest_dir = NULL)
model |
Model name: "glove-50", "glove-100", "glove-200", "glove-300" |
dest_dir |
Destination directory |
Path to downloaded model
Downloads GloVe or fastText word vectors
download_word_vectors(model = "glove-100", dest_dir = NULL, overwrite = FALSE)
model |
Model to download: "glove-50", "glove-100", "glove-200", "glove-300", "glove-twitter-25", "glove-twitter-50", "glove-twitter-100", "glove-twitter-200" |
dest_dir |
Destination directory (default: user cache) |
overwrite |
Overwrite existing files |
Path to the downloaded vectors file
    ## Not run:
    # Download 100-dimensional GloVe vectors (~130MB)
    path <- download_word_vectors("glove-100")

    # Use with Vectrix
    db <- Vectrix$new("docs", model = "glove", model_path = path)
    ## End(Not run)
Embedding models for text vectorization using R-native packages
Common English stopwords
ENGLISH_STOPWORDS
An object of class character of length 45.
Search results with enterprise features
resultsList of result items
facetsNamed list of FacetResult objects
total_countTotal results before filtering
filtered_countResults after ACL filtering
query_time_msQuery time in milliseconds
rerank_time_msRerank time in milliseconds
facet_time_msFacet time in milliseconds
new()
Create new EnhancedSearchResults
EnhancedSearchResults$new( results, facets = list(), total_count = 0, filtered_count = 0, query_time_ms = 0, rerank_time_ms = 0, facet_time_ms = 0 )
resultsList of results
facetsNamed list of FacetResult
total_countTotal count
filtered_countFiltered count
query_time_msQuery time
rerank_time_msRerank time
facet_time_msFacet time
to_list()
Convert to list
EnhancedSearchResults$to_list()
List representation
clone()
The objects of this class are cloneable with this method.
EnhancedSearchResults$clone(deep = FALSE)
deepWhether to make a deep clone.
An extracted entity
idUnique identifier
nameEntity name
typeEntity type
descriptionDescription
source_chunksSource chunk IDs
embeddingVector embedding
metadataAdditional metadata
new()
Create a new Entity
Entity$new( id = NULL, name, type, description = NULL, source_chunks = NULL, embedding = NULL, metadata = NULL )
idUnique ID
nameName
typeType
descriptionDescription
source_chunksSources
embeddingVector
metadataMetadata
to_list()
Convert to list
Entity$to_list()
clone()
The objects of this class are cloneable with this method.
Entity$clone(deep = FALSE)
deepWhether to make a deep clone.
Result of entity extraction
entitiesList of Entity objects
relationshipsList of Relationship objects
source_chunkSource chunk ID
new()
Create new ExtractionResult
ExtractionResult$new( entities = list(), relationships = list(), source_chunk = NULL )
entitiesEntities
relationshipsRelationships
source_chunkSource
clone()
The objects of this class are cloneable with this method.
ExtractionResult$clone(deep = FALSE)
deepWhether to make a deep clone.
Extractor Types
ExtractorType
An object of class list of length 4.
Faceted search aggregator for computing aggregations/counts
aggregate()
Aggregate facet values from documents
FacetAggregator$aggregate(documents, facet_configs)
documentsList of documents with metadata
facet_configsList of field names or FacetConfig objects
Named list mapping field names to FacetResult
to_list()
Convert facet results to list format
FacetAggregator$to_list(facet_results)
facet_resultsNamed list of FacetResult objects
List format suitable for JSON
clone()
The objects of this class are cloneable with this method.
FacetAggregator$clone(deep = FALSE)
deepWhether to make a deep clone.
    ## Not run:
    aggregator <- FacetAggregator$new()
    facets <- aggregator$aggregate(
      documents = list(
        list(category = "tech", author = "Alice"),
        list(category = "science", author = "Bob")
      ),
      facet_configs = c("category", "author")
    )
    ## End(Not run)
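The fields of FacetConfig and FacetResult (`limit`, `min_count`, `total_count`, `other_count`) describe a top-N count per metadata field. A rough base R equivalent is sketched below; `facet_counts()` is a hypothetical helper, and the flat document layout is an assumption taken from the example above.

```r
# Hypothetical helper: count values of one metadata field across documents.
facet_counts <- function(documents, field, limit = 10, min_count = 1) {
  values <- unlist(lapply(documents, function(d) d[[field]]))
  tab <- table(values)
  counts <- setNames(as.integer(tab), names(tab))
  counts <- sort(counts, decreasing = TRUE)      # most frequent first
  counts <- counts[counts >= min_count]          # drop rare values
  top <- head(counts, limit)                     # keep the top-N values
  list(
    field = field,
    values = top,
    total_count = length(values),
    other_count = sum(counts) - sum(top)         # occurrences outside the top-N
  )
}

docs <- list(
  list(category = "tech", author = "Alice"),
  list(category = "tech", author = "Bob"),
  list(category = "science", author = "Alice")
)
by_category <- facet_counts(docs, "category", limit = 1)
```

With `limit = 1`, only "tech" is reported and the single "science" document is accounted for in `other_count`.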
Configuration for a facet field
fieldField name to facet on
limitMax values to return
min_countMinimum count to include
sort_bySort by "count" or "value"
include_zeroInclude zero-count values
new()
Create a new FacetConfig
FacetConfig$new( field, limit = 10, min_count = 1, sort_by = "count", include_zero = FALSE )
fieldField name
limitMax values (default: 10)
min_countMin count (default: 1)
sort_bySort method (default: "count")
include_zeroInclude zeros (default: FALSE)
clone()
The objects of this class are cloneable with this method.
FacetConfig$clone(deep = FALSE)
deepWhether to make a deep clone.
Result of facet aggregation
fieldField name
valuesList of FacetValue objects
total_countTotal count
other_countCount of values not in top-N
new()
Create a new FacetResult
FacetResult$new(field, values, total_count, other_count = 0)
fieldField name
valuesList of FacetValue objects
total_countTotal count
other_countOther count
to_list()
Convert to list
FacetResult$to_list()
clone()
The objects of this class are cloneable with this method.
FacetResult$clone(deep = FALSE)
deepWhether to make a deep clone.
A single facet value with count
valueThe facet value
countNumber of occurrences
new()
Create a new FacetValue
FacetValue$new(value, count)
valueThe value
countThe count
clone()
The objects of this class are cloneable with this method.
FacetValue$clone(deep = FALSE)
deepWhether to make a deep clone.
File-based persistent cache using RDS files
VectrixDB::BaseCache -> FileCache
new()
Create a new FileCache
FileCache$new(config = NULL)
configCacheConfig object
get()
Get value from cache
FileCache$get(key)
keyCache key
Value or NULL
set()
Set value in cache
FileCache$set(key, value, ttl = NULL)
keyCache key
valueValue to cache
ttlTime to live
delete()
Delete key from cache
FileCache$delete(key)
keyCache key
Logical success
exists()
Check if key exists
FileCache$exists(key)
keyCache key
Logical
clear()
Clear cache
FileCache$clear()
size()
Get cache size
FileCache$size()
Integer
cleanup_expired()
Cleanup expired entries
FileCache$cleanup_expired()
Integer count removed
clone()
The objects of this class are cloneable with this method.
FileCache$clone(deep = FALSE)
deepWhether to make a deep clone.
Build metadata filters for search queries
conditionsList of filter conditions
new()
Create a new Filter
Filter$new(...)
...Named filter conditions
eq()
Add equality condition
Filter$eq(field, value)
fieldField name
valueValue to match
Self for chaining
ne()
Add not-equal condition
Filter$ne(field, value)
fieldField name
valueValue to exclude
Self for chaining
gt()
Add greater-than condition
Filter$gt(field, value)
fieldField name
valueThreshold value
Self for chaining
lt()
Add less-than condition
Filter$lt(field, value)
fieldField name
valueThreshold value
Self for chaining
in_list()
Add in-list condition
Filter$in_list(field, values)
fieldField name
valuesVector of values
Self for chaining
to_list()
Convert to list for API
Filter$to_list()
List representation
clone()
The objects of this class are cloneable with this method.
Filter$clone(deep = FALSE)
deepWhether to make a deep clone.
Community-based graph search
communitiesList of communities
kNumber of communities
new()
Create a new GlobalSearcher
GlobalSearcher$new(communities, k = 5)
communitiesList of Community objects
kNumber of communities
search()
Search communities
GlobalSearcher$search(query)
queryQuery string
GlobalSearchResult
clone()
The objects of this class are cloneable with this method.
GlobalSearcher$clone(deep = FALSE)
deepWhether to make a deep clone.
Result from global community search
communitiesMatching communities
summariesCommunity summaries
contextCombined context
scoreRelevance score
new()
Create new GlobalSearchResult
GlobalSearchResult$new( communities = list(), summaries = character(0), context = NULL, score = 0 )
communitiesCommunities
summariesSummaries
contextContext
scoreScore
clone()
The objects of this class are cloneable with this method.
GlobalSearchResult$clone(deep = FALSE)
deepWhether to make a deep clone.
Native GraphRAG implementation for VectrixDB
Features:
Entity and relationship extraction
Hierarchical community detection
Local, global, and hybrid search strategies
Incremental graph updates
Configuration for VectrixDB's GraphRAG implementation
enabledWhether GraphRAG is enabled
chunk_sizeTarget tokens per chunk
chunk_overlapOverlapping tokens
chunk_by_sentencePreserve sentence boundaries
extractorExtraction method
nlp_modelNLP model name
llm_providerLLM provider
llm_modelModel name
llm_api_keyAPI key
llm_endpointCustom endpoint
llm_temperatureTemperature
llm_max_tokensMax tokens
max_community_levelsMax hierarchy depth
min_community_sizeMin entities per community
relationship_thresholdMin relationship strength
deduplicate_entitiesMerge similar entities
entity_similarity_thresholdSimilarity for dedup
search_typeDefault search strategy
local_search_kSeed entities for local search
global_search_kCommunities for global search
traversal_depthMax hops
include_relationshipsInclude relationship context
include_community_contextInclude community summaries
enable_incrementalIncremental updates
batch_sizeChunks per batch
use_cacheCache embeddings
cache_ttlCache TTL seconds
entity_typesTypes to extract
relationship_typesTypes to extract
new()
Create a new GraphRAGConfig
GraphRAGConfig$new(enabled = FALSE, ...)
enabledEnable GraphRAG
...Additional configuration options
with_openai()
Configure for OpenAI
GraphRAGConfig$with_openai(model = "gpt-4o-mini", api_key = NULL)
modelModel name
api_keyAPI key
Self
with_ollama()
Configure for Ollama
GraphRAGConfig$with_ollama( model = "llama3.2", endpoint = "http://localhost:11434" )
modelModel name
endpointEndpoint URL
Self
clone()
The objects of this class are cloneable with this method.
GraphRAGConfig$clone(deep = FALSE)
deepWhether to make a deep clone.
    ## Not run:
    config <- GraphRAGConfig$new(enabled = TRUE)
    db <- Vectrix$new("knowledge_base", graphrag_config = config)
    ## End(Not run)
Complete GraphRAG processing pipeline
configGraphRAGConfig
graphKnowledgeGraph
communitiesDetected communities
new()
Create a new GraphRAGPipeline
GraphRAGPipeline$new(config = NULL)
configGraphRAGConfig
process()
Process documents
GraphRAGPipeline$process(texts, document_ids = NULL)
textsCharacter vector of documents
document_idsDocument IDs
Self
search()
Search the graph
GraphRAGPipeline$search(query, search_type = NULL)
queryQuery string
search_type"local", "global", or "hybrid"
Search result
stats()
Get statistics
GraphRAGPipeline$stats()
Named list
clone()
The objects of this class are cloneable with this method.
GraphRAGPipeline$clone(deep = FALSE)
deepWhether to make a deep clone.
Graph Search Types
GraphSearchType
An object of class list of length 3.
Hierarchical Navigable Small World graph for fast approximate nearest neighbor search
Uses RcppAnnoy for high-performance ANN search, which is why the index is tuned through n_trees and search_k. Falls back to brute-force search if RcppAnnoy is not available.
High-performance approximate nearest neighbor index
dimensionVector dimension
metricDistance metric
n_treesNumber of trees (for Annoy)
search_kSearch parameter
new()
Create a new HNSWIndex
HNSWIndex$new(dimension, metric = "angular", n_trees = 50, search_k = -1)
dimensionVector dimension
metricDistance metric: "angular", "euclidean", "manhattan", "dot"
n_treesNumber of trees for index (higher = more accuracy)
search_kSearch parameter (higher = more accuracy, -1 = auto)
add_items()
Add items to the index
HNSWIndex$add_items(ids, vectors)
idsCharacter vector of IDs
vectorsMatrix of vectors (rows = items)
Self
build()
Build the index (required before searching)
HNSWIndex$build()
Self
search()
Search for nearest neighbors
HNSWIndex$search(query, k = 10, include_distances = TRUE)
queryQuery vector
kNumber of neighbors
include_distancesReturn distances
Data frame with id, distance columns
get_vector()
Get vector by ID
HNSWIndex$get_vector(id)
idItem ID
Vector or NULL
get_ids()
Get all IDs
HNSWIndex$get_ids()
Character vector
size()
Get item count
HNSWIndex$size()
Integer
remove_items()
Remove items from index
HNSWIndex$remove_items(ids)
idsIDs to remove
Self
clear()
Clear the index
HNSWIndex$clear()
Self
save()
Save index to file
HNSWIndex$save(path)
pathFile path
load()
Load index from file
HNSWIndex$load(path)
pathFile path
Self
clone()
The objects of this class are cloneable with this method.
HNSWIndex$clone(deep = FALSE)
deepWhether to make a deep clone.
    ## Not run:
    # Create index
    index <- HNSWIndex$new(dimension = 128, metric = "angular")

    # Add vectors (3 items x 128 dimensions)
    index$add_items(ids = c("a", "b", "c"), vectors = matrix(rnorm(384), nrow = 3))

    # Build the index (required before searching)
    index$build()

    # Search
    results <- index$search(query = rnorm(128), k = 5)
    ## End(Not run)
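The HNSWIndex description mentions a brute-force fallback when RcppAnnoy is unavailable. An exact search of that kind can be sketched in a few lines of base R. The helper `brute_force_search()` is hypothetical, and reporting `1 - cosine similarity` as the distance is an assumption; it is not necessarily the value the package or Annoy returns for the "angular" metric.

```r
# Hypothetical helper: exact cosine nearest neighbours over a matrix of rows.
brute_force_search <- function(vectors, ids, query, k = 10) {
  row_norms <- sqrt(rowSums(vectors^2))
  # Cosine similarity between the query and every stored vector.
  sims <- drop(vectors %*% query) / (row_norms * sqrt(sum(query^2)))
  top <- head(order(sims, decreasing = TRUE), k)
  data.frame(id = ids[top], distance = 1 - sims[top], stringsAsFactors = FALSE)
}

vectors <- rbind(c(1, 0), c(0, 1), c(1, 1))
hits <- brute_force_search(vectors, ids = c("a", "b", "c"),
                           query = c(1, 0.1), k = 2)
```

The cost grows linearly with the number of stored vectors, which is the trade-off an approximate index is meant to avoid.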
Treats entire input as single token
VectrixDB::TextAnalyzer -> KeywordAnalyzer
analyze()
Analyze text as single keyword
KeywordAnalyzer$analyze(text)
textInput text
Single-element character vector
clone()
The objects of this class are cloneable with this method.
KeywordAnalyzer$clone(deep = FALSE)
deepWhether to make a deep clone.
Graph storage for entities and relationships
nameGraph name
new()
Create a new KnowledgeGraph
KnowledgeGraph$new(name = "default")
nameGraph name
add_entity()
Add an entity
KnowledgeGraph$add_entity(entity)
entityEntity object
add_relationship()
Add a relationship
KnowledgeGraph$add_relationship(relationship)
relationshipRelationship object
get_entity()
Get entity by ID
KnowledgeGraph$get_entity(entity_id)
entity_idEntity ID
Entity or NULL
get_all_entities()
Get all entities
KnowledgeGraph$get_all_entities()
List of Entity objects
get_all_relationships()
Get all relationships
KnowledgeGraph$get_all_relationships()
List of Relationship objects
get_neighbors()
Get neighbors of an entity
KnowledgeGraph$get_neighbors(entity_id, direction = "both")
entity_idEntity ID
direction"out", "in", or "both"
List of Entity objects
traverse()
Traverse graph from seed entities
KnowledgeGraph$traverse(seed_ids, max_depth = 2)
seed_idsStarting entity IDs
max_depthMaximum depth
SubGraph object
entity_count()
Get entity count
KnowledgeGraph$entity_count()
Integer
relationship_count()
Get relationship count
KnowledgeGraph$relationship_count()
Integer
search_entities()
Search entities by name
KnowledgeGraph$search_entities(query, limit = 10)
queryQuery string
limitMax results
List of Entity objects
clone()
The objects of this class are cloneable with this method.
KnowledgeGraph$clone(deep = FALSE)
deepWhether to make a deep clone.
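The `traverse()` method expands outward from seed entities up to `max_depth` hops. A breadth-first version of that idea is sketched below in base R; `traverse_ids()` and the `from`/`to` edge data frame are hypothetical, and following edges in both directions is an assumption.

```r
# Hypothetical helper: collect entity IDs reachable within max_depth hops.
traverse_ids <- function(edges, seed_ids, max_depth = 2) {
  visited <- unique(seed_ids)
  frontier <- visited
  for (depth in seq_len(max_depth)) {
    # Follow edges in both directions from the current frontier.
    nb <- c(edges$to[edges$from %in% frontier],
            edges$from[edges$to %in% frontier])
    frontier <- setdiff(unique(nb), visited)
    if (length(frontier) == 0) break
    visited <- c(visited, frontier)
  }
  visited
}

edges <- data.frame(
  from = c("a", "b", "c", "x"),
  to = c("b", "c", "d", "a"),
  stringsAsFactors = FALSE
)
reached <- traverse_ids(edges, seed_ids = "a", max_depth = 2)
```

Entity "d" sits three hops from the seed, so it is excluded at `max_depth = 2`.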
Token-level embeddings for late interaction scoring
dimensionToken embedding dimension
languageLanguage setting ("en" or "ml")
new()
Create a new LateInteractionEmbedder
LateInteractionEmbedder$new(dimension = 64, language = "en")
dimensionEmbedding dimension per token
languageLanguage behavior ("en" = ASCII-focused, "ml" = Unicode-aware)
embed()
Embed texts to token-level embeddings
LateInteractionEmbedder$embed(texts)
textsCharacter vector of texts
List of matrices (each matrix is token embeddings for a document)
score()
Compute late interaction (MaxSim) score
LateInteractionEmbedder$score(query_embeddings, doc_embeddings)
query_embeddingsQuery token embeddings matrix
doc_embeddingsDocument token embeddings matrix
Numeric score
clone()
The objects of this class are cloneable with this method.
LateInteractionEmbedder$clone(deep = FALSE)
deepWhether to make a deep clone.
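Late interaction (MaxSim) scoring is commonly defined as follows: for each query token, take the highest similarity to any document token, then sum over query tokens. The sketch below shows that definition in base R. The helper `maxsim_sketch()` is hypothetical, and both the use of cosine similarity and the summing (rather than averaging) are assumptions; the package's score() method may differ.

```r
# Rows are tokens, columns are embedding dimensions.
maxsim_sketch <- function(query_embeddings, doc_embeddings) {
  normalize_rows <- function(m) {
    norms <- sqrt(rowSums(m^2))
    norms[norms == 0] <- 1          # avoid division by zero for empty rows
    m / norms                       # divides row i by norms[i]
  }
  q <- normalize_rows(query_embeddings)
  d <- normalize_rows(doc_embeddings)
  sims <- q %*% t(d)                # query tokens x document tokens
  sum(apply(sims, 1, max))          # best document match per query token
}

q <- rbind(c(1, 0), c(0, 1))
d <- rbind(c(1, 0), c(1, 1), c(0, 2))
score <- maxsim_sketch(q, d)
print(score)
```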
LLM Provider Types
LLMProvider
An object of class list of length 4.
Load saved index from file
load_hnsw_index(path)
| path | File path |
HNSWIndex object
Loads pre-trained word vectors from a file
load_word_vectors(path, max_words = NULL, normalize = TRUE)
| path | Path to word vectors file (GloVe .txt or word2vec .bin) |
| max_words | Maximum number of words to load (NULL for all) |
| normalize | Normalize vectors to unit length |
WordVectors object
Entity-based graph search
graphKnowledge graph
kNumber of seed entities
traversal_depthMax hops
new()
Create a new LocalSearcher
LocalSearcher$new(graph, k = 10, traversal_depth = 2)
graphKnowledgeGraph
kSeed entities
traversal_depthMax depth
search()
Search the graph
LocalSearcher$search(query)
queryQuery string
LocalSearchResult
clone()
The objects of this class are cloneable with this method.
LocalSearcher$clone(deep = FALSE)
deepWhether to make a deep clone.
Result from local graph search
entitiesMatching entities
relationshipsRelated relationships
subgraphTraversed subgraph
contextCombined context text
scoreRelevance score
new()
Create new LocalSearchResult
LocalSearchResult$new( entities = list(), relationships = list(), subgraph = NULL, context = NULL, score = 0 )
entitiesEntities
relationshipsRelationships
subgraphSubGraph
contextContext
scoreScore
clone()
The objects of this class are cloneable with this method.
LocalSearchResult$clone(deep = FALSE)
deepWhether to make a deep clone.
In-memory LRU cache with TTL support
Ultra-low latency, limited by available RAM. Best for hot data, session data, frequently accessed vectors.
VectrixDB::BaseCache -> MemoryCache
new()
Create a new MemoryCache
MemoryCache$new(config = NULL)
configCacheConfig object
get()
Get value from cache
MemoryCache$get(key)
keyCache key
Value or NULL
set()
Set value in cache
MemoryCache$set(key, value, ttl = NULL)
keyCache key
valueValue to cache
ttlTime to live
delete()
Delete key from cache
MemoryCache$delete(key)
keyCache key
Logical success
exists()
Check if key exists
MemoryCache$exists(key)
keyCache key
Logical
clear()
Clear cache
MemoryCache$clear()
size()
Get cache size
MemoryCache$size()
Integer
cleanup_expired()
Cleanup expired entries
MemoryCache$cleanup_expired()
Integer count removed
clone()
The objects of this class are cloneable with this method.
MemoryCache$clone(deep = FALSE)
deepWhether to make a deep clone.
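The combination of least-recently-used eviction and per-entry TTL can be sketched with a closure over a named list. This is an illustration only: `new_lru_cache()` and its `now` argument are hypothetical (the explicit clock makes the example deterministic), TTL is assumed to be in seconds, and MemoryCache's internals may be organized differently.

```r
new_lru_cache <- function(max_size = 2) {
  # Named list; element order tracks recency (last = most recently used).
  store <- list()

  cache_get <- function(key, now = Sys.time()) {
    entry <- store[[key]]
    if (is.null(entry)) return(NULL)
    if (!is.null(entry$expires_at) && now >= entry$expires_at) {
      store[[key]] <<- NULL         # expired: drop and report a miss
      return(NULL)
    }
    store[[key]] <<- NULL           # move the entry to the end (most recent)
    store[[key]] <<- entry
    entry$value
  }

  cache_set <- function(key, value, ttl = NULL, now = Sys.time()) {
    store[[key]] <<- NULL
    if (length(store) >= max_size) store[[1]] <<- NULL  # evict least recent
    expires_at <- if (is.null(ttl)) NULL else now + ttl
    store[[key]] <<- list(value = value, expires_at = expires_at)
    invisible(NULL)
  }

  list(get = cache_get, set = cache_set, size = function() length(store))
}

cache <- new_lru_cache(max_size = 2)
cache$set("a", 1, ttl = 10, now = 0)
cache$set("b", 2, now = 0)
cache$get("a", now = 5)             # touching "a" makes "b" the eviction candidate
cache$set("c", 3, now = 6)          # cache is full, so "b" is evicted
```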
Reranks results for diversity using the Maximal Marginal Relevance (MMR) algorithm
lambdaBalance between relevance and diversity (0-1)
new()
Create a new MMRReranker
MMRReranker$new(lambda = 0.7)
lambdaRelevance vs diversity tradeoff (higher = more relevance)
rerank()
Rerank for diversity
MMRReranker$rerank(query_vector, doc_vectors, doc_ids, scores, limit = 10)
query_vectorQuery embedding
doc_vectorsMatrix of document embeddings
doc_idsVector of document IDs
scoresOriginal relevance scores
limitNumber of results
Data frame with reranked results
clone()
The objects of this class are cloneable with this method.
MMRReranker$clone(deep = FALSE)
deepWhether to make a deep clone.
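The lambda tradeoff is easiest to see in a small greedy loop: each step picks the candidate that maximizes lambda times relevance minus (1 - lambda) times its highest similarity to anything already selected. The sketch below is a base R illustration under stated assumptions: `mmr_sketch()` is hypothetical, relevance is taken directly from `scores`, and redundancy uses cosine similarity. It is not the package's rerank() code.

```r
mmr_sketch <- function(doc_vectors, doc_ids, scores, limit = 10, lambda = 0.7) {
  cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  selected <- integer(0)
  candidates <- seq_along(doc_ids)
  while (length(selected) < limit && length(candidates) > 0) {
    mmr <- vapply(candidates, function(i) {
      redundancy <- 0
      if (length(selected) > 0) {
        # Highest similarity to any document already chosen
        redundancy <- max(vapply(selected, function(j) {
          cosine(doc_vectors[i, ], doc_vectors[j, ])
        }, numeric(1)))
      }
      lambda * scores[i] - (1 - lambda) * redundancy
    }, numeric(1))
    best <- candidates[which.max(mmr)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  data.frame(id = doc_ids[selected], score = scores[selected],
             stringsAsFactors = FALSE)
}

doc_vectors <- rbind(c(1, 0), c(0.99, 0.1), c(0, 1))
reranked <- mmr_sketch(doc_vectors, c("d1", "d2", "d3"),
                       scores = c(0.9, 0.85, 0.5), limit = 3, lambda = 0.5)
print(reranked)
```

Here "d2" is nearly a duplicate of "d1", so with lambda = 0.5 the less relevant but dissimilar "d3" is promoted ahead of it.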
Disabled cache (no caching)
VectrixDB::BaseCache -> NoCache
get()
Get value from cache (always returns NULL)
NoCache$get(key)
keyCache key
NULL
set()
Set cache value (no-op)
NoCache$set(key, value, ttl = NULL)
keyCache key
valueValue to cache
ttlTime-to-live in seconds (ignored)
Invisibly returns NULL
delete()
Delete key from cache (always FALSE)
NoCache$delete(key)
keyCache key
FALSE
exists()
Check if key exists (always FALSE)
NoCache$exists(key)
keyCache key
FALSE
clear()
Clear cache (no-op)
NoCache$clear()
Invisibly returns NULL
size()
Get cache size (always 0)
NoCache$size()
Integer zero
clone()
The objects of this class are cloneable with this method.
NoCache$clone(deep = FALSE)
deepWhether to make a deep clone.
Parse ACL string like 'user:alice' or 'group:engineering'
parse_acl(acl_string)
| acl_string | ACL string |
ACLPrincipal object
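The "type:value" convention can be illustrated by splitting on the first colon. This is a hedged sketch: `parse_acl_sketch()` is a hypothetical helper that returns a plain list rather than an ACLPrincipal object, and rejecting strings without a type prefix is an assumption about error handling that the package may not share.

```r
parse_acl_sketch <- function(acl_string) {
  parts <- strsplit(acl_string, ":", fixed = TRUE)[[1]]
  if (length(parts) < 2 || !nzchar(parts[1])) {
    stop("expected an ACL string of the form 'type:value'")
  }
  # Re-join the remainder so values that contain ':' stay intact
  list(type = parts[1], value = paste(parts[-1], collapse = ":"))
}

principal <- parse_acl_sketch("group:engineering")
print(principal)
```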
Quick search - Index texts and search immediately
quick_search(texts, query, limit = 5)
| texts | Character vector of texts to index |
| query | Search query |
| limit | Number of results |
Results object
## Not run:
results <- quick_search(
  texts = c("Python is great", "Java is verbose", "Rust is fast"),
  query = "programming language"
)
print(results$top()$text)
## End(Not run)
Simple regex-based entity extractor (no external dependencies)
entity_typesEntity types to extract
new()
Create a new RegexExtractor
RegexExtractor$new(entity_types = NULL)
entity_typesTypes to extract
extract()
Extract entities from text
RegexExtractor$extract(text, chunk_id = NULL)
textText to extract from
chunk_idChunk ID
ExtractionResult
clone()
The objects of this class are cloneable with this method.
RegexExtractor$clone(deep = FALSE)
deepWhether to make a deep clone.
A relationship between entities
idUnique identifier
source_idSource entity ID
target_idTarget entity ID
typeRelationship type
descriptionDescription
weightRelationship weight
source_chunksSource chunk IDs
metadataAdditional metadata
new()
Create a new Relationship
Relationship$new( source_id, target_id, type, description = NULL, weight = 1, source_chunks = NULL, metadata = NULL )
source_idSource entity
target_idTarget entity
typeRelationship type
descriptionDescription
weightWeight
source_chunksSources
metadataMetadata
to_list()
Convert to list
Relationship$to_list()
clone()
The objects of this class are cloneable with this method.
Relationship$clone(deep = FALSE)
deepWhether to make a deep clone.
Reranks results using term overlap and semantic similarity
languageLanguage setting ("en" or "ml")
new()
Create a new RerankerEmbedder
RerankerEmbedder$new(language = "en")
languageLanguage behavior ("en" = English stopwords, "ml" = Unicode tokens)
score()
Score query-document pairs
RerankerEmbedder$score(query, documents)
queryQuery text
documentsCharacter vector of document texts
Numeric vector of scores (0-1)
clone()
The objects of this class are cloneable with this method.
RerankerEmbedder$clone(deep = FALSE)
deepWhether to make a deep clone.
Represents a single search result with id, text, score, and metadata
idDocument ID
textDocument text
scoreRelevance score
metadataDocument metadata
new()
Create a new Result object
Result$new(id, text, score, metadata = list())
idDocument ID
textDocument text
scoreRelevance score
metadataOptional metadata list
print()
Print result summary
Result$print()
clone()
The objects of this class are cloneable with this method.
Result$clone(deep = FALSE)
deepWhether to make a deep clone.
Collection of search results with convenient accessors
itemsList of Result objects
querySearch query
modeSearch mode
time_msExecution time in ms
new()
Create a new Results object
Results$new(items = list(), query = "", mode = "hybrid", time_ms = 0)
itemsList of Result objects
querySearch query string
modeSearch mode used
time_msExecution time in milliseconds
length()
Get number of results
Results$length()
texts()
Get all result texts
Results$texts()
Character vector of texts
ids()
Get all result IDs
Results$ids()
Character vector of IDs
scores()
Get all scores
Results$scores()
Numeric vector of scores
top()
Get top result
Results$top()
Result object or NULL if empty
get()
Get result by index
Results$get(i)
iIndex
Result object
foreach()
Iterate over results
Results$foreach(fn)
fnFunction to apply to each result
print()
Print results summary
Results$print()
clone()
The objects of this class are cloneable with this method.
Results$clone(deep = FALSE)
deepWhether to make a deep clone.
Available search modes for VectrixDB
SearchMode
An object of class list of length 5.
Creates sentence embeddings by averaging word vectors with IDF weighting
dimEmbedding dimension
vocab_sizeVocabulary size
new()
Create a new SentenceEmbedder
SentenceEmbedder$new(word_vectors, use_idf = TRUE, smooth_idf = 1)
word_vectorsWordVectors object from load_word_vectors()
use_idfUse IDF weighting (recommended)
smooth_idfSmoothing for IDF
fit()
Fit IDF weights on a corpus
SentenceEmbedder$fit(texts)
textsCharacter vector of texts
embed()
Embed texts to sentence vectors
SentenceEmbedder$embed(texts)
textsCharacter vector of texts
Matrix of embeddings (rows are sentences)
get_word_vector()
Get word vector for a single word
SentenceEmbedder$get_word_vector(word)
wordWord to look up
Numeric vector or NULL if not found
has_word()
Check if word is in vocabulary
SentenceEmbedder$has_word(word)
wordWord to check
Logical
most_similar()
Find most similar words
SentenceEmbedder$most_similar(word, n = 10)
wordQuery word
nNumber of results
Data frame with word and similarity
clone()
The objects of this class are cloneable with this method.
SentenceEmbedder$clone(deep = FALSE)
deepWhether to make a deep clone.
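IDF-weighted averaging gives rare words more influence on the sentence vector than common ones. The base R sketch below shows the idea on toy data. Everything in it is illustrative: the two-dimensional `word_vectors`, the `tokenize()`, `fit_idf()` and `embed_sketch()` helpers, and the smoothed IDF formula log((n + s) / (df + s)) + 1 are assumptions, not the package's definitions.

```r
# Hypothetical toy word vectors, one row per word.
word_vectors <- rbind(cat = c(1, 0), dog = c(0, 1), the = c(1, 1))
corpus <- c("the cat", "the dog", "the cat the dog")

tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  tokens[nzchar(tokens)]
}

# Smoothed inverse document frequency per term.
fit_idf <- function(texts, smooth = 1) {
  df <- table(unlist(lapply(texts, function(t) unique(tokenize(t)))))
  idf <- log((length(texts) + smooth) / (as.numeric(df) + smooth)) + 1
  names(idf) <- names(df)
  idf
}

# One row per input text: the IDF-weighted mean of its known word vectors.
embed_sketch <- function(texts, word_vectors, idf) {
  t(vapply(texts, function(text) {
    tokens <- tokenize(text)
    tokens <- tokens[tokens %in% rownames(word_vectors)]
    if (length(tokens) == 0) return(numeric(ncol(word_vectors)))
    w <- idf[tokens]
    w[is.na(w)] <- 1                # unseen terms get a neutral weight
    colSums(word_vectors[tokens, , drop = FALSE] * w) / sum(w)
  }, numeric(ncol(word_vectors)), USE.NAMES = FALSE))
}

idf <- fit_idf(corpus)
emb <- embed_sketch(c("the cat", "the dog"), word_vectors, idf)
print(emb)
```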
Set CLI Config
set_cli_config(config)
| config | CLIConfig object |
Simple suffix-stripping stemmer (no external dependencies)
suffixesList of suffixes to remove
stem()
Stem a word
SimpleStemmer$stem(word)
wordWord to stem
Stemmed word
stem_words()
Stem multiple words
SimpleStemmer$stem_words(words)
wordsCharacter vector
Stemmed words
clone()
The objects of this class are cloneable with this method.
SimpleStemmer$clone(deep = FALSE)
deepWhether to make a deep clone.
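Suffix stripping amounts to removing the first matching suffix from a word, provided enough of the stem remains. The sketch below is a hypothetical base R illustration: `stem_sketch()`, its suffix list, and the `min_stem` guard are assumptions and do not reproduce SimpleStemmer's actual suffix set or rules.

```r
stem_sketch <- function(words, suffixes = c("ing", "ed", "es", "s"),
                        min_stem = 3) {
  vapply(words, function(word) {
    for (suffix in suffixes) {
      stem_len <- nchar(word) - nchar(suffix)
      # Strip only when the remaining stem is long enough to be meaningful
      if (endsWith(word, suffix) && stem_len >= min_stem) {
        return(substr(word, 1, stem_len))
      }
    }
    word
  }, character(1), USE.NAMES = FALSE)
}

stems <- stem_sketch(c("jumping", "foxes", "cats", "is"))
print(stems)
```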
Generates sparse BM25 embeddings for keyword search
vocabVocabulary
languageLanguage setting ("en" or "ml")
new()
Create a new SparseEmbedder
SparseEmbedder$new(language = "en")
languageLanguage behavior ("en" = ASCII-focused, "ml" = Unicode-aware)
fit()
Fit the embedder on a corpus
SparseEmbedder$fit(texts)
textsCharacter vector of texts
embed()
Embed texts to sparse vectors
SparseEmbedder$embed(texts)
textsCharacter vector of texts
Sparse matrix of BM25 scores
query_terms()
Get term scores for a query
SparseEmbedder$query_terms(query)
queryQuery text
Named vector of term scores
clone()
The objects of this class are cloneable with this method.
SparseEmbedder$clone(deep = FALSE)
deepWhether to make a deep clone.
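BM25 scores a document by summing, over query terms, an IDF weight times a saturating, length-normalized term frequency. The sketch below uses the common Okapi form with k1 = 1.2 and b = 0.75. The `bm25_sketch()` helper, the tokenizer, and the parameter values are assumptions for illustration; SparseEmbedder may use a different variant or defaults.

```r
bm25_sketch <- function(query, docs, k1 = 1.2, b = 0.75) {
  tokenize <- function(text) {
    tokens <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
    tokens[nzchar(tokens)]
  }
  doc_tokens <- lapply(docs, tokenize)
  doc_len <- lengths(doc_tokens)
  avg_len <- mean(doc_len)
  n <- length(docs)
  scores <- numeric(n)
  for (term in unique(tokenize(query))) {
    tf <- vapply(doc_tokens, function(tokens) sum(tokens == term), numeric(1))
    df <- sum(tf > 0)
    idf <- log(1 + (n - df + 0.5) / (df + 0.5))
    # Term frequency saturates with k1 and is normalized by document length
    scores <- scores +
      idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
  }
  scores
}

docs <- c("rust is fast", "java is verbose", "fast fast cars")
scores <- bm25_sketch("fast", docs)
print(scores)
```

The third document ranks highest because "fast" occurs twice, but its score is less than double the first document's, which is the saturation effect.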
A subset of a knowledge graph
entitiesEntities in subgraph
relationshipsRelationships in subgraph
new()
Create a new SubGraph
SubGraph$new(entities = list(), relationships = list())
entitiesEntities
relationshipsRelationships
to_list()
Convert to list
SubGraph$to_list()
clone()
The objects of this class are cloneable with this method.
SubGraph$clone(deep = FALSE)
deepWhether to make a deep clone.
English analyzer with stemming and stopwords
text_analyzer_english()
TextAnalyzer object
No tokenization - treat input as single token
text_analyzer_keyword()
TextAnalyzer object
Lowercase + letter-only tokenization
text_analyzer_simple()
TextAnalyzer object
Lowercase + basic tokenization
text_analyzer_standard()
TextAnalyzer object
Text analyzer for search indexing
Provides text processing pipelines:
Tokenization
Lowercasing
Stopword removal
Stemming
Synonym expansion
lowercaseConvert to lowercase
remove_stopwordsRemove stopwords
stopwordsSet of stopwords
stemmerStemmer object
synonymsSynonym dictionary
min_token_lengthMinimum token length
max_token_lengthMaximum token length
token_patternRegex pattern for tokens
new()
Create a new TextAnalyzer
TextAnalyzer$new( lowercase = TRUE, remove_stopwords = FALSE, stopwords = NULL, use_stemmer = FALSE, synonyms = NULL, min_token_length = 1, max_token_length = 100, token_pattern = "[a-zA-Z0-9]+" )
lowercaseLowercase text (default: TRUE)
remove_stopwordsRemove stopwords (default: FALSE)
stopwordsCustom stopwords (default: ENGLISH_STOPWORDS)
use_stemmerUse stemming (default: FALSE)
synonymsNamed list of synonyms
min_token_lengthMin length (default: 1)
max_token_lengthMax length (default: 100)
token_patternRegex pattern
analyze()
Analyze text and return tokens
TextAnalyzer$analyze(text)
textInput text
Character vector of tokens
analyze_query()
Analyze a query string
TextAnalyzer$analyze_query(query)
queryQuery text
Character vector of tokens
clone()
The objects of this class are cloneable with this method.
TextAnalyzer$clone(deep = FALSE)
deepWhether to make a deep clone.
## Not run:
analyzer <- TextAnalyzer$english()
tokens <- analyzer$analyze("The quick brown foxes are jumping")
# c("quick", "brown", "fox", "jump")
## End(Not run)
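The first stages of the pipeline listed for TextAnalyzer (lowercasing, regex tokenization, stopword removal, token length limits) can be approximated in base R. This is a sketch under assumptions: `analyze_sketch()` is hypothetical, the order of the steps is a guess, and stemming and synonym expansion are left out.

```r
analyze_sketch <- function(text, lowercase = TRUE, stopwords = character(0),
                           min_token_length = 1, max_token_length = 100,
                           token_pattern = "[a-zA-Z0-9]+") {
  if (lowercase) text <- tolower(text)
  # Extract every substring that matches the token pattern
  tokens <- regmatches(text, gregexpr(token_pattern, text))[[1]]
  tokens <- tokens[!tokens %in% stopwords]
  n <- nchar(tokens)
  tokens[n >= min_token_length & n <= max_token_length]
}

tokens <- analyze_sketch("The quick brown foxes are jumping",
                         stopwords = c("the", "are"))
print(tokens)
```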
A chunk of text from a document
idUnique identifier
textText content
document_idSource document
chunk_indexIndex in document
start_charStart position
end_charEnd position
metadataAdditional metadata
new()
Create a new TextUnit
TextUnit$new( id, text, document_id = NULL, chunk_index = 0, start_char = 0, end_char = 0, metadata = NULL )
idUnique ID
textContent
document_idSource doc
chunk_indexIndex
start_charStart
end_charEnd
metadataMetadata
clone()
The objects of this class are cloneable with this method.
TextUnit$clone(deep = FALSE)
deepWhether to make a deep clone.
Add documents to a collection
vdb_add(db, texts, metadata = NULL, ids = NULL)
| db | Vectrix object or collection name |
| texts | Character vector of texts |
| metadata | Optional metadata |
| ids | Optional IDs |
Vectrix object
## Not run:
vdb_add(db, c("Document 1", "Document 2"))
vdb_add("my_docs", c("Another doc"))
## End(Not run)
Add all text files from a directory
vdb_add_dir(db, dir_path, pattern = "\\.txt$", recursive = TRUE)
| db | Vectrix object or collection name |
| dir_path | Directory path |
| pattern | Regular expression for matching file names (default: "\\.txt$") |
| recursive | Search subdirectories |
Vectrix object
## Not run:
vdb_add_dir(db, "./documents/")
## End(Not run)
Create a new VectrixDB collection
vdb_create(name, model = "tfidf", dimension = NULL, data_dir = NULL)
| name | Collection name |
| model | Embedding model |
| dimension | Vector dimension |
| data_dir | Data directory |
Vectrix object
## Not run:
db <- vdb_create("my_docs")
## End(Not run)
Start the VectrixDB API server and mount the HTML dashboard at
/dashboard.
vdb_dashboard( db = NULL, data_path = NULL, port = 7377, host = "127.0.0.1", launch.browser = TRUE, api_key = NULL )
| db | Optional |
| data_path | Path to vector database directory. |
| port | Port number (default: 7377) |
| host | Host address (default: "127.0.0.1") |
| launch.browser | Whether to open browser on start. |
| api_key | Optional API key for authenticated write operations. |
Invisibly returns server object from vectrix_serve().
Convenience wrapper around vdb_dashboard() using db$path.
vdb_dashboard_simple(db)
| db | Vectrix object. |
Invisibly returns server object from vdb_dashboard().
Delete a collection
vdb_delete(name, data_dir = NULL, confirm = TRUE)
| name | Collection name |
| data_dir | Data directory |
| confirm | Require confirmation |
Logical success
## Not run:
vdb_delete("my_docs")
## End(Not run)
Delete documents by ID
vdb_delete_docs(db, ids)
| db | Vectrix object or collection name |
| ids | Document ID(s) |
Vectrix object
Export collection to JSON file
vdb_export(db, path)
| db | Vectrix object or collection name |
| path | Output file path |
Logical success
## Not run:
vdb_export(db, "backup.json")
## End(Not run)
Get document by ID
vdb_get(db, ids)
| db | Vectrix object or collection name |
| ids | Document ID(s) |
List of Result objects
Import documents from text file
vdb_import(db, path, separator = "\n")
| db | Vectrix object or collection name |
| path | Input file path |
| separator | Line separator for documents |
Vectrix object
## Not run:
vdb_import(db, "documents.txt")
## End(Not run)
Get collection information
vdb_info(db)
| db | Vectrix object or collection name |
Named list of info
## Not run:
vdb_info(db)
vdb_info("my_docs")
## End(Not run)
Start an interactive VectrixDB session
vdb_interactive(collection = NULL)
| collection | Default collection name |
## Not run:
vdb_interactive()
## End(Not run)
List all VectrixDB collections in the data directory
vdb_list(data_dir = NULL)
| data_dir | Data directory path |
Character vector of collection names
## Not run:
vdb_list()
## End(Not run)
Open an existing collection
vdb_open(name, data_dir = NULL)
| name | Collection name |
| data_dir | Data directory |
Vectrix object
## Not run:
db <- vdb_open("my_docs")
## End(Not run)
Search a collection
vdb_search(db, query, limit = 10, mode = "hybrid", show = TRUE)
| db | Vectrix object or collection name |
| query | Search query |
| limit | Number of results |
| mode | Search mode: "dense", "sparse", "hybrid", "ultimate" |
| show | Print results |
Results object
## Not run:
results <- vdb_search(db, "machine learning")
results <- vdb_search("my_docs", "AI", limit = 5)
## End(Not run)
Get detailed statistics
vdb_stats(db)
| db | Vectrix object or collection name |
Named list of stats
Specialized cache for vector search results
Features:
Query result caching
Vector embedding caching
Automatic cache invalidation
prefixCache key prefix
new()
Create a new VectorCache
VectorCache$new(cache, prefix = "vec:")
cacheBase cache backend
prefixKey prefix (default: "vec:")
get_search_results()
Get cached search results
VectorCache$get_search_results(collection, query, filter = NULL, limit = 10)
collectionCollection name
queryQuery vector
filterFilter conditions
limitResult limit
Cached results or NULL
set_search_results()
Cache search results
VectorCache$set_search_results( collection, query, results, filter = NULL, limit = 10, ttl = 300 )
collectionCollection name
queryQuery vector
resultsSearch results
filterFilter conditions
limitResult limit
ttlTime to live (default: 300)
get_vector()
Get cached vector
VectorCache$get_vector(collection, vector_id)
collectionCollection name
vector_idVector ID
Cached vector data or NULL
set_vector()
Cache vector data
VectorCache$set_vector(collection, vector_id, data, ttl = 3600)
collectionCollection name
vector_idVector ID
dataVector data
ttlTime to live (default: 3600)
invalidate_vector()
Invalidate cached vector
VectorCache$invalidate_vector(collection, vector_id)
collectionCollection name
vector_idVector ID
stats()
Get cache statistics
VectorCache$stats()
CacheStats object
clone()
The objects of this class are cloneable with this method.
VectorCache$clone(deep = FALSE)
deepWhether to make a deep clone.
Zero config. Text in, results out. One line for everything.
nameCollection name
pathStorage path
dimensionVector dimension
model_nameModel identifier
model_typeModel type
languageLanguage setting
tierStorage tier
new()
Create or open a VectrixDB collection
Vectrix$new( name = "default", path = NULL, model = NULL, dimension = NULL, embed_fn = NULL, model_path = NULL, language = NULL, tier = "dense", auto_download = TRUE )
nameCollection name
pathStorage path. Defaults to a session temp directory.
modelEmbedding model: "tfidf" (default), "glove-50", "glove-100", "glove-200", "glove-300", or "word2vec"
dimensionVector dimension (auto-detected for GloVe)
embed_fnCustom embedding function: fn(texts) -> matrix
model_pathPath to pre-trained word vectors (GloVe .txt or word2vec .bin)
languageLanguage behavior: "en" (English-focused) or "ml" (multilingual/Unicode)
tierStorage tier: "dense", "hybrid", "ultimate", or "graph"
auto_downloadAutomatically download GloVe vectors if needed (default: TRUE)
\dontrun{
# Default TF-IDF embeddings (no external files needed)
db <- Vectrix$new("docs")
# With GloVe 100d word vectors (auto-downloads ~130MB)
db <- Vectrix$new("docs", model = "glove-100")
# With pre-downloaded GloVe
db <- Vectrix$new("docs", model_path = "path/to/glove.6B.100d.txt")
# Custom embedding function
db <- Vectrix$new("docs", embed_fn = my_embed_function, dimension = 768)
}
add()
Add texts to the collection
Vectrix$add(texts, metadata = NULL, ids = NULL)
textsSingle text or character vector of texts
metadataOptional metadata list or list of lists
idsOptional custom IDs
Self for chaining
\dontrun{
db$add(c("text 1", "text 2"))
db$add("another text", metadata = list(source = "web"))
}
set_language()
Update collection language behavior
Vectrix$set_language(language = "en")
languageLanguage behavior: "en" or "ml"
Self for chaining
search()
Search the collection
Vectrix$search( query, limit = 10, mode = "hybrid", rerank = NULL, filter = NULL, diversity = 0.7 )
querySearch query text
limitNumber of results (default: 10)
modeSearch mode: "dense", "sparse", "hybrid", "ultimate"
rerankReranking method: NULL, "mmr", "exact", "cross-encoder"
filterMetadata filter
diversityDiversity parameter for MMR (0-1)
Results object with search results
\dontrun{
results <- db$search("python programming")
results <- db$search("AI", mode = "ultimate", rerank = "mmr")
print(results$top()$text)
}
delete()
Delete documents by ID
Vectrix$delete(ids)
idsDocument ID(s) to delete
Self for chaining
clear()
Clear all documents from collection
Vectrix$clear()
Self for chaining
count()
Get number of documents
Vectrix$count()
Integer count
get()
Get documents by ID
Vectrix$get(ids)
idsDocument ID(s)
List of Result objects
similar()
Find similar documents to a given document
Vectrix$similar(id, limit = 10)
idDocument ID
limitNumber of results
Results object
close()
Close the database connection
Vectrix$close()
print()
Print Vectrix summary
Vectrix$print()
clone()
The objects of this class are cloneable with this method.
Vectrix$clone(deep = FALSE)
deepWhether to make a deep clone.
## Not run:
# Create and add - ONE LINE
db <- Vectrix$new("my_docs")$add(c("Python is great", "Machine learning is fun"))
# Search - ONE LINE
results <- db$search("programming")
# Full power - STILL ONE LINE
results <- db$search("AI", mode = "ultimate") # dense + sparse + rerank
## End(Not run)
## ------------------------------------------------
## Method `Vectrix$new`
## ------------------------------------------------
## Not run:
# Default TF-IDF embeddings (no external files needed)
db <- Vectrix$new("docs")
# With GloVe 100d word vectors (auto-downloads ~130MB)
db <- Vectrix$new("docs", model = "glove-100")
# With pre-downloaded GloVe
db <- Vectrix$new("docs", model_path = "path/to/glove.6B.100d.txt")
# Custom embedding function
db <- Vectrix$new("docs", embed_fn = my_embed_function, dimension = 768)
## End(Not run)
## ------------------------------------------------
## Method `Vectrix$add`
## ------------------------------------------------
## Not run:
db$add(c("text 1", "text 2"))
db$add("another text", metadata = list(source = "web"))
## End(Not run)
## ------------------------------------------------
## Method `Vectrix$search`
## ------------------------------------------------
## Not run:
results <- db$search("python programming")
results <- db$search("AI", mode = "ultimate", rerank = "mmr")
print(results$top()$text)
## End(Not run)
Create a new Vectrix collection
vectrix_create(name = "default", ...)
| name | Collection name |
| ... | Additional arguments passed to Vectrix$new() |
Vectrix object
Show database statistics and info
vectrix_info(path = NULL)
| path | Database path |
## Not run:
vectrix_info(file.path(tempdir(), "my_data"))
## End(Not run)
Open an existing Vectrix collection
vectrix_open(name = "default", path = NULL)
| name | Collection name |
| path | Storage path |
Vectrix object
Launch a REST API server with optional dashboard
vectrix_serve( path = NULL, host = "127.0.0.1", port = 7377, api_key = NULL, dashboard = TRUE, launch.browser = FALSE )
| path | Database path |
| host | Host address (default: "127.0.0.1") |
| port | Port number (default: 7377) |
| api_key | Optional API key for authentication |
| dashboard | Enable dashboard (default: TRUE) |
| launch.browser | Open dashboard/docs URL in browser (default: FALSE) |
Invisible NULL (server runs until stopped)
## Not run:
vectrix_serve(path = file.path(tempdir(), "my_data"), port = 7377)
## End(Not run)
Main database interface managing collections
vectrixdb(path = NULL, storage_type = "memory")
| path | Storage path |
| storage_type | Storage type |
VectrixDB object
pathDatabase storage path
new()
Create or open a VectrixDB database
VectrixDB$new(path = NULL, storage_type = "memory")
pathStorage path
storage_typeStorage type ("memory" or "sqlite")
create_collection()
Create a new collection
VectrixDB$create_collection( name, dimension, metric = "cosine", enable_text_index = TRUE )
nameCollection name
dimensionVector dimension
metricDistance metric
enable_text_indexEnable text indexing
Collection object
get_collection()
Get an existing collection
VectrixDB$get_collection(name)
nameCollection name
Collection object
list_collections()
List all collections
VectrixDB$list_collections()
Character vector of collection names
delete_collection()
Delete a collection
VectrixDB$delete_collection(name)
nameCollection name
has_collection()
Check if collection exists
VectrixDB$has_collection(name)
nameCollection name
Logical
stats()
Get database statistics
VectrixDB$stats()
List with stats
close()
Close the database
VectrixDB$close()
print()
Print database summary
VectrixDB$print()
clone()
The objects of this class are cloneable with this method.
VectrixDB$clone(deep = FALSE)
deepWhether to make a deep clone.
Download, load, and use pre-trained word vectors (GloVe, fastText)