Metadata-Version: 2.4
Name: sqlite-rag
Version: 0.1.4
Summary: Hybrid search with SQLite AI and SQLite Vector
Author: SQLite AI Team
Project-URL: Homepage, https://sqlite.ai
Project-URL: Repository, https://github.com/sqliteai/sqlite-rag
Project-URL: Issues, https://github.com/sqliteai/sqlite-rag/issues
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: typer
Requires-Dist: huggingface_hub[hf_transfer]
Requires-Dist: markitdown[docx]
Requires-Dist: markitdown[outlook]
Requires-Dist: markitdown[pdf]
Requires-Dist: markitdown[pptx]
Requires-Dist: markitdown[xls]
Requires-Dist: markitdown[xlsx]
Requires-Dist: python-frontmatter
Requires-Dist: prompt-toolkit
Requires-Dist: sqlite-ai
Requires-Dist: sqliteai-vector
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-mock; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: bandit; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: pyarrow; extra == "dev"
Requires-Dist: pandas; extra == "dev"
Requires-Dist: psutil; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"

[<img src="https://github.com/user-attachments/assets/0d406c41-ff61-41d7-a8de-249e9e652946" alt="https://sqlite.ai" width="110"/>](https://sqlite.ai)

# SQLite RAG

[![Run Tests](https://github.com/sqliteai/sqlite-rag/actions/workflows/test.yaml/badge.svg)](https://github.com/sqliteai/sqlite-rag/actions/workflows/test.yaml)
[![codecov](https://codecov.io/github/sqliteai/sqlite-rag/graph/badge.svg?token=30KYPY7864)](https://codecov.io/github/sqliteai/sqlite-rag)
![PyPI - Version](https://img.shields.io/pypi/v/sqlite-rag?link=https%3A%2F%2Fpypi.org%2Fproject%2Fsqlite-rag%2F)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sqlite-rag?link=https%3A%2F%2Fpypi.org%2Fproject%2Fsqlite-rag)

A hybrid search engine built on SQLite with [SQLite AI](https://github.com/sqliteai/sqlite-ai) and [SQLite Vector](https://github.com/sqliteai/sqlite-vector) extensions.
SQLite RAG combines vector similarity search with full-text search ([FTS5](https://www.sqlite.org/fts5.html) extension) using Reciprocal Rank Fusion (RRF) for enhanced document retrieval.

## Features

- **Hybrid Search**: Combines vector similarity search with FTS5 full-text search and merges the results with Reciprocal Rank Fusion
- **SQLite-based**: Built on SQLite with AI and Vector extensions for reliability and performance
- **Multi-format Text Support**: Processes PDF, DOCX, Markdown, source code, and other text-based file formats
- **Recursive Character Text Splitter**: Token-aware text chunking with configurable overlap
- **Interactive CLI**: Command-line interface with interactive REPL mode
- **Flexible Configuration**: Customizable embedding models, search weights, and chunking parameters
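To illustrate the idea behind recursive character splitting with overlap, here is a minimal sketch. It is not sqlite-rag's implementation: it measures chunk size in characters rather than tokens (a simplification of the token-aware splitter described above), and the separator order and the names `recursive_split` and `DEFAULT_SEPARATORS` are hypothetical.

```python
from typing import List, Sequence

# Separators tried from coarsest to finest; "" means "cut anywhere".
# This ordering is an assumption borrowed from common splitter designs.
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def _split(text: str, chunk_size: int, separators: Sequence[str]) -> List[str]:
    """Recursively break text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No separator left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    parts = text.split(sep)
    # Re-attach the separator so that joining the pieces reproduces the text.
    parts = [p + sep for p in parts[:-1]] + [parts[-1]]
    pieces: List[str] = []
    for part in parts:
        # Small parts come back unchanged; large ones use the next separator.
        pieces.extend(_split(part, chunk_size, rest))
    return pieces


def recursive_split(text: str, chunk_size: int = 200, overlap: int = 50,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters with overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks: List[str] = []
    current = ""
    for piece in _split(text, chunk_size, separators):
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            # Carry the end of the finished chunk into the next one, but only
            # as much as still fits alongside the incoming piece.
            keep = min(overlap, chunk_size - len(piece), len(current))
            current = current[len(current) - keep:]
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph breaks first, then lines, then words keeps chunks aligned with natural boundaries whenever possible; the hard character cut is only a last resort.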

## Installation

### Prerequisites

SQLite RAG requires SQLite with _extension loading_ support.
If you encounter extension loading issues (e.g., `'sqlite3.Connection' object has no attribute 'enable_load_extension'`), follow the setup guides for [macOS](https://github.com/sqliteai/sqlite-extensions-guide/blob/main/platforms/macos.md#python-on-macos) or [Windows](https://github.com/sqliteai/sqlite-extensions-guide/blob/main/platforms/windows.md#using-sqlite-with-python).
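To check up front whether your Python interpreter can load SQLite extensions, you can run a small script like the one below with the same interpreter you will use for SQLite RAG (for example, inside the activated virtual environment). This is a standalone sketch using only the standard `sqlite3` module; the function name `supports_extension_loading` is hypothetical and not part of sqlite-rag.

```python
import sqlite3


def supports_extension_loading() -> bool:
    """Return True if this Python's sqlite3 module can load SQLite extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        # Builds compiled without extension support lack this method entirely,
        # which is what produces the AttributeError quoted above.
        if not hasattr(conn, "enable_load_extension"):
            return False
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
        return True
    except sqlite3.Error:
        # The method exists but the underlying SQLite library refused the call.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("SQLite version:", sqlite3.sqlite_version)
    print("Extension loading supported:", supports_extension_loading())
```

If the script reports `False`, follow the platform guide linked above before installing SQLite RAG.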

### Install SQLite RAG

```bash
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install sqlite-rag
```

## Quick Start

Download the default embedding model, [Embedding Gemma](https://huggingface.co/unsloth/embeddinggemma-300m-GGUF), from Hugging Face:

```bash
sqlite-rag download-model unsloth/embeddinggemma-300m-GGUF embeddinggemma-300M-Q8_0.gguf
```

SQLite RAG comes preconfigured to work with the **Embedding Gemma** model. When you add a document or text, it automatically creates a new database (if one does not already exist) and uses default settings, so you can get started immediately without manual setup.

```bash
# Initialize sqliterag.sqlite database and add documents
sqlite-rag add-text "Artificial intelligence (AI) enables machines to learn from data"

sqlite-rag add /path/to/documents --recursive

# Search your documents
sqlite-rag search "explain AI"

# Interactive mode
sqlite-rag
> help
> search "interactive search"
> exit
```

For help, run:

```bash
sqlite-rag --help
```

## CLI Commands

### Configuration

Settings are stored in the database and should be set before adding any documents.

```bash
# View available configuration options
sqlite-rag configure --help

# Example: point the configuration at a model stored in a custom location
sqlite-rag configure --model-path ./mymodels/path

# View current settings
sqlite-rag settings
```

To use a different database filename, use the global `--database` option:

```bash
# Single command with custom database
sqlite-rag --database path/to/mydb.db add-text "Let's talk about AI."

# Interactive mode with custom database
sqlite-rag --database path/to/mydb.db
```

### Model Management

You can experiment with other models from Hugging Face by downloading them with:

```bash
# Download GGUF models from Hugging Face
sqlite-rag download-model <model-repo> <filename>
```

## Supported File Formats

SQLite RAG supports the following file formats:

- **Text**: `.txt`, `.md`, `.mdx`, `.csv`, `.json`, `.xml`, `.yaml`, `.yml`
- **Documents**: `.pdf`, `.docx`, `.pptx`, `.xlsx`
- **Code**: `.c`, `.cpp`, `.css`, `.go`, `.h`, `.hpp`, `.html`, `.java`, `.js`, `.mjs`, `.kt`, `.php`, `.py`, `.rb`, `.rs`, `.swift`, `.ts`, `.tsx`
- **Web Frameworks**: `.svelte`, `.vue`
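How `sqlite-rag add /path/to/documents --recursive` selects files is not described here. A plausible approach is to filter by file extension, as in the sketch below. The extension set is copied from the list above, while the helper `iter_supported_files` and its behavior are hypothetical, not the CLI's actual code.

```python
from pathlib import Path
from typing import Iterator

# Extensions copied from the supported-formats list; the real CLI may differ.
SUPPORTED_EXTENSIONS = frozenset({
    ".txt", ".md", ".mdx", ".csv", ".json", ".xml", ".yaml", ".yml",
    ".pdf", ".docx", ".pptx", ".xlsx",
    ".c", ".cpp", ".css", ".go", ".h", ".hpp", ".html", ".java", ".js",
    ".mjs", ".kt", ".php", ".py", ".rb", ".rs", ".swift", ".ts", ".tsx",
    ".svelte", ".vue",
})


def iter_supported_files(root: str, recursive: bool = True) -> Iterator[Path]:
    """Yield files under root whose extension is in SUPPORTED_EXTENSIONS."""
    base = Path(root)
    if base.is_file():
        # A single file path is accepted only if its extension is supported.
        if base.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield base
        return
    # rglob walks subdirectories; glob only looks at the top level.
    candidates = base.rglob("*") if recursive else base.glob("*")
    for path in sorted(candidates):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield path
```

Comparing the lowercased suffix means files such as `README.MD` are still picked up, while unsupported files like images are skipped silently.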

## Development

### Installation

For development, clone the repository and install with development dependencies:

```bash
# Clone the repository
git clone https://github.com/sqliteai/sqlite-rag.git
cd sqlite-rag

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
pip install -e '.[dev]'
```

## How It Works

1. **Document Processing**: Files are processed and split into overlapping chunks
2. **Embedding Generation**: Text chunks are converted to vector embeddings using the configured embedding model (Embedding Gemma by default)
3. **Dual Indexing**: Content is indexed for both vector similarity and full-text search
4. **Hybrid Search**: Each query runs against both the vector index and the full-text index
5. **Result Fusion**: The two result lists are merged into a single ranking using Reciprocal Rank Fusion (RRF)
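The fusion step can be sketched with the textbook RRF formula, where each document scores the sum of `w_i / (k + rank_i)` over the result lists it appears in (ranks start at 1). The README does not state sqlite-rag's exact parameters, so this sketch assumes `k = 60` (the value commonly used since the original RRF paper) and optional per-list weights as one possible reading of the configurable "search weights"; the function name `reciprocal_rank_fusion` is hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]],
                           weights: Optional[Sequence[float]] = None,
                           k: int = 60) -> List[Hashable]:
    """Merge several ranked lists of document IDs into one ranking."""
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need one weight per ranking")
    scores: Dict[Hashable, float] = {}
    for ranking, weight in zip(rankings, weights):
        # Ranks are 1-based; a larger k flattens the gap between top ranks.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.__getitem__, reverse=True)


vector_hits = ["c1", "c2", "c3"]  # e.g. nearest neighbours by embedding
fts_hits = ["c3", "c1", "c4"]     # e.g. FTS5 matches ordered by relevance
print(reciprocal_rank_fusion([vector_hits, fts_hits]))
# ['c1', 'c3', 'c2', 'c4']
```

Because RRF uses only rank positions, it can combine vector distances and full-text relevance scores without normalizing them onto a common scale; documents found by both searches (`c1`, `c3`) rise to the top.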
