Metadata-Version: 2.4
Name: filegraphdb
Version: 0.1.1
Summary: A local file-native graph layer for relationship-aware text retrieval.
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.23
Requires-Dist: scikit-learn>=1.2
Provides-Extra: models
Requires-Dist: sentence-transformers>=2.7; extra == "models"
Provides-Extra: llm
Requires-Dist: openai>=2.0; extra == "llm"
Requires-Dist: torch; extra == "llm"
Requires-Dist: transformers; extra == "llm"
Requires-Dist: accelerate; extra == "llm"

# FileGraphDB

FileGraphDB builds a local relationship graph over ordinary text files so an LLM can retrieve only the most relevant files instead of reading an entire folder.

## Setup

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -e .
```

## Build A File Graph

```powershell
filegraph --folder . build
```

A file longer than 2,000 words keeps its own file node and is also split into overlapping chunk nodes, each identified by the file path plus a `#chunk-NNNN` suffix:

```text
big_report.txt
big_report.txt#chunk-0001
big_report.txt#chunk-0002
```

The graph can contain the following relationship types between file and chunk nodes:

```text
File --CONTAINS--> Chunk
File --SEMANTICALLY_SIMILAR--> File
Chunk --SEMANTICALLY_SIMILAR--> Chunk
File/Chunk --SHARES_ENTITY--> File/Chunk
File/Chunk --SHARES_TOPIC--> File/Chunk
```
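To make the relationship types above concrete, here is a minimal sketch of how typed, weighted edges could be represented and queried in plain Python. It is not FileGraphDB's internal data model: the `Edge` class, the `related` helper, the `weight` field, the file name `notes.txt`, and the example scores are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass

# Edge labels taken from the relationship list above.
EDGE_KINDS = ("CONTAINS", "SEMANTICALLY_SIMILAR", "SHARES_ENTITY", "SHARES_TOPIC")


@dataclass(frozen=True)
class Edge:
    source: str          # node ID, e.g. "big_report.txt" or "big_report.txt#chunk-0001"
    target: str
    kind: str            # one of EDGE_KINDS
    weight: float = 1.0  # hypothetical strength score

    def __post_init__(self):
        if self.kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {self.kind}")


def related(edges, node, kind=None):
    """Return (neighbor, kind, weight) tuples for a node, strongest first."""
    hits = []
    for edge in edges:
        if kind is not None and edge.kind != kind:
            continue
        # Assumption: edges are treated as undirected for lookup purposes.
        if edge.source == node:
            hits.append((edge.target, edge.kind, edge.weight))
        elif edge.target == node:
            hits.append((edge.source, edge.kind, edge.weight))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Illustrative edges with made-up weights.
edges = [
    Edge("big_report.txt", "big_report.txt#chunk-0001", "CONTAINS"),
    Edge("big_report.txt", "notes.txt", "SEMANTICALLY_SIMILAR", 0.82),
    Edge("notes.txt", "big_report.txt#chunk-0001", "SHARES_ENTITY", 0.40),
]
```

Calling `related(edges, "notes.txt")` on this toy list returns the semantically similar file first, followed by the chunk that shares an entity.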

You can tune the chunking threshold, chunk size, and overlap (all measured in words):

```powershell
filegraph --folder ./docs build --chunk-threshold 2000 --chunk-words 800 --chunk-overlap 120
```

Or disable chunking:

```powershell
filegraph --folder ./docs --chunk-threshold 0 build
```
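The chunking options above can be pictured as a sliding window over the words of a file. The sketch below is one plausible reading of those three settings, not the tool's actual implementation: the function names are hypothetical, splitting on whitespace is an assumption, and so is the rule that a threshold of `0` means "never chunk".

```python
def split_into_chunks(text, threshold=2000, chunk_words=800, overlap=120):
    """Split text into overlapping word windows; return [] when no chunking applies."""
    words = text.split()
    # Assumption: a threshold of 0 disables chunking, and files at or below
    # the threshold are kept as a single file node.
    if threshold <= 0 or len(words) <= threshold:
        return []
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must be non-negative and smaller than chunk_words")

    step = chunk_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # this window already reached the end of the file
    return chunks


def chunk_ids(rel_path, chunks):
    """Build IDs in the `path#chunk-NNNN` style shown earlier."""
    return [f"{rel_path}#chunk-{i:04d}" for i, _ in enumerate(chunks, start=1)]
```

With these values, consecutive chunks share 120 words, so a sentence that straddles a chunk boundary still appears whole in at least one chunk.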

Show the strongest relationships:

```powershell
filegraph --folder . edges
```

Find files related to one file:

```powershell
filegraph --folder . related research/file_native_graph_database.md
```

Retrieve likely files for an LLM query:

```powershell
filegraph --folder . search "How can file relationships reduce LLM token cost?"
```

Print LLM-ready context:

```powershell
filegraph --folder . context "How can file relationships reduce LLM token cost?"
```

## Python Library

```python
from filegraphdb import FileGraphDB

graph = FileGraphDB("./docs")
graph.build()

for result in graph.retrieve("What caused the project delay?", limit=4):
    print(result.document.rel_path, result.score)
```

The first build is the expensive step. Once it finishes, the graph is stored in a SQLite file named:

```text
.filegraphdb.sqlite
```
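Because the graph is an ordinary SQLite file, it can be inspected with Python's standard `sqlite3` module. The sketch below only lists table names through SQLite's built-in `sqlite_master` catalog and makes no assumption about FileGraphDB's schema. The helper name `list_tables` is hypothetical, and the path in the usage comment assumes the file sits inside the indexed folder, which may not match your setup.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file without assuming any schema."""
    # Open read-only so that inspection cannot modify the graph.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Usage (path is an assumption; adjust it to where your graph file lives):
# print(list_tables("./docs/.filegraphdb.sqlite"))
```

Opening the file in read-only mode raises `sqlite3.OperationalError` if it does not exist, rather than silently creating an empty database.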

## Optional Open-Source Embedding Model

By default, FileGraphDB uses local TF-IDF + LSA semantic vectors from `scikit-learn`. To use a stronger open-source embedding model:

```powershell
pip install -e ".[models]"
filegraph --folder ./docs --use-model build
```

The default model is:

```text
sentence-transformers/all-MiniLM-L6-v2
```
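For comparison, the default TF-IDF + LSA approach can be sketched with standard `scikit-learn` components: TF-IDF term weighting, followed by a truncated SVD (the LSA step), followed by L2 normalization so that cosine similarity compares directions only. This is a generic illustration of the technique, not FileGraphDB's actual pipeline. The function name `lsa_similarity`, the stop-word setting, the number of components, and the sample texts are all assumptions.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def lsa_similarity(texts, n_components=2):
    """Return a document-by-document cosine similarity matrix from LSA vectors."""
    pipeline = make_pipeline(
        TfidfVectorizer(stop_words="english"),                      # sparse term weights
        TruncatedSVD(n_components=n_components, random_state=0),    # LSA projection
        Normalizer(copy=False),                                     # unit-length vectors
    )
    vectors = pipeline.fit_transform(texts)
    return cosine_similarity(vectors)


# Illustrative documents only; n_components must stay below the vocabulary size.
sample_texts = [
    "The project delay was caused by a late vendor shipment.",
    "Vendor shipment problems caused the project delay again.",
    "Graph databases store nodes and edges for relationship queries.",
    "Relationship queries traverse nodes and edges in graph databases.",
]
similarity = lsa_similarity(sample_texts)
```

In this toy corpus the two shipment documents should score closer to each other than to either graph-database document, which is the kind of signal a `SEMANTICALLY_SIMILAR` edge could be built from.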

## Small LLM Demo

The earlier local LLM demo is still available:

```powershell
python small_llm.py
```
