Metadata-Version: 2.4
Name: carag
Version: 1.0.2
Summary: An efficient Python library for building AI applications using the Retrieval-Augmented Generation (RAG) pipeline.
Home-page: https://github.com/rizwandel/CARAG
Author: Mohamed Rizwan
Author-email: rizdelhi@gmail.com
License: GPL-3.0-only
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

<!-- Python library -->
<div align="center" >
  <h1 align="center"> CARAG: A Python library for building a standard cache-aware Retrieval-Augmented Generation pipeline with hybrid embeddings & Qdrant DB </h1>
</div>
  
![Supported python versions](https://img.shields.io/badge/python-3.11%20%7C%203.12-blue)
[![PEP8](https://img.shields.io/badge/code%20style-pep8-black.svg)](https://www.python.org/dev/peps/pep-0008/)
[![License](https://img.shields.io/badge/License-GPL%203.0-blue.svg)](LICENSE)
![GitHub stars](https://img.shields.io/github/stars/rizwandel/Build-standard-RAG-with-Qdrant?color=red&label=stars&logoColor=black&style=social)

<div align="center" >
<img src="/images/rag.png" alt="RAG pipeline overview">
<p align="left"> source: www.weaviate.com </p>
</div>
  

## ✨ Description
**CARAG** is a Python library that combines a hybrid Retrieval-Augmented Generation (RAG) approach with a semantic cache (memory) to store and retrieve embeddings efficiently. By combining dense, sparse, and late-interaction embeddings, it handles large collections of unstructured text files and returns relevant, grounded responses generated by pretrained LLMs through the Mistral API.
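
To make the idea of a semantic cache concrete, here is a minimal, library-independent sketch. It stores past query embeddings with their answers and returns a cached answer when a new query's embedding is close enough in cosine similarity. The `SemanticCache` class, the `threshold` value, and the toy vectors are assumptions for illustration only, not CARAG's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embeddings."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cut-off
        self.entries = []  # list of (embedding, answer) pairs

    def put(self, embedding, answer):
        self.entries.append((list(embedding), answer))

    def get(self, embedding):
        """Return the best cached answer above the threshold, else None."""
        best_score, best_answer = 0.0, None
        for cached_embedding, answer in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "cached answer about topic A")
hit = cache.get([0.99, 0.05, 0.0])  # near-duplicate query: served from cache
miss = cache.get([0.0, 1.0, 0.0])   # unrelated query: falls through to retrieval
```

In a real pipeline the embeddings would come from an embedding model, and a cache miss would trigger the full retrieval and generation path.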

## ✨ Features
🚀 **Hybrid RAG**: Utilizes dense, sparse, and late interaction embeddings for enhanced performance.  
🔌 **Easy Integration**: Simple API for storing and searching embeddings.  
📄 **PDF/CSV Support**: Directly store embeddings from PDF/CSV documents.  
🎉 **Grounded Generation from LLM**: Get synthesised responses from "mistral-large-latest".
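
As a rough illustration of how hybrid retrieval can merge rankings produced by different embedding types, the sketch below applies reciprocal rank fusion (RRF), a common fusion technique. Whether CARAG uses RRF internally is not stated in this README; the function name, the document IDs, and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense-embedding ranking
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical sparse-embedding ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that rank well in both lists rise to the top, which is the intuition behind combining dense and sparse signals.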

<!-- Links -->
<p align="left">
  <a href="https://rizdelhi.medium.com" style="color: #06b6d4;"> Read more on the Medium blog</a> 
</p>

## 🌱 Getting Started
#### Prerequisites
- tqdm
- PyMuPDF
- Mistral (the `mistralai` client)
- fastembed
- qdrant_client
- logging (Python standard library)
- json (Python standard library)

#### 🚀 Installation

To install **CARAG**, simply run:

```bash
pip install carag
```
#### Setting up a virtual environment
```bash
python3.11 -m venv <env-name>
source <env-name>/bin/activate  # On macOS/Linux
<env-name>\Scripts\activate  # On Windows
```
#### Install dependencies

```bash
pip install -r requirements.txt
```
#### Create a .env file
Create a file named `.env` in the root directory of your project. This file stores your API keys and other sensitive information, so keep it out of version control. Load it at startup with `load_dotenv()` from the `python-dotenv` package.

```
url=<qdrant_url>
api_key=<your_qdrant_api_key>
mistral_api_key=<your_mistral_api_key>
```
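
A minimal sketch of reading these settings in Python follows. It assumes the three variable names shown above; `load_env_file` and `load_settings` are hypothetical helpers, not part of CARAG. `load_dotenv` comes from the `python-dotenv` package and is imported lazily, so the sketch still runs when that package is not installed.

```python
import os

REQUIRED_KEYS = ("url", "api_key", "mistral_api_key")


def load_env_file():
    """Copy .env entries into os.environ if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False
    return load_dotenv()


def load_settings(env=None):
    """Collect the required settings, failing fast if any is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}


# Example with an explicit mapping instead of the real environment.
settings = load_settings({
    "url": "https://example-qdrant-host:6333",
    "api_key": "qdrant-key",
    "mistral_api_key": "mistral-key",
})
```

In an application, you would call `load_env_file()` once at startup and then `load_settings()` with no arguments.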

## 📦 Usage

```python
from carag.rag_pipeline import rag_pipe
from carag.llm_pipeline import GroundGeneration

rag = rag_pipe(
    url="YOUR_QDRANT_URL", 
    api_key="YOUR_API_KEY", 
    collection_name="YOUR_COLLECTION_NAME" # use if exists or create a collection in Qdrant cloud DB
)

# Store embeddings from a list of key-value pairs extracted from a PDF or CSV file
# (text_chunks must be prepared beforehand)
rag.upload_text_chuncks(text_chunks, batch_size=1)

# Get the top 100 search results for the query (from the existing collection in vector DB)
top_100_results = rag.invoke(query="your search query")
```
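
The snippet above expects `text_chunks` to exist already. As a hedged sketch of one way to prepare it, the helper below splits already-extracted text (for example, text pulled from a PDF with PyMuPDF) into overlapping character windows. The `chunk_text` function and the `"id"`/`"text"` keys are assumptions for illustration; check the library source for the exact structure that `upload_text_chuncks` accepts.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for chunk_id, start in enumerate(range(0, len(text), step)):
        # Hypothetical record layout: one dict per chunk.
        chunks.append({"id": chunk_id, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "Retrieval-Augmented Generation grounds LLM answers in retrieved text. " * 5
text_chunks = chunk_text(sample, chunk_size=120, overlap=20)
```

Overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of some duplicated storage.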


```python
from carag import GroundGeneration

gg = GroundGeneration(
    url="YOUR_QDRANT_URL",
    api_key="YOUR_API_KEY",
    mistral_api_key="YOUR_MISTRAL_API_KEY",
    collection_name="YOUR_COLLECTION_NAME",
)

# Get the top 3 responses from the Mistral LLM
top_3_responses = gg.ground_generation_from_llm(
    url="YOUR_QDRANT_URL",
    query="your search query",
    api_key="YOUR_API_KEY",
    mistral_api_key="YOUR_MISTRAL_API_KEY",
    collection_name="YOUR_COLLECTION_NAME",
)
```


> NOTE
- **Qdrant** offers a free tier with 4GB of storage. To generate your API key and endpoint, visit [Qdrant](https://qdrant.tech/).

- **Mistral AI** offers a free tier limited to 1 billion tokens per month, 500K tokens per minute, and 1 request per second (RPS).
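
To stay within a limit such as 1 request per second, calls can be spaced out on the client side. The sketch below is one simple way to do that; `MinIntervalThrottle` is a hypothetical helper, not part of CARAG or the Mistral client, and the 1.0 second interval is an assumption taken from the limit quoted above.

```python
import time


class MinIntervalThrottle:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect the interval, then record the call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


throttle = MinIntervalThrottle(min_interval=1.0)
for query in ["first question", "second question"]:
    throttle.wait()
    # A call such as gg.ground_generation_from_llm(...) would go here.
```

The clock and sleep functions are injectable so the throttle can be tested without real delays.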

## 🤝 Contributing  

Contributions are welcome: report bugs, suggest features, or submit pull requests to improve the source code.

Don't forget to [star (🌟) this repo](https://github.com/rizwandel/Build-standard-RAG-with-Qdrant) so you can find it more easily later.
