Metadata-Version: 2.1
Name: ALLMDEV
Version: 1.2.4
Summary: A simple and efficient Python library for fast inference of GGUF Large Language Models.
Author: All Advance AI
Author-email: allmdev@allaai.com
Maintainer: Soham Ghadge
Maintainer-email: soham.ghadge@allaai.com
Keywords: GGUF,GGUF Large Language Model,GGUF Large Language Models,GGUF Large Language Modeling,GGUF Large Language Modeling Library
Description-Content-Type: text/markdown
Requires-Dist: Flask
Requires-Dist: click
Requires-Dist: llama-index
Requires-Dist: llama-cpp-python
Requires-Dist: aiohttp
Requires-Dist: llama-index-llms-llama-cpp
Requires-Dist: huggingface-hub
Requires-Dist: langchain ==0.0.267
Requires-Dist: chromadb ==0.3.26
Requires-Dist: pdfminer.six
Requires-Dist: pydantic ==1.10.13
Requires-Dist: sentence-transformers

# ALLM

ALLM is a Python library designed for fast inference of GGUF-format Large Language Models (LLMs) on both CPU and GPU. It provides a convenient interface for loading pre-trained GGUF models and running inference with them. The library is aimed at applications where quick response times matter, such as chatbots and text generation.

## Features

- **Efficient Inference**: ALLM leverages the power of GGUF models to provide fast and accurate inference.
- **CPU and GPU Support**: The library is optimized for both CPU and GPU, allowing you to choose the best hardware for your application.
- **Simple Interface**: A straightforward command-line interface lets you load models and perform inference with a single command.
- **Flexible Configuration**: Customize inference settings such as temperature and model path to suit your needs.

## Installation

You can install ALLM using pip:

```bash
pip install allm
```

## Usage

You can start inference with the `allm-run` command. It takes a model name or path as a required argument, plus optional arguments for temperature, maximum new tokens, and additional model kwargs.

```bash
allm-run --name model_name_or_path
```
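For the optional arguments, the flag names below are assumptions made for illustration; the source only documents `--name`, so run `allm-run --help` to confirm the exact options your installed version supports:

```shell
# Hypothetical flag names (--temperature, --max_new_tokens are assumptions);
# verify them with `allm-run --help` before use.
allm-run --name mistral \
  --temperature 0.7 \
  --max_new_tokens 256
```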

## API

You can start the inference API with the `allm-serve` command. This launches the API server on the default address, 127.0.0.1:5000. To run the server on a different host or port, edit the apiconfig.txt file in your model directory.

```bash
allm-serve
```
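The layout of apiconfig.txt is not documented here; as a sketch, assuming it uses simple `key=value` lines (an assumption — inspect the file generated in your model directory to confirm), it could be read like this:

```python
# Sketch of reading an apiconfig.txt that stores the server address.
# The key=value format is an assumption, not documented behavior.

def parse_api_config(text):
    """Parse key=value lines, ignoring blanks and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config

example = """
# apiconfig.txt (assumed format)
host=0.0.0.0
port=8080
"""

settings = parse_api_config(example)
print(settings["host"], settings["port"])
```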

---


## ALLM RAG 

### Local RAG Inference

To initiate local RAG inference, first ingest your documents into the vector database with the `allm-createagent` command:

```bash
allm-createagent --doc "document_path"
```

After the documents have been ingested, start local RAG inference with the `allm-agentchat` command:


```bash
allm-agentchat --name model_name_or_path
```


Alternatively, you can serve RAG inference over the API with the `allm-agentapi` command:

```bash
allm-agentapi
```
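Once the server is running, any HTTP client can talk to it. A minimal Python sketch is below; note that the endpoint path and JSON field names are assumptions made for illustration, not documented API — check the server's actual routes before relying on them.

```python
import json
from urllib import request

# Hypothetical endpoint path -- the actual route exposed by the ALLM
# server is not documented here, so verify it first.
API_URL = "http://127.0.0.1:5000/chat"

def build_payload(prompt, temperature=0.7):
    """Assemble the request body (field names are assumptions)."""
    return {"prompt": prompt, "temperature": temperature}

def ask(prompt):
    """POST a prompt to the local ALLM server and return the JSON reply."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# ask("What is GGUF?")  # uncomment once the server is running
```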

## Supported Model Names
`Llama2`, `llama`, `llama2_chat`, `Llama_chat`, `Mistral`, `Mistral_instruct`

