Metadata-Version: 2.1
Name: LlaMasterKey
Version: 0.1.0
Summary: One master key for all LLM/GenAI endpoints
Author-email: "Textea Inc." <bao@textea.co>
License: MIT License
        
        Copyright (c) 2023 Textea
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: homepage, https://github.com/TexteaInc/LlaMasterKey
Keywords: tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx
Requires-Dist: fastapi
Requires-Dist: starlette
Requires-Dist: uvicorn

# LLaMasterKey: One master key for all LLM/GenAI endpoints

A big pain in the era of LLMs is that you need to get an API token for each of them, OpenAI, Cohere, Google Vertex AI, Anthropic, AnyScale, Huggingface, etc.

If an intern in your startup accidentally pushes the code containing the API keys to Github, you would have to revoke each of the API tokens that was assigned to him. Even worse, you already forgot which API tokens were given to him. So what do you do? Revoke all keys and suffer from service interruption?

This is when LlaMasterKey (pronounced as "La Master key" which stands for "Llama" + "Master" + "key" where "La" stands for "the" in French) comes to play. It severs as a proxy that dispatches the requests to the real cloud LLM/GenAI endpoints and returns the response to your team/customer. To authenticate, only one master key is needed between your team member or customer and your LlaMasterKey server. If any of them makes you unhappy, you only need to revoke one key to cut off his/her access to all cloud LLM/GenAI endpoints. The actual keys are hidden from your team members and customers.

## Roadmap

1. Currently no master key is enabled. We will add authentication.
2. More cloud LLM/GenAI endpoints will be supported. This is the status:
   - [x] OpenAI/chat/completion
   - [x] Cohere/chat
   - [x] AnyScale
   - [x] HuggingFace Inference API
   - [ ] Anthropic
   - [ ] Google Vertex AI
   - [ ] Vectara AI

## Installation

```bash
pip install LLaMasterKey
```

If you want to install from the path, you can do:

```bash
pip install -e .
```

## Usage

1. On your server, set up the keys for each cloud LLM/GenAI endpoint you want to use. For example, if you want to use OpenAI, set the OS environment variable `OPENAI_API_KEY`.

   ```bash
   export OPENAI_API_KEY=sk-xxx #openai
   export CO_API_KEY=co-xxx # cohere
   export HF_TOKEN=hf-xxx # huggingface
   export ANYSCALE_API_KEY=as-xxx # anyscale
   export ANTHROPIC_API_KEY=an-xxx # anthropic
   export VECTOR_AI_API_KEY=va-xxx # vectara
   ```

2. Start your LlaMasterKey server

   ```bash
   lmk
   ```

   The server will read keys set in the OS environment variables and start a server at `http://localhost:8000` (8000 because it's the default port of FastAPI).

3. On each computer that needs to connect to a cloud LLM, e.g., the laptop of your intern, use the `generated-keys.env` which is generated by the LlaMasterKey.

   ```bash
   source generated-keys.env
   ```

4. Make requests to the cloud LLM/GenAI endpoint as usual.

   For example, `test_chatgpt.py` in `tests` is a client request.

## How it works under the hood

We generate an env file that modifies the token and the endpoint URL, e.g. for OpenAI we override `OPENAI_BASE_URL` and `OPENAI_API_KEY`. The request will then be forwarded to the LlaMasterKey server and processed and forwarded to the corresponding address based on the token.

### For HuggingFace

If you work through `huggingface_hub.InferenceClient()` it works fine. But if you are working through `requests` like:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/t5-small"
headers = {"Authorization": "Bearer **********"}

def query(payload):
   response = requests.post(API_URL, headers=headers, json=payload)
   return response.json()

output = query({
   "inputs": "Меня зовут Вольфганг и я живу в Берлине",
})
```

You need to change the `API_URL` to `os.environ["HF_INFERENCE_ENDPOINT"] + "/models/t5-small"`, and change the `Authorization` header to `os.environ["HF_TOKEN"]`.

For example, if you want to use the `t5-small` model, you can do:

```python
import os
import requests

API_URL = f"{os.environ['HF_INFERENCE_ENDPOINT']}/models/t5-small"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def query(payload):
   response = requests.post(API_URL, headers=headers, json=payload)
   return response.json()

output = query({
   "inputs": "Меня зовут Вольфганг и я живу в Берлине",
})
``

## License

Ah, this is important. Let's say MIT for now?
