<repository_structure>
<directory name="async_llm_handler">
    <file>
        <name>.env</name>
        <path>.env</path>
        <content>Full content not provided</content>
    </file>
    <file>
        <name>.gitignore</name>
        <path>.gitignore</path>
        <content>Full content not provided</content>
    </file>
    <file>
        <name>PROJECT_MAINTENANCE.md</name>
        <path>PROJECT_MAINTENANCE.md</path>
        <content>
# Project Maintenance Guide

This guide outlines the process for maintaining and updating the `async_llm_handler` project, including version management and releases.

## Project Structure

The project uses the following files for version management and releases:

- `pyproject.toml`: Project configuration and metadata
- `version.txt`: Contains the current version number
- `update.ps1`: PowerShell script for automating the release process

## Updating the Project

### Prerequisites

Ensure you have the following installed:

- Python 3.7+
- pip
- git

Install the required tools:

```powershell
pip install build twine
```

### Making Changes

1. Make your desired changes to the code, README, or other files.
2. Test your changes thoroughly.

### Releasing a New Version

1. Open PowerShell in your project directory.
2. Run the update script:

   ```powershell
   .\update.ps1
   ```

3. When prompted, enter the new version number (e.g., 0.1.1).
4. The script will automatically:
   - Update `version.txt`
   - Update version references in `README.md`
   - Commit changes to git
   - Create a new git tag
   - Push changes and tags to GitHub
   - Build the Python package
   - Upload the new version to PyPI

## Manual Steps (if needed)

If you need to perform any steps manually:

### Updating version.txt

1. Open `version.txt`
2. Change the version number
3. Save the file

### Updating pyproject.toml

The `pyproject.toml` file is set up to read the version from `version.txt`. You shouldn't need to manually update the version in this file.

### Building the Package

To build the package manually:

```powershell
python -m build
```

### Uploading to PyPI

To upload to PyPI manually:

```powershell
twine upload dist/*
```

## Best Practices

- Always increment the version number for any public release.
- Use semantic versioning (MAJOR.MINOR.PATCH):
  - MAJOR: Incompatible API changes
  - MINOR: Add functionality in a backwards-compatible manner
  - PATCH: Backwards-compatible bug fixes
- Test thoroughly before releasing.
- Keep the README up to date with any significant changes.
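
The semantic versioning rules above can be sketched in Python. This is only an illustration of how each part is incremented; the `bump_version` helper is hypothetical and is not part of `update.ps1`, which simply prompts for the new version number.

```python
import re

# Matches plain MAJOR.MINOR.PATCH versions such as "0.2.0"
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, part: str) -> str:
    """Return `version` with the given part ("major", "minor", or "patch") incremented."""
    match = _SEMVER.match(version.strip())
    if match is None:
        raise ValueError(f"Not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        # Incompatible API change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible feature: reset PATCH
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


print(bump_version("0.2.0", "patch"))  # 0.2.1
```

The returned string is what you would enter at the version prompt of the update script.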

## Troubleshooting

- If the script fails, you can perform the steps manually using the commands in the "Manual Steps" section.
- Ensure you have the necessary permissions for the GitHub repository and PyPI project.
- Check that your git configuration is correct and you're able to push to the repository.

Remember to refer to this guide whenever you need to release a new version of the project. It will help ensure a consistent and smooth release process.
        </content>
    </file>
    <file>
        <name>pyproject.toml</name>
        <path>pyproject.toml</path>
        <content>
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "async_llm_handler"
version = "0.2.0"
description = "An asynchronous handler for multiple LLM APIs"
readme = "README.md"
requires-python = ">=3.7"
license = "MIT"
keywords = ["llm", "api", "async", "nlp"]
authors = [
  { name = "Bryan Nsoh", email = "bryan.anye.5@gmail.com" },
]
dependencies = [
  "anthropic",
  "google-generativeai",
  "openai",
  "python-dotenv",
  "tiktoken",
  "aiohttp",
]

[project.optional-dependencies]
dev = [
  "pytest",
  "pytest-asyncio",
]

[project.urls]
Homepage = "https://github.com/BryanNsoh/async_llm_handler"

[tool.pytest.ini_options]
asyncio_mode = "auto"

[tool.hatch.build.targets.wheel]
packages = ["async_llm_handler"]

[tool.hatch.version]
path = "version.txt"
        </content>
    </file>
    <file>
        <name>README.md</name>
        <path>README.md</path>
        <content>
# Async LLM Handler

Async LLM Handler is a Python package that provides a unified interface for interacting with multiple Language Model APIs, supporting both synchronous and asynchronous operations. It currently supports Gemini, Claude, and OpenAI APIs.

## Features

- Synchronous and asynchronous API calls
- Support for multiple LLM providers:
  - Gemini (model: gemini_flash)
  - Claude (models: claude_3_5_sonnet, claude_3_haiku)
  - OpenAI (models: gpt_4o, gpt_4o_mini)
- Automatic rate limiting for each API
- Token counting and prompt clipping utilities

## Installation

Install the Async LLM Handler using pip:

```bash
pip install async-llm-handler
```

## Configuration

Before using the package, set up your environment variables in a `.env` file in your project's root directory:

```
GEMINI_API_KEY=your_gemini_api_key
CLAUDE_API_KEY=your_claude_api_key
OPENAI_API_KEY=your_openai_api_key
```

## Usage

### Basic Usage

#### Synchronous

```python
from async_llm_handler import Handler

handler = Handler()

# `model` is required; pass one of the supported model names
response = handler.query("What is the capital of France?", model="gemini_flash", sync=True)
print(response)

# Using a different model
response = handler.query("Explain quantum computing", model="gpt_4o", sync=True)
print(response)
```

#### Asynchronous

```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()

    # `model` is required; pass one of the supported model names
    response = await handler.query("What is the capital of France?", model="gemini_flash", sync=False)
    print(response)

    # Using a different model
    response = await handler.query("Explain quantum computing", model="claude_3_5_sonnet", sync=False)
    print(response)

asyncio.run(main())
```

### Advanced Usage

#### Using Multiple Models Concurrently

```python
import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler()
    prompt = "Explain the theory of relativity"
    
    tasks = [
        handler.query(prompt, model='gemini_flash', sync=False),
        handler.query(prompt, model='gpt_4o', sync=False),
        handler.query(prompt, model='claude_3_5_sonnet', sync=False)
    ]
    
    responses = await asyncio.gather(*tasks)
    
    for model, response in zip(['Gemini Flash', 'GPT-4o', 'Claude 3.5 Sonnet'], responses):
        print(f"Response from {model}:")
        print(response)
        print()

asyncio.run(main())
```

#### Limiting Input and Output Tokens

```python
from async_llm_handler import Handler

handler = Handler()

long_prompt = "Provide a detailed explanation of the entire history of artificial intelligence, including all major milestones and breakthroughs."

response = handler.query(long_prompt, model="gpt_4o", sync=True, max_input_tokens=1000, max_output_tokens=500)
print(response)
```

### Supported Models

The package supports the following models:

1. Gemini:
   - `gemini_flash`

2. Claude:
   - `claude_3_5_sonnet`
   - `claude_3_haiku`

3. OpenAI:
   - `gpt_4o`
   - `gpt_4o_mini`

You can specify these models using the `model` parameter in the `query` method.

### Error Handling

The package uses custom exceptions for error handling. Wrap your API calls in try-except blocks to handle potential errors:

```python
from async_llm_handler import Handler
from async_llm_handler.exceptions import LLMAPIError

handler = Handler()

try:
    response = handler.query("What is the meaning of life?", model="gpt_4o", sync=True)
    print(response)
except LLMAPIError as e:
    print(f"An error occurred: {e}")
```

### Rate Limiting

The package automatically handles rate limiting for each API. The current rate limits are:

- Gemini Flash: 30 requests per minute
- Claude 3.5 Sonnet: 5 requests per minute
- Claude 3 Haiku: 5 requests per minute
- GPT-4o: 5 requests per minute
- GPT-4o mini: 5 requests per minute

If you exceed these limits, the package will automatically wait before making the next request.
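
To illustrate the "wait before the next request" behaviour, here is a minimal sliding-window limiter. It is a sketch under stated assumptions, not the package's own `RateLimiter`: the `SlidingWindowLimiter` class and its `acquire` method are hypothetical names, and the short window is chosen only so the demo finishes quickly.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `rate` acquisitions in any `period`-second window."""

    def __init__(self, rate: int, period: float):
        self.rate = rate
        self.period = period
        self._timestamps = deque()  # monotonic times of recent acquisitions
        self._lock = asyncio.Lock()  # serializes waiters in arrival order

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            # Forget acquisitions that have already left the window
            while self._timestamps and now - self._timestamps[0] >= self.period:
                self._timestamps.popleft()
            if len(self._timestamps) >= self.rate:
                # Window is full: sleep until the oldest acquisition expires
                await asyncio.sleep(self.period - (now - self._timestamps[0]))
                self._timestamps.popleft()
            self._timestamps.append(time.monotonic())


async def demo() -> float:
    limiter = SlidingWindowLimiter(rate=2, period=0.2)
    start = time.monotonic()
    for _ in range(3):  # the third call has to wait for the window to slide
        await limiter.acquire()
    return time.monotonic() - start


elapsed = asyncio.run(demo())
print(f"3 acquisitions took {elapsed:.2f}s")
```

With the limits listed above, the equivalent configuration for GPT-4o would be `rate=5, period=60`.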

## Utility Functions

The package includes utility functions for token counting and prompt clipping:

```python
from async_llm_handler.utils import count_tokens, clip_prompt

text = "This is a sample text for token counting."
token_count = count_tokens(text)
print(f"Token count: {token_count}")

long_text = "This is a very long text that needs to be clipped..." * 100
clipped_text = clip_prompt(long_text, max_tokens=50)
print(f"Clipped text: {clipped_text}")
```

These utilities use the `cl100k_base` encoding by default, which is suitable for most modern language models.

## Logging

The package uses Python's built-in logging module. You can configure logging in your application to see debug information, warnings, and errors from the Async LLM Handler:

```python
import logging

logging.basicConfig(level=logging.INFO)
```

This will display INFO level logs and above from the Async LLM Handler.
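
If you only want more detail from this package while keeping the rest of your application quiet, you can set a level on the package's logger instead. This is a small sketch using the standard `logging` module; it assumes the package's module loggers are named after their modules (for example `async_llm_handler.handler`), which is what `logging.getLogger(__name__)` produces.

```python
import logging

# Keep the rest of the application at WARNING and above
logging.basicConfig(level=logging.WARNING)

# Show DEBUG and above from this package only. Module loggers such as
# "async_llm_handler.handler" inherit this level unless they set their own.
logging.getLogger("async_llm_handler").setLevel(logging.DEBUG)
```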
        </content>
    </file>
    <file>
        <name>update.ps1</name>
        <path>update.ps1</path>
        <content>
# Ensure we're in the project root
Set-Location $PSScriptRoot

# Read the current version
$current_version = Get-Content version.txt
Write-Host "Current version: $current_version"

# Prompt for the new version
$new_version = Read-Host -Prompt "Enter new version number"

# Update version.txt
Set-Content -Path version.txt -Value $new_version

# Update README.md if necessary
(Get-Content README.md) -replace "version $current_version", "version $new_version" | Set-Content README.md

# Git operations
git add .
git commit -m "Update to version $new_version"
git tag "v$new_version"
git push origin main --tags

# Build and upload to PyPI
python -m build
twine upload dist/*

Write-Host "Version $new_version has been released!"
        </content>
    </file>
</directory>
    <directory name=".pytest_cache">
    <file>
        <name>.gitignore</name>
        <path>.pytest_cache\.gitignore</path>
        <content>Full content not provided</content>
    </file>
    <file>
        <name>CACHEDIR.TAG</name>
        <path>.pytest_cache\CACHEDIR.TAG</path>
        <content>Full content not provided</content>
    </file>
    <file>
        <name>README.md</name>
        <path>.pytest_cache\README.md</path>
        <content>
# pytest cache directory #

This directory contains data from the pytest's cache plugin,
which provides the `--lf` and `--ff` options, as well as the `cache` fixture.

**Do not** commit this to version control.

See [the docs](https://docs.pytest.org/en/stable/how-to/cache.html) for more information.

        </content>
    </file>
    </directory>
        <directory name="v">
        </directory>
            <directory name="cache">
    <file>
        <name>lastfailed</name>
        <path>.pytest_cache\v\cache\lastfailed</path>
        <content>Full content not provided</content>
    </file>
    <file>
        <name>nodeids</name>
        <path>.pytest_cache\v\cache\nodeids</path>
        <content>Full content not provided</content>
    </file>
    <file>
        <name>stepwise</name>
        <path>.pytest_cache\v\cache\stepwise</path>
        <content>Full content not provided</content>
    </file>
            </directory>
    <directory name="async_llm_handler">
    <file>
        <name>config.py</name>
        <path>async_llm_handler\config.py</path>
        <content>
# File: async_llm_handler/config.py

import os
from dotenv import load_dotenv

load_dotenv()

class Config:
    def __init__(self):
        self.gemini_api_key = os.getenv("GEMINI_API_KEY")
        self.claude_api_key = os.getenv("CLAUDE_API_KEY")
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        self.cohere_api_key = os.getenv("COHERE_API_KEY")
        self.groq_api_key = os.getenv("GROQ_API_KEY")

    def __getitem__(self, key):
        return getattr(self, key)
        </content>
    </file>
    <file>
        <name>exceptions.py</name>
        <path>async_llm_handler\exceptions.py</path>
        <content>
# async_llm_handler/exceptions.py

class LLMAPIError(Exception):
    """Exception raised for errors in the LLM API."""
    pass

class RateLimitTimeoutError(Exception):
    """Exception raised when a rate limit wait exceeds the specified timeout."""
    pass
        </content>
    </file>
    <file>
        <name>handler.py</name>
        <path>async_llm_handler\handler.py</path>
        <content>
# async_llm_handler/handler.py

import asyncio
import json
import logging
from typing import Optional, Union, Any, Coroutine, Dict
import aiohttp
import anthropic
import google.generativeai as genai
from openai import AsyncOpenAI

from .config import Config
from .exceptions import LLMAPIError, RateLimitTimeoutError
from .utils.rate_limiter import RateLimiter
from .utils.token_utils import clip_prompt

logger = logging.getLogger(__name__)

class Handler:
    def __init__(self, config: Optional[Config] = None, rate_limit_timeout: float = 30.0, retry_attempts: int = 3, retry_delay: float = 1.0):
        self.config = config or Config()
        self.retry_attempts = retry_attempts
        self.retry_delay = retry_delay
        self._setup_clients()
        self._setup_rate_limiters(rate_limit_timeout)

    def _setup_clients(self):
        # Gemini setup
        genai.configure(api_key=self.config.gemini_api_key)
        self.gemini_client = genai.GenerativeModel(
            "gemini-1.5-flash-latest",
            safety_settings=[
                {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
            ],
        )
        
        # Other clients setup
        self.claude_client = anthropic.AsyncAnthropic(api_key=self.config.claude_api_key)
        self.openai_client = AsyncOpenAI(api_key=self.config.openai_api_key)

    def _setup_rate_limiters(self, timeout: float):
        self.rate_limiters = {
            'gemini_flash': RateLimiter(30, 60, timeout),
            'claude_3_5_sonnet': RateLimiter(5, 60, timeout),
            'claude_3_haiku': RateLimiter(5, 60, timeout),
            'gpt_4o': RateLimiter(5, 60, timeout),
            'gpt_4o_mini': RateLimiter(5, 60, timeout)
        }

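    def query(
        self,
        prompt: str,
        model: str,
        sync: bool = True,
        max_input_tokens: Optional[int] = None,
        max_output_tokens: Optional[int] = None,
        json_mode: bool = False
    ) -> Union[str, Coroutine[Any, Any, str]]:
        # Deliberately not `async def`: with sync=True the caller gets the
        # response string directly; with sync=False it gets a coroutine to await.
        if sync:
            return self._sync_query(prompt, model, max_input_tokens, max_output_tokens, json_mode)
        return self._query_with_retries(prompt, model, max_input_tokens, max_output_tokens, json_mode)

    async def _query_with_retries(self, prompt: str, model: str, max_input_tokens: Optional[int] = None, max_output_tokens: Optional[int] = None, json_mode: bool = False) -> str:
        # Retry only when waiting for a rate-limit slot timed out
        for attempt in range(self.retry_attempts):
            try:
                return await self._async_query(prompt, model, max_input_tokens, max_output_tokens, json_mode)
            except RateLimitTimeoutError:
                if attempt == self.retry_attempts - 1:
                    raise
                await asyncio.sleep(self.retry_delay)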

    def _sync_query(self, prompt: str, model: str, max_input_tokens: Optional[int] = None, max_output_tokens: Optional[int] = None, json_mode: bool = False) -> str:
        loop = asyncio.new_event_loop()
        try:
            asyncio.set_event_loop(loop)
            return loop.run_until_complete(self._async_query(prompt, model, max_input_tokens, max_output_tokens, json_mode))
        finally:
            loop.close()

    async def _async_query(self, prompt: str, model: str, max_input_tokens: Optional[int] = None, max_output_tokens: Optional[int] = None, json_mode: bool = False) -> str:
        method = getattr(self, f'_query_{model}_async', None)
        if not method:
            raise ValueError(f"Unsupported model for async query: {model}")
        
        return await method(prompt, max_input_tokens, max_output_tokens, json_mode)

    async def _query_gemini_flash_async(self, prompt: str, max_input_tokens: Optional[int] = None, max_output_tokens: Optional[int] = None, json_mode: bool = False) -> str:
        await self.rate_limiters['gemini_flash'].acquire_async()
        try:
            if max_input_tokens:
                prompt = clip_prompt(prompt, max_input_tokens)
            logger.info("Generating content with Gemini Flash API (Async).")
            generation_config = {"response_mime_type": "application/json"} if json_mode else {}
            if max_output_tokens is not None:
                generation_config['max_output_tokens'] = max_output_tokens
            response = await self.gemini_client.generate_content_async(prompt, generation_config=generation_config)
            if response.candidates:
                return response.candidates[0].content.parts[0].text
            else:
                raise ValueError("Invalid response format from Gemini Flash API.")
        except Exception as e:
            logger.error(f"Error with Gemini Flash API: {e}")
            raise LLMAPIError(f"Gemini Flash API error: {str(e)}")
        finally:
            self.rate_limiters['gemini_flash'].release()

    async def _query_gpt_4o_async(self, prompt: str, max_input_tokens: Optional[int] = None, max_output_tokens: Optional[int] = None, json_mode: bool = False) -> str:
        await self.rate_limiters['gpt_4o'].acquire_async()
        try:
            if max_input_tokens:
                prompt = clip_prompt(prompt, max_input_tokens)
            json_instruction = " Respond using JSON." if json_mode else ""
            messages = [{"role": "user", "content": prompt + json_instruction}]
            params = {
                "model": "gpt-4o-2024-05-13",
                "messages": messages,
                "temperature": 0.3,
                "top_p": 1,
                "frequency_penalty": 0,
                "presence_penalty": 0,
            }
            if max_output_tokens is not None:
                params["max_tokens"] = max_output_tokens
            if json_mode:
                params["response_format"] = {"type": "json_object"}
            response = await self.openai_client.chat.completions.create(**params)
            return response.choices[0].message.content
        except Exception as e:
            logger.error(f"Error with GPT-4o API: {e}")
            raise LLMAPIError(f"GPT-4o API error: {str(e)}")
        finally:
            self.rate_limiters['gpt_4o'].release()

    async def _query_gpt_4o_mini_async(self, prompt: str, max_input_tokens: Optional[int] = None, max_output_tokens: Optional[int] = None, json_mode: bool = False) -> str:
        await self.rate_limiters['gpt_4o_mini'].acquire_async()
        try:
            if max_input_tokens:
                prompt = clip_prompt(prompt, max_input_tokens)
            json_instruction = " Respond using JSON." if json_mode else ""
            messages = [{"role": "user", "content": prompt + json_instruction}]
            params = {
                "model": "gpt-4o-mini-2024-07-18",
                "messages": messages,
                "temperature": 0.3,
                "top_p": 1,
                "frequency_penalty": 0,
                "presence_penalty": 0,
            }
            if max_output_tokens is not None:
                params["max_tokens"] = max_output_tokens
            if json_mode:
                params["response_format"] = {"type": "json_object"}
            response = await self.openai_client.chat.completions.create(**params)
            return response.choices[0].message.content
        except Exception as e:
            logger.error(f"Error with GPT-4o mini API: {e}")
            raise LLMAPIError(f"GPT-4o mini API error: {str(e)}")
        finally:
            self.rate_limiters['gpt_4o_mini'].release()

    async def _query_claude_3_5_sonnet_async(self, prompt: str, max_input_tokens: Optional[int] = None, max_output_tokens: Optional[int] = None, json_mode: bool = False) -> str:
        await self.rate_limiters['claude_3_5_sonnet'].acquire_async()
        try:
            if max_input_tokens:
                prompt = clip_prompt(prompt, max_input_tokens)
            # Leading space keeps the instruction from running into the prompt text
            json_instruction = " Respond using JSON." if json_mode else ""
            params = {
                "model": "claude-3-5-sonnet-20240620",
                "messages": [{"role": "user", "content": prompt + json_instruction}],
                "max_tokens": max_output_tokens if max_output_tokens is not None else 4096,
            }
            response = await self.claude_client.messages.create(**params)
            return response.content[0].text
        except Exception as e:
            logger.error(f"Error with Claude 3.5 Sonnet API: {e}")
            raise LLMAPIError(f"Claude 3.5 Sonnet API error: {str(e)}")
        finally:
            self.rate_limiters['claude_3_5_sonnet'].release()

    async def _query_claude_3_haiku_async(self, prompt: str, max_input_tokens: Optional[int] = None, max_output_tokens: Optional[int] = None, json_mode: bool = False) -> str:
        await self.rate_limiters['claude_3_haiku'].acquire_async()
        try:
            if max_input_tokens:
                prompt = clip_prompt(prompt, max_input_tokens)
            # Leading space keeps the instruction from running into the prompt text
            json_instruction = " Respond using JSON." if json_mode else ""
            params = {
                "model": "claude-3-haiku-20240307",
                "messages": [{"role": "user", "content": prompt + json_instruction}],
                "max_tokens": max_output_tokens if max_output_tokens is not None else 4096,
            }
            response = await self.claude_client.messages.create(**params)
            return response.content[0].text
        except Exception as e:
            logger.error(f"Error with Claude 3 Haiku API: {e}")
            raise LLMAPIError(f"Claude 3 Haiku API error: {str(e)}")
        finally:
            self.rate_limiters['claude_3_haiku'].release()
        </content>
    </file>
    <file>
        <name>repo_context_extractor.py</name>
        <path>async_llm_handler\repo_context_extractor.py</path>
        <content>
import os

EXCLUDED_DIRS = {".git", "__pycache__", "node_modules", ".venv"}
FULL_CONTENT_EXTENSIONS = {".py", ".txt", ".dbml", ".yaml", ".toml", ".md",".sh",".ps1", }

def create_file_element(file_path, root_folder):
    relative_path = os.path.relpath(file_path, root_folder)
    file_name = os.path.basename(file_path)
    file_extension = os.path.splitext(file_name)[1]

    file_element = [
        f"    <file>\n        <name>{file_name}</name>\n        <path>{relative_path}</path>\n"
    ]

    if file_extension in FULL_CONTENT_EXTENSIONS:
        file_element.append("        <content>\n")
        try:
            with open(file_path, "r", encoding="utf-8") as file:
                file_element.append(file.read())
        except UnicodeDecodeError:
            file_element.append("Binary or non-UTF-8 content not displayed")
        file_element.append("\n        </content>\n")
    else:
        file_element.append("        <content>Full content not provided</content>\n")

    file_element.append("    </file>\n")
    return "".join(file_element)

def get_repo_structure(root_folder):
    structure = ["<repository_structure>\n"]

    for subdir, dirs, files in os.walk(root_folder):
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        level = subdir.replace(root_folder, "").count(os.sep)
        indent = " " * 4 * level
        relative_subdir = os.path.relpath(subdir, root_folder)

        structure.append(f'{indent}<directory name="{os.path.basename(subdir)}">\n')
        for file in files:
            file_path = os.path.join(subdir, file)
            file_element = create_file_element(file_path, root_folder)
            structure.append(file_element)
        structure.append(f"{indent}</directory>\n")

    structure.append("</repository_structure>\n")
    return "".join(structure)

def main():
    root_folder = r"C:\Users\bnsoh2\OneDrive - University of Nebraska-Lincoln\Documents\Projects\async_llm_handler"
    output_file = os.path.join(root_folder, "repository_context.txt")

    # Delete the previous output file if it exists
    if os.path.exists(output_file):
        os.remove(output_file)
        print(f"Deleted previous {output_file}")

    repo_structure = get_repo_structure(root_folder)

    with open(output_file, "w", encoding="utf-8") as f:
        f.write(repo_structure)

    print(f"Fresh repository context has been extracted to {output_file}")

if __name__ == "__main__":
    main()
        </content>
    </file>
    <file>
        <name>__init__.py</name>
        <path>async_llm_handler\__init__.py</path>
        <content>
# async_llm_handler/__init__.py

from .handler import Handler
from .config import Config
from .exceptions import LLMAPIError, RateLimitTimeoutError

__all__ = ['Handler', 'Config', 'LLMAPIError', 'RateLimitTimeoutError']
__version__ = "0.2.0"  # Updated version number to reflect the changes
        </content>
    </file>
    </directory>
        <directory name="examples">
    <file>
        <name>async_example.py</name>
        <path>async_llm_handler\examples\async_example.py</path>
        <content>
# async_llm_handler/examples/async_example.py

import asyncio
from async_llm_handler import Handler

async def main():
    handler = Handler(rate_limit_timeout=60.0, retry_attempts=5, retry_delay=2.0)
    
    prompt = "What is the meaning of life? Use any JSON format you see fit."

    # Using specific models with JSON mode
    models = ['gemini_flash', 'gpt_4o', 'gpt_4o_mini', 'claude_3_5_sonnet', 'claude_3_haiku']
    tasks = [handler.query(prompt, model=model, sync=False, json_mode=True) for model in models]
    responses = await asyncio.gather(*tasks, return_exceptions=True)
    
    for model, response in zip(models, responses):
        if isinstance(response, Exception):
            print(f"Error with {model}: {str(response)}\n")
        else:
            print(f"{model.replace('_', ' ').title()} Response (JSON mode):")
            print(response)
            print()

    # Example with max_input_tokens, max_output_tokens, and JSON mode
    limited_prompt = "Summarize the entire history of human civilization in great detail."
    try:
        response = await handler.query(
            limited_prompt,
            model='gpt_4o',
            sync=False,
            max_input_tokens=1000,
            max_output_tokens=100,
            json_mode=True
        )
        print(f"GPT-4o Response (limited tokens, JSON mode):")
        print(response)
        print()
    except Exception as e:
        print(f"Error with GPT-4o (limited tokens): {str(e)}\n")

    # Test rate limiting and retries
    async def test_rate_limiting():
        for _ in range(10):  # Attempt to make 10 rapid requests
            try:
                response = await handler.query("Test prompt", model='gpt_4o_mini', sync=False)
                print(response)
            except Exception as e:
                print(f"Request failed: {str(e)}")
            await asyncio.sleep(0.1)  # Small delay between requests

    await test_rate_limiting()

if __name__ == "__main__":
    asyncio.run(main())
        </content>
    </file>
    <file>
        <name>sync_example.py</name>
        <path>async_llm_handler\examples\sync_example.py</path>
        <content>
# async_llm_handler/examples/sync_example.py

import json
from async_llm_handler import Handler
from async_llm_handler.exceptions import LLMAPIError, RateLimitTimeoutError

def main():
    handler = Handler(rate_limit_timeout=60.0, retry_attempts=5, retry_delay=2.0)
    
    prompt = "What are the top 3 benefits of regular exercise? Provide a brief explanation for each."

    # Using specific models with JSON mode
    models = ['gemini_flash', 'gpt_4o', 'gpt_4o_mini', 'claude_3_5_sonnet', 'claude_3_haiku']
    for model in models:
        try:
            response = handler.query(prompt, model=model, sync=True, json_mode=True)
            print(f"{model.replace('_', ' ').title()} Response (JSON mode):")
            try:
                parsed_response = json.loads(response)
                print(json.dumps(parsed_response, indent=2))
            except json.JSONDecodeError:
                print("Failed to parse response as JSON. Raw response:")
                print(response)
            print()
        except LLMAPIError as e:
            print(f"API Error with {model}: {str(e)}")
        except RateLimitTimeoutError as e:
            print(f"Rate limit timeout for {model}: {str(e)}")
        except Exception as e:
            print(f"Unexpected error with {model}: {str(e)}")
        print()

    # Example with max_input_tokens, max_output_tokens, and JSON mode
    limited_prompt = "Summarize the history of artificial intelligence in 50 words or less."
    try:
        response = handler.query(
            limited_prompt,
            model='gpt_4o',
            sync=True,
            max_input_tokens=100,
            max_output_tokens=50,
            json_mode=True
        )
        print(f"GPT-4o Response (limited tokens, JSON mode):")
        try:
            parsed_response = json.loads(response)
            print(json.dumps(parsed_response, indent=2))
        except json.JSONDecodeError:
            print("Failed to parse response as JSON. Raw response:")
            print(response)
    except Exception as e:
        print(f"Error with GPT-4o (limited tokens): {str(e)}")
    print()

    # Test rate limiting and retries
    def test_rate_limiting():
        for i in range(10):  # Attempt to make 10 rapid requests
            try:
                response = handler.query("Test prompt " + str(i), model='gpt_4o_mini', sync=True)
                print(f"Request {i + 1} successful")
            except RateLimitTimeoutError as e:
                print(f"Request {i + 1} failed due to rate limiting: {str(e)}")
            except Exception as e:
                print(f"Request {i + 1} failed: {str(e)}")

    print("Testing rate limiting:")
    test_rate_limiting()

if __name__ == "__main__":
    main()
        </content>
    </file>
        </directory>
        <directory name="tests">
    <file>
        <name>test_handler.py</name>
        <path>async_llm_handler\tests\test_handler.py</path>
        <content>
# File: async_llm_handler/tests/test_handler.py

import sys
import os
# Make the project root importable when the tests are run from a source checkout
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')))

import pytest
from async_llm_handler import Handler, Config
from async_llm_handler.exceptions import LLMAPIError

@pytest.fixture
def handler():
    return Handler()

def test_query(handler):
    # `model` is a required argument; sync=True should return the response string
    response = handler.query("Test prompt", model="gpt_4o_mini", sync=True)
    assert isinstance(response, str)
    assert len(response) > 0

@pytest.mark.asyncio
async def test_async_query(handler):
    response = await handler._async_query("Test prompt", model="gpt_4o_mini")
    assert isinstance(response, str)
    assert len(response) > 0

@pytest.mark.asyncio
async def test_invalid_model(handler):
    with pytest.raises(ValueError):
        await handler._async_query("Test prompt", model="invalid_model")

@pytest.mark.asyncio
async def test_api_error_propagates(monkeypatch):
    async def mock_api_error(*args, **kwargs):
        raise LLMAPIError("API Error")

    handler = Handler()
    # Patch the per-model method that _async_query dispatches to
    monkeypatch.setattr(handler, '_query_gpt_4o_mini_async', mock_api_error)

    with pytest.raises(LLMAPIError, match="API Error"):
        await handler._async_query("Test prompt", model="gpt_4o_mini")
        </content>
    </file>
    <file>
        <name>test_utils.py</name>
        <path>async_llm_handler\tests\test_utils.py</path>
        <content>
# File: async_llm_handler/tests/test_utils.py

import time

import pytest
from async_llm_handler.utils import count_tokens, clip_prompt, RateLimiter

def test_count_tokens():
    text = "Hello, world!"
    assert count_tokens(text) > 0

def test_clip_prompt():
    long_prompt = "This is a very long prompt " * 100
    max_tokens = 10
    clipped = clip_prompt(long_prompt, max_tokens)
    assert count_tokens(clipped) <= max_tokens

@pytest.mark.asyncio
async def test_rate_limiter():
    limiter = RateLimiter(rate=2, period=1)
    
    # time.monotonic() is unaffected by system clock adjustments
    start_time = time.monotonic()
    
    async with limiter:
        pass
    async with limiter:
        pass
    
    # This should wait
    async with limiter:
        pass
    
    end_time = time.monotonic()
    
    assert end_time - start_time >= 1.0

def test_logger():
    import logging
    from async_llm_handler.utils import get_logger
    logger = get_logger("test_logger")
    assert logger.name == "test_logger"
    assert logger.level == logging.INFO
        </content>
    </file>
    <file>
        <name>__init__.py</name>
        <path>async_llm_handler\tests\__init__.py</path>
        <content>
# File: async_llm_handler/tests/__init__.py
# This file can be left empty
        </content>
    </file>
        </directory>
        <directory name="utils">
    <file>
        <name>logger.py</name>
        <path>async_llm_handler\utils\logger.py</path>
        <content>
# File: async_llm_handler/utils/logger.py

import logging

def get_logger(name):
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
        </content>
    </file>
    <file>
        <name>rate_limiter.py</name>
        <path>async_llm_handler\utils\rate_limiter.py</path>
        <content>
# async_llm_handler/utils/rate_limiter.py

import asyncio
import time

class RateLimitTimeoutError(Exception):
    pass

class RateLimiter:
    def __init__(self, rate: int, period: int = 60, timeout: float = 30.0):
        self.rate = rate
        self.period = period
        self.allowance = rate
        self.last_check = time.monotonic()
        self.timeout = timeout
        self._lock = asyncio.Lock()

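    async def acquire_async(self):
        """Take one token, sleeping until one has refilled if the bucket is empty."""
        async with self._lock:
            current = time.monotonic()
            time_passed = current - self.last_check
            self.last_check = current
            # Refill in proportion to elapsed time, capped at the burst size.
            self.allowance += time_passed * (self.rate / self.period)
            if self.allowance > self.rate:
                self.allowance = self.rate
            if self.allowance < 1:
                wait_time = (1 - self.allowance) / (self.rate / self.period)
                try:
                    await asyncio.wait_for(asyncio.sleep(wait_time), timeout=self.timeout)
                except asyncio.TimeoutError:
                    raise RateLimitTimeoutError(f"Rate limit wait exceeded timeout of {self.timeout} seconds")
                # The sleep refilled exactly the one token this call consumes.
                # Record that, so the next caller is not granted it a second time.
                self.last_check = time.monotonic()
                self.allowance = 0
            else:
                self.allowance -= 1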

    def release(self):
        pass  # No action needed for release in this implementation
        </content>
    </file>
    <file>
        <name>token_utils.py</name>
        <path>async_llm_handler\utils\token_utils.py</path>
        <content>
# File: async_llm_handler/utils/token_utils.py

import tiktoken

def count_tokens(text, encoding_name="cl100k_base"):
    """Return the number of tokens in `text` under the given tiktoken encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

def clip_prompt(prompt, max_tokens, encoding_name="cl100k_base"):
    """Truncate `prompt` to at most `max_tokens` tokens; return it unchanged if it already fits."""
    encoding = tiktoken.get_encoding(encoding_name)
    tokens = encoding.encode(prompt)
    if len(tokens) > max_tokens:
        # Keep the leading tokens and decode them back to text.
        return encoding.decode(tokens[:max_tokens])
    return prompt
        </content>
    </file>
    <file>
        <name>__init__.py</name>
        <path>async_llm_handler\utils\__init__.py</path>
        <content>
# File: async_llm_handler/utils/__init__.py

from .logger import get_logger
from .rate_limiter import RateLimiter
from .token_utils import count_tokens, clip_prompt

__all__ = ['get_logger', 'RateLimiter', 'count_tokens', 'clip_prompt']
        </content>
    </file>
        </directory>
    <directory name="dist">
    <file>
        <name>async_llm_handler-0.1.0-py3-none-any.whl</name>
        <path>dist\async_llm_handler-0.1.0-py3-none-any.whl</path>
        <content>Full content not provided</content>
    </file>
    <file>
        <name>async_llm_handler-0.1.0.tar.gz</name>
        <path>dist\async_llm_handler-0.1.0.tar.gz</path>
        <content>Full content not provided</content>
    </file>
    </directory>
</repository_structure>
