Metadata-Version: 2.4
Name: agent-sprint-testkit
Version: 0.1.3
Summary: AgentSprint TestKit - Universal AI agent benchmarking and testing framework
Home-page: https://github.com/stanhus/ASTK
Author: ASTK Team
Author-email: ASTK Team <team@astk.dev>
Maintainer-email: ASTK Team <team@astk.dev>
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
        
        Copyright (c) 2024 Stan Hus
        
        This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
        
        You are free to:
        
        - Share — copy and redistribute the material in any medium or format for non-commercial purposes
        - The licensor cannot revoke these freedoms as long as you follow the license terms.
        
        Under the following terms:
        
        - Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
        - NonCommercial — You may not use the material for commercial purposes.
        - NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
        - No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
        
        To view a copy of this license, visit:
        http://creativecommons.org/licenses/by-nc-nd/4.0/
        
        For commercial use or derivative works, please contact: admin@blackbox-dev.com
        
        DISCLAIMER:
        THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
        
        ENFORCEMENT:
        Violation of this license may result in legal action. The copyright holder reserves all rights not explicitly granted herein.
        
        ---
        
        For more information about ASTK, visit: https://github.com/stanhus/ASTK
        
Project-URL: Homepage, https://github.com/stanhus/ASTK
Project-URL: Documentation, https://astk.readthedocs.io
Project-URL: Repository, https://github.com/stanhus/ASTK.git
Project-URL: Issues, https://github.com/stanhus/ASTK/issues
Project-URL: Changelog, https://github.com/stanhus/ASTK/blob/main/CHANGELOG.md
Keywords: ai,agent,testing,benchmark,llm,chatbot
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: Free for non-commercial use
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain>=0.1.0
Requires-Dist: langchain-openai>=0.0.2
Requires-Dist: langchain-core>=0.1.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: click>=8.0.0
Requires-Dist: opentelemetry-api>=1.20.0
Requires-Dist: opentelemetry-sdk>=1.20.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: pyyaml>=6.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Provides-Extra: docker
Requires-Dist: docker>=6.0.0; extra == "docker"
Provides-Extra: all
Requires-Dist: agent-sprint-testkit[dev,docker]; extra == "all"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# ASTK Package Usage Guide 📖

> **Step-by-step instructions for using AgentSprint TestKit**

This guide shows you exactly how to install and use ASTK to test your AI agents. No technical background required!

## 🚀 What is ASTK?

ASTK is a tool that **tests your AI chatbots and agents** to see how well they work. Think of it as a test suite for your AI: it asks your agent a set of questions and measures the quality of the responses.

## 📦 Step 1: Install ASTK

Open your terminal/command prompt and run:

```bash
pip install agent-sprint-testkit
```

**✅ Check it worked:**

```bash
astk --help
```

You should see a help menu. If you get an error, see [Troubleshooting](#troubleshooting) below.

## 🔑 Step 2: Set Up OpenAI API Key

ASTK uses OpenAI to help evaluate your agent's responses. You need an API key:

1. **Get an API key** from [OpenAI](https://platform.openai.com/api-keys)
2. **Set the key** in your terminal:

```bash
# On Mac/Linux:
export OPENAI_API_KEY="sk-your-key-here"

# On Windows (Command Prompt):
set OPENAI_API_KEY=sk-your-key-here

# On Windows (PowerShell):
$env:OPENAI_API_KEY = "sk-your-key-here"
```
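
If you want to confirm from Python that the key is actually visible in your environment, a small pre-flight check can help. This is a minimal sketch and not part of ASTK; the helper name `has_openai_key` is hypothetical.

```python
import os


def has_openai_key(env=None):
    """Return True if OPENAI_API_KEY is set to a non-empty value.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to test. (Hypothetical helper, not an ASTK API.)
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if has_openai_key():
        print("OPENAI_API_KEY is set")
    else:
        print("OPENAI_API_KEY is missing; set it before running astk")
```

Run it in the same terminal session where you set the key, since the variable only exists in that session and its child processes.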

## 🏁 Step 3: Your First Test

### Option A: Test the Example Agent

ASTK comes with a built-in example agent for testing:

```bash
astk init my-first-test
cd my-first-test
astk benchmark examples/agents/file_qa_agent.py
```

This will:

- ✅ Create a test project
- ✅ Run 12 different scenarios
- ✅ Generate a detailed report
- ✅ Show you how well the agent performed

### Option B: Test Your Own Agent

If you have your own AI agent, you can test it:

```bash
astk benchmark path/to/your-agent.py
```

**Your agent must accept questions as command-line arguments:**

```bash
python your-agent.py "What is 2+2?"
# Should output: "Agent: 4" or similar
```
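
To check this contract before running a full benchmark, you can call your agent the way a test harness plausibly would: as a subprocess, with the question as an argument and standard output captured. The sketch below is an assumption about the calling convention, not ASTK's actual runner, and `ask_agent` is a hypothetical helper.

```python
import subprocess
import sys


def ask_agent(agent_path, question, timeout=30):
    """Run `python <agent_path> <question>` and return its trimmed stdout.

    Hypothetical helper: mirrors the command-line contract described above,
    not ASTK's internal implementation.
    """
    result = subprocess.run(
        [sys.executable, agent_path, question],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

For example, `print(ask_agent("your-agent.py", "What is 2+2?"))` should print whatever your agent writes to standard output. If it prints nothing or raises an error, the benchmark is likely to fail for the same reason.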

## 📊 Understanding Results

After running a benchmark, you'll see results like:

```json
{
  "success_rate": 0.67,           // 67% of tests passed
  "complexity_score": 0.58,       // 58% difficulty-weighted score
  "total_duration_seconds": 45.2, // Took 45 seconds total
  "average_response_length": 1247, // Average response was 1,247 characters
  "difficulty_breakdown": {
    "intermediate": {"success_rate": 1.0, "scenarios": "2/2"},
    "advanced": {"success_rate": 0.6, "scenarios": "3/5"},
    "expert": {"success_rate": 0.4, "scenarios": "2/5"}
  },
  "category_breakdown": {
    "reasoning": {"success_rate": 0.67, "scenarios": "2/3"},
    "creativity": {"success_rate": 0.5, "scenarios": "1/2"},
    "ethics": {"success_rate": 1.0, "scenarios": "2/2"}
  },
  "scenarios": [...]              // Details for each test
}
```

**🎯 What this means:**

### Core Metrics

- **Success Rate**: Percentage of scenarios completed successfully
- **Complexity Score**: Difficulty-weighted performance (Expert = 3x, Advanced = 2x, Intermediate = 1x)
- **Duration**: Total time, in seconds, the benchmark run took
- **Response Length**: Average response length in characters, a rough indicator of how detailed the answers are
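
One way to read the Complexity Score is as a weighted pass rate, where each scenario counts in proportion to its difficulty weight. The sketch below assumes that formula for illustration; the guide only states the weights, so ASTK's actual calculation may differ. The pass counts are hypothetical.

```python
# Weights taken from the guide: Expert = 3x, Advanced = 2x, Intermediate = 1x.
WEIGHTS = {"intermediate": 1, "advanced": 2, "expert": 3}


def complexity_score(passed, total):
    """Weighted pass rate: sum(weight * passed) / sum(weight * total).

    `passed` and `total` map difficulty level -> scenario count.
    Assumed formula for illustration; ASTK may compute it differently.
    """
    earned = sum(WEIGHTS[level] * passed.get(level, 0) for level in total)
    possible = sum(WEIGHTS[level] * count for level, count in total.items())
    return earned / possible if possible else 0.0


# Hypothetical run: 2/2 intermediate, 4/5 advanced, 3/5 expert.
total = {"intermediate": 2, "advanced": 5, "expert": 5}
passed = {"intermediate": 2, "advanced": 4, "expert": 3}
print(round(complexity_score(passed, total), 2))  # (2 + 8 + 9) / 27 -> 0.7
```

In this example the plain success rate is 9/12 = 0.75, while the weighted score is lower because the missed Expert scenarios count the most.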

### Advanced Analytics

- **🎓 Difficulty Breakdown**: Performance across challenge levels
  - 📘 **Intermediate**: Basic problem-solving tasks
  - 📙 **Advanced**: Complex multi-step reasoning
  - 📕 **Expert**: Cutting-edge AI capabilities
- **🏷️ Category Performance**: Strengths across different domains
  - 🧠 **Reasoning**: Logic and problem-solving
  - 🎨 **Creativity**: Innovation and design thinking
  - ⚖️ **Ethics**: Responsible AI practices
  - 🔗 **Integration**: System architecture skills

### 🌟 AI Capability Ratings

Based on your **Complexity Score**:

- **🌟 Exceptional AI (80%+)**: Expert-level reasoning across multiple domains
- **🔥 Advanced AI (60-79%)**: Strong performance on sophisticated tasks
- **💡 Competent AI (40-59%)**: Good basic capabilities, room for advanced improvement
- **📚 Developing AI (<40%)**: Focus on improving reasoning and problem-solving
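
The bands above translate directly into a threshold lookup. This is a sketch of the mapping, assuming the score is a fraction between 0 and 1 and that thresholds apply to the unrounded value; `capability_rating` is a hypothetical name, not an ASTK function.

```python
def capability_rating(complexity_score):
    """Map a complexity score in [0, 1] to the rating bands listed above."""
    if complexity_score >= 0.80:
        return "Exceptional AI"
    if complexity_score >= 0.60:
        return "Advanced AI"
    if complexity_score >= 0.40:
        return "Competent AI"
    return "Developing AI"


print(capability_rating(0.58))  # Competent AI
```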

## 🧪 What Tests Does ASTK Run?

ASTK automatically tests your agent with 12 scenarios across five categories:

### 🧠 **Reasoning & Problem-Solving**

| Test                         | What it checks                                                                                              | Difficulty      |
| ---------------------------- | ----------------------------------------------------------------------------------------------------------- | --------------- |
| **Multi-step Reasoning**     | Can your agent analyze complex problems, identify security vulnerabilities, and provide detailed solutions? | 📙 Advanced     |
| **Edge Case Analysis**       | How well does it handle unusual situations, errors, and unexpected inputs?                                  | 📘 Intermediate |
| **Performance Optimization** | Can it analyze code for bottlenecks and suggest detailed performance improvements?                          | 📙 Advanced     |

### 🎨 **Creativity & Innovation**

| Test                             | What it checks                                                                                 | Difficulty |
| -------------------------------- | ---------------------------------------------------------------------------------------------- | ---------- |
| **Creative Problem Solving**     | Can your agent design new features and architectures from scratch with implementation details? | 📕 Expert  |
| **Adaptive Learning Assessment** | Can it design self-improving systems and machine learning approaches?                          | 📕 Expert  |

### 🔗 **System Integration & Architecture**

| Test                         | What it checks                                                      | Difficulty  |
| ---------------------------- | ------------------------------------------------------------------- | ----------- |
| **Cross-domain Integration** | How well can it design complete DevOps and CI/CD strategies?        | 📕 Expert   |
| **Failure Recovery Design**  | Can it create comprehensive error handling and reliability systems? | 📙 Advanced |
| **Scalability Architecture** | Can it redesign systems for massive scale (100k+ concurrent users)? | 📕 Expert   |

### ⚖️ **Ethics & Compliance**

| Test                      | What it checks                                                      | Difficulty  |
| ------------------------- | ------------------------------------------------------------------- | ----------- |
| **Ethical AI Evaluation** | Does it understand AI bias, fairness, and responsible AI practices? | 📙 Advanced |
| **Regulatory Compliance** | Can it design systems that meet GDPR, CCPA, and AI regulations?     | 📙 Advanced |

### 💼 **Strategic & Future-Tech Analysis**

| Test                            | What it checks                                                          | Difficulty      |
| ------------------------------- | ----------------------------------------------------------------------- | --------------- |
| **Competitive Analysis**        | Can it analyze markets, competitive positioning, and business strategy? | 📘 Intermediate |
| **Quantum Computing Readiness** | Does it understand emerging technologies and future-tech implications?  | 📕 Expert       |

### 📊 **New Metrics You'll Get:**

- **🧠 Complexity Score**: Difficulty-weighted performance (Expert tasks count 3x as much as Intermediate tasks)
- **🎓 Difficulty Breakdown**: How well your agent handles Intermediate vs Advanced vs Expert challenges
- **🏷️ Category Performance**: Which areas your agent excels in (Reasoning, Creativity, Ethics, etc.)
- **🏆 Best Category**: Your agent's strongest capability area
- **🌟 AI Capability Assessment**: Overall intelligence rating from "Developing" to "Exceptional"
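
If you post-process a report yourself, the best category can be derived from `category_breakdown` by taking the entry with the highest success rate. The sketch below assumes the report shape shown in the sample under "Understanding Results"; `best_category` is a hypothetical helper, and ties go to whichever category appears first.

```python
def best_category(category_breakdown):
    """Return the category name with the highest success rate, or None."""
    if not category_breakdown:
        return None
    return max(
        category_breakdown,
        key=lambda name: category_breakdown[name]["success_rate"],
    )


# Shape copied from the sample report in "Understanding Results".
breakdown = {
    "reasoning": {"success_rate": 0.67, "scenarios": "2/3"},
    "creativity": {"success_rate": 0.5, "scenarios": "1/2"},
    "ethics": {"success_rate": 1.0, "scenarios": "2/2"},
}
print(best_category(breakdown))  # ethics
```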

## 🎯 Common Use Cases

### Testing a Simple Chatbot

```python
#!/usr/bin/env python3
# Your chatbot file: my_bot.py
import sys


def main():
    if len(sys.argv) > 1:
        # Join all arguments so unquoted multi-word questions still work
        question = " ".join(sys.argv[1:])
        # Your chatbot logic here
        answer = f"Bot says: {question}"
        print(answer)


if __name__ == "__main__":
    main()
```

**Test it:**

```bash
astk benchmark my_bot.py
```

### Testing Different Agent Types

**CLI Agent (takes command line arguments):**

```bash
astk benchmark my_cli_agent.py
```

**Python Module Agent (has a chat method):**

```bash
# ASTK will automatically detect and use the chat() method
astk benchmark my_module_agent.py
```
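
A module agent might look like the sketch below. The guide only says that ASTK looks for a `chat()` method, so the class name `EchoAgent`, the single-string signature, and the string return type are all assumptions; check the ASTK source or documentation for the interface it actually expects.

```python
import sys


class EchoAgent:
    """Toy agent exposing a chat() method (signature is an assumption)."""

    def chat(self, message: str) -> str:
        # Replace this with a call into your model or pipeline.
        return f"Agent: you asked '{message}'"


if __name__ == "__main__":
    # Also usable from the command line, like the CLI agents above.
    agent = EchoAgent()
    print(agent.chat(" ".join(sys.argv[1:]) or "Hello"))
```

Keeping the command-line entry point means the same file still works as a CLI agent if the module detection does not pick it up.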

**REST API Agent:**

```bash
# ASTK will try to use the /chat endpoint
astk benchmark http://localhost:8000
```
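
For local experiments, a `/chat` endpoint can be served with nothing but the Python standard library. The request and response schema below (`{"message": ...}` in, `{"response": ...}` out, sent via POST) is an assumption made for illustration; the guide does not specify what ASTK sends, so adjust the field names to match what it actually uses.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_chat(body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response body.

    Assumed schema: {"message": "..."} in, {"response": "..."} out.
    """
    message = json.loads(body or b"{}").get("message", "")
    reply = {"response": f"Agent: you asked '{message}'"}
    return json.dumps(reply).encode("utf-8")


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the /chat path is handled; everything else is a 404.
        if self.path != "/chat":
            self.send_error(404, "Not Found")
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = handle_chat(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Serve until interrupted; call this from your own entry point."""
    HTTPServer(("localhost", port), ChatHandler).serve_forever()
```

Call `serve()` in one terminal, then run `astk benchmark http://localhost:8000` in another.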

## 📋 All Available Commands

```bash
# Initialize a new test project
astk init <project-name>

# Run benchmark tests
astk benchmark <agent-path>

# Generate detailed reports
astk report <results-directory>

# Show examples and help
astk examples

# Show version
astk --version
```

## 🔧 Troubleshooting

### ❌ "Command not found: astk"

**Problem:** Package not installed properly

**Solution:**

```bash
pip install --upgrade pip
pip install agent-sprint-testkit
```

**Still not working?** Try:

```bash
python -m pip install agent-sprint-testkit
```

### ❌ "OpenAI API key not found"

**Problem:** API key not set

**Solution:**

```bash
# Check if it's set (Mac/Linux):
echo $OPENAI_API_KEY

# Set it (Mac/Linux; see Step 2 for Windows):
export OPENAI_API_KEY="sk-your-key-here"
```

### ❌ "Agent failed to respond"

**Problem:** Your agent doesn't accept command-line arguments

**Solution:** Make sure your agent works like this:

```bash
python your-agent.py "test question"
# Should print something back
```

**Example working agent:**

```python
#!/usr/bin/env python3
import sys

if len(sys.argv) > 1:
    question = " ".join(sys.argv[1:])
    print(f"Agent: Here's my response to '{question}'")
else:
    print("Agent: Please ask me a question!")
```

### ❌ Permission errors

**Problem:** Can't install or run commands

**Solution:**

```bash
# Try with user installation:
pip install --user agent-sprint-testkit

# Add to PATH if needed:
export PATH="$PATH:$HOME/.local/bin"
```

## 🎮 Quick Examples

### 1. Basic Test Run

```bash
pip install agent-sprint-testkit
export OPENAI_API_KEY="your-key"
astk init test-project
cd test-project
astk benchmark examples/agents/file_qa_agent.py
```

### 2. Test Your Own Agent

```bash
# Create simple agent
echo '#!/usr/bin/env python3
import sys
if len(sys.argv) > 1:
    print(f"Bot: {sys.argv[1]}")' > my_bot.py

chmod +x my_bot.py

# Test it
astk benchmark my_bot.py
```

### 3. Multiple Tests

```bash
# Test different agents
astk benchmark agent1.py
astk benchmark agent2.py
astk benchmark http://localhost:8000

# Compare results
astk report benchmark_results/
```

## 📈 Improving Your Agent

Based on ASTK results, you can improve your agent:

- **Low success rate?** Make sure your agent handles different question types
- **Slow responses?** Optimize your agent's processing speed
- **Short responses?** Add more detailed explanations
- **Failed scenarios?** Test your agent with the specific question types ASTK uses

## 💡 Tips for Best Results

1. **Test regularly** - Run ASTK after every major change to your agent
2. **Check all scenarios** - Make sure your agent handles different types of questions
3. **Monitor performance** - Watch response times and success rates
4. **Use the reports** - ASTK generates detailed reports to help you improve

## 🚀 Next Steps

1. **Install ASTK**: `pip install agent-sprint-testkit`
2. **Set API key**: `export OPENAI_API_KEY="your-key"`
3. **Run first test**: `astk init test && cd test && astk benchmark examples/agents/file_qa_agent.py`
4. **Test your agent**: `astk benchmark your-agent.py`
5. **Review results** and improve your agent!

---

**🎯 Ready to test your AI agent?**

```bash
pip install agent-sprint-testkit && astk --help
```

**Need help?** Check the [main documentation](README.md) or [open an issue](https://github.com/stanhus/ASTK/issues).
