Metadata-Version: 2.4
Name: askpandas
Version: 0.1.1
Summary: AI-powered data engineering and analytics assistant for querying CSV data using natural language—locally and intelligently
Home-page: https://github.com/irfanalidv/AskPandas
Author: Md Irfan Ali
Author-email: irfanali29@hotmail.com
Project-URL: Bug Reports, https://github.com/irfanalidv/AskPandas/issues
Project-URL: Source, https://github.com/irfanalidv/AskPandas
Project-URL: Documentation, https://github.com/irfanalidv/AskPandas#readme
Keywords: data-analysis,pandas,ai,natural-language,csv,data-science,machine-learning,llm,ollama,huggingface
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: numpy>=1.23.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: seaborn>=0.12.0
Requires-Dist: requests>=2.28.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: torch>=2.0.0
Requires-Dist: faker>=18.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: full
Requires-Dist: jupyter>=1.0.0; extra == "full"
Requires-Dist: ipywidgets>=8.0.0; extra == "full"
Requires-Dist: plotly>=5.0.0; extra == "full"
Requires-Dist: bokeh>=3.0.0; extra == "full"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# AskPandas: AI-Powered Data Engineering & Analytics Assistant

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://badge.fury.io/py/askpandas.svg)](https://badge.fury.io/py/askpandas)
[![Downloads](https://static.pepy.tech/badge/askpandas)](https://pepy.tech/project/askpandas)
[![GitHub stars](https://img.shields.io/github/stars/irfanalidv/AskPandas?style=social)](https://github.com/irfanalidv/AskPandas)
[![GitHub forks](https://img.shields.io/github/forks/irfanalidv/AskPandas?style=social)](https://github.com/irfanalidv/AskPandas)
[![GitHub issues](https://img.shields.io/github/issues/irfanalidv/AskPandas)](https://github.com/irfanalidv/AskPandas/issues)

AskPandas is an open-source Python library that lets you query and transform CSV data using natural language, powered by free, local open-source LLMs via Ollama. **No API keys, no cloud, no cost.**

## 🚀 **Quick Start (5 minutes!)**

### 1. **Install AskPandas**

```bash
pip install askpandas
```

### 2. **Install Ollama (one command)**

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: Download from https://ollama.com/download
```

### 3. **Pull a lightweight model**

```bash
ollama pull phi3:mini    # Very small, very fast
```

### 4. **Start Ollama**

```bash
# Runs the local server in the foreground; keep it running in a separate terminal
ollama serve
```

### 5. **Start analyzing data!**

```python
import askpandas as ap

# Set up AI
llm = ap.OllamaLLM(model_name="phi3:mini")
ap.set_llm(llm)

# Load your data
df = ap.DataFrame("your_data.csv")

# Ask questions in plain English!
result = df.chat("What is the total revenue?")
print(result)
```

### 🚀 **See It In Action!**

```python
import askpandas as ap
import pandas as pd

# Create sample data
data = {
    'product': ['Apple', 'Banana', 'Orange'],
    'price': [2.50, 1.00, 1.50],
    'quantity': [100, 200, 150]
}
df = pd.DataFrame(data)
df['revenue'] = df['price'] * df['quantity']

# Create AskPandas DataFrame
sales_df = ap.DataFrame(df)

# AI-powered analysis (needs an LLM configured with ap.set_llm, as in the Quick Start)
result = sales_df.chat("What is the total revenue?")
# Output: Total Revenue: $675.00

# More complex queries
result = sales_df.chat("Show me the top 3 products by revenue")
# Output: Product analysis with rankings

result = sales_df.chat("Calculate the average price across all products")
# Output: Average Price: $1.67
```
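
The figures in the comments above can be cross-checked without an LLM. The sketch below recomputes them with plain pandas on the same sample data. It is not the code AskPandas generates internally, and the formatting of real `chat()` output may differ.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "product": ["Apple", "Banana", "Orange"],
    "price": [2.50, 1.00, 1.50],
    "quantity": [100, 200, 150],
})
df["revenue"] = df["price"] * df["quantity"]

total_revenue = df["revenue"].sum()   # 250 + 200 + 225
average_price = df["price"].mean()    # (2.50 + 1.00 + 1.50) / 3
top_products = df.sort_values("revenue", ascending=False)["product"].tolist()

print(f"Total Revenue: ${total_revenue:.2f}")  # Total Revenue: $675.00
print(f"Average Price: ${average_price:.2f}")  # Average Price: $1.67
print(top_products)                            # ['Apple', 'Orange', 'Banana']
```

Comparing an LLM answer against a direct computation like this is a cheap sanity check when trying out a new model.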

## 🎯 **What Can You Do? (Everything!)**

### 📊 **Data Analysis - Just Ask!**

```python
# Basic questions
df.chat("What is the average price?")
# Output: Average Price: $1.67

df.chat("Show me the top 5 customers by revenue")
# Output: Customer rankings with revenue amounts

df.chat("How many sales were made in each region?")
# Output: Regional sales breakdown

# Complex analysis
df.chat("""
    Analyze our sales performance:
    1. Calculate total revenue by month
    2. Show the trend over time
    3. Identify the best performing products
    4. Create a visualization
""")
# Output: Comprehensive analysis with insights
```

### 🎨 **Beautiful Visualizations - Automatically!**

```python
# Charts are created automatically
df.chat("Create a bar chart of sales by region")
df.chat("Plot revenue trends over time")
df.chat("Show correlation between price and quantity")
df.chat("Display distribution of customer ages")
```

### 🔍 **Data Quality & Cleaning**

```python
# Automatic data assessment
df.chat("Check for missing values and duplicates")
df.chat("Identify outliers in numeric columns")
df.chat("Clean column names and standardize formats")
df.chat("Validate data types and suggest improvements")
```
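
For reference, checks like the ones requested above map onto a handful of standard pandas calls. This is a minimal sketch on a small made-up frame (the data and column names are illustrative assumptions), not the code AskPandas produces.

```python
import pandas as pd

# Illustrative data: one missing value and one fully duplicated row
raw = pd.DataFrame({
    "Customer Name": ["Alice", "Bob", "Bob", None],
    "Order Total": [120.0, 80.0, 80.0, 45.5],
})

# Count missing values per column (converted to plain ints for printing)
missing_per_column = {col: int(n) for col, n in raw.isna().sum().items()}

# Count rows that are identical to an earlier row
duplicate_rows = int(raw.duplicated().sum())

# Standardize column names: trim, lowercase, spaces to underscores
cleaned = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
cleaned = cleaned.drop_duplicates().dropna()

print(missing_per_column)     # {'Customer Name': 1, 'Order Total': 0}
print(duplicate_rows)         # 1
print(list(cleaned.columns))  # ['customer_name', 'order_total']
```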

### 🌐 **Multi-Dataset Analysis**

```python
# Work with multiple files
customers = ap.DataFrame("customers.csv")
orders = ap.DataFrame("orders.csv")
products = ap.DataFrame("products.csv")

# Cross-dataset insights
ap.chat("""
    Customer analysis:
    1. Join customers with their orders
    2. Calculate lifetime value by segment
    3. Show purchase patterns
    4. Identify high-value customers
""", customers, orders, products)
```

## 💡 **Real-World Examples**

### 📈 **Sales Analysis**

```python
import askpandas as ap

# Load sales data
sales = ap.DataFrame("sales_data.csv")

# Comprehensive sales report
sales.chat("What is our total revenue?")
# Output: Total Revenue: $78,586.11

sales.chat("Show me the top 3 products by revenue")
# Output: Product rankings with revenue amounts

sales.chat("Calculate average order value by region")
# Output: Regional performance metrics

sales.chat("How many sales were made in each region?")
# Output: Regional sales breakdown
```

### 👥 **Customer Analytics**

```python
# Customer behavior analysis
customers = ap.DataFrame("customers.csv")
transactions = ap.DataFrame("transactions.csv")

ap.chat("""
    Customer behavior insights:
    1. Customer lifetime value analysis
    2. Purchase frequency patterns
    3. Churn prediction factors
    4. Customer satisfaction metrics
    5. Personalized marketing recommendations
""", customers, transactions)
```

### 📊 **Financial Analysis**

```python
# Financial data processing
financial = ap.DataFrame("financial_data.csv")

financial.chat("""
    Financial performance review:
    1. Profit and loss analysis
    2. Cash flow trends
    3. Expense categorization
    4. Budget vs actual comparison
    5. Financial ratios and KPIs
    6. Risk assessment and recommendations
""")
```

### 🔬 **Research & Academic**

```python
# Research data analysis
research = ap.DataFrame("research_data.csv")

research.chat("""
    Statistical analysis:
    1. Descriptive statistics for all variables
    2. Correlation analysis between key factors
    3. Hypothesis testing results
    4. Outlier detection and treatment
    5. Data distribution visualizations
    6. Statistical significance testing
""")
```

## 🛠️ **Advanced Features**

### 🔧 **Custom Configuration**

```python
import askpandas as ap

# Set your preferences
ap.set_config(
    verbose=True,                    # See what's happening
    plot_style="seaborn",           # Beautiful charts
    output_dir="my_analysis",       # Save results here
    max_execution_time=120,         # Allow longer analysis
    enable_history=True             # Track all queries
)
```

### 🎨 **Custom Visualizations**

```python
# Create custom charts
from askpandas.visualization.charts import create_bar_chart, save_plot

# Custom bar chart
fig = create_bar_chart(
    df.df,
    x_col="category",
    y_col="value",
    title="My Custom Chart",
    figsize=(12, 8)
)

# Save with high quality
save_plot(fig, "custom_chart.png", dpi=300)
```

### 🔍 **Query Intelligence**

```python
# Get help with your queries
query = "Show me sales trends"
analysis = ap.analyze_query(query)
print(f"Query type: {analysis['primary_category']}")

# Get suggestions
suggestions = ap.get_query_examples('visualization')
print("Try these:", suggestions[:3])

# Validate your query
validation = ap.validate_query(query, df.columns())
if validation['is_valid']:
    print("✅ Query is valid!")
```

## 🚀 **Performance Tips for Best Results**

### 💪 **Optimize Your Queries**

```python
# ✅ Good - Specific and clear
df.chat("Calculate total revenue by month for 2024, excluding returns")

# ❌ Avoid - Too vague
df.chat("Analyze this data")

# ✅ Good - Step-by-step analysis
df.chat("""
    1. Filter data for Q4 2024
    2. Group by product category
    3. Calculate sum of revenue
    4. Sort by revenue descending
    5. Show top 10 results
""")

# ✅ Good - Include context
df.chat("Show customer retention rate, considering customers who made purchases in both 2023 and 2024")
```
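
To make the step-by-step prompt above concrete, here is roughly what those five steps look like in plain pandas. The column names (`order_date`, `category`, `revenue`) and the sample rows are assumptions for illustration; AskPandas may generate different code for the same prompt.

```python
import pandas as pd

# Illustrative data; column names are assumptions, not an AskPandas schema
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-09-30", "2024-10-05", "2024-11-12", "2024-12-24", "2024-12-31"]
    ),
    "category": ["Toys", "Toys", "Books", "Books", "Games"],
    "revenue": [999.0, 100.0, 250.0, 50.0, 175.0],
})

# 1. Filter data for Q4 2024
is_q4_2024 = (orders["order_date"].dt.year == 2024) & (orders["order_date"].dt.quarter == 4)
q4 = orders[is_q4_2024]

# 2-3. Group by product category and sum revenue
by_category = q4.groupby("category")["revenue"].sum()

# 4-5. Sort by revenue descending and keep the top 10
top10 = by_category.sort_values(ascending=False).head(10)

print(top10)
```

Writing the prompt as numbered steps gives the model one small operation per line, which is easier to translate correctly than a single open-ended request.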

### 🎯 **Choose the Right Model**

```python
# For speed and basic analysis
llm = ap.OllamaLLM(model_name="phi3:mini")      # Fastest

# For better quality and complex queries
llm = ap.OllamaLLM(model_name="mistral:7b")     # Balanced

# For a larger model (slower, needs more memory)
llm = ap.OllamaLLM(model_name="llama2:13b")     # Largest of the three
```

### 📊 **Data Preparation Tips**

```python
# Clean your data first
df = ap.DataFrame("messy_data.csv")

# Ask AskPandas to help clean it
df.chat("""
    Help me clean this data:
    1. Identify and handle missing values
    2. Remove duplicates
    3. Fix data type issues
    4. Standardize column names
    5. Show me what was cleaned
""")

# Then analyze the clean data
df.chat("Now analyze the cleaned data for insights")
```

## 🔧 **Installation Options**

### **Basic Installation**

```bash
pip install askpandas
```

### **Full Installation (Recommended)**

```bash
pip install "askpandas[full]"
```

### **Development Installation**

```bash
git clone https://github.com/irfanalidv/AskPandas
cd AskPandas
pip install -e ".[dev]"
```

## 📱 **Platform Support**

- ✅ **macOS** - Native support with Apple Silicon optimization
- ✅ **Linux** - Works on distributions where Python 3.8+ and Ollama are available
- ✅ **Windows** - Supported; WSL2 recommended
- ✅ **Cloud** - Works on Google Colab, AWS, Azure, etc.

## 🆘 **Troubleshooting**

### **Common Issues & Solutions**

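**"No LLM configured" error?**

Make sure `ap.set_llm(...)` has been called before the first `chat()` (see Quick Start, step 5), then confirm that Ollama is running and the model is downloaded: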

```bash
# Make sure Ollama is running
ollama serve

# Check if model is downloaded
ollama list
```
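
The same check can be done from Python before calling `chat()`. The sketch below queries Ollama's local HTTP API (`/api/tags` on the default port 11434) using only the standard library. The helper name `list_ollama_models` is hypothetical and not part of AskPandas.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the names of locally available models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
    return [m.get("name") for m in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable; start it with `ollama serve`.")
elif "phi3:mini" not in models:
    print("Ollama is running, but phi3:mini is missing; run `ollama pull phi3:mini`.")
else:
    print("Ollama is running and phi3:mini is available.")
```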

**Slow responses?**

```bash
# Try a smaller model
ollama pull phi3:mini

# Close other applications to free memory
```

**Installation issues?**

```bash
# Update pip
pip install --upgrade pip

# Install with specific Python version
python3.9 -m pip install askpandas
```
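
If it is unclear which interpreter `pip` installed into, a short standard-library check can help. This sketch reports the running Python version, its executable path, and the installed `askpandas` version if one is found; `check_environment` is a hypothetical helper, not part of the package.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)  # matches the "Python 3.8+" requirement stated in this README


def check_environment(package="askpandas"):
    """Report the interpreter version and the installed package version, if any."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # package is not installed for this interpreter
    return {
        "python": sys.version_info[:3],
        "python_ok": sys.version_info >= MIN_PYTHON,
        "executable": sys.executable,
        "package_version": installed,
    }


print(check_environment())
```

A `package_version` of `None` usually means the package was installed for a different interpreter than the one running the script.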

## 📚 **Learning Resources**

### **Interactive Examples**

```bash
# Run the interactive demo
python simple_demo.py

# Try the configuration setup
python simple_config.py
```

### **Sample Datasets**

- `fake_sample.csv` - Small sample for testing
- `comprehensive_sample.csv` - Larger dataset for practice
- Create your own CSV files and start analyzing!

## 🎉 **Success Stories**

### **Data Scientists**

> "AskPandas reduced my data exploration time from hours to minutes. I can now focus on insights instead of coding."

### **Business Analysts**

> "I can analyze complex datasets without learning Python syntax. Natural language queries are a game-changer!"

### **Researchers**

> "Perfect for exploratory data analysis. I can quickly test hypotheses and generate visualizations for papers."

### **Students**

> "Learning data analysis has never been easier. AskPandas makes complex concepts accessible."

## 🚀 **What's Next?**

### **Version 0.2.0 (Coming Soon)**

- [ ] Jupyter notebook integration
- [ ] More visualization options (Plotly, Bokeh)
- [ ] SQL query generation
- [ ] Data pipeline automation

### **Version 1.0.0 (Future)**

- [ ] Enterprise features
- [ ] Advanced ML integration
- [ ] Real-time data streaming
- [ ] Community plugins

## 🤝 **Get Help & Contribute**

- **📖 Documentation**: [GitHub Wiki](https://github.com/irfanalidv/AskPandas/wiki)
- **🐛 Bug Reports**: [GitHub Issues](https://github.com/irfanalidv/AskPandas/issues)
- **💬 Discussions**: [GitHub Discussions](https://github.com/irfanalidv/AskPandas/discussions)
- **⭐ Star**: [GitHub Repository](https://github.com/irfanalidv/AskPandas)

## 📄 **License**

MIT License - Use freely for personal and commercial projects!

## 🙏 **Acknowledgments**

- **Ollama Team** - Making local AI accessible
- **HuggingFace** - Open-source AI models
- **Pandas Community** - Amazing data tools
- **Open Source Contributors** - Building the future together

## 🎯 **Complete Working Demonstration**

Want to see everything in action? Run our comprehensive demo:

```bash
# Clone the repository
git clone https://github.com/irfanalidv/AskPandas.git
cd AskPandas

# Run the complete demonstration
python final_working_demo.py
```

This demo showcases:

- ✅ **DataFrame Creation & Analysis** - Real data processing
- ✅ **Data Quality & Cleaning** - Automatic column standardization
- ✅ **AI-Powered Queries** - Natural language analysis
- ✅ **Multi-Dataset Analysis** - Joining and complex queries
- ✅ **Configuration Management** - Customizable settings
- ✅ **Query Intelligence** - Automatic query categorization

### **🚀 Quick Test**

```python
import askpandas as ap
import pandas as pd

# Create test data
data = {'name': ['Alice', 'Bob'], 'age': [25, 30], 'salary': [50000, 60000]}
df = pd.DataFrame(data)
ap_df = ap.DataFrame(df)

# Test basic methods
print(f"Shape: {ap_df.shape()}")  # Output: Shape: (2, 3)
print(f"Columns: {ap_df.columns()}")  # Output: Columns: ['name', 'age', 'salary']

# Get comprehensive info
print(ap_df.info())  # Output: Detailed DataFrame information

# Statistical description
print(ap_df.describe())  # Output: Statistical summary
```

---

**🚀 Ready to transform your data analysis? Install AskPandas today!**

```bash
pip install askpandas
```

**Made with ❤️ by Md Irfan Ali**
