Metadata-Version: 2.4
Name: MongoDB-toolkit
Version: 0.1.0
Summary: A MongoDB toolkit for Function Calling and AI Agents
Author-email: Ateeq Azam <mr.ateeqazam@gmail.com>
License: CC BY-ND
Project-URL: Homepage, https://github.com/semwaqas/MongoDB-toolkit
Project-URL: Repository, https://github.com/semwaqas/MongoDB-toolkit
Keywords: AI Agents,Function Calling,Tool Calling,Mongodb,Toolkit,Langchain
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pymongo
Requires-Dist: langchain
Dynamic: license-file

# MongoDB Toolkit for LangChain

[![PyPI version](https://badge.fury.io/py/mongodb-toolkit.svg)](https://badge.fury.io/py/mongodb-toolkit)
[![License: CC BY-ND](https://img.shields.io/badge/License-CC%20BY--ND-lightgrey.svg)](https://creativecommons.org/licenses/)

**`mongodb-toolkit`** provides a set of Python functions and pre-built LangChain tools designed to facilitate interaction between Large Language Models (LLMs) and MongoDB databases. It allows LLMs (via function/tool calling) to intelligently discover database schemas, validate query syntax and structure, and execute queries against your MongoDB instance.

This toolkit bridges the gap between natural language requests and structured MongoDB operations.

## Features

*   **Infer Database Schema:** Automatically sample collections to generate a representative schema (`generate_db_schema`). Essential for LLMs to understand data structure before crafting queries.
*   **Validate Query Syntax:** Check if a MongoDB query filter document follows correct MongoDB syntax rules, without needing a schema (`validate_mongodb_query_syntax`). Catches basic structural errors.
*   **Validate Query Against Schema:** Validate a query filter document against a previously generated schema, checking for valid field names, data types, and operator usage (`validate_query`). Ensures queries align with expected data structure.
*   **Execute Queries:** Run validated MongoDB queries (`find` operations) against the database (`execute_mongodb_query`). *(Note: Implementation details for `execute_query.py` are assumed)*.
*   **LangChain Tool Integration:** Provides ready-to-use `langchain.tools.Tool` instances for each core function, simplifying integration with LangChain agents and chains (`toolkit.py`).
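
To make the idea of sampling-based schema inference concrete, here is a minimal, self-contained sketch. It is not the implementation of `generate_db_schema`: the helper name `infer_schema` and the output shape (dotted field path mapped to observed type names) are assumptions chosen for illustration, and it works on plain dictionaries rather than a live MongoDB connection.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def infer_schema(documents: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each dotted field path to the sorted Python type names seen for it.

    Hypothetical helper for illustration only; the toolkit's real schema
    format may differ.
    """
    seen = defaultdict(set)

    def walk(doc: Dict[str, Any], prefix: str) -> None:
        for key, value in doc.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                # Record the sub-document itself, then descend into it
                seen[path].add("dict")
                walk(value, f"{path}.")
            else:
                seen[path].add(type(value).__name__)

    for document in documents:
        walk(document, "")
    return {path: sorted(types) for path, types in seen.items()}


# Stand-in for documents sampled from a collection
sample_docs = [
    {"status": "active", "age": 34, "address": {"city": "Springfield"}},
    {"status": "inactive", "age": "unknown"},
]
schema = infer_schema(sample_docs)
print(schema)
```

A field that appears with more than one type (here `age`) shows up with several type names, which is the kind of information an LLM can use to avoid writing a comparison against the wrong type.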

## Installation

You can install the toolkit directly from PyPI (once published):

```bash
pip install mongodb-toolkit
```

Or, install directly from the source code:

```bash
git clone https://github.com/semwaqas/MongoDB-toolkit.git
cd MongoDB-toolkit
pip install .
```

**Dependencies:** This package depends on `pymongo` and `langchain`. Both are declared as requirements, so `pip` installs them along with the package. To install them manually:

```bash
pip install pymongo langchain
```

## Usage

The toolkit can be used in two main ways:

1.  **Direct Function Calls:** Import and use the functions directly in your Python code.
2.  **LangChain Tools:** Integrate the pre-built tools into your LangChain agents or chains for LLM-driven database interaction.

### 1. Direct Function Usage

```python
from mongodb_toolkit import (
    generate_db_schema,
    validate_mongodb_query_syntax,
    validate_query,
    execute_mongodb_query
)

# --- Configuration ---
MONGO_URI = "mongodb://localhost:27017/"
DB_NAME = "my_database"
COLLECTION_NAME = "my_collection"
SAMPLE_SIZE = 50 # For schema generation

# --- Example Workflow ---

# 1. Generate Schema (Optional but recommended for validation)
try:
    print(f"Generating schema for database: {DB_NAME}")
    db_schema = generate_db_schema(
        db_name=DB_NAME,
        mongo_uri=MONGO_URI,
        sample_size=SAMPLE_SIZE
        # target_collection_name=COLLECTION_NAME # Optional: limit to one collection
    )
    if db_schema:
        print("Schema generated successfully.")
        collection_schema = db_schema.get(COLLECTION_NAME, {})
    else:
        print("Failed to generate schema or DB/Collection not found.")
        collection_schema = {}

except Exception as e:
    print(f"Error generating schema: {e}")
    collection_schema = {}


# 2. Define a Query (Example)
# This might be generated by an LLM based on user input and the schema
user_query_filter = {
    "status": "active",
    "age": {"$gte": 30},
    "address.city": "Springfield"
}


# 3. Validate Query Syntax
print("\nValidating query syntax...")
syntax_errors = validate_mongodb_query_syntax(user_query_filter)
if not syntax_errors:
    print("Query syntax is valid.")
else:
    print("Query syntax errors found:")
    for err in syntax_errors:
        print(f"  - {err}")
    # Handle errors - potentially ask LLM to correct


# 4. Validate Query Against Schema (if schema available)
schema_validation_errors = []  # Stays empty when schema validation is skipped
if collection_schema and not syntax_errors:
    print("\nValidating query against schema...")
    schema_validation_errors = validate_query(user_query_filter, collection_schema)
    if not schema_validation_errors:
        print("Query is valid against the schema.")
    else:
        print("Query schema validation errors found:")
        for err in schema_validation_errors:
            print(f"  - {err}")
        # Handle errors - potentially ask LLM to correct based on schema info

# 5. Execute Query (if valid)
# Assuming execute_mongodb_query takes URI, DB, Collection, and Filter
if not syntax_errors and (not collection_schema or not schema_validation_errors):
    print("\nExecuting query...")
    try:
        results = execute_mongodb_query(
            mongo_uri=MONGO_URI,
            db_name=DB_NAME,
            collection_name=COLLECTION_NAME,
            query_filter=user_query_filter
            # Add other params like projection, limit if your function supports them
        )
        print("Query executed successfully. Results:")
        # Process or display results (limit output for LLMs)
        for doc in results: # Assuming results is an iterable like a cursor or list
            print(doc)

    except Exception as e:
        print(f"Error executing query: {e}")

```
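
The comment about limiting output for LLMs can be sketched as follows. This is one possible approach under stated assumptions, not part of the toolkit: `summarize_results`, `max_docs`, and `max_chars` are hypothetical names, and the function assumes the results are an iterable of dictionaries.

```python
import json
from typing import Any, Dict, Iterable, List


def summarize_results(
    docs: Iterable[Dict[str, Any]], max_docs: int = 5, max_chars: int = 2000
) -> str:
    """Serialize at most max_docs documents to JSON, capped at max_chars characters.

    Hypothetical helper; it consumes the whole iterable in order to count matches.
    """
    selected: List[Dict[str, Any]] = []
    total = 0
    for doc in docs:
        total += 1
        if len(selected) < max_docs:
            selected.append(doc)
    # default=str converts values JSON cannot encode (e.g. ObjectId, datetime) to strings
    text = json.dumps(selected, default=str)
    if len(text) > max_chars:
        text = text[:max_chars] + "...(truncated)"
    return f"{total} document(s) matched; showing {len(selected)}: {text}"


print(summarize_results([{"n": i} for i in range(10)], max_docs=2))
```

Counting every match means iterating the full result set, which may be slow for large collections; passing a limit to the query itself is usually preferable when the execution function supports one.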

### 2. LangChain Tool Usage

Import the tools and add them to your LangChain agent's tool list. The agent can then decide which tool to use based on the user's request or the workflow stage.

```python
from langchain_openai import ChatOpenAI  # Or your preferred chat model (separate package: pip install langchain-openai)
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

# Import the tools from the toolkit
from mongodb_toolkit import (
    get_schema_tool,
    validate_query_syntax_tool,
    validate_query_tool,
    execute_query_tool
)

# Ensure OPENAI_API_KEY is set in your environment or configure client
# A low temperature keeps tool calls and generated queries more deterministic
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define the tools the agent can use
tools = [
    get_schema_tool,
    validate_query_syntax_tool,
    validate_query_tool,
    execute_query_tool
]

# --- Agent Prompt (Example) ---
# You'll need a prompt engineered for database interaction.
# This is a simplified example. create_tool_calling_agent binds the tool
# definitions to the model, so the prompt needs no {tools} variable.
prompt_template = """
You are an assistant that interacts with a MongoDB database.

Use the tools in sequence:
1. If you don't know the database structure, use 'get_schema' first. Provide the database connection details (URI, DB name).
2. Based on the user request and the schema, generate a MongoDB query filter document.
3. Use 'validate_query_syntax' to check the basic structure of your generated query.
4. If syntax is valid and you have a schema, use 'validate_query' to check the query against the schema.
5. If a validation tool returns errors, *you must* correct the query based on the error message and try validating again (up to 3 attempts per query). Do not execute invalid queries.
6. If the query passes all validations, use 'execute_query' to run it. Provide the connection details and the validated query filter.
7. Respond to the user with the results or a summary.
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", prompt_template),
    ("human", "{input}"),
    # create_tool_calling_agent requires an agent_scratchpad messages placeholder
    ("placeholder", "{agent_scratchpad}"),
])


# --- Create Agent ---
# Using the new create_tool_calling_agent method
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# --- Run the Agent ---
# The agent will use the tool descriptions to decide when to call them.
# Tool inputs need to be structured correctly (e.g., dictionaries for queries/schemas)
# You might need helper functions or input parsing depending on your agent setup.

response = agent_executor.invoke({
    "input": "Find active users in the 'users' collection who are older than 40 in the 'testdb' database."
    # Add database connection info if not hardcoded in tools or passed differently
})

print(response['output'])

```

## API Reference

### Core Functions

*   **`generate_db_schema(db_name: str, mongo_uri: str, sample_size: int = 100, target_collection_name: Optional[str] = None) -> dict`**
    *   Connects to MongoDB, samples documents, and infers a schema structure.
    *   Returns a dictionary representing the database schema.
*   **`validate_mongodb_query_syntax(query_doc: dict) -> list`**
    *   Checks a query filter dictionary for basic MongoDB syntax validity (operators, structure).
    *   Returns a list of error strings (empty if valid).
*   **`validate_query(query_doc: dict, expected_schema: dict) -> list`**
    *   Validates a query filter dictionary against a provided schema dictionary. Checks field names, types, operator usage relative to the schema.
    *   Returns a list of error strings (empty if valid).
*   **`execute_mongodb_query(mongo_uri: str, db_name: str, collection_name: str, query_filter: dict, **kwargs) -> List[dict]`**
    *   Executes a MongoDB `find` query. *(Signature is assumed - adjust based on your implementation)*.
    *   Returns a list of documents found.
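
To illustrate the "list of error strings" contract shared by the two validation functions, here is a deliberately small sketch of a syntax check. It is not the code behind `validate_mongodb_query_syntax`: the function name `check_filter_syntax`, the operator sets, and the error wording are all assumptions, and only a handful of operators are covered.

```python
from typing import Any, List

# A small, intentionally incomplete set of operators for illustration
LOGICAL_OPERATORS = {"$and", "$or", "$nor"}
FIELD_OPERATORS = {
    "$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin", "$exists", "$regex",
}


def check_filter_syntax(query: Any, path: str = "") -> List[str]:
    """Return a list of error strings; an empty list means no problems were found."""
    if not isinstance(query, dict):
        return [f"{path or 'query'}: expected a dict, got {type(query).__name__}"]
    errors: List[str] = []
    for key, value in query.items():
        location = f"{path}.{key}" if path else key
        if key in LOGICAL_OPERATORS:
            # Logical operators take a non-empty list of sub-filters
            if not isinstance(value, list) or not value:
                errors.append(f"{location}: expects a non-empty list of filters")
            else:
                for index, clause in enumerate(value):
                    errors.extend(check_filter_syntax(clause, f"{location}[{index}]"))
        elif key.startswith("$"):
            errors.append(f"{location}: unknown top-level operator")
        elif isinstance(value, dict):
            # Field-level operator document, e.g. {"age": {"$gte": 30}}
            for op, operand in value.items():
                if not op.startswith("$"):
                    continue
                if op not in FIELD_OPERATORS:
                    errors.append(f"{location}.{op}: unknown operator")
                elif op in {"$in", "$nin"} and not isinstance(operand, list):
                    errors.append(f"{location}.{op}: expects a list")
    return errors


print(check_filter_syntax({"status": "active", "age": {"$greater": 30}}))
```

Returning messages that name the offending path and operator gives an LLM something specific to correct, which is the reason a list of strings is a convenient return type here.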

### LangChain Tools

These wrap the core functions for use with LangChain agents.

*   **`get_schema_tool: Tool`**
    *   **Description:** Generate a schema for a MongoDB database. Input should contain connection details. Output is the schema.
    *   **Function:** `generate_db_schema`
*   **`validate_query_syntax_tool: Tool`**
    *   **Description:** Validate the syntax of a MongoDB query filter document. Input is the query document. Output indicates validity or lists errors. Suggests retrying corrections up to 3 times on error.
    *   **Function:** `validate_mongodb_query_syntax`
*   **`validate_query_tool: Tool`**
    *   **Description:** Validate a MongoDB query filter document against a schema. Input includes the query document and the schema. Output indicates validity or lists errors. Suggests retrying corrections up to 3 times on error.
    *   **Function:** `validate_query`
*   **`execute_query_tool: Tool`**
    *   **Description:** Execute a MongoDB query. Input includes connection details and the query filter document. Output is the query result.
    *   **Function:** `execute_mongodb_query`
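
The "retry corrections up to 3 times" guidance in the validation tool descriptions can be pictured as a bounded loop. The sketch below is an assumption about how a caller might enforce that limit outside of an agent; `validate_with_retries` and the stub functions are hypothetical, and in practice the correction step would be an LLM call rather than a hard-coded fix.

```python
from typing import Callable, Dict, List, Tuple

Validator = Callable[[Dict], List[str]]
Corrector = Callable[[Dict, List[str]], Dict]


def validate_with_retries(
    query: Dict, validate: Validator, correct: Corrector, max_attempts: int = 3
) -> Tuple[Dict, List[str]]:
    """Validate `query`, asking `correct` for a fix after each failed attempt.

    Returns the last query tried and its remaining errors (empty on success).
    """
    errors = validate(query)
    attempts = 1
    while errors and attempts < max_attempts:
        query = correct(query, errors)
        errors = validate(query)
        attempts += 1
    return query, errors


# Stubs standing in for a real validator and an LLM-driven correction step
def stub_validate(query: Dict) -> List[str]:
    age = query.get("age", {})
    if isinstance(age, dict) and "$greater" in age:
        return ["age.$greater: unknown operator"]
    return []


def stub_correct(query: Dict, errors: List[str]) -> Dict:
    fixed = dict(query)
    fixed["age"] = {"$gt": query["age"]["$greater"]}
    return fixed


final_query, remaining = validate_with_retries(
    {"age": {"$greater": 40}}, stub_validate, stub_correct
)
print(final_query, remaining)
```

If `remaining` is still non-empty after the last attempt, the query should not be executed.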

## Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs, feature requests, or improvements.

1.  Fork the repository.
2.  Create your feature branch (`git checkout -b feature/AmazingFeature`).
3.  Commit your changes (`git commit -m 'Add some AmazingFeature'`).
4.  Push to the branch (`git push origin feature/AmazingFeature`).
5.  Open a Pull Request.

## License

Distributed under the Creative Commons Attribution-NoDerivatives (CC BY-ND) License. See `LICENSE` file for more information.
