Metadata-Version: 2.4
Name: core-plainid
Version: 2.0.0
Summary: Core library for integrating PlainID authorization into AI applications
Author: PlainID
License-Expression: MIT
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: <3.14,>=3.10
Requires-Dist: authlib<2.0.0,>=1.6.0
Requires-Dist: httpx<1.0.0,>=0.27.0
Requires-Dist: pydantic<3.0.0,>=2.12.5
Provides-Extra: all
Requires-Dist: azure-health-deidentification<2.0.0,>=1.0.0; extra == 'all'
Requires-Dist: azure-identity<2.0.0,>=1.25.2; extra == 'all'
Requires-Dist: litellm<2.0.0,>=1.81.16; extra == 'all'
Requires-Dist: presidio-analyzer<3.0.0,>=2.2.361; extra == 'all'
Requires-Dist: presidio-anonymizer<3.0.0,>=2.2.361; extra == 'all'
Requires-Dist: transformers<6.0.0,>=5.2.0; extra == 'all'
Provides-Extra: anonymization
Requires-Dist: presidio-analyzer<3.0.0,>=2.2.361; extra == 'anonymization'
Requires-Dist: presidio-anonymizer<3.0.0,>=2.2.361; extra == 'anonymization'
Provides-Extra: anonymization-ahds
Requires-Dist: azure-health-deidentification<2.0.0,>=1.0.0; extra == 'anonymization-ahds'
Requires-Dist: azure-identity<2.0.0,>=1.25.2; extra == 'anonymization-ahds'
Requires-Dist: presidio-analyzer<3.0.0,>=2.2.361; extra == 'anonymization-ahds'
Requires-Dist: presidio-anonymizer<3.0.0,>=2.2.361; extra == 'anonymization-ahds'
Provides-Extra: categorization-llm
Requires-Dist: litellm<2.0.0,>=1.81.16; extra == 'categorization-llm'
Provides-Extra: categorization-zeroshot
Requires-Dist: transformers<6.0.0,>=5.2.0; extra == 'categorization-zeroshot'
Description-Content-Type: text/markdown

# core-plainid

Core library for integrating [PlainID](https://www.plainid.com/) authorization into your AI applications. Provides text anonymization, prompt categorization, SQL query authorization, and low-level PlainID API clients.

All components support both **synchronous** and **asynchronous** execution. Most examples below use the async API. For synchronous usage, drop `await` and call the sync counterpart (e.g. `aanonymize` → `anonymize`, `acategorize` → `categorize`, `aget_permissions` → `get_permissions`).
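
A top-level `await`, as used in the async snippets of this README, works inside a coroutine or an async-aware REPL but not in a plain script. Below is a minimal sketch of driving such a call with `asyncio.run`. `fetch_permissions` is a hypothetical stand-in for any async method of the library (for example `aget_permissions`) and is not part of it.

```python
import asyncio


async def fetch_permissions() -> list[str]:
    """Hypothetical stand-in for an async library call such as `aget_permissions`."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return ["contract", "finance"]


async def main() -> list[str]:
    # Inside a coroutine, `await` can be used as in the README examples.
    return await fetch_permissions()


def run_example() -> list[str]:
    # asyncio.run creates an event loop, runs the coroutine, then closes the loop.
    return asyncio.run(main())
```

In a script you would typically call `asyncio.run(main())` once, at the entry point, and keep everything else inside coroutines.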

## Installation

```bash
pip install core-plainid
```

The base install includes the authorization clients, the permissions provider, retrieval-related classes, and general utility functions. Additional features require optional extras:

```bash
pip install core-plainid[categorization-llm]        # LLM-based categorization via LiteLLM
pip install core-plainid[categorization-zeroshot]   # Zero-shot classification via Hugging Face
pip install core-plainid[anonymization]             # Presidio-based PII anonymization
pip install core-plainid[anonymization-ahds]        # Anonymization + Azure Health De-identification
pip install core-plainid[all]                       # Everything
```

## Setup with PlainID

Once you have installed the library, you can set up PlainID access.

1. Retrieve your PlainID credentials to access the platform — **client ID** and **client secret**.
2. Find your PlainID base URL. For the production platform you can use `https://platform-product.us1.plainid.io`. Note the URL starts with `platform-product`.

These three values (client ID, client secret, and base URL) are the parameters you need to use the library.
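
One way to keep these values out of your code is to read them from environment variables at startup. The sketch below is only an illustration: the variable names (`PLAINID_BASE_URL`, `PLAINID_CLIENT_ID`, `PLAINID_CLIENT_SECRET`) and the `PlainIDSettings` class are assumptions made for this example, not names defined by the library.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Assumed variable names, chosen for this example only.
ENV_NAMES = ("PLAINID_BASE_URL", "PLAINID_CLIENT_ID", "PLAINID_CLIENT_SECRET")


@dataclass(frozen=True)
class PlainIDSettings:
    base_url: str
    client_id: str
    client_secret: str = field(repr=False)  # keep the secret out of logs and reprs


def load_settings(env: Optional[Mapping[str, str]] = None) -> PlainIDSettings:
    """Read the three connection parameters, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return PlainIDSettings(*(env[name] for name in ENV_NAMES))
```

The resulting fields can then be passed as `base_url`, `client_id`, and `client_secret` to the components described in the following sections.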

> **Note:** Never share your credentials or store them in your code. Use environment variables or a secret management tool to store them securely.

## Authentication

All components support three authentication modes. At least one must be available at request time.

| Mode | When to use |
|---|---|
| **Client credentials** | Provide `client_id` and `client_secret` at construction time. |
| **JWT token** | Provide only `client_id` at construction time and pass an `auth_token` per request via `RequestContext`. The `Bearer` prefix is added automatically if missing. |
| **IDP provider** | Provide only `client_id` at construction time along with an `IdpAuthProvider` for automatic OAuth2 token management. The provider fetches and caches tokens from your Identity Provider. |

When multiple modes are configured, the priority is: **auth_token** (per request) > **IDP provider** > **client_secret**. A per-request `auth_token` always takes precedence. If no `auth_token` is provided, the IDP provider is used to fetch a token. Only when neither is available does the client fall back to `client_secret`.
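
The priority order can be pictured as a short decision function. This is an illustrative sketch, not the library's implementation: `select_auth`, `ensure_bearer`, and the `idp_token_source` callable are hypothetical names, and the case-sensitive `Bearer ` check is an assumption.

```python
from typing import Callable, Optional


def ensure_bearer(token: str) -> str:
    """Add the `Bearer ` prefix when it is missing (case-sensitive check assumed)."""
    return token if token.startswith("Bearer ") else f"Bearer {token}"


def select_auth(
    auth_token: Optional[str] = None,
    idp_token_source: Optional[Callable[[], str]] = None,
    client_secret: Optional[str] = None,
) -> tuple[str, str]:
    """Return (mode, credential) following the documented priority order."""
    if auth_token:
        # 1. A per-request token always wins.
        return "auth_token", ensure_bearer(auth_token)
    if idp_token_source is not None:
        # 2. Otherwise ask the IDP provider for a token.
        return "idp_provider", idp_token_source()
    if client_secret:
        # 3. Fall back to the client secret only when nothing else is available.
        return "client_secret", client_secret
    raise ValueError("No authentication mode is available for this request")
```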

### Using an IDP Provider

The `IdpAuthProvider` obtains and caches OAuth2 access tokens from an external Identity Provider. Tokens are fetched lazily on first use and automatically refreshed when they expire.

```python
from core_plainid.utils.idp_auth_provider import IdpAuthProvider

idp_provider = IdpAuthProvider(
    token_url="https://your-idp.com/oauth/token",
    client_id="your_idp_client_id",
    client_secret="your_idp_client_secret",
    audience="https://your-api-audience.com",  # optional, depends on your IDP
    resource="my-resource",  # any additional custom fields are passed in the token request body
)
```

Pass the provider to any component that accepts `idp_auth_provider`:

```python
permissions_provider = PlainIDPermissionsProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    idp_auth_provider=idp_provider,
)
```

When `idp_auth_provider` is set and no `auth_token` is provided per request, the provider automatically fetches a token from the IDP and uses it for authentication.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `token_url` | `str` | Yes | The IDP's token endpoint URL |
| `client_id` | `str` | Yes | OAuth2 client ID for the IDP |
| `client_secret` | `str` | Yes | OAuth2 client secret for the IDP |
| `grant_type` | `str` | No | OAuth2 grant type (default `"client_credentials"`) |
| `audience` | `str` | No | Audience parameter included in the token request |
| `**kwargs` | | No | Additional fields to include in every token request body |
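
The "fetch lazily, then refresh on expiry" behaviour described in this section follows a common caching pattern. The sketch below shows that pattern in isolation. `CachedToken`, its `fetch` callable, and the 30-second `leeway` are assumptions for illustration and do not describe how `IdpAuthProvider` is implemented internally.

```python
import time
from typing import Callable, Optional


class CachedToken:
    """Fetch a token on first use and reuse it until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        leeway: float = 30.0,
    ) -> None:
        self._fetch = fetch    # returns (access_token, lifetime_in_seconds)
        self._clock = clock    # injectable so the cache can be tested without sleeping
        self._leeway = leeway  # refresh this many seconds before the actual expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._leeway:
            # Nothing cached yet, or the cached token is about to expire: fetch again.
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

No request is made when the object is constructed; the first `get()` call triggers the fetch, and later calls reuse the cached value until the refresh window is reached.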

## Identity Context

PlainID supports two identity resolution modes:

- **Explicit identity** — via `entity_id`, `entity_type_id`, and optionally `additional_identities` in the request body.
- **Header-based identity** — via the `headers` field, which forwards the headers directly to the PlainID API where they are matched against configured values to resolve the relevant entities. In this mode, `entity_id`, `entity_type_id`, and `additional_identities` are not required.

```python
request_context = RequestContext(
    auth_token="your_jwt_token",
    headers={
        "x-custom-header": "value_to_match",
        "x-another-header": "value_to_match",
    },
)
```

### Multiple Identities

PlainID supports sending multiple identities in a single request. Each identity is evaluated independently, and the final decision is based on the combined evaluation (logical AND).

> **Note:** The PlainID runtime currently supports a maximum of 3 identities of 3 different types per request.

## Permissions Provider

The `PlainIDPermissionsProvider` connects to PlainID and retrieves the permissions assigned to a given identity. It is used internally by the anonymizer and categorizer, but can also be used directly to retrieve categories, entities, and tools permissions.

The provider supports **multiple identities** through the `additional_identities` field in `RequestContext`. This is designed for agentic scenarios where a primary identity (e.g. a User) and an additional agentic identity (e.g. an AI Agent) are both required when resolving permissions.

```python
from core_plainid.utils.plainid_permissions_provider import PlainIDPermissionsProvider
from core_plainid.models.context.request_context import AdditionalIdentity, RequestContext

permissions_provider = PlainIDPermissionsProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

request_context = RequestContext(
    entity_id="your_entity_id",
    entity_type_id="your_entity_type",
    additional_identities=[
        AdditionalIdentity(
            entity_id="your_additional_entity_id",
            entity_type_id="your_additional_entity_type",
        ),
    ],
)

permissions = await permissions_provider.aget_permissions(request_context)

print(permissions.categories)  # allowed category names
print(permissions.entities)    # anonymization entity actions
print(permissions.tools)       # allowed tool names
```

You can also provide `request_context` at construction time to eagerly load permissions once and reuse them across multiple calls without passing the context each time:

```python
permissions_provider = PlainIDPermissionsProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
    request_context=RequestContext(
        entity_id="your_entity_id",
        entity_type_id="your_entity_type",
    ),
)

permissions = permissions_provider.get_permissions()
```

## Category Filtering

The `Categorizer` classifies user prompts into categories and verifies that the classified categories are allowed by PlainID policies. If the prompt's categories are not permitted, a `PlainIDCategorizerException` is raised. Multiple identities are supported through `RequestContext`.

### PlainID Setup

To use category filtering, configure a ruleset in PlainID using the `Prompt_Control` template and set up the available categories as assets (e.g. `contract`, `HR`, `finance`):

```
# METADATA
# custom:
#   plainid:
#     kind: Ruleset
#     name: All
ruleset(asset, identity, requestParams, action) if {
    asset.template == "Prompt_Control"
}
```

### Usage

```python
from core_plainid.categorization.categorizer import Categorizer
from core_plainid.utils.plainid_permissions_provider import PlainIDPermissionsProvider
from core_plainid.models.context.request_context import RequestContext

permissions_provider = PlainIDPermissionsProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

categorizer = Categorizer(
    classifier_provider=classifier,  # see Category Classifiers below
    permissions_provider=permissions_provider,
    all_categories=["contract", "HR", "finance"],
)

request_context = RequestContext(
    entity_id="your_entity_id",
    entity_type_id="your_entity_type",
)

result = await categorizer.acategorize(
    "I'd like to know the weather forecast for today",
    request_context=request_context,
)
```

The `all_categories` parameter specifies the full list of possible categories for classification. The categorizer classifies the prompt using the provided classifier, then verifies the result against the categories allowed by PlainID. If the classified categories are not a subset of the allowed categories, a `PlainIDCategorizerException` is raised.
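
The subset check itself is simple set arithmetic. The following sketch illustrates it with plain Python types. `check_categories` is a hypothetical helper, and `PermissionError` stands in for `PlainIDCategorizerException` so that the example runs without the library installed. How the library treats a prompt that matches no category is not stated here; in this sketch an empty classification passes, because the empty set is a subset of any set.

```python
def check_categories(classified: list[str], allowed: list[str]) -> list[str]:
    """Return `classified` unchanged if every category is allowed, else raise."""
    blocked = sorted(set(classified) - set(allowed))
    if blocked:
        # Stand-in for the library's PlainIDCategorizerException.
        raise PermissionError(f"Categories not permitted: {', '.join(blocked)}")
    return classified


# A prompt classified as "contract" passes when "contract" is allowed.
check_categories(["contract"], allowed=["contract", "HR"])
```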

### Category Classifiers

Two built-in classifiers are available:

#### LLMCategoryClassifierProvider

> **Requires:** `pip install core-plainid[categorization-llm]`

Uses an LLM to classify prompts. Model calls are powered by [LiteLLM](https://docs.litellm.ai/docs/providers), which supports 100+ LLM providers (OpenAI, Anthropic, Azure, Bedrock, Ollama, etc.) through a unified interface.

Set the appropriate environment variables for your chosen provider:

```bash
# OpenAI
export OPENAI_API_KEY="your_openai_api_key"

# Anthropic
export ANTHROPIC_API_KEY="your_anthropic_api_key"

# Azure OpenAI
export AZURE_API_KEY="your_azure_api_key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-01"
```

For the full list of supported providers and their environment variables, see the [LiteLLM Providers documentation](https://docs.litellm.ai/docs/providers).

```python
from core_plainid.categorization.llm_category_classifier_provider import LLMCategoryClassifierProvider

llm_classifier = LLMCategoryClassifierProvider(model="openai/gpt-4o")
```

The `model` parameter follows LiteLLM's `provider/model` naming convention (e.g. `openai/gpt-4o`, `anthropic/claude-sonnet-4-20250514`, `ollama/llama3`).

Use with caution: classification quality depends on the LLM you choose. Some base models may return poor or incorrect results, so prefer larger hosted models (e.g. from OpenAI or Anthropic) or models trained for classification tasks.

#### ZeroShotCategoryClassifierProvider

> **Requires:** `pip install core-plainid[categorization-zeroshot]`

Uses a Hugging Face zero-shot classification model. The model is downloaded automatically on first use.

```python
from core_plainid.categorization.zeroshot_category_classifier_provider import ZeroShotCategoryClassifierProvider

zeroshot_classifier = ZeroShotCategoryClassifierProvider(
    model_name="facebook/bart-large-mnli",
    threshold=0.5,
)
```

The `threshold` parameter (default `0.5`) controls the minimum confidence score for a category to be included. Use this classifier if you want better classification results without relying on an external LLM API, but note it requires disk space for the downloaded model.
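
To make the effect of `threshold` concrete, the sketch below filters a set of per-category confidence scores. The scores are made up for illustration, `select_categories` is a hypothetical helper, and treating the threshold as inclusive (`>=`) is an assumption about the comparison.

```python
def select_categories(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Keep categories whose score is at least `threshold`, highest score first."""
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [label for label, _ in kept]


# Illustrative scores, not real model output.
example_scores = {"contract": 0.91, "HR": 0.12, "finance": 0.55}
selected = select_categories(example_scores)  # ["contract", "finance"]
```

Raising the threshold makes the classifier more conservative: fewer categories are reported, so fewer prompts are matched against PlainID policies.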

## Anonymization

> **Requires:** `pip install core-plainid[anonymization]`

The `PresidioAnonymizer` detects and anonymizes PII (Personally Identifiable Information) in text using [Microsoft Presidio](https://microsoft.github.io/presidio/). It supports two actions: **MASK** (replaces PII with `***`) and **ENCRYPT** (encrypts the detected PII using a provided key). Multiple identities are supported through `RequestContext`.

The list of supported PII entities is based on [Presidio's supported entities](https://microsoft.github.io/presidio/supported_entities/).

### PlainID Setup

To use anonymization, configure rulesets in PlainID using the `Output_Control` template. Each entity type you want to detect needs its own ruleset:

```
# METADATA
# custom:
#   plainid:
#     kind: Ruleset
#     name: PERSON
ruleset(asset, identity, requestParams, action) if {
    asset.template == "Output_Control"
    asset["path"] == "PERSON"
    action.id in ["MASK"]
}
```

#### Custom Regex Entities

You can define custom entities using regex patterns. Configure them in PlainID with a `REGEX` path prefix and a `regexValue` attribute containing the pattern:

```
# METADATA
# custom:
#   plainid:
#     kind: Ruleset
#     name: REGEX_EMAIL
ruleset(asset, identity, requestParams, action) if {
    asset.template == "Output_Control"
    asset["path"] == "REGEX_EMAIL"
    asset["regexValue"] == "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
    action.id in ["MASK"]
}
```

#### Azure Health De-identification (AHDS) Entities

For medical/health data, you can enable Azure Health Data Services de-identification to detect health-specific PHI entities (e.g. `DOCTOR`, `PATIENT`, `AGE`). Configure them in PlainID the same way:

```
# METADATA
# custom:
#   plainid:
#     kind: Ruleset
#     name: DOCTOR
ruleset(asset, identity, requestParams, action) if {
    asset.template == "Output_Control"
    asset["path"] == "DOCTOR"
    action.id in ["MASK"]
}
```

### Usage

```python
from core_plainid.anonymization.presidio_anonymizer import PresidioAnonymizer
from core_plainid.utils.plainid_permissions_provider import PlainIDPermissionsProvider
from core_plainid.models.context.request_context import RequestContext

permissions_provider = PlainIDPermissionsProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

anonymizer = PresidioAnonymizer(
    permissions_provider=permissions_provider,
    encrypt_key="your_16_char_key",  # exactly 16 characters (AES-128)
)

request_context = RequestContext(
    entity_id="your_entity_id",
    entity_type_id="your_entity_type",
)

result = await anonymizer.aanonymize(
    "John Smith lives in New York",
    request_context=request_context,
)
print(result)  # e.g. "*** lives in ***" when PERSON and LOCATION are set to MASK
```

The `encrypt_key` parameter is optional and only required if you use the `ENCRYPT` action. The key is used for AES encryption and must be 128, 192, or 256 bits long (16, 24, or 32 characters).

### Enabling Azure Health De-identification (AHDS)

> **Requires:** `pip install core-plainid[anonymization-ahds]`

To detect health-specific PHI entities, enable AHDS by passing `enable_ahds=True` and setting the following environment variables:

```bash
export AHDS_ENDPOINT="https://your-deid-service.api.deid.azure.com"
export AZURE_TENANT_ID="your_azure_tenant_id"
export AZURE_CLIENT_ID="your_azure_client_id"
export AZURE_CLIENT_SECRET="your_azure_client_secret"
```

```python
anonymizer = PresidioAnonymizer(
    permissions_provider=permissions_provider,
    encrypt_key="your_16_char_key",  # exactly 16 characters (AES-128)
    enable_ahds=True,
)
```

## Tools Authorization

The `PlainIDPermissionsProvider` can retrieve the list of tools a user is authorized to use. This is useful for filtering available tools in your AI agent based on PlainID policies. Multiple identities are supported through `RequestContext`.

### PlainID Setup

Configure a ruleset in PlainID using the `Tools` template and set up the available tools as assets (e.g. `search_tool`, `calculator`, `email_sender`):

```
# METADATA
# custom:
#   plainid:
#     kind: Ruleset
#     name: All
ruleset(asset, identity, requestParams, action) if {
    asset.template == "Tools"
}
```

### Usage

```python
from core_plainid.utils.plainid_permissions_provider import PlainIDPermissionsProvider
from core_plainid.models.context.request_context import RequestContext

permissions_provider = PlainIDPermissionsProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

request_context = RequestContext(
    entity_id="your_entity_id",
    entity_type_id="your_entity_type",
)

permissions = await permissions_provider.aget_permissions(request_context)
allowed_tools = permissions.tools  # e.g. ["search_tool", "calculator"]
```
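
Once the allowed tool names are known, filtering the agent's tools is a dictionary lookup. A minimal sketch, assuming your tools are kept in a name-to-callable registry. The registry, the tool functions, and `filter_tools` are hypothetical and only illustrate the idea.

```python
from typing import Callable


def filter_tools(
    registry: dict[str, Callable[..., object]],
    allowed_tools: list[str],
) -> dict[str, Callable[..., object]]:
    """Keep only the registered tools whose names PlainID allows."""
    allowed = set(allowed_tools)
    return {name: tool for name, tool in registry.items() if name in allowed}


# Hypothetical tools, keyed by the asset names configured in PlainID.
registry = {
    "search_tool": lambda query: f"results for {query}",
    "calculator": lambda a, b: a + b,
    "email_sender": lambda to, body: f"sent to {to}",
}

# `allowed` would come from `permissions.tools` in a real application.
available = filter_tools(registry, allowed=["search_tool", "calculator"]) if False else filter_tools(registry, ["search_tool", "calculator"])
```

Names returned by PlainID that are not in the registry are ignored, so a policy can reference tools the current agent does not implement.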

## SQL Database Authorizer

The `PlainIDSQLAuthorizerClient` dynamically modifies SQL queries based on PlainID authorization policies, enforcing Row-Level Security (RLS) and Column-Level Security (CLS) at query time.

> **Note:** The SQL Authorizer does not currently support multiple identities. Only a single identity context (`entity_id` / `entity_type_id`) can be provided per request.

### Usage

```python
from core_plainid.clients.plainid_sql_authorizer_client import PlainIDSQLAuthorizerClient
from core_plainid.models.request.sql_authorizer_request import (
    SQLAuthorizerRequest,
    SQLAuthorizerFlags,
    PoliciesJoinOperation,
)

sql_authorizer = PlainIDSQLAuthorizerClient(
    base_url="https://your-sql-authz.plainid.cloud",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

request = SQLAuthorizerRequest(
    sql="SELECT * FROM accounts WHERE country = 'US'",
    entity_id="your_entity_id",
    entity_type_id="your_entity_type",
    flags=SQLAuthorizerFlags(
        empty_rls_treat_as_denied=True,
        empty_cls_treat_as_permitted=True,
        expand_star_column=True,
        policies_join_operation=PoliciesJoinOperation.OR,
    ),
)

response = sql_authorizer.authorize_sql(request)
print(response.sql)           # the modified SQL query
print(response.was_modified)  # True if policies were applied
```

Using a JWT token instead of a client secret:

```python
sql_authorizer = PlainIDSQLAuthorizerClient(
    base_url="https://your-sql-authz.plainid.cloud",
    client_id="your_client_id",
)

response = sql_authorizer.authorize_sql(request, auth_token="your_jwt_token")
```

Using an IDP provider:

```python
sql_authorizer = PlainIDSQLAuthorizerClient(
    base_url="https://your-sql-authz.plainid.cloud",
    client_id="your_client_id",
    idp_auth_provider=idp_provider,
)

response = sql_authorizer.authorize_sql(request)
```

## PlainID Auth Client

The `PlainIDAuthClient` is the low-level client used internally by `PlainIDPermissionsProvider` to communicate with the PlainID API. It provides `get_token` / `aget_token` for retrieving user access tokens and `get_resolution` / `aget_resolution` for fetching resolution data. In most cases you should use `PlainIDPermissionsProvider` instead of interacting with this client directly.

All three authentication modes (client credentials, JWT token, IDP provider) and both identity approaches (explicit entity ID or HTTP header matching) are supported. See the **Authentication** and **Identity Context** sections above.

```python
from core_plainid.clients.plainid_auth_client import PlainIDAuthClient

client = PlainIDAuthClient(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

token_response = await client.aget_token(
    entity_id="your_entity_id",
    entity_type_id="your_entity_type",
)
```

## Exceptions

The library provides specific exceptions to help identify which component caused an error. All exceptions inherit from `PlainIDException` and include the error `message` and the `original_exception` (if applicable).

| Exception | Description |
|---|---|
| `PlainIDException` | Base exception for all PlainID errors |
| `PlainIDAuthClientException` | Errors in the PlainID Auth client |
| `PlainIDPermissionsException` | Errors in permissions processing |
| `PlainIDCategorizerException` | Errors in the categorizer component |
| `PlainIDAnonymizerException` | Errors in the anonymizer component |
| `PlainIDSQLAuthorizerClientException` | Errors in the SQL Authorizer client |
| `PlainIDFilterException` | Errors in filter processing |
| `PlainIDRetrieverException` | Errors in the retriever components |
| `LlmResponseException` | Malformed or unexpected LLM responses |

```python
from core_plainid.exceptions.plainid_exceptions import PlainIDAnonymizerException

try:
    result = await anonymizer.aanonymize(query, request_context=request_context)
except PlainIDAnonymizerException as e:
    print(f"Anonymization error: {e.message}")
```
