Metadata-Version: 2.4
Name: agent-tracker
Version: 0.4.0
Summary: Telemetry SDK for self-built AI trading agents.
Author: Ellzaf
License-Expression: MIT
Project-URL: Homepage, https://ellzaf.com
Project-URL: Documentation, https://ellzaf.com
Project-URL: Repository, https://github.com/Ellzaf/agent-tracker
Project-URL: Issues, https://github.com/Ellzaf/agent-tracker/issues
Keywords: ai,observability,telemetry,trading,paper-trading
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: jsonschema>=4.23; extra == "dev"
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: ruff>=0.8; extra == "dev"
Requires-Dist: twine>=6; extra == "dev"
Provides-Extra: aitrade
Requires-Dist: psycopg[binary]>=3.2; extra == "aitrade"
Provides-Extra: otel
Provides-Extra: openai
Provides-Extra: langchain
Provides-Extra: agno
Dynamic: license-file

# Ellzaf Agent Tracker

Ellzaf Agent Tracker is a Python SDK for AI trading agents.

Install it in your agent repo, send redacted telemetry to Ellzaf, and use that
data for engineering diagnostics, trading-agent statistics, replay checks, and
repair prompts.

Ellzaf looks at how your agent behaves:

- what it read before a decision;
- which model and prompt version it used;
- what trade or allocation it wanted to make;
- which risk gate allowed, changed, or blocked the action;
- what happened in paper trading, shadow trading, or replay;
- where the agent drifted, used stale data, missed sources, or broke tests.

Ellzaf Agent Tracker does not place broker orders, rank stocks, generate buy or sell
signals, or replace your risk gates. Your agent remains in control of its own
logic. Ellzaf observes the system and reports engineering, safety, and data
quality issues.

## Who This Is For

Use this package if you are building, testing, or maintaining a Python AI
trading agent.

It can work with most agent designs after you map your existing objects to
Ellzaf events. Your repo may use OpenAI, Anthropic, LangChain, Agno, a custom
loop, a Postgres journal, JSON files, notebooks, or plain Python classes. The
SDK only needs structured facts about the run.

You get the smoothest setup if your agent follows the Ellzaf ebook, uses the
Ellzaf reference code, or was installed through an Ellzaf setup. Those projects
already have the same concepts Ellzaf expects: prompts, source checks, market
snapshots, decisions, risk gates, paper fills, replay tests, and performance
tracking.

Learn the full system, buy the reference code, or buy a guided setup at
[ellzaf.com](https://ellzaf.com).

## What You Send

Start small. A useful integration records one run, one model call, one decision,
one risk check, and one outcome.

Add richer events when you want hosted stats and weekly repair prompts.

| Your Agent Has | Send To Ellzaf |
| --- | --- |
| Agent run or workflow | `agent.run.started`, `agent.run.completed` |
| Prompt version, model, provider | `llm.call.started`, `llm.call.completed` |
| Search, tools, citations | `tool.call.completed`, `source.claim.recorded` |
| Market data and freshness | `market.snapshot.recorded` |
| Memory or context reads | `memory.read.completed` |
| Candidate boards and reviewed opportunities | `opportunity.board.recorded`, `opportunity.candidate.reviewed` |
| Setup regimes and entry permissions | `setup.profile.recorded` |
| Proposed allocation or action | `decision.proposed` |
| Planned, skipped, clipped, or deferred actions | `action.outcome.recorded` |
| Planned order before execution | `order.intent.recorded` |
| Risk checks and blocked actions | `risk.check.completed`, `trade.rejected` |
| Paper, shadow, or replay fills | `paper.fill.recorded` |
| Positions and portfolio state | `position.snapshot.recorded`, `portfolio.snapshot.recorded` |
| Deposits, withdrawals, fees | `capital.flow.recorded` |
| P&L, returns, drawdown | `performance.snapshot.recorded` |
| Replay or regression tests | `replay.result.recorded` |
| Strategy, setup, market regime | `strategy.context.recorded` |
| Same-input model or agent comparisons | `evaluation.epoch.started`, `evaluation.epoch.member.completed` |
| Build, config, risk-gate version | `agent.build.recorded` |
| Local checks your agent or CI already runs | `diagnostic.check.completed` |
| Cost and errors | `cost.usage.recorded`, `error.recorded` |

## Install

Install in the same Python environment as your trading agent:

```bash
python -m pip install agent-tracker
```

From a local clone of this repository:

```bash
python -m pip install -e .
```

For local SDK development:

```bash
python -m pip install -e ".[dev]"
```

For the optional exporter used by Ellzaf reference-code databases:

```bash
python -m pip install -e ".[aitrade]"
```

## Configure

Create a starter env file:

```bash
agent-tracker init
```

Then set your project values:

```bash
export ELLZAF_PROJECT="your-dashboard-project-slug"
export ELLZAF_API_KEY="your-tracker-ingestion-key"
export ELLZAF_ENVIRONMENT="paper"
export ELLZAF_AGENT_ID="local-agent"
```

Use the **Project slug** shown in the Ellzaf Monitoring dashboard for
`ELLZAF_PROJECT`. Do not use the display name if it differs from the slug. For
`ELLZAF_API_KEY`, use the Tracker ingestion key. Ellzaf shows that key only
once, when you create or rotate the project key.

Common optional settings:

```bash
export ELLZAF_QUEUE_DIR=".ellzaf/queue"
export ELLZAF_TELEMETRY_ENABLED="true"
export ELLZAF_STORE_FULL_IO="false"
export ELLZAF_GZIP="true"
export ELLZAF_SAMPLE_RATE="1.0"
export ELLZAF_DEDUPE_IDEMPOTENCY_KEYS="false"
```

The default base endpoint is `https://ellzaf.com`. The SDK uploads batches to
`https://ellzaf.com/v1/events/batch`. Only set `ELLZAF_ENDPOINT` if Ellzaf
support gives you a different base URL.

Supported environments:

- `development`
- `paper`
- `shadow`
- `replay`
- `live_observe`
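
Before you start the agent, it can help to confirm that the settings above are
present and that the environment name is one of the supported values. The
sketch below is a standalone check and is not part of the SDK. The function
name `check_ellzaf_env` is hypothetical, and treating all four variables as
required is an assumption.

```python
from collections.abc import Mapping

# Variable names and environment values are copied from this README.
# Treating all four variables as required is an assumption of this sketch.
REQUIRED_VARS = (
    "ELLZAF_PROJECT",
    "ELLZAF_API_KEY",
    "ELLZAF_ENVIRONMENT",
    "ELLZAF_AGENT_ID",
)
SUPPORTED_ENVIRONMENTS = {"development", "paper", "shadow", "replay", "live_observe"}


def check_ellzaf_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the settings look usable."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name, "").strip()]
    environment = env.get("ELLZAF_ENVIRONMENT", "").strip()
    if environment and environment not in SUPPORTED_ENVIRONMENTS:
        problems.append(f"ELLZAF_ENVIRONMENT={environment!r} is not a supported environment")
    return problems
```

Call it as `check_ellzaf_env(os.environ)` at startup and log or raise on any
returned problem.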

Keep `ELLZAF_STORE_FULL_IO=false` unless you want to store prompt and model
output text. The default sends hashes and character counts instead.
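
To make "hashes and character counts" concrete, here is a minimal sketch of
that kind of summary. It assumes SHA-256 with a `sha256:` prefix, matching the
`prompt_hash` format used in the Quick Start. The function name and the
`hash` and `char_count` keys are hypothetical, and the SDK's real field names
may differ.

```python
import hashlib


def summarize_text(text: str) -> dict[str, object]:
    """Replace raw text with a SHA-256 digest and a character count."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Key names are illustrative, not the SDK's schema.
    return {"hash": f"sha256:{digest}", "char_count": len(text)}


summary = summarize_text("Allocate between NVDA and MSFT.")
print(summary["hash"][:7], summary["char_count"])
```

The same helper can produce a stable `prompt_hash` value for a prompt template,
so that two runs with the same prompt text share one hash.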

Optional local volume controls:

```bash
export ELLZAF_MAX_EVENTS_PER_RUN=""
export ELLZAF_MAX_EVENTS_PER_DAY=""
export ELLZAF_MAX_UPLOAD_BYTES_PER_DAY=""
```

Leave these blank unless your agent is high volume. Errors, failed risk checks,
failed or warning diagnostics, rejected trades, fills, portfolio snapshots,
performance snapshots, and replay results are preserved by default even when
sampling is enabled.
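
The preservation rule above can be pictured as a small decision function. This
is an illustration of the rule as described here, not the SDK's
implementation. The event type names come from the table in "What You Send".
The `approved` and `status` payload keys appear in this README's examples, and
the `"failed"` status value is an assumption.

```python
import random

# Event types that this README says survive sampling regardless of payload.
ALWAYS_KEEP = {
    "error.recorded",
    "trade.rejected",
    "paper.fill.recorded",
    "portfolio.snapshot.recorded",
    "performance.snapshot.recorded",
    "replay.result.recorded",
}


def should_keep(event_type: str, payload: dict, sample_rate: float, rng: random.Random) -> bool:
    """Decide whether an event is kept under sampling (illustrative rule only)."""
    if event_type in ALWAYS_KEEP:
        return True
    # Failed risk checks are preserved.
    if event_type == "risk.check.completed" and payload.get("approved") is False:
        return True
    # Failed or warning diagnostics are preserved ("failed" is an assumed value).
    if event_type == "diagnostic.check.completed" and payload.get("status") in {"failed", "warning"}:
        return True
    # Everything else is kept with probability `sample_rate`.
    return rng.random() < sample_rate
```

With `sample_rate=1.0` every event is kept, because `rng.random()` always
returns a value below 1.0.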

## Quick Start

This example records a decision that the risk gate blocks. Ellzaf can later use
that trace to explain the block, detect stale inputs, and suggest tests or code
changes.

```python
from agent_tracker import AgentTracker

tracker = AgentTracker.from_env()

with tracker.run(run_type="portfolio_allocation", symbols=["NVDA", "MSFT"]) as run:
    run.prompt_version(
        family="allocation",
        version="2026-06-07",
        prompt_hash="sha256:...",
        provider="openai",
        model="example-model",
    )

    run.market_snapshot(
        source="local_bars",
        freshness_seconds=180,
        session_state="regular",
        signed_fields_present=True,
        non_finite_count=0,
        invalid_ohlc_relation_count=0,
    )

    run.decision_proposed(
        decision_id="decision_1",
        decision_kind="target_weight",
        action="increase",
        symbol="NVDA",
        target_weight="0.15",
    )

    order = run.order_intent(
        order_intent_id="intent_1",
        decision_id="decision_1",
        symbol="NVDA",
        side="buy",
        intended_quantity="2",
        intended_price="100.00",
        open_close_effect="open",
        session_date="2026-06-07",
    )

    run.risk_check(
        approved=False,
        reasons=["max_position_pct"],
        component="risk_gate",
        severity="warning",
        mistake_family="custom.max_position_pct_block",
        next_safe_action="observe",
    )

    run.decision_outcome(
        decision_id="decision_1",
        outcome_kind="no_order",
        linked_event_ids=[order["event_id"]],
        changed_by_risk_gate=True,
    )

    run.final_action(action="no_order", reason="risk_gate_rejected")

tracker.flush_all()
```

Record opportunity diagnostics when your agent builds a board, candidate packet,
or shortlist before the model decides:

```python
with tracker.run(run_type="portfolio_allocation", symbols=["NVDA"]) as run:
    run.opportunity_board(
        board_id="board_20260609_001",
        scope="full_universe",
        source="stored_bars",
        candidate_count="48",
        reviewed_count="12",
        excluded_count="3",
    )

    run.candidate_review(
        candidate_id="candidate_NVDA_001",
        board_id="board_20260609_001",
        symbol="NVDA",
        review_status="optimizer_skipped",
        reason_code="turnover_capacity",
    )

    run.setup_profile(
        setup_profile_id="setup_NVDA_001",
        symbol="NVDA",
        primary_regime="trend_continuation",
        entry_permission="eligible_starter",
        trend_quality_score="81",
    )

    run.action_outcome(
        action_id="action_NVDA_001",
        action_kind="rebalance",
        status="clipped",
        symbol="NVDA",
        requested_notional="1000.00",
        executed_notional="600.00",
        clipped=True,
    )
```

Use evaluation epochs when you compare several models, prompts, or agent
profiles on the same input snapshot:

```python
with tracker.run(run_type="shadow_comparison") as run:
    run.evaluation_epoch(
        epoch_id="epoch_20260609_001",
        epoch_kind="model_comparison",
        context_hash="sha256:...",
        expected_member_count=3,
        candidate_count=48,
    )

    run.evaluation_epoch_member(
        epoch_id="epoch_20260609_001",
        member_id="model_a",
        expected=True,
        state="completed",
        coverage_penalty="0",
        scored=True,
    )
```

## Add Decision-Flow Diagnostics

Stats show what happened. Diagnostics explain whether the data is strong enough
to trust the explanation.

Agent Tracker can help Ellzaf answer questions like:

- Did the agent preserve signed returns instead of clamping negative values?
- Were OHLCV rows finite, fresh, and internally valid?
- Did setup profiles keep regime and entry-permission fields after restart?
- Were candidate boards and missed opportunities recorded with reasons?
- Did a prompt, config, or risk-gate change get replayed before release?

If your agent already runs a check, record it directly:

```python
with tracker.run(run_type="pre_trade_diagnostics", symbols=["NVDA"]) as run:
    run.diagnostic_check(
        check_id="decision_flow.market_data_quality",
        check_family="market_data",
        status="warning",
        severity="warning",
        component="market_data",
        mistake_family="market.open_session_stale_bars",
        money_impact="possible",
        blocking_status="workflow_deferred",
        resolution_status="open",
        next_safe_action="run_test",
        observed={"freshness_seconds": 900},
        expected={"fresh_market_data": True},
    )
```

You can also generate local diagnostic check events from an exported JSONL file:

```bash
agent-tracker decision-flow-readiness ellzaf-events.jsonl
agent-tracker diagnose ellzaf-events.jsonl --output ellzaf-diagnostics.jsonl
agent-tracker validate-jsonl ellzaf-diagnostics.jsonl --profile strict-diagnostics
```

These commands do not call an LLM and do not upload anything. They make it
easier for coding agents to verify an integration before you send data to
Ellzaf.

For a smaller transition, wrap one function and let the SDK flush after the run:

```python
from agent_tracker import AgentTracker

tracker = AgentTracker.from_env()

@tracker.trace(run_type="portfolio_allocation", flush_after=True)
def run_agent() -> None:
    ...
```

You can also wrap existing functions that already return structured results:

```python
safe_risk_gate = tracker.wrap_risk_gate(
    risk_gate.validate,
    approved=lambda result: result.approved,
    reasons=lambda result: result.reasons,
)

tracked_decision = tracker.wrap_decision(
    agent.decide,
    decision_kind="target_weight",
    action=lambda result: result.action,
    symbol=lambda result: result.symbol,
)
```

These wrappers preserve the wrapped function's return value and exception
behavior. Uploads should still be mocked in tests.
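
The contract described above (same return value, same exceptions) can be
sketched with a plain decorator. This is not how `wrap_risk_gate` or
`wrap_decision` are implemented. `wrap_with_recorder` and its `record` callback
are hypothetical names used only to show the pattern.

```python
from functools import wraps
from typing import Any, Callable


def wrap_with_recorder(func: Callable[..., Any], record: Callable[[dict], None]) -> Callable[..., Any]:
    """Call `func`, report what happened to `record`, and leave the outcome untouched."""

    def safe_record(entry: dict) -> None:
        try:
            record(entry)
        except Exception:
            pass  # a telemetry failure must not change the wrapped call's behavior

    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            safe_record({"ok": False, "error_type": type(exc).__name__})
            raise  # re-raise the original exception unchanged
        safe_record({"ok": True, "result_type": type(result).__name__})
        return result  # the same object the wrapped function returned

    return wrapper
```

In a test, pass `list.append` of a local list as `record` so that nothing
leaves the process.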

## Add Trading Stats

Ellzaf needs trade lifecycle and account context to compute useful stats. Add
these events when your agent has the data.

```python
with tracker.run(run_type="paper_fill", symbols=["NVDA"]) as run:
    run.paper_fill(
        fill_id="fill_1",
        position_id="pos_1",
        order_intent_id="intent_1",
        symbol="NVDA",
        side="sell",
        open_close_effect="close",
        quantity="2",
        price="101.00",
        fees="0.25",
        currency="USD",
        fill_source="paper",
        session_date="2026-06-07",
        strategy_id="strat_breakout",
        setup="gap_hold",
    )

    run.position_snapshot(
        portfolio_kind="paper",
        position_id="pos_1",
        symbol="NVDA",
        quantity="0",
        realized_pnl="9.75",
    )

    run.performance_snapshot(
        period_kind="daily",
        period_start="2026-06-07",
        period_end="2026-06-07",
        session_date="2026-06-07",
        trading_pnl_amount="9.75",
        net_pnl_amount="9.75",
        fees="0.25",
        flow_adjusted_equity_change="9.75",
        return_base="1000.00",
        compounded_return_pct="0.98",
        max_drawdown_pct="1.2",
    )
```
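
The figures in the performance snapshot above are consistent with a simple
single-period return: 9.75 / 1000.00 = 0.975%, which rounds to 0.98. The
sketch below reproduces that arithmetic with `Decimal`, which fits the
string-valued amounts used in these examples. How Ellzaf defines
`compounded_return_pct` over longer periods is not stated here, so treat the
formula and the helper name `simple_return_pct` as assumptions.

```python
from decimal import ROUND_HALF_UP, Decimal


def simple_return_pct(net_pnl: str, return_base: str) -> str:
    """Single-period return in percent, rounded to two decimal places."""
    pct = Decimal(net_pnl) / Decimal(return_base) * Decimal("100")
    return str(pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


print(simple_return_pct("9.75", "1000.00"))  # 0.98
```

Keeping amounts as decimal strings avoids binary floating-point rounding
before the values reach the payload.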

Typed payload builders are available when you want IDE help or shared helper
code:

```python
from agent_tracker import PaperFillPayload

payload = PaperFillPayload(
    fill_id="fill_1",
    position_id="pos_1",
    symbol="NVDA",
    side="sell",
    open_close_effect="close",
    quantity="2",
    price="101.00",
    fees="0.25",
    session_date="2026-06-07",
).to_payload()

tracker.event("paper.fill.recorded", run_id="run_fill_1", payload=payload)
```

Run the validation and readiness checks against exported JSONL:

```bash
agent-tracker validate-jsonl ellzaf-events.jsonl --profile strict-reporting
agent-tracker reporting-readiness ellzaf-events.jsonl
agent-tracker tier-readiness ellzaf-events.jsonl
```

The readiness report tells you which dashboards Ellzaf can compute from your
data and which fields your agent still needs to send.

## Use A Coding Agent To Integrate

This package ships prompts for Codex, Claude Code, and similar coding agents.
Run the prompt command inside the repo you want to instrument:

```bash
agent-tracker print-agent-prompt --profile ebook
```

The `ebook` profile is for agents built from Ellzaf lessons, the Ellzaf
reference code, or a similar local trading-agent architecture.

For a repo review after integration:

```bash
agent-tracker print-agent-prompt --profile review
```

For a custom Python trading agent:

```bash
agent-tracker print-agent-prompt --profile custom
agent-tracker doctor-repo --path . --write-plan agent-tracker-plan.md
```

For backend ingestion teams:

```bash
agent-tracker print-agent-prompt --profile backend
```

## Manual Events

Use `event(...)` when helper methods do not match your code.

```python
tracker.event(
    "risk.check.completed",
    run_id="run_example",
    symbols=["NVDA"],
    payload={
        "risk_check_kind": "deterministic",
        "approved": False,
        "reasons": ["stale_market_data"],
    },
)
```

The SDK validates the event before it writes to disk or uploads.

## Local JSONL Export

Use `JsonlSink` for local audits, support bundles, or custom adapters.

```python
from agent_tracker import JsonlSink

sink = JsonlSink("ellzaf-events.jsonl")
# `event` is an Ellzaf event that your code has already built.
sink.write(event)
```

The sink redacts, validates, and writes one event per line.
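
Because the file holds one JSON object per line, you can inspect it with the
standard library alone. The sketch below counts events per type. The
`event_type` key is an assumption about the exported schema, so adjust it if
your file uses a different field name.

```python
import json
from collections import Counter
from pathlib import Path


def count_event_types(path: str) -> Counter:
    """Count events per type in a JSONL file with one JSON object per line."""
    counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            event = json.loads(line)
            # "event_type" is an assumed key name.
            counts[event.get("event_type", "<missing>")] += 1
    return counts
```

A quick count like this is a cheap way to confirm that a run produced the
events you expected before you run the stricter validators.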

Generate sample files:

```bash
agent-tracker emit-sample --profile ebook --output ellzaf-sample.jsonl
agent-tracker emit-sample --profile reporting --output ellzaf-reporting.jsonl
```

Validate any file before you upload or share it:

```bash
agent-tracker validate-jsonl ellzaf-sample.jsonl
agent-tracker validate-jsonl ellzaf-reporting.jsonl --profile strict-reporting
```

Build local product artifacts:

```bash
agent-tracker tier-readiness ellzaf-reporting.jsonl
agent-tracker agentic-security-readiness ellzaf-reporting.jsonl
agent-tracker decision-flow-readiness ellzaf-reporting.jsonl
agent-tracker diagnose ellzaf-reporting.jsonl --output ellzaf-diagnostics.jsonl
agent-tracker proof-readiness ellzaf-reporting.jsonl
agent-tracker arena-readiness ellzaf-reporting.jsonl
agent-tracker repair-pack ellzaf-reporting.jsonl --output repair-pack.json
agent-tracker dataset-from-events ellzaf-reporting.jsonl --output dataset.jsonl
agent-tracker eval-plan ellzaf-reporting.jsonl --output eval-plan.json
agent-tracker experiment-manifest --from-repair-pack repair-pack.json
```

These commands are deterministic and local. They do not call an LLM.

## Custom Log Mapping

If your agent already writes JSONL, CSV, JSON arrays, or SQLite logs, you can
export Ellzaf events with a declarative mapping file.

```toml
project = "your-dashboard-project-slug"
agent_id = "local-agent"
environment = "paper"

[[sources]]
name = "risk_checks"
kind = "csv"
path = "risk_checks.csv"
event_type = "risk.check.completed"
run_id_field = "run_id"
occurred_at_field = "checked_at"
symbols_field = "symbol"

[sources.payload_defaults]
risk_check_kind = "deterministic"

[sources.fields]
approved = { path = "approved", type = "bool" }
reasons = { path = "reasons", type = "list", required = false }
```

Run the export and validate it before upload:

```bash
agent-tracker map-events --config ellzaf-mapping.toml --output ellzaf-events.jsonl
agent-tracker validate-jsonl ellzaf-events.jsonl --profile strict-reporting
```

Mapping output goes through normal SDK validation and redaction. Bad rows are
skipped with sanitized row warnings so one malformed row does not block the
whole export.
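
The skip-and-warn behavior can be pictured with a small standalone mapper for
the `risk_checks.csv` source above. This is a sketch and not the code behind
`map-events`. The event envelope keys, the accepted boolean spellings, and the
`|` separator for `reasons` are all assumptions.

```python
import csv
import io

TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}


def parse_bool(text: str) -> bool:
    """Parse a CSV cell into a bool, raising ValueError for anything else."""
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean: {text!r}")


def map_risk_rows(csv_text: str) -> tuple[list[dict], list[str]]:
    """Map CSV rows to risk-check events and collect warnings for rows that fail."""
    events: list[dict] = []
    warnings: list[str] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        try:
            payload = {
                "risk_check_kind": "deterministic",  # payload default
                "approved": parse_bool(row["approved"] or ""),  # required field
            }
            reasons = (row.get("reasons") or "").strip()
            if reasons:  # optional field
                payload["reasons"] = reasons.split("|")
            events.append({
                "event_type": "risk.check.completed",
                "run_id": row["run_id"],
                "occurred_at": row["checked_at"],
                "symbols": [row["symbol"]],
                "payload": payload,
            })
        except (KeyError, ValueError) as exc:
            # Report the row number and error type only, never the row contents.
            warnings.append(f"row {row_number}: {type(exc).__name__}")
    return events, warnings
```

Reporting only the row number and error type keeps raw cell values, which may
contain sensitive data, out of the warning output.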

## Reference-Code Exporter

Some Ellzaf projects store telemetry-like rows in a Postgres database. Use the
optional exporter when your repo has those tables or close equivalents:

```python
from agent_tracker.adapters.aitrade import AitradeExporter

exporter = AitradeExporter.from_database_url(database_url)
summary = exporter.export_jsonl("ellzaf-events.jsonl")
```

For tests, pass rows without a database:

```python
events, summary = AitradeExporter().events_from_rows(rows_by_table)
```

If your table names or fields differ, write a thin adapter that emits the same
Ellzaf event types. Most custom agents only need small mapping changes.

## Privacy And Safety

Ellzaf Agent Tracker redacts events before queueing or upload.

Default behavior:

- prompt and model output fields become hashes with character counts;
- API keys, bearer tokens, passwords, and common secret patterns become
  `[REDACTED]`;
- broker payloads and account identifiers become hashes;
- bytes become hash and byte-count metadata;
- non-finite numbers such as `NaN` and `Infinity` are rejected or converted to
  safe JSON values before upload.
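
As an illustration of the defaults listed above, the sketch below applies
similar rules to a nested payload. It is not the SDK's redactor. The secret
patterns, the key names in `SECRET_KEYS`, and the output shape for bytes are
assumptions chosen to keep the example short.

```python
import hashlib
import math
import re

# Illustrative patterns only; a real redactor needs a much broader set.
SECRET_PATTERN = re.compile(r"(?i)\b(bearer\s+[A-Za-z0-9._\-]+|sk-[A-Za-z0-9]{8,})")
SECRET_KEYS = {"api_key", "password", "token", "authorization"}


def redact(value, key: str = ""):
    """Recursively return a redacted, JSON-safe copy of `value`."""
    if key.lower() in SECRET_KEYS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: redact(v, str(k)) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(item) for item in value]
    if isinstance(value, bytes):
        # Bytes become a digest plus a byte count.
        return {"hash": "sha256:" + hashlib.sha256(value).hexdigest(), "byte_count": len(value)}
    if isinstance(value, float) and not math.isfinite(value):
        return None  # NaN and Infinity are not valid JSON numbers
    if isinstance(value, str):
        return SECRET_PATTERN.sub("[REDACTED]", value)
    return value
```

Redacting by key name as well as by pattern catches secrets whose values do
not match any known format.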

The SDK does not call brokers, read broker quotes, or create orders.

## Queue And Upload

The SDK writes each event to its own JSONL file under `.ellzaf/queue` by default.
`flush()` uploads one batch to Ellzaf with gzip and bearer-token
authentication. `flush_all()` keeps uploading batches until the queue is empty,
the upload is skipped, or a retryable error requires a later attempt.

```python
summary = tracker.flush()
summary = tracker.flush_all()

print(summary.attempted)
print(summary.accepted)
print(summary.rejected)
print(summary.retryable)
print(summary.reason_code)
print(summary.stop_reason)
```

If the API key is missing, `flush()` leaves events in the local queue and
returns a skipped summary. If Ellzaf returns a retryable error, the SDK keeps
the event pending for a later flush with local retry metadata. While a retry
backoff window is active, `flush()` returns `retry_not_due` and keeps the files
in place. If another process is already flushing the same queue, `flush()`
returns `queue_locked` instead of racing the upload.

By default the queue is append-only. Set
`ELLZAF_DEDUPE_IDEMPOTENCY_KEYS=true` only if your integration may emit the same
idempotency key more than once before a flush and you want later duplicates to
reuse the existing pending file.
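
If you call `flush()` from a scheduled job, you may want to wait briefly when
it reports `retry_not_due` or `queue_locked` rather than treat those as
failures. The sketch below shows one way to do that. It assumes those values
appear in `summary.reason_code`, which this README does not confirm, and it
uses a stand-in `FlushSummary` class so the example runs without the SDK.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FlushSummary:
    """Stand-in for the SDK's summary object; only the fields used here."""
    accepted: int = 0
    reason_code: Optional[str] = None


# Conditions that are expected to clear without any action (assumed values).
TRANSIENT_REASONS = {"retry_not_due", "queue_locked"}


def flush_until_settled(
    flush: Callable[[], FlushSummary],
    max_attempts: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> FlushSummary:
    """Call `flush` again only while it reports a transient local condition."""
    summary = flush()
    for _ in range(max_attempts - 1):
        if summary.reason_code not in TRANSIENT_REASONS:
            break
        sleep(wait_seconds)
        summary = flush()
    return summary
```

With the real SDK you would pass `tracker.flush` as the `flush` argument. The
injectable `sleep` keeps tests fast.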

Check upload configuration without moving queue files:

```bash
agent-tracker flush --dry-run
agent-tracker flush --drain --dry-run
```

Run an isolated diagnostic check:

```bash
agent-tracker doctor-upload
```

By default `doctor-upload` prepares a diagnostic batch without using the
network. Pass `--live` only when you want to send the diagnostic event to
Ellzaf.

Use `canary` for the same production-ingestion contract check with canary
labeling:

```bash
agent-tracker canary
agent-tracker canary --live
```

Disable queue writes and uploads:

```bash
export ELLZAF_TELEMETRY_ENABLED="false"
```

You can still create and validate event objects with telemetry disabled.

## Test Your Integration

Add a test in your agent repo:

```python
from agent_tracker.testing import assert_valid_agent_tracker_events


def test_agent_tracker_events(events):
    assert_valid_agent_tracker_events(events)
```

Use stricter profiles when your repo should support hosted stats, arena scoring,
or proof pages:

```python
assert_valid_agent_tracker_events(events, profile="strict-reporting")
assert_valid_agent_tracker_events(events, profile="strict-diagnostics")
assert_valid_agent_tracker_events(events, profile="strict-arena")
assert_valid_agent_tracker_events(events, profile="strict-proof")
```

The helper checks schema rules, UTC timestamps, taxonomy values, privacy flags,
secret patterns, raw prompt/output leaks, raw broker payloads, raw account IDs,
and required event coverage.
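
If you build timestamps yourself, a local check for the UTC rule can catch
mistakes before the helper does. The sketch below accepts ISO 8601 strings
with an explicit zero offset and rejects naive or non-UTC values. The exact
rule the SDK enforces is not specified here, so treat this as an
approximation.

```python
from datetime import datetime, timedelta


def is_utc_timestamp(value: str) -> bool:
    """Return True for ISO 8601 timestamps that carry an explicit zero UTC offset."""
    # Normalize a trailing "Z" so older Python versions can parse it too.
    text = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return False
    # Naive datetimes have no offset (None), so they fail this comparison.
    return parsed.utcoffset() == timedelta(0)
```

To produce a compliant value, `datetime.now(timezone.utc).isoformat()` returns
a string with a `+00:00` offset.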

## CLI Reference

```bash
agent-tracker init
agent-tracker doctor-repo --path .
agent-tracker doctor-repo --path . --write-plan agent-tracker-plan.md
agent-tracker print-agent-prompt --profile ebook
agent-tracker print-agent-prompt --profile custom
agent-tracker print-agent-prompt --profile review
agent-tracker print-agent-prompt --profile backend
agent-tracker emit-sample --profile ebook --output ellzaf-sample.jsonl
agent-tracker emit-sample --profile reporting --output ellzaf-reporting.jsonl
agent-tracker validate-jsonl ellzaf-sample.jsonl
agent-tracker validate-jsonl ellzaf-reporting.jsonl --profile strict-reporting
agent-tracker validate-jsonl ellzaf-reporting.jsonl --profile strict-diagnostics
agent-tracker reporting-readiness ellzaf-reporting.jsonl
agent-tracker tier-readiness ellzaf-reporting.jsonl
agent-tracker agentic-security-readiness ellzaf-reporting.jsonl
agent-tracker decision-flow-readiness ellzaf-reporting.jsonl
agent-tracker diagnose ellzaf-reporting.jsonl --output ellzaf-diagnostics.jsonl
agent-tracker proof-readiness ellzaf-reporting.jsonl
agent-tracker arena-readiness ellzaf-reporting.jsonl
agent-tracker repair-pack ellzaf-reporting.jsonl --output repair-pack.json
agent-tracker dataset-from-events ellzaf-reporting.jsonl --output dataset.jsonl
agent-tracker eval-plan ellzaf-reporting.jsonl --output eval-plan.json
agent-tracker experiment-manifest --from-repair-pack repair-pack.json
agent-tracker map-events --config ellzaf-mapping.toml --output ellzaf-events.jsonl
agent-tracker queue-health
agent-tracker flush
agent-tracker flush --drain
agent-tracker flush --dry-run
agent-tracker doctor-upload
agent-tracker canary
```

Only `flush`, `doctor-upload --live`, and `canary --live` use the network. The
other commands inspect local files, print package prompts, validate JSONL, or
prepare dry-run batches.

## Development

Run the package checks:

```bash
python -m pytest
python -m ruff check src tests
python -m build
```

The package has no runtime dependencies outside the Python standard library.

## Learn With Ellzaf

Ellzaf teaches the full AI trading-agent build at
[ellzaf.com](https://ellzaf.com).

You can buy:

- the Blueprint For AI Trade ebook;
- the reference AI trading-agent code;
- a guided setup if you want Ellzaf to help you install and configure the
  system.

Agents built from those materials need fewer integration changes because they
already follow the telemetry surfaces this SDK expects.
