Metadata-Version: 2.4
Name: lyapmon
Version: 0.1.0
Summary: Online Lyapunov-drift monitor for ML retraining loops: alert when the loop trends unstable, before eval metrics show it.
Project-URL: Homepage, https://github.com/sophie-nguyenthuthuy/lyapmon
Project-URL: Repository, https://github.com/sophie-nguyenthuthuy/lyapmon
Project-URL: Issues, https://github.com/sophie-nguyenthuthuy/lyapmon/issues
Project-URL: Changelog, https://github.com/sophie-nguyenthuthuy/lyapmon/blob/main/CHANGELOG.md
Author: Thuy Nguyen
License: Apache-2.0
License-File: LICENSE
Keywords: airflow,drift,lyapunov,mlflow,mlops,observability,retraining,stability
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Provides-Extra: airflow
Requires-Dist: apache-airflow>=2.7; extra == 'airflow'
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: mlflow
Requires-Dist: mlflow-skinny>=2.9; extra == 'mlflow'
Provides-Extra: plot
Requires-Dist: matplotlib>=3.7; extra == 'plot'
Provides-Extra: prometheus
Requires-Dist: prometheus-client>=0.19; extra == 'prometheus'
Description-Content-Type: text/markdown

# lyapmon

[![ci](https://github.com/sophie-nguyenthuthuy/lyapmon/actions/workflows/ci.yml/badge.svg)](https://github.com/sophie-nguyenthuthuy/lyapmon/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/lyapmon)](https://pypi.org/project/lyapmon/)
[![Python](https://img.shields.io/pypi/pyversions/lyapmon)](https://pypi.org/project/lyapmon/)
[![License: Apache-2.0](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)

**Online Lyapunov-drift monitoring for ML retraining loops.**

Your retraining DAG is a closed-loop dynamical system: the model shapes the
data that trains the next model (`data → train → deploy → data`). Closed
loops can go unstable — exposure bias, label feedback, recursive training —
and when they do, the holdout eval is the *last* place it shows up.

`lyapmon` watches the loop the way control engineering watches a plant. Each
cycle it builds a small state vector `x_k` from observables the pipeline
already has, evaluates a Lyapunov candidate `V(x_k)`, and runs an online test
on the drift

```
ΔV_k = V(x_{k+1}) − V(x_k)
```

While the expected drift is negative the loop is contracting toward its
commissioned-good state and may run autonomously. The first *sustained
positive drift* fires an alert — and, wired as an Airflow gate, blocks the
auto-deploy edge and pulls a human back in. That is **bounded delegation**
packaged as an observability tool: the loop earns its autonomy cycle by
cycle, and loses it the moment the stability evidence does.

```
ingest ──▶ train ──▶ evaluate ──▶ lyapunov_gate ──▶ deploy
                                       │
                                       ▼  E[ΔV] > 0, sustained
                                  ✗ fail task: block deploy, page a human
```
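As a rough illustration of the drift idea, the sketch below computes the increments `ΔV_k` from a series of `V` values and looks for a run of positive ones. It is a deliberate simplification: the function names and the raw sign rule are assumptions made for this example, whereas the library's actual test (an EWMA with a calibrated threshold) is described under "How it works".

```python
def drift_increments(v_values):
    """Return ΔV_k = V(x_{k+1}) − V(x_k) for each pair of consecutive cycles."""
    return [nxt - cur for cur, nxt in zip(v_values, v_values[1:])]


def first_sustained_positive(increments, consecutive=3):
    """Index of the increment that completes `consecutive` positive ones in a row.

    Returns None if no such run exists. (Hypothetical helper, not lyapmon API.)
    """
    run = 0
    for k, dv in enumerate(increments):
        run = run + 1 if dv > 0 else 0
        if run >= consecutive:
            return k
    return None


# V shrinking toward the commissioned-good state: no alert.
contracting = [9.0, 6.0, 4.0, 3.0, 2.5]
# V dips once, then climbs cycle after cycle: alert once the run is long enough.
diverging = [2.5, 2.4, 2.6, 3.1, 3.9, 5.0]

contracting_alert = first_sustained_positive(drift_increments(contracting))
diverging_alert = first_sustained_positive(drift_increments(diverging))
```

A single positive increment does not fire; only a sustained run does, which is what keeps one noisy cycle from revoking autonomy.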

## Why drift on V, not a threshold on eval loss?

Eval loss on a fixed holdout grows **quadratically** in the model's bias —
it stays inside its noise band long after the loop has gone divergent.
Distribution observables (training-batch PSI, prediction shift, parameter
movement) grow **linearly**, and a trend test on increments fires before a
level test on a lagging metric. The bundled simulation measures exactly this
lead time against the rule it replaces (eval mean + 3σ, 2 consecutive):

```
$ lyapmon simulate --feedback-gain 0.65
...
lyapmon UNSTABLE at cycle 34
naive eval-loss alarm (mean+3σ, 2 consecutive) at cycle 42
lead time: 8 cycles
```

Mean lead over a 10-seed sweep is ~4 cycles with zero false alarms on
stable and near-critical loops (asserted in `tests/test_sim.py`). With
delayed outcome labels — the usual production reality — the lead widens
(`--label-delay 5` → 12 cycles), because the state vector is built from
label-free observables that stay current while the eval waits for labels.
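The quadratic-versus-linear argument can be checked with a toy calculation. The sketch below assumes a bias that grows geometrically, a distribution observable proportional to `|b|`, an excess eval loss proportional to `b²`, and the same alarm band for both signals. All constants are illustrative and are not taken from the bundled simulation.

```python
def first_crossing(signal, threshold, max_cycles=200):
    """First cycle k at which signal(k) reaches the threshold, else None."""
    for k in range(max_cycles):
        if signal(k) >= threshold:
            return k
    return None


# Illustrative values (assumptions for this sketch only).
B0, RHO, BAND = 0.01, 1.1, 0.15


def bias(k):
    """Model bias after k cycles of a slowly diverging loop."""
    return B0 * RHO ** k


# Distribution observables scale like |b|; excess eval loss scales like b^2.
linear_alarm = first_crossing(lambda k: bias(k), BAND)
quadratic_alarm = first_crossing(lambda k: bias(k) ** 2, BAND)
```

While the bias is still small, `b²` is much smaller than `b`, so under these assumptions the linear signal leaves its band ten cycles earlier (cycle 29 versus cycle 39).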

![demo plot](demo/demo.png)

## Install

```bash
pip install lyapmon              # core: numpy only
pip install 'lyapmon[mlflow]'    # + MLflow logging/backfill
pip install 'lyapmon[prometheus]' # + Pushgateway export
pip install 'lyapmon[plot]'      # + simulation plots
```

## Quickstart

```python
from lyapmon import (
    JSONLStore,
    LyapunovMonitor,
    WebhookAlerter,
    mean_shift,
    psi,
    weight_delta_norm,  # used below; assumed to be exported at top level like psi
)

monitor = LyapunovMonitor(
    features=["eval_auc", "psi_train", "pred_shift", "weight_delta"],
    warmup=10,                                  # cycles assumed healthy; fits V
    store=JSONLStore("/shared/lyapmon/history.jsonl"),
    alerters=[WebhookAlerter("https://hooks.slack.com/services/...")],
)

verdict = monitor.observe(
    {
        "eval_auc": auc,
        "psi_train": psi(reference_features, batch_features),
        "pred_shift": mean_shift(reference_preds, current_preds),
        "weight_delta": weight_delta_norm(prev_weights, new_weights),
    },
    cycle_id=run_id,
)

if verdict.unstable:
    block_deploy()   # verdict.top_contributors says which observable moved
```

The monitor is stateless across processes — everything (baseline, detector
state, previous `V`) checkpoints into the store, so a fresh instance per DAG
run behaves identically to a long-lived one (this is tested).
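To show why checkpointing makes a fresh instance per run equivalent to a long-lived one, here is a minimal sketch using a single EWMA as the detector state and a plain JSON file as the checkpoint. It does not reproduce `JSONLStore` or the monitor's real state layout; the function names and the file format are assumptions for this example.

```python
import json
import os
import tempfile


def observe_with_checkpoint(path, value, lam=0.2):
    """Load state from `path`, fold in one value, write the state back."""
    state = {"ewma": 0.0, "n": 0}
    if os.path.exists(path):
        with open(path) as fh:
            state = json.load(fh)
    state["ewma"] = (1 - lam) * state["ewma"] + lam * value
    state["n"] += 1
    with open(path, "w") as fh:
        json.dump(state, fh)
    return state["ewma"]


def observe_in_memory(values, lam=0.2):
    """The same update, with the state held in one long-lived process."""
    ewma, out = 0.0, []
    for value in values:
        ewma = (1 - lam) * ewma + lam * value
        out.append(ewma)
    return out


values = [0.3, -0.1, 0.7, 1.2, 0.4]
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "state.json")
    # Each call stands in for a separate DAG run with no shared memory.
    per_run = [observe_with_checkpoint(checkpoint, v) for v in values]

long_lived = observe_in_memory(values)
```

Python's `json` module round-trips floats exactly, so `per_run` and `long_lived` match value for value in this sketch.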

### Airflow gate

```python
from lyapmon.integrations.airflow import lyapunov_gate_callable
from airflow.operators.python import PythonOperator

gate = PythonOperator(
    task_id="lyapunov_gate",
    python_callable=lyapunov_gate_callable,
    op_kwargs=dict(
        features=["eval_auc", "psi_train", "pred_shift", "weight_delta"],
        history_path="/shared/lyapmon/history.jsonl",
        xcom_task_id="evaluate",        # evaluate task pushes the metrics dict
    ),
)
ingest >> train >> evaluate >> gate >> deploy
```

On sustained positive drift the gate raises `LoopUnstableError`: the deploy
never runs, the DAG run goes red, and your existing on-call alerting takes it
from there. After remediation, call `monitor.rebaseline()` (or delete the
checkpoint) to re-commission the loop with a fresh warmup.
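The gating mechanism itself is ordinary Airflow behaviour: a task whose callable raises is marked failed, and downstream tasks with the default trigger rule do not run. The sketch below shows that pattern in plain Python. `GateBlocked` and `gate` are hypothetical stand-ins, not the library's `LoopUnstableError` or `lyapunov_gate_callable`.

```python
class GateBlocked(RuntimeError):
    """Hypothetical stand-in for the exception a stability gate would raise."""


def gate(status):
    """Raise when the loop is unstable; return normally otherwise.

    Returning lets the orchestrator proceed to the next task, while raising
    fails this task and leaves the deploy step unexecuted.
    """
    if status == "UNSTABLE":
        raise GateBlocked("sustained positive Lyapunov drift: deploy blocked")
    return status


def deploy_allowed(status):
    """Small driver showing both outcomes without an orchestrator."""
    try:
        gate(status)
    except GateBlocked:
        return False
    return True
```

Failing the task, rather than returning a flag, means no downstream code has to remember to check the verdict.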

### MLflow

```python
from lyapmon.integrations.mlflow import log_verdict, states_from_experiment

log_verdict(verdict)                 # lyapmon.V / .delta_V / .drift next to your run metrics

# Backfill a monitor over an existing retraining history:
for run_id, metrics in states_from_experiment("churn-retrain", FEATURES):
    monitor.observe(metrics, cycle_id=run_id)
```

### Prometheus / Grafana

```python
from lyapmon.integrations.prometheus import write_textfile
write_textfile(verdict, "/var/lib/node_exporter/lyapmon.prom", {"pipeline": "churn"})
```

Alert on `lyapmon_status >= 3`; graph `lyapmon_drift` against
`lyapmon_drift_threshold` for the money chart.

### Shell / BashOperator

```bash
lyapmon check --history /shared/history.jsonl \
  --features eval_auc,psi_train --metrics '{"eval_auc":0.91,"psi_train":0.04}' \
  --fail-on-unstable
lyapmon report --history /shared/history.jsonl
```

## How it works

1. **State vector.** You name the observables; helpers (`psi`, `ks_distance`,
   `mean_shift`, `rate_shift`, `weight_delta_norm`) compute the standard ones
   from raw arrays. Everything is sample-only — no oracle access to truth.
2. **Lyapunov candidate.** Default is a squared diagonal Mahalanobis distance to a
   baseline fitted on the warmup window: `V(x) = Σᵢ ((xᵢ − x*ᵢ)/σᵢ)²` —
   positive definite around the commissioned-good state, unitless across
   mixed-scale features. A full quadratic form (`QuadraticV`) or any callable
   (`CallableV`, e.g. a learned/certified candidate) drops in unchanged.
3. **Drift test.** The conditional drift `E[ΔV|x]` is estimated by an EWMA of
   the increments; the alert threshold is calibrated from warmup noise
   (`z · σ_ΔV · √(λ/(2−λ))`) and must be breached for `consecutive` cycles in
   a row. A one-sided Page-Hinkley accumulator runs alongside to catch slow
   drift that hides under the EWMA threshold. Either detector firing ⇒
   `UNSTABLE`.
4. **Verdict.** `STABLE` / `WARNING` / `UNSTABLE` plus the numbers and the
   top contributors to `V` (which observable is pushing the loop out).
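The steps above can be sketched in a few dozen lines of standard-library Python. This is an illustration under stated assumptions, not the library's implementation: the names (`fit_baseline`, `contributions`, `DriftDetector`), the default parameters, and the choice of zero as the Page-Hinkley reference mean are all made up for this example.

```python
import math
import statistics


def fit_baseline(warmup_rows):
    """Per-feature centre x* and scale σ from warmup state vectors (dicts)."""
    names = list(warmup_rows[0])
    center = {n: statistics.fmean([r[n] for r in warmup_rows]) for n in names}
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    scale = {n: statistics.pstdev([r[n] for r in warmup_rows]) or 1.0 for n in names}
    return center, scale


def contributions(x, center, scale):
    """Per-feature terms ((x_i − x*_i)/σ_i)²; their sum is V(x)."""
    return {n: ((x[n] - center[n]) / scale[n]) ** 2 for n in x}


class DriftDetector:
    """EWMA drift test plus a one-sided Page-Hinkley-style accumulator."""

    def __init__(self, sigma_dv, lam=0.2, z=3.0, consecutive=2,
                 ph_delta=0.0, ph_limit=float("inf")):
        self.lam = lam
        self.threshold = z * sigma_dv * math.sqrt(lam / (2 - lam))
        self.consecutive = consecutive
        self.ph_delta, self.ph_limit = ph_delta, ph_limit
        self.ewma = 0.0
        self.breaches = 0
        self.cum = 0.0      # running sum of (ΔV − ph_delta)
        self.min_cum = 0.0  # lowest value the running sum has reached

    def update(self, dv):
        """Fold in one increment ΔV; return True if either detector fires."""
        self.ewma = (1 - self.lam) * self.ewma + self.lam * dv
        self.breaches = self.breaches + 1 if self.ewma > self.threshold else 0
        self.cum += dv - self.ph_delta
        self.min_cum = min(self.min_cum, self.cum)
        ph_alarm = self.cum - self.min_cum > self.ph_limit
        return self.breaches >= self.consecutive or ph_alarm


center, scale = fit_baseline([{"a": 1.0, "b": 10.0}, {"a": 3.0, "b": 30.0}])
terms = contributions({"a": 4.0, "b": 20.0}, center, scale)
v_value = sum(terms.values())
top_contributor = max(terms, key=terms.get)
```

In this toy baseline, feature `a` sits two standard deviations from its centre and `b` sits exactly on it, so `a` accounts for all of `V`. The accumulator is there for the case the EWMA misses: small positive increments that never cross the threshold individually but add up over many cycles.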

The theory anchor is the Foster–Lyapunov drift criterion: negative expected
one-step drift of a positive-definite `V` outside a small set implies
stochastic stability. `lyapmon` monitors the empirical contrapositive — when
the drift estimate turns and stays positive, the contraction evidence is
gone, so the autonomy should be too. It is an early-warning instrument, not
a certificate; for the certificate-side story (CEGIS-learned, dReal-verified
candidates) see the companion project `lyacert`.

## Demo

```bash
lyapmon simulate --feedback-gain 0.3            # below critical gain: stable forever
lyapmon simulate --feedback-gain 0.65           # slow-burn divergence, alarm + lead time
lyapmon simulate --feedback-gain 0.65 --plot demo.png
```

The simulated loop retrains on data partially generated under its own
influence (exposure bias with amplification κ); the closed-loop pole is
`1 − lr + lr·g·κ`, so instability is a *knob*, not an anecdote — critical
gain `g* = 1/κ` exactly. See [demo/DEMO.md](demo/DEMO.md) for the full
Airflow + MLflow conference demo and talk track.
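The pole formula can be explored numerically. The sketch below assumes a scalar linear recursion `b_{k+1} = pole · b_k` for the loop's bias, with `lr = 0.5` and `κ = 2.0` picked purely for illustration (the simulation's actual defaults are not stated here). With that κ the critical gain is `g* = 0.5`, which is consistent with the two gains used in the commands above.

```python
def closed_loop_pole(lr, gain, kappa):
    """Pole of the linearised loop: 1 − lr + lr·g·κ."""
    return 1 - lr + lr * gain * kappa


def simulate_bias(lr, gain, kappa, b0=0.01, cycles=40):
    """Iterate b_{k+1} = pole · b_k and return the whole trajectory."""
    pole = closed_loop_pole(lr, gain, kappa)
    trajectory = [b0]
    for _ in range(cycles):
        trajectory.append(trajectory[-1] * pole)
    return trajectory


LR, KAPPA = 0.5, 2.0          # assumed values for this sketch
CRITICAL_GAIN = 1 / KAPPA     # g* = 1/κ

stable = simulate_bias(LR, gain=0.3, kappa=KAPPA)     # pole 0.8: bias decays
unstable = simulate_bias(LR, gain=0.65, kappa=KAPPA)  # pole 1.15: bias grows
critical_pole = closed_loop_pole(LR, CRITICAL_GAIN, KAPPA)
```

At `g = g*` the `lr` terms cancel and the pole is exactly 1 for any learning rate, which is why the critical gain depends only on κ.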

## Development

```bash
uv venv .venv && uv pip install -e '.[dev,plot]'
.venv/bin/pytest
.venv/bin/ruff check src tests
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for what makes a good PR (new state
helpers, new orchestrator gates, detector invariants).

## Citing

If you use lyapmon in your work, please cite it ([CITATION.cff](CITATION.cff)):

```bibtex
@software{lyapmon,
  author  = {Nguyen, Thuy},
  title   = {lyapmon: online Lyapunov-drift monitoring for ML retraining loops},
  url     = {https://github.com/sophie-nguyenthuthuy/lyapmon},
  version = {0.1.0},
  year    = {2026},
  license = {Apache-2.0},
}
```

## License

Apache-2.0.
