======================================================================
NER PII Benchmark — NerGuard Base × nvidia-pii
======================================================================

Tier: 2
Evaluated labels: 16
  System labels: 20
  Dataset labels: 54
  Mapping applied: True

Samples: 500
Tokens: 68359

--- Token-Level Metrics ---
  Precision (macro/micro/weighted): 0.4406 / 0.5698 / 0.7410
  Recall    (macro/micro/weighted): 0.5501 / 0.6596 / 0.6596
  F1        (macro/micro/weighted): 0.4168 / 0.6114 / 0.6473

--- Entity-Level Metrics (seqeval) ---
  Precision: 0.5469
  Recall:    0.6840
  F1:        0.6078
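
Note: the entity-level F1 above is consistent with the harmonic mean of the reported precision and recall, and the same holds for the micro-averaged token-level row. The sketch below is a minimal consistency check using only numbers copied from this report. The helper name f1_from_pr is illustrative and is not part of the benchmark code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


# Values copied from the report.
entity_f1 = f1_from_pr(0.5469, 0.6840)       # entity-level (seqeval)
micro_token_f1 = f1_from_pr(0.5698, 0.6596)  # token-level, micro average
macro_from_pr = f1_from_pr(0.4406, 0.5501)   # token-level, macro P and R

print(f"entity-level F1 from P/R : {entity_f1:.4f}")
print(f"micro token F1 from P/R  : {micro_token_f1:.4f}")
print(f"macro P/R harmonic mean  : {macro_from_pr:.4f}")
```

The first two values agree with the reported 0.6078 and 0.6114. The third does not agree with the reported macro F1 of 0.4168, which is expected when macro F1 is computed as the mean of per-label F1 scores rather than from the macro-averaged precision and recall.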

--- Latency ---
  Mean:   34.98 ms
  Median: 29.48 ms
  P95:    53.45 ms
  P99:    79.45 ms
  Throughput: 28.6 samples/sec
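
Note: the throughput figure matches the reciprocal of the mean latency, which suggests samples were processed one at a time. That reading is an inference from the numbers, not something the report states. A minimal sketch of the arithmetic, with variable names chosen for illustration:

```python
# Values copied from the report.
mean_latency_ms = 34.98
reported_throughput = 28.6  # samples/sec
num_samples = 500

# Throughput implied by sequential, unbatched inference (an assumption).
implied_throughput = 1000.0 / mean_latency_ms

# Total inference time implied by the mean latency.
implied_total_seconds = num_samples * mean_latency_ms / 1000.0

print(f"implied throughput : {implied_throughput:.1f} samples/sec")
print(f"implied total time : {implied_total_seconds:.2f} s")
```

If the benchmark had batched inputs, throughput would generally exceed 1000 / mean latency, so the close match here is at least compatible with a batch size of 1.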

--- Per-Label Token Metrics (BIO tags, sorted by F1) ---
  B-date_of_birth                P=1.0000  R=1.0000  F1=1.0000  (n=68)
  B-email                        P=0.9619  R=0.8860  F1=0.9224  (n=228)
  B-last_name                    P=0.9155  R=0.7065  F1=0.7975  (n=184)
  B-time                         P=0.8235  R=0.7467  F1=0.7832  (n=75)
  B-date                         P=0.7771  R=0.7423  F1=0.7593  (n=357)
  B-first_name                   P=0.7321  R=0.7791  F1=0.7549  (n=249)
  I-phone_number                 P=0.6095  R=0.9450  F1=0.7410  (n=109)
  I-street_address               P=0.9936  R=0.5735  F1=0.7273  (n=272)
  I-date                         P=0.8636  R=0.6230  F1=0.7238  (n=122)
  I-time                         P=0.9394  R=0.5849  F1=0.7209  (n=53)
  B-city                         P=0.5183  R=0.7456  F1=0.6115  (n=114)
  B-phone_number                 P=0.4907  R=0.6870  F1=0.5725  (n=115)
  I-last_name                    P=0.3333  R=1.0000  F1=0.5000  (n=1)
  B-postcode                     P=0.3381  R=0.8545  F1=0.4845  (n=55)
  B-age                          P=0.2963  R=0.8889  F1=0.4444  (n=18)
  I-city                         P=0.2857  R=0.8333  F1=0.4255  (n=24)
  B-street_address               P=0.2778  R=0.2885  F1=0.2830  (n=104)
  B-ssn                          P=0.1355  R=0.9667  F1=0.2377  (n=30)
  B-tax_id                       P=0.1176  R=0.6667  F1=0.2000  (n=12)
  B-certificate_license_number   P=0.0896  R=0.3750  F1=0.1446  (n=32)
  I-first_name                   P=0.0769  R=1.0000  F1=0.1429  (n=1)
  I-credit_debit_card            P=1.0000  R=0.0395  F1=0.0759  (n=152)
  B-credit_debit_card            P=0.2000  R=0.0192  F1=0.0351  (n=52)
  B-gender                       P=0.0000  R=0.0000  F1=0.0000  (n=19)
  I-certificate_license_number   P=0.0000  R=0.0000  F1=0.0000  (n=3)
  I-email                        P=0.0000  R=0.0000  F1=0.0000  (n=0)
  I-postcode                     P=0.0000  R=0.0000  F1=0.0000  (n=1)
  I-ssn                          P=0.0000  R=0.0000  F1=0.0000  (n=0)
  I-tax_id                       P=0.0000  R=0.0000  F1=0.0000  (n=0)
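
Note: the token-level macro and weighted F1 values reported earlier can be reproduced from the per-label rows above. Macro F1 matches the unweighted mean over all 29 labels, including the three labels with zero support, and weighted F1 matches the support-weighted mean. The sketch below is a minimal recomputation from the table; it is not the benchmark's own code, and the variable names are illustrative.

```python
# (label, F1, support) rows copied from the per-label table above.
ROWS = [
    ("B-date_of_birth", 1.0000, 68),
    ("B-email", 0.9224, 228),
    ("B-last_name", 0.7975, 184),
    ("B-time", 0.7832, 75),
    ("B-date", 0.7593, 357),
    ("B-first_name", 0.7549, 249),
    ("I-phone_number", 0.7410, 109),
    ("I-street_address", 0.7273, 272),
    ("I-date", 0.7238, 122),
    ("I-time", 0.7209, 53),
    ("B-city", 0.6115, 114),
    ("B-phone_number", 0.5725, 115),
    ("I-last_name", 0.5000, 1),
    ("B-postcode", 0.4845, 55),
    ("B-age", 0.4444, 18),
    ("I-city", 0.4255, 24),
    ("B-street_address", 0.2830, 104),
    ("B-ssn", 0.2377, 30),
    ("B-tax_id", 0.2000, 12),
    ("B-certificate_license_number", 0.1446, 32),
    ("I-first_name", 0.1429, 1),
    ("I-credit_debit_card", 0.0759, 152),
    ("B-credit_debit_card", 0.0351, 52),
    ("B-gender", 0.0000, 19),
    ("I-certificate_license_number", 0.0000, 3),
    ("I-email", 0.0000, 0),
    ("I-postcode", 0.0000, 1),
    ("I-ssn", 0.0000, 0),
    ("I-tax_id", 0.0000, 0),
]

# Macro: every label counts equally, regardless of support.
macro_f1 = sum(f1 for _, f1, _ in ROWS) / len(ROWS)

# Weighted: each label's F1 is weighted by its gold-token support.
total_support = sum(n for _, _, n in ROWS)
weighted_f1 = sum(f1 * n for _, f1, n in ROWS) / total_support

print(f"labels={len(ROWS)} support={total_support}")
print(f"macro F1    = {macro_f1:.4f}")
print(f"weighted F1 = {weighted_f1:.4f}")
```

Because zero-support labels such as I-email, I-ssn, and I-tax_id contribute an F1 of 0 to the macro mean, they pull the macro score down even though no gold tokens carry those labels. Excluding them would raise the macro figure, so comparisons with other runs should use the same label set.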

--- Per-Length Bucket ---
  short   : F1=0.7442 (n=6)
  medium  : F1=0.4577 (n=123)
  long    : F1=0.4193 (n=371)

--- Error Summary ---
  False positives: 53
  False negatives: 360