======================================================================
NER PII Benchmark — Piiranha × nvidia-pii
======================================================================

Tier: 2
Evaluated labels: 14
  System labels: 17
  Dataset labels: 54
  Mapping applied: True

Samples: 1000
Tokens: 135894

--- Token-Level Metrics ---
  Precision (macro/micro/weighted): 0.5131 / 0.7130 / 0.8030
  Recall    (macro/micro/weighted): 0.5020 / 0.5974 / 0.5974
  F1        (macro/micro/weighted): 0.4731 / 0.6501 / 0.6659

--- Entity-Level Metrics (seqeval) ---
  Precision: 0.6765
  Recall:    0.5713
  F1:        0.6195
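
The micro token-level F1 and the entity-level F1 above are each the harmonic mean of the precision and recall printed beside them. The macro and weighted F1 values are averages of per-tag F1 scores, so they cannot be recovered this way. A minimal Python check, where the function name f1_from_pr is illustrative and not part of the benchmark tooling:

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the report above.
micro_token_f1 = f1_from_pr(0.7130, 0.5974)
entity_f1 = f1_from_pr(0.6765, 0.5713)

print(f"micro token F1: {micro_token_f1:.4f}")
print(f"entity F1:      {entity_f1:.4f}")
```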

--- Latency ---
  Mean:   30.91 ms
  Median: 30.38 ms
  P95:    35.57 ms
  P99:    37.41 ms
  Throughput: 32.4 samples/sec
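
The throughput figure is consistent with samples being processed one at a time, so that throughput is roughly the reciprocal of the mean latency. The report does not state the batch size or concurrency, so the sketch below is only a plausibility check under that assumption:

```python
mean_latency_ms = 30.91  # from the Latency section above

# Assumption: sequential, single-sample inference (not stated in the report).
throughput = 1000.0 / mean_latency_ms

print(f"{throughput:.1f} samples/sec")
```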

--- Per-Tag Scores (BIO tags, sorted by F1; n = support) ---
  I-account_number               P=1.0000  R=1.0000  F1=1.0000  (n=6)
  B-postcode                     P=0.9022  R=0.8646  F1=0.8830  (n=96)
  B-credit_debit_card            P=0.9859  R=0.7368  F1=0.8434  (n=95)
  I-credit_debit_card            P=1.0000  R=0.7226  F1=0.8390  (n=274)
  B-email                        P=0.9626  R=0.6781  F1=0.7957  (n=494)
  B-first_name                   P=0.9443  R=0.6840  F1=0.7934  (n=595)
  B-last_name                    P=0.9556  R=0.5537  F1=0.7012  (n=428)
  I-last_name                    P=0.5000  R=1.0000  F1=0.6667  (n=1)
  I-street_address               P=0.9712  R=0.4927  F1=0.6537  (n=479)
  B-city                         P=0.5623  R=0.7488  F1=0.6423  (n=211)
  I-phone_number                 P=0.6436  R=0.5817  F1=0.6111  (n=208)
  B-phone_number                 P=0.6631  R=0.5041  F1=0.5727  (n=246)
  B-user_name                    P=0.4877  R=0.6930  F1=0.5725  (n=114)
  B-tax_id                       P=0.4545  R=0.6818  F1=0.5455  (n=22)
  B-ssn                          P=0.3701  R=0.9038  F1=0.5251  (n=52)
  B-account_number               P=0.4394  R=0.5979  F1=0.5066  (n=97)
  B-password                     P=1.0000  R=0.3176  F1=0.4821  (n=85)
  I-city                         P=0.2319  R=0.7273  F1=0.3516  (n=44)
  I-postcode                     P=0.2500  R=0.5000  F1=0.3333  (n=2)
  B-street_address               P=0.2108  R=0.2131  F1=0.2120  (n=183)
  B-certificate_license_number   P=0.2955  R=0.1512  F1=0.2000  (n=86)
  I-first_name                   P=0.0233  R=0.2000  F1=0.0417  (n=5)
  I-certificate_license_number   P=0.0000  R=0.0000  F1=0.0000  (n=5)
  I-email                        P=0.0000  R=0.0000  F1=0.0000  (n=1)
  I-ssn                          P=0.0000  R=0.0000  F1=0.0000  (n=0)
  I-tax_id                       P=0.0000  R=0.0000  F1=0.0000  (n=1)
  I-user_name                    P=0.0000  R=0.0000  F1=0.0000  (n=0)
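
The report does not say how the macro and weighted averages are formed. The figures are consistent with averaging over all 27 B-/I- tags listed above, including I-ssn and I-user_name, which have no gold tokens (n=0) and therefore contribute an F1 of 0 to the macro mean. The Python sketch below re-derives the macro and weighted F1 from the table; the tuple layout and variable names are illustrative, not taken from the benchmark code.

```python
# (tag, F1, support) copied from the per-tag table above.
ROWS = [
    ("I-account_number", 1.0000, 6),
    ("B-postcode", 0.8830, 96),
    ("B-credit_debit_card", 0.8434, 95),
    ("I-credit_debit_card", 0.8390, 274),
    ("B-email", 0.7957, 494),
    ("B-first_name", 0.7934, 595),
    ("B-last_name", 0.7012, 428),
    ("I-last_name", 0.6667, 1),
    ("I-street_address", 0.6537, 479),
    ("B-city", 0.6423, 211),
    ("I-phone_number", 0.6111, 208),
    ("B-phone_number", 0.5727, 246),
    ("B-user_name", 0.5725, 114),
    ("B-tax_id", 0.5455, 22),
    ("B-ssn", 0.5251, 52),
    ("B-account_number", 0.5066, 97),
    ("B-password", 0.4821, 85),
    ("I-city", 0.3516, 44),
    ("I-postcode", 0.3333, 2),
    ("B-street_address", 0.2120, 183),
    ("B-certificate_license_number", 0.2000, 86),
    ("I-first_name", 0.0417, 5),
    ("I-certificate_license_number", 0.0000, 5),
    ("I-email", 0.0000, 1),
    ("I-ssn", 0.0000, 0),
    ("I-tax_id", 0.0000, 1),
    ("I-user_name", 0.0000, 0),
]

supports = [n for _, _, n in ROWS]

# Macro: unweighted mean over every listed tag, zero-support tags included.
macro_f1 = sum(f1 for _, f1, _ in ROWS) / len(ROWS)

# Weighted: mean of per-tag F1 weighted by support (gold token count).
weighted_f1 = sum(f1 * n for _, f1, n in ROWS) / sum(supports)

# Macro restricted to tags that actually occur in the gold data.
seen = [f1 for _, f1, n in ROWS if n > 0]
macro_f1_seen = sum(seen) / len(seen)

print(f"macro F1 (all {len(ROWS)} tags): {macro_f1:.4f}")
print(f"weighted F1:             {weighted_f1:.4f}")
print(f"macro F1 (n > 0 only):   {macro_f1_seen:.4f}")
```

Because the inputs are rounded to four decimals, expect agreement with the reported 0.4731 and 0.6659 only to about that precision. Excluding the two zero-support tags raises the macro F1 to roughly 0.51.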

--- Per-Length Bucket ---
  short   : F1=0.7132 (n=9)
  medium  : F1=0.5198 (n=238)
  long    : F1=0.4992 (n=753)

--- Error Summary ---
  False positives: 48
  False negatives: 500