Lethe

Data anonymization service, interactive architecture diagrams

Data Flow

Sequence of operations as data moves through the anonymization pipeline.

sequenceDiagram
    participant U as User / Client
    participant FP as Format Parser
    participant PS as PII Scanner
    participant ME as Mapping Engine
    participant R as Replacer
    participant O as Output Writer

    U->>FP: Submit data (CSV, JSON, text, DB dump)
    activate FP
    FP->>FP: Detect format & parse structure
    FP->>PS: Structured records
    deactivate FP

    activate PS
    PS->>PS: Run pattern recognizers (regex)
    PS->>PS: Run NER model (spaCy)
    PS->>PS: Run column heuristics
    PS->>PS: Merge results + confidence scoring
    PS->>ME: Detected PII entities with spans
    deactivate PS

    activate ME
    ME->>ME: Lookup existing mapping
    alt PII not yet mapped
        ME->>ME: Generate fake replacement
        ME->>ME: Store in session index
    end
    ME->>R: Mapping pairs (original -> fake)
    deactivate ME

    activate R
    R->>R: Replace PII spans in records
    R->>O: Anonymized records
    deactivate R

    activate O
    O->>O: Serialize to original format
    O->>U: Anonymized output file
    deactivate O

    Note over ME: Session index destroyed after completion
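
The Mapping Engine steps above (lookup, generate on miss, store, destroy at the end) can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual code: the `SessionIndex` class, its method names, and the counter-based generator are all hypothetical, and the generator is a stand-in for whatever produces realistic fake values.

```python
import itertools
from typing import Callable, Dict, Tuple


class SessionIndex:
    """In-memory original -> fake mapping that lives for one run only (hypothetical)."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        self._generate = generate
        self._pairs: Dict[Tuple[str, str], str] = {}

    def fake_for(self, pii_type: str, original: str) -> str:
        """Look up an existing mapping; generate and store one if missing."""
        key = (pii_type, original)
        if key not in self._pairs:
            self._pairs[key] = self._generate(pii_type, original)
        return self._pairs[key]

    def __len__(self) -> int:
        return len(self._pairs)

    def destroy(self) -> None:
        """Drop every original -> fake pair once the run has completed."""
        self._pairs.clear()


def counter_generator() -> Callable[[str, str], str]:
    """Stand-in generator: returns PERSON_0001, PERSON_0002, and so on."""
    counter = itertools.count(1)

    def generate(pii_type: str, original: str) -> str:
        return f"{pii_type}_{next(counter):04d}"

    return generate


index = SessionIndex(counter_generator())
first = index.fake_for("PERSON", "Jane Smith")
again = index.fake_for("PERSON", "Jane Smith")   # reuses the stored mapping
other = index.fake_for("PERSON", "Bob Jones")    # new value, new mapping
index.destroy()                                  # nothing survives the session
```

Keying on the pair (type, original) keeps the same string from colliding across PII types, which is one possible design choice rather than something the diagram specifies.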
      

Component Architecture

System components and their relationships.

graph TB
    subgraph Input Layer
        A[File Input<br/>CSV, JSON, Text] --> B[Format Parser]
        A2[DB Input<br/>PostgreSQL, MySQL] --> B
    end

    subgraph Detection Layer
        B --> C[PII Scanner]
        C --> C1[Pattern Recognizers<br/>Regex-based]
        C --> C2[NER Model<br/>spaCy / Transformer]
        C --> C3[Column Heuristics<br/>Header inference]
        C1 --> D[Confidence Merger]
        C2 --> D
        C3 --> D
    end

    subgraph Mapping Layer
        D --> E[Mapping Engine]
        E <--> F[(Session Index<br/>PII to Fake mapping)]
        E --> G[Faker Provider<br/>Locale-aware generation]
    end

    subgraph Output Layer
        E --> H[Replacer]
        H --> I[Format Writer]
        I --> J[Anonymized Output]
    end

    style A fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
    style A2 fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
    style B fill:#fff,stroke:#dee2e6,color:#1a1a2e
    style C fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
    style C1 fill:#fff8f0,stroke:#dee2e6,color:#495057
    style C2 fill:#fff8f0,stroke:#dee2e6,color:#495057
    style C3 fill:#fff8f0,stroke:#dee2e6,color:#495057
    style D fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
    style E fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
    style F fill:#fff8f0,stroke:#adb5bd,color:#495057
    style G fill:#fff,stroke:#dee2e6,color:#495057
    style H fill:#fff,stroke:#dee2e6,color:#1a1a2e
    style I fill:#fff,stroke:#dee2e6,color:#1a1a2e
    style J fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e

Entity Detection Flowchart

Decision tree for how an entity gets classified and anonymized.

flowchart TD
    A[Input Text / Field Value] --> B{Column header<br/>suggests PII?}
    B -->|Yes| C[Assign PII type<br/>from header name]
    B -->|No| D{Matches regex<br/>pattern?}
    D -->|Yes| E[Assign PII type<br/>from pattern match]
    D -->|No| F{NER model<br/>detects entity?}
    F -->|Yes| G[Assign PII type<br/>from NER label]
    F -->|No| H[Not PII<br/>Pass through unchanged]
    C --> I{Confidence<br/>above threshold?}
    E --> I
    G --> I
    I -->|Yes| J[Lookup in<br/>Mapping Index]
    I -->|No| H
    J --> K{Existing<br/>mapping found?}
    K -->|Yes| L[Reuse existing<br/>fake value]
    K -->|No| M[Generate new<br/>fake value via Faker]
    M --> N[Store in<br/>Mapping Index]
    N --> O[Replace PII<br/>in output]
    L --> O
    O --> P[Anonymized Value]

    style A fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
    style H fill:#f8f9fa,stroke:#dee2e6,color:#868e96
    style P fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
    style J fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
    style N fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
    style L fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
    style M fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
    style C fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
    style E fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
    style G fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
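
The classification half of the decision tree (header check, then regex, then NER, then the confidence gate) maps naturally onto a short cascade. The sketch below is illustrative only: the header hints, the two patterns, the confidence values, and the 0.6 threshold are assumptions, and the NER step is passed in as a plain callable so the example runs without a model installed.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical header hints and patterns; a real deployment would have many more.
HEADER_HINTS = {"email": "EMAIL", "phone": "PHONE", "name": "PERSON"}
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
}

# An NER callable returns (label, confidence), or None when it finds nothing.
NerFn = Callable[[str], Optional[Tuple[str, float]]]


def classify(
    value: str,
    header: Optional[str] = None,
    ner: Optional[NerFn] = None,
    threshold: float = 0.6,
) -> Optional[str]:
    """Return a PII type, or None when the value should pass through unchanged."""
    guess: Optional[Tuple[str, float]] = None

    # 1. Column header suggests PII?
    if header:
        for hint, pii_type in HEADER_HINTS.items():
            if hint in header.lower():
                guess = (pii_type, 0.9)    # assumed confidence
                break

    # 2. Matches a regex pattern?
    if guess is None:
        for pii_type, pattern in PATTERNS.items():
            if pattern.fullmatch(value.strip()):
                guess = (pii_type, 0.95)   # assumed confidence
                break

    # 3. NER model detects an entity?
    if guess is None and ner is not None:
        guess = ner(value)

    # 4. Confidence above threshold? Otherwise treat as not PII.
    if guess is None or guess[1] < threshold:
        return None
    return guess[0]
```

For example, `classify("jane@example.com")` yields `"EMAIL"` through the regex branch, while `classify("Refund", header="issue")` falls through every branch and returns `None`.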

Cross-Table FK Integrity

How foreign key relationships are preserved through consistent mapping across tables.

graph LR
    subgraph Input Tables
        T1["orders
        --------
        customer_id: 42
        customer_name: Jane Smith
        amount: $142"]
        T2["support_tickets
        --------
        user_id: 42
        user_name: Jane Smith
        issue: Refund"]
    end

    subgraph Global Mapping Index
        M1["customer_id: 42 -> 7701"]
        M2["Jane Smith -> Maria Lopez"]
        M3["Bob Jones -> Tom Chen"]
    end

    subgraph Output Tables
        O1["orders
        --------
        customer_id: 7701
        customer_name: Maria Lopez
        amount: $142"]
        O2["support_tickets
        --------
        user_id: 7701
        user_name: Maria Lopez
        issue: Refund"]
    end

    T1 --> M1
    T1 --> M2
    T2 --> M1
    T2 --> M2
    M1 --> O1
    M2 --> O1
    M1 --> O2
    M2 --> O2

    style T1 fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
    style T2 fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
    style M1 fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
    style M2 fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
    style M3 fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
    style O1 fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
    style O2 fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
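
The diagram's example can be reproduced with a single index shared by every table. In this sketch, columns that refer to the same entity (such as `customer_id` and `user_id`) are assigned the same kind, so they resolve to the same fake value. The `COLUMN_KINDS` table, the `GlobalMappingIndex` class, and the canned generator are hypothetical; the canned values simply stand in for a real fake-data provider so the output matches the diagram.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Row = Dict[str, object]

# Hypothetical column -> entity kind table. Columns that share a kind share mappings.
COLUMN_KINDS = {
    "customer_id": "customer_id",
    "user_id": "customer_id",
    "customer_name": "person_name",
    "user_name": "person_name",
}


class GlobalMappingIndex:
    """One original -> fake mapping shared across all tables in a run."""

    def __init__(self, generate: Callable[[str], object]) -> None:
        self._generate = generate
        self._pairs: Dict[Tuple[str, Hashable], object] = {}

    def fake_for(self, kind: str, original: Hashable) -> object:
        key = (kind, original)
        if key not in self._pairs:
            self._pairs[key] = self._generate(kind)
        return self._pairs[key]


def anonymize_table(rows: List[Row], index: GlobalMappingIndex) -> List[Row]:
    """Replace values in mapped columns; leave every other column untouched."""
    return [
        {
            col: index.fake_for(COLUMN_KINDS[col], val) if col in COLUMN_KINDS else val
            for col, val in row.items()
        }
        for row in rows
    ]


def canned_generator() -> Callable[[str], object]:
    """Stand-in for a fake-data provider, preloaded with the diagram's values."""
    canned = {
        "customer_id": iter([7701, 7702]),
        "person_name": iter(["Maria Lopez", "Tom Chen"]),
    }
    return lambda kind: next(canned[kind])


index = GlobalMappingIndex(canned_generator())
orders = anonymize_table(
    [{"customer_id": 42, "customer_name": "Jane Smith", "amount": "$142"}], index
)
tickets = anonymize_table(
    [{"user_id": 42, "user_name": "Jane Smith", "issue": "Refund"}], index
)
```

Both output tables end up with ID 7701 and the name Maria Lopez, so a join between `orders.customer_id` and `support_tickets.user_id` still matches after anonymization.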