Data Flow
Sequence of operations as data moves through the anonymization pipeline.
sequenceDiagram
participant U as User / Client
participant FP as Format Parser
participant PS as PII Scanner
participant ME as Mapping Engine
participant R as Replacer
participant O as Output Writer
U->>FP: Submit data (CSV, JSON, text, DB dump)
activate FP
FP->>FP: Detect format & parse structure
FP->>PS: Structured records
deactivate FP
activate PS
PS->>PS: Run pattern recognizers (regex)
PS->>PS: Run NER model (spaCy)
PS->>PS: Run column heuristics
PS->>PS: Merge results + confidence scoring
PS->>ME: Detected PII entities with spans
deactivate PS
activate ME
ME->>ME: Lookup existing mapping
alt PII not yet mapped
ME->>ME: Generate fake replacement
ME->>ME: Store in session index
end
ME->>R: Mapping pairs (original -> fake)
deactivate ME
activate R
R->>R: Replace PII spans in records
R->>O: Anonymized records
deactivate R
activate O
O->>O: Serialize to original format
O->>U: Anonymized output file
deactivate O
Note over ME: Session index destroyed after completion
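The lookup, generate, store, and replace steps in the sequence above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `Span`, `SessionIndex`, and `replace_spans` are hypothetical names, spans are assumed not to overlap, and a simple counter stands in for the fake-value generator so the example needs no third-party packages.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int     # inclusive character offset
    end: int       # exclusive character offset
    pii_type: str  # e.g. "PERSON", "EMAIL"


class SessionIndex:
    """Hypothetical in-memory original -> fake mapping for a single run."""

    def __init__(self):
        self._pairs = {}
        self._counter = itertools.count(1)

    def fake_for(self, pii_type, original):
        key = (pii_type, original)
        if key not in self._pairs:
            # Placeholder generator (assumption); a real pipeline would
            # ask a fake-data provider for a realistic value here.
            self._pairs[key] = f"<{pii_type}_{next(self._counter)}>"
        return self._pairs[key]

    def destroy(self):
        # Drop every mapping once the output has been written.
        self._pairs.clear()


def replace_spans(text, spans, index):
    """Replace each (non-overlapping) span with its mapped fake value."""
    ordered = sorted(spans, key=lambda s: s.start)
    # Resolve fakes in reading order so placeholder numbering is stable.
    fakes = [(s, index.fake_for(s.pii_type, text[s.start:s.end])) for s in ordered]
    # Splice right to left so earlier offsets remain valid.
    for span, fake in reversed(fakes):
        text = text[:span.start] + fake + text[span.end:]
    return text
```

Because the index is keyed on the original value, a repeated name receives the same replacement everywhere in the run, and calling `destroy()` afterwards mirrors the note that the session index does not outlive the job.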
Component Architecture
System components and their relationships.
graph TB
subgraph Input Layer
A[File Input<br/>CSV, JSON, Text] --> B[Format Parser]
A2[DB Input<br/>PostgreSQL, MySQL] --> B
end
subgraph Detection Layer
B --> C[PII Scanner]
C --> C1[Pattern Recognizers<br/>Regex-based]
C --> C2[NER Model<br/>spaCy / Transformer]
C --> C3[Column Heuristics<br/>Header inference]
C1 --> D[Confidence Merger]
C2 --> D
C3 --> D
end
subgraph Mapping Layer
D --> E[Mapping Engine]
E <--> F[(Session Index<br/>PII to Fake mapping)]
E --> G[Faker Provider<br/>Locale-aware generation]
end
subgraph Output Layer
E --> H[Replacer]
H --> I[Format Writer]
I --> J[Anonymized Output]
end
style A fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
style A2 fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
style B fill:#fff,stroke:#dee2e6,color:#1a1a2e
style C fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
style C1 fill:#fff8f0,stroke:#dee2e6,color:#495057
style C2 fill:#fff8f0,stroke:#dee2e6,color:#495057
style C3 fill:#fff8f0,stroke:#dee2e6,color:#495057
style D fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
style E fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
style F fill:#fff8f0,stroke:#adb5bd,color:#495057
style G fill:#fff,stroke:#dee2e6,color:#495057
style H fill:#fff,stroke:#dee2e6,color:#1a1a2e
style I fill:#fff,stroke:#dee2e6,color:#1a1a2e
style J fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
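The Confidence Merger receives candidates from three recognizers that may flag the same or overlapping text. One possible merge rule, shown here only as a sketch, is a greedy pass that accepts candidates from highest to lowest confidence and skips any that overlap an already accepted span. The `Detection` type, the `merge_detections` function, and the tie-break on span length are assumptions for illustration; the diagram does not specify the actual merge strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    pii_type: str
    confidence: float  # 0.0 to 1.0
    source: str        # "regex", "ner", or "header"


def merge_detections(detections):
    """Greedy merge: strongest candidate wins among overlapping spans."""
    kept = []
    # Highest confidence first; on ties, prefer the longer span.
    ranked = sorted(detections, key=lambda d: (-d.confidence, -(d.end - d.start)))
    for cand in ranked:
        overlaps = any(cand.start < k.end and k.start < cand.end for k in kept)
        if not overlaps:
            kept.append(cand)
    # Return in document order for the downstream replacer.
    return sorted(kept, key=lambda d: d.start)
```

A greedy pass is easy to reason about but discards the weaker signal entirely; a merger that boosts confidence when several recognizers agree would be an alternative design.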
Entity Detection Flowchart
Decision flow for how a field value is classified as PII and then anonymized.
flowchart TD
A[Input Text / Field Value] --> B{Column header<br/>suggests PII?}
B -->|Yes| C[Assign PII type<br/>from header name]
B -->|No| D{Matches regex<br/>pattern?}
D -->|Yes| E[Assign PII type<br/>from pattern match]
D -->|No| F{NER model<br/>detects entity?}
F -->|Yes| G[Assign PII type<br/>from NER label]
F -->|No| H[Not PII<br/>Pass through unchanged]
C --> I{Confidence<br/>above threshold?}
E --> I
G --> I
I -->|Yes| J[Lookup in<br/>Mapping Index]
I -->|No| H
J --> K{Existing<br/>mapping found?}
K -->|Yes| L[Reuse existing<br/>fake value]
K -->|No| M[Generate new<br/>fake value via Faker]
M --> N[Store in<br/>Mapping Index]
N --> O[Replace PII<br/>in output]
L --> O
O --> P[Anonymized Value]
style A fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
style H fill:#f8f9fa,stroke:#dee2e6,color:#868e96
style P fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
style J fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
style N fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
style L fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
style M fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
style C fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
style E fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
style G fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
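The classification half of the flowchart above (header check, then regex, then NER, then the confidence gate) can be sketched as a single function. Everything named here is an assumption for illustration: the `HEADER_HINTS` and `PATTERNS` tables, the fixed per-route confidence values, the default threshold of 0.7, and the `ner` callable, which stands in for a real model and is expected to return a `(label, score)` pair or `None`.

```python
import re

# Hypothetical header names that imply a PII type.
HEADER_HINTS = {"email": "EMAIL", "phone": "PHONE", "customer_name": "PERSON"}

# Deliberately simple illustrative patterns, not production-grade validators.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

# Assumed fixed confidence for the two rule-based routes.
HEADER_CONFIDENCE = 0.9
REGEX_CONFIDENCE = 0.95


def classify(value, header=None, ner=None, threshold=0.7):
    """Return a PII type for a field value, or None to pass it through."""
    candidate = None
    if header is not None and header.lower() in HEADER_HINTS:
        candidate = (HEADER_HINTS[header.lower()], HEADER_CONFIDENCE)
    else:
        for pii_type, pattern in PATTERNS.items():
            # fullmatch suits whole field values; free text would use finditer.
            if pattern.fullmatch(value.strip()):
                candidate = (pii_type, REGEX_CONFIDENCE)
                break
        if candidate is None and ner is not None:
            candidate = ner(value)  # (label, score) or None
    if candidate is None:
        return None
    pii_type, confidence = candidate
    # Only candidates strictly above the threshold are treated as PII.
    return pii_type if confidence > threshold else None
```

For example, `classify("jane@example.com")` takes the regex route, while `classify("Jane Smith", ner=my_model)` reaches the NER step only because neither a header hint nor a pattern applies.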
Cross-Table FK Integrity
How foreign key relationships are preserved through consistent mapping across tables.
graph LR
subgraph Input Tables
T1["orders<br/>--------<br/>customer_id: 42<br/>customer_name: Jane Smith<br/>amount: $142"]
T2["support_tickets<br/>--------<br/>user_id: 42<br/>user_name: Jane Smith<br/>issue: Refund"]
end
subgraph Global Mapping Index
M1["customer_id: 42 -> 7701"]
M2["Jane Smith -> Maria Lopez"]
M3["Bob Jones -> Tom Chen"]
end
subgraph Output Tables
O1["orders<br/>--------<br/>customer_id: 7701<br/>customer_name: Maria Lopez<br/>amount: $142"]
O2["support_tickets<br/>--------<br/>user_id: 7701<br/>user_name: Maria Lopez<br/>issue: Refund"]
end
T1 --> M1
T1 --> M2
T2 --> M1
T2 --> M2
M1 --> O1
M2 --> O1
M1 --> O2
M2 --> O2
style T1 fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
style T2 fill:#fff3e6,stroke:#ff6b35,color:#1a1a2e
style M1 fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
style M2 fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
style M3 fill:#e6f7f5,stroke:#2ec4b6,color:#1a1a2e
style O1 fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
style O2 fill:#e8f4f8,stroke:#4361ee,color:#1a1a2e
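The diagram above relies on `orders.customer_id` and `support_tickets.user_id` resolving to the same mapping entry even though the column names differ. One way to express that, sketched here under assumptions, is to assign each sensitive column to a shared namespace and key the global index on `(namespace, original value)`. The `COLUMN_NAMESPACE` table, the `GlobalMappingIndex` class, and the caller-supplied generator functions are hypothetical; the diagram does not say how columns are linked in practice.

```python
import itertools

# Hypothetical configuration: columns that refer to the same kind of value
# share a namespace, regardless of table or column name.
COLUMN_NAMESPACE = {
    ("orders", "customer_id"): "customer_id",
    ("support_tickets", "user_id"): "customer_id",
    ("orders", "customer_name"): "person_name",
    ("support_tickets", "user_name"): "person_name",
}


class GlobalMappingIndex:
    """Maps (namespace, original) to one fake value shared by all tables."""

    def __init__(self, generators):
        self._generators = generators  # namespace -> zero-argument callable
        self._pairs = {}

    def fake_for(self, namespace, original):
        key = (namespace, original)
        if key not in self._pairs:
            self._pairs[key] = self._generators[namespace]()
        return self._pairs[key]


def anonymize_tables(tables, index):
    """Return a copy of every table with mapped columns replaced."""
    result = {}
    for table, rows in tables.items():
        new_rows = []
        for row in rows:
            new_row = {}
            for column, value in row.items():
                namespace = COLUMN_NAMESPACE.get((table, column))
                # Unmapped columns (amount, issue) pass through unchanged.
                new_row[column] = value if namespace is None else index.fake_for(namespace, value)
            new_rows.append(new_row)
        result[table] = new_rows
    return result


# Demo generators chosen to reproduce the values in the diagram.
_ids = itertools.count(7701)
_names = iter(["Maria Lopez", "Tom Chen"])
demo_index = GlobalMappingIndex({
    "customer_id": lambda: next(_ids),
    "person_name": lambda: next(_names),
})
```

Since both ID columns share the `customer_id` namespace, the value 42 is generated once and reused, so a join between the anonymized tables still matches the same rows as before.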