Metadata-Version: 2.4
Name: datemonkey
Version: 0.1.0
Summary: Batch date parsing with ambiguity detection, confidence scores, and format lock-in.
Author-email: RexBytes <pythonic@rexbytes.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/RexBytes/datemonkey
Project-URL: Repository, https://github.com/RexBytes/datemonkey
Project-URL: Issues, https://github.com/RexBytes/datemonkey/issues
Keywords: date,parsing,ambiguity,detection,batch,excel
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# datemonkey

Batch date parsing with ambiguity detection, confidence scores, and format lock-in.

**The problem:** `dateutil.parser.parse("01/02/03")` silently picks one interpretation (month first by default) and gives no hint that others are equally valid. DD/MM vs MM/DD ambiguity corrupts joins, aggregations, and reports. datemonkey detects the ambiguity and reports it instead of guessing.
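To make the ambiguity concrete, here is a minimal stdlib-only sketch (it does not use datemonkey). It reads the same string with three plausible layouts. The `candidates` mapping is an illustrative choice, not a list taken from the library.

```python
from datetime import datetime

raw = "01/02/03"

# Three plausible layouts for the same six digits (illustrative only).
candidates = {
    "MM/DD/YY": "%m/%d/%y",
    "DD/MM/YY": "%d/%m/%y",
    "YY/MM/DD": "%y/%m/%d",
}

# Parse the same string once per layout and keep the ISO date.
interpretations = {
    label: datetime.strptime(raw, fmt).date().isoformat()
    for label, fmt in candidates.items()
}

for label, iso in interpretations.items():
    print(f"{label}: {iso}")
# MM/DD/YY: 2003-01-02
# DD/MM/YY: 2003-02-01
# YY/MM/DD: 2001-02-03
```

All three parses succeed, so a parser that returns only one of them hides the other two.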

## Install

```bash
pip install datemonkey
```

## Quick Start

### Detect format from a column of values

```python
from datemonkey import detect_format

result = detect_format(["15/03/2024", "20/04/2024", "25/12/2024"])
print(result.format.label)      # "European date (DD/MM/YYYY)"
print(result.confidence)         # Confidence.HIGH
print(result.is_ambiguous)       # False — day > 12 resolves it
```

### Ambiguity detection

```python
result = detect_format(["01/02/2024", "03/04/2024", "05/06/2024"])
print(result.is_ambiguous)       # True
print(result.ambiguities)        # [AmbiguityType.DAY_MONTH_SWAP]
print(result.warnings)
# ["Ambiguous: cannot distinguish US date (MM/DD/YYYY) from European date (DD/MM/YYYY) ..."]
```

### Resolve ambiguity with locale preference

```python
result = detect_format(["01/02/2024", "03/04/2024"], locale_preference="eu")
print(result.format.label)       # "European date (DD/MM/YYYY)"
```
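The day/month reasoning in the examples above can be sketched with the stdlib alone. This is a simplified illustration under stated assumptions, not datemonkey's actual implementation: `classify_day_month` is a hypothetical helper, it only handles `A/B/YYYY` strings, and it does no range checking beyond the "greater than 12" test.

```python
def classify_day_month(values, locale_preference=None):
    """Return 'DD/MM', 'MM/DD', 'ambiguous', or 'inconsistent'.

    Hypothetical helper: expects strings shaped like 'A/B/YYYY'.
    """
    first_over_12 = False
    second_over_12 = False
    for value in values:
        first, second, _year = (int(part) for part in value.split("/"))
        first_over_12 = first_over_12 or first > 12
        second_over_12 = second_over_12 or second > 12

    if first_over_12 and second_over_12:
        return "inconsistent"  # no single day/month order fits every row
    if first_over_12:
        return "DD/MM"  # a first field above 12 can only be a day
    if second_over_12:
        return "MM/DD"  # a second field above 12 can only be a day
    # Every field is <= 12, so the data alone cannot decide.
    if locale_preference == "eu":
        return "DD/MM"
    if locale_preference == "us":
        return "MM/DD"
    return "ambiguous"


print(classify_day_month(["15/03/2024", "20/04/2024", "25/12/2024"]))  # DD/MM
print(classify_day_month(["01/02/2024", "03/04/2024", "05/06/2024"]))  # ambiguous
print(classify_day_month(["01/02/2024", "03/04/2024"], locale_preference="eu"))  # DD/MM
```

The preference is consulted only after the data has failed to decide, which mirrors the "only used when data alone can't resolve" behaviour described in the API reference.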

### Parse a batch of dates

```python
from datemonkey import parse_dates

batch = parse_dates(["2024-03-15", "2024-04-20", "2024-12-25"])
print(batch.ok)                  # True
print(batch.dates)               # [datetime(2024,3,15), datetime(2024,4,20), datetime(2024,12,25)]
print(batch.iso_strings)         # ["2024-03-15T00:00:00", ...]
```

### Format lock-in

```python
from datemonkey import parse_dates, ISO_8601

batch = parse_dates(["2024-03-15", "03/15/2024"], format=ISO_8601)
print(batch.results[0].ok)       # True  — matches ISO
print(batch.results[1].ok)       # False — doesn't match, flagged not re-guessed
```
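The idea behind lock-in can be illustrated without the library. The sketch below assumes a plain strptime format string and uses a hypothetical `parse_locked` helper. Each value is tried against one fixed format, and a mismatch is recorded rather than retried with another format.

```python
from datetime import datetime


def parse_locked(values, fmt):
    """Parse every value with one fixed strptime format; never fall back."""
    results = []
    for value in values:
        try:
            results.append((value, datetime.strptime(value, fmt), None))
        except ValueError as exc:
            # Record the failure; do not guess a different format.
            results.append((value, None, str(exc)))
    return results


rows = parse_locked(["2024-03-15", "03/15/2024"], "%Y-%m-%d")
for original, parsed, error in rows:
    status = "ok" if parsed is not None else f"flagged: {error}"
    print(f"{original:12s} {status}")
```

Flagging instead of re-guessing keeps one stray row from being parsed under a different convention than the rest of the column.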

### Strict mode

```python
batch = parse_dates(["01/02/2024", "03/04/2024"], strict=True)
print(batch.parsed_count)        # 0 — refuses to parse ambiguous data
print(batch.warnings)            # ["Strict mode: refusing to parse due to DD/MM vs MM/DD ambiguity..."]
```

### Excel serial dates

```python
from datemonkey import parse_dates, excel_serial_to_datetime

# Single value
dt = excel_serial_to_datetime(45292)  # datetime(2024, 1, 1)

# Batch — auto-detected
batch = parse_dates(["45292", "45293", "45294"])
print(batch.detected_format.label)  # "Excel serial date number"
```
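For reference, the arithmetic behind Excel serial dates can be sketched as follows. This assumes the 1900 date system and serials of 61 or more (dates from 1900-03-01 onward). Earlier serials are off by one because Excel treats 1900 as a leap year. `serial_to_datetime_sketch` is a hypothetical name, and the library's own `excel_serial_to_datetime` may handle edge cases differently.

```python
from datetime import datetime, timedelta

# Day zero that makes serials >= 61 line up with real calendar dates.
EXCEL_EPOCH = datetime(1899, 12, 30)


def serial_to_datetime_sketch(serial):
    """Convert a 1900-system Excel serial (>= 61) to a datetime."""
    # The fractional part of the serial is the time of day.
    return EXCEL_EPOCH + timedelta(days=float(serial))


print(serial_to_datetime_sketch(45292))    # 2024-01-01 00:00:00
print(serial_to_datetime_sketch(45292.5))  # 2024-01-01 12:00:00
```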

### Per-value results

```python
batch = parse_dates(["2024-03-15", "garbage", "2024-12-25"], format="%Y-%m-%d")
for r in batch.results:
    print(f"{r.original:20s} ok={r.ok}  parsed={r.iso}  warnings={r.warnings}")
# 2024-03-15           ok=True   parsed=2024-03-15T00:00:00  warnings=[]
# garbage              ok=False  parsed=None                  warnings=[...]
# 2024-12-25           ok=True   parsed=2024-12-25T00:00:00  warnings=[]
```

## CLI

```bash
# Detect format
datemonkey detect "15/03/2024" "20/04/2024" "25/12/2024"

# Detect with JSON output
datemonkey detect --json "01/02/2024" "03/04/2024"

# Parse dates
datemonkey parse "2024-03-15" "2024-04-20"

# Parse from CSV file (column 2, skip header)
datemonkey parse --file data.csv --column 2 --skip-header

# Parse with explicit format
datemonkey parse --format "%d-%m-%Y" "15-03-2024"

# Parse in strict mode
datemonkey parse --strict "01/02/2024" "03/04/2024"

# List known formats
datemonkey formats
```
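When calling the library from Python rather than the CLI, a CSV column can be pulled out with the stdlib `csv` module and then handed to `parse_dates`. The sketch below is an assumption-laden illustration: `read_column` is a hypothetical helper, and it uses a 0-based column index. This README does not state whether the CLI's `--column` is 0-based or 1-based.

```python
import csv
import io


def read_column(handle, column, skip_header=False):
    """Return the values of one column (0-based index) from an open CSV file."""
    reader = csv.reader(handle)
    if skip_header:
        next(reader, None)  # discard the header row if there is one
    # Skip rows that are too short to contain the requested column.
    return [row[column] for row in reader if len(row) > column]


# In-memory stand-in for a file such as data.csv.
sample = io.StringIO("id,name,created\n1,a,15/03/2024\n2,b,20/04/2024\n")
values = read_column(sample, 2, skip_header=True)
print(values)  # ['15/03/2024', '20/04/2024']
```

The resulting list of strings is the kind of input that `parse_dates` and `detect_format` accept.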

## API Reference

### `detect_format(values, *, locale_preference=None, formats=None) -> FormatDetectionResult`

Analyze a batch and determine the most likely format, reporting ambiguity.

- **values**: List of date-like values (strings, ints, floats, None)
- **locale_preference**: `"us"` for MM/DD, `"eu"` for DD/MM (only used when data alone can't resolve)
- **formats**: Custom list of `DateFormat` objects to test

### `parse_dates(values, *, format=None, locale_preference=None, strict=False) -> BatchResult`

Parse a batch with format lock-in.

- **format**: A `DateFormat` object or strftime string. If `None`, the format is auto-detected.
- **strict**: If True, refuse to parse when DD/MM vs MM/DD is ambiguous.

### `excel_serial_to_datetime(serial) -> datetime | None`

Convert an Excel serial date number to a Python datetime.

### Result Objects

| Object | Key Properties |
|---|---|
| `FormatDetectionResult` | `.format`, `.confidence`, `.is_ambiguous`, `.ambiguities`, `.candidates`, `.warnings` |
| `BatchResult` | `.ok`, `.results`, `.detected_format`, `.dates`, `.iso_strings`, `.failed`, `.succeeded`, `.parsed_count`, `.success_ratio`, `.warnings` |
| `DateResult` | `.ok`, `.original`, `.parsed`, `.date`, `.iso`, `.confidence`, `.warnings`, `.row_index` |

### Confidence Levels

| Level | Meaning |
|---|---|
| `HIGH` | Unambiguous parse, format is certain |
| `MEDIUM` | Likely correct, minor ambiguity (e.g. two-digit year) |
| `LOW` | Ambiguous — DD/MM vs MM/DD unresolved, or poor match ratio |
| `FAILED` | Could not parse or detect |

## Design

- **Batch-first**: Designed for columns of data, not single strings
- **No silent guessing**: Ambiguity is reported, not hidden
- **Format lock-in**: Once detected, the format is enforced — violations are flagged
- **Structured results**: Every parse returns a confidence level and warnings
- **Zero dependencies**: Pure Python, stdlib only

## Built for LLMs

datemonkey is designed to work well as a tool for large language models. Date parsing is a common source of silent errors in LLM-driven data pipelines: ambiguous formats lead to wrong guesses, wasted tokens on retries, and broken downstream logic. With datemonkey, a single call returns a structured result with the detected format, confidence level, and any ambiguities, so no multi-step prompting or validation loop is required. Fewer tokens in, reliable answers out.

## License

MIT
