Metadata-Version: 2.4
Name: fastcsv-python
Version: 0.1.0
Summary: Fast CSV parsing for Python via C + SIMD
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: C
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20
Provides-Extra: bench
Requires-Dist: pandas; extra == "bench"
Requires-Dist: polars; extra == "bench"
Requires-Dist: pyarrow; extra == "bench"
Provides-Extra: arrow
Requires-Dist: pyarrow>=10.0; extra == "arrow"
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# fastCSV

A fast CSV parser for Python, written in C with AVX2 SIMD acceleration,
memory-mapped I/O, and zero-copy columnar NumPy output.

## Benchmarks

| File             | fastcsv  | pandas   | polars   |
|------------------|----------|----------|----------|
| 1M rows mixed    | 0.053s   | 1.086s   | 0.034s   |
| 100k rows wide   | 0.027s   | 0.380s   | 0.033s   |
| 500k rows str    | 0.020s   | 0.935s   | 0.015s   |

Hardware: AMD Ryzen 5 7235HS

## Installation

    pip install fastcsv-python

## Usage

```python
import fastcsv

# Returns dict of column_name -> numpy.ndarray
result = fastcsv.read_csv("data.csv")
result = fastcsv.read_csv("data.csv", delimiter=",", has_header=True, error_mode="strict")

# Streaming: constant memory regardless of file size
for chunk in fastcsv.reader("data.csv", chunk_size=10_000):
    process(chunk)
```

## Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| `delimiter` | `","` | Field separator. Any single character. |
| `has_header` | `True` | Treat first row as column names. |
| `error_mode` | `"strict"` | `"strict"` / `"skip"` / `"replace"` |
| `chunk_size` | `10000` | Rows per chunk for `reader()`. |

## How it's fast

- **mmap I/O**: the OS pages in only what is needed. No `read()` syscalls.
- **AVX2 SIMD**: scans 32 bytes per instruction to find delimiters. Falls back to SSE4.2 (16 bytes) or scalar code automatically.
- **Zero-copy output**: int and float columns are handed to NumPy directly from the C buffer. No data is copied.
- **Single-pass type inference**: column types resolved in one pass, never re-scanned.
- **GIL released**: the entire parse phase runs without the GIL. Safe for multi-threaded use.
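
To make the mmap and single-pass inference ideas above concrete, here is a rough pure-Python sketch. It is not fastcsv's C implementation: `classify`, `infer_column_types`, and the `int`/`float`/`str` widening order are hypothetical names and assumptions chosen for illustration, and the sketch ignores quoted fields entirely.

```python
import mmap

# Hypothetical widening order (an assumption, not fastcsv's internals):
# a column only ever moves int -> float -> str, so one pass is enough.
_ORDER = {"int": 0, "float": 1, "str": 2}


def classify(field: bytes) -> str:
    """Return the narrowest type name that can hold this field."""
    text = field.decode("utf-8", errors="replace")
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "str"


def infer_column_types(path: str, delimiter: bytes = b",") -> dict:
    """Map the file into memory and infer one type per column in one pass.

    Assumes a non-empty file with a header row and no quoted fields.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        header = mm.readline().rstrip(b"\r\n").split(delimiter)
        types = ["int"] * len(header)
        # readline() returns b"" at end of file, which stops the iterator.
        for line in iter(mm.readline, b""):
            fields = line.rstrip(b"\r\n").split(delimiter)
            for i, field in enumerate(fields[: len(types)]):
                kind = classify(field)
                if _ORDER[kind] > _ORDER[types[i]]:
                    types[i] = kind  # widen, never narrow
        return {name.decode(): t for name, t in zip(header, types)}
```

Because a column's type can only widen, no earlier row ever needs to be revisited, which is what makes a single pass sufficient in this sketch.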

## Error recovery

- `strict`: raises `ValueError` on any RFC 4180 violation.
- `skip`: silently drops malformed rows.
- `replace`: replaces malformed fields with empty string.
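
The difference between the three modes can be illustrated with a small pure-Python sketch. This is only a model of the behaviour described above, not fastcsv's code: `is_malformed` and `recover` are hypothetical helpers, and the sketch assumes unquoted fields, so the only violation it detects is a stray double quote inside a field.

```python
from typing import Iterable, Iterator, List


def is_malformed(field: str) -> bool:
    # Simplifying assumption: fields are unquoted, so any double quote
    # inside one is treated as an RFC 4180 violation.
    return '"' in field


def recover(rows: Iterable[List[str]], error_mode: str = "strict") -> Iterator[List[str]]:
    """Yield rows after applying one of the three documented error modes."""
    if error_mode not in ("strict", "skip", "replace"):
        raise ValueError(f"unknown error_mode: {error_mode!r}")
    for row_no, row in enumerate(rows, start=1):
        if not any(is_malformed(f) for f in row):
            yield row
        elif error_mode == "strict":
            raise ValueError(f"malformed field in row {row_no}")
        elif error_mode == "replace":
            # Keep the row, blank out only the offending fields.
            yield ["" if is_malformed(f) else f for f in row]
        # "skip": fall through and drop the row silently.
```

In this model, `skip` changes the row count while `replace` preserves it, which matters if rows must stay aligned with another data source.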

## Building from source

    pip install numpy
    pip install -e .
    make test-all

## License

MIT
