Metadata-Version: 2.4
Name: dms-py
Version: 0.5.1
Summary: Pure-Python decoder for DMS, a data syntax with strong typing, ordered maps, and heredocs.
Author: Filip Lopes
License: MIT OR Apache-2.0
Project-URL: Repository, https://gitlab.com/flo-labs/pub/dms-py
Keywords: dms,config,parser
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE-APACHE
License-File: LICENSE-MIT
Dynamic: license-file

<p align="center"><a href="https://gitlab.com/flo-labs/pub/dms"><img src="assets/logo.png" alt="DMS" width="120"></a></p>

# dms-py

Python implementations of **[DMS](https://gitlab.com/flo-labs/pub/dms)**, a data syntax with strong typing,
ordered maps, multi-line heredocs, and front-matter metadata.

Two packages live in this repo:

| dir       | PyPI name | description                                            |
| --------- | --------- | ------------------------------------------------------ |
| `dms_py/` | `dms-py`  | pure-Python reference decoder                          |
| `dms-c/`  | `dms-c`   | CPython extension wrapping the C decoder (much faster) |

The `dms-c` package compiles `external/dms-c/dms.c` (a git submodule). After
cloning, run:

```sh
git submodule update --init --recursive
```

## What DMS looks like

A medium-size tier-0 document, exercising every feature you'd touch in a
real config — front matter, comments (line + trailing), nested tables,
list-of-tables with the `+` marker, flow forms, distinct types, and a
heredoc with a trim modifier:

```dms
+++
title:    "DMS feature tour"
version:  "1.0.0"
updated:  2026-04-24T09:30:00-04:00
+++

# Hash and // line comments both work.
// Bare keys allow full Unicode; quoted keys take any string.

database:
  host:    "db.internal"
  port:    5432            # bumped after the LB change
  pool:    { size: 10, idle_timeout_s: 30 }   # flow table

servers:
  + name: "web1"
    disks:
      + mount: "/"
        size_gb: 100
      + mount: "/var"
        size_gb: 500
  + name: "web2"

regions: ["us-east-1", "eu-west-1", "ap-south-1"]

sql: """SQL _trim("\n", ">")
    SELECT id, email
      FROM users
     WHERE active = true
    SQL
```

Tier 1 layers structured decorators on top of the value tree. Sigils bind
to families published by a dialect; here is `dms+html` carrying an HTML
fragment as a DMS document:

```dms
+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
+++

+ |html(lang: "en")
  + |head
    + |title "DMS feature tour"
    + |meta(charset: "UTF-8")
  + |body(class: "main")
    + |h1 "Welcome to DMS"
    + |p(class: "lede")
      + "Click "
      + |a(href: "/spec.html") "here"
      + " to read the spec."
```

Full feature tour, format comparison, and dialect index on the
**[DMS website](https://flo-labs.gitlab.io/pub/dms-webpage/)**.

## Install (pure Python)

```sh
pip install dms-py
```

```python
import dms_py

with open("config.dms", encoding="utf-8") as f:
    src = f.read()

# Body-only (drops front matter and comments after decode).
body = dms_py.decode(src)

# Full document (preserves comments + literal forms for encode round-trip).
doc = dms_py.decode_document(src)
doc.meta            # OrderedDict | None — None when there is no `+++` block
doc.body            # decoded root value
doc.comments        # list[AttachedComment]
doc.original_forms  # list[(path, OriginalLiteral)]

# Re-emit DMS source.
output = dms_py.encode(doc)
```

> The legacy `parse`, `parse_document`, `to_dms`, and `ParseError`
> names from SPEC v0.13 still work in 0.3 as deprecated thin aliases
> that emit `DeprecationWarning` and forward to the new entry points
> (`decode` / `encode` / `DecodeError`). They will be removed one
> release later — switch to the new names now. Encoder errors now
> raise `EncodeError` (subclass of `Exception`) instead of bare
> `ValueError`.
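
To find remaining legacy calls before the aliases are removed, it can help to promote `DeprecationWarning` to an error. The sketch below is self-contained and does not import `dms_py`: the `decode` and `parse` functions are hypothetical stand-ins that only imitate the "warn, then forward" behaviour described above, not the package's actual implementation.

```python
import warnings


def decode(src):
    """Stand-in for the new entry point; returns the input unchanged."""
    return src


def parse(src):
    """Hypothetical legacy alias: warn, then forward to decode()."""
    warnings.warn(
        "parse() is deprecated; use decode()",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return decode(src)


# Promote the warning to an error so any remaining legacy call fails loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        parse("a: 1\n")
        legacy_call_found = False
    except DeprecationWarning:
        legacy_call_found = True
```

The same filter can be applied to a whole test run with Python's standard `-W error::DeprecationWarning` option.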

## Install (C-extension)

```sh
pip install dms-c
```

```python
import dms_c

# Use a context manager so the file handle is closed promptly.
with open("config.dms", encoding="utf-8") as f:
    doc = dms_c.decode(f.read())
```

Same API surface; the C variant is about 7× faster on the wide-flat benchmark under Performance (51 ms vs 360 ms).
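
Because both packages expose `decode`, one way to prefer the C extension while keeping the pure-Python decoder as a fallback is a small import helper. This is a sketch, not part of either package: `load_first_available` is a hypothetical name, and only `decode` is shown for `dms_c` above, so check any other entry points before relying on them.

```python
import importlib


def load_first_available(names):
    """Return the first importable module from `names`, or None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the C extension, fall back to the pure-Python decoder.
dms = load_first_available(["dms_c", "dms_py"])
if dms is not None:
    print("using", dms.__name__)
```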

## Performance

50,000-key flat document (~700 KB), best-of-5, startup-subtracted,
CPython 3.13 on Windows 11:

| tier        | DMS port  | time     | JSON peer        | time     | YAML peer            | time      | DMS / JSON | DMS / YAML |
|-------------|-----------|----------|------------------|----------|----------------------|-----------|------------|------------|
| pure Python | `dms-py`  | 360 ms   | n/a              | —        | `PyYAML` SafeLoader  | 2,388 ms  | n/a        | **0.15× — DMS ~6.6× faster** |
| native (C)  | `dms-c`   | 51 ms    | `json` (stdlib)  | 11 ms    | `PyYAML` CSafeLoader | 395 ms    | 4.73×      | **0.13× — DMS ~7.7× faster** |

The C extension is ~7× faster than pure Python (51 ms vs 360 ms). Against
C-backed peers, DMS costs ~4.7× as much as JSON, which is the price of
carrying comments, ordered keys, and source-form metadata, and runs ~7.7×
faster than libyaml.

Python's stdlib `json` is C-backed and there's no widely-used
pure-Python JSON parser to compare against, so JSON only appears in
the FFI tier (same situation as Ruby and Node). The pure-Python YAML
peer is `PyYAML` with `SafeLoader` (the pure-Python loader); the
C-backed YAML peer is the same `PyYAML` library with `CSafeLoader`
(libyaml-bound).

Reproduce with:

```sh
pip install pyyaml                                     # YAML baselines
python C:/Users/<you>/projects/dms-tests/gen_bench_fixtures.py
python bench/bench_decoders.py --iters 5 --warmup 2
```
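
For readers who want a feel for the "best-of-5" methodology without the fixture generator, the following is a minimal sketch of a best-of-N timer. It is not the repo's `bench/bench_decoders.py`: `best_of` is a hypothetical helper, stdlib `json.loads` stands in for a decoder, and the startup subtraction mentioned above is not reproduced.

```python
import json
import time


def best_of(fn, arg, iters=5, warmup=2):
    """Return the fastest wall-clock time (seconds) of `iters` calls to fn(arg)."""
    for _ in range(warmup):
        fn(arg)  # warm caches; results discarded
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(arg)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload: a 50,000-key flat JSON object decoded with stdlib json.
payload = json.dumps({f"key_{i}": i for i in range(50_000)})
elapsed = best_of(json.loads, payload)
print(f"json.loads best-of-5: {elapsed * 1000:.1f} ms")
```

Taking the minimum rather than the mean reduces the influence of scheduler noise on short runs.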

## Value shape

| DMS type        | Python type                              |
| --------------- | ---------------------------------------- |
| bool            | `bool`                                   |
| integer         | `int`                                    |
| float           | `float`                                  |
| string          | `str`                                    |
| local-date      | `dms_py.LocalDate`                       |
| local-time      | `dms_py.LocalTime`                       |
| local-datetime  | `dms_py.LocalDateTime`                   |
| offset-datetime | `dms_py.OffsetDateTime`                  |
| table           | `OrderedDict[str, value]`                |
| list            | `list`                                   |

Datetime classes wrap the source lexeme as a `str` (already
SPEC-validated), so inspecting them never re-parses. Tables use
`collections.OrderedDict` to make the insertion-order requirement
explicit: plain dicts have preserved insertion order since Python 3.7,
but the spec mandates ordering, so the return type states it outright.
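
Since tables are `OrderedDict` and lists are plain `list`, a decoded body can be walked with ordinary recursion. The sketch below uses a hand-built stand-in for a decoded body rather than calling `dms_py`, and `walk` is a hypothetical helper. Using integer indices for list elements in the path is an assumption of this sketch, not a statement about the library's breadcrumb format.

```python
from collections import OrderedDict


def walk(value, path=()):
    """Yield (path, leaf) pairs in document order."""
    if isinstance(value, dict):          # OrderedDict is a dict subclass
        for key, child in value.items():
            yield from walk(child, path + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from walk(child, path + (index,))
    else:
        yield path, value


# Hand-built stand-in for a decoded body; not produced by dms_py here.
body = OrderedDict([
    ("database", OrderedDict([("host", "db.internal"), ("port", 5432)])),
    ("regions", ["us-east-1", "eu-west-1"]),
])

leaves = list(walk(body))
```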

## Working with comments and heredocs

DMS preserves comments through decode → mutate → re-emit (SPEC
§Comments). The `Document` carries them on a side-channel keyed by
breadcrumb path; the same shape lets you attach a comment to a value
*after* decoding and have it round-trip through `encode`:

```python
import dms_py
from dms_py import Comment, AttachedComment

doc = dms_py.decode_document("db:\n  port: 8080\n")

# Mutate a value in place.
doc.body["db"]["port"] = 5432

# Attach a leading line comment to db.port.
doc.comments.append(AttachedComment(
    comment=Comment(content="# bumped after LB change", kind="line"),
    position="leading",
    path=("db", "port"),
))

print(dms_py.encode(doc))
```

### Forcing a heredoc on emit

Strings parse and re-emit in their source form. To switch a basic-quoted
string to a heredoc (or to construct one from scratch), append an
`OriginalLiteral.String` record to `doc.original_forms` keyed by the
value's path:

```python
from dms_py import OriginalLiteral, StringForm, HeredocFlavor

doc.body["db"]["greeting"] = "Hello, friend.\nWelcome aboard.\n"

doc.original_forms.append((
    ("db", "greeting"),
    OriginalLiteral.String(
        StringForm.Heredoc(
            flavor=HeredocFlavor.BasicTriple,   # or .LiteralTriple for '''
            label=None,                         # None = unlabeled
            modifiers=[],                       # _trim(...), _fold_paragraphs(), …
        )
    ),
))
```

Round-trip rules (SPEC §Round-trip semantics): comments stick to
*still-present* nodes; deleting a node drops its comments; newly
inserted nodes start with no comments. The first `original_forms`
entry per path wins, so override a parser-recorded form by replacing
rather than appending if the key is already present.
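
The "first entry wins" rule suggests a replace-or-append helper. The following is a sketch under stated assumptions: `set_original_form` is a hypothetical name, plain strings stand in for `OriginalLiteral` values, and paths are assumed to be tuples that compare equal to the ones the parser records.

```python
def set_original_form(forms, path, form):
    """Replace the first entry recorded for `path`, or append one if absent."""
    for i, (existing_path, _) in enumerate(forms):
        if existing_path == path:
            forms[i] = (path, form)
            return
    forms.append((path, form))


# Plain strings stand in for OriginalLiteral values in this sketch.
forms = [(("db", "greeting"), "basic-quoted")]
set_original_form(forms, ("db", "greeting"), "heredoc")   # replaces in place
set_original_form(forms, ("db", "motd"), "heredoc")       # appends
```

With a real document, the same call would take `doc.original_forms` as its first argument.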

## Build & test (development)

```sh
python -m pip install -e .                  # pure-py, editable
python -m pip install -e ./dms-c             # C-ext, editable

python -m pytest tests/                      # round-trip + comment tests
```

## Conformance

The fixture corpus lives in [dms-tests](https://gitlab.com/flo-labs/pub/dms-tests)
(4500+ pairs). Clone it once as a sibling:

```sh
git clone https://gitlab.com/flo-labs/pub/dms-tests.git ../dms-tests
```

Then run the sweep:

```sh
python3 ../dms-tests/run_conformance.py "python3 encoder.py"
```

`dms-tests` can also drive every implementation in one shot — see its
README for the cross-language workflow.

## Publish

```sh
# pure Python
python -m build                              # creates dist/dms_py-0.5.1*
twine upload dist/dms_py-*

# C extension
cd dms-c && python -m build && twine upload dist/*
```

You'll need `pip install build twine` and a PyPI token configured (`~/.pypirc`).
The C-ext sdist must include `external/dms-c/dms.c` — either initialise the
submodule before `python -m build`, or use a build-time hook to copy the file.

## License

MIT OR Apache-2.0
