Metadata-Version: 2.4
Name: atomworks
Version: 1.0.0
Summary: A research-oriented data toolkit for training biomolecular deep-learning foundation models
Project-URL: homepage, https://baker-laboratory.github.io/atomworks-dev/latest
Project-URL: repository, https://github.com/RosettaCommons/atomworks
Project-URL: documentation, https://baker-laboratory.github.io/atomworks-dev/latest
Author-email: Institute for Protein Design <contact@ipd.uw.edu>
License: BSD 3-Clause License
        
        Copyright (c) 2025, Institute for Protein Design, University of Washington
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        * Redistributions of source code must retain the above copyright notice, this
          list of conditions and the following disclaimer.
        
        * Redistributions in binary form must reproduce the above copyright notice,
          this list of conditions and the following disclaimer in the documentation
          and/or other materials provided with the distribution.
        
        * Neither the name of the copyright holder nor the names of its
          contributors may be used to endorse or promote products derived from
          this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE.md
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.12
Requires-Dist: biotite==1.3.0
Requires-Dist: cython<4,>=3.0.0
Requires-Dist: cytoolz<1,>=0.12.3
Requires-Dist: fastparquet==2024.5.0
Requires-Dist: fire<1,>=0.6.0
Requires-Dist: hydride==1.2.3
Requires-Dist: numpy<2,>=1.25.0
Requires-Dist: pandas<2.3,>=2.2
Requires-Dist: py3dmol<3,>=2.2.1
Requires-Dist: pyarrow==17.0.0
Requires-Dist: pymol-remote>=0.0.5
Requires-Dist: rdkit>=2024.3.5
Requires-Dist: scipy<2,>=1.13.1
Requires-Dist: tqdm<5,>=4.65.0
Requires-Dist: typer<1,>=0.12.5
Provides-Extra: dev
Requires-Dist: debugpy<2,>=1.8.5; extra == 'dev'
Requires-Dist: ipykernel<7,>=6.29.4; extra == 'dev'
Requires-Dist: pre-commit==3.7.1; extra == 'dev'
Requires-Dist: pytest-benchmark<6,>=5.0.0; extra == 'dev'
Requires-Dist: pytest-cov<5,>=4.1.0; extra == 'dev'
Requires-Dist: pytest-dotenv<1,>=0.5.2; extra == 'dev'
Requires-Dist: pytest-testmon<3,>=2.1.1; extra == 'dev'
Requires-Dist: pytest-xdist<4,>=3.6.1; extra == 'dev'
Requires-Dist: pytest<9,>=8.2.0; extra == 'dev'
Requires-Dist: ruff==0.8.3; extra == 'dev'
Provides-Extra: docs
Requires-Dist: matplotlib<4,>=3.10.0; extra == 'docs'
Requires-Dist: pydata-sphinx-theme<1,>=0.16.1; extra == 'docs'
Requires-Dist: sphinx-gallery<1,>=0.19.0; extra == 'docs'
Requires-Dist: sphinx<9,>=8.0.0; extra == 'docs'
Provides-Extra: ml
Requires-Dist: einops==0.7.0; extra == 'ml'
Requires-Dist: torch==2.7.0; extra == 'ml'
Provides-Extra: openbabel
Requires-Dist: openbabel-wheel==3.1.1.22; extra == 'openbabel'
Description-Content-Type: text/markdown

[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![PyPI version](https://img.shields.io/pypi/v/atomworks.svg)](https://pypi.org/project/atomworks/)
[![Python versions](https://img.shields.io/pypi/pyversions/atomworks.svg)](https://pypi.org/project/atomworks/)
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://baker-laboratory.github.io/atomworks-dev/latest)
[![License: BSD 3-Clause](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

<img src="docs/_static/atomworks_logo_color.svg" width="450" alt="atomworks logo">

**atomworks** is an open-source platform for next-generation biomolecular data processing, conversion, and machine-learning-ready featurization.  
It is composed of two symbiotic libraries:

- **atomworks.io:** A universal Python toolkit for parsing, cleaning, manipulating, and converting biological data (structures, sequences, small molecules). Built on the [biotite](https://www.biotite-python.org/) API, it seamlessly loads and exports between standards like mmCIF, PDB, FASTA, SMILES, MOL, and more.
- **atomworks.ml:** Advanced dataset featurization and sampling for deep learning workflows—using atomworks.io as its structural backbone.

The atomworks ecosystem is designed to eliminate the pain of file conversion and preprocessing, offering scientists and modelers an efficient, unified interface for biomolecular data.

---

## atomworks.io

*A Swiss Army knife for biomolecular files in Python*

**atomworks.io** lets you:
- Parse, convert, and clean up any common biological file (structure or sequence).
- Transform all data to a consistent `AtomArray` representation for further analysis or machine learning.
- Model missing atoms, handle ligands/solvents, resolve naming/assembly heterogeneity—all from Python.

Instead of juggling dozens of tools or manual curation, simply load your data with atomworks.io and focus on your research.

---

## atomworks.ml

*Advanced dataset featurization and sampling for deep learning workflows*

**atomworks.ml** provides:
- Ready-made featurization pipelines for entire datasets
- Efficient sampling and batching utilities for training machine learning models
- Seamless integration with atomworks.io for ML-ready feature engineering
- Optimized data structures and workflows designed specifically for deep learning applications

Built on atomworks.io's structural backbone, atomworks.ml bridges the gap between biological data processing and machine learning pipelines.

---

## Installation

```
pip install atomworks # base installation without torch (atomworks.io only)
pip install "atomworks[ml]" # with torch and ML dependencies (atomworks.io plus atomworks.ml)
pip install "atomworks[dev]" # with development dependencies
pip install "atomworks[ml,dev]" # with ML and development dependencies
```

If you are using [uv](https://docs.astral.sh/uv/reference/policies/versioning/) for package management, you can install atomworks with:

```
uv pip install "atomworks[ml,openbabel,dev]"
```

For more advanced setup options (including how to run workflows in Apptainer containers), see the [full documentation](https://baker-laboratory.github.io/atomworks-dev/latest).
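
To confirm which optional extras ended up in your environment, a small standard-library check can help. The sketch below is not part of atomworks: the helper names and the mapping from extras to import names (`torch`, `einops`, `openbabel`) are assumptions based on the dependency list for this release.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec

# Import names assumed for the optional dependencies of each extra.
# This mapping is illustrative and not provided by atomworks itself.
OPTIONAL_MODULES = {
    "ml": ["torch", "einops"],
    "openbabel": ["openbabel"],
}


def installed_extras(optional_modules=OPTIONAL_MODULES):
    """Return {extra: bool}; True when every module of that extra can be found."""
    return {
        extra: all(find_spec(name) is not None for name in names)
        for extra, names in optional_modules.items()
    }


def atomworks_version():
    """Return the installed atomworks version string, or None if not installed."""
    try:
        return version("atomworks")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("atomworks version:", atomworks_version())
    for extra, ok in installed_extras().items():
        print(f"[{extra}] {'available' if ok else 'missing'}")
```

If `ml` is reported as missing, only the atomworks.io functionality should be expected to work.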

---

## Quick Start

```
from atomworks.io.parser import parse

# Parse a gzipped mmCIF file into a dictionary of results
result = parse(filename="3nez.cif.gz")

# Print the sequence of each chain
for chain_id, info in result["chain_info"].items():
    print(chain_id, info["sequence"])
```

The returned `result` dictionary includes:
- **chain_info** — Sequences/metadata for each chain
- **ligand_info** — Ligand annotation & metrics
- **asym_unit** — Structure (AtomArrayStack)
- **assemblies** — Built biological assemblies
- **metadata** — Experimental and source information

See [usage examples](https://baker-laboratory.github.io/atomworks-dev/latest/auto_examples/).
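
As a rough illustration of working with the result, the sketch below inspects a dictionary with the keys listed above. It uses a hand-written stand-in instead of a real `parse()` call so that it runs without atomworks installed. The helper names are hypothetical, and the nested layout (a `"sequence"` entry per chain) is assumed from the Quick Start snippet.

```python
from collections.abc import Mapping

# Keys listed in the Quick Start description of the parse result.
EXPECTED_KEYS = ("chain_info", "ligand_info", "asym_unit", "assemblies", "metadata")


def summarize_result(result: Mapping) -> dict:
    """Map each documented key to the type name of its value, or None if absent."""
    return {
        key: type(result[key]).__name__ if key in result else None
        for key in EXPECTED_KEYS
    }


def chain_sequences(result: Mapping) -> dict:
    """Collect {chain_id: sequence}, skipping chains without a "sequence" entry."""
    return {
        chain_id: info["sequence"]
        for chain_id, info in result.get("chain_info", {}).items()
        if "sequence" in info
    }


# Stand-in for a real parse() result (assumed shape, for illustration only).
example = {
    "chain_info": {"A": {"sequence": "MKT"}, "B": {}},
    "metadata": {},
}
print(summarize_result(example))
print(chain_sequences(example))
```

With a real result, passing `result` in place of `example` should work the same way, provided the keys match the list above.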

---

## When to use atomworks.io vs atomworks.ml?

- Use **atomworks.io** when you:
    - Need to parse/clean/convert between biological file formats (mmCIF, PDB, FASTA, etc.)
    - Want a unified structural representation to plug into any downstream analysis or modeling
    - Need structural operations like adding missing atoms, filtering ligands/solvents, or assembly generation

- Use **atomworks.ml** when you:
    - Need to featurize entire datasets for deep learning
    - Want ready-made sampling and batching utilities for training pipelines
    - Already use atomworks.io and want a seamless bridge to ML-ready feature engineering

---

## Contribution

We welcome improvements!  
Please see the [full documentation](https://baker-laboratory.github.io/atomworks-dev/latest) for contribution guidelines.
