Metadata-Version: 2.4
Name: proffer
Version: 0.5.3
Summary: Fast and flexible tool for protein inference
Home-page: https://github.com/seerbio/proffer
Author: Seth Just
Author-email: sjust@seer.bio
Project-URL: Bug Tracker, https://github.com/seerbio/proffer/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: joblib>=1.4.0
Requires-Dist: polars
Requires-Dist: scipy<2.0.0,>=1.14.0
Requires-Dist: fsspec
Requires-Dist: click
Requires-Dist: preppers<2.0.0,>=0.2.0
Provides-Extra: s3
Requires-Dist: fsspec[s3]; extra == "s3"
Provides-Extra: spark
Requires-Dist: pandas; extra == "spark"
Requires-Dist: pyspark; extra == "spark"
Requires-Dist: wheely-mammoth; extra == "spark"
Provides-Extra: dev
Requires-Dist: pre-commit>=2.7.1; extra == "dev"
Requires-Dist: black>=20.8b1; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-benchmark; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Requires-Dist: scipy-stubs; extra == "dev"
Requires-Dist: pandas; extra == "dev"
Requires-Dist: pyarrow>=8.0.0; extra == "dev"
Requires-Dist: pyspark; extra == "dev"
Requires-Dist: wheely-mammoth; extra == "dev"
Dynamic: license-file

<img alt="proffer logo" src="./docs/_static/proffer-logo.png" height="128" align="left" style="margin: 8px">

**Proffer** is a fast and flexible tool for *pro*tein in*fer*ence in bottom-up proteomics experiments.

## Installation

This library requires Python 3.10+ and can be installed with pip:

```shell
pip install proffer
```

## Basic Usage

After installation, you can run the `proffer` CLI tool:

```shell
proffer data/percolator.txt --format tsv
```

You can also load results directly from any URI supported by [`fsspec`](https://filesystem-spec.readthedocs.io/en/latest/):

```shell
# HTTP(S)
proffer https://github.com/seerbio/proffer/raw/refs/heads/main/data/percolator.txt --format tsv

# S3
pip install 'proffer[s3]'  # ensure libraries for S3 support are installed (quoted so shells like zsh do not expand the brackets)
proffer s3://bucket/key/results.parquet --format parquet
```

Basic support is included for thresholding peptide identifications before inference:

```shell
proffer data/percolator.txt --format tsv --qvalue-threshold 0.01
```
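To illustrate what q-value thresholding does conceptually, here is a minimal sketch in plain Python. It is not Proffer's implementation: the helper `filter_by_qvalue`, the column names, and the example rows are all made up for illustration, and whether Proffer treats the threshold as inclusive is an assumption here.

```python
import csv
import io


def filter_by_qvalue(rows, threshold=0.01, qvalue_column="q-value"):
    """Keep rows whose q-value is at or below the threshold.

    Hypothetical helper for illustration only; the inclusive comparison
    and the default column name are assumptions, not Proffer's behavior.
    """
    return [row for row in rows if float(row[qvalue_column]) <= threshold]


# Tiny made-up table in a tab-separated layout (column names are assumed).
example_tsv = (
    "PSMId\tq-value\tpeptide\tproteinIds\n"
    "psm_1\t0.001\tPEPTIDEK\tP1\n"
    "psm_2\t0.05\tSAMPLER\tP2\n"
    "psm_3\t0.01\tANOTHERK\tP1\n"
)

rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
kept = filter_by_qvalue(rows, threshold=0.01)

# Only identifications passing the threshold would go on to inference.
print([row["PSMId"] for row in kept])
```

In this sketch, `psm_2` is dropped because its q-value exceeds 0.01, so only the remaining identifications would contribute evidence for their proteins.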

By default, output is written to stdout in JSON format.
Alternatively, the `-o` (`--output`) flag can be used to write results to a Parquet file:

```shell
proffer data/percolator.txt --output proffer-results.parquet
```

It is possible to configure the column names used, as well as the approach used to pick protein groups.
To learn more about the available options you can run:

```shell
proffer --help
```

## Python Usage

To use Proffer in your Python code, call `proffer.infer` with a Polars DataFrame:

```python
import proffer

result_frame = proffer.infer(polars_frame)
```

As with the CLI, it is possible to configure the column names used, as well as the approach used to pick protein groups.
To learn more about the available options, you can run:

```python
help(proffer.infer)
```

## Usage with Fulcrum Pipeline

Utilities for working with Spark-based datasets can be found in `proffer.spark`.
These use interfaces from [`wheely-mammoth`](https://github.com/seerbio/wheely-mammoth)
(compatible with [Fulcrum Pipeline™](https://github.com/seerbio/fulcrum)) to efficiently read peptide and protein
information and compute inference.

For typical use cases, it will be easiest to employ a pre-built Fulcrum workflow that uses Proffer for
inference, rather than call these utilities directly.

By default, installing Proffer will not install Spark dependencies; to use Proffer in an environment
where these are not already installed, run:

```shell
pip install 'proffer[spark]'
```
