Metadata-Version: 2.0
Name: GEOpurify
Version: 0.1
Summary: Making Gene Expression Omnibus data cleansing easy.
Home-page: https://github.com/biologos/GEOpurify
Author: Sasha Illarionov
Author-email: sasha.delly@gmail.com
License: MIT
Download-URL: https://github.com/biologos/GEOpurify/archive/0.1.tar.gz
Keywords: GEO,bioinformatics
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: GEOparse
Requires-Dist: pandas

* GEOpurify
/A collection of tools for making Gene Expression Omnibus data amenable to machine learning./

** Installation

#+BEGIN_SRC sh
pip install GEOpurify
#+END_SRC

** Example Usage

#+BEGIN_SRC python :results output org drawer
from GEOpurify import GEOpurifier
g = GEOpurifier()
gds_df = g.gdspurify("GDS4376")
#+END_SRC

** Methods

*** ~filepurify(filepath, separation="\t")~

Given a path to a standard table with GEO data, returns a dataframe
with gene expression and GSM ids in separate columns.
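
One possible reading of "gene expression and GSM ids in separate
columns" is a wide-to-long reshape, sketched below with the standard
library only. This is an illustration, not the code ~filepurify~
runs: the input layout, the sample ids, the output field names and the
helper ~wide_to_long~ are all assumptions made for the example.

#+BEGIN_SRC python
import csv
import io

# Hypothetical wide table: one row per probe, one column per sample (GSM id).
WIDE_TABLE = (
    "ID_REF\tGSM0000001\tGSM0000002\n"
    "probe_a\t1.5\t2.5\n"
    "probe_b\t3.0\t4.0\n"
)


def wide_to_long(text, separation="\t"):
    """Turn every (probe, sample) cell into its own row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=separation)
    rows = []
    for record in reader:
        probe = record["ID_REF"]
        for column, value in record.items():
            # Assumption: sample columns are the ones named after a GSM id.
            if column.startswith("GSM"):
                rows.append(
                    {"ID_REF": probe, "gsm_id": column, "expression": float(value)}
                )
    return rows


long_rows = wide_to_long(WIDE_TABLE)
#+END_SRC

Each resulting row carries one expression value next to the GSM id it
came from, a shape that is usually easier to filter and join than one
column per sample.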

*** ~dirpurify(dirname)~

Given a path to a directory with standard tables of data from GEO,
applies ~filepurify~ to each and returns a combined dataframe.

*** ~gdspurify(gds_id, load_extra_features=False)~

Given a GDS id, extracts data on the platform, platform organism,
platform technology type and sample organism used. If
~load_extra_features~ is set to ~True~, extra features are fetched
from the GDS columns.

Saves the processed tables for the GDS in the directory ~data/tmp~
and stores the raw GEO data in the directory ~data/raw~.
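
To see what a run has left on disk, the two directories can be listed
with a small helper. This is a minimal sketch: ~list_cached_files~ is
a hypothetical function, not part of GEOpurify, and it assumes that
~data/tmp~ and ~data/raw~ are resolved relative to a root directory
you pass in. The file names GEOpurify uses inside them are not
documented here, so the demonstration uses a made-up one.

#+BEGIN_SRC python
import tempfile
from pathlib import Path


def list_cached_files(root="."):
    """Return the file names found under data/tmp and data/raw below root."""
    base = Path(root) / "data"
    found = {}
    for sub in ("tmp", "raw"):
        folder = base / sub
        if folder.is_dir():
            found[sub] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            # A missing directory simply means nothing has been stored yet.
            found[sub] = []
    return found


# Demonstration on a throwaway directory with a made-up file name.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "data" / "tmp").mkdir(parents=True)
(demo_root / "data" / "raw").mkdir(parents=True)
(demo_root / "data" / "raw" / "example_raw_file").touch()
cached = list_cached_files(demo_root)
#+END_SRC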

*** ~gdspolypurify(gds_list_path, load_extra_features=False)~

Given a path to a file listing GDS ids, each on a new line, applies
~gdspurify~ to each and combines all the data into one dataframe.
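
The list file is plain text with one GDS id per line. The sketch below
writes such a file and reads it back; ~read_gds_ids~ is a hypothetical
helper for checking the file before handing it over, not a GEOpurify
function, and the second id is only there to make the list longer than
one entry.

#+BEGIN_SRC python
import tempfile
from pathlib import Path


def read_gds_ids(gds_list_path):
    """Return the non-empty lines of the list file, stripped of whitespace."""
    lines = Path(gds_list_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


# Write a small list file into a throwaway directory.
list_path = Path(tempfile.mkdtemp()) / "gds_list.txt"
list_path.write_text("GDS4376\nGDS507\n")

ids = read_gds_ids(list_path)
#+END_SRC

The same path can then be passed on, for example
~g.gdspolypurify(str(list_path))~ with ~g~ created as in the usage
example above.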


