Metadata-Version: 2.1
Name: biocolabsdk
Version: 1.0.1
Summary: A set of python modules for accessing BioTuring ecosystem on BioColab private server
Author: BioTuring
Author-email: support@bioturing.com
Requires-Python: >=3.7,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: bioflex (>=1.1,<2.0)
Requires-Dist: bioturing-connector (>=1.1,<2.0)
Description-Content-Type: text/markdown

## 1. Using biocolabsdk:
**The package only allows data submission via a BioColab private server. Please configure your tokens on the `User Settings` page.**
### 1.1. Test the connection:
```
# example.py

from biocolabsdk.connector import EConnector

connector = EConnector(
  host="https://yourcompany/t2d_index_tool/",
  token="<input your token here>"
)

connector.get_bbrowserx().test_connection()
```

Example output:

```
Connecting to host at https://yourcompany/t2d_index_tool/api/v1/test_connection
Connection successful
```

### 1.2. Get user groups available for your token:
```
# example.py

from biocolabsdk.connector import EConnector

connector = EConnector(
  host="https://yourcompany/t2d_index_tool/",
  token="<input your token here>"
)

user_groups = connector.get_bbrowserx().get_user_groups()
print(user_groups)
```

Example output:

```
[{'id': 'all_members', 'name': 'All members'}, {'id': 'personal', 'name': 'Personal workspace'}]
```
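
Because `get_user_groups()` returns a list of dictionaries with `id` and `name` keys (as in the example output above), a small lookup helper can replace copying IDs by hand. The sketch below is illustrative only: `find_group_id` is a hypothetical helper, not part of biocolabsdk, and the list is a hard-coded copy of the example output so that the snippet runs without a server.

```python
from typing import Dict, List, Optional


def find_group_id(groups: List[Dict[str, str]], name: str) -> Optional[str]:
    """Return the id of the first group whose name matches, or None."""
    for group in groups:
        if group.get("name") == name:
            return group.get("id")
    return None


# Hard-coded copy of the example output above; in practice this would be
# the return value of connector.get_bbrowserx().get_user_groups().
user_groups = [
    {"id": "all_members", "name": "All members"},
    {"id": "personal", "name": "Personal workspace"},
]

personal_id = find_group_id(user_groups, "Personal workspace")
print(personal_id)  # personal
```

The returned value can then be passed as `group_id` when submitting a study.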

### 1.3. Submit h5ad (scanpy object):
```
# example.py
from biocolabsdk.connector import EConnector
from bioturing_connector.typing import InputMatrixType
from bioturing_connector.typing import Species

connector = EConnector(
  host="https://yourcompany/t2d_index_tool/",
  token="<input your token here>"
)

# Call this function first to get the available groups and their IDs.
user_groups = connector.get_bbrowserx().get_user_groups()
# Example: user_groups is now [{'id': 'all_members', 'name': 'All members'}, {'id': 'personal', 'name': 'Personal workspace'}]


# Submitting the scanpy object:
connector.submit_h5ad(
  group_id='personal',
  study_s3_keys=['GSE128223.h5ad'],
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value
)

# Example output:
> [2022-10-10 01:03] Waiting in queue
> [2022-10-10 01:03] Downloading GSE128223.h5ad from s3: 262.1 KB / 432.8 MB
> [2022-10-10 01:03] File downloaded
> [2022-10-10 01:03] Reading batch: GSE128223.h5ad
> [2022-10-10 01:03] Preprocessing expression matrix: 19121 cells x 63813 genes
> [2022-10-10 01:03] Filtered: 19121 cells remain
> [2022-10-10 01:03] Start processing study
> [2022-10-10 01:03] Normalizing expression matrix
> [2022-10-10 01:03] Running PCA
> [2022-10-10 01:03] Running kNN
> [2022-10-10 01:03] Running spectral embedding
> [2022-10-10 01:03] Running venice binarizer
> [2022-10-10 01:04] Running t-SNE
> [2022-10-10 01:04] Study was successfully submitted
> [2022-10-10 01:04] DONE !!!
> Study submitted successfully!
```
Available parameters for `submit_h5ad` function:
```
group_id: str
  ID of the group to submit the data to.

study_s3_keys: List[str]
  List of S3 keys of the study files.

study_id: str, default=None
  Study ID. If no value is specified, a random UUIDv4 string is used.

name: str, default='To be detailed'
  Name of the study.

authors: List[str], default=[]
  Authors of the study.

abstract: str, default=''
  Abstract of the study.

species: str, default='human'
  Species of the study. One of:
  bioturing_connector.typing.Species.HUMAN.value,
  bioturing_connector.typing.Species.MOUSE.value, or
  bioturing_connector.typing.Species.NON_HUMAN_PRIMATE.value.

input_matrix_type: str, default='raw'
  If the value is bioturing_connector.typing.InputMatrixType.NORMALIZED.value,
  the software uses slot 'X' from the scanpy object and does not apply normalization.
  If the value is bioturing_connector.typing.InputMatrixType.RAW.value,
  the software uses slot 'raw.X' from the scanpy object and applies log-normalization.

min_counts: int, default=None
  Minimum number of counts required
  for a cell to pass filtering.

min_genes: int, default=None
  Minimum number of genes expressed required
  for a cell to pass filtering.

max_counts: int, default=None
  Maximum number of counts allowed
  for a cell to pass filtering.

max_genes: int, default=None
  Maximum number of expressed genes allowed
  for a cell to pass filtering.

mt_percentage: Union[int, float], default=None
  Maximum percentage of mitochondrial genes allowed
  for a cell to pass filtering. Ranges from 0 to 100.
```
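
To make the filtering parameters concrete, the following sketch applies the same five thresholds to a single cell on the client side. It is not the server's implementation: `cell_passes_filter` is a hypothetical function, and it assumes that bounds are inclusive and that a threshold left as `None` is simply not applied, neither of which is stated above.

```python
from typing import Optional, Union

Number = Union[int, float]


def cell_passes_filter(
    n_counts: int,
    n_genes: int,
    mt_percent: float,
    min_counts: Optional[int] = None,
    min_genes: Optional[int] = None,
    max_counts: Optional[int] = None,
    max_genes: Optional[int] = None,
    mt_percentage: Optional[Number] = None,
) -> bool:
    """Illustrative check only; a threshold left as None is skipped."""
    if mt_percentage is not None and not 0 <= mt_percentage <= 100:
        raise ValueError("mt_percentage must be between 0 and 100")
    if min_counts is not None and n_counts < min_counts:
        return False
    if max_counts is not None and n_counts > max_counts:
        return False
    if min_genes is not None and n_genes < min_genes:
        return False
    if max_genes is not None and n_genes > max_genes:
        return False
    if mt_percentage is not None and mt_percent > mt_percentage:
        return False
    return True


# A cell with 1000 counts, 500 expressed genes and 12% mitochondrial reads.
print(cell_passes_filter(1000, 500, 12.0))                    # True
print(cell_passes_filter(1000, 500, 12.0, mt_percentage=10))  # False
```

Checking a few cells this way before submission can help in choosing thresholds that do not discard most of the dataset.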

## 2. Using bioflex:

### Create a connection using an access token:

```python
from biocolabsdk.connector import EConnector

connector = EConnector(
  public_token="<input your token here>"
)
```

### List available databases:

```python
databases = connector.get_bioflex().databases()
```
>```
> [DataBase(id="5010c7d573ae4ff2b9691422b99aa2cd",
>           name="BioTuring database",species="human",version=1),
> DataBase(id="5010c7d573ae4ff2b9691422b99aa2cd",
>           name="BioTuring database",species="human",version=2),
> DataBase(id="5010c7d573ae4ff2b9691422b99aa2cd",
>           name="BioTuring database",species="human",version=3)]
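
The example output lists the same database in several versions, so selecting by list position is fragile. A minimal sketch of picking the highest version for a species follows. The `DataBase` dataclass here is a local stand-in built from the fields visible in the output above; whether the real objects expose `species` and `version` as attributes is an assumption, and `latest_version` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataBase:
    """Local stand-in mirroring the fields shown in the example output."""
    id: str
    name: str
    species: str
    version: int


def latest_version(databases: List[DataBase], species: str) -> DataBase:
    """Return the entry with the highest version for the given species."""
    candidates = [db for db in databases if db.species == species]
    if not candidates:
        raise ValueError(f"no database found for species {species!r}")
    return max(candidates, key=lambda db: db.version)


# Hard-coded copy of the example output above.
databases = [
    DataBase("5010c7d573ae4ff2b9691422b99aa2cd", "BioTuring database", "human", 1),
    DataBase("5010c7d573ae4ff2b9691422b99aa2cd", "BioTuring database", "human", 2),
    DataBase("5010c7d573ae4ff2b9691422b99aa2cd", "BioTuring database", "human", 3),
]

print(latest_version(databases, "human").version)  # 3
```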

### Get a gene expression summary per cell type for a database:

```python
database = databases[2]
database.get_celltypes_expression_summary(['CD3D', 'CD3E'])
```
>```
> {'CD3D': [Summary(name="B cell",sum=707108874.0,mean=4192.7096,rate=0.035,count=168652.0,total=4812967),
> 	Summary(name="CD4-positive, alpha-beta T cell",sum=9489987442.0,mean=4657.5619,rate=0.5283,count=2037544.0,total=3856590),
> 	...
> 	Summary(name="corneal progenitor",sum=0.0,mean=0.0,rate=0.0,count=0.0,total=3973),
> 	Summary(name="nucleus pulposus progenitor cell",sum=0.0,mean=0.0,rate=0.0,count=0.0,total=2310)]}
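
The fields of `Summary` are not defined in this README. The rows shown are numerically consistent with `mean = sum / count` and `rate = count / total`, which would make `count` the number of cells expressing the gene and `total` the number of cells of that type. This is an inference from the example values, not a documented definition. The sketch below recomputes both quantities on a local stand-in `Summary` dataclass and ranks cell types by rate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Summary:
    """Local stand-in with the fields shown in the example output."""
    name: str
    sum: float
    mean: float
    rate: float
    count: float
    total: int


def derived_mean_and_rate(s: Summary) -> Tuple[float, float]:
    """Recompute mean and rate under the inferred (unconfirmed) definitions."""
    mean = s.sum / s.count if s.count else 0.0
    rate = s.count / s.total if s.total else 0.0
    return mean, rate


# Three rows copied from the example output above.
rows = [
    Summary("B cell", 707108874.0, 4192.7096, 0.035, 168652.0, 4812967),
    Summary("CD4-positive, alpha-beta T cell", 9489987442.0, 4657.5619,
            0.5283, 2037544.0, 3856590),
    Summary("corneal progenitor", 0.0, 0.0, 0.0, 0.0, 3973),
]

# Rank cell types by the fraction of cells in which the gene was detected.
for s in sorted(rows, key=lambda r: r.rate, reverse=True):
    mean, rate = derived_mean_and_rate(s)
    print(f"{s.name}: rate={rate:.4f} mean={mean:.2f}")
```

Sorting by `rate` rather than `sum` avoids favoring cell types that are merely abundant in the database.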


### Create a study instance using a study hash ID from [BioTuring studies](https://talk2data.bioturing.com/studies/):

```python
study = database.get_study('GSE96583_batch2')
study
```
>```
> Study(id="GSE96583_batch2",
>       title="Multiplexed droplet single-cell RNA-sequencing using natural genetic variation (Batch 2)",
>       reference="https://www.nature.com/articles/nbt.4042")

### Take a peek at study metadata:

```python
study.metalist
```
>```
> [Metadata(id=0,name="Number of mRNA transcripts",type="Numeric"),
>  Metadata(id=1,name="Number of genes",type="Numeric"),
>  Metadata(id=2,name="Batch id",type="Category"),
>  Metadata(id=3,name="Stimulation",type="Category"),
>  Metadata(id=4,name="Author's cell type",type="Category")]

### Fetch the values of a study metadata field:

```python
metadata = study.metalist[4]
metadata
```
>```
>Metadata(id=4,name="Author's cell type",type="Category")
```python
metadata.fetch()
metadata.values
```
>```
> array(['CD8 T cells', 'Dendritic cells', 'CD4 T cells', ...,
>        'CD8 T cells', 'B cells', 'CD4 T cells'], dtype='<U17')
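
For a categorical field such as this one, a common next step is counting how many cells fall into each category. The sketch below uses `collections.Counter` from the standard library on a plain list holding only the six labels visible in the output above (the elided middle of the array is not reproduced), so the counts are illustrative rather than real study statistics.

```python
from collections import Counter

# Only the labels visible in the example output; in practice this would be
# metadata.values after calling metadata.fetch().
values = [
    "CD8 T cells", "Dendritic cells", "CD4 T cells",
    "CD8 T cells", "B cells", "CD4 T cells",
]

counts = Counter(values)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```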

### Query genes:

```python
import bioflex  # provides the UNIT_LOGNORM constant used below

exp_mtx = study.query_genes(['CD3D', 'CD3E'], bioflex.UNIT_LOGNORM)
exp_mtx
```
>```
> <29065x2 sparse matrix of type '<class 'numpy.float32'>'
>     with 15492 stored elements in Compressed Sparse Column format>

### Get study barcodes:

```python
study.barcodes()
```
>```
> ['GSM2560249_AAACATACCAAGCT-1',
>  'GSM2560249_AAACATACCCCTAC-1',
>  ...
>  'GSM2560249_AATTGTGATTCACT-1',
>  'GSM2560249_AATTGTGATTTCGT-1',
>  ...]

### Get study features:

```python
study.features()
```
>```
> ['5S_RRNA',
>  '5_8S_RRNA',
>  ...
>  'AC006273',
>  'AC006277',
>  ...]

### Get study full matrix:

```python
study.matrix(bioflex.UNIT_LOGNORM)
```
>```
> <29065x64642 sparse matrix of type '<class 'numpy.float32'>'
> 	with 17570739 stored elements in Compressed Sparse Column format>

### Export a study:

```python
study.export_study(bioflex.EXPORT_H5AD)
```
>```
>{'download_link': 'https://talk2data.bioturing.com/api/export/a1003bad3dd146b28c7bda913a2fc3f0',
> 'study_hash_id': 'GSE96583_batch2'}
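
The returned dictionary contains a download link rather than the file itself. One way to save the export locally is sketched below using only the standard library. Both helpers are hypothetical and not part of bioflex, and the sketch assumes the link can be fetched with a plain HTTP GET without extra authentication headers, which this README does not confirm. The download function is defined but not called here, so the snippet runs offline.

```python
import shutil
import urllib.request
from typing import Dict, Optional


def export_filename(result: Dict[str, str], extension: str = ".h5ad") -> str:
    """Build a local file name from the study hash ID in the export result."""
    return f"{result['study_hash_id']}{extension}"


def download_export(result: Dict[str, str], dest: Optional[str] = None) -> str:
    """Stream the exported file to disk and return the destination path.

    Assumes the link needs no additional authentication (unconfirmed).
    """
    dest = dest or export_filename(result)
    with urllib.request.urlopen(result["download_link"]) as response, \
            open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Copy of the example output above.
result = {
    "download_link": "https://talk2data.bioturing.com/api/export/a1003bad3dd146b28c7bda913a2fc3f0",
    "study_hash_id": "GSE96583_batch2",
}

print(export_filename(result))  # GSE96583_batch2.h5ad
```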

----
For further information please check the [documentation](https://colab.bioturing.com/).

