Cell Maps Image Downloader

Description: {DESCRIPTION}

Version: {VERSION}

Usage

cellmaps_imagedownloadercmd.py [-h] [--cm4ai_table CM4AI_TABLE] [--samples SAMPLES] [--unique UNIQUE] [--provenance PROVENANCE] [--proteinatlasxml PROTEINATLASXML]
                                  [--fake_images] [--poolsize POOLSIZE] [--imgsuffix IMGSUFFIX] [--skip_existing] [--skip_failed] [--logconf LOGCONF] [--skip_logging] [--verbose]
                                  [--version]
                                  outdir

For definitions of all arguments, run: cellmaps_imagedownloadercmd.py -h
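The usage above can also be driven from Python. The following is a minimal sketch, assuming the script is installed and on the PATH. The helper `build_command`, the output directory name, and the input file names are hypothetical; only flags listed in the usage text are used.

```python
import subprocess


def build_command(outdir, samples=None, unique=None, skip_existing=False):
    """Assemble an argv list using only flags shown in the usage text."""
    cmd = ["cellmaps_imagedownloadercmd.py", outdir]
    if samples is not None:
        cmd += ["--samples", samples]
    if unique is not None:
        cmd += ["--unique", unique]
    if skip_existing:
        cmd.append("--skip_existing")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; replace them with real ones before running.
    argv = build_command("./image_download_out",
                         samples="samples.csv", unique="unique.csv")
    print(" ".join(argv))
    # Uncomment to launch the tool (assumes it is installed and on PATH):
    # subprocess.run(argv, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.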

Outputs

The tool creates several files and folders in the specified output directory. Each output is described below.

- 1_image_gene_node_attributes.tsv:
  A TSV file containing attributes for image genes, generated during the first fold of execution. The leading number is the fold, so `2_image_gene_node_attributes.tsv` corresponds to the second fold, and so on.

name   represents               ambiguous  antibody   filename                 imageurl
UHRF2  ensembl:ENSG00000147854             HPA026633  B2AI_1_untreated_D2_R5_  no image url found
TET3   ensembl:ENSG00000187605             HPA050845  B2AI_1_untreated_E5_R5_  no image url found
HDAC6  ensembl:ENSG00000094631             HPA003714  B2AI_1_untreated_G3_R5_  no image url found
HDAC3  ensembl:ENSG00000171720             HPA052052  B2AI_1_untreated_D3_R7_  no image url found
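A file like this can be scanned for genes that ended up without an image. The sketch below assumes the file is tab-delimited with the header shown above and that the literal text `no image url found` marks a missing URL, as in the sample rows. The function name `genes_missing_images` is hypothetical.

```python
import csv
import io

NO_URL = "no image url found"  # literal seen in the sample rows above


def genes_missing_images(lines):
    """Return gene names whose imageurl column holds the NO_URL marker.

    `lines` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row["name"] for row in reader
            if (row.get("imageurl") or "").strip() == NO_URL]


if __name__ == "__main__":
    # Two rows copied from the sample above, joined with tabs;
    # the third field (ambiguous) is empty.
    sample = (
        "name\trepresents\tambiguous\tantibody\tfilename\timageurl\n"
        "UHRF2\tensembl:ENSG00000147854\t\tHPA026633\tB2AI_1_untreated_D2_R5_\tno image url found\n"
        "TET3\tensembl:ENSG00000187605\t\tHPA050845\tB2AI_1_untreated_E5_R5_\tno image url found\n"
    )
    print(genes_missing_images(io.StringIO(sample)))  # ['UHRF2', 'TET3']
```

To run it against a real output, pass an open file instead, for example `open("1_image_gene_node_attributes.tsv", newline="")`.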

- samples.csv:
  A CSV copy of the file passed in via the `--samples` flag. This file will only be created if the `--samples` flag is set.

- samplescopy.csv:
  A CSV file generated from data passed in via the `--cm4ai_table` flag. This file will only be created if the `--cm4ai_table` flag is set.

filename                 if_plate_id       position  sample  locations  antibody   ensembl_ids      gene_names
B2AI_1_untreated_C1_R1_  B2AI_1_untreated  C1        R1                 CAB079904  ENSG00000187555
B2AI_1_untreated_C1_R2_  B2AI_1_untreated  C1        R2                 CAB079904  ENSG00000187555
B2AI_1_untreated_C1_R3_  B2AI_1_untreated  C1        R3                 CAB079904  ENSG00000187555

- unique.csv:
  A CSV file that is a copy of the file passed in via the `--unique` flag. This file will only be created if the `--unique` flag is set.

- uniquecopy.csv:
  A CSV file generated from data passed in via the `--cm4ai_table` flag. This file will only be created if the `--cm4ai_table` flag is set.

antibody   ensembl_ids      gene_names  atlas_name  locations  n_location
CAB079904  ENSG00000187555              MDA-MB-468             0
CAB079921  ENSG00000186298              MDA-MB-468             0
CAB080425  ENSG00000108773              MDA-MB-468             0

- blue, red, green, yellow:
  Directories containing the downloaded images, one directory per color channel.
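One way to check a finished download is to confirm that every sample has an image in each color directory. The sketch below assumes, without confirmation from this document, that images are stored as `<outdir>/<color>/<filename><color><suffix>`, where `<filename>` is the prefix from the `filename` column (for example `B2AI_1_untreated_C1_R1_`) and the suffix defaults to `.jpg`. The function name `missing_images` is hypothetical.

```python
from pathlib import Path

COLORS = ("blue", "red", "green", "yellow")


def missing_images(outdir, filename_prefixes, suffix=".jpg"):
    """Map each color to the prefixes that have no image in that directory.

    Assumed layout (hypothetical): <outdir>/<color>/<prefix><color><suffix>
    """
    outdir = Path(outdir)
    missing = {}
    for color in COLORS:
        absent = [prefix for prefix in filename_prefixes
                  if not (outdir / color / f"{prefix}{color}{suffix}").is_file()]
        if absent:
            missing[color] = absent
    return missing
```

An empty dictionary means every expected file was found. If the tool was run with a different `--imgsuffix`, pass the same value as `suffix`.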

- proteinatlas.xml.gz:
  A gzipped XML file containing information fetched from the [Human Protein Atlas](https://www.proteinatlas.org/).
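The gzipped XML can be inspected without unpacking it to disk. This is a minimal sketch that makes no assumption about the Human Protein Atlas schema: it only streams the file and counts element tags, which is a quick way to see what the file contains. The function name `count_tags` is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_gz_path):
    """Stream a gzipped XML file and count how often each element tag occurs."""
    counts = Counter()
    with gzip.open(xml_gz_path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # drop text and children of elements already counted
    return counts


if __name__ == "__main__":
    import os
    path = "proteinatlas.xml.gz"  # file name taken from the output list above
    if os.path.exists(path):
        for tag, n in count_tags(path).most_common(10):
            print(tag, n)
```

Streaming with `iterparse` keeps memory use lower than loading the whole tree at once, which matters for a large file.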

Logs and Metadata

- image_gene_node_attributes.errors:
  Logs any errors encountered during the creation of image gene node attributes.

- output.log:
  A standard log file recording events, errors, and other messages during the execution of the tool.

- error.log:
  A specialized log file recording only error messages encountered during the execution of the tool.

- ro-crate-metadata.json:
  Metadata in [RO-Crate](https://www.researchobject.org/ro-crate) format, a community effort to establish a lightweight approach to packaging research data
  with their metadata. The main object contains an identifier (`@id`), a type (`@type`), a name, a description, keywords, and `isPartOf`, which describes the
  hierarchical relationship (organization and project). The `@graph` key contains an array of objects that detail other entities related to the main dataset:
  - a. Metadata, datasets, and software.
  - b. Output files: details of the output files generated by the tool.
  - c. Images: details about specific image files, including keywords, descriptions, formats, and content URLs.
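The `@graph` array can be summarized with a few lines of Python. This is a minimal sketch that relies only on the `@graph` and `@type` keys mentioned above; the function names `load_crate` and `summarize_graph` are hypothetical, and the type names that appear in a real file are not assumed here.

```python
import json
from collections import Counter


def load_crate(path):
    """Read an RO-Crate metadata file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_graph(crate):
    """Count the entities in the @graph array by their @type."""
    counts = Counter()
    for entity in crate.get("@graph", []):
        types = entity.get("@type", "unknown")
        # In JSON-LD, @type may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts
```

For example, `summarize_graph(load_crate("ro-crate-metadata.json"))` returns a `Counter` keyed by type, which gives a quick overview of how many datasets, software entries, and files the crate describes.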
