This document should be read as INSTRUCTIONS to perform the "genomeev" service, as created on 25 Sep 2023.
The steps to follow to perform this service (which, by the way, can be done fairly quickly computationally speaking) are the following:

- Load the samples into the RAW directory (manually or automatically using the BU-ISCIII tools)

- Copy all files from this template (manually or automatically, make sure all files are there)

- Copy the whole service folder to scratch_tmp (at least, we had to do that when this template was created)

- First part is PikaVirus. Run PikaVirus by executing lablog_pikavirus, then enter the PikaVirus folder, execute the lablog (note that you need a samples_id.txt file, if you did not create it automatically, it has to be done manually), load the modules and do the thing. Feel free to change anything in PikaVirus through command or through the config (config is recommended so that any changes can be tracked). NOTE: wait for PikaVirus to end before you continue. Do something else in the meantime. read a paper or something dunno.

- Once PikaVirus has ended, we have to dive into the results, particularly the "all_samples_virus_table.tsv" in the results dir. Here, we have to find the most abundant virus. I personally recommend opening this file in excel or similar, and find the virus that repeats the most in the samples using some formula such as "COUNTIF(range, value)". Make sure you are working with a genome and not with just a fragment of it.

- Download said assembly locally, both its fna and its gff file. Make sure you store both files with the same name and different extension. The name SHOULD include the virus name, and the GCA/GCF code so its easier to identify (example: RotavirusG8_GCA_002669555_1.fasta; RotavirusG8_GCA_002669555_1.gff). Then, place it in the corresponding directory inside "/data/bi/references/virus".

- Once the files have been placed, we have to modify the samples_ref.txt file. 
    First column will be the exact same as the samples_id.txt file. 
    Second column will be the name of the assemblies we downloaded in the previous step (example: RotavirusG8_GCA_002669555_1 ). Make sure that all the rows are the exact same.
    Third column will be the name of the host (typically "human", but can be changed depending on the situation)

- Execute the lablog_viralrecon. The ANALYSIS02 directory will be created and filled with the corresponding scripts. Load the modules and launch viralrecon.

- Once it has ended, its time for MAG. Go to the ANALYSIS03 directory, execute the lablog, load the modules and run MAG with the specified params.

- Last, but not least, go to the ANALYSIS04 directory and run the lablog, the lablog will check the assembly step in viralrecon, and will store the names of the samples that didnt assembly to the scaffold level in the noscaffold.txt file. Run normally the three scripts after loading the corresponding module, and that should be about everything there is to this service! 
