1.0.0
=====

* Major redesign of how permanent data is stored. Removed the 'outdir'
  parameter from all pipelines; permanent data is now stored directly in the
  Agalma sqlite database file (#163). Merged the separate Agalma and BioLite
  database files into a single file that can be specified with the environment
  variable AGALMA_DB (#162). This allows an entire Agalma analysis to be stored
  in a single sqlite file that can be relocated during analysis, or copied to a
  new sqlite file as the starting point of alternative analyses.
* New import and annotation methods to support CDS sequences and amino acid
  sequences without accompanying nucleotide sequences. (#98, #174, #197)
* Removed deprecated phylogeny pipelines `orthologize` and `multalignx`. (#172)
* Implemented phylogenetically guided assembly with the `treeinform` pipeline.
  Briefly, treeinform analyzes gene trees to infer a reassignment of assembled
  sequences to gene names in the existing Trinity assemblies. (#199, #203)
* Added validity tests for sequence reassignments in treeinform. (#223)
* Updated dendropy to latest version 4. (#207)
* Updated Trinity to latest version 2.3.2. (#188)
* Restored and improved transcriptome reports. (#220)
* Replaced notung with phyldog for inferring duplication/speciation events in
  'export_expression'. (#225)
* Replaced BioBrew installation instructions with prebuilt binary releases for
  all dependencies managed with Anaconda Python. The release includes a Docker
  image. (#227)
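
The first item above says an analysis can be copied to a new sqlite file as the
starting point for an alternative analysis. A minimal sketch of that step,
assuming hypothetical file paths and using Python's standard `sqlite3` backup
API rather than any Agalma command:

```python
import os
import sqlite3

def branch_database(src_path, dst_path):
    """Copy an sqlite file to dst_path and select the copy via AGALMA_DB."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Connection.backup copies the full contents of src into dst
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    # AGALMA_DB is the variable named in the notes above; setting it here
    # only affects this process and any child processes it launches
    os.environ["AGALMA_DB"] = dst_path
    return dst_path
```

Pipelines started from the same process would then read and write the copy,
leaving the original file untouched.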

0.5.0
=====

* Added a tool 'agalma-export-expression' for exporting a single JSON file with the
  gene trees, species tree, and expression counts for a phylogenetic analysis of
  gene expression. The JSON file can be imported into R for downstream analyses.
  (#103, #119, #136)
* Updated 'assemble' to use Trinity version r20140413p1. (#127, #77, #91)
* Fixed a bug with incorrect arguments to a GNU parallel call in 'postassemble'.
  (#135)
* Added a '--min_nodes' argument to 'homologize' for adjusting the minimum number
  of nodes retained as a homologous gene cluster. (#139, contributed by Warren
  Francis)
* Fixed a bug in 'sanitize' where FastQC failed for gzipped FASTQ files. (#129)
* The rRNA exemplars are now annotated with the closest BLAST hit from the set
  of curated rRNA sequences. The curated rRNA sequences now require an "OG"
  field in the header for mitochondrial or plastid rRNA. Fixed a bug in the
  'load' pipeline with identifying mitochondrial and plastid sequences. (#134)
* The links to the final assembly files are now much more conspicuous in the
  'postassemble' report. (#137)
* Added a '--timeout' argument to the 'genetree' pipeline that sets the maximum
  amount of time to spend on estimating an individual gene tree with RAxML. Trees
  that can't be estimated within this timeframe are dropped from further analysis,
  and the number of failed trees is reported in the diagnostics. (#143)
* Updated the TUTORIAL with a section on the `expression` pipeline. (#140)
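
The '--timeout' item above describes dropping trees that cannot be estimated in
time and reporting how many failed. The general pattern can be sketched with
the standard `subprocess` timeout, as below. This is an illustration only, not
the 'genetree' implementation: the function name and the command lists are
hypothetical placeholders.

```python
import subprocess

def run_with_timeout(commands, timeout):
    """Run each command; keep those that finish in time, count the rest.

    `commands` maps a hypothetical cluster name to an argument list.
    """
    finished, failed = [], 0
    for name, argv in commands.items():
        try:
            # subprocess.run kills the child and raises if it exceeds `timeout`
            subprocess.run(argv, timeout=timeout, check=True)
            finished.append(name)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            # Dropped from further analysis, but counted for the diagnostics
            failed += 1
    return finished, failed
```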

0.4.0
=====

* Switched the MAFFT algorithm in 'multalign' from L-INS-i to E-INS-i.
* 'remove_rrna' is now more robust to failed subassemblies and datasets that
  contain no ribosomal RNA.
* 'remove_rrna' now excludes only the reads that map to the exemplar rRNA
  sequences, instead of to all sequences in the rRNA subassembly. The exemplar
  sequences and any rRNA transcripts identified in 'postassemble' are concatenated
  into a single rRNA file in the 'postassemble' report. (#112)
* Standardized all calls to GNU parallel so that they write out a progress log,
  halt on any errors, and rerun failed commands when the pipeline is restarted.
  (#111)
* The 'supermatrix' pipeline now runs on a single multiple alignment of either
  nucleotides or amino acids (instead of both at once) and provides a more detailed
  report. (#54)
* Switched the build system from GNU autotools to Python distutils. Agalma can
  now be installed with pip, the Python package manager. (#97)
* The 'randomize' stage was removed from 'sanitize' because it requires too much
  memory for large datasets. (#118)
* A new pipeline 'speciestree' replaces the second call to the 'genetree'
  pipeline to build a maximum-likelihood tree from the final supermatrix. It
  supports running RAxML with MPI. The Newick tree is included in a textbox in
  the report. (#104, #115)
* 'postassemble' now annotates assemblies by blasting the translations against
  swissprot with blastp. (#114)
* A new regression test is autogenerated from the tutorial. (#93)
* Sequences loaded into the Agalma database are classified by genome type
  (nuclear, mitochondrial, plastid) and molecule type (protein-coding, large and
  small ribosomal). (#106, #117)
* Agalma now includes its own build of the swissprot blast database, with the
  organelle field (OG) in the description for identifying mitochondrial and
  plastid sequences. (#3, #106)
* Added a sample batch script that shows how to perform a self-contained
  transcriptome assembly on the Oscar compute cluster at Brown.
* Fixed bugs in 'remove_rrna' that caused it to fail when no rRNA is present in
  the sample.
* Transitioned over to a new organization of wrappers and workflows in BioLite.
  Removed a shim library that was used for resource reporting on older versions
  of Linux. (#107)
* New 'expression' pipeline maps reads to an assembly and estimates counts.
  Additional functionality is planned for the 0.5 release. (#69)
* Fixed a problem when relative paths were passed as arguments to Agalma
  pipelines. (#96)
* All stages of the assemblies are now written to the data directory instead of
  scratch. (#95)

0.3.5
=====

* Hotfixes to correct errors in TUTORIAL and append the supermatrix FASTA file
  to the 'multalign' report.
* New 'supermatrix' pipeline can construct supermatrices by occupancy
  proportion. (#75)
* New 'multalign' pipeline uses MAFFT instead of MACSE for multiple alignment
  of translated protein sequences. The simultaneous alignment and translation
  approach originally implemented in Agalma can improve translations by
  accommodating frameshifts; however, mistakenly including fairly distant
  homologs or erroneous transcripts within clusters can result in overall poor
  translations and alignment of clusters.  The old multalign pipeline was
  renamed 'multalignx' where the 'x' stands for translated multiple alignment,
  since MACSE uses nucleotide alignments to infer translations. (#79)
* Improved the linkage between the phylogeny pipelines, so that the most recent
  previous run of the correct type is identified by default. A previous run
  can be explicitly chosen with the --previous argument (now consistent across
  all the pipelines). (#85)
* Rewrote the 'assemble' pipeline to subsume the Trinity.pl wrapper script, and
  run the various components of Trinity as separate stages within the pipeline.
  This provides finer grained resource usage and fixes some problems with
  robustness and memory use we were experiencing on our compute cluster. GNU
  parallel replaces ParaFly for both the quantify_graph and butterfly stages.
  Oases is no longer supported in 'assemble', but additional assemblers could
  be added in the future as variants on the 'assemble' pipeline, e.g.
  'assemble_oases'. (#87)
* The 'supermatrix' report now includes a table of the percentage of genes
  present for each taxon. (#82)
* The regression tests are taking longer to run (30-40 minutes) and have been
  divided up into different levels. The default level (1) now runs in about
  (16-cores). Higher levels (2 or 3) provide more complete tests and are
  selected with 'agalma test X'. (#92)
* Added a histogram of mean quality scores to the 'sanitize' report. (#90)
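
The per-taxon table mentioned above for the 'supermatrix' report reduces to a
simple count. A minimal sketch, assuming a hypothetical input that maps each
gene to the set of taxa with a sequence for it (this is not the report's actual
code):

```python
def percent_genes_present(gene_taxa):
    """Return {taxon: percentage of genes in which the taxon appears}.

    `gene_taxa` maps a gene name to the set of taxa with a sequence for it.
    """
    n_genes = len(gene_taxa)
    counts = {}
    for taxa in gene_taxa.values():
        for taxon in taxa:
            counts[taxon] = counts.get(taxon, 0) + 1
    # A taxon present in every gene scores 100.0
    return {taxon: 100.0 * n / n_genes for taxon, n in counts.items()}
```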

0.3.4
=====

* Improved parallelization of the blastx annotation in 'postassemble'. (#53)
* 'homologize' has a new mode for seeding the homology search with an existing
  set of genes, such as CEGMA or a previously computed supermatrix. Instead of
  performing an all-by-all homology search, transcripts are only aligned
  against the seed genes. (#56, #59)
* New parameter in 'genetree' to disable bootstrapping or change the threshold
  for filtering by mean bootstrap support. (#60)
* Added multi-node parallelism to 'multalign' and 'genetree' using GNU
  parallel. (#58, #61)
* 'postassemble' now performs protein translation (largest open reading frame
  with Transdecoder) and transcript quantification (with RSEM). The schema for
  the 'sequences' table was updated so that exemplars are now selected as the
  transcript with highest abundance in a locus, rather than by the earlier
  ad-hoc selection of the longest transcript in the locus. Exemplars are now
  chosen in 'homologize' (via 'database.load_seqs') and not in 'postassemble'.
  (#57, #63)
* New 'orthologize' pipeline provides an alternative phylogeny pipeline that
  directly infers orthologs using OMA. (#64)
* Sequence reduction plot in the phylogeny report has more detail: added
  sequence counts before and after 'homologize.mcl_cluster' and for each filter
  applied in 'multalign.refine_clusters'. (#70, #71)
* Fixed a mis-calculation in the overlap threshold applied in
  'homologize.parse_edges'. (#72)
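
The exemplar rule described above (highest abundance in a locus, rather than
longest transcript) can be sketched as follows. The tuple layout and function
name are assumptions for illustration and do not reflect the schema of the
'sequences' table or the code in 'database.load_seqs':

```python
def select_exemplars(transcripts):
    """Pick, per locus, the transcript with the highest abundance.

    `transcripts` is an iterable of (locus, transcript_id, abundance) tuples.
    """
    best = {}
    for locus, tid, abundance in transcripts:
        # On ties, the transcript seen first is kept
        if locus not in best or abundance > best[locus][1]:
            best[locus] = (tid, abundance)
    return {locus: tid for locus, (tid, _) in best.items()}
```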

0.3.3
=====

* Added bootstrapping to RAxML calls in the 'genetree.genetrees' stage, and a
  subsequent filtering stage that removes trees with low mean bootstrap
  support. (#43)
* Removed the auto-generated report at the end of 'transcriptome' and put the
  appropriate report commands in the TUTORIAL. (#51)
* Added report commands to the phylogeny section of the TUTORIAL. (#50)
* Fixed problems with 'tabular_report' that caused unnecessary rows and empty
  table cells. (#52)
* A new option '--nreads' for reducing the number of reads that 'sanitize'
  outputs. (#49)
* Modified 'load' to correctly validate external assemblies with IUPAC
  ambiguity codes. (#41)
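
The bootstrap filtering stage described in the first item of this release
removes trees with low mean support. A minimal sketch of that kind of filter,
assuming each tree is already reduced to a list of per-branch support values
(the input format, function name, and threshold are hypothetical):

```python
def filter_by_mean_support(trees, threshold):
    """Keep trees whose mean bootstrap support is at least `threshold`.

    `trees` maps a tree name to a list of per-branch support values.
    """
    kept = {}
    for name, supports in trees.items():
        # Trees with no support values cannot be scored and are dropped
        if supports and sum(supports) / len(supports) >= threshold:
            kept[name] = supports
    return kept
```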

0.3.2
=====

* Added 'resource_report' and 'phylogeny_report' utilities.
* Additional reporting for phylogeny pipelines:
  o 'genetree' reports maximum likelihood tree when run on a supermatrix.
  o supermatrix image in 'multalign', ordered by most complete taxon and gene.
  o some histograms were changed to tables for small numbers of taxa.
* Updates to README and TUTORIAL:
  o Clarified that the Agalma-bundled SwissProt database only includes Metazoa.
  o Fixed overwrite of 'BIOLITE_RESOURCES' variable in TUTORIAL. (#24)
* 'homologize' now ignores bad BLAST hits, which seem to occur for query
  sequences longer than 10 kb and in which the original query ID is lost in
  the output.
* Fixed bug with passing flags through to RAxML in 'genetree'. (#19)
* Removed a hard-coded minimum cluster size of 3 from 'multalign' and replaced
  with the 'min_taxa' value (which should never be less than 4).
* New mechanism to break up the expensive all-by-all tblastx in 'homologize',
  so that many smaller chunks can be run externally/concurrently, and read
  back into the pipeline. This feature is not yet tested and we plan to finish
  it in the 0.3.3 release.
* Fixed default RAxML model in genetree. (#9)
* New regression test feature 'agalma test' downloads and runs a small
  transcriptome and phylogeny example to verify correct installation and
  validate changes to the code base.
* Phylogeny pipelines can now pass a common ID with --id and they will
  intelligently find the appropriate output from earlier pipelines. Previously,
  numeric run IDs had to be passed between pipelines. This is demonstrated in
  the TUTORIAL.

0.3.1
=====

* Split off part of the 'assemble' pipeline into a new 'postassemble' pipeline
  that performs all post-assembly filtering, coverage analysis, and annotation.
  It can be run on external (non-Agalma) assemblies prior to load, although
  the exemplars stage needs to be skipped if the assembly does not have
  Oases-style headers.
* Removed the annotation stage from 'load' pipeline, since this is now
  provided by 'postassemble' for external assemblies.
* Updated TUTORIAL now has a more complete phylogeny section and includes
  estimates of resource requirements.
* bugfix: typo in 'agalma_database' key in default agalma.cfg
* bugfix: missing 'cd' command in ubuntu install script

