Evolutionary transcriptomics studies can serve as a first approach to
screen in silico for the potential existence of evolutionary
constraints within a biological process of interest. This is achieved by
quantifying transcriptome conservation patterns and their underlying
gene sets in biological processes. The exploratory analysis functions
implemented in myTAI provide users with a standardized, automated and
optimized framework to detect patterns of evolutionary constraints in
any transcriptome dataset of interest.
Please find detailed documentation here.
Please cite the following paper when using myTAI for your own
research. This will allow me to continue working on this software tool
and will motivate me to extend its functionality and usability in the
next years. Many thanks in advance :)
Drost et al. myTAI: evolutionary transcriptomics with R. Bioinformatics 2018, 34 (9), 1589-1590. doi:10.1093
Please install the following package dependencies:
# Install core Bioconductor packages
if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install()
# Install package dependencies
BiocManager::install("Biostrings")
BiocManager::install("edgeR")
Now users can install myTAI from CRAN:
# install myTAI 0.9.3
install.packages("myTAI", dependencies = TRUE)
Using myTAI, any existing or newly generated transcriptome dataset can
be combined with evolutionary information (find details here) to
retrieve novel insights about the evolutionary conservation of the
transcriptome at hand.
For the purpose of performing large scale evolutionary
transcriptomics studies, the myTAI package implements the
quantification, statistical assessment, and analytics functionality to
allow researchers to study the evolution of biological processes by
determining stages or periods of evolutionary conservation or
variability in transcriptome data.
We hope that myTAI will become the community standard tool to perform
evolutionary transcriptomics studies and we are happy to add required
functionality upon request.
Today, phenotypic phenomena such as morphological mutations, diseases, or developmental processes are primarily investigated at the molecular level using transcriptomics approaches. A transcriptome denotes the entirety of quantifiable transcripts present at a specific stage of a biological process. In disease or developmental (defect) studies, transcriptomes are usually measured over several time points. In treatment studies that aim to quantify differences in the transcriptome due to biotic stimuli, abiotic stimuli, or diseases, treatment / disease transcriptomes are usually compared with non-treatment / non-disease transcriptomes. In either case, comparing changes in transcriptomes over time or between treatments allows us to identify genes and gene regulatory mechanisms that might be involved in governing the biological process under investigation. Although transcriptomics studies are based on a powerful methodology, little is known about the evolution of such transcriptomes. Understanding the evolutionary mechanisms that change transcriptomes over time, however, might give us a new perspective on how diseases emerge in the first place or how morphological changes are triggered by changes in developmental transcriptomes.
Evolutionary transcriptomics aims to capture and quantify the evolutionary conservation of genes that contribute to the transcriptome during a specific stage of the biological process of interest. The resulting temporal conservation pattern then makes it possible to detect stages of development or other biological processes that are evolutionarily conserved (Drost et al., 2018). At the highest level, this quantification is achieved through transcriptome indices (e.g. Transcriptome Age Index or Transcriptome Divergence Index), which aim to quantify the average evolutionary age or sequence conservation of genes that contribute to the transcriptome at a particular stage. In general, evolutionary transcriptomics can be used as a method to quantify the evolutionary conservation of transcriptomes to investigate how transcriptomes underlying biological processes are constrained or channeled due to events in evolutionary history (Dollo's law) (Drost et al., 2017).
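To make the idea of a transcriptome index concrete, here is a minimal base R sketch. It is not the myTAI implementation. It assumes the standard definition of the Transcriptome Age Index from Domazet-Lošo and Tautz (2010), in which the TAI of a stage is the expression-weighted mean of the phylostratum (gene age) values of all genes. The toy table and the helper name toy_tai() are hypothetical and serve illustration only.

```r
# Toy table in a PhyloExpressionSet-like layout (hypothetical values):
# column 1 = phylostratum (1 = oldest), column 2 = gene id,
# remaining columns = expression level per stage.
toy_set <- data.frame(
  Phylostratum = c(1, 1, 2, 3),
  GeneID       = c("g1", "g2", "g3", "g4"),
  Stage1       = c(10, 10, 10, 10),
  Stage2       = c(40,  0,  0,  0),
  Stage3       = c( 0,  0,  0, 20)
)

# Expression-weighted mean phylostratum per stage:
# TAI_s = sum_i(ps_i * e_is) / sum_i(e_is)
toy_tai <- function(x) {
  ps   <- x[[1]]                  # gene age of each gene
  expr <- as.matrix(x[, -(1:2)])  # genes x stages expression matrix
  colSums(ps * expr) / colSums(expr)
}

toy_tai(toy_set)
```

In this toy example Stage2 is dominated by an old gene and therefore receives the lowest index, while Stage3 is dominated by the youngest gene and receives the highest. For a real PhyloExpressionSet, the TAI() function of myTAI computes this index.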
Please note that myTAI relies on gene age inference, and there has been extensive debate in recent years about the best approaches for gene age inference. Please follow my updated discussion of the gene age inference literature.
Some bug fixes or new functionality will not be available on CRAN
yet, but in the developer version here on GitHub. To download and
install the most recent version of myTAI run:
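if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install()
# Install package dependencies
# (no Bioconductor version is pinned, so the release matching your R version is used)
BiocManager::install("Biostrings")
BiocManager::install("edgeR")
# install developer version of myTAI
# ("owner/repo" is installed from GitHub; this requires the remotes package)
BiocManager::install("drostlab/myTAI")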
The current status of the package as well as a detailed history of
the functionality of each version of myTAI can be found in the NEWS
section.
The following tutorials will provide use cases and detailed
explanations of how to quantify transcriptome conservation with myTAI
and how to interpret the results generated with this software tool.
library(myTAI)
# example dataset covering 7 stages of A. thaliana embryo development
data("PhyloExpressionSetExample")
# transform absolute expression levels to log2 expression levels
ExprExample <- tf(PhyloExpressionSetExample, log2)
# visualize global Transcriptome Age Index pattern
PlotSignature(ExprExample)
# plot expression level distributions for each age (=PS) category
# and each developmental stage
PlotCategoryExpr(ExprExample, "PS")
# plot mean expression of each age category separated by old (PS1-3)
# versus young (PS4-12) genes
PlotMeans(ExprExample, Groups = list(1:3, 4:12))
# plot relative mean expression of each age category separated by old (PS1-3)
# versus young (PS4-12) genes
PlotRE(ExprExample, Groups = list(1:3, 4:12))
# plot the significant differences between gene expression distributions
# of old (=group1) versus young (=group2) genes
PlotGroupDiffs(ExpressionSet = ExprExample,
               Groups        = list(group_1 = 1:3, group_2 = 4:12),
               legendName    = "PS",
               plot.type     = "boxplot")
Users can also read the tutorials within RStudio:
# load the myTAI package
library(myTAI)
# look for all tutorials (vignettes) available in the myTAI package
# this will open your web browser
browseVignettes("myTAI")
# or as single tutorials
# open tutorial: Introduction to Phylotranscriptomics and myTAI
vignette("Introduction", package = "myTAI")
# open tutorial: Intermediate Concepts of Phylotranscriptomics
vignette("Intermediate", package = "myTAI")
# open tutorial: Advanced Concepts of Phylotranscriptomics
vignette("Advanced", package = "myTAI")
# open tutorial: Age Enrichment Analyses
vignette("Enrichment", package = "myTAI")
# open tutorial: Gene Expression Analysis with myTAI
vignette("Expression", package = "myTAI")
# open tutorial: Taxonomic Information Retrieval with myTAI
vignette("Taxonomy", package = "myTAI")
In the myTAI framework users can find:
TAI(): Function to compute the Transcriptome Age Index (TAI)
TDI(): Function to compute the Transcriptome Divergence Index (TDI)
TPI(): Function to compute the Transcriptome Polymorphism Index (TPI)
REMatrix(): Function to compute the relative expression profiles of all phylostrata or divergence-strata
RE(): Function to transform mean expression levels to relative expression levels
pTAI(): Compute the Phylostratum Contribution to the global TAI
pTDI(): Compute the Divergence Stratum Contribution to the global TDI
pMatrix(): Compute Partial TAI or TDI Values
pStrata(): Compute Partial Strata Values
PlotSignature(): Main visualization function to plot evolutionary signatures across transcriptomes
PlotPattern(): Base graphics function to plot evolutionary signatures across transcriptomes
PlotContribution(): Plot Cumulative Transcriptome Index
PlotCorrelation(): Function to plot the correlation between phylostratum values and divergence-stratum values
PlotRE(): Function to plot the relative expression profiles
PlotBarRE(): Function to plot the mean relative expression levels of phylostratum or divergence-stratum classes as barplot
PlotMeans(): Function to plot the mean expression profiles of age categories
PlotMedians(): Function to plot the median expression profiles of age categories
PlotVars(): Function to plot the expression variance profiles of age categories
PlotDistribution(): Function to plot the frequency distribution of genes within the corresponding age categories
PlotCategoryExpr(): Plot the Expression Levels of each Age or Divergence Category as Barplot or Violinplot
PlotEnrichment(): Plot the Phylostratum or Divergence Stratum Enrichment of a given Gene Set
PlotGeneSet(): Plot the Expression Profiles of a Gene Set
PlotGroupDiffs(): Plot the significant differences between gene expression distributions of PS or DS groups
PlotSelectedAgeDistr(): Plot the PS or DS distribution of a selected set of genes
FlatLineTest(): Function to perform the Flat Line Test that quantifies the statistical significance of an observed phylotranscriptomics pattern (significant deviation from a flat line = evolutionary signal)
ReductiveHourglassTest(): Function to perform the Reductive Hourglass Test that statistically evaluates the existence of a phylotranscriptomic hourglass pattern (hourglass model)
EarlyConservationTest(): Function to perform the Reductive Early Conservation Test that statistically evaluates the existence of a monotonically increasing phylotranscriptomic pattern (early conservation model)
ReverseHourglassTest(): Function to perform the Reverse Hourglass Test that statistically evaluates the existence of a reverse hourglass pattern (low-high-low)
EnrichmentTest(): Phylostratum or Divergence Stratum Enrichment of a given Gene Set based on Fisher's Test
bootMatrix(): Compute a Permutation Matrix for Test Statistics
All functions also include visual analytics tools to quantify the goodness of test statistics.
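The idea behind a permutation-based flat line test can be sketched in base R. This is a simplified illustration under stated assumptions, not the FlatLineTest() implementation: it takes the variance of the TAI profile as the test statistic, shuffles the phylostratum labels to build a null distribution, and reports an empirical p-value, whereas the package may model the null distribution differently. The simulated data and the helper names tai_profile() and toy_flat_line_p() are hypothetical.

```r
set.seed(1)  # reproducible toy example

# Hypothetical toy data: 200 genes, 5 stages, phylostrata 1..5
n_genes <- 200
ps   <- sample(1:5, n_genes, replace = TRUE)
expr <- matrix(rexp(n_genes * 5, rate = 0.1), nrow = n_genes)
# Make young genes (PS 4-5) more highly expressed in the last two stages
young <- ps >= 4
expr[young, 4:5] <- expr[young, 4:5] * 5

# Expression-weighted mean phylostratum per stage
tai_profile <- function(ps, expr) colSums(ps * expr) / colSums(expr)

# Empirical p-value: how often does a random gene-age assignment
# produce a profile at least as variable as the observed one?
toy_flat_line_p <- function(ps, expr, permutations = 1000) {
  observed <- var(tai_profile(ps, expr))
  null_var <- replicate(permutations, var(tai_profile(sample(ps), expr)))
  (sum(null_var >= observed) + 1) / (permutations + 1)
}

p <- toy_flat_line_p(ps, expr)
p
```

A small p-value indicates that the observed profile deviates from a flat line more strongly than expected when gene ages are assigned at random. Adding 1 to the numerator and the denominator keeps the empirical p-value strictly above zero.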
DiffGenes(): Implements Popular Methods for Differential Gene Expression Analysis
CollapseReplicates(): Combine Replicates in an ExpressionSet
CombinatorialSignificance(): Compute the Statistical Significance of Each Replicate Combination
Expressed(): Filter Expression Levels in Gene Expression Matrices (define expressed genes)
SelectGeneSet(): Select a Subset of Genes in an ExpressionSet
PlotReplicateQuality(): Plot the Quality of Biological Replicates
GroupDiffs(): Quantify the significant differences between gene expression distributions of PS or DS groups
taxonomy(): Retrieve Taxonomic Information for any Organism of Interest
MatchMap(): Match a Phylostratigraphic Map or Divergence Map with an ExpressionMatrix
tf(): Transform Gene Expression Levels
age.apply(): Age Category Specific apply Function
ecScore(): Compute the Hourglass Score for the EarlyConservationTest
geom.mean(): Geometric Mean
harm.mean(): Harmonic Mean
omitMatrix(): Compute TAI or TDI Profiles Omitting a Given Gene
rhScore(): Compute the Hourglass Score for the Reductive Hourglass Test
reversehourglassScore(): Compute the Reverse Hourglass Score for the Reverse Hourglass Test
Studies that used myTAI to quantify transcriptome conservation:
Evolutionary transcriptomics of metazoan biphasic life cycle supports a single intercalation origin of metazoan larvae J Wang, L Zhang, S Lian, Z Qin, X Zhu, X Dai, Z Huang et al. - Nature Ecology & Evolution, 2020
Reconstructing the transcriptional ontogeny of maize and sorghum supports an inverse hourglass model of inflorescence development S Leiboff, S Hake - Current Biology, 2019
Evidence for active maintenance of phylotranscriptomic hourglass patterns in animal and plant embryogenesis HG Drost, A Gabel, I Grosse, M Quint - Molecular biology and evolution, 2015
Gene Expression Does Not Support the Developmental Hourglass Model in Three Animals with Spiralian Development L Wu, KE Ferger, JD Lambert - Molecular biology and evolution, 2019
The Transcriptional Landscape of Polyploid Wheats and their Diploid Ancestors during Embryogenesis and Grain Development D Xiang, TD Quilichini, Z Liu, P Gao, Y Pan et al. - The Plant Cell, 2019
Developmental constraints on genome evolution in four bilaterian model species J Liu, M Robinson-Rechavi - Genome biology and evolution, 2018
Mapping selection within Drosophila melanogaster embryo’s anatomy I Salvador-Martínez et al. - Molecular biology and evolution, 2017
Distribution and diversity of enzymes for polysaccharide degradation in fungi R Berlemont - Scientific reports, 2017
The origins and evolutionary history of human non-coding RNA regulatory networks M Sherafatian, SJ Mowla - Journal of bioinformatics and computational biology, 2017
Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry SM Mahendran, EC Keystone, RJ Krawetz et al. - Clinical proteomics, 2019
Phylostratr: a framework for phylostratigraphy Z Arendsee, J Li, U Singh, A Seetharam et al. - Bioinformatics, 2019
Pervasive convergent evolution and extreme phenotypes define chaperone requirements of protein homeostasis Y Draceni, S Pechmann - BioRxiv, 2019
Environmental DNA reveals landscape mosaic of wetland plant communities ME Shackleton, GN Rees, G Watson et al. - Global Ecology and Conservation, 2019
Algorithms for synteny-based phylostratigraphy and gene origin classification Z Arendsee - 2019
High expression of new genes in trochophore enlightening the ontogeny and evolution of trochozoans F Xu, T Domazet-Lošo, D Fan, TL Dunwell, L Li et al. - Scientific reports, 2016
I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.
Furthermore, in case you find some bugs or need additional (more flexible) functionality of parts of this package, please let me know:
https://github.com/drostlab/myTAI/issues
Domazet-Lošo T. and Tautz D. A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (2010) 468: 815-8.
Quint M, Drost HG, et al. A transcriptomic hourglass in plant embryogenesis. Nature (2012) 490: 98-101.
Drost HG, Gabel A, Grosse I, Quint M. Evidence for Active Maintenance of Phylotranscriptomic Hourglass Patterns in Animal and Plant Embryogenesis. Mol. Biol. Evol. (2015) 32 (5): 1221-1231.
Drost HG, Bellstädt J, Ó’Maoiléidigh DS, Silva AT, Gabel A, Weinholdt C, Ryan PT, Dekkers BJW, Bentsink L, Hilhorst H, Ligterink W, Wellmer F, Grosse I, and Quint M. Post-embryonic hourglass patterns mark ontogenetic transitions in plant development. Mol. Biol. Evol. (2016) doi:10.1093/molbev/msw039
I would like to thank several individuals for making this project possible.
First I would like to thank Ivo Grosse and Marcel Quint for providing me a place and the environment to be able to work on fascinating topics of Evo-Devo research and for the fruitful discussions that led to projects like this one.
Furthermore, I would like to thank Alexander Gabel and Jan Grau for valuable discussions on how to improve some methodological concepts of some analyses present in this package.
I would also like to thank my past Master Students: Sarah Scharfenberg, Anne Hoffmann, and Sebastian Wussow who worked intensively with this package and helped me to improve the usability and logic of the package environment.