This is a flexible tool for enrichment analysis based on user-defined sets. It allows users to perform over-representation analysis of the custom sets among any specified ranked feature list, hence making enrichment analysis applicable to various types of data from different scientific fields. EnrichIntersect also enables an interactive means to visualize identified associations based on, for example, the mix-lasso model (Zhao et al., 2022) or similar methods.
To get started, load the package with
An example input data object cancers_drug_groups
is an R
list
provided in our package, which includes a
data.frame
object with 147 cancer drugs as rows and nine
cancer types as columns, and another data.frame
that groups
the 147 drugs (second column) into nine user-defined drug classes (first
column). The default setup of enrichment()
uses the classic
K-S test statistic to calculate the normalized enrichment score that
quantifies the degree to which the features in a feature set are
over-represented at the top or bottom of the entire ranked list of
features (e.g. a list of drugs), using as default 100 permutations for
the empirical null test statistic. In the visualization, the
statistically significantly enriched feature sets are marked with red
coloured circles given a pre-specified significance level. The
pre-specified significance level will be adjusted automatically if
argument padj.method
is one of
c("holm","hochberg","hommel","bonferroni","BH","BY","fdr","none")
.
Users can specify the argument alpha
for calculating a
weighted enrichment score, argument normalize=FALSE
for
using the standard enrichment score rather than the normalized score,
argument permute.n
for the number of permutations of the
ranked feature list used for estimating the empirical null test
statistic, and the argument pvalue.cutoff
for marking
enriched categories at a specific significance level. See the following
code for an example.
EnrichIntersect
function intersectSankey()
creates a Sankey diagram to visualize intersecting sets from an
array
object, in which the first dimension represents
intermediate variables, and the second and third dimensions represent
multiple levels and multiple tasks, respectively. One intersecting set
is a list of intermediate variables associated with a combination of a
subset of levels and a subset of tasks, which is not easy to visualize
when all possible combinations of the two are many. Our function
intersectSankey()
has adapted sankeyNetwork()
from R package networkD3
to create a
D3'
JavaScript’ interactive Sankey diagram in order to be
suitable for several levels, multiple tasks and many intermediate
variables. Besides saving the Sankey diagram as an interactive html
file, similarly to networkD3
, the user can also save the
Sankey diagram as a pdf or png file via R package webshot2
.
The argument out.fig=c(NA,"html","pdf","png")
in the
function intersectSankey()
determines the figure on the
user’s R graphics device, to be saved either as a html, pdf or png
file.
An example input data object cancers_genes_drugs
in the
package is an array
with associations between, e.g., 56
genes (first dimension), two cancer types (second dimension) and two
drugs (third dimension) provided in our package. The user can adjust the
Sankey diagram argument out.fig
for different output graph
types and use argument step.names
to indicate the labels of
the three kinds of variables in a Sankey diagram, i.e., name of multiple
levels, name of intermediate variables, and name of multiple tasks, see
the following code for an example.
Zhi Zhao, Manuela Zucknick, Tero Aittokallio (2022). EnrichIntersect: an R package for custom set enrichment analysis and interactive visualization of intersecting sets. Bioinformatics Advances, 2(1), vbac073. DOI: 10.1093/bioadv/vbac073.
Zhi Zhao, Shixiong Wang, Manuela Zucknick, Tero Aittokallio (2022). Tissue-specific identification of multi-omics features for pan-cancer drug response prediction. iScience, 25(8): 104767. DOI: 10.1016/j.isci.2022.104767.