---
title: "EnrichIntersect Tutorial"
author: "Zhi Zhao"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{EnrichIntersect}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


This is a flexible tool for enrichment analysis based on user-defined sets. It allows users to perform over-representation analysis of the custom sets among any specified ranked feature list, hence making enrichment analysis applicable to various types of data from different scientific fields. EnrichIntersect also enables an interactive means to visualize identified associations based on, for example, the mix-lasso model ([Zhao et al., 2022](https://doi.org/10.1016/j.isci.2022.104767)) or similar methods.

To get started, load the package with

```{r, eval=TRUE}
library("EnrichIntersect")
```

## Examples


### Plot enrichment map

An example input data object `cancers_drug_groups` is an R `list` provided in our package, which includes a `data.frame` object with 147 cancer drugs as rows and nine cancer types as columns, and another `data.frame` that groups the 147 drugs (second column) into nine user-defined drug classes (first column). The default setup of `enrichment()` uses the classic K-S test statistic to calculate the normalized enrichment score that quantifies the degree to which the features in a feature set are over-represented at the top or bottom of the entire ranked list of features (e.g. a list of drugs), using as default 100 permutations for the empirical null test statistic. In the visualization, the statistically significantly enriched feature sets are marked with red coloured circles given a pre-specified significance level. The pre-specified significance level will be adjusted automatically if argument `padj.method` is one of `c("holm","hochberg","hommel","bonferroni","BH","BY","fdr","none")`. Users can specify the argument `alpha` for calculating a weighted enrichment score, argument `normalize=FALSE` for using the standard enrichment score rather than the normalized score, argument `permute.n` for the number of permutations of the ranked feature list used for estimating the empirical null test statistic, and the argument `pvalue.cutoff` for marking enriched categories at a specific significance level. See the following code for an example.

```{r enrichment_map, results="hide", fig.width=7.2, fig.height=5}
data(cancers_drug_groups, package = "EnrichIntersect")
x <- cancers_drug_groups$score
custom.set <- cancers_drug_groups$custom.set
set.seed(123)
enrich <- enrichment(x, custom.set, permute.n = 100)
```


### Plot Sankey diagram for intersecting set through an array

`EnrichIntersect` function `intersectSankey()` creates a Sankey diagram to visualize intersecting sets from an `array` object, in which the first dimension represents intermediate variables, and the second and third dimensions represent multiple levels and multiple tasks, respectively. One intersecting set is a list of intermediate variables associated with a combination of a subset of levels and a subset of tasks, which is not easy to visualize when all possible combinations of the two are many. Our function `intersectSankey()` has adapted `sankeyNetwork()` from R package `networkD3` to create a `D3' `JavaScript' interactive Sankey diagram in order to be suitable for several levels, multiple tasks and many intermediate variables. Besides saving the Sankey diagram as an interactive html file, similarly to `networkD3`, the user can also save the Sankey diagram as a pdf or png file via R package `webshot2`. The argument `out.fig=c(NA,"html","pdf","png")` in the function `intersectSankey()` determines the figure on the user's R graphics device, to be saved either as a html, pdf or png file. 

An example input data object `cancers_genes_drugs` in the package is an `array` with associations between, e.g., 56 genes (first dimension), two cancer types (second dimension) and two drugs (third dimension) provided in our package. The user can adjust the Sankey diagram argument `out.fig` for different output graph types  and use argument `step.names` to indicate the labels of the three kinds of variables in a Sankey diagram, i.e., name of multiple levels, name of intermediate variables, and name of multiple tasks, see the following code for an example.

```{r sankey_diagram, fig.width=9, fig.height=7}
data(cancers_genes_drugs, package = "EnrichIntersect")
# better to use argument `out.fig = "pdf"` for printing a pdf or html figure
intersectSankey(cancers_genes_drugs, step.names = c("Cancers", "Genes", "Drugs"))
```


## Citation

> Zhi Zhao, Manuela Zucknick, Tero Aittokallio (2022).
> EnrichIntersect: an R package for custom set enrichment analysis and interactive visualization of intersecting sets. 
> _Bioinformatics Advances_, 2(1), vbac073. DOI: [10.1093/bioadv/vbac073](https://doi.org/10.1093/bioadv/vbac073).

> Zhi Zhao, Shixiong Wang, Manuela Zucknick, Tero Aittokallio (2022).
> Tissue-specific identification of multi-omics features for pan-cancer drug response prediction.
> _iScience_, 25(8): 104767. DOI: [10.1016/j.isci.2022.104767](https://doi.org/10.1016/j.isci.2022.104767).