Case study: BIN1 xQTL and AD GWAS¶
This notebook documents the analysis of xQTL case study on a targeted gene, BIN1.
- Section 0: Sanity check
- Section 1: Fine-mapping for xQTL and GWAS
- Section 2: Multi-context colocalization with Bellenguez 2022
- Section 3: Refinement of colocalized loci with other AD GWAS
- Section 4: Assessment of multi-context xQTL effect sizes
- Section 5: Multi-context causal TWAS (including conventional TWAS and MR)
- Section 6: Context specific multi-gene fine-mapping
- Section 7: Epigenomic QTL and their target regions
- Section 8: Context focused validation in other xQTL data
- Section 9: Non-linear effects of xQTL
- Section 10: in silico functional studies in iPSC model
- Section 11: Functional annotations of selected loci
- Section 12: Candidate loci as trans-xQTL
Overview¶
FunGen-xQTL resource contains 67 xQTL contexts as well as 9 AD GWAS fine-mapped genome-wide. The overarching goal for case studies is to use these resource to raise questions and learn more about gene targets of interest.
Overall, a case study consists of the following aspects:
- Check the basic information of the gene
- Check the existing xQTL and integrative analysis results, roughly including
- Summary table for univariate fine-mapping
- Marginal association results
- Multi-gene and multi-context fine-mapping
- Multi-context colocalization with AD GWAS
- TWAS, MR and causal TWAS
- Integration with epigenetic QTL
- Quantile QTL
- Interaction QTL
- Validation:
- Additional xQTL data in FunGen-xQTL
- Additional AD GWAS data-set already generated by FunGen-xQTL
- In silico functional studies
- Additional iPSC data-sets
- Functional annotations of variants, particularly in relevant cellular contexts
- Creative thinking: generate hypothesis, search in literature, raise questions to discuss
Computing environment setup¶
Interactive analysis will be done on AWS cloud powered by MemVerge. Please contact Gao Wang to setup accounts for you to start the analysis.
Please follow instructions on https://wanggroup.org/productivity_tips/mmcloud-interactive to configure your computing environment. Here are some additional packages you need to install after the initial configuration, in order to perform these analysis:
in terminal with bash:
micromamba install -n r_libs r-pecotmr
in R:
install.packages('BEDMatrix')
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("GenomeInfoDb",'GenomicRanges', 'Rsamtools'))
install.packages('Signac')
How to Use This Notebook¶
- Before you start: Load functions from
cb_plot.Randutilis.R, located at<xqtl-paper>/codes/. These functions and resources are packaged to streamline the analysis and ensure everything is as clean as possible. And also the codes for ColocBoost under path/data/colocalization/colocboost/R. - Inside of this notebook, use
sed -iorCtrl+Fto replace the geneBIN1with the gene you want to analyze. - For detailed analysis in some sections, please refer to the corresponding analysis notebooks as indicated. These companion notebooks are available under this same folder. The rest of the tasks can be completed with a few lines of code, as demonstrated in this notebook.
- Similarly for the companion notebooks you should also use the
sed -iorCtrl+Freplacing gene_name (BIN1 in this case) with the gene you want to investigate.
While using this notebook, you may need to generate three intermediate files from Sections 1 and 2, which will be useful for downstream analysis:
- a. Section 1:
- Fine-mapping contexts that indicate shared signals with AD,
BIN1_finemapping_contexts.rds. This can be used as input for Section 8 the multi-cohort validation step - A subset of the xQTL-AD table,
Fungen_xQTL_allQTL.overlapped.gwas.export.BIN1.rds. This can be used as input for Section 12.
- Fine-mapping contexts that indicate shared signals with AD,
- b. Section 2: A variant list showing colocalization in cohorts we analyzed with ColocBoost,
BIN1_colocboost_res.rds. this can be used as input for Sections 7, 9, 10, and 12.
Section 0: Sanity check¶
Check the basic information of the gene¶
- To gain a preliminary understanding of this gene’s expression—specifically, whether it is cell-specific—can help us quickly determine if our results are consistent with expectations.
Useful websites:
- check gene function, (immune) cell type specificity, tissue specifity, protein location: https://www.proteinatlas.org
- check gene position and structure: https://www.ncbi.nlm.nih.gov/gene/
- other collective information: https://www.genecards.org
Check the existing results which are inputs to this analysis¶
# If an error occurs while sourcing scripts, it might be because your get() returned NULL.
#Please restart the kernel or click the R kernel in the upper right corner to resolve the issue.
source('../../codes/cb_plot.R')
source('../../codes/utilis.R')
source("../../codes/qtl_utils.R",chdir = TRUE)
for(file in list.files("/data/colocalization/colocboost/R", pattern = ".R", full.names = T)){
source(file)
}
gene_name = 'BIN1'
dir.create(paste0('plots/', gene_name), recursive = T)
get basic target gene information
target_gene_info <- get_gene_info(gene_name = gene_name)
target_gene_info
- $gene_info
A data.table: 1 x 14 region_id #chr start end TSS LD_matrix_id LD_sumstats_id LD_sumstats_id_old TADB_index TADB_id gene_start gene_end sliding_windows gene_name <chr> <chr> <dbl> <dbl> <int> <chr> <chr> <chr> <chr> <chr> <int> <int> <chr> <chr> ENSG00000136717 chr2 123880000 130720000 127107287 chr2:122654970-124537054,chr2:124537054-125689597,chr2:125689597-127728648,chr2:127728648-129107569,chr2:129107569-130787741 2_122654970-124537054,2_124537054-125689597,2_125689597-127728648,2_127728648-129107569,2_129107569-130787741 2_122654970-124537054,2_124537054_125689597,2_125689597_127728648,2_127728648_129107569,2_129107569_130787741 TADB_175,TADB_176,TADB_177 chr2_123011984_128107288,chr2_126048027_131718831,chr2_126535801_133037993 127107288 127048027 chr2:116754139-124869570,chr2:118302225-128107288,chr2:120737102-131718831,chr2:123011984-133037993,chr2:126048027-134596399,chr2:126535801-135959342,chr2:130104846-136876443 BIN1 - $target_LD_ids
A matrix: 1 x 5 of type chr chr2:122654970-124537054 chr2:124537054-125689597 chr2:125689597-127728648 chr2:127728648-129107569 chr2:129107569-130787741 - $target_sums_ids
A matrix: 1 x 5 of type chr 2_122654970-124537054 2_124537054-125689597 2_125689597-127728648 2_127728648-129107569 2_129107569-130787741 - $gene_region
- 'chr2:123880000-130720000'
- $target_TAD_ids
A matrix: 1 x 3 of type chr chr2_123011984_128107288 chr2_126048027_131718831 chr2_126535801_133037993
gene_id = target_gene_info$gene_info$region_id
chrom = target_gene_info$gene_info$`#chr`
Take a quick look for the expression of target gene in ROSMAP bulk data, we don't want them to be too low
expression_in_rosmap_bulk(target_gene_info)
Bellenguez et al GWAS signals has many overlap with CS from other xQTL sources. This motivates us to look further. The figure above shows the ranges of CS to give us a loci level idea. Below, we show the variants in those CS, color-coding the variants that are shared between them in the same color. In particular, AD GWAS signals are also captured by a few xQTL data, although at this point we don't have formal statistical (colocalization) evidences for these observations yet.
region_p
pip_p
from ADxQTL summary table¶
This summary table have been produced by conserving the loci overlap between any AD GWAS (CS95%mincorr,CS95% and CS70%mincorr) credible sets , and any xQTL (CS95%mincorr, CS70%mincorr, and SNPs with pip>0.2) credible sets (see this notebook. Each line is one AD locus overlapping with one xQTL. for both AD locus ('gwas_locus_id') and xQTL ('locus_context_id' ) the table summarize the information from the locus (context/source, number of variants in this credible set, .. ) and information from its SNP having the highest pip.
xqtlad<-fread('/data/interactive_analysis/adpelle1/xqtl-paper/landscape_analysis/res_gwas_qtl_anno_ADGWAS_xQTL_noMSBBsQTL_filtered.csv.gz')
#to see
#xqtlad[gene_name==gname]
#get also the corresponding locus_context_id annotated flatten table
resf<-fread('/data/interactive_analysis/adpelle1/xqtl-paper/landscape_analysis/Fungen_xQTL_allQTL.overlapped.gwas.export.snATAC.ADGWAS_fix.addMetabrain.locuscontextid_ADcs95corrCs95Cs70corr_xQTLcs95corrCs70corrSNP0.2.csv.gz')
gname=gene_name
options(repr.plot.width = 12, repr.plot.height = round(nrow(xqtlad[gene_name==gname])/3))
GeneTrackQTL(gene=gname,ADxQTL_summ=xqtlad,flatten_table = resf)
ggsave('BIN1_track.pdf',width = 12,height = round(nrow(xqtlad[gene_name==gname])/3))
ggsave('BIN1_track.png',width = 12,height = round(nrow(xqtlad[gene_name==gname])/3))
This is done using ColocBoost. The most updated version of ColocBoost results are under path
s3://statfungen/ftp_fgc_xqtl/analysis_result/ColocBoost/2024_9/
Original output from ColocBoost¶
cb_res <- readRDS(paste0("/data/analysis_result/ColocBoost/2024_9/",gene_id,"_res.rds") )
#save colocboost results
cb_res_table <- get_cb_summary(cb_res)
saveRDS(cb_res_table, paste0(gene_name, "_colocboost_res.rds"))
cb <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2)
pdf('plots/BIN1/sec2.colocboost_res.pdf', width = 10, height = 2 * cb_plot_row(cb_res))
replayPlot(cb$p)
dev.off()
# colocalized variants
cb_res_table
| colocalized phenotypes | purity | # variants | highest VCP | colocalized index | colocalized variants | max_abs_z_variant | cset_id |
|---|---|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <chr> | <chr> | <chr> | <chr> |
| AC; AC_unproductive | 1.0000000 | 1 | 0.9873458 | 15628 | chr2:127076883:T:C | chr2:127076883:T:C | coloc_sets:Y8_Y13:CS1 |
| Exc; DLPFC; AC; PCC; AC_unproductive | 0.9662879 | 4 | 0.3171031 | 15715; 15747; 15752; 15882 | chr2:127092925:T:C; chr2:127101168:A:G; chr2:127101865:A:G; chr2:127123057:C:T | chr2:127092925:T:C | coloc_sets:Y5_Y7_Y8_Y9_Y13:CS2 |
| Mic; AD_Bellenguez_2022 | 0.9994440 | 2 | 0.9999998 | 15981; 15986 | chr2:127135234:C:T; chr2:127136522:G:A | chr2:127135234:C:T | coloc_sets:Y1_Y18:MergeCS1 |
# effect sign for each coloc sets
get_effect_sign_csets(cb_res)
- $`coloc_sets:Y8_Y13:CS1`
A data.frame: 1 x 3 variants AC AC_unproductive <chr> <dbl> <dbl> chr2:127076883:T:C chr2:127076883:T:C -9.528189 -11.59702 - $`coloc_sets:Y5_Y7_Y8_Y9_Y13:CS2`
A data.frame: 4 x 6 variants Exc DLPFC AC PCC AC_unproductive <chr> <dbl> <dbl> <dbl> <dbl> <dbl> chr2:127092925:T:C chr2:127092925:T:C -4.707726 -3.887473 -12.33930 -5.675266 -10.31998 chr2:127101168:A:G chr2:127101168:A:G -4.707726 -3.887473 -12.33930 -5.675266 -10.31998 chr2:127101865:A:G chr2:127101865:A:G -4.707726 -3.887473 -12.33930 -5.675266 -10.31998 chr2:127123057:C:T chr2:127123057:C:T -4.739743 -3.679612 -11.52623 -5.871070 -10.49521 - $`coloc_sets:Y1_Y18:MergeCS1`
A data.frame: 2 x 3 variants Mic AD_Bellenguez_2022 <chr> <dbl> <dbl> chr2:127135234:C:T chr2:127135234:C:T 9.153807 20.07143 chr2:127136522:G:A chr2:127136522:G:A 8.345345 14.34444
# LD between coloc sets
if(nrow(cb_res_table) > 1){
get_between_purity_simple(cb_res, gene.name = gene_id, path = '/data/colocalization/QTL_data/eQTL/')
} else {
message('No more than 1 Cos, No need to compute between LD')
}
| coloc_csets_1 | coloc_csets_2 | min_abs_cor | max_abs_cor | median_abs_cor |
|---|---|---|---|---|
| coloc_sets:Y8_Y13:CS1 | coloc_sets:Y5_Y7_Y8_Y9_Y13:CS2 | 0.635484450603831 | 0.650396425065784 | 0.650396425065784 |
| coloc_sets:Y8_Y13:CS1 | coloc_sets:Y1_Y18:MergeCS1 | 0.08178432445715 | 0.298153818017145 | 0.189969071237148 |
| coloc_sets:Y5_Y7_Y8_Y9_Y13:CS2 | coloc_sets:Y1_Y18:MergeCS1 | 0.216487087638644 | 0.543568774101822 | 0.377065075053725 |
Here, different colors refer to different 95% Colocalization Sets (CoS, a metric developed in ColocBoost indicating that there is 95% probabilty that this CoS captures a colocalization event). We only included ROSMAP data for this particular ColocBoost analysis. In this case, we observe cell specific eQTL, bulk sQTL colocalization on ROSMAP data with AD as two separate CoS, suggesting two putative causal signals. We did not detect colocalization with pQTL of statistical significance although from Section 1 there are some overlap with pQTL signals in fine-mapping CS, the overlapped variants in CS have small PIP.
Projecting Coloc Sets for Marginal Performance¶
If you want to project the coloc set onto the un-colocalized phenotypes to evaluate their marginal performance, please follow these steps:
- fake_contexts: A vector containing the phenotype names representing the un-colocalized phenotypes you want to examine. By default in this example, this will be set to AD.
- fake_targets: A vector containing the phenotype names with real coloc sets. By default in this example, this will be set to the most popular context, which is the one with the highest number of coloc sets.
This command will project every coloc set from fake_targets onto the phenotypes in fake_context.
For now, we only run this section when AD does not have any coloclization to avoid confusion
# check if we have AD coloc results in orginal data
AD_in_coloc <- sum(unlist(str_split(get_cb_summary(cb_res)[, 1], '; ')) %>% gsub(' ', '', .) %>% str_detect('AD_Bellenguez_2022')) > 0
if (AD_in_coloc){
run_marginal <- FALSE
} else run_marginal <- TRUE
if(run_marginal){
cb_marginal <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, fake_contexts = c('AD_Bellenguez_2022'), fake_targets = get_most_popular_pheno(cb_res))
pdf('plots/BIN1/sec2.colocboost_res_marginal.pdf', width = 10, height = 2 * cb_plot_row(cb_marginal$cb_res))
replayPlot(cb_marginal$p)
dev.off()
} else {
message('Original coloc results already include AD coloc set.\nTemporarily skip processing fake AD csets, or set `run_marginal = TRUE` before running this cell.')
}
# example only shown in BIN1 case
# run_marginal = T
# if(run_marginal){
# cb_marginal <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, fake_contexts = c('AD_Bellenguez_2022'), fake_targets = get_most_popular_pheno(cb_res))
# pdf('plots/BIN1/sec2.colocboost_res_marginal.pdf', width = 10, height = 2 * cb_plot_row(cb_marginal$cb_res))
# replayPlot(cb_marginal$p)
# dev.off()
# } else {
# message('Original coloc results already include AD coloc set.\nTemporarily skip processing fake AD csets, or set `run_marginal = TRUE` before running this cell.')
# }
Section 3: Refinement of colocalized loci with other AD GWAS¶
Here we refine the colocalization with other AD GWAS to iron out any heterogeniety between studies (heterogeniety can come from many sources), to get additional candidate loci from these more heterogenous sources as candidates to study.
Projecting Bellenguez Coloc Sets onto other AD Datasets' Marginal Results¶
The marginal results for all other AD datasets are targeted at the original Bellenguez coloc set.
AD_cohorts <- c('AD_Jansen_2021', 'AD_Bellenguez_EADB_2022', 'AD_Bellenguez_EADI_2022',
'AD_Kunkle_Stage1_2019', 'AD_Wightman_Excluding23andMe_2021',
'AD_Wightman_ExcludingUKBand23andME_2021', 'AD_Wightman_Full_2021')
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_gwas = TRUE, gene_id = gene_id, cohorts = AD_cohorts)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
pdf('plots/BIN1/sec3.colocboost_res_allad.pdf', width = 10, height = 2 * cb_plot_row(cb_ad$cb_res))
replayPlot(cb_ad$p)
dev.off()
Projecting the Most Popular Coloc Set onto other AD Datasets' Marginal Results¶
Also, we can project the coloc set for the most popular phenotype (derived from cb_marginal) onto the marginal results for all AD datasets.
For now, we only run this section when AD does not have any coloclization to avoid confusion
if(run_marginal){
cb_marginal_ad <- plot_cb(cb_res = cb_marginal$cb_res, cex.pheno = 1.5, x.phen = -0.2, add_gwas = TRUE, gene_id = gene_id, cohorts = AD_cohorts)
pdf('plots/BIN1/sec3.colocboost_res_marginal_allad.pdf', width = 10, height = 2 * cb_plot_row(cb_marginal_ad$cb_res))
replayPlot(cb_marginal_ad$p)
dev.off()
} else {
message('Original coloc results already include AD coloc set.\nTemporarily skip processing fake AD csets, or set `run_marginal = TRUE` before running this cell.')
}
Section 4: Assessment of multi-context xQTL effect sizes¶
Option 1: ColocBoost + MASH¶
Use colocboost variants and check for mash posterior contrast to see if the effect size are shared or specific or even opposite. Advantage is that colocboost result is AD GWAS informed; issue is that marginal posterior effects is not always the joint
source('../../codes/utilis.R')
mash_p <- mash_plot(gene_name = 'BIN1')
options(repr.plot.width = 10, repr.plot.height = 10)
for (mash_p_tmp in mash_p) {
print(mash_p_tmp)
ggsave(filename = 'plots/BIN1/sec4.mash_posterior.pdf',mash_p_tmp, height = 6, width = 7)
}
Option 2: mvSuSiE¶
Use mvSuSiE multicontext fine-mapping results --- the bubble plot to check posterior effects. Issue is that we don't have this results yet, and this is limited to one cohort at a time, without information from AD.
We should go for option 1 by default and if we want to make claim about opposite effect size we double-check with mvSuSiE multicontext analysis.
message("Multi context in ROSMAP data")
multi_context_rosmap_tmp <- tryCatch(
readRDS(paste0('/data/analysis_result/multi_context/ROSMAP/mnm/ROSMAP_DeJager.',
target_gene_info$gene_info$`#chr`, '_', gene_id, '.multicontext_bvsr.rds')),
error = function(e) message('Error in loading ROSMAP multi context data')
)
if (!is.null(multi_context_rosmap_tmp[[1]]$mvsusie_fitted)) {
plot_and_save(multi_context_rosmap_tmp[[1]], 'plots/BIN1/sec4.multi_context_ROSMAP')
} else {
message('Multi Context results are empty in ROSMAP data')
}
# Load and process MSBB data
message("Multi context in MSBB data")
multi_context_msbb_tmp <- tryCatch(
readRDS(paste0('/data/analysis_result/multi_context/MSBB/mnm/MSBB_eQTL.',
target_gene_info$gene_info$`#chr`, '_', gene_id, '.multicontext_bvsr.rds')),
error = function(e) message('Error in loading MSBB multi context data')
)
if (!is.null(multi_context_msbb_tmp[[1]]$mvsusie_fitted)) {
plot_and_save(multi_context_msbb_tmp[[1]], 'plots/BIN1/sec4.multi_context_MSBB')
} else {
message('Multi Context results are empty in MSBB data')
}
The most updated version of cTWAS analysis are under path
s3://statfungen/ftp_fgc_xqtl/analysis_result/cTWAS/
TWAS results¶
We report TWAS from all contexts and methods from the pipeline. Here we will filter it down to the best performing methods and only keep contexts that are significant.
plot_TWAS_res(gene_id = gene_id, gene_name = gene_name)
ggsave('plots/BIN1/sec5.TWAS_BIN1.pdf', width = 6, height = 4)
MR results¶
This is only available for genes that are deemed significant in TWAS and have summary statistics available for effect size and standard errors in GWAS, in addition to z-scores --- current version does not support z-scores although we will soon also support z-scores in MR using MAF from reference panel.
cTWAS results¶
To be updated soon.
A quick analysis: using the xQTL-AD summary table (flatten table)¶
We extract from xQTL-AD summary table the variants to get other genes also have CS with the variants shared by target gene and AD.
multigene_flat <- get_multigene_multicontext_flatten('Fungen_xQTL_allQTL.overlapped.gwas.export.BIN1.rds', sQTL = 'no_MSBB')
multigene_flat
| gene_id | #chr | start | end | gene_name | contexts |
|---|---|---|---|---|---|
| <chr> | <chr> | <int> | <int> | <chr> | <chr> |
Other genes implicated are PROC and HS6ST1 in MiGA cohort which may share causal eQTL with BIN1. Further look into the data-set --- using these genes as targets and repeating what we did above for BIN1 --- might be needed to establish a more certain conclusion.
Alternatively, we may be able to apply a multi-gene statistical fine-mapping test on BIN1 region to find these genes, as you will see in the section below.
A statistically solid approach: mvSuSiE multi-gene analysis¶
This multi-gene fine-mapping analysis was done for each xQTL context separately. We will need to check the results for all contexts where this gene has an xQTL, in order to identify if there are other genes also sharing the same xQTL with this target gene. We included other genes in the same TAD window along with this gene, and extended it into a sliding window approach to include multiple TADs just in case. You need to check the sliding windows belongs to that gene (TSS) on analysis repo.
sliding_windows <- target_gene_info$gene_info$TADB_id %>% strsplit(., ',') %>% unlist %>% as.character
sliding_windows
- 'chr2_123011984_128107288'
- 'chr2_126048027_131718831'
- 'chr2_126535801_133037993'
The most updated version of mvSuSiE multi-gene results are under path
s3://statfungen/ftp_fgc_xqtl/analysis_result/mvsusie_multi_gene_test/multi_gene/ Currently it is still WIP. You can revisit this later when we prompt you to. Here is an example for BIN1:
# Main loop to process sliding windows
mnm_gene <- list()
for (window in sliding_windows) {
context_files <- list.files('/data/analysis_result/multi_gene/ROSMAP/mnm_genes/', window, full.names = T) %>% .[str_detect(., '.multigene_bvrs.rds')]
for(context_file in context_files){
context_mnm = context_file %>% basename %>% str_split(., '[.]', simplify = T) %>% .[,1]
# Load multi-gene data
mnm_gene_tmp <- tryCatch(
readRDS(context_file),
error = function(e) NULL
)
if (!is.null(mnm_gene_tmp)) {
# Check if target gene is in the condition names
if (target_gene_info$gene_info$region_id %in% mnm_gene_tmp[[1]]$mvsusie_fitted$condition_names) {
# Use a common prefix format for multi-gene plots
plot_and_save(mnm_gene_tmp[[1]], 'plots/BIN1/sec6.multigene')
} else {
message('There is mnm result for TAD window ', window, ' in ', context_mnm,
', but it does not include target gene ', gene_name, ' in CS.')
}
# Append to the results list
mnm_gene <- append(mnm_gene, list(mnm_gene_tmp))
}
}
}
In this case, there is no statistical evidence for BIN1 sharing any of its xQTL with other genes in ROSMAP Microglia data we looked into; although we have not analyzed MiGA this way yet (which showed some potential signals from the quick analysis above).
Generate a crude plot to determined whether the story is interesting¶
This is a crude version of the case study plot which shows the fsusie Effect (colored line), the gene body (black arrow), the epi-QTL (large dots with the same color as the effects) and ADGWAS cs position (small red dots).
Only produce the refine plot if we can see either:
- There are sharing snp between epi-QTL and AD CS
- There are the AD CS located within one of the effect range
- The crude plot suggest something interesting
options(repr.plot.width = 40, repr.plot.height = 40)
ggplot() + theme_bw() + facet_grid(cs_coverage_0.95+study + region ~ ., labeller = labeller(.rows = function(x) gsub("([_:,-])", "\n", x)), scale = "free_y") +
theme(text = element_text(size = 20), strip.text.y = element_text(size = 25, angle = 0.5)) +
# xlim(view_win) +
ylab("Estimated effect") +
# geom_line(data = haQTL_df %>% mutate(study = "haQTL effect") %>% filter(CS == 5),
# aes_string(y = "fun_plot", x = "x", col = "CS"), size = 4, col = "#00AEEF") +
geom_line(data = effect_of_interest ,
aes_string(y = "fun_plot", x = "x", col = "cs_coverage_0.95"), size = 4) +
geom_point(data = effect_of_interest ,
aes_string(y = "pip", x = "pos", col = "cs_coverage_0.95"), size = 4) +
theme(text = element_text(size = 40), strip.text.y = element_text(size = 15, angle = 0.5),
axis.text.x = element_text(size = 40), axis.title.x = element_text(size = 40)) +
xlab("Position") +
ylab("Estimated\neffect") +
geom_segment(arrow = arrow(length = unit(1, "cm")), aes(x = gene_start, xend = gene_end, y = 1, yend = 1), size = 6,
data = tar_gene_info$gene_info, alpha = 0.3) +
geom_text(aes(x = (gene_start + gene_end) / 2, y = 1 , label = gene_name), size = 10,
data = tar_gene_info$gene_info)+
geom_point(aes(x = pos, y = pip ) ,color = "red", data = flatten_table%>%filter( str_detect(study,"AD_") , cs_coverage_0.95 != 0 )%>%mutate(AD_study = study%>%str_replace_all("_","\n" ))%>%select(-study,-region,-cs_coverage_0.95) )
Section 8: Context focused validation in other xQTL data¶
see notebook
add fake version for now, so you don't have to refer to above link
Background: our "discovery set" is ROSMAP but we have additional "validation" sets including:
- STARNET
- MiGA
- KnightADRC
- MSBB
- metaBrain
- UKB pQTL
TODO:
- Get from Carlos WashU CSF based resource (pQTL and metabolomic QTL)
This section shows verification of findings from these data-sets. In principle we should check them through sections 1-6 more formally. In practice we will start with colocalization via colocboost --- since our study is genetics (variant and loci level) focused. We can selectively follow them up for potentially intereting validations. We therefore only demonstrate validation via colocboost as a starting point.
finempping_contexts <- readRDS(paste0(gene_name, '_finemapping_contexts.rds')) # from sec1
finempping_contexts <- get_norosmap_contexts(finempping_contexts)
Projecting Bellenguez Coloc Sets onto other xQTLs' Marginal Results¶
The marginal results for all other QTL contexts which showing interesting signals from section 1, are targeted at the original Bellenguez coloc set.
cb_contexts <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_QTL = TRUE, cohorts = finempping_contexts, gene_id = gene_id)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
In conclusion from what's shown above, when we check the association signals in STARNET and MiGA on colocalization established from ROSMAP and AD GWAS, we see additional evidences.
pdf('plots/BIN1/sec8.colocboost_res_fmp_contexts.pdf', width = 10, height = 2 * cb_plot_row(cb_contexts$cb_res))
replayPlot(cb_contexts$p)
dev.off()
Projecting the Most Popular Coloc Set onto other xQTLs' Marginal Results¶
Also, we can project the coloc set for the most popular phenotype (derived from cb_marginal) onto the marginal results for other xQTLs showing intresting signals from fine mapping (section 1).
For now, we only run this section when AD does not have any coloclization to avoid confusion
if(run_marginal){
cb_marginal_contexts <- plot_cb(cb_res = cb_marginal$cb_res, cex.pheno = 1.5, x.phen = -0.2, add_QTL = TRUE, cohorts = finempping_contexts, gene_id = gene_id)
pdf('plots/BIN1/sec8.colocboost_res_marginal_fmp_contexts.pdf', width = 10, height = 2 * cb_plot_row(cb_marginal_contexts$cb_res))
replayPlot(cb_marginal_contexts$p)
dev.off()
} else {
message('Original coloc results already include AD coloc set.\nTemporarily skip processing fake AD csets, or set `run_marginal = TRUE` before running this cell.')
}
APOE interaction¶
int_apoe_p
In conclusion, there is no interaction QTL with APOE identified.
Quantile QTL¶
quant_coef_colocvar
vars_p
apoe_p
Section 11: Functional annotations of selected loci¶
see notebook
TODO
- Touch base with Ryan on the snATAC annotations
- Run this by Pavel to see if there are additional comments on how we do this
func_p
options(repr.plot.width=12, repr.plot.height=6)
if(!is.null(flat_var)){
ggplot(flat_var, aes(x = gene_name, y = pip, size = pip)) +
geom_point(alpha = 0.7) +
labs(title = paste0("PIP values for trans fine mapped Genes in ", gene_name ," csets with AD"),
x = "Gene Name",
y = "PIP",
size = "PIP",
color = "CS Coverage 0.95 Min Corr") +
theme_minimal(base_size = 14) +
theme(panel.background = element_blank(),
panel.grid.major = element_line(color = "grey80"),
legend.position = NULL,
axis.text.x = element_text(angle = 45, hjust = 1))
# scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
ggsave(paste0('plots/BIN1/sec12.trans_fine_mapping_',gene_name,'.pdf'), height = 5, width = 8)
} else{
message('There are no detectable trans signals for ', gene_name)
}
Creative thinking: generate hypothesis, search in literature, raise questions to discuss¶
You can now generate some preliminary hypotheses based on the above results. Next, you should search for evidence in the literature to support or refine these hypotheses and identify additional analyses needed to confirm them.