Case study: MAPK3 xQTL and AD GWAS¶

This notebook documents the analysis of xQTL case study on a targeted gene, MAPK3.

  • Section 0: Sanity check
  • Section 1: Fine-mapping for xQTL and GWAS
  • Section 2: Multi-context colocalization with Bellenguez 2022
  • Section 3: Refinement of colocalized loci with other AD GWAS
  • Section 4: Assessment of multi-context xQTL effect sizes
  • Section 5: Multi-context causal TWAS (including conventional TWAS and MR)
  • Section 6: Context specific multi-gene fine-mapping
  • Section 7: Epigenomic QTL and their target regions
  • Section 8: Context focused validation in other xQTL data
  • Section 9: Non-linear effects of xQTL
  • Section 10: in silico functional studies in iPSC model
  • Section 11: Functional annotations of selected loci
  • Section 12: Candidate loci as trans-xQTL

Overview¶

FunGen-xQTL resource contains 67 xQTL contexts as well as 9 AD GWAS fine-mapped genome-wide. The overarching goal for case studies is to use these resource to raise questions and learn more about gene targets of interest.

Overall, a case study consists of the following aspects:

  • Check the basic information of the gene
  • Check the existing xQTL and integrative analysis results, roughly including
    • Summary table for univariate fine-mapping
    • Marginal association results
    • Multi-gene and multi-context fine-mapping
    • Multi-context colocalization with AD GWAS
    • TWAS, MR and causal TWAS
    • Integration with epigenetic QTL
    • Quantile QTL
    • Interaction QTL
    • Validation:
      • Additional xQTL data in FunGen-xQTL
      • Additional AD GWAS data-set already generated by FunGen-xQTL
    • In silico functional studies
      • Additional iPSC data-sets
    • Functional annotations of variants, particularly in relevant cellular contexts
  • Creative thinking: generate hypothesis, search in literature, raise questions to discuss

Computing environment setup¶

Interactive analysis will be done on AWS cloud powered by MemVerge. Please contact Gao Wang to setup accounts for you to start the analysis.

Please follow instructions on https://wanggroup.org/productivity_tips/mmcloud-interactive to configure your computing environment. Here are some additional packages you need to install after the initial configuration, in order to perform these analysis:

in terminal with bash:

micromamba install -n r_libs r-pecotmr

in R:

install.packages('BEDMatrix')

How to Use This Notebook¶

  1. Before you start: Load functions from cb_plot.R and utilis.R, located at <xqtl-paper>/codes/. These functions and resources are packaged to streamline the analysis and ensure everything is as clean as possible. And also the codes for ColocBoost under path /data/colocalization/colocboost/R.
  2. Inside of this notebook, use sed -i or Ctrl+F to replace the gene MAPK3 with the gene you want to analyze.
  3. For detailed analysis in some sections, please refer to the corresponding analysis notebooks as indicated. These companion notebooks are available under this same folder. The rest of the tasks can be completed with a few lines of code, as demonstrated in this notebook.
  4. Similarly for the companion notebooks you should also use the sed -i or Ctrl+F replacing gene_name (MAPK3 in this case) with the gene you want to investigate.

While using this notebook, you may need to generate three intermediate files from Sections 1 and 2, which will be useful for downstream analysis:

  • a. Section 1:
    • Fine-mapping contexts that indicate shared signals with AD, MAPK3_finemapping_contexts.rds. This can be used as input for Section 8 the multi-cohort validation step
    • A subset of the xQTL-AD table, Fungen_xQTL_allQTL.overlapped.gwas.export.MAPK3.rds. This can be used as input for Section 12.
  • b. Section 2: A variant list showing colocalization in cohorts we analyzed with ColocBoost, MAPK3_colocboost_res.rds. this can be used as input for Sections 7, 9, 10, and 12.

Section 0: Sanity check¶

Check the basic information of the gene¶

  • To gain a preliminary understanding of this gene’s expression—specifically, whether it is cell-specific—can help us quickly determine if our results are consistent with expectations.

Useful websites:

  1. check gene function, (immune) cell type specificity, tissue specifity, protein location: https://www.proteinatlas.org
  2. check gene position and structure: https://www.ncbi.nlm.nih.gov/gene/
  3. other collective information: https://www.genecards.org

Check the existing results which are inputs to this analysis¶

Check the existing results which are inputs to this analysis¶

In [1]:
# If an error occurs while sourcing scripts, it might be because your get() returned NULL. 
#Please restart the kernel or click the R kernel in the upper right corner to resolve the issue.
source('../../codes/cb_plot.R')
source('../../codes/utilis.R')

for(file in list.files("/data/colocalization/colocboost/R", pattern = ".R", full.names = T)){
          source(file)
        }
gene_name = 'MAPK3'

dir.create(paste0('plots/', gene_name), recursive = T)
In [2]:
target_gene_info <- get_gene_info(gene_name = gene_name)
target_gene_info
$gene_info
A data.table: 1 x 14
region_id#chrstartendTSSLD_matrix_idLD_sumstats_idLD_sumstats_id_oldTADB_indexTADB_idgene_startgene_endsliding_windowsgene_name
<chr><chr><dbl><dbl><int><chr><chr><chr><chr><chr><int><int><chr><chr>
ENSG00000102882chr16283600003.4e+0730123505chr16:26796952-29685831,chr16:29685831-4638151316_26796952-29685831,16_29685831-4638151316_26796952-29685831,16_29685831_46381513TADB_1164,TADB_1165chr16_24031743_30613717,chr16_27341433_349915513012350630114105chr16:21437007-30613717,chr16:22388493-34991551,chr16:24031743-46400000,chr16:27341433-49746318,chr16:30975802-50968730MAPK3
$target_LD_ids
A matrix: 1 x 2 of type chr
chr16:26796952-29685831chr16:29685831-46381513
$target_sums_ids
A matrix: 1 x 2 of type chr
16_26796952-2968583116_29685831-46381513
$gene_region
'chr16:28360000-3.4e+07'
$target_TAD_ids
A matrix: 1 x 2 of type chr
chr16_24031743_30613717chr16_27341433_34991551
In [2]:
target_gene_info <- get_gene_info(gene_name = gene_name)
target_gene_info
$gene_info
A data.table: 1 x 14
region_id#chrstartendTSSLD_matrix_idLD_sumstats_idLD_sumstats_id_oldTADB_indexTADB_idgene_startgene_endsliding_windowsgene_name
<chr><chr><dbl><dbl><int><chr><chr><chr><chr><chr><int><int><chr><chr>
ENSG00000102882chr16283600003.4e+0730123505chr16:26796952-29685831,chr16:29685831-4638151316_26796952-29685831,16_29685831-4638151316_26796952-29685831,16_29685831_46381513TADB_1164,TADB_1165chr16_24031743_30613717,chr16_27341433_349915513012350630114105chr16:21437007-30613717,chr16:22388493-34991551,chr16:24031743-46400000,chr16:27341433-49746318,chr16:30975802-50968730MAPK3
$target_LD_ids
A matrix: 1 x 2 of type chr
chr16:26796952-29685831chr16:29685831-46381513
$target_sums_ids
A matrix: 1 x 2 of type chr
16_26796952-2968583116_29685831-46381513
$gene_region
'chr16:28360000-3.4e+07'
$target_TAD_ids
A matrix: 1 x 2 of type chr
chr16_24031743_30613717chr16_27341433_34991551
In [3]:
gene_id = target_gene_info$gene_info$region_id
chrom = target_gene_info$gene_info$`#chr`
In [4]:
source('../../codes/utilis.R')
expression_in_rosmap_bulk(target_gene_info)
No description has been provided for this image

Section 1: Fine-mapping for xQTL and GWAS¶

see notebook

In [15]:
region_p
No description has been provided for this image

Bellenguez et al GWAS signals has many overlap with CS from other xQTL sources. This motivates us to look further. The figure above shows the ranges of CS to give us a loci level idea. Below, we show the variants in those CS, color-coding the variants that are shared between them in the same color. In particular, AD GWAS signals are also captured by a few xQTL data, although at this point we don't have formal statistical (colocalization) evidences for these observations yet.

In [17]:
pip_p
No description has been provided for this image

Section 2: Multi-context colocalization with Bellenguez 2022¶

This is done using ColocBoost. The most updated version of ColocBoost results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/ColocBoost/2024_9/

In [4]:
cb_res <- readRDS(paste0("/data/analysis_result/ColocBoost/2024_9/",gene_id,"_res.rds") )
In [4]:
cb_res <- readRDS(paste0("/data/analysis_result/ColocBoost/2024_9/",gene_id,"_res.rds") )
In [9]:
cb <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2)
No description has been provided for this image
In [10]:
pdf('plots/MAPK3/sec2.colocboost_res.pdf', width = 10, height = 5)
replayPlot(cb$p)
dev.off()
pdf: 2
In [11]:
# colocalized variants
cb_res_table
A data.frame: 2 x 8
colocalized phenotypespurity# variantshighest VCPcolocalized indexcolocalized variantsmax_abs_z_variantcset_id
<chr><dbl><dbl><dbl><chr><chr><chr><chr>
Exc; Inh; DLPFC; PCC; pQTL 1.0000000 40.41128295175; 5209; 5212; 5208 chr16:30123335:T:C; chr16:30130700:G:C; chr16:30133058:C:G; chr16:30130664:T:C chr16:30123335:T:Ccoloc_sets:Y5_Y6_Y7_Y9_Y11:CS1
AC; Monocyte; AD_Bellenguez_20220.63256761120.30549774843; 4856; 4857; 4874; 4825; 4833; 4827; 4810; 5010; 4940; 4844; 4864; 4865; 5012; 4943; 4985; 4935; 5056; 4954; 4913; 4978; 4849; 4946; 4962; 4873; 4876; 4925; 5023; 4961; 4866; 5055; 4929; 4930; 4743; 4950; 5029; 5070; 4992; 4782; 4731; 4952; 4987; 4964; 4821; 4820; 4839; 5003; 4746; 4770; 4769; 5067; 4760; 4745; 4811; 4761; 4734; 4945; 5022; 4877; 4790; 4803; 4791; 4789; 4890; 4921; 4795; 4796; 4807; 4725; 4718; 4798; 4916; 5118; 4786; 4799; 5062; 5034; 5108; 4730; 5104; 5123; 5128; 4854; 4804; 4809; 4818; 4847; 4813; 4869; 4794; 4750; 4751; 4784; 4749; 4868; 4752; 4759; 4974; 4972; 4975; 5004; 5047; 4887; 4939; 4984; 5074; 5044; 4976; 4953; 4969; 4997; 4999chr16:29954956:G:C; chr16:30001745:A:G; chr16:29992451:C:CCT; chr16:29971163:G:A; chr16:30004244:C:T; chr16:29984559:G:A; chr16:30004016:A:G; chr16:29971798:G:T; chr16:29971827:T:C; chr16:29972749:C:G; chr16:29974884:C:G; chr16:29975204:G:A; chr16:29977028:G:A; chr16:29981787:C:G; chr16:29982365:G:C; chr16:29983897:G:A; chr16:29989009:T:G; chr16:29989580:G:A; chr16:29991755:G:A; chr16:30000297:C:T; chr16:30001945:C:T; chr16:30002353:C:G; chr16:30006574:T:C; chr16:30007179:C:T; chr16:29984482:A:C; chr16:29986879:A:G; chr16:29997323:C:T; chr16:30008054:A:C; chr16:29973518:C:G; chr16:29977620:A:G; chr16:29998953:A:G; chr16:29959513:G:A; chr16:29959514:C:T; chr16:29971245:CAAAA:TAAAA; chr16:30004701:C:T; chr16:30003243:C:T; chr16:29956113:A:G; chr16:29956694:C:T; chr16:29961369:C:T; chr16:29962846:G:A; chr16:29966241:A:T; chr16:29966682:C:T; chr16:29953036:T:C; chr16:29958547:CTTT:CT; chr16:29971640:C:T; chr16:30007661:C:G; chr16:29946895:G:A; chr16:29949272:T:C; chr16:29951031:A:G; chr16:29951481:G:A; chr16:29959536:C:G; chr16:29961144:CA:CAA; chr16:30039101:C:CTAAT; chr16:30067171:G:C; chr16:30038757:CA:CAA; chr16:30039173:G:A; chr16:30049923:CAA:CA; chr16:30052325:A:G; chr16:30052459:A:G; chr16:30011243:A:G; chr16:30061209:T:C; chr16:30072731:AAAAT:TAAAT; chr16:30078587:C:T; chr16:30078590:T:C; chr16:30083915:C:T; chr16:30014188:A:T; chr16:30027326:A:G; chr16:30027327:G:C; chr16:30036609:G:A; chr16:30057148:T:G; chr16:30044010:CTT:CTTT; chr16:30085147:T:TG; chr16:30071187:G:T; chr16:30039184:C:T; chr16:30080727:A:G; chr16:30096460:A:G; chr16:30099146:T:C; chr16:30100035:A:AAAAT; chr16:30102803:C:T; chr16:30025807:A:C; chr16:30028857:C:G; chr16:30031789:T:G; chr16:30034584:C:CAAT; chr16:30034721:G:A; chr16:30036756:C:G; chr16:30037732:GC:G; chr16:30039991:A:G; chr16:30044228:T:C; chr16:30047763:AT:ATTT; chr16:30048272:T:G; chr16:30011836:T:C; chr16:30017814:A:G; chr16:30010081:C:T; chr16:30034468:C:G; chr16:30045827:C:T; chr16:30031356:G:T; chr16:30015148:G:A; chr16:30019378:T:C; chr16:30012465:C:G; chr16:30007399:C:T; chr16:30037232:C:T; chr16:30044429:T:C; chr16:30049334:A:G; chr16:30018227:C:T; chr16:30057033:C:T; chr16:30082458:C:G; chr16:30105932:G:A; chr16:30022107:A:G; chr16:30022868:A:G; chr16:30023148:T:G; chr16:30033260:G:A; chr16:29983415:T:Achr16:29954956:G:Ccoloc_sets:Y8_Y10_Y18:MergeCS1
In [12]:
# effect sign for each coloc sets
get_effect_sign_csets(cb_res)
$`coloc_sets:Y5_Y6_Y7_Y9_Y11:CS1`
A data.frame: 4 x 6
variantsExcInhDLPFCPCCpQTL
<chr><dbl><dbl><dbl><dbl><dbl>
chr16:30123335:T:Cchr16:30123335:T:C12.790425.35010214.7196613.6046710.88035
chr16:30130700:G:Cchr16:30130700:G:C12.829245.22267814.4806213.6046710.99189
chr16:30133058:C:Gchr16:30133058:C:G12.829245.22267814.4806213.6046710.99189
chr16:30130664:T:Cchr16:30130664:T:C12.866455.17177014.3820113.6046710.99189
$`coloc_sets:Y8_Y10_Y18:MergeCS1`
A data.frame: 112 x 4
variantsACMonocyteAD_Bellenguez_2022
<chr><dbl><dbl><dbl>
chr16:30001745:A:Gchr16:30001745:A:G -6.261315-4.422625-5.802469
chr16:30004244:C:Tchr16:30004244:C:T -6.174901-4.422625-5.716049
chr16:30004701:C:Tchr16:30004701:C:T -6.114726-4.312153-5.654321
chr16:30011243:A:Gchr16:30011243:A:G -4.855478-4.812469-6.222222
chr16:29992451:C:CCTchr16:29992451:C:CCT-6.240999-4.049458-5.855422
chr16:29998953:A:Gchr16:29998953:A:G -6.135129-4.049458-5.951807
chr16:29997323:C:Tchr16:29997323:C:T -6.150498-4.049458-5.765432
chr16:29984482:A:Cchr16:29984482:A:C -6.150498-4.049458-5.703704
chr16:30052325:A:Gchr16:30052325:A:G -4.681239-4.828032-6.085366
chr16:30027327:G:Cchr16:30027327:G:C -4.435317-4.790995-6.506024
chr16:30001945:C:Tchr16:30001945:C:T -6.226675-3.658199-6.109756
chr16:30006574:T:Cchr16:30006574:T:C -6.226675-3.658199-6.109756
chr16:30007179:C:Tchr16:30007179:C:T -6.226675-3.658199-6.085366
chr16:30052459:A:Gchr16:30052459:A:G -4.584267-4.828032-6.085366
chr16:30028857:C:Gchr16:30028857:C:G -4.598724-4.696249-6.308642
chr16:30044228:T:Cchr16:30044228:T:C -4.676958-4.696249-6.170732
chr16:30025807:A:Cchr16:30025807:A:C -4.546573-4.696249-6.308642
chr16:30078590:T:Cchr16:30078590:T:C -4.550651-4.799699-6.056645
chr16:30034721:G:Achr16:30034721:G:A -4.546573-4.696249-6.246914
chr16:30017814:A:Gchr16:30017814:A:G -4.554966-4.678517-6.259259
chr16:30039991:A:Gchr16:30039991:A:G -4.547812-4.696249-6.196184
chr16:30003243:C:Tchr16:30003243:C:T -6.179611-3.560549-6.085366
chr16:30031789:T:Gchr16:30031789:T:G -4.546573-4.696249-6.182927
chr16:30036756:C:Gchr16:30036756:C:G -4.546573-4.696249-6.182927
chr16:30010081:C:Tchr16:30010081:C:T -4.786583-4.174105-6.883721
chr16:30011836:T:Cchr16:30011836:T:C -4.474213-4.678517-6.320988
chr16:30022107:A:Gchr16:30022107:A:G -4.603651-4.590517-6.296296
chr16:30057148:T:Gchr16:30057148:T:G -4.492979-4.786071-6.033210
chr16:30036609:G:Achr16:30036609:G:A -4.383356-4.790995-6.170732
chr16:30007399:C:Tchr16:30007399:C:T -4.902656-4.174105-6.654762
...............
chr16:30004016:A:Gchr16:30004016:A:G -6.163242-4.4657950
chr16:29982365:G:Cchr16:29982365:G:C -6.229400-3.1553690
chr16:29983897:G:Achr16:29983897:G:A -6.229400-3.1553690
chr16:29989009:T:Gchr16:29989009:T:G -6.226675-3.2694150
chr16:30002353:C:Gchr16:30002353:C:G -6.226675-3.6581990
chr16:29986879:A:Gchr16:29986879:A:G -6.150498-4.0494580
chr16:30008054:A:Cchr16:30008054:A:C -6.150498-4.4226250
chr16:29973518:C:Gchr16:29973518:C:G -6.215389-3.1553690
chr16:29959513:G:Achr16:29959513:G:A -6.184695-3.3038680
chr16:29959514:C:Tchr16:29959514:C:T -6.184695-3.3038680
chr16:29971245:CAAAA:TAAAAchr16:29971245:CAAAA:TAAAA-6.187371-2.8278410
chr16:29958547:CTTT:CTchr16:29958547:CTTT:CT -6.167144-3.3063950
chr16:30007661:C:Gchr16:30007661:C:G -6.053990-4.2073730
chr16:29959536:C:Gchr16:29959536:C:G -6.028068-4.2005210
chr16:29961144:CA:CAAchr16:29961144:CA:CAA -6.080015-3.1603130
chr16:30039101:C:CTAATchr16:30039101:C:CTAAT -3.337549-5.1401330
chr16:30038757:CA:CAAchr16:30038757:CA:CAA -4.683808-4.9639160
chr16:30039173:G:Achr16:30039173:G:A -4.442710-4.8334620
chr16:30049923:CAA:CAchr16:30049923:CAA:CA -4.451156-4.8302020
chr16:30072731:AAAAT:TAAATchr16:30072731:AAAAT:TAAAT-4.447680-4.7996990
chr16:30014188:A:Tchr16:30014188:A:T -4.190042-4.7941720
chr16:30027326:A:Gchr16:30027326:A:G -4.435317-4.7909950
chr16:30044010:CTT:CTTTchr16:30044010:CTT:CTTT -4.295508-4.7764700
chr16:30085147:T:TGchr16:30085147:T:TG -4.074159-4.7493270
chr16:30071187:G:Tchr16:30071187:G:T -4.138452-4.7413070
chr16:30039184:C:Tchr16:30039184:C:T -4.347468-4.7283430
chr16:30034584:C:CAATchr16:30034584:C:CAAT -4.473958-4.6962490
chr16:30037732:GC:Gchr16:30037732:GC:G -4.546573-4.6962490
chr16:30047763:AT:ATTTchr16:30047763:AT:ATTT -4.453416-4.6962490
chr16:30048272:T:Gchr16:30048272:T:G -4.415012-4.6962490
In [13]:
# LD between coloc sets
get_between_purity_simple(cb_res, gene.name = gene_id, path = '/data/colocalization/QTL_data/eQTL/')
A matrix: 1 x 5 of type chr
coloc_csets_1coloc_csets_2min_abs_cormax_abs_cormedian_abs_cor
coloc_sets:Y5_Y6_Y7_Y9_Y11:CS1coloc_sets:Y8_Y10_Y18:MergeCS10.4162104526054720.7430157073887450.621142491680185

Here, different colors refer to different 95% Colocalization Sets (CoS, a metric developed in ColocBoost indicating that there is 95% probabilty that this CoS captures a colocalization event). We only included ROSMAP data for this particular ColocBoost analysis. In this case, we observe cell specific eQTL, bulk sQTL colocalization on ROSMAP data with AD as two separate CoS, suggesting two putative causal signals. We did not detect colocalization with pQTL of statistical significance although from Section 1 there are some overlap with pQTL signals in fine-mapping CS, the overlapped variants in CS have small PIP.

Section 3: Refinement of colocalized loci with other AD GWAS¶

Here we refine the colocalization with other AD GWAS to iron out any heterogeniety between studies (heterogeniety can come from many sources), to get additional candidate loci from these more heterogenous sources as candidates to study.

In [14]:
AD_cohorts <- c('AD_Jansen_2021', 'AD_Bellenguez_EADB_2022', 'AD_Bellenguez_EADI_2022',
             'AD_Kunkle_Stage1_2019', 'AD_Wightman_Excluding23andMe_2021',
             'AD_Wightman_ExcludingUKBand23andME_2021', 'AD_Wightman_Full_2021')
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_gwas = TRUE, gene_id = gene_id, cohorts = AD_cohorts)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.Error : File '/data/GWAS/ADGWAS_sumstats/16_29685831-46381513.RSS_QC_RAISS_imputed.AD_Kunkle_Stage1_2019.sumstats.tsv.gz' does not exist or is non-readable. getwd()=='/data/interactive_analysis/hs3163/GIT/xqtl-paper/AD_targets/MAPK3'
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
No description has been provided for this image
In [15]:
pdf('plots/MAPK3/sec3.colocboost_res_allad.pdf', width = 10, height = 5)
replayPlot(cb_ad$p)
dev.off()
pdf: 2

Section 4: Assessment of multi-context xQTL effect sizes¶

Option 1: ColocBoost + MASH¶

Use colocboost variants and check for mash posterior contrast to see if the effect size are shared or specific or even opposite. Advantage is that colocboost result is AD GWAS informed; issue is that marginal posterior effects is not always the joint

In [16]:
mash_p <- mash_plot(gene_name = 'MAPK3')

options(repr.plot.width = 10, repr.plot.height = 10)

for (mash_p_tmp in mash_p) {
    print(mash_p_tmp)
}

Option 2: mvSuSiE¶

Use mvSuSiE multicontext fine-mapping results --- the bubble plot to check posterior effects. Issue is that we don't have this results yet, and this is limited to one cohort at a time, without information from AD.

We should go for option 1 by default and if we want to make claim about opposite effect size we double-check with mvSuSiE multicontext analysis.

Section 5: Multi-context causal TWAS (including conventional TWAS and MR)¶

The most updated version of cTWAS analysis are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/cTWAS/

TWAS results¶

We report TWAS from all contexts and methods from the pipeline. Here we will filter it down to the best performing methods and only keep contexts that are significant.

In [17]:
plot_TWAS_res(gene_id = gene_id, gene_name = gene_name)
No description has been provided for this image

MR results¶

This is only available for genes that are deemed significant in TWAS and have summary statistics available for effect size and standard errors in GWAS, in addition to z-scores --- current version does not support z-scores although we will soon also support z-scores in MR using MAF from reference panel.

cTWAS results¶

To be updated soon.

Section 6: Context specific multi-gene fine-mapping¶

A quick analysis: using the xQTL-AD summary table (flatten table)¶

We extract from xQTL-AD summary table the variants to get other genes also have CS with the variants shared by target gene and AD.

In [18]:
multigene_flat <- get_multigene_multicontext_flatten('Fungen_xQTL_allQTL.overlapped.gwas.export.MAPK3.rds', sQTL = 'no_MSBB')
multigene_flat
A data.frame: 98 x 6
gene_id#chrstartendgene_namecontexts
<chr><chr><int><int><chr><chr>
ENSG00000005844chr163047265730472658ITGAL ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000013364chr162982039329820394MVP ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000047578chr162755014327550144KATNIP ROSMAP_DLPFC_sQTL
ENSG00000052344chr163113572631135727PRSS8 MiGA_GTS_eQTL,MiGA_SVZ_eQTL,ROSMAP_AC_sQTL
ENSG00000077235chr162754991227549913GTF3C1 ROSMAP_DLPFC_sQTL
ENSG00000077238chr162731366727313668IL4R MiGA_SVZ_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000079616chr162979071829790719KIF22 ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000080603chr163069820830698209SRCAP MiGA_GTS_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000089280chr163118013731180138FUS Oli_DeJager_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000090238chr163009691430096915YPEL3 BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,Oli_Kellis_eQTL,Inh_Kellis_eQTL,Exc_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000099364chr163092305430923055FBXL19 ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000099365chr163101063731010638STX1B BM_22_MSBB_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000099377chr163098520630985207HSD3B7 MiGA_SVZ_eQTL,ROSMAP_PCC_sQTL
ENSG00000099381chr163095775330957754SETD1A ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000102870chr163078720430787205ZNF629 ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000102879chr163018282630182827CORO1A ROSMAP_DLPFC_sQTL
ENSG00000102886chr163011353630113537GDPD3 MiGA_THA_eQTL,DLPFC_DeJager_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000103485chr162966327829663279QPRT MiGA_GTS_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000103496chr163103288831032889STX4 MiGA_SVZ_eQTL,Ast_Kellis_eQTL,Exc_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000103502chr162986327529863276CDIPT DLPFC_DeJager_eQTL,PCC_DeJager_eQTL
ENSG00000103507chr163110610631106107BCKDK MiGA_THA_eQTL
ENSG00000103510chr163111448831114489KAT8 MiGA_THA_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000103549chr163076174430761745RNF40 BM_44_MSBB_eQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000140678chr163135517331355174ITGAX MiGA_GTS_eQTL,DLPFC_Klein_gpQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000140682chr163147158431471585TGFB1I1MiGA_GTS_eQTL
ENSG00000140688chr163150930831509309RUSF1 ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000140691chr163145807931458080ARMC5 MiGA_SVZ_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000149922chr163009192330091924TBX6 MiGA_GTS_eQTL,MiGA_SVZ_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000149923chr163007597730075978PPP4C ROSMAP_PCC_sQTL,STARNET_eQTL
ENSG00000149925chr163006417130064172ALDOA DLPFC_Bennett_pQTL
..................
ENSG00000180096chr163039599030395991SEPTIN1 ROSMAP_AC_sQTL
ENSG00000180209chr163037093330370934MYLPF MiGA_THA_eQTL
ENSG00000181625chr162945450029454501SLX1B MiGA_SVZ_eQTL,DLPFC_DeJager_eQTL
ENSG00000183336chr162945496329454964BOLA2 Exc_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000184110chr162868855728688558EIF3C Exc_DeJager_eQTL,AC_DeJager_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL,STARNET_eQTL
ENSG00000185905chr162974598929745990C16orf54 MiGA_THA_eQTL
ENSG00000185947chr163187380631873807ZNF267 MiGA_GTS_eQTL,MiGA_THA_eQTL
ENSG00000188603chr162849557428495575CLN3 ROSMAP_DLPFC_sQTL
ENSG00000196118chr163076208430762085CCDC189 Exc_DeJager_eQTL,PCC_DeJager_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000196502chr162862362428623625SULT1A1 Knight_eQTL,MiGA_GTS_eQTL,BM_22_MSBB_eQTL,MSBB_BM36_pQTL,Exc_Kellis_eQTL,DLPFC_Bennett_pQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL,STARNET_eQTL
ENSG00000196993chr162875178628751787NPIPB9 BM_10_MSBB_eQTL
ENSG00000197162chr163058576830585769ZNF785 MiGA_GTS_eQTL,Exc_DeJager_eQTL,Exc_Kellis_eQTL,Exc_mega_eQTL,STARNET_eQTL
ENSG00000197165chr162859704928597050SULT1A2 DLPFC_DeJager_eQTL,AC_DeJager_eQTL,Exc_mega_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000197302chr163171322831713229ZNF720 MiGA_THA_eQTL
ENSG00000198064chr163025450930254510NPIPB13 MiGA_SVZ_eQTL,Ast_Kellis_eQTL,Oli_Kellis_eQTL,OPC_Kellis_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Exc_mega_eQTL,monocyte_ROSMAP_eQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000205456chr163225271832252719TP53TG3D DLPFC_DeJager_eQTL
ENSG00000213648chr162945991229459913SULT1A4 MiGA_SVZ_eQTL
ENSG00000229809chr163057273330572734ZNF688 ROSMAP_DLPFC_sQTL
ENSG00000233232chr162847117428471175NPIPB7 DLPFC_DeJager_eQTL,AC_DeJager_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000254206chr162940402829404029NPIPB11 MiGA_GTS_eQTL,Exc_DeJager_eQTL,Inh_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Exc_Kellis_eQTL,Exc_mega_eQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000255439chr163109495531094956AC135050.2 BM_10_MSBB_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,STARNET_eQTL
ENSG00000261052chr163019925430199255SULT1A3 MSBB_BM36_pQTL,Ast_DeJager_eQTL
ENSG00000261740chr162945465029454651BOLA2-SMG1P6MSBB_BM36_pQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000261832chr162849209728492098AC138894.1 ROSMAP_DLPFC_sQTL
ENSG00000280789chr162981615129816152PAGR1 MiGA_SVZ_eQTL
ENSG00000280893chr162981226029812261AC009133.6 ROSMAP_PCC_sQTL
ENSG00000281991chr163074064130740642TMEM265 BM_22_MSBB_eQTL
ENSG00000282034chr163070406630704067AC106886.5 ROSMAP_PCC_sQTL
ENSG00000285043chr163005308930053090AC093512.2 ROSMAP_DLPFC_sQTL
ENSG00000288656chr162862355428623555AC020765.6 ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL

Other genes implicated are PROC and HS6ST1 in MiGA cohort which may share causal eQTL with MAPK3. Further look into the data-set --- using these genes as targets and repeating what we did above for MAPK3 --- might be needed to establish a more certain conclusion.

Alternatively, we may be able to apply a multi-gene statistical fine-mapping test on MAPK3 region to find these genes, as you will see in the section below.

A statistically solid approach: mvSuSiE multi-gene analysis¶

This multi-gene fine-mapping analysis was done for each xQTL context separately. We will need to check the results for all contexts where this gene has an xQTL, in order to identify if there are other genes also sharing the same xQTL with this target gene. We included other genes in the same TAD window along with this gene, and extended it into a sliding window approach to include multiple TADs just in case. You need to check the sliding windows belongs to that gene (TSS) on analysis repo.

In [19]:
sliding_windows <- target_gene_info$gene_info$sliding_windows %>% strsplit(., ',') %>% unlist %>% as.character
sliding_windows
  1. 'chr16:21437007-30613717'
  2. 'chr16:22388493-34991551'
  3. 'chr16:24031743-46400000'
  4. 'chr16:27341433-49746318'
  5. 'chr16:30975802-50968730'

The most updated version of mvSuSiE multi-gene results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/mvsusie_multi_gene_test/multi_gene/ Currently it is still WIP. You can revisit this later when we prompt you to. Here is an example for MAPK3:

In [20]:
mnm_gene <- list()
for (window in sliding_windows) {
    mnm_gene_tmp <- NULL
    mnm_gene_tmp <- tryCatch(
        readRDS(paste0('/data/analysis_result/mvsusie_multi_gene/multi_gene/ROSMAP_multi_gene.', window, '.mnm.rds')),
        error = function(e) NULL
    )
    
    if (!is.null(mnm_gene_tmp)) {
        if(target_gene_info$gene_info$region_id %in% mnm_gene_tmp$mvsusie_fitted$condition_names){
        tryCatch({
            p <- mvsusieR::mvsusie_plot(mnm_gene_tmp$mvsusie_fitted, sentinel_only = F, add_cs = T)
            print(p)  # This ensures the plot is displayed in JupyterLab
        }, error = function(e) NULL)
        } else {
            message('There is mnm result for sliding window ',window,', but not include target gene ', gene_name, ' in CS')
        }
        mnm_gene <- append(mnm_gene, list(mnm_gene_tmp))
    }
}

In this case, there is no statistical evidence for MAPK3 sharing any of its xQTL with other genes in ROSMAP Microglia data we looked into; although we have not analyzed MiGA this way yet (which showed some potential signals from the quick analysis above).

Section 7: Epigenomic QTL and their target regions¶

fsusie, see notebook

Generate a crude plot to determined whether the story is interesting¶

This is a crude version of the case study plot which shows the fsusie Effect (colored line), the gene body (black arrow), the epi-QTL (large dots with the same color as the effects) and ADGWAS cs position (small red dots).

Only produce the refine plot if we can see either:

  1. There are sharing snp between epi-QTL and AD CS
  2. There are the AD CS located within one of the effect range
  3. The crude plot suggest something interesting
In [9]:
options(repr.plot.width = 40, repr.plot.height = 40)

 ggplot() + theme_bw() + facet_grid(cs_coverage_0.95 + study + region ~ ., labeller = labeller(.rows = function(x) gsub("([_:,-])", "\n", x)), scale = "free_y") +

      theme(text = element_text(size = 20), strip.text.y = element_text(size = 25, angle = 0.5)) +
     # xlim(view_win) +
      ylab("Estimated effect") +
   #   geom_line(data = haQTL_df %>% mutate(study = "haQTL effect") %>% filter(CS == 5),
    #            aes_string(y = "fun_plot", x = "x", col = "CS"), size = 4, col = "#00AEEF") +
  geom_line(data = effect_of_interest ,
                aes_string(y = "fun_plot", x = "x", col = "cs_coverage_0.95"), size = 2) +  
    geom_point(data = effect_of_interest ,
                aes_string(y = "pip", x = "pos", col = "cs_coverage_0.95"), size = 10) +
    theme(text = element_text(size = 40), strip.text.y = element_text(size = 15, angle = 0.5), 
            axis.text.x = element_text(size = 40), axis.title.x = element_text(size = 40)) +
      xlab("Position") +
      ylab("Estimated\neffect") +
      geom_segment(arrow = arrow(length = unit(1, "cm")), aes(x = gene_start, xend = gene_end, y = 1, yend = 1), size = 6,
                  data = tar_gene_info$gene_info, alpha = 0.3) +
      geom_text(aes(x = (gene_start + gene_end) / 2, y = 1 , label = gene_name), size = 10, 
              data = tar_gene_info$gene_info)+
        geom_point(aes(x = pos, y = pip  ) ,color = "red", data = flatten_table%>%filter( str_detect(study,"AD_") , cs_coverage_0.95 != 0  )%>%mutate(AD_study = study%>%str_replace_all("_","\n" ))%>%select(-study,-region,-cs_coverage_0.95) ) 
No description has been provided for this image

Section 8: Context focused validation in other xQTL data¶

see notebook

add fake version for now, so you don't have to refer to above link

Background: our "discovery set" is ROSMAP but we have additional "validation" sets including:

  • STARNET
  • MiGA
  • KnightADRC
  • MSBB
  • metaBrain
  • UKB pQTL

TODO:

  • Get from Carlos WashU CSF based resource (pQTL and metabolomic QTL)

This section shows verification of findings from these data-sets. In principle we should check them through sections 1-6 more formally. In practice we will start with colocalization via colocboost --- since our study is genetics (variant and loci level) focused. We can selectively follow them up for potentially intereting validations. We therefore only demonstrate validation via colocboost as a starting point.

In [5]:
finempping_contexts <- readRDS(paste0(gene_name, '_finemapping_contexts.rds')) # from sec1
In [5]:
finempping_contexts <- readRDS(paste0(gene_name, '_finemapping_contexts.rds')) # from sec1
In [6]:
finempping_contexts <- get_norosmap_contexts(finempping_contexts)
In [7]:
cb_contexts <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_QTL = TRUE, cohorts = finempping_contexts, gene_id = gene_id)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
No description has been provided for this image

Section 9: Non-linear effects of xQTL¶

see notebook

APOE interaction¶

In [10]:
options(repr.plot.width=6, repr.plot.height=6)

ggplot(MAPK3_int_res, aes(x = variant_id, y = qvalue_interaction)) +
  geom_point(alpha = 0.7, size = 6) +
  labs(title = "qvalue for MAPK3 csets in interaction association nalysis",
       x = "Gene Name",
       y = "qvalue_interaction",
       size = "qvalue_interaction") +
  theme_minimal(base_size = 14) +
  theme(panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey80"),
        legend.position = NULL,
        axis.text.x = element_text(angle = 45, hjust = 1))  + ylim(0,1)
  # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
ggsave('plots/MAPK3/sec11.interaction_association_MAPK3_lessPIP25.pdf', height = 5, width = 8) 
No description has been provided for this image

In conclusion, there is no interaction QTL with APOE identified.

Section 10: in silico functional studies in iPSC model¶

see notebook

In [11]:
vars_p
No description has been provided for this image
In [13]:
apoe_p
No description has been provided for this image

Section 11: Functional annotations of selected loci¶

see notebook

TODO

  • Touch base with Ryan on the snATAC annotations
  • Run this by Pavel to see if there are additional comments on how we do this
In [ ]:
 

Section 12: Candidate loci as trans-xQTL¶

see notebook

In [9]:
options(repr.plot.width=12, repr.plot.height=6)
if(!is.null(flat_var)){
   p =  ggplot(flat_var, aes(x = gene_name, y = pip, size = pip)) +
      geom_point(alpha = 0.7) +
      labs(title = paste0("PIP values for trans fine mapped Genes in ", gene_name ," csets with AD"),
           x = "Gene Name",
           y = "PIP",
           size = "PIP",
           color = "CS Coverage 0.95 Min Corr") +
      theme_minimal(base_size = 14) +
      theme(panel.background = element_blank(),
            panel.grid.major = element_line(color = "grey80"),
            legend.position = NULL,
            axis.text.x = element_text(angle = 45, hjust = 1))  
      # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
    ggsave(paste0('plots/MAPK3/sec12.trans_fine_mapping_',gene_name,'.pdf'),p, height = 5, width = 8)
    p
    } else{
    message('There are no detectable trans signals for ', gene_name)
}

Creative thinking: generate hypothesis, search in literature, raise questions to discuss¶

You can now generate some preliminary hypotheses based on the above results. Next, you should search for evidence in the literature to support or refine these hypotheses and identify additional analyses needed to confirm them.