Case study: GRN xQTL and AD GWAS¶

This notebook documents the analysis of xQTL case study on a targeted gene, GRN.

  • Section 0: Sanity check
  • Section 1: Fine-mapping for xQTL and GWAS
  • Section 2: Multi-context colocalization with Bellenguez 2022
  • Section 3: Refinement of colocalized loci with other AD GWAS
  • Section 4: Assessment of multi-context xQTL effect sizes
  • Section 5: Multi-context causal TWAS (including conventional TWAS and MR)
  • Section 6: Context specific multi-gene fine-mapping
  • Section 7: Epigenomic QTL and their target regions
  • Section 8: Context focused validation in other xQTL data
  • Section 9: Non-linear effects of xQTL
  • Section 10: in silico functional studies in iPSC model
  • Section 11: Functional annotations of selected loci
  • Section 12: Candidate loci as trans-xQTL

Overview¶

FunGen-xQTL resource contains 67 xQTL contexts as well as 9 AD GWAS fine-mapped genome-wide. The overarching goal for case studies is to use these resource to raise questions and learn more about gene targets of interest.

Overall, a case study consists of the following aspects:

  • Check the basic information of the gene
  • Check the existing xQTL and integrative analysis results, roughly including
    • Summary table for univariate fine-mapping
    • Marginal association results
    • Multi-gene and multi-context fine-mapping
    • Multi-context colocalization with AD GWAS
    • TWAS, MR and causal TWAS
    • Integration with epigenetic QTL
    • Quantile QTL
    • Interaction QTL
    • Validation:
      • Additional xQTL data in FunGen-xQTL
      • Additional AD GWAS data-set already generated by FunGen-xQTL
    • In silico functional studies
      • Additional iPSC data-sets
    • Functional annotations of variants, particularly in relevant cellular contexts
  • Creative thinking: generate hypothesis, search in literature, raise questions to discuss

Computing environment setup¶

Interactive analysis will be done on AWS cloud powered by MemVerge. Please contact Gao Wang to setup accounts for you to start the analysis.

Please follow instructions on https://wanggroup.org/productivity_tips/mmcloud-interactive to configure your computing environment. Here are some additional packages you need to install after the initial configuration, in order to perform these analysis:

in terminal with bash:

micromamba install -n r_libs r-pecotmr

in R:

install.packages('BEDMatrix')

How to Use This Notebook¶

  1. Before you start: Load functions from cb_plot.R and utilis.R, located at <xqtl-paper>/codes/. These functions and resources are packaged to streamline the analysis and ensure everything is as clean as possible. And also the codes for ColocBoost under path /data/colocalization/colocboost/R.
  2. Inside of this notebook, use sed -i or Ctrl+F to replace the gene GRN with the gene you want to analyze.
  3. For detailed analysis in some sections, please refer to the corresponding analysis notebooks as indicated. These companion notebooks are available under this same folder. The rest of the tasks can be completed with a few lines of code, as demonstrated in this notebook.
  4. Similarly for the companion notebooks you should also use the sed -i or Ctrl+F replacing gene_name (GRN in this case) with the gene you want to investigate.

While using this notebook, you may need to generate three intermediate files from Sections 1 and 2, which will be useful for downstream analysis:

  • a. Section 1:
    • Fine-mapping contexts that indicate shared signals with AD, GRN_finemapping_contexts.rds. This can be used as input for Section 8 the multi-cohort validation step
    • A subset of the xQTL-AD table, Fungen_xQTL_allQTL.overlapped.gwas.export.GRN.rds. This can be used as input for Section 12.
  • b. Section 2: A variant list showing colocalization in cohorts we analyzed with ColocBoost, GRN_colocboost_res.rds. this can be used as input for Sections 7, 9, 10, and 12.

Section 0: Sanity check¶

Check the basic information of the gene¶

  • To gain a preliminary understanding of this gene’s expression—specifically, whether it is cell-specific—can help us quickly determine if our results are consistent with expectations.

Useful websites:

  1. check gene function, (immune) cell type specificity, tissue specifity, protein location: https://www.proteinatlas.org
  2. check gene position and structure: https://www.ncbi.nlm.nih.gov/gene/
  3. other collective information: https://www.genecards.org

Check the existing results which are inputs to this analysis¶

In [16]:
# If an error occurs while sourcing scripts, it might be because your get() returned NULL. 
#Please restart the kernel or click the R kernel in the upper right corner to resolve the issue.
source('../../codes/cb_plot.R')
source('../../codes/utilis.R')

for(file in list.files("/data/colocalization/colocboost/R", pattern = ".R", full.names = T)){
          source(file)
        }
gene_name = 'GRN'

dir.create(paste0('plots/', gene_name), recursive = T)

get basic target gene information

In [17]:
target_gene_info <- get_gene_info(gene_name = gene_name)
target_gene_info
$gene_info
A data.table: 1 x 14
region_id#chrstartendTSSLD_matrix_idLD_sumstats_idLD_sumstats_id_oldTADB_indexTADB_idgene_startgene_endsliding_windowsgene_name
<chr><chr><dbl><dbl><int><chr><chr><chr><chr><chr><int><int><chr><chr>
ENSG00000030582chr17425600004668000044345261chr17:42087601-45383525,chr17:45383525-5016286417_42087601-45383525,17_45383525-5016286417_42087601-45383525,17_45383525_50162864TADB_1194,TADB_1195chr17_40717742_44389199,chr17_41536509_477574644434526244353106chr17:30937006-42930542,chr17:34918066-44389199,chr17:36924414-47757464,chr17:40717742-51756219,chr17:41536509-52297936,chr17:45274784-55961956GRN
$target_LD_ids
A matrix: 1 x 2 of type chr
chr17:42087601-45383525chr17:45383525-50162864
$target_sums_ids
A matrix: 1 x 2 of type chr
17_42087601-4538352517_45383525-50162864
$gene_region
'chr17:42560000-46680000'
$target_TAD_ids
A matrix: 1 x 2 of type chr
chr17_40717742_44389199chr17_41536509_47757464
In [18]:
gene_id = target_gene_info$gene_info$region_id
chrom = target_gene_info$gene_info$`#chr`

Take a quick look for the expression of target gene in ROSMAP bulk data, we don't want them to be too low

In [19]:
source('../../codes/utilis.R')
expression_in_rosmap_bulk(target_gene_info)
No description has been provided for this image

Section 1: Fine-mapping for xQTL and GWAS¶

see notebook

In [31]:
region_p
No description has been provided for this image

Bellenguez et al GWAS signals has many overlap with CS from other xQTL sources. This motivates us to look further. The figure above shows the ranges of CS to give us a loci level idea. Below, we show the variants in those CS, color-coding the variants that are shared between them in the same color. In particular, AD GWAS signals are also captured by a few xQTL data, although at this point we don't have formal statistical (colocalization) evidences for these observations yet.

In [32]:
pip_p
No description has been provided for this image

Section 2: Multi-context colocalization with Bellenguez 2022¶

This is done using ColocBoost. The most updated version of ColocBoost results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/ColocBoost/2024_9/

In [33]:
cb_res <- readRDS(paste0("/data/analysis_result/ColocBoost/2024_9/",gene_id,"_res.rds") )
In [36]:
#save colocboost results
cb_res_table <- get_cb_summary(cb_res) 

saveRDS(cb_res_table, paste0(gene_name, "_colocboost_res.rds"))
In [37]:
cb <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2)
No description has been provided for this image
In [38]:
pdf('plots/GRN/sec2.colocboost_res.pdf', width = 10, height = 5)
replayPlot(cb$p)
dev.off()
pdf: 2
In [39]:
# colocalized variants
cb_res_table
A data.frame: 1 x 8
colocalized phenotypespurity# variantshighest VCPcolocalized indexcolocalized variantsmax_abs_z_variantcset_id
<chr><dbl><dbl><dbl><chr><chr><chr><chr>
Ast; Oli; Exc; Inh; DLPFC; AC; PCC; AD_Bellenguez_20221116346chr17:44352876:C:Tchr17:44352876:C:Tcoloc_sets:Y2_Y3_Y5_Y6_Y7_Y8_Y9_Y17:CS1
In [40]:
# effect sign for each coloc sets
get_effect_sign_csets(cb_res)
$`coloc_sets:Y2_Y3_Y5_Y6_Y7_Y8_Y9_Y17:CS1` =
A data.frame: 1 x 9
variantsAstOliExcInhDLPFCACPCCAD_Bellenguez_2022
<chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
chr17:44352876:C:Tchr17:44352876:C:T-3.359053-4.94038-8.02911-3.790964-10.31302-14.40894-12.713317.021739
In [41]:
# LD between coloc sets
get_between_purity_simple(cb_res, gene.name = gene_id, path = '/data/colocalization/QTL_data/eQTL/')

Here, different colors refer to different 95% Colocalization Sets (CoS, a metric developed in ColocBoost indicating that there is 95% probabilty that this CoS captures a colocalization event). We only included ROSMAP data for this particular ColocBoost analysis. In this case, we observe cell specific eQTL, bulk sQTL colocalization on ROSMAP data with AD as two separate CoS, suggesting two putative causal signals. We did not detect colocalization with pQTL of statistical significance although from Section 1 there are some overlap with pQTL signals in fine-mapping CS, the overlapped variants in CS have small PIP.

Section 3: Refinement of colocalized loci with other AD GWAS¶

Here we refine the colocalization with other AD GWAS to iron out any heterogeniety between studies (heterogeniety can come from many sources), to get additional candidate loci from these more heterogenous sources as candidates to study.

In [42]:
AD_cohorts <- c('AD_Jansen_2021', 'AD_Bellenguez_EADB_2022', 'AD_Bellenguez_EADI_2022',
             'AD_Kunkle_Stage1_2019', 'AD_Wightman_Excluding23andMe_2021',
             'AD_Wightman_ExcludingUKBand23andME_2021', 'AD_Wightman_Full_2021')
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_gwas = TRUE, gene_id = gene_id, cohorts = AD_cohorts)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
No description has been provided for this image
In [46]:
pdf('plots/GRN/sec3.colocboost_res_allad.pdf', width = 10, height = 5)
replayPlot(cb_ad$p)
dev.off()
pdf: 2

Section 4: Assessment of multi-context xQTL effect sizes¶

Option 1: ColocBoost + MASH¶

Use colocboost variants and check for mash posterior contrast to see if the effect size are shared or specific or even opposite. Advantage is that colocboost result is AD GWAS informed; issue is that marginal posterior effects is not always the joint.
AP Note: I'm not sure how to interpret this plot. What is lsfr? how interpret the beta difference to know if it is shared or opposite?

In [47]:
mash_p <- mash_plot(gene_name = 'GRN')

options(repr.plot.width = 10, repr.plot.height = 10)

for (mash_p_tmp in mash_p) {
    print(mash_p_tmp)
}
No description has been provided for this image

Option 2: mvSuSiE¶

Use mvSuSiE multicontext fine-mapping results --- the bubble plot to check posterior effects. Issue is that we don't have this results yet, and this is limited to one cohort at a time, without information from AD.

We should go for option 1 by default and if we want to make claim about opposite effect size we double-check with mvSuSiE multicontext analysis.

Section 5: Multi-context causal TWAS (including conventional TWAS and MR)¶

The most updated version of cTWAS analysis are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/cTWAS/

TWAS results¶

We report TWAS from all contexts and methods from the pipeline. Here we will filter it down to the best performing methods and only keep contexts that are significant.
AP Comment: So here we can say that GRN is causal of AD in Exc and Oligo right?

In [48]:
plot_TWAS_res(gene_id = gene_id, gene_name = gene_name)
No description has been provided for this image

MR results¶

This is only available for genes that are deemed significant in TWAS and have summary statistics available for effect size and standard errors in GWAS, in addition to z-scores --- current version does not support z-scores although we will soon also support z-scores in MR using MAF from reference panel.

cTWAS results¶

To be updated soon.

Section 6: Context specific multi-gene fine-mapping¶

A quick analysis: using the xQTL-AD summary table (flatten table)¶

We extract from xQTL-AD summary table the variants to get other genes also have CS with the variants shared by target gene and AD.

FIXME why, so many genes here? this is spanning 4Mb whchi is huge, it looks this is coming from any genes overlapping with any xQTL CS associated to the gene, but it will be I think more informative to take only the genes CS overlapping the AD CS (or at least highlight it).

In [ ]:
 
In [ ]:
 
In [49]:
multigene_flat <- get_multigene_multicontext_flatten('Fungen_xQTL_allQTL.overlapped.gwas.export.GRN.rds', sQTL = 'no_MSBB')
multigene_flat
A data.frame: 83 x 6
gene_id#chrstartendgene_namecontexts
<chr><chr><int><int><chr><chr>
ENSG00000004939chr174426813444268135SLC4A1 MiGA_THA_eQTL
ENSG00000005102chr174366192143661922MEOX1 MiGA_THA_eQTL
ENSG00000005961chr174438964844389649ITGA2B Exc_Kellis_eQTL,Exc_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000012048chr174317024443170245BRCA1 MiGA_SVZ_eQTL,MiGA_THA_eQTL,Exc_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000013306chr174432486944324870SLC25A39 Knight_eQTL,BM_10_MSBB_eQTL,BM_36_MSBB_eQTL,Exc_DeJager_eQTL,Inh_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Oli_Kellis_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Exc_mega_eQTL,Inh_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL,STARNET_eQTL
ENSG00000037042chr174265928342659284TUBG2 MiGA_THA_eQTL
ENSG00000067596chr174348386443483865DHX8 ROSMAP_DLPFC_sQTL
ENSG00000068120chr174256146642561467COASY ROSMAP_PCC_sQTL
ENSG00000068137chr174267699342676994PLEKHH3 Inh_mega_eQTL,ROSMAP_AC_sQTL
ENSG00000073670chr174475898744758988ADAM11 MiGA_GTS_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000073969chr174659066846590669NSF Exc_DeJager_eQTL,AC_DeJager_eQTL,Mic_Kellis_eQTL,Oli_Kellis_eQTL,OPC_Kellis_eQTL,Exc_Kellis_eQTL,Exc_mega_eQTL,Inh_mega_eQTL,Oli_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000087152chr174420011244200113ATXN7L3 MiGA_GTS_eQTL
ENSG00000091947chr174402394544023946TMEM101 MiGA_GTS_eQTL,ROSMAP_PCC_sQTL
ENSG00000108309chr174430858244308583RUNDC3A MiGA_GTS_eQTL,MiGA_THA_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000108771chr174211271342112714DHX58 MiGA_GTS_eQTL
ENSG00000108786chr174255292242552923HSD17B1 Oli_mega_eQTL
ENSG00000108797chr174268253042682531CNTNAP1 DLPFC_Klein_gpQTL
ENSG00000108799chr174274504842745049EZH1 ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000108821chr175020163050201631COL1A1 ROSMAP_PCC_sQTL
ENSG00000108825chr174298052742980528PTGES3L-AARSD1ROSMAP_AC_sQTL
ENSG00000108828chr174302512243025123VAT1 BM_22_MSBB_eQTL
ENSG00000108830chr174302523043025231RND2 STARNET_eQTL
ENSG00000108840chr174412370144123702HDAC5 DLPFC_Bennett_pQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000108852chr174390971043909711MPP2 MiGA_GFM_eQTL,MiGA_SVZ_eQTL,DLPFC_DeJager_eQTL,AC_DeJager_eQTL,Ast_mega_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000108861chr174377897643778977DUSP3 MiGA_SVZ_eQTL,Oli_Kellis_eQTL
ENSG00000108883chr174489944444899445EFTUD2 MiGA_THA_eQTL,BM_44_MSBB_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000120071chr174622538846225389KANSL1 Knight_eQTL,MiGA_GTS_eQTL,MiGA_SVZ_eQTL,BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,Mic_DeJager_eQTL,Ast_DeJager_eQTL,Oli_DeJager_eQTL,OPC_DeJager_eQTL,Exc_DeJager_eQTL,Inh_DeJager_eQTL,AC_DeJager_eQTL,Inh_Kellis_eQTL,Ast_10_Kellis_eQTL,Mic_13_Kellis_eQTL,Exc_mega_eQTL,Mic_mega_eQTL,OPC_mega_eQTL,Oli_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL,STARNET_eQTL
ENSG00000120088chr174578427945784280CRHR1 ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000121073chr174970901349709014SLC35B1 monocyte_ROSMAP_eQTL
ENSG00000126562chr174278060942780610WNK4 MiGA_SVZ_eQTL
..................
ENSG00000167941chr174375879043758791SOST MiGA_THA_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL
ENSG00000168259chr174202137542021376DNAJC7 ROSMAP_DLPFC_sQTL
ENSG00000168517chr174516069945160700HEXIM2 MiGA_GTS_eQTL,ROSMAP_PCC_sQTL
ENSG00000168591chr174418696944186970TMUB2 MiGA_THA_eQTL,MSBB_BM36_pQTL
ENSG00000168610chr174238853942388540STAT3 ROSMAP_PCC_sQTL
ENSG00000172992chr174506110845061109DCAKD MiGA_GFM_eQTL,MiGA_GTS_eQTL,BM_44_MSBB_eQTL,Exc_DeJager_eQTL,DLPFC_DeJager_eQTL,AC_DeJager_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Exc_mega_eQTL,DLPFC_Bennett_pQTL
ENSG00000173757chr174227670642276707STAT5B MiGA_GFM_eQTL
ENSG00000173805chr174173464341734644HAP1 BM_10_MSBB_eQTL,ROSMAP_PCC_sQTL
ENSG00000175832chr174357961943579620ETV4 Inh_DeJager_eQTL
ENSG00000176681chr174629273246292733LRRC37A Knight_eQTL,MiGA_GFM_eQTL,MiGA_GTS_eQTL,MiGA_THA_eQTL,BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,Ast_DeJager_eQTL,Oli_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Inh_mega_eQTL,monocyte_ROSMAP_eQTL
ENSG00000180336chr174465640344656404MEIOC MiGA_THA_eQTL
ENSG00000181513chr174513259945132600ACBD4 MiGA_THA_eQTL,Exc_mega_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000182963chr174483081544830816GJC1 BM_44_MSBB_eQTL
ENSG00000183978chr174279870342798704COA3 MiGA_SVZ_eQTL
ENSG00000184922chr174522144345221444FMNL1 MiGA_SVZ_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000185829chr174657969046579691ARL17A MiGA_GTS_eQTL,MiGA_SVZ_eQTL,BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,Ast_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Mic_Kellis_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Exc_mega_eQTL,OPC_mega_eQTL,Oli_mega_eQTL,monocyte_ROSMAP_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,STARNET_eQTL
ENSG00000186566chr174450340544503406GPATCH8 Ast_10_Kellis_eQTL,Exc_mega_eQTL,Inh_mega_eQTL
ENSG00000186834chr174514847445148475HEXIM1 MiGA_GTS_eQTL
ENSG00000186868chr174589452645894527MAPT Ast_10_Kellis_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000188554chr174317048043170481NBR1 MiGA_SVZ_eQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000198863chr174298056442980565RUNDC1 MiGA_SVZ_eQTL,BM_44_MSBB_eQTL
ENSG00000214447chr174489971144899712FAM187A ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000225190chr174549074845490749PLEKHM1 PCC_DeJager_eQTL,AC_DeJager_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Exc_mega_eQTL,Inh_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000228696chr174636179646361797ARL17B Knight_eQTL,MiGA_GFM_eQTL,BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,Mic_DeJager_eQTL,Oli_DeJager_eQTL,OPC_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,Mic_Kellis_eQTL,Oli_Kellis_eQTL,OPC_Kellis_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Ast_mega_eQTL,Exc_mega_eQTL,Inh_mega_eQTL,OPC_mega_eQTL,Oli_mega_eQTL,STARNET_eQTL
ENSG00000231256chr174378043443780435CFAP97D1 DLPFC_DeJager_eQTL
ENSG00000236383chr174330539643305397CCDC200 MiGA_SVZ_eQTL
ENSG00000238083chr174651151046511511LRRC37A2 MiGA_SVZ_eQTL,MiGA_THA_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,Mic_DeJager_eQTL,Ast_DeJager_eQTL,Oli_DeJager_eQTL,OPC_DeJager_eQTL,Exc_DeJager_eQTL,Inh_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Mic_Kellis_eQTL,Ast_Kellis_eQTL,Oli_Kellis_eQTL,OPC_Kellis_eQTL,Exc_Kellis_eQTL,Inh_Kellis_eQTL,Ast_mega_eQTL,Exc_mega_eQTL,Inh_mega_eQTL,OPC_mega_eQTL,Oli_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL,STARNET_eQTL
ENSG00000262633chr174692313246923133AC005670.2 Inh_Kellis_eQTL
ENSG00000263715chr174562034345620344LINC02210-CRHR1Exc_Kellis_eQTL,Inh_Kellis_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000267060chr174298043242980433PTGES3L MiGA_GTS_eQTL,MiGA_THA_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL

Other genes implicated are PROC and HS6ST1 in MiGA cohort which may share causal eQTL with GRN. Further look into the data-set --- using these genes as targets and repeating what we did above for GRN --- might be needed to establish a more certain conclusion.

Alternatively, we may be able to apply a multi-gene statistical fine-mapping test on GRN region to find these genes, as you will see in the section below.

A statistically solid approach: mvSuSiE multi-gene analysis¶

This multi-gene fine-mapping analysis was done for each xQTL context separately. We will need to check the results for all contexts where this gene has an xQTL, in order to identify if there are other genes also sharing the same xQTL with this target gene. We included other genes in the same TAD window along with this gene, and extended it into a sliding window approach to include multiple TADs just in case. You need to check the sliding windows belongs to that gene (TSS) on analysis repo.

In [53]:
sliding_windows <- target_gene_info$gene_info$sliding_windows %>% strsplit(., ',') %>% unlist %>% as.character
sliding_windows
  1. 'chr17:30937006-42930542'
  2. 'chr17:34918066-44389199'
  3. 'chr17:36924414-47757464'
  4. 'chr17:40717742-51756219'
  5. 'chr17:41536509-52297936'
  6. 'chr17:45274784-55961956'

The most updated version of mvSuSiE multi-gene results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/mvsusie_multi_gene_test/multi_gene/ Currently it is still WIP. You can revisit this later when we prompt you to. Here is an example for GRN:

In [54]:
mnm_gene <- list()
for (window in sliding_windows) {
    mnm_gene_tmp <- NULL
    mnm_gene_tmp <- tryCatch(
        readRDS(paste0('/data/analysis_result/mvsusie_multi_gene/multi_gene/ROSMAP_multi_gene.', window, '.mnm.rds')),
        error = function(e) NULL
    )
    
    if (!is.null(mnm_gene_tmp)) {
        if(target_gene_info$gene_info$region_id %in% mnm_gene_tmp$mvsusie_fitted$condition_names){
        tryCatch({
            p <- mvsusieR::mvsusie_plot(mnm_gene_tmp$mvsusie_fitted, sentinel_only = F, add_cs = T)
            print(p)  # This ensures the plot is displayed in JupyterLab
        }, error = function(e) NULL)
        } else {
            message('There is mnm result for sliding window ',window,', but not include target gene ', gene_name, ' in CS')
        }
        mnm_gene <- append(mnm_gene, list(mnm_gene_tmp))
    }
}

In this case, there is no statistical evidence for GRN sharing any of its xQTL with other genes in ROSMAP Microglia data we looked into; although we have not analyzed MiGA this way yet (which showed some potential signals from the quick analysis above).

Section 7: Epigenomic QTL and their target regions¶

fsusie, see notebook

Generate a crude plot to determined whether the story is interesting¶

This is a crude version of the case study plot which shows the fsusie Effect (colored line), the gene body (black arrow), the epi-QTL (large dots with the same color as the effects) and ADGWAS cs position (small red dots).

Only produce the refine plot if we can see either:

  1. There are sharing snp between epi-QTL and AD CS
  2. There are the AD CS located within one of the effect range
  3. The crude plot suggest something interesting

AP Note: What is the difference between the 2 ROSMAP haQTL ? why one have negative while the others positive effect? Is it a desired behaviour that the epi-QTL is not overlapping with the AD SNPs ?

In [ ]:
 
In [10]:
options(repr.plot.width = 40, repr.plot.height = 40)

 ggplot() + theme_bw() +  facet_grid(cs_coverage_0.95+study + region ~ ., labeller = labeller(.rows = function(x) gsub("([_:,-])", "\n", x)), scale = "free_y") +

      theme(text = element_text(size = 20), strip.text.y = element_text(size = 25, angle = 0.5)) +
     # xlim(view_win) +
      ylab("Estimated effect") +
   #   geom_line(data = haQTL_df %>% mutate(study = "haQTL effect") %>% filter(CS == 5),
    #            aes_string(y = "fun_plot", x = "x", col = "CS"), size = 4, col = "#00AEEF") +
  geom_line(data = effect_of_interest ,
                aes_string(y = "fun_plot", x = "x", col = "cs_coverage_0.95"), size = 4) +  
    geom_point(data = effect_of_interest ,
                aes_string(y = "pip", x = "pos", col = "cs_coverage_0.95"), size = 4) +
    theme(text = element_text(size = 40), strip.text.y = element_text(size = 15, angle = 0.5), 
            axis.text.x = element_text(size = 40), axis.title.x = element_text(size = 40)) +
      xlab("Position") +
      ylab("Estimated\neffect") +
      geom_segment(arrow = arrow(length = unit(1, "cm")), aes(x = gene_start, xend = gene_end, y = 1, yend = 1), size = 6,
                  data = tar_gene_info$gene_info, alpha = 0.3) +
      geom_text(aes(x = (gene_start + gene_end) / 2, y = 1 , label = gene_name), size = 10, 
              data = tar_gene_info$gene_info)+
        geom_point(aes(x = pos, y = pip  ) ,color = "red", data = flatten_table%>%filter( str_detect(study,"AD_") , cs_coverage_0.95 != 0  )%>%mutate(AD_study = study%>%str_replace_all("_","\n" ))%>%select(-study,-region,-cs_coverage_0.95) ) 
No description has been provided for this image

Section 8: Context focused validation in other xQTL data¶

see notebook

add fake version for now, so you don't have to refer to above link

Background: our "discovery set" is ROSMAP but we have additional "validation" sets including:

  • STARNET
  • MiGA
  • KnightADRC
  • MSBB
  • metaBrain
  • UKB pQTL

TODO:

  • Get from Carlos WashU CSF based resource (pQTL and metabolomic QTL)

This section shows verification of findings from these data-sets. In principle we should check them through sections 1-6 more formally. In practice we will start with colocalization via colocboost --- since our study is genetics (variant and loci level) focused. We can selectively follow them up for potentially intereting validations. We therefore only demonstrate validation via colocboost as a starting point.

In [20]:
finempping_contexts <- readRDS(paste0(gene_name, '_finemapping_contexts.rds')) # from sec1
In [21]:
finempping_contexts <- get_norosmap_contexts(finempping_contexts)
In [22]:
cb_contexts <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_QTL = TRUE, cohorts = finempping_contexts, gene_id = gene_id)

In conclusion from what's shown above, when we check the association signals in STARNET and MiGA on colocalization established from ROSMAP and AD GWAS, we see additional evidences.

Section 9: Non-linear effects of xQTL¶

see notebook

APOE interaction¶

In [7]:
options(repr.plot.width=6, repr.plot.height=6)

ggplot(GRN_int_res, aes(x = variant_id, y = qvalue_interaction)) +
  geom_point(alpha = 0.7, size = 6) +
  labs(title = "qvalue for GRN csets in interaction association nalysis",
       x = "Gene Name",
       y = "qvalue_interaction",
       size = "qvalue_interaction") +
  theme_minimal(base_size = 14) +
  theme(panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey80"),
        legend.position = NULL,
        axis.text.x = element_text(angle = 45, hjust = 1))  + ylim(0,1)
  # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
ggsave('plots/GRN/sec11.interaction_association_GRN_lessPIP25.pdf', height = 5, width = 8) 

In conclusion, there is no interaction QTL with APOE identified.

Section 10: in silico functional studies in iPSC model¶

see notebook

AP Note: Here we can see increase in iPSC neurons

In [16]:
vars_p
No description has been provided for this image
In [17]:
apoe_p
No description has been provided for this image

Section 11: Functional annotations of selected loci¶

see notebook

TODO

  • Touch base with Ryan on the snATAC annotations
  • Run this by Pavel to see if there are additional comments on how we do this
In [17]:
func_p
No description has been provided for this image

Section 12: Candidate loci as trans-xQTL¶

see notebook

In [11]:
options(repr.plot.width=12, repr.plot.height=6)
if(!is.null(flat_var)){
    ggplot(flat_var, aes(x = gene_name, y = pip, size = pip)) +
      geom_point(alpha = 0.7) +
      labs(title = paste0("PIP values for trans fine mapped Genes in ", gene_name ," csets with AD"),
           x = "Gene Name",
           y = "PIP",
           size = "PIP",
           color = "CS Coverage 0.95 Min Corr") +
      theme_minimal(base_size = 14) +
      theme(panel.background = element_blank(),
            panel.grid.major = element_line(color = "grey80"),
            legend.position = NULL,
            axis.text.x = element_text(angle = 45, hjust = 1))  
      # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
    ggsave(paste0('plots/GRN/sec12.trans_fine_mapping_',gene_name,'.pdf'), height = 5, width = 8)
} else{
    message('There are no detectable trans signals for ', gene_name)
}

Creative thinking: generate hypothesis, search in literature, raise questions to discuss¶

You can now generate some preliminary hypotheses based on the above results. Next, you should search for evidence in the literature to support or refine these hypotheses and identify additional analyses needed to confirm them.