Case study: C1S xQTL and AD GWAS¶

This notebook documents the analysis of xQTL case study on a targeted gene, C1S.

  • Section 0: Sanity check
  • Section 1: Fine-mapping for xQTL and GWAS
  • Section 2: Multi-context colocalization with Bellenguez 2022
  • Section 3: Refinement of colocalized loci with other AD GWAS
  • Section 4: Assessment of multi-context xQTL effect sizes
  • Section 5: Multi-context causal TWAS (including conventional TWAS and MR)
  • Section 6: Context specific multi-gene fine-mapping
  • Section 7: Epigenomic QTL and their target regions
  • Section 8: Context focused validation in other xQTL data
  • Section 9: Non-linear effects of xQTL
  • Section 10: in silico functional studies in iPSC model
  • Section 11: Functional annotations of selected loci
  • Section 12: Candidate loci as trans-xQTL

Overview¶

FunGen-xQTL resource contains 67 xQTL contexts as well as 9 AD GWAS fine-mapped genome-wide. The overarching goal for case studies is to use these resource to raise questions and learn more about gene targets of interest.

Overall, a case study consists of the following aspects:

  • Check the basic information of the gene
  • Check the existing xQTL and integrative analysis results, roughly including
    • Summary table for univariate fine-mapping
    • Marginal association results
    • Multi-gene and multi-context fine-mapping
    • Multi-context colocalization with AD GWAS
    • TWAS, MR and causal TWAS
    • Integration with epigenetic QTL
    • Quantile QTL
    • Interaction QTL
    • Validation:
      • Additional xQTL data in FunGen-xQTL
      • Additional AD GWAS data-set already generated by FunGen-xQTL
    • In silico functional studies
      • Additional iPSC data-sets
    • Functional annotations of variants, particularly in relevant cellular contexts
  • Creative thinking: generate hypothesis, search in literature, raise questions to discuss

Computing environment setup¶

Interactive analysis will be done on AWS cloud powered by MemVerge. Please contact Gao Wang to setup accounts for you to start the analysis.

Please follow instructions on https://wanggroup.org/productivity_tips/mmcloud-interactive to configure your computing environment. Here are some additional packages you need to install after the initial configuration, in order to perform these analysis: in terminal with bash:

micromamba install -n r_libs r-pecotmr

in R:

install.packages('BEDMatrix')

How to Use This Notebook¶

  1. Before you start: Load functions from cb_plot.R and utilis.R, located at /data/interactive_analysis/rf2872/codes/. These functions and resources are packaged to streamline the analysis and ensure everything is as clean as possible. And also the codes for ColocBoost under path /data/colocalization/colocboost/R.
  2. Inside of this notebook, use sed -i or Ctrl+F to replace the gene C1S with the gene you want to analyze.
  3. For detailed analysis in some sections, please refer to the corresponding analysis notebooks as indicated. These companion notebooks are available under this same folder. The rest of the tasks can be completed with a few lines of code, as demonstrated in this notebook.
  4. Similarly for the companion notebooks you should also use the sed -i or Ctrl+F replacing gene_name (C1S in this case) with the gene you want to investigate.

While using this notebook, you may need to generate three intermediate files from Sections 1 and 2, which will be useful for downstream analysis:

  • a. Section 1:
    • Fine-mapping contexts that indicate shared signals with AD, C1S_finemapping_contexts.rds. This can be used as input for Section 8 the multi-cohort validation step
    • A subset of the xQTL-AD table, Fungen_xQTL_allQTL.overlapped.gwas.export.C1S.rds. This can be used as input for Section 12.
  • b. Section 2: A variant list showing colocalization in cohorts we analyzed with ColocBoost, C1S_colocboost_res.rds. this can be used as input for Sections 7, 9, 10, and 12.

Section 0: Sanity check¶

Check the basic information of the gene¶

  • To gain a preliminary understanding of this gene’s expression—specifically, whether it is cell-specific—can help us quickly determine if our results are consistent with expectations.

Useful websites:

  1. check gene function, (immune) cell type specificity, tissue specifity, protein location: https://www.proteinatlas.org
  2. check gene position and structure: https://www.ncbi.nlm.nih.gov/gene/
  3. other collective information: https://www.genecards.org

Check the existing results which are inputs to this analysis¶

In [1]:
source('../../codes/cb_plot.R')
source('../../codes/utilis.R')
for(file in list.files("/data/colocalization/colocboost/R", pattern = ".R", full.names = T)){
          source(file)
        }
gene_name = 'C1S'

dir.create(paste0('plots/', gene_name), recursive = T)

get basic target gene information

In [2]:
target_gene_info <- get_gene_info(gene_name = gene_name)
target_gene_info
$gene_info
A data.table: 1 x 14
region_id#chrstartendTSSLD_matrix_idLD_sumstats_idLD_sumstats_id_oldTADB_indexTADB_idgene_startgene_endsliding_windowsgene_name
<chr><chr><dbl><dbl><int><chr><chr><chr><chr><chr><int><int><chr><chr>
ENSG00000182326chr12428000084400006988258chr12:3219245-4882038,chr12:4882038-6411328,chr12:6411328-8138114,chr12:8138114-1002452112_3219245-4882038,12_4882038-6411328,12_6411328-8138114,12_8138114-1002452112_3219245-4882038,12_4882038_6411328,12_6411328_8138114,12_8138114_10024521TADB_947,TADB_948,TADB_949chr12_3269770_8218574,chr12_5556863_9438425,chr12_6812512_1252168669882597071032chr12:0-5851927,chr12:529890-8218574,chr12:970785-9438425,chr12:3269770-12521686,chr12:5556863-13471233,chr12:6812512-13829975C1S
$target_LD_ids
A matrix: 1 x 4 of type chr
chr12:3219245-4882038chr12:4882038-6411328chr12:6411328-8138114chr12:8138114-10024521
$target_sums_ids
A matrix: 1 x 4 of type chr
12_3219245-488203812_4882038-641132812_6411328-813811412_8138114-10024521
$gene_region
'chr12:4280000-8440000'
$target_TAD_ids
A matrix: 1 x 3 of type chr
chr12_3269770_8218574chr12_5556863_9438425chr12_6812512_12521686
In [3]:
gene_id = target_gene_info$gene_info$region_id
chrom = target_gene_info$gene_info$`#chr`

Take a quick look for the expression of target gene in ROSMAP bulk data, we don't want them to be too low

In [4]:
source('../../codes/utilis.R')
expression_in_rosmap_bulk(target_gene_info)
No description has been provided for this image

Section 1: Fine-mapping for xQTL and GWAS¶

see notebook

In [5]:
region_p

Bellenguez et al GWAS signals has many overlap with CS from other xQTL sources. This motivates us to look further. The figure above shows the ranges of CS to give us a loci level idea. Below, we show the variants in those CS, color-coding the variants that are shared between them in the same color. In particular, AD GWAS signals are also captured by a few xQTL data, although at this point we don't have formal statistical (colocalization) evidences for these observations yet.

In [ ]:
pip_p
In [ ]:
 

summary ADxQTL table¶

This summary table have been produced by conserving the loci overlap between all-source AD GWAS credible sets (CS95%mincorr,CS95% and CS70%mincorr ), and the xQTL credible sets (CS95%mincorr, CS70%mincorr, and SNPs with pip>0.2)

In [10]:
options(repr.matrix.max.cols=150, repr.matrix.max.rows=20)
gname=gene_name
gwas_anno<-fread('../../Landscape_analysis/res_gwas_qtl_anno_ADGWAS_xQTL_noMSBBsQTL_filtered.csv.gz')
gwas_anno[gene_name==gname]
A data.table: 1 x 57
gwas_locus_idgwas_sourcegwas_variantlocuscontext_idgene_idgene_namecontextn.variant.hitn.variant.overlappingpct.variant.overlaptop.snp.distgwas_region.hitgwas_region.hit.sizegwas_n.variant.hitgwas_posgwas_mafgwas_pipoverlapping_gwasoverlapping_gwas_locistudyregionvariant_idpipmethodbetahatsebetahatchrposmafannotated_susie_csqtl_typeSingleNucBulkCellTypeMetabassay_typecohortcohort_typecell_typeqtl_categorycontext_hijunction_regiontested_molecule_eventcrediblesetqtl_hitstart.hitend.hitregion.hitregion.hit.sizeoverlap.with.xqtlgwas_zn.gwasn.contextqtl_zqtl_pvaluen.hitgwas_region.hit.unionAD_context_score
<chr><chr><chr><chr><chr><chr><chr><int><int><dbl><int><chr><int><int><int><lgl><dbl><chr><chr><chr><chr><chr><dbl><chr><dbl><dbl><chr><int><dbl><int><chr><lgl><lgl><lgl><chr><chr><chr><chr><chr><chr><chr><chr><chr><lgl><int><int><chr><int><lgl><dbl><int><int><dbl><dbl><int><chr><dbl>
chr12_6411328_8138114_AD_Bellenguez_2022_cs95_1AD_Bellenguez_2022chr12:7055497:C:TENSG00000182326_DLPFC_Bennett_pQTL_cs95corr_1ENSG00000182326C1SDLPFC_Bennett_pQTL18140.77777788537chr12-7048747-707385825112217055497NA0.1856597DLPFC_Bennett_pQTLENSG00000182326chr12:7064034:T:A0.2851886top_loci-0.50490250.0524393chr1270640340.1358173NApQTLFALSEFALSEFALSEBulkBrainBennettBennett_BulkBrainbulk_brainothers QTLsbulk_brain_pQTLENSG00000182326cs95corr_1TRUE70630327074075chr12-7063032-707407511044TRUE4.90909116-9.62832301chr12-7048747-70738594.541893

here we can see than C1S is associated only with 1 AD locus : in AD bellenguez study, and 1 locuscontext_id, DLPFC_Bennett_pQTL.

We can check now how this AD locus is 'pure', i.e. non associated with others context than C1S

In [9]:
ad_loci<-gwas_anno[gene_name==gname]$gwas_locus_id
gwas_anno[gwas_locus_id%in%ad_loci]
A data.table: 6 x 57
gwas_locus_idgwas_sourcegwas_variantlocuscontext_idgene_idgene_namecontextn.variant.hitn.variant.overlappingpct.variant.overlaptop.snp.distgwas_region.hitgwas_region.hit.sizegwas_n.variant.hitgwas_posgwas_mafgwas_pipoverlapping_gwasoverlapping_gwas_locistudyregionvariant_idpipmethodbetahatsebetahatchrposmafannotated_susie_csqtl_typeSingleNucBulkCellTypeMetabassay_typecohortcohort_typecell_typeqtl_categorycontext_hijunction_regiontested_molecule_eventcrediblesetqtl_hitstart.hitend.hitregion.hitregion.hit.sizeoverlap.with.xqtlgwas_zn.gwasn.contextqtl_zqtl_pvaluen.hitgwas_region.hit.unionAD_context_score
<chr><chr><chr><chr><chr><chr><chr><int><int><dbl><int><chr><int><int><int><lgl><dbl><chr><chr><chr><chr><chr><dbl><chr><dbl><dbl><chr><int><dbl><int><chr><lgl><lgl><lgl><chr><chr><chr><chr><chr><chr><chr><chr><chr><lgl><int><int><chr><int><lgl><dbl><int><int><dbl><dbl><int><chr><dbl>
chr12_6411328_8138114_AD_Bellenguez_2022_cs95_1AD_Bellenguez_2022chr12:7055497:C:TENSG00000111254_12:4638196:4648745:clu_14532_ROSMAP_AC_sQTL_cs70corr_3 ENSG00000111254AKAP3 ROSMAP_AC_sQTL 22160.72727277535chr12-7048747-707385825112217055497NA0.1856597ROSMAP_AC_sQTL_intron_chr12:4638196:4648745:clu_14532_-:ENSG00000111254 ENSG00000111254chr12:7063032:G:A0.0587871top_loci-0.32128000.07467939chr1270630320.1188870NAsQTL(leafcutter)FALSEFALSEFALSEBulkBrain ROSMAP ROSMAP_BulkBrain bulk_brain sQTL bulk_brain_sQTL(leafcutter)12:4638196:4648745:clu_14532ENSG00000111254_12:4638196:4648745:clu_14532cs70corr_3TRUE70594667074644chr12-7059466-707464415179TRUE4.90909116-4.3021241.691683e-052chr12-7048747-70738594.541893
chr12_6411328_8138114_AD_Bellenguez_2022_cs95_1AD_Bellenguez_2022chr12:7055497:C:TENSG00000111254_12:4638196:4648816:clu_14532_ROSMAP_AC_sQTL_cs70corr_2 ENSG00000111254AKAP3 ROSMAP_AC_sQTL 19140.73684216860chr12-7048747-707385825112217055497NA0.1856597ROSMAP_AC_sQTL_intron_chr12:4638196:4648816:clu_14532_-:ENSG00000111254 ENSG00000111254chr12:7062357:C:T0.1703815top_loci-0.30511570.06109810chr1270623570.1340641NAsQTL(leafcutter)FALSEFALSEFALSEBulkBrain ROSMAP ROSMAP_BulkBrain bulk_brain sQTL bulk_brain_sQTL(leafcutter)12:4638196:4648816:clu_14532ENSG00000111254_12:4638196:4648816:clu_14532cs70corr_2TRUE70594667074644chr12-7059466-707464415179TRUE4.90909116-4.9938665.918236e-072chr12-7048747-70738594.541893
chr12_6411328_8138114_AD_Bellenguez_2022_cs95_1AD_Bellenguez_2022chr12:7055497:C:TENSG00000126746_MiGA_SVZ_eQTL_cs95corr_4 ENSG00000126746ZNF384MiGA_SVZ_eQTL 18150.83333333969chr12-7048747-707385825112217055497NA0.1856597MiGA_SVZ_eQTL ENSG00000126746chr12:7059466:A:G0.0555556top_loci-0.13272490.10974247chr1270594660.1808511NAeQTL FALSE TRUEFALSEBulkCellTypeMiGA MiGA_BulkCellTypebulk_microgliabulk-eQTL bulk_microglia_eQTL ENSG00000126746 cs95corr_4TRUE70594667074644chr12-7059466-707464415179TRUE4.90909116-1.2094222.265009e-012chr12-7048747-70738594.541893
chr12_6411328_8138114_AD_Bellenguez_2022_cs95_1AD_Bellenguez_2022chr12:7055497:C:TENSG00000126749_12:6970966:6974339:clu_15629_ROSMAP_PCC_sQTL_cs95corr_1ENSG00000126749EMG1 ROSMAP_PCC_sQTL 20160.80000003969chr12-7048747-707385825112217055497NA0.1856597ROSMAP_PCC_sQTL_intron_chr12:6970966:6974339:clu_15629_+:ENSG00000126749ENSG00000126749chr12:7059466:A:G0.3928376top_loci 0.48832810.06985865chr1270594660.1383220NAsQTL(leafcutter)FALSEFALSEFALSEBulkBrain ROSMAP ROSMAP_BulkBrain bulk_brain sQTL bulk_brain_sQTL(leafcutter)12:6970966:6974339:clu_15629ENSG00000126749_12:6970966:6974339:clu_15629cs95corr_1TRUE70594667074644chr12-7059466-707464415179TRUE4.90909116 6.9902312.744249e-122chr12-7048747-70738594.541893
chr12_6411328_8138114_AD_Bellenguez_2022_cs95_1AD_Bellenguez_2022chr12:7055497:C:TENSG00000139197_MiGA_THA_eQTL_cs95corr_1 ENSG00000139197PEX5 MiGA_THA_eQTL 6 40.6666667 0chr12-7048747-707385825112217055497NA0.1856597MiGA_THA_eQTL ENSG00000139197chr12:7055497:C:T0.2613628top_loci-0.55200350.15287697chr1270554970.2410714NAeQTL FALSE TRUEFALSEBulkCellTypeMiGA MiGA_BulkCellTypebulk_microgliabulk-eQTL bulk_microglia_eQTL ENSG00000139197 cs95corr_1TRUE70482327057810chr12-7048232-7057810 9579TRUE4.90909116-3.6107703.052895e-042chr12-7048747-70738594.541893
chr12_6411328_8138114_AD_Bellenguez_2022_cs95_1AD_Bellenguez_2022chr12:7055497:C:TENSG00000182326_DLPFC_Bennett_pQTL_cs95corr_1 ENSG00000182326C1S DLPFC_Bennett_pQTL18140.77777788537chr12-7048747-707385825112217055497NA0.1856597DLPFC_Bennett_pQTL ENSG00000182326chr12:7064034:T:A0.2851886top_loci-0.50490250.05243930chr1270640340.1358173NApQTL FALSEFALSEFALSEBulkBrain BennettBennett_BulkBrainbulk_brain others QTLsbulk_brain_pQTL ENSG00000182326 cs95corr_1TRUE70630327074075chr12-7063032-707407511044TRUE4.90909116-9.6283230.000000e+001chr12-7048747-70738594.541893

The locus is not pure, it is associated with other genes : ZNF384 and PEX5 in MIGA eQTl, and 3 others genes in sQTL data. Interestingly only PEX5 MiGA_THA_eQTL have the same top SNP (based on pip ) than AD. this data is however to take with caution (MIGA is a little dataset and known to be probably inflated for false positive), so look at the coloc results will help.

Section 2: Multi-context colocalization with Bellenguez 2022¶

This is done using ColocBoost. The most updated version of ColocBoost results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/ColocBoost/2024_9/

In [7]:
 cb_res <- readRDS(paste0("/data/analysis_result/ColocBoost/2024_9/",gene_id,"_res.rds") )
In [ ]:
#str(cb_res)
In [8]:
#save colocboost results
cb_res_table <- get_cb_summary(cb_res) 

saveRDS(cb_res_table, paste0(gene_name, "_colocboost_res.rds"))
In [9]:
cb <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2)
In [ ]:
pdf('plots/C1S/sec2.colocboost_res.pdf', width = 10, height = 5)
replayPlot(cb$p)
dev.off()
In [9]:
# colocalized variants
cb_res_table
$max_abs_z_variant =
In [10]:
# effect sign for each coloc sets
get_effect_sign_csets(cb_res)
In [11]:
# LD between coloc sets
get_between_purity_simple(cb_res, gene.name = gene_id, path = '/data/colocalization/QTL_data/eQTL/')

Here, different colors refer to different 95% Colocalization Sets (CoS, a metric developed in ColocBoost indicating that there is 95% probabilty that this CoS captures a colocalization event). We only included ROSMAP data for this particular ColocBoost analysis. In this case, we observe cell specific eQTL, bulk sQTL colocalization on ROSMAP data with AD as two separate CoS, suggesting two putative causal signals. We did not detect colocalization with pQTL of statistical significance although from Section 1 there are some overlap with pQTL signals in fine-mapping CS, the overlapped variants in CS have small PIP.

Section 3: Refinement of colocalized loci with other AD GWAS¶

Here we refine the colocalization with other AD GWAS to iron out any heterogeniety between studies (heterogeniety can come from many sources), to get additional candidate loci from these more heterogenous sources as candidates to study.

In [12]:
AD_cohorts <- c('AD_Jansen_2021', 'AD_Bellenguez_EADB_2022', 'AD_Bellenguez_EADI_2022',
             'AD_Kunkle_Stage1_2019', 'AD_Wightman_Excluding23andMe_2021',
             'AD_Wightman_ExcludingUKBand23andME_2021', 'AD_Wightman_Full_2021')
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_gwas = TRUE, gene_id = gene_id, cohorts = AD_cohorts)
No pvalue cutoff. Extract all variants names.
In [12]:
pdf('plots/C1S/sec3.colocboost_res_allad.pdf', width = 10, height = 5)
replayPlot(cb_ad$p)
dev.off()
png: 2

Section 4: Assessment of multi-context xQTL effect sizes¶

Option 1: ColocBoost + MASH¶

Use colocboost variants and check for mash posterior contrast to see if the effect size are shared or specific or even opposite. Advantage is that colocboost result is AD GWAS informed; issue is that marginal posterior effects is not always the joint

In [13]:
mash_p <- mash_plot(gene_name = 'C1S')
for (plot in mash_p) {
    print(plot)
}

Option 2: mvSuSiE¶

Use mvSuSiE multicontext fine-mapping results --- the bubble plot to check posterior effects. Issue is that we don't have this results yet, and this is limited to one cohort at a time, without information from AD.

We should go for option 1 by default and if we want to make claim about opposite effect size we double-check with mvSuSiE multicontext analysis.

Section 5: Multi-context causal TWAS (including conventional TWAS and MR)¶

The most updated version of cTWAS analysis are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/cTWAS/

TWAS results¶

We report TWAS from all contexts and methods from the pipeline. Here we will filter it down to the best performing methods and only keep contexts that are significant.

In [14]:
plot_TWAS_res(gene_id = gene_id, gene_name = gene_name)
No description has been provided for this image

MR results¶

This is only available for genes that are deemed significant in TWAS and have summary statistics available for effect size and standard errors in GWAS, in addition to z-scores --- current version does not support z-scores although we will soon also support z-scores in MR using MAF from reference panel.

cTWAS results¶

To be updated soon.

Section 6: Context specific multi-gene fine-mapping¶

A quick analysis: using the xQTL-AD summary table (flatten table)¶

We extract from xQTL-AD summary table the variants to get other genes also have CS with the variants shared by target gene and AD.

In [15]:
multigene_flat <- get_multigene_multicontext_flatten('Fungen_xQTL_allQTL.overlapped.gwas.export.C1S.rds', sQTL = 'no_MSBB')
multigene_flat
A data.frame: 73 x 6
gene_id#chrstartendgene_namecontexts
<chr><chr><int><int><chr><chr>
ENSG00000008323chr1263104356310436PLEKHG6 MiGA_THA_eQTL
ENSG00000010219chr1245622034562204DYRK4 ROSMAP_PCC_sQTL
ENSG00000010278chr1261997146199715CD9 ROSMAP_DLPFC_sQTL
ENSG00000010292chr1264933556493356NCAPD2 ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000010610chr1267868576786858CD4 ROSMAP_AC_sQTL,STARNET_eQTL
ENSG00000010626chr1268735686873569LRRC23 MiGA_SVZ_eQTL
ENSG00000011105chr1230773543077355TSPAN9 ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000047617chr1259462315946232ANO2 MiGA_GFM_eQTL,Exc_Kellis_eQTL,Exc_mega_eQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000065970chr1280327158032716FOXJ2 STARNET_eQTL
ENSG00000067182chr1263421136342114TNFRSF1AROSMAP_AC_sQTL
ENSG00000069493chr1296649689664969CLEC2D MiGA_SVZ_eQTL
ENSG00000078237chr1243077624307763TIGAR MiGA_GTS_eQTL,MiGA_SVZ_eQTL
ENSG00000089692chr1267725186772519LAG3 MiGA_THA_eQTL
ENSG00000089693chr1267674746767475MLF2 ROSMAP_DLPFC_sQTL
ENSG00000089818chr1280769388076939NECAP1 STARNET_eQTL
ENSG00000110799chr1261247696124770VWF DLPFC_DeJager_eQTL,Exc_Kellis_eQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000110811chr1268284066828407P3H3 DLPFC_DeJager_eQTL,AC_DeJager_eQTL,Exc_mega_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000110852chr1298693539869354CLEC2B BM_22_MSBB_eQTL
ENSG00000111247chr1245387974538798RAD51AP1MiGA_SVZ_eQTL
ENSG00000111254chr1246490504649051AKAP3 ROSMAP_AC_sQTL
ENSG00000111262chr1249099044909905KCNA1 AC_DeJager_eQTL
ENSG00000111319chr1263777296377730SCNN1A monocyte_ROSMAP_eQTL
ENSG00000111321chr1263750446375045LTBR ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000111639chr1264938406493841MRPL51 ROSMAP_DLPFC_sQTL
ENSG00000111640chr1265345166534517GAPDH MSBB_BM36_pQTL,DLPFC_DeJager_eQTL,Exc_Kellis_eQTL,Ast_mega_eQTL,OPC_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000111641chr1265686906568691NOP2 MiGA_SVZ_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000111642chr1266145236614524CHD4 MiGA_THA_eQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000111653chr1266631416663142ING4 MSBB_BM36_pQTL
ENSG00000111664chr1268402106840211GNB3 MiGA_THA_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,Exc_Kellis_eQTL,Inh_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000111665chr1268512916851292CDCA3 MiGA_GFM_eQTL,MiGA_GTS_eQTL,MiGA_THA_eQTL
..................
ENSG00000130035chr12 4720399 4720400GALNT8 Exc_DeJager_eQTL
ENSG00000130037chr12 5043878 5043879KCNA5 DLPFC_DeJager_eQTL
ENSG00000130038chr12 3764818 3764819CRACR2A ROSMAP_AC_sQTL
ENSG00000139178chr12 7109237 7109238C1RL ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000139182chr12 7129697 7129698CLSTN3 MSBB_BM36_pQTL,Exc_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000139190chr12 6470676 6470677VAMP1 monocyte_ROSMAP_eQTL
ENSG00000139192chr12 6451689 6451690TAPBPL Knight_eQTL,MiGA_THA_eQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000139197chr12 7188684 7188685PEX5 Knight_eQTL,BM_22_MSBB_eQTL,OPC_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Exc_Kellis_eQTL,OPC_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000139200chr12 6700814 6700815PIANP MiGA_THA_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000151067chr12 1970785 1970786CACNA1C monocyte_ROSMAP_eQTL,ROSMAP_AC_sQTL
ENSG00000151079chr12 4809175 4809176KCNA6 MiGA_GTS_eQTL,MSBB_BM36_pQTL
ENSG00000159335chr12 6765515 6765516PTMS ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000166523chr12 8540904 8540905CLEC4E ROSMAP_DLPFC_sQTL
ENSG00000166535chr12 8822620 8822621A2ML1 ROSMAP_DLPFC_sQTL
ENSG00000171847chr12 8227617 8227618FAM90A1 ROSMAP_PCC_sQTL
ENSG00000171860chr12 8066358 8066359C3AR1 MiGA_SVZ_eQTL,STARNET_eQTL
ENSG00000173262chr12 7891147 7891148SLC2A14 Knight_eQTL,MiGA_GFM_eQTL,MiGA_THA_eQTL,STARNET_eQTL
ENSG00000173391chr121017213710172138OLR1 Mic_13_Kellis_eQTL
ENSG00000175899chr12 9116228 9116229A2M ROSMAP_DLPFC_sQTL
ENSG00000177575chr12 7503892 7503893CD163 MiGA_SVZ_eQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000177675chr12 7479896 7479897CD163L1 Mic_Kellis_eQTL,ROSMAP_PCC_sQTL
ENSG00000180574chr121050560110505602EIF2S3B MiGA_THA_eQTL
ENSG00000184344chr12 7695774 7695775GDF3 MiGA_GFM_eQTL
ENSG00000212124chr121102261911022620TAS2R19 BM_22_MSBB_eQTL
ENSG00000215021chr12 6970779 6970780PHB2 ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL
ENSG00000250510chr12 6821703 6821704GPR162 MiGA_SVZ_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000255639chr12 4649100 4649101AC005833.1MiGA_THA_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000256537chr121117119311171194SMIM10L1 ROSMAP_PCC_sQTL
ENSG00000285238chr12 6607366 6607367AC006064.6ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000285901chr12 4273761 4273762AC008012.1ROSMAP_PCC_sQTL

Other genes implicated are PROC and HS6ST1 in MiGA cohort which may share causal eQTL with C1S. Further look into the data-set --- using these genes as targets and repeating what we did above for C1S --- might be needed to establish a more certain conclusion.

Alternatively, we may be able to apply a multi-gene statistical fine-mapping test on C1S region to find these genes, as you will see in the section below.

A statistically solid approach: mvSuSiE multi-gene analysis¶

This multi-gene fine-mapping analysis was done for each xQTL context separately. We will need to check the results for all contexts where this gene has an xQTL, in order to identify if there are other genes also sharing the same xQTL with this target gene. We included other genes in the same TAD window along with this gene, and extended it into a sliding window approach to include multiple TADs just in case. You need to check the sliding windows belongs to that gene (TSS) on analysis repo.

In [16]:
sliding_windows <- target_gene_info$gene_info$sliding_windows %>% strsplit(., ',') %>% unlist %>% as.character
sliding_windows
  1. 'chr12:0-5851927'
  2. 'chr12:529890-8218574'
  3. 'chr12:970785-9438425'
  4. 'chr12:3269770-12521686'
  5. 'chr12:5556863-13471233'
  6. 'chr12:6812512-13829975'

The most updated version of mvSuSiE multi-gene results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/mvsusie_multi_gene_test/multi_gene/ Currently it is still WIP. You can revisit this later when we prompt you to. Here is an example for C1S:

In [17]:
mnm_gene <- list()
for (window in sliding_windows) {
    mnm_gene_tmp <- NULL
    mnm_gene_tmp <- tryCatch(
        readRDS(paste0('/data/analysis_result/mvsusie_multi_gene/multi_gene/ROSMAP_multi_gene.', window, '.mnm.rds')),
        error = function(e) NULL
    )
    
    if (!is.null(mnm_gene_tmp)) {
        if(target_gene_info$gene_info$region_id %in% mnm_gene_tmp$mvsusie_fitted$condition_names){
        tryCatch({
            p <- mvsusieR::mvsusie_plot(mnm_gene_tmp$mvsusie_fitted, sentinel_only = F, add_cs = T)
            print(p)  # This ensures the plot is displayed in JupyterLab
        }, error = function(e) NULL)
        } else {
            message('There is mnm result for sliding window ',window,', but not include target gene ', gene_name, ' in CS')
        }
        mnm_gene <- append(mnm_gene, list(mnm_gene_tmp))
    }
}

In this case, there is no statistical evidence for C1S sharing any of its xQTL with other genes in ROSMAP Microglia data we looked into; although we have not analyzed MiGA this way yet (which showed some potential signals from the quick analysis above).

Section 7: Epigenomic QTL and their target regions¶

fsusie, see notebook

Generate a crude plot to determined whether the story is interesting¶

This is a crude version of the case study plot which shows the fsusie Effect (colored line), the gene body (black arrow), the epi-QTL (large dots with the same color as the effects) and ADGWAS cs position (small red dots).

Only produce the refine plot if we can see either:

  1. There are sharing snp between epi-QTL and AD CS
  2. There are the AD CS located within one of the effect range
  3. The crude plot suggest something interesting
In [9]:
options(repr.plot.width = 40, repr.plot.height = 40)

 ggplot() + theme_bw() + facet_grid(cs_coverage_0.95 + study + region ~ ., labeller = labeller(.rows = function(x) gsub("([_:,-])", "\n", x)), scale = "free_y") +

      theme(text = element_text(size = 20), strip.text.y = element_text(size = 25, angle = 0.5)) +
     # xlim(view_win) +
      ylab("Estimated effect") +
   #   geom_line(data = haQTL_df %>% mutate(study = "haQTL effect") %>% filter(CS == 5),
    #            aes_string(y = "fun_plot", x = "x", col = "CS"), size = 4, col = "#00AEEF") +
  geom_line(data = effect_of_interest ,
                aes_string(y = "fun_plot", x = "x", col = "cs_coverage_0.95"), size = 2) +  
    geom_point(data = effect_of_interest ,
                aes_string(y = "pip", x = "pos", col = "cs_coverage_0.95"), size = 10) +
    theme(text = element_text(size = 40), strip.text.y = element_text(size = 15, angle = 0.5), 
            axis.text.x = element_text(size = 40), axis.title.x = element_text(size = 40)) +
      xlab("Position") +
      ylab("Estimated\neffect") +
      geom_segment(arrow = arrow(length = unit(1, "cm")), aes(x = gene_start, xend = gene_end, y = 1, yend = 1), size = 6,
                  data = tar_gene_info$gene_info, alpha = 0.3) +
      geom_text(aes(x = (gene_start + gene_end) / 2, y = 1 , label = gene_name), size = 10, 
              data = tar_gene_info$gene_info)+
        geom_point(aes(x = pos, y = pip  ) ,color = "red", data = flatten_table%>%filter( str_detect(study,"AD_") , cs_coverage_0.95 != 0  )%>%mutate(AD_study = study%>%str_replace_all("_","\n" ))%>%select(-study,-region,-cs_coverage_0.95) ) 
No description has been provided for this image

Section 8: Context focused validation in other xQTL data¶

see notebook

add fake version for now, so you don't have to refer to above link

Background: our "discovery set" is ROSMAP but we have additional "validation" sets including:

  • STARNET
  • MiGA
  • KnightADRC
  • MSBB
  • metaBrain
  • UKB pQTL

TODO:

  • Get from Carlos WashU CSF based resource (pQTL and metabolomic QTL)

This section shows verification of findings from these data-sets. In principle we should check them through sections 1-6 more formally. In practice we will start with colocalization via colocboost --- since our study is genetics (variant and loci level) focused. We can selectively follow them up for potentially intereting validations. We therefore only demonstrate validation via colocboost as a starting point.

In [18]:
finempping_contexts <- readRDS(paste0(gene_name, '_finemapping_contexts.rds')) # from sec1
In [19]:
finempping_contexts <- get_norosmap_contexts(finempping_contexts)
In [20]:
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_QTL = TRUE, cohorts = finempping_contexts, gene_id = gene_id)
No pvalue cutoff. Extract all variants names.

In conclusion from what's shown above, when we check the association signals in STARNET and MiGA on colocalization established from ROSMAP and AD GWAS, we see additional evidences.

Section 9: Non-linear effects of xQTL¶

see notebook

APOE interaction¶

In [21]:
options(repr.plot.width=6, repr.plot.height=6)

ggplot(C1S_int_res, aes(x = variant_id, y = qvalue_interaction)) +
  geom_point(alpha = 0.7, size = 6) +
  labs(title = "qvalue for C1S csets in interaction association nalysis",
       x = "Gene Name",
       y = "qvalue_interaction",
       size = "qvalue_interaction") +
  theme_minimal(base_size = 14) +
  theme(panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey80"),
        legend.position = NULL,
        axis.text.x = element_text(angle = 45, hjust = 1))  + ylim(0,1)
  # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
ggsave('plots/C1S/sec11.interaction_association_C1S_lessPIP25.pdf', height = 5, width = 8) 

In conclusion, there is no interaction QTL with APOE identified.

Section 10: in silico functional studies in iPSC model¶

see notebook

In [22]:
vars_p
In [23]:
apoe_p

Section 11: Functional annotations of selected loci¶

see notebook

TODO

  • Touch base with Ryan on the snATAC annotations
  • Run this by Pavel to see if there are additional comments on how we do this
In [17]:
func_p
No description has been provided for this image

Section 12: Candidate loci as trans-xQTL¶

see notebook

In [12]:
options(repr.plot.width=12, repr.plot.height=6)
if(!is.null(flat_var)){
   p =  ggplot(flat_var, aes(x = gene_name, y = pip, size = pip)) +
      geom_point(alpha = 0.7) +
      labs(title = paste0("PIP values for trans fine mapped Genes in ", gene_name ," csets with AD"),
           x = "Gene Name",
           y = "PIP",
           size = "PIP",
           color = "CS Coverage 0.95 Min Corr") +
      theme_minimal(base_size = 14) +
      theme(panel.background = element_blank(),
            panel.grid.major = element_line(color = "grey80"),
            legend.position = NULL,
            axis.text.x = element_text(angle = 45, hjust = 1))  
      # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
    ggsave(paste0('plots/CR1/sec12.trans_fine_mapping_',gene_name,'.pdf'),p, height = 5, width = 8)
    p
    } else{
    message('There are no detectable trans signals for ', gene_name)
}

Creative thinking: generate hypothesis, search in literature, raise questions to discuss¶

You can now generate some preliminary hypotheses based on the above results. Next, you should search for evidence in the literature to support or refine these hypotheses and identify additional analyses needed to confirm them.