Case study: MS4A4A xQTL and AD GWAS¶

This notebook documents the analysis of xQTL case study on a targeted gene, MS4A4A.

  • Section 0: Sanity check
  • Section 1: Fine-mapping for xQTL and GWAS
  • Section 2: Multi-context colocalization with Bellenguez 2022
  • Section 3: Refinement of colocalized loci with other AD GWAS
  • Section 4: Assessment of multi-context xQTL effect sizes
  • Section 5: Multi-context causal TWAS (including conventional TWAS and MR)
  • Section 6: Context specific multi-gene fine-mapping
  • Section 7: Epigenomic QTL and their target regions
  • Section 8: Context focused validation in other xQTL data
  • Section 9: Non-linear effects of xQTL
  • Section 10: in silico functional studies in iPSC model
  • Section 11: Functional annotations of selected loci
  • Section 12: Candidate loci as trans-xQTL

Overview¶

FunGen-xQTL resource contains 67 xQTL contexts as well as 9 AD GWAS fine-mapped genome-wide. The overarching goal for case studies is to use these resource to raise questions and learn more about gene targets of interest.

Overall, a case study consists of the following aspects:

  • Check the basic information of the gene
  • Check the existing xQTL and integrative analysis results, roughly including
    • Summary table for univariate fine-mapping
    • Marginal association results
    • Multi-gene and multi-context fine-mapping
    • Multi-context colocalization with AD GWAS
    • TWAS, MR and causal TWAS
    • Integration with epigenetic QTL
    • Quantile QTL
    • Interaction QTL
    • Validation:
      • Additional xQTL data in FunGen-xQTL
      • Additional AD GWAS data-set already generated by FunGen-xQTL
    • In silico functional studies
      • Additional iPSC data-sets
    • Functional annotations of variants, particularly in relevant cellular contexts
  • Creative thinking: generate hypothesis, search in literature, raise questions to discuss

Computing environment setup¶

In [109]:
region_p
No description has been provided for this image
In [121]:
pip_p
No description has been provided for this image

Section 0: Sanity check¶

Check the basic information of the gene¶

  • To gain a preliminary understanding of this gene’s expression—specifically, whether it is cell-specific—can help us quickly determine if our results are consistent with expectations.

Useful websites:

  1. check gene function, (immune) cell type specificity, tissue specifity, protein location: https://www.proteinatlas.org
  2. check gene position and structure: https://www.ncbi.nlm.nih.gov/gene/
  3. other collective information: https://www.genecards.org

Check the existing results which are inputs to this analysis¶

In [1]:
source('../../codes/cb_plot.R')
source('../../codes/utilis.R')
for(file in list.files("/data/colocalization/colocboost/R", pattern = ".R", full.names = T)){
          source(file)
        }
gene_name = 'MS4A4A'

dir.create(paste0('plots/', gene_name), recursive = T)

get basic target gene information

In [2]:
target_gene_info <- get_gene_info(gene_name = gene_name)
target_gene_info
$gene_info
A data.table: 1 x 14
region_id#chrstartendTSSLD_matrix_idLD_sumstats_idLD_sumstats_id_oldTADB_indexTADB_idgene_startgene_endsliding_windowsgene_name
<chr><chr><dbl><dbl><int><chr><chr><chr><chr><chr><int><int><chr><chr>
ENSG00000110079chr11572800006144000060185656chr11:56858541-60339997,chr11:60339997-6381833211_56858541-60339997,11_60339997-6381833211_56858541-60339997,11_60339997_63818332TADB_903chr11_56299638_624858226018565760318080chr11:44804071-58324952,chr11:46980557-62485822,chr11:50840000-65321811,chr11:56299638-66671716,chr11:60203514-68955802MS4A4A
$target_LD_ids
A matrix: 1 x 2 of type chr
chr11:56858541-60339997chr11:60339997-63818332
$target_sums_ids
A matrix: 1 x 2 of type chr
11_56858541-6033999711_60339997-63818332
$gene_region
'chr11:57280000-61440000'
$target_TAD_ids
A matrix: 1 x 1 of type chr
chr11_56299638_62485822
In [3]:
gene_id = target_gene_info$gene_info$region_id
chrom = target_gene_info$gene_info$`#chr`

Take a quick look for the expression of target gene in ROSMAP bulk data, we don't want them to be too low

In [4]:
source('../../codes/utilis.R')
expression_in_rosmap_bulk(target_gene_info)
No description has been provided for this image

Section 1: Fine-mapping for xQTL and GWAS¶

see notebook

In [5]:
region_p

Bellenguez et al GWAS signals has many overlap with CS from other xQTL sources. This motivates us to look further. The figure above shows the ranges of CS to give us a loci level idea. Below, we show the variants in those CS, color-coding the variants that are shared between them in the same color. In particular, AD GWAS signals are also captured by a few xQTL data, although at this point we don't have formal statistical (colocalization) evidences for these observations yet.

In [6]:
pip_p

Section 2: Multi-context colocalization with Bellenguez 2022¶

This is done using ColocBoost. The most updated version of ColocBoost results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/ColocBoost/2024_9/

In [7]:
cb_res <- readRDS(paste0("/data/analysis_result/ColocBoost/2024_9/",gene_id,"_res.rds") )
In [8]:
#save colocboost results
cb_res_table <- get_cb_summary(cb_res) 

saveRDS(cb_res_table, paste0(gene_name, "_colocboost_res.rds"))
In [9]:
cb <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2)
No description has been provided for this image
In [ ]:
options(repr.plot.width=6, repr.plot.height=6)

ggplot(MS4A4A_int_res, aes(x = variant_id, y = qvalue_interaction)) +
  geom_point(alpha = 0.7, size = 6) +
  labs(title = "qvalue for MS4A4A csets in interaction association nalysis",
       x = "Gene Name",
       y = "qvalue_interaction",
       size = "qvalue_interaction") +
  theme_minimal(base_size = 14) +
  theme(panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey80"),
        legend.position = NULL,
        axis.text.x = element_text(angle = 45, hjust = 1))  + ylim(0,1)
  # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
ggsave('plots/MS4A4A/sec11.interaction_association_MS4A4A_lessPIP25.pdf', height = 5, width = 8) 
In [11]:
# colocalized variants
cb_res_table
A data.frame: 2 x 8
colocalized phenotypespurity# variantshighest VCPcolocalized indexcolocalized variantsmax_abs_z_variantcset_id
<chr><dbl><dbl><dbl><chr><chr><chr><chr>
Mic; DLPFC; AC; PCC 0.91588971130.137378412134; 12140; 12150; 12349; 12355; 12330; 12185; 12343; 12348; 12345; 12346; 12350; 12191; 12233; 12351; 12248; 12136; 12167; 12317; 12327; 12331; 12205; 12133; 12181; 12347; 12332; 12168; 12325; 12326; 12144; 12320; 12339; 12314; 12315; 12319; 12321; 12322; 12324; 12328; 12142; 12145; 12146; 12198; 12208; 12212; 12213; 12214; 12215; 12304; 12303; 12194; 12333; 12312; 12244; 12290; 12180; 12183; 12184; 12188; 12190; 12196; 12197; 12203; 12217; 12218; 12229; 12234; 12238; 12239; 12240; 12241; 12243; 12247; 12249; 12250; 12252; 12253; 12254; 12255; 12260; 12263; 12265; 12272; 12273; 12276; 12278; 12279; 12308; 12158; 12259; 12281; 12311; 12313; 12335; 12192; 12174; 12297; 12209; 12210; 12251; 12336; 12201; 12202; 12340; 12341; 12271; 12275; 12169; 12170; 12163; 12113; 12117; 12121chr11:60310703:A:G; chr11:60312082:C:T; chr11:60315024:G:A; chr11:60329824:CTTTTT:CT; chr11:60335317:G:A; chr11:60336353:G:A; chr11:60333108:C:T; chr11:60320770:CTT:CTTT; chr11:60311226:G:A; chr11:60335158:A:G; chr11:60325117:T:C; chr11:60334683:T:C; chr11:60334911:A:G; chr11:60334923:A:G; chr11:60335623:A:C; chr11:60326531:G:C; chr11:60318420:T:C; chr11:60335034:G:T; chr11:60335639:T:C; chr11:60321282:G:A; chr11:60310299:C:A; chr11:60318551:C:A; chr11:60331879:C:T; chr11:60332890:T:G; chr11:60333134:G:A; chr11:60333980:T:C; chr11:60312537:G:A; chr11:60313260:G:A; chr11:60322896:G:A; chr11:60313154:C:T; chr11:60320598:G:C; chr11:60332878:C:T; chr11:60332886:C:T; chr11:60332204:T:C; chr11:60333146:A:G; chr11:60331633:T:C; chr11:60331739:T:C; chr11:60332140:T:G; chr11:60332337:T:C; chr11:60332421:C:T; chr11:60332606:T:A; chr11:60332956:G:A; chr11:60316992:G:A; chr11:60322169:T:C; chr11:60323059:A:G; chr11:60323298:A:G; chr11:60323303:G:A; chr11:60323317:T:C; chr11:60323352:AC:A; chr11:60313801:C:T; chr11:60323212:A:G; chr11:60323223:TA:T; chr11:60329979:G:C; chr11:60319803:G:A; chr11:60327845:T:C; chr11:60322673:G:C; chr11:60322674:G:A; chr11:60329215:T:C; chr11:60304676:C:T; chr11:60305502:G:A; chr11:60306840:A:T; chr11:60320583:G:A; chr11:60320700:C:T; chr11:60320748:A:G; chr11:60320941:T:C; chr11:60321251:T:C; chr11:60321826:G:T; chr11:60321970:T:TG; chr11:60322838:T:G; chr11:60323565:A:G; chr11:60323634:C:A; chr11:60324907:T:C; chr11:60325177:T:A; chr11:60326006:G:C; chr11:60326054:C:T; chr11:60326111:C:G; chr11:60326159:C:T; chr11:60326317:T:C; chr11:60326496:A:G; chr11:60326820:C:A; chr11:60327057:G:A; chr11:60327346:C:T; chr11:60327369:C:T; chr11:60327385:C:T; chr11:60327717:A:G; chr11:60327923:G:A; chr11:60328122:T:C; chr11:60328465:T:C; chr11:60329041:C:T; chr11:60329101:A:G; chr11:60329444:G:A; chr11:60329567:C:T; chr11:60329621:A:T; chr11:60330870:T:C; chr11:60331241:TATA:T; chr11:60326396:T:C; chr11:60333243:T:TA; chr11:60321702:C:G; chr11:60333929:C:T; chr11:60331178:C:T; chr11:60331622:A:G; chr11:60333563:T:A; chr11:60328767:AAG:A; chr11:60321314:G:A; chr11:60327211:T:G; chr11:60318804:A:G; chr11:60318827:C:T; chr11:60317607:T:C; chr11:60330478:A:AC; chr11:60330467:G:A; chr11:60330084:T:C; chr11:60334120:C:T; chr11:60334245:A:Gchr11:60335317:G:Acoloc_sets:Y1_Y2_Y3_Y4:CS1
AC_unproductive; PCC_unproductive0.9505003 50.376877712026; 12038; 12030; 12016; 12066 chr11:60291670:C:T; chr11:60282652:A:G; chr11:60287475:G:A; chr11:60278834:G:A; chr11:60284189:T:C chr11:60282652:A:Gcoloc_sets:Y7_Y11:MergeCS1
In [12]:
# effect sign for each coloc sets
get_effect_sign_csets(cb_res)
$`coloc_sets:Y1_Y2_Y3_Y4:CS1`
A data.frame: 113 x 5
variantsMicDLPFCACPCC
<chr><dbl><dbl><dbl><dbl>
chr11:60310703:A:Gchr11:60310703:A:G 8.962573 9.036035 9.808210 6.082303
chr11:60312082:C:Tchr11:60312082:C:T 8.829216 8.976418 9.803572 6.082303
chr11:60315024:G:Achr11:60315024:G:A 8.829216 8.976418 9.803572 6.082303
chr11:60335317:G:Achr11:60335317:G:A -8.718433-8.935774-9.846141-6.111421
chr11:60336353:G:Achr11:60336353:G:A -8.718433-8.935774-9.846141-6.111421
chr11:60333108:C:Tchr11:60333108:C:T 8.688651 8.980412 9.800958 6.127455
chr11:60320770:CTT:CTTTchr11:60320770:CTT:CTTT 8.713986 9.004763 9.688047 6.082303
chr11:60334683:T:Cchr11:60334683:T:C 8.640892 8.954343 9.740190 6.082303
chr11:60335158:A:Gchr11:60335158:A:G 8.640892 8.949246 9.739917 6.082303
chr11:60334911:A:Gchr11:60334911:A:G 8.640892 8.949246 9.732768 6.082303
chr11:60334923:A:Gchr11:60334923:A:G 8.640892 8.949246 9.732768 6.082303
chr11:60335623:A:Cchr11:60335623:A:C 8.640892 8.949246 9.732768 6.082303
chr11:60321282:G:Achr11:60321282:G:A 8.640892 9.017875 9.642670 6.082303
chr11:60325117:T:Cchr11:60325117:T:C 8.674124 8.918504 9.697848 6.082303
chr11:60335639:T:Cchr11:60335639:T:C 8.600741 8.989777 9.697254 6.052380
chr11:60326531:G:Cchr11:60326531:G:C 8.613411 8.894380 9.740190 6.013676
chr11:60311226:G:Achr11:60311226:G:A 8.832401 9.146312 9.587774 5.902490
chr11:60318420:T:Cchr11:60318420:T:C 8.557083 8.884473 9.757520 5.981552
chr11:60331879:C:Tchr11:60331879:C:T 8.702063 9.094312 9.624858 5.991132
chr11:60332890:T:Gchr11:60332890:T:G 8.702063 9.094312 9.624858 5.991132
chr11:60333134:G:Achr11:60333134:G:A 8.702063 9.094312 9.624858 5.991132
chr11:60322896:G:Achr11:60322896:G:A 8.546270 8.867759 9.653841 6.082303
chr11:60310299:C:Achr11:60310299:C:A 8.735205 9.163102 9.554245 5.902490
chr11:60320598:G:Cchr11:60320598:G:C 8.680487 9.171846 9.593325 5.902490
chr11:60335034:G:Tchr11:60335034:G:T 8.640892 8.819149 9.678936 5.968479
chr11:60333146:A:Gchr11:60333146:A:G 8.616902 9.161712 9.621145 5.944411
chr11:60318551:C:Achr11:60318551:C:A 8.548292 8.796684 9.695236 6.052380
chr11:60332878:C:Tchr11:60332878:C:T 8.625421 9.094312 9.624858 5.991132
chr11:60332886:C:Tchr11:60332886:C:T 8.625421 9.094312 9.624858 5.991132
chr11:60313154:C:Tchr11:60313154:C:T 8.700702 9.158087 9.570702 5.902490
..................
chr11:60329101:A:Gchr11:60329101:A:G 8.5704089.0938559.5612665.902490
chr11:60329444:G:Achr11:60329444:G:A 8.5704089.0938559.5612665.902490
chr11:60329567:C:Tchr11:60329567:C:T 8.5704089.0938559.5612665.902490
chr11:60329621:A:Tchr11:60329621:A:T 8.5704089.0938559.5612665.902490
chr11:60330870:T:Cchr11:60330870:T:C 8.5704089.0938559.5612665.902490
chr11:60316992:G:Achr11:60316992:G:A 8.4339208.8743819.7099645.866710
chr11:60327845:T:Cchr11:60327845:T:C 8.5468229.0463039.6232135.902490
chr11:60329824:CTTTTT:CTchr11:60329824:CTTTTT:CT8.6968368.4436589.7656285.906323
chr11:60331178:C:Tchr11:60331178:C:T 8.5704089.0899599.5563945.902490
chr11:60331622:A:Gchr11:60331622:A:G 8.5704089.0899599.5563945.902490
chr11:60333563:T:Achr11:60333563:T:A 8.5704089.0899599.5563945.902490
chr11:60321314:G:Achr11:60321314:G:A 8.5704089.2014789.4318385.902490
chr11:60319803:G:Achr11:60319803:G:A 8.6035459.0307829.5802405.882127
chr11:60330084:T:Cchr11:60330084:T:C 8.3621308.9906079.4902916.118703
chr11:60323212:A:Gchr11:60323212:A:G 8.6325139.0111169.5612665.882127
chr11:60323223:TA:Tchr11:60323223:TA:T 8.6325139.0111169.5612665.882127
chr11:60327211:T:Gchr11:60327211:T:G 8.5704089.1514959.4629495.902490
chr11:60333929:C:Tchr11:60333929:C:T 8.5704089.1364099.5358015.824096
chr11:60322673:G:Cchr11:60322673:G:C 8.6126679.0368709.5309395.856208
chr11:60322674:G:Achr11:60322674:G:A 8.6126679.0368709.5309395.856208
chr11:60334120:C:Tchr11:60334120:C:T 8.4942679.0899599.5563945.902490
chr11:60334245:A:Gchr11:60334245:A:G 8.4942679.0899599.5563945.902490
chr11:60328767:AAG:Achr11:60328767:AAG:A 8.5704089.0916309.5358015.824096
chr11:60329215:T:Cchr11:60329215:T:C 8.5167698.9813479.5962735.919893
chr11:60318804:A:Gchr11:60318804:A:G 8.5174039.0551449.5802405.836300
chr11:60318827:C:Tchr11:60318827:C:T 8.5174039.0551449.5802405.836300
chr11:60317607:T:Cchr11:60317607:T:C 8.5470019.0356699.5612665.836300
chr11:60304676:C:Tchr11:60304676:C:T 8.7623848.9115659.2256105.591325
chr11:60305502:G:Achr11:60305502:G:A 8.7623848.9115659.2256105.591325
chr11:60306840:A:Tchr11:60306840:A:T 8.7623848.9115659.2256105.591325
$`coloc_sets:Y7_Y11:MergeCS1`
A data.frame: 5 x 3
variantsAC_unproductivePCC_unproductive
<chr><dbl><dbl>
chr11:60282652:A:Gchr11:60282652:A:G 6.048789 7.023734
chr11:60287475:G:Achr11:60287475:G:A 6.048789 7.023734
chr11:60284189:T:Cchr11:60284189:T:C-5.923336-7.020131
chr11:60278834:G:Achr11:60278834:G:A 6.007581 6.797674
chr11:60291670:C:Tchr11:60291670:C:T 6.347555 6.283213
In [13]:
# LD between coloc sets
get_between_purity_simple(cb_res, gene.name = gene_id, path = '/data/colocalization/QTL_data/eQTL/')
A matrix: 1 x 5 of type chr
coloc_csets_1coloc_csets_2min_abs_cormax_abs_cormedian_abs_cor
coloc_sets:Y1_Y2_Y3_Y4:CS1coloc_sets:Y7_Y11:MergeCS10.303592746208780.43207110836680.310470958062636

Here, different colors refer to different 95% Colocalization Sets (CoS, a metric developed in ColocBoost indicating that there is 95% probabilty that this CoS captures a colocalization event). We only included ROSMAP data for this particular ColocBoost analysis. In this case, we observe cell specific eQTL, bulk sQTL colocalization on ROSMAP data with AD as two separate CoS, suggesting two putative causal signals. We did not detect colocalization with pQTL of statistical significance although from Section 1 there are some overlap with pQTL signals in fine-mapping CS, the overlapped variants in CS have small PIP.

Section 3: Refinement of colocalized loci with other AD GWAS¶

Here we refine the colocalization with other AD GWAS to iron out any heterogeniety between studies (heterogeniety can come from many sources), to get additional candidate loci from these more heterogenous sources as candidates to study.

In [14]:
AD_cohorts <- c('AD_Jansen_2021', 'AD_Bellenguez_EADB_2022', 'AD_Bellenguez_EADI_2022',
             'AD_Kunkle_Stage1_2019', 'AD_Wightman_Excluding23andMe_2021',
             'AD_Wightman_ExcludingUKBand23andME_2021', 'AD_Wightman_Full_2021')
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_gwas = TRUE, gene_id = gene_id, cohorts = AD_cohorts)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
No description has been provided for this image
In [15]:
pdf('plots/MS4A4A/sec3.colocboost_res_allad.pdf', width = 10, height = 5)
replayPlot(cb_ad$p)
dev.off()
pdf: 2

Section 4: Assessment of multi-context xQTL effect sizes¶

Option 1: ColocBoost + MASH¶

Use colocboost variants and check for mash posterior contrast to see if the effect size are shared or specific or even opposite. Advantage is that colocboost result is AD GWAS informed; issue is that marginal posterior effects is not always the joint

In [16]:
mash_p <- mash_plot(gene_name = 'MS4A4A')
for (plot in mash_p) {
    print(plot)
}

Option 2: mvSuSiE¶

Use mvSuSiE multicontext fine-mapping results --- the bubble plot to check posterior effects. Issue is that we don't have this results yet, and this is limited to one cohort at a time, without information from AD.

We should go for option 1 by default and if we want to make claim about opposite effect size we double-check with mvSuSiE multicontext analysis.

Section 5: Multi-context causal TWAS (including conventional TWAS and MR)¶

The most updated version of cTWAS analysis are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/cTWAS/

TWAS results¶

We report TWAS from all contexts and methods from the pipeline. Here we will filter it down to the best performing methods and only keep contexts that are significant.

In [17]:
plot_TWAS_res(gene_id = gene_id, gene_name = gene_name)
No description has been provided for this image

MR results¶

This is only available for genes that are deemed significant in TWAS and have summary statistics available for effect size and standard errors in GWAS, in addition to z-scores --- current version does not support z-scores although we will soon also support z-scores in MR using MAF from reference panel.

cTWAS results¶

To be updated soon.

Section 6: Context specific multi-gene fine-mapping¶

A quick analysis: using the xQTL-AD summary table (flatten table)¶

We extract from xQTL-AD summary table the variants to get other genes also have CS with the variants shared by target gene and AD.

In [18]:
multigene_flat <- get_multigene_multicontext_flatten('Fungen_xQTL_allQTL.overlapped.gwas.export.MS4A4A.rds', sQTL = 'no_MSBB')
multigene_flat
A data.frame: 46 x 6
gene_id#chrstartendgene_namecontexts
<chr><chr><int><int><chr><chr>
ENSG00000071203chr116049277760492778MS4A12 MiGA_GTS_eQTL
ENSG00000089597chr116264672562646726GANAB ROSMAP_PCC_sQTL
ENSG00000109991chr115733835157338352P2RX3 MiGA_GFM_eQTL
ENSG00000110031chr115857821958578220LPXN Exc_mega_eQTL
ENSG00000110042chr115917142959171430DTX4 ROSMAP_DLPFC_sQTL
ENSG00000110104chr116084211260842113CCDC86 MiGA_GTS_eQTL,MiGA_THA_eQTL
ENSG00000110446chr116095265260952653SLC15A3 ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL
ENSG00000134802chr115742757957427580SLC43A3 MiGA_GTS_eQTL,MiGA_THA_eQTL
ENSG00000134824chr116179297961792980FADS2 ROSMAP_DLPFC_sQTL
ENSG00000149131chr115759738657597387SERPING1 BM_44_MSBB_eQTL,Ast_DeJager_eQTL,Exc_DeJager_eQTL,Inh_DeJager_eQTL,DLPFC_DeJager_eQTL,PCC_DeJager_eQTL,AC_DeJager_eQTL,Exc_Kellis_eQTL,Ast_mega_eQTL,Exc_mega_eQTL,Inh_mega_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000149136chr115733589157335892SSRP1 Exc_mega_eQTL
ENSG00000149150chr115751577957515780SLC43A1 MiGA_SVZ_eQTL,STARNET_eQTL
ENSG00000149499chr116261277462612775EML3 ROSMAP_DLPFC_sQTL
ENSG00000149506chr116086756160867562ZP1 MiGA_THA_eQTL
ENSG00000149516chr116005658660056587MS4A3 MiGA_GTS_eQTL
ENSG00000149532chr116143003061430031CPSF7 BM_10_MSBB_eQTL,DLPFC_DeJager_eQTL,AC_DeJager_eQTL,Exc_Kellis_eQTL,ROSMAP_DLPFC_sQTL
ENSG00000156587chr115756828357568284UBE2L6 MiGA_SVZ_eQTL,MiGA_THA_eQTL,monocyte_ROSMAP_eQTL,ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,STARNET_eQTL
ENSG00000156599chr115766774657667747ZDHHC5 DLPFC_Bennett_pQTL
ENSG00000156603chr115771232257712323MED19 ROSMAP_DLPFC_sQTL
ENSG00000156689chr115890421458904215GLYATL2 MiGA_GTS_eQTL,Exc_DeJager_eQTL
ENSG00000166793chr115764994357649944YPEL4 MiGA_THA_eQTL
ENSG00000166900chr115971345559713456STX3 ROSMAP_PCC_sQTL
ENSG00000166926chr116033483060334831MS4A6E MiGA_GFM_eQTL,STARNET_eQTL
ENSG00000166927chr116037848460378485MS4A7 ROSMAP_AC_sQTL,ROSMAP_DLPFC_sQTL,ROSMAP_PCC_sQTL
ENSG00000166928chr116037852960378530MS4A14 Knight_eQTL,BM_10_MSBB_eQTL,BM_22_MSBB_eQTL,BM_36_MSBB_eQTL,BM_44_MSBB_eQTL,DLPFC_DeJager_eQTL,AC_DeJager_eQTL,Mic_Kellis_eQTL,ROSMAP_AC_sQTL,ROSMAP_PCC_sQTL,ROSMAP_DLPFC_sQTL,STARNET_eQTL
ENSG00000167985chr116143004161430042SDHAF2 MiGA_GTS_eQTL
ENSG00000167987chr116116142561161426VPS37C MiGA_GFM_eQTL,MiGA_THA_eQTL
ENSG00000167992chr116129531561295316VWCE MiGA_SVZ_eQTL
ENSG00000167996chr116196763361967634FTH1 MiGA_SVZ_eQTL
ENSG00000168002chr116276156462761565POLR2G ROSMAP_DLPFC_sQTL
ENSG00000172324chr115942641259426413OR5A2 MiGA_GFM_eQTL
ENSG00000172409chr115764818757648188CLP1 MiGA_SVZ_eQTL,ROSMAP_PCC_sQTL
ENSG00000172742chr115951136759511368OR4D9 MiGA_GTS_eQTL,MiGA_SVZ_eQTL
ENSG00000173457chr116424694264246943PPP1R14B MSBB_BM36_pQTL
ENSG00000186509chr115802388058023881OR9Q1 MiGA_GFM_eQTL,MiGA_SVZ_eQTL
ENSG00000186513chr115818906958189070OR9Q2 MiGA_GTS_eQTL
ENSG00000186652chr115739064957390650PRG2 BM_36_MSBB_eQTL
ENSG00000186660chr115857906258579063ZFP91 MiGA_GFM_eQTL,ROSMAP_PCC_sQTL
ENSG00000186907chr115746052757460528RTN4RL2 MiGA_SVZ_eQTL
ENSG00000197658chr116314422063144221SLC22A24 MiGA_SVZ_eQTL
ENSG00000198561chr115775324257753243CTNND1 MiGA_SVZ_eQTL,MSBB_BM36_pQTL
ENSG00000214787chr116024313660243137MS4A4E MiGA_SVZ_eQTL
ENSG00000254466chr115947331459473315OR4D10 MiGA_THA_eQTL
ENSG00000255073chr115857917158579172ZFP91-CNTFROSMAP_PCC_sQTL
ENSG00000279051chr115803092958030930OR6Q1 MiGA_GTS_eQTL
ENSG00000288534chr115771258157712582AP001931.2MiGA_SVZ_eQTL,ROSMAP_AC_sQTL

Other genes implicated are PROC and HS6ST1 in MiGA cohort which may share causal eQTL with MS4A4A. Further look into the data-set --- using these genes as targets and repeating what we did above for MS4A4A --- might be needed to establish a more certain conclusion.

Alternatively, we may be able to apply a multi-gene statistical fine-mapping test on MS4A4A region to find these genes, as you will see in the section below.

A statistically solid approach: mvSuSiE multi-gene analysis¶

This multi-gene fine-mapping analysis was done for each xQTL context separately. We will need to check the results for all contexts where this gene has an xQTL, in order to identify if there are other genes also sharing the same xQTL with this target gene. We included other genes in the same TAD window along with this gene, and extended it into a sliding window approach to include multiple TADs just in case. You need to check the sliding windows belongs to that gene (TSS) on analysis repo.

In [19]:
sliding_windows <- target_gene_info$gene_info$sliding_windows %>% strsplit(., ',') %>% unlist %>% as.character
sliding_windows
  1. 'chr11:44804071-58324952'
  2. 'chr11:46980557-62485822'
  3. 'chr11:50840000-65321811'
  4. 'chr11:56299638-66671716'
  5. 'chr11:60203514-68955802'

The most updated version of mvSuSiE multi-gene results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/mvsusie_multi_gene_test/multi_gene/ Currently it is still WIP. You can revisit this later when we prompt you to. Here is an example for MS4A4A:

In [20]:
mnm_gene <- list()
for (window in sliding_windows) {
    mnm_gene_tmp <- NULL
    mnm_gene_tmp <- tryCatch(
        readRDS(paste0('/data/analysis_result/mvsusie_multi_gene/multi_gene/ROSMAP_multi_gene.', window, '.mnm.rds')),
        error = function(e) NULL
    )
    
    if (!is.null(mnm_gene_tmp)) {
        if(target_gene_info$gene_info$region_id %in% mnm_gene_tmp$mvsusie_fitted$condition_names){
        tryCatch({
            p <- mvsusieR::mvsusie_plot(mnm_gene_tmp$mvsusie_fitted, sentinel_only = F, add_cs = T)
            print(p)  # This ensures the plot is displayed in JupyterLab
        }, error = function(e) NULL)
        } else {
            message('There is mnm result for sliding window ',window,', but not include target gene ', gene_name, ' in CS')
        }
        mnm_gene <- append(mnm_gene, list(mnm_gene_tmp))
    }
}
$pip_plot
$effect_plot
No description has been provided for this image
$z_plot
NULL

$effects
                       L3           L1           L2            L4
ENSG00000110077 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000110079 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000166927 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000166928 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000110446 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000134824 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000254772 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000089597 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000162191 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000173511 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000168071 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08
ENSG00000162302 0.1918843 7.351113e-09 7.198491e-09 -1.216698e-08

No description has been provided for this image

In this case, there is no statistical evidence for MS4A4A sharing any of its xQTL with other genes in ROSMAP Microglia data we looked into; although we have not analyzed MiGA this way yet (which showed some potential signals from the quick analysis above).

Section 7: Epigenomic QTL and their target regions¶

fsusie, see notebook

Generate a crude plot to determined whether the story is interesting¶

This is a crude version of the case study plot which shows the fsusie Effect (colored line), the gene body (black arrow), the epi-QTL (large dots with the same color as the effects) and ADGWAS cs position (small red dots).

Only produce the refine plot if we can see either:

  1. There are sharing snp between epi-QTL and AD CS
  2. There are the AD CS located within one of the effect range
  3. The crude plot suggest something interesting
In [21]:
options(repr.plot.width = 40, repr.plot.height = 40)

 ggplot() + theme_bw() +  facet_grid(cs_coverage_0.95+study + region ~ ., labeller = labeller(.rows = function(x) gsub("([_:,-])", "\n", x)), scale = "free_y") +

      theme(text = element_text(size = 20), strip.text.y = element_text(size = 25, angle = 0.5)) +
     # xlim(view_win) +
      ylab("Estimated effect") +
   #   geom_line(data = haQTL_df %>% mutate(study = "haQTL effect") %>% filter(CS == 5),
    #            aes_string(y = "fun_plot", x = "x", col = "CS"), size = 4, col = "#00AEEF") +
  geom_line(data = effect_of_interest ,
                aes_string(y = "fun_plot", x = "x", col = "cs_coverage_0.95"), size = 4) +  
    geom_point(data = effect_of_interest ,
                aes_string(y = "pip", x = "pos", col = "cs_coverage_0.95"), size = 4) +
    theme(text = element_text(size = 40), strip.text.y = element_text(size = 15, angle = 0.5), 
            axis.text.x = element_text(size = 40), axis.title.x = element_text(size = 40)) +
      xlab("Position") +
      ylab("Estimated\neffect") +
      geom_segment(arrow = arrow(length = unit(1, "cm")), aes(x = gene_start, xend = gene_end, y = 1, yend = 1), size = 6,
                  data = tar_gene_info$gene_info, alpha = 0.3) +
      geom_text(aes(x = (gene_start + gene_end) / 2, y = 1 , label = gene_name), size = 10, 
              data = tar_gene_info$gene_info)+
        geom_point(aes(x = pos, y = pip  ) ,color = "red", data = flatten_table%>%filter( str_detect(study,"AD_") , cs_coverage_0.95 != 0  )%>%mutate(AD_study = study%>%str_replace_all("_","\n" ))%>%select(-study,-region,-cs_coverage_0.95) ) 

Section 8: Context focused validation in other xQTL data¶

see notebook

add fake version for now, so you don't have to refer to above link

Background: our "discovery set" is ROSMAP but we have additional "validation" sets including:

  • STARNET
  • MiGA
  • KnightADRC
  • MSBB
  • metaBrain
  • UKB pQTL

TODO:

  • Get from Carlos WashU CSF based resource (pQTL and metabolomic QTL)

This section shows verification of findings from these data-sets. In principle we should check them through sections 1-6 more formally. In practice we will start with colocalization via colocboost --- since our study is genetics (variant and loci level) focused. We can selectively follow them up for potentially intereting validations. We therefore only demonstrate validation via colocboost as a starting point.

In [22]:
finempping_contexts <- readRDS(paste0(gene_name, '_finemapping_contexts.rds')) # from sec1
In [23]:
finempping_contexts <- get_norosmap_contexts(finempping_contexts)
In [24]:
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_QTL = TRUE, cohorts = finempping_contexts, gene_id = gene_id)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
No description has been provided for this image

In conclusion from what's shown above, when we check the association signals in STARNET and MiGA on colocalization established from ROSMAP and AD GWAS, we see additional evidences.

Section 9: Non-linear effects of xQTL¶

see notebook

APOE interaction¶

In [25]:
options(repr.plot.width=6, repr.plot.height=6)

ggplot(MS4A4A_int_res, aes(x = variant_id, y = qvalue_interaction)) +
  geom_point(alpha = 0.7, size = 6) +
  labs(title = "qvalue for MS4A4A csets in interaction association nalysis",
       x = "Gene Name",
       y = "qvalue_interaction",
       size = "qvalue_interaction") +
  theme_minimal(base_size = 14) +
  theme(panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey80"),
        legend.position = NULL,
        axis.text.x = element_text(angle = 45, hjust = 1))  + ylim(0,1)
  # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
ggsave('plots/MS4A4A/sec11.interaction_association_MS4A4A_lessPIP25.pdf', height = 5, width = 8) 

In conclusion, there is no interaction QTL with APOE identified.

Section 10: in silico functional studies in iPSC model¶

see notebook

In [26]:
vars_p
In [27]:
apoe_p

Section 11: Functional annotations of selected loci¶

see notebook

TODO

  • Touch base with Ryan on the snATAC annotations
  • Run this by Pavel to see if there are additional comments on how we do this
In [28]:
func_p

Section 12: Candidate loci as trans-xQTL¶

see notebook

In [29]:
options(repr.plot.width=12, repr.plot.height=6)
if(!is.null(flat_var)){
    ggplot(flat_var, aes(x = gene_name, y = pip, size = pip)) +
      geom_point(alpha = 0.7) +
      labs(title = paste0("PIP values for trans fine mapped Genes in ", gene_name ," csets with AD"),
           x = "Gene Name",
           y = "PIP",
           size = "PIP",
           color = "CS Coverage 0.95 Min Corr") +
      theme_minimal(base_size = 14) +
      theme(panel.background = element_blank(),
            panel.grid.major = element_line(color = "grey80"),
            legend.position = NULL,
            axis.text.x = element_text(angle = 45, hjust = 1))  
      # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
    ggsave(paste0('plots/MS4A4A/sec12.trans_fine_mapping_',gene_name,'.pdf'), height = 5, width = 8)
} else{
    message('There are no detectable trans signals for ', gene_name)
}

Creative thinking: generate hypothesis, search in literature, raise questions to discuss¶

You can now generate some preliminary hypotheses based on the above results. Next, you should search for evidence in the literature to support or refine these hypotheses and identify additional analyses needed to confirm them.