Case study: PICALM xQTL and AD GWAS¶

This notebook documents the analysis of xQTL case study on a targeted gene, PICALM.

  • Section 0: Sanity check
  • Section 1: Fine-mapping for xQTL and GWAS
  • Section 2: Multi-context colocalization with Bellenguez 2022
  • Section 3: Refinement of colocalized loci with other AD GWAS
  • Section 4: Assessment of multi-context xQTL effect sizes
  • Section 5: Multi-context causal TWAS (including conventional TWAS and MR)
  • Section 6: Context specific multi-gene fine-mapping
  • Section 7: Epigenomic QTL and their target regions
  • Section 8: Context focused validation in other xQTL data
  • Section 9: Non-linear effects of xQTL
  • Section 10: in silico functional studies in iPSC model
  • Section 11: Functional annotations of selected loci
  • Section 12: Candidate loci as trans-xQTL

Overview¶

FunGen-xQTL resource contains 67 xQTL contexts as well as 9 AD GWAS fine-mapped genome-wide. The overarching goal for case studies is to use these resource to raise questions and learn more about gene targets of interest.

Overall, a case study consists of the following aspects:

  • Check the basic information of the gene
  • Check the existing xQTL and integrative analysis results, roughly including
    • Summary table for univariate fine-mapping
    • Marginal association results
    • Multi-gene and multi-context fine-mapping
    • Multi-context colocalization with AD GWAS
    • TWAS, MR and causal TWAS
    • Integration with epigenetic QTL
    • Quantile QTL
    • Interaction QTL
    • Validation:
      • Additional xQTL data in FunGen-xQTL
      • Additional AD GWAS data-set already generated by FunGen-xQTL
    • In silico functional studies
      • Additional iPSC data-sets
    • Functional annotations of variants, particularly in relevant cellular contexts
  • Creative thinking: generate hypothesis, search in literature, raise questions to discuss

Computing environment setup¶

Interactive analysis will be done on AWS cloud powered by MemVerge. Please contact Gao Wang to setup accounts for you to start the analysis.

Please follow instructions on https://wanggroup.org/productivity_tips/mmcloud-interactive to configure your computing environment. Here are some additional packages you need to install after the initial configuration, in order to perform these analysis:

micromamba install -n r_libs r-pecotmr
micromamba install -n r_libs r-bedmatrix

How to Use This Notebook¶

  1. Before you start: Load functions from cb_plot.R and utilis.R, located at /data/interactive_analysis/rf2872/codes/. These functions and resources are packaged to streamline the analysis and ensure everything is as clean as possible. And also the codes for ColocBoost under path /data/colocalization/colocboost/R.
  2. Inside of this notebook, use sed -i or Ctrl+F to replace the gene PICALM with the gene you want to analyze.
  3. For detailed analysis in some sections, please refer to the corresponding analysis notebooks as indicated. These companion notebooks are available under this same folder. The rest of the tasks can be completed with a few lines of code, as demonstrated in this notebook.
  4. Similarly for the companion notebooks you should also use the sed -i or Ctrl+F replacing gene_name (PICALM in this case) with the gene you want to investigate.

While using this notebook, you may need to generate three intermediate files from Sections 1 and 2, which will be useful for downstream analysis:

  • a. Section 1:
    • Fine-mapping contexts that indicate shared signals with AD, PICALM_finemapping_contexts.rds. This can be used as input for Section 8 the multi-cohort validation step
    • A subset of the xQTL-AD table, Fungen_xQTL_allQTL.overlapped.gwas.export.PICALM.rds. This can be used as input for Section 12.
  • b. Section 2: A variant list showing colocalization in cohorts we analyzed with ColocBoost, PICALM_colocboost_res.rds. this can be used as input for Sections 7, 9, 10, and 12.

Section 0: Sanity check¶

Check the basic information of the gene¶

  • To gain a preliminary understanding of this gene’s expression—specifically, whether it is cell-specific—can help us quickly determine if our results are consistent with expectations.

Useful websites:

  1. check gene function, (immune) cell type specificity, tissue specifity, protein location: https://www.proteinatlas.org
  2. check gene position and structure: https://www.ncbi.nlm.nih.gov/gene/
  3. other collective information: https://www.genecards.org

Check the existing results, which includes:¶

In [62]:
source('../../codes/cb_plot.R')
source('../../codes/utilis.R')
for(file in list.files("/data/colocalization/colocboost/R", pattern = ".R", full.names = T)){
          source(file)
        }
gene_name = 'PICALM'

dir.create(paste0('plots/', gene_name), recursive = T)

get basic target gene information

In [2]:
target_gene_info <- get_gene_info(gene_name = 'PICALM')
target_gene_info
$gene_info
A data.table: 1 x 14
region_id#chrstartendTSSLD_matrix_idLD_sumstats_idLD_sumstats_id_oldTADB_indexTADB_idgene_startgene_endsliding_windowsgene_name
<chr><chr><dbl><dbl><int><chr><chr><chr><chr><chr><int><int><chr><chr>
ENSG00000073921chr11849571758736000086069881chr11:84267999-86714492,chr11:86714492-8933214811_84267999-86714492,11_86714492-8933214811_84267999-86714492,11_86714492_89332148TADB_914,TADB_915,TADB_916,TADB_917,TADB_918chr11_80821272_86627922,chr11_82455012_86627922,chr11_82455012_86627922,chr11_82455012_88330052,chr11_86037843_911928948606988285957175chr11:77324757-86627922,chr11:80552225-86627922,chr11:80821272-88330052,chr11:82455012-91192894,chr11:82455012-94812378,chr11:86037843-97507574PICALM
$target_LD_ids
A matrix: 1 x 2 of type chr
chr11:84267999-86714492chr11:86714492-89332148
$target_sums_ids
A matrix: 1 x 2 of type chr
11_84267999-8671449211_86714492-89332148
$gene_region
'chr11:84957175-87360000'
$target_TAD_ids
A matrix: 1 x 5 of type chr
chr11_80821272_86627922chr11_82455012_86627922chr11_82455012_86627922chr11_82455012_88330052chr11_86037843_91192894
In [3]:
gene_name = 'PICALM'
gene_id = target_gene_info$gene_info$region_id
chrom = target_gene_info$gene_info$`#chr`

Take a quick look for the expression of target gene in ROSMAP bulk data, we don't want them to be too low

In [6]:
expression_in_rosmap_bulk(target_gene_info)
No description has been provided for this image

Section 1: Fine-mapping for xQTL and GWAS¶

see notebook

In [7]:
region_p
In [43]:
pip_p
No description has been provided for this image

Section 2: Multi-context colocalization with Bellenguez 2022¶

This is done using ColocBoost. The most updated version of ColocBoost results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/ColocBoost/2024_9/

In [9]:
cb_res <- readRDS(paste0("/data/analysis_result/ColocBoost/2024_7/",gene_id,"_res.rds") )
# manually add orange loci 
cb_res$cb_z2z_noLD$coloc_results$csets_snp_names$`coloc_sets:Y1_Y2_Y18:CSfake` <- c("chr11:86157598:T:C","chr11:86156833:A:G")
cb_res$cb_z2z_noLD$coloc_results$csets$`coloc_sets:Y1_Y2_Y18:CSfake` <- c(4817, 4813)
cb_res$cb_z2z_noLD$coloc_results$phenotypes[[4]] <-  c('phenotype1','phenotype2','phenotype18')
In [10]:
#save colocboost results
cb_res_table <- get_cb_summary(cb_res) 

saveRDS(cb_res_table, paste0(gene_name, "_colocboost_res.rds"))
In [11]:
cb <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2)
No description has been provided for this image
In [14]:
#pdf('plots/PICALM/sec2.colocboost_res.pdf', width = 10, height = 5)
replayPlot(cb$p)
#dev.off()
No description has been provided for this image
In [13]:
# colocalized variants
cb_res_table
A data.frame: 4 x 8
colocalized phenotypespurity# variantshighest VCPcolocalized indexcolocalized variantsmax_abs_z_variantcset_id
<chr><dbl><dbl><dbl><chr><chr><chr><chr>
AC; DLPFC_productive NA 60.5816798093851; 3834; 4027; 3993; 4333; 4064 chr11:85947843:TAGG:CAGG; chr11:85943544:A:G; chr11:85981138:A:C; chr11:85973892:T:C; chr11:86055977:A:G; chr11:85991934:T:C chr11:85947843:TAGG:CAGGcoloc_sets:Y8_Y14:CS1
Mic; Ast; pQTL; AC_unproductive; DLPFC_unproductive; PCC_productive; AD_Bellenguez_2022NA140.4280565564422; 4367; 4379; 4425; 4207; 4204; 4246; 4306; 4320; 4458; 4292; 4217; 4279; 3981chr11:86076782:C:T; chr11:86063377:GTA:G; chr11:86065502:G:A; chr11:86077309:C:T; chr11:86031175:CA:C; chr11:86030088:C:T; chr11:86039629:A:T; chr11:86051202:C:T; chr11:86054256:G:A; chr11:86083718:C:T; chr11:86048986:T:C; chr11:86034995:C:T; chr11:86046547:C:T; chr11:85970540:A:Gchr11:86065502:G:A coloc_sets:Y1_Y2_Y11_Y13_Y15_Y16_Y18:CS3
Mic; Monocyte; AD_Bellenguez_2022 NA 80.2350142774791; 4766; 4786; 4781; 4758; 4767; 4771; 4773 chr11:86141937:G:A; chr11:86149263:G:A; chr11:86147455:G:T; chr11:86139201:C:T; chr11:86146597:T:C; chr11:86142209:T:G; chr11:86144030:ATCT:A; chr11:86144036:GC:G chr11:86142209:T:G coloc_sets:Y1_Y10_Y18:CS2
Mic; Ast; AD_Bellenguez_2022 NA 20.0011294534817; 4813 chr11:86157598:T:C; chr11:86156833:A:G chr11:86157598:T:C coloc_sets:Y1_Y2_Y18:CSfake
In [28]:
# effect sign for each coloc sets
get_effect_sign_csets(cb_res)
$`coloc_sets:Y8_Y14:CS1`
A data.frame: 6 × 3
variantsACDLPFC_productive
<chr><dbl><dbl>
chr11:85947843:TAGG:CAGGchr11:85947843:TAGG:CAGG5.3192145.908326
chr11:85943544:A:Gchr11:85943544:A:G 5.2406315.851143
chr11:85981138:A:Cchr11:85981138:A:C 5.1538155.779653
chr11:85973892:T:Cchr11:85973892:T:C 5.0658565.739538
chr11:86055977:A:Gchr11:86055977:A:G 5.1391225.653113
chr11:85991934:T:Cchr11:85991934:T:C 5.0991245.608884
$`coloc_sets:Y1_Y2_Y11_Y13_Y15_Y16_Y18:CS3`
A data.frame: 14 × 8
variantsMicAstpQTLAC_unproductiveDLPFC_unproductivePCC_productiveAD_Bellenguez_2022
<chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
chr11:86076782:C:Tchr11:86076782:C:T -10.52174-4.517582-3.818992-10.89911-8.859720-2.47863011.75000
chr11:86063377:GTA:Gchr11:86063377:GTA:G-10.34026-4.537517-3.844662-10.93803-8.765682-2.23230411.58427
chr11:86065502:G:Achr11:86065502:G:A -10.34026-4.537517-3.844662-10.94457-8.689725-2.19961711.60227
chr11:86077309:C:Tchr11:86077309:C:T -10.24246-4.409258-3.717093-10.69015-8.932100-2.41055011.69318
chr11:86031175:CA:Cchr11:86031175:CA:C -10.34026-4.537517-3.844662-10.88368-8.797908-2.23868311.44944
chr11:86030088:C:Tchr11:86030088:C:T -10.34026-4.537517-3.844662-10.88368-8.797908-2.23868311.44318
chr11:86039629:A:Tchr11:86039629:A:T -10.38765-4.467640-3.817673-10.88900-8.666949-2.39280411.45455
chr11:86051202:C:Tchr11:86051202:C:T -10.28062-4.410642-3.620332-10.88368-8.750821-2.42591211.54545
chr11:86054256:G:Achr11:86054256:G:A -10.23398-4.480186-3.647768-10.88900-8.728554-2.20574511.54545
chr11:86083718:C:Tchr11:86083718:C:T -10.09243-4.405057-3.649974-10.84648-8.639717-2.37414711.60227
chr11:86048986:T:Cchr11:86048986:T:C -10.10608-4.358270-3.715062-10.67645-8.816007-2.35795911.56322
chr11:86034995:C:Tchr11:86034995:C:T -10.10608-4.358270-3.701529-10.67645-8.793389-2.40677611.49425
chr11:86046547:C:Tchr11:86046547:C:T -10.05720-4.428364-3.742244-10.67645-8.870296-2.17171911.50575
chr11:85970540:A:Gchr11:85970540:A:G -10.06227-4.422811-3.654206-10.75134-8.781144-2.49912111.24138
$`coloc_sets:Y1_Y10_Y18:CS2`
A data.frame: 8 × 4
variantsMicMonocyteAD_Bellenguez_2022
<chr><dbl><dbl><dbl>
chr11:86149263:G:Achr11:86149263:G:A -11.297655.59982811.81481
chr11:86141937:G:Achr11:86141937:G:A -11.212845.51400212.01235
chr11:86147455:G:Tchr11:86147455:G:T -11.232015.59982811.79012
chr11:86146597:T:Cchr11:86146597:T:C -11.232015.59849611.77778
chr11:86139201:C:Tchr11:86139201:C:T -11.188215.39421811.96296
chr11:86142209:T:Gchr11:86142209:T:G -11.476465.51180610.97561
chr11:86144030:ATCT:Achr11:86144030:ATCT:A-11.086865.51299811.63881
chr11:86144036:GC:Gchr11:86144036:GC:G -11.086865.51299811.63831
$`coloc_sets:Y1_Y2_Y18:CSfake`
A data.frame: 2 × 4
variantsMicAstAD_Bellenguez_2022
<chr><dbl><dbl><dbl>
chr11:86157598:T:Cchr11:86157598:T:C-13.86218-5.01213412.54762
chr11:86156833:A:Gchr11:86156833:A:G-13.66511-5.03378612.57143
In [15]:
# LD between coloc sets
get_between_purity_simple(cb_res, gene.name = gene_id, path = '/data/colocalization/QTL_data/eQTL/')
A matrix: 6 x 5 of type chr
coloc_csets_1coloc_csets_2min_abs_cormax_abs_cormedian_abs_cor
coloc_sets:Y8_Y14:CS1 coloc_sets:Y1_Y2_Y11_Y13_Y15_Y16_Y18:CS30.2699636799372460.3162457374800890.285559647837661
coloc_sets:Y8_Y14:CS1 coloc_sets:Y1_Y10_Y18:CS2 0.3863196521409450.45771188274784 0.406421060547751
coloc_sets:Y8_Y14:CS1 coloc_sets:Y1_Y2_Y18:CSfake 0.1403793468542120.1603203091394 0.146601624689543
coloc_sets:Y1_Y2_Y11_Y13_Y15_Y16_Y18:CS3coloc_sets:Y1_Y10_Y18:CS2 0.5788641170105190.6690866862522310.599914040978263
coloc_sets:Y1_Y2_Y11_Y13_Y15_Y16_Y18:CS3coloc_sets:Y1_Y2_Y18:CSfake 0.7759398674827110.7990647836842590.783879863411505
coloc_sets:Y1_Y10_Y18:CS2 coloc_sets:Y1_Y2_Y18:CSfake 0.6139863615457650.6385885173566630.624090378592517

Section 3: Refinement of colocalized loci with other AD GWAS¶

In [16]:
AD_cohorts <- c('AD_Jansen_2021', 'AD_Bellenguez_EADB_2022', 'AD_Bellenguez_EADI_2022',
             'AD_Kunkle_Stage1_2019', 'AD_Wightman_Excluding23andMe_2021',
             'AD_Wightman_ExcludingUKBand23andME_2021', 'AD_Wightman_Full_2021')
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_gwas = TRUE, gene_id = gene_id, cohorts = AD_cohorts)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
No description has been provided for this image
In [21]:
pdf('plots/PICALM/sec3.colocboost_res_allad.pdf', width = 10, height = 5)
replayPlot(cb_ad$p)
dev.off()
png: 2

Section 4: Assessment of multi-context xQTL effect sizes¶

Option 1: use colocboost variants and check for mash posterior contrast to see if the effect size are shared or specific or even opposite. Advantage is that colocboost result is AD GWAS informed; issue is that marginal posterior effects is not always the joint

In [102]:
get_constrast_plot(df_rosmap)
No description has been provided for this image

Option 2: use mvSuSiE multicontext fine-mapping results --- the bubble plot to check posterior effects. Issue is that we don't have this results yet, and this is limited to one cohort at a time, without information from AD.

We should go for option 1 by default and if we want to make claim about opposite effect size we double-check with mvSuSiE multicontext analysis.

The most updated version of cTWAS analysis are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/cTWAS/

In [63]:
message("Multi context in ROSMAP data")

multi_context_rosmap_tmp <- tryCatch(
    readRDS(paste0('/data/analysis_result/multi_context/ROSMAP/mnm/ROSMAP_DeJager.',
                   target_gene_info$gene_info$`#chr`, '_', gene_id, '.multicontext_bvsr.rds')),
    error = function(e) message('Error in loading ROSMAP multi context data')
)
if (!is.null(multi_context_rosmap_tmp[[1]]$mvsusie_fitted)) {
    plot_and_save(multi_context_rosmap_tmp[[1]], 'plots/PICALM/sec4.multi_context_ROSMAP')
} else {
    message('Multi Context results are empty in ROSMAP data')
}
No description has been provided for this image
No description has been provided for this image
In [65]:
# Load and process MSBB data
message("Multi context in MSBB data")

multi_context_msbb_tmp <- tryCatch(
    readRDS(paste0('/data/analysis_result/multi_context/MSBB/mnm/MSBB_eQTL.',
                   target_gene_info$gene_info$`#chr`, '_', gene_id, '.multicontext_bvsr.rds')),
    error = function(e)  message('Error in loading MSBB multi context data')
)
if (!is.null(multi_context_msbb_tmp[[1]]$mvsusie_fitted)) {
    plot_and_save(multi_context_msbb_tmp[[1]], 'plots/PICALM/sec4.multi_context_MSBB')
} else {
    message('Multi Context results are empty in MSBB data')
}

Section 5: Multi-context causal TWAS (including conventional TWAS and MR)¶

TWAS results¶

We report TWAS from all contexts and methods from the pipeline. Here we will filter it down to the best performing methods and only keep contexts that are significant.

In [10]:
plot_TWAS_res(gene_id = gene_id, gene_name = gene_name)
No description has been provided for this image

MR results¶

This is only available for genes that are deemed significant in TWAS and have summary statistics available for effect size and standard errors in GWAS, in addition to z-scores --- current version does not support z-scores although we will soon also support z-scores in MR using MAF from reference panel.

cTWAS results¶

To be updated soon.

Section 6: Context specific multi-gene fine-mapping¶

A quick analysis: using the xQTL-AD summary table (flatten table)¶

We extract from xQTL-AD summary table the variants to get other genes also have CS with the variants shared by target gene and AD.

In [21]:
multigene_flat <- get_multigene_multicontext_flatten('Fungen_xQTL_allQTL.overlapped.gwas.export.PICALM.rds', sQTL = 'no_MSBB')
multigene_flat
A data.frame: 2 × 6
gene_id#chrstartendgene_namecontexts
<chr><chr><int><int><chr><chr>
ENSG00000137501chr118581115885811159SYTL2 AC_DeJager_eQTL,Exc_mega_eQTL,Inh_mega_eQTL
ENSG00000150687chr118679105886791059PRSS23MiGA_GTS_eQTL

A statistically solid approach: mvSuSiE multi-gene analysis¶

The most updated version of mvSuSiE multi-gene results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/mvsusie_multi_gene_test/multi_gene/

Other genes implicated are SYTL2 and PRSS23 in different cohorts which may share causal eQTL with PICALM. Further look into the data-set --- using these genes as targets and repeating what we did above for PICALM --- might be needed to establish a more certain conclusion.

Alternatively, we may be able to apply a multi-gene statistical fine-mapping test on PICALM region to find these genes, as you will see in the section below.

In [35]:
sliding_windows <- target_gene_info$gene_info$TADB_id %>% strsplit(., ',') %>% unlist %>% as.character 
sliding_windows
  1. 'chr11_80821272_86627922'
  2. 'chr11_82455012_86627922'
  3. 'chr11_82455012_86627922'
  4. 'chr11_82455012_88330052'
  5. 'chr11_86037843_91192894'

The most updated version of mvSuSiE multi-gene results are under path s3://statfungen/ftp_fgc_xqtl/analysis_result/mvsusie_multi_gene_test/multi_gene/ Currently it is still WIP. You can revisit this later when we prompt you to. Here is an example for PICALM:

In [52]:
# Main loop to process sliding windows
mnm_gene <- list()
for (window in sliding_windows) {
    context_files <- list.files('/data/analysis_result/multi_gene/ROSMAP/mnm_genes/', window, full.names = T) %>% .[str_detect(., '.multigene_bvrs.rds')]
    for(context_file in context_files){
        context_mnm = context_file %>% basename %>% str_split(., '[.]', simplify = T) %>% .[,1]
        # Load multi-gene data
        mnm_gene_tmp <- tryCatch(
            readRDS(context_file),
            error = function(e) NULL
        )
        
        if (!is.null(mnm_gene_tmp)) {
            # Check if target gene is in the condition names
            if (target_gene_info$gene_info$region_id %in% mnm_gene_tmp[[1]]$mvsusie_fitted$condition_names) {
                # Use a common prefix format for multi-gene plots
                plot_and_save(mnm_gene_tmp[[1]], 'plots/PICALM/sec6.multigene')
            } else {
                message('There is mnm result for TAD window ', window, ' in ', context_mnm,
                        ', but it does not include target gene ', gene_name, ' in CS.')
            }
            # Append to the results list
            mnm_gene <- append(mnm_gene, list(mnm_gene_tmp))
        } 
    }
}

Section 7: Epigenomic QTL and their target regions¶

fsusie, see notebook

Generate a crude plot to determined whether the story is interesting¶

This is a crude version of the case study plot which shows the fsusie Effect (colored line), the gene body (black arrow), the epi-QTL (large dots with the same color as the effects) and ADGWAS cs position (small red dots).

Only produce the refine plot if we can see either:

  1. There are sharing snp between epi-QTL and AD CS
  2. There are the AD CS located within one of the effect range
  3. The crude plot suggest something interesting
In [66]:
cowplot::plot_grid(plotlist = list(p1,p3,p2


                                 
) ,
ncol = 1,
align = "v",
axis = "tlbr",  labels = c("A", "B", "C"),  label_size = 45, # Adjust the label size as needed
  label_fontface = "bold",
rel_heights = c(2,1,3)
) 
No description has been provided for this image

Section 8: Context focused validation in other xQTL data¶

see notebook

add fake version for now, so you don't have to refer to above link

Background: our "discovery set" is ROSMAP but we have additional "validation" sets including:

  • STARNET
  • MiGA
  • KnightADRC
  • MSBB
  • metaBrain
  • UKB pQTL

TODO:

  • Get from Carlos WashU CSF based resource (pQTL and metabolomic QTL)

This section shows verification of findings from these data-sets. In principle we should check them through sections 1-6 more formally. In practice we will start with colocalization via colocboost --- since our study is genetics (variant and loci level) focused. We can selectively follow them up for potentially intereting validations. We therefore only demonstrate validation via colocboost as a starting point.

In [23]:
finempping_contexts <- readRDS(paste0(gene_name, '_finemapping_contexts.rds')) # from sec1
In [24]:
finempping_contexts <- get_norosmap_contexts(finempping_contexts)
In [25]:
# for PICALM only
finempping_contexts <- c(as.character(finempping_contexts), 'MiGA_GFM_eQTL')
In [26]:
cb_ad <- plot_cb(cb_res = cb_res, cex.pheno = 1.5, x.phen = -0.2, add_QTL = TRUE, cohorts = finempping_contexts, gene_id = gene_id)
No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.No pvalue cutoff. Extract all variants names.
No description has been provided for this image

Section 9: Non-linear effects of xQTL¶

see notebook

APOE interaction¶

results are from Alex, see notebook

In [29]:
asso_p
No description has been provided for this image
In [52]:
#lm: APOE*Xres
#to limit multi-testing, test only the one of interest : Ast and Mic in purple and green who seems to have an interaction in TCW iPSC
#we remove also dono
picalmf<-picalm[context%in%c('Mic','Ast')&cs_name%in%c('purple',
                                                       'green')]

picalmf[,apoe4dose:=str_count(as.character(apoe_genotype),'4')]
picalmf[,AD:=str_count(as.character(apoe_genotype),'4')]

table(unique(picalmf,by='specimenID')$cogdx)


res_lm<-picalmf[,data.table(summary(lm(expression_res~genotype_res*apoe4dose))$coefficients,
                        keep.rownames = 'cov'),
       by=c('SNP','context','cs_name')]

#padj: n of cs context
ntests=nrow(unique(res_lm,by=c('cs_name','context')))
res_lm[,padj:=p.adjust(`Pr(>|t|)`,n = ntests),by=c('SNP','context','cov')]

res_lm[padj<0.1][cov=='genotype_res:apoe4dose']


ggplot(unique(res_lm[order(padj)],by=c('cs_name','context','cov')))+
  geom_col(aes(x=cov,y=-log10(`Pr(>|t|)`),fill=Estimate))+
facet_wrap(context~cs_name)+theme_bw()+scale_fill_gradient2(low = 'blue',high = 'red')+
  scale_x_discrete(guide = guide_axis(angle = 60))+geom_hline(yintercept = -log10(0.05),colour = 'grey',linetype='dashed')

#fwrite(res_lmf,fp(out,'res_lm_astromic_purple_green_qtl_interacAPOE.csv.gz'))


#fwrite(picalm,fp(out,'res_picalm_expr_css_interacAPOE.csv.gz'))
  1   2   3   4   5   6 
133 101   5 127  19  13 
A data.table: 7 × 9
SNPcontextcs_namecovEstimateStd. Errort valuePr(>|t|)padj
<fct><chr><chr><chr><dbl><dbl><dbl><dbl><dbl>
chr11:86141937:G:A Astpurplegenotype_res:apoe4dose-0.15226150.05776645-2.6358120.0087251440.03490057
chr11:86142209:T:G Astpurplegenotype_res:apoe4dose-0.14411050.05833496-2.4703960.0139188690.05567548
chr11:86144030:ATCT:AAstpurplegenotype_res:apoe4dose-0.13985150.05807169-2.4082570.0164873860.06594954
chr11:86144036:GC:G Astpurplegenotype_res:apoe4dose-0.13985150.05807169-2.4082570.0164873860.06594954
chr11:86146597:T:C Astpurplegenotype_res:apoe4dose-0.14927200.05747814-2.5970230.0097556360.03902254
chr11:86147455:G:T Astpurplegenotype_res:apoe4dose-0.14927200.05747814-2.5970230.0097556360.03902254
chr11:86149263:G:A Astpurplegenotype_res:apoe4dose-0.14451530.05763470-2.5074370.0125624960.05024999
No description has been provided for this image

Quantile QTL analysis¶

Anjing will soon provide the vignette to assess that.

In [46]:
quant_coef_colocvar
No description has been provided for this image

Section 10: in silico functional studies in iPSC model¶

see notebook

In [264]:
vars_p
No description has been provided for this image
In [256]:
apoe_p
No description has been provided for this image

Section 11: Functional annotations of selected loci¶

see notebook

TODO

  • Touch base with Ryan on the snATAC annotations
In [68]:
func_p
No description has been provided for this image

Section 12: Candidate loci as trans-xQTL¶

see notebook

In [44]:
options(repr.plot.width=12, repr.plot.height=6)
ggplot(flat_var, aes(x = gene_name, y = pip, size = pip)) +
  geom_point(alpha = 0.7) +
  labs(title = paste0("PIP values for trans fine mapped Genes in ", gene_name ," csets with AD"),
       x = "Gene Name",
       y = "PIP",
       size = "PIP",
       color = "CS Coverage 0.95 Min Corr") +
  theme_minimal(base_size = 14) +
  theme(panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey80"),
        legend.position = NULL,
        axis.text.x = element_text(angle = 45, hjust = 1))  
  # scale_color_manual(values = colorRampPalette(brewer.pal(8, "Set1"))(length(unique(flat_var$gene_name))))
ggsave(paste0('plots/PICALM/sec12.trans_fine_mapping_',gene_name,'.pdf'), height = 5, width = 8)
No description has been provided for this image

no trans signals extracted for PICALM

Creative thinking: generate hypothesis, search in literature, raise questions to discuss¶

You can now generate some preliminary hypotheses based on the above results. Next, you should search for evidence in the literature to support or refine these hypotheses and identify additional analyses needed to confirm them.

For the different colocalization sets involving AD in microglia, we are exploring whether these differences are related to distinct microglia subtypes. To investigate this, we combined microglia data from ROSMAP-De Jager and Kellis, along with multi-omics data. We performed an integration analysis, followed by a comparison of PICALM expression across microglia subtypes based on the cell type classifications in the Kellis paper. Our goal is to identify a subtype that shows distinct expression patterns compared to others.

In [1]:
%preview 'plots/PICALM/Mic_subtypes.png'
> plots/PICALM/Mic_subtypes.png (241.7 KiB):
No description has been provided for this image

While we did not observe distinct performance among the different microglia subtypes, we found that CAMs, monocytes, and T cells were included in the microglia object and showed lower expression levels compared to true microglia cells. We were able to separate these into two clusters and will now investigate the QTL evidence for each cluster.

In [2]:
%preview 'plots/PICALM/PICALM_exp_Kellis_anno.png'
> plots/PICALM/PICALM_exp_Kellis_anno.png (179.6 KiB):
No description has been provided for this image

On the other hand, we are curious about the results of annotating the cell clusters based on PICALM expression. Specifically, we identified marker genes for each cluster and then classified the clusters as PICALM_high or PICALM_low based on their PICALM expression levels. Clusters with high PICALM expression were labeled as PICALM_high, and those with low expression as PICALM_low.

In [3]:
%preview 'plots/PICALM/PICALM_subtype_exp.png'
> plots/PICALM/PICALM_subtype_exp.png (271.4 KiB):
No description has been provided for this image

Although we did not observe clear boundaries between clusters based on PICALM expression, we were still able to separate the cells into two groups—PICALM_high and PICALM_low. We will proceed to examine the QTL evidence for these two clusters.

In [4]:
%preview 'plots/PICALM/PICALM_subtype_exp_visua.png'
> plots/PICALM/PICALM_subtype_exp_visua.png (157.5 KiB):
No description has been provided for this image