Tuesday, October 8, 2019

Connect with the Knowledge Portal Network team at #ASHG19

Attending the American Society of Human Genetics Annual Meeting next week? We are too, and we look forward to connecting with you in multiple venues:

Wednesday, October 16

  • Visit our booth (#131) in the exhibit hall from 10am-4:30pm
  • Attend our Ancillary session:
Translating Variant Associations to Functional Insights Using the Knowledge Portal Network
12:45-2:00 pm, Marriott Marquis Houston, Tanglewood room
Jesse Engreitz and Jason Flannick will speak, with an introduction from Noël Burtt and follow-up from Maria Costanzo.
  • Attend our presentation at the Broad genomics booth (#714) from 3-4pm
  • Attend the talk by Lokendra Thakur, “Calculating principled gene priors for genetic association analysis.” 4:45-5pm, Room 317A, Level 3, Convention Center

Thursday, October 17

  • Visit our booth (#131) in the exhibit hall from 10am-4:30pm
  • Visit the poster (#1657/T) by Ben Alexander, “Systematic comparison of different evidence sources for predicting GWAS effector genes” from 2-3pm
  • Visit the poster (#1402/T) by Dylan Spalding, “Federating association analysis in type 2 diabetes to protect participants’ privacy” from 3-4pm

Friday, October 18

  • Visit our booth (#131) in the exhibit hall from 10am-3:30pm
  • Attend our presentation at the Broad genomics booth (#714) from 10-11am
  • Visit the poster (#2804/F) by Peter Dornbos, “The functional impacts of rare coding variants in 46,000 individuals on 23 quantitative phenotypes” from 2-3pm

Saturday, October 19

  • Attend the talk by Marcin von Grotthuss, “Public programmatic access to GWAS summary statistics and analytical methods.” 8:45-9am, Room 310A, Level 3, Convention Center

Tuesday, September 17, 2019

Mining insights from GWAS

Genetic association data from genome-wide association studies (GWAS) are foundational for our understanding of complex diseases and traits. But in order to apply these results to diagnosis, drug development, and treatment, we need to identify the effector genes that explain those genetic associations. This is rarely straightforward: most SNPs associated with disease are located outside of coding regions of the genome, so that their impact on genes is not obvious; and even a variant located in a protein-coding gene may actually affect a different gene. And to complicate things further, a variant that is strongly associated with disease may not have a direct impact on a gene, but may rather be "along for the ride" with a tightly linked causal variant.

To help bridge the gap between genetic association results and the effector genes that are directly involved in disease, we are aggregating additional data types—for example, transcriptional regulation, tissue specificity, curated biological annotations, and more—and integrating them, using cutting-edge computational methods, in order to mine insights from GWAS data. We present the results of these methods in interactive FOCUS (Find Orthogonal Computational Support) tables.

As a first step in implementing these methods, we needed to find a way to store many different connections between variants, genes, tissues, phenotypes, and biological annotations. We decided to use a Neo4J graph database, which holds data nodes and their relationships with each other and can support complex, scientifically meaningful queries.

Neo4J graph showing variants on chromosome 8 that are associated with glycemic phenotypes. Orange circles represent variants; pink, p-values; blue, phenotypes; red, phenotype group; green and brown, variant annotations.
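
To make the idea of nodes and relationships more concrete, here is a minimal Python sketch of a property graph and one query over it. This is an illustration only, not our Neo4J schema: the `ToyGraph` class, the `ASSOCIATED_WITH` and `IN_GROUP` relationship names, the variant identifiers, and the p-values are all hypothetical and invented for the example.

```python
from collections import defaultdict


class ToyGraph:
    """A tiny in-memory property graph: labeled nodes plus typed, directed edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., other properties}
        self.edges = defaultdict(list)  # source id -> [(relationship, target id, properties)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source, rel_type, target, **props):
        self.edges[source].append((rel_type, target, props))

    def neighbors(self, source, rel_type):
        """Return (target id, edge properties) for edges of one relationship type."""
        return [(target, props)
                for rel, target, props in self.edges[source]
                if rel == rel_type]


def variants_for_group(graph, chromosome, group, max_p):
    """Find variants on a chromosome associated with any phenotype in a group."""
    hits = []
    for node_id, node in graph.nodes.items():
        if node["label"] != "Variant" or node["chromosome"] != chromosome:
            continue
        for phenotype, props in graph.neighbors(node_id, "ASSOCIATED_WITH"):
            groups = [target for target, _ in graph.neighbors(phenotype, "IN_GROUP")]
            if group in groups and props["p_value"] <= max_p:
                hits.append((node_id, phenotype, props["p_value"]))
    return sorted(hits, key=lambda hit: hit[2])  # most significant first


# Invented example data (hypothetical identifiers and p-values).
graph = ToyGraph()
graph.add_node("variant_A", "Variant", chromosome="8")
graph.add_node("variant_B", "Variant", chromosome="8")
graph.add_node("variant_C", "Variant", chromosome="3")
graph.add_node("fasting_glucose", "Phenotype")
graph.add_node("HbA1c", "Phenotype")
graph.add_node("BMI", "Phenotype")
graph.add_node("glycemic", "PhenotypeGroup")
graph.add_node("anthropometric", "PhenotypeGroup")
graph.add_edge("fasting_glucose", "IN_GROUP", "glycemic")
graph.add_edge("HbA1c", "IN_GROUP", "glycemic")
graph.add_edge("BMI", "IN_GROUP", "anthropometric")
graph.add_edge("variant_A", "ASSOCIATED_WITH", "fasting_glucose", p_value=2e-9)
graph.add_edge("variant_A", "ASSOCIATED_WITH", "BMI", p_value=1e-12)
graph.add_edge("variant_B", "ASSOCIATED_WITH", "HbA1c", p_value=3e-6)
graph.add_edge("variant_B", "ASSOCIATED_WITH", "fasting_glucose", p_value=0.2)
graph.add_edge("variant_C", "ASSOCIATED_WITH", "fasting_glucose", p_value=1e-20)

hits = variants_for_group(graph, chromosome="8", group="glycemic", max_p=1e-5)
for variant, phenotype, p_value in hits:
    print(variant, phenotype, p_value)
```

In a real graph database the same question would be expressed as a declarative query rather than nested loops, but the shape of the traversal (variant to phenotype to phenotype group) is the same.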

We have also created pipelines to apply computational methods to the genetic association data in the Knowledge Portal Network. In brief, we are currently running:
  • MetaXcan, which integrates tissue-specific expression data from GTEx and genetic association data to predict the potential that a gene is causal for a phenotype in a given tissue;
  • DEPICT, which integrates multiple data sources including transcriptional co-regulation, Gene Ontology annotations, model organism phenotypes, and more to make several predictions: membership of a gene in a pathway; the probability of its association with a given phenotype; and the tissues or cell types that are likely to be relevant for a given trait;
  • eCAVIAR and COLOC, two methods that quantify the probability that a variant is causal in both genetic association and eQTL studies;
  • GREGOR, which integrates chromatin states with genetic associations derived from meta-analysis of the Knowledge Portal Network to generate p-values representing the significance of association between a tissue and trait; 
  • LD score regression (LDSR), which uses cell type-specific annotations and genetic association summary statistics in the Knowledge Portal Network to generate p-values representing the significance of association between a tissue and trait.

The Gene FOCUS table is accessible via the "Genes in region" tab on the Gene page:

Gene FOCUS table for PITX2

The table shows results of the methods for each gene across the region. It has two alternative views, and supports versatile sorting. The methods, data types, and table navigation are described in more detail in our downloadable help documentation for the new interface. Note that not all results are currently available for all genes in the Cerebrovascular Disease Knowledge Portal, but the Gene FOCUS table will become increasingly populated in the future as more datasets are added to the CDKP and the methods are re-run.

The Tissue FOCUS table, accessible via a link on the home page, presents results that can suggest which tissues or cell types may be relevant for a disease or trait of interest. 

To use the table, choose a phenotype of interest to see p-values for different tissues, denoting the significance with which variants associated with that phenotype are enriched in each tissue. Find complete details about the table and methods in our downloadable documentation.

This system, from data storage through the computational pipelines to the user interface, has been designed to be flexible and modular so that in the future we will be able to add new methods and data types easily and rapidly. As we actively develop these tools, we are very interested in feedback from researchers about how to improve them. Please try them out and let us know what you think!

Thursday, July 11, 2019

Join our instructional webinar July 18

Join us at noon EDT on Thursday, July 18 for an interactive workshop featuring gene-specific resources in the Knowledge Portal Network portals. We’ll first cover two new types of information on T2D gene associations: predictions of T2D effector genes, and gene-level T2D association scores. Then we'll delve into the Gene page with its comprehensive information for a wide variety of phenotypes, focusing on how the Knowledge Portals can help researchers prioritize genes within a GWAS locus for further investigation. See below for the agenda.

This session may be attended as an online webinar (connection information below) or in person at the Broad Institute in the 415 Main St Board room (mezzanine level), where lunch will be provided.

We hope you will attend and bring your questions and suggestions!


Introduction - Noël Burtt

Gene-specific resources in the Knowledge Portals - Maria Costanzo

Preview of upcoming features - Ben Alexander

Q & A - the T2DKP team

Connection Information:

Join Zoom Meeting

One tap mobile
+16468769923,,619080603# US (New York)
+16465588656,,619080603# US (New York)

Dial by your location
        +1 646 876 9923 US (New York)
        +1 646 558 8656 US (New York)
Meeting ID: 619 080 603

Friday, April 19, 2019

Bottom line p-values now available in the CDKP

When genetic association analysis for a phenotype is performed in multiple studies, many different p-values representing the significance of that association are generated. How do we know which one is the most accurate?

To complicate things even further, the populations tested in different datasets often overlap with each other. How can we avoid double-counting associations?

Bottom line analysis provides an answer to both of these questions. It integrates results over multiple datasets and accounts for sample overlap between datasets to generate a single p-value representing the significance of the association between a variant and a phenotype.
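
For readers who want a feel for how results from several datasets can be combined into one p-value, here is a minimal Python sketch of a standard fixed-effect, inverse-variance weighted meta-analysis. It is a simplified illustration under stated assumptions, not the bottom line pipeline itself: it assumes the datasets are independent, so it omits the correction for sample overlap that bottom line analysis applies, and the function name and input values are hypothetical.

```python
import math


def inverse_variance_meta(estimates):
    """Combine per-dataset effect estimates for one variant and phenotype.

    estimates: list of (beta, standard_error) pairs, one per dataset.
    Assumes the datasets do not share samples (no overlap correction).
    Returns the combined beta, its standard error, and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]   # precise studies count more
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return beta, se, p_value


# Invented example: two datasets reporting the same variant.
beta, se, p_value = inverse_variance_meta([(0.10, 0.05), (0.20, 0.05)])
print(beta, se, p_value)
```

When datasets do share participants, treating them as independent in this way overstates the evidence, which is why accounting for overlap matters.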

Now, you can access bottom line p-values for individual variants on Variant pages in the Cerebrovascular Disease Knowledge Portal as well as in the other portals of the Knowledge Portal Network: Type 2 Diabetes KP, Cardiovascular Disease KP, and Sleep Disorder KP. To view bottom line p-values, open the "associations at a glance" section of the Variant page (see an example):

Choose to view "Bottom line analysis" in the PheWAS plot, and then mouse over a point to see the p-value:

We thank our colleagues at the University of Michigan, who developed the METAL method used in this analysis. Please note that this method as instantiated in the CDKP is experimental; be sure to compare the results with those from individual datasets, and contact us with any questions.

Thursday, April 18, 2019

GPS information for BMI and obesity now available

Genome-wide polygenic scores (GPS) have great potential for helping to advance research on complex diseases and traits. Not only can they help predict individual genetic risk, but they can also help us understand the physiology of disease, by identifying groups at the extremes of risk whose clinical profiles can be studied or who may be enrolled in clinical trials.

Following up on their previous work that generated GPSs for five complex diseases, co-lead authors Amit Khera and Mark Chaffin, along with senior author Sekar Kathiresan and colleagues, have now developed a GPS for body mass index (BMI) and obesity, published today in Cell. To help promote obesity research, the authors have provided an open-access file listing the variants and weights that comprise the GPS. That file is now available for download from the Data page of our sister Knowledge Portal, the Cardiovascular Disease Knowledge Portal.

To generate this GPS, Khera and colleagues started with a large published genome-wide association study (GWAS) for BMI in more than 300,000 individuals (Locke et al., 2015) and applied an algorithm that assigned a weight to each of 2.1 million variants, taking into account factors such as the proportion of variants with non-zero effect size and the degree of correlation between a variant and its neighbors. They validated the GPS by applying it to nearly 120,000 UK Biobank participants, finding that the score was strongly correlated with measured BMI, and then applied it to four independent testing datasets.
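
Once the weights exist, computing a score for one person is conceptually simple: it is a weighted sum of that person's allele counts. The short Python sketch below illustrates this under simplifying assumptions. The function name, variant identifiers, weights, and genotypes are invented for the example, the downloadable file's actual format is not shown here, and treating a missing genotype as zero is an assumption of this sketch rather than a recommendation.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts.

    dosages: variant id -> number of effect alleles carried (0, 1, or 2,
             or a fractional imputed dosage).
    weights: variant id -> weight assigned to that variant.
    Variants missing from `dosages` contribute 0 (an assumption of this sketch).
    """
    return sum(weight * dosages.get(variant, 0.0)
               for variant, weight in weights.items())


# Invented example with three hypothetical variants.
weights = {"variant_1": 0.5, "variant_2": -0.2, "variant_3": 0.1}
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
score = polygenic_score(person, weights)
print(score)
```

A raw score like this is usually interpreted relative to the distribution of scores in a reference population, for example as a percentile.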

We don't have space here to cover the many interesting details uncovered by the researchers, but overall, this work shows that a high GPS strongly predicts increased risk of severe obesity, cardiometabolic disease, and all-cause mortality. Those with the very highest GPS had a level of risk for obesity similar to that conferred by a rare monogenic mutation in the MC4R gene.

The GPS has the potential to be a powerful tool for people struggling with overweight and obesity. "Importantly, we are in the early days of identifying how we can best inform and empower patients to overcome health risks in their genetic background," said Khera in a press release from the Broad Institute. "We are incredibly excited about the potential to improve health outcomes."

We invite you to read the paper, take a look at the file of variants and weights freely available from the CVDKP Data page, and contact us with any questions!

Friday, March 1, 2019

Faster access to tools from the CDKP home page

We've rearranged some of the links on the Cerebrovascular Disease Knowledge Portal home page, as a first step towards offering a central location for analysis tools. The previous link to the Variant Finder tool has been replaced by a link to the new Analysis modules page:

The new page, shown below, offers access to two analysis tools.

  • The Interactive Manhattan plot allows you to choose a phenotype and view variant associations across the genome for that phenotype. We've added phenotype selection options to both the Analysis modules page and the Manhattan plot page, making it easier to switch your view between phenotypes. The default view on the Manhattan plot page shows the largest dataset for a phenotype, but when multiple datasets exist, you can select any one to display. For many datasets, LD clumping is available at several r² thresholds. Clumping reduces redundancy due to association signals from linked variants, pinpointing the most strongly associated variant in a group.
  • The Variant Finder is a versatile tool that allows you to set multiple criteria (phenotype, p-value, size and direction of effect, and more) and retrieve the set of variants meeting those criteria.
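
The LD clumping mentioned for the Manhattan plot can be sketched as a simple greedy procedure: take the most significant variant, group with it every remaining variant whose r² with it meets the threshold, and repeat. The Python below is an illustration only. It is not the portal's implementation, it ignores refinements such as distance windows and p-value cutoffs, and the function name, variant identifiers, p-values, and r² values are all hypothetical.

```python
def clump(p_values, r2, threshold):
    """Greedy LD clumping.

    p_values:  variant id -> association p-value.
    r2:        frozenset({variant_a, variant_b}) -> pairwise r^2;
               pairs not listed are treated as unlinked (r^2 = 0).
    threshold: minimum r^2 for a variant to join an index variant's clump.
    Returns a dict mapping each index variant to the variants clumped with it.
    """
    ordered = sorted(p_values, key=p_values.get)  # most significant first
    assigned = set()
    clumps = {}
    for index_variant in ordered:
        if index_variant in assigned:
            continue  # already absorbed into a stronger variant's clump
        members = [
            other for other in ordered
            if other != index_variant
            and other not in assigned
            and r2.get(frozenset((index_variant, other)), 0.0) >= threshold
        ]
        clumps[index_variant] = members
        assigned.add(index_variant)
        assigned.update(members)
    return clumps


# Invented example data.
p_values = {"v1": 1e-10, "v2": 1e-6, "v3": 1e-8, "v4": 0.01}
r2 = {
    frozenset(("v1", "v2")): 0.9,
    frozenset(("v2", "v3")): 0.3,
    frozenset(("v3", "v4")): 0.05,
}
strict = clump(p_values, r2, threshold=0.5)
loose = clump(p_values, r2, threshold=0.04)
print(strict)
print(loose)
```

Lowering the threshold merges more variants into each clump, so fewer index variants remain on the plot.
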

The new Analysis Modules page will be the central access point for new analysis tools as they are developed, so check back often for updates!

Monday, January 28, 2019

New dataset added to the CDKP: WMHV GWAS

We are pleased to announce that a new dataset has been added to the Cerebrovascular Disease Knowledge Portal: Cerebral WMHV GWAS 2019, from the recently published study "Genetic variation in PLEKHG1 is associated with white matter hyperintensities" (Traylor et al. 2019). This genome-wide association study of 11,226 individuals led to the discovery of a genome-wide significant locus in an intron of PLEKHG1.

Results from this study may be viewed and searched in the CDKP at these locations:

  • on Gene pages (view an example) for the "Cerebral white matter hyperintensities" phenotype
  • on Variant pages (view an example) in the Associations at a glance section, the Associations across all datasets graphic and table, and in LocusZoom static plots
LocusZoom plot of WMHV associations in the PLEKHG1 gene

  • from the View full genetic association results for a phenotype search on the home page: first select the cerebral white matter hyperintensities phenotype and then on the resulting page, select the Cerebral WMHV GWAS 2019 dataset.
Summary statistics may also be downloaded from the CDKP Downloads page.

Please check out these new results and let us know if you have comments or questions!