Wednesday, September 26, 2018

New data and new features in the CDKP

We are pleased to announce the addition of two new summary-level datasets to the Cerebrovascular Disease Knowledge Portal. MEGASTROKE is a genome-wide association study of ~520,000 subjects, including controls. Using stroke risk scores and LD scores, this study discovered 22 novel stroke-related loci. With the addition of this study, the sample size for stroke-related associations in the CDKP increases more than 5-fold. Since the METASTROKE results previously available in the CDKP are a subset of MEGASTROKE, they are no longer displayed as a separate set.

Another new dataset in the CDKP is the Han Population Taiwan-NGCM study of ~2,000 subjects, a genome-wide association study that discovered novel loci for large- and small-vessel ischemic strokes. Results from both of these datasets may be searched using the Variant Finder tool and may be browsed:

• On Gene Pages in the Common variants and High-impact variants tables and in LocusZoom plots;

• On Variant Pages in the Associations at a glance section, the Associations across all datasets section, and in LocusZoom plots;

• From the View full genetic association results for a phenotype search on the home page: first select a phenotype, then select a dataset on the resulting page.

In addition to new results, the CDKP now includes four new features that simplify the interpretation of genetic association data, making it easier to pinpoint variants and datasets that are informative for a disease or phenotype of interest.

"Clumping" variants by linkage disequilibrium

The first step in getting an overview of the results of a particular experiment is typically to plot variant associations vs. chromosomal location, in a so-called "Manhattan plot." These plots are available from the CDKP home page. After choosing a phenotype there, you may select a dataset, and the Manhattan plot is displayed above a table of the top variants.

Now, in addition to selecting a dataset to view associations, you may select a threshold for linkage disequilibrium (LD) in order to reduce the number of linked variants that represent a single association signal. For example, when viewing the "All ischemic stroke" phenotype in the NINDS SiGN 2016 dataset without "clumping" variants by LD (r² = 1), 9 of the top 25 significantly associated variants are near the KCNQ3 gene. Setting the most stringent LD threshold (r² = 0.1) reduces that number to just 2, because only the most significant associations are displayed after clumping variants by LD. Intermediate LD thresholds of r² = 0.2, 0.4, 0.6, or 0.8 may also be set, allowing more versatility in this analysis.
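For readers who want an intuition for what clumping does, here is a minimal sketch of one common greedy approach. It is an illustration only, not the portal's actual implementation, which is not described here. The function name, the variant IDs, the p-values, and the pairwise LD values are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def clump_by_ld(
    p_values: Dict[str, float],
    r2: Dict[FrozenSet[str], float],
    threshold: float,
) -> List[str]:
    """Greedy LD clumping (illustrative sketch, not the CDKP algorithm).

    Repeatedly keep the most significant remaining variant and discard
    every other variant whose r2 with it exceeds the threshold.
    Pairs missing from `r2` are assumed to be unlinked (r2 = 0).
    """
    remaining = sorted(p_values, key=p_values.get)  # most significant first
    kept: List[str] = []
    while remaining:
        index_variant = remaining.pop(0)
        kept.append(index_variant)
        remaining = [
            v for v in remaining
            if r2.get(frozenset((index_variant, v)), 0.0) <= threshold
        ]
    return kept


# Hypothetical variants and values, for illustration only.
example_p = {"rsA": 1e-12, "rsB": 1e-10, "rsC": 1e-9, "rsD": 1e-8}
example_r2 = {
    frozenset(("rsA", "rsB")): 0.9,
    frozenset(("rsA", "rsC")): 0.3,
    frozenset(("rsB", "rsC")): 0.5,
}

for t in (1.0, 0.4, 0.1):
    print(t, clump_by_ld(example_p, example_r2, t))
```

In this sketch a threshold of 1 removes nothing, because no pair can have r2 above 1. That matches the "no clumping" setting described above. Lower thresholds collapse more linked variants into a single representative.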

New Region page

The Gene page of the CDKP (see an example) integrates and summarizes information about the associations of variants across the region of a gene. Now, you can see the same integrated summary for any region of the genome, not just the areas surrounding protein-coding genes. Simply enter a chromosome and coordinates in the home page search box.

The resulting page resembles a Gene page. The traffic light integrates all associations across the region to give you an immediate indication of whether there are significant associations found in any of the datasets in the CDKP. Further down the page, tools and displays let you drill down to the specifics for a phenotype or variant of interest. This new Region page provides a way to explore any part of the genome in great detail.

PheWAS graphic on the Variant page

Previously, the Variant page of the CDKP displayed significant associations for each variant in a graphic that showed a color-coded box for each phenotype-dataset combination. But the rapidly increasing number of phenotypes becoming available from biobank studies has made this view unsustainably large. In its place, we have incorporated a phenome-wide association study (PheWAS) visualization developed at the University of Michigan. The graphic shows at a glance which phenotype associations are most significant for a particular variant. Mouse over a point to see more details.

All Associations graphic on the Variant page

The PheWAS graphic distills variant associations in order to highlight the most significant ones. But suppose you want to drill down to the details and explore associations in every dataset, viewing parameters like sample size, odds ratio, and more? There's a graphic for that too: our new All Associations interactive graphic, located in the "Associations across all datasets" section of the Variant page. Start by using keywords to filter phenotypes. Filtering allows you to view one specific phenotype, several related phenotypes, or phenotypes in a broad category, such as ischemic stroke; both the graphic and the table below it change in response to phenotype filtering. There are also options to filter by setting ranges of p-values and/or sample sizes.

The graph plots p-value (vertical axis) vs. dataset sample size (horizontal axis) for each association. Points in the graph are triangular; whether the triangle points up or down indicates a positive or negative direction of effect, respectively. Mousing over a point shows you more details about the association and the dataset. This graphic can help you evaluate whether an association is likely to be real: a genuine signal should increase in significance (i.e., decrease in p-value) with increasing sample size.
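The rule of thumb in the last sentence can also be checked numerically. The sketch below is one informal way to do that, assuming you have copied (sample size, p-value) pairs for a single variant and phenotype out of the table. The function name and the example numbers are hypothetical, and the score is a simple pairwise trend measure, not a formal statistical test or anything the portal computes.

```python
import math
from itertools import combinations
from typing import List, Tuple


def significance_trend(associations: List[Tuple[int, float]]) -> float:
    """Score in [-1, 1] for how consistently significance rises with sample size.

    Each association is (sample_size, p_value). Every pair of datasets is
    compared: a pair is concordant when the larger dataset also has the
    larger -log10(p). Returns (concordant - discordant) / compared pairs,
    or 0.0 when no pair can be compared.
    """
    points = [(n, -math.log10(p)) for n, p in associations]
    concordant = discordant = 0
    for (n1, s1), (n2, s2) in combinations(points, 2):
        direction = (n1 - n2) * (s1 - s2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    compared = concordant + discordant
    return (concordant - discordant) / compared if compared else 0.0


# Hypothetical values, for illustration only.
looks_genuine = [(2_000, 1e-3), (20_000, 1e-6), (500_000, 1e-12)]
looks_doubtful = [(2_000, 1e-8), (20_000, 1e-3), (500_000, 0.2)]

print(significance_trend(looks_genuine))
print(significance_trend(looks_doubtful))
```

A score near 1 means larger datasets consistently give smaller p-values, which is the pattern described above for a genuine signal. A score near -1 means the association weakens as sample size grows.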

Stay in touch!

Like the rest of the CDKP, these features are under continuous development. Please give them a try and let us know what you think.