1. Overview

          NPdenovo is a database that integrates de novo mutations identified by whole-genome or whole-exome sequencing in autism spectrum disorder (ASD), intellectual disability (ID), epileptic encephalopathy (EE), schizophrenia (SCZ) and unaffected siblings (Control). NPdenovo supports several ways to browse and search de novo mutations, genes associated with neuropsychiatric disorders, enriched gene ontology terms and spatio-temporal expression in the human brain. In addition, NPdenovo provides tools for analyzing neuropsychiatric data and the spatio-temporal transcriptome, including BLAT, overlap genes across diseases, custom extreme mutations and co-expression. Here is the overall flow chart of our experiment.

2. Data Summary

          At present, the NPdenovo database contains a total of 41225 variants, including 37511 SNVs and 3714 Indels. Users can get more details here.

3. Browse

          All de novo mutations of four neuropsychiatric disorders (ASD, EE, ID, SCZ) and controls, collected from currently available trio-based WES/WGS studies, are listed here. You can browse de novo mutations by specific disorder, effect or chromosome. Results are sorted by "Total damaging score" by default; "Cytoband" and "Gene symbol" are available as alternative sort options. Relatively long InDel sequences are hidden automatically for conciseness. To show one, click the triangle button in the position column, and the sequence is displayed on the next line. See graphic below.

          All genes of four neuropsychiatric disorders (ASD, EE, ID, SCZ) collected from currently available trio-based WES/WGS studies are listed here. You can browse associated genes by disorder, associated level or chromosome. This table is sorted by the p-value of the selected disorder.
          The "UCSC" column of the result provides a hyperlink to a UCSC page that displays all mutations associated with neuropsychiatric disorders within the gene region. The following graph shows the UCSC page for SCN2A. The red rectangle on the left shows the labels of de novo mutations. The mutation sites are displayed on the graph.
          Associated levels are classified as follows:

associated level    p-value
strong              p-value < 0.0001
suggestive          0.0001 ≤ p-value < 0.001
positive            0.001 ≤ p-value < 0.01
possible            0.01 ≤ p-value < 0.05
negative            p-value ≥ 0.05, or there is no damaging mutation in this gene
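
          The classification in the table above can be expressed as a small function. This is a minimal sketch only; the function name `associated_level` and the `has_damaging_mutation` flag are hypothetical and are not part of NPdenovo.

```python
from typing import Optional


def associated_level(p_value: Optional[float],
                     has_damaging_mutation: bool = True) -> str:
    """Map a gene's p-value to the associated level from the table above."""
    # A gene without any damaging mutation is "negative" regardless of p-value.
    if p_value is None or not has_damaging_mutation:
        return "negative"
    if p_value < 0.0001:
        return "strong"
    if p_value < 0.001:
        return "suggestive"
    if p_value < 0.01:
        return "positive"
    if p_value < 0.05:
        return "possible"
    return "negative"
```

          The lower bound of each interval is inclusive, so a p-value of exactly 0.0001 falls in "suggestive", not "strong".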

          The expression levels of genes in the human brain are listed here. You can browse gene expression levels by period, region, or chromosome. The result is sorted by gene symbol. The abbreviations and their meanings are listed below.

Periods                                                 Regions
period1   embryonic (4PCW~8PCW)                         V1C  primary visual (V1) cortex
period2   early fetal (8PCW~10PCW)                      ITC  inferior temporal cortex
period3   early fetal (10PCW~13PCW)                     IPC  posterior inferior parietal cortex
period4   early mid-fetal (13PCW~16PCW)                 A1C  primary auditory (A1) cortex
period5   early mid-fetal (16PCW~19PCW)                 STC  superior temporal cortex
period6   late mid-fetal (19PCW~24PCW)                  M1C  primary motor (M1) cortex
period7   late fetal (24PCW~38PCW)                      S1C  primary somatosensory (S1) cortex
period8   neonatal and early infancy (0M (birth)~6M)    VFC  ventrolateral prefrontal cortex
period9   late infancy (6M~12M)                         MFC  medial prefrontal cortex
period10  early childhood (1Y~6Y)                       DFC  dorsolateral prefrontal cortex
period11  middle and late childhood (6Y~12Y)            OFC  orbital prefrontal cortex
period12  adolescence (12Y~20Y)                         STR  striatum
period13  young adulthood (20Y~40Y)                     HIP  hippocampus
period14  middle adulthood (40Y~60Y)                    AMY  amygdala
period15  late adulthood (60Y+)                         MD   mediodorsal nucleus of the thalamus
                                                        CBC  cerebellar cortex

          Functional enrichment analysis, covering "Gene Ontology", "Transcription Factor Target", "MicroRNA Target", "KEGG", "Wikipathways" and "Pathway Commons", was performed with WebGestalt (WEB-based GEne SeT AnaLysis Toolkit). For "Gene Ontology", each disease has a GO map page on which you can click the red nodes to view the genes of interest, with descriptions and links to Ensembl. The "Genes related" column shows a triangle; click it to see the genes contained in the enriched category. P-values in the results are adjusted with the BH procedure (proposed by Benjamini & Hochberg). If several disorders share the same enriched category, hyperlinked names of the other disorders are listed in the "Other disease" column. See diagram below.
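
          For readers unfamiliar with the BH adjustment mentioned above, the standard Benjamini-Hochberg step-up procedure can be sketched as follows. This is a from-scratch illustration, not WebGestalt's own code; the function name `bh_adjust` is hypothetical.

```python
from typing import List


def bh_adjust(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

          Each raw p-value is scaled by m/rank, where m is the number of tests and rank is its position in ascending order; the running minimum keeps a smaller raw p-value from receiving a larger adjusted value than a larger one.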
          

4. Search

          You can search information by gene symbol, gene id, cytoband, or chromosome coordinate. You can also download the search results.
    Search strategy:
          1. Gene symbol: Fuzzy search is supported and is case-insensitive. For example, if you enter "SCN" or "scn", all genes whose symbol contains "SCN" in uppercase, lowercase or mixed case will be listed in the results.
          2. Gene id: Only exact search is supported. If you enter "6326", only the gene whose gene id equals "6326" will be listed.
          3. Cytoband: Fuzzy search is supported. If you enter "12q13" or "12q13.", all genes on 12q13 will be listed. If you enter "12q", all genes on 12q will be listed.
          4. Chromosome coordinate: You can search a region of a chromosome (e.g. "chr2:166210819-1662108190") or a single position on a chromosome (e.g. "chr2:166210819").
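
          The four search strategies above can be sketched as follows. This is only an illustration of the matching rules as described; the `GENES` records, the function names and the coordinate pattern are assumptions and do not reflect the actual NPdenovo implementation or schema.

```python
import re
from typing import Dict, List, Optional, Tuple

# Toy records for illustration only; the real database schema is not shown here.
GENES: List[Dict[str, str]] = [
    {"symbol": "SCN1A", "gene_id": "6323", "cytoband": "2q24.3"},
    {"symbol": "SCN2A", "gene_id": "6326", "cytoband": "2q24.3"},
    {"symbol": "KMT2D", "gene_id": "8085", "cytoband": "12q13.12"},
]


def search_symbol(query: str) -> List[str]:
    """Fuzzy search: case-insensitive substring match on the gene symbol."""
    q = query.strip().lower()
    return [g["symbol"] for g in GENES if q in g["symbol"].lower()]


def search_gene_id(query: str) -> List[str]:
    """Exact search: the gene id must equal the query."""
    q = query.strip()
    return [g["symbol"] for g in GENES if g["gene_id"] == q]


def search_cytoband(query: str) -> List[str]:
    """Fuzzy search: prefix match, ignoring a trailing dot ("12q13." -> "12q13")."""
    q = query.strip().rstrip(".")
    return [g["symbol"] for g in GENES if g["cytoband"].startswith(q)]


# Assumed input shape: "chrN:start-end" for a region or "chrN:pos" for a point.
COORD_RE = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)(?:-(\d+))?$")


def parse_coordinate(query: str) -> Optional[Tuple[str, int, int]]:
    """Parse a coordinate query; a single position becomes a one-base interval."""
    m = COORD_RE.match(query.strip())
    if m is None:
        return None
    start = int(m.group(2))
    end = int(m.group(3)) if m.group(3) else start
    return m.group(1), start, end
```

          A prefix match is used for cytobands so that "12q" matches bands on 12q without also matching bands on 2q.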

          Enter one or more gene symbols, gene ids, cytobands or chromosome coordinates into the input box on the left side of the advanced search page, or upload a file; items may be separated by any whitespace character. You can restrict the search by disorder, gene region, effect, mutation type, brain expression level, chromosome coordinate, etc. In the dataset module, if you select "Spatio-temporal brain expression dataset", related data from the brain expression database will be displayed. In the brain expression module, if you select a minimum brain expression level, only genes whose brain expression level is greater than that threshold will be shown. For the search strategy, please see the quick search module.
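
          As a rough sketch of the two rules described above (whitespace-separated input items and the minimum brain expression threshold), assuming hypothetical function names and a plain gene-to-level mapping:

```python
from typing import Dict, List


def parse_query_items(text: str) -> List[str]:
    """Split pasted or uploaded text on any run of whitespace (spaces, tabs, newlines)."""
    return text.split()


def filter_by_expression(levels: Dict[str, float], minimum: float) -> List[str]:
    """Keep genes whose expression level is strictly greater than the minimum."""
    return sorted(gene for gene, level in levels.items() if level > minimum)
```

          Calling `str.split()` without arguments treats any run of whitespace as a single separator, so mixed tabs, spaces and line breaks in an uploaded file are handled the same way.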

5. Gene detail

          Throughout this database, every gene symbol is hyperlinked. Clicking the hyperlink displays the detailed information for that gene, which consists of "Gene information", "Mutation information", "Mutation annotation", "Brain expression" and "Co-expression".

          There are 4 modules: basic information, disease annotation, OMIM annotation and MGI annotation. Several hyperlinks (including NCBI, UCSC, HGNC, OMIM, Ensembl, HPRD, Vega and GeneCards) are provided in the basic information module. The p-value and associated level for each disorder are also provided.

          Basic mutation information is listed here. You can see more detailed mutation information by clicking the triangle button in the "Annotation" column. See diagram below:

          There are 9 modules: function annotation, conserved and constrained region annotation, dbSNP annotation, 1000g annotation, ESP and CG annotation, genome feature annotation, wgEncodeBroadHmm feature annotation, RNA related and miRNA target. The function annotation module includes 7 prediction tools: "SIFT", "PolyPhen2 HDIV", "PolyPhen2 HVAR", "LRT", "Mutation Taster", "Mutation Assessor" and "FATHMM". The prediction result is shown when you hover the mouse over a tool.

          This page includes brain expression data from 3 databases: Human Brain Transcriptome, BrainSpan Atlas and Human Laser Micro-Dissection data. A brain expression graph is also shown, which summarizes how the brain expression level changes from the embryonic stage to death. See graph below.

          1. Human Brain Transcriptome (http://hbatlas.org/): The Human Brain Transcriptome (HBT) project is a public database containing transcriptome data and associated metadata for the developing and adult human brain. The project provides genome-wide, exon-level transcriptome data generated using the Affymetrix GeneChip Human Exon 1.0 ST Arrays from over 1,340 tissue samples sampled from both hemispheres of postmortem human brains. Specimens range from embryonic development to adulthood and are representative of both males and females from multiple ethnicities. A total of 16 brain regions were sampled: the cerebellar cortex, mediodorsal nucleus of the thalamus, striatum, amygdala, hippocampus, and 11 areas of the neocortex.
          2. BrainSpan Atlas (http://www.brainspan.org/): The BrainSpan atlas is a foundational resource for studying transcriptional mechanisms involved in human brain development, including Developmental Transcriptome, Prenatal LMD Microarray, ISH and Reference Atlas.
          3. Human Laser Micro-Dissection data: High-resolution neuroanatomical transcriptional profiles of ~300 distinct structures spanning the entire brain for four midgestational prenatal specimens. These data are freely accessible as part of the BrainSpan Atlas of the Developing Human Brain (http://www.brainspan.org/) via the Allen Brain Atlas data portal (http://www.brain-map.org/).

          This page shows spatio-temporal co-expression data from 3 databases: Human Brain Transcriptome, BrainSpan Atlas and Human Laser Micro-Dissection data. Genes with a Pearson correlation coefficient larger than 0.8 are shown in descending order of the coefficient. In addition, you can click the hyperlink above the tables to run a more detailed co-expression analysis.
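
          The selection rule described above (Pearson correlation coefficient larger than 0.8, sorted in descending order) can be sketched as follows. The function names and the profile layout (one list of expression values per gene, in the same sample order) are assumptions for illustration, not the database's actual code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def coexpressed_genes(target: str,
                      profiles: Dict[str, List[float]],
                      threshold: float = 0.8) -> List[Tuple[str, float]]:
    """List genes whose correlation with the target exceeds the threshold."""
    hits = []
    for gene, profile in profiles.items():
        if gene == target:
            continue
        r = pearson(profiles[target], profile)
        if r > threshold:  # NaN compares False, so constant profiles are skipped
            hits.append((gene, r))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```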

6. Analysis

          BLAT: The BLAST-Like Alignment Tool is a pairwise sequence alignment algorithm that was developed to assist in the assembly and annotation of the human genome. By either entering FASTA-format sequences or uploading a file, you can align your sequences against one of 5 databases: NPdenovo-protein, NPdenovo-nucleotide, Human-protein, Human-RefSeqGene and GRCH37/hg19. NPdenovo-protein and NPdenovo-nucleotide contain, respectively, the protein and RefSeq gene sequences identified in the current study as carrying de novo mutations associated with neuropsychiatric disorders. Human-protein and Human-RefSeqGene contain, respectively, human protein and RefSeq gene sequences. GRCH37/hg19 is the human genome downloaded from UCSC.
          You can also set some advanced parameters such as below.
    Advanced options:
          -tileSize=N      Sets the size of match that triggers an alignment. Usually between 8 and 12. Default is 11 for DNA and 5 for protein.
          -stepSize=N      Sets the spacing between tiles. Default is tileSize.
          -minMatch=N      Sets the number of tile matches. Usually set from 2 to 4. Default is 2 for nucleotide, 1 for protein.
          -minScore=N      Sets the minimum score. This is the matches minus the mismatches minus some sort of gap penalty. Default is 30.
          -minIdentity=N   Sets the minimum sequence identity (in percent). Default is 90 for nucleotide searches, 25 for protein or translated protein searches.
          -maxGap=N        Sets the size of the maximum gap between tiles in a clump. Usually set from 0 to 3. Default is 2. Only relevant for minMatch > 1.
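
          To show how the advanced options above fit together, the sketch below assembles an argument list for the standalone `blat` command-line program using the documented defaults. It only builds the command and does not run it. The function name and the file names are hypothetical, and how the NPdenovo server invokes BLAT internally is not stated in this document.

```python
from typing import List, Optional


def blat_command(database: str, query: str, output: str,
                 protein: bool = False,
                 tile_size: Optional[int] = None,
                 step_size: Optional[int] = None,
                 min_match: Optional[int] = None,
                 min_score: int = 30,
                 min_identity: Optional[int] = None,
                 max_gap: int = 2) -> List[str]:
    """Build a blat argument list, filling in the defaults listed above."""
    tile = tile_size if tile_size is not None else (5 if protein else 11)
    step = step_size if step_size is not None else tile  # default is tileSize
    match = min_match if min_match is not None else (1 if protein else 2)
    identity = min_identity if min_identity is not None else (25 if protein else 90)
    args = ["blat", database, query]
    if protein:
        # Standalone blat needs the sequence types stated for protein searches.
        args += ["-t=prot", "-q=prot"]
    args += [
        f"-tileSize={tile}",
        f"-stepSize={step}",
        f"-minMatch={match}",
        f"-minScore={min_score}",
        f"-minIdentity={identity}",
        f"-maxGap={max_gap}",
        output,
    ]
    return args
```

          For example, `blat_command("hg19.2bit", "query.fa", "out.psl")` yields a nucleotide search with a tile size of 11 and a minimum identity of 90; the list can be passed to `subprocess.run` if `blat` is installed locally.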


          With the overlap genes analysis, you can find the genes shared among ASD, EE, ID and SCZ. You can also filter the results by selecting specific effects, gene regions and associated levels. There are 5 associated levels defined according to the p-value: "strong" (p-value < 0.0001), "suggestive" (0.0001 ≤ p-value < 0.001), "positive" (0.001 ≤ p-value < 0.01), "possible" (0.01 ≤ p-value < 0.05) and "negative" (p-value ≥ 0.05, or no damaging mutation).
          The results list, for each gene, the number of disorders that share it, together with its p-value and associated level. In the last column of the results, every gene has a co-expression icon; click this icon to see the co-expression network of the chosen gene. All genes in the results are hyperlinked, so you can get detailed information on a gene by simply clicking on it.
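
          Conceptually, the overlap analysis above amounts to counting in how many disorders each gene appears. A minimal sketch, assuming hypothetical gene names and a hypothetical `overlap_genes` helper (the filters on effect, gene region and associated level are omitted):

```python
from typing import Dict, List, Set


def overlap_genes(genes_by_disorder: Dict[str, Set[str]],
                  min_shared: int = 2) -> Dict[str, List[str]]:
    """Return genes found in at least `min_shared` disorders, with those disorders."""
    membership: Dict[str, List[str]] = {}
    for disorder, genes in genes_by_disorder.items():
        for gene in genes:
            membership.setdefault(gene, []).append(disorder)
    return {gene: sorted(disorders)
            for gene, disorders in membership.items()
            if len(disorders) >= min_shared}


# Toy input; gene names are placeholders, not database content.
example = {
    "ASD": {"GENE_A", "GENE_B"},
    "EE": {"GENE_A"},
    "ID": {"GENE_A", "GENE_C"},
    "SCZ": {"GENE_B"},
}
shared = overlap_genes(example)
```

          The length of each disorder list corresponds to the number of disorders sharing that gene.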

          Extreme DNMs are defined as rare and damaging mutations in the CDS region, which are more likely to be pathogenic. You can also define extreme mutations yourself. In the Damage prediction field, you can choose whichever prediction tools you want. You can also set the minimum number of prediction tools that must predict the mutation to be damaging. The default is 6, which means that mutations predicted as damaging by at least 6 tools will be displayed. In the dbSNP filters, 1000g filters, and ESP and CG filters, mutations are filtered against the databases you choose. In the 1000g filters and the ESP and CG filters, the minor allele frequency (MAF) can be set as a filtering threshold; the default is 0.01.
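
          One way to read the custom extreme mutation rule above is sketched below. The tool names come from the function annotation module; everything else is an assumption made for illustration: the function name, the input layout, treating a missing MAF as rare, and using a strict "less than" comparison against the MAF threshold.

```python
from typing import Dict, Optional, Sequence

TOOLS = ("SIFT", "PolyPhen2 HDIV", "PolyPhen2 HVAR", "LRT",
         "Mutation Taster", "Mutation Assessor", "FATHMM")


def is_extreme(damaging: Dict[str, bool],
               mafs: Dict[str, Optional[float]],
               tools: Sequence[str] = TOOLS,
               min_damaging: int = 6,
               maf_threshold: float = 0.01) -> bool:
    """Check whether a mutation is both damaging and rare under the chosen settings."""
    # Count how many of the selected tools call the mutation damaging.
    n_damaging = sum(1 for tool in tools if damaging.get(tool, False))
    if n_damaging < min_damaging:
        return False
    # Assumption: a mutation absent from a frequency database (None) counts as rare.
    return all(maf is None or maf < maf_threshold for maf in mafs.values())
```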

           The spatio-temporal co-expression analysis displays a co-expression network. You can set a number of parameters to get the network you want. Node size reflects mutation strength: a larger node refers to a stronger mutation. Node color reflects correlation: a darker color denotes a closer correlation with the central node. You can also set a maximum number of nodes for the network, in which case only the genes closest to the central gene will be displayed. See graph below.

        In the result, several tables of the connections will also be displayed. You can show or hide a table by clicking the triangle button to the right of the gene symbol (marked by a red circle). See diagram below. The co-expression results are sorted by Pearson correlation coefficient in descending order. All of the tables can be downloaded.
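
          The effect of the maximum-nodes setting and the descending sort can be illustrated with a short sketch. It assumes that "closest to the central gene" means the highest Pearson correlation coefficient and that the limit counts neighbor genes only; the function name and gene names are hypothetical.

```python
from typing import Dict, List, Tuple


def network_nodes(correlations: Dict[str, float],
                  max_nodes: int) -> List[Tuple[str, float]]:
    """Keep the `max_nodes` genes most strongly correlated with the central gene."""
    ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_nodes]


# Toy correlations with a central gene; values are placeholders.
neighbors = network_nodes({"GENE_A": 0.95, "GENE_B": 0.81, "GENE_C": 0.99}, 2)
```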

7. Submit de novo mutation

          If you have identified de novo mutations in neuropsychiatric disorders, please feel free to submit your data. Your contribution would be of great help for our future development. Coordinates must be based on GRCh37 (hg19), and the data must be converted into a tab-separated file. The following picture shows an example of a submitted file.

8. Download

          Extreme de novo mutations: This file contains all the extreme de novo mutations.
          Extreme de novo mutations and annotations: Extreme de novo mutations annotated against several databases, such as 1000 Genomes, dbSNP, ESP6500, ESP5400, etc.
          De novo mutations in coding region: This file contains all the de novo mutations in coding region.
          De novo mutations and annotations in coding region: De novo mutations in coding region annotated against several databases, such as 1000 Genomes, dbSNP, ESP6500, ESP5400, etc.
          P-value of all de novo mutations: This file contains p-values of all the de novo mutations.