Project Publications

Diagnostic Exome Sequencing in Persons with Severe Intellectual Disability.

de Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, Kroes T, Vulto-van Silfhout AT, Koolen DA, de Vries P, Gilissen C, Del Rosario M, Hoischen A, Scheffer H, de Vries BB, Brunner HG, Veltman JA, Vissers LE.
October 2012
New England Journal of Medicine

Author's abstract: Background: The causes of intellectual disability remain largely unknown because of extensive clinical and genetic heterogeneity. Methods: We evaluated patients with intellectual disability to exclude known causes of the disorder. We then sequenced the coding regions of more than 21,000 genes obtained from 100 patients with an IQ below 50 and their unaffected parents. A data-analysis procedure was developed to identify and classify de novo, autosomal recessive, and X-linked mutations. In addition, we used high-throughput resequencing to confirm new candidate genes in 765 persons with intellectual disability (a confirmation series). All mutations were evaluated by molecular geneticists and clinicians in the context of the patients' clinical presentation. Results: We identified 79 de novo mutations in 53 of 100 patients. A total of 10 de novo mutations and 3 X-linked (maternally inherited) mutations that had been previously predicted to compromise the function of known intellectual-disability genes were found in 13 patients. Potentially causative de novo mutations in novel candidate genes were detected in 22 patients. Additional de novo mutations in 3 of these candidate genes were identified in patients with similar phenotypes in the confirmation series, providing support for mutations in these genes as the cause of intellectual disability. We detected no causative autosomal recessive inherited mutations in the discovery series. Thus, the total diagnostic yield was 16%, mostly involving de novo mutations. Conclusions: De novo mutations represent an important cause of intellectual disability; exome sequencing was used as an effective diagnostic strategy for their detection. (Funded by the European Union and others.)

Impaired riboflavin transport due to missense mutations in SLC52A2 causes Brown-Vialetto-Van Laere syndrome

Haack TB, Makowski C, Yao Y, Graf E, Hempel M, Wieland T, Tauer U, Ahting U, Mayr JA, Freisinger P, Yoshimatsu H, Inui K, Strom TM, Meitinger T, Yonezawa A, Prokisch H.
August 2012
Journal of Inherited Metabolic Disease DOI: 10.1007/s10545-012-9513-y

Author's abstract: Brown-Vialetto-Van Laere syndrome (BVVLS [MIM 211530]) is a rare neurological disorder characterized by infancy onset sensorineural deafness and ponto-bulbar palsy. Mutations in SLC52A3 (formerly C20orf54), coding for riboflavin transporter 2 (hRFT2), have been identified as the molecular genetic correlate in several individuals with BVVLS. Exome sequencing of just one single case revealed that compound heterozygosity for two pathogenic mutations in the SLC52A2 gene coding for riboflavin transporter 3 (hRFT3), another member of the riboflavin transporter family, is also associated with BVVLS. Overexpression studies confirmed that the gene products of both mutant alleles have reduced riboflavin transport activities. While mutations in SLC52A3 cause decreased plasma riboflavin levels, concordant with a role of SLC52A3 in riboflavin uptake from food, the SLC52A2-mutant individual had normal plasma riboflavin concentrations, a finding in line with a postulated function of SLC52A2 in riboflavin uptake from blood into target cells. Our results contribute to the understanding of human riboflavin metabolism and underscore its role in the pathogenesis of BVVLS, thereby providing a rational basis for a high-dose riboflavin treatment.

Mutations in DNMT1 cause autosomal dominant cerebellar ataxia, deafness and narcolepsy

Winkelmann J, Lin L, Schormair B, Kornum BR, Faraco J, Plazzi G, Melberg A, Cornelio F, Urban AE, Pizza F, Poli F, Grubert F, Wieland T, Graf E, Hallmayer J, Strom TM, Mignot E.
May 2012
Human Molecular Genetics 2012 May 15;21(10):2205-10. Epub 2012 Feb 9.

Author's abstract: Autosomal dominant cerebellar ataxia, deafness and narcolepsy (ADCA-DN) is characterized by late onset (30-40 years old) cerebellar ataxia, sensory neuronal deafness, narcolepsy-cataplexy and dementia. We performed exome sequencing in five individuals from three ADCA-DN kindreds and identified DNMT1 as the only gene with mutations found in all five affected individuals. Sanger sequencing confirmed the de novo mutation p.Ala570Val in one family, and showed co-segregation of p.Val606Phe and p.Ala570Val, with the ADCA-DN phenotype, in two other kindreds. An additional ADCA-DN kindred with a p.Gly605Ala mutation was subsequently identified. Narcolepsy and deafness were the first symptoms to appear in all pedigrees, followed by ataxia. DNMT1 is a widely expressed DNA methyltransferase maintaining methylation patterns in development, and mediating transcriptional repression by direct binding to HDAC2. It is also highly expressed in immune cells and required for the differentiation of CD4+ into T regulatory cells. Mutations in exon 20 of this gene were recently reported to cause hereditary sensory neuropathy with dementia and hearing loss (HSAN1). Our mutations are all located in exon 21 and in very close spatial proximity, suggesting distinct phenotypes depending on mutation location within this gene.

Research priorities. ELSI 2.0 for Genomics and Society

Jane Kaye, Eric M. Meslin, Bartha M. Knoppers, Eric T. Juengst, Mylène Deschênes, Anne Cambon-Thomsen, Donald Chalmers, Jantina De Vries, Kelly Edwards, Nils Hoppe, Alastair Kent, Clement Adebamowo, Patricia Marshall, Kazuto Kato
11 May 2012
Science PMID: 22582247

Anticipating and addressing the ethical, legal, and social implications (ELSI) of scientific developments has been a key feature of the genomic research agenda (1–4). Research in genomics is advancing by developing common infrastructures and research platforms, open-access and sharing policies, and new forms of international collaborations (5–12). In this paper, we outline a proposal to establish a "collaboratory" (13) for ELSI research to enable it to become more coordinated, responsive to societal needs, and better able to apply the research knowledge it generates at the global level. Current ELSI research is generally nationally focused, with investigator-initiated approaches that are not always aligned with the developments in international genomics research. This makes it difficult to efficiently leverage findings that impact global practice and policy. Moreover, as translational genomic research design challenges become more pressing (14), ELSI research will need to develop greater capacity to respond rapidly to new developments. The ELSI 2.0 Initiative is designed to catalyze international collaboration in ELSI genomics and to enable those in the field to better assess the impact and dynamics of global genome research.

Lack of the mitochondrial protein acylglycerol kinase causes Sengers syndrome

Mayr JA, Haack TB, Graf E, Zimmermann FA, Wieland T, Haberberger B, Superti-Furga A, Kirschner J, Steinmann B, Baumgartner MR, Moroni I, Lamantea E, Zeviani M, Rodenburg RJ, Smeitink J, Strom TM, Meitinger T, Sperl W, Prokisch H.
26 January 2012
The American Journal of Human Genetics PMID: 22284826

Author's abstract: Exome sequencing of an individual with congenital cataracts, hypertrophic cardiomyopathy, skeletal myopathy, and lactic acidosis, all typical symptoms of Sengers syndrome, discovered two nonsense mutations in the gene encoding mitochondrial acylglycerol kinase (AGK). Mutation screening of AGK in further individuals with congenital cataracts and cardiomyopathy identified numerous loss-of-function mutations in an additional eight families, confirming the causal nature of AGK deficiency in Sengers syndrome. The loss of AGK led to a decrease of the adenine nucleotide translocator in the inner mitochondrial membrane in muscle, consistent with a role of AGK in driving the assembly of the translocator as a result of its effects on phospholipid metabolism in mitochondria.

KLHL3 mutations cause familial hyperkalemic hypertension by impairing ion transport in the distal nephron

Louis-Dit-Picard H, Barc J, Trujillano D, Miserey-Lenkei S, Bouatia-Naji N, Pylypenko O, Beaurain G, Bonnefond A, Sand O, Simian C, Vidal-Petiot E, Soukaseum C, Mandet C, Broux F, Chabre O, Delahousse M, Esnault V, Fiquet B, Houillier P, Bagnis CI, Koenig J, Konrad M, Landais P, Mourani C, Niaudet P, Probst V, Thauvin C, Unwin RJ, Soroka SD, Ehret G, Ossowski S, Caulfield M; International Consortium for Blood Pressure (ICBP), Bruneval P, Estivill X, Froguel P, Hadchouel J, Schott JJ, Jeunemaitre X.
11 March 2012
Nature Genetics doi:10.1038/ng.2218

Familial hyperkalemic hypertension (FHHt) is a Mendelian form of arterial hypertension that is partially explained by mutations in WNK1 and WNK4 that lead to increased activity of the Na(+)-Cl(-) cotransporter (NCC) in the distal nephron. Using combined linkage analysis and whole-exome sequencing in two families, we identified KLHL3 as a third gene responsible for FHHt. Direct sequencing of 43 other affected individuals revealed 11 additional missense mutations that were associated with heterogeneous phenotypes and diverse modes of inheritance. Polymorphisms at KLHL3 were not associated with blood pressure. The KLHL3 protein belongs to the BTB-BACK-kelch family of actin-binding proteins that recruit substrates for Cullin3-based ubiquitin ligase complexes. KLHL3 is coexpressed with NCC and downregulates NCC expression at the cell surface. Our study establishes a role for KLHL3 as a new member of the complex signaling pathway regulating ion homeostasis in the distal nephron and indirectly blood pressure.
PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data
Yanju Zhang, Eric-Wubbo Lameijer, Peter A. C. 't Hoen, Zemin Ning, P. Eline Slagboom and Kai Ye
15 February 2012
Bioinformatics doi: 10.1093/bioinformatics/btr712

Author's abstract: Motivation: RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon–exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ~137 000 and 173 000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples.
Increased sensitivity of NGS-based expression profiling after globin reduction in human blood RNA
Mastrokolias A, den Dunnen JT, van Ommen GB, 't Hoen PA, van Roon-Mom WM.
18 January 2012
BMC Genomics doi:10.1186/1471-2164-13-28

Transcriptome analysis is of great interest in clinical research, where significant differences between individuals can be translated into biomarkers of disease. Although next generation sequencing provides robust, comparable and highly informative expression profiling data, with several million of tags per blood sample, reticulocyte globin transcripts can constitute up to 76% of total mRNA compromising the detection of low abundant transcripts. We have removed globin transcripts from 6 human whole blood RNA samples with a human globin reduction kit and compared them with the same non-reduced samples using deep Serial Analysis of Gene Expression.
Next generation sequencing technologies and applications for human genetic history and forensics
Eva C Berglund, Anna Kiialainen and Ann-Christine Syvänen
Investigative Genetics doi:10.1186/2041-2223-2-23

Author's provisional abstract: The rapid advances in the development of sequencing technologies in recent years enable an increasing number of applications in biology and medicine. Here we review key technical aspects of the preparation of DNA templates for sequencing, the biochemical reaction principles and assay formats underlying next generation sequencing systems, methods for imaging and base calling, quality control, and bioinformatic approaches for sequence alignment, variant calling and assembly. We also discuss some of the most important advances that the new sequencing technologies have brought to the fields of human population genetics, human genetic history and forensic genetics.
Gene Expression Atlas update--a value-added database of microarray and sequencing-based functional genomics experiments.
Kapushesky M et al (last author Brazma A)

Nucleic Acids Research (2011) doi: 10.1093/nar/gkr913

Gene Expression Atlas is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species, which contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies.
Adaptor Protein Complex 4 Deficiency Causes Severe Autosomal-Recessive Intellectual Disability, Progressive Spastic Paraplegia, Shy Character, and Short Stature
Rami Abou Jamra, Orianne Philippe, Annick Raas-Rothschild, Sebastian H. Eck, Elisabeth Graf, Rebecca Buchert, Guntram Borck, Arif Ekici, Felix F. Brockschmidt, Markus M. Nöthen, Arnold Munnich, Tim M. Strom, Andre Reis, and Laurence Colleaux
The American Journal of Human Genetics doi:10.1016/j.ajhg.2011.04.019

Authors' Abstract: Intellectual disability inherited in an autosomal-recessive fashion represents an important fraction of severe cognitive-dysfunction disorders. Yet, the extreme heterogeneity of these conditions markedly hampers gene identification. Here, we report on eight affected individuals who were from three consanguineous families and presented with severe intellectual disability (...). Using a combination of autozygosity mapping and either Sanger sequencing of candidate genes or next-generation exome sequencing, we identified one mutation in each of three genes encoding adaptor protein complex 4 (AP4) subunits (...). Combined with previous observations, these results support the hypothesis that AP4-complex-mediated trafficking plays a crucial role in brain development and functioning and demonstrate the existence of a clinically recognizable syndrome due to deficiency of the AP4 complex.

Publications related to the project

Genome and exome sequencing in the clinic: unbiased genomic approaches with a high diagnostic yield
Marcel Nelen, Joris A Veltman
4 April 2012
Future Medicine PMID: 22462741

"...we think whole-genome- or exome-based approaches are currently most suited for diagnostic implementation in genetically heterogeneous diseases, initially to complement and later to replace Sanger sequencing, qPCR and genomic microarrays."
Disease gene identification strategies for exome sequencing
Gilissen C, Hoischen A, Brunner HG, Veltman JA.
18 January 2012
European Journal of Human Genetics doi: 10.1038/ejhg.2011.258

Next generation sequencing can be used to search for Mendelian disease genes in an unbiased manner by sequencing the entire protein-coding sequence, known as the exome, or even the entire human genome. Identifying the pathogenic mutation amongst thousands to millions of genomic variants is a major challenge, and novel variant prioritization strategies are required. The choice of these strategies depends on the availability of well-phenotyped patients and family members, the mode of inheritance, the severity of the disease and its population frequency. In this review, we discuss the current strategies for Mendelian disease gene identification by exome resequencing. We conclude that exome strategies are successful and identify new Mendelian disease genes in approximately 60% of the projects. Improvements in bioinformatics as well as in sequencing technology will likely increase the success rate even further. Exome sequencing is likely to become the most commonly used tool for Mendelian disease gene identification for the coming years.
Unlocking Mendelian disease using exome sequencing
Christian Gilissen, Alexander Hoischen, Han G Brunner and Joris A Veltman
Genome Biology

Exome sequencing is revolutionizing Mendelian disease gene identification. This results in improved clinical diagnosis, more accurate genotype-phenotype correlations and new insights into the role of rare genomic variation in disease.
Rare and Common Regulatory Variation in Population-Scale Sequenced Human Genomes
Stephen B. Montgomery, Tuuli Lappalainen, Maria Gutierrez-Arcelus, and Emmanouil T. Dermitzakis
PLoS Genetics doi: 10.1371/journal.pgen.1002144

Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia
Puente XS et al (incl. Guigó R, Bayés M, Gut I, Estivill X.)
Nature doi: 10.1038/nature10113.

Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated (...) To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer.
The GENCODE exome: sequencing the complete human exome
Alison J Coffey, Felix Kokocinski, Maria S Calafato, Carol E Scott, Priit Palta, Eleanor Drury, Christopher J Joyce, Emily M LeProust, Jen Harrow, Sarah Hunt, Anna-Elina Lehesjoki, Daniel J Turner, Tim J Hubbard and Aarno Palotie
European Journal of Human Genetics doi:10.1038/ejhg.2011.28

Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing.
A pipeline for RNA-seq data processing and quality assessment
Goncalves A, Tikhonov A, Brazma A, Kapushesky M.
Bioinformatics 2011 Mar 15;27(6):867-9

SUMMARY: We present an R based pipeline, ArrayExpressHTS, for pre-processing, expression estimation and data quality assessment of high-throughput sequencing transcriptional profiling (RNA-seq) datasets. The pipeline starts from raw sequence files and produces standard Bioconductor R objects containing gene or transcript measurements for downstream analysis along with web reports for data quality assessment. It may be run locally on a user's own computer or remotely on a distributed R-cloud farm at the European Bioinformatics Institute. It can be used to analyse user's own datasets or public RNA-seq datasets from the ArrayExpress Archive. AVAILABILITY: The R package is available online, with online documentation that is also available as supplementary material.
The effect of next-generation sequencing technology on complex trait research
Day-Williams AG, Zeggini E.
European Journal of Clinical Investigation doi: 10.1111/j.1365-2362.2010.02437.x

"Advances in the understanding of complex trait genetics have always been enabled by advances in genomic technology. Next-generation sequencing (NGS) is set to revolutionize the way complex trait genetics research is carried out."
Diversity of Human Copy Number Variation and Multicopy Genes
Peter H. Sudmant, et al.
Science 2010 Oct 29;330(6004):641-6.

"Copy number variants affect both disease and normal phenotypic variation, but those lying within heavily duplicated, highly identical sequence have been difficult to assay. [...] We identified 4.1 million 'singly unique nucleotide' positions informative in distinguishing specific copies [...]. These data identify human-specific expansions in genes associated with brain development, reveal extensive population genetic diversity, and detect signatures consistent with gene conversion in the human species. Our approach makes ~1000 genes accessible to genetic studies of disease association. [...]"
Big science: the cancer genome challenge
Heidi Ledford
Nature 2010 Apr 15;464(7291):972-4.

"[...] Because cancer is a disease so intimately associated with genetic mutation, many thought it would be amenable to genomic exploration through initiatives based on the collaborative model of the Human Genome Project. The International Cancer Genome Consortium (ICGC), formed in 2008, is coordinating efforts to sequence 500 tumours from each of 50 cancers. Together, these projects will cost in the order of US$1 billion.[...]"
1000 Genomes Project Gives New Map Of Genetic Diversity
Elizabeth Pennisi
Science 2010 Oct 29;330(6004):574-5

"Talk about inflation. A decade ago, one human genome was the goal. Now nothing less than 1000 will do. By sequencing hundreds of human genomes, the 1000 Genomes Project has produced the most detailed catalog of human variation ever: a compendium of millions of previously unknown single-nucleotide polymorphisms (SNPs) and other variants. [...]"
Genomes by the thousand
Nature 2010 Oct 28;467(7319):1026-7

"Ten years ago, two fingers were enough to count the number of sequenced human genomes. Until last year, the fingers on two hands were enough. Today, the rate of such sequencing is escalating so fast it is hard to keep track. Nature attempted nevertheless: [...] at least 2,700 human genomes will have been completed by the end of this month, and that the total will rise to more than 30,000 by the end of 2011. [...]"
International network of cancer genome projects
The International Cancer Genome Consortium
Nature 2010 Apr 15;464(7291):993-8.

"The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.[...]"
Transcriptome genetics using second generation sequencing in a Caucasian population
Stephen B. Montgomery et al
Nature 2010 Apr 1;464(7289):773-7

"Gene expression is an important phenotype that informs about genetic and environmental effects on cellular state. [...] Second generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project. [...] This analysis shows that high throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes. [...]"
A map of human genome variation from population-scale sequencing
The 1000 Genomes Project Consortium
Nature 2010 Oct 28;467(7319):1061-73

"The aim of the 1000 Genomes Project is to discover, genotype and provide accurate haplotype information on all forms of human DNA polymorphism in multiple human populations. [...] Here we report the results of the pilot phase of the project. [...] The results give us a much deeper, more uniform picture of human genetic variation than was previously available, providing new insights into the landscapes of functional variation, genetic association and natural selection in humans. [...]"