Potential regulatory SNPs in promoters of human genes: A systematic approach
Introduction
Variations in the human DNA sequence between individuals can be an indication of predisposition to disease or affect the response to treatment [1]. Variations represented by Single Nucleotide Polymorphisms (SNPs) are becoming increasingly important tools for genetic and biomedical research. Although the current genomic databases contain information on several million SNPs and are growing at a very fast rate, the true value of a given SNP is not always clear. Many of those SNPs correspond to mere sequencing errors, while others are true neutral nucleotide substitutions. Targeting the correct SNP for large-scale association studies represents a major bottleneck, especially when it comes to SNPs which are located outside of open reading frames of genes.
Nevertheless, many successful studies of SNP association with particular disease phenotypes have been performed, even for SNPs located in promoter areas not included in mRNA. For instance, some regulatory SNP associations have survived meta-analysis: such studies revealed an age-related pattern of risk of Alzheimer's disease associated with the IL-1A (−889) polymorphism [2] and a protective role of the myeloperoxidase (MPO) −463G→A polymorphism in lung cancer [3].
SNPs located in regulatory areas of human genes can significantly contribute to the cellular level of mRNA transcripts. For example, polymorphisms (−491A→T, −427C→T and −219G→T) in the promoter region of the apolipoprotein E (apoE) gene alter the level of its expression in an allele-specific manner [4], [5]. Similarly, the −330G allele of the IL2 gene showed two-fold higher levels of expression than the −330T allele and an association with multiple sclerosis [6], [7]. The −232G allele of the PCK1 gene encoding phosphoenolpyruvate carboxykinase showed significantly increased basal expression with no down-regulation by insulin and had an association with type 2 diabetes mellitus [8]. Regulatory polymorphisms in disease-related genes offer an obvious advantage, as the therapeutic manipulation of regulatory mutations should conceivably be easier than repairing or modulating the effects of an abnormal protein [9].
Unfortunately, a promoter region broadly defined as a 2 kb sequence located upstream of the transcription start site may contain a large number of SNPs, some of which may be non-polymorphic in the study population, while others may fail to demonstrate a regulatory role. The search for relevant polymorphic candidates faces significant obstacles, due in part to both the high number of potentially promising SNPs and the intrinsic difficulties associated with identification of weak gene–disease interactions [10]. At present, extensive case-control studies can be applied only to a limited number of gene polymorphisms because of their high cost. A gene-regulation-based choice of the SNPs that deserve exhaustive cohort analysis is therefore of primary importance.
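The broad promoter definition above (2 kb upstream of the transcription start site) can be illustrated with a minimal sketch. It is not the SNP_TRAST implementation: the `Gene` record, the function names, and the 1-based inclusive coordinate convention are assumptions made only for illustration.

```python
from dataclasses import dataclass

PROMOTER_LENGTH = 2000  # 2 kb upstream, following the broad definition in the text


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; field names are illustrative assumptions."""
    name: str
    tss: int     # transcription start site, 1-based coordinate on the contig
    strand: str  # "+" or "-"


def promoter_window(gene, length=PROMOTER_LENGTH):
    """Return the inclusive (start, end) of the region upstream of the TSS.

    "Upstream" means lower coordinates on the "+" strand and higher
    coordinates on the "-" strand.
    """
    if gene.strand == "+":
        return max(1, gene.tss - length), gene.tss - 1
    if gene.strand == "-":
        return gene.tss + 1, gene.tss + length
    raise ValueError(f"unknown strand: {gene.strand!r}")


def snps_in_promoter(gene, snp_positions, length=PROMOTER_LENGTH):
    """Return the sorted SNP positions that fall inside the promoter window."""
    start, end = promoter_window(gene, length)
    return sorted(p for p in snp_positions if start <= p <= end)
```

Handling the strand explicitly matters because a window taken at lower coordinates for a gene on the minus strand would cover transcribed sequence instead of the promoter.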
Lack of functional certainty is prominent even in cases of very well studied disease-associated SNPs like the TNF-alpha promoter polymorphism −308G/A. This is a major pitfall for case-control genetics studies [11]. On the other hand, association studies depend on linkage disequilibrium (LD) between a causative mutation and its linked marker loci. So, even non-regulatory but associated SNPs may serve as helpful leads to a SNP that is truly causative. An observation of association of non-regulatory SNPs with disease is a prerequisite for detailed investigation of the variations in the genomic region within the same haplotype. Algorithms similar to ones used for primary searches for potential regulatory SNPs can also be applied to secondary searches aimed at revealing truly regulatory SNPs located in the vicinity of a disease-associated SNP previously described experimentally.
In this paper we describe an algorithm that allows in silico extraction of SNPs with a high probability of influencing the level of gene expression. To perform whole-genome analysis of SNP markers in regulatory areas of human genes we created the software SNP_TRAST (SNP Transcription Regulating Area Search Tool) and applied computational criteria for involvement of a given SNP in gene regulation. This study revealed 14,127 first-line candidates for future association studies. A significant subset of these SNP markers, experimentally confirmed to be polymorphic, was organized in an open access database available at ftp://194.67.85.195/ and in the Supplementary Table.
Methods
We used version 2 of the NCBI assembly 34 available at ftp://ftp.ncbi.nih.gov/genebank/genomes/H_Sapiens (February 2004 data freeze). In this database the human genome was subdivided into contigs from 100 kb to 65 Mb in length. We also used MapView (build#3.2) containing annotations for 473 contigs which include 3.0×10⁶ ESTs, representing 8.7×10⁶ coding exons (ftp://ftp.ncbi.nih.gov/genomes/H_Sapiens/maps/mapview). The location and verification status of SNP markers were retrieved from dbSNP build
Determination of the cut-off level of matrix change by SNP introduction
Determining how a single nucleotide substitution in a binding site affects the binding affinity of a particular transcription factor (TF) is an important step in predicting whether a given uncharacterized SNP is responsible for an increase or decrease in expression of a gene of interest.
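The idea of measuring how a substitution changes a matrix match can be sketched as follows. This is a simplified log-odds position weight matrix score, not the matrix similarity measure of any particular tool and not the authors' cut-off procedure. The count matrix `COUNTS`, the pseudocount, and the uniform background are hypothetical values chosen for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp binding site (assumption, not real data).
# Rows are site positions; columns are counts of A, C, G, T.
COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 1, 0],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append({
            base: math.log2(((count + pseudocount) / total) / background)
            for base, count in zip(BASES, row)
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds for a site of the same length as the matrix."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix length")
    return sum(column[base] for column, base in zip(pwm, site))


def allele_score_change(pwm, site, offset, alt):
    """Score of the variant site minus score of the reference site.

    `offset` is the 0-based position of the SNP within the site.
    """
    variant = site[:offset] + alt + site[offset + 1:]
    return score(pwm, variant) - score(pwm, site)


pwm = log_odds_matrix(COUNTS)
delta = allele_score_change(pwm, "ACGT", offset=1, alt="T")
print(f"score change for C->T at position 1: {delta:.2f}")
```

A negative change suggests that the variant allele weakens the predicted site. Choosing the threshold at which such a change is treated as meaningful is the cut-off question this section addresses.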
To provide a solid basis for the estimation of the cut-off level of the matrix change we performed computational analysis of 18 SNPs experimentally found to influence the level of gene expression resulting in
Conclusions
In conclusion, we describe computational criteria for prediction of SNPs likely to influence gene expression levels. The method has identified 14,127 potentially regulatory SNPs suitable for future association studies. A population-validated set of regulatory SNP markers is organized in an open access database available at ftp://194.67.85.195/ and in the Supplementary Table.
Acknowledgements
The authors are extremely grateful to Prof. Nick K. Yankovsky, the Head of Genome Analysis Lab of Vavilov Institute, for unwavering support and to faculty of the MMB department, Dr. Christensen and Dr. Grant, for their help with everything. This work was supported by grants “Cancer Genomics and Development of Diagnostic Tools and Therapies” (Commonwealth Technology Research Fund, Virginia, USA), the FCNTP grant “Whole genome analysis of human polymorphisms” from the Russian Ministry of Science and
References (44)
- Allelic expression and interleukin-2 polymorphisms in multiple sclerosis. J Neuroimmunol (2001)
- Effects of the multiple sclerosis associated −330 promoter polymorphism in IL2 allelic expression. J Neuroimmunol (2004)
- Searching for cancer-associated gene polymorphisms, promises and obstacles. Cancer Lett (2004)
- Connective tissue growth factor, what's in a name? Mol Genet Metab (2000)
- A naturally occurring sequence variation that creates a YY1 element is associated with increased cystic fibrosis transmembrane conductance regulator gene expression. J Biol Chem (2000)
- Severe factor VII deficiency due to a mutation disrupting an Sp1 binding site in the factor VII promoter. Blood (1998)
- A relationship between Matrix metalloproteinase-1 (MMP-1) promoter polymorphism and cervical cancer progression. Cancer Lett (2003)
- Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell (2004)
- A functional SNP in the promoter region of TCOF1 is associated with reduced gene expression and YY1 DNA-protein interaction. Gene (2005)
- Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet (1999)
- Age-dependent association between interleukin-1A (−889) genetic polymorphism and sporadic Alzheimer's disease. A meta-analysis. J Neurol
- Point: myeloperoxidase −463G→A polymorphism and lung cancer risk. Cancer Epidemiol Biomarkers Prev
- A polymorphism in the regulatory region of APOE associated with risk for Alzheimer's dementia. Nat Genet
- Contribution of APOE promoter polymorphisms to Alzheimer's disease risk. Neurology
- Promoter polymorphism in PCK1 (phosphoenolpyruvate carboxykinase gene) associated with type 2 diabetes mellitus. J Clin Endocrinol Metab
- Wanted, regulatory SNPs. Nat Genet
- Is there a future for TNF promoter polymorphisms? Genes Immun
- MatInd and MatInspector, new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res
- A comparative analysis of relative occurrence of transcription factor binding sites in vertebrate genomes and gene promoter areas. Bioinformatics
- TRANSFAC, transcriptional regulation, from patterns to profiles. Nucleic Acids Res
- MySQL Reference Manual
- A low number wins the genesweep pool. Science