Introduction

Well-recognized biological features of the non-classical class Ib HLA-G molecule that differ from other classical class I HLA (-A, -B and -C) molecules include: (1) limited protein variability, (2) presence of several membrane-bound and soluble isoforms, generated by alternative splicing of the primary transcript, (3) unique molecular structure, presenting a reduced cytoplasmic tail, (4) modulation of the immune response, and (5) restricted tissue expression. Polymorphic sites present in coding and non-coding regions of the HLA-G gene may potentially affect all of these biological features. Nucleotide variability in the promoter or in the 3′ untranslated region (3′ UTR) may influence HLA-G levels by modifying the affinity of gene targeted sequences for transcriptional or post-transcriptional factors, respectively. Likewise, nucleotide variability in the coding region may produce conformational changes in the molecule, which may modify its major functions, i.e., interaction with cell receptors, isoform production, modulation of the immune response, polymerization features and ability to couple peptides. In this section, we highlight the major characteristics of HLA-G gene polymorphic sites that are relevant for the understanding of molecule function, gene regulation and evolution, as well as the implications of these features on selected disease associations.

Nomenclature of HLA-G alleles

Compared to classical HLA class I genes that exhibit hundreds of alleles, the HLA-G locus presents only a few variants. Due to the increasing number of reported HLA alleles, much effort has been devoted to standardize HLA nomenclature. Currently, an allele name may be composed of four, six or eight digits. The first two digits refer to the allele family, and the third and fourth assign the order in which the sequences were reported. Therefore, an allele that differs in these first four digits must have at least one non-synonymous nucleotide substitution, i.e., modifying the amino acid sequence of the encoded protein.

To date, 44 HLA-G alleles have been described, which encode 14 distinct functional proteins with all isoforms (HLA-G*0101, *0102, *0103, *0104, *0106, *0107, *0108, *0109, *0110, *0111, *0112, *0114, *0115 and *0116) and modified proteins encoded by the G*0105N allele [1] (The International Immunogenetics Database-IMGT/HLA, database version 2.28.0, January 2010). For example, the HLA-G*0101 allele differs from G*0103 by a non-synonymous substitution at exon 2, codon 31, position 292, where ACG codes for threonine and TCG codes for serine. On the other hand, alleles exhibiting synonymous nucleotide substitutions in the coding sequence, which do not modify the amino acid sequence of the encoded protein, are distinguished by the fifth and sixth digits. The HLA-G*010401 and G*010404 alleles are typical examples: they differ at position +1827 (codon 267), where G*010401 presents CCG and G*010404 presents CCA, both encoding proline. Finally, distinct nucleotide sequences observed in introns or in the 3′ or 5′ untranslated regions are distinguished by the seventh and eighth digits. For example, the G*01010104 and G*01010105 alleles encode the same protein sequence and present the same coding sequence, but differ by a nucleotide exchange in intron 1 at position 99, where G*01010104 presents an adenine and G*01010105 a guanine. Other examples can be retrieved from the data illustrated in Fig. 1.
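The distinction between synonymous and non-synonymous substitutions used in the allele names can be checked directly against the standard genetic code. The following is a minimal Python sketch, not part of the IMGT tooling; the function name `classify_substitution` and the three category labels are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ref_codon, alt_codon):
    """Return (kind, reference residue, alternative residue) for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "non-synonymous"
    return kind, ref_aa, alt_aa


# Codon 31: ACG (threonine) in G*0101 versus TCG (serine) in G*0103.
print(classify_substitution("ACG", "TCG"))
# Codon 267: CCG in G*010401 versus CCA in G*010404, both proline.
print(classify_substitution("CCG", "CCA"))
```

Under this scheme, the first comparison changes the first four digits of the allele name, whereas the second changes only the fifth and sixth digits.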

Fig. 1

Nucleotide sequences, from exon 1 to 4, described for the 44 alleles or haplotypes observed in the coding region of the HLA-G gene (IMGT version 2.28.0, January 2010). Asterisks represent that no official information regarding these single nucleotide polymorphisms was available. Hyphen indicates deletion. Amino acid codes: A alanine, S serine, F phenylalanine, Y tyrosine, T threonine, M methionine, Q glutamine, R arginine, E glutamic acid, P proline, H histidine, G glycine, D aspartic acid, V valine, C cysteine, L leucine, I isoleucine, W tryptophan

Although the nomenclature has been quite appropriate to designate the adequate site of nucleotide substitution, because of the ever-increasing number of HLA alleles, particularly of the HLA-A and HLA-B loci, in which more than 100 non-synonymous nucleotide substitutions have been described for the same allele group, the WHO Nomenclature Committee for Factors of the HLA System has decided to introduce colons (:) into allele designations to delimit the separate fields, to be used starting in April 2010 (Anthony Nolan Research Institute, http://hla.alleles.org). Then, according to the new nomenclature, the HLA-G*01010101 allele is designated as HLA-G*01:01:01:01. Although the number of proteins encoded by all HLA-G alleles is less than half the total number of alleles, the discovery of nucleotide substitutions has been continuously increasing, particularly those responsible for synonymous substitutions. Therefore, only the new nomenclature will be used throughout this review.
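The mapping between the two naming schemes is mechanical for the fixed-width names used above: each pair of digits becomes one colon-delimited field, and an expression suffix such as N is kept. A minimal Python sketch follows; the function name `to_colon_nomenclature` is hypothetical, and the sketch assumes two digits per field, which holds only for names written in the old scheme.

```python
import re


def to_colon_nomenclature(name):
    """Convert an old digit-based allele name to the colon-delimited form.

    Assumes two digits per field (4, 6 or 8 digits in total) and an
    optional single-letter suffix such as 'N'.
    """
    match = re.fullmatch(
        r"(?P<prefix>(?:HLA-)?[A-Z0-9]+\*)(?P<digits>\d+)(?P<suffix>[A-Z]?)", name
    )
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    digits = match["digits"]
    if len(digits) not in (4, 6, 8):
        raise ValueError(f"expected 4, 6 or 8 digits, got {len(digits)}")
    fields = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return match["prefix"] + ":".join(fields) + match["suffix"]


print(to_colon_nomenclature("HLA-G*01010101"))  # HLA-G*01:01:01:01
print(to_colon_nomenclature("G*0105N"))         # G*01:05N
```

The reverse conversion is not always possible, since a colon-delimited field may exceed two digits; this is the limitation that motivated the new scheme.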

Because most of the nucleotide variation in the coding region is synonymous, only a few distinct proteins are encoded, which accounts for the limited protein variability of HLA-G. In contrast, many polymorphic sites that may influence HLA-G expression have been reported in the promoter and the 3′UTR.

HLA-G coding region polymorphism: impact on HLA-G molecule features

Similar to classical HLA class I molecules, HLA-G presents a heavy chain non-covalently associated with β2-microglobulin. The HLA-G gene also resembles the classical HLA loci, exhibiting 7 introns and 8 exons; it encodes only the heavy chain of the molecule and is located on chromosome 6, whereas β2-microglobulin is encoded by a gene on chromosome 15. Exon 1 encodes the signal peptide; exons 2, 3 and 4 encode the extracellular α1, α2 and α3 domains, respectively; and exons 5 and 6 encode the transmembrane and cytoplasmic domains of the heavy chain. Compared to classical class I molecules, HLA-G has a shortened cytoplasmic domain because of a premature stop codon in exon 6. Exon 7 is always absent from the mature mRNA and, due to the stop codon in exon 6, exon 8 is not translated [2].

Based on the gametic phase (haplotypes) of 72 single nucleotide polymorphisms (SNP) observed between exon 1 and intron 6, 44 HLA-G coding alleles were defined (IMGT, database 2.28.0, January 2010) (Fig. 1). Considering the region between exon 1 and exon 6, which encodes the external portion and the transmembrane region of the HLA-G molecule, all of these segments do present several nucleotide variations [1]. The heavy chain encoding region exhibits 33 SNPs; however, only 13 amino acid variations are observed, 4 of them in α1, 6 in α2 and 3 in the α3 domain (Fig. 2). On the other hand, exons 1 and 5 both present only two synonymous nucleotide substitutions, hence the signal peptide and the transmembrane portion of the molecule are invariable. Despite the limited protein variability, amino acid substitutions may account for the biological function of HLA-G, including peptide binding, isoform production, and ability to polymerize and modulate immune system cells.

Fig. 2

HLA-G α1, α2 and α3 domains. Variable amino acid positions are indicated in blue circles for each domain (codons). The consensus amino acids are based on the *01:01:01:01 amino acid sequence, and the amino acid exchange for each coding-region allele is given. Probably G*01:13N is not expressed; all domains of the G*01:05N may be expressed. Based on IMGT database version 2.28.0, January 2010. See also [12, 109]

Peptide-binding properties

Nucleotide variability in the coding region of the HLA-G gene is evenly distributed throughout exons 2 (α1 domain), 3 (α2 domain) and 4 (α3 domain), as well as in introns, contrasting with classical HLA class I genes, in which nucleotide variation is clustered around the peptide-binding groove encoded by exon 2 (α1 domain) and exon 3 (α2 domain), which influences peptide loading, peptide diversity and T cell recognition.

Although both classical and non-classical HLA molecules bind to peptides, HLA-G has a limited peptide repertoire. Peptides isolated from placental cells revealed that 15% of the HLA-G-bound peptides were derived from a single cytokine-related protein [3], and peptides obtained from transfected cells were derived from a restricted number of proteins, including histone H2A, nuclear and ribosomal proteins, and cytokine receptors [4, 5]. The conformation of the HLA-G and bound peptide complex is similar to that observed for classical molecules; however, the peptide is buried deeper in the HLA-G cleft (reviewed by Clements et al. [6]). Given these features of the HLA-G peptide ensemble, and considering the absence of reported HLA-G restricted T cells, peptide presentation is not the major function of the HLA-G molecule. Apparently, polymorphic sites around the peptide groove may not influence this biological feature; however, the molecular structure of the distinct HLA-G proteins coupled with different peptides has not been fully investigated.

Membrane-bound and soluble isoforms

HLA-G presents 7 protein isoforms (Fig. 3), generated by alternative splicing of the primary transcript, 4 of them being membrane-bound (HLA-G1, G2, G3 and G4) and 3 soluble (G5, G6 and G7) isoforms. HLA-G1 is the complete isoform exhibiting a structure similar to that of the membrane-bound classical HLA molecule, associated with β2-microglobulin. The HLA-G2 isoform has no α2 domain, encoded by exon 3. HLA-G3 presents no α2 and α3 domains, encoded by exons 3 and 4. HLA-G4 has no α3 domain, encoded by exon 4. The soluble HLA-G5 and HLA-G6 isoforms present the same extraglobular domains of HLA-G1 and HLA-G2, respectively, generated by transcripts conserving intron 4, which blocks the translation of the transmembrane domain (exon 5). The 5′ region of intron 4 is translated until the generation of a stop codon, conferring to the HLA-G5 and HLA-G6 isoforms a tail of 21 amino acids implicated in their solubility. The HLA-G7 isoform has only the α1 domain linked to two amino acids encoded by intron 2, which is retained in the corresponding transcript. All alternative transcripts lack exon 7 [7–10].

Fig. 3

Isoforms of HLA-G produced by alternative splicing of the primary mRNA. The HLA-G heavy chain domains (α1, α2, α3, transmembrane portion and cytoplasmic tail) are encoded by the HLA-G gene (chromosome 6), and the light β2-microglobulin molecule is encoded by a gene at chromosome 15. Exon 7 is always spliced out. The final portion of exon 6 and exon 8 is always transcribed, however, never translated due to the presence of a premature stop codon at the beginning of exon 6 (red stop signal), and has been considered as the 3′untranslated region of the gene (3′UTR). The primary transcript may be spliced out into 7 isoforms, HLA-G1 to −G7. HLA-G1 is the full-length HLA-G molecule, HLA-G2 lacks exon 3, HLA-G3 lacks exons 3 and 4, and HLA-G4 lacks exon 4. HLA-G1 to −G4 are membrane-bound molecules due to the presence of the transmembrane and cytoplasmic tail encoded by exons 5 and 6. HLA-G5 is similar to HLA-G1 but retains intron 4, HLA-G6 lacks exon 3 but retains intron 4, and HLA-G7 lacks exon 3 but retains intron 2. HLA-G5 and -G6 are soluble forms due to the presence of intron 4, which contains a premature stop codon at exon 4 (blue stop signal), preventing the translation of the transmembrane and cytoplasmic tail. HLA-G7 is soluble due to the presence of intron 2, which presents a premature stop codon (green stop signal). The G*01:13N allele is probably not expressed due to the presence of a premature stop codon at exon 2 (codon 54). A deletion of a cytosine (ΔC) at exon 3 of the G*01:05N allele changes the reading frame, leading to a stop codon at exon 4

Most of the currently described HLA-G alleles may theoretically produce, by alternative splicing, all membrane-bound and soluble isoforms. In contrast, the HLA-G*01:05N (null allele, G*0105N) presents a cytosine deletion (ΔC) in the last nucleotide of codon 129 or first nucleotide of codon 130 in exon 3, causing a shift in the reading frame [11, 12], leading to a downstream stop signal (TGA) in codon 189 (exon 4). As a consequence, this allele is associated with an incomplete formation of the HLA-G1, -G4 and -G5 isoforms, which possess the α2 domain encoded by exon 3, and normal expression of the HLA-G2, -G3 and -G7 isoforms, which lack the α2 domain [12]. The other null allele, G*01:13N (G*0113N), presents a C → T transition at the first base of codon 54 in exon 2 (α1 domain), yielding the formation of a premature stop codon (TAG), which prevents the production of all membrane-bound and soluble isoforms or produces presumably nonfunctional proteins [13]. Figure 3 illustrates the major features of HLA-G isoforms and major sites responsible for physiological or truncated transcription patterns.
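The effect of each null allele on the isoform repertoire follows from which exons each transcript retains. A minimal Python sketch of that reasoning is given below; the data structure and the function names are illustrative assumptions, and only exons 2 to 4 and the retained intron are modeled (exons 1, 5 and 6 are ignored).

```python
# Exons 2, 3 and 4 encode the alpha1, alpha2 and alpha3 domains.
# Second element: retained intron (None for membrane-bound isoforms).
ISOFORMS = {
    "HLA-G1": ({2, 3, 4}, None),
    "HLA-G2": ({2, 4}, None),
    "HLA-G3": ({2}, None),
    "HLA-G4": ({2, 3}, None),
    "HLA-G5": ({2, 3, 4}, 4),
    "HLA-G6": ({2, 4}, 4),
    "HLA-G7": ({2}, 2),
}


def isoforms_containing(exon):
    """Isoforms whose transcript retains the given exon."""
    return sorted(name for name, (exons, _) in ISOFORMS.items() if exon in exons)


def soluble_isoforms():
    """Isoforms made soluble by a retained intron."""
    return sorted(name for name, (_, intron) in ISOFORMS.items() if intron is not None)


# G*01:05N: frameshift in exon 3, so only exon 3-containing isoforms are disrupted.
print(isoforms_containing(3))  # ['HLA-G1', 'HLA-G4', 'HLA-G5']
# G*01:13N: stop codon in exon 2, which every isoform retains.
print(isoforms_containing(2))
```

Under this simplified model, HLA-G6 also lacks exon 3 and would be unaffected by the frameshift, although the studies cited above report on HLA-G2, -G3 and -G7 only.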

Polymerization

The presence of a cysteine residue at position 42 in the α1 domain of the heavy chain is a unique feature of the HLA-G molecule, which permits the formation of dimers via a Cys42–Cys42 intermolecular disulfide bond. Dimers may be formed by recombinant HLA-G monomers, membrane-bound and soluble molecules in transfected cells [14], membrane-bound molecules on the surface of cells that endogenously express HLA-G, and even by β2-microglobulin-free HLA-G constructs [15]. Once formed, the dimer has an oblique orientation, exposing the binding sites for CD8 and leukocyte immunoglobulin-like receptors B1/B2 (LILRB1/LILRB2, also known as LIR1/LIR2, ILT2/ILT4, and CD85j/CD85d), permitting one HLA-G dimer to interact with two LILR or CD8 molecules. Besides providing higher affinity for LILRs in relation to monomers, HLA-G dimers exhibit enhanced LILR-mediated intracellular signaling [16].

None of the HLA-G alleles described to date presents nucleotide variation at codon 42 (TGT), which encodes the cysteine residue. Therefore, apparently all alleles that encode non-truncated HLA-G molecules may form dimers. Since the dimer interface may be stabilized by electrostatic and hydrogen-bond interactions, it is possible that nearby polymorphic residues may influence dimer stability or alter the flexibility of the Cys42–Cys42 disulfide bond.

Modulation of the immune response

Several lines of evidence have supported the role of HLA-G as a tolerogenic molecule, playing an important role in the suppression of the immune response [2, 17]. Overall, all segments of the molecule may contribute to this ability. The short cytoplasmic tail retains HLA-G longer in the endoplasmic reticulum and prolongs the half-life of the molecule on the cell surface because of the lack of an endocytosis motif [18, 19], permitting multiple interactions with cells of the immune system. The extracellular domains interact with leukocyte receptors, including CD8, LILRB1 and LILRB2 and the killer cell immunoglobulin-like receptor KIR2DL4 (CD158d) [20, 21].

According to the nomenclature, KIR genes may encode molecules exhibiting two (KIR2D) or three (KIR3D) extracellular immunoglobulin-like domains, which may contain immunoreceptor tyrosine-based inhibitory motifs (ITIMs) in the cytoplasmic domain (L, originally for long tail) or not (S, originally for short tail). KIR genes with the “L” designation are predicted to encode inhibitory receptors, while those with “S” are predicted to encode activating receptors. In addition, “S” genes encode a positively charged arginine or lysine residue in the transmembrane domain. The KIR2DL4 gene is an exception, encoding an arginine in the transmembrane domain and a single ITIM in the cytoplasmic domain. An array of activating and inhibitory receptors is expressed on the surface of most NK cells and macrophages, and the final effector function depends on the balance between activating and inhibitory receptors [22]. Because of these features, the role of the interaction of KIR2DL4 with HLA-G in the modulation of the immune response has been a matter of much debate [23, 24]. It has even been reported that cross-linking of KIR2DL4 with a specific antibody or the incubation with transfectants expressing HLA-G dimers upregulates several pro-inflammatory cytokines [25]. Overall, the α1 domain of HLA class I molecules is an important KIR recognition site; however, the binding site of KIR2DL4 to HLA-G remains unknown.
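The naming rule described above can be read off a gene name mechanically. The following Python sketch is illustrative only: `parse_kir_name` and `KirName` are hypothetical names, and only the 2D/3D and L/S patterns discussed here are handled, so other KIR designations are rejected. The predicted role comes from the name alone and is therefore not reliable for exceptions such as KIR2DL4.

```python
import re
from typing import NamedTuple


class KirName(NamedTuple):
    ig_domains: int      # number of extracellular immunoglobulin-like domains
    tail: str            # "long" (L) or "short" (S) cytoplasmic tail
    predicted_role: str  # prediction from the name only


def parse_kir_name(gene):
    """Decode names of the form KIR<2|3>D<L|S><number>."""
    match = re.fullmatch(r"KIR([23])D([LS])(\d+)", gene)
    if match is None:
        raise ValueError(f"name not covered by this sketch: {gene!r}")
    is_long = match.group(2) == "L"
    return KirName(
        ig_domains=int(match.group(1)),
        tail="long" if is_long else "short",
        predicted_role="inhibitory" if is_long else "activating",
    )


# KIR2DL4 is named as a long-tail receptor, yet it carries a charged
# transmembrane arginine and a single ITIM, so its role is debated.
print(parse_kir_name("KIR2DL4"))
```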

Since the Met76 and Gln79 residues are unique to HLA-G, and since the mutation of these residues to Ala76/79 affects the binding affinity between KIR2DL4 and HLA-G, these residues may be relevant candidates for this interaction [26]. Although Met76 and Gln79 residues are conserved in the α1 domain of all currently described HLA-G variants, one KIR2DL4 allele (KIR2DL4*006) has been associated with susceptibility to preeclampsia [27], suggesting that both HLA-G and KIR polymorphisms may be considered for functional studies.

LILRB1s are expressed on the surface of several leukocytes, such as NK and lymphomononuclear cells, while LILRB2s are primarily expressed on the surface of a restricted set of cells, including monocytes and dendritic cells [28]. Both LILRB1 and LILRB2 have several ITIMs in their cytoplasmic tail, inhibiting signaling events triggered by stimulatory receptors [29]. Both receptors interact with classical HLA class I molecules; however, the binding with HLA-G presents three- to fourfold higher affinity when compared to classical molecules [30]. LILRB1 and LILRB2 bind to the α3 domain and β2-microglobulin of the HLA-G molecule; however, LILRB2 binds with higher affinity than LILRB1. LILRB2 binds more to the α3 domain than to β2-microglobulin, and the binding sites of these receptors are distinct. The Tyr36 and Arg38 residues of LILRB2 bind to the 195–197 loop of the α3 domain of HLA-G, whereas the Tyr38 and Tyr76 residues of LILRB1 bind to the Phe195 of HLA-G [20]. Given that LILRBs are considered to be the major HLA-G receptors, and since at least 4 HLA-G alleles (G*01:06, G*01:08, G*01:14 and G*01:16, see Fig. 2) present non-synonymous polymorphic sites at exon 4, which encodes the α3 domain of the molecule, one might expect that polymorphic residues observed in this domain may influence LILRB interactions, modulating the inhibitory intracellular signaling. It is interesting to observe that the G*01:06 allele has been associated with preeclampsia in French [31] and Singaporean populations [32].

The CD8 molecule also interacts with the α3 domain of classical and non-classical molecules, including HLA-G and HLA-E, although exhibiting different affinities. CD8α/α binds to HLA-G with an affinity comparable to that of the HLA-A*02 interaction, contrasting with lower affinity for HLA-E. The affinity of this interaction may vary according to the considered allele. While most HLA-A alleles have high affinity for the α3 domain, HLA-A*68:01 and B*48:01 exhibit low affinity. Mutation analyses of classical HLA molecules have indicated that polymorphism at position 245 is responsible for this reduced affinity. The major differences between HLA-E and HLA-G in the α3 domain reside at positions 219, 223 and 224 for HLA-E and 214 and 228 for HLA-G, which are located very close to the HLA/CD8α/α interaction; i.e., the 223–229 loop of the α3 domain [21]. Although affinity studies have not been performed for variants of HLA-G molecules, those exhibiting different residues in the α3 domain (see Fig. 2) may potentially present diverse receptor affinity. It is interesting to observe that the α3 domain of HLA-G is the same site of interaction with CD8α/α and LILRBs. Because the CD8α/α and LILRB binding sites overlap, LILRBs inhibit the binding of CD8α/α to HLA molecules, indicating that the inhibitory function of LILRBs may be dual, displacing CD8α/α and activating ITIMs [30].

Besides the inhibitory role of HLA-G in cytotoxic cells exhibiting CD8 on their surface, other interactions of HLA-G molecules with CD8 cells deserve attention. Soluble HLA-A-B-C and -G molecules can induce apoptosis in CD8+ activated T lymphocytes as well as in CD8+ NK cells (lacking the T cell receptor) at similar rates. The binding of soluble HLA molecules to CD8 leads to apoptosis upregulating Fas production and Fas/FasL interaction [33, 34]. Classical and non-classical soluble molecules may produce similar effects on apoptosis of CD8+ cells; however, in conditions in which soluble HLA-G is elevated, including pregnancy, some tumors and allografts with good prognosis, this mechanism may represent an additional immunomodulatory effect of HLA-G. Although the precise mechanism of this interaction has not been elucidated, the interaction of soluble forms of HLA-G containing the α3 domain seems to be plausible.

HLA-G non-coding region polymorphism: impact on HLA-G expression regulation

The functional mRNA level of a particular gene is regulated by the rate of synthesis, mainly driven by the promoter region (5′UTR) of a given gene, as well as by the rate of degradation, stability, localization and translatability of the specific mRNA [35]. Depending on the presence of microenvironmental factors that may upregulate the expression of HLA-G and depending on the genetic background of the individual, theoretically, any tissue might express HLA-G. Furthermore, HLA-G expression may be beneficial or harmful, depending on the fine adjustment of the immune response. In pathological conditions in which a vigorous and sustained immune response is desirable, such as chronic viral disorders and cancer, the expression of HLA-G is detrimental. In contrast, when a vigorous immune response is undesirable, as in autoimmune disorders and the engraftment of allogeneic organs or tissues, the expression of HLA-G is beneficial.

Many factors have been described that potentially affect transcriptional and post-transcriptional mechanisms responsible for HLA-G regulation [36]; however, the reasons for HLA-G expression in some but not in other tissues have not been fully elucidated. In this respect, the promoter and 3′untranslated regions exhibit numerous nucleotide variations that may influence HLA-G expression and consequently tissue distribution in healthy (placenta, thymus, cornea, pancreas, proximal nail matrix, erythroblasts, mesenchymal stem cells) and pathological conditions (cancer, autoimmunity, transplantation). Besides, variation sites observed at introns 1–5 may be involved in HLA-G regulation processes, such as alternative splicing.

HLA-G promoter region polymorphisms

Overall, classical HLA-class I genes contain two main regulatory modules in the proximal promoter region (200 base pairs upstream of ATG), namely enhancer A (enhA)/interferon-stimulated response element (ISRE) and SXY [37] (Fig. 4, upper panel). EnhA comprises two palindromic binding sites, kappa(κ)B2 and κB1, for the NFκB/rel family members [38], and ISRE, a target site for the interferon regulatory factor family, including the interferon regulatory factor-1 (IRF-1) [39]. SXY comprises the X1, X2 (site α) boxes and Y box (also named enhancer B), bound by the multiprotein complex regulatory factor X (RFX) [comprising RFX5, RFX-associated protein (RFXAP) and RFX-ankyrin(ANK)/B] [40–43], X2 box-binding protein (X2-BP)/cAMP response element-binding (CREB) [44] and nuclear factor Y (NF-Y) [45]. All these factors cooperate to allow the formation of a stable multiprotein complex and the binding of the class II transactivator (CIITA), which mediates constitutive and IFN-γ-induced expression of HLA-class I molecules [46, 47]. S box function is not fully understood and could possibly play a role in the promoter architecture [48].

Fig. 4

Upper panel Comparison of cis-regulatory elements of classical HLA-class I and HLA-G proximal promoter regions (200 base pairs upstream of ATG). Dotted boxes with slashes indicate mutated regulatory elements in the HLA-G promoter. Mutations prevent binding of major classical HLA-class I transacting factors. RFX complex binds to the conserved HLA-G X1 box in electrophoretic mobility shift assay but is not associated to the HLA-G promoter in situ. Lower panel Single nucleotide polymorphisms along the whole HLA-G gene promoter sequence (1.4 kb upstream of ATG) as well as the location of the known regulatory elements

The HLA-G promoter region is unique among the HLA genes with a divergent proximal region when compared to the other HLA genes [49]. A modified enhA and a deleted ISRE render the proximal HLA-G gene promoter unresponsive to NF-κB [38] and IFN-γ [39]. The upstream region encompassing the SXY module only contains conserved S and X1 sequences. While the X1 box was shown to be a potential target for the RFX5 factors by electrophoretic mobility shift assay [50], chromatin immunoprecipitation assay analysis demonstrated that RFX5 is not associated with the HLA-G promoter in situ [51].

A few alternative regulatory elements within the HLA-G gene promoter have been described. Studies in HLA-G transgenic mice showed that a 244-bp positive regulatory region located −1.2 kb from the ATG initiation codon of the HLA-G gene is critical for its spatio-temporal expression, with a function similar to that of a locus control region (LCR) [52, 53]. The CREB1 factor binds to this region (−1380/−1370), as well as to two additional cAMP response elements (CRE) dispersed through the promoter region at positions −934 and −770 from the ATG, allowing promoter transactivation with the co-activators CREB binding protein (CBP)/P300 [54]. In addition, an ISRE motif, which is highly homologous to the consensus ISRE, is located at position −744 bp upstream of ATG [55], beside a nonfunctional GAS-like element (−734) [56]. ISRE is a binding site for IFN-response factor-1 (IRF-1) and transactivates HLA-G expression following IFN-β treatment [55]. A heat shock element (HSE) that binds heat shock factor 1 (HSF-1) is located within the HLA-G promoter at position −459/−454 [57]. Librach’s group identified a binding site for the progesterone receptor located −37 bp from ATG that overlaps the HLA-G TATA box [58]. Finally, the ras responsive element binding 1 factor (RREB-1) was recently demonstrated to downregulate HLA-G transcriptional activity through three ras response elements (RREs) along the HLA-G gene promoter (−1356, −142, −53) [59]. RREB-1 is likely to act through the recruitment of C-terminal binding protein (CtBP), which is implicated in chromatin remodeling [60].

Contrary to expectations, the HLA-G promoter exhibits numerous variations, since 29 SNPs have been identified to date [61–63]. They may be of importance in the regulation of HLA-G expression and may act in different ways. Interestingly, many of these polymorphisms either coincide with or are close to the known regulatory elements described above, and thus may affect the binding of the corresponding regulatory factors (Fig. 4, lower panel). Based on this assumption, polymorphisms have been identified in the LCR. Regarding the −725 G/C/T SNP, the three haplotypes were cloned into luciferase expression vectors and transfected into JEG-3 cells, resulting in a significantly higher expression level of the promoter haplotype containing the −725G allele compared with those containing the −725C or −725T alleles [64]. Of note is that the −725 G/C/T SNP is very close to ISRE. Along this line, polymorphism in the proximal promoter of Paan-AG, the functional homologue of HLA-G in the olive baboon, was shown to influence NF-κB binding and transcriptional activity [65]. In addition, evidence has accumulated showing that the methylation status of the HLA-G gene promoter is crucial to the transcriptional activity of the gene [66, 67]. One may speculate that promoter methylation might also be affected by polymorphisms located at CpG sites [64]. Finally, in some cases, polymorphism in the promoter region may be in linkage disequilibrium with 3′ UTR variants [68], and some of them could influence alternative splicing [69] and mRNA stability [70].
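The proximity between promoter SNPs and regulatory elements can be examined with a simple lookup. The Python sketch below uses only the element coordinates quoted in this section (positions relative to the ATG); the function name `elements_near`, the element labels and the 25-bp default window are arbitrary assumptions, and elements given as a single coordinate are treated as points because their lengths are not stated here.

```python
# (label, start, end) relative to the ATG; single-coordinate elements use start == end.
REGULATORY_ELEMENTS = [
    ("CREB1 site in LCR", -1380, -1370),
    ("RRE", -1356, -1356),
    ("CRE", -934, -934),
    ("CRE", -770, -770),
    ("ISRE", -744, -744),
    ("GAS-like element", -734, -734),
    ("HSE", -459, -454),
    ("RRE", -142, -142),
    ("RRE", -53, -53),
    ("progesterone receptor site", -37, -37),
]


def elements_near(snp_position, window=25):
    """Return (label, distance in bp) for elements within `window` bp, nearest first."""
    hits = []
    for label, start, end in REGULATORY_ELEMENTS:
        if start <= snp_position <= end:
            distance = 0
        else:
            distance = min(abs(snp_position - start), abs(snp_position - end))
        if distance <= window:
            hits.append((label, distance))
    return sorted(hits, key=lambda hit: hit[1])


print(elements_near(-725))  # [('GAS-like element', 9), ('ISRE', 19)]
print(elements_near(-56))   # [('RRE', 3), ('progesterone receptor site', 19)]
```

Proximity alone does not establish a functional effect; it only flags SNPs whose influence on factor binding may be worth testing.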

Interestingly, the pattern of variations observed in the HLA-G promoter region is characterized by two divergent lineages, which is consistent with balancing selection. Ober’s group proposed that it was probably related to highly regulated expression favoring high- and low-expressing promoters under temporally and/or spatially varying selective pressures [61]. Therefore, this should be considered as a very pertinent parameter in the understanding of HLA-G gene regulation and more particularly in some HLA-G-associated diseases. So far, only a few studies have addressed these issues. One study performed in Chicago families and in a Dutch population demonstrated that the GG genotype in SNP −964 G/A was associated with asthma in children of affected mothers, whereas the AA genotype was associated with asthma in children of unaffected mothers [68]. Another study associated the −725G variant with sporadic miscarriage [71], but, contrary to expectation, this variant was found to be associated with increased HLA-G transcription [71]. Other studies did not reveal any significant association with HLA-G promoter polymorphism, levels of HLA-G and diseases. No association was observed between −56 C/T polymorphism and preeclampsia (n = 118) and eclampsia (n = 46) [72], or between −725 C/G/T polymorphism with susceptibility to multiple sclerosis (MS) and the course of MS (n = 698) [73]. Therefore, additional and larger studies encompassing several polymorphisms of the HLA-G promoter region in various diseases are required for a definitive conclusion. Further functional analysis may complete this scenario.

HLA-G 3′ untranslated region (3′UTR) polymorphisms

Since exon 7 is always absent from the mature mRNA and, due to the stop codon in exon 6, exon 8 is not translated, this gene segment has been considered to be the 3′UTR of the mature RNA. The HLA-G 3′UTR contains several regulatory elements [35], including polyadenylation signals and AU-rich elements [74], as well as signals that regulate the spatial and temporal expression of an mRNA [35]. The primary transcripts must be processed and bound by proteins before being exported to the cytoplasm [75]. In this process, the 5′ cap is added, introns are removed by the spliceosome, and the 3′ ends are cleaved and polyadenylated. The cap-binding complex (CBC) binds to the 5′ monomethylated cap structures and poly(A)-binding proteins (PAB) bind to the 3′ tail in order to export competent messenger ribonucleoprotein particles (mRNPs) necessary for mRNA transport and translation [35, 75]. The proteins that bind to an mRNA can influence its translation, localization and degradation, and any polymorphism at the 3′UTR of a given gene may influence the binding properties. The availability of the mRNA for translation is constantly balanced by the opposing forces of mRNA retention and decay, in which non-functional and deleterious transcripts may be eliminated before translation [35, 75–77] (for a detailed review regarding the 3′UTR translational control see [35]). In addition, the 3′UTR of a given gene may be a target for microRNAs (miRNAs), which are small non-coding RNAs of approximately 22 nucleotides processed from longer precursors. miRNAs may negatively regulate gene expression by translation suppression, RNA degradation or both [35, 78]. The first miRNA was reported in 1993 [79], and more than 600 have been reported to date [80].

Before going to the HLA-G 3′UTR polymorphisms, it should be mentioned that there is no consensus in the literature regarding the position of the nucleotide variation in this region. As reported by IMGT, the first nucleotide of the coding sequence of the HLA-G gene is the adenine of the first codon at exon 1 (ATG), extending the sequence until position 2838 at intron 6. Since there is no official information beyond this point, nucleotide positions that we and other authors have used [78, 81, 82] infer polymorphic sites at 3′UTR using the original HLA-G sequence described by Geraghty et al. [83], considering the first base the adenine of the first ATG codon at exon 1, as well as public sequences reported for chromosome 6. On the other hand, other authors have used the original sequence described by Geraghty et al. [83], including part of the promoter sequence, a region encompassing 781 nucleotides before the first ATG. Currently, more than 1,500 nucleotides are considered to be the promoter sequence, and 781 bases represent only an aleatory fragment of the promoter region base sequence, as originally reported in 1987. Therefore, these authors refer to 3′UTR polymorphisms encompassing these 781 base pairs. In addition, the original sequence described by Geraghty et al. does not consider the 14 base pair (bp) insertion, since the sequenced allele was G*01:01:01:01, which has a 14-bp deletion. Given that 14-bp insertion or deletion allele frequencies are closely similar in several populations, and given that the insertion allele is present in gorillas and chimpanzees, probably representing the ancestor allele, the 14-bp sequence should be included in the sequence, as considered by us and other authors [78, 81, 82]. Therefore, the nucleotide variation sequence considered from the first ATG codon at exon 1, including the 14-bp insertion, seems to be the more appropriate form to assign 3’UTR polymorphisms. 
For instance, the SNP at the +3010C/G position is considered by other authors as +3777, i.e., 3010 plus 781 (the arbitrary part of the promoter region) minus 14 bp (a segment absent in the allele described by Geraghty et al.).
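The conversion between the two numbering conventions can be sketched as follows. This is only an illustration: the function and constant names are ours, and the 14-bp correction is assumed to apply to sites located downstream of the 14-bp fragment, as in the +3010 example.

```python
# Offsets taken from the text; all names are illustrative only.
PROMOTER_FRAGMENT = 781  # bases before the first ATG in the Geraghty et al. sequence
INDEL_LENGTH = 14        # 14-bp fragment absent from the sequenced G*01:01:01:01 allele


def atg_to_geraghty(position: int) -> int:
    """Convert an ATG-based 3'UTR position (14-bp insertion included) to the
    numbering that starts 781 bases upstream and lacks the 14-bp fragment.

    Assumption: the site lies downstream of the 14-bp fragment.
    """
    return position + PROMOTER_FRAGMENT - INDEL_LENGTH


def geraghty_to_atg(position: int) -> int:
    """Inverse conversion, under the same assumption."""
    return position - PROMOTER_FRAGMENT + INDEL_LENGTH


print(atg_to_geraghty(3010))  # 3777
```

Keeping both directions explicit makes it easier to compare reports that use different conventions, provided the convention used by each report is known.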

The HLA-G gene exhibits a 3′UTR presenting AU-rich motifs, a poly-A signal, and several polymorphic sites that may potentially influence HLA-G transcription, translation or both by several different mechanisms. Among them, it is worth mentioning: (1) the presence (insertion) or absence (deletion) of a 14-bp fragment (INDEL polymorphism), which has been associated with mRNA stability, (2) the SNP at the +3142 position, which may be a target for certain miRNAs, degrading HLA-G mRNA, and (3) the SNP at the +3187 position, which is related to mRNA stability and degradation. Figure 5 illustrates the 3′UTR of the HLA-G gene, emphasizing the polymorphic sites that may influence HLA-G expression.

Fig. 5

Variation sites, described by us and by other authors, observed in the HLA-G 3′ untranslated region, which may influence HLA-G expression. The frequencies of some of these polymorphic sites observed in a highly heterogeneous population such as Brazilians are also shown. a Castelli et al. [82]. b Hviid et al. [87]. c Hiby et al. [85]. d Rousseau et al. [70]. e Alvarez et al. [74]. f Unpublished data. This SNP was found in the Brazilian population in only two individuals. g Tan et al. [81]. h Yie et al. [91]

Although the 14-bp INDEL polymorphism has been associated with the magnitude of HLA-G production [84], modulating HLA-G mRNA stability [70, 85–87], the mechanisms implicated have not been elucidated. HLA-G alleles presenting the 14-bp (5′-ATTTGTTCATGCCT-3′) sequence [88] have been associated with lower mRNA production for most membrane-bound and soluble isoforms in trophoblast samples [87, 89]. On the other hand, a fraction of HLA-G mRNA transcripts presenting the 14-base insertion can be further processed (alternatively spliced) by the removal of 92 bases from the mature HLA-G mRNA (Fig. 4) [85, 87], yielding smaller HLA-G transcripts, reported to be more stable than the complete mRNA forms [70]. The ratio between −92 and +92 base transcripts may vary according to the cell line studied, being 0.5 for JEG-3 cells (which endogenously produce HLA-G) and 0.2 for transfected M8 melanoma cells [70]. This alternative splicing is probably not driven only by the presence of the 14-base fragment in the primary transcript and may be a consequence of the presence of other polymorphisms in linkage disequilibrium with the 14-bp insertion. In fact, JEG-3 cells are homozygous for the G*01:01:03 allele, and M8 cells are transfected with the G*01:01:02 allele, both presenting nucleotide differences in exons 3 and 5, and also in introns 1 and 4. In addition, the G*01:01:03 and G*01:01:02 alleles are associated with different 3′UTR haplotypes that differ at the +3035, +3187 and +3196 positions, all preserved after the alternative splicing [82, 90]. In conclusion, the combination of such polymorphisms, encompassing the coding region and 3′UTR, may influence the amount of primary 14-base insertion transcripts that lose the 92-base sequence.
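As a minimal sketch of how the 14-bp INDEL could be scored from sequence data, the example below searches for the 14-bp fragment quoted above by exact string matching. The function names and the flanking bases in the toy sequences are made up for illustration and are not real HLA-G sequence; real genotyping would need to handle sequencing errors and alignment.

```python
FRAGMENT_14BP = "ATTTGTTCATGCCT"  # 14-bp sequence quoted in the text [88]


def indel_status(sequence: str) -> str:
    """Return 'insertion' if the 14-bp fragment occurs in the sequence,
    otherwise 'deletion' (exact match only; an illustrative simplification)."""
    return "insertion" if FRAGMENT_14BP in sequence.upper() else "deletion"


def indel_genotype(allele_1: str, allele_2: str) -> str:
    """Combine the status of two allele sequences into a genotype label."""
    statuses = sorted([indel_status(allele_1), indel_status(allele_2)])
    return "/".join(statuses)


# Toy sequences: the flanking bases are hypothetical placeholders.
with_fragment = "GGCAAG" + FRAGMENT_14BP + "CTTACA"
without_fragment = "GGCAAG" + "CTTACA"

print(indel_genotype(with_fragment, without_fragment))
```

Sorting the two statuses gives one label per genotype regardless of allele order, which simplifies counting genotype frequencies.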

The nucleotide variation at the +3142 position (Fig. 5) was recently explored by Tan et al. [81], who demonstrated that this marker may be related to susceptibility to asthma as well as influencing HLA-G expression. The presence of a guanine at the +3142 position may influence the expression of the HLA-G locus by increasing the affinity of this region for the microRNAs miR-148a, miR-148b and miR-152, therefore decreasing the mRNA availability by mRNA degradation and translation suppression. The influence of the +3142G allele was demonstrated by a functional study in which HLA-G high-expressing JEG-3 cells were transfected with miR-148a, decreasing soluble HLA-G levels. In addition, the miRNAs miR-148a and miR-148b were detected in bronchial epithelial cells but were underrepresented in cytotrophoblast cells, which is in agreement with the higher expression of HLA-G in cytotrophoblast cells in relation to bronchial epithelial cells. As with miR-148a, miR-148b and miR-152, an in silico analysis has revealed that several other human miRNAs have the potential to bind to the HLA-G mRNA 3′UTR and influence HLA-G expression. The binding ability of these miRNAs may potentially be influenced by several polymorphisms present in the HLA-G 3′UTR, emphasizing the role of the 14-bp fragment, SNPs at the +3003, +3010, +3027 and +3035 positions, which encompass a region of only 32 nucleotides that may influence the binding of several miRNAs, and the +3142 SNP, as previously discussed [90].

A recent study has reported that the presence of an adenine at the +3187 position is associated with preeclampsia in a Canadian population [91]. This SNP is near (4-bp upstream) an AU-rich motif that mediates mRNA degradation. The same report demonstrated that this SNP is associated with decreased RNA stability in vitro, and the presence of the +3187A allele may lead to a decreased HLA-G expression [91].

These three polymorphic sites associated with HLA-G production may also be associated with each other, illustrating a scenario in which their influence may not be mutually exclusive. It is noteworthy that the 14-bp insertion is always accompanied by the +3142G and +3187A alleles, both previously associated with low mRNA availability, indicating that the low mRNA production associated with the 14-bp insertion [87] may also be a consequence of the presence of these polymorphisms associated with the 14-bp fragment [82].

In addition to the variation sites discussed above, the 3′UTR presents several other polymorphisms frequently observed in highly genetically diverse populations [82, 92], which have not been associated with differential HLA-G expression so far, as illustrated in Fig. 5. The polymorphic sites at the 3′UTR seem to be arranged in several haplotypes, each of them associated with either a single or a group of coding and promoter region polymorphisms [74, 82], creating an extended HLA-G haplotype. For example, the +3003C allele is associated with other 3′UTR variation, including the 14-bp deletion, +3142C and +3187A alleles, as well as the coding region nucleotide variation typical of the G*01:01:01:05 allele, and further with a specific promoter region polymorphism [82].

In conclusion, in contrast to the coding region, the 3′UTR of the HLA-G locus presents a high degree of variation, presenting several polymorphic sites that may potentially influence mRNA stability, turnover, mobility and splicing pattern. The expression of HLA-G is a complex process modulated by many factors such as the promoter efficiency, driven by 5′ promoter polymorphisms, and the rate of mRNA degradation or translation, highly influenced by the mRNA 3′UTR polymorphisms. Although several polymorphisms in the HLA-G 3′UTR have been previously related to mRNA stability or degradation, their influence seems to occur simultaneously since these alleles are characteristically associated in haplotypes.

Evolution aspects of HLA-G

To understand the mechanisms that generated diversity at the HLA-G locus, some evolutionary aspects observed in New and Old World primates, in Pongidae and in humans are emphasized, including the genetic diversity of the HLA-G locus in worldwide populations.

To avoid misunderstandings regarding the primate Major Histocompatibility Complex nomenclature, the general abbreviation MHC-G will be used to assign the HLA-G homologous locus. A similar approach will be used for other MHC loci.

New World primates

Considering the lineage that gave rise to Old World and anthropoid monkeys, the New World monkey lineage separated about 38 million years ago (mya). The cotton-top tamarin (Saguinus oedipus, Saoe) that inhabits the Central-South American continent (Costa Rica, Panama, Colombia) is a typical example of this group. Saoes presumably have functional MHC-G-like genes instead of MHC-A and -B genes [93]. MHC-C sequences have also been detected ([94], Fig. 6), which may strongly interact with KIRs [95]. Saoes share more sequence homologies with HLA-G than with classical HLA class I genes [93, 96]. Thus, it has been proposed that HLA-G could be the ancestral MHC class I gene and that MHC class I genes of the Saoes could be homologous to the HLA-G locus. On the other hand, it has also been shown that MHC-E may be more similar to Saoe MHC (Fig. 7) when MHC-G and, in addition, MHC-E are analyzed together [95, 97].

Fig. 6

New World monkeys: the cotton-top tamarin (Saguinus oedipus, Saoe) MHC-C. Two different MHC-C sequences have been found (GenBank accession numbers AF005217 and AF005218), which cluster with the MHC-C of other primates. This is represented in a NJ dendrogram. HLA human MHC, Patr chimpanzee, Gogo gorilla, Popy orangutan, Hyla gibbon, Mamu M. mulatta

Fig. 7

The MHC-G DNA sequence of cotton-top tamarins (Saoe) is more closely related to the primate MHC-E than to the primate MHC-G sequence in the NJ tree (shown), also regarding DNA base percentage similarity (not shown). HLA human MHC, Patr chimpanzee MHC, Gogo gorilla MHC, Popy orangutan MHC, Hyla gibbon MHC, Mamu M. mulatta

MHC-G intron 2 sequences show conserved motifs in all primate species studied so far, particularly a 23-bp deletion between positions 161 and 183, which seems to be locus specific [97, 98]; however, the Saoe MHC-G intron 2 does not bear this deletion (Fig. 8), which is surprising. The two most feasible explanations for this phenomenon could be that: (1) the MHC-G-like sequences described in Saoe did not give rise to the Old World monkey and human MHC-G alleles, or (2) the 23-bp deletion (Fig. 8) occurred after separation of the New World and Old World monkey lineages about 38 mya. The first hypothesis is more plausible since eluted peptides from cotton-top tamarin MHC-G-like molecules are not typical of MHC-G [G. Rammansee, personal communication, 4].

Fig. 8

MHC-G intron 2 nucleotide sequences compared to other MHC sequences (see also [98, 101]). The box indicates the 23-bp deletion observed in intron 2 of MHC-G sequences. Cotton-top tamarins (Saoe) do not show this typical deletion, and this feature casts further doubt on whether the majority of the described cotton-top tamarin MHC alleles belong to the MHC-G lineage. Otherwise, this deletion may have appeared in the Old World monkey lineage. The Saoe intron 2 sequence belongs to the Saoe-G*02 and -G*04 alleles, which were sequenced in three different monkeys. Identity between residues is indicated by a dash and deletions are denoted by an asterisk

On the other hand, Saoe usually gives birth to monozygotic twins after 4–5 months of pregnancy and, thereafter, their fathers take care of the newborns until they are weaned and only give them to their mother for feeding [99]. Whether or not this peculiar pregnancy is related to the peculiar MHC-G system of this species has to be determined. Due to the high polymorphism observed in Saoe MHC-G molecules (11 alleles) (Fig. 9), it is possible that these G-like molecules are classical antigen-presenting molecules in New World monkeys since no other class I MHC molecules have been found, except one that seems to be MHC-C, not known to be translated into protein [94, 95, 100].

Fig. 9

Saoe (cotton-top tamarins) MHC-G clusters in relatedness dendrograms with MHC-E of other primates (see Fig. 7)

In conclusion, many MHC-G-like molecules exist in monkey species that separated from the human lineage at least 38 mya. Since MHC-G alleles present polymorphic sites at the T cell receptor, NK receptor and antigen binding sites, it is likely that the MHC-G function of these New World monkeys is similar to classical HLA class I presenting molecules.

Old World primates (Cercopithecinae subfamily)

This group of monkeys lives in Eurasia and Africa. Their pregnancies last 5 months, and they usually deliver just one newborn. The MHC-G alleles of M. mulatta (Mamu, rhesus monkey), M. fascicularis (Mafa, cynomolgus monkey) and C. aethiops (Ceae, green monkey) all bear stop codons in a very restricted area of exon 3 (at codon 164), and some alleles may also show stop signals at codons 133, 118 and 176 ([101], Fig. 10). The α1 domain of the molecule is always preserved in all species studied and may suffice for MHC-G function in Cercopithecinae. Considering the postulated role of MHC-G in preserving the fetus from maternal NK cell attack, and considering that the pregnancies are normal, usually bearing just one fetus, functional MHC-G molecules lacking the α2 domain may exist in these species. Otherwise, stop codon read-through mechanisms may be present, as found in certain mammalian genes [102–104].

Fig. 10

Macaca mulatta. MHC-G exon 2 and exon 3 sequences, showing two stop codon positions at exon 3 (TGA) (five individuals were studied). Macaca fascicularis (four individuals) and Cercopithecus aethiops (five individuals) also showed stop codons at MHC-G exon 3. All of these monkeys belong to the Cercopithecinae subfamily and show one stop codon at the 164 position. Some other Macaca species (43 individuals were studied) show additional stop codons at exon 3 residues 118, 133 and 176 [101]

A total of 43 Cercopithecinae individuals were tested for the presence of a stop codon at position 164, and it was observed in all of them. This is a general feature of this subfamily, which may have occurred before speciation within the group precursors (at least 30 mya, when they separated from higher primates, Hylobates and Pongidae). Stop base triplets have also been reported in humans at codon 107, and the presence of a cysteine at codons 101 and 164, crucial to maintain the overall molecule tertiary structure, has also been found to be substituted by another amino acid [101]. All these variations were observed in a heterozygous form. In these species, mothers are exposed to relatively few allogeneic fetuses since 92% of pregnancies have the α-male as a father [99].

In conclusion, only postulated isoforms not bearing the α2 MHC-G domain are found in Cercopithecinae monkeys due to the existence of homozygous stop codons, mainly at position 164, although other stop codon positions are observed. These mutations appeared in the Cercopithecinae common ancestors after the separation from the human and Pongidae lineages about 35 mya. Either α2-lacking MHC-G isoforms may suffice for function, or all isoforms may be possible owing to stop codon read-through mechanisms.

Pongidae

Gorillas and chimpanzees (Figs. 11a, b) do not seem to have high polymorphism at MHC-G; however, more individuals need to be studied. The almost non-existence of nucleotide variation parallels the relatively low polymorphism seen in humans (see Fig. 2) [98, 105]. Mothers show relatively high exposure to allogeneic fetuses due to polygamy [99]. However, the orangutan (Fig. 11c) shows five encoded proteins that affect both the T cell receptor and the antigen binding site. Thus, the very low MHC-G polymorphism may only have appeared 15 mya, when the orangutan diverged from the human lineage. Orangutans may show long lasting male-female relationships that expose mothers to relatively few allogeneic fetuses [99].

Fig. 11

MHC-G molecule (α1, α2 and α3 domains) in primates. a Gorilla gorilla: only one MHC-G allele was found; b Pan troglodytes: one variable position and two different MHC-G alleles were found; and c Pongo pygmaeus: MHC-G alleles found in variable positions in five orangutans. Variable amino acid positions are indicated in circles for each domain cluster (codons)

In conclusion, orangutans that separated from the human lineage about 15 mya show more MHC-G alleles than gorillas and chimpanzees, exhibiting variability at the T cell receptor, NK and antigen binding sites.

Humans

Similar to gorillas and chimpanzees, humans also present low coding region polymorphism, encoding only 14 common proteins. Many Negroid, Caucasoid and Mongoloid individuals have been studied [98, 105] (Table 1), yielding sound results. The allelism apparently does not drastically affect HLA-G molecule function. According to these data, the coding region of the HLA-G gene seems to be under strong selective pressure for invariance (purifying selection), i.e., preservation of nucleotide and amino acid sequences, reduced variability and a lower than expected non-synonymous mutation rate [106]. However, humans bearing the null allele (G*01:05N) in homozygosis have been found [107–109], which may indicate that soluble HLA-G molecules or molecules lacking the α3 domain are sufficient for HLA-G function.

Table 1 Frequencies of the HLA-G coding region alleles observed in selected populations

The selective forces acting at the promoter region and 3′UTR of the HLA-G gene seem to be different from those acting at the coding region. The pattern of variation of the promoter region is characterized by two divergent lineages of haplotypes, which have been maintained by balancing selection in worldwide human populations studied so far. These lineages may have different promoter activity and may modulate the fine balance between high- and low-expressing HLA-G haplotypes [61]. It is noteworthy that the majority of SNPs described in the promoter region are located in transcription factor binding sites (Fig. 4, lower panel), so heterozygosis in these regions may strongly affect the promoter responsiveness to several transcription factors. In addition, the 3′UTR seems to undergo the same pattern of selective pressure as the promoter region. The two most common 3′UTR haplotypes found in highly genetically diverse populations, such as Brazilians, differ in 5 out of 8 points of variation, including those already described as influencing HLA-G expression (14-bp, +3142 and +3187). Given that, and considering that both present frequencies of approximately 25% [82], one may assume that this region may also be under balancing selection.

The within-species peculiarities of MHC-G genes in primates (not following a linear evolutionary pattern) suggest that behavior peculiarities give rise to particular fetal-maternal relationships that may shape MHC-G evolution. For instance, cotton-top tamarins always give birth to monozygotic twins, which thereafter cling to the father and are born to long-lasting monogamous couples. All other primates are polygamous (polygamy, especially in primitive societies, is a human characteristic), and mothers may be in contact with many different fetuses regarding MHC antigens. However, Cercopithecinae and orangutans (species less exposed to allogeneic fetuses) are the most polymorphic at the MHC level [101]. A hypothesis may be raised that a mechanism avoiding a high MHC-G allelism has been developed in polygamous species (Old World and anthropoid monkeys and humans). This would protect the mother against frequently MHC incompatible trophoblast aggression (as seen in some pregnancy-related tumors) and, on the other hand, would confer to the mother a simple non-polymorphic system to switch off fetal NK and other immune system cells [17].

Since each primate species seems to have undergone a particular evolutionary pathway, further research studies are needed to understand the MHC-G 40 million year evolution, as summarized in this paper.

Population genetics in humans

By definition, a polymorphic site must be present in a population at a frequency higher than 1%; however, some HLA-G variation sites may not be considered true polymorphisms, since they are observed in just one or in a few individuals. Among the 13 SNPs described for exon 2, 12 for exon 3 and 8 for exon 4, apparently only four SNPs (one in exon 2, two in exon 3 and one in exon 4) present frequencies higher than 1% in worldwide populations [31, 82, 89, 105, 110–116]. The majority of these SNPs are synonymous substitutions, and only one non-synonymous substitution in each of exons 2, 3 and 4 presents a frequency higher than 1% in worldwide populations [31, 82, 89, 105, 110–115].

Regarding the consensus allele family (G*01:01), only 4 variation sites in the coding region of the HLA-G locus that lead to amino acid exchange are frequently found in worldwide populations: (1) the +292 A → T SNP at exon 2 (codon 31), exchanging threonine for serine, characteristic of the G*01:03 allele; (2) the +755 C → A SNP at exon 3 (codon 110), exchanging leucine for isoleucine, typical of the G*01:04 allele group; (3) the ΔC deletion at exon 3, typical of the G*01:05N allele; and (4) the +1799 C → T SNP (codon 258), exchanging threonine for methionine, characteristic of the G*01:06 allele. In practical terms, with the exception of the null alleles, only four different HLA-G proteins are usually observed in worldwide populations, which are encoded by the most frequent G*01:01, G*01:03, G*01:04 and G*01:06 allele families (Table 1), and which may influence HLA-G function (Fig. 2).

It is interesting to observe that the ΔC deletion characteristic of the G*01:05N allele [12] is frequently found in some populations and absent in others [108, 111, 117], being overrepresented in North Indian (15.4%) and African Shona (11.1%) populations compared to Danish (1.0%) and Brazilian (0.97–4.17%) populations. In addition, the G*01:03 allele is overrepresented in North Indian (24.2%) compared to African Shona (7.4%) and Danish (4%) populations (Table 1).
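The 1% criterion for a polymorphic site can be applied to the G*01:05N frequencies quoted above, as in the short sketch below. The names are illustrative, the Brazilian range (0.97–4.17%) is omitted because it spans the threshold, and the comparison treats the reported percentages as exact, which ignores sampling error.

```python
POLYMORPHISM_THRESHOLD = 1.0  # percent; a site must exceed this frequency

# G*01:05N frequencies (%) quoted in the text
g0105n_frequency = {
    "North Indian": 15.4,
    "African Shona": 11.1,
    "Danish": 1.0,
}


def is_polymorphic(frequency_percent: float,
                   threshold: float = POLYMORPHISM_THRESHOLD) -> bool:
    """True when the frequency is strictly higher than the threshold."""
    return frequency_percent > threshold


for population, frequency in g0105n_frequency.items():
    label = "polymorphic" if is_polymorphic(frequency) else "not above 1%"
    print(f"{population}: {frequency}% ({label})")
```

Under this strict reading, the same variant qualifies as a polymorphism in some populations and not in others, which is why population-specific frequencies matter when a site is classified.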

Considering all alleles described for the coding region of the HLA-G gene, at least 24 exhibit a particular substitution detected only in a single or in a few individuals (Table 1). In fact, the number of detected HLA-G alleles in distinct samples from different regions is frequently less than the total number [82, 89, 105, 108, 111, 118–122]. Even in highly admixed populations, such as Brazilians, only 19 out of the 44 HLA-G alleles were detected considering two different samples, 3 of them being observed only in single individuals [82, 105, 113, 123].

Overall, the low protein variability observed for the HLA-G gene corroborates evidence of its evolutionary pathway.

HLA-G polymorphism and disease associations

HLA-G expression has been extensively evaluated in several disorders of distinct etiologies; however, HLA-G gene polymorphism has not been studied to the same extent. Since the magnitude of HLA-G expression is regulated by the promoter and 3′ untranslated regions, many studies have focused on these regions, particularly on the 3′UTR polymorphic sites. In this section, we selected some disorders in which the tolerogenic role of HLA-G has been implicated on disease susceptibility, outcome or both, excluding pregnancy disorders, which will be discussed in another part of this series of HLA-G reviews.

Chronic viral infections

Viruses have evolved multiple strategies to subvert host immune surveillance and responses. Strategies such as the downregulation of classical MHC class I antigens prevent the display of viral peptides, allowing escape from T lymphocyte cytolysis [124]. In principle, virus-infected cells exhibiting downregulation of MHC class I molecules should be vulnerable to natural killer (NK) cell-mediated cytolysis. However, viruses have developed various mechanisms to impede NK cell recognition by inducing expression of HLA-G molecules [125, 126], which can suppress the function of various immune cells [127–129]. The immunosuppressive properties of HLA-G might contribute to the susceptibility to and persistence of viral infections.

Studies looking at the association between HLA-G expression and HIV-1 infection have yielded contradictory results. Increased soluble HLA-G levels [130] and increased surface expression of HLA-G on monocytes and T lymphocytes [126] were observed in antiretroviral therapy (ART)-treated HIV-1-infected European Caucasians. In contrast, low levels of plasma soluble HLA-G were associated with HIV-1 infection in ART-naive Beninese commercial sex workers (CSW) [131]. The discrepancies observed in these studies may be due to several reasons. First, ART can induce surface expression of HLA-G on peripheral blood monocytes from HIV-1-infected patients [132, 133]. Hence, the relatively high blood levels of HLA-G observed in the European subjects could be due to ART and not to HIV-1 infection per se. Second, previous studies [126, 130] did not control for potential confounding factors that could influence HLA-G expression. Blood levels of soluble HLA-G are lower in men than in non-pregnant women [134], and during pregnancy the levels increase even more [135]. Third, cytokine production could influence HLA-G expression and vice versa. Interleukin (IL)-10 has been shown to induce HLA-G expression [136], and HLA-G can also stimulate IL-10 expression in peripheral blood monocytes [137]. Interestingly, plasma levels of IL-10 were lower in HIV-1-infected CSWs compared to both the HIV-1-uninfected CSW and non-CSW groups [138]. Fourth, HLA-G expression may fluctuate over the course of infection and may vary with the rate of disease progression.
Indeed, longitudinal monitoring of plasma-soluble HLA-G levels in subjects with acute HIV infection undergoing different rates of disease progression showed that levels were elevated in the early phases of infection for all patients and remained high throughout follow-up in rapid progressors who responded to ART; however, soluble HLA-G levels were restored to normal in the chronic phase of infection in both untreated normal progressors and long-term nonprogressors [139]. Finally, HLA-G gene polymorphism has to be taken into account since healthy individuals carrying the HLA-G*01:01:03 and G*01:05N alleles have lower plasma-soluble HLA-G levels than subjects carrying the more frequent HLA-G*01:01:01 allele. In addition, individuals with the latter allele have lower plasma-soluble HLA-G levels than those with the HLA-G*01:04:01 allele [84]. Moreover, the presence of a 14-bp sequence insertion in the HLA-G 3′UTR has been associated with significantly reduced HLA-G mRNA levels [87] and lower levels of soluble HLA-G in the serum of healthy subjects [63, 140]. In the study of CSWs from Benin, all subjects were non-pregnant women, and their reduced expression of soluble HLA-G in plasma was independently associated with both HIV-1 infection and the HLA-G 3′UTR 14-bp sequence insertion homozygote genotype [131].

HLA-G polymorphism is also associated with the risk of human immunodeficiency virus (HIV) infection. Indeed, the HLA-G*01:05N allele was significantly associated with protection from HIV-1 infection, whereas the HLA-G*01:01:08 allele was associated with an increased risk of HIV-1 infection in Zimbabwean women [141, 142].

Since the HLA-G*01:05N allele does not encode complete HLA-G1 proteins [107], and since HLA-G1 is a major ligand by which HLA-G inhibits NK cell-mediated lysis [143], it has been postulated that the absence or reduced expression of HLA-G1 molecules in individuals carrying the HLA-G*01:05N allele would allow NK cells to destroy HIV-infected cells, leading to protection from infection. The HLA-G*01:01:08 allele is defined by a synonymous substitution (proline) at codon 57. Although discordance in HLA-G codon 57 between a mother and her child has been reported to reduce the risk of HIV-1 vertical transmission in American Caucasians [144], this finding was not confirmed in the Zimbabwean population [145]. Because the polymorphism in codon 57 does not change the HLA-G amino acid composition and presumably its function, it is difficult at this time to envisage the mechanism by which such a silent mutation could have a direct influence on HIV transmission.

In addition, two independent studies have found an association between HLA-G nucleotide sequences located in the 3′UTR, including the C variant at position +3010 (others consider this position as +3777) and the 14-bp deletion polymorphism, and decreased risk of HIV-1 vertical transmission [146, 147]. Moreover, two independent studies have demonstrated that the +3010 and 14-bp polymorphisms are in linkage disequilibrium, especially the association between the 14-bp insertion and the +3010C allele [82, 148]. The +3010C variant alone has no effect on vertical transmission of HIV-1 but, when linked with the 14-bp deletion allele, exerts a positive role on protection [148].

Associations between HLA-G polymorphism and other chronic viral infections have also been described. Martinetti et al. [149] observed that the HLA-G 3′UTR 14-bp deletion homozygote genotype and the presence of the HLA-G*01:04:01 allele that contains this deletion were both risk factors for vertical transmission of hepatitis C virus (HCV), whereas the HLA-G*01:05N allele containing the 14-bp insertion polymorphism was associated with reduced risk of transmission. Homozygosity for the C variant at position +3142 in the HLA-G 3′UTR region conferred protection against HCV infection among sickle cell disease patients [150]. Interestingly, the HLA-G 3′UTR +3142 variant is in linkage disequilibrium with the 14-bp polymorphism and is thought to affect microRNA binding to the mRNA molecule, thus influencing HLA-G RNA turnover and translation [81, 82]. During acute human cytomegalovirus (hCMV) infection both membrane-bound and soluble HLA-G expression is upregulated in peripheral blood monocytes and bronchoalveolar macrophages [125, 151]. The HLA-G 14-bp deletion homozygote genotype was associated with acute hCMV infection in children and higher hCMV DNA copy numbers in the urine of these children [152].

In neurotropic viral infections, herpes simplex virus type 1 and rabies virus upregulate the neuronal expression of HLA-G isoforms in both infected cells and neighboring uninfected cells [153]. Cell surface expression of HLA-G was restricted to rabies virus-infected neurons [154]; however, there are no HLA-G polymorphism studies in these disorders.

Taken together, these observations suggest that, in the context of viral infections, the expression of HLA-G is a complex process modulated by many factors such as HLA-G polymorphism, stage of infection, drug therapy, and cytokine expression patterns, which may contribute to an immunological environment affecting the outcome of infection.

Transplantation

Due to the scarcity of organs for transplantation, and due to the several mechanisms that contribute to graft rejection, much effort has been devoted to understanding the mechanisms associated with graft rejection and survival. Since the discovery in the 1990s that HLA-G played a relevant physiological role in the induction of maternal-fetal tolerance, the idea of investigating the role of HLA-G in allografting has taken hold. Several studies have been reported demonstrating that the presence of HLA-G in the allograft [155–158] or the increase of soluble HLA-G serum/plasma levels [159–166] are associated with better graft acceptance, increased graft survival or both.

Despite the large amount of information regarding the graft expression or the levels of soluble HLA-G, few studies have been conducted evaluating polymorphic sites in the HLA-G gene, most of them evaluating only the 14-bp INDEL polymorphism in the 3′UTR. In kidney transplantation, the frequency of the 14-bp insertion/deletion alleles or genotypes revealed no significant differences when patients were compared to healthy controls, either in a Brazilian [167] or in an Italian cohort of patients [168]; however, the frequency of the 14-bp insertion homozygous genotype was increased in patients exhibiting acute rejection [167]. Similarly, in bone marrow transplantation for thalassemia, the 14-bp deletion homozygous genotype was associated with a higher risk of developing acute graft-versus-host disease [169]. In contrast, for heart transplants, the 14-bp deletion homozygous genotype has been associated with higher levels of soluble HLA-G and higher bioavailability of cyclosporine [170]. Although the 14-bp insertion has been associated with a more stable mRNA [70], the 14-bp insertion homozygous genotype has been associated with a lower production of HLA-G [63, 87]. According to these findings, the results regarding the 14-bp INDEL polymorphism in transplantation are coherent, indicating that insertion homozygous genotypes (associated with decreased production of HLA-G) are associated with rejection or acute graft-versus-host disease, whereas the deletion homozygous genotype (associated with increased production) is associated with low rejection.

Besides the 3′UTR, coding region polymorphisms have also been studied in renal transplant patients. It has been well recognized that HLA-A, -B and -DR incompatibilities are associated with a poorer prognosis and shorter allograft survival; however, little attention has been paid to HLA-G locus incompatibility between donor and recipient, particularly considering that the HLA-G molecule is not a recognized peptide presenter molecule. On the other hand, the HLA-G locus is in linkage disequilibrium with the HLA-A locus, and the molecule has well-recognized tolerogenic features. The presence of two HLA-G matches was associated with a lower risk of kidney rejection when compared to zero or one match. In addition, patients who were heterozygous for HLA-G alleles exhibiting synonymous/non-synonymous (S/NS) nucleotide variations had a higher chance of developing rejection than those homozygous for S/S or NS/NS alleles [113]. Since HLA-G alleles exhibiting synonymous nucleotide variation may have distinct promoter or 3′UTR sequences, this study supports the importance of studying regulatory HLA-G regions to understand the genetic mechanism associated with the production of HLA-G in transplants.

Autoimmune disorders

Autoimmune disorders encompass a heterogeneous group of diseases, exhibiting complex genetic inheritance, affecting virtually all organs or tissues, and producing a vast array of clinical features. Breakdown of the mechanisms that control central or peripheral tolerance in genetically susceptible individuals, usually triggered by environmental factors, is the major pathogenetic feature. In addition, many of these diseases share several susceptibility genes, particularly those encoded by the MHC. Since classical HLA class I and II molecules are highly implicated in peptide presentation, and since disturbances of the MHC/peptide/T cell receptor interaction may drive the production of auto-reactive T lymphocytes or autoantibodies, the genes that encode these molecules have been the primary candidates associated with susceptibility to or protection against the development of autoimmunity. Besides the MHC, some genes encoding molecules responsible for the negative control of the immune response, including CTLA-4, PTPN-22 and correlated genes, have been associated with distinct autoimmune disorders like systemic lupus erythematosus, rheumatoid arthritis, multiple sclerosis and type 1A diabetes mellitus [171–173]. Given that autoimmunity is caused by a breakdown of tolerance and HLA-G is a tolerogenic molecule, HLA-G should be an adequate subject of study; however, little attention has been given to HLA-G tissue expression or genetic aspects.

Although the presence of soluble isoforms or HLA-G expression on affected tissue cells would be expected to lessen autoimmune manifestations, only a few examples have shown such an association. Tissue HLA-G expression has been observed in the skin of patients with systemic sclerosis, who exhibited a lower frequency of vascular cutaneous lesions, telangiectasias, inflammatory polyarthritis and increased survival after a 15-year follow-up [174]. In the demyelinating disorder multiple sclerosis, levels of soluble HLA-G in the cerebrospinal fluid of patients have been inversely associated with imaging findings indicative of disease activity [175]. Soluble HLA-G levels were increased in patients with systemic lupus erythematosus in relation to controls; however, no correlation with disease activity was observed [176].

With respect to polymorphic sites in the HLA-G gene, no association was found regarding the frequencies of the −725C/G (promoter region), exon 3 positions 129 and 130 ΔC (coding region), and 14-bp INDEL (3′UTR) in multiple sclerosis patients [73]. Increased frequencies of the 14-bp insertion allele as well as the insertion/insertion genotype were observed in systemic lupus erythematosus patients [115], whereas the heterozygous insertion/deletion genotype was associated with the disease activity index, photosensitivity and absence of arthritis [177]. The 14-bp deletion allele was overrepresented in patients exhibiting the juvenile form but not the adult form of rheumatoid arthritis, presenting no association with disease severity or clinical manifestations [178]. In addition, the 14-bp insertion/insertion genotype was associated with increased levels of soluble HLA-G and a better therapeutic response to methotrexate in adult rheumatoid arthritis patients [179].

Agnostic studies using genome-wide screens or fine mapping of the MHC region, conducted on large numbers of multiplex families, have unexpectedly identified the HLA-G region as a susceptibility locus for type 1 diabetes mellitus [180] and for bronchial asthma, which is an immune-mediated chronic inflammatory disease [68]. The risk conferred by HLA-G for diabetes mellitus patients was independent of that conferred by the well-recognized classical class II HLA-DRB1, DQB1 and DQA1 alleles [181, 182]. The association of bronchial asthma with HLA-G has led investigators to evaluate the promoter and 3′UTR of the gene, reporting that the GG genotype of the −964G/A SNP was associated with asthma in children of affected mothers, whereas the AA genotype was associated with asthma in children of unaffected mothers, as previously mentioned [68]. In addition, the authors identified the +3142C/G SNP in the 3′UTR as a putative target site for miRNAs, strengthening the association of HLA-G and asthma [81].

Taken together, these findings suggest that the HLA-G polymorphic sites associated with autoimmune disorders lie in functional regions of the gene, i.e., the promoter and the 3′ untranslated region.

Tumors

Cancer cells display tumor-associated antigens coded by mutated or normal deregulated genes that, once presented by classical MHC class I molecules, may be recognized by the host immune system, which usually eliminates the cells. Even in the presence of a competent immune system, neoplastic cells can grow and progress to very aggressive malignant lesions, evolving by tumor immunoediting, a process that comprises three major mechanisms. First, most immunogenic tumor cells are eliminated by cytotoxic T and NK cells. Second, tumor cells with reduced immunogenicity are selected. Third, variants that no longer respond to the host immune system are maintained [183–185]. HLA-G is supposed to play a pivotal role in cancer immunoediting by decreasing the elimination of tumor cells, inhibiting the cytotoxic function of T and NK cells, and by trogocytosis, i.e., the intercellular transfer of viable HLA-G molecules, rendering competent cytotoxic cells unresponsive to tumor antigens [186, 187].

The most common immune-escape mechanism in experimental or spontaneous tumors is the downregulation or lack of expression of classical HLA class I molecules, which directly affects cytotoxic T cell function against tumor cells. The loss of classical class I molecules is often associated with mutation in the β2-microglobulin gene, which may affect class I expression level, producing unstable molecules. In addition, the loss of certain classical HLA class I alleles by microdeletions may lead to inefficient presentation of some immunodominant antigens [188]. In the absence of classical HLA class I molecules, NK cells become the major actors for tumor elimination. Similar to chronic viral infections, the microenvironment of the tumor or the transformed cell per se may induce HLA-G expression, impairing the activity of NK cells.

Several studies have demonstrated that various tumor lesions do present HLA-G transcription and protein expression [183]. To date, HLA-G expression has been detected in several tumor types, which differ in the percentage of lesions expressing the molecule. For example, in cutaneous melanomas [189, 190], renal cell carcinomas [191], retinoblastomas [192], colorectal cancer [193], ovarian carcinomas [194, 195], endometrial adenocarcinomas [196], gastric cancer [197, 198], cutaneous T cell lymphomas [199, 200] and pancreatic ductal adenocarcinoma [201], among others [17, 202], at least 30% of these lesions exhibit HLA-G expression. In addition to tumor cell expression, HLA-G may also be detected in tumor infiltrating cells [183], which, allied to increased soluble HLA-G concentrations [203–206], may contribute to tumor progression.

Overall, most studies have evaluated HLA-G expression in tumor cells, tumor cell lines or even soluble HLA-G levels, and few genetic studies have been performed. Underlying these tumor-associated studies is the concept that some HLA-G polymorphisms may influence HLA-G expression (promoter and 3′UTR polymorphisms), the pattern of alternative splicing of the primary transcript and mRNA turnover (coding region, including introns, and 3′UTR polymorphisms). The regulatory sequences of the HLA-G gene, as discussed earlier, may be induced by various factors present in the tumor microenvironment [207], since tumor-derived soluble factors may promote HLA-G expression [183]. Several lines of evidence have shown that the methylation status of the HLA-G gene promoter is also crucial for the gene transcriptional activity [66, 67], and polymorphisms located at CpG sites surrounding the HLA-G locus may affect the gene methylation pattern [64]. Regarding the 3′UTR and coding region, the 14-bp insertion seems to occur at high frequency in renal cell carcinomas compared to the normal kidney epithelium; however, no correlation between the 14-bp INDEL polymorphism and the level of HLA-G expression has been found [191].

In human papillomavirus (HPV) infections, the coding region G*01:03 and the G*01:04 alleles were associated with squamous intraepithelial lesions (SIL), and the G*01:01/G*01:04 genotype was associated with high-grade lesions (HSIL). In addition, patients possessing the extended G*01:04/+14-bp haplotype and harboring HPV-16 and -18 co-infections were particularly susceptible to HSIL [208].

In bladder cancer, the HLA-G locus was associated with susceptibility to transitional cell carcinoma (TCC) development and progression. The G*01:04 allele family (especially the G*01:04:04 allele) and the G*01:03 allele were associated with a tobacco-dependent influence on TCC development. The G*01:04 allele family was associated with progression to high-grade bladder tumor, irrespective of the smoking habit, while the G*01:03 allele was associated with protection against TCC [123]. Curiously, the same HLA-G coding region variations were found to influence both HPV-induced lesions and bladder cancer in studies conducted on two different samples of patients from the same geographical area (Southeast Brazil). Apparently, the G*01:04 allele family (G*01:04:01 and G*01:04:04 being the most frequent alleles in Brazilians), which is often associated with HSIL, has also been associated with increased production of soluble HLA-G [84]. This is of particular importance since the G*01:04 allele family occurs at high frequency in Brazilians (about 12%) [105], as well as in worldwide populations studied so far, including African Shona (about 21%) [110], North Indians (about 17.5%) [111] and Danes (about 9%) [89]. The higher expression of HLA-G associated with the G*01:04 allele family is probably due to the presence of specific 5′ [61] or 3′UTR [82] haplotypes that are exclusive to these alleles. The G*01:04 allele family presents the same 3′UTR haplotype, encompassing two polymorphic sites previously associated with low HLA-G expression, +3187A and +3142G [82]. In contrast, this family presents a unique promoter haplotype characterized by the presence of −1155A close to the locus control region (LCR), sharing specific combinations of polymorphic sites at several transcriptional factor binding sites [61]. Thus, it is possible that this promoter region segment may be responsible for the increased HLA-G production.

Knowledge of the tumor cell HLA-G expression profile as well as the typing of relevant polymorphic sites in the HLA-G gene may discriminate patients prone to producing high amounts of HLA-G, individualizing future treatments targeting HLA-G.

Concluding remarks

In physiological conditions like pregnancy, the expression of HLA-G is useful to maintain tolerance of the semiallogeneic engraftment. In pathological conditions, the expression of HLA-G may be profitable or pernicious, depending on the specific disorder. Even considering individuals exhibiting the same type of disorder, some may express HLA-G, while others may not. Certainly, gene regulatory regions play a pivotal role in this process, and polymorphic sites observed in these regions may influence HLA-G production.

Whether or not a sick individual will express HLA-G depends on several factors: (1) transcriptional factors in the disease milieu that upregulate HLA-G expression; (2) presence of selected polymorphic sites in the promoter region that can be targeted by transcription factors; (3) magnitude of post-transcriptional factors that may degrade HLA-G mRNA, particularly microRNAs that can target polymorphic sites in the 3′UTR; (4) structural features of the HLA-G molecule encoded by the alleles of that person, which are available to interact with HLA-G receptors; (5) intrinsic features of diseased tissue cells; (6) polymorphic sites on genes that encode the mediators involved in the pathogenesis of the disease; (7) the role of neighboring or distant genes epistatically acting on the HLA-G gene; and (8) the role of epigenetic factors, among others. Due to the complexity of this scenario, most studies have focused on only one or a few of these aspects, adding just a few scenes to a long motion picture that needs to be edited to show the epilogue. Much has yet to be learned regarding the regulation of HLA-G expression in order to identify individuals at risk of developing complications associated with excessive or scanty HLA-G production and to individualize specific therapeutic approaches that supply or inhibit HLA-G.