Polycomb group (PcG) protein complexes dynamically define cellular identity through the regulation of key developmental genes. Important advances in the PcG field have come from genome-wide mapping studies in a variety of tissues and cell types that have analyzed PcG protein complexes, their associated histone marks and putative mechanisms of PcG protein recruitment. We review how these analyses have contributed to our understanding of PcG protein complex targeting to chromatin and consider the importance of diverse PcG protein complex composition for gene regulation. Finally, we focus on the dynamics of PcG protein complex action during cell fate transitions and on the implications of histone modifications for cell lineage commitment.
Polycomb group (PcG) proteins are conserved chromatin factors that were originally discovered in Drosophila melanogaster as regulators of Hox genes, a set of transcription factors that specify cell identity along the anteroposterior axis of the body plan (Duncan, 1982; Lewis, 1978). The expression of Hox genes is established early during embryonic development by a cascade of maternal and zygotic transcription factors (Akam, 1987). However, these early transcription factors decay shortly after the establishment of Hox gene expression, despite the fact that Hox gene expression patterns need to be maintained throughout development. PcG proteins maintain the silent state of Hox genes outside of their expression domains, whereas a second group of proteins, termed the Trithorax group (trxG), maintains active transcription in the appropriate expression domains (Box 1). On the basis of these observations, PcG and trxG proteins have long been considered as a cellular memory system that stably locks Hox gene expression states for an organism's whole life span (reviewed by Ringrose and Paro, 2004). However, recent genome-wide mapping studies of PcG components in several species have revealed that PcG proteins bind many more genes in addition to Hox genes, mainly comprising transcription factors involved in diverse cellular functions and developmental pathways (Boyer et al., 2006; Bracken et al., 2006; Lee et al., 2006; Negre et al., 2006; Schwartz et al., 2006; Squazzo et al., 2006; Tolhuis et al., 2006). In addition, the association of PcG proteins with their target genes does not necessarily result in gene silencing (Beisel et al., 2007; Papp and Muller, 2006; Schwartz et al., 2006; Stock et al., 2007), and PcG proteins have been demonstrated to dynamically bind their targets in embryonic stem (ES) cells and during subsequent cell lineage commitment events (Boyer et al., 2006; Lee et al., 2006). 
These findings challenge the dogma that PcG proteins solely convey cellular memory and suggest that they might be involved in the dynamic regulation of a variety of biological processes.
Indeed, many reports published in the last decade have revealed an expanded spectrum of action for PcG proteins, including roles in cell cycle control (reviewed by Martinez and Cavalli, 2006), spermatogenesis (Chen et al., 2005), actin polymerization (Su et al., 2005), cellular senescence (Bracken et al., 2007; Dietrich et al., 2007) (reviewed by Guney et al., 2006), X-chromosome inactivation (reviewed by Heard, 2005; Lee, 2009), genomic imprinting (Mager et al., 2003; Pandey et al., 2008; Puschendorf et al., 2008; Terranova et al., 2008), stem cell plasticity and cell fate determination, as well as cancer (reviewed by Rajasekhar and Begemann, 2007; Sparmann and van Lohuizen, 2006). In addition, a recent report demonstrated an unexpected role for the Polycomb repressive machinery in maintaining mitochondrial function and redox homeostasis (Liu et al., 2009). The plethora of processes regulated by PcG proteins (Table 1) has raised interest from a wide range of research fields, and it is likely that the known spectrum of action of these proteins will expand further in the future.
Box 1. The role of Trithorax in PcG target gene expression
The histone H3K4 methyltransferase Trithorax (Trx) has been traditionally considered as counteracting Polycomb group (PcG) protein-mediated silencing. However, its molecular function has been less widely studied than that of PcG proteins and remains little understood. Importantly, it has been shown that Trx is constitutively bound at Drosophila Polycomb response elements (PREs) independently of their activity state (Papp and Muller, 2006). This might allow PcG target genes to switch rapidly from an inactive to an active state in response to an activating signal, with the subsequent relief of PcG protein-mediated silencing.
Alternative splicing gives rise to five Trx isoforms with a conserved C-terminal part (Trx-C) and a variable N-terminal region (Trx-N) (Sedkov et al., 1994). Furthermore, Trx is proteolytically cleaved into an N-terminal and a C-terminal domain, but the fate or function of the two moieties after cleavage has never been addressed in vivo (Kuzin et al., 1994). Genome-wide mapping studies using two different antibodies, one against Trx-C (which recognizes all isoforms) and one against Trx-N (which recognizes only two isoforms), have provided further insight into how this switch in PcG target gene activity might happen (Schuettengruber et al., 2009). Trx-N shows a low affinity for PREs, but is strongly bound to K4-recruiter sites. As such, its distribution resembles that of a general transcription co-factor, as indicated by previous reports (Petruk et al., 2006; Petruk et al., 2008). By contrast, Trx-C is strongly linked to PcG protein function, showing high binding affinity only to PREs. Thus, Trx might have a dual function depending on the isoform present or on proteolytic cleavage. At PREs, constitutive Trx-C binding might allow PcG protein target genes to switch their state in response to transcription-inducing signals. At promoter regions that are not occupied by PcG factors, the PRE-associated Trx form is absent and the promoter-associated Trx isoforms might constitutively activate transcription (Petruk et al., 2006; Petruk et al., 2008). The molecular mechanism that underpins this difference is unknown and awaits future analysis.
Genes and proteins that interact with PcG proteins genetically or biochemically are generally added to the PcG family, although many of these components also have other functions. The PcG genes thus form a heterogeneous group that includes both core members and associated factors. Attempts have been made to rationalize the gene nomenclature (Gildea et al., 2000; Grimaud et al., 2006), but, as the field progresses, the criteria for defining a new component as a member of the PcG family evolve, blurring the classifications. Historically, PcG proteins have been shown to form two major core complexes: Polycomb repressive complex 1 and 2 (PRC1 and PRC2, respectively). Recent reports, however, suggest that the diversity of PcG complexes is greater than previously thought.
In Drosophila, PcG protein complexes are recruited to chromatin by DNA elements called Polycomb response elements (PREs). These elements mediate the inheritance of silent chromatin states throughout development (Busturia et al., 1997; Sengupta et al., 2004). The maintenance of gene expression states by PREs is epigenetic, meaning that the heritable state of gene activity does not require the continuous presence of the initiating signal nor does it involve changes in the DNA sequence (Ptashne, 2007). However, the precise mechanism of PcG protein recruitment to PREs remains a mystery.
In this review, we discuss several important aspects of the dynamic regulation of PcG protein complex distribution. First, we highlight the role of individual PcG protein complex components and the function of alternative PcG isoforms in forming PcG protein complexes with different enzymatic and gene silencing activities. Second, we summarize the findings of large-scale analyses of PcG target sites in different species and discuss different models of PcG targeting to chromatin. Finally, we focus on the role of PcG proteins and their associated histone marks in cell fate transitions. For additional information on the role of PcG proteins in nuclear organization and on the mechanisms of PcG-mediated gene silencing, we refer the reader to other recent reviews (Mateos-Langerak and Cavalli, 2008; Schuettengruber et al., 2007; Schwartz and Pirrotta, 2008).
PcG protein complex diversity and gene regulation
Biochemical and genetic studies have demonstrated that PcG silencing in Drosophila and vertebrates involves the activity of two multiprotein complexes, PRC1 (Saurin et al., 2001; Shao et al., 1999) and PRC2 (Cao et al., 2002; Czermin et al., 2002; Kuzmichev et al., 2002; Muller et al., 2002). PRC2-type complexes contain the four core components Enhancer of zeste [E(z) in Drosophila, EZH2 in mammals], Extra sexcombs (Esc in Drosophila, EED in mammals), Suppressor of zeste 12 [Su(z)12 in Drosophila, SUZ12 in mammals] and a nucleosome remodeling factor [Nurf55 (Caf1) in Drosophila, RbAp46/48 (RBBP7/4) in mammals]. The catalytic subunit, EZH2, is a SET domain-containing methyltransferase that catalyzes the di- and trimethylation of lysine 27 on histone H3 (H3K27me2 and me3, respectively). H3K27me3, the hallmark of PcG-dependent gene silencing, is specifically recognized by the chromodomain of Polycomb (Pc) (Cao and Zhang, 2004a), a subunit of PRC1-type complexes. The analysis of Drosophila PRC1 by Shao et al. (Shao et al., 1999) identified Pc, Polyhomeotic (Ph), Posterior sex combs (Psc) and dRing [also known as Sex combs extra (Sce)] as its core components (Fig. 1A); in mammals, each of the fly genes has two or more homologs (Levine et al., 2002). One should thus consider PRC1 as a family of complexes. Within PRC1 complexes, mammalian RING1B (also known as RNF2 or RING2) and fly Sce are ubiquitin E3 ligases that catalyze the monoubiquitylation of histone H2A at lysine 119 (H2AK119ub1), a histone mark that is associated with transcriptional silencing (Wang et al., 2004).
A third PcG complex involved in homeotic gene silencing, PhoRC, has been identified in Drosophila (Klymenko et al., 2006). PhoRC contains the sequence-specific DNA-binding protein Pleiohomeotic (Pho), as well as the Scm-related protein containing four MBT domains (Sfmbt), which binds specifically to mono- and dimethylated H3K9 and H4K20 through its MBT repeats (Fig. 1A). No enzymatic activity has been shown to be associated with PhoRC. In addition to these three PcG protein complexes, several additional complexes with different enzymatic activities have been identified in the last few years that might contribute to the variety of biological processes regulated by PcG proteins. Complex diversity is achieved by interactions with additional PcG proteins (Fig. 1B), by the incorporation of homologous proteins or different protein isoforms (Fig. 1C), or by the formation of PcG protein-like complexes that include only some of the PRC1 core components in combination with other chromatin regulators (Fig. 1D).
Interactions with additional proteins
The Drosophila Polycomblike (Pcl) protein and its mammalian homolog PHF1 have been shown to interact biochemically and functionally with PRC2 (Fig. 1B). In Drosophila, loss-of-function mutations of Pcl result in decreased levels of H3K27me3 at PcG protein complex target genes, whereas H3K27me2 levels are not significantly affected (Nekrasov et al., 2007). Similarly, in mammals, PHF1 is required for efficient H3K27me3 production by PRC2, indicating that the Pcl-PRC2 complex is needed for high levels of H3K27 trimethylation at PcG protein complex target genes (Cao et al., 2008; Sarma et al., 2008) (reviewed by Muller and Verrijzer, 2009).
Incorporation of alternative PcG paralogs and isoforms
Two papers have analyzed the role of EZH1, a homolog of EZH2 that can interact with PRC2 components (Fig. 1C) (Margueron et al., 2008; Shen et al., 2008). RNA interference (RNAi)-mediated knockdown of PRC2-EZH1 has no significant effect on global H3K27me3 levels, which indicates that its global contribution to K27 methylation is minor. However, both complexes use H3K27me1 as a substrate and share most of their target genes (Margueron et al., 2008). EZH1 expression levels are constant during development, whereas EZH2 is expressed mainly in proliferating cells (e.g. in early embryos) and is overexpressed in cancer cell lines. Interestingly, the expression profile of EZH2 strongly resembles that of proteins involved in DNA replication, which indicates a role for PRC2-EZH2 in the transmission of repressive marks during DNA replication (Hansen et al., 2008) (see note added in proof). In contrast to PRC2-EZH2, PRC2-EZH1-mediated repression does not require the trimethylation of H3K27, and this complex has been shown to condense chromatin in vitro, independently of its histone methyltransferase (HMTase) activity (Margueron et al., 2008).
In addition to the two homologs of fly E(z), four EED isoforms (EED is a homolog of fly Esc) exist in mammalian cells, as well as five Pc, three Ph, four Psc and two Sce homologs. All of these homologs might contribute to PcG protein complex diversity. For example, the incorporation of the different EED isoforms (EED1-4) into PRC2 results in PcG protein complexes with different in vitro substrate specificities (Fig. 1C) (Kuzmichev et al., 2004; Kuzmichev et al., 2005). The Drosophila EED homologs Esc and Esc-like (Escl) differ quantitatively, but not qualitatively, in their PcG silencing function: in vitro, the Esc-PRC2 complex has higher enzymatic activity than the Escl-PRC2 complex. Moreover, Esc has a strong maternal contribution, whereas little Escl protein is available during early development; however, increasing amounts of Escl get incorporated into PRC2 at later developmental stages (Ohno et al., 2008).
Formation of PRC1-like protein complexes
A novel PcG protein silencing complex, named dRAF, which contains dRing, Psc and the histone demethylase dKDM2 (encoded by the gene CG11033), has been identified in Drosophila (Fig. 1D). dKDM2 specifically demethylates H3K36me2, but also strongly stimulates histone H2A ubiquitylation by dRing/Psc, and dRAF, rather than PRC1, might be the major H2A ubiquitylating complex in Drosophila (Lagarou et al., 2008).
The mammalian homolog of Psc, BMI1, has at least three paralogous proteins, MBLR, NSPC1 and MEL18 [also known as Polycomb group ring finger (PCGF) protein 6, PCGF1 and PCGF2, respectively], which all interact with other PcG proteins to form a set of distinct, but related, complexes (Fig. 1D) (Akasaka et al., 2002; Brunk et al., 1991; van Lohuizen et al., 1991). MBLR was detected, together with RING1B, in the E2F6 complex which displays methyltransferase activity for lysine 9 on histone H3 catalyzed by Eu-HMTase1 (EHMT1) or NG36 (G9a or EHMT2) (Ogawa et al., 2002). In addition, MBLR has been identified in a complex together with the H3K4 demethylase JARID1d (KDM5D). NSPC1, together with RING1, RING1B and RYBP, has been purified with the BCOR co-repressor complex, which contains FBXL10 (KDM2B), a demethylase for histone H3K36 (Gearhart et al., 2006). MEL18 was found in a PcG protein-like complex, melPRC1, together with RING1B, HPH2 (PHC2) and CBX8 (Elderkin et al., 2007). Interestingly, MEL18 needs to be phosphorylated to direct RING1B substrate specificity, which suggests the intriguing possibility that cell signaling pathways can regulate PcG protein function. This is further corroborated by earlier findings in Drosophila that show that the Jun N-terminal kinase (JNK) signaling pathway can repress PcG function upon tissue injury (Lee et al., 2005). Conversely, PcG protein complexes can also regulate signaling. PcG proteins have been found to be associated with genes involved in several signaling pathways (Bracken et al., 2006), and a recent study has shown that PRC1 binds to multiple components of the Notch signaling pathway to control cellular proliferation and differentiation and to suppress tumor formation in Drosophila (Martinez et al., 2009) (see note added in proof).
In summary, what was previously thought to be a simple set of two chromatin-binding complexes that ensured cellular memory turns out to be a highly sophisticated set of complexes, the function of which might be modulated by post-translational protein modification or by the presence of one or many different subunits. These complexes might perform different sets of functions, collaborate, or compete for certain functions, but their detailed characterization awaits future research.
Genome-wide PcG protein distribution and targeting
In recent years, genome-wide mapping studies in Drosophila and vertebrates have led to a comprehensive list of PRC1 and PRC2 target sites (for a review, see Ringrose, 2007). In all species, PcG protein binding is highly correlated with the presence of the H3K27me3 mark. In flies, all mapped PRC2 and PRC1 components, with the exception of Pc, bind as sharp peaks to PRE sequences, whereas H3K27me3 (and, to a lesser extent, Pc) forms large domains of up to several hundred kb around PREs. The evident discrepancy between broad H3K27me3 domains and the localized binding of the corresponding methyltransferase E(z) to PREs might be explained by a looping model (Kahn et al., 2006; Papp and Muller, 2006; Schwartz et al., 2006), according to which the PRE-bound protein complexes loop out to transiently contact neighboring nucleosomes and to trimethylate them on histone H3K27. This model might also account for the broader binding of Pc around PREs (Kwong et al., 2008; Schuettengruber et al., 2009; Schwartz et al., 2006), if one assumes that these transient interactions, mediated by the Pc chromodomain, can be captured by the cross-linking process. Recently, the distribution of the PhoRC complex has also been described (Oktaba et al., 2008). Similar to PRC2, PhoRC binding is sharply localized, and the majority of these sites has been reported to be co-occupied by PRC1 and PRC2, which establishes PhoRC as a core PRE-binding complex (see discussion below).
In contrast to the situation in Drosophila, mammalian PRC2 components are more tightly colocalized with H3K27me3 (Boyer et al., 2006; Bracken et al., 2006; Lee et al., 2006). Moreover, at most genomic sites in mammalian cells, PcG proteins and H3K27me3 are localized to regions of less than 5 kb, mostly spanning gene promoters, whereas only a minority of PcG binding sites forms larger domains. This is in contrast to the large H3K27me3 domains observed in flies as well as to the findings of a recent study in mouse embryonic fibroblasts that provides evidence for large domains of H3K27me3 (Pauler et al., 2009). Additionally, it is worth mentioning that Ku and colleagues reported a distinctly lower level of colocalization of H3K27me3 with the PRC1 component RING1B (Ku et al., 2008). PRC1-negative H3K27me3 regions were smaller and more unstable upon differentiation, which indicates that PRC1 might add stability to the silencing process. Thus, the mechanisms by which H3K27me3 is deposited on chromatin by PRC2 and selectively recognized by PRC1 might differ between flies and vertebrates.
What might be the reason for the discrepancies in the distribution of PcG proteins and histone marks observed between flies and human cells or between the different mammalian studies? To some extent, they might simply depend on the choice of peak-finding thresholds and the choice of statistical methods, as well as on different mapping technologies (microarrays or high-throughput sequencing technologies) and experimental procedures (e.g. the use of native versus cross-linked chromatin, or the antibodies employed). However, they could also reflect real biological differences that indicate that PcG proteins might establish different chromatin domains in different species and cell types [for a discussion, see the review by Ringrose (Ringrose, 2007)]. For example, the distribution of PcG proteins might be more plastic in mammalian cell types than it is in Drosophila. In mammals, long-term memory of chromatin states might generally require DNA methylation, whereas PcG proteins might repress genes in a dynamic manner. By contrast, flies do not generally use DNA methylation for long-term memory of chromatin states; instead, they might use Polycomb-mediated silencing, and this stronger chromatin stability could be linked to the formation of larger Polycomb domains.
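The dependence of apparent domain size on the peak-finding threshold can be illustrated with a minimal sketch. The enrichment profile, the bin size and both thresholds below are invented for illustration and do not come from any of the cited studies; the function name call_domains is likewise hypothetical.

```python
from typing import List, Tuple


def call_domains(signal: List[float], threshold: float,
                 bin_kb: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start_kb, end_kb) for every run of consecutive bins
    whose enrichment is at or above the threshold."""
    domains = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i                      # a new domain opens here
        elif value < threshold and start is not None:
            domains.append((start * bin_kb, i * bin_kb))
            start = None                   # the domain closes
    if start is not None:                  # domain runs to the end of the profile
        domains.append((start * bin_kb, len(signal) * bin_kb))
    return domains


# Toy profile in 1 kb bins (assumed values): a sharp peak with a broad, low shoulder.
toy_signal = [0.1, 0.6, 0.7, 0.8, 3.0, 4.0, 3.2, 0.9, 0.7, 0.6, 0.1]

strict = call_domains(toy_signal, threshold=2.0)   # [(4.0, 7.0)]  -> a 3 kb "peak"
lenient = call_domains(toy_signal, threshold=0.5)  # [(1.0, 10.0)] -> a 9 kb "domain"
```

The same underlying profile is reported as a 3 kb peak or a 9 kb domain depending only on the cutoff, which is one reason why domain sizes from studies that use different thresholds are difficult to compare directly.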
DNA-binding proteins in PcG protein recruitment
Unlike PhoRC, PRC1 and PRC2 do not bind their target DNA in a sequence-specific manner. PcG protein recruitment has been suggested to depend on the combinatorial action of several sequence-specific DNA-binding proteins, such as Pho, its homolog Pleiohomeotic-like (Phol), GAGA factor (GAF; Trithorax-like), Pipsqueak (Psq), Dorsal switch protein (Dsp1), Zeste, Grainy head (Grh) and SP1/KLF, which recognize several conserved sequence motifs at or near PREs, leading to the tethering of PcG proteins to their targets (reviewed by Muller and Kassis, 2006; Schuettengruber et al., 2007). However, loss-of-function mutations in genes that encode certain putative PcG recruiter proteins do not induce clear PcG phenotypes, and none of these proteins is sufficient to recruit PcG proteins to their targets. In addition, all putative PcG recruiter proteins seem to be involved in transcriptional activation as well as in repression (reviewed by Muller and Kassis, 2006).
To gain more insight into the role of putative PcG recruiter proteins and the sequences that underlie PcG protein targeting, several labs have mapped the genome-wide distribution of these proteins; the distribution of Pho has recently been described in three independent reports (Kwong et al., 2008; Oktaba et al., 2008; Schuettengruber et al., 2009). Pho, a PhoRC member, plays a crucial role in PcG silencing, and it interacts with PRC2, as well as with the Pc and Ph subunits of PRC1, in vitro (Mohd-Sarip et al., 2002). Interestingly, however, a large portion of Pho binding sites is not associated with PcG-bound regions, but instead corresponds to promoter regions of genes marked by active histone modifications and co-activators (here, we refer to them as 'K4-recruiter sites'). The same observation has been made for other DNA-binding proteins, namely Phol, Dsp1, GAF and Zeste, which indicates that the simple idea that clusters of binding sites for these proteins define a PRE is not correct.
Why is PcG recruitment triggered at PcG-bound sites (i.e. PREs), but not at non-PcG-bound sites (i.e. K4-recruiter sites) occupied by similar combinations of DNA-binding proteins (Fig. 2)? First, a high binding ratio between Pho and its homologous protein Phol is a strong predictive feature of PREs, whereas a low Pho/Phol ratio marks K4-recruiter sites (Fig. 2A). Pho and Phol bind to the same DNA sequence in vitro and thus could play redundant roles in PcG protein complex-mediated gene silencing. However, the binding of Pho and Phol is not enough to explain PcG protein targeting: PcG protein binding is lost at the bxd PRE in pho/phol double-mutant wing discs (Wang et al., 2004), but Pho binding sites alone are insufficient to tether PcG proteins to DNA in vivo (Brown et al., 2003; Dejardin et al., 2005). In addition, most PcG sites are stained normally in polytene chromosomes in pho/phol double mutants, despite the lack of detectable Pho and Phol proteins (Brown et al., 2003). Intriguingly, Phol is only bound at a subset of PREs, whereas it frequently binds at promoter regions that are not PcG target genes, which suggests that it might primarily assist active transcription rather than PcG protein-dependent silencing. By contrast, Pho is found at almost all PREs, which suggests that the PhoRC complex might be required for anchoring other PcG protein complexes at PREs (Fig. 2B). Second, a combination of several DNA-binding proteins, including as yet unknown factors that discriminate PREs from K4-recruiter sites, could be responsible for tethering PcG protein complexes to PREs (Fig. 2C). A third possible contribution could depend on the fact that many PREs are transcribed into long non-coding RNAs (ncRNAs). These might be bound by PcG proteins, which could result in PcG protein recruitment to PREs (Fig. 2C). Small interfering RNAs and the RNAi machinery might also contribute to PcG protein complex recruitment (for a review, see Hekimoglu and Ringrose, 2009). 
Finally, PcG protein complex binding might be actively blocked by the presence of transcription factors or co-activators at K4-recruiter sites that are not bound at PREs (Fig. 2D).
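As a rough illustration of the first feature discussed above, the Pho/Phol binding ratio can be turned into a simple labeling rule. This is only a sketch under stated assumptions: the binding values, the pseudocount and the two-fold cutoff are arbitrary choices made for the example, and the function names are hypothetical rather than taken from any published analysis.

```python
import math


def log2_pho_phol_ratio(pho: float, phol: float, pseudocount: float = 1.0) -> float:
    """log2 of the Pho/Phol binding ratio; the pseudocount (an assumption)
    avoids division by zero at sites with no detectable Phol signal."""
    return math.log2((pho + pseudocount) / (phol + pseudocount))


def classify_site(pho: float, phol: float, cutoff: float = 1.0) -> str:
    """Label a site from its Pho/Phol ratio alone.

    cutoff is an arbitrary illustrative value: a log2 ratio of 1 means
    two-fold more Pho than Phol signal.
    """
    ratio = log2_pho_phol_ratio(pho, phol)
    if ratio >= cutoff:
        return "PRE-like"            # high Pho/Phol ratio
    if ratio <= -cutoff:
        return "K4-recruiter-like"   # low Pho/Phol ratio
    return "ambiguous"


# Invented binding values for three hypothetical sites.
labels = [classify_site(15, 3), classify_site(3, 15), classify_site(7, 7)]
# -> ["PRE-like", "K4-recruiter-like", "ambiguous"]
```

Because Pho and Phol binding is not enough on its own to explain PcG protein targeting, a rule of this kind could at best rank candidate sites; it does not capture the contributions of additional DNA-binding factors, ncRNAs or blocking activators.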
Sequence motif distribution at PREs versus non-PRE sites
Kwong et al. have reported that a long Pho-binding motif is overrepresented at sites that are bound only by Pho, as compared with sites bound by both Pho and Pc, whereas the frequency of GAGA and Zeste motifs did not differ between these sites (Kwong et al., 2008; Schuettengruber et al., 2009). Conversely, Oktaba et al. have suggested an extensive Pho motif and GAGA motifs as a signature of PhoRC-bound PREs (Oktaba et al., 2008). A higher density of Pho binding sites seems to be specific to PREs as compared with K4-recruiter sites (Schuettengruber et al., 2009), which suggests that cooperative binding could be involved in efficient PcG protein complex recruitment (Fig. 2B). In addition, the distribution of Pho motifs around PREs is less localized, and they are also found in the surrounding regions of the core PRE, whereas at K4-recruiter sites, Pho motifs are localized right at the transcription start site. Interestingly, an unbiased screening approach known as unsupervised sequence analysis has identified sequence motifs for known and unknown factors that are enriched in PREs or K4-recruiter sites (Fig. 2C,D) (Schuettengruber et al., 2009). PRE-enriched sequences constitute potential candidates for novel recruiters. By contrast, activator motifs enriched in K4-recruiter sites might be involved in blocking the binding of PcG protein complexes to active sites. Accordingly, in mammalian cells, PcG binding sites strongly correlate with the absence of motifs capable of conferring transcriptional activity (Ku et al., 2008).
In summary, the DNA sequences at PREs appear to contain much of the information needed for the recruitment of PcG protein complexes. However, individual sequence motifs are likely to be working combinatorially, and none of the identified motifs seems to be able to drive PcG protein complex recruitment by itself.
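Motif density comparisons of the kind summarized above come down to counting motif occurrences per unit of sequence on both strands. The sketch below shows one minimal way to do this. The pentamer GCCAT is used here only as an assumed, illustrative Pho core motif, and the test sequence is invented; the cited studies used longer motifs and their own scoring schemes.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_motif(seq: str, motif: str) -> int:
    """Count exact matches to the motif on either strand, allowing overlaps."""
    seq, motif = seq.upper(), motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def motif_density_per_kb(seq: str, motif: str) -> float:
    """Motif matches per kilobase of sequence."""
    return 1000.0 * count_motif(seq, motif) / len(seq)


ASSUMED_PHO_CORE = "GCCAT"             # illustrative assumption, see text above
toy_sequence = "AAGCCATTTATGGCAA"      # invented 16 bp sequence

n_hits = count_motif(toy_sequence, ASSUMED_PHO_CORE)            # 2 (one per strand)
density = motif_density_per_kb(toy_sequence, ASSUMED_PHO_CORE)  # 125.0
```

Comparing such densities between PREs and K4-recruiter sites, and between the core element and its flanking regions, is one simple way to express the differences in motif number and spread described in this section.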
The quest for mammalian PREs
The evolution of DNA sequences at PREs between different Drosophila species is very dynamic, providing a rich source of potential diversity between species (Hauenschild et al., 2008). This observation might explain why, even though Drosophila PREs have been known for over 15 years, no PREs have yet been identified in mammals. PREs might be simply defined as DNA elements that are necessary and sufficient for the recruitment of PcG complexes and for the PcG-dependent silencing of flanking promoters. Do such elements exist in vertebrates? If so, their DNA sequences are probably rather different from those of fly PREs. Large CpG islands depleted of activating factor motifs colocalize with PcG protein complexes in pluripotent cells (Ku et al., 2008); these are, as of today, the best candidates for PRE function in mammals.
What about sequence-specific DNA-binding factors? Of the putative fly recruiter proteins, only the Dsp1 homolog HMGB2 and the Pho homolog YY1 are conserved in vertebrates. HMGB2 has been shown to form a complex with YY1 that might be involved in silencing (Gabellini et al., 2002). YY1 has been reported to be associated with PcG protein complex binding and is required for EZH2 binding to chromatin in mouse myoblasts (Caretti et al., 2004). However, genome-wide profiling of YY1 is now required to confirm whether the binding of this protein is predictive of PcG binding. Three other DNA-binding proteins have been suggested to recruit mammalian PcG proteins. The first candidate, the zinc-finger protein AEBP2, has recently been suggested to be involved in mammalian PRC2 targeting (Kim et al., 2009) and co-purifies with PRC2 components (Cao and Zhang, 2004b). The SET domain of EZH proteins has been shown to be required for PRC2 recruitment, which suggests that this domain might be essential for PRC2 targeting via its interaction with DNA-binding factors (Margueron et al., 2008). As AEBP2 does not directly interact with EZH2 in vitro (Cao and Zhang, 2004b), additional proteins might be involved in tethering PRC2 to DNA. The second candidate recruiter protein in mammals is the transcription factor SNAIL1, which interacts with EZH2 and SUZ12 and can recruit the PRC2 complex to repress the E-cadherin (cadherin 1) gene (Herranz et al., 2008). The third candidate recruiter protein is PLZF (ZBTB16), which has been shown to repress the HoxD locus via the recruitment of PcG proteins. PLZF interacts with BMI1 in coimmunoprecipitation assays, and colocalizes with BMI1 at the same nuclear bodies (Barna et al., 2002).
In addition, the PML/RARA fusion protein, which forms after a translocation between chromosomes 15 and 17 and is a hallmark of acute promyelocytic leukemia, has been found to associate with PcG protein complexes (Villa et al., 2007), whereas the similarly leukemia-associated PLZF/RARA fusion protein forms a stable component of the PRC1 complex and leads to the ectopic recruitment of both PRC1 and PRC2 (Boukarabila et al., 2009).
In summary, DNA-binding factors and CpG islands are likely to be involved in the recruitment of PcG proteins to chromatin, and the identification of mammalian PREs seems to be only a short step away. As in flies, the role of ncRNAs in PcG protein recruitment is not yet fully established, but it is possible that RNA species contribute to PcG protein recruitment at least for a subset of their targets (Rinn et al., 2007; Zhao et al., 2008) (reviewed by Hekimoglu and Ringrose, 2009).
PcG proteins in embryonic stem cells and cell fate decisions
PcG proteins, long considered to represent epigenetic gatekeepers of cellular memory processes, are also capable of tissue-specific and dynamic gene regulation during fly development (Kwong et al., 2008; Negre et al., 2006; Oktaba et al., 2008). In particular, testis-specific transcription factors have been shown to counteract PcG protein complex-mediated silencing by selectively removing PcG protein complexes from target promoters to activate testis-specific genes (Chen et al., 2005).
The recent identification of H3K27 demethylases has confirmed that PcG protein-dependent histone modifications can be actively removed, enabling the activation and the dynamic regulation of genes repressed by PcG protein complexes (reviewed by Swigut and Wysocka, 2007). Two JmjC domain-containing proteins, JMJD3 (KDM6B) and UTX (KDM6A), have been identified as histone demethylases specific for H3K27. It is still unclear, however, whether these enzymes specifically counteract PcG protein complex-mediated silencing, or whether they play a more general role in transcriptional regulation (for a review, see Schwartz and Pirrotta, 2008).
Dynamic gene regulation by PcG proteins is even more prominent in mammalian ES cells. Genome-wide mapping studies in mouse and human ES cells have shown that PcG complexes are predominantly bound at genes that encode master developmental regulator proteins, such as homeodomain-containing transcription factors that regulate diverse developmental pathways (Boyer et al., 2006; Lee et al., 2006). Many of these regulator genes are repressed in ES cells. Upon differentiation, a discrete set of these genes becomes activated, which indicates a crucial role for PcG proteins in the dynamic regulation of stem cell identity and cell fate determination. Nevertheless, EED and SUZ12 are dispensable for ES cell derivation and, contrary to earlier reports (O'Carroll et al., 2001), ES cells can also be derived from Ezh2-/- embryos (Shen et al., 2008). Thus, ES cells do not require the H3K27me3 mark for their establishment and self-renewal. However, PcG proteins are probably involved in the stable maintenance of ES cell identity, given that Eed-/- ES cells are prone to differentiate (Boyer et al., 2006; Chamberlain et al., 2008). Moreover, Eed-/- ES cells are unable to give rise to the full range of cell types in differentiation assays in vitro (Chamberlain et al., 2008), indicating that PcG proteins are required for ES cell pluripotency in the strictest sense.
PcG proteins not only prevent differentiation by repressing specific genes, but can also enable and modulate differentiation in response to appropriate signals. ES cells with impaired PcG function fail to repress pluripotency genes efficiently during differentiation, and differentiation markers are not derepressed completely (Pasini et al., 2007). SUZ12-deficient ES cells show differentiation defects (Pasini et al., 2007), and BMI1 expression is required for neuronal differentiation (Cui et al., 2006). The inactivation of RING1B in embryonic neural stem cells affects self-renewal and results in precocious neuronal, but not glial, differentiation (Roman-Trufero et al., 2009). Therefore, PcG proteins play a crucial, context-dependent role both in the maintenance of stem cell proliferation and in differentiation processes.
Bivalent chromatin domains and PcG targets during cell fate commitment
H3K27me3 is distributed over large chromosomal regions and its distribution correlates with PRC2 binding, covering up to 20% of gene promoters in ES cells (Boyer et al., 2006; Lee et al., 2006). Surprisingly, most of these promoters are also marked by the activating histone modification H3K4me3, resulting in the identification of so-called 'bivalent domains' (Bernstein et al., 2006). The coexistence of these two opposing histone marks at the same nucleosome has long been controversial, as it is difficult to distinguish whether the dual mark reflects true co-occurrence, different sets of cells within a cultured population, or neighboring nucleosomes that each carry one of the two marks. However, several independent groups have now mapped these bivalent domains in different cell systems using different platforms and techniques (Bernstein et al., 2006; Mikkelsen et al., 2007; Pan et al., 2007; Zhao et al., 2007) and, together, the data suggest that bivalent states do exist within single cells.
The current hypothesis is that bivalent chromatin states poise genes for subsequent activation. Indeed, a large proportion of PcG target genes, including key developmental regulators, is activated upon differentiation and concomitantly loses the repressive H3K27me3 mark. However, bivalent domains predispose their targets not only for gene activation, but also for repression. After a specific cell fate decision, non-induced bivalent genes tend to lose the active H3K4me3 mark, whereas the repressive H3K27me3 mark is kept (reviewed by Pietersen and van Lohuizen, 2008).
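The two outcomes described above can be illustrated with a minimal Python sketch. It assumes that each promoter has already been reduced to a presence/absence call for H3K4me3 and H3K27me3; the names `PromoterMarks`, `chromatin_state` and `resolution` are hypothetical, and the example calls are invented for illustration rather than taken from any of the studies cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterMarks:
    """Presence/absence calls for two histone marks at one promoter (hypothetical input)."""
    h3k4me3: bool
    h3k27me3: bool


def chromatin_state(marks: PromoterMarks) -> str:
    """Label a promoter by which of the two marks it carries."""
    if marks.h3k4me3 and marks.h3k27me3:
        return "bivalent"
    if marks.h3k4me3:
        return "H3K4me3-only"
    if marks.h3k27me3:
        return "H3K27me3-only"
    return "unmarked"


def resolution(before: PromoterMarks, after: PromoterMarks) -> str:
    """Describe how a promoter that was bivalent before differentiation looks afterwards."""
    if chromatin_state(before) != "bivalent":
        return "not bivalent before differentiation"
    outcome = {
        "H3K4me3-only": "resolved towards activation (H3K27me3 lost)",
        "H3K27me3-only": "resolved towards repression (H3K4me3 lost)",
        "bivalent": "still bivalent",
        "unmarked": "both marks lost",
    }
    return outcome[chromatin_state(after)]


# Toy, made-up calls for illustration only; not data from any study.
es_cell = PromoterMarks(h3k4me3=True, h3k27me3=True)
induced = PromoterMarks(h3k4me3=True, h3k27me3=False)
non_induced = PromoterMarks(h3k4me3=False, h3k27me3=True)

print(resolution(es_cell, induced))
print(resolution(es_cell, non_induced))
```

In real data, the presence of a mark is not a simple yes/no value but depends on peak calling and thresholds, and population-level measurements cannot by themselves show that both marks sit on the same nucleosome, so the binary calls here are a deliberate simplification.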
In a recent study, Ezhkova and co-workers shed light on the role of PcG proteins during later steps of cell commitment (Ezhkova et al., 2009). During epidermal lineage differentiation, basal cells, which give rise to other epidermal cell types, are rich in EZH2, but PcG protein levels decrease upon terminal differentiation. Interestingly, in mice in which EZH2 is conditionally ablated, genes that are involved in epidermal differentiation are selectively activated, whereas PcG protein target genes that are involved in controlling pluripotency or other differentiation pathways are not derepressed. Given that only a small subset of PcG protein target genes in embryonic fibroblasts become reactivated after PcG knockdown (Bracken et al., 2006), this indicates that other silencing mechanisms, such as the methylation of cytosine residues within CpG dinucleotides at CpG islands, contribute to stable gene silencing.
PcG proteins and CpG island methylation during cell fate commitment
Although PcG sites are characterized by a high density of CpG dinucleotides (Gal Yam et al., 2008), the relation between PcG proteins and the DNA methylation of CpG islands is not straightforward. EZH2 and BMI1 interact with DNA methyltransferases and might recruit them to PcG protein target genes (Negishi et al., 2007; Vire et al., 2006). Alternatively, DNA methylation might not be directly linked to PcG proteins, but might replace PcG protein-based repression to silence genes stably (Gal Yam et al., 2008). Thus, PcG proteins could be considered part of a flexible silencing system that postpones lineage choices until the appropriate signals have been received. Upon lineage commitment, pluripotency genes at the top of the hierarchy, as well as alternative differentiation pathways, are stably silenced by DNA methylation, while the PcG protein complex silencing machinery still dynamically represses genes at the bottom of the differentiation hierarchy (Fig. 3). This view departs from the original assumption that PcG proteins convey stable cellular memory and suggests instead that the memory function for PcG proteins might be an exception rather than the rule.
In a recent report, Mohn and co-workers used an elegant murine system of progressive transition from ES cells to neuronal progenitor cells to terminally differentiated neuronal cells to demonstrate these dynamic switches in PcG protein-associated histone marks and DNA methylation during differentiation (Fig. 3) (Mohn et al., 2008). The active H3K4me2 mark is present at almost all CpG island promoters, including inactive genes, in ES cells. Similarly, H3K4me3 has been shown to be present at more than three-quarters of annotated promoters in ES cells, suggesting that these histone marks have additional functions in gene regulation. Very few CpG islands are DNA methylated in ES cells (Meissner et al., 2008), whereas in terminally differentiated cells, a large number of CpG island-containing promoters are DNA methylated. However, DNA methylation already represses most of the pluripotency genes in progenitor cells and, during the transition to a terminally differentiated state, DNA methylation is less dynamic. In addition, PcG target genes in ES cells are more likely to become methylated de novo in neuronal progenitors, which suggests that PcG protein-dependent repression and de novo methylation are linked. Most interestingly, many neuron-specific genes that are activated upon terminal differentiation and that are not marked by H3K27me3 in ES cells gain H3K27me3 in progenitor cells and become bivalent (Fig. 3). This discovery of de novo bivalent domain formation has three important implications. First, bivalent domains appear to be the consequence of PRC2 targeting and are not ES cell-specific features. This is in agreement with an earlier report that showed the presence of bivalent domains in differentiated cells (Mikkelsen et al., 2007). Second, PcG proteins can prime genes for both activation and repression during terminal differentiation. Third, the de novo formation of bivalent domains at later developmental stages indicates that the fate of PcG targets is not, in every case, already predetermined in ES cells.
Additional bivalent chromatin state regulators
Additional regulatory factors have been shown to be involved in regulating the activity of key developmental genes by resolving these bivalent chromatin structures. Many bivalent genes are occupied by a non-processive form of RNA polymerase II (RNA pol II) that is poised for gene activation; transcription is initiated at these genes, but efficient elongation is blocked. Interestingly, PRC1-mediated histone H2A ubiquitylation (H2AK119ub1) is necessary for this block (Stock et al., 2007). Another report has shown that the histone variant H2AZ is found at many silent developmental regulator genes that are co-occupied by PRC2 components. This histone variant might thus be the target of the ubiquityltransferase activity of PRC1 in ES cells (Creyghton et al., 2008). In addition, H2AZ has been shown to protect genes from DNA methylation (Zilberman et al., 2008). The presence of H2AZ at bivalent genes might thus keep them silent in ES cells, yet poised for activation by protecting them from DNA methylation (Fig. 4). Intriguingly, H2AZ levels have been reported to be directly proportional to gene activation (Barski et al., 2007; Mavrich et al., 2008; Schones et al., 2008). Accordingly, upon differentiation, H2AZ is relocalized to a set of highly expressed genes that are distinct from the targets of H2AZ in ES cells (Creyghton et al., 2008). Therefore, the removal of H2AZ from bivalent genes might be essential to resolve bivalent domains into H3K4me3 or H3K27me3 monovalent domains, which establishes H2AZ, together with PcG proteins, as an important regulator of cell fate transitions upon the induction of differentiation (Fig. 4).
TrxG proteins have also been shown to be required to resolve silenced bivalent domains during neurogenesis (Lim et al., 2009). The mixed-lineage leukaemia 1 (MLL1) histone H3K4 methyltransferase is required for neuronal differentiation. Dlx2, a bivalently marked key developmental regulator gene for neurogenesis, is a direct target of MLL1. Upon differentiation, Dlx2 is activated, which results in the loss of the H3K27me3 mark. However, Dlx2 gene activation is impaired in Mll1 mutant cells, and the gene remains bivalently marked. The H3K27 demethylase UTX has been found in a complex with MLL2/3 (Lee et al., 2007), and another K27 demethylase, JMJD3, has been shown to be essential for the resolution of bivalent domains during macrophage activation (De Santa et al., 2007). It is tempting to speculate, therefore, that MLL1 contributes to the resolution of bivalent domains by recruiting H3K27-specific demethylases to their targets.
Finally, the histone demethylase RBP2 (also known as JARID1a and KDM5A) has been shown to be recruited by PRC2 to a large number of PcG protein target genes in mouse ES cells (Pasini et al., 2008). RBP2 might be responsible for removing the active H3K4me3 mark from bivalent promoters during differentiation. However, it remains unclear how the demethylase activity at PcG target genes is turned down in ES cells to retain bivalent domains.
Is bivalency conserved?
Do bivalent domains also exist in flies? Thus far, there is no evidence that bivalency is a common feature of the Drosophila genome. Mapping studies in fly embryos have shown that H3K4me3 and H3K27me3 are generally mutually exclusive (Schuettengruber et al., 2009). Moreover, H3K4me3 has not been found to be spread over larger regions, as seen at bivalent domains in mammalian cells, but is tightly localized to gene promoter regions. Although it cannot be excluded that bivalency exists in a small subset of embryonic cells or at other stages of fly development that have not yet been analyzed, this intriguing difference might suggest that PcG proteins have somewhat different functions in mammalian versus insect biology. Although PcG proteins might have an important function in maintaining the memory of gene silencing states in insects, this function could have been partly replaced by DNA methylation in vertebrates, and PcG proteins might rather silence genes dynamically or during short-term cellular memory phenomena.
Post-translational modification of PcG and trxG protein complexes
Whereas much work has been devoted to the isolation of PcG and trxG protein complexes, less is known about the post-translational modifications of these proteins. Three recent studies have identified an interesting function for O-linked beta-N-acetylglucosamine glycosylation (O-GlcNAcylation) in the regulation of PcG and trxG members (Fujiki et al., 2009; Gambetta et al., 2009; Sinclair et al., 2009). O-GlcNAcylation is found in many nuclear and cytoplasmic proteins and modulates fundamental cellular processes including signaling (Hart et al., 2007). O-linked N-acetylglucosamine transferase (OGT) has now been linked to trxG-dependent gene activation and granulopoiesis (Fujiki et al., 2009). The HMTase MLL5 is in a GlcNAcylation-dependent complex that is associated with nuclear retinoic acid receptor (RARA) and OGT. OGT is essential for O-GlcNAcylation of MLL5 in the SET domain, and this modification is required for the H3K4 methyltransferase activity of MLL5. Importantly, O-GlcNAcylation of MLL5 facilitates retinoic acid (RA)-induced granulopoiesis in human promyelocytes by binding and activating a major granulopoietic regulator gene via H3K4 methylation. RNAi knockdown of MLL5 or OGT results in reduced RA-induced gene activation and impaired granulopoiesis.
Interestingly, two independent studies demonstrated a role for OGT in PcG-dependent repression in flies (Gambetta et al., 2009; Sinclair et al., 2009). OGT (previously described genetically as sxc) glycosylates Ph, and mutant flies that lack OGT fail to maintain PcG-dependent repression. However, it remains to be determined whether Ph is the major substrate of OGT in PcG-dependent repression, and how O-GlcNAcylation of Ph contributes to its function.
If O-GlcNAcylation of PcG proteins also occurs in mammals, it is tempting to speculate that this modification might regulate PcG protein function during cell fate choice, adding one more layer of complexity to the regulation of lineage determination.
Despite the growing body of knowledge in the field of PcG protein complex-dependent mechanisms, many central questions remain unanswered. For example, we still do not know the rules that govern PcG protein complex targeting. Higher resolution mapping techniques (such as ChIP-seq) and refined sequence analysis could help to 'crack' the DNA code that defines PREs in flies. A crucial advance would be the identification of mammalian PREs, which is still pending. Future work should also clarify how general the role of ncRNAs is in PcG protein complex recruitment, and whether histone variants play a role in the recruitment of PcG protein complexes.
Much has been learned from genome-wide mapping studies of PcG protein complexes and their associated histone marks. Owing to technical limitations, most of these studies have been performed in transformed cultured cells, undifferentiated ES cells or in heterogeneous cell populations, such as Drosophila embryos. Given the dynamic and tissue-specific regulation of PcG protein complexes, the next step will be to extend these studies to specific tissues and developmental stages using homogeneous cell populations. It would also be of great interest to analyze how epigenetic programs are changed in tumor cells as compared with normal tissues. To achieve these goals, it will be necessary to establish genome-wide maps from small cell samples, such as stem cell populations purified by fluorescence-activated cell sorting, or human tumor biopsies. Such analyses should also help to shed light on the question of whether bivalency exists in specific Drosophila cell types or whether other histone marks functionally substitute for mammalian bivalent domains.
Note added in proof
Two recent papers bring new insight into the mechanism of epigenetic inheritance and the function of PcG proteins in cancer. The first paper (Margueron et al., 2009), which analyzed the role of the EED subunit of the PRC2 complex, suggests that EED may stimulate a cooperative methylation of H3K27 that PRC2 might use to propagate silent chromatin states through DNA replication. The second paper (Classen et al., 2009) strengthens the conclusion of Martinez et al. (Martinez et al., 2009), showing a tumor suppressor function for Drosophila PcG proteins. They show that PcG mutants induce cancer concomitantly with activation of JAK/STAT signaling and that a crucial JAK/STAT gene is a direct PcG target.
We thank Amos Tanay (Weizmann Institute, Israel), Mythily Ganapathi and Inma Gonzalez for critically reading the manuscript. B.S. was supported by a fellowship of the Fondation de la Recherche Médicale (FRM). G.C. was supported by grants of the European Union FP6 (Network of Excellence the Epigenome), by the Agence Nationale de la Recherche and by the Association pour la Recherche sur le Cancer.
© 2009.