
The growing catalog of small RNAs and their association with distinct Argonaute/Piwi family members
Thalia A. Farazi, Stefan A. Juranek, Thomas Tuschl


Several distinct classes of small RNAs, some newly identified, have been discovered to play important regulatory roles in diverse cellular processes. These classes include siRNAs, miRNAs, rasiRNAs and piRNAs. Each class binds to distinct members of the Argonaute/Piwi protein family to form ribonucleoprotein complexes that recognize partially, or nearly perfectly, complementary nucleic acid targets, and that mediate a variety of regulatory processes, including transcriptional and post-transcriptional gene silencing. Based on the known relationship of Argonaute/Piwi proteins with distinct classes of small RNAs, we can now predict how many new classes of small RNAs or silencing processes remain to be discovered.


Small RNAs perform diverse biological functions, often in a tissue-specific manner. They function by guiding sequence-specific gene silencing at the transcriptional and/or post-transcriptional level (reviewed by Bartel, 2004; Meister et al., 2004; Nakayashiki, 2005; Vaucheret, 2006; Grewal and Jia, 2007; Seto et al., 2007; Zaratiegui et al., 2007). Naturally occurring small RNAs are processed from longer RNA precursors that are either encoded in the genome or are generated by viral replication. Importantly, these natural RNA-silencing processes can be harnessed to induce gene-specific silencing through the provision of non-natural RNA precursors or mimics of natural, small RNA processing intermediates. This approach, known as RNA interference (RNAi), is widely used for the systematic analysis of gene function, and its potential therapeutic applications are currently under intense investigation (reviewed by Bumcrot et al., 2006; Echeverri and Perrimon, 2006). However, to efficiently harness the machinery of RNAi, it is essential to elucidate how the different types of small RNA molecules are generated.

Distinct sequence and/or structural elements within the precursor transcripts of various classes of small RNAs recruit RNA-processing enzymes and proteins that are responsible for small RNA maturation, and also for the subsequent assembly of the small RNAs into effector complexes that mediate small RNA function. The best-characterized RNA structure that triggers RNAi is double-stranded RNA (dsRNA), either in the form of a hairpin (>20 bp) or a longer dsRNA. Recently, small RNA classes that may originate from apparently single-stranded RNA (ssRNA) transcripts have also been identified (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006a; Lau et al., 2006; Ruby et al., 2006; Brennecke et al., 2007). The key feature that distinguishes the different classes of small RNAs from each other is their length, with peak lengths varying from 21 to 30 nucleotides (nt). The lengths of the different classes of small RNAs vary due to distinct mechanisms of biogenesis. Other significant differences between them are the presence of a 5′ uridine, phosphorylation at the 5′ end, and 2′-O-methylation at the 3′ end of the RNA molecule.

These characteristics of small RNAs determine their loading onto effector ribonucleoprotein (RNP) complexes. These effector complexes mediate different small RNA functions at the transcriptional and/or post-transcriptional level, such as mRNA cleavage, translational repression, and regulation of chromatin structure. For example, the effector complex that mediates catalytic mRNA cleavage is known as RNA-induced silencing complex (RISC), the effector complex that mediates translational repression directed by microRNAs (miRNAs) is known as miRNP, and the effector complex that mediates chromatin regulation is the RNA-induced transcriptional gene silencing (RITS) complex (reviewed by Meister and Tuschl, 2004).

Small RNA-associated RNP complexes contain at their center an Argonaute/Piwi (Ago/Piwi) protein family member (see Table 1) and are loaded with distinct classes of small RNAs to form target-recognizing complexes (Hammond et al., 2001; Hutvagner and Zamore, 2002; Martinez et al., 2002; Mourelatos et al., 2002; Meister et al., 2004; Aravin et al., 2006; Grivna et al., 2006b; Lau et al., 2006; Saito et al., 2006; Vagin et al., 2006; Watanabe et al., 2006; Brennecke et al., 2007; Gunawardane et al., 2007; Houwing et al., 2007). The number of Ago/Piwi genes varies considerably among species, setting an upper limit on the number of classes of small regulatory RNAs that remain to be identified and the number of small RNA-guided regulatory processes. The tissue specificity that is associated with the expression of various members of the Ago/Piwi protein family and their small RNA precursors adds further complexity to our understanding of small RNA-regulated processes.

Table 1.

Ago/Piwi proteins and their small RNA partners

Here, we review the currently identified classes of small RNAs, summarize what is known about their cellular functions, and discuss their protein partners, focusing on their association with specific Ago/Piwi protein members.

The Ago/Piwi protein family

The Ago/Piwi protein family is well conserved, and members have been identified in all species that possess small RNA-mediated phenomena (see Fig. 1A) (reviewed by Parker and Barford, 2006; Peters and Meister, 2007). Based on their sequence similarities, Ago/Piwi proteins can be divided phylogenetically into three families (see Fig. 1A). The largest family comprises the Argonautes (Ago), named after its founding member in Arabidopsis thaliana. The second family comprises the Piwis, named after the Drosophila melanogaster protein PIWI (P-element induced wimpy testis). The third family, Class 3, consists exclusively of Caenorhabditis elegans proteins. Different members of the Ago/Piwi family often show distinct tissue distribution, which allows some to mediate tissue-specific small RNA functions (see Table 1). The importance of individual members of the Ago/Piwi protein family has been assessed in many genetic studies (see Table 1). These studies, however, are sometimes complicated by redundant or overlapping Ago/Piwi protein functions and expression.

Ago/Piwi proteins have a molecular weight of ∼90 kDa and show an overall bilobal architecture (see Fig. 1B). The first lobe contains the N-terminal PAZ domain, which is responsible for binding the 3′ end of the guide small RNA. The second lobe contains the MID domain, which is responsible for binding the 5′ phosphate of the guide RNA, and the RNase H endonuclease domain, also known as the PIWI domain (reviewed by Parker and Barford, 2006; Patel et al., 2006; Tolia and Joshua-Tor, 2007).

Fig. 1.

Phylogenetic tree and crystal structure of the Ago/Piwi proteins. (A) Phylogenetic tree of the Ago/Piwi protein family. Alignments were generated using ClustalW. The length of each branch represents an estimate of the genetic distance. The alignment was done using the PAZ domain (when present), the PIWI domain and the C terminus. The sequences are based mainly on published RefSeqs for PAZ/PIWI domain-containing proteins at PubMed. The accession numbers of the sequences can be obtained from the authors. Asterisks indicate Ago/Piwi members with experimentally determined cleavage activity. The interacting classes of small RNAs are indicated next to their corresponding Ago/Piwi family members (evidence stems from biochemical and/or genetic experiments). Ago/Piwi family members are designated according to their affiliation to either the Ago protein family, the Class 3 protein family or the Piwi protein family. C. elegans Alg1, Alg2, T23D8.7, ZK757.3, T22B3.2 comprise the Ago family; PRG-1, PRG-2 comprise the Piwi family; Sago-1, Sago-2, PPW-1, R06C7.1, F55A12.1, PPW-2, F58G1.1, C06A1.4, R04A9.2, Y49F6A.1, T22H9.3, C16C10.3, CSR-1, M03D4.6, ZK1248.7, C14B1.7, C04F12.1 comprise the Class 3 family. Mammalian Ago proteins are also known as eIF2Cs (eukaryotic translation initiation factors). Mammalian Piwil1 is also known as Hiwi, Miwi or Riwi, depending on the species (human, mouse or rat, respectively); Piwil2 is also known as Hili, and Piwil4 as Hiwi2. Cniwi is the Piwi protein in Podocoryne carnea, and Seawi is the Piwi member in the sea urchins Strongylocentrotus purpuratus and Paracentrotus lividus. (B) Ribbon diagram of the structure of the Aquifex aeolicus Piwi protein, showing its bilobed architecture. The functions of the protein domains are further discussed in the text.

The PAZ domain is an RNA-binding module of about 100-200 amino acids. This domain recognizes the 3′ end of small RNAs by inserting the backbone of the small RNA into a preformed hydrophobic pocket (reviewed by Patel et al., 2006). Most miRNAs and small interfering RNAs (siRNAs) are initially bound to the Ago/Piwi-containing RNP complex as duplexes with a two nt 3′ overhang, and are subsequently unwound, resulting in bound ssRNA intermediates. The bound ssRNA strand is referred to as the `guide' RNA, whereas the ssRNA strand not stably incorporated into the RNP complex is referred to as the `passenger' or `star' RNA, and is cleaved by the Ago/Piwi proteins (Matranga et al., 2005; Miyoshi et al., 2005). A PAZ domain is also found in most Dicer RNase III family members, where it is believed to position one end of the RNA duplex at a defined distance from the endonuclease domain, thereby producing small RNAs of a defined length (Zhang et al., 2004; Macrae et al., 2006; MacRae et al., 2007). Dicer RNase III family members are endonucleases required for the biogenesis of specific classes of small RNAs; they generate small dsRNAs from longer dsRNA precursors (see more detailed discussion below).

The MID domain has structural homology to the sugar-binding domain of the lac repressor and is around 150 amino acids. It loads the small RNA onto the RNP complex (Nykanen et al., 2001), presumably by receiving and binding the 5′ phosphate of the small RNA presented as a duplex (Chen et al., 2007). It places the 5′ phosphate end of a small RNA in a binding site that is formed by a basic pocket of the MID domain adjacent to the interface with the PIWI domain, the C-terminal carboxylate of the Ago/Piwi protein and a divalent metal cation, such as magnesium (Ma et al., 2005; Parker et al., 2005; Rivas et al., 2005). Within the RISC RNP, the presence of a 5′ phosphate on the bound single-stranded siRNA contributes to the fidelity of the endonuclease activity during target cleavage (Rivas et al., 2005). The MID domain of some Ago proteins contains a sequence motif similar to the methyl(7)G-cap-binding domain of the eukaryotic translation initiation factor eIF4E. The ability of Ago/Piwi proteins to bind the m(7)G cap of the target mRNA suggests that one mechanism of small RNA regulation occurs by controlling the translation initiation of their target mRNAs (Kiriakidou et al., 2007).

The PIWI domain exhibits an RNase H fold, and ranges from 400 to 600 amino acids. The RNase H domain is conserved among eukaryotes and prokaryotes. It may act as a double-strand-specific endonuclease (also referred to as `Slicer') that can cleave the mRNA targeted by the guide small RNA. Ago/Piwi protein members show sequence variation in the active site, and not all members have endonuclease activity (see Table 1). Some Ago/Piwi proteins include the Asp-Asp-His motif that forms the active site catalytic triad, possessing endonuclease activity with a divalent cation, such as magnesium (reviewed by Tolia and Joshua-Tor, 2007). A recent study has revealed that the PIWI domain binds a conserved motif in Ago/Piwi interacting proteins, such as GW182 and Tas3. These proteins are involved in mediating the small RNA-Ago/Piwi complex functions in the processes of translational repression and transcriptional silencing, respectively (see subsequent sections) (Till et al., 2007).

Small RNAs: their biogenesis and function

An overview of the classes of small RNAs and their Ago/Piwi protein-binding partner is shown in Table 1. The molecular characteristics of the small RNAs, including their length, precursor structure and chemical modifications are presented in Table 2. It is sometimes difficult to draw a clear line between the different classes of small RNAs, partly because the nomenclature that was introduced early on in the field did not anticipate the complexity of the small RNAs and the many processes they mediate. The features that can be used to distinguish different classes of small RNAs are: mechanism of biogenesis (or precursor structure); the genomic region they originate from; and their associated protein-binding partner. Classes of small RNAs can be grouped into two main categories, those excised from dsRNA precursors and those derived from transcripts that are probably not double stranded. The best-characterized members of the first category are siRNAs and miRNAs, whereas members of the second category include Piwi-interacting small RNAs (piRNAs) and some repeat-associated-siRNAs (rasiRNAs). In this section, we describe the different classes of small RNAs that have been identified to date and their functions, and in the next section we describe in more detail the proteins involved in their biogenesis.

Table 2.

Classes of small RNAs and their characteristics


miRNAs are the most abundant class of small RNAs in animals. They are on average 20 to 23 nts in length and usually have a uridine at their 5′ end. The first representative of this small RNA family, lin-4, was identified in a genetic screen in C. elegans in 1981 (Chalfie et al., 1981), and was molecularly characterized in 1993 (Lee et al., 1993). Plants have on average 120 miRNA-encoding genes (reviewed by Jones-Rhoades et al., 2006), invertebrate animals about 150 (Aravin et al., 2003; Lai et al., 2003; Ruby et al., 2006), and humans close to 500 (Landgraf et al., 2007), which are differentially expressed depending on the cell type and developmental stage. miRNAs have also been identified in DNA viruses (Pfeffer et al., 2004; Pfeffer and Voinnet, 2006) and the green alga Chlamydomonas reinhardtii (Molnar et al., 2007; Zhao et al., 2007). miRNAs can be expressed at high levels, up to tens of thousands of copies per cell, and can thus play important regulatory roles by controlling hundreds of mRNA targets (Lim et al., 2003). A repository of miRNAs and miRNA genes from many organisms is available at the miRBase Sequence Database, a searchable database of published miRNA sequences and annotation. An expression atlas/database of mammalian miRNAs identified in a variety of tissues and cell lines has also recently become available.

Fig. 2.

Biogenesis and mode of action of miRNAs. miRNA biogenesis in (A) animals and (B) plants. The red miRNA strand is the strand incorporated into the Ago effector complex. The blue miRNA strand, referred to as miRNA*, becomes degraded. Drosha acts as the RNase III in some animal nuclei, and nuclear Dicer as the RNase III in the plant nucleus, where it cleaves the pri-miRNA in two steps (1,2). The cytoplasmic RNase III in animals is Dicer. RNase III enzymes usually partner with distinct double-stranded RNA-binding-domain-containing proteins (dsRBPs, in gold) in the nucleus. Following their export from the nucleus, miRNAs then associate with Ago. In animals, the Ago-containing miRNPs predominantly associate with GW182, a protein with glycine-tryptophan (GW) repeats that is required for P body integrity. The miRNA subsequently translationally represses its target and is then localized to P bodies. In plants, miRNAs predominantly function through target mRNA cleavage, which can also occur in animals (see text for more details). m7G, 5′ methyl(7)G cap of target mRNA; me, 2′-O-methyl group on the 3′ end of the RNA; miRNP, effector ribonucleoprotein complex that mediates translational repression or target mRNA cleavage directed by miRNAs; p, 5′ phosphate group.

miRNA biogenesis

Most miRNAs are encoded in non-coding regions that generate short dsRNA hairpins, and are transcribed by Polymerase II, many as polycistronic transcripts (Tanzer and Stadler, 2004). In animals, they are processed by endoribonucleases in partnership with dsRNA-binding proteins sequentially in the nucleus and cytoplasm (see Fig. 2). In the animal nucleus, the endoribonuclease Drosha excises the miRNA stem loops from the primary transcript (pri-miRNA), producing an approximately 70 nt intermediate (pre-miRNA) (Lee et al., 2002). The pre-miRNA is actively transported to the cytoplasm in a GTP-dependent manner by an export protein complex containing a dsRNA-binding export receptor, such as mammalian Exportin 5 or plant HASTY, and Ran (ras-related nuclear protein) GTPase (reviewed by Kim, 2004). In the cytoplasm, the pre-miRNA is further processed by Dicer to the mature miRNA in the form of a base-paired double-stranded processing intermediate with a 2 nt 3′ overhang. In plants, nuclear-localized Dicer is responsible for pri-miRNA processing to miRNA, followed by the addition of a 2′-O-methyl group at the 3′ end of the miRNA by a methyltransferase (reviewed by Vazquez, 2006). In the cytoplasm, the strand of the duplex whose 5′ end is less stably paired is favored for incorporation into Ago effector complexes (Schwarz et al., 2003). The other strand, referred to as miRNA*, becomes degraded.
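The strand-selection rule described above (the strand whose 5′ end is less stably paired is favored) can be illustrated with a small sketch. The base-pair weights, the four-base-pair window, the function names and the example duplex are all assumptions made for illustration; they stand in for real thermodynamic parameters and are not taken from Schwarz et al. (2003).

```python
# Crude base-pair weights (an illustrative assumption, not measured free energies).
PAIR_WEIGHT = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def five_prime_end_score(strand, partner, overhang=2, window=4):
    """Sum pairing weights over the first `window` base pairs at the 5' end of `strand`.

    Both strands are written 5' to 3', have equal length, and each carries an
    unpaired 3' overhang of `overhang` nucleotides.
    """
    paired_len = len(strand) - overhang
    return sum(
        PAIR_WEIGHT.get((strand[i], partner[paired_len - 1 - i]), 0)
        for i in range(window)
    )


def choose_guide(strand_a, strand_b, overhang=2, window=4):
    """Return the strand with the less stably paired 5' end, or None on a tie."""
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch expects strands of equal length")
    score_a = five_prime_end_score(strand_a, strand_b, overhang, window)
    score_b = five_prime_end_score(strand_b, strand_a, overhang, window)
    if score_a == score_b:
        return None
    return strand_a if score_a < score_b else strand_b


# Hypothetical 21 nt duplex: strand_a has an A/U-rich 5' end, strand_b a G/C-rich one.
strand_a = "AUAUGCAGCUAGCUGGCGC" + "UU"
strand_b = reverse_complement(strand_a[:19]) + "UU"
guide = choose_guide(strand_a, strand_b)  # strand_a under these toy weights
```

In this toy duplex the 5′ end of strand_a is closed only by A-U pairs, so it scores lower and is returned as the presumptive guide; strand_b would correspond to the miRNA* strand.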

An alternative miRNA biogenesis mechanism has recently been identified, the precursors of which reside in introns. In these intronic miRNAs, named mirtrons, the 3′ end of the stem-loop precursor structure coincides with the 3′ splice site, and is cleaved by nuclear pre-mRNA splicing rather than by Drosha (Berezikov et al., 2007; Okamura et al., 2007; Ruby et al., 2007).

miRNA function and Ago/Piwi association

miRNAs have been implicated in many cellular processes by regulating gene expression at the post-transcriptional level. miRNA RNPs mediate diverse functions depending on the particular Ago protein member and the degree of sequence complementarity between the guide miRNA and the target nucleic acid (reviewed by Eulalio et al., 2007; Peters and Meister, 2007; Pillai et al., 2007). Several lines of evidence have identified that the six to eight nts at the 5′ end of an miRNA (position 1-8) are important for target site recognition and have been designated the `seed' region (Lai et al., 2003; Lewis et al., 2003; Stark et al., 2003; Jackson et al., 2006; Lall et al., 2006) (reviewed by Rajewsky, 2006; Sood et al., 2006; Gaidatzis et al., 2007).
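Seed-based target recognition can be sketched as a simple string search. The example below assumes a seed window of positions 2-8, one common convention that falls within the position 1-8 region described above, and it reports only perfect Watson-Crick matches. The function names and the target fragments are hypothetical, and the let-7a sequence is used purely as illustrative input.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, start=2, end=8):
    """Return the target-side sequence (5' to 3') that pairs with the miRNA seed.

    `start` and `end` are 1-based, inclusive positions counted from the miRNA 5' end.
    """
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, target, start=2, end=8):
    """Return 0-based positions in `target` where a perfect seed match begins."""
    site = seed_site(mirna, start, end)
    positions = []
    index = target.find(site)
    while index != -1:
        positions.append(index)
        index = target.find(site, index + 1)
    return positions


let7a = "UGAGGUAGUAGGUUGUAUAGUU"          # illustrative miRNA input
utr_fragment = "AAACUACCUCAAACUACCUCGG"   # hypothetical 3' UTR fragment with two sites
sites = find_seed_sites(let7a, utr_fragment)
```

A search of this kind captures only the seed requirement; it ignores pairing outside the seed, G-U wobble pairs and the sequence context of the site, all of which can influence repression.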

miRNA RNP effector complexes guide catalytic target RNA cleavage based on Ago protein sequence variation (see Table 1) at or near the active site, as well as on the degree of mismatches between the miRNA and target RNA (Jackson et al., 2003; Martinez and Tuschl, 2004; Jackson et al., 2006). Most miRNA RNPs with near-perfect complementary guide miRNA/target mRNA mediate mRNA cleavage, whereas RNPs with a greater degree of mismatches inhibit translation and/or trigger the transport of mRNA to mRNA-processing bodies (P-bodies, also known as cytoplasmic GW-bodies) (reviewed by Zamore and Haley, 2005; Valencia-Sanchez et al., 2006; Du and Zamore, 2007; Parker and Sheth, 2007; Pillai et al., 2007). The presence of P-bodies is now considered to be a consequence of small RNA-guided mRNA targeting (Eulalio et al., 2007; Lian et al., 2007). Components of the RNAi machinery localized to P-bodies include members of the Ago/Piwi family, members of the GW-proteins/trinucleotide repeat-containing family of proteins, and RNA helicases (reviewed by Ding and Han, 2007; Parker and Sheth, 2007). Other proteins concentrated in P-bodies include general mRNA translation repression and mRNA decay machinery proteins, such as mRNA decapping proteins, translational repressors, deadenylase complexes and RNA-binding proteins. Small RNA-mediated regulation does not necessarily require localization to P-bodies, and P-bodies are not always detectable (reviewed by Eulalio et al., 2007; Jakymiw et al., 2007). Interestingly, GW-proteins are also involved in the crosstalk between the maternal macronucleus and the developing macronuclei during RNA-mediated DNA elimination processes in the ciliate Paramecium tetraurelia (Nowacki et al., 2005) following the sexual process of conjugation.

In many organisms there is biochemical evidence that miRNAs specifically associate with members of the Ago family. All four human AGO proteins (Liu, J. et al., 2004; Meister et al., 2004), A. thaliana AGO1 (Vaucheret et al., 2004; Baumberger and Baulcombe, 2005; Qi et al., 2005), and C. elegans Alg-1 and Alg-2 (Grishok et al., 2001) interact with miRNAs. Certain members of the miRNA-associated Ago proteins exhibit endoribonuclease activity and are thus capable of target mRNA cleavage (see Table 1). More recently, it has become apparent that D. melanogaster Ago1 preferentially binds miRNAs that have been excised from imperfectly paired hairpin precursors, whereas those miRNAs that have near-perfectly paired hairpin precursors are bound by Ago2 (Okamura et al., 2004; Miyoshi et al., 2005; Forstemann et al., 2007; Tomari et al., 2007).

miRNA conservation

Many miRNAs are represented as families that are defined by the conservation of the seed region. miRNAs identified in one species are often conserved in closely related species (see miRBase), and about 10% of the miRNA families identified in invertebrates are completely conserved in mammals. There is no sequence conservation between the miRNAs of animals and plants. Plant and animal miRNAs have different 3′-end modifications: plant miRNAs are 2′-O-methylated (Yu et al., 2005), whereas animal miRNAs are unmodified (Kirino and Mourelatos, 2007b; Ohara et al., 2007). Animal and plant miRNAs also have different mRNA target-recognition modes: plant miRNAs usually cleave in open reading frames (ORFs), whereas the binding sites of animal miRNAs are most often located in 3′ untranslated regions (UTRs) (reviewed by Bartel, 2004; Stark et al., 2005; Gaidatzis et al., 2007). Moreover, plant miRNAs show a greater degree of complementarity to their mRNA target than do animal miRNAs, and primarily function through mRNA cleavage. Animal miRNAs target mRNA 3′ UTRs predominantly by seed sequence complementarity and are rarely fully complementary; they therefore function through translational repression rather than cleavage. A recent study in mammals revealed that the sequence that surrounds the 3′ UTR target region that is complementary to the miRNA seed region also contributes to the repression of a target mRNA by a miRNA (Grimson et al., 2007; Nielsen et al., 2007).


The first indication that small RNAs mediate gene silencing came from their observation in transgenic co-suppressing plants (Hamilton and Baulcombe, 1999). Co-suppression is triggered by the genomic integration of an additional gene (or of gene segments) that is identical to a host gene, and results in the reduced accumulation of RNA molecules that share sequence similarity with the introduced nucleic acid. Biochemical studies following the discovery of RNAi in C. elegans (Fire et al., 1998) revealed that small RNAs were processed from dsRNA triggers (Zamore et al., 2000). Because these dsRNA processing products were able to efficiently reconstitute silencing complexes, they were named siRNAs (Elbashir et al., 2001a).

siRNA biogenesis

siRNAs have a distinct size distribution, but, in contrast to miRNAs, which are excised in a precise fashion from their dsRNA precursor, siRNAs are processed in a more random fashion (Elbashir et al., 2001a) from longer dsRNAs (see Fig. 3A) (Hammond et al., 2000; Zamore et al., 2000). They are processed by Dicer, producing two nt 3′ overhangs, similar to the final processing intermediate of the miRNA pathway.
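The geometry of a Dicer product, a short duplex with two nt 3′ overhangs on both strands, can be made concrete with a small sketch. It assumes an idealized, strictly phased cut every 21 nt along a perfectly paired dsRNA, which is a simplification given that siRNA processing is described above as more random. The function names and the input sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense, length=21, overhang=2):
    """Cut a perfectly paired dsRNA into duplexes with 3' overhangs on both strands.

    `sense` is the sense strand (5' to 3'); the antisense strand is implied.
    Returns a list of (sense_strand, antisense_strand) pairs, each written 5' to 3'.
    """
    duplexes = []
    # Start at `overhang` so that the antisense 3' overhang of the first duplex
    # still has template sequence to pair with in the long precursor.
    for start in range(overhang, len(sense) - length + 1, length):
        sense_strand = sense[start:start + length]
        # The antisense strand is shifted by `overhang` toward the sense 5' end.
        antisense_strand = reverse_complement(
            sense[start - overhang:start + length - overhang]
        )
        duplexes.append((sense_strand, antisense_strand))
    return duplexes


long_sense = "AA" + "ACGU" * 10 + "GC"   # hypothetical 44 nt sense strand
duplexes = dice(long_sense)
```

Each returned pair consists of two 21 nt strands that share a 19 bp paired region, leaving the last two nucleotides at the 3′ end of each strand unpaired.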

siRNAs can be produced from RNA transcribed in the nucleus (endogenous siRNAs), or can be virally derived or experimentally introduced as chemically synthesized dsRNA (exogenous siRNAs). Endogenous plant siRNAs can be generated directly from transcription or can be derived from inverted repeats of transgenes or transposons. They include natural antisense-siRNAs (natsiRNAs), trans-acting-siRNAs (tasiRNAs) and heterochromatic small RNAs (hcRNAs). natsiRNAs are endogenously expressed siRNAs that originate from overlapping sense and antisense transcripts (Borsani et al., 2005). tasiRNAs are generated from specific non-coding genomic regions. Their biogenesis is initiated by Ago1-bound miRNAs that cleave the non-coding ssRNA transcript to produce fragments, which serve as templates for dsRNA synthesis by an RNA-dependent RNA polymerase (RdRP, RDR6) (see Fig. 3A). The dsRNA fragment is subsequently cleaved by a Dicer RNase (Dicer-like 4, DCL4) to yield 21 nt tasiRNAs (reviewed by Vazquez, 2006). Plant hcRNAs are mostly derived from repeat-associated genomic regions (see below).

Fig. 3.

Biogenesis of siRNAs and hcRNAs. (A) Biogenesis of different classes of siRNA (see text for more details). Endogenous siRNAs are transcribed in the nucleus, whereas exogenous siRNAs are either chemically synthesized or virally derived. siRNAs are further processed by RNase III enzymes, such as Dicer. tasiRNAs are specific to plants and, after initial cleavage by specific miRNAs (red) and complementary strand synthesis by the RNA-dependent RNA polymerase (RdRP) RDR6, are processed by the Dicer DCL4. They are then phosphorylated (P) and subsequently methylated (me) by the RNA methyltransferase HEN1. In C. elegans and plants, secondary siRNAs participate in a signal amplification loop. In plants, the cleaved mRNA is converted into dsRNA by an RdRP, and is further processed by Dicer (the RNase III in green). In C. elegans, secondary siRNAs have a 5′ di- or triphosphate group and associate with Class 3 Ago/Piwi protein members, Sago-1/Sago-2, leading to target mRNA (black) cleavage. (B) Biogenesis of hcRNAs (see also text for more details). In the yeast S. pombe and in plants, hcRNAs are processed by RNase III enzymes. In S. pombe, hcRNAs associate with Ago1 and form the RNA-induced transcriptional gene silencing (RITS) complex, which participates in RNA-directed RNA polymerase complex (RDRC) formation and histone (gray circles) methylation. In plants, hcRNAs form a complex with AGO4, which participates in DNA methylation. Chp1, chromodomain protein 1; Clr4, cryptic loci regulator 4; Cid12, caffeine induced death protein; DRD1, defective in RNA-directed DNA methylation 1 (an SNF2-like chromatin remodeling protein); DRM2, domain rearranged methyltransferase 2; Hrr1, a helicase required for RNAi-mediated heterochromatin assembly; Rdp1, RNA-directed RNA polymerase 1; Tas3, targeting complex subunit 3.

Naturally occurring endogenous siRNAs have also been identified in C. elegans (Ambros et al., 2003; Simmer et al., 2003; Ruby et al., 2006). They include the previously annotated tiny-noncoding RNA (tncRNA) and secondary siRNA. tncRNAs are ∼22 nts in length, depend on Dicer for their biogenesis, and derive from non-coding, non-conserved sequences. If RNAi is induced in C. elegans, primary siRNAs derived from the processing of the trigger dsRNA are generated, as are secondary siRNAs that originate from the unprimed RdRP synthesis of dsRNA (see Fig. 3A) (Pak and Fire, 2007; Sijen et al., 2007). These siRNAs are 21 to 22 nts in length, are of antisense polarity to the targeted gene, and have 5′ di- or triphosphate termini.

So far, endogenous siRNAs have not been identified in mammals or insects. In cultured mammalian cells, siRNAs have been successfully used to analyze gene function (Elbashir et al., 2001b). The exposure of mammalian cells to long dsRNA induces an antiviral interferon response that leads to apoptosis (reviewed by Dorsett and Tuschl, 2004). This reaction can be bypassed by using siRNA duplexes that resemble in size and structure the miRNA processing intermediates. In this setting, siRNAs depend on the cellular miRNA machinery for their function and guide the cleavage of target RNA by binding to Ago2 (Liu, J. et al., 2004).

siRNA function and Ago/Piwi association

siRNAs associate with Ago family members to form siRNA RNP complexes (known as RISC) that guide target mRNA cleavage. Exogenous siRNAs trigger RNAi when provided as a dsRNA with a two nt 3′ overhang, similar to the miRNA or endogenous siRNA intermediates processed by Dicer. Endogenous siRNAs and RNAi are thought to play an important role in defending genomes against transgenes and transposons, as well as against foreign nucleic acids, such as viruses. The random integration of new DNA or the rearrangement of existing sequences, such as by transposons, might trigger the formation of dsRNA. dsRNA might also be generated as a consequence of viral replication or by the action of genome-encoded RdRPs. As discussed below, another RNAi-related mechanism involving piRNAs is also involved in genome defense, predominantly in the germline. Finally, plant and S. pombe hcRNAs play a role in heterochromatin regulation.

In plants, siRNAs are readily identified from virus- and viroid-infected cells, or from transgenic plants that show co-suppression (reviewed by Voinnet, 2005). They fall mainly into two size classes, 21 to 22 nt and 24 nt species. The shorter siRNAs (such as tasiRNAs, natsiRNAs, most viral-derived siRNAs) guide mRNA degradation, while the longer ones (such as hcRNAs) are involved in DNA and histone methylation (see Fig. 3B). Genetic studies suggest that tasiRNAs may form complexes with Ago7 that mediate the cleavage of target mRNAs that are different from the sequences from which the tasiRNAs originate, playing a crucial role in plant development (Adenot et al., 2006).

In C. elegans, the function of tncRNAs has not yet been elucidated. Secondary siRNAs appear to associate with the Class 3 Ago/Piwi proteins Sago-1 and Sago-2, and their function is to support the primary siRNA signal (Yigit et al., 2006).

Small RNAs derived from repetitive genomic sequence

Small RNA sequences identified in clone libraries that do not map to a single genomic region but to many, sometimes thousands of sites, are classified as being repeat derived. Depending on the species and their size distribution, these small RNAs can be classified as being a category of conventional siRNAs (hcRNAs), or as constituting their own class, defined by a distinct mechanism of maturation, and by the Ago/Piwi protein they associate with (rasiRNAs).


hcRNAs identified in Schizosaccharomyces pombe, plants and Trypanosoma brucei are siRNAs that derive from long dsRNA precursors that are transcribed from genomic repeat regions (sometimes also referred to as rasiRNAs). They were initially termed small heterochromatic siRNAs (shRNAs). However, the abbreviation `shRNA' can be misleading, as it was also introduced as an abbreviation for `small hairpin RNA', a precursor for stable expression of siRNAs used for gene silencing.

In the unicellular eukaryote T. brucei, hcRNAs are involved in transposon control, whereas in S. pombe and A. thaliana they are also involved in the regulation of heterochromatin structures, and thus mediate transcriptional gene regulation (Djikeng et al., 2001; Reinhart and Bartel, 2002). In S. pombe, hcRNAs derive from peri-centromeric and mating-type locus repeats, and have been identified in an Ago-containing effector complex (RITS) (see Fig. 3B) (reviewed by Grewal and Jia, 2007). The RITS complex, in addition to Ago, consists of Tas3 (targeting complex subunit 3), an S. pombe-specific protein, and Chp1 (chromodomain protein 1), a chromodomain containing protein. The RITS complex subsequently pairs with the nascent transcript repeat sequences, and recruits the RNA-directed RNA polymerase complex (RDRC) and Clr4 (cryptic loci regulator 4), a histone methyltransferase (see Fig. 3B). This complex has been implicated in nucleation and/or maintenance of heterochromatin by targeting transcripts that emerge from repeat-containing regions and that are supposed to be transcriptionally repressed, thereby establishing a feedback loop that reinforces and sustains the transcriptional silencing of heterochromatic regions. In A. thaliana, the Ago-siRNA complex associates with DRD1 (defective in RNA-directed DNA methylation 1), an SNF2-like chromatin remodeling protein, and with Polymerase IVb, to initiate cytosine methylation via DRM2 (domain rearranged methyltransferase), a DNA methyltransferase. S. pombe hcRNAs associate with Ago1 (Irvine et al., 2006; Buker et al., 2007), whereas A. thaliana hcRNAs interact with Ago4 and Ago6 (Zilberman et al., 2003; Zheng et al., 2007).


A subset of rasiRNAs was identified by cloning from D. melanogaster and D. rerio small RNA libraries (Aravin et al., 2003; Chen et al., 2005; Houwing et al., 2007). Given their association with the Piwi protein family, they are also known as piRNAs and are discussed in the following section.

piRNAs and rasiRNAs


piRNAs are 28 to 33 nts in length and have been characterized by the cloning of small RNAs from anti-Piwi immunoprecipitates prepared from mammalian testes (reviewed by O'Donnell and Boeke, 2007).

piRNA biogenesis

Mammalian piRNAs are not usually derived from repeat sequences, given that the proportion of repeat elements able to generate piRNAs is actually smaller within the piRNA regions than the frequency of repeat sequences in the mouse genome (12-20% versus 38%) (Betel et al., 2007). They are believed to be processed from single-stranded primary transcripts that are transcribed from defined genomic regions and have a preference for a uridine at their 5′ end (see Fig. 4A). Mammalian piRNAs are a highly complex mix of sequences, with tens of thousands of distinct piRNAs generated from the 50 to 100 defined primary transcripts (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006b; Lau et al., 2006; Watanabe et al., 2006). This may suggest that mammalian piRNAs, unlike miRNAs, are not processed in a precise manner. However, approximately 20% of all piRNA sequences were cloned three or more times, and many piRNA sequences from the same strand are partially overlapping, suggesting a quasi-random processing mechanism (Betel et al., 2007). The mechanism of biogenesis of D. melanogaster rasiRNAs is beginning to be elucidated, and may offer parallels for a specific mode of processing for piRNAs as well (see section on biogenesis of rasiRNAs below). piRNA biogenesis is thought to be Dicer independent (Vagin et al., 2006), and piRNAs appear to be 2′-O-methylated at their 3′ end (Horwich et al., 2007; Kirino and Mourelatos, 2007b; Kirino and Mourelatos, 2007a; Ohara et al., 2007; Saito, K. et al., 2007).

piRNA conservation and function

Between mammals, mature piRNAs are not conserved; however, the genomic regions from which they derive, in particular the promoter sequences, are conserved (Betel et al., 2007). Mammalian piRNAs are strongly expressed in the male germline, their total number per cell obtained from testis tissue reaching up to two million, i.e. about 10-fold higher than the miRNA content of these cells (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006a; Lau et al., 2006). Although the targets of piRNAs and their mechanism of action are unknown, the knockout in mice of any of the three testis-expressed Piwi proteins (Mili, Miwi, Miwi2) abolishes spermatogenesis (reviewed by O'Donnell and Boeke, 2007; Klattenhoff and Theurkauf, 2008). Mammalian piRNAs may also play a role in transposon regulation. The knockout of the gene that encodes the Piwi protein Miwi2 in mice leads to phenotypes that may be linked to an inappropriate activation of transposable elements (Carmell et al., 2007). Mice mutant for the Piwi protein Mili also show transposon de-repression, suggesting that mammalian piRNAs may contribute in some manner to the silencing of transposable elements (Aravin et al., 2007).

Fig. 4.

Biogenesis of piRNAs and rasiRNAs. (A) Biogenesis of piRNAs. In mammals, piRNAs are likely to be processed from single-stranded (ss)RNA precursors, and are further processed by as yet undefined endonucleases (green) before being methylated (me) by methyltransferases. They associate with members of the Piwi subfamily to participate in transposon control and/or other germline-specific functions. (B) Biogenesis of D. melanogaster rasiRNAs/piRNAs. In D. melanogaster, rasiRNAs/piRNAs are also likely to be derived from ssRNA precursors and are further processed by endonucleases (green), including specific members of the Piwi subfamily (as shown in C), before being methylated (me) by methyltransferases. They associate with the Piwi subfamily to participate in histone methylation. Su(var)205 (suppressor of variegation 205) is a heterochromatin-associated protein. (C) Ping-Pong model of D. melanogaster piRNA biogenesis. After the piRNA-directed cleavage of transposon mRNA (piRNA in red; transposon mRNA in black and blue) by the Piwi family member Aubergine (AUB), the resulting blue strand (sense to the transposon mRNA) associates with AGO3, to guide cleavage of the rasiRNA cluster transcript (black and red) that produces additional piRNAs.


Some rasiRNAs are thought to belong to the piRNA class because of their association with members of the Piwi family. In D. melanogaster, unlike miRNAs and siRNAs, which associate with Ago1 and Ago2, rasiRNAs associate with the Piwi family members Piwi, Aubergine and Ago3 (Saito et al., 2006; Brennecke et al., 2007; Gunawardane et al., 2007; Nishida et al., 2007). In D. rerio, rasiRNAs associate with the Piwi protein Ziwi (Houwing et al., 2007). The rasiRNAs of D. melanogaster and D. rerio are distinct from conventional 20 to 23 nt siRNAs or miRNAs, their length ranging between 23 and 28 nt. They have a bias for a uridine at their 5′ end, and are 2′-O-methylated at their 3′ end (Houwing et al., 2007; Saito, T. et al., 2007). Based on genome content, more D. melanogaster and D. rerio rasiRNAs than expected originate from retrotransposon sequences, and from certain repeat-rich genome regions (Betel et al., 2007). rasiRNAs play a crucial role in controlling the expression of homologous sequences dispersed throughout the genome (reviewed by O'Donnell and Boeke, 2007).
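
The two sequence features described above, a length of 23 to 28 nt and a bias for a 5′ uridine, are simple to summarize for a cloned small RNA library. The following Python sketch is illustrative only: the function name `summarize_reads` and the example reads are hypothetical and are not taken from any cited dataset, and the sketch assumes that reads are supplied as plain RNA or DNA strings (T is treated as U).

```python
from collections import Counter

def summarize_reads(reads, min_len=23, max_len=28):
    """Summarize length distribution and 5' uridine bias of small RNA reads.

    min_len/max_len default to the 23-28 nt rasiRNA range given in the text.
    """
    # Normalize to an upper-case RNA alphabet so DNA-style reads (T) also work.
    seqs = [r.strip().upper().replace("T", "U") for r in reads if r.strip()]
    if not seqs:
        raise ValueError("no reads supplied")
    lengths = Counter(len(s) for s in seqs)
    # Only reads inside the size window contribute to the 5' U fraction.
    in_range = [s for s in seqs if min_len <= len(s) <= max_len]
    five_prime_u = sum(1 for s in in_range if s[0] == "U")
    return {
        "n_reads": len(seqs),
        "length_counts": dict(lengths),
        "n_in_range": len(in_range),
        "frac_5p_U": five_prime_u / len(in_range) if in_range else 0.0,
    }

# Hypothetical reads, for illustration only.
example_reads = [
    "UGCAUCGGAUCCAAGUCGAUACGGA",    # 25 nt, 5' U
    "UAGGCUUAACGGAUCCGAUGCAAU",     # 24 nt, 5' U
    "AGCUAGCUAGGAUCGAUCGAUCGAUCG",  # 27 nt, 5' A
    "UGAGGUAGUAGGUUGUAUAGUU",       # 22 nt, outside the 23-28 nt window
]
summary = summarize_reads(example_reads)
```

Restricting the 5′ nucleotide count to reads inside the size window keeps shorter miRNA-sized or siRNA-sized reads from diluting the estimate; whether that is appropriate for a given library is a judgment call.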

rasiRNA biogenesis

Because of the repetitive nature of the genomic regions from which rasiRNAs derive, it is unclear if rasiRNAs derive from dsRNA precursors, but recent findings based on more extensive cloning and sequencing suggest that they have a distinct mechanism of biogenesis that probably involves single-stranded precursors (see Fig. 4B) (Saito et al., 2006; Vagin et al., 2006; Brennecke et al., 2007; Gunawardane et al., 2007). Most of the current information on rasiRNA biogenesis is based on studies in D. melanogaster. The maturation of rasiRNAs is independent of Dicer (Vagin et al., 2006). Processing of the rasiRNA 5′ end is believed to be performed by the Piwi proteins Piwi, Aubergine and Ago3. One model of rasiRNA 5′ processing is called the ping-pong model, in which antisense and sense rasiRNAs, associated with the Piwi proteins Piwi/Aubergine and Ago3, respectively, guide rasiRNA primary transcript cleavage and further participate in an amplification loop to produce additional rasiRNAs that target transposons (see Fig. 4C) (Brennecke et al., 2007; Gunawardane et al., 2007; Nishida et al., 2007). It remains unclear how the ping-pong mechanism is initiated. Recently, two proteins have also been implicated in the biogenesis of the rasiRNA 3′ end: the Phospholipase D nuclease Zucchini and the RNase HII-related protein Squash (Pane et al., 2007).
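
One way to look for sense/antisense pairs that are consistent with the ping-pong model is to test whether the 5′ ends of two rasiRNAs are complementary to each other. The Python sketch below assumes a 10-nt 5′ overlap; this value is not stated in this section and is treated here as an assumption drawn from the ping-pong studies cited above. The function names and the two example sequences are hypothetical.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]

def is_ping_pong_pair(antisense, sense, overlap=10):
    """True if the first `overlap` nt of the two RNAs are fully complementary.

    overlap=10 is an assumption of this sketch, not a value given in the text.
    """
    if len(antisense) < overlap or len(sense) < overlap:
        return False
    return reverse_complement(antisense[:overlap]) == sense[:overlap].upper()

# Hypothetical sequences, constructed so that their 5' ends pair.
antisense = "UGCAUCGGAUCCAAGUCGAUACGGA"
sense = "AUCCGAUGCAGGCUUAACGGAUCCG"
pair_found = is_ping_pong_pair(antisense, sense)
```

The check requires perfect pairing over the overlap; a real analysis would probably need to tolerate mismatches and would work on genome-mapped read positions rather than on raw sequences.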


21U-RNAs are a class of diverse, autonomously expressed small RNAs described in C. elegans that are about 10 times less abundant than miRNAs (Ruby et al., 2006). They are precisely 21 nt long, begin with a 5′ uridine monophosphate and are modified at the 3′ terminal ribose. Neither their biogenesis mechanism nor their function is currently well defined. There is no evidence that they are created from a dsRNA precursor, and they originate mostly from two broad, non-coding, but distinct, regions of chromosome IV. 21U-RNAs have a conserved upstream sequence element (also conserved in Caenorhabditis briggsae), which could either be a promoter or a processing signal.


scanRNAs (scnRNAs) have been identified in Tetrahymena thermophila and other protozoa. They are longer than miRNAs, 26 to 30 nt in size, and their biogenesis is Dicer dependent. They participate in chromatin modification, similarly to hcRNAs, leading to DNA elimination (the most extreme form of gene silencing) during differentiation processes following conjugation (Taverna et al., 2002; Liu, Y. et al., 2004; Mochizuki and Gorovsky, 2004). They are associated with the Piwi protein Twi1. Recently, a second, smaller-sized RNA population has been described in T. thermophila, indicating the existence of a second endogenous small RNA pathway in protozoans (Lee and Collins, 2006).

Molecules involved in the biogenesis of distinct classes of small RNAs

Numerous biochemical studies have been conducted to understand small RNA biogenesis. Each class of small RNAs has distinguishing features due to different precursor structures and different mechanisms of processing. Proteins in addition to Ago/Piwi proteins that are involved in small RNA biogenesis include endoribonucleases, dsRNA-binding-domain (dsRBD)-containing proteins, RNA helicases, RdRPs, and RNA methyltransferases (reviewed by Du and Zamore, 2005; Kim, 2005).

RNase III endoribonucleases

The key proteins required for the biogenesis of small RNAs from dsRNA precursors are RNase III endoribonucleases (reviewed by Conrad and Rauhut, 2002; Patel et al., 2006). RNase III was first discovered in Escherichia coli (Robertson et al., 1967), where it modulates the expression of phage, plasmid and cellular genes by its participation in rRNA maturation. Members of the RNase III family are present in all species (MacRae and Doudna, 2007), and, in addition to a possible role in rRNA maturation (Wu et al., 2000), they are required for small RNA biogenesis. Two different RNase III subfamilies have been identified in animals and plants, Dicer and Drosha (reviewed by Kim, 2005). The Dicer subfamily is characterized by having an additional N-terminal RNA helicase domain compared with the E. coli enzyme; the Drosha subfamily has a distinct N terminus of unknown function. In plants, the biogenesis of small RNAs is mediated by four different Dicer RNase enzymes; members of the Drosha subfamily are absent from plant genomes. Animals generally have one Dicer and one Drosha, except insects, which have two Dicers and one Drosha.

The involvement of Drosha and/or Dicer in miRNA and siRNA biogenesis is described in an earlier section and is illustrated in Fig. 2 and Fig. 3A, respectively. Different species have differently sized dsRNA processing products, presumably because of variation in Dicer protein sequence and structure. For example, whereas invertebrate and vertebrate miRNAs and siRNAs are between 20 and 23 nts, Giardia intestinalis siRNAs are approximately 25 nts long (MacRae et al., 2006), and T. brucei siRNAs are around 24 to 26 nts (Djikeng et al., 2001). Moreover, in plants, 21, 22 and 24 nt RNA species are generated by different Dicers (reviewed by Vazquez, 2006).
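
The size ranges quoted above can be collected into a small lookup table, which makes it easy to ask which of these sources a read of a given length would be compatible with. The Python sketch below is illustrative only: the dictionary name, its informal labels and the function `compatible_sources` are hypothetical, the "approximately 25 nts" value for G. intestinalis is encoded as exactly 25, and length alone is of course not sufficient to assign a small RNA to a class.

```python
# Approximate small RNA sizes (nt) as quoted in the text; keys are informal labels.
DICER_PRODUCT_SIZES = {
    "invertebrate/vertebrate miRNA or siRNA": range(20, 24),  # 20-23 nt
    "Giardia intestinalis siRNA": range(25, 26),              # ~25 nt, encoded as 25
    "Trypanosoma brucei siRNA": range(24, 27),                # 24-26 nt
    "plant small RNA": (21, 22, 24),                          # made by different Dicers
}

def compatible_sources(length):
    """List the labels whose quoted size range includes `length` (in nt)."""
    return [label for label, sizes in DICER_PRODUCT_SIZES.items() if length in sizes]

# Example query: a 22-nt read.
matches_22 = compatible_sources(22)
```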

In D. melanogaster, the Dicer Dcr-1 is predominantly responsible for miRNA biogenesis, whereas Dcr-2 is required for the dsRNA processing that produces the siRNAs that mediate RNAi (see Fig. 2 and Fig. 3A) (Lee et al., 2004). The recognition of dsRNA by Dcr-1 or Dcr-2 depends on the number of mismatches in the dsRNA precursor; perfectly paired duplexes are preferentially recognized by Dcr-2 (Forstemann et al., 2007; Tomari et al., 2007). Dcr-1-processed small RNAs are preferentially loaded onto Ago1, whereas Dcr-2-processed siRNAs are loaded onto Ago2 (Hammond et al., 2001; Lee et al., 2004; Okamura et al., 2004; Matranga et al., 2005; Miyoshi et al., 2005; Rand et al., 2005; Forstemann et al., 2007; Tomari et al., 2007).

RNA helicases and dsRNA-binding proteins

Other proteins involved in the processing of dsRNA precursors include RNA helicases and dsRBD-containing proteins. RNA helicases have been implicated in the assembly of some small RNA-processing intermediates into effector RNP complexes. They may also play a role in the biogenesis of piRNAs, which probably derive from ssRNA precursors (Klattenhoff et al., 2007). They include the human MOV10, RNA helicase A, and RCK/p54 (Meister et al., 2005; Chu and Rana, 2006; Robb and Rana, 2007), the D. melanogaster Armitage and spindle-E (Cook et al., 2004; Tomari et al., 2004; Lim and Kai, 2007), and the plant SDE-3 (silencing defective locus 3) (Dalmay et al., 2001). Specific dsRBD-containing proteins associate with distinct Dicers in certain species, including plants (reviewed by Vazquez, 2006). For example, in D. melanogaster, the dsRBD-containing proteins R2D2 [contains two dsRNA-binding domains (R2) and is associated with DCR-2 (D2)] and Loquacious/R3D1 partner with Dicer, whereas Pasha partners with Drosha. In humans, the dsRBD-containing proteins PACT (Protein activator of PKR) and/or TARBP2 [TAR (HIV1) RNA binding protein 2], partner with Dicer, whereas DGCR8 (DiGeorge syndrome critical region 8) partners with Drosha. These dsRBD-containing proteins may facilitate dsRNA substrate recognition, the loading of small RNAs onto RNP complexes and the stabilization of RNase III enzymes. The structure of the DGCR8 dsRBD-containing protein core has recently been solved, suggesting that the DGCR8 core recognizes pri-miRNAs in two possible orientations (Sohn et al., 2007).

RNA polymerases

RdRPs are involved in generating dsRNA from ssRNA templates (see examples in Fig. 3A). These templates are the targets of siRNAs that are generated from the trigger dsRNA (Schwarz et al., 2002; Pak and Fire, 2007; Sijen et al., 2007). In nematodes and plants, such regulatory loops are responsible for generating diffusible small RNA-silencing signals that can propagate gene silencing and spread viral resistance (reviewed by Wassenegger and Krczal, 2006). Direct biochemical evidence for dsRNA polymerase activity is sparse and has been reported only for an RdRP isolated from tomato leaves (Schiebel et al., 1993b; Schiebel et al., 1993a). Another RNA polymerase, RNA polymerase IV, is specific to plant genomes and is required for the production of most siRNAs that originate from discrete genomic loci (Zhang et al., 2007).

RNA methyltransferases

Another protein family involved in small RNA biogenesis is the 2′-O-methyltransferases, which, depending on the species and class of small RNA, modify the RNA 3′ end. 2′-O-methylation possibly protects small RNAs from 3′ exonucleases or modulates their affinity for binding to the PAZ domain of different Ago/Piwi proteins (Ma et al., 2004). In plants, all classes of small RNAs appear to be methylated by the RNA methyltransferase HEN1 (Ebhardt et al., 2005; Li et al., 2005; Yu et al., 2005; Yang et al., 2006). In D. melanogaster, the RNA methyltransferase Pimet/DmHen1 methylates small RNAs bound to Ago2 or to one of the Piwi proteins (Horwich et al., 2007; Saito et al., 2007); miRNAs, which predominantly associate with Ago1, are not methylated. In mammals, the RNA methyltransferase mHEN1 is a candidate for the methyltransferase activity that modifies piRNAs (Kirino and Mourelatos, 2007b; Kirino and Mourelatos, 2007a; Ohara et al., 2007). piRNAs are also 2′-O-methylated in D. rerio (Houwing et al., 2007).


Ago/Piwi proteins constitute a large family of proteins, and their numbers and the ways in which they function to regulate gene expression via small RNAs are surprisingly diverse. Given the additional complexity of cell type-specific differences in small RNA expression, complex networks of gene regulation that involve small RNAs are beginning to emerge, networks that orchestrate important regulatory cellular processes. These networks may be further regulated by the cell type-specific expression of RNA-binding proteins and RNA-processing enzymes. The study and the characterization of these networks require the development of new experimental methods and bioinformatic approaches. We believe the most productive approaches will be the generation of antibodies that are specific to Ago/Piwi protein members, and the subsequent sequencing of small RNA cDNA libraries obtained from immunoprecipitations, as well as the analysis of associated nucleic acid targets.

The mechanisms of biogenesis and function of some of the classes of small RNAs also remain to be elucidated. For example, the RNA-processing enzymes involved in the biogenesis of mammalian piRNAs have not yet been identified. Likewise, the role of the proteins implicated in rasiRNA biogenesis awaits further biochemical study.

Understanding the processes mediated by small RNAs and the networks they regulate, as well as their mechanisms of biogenesis, will allow for the design of improved gene silencing methods mediated by small RNAs. The therapeutic development of siRNAs for targeting disease genes is already ongoing, as are studies to inhibit specific miRNAs linked to disease processes. These approaches will also benefit from our improved knowledge of how the various classes of small RNAs are generated and function.


We thank D. Patel for providing the crystal structure of Aquifex aeolicus Ago (Fig. 1). We thank all of the members of the Tuschl laboratory for their comments and critical reading of the manuscript, in particular M. Ascano, M. Hafner, M. Landthaler and J. Pena. We would also like to thank J.-B. Ma and C. E. Rogler for helpful discussions. We apologize to colleagues whose work was not cited due to space limitations.


* These authors contributed equally to this work

