

Evaluation and application of modularly assembled zinc-finger nucleases in zebrafish
Cong Zhu, Tom Smith, Joseph McNulty, Amy L. Rayla, Abirami Lakshmanan, Arndt F. Siekmann, Matthew Buffardi, Xiangdong Meng, Jimann Shin, Arun Padmanabhan, Daniel Cifuentes, Antonio J. Giraldez, A. Thomas Look, Jonathan A. Epstein, Nathan D. Lawson, Scot A. Wolfe


Zinc-finger nucleases (ZFNs) allow targeted gene inactivation in a wide range of model organisms. However, construction of target-specific ZFNs is technically challenging. Here, we evaluate a straightforward modular assembly-based approach for ZFN construction and gene inactivation in zebrafish. From an archive of 27 different zinc-finger modules, we assembled more than 70 different zinc-finger cassettes and evaluated their specificity using a bacterial one-hybrid assay. In parallel, we constructed ZFNs from these cassettes and tested their ability to induce lesions in zebrafish embryos. We found that the majority of zinc-finger proteins assembled from these modules have favorable specificities and nearly one-third of modular ZFNs generated lesions at their targets in the zebrafish genome. To facilitate the application of ZFNs within the zebrafish community we constructed a public database of sites in the zebrafish genome that can be targeted using this archive. Importantly, we generated new germline mutations in eight different genes, confirming that this is a viable platform for heritable gene inactivation in vertebrates. Characterization of one of these mutants, gata2a, revealed an unexpected role for this transcription factor in vascular development. This work provides a resource to allow targeted germline gene inactivation in zebrafish and highlights the benefit of a definitive reverse genetic strategy to reveal gene function.


The attributes of the zebrafish have established it as a powerful model for the study of vertebrate development. The accessibility of its externally fertilized embryos allows numerous manipulations, and the transparency of the zebrafish embryo permits direct serial visualization of biological processes in vivo (Beis and Stainier, 2006). These characteristics have enabled forward genetic screens for mutants affecting a wide range of developmental processes (Patton and Zon, 2001). However, the size of the zebrafish genome, the relatively long generation time, and the expense of maintaining large populations preclude forward genetic screening to saturation. Consequently, mutants exist for only a small fraction of zebrafish genes. Although modified antisense oligonucleotides can achieve targeted gene knockdown during embryogenesis (Nasevicius and Ekker, 2000), this technology is problematic with regard to persistence, specificity and penetrance (Robu et al., 2007; Eisen and Smith, 2008). Thus, definitive reverse genetic strategies are required, especially for the study of later developmental processes, analysis of adult physiology, and for the establishment of disease models.

Zinc-finger nucleases (ZFNs) are chimeric fusions between a zinc-finger protein (ZFP) and the nuclease domain of FokI (Urnov et al., 2010). ZFNs have been employed to achieve heritable targeted gene disruption in the genomes of numerous plant and animal species, including zebrafish (Bibikova et al., 2002; Beumer et al., 2008; Doyon et al., 2008; Meng et al., 2008; Foley et al., 2009; Geurts et al., 2009; Cui et al., 2010; Mashimo et al., 2010; Meyer et al., 2010; Takasu et al., 2010). Gene disruption is achieved through imprecise repair of a ZFN-induced double-strand break within the coding sequence of a target gene (Urnov et al., 2010). ZFNs have also been utilized to achieve tailor-made genomic alterations in animal genomes through stimulation of homology-directed repair from an exogenous donor DNA (Beumer et al., 2008; Cui et al., 2010; Meyer et al., 2010). Although off-target lesions in the genome can result from ZFN treatment, their frequency is significantly lower than at the target site and their location can be predicted based on ZFN specificity (Meng et al., 2008; Perez et al., 2008; Gupta et al., 2011). Together, these characteristics make ZFNs an ideal tool to facilitate reverse genetic studies in zebrafish and other organisms.

Despite their utility, the widespread implementation of ZFNs is hindered by the difficulty in creating highly specific ZFPs. Two general approaches have been employed for this purpose: selection-based methods, which identify ZFPs with a desired specificity from randomized libraries, and assembly-based methods, which utilize archives of predefined zinc fingers to construct ZFPs with a desired specificity. Each approach is hampered by limitations. Selection-based approaches (Greisman and Pabo, 1997; Isalan et al., 1998; Maeder et al., 2008; Meng et al., 2008), although effective in generating specific ZFPs, can be time-consuming and technically challenging (Kim et al., 2010). Assembly-based approaches are more straightforward (Carroll et al., 2006; Kim et al., 2009) but are dependent on the quality of the available zinc-finger archive and the difficulty in predicting context-dependent effects between neighboring zinc fingers (Desjarlais and Berg, 1993; Greisman and Pabo, 1997; Wolfe et al., 1999). Although assembly-based, or modular, ZFNs have suffered from relatively low success rates (Ramirez et al., 2008), continued characterization of available archives and investigation of context dependence (Carroll et al., 2006; Kim et al., 2009; Sander et al., 2009), along with the generation of new archives with expanded specificity, will make this a more viable approach.

In this study we evaluated an assembly-based approach for creating ZFNs for targeted gene disruption in the zebrafish genome. We constructed an archive of ZFP modules recognizing 27 different triplet sequences and characterized the DNA-binding specificity for three-finger ZFP cassettes assembled from this archive. In parallel, we tested ZFNs constructed from these ZFPs for their ability to induce both somatic and germline lesions in zebrafish embryos. To facilitate the application of this technology in the zebrafish community, we developed a publicly accessible database that assists users in modular ZFN construction using our archive. Finally, we characterized the phenotypes in zebrafish embryos bearing a ZFN-induced deletion in the gata2a gene, revealing its role in vascular morphogenesis.


Fish lines

Zebrafish were handled according to established protocols (Westerfield, 1993) and in accordance with Institutional Animal Care and Use Committee (IACUC) guidelines of participating institutions (University of Massachusetts Medical School, University of Pennsylvania, Dana Farber Cancer Institute and Yale University). The Tg(kdrl:egfp)la116 line has been described elsewhere (Choi et al., 2007).

Construction of a modular zinc-finger archive

An archive of zinc-finger modules originating from multiple sources (see Table S1 in the supplementary material and Results) was constructed by subcloning ZFPs into either a pBluescript or pCS2 vector using standard methods. Clones were sequence verified and arrayed into a 96-well plate to generate an address for each module. Plasmids from the archive are available through

Modular assembly of ZFNs

To assemble three-finger ZFPs we first separately amplified modules for each of the three finger positions from plasmid templates by PCR. Distinct primer pairs were used to amplify modules for each position (fingers 1, 2 and 3) (see Table S2 in the supplementary material). PCR was performed using Advantage HF2 polymerase (Clontech) and 20 ng plasmid template. Finger 1 modules were amplified as follows: 94°C for 2 minutes, followed by 20-30 cycles of 94°C for 30 seconds, 60°C for 30 seconds and 68°C for 20 seconds, with a final extension of 68°C for 5 minutes. For finger 2 or 3 modules: 94°C for 2 minutes, then 20-30 cycles of 94°C for 30 seconds, 57°C for 30 seconds and 68°C for 20 seconds, with a final extension of 68°C for 5 minutes. All initial PCR products were gel purified. Three-finger cassettes were constructed by overlapping PCR using 30 ng of each finger module in a 50 μl reaction with Advantage HF2 polymerase. The first five cycles were performed without primers: 94°C for 2 minutes, followed by five cycles of 94°C for 30 seconds, 55°C for 30 seconds and 68°C for 1 minute. We added 2 μl of 10 μM F1 forward and F3 reverse primer and performed the following steps: 24 cycles of 94°C for 15 seconds, 60°C for 30 seconds, 68°C for 1 minute, with a final extension of 68°C for 5 minutes. Final PCR products were gel purified, digested with Acc65I and BamHI and cloned into the pCS2 nuclease backbone in frame with the EL/KK heterodimeric variants of FokI (Miller et al., 2007), such that the 5′ ZFP was fused to the EL nuclease and the 3′ ZFP was fused to the KK nuclease. In several cases, DD/RR nuclease variants (Miller et al., 2007; Szczepek et al., 2007) were used. All constructs were sequence verified.

ZFN injections into zebrafish embryos

Synthesis and injection of ZFN mRNAs, dose optimization (see Table S3 in the supplementary material) and establishment and identification of germline lesions were performed as previously described (Meng et al., 2008; Gupta et al., 2011).

Bacterial one-hybrid binding site selections

Sequence-verified ZFP cassettes were cloned into the 1352-UV2 expression vector using Acc65I and BamHI. Bacterial one-hybrid (B1H) binding site selections were typically carried out at 5 mM 3-amino-1,2,4-triazole (3-AT), 10 μM IPTG and in the absence of uracil as previously described (Noyes et al., 2008; Gupta et al., 2011). For some ZFPs, lower stringency was required to achieve sufficient enrichment of recognition sequences above background.

Illumina library preparation of B1H-selected binding sites

Preparation of selected binding sites for deep sequencing was performed as described (Gupta et al., 2011), except that amplicons were digested with either EcoRI or NotI prior to ligation of bar-coded adapters for the identification of sequences associated with each ZFP (see Table S2 in the supplementary material).

Evaluation of the DNA-binding specificity of each ZFP and individual zinc-finger modules

Recognition motifs within the population of unique 28 bp sequences recovered from each binding site selection were identified using MEME (Bailey and Elkan, 1994). Position weight matrices (PWMs) were generated using the Log-Odds method (Hertz and Stormo, 1999) from the aligned sequences within the most statistically significant motif, weighting each sequence based on the number of counts within the Illumina dataset using formula (1), where S is the PWM score over the 9 bp target site s, b is the identity of the base at position i, f_{b,i} is the normalized frequency of occurrence of base b at position i, and p_{b,i} is the probability of observing the same base at the same position in a background model, which is assumed to be an equal distribution. Each ZFP was evaluated by calculating the score for its target site from its derived PWM. Each individual finger module was evaluated by calculating the score for its corresponding 3 bp recognition element (assuming canonical recognition) from this PWM. When data were available for multiple identical modules at identical positions, the scores for these modules were averaged for an overall assessment of the quality of the finger module. Modules with (average) scores that were two standard deviations below the average score of all modules at that position (i.e. finger 1, finger 2 or finger 3) were considered 'questionable' and flagged.
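The scoring and flagging steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (they used customized Perl scripts), and several details are assumptions made for illustration only: the logarithm is taken in base 2, a pseudocount is added to avoid taking the logarithm of zero, the sample standard deviation is used for flagging, and all function names are hypothetical. Because the mapping between finger position and triplet position is not spelled out here, the triplet to be scored is passed as an explicit start index.

```python
import math
import statistics

BASES = "ACGT"

def weighted_frequencies(site_counts, pseudocount=1.0):
    """Per-position base frequencies from aligned sites, weighted by read count.

    site_counts maps each aligned site (e.g. 9 bp) to its number of reads.
    The pseudocount is an assumption (not from the source) that avoids log(0).
    """
    length = len(next(iter(site_counts)))
    freqs = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site, n in site_counts.items():
            counts[site[i]] += n
        total = sum(counts.values())
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs

def pwm_score(freqs, target, background=0.25, start=0, width=None):
    """Log-odds score (base 2 assumed) summed over target[start:start+width].

    With the defaults this scores the whole site; with width=3 it scores
    the triplet assigned to a single finger module.
    """
    stop = len(target) if width is None else start + width
    return sum(math.log2(freqs[i][target[i]] / background)
               for i in range(start, stop))

def flag_questionable(module_scores):
    """Names of modules scoring more than two sample SDs below the mean."""
    values = list(module_scores.values())
    cutoff = statistics.mean(values) - 2 * statistics.stdev(values)
    return [name for name, score in module_scores.items() if score < cutoff]
```

Under a uniform background of 0.25, a position at which every read carries the target base contributes 2 bits, so a perfectly specified 9 bp site would score 18 when no pseudocount is applied.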

Analysis of somatic lesion frequency

Somatic lesion frequency was determined by Illumina sequencing as previously described (run #1) (Meng et al., 2008; Gupta et al., 2011). A second round of sequencing was performed for a subset of ZFNs from an independent set of injections (run #2). To avoid cross-contamination with samples from the first analysis, a unique pair of PCR primers was designed for each target site that would not amplify the PCR products from the first trial. These primers encoded the Illumina P1 and P2 sequences within each primer pair (see Table S2 in the supplementary material). In this case, PCR products for each ZFN target site were gel purified and amplified using Illumina genomic DNA primers (1.1 and 2.1). The PCR products for each target were then pooled at equal molar ratio, sequenced at 7 pM on a HiSeq 2000 (Illumina), and analyzed in a manner identical to run #1.

Calculation of the occupancy probability for a ZFN pair at its target site

The probability of a ZFP occupying its 9 bp target site was calculated using statistical thermodynamic free energy derived from a modified PWM score based on its B1H-determined DNA-binding specificity (formula 2), where f_{b,i} and p_{b,i} are defined as in (1), R=0.001987 kcal/(mol·K) and T=301.5 K (28.5°C).

All 9mers within the zebrafish genome (Zv8) were extracted from both strands, and the free energy of binding of each ZFP to each 9mer was calculated using the scoring function in formula (2). These energies were then used to calculate the probability (P) of on-target binding by the ZFP within the zebrafish genome using formula (3), where ΔG_s is the free energy of binding to the 9mer matching the desired ZFP target site, ΔG_{s,i} is the free energy of binding to the ith 9mer, n_i is the number of occurrences of the ith 9mer in the genome, and l is the total number of 9mers. A threshold for ΔG_{s,i} was set such that ΔG_{s,i} − ΔG_s ≤ 4.0 to simulate the non-specific binding affinity of each ZFP. The occupancy probability of each ZFN pair was calculated as the product of the occupancy probabilities of the two ZFPs (3′ ZFP and 5′ ZFP).
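The occupancy calculation can be sketched in Python as follows. This is one plausible reading of the description rather than the authors' code, and the functional forms are assumptions: the free energy is taken as −RT times the sum of natural-log odds over the nine positions, the on-target probability is taken as a Boltzmann-weighted fraction over all genomic 9mers, and the 4.0 threshold is interpreted as a cap that clamps weaker sites to ΔG_s + 4.0. Function names are hypothetical, and frequencies of zero would need a pseudocount before taking logarithms.

```python
import math

R = 0.001987  # kcal/(mol*K), as stated in the text
T = 301.5     # K, as stated in the text

def binding_free_energy(freqs, site, background=0.25):
    """Assumed form: dG = -R*T * sum_i ln(f_{b,i} / p_{b,i}).

    freqs is a list of per-position dicts mapping base -> frequency.
    """
    log_odds = sum(math.log(freqs[i][b] / background)
                   for i, b in enumerate(site))
    return -R * T * log_odds

def occupancy(dg_target, genome_energies, cap=4.0):
    """Assumed form: P = exp(-dG_s/RT) / sum_i n_i * exp(-dG_{s,i}/RT).

    genome_energies is an iterable of (dG_i, n_i) pairs covering every
    distinct genomic 9mer, including the target itself. Sites weaker than
    dg_target + cap are clamped to that value (our reading of the
    non-specific binding threshold).
    """
    rt = R * T
    partition = 0.0
    for dg, n in genome_energies:
        dg = min(dg, dg_target + cap)
        partition += n * math.exp(-dg / rt)
    return math.exp(-dg_target / rt) / partition

def pair_occupancy(p_5p, p_3p):
    """Occupancy of a ZFN pair as the product of its two ZFP occupancies."""
    return p_5p * p_3p
```

In this form, a target that is the only 9mer in the sum has an occupancy of 1, and each additional site of equal affinity dilutes it proportionally, which matches the stated motivation that activity might depend on the number of alternative favorable sites in the genome.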

Sequence and statistical analyses

All sequence analyses, PWM score calculations, ZFP occupancy calculations and lesion frequency calculations were performed using customized Perl scripts. All statistical analyses were performed, and all plots and DNA-binding logos were generated, using R.

Analysis of gata2aum27 mutant embryos

For phenotypic analysis, embryos were obtained from individual incrosses of heterozygous carriers. To analyze vascular development, gata2aum27 was crossed into the Tg(kdrl:egfp)la116 background. Vascular morphology and circulatory function were observed by confocal microscopy, confocal microangiography and video microscopy as described elsewhere (Covassin et al., 2006); Quantum Dots were from Invitrogen. Whole-mount in situ hybridization was performed as previously described using antisense riboprobes against kdrl, hey2, efnb2a, vegfaa, flt4, tal1 and gata1 (Lawson et al., 2001; Hart et al., 2007). Following phenotypic analysis or in situ hybridization, DNA was isolated from selected individual embryos as described elsewhere (Roman et al., 2002). The presence of the 10 bp um27 deletion was determined by PCR amplification (see Table S2 in the supplementary material for primer sequences), followed by gel electrophoresis or analysis on a QiaXL system (Qiagen). A rescue construct was made by amplifying the gata2a coding sequence by PCR using primers containing the Gateway attB1 and attB2 sites (see Table S2 in the supplementary material), followed by BP cloning into pDONR221 (Invitrogen). The resulting plasmid, pME-gata2a, was used in an LR reaction with pCSDest (Villefranc et al., 2007) to generate pCSgata2a, which was linearized with NotI and used as a template to synthesize mRNA. Embryos derived from crosses between gata2aum27 heterozygous carriers were injected at the 1-cell stage with 200 pg gata2a mRNA, or a comparable amount of mcherry mRNA, and were scored for the presence or absence of trunk circulation at 48 to 55 hours post-fertilization (hpf). Following phenotypic scoring, embryos were genotyped as above.


Overall rationale and experimental approach

To assess ZFNs constructed via modular assembly, we developed the following approach (see Fig. S1 in the supplementary material). We generated a collection of plasmids encoding zinc-finger modules recognizing 27 different triplet sequences. In parallel, we constructed a database of sites within zebrafish protein-coding genes (Zv7) that could be targeted by ZFNs assembled from this archive. From this database, we chose target genes and generated the corresponding ZFP cassettes. We characterized the specificity of these ZFP cassettes and determined their function in ZFNs by assessing their ability to induce somatic and germline lesions in vivo. Finally, we created zebrafish bearing a truncation allele in gata2a and characterized the phenotypes associated with this mutation.

A zinc-finger archive for modular assembly

We compiled an archive of zinc fingers recognizing 27 different triplet sequences in the three finger positions of the Zif268 backbone that can be rapidly assembled by PCR into three-finger ZFPs (see Table S1 in the supplementary material). This archive comprises modules from previously defined finger archives (Segal et al., 1999; Liu et al., 2002; Carroll et al., 2006) or that have been designed based on previously described recognition principles (Isalan et al., 1998; Segal et al., 1999; Wolfe et al., 1999; Dreier et al., 2001; Dreier et al., 2005). We also generated 14 new modules by bacterial one-hybrid (B1H) selection (Meng et al., 2008). In some cases, distinct recognition helices were utilized at different finger positions for a common triplet for the B1H-generated fingers. In general, the archive was focused around GNN recognition elements (see Table S1 in the supplementary material) because of their reliable functional properties (Ramirez et al., 2008). Nine HNG modules with a preference for either G or T at the neighboring 3′ position due to an RSD motif at positions –1, 1 and 2 of the recognition helix were included as they should synergistically recognize a composite sequence (NNGGNN) with GNN modules. Finally, AGA and TGT modules were included to expand the set of triplets that could be specified.

Target gene selection

We identified zebrafish genes that could be targeted using ZFNs constructed from the modular archive. In this analysis, each ZFN monomer consists of a three-finger cassette recognizing a 9 bp sequence. Therefore, we searched all coding exons (including the 10 bp flanking each exon) for potential ZFN sites that constitute adjacent, appropriately oriented 9 bp recognition sites compatible with our archive and separated by a 5 or 6 bp gap. Nearly 75% of all annotated zebrafish protein-coding genes contained accessible ZFN sites. To assess the archive broadly, target sites were chosen in more than 30 genes (see Table S4 in the supplementary material) by four different laboratories, which constructed the corresponding ZFNs. The target genes are involved in multiple biological processes and are distributed across 19 different chromosomes in the zebrafish genome. We restricted target sites to those that should yield a loss-of-function allele (i.e. those in the 5′ half of the coding sequence or in a known functional domain) in the event of a ZFN-induced lesion. We gave preference to targets in which a restriction enzyme site overlapped the sequence gap between the ZFP binding sites to allow straightforward lesion identification in founder fish.
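The site search described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the database code: the triplet set below is a hypothetical stand-in for the archive (the actual 27 triplets are listed in Table S1), the two half-sites are assumed to lie on opposite strands with the left monomer reading the bottom strand, and the function names are our own.

```python
from itertools import product

# Hypothetical stand-in for the archive: all GNN triplets plus three others
# named in the text. The real archive is defined in Table S1.
EXAMPLE_TRIPLETS = {"G" + a + b for a, b in product("ACGT", repeat=2)}
EXAMPLE_TRIPLETS |= {"AGA", "TGT", "TTG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def is_targetable(ninemer, triplets):
    """True if all three triplets of a 9 bp site are present in the archive."""
    return all(ninemer[i:i + 3] in triplets for i in (0, 3, 6))

def find_zfn_sites(seq, triplets=EXAMPLE_TRIPLETS, gaps=(5, 6)):
    """Scan a sequence for paired 9 bp half-sites separated by a 5 or 6 bp gap.

    Returns tuples of (start, gap, left target, right target), where each
    target is written 5' to 3' on the strand its ZFP is assumed to read.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for gap in gaps:
            end = start + 9 + gap + 9
            if end > len(seq):
                continue
            left = revcomp(seq[start:start + 9])   # bottom-strand half-site
            right = seq[start + 9 + gap:end]       # top-strand half-site
            if is_targetable(left, triplets) and is_targetable(right, triplets):
                hits.append((start, gap, left, right))
    return hits
```

A full search would apply this scan to each coding exon extended by the 10 bp flanking it, and would then filter the hits by position in the coding sequence and by the presence of a restriction site in the gap.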

Binding specificity of modularly assembled ZFPs

To assess the recognition properties of three-finger ZFPs, we determined their DNA-binding specificities by interrogating each ZFP against a 28 bp randomized library using the B1H system (Noyes et al., 2008). The majority of selections (56 out of 76; see Table S5 in the supplementary material) were successful (i.e. yielded a significant increase in colonies over background) under standard stringency (5 mM or 10 mM 3-AT). Selections were successful for 19 of the remaining 20 ZFPs at lower stringency (≤2.5 mM 3-AT). For each ZFP a recognition motif (sequence logo) (Schneider and Stephens, 1990) and position weight matrix (PWM) were generated based on binding sites recovered following selection. In general, determined recognition motifs resembled the desired target motif, although there was variability in the overall quality of the PWM score of the target site across all ZFPs (see Fig. S2 in the supplementary material). For example, the 5p ZFP targeting gata2a properly specified all 9 bp and displayed among the highest PWM scores, suggesting excellent specificity for its target (Fig. 1A). Most ZFPs exhibited more modest specificity for their target, similar to the drosha (rnasen – Zebrafish Information Network) 3p ZFP (Fig. 1B), whereas several displayed weaker preference, such as the nf1b 5p (Fig. 1C). Overall, the distribution of PWM scores for the modular ZFP cassettes was similar to the engineered ZFPs successfully used to target the zebrafish kdrl locus (Fig. 1D) (Meng et al., 2008; Gupta et al., 2011), indicating that the ZFPs constructed using this archive generally display good recognition properties.

Fig. 1.

DNA-binding specificity of modular zinc-finger proteins (ZFPs). (A-C) Sequence logos and position weight matrix (PWM) scores determined for 5′ (5p) or 3′ (3p) target sites for zebrafish (A) gata2a, (B) drosha and (C) nf1b. (D) Box plot depicting meta-analysis of PWM scores for each target site across all ZFPs for which bacterial one-hybrid (B1H) selections were successful. PWM scores for the previously described kdrl ZFPs are shown for reference. Whiskers indicate the largest (smallest) datum still within 1.5 interquartile range (IQR) of the upper (lower) quartile, where outliers are indicated as open circles.

We further analyzed the B1H binding site selection data to assess the quality of individual modules, allowing us to identify those with poor or context-dependent specificity. Overall, most modules displayed a strong preference for their recognition triplet based on individually derived PWM scores (Fig. 2A), although modules at fingers 1 and 3 display lower median scores than those at finger 2. This is likely to be due to fraying effects at the edges of the protein-DNA complex that reduce the specificity of the determinants at these positions (Choo, 1998). Comparison of PWM values across all individual modules revealed that those targeting purine-rich triplets were among the most robust, regardless of their finger position. For example, the GAG module consistently displayed excellent specificity in multiple finger positions in most ZFPs (Fig. 2B), as did modules targeting GAT, GGG and GGT (see Tables S5 and S6 in the supplementary material). We also identified four modules within the archive that displayed poor specificity at a specific position or in a context-dependent manner. For example, the TTG module displays context-dependent alterations in specificity. We originally utilized the TTG module at finger 3 in ZFPs targeting the zebrafish kdrl locus, where it displays moderate preference (NtG) for its triplet (Meng et al., 2008; Gupta et al., 2011), similar to the efnb2a 5p ZFP in this study (Fig. 2C). However, when it is placed at the finger 2 position adjacent to a C-terminal module bearing an RSD motif at positions –1, 1 and 2, the specificity shifts to a strong preference for a GNG binding site [Fig. 2C, zgc:66439 (clec14a – Zebrafish Information Network), 3p ZFP]. This dramatic alteration in specificity is not observed when the neighboring C-terminal module lacks an RSD motif (Fig. 2C, kif1b, 5p ZFP). 
Overall, the majority of analyzed modules have favorable recognition properties in multiple contexts, although several might specify incorrect triplets in particular contexts.

Fig. 2.

Assessing the quality and behavior of individual ZFP modules. (A) Box plot depicting meta-analysis of PWM values for individual ZFP modules in each of the three positions in the Zif268 backbone. 5p (red diamonds) and 3p (blue circles) scores for fingers in the kdrl ZFPs are shown for reference. Whiskers indicate the largest (smallest) datum still within 1.5 interquartile range (IQR) of the upper (lower) quartile, where outliers are indicated as open circles. (B) Sequence logos for ZFP monomers recognizing GAG in each finger position. (C) Sequence logos for ZFPs containing TTG modules at the indicated finger position.

In vivo activity of modular ZFNs

We evaluated the ability of ZFNs constructed from our modular archive to induce lesions at a target site within the zebrafish genome. Twenty-nine pairs of ZFNs targeting 28 genes (see Table S7 in the supplementary material) were constructed from the 76 ZFPs described above (see Table S5 in the supplementary material) by fusing them to engineered heterodimeric FokI nuclease domains (Miller et al., 2007; Szczepek et al., 2007). We assessed lesion frequency by deep sequencing the target region in normal embryos at 24 hpf following injection of mRNAs encoding ZFNs at an optimal dose (see Table S3 in the supplementary material). Eight of the 29 ZFNs displayed lesion frequencies of 1% or more in at least one set of injections (see Table S7 in the supplementary material). The previously characterized kdrl ZFNs were injected as a positive control and generated an in vivo lesion frequency of ∼7%, similar to our previous observations using Illumina sequencing (Gupta et al., 2011).

Germline transmission of ZFN-induced mutant alleles

Deep sequencing is likely to underestimate the frequency of ZFN-induced lesions because of short read lengths, and assessment of somatic lesions in embryos may not reflect the frequency of germline transmission. Therefore, we determined the ability of 12 ZFNs targeting 11 genes to generate founder fish bearing germline lesions at the desired target site (Table 1). For most ZFNs that induced appreciable somatic lesion frequencies, we identified founder fish that transmitted mutant alleles through their germline and did so at a higher frequency than the somatic lesion rate (Table 1). Overall, there was a moderate correlation between the somatic lesion frequency and the founder rate (R²=0.71). Notably, for two different ZFNs [ago2 (eif2c2 – Zebrafish Information Network) and nf1a], we were able to obtain founders despite lesion rates below 1% in somatic cells. Conversely, embryos injected with ZFNs targeting braf failed to yield founders even with a somatic lesion frequency greater than 1% (Table 1, see Table S7 in the supplementary material).
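For readers wishing to compute a comparable statistic from their own injections, the following Python sketch calculates a coefficient of determination between paired somatic lesion frequencies and founder rates. It assumes that the reported value is the squared Pearson correlation of a simple linear fit, which the text does not specify, and the function name is hypothetical. No data from Table 1 are reproduced here.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy ** 2 / (sxx * syy)
```

The function would be called with one list of somatic lesion frequencies and one list of founder rates, in the same ZFN order.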

Table 1.

Somatic lesion frequency and germline transmission rate of ZFN-induced mutagenic alleles

Correlations between ZFP specificity and ZFN activity

We next investigated potential correlations between the recognition properties of the ZFPs and the activity of the ZFNs in vivo. This analysis could provide a basis for estimating the likelihood of success of ZFNs generated from this archive. Surprisingly, we did not observe a significant correlation between DNA-binding specificity of the ZFPs and the activity of the respective ZFNs in vivo (Fig. 3A). Specificity alone may not be predictive of activity because genomic sequence is not random and thus activity might depend on the number of favorable binding sites for each ZFP within the genome. Therefore, we estimated the fractional occupancy of each ZFP monomer at its target site based on its DNA-binding specificity and the number of alternative high-affinity binding sites within the zebrafish genome (see Table S7 in the supplementary material). Again, there was no significant difference between the predicted target site occupancy for the active and inactive groups (Fig. 3B).

Although DNA-binding specificity did not appear to definitively predict in vivo ZFN activity, we did observe a strong correlation (P=0.001) between the number of GNN subsites within the recognition sequences and active ZFNs (Fig. 3C), consistent with previous observations (Ramirez et al., 2008). One additional prominent element of our module archive is the inclusion of NNG fingers. There are, on average, more NNG fingers in active ZFNs than inactive ZFNs (Fig. 3D), but the difference is not significant (P=0.36). Combined, a significant difference (P=0.019) exists between the average total number of GNN and NNG fingers in the active versus inactive groups of ZFNs, but the majority of the predictive value of this combination is derived from the GNN fingers.

Fig. 3.

Correlating ZFP characteristics with in vivo zinc-finger nuclease (ZFN) activity. Box plots depicting (A) PWM score, (B) estimated target site occupancy, (C) number of GNN modules in a ZFN pair and (D) number of NNG modules in a ZFN pair in both active and inactive ZFNs, where red dots indicate the mean of each population. Active ZFNs are those in which somatic lesion frequencies were greater than 0.5% or that successfully generated founders for mutant alleles. Whiskers indicate the largest (smallest) datum still within 1.5 interquartile range (IQR) of the upper (lower) quartile, where outliers are indicated as open circles.

A zebrafish database for modular ZFNs

To facilitate ZFN construction by the zebrafish community, we designed a web-accessible database that can be used in a standard workflow (see Fig. S3 in the supplementary material), allowing researchers to easily construct ZFNs, generate founder zebrafish bearing targeted lesions, and identify associated phenotypes within 7-12 months. A user can identify ZFN target sites in a gene of interest by entering or uploading an ENSEMBL gene number or RefSeq ID. The resulting output provides a count of ZFN target sites, a gene description and abbreviation, and a link to the locus on the UCSC genome browser (Fig. 4A). The user can view sites within one particular gene or all targets using a sortable list (Fig. 4B) that aids identification of optimal sites based on position (exon rank), presence of a restriction enzyme site in the spacer sequence, or a revised efficacy score. The efficacy score has a maximum of 12 points, where one point is assigned for each guanine at the edge of each recognition triplet (i.e. GNN and NNG fingers), as active ZFNs are enriched for these contacts (see above). The database also notes single-nucleotide polymorphisms in the target site in the reference Tuebingen (Tu) and AB wild-type zebrafish strains that have been sequenced by the Sanger Center (D. Stemple, personal communication), although we recommend sequencing the target site in wild-type strains other than the hybrid Tu/AB (SAT) reference line.
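The efficacy score lends itself to a short worked example. The Python sketch below follows the description given here (one point per guanine at either edge of each of the six recognition triplets in a ZFN pair, for a maximum of 12). It is a sketch only: the function name is hypothetical, each 9 bp site is assumed to be written 5′ to 3′ on the strand recognized by its ZFP, and the score actually implemented in the database may differ in detail.

```python
def efficacy_score(site_5p, site_3p):
    """Count guanines at the 5' and 3' edges of each triplet in a ZFN pair.

    Each argument is the 9 bp target of one ZFP. A GNN triplet earns one
    point, an NNG triplet earns one point, and a GNG triplet earns two.
    """
    score = 0
    for site in (site_5p.upper(), site_3p.upper()):
        if len(site) != 9:
            raise ValueError("each ZFP target site must be 9 bp long")
        for i in (0, 3, 6):
            triplet = site[i:i + 3]
            score += int(triplet[0] == "G") + int(triplet[2] == "G")
    return score
```

For example, a pair in which every triplet is GAG scores 12, whereas a pair with the sites GATGATGAT and AGATGTTTG scores 4 (three GNN triplets plus the 3′ guanine of TTG).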

Fig. 4.

Screenshots from a modular ZFP/ZFN database. (A) Example output from a search for ZFN target sites. Circle indicates the link to the page shown in B. (B) List of targets in the fgf24 gene with associated information. Circled number indicates the link to individual target site and ZFN information shown in C. (C) Example output page with individual target site information.

Once a target site is chosen, the user can click on the ZFN entry (QueryID; see Fig. 4B) for details about each construct (Fig. 4C), such as the target site sequence, the amino acid sequence of the recognition helices for the assembled ZFNs, and clone IDs for each plasmid in the archive to facilitate overlapping PCR of the appropriate modules (see Table S1 in the supplementary material). Alternatively, the DNA sequence of the ZFP cassette is included for direct synthesis. In either case, the inclusion of the amino acid sequence for each entry (Fig. 4C) aids in sequence validation of ZFN-containing plasmids. ZFNs containing modules with poor or context-dependent specificity (see above) are flagged with an asterisk. For lesion detection, we also include restriction sites, when present, within the spacer region separating the ZFP binding sites, along with sequences of flanking primers for PCR amplification. The primer and restriction enzyme information also facilitates genotypic analysis of founders and mutant embryos. Data from these pages can be exported into Excel (Microsoft) for easy reference. A query page is available to identify ZFN sites in sequences not contained within our database.

A novel role for gata2a in vascular development

To demonstrate the application of our archive to interrogate gene function and to provide an example of the workflow shown in Fig. S3 in the supplementary material, we investigated defects associated with a ZFN-induced mutation in the gata2a gene. Gata2 is a zinc-finger transcription factor that is essential for definitive hematopoiesis and maintenance of stem cell progenitors in both embryos and adults (Tsai et al., 1994; Rodrigues et al., 2005). Gata2 has also been implicated in angiogenesis and is associated with early onset coronary artery disease in humans (Connelly et al., 2006; Mammoto et al., 2009), but its role in vascular development is unknown. As described above, we constructed ZFNs targeting the fourth exon of gata2a. The target sequence is upstream of two zinc fingers required for function of the mammalian Gata2 ortholog (Minegishi et al., 2003) and, therefore, truncation in this region would be expected to generate a null allele (Fig. 5A). Following injection of ZFNs targeting this site, we identified a founder fish bearing a 10 bp deletion allele, referred to as gata2aum27, which causes a frameshift to a premature stop codon (Fig. 5A,B).

To identify phenotypes caused by gata2a deficiency we observed embryos derived from an incross of gata2aum27 heterozygous carriers derived from the original founder. At 24 and 48 hpf, all embryos appeared morphologically normal, including normal development of neural tube, notochord and somites (see Fig. S4 in the supplementary material; data not shown). However, ∼25% of embryos failed to display trunk blood vessel circulation by 48 hpf (see Movie 1 in the supplementary material, Table 2) whereas the remaining siblings were normal (see Movie 2 in the supplementary material). Genotypic analysis demonstrated that embryos with defective trunk circulation were homozygous for the um27 deletion, whereas normal embryos were heterozygous or homozygous for the wild-type allele (Fig. 5C, Table 2). Importantly, injection of mRNA encoding wild-type Gata2a could rescue trunk circulation in gata2aum27 mutant embryos (16 out of 26 mutant embryos in two separate experiments) (see Movie 3 in the supplementary material, Fig. 5D,E), whereas injection with mcherry mRNA did not (0 out of 13 mutant embryos in two separate experiments) (see Movie 4 in the supplementary material, Fig. 5D,E). Closer inspection of gata2aum27 mutant embryos revealed the occurrence of pulsating blood cells trapped in the trunk blood vessels and abnormal circulatory connections, or shunts, between the dorsal aorta and posterior cardinal vein (see Movies 1 and 5 in the supplementary material). Despite these defects, mutant embryos displayed a beating heart, the presence of blood cells and circulation through cranial blood vessels (see Movie 6 in the supplementary material), similar to sibling embryos with trunk circulation (see Movie 7 in the supplementary material). Together, these observations demonstrate that gata2a deficiency leads to a specific defect in circulatory function in embryonic zebrafish.
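The observation that ∼25% of incross embryos lacked trunk circulation matches the Mendelian expectation for a recessive allele. A minimal sketch of that expectation, enumerating the gamete combinations from two heterozygous parents, is given below; the function name and allele labels are chosen for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def incross_genotype_fractions(parent=("+", "um27")):
    """Expected genotype fractions from crossing two identical heterozygotes.

    Each parent contributes one of its two alleles with equal probability,
    so the four gamete combinations are equally likely.
    """
    counts = Counter("/".join(sorted(pair)) for pair in product(parent, repeat=2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


for genotype, fraction in incross_genotype_fractions().items():
    print(genotype, fraction)
```

Under this model one quarter of embryos are expected to be homozygous for um27, with the remaining three quarters heterozygous or wild type.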

Fig. 5.

The ZFN-induced um27 lesion is a truncation allele of gata2a. (A) Location of frameshift caused by the um27 deletion in the zebrafish gata2a gene. ZF, zinc-finger domain. (B) gata2a coding and amino acid sequence in the region of the um27 deletion. ZFN recognition sequences are boxed. (C) Genotype of embryos derived from an incross of gata2aum27 heterozygous carriers. (D) Genotype of embryos with normal circulation derived from a gata2aum27 incross injected with 200 pg gata2a mRNA. Asterisks denote mutant embryos with normal trunk circulation. (E) Percentage of gata2aum27 mutant embryos displaying the indicated circulation phenotypes.

To further characterize the phenotype caused by loss of gata2a, we performed confocal microangiography on wild-type and gata2aum27 mutant siblings bearing the Tg(kdrl:egfp)la116 transgene at 50 hpf. Consistent with normal head circulation in gata2aum27 mutant embryos, we did not note any overt defects in vascular morphology of the cranial vessels when compared with wild-type siblings (Fig. 6A,B). We observed normal formation of arteries and veins within the head, which are fully perfused following angiography (Fig. 6A,B). By contrast, the lateral dorsal aortae in gata2aum27 mutant embryos are more dilated than in wild-type siblings, although these vessels are perfused, suggesting that lumenization is not affected (Fig. 6C,D). Within the trunk, we observed specific defects in vascular morphology. Whereas intersegmental vessels appeared largely normal in gata2aum27 mutant embryos, the dorsal aorta was discontinuous (Fig. 6E). Accordingly, gata2aum27 mutant trunk vessels were poorly perfused following angiography when compared with their wild-type siblings (Fig. 6E,F). These analyses suggest a specific defect in the morphogenesis of the dorsal aorta in gata2aum27 mutant embryos, which would be the likely cause of the arteriovenous shunts noted above.

Table 2.

Linkage of gata2aum27 to circulatory defects in mutant embryos

We have previously observed aorta morphogenesis defects and arteriovenous shunts in zebrafish embryos deficient for Notch or Vegf signaling (Lawson et al., 2001; Covassin et al., 2006). In these cases, these defects were associated with loss of arterial endothelial cell identity. However, gata2aum27 mutant embryos did not display loss of artery markers, such as efnb2a, and exhibited normal levels of vegfaa expression (data not shown). Similarly, the hey2 gene, which is also required for proper dorsal aorta morphogenesis (Zhong et al., 2000), was expressed at normal levels in gata2aum27 mutants (Fig. 7A,B). We also did not observe any significant changes in the expression of blood markers such as gata1 and tal1 (Fig. 7C,D; data not shown). However, gata2aum27 mutant embryos displayed a subtle, but consistent, downregulation of kdrl, the functional ortholog of Vegfr2, when compared with wild-type siblings (Fig. 7E,F). Taken together, our results demonstrate that gata2a function is essential for the proper morphogenesis of the dorsal aorta, but may be dispensable for arterial endothelial differentiation.


With the increasing characterization of the genomes of model and non-model organisms, there is a broad need for reverse genetic approaches to assess gene function in multiple species. ZFNs allow the direct introduction of targeted germline lesions in vivo, without the need for species-matched embryonic stem cell lines, thus providing a general means to determine gene function. However, the primary limitation hindering the widespread employment of this technology has been the lack of an affordable, simple method for constructing ZFNs that are active in vivo. Modularly assembled ZFNs provide a rapid and effective method for introducing targeted lesions into the genomes of animals. The specificity analysis of our assembled ZFPs indicates that the majority have similar recognition potential to the selected ZFPs incorporated into our kdrl ZFNs (Meng et al., 2008; Gupta et al., 2011). Consequently, a sizable fraction (29%) of ZFNs constructed from these modules displayed significant activity (≥1%) for the generation of somatic lesions, establishing the utility of this archive for targeted genome manipulation in complex genomes. This success rate compares favorably with that of other single-finger modular assembly archives for constructing ZFNs (Ramirez et al., 2008; Kim et al., 2009; Kim et al., 2011). Moreover, characterization of DNA-binding specificity provides additional information about the performance of individual modules within this archive and will facilitate its subsequent improvement.

Fig. 6.

gata2aum27 mutant embryos display defects in aorta morphogenesis. Confocal microangiography using Quantum Dots in Tg(kdrl:egfp)la116 transgenic zebrafish embryos. Endothelial cells are green and vessel perfusion is red. (A,B) Cranial blood vessels in gata2aum27 mutant (A) and wild-type (B) embryos. Lateral views, anterior to the left, dorsal is up. (C,D) Lateral dorsal aortae (arrows) in gata2aum27 mutant (C) and wild-type (D) embryos. Dorsal views, anterior is up. (E,F) Trunk blood vessels in gata2aum27 mutant (E) and wild-type (F) embryos. Lateral views, anterior to the left, dorsal is up. Segmental vessels are indicated by arrowheads and the dorsal aorta by an arrow; the asterisk indicates a region of aorta that failed to form.

Surprisingly, we did not find a strong correlation between ZFP specificity and in vivo ZFN activity. Previous studies have shown that improvements in DNA-binding specificity can lead to improved ZFN activity (Cornu et al., 2008) and precision (Gupta et al., 2011). It is likely that other factors, such as ZFP affinity, also significantly influence ZFN activity. Whereas the recognition motifs generated using the B1H system provide an estimate of ZFP specificity, our analysis utilized varying stringencies to achieve enrichment of binding sites, possibly bypassing variations that could reflect differences in affinity. That said, seven of eight ZFNs that contained at least one ZFP that displayed low activity in the B1H system (i.e. binding site selections required a stringency below 5 mM 3-AT) displayed low somatic lesion rates (<0.5%). This correlation is consistent with studies utilizing the bacterial two-hybrid system, in which highly active ZFPs often perform well in ZFNs (Ramirez et al., 2008; Lam et al., 2011). Thus, a more refined analysis of ZFP affinity might provide additional insights into the behavior of our archive. Additionally, ZFN activity could be affected by properties of the endogenous target sequence (e.g. chromatin architecture or DNA methylation status) that could interfere with nuclease recognition or function.

We did observe significantly higher rates of ZFN activity as a function of the number of GNN fingers, consistent with previous observations (Ramirez et al., 2008), and a slight increase in the average number of NNG fingers in the active versus inactive ZFN population. In both sets of modules, arginine-guanine interactions, which can contribute dramatically to binding affinity and specificity (Elrod-Erickson and Pabo, 1999), appear to be critical linchpins for the formation of active ZFN complexes. Consequently, this information is incorporated into a simple scoring function for evaluating our ZFNs that notes the number of these contacts. Consistent with the importance of arginine-guanine interactions, their preservation is a defining feature of active off-target sites for kdrl ZFNs in zebrafish (Gupta et al., 2011). Clearly, the presence of a large number of arginine-guanine interactions is not required for the construction of active ZFNs (Hockemeyer et al., 2009), but these particular examples employed larger numbers of fingers, which might compensate for the absence of these favorable interactions.

Fig. 7.

gata2aum27 mutant embryos show reduced kdrl expression but normal artery and blood differentiation. (A-F) Whole-mount in situ hybridization at 24 hpf using antisense riboprobes against hey2 (A,B), tal1 (C,D) and kdrl (E,F) in genotypically wild-type zebrafish embryos (A,C,E) and those homozygous for the um27 deletion (B,D,F). Lateral views, anterior to the left, dorsal is up.

A limitation of the modular assembly approach highlighted by our analysis is position or context-dependent specificity of individual modules. In other studies, this problem has been partially mitigated by the creation of extensive archives of selected two-finger modules that allow the assembly of multiple ZFPs against a single target site. For example, Sangamo BioSciences employs a proprietary archive of two-finger modules with origins in context-dependent selection for ZFN assembly (Isalan and Choo, 2001). Recently, the Zinc Finger Consortium reported the selection and construction of an archive of two-finger modules, in which three-finger proteins are created from the assembly of overlapping two-finger modules (CoDA) (Sander et al., 2011). By reducing context-dependent effects both methods are anticipated to improve the success rate for the assembly of functional ZFPs over those assembled from single-finger archives. For the Zinc Finger Consortium-based CoDA ZFPs this has been directly demonstrated on a large scale, in which ∼50% of ZFNs were active in vivo (Sander et al., 2011), although only ZFPs that performed well in a bacterial two-hybrid assay were used for in vivo testing in this case. Taking this prefiltering into account, the in vivo efficacy of ZFNs from our archive compares favorably to the CoDA approach. Furthermore, our archive is capable of targeting genomic sequences that are not accessible using CoDA because of differences in the recognition sequences of modules between these archives. Finally, the detailed characterization of ZFPs constructed from our archive allowed us to identify context-dependent or otherwise poorly performing modules, facilitating future improvements to our collection. Taken together, we believe that our archive and the associated computational resources will provide a viable and accessible platform for the generation of gene knockouts in zebrafish, as well as in other organisms.

A number of studies have now described ZFNs that are capable of producing somatic lesions in zebrafish embryos. However, only a handful of new mutants have been generated and there is a paucity of data concerning germline transmission of ZFN-induced alleles. To date, the modular archive described here has been used to generate eight new zebrafish mutants. Significantly, several of these new lines have been used to reveal novel aspects of microRNA biogenesis and vascular development (Siekmann et al., 2009; Cifuentes et al., 2010), whereas others, such as nf1a and nf1b, will serve as important disease models (Padmanabhan et al., 2009; Lee et al., 2010). Equally importantly, our archive has been used to independently generate these mutant lines in four separate laboratories, suggesting that this approach can be easily implemented within the research community at large to create zebrafish lines bearing mutations in genes of interest.

Among the new mutants that we have generated is a truncation in the gata2a gene. We observed that gata2a-deficient embryos displayed defects in the formation of the dorsal aorta, yet the remainder of the vascular system is remarkably unaffected. This phenotype is reminiscent of embryos lacking Vegf or Notch signaling components and is often attributed to the loss of properly specified artery and vein endothelial identity (Lawson et al., 2001; Covassin et al., 2006). However, we did not observe obvious loss of artery marker gene expression. These results are somewhat surprising and suggest that vascular morphogenesis and endothelial differentiation might be uncoupled at some point downstream of Vegf. Alternatively, the phenotype of gata2aum27 mutant embryos might reflect a defect in endothelial mechanosensation that affects morphogenesis independently of differentiation. Indeed, we do not observe any major changes in vascular morphology until the onset of circulation, although kdrl expression is slightly downregulated at 24 hpf, prior to blood flow. In addition, GATA2 has recently been implicated in the induction of VEGFR2 in human endothelial cells in response to increased extracellular matrix stiffening (Mammoto et al., 2009), suggesting a role for this transcription factor in mediating endothelial mechanosensation. In any case, our observations reveal an unexpected and previously undescribed role for gata2a in artery morphogenesis. Importantly, the gata2aum27 mutant generated using modular ZFNs now provides an important tool to further characterize the observed vascular phenotype and to identify relevant target genes that are required for vascular morphogenesis or function.


We thank Craig Ceol, Stefania Nicoli and Fatma Kok for critical reading of the manuscript, John Polli for excellent fish care, and Derek Stemple for providing the Tu and AB genome sequences.


  • Funding

    This work was funded by grants from the NIH National Heart, Lung, and Blood Institute to S.A.W. and N.D.L. (R01 HL093766) and to N.D.L. (R01 HL079266) and from the Department of Defense to A.T.L. and J.A.E. (DOD, NF050175), and from the National Institute of General Medical Sciences to A.J.G. (R01 GM081602). Deposited in PMC for release after 12 months.

  • Competing interests statement

    The authors declare no competing financial interests.

  • Supplementary material

    Supplementary material for this article is available at

  • Accepted August 3, 2011.

