Control of Hox transcription factor concentration and cell-to-cell variability by an auto-regulatory switch.

The variability in transcription factor concentration among cells is an important developmental determinant, yet how variability is controlled remains poorly understood. Studies of variability have focused predominantly on monitoring mRNA production noise. Little information exists about transcription factor protein variability, as this requires the use of quantitative methods with single-molecule sensitivity. Using Fluorescence Correlation Spectroscopy (FCS), we have characterized the concentration and variability of 14 endogenously tagged TFs in live Drosophila imaginal discs. For the Hox TF Antennapedia, we investigated whether protein variability results from random stochastic events or is developmentally regulated. We found that Antennapedia transitioned from low concentration/high variability early, to high concentration/low variability later, in development. FCS and temporally resolved genetic studies uncovered that Antennapedia itself is necessary and sufficient to drive a developmental regulatory switch from auto-activation to auto-repression, thereby reducing variability. This switch is controlled by progressive changes in relative concentrations of preferentially activating and repressing Antennapedia isoforms, which bind chromatin with different affinities. Mathematical modeling demonstrated that the experimentally supported auto-regulatory circuit can explain the increase of Antennapedia concentration and suppression of variability over time.


Figure S1. Measurement of average concentrations and nucleus-to-nucleus variability of 14 endogenously-tagged TFs in Drosophila imaginal discs by FCS.
(A-P) Fluorescence imaging of TFs, showing their expression pattern in imaginal discs and the salivary gland. White arrows indicate regions where FCS measurements of endogenous intra-nuclear concentration were performed; the average concentration is given for each TF. Images have been contrast-adjusted for visualization purposes. For the Antp and Grn TFs, both leg and wing imaginal discs were used for measurements. Average concentrations of TFs measured in different cells span a range of two orders of magnitude, from a few tens to a thousand nanomolar. Scale bars denote 100 μm, unless otherwise indicated. (Q) Characterization of nucleus-to-nucleus variability among neighboring cells within the same expression domain in imaginal discs for the 14 TFs studied by FCS. Black bars show concentration averages (with error bars representing 1 standard deviation), whereas grey bars show the variability, i.e. the squared coefficient of variation (expressed as the variance over the squared mean).
(A) Schematic representation of the Antp-eGFP fusion protein produced by the conversion of the MiMIC MI02272 construct to an artificial exon. The eGFP-encoding artificial exon is situated in intron 6 and is spliced in between exons 6 and 7, which correspond to the long and non-conserved N-terminal coding sequence of the protein. This region has little (if any) function in vivo (Papadopoulos et al., 2011), and the insertion does not disrupt the homeodomain or the YPWM motif. All features have been drawn to scale. (B) Heterozygous flies (embryos and third instar larvae), examined for their Antp-eGFP pattern (detected by an antibody to GFP, green), as compared to the total amount of Antp (expressed by the sum of the MiMIC Antp-eGFP and the wild type Antp loci), detected by an Antp antibody (magenta). Comparisons of the Antp pattern in wild type embryos and all thoracic imaginal discs are provided case-wise in the right panel.
In discs, dashed lines approximately separate the anterior (indicated by "A") from the posterior (indicated by "P") domain of the disc. Note the high expression of Antp in the humeral disc. In the leg discs, Antp is expressed most strongly in the posterior compartment of the prothoracic leg disc, the anterior compartment of the mesothoracic leg disc and in an abundant pattern in the metathoracic leg disc. Cyan arrows point to Antp positive cells in the second and third leg discs that are centrally located, as previously shown (Engstrom et al., 1992). All images represent Z-projections. Scale bars denote 100 μm. (C) Quantification of average concentrations and cell-to-cell variability in protein concentration among neighboring nuclei in wing and leg, second and third instar, discs. Black bars denote the average concentration and grey bars denote the variability, expressed as the variance over the squared mean. Note the increase in average concentration from second to third instar (eleven-fold increase in the leg disc) and the concurrent drop in variability to almost half of its value. Statistical significance was determined using Student's two-tailed t-test [***P<0.001, namely P (3rd - 2nd instar leg) = $4.4 \times 10^{-18}$ and P (3rd - 2nd instar wing) = $3.2 \times 10^{-8}$].
Development: doi:10.1242/dev.168179: Supplementary information
[...] constants for the long and short linker isoforms was calculated to be $K_{d,\mathrm{long}} / K_{d,\mathrm{short}} > 2.3$ (for the calculation refer to Supplement 3). The two dissociation constants differ by a factor of at least 2.3, indicating stronger binding of the short linker isoform to the DNA, as compared to the long linker one.

Figure S11. In vitro binding study of Antp full-length long and short linker isoforms to Antp and homeodomain binding sites by gel-shift assays (Electrophoretic Mobility Shift Assays, EMSAs).
Full-length Antp short and long linker variants (transcript variants RM and RN), encoding activating and repressing Antp isoforms, respectively, were cloned into the pET21b(+) vector (Novagen), which features a C-terminal 6xHis tag, and expressed in Rosetta™ 2 cells (Novagen), following the manufacturer's standard protocol. The two proteins were then purified on a Ni column and subjected to gel filtration. The concentrations of the purified proteins were compared by Western blotting, using the anti-Antp 4C3 antibody (Developmental Studies Hybridoma Bank, University of Iowa), and equal starting concentrations were used in the indicated serial dilutions (A-E) in gel-shift experiments. The BS1 and BS2 binding sites have been identified ~2 kb upstream of the engrailed gene promoter and characterized for Antp binding previously (Affolter et al., 1990). The HB1 binding site, found in the intron of the mouse Hoxa-4 gene, has been described previously (Keegan et al., 1997). The D4 probe has been characterized previously (Duncan et al., 2010) as a functional element in the spineless gene. The fkh250con binding site has been described previously (Ryoo and Mann, 1999). EMSAs were performed as previously described (Bhatia et al., 2013). Double-stranded DNA fragments were purchased from Integrated DNA Technologies and were 5' end-labelled with 6-FAM. Images were obtained using a Fujifilm FLA-5100 Fluorescent Image Analyser. (A-E) Gel-shift experiments using purified full-length Antp protein, featuring a long or a short linker, with 100 μM fluorescently labelled probe show stronger binding of the short linker isoform to all investigated binding sites, as compared to its long linker counterpart.

Background on Fluorescence Microscopy Imaging and FCS
Two individually modified instruments (Zeiss, LSM 510 and 780, ConfoCor 3) with fully integrated FCS/CLSM optical pathways were used for imaging. The detection efficiency of confocal laser scanning microscopy (CLSM) imaging was significantly improved by the introduction of avalanche photodiode (APD) detectors. Compared to photomultiplier tubes (PMTs), which are normally used as detectors in conventional CLSM, APDs are characterized by higher quantum yield and collection efficiency (about 70 % in APDs as compared to 15-25 % in PMTs), higher gain, negligible dark current and better efficiency in the red part of the spectrum. Enhanced fluorescence detection efficiency enabled image collection using fast scanning (1-5 μs/pixel). This further enhances the signal-to-noise ratio by avoiding fluorescence loss due to triplet state formation, enabling fluorescence imaging with single-molecule sensitivity. In addition, low laser intensities (150-750 μW) could be applied for imaging, significantly reducing phototoxicity (Vukojevic et al., 2008).
FCS measurements are performed by recording fluorescence intensity fluctuations in a very small, approximately ellipsoidal observation volume element (OVE) (about 0.2 μm wide and 1 μm long) that is generated in imaginal disc cells by focusing the laser light through the microscope objective and by collecting the fluorescence light through the same objective using a pinhole in front of the detector to block out-of-focus light. The fluorescence intensity fluctuations, caused by fluorescently labeled molecules passing through the OVE are analyzed using temporal autocorrelation analysis.
In temporal autocorrelation analysis, we first derive the autocorrelation function $G(\tau)$:
$$G(\tau) = 1 + \frac{\langle \delta I(t)\,\delta I(t+\tau)\rangle}{\langle I(t)\rangle^{2}} \quad \text{(S1)},$$
where $\delta I(t) = I(t) - \langle I(t)\rangle$ is the deviation of the fluorescence intensity from its mean and $\tau$ is the lag time. To derive information about molecular numbers and their corresponding diffusion times, the experimentally obtained autocorrelation curves are compared to autocorrelation functions derived for different model systems. The model describing free three-dimensional (3D) diffusion of two components and triplet formation was identified as the simplest and best suited for fitting the experimentally derived autocorrelation curves, and was used throughout:
$$G(\tau) = 1 + \frac{1}{N}\left(\frac{1-y}{\left(1+\frac{\tau}{\tau_{D1}}\right)\sqrt{1+\frac{w_{xy}^{2}}{w_{z}^{2}}\frac{\tau}{\tau_{D1}}}} + \frac{y}{\left(1+\frac{\tau}{\tau_{D2}}\right)\sqrt{1+\frac{w_{xy}^{2}}{w_{z}^{2}}\frac{\tau}{\tau_{D2}}}}\right)\left[1+\frac{T}{1-T}\exp\left(-\frac{\tau}{\tau_{T}}\right)\right] \quad \text{(S2)}.$$
In the above equation, $N$ is the average number of molecules in the OVE; $y$ is the fraction of the slowly moving Antp-eGFP molecules; $\tau_{D1}$ is the diffusion time of the free Antp-eGFP molecules; $\tau_{D2}$ is the diffusion time of Antp-eGFP molecules undergoing nonspecific interactions with the DNA; $w_{xy}$ and $w_{z}$ are radial and axial parameters, respectively, related to spatial properties of the OVE; $T$ is the average equilibrium fraction of molecules in the triplet state; and $\tau_{T}$ is the triplet correlation time (Muller et al., 2008). The diffusion time, $\tau_D$, measured by FCS, is related to the translational diffusion coefficient $D$ by: $\tau_D = w_{xy}^{2} / (4D)$ (S3). To establish that Antp molecules diffusing through the OVE are the underlying cause of the recorded fluorescence intensity fluctuations, we plotted the characteristic decay times $\tau_{D1}$ and $\tau_{D2}$, obtained by FCS, as a function of the total concentration of Antp molecules (Supplemental Fig. S2). We observed that both characteristic decay times remain stable for increasing total concentration of Antp molecules, signifying that the underlying process triggering the fluorescence intensity fluctuations is diffusion of fluorescent Antp molecules through the OVE (which is independent of the total concentration of Antp molecules).
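As an illustration of the fitting model named above, the following is a minimal Python sketch of the standard autocorrelation function for free 3D diffusion of two components with a triplet term, together with the relation between diffusion time and diffusion coefficient (S3). The function names, the argument names and the use of the structure parameter `s = w_z / w_xy` are assumptions made for this sketch; no parameter values from the study are implied.

```python
import numpy as np


def g_two_component(tau, n, y, tau_d1, tau_d2, s, trip_frac, tau_t):
    """Autocorrelation for two freely diffusing components plus triplet term.

    tau       : lag time(s), same time unit as tau_d1, tau_d2 and tau_t
    n         : average number of molecules in the observation volume
    y         : fraction of the slow component
    s         : structure parameter w_z / w_xy (assumed name)
    trip_frac : equilibrium fraction of molecules in the triplet state
    """
    tau = np.asarray(tau, dtype=float)

    def diffusion_term(tau_d):
        # 3D diffusion through an ellipsoidal Gaussian observation volume
        return 1.0 / ((1.0 + tau / tau_d) * np.sqrt(1.0 + tau / (s**2 * tau_d)))

    diffusion = (1.0 - y) * diffusion_term(tau_d1) + y * diffusion_term(tau_d2)
    triplet = 1.0 + trip_frac / (1.0 - trip_frac) * np.exp(-tau / tau_t)
    return 1.0 + diffusion * triplet / n


def diffusion_coefficient(w_xy, tau_d):
    """Translational diffusion coefficient from tau_D = w_xy^2 / (4 D)."""
    return w_xy**2 / (4.0 * tau_d)
```

In practice such a function would be passed to a nonlinear least-squares routine to estimate `n`, `y`, `tau_d1` and `tau_d2` from a measured curve; the amplitude at zero lag time, in the absence of a triplet contribution, is `1 + 1/n`.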
In order to ascertain that the interpretation and fitting of FCS curves is correct, we have: (1) tested several laser intensities in our FCS measurements and have utilized the highest laser intensity, for which the highest counts per second and molecule (CPSM) were obtained, while photobleaching was not observed; (2) we have established that CPSM do not change among FCS measurements performed in cells expressing Antp endogenously, or overexpressed with different Gal4 drivers. Moreover, we have previously shown that both characteristic decay times increase when the size of the OVE is increased (Fig. 4 in (Vukojevic et al., 2010)). Together, these lines of evidence indicate that both short and long characteristic decay times are generated by molecular diffusion rather than by photophysical and/or chemical processes such as eGFP protonation/deprotonation; (3) we have ascertained that the long characteristic decay time of our FCS measurements is not the result of photobleaching and that differences in the relative amplitudes of the fast and slow diffusing components reflect differences in their concentrations among cells.
While we have taken all possible precautions to ascertain that the correct model for FCS data fitting is applied, some inevitable limitations remain. For example, FCS cannot account for Antp molecules with irreversibly photobleached fluorophores or with fluorophores residing, for various reasons, in dark states. In addition, FCS cannot account for Antp molecules associated with large immobile structures, such as specifically bound Antp molecules. These molecules contribute to the overall background signal, but they do not give rise to fluorescence intensity fluctuations. As a consequence, transcription factor concentration can be somewhat underestimated by FCS. Conversely, the number of transcription factor molecules may be overestimated by FCS when a high background signal relative to the fluorescence intensity leads to an artificially low amplitude of the FCS curves. To avoid artifacts due to photobleaching, the incident laser intensity was kept as low as possible but sufficiently high to allow a high signal-to-noise ratio. This is because photobleaching of fluorophores may induce errors in the measurements of molecular numbers and lateral diffusion, yielding both a smaller number of molecules and shorter values of $\tau_D$, and hence apparently larger diffusion coefficients. Finally, the contribution of brightness, i.e. brightness squared, to the correlation function was not analyzed, which may in turn affect quantification of Antp numbers.

Calculation of the concentration of endogenous TFs and average number of molecules in imaginal disc cell nuclei from FCS measurements (exemplified for Antp)
Experimentally derived FCS curves were analyzed by fitting, using the model function for free three-dimensional diffusion of two components with triplet formation, equation (S2), to derive the average number of molecules in the OVE ($N$); the diffusion time of the free Antp-eGFP molecules ($\tau_{D1}$); the diffusion time of Antp-eGFP molecules undergoing interactions with the DNA ($\tau_{D2}$); and the relative fraction of Antp-eGFP molecules that are engaged in interactions with chromatin and therefore move slowly ($y$).
In order to translate the average number of molecules in the OVE ($N$) into molar concentration, the size of the OVE, i.e. the axial and radial parameters ($w_z$ and $w_{xy}$, respectively), was determined in calibration experiments with Alexa488 or Rhodamine 6G dyes, using equation (S3). The volume of the OVE, approximated by a prolate ellipsoid, was determined as follows:
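The conversion from molecule number to molar concentration can be sketched in a few lines of Python. The volume prefactor is an assumption of this sketch: the default `math.pi ** 1.5` is the convention commonly used for a 3D Gaussian observation volume, whereas a geometric ellipsoid with semi-axes `w_xy`, `w_xy` and `w_z` would use `4 * math.pi / 3`. The function names and the example dimensions are hypothetical and are not the calibration values of this study.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def ove_volume_litres(w_xy_um, w_z_um, shape_factor=math.pi ** 1.5):
    """Observation volume in litres from radial and axial parameters in micrometres.

    shape_factor is an assumption: pi**1.5 for the effective Gaussian volume,
    4*pi/3 for a geometric prolate ellipsoid.
    """
    volume_um3 = shape_factor * w_xy_um**2 * w_z_um
    return volume_um3 * 1e-15  # 1 cubic micrometre = 1e-15 litres


def concentration_nM(n_molecules, w_xy_um, w_z_um, shape_factor=math.pi ** 1.5):
    """Molar concentration (nM) for an average of n_molecules in the volume."""
    volume_l = ove_volume_litres(w_xy_um, w_z_um, shape_factor)
    return n_molecules / (AVOGADRO * volume_l) * 1e9


# Hypothetical example: one molecule in a volume with w_xy = 0.2 um, w_z = 1.0 um
example_nM = concentration_nM(1.0, 0.2, 1.0)
```

Because the concentration scales linearly with `n_molecules`, the per-molecule value computed once from the calibrated volume can be reused for every FCS measurement taken with the same optical settings.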

Calculation of the ratio of apparent Antp dissociation constant for short and long linker Antp isoforms from FCS measurements on ectopically expressed Antp
Antp undergoes both specific and non-specific interactions with DNA, with nonspecific interactions preceding the specific ones and effectively assisting the binding to a specific target site by facilitated diffusion (Halford and Marko, 2004). The search for specific binding sites can be described as a two-step process of consecutive reactions (Vukojevic et al., 2010): (S10).
The turnover rate for the non-specific complex is: [...]. Assuming a quasi-steady state approximation: [...] (S13). Using the mass balance equation to express the concentration of the free TF: [...] (S14), and assuming that $[\mathrm{DNA}_{ns}] \approx [\mathrm{DNA}_{ns}]_0$ (S15), equation (S13) becomes: [...]. According to equation (S19) and the FCS data presented in Supplemental Fig. S10: [...]. Thus, the concentration of the specific complex between Antp-eGFP and DNA in the wing disc cell nuclei can be estimated to be: $[(\mathrm{DNA{-}Antp{-}eGFP})_s] = 15.62$ (S23). The average concentration of free-diffusing Antp-eGFP molecules is determined as follows: $785.28 - 0.34 \cdot 785.28 + 5.31 - 15.62 = 507.97$ (S24). Using the experimentally determined concentration of specific DNA-Antp-eGFP complexes (equation (S23)), we could estimate the dissociation constant for the specific DNA-Antp-eGFP complex, as a function of the total concentration of specific Antp binding sites, to be: [...]. [...] $= 0.24$ (S26), and the intercept: [...]. If [...] is small compared to [...] (S30). Using the experimentally determined concentration of specific DNA-Antp-eGFP complexes (equation (S29)), we could estimate the dissociation constant for the specific DNA-Antp-eGFP complex, as a function of the total concentration of specific Antp binding sites, to be: [...], indicating a roughly 2.5-fold higher affinity of the short linker repressive isoform. In addition, equations (S20) and (S26) contain information about the ratio of the apparent equilibrium dissociation constants for nonspecific interactions (Vukojevic et al., 2010). From these relationships, the ratio of the apparent equilibrium dissociation constants for nonspecific interactions can be estimated to be: $K_{d,ns,\mathrm{long}} > 1.63 \cdot K_{d,ns,\mathrm{short}}$ (S38). Thus, our analysis shows that the short linker isoform, which is the preferentially repressing isoform, binds with higher affinity (lower $K_d$) to both specific and nonspecific binding sites on the DNA [(S33) and (S38), respectively].
This, in turn, implies that the short linker is also more efficient in searching for specific TF binding sites, as evident from the lower dissociation constant for nonspecific DNA interactions of the short linker isoform (Sela and Lukatsky, 2011;Soltani et al., 2015), and that it binds with lower apparent dissociation constant to specific binding sites on the DNA.
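The arithmetic behind two of the numbers quoted in this section can be checked with a short Python sketch. The first part reproduces the free-Antp balance of equation (S24); the leading total concentration of 785.28 is inferred here from the surviving terms of that equation. The second part rests on an assumption that is not stated in the text: that the slopes 0.34 and 0.24 equal $[\mathrm{DNA}_{ns}]_0 / (K_{d,ns} + [\mathrm{DNA}_{ns}]_0)$ for the short and long linker isoform, respectively. Under that assumption the ratio of nonspecific dissociation constants comes out at the quoted 1.63. All variable and function names are illustrative.

```python
# Part 1: balance of equation (S24), values as quoted in the text
total = 785.28           # total Antp-eGFP concentration (inferred leading term)
slope = 0.34             # slope multiplying the total concentration
intercept_term = 5.31    # additive term appearing in (S24)
specific_complex = 15.62 # concentration of the specific complex, (S23)

free = total - slope * total + intercept_term - specific_complex


# Part 2: ratio of nonspecific dissociation constants from the two slopes
def kd_over_sites(slope_value):
    """K_d,ns / [DNA_ns]_0, ASSUMING slope = 1 / (1 + K_d,ns / [DNA_ns]_0)."""
    return 1.0 / slope_value - 1.0


# Assumed assignment: 0.24 -> long linker, 0.34 -> short linker
ratio_long_over_short = kd_over_sites(0.24) / kd_over_sites(0.34)

print(round(free, 2), round(ratio_long_over_short, 2))
```

The total number of nonspecific sites cancels in the ratio, which is why a relative statement about the two isoforms is possible without knowing that number.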

Stochastic modeling of Antennapedia expression
In the following, we develop a simple mathematical model that is able to explain the behavior of Antp expression at early and late developmental stages. The Antp promoter is modeled as a continuous-time Markov chain with three distinct transcriptional states. In the absence of Antp, the promoter is in an unbound state ("U"), in which transcription is inactive. From this state, the promoter can switch to a transcriptionally active state "A" at a rate that we consider to be proportional to the concentration of the long-linker, activating isoform of Antp. Analogously, repression of the promoter by the short-linker isoform of Antp is modeled by an additional transcriptionally inactive state "R", which can be reached from state "U" at a rate proportional to the concentration of that isoform. The corresponding reverse transitions from states "R" and "A" back into state "U" are assumed to happen at a constant rate $\gamma$. Since the activating isoform can potentially also repress the promoter, we assume that state "R" can also be reached from the active state "A". Similarly, we model a potential link in the reverse direction, from state "R" to "A". Depending on the model variant, we consider the transition from "A" to "R" to happen either at a constant rate (competitive promoter model) or at a rate proportional to the concentration of the repressing isoform of Antp (non-competitive promoter model). In the latter case, repression through short-linker isoforms can take place even if a long-linker isoform is already bound to the promoter. As we have demonstrated in Fig. 5A,B, the two model variants yield qualitative differences in Antp expression. For the sake of illustration, the following description focuses on the non-competitive model variant, but we remark that the competitive model can be derived analogously.
At a particular time point $t$, the transcription rate of Antp is determined by the current state of the promoter, i.e., $\lambda(t) \in \{0, \lambda_A, 0\}$, with $\lambda_A$ as the transcription rate associated with state "A". In line with our experimental findings, we assume that transcripts are spliced into the activating and repressing isoforms at different rates $\rho_A$ and $\rho_R$, respectively. This allows us to capture the imbalance between the two isoforms that was revealed by our FCS data. The overall expression rates for the two isoforms of Antp are then given by $h_A(t) = \lambda(t)\rho_A Z$ and $h_R(t) = \lambda(t)\rho_R Z$, where $Z$ is a random variable that accounts for extrinsic variability in gene expression rates (Zechner et al., 2012). In all of our analyses, we model $Z$ as a Gamma-distributed random variable $Z \sim \Gamma(\alpha, \beta)$, with $\alpha$ and $\beta$ as shape and inverse scale parameters of that distribution. In summary, we describe the auto-regulatory circuit of Antp expression by a Markovian reaction network of the form: [...] (S39), with $X_A(t)$ and $X_R(t)$ as the concentrations of the activating and repressing isoforms of Antp, $\tau$ as the protein half-life and $n$ as a coefficient accounting for cooperativity in the binding of Antp to the promoter. The initial conditions $X_A(0)$ and $X_R(0)$ were drawn randomly in accordance with our concentration measurements at early stages. In particular, we assume that the total amount of Antp in each cell, $X_0$, is drawn from a negative binomial distribution such that $X_0 \sim \mathrm{NB}(r, p)$, with $\mu = r(1-p)/p$ and $\eta^2 = 1/(r(1-p))$ as the mean and squared coefficient of variation of this distribution. The total number of Antp molecules was then randomly partitioned into fractions of repressing and activating isoforms according to a Beta distribution. More specifically, we set $X_R(0) = W X_0$ and $X_A(0) = (1-W) X_0$ with $W \sim \mathrm{Beta}(a, b)$. The parameters $r$, $p$, $a$ and $b$ were chosen based on our experimental data (see Table 1).
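The random initialization described above can be sketched with NumPy. The helper below maps a desired mean and squared coefficient of variation onto the two negative binomial parameters, using the relations mean = r(1-p)/p and squared CV = 1/(r(1-p)) given in the text, and then splits the total between the two isoforms with a Beta-distributed fraction. The function names, the rounding of the split to whole molecules and all numerical values in the example are assumptions of this sketch, not entries of Table 1.

```python
import numpy as np


def nb_params(mean, cv2):
    """Return (r, p) such that mean = r(1-p)/p and cv2 = 1/(r(1-p))."""
    p = 1.0 / (mean * cv2)
    if not 0.0 < p < 1.0:
        raise ValueError("a negative binomial requires mean * cv2 > 1")
    r = 1.0 / (cv2 * (1.0 - p))
    return r, p


def sample_initial_conditions(rng, n_cells, mean, cv2, a, b):
    """Draw total Antp per cell and split it into activating/repressing pools."""
    r, p = nb_params(mean, cv2)
    total = rng.negative_binomial(r, p, size=n_cells)
    frac_repressing = rng.beta(a, b, size=n_cells)
    # Rounding to whole molecules is a choice made for this sketch
    x_r = np.rint(frac_repressing * total).astype(np.int64)
    x_a = total - x_r
    return x_a, x_r


def sample_extrinsic_factor(rng, n_cells, alpha, beta):
    """Gamma-distributed factor with shape alpha and inverse scale beta."""
    return rng.gamma(shape=alpha, scale=1.0 / beta, size=n_cells)


# Hypothetical example values
rng = np.random.default_rng(1)
x_a0, x_r0 = sample_initial_conditions(rng, n_cells=1000, mean=50.0, cv2=0.5, a=2.0, b=5.0)
z = sample_extrinsic_factor(rng, n_cells=1000, alpha=10.0, beta=10.0)
```

Parameterizing by mean and squared coefficient of variation keeps the sketch close to the quantities that FCS reports directly, namely the average concentration and the variance over the squared mean.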
Because Antp expression takes place on a timescale of several hours to days, we can further simplify our model from (S39). In particular, we can make use of a quasi-steady state assumption (Rao and Arkin, 2003), by assuming that promoter switching due to binding and unbinding of the different Antp isoforms occurs on a much faster timescale than production and degradation of Antp. As a consequence, we can replace the stochastic gene expression rates of the two isoforms by their expected values, where the expectation is taken with respect to the quasi-stationary distribution of the three-state promoter model. More precisely, we have: [...] (S40), with $E[\lambda(t)] = 0 \cdot \pi_U + \lambda_A \pi_A + 0 \cdot \pi_R = \lambda_A \pi_A$ and $\pi_U$, $\pi_A$ and $\pi_R$ as the quasi-stationary probabilities of finding the promoter in state "U", "A" and "R", respectively. These probabilities can be derived from the generator matrix $Q$ of the three-state promoter model, which reads: [...] (S41). Assuming that $X_A(t)$ and $X_R(t)$ remain roughly constant on the timescale of the promoter, the quasi-stationary distribution can be determined by the null-space of $Q$, which is given by: [...] (S42). Correspondingly, the expectation of $\lambda(t)$ becomes:
$$E[\lambda(t)] = (0 \;\; \lambda_A \;\; 0)\,\pi(t) = \frac{\lambda_A \gamma \left(2 X_A(t)^{n} + X_R(t)^{n}\right)}{\left(2\gamma + X_R(t)^{n}\right)\left(\gamma + X_A(t)^{n} + X_R(t)^{n}\right)} := \bar{\lambda}(X_A(t), X_R(t)) \quad \text{(S43)},$$
where $X_A(t)$ and $X_R(t)$ are the concentrations of the activating and repressing isoforms, $n$ is the cooperativity coefficient and $\gamma$ is the constant unbinding rate. The simplified model of Antp expression can then be compactly written as two coupled birth-and-death processes: [...] (S44), with $\bar{h}_A(X_A(t), X_R(t)) = \bar{\lambda}(X_A(t), X_R(t))\rho_A Z$ and $\bar{h}_R(X_A(t), X_R(t)) = \bar{\lambda}(X_A(t), X_R(t))\rho_R Z$. In all our simulation studies, the circuit from (S44) was simulated using the τ-leaping algorithm (Gillespie, 2007). In the case of the perturbation experiments, small modifications to the model were made. Overexpression of either of the two isoforms was reflected by changing the initial conditions of Antp. In particular, we added to the overexpressed isoform a random number of molecules drawn from a negative binomial distribution with a given mean and squared coefficient of variation (see Table 1).
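The quasi-steady state step can be illustrated numerically. The sketch below builds a generator matrix for the non-competitive three-state promoter, solves for its stationary distribution and compares the probability of the active state with a closed-form expression. The transition rates are assumptions read from the model description: U to A at $X_A^n$, U to R and A to R at $X_R^n$, and A to U, R to U and R to A all at the same constant rate `gamma`. Proportionality constants are set to one, and all names and example values are hypothetical.

```python
import numpy as np


def generator_matrix(x_a, x_r, gamma, n):
    """Generator of the assumed three-state promoter, state order (U, A, R)."""
    a, r = x_a**n, x_r**n
    return np.array([
        [-(a + r), a, r],              # U -> A (activator), U -> R (repressor)
        [gamma, -(gamma + r), r],      # A -> U (unbinding), A -> R (repressor)
        [gamma, gamma, -2.0 * gamma],  # R -> U, R -> A (both at constant rate)
    ])


def stationary_distribution(q):
    """Solve pi Q = 0 subject to sum(pi) = 1 by least squares."""
    k = q.shape[0]
    system = np.vstack([q.T, np.ones(k)])
    rhs = np.zeros(k + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return pi


def mean_rate_closed_form(x_a, x_r, gamma, n, lam):
    """Closed-form mean transcription rate under the assumed rates."""
    a, r = x_a**n, x_r**n
    return lam * gamma * (2.0 * a + r) / ((2.0 * gamma + r) * (gamma + a + r))


# Hypothetical example values
pi = stationary_distribution(generator_matrix(3.0, 1.5, gamma=4.0, n=2.0))
numeric = 10.0 * pi[1]  # lam * probability of the active state
closed = mean_rate_closed_form(3.0, 1.5, gamma=4.0, n=2.0, lam=10.0)
```

Agreement between `numeric` and `closed` confirms only that the closed form belongs to the rate assignment assumed here; a different assignment of the R to A rate would change both.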
To account for overexpression of an external repressor $E$, we introduced a fourth state in the promoter model, from which no expression can take place. This state is assumed to be reachable from any of the other three states at a rate $E(t)^{m}$, with $E(t)$ as the concentration of the external repressor at time $t$ and $m$ as a coefficient accounting for cooperativity in the binding of the repressor to the promoter. For simplicity, we assumed that $m$ equals the cooperativity coefficient of Antp binding, $m = n$, in our case studies. To account for cell-to-cell variability in the repressor concentration, the latter was initialized randomly according to a Poisson distribution, i.e., $E(t_0) \sim \mathrm{Pois}(\bar{E} Z)$, with $\bar{E}$ as the average repressor abundance and $Z$ as the Gamma-distributed random variable defined above. Furthermore, repressor molecules were assumed to have an average lifetime of $\tau_E$, i.e., $E \xrightarrow{\tau_E^{-1}} \emptyset$. The corresponding reaction rates of Antp expression were determined analogously to equations (S41-S43). Table 1 summarizes the parameters used for each of the simulation studies.
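To give a feel for how the simplified circuit without the external repressor can be simulated, the following Python sketch implements a fixed-step variant of τ-leaping for two coupled birth-and-death processes. It is a simplification and not the adaptive step-size procedure of Gillespie (2007): births are drawn as Poisson counts per step, and decay is drawn as a binomial count so that molecule numbers cannot become negative. The quasi-stationary rate function uses the same assumed promoter rates as the non-competitive model variant, decay is taken to occur at the inverse of a mean lifetime, and every parameter value is hypothetical.

```python
import math
import numpy as np


def mean_transcription_rate(x_a, x_r, lam, gamma, n):
    """Quasi-stationary mean transcription rate (assumed non-competitive form)."""
    a, r = float(x_a) ** n, float(x_r) ** n
    return lam * gamma * (2.0 * a + r) / ((2.0 * gamma + r) * (gamma + a + r))


def tau_leap(x_a, x_r, z, params, t_end, dt, rng):
    """Fixed-step tau-leaping; returns final (activating, repressing) counts."""
    x_a, x_r = int(x_a), int(x_r)
    # Probability that a single molecule decays within one step of length dt
    p_decay = 1.0 - math.exp(-dt / params["lifetime"])
    for _ in range(int(round(t_end / dt))):
        lam_bar = mean_transcription_rate(
            x_a, x_r, params["lam"], params["gamma"], params["n"]
        )
        births_a = rng.poisson(lam_bar * params["rho_a"] * z * dt)
        births_r = rng.poisson(lam_bar * params["rho_r"] * z * dt)
        deaths_a = rng.binomial(x_a, p_decay)
        deaths_r = rng.binomial(x_r, p_decay)
        x_a += int(births_a) - int(deaths_a)
        x_r += int(births_r) - int(deaths_r)
    return x_a, x_r


# Hypothetical parameters, chosen only so that the example runs
HYPOTHETICAL_PARAMS = {
    "lam": 50.0,      # transcription rate in the active state
    "gamma": 5000.0,  # constant promoter unbinding rate
    "n": 2.0,         # cooperativity coefficient
    "rho_a": 0.7,     # splicing rate towards the activating isoform
    "rho_r": 0.3,     # splicing rate towards the repressing isoform
    "lifetime": 10.0, # mean protein lifetime
}

final_a, final_r = tau_leap(
    20, 5, 1.0, HYPOTHETICAL_PARAMS, t_end=5.0, dt=0.01, rng=np.random.default_rng(0)
)
```

Running the function over many cells, each with its own initial condition and extrinsic factor `z`, yields a population from which the mean and the variance over the squared mean can be computed and compared between early and late time points.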