Hox binding specificity is directed by DNA sequence preferences and differential abilities to engage inaccessible chromatin

While Hox genes encode for conserved transcription factors (TFs), they are further divided into anterior, central, and posterior groups based on their DNA-binding domain similarity. The posterior group expanded in the deuterostome clade and patterns caudal and distal vertebrate structures such as the spinal neuronal diversity required for motor function. Our data revealed that limb-level patterning central Hoxc6, Hoxc8 and posterior Hoxc10 have a reduced ability to access occluded sites compared to other tested Hox TFs. Thus, their genomic binding relies more on cell-specific chromatin accessibility. Although posterior Hoxc9, Hoxc10 and Hoxc13 induce different fates, they share motif preference. However, Hoxc9 and Hoxc13 have a unique ability to access sites occluded by chromatin, resulting in divergent genomic binding patterns. From these results, we propose that the differential abilities of posterior Hox TFs to bind to previously inaccessible chromatin is the predominant force driving their patterning diversification.


INTRODUCTION
Hox genes encode a highly conserved transcription factor family that endow cells with positional identity during embryonic development (Duboule & Dolle, 1989;Lewis, 1978;McGinnis & Krumlauf, 1992). In mammals, Hox genes are organized into four clusters located on different chromosomes (HoxA, HoxB, HoxC, and HoxD). Each cluster contains a subset of 13 similar paralogous Hox genes, genomically arranged in the same order as their spatial and temporal expression patterns in the developing embryo, a phenomenon known as collinearity (Duboule & Morata, 1994;Kmita & Duboule, 2003). Changes in Hox gene expression patterns induce gross morphological changes, such as organs transforming into one another (homeotic transformations). However, how similar Hox transcription factors (TFs) assign positional identities during cell differentiation is not entirely understood.
Homeodomains are highly conserved helix turn helix DNA-binding domains that recognize similar consensus DNA sequences (Affolter, Slattery, & Mann, 2008;Berger et al., 2008;Gehring et al., 1994;Noyes et al., 2008). Vertebrate Hox TFs can be divided into anterior (Hox1-5), central (Hox6-8), and posterior (Hox9-13) paralog groups. The posterior Hox group is particularly interesting because it has expanded in the deuterostome clade. While there is one Drosophila Abd-B, several Abd-B-related Hox genes assign different posterior positional identities during vertebrate development (reviewed in (Duboule, 2007;Lanfear, 2010)). Thus, understanding how Hox TFs differentiate their genomic binding activity to specify cell fates is at the core of understanding vertebrate body patterning.
In vitro binding studies have uncovered the intrinsic sequence preferences of Hox TFs alone or in complex with a specific cofactor. These studies revealed slight sequence preference differences between anterior (Hox1-5) and central Hox paralog groups (Hox6-8), which prefer motifs containing the canonical TAAT core sequence, and posterior paralog groups (Hox9-13), which prefer the TTAT sequence (Berger et al., 2008;Ekker et al., 1994;Mann, Lelli, & Joshi, 2009;Noyes et al., 2008). Moreover, the interaction between Hox TFs and Meis and Pbx cofactors increases the specificity and selectivity of Hox DNA-binding (reviewed in (Mann & Affolter, 1998;Mann & Chan, 1996;Merabet & Mann, 2016)). However, these studies do not address Hox binding in a cell-specific chromatin context, which can be explored by investigating genome-wide binding in differentiating cells. Moreover, the variability among Abd-B-derived vertebrate posterior Hox TFs cannot be explored with Drosophila genes. Virally-expressed Hoxa9-13 and Hoxd9-13 in primary chicken mesenchymal limb progenitors exhibit some binding specificity differences between the posterior group Hox TFs, with Hoxa/d13 paralogs being the most different (Jerkovic et al., 2017). Drosophila Hox proteins have distinct abilities to bind accessible and inaccessible chromatin, and these differential abilities are associated with binding selectivity (Porcelli, Fischer, Russell, & White, 2019). However, how Hox TF abilities to bind to inaccessible chromatin shape their different patterning activities is not well understood.
This aspect is of particular interest for the Abd-B related posterior Hox group, which have similar sequence preferences but different patterning abilities.
Similarly, V1 interneurons are also patterned by Hox genes, resulting in expression of specific combinations of transcription factors in interneurons formed at forelimb and hindlimb levels, which differ compared to combinations expressed at the thoracic level (Sweeney et al., 2018).
Thus, understanding how vertebrate Hox TFs pattern MN and interneuron identity into different subtypes will illuminate mechanisms that establish neuronal diversity.
The role of Hox genes in organizing the spinal cord presents an interesting conundrum. The shared limb expression program is controlled by central (Hox6 & Hox8) and posterior (Hox10) Hox TFs, which are expected to have distinct intrinsic DNA binding preferences. Meanwhile, the posterior Hoxc9 should bind to similar DNA sequences as Hox10 TFs, but it instead instructs thoracic fate. Moreover, while all Hox9 paralogs repress rostral Hox gene expression, Hoxc9 distinguishes itself from the rest of its paralogs with its ability to induce thoracic fate (Jung et al., 2014). In agreement with their genomic cluster position, Hox13 paralogs are expressed late during development, distally, and in posterior regions. Moreover, they are associated with the unique ability to serve as "repressors of elongation" or "patterning terminators" by inhibiting cell growth or inducing apoptosis (Denans, Iimura, & Pourquie, 2015;Economides, Zeltser, & Capecchi, 2003;Godwin & Capecchi, 1998). As a model to understand Hox-dependent patterning, we sought to understand how central and posterior Hox TFs bind the genome to induce different spinal cord identities.
To understand how Hox TFs induce different fates, we performed a comparison of global binding preferences and transcriptional targets of different Hox proteins during embryonic stem cell to MN/interneuron differentiation. We focused on a subset of HoxC TFs (Hoxc6,8,9,10, and 13) because of their importance for inducing spinal cord identities. Furthermore, we complemented these studies with additional Hox9 paralog groups. Our results suggest that limb-innervating fate is not the product of identical central and posterior Hox binding patterns since Hoxc10 does not mimic Hoxc6 and Hoxc8 binding profiles. Although posterior group Hox TF have similar DNA motif preferences, they do not bind to the genome with identical patterns. This difference is mainly due to their abilities to bind to motifs occluded in inaccessible chromatin.
Hoxc13 and Hoxc9 bind larger fractions of sites in previously inaccessible chromatin when compared to Hoxc10 and other Hox9 paralogs. In this regard, Hoxc10 behaves more like central Hoxc6 and Hoxc8 that induce a similar limb-innervating fate. From these results, we propose that the differential abilities of posterior Hox TFs to bind to previously inaccessible chromatin is the predominant force driving their patterning diversification.

Hox TFs induce distinct patterns of gene expression during in vitro spinal cord differentiation
The study of Hox TF genomic binding has been made difficult by the lack of availability of homogenous relevant cell populations at scales compatible with chromatin immunoprecipitation studies. To mitigate this technical limitation, we opted for an embryonic stem cell (ESC) differentiation system that recapitulates ventral spinal cord early development (Davis-Dusenbery, Williams, Klim, & Eggan, 2014;Wichterle, Lieberam, Porter, & Jessell, 2002;Wichterle & Peljto, 2008). In response to the dorso-ventral Hedgehog and rostro-caudal retinoic acid (RA) patterning signals, ESCs differentiate into MNs and interneurons transitioning through embryonic progenitor states (Wichterle et al., 2002). The culture acquires a rostral spinal cord identity, with 90% of cells expressing Hoxa5 (Peljto, Dasen, Mazzoni, Jessell, & Wichterle, 2010;Peljto & Wichterle, 2011). Moreover, differentiating ESCs respond to Hox gene overexpression similarly to those in the developing spinal cord (Jung et al., 2010;Machado et al., 2014;Narendra, Bulajic, Dekker, Mazzoni, & Reinberg, 2016;Tan, Mazzoni, & Wichterle, 2016). The stem cell differentiation system allows for the study of homogenous differentiating cells at an exquisite temporal resolution, producing a large number of cells required for chromatin studies. Therefore, in vitro MN/interneuron differentiation is an advantageous platform to study Hox TF binding to the genome during neuronal diversification in a developmentally relevant cellular context.
To recapitulate Hox expression dynamics, Dox was added at a time point when the culture consists of neural progenitors ( Figure 1A, Supp Figure 1A). As time progresses, cells become neurons, peaking at 48h after Dox addition when the culture mostly consists of postmitotic MNs as well as interneurons.
To investigate the transcriptional consequences of Hox induction on gene expression, we performed RNA-seq in Hoxc6, Hoxc8, Hoxc9, Hoxc10 induced postmitotic neurons as well as control neurons not treated with Dox (no Dox control) ( Figure 1A-B, Supp Figure 1A). To examine global similarity in gene expression profiles amongst the four iHox cell lines, we used principal component analysis (PCA). We filtered out genes that were not expressed in any iHox neuron, retaining 19,019 genes (see Methods). The first two principal components explained 80% of the variance in RNA-seq tag counts at these genes, with variance along PC1 reflecting a combination of paralog group and the subtype identities specified by the Hox proteins ( Figure   1C). iHoxc6 and iHoxc8 were most similar to each other and the control cells, which mostly express Hox5 but harbor some Hox6 paralog expression at this timepoint (Mazzoni et al., 2013;Narendra et al., 2016;Wichterle et al., 2002). iHoxc9 grouped the furthest away, while iHoxc10 grouped between cells expressing Hox5-8 genes (control cells, iHoxc6 and iHoxc8) and iHoxc9.
Closer analysis of the RNA-seq data revealed that while Hoxc9 and Hoxc10 are both posterior paralogs, their transcriptional impact and the effects on cell fate are distinct, illustrated by the expression of known marker genes. For example, Hoxc6, Hoxc8, and Hoxc10 overexpression induce canonical LMC markers Raldh2 and Foxp1 ( Figure 1D, Supp Figure 1C) (Narendra et al., 2016;Tan et al., 2016). A similar recapitulation of the known in vivo phenotype can be seen upon Hoxc9 overexpression, which leads to the repression of anterior Hox1-5 paralogs ( Figure   1D, Supp Figure 1C) (Jung et al., 2010). Importantly, Hox induction during in vitro spinal cord differentiation did not affect the ability of cells to acquire an MN identity, as illustrated by the shared downstream expression of Mnx1, Isl1/2, and Lhx3 ( Figure 1D, Supp Figure 1C). Thus, Hox TF activity during ESC differentiation induces distinct fates without affecting the generic MN fate, as it does during embryonic development, making it an appropriate strategy to study how Hox TFs bind the genome to activate target gene expression and induce proper positional identities.

Central and posterior Hox TFs have distinct sequence motif preferences during MN differentiation
Hox proteins have similar DNA binding domains, yet they modify the transcriptomes in positional-specific patterns along the spinal cord rostro-caudal axis. To understand how Hox TFs assign positional identity, we assayed genomic binding of Hoxc6, Hoxc8, Hoxc9, and Hoxc10 by performing ChIP-seq experiments 24h and 48h after Dox treatment ( Figure 1A, Supp Figure 1A). These time points were chosen because they align with newborn and young postmitotic MNs, critical times for Hox positional identity patterning. We compared the binding of Hoxc6, Hoxc9, and Hoxc10 Hox TFs at 24h and 48h after Dox induction and concluded that the binding of each Hox protein is mostly the same in young and postmitotic MNs (Supp Figure 2A -C). Importantly, all Hox proteins are N-terminally Flag-tagged, which allows for immunoprecipitation with the same antibody, eliminating any bias that could occur from different antibody affinities and cross-reactivity.
De novo motif discovery, using the ensemble method MEME-ChIP, revealed distinct motifs when comparing central versus posterior Hox ChIP-seq datasets (Machanick & Bailey, 2011). anterior and central Hox paralogs prefer the TAAT motif while the posterior paralogs prefer the TTAT motif (Mann et al., 2009;Noyes et al., 2008). However, this motif preference did not discriminate within central and posterior group Hox TFs and thus cannot fully explain vertebrate Hox patterning abilities.

Hoxc6, Hoxc9, and Hoxc10 TFs have different genome-wide binding profiles
We next asked how the differing sequence preferences are translated into distinct central versus posterior Hox TF genomic binding distributions. Moreover, posterior Hox TFs share sequence specificity but have different fate-inducing abilities, thus it is not clear how their genomic binding would be in differentiating cells. We began by comparing Hoxc6 and Hoxc8 because they are central group Hox genes and both induce brachial identity. We restricted the analysis to the top 10,000 ChIP-seq sites in each dataset for all downstream analyses to ensure the least amount of bias from comparing different experiments. During cell differentiation, Hoxc6 and Hoxc8 proteins bind the genome in a remarkably similar pattern; less than 2% of tested Hoxc6 and Hoxc8 sites are significantly differentially bound by the two TFs ( Figure 3A). Thus, the two main brachial Hox proteins bind similar target sites in differentiating neurons. To facilitate the interpretation of the subsequent comparisons, we picked Hoxc6 to represent the brachial-inducing Hox binding profile.
Mirroring whole ChIP-seq analysis for each factor, MEME-ChIP motif analysis revealed distinct primary motifs when comparing differentially bound sites by a central versus posterior Hox TF ( Figure 3F). Altogether, Hox TFs bind to a set of similar sites regardless of their paralog group and fate inducing activity. The brachial and thoracic inducing Hox TFs are different in their binding specificity and bind to unique sites. Although Hoxc10 and Hoxc9 share sequence specificity, Hoxc10 lacks the ability to bind to a large fraction of Hoxc9 sites.
We then asked whether differentially expressed genes in the iHox neurons correlated with specific Hox binding categories. Specifically, we used the logistic regression-based ChIP-Enrich method to identify significant associations (adjusted p-value < 0.01) between RNA-seq derived gene sets and Hox TF binding categories. Genes that are equally upregulated in iHoxc6, iHoxc9 and iHoxc10 neurons associated strongly with c6=c9=c10 sites, with some association with c6>c9,c10 sites and c9,c10>c6 sites as well ( Figure 4A). Thus, genes upregulated in all Hox inductions correlate with sites bound by all Hox TFs, suggesting a universal "Hox input" in their regulation. Genes differentially expressed in iHoxc6, compared to iHoxc9 or iHoxc10 neurons, showed a correlation with c6>c9,c10 sites ( Figure 4A). Similarly, genes differentially expressed in iHoxc9 neurons correlate with c9>c6,c10 sites ( Figure 4A). Thus, Hox TF binding correlates with transcriptional activity. Moreover, these results suggest that while Hoxc9 expression represses rostral Hox gene expression, Hoxc9 is not a general repressing Hox TF.

Hoxc9 has a higher preference for inaccessible chromatin than Hoxc6 and Hoxc10
The central Hoxc6 and posterior TFs (Hoxc9, Hoxc10) bind distinct sequences ("TAAT" vs. "TTAT") explaining Hoxc6 divergent binding. However, Hoxc9 and Hoxc10 binding does not entirely overlap, yet they share sequence preferences ("TTAT", "TGATTTAT"). Since chromatin accessibility is an important determinant of genome-wide TF binding specificity, we decided to explore whether the prior accessibility landscape affects genomic binding patterns of Hox TFs even in instances where they share a primary motif preference.
To investigate whether the chromatin accessibility landscape prior to Hox induction shapes Hox TF binding patterns, we characterized genome-wide chromatin accessibility states by ATAC-seq at the progenitor stage ( Figure 5A, Supp Figure 1A). The distribution of progenitor ATAC-seq read density at Hoxc6, Hoxc9, and Hoxc10 sites revealed that Hoxc9 binding sites had the lowest median accessibility before each factor is induced ( Figure 5B). Hox factors share some genomic binding locations, yet they also bind to unique sites. Thus, we hypothesized that prior chromatin accessibility might not be equally distributed across all binding sites for each Hox TF. c6=c9=c10, c6>c9,c10, and c6,c9>c10 binding categories harbor similar prior accessibility landscapes (55%, 54%, and 45% of sites overlap accessible domains in neuronal progenitors, respectively). On the other hand, c9>c6,c10 and c9,c10>c6 binding occurs in genomic locations with much lower prior accessibility (16% and 18% of sites overlap accessible domains in MN progenitors, respectively) ( Figure 5C, D). Dissecting Hox binding into different categories underscored a possible mechanism to complement sequence specificity in generating Hox TF binding profiles. Hoxc6 and Hoxc10 overall binding has a higher correlation with preexisting accessibility than Hoxc9. These results suggest that sites bound by all studied Hox TFs have higher prior accessibility to allow for binding of those factors that cannot bind to inaccessible sites. Hoxc9 has a significant fraction of its binding in previously inaccessible regions, so although it shares motif preference with Hoxc10, it can bind to more sites.
To test if Hox differential preferences for accessible regions were TF intrinsic or due to a MN/interneuron progenitor specific chromatin state or cofactor, we investigated their binding in a different cell type. Undifferentiated cells found in embryoid bodies (EBs) at an early differentiation state before the application of patterning signals have a dissimilar accessibility landscape than MN progenitor state cells and were chosen as the system conducive for experiments utilizing the same inducible system ( Figure 5E). Even in this different cell type, Hoxc9 maintains a higher preference for inaccessible chromatin than Hoxc6 ( Figure 5F). Thus, the intrinsic ability of Hox TFs to bind to inaccessible chromatin seems to be independent of the particular cellular environment in which they are expressed. Altogether, comparing Hox TF binding with prior accessibility revealed that limb-innervating Hoxc10 and Hoxc6 rely more on chromatin accessibility established at progenitor stages to find their target sites. Moreover, these results divide the posterior paralog group by their ability to bind previously inaccessible chromatin, placing Hoxc9 with a higher ability to bind inaccessible chromatin compared with Hoxc10.

Hox TF binding increases chromatin accessibility
The difference in the abilities of Hox TFs to bind to inaccessible chromatin prompted us to explore whether Hox TFs change the accessibility landscape after binding. To characterize the accessibility state after Hox binding, we performed ATAC-seq in postmitotic neurons expressing Hoxc6, Hoxc9, or Hoxc10 ( Figure 6A, Supp Figure 1A). We then compared accessibility changes from progenitors to post-mitotic neurons at Hoxc6, Hoxc9, and Hoxc10 binding events.
Consistent with Hox TF increasing or maintaining the chromatin status, c6=c9=c10 sites gained accessibility after Hoxc6, Hoxc9, and Hoxc10 expression ( Figure 6B, C). c6>c9,c10 sites gained some accessibility in iHoxc6 neurons, while they lost accessibility in iHoxc9 and iHoxc10 expressing cells. Thus, although with a relatively weaker ability to bind inaccessible sites, Hoxc6 can maintain accessibility. c9,c10>c6 sites gained accessibility in both iHoxc9 and iHoxc10 neurons, while not significantly changing in iHoxc6 cells. These posterior Hox TFs bind to sites with low accessibility and increase accessibility upon binding. Hoxc9 exclusive sites had the lowest prior accessibility and gained the most accessibility in iHoxc9 neurons. Thus, the large category of sites differentially bound by Hoxc9 went through the biggest change in accessibility status only in Hoxc9-expressing neurons ( Figure 6B, C). These results show that although Hox TF binding is generally associated with maintained or increased chromatin accessibility, Hoxc9 stands out in its ability to bind to a large set of sites in inaccessible chromatin and increase the accessibility status after binding.

Posterior group Hox TFs display a range of abilities to bind inaccessible chromatin
Genome-wide chromatin preferences of Hoxc6, Hoxc8, Hoxc9 and Hoxc10 revealed that Hoxc9 binds a larger fraction of previously inaccessible sites. We wondered whether Hoxc9 or Hoxc10 is an outlier compared to other posterior Hox TFs. Hoxc9 thoracic inducing ability is unique among Hox9 paralogs (Jung et al., 2014). Thus, the ability of Hoxc9 to bind to inaccessible chromatin could also be unique among Hox9 paralogs. We captured the genomic binding of Hoxa9 and Hoxd9 by performing ChIP-seq experiments 24h after Dox treatment ( Figure 7A-B).
The detected motifs for Hoxa9 and Hoxd9 resemble the posterior Hox TTAT and bipartite Pbx and posterior TGATTTAT motifs ( Figure 7A-B). A comparison of Hoxa9 and Hoxd9 ChIP-seq using EdgeR revealed that they share most of their binding sites ( Figure 7C), while comparing Hoxc9 and Hoxa9 binding patterns showed that Hoxc9 binds a large category of sites alone ( Figure 7D). Among the Hox9 paralogs, Hoxc9 binds to sites with the lowest median prior accessibility ( Figure 7E). Accordingly, the sites uniquely bound by Hoxc9 showed a lower prior accessibility than other sets of sites ( Figure 7F, G). Hence, along with a strong ability to induce thoracic fate, the ability of Hoxc9 to bind to a large set of inaccessible sites is not shared by other Hox9 paralogs.
The posterior Hox binding profiles we have analyzed revealed that in spite of sharing motif preferences, they do not entirely overlap in their genomic binding. Thus, we sought to expand these analyses to another relevant posterior Hox gene. Hox13 paralogs are also posterior group genes but have the unique ability to terminate axial elongation. Analysis of Hoxc13 ChIP-seq data revealed a posterior Hox TTAT motif ( Figure 8A). To assess whether Hoxc13 would bind like Hoxc9 or other posterior Hox TFs, we compared Hoxc9, Hoxc10, and Hoxc13 genomic binding overlaps ( Figure 8B). A small fraction of all sites (1,024) were shared by Hoxc9, Hoxc10, and Hoxc13 ("c9=c10=c13"). Hoxc9 and Hoxc10 ("c9,c10>c13") bind similarly to 6,257 sites, while only 1,208 sites were bound similarly by Hoxc9 and Hoxc13 ("c9,c13>c10"). As in all previous comparisons, Hoxc9 retained a subset of private sites 2,543 ("c9>c10,c13"). However, in this comparison, a large category (5,652) of sites was bound by Hoxc13 alone ("c13>c9,c10"). A direct comparison of Hoxc9 and Hoxc13 binding profiles supported the finding that they largely bind distinct sets of sites (Supp Figure 4A). Thus, while Hoxc13 shares a posterior motif preference, it binds to unique sites increasing the posterior Hox TF binding diversity.
We then asked if the posterior Hox group TF binding diversity correlates with their differential ability to bind to previously inaccessible chromatin. The distribution of progenitor ATAC-seq read density at Hoxc9, Hoxc10, and Hoxc13 sites revealed that Hoxc13 binding sites had the lowest median accessibility in MN progenitors, even lower than Hoxc9 ( Figure 8C). Dissecting the accessibility at the different Hox binding categories underscored the ability of Hoxc13 to bind to sites with the lowest prior accessibility ( Figure 8D, E).
To visualize the overall variation in Hox TF genome-wide binding profiles, we performed Principal Components Analysis (PCA) on Hox TF ChIP-seq read counts associated with the set of genomic sites that belong to the top 10,000 binding sites for at least one Hox TF (38,874 unique genomic locations; Figure 9A). PC1, which explains 43% of the variance between the TFs, separates Hoxc13 from the anterior Hox TFs, Hox9 paralogs, and Hoxc10. On the other hand, PC2 and PC3, which cumulatively explain 32% of the variance, separate the TFs into three clusters. The limb-level-specifying Hoxc6 and Hoxc8 cluster together, Hoxc9 clusters by itself, and finally Hoxc10 and Hox9 paralogs Hoxa9 and Hoxd9 cluster together. Of note, the binding pattern of HoxC TFs to the HoxC gene cluster demonstrates their differential ability to engage inaccessible chromatin ( Figure 9B). While Hoxc6 only binds Hoxc4-5 genes, which are transcriptionally active, Hoxc10, Hoxc9 and Hoxc13 bind progressively deeper into the cluster at repressed HoxC genes which are covered in the catalytic product of PRC2 (Mazzoni et al., 2013;Narendra et al., 2015). Thus, differences in genome-wide binding profiles of the Hox TFs reflect sequence preferences, as well as the differential abilities of the Hox TFs to bind previously inaccessible chromatin. During embryonic development, LMC neurons are further patterned by Hox proteins into specific pools (Dasen et al., 2005). During this second stage, Hoxc6 and Hoxc8 TFs differ in their fine fate inducing identity. The in vitro differentiation strategy and the forced expression of a single Hox gene does not recapitulate this late differentiation event. Thus, we speculate that pool patterning by Hox genes does require a more complex developmental context expressing more Hox genes and signaling molecules important for pool identity acquisition (Arber, Ladle, Lin, Frank, & Jessell, 2000;Dasen et al., 2005;Haase et al., 2002).

DISCUSSION
In agreement with in vitro binding preference studies, central and posterior Hox proteins bind to different motifs in neurons (Mann et al., 2009;Noyes et al., 2008). While central Hox TFs bind to sites with the central TAAT motif, posterior Hox TFs bind to TTAT central motifs. MEME-ChIP also discovered bipartite cofactor and Hox motifs with high enrichment in some sets of sites. An interaction with TALE cofactors, Meis and Pbx, can change the affinity and selectivity of Hox DNA-binding (Slattery, Riley, et al., 2011) (reviewed in (Mann & Affolter, 1998;Mann & Chan, 1996;Merabet & Mann, 2016)). In addition, partnering with TALE cofactors modifies Hox binding specificity through the recognition of DNA shape (Abe et al., 2015;Joshi et al., 2007;Zeiske et al., 2018). Although Hox expression changes the fate of in vitro differentiating neurons, we have not seen differential binding of the canonical Meis Hox co-factor that could explain differential Hox TF binding patterns (Supp Figure 5). Furthermore, Pbx1-4 and Meis1-3 expression levels are similar across the different Hox TF inductions and thus cannot explain binding differences ( Figure 1D, Supp Figure 1C). In search for additional factors, we failed to detect a CTCF motif at Hox binding sites, which was reported to co-bind with some HoxA/D proteins (Jerkovic et al., 2017). Thus, CTCF as a Hox-cofactor could be cell fate dependent.
This study revealed a gradation of Hox TF abilities to bind to inaccessible chromatin. This idea is supported further by previous studies reported that anterior Hox TF Hoxa1 does not act as a pioneer factor and instead binds to sites that were previously accessible (De Kumar et al.,

2017). Moreover, a study of the binding profiles of Ultrabithorax (Ubx), Abdominal A (Abd-A)
and Abdominal B (Abd-B) in Drosophila cells revealed that Abd-B harbored a unique ability to bind to inaccessible chromatin alone (Beh et al., 2016). Our results suggest that posterior group Hox TFs, which are related to Abd-B, have different abilities to engage inaccessible chromatin.
Hoxc9 has a higher preference for inaccessible chromatin than any other spinal cord Hox TF.
Furthermore, Hoxc13 has a high preference for inaccessible chromatin, unlike any other posterior Hox TF we investigated. Hox13 factors have a unique role in the termination of posterior structures during development (Denans et al., 2015;Economides et al., 2003;Godwin & Capecchi, 1998;Young et al., 2009). Thus, we propose a model of posterior diversification based on their ability to bind inaccessible chromatin.
Limb innervating Hox TFs from the central and posterior groups do not share binding motif preference but share a reduced ability to bind to inaccessible sites. They rely more on cellspecific accessibility landscape generated during differentiation. Indeed, the binding patterns of the Hoxc8 orthologue Ultrabithorax (Ubx) in T3 leg and haltere imaginal discs are not identical (Slattery, Ma, Negre, White, & Mann, 2011). Thus, a Hox TF binding model should include motif preference, co-factor binding and Hox TF-specific ability to bind DNA motifs occluded by chromatin.

DATA AVAILABILITY
Sequencing data have been submitted to the GEO under accession code GSE142379.

DECLARATION OF INTEREST
Authors declare no competing interests.

Cell line generation
The inducible Hoxc6, Hoxc8, Hoxc9, and Hoxc10 cell lines were generated as previously described (Mazzoni et al., 2011). The inducible cassette exchange (ICE) system was used to generate cell lines (Iacovino et al., 2011). The resulting inducible cell lines harbor a single copy of the transgene inserted at the expression-competent HPRT locus. Inducible Hoxa9, Hoxd9 and Hoxc13 cell lines were generated for this study. Hoxa9, Hoxd9, and Hoxc13 cDNA was amplified using Phusion polymerase (Thermo Scientific) from pCAGGS mHoxA9 (JD-114), pCAGGS mHoxD9 (JD-237) and hHoxc13 cDNA (Dharmacon, Accession: BC090850), respectively. Flag and HA tags were introduced during the amplification step at the amino-or carboxyl-terminus, respectively. p2lox Hoxa9, p2lox Hoxd9, and p2lox Hoxc13 plasmids were generated by In-Fusion cloning (Takara) the respective cDNAs into the p2lox plasmid backbone.

RNA-seq
Cells were collected prior to TF induction (day 2 of RA/SAG differentiation) and 48h after Doxycycline (Dox) treatment. RNA was extracted by using TRIzol LS Reagent (Life Technologies) and purified using the RNAeasy mini kit (Qiagen). Agilent High Sensitivity RNA Screentape (Agilent, 5067-5579) was used to check RNA integrity. 500ng of RNA was used to prepare RNA-seq libraries and spiked-in with ERCC Exfold Spike-in mixes (Thermo Fisher,4456739). RNA-seq libraries were prepared using TruSeq Stranded mRNA Library Preparation kit (Illumina, 20020594). Library size was verified using High Sensitivity DNA ScreenTape (Agilent, 5067-5584). The KAPA library amplification kit was used on Roche Lightcycler 480 for library quantification before pooling. The libraries were sequenced on Illumina NextSeq 500 using V2.5 chemistry (75 cycles, single-end 75bp) at the Genomics Core Facility at NYU.

ATAC-seq
Cells were collected prior to TF induction (day 2 of RA/SAG differentiation) and 48h after Doxycycline (Dox) treatment. 50,000 cells were aliquoted and washed twice in 1 × PBS (cold).
Cell pellets were resuspended in 10 mM Tris pH 7.4, 10 mM NaCl, 3 mM MgCl2, and freshly added 0.1% NP-40 (vol/vol), and centrifuged at 4 °C. Pellets were resuspended in 25 µL of 2 × TD buffer, 2.5 µL TDE1 (Nextera DNA sample preparation kit, FC-121-1030) and 22.5 µL of water and incubated at 37 °C for 30 min. The sample was purified using the MinElute PCR purification kit (Qiagen, 28004). A quantitative PCR reaction with 1 × SYBR Green (Invitrogen), custom-designed primers and 2 × NEB MasterMix (New England Labs, M0541) was performed to determine the optimal number of PCR cycles (one-third of the maximum measured fluorescence) (Buenrostro, Giresi, Zaba, Chang, & Greenleaf, 2013). PCR enrichment of the library was performed with custom-designed primers and 2 × NEB MasterMix (New England Labs, M0541). The libraries were purified using the MinElute PCR purification kit. High Sensitivity DNA ScreenTape (Agilent, 5067-5584) was used to verify the fragment length distribution of the library. Library quantification was performed using the KAPA library amplification kit on the Roche Lightcycler 480. The libraries were sequenced on Illumina NextSeq 500 using V2 and V2.5 chemistry (150 cycles, paired-end 75bp) at the Genomics Core Facility at NYU.

ChIP-seq Data Processing
ChIP-seq reads were aligned to the mm10 genome using Bowtie (v1.0.1) (Langmead, Trapnell, Pop, & Salzberg, 2009), using options "-q --best --strata -m 1 --chunkmbs 1024". Genome-wide TF binding events were called in each condition using MultiGPS (v0.74) (Mahony et al., 2014). EdgeR (v3.22.5) was used within the MultiGPS framework to test whether genomic sites were differentially bound by TFs (Robinson, McCarthy, & Smyth, 2010). Specifically, EdgeR uses a negative binomial generalized linear model (GLM) to test whether ChIP-seq reads in one condition are significantly greater than in alternative condition. A genomic site was defined as shared by TFs if significant binding events were called in both TF ChIP-seq experiments (qvalue < 0.001), and the TFs did not display differential read enrichment at that site as estimated by EdgeR (q-value > 0.01). A binding event was defined as "TF1>TF2" if a MultiGPS peak was called in TF1 ChIP-seq (q-value < 0.001), TF1 exhibited a greater log fold-change with respect to the input ChIP-seq than TF2, and TF1 and TF2 were significantly differentially bound as defined by EdgeR (q-value < 0.01). A similar strategy was applied to perform multi-way ChIPseq comparisons. For example, when comparing Hoxc6, Hoxc9, and Hoxc10 ChIP-seq experiments, "shared" binding sites were defined if significant binding events were detected in all three ChIP-seq datasets, and no two TFs were differentially bound with respect to each other. Sites were defined as "TF1> TF2, TF3" if a significant binding event was called for TF1, and a significantly greater number of reads were detected by EdgeR in TF1 ChIP-seq when compared to both TF2 and TF3. Finally, "TF1, TF2 > TF3" events were defined if significant binding events were called in both TF1 and TF2, no differential binding was detected between TF1 and TF2, and both TF1 and TF2 has a significantly greater number of reads than TF2. Only binding categories containing at least 500 binding events were retained

RNA-seq Data Processing
Fastq files obtained from RNA-sequencing were aligned to the genome using the splice-aware STAR (Spliced Transcripts Alignment to a Reference) aligner (v2.7.0c) (Dobin & Gingeras, 2016). Mapped reads were assigned to NCBI RefSeq annotated mm10 genes using the featureCount function in Rsubread (v1.30.9) (Liao, Smyth, & Shi, 2019). RefSeq genes with matching Entrez IDs were merged into a single gene by Rsubread. Following read summarization, read counts were normalized using the 'rlog' or regularized log transformation in DESeq2 (v1.20.0) (Love, Huber, & Anders, 2014). Transformed read counts were used as input features into the dimensionality reduction techniques PCA and MDS. The log 2 fold change (LFC) in gene expression levels between iHox motor neuron vs. progenitor motor neurons was estimated using DESeq2. A q-value < 0.01 and LFC > 2 was used to define differentially expressed genes between progenitor motor neurons, noDox controls, iHoxc6 MNs, iHoxc8 MNs, iHoxc9 MNs, and iHoxc10 MNs.

ChIP-seq and ATAC-seq Data Visualization
Heatmaps were used to plot the ChIP-seq and ATAC-seq reads at multiple categories of genomic sites in iHox neurons. Heatmaps were made using the MetaMaker program in SeqCode. (https://github.com/seqcode/seqcodecore/blob/master/src/org/seqcode/viz/metaprofile/MetaMaker.java). Raw reads from the ChIPseq data were extended by 100 bps and read counts were binned into 100 bp bins. Binned reads were plotted over 1000 base pair windows centered on MultiGPS binding events. Color thresholds used to produce heatmaps were determined from binding data using the MetaPlotLineMaxEstimator program in SeqCode. Specifically, all 100 bp bins were ordered by the number of reads overlapping each bin. The maximum value for the heatmap color scale was set to the number of read counts at the 85 th percentile bin. The minimum value for all heatmaps was set to 5 reads. Finally, all binding events represented were ordered based on their genomic co-ordinates.

ATAC-seq Composite Plots
For each ATAC-seq experiment, reads mapping to mitochondrial DNA or unannotated regions were filtered out. All experiments were performed in replicates; composite read distributions were initially calculated independently for each replicate. Filtered bedfiles were used to calculate the total number of reads overlapping each position in a 2000 base pair window surrounding a binding event of interest. The number of reads at each position was summed over the set of genomic sites being analyzed. The reads were total tag normalized, and finally the resulting read counts were averaged over replicates. Read density (reads per million per site) was plotted using seaborn in Python.

Association between gene sets and binding events
In order to construct gene sets, pairwise comparisons of gene expression levels between iHoxc6, iHoxc9 and iHoxc10 vs. progenitor motor neurons were performed using DESeq2.
Additionally, pairwise comparisons were also performed between the iHox motor neurons: iHoxc6 vs. iHoxc9, iHoxc6 vs. iHoxc10 and iHoxc9 vs. iHoxc10. Genes that were upregulated in all iHox vs. pMN comparisons (log2 fold change (LFC) > 2 and adjusted p-value < 0.01), as well as not differentially expressed between the iHox MNs (|LFC| > 2) were assigned to the gene set "shared-upregulated". A similar logic was used to identify the "shared-downregulated" genes.
Pairwise comparisons between iHox motor neurons were used to identify genes significantly upregulated or downregulated (|LFC| > 2 and adjusted p-value < 0.01) in individual iHox MNs.
Genes that were significantly upregulated or downregulated in two out of three iHox motor neurons were assigned to gene sets using the same thresholds.