Generation, selection and transcriptomic profiling of human neuromesodermal and spinal cord progenitors in vitro

Robust protocols for directed differentiation of human pluripotent cells are needed to establish the extent to which mechanisms operating in model organisms are relevant to our own development. Recent work in vertebrate embryos has identified neuromesodermal progenitors as a bipotent cell population that contributes to paraxial mesoderm and spinal cord. However, precise protocols for in vitro differentiation of human neuromesodermal progenitors are lacking. Informed by signalling activities during spinal cord generation in amniote embryos, we show here that transient dual-SMAD inhibition, together with retinoic acid (dSMADi-RA), provides rapid and reproducible induction of human spinal cord progenitors from neuromesodermal progenitors. We use CRISPR-Cas9 to engineer a GFP-reporter for a neuromesodermal progenitor-associated transcription factor Nkx1.2 in human embryonic stem cells, to facilitate selection of this cell population. RNA-sequencing (RNA-Seq) was then used to identify human and conserved neuromesodermal progenitor transcriptional signatures, validate this differentiation protocol and implicate new pathways and processes in human neural differentiation. This optimised protocol, novel reporter line and transcriptomic data are useful resources with which to dissect cellular and molecular mechanisms regulating the generation of human spinal cord, allow scale-up of distinct cell populations for global analyses, including proteomic, biochemical and chromatin interrogation and open up translational opportunities.


Introduction
Head and trunk nervous systems have distinct developmental origins. Head or anterior neural progenitors are derived from the epiblast rostral to the primitive streak and will form regions of the brain. In contrast, progenitors of trunk or posterior neural tissue (posterior hindbrain and spinal cord) arise from epiblast adjacent to and within the anterior primitive streak (known as caudal lateral epiblast (CLE) and node streak border (NSB), respectively) (Wilson et al. 2009) (Fig. 1A). In recent years, evidence has accrued which indicates that, unlike anterior, posterior neural tissue is generated via an intermediary neuromesodermal progenitor (NMP), which contributes paraxial mesoderm as well as posterior neural tube (reviewed (Tzouanacou et al. 2009;Gouti et al. 2015;Henrique et al. 2015;Tsakiridis and Wilson 2015). Human, mouse and chick embryos as well as in vitro NMPs are identified by co-expression of early neural (Sox2) and mesodermal Brachyury (Bra) proteins, but as yet lack unique molecular markers (Olivera-Martinez et al. 2012;Gouti et al. 2014;Turner et al. 2014;Henrique et al. 2015;Tsakiridis and Wilson 2015). While we are beginning to uncover how mouse NMPs are regulated, human NMPs and their derivatives are less well characterised, in part because this requires creation of robust in vitro models.
Most in vitro differentiation protocols are informed by our understanding of how the cell type of interest is generated during embryonic development. In the caudal end of amniote embryos, FGF and Wnt signalling act in a positive feedback loop to maintain the elongation of the body axis (Aulehla et al. 2003;Olivera-Martinez and Storey 2007;Wilson et al. 2009). FGF signalling also promotes expression of genes characteristic of CLE including the transcription factor Nkx1.2 (Delfino-Machin et al. 2005;Sasai et al. 2014). Nkx1.2 expression extends into the preneural tube (PNT) (Spann et al. 1994;Schubert et al. 1995;Rodrigo-Albors et al. 2016). Here preneural progenitors (PNPs) downregulate Bra, transcribe the early neural gene Sox2, but as yet lack neurogenic genes such as Neurog2 and Pax6 (Scardigli et al. 2001;Scardigli et al. 2003;Bel-Vialar et al. 2007) (Fig 1A). Retinoic acid synthesized in neighbouring paraxial mesoderm mediates the transition from PNPs, repressing expression of Fgf8 and Wnts 8a/c and 3a (Shum et al. 1999;Diez del Corral et al. 2003;Sirbu and Duester 2006;Olivera-Martinez and Storey 2007;Cunningham et al. 2015) and is then further required for neurogenic gene transcription (Diez del Corral et al. 2003;Ribes et al. 2008) In addition, to the involvement of these signalling pathways in NMP regulation, inhibition of BMP signalling is required for Sox2 transcription in the CLE/NSB (Takemoto et al. 2006). In mouse and chick embryos, various BMP and TGFb antagonists (Noggin, Chordin and Follistatin) are expressed in the anterior primitive streak, emerging notochord and newly 4 formed somites close to posterior neural tissue (Albano et al. 1994;Liem et al. 2000;Chapman et al. 2002). Considered together with the requirement for BMP antagonism for anterior neural induction (Hemmati Brivanlou and Melton 1997;Harland 2000;Kuroda et al. 2004;Linker and Stern 2004)  Almost all in vitro protocols for making NMPs from mouse and human embryonic stem cells (hESCs) involve exposure to various durations of Wnt agonist with or without FGF (Gouti et al. 2014;Tsakiridis et al. 2014;Turner et al. 2014;Lippmann et al. 2015) and one approach has included TGFb inhibition (to promote loss of self-renewal in human ESC and repress mesendoderm differentiation, after (Chambers et al. 2009)) (Denham et al. 2015). It is well established that efficient induction of anterior neural tissue from hESCs is achieved by exposure to inhibitors of both TGFb and BMP signalling (known as dual-SMAD inhibition) (Chambers et al. 2009). However, a role for BMP inhibition in the differentiation of neural tissue from NMPs in vitro has not been assessed. Here we show that neural differentiation from human NMPs is promoted by transient dual-SMAD inhibition. We deploy CRISPR-Cas9 engineering to make a reporter for enrichment for human NMPs and provide the first transcriptomic profiling of this cell population and derived spinal cord progenitors.

Results and Discussion
Generation of human neuromesodermal progenitors and robust differentiation into posterior neural progenitors by inclusion of transient dual SMAD inhibition In human ESCs the simplest approach to make NMPs involves removal of self-renewal conditions and exposure to FGF and the Wnt agonist CHIR99021 for 3 days. The NMPs generated in this way were then differentiated into neural progenitors by day 6, following replating and culture in basal media alone (Gouti et al. 2014). We assessed the reproducibility of this protocol (Fig. 1B) to generate Pax6 expressing neural progenitors. Culturing hESCs in Neurobasal/1x N2/1x B27 medium (N2/B27) supplemented with 20 ng/ml bFgf and 3 µM CHIR99021 for 3 days readily generated Sox2/Bra co-expressing NMP cells (Fig. 1B, C).
However, subsequent differentiation after cell dissociation and replating in just N2B27 at the end of day 3 (D3), did not generate Pax6 positive cells by end of day 6 (D6) (assessed in two hESC lines, SA121 and H9) (Figs 1D, E). We therefore next included all-trans retinoic acid (RA) 100 nM in this differentiation protocol from the beginning of day 4 (D4). However, this regime also did not elicit reproducible expression of Pax6 by D6 (Fig. 1D, E). This contrasted with a positive control for Pax6 transcription provided by a protocol for inducing anterior neural progenitors, exposure to Noggin 50 ng/ml and the TGFb receptor type 1 inhibitor SB431542 10 µM following removal of self-renewal conditions (dual SMAD inhibition, (Chambers et al. 2009) , Fig. 1D).
This inability to induce Pax6 from NMPs might reflect inherent differences between hESC lines (MasterSheff in Gouti et al 2014, compared with H9 this study), but could also involve variant culture conditions. In particular, while we used EDTA and gentle mechanical agitation to replate at D3, these researchers used Accutase, which can damage extra-cellular matrix and membrane proteins (Beers et al. 2012). This might then reduce cell-cell signalling and could mimic inhibition of BMP signalling, as reported on dissociation of Xenopus animal cap ectoderm (Wilson and Hemmati-Brivanlou 1995). We therefore next employed dual SMAD inhibitors from the beginning of D3 to the end of D4. This period aimed to mimic exposure to endogenous TGFb inhibitors experienced by cells in the CLE and PNT in the amniote embryo (Fig. 1A). Inclusion of this step did not alter induction of NMPs on D3, as judged by generation of Sox2/Bra co-expressing cells ( Fig. 1F and see flow cytometry data Fig.   S1), but, this step alone was still not sufficient to promote reproducible Pax6 expression by D6 ( Fig 1G). However, in combination with subsequent exposure to RA from D4, robust Pax6 was induced in this timeframe (Fig. 1G). Importantly, inclusion of either Noggin or SB431542 alone with RA was not effective (Fig. 1H), indicating that dual SMAD inhibition is required to augment neural differentiation in this context. The reproducibility of this dual SMAD inhibition and RA protocol (dSMADi-RA) (Fig. 1F) was further demonstrated by rapid induction of Pax6 in a hiPSC line (Fig S2: ChiPS4).
To characterize this dSMADi-RA differentiation protocol we analyzed the expression dynamics of key cell state marker genes using quantitative reverse transcription PCR (RT-qPCR). Pluripotency genes Nanog and Oct4 were dramatically reduced from hESC to D3(NMP) and transcripts were lost quickly as these cells differentiated ( Fig. 2A), as observed in mouse and chick embryo and mouse ESC-derived NMPs , Gouti et al., 2014. D3(NMPs) were characterized by high-level Bra and Cdx2 transcription ( Fig 2B). As in mouse ES cell-derived NMPs, Sox2 transcripts were lower in D3(NMPs) than in hESCs, despite highlevel Sox2 protein in NMPs (Gouti et al 2014;Turner et al 2014) (Fig. 1F, 2B). Cdx genes regulate signalling that maintains the mouse NMP cell state and also induce expression of posterior Hox genes, which confer anterior-posterior identity (Young and Deschamps 2009;Young et al. 2009;Gouti et al. 2017). NMPs and their derivatives generated with this protocol expressed Hoxb4 and Hoxc6, indicative of thoracic identity (Fig. 2C). Although expression of 6 these genes commenced in D3(NMPs), their transcription increased during neural differentiation and so must also be indicative of the identity of neural progenitors generated in these conditions. In the embryo, differentiation from NMPs to neural progenitors involves downregulation of Bra and entry into a transitional preneural cell state (Fig. 1A), which is detected in vitro by persisting expression of Wnt8a/c and Nkx1.2 (Fig. 2E). As these genes decline Pax6 is then transcribed, rising to a peak at D8 (Fig. 2D). This suggests that neural progenitors arise between D5 and D8. This protocol therefore provides an assay in which to investigate the human NMP cell state and how spinal cord progenitors are formed.
Generation of a human Nkx1.2 reporter cell line Cell populations generated in vitro are inevitably heterogeneous and so we next made a reporter line that could be used to enrich for NMPs. We took advantage of CRISPR/Cas9 technology (Komor et al. 2017) to engineer H9 hESCs to express GFP under the control of the endogenous Nkx1.2 promoter. This homeo-domain containing transcription factor is highly expressed in NMPs (CLE and NSB) in the embryo and persists in preneural cells (Fig. 1A, 2E) but is not detected in non-neural tissues (Spann et al. 1994;Schubert et al. 1995;Rodrigo-Albors et al. 2016). We reasoned that selection for high Nkx1.2 expression on D3 when Bra transcripts are high would enrich for NMPs. To this end a GFP-T2A sequence (Kim et al. 2011) was knocked-in to the Nkx1.2 locus in-frame just upstream of exon1 ( Differentiation of this GFP-Nkx1.2 reporter line using the dSMADi-RA protocol was then characterized by Western blot; revealing GFP expression up to day 5 (Fig 2I), including low-level GFP in hESC, (consistent with detection of Nkx1.2 in H9 hESCs) (Fig. 2E). Flow cytometry (without GFP antibody) further confirmed GFP expression at D3 in GFP-Nkx1.2 cells compared with auto-fluorescence profile of wild-type H9 differentiated in parallel, which was then lost as cells differentiate (D7) (Fig. 2J). To confirm that Nkx1.2 locus modification did not impair differentiation we used immunocytochemistry and flow cytometry to assess Sox2/Bra co-expression on D3 (Fig. S1) and RT-qPCR ( Fig. S4) to profile expression of marker genes during dSMADi-RA differentiation. These analyses indicated that the engineered line made NMPs and that its differentiation was comparable to that of the parental H9 line (Figs S1, S4, 2A-E). Similar results were obtained with a second GFP-Nkx1.2 line, demonstrating the reproducibility of this approach (Fig. S5).
Identity and conservation of human NMP transcriptional signature We next used this GFP-Nkx1.2 cell line to select for high GFP-expressing cells on D3 using FACS (see Materials and Methods) and generated RNA-Seq data for D3 and D8 cell populations. This was also compared with published RNA-Seq data for H9 hESCs (Chu et al. 2016). Genes more highly expressed on D3 were identified by comparison with hESCs and D8 neural progenitors (Fig. 3A, Table S1). This included expected NMP-associated genes Bra, Cdx1, Sp5, Wnt8A/C, Fgf17, but also new genes, such as membrane protein of unknown function Unc93a and GPRC5a, an orphan G-protein-coupled receptor responsive to retinoid signalling (Cheng and Lotan 1998) This human D3 gene list compared was next with that for genes uniquely upregulated in in-vitro-derived mouse NMPs (Gouti et al. 2014). This identified 31 conserved NMP genes ( Fig. 3C). These include transcription factors known to be expressed in NMPs, Bra, Nkx1.2 and Mixl1, but also newly implicate Mkx (mohawk /Irx1L), (Liu et al. 2006), Alx3 (Beverdam and Meijlink 2001) and Runx3 as transcriptional regulators of this cell population. Predicted signalling pathways, Wnt (Wnt8A, Wnt5A, Dkk4) and TGFb antagonism (Fst, Follistatin) were also represented, along with genes implicating new signalling activities. These include 4 solute carriers mediating citrate (SLC13A5) or amino acid transport (SLC38A8, SLC43A1, SLC6A7).
SLC6A7 is a member of the gamma-aminobutyric acid (GABA) neurotransmitter gene family and strikingly two further genes mediating GABAergic signalling are also conserved: GAD1 (glutamic acid decarboxylase), which synthesizes GABA from glutamate and is transcribed in the mouse tailbud (Maddox and Condie 2001) and GABA receptor GABBR2/GPRC3B. In neurons, GABA-B receptors can trigger inactivation of voltage-gated calcium channels (Padgett and Slesinger 2010). Two further conserved NMP genes, CACNA1C (a calciumchannel auxiliary subunit/CaV1.2) implicated in maintaining calcium-channel inactivation (Soldatov et al. 1997) and ATP2A1 (a calcium transporting ATPase) which maintains low cytoplasmic calcium (Shull et al. 2003), may be additionally operating via different mechanisms to attenuate intra-cellular calcium levels. This is consistent with the requirement for calcium signalling for neural induction identified in chick embryos (Papanayotou et al. 2013) as indicated by Sox2 transcription, which is indeed detected at low-levels in NMPs and then rises in neural progenitors.
As there are not only species differences between these data sets, but also in vitro protocol variation, we additionally compared the human D3(NMP) molecular signature with those obtained for mouse embryonic NMPs at E8.5 and E9.5 using single-cell-RNA-Seq (Gouti et al. 2017). This identified 23 genes conserved between mouse embryo and in vitro derived human NMPs (Fig. 3D). This again included GAD1 and another GABA receptor, GABRG1, belonging to the type-A family, shown to regulate stem cell proliferation (Andang et al. 2008).
GABA biosynthesis is an output of the tricarboxylic acid (TCA) cycle, input to which can come from glycolytic metabolism, which has recently been shown to operate in tailbud progenitor cell populations (Bulusu et al. 2017;Oginuma et al. 2017). It will therefore be important in the future to understand the relationship between this metabolic state and GABA production in NMPs (Fig. 3D).

Transcriptomic characterization of the differentiation protocol
These RNA-Seq data also helped to characterize cell types generated with the dSMADi-RA differentiation protocol. The mesendoderm marker Sox17 was not detected, nor were transcripts from anterior neural genes (Foxg1, En2 and Dlx2) in any condition (<10 reads), while Otx2, which is initially expressed in the early epiblast and the primitive streak (Ang et al. 1996;Henrique et al. 2015), declines sharply from hESCs (Fig. 4A). This is not surprising given hESC exposure to FGF and Wnt signalling for 3 days to generate NMPs, at which time cells begin to express a range of Hox genes, including A1, B1, B4 and A7, (Fig. 4B). In this assay, therefore, NMPs possess a posterior identity prior to their direction along the neural differentiation pathway. Components of signalling pathways regulating NMPs exhibited expected gene expression profiles (Figs 4C, D, E). High-level transcription of neural progenitor and neurogenic genes ( Fig 4F) was detected on D8 and correlated with increased retinoid signalling reported by RARb transcription (Fig. 4G). The expression of both BMP and Shh pathway genes (Fig. 4H,I) on D8 suggested that induced spinal cord progenitors are exposed to dorsal (BMP) and ventral (Shh) patterning signals. However, while dorsal neural progenitor and neural crest associated genes were expressed along with some more ventral progenitor genes (Fig. 4J), ventral-most markers Nkx2.2 and floor plate marker FoxA2 were not detected at D8 (data not shown). The early transcription of neural crest genes in this differentiation assay further suggests that as in the elongated embryonic body axis and in mouse ES-derived
For further differentiation, NMPs were passaged using PBS-EDTA 0.5 mM and seeded back at controlled density (200000 cells cm 2 ) on Geltrex (20µgcm 2 , LifeTechnologies) in the presence of Y-27632 (10 mM, Tocris). From passaging, cells are then cultured in N2B27 containing 100 nM RA for the indicated time to obtain later stage progenitors. In addition to the above, 50 ng/mL Noggin and 10µM SB431542 are added from day 2 to day 4. All experiments with hESCs were approved by the UK Stem Cell Bank steering committee (SCSC14-28 and SCSC14-29).

RTqPCR
Total RNA was extracted using the RNEasy mini kit (Qiagen), following the manufacturer's instructions, with the addition of a DNAse digestion step performed on the column for 15 min with RQ1-DNase (Promega). After initial denaturation for 5 min at 70°C in presence of 1 µg random primers, 500 ng of RNA per sample were reverse transcribed for 1h in 20 µL reaction volume containing 0.5 mM dNTPs, 5 mM MgCl2, 1X ImProm-II RT buffer, 20 U RNasin and 160 U of ImProm-II RT (Promega). Samples were incubated for 15 min at 70°C to stop the reaction. qPCR analysis was performed using primers described in table 6 on either a Mastercycler RealPlex2 (Eppendorf) or an AriaMX (Agilent) device in presence of PerfeCTa SYBR Green SuperMix for iQ (Quanta Biosciences) or BrilliantIII SYBRgreen PCR MasterMix (Agilent) respectively. Relative expression was calculated using the DDCt method, normalizing each gene of interest to Gapdh levels.
Quadrant gates used to estimate the percentage of positive cells were designed based on fluorescence levels detected in the control samples processed without primary antibodies.

GFP-Nkx1.2 engineering
The donor plasmid construct, pDonorNkx1.2NterKI, was synthesized by GeneArt. The vector is based on a pMK-RQ backbone and contains a Kanamycin resistance cassette and the GFP-T2A insert flanked by 500 bp homology arms for recombination to the Nkx1.2 5' end. The second plasmid used, px335Nkx1.2NterKIas, encoded the Cas9D10A nickase (Cong et al., 2013) and the antisense gRNA (asgRNA GCCCACGGGCCGGCGGTCGG). A third plasmid, pBABEDpU6Nkx1.2NterKIs, included the sense gRNA (sgRNA GCTGGCATGGCAGGACGGCG) and a puromycin resistance cassette to select transfected cells. CRISPR-Cas9 mediated gene targeting was performed as follows. H9 hESC were dispersed to single cells using TryLEselect (thermofisher) and re-suspended in DEF medium in the presence of Y-27632 (10mM, Tocris).
For transfection, 5x10 6 cells were pelleted by centrifugation at 300 xg for 3 minutes, washed with PBS and re-suspended in 100µL buffer R (Neon Transfection Kit, Thermofisher). 4ng of pDonorNkx1.2NterKI, 2ng of px335Nkx1.2NterKIas and 2ng of pBABEDpU6Nkx1.2NterKIs was added to the mix. Electroporation was performed with the Neon Transfection System (Thermofisher) using the following parameters: 1 pulse, 1150V, 30ms. Transfected cells were plated and allowed to recover for 36 hours, puromycin selection was applied for a further 36 hours. Clones were left to grow until easily visible, hand-picked and seeded back in 96 well plates before being amplified. Screening of the clones for GFP integration was performed by PCR using primers amplifying across the insertion sites (GFPcheckFw1+GFPcheckRev1, see the table below for sequences) (Fig. 2G). Correctly targeted integration of the GFP-T2A sequence was checked in 40 transformed hESC clones by PCR across the integration site. Overlapping PCR amplification products spanning the locus from outside the homologous region to inside the GFP sequence on both sides of the integration were sequenced (GFPcheckFw1+GFPcheckRev2 and GFPcheckFw2+GFPcheckRev1, see the table below for sequences). Five clones were found to include the GFP-T2A sequence at the correct locus and these were all heterozygous for GFP-Nkx1.2 (Fig. 2G). PCR bands obtained for GFP-Nkx1.2 clone 5 were sequenced to check for integrity of the recombination borders and absence of mutations. Results were combined and detailed sequence of the engineered allele was obtained (Fig. S3) (Chu et al. 2016) were re-analysed in the same fashion, however, we point out that these were single-end non-stranded reads. For the following analysis, 4 biological replicates were used for D3(NMP) and 2 for D8 samples.
Differential expression was performed with edgeR 3.16.5, for each pair of conditions independently. Benjamini-Hochberg multiple-test correction was applied to test p-values.
Human NMP genes (Fig. 3A) were determined by selecting genes using the following criteria: at least 10 read counts in D3(NMP), significantly enriched (p value < 0.01) in D3(NMP) compared to both hESC and hD8 samples, with a foldchange >2. Time-dependent properties of genes were studied using intensity profiles hESC-D3(NMP)-D8. Each point in the profile is a DESeq-normalized mean gene count across replicates. To make profiles comparable, they were normalized to their mean across conditions, so the mean of each normalized profile is 1.           Sequencing of the GFP-T2A insertion site in the correctly targeted clone used in this study. Grey: plasmid sequence, highlighted yellow: Primers used for cloning, highlighted blue: gRNA (antisense and sense), highlighted green: GFP sequence, orange: T2A sequence, bold black: Nkx1.2 Exon1, peach: repetitive sequences. Figure S4 Diffentiation evaluation of the GFP-Nkx1.2 clone used in this study Expression of selected marker genes was analyzed by RTqPCR during differentiation of the GFP-Nkx1.2 line following the protocol presented Figure 1F. Graphs represent the expression of each individual gene normalized to Gapdh and relative to hESC levels. Average of 3 independent RTqPCR experiments, error bars: standard deviation.