Supplemental Figure 1
Fig. S1. Relative over-representation of specific
CAGGTAG-related heptamers in genes ranking highest in a microarray analysis of
transcript levels 30 minutes into nuclear cycle 14 (time point T1 data; Pilot
et al., 2006) confirms and extends inferences from the pre-CB/post-CB
comparison in Fig. 2. The occurrence of
CAGGTAG and all its 1-bp degenerates in successive 100-gene cohorts of the top
500 ranked genes, based on mRNA levels at time point T1, is compared with that
for all genes at T1 from rank 1001 to 13966, within the promoter-proximal 500
bp of upstream sequence of each gene. The expectation that pre-CB genes that do not have
maternally deposited transcripts will be in the highest rank cohorts, while
only a tiny fraction of all genes below rank 1000 will be pre-CB, is borne out
by the fact that, of the 23 certified pre-CB genes without maternal transcripts
on which the analysis of Fig. 2 was based, 14 appeared in the top T1 cohort
from Pilot et al., four in the second, four in the third, and none in the
fourth and fifth. The probability by a chi-square test (incorporating Yates's
correction) that the difference in occurrence of the indicated heptamer among
the top ranked genes relative to the lower ranked genes would arise by chance
is indicated on the y-axis as –log10(P), with a baseline cutoff of P = 0.01 (–log10(0.01) = 2). All cases of a significant
difference arose from over-representation of the indicated heptamer among
high-ranked genes. Although this comparison was based on a much
larger data set than the analysis in Fig. 2, the conclusion was unchanged: the
three functionally certified TAGteam sequences, CAGGTAG, tAGGTAG, and CAGGcAG,
are greatly over-represented in pre-CB genes relative to genes expressed later,
with CAGGTAG leading the pack by a wide margin. However, the additional data
here highlight two other sequences, CAGGTAa and CAGGTAt, the relative
over-representation of which among pre-CB genes appears to be at least as
significant as that for certified TAGteam member CAGGcAG. Discrepancies between
the genome annotation used for the Pilot et al. microarray analysis and that
(release 4.3) used for the statistical test here eliminated 20.2% of the
top-ranked 500 entries and 19.1% of the low-rank group.
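
The comparison described in this legend can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the figure. It assumes that each gene is scored for presence or absence of a heptamer in its 500 bp upstream sequence, that only the given strand is searched, and that the test is a 2x2 chi-square with Yates's continuity correction (1 degree of freedom). All function names and the toy sequences are hypothetical.

```python
import math

BASES = "ACGT"

def one_bp_degenerates(core):
    """Return the core heptamer followed by every sequence differing at one position."""
    variants = [core]
    for i, ref in enumerate(core):
        for base in BASES:
            if base != ref:
                variants.append(core[:i] + base + core[i + 1:])
    return variants

def genes_with_motif(upstream_seqs, motif):
    """Count genes whose upstream sequence contains the motif.
    Assumption: presence/absence per gene, given strand only."""
    return sum(1 for seq in upstream_seqs if motif in seq.upper())

def yates_chi_square(a, b, c, d):
    """Chi-square with Yates's correction for the 2x2 table [[a, b], [c, d]].
    Returns (chi2, p) for 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a margin is empty, so no test is possible
    corrected = max(0.0, abs(a * d - b * c) - n / 2.0)
    chi2 = n * corrected ** 2 / denom
    # For 1 degree of freedom, the upper tail is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

def neg_log10_p(p):
    """Value plotted on the y-axis; P = 0.01 corresponds to 2."""
    return float("inf") if p <= 0.0 else -math.log10(p)

def compare_cohorts(top_seqs, low_seqs, motif):
    """Compare motif occurrence in a high-rank cohort with the low-rank group."""
    a = genes_with_motif(top_seqs, motif)
    c = genes_with_motif(low_seqs, motif)
    chi2, p = yates_chi_square(a, len(top_seqs) - a, c, len(low_seqs) - c)
    return {
        "motif": motif,
        "top_hits": a,
        "low_hits": c,
        "chi2": chi2,
        "p": p,
        "neg_log10_p": neg_log10_p(p),
        # True when the motif is proportionally more frequent in the top cohort.
        "over_in_top": a * len(low_seqs) > c * len(top_seqs),
    }

if __name__ == "__main__":
    # Toy sequences for illustration only; these are not data from the study.
    top = ["ttCAGGTAGaa", "ACGTACGT"]
    low = ["ACGTACGT", "GGGGCCCC", "TTTTAAAA"]
    for motif in one_bp_degenerates("CAGGTAG")[:3]:
        print(compare_cohorts(top, low, motif))
```

In this sketch a cohort would be scored once per heptamer, giving 22 tests per cohort (CAGGTAG plus its 21 1-bp degenerates); a bar would rise above the baseline only where the –log10(P) value exceeds 2.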