Seven mature colonies of R. speratus were collected in Toyama Prefecture in 2016 and 2019. A mature colony of H. sjostedti was collected in 2015 on Yakushima Island, Kagoshima Prefecture. Two mature colonies of N. takasagoensis were found on Ishigakijima Island, Okinawa Prefecture, in 2017 and 2018. They were found in the laboratory at about 25 ° C in constant darkness. Three relatives of C. punctulatus were collected at Mountain Lake Biological Station, Giles County, VA in April 2015-201741 and kept in constant darkness at 15 ° C. Testes and ovaries were dissected from three males and three females, respectively, and stored at −80 ° C until RNA was extracted for subcloning and expression analysis of Cpun_dsx.
BLAST searches for dsx and dmrt homologues in termites and cockroaches
tBlastX searches were performed in genome and transcriptome databases of six termite species and one Cryptocercus cockroach species (Table 1) using SequenceServer42. Either the OD1 (45 amino acids) or OD2 (45 amino acids) sequences of B. germanica dsx were used as queries. Sequences with an E value less than 10-5 were selected as candidates for dsx orthologues and used for phylogenetic analysis.
Construction of a phylogenetic tree from OD1 sequences and identification of dsx orthologues
To identify the dsx ortholog, a phylogenetic tree of OD1 sequences (135 bp with no gaps) was constructed according to previous studies5,43,44. We used 84 OD1-containing genes (Table S3 ‘OTU’). The DMRT7 genes from vertebrate animals (mouse and cattle) were used as outgroups. Phylogenetic relationships were derived using Bayesian Inference (BI) and Maximum Likelihood (ML) methods. For BI, the most suitable model of sequence development was determined using the model selection option implemented in MEGA version 7.0.2145 and the GTR + G model was selected. With MrBayes version 3.2.646 a total of 100,000 trees were obtained (ngen = 10,000,000, samplefreq = 100). The first 25% of them (25,000) were discarded as burn-ins and a consensus tree was created with a 50% majority rule. For ML, 1000 bootstrap replicas were performed based on the same sequence development model as BI in MEGA 7.0.21 with the default tree inference options.
According to Wexler et al.9, the synteny of dsx was examined on the basis of its proximity to the transcription factor prospero (pros). The dsx is at the genomic position near Pros with a distance between 17 and 245 kb in eight different insect orders (Blattodea, Orthoptera, Ephemeroptera, Phthiraptera, Thysanoptera, Hymenoptera, Coleoptera and Lepidoptera) 9.23. To identify the pro-ortholog in each termite, tBlastX searches were carried out against gene models of R. speratus24 and M. natalensis18 with SequenceServer42 and BlastX searches against predicted protein databases from Cryptotermes secundus15 and Coptotermes formosanus22. The pros sequence from B. germanica (Accession No. PYGN0100615) was used as a query. To reveal the genomic position, tBlastN searches were then carried out against the genome database of Cryptotermes secundus15 and Coptotermes formosanus22, querying the obtained pros protein sequence in each species using SequenceServer42.
Subcloning the candidate dsx orthologues
Female and male Alat reproductive animals (winged adults) of R. speratus were obtained from three colonies collected in 2016. According to previous studies47,48 incipient colonies were formed using unrelated female and male alates. A total of 30 eggs were obtained and 1.5 months after colony establishment, 3 primary queens and 3 primary kings were randomly selected from 20 colonies. Total RNA from all eggs was extracted and treated with DNase I using the ReliaPrep RNA Tissue Miniprep System (Promega, USA). The total RNA of the primary queens or kings was extracted from the body (excluding the head parts, three individuals of each sex) using ISOGEN II (Nippongene, Japan). Total RNA of female or male alates of N. takasagoensis (using the colony collected in 2017) was extracted from the body (with the exception of the head parts, five individuals per sex) with ISOGEN II. Total RNA of the secondary queen or king of H. sjostedti (using the colony collected in 2015) was extracted from the body (except for the head part, one individual per sex) with ISOGEN II. C. punctulatus ovary and testicle were dissected from adult female and male respectively. Total RNA was extracted from each of the gonads using ISOGEN I and II and then treated with RNase-free DNase I (Takara, Japan). The purity and amount of the extracted RNA were measured using a NanoVue spectrophotometer (GE Healthcare Bio-Sciences, Japan).
The 5 ‘and 3’ ends of Cpun_dsx were RENNEN with the SMARTer RACE 5 ‘/ 3’ kit (TaKaRa, Shiga, Japan) and gene-specific primers for OD2 (“Cpun_dsx OD2 5’RACE” and “Cpun_dsx OD2 3 ‘ “, Table S1). Its male-specific exon was amplified with Advantage® 2 Polymerase Mix (TaKaRa), the gene-specific primer “Cpun_dsx OD2 3’RACE” and a reverse primer in the male-specific exon of Blattella dsx, which was located at the terminal codon, and 3 ′ UTR (“Cpun_dsx male-specific exon-R”, Table S1). The 3 ‘ends of Rspe_dsx, Hsjo_dsx and Ntak_dsx were also amplified using the SMARTer RACE 5’ / 3 ‘kit and redesigned gene-specific primers (Table S1). The amplified fragments were subcloned into the pGEM-T easy vector system (Promega) and the nucleotide sequences of each fragment were determined using the ABI Prism Big Dye Terminator v3.1 Cycle Sequencing Kit in conjunction with a 3500 Genetic Analyzer (Applied Biosystems, Foster®) certain city, California, USA). The newly identified sequences of Rspe_dsx, Ntak_dsx and Hsjo_dsx were deposited in the GenBank / EMBL / DDBJ database under the accession numbers: LC635717, LC635718 and LC635719 (Table 1).
RNA-Seq analysis in Reticulitermes speratus
RNA-seq data from R. speratus were used to compare the expression levels of Rspe_dsx (Table 1) between three castes (workers, soldiers and reproductive animals), two parts of the body (heads and the rest of the parts) and two sexes (biological triplicates) ; NCBI BioProject accession number PRJDB558921). The filtered RNA-Seq reads were mapped to their genome assembly using TopHat v2.1.02120. The frequencies of transcripts were then estimated using the featureCounts program of the Subread package49. To compare gene expression levels between castes and between the sexes, counts per million (CPM) were first calculated from the estimated transcript frequencies. Genes with at least CPM of 1 in at least three samples were saved for subsequent analysis. The CPM values were then normalized with the trimmed mean of the M values (TMM) algorithm in edgeR50. Differential gene expression analyzes were performed separately for each body part using a GLM with two factors, namely caste and gender, implemented in edgeR, and then genes with a false detection rate (FDR) <0.01 were identified as genes expressed in one gender . certain way. RPKM values (Reads Per Kilobase Million) were calculated by dividing the CPM values by the length of the genes in kilobases.
Real-time quantitative PCR
Gene-specific primers were designed against Rspe_dsx, Ntak_dsx and Cpun_dsx using Primer3Plus51 for real-time quantitative PCR (Table S2). The total RNA of female and male R. speratus nymphs (from the colony collected in 2019) was extracted individually from whole bodies (10 individuals of each sex) with ISOGEN II. The total RNA of male (small and medium-sized) workers, female (medium and large) workers and male soldiers of N. takasagoensis (from the colony collected in 2018) was extracted from the whole body of five different individuals using ISOGEN II. Total RNAs obtained from adult cockroach gonads were also used. The DNase treatment was carried out using the same procedure described above. According to previous studies47,48 incipient colonies (queen-queen and two-queen colonies) of R. speratus were established using alates from three colonies collected in 2019 and develop into both sexes or only females. Total RNA was extracted from 30 sexual and 30 asexual eggs (in triplicate) and treated with DNase I using the ReliaPrep RNA Tissue Miniprep System (Promega). cDNAs were synthesized from the purified RNA using a high-capacity cDNA reverse transcription kit (Applied Biosystems). Biological replications (n = 3–5) were made for each category. Expression analyzes were performed using Thunderbird SYBR qPCR Mix (Toyobo, Japan) with a MiniOpticon Real-time PCR System (Bio-Rad, Japan) and an Applied Biosystems 7500 Fast Real-Time PCR System (Applied Biosystems).
To determine a sustainable internal control gene for R. speratus, the expression levels of six genes [EF1-α (accession no. AB602838), NADH-dh (no. AB602837), β-actin (no. AB520714), GstD1 (gene id RS001168), EIF-1 (RS005199), and RPS18 (RS015150)] were evaluated with the software GeNorm52 and NormFinder53 (Table S4). According to previous studies41,54 there are three genes for C. punctulatus and N. takasagoensis [β-actin (nos. Cp_TR19468 and AB501107), NADH-dh (Cp_TR49774 and AB50119), and EF1-α (AFK49795 and AB501108)] evaluated (Tables S5 and S6). All gene specific primers were designed using Primer3Plus (Table S2). We confirmed the amplification of a single PCR product using dissociation curves and product sizes.
The expression levels were statistically analyzed using GLMs with gamma error distribution and log-link function. The fixed effects were gender, caste and their interaction. Multiple comparisons were carried out using general linear hypothesis tests (Glht function) with Tukey fitting in the “multcomp” R package55. These analyzes were carried out with R ver. 4.0.3 (available at http://cran.r-project.org/).
Search for transcription factor binding motif
The Reticulitermes genome was loaded into the database of HOMER v4.11 with the program “loagGenome.pl”. The “findMotifsGenome.pl” program was executed to search for Drosophila motif collections against the promoter of Rspe_dsx, which was the 1.0 kbp upstream region from its transcription start site, using the “-mcheck” option. The nucleotide sequences of Drosophila transcription factors detected in their binding sites were subjected to a BlastX search for orthologues against the protein database of R. speratus41 using BLAST + (2.10.1) 56. The expression levels of these orthologues were extracted from their transcriptome data21 and statistically analyzed using GLMs with gamma error distribution and log-link function, as mentioned above.