Microbial gene expression was followed for 23 days within a
mesocosm (M1) isolating 50 m
We observed high expression of the
Gene expression of diazotrophic cyanobacteria was mainly attributed to
In the study of natural marine microbial populations, it is of fundamental
interest to identify the biota these populations consist of and to elucidate
their transcriptional activities in response to biotic or abiotic changes in
the environment. Metatranscriptomics gives insight into these processes at
high functional and taxonomic resolution, as shown, e.g., in the analysis of
a wide range of marine microbial populations
(Frias-Lopez
et al., 2008; Ganesh et al., 2015; Gifford et al., 2014; Hewson et al.,
2010; Hilton et al., 2015; Jones et al., 2015; Moran et al., 2013; Pfreundt
et al., 2014; Poretsky et al., 2009; Shi et al., 2009; Steglich et al.,
2015; Wemheuer et al., 2015). Here, we report the results of a
metatranscriptome analysis from the VAriability of vertical and tropHIc transfer of diazotroph derived N in the
south wEst Pacific (VAHINE) mesocosm experiment, whose
overarching objective was to examine the fate of diazotroph-derived nitrogen
(DDN) in a low-nutrient, low-chlorophyll (LNLC) ecosystem
(Bonnet et al., 2016b). In this experiment, three
large-scale (
Samples were collected in January 2013 every other day at 07:00 LT from mesocosm
1 (hereafter called M1) and from the Nouméa lagoon (outside the
mesocosms) in 10 L carboys using a Teflon pump connected to PVC tubing. To
ensure quick processing of samples, the carboys were immediately transferred
to the inland laboratory setup on Amédée Island, located 1 nautical
mile off the mesocosms. Samples for RNA were prefiltered through a 1 mm mesh
to keep out larger eukaryotes and then filtered on 0.45
The samples were treated by TurboDNase (Ambion, Darmstadt, Germany),
purified with RNA Clean & Concentrator columns (Zymo Research, Irvine,
USA), followed by Ribozero (Illumina Inc., USA) treatment for the depletion
of ribosomal RNAs. To remove the high amounts of tRNA from the rRNA depleted
samples, these were purified further using the Agencourt RNAClean XP kit
(Beckman Coulter Genomics). Then, first-strand cDNA synthesis was primed
with an N6 randomized primer. After fragmentation, Illumina TruSeq
sequencing adapters were ligated in a strand-specific manner to the 5' and
3' ends of the cDNA fragments, allowing the strand-specific PCR
amplification of the cDNA with a proof-reading enzyme in 17–20 cycles,
depending on yields. To ensure that the origin of each sequence could be
tracked after sequencing, hexameric TruSeq barcode sequences were used as
part of the 3' TruSeq sequencing adapters. The cDNA samples were purified
with the Agencourt AMPure XP kit (Beckman Coulter Genomics), quality
controlled by capillary electrophoresis and sequenced by a commercial vendor
(Vertis Biotechnologie AG, Germany) on an Illumina NextSeq 500 system using
the paired-end (2
Raw paired-end Illumina data in FASTQ format were pretreated as follows
(read pairs were treated together in all steps to avoid producing singletons):
adapters were removed and each read trimmed to a minimum Phred score of 20
using cutadapt. This left 386 010 015 pairs of good-quality raw reads for
the 22 samples. Ribosomal RNA reads were removed using SortMeRNA
(Kopylova et al., 2012). The resulting non-rRNA reads
(corresponding to a total of 155 022 426 pairs of raw reads binned from all
samples) were used as input for de-novo transcript assembly with Trinity
(Haas et al., 2013) using digital normalization
prior to assembly to even out kmer coverage and reduce the amount of input
data. Remarkably, data reduction by digital normalization was only
The transcript assembly led to 5 594 171 transcript contigs with an N50 of
285 nt, a median contig length of 264 nt, and an average of 326 nt.
Transcript abundance estimation and normalization were done using scripts
included in the Trinity package.
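The assembly statistics quoted above (N50, median, and mean contig length) can be recomputed from any list of contig lengths. The following is a minimal sketch in Python, not the script used in this study; the function name `n50` and the toy lengths are illustrative assumptions.

```python
from statistics import mean, median


def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in nt (hypothetical, not data from this study).
toy_lengths = [200, 300, 400, 500, 600]
toy_n50 = n50(toy_lengths)
toy_median = median(toy_lengths)
toy_mean = mean(toy_lengths)
```

An N50 close to the median, as reported here, is typical of assemblies dominated by many short contigs of similar length.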
During manual analysis of the top 100 transcript contigs according to their mean expression over all samples, we found 9 transcripts to be residual ribosomal RNA or internal transcribed spacer sequences. These contigs were removed from the count and TPM matrices for all multivariate statistical analyses. Absence of these rRNA transcripts in the Diamond output was checked and verified.
The matrix with expected counts for each transcript contig (see Sect. 2.2)
was used as input for differential expression (DE) analysis with edgeR
(Robinson et al., 2010) as implemented in the Trinity package script
Nonmetric multidimensional scaling (NMDS) was performed in R on the transposed matrix containing all 3 844 358 transcript contigs and their respective TMM-normalized TPM values. First, the matrix values were standardized to row totals (sample totals) with the decostand function of the vegan package (Oksanen et al., 2015). Then, metaMDS was used for calculation of Bray–Curtis dissimilarity and the unconstrained ordination.
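The two preparatory steps described above, standardization to sample totals and calculation of Bray–Curtis dissimilarity, can be illustrated outside R. The sketch below is plain Python and only mirrors the arithmetic; it is not the vegan implementation, it does not perform the ordination itself, and all function names and toy values are assumptions.

```python
def standardize_rows(matrix):
    """Divide every value by its row (sample) total."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(matrix):
    """Pairwise Bray-Curtis dissimilarities between all rows (samples)."""
    return [[bray_curtis(r1, r2) for r2 in matrix] for r1 in matrix]


# Rows are samples, columns are transcript contigs (toy values).
toy = [[2.0, 2.0], [1.0, 3.0]]
toy_std = standardize_rows(toy)
toy_dist = dissimilarity_matrix(toy_std)
```

After standardization each sample sums to 1, so the dissimilarity reflects differences in relative transcript composition rather than in total abundance.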
A list with genes of interest was created using the Integrated Microbial
Genomes (IMG) system (Markowitz et al., 2015).
First, 147 genomes close to bacteria and archaea found in the samples (based
on 16S rRNA sequences; Pfreundt et al., 2016) were
selected using “find genomes”. Then, the “find genes” tool was used to
find a gene of interest (for example
From the full Diamond output (Sect. 2.2), all matching transcripts together with their taxonomic and functional assignment were extracted and false positives were discarded (i.e., transcripts that mapped to the list of specific genes but had a different Diamond hit). The top hit for each transcript was extracted, and the protein classifications were manually curated to yield one common description per function (from different annotations for the same protein in different genomes). The TMM-normalized TPM counts were added to each transcript classification, as well as the full taxonomic lineage from NCBI taxonomy. These taxonomic lineages were curated manually to align taxonomic levels per entry.
The table was imported into R, all counts per sample were summed up for each combination of protein and family-level taxon, and a matrix was created with samples as row names and combined protein and family descriptions as column names. Heat maps were created separately for each protein group (e.g., rhodopsins or sulfolipid biosynthesis proteins), scaling all values to the group maximum.
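The aggregation and scaling described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the R code used for the figures: the record layout `(sample, protein, family, tpm)`, the function names, and the placeholder names `familyA` and `familyB` are all hypothetical.

```python
from collections import defaultdict


def sum_by_protein_and_family(records):
    """Sum TPM per sample for each (protein, family) combination.

    records: iterable of (sample, protein, family, tpm) tuples.
    Returns {sample: {(protein, family): summed_tpm}}.
    """
    table = defaultdict(lambda: defaultdict(float))
    for sample, protein, family, value in records:
        table[sample][(protein, family)] += value
    return {sample: dict(cols) for sample, cols in table.items()}


def scale_to_group_max(table, proteins_in_group):
    """Keep only columns of one protein group and divide by the group maximum."""
    group_values = [
        value
        for cols in table.values()
        for (protein, _family), value in cols.items()
        if protein in proteins_in_group
    ]
    group_max = max(group_values, default=0.0)
    return {
        sample: {
            key: (value / group_max if group_max else 0.0)
            for key, value in cols.items()
            if key[0] in proteins_in_group
        }
        for sample, cols in table.items()
    }


# Toy records (hypothetical values and family names).
toy_records = [
    ("day2", "proteorhodopsin", "familyA", 4.0),
    ("day2", "proteorhodopsin", "familyA", 6.0),
    ("day4", "proteorhodopsin", "familyA", 5.0),
    ("day4", "bacteriorhodopsin", "familyB", 20.0),
]
toy_table = sum_by_protein_and_family(toy_records)
toy_scaled = scale_to_group_max(toy_table, {"proteorhodopsin", "bacteriorhodopsin"})
```

Scaling to the group maximum rather than per line keeps the rows within one protein group comparable, so the darkest cell in a heat map always corresponds to the highest summed TPM in that group.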
Flowchart describing the major steps in the bioinformatics workflow. Preprocessing of RNA-Seq reads was done separately for each data set, leading to 22 data sets of nonribosomal paired-end reads. These were binned and used as input for de-novo assembly of transcripts. The nonribosomal reads were mapped back onto the assembled transcripts with bowtie (Langmead, 2010) to infer each transcript's abundance in each sample using RSEM (Li and Dewey, 2011). Raw abundances were used for differential expression (DE) analysis and cluster analysis with edgeR on the M1 and lagoon count matrices separately, to find transcripts which changed significantly over time. To enable direct between-sample comparison of transcript abundances, raw abundances were converted to TPM (transcripts per kilobase million) and TMM normalized (trimmed mean of M values) in RSEM, creating the final count matrix used for all figures showing transcript abundances. Classifications for these transcripts were generated using Diamond (Buchfink et al., 2015) against the RefSeq protein database. Further, a manually curated list with specific genes involved in N and P metabolism, as well as light utilization (genes of interest, GOIs), was used to extract the corresponding transcripts, but final classifications were inferred from the Diamond output. This information was used to produce the integrated function-per-taxon heat maps.
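The conversion from raw abundances to TPM named in the workflow can be illustrated with a short calculation: counts are first divided by transcript length and the resulting rates are then rescaled so that each sample sums to one million. The Python sketch below shows only this step and is not the RSEM or Trinity code. As a simplifying assumption it uses plain transcript lengths, whereas tools such as RSEM work with effective lengths, and it omits TMM normalization entirely. The function name and toy values are hypothetical.

```python
def tpm(counts, lengths):
    """Convert per-transcript read counts of one sample to TPM.

    counts:  read (or fragment) counts per transcript.
    lengths: transcript lengths in nt, same order as counts.
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    # Length-normalized rates: reads per nucleotide of transcript.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that all values of the sample sum to one million.
    return [r / total * 1e6 for r in rates]


# Toy example: the second transcript is twice as long and has twice the
# reads, so both transcripts receive the same TPM.
toy_tpm = tpm([10, 20], [500, 1000])
```

Because every sample sums to the same total, TPM values are relative: an increase for one transcript necessarily lowers the values of others in that sample.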
The metatranscriptomic data were analyzed following the strategy outlined in Fig. 1. We obtained taxonomic assignments for 37 % of all assembled transcript contigs. This reflects the fact that the genes of complex marine microbial communities, especially from infrequently sampled ocean regimes like the southwest Pacific, are still insufficiently covered by current databases. The data with taxonomic assignments thus give an overview of the gene expression processes during this mesocosm experiment. With this study, we aimed to identify global differences in expression patterns between the mesocosm and the lagoon, as well as between the different sampling time points within the mesocosm. We further explored the expression of marker genes for N and P metabolism and for light utilization in the different taxonomic groups.
Gene expression changes roughly followed the timeline, within both M1 and
the Nouméa lagoon, with some exceptions (Fig. 2). For the lagoon,
samples from day 20 and 23 clustered together, the samples from day 10 to 18
formed a mid-time cluster, and those from day 2 to 8 an early cluster (Fig. 2b). In M1, the samples from day 6 to 10 and day 12 to 20 clustered together
(Fig. 2a). Deviating from the timeline, the sample from day 2 was placed
close to day 20, day 23 was separated from the late cluster, and day 4,
exhibiting a prominent subcluster of transcripts upregulated only that day,
was the furthest apart from all other samples (Fig. 2a, black brackets).
Closer inspection of this subcluster containing several hundred different
transcripts identified > 80 % of them as
Heat map showing the expression (median-centered
log
Unconstrained ordination using nonmetric multidimensional scaling (NMDS)
confirmed the similar temporal distribution of samples from the Nouméa
lagoon and M1 (Fig. 3). Yet, the samples from M1 showed a much higher
variance and were more dispersed than those from the lagoon (Fig. 3).
Thus, the gene expression profiles within the mesocosm were more diverse
than in the lagoon waters. The comparison of the whole data set against the
KEGG database (Kanehisa et al., 2014) showed a major difference between M1
and the lagoon samples only in the category energy metabolism, and its subcategories
photosynthesis and antenna proteins. These categories comprised
22–36, 8–16, and 2.7–7.5 % in the lagoon, respectively, but in M1
(excluding day 23) remained consistently below 22, 7, and 4 %,
respectively (Supplement Figs. S1 and S2). This lower contribution of
energy-related functions in M1 was detectable already at the earliest time
point (day 2). Furthermore, diverging dynamics in the microbial community
composition and transcriptional activity were triggered in M1 within
the first 48 h (before day 2 was sampled), as indicated by the large distance
between M1 and lagoon samples on day 2 (Fig. 3). This early onset
suggests a rapid remodeling of the microbial
community's gene expression upon confinement within the mesocosm. In
addition, the DIP spike on the evening of day 4 subsequently triggered distinct
ecological successions in M1. The patterns we observed here are
close to the three temporal phases defined for the VAHINE experiment based on
biogeochemical flux measurements (Bonnet et al., 2016a)
and
In the following sections, we refer to P0, P1, or P2 to describe trends and changes in gene expression when appropriate.
The most striking difference between M1 and Nouméa lagoon samples was
the 2- to 3-fold dominance of
NMDS ordination of samples on the basis of TPM counts (transcripts per million sequenced transcripts). Samples from outside the mesocosms are blue, samples from M1 are orange. Samples from M1 are more dispersed in the plot, indicating that transcription profiles are more diverse than outside. This might be due to the DIP fertilization creating a distinct ecological succession in M1.
Owing to the initial decay of
Comparison of the taxonomic affiliation of mRNA transcripts from M1 and the lagoon in the three chronological phases P0, P1, and P2, visualized with CoVennTree (Lott et al., 2015). Normalized transcript abundances (TMM-normalized TPM), displayed as the node area, were summed up per phase as follows. P0: day 2–day 4, P1: day 6–day 14, P2: day 16–day 23. The root nodes differ in size because different transcripts with differing total read abundances may be classifiable in the different data sets, and the data set normalization included all transcripts (including nonclassifiable ones). The overlap of the red (M1) and blue (lagoon) circles denotes the amount of transcripts present in both locations during the respective phase. The diagrams were reduced to show only major nodes and are therefore not complete. Yet, each node contains the information from all its child nodes, including those not shown. Archaea are scarcely represented in the current RefSeq protein database, so their transcript abundances are underestimated here.
Unexpectedly, over 3 weeks, the temporal pattern of SAR11
transcription appeared tightly coordinated with that of SAR86
Gammaproteobacteria (Figs. S3, S4). We tested pairwise
correlations of alpha- and gammaproteobacterial groups and found that SAR11
and SAR86 transcript accumulation were highly correlated in M1 and the
Nouméa lagoon (Fig. S6, Pearson correlation:
A closer look into gammaproteobacterial activities (Figs. S3c,
S4c) revealed a dominant pool of transcripts from the oligotrophic marine
Gammaproteobacteria (OMG) group
(Cho
and Giovannoni, 2004; Spring et al., 2013) and
Other groups following the dominant classes
Gene expression in putative diazotrophic cyanobacteria inside M1 and in the Nouméa lagoon. Note the square-root scale for both plots and the generally higher transcript abundances inside M1. Transcriptional activity is presented in TPM (transcripts per million transcripts sequenced), normalized between samples by TMM normalization (edgeR). Thus, plots can be directly compared, but values are relative.
We specifically examined the gene expression patterns of diazotrophic
cyanobacteria (Fig. 5) and compared them with parallel analyses of
The 100 most highly expressed nonribosomal transcripts, as identified by
highest mean expression in all samples, are presented in Supplementary Table S1. Of these, 24 could not be classified with the NCBI nucleotide or protein
databases and remain unknown. The most abundant transcript overall, both
inside M1 and outside, was the non-protein-coding RNA (ncRNA) Yfr103,
discussed in Sect. 3.4. All classified transcripts on the top 9 ranks plus
28 additional transcripts in M1 were related to
To investigate gene-specific expression patterns, we analyzed genes of interest (GOIs) from specific genera. Transcripts mapping to the respective genes from different organisms were extracted, searched against NCBI's nonredundant protein database, and the hits were manually curated. This analysis was only performed for the M1 samples.
Genes indicative of different nitrogen utilization strategies are shown in
Fig. 6. The selected GOIs were related to nitrogen fixation, nitrate and
nitrite reduction, the uptake and assimilation of ammonia (transporter AmtA
and glutamine synthetase,
For
Expression of selected genes indicative of different nitrogen
acquisition strategies. TPM counts were summed per taxonomic family, the
names of which are denoted to the right of each line. The maximum TPM value
for each group is written below the name of that group. For plotting, values
were scaled within each functional group, but not for each line, resulting
in the maximum color density always representing the maximum TPM. After the
name, in brackets, is additional annotation information where it differs from the
name of the functional group. The
Most other bacteria require ammonia, nitrate, or organic nitrogen sources
such as urea, with ammonia being the energetically most favorable source.
The importance of ammonia was underscored by expression of the respective
uptake systems in many different taxa over long periods of the experiment
and expression of glutamine synthetase (GS) (Fig. 6, ammonium transporter and glutamine synthetase), the enzyme
forming the central point of entry for the newly assimilated nitrogen into
the metabolism. Ammonia transporters (AMT) were highly expressed in the
Nitrate reductase expression was detected on day 10 mainly in
The expression of the NtcA transcription factor itself can be an indicator
for the nitrogen status, especially in marine picocyanobacteria
(Lindell and Post, 2001;
Tolonen et al., 2006). Therefore, the clear peaks for NtcA expression in
Heat map showing selected genes indicative of different
phosphorus acquisition strategies
In addition to nitrogen, genes involved in phosphate assimilation were
analyzed in more detail. Expression of alkaline phosphatase (AP) was
prominent between days 12 and 23 in different organisms (Fig. 7, alkaline phosphatase) and not
expressed before the DIP fertilization, although phosphate levels were
similar before the fertilization event and after day 13
(Pfreundt et al., 2016), and phosphate turnover time
reached prefertilization levels after day 20
(Berthelot et al., 2015).
TonB-dependent transport allows large molecules to pass through the
membrane. This strategy of exploiting larger molecules as nutrient sources
is thought to be prevalent in SAR86 bacteria
(Dupont et al., 2012) and indeed we found
the highest expression of
Proteorhodopsin was highly expressed, especially by SAR11 (
Although largely unexplored in nonmodel bacteria, ncRNAs can play important
regulatory roles, e.g., in cyanobacteria in the adaptation of the
photosynthetic apparatus to high light intensities
(Georg et al., 2014) or of the nitrogen
assimilatory machinery to nitrogen limitation
(Klähn et al., 2015). During
the analysis of the 100 transcripts with the highest mean abundance, we
found that 14 of these transcripts corresponded to the recently identified
noncoding RNA (ncRNA) Yfr103. The Yfr103 transcripts were mapped to 14
different loci, of which 12 could be assigned to
Here, we have studied how mesocosm confinement and DIP fertilization
influenced transcriptional activities of the microbial community during the
VAHINE experiment in the southwest Pacific. One of the most pronounced
effects we observed was transcript diversification within the mesocosm,
pointing to induced transcriptional responses in several taxonomic groups
compared to a more stable transcript pool in the lagoon. Despite this
diversification, analysis of differentially expressed transcripts amongst
time points showed that global transcriptional changes roughly followed the
timeline in both M1 and the lagoon. This confirms results from 16S-based
community analysis, where time was shown to be the factor most strongly
influencing bacterial succession in both locations. Gene expression inside
M1 was dominated by Alphaproteobacteria until day 12, with
The specific gene expression of diazotrophic cyanobacteria could be mainly
attributed to
All raw sequencing data can be downloaded from NCBI's Sequence Read Archive (SRA) under the accession number PRJNA304389 (Pfreundt et al., 2015).
Sophie Bonnet conceived and designed the experiment. Ilana Berman-Frank took part in experimental planning, preparation, and implementation. Ulrike Pfreundt, Wolfgang R. Hess, Dina Spungin, and Ilana Berman-Frank participated in the experiment and sampling. Ulrike Pfreundt analyzed samples and prepared all figures. Wolfgang R. Hess and Ulrike Pfreundt wrote the manuscript, and all authors contributed to and revised the manuscript.
The authors thank the captain and crew of the R/V