Plankton-derived environmental DNA extracted from abyssal sediments preserves patterns of plankton macroecology

Deep-sea sediments constitute a unique archive of ocean change, fueled by a permanent rain of mineral and organic remains from the surface ocean. Until now, paleo-ecological analyses of this archive have been mostly based on Biogeosciences Discuss., doi:10.5194/bg-2016-486, 2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c © Author(s) 2016. CC-BY 3.0 License.


Introduction
With over two thirds of the planet covered by oceans, deep sea deposits form the most extensive archive of the Earth's recent history.These deposits preserve mineralized skeletons of marine nano-and micro-plankton, which serve as an column above the sediment (Lejzerowicz et al., 2013).Part of this DNA pool remains preserved in ancient sediments, and can be extracted and analyzed using metabarcoding to reveal the molecular diversity of past ecosystems (Lejzerowicz et al., 2013;Pawłowska et al., 2014).This potential has been demonstrated in a range of other depositional environments, such as cave sediments, lake and ice cores where the dynamics of plant and animal communities could be followed over 50 ka (Pedersen et al., 2015).

5
In marine sediments, the presence of eDNA sequences has been reported from organic-rich layers in the Mediterranean dating back to 217 ka (Coolen and Overmann, 2007) and 125 ka (Boere et al., 2011), in sediments covering the last 11.4 ka in the Black Sea (Coolen et al., 2013), and in up to 32.5 ka old deposits in the Atlantic (Lejzerowicz et al., 2013;Pawłowska et al., 2014).Direct comparison with co-occurring fossils showed that the sequenced eDNA pool exceeds the taxonomic spectrum of the fossils, but many of the taxa preserved as fossils were not identified in the 10 eDNA (Pawłowska et al., 2014;Pedersen et al., 2013).This raises the question of how well the sedimentary DNA pool reflects the autochthonous (in situ origin) or allochthonous (foreign origin) community composition, whether there is any differential DNA preservation across taxa and whether the metabarcode marker selected is fully representative of the entire taxonomical diversity, regardless of its origin.
The primary difficulty in the analysis of the sedimentary DNA pool is to separate the local and foreign origin of the 15 sequenced material (Torti et al., 2015).This can be done with certainty only when the ecological origin of the sequenced eDNA is unambiguously resolved.Potential bias could arise from a range of factors including preferential amplification (Taberlet et al., 2012), inconsistent taxonomic resolution of the sequenced barcodes (Pawlowski et al., 2012) and insufficient coverage of the barcode reference database (Pawlowski et al., 2014b).In addition, the extensive fragmentation of eDNA (Pedersen et al., 2015) makes incompatible the amplification of sequences longer than ~100 20 bp, preventing the alleviation of bias by targeting longer, more informative barcodes.
Here we take advantage of the possibility to unambiguously ascribe sequences of foraminifera to benthic and planktonic lineages.By analyzing the planktonic portion of foraminiferal metabarcodes from deep sea sediments, we provide evidence that the structure and diversity of surface ocean communities is preserved in eDNA molecules and that the preservation is not limited to specific depositional environments.We focus our analysis on the Foraminifera 25 because of access to highly resolving short barcodes (Pawlowski and Lecroq, 2010) and the availability of a Biogeosciences Discuss., doi :10.5194/bg-2016-486, 2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c Author(s) 2016.CC-BY 3.0 License.taxonomically well resolved barcode database for the planktonic taxa (Morard et al., 2015).It allows the unambiguous separation of the benthic (autochthonous) component of the dataset from its planktonic (allochthonous) component.
Foraminifera are single-cell eukaryotes (protists) belonging to the phylum Rhizaria (Adl et al., 2012).Most Foraminifera lineages occupy benthic ecological niches.Their ~ 5000 morphospecies inhabit the bottom of shallow coastal environment to deep abyssal plains.In contrast, the planktonic lineages only include 50 morphospecies, living 5 mostly in the photic part of the water column and spend their entire life cycle in the plankton (Hemleben et al., 1989).
Foraminifera are known for their unusually high rate of evolution (de Vargas et al., 1997) resulting in highly resolving barcodes even in fragments shorter than ~100 bp (Pawlowski and Lecroq, 2010).This circumstance facilitates identification of short, potentially degraded, extracellular eDNA sequences.In addition planktonic foraminifera harbor considerable cryptic diversity (Darling and Wade, 2008;Morard et al., 2016), which offers an additional layer of 10 taxonomic information that can be exploited in eDNA studies.
In the present study we perform new analysis on eDNA libraries generated by Lecroq et al. (2011), which comprise metabarcodes from 31 abyssal sediment samples containing ~ 78 million foraminiferal sequences derived from the 37f foraminiferal specific barcode of the 18S rDNA.The main part (>99%) of the sequences could be assigned to benthic taxa and their composition was analyzed to unravel patterns of benthic diversity on the sea floor.However, a 15 tiny portion of the barcodes (<1 %) could be assigned to planktonic foraminifera.These sequences represent eDNA exported to the seafloor from the plankton.With the recent development of the Planktonic Foraminifera Ribosomal Reference Database (PFR², (Morard et al., 2015)), the environmental sequences belonging to planktonic foraminifera in the eDNA libraries generated by Lecroq et al. (2011) can now be for the first time thoroughly analyzed and assigned to the morphological and cryptic species taxonomic levels

20
The extensive knowledge on the distribution and abundance of planktonic foraminiferal shells in surface sediments (Kucera et al., 2005b), enabled the eDNA data to be directly compared with data derived from classical taxonomy.
We thus assess to what extent the eDNA originating from plankton is representative of the source community which is an essential prerequisite for interpretation of the eDNA archive in the sediment.Discuss., doi:10.5194/bg-2016Discuss., doi:10.5194/bg- -486, 2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c Author(s) 2016.CC-BY 3.0 License.

Biogeosciences
The 31 surface sediment samples analyzed were taken at water depths ranging from 1,745 to 5,338 m and cover sediment types from calcareous ooze in the Caribbean Sea to fine clastic sediments in the Arctic Ocean (Fig. 1a, Supplement 1).All analyses are based on the Illumina Solexa GAII datasets generated by Lecroq et al. (2011) and registered at the NCBI's Short Read Archive under the BioProject number PRJEB2682.The original sequencing data include 78,613,888 reads covering the 36 positions starting 3' of the "GACAG" motif delimitating the foraminifera-5 specific hypervariable region 37f region (Pawlowski and Lecroq, 2010).We used the unique sequences obtained for each library in Lecroq et al. (2011) after the strict dereplication step and the removal of singletons associated with only one read occurrence in a library.For each DNA library, we parsed sequencing reads passing the default base calling of GAPipeline v 1.0 and reads showing a single base quality or averaged base qualities inferior to 10 and 20, respectively as well as sequencing reads presenting ambiguities (N) or 10 homopolymers over 30 positions.This resulted in a total of 204,704 unique and filtered 36 bp-long reads.
We compared the retained reads to the Planktonic Foraminifera Ribosomal Reference database (PFR², (Morard et al., 2015)), which represents a compilation of 3,322 partial SSU rDNA sequences of planktonic foraminifera groups associated with a 6-ranks taxonomy.The first three basal ranks correspond to the level of assignation comparable to that achievable using morphological data, and is thus analogous to fossil data.The three terminal levels correspond to 15 the molecular taxonomy accessible using molecular data only.Of these, 2,418 sequences covered the fragment of the region 37f.These sequences were downloaded from the PFR 2 database (http://pfr2.sb-roscoff.fr/)and trimmed to the 36-nt fragment corresponding to the environmental sequences, which resulted in a total of 463 unique homologous reference sequences (Supplement 2).Initially, we evaluated the taxonomic resolution of the 36 nt barcoding region and found that it was variable enough to discriminate the genetic types (cryptic species) within morphological species 20 of almost all planktonic foraminifera taxa referenced in PFR 2 .We observed taxonomic conflicts for only two species pairs belonging to Globorotalia (tumida and ungulata) and Globigerinella (calida and siphonifera) and three pairs of genetic types among Globorotalia truncatulinoides (type III and IV), Pulleniatina obliquiloculata (types I and II) and Globigerinita glutinata (types III and IV).
We then individually aligned the 4,466 to 27,578 unique sequences obtained for each of the 31 samples against the 25 461 reference sequences using the Needleman-Wunsch global sequence alignment algorithm (Needleman and Wunsch, 1970), to separate the portion of the dataset belonging to the planktonic foraminifera (allochthonous origin) Biogeosciences Discuss., doi :10.5194/bg-2016-486, 2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c Author(s) 2016.CC-BY 3.0 License.from the portion belonging to the benthic foraminifera (autochthonous origin).Pairwise genetic distances were calculated as the number of differences (counting successive indels and terminal gaps as one difference), and an iterative clustering of the unique environmental sequences with the reference sequences was performed, allowing 1, 2, 3, 5 and 10 differences as thresholds for the average linkage algorithm.We then extracted all environmental sequences found within each cluster containing a planktonic reference sequence in an iterative manner, by screening 5 from the most stringent (1 difference threshold) to the most permissive (10 differences threshold) clusters.As a post hoc verification, we compared these sequences with the extensive benthic foraminifera sequence database used in Pawlowski et al. (2014a) together with the sequences of the Protist Ribosomal Reference Database (PR², Version based on Release 203 of Genbank (Guillou et al., 2013)) and additional undescribed benthic specimen sequences to ensure that the extracted sequences do not belong to benthic foraminifera.No match was found.We assigned to each 10 extracted environmental sequence the taxonomy of the planktonic reference sequences in the cluster.Finally, we retained only the sequences occurring in at least 2 samples or having a minimal abundance of 10 for downstream analysis.The final product was then considered an individual e-ribotype (Supplement 3).E-ribotypes are unique environmental sequences (not cluster) originating from planktonic foraminifera and thus transferred from surface ocean to the bottom (allochthonous origin).The relative proportions between e-ribotypes (planktonic reads) and the 15 benthic reads of each sample are shown on Fig. 1B.We calculated the rarefaction curves of each individual samples using PAST v 2.17 (Hammer et al., 2001) to estimate to what degree the full taxonomic spectrum of each sample was recovered by eDNA (Fig. 2).
Genuine sequences of planktonic foraminifera representing species not yet registered in the reference database may have been omitted.We therefore structured our analyses to account for the detection of possibly unknown genetic 20 types.To this end, we used the phylogenetic signal contained in the 36bp-reads to build a taxonomic framework within each morphospecies.In contrast to strict annotation approaches using arbitrary similarity thresholds, a phylogenetic approach can identify novel genetic type, not represented in the reference comparative database.The retained eribotypes were automatically aligned using MAFFT v.7 (Katoh and Standley, 2013), with reference sequences of the complete 37f region.The complete 37f region was used at this step instead of the 36-bp fragment to avoid possible 25 read alignment shifts caused by artificial mismatches with trimmed 36-bp sequences during the assignment process.
A single alignment was produced per morphospecies.For each resulting alignment, a phylogenetic tree was inferred using PhyML (Guindon et al., 2010)  for branch support estimation.The resulting trees were visualized with ITOL (Letunic and Bork, 2011) and all visually distinct clusters were considered as unique genotypes (Supplement 4).The reads clustering with reference sequences were assigned at the genetic type level, the sequences clustering without a close reference received an artificial genetic type attribution (Supplement 3).These assignments were used to prepare three datasets with different degrees of taxonomic resolution (at the level of e-ribotype, genetic types and morphological species).The occurrences of the 5 defined genetic types in the samples are shown on the Figure 3.
In order to compare the taxonomic richness and structure of the dataset, Non-Metric Distribution Scaling (NMDS) as implemented in PAST v 2.17 (Hammer et al., 2001) was applied on the dataset at the different taxonomic resolution.
We used the Dice distance to consider only the presence/absence data and the Bray-Curtis distance to compare absolute and relative abundances of reads among the samples (Fig. 4).To compare the similarity structure and diversity in the 10 samples based on the eDNA reads with census counts of microfossils, we used the MARGO database (Kucera et al., 2005a) and calculated fossils-based diversity (Shannon-Wiener) and similarity (Dice and Bray-Curtis) matrices using PAST v 2.17 for all surface samples within the regions outlined in Fig 1A (between 6 and 13 per region, Fig. 5 and   6).

3 Results
After quality filtering and collapsing of identical reads into single sequences, the comparison of the entire dataset with reference databases (Supplement 2) allowed to ascribed with certainty 1,373 unique sequence patterns representing 488,291 reads to planktonic foraminifera (Supplement 1, 3).Because we required reads to be present in a minimum of two samples or to show a minimal abundance of 10 in the entire dataset, the retained dataset was reduced to 697 20 unique sequences of planktonic foraminifera (e-ribotypes), which are representing a total of 486,435 reads (~0.63% of the total dataset, Supplement 1).Diversity was then assessed using a phylogenetic approach and the 697 e-ribotypes were found to represent 37 genotypes (Fig. 3, Supplement 4).Of these, 675 e-ribotypes (representing ~ 99 % of the reads) were attributed to 24 known genotypes assigned to 17 morphological species (Supplement 3, 4).

25
(Supplement 1, Fig. 2), representing between 0.003 and 9.412% of the total foraminifera reads in the libraries from these samples (Fig. 1b).Three Arctic samples did not yield any sequences that could be assigned to planktonic foraminifera (Fig. 1b, 3).The total number of reads per sample is a function of sequencing effort and is therefore not related to initial community density.However, the relative abundance of reads assigned to planktonic foraminifera in the DNA accumulated on the sea floor should reflect the relative proportion of the foraminiferal DNA produced by planktonic communities and the DNA produced by the in-situ benthic community.Whilst the absolute number of 5 reads varied among the samples and replicates, we did indeed observe a higher reproducibility of the relative number of planktonic reads recovered from replicates at the same location (Fig. 1b).No obvious correlation is observable between the relative number of planktonic reads and the latitudinal position of the samples.The samples with the highest relative proportions originate from Japan (0.790% to 9,412%) whilst the lowest abundances are observed in the Caribbean samples (0.003 to 0.032%).The high latitude samples (Arctic, North Atlantic and South Atlantic) show 10 relative abundances ranging from 0.011 to 1.204% when excluding samples without planktonic reads.In contrast, there appears to be a trend of increasing abundance of planktonic reads with depth (Fig. 1b), suggesting that the organic matter flux delivering planktonic DNA becomes relatively more abundant compared to DNA from in-situ community with depth.This trend is relatively weak and influenced by outliers, but it clearly appears to be the case that the proportion of planktonic DNA reads does not decrease with depth.

15
Rarefaction analysis has been used to assess the degree to which the retained planktonic reads cover the diversity they contain (Fig. 2).As expected, the general trend indicates a higher degree of saturation in samples with more reads.
For example, samples with the highest number of reads (Japan) had saturated diversity (Fig. 2a), whereas the Caribbean samples representing a similar geographical province but with fewer reads are clearly under-saturated (Fig. 2c).However, we observe that samples from high latitude regions are also saturated (Fig 2b,c), despite having a lower 20 number of reads than the samples from Japan, implying lower diversity.
With respect to the composition of the reads, we observed that e-ribotypes attributed to the microperforate species Globigerinita glutinata dominated the dataset (~77% of the reads) and were particularly abundant in subtropical communities (Fig. 3).E-ribotypes of common subtropical species Orbulina universa, Globorotalia menardii, Globorotalia hirsuta, Hastigerina pelagica Neogloquadrina dutertrei, Globigerina falconensis, Globigerinella dominate subpolar and polar samples.Among these, e-ribotypes belonging to the genotype IV of N. pachyderma were mostly found in the Southern Ocean (>99.99%)whereas e-ribotypes of the genotype I were only observed in the subpolar samples from the northern hemisphere.Additionally, different Globigerina bulloides e-ribotypes were detected either in subtropical samples (type I) or in subpolar assemblages (type II).Similarly, type II e-ribotypes of Globigerinita uvula were found more frequently in subpolar samples from both hemispheres, whereas e-ribotypes of 5 type I were also abundant in low-latitude samples (Fig. 3).
Next, we calculated similarity matrices among samples at the level of morphological species, genotypes and eribotypes in order to identify patterns of community structure in the planktonic reads in each sample.Visualization was based on Nonlinear Multi-Dimensional Scaling (NMDS, Fig. 4) with either Dice (Presence/absence) or Bray-Curtis similarity metrics computed from relative as well as absolute read abundances.Calculation performed on 10 absolute number with Bray-Curtis (Fig 4a-c) showed high reproducibility for samples from the same regions, best expressed at the e-ribotype taxonomic level.The high latitude communities are more similar (Fig 4c), whilst we observe a strong separation between Caribbean and Japan samples, probably reflecting the substantially different number of reads between localities (Fig. 2).Indeed, this difference vanishes when relative proportions are considered (Fig 4d-f).This shows that despite the enormous differences in read numbers between the Caribbean (48 to 212 reads 15 per sample) and Japan (3,630 to 124,355 reads per sample) samples, the relative proportions of the major taxa are the same between these regions (Fig. 3).We also observe a clear separation between low and high latitude samples (Fig  c).This is most 20 likely due to the different level of taxonomic saturation between the samples (Fig. 2).For example, less than 15 eribotype have been observed in every Caribbean samples whilst more than 200 were detected for 5 of the 6 Japan samples, thus maintaining the separation between these regions.Remarkably, despite the different numbers of reads and the associated different level of taxonomic saturation, the recovered pattern of taxonomic composition of the reads is so strong that the opposition between the high and low latitude samples, clearly observed with fossil assemblages, These observations taken together imply that the relative abundance of planktonic eDNA reads in the sediment samples contains exploitable information at all three taxonomic levels.To further explore the diversity patterns implied by eDNA data, we calculated the Shannon-Wiener diversity index within each sample (Fig. 5a).Despite 5 differences in sediment type and sequencing depth, eDNA in the analyzed samples reproduces the latitudinal diversity gradient based on morphospecies abundances in surface sediment samples (Rutherford et al., 1999).The latitudinal diversity gradient is present at all three taxonomic levels, but is most pronounced at the e-ribotype level (Fig. 5b).
Finally, since census counts of planktonic foraminifera morphospecies in surface sediments are available from the same regions as those analyzed for eDNA (Fig. 1a), we assessed whether the e-ribotype abundances reflect the same 10 community turnover pattern as that indicated by fossil assemblages (Fig. 6).To this end, we compared pairwise distances between eDNA MOTU assemblages with pairwise distances between fossil assemblages.This comparison reveals that eDNA and morphospecies community turnover rates are correlated (Fig. 6), with highest similarity among samples from the same region and lowest similarity among samples from different climatic regimes.This pattern emerges both when relative abundances and presence/absence data are considered.This implies that the 15 proportionality of eDNA reads abundance is consistently scaled with the proportionality of plankton flux to the seafloor.The analysis based on relative abundances yields a pattern with highly consistent results for comparisons between climatic zones and more scatter when comparing samples within a region or within one climatic zone.This is likely due to the fact that the eDNA data only cover a part of the morphological diversity of the foraminifera combined with differential distortion of the original abundance signal due to variation in gene copy number (Weber 20 and Pawlowski, 2013) and primer bias (Bradley et al., 2016).

Discussion
Here, we provide evidence that eDNA originating from planktonic foraminifera is indeed preserved in the DNA pool of abyssal marine sediments irrespective of water depth, geographic region and sediment type.Earlier eDNA studies 25 on marine sediments assumed that DNA preservation is proportional to the preservation of organic matter and, thus, Biogeosciences Discuss., doi :10.5194/bg-2016-486, 2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c Author(s) 2016.CC-BY 3.0 License.
prioritized sampling in organic-rich sediment layers (Coolen et al., 2009).Yet, recent experimental research and field studies suggest that the primary structure of DNA molecules is adsorbed to solid particles and molecules preserved in this way may form an archive of extracellular DNA regardless to the organic content of the sediment (Corinaldesi et al., 2007(Corinaldesi et al., , 2011(Corinaldesi et al., , 2014;;Torti et al., 2015).We also show that the eDNA composition consistently reflects the composition of the pelagic planktonic communities from which it was derived.The high reproducibility of reads 5 diversity (Dice index) and relative abundance (Bray-Curtis index) within a single region (Fig. 4) suggests that the taphonomic process governing the transfer and preservation of extracellular DNA from surface to bottom ocean are similar at regional scale and do not differentially impact DNA from species within different ecological groups.
Although the number of planktonic foraminifera reads recovered differed by three orders of magnitude between the Caribbean (62 to 212 reads per samples, representing ~0.003 to 0.03 % of the dataset) and Japan (3,620 to 124,355 10 reads per sample, representing 0.8 to 9.4 % of the dataset), the information recovered was sufficient to unveil the structure of foraminifera communities across the whole range of environments investigated (Figs.3-6).However, since the taxonomic richness in eDNA data increased with sequencing efforts (Fig. 2), the recovery of the full taxonomic diversity requires a certain minimum sequencing effort.From the analyzed dataset, it is not possible to explain the large variation in the numbers of reads ascribed to planktonic foraminifera among regions (Fig. 2).This 15 could represent DNA differential preservation conditions, or an imbalance between flux from the surface (allochthonous) community and the abundance of DNA from the autochthonous benthic community.The latter is a likely explanation because the analyzed eDNA material was amplified by PCR primers annealing to all foraminiferal sequences (Lecroq et al., 2011).During the PCR, the DNA of planktonic foraminifera might well be outcompeted by the autochthonous DNA of benthic foraminifera, which is potentially more abundant, less damaged and more easily 20 extracted from cells than when tightly absorbed to sediment particles (Ceccherini et al., 2009;Torti et al., 2015).
Consistent with earlier studies (Capo et al., 2015;Lejzerowicz et al., 2013;Pawłowska et al., 2014;Pedersen et al., 2013), the taxonomic diversity revealed by the analyzed eDNA barcodes overlaps only partly with the diversity based on fossils present in the sediment.One part of the observed difference could be ascribed to the limited coverage of the reference database.Because of the way we assigned reads to planktonic foraminifera, we cannot assess the portion of 25 the planktonic foraminifera diversity not represented in the reference database, although all major planktonic foraminifera taxa making >90% of tests larger than 150 μm are present in the reference database (Morard et al., 2015).barcodes could be attributed to four common species well represented in the reference database: Globorotalia truncatulinoides, Turborotalita quinqueloba, Trilobatus sacculifer and Globigerinoides ruber.This observation is consistent with preferential PCR amplification.The rDNA of planktonic foraminifera is characterized by high and variable substitution rates (de Vargas et al., 1997), and two of the four above species exhibit some of the highest mutation rates (Aurahs et al., 2009).The manual inspection of a multiple sequence alignment containing the reference 10 database sequences (Morard et al., 2015) revealed the presence of up to 5 mismatches between these species sequences and the primer sequences used to generate the dataset.Hence, such mutations in the conserved regions of the gene where the primers anneal may be responsible for detection failures.Another preferential PCR amplification could also explain the strong skew dataset towards microperforate species sequences, which represent 55 to 99% of the reads (Fig. 3), but only 0 to 30% of the morphological assemblages (Kucera et al., 2005b).The microperforate clade appears 15 to have significantly lower rDNA substitution rates (Aurahs et al., 2009) and here we observe no mismatch between the primer and the reference sequences within this clade.
Alternatively, the higher abundance of reads assigned to microperforate taxa could represent a genuine pattern, questioning the representativeness of census counts of fossil foraminifera, which ignore specimens smaller than 150 μm (Kucera et al., 2005b).Microperforate species tend to be small and are disproportionately abundant in the size 20 fraction smaller than 150 µm (Brummer et al., 1986).This is significant because the eDNA archive comprises information on all planktonic foraminifera irrespective of size and is thus potentially a more comprehensive recorder of species proportions in the plankton.
Overall, our results indicate PCR/primer bias as the most important limitation of foraminiferal community surveys based on metabarcoding.Alleviating them will allow detection of the full taxonomic spectrum, provided that sufficient 25 sequencing effort is achieved, as recently discussed for fungi (Adams et al., 2013a(Adams et al., , 2013b)).To our knowledge, the dataset we re-analyzed represents the largest sequencing data for a given taxonomic group.Yet, it seems to indicate Biogeosciences Discuss., doi :10.5194/bg-2016-486, 2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c Author(s) 2016.CC-BY 3.0 License.
that the main ecological pattern can be extracted even from metabarcodes found at relatively modest frequencies (< 1000 reads, Figs. 3, 4: Caribbean samples).This conclusion underlines the importance of comprehensive reference datasets and barcoding efforts to facilitate the development of specific and effective probing techniques to recover the signal of individual key groups (Pawlowski et al., 2012).
Metabarcoding surveys of marine sediments offer a powerful alternative to study marine plankton ecology and 5 biogeography.Plankton eDNA diversity observed in sea floor sediments represents a continuous flux of biomass, averaged over seasons and throughout the entire water column.Unlike plankton sampling, sea-floor deposits are not affected by the seasonality, reproductive cycle or habitat depth of the plankton at the time of sampling.They offer a spatiotemporally archive of the overlying water column, which contains an integrated record of the maximum range of taxa that is realized at least at some point during the seasonal cycle.In this way, it is possible to constrain 10 biogeographical patterns like endemism or ecological exclusion across oceanic gradients, without the need for highly time-resolved sampling.Importantly, eDNA data can be used to test the stability of biotic interactions inferred from the plankton (Lima-Mendez et al., 2015) simultaneously across a large range of environmental conditions represented in the sediment.

15
Assuming that eDNA deposited on the sea floor is also preserved through time, marine sediments should contain a remarkable ancient DNA (aDNA) archive of the history of plankton communities.There is growing evidence that eDNA is preserved in marine sediments old enough to cover the previous ice age (Lejzerowicz et al., 2013).Until now, the interpretation of aDNA datasets from marine sediments suffered from insufficient sequencing depth (Coolen et al., 2009) or insufficient coverage of the reference database (Pawlowski et al., 2014a).As a result, to which degree 20 the observed aDNA patterns reflect genuine past ecological shifts remained contentious.If sedimentary DNA is incorporated into marine sediments without taxonomic bias, as is the case in other environments (Pedersen et al., 2015), our data would support previous claims of DNA survival in deep-sea sediments, even where the sequencing depth was limited (Coolen et al., 2009).We hypothesize that if the molecular imprint of foraminifera is preserved in surface sediments, the same should apply to the rest of the pelagic community.If confirmed, this, together with the  For each magnification, the curves which are out of range are drawn in dashed lines ease the reading to the implemented in SEAVIEW 4 (Gouy et al., 2010) with default option using aLRT Biogeosciences Discuss., doi:10.5194/bg-2016-486,2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c Author(s) 2016.CC-BY 3.0 License.
4f) at morphospecies level, which is analogous to the structure represented by fossil assemblages in nearby samples (Fig 4g).Calculation performed on Dice indices (Fig 4i-k) tends to reproduce the same structure as the calculation performed with Bray-Curtis calculated on absolute read number, but the patterns are noisier (Fig 4a-

25 remains
even in the eDNA assemblages when considering the morphospecies level (Fig 4k-l).The signal in the eDNA data is noisier because only a fraction of the morphospecies have been detected by the eDNA (1 to 11 morphospecies per sample, Supplement 1), whilst 1 to 24 morphospecies are observed in the census counts.This means that the Biogeosciences Discuss., doi:10.5194/bg-2016-486,2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c Author(s) 2016.CC-BY 3.0 License.relative proportions of the reads carry enough information to reproduce identical patterns between eDNA and fossil record despite only a partial coverage of the morphological diversity (Fig 4f-g).

Figure 2 .
Figure 2. Rarefaction curves.e-ribotypes rarefaction curves of each of the 28 samples containing planktonic foraminifera sequences.The three boxes show the same rarefaction curves at 3 different scales highlighted by grey rectangles.The symbols with numbers correspond to the replicates of a single location shown on Fig. 1. figure.

Figure 3 .
Figure 3. Heat Map of the relative proportions of the genotypes detected in the 31 samples.The histogram on the left side of the heat map indicates the total abundance in log-value of the reads belonging to planktonic foraminifera.Symbols as on Figs 1-2.

Figure 4 .
Figure 4. Community structuring of planktonic foraminifera in sedimentary eDNA.Grouping of eDNA and census count samples according to their taxonomic composition using Non-linear Multi-Dimensional Scaling based on Bray-Curtis (Absolute number (a-c) and relative abundances (d-g)) and Dice distances (i-l).The NMDS are provided for the three different degrees of taxonomic resolution (ribotypes, genotypes and morphospecies) for

5 the 10 24
eDNA samples.As the census count are relative abundances, the Bray-Curtis on absolute value is not provided for the census count assemblages.Symbols as in Fig.1band Fig. 2. Biogeosciences Discuss., doi:10.5194/bg-2016-486,2016 Manuscript under review for journal Biogeosciences Published: 22 November 2016 c Author(s) 2016.CC-BY 3.0 License.

Figure 5 .
Figure 5. Macro ecological pattern of spatial diversity known as the mid-domain effect.(a) The grey areas represent the distribution of the Shannon index calculated on the census count of planktonic foraminifera in core top samples from the MARGO database(Kucera et al., 2005b)  against latitude.dark grey area represents the 1 st -3 rd quartiles (50% confidence interval), light grey the 5 th -95 th percentile (90 % confidence interval), and

Figure 6 . 5 26
Figure 6.eDNA vs census counts.Similarity pairwise comparison of the community composition inferred from relative abundances of morphospecies based on eDNA and census counts among and between the five sampled regions.Each symbol corresponds to the average between all pairwise distances of each category and the lines represent the standard deviation.The distances are based on the Bray-Curtis and Dice