Lough Neagh Assignment

MetaVir analysis of unassembled reads.

2,295,055 reads were uploaded to the MetaVir server [32, 33] for taxonomic annotation and comparative analyses with other viromes. Rarefaction analysis was performed on the whole dataset with clustering of sequences at the 90% identity level. It showed that sequencing effort was substantial and sufficient for accurate taxonomic annotation of the major groups of viruses. However, it was not exhaustive, as the rarefaction curve had not approached a plateau (S1A Fig). To further assess cluster richness, we conducted a comparative rarefaction analysis of subsamples from the Lough Neagh virome and several freshwater viral metagenomes. Comparison with the freshwater lakes Bourget and Pavin is shown in S1B Fig (sampling depth of 50,000 reads, clustering of sequences at the 90% identity level). All three rarefaction curves could be fitted to linear functions using GraphPad Prism (r² > 0.99). Comparison of their slopes showed that all three curves were different (p < 0.0001), with the Lough Neagh virome being more diverse than the other two.
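The rarefaction and slope comparison described above can be illustrated with a minimal Python sketch. It assumes that each read has already been assigned a cluster label, for example by clustering at 90% identity. The names `rarefaction_curve` and `linear_fit` are hypothetical, and the code does not reproduce the MetaVir or GraphPad Prism implementations. It estimates slopes and r² only; the statistical test used to show that the slopes differ is not reproduced.

```python
import random


def rarefaction_curve(cluster_ids, depths, seed=0):
    """Number of distinct clusters observed in random subsamples of increasing depth.

    cluster_ids: one cluster label per read (hypothetical input, e.g. 90% identity clusters).
    depths: subsample sizes, each no larger than len(cluster_ids).
    """
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    # Prefixes of a single shuffled order are nested subsamples,
    # so the resulting curve never decreases.
    return [len(set(shuffled[:d])) for d in depths]


def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Toy data only: 100,000 reads drawn from 60,000 hypothetical clusters.
    gen = random.Random(1)
    toy_reads = [gen.randrange(60_000) for _ in range(100_000)]
    depths = [10_000, 20_000, 30_000, 40_000, 50_000]
    curve = rarefaction_curve(toy_reads, depths)
    slope, _, r2 = linear_fit(depths, curve)
    print(curve)
    print(f"slope = {slope:.3f} clusters per read, r^2 = {r2:.3f}")
```

At equal sampling depth, a steeper slope means that each additional read still uncovers more new clusters. This is why a steeper rarefaction curve is read as indicating a more diverse virome.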

Taxonomic annotation on MetaVir was performed by comparing all reads from the Lough Neagh virome with the RefSeq complete viral genomes protein sequence database (2014-09-10 release) using BLASTx [16]. Of the virome sequences, 14.6% (334,507 reads) produced a database hit (threshold of 50 on the BLAST bit score, with no minimum alignment length). These reads were annotated on the basis of their similarity to known viruses. The taxonomic composition of the virome was then determined after normalisation with the Genome relative Abundance and Average Size (GAAS) tool [34], to account for differences in viral genome lengths (Fig 3). Less than 0.5% of these reads had similarity to ssDNA viruses, and the majority of the remaining reads (97.0%) originated from dsDNA viruses. Of these, Caudovirales (tailed bacteriophages) accounted for 79.9% of reads. Unclassified dsDNA phage sequences comprised 15.8% of reads, and unclassified dsDNA viruses 1.0%. Most reads annotated as Caudovirales were similar to genomes of phages of the Podoviridae family (34.3% of all reads), closely followed by Siphoviridae (32.8%). Myoviridae was the least numerous group, with 10.3% of reads affiliated with this taxon. The predominant subfamilies/genera of Podoviridae (accounting for more than 0.5% of the metagenome) were unclassified and unassigned Podoviridae (26.6% and 0.8%, respectively), Bppunalikevirus (2.3%), Autographivirinae (1.8%), P22likevirus (0.8%), Epsilon15likevirus (0.7%), and Luz24likevirus (0.5%). Most reads assigned to Siphoviridae came from unclassified Siphoviridae (29.1%), followed by Lambdalikevirus (1.8%), Phic3unalikevirus (0.9%), and Yualikevirus (0.8%). For Myoviridae, no subgroup other than unclassified Myoviridae (8.4%) exceeded 0.5% abundance. Fourteen individual phage sequences were the most abundant in the virome, each making up more than 1%. Of these, seven could be linked to the Podoviridae family, two to the Siphoviridae family and five to unclassified dsDNA phages. Because of the abundance corrections introduced by GAAS, the most abundant virotypes ranked by number of mapped reads differed from those ranked by GAAS-corrected values. The combined list of the most abundant phage sequences is given in Table 1. Of particular note is Pelagibacter phage HTVC010P [35], which made up 1.8% of the virome (GAAS-corrected value), with 4,223 reads mapped to its genome. Pelagiphages are possibly among the most numerous types of viruses on the planet [35], but little is known about their role in freshwater environments. One of the top 21 contigs assembled in this work in terms of the number of mapped reads (LNW4-c10) also carried a TerL gene highly similar to the TerL of Pelagibacter phage HTVC010P (see below). Three other Pelagibacter phage sequences were identified in the Lough Neagh dataset, constituting 1.4% of the virome, with 5,527 reads mapped to their genomes. In agreement with the dominance of Cyanobacteria in the microbial community, 39,845 (9.00%) reads from the whole virome were annotated as originating from bacteriophages of the cyanobacteria Synechococcus and Prochlorococcus, as well as from unclassified cyanophages.
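The BLAST filtering and genome-length correction described above can be sketched as follows. This is a simplified stand-in, not the MetaVir or GAAS code. It keeps only the single best hit per read that meets the bit-score threshold of 50, then divides read counts by genome length. GAAS itself is more elaborate; for instance, it can weight several significant hits per read. The input is assumed to be BLAST tabular output (`-outfmt 6`, default 12 columns), and the function names are hypothetical.

```python
from collections import Counter

BITSCORE_MIN = 50.0  # threshold quoted in the text; no minimum alignment length applied


def best_hits(blast_tab_lines, min_bitscore=BITSCORE_MIN):
    """Return {read_id: subject_id} for the best-scoring qualifying hit of each read.

    Assumes default BLAST tabular columns: query, subject, ..., evalue, bitscore (12th).
    """
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if bitscore < min_bitscore:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def length_normalised_abundance(read_to_genome, genome_lengths):
    """Relative abundance per genome after dividing read counts by genome length (bp)."""
    counts = Counter(read_to_genome.values())
    weights = {g: n / genome_lengths[g] for g, n in counts.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical phage A (40 kb, 400 reads) and phage B (160 kb, 800 reads).
    reads = {**{f"a{i}": "A" for i in range(400)}, **{f"b{i}": "B" for i in range(800)}}
    abundance = length_normalised_abundance(reads, {"A": 40_000, "B": 160_000})
    print({g: round(v, 3) for g, v in abundance.items()})  # {'A': 0.667, 'B': 0.333}
```

In this toy case, phage B has twice as many mapped reads, yet after length normalisation phage A accounts for two thirds of the community. This illustrates how rankings by mapped reads and by GAAS-corrected values can diverge.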

Fig 3. Taxonomic composition of Lough Neagh virome.

Composition was computed on the MetaVir server from a BLAST comparison with the RefSeq complete viral genomes protein sequence database. Abundances of the major viral groups are shown, with the numbers of mapped sequences at the right ends of the corresponding bars.


MG-RAST analysis of unassembled reads.

After merging of paired-end reads, quality processing, and deduplication, the MG-RAST analysis pipeline [36] generated 2,601,470 reads, which were subjected to functional and taxonomic classification. MG-RAST uses a number of different databases for functional annotation of reads. Four of them allow hierarchical functional annotation: KEGG Orthology (KO), COG, eggNOG, and SEED Subsystems [37]. The SEED Subsystems database is manually curated and is therefore considered to be more accurate. We share this conclusion, reached by, for example, [37, 38], and so chose it as the primary method of functional annotation. The unassembled reads processed by MG-RAST were compared with the Subsystems database using a maximum e-value of 10⁻⁵, a minimum identity of 60%, and a minimum alignment length of 15 (measured in aa for protein and bp for RNA databases). A total of 125,852 reads were classified in this way. The functional distribution of reads at the highest hierarchical level of the MG-RAST Subsystems classification is presented in Fig 4A. Of all classified reads, 68.3% were assigned to the functional category “Phages, Prophages, Transposable elements, and Plasmids”. Phages and prophages made up the largest part of this group (66.4% of all classified reads), while 1.4% of reads belonged to GTAs (Gene Transfer Agents). A small number of reads in this category were assigned to the subcategories “Pathogenicity islands” (0.5%) and “Transposable elements and integrons” (0.1%) (Fig 4C). It should be noted that in Fig 4C the top subgroup in the category “Phages, Prophages” is “r1t-like streptococcal phages” (26.7%). This label reflects the SEED Subsystems classification used here. The subsystem named “r1t-like streptococcal phages” contains several genes characteristic of streptococcal bacteriophages similar to phage r1t. Reads from our virome whose best BLAST hits were to genes in this subsystem were classified accordingly, but they do not necessarily originate from streptococcal phages. The remaining 21.7% of reads were distributed among various non-viral functional groups (Fig 4A), which are described in detail in Fig 4B. Notably, the pstS (high-affinity phosphate transporter) gene was identified in 116 reads. The pstS gene has previously been detected integrated into the genomes of a number of bacteriophages in a study of marine viruses by Sullivan and colleagues [39, 40]. To assess the extent of horizontal gene transfer, we based the study of functional diversity of the virome on the analysis of individual reads rather than assembled contigs. The presence of the pstS gene in our viral metagenome could arise from its permanent integration into a phage genome (specialised transducing phages). Alternatively, it could come from various transducing entities (generalised transducing phages or GTAs).
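The SEED Subsystems thresholds quoted above can be expressed as a simple filter followed by a tally of level-one categories: a maximum e-value of 10⁻⁵, a minimum identity of 60%, and a minimum alignment length of 15. This is an illustrative sketch only. The `Hit` record is a hypothetical structure, not MG-RAST's export format, and keeping the first passing hit per read assumes that hits are listed best-first.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds quoted in the text for the SEED Subsystems comparison.
MAX_EVALUE = 1e-5
MIN_IDENTITY = 60.0  # percent
MIN_ALIGN_LEN = 15   # aa for protein databases, bp for RNA databases


@dataclass
class Hit:
    """Hypothetical per-hit record; not an MG-RAST data structure."""
    read_id: str
    evalue: float
    identity: float
    align_len: int
    level1: str  # top-level Subsystems category


def passes(hit):
    """True if a hit meets all three annotation thresholds."""
    return (hit.evalue <= MAX_EVALUE
            and hit.identity >= MIN_IDENTITY
            and hit.align_len >= MIN_ALIGN_LEN)


def category_percentages(hits):
    """Percentage of classified reads per level-one category (one category per read)."""
    per_read = {}
    for h in hits:
        # Assumes hits are ordered best-first, so the first passing hit is kept.
        if passes(h) and h.read_id not in per_read:
            per_read[h.read_id] = h.level1
    n = len(per_read)
    if n == 0:
        return {}
    counts = Counter(per_read.values())
    return {cat: 100.0 * c / n for cat, c in counts.items()}
```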

Fig 4. Functional analysis of Lough Neagh virome.

The analysis was carried out using SEED Subsystems hierarchical functional annotation on the MG-RAST webserver. (A) Relative abundance of level one functional categories. (B) Distribution of minor functional categories. (C) Distribution of functional categories in the “Phages, Prophages, Transposable elements, Plasmids” group at levels 2 and 3.


This study has produced the largest sequencing coverage of a freshwater lake virome to date. Nevertheless, the rarefaction analysis clearly shows that this sequencing is not exhaustive (S1A Fig). Comparison with previously published viromes of the French lakes Pavin and Bourget (S1B Fig), which were sequenced to a lower depth [12], showed that the Lough Neagh virome has higher sequence diversity. The lower limit of viral richness for Lough Neagh was estimated following [41]. The 2,295,055 reads uploaded to MetaVir had an average length of 276 bp. They were clustered into approximately 650,000 clusters at the 90% identity level and approximately 840,000 clusters at the 98% identity level. We used 50,000 bp as the average bacteriophage genome size and defined “a single viral species” as in [41], as a grouping of isolates at nucleotide identity levels of 90% to 95%. On this basis, we estimate the lower limit of the number of different viruses to be between 3,588 and 4,637, using the formula N × L / G. Here N is the number of clusters, L the average read length (bp), and G the average bacteriophage genome size (bp). The Lough Neagh virome was also compared with freshwater viromes available on MetaVir (S2 Fig). Each comparison algorithm identified a different closest viral community: di-, tri-, and tetranucleotide bias comparison [42] and BLAST-based comparison [32] gave, respectively, the viromes of Lagoa Vermelha [MetaVir project ID 4000], Tilapia_Channel– 1105 [MetaVir project ID 33] [11], El Berbera [MetaVir project ID 395] [9], and Lake Bourget [MetaVir project ID 7] [12].
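The richness estimate can be reproduced directly from the values given above. One way to read the formula is as follows. If each of the N clusters represents roughly L bp of distinct sequence, the dataset contains about N × L bp of unique sequence, which is equivalent to N × L / G average-sized genomes. The helper name below is hypothetical.

```python
def richness_lower_bound(n_clusters, mean_read_len_bp, mean_genome_bp=50_000):
    """Lower limit on the number of distinct viruses, N * L / G, as defined in the text."""
    return n_clusters * mean_read_len_bp / mean_genome_bp


if __name__ == "__main__":
    # Cluster counts at 90% and 98% identity, average read length 276 bp.
    for identity, clusters in (("90%", 650_000), ("98%", 840_000)):
        print(identity, round(richness_lower_bound(clusters, 276)))
    # 90% 3588
    # 98% 4637
```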

According to MetaVir analysis, 14% of all reads were classified as being of viral origin; the rest were not assigned. MG-RAST analysis of the same virome classified approximately 15% of the reads analysed. This means that over 80% of the sequences analysed lack any substantial homology to database entries (with an e-value smaller than 10⁻⁵), which is typical of viral metagenomes analysed to date [41]. According to MG-RAST analysis, 10.9% of the reads were annotated as being of bacterial origin (72% of all reads after QC and post-processing). This apparent anomaly could be explained by several sources being included in this category: sequences of GTAs, bacterial vesicles, free extracellular DNA, malformed VLPs (containing bacterial DNA), and transduced bacterial DNA. It should also be taken into account that the MG-RAST pipeline is heavily biased towards annotating sequences as being of bacterial origin. All precautions were taken in this work to minimise contamination with external bacterial DNA. The VLP fraction was treated with an excess of DNase I, as recommended [43], until 16S rRNA gene products were no longer detected (results not shown). Indeed, only 4 of 2,601,470 reads were classified as originating from 16S rRNA genes, and these are likely to originate from generalised transducing phages or GTA particles.

When compared with two previously published temperate freshwater viromes [12], the most striking difference is the near-absence of ssDNA viruses in the Lough Neagh metagenome (0.5%). The corresponding values are 80% for Lake Pavin and 85% for Lake Bourget. The most likely explanation is the difference in how the metagenomic samples were prepared for sequencing. Our work did not use multiple displacement amplification (MDA), which is known to be highly biased towards the amplification of single-stranded DNA molecules [44, 45]. In another viral metagenome project in which MDA was also not employed, ssDNA viruses likewise constituted less than 1% of all raw reads [41]. It may be concluded that avoiding MDA amplification of viral metagenomic samples is desirable for a more accurate representation of viral communities.

Contig construction and MetaVir analysis.

A total of 66,450 contigs ranging from 301 to 58,805 bp were produced as described in the Experimental Procedure section. All contigs were uploaded to the MetaVir server for annotation and comparison with other publicly available viromes. There were 21 contigs larger than 30 kb, the largest being 58.8 kb; their essential characteristics are presented in Table 2. In-depth analysis was conducted for the largest contigs (i.e., LNW4-c0 to LNW4-c20). It was also conducted for the contigs detected as the most abundant in Lough Neagh (identified by high sequence coverage). As can be seen from Table 2, putative cyanophages are well represented in the Lough Neagh virome (contigs LNW4-c0, LNW4-c11 and LNW4-c20).
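Selecting the largest contigs for in-depth analysis amounts to a length filter over the assembly. The sketch below assumes the contigs are available as FASTA text; the function names are hypothetical, and the 30 kb cut-off is the one used for Table 2. Selection by sequence coverage, also mentioned above, would require per-contig coverage values from read mapping and is not shown.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted text lines."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # The contig name is the first word of the header line.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def contig_summary(records, min_len=30_000):
    """Return (shortest, longest, names of contigs longer than min_len, longest first)."""
    lengths = {name: len(seq) for name, seq in records}
    large = sorted((n for n, size in lengths.items() if size > min_len),
                   key=lambda n: lengths[n], reverse=True)
    return min(lengths.values()), max(lengths.values()), large
```

For a full assembly file, `contig_summary(read_fasta(open(path)))` would report the length range and the names of all contigs above the cut-off. Here `path` is a placeholder for the assembly FASTA.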

Genetic maps for contigs LNW4-c0 and LNW4-c12 are shown in Fig 5. LNW4-c0 represents a putative Myoviridae (possibly T4-like) phage. Fifty-one complete ORFs and one partial ORF were identified in this 58,073 bp contig. Orf35 was identified as a terminase large subunit by BLASTp and hmmscan comparisons. On this basis, the phage can be classified as related to Prochlorococcus phage P-SSM7 (NC_015290.1) and Sinorhizobium phage phiM12 (KF381361). The taxonomic affiliation of this phage cannot be determined unambiguously. However, a number of other ORFs in the contig are similar to cyanophage genes, which favours the hypothesis of a cyanophage origin. The genomes of both related phages are larger than 150 kb; therefore, the LNW4-c0 contig likely represents a partial sequence of a phage genome from Lough Neagh. LNW4-c12 probably comes from a member of the Podoviridae family; this 34,467 bp circular contig contains 52 ORFs. It likely represents the genome of a phage with either circular permutations or long direct terminal repeats. According to MetaVir BLASTp and independent BLASTx analyses, the closest homologs of the LNW4-c12 TerL gene are, respectively, the terminase large subunit of Roseobacter phage RDJL Phi 1 (62,668 bp genome) and the terminase large subunit from the Burkholderia sp. TJI49 phage genome. Environmental bacteriophages are highly diverse, and reference databases contain a limited number of viral genomes. It is therefore not possible to state whether LNW4-c12 is indeed a phage infecting bacteria of the genus Roseobacter or Burkholderia.
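ORF counts such as those reported for LNW4-c0 and LNW4-c12 come from scanning all six reading frames of a contig. This section does not state which ORF caller was used, so the following is a deliberately naive illustration rather than the method used in this work. It reports ATG-initiated ORFs that end at a stop codon and meet a minimum length. It treats the contig as linear, so ORFs spanning the origin of a circular contig such as LNW4-c12 are missed, and it ignores alternative start codons. Dedicated gene finders are normally used for phage genomes.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return sorted (strand, start, end) tuples for complete ATG...stop ORFs.

    Coordinates are 0-based, half-open and given on the forward strand;
    the stop codon is included and counts towards min_codons.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, i + 3))
                        else:
                            # Map reverse-strand coordinates back onto the forward strand.
                            orfs.append((strand, n - (i + 3), n - start))
                    start = None
    return sorted(orfs, key=lambda orf: orf[1])
```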

Fig 5. Maps of putative phage genomes identified in Lough Neagh.

Genome regions amplified by PCR using genome-specific primers are indicated with horizontal bars; identified ORFs are shown as arrows. (A) Genome map of putative phage LNW4-c0. (B) Genome map of putative phage LNW4-c12.


To confirm that contigs LNW4-c0 and LNW4-c12 corresponded to genomic DNA molecules present in the sample analysed, three pairs of specific primers were designed for each contig to amplify segments 4–6 kbp long. PCR was performed using the same metagenomic DNA that had been used for Illumina sequencing. In all six cases PCR products were obtained, and Sanger sequencing confirmed the presence of these contigs (the amplified and confirmed regions are indicated in Fig 5).
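Before PCR, the expected product size for each primer pair can be checked in silico against the contig sequence. The sketch below uses hypothetical helpers and is not the primer design procedure used in this work. It assumes exact primer matches on a linear template and uses the first occurrence of each binding site. It then checks whether the predicted amplicon falls in the 4–6 kbp range targeted here.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def expected_amplicon(template, fwd_primer, rev_primer):
    """Predicted PCR product length (bp) for exact-match primers on a linear template, or None.

    The forward primer is matched on the given strand. The reverse primer binds the
    opposite strand, so its reverse complement is searched downstream of the forward site.
    """
    t = template.upper()
    f = t.find(fwd_primer.upper())
    if f < 0:
        return None
    r_site = reverse_complement(rev_primer)
    r = t.find(r_site, f + len(fwd_primer))
    if r < 0:
        return None
    # The product spans from the 5' end of the forward site to the 3' end of the reverse site.
    return r + len(r_site) - f


def in_target_range(length, lo=4_000, hi=6_000):
    """True if a predicted product falls within the 4-6 kbp window targeted in the text."""
    return length is not None and lo <= length <= hi
```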

River authorities have been asked to explain why they have not fixed a broken flood control gate on Lough Neagh after a significant rise in water levels in recent weeks.

The huge sluice gate is one of five used to regulate the flow of water from Lough Neagh into the River Bann at Toome.

The gates can be lowered or raised in order to control the levels of the lough, which is Ireland's largest freshwater lake.

It is believed the broken sluice gate has reduced the flow of flood water from the lough by around 20 percent.

The Rivers Agency, now known as DfI (Department for Infrastructure) Rivers, was criticised in January 2016 after homes, businesses and farmland around Lough Neagh were flooded as levels rose to their highest in living memory.

Concerns were raised about the broken flood gate last year, after a wet summer and autumn left land around the lough swamped.

The department is required to control water levels in Lough Neagh within a range of 12.450 metres to 12.600 metres.

However, the department's website last night confirmed that the current level of the lough is 13.098 metres, 498 mm above the statutory maximum.

Heavy rain in recent days, combined with melting snow from the Sperrin and other mountain ranges, has raised fears that flood waters on Lough Neagh could rise further in the coming days.

Concerns were raised as it emerged that the home of a Co Tyrone man who lives close to the Lough Neagh shoreline has been practically cut off by rising flood water.

It is believed the property at Derrytresk regularly floods when the levels of the lough rise.

Assembly member Linda Dillon last night said: “I want to know why the department has not fixed the gate that was broken before the water levels went up.

“I would like them to give an estimated timeline on when it will be fixed.”

A spokeswoman for DfI Rivers said: “The protracted period of snowfall and rain has resulted in the Lough Neagh level rising over the past few weeks.

“Four of the five sluice gates at Toome that regulate the water level in Lough Neagh are venting maximum flows.

“DFI Rivers will be continuing to carefully monitor the water levels on a daily basis.

“A full structural survey of the damaged gate will be carried out when weather conditions and lough level permit.”

Other parts of the north have also been hit by rising flood waters, including around Lough Erne in Co Fermanagh.

Heavy flooding has also been reported in the Newtownbutler area with some roads closed due to rising water.

Lisdead Road and Samsonagh Road, which are both near Ross Lough outside Enniskillen, were closed yesterday due to flooding.

Ballycrummy Road, which is near Armagh city, was also closed due to flooding yesterday.
