Created 30/04/2018

Inferring spatio-temporal inventories of marine microbes using curated marker databases and analysis workflows (Rosko-Tags)

 

Marine microbes are major contributors to biogeochemical cycles due to their key role in oceanic food webs and the microbial loop. Their rapid turnover times and the strong dependence of their distribution on physico-chemical parameters make them particularly good sentinels for assessing the effects of global change and eutrophication of coastal areas. Although their diversity is starting to be routinely monitored with NGS technologies as part of long-term time series at marine stations, the detection of significant shifts in complex microbial community structure is still largely limited by i) the paucity of extensive expert-curated reference databases, which are mandatory for reliable taxonomic assignment of sequences, ii) the lack of homogeneity between genetic analyses performed at different sampling sites and iii) the dispersal of taxonomic expertise on specific groups.

Here we propose to develop a suite of bioinformatics tools that are critical for generating intercomparable inventories of microbial diversity, e.g. at long-term monitoring stations. The data and knowledge generated will constitute unique resources for academic users, including culture collections and/or biotechnological companies, notably allowing users to target specific sampling sites and periods of the year for isolating as-yet-uncultivated microbial species of interest for biotechnological or scientific purposes. To reach these goals, we intend to:

Gather under a common interface a number of existing reference databases targeting the most ecologically relevant microbial groups using either universal markers, such as the 18S rRNA region (PR2) and the plastidial 16S rRNA (PhytoREF), or high resolution functional genes such as proteorhodopsin (PR) for PR-containing photoheterotrophic bacteria (MicRhoDE) and the petB gene, encoding cytochrome b6, for cyanobacteria (CyanoDB).

Develop bioinformatic tools to semi-automatically update these databases over time in order to cope with the continuous and exponential flow of ‘omics’ data available from public databases, culture collections or global sampling projects (e.g., TARA-Oceans, OSD).

Develop user-friendly analytical workflows adapted to each marker gene and sequencing technology in order to quickly and accurately analyze extensive environmental datasets and determine the relative abundance of each microbial taxon.

As a proof of concept, perform seasonal inventories of major microbial groups over the long-term time series SOMLIT-Astan (a station located 2 miles off Roscoff, English Channel, France).
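To illustrate the final step of the analytical workflows listed above, here is a minimal Python sketch of how the relative abundance of each taxon could be derived once reads have been taxonomically assigned. The function name and the sample labels are hypothetical and are not part of the project; real workflows would start from assignments made against the curated reference databases.

```python
from collections import Counter


def relative_abundance(assignments):
    """Return each taxon's share of the assigned reads in one sample.

    `assignments` is an iterable with one taxon label per read
    (a hypothetical input format chosen for this illustration).
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        # No assigned reads: nothing to report for this sample.
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical assignments for a single sample, one label per read.
sample = ["Synechococcus", "Synechococcus", "Prochlorococcus", "Micromonas"]
abundances = relative_abundance(sample)
```

Applied to each sampling date of a time series, such per-sample proportions are what make inventories comparable across dates with different sequencing depths.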

This information will be made available through a centralized virtual access point, integrated into the EMBRC website, offering common tools to search, update, and import/export sequence data. Results from this project will reinforce connections between curated databases, time series and culture collections by providing tools to readily and reliably analyze new (meta)Tags datasets and/or search the inventories for specific organisms. This service will increase the attractiveness and exploitability of long-term stations within and beyond the EMBRC network.

The postdoctoral/engineer fellow (24 months) should have spent at least 12 months abroad within the last three years. The fellow will be co-supervised by the Roscoff Bioinformatics platform ABIMS and the scientists of the Plankton group responsible for each database.

____________________________________________________

Some of the following skills are expected:

Proficiency in software development, particularly with advanced programming in SQL and setting up, editing, correcting, and querying databases.

Knowledge of best practices for supporting large, complex databases, including database backup/restoration, information validation and data verification processes.

Good programming skills (Python, Java), proficiency in Unix/Linux, and experience with computing clusters.

Knowledge of metabarcode/metagenome databases (e.g. those relating to genome annotation and metabolic pathways).

A track record in other omics data analysis (e.g. genomics, transcriptomics, proteomics, metabolomics, epigenomics, genotyping data) is a plus

Familiarity with biological and statistical software packages for high-throughput data analysis (e.g. gene annotation programs, microbial ecology sequence analysis software, R).

Expertise in building bioinformatics pipelines and NGS data analysis

Capacity for team work and interest in multidisciplinary approaches

Ability to communicate technical information effectively, both orally and in writing.

 

PhD in bioinformatics, computational biology, data mining, biostatistics, population and/or statistical genetics, human microbiota, numerical/microbial ecology or equivalent.

____________________________________________________

Interested candidates are encouraged to send their CV, along with a letter stating their interest.

____________________________________________________