Building an automated genomics data curation and collection pipeline
Dr Jody Phelan (LSHTM), Gary Napier (LSHTM), Dr Ruby Chang (RVC) and Dr Martin Walker (RVC) - £30,500
1 October 2019 to 31 March 2021 (Sandpit Award BSA33)
The project aims to build a pipeline for collecting M. tuberculosis NGS data and to apply unsupervised learning methods to characterise the population structure in real time. The project has three stages:
- A number of unsupervised learning techniques will be applied to a large database of publicly available isolate sequences to identify the optimum methods to determine population structure. Techniques will be assessed based on speed, scalability and accuracy.
- A backend to the TB-Profiler webserver will be developed to integrate frameworks from aims 1 and 2.
- A data protection impact assessment will be performed in compliance with GDPR. This will aim to characterise how the service interfaces with user data and to minimise potential data security risks. A security policy will be created to ensure that the project developers are knowledgeable about data privacy and security.
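
As an illustration of the first stage, the sketch below shows one way a candidate clustering technique could be scored on accuracy and speed. It is not the project's method: the choice of single-linkage clustering on Hamming distance, the toy genotype matrix, the reference lineage labels, and all function names are assumptions made for this example only.

```python
import time
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(samples, k):
    """Agglomerative single-linkage clustering, merging until k clusters remain.

    Returns one integer cluster label per sample.
    """
    clusters = [[i] for i in range(len(samples))]
    # Precompute all pairwise distances, keyed by (smaller index, larger index).
    dist = {(i, j): hamming(samples[i], samples[j])
            for i, j in combinations(range(len(samples)), 2)}
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(dist[tuple(sorted((i, j)))]
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(samples)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs that both labelings treat the same way."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


# Toy data invented for illustration: rows are isolates, columns are SNP sites.
isolates = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
]
known_lineage = [0, 0, 1, 1, 1]  # hypothetical reference labels

start = time.perf_counter()
predicted = single_linkage(isolates, k=2)
elapsed = time.perf_counter() - start  # speed of this technique on this input
score = rand_index(predicted, known_lineage)  # accuracy against the reference
print(f"labels={predicted} rand_index={score:.2f} time={elapsed:.6f}s")
```

Running the same timing and scoring harness over several techniques and over increasing numbers of isolates would give a rough view of accuracy, speed and scalability. The naive implementation here recomputes cluster distances on every merge, so it is only suitable for small examples.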