Prediction of complex human traits from genomic data using machine learning methods and informative priors
Tutor: Reka Nagy
Participants: Konstantin Sharafutdinov, Elena Carnero, Dunja Vucenovic, Tatiana Shashkova
In this project we sought to improve prediction of complex human health traits from genomic data by evaluating different ways of incorporating domain knowledge into parametric and non-parametric prediction methods.
Experimental design in Omics - simulation studies
Leaders: Jeanine Houwing-Duistermaat, Mar Rodriguez Girondo
Tutors: Lucija Klaric and Frano Vuckovic
Participants: Sara Koska, Ena Melvan, Manshu Song, Viktoria Szeifert
In this project we evaluated how the different steps of an experimental design may introduce errors, biases and loss of efficiency into the data analysis. To that end, we analysed two case-control studies. In both, the final goal was to measure glycans in order to assess their relationship with a disease (lupus and multiple sclerosis, respectively). This requires measuring the glycans efficiently and without introducing bias. Topics discussed were the distinction between biological and technical variation, experimental design, randomization, replication and missing data. We performed simulations under various scenarios and discussed adequate strategies for efficient designs and data analysis plans for “-omics” case-control studies.
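One way to picture the kind of simulation described above is to compare a plate layout that is confounded with disease status against a randomized one. The following is a minimal sketch, not the project's actual code: the function name, the sample sizes, the number of plates and the variance settings are all assumptions chosen for illustration. It simulates a null scenario (no true association between disease and glycan level) and counts how often a two-sample t-test rejects anyway.

```python
import numpy as np
from scipy import stats


def simulate_false_positive_rate(randomize, n_per_group=40, n_plates=4,
                                 plate_sd=1.0, noise_sd=1.0,
                                 n_sim=500, alpha=0.05, seed=0):
    """Fraction of simulated null studies in which a t-test rejects at level alpha.

    All defaults are hypothetical values picked for illustration.
    """
    rng = np.random.default_rng(seed)
    n = 2 * n_per_group
    status = np.repeat([0, 1], n_per_group)  # 0 = control, 1 = case
    rejections = 0
    for _ in range(n_sim):
        if randomize:
            # Balanced plate labels shuffled independently of disease status
            plate = rng.permutation(np.arange(n) % n_plates)
        else:
            # Confounded layout: controls fill the first plates, cases the last
            plate = np.arange(n) * n_plates // n
        # Technical variation: one random shift per plate
        plate_effect = rng.normal(0.0, plate_sd, size=n_plates)
        # Null scenario: the glycan level does not depend on disease status
        glycan = plate_effect[plate] + rng.normal(0.0, noise_sd, size=n)
        _, p = stats.ttest_ind(glycan[status == 0], glycan[status == 1])
        rejections += p < alpha
    return rejections / n_sim


fpr_confounded = simulate_false_positive_rate(randomize=False)
fpr_randomized = simulate_false_positive_rate(randomize=True)
print(f"false positive rate, confounded plates: {fpr_confounded:.2f}")
print(f"false positive rate, randomized plates: {fpr_randomized:.2f}")
```

In the confounded layout the plate shifts are indistinguishable from a disease effect, so the test rejects far more often than the nominal level. Randomizing samples across plates turns the same technical variation into noise that is unrelated to case-control status.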
Discovering the difference in genetic control of plasma and IgG glycosylation
Leaders and tutors: Sodbo Sharapov, Yakov Tsepilov, Olga Zaitseva
Participants: Alyce Russel, Ashley van der Spek, FeiFei Zhao
Many plasma proteins are modified by covalently bound glycans: oligosaccharides that are biologically important for normal biochemical processes such as protein folding and cellular signaling. The biosynthesis of such glycans occurs in the endoplasmic reticulum in tandem with protein biosynthesis. Yet, unlike proteins, glycans do not follow a genetic template. Instead, a complex network of hundreds of genes is involved in the biosynthetic pathway, and many genes controlling glycosylation are still unknown. Immunoglobulin G (IgG), transferrin and fibrinogen are the most prevalent glycoproteins in the plasma proteome. IgG is heavily studied, and its glycan moieties are known to have a downstream influence on effector functions, that is, on whether IgG is anti- or pro-inflammatory. Genome-wide association studies (GWAS) are a powerful tool for discovering new associations between common genetic variants and phenotypes of interest. GWAS have already been used successfully on the plasma and IgG glycomes separately, identifying important glyco-genes; however, there have been fewer studies of other plasma protein glycomes. In this project we were interested in using GWAS to analyse the plasma glycome without the IgG fraction of glycans, with the aim of identifying new loci associated with the glycosylation of other important plasma glycoproteins.
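The core step of a GWAS, testing each variant separately for association with a trait, can be sketched on simulated data as follows. This is an illustration only and not the pipeline used in the project: the genotypes, the trait, the effect size and the position of the causal variant are all invented, and real analyses additionally adjust for covariates, relatedness and population structure.

```python
import numpy as np
from scipy import stats


def single_variant_scan(genotypes, trait):
    """Regress the trait on each variant's allele count (0, 1 or 2).

    Returns per-variant effect estimates and p-values.
    """
    betas, pvals = [], []
    for j in range(genotypes.shape[1]):
        fit = stats.linregress(genotypes[:, j], trait)
        betas.append(fit.slope)
        pvals.append(fit.pvalue)
    return np.array(betas), np.array(pvals)


# Simulated data; every number below is an assumption made for illustration
rng = np.random.default_rng(1)
n_samples, n_variants, causal = 1000, 200, 17
maf = rng.uniform(0.1, 0.5, size=n_variants)            # minor allele frequencies
genotypes = rng.binomial(2, maf, size=(n_samples, n_variants))
# Stand-in for a glycan trait influenced by a single variant
trait = 0.5 * genotypes[:, causal] + rng.normal(size=n_samples)

betas, pvals = single_variant_scan(genotypes, trait)
top_hit = int(np.argmin(pvals))
bonferroni = 0.05 / n_variants                           # simple multiple-testing threshold
print(f"top variant: {top_hit}, significant: {pvals[top_hit] < bonferroni}")
```

Because one test is run per variant, the significance threshold has to account for multiple testing. The Bonferroni correction used here is the simplest choice.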
Longitudinal and Survival data analysis
Leader: Ivo Ugrina
Tutor: Frano Vuckovic
Participants: Zlata Cherpakova, Nikolina Sostaric
During the project, students learnt the basics of longitudinal and survival data analysis, with emphasis on the Friedman test and the Cox proportional hazards model. Students learnt how to work with the appropriate R functions for testing, visualizing and describing data. As an additional (unconventional) approach, students learnt about basic tensor algebra and tensor decompositions such as the PARAFAC and Tucker decompositions. These methods were applied to longitudinal data to test the quality and interpretability of decompositions of data that contain time as one mode. Results: longitudinal and survival data analysis was conducted on two real-life data sets. Additionally, PARAFAC tensor decomposition was applied to longitudinal data, yielding an interesting representation of the data and some insight into their problematic structure. The results of the project seem interesting and will be followed up in future scientific research.
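As an illustration of the Friedman test mentioned above, the sketch below applies it to simulated repeated measurements. The project itself used R; this example uses Python with SciPy's `friedmanchisquare` instead. The number of subjects and visits, the trend over visits and the noise levels are assumptions made only to give the test some data.

```python
import numpy as np
from scipy import stats

# Simulated longitudinal data: rows are subjects, columns are visits
rng = np.random.default_rng(42)
n_subjects, n_visits = 30, 4
subject_level = rng.normal(0.0, 2.0, size=(n_subjects, 1))   # between-subject variation
visit_shift = np.array([0.0, 0.5, 1.0, 1.5])                 # assumed trend over visits
measurements = (subject_level + visit_shift
                + rng.normal(0.0, 0.5, size=(n_subjects, n_visits)))

# The Friedman test ranks the visits within each subject, so large
# between-subject differences do not mask a consistent change over time.
# It expects one array per visit, hence the transpose and unpacking.
statistic, p_value = stats.friedmanchisquare(*measurements.T)
print(f"Friedman chi-square = {statistic:.2f}, p = {p_value:.3g}")
```

The test is non-parametric and uses only the within-subject ordering of the visits, which makes it a reasonable first check for a change over time when the same subjects are measured repeatedly.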