
Visual Analysis of RNAseq Data: Discovering Genes in Bacteria

S. Simon

2015
Dissertation

RNA sequencing (RNAseq) using next-generation sequencing (NGS) technologies now makes it possible to produce transcriptomic data in a high-throughput fashion. However, the analysis of these large and complex biological data sets remains a great challenge. This analysis is highly explanatory in nature and requires constantly connecting observations with implicit domain knowledge. It therefore calls for interactive visual analysis systems with an expert user in the analysis loop. Designing such systems for RNAseq data demands interdisciplinary research at the interface between molecular biology and visual data analysis. However, the epistemic distance between the two fields is typically very large, so knowledge gaps and interdisciplinary communication issues hamper effective collaboration. To bridge the knowledge gap between domain and visualization experts, I introduce the Liaison role for problem-driven research in the visualization domain, which fosters better and richer interdisciplinary communication. In this thesis, I contribute a problem characterization and task descriptions for discovering and describing genes using RNAseq data. Based on the problem characterization, I identify two research gaps: first, assessing the trustworthiness of RNAseq data in the analysis and, second, discovering and relating genes to identify their functions. With the systems NGS Overlap Searcher and VisExpress, I present two visual analysis solutions that address these research gaps. Furthermore, I evaluate both systems and apply them to real data sets with real experts, leading to important insights for the biological domain as well as for problem-driven visualization research.
