NATIONAL SCIENCE FOUNDATION Award DBI-1262107 to Jiang, Tao: Collaborative Research: ABI Innovation: Genome-Wide Inference of mRNA Isoforms and Abundance Estimation from Biased RNA-Seq Reads

Tao Jiang
Distinguished Professor
Computer Science & Engineering
tjiang@ucr.edu
(951) 827-2991

UCR Grants Archive

Collaborative Research: ABI Innovation: Genome-Wide Inference of mRNA Isoforms and Abundance Estimation from Biased RNA-Seq Reads

AWARD NUMBER

006441-002

FUND NUMBER

21203

STATUS

Closed

AWARD TYPE

3-Grant

AWARD EXECUTION DATE

8/31/2013

BEGIN DATE

9/1/2013

END DATE

8/31/2016

AWARD AMOUNT

$569,932

Sponsor Information

SPONSOR AWARD NUMBER

DBI-1262107

SPONSOR

NATIONAL SCIENCE FOUNDATION

SPONSOR TYPE

Federal

FUNCTION

Organized Research

PROGRAM NAME

Proposal Information

PROPOSAL NUMBER

12115270

PROPOSAL TYPE

New

ACTIVITY TYPE

Basic Research

PI Information

Jiang, Tao

PI TITLE

Other

PI DEPARTMENT

Computer Science & Engineering

PI COLLEGE/SCHOOL

Bourns College of Engineering

CO PIs

Project Information

ABSTRACT

The University of California, Riverside and University of California, Los Angeles are awarded collaborative grants to identify mRNA isoforms on a genome-wide basis. Due to alternative splicing events in eukaryotic cells, the identification of mRNA isoforms (or transcripts) is a difficult problem in molecular biology. Traditional experimental methods for this purpose are time-consuming and cost-ineffective. The emerging RNA-Seq technology provides a potentially effective way to address this problem. This project aims to develop efficient and accurate methods for inferring isoforms and estimating their abundance levels from RNA-Seq data where the reads may be sampled non-uniformly due to the existence of various biases including positional, sequencing and mappability biases. In particular, a novel statistical framework based on quasi-multinomial distributions will be introduced and a companion expectation-maximization (EM) algorithm developed for estimating isoform abundance levels that can handle all of the above biases in RNA-Seq data. The algorithms will be implemented efficiently in C++, tested extensively on both simulated and real RNA-Seq data in human, mouse and Drosophila, and made available to the public for free. The performance of the algorithms will be evaluated extensively using both simulated and real RNA-Seq data. In the latter case, perturbations to some important splicing factors will be introduced into selected cell lines to induce widespread alteration of splicing events. RNA-Seq data of these cells, combined with quantitative RT-PCR validation, will provide an enriched dataset to assess the performance of the algorithms in predicting both isoform abundance and relative variation. In addition, the validation results may provide insight into the regulatory functions of the splicing factors and serve as a testbed for further improvement of the algorithms.

The broader impact of this project is twofold. First, RNA-Seq data analysis is a timely topic in bioinformatics due to recent rapid advances in next-generation sequencing (NGS) technologies and their potential impact in the life sciences and medicine. Despite the success of many RNA-Seq applications, several challenges remain in the analysis of RNA-Seq data, one of which comes from the understanding and handling of biases in RNA-Seq reads. The approaches proposed in this project for treating RNA-Seq biases combine unique techniques from statistics, machine learning and combinatorial algorithms. Moreover, the experimental validation results may shed light on the regulatory functions of some important splicing factors. Second, the project will provide an excellent opportunity for the training of two computer science PhD students, a postdoc and two biology undergraduate students in the interdisciplinary field of computational biology and bioinformatics. Since many of the involved students are female, the research will also help improve the representation of women in science and engineering.

(Abstract from NSF)