Office of Research, UC Riverside
Tao Jiang
Distinguished Professor
Computer Science & Engineering
tjiang@ucr.edu
(951) 827-2991


EAGER: Transcript-Based Differential Expression Analysis for Population Data Without Predefined Conditions

AWARD NUMBER
008399-002
FUND NUMBER
33275
STATUS
Closed
AWARD TYPE
3-Grant
AWARD EXECUTION DATE
8/4/2016
BEGIN DATE
9/1/2016
END DATE
8/31/2018
AWARD AMOUNT
$200,000

Sponsor Information

SPONSOR AWARD NUMBER
1646333
SPONSOR
NATIONAL SCIENCE FOUNDATION
SPONSOR TYPE
Federal
FUNCTION
Organized Research
PROGRAM NAME

Proposal Information

PROPOSAL NUMBER
16121312
PROPOSAL TYPE
New
ACTIVITY TYPE
Basic Research

PI Information

PI
Jiang, Tao
PI TITLE
Other
PI DEPARTMENT
Computer Science & Engineering
PI COLLEGE/SCHOOL
Bourns College of Engineering
CO PIs

Project Information

ABSTRACT

With the emergence of precision medicine, there is increased demand for more sensitive molecular biomarkers. A fundamental computational step in the discovery of molecular biomarkers is to identify genes that are expressed differently across different samples. This project investigates new algorithmic approaches for performing differential expression analysis at the transcript level for samples without predefined biological conditions. Such an analysis is critical to both clinical and biological studies on population (or cohort) data. For example, it can be used to discover molecular biomarkers to classify cancer samples into subtypes so that better diagnosis and therapy methods can be developed for each subtype. It can also be used to characterize individual cells involved in different biological processes. Efficient software tools built on the analysis approaches proposed in this project can help biologists discover more sensitive biomarkers than existing methods allow.

Specifically, this project studies three approaches for differential transcript expression analysis on population data. The first two approaches treat either the exons or the full transcripts of a gene as the basic expression elements and then apply a gene-level differential expression analysis method to these expression elements. The third approach is a hybrid of the first two. It uses a splice graph to represent the transcripts of a gene and a new modular decomposition algorithm to partition the graph into small components that correspond to independent alternative splicing events. Moreover, a robust clustering algorithm is employed to deal with an arbitrary number of conditions in the input population. The software implementations of the three approaches are calibrated and tested extensively on both simulated and real sequence data to establish their practical utility to the public.
(Abstract from NSF)
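
The idea of scoring expression elements without predefined conditions can be illustrated with a small sketch. This is not the project's algorithm, which the abstract does not specify; it is a toy stand-in that assumes exons are the expression elements, assumes exactly two hidden conditions, and scores each exon by how much of its variance is explained by the best two-group split of the samples. The function names, the exon names, and the expression values are all hypothetical, and the score is a descriptive ranking aid, not a significance test.

```python
from statistics import mean


def best_two_group_split(values):
    """Find the best split of 1-D expression values into a low and a high group.

    Returns (score, threshold), where score is the fraction of total variance
    explained by the split (0.0 to 1.0) and threshold is the midpoint between
    the two groups. Returns (0.0, None) when all values are identical.
    """
    xs = sorted(values)
    grand_mean = mean(xs)
    total = sum((x - grand_mean) ** 2 for x in xs)
    if total == 0:
        return 0.0, None

    best_score, best_threshold = 0.0, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no boundary between equal values
        low, high = xs[:i], xs[i:]
        low_mean, high_mean = mean(low), mean(high)
        within = (sum((x - low_mean) ** 2 for x in low)
                  + sum((x - high_mean) ** 2 for x in high))
        score = 1 - within / total
        if score > best_score:
            best_score = score
            best_threshold = (xs[i - 1] + xs[i]) / 2
    return best_score, best_threshold


def rank_elements(expression):
    """Rank expression elements by their best two-group split score, highest first."""
    scored = [(name, *best_two_group_split(vals))
              for name, vals in expression.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical exon-level expression for one gene across six unlabeled samples.
expression = {
    "exon_1": [10.0, 11.0, 9.5, 10.5, 10.2, 9.8],  # noisy, no clear grouping
    "exon_2": [2.0, 2.1, 1.9, 8.0, 8.2, 7.9],      # two clear sample groups
    "exon_3": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],      # constant across samples
}

ranking = rank_elements(expression)
for name, score, threshold in ranking:
    print(name, round(score, 3), threshold)
```

On this toy input, the exon whose samples fall into two well-separated groups ranks first and the constant exon ranks last. Even a noisy element receives a nonzero score from its best split, so a real analysis would need a calibrated statistical test and, as the abstract notes, a clustering method that handles an arbitrary number of conditions.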