NATIONAL SCIENCE FOUNDATION Award 1646333 to Jiang, Tao: EAGER: Transcript-Based Differential Expression Analysis for Population Data Without Predefined Conditions

Tao Jiang
Distinguished Professor
Computer Science & Engineering
tjiang@ucr.edu
(951) 827-2991

UCR Grants Archive

EAGER: Transcript-Based Differential Expression Analysis for Population Data Without Predefined Conditions

AWARD NUMBER

008399-002

FUND NUMBER

33275

STATUS

Closed

AWARD TYPE

3-Grant

AWARD EXECUTION DATE

8/4/2016

BEGIN DATE

9/1/2016

END DATE

8/31/2018

AWARD AMOUNT

$200,000

Sponsor Information

SPONSOR AWARD NUMBER

1646333

SPONSOR

NATIONAL SCIENCE FOUNDATION

SPONSOR TYPE

Federal

FUNCTION

Organized Research

PROGRAM NAME

Proposal Information

PROPOSAL NUMBER

16121312

PROPOSAL TYPE

New

ACTIVITY TYPE

Basic Research

PI Information

Jiang, Tao

PI TITLE

Other

PI DEPARTMENT

Computer Science & Engineering

PI COLLEGE/SCHOOL

Bourns College of Engineering

CO PIs

Project Information

ABSTRACT

With the emergence of precision medicine, there is increased demand for more sensitive molecular biomarkers. A fundamental computational step in the discovery of molecular biomarkers is to identify genes that are expressed differently across samples. This project investigates new algorithmic approaches for performing differential expression analysis at the transcript level for samples without predefined biological conditions. Such an analysis is critical to both clinical and biological studies on population (or cohort) data. For example, it can be used to discover molecular biomarkers that classify cancer samples into subtypes so that better diagnosis and therapy methods can be developed for each subtype. It can also be used to characterize individual cells involved in different biological processes. Efficient software tools built on the analysis approaches proposed in this project can help biologists discover more sensitive biomarkers than existing methods do.

Specifically, this project studies three approaches for differential transcript expression analysis on population data. The first two approaches treat either the exons or full transcripts of a gene as the basic expression elements and then apply a gene-level differential expression analysis method on these expression elements. The third approach is a hybrid of the first two. It uses a splice graph to represent the transcripts of a gene and a new modular decomposition algorithm to partition the graph into small components that correspond to independent alternative splicing events. Moreover, a robust clustering algorithm is employed to deal with an arbitrary number of conditions in the input population. The software implementations of the three approaches are calibrated and tested extensively on both simulated and real sequence data to establish their practical utility to the public.

(Abstract from NSF)
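
The splice graph mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's software or its modular decomposition algorithm; it only shows one common way to represent the transcripts of a gene as a graph, assuming each transcript is given as an ordered list of exon coordinates. The function names, the toy coordinates, and the rule for flagging junctions are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_splice_graph(transcripts):
    """Build a simple splice graph (illustrative only).

    transcripts: dict mapping a transcript name to its exons, each exon a
    (start, end) tuple, listed in 5'->3' order.
    Returns (nodes, edges): nodes is the set of exons; edges maps each
    junction (exon_a, exon_b) to the set of transcripts that use it.
    """
    nodes = set()
    edges = defaultdict(set)
    for name, exons in transcripts.items():
        nodes.update(exons)
        # Consecutive exons in a transcript are joined by a splice junction.
        for a, b in zip(exons, exons[1:]):
            edges[(a, b)].add(name)
    return nodes, dict(edges)


def alternative_junctions(edges, n_transcripts):
    """Junctions not used by every transcript (a crude, assumed criterion)."""
    return sorted(e for e, names in edges.items() if len(names) < n_transcripts)


# Hypothetical gene with two transcripts; T2 skips the middle exon.
toy = {
    "T1": [(100, 200), (300, 400), (500, 600)],
    "T2": [(100, 200), (500, 600)],
}
nodes, edges = build_splice_graph(toy)
alt = alternative_junctions(edges, len(toy))
```

In this toy gene, every junction is used by only one of the two transcripts, so all three are flagged; together they describe a single exon-skipping event, which is the kind of small, independent component the abstract's third approach aims to isolate.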