Office of Research, UC Riverside
Laxminarayan Bhuyan
Distinguished Professor Emeritus
Computer Science & Engineering
lbhuyan@ucr.edu
(951) 827-2281


SHF: Small: Efficient CPU-GPU Communication for Heterogeneous Architectures

AWARD NUMBER
006824-002
FUND NUMBER
21264
STATUS
Closed
AWARD TYPE
3-Grant
AWARD EXECUTION DATE
6/18/2014
BEGIN DATE
7/1/2014
END DATE
6/30/2017
AWARD AMOUNT
$498,976

Sponsor Information

SPONSOR AWARD NUMBER
CCF-1423108
SPONSOR
NATIONAL SCIENCE FOUNDATION
SPONSOR TYPE
Federal
FUNCTION
Organized Research
PROGRAM NAME

Proposal Information

PROPOSAL NUMBER
14070606
PROPOSAL TYPE
New
ACTIVITY TYPE
Basic Research

PI Information

PI
Bhuyan, Laxmi
PI TITLE
Other
PI DEPARTMENT
Computer Science & Engineering
PI COLLEGE/SCHOOL
Bourns College of Engineering
CO PIs

Project Information

ABSTRACT

Future chip multiprocessors (CMPs) will have the silicon space and technology to incorporate hundreds of cores. The trend is to integrate tens of cores and hardware accelerators (HAs), such as GPUs, on a single platform. The proposed heterogeneous architecture will enable future chips to operate within their power budgets while providing the high throughput per watt required for large scientific applications. Many of the Top500 supercomputers integrate thousands of CPUs with GPU accelerators to achieve the desired throughput for scientific applications. Considerable effort, however, is needed to design efficient communication mechanisms between the heterogeneous components in such a system. Currently, HAs are not fully integrated with the system architecture; offloading computation from the CPU to the HAs adds large communication overhead. This research project explores comprehensive solutions to this problem through many different techniques. The project has significant broader impact in terms of research publications, graduate student supervision, and minority education, as UCR is a minority-serving institution.

This project will develop new CPU-GPU communication techniques through static programming and run-time optimization. It will develop a divisible load theory (DLT) technique to overlap communication with computation and to optimize the timing and size of data transfers between the CPU and GPU. The research will also develop run-time techniques that monitor execution efficiency and dynamically adjust the transfer parameters based on the execution behaviors of different applications. Architectural changes will be incorporated in the GPU so that it can initiate data transfers based on task execution inside the GPU. A shared virtual memory (SVM) architecture will also be designed, in which the accelerator and system memories share a single virtual address space and the CPUs and HAs communicate through the SVM. The hardware controllers, memory management unit (MMU), GPU cache memory architectures, cache coherence protocols, and other interfaces between the GPU and CPU cores will also be designed. The project proposes suitable hybrid cache coherence protocols and efficient interconnection networks for scalable system design. Finally, run-time systems and software interfaces will be developed that can execute multiple multithreaded applications on a heterogeneous multicore architecture.
(Abstract from NSF)
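The DLT-based overlap idea in the abstract can be illustrated with a simple pipelined cost model. This is a hypothetical sketch under a linear cost model, not the project's actual formulation: splitting an offloaded workload into `k` equal chunks lets chunk i+1's host-to-GPU transfer overlap chunk i's kernel execution, while fixed per-chunk overheads (transfer setup, kernel launch) penalize very fine splits, creating a tradeoff that determines the best chunk count. All function and parameter names here are illustrative.

```python
def pipelined_makespan(n, k, xfer_rate, comp_rate,
                       xfer_ovh=0.0, launch_ovh=0.0):
    """Total time to process n items split into k equal chunks,
    where each chunk's transfer overlaps the previous chunk's compute.

    Two-stage pipeline (transfer, then compute) with k jobs of
    per-chunk times t and c has makespan t + c + (k - 1) * max(t, c).
    """
    t = (n / k) / xfer_rate + xfer_ovh    # per-chunk transfer time
    c = (n / k) / comp_rate + launch_ovh  # per-chunk compute time
    return t + c + (k - 1) * max(t, c)

def best_chunk_count(n, xfer_rate, comp_rate,
                     xfer_ovh=0.0, launch_ovh=0.0, k_max=64):
    """Pick the chunk count that minimizes the pipelined makespan."""
    return min(range(1, k_max + 1),
               key=lambda k: pipelined_makespan(
                   n, k, xfer_rate, comp_rate, xfer_ovh, launch_ovh))
```

With zero overheads, finer splits always help (the makespan approaches the slower stage's total time); with nonzero per-chunk overheads, `best_chunk_count` finds the sweet spot, which is the kind of transfer-parameter choice the project's run-time techniques would tune dynamically.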