Office of Research, UC Riverside
Heng Yin
Associate Professor
Computer Science & Engineering
hengy@ucr.edu
(951) 827-6437


SaTC: CORE: Small: Towards Robust and Scalable Search of Binary Code and Data

AWARD NUMBER
009253-002
FUND NUMBER
33383
STATUS
Active
AWARD TYPE
3-Grant
AWARD EXECUTION DATE
8/18/2017
BEGIN DATE
9/15/2017
END DATE
8/31/2020
AWARD AMOUNT
$476,756

Sponsor Information

SPONSOR AWARD NUMBER
1719175
SPONSOR
NATIONAL SCIENCE FOUNDATION
SPONSOR TYPE
Federal
FUNCTION
Organized Research
PROGRAM NAME

Proposal Information

PROPOSAL NUMBER
17050711
PROPOSAL TYPE
New
ACTIVITY TYPE
Basic Research

PI Information

PI
Yin, Heng
PI TITLE
Other
PI DEPARTMENT
Computer Science & Engineering
PI COLLEGE/SCHOOL
Bourns College of Engineering
CO PIs

Project Information

ABSTRACT

The problem of binary code and data search concerns how to glean valuable information from binary code and binary data in an accurate, scalable and robust fashion. This concern is central to many security problems, including vulnerability scanning, code plagiarism detection, software lineage, malware classification, memory forensics, virtual machine introspection, and malicious document detection. Although this problem is not new and many solutions have been proposed, no existing solution achieves accuracy, scalability and robustness simultaneously. The bottlenecks lie in the search schemes themselves: pair-wise comparison for binary code search does not scale, and rule-based binary data search is too rigid and thus not robust against changes caused by different platform versions and malicious manipulations.

The proposed work takes a novel approach to the problem of binary code and data search, one that mimics how the human brain recognizes interesting objects from an enormous amount of visual information. There are two research thrusts: 1) scalable cross-platform binary code search, which aims to quickly identify semantically equivalent or similar code in a large binary code base spanning different architectures, by automatically learning high-level features from binary code via clustering and deep learning; and 2) adaptive, efficient and robust binary data analysis, which aims to accurately identify objects in binary data such as memory dumps and documents, by constructing deep neural network models. Because binary code and data search are foundational for many security applications, advances to these foundations can push the boundary for all the security applications built on top of them. Moreover, successful application of deep learning to binary code and data search will revolutionize how we solve many security problems in general and stimulate more research in the direction of security by deep learning.
(Abstract from NSF)