Transformations that exploit parallelism and improve data locality are two of the most valuable compiler techniques in use today. One industry trend for boosting the performance of engineering computing, such as multi-physics simulation, is to integrate CPUs and GPGPUs on a single chip, most commonly with the last-level cache (LLC) shared between the CPU and GPU. In engineering computing, GPUs are better suited to throughput-critical applications such as Monte Carlo (MC) codes, while CPUs are better suited to latency-critical applications such as Computational Fluid Dynamics (CFD) codes. However, the massive data traffic generated by Monte Carlo codes running on thousands of GPU cores may dominate the LLC and starve the latency-critical codes running on the CPU. A compiler-stage optimization that jointly exploits the data locality and parallelism of codes running on both the CPU and GPU is therefore needed to adapt engineering computing from separate GPU- or CPU-based systems to a single-chip heterogeneous CPU-GPU system. Through this project, students will learn and practice compiler optimization techniques in the compiler back end, and will work with the heterogeneous CPU-GPU simulator gem5-gpu to evaluate their proposed solutions.