Tri, short for Nguyen Minh Tri, is a PhD candidate in the school of electrical engineering at Princeton University. He is advised by Professor David Wentzlaff and works in the field of computer architecture with the Princeton Parallel Computing research group. Tri is currently in his fifth year and hopes to graduate sometime next year...

Besides studying architecture, Tri dedicates whatever is left of his time to guitar, video and board gaming, running, and, last but not least, learning, on... YouTube, all the cool stuff he wishes he had studied instead.

Curriculum Vitae (11/2016)


Research at a Glance

For his PhD thesis, Tri's primary interest is understanding the bandwidth wall: its cause, manifestation, and solution. At the risk of oversimplification, the bandwidth wall refers to the widening gap between computation performance (core count or FLOP/s) and memory performance (number of memory channels, DRAM latency). Limited bandwidth is already a problem for today's throughput computing systems, and for future computing systems such as data centers and supercomputers, memory will truly become a first-class optimization problem. In his research, Tri takes the viewpoint of systems 10+ years in the future, where the throughput of commercial manycore servers matters more than the single-threaded performance of today's consumer desktops.

Selected Publications

MORC: Manycore Cache Compression (MICRO'15) pdf, slides

To increase effective memory bandwidth, one approach is to raise the on-chip cache hit rate through cache compression. Much like compressing a file before emailing it, cache compression compacts the data residing in caches so that more cache lines fit. Because on-chip caches offer far more bandwidth than off-chip memory (terabytes per second vs. gigabytes per second), serving more memory requests from the caches improves performance substantially, even after paying the compression overhead.
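To make the "store more cache lines" idea concrete, here is a minimal sketch of an LRU cache whose capacity is a byte budget rather than a line count. It is a toy model, not MORC or any real hardware design: the 64-byte line size, the synthetic line contents, and the names `ByteBudgetLRU` and `simulate` are all hypothetical, and zlib stands in for the much simpler per-line compressors real caches use.

```python
import zlib
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)

class ByteBudgetLRU:
    """Toy LRU cache whose capacity is measured in bytes, not lines.

    With compress=True, each line is charged its zlib-compressed size
    (zlib is only a stand-in for a hardware compressor), so more lines
    fit in the same byte budget."""

    def __init__(self, capacity_bytes, compress):
        self.capacity = capacity_bytes
        self.compress = compress
        self.lines = OrderedDict()  # address -> bytes charged for that line
        self.used = 0
        self.hits = 0
        self.accesses = 0

    def access(self, addr, data):
        self.accesses += 1
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return
        size = len(zlib.compress(data)) if self.compress else len(data)
        size = min(size, len(data))  # incompressible lines are stored raw
        if size > self.capacity:
            return  # line can never fit; do not cache it
        # Evict least recently used lines until the new line fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.lines.popitem(last=False)
            self.used -= evicted_size
        self.lines[addr] = size
        self.used += size

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0

def simulate(compress, capacity_lines=16, working_set=32, passes=10):
    cache = ByteBudgetLRU(capacity_lines * LINE_SIZE, compress)
    for _ in range(passes):
        for addr in range(working_set):
            # Hypothetical, highly compressible contents: a small value plus zero padding.
            data = addr.to_bytes(4, "little") + bytes(LINE_SIZE - 4)
            cache.access(addr, data)
    return cache.hit_rate()

print(f"uncompressed: {simulate(False):.2f}, compressed: {simulate(True):.2f}")
```

In this setup the working set (32 lines) is twice the uncompressed capacity (16 lines), so a plain LRU cache misses on every access of a cyclic scan. Once each line is charged only its compressed size, all 32 lines fit and only the first pass misses. Real data is rarely this compressible, so the gap here is deliberately exaggerated.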

Unlike prior work in cache compression, MORC uses a novel log-based cache organization that compresses multiple cache lines together in a log, gzip-style. This approach yields vastly improved compression ratios, at the cost of longer compression/decompression latencies and reduced single-threaded performance. For future manycore systems, however, the trade-off is favorable for throughput and energy consumption. MORC was published at MICRO'15 in Waikiki.
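Why compressing lines together helps can be sketched with zlib's DEFLATE (the algorithm behind gzip) used as a stand-in. This is not MORC's actual compressor. The synthetic data, the 64-byte line size, and the helper names `synthetic_lines`, `per_line_ratio`, and `log_ratio` are hypothetical. Each line has almost no internal repetition, but the lines closely resemble one another.

```python
import zlib

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)

def synthetic_lines(n):
    """Hypothetical data: bytes within a line are mostly distinct, but the
    lines are nearly identical to each other (e.g. copies of a similar struct)."""
    return [bytes(range(LINE_SIZE - 4)) + i.to_bytes(4, "little") for i in range(n)]

def per_line_ratio(lines):
    # Compress each line independently; a line that does not shrink is stored raw.
    stored = sum(min(len(zlib.compress(line)), LINE_SIZE) for line in lines)
    return (LINE_SIZE * len(lines)) / stored

def log_ratio(lines):
    # Append all lines to one log and compress it as a single stream, so later
    # lines can be encoded as back-references to earlier ones.
    log = b"".join(lines)
    stored = min(len(zlib.compress(log)), len(log))
    return len(log) / stored

lines = synthetic_lines(8)
print(f"per-line: {per_line_ratio(lines):.2f}x, log: {log_ratio(lines):.2f}x")
```

Compressed one at a time, these lines do not shrink at all, while the log compresses well because each new line is mostly a back-reference to the previous one. The cost is the latency trade-off mentioned above: with a single DEFLATE stream, reading back the last line means decompressing the log from the beginning.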

MORC fig1

Piton (website)

Piton (pronounced pee-t-on) is a manycore prototype designed in-house at Princeton and taped out at IBM's fab (now GlobalFoundries) in 32nm. Its computational core is based on the OpenSPARC T1, and it has all the traditional features you could ever want in a manycore prototype: tile-based design, distributed shared caches, directory-based shared memory, three NoCs, seamless multi-chip...

Piton chip


Email: firstname + lastname[0] at princeton dot edu