Near-data computing, processing in memory (PIM), in-storage computing—all of these buzzwords are pretty hot right now. They acknowledge that, in the modern era, the real costs in high-performance computing come from data movement, not computation itself. So by "moving" computation "closer" to the data it acts on, we stand to build far more efficient systems.
One problem with these buzzwords is that they can mean many different things, from custom SRAM arrays to just throwing an ARM processor into an SSD enclosure. But one category of work boils down to saying: "let's see if we can use HBM to do anything useful."
High-Bandwidth Memory (HBM) is a new standard for 3D-stacked DRAM, meaning that a memory chip can be slapped directly onto a logic chip. In this 3D arrangement, the logic and memory can communicate at multiple points across the 2D surface where they make contact. The result is that the memory is, well, very high bandwidth—but it's also complicated to use, because you have to carefully manage the channels you communicate over. You can visualize several "portals" in the logic die, distributed across the chip, each of which can communicate with a chunk of the memory and each surrounded by a little cluster of computational logic. If you live in a logic cluster that happens to be close to the portal that holds the data you need, you are lucky and you can get that data quickly. If you have to go to another portal, well, sorry—you need to pay the cost to traverse over to the other portal and ask for it there.
The result is that there is a cottage industry of papers that just show you can do something useful with HBM while approaching its full bandwidth potential. Here's a very recent overview paper, and here's a talk from the same authors. The upshot is that we live in a world where, if you can get an FPGA+HBM chip to do something useful, you can publish a paper.
Calyx is well positioned to break this one-off accelerator design logjam. We already know that Calyx is great at making it easy to generate custom processing logic for an application. Let's see what it would take to hook this core processing logic up to HBM channels in a flexible way. If we can put a DSL in charge of how to allocate computation to HBM channels, how to manage the interface, what kinds of caching/buffering to add, etc., we might be able to make it far easier to explore the design space of HBM-exploiting accelerators.
Fortunately, we already have an HBM-equipped FPGA in havarti. So step 0 toward understanding how feasible this would be is to make a minimal design that can interact with a single HBM channel and expose it to a Calyx program.
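To make that concrete, here's a rough sketch of what such a step-0 Calyx program might look like. Everything here is hypothetical: `hbm_ch0` is just an ordinary `@external` Calyx memory standing in for a single HBM pseudo-channel, and actually binding it to a real channel on havarti would go through whatever AXI wrapper the toolchain generates. The kernel itself is deliberately trivial (read a word, increment it, write it back), since the point is the channel interface, not the computation.

```
import "primitives/core.futil";
import "primitives/memories/comb.futil";

component main() -> () {
  cells {
    // Stand-in for one HBM pseudo-channel: 64 32-bit words.
    @external hbm_ch0 = comb_mem_d1(32, 64, 6);
    val = std_reg(32);
    add = std_add(32);
  }
  wires {
    // Read one word from the "channel" into a register.
    group read {
      hbm_ch0.addr0 = 6'd0;
      val.in = hbm_ch0.read_data;
      val.write_en = 1'd1;
      read[done] = val.done;
    }
    // Increment it and write it back to the same address.
    group incr_write {
      add.left = val.out;
      add.right = 32'd1;
      hbm_ch0.addr0 = 6'd0;
      hbm_ch0.write_data = add.out;
      hbm_ch0.write_en = 1'd1;
      incr_write[done] = hbm_ch0.done;
    }
  }
  control {
    seq { read; incr_write; }
  }
}
```

Once something like this works against one channel, the interesting part begins: replicating the kernel per channel and letting a DSL decide the mapping, buffering, and interface details.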