However, the design includes important modifications to make it suitable for graphics (and high-performance computing): it has a 16-wide vector processor, and it has been designed to better hide the long memory latencies typical of graphics workloads.
Note that the vector processor is the foundation of the parallel-computing capability here. Intel adamantly says that Larrabee is not a GPU, but can act as one. That is technically true, although in practice I suspect that it will be used only as a GPU when it debuts.
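To make the "16-wide" part concrete, here is a minimal sketch in plain C++ (not actual Larrabee intrinsics; the inner 16-lane loop stands in for what would be a single vector instruction on such hardware):

```cpp
#include <cstddef>

// Hypothetical illustration: a 16-wide vector unit applies the same
// operation to 16 data elements at once.
constexpr std::size_t kVectorWidth = 16;

// Scale 'count' pixel intensities by 'gain', 16 lanes at a time.
void scale_pixels(float* pixels, std::size_t count, float gain) {
    for (std::size_t i = 0; i + kVectorWidth <= count; i += kVectorWidth) {
        // One "vector op": on 16-wide hardware this whole inner loop
        // would execute as a single multiply instruction.
        for (std::size_t lane = 0; lane < kVectorWidth; ++lane) {
            pixels[i + lane] *= gain;
        }
    }
    // Scalar tail for counts that are not a multiple of 16.
    for (std::size_t i = count - (count % kVectorWidth); i < count; ++i) {
        pixels[i] *= gain;
    }
}
```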
We now know that Larrabee will be DirectX- and OpenGL-compliant through a 100% software renderer. Current GPUs have certain parts, like polygon setup and blending, that are “fixed” or “hardwired” (meaning they are not programmable). Larrabee, however, will keep one fixed-function block: texture filtering. That part of the workload is (for now) just too expensive to do in a more flexible way and is better served by a fixed solution.
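To see why texture filtering stays in hardware, here is a rough sketch of what just one bilinear sample costs when done in software (my own simplification, assuming a plain float RGBA texture; real filtering also handles mipmaps, anisotropy, and format conversion, which multiplies the work):

```cpp
#include <cmath>

struct Color { float r, g, b, a; };

// Hypothetical texture: a simple 2D array of float RGBA texels.
struct Texture {
    const Color* texels;
    int width, height;
    Color fetch(int x, int y) const {
        // Clamp addressing, for simplicity.
        x = x < 0 ? 0 : (x >= width  ? width  - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return texels[y * width + x];
    }
};

static Color lerp(const Color& a, const Color& b, float t) {
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t, a.a + (b.a - a.a) * t };
}

// One bilinear sample: 4 texel fetches plus 3 lerps, per pixel, per
// texture -- and this omits mipmapping, anisotropy, and compressed
// formats, which dedicated hardware handles essentially for free.
Color sample_bilinear(const Texture& tex, float u, float v) {
    float x = u * tex.width  - 0.5f;
    float y = v * tex.height - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0, fy = y - y0;
    Color top    = lerp(tex.fetch(x0, y0),     tex.fetch(x0 + 1, y0),     fx);
    Color bottom = lerp(tex.fetch(x0, y0 + 1), tex.fetch(x0 + 1, y0 + 1), fx);
    return lerp(top, bottom, fy);
}
```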
http://softwarecommunity.intel.com/articles/eng/3803.htm
At render time, the image is split into blocks of 128×128 pixels (the number is flexible, but 128×128 should use half of the cache) that are processed independently. That allows the rendering engine to pre-compute a number of things and re-use that information later on, resulting in bandwidth savings.
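The cache arithmetic checks out: a 128×128 tile with 32-bit color and 32-bit depth is 128 × 128 × 8 bytes = 128 KB, half of the 256 KB per-core L2 Intel has described. Here is a minimal sketch of the binning step (my own simplification, not Intel's code): each triangle is assigned to the tiles its bounding box overlaps, and each tile can then be rendered independently out of cache:

```cpp
#include <algorithm>
#include <vector>

// Tile size is a tuning knob; 128x128 with 32-bit color + 32-bit depth
// is 128*128*8 bytes = 128 KB, i.e. half of a 256 KB per-core L2.
constexpr int kTileSize = 128;

struct Vec2 { float x, y; };
struct Triangle { Vec2 v[3]; };

// Assign each triangle to every tile its bounding box touches.
// bins[ty * tilesX + tx] ends up with the triangle indices for that
// tile, which can then be rasterized independently (and in parallel).
std::vector<std::vector<int>> bin_triangles(
        const std::vector<Triangle>& tris, int screenW, int screenH) {
    const int tilesX = (screenW + kTileSize - 1) / kTileSize;
    const int tilesY = (screenH + kTileSize - 1) / kTileSize;
    std::vector<std::vector<int>> bins(tilesX * tilesY);

    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const Triangle& t = tris[i];
        float minX = std::min({t.v[0].x, t.v[1].x, t.v[2].x});
        float maxX = std::max({t.v[0].x, t.v[1].x, t.v[2].x});
        float minY = std::min({t.v[0].y, t.v[1].y, t.v[2].y});
        float maxY = std::max({t.v[0].y, t.v[1].y, t.v[2].y});

        int tx0 = std::max(0, static_cast<int>(minX) / kTileSize);
        int tx1 = std::min(tilesX - 1, static_cast<int>(maxX) / kTileSize);
        int ty0 = std::max(0, static_cast<int>(minY) / kTileSize);
        int ty1 = std::min(tilesY - 1, static_cast<int>(maxY) / kTileSize);

        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(i);
    }
    return bins;
}
```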
Previous generations of GPUs that attempted tiling did not succeed, but Intel should not meet the same fate, because its architecture is flexible enough to handle all the corner cases.
Intel also claims that because Larrabee doesn’t have fixed-function units like triangle setup or raster operations (ROP), which can become bottlenecks depending on rendering conditions (shadow-map rendering might be setup-limited if the triangles are very small), it should maximize the use of its x86 cores.
It’s hard to tell what will happen until we see the first functional units running synthetic benchmarks. Also, because the pipeline is 100% programmable, things like frame-buffer blending can be much more programmable than they are today, which is good news for developers.
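As an example of what programmable blending could mean, here is a toy sketch (my invention, not a real Larrabee or DirectX API) where the blend stage is simply a function the developer supplies, instead of a fixed menu of hardware blend modes:

```cpp
#include <functional>
#include <vector>

struct RGBA { float r, g, b, a; };

// In a fully software pipeline, the blend stage is just code: any
// function of source and destination color will do. (Toy sketch.)
using BlendFn = std::function<RGBA(const RGBA& src, const RGBA& dst)>;

void write_pixel(std::vector<RGBA>& framebuffer, int index,
                 const RGBA& src, const BlendFn& blend) {
    framebuffer[index] = blend(src, framebuffer[index]);
}

// Classic alpha blending, expressed as one possible BlendFn...
RGBA alpha_blend(const RGBA& s, const RGBA& d) {
    float ia = 1.0f - s.a;
    return { s.r * s.a + d.r * ia, s.g * s.a + d.g * ia,
             s.b * s.a + d.b * ia, s.a + d.a * ia };
}

// ...but nothing stops an arbitrary rule, e.g. keeping whichever color
// is darker by luminance -- a data-dependent decision that fixed-function
// blend units cannot express.
RGBA keep_darker(const RGBA& s, const RGBA& d) {
    float src_lum = 0.299f * s.r + 0.587f * s.g + 0.114f * s.b;
    float dst_lum = 0.299f * d.r + 0.587f * d.g + 0.114f * d.b;
    return src_lum < dst_lum ? s : d;
}
```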
Note that Larrabee could be the ultimate “benchmark” processor: because it is so flexible, Intel could modify its driver to adapt it to any situation. That means that it could be configured optimally for key benchmarks, but most importantly key games and applications!
This would require a lot of manpower to pull off, but it is possible. Right now, relatively rigid GPU architectures are built around a set of expectations extrapolated from analysis of current and upcoming applications.
GPU life cycles are short enough that GPU makers have time to react to trend changes. But Intel says that it can react at any time, since it doesn’t have to wait for a new hardware cycle to change how the rendering pipeline works. That’s pretty cool.
However, note that it will most likely run DirectX 11, which means that it won’t be able to differentiate itself much from classic GPUs under that API. Also, we don’t know how flexible future GPUs will be…
As we learn more about the *theoretical* capabilities of this chip, I’m starting to think that it could be a console “dream processor” or co-processor, because console makers are typically very fond of extremely flexible architectures. They’re not the only ones: game developers will also be excited if the performance is there. Heck, as a software guy, I really like the idea.
The most important question is: “How fast will it be in the real world?” Right now, nobody knows for sure, including Intel. They won’t know until they get the first samples and start measuring performance. Intel will release more information soon, so stay tuned. We’ll keep an eye on this!
Update 8/4: the SIGGRAPH paper is available. This is pretty technical, but just in case you feel like reading it…