In addition to Arm’s 2023 CPU cores, we’re taking a deep dive into what Arm has built into its recently announced 5th Gen mobile graphics architecture that will inevitably power future high-end mobile games. Before getting into the fine details, Arm’s 2023 GPU architecture comes in three product varieties — the Immortalis-G720, Mali-G720, and Mali-G620.
Like last year’s Immortalis-G715, Immortalis-G720 is the flagship product designed with ray tracing capabilities in hand. The Mali-G720 and G620 sport the same architectural capabilities, just with fewer cores and no mandatory ray tracing for more affordable product lines. As in previous Arm GPUs, the graphics core count remains key to scaling performance. So expect to see the Immortalis-G720 in flagship chipsets, the Mali-G720 in the upper-mid-range, and the G620 in more budget-oriented products. The table below highlights the key differences.
Arm 5th-Gen GPUs | Immortalis-G720 | Mali-G720 | Mali-G620 |
---|---|---|---|
Shader core count | 10-16 cores | 7-9 cores | 1-6 cores |
Deferred Vertex Shading? | Yes | Yes | Yes |
Hardware Ray Tracing? | Yes | No (optional) | No (optional) |
Variable Rate Shading? | Yes | Yes | Yes |
L2 cache slices | 2 or 4 | 2 or 4 | 1, 2, or 4 |
Key talking points with Arm’s 5th Gen architecture include a 15% performance per watt gain over the previous generation, 40% less memory bandwidth usage to save on power consumption, and twice the HDR rendering capabilities with 64-bit-per-pixel texturing. All this fits into a GPU core that’s just 2% larger than last-gen.
These eye-catching numbers are, in part, down to the adoption of Deferred Vertex Shading (DVS) in the GPU core, which sits at the heart of Arm’s latest architecture across all three products. Let’s get into how it works.
Deferred Vertex Shading explained
The long and short of DVS is that it reduces memory bandwidth usage, thereby saving on that all-important DRAM power consumption. This frees up shared system memory to accommodate more complex geometry and leaves a bigger power budget for potentially more GPU cores. The examples Arm shared with us include 26% less bandwidth used in Fortnite and 33% less bandwidth in Genshin Impact when compared to its last-gen GPU. The implication is that this is a valuable change for real-world games and not just benchmarks.
To accomplish this, Arm extended its long-running use of deferred rendering to delay vertex as well as fragment shading. Arm bamboozled us all with the following graphic to demonstrate how it all works, but we’ll walk you through it.
First, let’s quickly recap the basics of a graphics rendering pipeline. Vertex rendering comes first, which involves morphing geometry and triangles (think creating water ripples). Next comes rasterization, essentially calculating which triangles can be seen and which “pixel” grid they fall into. Then fragment processing applies color (textures, lighting, depth, etc.) to finalize the frame. The deferred part of a rendering pipeline comes by waiting to do the fragment shading until you’ve culled all the out-of-view triangles. This avoids re-shading triangles multiple times compared to forward shading, which might run multiple lighting calculations on the same geometry.
So performance can increase, but so does the memory requirement to store the deferred data. It can’t all be held in cache, as it can with forward shading, so it is put into an external vertex buffer. That can be costly in terms of power. It’s equally important to appreciate that Arm, like most other mobile GPU designers, uses tile-based rendering, splitting the render frame into much smaller tiles. This saves on local memory and increases performance as fewer pixels are rendered at a given time. However, deferred information must still be stored and returned from memory when it’s time for fragment shading, which consumes power and bandwidth.
The important thing is that DVS reduces memory bandwidth, improving power consumption.
However, if a triangle fits entirely into a small number of tiles, there’s scope to defer part of the vertex shading process until much closer to fragment shading. In this instance, vertex data is kept in a local cache and processed closer in time to fragment shading. The result is far fewer memory reads and writes, and therefore a notable saving in power consumption. The smart thing about Arm’s implementation is that positional information is gathered as part of the tiling process, making it possible to cull triangles early and defer rendering if they fit in the tile. For larger triangles, forward vertex rendering is used and the data is stored in an external buffer. After all the triangles are processed, they are recalled from memory for rasterization and fragment shading.
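To make the small-versus-large triangle split more concrete, here is a minimal Python sketch of the decision described above. It is an illustration only, not Arm's hardware logic: the tile size, the tile-count threshold, and the names `tiles_touched` and `choose_vertex_path` are all hypothetical, and the tile count is estimated from the triangle's screen-space bounding box, which can overcount for thin diagonal triangles.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]

TILE_SIZE = 16               # hypothetical tile edge in pixels
MAX_TILES_FOR_DEFERRAL = 4   # hypothetical cut-off for "a small number of tiles"


@dataclass
class Triangle:
    a: Point
    b: Point
    c: Point


def tiles_touched(tri: Triangle, tile_size: int = TILE_SIZE) -> int:
    """Count the tiles covered by the triangle's screen-space bounding box."""
    xs = (tri.a[0], tri.b[0], tri.c[0])
    ys = (tri.a[1], tri.b[1], tri.c[1])
    first_col, last_col = int(min(xs) // tile_size), int(max(xs) // tile_size)
    first_row, last_row = int(min(ys) // tile_size), int(max(ys) // tile_size)
    return (last_col - first_col + 1) * (last_row - first_row + 1)


def choose_vertex_path(tri: Triangle, max_tiles: int = MAX_TILES_FOR_DEFERRAL) -> str:
    """Return 'deferred' for small triangles, 'forward' for large ones.

    'deferred' stands for keeping vertex data in a local cache and shading it
    close to fragment shading; 'forward' stands for shading up front and
    writing the result to an external buffer.
    """
    return "deferred" if tiles_touched(tri) <= max_tiles else "forward"


small = Triangle((1, 1), (5, 1), (3, 6))       # sits inside a single tile
large = Triangle((0, 0), (100, 0), (0, 100))   # spans many tiles
print(choose_vertex_path(small), choose_vertex_path(large))
```

In this toy model, the small triangle takes the deferred path and the large one falls back to forward vertex shading, mirroring the split the article describes. Real hardware makes this choice during tiling using position data, with criteria Arm has not spelled out here.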
Importantly, this feature is handled completely in hardware, saving memory bandwidth in certain scenarios (particularly models with very high geometry detail or many small distant triangles) without any input from software developers.
That’s a lot to take in (it’s taken me many tries). The key to understanding it is basically that, where possible, Arm’s 5th-Gen architecture holds off on vertex shading in addition to traditional fragment shading to cut down on costly reads and writes to memory, which saves power.
There’s even more to Arm’s 5th Gen graphics architecture
Robert Triggs / Android Authority
DVS is just part of Arm’s latest GPU architecture. Ray tracing support returns, of course, which is mandatory in the Immortalis-branded G720. But there’s also now support for 2x Multi-Sampling Anti-Aliasing (MSAA), in addition to previously supported 4x, 8x, and 16x options. 4x MSAA has little overhead with tile-based pipelines, but Arm has seen that developers want to drive even higher frame rates in their games to improve fidelity. Hence its latest architecture supports 2x MSAA as well.
The latest GPUs also improve performance in 4×2 and 4×4 fragment shading rates used in VRS. A niche use case, to be sure, but one that will give the graphics core extra futureproofing for upcoming games.
At a deeper level, Arm supports implementing two power rails for higher core counts (six and above), enabling higher clock frequencies for the same voltage as before. Speaking of power, the G720 duo and G620 have additional clock, voltage, and power domain configuration options for fine-grain energy control.
So what does this all mean for next-generation smartphone graphics chips? Well, improved power consumption is the big gain, thanks to memory savings and other power improvements. That’s not just significant for battery life; it also means that Arm’s partners could increase their core count for additional performance while remaining within existing power budgets. Even if core counts don’t grow, that 15% typical energy saving can be put towards additional performance itself, which will translate to better frame rates in the latest high-end mobile games.
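As a rough back-of-the-envelope illustration of that trade-off, the Python sketch below treats the headline figure as a 15% gain in performance per watt. The function names are hypothetical, and the core-count estimate assumes GPU power scales linearly with the number of cores, which is a simplification rather than anything Arm has stated.

```python
def relative_power_at_same_performance(perf_per_watt_gain: float) -> float:
    """Power needed for unchanged performance, relative to the old design."""
    return 1.0 / (1.0 + perf_per_watt_gain)


def relative_performance_at_same_power(perf_per_watt_gain: float) -> float:
    """Performance available at unchanged power, relative to the old design."""
    return 1.0 + perf_per_watt_gain


def cores_within_same_budget(old_cores: int, perf_per_watt_gain: float) -> int:
    """Whole cores that fit the old power budget.

    Assumption: power scales linearly with core count and every core
    benefits equally from the efficiency gain.
    """
    return int(old_cores * (1.0 + perf_per_watt_gain))


gain = 0.15  # the 15% performance-per-watt figure quoted in the article
print(f"Power at same performance: {relative_power_at_same_performance(gain):.3f}x")
print(f"Performance at same power: {relative_performance_at_same_power(gain):.2f}x")
print(f"Cores in a former 14-core budget: {cores_within_same_budget(14, gain)}")
```

Under these assumptions, a partner can either hold performance steady and draw less power, or spend the headroom on a couple of extra cores or higher clocks.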