Friday, January 23, 2009

Intel Graphics Media Accelerator 900 and Intel 915G

(New Generation of Integrated Graphics)

Intel’s contribution to integrated graphics has been rather poor compared to mighty rivals like RADEON 9x00 PRO and NVIDIA nForce2. The speed and functionality of Intel’s Extreme Graphics 2 core are no match for the current integrated GPUs from NVIDIA and ATI – our recent review of contemporary integrated chipsets confirmed this point.
In spite of the alluring name, Extreme Graphics 2 is obsolete with its one pixel pipeline and two texture-mapping units (I won’t mention VIA or SiS today – their currently available integrated graphics cores are downright hopeless). It resembles the long-forgotten TNT chipset from NVIDIA: like the TNT, Extreme Graphics 2 has no hardware support for T&L or shaders.
Intel seemed to give little thought to that: its integrated chipsets never lose to their competitors in other capabilities, and high-performance integrated graphics must have been of less interest to the company.
This situation changed with the arrival of LGA775 CPUs and a new family of PCI Express chipsets: the i915G chipset boasts a new integrated graphics processor called Intel Graphics Media Accelerator 900, and it is the first integrated chipset with hardware support for DirectX 9 pixel shaders.
Now, let’s discuss this and other facts in more detail.

Intel Graphics Media Accelerator 900: Functionality and Features:

So, Graphics Media Accelerator 900 is an integral part of the Intel 915G chipset, which supports the PCI Express bus and DDR2 memory. We tested the i915G chipset using a D915GUX mainboard from Intel:


Data transfers between the graphics core and memory, both on standalone graphics cards and with integrated chipsets, are performed in rather big chunks, so higher memory frequency is more important than the timings. That is, the use of DDR2 memory, which works at higher clock rates compared to DDR SDRAM, provides an additional performance reserve to the integrated graphics processor: as usual, Graphics Media Accelerator 900 uses some part of the system RAM as graphics memory.
The i915G features a dual-channel memory controller, and ideally, when there’s no load from the CPU, GMA 900 can exchange data with the “graphics memory” at a speed of up to 8.5GB/s. The 128-bit “graphics memory bus” and 533MHz memory frequency are good parameters even if we compare them to mainstream discrete graphics.
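The 8.5GB/s figure follows from simple arithmetic. Here is a minimal sketch of it, assuming two 64-bit channels (which is what the 128-bit “graphics memory bus” implies) and decimal gigabytes; the function name and parameters are my own and serve only as an illustration:

```python
def peak_bandwidth_gb_s(channels: int, bus_width_bits: int, transfers_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = channels * bus_width_bits / 8
    return bytes_per_transfer * transfers_mhz * 1e6 / 1e9


# Dual-channel DDR2 at 533MHz effective, as on the i915G.
ddr2_533_dual = peak_bandwidth_gb_s(channels=2, bus_width_bits=64, transfers_mhz=533)
# Dual-channel DDR at 400MHz effective (PC3200), for comparison.
ddr_400_dual = peak_bandwidth_gb_s(channels=2, bus_width_bits=64, transfers_mhz=400)

print(f"DDR2-533, dual channel: {ddr2_533_dual:.2f} GB/s")
print(f"DDR-400, dual channel:  {ddr_400_dual:.2f} GB/s")
```

This is a theoretical peak: in a real system the CPU shares the same bus, so the graphics core sees less.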
Let’s now focus on the graphics core alone. The following table compares the two generations of integrated graphics from Intel:

Parameter                                        | Intel 865G                        | Intel 915G
Graphics core                                    | Intel Extreme Graphics 2          | Intel Graphics Media Accelerator 900
Graphics core clock frequency                    | 266MHz                            | 333MHz
Pixel pipelines                                  | 1                                 | 4
Texturing units per pipeline                     | 2                                 | 1
Maximum pixel rendering speed                    | 266Mpixels/sec                    | 1333Mpixels/sec
Maximum texturing speed                          | 533Mtexels/sec                    | 1333Mtexels/sec
Maximum number of textures during multitexturing | 4                                 | 8
Hardware pixel shaders                           | None                              | DirectX 9 shaders 2.0
Hardware vertex shaders and T&L                  | None                              | None
FSAA methods                                     | None                              | None
Texture filtering                                | Bilinear, tri-linear, anisotropic | Bilinear, tri-linear, anisotropic
Maximum anisotropy level                         | 2x                                | 4x
Multi-display configurations                     | None                              | Yes
RAMDAC frequency                                 | 350MHz                            | 400MHz


The table doesn’t cover the video playback and output characteristics of the integrated GPUs, but it is clear anyway that Graphics Media Accelerator 900 is not a development of the existing architecture but a new GPU built from the ground up.
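The peak fill rates in the table can be reproduced from the clock rate and the pipeline layout. The sketch below is only an illustration: the class and method names are mine, the exact clocks (266.67 and 333.33MHz rather than the nominal 266 and 333MHz) are an assumption that makes the 533 and 1333 figures come out, and the multi-texturing model simply assumes one extra cycle per pixel whenever a pipeline runs out of texturing units.

```python
import math
from dataclasses import dataclass


@dataclass
class GraphicsCore:
    name: str
    clock_mhz: float        # core clock
    pixel_pipelines: int    # pixels processed per clock
    tmus_per_pipeline: int  # texturing units per pipeline

    def pixel_rate(self) -> float:
        """Peak fill rate in Mpixels/s."""
        return self.clock_mhz * self.pixel_pipelines

    def texel_rate(self) -> float:
        """Peak texturing rate in Mtexels/s."""
        return self.pixel_rate() * self.tmus_per_pipeline

    def pixel_rate_with_textures(self, textures: int) -> float:
        """Peak Mpixels/s when every pixel needs `textures` texture layers."""
        cycles = math.ceil(textures / self.tmus_per_pipeline)
        return self.pixel_rate() / cycles


# Assumption: the nominal 266 and 333MHz clocks are 800/3 and 1000/3 MHz.
eg2 = GraphicsCore("Extreme Graphics 2", 800 / 3, 1, 2)
gma900 = GraphicsCore("GMA 900", 1000 / 3, 4, 1)

for core in (eg2, gma900):
    # int() truncates, which matches the way the table lists the values.
    print(core.name, int(core.pixel_rate()), int(core.texel_rate()),
          int(core.pixel_rate_with_textures(2)))
```

With two textures per pixel this simple model still gives GMA 900 about 2.5 times the pixel throughput of Extreme Graphics 2, even though the latter has two texturing units on its single pipeline.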

Some points need comments:

The core now has four pixel pipelines. To be more exact, GMA 900, like the previous Extreme Graphics 2 core, has only one pixel pipeline, but unlike its predecessor, it processes four pixels at a time – all modern graphics processors organize their pixel pipelines the same way.
GMA 900’s “four-pixel” pipeline has four texture-sampling units, which is equivalent to four independent pixel pipelines with one texture-mapping unit each. Graphics Media Accelerator 900 can apply one texture to a pixel per cycle, and each additional texture requires an additional cycle. That’s exactly how modern GPUs work.
GMA 900 features hardware support for DirectX 9 pixel shaders. It means that modern applications using Shader Model 2.0 can start and run on a GMA 900 system without any problems or quality loss. Unfortunately, Intel hasn’t yet published any information about the calculation precision supported during shader execution, while special-purpose test utilities like Xbitmark, ShaderMark and others, as if by conspiracy, refused to run on the i915G – they all require hardware support for DirectX 9 vertex shaders.
The new graphics core from Intel has no hardware support for vertex shaders or T&L. All the geometry transformations are calculated by the central processor of the system. The company presents this as an efficient use of system resources and relies on the power of its processors – the user shouldn’t pay for a more complex graphics core if the CPU can handle geometry calculations all right. However, things are not so bright in reality: discrete graphics processors have long had hardware support for vertex shaders, and their special-purpose vertex units are not inferior even to the fastest Intel CPUs when it comes to shader execution speed.
Intel’s GMA 900 employs a tile-based architecture – “Zone Rendering Technology 3” is Intel’s term for it. This technology works like this:

1. Before drawing the image, the driver first waits until the application has provided all the polygons necessary for rendering. Then, for each tile (a “zone” in Intel’s terminology; it is a rectangular fragment of the image), a list of the triangles that fully or partially cover it is produced.
2. When rendering a frame, the graphics processor renders tile after tile, using the polygon lists created at Step 1 as source data, until the entire frame is rendered.
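The two steps above can be sketched in a few lines. This is a toy model and not Intel’s actual algorithm: the tile size, the function names and the bounding-box overlap test (which is conservative and may list a triangle for a tile it only comes close to) are all my own assumptions.

```python
from collections import defaultdict

TILE_SIZE = 32  # hypothetical; Intel's real zone size is not stated here


def bin_triangles(triangles, width, height, tile=TILE_SIZE):
    """Step 1: for each tile, list the triangles that may touch it."""
    bins = defaultdict(list)
    for tri_id, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the triangle's bounding box to the screen.
        x0, x1 = max(0, min(xs)), min(width - 1, max(xs))
        y0, y1 = max(0, min(ys)), min(height - 1, max(ys))
        if x0 > x1 or y0 > y1:
            continue  # entirely off-screen
        for ty in range(int(y0) // tile, int(y1) // tile + 1):
            for tx in range(int(x0) // tile, int(x1) // tile + 1):
                bins[(tx, ty)].append(tri_id)
    return bins


def render_frame(triangles, width, height, tile=TILE_SIZE):
    """Step 2: visit the tiles one by one, using only each tile's own list."""
    bins = bin_triangles(triangles, width, height, tile)
    visited = []
    for ty in range((height + tile - 1) // tile):
        for tx in range((width + tile - 1) // tile):
            tri_ids = bins.get((tx, ty), [])
            # A real GPU would rasterize tri_ids into on-chip tile buffers
            # here and write the finished tile out to memory once.
            visited.append(((tx, ty), tri_ids))
    return visited


triangles = [
    [(0, 0), (10, 0), (0, 10)],      # fits inside tile (0, 0)
    [(20, 20), (50, 20), (20, 50)],  # bounding box spans all four tiles
]
frame = render_frame(triangles, width=64, height=64)
for tile_xy, tri_ids in frame:
    print(tile_xy, tri_ids)
```

Note that bin_triangles needs the complete triangle list before render_frame can process even the first tile, which is the very reason the driver has to wait for all the geometry.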

This operation scheme has both advantages and shortcomings. The advantages include:

The use of small fragments, tiles, allows for an efficient use of the GPU’s caches since small amounts of homogeneous data are operated upon.
Having drawn a tile, the GPU never returns to it during the creation of the frame. Since a tile is small, the fragments of the frame buffer and the Z-buffer that correspond to it can be loaded into the GPU’s cache in their entirety. Thus, the graphics processor does all of its calculations “on-die”, using the cache rather than system memory. After the tile is drawn, the contents of the tile’s frame buffer and Z-buffer are written into system RAM. Caching the frame and Z buffers alleviates the load on the memory bus by performing data transfers in larger blocks. This is most important for an integrated chipset, whose graphics core has to share the memory bus with the central processor.
Intel’s GMA 900 has a special unit for checking the Z values of pixels. If the check shows that a group of pixels won’t be visible, the group is excluded from further processing. This Z-checking helps GMA 900 avoid unnecessary work – like texturing or shader execution – on invisible pixels and be most efficient at rendering scenes with high overdraw, a parameter that reflects how much the objects overlap (or how many times a pixel is “redrawn”).
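To show why a Z check before texturing pays off at high overdraw, here is a small counting model. It is a simplification of my own: it tests single pixels rather than groups of pixels, and it says nothing about how the GMA 900 unit is actually built.

```python
def count_shaded(fragments, num_pixels):
    """Return (shaded_without_z_check, shaded_with_z_check).

    fragments: (pixel_index, depth) pairs in submission order;
    a smaller depth means closer to the viewer.
    """
    z_buffer = [float("inf")] * num_pixels
    shaded_with_check = 0
    for pixel, depth in fragments:
        if depth < z_buffer[pixel]:  # nearer than what is already stored
            z_buffer[pixel] = depth
            shaded_with_check += 1   # only such fragments get shaded
    return len(fragments), shaded_with_check


num_pixels = 4
layers = [0.2, 0.5, 0.8]  # three overlapping layers, i.e. an overdraw of 3
front_to_back = [(p, d) for d in layers for p in range(num_pixels)]
back_to_front = [(p, d) for d in reversed(layers) for p in range(num_pixels)]

print(count_shaded(front_to_back, num_pixels))  # (12, 4)
print(count_shaded(back_to_front, num_pixels))  # (12, 12)
```

In this model the saving depends on the drawing order: when the nearest layer comes first, two thirds of the shading work is skipped, and when it comes last, nothing is saved.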

The disadvantages of a tile-based architecture are mostly related to how it processes geometry data:
In order to create the polygon lists correctly, a tile-based graphics processor has to wait until all the geometry data necessary to build a frame has arrived, and only then does it start rendering the scene. GPUs of the traditional architecture begin to process streams of geometry data and render the scene as soon as they start receiving the data.
The need to sort the polygons and create lists for each tile fits poorly with the well-established streaming, pipelined operation of vertex processors. This is probably the reason why GMA 900 offers no hardware support for T&L and vertex shaders, with all geometry processing as well as polygon sorting performed by the central processor.

Graphics Media Accelerator 900 doesn’t seem to have full-screen antialiasing. At least, the driver’s control panel doesn’t offer this function. Of course, FSAA is not a very important feature for an integrated graphics core, considering its overall low level of performance. However, it would come in handy in simple 3D games where the core would have some performance reserve.
GMA 900 supports anisotropic texture filtering up to the 4x level. Anisotropic filtering cannot be combined with tri-linear filtering: the latter is disabled when you enable the former.
GMA 900 supports Dynamic Video Memory Technology version 3.0. Thanks to DVMT, system memory becomes “graphics memory” when necessary and in the necessary amounts; it is freed up for the needs of the OS when the GPU no longer uses it. Thus, the OS and the GPU share the system memory in the most efficient and balanced way.

Here is how GMA 900 works with memory:

The memory amount necessary for the graphics core is divided into two parts. The first and smaller part – Preallocated Memory – is the GPU’s domain; the operating system cannot use it and regards it as regular graphics memory. You can set the size of this memory area in the BIOS to 1MB or 8MB.
The other part is provided for GMA 900 by DVM Technology. Three DVMT modes are supported:


In the “Fixed” mode, a fixed-size fragment of the system memory is allocated to the graphics core. It can only be used by the graphics core; its size can be set to 64 or 128MB.
In the “DVMT” mode, the driver of the graphics core uses the system memory like any other OS component or application does. If a “heavy” 3D game starts up, requiring a lot of memory for textures, geometry data and so on, and there are no other memory-hungry applications running, the required memory amount is automatically allotted to the graphics core. When the GPU no longer needs the surplus memory, it automatically hands it back to the OS. The maximum amount of memory given to the GPU in this mode is 224MB (the preallocated memory included).
In the “Fixed+DVMT” mode, the graphics processor gets a fixed-size chunk of 64MB of memory (preallocated memory included) and up to 64MB of dynamically-allotted memory. This mode guarantees that at least 64MB of memory is available to the graphics core, with a possibility to increase this amount to 128MB, if necessary.
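The three modes can be summed up as a pair of numbers: how much memory the graphics core is guaranteed and how much it can get at most. The function below is just my own summary of the figures quoted above, not a driver or BIOS interface; where the description leaves something unsaid, the assumption is marked in a comment.

```python
def graphics_memory_mb(mode, preallocated_mb=8, fixed_mb=64):
    """Return (guaranteed_mb, maximum_mb) for the given DVMT mode."""
    if preallocated_mb not in (1, 8):
        raise ValueError("preallocated memory is 1MB or 8MB")
    if mode == "fixed":
        if fixed_mb not in (64, 128):
            raise ValueError("the fixed size is 64MB or 128MB")
        # Assumption: the fixed block is all the graphics core gets.
        return fixed_mb, fixed_mb
    if mode == "dvmt":
        # Assumption: only the preallocated part is guaranteed;
        # the 224MB ceiling includes it.
        return preallocated_mb, 224
    if mode == "fixed+dvmt":
        # 64MB fixed (preallocated included) plus up to 64MB dynamic.
        return 64, 64 + 64
    raise ValueError(f"unknown mode: {mode}")


for mode in ("fixed", "dvmt", "fixed+dvmt"):
    print(mode, graphics_memory_mb(mode))
```

The “Fixed+DVMT” mode thus sits between the other two: it gives up the 224MB ceiling in exchange for a guaranteed 64MB.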

So, the new graphics processor from Intel is an ambiguous product: it combines an efficient tile-based architecture, support for DirectX 9 pixel shaders and flexible memory management with such deficiencies as the lack of hardware T&L and vertex shaders, no FSAA, and no high-quality texture filtering mode (tri-linear combined with anisotropic filtering).
Today I’m going to test Graphics Media Accelerator 900 and compare it to potential and actual competitors. The description of our testbed and testing methodology follows.

Testbed and Methods:
The i915G-based system was configured as follows:

Intel Pentium 4 Extreme Edition 3400MHz CPU (800MHz FSB);
Intel D915GUX mainboard (Intel 915G chipset);
2x512MB Micron DDR2 SDRAM, 533MHz.

I’ll compare the i915G chipset with the integrated chipsets for the Socket 478 and Socket A platforms that I reviewed in the previous article: ATI RADEON 9100 IGP, RADEON 9000 PRO IGP, SiS661 FX, Intel 865G, NVIDIA nForce2 IGP, SiS741 GX and VIA KM400. I tested these chipsets using Pentium 4 3000MHz (800MHz FSB) and Athlon XP 3000+ (333MHz FSB) CPUs and 2x512MB TwinMOS PC3200 (CL2) memory.
As in the previous review, I didn’t ask the impossible of the integrated graphics by using extreme gaming modes. I just used the games’ medium and low image-quality settings in 800x600 and 1024x768 resolutions:

The “Medium Quality” mode uses average graphics quality settings and 32-bit color depth of the frame buffer. This is a compromise between speed and image quality in games.
The second or “Low Quality” mode uses the lowest graphics quality settings and 16-bit color (in games that allow setting this color depth) to achieve the maximum fps rates.

When running synthetic tests and games that use DirectX 9 shaders, we took the cheapest of the available DirectX 9-compatible graphics cards from ATI and NVIDIA: RADEON 9600, RADEON 9550 with a 64-bit memory bus, GeForce FX 5200 and GeForce FX 5200 with a 64-bit memory bus.
Unfortunately, PCI Express versions of these cards were unavailable at the time of our tests, so we tested AGP products. Since we used very powerful configurations, the results of the cards will clearly be determined mostly by the performance of their GPUs and memory rather than by the speed of the CPU or the system overall.
The i915G-based mainboard doesn’t support AGP, so we plugged those graphics cards into a differently configured system:


AMD Athlon 64 3400+ CPU;
ASUS K8V-SE mainboard;
2x512MB TwinMOS PC3200 CL2.5.

I’d like to emphasize the fact that the tested graphics cards vary greatly in their performance level among themselves: RADEON 9600/9550 compete with GeForce FX 5700/5700LE in the market, rather than with the GeForce FX 5200. However, we chose these cards since they are the cheapest products from ATI and NVIDIA with hardware support of DirectX 9 shaders and are potential rivals of the i915G on the transition to the PCI Express bus. My point is that you shouldn’t compare the cards among themselves: they are potential competitors to the i915G, not to each other.
Now, the testing environment is clear – let’s proceed to the benchmarks.

Synthetic Benchmarks: Pixel Performance:

Thanks to the four pixel pipelines and increased clock rate, the pixel output rate and the texturing speed grew manifold on GMA 900 compared to the previous core, Extreme Graphics 2. The fill-rate tests of 3DMark 2001 SE confirm this fact:

Intel’s GMA 900 comes very close to its maximum theoretical texturing speed, both with one texture and with several – the texturing speed is 80% and 96% of the maximum, respectively, in the hardest mode.
The share of read/write operations with the frame and Z buffers is much higher at single-texturing than at multi-texturing, so the tile-based GMA 900 processor, which caches the Z-buffer and the frame buffer, should be similarly efficient at single- and multi-texturing. It is notable, though, that GMA 900 is more efficient at multi-texturing, which is a trait of classic-architecture GPUs. At the same time, in spite of the limited memory bus bandwidth, this GPU is rather indifferent to changes in the precision of the frame buffer and Z-buffer, which is a feature of tile-based processors.
Now let’s see how the new graphics processor handles pixel shaders. Specialized test suites like Xbitmark or ShaderMark wouldn’t run on GMA 900, finding no hardware support for vertex shaders, so we’ll limit ourselves to the results of 3DMark 2001 SE and 3DMark03:



Like all modern graphics processors with hardware support for DirectX 9 pixel shaders, GMA 900 has no problem with DirectX 8 shaders. It runs a simple DirectX 8 shader fast enough, being just a little slower than the RADEON 9600.

The efficiency of Graphics Media Accelerator 900 declines when rendering a scene with a more complex shader. Considering the difference in frequencies between the RADEON 9600 and GMA 900, I can say that GMA 900 is nearly twice as slow. On the other hand, it keeps its advantage over the GeForce FX 5200.
Running a complex DirectX 9 pixel shader, rich in mathematical calculations, the new graphics core from Intel finds itself behind the GeForce FX 5200, not to mention the RADEON 9600/9550.
So, GMA 900 is quite efficient at texturing, but execution of complex DirectX 8 and 9 pixel shaders doesn’t seem to be among its strong points.
