NVIDIA's 64-Bit Denver CPU Architecture Details Unveiled - Dual Custom ARMv8 Cores Clocked at 2.50 GHz
NVIDIA has unveiled the first architecture details of its custom-designed 64-bit Denver CPU, which is also its first high-performance SoC design, at Hot Chips. It has been almost eight months since NVIDIA launched the Tegra K1 SoC, which features a Cortex-A15 processor and 192 Kepler cores and which NVIDIA positions ahead of competing chips in both performance and power efficiency.
NVIDIA's 64-Bit Denver CPU Architecture Details Unveiled
The first Tegra K1 variant, based on the 32-bit Cortex-A15 core, has made a name for itself and features in popular devices such as the Xiaomi MiPad and the NVIDIA Shield Tablet, the company's reference design and latest Shield-branded "handheld" gaming device. However, we have known since launch that there were always supposed to be two variants of the Tegra K1 SoC: one with the 32-bit ARM core and one featuring the 64-bit Denver CPU. Theoretically, Project Denver's dual-core design should be much more powerful than the previous 4+1 Cortex-A15 based variant. The 'Super Dual Core', as NVIDIA calls it, implements ARMv8-A, the first version of the ARM architecture to support 64-bit. A major indicator of its power efficiency is that while the 4+1 variant includes a low-power core for non-intensive applications, the Denver variant has only the two cores.
Denver is a dual-core design at heart, featuring a 7-way superscalar microarchitecture paired with 192 Kepler GPU cores. It includes a 128 KB 4-way L1 instruction cache, a 64 KB 4-way L1 data cache and a 2 MB 16-way L2 cache. Denver also makes use of the new Dynamic Code Optimization, which stores frequently used software routines as dense, highly tuned microcode-equivalent routines. For this purpose, a 128 MB optimization cache is reserved in main memory, which reduces the need to re-optimize software routines.
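The optimize-once, replay-later idea behind the optimization cache can be sketched in a few lines of Python. This is only an illustration, not NVIDIA's implementation: the `OptimizationCache` class, the `fake_optimize` stand-in, the entry-count capacity and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class OptimizationCache:
    """Toy optimize-once, replay-many cache keyed by routine address."""

    def __init__(self, capacity, optimize):
        self.capacity = capacity      # max cached routines (hypothetical unit)
        self.optimize = optimize      # hypothetical optimizer callback
        self.entries = OrderedDict()  # address -> optimized routine, oldest first
        self.hits = 0
        self.misses = 0

    def fetch(self, address, arm_code):
        """Return the optimized routine, optimizing only on a cache miss."""
        if address in self.entries:
            self.hits += 1
            self.entries.move_to_end(address)  # mark as most recently used
            return self.entries[address]
        self.misses += 1
        optimized = self.optimize(arm_code)
        self.entries[address] = optimized
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used (assumed policy)
        return optimized


def fake_optimize(arm_code):
    # Stand-in for the real optimizer: just upper-cases the mnemonics.
    return tuple(op.upper() for op in arm_code)


cache = OptimizationCache(capacity=2, optimize=fake_optimize)
loop_body = ["add x0, x0, x1", "subs x2, x2, #1"]
for _ in range(3):
    routine = cache.fetch(0x1000, loop_body)
```

After the three fetches the optimizer has run once and the cached routine has been replayed twice, which is the saving the optimization cache is meant to provide for hot code.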
As part of the Dynamic Code Optimization procedure, Denver looks across a window of hundreds of instructions and unrolls loops, renames registers, removes unused instructions, and reorders the code in various ways for optimal speed. This effectively doubles the performance of the base-level hardware through the conversion of ARM code to highly optimized microcode routines and increases the execution energy efficiency. via NVIDIA
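One of the listed transformations, removing unused instructions, can be illustrated with a toy backward liveness pass. The three-field instruction format, the register names and the `remove_unused` helper are hypothetical, and the sketch assumes straight-line code with no stores, branches or other side effects; it is not a description of Denver's actual optimizer.

```python
def remove_unused(instructions, live_out):
    """Drop instructions whose result is never read (backward liveness pass).

    Each instruction is (dest, op, srcs); live_out names the registers that
    are still needed after the sequence ends.
    """
    live = set(live_out)
    kept = []
    for dest, op, srcs in reversed(instructions):
        if dest in live:
            live.discard(dest)   # this write satisfies the later reads
            live.update(srcs)    # its inputs become live in turn
            kept.append((dest, op, srcs))
        # otherwise nothing reads dest afterwards, so the instruction is dropped
    kept.reverse()
    return kept


program = [
    ("r1", "add", ("r0", "r0")),
    ("r2", "mul", ("r1", "r1")),  # r2 is never read again
    ("r3", "sub", ("r1", "r0")),
]
optimized = remove_unused(program, live_out={"r3"})
```

Here the `mul` writing `r2` is removed because nothing downstream reads it, while the `add` is kept because `r3` depends on `r1`.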
Coming to the technical details, the material presented at Hot Chips shows that the Denver CPU has its own instruction set and uses binary translation to convert ARMv8 instructions into its own ISA. As reported by TechReport:
- Binary translation is for real. Yes, the Denver CPU runs its own native instruction set internally and converts ARMv8 instructions into its own internal ISA on the fly. The rationale behind doing so is the opportunity for dynamic code optimization. Denver can analyze ARM code just before execution and look for places where it can bundle together multiple instructions (that don't depend on one another) for execution in parallel. Binary translation has been used by some interesting CPU architectures in the past, including, famously, Transmeta's x86-compatible effort. It's also used for emulation of non-native code in a number of applications. Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128MB cache stored in main memory. Optimized code sequences can then be recalled and replayed when they are used again.
- Execution is wide but in-order. Denver attempts to save power and reap the benefits of dynamic code optimization by eschewing power-hungry out-of-order execution hardware in favor of a simpler in-order engine. That execution engine is very wide: seven-way superscalar and thus capable of processing as many as seven operations per clock cycle. Denver's peak instruction throughput should be very high. The tougher question is what its typical throughput will be in end-user workloads, which can be variable enough and contain enough dependencies to challenge dynamic optimization routines. In other words, Denver's high peak throughput could be accompanied by some fragility when it encounters difficult instruction sequences. via TechReport
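To make the idea of bundling independent instructions for a seven-wide in-order engine concrete, here is a minimal greedy sketch. The `(dest, srcs)` instruction format and the `bundle_in_order` helper are hypothetical, and the rule of starting a new bundle on any register hazard is a deliberately conservative assumption, not Denver's real scheduling policy.

```python
ISSUE_WIDTH = 7  # seven-way superscalar, per the description above


def bundle_in_order(instructions, width=ISSUE_WIDTH):
    """Greedily pack instructions, in program order, into issue bundles.

    An instruction starts a new bundle when the current one is full or when
    it conflicts with a register already read or written in that bundle.
    """
    bundles = []
    current, written, read = [], set(), set()
    for dest, srcs in instructions:
        raw = any(src in written for src in srcs)  # reads a result from this bundle
        waw = dest in written                      # rewrites a register written in this bundle
        war = dest in read                         # overwrites a register read in this bundle
        if current and (len(current) == width or raw or waw or war):
            bundles.append(current)
            current, written, read = [], set(), set()
        current.append((dest, srcs))
        written.add(dest)
        read.update(srcs)
    if current:
        bundles.append(current)
    return bundles


code = [
    ("r1", ("r0",)),
    ("r2", ("r0",)),
    ("r3", ("r1", "r2")),  # needs r1 and r2, so it cannot share their bundle
    ("r4", ("r0",)),
]
bundles = bundle_in_order(code)
```

The four instructions end up in two bundles of two, far below the peak of seven per cycle, which is the kind of dependency-limited throughput the TechReport excerpt warns about.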
Performance numbers were also presented for the Denver CPU, pitting it against a Haswell Celeron 2955U, the iPhone 5s (A7 Cyclone), Krait 400 (8974-AA) and a Baytrail Celeron N2910. The 64-bit Tegra K1 with Denver matches or beats the mobile chips in every benchmark, while the 15W Haswell CPU, which does lead in some benchmarks, is otherwise roughly on par with the Tegra K1 SoC. The wattage of the Denver-based Tegra K1 is not known but should be lower than what we have seen on the 32-bit variant, which makes performance equivalent to PC-level chips impressive. NVIDIA has stated that its dual-core Denver CPU can surpass quad-core and octa-core mobile processors on most mobile workloads while remaining highly power efficient. The 64-bit Tegra K1 aims to deliver PC-class performance in the mobile space, and NVIDIA says mobile devices based on the Denver CPU will arrive later this year and that it is already developing the next version of Android, "L", on Tegra K1.
NVIDIA Tegra K1 64-Bit Denver CPU Specifications:
| | NVIDIA Tegra K1 64-Bit | NVIDIA Tegra K1 32-Bit | NVIDIA Tegra 4 | NVIDIA Tegra 3 |
| Codename | Logan | Logan | Wayne | Kal-El |
| ARM Cores | 2 Core (Multi-Thread) | 4+1 | 4+1 | 4 Core |
| ARM Architecture | 64-bit ARM v8 (Custom) | 32-bit Cortex A15 | 32-bit Cortex A15 | 32-bit Cortex A9 |
| GPU Architecture | Kepler | Kepler | GeForce GPU | GeForce GPU |
| GPU Cores | 192 Core | 192 Core | 72 Core | 12 Core |
| Process | 28nm | 28nm | 28nm HPL | 40nm LPG |
| Core Frequency | 2.5 GHz | 2.3 GHz | 1.9 GHz | 1.2 GHz |
| Memory Size | 8 GB | 8 GB | 4 GB | 2 GB |
| Memory Type | DDR3L / LPDDR3 | DDR3L / LPDDR3 | DDR3L / LPDDR3 | DDR3 / LPDDR2 |
| Cache | 128 KB + 64 KB L1 | 32 KB + 32 KB L1 | 32 KB + 32 KB L1 | - |
| Launch | 2014 | 2014 | 2013 | 2012 |
The performance numbers have been compiled by forum members over at Beyond3D for easier comparison, with the Cortex-A15 Tegra K1 as the 1.00x baseline:
DMIPS
- Baytrail (Celeron N2910): 0.45x
- S800 (Krait 400 8974AA): 0.95x
- Tegra K1 (R3 Cortex A15): 1.00x
- A7 (Cyclone): 1.30x
- Haswell (Celeron 2955U): 1.00x
- Tegra K1 (Denver): 1.80x
SPECInt 2K
- Baytrail (Celeron N2910): 0.70x
- S800 (Krait 400 8974AA): 0.60x
- Tegra K1 (R3 Cortex A15): 1.00x
- A7 (Cyclone): 0.90x
- Haswell (Celeron 2955U): 1.30x
- Tegra K1 (Denver): 1.45x
SPECFP 2K
- Baytrail (Celeron N2910): 0.85x
- S800 (Krait 400 8974AA): 0.80x
- Tegra K1 (R3 Cortex A15): 1.00x
- A7 (Cyclone): N/A
- Haswell (Celeron 2955U): 1.95x
- Tegra K1 (Denver): 1.75x
AnTuTu 4
- Baytrail (Celeron N2910): N/A
- S800 (Krait 400 8974AA): 0.80x
- Tegra K1 (R3 Cortex A15): 1.00x
- A7 (Cyclone): 0.70x
- Haswell (Celeron 2955U): N/A
- Tegra K1 (Denver): 1.00x
Geekbench 3 Single-Core
- Baytrail (Celeron N2910): 0.65x
- S800 (Krait 400 8974AA): 0.80x
- Tegra K1 (R3 Cortex A15): 1.00x
- A7 (Cyclone): 1.20x
- Haswell (Celeron 2955U): 1.20x
- Tegra K1 (Denver): 1.65x
Google Octane v2.0
- Baytrail (Celeron N2910): 0.70x
- S800 (Krait 400 8974AA): 0.65x
- Tegra K1 (R3 Cortex A15): 1.00x
- A7 (Cyclone): 0.70x
- Haswell (Celeron 2955U): 1.45x
- Tegra K1 (Denver): 1.30x
16MB Memcpy (GB/s)
- Baytrail (Celeron N2910): 0.85x
- S800 (Krait 400 8974AA): 0.80x
- Tegra K1 (R3 Cortex A15): 1.00x
- A7 (Cyclone): 1.15x
- Haswell (Celeron 2955U): 1.55x
- Tegra K1 (Denver): 1.40x
16MB Memset (GB/s)
- Baytrail (Celeron N2910): 0.40x
- S800 (Krait 400 8974AA): 0.75x
- Tegra K1 (R3 Cortex A15): 1.00x
- A7 (Cyclone): 0.80x
- Haswell (Celeron 2955U): 0.65x
- Tegra K1 (Denver): 1.05x
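For a single summary figure, relative scores like these are usually combined with a geometric mean rather than an arithmetic one. The short sketch below does that for the Denver and Haswell entries listed above; the variable names are illustrative, and because the Haswell chip has no AnTuTu result its mean covers seven benchmarks instead of eight, so the two figures are not strictly like-for-like.

```python
from statistics import geometric_mean

# Scores relative to the Cortex-A15 Tegra K1 (1.00x), copied from the lists
# above in order: DMIPS, SPECInt 2K, SPECFP 2K, AnTuTu 4, Geekbench 3
# Single-Core, Octane v2.0, Memcpy, Memset. None marks an N/A entry.
denver = [1.80, 1.45, 1.75, 1.00, 1.65, 1.30, 1.40, 1.05]
haswell = [1.00, 1.30, 1.95, None, 1.20, 1.45, 1.55, 0.65]


def summarize(scores):
    """Geometric mean of the available (non-N/A) relative scores."""
    available = [s for s in scores if s is not None]
    return geometric_mean(available)


denver_mean = summarize(denver)
haswell_mean = summarize(haswell)
print(f"Denver:  {denver_mean:.2f}x over {len(denver)} benchmarks")
print(f"Haswell: {haswell_mean:.2f}x over 7 benchmarks")
```

The geometric mean is used because each score is a ratio to the same baseline, so a single outlier such as the Memset result moves the summary less than it would in an arithmetic average.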
Source: https://wccftech.com/nvidias-64bit-denver-cpu-architecture-details-unveiled-dual-custom-armv8-cores-clocked-250-ghz/
Posted by: chasteennord1954.blogspot.com
