computer tutorial: Intel Itanium

By : http://en.wikipedia.org

Itanium
Itanium 2

Produced: From mid 2002 to present
Manufacturer: Intel
CPU Speeds: 733 MHz to 1.6 GHz
FSB Speeds: 200 MHz to 533 MHz
Instruction Set: Itanium
Socket: PAC611
Cores:
· McKinley
· Madison
· Hondo
· Deerfield
· Montecito

Itanium is the brand name for 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). Intel has released two processor families using the brand: Itanium and Itanium 2. The processors are marketed for use in enterprise servers and high-performance computing systems. The architecture originated at Hewlett-Packard (HP) and was later developed jointly by HP and Intel.
Itanium's architecture differs dramatically from the x86 and x86-64 architectures used in other Intel processors. The architecture is based on explicit instruction-level parallelism, with the compiler making the decisions about which instructions to execute in parallel. This approach allows the processor to execute up to six instructions per clock cycle. In contrast to other superscalar architectures, Itanium does not need elaborate hardware to keep track of instruction dependencies during parallel execution.
After a protracted development process, the first Itanium was released in 2001, and more powerful Itanium processors have been released periodically since then. HP produces most Itanium-based systems, but several other manufacturers have also developed systems based on Itanium. As of 2007, Itanium is the fourth-most deployed microprocessor architecture for enterprise-class systems, behind x86-64, IBM POWER, and SPARC. After a schedule slip of several years, Intel released its newest Itanium 2, codenamed Montecito, in July 2006.
Development: 1989–2001
In 1989, HP determined that reduced instruction set computer (RISC) architectures were approaching a processing limit at one instruction per cycle. HP researchers investigated a new architecture called Explicitly Parallel Instruction Computing (EPIC) that allows the processor to execute multiple instructions in one clock cycle. EPIC implements a form of very long instruction word (VLIW) architecture, where one instruction word contains multiple instructions. With EPIC, the compiler determines in advance which instructions can be executed at the same time, so the microprocessor simply executes the instructions and does not need elaborate mechanisms to determine which instructions to execute in parallel.
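The idea of deciding parallelism at compile time can be illustrated with a minimal Python sketch. It is a toy model, not the algorithm of any real Itanium compiler: the tuple format, the function name group_independent, the register names, and the width limit are all assumptions made for illustration, and the dependency rule is deliberately conservative.

```python
# Toy model of compile-time instruction grouping (not a real IA-64 compiler).
# Each instruction is (destination, sources); all names are hypothetical.

def group_independent(instructions, width=3):
    """Greedily pack instructions, in program order, into groups whose
    members have no register dependencies on one another."""
    groups = []
    current = []
    written = set()   # registers written by the current group
    read = set()      # registers read by the current group
    for dest, sources in instructions:
        conflict = (
            any(s in written for s in sources)  # read-after-write
            or dest in written                  # write-after-write
            or dest in read                     # write-after-read (conservative)
        )
        if conflict or len(current) == width:
            # Close the current group and start a new one.
            groups.append(current)
            current, written, read = [], set(), set()
        current.append((dest, sources))
        written.add(dest)
        read.update(sources)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r10", "r11")),  # r1 = r10 + r11
    ("r2", ("r12", "r13")),  # r2 = r12 + r13  (independent of r1)
    ("r3", ("r1", "r2")),    # r3 = r1 * r2    (needs both earlier results)
    ("r4", ("r14", "r15")),  # r4 = r14 - r15  (independent of r3)
]

for number, group in enumerate(group_independent(program)):
    print(number, [dest for dest, _ in group])
# 0 ['r1', 'r2']
# 1 ['r3', 'r4']
```

Because the grouping is computed before the program runs, hardware following this model only has to issue each group as given; it does not need to rediscover the dependencies itself.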
HP determined that it was no longer cost-effective for individual enterprise systems companies such as itself to develop proprietary microprocessors, so HP partnered with Intel in 1994 to develop the IA-64 architecture, which derived from EPIC. Intel was willing to undertake a very large development effort on IA-64 in the expectation that the resulting microprocessor would be used by the majority of the enterprise systems manufacturers. HP and Intel initiated a large joint development effort with a goal of delivering the first product, codenamed Merced, in 1998.
During development, Intel, HP, and industry analysts were predicting that IA-64 would dominate in servers, workstations, and high-end desktops, and eventually supplant RISC and complex instruction set computer (CISC) architectures for all general-purpose applications. Several groups began to develop operating systems for the architecture, including Microsoft Windows variants, Linux variants, and UNIX variants. By 1997, it was apparent that the IA-64 architecture and the compiler were much more difficult to implement than originally thought, and the delivery of Merced began slipping quarter by quarter. Technical difficulties included the very high transistor counts needed to support the wide instruction words and the large caches. There were also structural problems within the project, as the two parts of the joint team used different methodologies and had slightly different priorities. Since Merced was the first EPIC processor, the development effort encountered more unanticipated problems than the team was accustomed to. In addition, the EPIC concept depends on compiler capabilities that had never been implemented before, so more unanticipated research was needed.
Intel announced the official name of the processor, Itanium, on October 4, 1999. Within hours, observers referred to the processor as Itanic, a reference to Titanic, the "unsinkable" ocean liner which sank in 1912. Itanic has since often been used by The Register, Scott McNealy, and others. It alludes to the perception that Itanium is a white elephant which cost Intel and HP many billions of dollars while failing to achieve expected performance and sales in the originally projected timeframe. Meanwhile, RISC and CISC architects were making steady improvements in superscalar implementations, allowing them to break the one-instruction-per-clock barrier without using EPIC.

Itanium processor: 2001–02

Intel Itanium processor
Produced: From June 2001 to June 2002
Manufacturer: Intel
CPU Speeds: 733 MHz to 800 MHz
FSB Speeds: 266 MT/s
Instruction Set: Itanium
Socket: PAC418
Core Name: Merced

By the time Itanium was released in June 2001, it was no longer superior to contemporaneous RISC and CISC processors. Itanium competed at the low end (primarily 4-CPU and smaller systems) with servers based on x86 processors, and at the high end with IBM's POWER architecture and Sun Microsystems' SPARC architecture. Intel repositioned Itanium to focus on high-end business and high-performance computing (HPC), attempting to duplicate x86's successful "horizontal" (i.e., single architecture, multiple systems vendors) market. Its success was limited to replacing PA-RISC and Alpha in HP systems and MIPS in SGI's HPC systems. POWER and SPARC remained strong, while the 32-bit x86 architecture grew into the enterprise space. With economies of scale fueled by its enormous installed base, x86 was the preeminent "horizontal" architecture in enterprise computing. HP and Intel recognized that Itanium was not competitive and replaced it with Itanium 2 a year later, as they had planned. Only a few thousand of the original Itaniums were sold, due to limited availability caused by poor yields, relatively poor performance, and high cost. However, these machines were useful for software development for the Itanium 2 processors that followed. IBM delivered a supercomputer based on this processor.[12]
Itanium 2 processors: 2002–present
The Itanium 2 was released in 2002, and was marketed for enterprise servers rather than for the whole gamut of high-end computing. The initial Itanium 2 was codenamed McKinley. McKinley used a 180 nm process, but it relieved many of the performance problems of the original Itanium. In 2003, AMD released the Opteron, which implemented its x86-64 64-bit architecture. Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86. Intel responded by implementing x86-64 in its Xeon microprocessors in 2004. Intel released a new Itanium 2 family member, codenamed Madison, in 2003. Madison used a 130 nm process and was the basis of all new Itaniums until Montecito was released in July 2006.
In March 2005, Intel announced that it was working on a new Itanium device, codenamed Tukwila, to be released in 2007. Tukwila would have four processor cores and would replace the Itanium bus with a new Common System Interface, which would also be used by a new Xeon. Intel later said that Tukwila would be delivered in late 2008.
In November 2005, the major Itanium server manufacturers joined with Intel and a number of software vendors to form the Itanium Solutions Alliance to promote the architecture and accelerate software porting. The Alliance announced that its members would invest $10 billion in Itanium solutions by the end of the decade. As of June 2007, Intel has released seven additional versions of the Itanium 2, and another is expected in late 2007.
Architecture
Intel has documented the Itanium instruction set and microarchitecture, and the technical press has provided overviews. The architecture has been renamed several times during its history. HP called it EPIC and renamed it to PA-WideWord. Intel later called it IA-64, before settling on Intel Itanium Architecture, but it is still widely referred to as IA-64. It is a 64-bit, register-rich, explicitly parallel architecture. The base data word is 64 bits, byte-addressable. The logical address space is 2^64 bytes. The architecture implements predication, speculation, and branch prediction. It uses a hardware register renaming mechanism rather than simple register windowing for parameter passing. The same mechanism is also used to permit parallel execution of loops. Speculation, prediction, predication, and renaming are under control of the compiler: each instruction word includes extra bits for this. This approach is the distinguishing characteristic of the architecture.
The architecture implements 128 integer registers, 128 floating-point registers, 64 one-bit predicates, and eight branch registers. The floating-point registers are 82 bits long to preserve precision for intermediate results.
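Predication can be illustrated with a small Python simulation in which every instruction is guarded by one of the 64 one-bit predicates and takes effect only when that predicate is true. This is a hedged sketch, not real IA-64 encoding or semantics: the tuple format, the run function, and the register names are hypothetical. Treating predicate 0 as always true follows the convention in Intel's architecture manuals rather than anything stated in this article.

```python
# Toy model of predicated execution; not real IA-64 encoding or semantics.
# The count of 64 predicates comes from the text; the rest is hypothetical.

NUM_PREDICATES = 64


def run(instructions, registers):
    """Execute (guard, dest, function, sources) tuples in program order.
    An instruction takes effect only if its guarding predicate is true."""
    predicates = [False] * NUM_PREDICATES
    predicates[0] = True  # assumption: predicate 0 always reads as true
    for guard, dest, function, sources in instructions:
        if not predicates[guard]:
            continue  # instruction has no effect; no branch was needed
        value = function(*(registers[s] for s in sources))
        if isinstance(dest, tuple):
            # A compare writes one predicate and its complement.
            true_index, false_index = dest
            predicates[true_index] = bool(value)
            predicates[false_index] = not value
        else:
            registers[dest] = value
    return registers


# result = max(a, b), written without any branch instruction.
program = [
    (0, (1, 2), lambda a, b: a > b, ("a", "b")),  # p1 = a > b, p2 = not p1
    (1, "result", lambda a: a, ("a",)),           # (p1) result = a
    (2, "result", lambda b: b, ("b",)),           # (p2) result = b
]

print(run(program, {"a": 7, "b": 3})["result"])  # 7
print(run(program, {"a": 2, "b": 9})["result"])  # 9
```

Both arms of the conditional appear in the straight-line instruction stream, and the predicate decides which one takes effect. This is why a compiler can use predication to remove short, hard-to-predict branches.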
Instruction execution
Each 128-bit instruction word contains three instructions, and the fetch mechanism can read up to two instruction words per clock from the L1 cache into the pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units. The groups are:
Six general-purpose ALUs, two integer units, one shift unit
Four data cache units
Six multimedia units, two parallel shift units, one parallel multiply, one population count
Two floating-point multiply-accumulate units, two "miscellaneous" floating-point units
Three branch units
Thus, the compiler can often group instructions into sets of six that can execute at the same time. Since the floating-point units implement a multiply-accumulate operation, a single floating point instruction can perform the work of two instructions when the application requires a multiply followed by an add: this is very common in scientific processing. When it occurs, the processor can execute four FLOPs per cycle.
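The peak figures quoted above follow from simple arithmetic, sketched here in Python. The unit counts, the two instruction words per clock, and the three instructions per word are taken from the text; the constant and function names are illustrative, and the 1.6 GHz clock is the top speed listed in the summary at the head of the article. These are theoretical peaks, not measured performance.

```python
# Back-of-the-envelope peak rates derived from figures in the text.

UNIT_GROUPS = {
    "general-purpose ALU": 6,
    "integer": 2,
    "shift": 1,
    "data cache": 4,
    "multimedia": 6,
    "parallel shift": 2,
    "parallel multiply": 1,
    "population count": 1,
    "floating-point multiply-accumulate": 2,
    "miscellaneous floating-point": 2,
    "branch": 3,
}

WORDS_PER_CLOCK = 2        # instruction words fetched per clock
INSTRUCTIONS_PER_WORD = 3  # instructions in each 128-bit word
FLOPS_PER_FMAC = 2         # one multiply plus one add


def peak_instructions_per_cycle():
    return WORDS_PER_CLOCK * INSTRUCTIONS_PER_WORD


def peak_flops_per_cycle():
    return UNIT_GROUPS["floating-point multiply-accumulate"] * FLOPS_PER_FMAC


def peak_gflops(clock_ghz):
    return peak_flops_per_cycle() * clock_ghz


print(len(UNIT_GROUPS), sum(UNIT_GROUPS.values()))  # 11 30
print(peak_instructions_per_cycle())                # 6
print(peak_flops_per_cycle())                       # 4
print(peak_gflops(1.6))                             # 6.4
```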
Memory architecture
From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 KiB of Level 1 instruction cache and 16 KiB of Level 1 data cache. The L2 cache was unified (both instruction and data) and was 256 KiB. The Level 3 cache was also unified and varied in size from 1.5 MiB to 24 MiB. The 256 KiB L2 cache contained sufficient logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU).
Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to by Intel's official name: the Scalability Port. The speed of the bus has increased steadily with new processor releases. The bus transfers 2x128 bits per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/s and the 533 MHz Montecito bus transfers 17.056 GB/s.
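The bandwidth figures can be reproduced with a short calculation, shown here as a Python sketch. The 2x128 bits per clock and the two clock rates come from the text; the function name is illustrative, and GB is taken to mean 10^9 bytes, which is the interpretation that matches the quoted numbers.

```python
# Peak bus bandwidth from the figures quoted in the text.

BITS_PER_CLOCK = 2 * 128  # two 128-bit transfers per bus clock


def bus_bandwidth_gb_per_s(clock_mhz):
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes) at a given bus clock."""
    bytes_per_clock = BITS_PER_CLOCK // 8  # 32 bytes
    return clock_mhz * 1e6 * bytes_per_clock / 1e9


print(bus_bandwidth_gb_per_s(200))  # 6.4    (McKinley bus)
print(bus_bandwidth_gb_per_s(533))  # 17.056 (Montecito bus)
```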
Architectural changes
Itaniums released prior to 2006 had hardware support for the IA-32 architecture to permit support for legacy server applications, but performance was much worse in comparison with native instruction performance and contemporaneous x86 processors. In 2005 Intel developed a software emulator that provided better performance. With Montecito, Intel removed IA-32 support from the hardware.
With Montecito, released in July 2006, Intel made enhancements to the architecture. The architecture now includes hardware multithreading: each processor maintains context for two threads of execution. When one thread stalls due to a memory access, the other thread gains control. Intel calls this "coarse multithreading" to distinguish it from the "hyperthreading technology" that was used in some x86 and x86-64 microprocessors. Coarse multithreading is well matched to the Intel Itanium Architecture and results in an appreciable performance gain. Intel also added hardware support for virtualization. Virtualization allows a software "hypervisor" to run multiple operating system instances on the processor concurrently. Montecito also features a split L2 cache, adding a dedicated 1 MiB L2 cache for instructions and converting the original 256 KiB L2 cache to a dedicated data cache.
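The benefit of switching threads on a memory stall can be seen in a toy cycle-count simulation in Python. It is only a sketch of the general switch-on-stall idea: the op names, the miss latency, the workload, and the scheduling rule are assumptions, and it does not model Montecito's actual switching policy or timings.

```python
# Toy model of coarse (switch-on-stall) multithreading.

def run_coarse_mt(threads, miss_latency=4):
    """Return the cycles needed to finish all threads.

    threads: list of op lists, each op being "work" or "miss".
    A "work" op takes one cycle. A "miss" op takes one cycle to issue,
    after which its thread cannot run for `miss_latency` cycles.
    The active thread keeps the core until it stalls or finishes.
    """
    position = [0] * len(threads)  # next op index for each thread
    ready_at = [0] * len(threads)  # first cycle each thread may run again
    active = 0
    cycle = 0
    while any(position[t] < len(threads[t]) for t in range(len(threads))):
        runnable = [
            t for t in range(len(threads))
            if position[t] < len(threads[t]) and ready_at[t] <= cycle
        ]
        if runnable:
            if active not in runnable:
                active = runnable[0]  # switch only when the active thread cannot run
            op = threads[active][position[active]]
            position[active] += 1
            if op == "miss":
                ready_at[active] = cycle + 1 + miss_latency
        cycle += 1  # the core idles this cycle if nothing was runnable
    return cycle


workload = ["work", "miss", "work", "work"]

one_after_another = 2 * run_coarse_mt([workload])
interleaved = run_coarse_mt([workload, workload])
print(one_after_another)  # 16
print(interleaved)        # 10
```

In this model the second thread fills part of the time the first thread spends waiting on memory, so two copies of the workload finish sooner than they would running back to back.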

As of 2007, several manufacturers offer Itanium 2-based systems, including HP, SGI, NEC, Fujitsu, Unisys, Hitachi, and Groupe Bull. In addition, Intel offers a chassis[20] that can be used by system integrators to build Itanium systems. HP, the only one of the industry's top four server manufacturers to offer Itanium-based systems today, manufactures at least 80% of all Itanium 2 systems. HP sold 7200 systems in the first quarter of 2006.[21] The bulk of the sales are of enterprise servers and machines for large-scale technical computing, with an average selling price per system in excess of US$200,000. A typical system uses eight or more Itanium processors.
Chipsets
The Itanium bus interfaces to the rest of the system via a chipset. Enterprise server manufacturers differentiate their systems by designing and developing chipsets that interface the processor to memory, interconnections, and peripheral controllers. The chipset is the heart of the system-level architecture for each system design. Development of a chipset costs tens of millions of dollars and represents a major commitment to the use of the Itanium. Currently, modern chipsets for Itanium are manufactured by HP, Fujitsu, SGI, NEC, Hitachi, and Unisys. IBM created a chipset in 2003, and Intel in 2002, but neither of them has developed chipsets to support newer technologies such as DDR2 or PCI Express.

Processors

The Itanium processors show a steady progression in capability. Merced was a proof of concept. McKinley dramatically improved the memory hierarchy and allowed Itanium to become reasonably competitive. Madison, with the shift to a 130 nm process, allowed for enough cache space to overcome the major performance bottlenecks. Montecito, with a 90 nm process, allowed for a dual-core implementation and a major improvement in performance per watt. The future of the Itanium family apparently lies in multi-core chips, based on available information about coming generations. The final products will most likely bear the Itanium 2 brand, or possibly Itanium 3. As of June 2007, some information is known for the following:
Montvale will be a revision of Montecito bringing slightly higher clock speeds (to 1.66 GHz), larger L3 caches (to 24 MiB), and a faster FSB (to 667 MHz). The processor will implement a new power-saving system. Montvale will comprise a set of six variants called the Itanium 2 9100 series. The processors were originally expected to be released in June 2007, a year after Montecito; release is now expected at the end of 2007.
Tukwila, the first 65 nm design, is due in late 2008. Tukwila will include four cores, large on-die caches, Hyper-Threading technology, and an integrated memory controller. A key feature of Tukwila is double-device data correction, which helps to fix memory errors.
Poulson will use a 32 nm process and will feature four or more cores, multithreading enhancements, and new instructions to take advantage of parallelism, especially in virtualization.
For Kittson, few details are known other than the existence of the codename.

Monday, August 11, 2008
