The first two I/O tasks noted in Section 1.5 concern the establishment of connections between the memory and the I/O devices and the control of data transfers over these connections. Most modern computers use a bus, or buses, for interconnecting the major components of the system. A bus serves as a pathway between components for passing addresses, instructions, and data. A simple high-level view of a bus is shown in Fig. 1-8. This bus interconnects the processor, main memory, and I/O devices by means of their controllers.
Fig. 1-8 Simple bus structure
Buses can be classified in a number of ways: by purpose, control, and communication technique.
The major difference between dedicated and general-purpose buses is that a dedicated bus is point-to-point between two physical devices, whereas a general-purpose bus interconnects more than two physical devices. Dedicated buses are used in cases in which the latency and the bandwidth requirements are such that sharing the bus with another user can result in unacceptable system performance. Note that a dedicated bus does not need address lines because the source and destination are addressed by implication. That is, device 1 always sends to device 2 or device 2 to device 1. Some dedicated buses are unidirectional with information flow in only one direction. For example, the bus that connects a memory to a graphics controller may be unidirectional.
Because dedicated buses are used inside the processor or for special high-bandwidth applications without general-purpose capabilities, these buses are not considered further. Instead, the following paragraphs discuss the control and communications design techniques found with general-purpose buses.
Also, with a general-purpose bus, a number of users share the same bus, and simultaneous requests for the bus are resolved by one of a number of resolution techniques. Some devices on a general-purpose bus are both senders and receivers; others are only senders or only receivers. For example, a printer controller's primary function is to receive data and send back some status information. A disk controller, on the other hand, sends and receives data and sends status information.
The control of a general-purpose bus can be either centralized or decentralized. The basic requirement is to grant or not grant a device access to the bus. With either centralized or decentralized control, all devices are treated equally except for their priority of access. Thus, if one of the devices is the processor, it may be given the highest priority for bus access. However, in some systems an I/O device may have the highest priority because of the loss in system performance that would otherwise result; for example, if a disk is unable to read and misses a complete revolution, many milliseconds will be lost.
· Centralized Control: A single hardware control unit recognizes a request and grants access to the bus to a requesting device. It is the responsibility of the controller to resolve simultaneous requests and assign priority to the requests. At least three designs are used for centralized controllers: daisy chain, polling with a global counter, and polling with local counters.
· Distributed Control: Distributed control, also known as decentralized control, distributes the control function among all the devices on the bus. The major advantage of decentralized control is that the system is easily expandable by the addition of modules. As with centralized control, there are three basic designs: daisy chain, polling, and independent requests.
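The daisy-chain scheme named in both lists can be illustrated with a small simulation. In the sketch below (the function name is mine, for illustration), the controller asserts a single grant signal that propagates from device to device along the chain; the first requesting device absorbs the grant, so bus priority is fixed by each device's physical position on the chain.

```python
# A minimal sketch of centralized daisy-chain arbitration.
# The grant signal propagates outward from the controller and
# stops at the first device with a pending request, so position
# on the chain determines priority.

def daisy_chain_grant(requests):
    """requests: list of booleans, index 0 is closest to the controller.
    Returns the index of the device granted the bus, or None."""
    for position, requesting in enumerate(requests):
        if requesting:
            return position  # grant absorbed; devices further down see nothing
    return None

# Devices 1 and 3 request the bus simultaneously; device 1 wins
# because it sits closer to the controller on the chain.
print(daisy_chain_grant([False, True, False, True]))  # -> 1
```

A consequence of this design, visible in the sketch, is that a device far down the chain can be starved if devices nearer the controller request the bus continuously.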
The transmission of addresses, control information, and data between two devices may be synchronous with a clock, or asynchronous without a clock and self-timed.
· Synchronous Communication: A simplified diagram of a synchronous bus (the data and clock portion) connected between two devices is shown on the left of Fig. 1-9. Data are transmitted from the card in the right-hand slot to the card in the left-hand slot. The transmitter and the receiver are clocked from a common source on the left-hand card.
Fig. 1-9 Synchronous bus clocking
· Asynchronous communication: Asynchronous communication for buses was developed to overcome the worst-case clock rate limitations of synchronous systems. With asynchronous communications, data transfers occur at the fastest rate and smallest delay possible under the circumstances of the physical status of the bus. As cards are added to the bus, the timing automatically adjusts for each card. There are a number of asynchronous protocols; however, only one of the simpler ones is discussed here.
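One of the simpler asynchronous protocols is a four-phase request/acknowledge handshake: the sender places data on the bus and raises a request line; the receiver latches the data and raises acknowledge; the sender then drops request; and the receiver drops acknowledge, completing the cycle. The sketch below (function and variable names are mine, for illustration) walks through those four phases for each word transferred; note that no shared clock appears anywhere, since each phase waits on the other side's signal.

```python
# A minimal sketch of a four-phase asynchronous (self-timed) handshake.
# Each transfer paces itself: the next phase begins only when the
# other side has signaled, so no common clock is needed.

def four_phase_transfer(words):
    """Simulate transferring each word with a REQ/ACK handshake.
    Returns the words as seen by the receiver, plus a phase trace."""
    received, trace = [], []
    for word in words:
        trace.append("sender: data valid, REQ high")   # phase 1
        received.append(word)                          # receiver latches data
        trace.append("receiver: ACK high")             # phase 2
        trace.append("sender: REQ low")                # phase 3
        trace.append("receiver: ACK low")              # phase 4
    return received, trace

data, trace = four_phase_transfer([0x41, 0x42])
print(data)        # -> [65, 66]
print(len(trace))  # -> 8 (four phases per word)
```

Because each cycle completes as soon as both parties have responded, a fast card pair transfers quickly while a slow card simply stretches the cycle, which is exactly the self-adjusting behavior described above.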
Table 1-5 provides examples of the four general-purpose bus types from the taxonomy: centralized or decentralized control combined with synchronous or asynchronous communications.
Table 1-5 General-purpose bus design examples
Control | Communications | Examples | Specification
Centralized | Synchronous | PCI | PCI SIG v2.1
Centralized | Asynchronous | IPI | ANSI X3.129
Decentralized | Synchronous | Multibus II | ANSI/IEEE 1296
Decentralized | Asynchronous | VME | IEEE 1014
Decentralized | Asynchronous | Futurebus | IEEE 896.1
The peripheral component interconnect (PCI) is a recent high-bandwidth, processor-independent bus that can function as a mezzanine or peripheral bus. Compared with other common bus specifications, PCI delivers better system performance for high-speed I/O subsystems (e.g., graphic display adapters, network interface controllers, disk controllers, and so on). The current standard allows the use of up to 64 data lines at 33 MHz, for a raw transfer rate of 264 MBps, or 2.112 Gbps. But it is not just high speed that makes PCI attractive. Economically, PCI is specifically designed to meet the I/O requirements of modern systems, so it requires very few chips to implement and supports other buses attached to the PCI bus.
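The quoted raw rate follows directly from the bus width and clock: 64 data lines at 33 MHz move 8 bytes per cycle. A quick check of the arithmetic:

```python
# Raw PCI transfer rate: bus width (in bytes) x clock frequency.
data_lines = 64
clock_hz = 33_000_000

bytes_per_cycle = data_lines // 8             # 64 lines carry 8 bytes at once
rate_mbps = bytes_per_cycle * clock_hz / 1e6  # megabytes per second
rate_gbps = rate_mbps * 8 / 1000              # gigabits per second

print(rate_mbps)  # -> 264.0 (MBps)
print(rate_gbps)  # -> 2.112 (Gbps)
```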
PCI is designed to support a variety of microprocessor-based configurations, including both single-and multiple-processor systems. Accordingly, it provides a general-purpose set of functions. It makes use of synchronous timing and a centralized arbitration scheme.
A combined DRAM controller and bridge to the PCI bus provides tight coupling with the processor and the ability to deliver data at high speeds. The bridge acts as a data buffer so that the speed of the PCI bus may differ from that of the processor's I/O capability. In a multiprocessor system, one or more PCI configurations may be connected by bridges to the processor's system bus. The system bus supports only the processor/cache units, main memory, and the PCI bridges. Again, the use of bridges keeps the PCI independent of the processor speed yet provides the ability to receive and deliver data rapidly.
bus | 总线
clock | 时钟,同步
synchronous | 同步的
asynchronous | 异步的
Peripheral Component Interconnect (PCI) | 外围部件互连(总线)
decentralized | 分散式(的)
adapter | 适配器
distributed | 分布式(的)
timing | 定时,同步
priority | 优先级
chip | 芯片
daisy chain | 菊花链
dynamic RAM (DRAM) | 动态随机存储器
polling | 轮询
(1) clock. An electronic circuit inside the computer that generates a steady stream of timing pulses, i.e., the digital signals that synchronize every operation. The computer's clock rate is one of the prime determinants of its overall speed, so within the limits allowed by the computer's other components, the higher the frequency the better.
(2) adapter, or adaptor. A printed circuit board that enables a personal computer to use peripheral devices, such as CD-ROM drives, modems, and joysticks, for which it does not already contain the necessary connections, ports, or circuit boards. A single adapter card can contain several adapters. Also called an interface card.
(3) priority. The order in which devices receive microprocessor service and use system resources. Within a computer, priority levels can be used to avert many kinds of potential crashes and conflicts. On a network, station priorities can be adjusted to determine when and how often a station controls the communication line, and message priorities can be adjusted to determine how quickly messages are transmitted.
(4) daisy chain. A set of devices connected in series. To eliminate conflicting requests for the channel (bus) to which all the devices are connected, each device is assigned a different priority; on the Apple Desktop Bus, for example, each device monitors the channel and transmits only when the line is clear.
(5) DRAM (dynamic random access memory). A form of semiconductor RAM. Dynamic RAM stores information in integrated circuits that contain capacitors. Because capacitors lose their charge over time, dynamic RAM boards must include logic circuitry that continuously refreshes (recharges) the RAM chips. While dynamic RAM is being refreshed, the processor cannot read from it. Because its circuitry is simpler and it can hold about four times as much data as static RAM, dynamic RAM remains more common despite its slower speed.
1. True or False.
(1) A general-purpose bus is point-to-point between two physical devices.
(2) Dedicated buses are used for special high-bandwidth applications.
(3) Many users on a general-purpose bus can share the same bus simultaneously.
(4) The difference between centralized and decentralized control is to grant or not grant a device access to the bus.
(5) According to this text, decentralized control is the same as distributed control.
(6) The clock rate of asynchronous communication is limited.
(7) PCI is designed to support single processor system only.
(8) PCI bridge acts as a buffer.
(9) A disk controller has a receiver only.
(10) A dedicated bus needs address lines.
2. Fill in the blanks with appropriate words or phrases found behind this exercise.
(1) In a computer system the bus is used to pass ____ between its components.
(2) The current standard of PCI allows the use of up to ____.
(3) We can classify the buses by ____.
(4) When we hope for short latency and high bandwidth of a bus, we should choose ____.
(5) Synchronous communication needs a common ____ while asynchronous communication doesn't.
(6) All devices in a centralized control are treated equally except for their ____.
(7) If a disk is unable to read, the system performance will be ____.
(8) Centralized controllers have at least three types: ____.
(9) Asynchronous transmission can get the fastest ____ and smallest ____.
(10) In a typical server system we can find two types of bus; they are ____.
A.64 data lines
B.priority for bus access
C.dedicated bus
D.system bus, PCI bus
E.daisy chain, polling with a global counter, and polling with local counters
F.data rate, delay
G.clock
H.lost
I.address, instruction, and data
J.purpose, control, and communication technique
PCI Express pumps up performance
In the past decade, PCI has served as the dominant I/O architecture for PCs and servers, carrying data generated by microprocessors, network adapters, graphics cards and other subsystems to which it is connected. However, as the speed and capabilities of computing components increase, PCI's bandwidth limitations and the inefficiencies of its parallel architecture have increasingly become bottlenecks to system performance.
PCI is a unidirectional parallel bus architecture in which multiple adapters must contend for available bus bandwidth. Although performance of the PCI interface has been improved over the years, problems with signal skew (when bits of data arrive at their destination too late), signal routing and the inability to lower the voltage or increase the frequency, strongly indicate that the architecture is running out of steam. Additional attempts to improve its performance would be costly and impractical. In response, a group of vendors, including some of the largest and most successful system developers in the industry, unveiled an I/O architecture dubbed PCI Express (initially called Third Generation I/O, or 3GIO).
PCI Express is a point-to-point switching architecture that creates high-speed, bidirectional links between a CPU and system I/O (the switch is connected to the CPU by a host bridge). Each of these links can encompass one or more “lanes” comprising four wires—two for transmitting data and two for receiving data. The design of these lanes enables the use of lower voltages (resulting in lower power usage), reduces electromagnetic emissions, eliminates signal skew, lowers costs through simpler design and generally improves performance.
In its initial implementation, PCI Express can yield transfer speeds of 2.5 Gbps in each direction, on each lane. By contrast, the version of the PCI architecture that is most common today, PCI-X 1.0, offers 1 GBps in throughput. PCI Express cards are available in four- or eight-lane configurations (called x4 and x8). An x4 PCI Express card can provide as much as 20 Gbps in throughput, while an x8 PCI Express card can offer up to 40 Gbps.
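These throughput figures are simply the per-lane rate multiplied by the lane count, counted across both directions. A quick check, assuming the raw (unencoded) 2.5 Gbps signaling rate quoted above:

```python
# PCI Express throughput scales linearly with lane count.
# Each lane carries 2.5 Gbps in each direction (raw signaling rate).
PER_LANE_GBPS = 2.5

def pcie_throughput_gbps(lanes, bidirectional=True):
    """Aggregate throughput for a card with the given lane count."""
    directions = 2 if bidirectional else 1
    return PER_LANE_GBPS * lanes * directions

print(pcie_throughput_gbps(4))  # x4 card -> 20.0 Gbps
print(pcie_throughput_gbps(8))  # x8 card -> 40.0 Gbps
```

This linear scaling is what the article means by "scalability": widening a slot from x4 to x8 doubles the available bandwidth without changing the per-lane electrical design.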
Earlier attempts to create a new PCI architecture failed in part because they required so many changes to the system and application software. Drivers, utilities and management applications all would have to be rewritten. PCI Express developers removed the dependency on new operating system support, letting PCI-compatible drivers and applications run unchanged on PCI Express hardware.
Developers are working on increasing the scalability of PCI Express. While current server and desktop systems support PCI Express adapters and graphics cards with up to eight lanes (x8), the architecture will support as many as 32 lanes (x32) in the future.
The first Fibre Channel host bus adapters were designed to support four lanes instead of eight lanes, in part because server developers had designed their systems with four-lane slots. As even more bandwidth is required, implementing an eight-lane design potentially could double the performance, provided there were no other bottlenecks in the system.
This scalability, along with the expected doubling of the speed of each lane to 5Gbps, should keep PCI Express a viable solution for designers for the foreseeable future.
PCI Express is a significant improvement over PCI and is well on its way to becoming the new standard for PCs, servers and more. Not only can it lower costs and improve reliability, but it also significantly can improve performance. Applications such as music and video streaming, video on demand, VoIP and data storage will benefit from these improvements.
VIA P4X333 with DDR333 and AGP 8x
The new chipsets that are introduced from time to time usually lack innovation. Not this time. VIA, formerly "just a chipset maker", has become number two in the global chipset market, and now it is putting all its efforts into extending the good reputation that it achieved through a series of successful chipsets. As numerous tests have revealed, the Pentium 4 lacks the bandwidth that's needed to take advantage of its full potential. Is the P4X333 the platform to remedy this grievance?
It looks as though VIA should be able to continue this success story—the new chipset does implement a bunch of features for which most of us have been impatiently waiting. USB 2.0 will be the most important external interface for all kinds of computers, and obviously, VIA wouldn't do without it. The new Southbridge chip VT8235 not only offers USB 2.0, but also includes an IDE interface with support for UltraATA/133. Even though Maxtor is the only manufacturer that ships such drives, there's certainly nothing wrong in having this most advanced interface. Finally, VIA emphasizes that the bandwidth of the bus between the Northbridge and Southbridge has been doubled, now delivering 533 MB/s (just as fast as SiS, twice as fast as the Intel Hub architecture).
Last but not least, there is a question that this article won't be able to answer: What about AGP 8x? According to the specs, VIA has implemented the new graphics card interface that finally also doubles the bandwidth between the graphics adapter and the Northbridge. In the past, upgrading from AGP 1x to 2x and 4x always raised graphics performance. A separate article will discuss this topic later. Here, it's not a primary factor in evaluating the perfomance of the P4X333 and the new memory interface in particular. Instead, we stick to known factors, such as GeForce4 TI4600 512 MB DDR333 SDRAM (CL2.0) and a fast hard drive from Maxtor. Let's see what this chipset is all about.
VIA is dipping its toes into a market that has always been dominated by the chip giant Intel. Intel's advantage is that it supplies chipsets for its own processors, thus offering a platform that is both fast and reliable.
The only setback that Intel ever had to suffer was the disaster with the Pentium III chipset "i820" and the so-called "Memory Translator Hub" (or MTH), which was supposed to enable the use of conventional SDRAM memory on a chipset that was designed for Rambus DRAM. Unfortunately, this MTH chip had some bugs that could not be eliminated, and the whole affair was dubbed "Caminogate"—an allusion to the code name that was used for i820.
As a result of this disaster, Intel phased out the 820 chipset and released the 815 in order to replace the aged BX. A modified version of this chipset (815T) is used today for Celeron and Pentium III CPUs, but through Intel's mistake, VIA finally gained a strong hold in the market by providing the Apollo Pro 133A chipset which, due to the failure of i820+MTH, managed to become the fastest PIII chipset at the time.
Since then, products from the Taiwanese company have steadily improved in terms of consistency and performance. Today, VIA is strong enough to push its own technological developments (such as the C3, Eden and now, the first chipset with AGP 8x).
However, there is also another competitor that is not dormant—Silicon Integrated Systems (SiS), which has managed to shed its "very-low-cost" image within the last few months. The first product to surprise us was the 735 chipset for Athlon; today, SiS delivers various chipsets for all of the common PC architectures. The 645DX is their current flagship for the Pentium 4, also supporting DDR333 and 533 MHz FSB, but it lacks support for ATA/133, USB 2.0 and AGP 8x. While VIA is still fighting for the Pentium 4 bus license, SiS is officially allowed to sell P4 chipsets. This issue could decide whether the P4X333 will be successful or not. In Europe, for example, it's not quite as easy to get motherboards based on the P4X266A (except for the VIA brands), so it seems that the big motherboard players are still cautious.
The P4X333 is the first Pentium 4 chipset to support AGP 8x (or AGP 3.0, to be more precise). Though the standard has been defined since late 2000, it has not yet been adopted throughout the industry. The upcoming Intel chipsets i845E and i845G both do not support AGP 8x, and neither does the just-released 850E version. In addition, there are no AGP 8x graphics cards available now, so this may not even be so tragic.
You may wonder why it could ever be necessary to have such a huge bandwidth between the graphics card and the system. On the one hand, the graphics adapter always has the possibility to swap textures and other graphic data to the main memory. Most BIOSs have an item called “aperture size”; here you can define the maximum memory capacity that can be used by the graphics adapter. Machines running with on-board graphics and unified memory architecture (no dedicated video memory available) obviously will benefit tremendously from the bandwidth doubling. But there is quite a lot of traffic on the AGP bus anyway, so we should expect a performance gain in most benchmarks.
The bandwidth doubling from AGP 4x to AGP 8x was mainly achieved by running the AGP at octuple-pumped 66 MHz (resulting in an effective 533 MHz) rather than quad-pumping. Doesn't that sound familiar? Yes, the Pentium 4 does pretty much the same with its system bus. So far, it has been running at 100 MHz quad-pumped (=400 MHz), while the latest chipsets (850E, 845E) raised the clock speed to 133 MHz. Thanks to this, the FSB and the AGP keep running pseudo-synchronously.
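"Octuple-pumped" means eight data transfers per 66 MHz clock cycle, which is where the effective 533 MHz figure comes from; multiplying by the 32-bit (4-byte) bus width then gives each generation's bandwidth. A quick check of the arithmetic (the nominal "66 MHz" is really 66.67 MHz, i.e. 200/3):

```python
# AGP effective speed and bandwidth per generation.
# Each generation keeps the 66 MHz clock and 32-bit width but
# doubles the number of transfers per clock cycle ("pumping").
base_clock_mhz = 200 / 3  # nominal "66 MHz" AGP clock (~66.67 MHz)
bus_bytes = 4             # 32-bit bus

for name, pump in [("AGP 2x", 2), ("AGP 4x", 4), ("AGP 8x", 8)]:
    effective_mhz = base_clock_mhz * pump
    bandwidth_mbs = effective_mhz * bus_bytes
    print(f"{name}: {int(effective_mhz)} MHz effective, {int(bandwidth_mbs)} MB/s")
```

The printed values (533, 1066 and 2133 MB/s for 2x, 4x and 8x) reproduce the Bandwidth row of Table 1-6.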
Table 1-6 shows the differences between all AGP standards.
Table 1-6 AGP Standards
 | AGP 1.0 | AGP 2.0 | AGP 3.0
Name | AGP, AGP 2x | AGP 4x | AGP 8x
Signaling | 3.3 V | 1.5 V | 0.8 V
Clock Speed | 66 MHz double-pumped | 66 MHz quad-pumped | 66 MHz octuple-pumped
Bus Width | 32 bits | 32 bits | 32 bits
Bandwidth | 533 MB/s | 1066 MB/s | 2133 MB/s
Backwards Compatible | yes | yes | only to AGP 4x
AGP 8x uses the same connector as AGP 4x; the only difference is that some pins have been reassigned in order to support the new signaling. As a result, you will be able to run all AGP 8x and AGP 4x graphics cards (at 0.8 V and 1.5 V)—but not AGP 2x! This means that you won't be able to use graphics adapters that were made before mid-1999. So once again, you will have to sacrifice backwards compatibility in order to get a faster platform.
Even though the ATA/133 interface, USB 2.0 and AGP 8x are important and desirable, they each have less influence on overall performance than the memory controller and the memory combined. With the clock speed increased from 133 to 166 MHz (double-pumped), the maximum bandwidth of DDR SDRAM climbed from 2.1 GB/s to 2.7 GB/s (which is why the standards are also called PC2100 and PC2700).
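The PC2100/PC2700 figures follow from the 64-bit module width: peak bandwidth is the clock times two transfers per cycle (double data rate) times 8 bytes. A quick check:

```python
# DDR SDRAM peak bandwidth: clock x 2 transfers/cycle x 8 bytes (64-bit module).
def ddr_bandwidth_mbs(clock_mhz):
    """Peak bandwidth in MB/s for a double-data-rate 64-bit module."""
    return clock_mhz * 2 * 8

print(ddr_bandwidth_mbs(133))  # DDR266 / PC2100 -> 2128 MB/s (~2.1 GB/s)
print(ddr_bandwidth_mbs(166))  # DDR333 / PC2700 -> 2656 MB/s (~2.7 GB/s)
```

The module names encode the rounded peak bandwidth (PC2100 ≈ 2.1 GB/s, PC2700 ≈ 2.7 GB/s) rather than the clock speed.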
This is still lower than the bandwidth of dual-channel RDRAM (3.2 GB/s), but conventional DDR SDRAM works with only a fraction of the latencies of RDRAM, thus resulting in equal or better performance.
This is also the main reason why the memory clock of RDRAM was increased from 400 to 533MHz as well (PC1066 RDRAM). By the way, the test setup that we used there is the same one that we used to review the new VIA chipset.
When talking about DDR333 memory, we should not forget that there are two types of RAM available: CL2 and CL2.5 modules. Only a few days ago we published an article showing the difference between fast (CL2) and slow (CL2.5) memory setups. Basically, shorter latencies and thus CL2 memory should always be preferred.
Many THG readers have been asking about the performance difference between DDR266 at CL2 and DDR333 at CL2.5. Well, the difference is quite significant, or in other words: DDR333 is always faster than DDR266, no matter which timings you are running. Still, we recommend that you go for the faster DIMMs if possible.
Apart from the technical specs and the performance evaluation, the P4X333 introduces a new Southbridge, the VT8235. In addition to the standard features (AC97 sound, serial and parallel ports, IR port, keyboard and mouse controller, PCI bridge), this chip introduces USB 2.0 and UltraATA/133 to the VIA chipset family. Note that both the P4X333 and the VT8235 are pin-compatible with their predecessors P4X266/A and VT8233A, thus making them easily interchangeable.
As a result, motherboard manufacturers can quickly switch their production to accommodate P4X333 without having to make expensive modifications to the production process or to the motherboard layout.
The reference motherboard is equipped with the maximum hardware features that are directly supported by the chipset: six PCI slots, an ACR slot, three DIMM sockets for DDR266 or DDR333 DIMMs, an AC97 sound system, a 100 Mbps network adapter and the UltraATA/133 interface. It's very likely that this motherboard will be available soon with only a few modifications, if any.
We ran a total of 25 benchmarks in order to give you a balanced, overall picture of how the P4X333 performs. Please note that all benchmarks were performed with Intel's latest Pentium 4, the 2.53 GHz model, running at 133 MHz FSB. Due to time limitations, we were not able to re-test all other chipsets for this review, so we chose one main competitor instead.
We chose to pit the i850E against the P4X333 for three reasons: First, neither its predecessor P4X266A nor the Intel 845D can run DDR333 at the same pace. Second, neither of them is able to run the Pentium 4 at 133 MHz FSB. And third, Intel is going to release the renewed 845 chipset with support for DDR333 and FSB133 next week anyway.
BIOS tuning: maximum power
Overclocking fans know where it's at: To push state-of-the-art motherboards to their limit in conjunction with the CPU and memory, a touch of manual fine-tuning to the BIOS settings is called for. It often happens that one setting or the other proves to be too “progressive”, with the effect that the board does not boot up afterwards. If this happens, deleting the CMOS settings is frequently the only option available if the board does not automatically boot with the slow default values. However, most users require a fair amount of explanation to optimize the performance of their systems to the fullest. Understandably, quite a number of PC enthusiasts who deal with hardware on a daily basis tend to avoid tinkering with the BIOS settings.
Using the well-known motherboard Asus CUSL2 as an example, we will show you step by step how it is possible to speed up, by a fair amount, a relatively sluggish board with conservative settings (mostly also the factory settings). Our example is typical of most boards and is based on an Intel (Socket 370) or AMD platform (Socket 462). For the sake of completeness, we have also taken a look at the BIOS from the Asus P4T for the Intel Pentium 4 to briefly highlight the special features of the Rambus memory.
Following the merger of Award and Phoenix, both the traditional Award BIOS and the AwardBIOS with a Phoenix look are now available. AMI also supplies BIOS software, but this is extremely rare and found only in very few mainboards. Our experience shows that the BIOS from Award (with the Phoenix look) is not very user-friendly—unlike the traditional Award BIOS, which is very logically structured and simple to use.
Before we get down to the actual business, it is advisable to check whether or not the motherboard already has the latest BIOS version. To do this, the version shown in the bottom left-hand corner during boot-up should display the latest date. The best way to obtain the latest BIOS version is from the FTP server of the respective motherboard manufacturer.
On numerous motherboards, the BIOS is write-protected. This protection has to be disabled prior to flashing, either by jumpers on the motherboard or in the BIOS itself.
The port 80 card can be a very helpful tool for tuning a motherboard. It costs very little and basically displays the status on boot-up. If a computer hangs at a certain point while booting up, the port 80 card can give a good idea of which component is responsible for the fault. For this reason, a port 80 card is an absolute must for every experienced hardcore overclocker and fan of system tuning. Otherwise, it can be very difficult to determine the exact cause if the PC crashes or hangs.
You need to select this menu in the BIOS if the settings for the memory are to be changed.
Factory default settings for memory timing: all the adjustments are made automatically and are read from the EEPROM of the memory module.
Most PCs are supplied with highly conservative factory settings for memory access, with the result that vast amounts of power are either squandered or lie dormant. In the following pictures, we show how it is possible to change the settings for memory timing step by step.