This is part of a series of articles that covers the booting of an OSR5 machine. See Booting OSR5 for other related articles.
I think it's unfortunate that most programmers and administrators don't have much knowledge of hardware or indeed much interest in it. I can understand that to some degree; I'm not the soldering iron type myself. But having at least a superficial understanding is, I think, important.
By the way, superficial is about all you can expect from me. Hardware is my weakest area, and, just like most of you, I don't really have a lot of interest in it. I force myself to learn a little, and I'm going to try to share some of that here, but any real hardware types will undoubtedly find this pretty basic.
The CPU is, of course, the heart of our systems. Most of us just think of CPU's in terms of general class and speed: 486 or Pentium, 66MHz or 266MHz. The CPU info spit out during the boot doesn't give us even that much:
%cpu - - - unit=1 family=5 type=Pentium
%cpuid - - - unit=1 vend=GenuineIntel step=cC0
We can get much more information from the "hw" command. Here's the data from this system:
For Linux, use "x86info -a"
# hw -v -r cpu
Report about cpu for scobox on Tue Nov 30 18:30:16 1999
There is one CPU on this system.
The CPU performs like a 186Mhz Intel Pentium (735\90, 815\100)
Processor: 1 (0x00)
Vendor ID: GenuineIntel
cpu_family: 5
cpu_id: 0x0000052c
type: 0
family: 5
model: 2
stepping: 12
mfgr step: cC0
cpu_features: 0x000001bf
APIC: No APIC on Chip
CMOV: No Conditional Move and Compare Instructions
CXS: Yes CMPXCHG8B instruction
DE: Yes Debugging Extensions
FPU: Yes FPU on chip
MCA: No Machine Check Architecture
MCE: Yes Machine Check Exception
MMX: No MMX Technology Supported
MSR: Yes RDMSR and WRMSR Support
MTRR: No Memory Type Range Registers
PAE: No Physical Address Extensions
PGE: No PTE Global Flag
PSE: Yes Page Size Extensions
TSC: Yes Time Stamp Counter
VME: Yes Virtual 8086 Mode Enhancement
microdata: 246
derived speed: 186.364 MHz
Model Specific registers
Machine check exception address: 0x0000000007a34760
Machine check exception type: 0x00000008
TR1 parity reversal test register: 0x00000004
Here's "x86info -a" from a Linux box:
Found 1 CPU
--------------------------------------------------------------------------
eax in: 0x00000000, eax = 00000002 ebx = 756e6547 ecx = 6c65746e edx = 49656e69
eax in: 0x00000001, eax = 00000683 ebx = 00000002 ecx = 00000000 edx = 0383f9ff
eax in: 0x00000002, eax = 03020101 ebx = 00000000 ecx = 00000000 edx = 0c040882
Family: 6 Model: 8 Stepping: 3 Type: 0 Brand: 2
CPU Model: Pentium III-M (Coppermine) [cB0] Original OEM
Feature flags:
Onboard FPU
Virtual Mode Extensions
Debugging Extensions
Page Size Extensions
Time Stamp Counter
Model-Specific Registers
Physical Address Extensions
Machine Check Architecture
CMPXCHG8 instruction
SYSENTER/SYSEXIT
Memory Type Range Registers
Page Global Enable
Machine Check Architecture
CMOV instruction
Page Attribute Table
36-bit PSEs
MMX support
FXSAVE and FXRESTORE instructions
SSE support
Extended feature flags:
L1 Instruction cache: Size: 16KB 4-way associative. line size=32 bytes.
L1 Data cache: Size: 16KB 4-way associative. line size=32 bytes.
L2 unified cache: Size: 256KB 8-way associative. line size=32 bytes.
Instruction TLB: 4KB pages, 4-way associative, 32 entries
Instruction TLB: 4MB pages, fully associative, 2 entries
Data TLB: 4KB pages, 4-way associative, 64 entries
Data TLB: 4MB pages, 4-way associative, 8 entries
/dev/cpu/0/msr: No such device
MTRR registers:
MTRRcap (0xfe):
MTRRphysBase0 (0x200):
MTRRphysMask0 (0x201):
MTRRphysBase1 (0x202):
MTRRphysMask1 (0x203):
MTRRphysBase2 (0x204):
MTRRphysMask2 (0x205):
MTRRphysBase3 (0x206):
MTRRphysMask3 (0x207):
MTRRphysBase4 (0x208):
MTRRphysMask4 (0x209):
MTRRphysBase5 (0x20a):
MTRRphysMask5 (0x20b):
MTRRphysBase6 (0x20c):
MTRRphysMask6 (0x20d):
MTRRphysBase7 (0x20e):
MTRRphysMask7 (0x20f):
MTRRfix64K_00000 (0x250):
MTRRfix16K_80000 (0x258):
MTRRfix16K_A0000 (0x259):
MTRRfix4K_C8000 (0x269):
MTRRfix4K_D0000 0x26a:
MTRRfix4K_D8000 0x26b:
MTRRfix4K_E0000 0x26c:
MTRRfix4K_E8000 0x26d:
MTRRfix4K_F0000 0x26e:
MTRRfix4K_F8000 0x26f:
MTRRdefType (0x2ff):
700MHz processor (estimate).
What does all this mean? Let's start with the simple stuff. First, this isn't an SMP (multiple CPU) system. The hw command says it's running at 186 MHz (which is reasonably close to the supposed 200 MHz spec).
MHz is megahertz: hertz means cycles per second, and it refers to the clock speed, so a megahertz is a million clock cycles per second. A clock cycle is a chance for the CPU to run; during that cycle it might complete one instruction, more than one instruction, or even less than one instruction. Different processors take different numbers of clock cycles for different instructions. For example, an 8086 processor needed 4 clock cycles to do an "OR AX, 3". The '386 did it in 2 clocks, but a Pentium can do that in one clock cycle. The Pentium Pro and Pentium II can execute multiple instructions at once, so it's not really sensible to talk about how many clocks an instruction takes, but the important thing to understand is that a Pentium at 66 MHz is easily 25% faster than a 486 at 100 MHz just because most Pentium instructions take fewer clocks than they did on the 80486. Likewise, a Pentium II is faster than a plain Pentium at the same clock rate.
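To put numbers on that, here is a minimal Python sketch (Python is my choice for illustration, not anything hw uses). The clock counts for "OR AX, 3" are the ones quoted above; the clock rates I plugged in for each chip are assumptions picked only as typical examples.

```python
def instruction_time_ns(clocks: int, clock_mhz: float) -> float:
    """Nanoseconds taken by an instruction that needs `clocks` cycles."""
    cycle_ns = 1000.0 / clock_mhz  # one cycle at 1 MHz lasts 1000 ns
    return clocks * cycle_ns


# Clock counts come from the text; the MHz figures are assumed examples.
examples = [("8086", 4, 5.0), ("80386", 2, 33.0), ("Pentium", 1, 200.0)]

for name, clocks, mhz in examples:
    print(f"{name:8s} {clocks} clocks at {mhz:5.1f} MHz = "
          f"{instruction_time_ns(clocks, mhz):6.1f} ns")
```

The point is that both numbers matter: fewer clocks per instruction and a shorter clock cycle each make the same instruction finish sooner.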
But that "performs like a" output can be misleading or even dead wrong. When dealing with a Pentium CPU that existed at the time this code was written, it is giving a reasonably accurate benchmark. But if it's something else (a Cyrix, for example, or some new CPU by Intel) who knows? It's just how fast it got through a certain loop that contained certain instructions where the time required is known for specific CPU families. When the code does NOT know the family it just guesses. I don't know whether SCO's code is or is not aware of Cyrix; the point is that unless you know that it is aware of how a CPU like yours is supposed to perform, that output may not have much value.
By the way, the early 8086 CPU's managed to crank out 2.5 million instructions per second for their simplest functions. That was a big improvement from the first 4004 chip that could only put out 50,000 at best.
The generations just beyond the Pentium (the Pro, Pentium II, etc.) are really incredible machines. Although they understand the same x86 code that goes all the way back to the 8086, internally they translate that code to RISC instructions, have parallel execution and many more features that put them in a class far beyond their predecessors. If this interests you, I suggest:
This CPU is type 0, which means "Original OEM processor". Other possibilities are 1 (OverDrive processor), 2 (dual processor) and 3 (reserved).
The "cpu_id" comes from CPUID, an instruction that returns both the bit string that hw displays as hex and the text of the Vendor ID. This one is "GenuineIntel"; other possibilities include "AuthenticAMD", "UMC UMC UMC", "CyrixInstead" and more. Intel says that Pentiums below step B0 don't return any Vendor ID string.
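The Vendor ID text is nothing more than twelve ASCII characters packed into three registers. As a rough Python sketch (the function name is mine, made up for illustration), this rebuilds the string from the raw ebx, edx and ecx values that x86info printed for "eax in: 0x00000000":

```python
import struct


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Join the three CPUID leaf-0 registers into the 12-character vendor ID."""
    # Each register holds four ASCII characters, least significant byte first,
    # and the string reads in the order EBX, EDX, ECX.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


# Register values copied from the x86info output above.
print(vendor_string(ebx=0x756E6547, edx=0x49656E69, ecx=0x6C65746E))
```

Note the odd register order: the middle four characters come from edx, not ecx.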
This is a Pentium, so it's "family 5". Strangely, the Pentium Pro is 6, and it gets even more strange: Family 4 is "most" 80486's, plus Cyrix and AMD 5x86. Family 5 is the Pentium, AMD K5 and K6 and quite a few others. Family 6 is Pentium Pro, Pentium II, Pentium III, AMD K7 and Cyrix M2.
The list of possible models is extensive. Refer to https://www.sandpile.org/arch/cpuid.htm for most of them.
Stepping is a manufacturer revision level. It's important if there are bugs at certain levels: if you are above that level, you don't have the bug.
There is another field that hw doesn't show, but it only affects Pentium III models, and that distinguishes between the Pentium III and the Pentium III Xeon.
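The type, family, model and stepping values are all packed into the one cpu_id number. Here's a small Python sketch of how they unpack, assuming the basic layout Intel documents for CPUID (stepping in bits 0-3, model in bits 4-7, family in bits 8-11, type in bits 12-13). It ignores the extended family and model fields that later processors added, and the function name is just something I made up.

```python
def decode_signature(cpu_id: int) -> dict:
    """Split a CPUID signature into its basic fields."""
    return {
        "type": (cpu_id >> 12) & 0x3,     # bits 12-13
        "family": (cpu_id >> 8) & 0xF,    # bits 8-11
        "model": (cpu_id >> 4) & 0xF,     # bits 4-7
        "stepping": cpu_id & 0xF,         # bits 0-3
    }


# 0x52c is the cpu_id from hw; 0x683 is eax from x86info's "eax in: 0x00000001".
for value in (0x0000052C, 0x00000683):
    print(hex(value), decode_signature(value))
```

Both results agree with what the tools printed: family 5, model 2, stepping 12 for the Pentium, and family 6, model 8, stepping 3 for the Pentium III.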
This processor doesn't have an APIC on chip. An APIC (Advanced Programmable Interrupt Controller) is required for SMP (Symmetric Multi Processing: more than one CPU). It wouldn't necessarily have to be on the processor; it could be on the motherboard, but this CPU would obviously require the motherboard version if it were used in an SMP system.
The Conditional Move and Compare instructions arrived with the Pentium Pro. There are quite a few variations, but the idea is that the move will be done based on the result of a previous instruction. Doesn't sound like much, does it? Actually, it's very important because of the way the Pentium Pro and Pentium II work: these processors execute instructions out of order and do speculative execution. They'll try to predict which direction a branch (which is how older processors handle conditional moves) will take and pre-execute the instructions in that branch. If it turns out that the branch isn't taken, those instructions have to be thrown out. The conditional moves can avoid a branch, so that avoids speculative execution. Understand that speculative execution is, overall, a good thing, but if it's possible to avoid multiple execution paths, that's even better. Consider a simple case where you need to put something in a memory location if a certain register is non-zero. Traditionally, you'd test the register, and then jump to some other location if the results were not zero. If the Pentium Pro came across such a test and branch, it would try to decode and execute one of the branches while continuing with its "normal" out-of-order parallel execution of the instruction stream. If it guessed right about the results of the test, it would have saved time, but if it was wrong, eventually all that speculative work would be just wasted. If instead this were coded with a CMOV, there is no branch, so the flow would just continue clicking along.
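The difference in shape between the two approaches can be modeled in a few lines of Python. To be clear about what this is: Python does not emit CMOV, and the interpreter branches plenty internally, so this is only a sketch of the data flow (test and jump versus an unconditional select through a mask). The function names are mine.

```python
def store_with_branch(dest: int, value: int, reg: int) -> int:
    """Test-and-jump version: which path runs depends on reg."""
    if reg != 0:
        dest = value
    return dest


def store_branch_free(dest: int, value: int, reg: int) -> int:
    """Select version: the same operations run whatever reg holds."""
    # mask is all ones (-1) when reg is non-zero, all zeros otherwise.
    mask = -int(reg != 0)
    return (value & mask) | (dest & ~mask)


for reg in (0, 7):
    print(reg, store_with_branch(11, 99, reg), store_branch_free(11, 99, reg))
```

Both functions always give the same answer; the second one simply has no fork in the road for a processor to guess at.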
The CMPXCHG8B instruction started with the Pentium. It compares and exchanges 8 bytes in one operation. Big deal; another new instruction. Yes, but this one is famous: it's the source of the "Pentium Bug" that would lock up your machine tight as a drum if it was encountered under the right conditions. See https://www.zdnet.com/pcmag/news/trends/t971117b.htm for more details on that bug.
Aside from the bug, this is just another instruction like CMOV that can avoid code branches.
Debugging Extensions was another new feature when the Pentium was announced. These features are primarily of interest to hardware designers, but they are very important to them, because these let the internals of the chip be seen without affecting the programs that are currently running. See https://x86.ddj.com/articles/probemd/probemode.htm.
Until the 80486, the FPU (Floating Point Unit) was always a separate chip. Early CPU's couldn't do floating point operations at all; the programmers had to write their own routines to do such math. FPU's actually do more than just work with decimal numbers; they have logarithmic and trigonometric functions, square roots and the like.
Machine Check Exception and Machine Check Architecture are closely related, so I'm going to cover them together. These were introduced with the Pentium, and expanded with the Pentium Pro. Here's the concept: certain errors, such as bus errors, parity errors and cache errors are logged in special registers, and an interrupt is generated (this feature is only enabled if the OS designer wants it; the interrupt isn't automatic). If there were a way to correct the error (there's not on any hardware I know of now), the interrupt handler would presumably try to recover, or at least log the problem somewhere. The MCE is a simple implementation of this; the MCA (Pentium Pro and up) is more complex and complete.
Multimedia Extensions (MMX) add more than 50 new instructions. These instructions work with 64 bit registers stolen (borrowed) from the FPU. They can do 64 bit moves to memory, which speeds up such operations as writing to video ram. They also have clamping or "saturated" math: if the result of one of these special add instructions goes above the largest or below the smallest value the data type can hold, it gets set to that limit instead of wrapping around. This avoids having to test for such conditions at all in many situations. MMX also has parallel compares, which can compare four sixteen bit values at once, producing a mask that shows which values compared true and which compared false. That mask can then be used with boolean (AND, OR, etc.) instructions on the four sixteen bit values, depending upon the results of the comparisons stored previously. Effectively, you get to operate on 64 bits at once. This again speeds up operations and can eliminate branches.
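Here's a rough Python model of those two ideas, using lists of four values to stand in for the four sixteen bit lanes of one 64 bit register. It only imitates the arithmetic; real MMX code does each of these steps in a single instruction (I believe the matching mnemonics are PADDSW, PCMPGTW and PAND/PANDN/POR, but treat that mapping as my annotation). The function names are made up.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturating_add(a, b):
    """Add four signed 16-bit lanes, clamping instead of wrapping around."""
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


def compare_greater(a, b):
    """Per-lane compare: 0xFFFF where a > b, 0x0000 elsewhere."""
    return [0xFFFF if x > y else 0x0000 for x, y in zip(a, b)]


def select(mask, a, b):
    """Take a where the mask is all ones and b where it is all zeros.

    Uses only AND/OR, so there is no per-lane test. Lanes are assumed to be
    non-negative here to keep the bit patterns easy to read.
    """
    return [(x & m) | (y & ~m & 0xFFFF) for m, x, y in zip(mask, a, b)]


print(saturating_add([30000, -30000, 100, 5], [10000, -10000, 200, 5]))

a, b = [1, 5, 3, 7], [2, 4, 3, 0]
mask = compare_greater(a, b)
print([hex(m) for m in mask])
print(select(mask, a, b))  # larger value of each pair, with no branches
```

The last line is the payoff: a four-way "pick the larger one" done with a compare and a few boolean operations, and not a single conditional jump.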
The Pentium was the first model to have "model-specific registers" or MSR's. The RDMSR and WRMSR instructions read and write those registers. The Pentium only had a few but the Pentium Pro added more. See https://www.byte.com/art/9407/sec12/art3.htm.
The Pentium MSR's are listed at the bottom of hw's cpu report; they all relate to the Machine Check Exception capability covered above.
Memory Type Range Registers allow the specification of attributes for specific ranges of memory. That would seem to be handy, especially for designating uncacheable areas, etc. but nothing short of a Pentium Pro has this feature, so Unix uses Page Tables to define memory attributes. I don't know if SCO makes any use of these in actual code, but any web search for MTRR will turn up lots of Linux references.
Physical Address Extensions are again only available with Pentium Pro and up. This feature allows access to 64GB of memory rather than the 4GB that lesser CPU's can address. If you are aware that UnixWare 7 can be licensed to access 64GB of memory, you might think that a Pentium Pro or better would be required for that, and I'm sure that's true.
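Those two limits fall straight out of the number of physical address bits: 32 without PAE and 36 with it. A tiny Python check, with a helper name I made up:

```python
GB = 2 ** 30


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with the given number of physical address bits."""
    return 2 ** address_bits


print(addressable_bytes(32) // GB, "GB with 32 address bits")
print(addressable_bytes(36) // GB, "GB with 36 address bits (PAE)")
```

Four extra address bits means sixteen times the memory.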
PTE Global Flag is another neat Pentium Pro and up feature that allows shared pages to be specified as "global" so they don't get flushed from cache when a new process is switched in. Again, I have no idea if SCO Unix uses this when the processor supports it.
Page Size Extensions were introduced (but undocumented) with the Pentium. The concept is that you can have large areas of memory that are most easily treated as a single unit rather than multiple 4KB pages. An example is a large video frame buffer or even the kernel code itself. The PSE allows 4MB pages to be defined. Since the caching of page translations (the TLB) works on a page basis, these large pages tend to stay in that cache because a hit anywhere within their scope counts, whereas if the same area were multiple 4K pages, parts of it would leave the cache if not accessed. Again, I have no idea about actual use of this.
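The x86info output earlier gives some numbers to play with: that Pentium III's data TLB has 64 entries for 4KB pages and 8 entries for 4MB pages. As a back-of-the-envelope Python sketch (helper name mine), here is how much memory each set of entries can cover at once:

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory covered when every TLB entry maps a different page."""
    return entries * page_size


# One 4MB page replaces this many 4KB pages.
print((4 * MB) // (4 * KB), "small pages per large page")

# Entry counts are from the "Data TLB" lines of the x86info output.
print(tlb_reach(64, 4 * KB) // KB, "KB covered by 64 entries of 4KB pages")
print(tlb_reach(8, 4 * MB) // MB, "MB covered by 8 entries of 4MB pages")
```

Eight large-page entries cover far more memory than sixty-four small-page entries, which is why big contiguous things like frame buffers are good candidates.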
The Time Stamp Counter is my favorite register. It is 64 bits, and it increments once for every clock cycle; that's once every 5 nanoseconds for a 200MHz CPU. It ticks even if the CPU is otherwise halted, and if it reaches all ones, it just rolls over to all zeros with no announcement. Intel guarantees that the TSC will not roll over in less than 10 years on all future processors. This must mean that Intel is planning some mighty fast CPU's, because it would take several thousand years for this 200 MHz Pentium to reach that point. Do the math!
Operating systems can also take advantage of the TSC as a time counter. I don't know specifically where SCO would use this, but I'm sure they have code that uses it whenever possible; it would be much easier and more reliable than anything previously available.
The Virtual 8086 Mode Enhancement is a set of enhancements to the Virtual 8086 capability that was introduced way back with the 80386. I assume that products like Merge use these features, but I don't know if the enhancements available on Pentium or better are used in current releases.
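For anyone who would rather let the machine do that math, here's a short Python sketch. It assumes the counter starts at zero and counts exactly one tick per clock cycle; the function names are mine.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_rollover(clock_hz: float, bits: int = 64) -> float:
    """Years until a counter of `bits` bits wraps at the given clock rate."""
    return 2 ** bits / clock_hz / SECONDS_PER_YEAR


def clock_hz_for_rollover(years: float, bits: int = 64) -> float:
    """Clock rate at which the counter would wrap in exactly `years` years."""
    return 2 ** bits / (years * SECONDS_PER_YEAR)


print(f"200 MHz: about {years_to_rollover(200e6):,.0f} years to roll over")
print(f"10-year rollover needs about {clock_hz_for_rollover(10) / 1e9:.1f} GHz")
```

So the 200 MHz Pentium is good for a few thousand years, and the clock would have to run in the tens of gigahertz before the 10 year guarantee came under any pressure.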
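All of the Yes/No features covered above come out of the single cpu_features number, one bit per feature. Here's a Python sketch that decodes it, using hw's own labels. The bit positions are the ones Intel documents for the CPUID feature flags, not anything hw prints, so treat the table as my annotation; it only lists the features discussed in this article.

```python
# Bit position -> hw's label (positions per Intel's CPUID documentation).
FEATURE_BITS = {
    0: "FPU", 1: "VME", 2: "DE", 3: "PSE", 4: "TSC", 5: "MSR",
    6: "PAE", 7: "MCE", 8: "CXS", 9: "APIC", 12: "MTRR",
    13: "PGE", 14: "MCA", 15: "CMOV", 23: "MMX",
}


def decode_features(flags: int) -> dict:
    """Map each known feature label to True/False for the given bitmask."""
    return {name: bool((flags >> bit) & 1) for bit, name in FEATURE_BITS.items()}


# 0x1bf is cpu_features from hw; 0x0383f9ff is edx from x86info's leaf 1.
for flags in (0x000001BF, 0x0383F9FF):
    present = sorted(name for name, on in decode_features(flags).items() if on)
    print(hex(flags), " ".join(present))
```

For 0x1bf this gives exactly the eight features that hw marked "Yes", which is a decent sanity check on the table.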
I've not yet been able to determine the significance of this data.
© 2011-03-12 Tony Lawrence