Ulrich Drepper (udrepper) wrote,
@ 2005-01-18 04:43:00
Users and Complexity vs Bugs
It is not clear to me whether this will be relevant again in the foreseeable future, but I think it is still worth pointing out.

The Players



The last four years or so have seen some dramatic development in CPU architectures. The move to 64-bit platforms happened in this brief time and on a wide front in workstations, servers, and now desktops. Alpha turned out to be a dead end due to company politics (certainly not due to technology), SPARC64 and mainframes are niche players, MIPS64 is irrelevant in this area. This basically leaves x86-64, IA-64, and PPC64. It is interesting to look at the fate of these architectures. What I claim here is that the fate could have been foreseen very early.

IA-64



Work on IA-64 began a long time ago, mostly behind closed doors at Intel and HP. Two cooks working at the same time is by itself a warning signal. It is worse if one of them is HP, a company always known for extravagant, overly complex designs. Everything is designed so that it theoretically can achieve peak performance. The word "theoretically" is key here. I once had to go to an HP office where the party I visited had to use one of the local servers. To make a long story short, this monster of a machine ran slower than the machine I had in my office at that time, and the excuse was that the technicians hadn't come yet to set the machine up correctly.

Another example is the HP/PA architecture. The architecture is clearly designed for scalability. The memory subsystem specifically shows all the signs of this. But at what price? The architecture has no usable atomic operations (those are the hardest to implement on scalable systems). The result: to this day it is not possible to implement a well-performing thread library for Linux/HP/PA. Specifically, the NPTL library still hasn't been ported to this architecture, and if it ever is, the fast paths for the synchronization primitives, which run completely at user level on the other architectures, will have to use kernel support. In other words, it will be slow.
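To make the point about fast paths concrete, here is a minimal sketch in C11 of what an uncontended lock acquisition at user level looks like when the hardware offers an atomic compare-and-swap. The names `sketch_lock`, `sketch_trylock`, and `sketch_unlock` are made up for illustration; this is not NPTL's code, and a real implementation would additionally put waiters to sleep in the kernel when the lock is contended.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_int state;   /* 0 = unlocked, 1 = locked */
} sketch_lock;

/* Fast path: a single atomic compare-and-swap, no system call.
   Returns true if the lock was acquired, false if it was already held. */
bool sketch_trylock(sketch_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

/* Release the lock with a single atomic exchange.
   A real implementation would also wake sleeping waiters here. */
void sketch_unlock(sketch_lock *l)
{
    int prev = atomic_exchange(&l->state, 0);
    assert(prev == 1);  /* unlocking a lock that was not held is a bug */
    (void)prev;         /* silence the unused warning when NDEBUG is set */
}
```

Without a usable atomic instruction, the compare-and-swap in `sketch_trylock` has to be emulated with help from the kernel, and that is where the slowdown comes from.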

The same kind of over-engineering shows in the IA-64 architecture. Sure, all the features the processor has can theoretically achieve phenomenal performance. The architecture is like no other (production) architecture before it. Neither CISC nor RISC. A kind of VLIW architecture with synchronization between instructions being the responsibility of the author (or the tools s/he uses). There are dozens of relocation types, complications in the instruction set require unwind information to be present even for *regular* operation (not only for debugging and exception handling), and the calling conventions as a whole are very complicated. The architecture got compatibility support in one form or another for both IA-32 and HP/PA. Especially adding the crippled IA-32 model to the modern IA-64 memory architecture is a pain.

On top of this, everybody who had the misfortune to sit next to one of these machines for an extended period of time will testify that they are loud. This is not only the result of the case design (admittedly, these are almost all server cases). The power consumption of the processors requires extensive cooling. And cooling is loud. Even the smallest of the machines is many times more expensive than a performance-wise equivalent machine with, in the extreme case, the IA-32 architecture. Finally, the chips are up to this point produced with technology which is one or two generations behind what is used for IA-32. On the plus side, IA-64 has probably the best support for machines with huge numbers of processors and good support for the next big thing: virtualization.


PPC64



PPC64 is also complex. But the participating parties restrained themselves. No experiments with VLIW or the like were made. PPC64 is to a large extent a straightforward extension of the PPC32 architecture and provides complete support for running PPC32 programs and even kernels. The architecture specification is flexible and allows different levels of implementations. Some exotic features, which might be interesting to high-performance machines, need not be available in the processors for workstations and normal servers. A consequence of this is that the ABI cannot rely on these features. Here we see already a big difference with IA-64. The latter has all the bells and whistles included in the basic architecture since there is only really one implementation. And the ABI designers went wild and tried to design the ABI so that one can take advantage of each and every one of these features.

The PPC64 designers didn't do that, but they still haven't done everything right. The main problem with the (initial) PPC64 ABI used on Linux was that the wrong people were allowed to design it. Those came from AIX. AIX didn't use ELF for PPC32 machines, but switched to ELF for PPC64. In this step, the ABI designers tried to carry over as much of the AIX PPC32 ABI as possible to make it easier for AIX people. Linux PPC32 did use ELF and therefore this preference for the old AIX ABI is not welcome. In fact the "dot symbols" were a big obstacle for PPC64. They made porting of applications tiresome, sometimes even hard. The concept of function descriptors is sound, but there is no reason for these horrible "dot symbols" other than that AIX had them, too. Fortunately the ABI maintainers were flexible enough to see the benefits of the change and so, after many years of using the old ABI, dot symbols were removed in 2004. Compatibility is maintained to a large extent, but there is always a certain risk. The responsible people were willing to take it.

In terms of machines IBM did not do much to provide reasonably small machines. Fortunately for the PPC64 architecture, others stepped in, namely Apple. The G4 and G5 machines run well under Linux these days even though specifications for some parts of the machines are hard to come by. There are also smaller companies which provide motherboards for PPC machines. Still, any PPC64 is sold at a premium. We'll see what Apple's cheap machines bring in the future.


x86-64



The third architecture is x86-64. Closely based on the IA-32 architecture, it in fact can be built as an extension of IA-32. AMD didn't really do it this way; they redesigned their processors around the new functionality (having licensed technology from Alpha, they made major improvements in the memory architecture, from integrated memory controllers to quasi-NUMA memory for multi-processor nodes). Intel, so far at least, simply extends the existing P4 architecture with the new EM64T technology. For both Intel's and AMD's processors this means 100% binary compatibility can be achieved. Not only is the 64-bit mode similar enough to the 32-bit mode, both processors can also be booted in 32-bit mode. Future Xeon processors might keep up with the big AMD processors, but scalability beyond 4, maybe 8, processors is limited. Just as for IA-32. The IBM x440 machines and Unisys monsters are the proof of the problems of the IA-32 scalability. This is of course OK with Intel, since they would rather sell IA-64 machines anyway.

The ELF ABI for x86-64 follows closely the IA-32 ELF ABI. But it diverges in some important points. Not in an unreasonable way, mind you: if the IA-32 ABI were redesigned today, the same set of decisions would be made. Chief among those changes is the calling convention, which can reduce the function call overhead by up to 75% (this is really a guess, don't quote me). It was just that the guys at AT&T and Sun, who designed the original IA-32 ABI, didn't think of these optimizations. This does not mean that the x86-64 ELF ABI is perfect. There are a number of aspects where the designers blundered. But it is not overly complicated and people who know the IA-32 ELF ABI have no problems with the x86-64 ABI.
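As an illustration of the calling-convention difference, take a trivial C function. The comments describe where the arguments live on entry under the two System V ABIs; the function name `add3` is just an example and nothing here measures the actual overhead.

```c
/*
 * IA-32 System V ABI: all three arguments are passed on the stack.
 *   On entry: a at 4(%esp), b at 8(%esp), c at 12(%esp); result in %eax.
 *
 * x86-64 System V ABI: the first six integer arguments go in registers.
 *   On entry: a in %rdi, b in %rsi, c in %rdx; result in %rax.
 */
long add3(long a, long b, long c)
{
    return a + b + c;
}
```

On x86-64 the caller does not have to store the arguments to memory and the callee does not have to load them back, which is where the saving comes from. The 75% figure above remains a guess.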

Due to the close relationship with IA-32 and the fact that the x86-64 processors automatically mean the end of the line for the old processors, the 64-bit variants are now becoming the mainstream machines. At least so far for desktops, workstations, and servers. Laptops and other mobile devices will follow later. The consequence is that now even budget machines are available with the x86-64 architecture. After the next update cycle all machines will automatically be x86-64 machines. This means many millions of users and access to these machines for enthusiasts.

The last point might not mean much for proprietary operating systems. It is crucial for free software, though, and since Gartner now declared Linux a "mainstream technology", this is also something the hardware producers have to take into account.


Software Status Today



Linux has been supported on all three architectures for quite some time now. Stability is reasonable on all of them, but there are differences among the three and between them and IA-32. Very briefly summarized, the situation is as follows. This is a mostly subjective impression, based solely on Linux, but obviously with some insider knowledge.

Architecture | Age of Linux port | Complexity  | 1 CPU performance | Scalability | Stability   | Time to find bugs | 32-bit compat | user base   | machine prices
IA-32        | oldest            | low         | high              | low         | high        | short             | N/A           | huge        | low
IA-64        | 2nd oldest        | very high   | low               | very high   | medium      | long              | problematic   | tiny        | very high
PPC64        | 3rd oldest        | medium-high | medium-high       | high        | medium-high | long              | good          | respectable | high, but lowering
x86-64       | youngest          | medium-low  | very high         | medium-high | fairly high | short             | good          | large       | low


The table needs a few explanations:



  • Age refers to how long the support has been available in source form, more or less publicly.


  • Complexity refers to the ABI and CPU architecture.


  • Scalability refers to how well performance holds up as many (hundreds, thousands) processors are added.


  • Stability refers to the overall system stability, more specifically the stability of 64-bit code.


  • The Time to find bugs can be measured by looking at features which are introduced on all architectures at the same time and determining when the last bug fix was applied to make the feature usable. This is a critical measurement.


  • Linux installations on PPC64 use 32-bit applications by default since 64-bit applications are slower. This means that it is not the 32-bit compatibility which is problematic; it is the 64-bit code which is not so well tested since it is not used that much. The other two 64-bit architectures use 64-bit code by default.


  • Scalability for PPC64 really depends on the actual silicon.





Conclusion



My conclusion from all this is that there are two crucial factors for the success of an architecture: Complexity and Cost (or userbase; the two are interchangeable when comparing machines in the same price/performance class). These factors directly determine the quality of the available software.



This graph shows the correlation between the number of users and the time it takes to discover a bug. It is possible to compensate for lower user numbers with a more dedicated tester/user group (meaning, massively extending the QA department). But experience shows that this can never reach the same level of coverage as a diverse group of users. A small group of users/testers always exercises the same paths.

A consequence of this is that it is essential to provide entry-level machines with low prices. Otherwise it cannot be explained why the architecture which has been supported for the shortest time is the more stable one. The reduced complexity certainly also plays a role, but high complexity also scares off people, indirectly increasing prices due to lowered demand (and of course developing complex solutions is more expensive). HP's exit from the IA-64 workstation market and the connected cancellation of a product for those machines from Microsoft does the exact opposite for this architecture. With Linux there still is an OS available for workstations, but if there is no hardware vendor it does not make much of a difference.

The difference in stability between IA-64 and PPC64 can be explained by the complexity. There are still major problems found in IA-64 code like the unwinder. The compilers still don't generate code as good as could be written by hand (which is necessary to compete). It does not help that 32-bit support will become much harder if the 32-bit hardware support is removed, as hinted several years ago. The execution layer simply cannot replace real hardware support.

The major advantage of IA-64 is scalability. The question is: is it really that important? There are certainly huge machines being built, for instance, by SGI. But the business world is moving to inexpensive blades. Those will predominantly not run the slower, hotter IA-64 and PPC64 processors.

In summary:


  • cheap hardware is a must, even if it is not the full-fledged processor.

  • design a simple ABI, leave out the bells and whistles

  • even if there is a user base, necessary changes to correct mistakes must be made



