September 17th, 2005

Do you still think the LSB has some value?

There are still people out there who think that the LSB has some value. This just means they buy into the advertising of the people who benefit financially from the existence of the specification, they don't do any research, and they generally don't understand ABI issues.

Just look at the recent LSBv3 certification process. Our management got pressured by certain parties into declaring that we would once again go through the process. The v3 spec was extended significantly and some new tests were added. And of course, the tests are run against the current code base on the machines people use nowadays. The result of all this: many, many reported bugs.

This is nothing new; it has been the case for every test run after an update of the test suite. And the analysis of the failures is always the same, too: the bugs are not in the tested code, they are in the test suite. There might occasionally be a problem in the code (I think I've seen one or two of these), but it's safe to say that 90+% of the reported bugs are actually problems in the test suite. Look at the LSB bug database and all the reported problems if you doubt that.

Now, considering this, there are some interesting observations to be made:

  • The tests certainly worked at some point. I don't doubt that. But the kinds of errors show that the author not only doesn't really understand the standards but also cannot really program and doesn't understand hardware. This also applies to the code written by the presumed professionals paid by the OpenGroup to write tests. Want an example?
    Look at this. This is no isolated incident; I've found this kind of problem on many occasions.

  • Some distributions still somehow manage to pass the test suites of a new version of the spec, and all this without their people reporting any problems or requesting that tests be waived. When we analyze the code and see that it cannot work, we file bugs and wait for the test to be waived. If this is not what others do, how can they pass the tests? I hope nobody actually cheats, although I'm not completely dismissing the possibility. That leaves one possibility: creating a special LSB environment in which the test succeeds. This is possible since the geniuses who came up with the whole idea think that a separate dynamic linker is OK; that linker can then load different DSOs, which effectively implements a completely separate environment. This environment can then be modified to accommodate the wrong tests.

    But what does this mean for the LSB users who fall victim to the illusion of the LSB as a stable environment? It means that their programs have to work within an environment which a) is different from the standard runtime environment and b) is wrong. Since the tests are wrong, the tested semantics are (after the waiver) not required of other LSB-compliant implementations, and therefore the adjusted LSB environment differs from those environments which didn't change and rely on the waiver. The result of all this is that you can have a program certified for LSBv3 which doesn't run on all LSBv3-certified systems, depending on whether the LSB environment worked around the broken test or not. The only way to ensure that this specific kind of problem never happens is to invalidate every certified distribution once a new waiver is released.

  • Another nice thing we came across during the LSBv3 testing: numerous timing problems. The bug referenced above also depends on timing, which is why it wasn't discovered earlier (the test code is pretty old). When the problem was reported, the answer we got from the LSB working group was along the lines of "use a slow uni-processor machine, it is known to work there". And what do you know: look at the compliance filing of a certain other distribution and pay attention to the hardware specification. As a colleague of mine correctly said:

    I'm so glad we worry ourselves and put resources into complying with
    the industry standard for doorstops.

    What is the value of such a certification? What assurance does this give you? Is "don't use fast SMP machines" an acceptable answer in any universe, especially when it comes to thread tests?
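The post doesn't reproduce the broken test, so what follows is only a hypothetical illustration of the general class of timing bug described above: a thread test that silently assumes the thread calling pthread_create() keeps running before the new thread gets to do anything. The names run_ordered_test, child and shared_value are invented for this sketch. It assumes POSIX threads (compile with -pthread) and shows the robust alternative: make the ordering explicit with a mutex and a condition variable instead of relying on the scheduler.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test fixture: the parent publishes a value after
   pthread_create() and the child is expected to see that value.  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static bool ready;
static int shared_value;

static void *
child (void *arg)
{
  int *result = arg;

  pthread_mutex_lock (&lock);
  /* Block until the parent says the value is published.  The loop
     also copes with spurious wakeups from pthread_cond_wait().  */
  while (!ready)
    pthread_cond_wait (&ready_cv, &lock);
  *result = shared_value;
  pthread_mutex_unlock (&lock);
  return NULL;
}

/* Returns the value the child observed, or -1 on error.  */
int
run_ordered_test (void)
{
  pthread_t th;
  int observed = -1;

  /* Safe to reset here: the child thread does not exist yet.  */
  ready = false;
  shared_value = 0;
  if (pthread_create (&th, NULL, child, &observed) != 0)
    return -1;

  /* FRAGILE variant (do not do this): just writing shared_value = 42
     here without the lock and assuming the child has not read it yet.
     That is a data race which only "works" if the creating thread
     happens to keep the CPU, as is typical on a slow uniprocessor;
     on an SMP machine the child can run concurrently and read 0.  */
  pthread_mutex_lock (&lock);
  shared_value = 42;
  ready = true;
  pthread_cond_signal (&ready_cv);
  pthread_mutex_unlock (&lock);

  /* pthread_join() also guarantees the child's write to observed
     is visible here.  */
  if (pthread_join (th, NULL) != 0)
    return -1;
  return observed;
}
```

With the explicit handshake, calling run_ordered_test() in a loop should return 42 every time, whether on a uniprocessor or a many-core SMP box. The unsynchronized variant described in the comment may return 0 or 42 depending purely on scheduling, which is exactly how a test can be "known to work" on one machine and fail on another.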

I think it's time to stop kidding ourselves. It is not possible to achieve the goal of 100% binary compatibility except when the same binaries are used everywhere. There are test suite bugs (worked around or not yet discovered), there are holes in the test suite as far as specified behavior is concerned (especially when it comes to implementation-defined behavior, since this is nothing the OpenGroup-provided test suites can test for), and there is the huge source of problems called unspecified behavior. Nothing is guaranteed when interfaces are used in a way whose behavior is unspecified, so no test of the implementation can ever help. The applications are the issue, and here testing is almost entirely lacking. Yes, the LSB makes an attempt to cover this. But it is completely inadequate as it stands, and who knows anybody who went through the process, or is even willing to, considering the extra work this would mean?
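A hypothetical illustration of the unspecified-behavior point: POSIX does not specify the order in which readdir() returns directory entries, and different file systems really do return them in different orders. An application that quietly assumes, say, alphabetical order can pass on the system it was certified on and break on another, and no test of the C library could ever catch that. The sketch below (the function name list_sorted is invented for this example) shows the application-side fix: impose the order yourself instead of depending on the implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

static int
cmp_names (const void *a, const void *b)
{
  const char *const *x = a;
  const char *const *y = b;
  return strcmp (*x, *y);
}

/* Hypothetical helper: collects the entry names of DIR_PATH (skipping
   "." and "..") into *OUT, sorted with strcmp().  Returns the number of
   names, or -1 on error.  The caller frees each name and the array.  */
int
list_sorted (const char *dir_path, char ***out)
{
  DIR *d = opendir (dir_path);
  if (d == NULL)
    return -1;

  char **names = NULL;
  size_t n = 0, cap = 0;
  struct dirent *e;

  /* readdir() hands out entries in whatever order the file system
     likes; nothing may be assumed about it.  */
  while ((e = readdir (d)) != NULL)
    {
      if (strcmp (e->d_name, ".") == 0 || strcmp (e->d_name, "..") == 0)
        continue;
      if (n == cap)
        {
          size_t ncap = cap ? cap * 2 : 8;
          char **tmp = realloc (names, ncap * sizeof *tmp);
          if (tmp == NULL)
            goto fail;
          names = tmp;
          cap = ncap;
        }
      names[n] = strdup (e->d_name);
      if (names[n] == NULL)
        goto fail;
      n++;
    }
  closedir (d);

  /* Impose a well-defined order ourselves.  qsort() must not be
     passed a null pointer, even with a zero count.  */
  if (n > 0)
    qsort (names, n, sizeof *names, cmp_names);
  *out = names;
  return (int) n;

 fail:
  for (size_t i = 0; i < n; i++)
    free (names[i]);
  free (names);
  closedir (d);
  return -1;
}
```

The same pattern applies to any interface whose result order or content the standard leaves open: the only portable program is one that does not depend on the particular choice its build system's implementation happened to make.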

Jim Zemlin and Art Tyde actually came for a visit to talk about these issues not too long ago. They asked for more time while they reorganize the LSB (some personnel effects are already visible). But I think all this is futile: regardless of how much time is sunk into this, there will always be holes big enough to drive a truck through. And who is writing the tests? Remember, the guy who wrote the thread test suite for the OpenGroup is supposed to be an expert. Look at the code!

My advice: cut the losses. Remove any claim that the LSB provides any additional level of assurance for developers. To some extent, I think, the claims have been scaled back in the meantime, if I understood Art correctly. It might be useful to still provide the test suites, but given their quality, even this is questionable at best. I'd rather see new tests written, maybe as an extension of the POSIX test suite Intel started. After they added the 100+ reports I sent and those others sent, the test suite is a somewhat good reflection of how a Linux implementation should behave (important: I wrote Linux and not POSIX).

Until the LSB loses its monetary backing, this is unlikely to happen, though. The main reason: the people who deal with standards professionally. They, of course, have everything to lose. These are the same people who brought you useless crap like the Linux ISO spec (based on some old LSB version). These people are paid to participate in calls and meetings and get their travel to exotic places financed. Did you look at the list of meeting places for ISO meetings? I hope the penny pinchers in the companies financing these standardization groups (not only the LSB, there are many more) realize what a waste of money most of these efforts are and introduce more control. Yes, something like the Austin Group is useful: it works on an API standard (as opposed to an ABI standard in the case of the LSB), which makes it much more realistic and useful. But ISO Linux? Shudder...