<?xml version='1.0' encoding='utf-8' ?>
<!--  If you are running a bot please visit this policy page outlining rules you must respect. http://www.livejournal.com/bots/  -->
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
<channel>
  <title>Ulrich Drepper</title>
  <link>http://udrepper.livejournal.com/</link>
  <description>Ulrich Drepper - LiveJournal.com</description>
  <lastBuildDate>Thu, 22 Nov 2007 02:35:21 GMT</lastBuildDate>
  <generator>LiveJournal / LiveJournal.com</generator>
  <lj:journal>udrepper</lj:journal>
  <lj:journaltype>personal</lj:journaltype>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/19751.html</guid>
  <pubDate>Thu, 22 Nov 2007 02:35:21 GMT</pubDate>
  <title>Producing PDFs</title>
  <link>http://udrepper.livejournal.com/19751.html</link>
  <description>I don&apos;t want to throw this in with the announcement of the availability of the paper on memory and cache handling but I also don&apos;t want to forget it.  So, here we go.&lt;br /&gt;&lt;br /&gt;I write all the text I can using TeX (PDFLaTeX to be exact).  This leads directly to a PDF document without intermediate steps.  The graphics are done using Metapost because I&apos;m better at programming than at drawing.  Metapost produces Postscript-like files which some LaTeX macros then read and directly integrate into the PDF output.&lt;br /&gt;&lt;br /&gt;The result in &lt;a href=&quot;http://people.redhat.com/drepper/cpumemory.pdf&quot;&gt;this case&lt;/a&gt; is a PDF with 114 pages which is only 934051 bytes in size.  Just about 8kB for each page.  Given that the text is multi-column and contains numerous graphics, this is amazingly small.&lt;br /&gt;&lt;br /&gt;I &lt;a href=&quot;http://udrepper.livejournal.com/12663.html&quot;&gt;mentioned before&lt;/a&gt; how badly OO.org sucks at exporting graphics.  I bet all the other word processors, spreadsheets, etc. suck just as badly.  Also, the PDFs they generate for plain text are much, much bigger.&lt;br /&gt;&lt;br /&gt;My guess is that if I&apos;d written the document with OO.org the size would be north of 4MB, probably significantly more.  I cannot understand why people do this to themselves and, more importantly, to others.</description>
  <comments>http://udrepper.livejournal.com/19751.html</comments>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/19557.html</guid>
  <pubDate>Thu, 22 Nov 2007 02:09:09 GMT</pubDate>
  <title>Memory and Cache Paper</title>
  <link>http://udrepper.livejournal.com/19557.html</link>
  <description>Well, it&apos;s finally done.  I&apos;ve uploaded the PDF of the memory and cache paper to my home page.  You can &lt;a href=&quot;http://people.redhat.com/drepper/cpumemory.pdf&quot;&gt;download it&lt;/a&gt; but do not re-publish it or make it available in any form to others.  I do not want multiple copies flying around, at least not while I&apos;m still intending to maintain the document.&lt;br /&gt;&lt;br /&gt;With Jonathan Corbet&apos;s help the text should actually be readable.  I had to change some of the text in the end to accommodate line breaks in the PDF.  So I might have introduced problems; don&apos;t think badly of Jonathan&apos;s abilities.  Besides, this is a large document.  You simply go blind after a while, I know I do.&lt;br /&gt;&lt;br /&gt;Which brings me to the next point.  Even though I intend to maintain the document, don&apos;t expect me to do much in the near future.  I&apos;ve been working on it for far too long now and need a break.  Integrating all the editing Jonathan produced plus today&apos;s line breaking have finished me off.  I haven&apos;t even integrated all the comments I&apos;ve received.  I know the structure of the document is a bit weak in a few places, esp. section 5 which contains a lot of non-NUMA information.  But it was simply too much work so far.  Maybe some day.</description>
  <comments>http://udrepper.livejournal.com/19557.html</comments>
  <category>programming</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/19395.html</guid>
  <pubDate>Tue, 13 Nov 2007 02:05:25 GMT</pubDate>
  <title>The Evils of pkgconfig and libtool</title>
  <link>http://udrepper.livejournal.com/19395.html</link>
  <description>&lt;p&gt;If you need more proof that this is insane just look at some of the packages using it.  I recently was looking at krb5-auth-dialog.  The output of &lt;tt&gt;ldd -u -r&lt;/tt&gt; on the original binary shows 26 unused DSOs.&lt;/p&gt;

&lt;p&gt;This can be changed quite easily: add &lt;tt&gt;-Wl,--as-needed&lt;/tt&gt; to the link line.  Do this and, in the case of this package, all but one of the unused dependencies go away.  This has several benefits:&lt;/p&gt;

&lt;p&gt;The binary size is actually measurably reduced.&lt;/p&gt;

&lt;pre&gt;
   text    data     bss     dec     hex filename
  35944    6512      64   42520    a618 src/krb5-auth-dialog-old
  35517    6112      64   41693    a2dd src/krb5-auth-dialog
&lt;/pre&gt;

  &lt;p&gt;That&apos;s a 2% improvement.  Note that the saved dependencies are all recursive dependencies.  The runtime is therefore not much affected (only a little).  The saved data is pure overhead.  Multiply the number by the thousands of binaries and DSOs which are shipped and the savings are significant.&lt;/p&gt;

&lt;p&gt;The second problem to mention here is that not all unused dependencies are gone because somebody thought s/he was clever and used &lt;tt&gt;-pthread&lt;/tt&gt; in one of the pkgconfig files instead of linking with &lt;tt&gt;-lpthread&lt;/tt&gt;.  That&apos;s just stupid when combined with the insanity called libtool.  The result is that &lt;tt&gt;-Wl,--as-needed&lt;/tt&gt; is not applied to the thread library.&lt;/p&gt;

&lt;p&gt;Just avoid libtool and pkgconfig.  At the very least fix up the pkgconfig files to use &lt;tt&gt;-Wl,--as-needed&lt;/tt&gt;.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/19395.html</comments>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/19041.html</guid>
  <pubDate>Fri, 09 Nov 2007 04:44:21 GMT</pubDate>
  <title>Energy saving is everybody&apos;s business</title>
  <link>http://udrepper.livejournal.com/19041.html</link>
  <description>&lt;p&gt;With the wide acceptance of laptop and even smaller devices more and more people have been exposed to devices limited by energy consumption.  Still, programmers don&apos;t pay much attention to this aspect.&lt;/p&gt;

&lt;p&gt;This statement is not entirely accurate: there has been a big push towards energy conservation in the kernel world (at least in the Linux kernel).  With the tickless kernels we have the infrastructure to sleep for long times (&lt;q&gt;long&lt;/q&gt; is a relative term here).  Other internal changes avoid unnecessary wakeups.  It is now really up to the userlevel world to do its part.&lt;/p&gt;

&lt;p&gt;The situation is pretty dire here.  There are some projects (e.g., &lt;a href=&quot;http://www.lesswatts.org/projects/powertop/&quot;&gt;PowerTOP&lt;/a&gt;) which highlight the problems.  Still, not much happens.&lt;/p&gt;

&lt;p&gt;I&apos;ve been &lt;i&gt;somewhat&lt;/i&gt; guilty myself.  nscd (part of glibc) was waking up every 5 seconds to clean up its cache, even if often nothing was to be done.  This program structure has several reasons.  Good ones, but none that is compelling.  So I finally bit the bullet and changed the program structure significantly to avoid unnecessary wakeups.  The result is that nscd now determines at all times when the next cache cleanup is due and sleeps until then.  Cache cleanups might be many hours out, so the code improved from one wakeup every 5 seconds to one wakeup every couple of hours.&lt;/p&gt;
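
&lt;p&gt;A minimal sketch of that idea, not nscd&apos;s actual code: the function name &lt;tt&gt;next_wakeup_ms&lt;/tt&gt; and the flat array of expiry times are assumptions made for illustration.  It computes the timeout to hand to &lt;tt&gt;poll&lt;/tt&gt; so that the process sleeps until the earliest cleanup is due.&lt;/p&gt;

<![CDATA[<pre>
```c
#include <limits.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: given the expiry times of n cache entries, return
   the timeout in milliseconds to pass to poll().  Returns -1 (block
   indefinitely) if nothing will ever expire and 0 if something is
   already due.  */
int next_wakeup_ms(time_t now, const time_t *expiry, size_t n)
{
  if (n == 0)
    return -1;

  /* Find the entry which expires first.  */
  time_t earliest = expiry[0];
  for (size_t i = 1; i < n; ++i)
    if (expiry[i] < earliest)
      earliest = expiry[i];

  if (earliest <= now)
    return 0;

  /* Clamp to the range of poll()'s int timeout argument.  */
  long long ms = (long long) (earliest - now) * 1000;
  return ms > INT_MAX ? INT_MAX : (int) ms;
}
```
</pre>]]>

&lt;p&gt;A main loop would pass the result straight to &lt;tt&gt;poll&lt;/tt&gt; and only run the cleanup when the call times out.&lt;/p&gt;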

&lt;p&gt;nscd is a &lt;b&gt;very&lt;/b&gt; small drop in the bucket, though.  Just look at your machine and examine the running processes and those which are regularly started.  PowerTOP cannot really help here (Arjan said something will be coming soon though).&lt;/p&gt;

&lt;p&gt;There is a tool which can help, though: systemtap.  Simply create a small script which traps the syscalls the violators will use and display process information.  The syscalls to use include: &lt;tt&gt;open&lt;/tt&gt;, &lt;tt&gt;stat&lt;/tt&gt;, &lt;tt&gt;access&lt;/tt&gt;, &lt;tt&gt;poll&lt;/tt&gt;, &lt;tt&gt;epoll&lt;/tt&gt;, &lt;tt&gt;select&lt;/tt&gt;, &lt;tt&gt;nanosleep&lt;/tt&gt;, &lt;tt&gt;futex&lt;/tt&gt;.  For the latter five the problem is small timeout values.&lt;/p&gt;

&lt;p&gt;I&apos;ll post a script to do this soon (just not now).  But the guilty parties probably already know who they are.  Just don&apos;t do this quasi busy waiting!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a program has to react to a file change or removal or creation, use &lt;tt&gt;inotify&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;For internal cleanups, choose reasonable values and then compute the timeout so that you don&apos;t wake up when nothing has to be done.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to see how &lt;b&gt;not&lt;/b&gt; to do it, look at something like the flash player (the proprietary one).  If you inadvertently have started it it&apos;ll remain active (even if no flash page is displayed) and it is basically busy waiting on something.&lt;/p&gt;

&lt;p&gt;Let&apos;s show the proprietary software world we can do better.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/19041.html</comments>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/18882.html</guid>
  <pubDate>Mon, 01 Oct 2007 15:09:36 GMT</pubDate>
  <title>Part 2 released</title>
  <link>http://udrepper.livejournal.com/18882.html</link>
  <description>&lt;p&gt;Jonathan and crew published part 2 of the paper.  If you have an LWN subscription you can read it &lt;a href=&quot;http://lwn.net/Articles/252125/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/18882.html</comments>
  <category>linux programming</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/18555.html</guid>
  <pubDate>Thu, 27 Sep 2007 17:45:53 GMT</pubDate>
  <title>Directory Reading</title>
  <link>http://udrepper.livejournal.com/18555.html</link>
  <description>&lt;p&gt;In the last weeks I have seen far too much code which reads directory content in horribly inefficient ways to let this slide.  Programmers really have to learn to do this efficiently.  Some of the instances I&apos;ve seen are in code which runs frequently.  Frequently as in once per second.  Doing it right can make a huge difference.&lt;/p&gt;

&lt;p&gt;The following is an exemplary piece of code.   Not taken from an actual project but it shows some of the problems quite well, all in one example.  I drop the error handling to make the point clearer.&lt;/p&gt;

&lt;pre&gt;
  DIR *dir = opendir(some_path);
  struct dirent *d;
  struct dirent d_mem;
  while (readdir_r(dir, &amp;d_mem, &amp;d) == 0 &amp;&amp; d != NULL) {
    char path[PATH_MAX];
    snprintf(path, sizeof(path), &quot;%s/%s/somefile&quot;, some_path, d-&amp;gt;d_name);
    int fd = open(path, O_RDONLY);
    if (fd != -1) {
      ... do something ...
      close (fd);
    }
  }
  closedir(dir);
&lt;/pre&gt;

&lt;p&gt;How many things are inefficient at best and outright problematic in some cases?&lt;/p&gt;

&lt;p&gt;Let&apos;s enumerate:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Why use &lt;tt&gt;readdir_r&lt;/tt&gt;?&lt;/li&gt;
&lt;li&gt;Even the use of &lt;tt&gt;readdir&lt;/tt&gt; is dangerous.&lt;/li&gt;
&lt;li&gt;Creating a path string might exceed the &lt;tt&gt;PATH_MAX&lt;/tt&gt; limit.&lt;/li&gt;
&lt;li&gt;Using a path like this is racy.&lt;/li&gt;
&lt;li&gt;What if the directory contains entries which are not directories?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;tt&gt;readdir_r&lt;/tt&gt; is only needed if multiple threads are using the &lt;i&gt;same&lt;/i&gt; directory stream.  I have yet to see a program where this really is the case.  In this toy example the stream (variable &lt;tt&gt;dir&lt;/tt&gt;) is definitely not shared between different threads.  Therefore the use of &lt;tt&gt;readdir&lt;/tt&gt; is just fine.  Should this matter?  Yes, it should, since &lt;tt&gt;readdir_r&lt;/tt&gt; has to copy the data into the buffer provided by the user while &lt;tt&gt;readdir&lt;/tt&gt; has the possibility to avoid that.&lt;/p&gt;

&lt;p&gt;Instead of &lt;tt&gt;readdir&lt;/tt&gt; code should in fact use &lt;tt&gt;readdir64&lt;/tt&gt;.  The definition of the &lt;tt&gt;dirent&lt;/tt&gt; structure comes from an innocent time when hard drives with a couple of dozen MB of capacity were huge.  Things change and we need larger values for inode numbers etc.  Modern (i.e., 64-bit) ABIs do this by default but if the code is supposed to be used on 32-bit machines as well the &lt;tt&gt;*64&lt;/tt&gt; variants should always be used.&lt;/p&gt;

&lt;p&gt;Path length limits are becoming an ever-increasing problem.  Linux, like most Unix implementations, imposes a length limit on each filename string which is passed to a system call.  But this does not mean that in general path names have any length limit.  It just means that longer names have to be implicitly constructed through the use of multiple relative path names.  In the example above, what happens if &lt;tt&gt;some_path&lt;/tt&gt; is already close to &lt;tt&gt;PATH_MAX&lt;/tt&gt; bytes in size?  It means the &lt;tt&gt;snprintf&lt;/tt&gt; call will truncate the output.  This can and should of course be caught but this doesn&apos;t help the program.  It is crippled.&lt;/p&gt;

&lt;p&gt;Any use of filenames with path components (i.e., with one or more slashes in the name) is racy: an attacker can change any of the contained path components.  This can lead to exploits.  In the example, the &lt;tt&gt;some_path&lt;/tt&gt; string itself might be long and traverse multiple directories.  A change in any of these will lead to the &lt;tt&gt;open&lt;/tt&gt; call not reaching the desired file or directory.&lt;/p&gt;

&lt;p&gt;Finally, while the code above works (the &lt;tt&gt;open&lt;/tt&gt; call will fail if &lt;tt&gt;d-&amp;gt;d_name&lt;/tt&gt; does not name a directory) it is anything but efficient.  In fact, the &lt;tt&gt;open&lt;/tt&gt; system calls are quite expensive.  Before any work is done, the kernel has to reserve a file descriptor.  Since file descriptors are a shared resource this requires coordination and synchronization which is expensive.  Synchronization also reduces parallelism, which might be a big issue in some code.  The &lt;tt&gt;open&lt;/tt&gt; call then has to follow the path which also is  not free.&lt;/p&gt;

&lt;p&gt;To make a long story short, here is how the code should look (again, sans error handling):&lt;/p&gt;

&lt;pre&gt;
  DIR *dir = opendir(some_path);
  int dfd = dirfd(dir);
  struct dirent64 *d;
  while ((d = readdir64(dir)) != NULL) {
    if (d-&amp;gt;d_type != DT_DIR &amp;&amp; d-&amp;gt;d_type != DT_UNKNOWN)
      continue;
    char path[PATH_MAX];
    snprintf(path, sizeof(path), &quot;%s/somefile&quot;, d-&amp;gt;d_name);
    int fd = openat(dfd, path, O_RDONLY);
    if (fd != -1) {
      ... do something ...
      close (fd);
    }
  }
  closedir(dir);
&lt;/pre&gt;

&lt;p&gt;This rewrite addresses all the issues.  It uses &lt;tt&gt;readdir64&lt;/tt&gt; which will do just fine in this case and it is safe when it comes to huge disk drives.  It uses the &lt;tt&gt;d_type&lt;/tt&gt; field of the &lt;tt&gt;dirent64&lt;/tt&gt; to check whether we already know the file is no directory.  Most of Linux&apos;s file systems today fill in the &lt;tt&gt;d_type&lt;/tt&gt; field correctly (including all the pseudo filesystems like &lt;tt&gt;sysfs&lt;/tt&gt; and &lt;tt&gt;proc&lt;/tt&gt;).  Those file systems which do not have the information handy fill in &lt;tt&gt;DT_UNKNOWN&lt;/tt&gt; which is why the code above allows this case, too.  In some programs one also might want to allow &lt;tt&gt;DT_LNK&lt;/tt&gt; since a symbolic link might point to a directory.  But often enough this is not the case and not following symlinks is a security measure.&lt;/p&gt;

&lt;p&gt;Finally, the new code uses &lt;tt&gt;openat&lt;/tt&gt; to open the file.  This avoids the lengthy path lookup and it closes most of the races of the original &lt;tt&gt;open&lt;/tt&gt; call since the pathname lookup starts at the directory read by &lt;tt&gt;readdir64&lt;/tt&gt;.  Any change to the filesystem below this directory has no effect on the &lt;tt&gt;openat&lt;/tt&gt; call.  Also, since the generated path is now very short (just the maximum of 256 bytes for &lt;tt&gt;d_name&lt;/tt&gt; plus 10) we know that the buffer &lt;tt&gt;path&lt;/tt&gt; is sufficient.&lt;/p&gt;

&lt;p&gt;It is easy enough to apply these changes to all the places which read directories.  The result will be smaller, faster, and safer code.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/18555.html</comments>
  <category>programming linux</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/18193.html</guid>
  <pubDate>Fri, 21 Sep 2007 20:41:39 GMT</pubDate>
  <title>The Series is Underway</title>
  <link>http://udrepper.livejournal.com/18193.html</link>
  <description>Jon Corbet has edited the first two sections of the document I mentioned earlier &lt;a href=&quot;http://udrepper.livejournal.com/17682.html&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://udrepper.livejournal.com/17280.html&quot;&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The document will be published in multiple installments, beginning with &lt;a href=&quot;http://lwn.net/Articles/250967/&quot;&gt;Sections 1 and 2&lt;/a&gt; which are available now.  Since LWN is a business the reasonable limitation is put in place that for the first week only subscribers have access to it.&lt;br /&gt;&lt;br /&gt;So, get a &lt;a href=&quot;https://lwn.net/subscribe/&quot;&gt;subscription&lt;/a&gt; to LWN.&lt;br /&gt;&lt;br /&gt;If you find mistakes in the text let me know directly, either as a comment here or as a personal mail.  Don&apos;t bother Jon with that.</description>
  <comments>http://udrepper.livejournal.com/18193.html</comments>
  <category>programming</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/18038.html</guid>
  <pubDate>Wed, 19 Sep 2007 21:55:10 GMT</pubDate>
  <title>SHA for crypt</title>
  <link>http://udrepper.livejournal.com/18038.html</link>
  <description>Just a short note: I added SHA support to the Unix crypt implementation in glibc.  The reason for all this (including replies to the extended &quot;NIH&quot; complaints) can be found &lt;a href=&quot;http://people.redhat.com/drepper/sha-crypt.html&quot;&gt;here&lt;/a&gt;.</description>
  <comments>http://udrepper.livejournal.com/18038.html</comments>
  <category>security</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/17682.html</guid>
  <pubDate>Tue, 14 Aug 2007 03:43:33 GMT</pubDate>
  <title>Publishing Update</title>
  <link>http://udrepper.livejournal.com/17682.html</link>
  <description>&lt;p&gt;A few weeks back I asked how I should publish the document on memory and cache handling.  I got quite some feedback.&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;There was the usual &lt;q&gt;it doesn&apos;t matter but I want it for free&lt;/q&gt; crowd.&lt;/li&gt;

&lt;li&gt;Then there was the &lt;q&gt;even $8 for a book is too much for me&lt;/q&gt;.  These are people from outside the US and $8 translated to local currency and income is certainly far too much for many people.  I do not throw this group in with the first.&lt;/li&gt;

&lt;li&gt;Several people (all or mostly US-based) thought the idea of printed paper to be nice.  The price was no issue.&lt;/li&gt;

&lt;li&gt;Most people said a freely available PDF is more important than a printed copy.  Some derogatory comments about lecturers who require books were heard.  Others said editing isn&apos;t important.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Because of this first obnoxious group of people I would probably have gone with a print-only route.  This attitude that just because somebody works on free software he always has to make everything available for free makes me sick.  These are most probably the same people who never in their life produced anything that others found of value or they are the criminals working on (mostly embedded) projects exploiting free software.&lt;/p&gt;

&lt;p&gt;But since I really want the document to be widely distributed and available to places where $8 is too much money I will release the PDF for free.  But this won&apos;t happen right away.  Unlike some of the people making comments I do think that editing is important.  Fortunately having professional editing and a free PDF don&apos;t exclude each other.&lt;/p&gt;

&lt;p&gt;I&apos;ll not go with a publisher (esp not these $%# at O&apos;Reilly, as several people suggested).  This would in most cases have precluded retaining the copyright and making the text available for free.&lt;/p&gt;

&lt;p&gt;Instead the nice people at &lt;a href=&quot;http://lwn.net/&quot;&gt;LWN&lt;/a&gt;, Jonathan Corbet and crew, will edit the document.  They will then serialize it, I guess, along with the weekly edition.  It&apos;s up to Jon to make this decision.  The document has 8 large sections including the introduction which means &lt;i&gt;my&lt;/i&gt; guess is that after 7 installments the whole document is published.  Once this has happened I&apos;ll then make the whole updated and edited PDF available.&lt;/p&gt;

&lt;p&gt;This means if you think it&apos;s worth it, get a subscription to the LWN instead of waiting a week to read it for free.&lt;/p&gt;

&lt;p&gt;So in summary, I get professional editing, keep the copyright, and might be able to help getting some more subscribers for the LWN.  Win, win, win.  If the &lt;q&gt;L&lt;/q&gt; in LWN bothers you I&apos;ve news for you: the document itself is very Linux-centric.&lt;/p&gt;

&lt;p&gt;I haven&apos;t forgotten the printed version.  I&apos;ve read a bit more of the Lulu documentation.  Apparently there is a model where I don&apos;t have to pay anything.  People ordering the book pay a per-copy price and that&apos;s it (apparently with discounts for larger orders).  If I submit it in letter/A4 format I don&apos;t have to do any reformatting and the price is less (for the color print) since there are fewer pages.&lt;/p&gt;

&lt;p&gt;I&apos;ll probably try to do this after the PDF is freely available.  People who like to have something in their hands will have their wishes.  The only problem I see right now is that Lulu has a stupid requirement that the PDF documents must be generated with proprietary tools from Adobe.  Of course I don&apos;t do this, I use pdfTeX.  If this proves to be the case I guess I&apos;ll have to have a word with Bob Young...&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/17682.html</comments>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/17577.html</guid>
  <pubDate>Mon, 13 Aug 2007 23:52:58 GMT</pubDate>
  <title>Increasing Virtualization Insanity</title>
  <link>http://udrepper.livejournal.com/17577.html</link>
  <description>&lt;p&gt;People are starting to realize how broken the Xen model is with its privileged Dom0 domain.  But the actions they want to take are simply ridiculous: they want to add the drivers back into the hypervisor.  There are many technical reasons why this is a terrible idea.  You&apos;d have to add (back, mind you, Xen before version 2 did this) all the PCI handling and lots of other lowlevel code which is now maintained as part of the Linux kernel.  This would of course play nicely into Xensource&apos;s (the company) pocket.   Their technical people so far turn this down but I have no faith in this group: sooner or later they want to be independent of OS vendors and have their own mini-OS in the hypervisor.  Adios remaining few advantages of the hypervisor model. But this is of course also the direction of VMWare who loudly proclaim that in the future we won&apos;t have OS as they exist today. Instead only domains with mini-OS which are ideally only hooks into the hypervisor OS where single applications run.&lt;/p&gt;

&lt;p&gt;I hope everybody realizes the insanity of this:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;If they really mean single application this must also mean single-process.  If not, you&apos;ll have to implement an OS which can provide multi-process services.  But this means that you either have no support to create processes or you rely on a mini-OS which is a front for the hypervisor.  In VMWare&apos;s case this is some proprietary mini-OS and I imagine Xensource would like to do the very same.&lt;/li&gt;

&lt;li&gt;Imagine that you have such application domains.  All nicely separated because replicated.  The result is a maintenance nightmare.  What if a component which is needed in all application domains has to be updated?  In a traditional system you update the one instance per machine/domain.  With application domains you have to update every single one and not forget one.&lt;/li&gt;

&lt;/ul&gt;

And worst of all:

&lt;ul&gt;

&lt;li&gt;Don&apos;t people realize that this is the KVM model just implemented &lt;b&gt;much&lt;/b&gt; poorer and more proprietary?  If you invite drivers and all the infrastructure into the hypervisor it is not small enough anymore to have a complete code review.  I.e., you end up with a full OS which is too large for that.  Why not use one which already works: Linux. &lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I fear I have to repeat myself over and over again until the last person recognizes that the hypervisor model does not work for the type of virtualization for commodity hardware we try to achieve.  Using a hypervisor was simply the first idea which popped into people&apos;s heads since it was already done before in quite different environments.  The change from Xen v1 to v2 should have shown how rotten the model is.  Only when you take a step back can you see the whole picture and realize the KVM model is not only better, it&apos;s the only logical choice.&lt;/p&gt;

&lt;p&gt;I know people have invested in Xen and that KVM is not there yet but a) there has been a lot of progress in KVM-land and b) the performance is constantly improving and especially with next year&apos;s processor updates hardware virtualization costs will go down even further.&lt;/p&gt;

&lt;p&gt;For sysadmin types this means: do what you have to do with Xen for now.  But keep the investments small. For developers this means: don&apos;t let yourself be tied to a platform.  Use an abstraction layer such as libvirt to bridge over the differences.  For architects this means: don&apos;t look to Xen for answers, base your new designs on KVM.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/17577.html</comments>
  <category>virtualization</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/17280.html</guid>
  <pubDate>Mon, 25 Jun 2007 17:08:04 GMT</pubDate>
  <title>How to publish?</title>
  <link>http://udrepper.livejournal.com/17280.html</link>
  <description>&lt;p&gt;That is meant as a question to the readers.  The problem I have right now is that I have more or less finished the paper accompanying one of the talks I gave at the Red Hat Summit in Nashville last year.  The slides for the talk about &lt;a href=&quot;http://people.redhat.com/drepper/cpucache-slides.pdf&quot;&gt;CPU Caches&lt;/a&gt; are available.  But quite honestly, like most slide sets, they don&apos;t do the topic any justice.  I had to compress things to &amp;lt; 45 mins which is of course not enough.  The paper covers everything I can currently think of and which makes sense with relation to CPU caches and CPU memory, as far as programmers are concerned (nothing for hardware people).  The title I currently use is&lt;/p&gt;

&lt;blockquote&gt;What Every Programmer Should Know About Memory&lt;/blockquote&gt;

&lt;p&gt;and I think this is adequate.&lt;/p&gt;

&lt;p&gt;For this reason I usually write a paper on the important topics I talk about.  And this topic qualifies.  I consider the topic especially important since it&apos;s almost never treated in the software world &lt;b&gt;at all&lt;/b&gt;.  College grads today in most cases have not the slightest clue about this topic.  Ideally I&apos;d like the paper be picked up by some lecturers (like they do for many of my other publications) and use it in a course.  Heck, I&apos;m even willing to teach it myself if that is what it takes to get credibility.&lt;/p&gt;

&lt;p&gt;The problem I&apos;m facing is that the document is (using my usual paper style, two column etc) around 100 densely packed pages long.  Some of the people I&apos;ve shown it to suggested that it should rather be published as a book.  I&apos;m a bit unsure about this.  I have a few publishers who have been pestering me for a long time about writing something for them (some even prematurely submitted titles to distributors!).  One I talked to would be willing to print it even though it&apos;s thin for a book.  But there are a lot of pluses and minuses all around:&lt;/p&gt;

&lt;dl&gt;
  &lt;dt&gt;My PDF only&lt;/dt&gt;

  &lt;dd&gt;Going this route means the document is easy to change and extend.  The format is exactly as I want it.  The visibility is restricted, not in the print market.  No professional review.  Due to the size (and use of color) it is hard to print.&lt;/dd&gt;

  &lt;dt&gt;Go with a publisher&lt;/dt&gt;

  &lt;dd&gt;Professional editing, maybe a college edition, visibility through listing in catalogs etc.  Additionally available as e-book.  But it likely means the color has to go (printing in color is expensive) and there will be no free-of-charge copy.  Getting a revision out will be almost impossible.&lt;/dd&gt;

  &lt;dt&gt;Go with Lulu&lt;/dt&gt;

  &lt;dd&gt;The alternative publishing route: I could submit an appropriately formatted PDF to Lulu and have them publish it.  Demand printing, ISBN available.  B&amp;W and color printing possible.  Even e-books if anybody cares.  No professional editing.&lt;/dd&gt;

&lt;/dl&gt;

&lt;p&gt;Going with Lulu has the advantages I want but it&apos;s quite an effort.  And there are costs associated with it.  I do not plan to make money out of all this but I&apos;d have to recover the costs.  Excess gains would probably go to charity (in my case this is the &lt;a href=&quot;http://www.mbayaq.org/&quot;&gt;Monterey Bay Aquarium&lt;/a&gt; in case anybody is interested).&lt;/p&gt;

&lt;p&gt;So, the questions I have and would like to get some feedback on are:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;Are printed copies wanted at all?  Especially for those teaching, is it a prerequisite?&lt;/li&gt;

&lt;li&gt;If yes, do you prefer a professional, more expensive book?&lt;/li&gt;

&lt;li&gt;Or perhaps an amateur-ish publication which is either B&amp;W and cheap (I guess not much more than $10)...&lt;/li&gt;

&lt;li&gt;... or a colored print for around $30.  The paper has currently around 60 diagrams and color helps.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If you have an opinion, send a mail or add a comment to the blog (which won&apos;t be published).  I know it is not easy to answer given that you haven&apos;t seen the material.  But this is the same for most books, isn&apos;t it?  Look at the slides and assume 100 times more details.  I doubt I&apos;ll find many people who know all these details now (I had to do research myself).&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/17280.html</comments>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/17109.html</guid>
  <pubDate>Fri, 01 Jun 2007 13:53:06 GMT</pubDate>
  <title>grep and color</title>
  <link>http://udrepper.livejournal.com/17109.html</link>
  <description>&lt;p&gt;I cannot believe there are still people who are surprised when they see me working with the command line on my machine, or when I tell them that the output of grep can use highlighting.  Just add &lt;tt&gt;--color&lt;/tt&gt; to the command line (with the optional argument just like ls).  I implemented that more than six years ago.  In my &lt;tt&gt;.bashrc&lt;/tt&gt; I have the following:&lt;/p&gt;

&lt;pre&gt;
alias egrep=&apos;egrep --color=tty -d skip&apos;
alias egrpe=&apos;egrep --color=tty -d skip&apos;
alias fgrep=&apos;fgrep --color=tty -d skip&apos;
alias fgrpe=&apos;fgrep --color=tty -d skip&apos;
alias grep=&apos;grep --color=tty -d skip&apos;
alias grpe=&apos;grep --color=tty -d skip&apos;
&lt;/pre&gt;

&lt;p&gt;Yes, I mistype &lt;tt&gt;grep&lt;/tt&gt; often enough to warrant the extra aliases.  Using &lt;tt&gt;tty&lt;/tt&gt; as the color mode means that if I pipe the output into another program there won&apos;t be any color escape sequences added which could irritate those programs.&lt;/p&gt;

&lt;p&gt;Just make your life easier and add such aliases, too.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/17109.html</comments>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/16844.html</guid>
  <pubDate>Tue, 22 May 2007 18:46:46 GMT</pubDate>
  <title>pthread_t and similar types</title>
  <link>http://udrepper.livejournal.com/16844.html</link>
  <description>&lt;p&gt;Constantly people complain that the runtime does not catch their mistakes.  They are hiding behind this requirement in the POSIX specification (for &lt;tt&gt;pthread_join&lt;/tt&gt; in this case, also applies to &lt;tt&gt;pthread_kill&lt;/tt&gt; and similar functions):&lt;/p&gt;

&lt;pre&gt;
       The pthread_join() function shall fail if:
       [...]

       ESRCH  No thread could be found corresponding to that specified by the given thread ID.
&lt;/pre&gt;

&lt;p&gt;The glibc implementation follows this requirement to the letter.  *IFF* we can detect that the thread descriptor is invalid we do return &lt;tt&gt;ESRCH&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;But&lt;/b&gt;: the above does not mean that all uses of invalid thread descriptors must result in &lt;tt&gt;ESRCH&lt;/tt&gt; errors.  The reason is simple: the standard does not restrict the implementation in any way in the definition of the type &lt;tt&gt;pthread_t&lt;/tt&gt;.  It does not even have to be an arithmetic type.  This means it is valid to use a pointer type and this is just what NPTL does.&lt;/p&gt;

&lt;p&gt;Nobody argues that functions like &lt;tt&gt;strcpy&lt;/tt&gt; should not dump a core in case the buffer is invalid.  The same for &lt;tt&gt;pthread_attr_t&lt;/tt&gt; references passed to &lt;tt&gt;pthread_attr_init&lt;/tt&gt; etc.  The use of &lt;tt&gt;pthread_t&lt;/tt&gt; when defined as a pointer is no different.  The only complication is in the understanding that &lt;tt&gt;pthread_t&lt;/tt&gt; can be a pointer type.  This is obvious for &lt;tt&gt;void*&lt;/tt&gt; etc.&lt;/p&gt;

&lt;p&gt;In the POSIX committee we discussed several times changing the &lt;tt&gt;pthread_join&lt;/tt&gt; and &lt;tt&gt;pthread_kill&lt;/tt&gt; man pages.  The &lt;tt&gt;ESRCH&lt;/tt&gt; errors could be marked as &lt;q&gt;may fail&lt;/q&gt;.  But&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;this really is not necessary, see above.&lt;/li&gt;

&lt;li&gt;it would mean we have to go through the entire specification and treat every other place where this is an issue the same way.&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;If somebody wants to do the work associated with the second step above and we have confidence in the results, we (= Austin Group) might make the change at some later date.  But it is a rather high risk for no real gain.  Programmers have to educate themselves anyway.&lt;/p&gt;

&lt;p&gt;What remains is the question: how can programs avoid these mistakes?  It is actually pretty simple: the program should make sure that no calls to &lt;tt&gt;pthread_kill&lt;/tt&gt;, for instance, can happen when the thread is exiting.  One way to solve this problem is:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;Associate a variable &lt;tt&gt;running&lt;/tt&gt; of some sort and a mutex with each thread.&lt;/li&gt;

&lt;li&gt;In the function started by &lt;tt&gt;pthread_create&lt;/tt&gt; (the thread function) set &lt;tt&gt;running&lt;/tt&gt; to true.&lt;/li&gt;

&lt;li&gt;Before returning from the thread function or calling &lt;tt&gt;pthread_exit&lt;/tt&gt; or in a cancellation handler acquire the mutex, set &lt;tt&gt;running&lt;/tt&gt; to false, unlock the mutex, and proceed.&lt;/li&gt;

&lt;li&gt;Any thread trying to use &lt;tt&gt;pthread_kill&lt;/tt&gt; etc first must get the mutex for the target thread, if &lt;tt&gt;running&lt;/tt&gt; is true call &lt;tt&gt;pthread_kill&lt;/tt&gt;, and finally unlock the mutex.&lt;/li&gt;

&lt;/ol&gt;
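
&lt;p&gt;The steps above can be sketched roughly as follows.  This is only an illustration under a few assumptions, not code from glibc: &lt;tt&gt;struct tracked_thread&lt;/tt&gt;, &lt;tt&gt;tracked_init&lt;/tt&gt;, &lt;tt&gt;thread_func&lt;/tt&gt;, and &lt;tt&gt;safe_kill&lt;/tt&gt; are made-up names, the thread records its own ID with &lt;tt&gt;pthread_self&lt;/tt&gt;, and error handling for the mutex calls is left out.&lt;/p&gt;

&lt;pre&gt;
#define _POSIX_C_SOURCE 200112L
#include &amp;lt;pthread.h&amp;gt;
#include &amp;lt;signal.h&amp;gt;
#include &amp;lt;stdbool.h&amp;gt;

/* Step 1: hypothetical per-thread bookkeeping.  */
struct tracked_thread {
  pthread_t tid;          /* only meaningful while running is true */
  pthread_mutex_t lock;
  bool running;
};

/* Call once before the structure is handed to pthread_create.  */
int tracked_init (struct tracked_thread *t)
{
  t-&amp;gt;running = false;
  return pthread_mutex_init (&amp;amp;t-&amp;gt;lock, NULL);
}

/* Thread function; arg points to the tracked_thread of this thread.  */
void *thread_func (void *arg)
{
  struct tracked_thread *t = arg;

  /* Step 2: announce that the thread can be signalled.  */
  pthread_mutex_lock (&amp;amp;t-&amp;gt;lock);
  t-&amp;gt;tid = pthread_self ();
  t-&amp;gt;running = true;
  pthread_mutex_unlock (&amp;amp;t-&amp;gt;lock);

  /* ... the real work goes here ... */

  /* Step 3: the last thing done before returning.  */
  pthread_mutex_lock (&amp;amp;t-&amp;gt;lock);
  t-&amp;gt;running = false;
  pthread_mutex_unlock (&amp;amp;t-&amp;gt;lock);
  return NULL;
}

/* Step 4: returns -1 if no signal was sent because the thread is not
   marked as running, otherwise the result of pthread_kill.  */
int safe_kill (struct tracked_thread *t, int sig)
{
  int result = -1;
  pthread_mutex_lock (&amp;amp;t-&amp;gt;lock);
  if (t-&amp;gt;running)
    result = pthread_kill (t-&amp;gt;tid, sig);
  pthread_mutex_unlock (&amp;amp;t-&amp;gt;lock);
  return result;
}
&lt;/pre&gt;

&lt;p&gt;A caller would run &lt;tt&gt;tracked_init&lt;/tt&gt;, pass the structure as the argument to &lt;tt&gt;pthread_create&lt;/tt&gt;, and use &lt;tt&gt;safe_kill&lt;/tt&gt; wherever it would otherwise call &lt;tt&gt;pthread_kill&lt;/tt&gt; directly.  A thread which can call &lt;tt&gt;pthread_exit&lt;/tt&gt; or be cancelled would have to clear the flag in a cleanup handler in the same way.&lt;/p&gt;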

&lt;p&gt;This ensures that no invalid descriptor is used.  But I can already hear people complain:&lt;/p&gt;

&lt;blockquote&gt;This is too expensive!&lt;/blockquote&gt;

&lt;p&gt;That is ridiculous.  The implementation would have to do something similar if it tried to catch bad thread descriptors.  In fact, it would have to do more.  What is important is to recognize that this price would have to be paid by &lt;i&gt;every&lt;/i&gt; program, not just the buggy ones.  This is wrong.  Only those people who need this extra protection should pay the price.&lt;/p&gt;

&lt;blockquote&gt;But I don&apos;t have control over the code calling &lt;tt&gt;pthread_create&lt;/tt&gt;!&lt;/blockquote&gt;

&lt;p&gt;Boo hoo, cry me a river.  Don&apos;t expect sympathy for using proprietary software.  I will never allow good free software to be shackled because of proprietary code.  If you cannot get this changed in the code you pay good money for, this just means it is time to find a new supplier or, even better, use free software.&lt;/p&gt;

&lt;p&gt;In summary, this is entirely a problem of the programs which experience them.  Existing Linux systems are proof that it is possible to write complex programs without requiring the implementation to help incompetent programmers.  We will have a few more words in the next revision of the POSIX specification which talk about this issue.  But I expect they will be ignored anyway and all focus remains on the &lt;q&gt;shall fail&lt;/q&gt; errors of &lt;tt&gt;pthread_kill&lt;/tt&gt; etc.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/16844.html</comments>
  <category>programming posix</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/16394.html</guid>
  <pubDate>Sat, 12 May 2007 17:49:32 GMT</pubDate>
  <title>The Growing Importance of Parallel Programming</title>
  <link>http://udrepper.livejournal.com/16394.html</link>
  <description>At the 2007 Red Hat Summit in San Diego, which just wrapped up yesterday, I gave a talk about parallel programming which the marketing folks retitled &lt;q&gt;Programming for tomorrow&apos;s high speed processors, today&lt;/q&gt;.&lt;br /&gt;&lt;br /&gt;The crux of the talk is that programmers in the future cannot always rely on improving hardware to make their programs run faster.  This is summarized nicely in the following graph which I generated from performance data for x86 processors.&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;http://people.redhat.com/drepper/coreperformance.png&quot; /&gt;&lt;br /&gt;&lt;br /&gt;The crucial part is the divergence of the two lines going forward and the flattening of the blue line.  This means programs which are not able to take advantage of ever increasing numbers of processing cores simply won&apos;t run (much) faster.&lt;br /&gt;&lt;br /&gt;Parallel programming is hard.  There are algorithms to change to allow more than one thread in parallel.  Well, not necessarily thread, especially on Linux one should use processes if the sharing requirement between the processes makes this feasible.&lt;br /&gt;&lt;br /&gt;There are data structures to lay out correctly to allow a) vectorization and b) data parallelization.  Vectorization is important if one wants to come even close to the peak performance listed for the processor.  But when you do this you also have to know a lot about CPU design (pipelines etc), caches, and memory.&lt;br /&gt;&lt;br /&gt;And then there is something people might have heard about but didn&apos;t really register: co-processors are back.  Intel&apos;s Geneseo and AMD&apos;s Torrenza are technologies to couple 3rd party processors tightly to the existing processor-memory mash.&lt;br /&gt;&lt;br /&gt;In general I think the industry is entirely ill-prepared for these upcoming changes.  Many/most programmers are not able to write code with these requirements.  
Companies and other organizations will have to invest in education.  The system providers (like Red Hat) have to find ways to make parallel programming easier.&lt;br /&gt;&lt;br /&gt;One big step in the right direction is OpenMP.  It is officially supported in gcc 4.2, and Red Hat has backported the changes to our gcc 4.1 used in RHEL5 and Fedora Core 6 and later.  Not only does OpenMP allow relatively easy conversion of existing code, it also frees the programmer from dealing with all the details of thread lifetime handling, thread stacks, etc.  Even mutual exclusion happens at a higher level.  All this is good.  It will make programmers more productive if only it is used more often.&lt;br /&gt;&lt;br /&gt;But there is one more thing: the OpenMP runtime is basically in complete control.  It can decide on using just one thread or many threads.  It can decide where to run threads and many more things.  All these details are hidden from the programmer.  This is a good thing since it allows the runtime to perform optimizations.  I&apos;ll have more about this at a later date.&lt;br /&gt;&lt;br /&gt;In summary, programmers have to learn about parallelism, whether they are re-learning it or meeting it for the first time.  I think the topic of this talk is very important.  If you are a Red Hat customer you could potentially ask for somebody from Red Hat to come in and talk about these issues.  I&apos;ll give the slides and the details to our consulting organization and possibly also sales engineers.  I cannot make any promises but I&apos;ll encourage those gals and guys to be willing to talk about this.  If you&apos;re a big enough customer and you demand it, I might (have to) come out myself, if this is wanted.  Or somebody can organize gatherings in places I have to go to anyway and have me speak there.</description>
  <comments>http://udrepper.livejournal.com/16394.html</comments>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/16362.html</guid>
  <pubDate>Sat, 12 May 2007 17:04:35 GMT</pubDate>
  <title>nscd and DNS TTL</title>
  <link>http://udrepper.livejournal.com/16362.html</link>
  <description>Recently some people spread their non-existing knowledge about nscd (Name Service Cache Daemon) by claiming it ignores the TTL (time-to-live) value a DNS server returns.  As far as I know this rampant ignorance is especially wide-spread in the Ubuntu world.  They claim that for this reason one has to run a local, caching DNS server.  This is complete nonsense.  nscd has handled TTL for a long time now (committed to the public CVS on 2004-9-15).  All reasonable requests are handled, i.e., all &lt;tt&gt;getaddrinfo&lt;/tt&gt; requests.&lt;br /&gt;&lt;br /&gt;As I have pointed out many times before (&lt;a href=&quot;http://udrepper.livejournal.com/16116.html&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://people.redhat.com/drepper/userapi-ipv6.html&quot;&gt;here&lt;/a&gt; and in other places), it is completely unacceptable today to use &lt;tt&gt;gethostbyname&lt;/tt&gt; etc.  These functions simply don&apos;t work.  Which is why I found it unnecessary to make the implementation of nscd more complicated and add more compatibility and maintenance problems just to fix one of the many problems these interfaces have.  Just don&apos;t use them and convert all your programs (e.g., I think we&apos;ve done just that for all of RHEL and Fedora nowadays).  Also don&apos;t use&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
  getent hosts some.host
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;You have to use&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
  getent ahosts some.host
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;For all &lt;tt&gt;getaddrinfo&lt;/tt&gt; lookups the TTL value from DNS replies takes precedence over the TTL value from &lt;tt&gt;/etc/nscd.conf&lt;/tt&gt;.  The latter is used for services which do not provide a TTL themselves (today all other services).</description>
  <comments>http://udrepper.livejournal.com/16362.html</comments>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/16116.html</guid>
  <pubDate>Wed, 07 Mar 2007 09:11:14 GMT</pubDate>
  <title>getaddrinfo is not just for IPv6</title>
  <link>http://udrepper.livejournal.com/16116.html</link>
  <description>&lt;p&gt;I&apos;ve heard far too often that &lt;tt&gt;getaddrinfo&lt;/tt&gt; is only interesting for IPv6 and therefore can be ignored since one does not have IPv6.&lt;/p&gt;

&lt;p&gt;Aside from the fact that all programs should be protocol independent this statement is bogus.  &lt;tt&gt;gethostbyname&lt;/tt&gt; etc do not perform correctly in some situations where only IPv4 is ever involved.&lt;/p&gt;

&lt;p&gt;Assume you have an internal IPv4 network with, say, 192.168.x.y addresses.  In addition you have a server (web server, for instance) which is also visible on the Internet.  This server has two addresses: one 192.168.x.y address and one global address.  The client is a NATed machine on the intranet.&lt;/p&gt;

&lt;p&gt;Now what happens if the nameserver returns both addresses to a query for the addresses of said server?  With &lt;tt&gt;gethostbyname&lt;/tt&gt; the addresses are returned to the caller in the order they are received from the DNS server.  Maybe some randomization is applied.  In short, it is possible that the internal machine sees the public IPv4 address first and then connects to it.  This is not only wasteful (the request has to be routed through a switch), it might even be dangerous (the traffic might actually have to go through the Internet).&lt;/p&gt;

&lt;p&gt;With &lt;tt&gt;getaddrinfo&lt;/tt&gt; this is not the case.  The sorting according to RFC 3484 makes sure that the internal address of the server is returned first.  The sorting function will notice that the source address used on the client is also an internal address and therefore the internal address of the server is a better match than the global address.&lt;/p&gt;
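
&lt;p&gt;A program benefits from this sorting simply by trying the results in the order in which &lt;tt&gt;getaddrinfo&lt;/tt&gt; returns them.  The following is a minimal sketch of such a client; the function name &lt;tt&gt;connect_to&lt;/tt&gt; is made up for illustration and the error reporting is deliberately crude.&lt;/p&gt;

&lt;pre&gt;
#define _POSIX_C_SOURCE 200112L
#include &amp;lt;netdb.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/* Connect to host:service, trying the addresses in the order
   getaddrinfo returned them.  Returns a connected socket or -1.  */
int connect_to (const char *host, const char *service)
{
  struct addrinfo hints;
  struct addrinfo *result;
  struct addrinfo *ai;
  int fd = -1;

  memset (&amp;amp;hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whatever is found */
  hints.ai_socktype = SOCK_STREAM;

  int err = getaddrinfo (host, service, &amp;amp;hints, &amp;amp;result);
  if (err != 0)
    {
      fprintf (stderr, &quot;getaddrinfo: %s\n&quot;, gai_strerror (err));
      return -1;
    }

  /* The list is already sorted; the first address which works wins.  */
  for (ai = result; ai != NULL; ai = ai-&amp;gt;ai_next)
    {
      fd = socket (ai-&amp;gt;ai_family, ai-&amp;gt;ai_socktype, ai-&amp;gt;ai_protocol);
      if (fd == -1)
        continue;
      if (connect (fd, ai-&amp;gt;ai_addr, ai-&amp;gt;ai_addrlen) == 0)
        break;
      close (fd);
      fd = -1;
    }

  freeaddrinfo (result);
  return fd;
}
&lt;/pre&gt;

&lt;p&gt;Nothing in this code mentions IPv4 or IPv6, and it does not reorder the list itself.&lt;/p&gt;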

&lt;p&gt;In summary, &lt;tt&gt;getaddrinfo&lt;/tt&gt; is not only about IPv6.  The old interfaces were simply completely inadequate and should never be used.  If you &lt;i&gt;still&lt;/i&gt; haven&apos;t converted your programs to use &lt;tt&gt;getaddrinfo&lt;/tt&gt; instead of &lt;tt&gt;gethostbyname&lt;/tt&gt; and &lt;tt&gt;gethostbyname2&lt;/tt&gt; do it now.  Some time ago I wrote a brief &lt;a href=&quot;http://people.redhat.com/drepper/userapi-ipv6.html&quot;&gt;intro&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/16116.html</comments>
  <category>programming</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/15795.html</guid>
  <pubDate>Tue, 27 Feb 2007 06:07:20 GMT</pubDate>
  <title>Xensource/VMWare start sandbagging</title>
  <link>http://udrepper.livejournal.com/15795.html</link>
  <description>&lt;p&gt;With KVM proving more and more that it is viable, Xensource and VMWare start sandbagging.  They call KVM &lt;q&gt;immature&lt;/q&gt; and the wrong approach (see their quotes in the &lt;a href=&quot;http://news.com.com/KVM+steals+virtualization+spotlight/2100-7344_3-6161804.html&quot;&gt;CNET&lt;/a&gt; article).&lt;/p&gt;

&lt;p&gt;Calling KVM immature is, well, premature and misleading.  Xen has a headstart of several years.  KVM is today not supposed to be in the state Xen is.  Nevertheless, KVM already has hardware virt support, SMP support, support for 64-bit host and guests (despite what the article says), live migration, and more.  Xen simply started from the other direction, namely para-virt; hardware virt took them a long time and a lot of help from the hardware vendors.  I think para-virt will be done RSN.&lt;/p&gt;

&lt;p&gt;But &lt;q&gt;immature&lt;/q&gt; is not the worst complaint.  Claiming the hypervisor approach is the only viable option is what should get people worked up.  Look at the arguments:&lt;/p&gt;

&lt;blockquote&gt;[...] but hypervisors offer better performance, have security advantages, and juggle the competing needs of multiple virtual machines better [...]&lt;/blockquote&gt;

&lt;blockquote&gt;In order to [deliver Virtual Infrastructure], you need the separate hypervisor layer.&lt;/blockquote&gt;

&lt;p&gt;These are bogus claims.  And you have to realize where they come from.  VMWare’s ESX is a kernel in itself, one which only few people work on (compared to something like Linux).  Device drivers will always be a nightmare unless/until domains get their own PCI devices (once DMA can be virtualized).  Nevertheless, ESX is a full OS by itself.  Plus, ESX has the service console, a Linux OS.  The service console of course has to have some control over the hypervisor.&lt;/p&gt;

&lt;p&gt;For Xen the situation is similar.  Here the hypervisor, after the mistakes of the 1.x series, doesn’t have device drivers included and uses a privileged domain, a complete OS.&lt;/p&gt;

&lt;p&gt;This means, both Xen and VMWare do not have less code.  I’d say they even have more code that is part of the privileged code base.  Certainly a Linux installation hosting KVM domains can be scaled down to only have the kernel, kqemu, and the service console.&lt;/p&gt;

&lt;p&gt;As for specific security support, Xen has in theory shype or whatever will come out of it.  Like SELinux, it’s based on Flask.  But it still is a separate code base.  And if shype is actually moved out of the hypervisor itself and into a separate domain you have even more interfaces to worry about.  I haven’t seen any security features of this caliber even mentioned for VMWare.  With KVM, the SELinux policy governing the kernel can also handle the KVM module.  It’s after all part of the same kernel.  One implementation, one policy.&lt;/p&gt;

&lt;p&gt;As for performance, let’s wait until KVM actually has been optimized.  Ingo did some work on a para-virt network driver and the results are simply great.  It’s just that performance tuning hasn’t been a focus.  In theory there is absolutely no reason why the KVM approach should be any slower.&lt;/p&gt;

&lt;p&gt;As for better scheduling with a hypervisor: that can only be a joke.  Especially for Xen, the privileged domain (Dom0) has to be scheduled without the hypervisor having any insight into the Dom0 kernel.  How can this be better?  For VMWare we have a simple-minded OS serving as the hypervisor.  The Linux kernel has support for all kinds of situations, including NUMA machines, many processor machines, HT and multi-core processors etc.  And it’s an O(1) scheduler which sooner or later will make a difference even for hypervisors.&lt;/p&gt;

&lt;p&gt;And then there are the advantages the KVM solution has.  For instance, ever tried to run Xen on a laptop while on battery?  It’s almost not worth it since power management does not exist.  The machine will always run at full power.  VMWare has the same problem. This is not only an issue for laptops.  Cooling is a major issue in data centers.  Maybe even a bigger issue with increasing density.&lt;/p&gt;

&lt;p&gt;NUMA has already been mentioned, but there is also the memory allocation issue as part of the problem.  Xen has nothing of it, I bet VMWare neither or something simple.  The Linux kernel can provide KVM with all kinds of support, as the performance on big NUMA machines like SGI’s Altix shows.&lt;/p&gt;

&lt;p&gt;In short: neither Xen nor VMWare has any real advantages which cannot be surmounted by giving KVM more time to catch up, i.e., granting it the same time to develop the features.  On the other hand there are device driver issues which VMWare will never be able to master.  Xen is not included in the mainstream kernel and even with the paravirt_ops interface will be lagging because it needs synchronization.&lt;/p&gt;

&lt;p&gt;So why do these companies (and Xensource makes this statement as a company) make such statements?  The answer should not be surprising: they have a lot or all to lose.  KVM can be the one-in-all solution, unlike any of the others.  Xensource and VMWare want to get on your system by providing a hypervisor which then can be used with all kinds of OSes.  But: they are in ultimate control.  The idea that there suddenly is a virtualization solution which does not need any hypervisor must be absolutely frightening to them.  So, they try to suppress this new technology from the start.&lt;/p&gt;

&lt;p&gt;Don’t believe the propaganda.  Try KVM once Fedora 7 is out.  I expect it to be updated over the lifetime of Fedora 7.  Or for the more adventurous people, start using rawhide now and keep using it.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/15795.html</comments>
  <category>virtualization</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/15543.html</guid>
  <pubDate>Thu, 22 Feb 2007 21:27:11 GMT</pubDate>
  <title>DST Panic</title>
  <link>http://udrepper.livejournal.com/15543.html</link>
  <description>&lt;p&gt;With the DST rule changes for the US going into effect real soon (2007-3-11) people are panicking.&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;there are those who run completely obsolete OSes.  People contact us for support of RHL9 or even RHL7.2 (that&apos;s the predecessor of Fedora, for those who don&apos;t know).  Guess what, even FC4 is not supported anymore, let alone anything earlier.  The DST change is just one of many good reasons to update.  Security is the other big one.&lt;/li&gt;

&lt;li&gt;many applications are broken and they use private timezone data.  In their quest to achieve &lt;q&gt;perfect&lt;/q&gt; portability the people writing the Java runtime added the data into their sources.  And unfortunately the same has been done for libgcj.  Only somebody without the slightest clue about the nature of DST rules would do something like this.  Would Sun/BEA/IBM/... be willing to update their JVMs 20 times a year for all DST changes (if they would have that broad support in the first place)?  Of course not.  Only people in countries with stable rules would not think about it.  There probably hasn&apos;t been a day in the life of the JVMs when the data was really accurate and complete.&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;Time zone changes are nothing special.  I guess on average we see about 20 each year, maybe more.  Do you see people panicking 20 times a year?  No, only if the US is involved.  Yes, you can argue that more computers are affected but aren&apos;t the computers which are in countries affected by those changes as important to the people living there?  There are also banks, utilities, etc which need to keep the correct time.&lt;/p&gt;

&lt;p&gt;Having lived here in the US now for quite a few years I think the root of the problem is the same that keeps the US from making progress on other fronts: fear of change and trying to  prevent change through denial.  Another example?  Take the measurement system.  When knowing the metric and the imperial system equally well, who would argue the latter is better?  And it&apos;s not that people don&apos;t know the metric system at all.  There are large numbers of people who serve/d in the military and all these people had to use it in their job.  Every food container also shows grams.  But I&apos;m getting off-topic.&lt;/p&gt;

&lt;p&gt;Fact is, people delay things.  They delay updating their OSes and even delay thinking about the problem.  It might go away on its own and then no time has been wasted.  But guess what: the DST change is coming.&lt;/p&gt;

&lt;p&gt;So, people, get your act together.  Update your OSes.  If you really for some obscure reason cannot do this, update the timezone data (in &lt;tt&gt;/usr/share/zoneinfo&lt;/tt&gt;).  The data we have today is fully compatible, even the extended file format.  Since there is no glibc update coming you could just overwrite the files without fear of reverting the changes inadvertently later.  Update applications with their own timezone data.  There are lots of (especially big) programs which come with their own data.  The timezone data is free to copy by everybody so companies take &lt;q&gt;advantage&lt;/q&gt; of this.  Now you know why I always advised against this.  More likely than not, updates for old versions of these programs are not available anymore.  Make sure you let the companies who produce this kind of shitty product know what you think of them; duplicating timezone data is always bad.  As for JVM: have fun!  Old versions will get no updates and new versions often don&apos;t run on old OS versions.  At least libgcj should now be &lt;a href=&quot;http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01735.html&quot;&gt;fixed for good&lt;/a&gt; due to the work of one of my colleagues.&lt;/p&gt;

&lt;p&gt;Once the timezone data is updated some more steps might be needed.  Many programs will just work.  glibc detects updates to the &lt;tt&gt;/etc/localtime&lt;/tt&gt; file and reloads the data.  Lots of people complained about this in the past and present since it means time operations cause filesystem operations, but it is critical in some situations.  If a process only uses the time functions which do not implicitly call &lt;tt&gt;tzset()&lt;/tt&gt; it must be restarted.  The same is true for processes which have the &lt;tt&gt;TZ&lt;/tt&gt; environment variable set.  In general you cannot know whether a process falls into any of the latter categories.  The safest thing to do is to reboot the machine.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/15543.html</comments>
  <category>dst</category>
  <category>programming</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/15119.html</guid>
  <pubDate>Tue, 20 Feb 2007 23:27:42 GMT</pubDate>
  <title>More array fun</title>
  <link>http://udrepper.livejournal.com/15119.html</link>
  <description>&lt;p&gt;As a continuation of &lt;a href=&quot;http://udrepper.livejournal.com/13851.html&quot;&gt;a previous post&lt;/a&gt;, here&apos;s another thing I frequently stumble across:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
int main(void)
{
  const char s[] = &quot;hello&quot;;
  strcpy (s, &quot;bye&quot;);
  puts (s);
  return 0;
}
&lt;/pre&gt;

&lt;p&gt;Yes, this code will produce a warning.  But it will run.  Slowly, since it does not do what the programmer actually meant.  &lt;tt&gt;s&lt;/tt&gt; is a dynamic variable.  The compiler has to allocate space on the stack (or in TLS) and then copy the string from some static, read-only area into it.  This of course is not only slow and uses memory, it also means that the newly created string is &lt;b&gt;NOT&lt;/b&gt; read-only.  The compiler-generated function prologue has to write to the memory.&lt;/p&gt;

&lt;p&gt;Whenever you write code where you define an array in the scope of a function, always stop and think what the semantics should be.  For all constant arrays it is almost always correct to have exactly one copy (it cannot be changed).  If one copy is needed or OK then don&apos;t forget the &lt;tt&gt;static&lt;/tt&gt;:&lt;/p&gt;

&lt;pre&gt;
  static const char s[] = &quot;hello&quot;;
&lt;/pre&gt;

&lt;p&gt;If you do this, all of a sudden the code will not only produce a warning, it will also crash at runtime since the string is stored in read-only memory and &lt;tt&gt;s&lt;/tt&gt; is now not really a variable anymore, it&apos;s a label for the region in read-only memory.&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/15119.html</comments>
  <category>programming</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/15083.html</guid>
  <pubDate>Tue, 13 Feb 2007 05:09:46 GMT</pubDate>
  <title>But I Have Nothing Of Interest On My Machine</title>
  <link>http://udrepper.livejournal.com/15083.html</link>
  <description>&lt;p&gt;I&apos;m sick and tired of hearing people saying&lt;/p&gt;

&lt;blockquote&gt;I don&apos;t have to secure my machine since I have nothing of interest on it.  Nobody would want to steal anything I have.
&lt;/blockquote&gt;

&lt;p&gt;That&apos;s absolutely not the point.  Yes, some attackers are after personal data like account numbers.  But this is not all:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;passwords are high on the list since people use the same password for all their accounts, be it banks, Amazon, eBay, whatever.  Do you still agree you don&apos;t have anything interesting protected by those passwords?&lt;/li&gt;

&lt;li&gt;if a machine can be taken over it can be used to a) sniff the local network, b) attack other machines, c) send spam.  Some ISPs already stopped being lenient towards idiots who allow this to happen unchecked and they simply suspend the accounts.  Do you care about having an Internet connection?&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Security &lt;b&gt;always&lt;/b&gt; matters even if the data stored on the machine is benign.  Nobody should be allowed to even run machines which have no distinction between user and administrator.  This includes more and more Linux people because new idiot distributions like Linspire, NimbleX, etc pop up.  No machine should be without firewalls, in both directions.  For RHEL/Fedora users it of course doesn&apos;t stop there, we have many more security features and if it would be up to me I would take out the switch to disable them.&lt;/p&gt;

&lt;p&gt;Next time when you see somebody writing nonsense like the above (or hear them talking like this) do me a favor: smack them a bit so that they come to their senses.  These are the people who create the opportunity for spam, phishing, and other illicit activities.  Heck, they deserve more than a bit of smacking...&lt;/p&gt;</description>
  <comments>http://udrepper.livejournal.com/15083.html</comments>
  <category>security</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/14840.html</guid>
  <pubDate>Wed, 07 Feb 2007 08:50:43 GMT</pubDate>
  <title>RSA conference, Day 1 (for me)</title>
  <link>http://udrepper.livejournal.com/14840.html</link>
  <description>I had the podium discussion today (nothing special to report) and so I stayed a bit longer until my ride arrived.  What to do?  The show floor is boring for me; nobody really targets developers.  So join a few sessions.&lt;br /&gt;&lt;br /&gt;The first by Eugene Kaspersky.  Well known name, quite interesting title: &lt;q&gt;The Dark Side of Cybercrime: Details on the Latest Hacker Tactics from Around the World&lt;/q&gt;.  What would you expect when reading this?  I myself expected to actually learn about attack vectors etc since this guy must be exposed to them on a daily basis.&lt;br /&gt;&lt;br /&gt;Well, Mr. Kaspersky didn&apos;t think so.  He spent the first 40-45 minutes on recounting the history of attacks, viruses, worms, trojans, etc.  Some statistics thrown in, some pictures of authors.  Then in the last 5-10 minutes he talks about attacks going on today but still only at the level of &lt;q&gt;there will be phishing attacks, and data theft, and ...&lt;/q&gt;.  And suddenly it was all over.&lt;br /&gt;&lt;br /&gt;If the title promises the &lt;q&gt;latest&lt;/q&gt; tactics, why waste time on ancient history?  When promising &lt;q&gt;details&lt;/q&gt;, why only scratch the surface and throw out a few buzzwords?  This was probably one of the most wasteful hours I&apos;ve spent in a long time.  Heck, I might have enjoyed an HR seminar more than this baloney.&lt;br /&gt;&lt;br /&gt;Still not time to leave, so I go into the podium discussion about &lt;q&gt;Virtualization and Security&lt;/q&gt;.  I was skeptical from the get go.  A panel without anyone who actually works on virtualization technology.  Only &lt;q&gt;security professionals&lt;/q&gt;, i.e., the people who benefit from security problems.  Turns out this discussion is really meant as a big fright fest.  It was an enumeration of additional problems in security, monitoring, auditing when you deploy virtualization.  
Close to the end one of the panelists actually asked (I paraphrase) &lt;q&gt;And who in the audience still considers deploying virtualization after what you heard here today?&lt;/q&gt;&lt;br /&gt;&lt;br /&gt;I&apos;m always willing to accept that there are some new problems.  They mostly concern the introduction of a new code base (hypervisor or the hardware emulation like KQEMU) and the interfaces between it and the VMs.  But many (most?) of the problems they mentioned are home made or are simply problems which exist without virtualization.  For instance, they were complaining about VLANs which are created between the domains so that a single NIC is sufficient for all domains.  Duh!  If this is a problem for you, don&apos;t do it.  Use separate network cards for each domain.  PCI forwarding is there and by the time people actually start deploying Intel will have VT-d in their chips (and AMD whatever they need).  We&apos;ll soon enough have NICs with virtualization support built in (Infiniband already can do this today).  Once this is true I hear them shout &lt;q&gt;but who audits the firmware which implements this&lt;/q&gt; (it&apos;ll indeed be something mostly implemented in firmware).  The answer here is again: do you audit the firmware of the NIC today?  I don&apos;t think so and still it can very well be a security risk.&lt;br /&gt;&lt;br /&gt;I took away from this that the security industry sees virtualization as yet another source of money and full employment.  Yes, you&apos;ll have problems if you do stupid things when deploying virtualization.  But the same is true without virtualization.  I fail to see the difference.  And the panel constantly reminded everybody that no company out there has a person who understands all the problems, front to back, from technical details about virtualization to specific problems of SOA deployments in virtualized environments.  That&apos;s most probably true.  But how is this different from non-virtual deployments?  
I dare a &lt;q&gt;security professional&lt;/q&gt; to step forward and prove s/he knows all this.  Heck, I can think of a gazillion security-relevant details at low levels which are not known except to people who actually work on that code.&lt;br /&gt;&lt;br /&gt;The organizers claim that they try to keep the sessions clear from being marketing sessions.  Mr Kaspersky certainly didn&apos;t manage to do this, my podium discussion obviously couldn&apos;t (it was after all about three specific implementations), and this virtualization session was a big &lt;q&gt;see, we are more than ever relevant&lt;/q&gt; session by the security professionals (with special plugs of the Center for Internet Security).&lt;br /&gt;&lt;br /&gt;What is missing are sessions with actual practical advice for programmers, i.e., to cure the root of all the evil.  My Thursday session is probably one of the very few exceptions.  And the funny thing is: during my podium session people actually made it known that one of the things they like to hear about at conferences is specifically this.&lt;br /&gt;&lt;br /&gt;My opinion thus far: if you are a security professional, CSO, etc, run to San Francisco, don&apos;t walk.  You&apos;ll get plenty of stories you can tell your boss to frighten her/him and give you a large budget and many underlings to have fun with.  You&apos;ll also find people who want to sell you peace of mind and that should be well worth it to you.  After all, you somehow have to spend the money your scared boss throws at you.&lt;br /&gt;&lt;br /&gt;If you actually are interested in fixing the problem, don&apos;t bother.  The organizers don&apos;t either.</description>
  <comments>http://udrepper.livejournal.com/14840.html</comments>
  <category>security</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/14567.html</guid>
  <pubDate>Sun, 04 Feb 2007 10:19:25 GMT</pubDate>
  <title>Security Now! podcast</title>
  <link>http://udrepper.livejournal.com/14567.html</link>
  <description>I happened to listen to a few episodes of the &lt;q&gt;Security Now&lt;/q&gt; podcast, by Leo Laporte and Steve Gibson.  It&apos;s mostly Windows stuff, hence uninteresting technically, but it&apos;s an eye opener nevertheless.  And not in the positive sense.  They, well Steve, often make clueless comments about non-Windows OSes in his attempt to give every OS its fair share.  But this of course backfires when the comments are wrong or misleading.&lt;br /&gt;&lt;br /&gt;But the worst thing I came across so far is in episode 71, called &quot;Securable&quot;.  That&apos;s a program of Steve&apos;s and of no relevance.  But he tried to explain the NX feature of modern x86/x86-64 processors and this is what he said (see the &lt;a href=&quot;http://www.grc.com/sn/SN-071.txt&quot;&gt;transcript&lt;/a&gt;):&lt;br /&gt;&lt;br /&gt;&lt;blockquote cite=&quot;http://www.grc.com/sn/SN-071.txt&quot;&gt;&lt;br /&gt;[...] what this does is essentially it allows the system to stop virtually all buffer overruns.  And that’s big.  I mean, all the security problems that we encounter with incredibly small exception are buffer overrun attacks. [...]&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;This is really what he thinks, he repeats it with different words later on in the show.&lt;br /&gt;&lt;br /&gt;It seems for him &lt;q&gt;buffer overflow&lt;/q&gt; is synonymous with &lt;q&gt;inject code by writing over buffer boundaries and then execute that code in place&lt;/q&gt;.  Everybody who deals with security will laugh about such a definition.  These are the first generation buffer overflows which were exploited.  At least on platforms which are secure.  In Linux we have had for the longest time means to protect against these kinds of attacks, starting from address space randomization to NX emulation.  This does not in any way stop buffer overflows from being a problem.&lt;br /&gt;&lt;br /&gt;Buffer overflows still can be used to redirect program execution.  
Overwriting return addresses for return-to-libc exploits (or other libraries), overwriting function pointers elsewhere, overwriting local variables and changing the direction of execution at branch points.  The list goes on.  These kinds of effects of buffer overruns are &lt;b&gt;not&lt;/b&gt; detected by NX.&lt;br /&gt;&lt;br /&gt;There are two ways I can interpret Steve&apos;s comments:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;On Windows, because it is such a soft target, attackers didn&apos;t have to bother with more sophisticated attacks and they really didn&apos;t happen.  In this situation the attackers will simply adapt and use the attack vectors I described above.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Steve doesn&apos;t know what he&apos;s talking about and he&apos;s doing his listeners a disservice by suggesting they are almost completely safe just because they enable NX.&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;For Steve&apos;s sake, and Leo&apos;s since he would be guilty by association, let&apos;s go with the first possibility.  But all this means that Windows is years, many years, behind the Unix world when it comes to security.  It might be a rude awakening for some people to find that the new features do not cure all problems.&lt;br /&gt;&lt;br /&gt;Yes, MSFT has copied us on many levels and also implemented things like address space randomization and stack canaries.  This will help but only if the features are enabled.  And this is the second eye opener from the show.  Windows has apparently no fine grained control.  This means at the slightest sign of problems the features will be turned off completely.  They mentioned the BIOS and OS control of the NX bit.  Since drivers and many applications are badly written the machines run with the feature turned off.  
One point for an easy sysadmin interface, but -100 points for security.&lt;br /&gt;&lt;br /&gt;I think everybody who hopes that with the (slow) proliferation of MSFT&apos;s new OS release the Internet will be more secure is gravely mistaken.  There are still not enough security features in place and those which are in place will be turned off.  Heck, or they are not even implemented.  Apparently several of the security features are not implemented in the 32-bit version to maintain compatibility.&lt;br /&gt;&lt;br /&gt;This is very, very wrong.  But it&apos;s been MSFT&apos;s goal: don&apos;t piss off the customer even if it&apos;s technically wrong and it causes huge problems for everybody.  I&apos;m a strong advocate of security over backward compatibility if there is a good reason.  But usually it does not come to this because you can strengthen security without compromising backward compatibility.  Case in point: see how we implemented non-executable stacks.  Old programs continue to run while almost all new code automatically gets protected.  And the cases which, automatically again, get flagged as requiring an executable stack got fixed.  It is one of Red Hat&apos;s release requirements that no binary needs stack execution permission.&lt;br /&gt;&lt;br /&gt;One last thing: it&apos;s really amusing to see that x86-64 pick-up (I mean real 64-bit code) is so slow on Windows.  For the last 3 years I haven&apos;t been using any 32-bit machine except my laptop.  This is no isolated case in the Linux world, we are well on the way to make 32-bit obsolete.</description>
  <comments>http://udrepper.livejournal.com/14567.html</comments>
  <category>security</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/14249.html</guid>
  <pubDate>Sat, 03 Feb 2007 08:41:52 GMT</pubDate>
  <title>Lock-Free Datastructures</title>
  <link>http://udrepper.livejournal.com/14249.html</link>
  <description>I looked at an unfortunately quite widely cited paper about lock-free operations:&lt;br /&gt;&lt;br /&gt; &lt;a href=&quot;http://www.grame.fr/pub/fober-JIM2002.pdf&quot;&gt;Lock-Free Techniques for Concurrent Access to Shared Objects&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;q&gt;Shared object&lt;/q&gt; here refers to a data structure.  The paper describes how to use compare-and-exchange to avoid mutual exclusion.  This is the second example on the way to derive a working LIFO implementation:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;lifo-pop (lf: pointer to lifo): pointer to cell
       C1:     loop
       C2:          head = lf-&amp;gt;top                # get the top cell of the lifo
       C3:          if head == NULL
       C4:               return NULL              # LIFO is empty
       C5:          endif
       C6:          next = head-&amp;gt;next             # get the next cell of cell
       C7:          if CAS (&amp;lf-&amp;gt;top, head, next) # try to set the top of the lifo to the next cell
       C8:               break
       C9:          endif
       C10:    endloop
       C11:    return head
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The authors correctly point out that this code has a so-called ABA problem on preemption after C6.  Assume another thread is scheduled and it pops the top element,  pushes one or more other elements, and then pushes the initially popped element again (it&apos;s here only about the memory address, the use can differ, so this actually makes sense).  If then the first thread continues with C7 it will see the CAS operation succeed and it&apos;ll screw up the list.  The solution is a double-word CAS operation as provided by x86 and x86-64.  Hurray!  I guess this is why Intel cites the paper in a few documents, it makes them look good.&lt;br /&gt;&lt;br /&gt;But wait, if they care about preemption as they must, what about preemption between C2 and C6?  In this case the top element (pointed to by &lt;tt&gt;head&lt;/tt&gt;) can be popped and suddenly the pointer reference in C6 can go bad.  This problem is never pointed out and people cite this paper to point out how double-word CAS saves the day.&lt;br /&gt;&lt;br /&gt;Fact is, you have to write a very special, expensive, and limited data structure for the code to work.  The pointer dereference must never fail.  This means the memory used for the LIFO elements must never be freed.  This means equipping the LIFO data structure with its own memory allocator.  A small, specialized one, but an allocator nonetheless.  This is a problem in case the number of elements can grow large since after that none of the elements can be freed again.  At least in general.  An implementation could try to determine that no thread holds a reference to an element and in this case proceed with freeing it.  But the whole point behind lock-free data structures is that they are supposed to be fast.  
Adding what amounts to a simple RCU (Read Copy Update) implementation plus the memory allocator is counterproductive.&lt;br /&gt;&lt;br /&gt;I guess what I mean to say here is:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;lock-free data structures with today&apos;s processor technology are limited to very few uses&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;take the paper above with a huge grain of salt, it&apos;s typical academia work, without relevance to practice&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;For now mutual exclusion is the best solution for most data structures, at least those which are general enough to have to deal with dynamically allocated memory.  This will change in the future, maybe in the not so distant future in fact.</description>
  <comments>http://udrepper.livejournal.com/14249.html</comments>
  <category>programming</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/13851.html</guid>
  <pubDate>Tue, 30 Jan 2007 16:43:47 GMT</pubDate>
  <title>So close, but no cigar</title>
  <link>http://udrepper.livejournal.com/13851.html</link>
  <description>It&apos;s nice to see some people actually look at their DSOs and rewrite them to not be resource hogs.  One recent example is &lt;a href=&quot;http://www.barisione.org/blog.html?p=55&quot;&gt;this PCRE&lt;/a&gt; code and the optimization done by one Marco Barisione who should be applauded for starting the work.  But then this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
const char *_pcre_ucp_names =
  &quot;Any\0&quot;
  &quot;Arabic\0&quot;
  &quot;Armenian\0&quot;
  ...
  &quot;Zs&quot;;
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This is a global variable.  Anybody seeing what is wrong?&lt;br /&gt;&lt;br /&gt;What this does is define a variable in .data (it&apos;s modifiable) which points to a constant string.  This means&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;An additional variable&lt;/li&gt;&lt;br /&gt;&lt;li&gt;More attack points, the variable is writable&lt;/li&gt;&lt;br /&gt;&lt;li&gt;An additional relocation&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Getting the string address requires a memory load and accessing the string itself requires two memory loads&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;People, think before writing code!  All that is needed here is a name for the memory area containing the constant string.  I.e.:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
const char _pcre_ucp_names[] =
  &quot;Any\0&quot;
  &quot;Arabic\0&quot;
  &quot;Armenian\0&quot;
  ...
  &quot;Zs&quot;;
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;See the difference?  This one character removed and two added make all the difference in the world.  The binary is smaller (at least 32 bytes on x86-64, more counting the simpler memory access in the actual code), one less relative relocation, faster code at runtime since the code to compute the string address needs no memory access.</description>
  <comments>http://udrepper.livejournal.com/13851.html</comments>
  <category>programming</category>
  <lj:security>public</lj:security>
</item>
<item>
  <guid isPermaLink='true'>http://udrepper.livejournal.com/13698.html</guid>
  <pubDate>Mon, 29 Jan 2007 08:49:36 GMT</pubDate>
  <title>RSA conference</title>
  <link>http://udrepper.livejournal.com/13698.html</link>
  <description>I perhaps should mention that I&apos;ll be talking at the RSA conference in San Francisco on February 6&lt;sup&gt;th&lt;/sup&gt; and 8&lt;sup&gt;th&lt;/sup&gt;.  I don&apos;t know yet whether I&apos;ll be around outside of these two times.  There are not too many other talks which I am interested in.  Two I found conflict with my own appearances.  I have a few others but hardly anything which deals with secure development and system software design.  Maybe somebody has some proposals.</description>
  <comments>http://udrepper.livejournal.com/13698.html</comments>
  <category>security</category>
  <lj:security>public</lj:security>
</item>
</channel>
</rss>
