• Paulo Marques's avatar
    [PATCH] kallsyms data size reduction / lookup speedup · e1039211
    Paulo Marques authored
    This patch is an improvement over my first kallsyms speedup patch posted about
    2 weeks ago.
    
    It changes scripts/kallsyms as to produce a different format for
    kallsyms_names and extra data to speedup lookups.  The compression algorithm
    is quite simple: it uses all the char codes not actually used in symbols to
    build a lookup table that translates these codes into small strings.  For
    instance, in my test runs the code 0xFE was being translated into "acpi_"
    giving a 4 byte save on every translation.
    
    The advantage of this algorithm is that to translate a symbol we only require
    information that is stored on that symbol position, and never need to go back
    on the compressed stream to get information from other symbols.
    
    To give an idea about the benefits of this algorithm here are some benchmark
    results on a P4 2.8GHz with a symbol table with 10000 entries:
    
    kallsyms_lookup average time:
      vanilla           1346.0 us
      speedup             14.4 us
      with this patch      0.5 us
    
    total data produced by scripts/kallsyms:
      uncompressed         169 Kb
      vanilla              134 Kb
      with this patch       91 Kb
    
    (speedup was my latest patch, that only changed the way kallsyms_lookup worked
    and not the data format)
    
    I removed a cond_resched() from the proc/kallsyms handling code path, because
    using stem compression, if the current position went backwards, the hole
    stream would be uncompressed up to the current position.  It seemed that by
    removing this loop it would be safe to remove the conditional reschedule
    altogether.
    
    There is just one catch with this patch: the time it takes to compile the
    kernel goes up just a bit (about 0.8s on a P4 2.8GHz with defconfig).  If this
    delay is not acceptable I can change the compression algorithm so that it can
    use the previous table (calculating a new table is what consumes most of the
    time, and not doing the actual compression) and check to see if it obtains a
    similar compression ratio.  If it does, then this is a sign that the symbol
    patterns haven't changed that much and this table is still good to use.  This
    would not only cut the time down to half on any compilation (because of the 2
    pass symbol build method), but in frequent cases where a developer is
    compiling a single file and linking everything over and over again, the table
    optimization process would never run.
    
    I'm CC'ing Brent Casavant on this email, because last june he sent a patch
    trying a different approach that used a 32 entry symbol cache, because there
    was a problem with the time "top" took to read "proc/<pid>/wchan".  I was
    hopping he would be willing to test this patch and comment on the results.
    Signed-off-by: default avatarPaulo Marques <pmarques@grupopie.com>
    Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    e1039211
kallsyms.c 15.5 KB