1. 29 Jan, 2016 1 commit
  2. 27 Jan, 2016 2 commits
  3. 26 Jan, 2016 2 commits
  4. 21 Jan, 2016 1 commit
  5. 19 Jan, 2016 3 commits
  6. 18 Jan, 2016 1 commit
  7. 05 Jan, 2016 1 commit
    • Dan Good's avatar
      deque: check HAVE_STATEMENT_EXPR, tweaks from feedback · 3022d16d
      Dan Good authored
      Thanks to the detailed feedback from David Gibson, I made the
      following improvements:
      
      * add missing includes
      * check for statement expression support, give an error if absent
      * ccanlint directive to skip "without features" steps
      * add license ref to top of source files
      * rename run1.c test to api1.c
      Signed-off-by: default avatarDan Good <dan@dancancode.com>
      3022d16d
  8. 28 Dec, 2015 1 commit
  9. 14 Dec, 2015 1 commit
  10. 10 Dec, 2015 2 commits
  11. 09 Dec, 2015 3 commits
  12. 20 Nov, 2015 12 commits
  13. 15 Nov, 2015 1 commit
    • Andrew Jeffery's avatar
      strgrp: Cache cosine popcounts to reduce computation · 4a439212
      Andrew Jeffery authored
      Neatly, this reduces some of the OpenMP-related noise in the
      implementation by removing the need for the include. The loop body of
      grp_for() is also clearer as a result of the change; the changes to the
      function interfaces motivated a re-think of the complexity here.
      4a439212
  14. 13 Nov, 2015 1 commit
  15. 12 Nov, 2015 1 commit
  16. 08 Nov, 2015 1 commit
    • Andrew Jeffery's avatar
      strgrp: Add cosine similarity filter · e95575c4
      Andrew Jeffery authored
      The cosine similarity measure[1] (O(m + n)) contributes a decent runtime
      reduction when used as a filter prior to execution of more expensive
      algorithms such as LCS[2] (O(m * n)).
      
      A private test set of 3500 strings was used to quantify the improvement.
      The shape of the test set is described by Python's Pandas module as:
      
          >>> frames.apply(len).describe()
          count    3500.000000
          mean       47.454286
          std        14.980197
          min        10.000000
          25%        37.000000
          50%        45.000000
          75%        61.000000
          max       109.000000
          dtype: float64
          >>>
      
      The tests were performed on a lightly loaded Lenovo X201s stocked with a
      Intel Core i7 L 640 @ 2.13GHz CPU with 3.7 GiB of memory. The test was
      compiled with GCC 4.9.3:
      
          $ gcc --version
          gcc (Gentoo 4.9.3 p1.0, pie-0.6.2) 4.9.3
          Copyright (C) 2015 Free Software Foundation, Inc.
          This is free software; see the source for copying conditions.  There is NO
          warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
      
      Using the test program outlined below, ten runs for input set sizes
      incrementing in batches of 500 were taken prior to filtering with cosine
      similarity:
      
          500:  0.61, 0.25, 0.08, 0.07, 0.07, 0.07, 0.09, 0.07, 0.07, 0.07
          1000: 0.33, 0.32, 0.34, 0.32, 0.32, 0.33, 0.32, 0.32, 0.34, 0.32
          1500: 0.72, 1.53, 0.72, 0.70, 0.72, 0.70, 0.72, 0.71, 1.46, 0.71
          2000: 1.22, 1.20, 1.22, 1.98, 1.20, 1.20, 1.22, 1.94, 1.20, 1.20
          2500: 1.97, 2.72, 1.94, 2.33, 2.44, 1.94, 2.82, 1.93, 1.94, 2.69
          3000: 2.69, 3.41, 2.66, 3.38, 2.67, 3.42, 2.63, 3.44, 2.65, 3.39
          3500: 4.18, 3.65, 4.21, 4.21, 3.56, 4.21, 4.16, 3.63, 4.20, 4.13
      
      After adding the cosine similarity filter the runtimes became:
      
          500:  0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02
          1000: 0.08, 0.07, 0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.07, 0.07
          1500: 0.16, 0.16, 0.15, 0.16, 0.16, 0.15, 0.15, 0.15, 0.16, 0.16
          2000: 0.26, 0.26, 0.25, 0.26, 0.26, 0.26, 0.25, 0.26, 0.26, 0.26
          2500: 0.41, 0.41, 0.41, 0.40, 0.42, 0.42, 0.42, 0.41, 0.41, 0.41
          3000: 0.58, 0.56, 0.57, 0.56, 0.58, 0.57, 0.56, 0.56, 0.57, 0.55
          3500: 0.75, 0.74, 0.73, 0.74, 0.74, 0.73, 0.72, 0.75, 0.75, 0.75
      
      As such, on average the runtime improvements are:
      
          N       Avg Pre         Avg Post        Improvement (Pre / Post)
          500     0.145           0.02            7.25
          1000    0.326           0.073           4.47
          1500    0.869           0.156           5.57
          2000    1.358           0.258           5.26
          2500    2.272           0.412           5.51
          3000    3.034           0.566           5.36
          3500    4.014           0.74            5.42
      
      The test driver is as below, where both it and its dependencies were
      compiled with 'CFLAGS=-O2 -fopenmp' and linked with 'LDFLAGS=-fopenmp',
      'LDLIBS=-lm':
      
          $ cat test.c
          #include "config.h"
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include "ccan/strgrp/strgrp.h"
      
          int main(void) {
              FILE *f;
              char *buf;
              struct strgrp *ctx;
              f = fdopen(0, "r");
          #define BUF_SIZE 512
              buf = malloc(BUF_SIZE);
              ctx = strgrp_new(0.85);
              while(fgets(buf, BUF_SIZE, f)) {
                  buf[strcspn(buf, "\r\n")] = '\0';
                  if (!strgrp_add(ctx, buf, NULL)) {
                      printf("Failed to classify %s\n", buf);
                  }
              }
              strgrp_free(ctx);
              free(buf);
              fclose(f);
              return 0;
          }
      
      [1] https://en.wikipedia.org/wiki/Cosine_similarity
      [2] https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
      e95575c4
  17. 03 Nov, 2015 2 commits
    • David Gibson's avatar
      aga: Error codes · afae9c82
      David Gibson authored
      Add an enum to record error codes for aga routines.  The current
      algorithms, dfs and bfs don't have any error conditions except those
      reported by callbacks.  So, for now, the only code is "no error", but this
      will be expanded in future.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      afae9c82
    • David Gibson's avatar
      aga: Fix initialization bug in aga_for_each_edge_info · 2ab26c62
      David Gibson authored
      The aga_for_each_edge_info macro is constructed so that if the edge_info
      callback returns an error, the for loop terminates early and leaves the
      _err parameter set to the error.  On successful completion of the loop,
      _err should be zero.
      
      However, on a node with no edges, _err will not be initialized, meaning
      that it could be non-zero even on successful (trivial) completion of the
      loop.  This fixes the bug.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      2ab26c62
  18. 02 Nov, 2015 1 commit
  19. 27 Oct, 2015 1 commit
  20. 25 Oct, 2015 2 commits
    • David Gibson's avatar
      lqueue: Streamline interface with TCON_CONTAINER · 42b7ab48
      David Gibson authored
      The interfaces the lqueue module currently has implement (partial) type
      safety in a somewhat clunky way - types and member names need to be passed
      to a number of entry points.
      
      This patch uses the new TCON_CONTAINER magic to stash the typing
      information into the declaration of the queue object, so it no longer needs
      to be explicitly passed to later calls.
      
      This does alter the lqueue interface incompatibly.  I think the module
      is young enough that we can reasonably do that.  One other module,
      'aga', was using lqueue, and this also includes fixes so that it still
      works.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      42b7ab48
    • David Gibson's avatar
      lstack: Streamline interface with TCON_CONTAINER · e47d78e9
      David Gibson authored
      The interfaces the lstack module currently has implement (partial) type
      safety in a somewhat clunk way - types and member names need to be passed
      to a number of entry points.
      
      This patch uses the new TCON_CONTAINER magic to stash the typing
      information into the declaration of the stack object, so it no longer needs
      to be explicitly passed to later calls.
      
      This does alter the lstack interface incompatibly.  I think the module
      is young enough that we can reasonably do that.  One other module,
      'aga', was using lstack, and this also includes fixes so that it still
      works.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      e47d78e9