- 08 Nov, 2015 1 commit
-
-
Andrew Jeffery authored
The cosine similarity measure[1] (O(m + n)) contributes a decent runtime reduction when used as a filter prior to execution of more expensive algorithms such as LCS[2] (O(m * n)). A private test set of 3500 strings was used to quantify the improvement. The shape of the test set is described by Python's Pandas module as: >>> frames.apply(len).describe() count 3500.000000 mean 47.454286 std 14.980197 min 10.000000 25% 37.000000 50% 45.000000 75% 61.000000 max 109.000000 dtype: float64 >>> The tests were performed on a lightly loaded Lenovo X201s stocked with a Intel Core i7 L 640 @ 2.13GHz CPU with 3.7 GiB of memory. The test was compiled with GCC 4.9.3: $ gcc --version gcc (Gentoo 4.9.3 p1.0, pie-0.6.2) 4.9.3 Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Using the test program outlined below, ten runs for input set sizes incrementing in batches of 500 were taken prior to filtering with cosine similarity: 500: 0.61, 0.25, 0.08, 0.07, 0.07, 0.07, 0.09, 0.07, 0.07, 0.07 1000: 0.33, 0.32, 0.34, 0.32, 0.32, 0.33, 0.32, 0.32, 0.34, 0.32 1500: 0.72, 1.53, 0.72, 0.70, 0.72, 0.70, 0.72, 0.71, 1.46, 0.71 2000: 1.22, 1.20, 1.22, 1.98, 1.20, 1.20, 1.22, 1.94, 1.20, 1.20 2500: 1.97, 2.72, 1.94, 2.33, 2.44, 1.94, 2.82, 1.93, 1.94, 2.69 3000: 2.69, 3.41, 2.66, 3.38, 2.67, 3.42, 2.63, 3.44, 2.65, 3.39 3500: 4.18, 3.65, 4.21, 4.21, 3.56, 4.21, 4.16, 3.63, 4.20, 4.13 After adding the cosine similarity filter the runtimes became: 500: 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02 1000: 0.08, 0.07, 0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.07, 0.07 1500: 0.16, 0.16, 0.15, 0.16, 0.16, 0.15, 0.15, 0.15, 0.16, 0.16 2000: 0.26, 0.26, 0.25, 0.26, 0.26, 0.26, 0.25, 0.26, 0.26, 0.26 2500: 0.41, 0.41, 0.41, 0.40, 0.42, 0.42, 0.42, 0.41, 0.41, 0.41 3000: 0.58, 0.56, 0.57, 0.56, 0.58, 0.57, 0.56, 0.56, 0.57, 0.55 3500: 0.75, 0.74, 0.73, 0.74, 0.74, 0.73, 0.72, 0.75, 0.75, 0.75 As such, on average the runtime improvements are: N Avg Pre Avg Post Improvement (Pre / Post) 500 0.145 0.02 7.25 1000 0.326 0.073 4.47 1500 0.869 0.156 5.57 2000 1.358 0.258 5.26 2500 2.272 0.412 5.51 3000 3.034 0.566 5.36 3500 4.014 0.74 5.42 The test driver is as below, where both it and its dependencies were compiled with 'CFLAGS=-O2 -fopenmp' and linked with 'LDFLAGS=-fopenmp', 'LDLIBS=-lm': $ cat test.c #include "config.h" #include <stdio.h> #include <stdlib.h> #include <string.h> #include "ccan/strgrp/strgrp.h" int main(void) { FILE *f; char *buf; struct strgrp *ctx; f = fdopen(0, "r"); #define BUF_SIZE 512 buf = malloc(BUF_SIZE); ctx = strgrp_new(0.85); while(fgets(buf, BUF_SIZE, f)) { buf[strcspn(buf, "\r\n")] = '\0'; if (!strgrp_add(ctx, buf, NULL)) { printf("Failed to classify %s\n", buf); } } strgrp_free(ctx); free(buf); fclose(f); return 0; } [1] https://en.wikipedia.org/wiki/Cosine_similarity [2] https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
-
- 03 Nov, 2015 2 commits
-
-
David Gibson authored
Add an enum to record error codes for aga routines. The current algorithms, dfs and bfs don't have any error conditions except those reported by callbacks. So, for now, the only code is "no error", but this will be expanded in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
The aga_for_each_edge_info macro is constructed so that if the edge_info callback returns an error, the for loop terminates early and leaves the _err parameter set to the error. On successful completion of the loop, _err should be zero. However, on a node with no edges, _err will not be initialized, meaning that it could be non-zero even on successful (trivial) completion of the loop. This fixes the bug. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
- 02 Nov, 2015 1 commit
-
-
David Gibson authored
Add block comments in the usual ccan format describing a number of the important functions in the aga module. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
- 27 Oct, 2015 1 commit
-
-
David Gibson authored
Simple, slow priority queue implementation. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
- 25 Oct, 2015 8 commits
-
-
David Gibson authored
The interfaces the lqueue module currently has implement (partial) type safety in a somewhat clunky way - types and member names need to be passed to a number of entry points. This patch uses the new TCON_CONTAINER magic to stash the typing information into the declaration of the queue object, so it no longer needs to be explicitly passed to later calls. This does alter the lqueue interface incompatibly. I think the module is young enough that we can reasonably do that. One other module, 'aga', was using lqueue, and this also includes fixes so that it still works. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
The interfaces the lstack module currently has implement (partial) type safety in a somewhat clunk way - types and member names need to be passed to a number of entry points. This patch uses the new TCON_CONTAINER magic to stash the typing information into the declaration of the stack object, so it no longer needs to be explicitly passed to later calls. This does alter the lstack interface incompatibly. I think the module is young enough that we can reasonably do that. One other module, 'aga', was using lstack, and this also includes fixes so that it still works. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
Add "container canaries" to tcon. This allows information about a specific member of a container structure to be encoded with TCON or TCON_WRAP. Once that's done, tcon_container_of() and tcon_member_of() can be used to translate between member and container pointers based on the canary information, without having to repeat the type and member details. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
Add the ability to encode, as well as types, integer values (must be positive, compile time constant and in the range of size_t) into TCON constructed "type" canaries. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
Add a new tcon_sizeof() helper macro which returns the size of a type for which a canary has been constructed with TCON or TCON_WRAP. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
The tcon module allows you to add "type canaries" to a structures, which can be used for later typechecks. The canaries are implemented using a flexible array member, to avoid them taking any actual space at runtime. That means the canaries must go at the end of your structure. That doesn't seem like a big limitation, except that it also means the structure containing the canaries must be at the end of any structure it is embedded in in turn, which is a rather more serious limitation. This patch adds a TCON_WRAP() macro which wraps a given type in a new type which also contains type canaries, and doesn't suffer the last member limitation. The drawback is that if the wrapped type has smaller size than a pointer, the type canaries will pad the wrapped type out to the size of a pointer. By constructing the wrappers carefully, the existing tcon macros will work on both wrapper types constructed with TCON_WRAP and on structures explicitly including TCON type canaries. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
The lstack and lqueue modules are entirely implemented in a single header. However, in Makefile-ccan they're listed in MODS_WITH_SRC, instead of MODS_NO_SRC. This appears to be harmless, but this patch moves them to the correct category anyway. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
By construction these functions depend only on their arguments, so declare them as CONST_FUNCTION using the helper from ccan/compiler. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
- 21 Oct, 2015 2 commits
-
-
Paul Wayper authored
-
Rusty Russell authored
We should be allocating them with opt's allocator (we use it to free them!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 20 Oct, 2015 1 commit
-
-
Rusty Russell authored
Closes: #32 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 18 Oct, 2015 2 commits
-
-
David Gibson authored
In commit 37825438 "order: Scalar comparison functions", I accidentally checked in a file which didn't belong with that commit, and was actually from a module I was experimenting with but wasn't ready to commit. This cleans up the bogus extra file. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
Add a wrapper macro total_order_cmp() to more conveniently use the total_order structures. Add some tests for it, which also improve tests the "fancy_cmp" function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
- 16 Oct, 2015 1 commit
-
-
Rusty Russell authored
It used strlen() on the source, which might not be valid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 15 Oct, 2015 4 commits
-
-
Rusty Russell authored
Best of both worlds. Before: 1: 6ns 2: 7ns 4: 7ns 8: 7ns 16: 7ns 32: 8ns 64: 9ns 128: 13ns 256: 24ns 512: 47ns 1024: 92ns 2048: 185ns 4096: 376ns 8192: 739ns 16384: 1463ns 32768: 2914ns 65536: 5800ns 2: 7ns 3: 7ns 5: 7ns 9: 7ns 17: 7ns 33: 8ns 65: 9ns 129: 20ns 257: 31ns 513: 49ns 1025: 96ns 2049: 189ns 4097: 381ns 8193: 745ns 16385: 1477ns 32769: 2930ns 65537: 5824ns total = 599391004 After: 1: 3ns 2: 3ns 4: 4ns 8: 5ns 16: 12ns 32: 13ns 64: 15ns 128: 19ns 256: 25ns 512: 35ns 1024: 57ns 2048: 105ns 4096: 183ns 8192: 324ns 16384: 607ns 32768: 1317ns 65536: 2774ns 2: 3ns 3: 3ns 5: 4ns 9: 6ns 17: 11ns 33: 13ns 65: 14ns 129: 19ns 257: 24ns 513: 35ns 1025: 57ns 2049: 106ns 4097: 183ns 8193: 324ns 16385: 607ns 32769: 1315ns 65537: 2773ns total = 599391004 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Rusty Russell authored
Better for larger, worse for smaller compares. Before: 1: 3ns 2: 3ns 4: 5ns 8: 9ns 16: 11ns 32: 33ns 64: 45ns 128: 87ns 256: 157ns 512: 296ns 1024: 579ns 2048: 1139ns 4096: 2251ns 8192: 4505ns 16384: 9704ns 32768: 18482ns 65536: 36144ns 2: 4ns 3: 6ns 5: 8ns 9: 9ns 17: 12ns 33: 22ns 65: 45ns 129: 90ns 257: 175ns 513: 357ns 1025: 607ns 2049: 1204ns 4097: 2278ns 8193: 4552ns 16385: 9011ns 32769: 18405ns 65537: 36153ns total = 599391004 After: 1: 6ns 2: 7ns 4: 7ns 8: 7ns 16: 7ns 32: 8ns 64: 9ns 128: 13ns 256: 24ns 512: 47ns 1024: 92ns 2048: 185ns 4096: 376ns 8192: 739ns 16384: 1463ns 32768: 2914ns 65536: 5800ns 2: 7ns 3: 7ns 5: 7ns 9: 7ns 17: 7ns 33: 8ns 65: 9ns 129: 20ns 257: 31ns 513: 49ns 1025: 96ns 2049: 189ns 4097: 381ns 8193: 745ns 16385: 1477ns 32769: 2930ns 65537: 5824ns total = 599391004 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Rusty Russell authored
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Rusty Russell authored
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 09 Oct, 2015 2 commits
-
-
Rusty Russell authored
Seems to. Numbers are noisy, but before was 5 min 32 sec, after this commit was 3 min 42 sec. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Rusty Russell authored
I've filed bugs to get those dev packages whitelisted: https://github.com/travis-ci/apt-package-whitelist/issues/1366 https://github.com/travis-ci/apt-package-whitelist/issues/1367 https://github.com/travis-ci/apt-package-whitelist/issues/1368 https://github.com/travis-ci/apt-package-whitelist/issues/1369Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 29 Sep, 2015 1 commit
-
-
Rusty Russell authored
1) Require "" around input 2) Make them optional around output: if not there, loose match whitespace 3) Handle \n in output. 4) Document that "Given xxx" is optional. 5) Reject any non-matching comment lines starting with "given" or "outputs" 6) Fix missed test in ccan/cast Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 28 Sep, 2015 1 commit
-
-
David Gibson authored
New module. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
- 24 Sep, 2015 2 commits
-
-
Rusty Russell authored
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Rusty Russell authored
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 14 Sep, 2015 1 commit
-
-
David Gibson authored
Commit 63f13d64 "strgrp: Tidy up kerneldoc in _info" introduced some compile errors into the example in strgrp/_info. This fixes them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
- 13 Sep, 2015 3 commits
-
-
David Gibson authored
Add a memswap() function to the mem module, which exchanges two (equal sized, non-overlapping) memory regions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
The test is simple, but every time I do it by hand, I always spend ages convincing myself it's actually correct. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
David Gibson authored
The mem module declares array_size as a test dependency, and includes it in test/api.c, but doesn't actually use it. This removes the unneeded dependency. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
- 12 Sep, 2015 1 commit
-
-
Andrew Jeffery authored
The documentation as it stood rendered badly in HTML due to a lack of knowledge of kerneldoc formatting.
-
- 09 Sep, 2015 3 commits
-
-
Andrew Jeffery authored
-
Andrew Jeffery authored
-
Andrew Jeffery authored
-
- 08 Sep, 2015 1 commit
-
-
Andrew Jeffery authored
-
- 06 Sep, 2015 2 commits
-
-
Cody P Schafer authored
Signed-off-by: Cody P Schafer <dev@codyps.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-
Cody P Schafer authored
This might allow compilers that support the anotation to make better choices when optimizing, and all these functions meet the requirements for being marked pure. Signed-off-by: Cody P Schafer <dev@codyps.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-