Commit d2277063 authored by Leo Le Bouter's avatar Leo Le Bouter

Rewrite in Rust to obtain standalone static binary

In contradiction with Jean-Paul's guidelines on not using Rust due
to the lack of knowledge about it inside Nexedi, I am using it here
because it is the fastest way for me to get a working standalone
static binary: I know that language best. Since we must get results
ASAP, this is the best strategy for me. We may later rewrite it in
another language if necessary.

A shell script is included to build the static binary. You need to
install rustup to get Rust for musl, an alternative libc that makes
it possible to create truly static binaries that embed libc itself.

Rustup can be found at: https://rustup.rs/

You can get a musl toolchain with:
  $ rustup target add x86_64-unknown-linux-musl

The acl library is being downloaded and built as a static library
by the script, and the rust build system will also build a vendored
copy of openssl as a static library.

Parallel hashing is done a bit differently in this Rust version:
only files contained in the directory currently being processed are
hashed in parallel. If a directory contains a single big file,
hashing will be stuck on that file until it is done before moving
on to the next directory. To clarify, each file is hashed on a
single thread in both versions; the Python version simply keeps the
number of files being hashed in parallel at a constant as long as
there are more files to process, whereas this version only uses one
thread per file in the directory currently being processed. It was
done this way for the sake of simplicity; we can implement an
offload thread pool later on to mimic what was done in Python.
parent 4256ada0
@@ -2,4 +2,9 @@ test
test_stderr
/env
/.vscode
/test_dir
\ No newline at end of file
/acl-*
# Added by cargo
/target
[package]
name = "metadata-collect-agent"
version = "0.1.0"
authors = ["Leo Le Bouter <leo.le.bouter@nexedi.com>"]
edition = "2018"
[dependencies]
posix-acl = "1.0.0"
xattr = "0.2.2"
md-5 = "0.9.1"
sha-1 = "0.9.1"
sha2 = "0.9.1"
hex = "0.4.2"
anyhow = "1.0.32"
clap = "2.33.3"
psutil = { git = "https://github.com/leo-lb/rust-psutil", branch = "lle-bout/impl-serde", version = "3.1.0", features = ["serde"] }
reqwest = { version = "0.10.7", features = ["blocking", "native-tls-vendored"] }
rmp-serde = "0.14.4"
nix = "0.18.0"
serde = { version = "1.0.115", features = ["derive"] }
base64 = "0.12.3"
rayon = "1.3.1"
[profile.release]
opt-level = 'z'
lto = true
codegen-units = 1
\ No newline at end of file
#!/bin/bash
set -eux
ACLVERSION="2.2.53"
ACLTARGZSHA256=9e905397ac10d06768c63edd0579c34b8431555f2ea8e8f2cee337b31f856805
HOST_TARGET=${HOST_TARGET:-x86_64-unknown-linux-musl}
wget -O "acl-${ACLVERSION}.tar.gz" \
"https://git.savannah.nongnu.org/cgit/acl.git/snapshot/acl-${ACLVERSION}.tar.gz"
echo -n "$ACLTARGZSHA256  acl-${ACLVERSION}.tar.gz" | sha256sum -c /dev/stdin
tar -xvf "acl-${ACLVERSION}.tar.gz"
cd "acl-${ACLVERSION}"
[ -f ./configure ] || ./autogen.sh
[ -f ./config.status ] || ./configure --host "${HOST_TARGET}"
make -j$(nproc)
cd -
RUSTFLAGS="-L native=$(pwd)/acl-${ACLVERSION}/.libs -l static=acl" PKG_CONFIG_ALLOW_CROSS=1 \
cargo build --target "$HOST_TARGET" --release
strip --strip-all "target/$HOST_TARGET/release/$(basename $(pwd))"
objdump -T "target/$HOST_TARGET/release/$(basename $(pwd))"
  • I'm halfway through reading https://doc.rust-lang.org/stable/book/ and I have to say rust is really excellent.

  • BTW, in slapos!799 (merged) there's the beginning of a Rust component for SlapOS (which takes ~2 hours to compile from source, and I'm not really sure it's reproducible, since the setup seems to download things) and Rust support in Theia.

  • @jerome I'm glad you find it great! I'm 2 years in, and besides the security benefits it brings, I find it really consistent and ergonomic; it's pleasant to work with most of the time :-)

  • Hashing several files in parallel looks like a bad idea. We already did it in SlapOS for the resilience and it was a disaster. It's rare that hashing is slower than IO (maybe only when hashing a big file on a high-performance NVMe drive with the slowest algorithm). Most of the time it's so inefficient that it's slower: at best it could be faster, but the hardware consumes a lot.

    But there may be other ways to parallelize the work. First, by doing IO and hashing in separate threads, i.e. pipelining. Since you compute several hashes, you could use one thread per hash.

    In any case, I find such an attempt to optimize very premature, in particular if a rewrite is planned.

  • @jm It's really, really slow with a single thread. The use case is whole-filesystem hashing, performed during boot with nothing else running on the system, and it's mandatory that it be fast, otherwise we might end up with terrible boot times of dozens of minutes. I did some tests before making it parallel: it became much faster, and I/O utilization is now at 100% on my fast NVMe drive, so it can't get any faster, whereas before it wasn't at 100%. So the first benefit comes from doing I/O from multiple threads. But on my system, hashing on a single thread is about as slow as (or only a bit faster than) I/O on a single thread, at something like 100 MB/s, so you would have to parallelize anyway: a single hashing thread couldn't keep up with data coming from I/O done on multiple threads.

    The hashing functions come from https://github.com/RustCrypto/hashes - maybe they could be optimized further, but they already use hand-written assembly in select places for performance, so making it parallel is easier for me than digging into the assembly with cryptography expertise that I don't have.

    Note that the performance issues were identical with the Python version, which uses OpenSSL, where the hash functions are already well optimized.

    > In any case, I find such an attempt to optimize very premature, in particular if a rewrite is planned.

    It's pretty trivial in Rust: add Arc/Mutex on the data structure, then use Rayon's parallel iterator, and it's done, so not much effort or time was spent there.
