Commits · e7a6f910ab9af754018606e6f96018b0447c61bc · Xiaowu Zhang / metadata-collect-agent

14 Oct, 2020 3 commits
- Fix compilation error by forcing CFLAGS=-fPIE · e7a6f910
  Leo Le Bouter authored Oct 14, 2020
  
- cargo update; also psutil-rust fork changes were merged upstream · 0957b7ac
  Leo Le Bouter authored Oct 14, 2020
  
- dracut.module: Print correct names for missing env vars · 56107efc
  Leo Le Bouter authored Oct 14, 2020
  
30 Sep, 2020 2 commits
- debian.package: Make path of UEFI application compliant to standard · b6936735
  Leo Le Bouter authored Sep 30, 2020
```
The UEFI specification asks that UEFI applications be located in an
/EFI/vendor folder. We therefore place them within /EFI/Nexedi
inside the EFI System Partition.
```
- debian.package: BootOrder is modified automatically on new entry · 479194ea
  Leo Le Bouter authored Sep 30, 2020
  
15 Sep, 2020 3 commits
- debian.package: Modify boot order · d1d018e5
  Leo Le Bouter authored Sep 15, 2020
```
Put our UEFI boot application just before the currently booted element in the list.
```
- debian.package: use xfs_io for online label update · 33fb9beb
  Leo Le Bouter authored Sep 15, 2020
  
- cargo update and rayon 1.3.1 -> 1.4.0 · 4ff8ab61
  Leo Le Bouter authored Sep 15, 2020
  
14 Sep, 2020 1 commit

Move back to JSON from MsgPack, change upload module in ERP5 · 542c9935

Leo Le Bouter authored Sep 14, 2020

The Python msgpack library does not deserialize MsgPack data
created with Rust's rmp-serde well.

Upload to the Computer Metadata Snapshot module recently created
for ERP5.


08 Sep, 2020 10 commits
- Add experimental GNU Guix package for Python metadata-collect-agent · 6d98aefb
  Leo Le Bouter authored Sep 08, 2020
```
Currently GNU Guix does not support private git repos, so the
origin is only a placeholder until it works. The package otherwise
does work when supplying the origin from a local or public URL.
```
- debian.package: mkdir -p to create whole tree · ebfcc1fa
  Leo Le Bouter authored Sep 08, 2020
  
- debian.package: dracut --force, add EFI boot entry once · 67adf281
  Leo Le Bouter authored Sep 08, 2020
```
Also, only btrfs needs mountpoint if filesystem is mounted.
```
- debian.package: chmod/chown secboot.cer properly · 381d699a
  Leo Le Bouter authored Sep 08, 2020
  
- Remove .gitkeep, adapt Makefile · 5c5aa429
  Leo Le Bouter authored Sep 08, 2020
```
The .gitkeep file was getting into the debian package causing
spurious errors during install.
```
- dracut.module: Put env guards inside rule so install works without · b77bed4d
  Leo Le Bouter authored Sep 08, 2020
  
- Include DER cert in deb and refactor Makefile · d5c7d270
  Leo Le Bouter authored Sep 08, 2020
  
- Add clean in Makefile and use mountpoint instead of device · eee35c0f
  Leo Le Bouter authored Sep 08, 2020
```
Using mountpoint instead of device is required to change the label
of a currently mounted filesystem.
```
- Add small ergonomic change from jm review · 51bc37ba
  Leo Le Bouter authored Sep 08, 2020
  
- Add Makefile to build simple debian package with local kernel · c2b83bac
  Leo Le Bouter authored Sep 08, 2020
  
21 Aug, 2020 2 commits

Add metadata-collect dracut module · 025a9bea

Leo Le Bouter authored Aug 22, 2020

To install the dracut module on your current system, change into
the dracut.module directory then run:

```
$ ERP5_USER="user" ERP5_PASS="pass" \
  ERP5_BASE_URL="https://example.local/erp5" \
  make
$ sudo make install
```

To uninstall:

```
$ sudo make uninstall
```

Then in a dracut.conf file, to include it you can add:

```
add_dracutmodules="metadata-collect"
```

You will also need to append "ip=dhcp rd.neednet=1" to the
kernel_cmdline directive inside the dracut.conf so that the
initramfs requests networking for the agent to upload results.

Make sure the dracut network modules are installed; on Debian
they are provided by the dracut-network package.
You can otherwise check for their presence using:

```
$ dracut --list-modules | grep network
```

There you should see a few modules.


Use rustls instead of openssl · 4d94b540

Leo Le Bouter authored Aug 21, 2020

With rustls it's easier to embed the root CA certificates inside
the compiled binary itself using the webpki-roots crate. We need to
do this because it's the easiest way of getting TLS certificate
validation working inside the initramfs, where /etc/ssl/certs or
an equivalent certificate store does not exist.


20 Aug, 2020 1 commit

Rewrite in Rust to obtain standalone static binary · d2277063

Leo Le Bouter authored Aug 20, 2020

In contradiction with Jean-Paul's guidelines on not using Rust due
to lack of knowledge about it inside Nexedi, I am using it here
because it is the fastest way for me to get a working standalone
static binary: it is the language I know best. Considering that we
must get results ASAP, this is the best strategy for me. We may
later rewrite it in another language if necessary.

A shell script is included to build the static binary. You need
to install rustup to get Rust for musl, an alternative libc that
allows creating truly static binaries that embed libc itself too.

Rustup can be found at: https://rustup.rs/

You can get a musl toolchain with:
  $ rustup target add x86_64-unknown-linux-musl

The acl library is being downloaded and built as a static library
by the script, and the rust build system will also build a vendored
copy of openssl as a static library.

Parallel hashing is done a bit differently in this Rust version:
only files contained in the directory currently being processed are
hashed in parallel. If there is a single big file in a directory,
hashing stays on that file until it is done, and only then moves on
to the next directory.

To clarify, each file is only hashed on a single thread. The Python
version also does this, but it keeps the number of files being
hashed in parallel constant as long as there are more files to
process, whereas this version only hashes with one thread per file
in the directory currently being processed. It was done that way
for the sake of simplicity, but we can implement an offload thread
pool to mimic what was done in Python later on.
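
The per-directory strategy described above can be sketched in
Python as follows. This is only an illustration of the scheduling
behaviour, not the agent's Rust code: the names hash_file and
hash_tree are hypothetical, and SHA-256 is an assumed algorithm
chosen for the example.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1 << 20):
    """Hash one file on the calling thread, reading it in chunks."""
    digest = hashlib.sha256()  # assumed algorithm, for illustration only
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root, max_workers=4):
    """Walk root and hash the files of one directory at a time in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for dirpath, _dirnames, filenames in os.walk(root):
            paths = [os.path.join(dirpath, name) for name in filenames]
            # Keep regular files only; symlinks and special files are skipped.
            paths = [p for p in paths
                     if os.path.isfile(p) and not os.path.islink(p)]
            # list() blocks until every file of this directory is hashed,
            # so one big file holds up the move to the next directory.
            digests = list(pool.map(hash_file, paths))
            results.update(zip(paths, digests))
    return results
```

Calling hash_tree("/some/dir") returns a dict mapping each file
path to its hex digest. Keeping a constant number of files in
flight, as the Python version is said to do, would instead mean
submitting files from all directories to the same pool before
waiting on any result.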


19 Aug, 2020 1 commit
- Use certifi for more portable TLS cert validation · 4256ada0
  Leo Le Bouter authored Aug 19, 2020
  
18 Aug, 2020 2 commits

Add setup script · e63cabb5
Leo Le Bouter authored Aug 19, 2020


Upload results to ERP5 · 7d922faa

Leo Le Bouter authored Aug 18, 2020

TODO: Find a way to properly increment version without having to
      store any additional state client-side

TODO: Investigate using HATEOAS to talk to ERP5

TODO: Investigate using TLS client certificates to authenticate,
      they would be stored in /boot and would prevent the machine
      from booting if they were invalid or missing so that
      tampering with them is not interesting for an attacker.
      Also, the certificate's Common Name should be the computer
      reference and therefore should be used to construct the
      metadata snapshot document's reference instead of having
      to specify it on the command line.


14 Aug, 2020 2 commits

Formatting · d6bebb62
Leo Le Bouter authored Aug 14, 2020


Use MsgPack instead of JSON, add command line arguments + bug fixes · 86c55efd

Leo Le Bouter authored Aug 14, 2020

* Convert stat_result to proper dictionary so that field names are
  retained after serialization

* Add ability to ignore directories through command line arguments,
  explicitly add "ignored" field on ignored directories

It was decided that JSON was not a suitable format because bytes
serialization support is lacking. MsgPack supports it and is more
efficient; it is also the internal serialization format for
Fluentd, which we will most probably use for ingesting data in a
central place.
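
The stat_result conversion from the first bullet can be sketched
as below. os.stat_result behaves like a tuple, so a serializer
that treats it as a sequence keeps the values but drops the field
names. The helper name stat_to_dict is hypothetical, and the
stdlib json module stands in for the serializer so that the sketch
has no third-party dependency.

```python
import json
import os


def stat_to_dict(st):
    """Turn an os.stat_result into a plain dict keyed by field name."""
    # Every public field of stat_result starts with "st_" (st_mode, st_size...).
    return {name: getattr(st, name) for name in dir(st) if name.startswith("st_")}


st = os.lstat(".")
as_sequence = json.dumps(st)             # serialized as a bare list of numbers
as_mapping = json.dumps(stat_to_dict(st))  # field names survive serialization
```

The set of st_* fields varies by platform and Python version, so
consumers should not assume a fixed list of keys.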


13 Aug, 2020 3 commits

do not follow symlinks in getxattr, close mp_pool first · 02a190aa

Leo Le Bouter authored Aug 13, 2020

multiprocessing.Pool.close() ensures no new tasks can be submitted
to the pool; tasks that were already submitted still run to
completion. AsyncResult.get() is what waits for each task to
finish, and our code structure shouldn't submit new tasks at that
point, but call close() first and get() afterwards anyway.
Otherwise this could become error-prone in the future if mp_tasks
is modified while results are being merged back: we would miss some
results because the iterator won't take these new items into
account *during* iteration.
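
A minimal sketch of the close-then-get ordering follows. It is not
the agent's code: square and run_tasks are hypothetical names, and
multiprocessing.pool.ThreadPool is swapped in for
multiprocessing.Pool so that the example runs without a __main__
guard. Both classes expose the same apply_async(), close(), join()
and get() calls. The join() call is an addition of this sketch; it
is what actually blocks until the submitted tasks are done.

```python
from multiprocessing.pool import ThreadPool  # same API as multiprocessing.Pool


def square(n):
    """Stand-in for the real per-task work."""
    return n * n


def run_tasks(values, workers=2):
    pool = ThreadPool(processes=workers)
    # Submit everything up front; mp_tasks is not modified after this line.
    mp_tasks = [pool.apply_async(square, (v,)) for v in values]
    pool.close()  # refuse any further submissions
    pool.join()   # block until every submitted task has completed
    # The task list is final and all results are ready, so iterating is safe.
    return [task.get() for task in mp_tasks]
```

With this ordering, an attempt to submit work after close() raises
an error instead of silently adding a task whose result is never
collected.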


README: list used hashing algorithms, add benchmark results · 16acbcbc
Leo Le Bouter authored Aug 13, 2020


xattrs dict must be created first, decode xattrs as utf-8 · 001ed5c5

Leo Le Bouter authored Aug 13, 2020

In Python, the JSON encoder cannot process bytes, and the JSON
specification does not define a "bytes" type either. We are
therefore unable to serialize data of bytes type.

xattrs can be either strings or bytes. In practice they are likely
representable as strings, so decode them as UTF-8 and raise an
error otherwise. If real-world cases of xattrs in true binary
format arise, we will work out another solution.
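
The decode-or-fail approach described above can be sketched as
follows. The function name decode_xattrs and the sample attribute
are hypothetical; the input is assumed to be a mapping of xattr
names to values, each either str or bytes.

```python
import json


def decode_xattrs(raw_xattrs):
    """Decode xattr names and values to str; fail on anything not UTF-8."""
    xattrs = {}
    for name, value in raw_xattrs.items():
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        if isinstance(value, bytes):
            # Strict decoding: raises UnicodeDecodeError on true binary data.
            value = value.decode("utf-8")
        xattrs[name] = value
    return xattrs


# json.dumps() raises TypeError on bytes, so decode before serializing.
document = json.dumps({"xattrs": decode_xattrs({b"user.comment": b"hello"})})
```

Letting the UnicodeDecodeError propagate makes a binary xattr show
up as a hard failure rather than as silently altered data.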


12 Aug, 2020 1 commit
- Initial commit · 562b18bc
  Leo Le Bouter authored Aug 13, 2020
  