Commit 93bd110

cmazakas, louistatta
authored andcommitted
add christian q4 update
1 parent 8fccb7e commit 93bd110

1 file changed: 70 additions, 0 deletions

---
layout: post
nav-class: dark
categories: christian
title: "Hashing and Matching"
author-id: christian
author-name: Christian Mazakas
---

## Boost.Hash2

I'm happy to report that the library I helped Peter Dimov develop, [Hash2](https://github.com/pdimov/hash2), was accepted
after what was probably one of the most thorough Boost reviews in recent history.

I can't claim to have contributed all that much to the design. After all, Hash2 is an implementation of
[Types Don't Know #](https://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html).
But I did come along and help implement myriad algorithms and help with the absolutely massive testing burden.
Interestingly, I think people who don't spend all day writing and maintaining Boost libraries underestimate just how much testing
even something like 10 extra lines of source can require. When you write software to a certain minimum bar of quality,
almost all of your time and effort goes into testing rather than into anything else. This is because if a Boost library gets something wrong,
there's really no good way to unwind it. Bad versions of `libboost-dev` will have already gone out, packagers
need to re-package, and the whole thing is a huge debacle for users and packagers alike.
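
The core idea of Types Don't Know # is a separation of concerns: a hash algorithm only consumes bytes, and each type only
describes which bytes it contributes through a `hash_append` overload. Below is a minimal, self-contained sketch of that
idea using a simplified two-argument `hash_append` in the spirit of N3980. `fnv1a_32`, `person`, and `hash_value` are
hypothetical names for illustration, and this is not Boost.Hash2's actual interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical hash algorithm (32-bit FNV-1a). Like the algorithms in N3980,
// it only knows how to consume bytes; it knows nothing about types.
class fnv1a_32
{
    std::uint32_t state_ = 0x811C9DC5u; // FNV-1a offset basis

public:
    void update(void const* data, std::size_t n)
    {
        auto p = static_cast<unsigned char const*>(data);
        for (std::size_t i = 0; i < n; ++i)
        {
            state_ ^= p[i];
            state_ *= 0x01000193u; // FNV prime
        }
    }

    std::uint32_t result() const { return state_; }
};

// Types, in turn, only describe which bytes they contribute.
// Integers contribute their object representation.
template<class Hash, class T, class = std::enable_if_t<std::is_integral_v<T>>>
void hash_append(Hash& h, T const& v)
{
    h.update(&v, sizeof(v));
}

// Strings contribute their characters followed by their length, so that
// ("ab", "c") and ("a", "bc") feed different byte sequences to the hasher.
template<class Hash>
void hash_append(Hash& h, std::string const& s)
{
    h.update(s.data(), s.size());
    hash_append(h, s.size());
}

// A user-defined type just composes the hash_append of its members.
struct person
{
    std::string name;
    int age;
};

template<class Hash>
void hash_append(Hash& h, person const& p)
{
    hash_append(h, p.name);
    hash_append(h, p.age);
}

// Generic entry point: hash any type with any algorithm.
template<class Hash, class T>
auto hash_value(T const& v)
{
    Hash h;
    hash_append(h, v);
    return h.result();
}
```

The payoff is that `person` never mentions FNV-1a: hashing it with a different algorithm only means changing the
template argument passed to `hash_value`.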

Working on Hash2 is fun and engaging, and more importantly, it finally gives C++ developers something reputable that makes
hashing as easy as it should be. The only problem with Hash2 that I can think of, as a user of Boost, is that it took
until 2024 (and now basically 2025) for Boost to have simple and effective hashing routines.

For example,

```cpp
#include <boost/hash2/sha2.hpp>

#include <string>
#include <vector>

// Returns the SHA-256 digest of `buf` as a hex string.
std::string get_digest(std::vector<char> const& buf)
{
    boost::hash2::sha2_256 h;
    h.update(buf.data(), buf.size());
    return to_string(h.result()); // unqualified: found by argument-dependent lookup
}
```

Very simple, nice and clean, and it does what it says on the box.

The version of the library that was accepted is also far from final. The library will continue to evolve,
its quality of implementation will be iterated on, and its interfaces will naturally be refined. It's good for reviewers and
authors of Boost libraries to keep in mind that libraries aren't static things carved out of stone. The accepted
version of a Boost library seldom resembles the version four releases down the line. Boost.Asio is probably the most
emblematic example of this, having undergone dramatic refactors over the years.

One thing I'm particularly looking forward to is experimenting with the SHA-256 intrinsics available on certain Intel CPUs, but
that'll come later, once the base SHA-2 family has had a nice performance overhaul and maybe a few new algorithms have been added
to the collection.
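
Using such intrinsics in a portable library usually means detecting support at runtime and falling back to the portable
code otherwise. As a rough sketch (GCC/Clang on x86 only; `cpu_has_sha_extensions`, `sha256_backend`, and
`pick_sha256_backend` are hypothetical names, not anything in Hash2), the check reads the SHA feature bit from CPUID leaf 7:

```cpp
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#  define SKETCH_X86_GNU 1
#  include <cpuid.h>
#else
#  define SKETCH_X86_GNU 0
#endif

// Hypothetical helper: true if CPUID reports the SHA extensions
// (leaf 7, sub-leaf 0, EBX bit 29). Always false on other targets/compilers.
bool cpu_has_sha_extensions()
{
#if SKETCH_X86_GNU
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false; // CPUID leaf 7 is not available on this CPU
    return (ebx & (1u << 29)) != 0;
#else
    return false;
#endif
}

enum class sha256_backend { portable, sha_extensions };

// Decide once, then reuse the cached answer on every call.
sha256_backend pick_sha256_backend()
{
    static const sha256_backend backend =
        cpu_has_sha_extensions() ? sha256_backend::sha_extensions : sha256_backend::portable;
    return backend;
}
```

A real dispatch would typically also check the SSSE3/SSE4.1 bits, since SHA-256 code built on these intrinsics usually relies
on them as well.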
## Boost.Regex

I've also started working with Boost.Regex author John Maddock to squash CVEs filed against Boost.Regex for bugs found by Google's OSS-Fuzz
project, which is a wonderful contribution to the world of open source.

While as developers we may use regexes in our day-to-day lives, actually implementing a regex engine is an entirely different world.
Learning what goes into this has been fascinating, and I have to admit I'm tremendously humbled by John's prowess and ability
to navigate the complexities of state machine construction. In a similar vein, I'm equally impressed by just how effective fuzzing is
at crafting input. Like most modern software developers, I've known about fuzzing for a good while now, but I'd never stopped to
sit down and truly appreciate just how valuable these tools are.
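
To make the fuzzing point concrete, here's roughly what a libFuzzer-style harness for a regex engine looks like. This is a
sketch, not the harness OSS-Fuzz actually runs against Boost.Regex: it uses `std::regex` as a stand-in so it's self-contained
(`std::regex` was modeled on Boost.Regex, so a `boost::regex` version has the same shape), and splitting the input into a
pattern half and a subject half is an arbitrary choice for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>

// libFuzzer entry point: called repeatedly with mutated byte strings.
// Any crash, hang, or sanitizer report while running it is a finding.
extern "C" int LLVMFuzzerTestOneInput(std::uint8_t const* data, std::size_t size)
{
    // Illustrative split: first half is the pattern, second half is the subject text.
    std::size_t half = size / 2;
    std::string pattern(reinterpret_cast<char const*>(data), half);
    std::string subject(reinterpret_cast<char const*>(data) + half, size - half);

    try
    {
        std::regex re(pattern);
        std::smatch m;
        (void)std::regex_search(subject, m, re);
    }
    catch (std::regex_error const&)
    {
        // Malformed patterns are expected; rejecting them cleanly is correct behavior.
    }
    return 0; // libFuzzer expects 0 from this callback
}
```

Built with something like `clang++ -fsanitize=fuzzer,address`, libFuzzer supplies `main` and keeps mutating inputs toward
new code paths, which is exactly how it ends up crafting patterns no human would think to write.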

One of the first things I had to do for the repo was get it to a place where clangd could handle the generated `compile_commands.json`.
clangd is one of the better developer tools to have come out, but it has a caveat: it can't handle non-self-contained header files, and old-school Boost
libraries like to `#include` various implementation headers at the bottom. Fixing this for clangd normally requires either recursive `#include`s or simply not
using implementation headers. In most cases, it's easiest to accept the recursion and tame it by adding `#pragma once`. Because this is
Boost, the Config module even has the macros needed to detect when `#pragma once` is available, so we can support all the compilers and all the
toolchains no matter what.
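
Here's a sketch of the shape of that fix, using hypothetical `boost/foo` headers rather than Boost.Regex's real file names:
the implementation header includes its public header back so clangd can parse it on its own, and `#pragma once` is emitted
only where Boost.Config's `BOOST_HAS_PRAGMA_ONCE` says the compiler supports it, alongside conventional include guards.

```cpp
// ---- boost/foo/foo.hpp (hypothetical public header) ----
#ifndef BOOST_FOO_FOO_HPP
#define BOOST_FOO_FOO_HPP

#include <boost/config.hpp>
#ifdef BOOST_HAS_PRAGMA_ONCE
#pragma once
#endif

namespace boost { namespace foo {
inline int twice(int x);
}} // namespace boost::foo

// Old-school layout: the implementation is pulled in at the bottom.
#include <boost/foo/detail/foo_impl.hpp>

#endif // BOOST_FOO_FOO_HPP

// ---- boost/foo/detail/foo_impl.hpp (hypothetical implementation header) ----
#ifndef BOOST_FOO_DETAIL_FOO_IMPL_HPP
#define BOOST_FOO_DETAIL_FOO_IMPL_HPP

#include <boost/config.hpp>
#ifdef BOOST_HAS_PRAGMA_ONCE
#pragma once
#endif

// Include the public header back so this file is self-contained when clangd
// opens it directly; the guards and #pragma once stop the recursion after one level.
#include <boost/foo/foo.hpp>

namespace boost { namespace foo {
inline int twice(int x) { return 2 * x; }
}} // namespace boost::foo

#endif // BOOST_FOO_DETAIL_FOO_IMPL_HPP
```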

I look forward to continuing to work with John on Regex, but in the interim, I'm having fun taking a Hash2 break.

-- Christian
