---
layout: post
nav-class: dark
categories: christian
title: "Hashing and Matching"
author-id: christian
author-name: Christian Mazakas
---

## Boost.Hash2

I'm happy to report that the library I helped Peter Dimov develop, [Hash2](https://github.com/pdimov/hash2), was accepted
after what was probably one of the most thorough Boost reviews in recent history.

I can't claim to have contributed much to the design. After all, Hash2 is an implementation of
[Types Don't Know #](https://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html).
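
For readers who haven't read N3980, its central idea is that a type only describes which bytes it contributes to a hash, while
the caller picks the hash algorithm. The sketch below shows that split in a deliberately simplified form. It is not Hash2's
actual interface. `fnv1a_64` and `point_label` are hypothetical names, and FNV-1a is used only because it fits in a few lines.
The `update(void const*, std::size_t)` member mirrors the one Hash2's algorithms expose. The `hash_append` name comes from
N3980, but these overloads are simplified for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical hash algorithm: 64-bit FNV-1a. It consumes raw bytes and
// knows nothing about the types being hashed.
class fnv1a_64
{
public:
    void update(void const* data, std::size_t n)
    {
        auto const* p = static_cast<unsigned char const*>(data);
        for (std::size_t i = 0; i < n; ++i)
        {
            state_ ^= p[i];
            state_ *= 0x100000001b3ull; // FNV 64-bit prime
        }
    }

    std::uint64_t result() const { return state_; }

private:
    std::uint64_t state_ = 0xcbf29ce484222325ull; // FNV 64-bit offset basis
};

// Types describe their bytes; the algorithm is just a template parameter.
template <class Hash>
void hash_append(Hash& h, std::uint64_t v)
{
    h.update(&v, sizeof v); // native byte order: good enough for a sketch
}

template <class Hash>
void hash_append(Hash& h, std::string const& s)
{
    h.update(s.data(), s.size());
    // Append the length so that {"ab", "c"} and {"a", "bc"} hash differently.
    hash_append(h, static_cast<std::uint64_t>(s.size()));
}

// Hypothetical user type: it only lists its members, in order.
struct point_label
{
    std::uint64_t x;
    std::uint64_t y;
    std::string name;
};

template <class Hash>
void hash_append(Hash& h, point_label const& p)
{
    hash_append(h, p.x);
    hash_append(h, p.y);
    hash_append(h, p.name);
}
```

`point_label`'s `hash_append` is a template over `Hash`. You can therefore replace `fnv1a_64` with any other type that has
the same `update`/`result` shape without touching `point_label` at all. That is the decoupling the paper argues for.
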
What I did do was help implement a myriad of algorithms and help with the absolutely massive testing burden.
I think people who don't write and maintain Boost libraries all day underestimate just how much testing
even 10 extra lines of source can require. When you write software to a certain minimum bar of quality,
far more of your time and effort goes into testing than into anything else. That's because if a Boost library gets something wrong,
there's really no good way to unwind it. Bad versions of `libboost-dev` will already have gone out, packagers then
need to re-package, and the whole thing becomes a huge debacle for users and packagers alike.

Working on Hash2 has been fun and engaging. More importantly, it finally gives C++ developers something reputable that makes
hashing as easy as it should be. The only problem with Hash2 I can think of, as a user of Boost, is that it took
until 2024 (and now basically 2025) for Boost to have simple and effective hashing routines.

For example:
```cpp
#include <boost/hash2/sha2.hpp>
#include <string>
#include <vector>

// Returns the SHA-256 digest of buf as a string.
std::string get_digest(std::vector<char> const& buf)
{
    boost::hash2::sha2_256 h;
    h.update(buf.data(), buf.size());
    return to_string(h.result()); // unqualified: relies on argument-dependent lookup for the digest type
}
```

Very simple, nice and clean, and it does exactly what it says on the box.

The accepted version of the library is also far from final. The library will continue to evolve: its quality of implementation
will be iterated on and its interfaces will naturally be refined. Reviewers and authors of Boost libraries should keep in mind
that libraries aren't static things carved in stone. The version of a Boost library that gets accepted seldom resembles the
version shipping four releases down the line. Boost.Asio is probably the most emblematic example of this, having undergone
dramatic refactors over the years.

One thing I'm particularly looking forward to is experimenting with the SHA-256 intrinsics available on certain Intel CPUs.
That will come later, though, once the base SHA-2 family has had a nice performance overhaul and perhaps a few new algorithms
have been added to the collection.
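
Before an intrinsics-based code path can be used, the CPU has to be checked at run time for Intel's SHA extensions. Here is a
minimal sketch of that check. It assumes GCC or Clang on x86, where the compiler-provided `<cpuid.h>` offers
`__get_cpuid_count`. `has_sha_extensions` is a hypothetical name and not part of Hash2.

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

// Hypothetical helper: true if the CPU reports the SHA extensions.
// Sketch only: GCC/Clang on x86; every other configuration reports false.
inline bool has_sha_extensions()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // CPUID leaf 7, sub-leaf 0: structured extended feature flags.
    // __get_cpuid_count returns 0 if the CPU does not support leaf 7.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    {
        return false;
    }
    return (ebx & (1u << 29)) != 0; // EBX bit 29 reports SHA support
#else
    return false;
#endif
}
```

A real implementation would probably call this once and dispatch either to a portable compression function or to one built
on the SHA intrinsics. Those intrinsics are `_mm_sha256rnds2_epu32`, `_mm_sha256msg1_epu32` and `_mm_sha256msg2_epu32` from
`<immintrin.h>`, enabled with `-msha` on GCC and Clang. MSVC would need `__cpuidex` from `<intrin.h>` for the check instead.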

## Boost.Regex

I've also started working with John Maddock, the author of Boost.Regex, to squash the CVEs filed against Boost.Regex for bugs
found by Google's OSS-Fuzz project, which is a wonderful contribution to the world of open source.

We developers may use regexes in our day-to-day work, but actually implementing a regex engine is an entirely different world.
Learning what goes into one has been fascinating, and I have to admit I'm tremendously humbled by John's skill at navigating
the complexities of state machine construction. In a similar vein, I'm equally impressed by how effective fuzzing is at crafting
inputs that trip up the engine. Like most modern software developers, I've known about fuzzing for a good while, but I'd never
stopped to truly appreciate just how valuable these tools are.
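
Here is a minimal libFuzzer-style harness that gives a flavor of how a fuzzer drives a regex engine. It is a sketch, not
Boost.Regex's actual fuzz target. It uses `std::regex` as a stand-in so it has no dependencies. Splitting the input at the
first newline into a pattern and a subject is an arbitrary choice made for illustration. libFuzzer repeatedly calls
`LLVMFuzzerTestOneInput` with mutated byte strings and reports any crash, hang, or sanitizer error.

```cpp
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>

// Entry point libFuzzer calls once per generated input.
// std::regex stands in for Boost.Regex in this sketch.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size)
{
    std::string input(reinterpret_cast<const char*>(data), size);

    // Hypothetical input layout: pattern, then '\n', then the subject text,
    // so the fuzzer mutates both at once.
    auto nl = input.find('\n');
    std::string pattern = input.substr(0, nl);
    std::string subject = (nl == std::string::npos) ? std::string() : input.substr(nl + 1);

    try
    {
        std::regex re(pattern);
        std::smatch m;
        std::regex_search(subject, m, re);
    }
    catch (const std::regex_error&)
    {
        // Malformed patterns are expected; only crashes, hangs,
        // or sanitizer reports count as findings.
    }
    return 0; // libFuzzer expects 0 for inputs that were processed
}
```

Building it with `clang++ -g -O1 -fsanitize=fuzzer,address` links in libFuzzer's driver, which then generates and mutates
inputs on its own. OSS-Fuzz runs harnesses of this shape continuously.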

One of the first things I had to do for the repo was get it to a point where clangd could handle the generated `compile_commands.json`.
clangd is one of the better developer tools out there, but it has a caveat: it can't handle headers that aren't self-contained. Old-school
Boost libraries like to `#include` various implementation headers at the bottom of their public headers. Fixing this for clangd normally
means either making those headers `#include` each other recursively or not using implementation headers at all. In most cases, it's
easiest to accept the recursion and make it safe by simply adding `#pragma once`. Because this is Boost, Boost.Config even provides a
macro, `BOOST_HAS_PRAGMA_ONCE`, that detects when `#pragma once` is available, so we can keep supporting every compiler and toolchain
no matter what.

I look forward to continuing to work with John on Regex, but in the interim, I'm having fun taking a break from Hash2.

-- Christian