Skip to content

Conversation

@thomasjm
Copy link
Contributor

Hi @Wulf , thanks for merging #116! However, the one after that, #117, somehow removed the "resolved" and "integrity" fields from package-lock.json. This makes it impossible to package in Nixpkgs.

In this PR I just ran a clean npm install to regenerate package-lock.json with these fields.

@thomasjm
Copy link
Contributor Author

@Wulf? I'd really like to fix the breakage from #117 so that I can package this downstream.

@Wulf
Copy link
Owner

Wulf commented Dec 23, 2025

@thomasjm thanks for your patience here. This should be fixed on the main branch :)

Please let me know if there's anything else I can help with

@Wulf Wulf closed this Dec 23, 2025
@thomasjm
Copy link
Contributor Author

Thanks @Wulf! There is something actually...after I made this PR I believe I found a memory safety bug, which can cause nodehun to segfault.

Let me talk you through it in the order I worked through this in my main branch. First I added 3d1fa37 and f6a00b6, to run CI on a range of Node versions from 18.x from 25.x.

After this I started noticing segfaults in CI, more commonly on certain Node.js versions and (IIRC) more commonly on macOS. I ultimately traced it to this line:

context = new HunspellContext(new Hunspell(affixBuffer.Data(), dictionaryBuffer.Data(), NULL, true));

AFAICT, calling .Data() on one of these buffers returns the underlying data buffer, but does not ensure it is null-terminated as Hunspell expects! So when Hunspell is trying to process the dictionary data, it will read off the end until the next NULL, which can cause a memory access error. (And probably a potential security vulnerability.)

One fix is to convert these data buffers to a std::string, like this:

    std::string affixStr(affixBuffer.Data(), affixBuffer.Length());
    std::string dictionaryStr(dictionaryBuffer.Data(), dictionaryBuffer.Length());

and then pass that to Hunspell. That's basically what I did in 349a8bf. But because this involves making an additional copy, I thought it seemed inefficient so I also started working on an alternative API option where you just pass the file paths in, which Hunspell can also accept. I'm not sure if this is the best API decision, but it allows almost everything to remain unchanged and accept either a buffer (as it does currently) or a path. I just had to one new path-based function called addDictionaryPath (and addDictionaryPathSync), as an analogue to addDictionary/addDictionarySync.

There are also a couple commits where I totally refactor the tests into individual files, in a way that makes it easier to test both the buffer and path-based approaches. Not sure if you'll want this stuff though.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants