ROX-33198: Instrument inode tracking on file open lsm hook#391
ROX-33198: Instrument inode tracking on file open lsm hook#391JoukoVirtanen wants to merge 19 commits intomainfrom
Conversation
tests/test_inode_tracking.py
Outdated
| """ | ||
| cwd = os.getcwd() | ||
| config = { | ||
| 'paths': [f'{monitored_dir}/**', '/mounted/**', '/container-dir/**'], |
There was a problem hiding this comment.
I am not sure what globbing to use here.
| } | ||
|
|
||
| unsigned long magic = inode->i_sb->s_magic; | ||
| unsigned long magic = BPF_CORE_READ(inode, i_sb, s_magic); |
There was a problem hiding this comment.
inode_to_key needs to be able to use untrusted pointers, so that it can get the key for a parent inode. To get it to be able to use untrusted pointers, BPF_CORE_READ is used.
tests/test_editors/test_nvim.py
Outdated
| fut = '/mounted/test.txt' | ||
| fut_host = f'{ignored_dir}/test.txt' | ||
| fut_backup = f'{fut}~' | ||
| fut_backup_host = f'{ignored_dir}/test.txt~' |
There was a problem hiding this comment.
fut_host and fut_backup_host should probably be blank.
tests/test_editors/test_sed.py
Outdated
| def test_sed(vi_container, server, ignored_dir): | ||
| # File Under Test | ||
| fut = '/mounted/test.txt' | ||
| fut_host = f'{ignored_dir}/test.txt' |
There was a problem hiding this comment.
Same as above.
Molter73
left a comment
There was a problem hiding this comment.
There seems to be a lot of style changes, is this due to you using a newer Rust version? Can you check what version you are using? Is it 1.92+? If this is due to a version difference and yours is newer, can you open a new PR with those style changes? that way we can focus this PR on the actual inode tracking.
Actually, on a second read-through, it might also be related to the bump to version 2024, if so, please move the bump to a separate PR and we can stack this one on top of that one, sorry for the inconvenience! |
Molter73
left a comment
There was a problem hiding this comment.
I still need to go through the integration test changes, but I see them failing in the PR, do you need help debugging the failures?
| self.update_entry(path.as_path()).with_context(|| { | ||
| format!("Failed to update entry for {}", path.display()) | ||
| })?; | ||
| } else if path.is_dir() { |
There was a problem hiding this comment.
This is slightly related to a conversation I've been having with @Stringy. IMO, in the case of finding a directory that matches a glob pattern, we should recursively scan into it, the reason being that tracking the directory itself isn't of much value if the pattern doesn't hit any files inside the directory.
For now we can leave it like this, but I think we need to take a close look at what users might expect when configuring different types of glob patterns (i.e do they want that strict set of patterns to be monitored, or are they expressing a set of paths that work as the base to be monitored?)
| /// We use the parent inode provided by the eBPF code | ||
| /// to look up the parent directory's host path, then construct the full | ||
| /// path by appending the new file's name. | ||
| fn handle_creation_event(&self, event: &Event) -> anyhow::Result<()> { |
There was a problem hiding this comment.
I'm hoping this is still WIP and the debug statements are for testing, we really don't need a debug statement on each return condition, particularly not in the hot path where monitoring sets of files mildly noisy will just flood the logs and make it hard to see what is actually going on.
Please cut down on the debug statements, if something is expected to happen, don't put a statement there.
There was a problem hiding this comment.
I tried removing all the debug statements in this method and managed to boil it down from 50+ lines to just this:
let inode = event.get_inode();
let parent_inode = event.get_parent_inode();
if self.get_host_path(Some(inode)).is_some() || parent_inode.empty() {
return Ok(());
}
if let Some(filename) = event.get_filename().file_name()
&& let Some(parent_host_path) = self.get_host_path(Some(parent_inode))
{
let host_path = parent_host_path.join(filename);
self.update_entry_with_inode(inode, host_path)
.with_context(|| {
format!(
"Failed to add creation event entry for {}",
filename.display()
)
})?;
}
Ok(())Given this is a lot more compact and easy to ready and the additional verbosity in the logs the implementation in the PR doesn't bring much of a benefit IMO, I would propose we try to move the implementation closer to this.
There was a problem hiding this comment.
I have made this change, except I was not able to combine the if statements because the edition is still 2021.
There was a problem hiding this comment.
The PR bumping the edition is already in the main branch, you can rebase and get this working.
8fc3c95 to
4f1686a
Compare
The CI tests are now passing. I think there might be a flake involving unlinking, but that might need to be fixed in the future PR for the file_unlink LSM hook. |
… can now use untrusted pointers
…et the inode and then add to the map
cd4df3f to
2b139ff
Compare
There was a problem hiding this comment.
Could we unify the logic for checking if an inode is monitored in inode_is_monitored? Just have that method take both the inode and parent_inode, then return one of MONITORED, NOT_MONITORED or PARENT_MONITRED values? I think that might be a bit cleaner and we don't need a dedicated _with_parent function.
| } | ||
|
|
||
| /// Similar to update_entry except we are are directly using the inode instead of the path. | ||
| fn update_entry_with_inode(&self, inode: &inode_key_t, path: PathBuf) -> anyhow::Result<()> { |
There was a problem hiding this comment.
I don't think we need to take inode by reference here, if anything, it is causing the call from update_entry to create the inode and then copy it inside here, meaning we copy the data twice. On the other hand, the other usage in handle_creation_event gets a reference to an inode_key_t because it gets if from event.get_inode(), if anything, handle_creation_event should create the copy explicitly instead.
| f'{monitored_dir}', | ||
| f'{monitored_dir}/**/*', | ||
| '/mounted', | ||
| '/mounted/**/*', | ||
| '/container-dir', | ||
| '/container-dir/**/*', |
There was a problem hiding this comment.
Can you explain why we need these many monitored paths? Is the one with wildcards not hitting the parent directory?
Description
Host file paths need to be reported correctly. When a file is created in a tracked directory its inode should be added to a hash set in kernel space. In user space an entry needs to be added into a map with the inode as the key and file path as the value.
Checklist
Automated testing
If any of these don't apply, please comment below.
Integration tests were modified.
Testing Performed
The following command was run in the main branch and this branch.
The following commands were also run
One of the tests was run with bye_world.txt instead.
In the main branch
{ "timestamp": 1773981129521946307, "hostname": "jvirtane-thinkpadp1gen3.rmtuswa.csb", "process": { "comm": "vim", "args": [ "/usr/bin/vim", "bye_world.txt" ], "exe_path": "/usr/bin/vim", "container_id": null, "uid": 0, "username": "root", "gid": 0, "login_uid": 4203274, "pid": 1252416, "in_root_mount_ns": true, "lineage": [ { "uid": 4203274, "exe_path": "/usr/bin/sudo" }, { "uid": 4203274, "exe_path": "/usr/bin/sudo" } ] }, "file": { "Chmod": { "inner": { "filename": "/etc/sensitive-files/bye_world.txt", "host_file": "", "inode": { "inode": 0, "dev": 0 } }, "new_mode": 33188, "old_mode": 33188 } } }Note that the
host_fileis blank.In this branch
{ "timestamp": 1773980922622536024, "hostname": "jvirtane-thinkpadp1gen3.rmtuswa.csb", "process": { "comm": "vim", "args": [ "/usr/bin/vim", "hello_world.txt" ], "exe_path": "/usr/bin/vim", "container_id": null, "uid": 0, "username": "root", "gid": 0, "login_uid": 4203274, "pid": 1251608, "in_root_mount_ns": true, "lineage": [ { "uid": 4203274, "exe_path": "/usr/bin/sudo" }, { "uid": 4203274, "exe_path": "/usr/bin/sudo" } ] }, "file": { "Chmod": { "inner": { "filename": "/etc/sensitive-files/hello_world.txt", "host_file": "/etc/sensitive-files/hello_world.txt", "inode": { "inode": 74960529, "dev": 37 }, "parent_inode": { "inode": 0, "dev": 0 } }, "new_mode": 33188, "old_mode": 33188 } } }Note that
host_fileis now populated.