Plug sysroot bypasses in at-family path syscalls #27
Merged
A path-translation audit on the *at()/readlinkat/sysroot surface that
does not flow through openat2 found a consistent set of gaps:
sys_fchmodat, sys_fchownat, sys_mknodat, sys_utimensat, sys_chdir, and
the four sys_*xattr path handlers passed absolute guest paths straight
to the host syscall, skipping path_resolve_sysroot_*. A guest running
chmod("/etc/foo") under --sysroot=/opt/sr therefore hit the host's
/etc/foo rather than /opt/sr/etc/foo, contradicting the redirect
contract every other path syscall already implements (sys_openat_path,
sys_truncate, sys_statfs, stat_at_path).
Route each handler through the matching resolver: _nofollow when
AT_SYMLINK_NOFOLLOW is set or for the lxattr variants, _create for
mknodat since the target may not yet exist, and _path for everything
else.
sys_chroot accepted any guest-supplied path, stat()'d it on the host,
and stored it via proc_set_sysroot without checking that the new root
was reachable from the current one. A guest already running under
--sysroot=/opt/sr could call chroot("/etc") and pivot to the host's
/etc with no containment check. Reduce sys_chroot to a chroot("/")
no-op and return -EPERM for any other path; this still satisfies the
original motivation (coreutils stdbuf does fork -> chroot("/") -> exec)
without keeping the pivot path. Real Linux chroot requires
CAP_SYS_CHROOT, which the guest does not have in elfuse's
single-process VM.
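The reduced chroot behaviour might look roughly like this. The function name and signature are hypothetical, and the exact string comparison against "/" and the -EFAULT on a NULL pointer are assumptions rather than details taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler: returns 0 on success or a negative errno value. */
long sys_chroot_sketch(const char *guest_path)
{
    if (guest_path == NULL)
        return -EFAULT;   /* assumption: treat a missing path as a bad pointer */

    /* chroot("/") keeps the current root, so it is safe to accept as a no-op. */
    if (strcmp(guest_path, "/") == 0)
        return 0;

    /* Anything else would pivot the root; refuse as if CAP_SYS_CHROOT
     * were missing. */
    return -EPERM;
}
```

Under this sketch the stdbuf sequence (fork, chroot("/"), exec) still succeeds, while chroot("/etc") fails with -EPERM.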
The sysroot_path static buffer was mutated by proc_set_sysroot and
read by proc_get_sysroot with no lock. A sibling vCPU running chroot
during another vCPU's path resolution could tear the snprintf source
underneath the consumer. Add sysroot_lock and a copying
proc_sysroot_snapshot helper, then migrate proc_resolve_sysroot_*,
fork-state.c, and exec.c to the snapshot. proc_get_sysroot stays
as-is for the NULL-test fast paths (path[0] != '/' early returns)
that tolerate a racy read; the docstring spells out the contract.
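A lock plus a copying snapshot helper could be sketched as below. The identifiers carry a _sketch suffix because their real signatures are not shown here; the pthread mutex, the 4096-byte buffer size, and the separate sysroot_set flag are all assumptions.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SYSROOT_MAX 4096  /* assumed buffer size */

static char sysroot_path[SYSROOT_MAX];
static bool sysroot_set;
static pthread_mutex_t sysroot_lock = PTHREAD_MUTEX_INITIALIZER;

/* Store a new sysroot (or clear it with NULL) under the lock.
 * Returns 0 on success, -1 if the path does not fit. */
int proc_set_sysroot_sketch(const char *path)
{
    int rc = 0;

    pthread_mutex_lock(&sysroot_lock);
    if (path == NULL) {
        sysroot_set = false;
        sysroot_path[0] = '\0';
    } else {
        int n = snprintf(sysroot_path, sizeof sysroot_path, "%s", path);
        if (n < 0 || (size_t)n >= sizeof sysroot_path) {
            /* Never leave a truncated root behind. */
            sysroot_set = false;
            sysroot_path[0] = '\0';
            rc = -1;
        } else {
            sysroot_set = true;
        }
    }
    pthread_mutex_unlock(&sysroot_lock);
    return rc;
}

/* Copy the current sysroot into a caller-owned buffer under the lock, so
 * the caller never reads the shared buffer while it is being rewritten.
 * Returns false if no sysroot is set or the copy would be truncated. */
bool proc_sysroot_snapshot_sketch(char *out, size_t outsz)
{
    bool ok = false;

    pthread_mutex_lock(&sysroot_lock);
    if (sysroot_set) {
        int n = snprintf(out, outsz, "%s", sysroot_path);
        ok = (n >= 0 && (size_t)n < outsz);
    }
    pthread_mutex_unlock(&sysroot_lock);
    return ok;
}
```

A resolver would take one snapshot at entry and build the host path from that private copy, so a concurrent chroot on a sibling vCPU cannot change the prefix midway through the snprintf.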
Also canonicalize the sysroot once at proc_set_sysroot time via
realpath so subsequent containment checks compare against a stable
form. The /lib/basename retry inside proc_resolve_sysroot_path_flags
was an early hack that masked containment errors when the dynamic
linker walked DT_RPATH/RUNPATH/LD_LIBRARY_PATH for itself; drop it.
Summary by cubic
Fixes sysroot bypass in at-family path syscalls and restricts chroot to chroot("/") to prevent escaping the configured root. Adds locking and snapshots for sysroot updates and hides the sysroot prefix from guest-visible paths.
Bug Fixes
- Apply sysroot resolution to fchmodat, fchownat, mknodat (use create resolver), utimensat, chdir, and getxattr/setxattr/listxattr/removexattr (respect nofollow).
- Hide the sysroot prefix in /proc/self/exe, getcwd, and /proc/self/cwd so guests see guest paths.
- Resolve /dev/shm/<name> via a validated helper to block invalid suffixes.
Refactors
- Restrict chroot to "/" (no-op); return -EPERM for any other path.
- Add sysroot_lock and proc_sysroot_snapshot; canonicalize sysroot with realpath; migrate resolvers, execve, and fork state to snapshots; remove the /lib/<basename> fallback in the resolver.
- Add the proc_dev_shm_resolve API and a test-sysroot-chdir target.
Written for commit ca9c1be.