In a previous post, I described the development of a file system walker. It returns files from a directory recursively while considering ignore patterns.
In this post, I’ll update it. Some of the requirements have changed, and I need new features:
walk_serial currently takes a Sender and uses channels to send the results. This is confusing, as it doesn’t use any parallelism to traverse the file system.
There are long-running processes that modify the file system, which can affect behavior. We need to be notified about file system changes while using identical ignore rules.
For the first change, I’ll convert the signature of walk_serial to:
pub fn walk_serial(
given: IgnoreRules,
dir: &Path,
walk_options: &WalkOptions,
res_paths: &mut Vec<Result<PathMetadata>>,
) -> Result<IgnoreRules>
In the first version, instead of res_paths, we had a sender through which the results were sent.
Why don’t I simply return the paths instead of receiving a mut Vec parameter to fill? While it might be simpler, I don’t like the idea of creating a new Vec for each function call. The overhead of creating new vectors and merging them can become a burden for large directory trees. Instead, each call of the function works on the same vector and expands its capacity when necessary. Since the current Vec strategy is to double the capacity when needed, there are fewer memory allocations for complex and large directory hierarchies.
Other than this, instead of Sender::send, the function uses Vec::push to add elements to the vector. The rest of the implementation is similar.
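As a rough illustration of the shared-vector pattern described above, here is a minimal sketch that uses only the standard library. The function name collect_files is hypothetical, and it deliberately leaves out ignore rules, WalkOptions, and metadata; it only shows how every recursive call pushes into the same Vec instead of allocating and merging its own.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified stand-in for `walk_serial`: no ignore rules and
/// no metadata, only the accumulation into a caller-provided vector.
pub fn collect_files(dir: &Path, res_paths: &mut Vec<io::Result<PathBuf>>) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) => {
            // Record the failure and let the caller decide what to do with it.
            res_paths.push(Err(e));
            return;
        }
    };
    for entry in entries {
        match entry {
            Ok(entry) => {
                // `file_type` does not follow symlinks, which avoids cycles.
                let is_dir = entry.file_type().map(|t| t.is_dir()).unwrap_or(false);
                if is_dir {
                    // The recursive call pushes into the same vector.
                    collect_files(&entry.path(), res_paths);
                } else {
                    res_paths.push(Ok(entry.path()));
                }
            }
            Err(e) => res_paths.push(Err(e)),
        }
    }
}
```

The caller creates one Vec, passes it as &mut to the top-level call, and reads the results from it afterwards.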
One major difference in this version is that the function returns an IgnoreRules value instead of (). This contains the compiled ignore rules from all directories below the given directory. We can use this returned value to further check the paths we get from the notifier we’ll develop.
Adding a file system notifier
For the second change, we need to access the underlying OS machinery for file system notifications. My research led me to the notify crate. Although it seems to be undergoing a major overhaul as of this writing, it’s the only promising option other than interfacing directly with the OS.
Again, we begin by defining the enums and structs for this problem.
What do we want? We want to collect file system events to update the Vec<PathMetadata> that was built with walk_serial or walk_parallel. PathMetadata was defined as:
pub struct PathMetadata {
pub path: PathBuf,
pub metadata: Metadata,
}
We could build a HashMap<PathBuf, Metadata> or keep this as a vector, but for our purposes, we want to track new events. These events correspond to creating a new file, updating metadata (size, timestamp), or deleting a file. We don’t need, for example, permission changes, as we primarily want to track which file is newer or whose size has changed.
pub enum PathEvent {
Create { path: PathBuf, metadata: Metadata },
Update { path: PathBuf, metadata: Metadata },
Delete { path: PathBuf },
}
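To make the intent of these three variants concrete, the following sketch applies events to a snapshot kept in a HashMap. It is an assumption about how a consumer might use PathEvent, not code from the walker itself. FileInfo is a hypothetical stand-in for std::fs::Metadata (which cannot be constructed by hand), and apply_event is a hypothetical helper name.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for `std::fs::Metadata`.
#[derive(Debug, Clone, PartialEq)]
pub struct FileInfo {
    pub size: u64,
}

/// Same shape as `PathEvent` above, with `FileInfo` in place of `Metadata`.
#[derive(Debug, Clone, PartialEq)]
pub enum PathEvent {
    Create { path: PathBuf, metadata: FileInfo },
    Update { path: PathBuf, metadata: FileInfo },
    Delete { path: PathBuf },
}

/// Brings the snapshot up to date with a single event.
pub fn apply_event(snapshot: &mut HashMap<PathBuf, FileInfo>, event: PathEvent) {
    match event {
        // Create and Update both leave the latest metadata in the snapshot.
        PathEvent::Create { path, metadata } | PathEvent::Update { path, metadata } => {
            snapshot.insert(path, metadata);
        }
        // Deleting a path that was never tracked is a no-op.
        PathEvent::Delete { path } => {
            snapshot.remove(&path);
        }
    }
}
```

With a map, Create and Update collapse into the same insert; the distinction still matters to consumers that want to react differently to new files.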
notify provides an abstraction over the file system notification backends of different operating systems. RecommendedWatcher is the watcher type it selects for the current platform, and it provides a consistent interface for events. A watcher is built from a value that implements the EventHandler trait.
Since we need to check whether paths are ignored before reporting them, the watcher should receive an IgnoreRules struct.
pub fn make_watcher(
ignore_rules: IgnoreRules,
) -> Result<(RecommendedWatcher, Receiver<PathEvent>)> {
It returns a RecommendedWatcher and a (crossbeam) Receiver for the PathEvent we defined above. The implementation creates a channel, initializes the watcher, and starts watching.
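    // Bounded channel: at most 10,000 events are buffered for the consumer.
    let (sender, receiver) = bounded(10000);
    let root = ignore_rules.root.clone();
    // The handler takes ownership of the rules and the sending half.
    let mut watcher = notify::recommended_watcher(PathEventHandler {
        ignore_rules,
        sender,
    })?;
    watcher.watch(&root, RecursiveMode::Recursive)?;
    Ok((watcher, receiver))
}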
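The channel above is bounded, so it is worth being aware of what happens when the consumer falls behind: a blocking send waits once the buffer is full. The sketch below illustrates that behavior with the standard library's sync_channel, used here only as a stand-in for crossbeam's bounded; fill_until_full is a hypothetical helper name.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Fills a bounded channel without ever receiving from it and returns how
/// many messages were accepted before the channel reported being full.
pub fn fill_until_full(capacity: usize) -> usize {
    // Keep the receiver alive so the channel stays connected.
    let (sender, _receiver) = sync_channel::<u32>(capacity);
    let mut accepted = 0;
    loop {
        match sender.try_send(0) {
            Ok(()) => accepted += 1,
            // A blocking `send` would wait here until the consumer catches up.
            Err(TrySendError::Full(_)) => return accepted,
            Err(TrySendError::Disconnected(_)) => return accepted,
        }
    }
}
```

A generous capacity such as 10000 makes blocking unlikely in practice, but the consumer still needs to keep draining the receiver.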
PathEventHandler is the struct that implements the EventHandler trait. The trait requires a single method:
fn handle_event(&mut self, event: notify::Result<Event>)
notify::Event is a struct whose kind field is a hierarchy of enums describing the type of file system event. The event handler is expected to discriminate between these events and react accordingly.
pub struct Event {
pub kind: EventKind,
pub paths: Vec<PathBuf>,
pub attrs: EventAttributes,
}
We’re interested in filtering events by EventKind, checking whether the paths are ignored, and sending the path and metadata through a channel. We have three kinds of events in PathEvent. PathEventHandler implements event filtering and calls handlers for these PathEvents.
The struct has methods that correspond to one or more PathEvent variants. These are create_event, write_event, remove_event, and rename_event. The first three correspond directly to the variants of PathEvent, and the last one sends both a PathEvent::Delete and a PathEvent::Create.
match event.kind {
notify::EventKind::Create(_) => self.create_event(event.paths[0].clone()),
notify::EventKind::Modify(mk) => match mk {
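// Unspecified modification: treat it as a write rather than panicking.
notify::event::ModifyKind::Any => self.write_event(event.paths[0].clone()),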
notify::event::ModifyKind::Data(_) => self.write_event(event.paths[0].clone()),
notify::event::ModifyKind::Metadata(_) => {
self.write_event(event.paths[0].clone())
}
notify::event::ModifyKind::Name(rk) => match rk {
notify::event::RenameMode::Any => {}
notify::event::RenameMode::To => self.create_event(event.paths[0].clone()),
notify::event::RenameMode::From => {
self.remove_event(event.paths[0].clone())
}
notify::event::RenameMode::Both => {
self.rename_event(event.paths[0].clone(), event.paths[1].clone())
}
notify::event::RenameMode::Other => {}
},
notify::event::ModifyKind::Other => {}
},
notify::EventKind::Remove(_) => self.remove_event(event.paths[0].clone()),
notify::EventKind::Any => {}
notify::EventKind::Access(_) => {}
notify::EventKind::Other => {}
}
An example event handler should clarify how the others work:
fn write_event(&mut self, path: PathBuf) {
    match check_ignore(&self.ignore_rules, &path) {
        MatchResult::Whitelist | MatchResult::NoMatch => match path.metadata() {
            // A write changes the size or timestamp, so report an update.
            Ok(metadata) => self
                .sender
                .send(PathEvent::Update { path, metadata })
                .unwrap_or_else(|e| warn!("{}", e)),
            // The file may already be gone; log instead of panicking.
            Err(e) => warn!("{}: {}", path.to_string_lossy(), e),
        },
        MatchResult::Ignore => {
            debug!("FS Notification Ignored: {}", path.to_string_lossy());
        }
    }
}
Other event handlers work similarly. They send PathEvent values that can be used to update a list of paths and their metadata.
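For instance, a rename handler that emits a Delete for the old path followed by a Create for the new one might look roughly like the sketch below. This is not the actual implementation: RenameHandler is a hypothetical reduced struct, is_ignored stands in for check_ignore with the collected IgnoreRules, std's SyncSender replaces the crossbeam sender, and eprintln! replaces the logging macros so the example needs no external crates.

```rust
use std::fs::Metadata;
use std::path::{Path, PathBuf};
use std::sync::mpsc::SyncSender;

pub enum PathEvent {
    Create { path: PathBuf, metadata: Metadata },
    Update { path: PathBuf, metadata: Metadata },
    Delete { path: PathBuf },
}

/// Hypothetical reduced handler; `is_ignored` is an assumed stand-in
/// for the ignore-rule check.
pub struct RenameHandler {
    pub sender: SyncSender<PathEvent>,
    pub is_ignored: fn(&Path) -> bool,
}

impl RenameHandler {
    pub fn rename_event(&mut self, from: PathBuf, to: PathBuf) {
        // The old name leaves the snapshot, unless it was never tracked.
        if !(self.is_ignored)(&from) {
            if let Err(e) = self.sender.send(PathEvent::Delete { path: from }) {
                eprintln!("could not send delete event: {}", e);
            }
        }
        // The new name enters the snapshot only if the rules allow it.
        if (self.is_ignored)(&to) {
            return;
        }
        match to.metadata() {
            Ok(metadata) => {
                if let Err(e) = self.sender.send(PathEvent::Create { path: to, metadata }) {
                    eprintln!("could not send create event: {}", e);
                }
            }
            // The target may vanish between the rename and the metadata call.
            Err(e) => eprintln!("could not read metadata for {}: {}", to.display(), e),
        }
    }
}
```

Checking the two paths separately matters because a rename can move a file from an ignored name to a tracked one, or the other way around.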
To use this watcher, you first retrieve the list of files with walk_serial. It returns a snapshot of the directory and the IgnoreRules collected from the directories below it. Then, you create a watcher with:
let ignore_rules = walk_serial(...)?;
let (watcher, path_event_rec) = make_watcher(ignore_rules)?;
After this, you can create another thread to monitor file system changes:
crossbeam::scope(|s| {
s.spawn(|_| {
while let Ok(path_event) = path_event_rec.recv() {
match path_event {
PathEvent::Create { .. } => { /* ... */ }
PathEvent::Delete { .. } => { /* ... */ }
PathEvent::Update { .. } => { /* ... */ }
}
}
});
});
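The same consumer loop can also be sketched with the standard library alone, since std::thread::scope (stable since Rust 1.63) covers the scoped-thread use case. In the sketch below, std's mpsc channel stands in for the crossbeam one, PathEvent is reduced to paths only, and track_paths is a hypothetical helper. The point is that the scoped thread can borrow the caller's collection, and that the while let loop ends once every sender has been dropped. In the watcher setup above, the sender is owned by the event handler inside the watcher, so the loop should keep running for as long as the watcher is alive.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::sync::mpsc::Receiver;
use std::thread;

/// Reduced event type for this sketch: metadata is omitted.
pub enum PathEvent {
    Create { path: PathBuf },
    Update { path: PathBuf },
    Delete { path: PathBuf },
}

/// Drains the receiver on a scoped thread and keeps `paths` in sync.
/// Returns once every sender has been dropped.
pub fn track_paths(receiver: Receiver<PathEvent>, paths: &mut BTreeSet<PathBuf>) {
    thread::scope(|s| {
        // The scoped thread may borrow `paths` because the scope joins
        // the thread before `track_paths` returns.
        s.spawn(move || {
            while let Ok(event) = receiver.recv() {
                match event {
                    PathEvent::Create { path } | PathEvent::Update { path } => {
                        paths.insert(path);
                    }
                    PathEvent::Delete { path } => {
                        paths.remove(&path);
                    }
                }
            }
        });
    });
}
```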