Creating a file system watcher with ignore rules
In the [previous post], I described the development of a file system walker. It recursively returns the files in a directory while respecting the ignore patterns.
In this post, I’ll update it. Some of the requirements have changed and I need new features:
- walk_serial takes a Sender and uses channels to send the results. This is confusing, as it doesn’t use any parallelism to traverse the file system.
- There are long-running processes that modify the filesystem, which can affect the behavior. We need to be notified about filesystem changes, filtered with the identical ignore rules.
For the first change, I’ll convert the signature of walk_serial to:
pub fn walk_serial(
    given: IgnoreRules,
    dir: &Path,
    walk_options: &WalkOptions,
    res_paths: &mut Vec<Result<PathMetadata>>,
) -> Result<IgnoreRules>
In the first version, instead of res_paths, we had a sender, and the results were sent through it.
Why don’t I simply return the paths instead of receiving a &mut Vec parameter to fill? Returning them may be a bit simpler, but I don’t like the idea of creating another Vec for each function call. The overhead of creating new vectors and merging them may become a burden for large directory trees. Instead, each call of the function works on the same vector and expands its capacity when necessary. As the current Vec strategy is to double the capacity when it runs out of space, there are fewer memory allocations for complex and large directory hierarchies.
Other than this, instead of Sender::send, the function uses Vec::push to add elements to the vector. The rest of the implementation is similar.
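To make the shared-vector idea concrete, here is a minimal sketch. It is not the real walk_serial: Node and collect_paths are hypothetical names, and an in-memory tree stands in for the file system so the example needs nothing but the standard library. Every recursive call pushes into the same Vec instead of building and merging its own.

```rust
// Hypothetical in-memory stand-in for a directory tree (not part of the walker).
enum Node {
    File(String),
    Dir(String, Vec<Node>),
}

// Every recursive call receives the same `out` vector and pushes into it,
// so no intermediate vectors are created or merged.
fn collect_paths(node: &Node, prefix: &str, out: &mut Vec<String>) {
    match node {
        Node::File(name) => out.push(format!("{}/{}", prefix, name)),
        Node::Dir(name, children) => {
            let dir = format!("{}/{}", prefix, name);
            for child in children {
                collect_paths(child, &dir, out);
            }
        }
    }
}
```

The caller creates the vector once and passes &mut to the top-level call. The alternative that returns a Vec from each call would allocate one vector per directory and copy its elements into the parent's vector.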
One major difference of this version is that the function returns an IgnoreRules value instead of (). These are the compiled ignore rules from all directories below the given directory. We can use this returned value to check the paths we get from the notifier we’ll develop.
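The way the rules travel through the recursion can be sketched like this. Rules, Dir and collect_rules are simplified, hypothetical stand-ins (the real IgnoreRules holds compiled patterns, not plain strings). The point is only the ownership flow: the function takes the rules by value, extends them, and hands them back.

```rust
// Hypothetical, simplified stand-in for IgnoreRules.
#[derive(Debug, Default, Clone, PartialEq)]
struct Rules {
    patterns: Vec<String>,
}

// Hypothetical directory: the lines of its ignore file and its subdirectories.
struct Dir {
    ignore_lines: Vec<String>,
    subdirs: Vec<Dir>,
}

// Takes the rules by value, adds what is found in `dir` and its descendants,
// and returns the merged rules to the caller.
fn collect_rules(given: Rules, dir: &Dir) -> Rules {
    let mut rules = given;
    rules.patterns.extend(dir.ignore_lines.iter().cloned());
    for sub in &dir.subdirs {
        rules = collect_rules(rules, sub);
    }
    rules
}
```

Because the value is moved in and out, nothing is cloned along the way except the pattern strings themselves, and the caller ends up with a single set of rules for the whole tree.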
Adding a file system notifier
For the second change, we need to access the underlying OS machinery for filesystem notifications. My research led me to the notify crate. Although it seems to be undergoing a major overhaul as of this writing, it’s the only promising option other than interfacing directly with the OS.
Again, we begin by thinking about the enums and structs for this problem.
What do we want? We want to collect file system events to update the Vec<PathMetadata> that was built with walk_serial or walk_parallel. PathMetadata was defined as:
pub struct PathMetadata {
    pub path: PathBuf,
    pub metadata: Metadata,
}
We can build a HashMap<PathBuf, Metadata> or keep this as a vector, but for our purposes we want to track new events. These events correspond to creating a new file, updating the metadata (size, timestamp) of a file, or deleting a file. We don’t need, e.g., permission changes, as we only want to track which file is newer or whose size has changed.
pub enum PathEvent {
    Create { path: PathBuf, metadata: Metadata },
    Update { path: PathBuf, metadata: Metadata },
    Delete { path: PathBuf },
}
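To show how these three events could keep a snapshot up to date, here is a small sketch using the HashMap option mentioned above. It makes two assumptions: Meta is a hypothetical stand-in for std::fs::Metadata (which can't be constructed by hand), and apply_event is a name I made up for illustration.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Hypothetical stand-in for std::fs::Metadata, reduced to the file size.
#[derive(Debug, Clone, PartialEq)]
struct Meta {
    size: u64,
}

// Same shape as the PathEvent above, with Meta instead of Metadata.
enum PathEvent {
    Create { path: PathBuf, metadata: Meta },
    Update { path: PathBuf, metadata: Meta },
    Delete { path: PathBuf },
}

// Applies a single event to the snapshot of known paths.
fn apply_event(snapshot: &mut HashMap<PathBuf, Meta>, event: PathEvent) {
    match event {
        // Create and Update both leave the newest metadata in the map.
        PathEvent::Create { path, metadata } | PathEvent::Update { path, metadata } => {
            snapshot.insert(path, metadata);
        }
        PathEvent::Delete { path } => {
            snapshot.remove(&path);
        }
    }
}
```

With a vector instead of a map, Update and Delete would need a linear search for the path, which is one reason the map may be the more convenient choice here.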
notify has an abstraction over three OSes. It’s called [RecommendedWatcher] and is used to provide a consistent interface to events. A watcher is built by supplying a type that implements the [EventHandler] trait.
As we need to check whether paths are ignored before reporting, the watcher should receive an IgnoreRules struct.
pub fn make_watcher(
    ignore_rules: IgnoreRules,
) -> Result<(RecommendedWatcher, Receiver<PathEvent>)> {
It returns a RecommendedWatcher and a (crossbeam) Receiver for the PathEvent we defined above. The implementation creates a channel, initializes the watcher and starts watching.
let (sender, receiver) = bounded(10000);
let root = ignore_rules.root.clone();
let mut watcher = notify::recommended_watcher(PathEventHandler {
    ignore_rules,
    sender,
})?;
watcher.watch(&root, RecursiveMode::Recursive)?;
PathEventHandler is the struct that implements the EventHandler trait. The trait requires a single fn to be implemented:
fn handle_event(&mut self, event: notify::Result<Event>)
notify::Event is a struct whose kind field is a hierarchical set of enums that describe the type of file system event. The event handler is expected to discriminate these events and react accordingly.
pub struct Event {
    pub kind: EventKind,
    pub paths: Vec<PathBuf>,
    pub attrs: EventAttributes,
}
We’re interested in filtering the event by EventKind, checking whether the paths are ignored and sending the path and metadata through a channel. We have three kinds of events in PathEvent. PathEventHandler implements the event filtering and calls event handlers for these PathEvents.
The struct has functions that correspond to one or more PathEvent elements. These are the create_event, write_event, remove_event, and rename_event functions. The first three correspond to members of PathEvent, and the last one sends a PathEvent::Delete and a PathEvent::Create.
match event.kind {
    notify::EventKind::Create(_) => self.create_event(event.paths[0].clone()),
    notify::EventKind::Modify(mk) => match mk {
        notify::event::ModifyKind::Any => todo!(),
        notify::event::ModifyKind::Data(_) => self.write_event(event.paths[0].clone()),
        notify::event::ModifyKind::Metadata(_) => {
            self.write_event(event.paths[0].clone())
        }
        notify::event::ModifyKind::Name(rk) => match rk {
            notify::event::RenameMode::Any => {}
            notify::event::RenameMode::To => self.create_event(event.paths[0].clone()),
            notify::event::RenameMode::From => {
                self.remove_event(event.paths[0].clone())
            }
            notify::event::RenameMode::Both => {
                self.rename_event(event.paths[0].clone(), event.paths[1].clone())
            }
            notify::event::RenameMode::Other => {}
        },
        notify::event::ModifyKind::Other => {}
    },
    notify::EventKind::Remove(_) => self.remove_event(event.paths[0].clone()),
    notify::EventKind::Any => {}
    notify::EventKind::Access(_) => {}
    notify::EventKind::Other => {}
}
An example event handler should illuminate the others.
fn write_event(&mut self, path: PathBuf) {
    match check_ignore(&self.ignore_rules, &path) {
        MatchResult::Whitelist | MatchResult::NoMatch => {
            // A write to an existing file is reported as an update.
            self.sender
                .send(PathEvent::Update {
                    path: path.clone(),
                    metadata: path.metadata().map_err(Error::from).unwrap(),
                })
                .unwrap_or_else(|e| warn!("{}", e));
        }
        MatchResult::Ignore => {
            debug!("FS Notification Ignored: {}", path.to_string_lossy());
        }
    }
}
Other event handlers work similarly. They send other PathEvent values that can be used to update a list of paths and their metadata.
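As an illustration of the rename case, here is a rough, self-contained sketch of how a rename could turn into a Delete followed by a Create. It is not the actual rename_event: Handler and is_ignored are hypothetical, a single ignored directory stands in for IgnoreRules, metadata is left out, and std::sync::mpsc replaces the crossbeam channel so the example depends only on the standard library. Checking the two paths independently is also my assumption.

```rust
use std::path::{Path, PathBuf};
use std::sync::mpsc::Sender;

// Reduced PathEvent: metadata is omitted to keep the sketch short.
#[derive(Debug, PartialEq)]
enum PathEvent {
    Create { path: PathBuf },
    Delete { path: PathBuf },
}

// Hypothetical handler; `ignored_dir` stands in for the real IgnoreRules.
struct Handler {
    ignored_dir: PathBuf,
    sender: Sender<PathEvent>,
}

impl Handler {
    // Simplified ignore check: everything under `ignored_dir` is ignored.
    fn is_ignored(&self, path: &Path) -> bool {
        path.starts_with(&self.ignored_dir)
    }

    // A rename is reported as a Delete of the old path and a Create of the
    // new one; each side is checked against the ignore rules separately.
    fn rename_event(&mut self, from: PathBuf, to: PathBuf) {
        if !self.is_ignored(&from) {
            let _ = self.sender.send(PathEvent::Delete { path: from });
        }
        if !self.is_ignored(&to) {
            let _ = self.sender.send(PathEvent::Create { path: to });
        }
    }
}
```

With this shape, moving a file into an ignored directory looks like a plain deletion to the receiver, and moving it out of one looks like a plain creation.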
To use this watcher, you first get the list of files with walk_serial. It fills a snapshot of the directory and returns an IgnoreRules value collected from the directories below it. Then you make a watcher with
let ignore_rules = walk_serial(...)?;
let (watcher, path_event_rec) = make_watcher(ignore_rules)?;
After this, you can create another thread to watch the file system changes.
crossbeam::scope(|s| {
    s.spawn(|_| {
        while let Ok(path_event) = path_event_rec.recv() {
            match path_event {
                PathEvent::Create ...
                PathEvent::Delete ...
                PathEvent::Update ...
            }
        }
    });
});
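The receive loop can be tried without notify or crossbeam. The sketch below is an approximation under stated assumptions: it uses std::thread::scope and std::sync::mpsc instead of their crossbeam counterparts, PathEvent is reduced to paths only, and drain_events is a hypothetical name. It shows that the loop ends on its own once every sender has been dropped, and that a scoped thread can write into a vector borrowed from the caller.

```rust
use std::path::PathBuf;
use std::sync::mpsc::Receiver;
use std::thread;

// Reduced PathEvent: metadata is omitted to keep the sketch short.
#[derive(Debug, PartialEq)]
enum PathEvent {
    Create { path: PathBuf },
    Update { path: PathBuf },
    Delete { path: PathBuf },
}

// Consumes events on a scoped thread and records one line per event in `log`.
// recv() returns Err once all senders are dropped, which ends the loop.
fn drain_events(rx: Receiver<PathEvent>, log: &mut Vec<String>) {
    thread::scope(|s| {
        s.spawn(move || {
            while let Ok(event) = rx.recv() {
                let line = match event {
                    PathEvent::Create { path } => format!("create {}", path.display()),
                    PathEvent::Update { path } => format!("update {}", path.display()),
                    PathEvent::Delete { path } => format!("delete {}", path.display()),
                };
                log.push(line);
            }
        });
        // The scope joins the spawned thread before returning.
    });
}
```

In the watcher setup the sender lives inside the event handler, so the loop would keep running for as long as the watcher itself is alive.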