Creating a file system watcher with ignore rules

2022-07-06

In the [previous post], I described the development of a file system walker. It recursively returns the files in a directory, taking ignore patterns into account.

In this post, I'll update it. Some of the requirements have changed, and I need new features.

For the first change, I'll convert the signature of walk_serial to:

pub fn walk_serial(
    given: IgnoreRules,
    dir: &Path,
    walk_options: &WalkOptions,
    res_paths: &mut Vec<Result<PathMetadata>>,
) -> Result<IgnoreRules> 

In the first version, instead of res_paths, the function took a sender and sent the results through it.

Why don't I simply return the paths instead of filling a &mut Vec parameter? It may be a bit simpler, but I don't like the idea of creating another Vec for each function call. The overhead of creating new vectors and merging them may become a burden for large directory trees. Instead, every call of the function works on the same vector, which grows its capacity when necessary. Since Vec's current strategy is to double the capacity when it runs out of room, there are fewer memory allocations for large and complex directory hierarchies.
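To make the allocation argument concrete, here is a minimal, std-only sketch of both shapes. It is not the post's walk_serial: the names collect_files and collect_files_returning are hypothetical, the sketch collects plain PathBufs instead of Result<PathMetadata>, and it applies no ignore rules.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical accumulator version: every recursive call pushes into the
/// same vector, so no per-directory Vec is allocated and merged.
pub fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // file_type() does not follow symlinks, so symlinked dirs are not entered
        if entry.file_type()?.is_dir() {
            collect_files(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

/// Hypothetical returning version: each call builds its own Vec, which the
/// caller then copies into its result.
pub fn collect_files_returning(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut result = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            // A fresh Vec per subdirectory, appended to this level's Vec
            result.extend(collect_files_returning(&path)?);
        } else {
            result.push(path);
        }
    }
    Ok(result)
}
```

Both functions find the same files; the first grows a single buffer, while the second allocates one Vec per directory and copies its contents into the parent's Vec.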

Other than this, instead of Sender::send, the function uses Vec::push to add elements to the vector. The rest of the implementation is similar.

One major difference in this version is that the function returns an IgnoreRules value instead of (). It contains the compiled ignore rules from all directories below the given directory. We can use this returned value later to check the paths we get from the notifier we'll develop.
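The idea of threading the rules through the recursion and handing back the accumulated set can be sketched with a toy rule type. Everything here is an assumption made for illustration: ToyIgnoreRules, walk_with_rules, and the per-directory .toyignore file (one exact file name per line) are hypothetical and far simpler than real gitignore-style patterns.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Toy stand-in for IgnoreRules (hypothetical): each rule is (directory, file name)
/// and applies to files with that exact name anywhere below the directory.
#[derive(Debug, Default, Clone)]
pub struct ToyIgnoreRules {
    pub rules: Vec<(PathBuf, String)>,
}

impl ToyIgnoreRules {
    pub fn is_ignored(&self, path: &Path) -> bool {
        let name = match path.file_name().and_then(|n| n.to_str()) {
            Some(n) => n,
            None => return false,
        };
        self.rules
            .iter()
            .any(|(dir, pattern)| path.starts_with(dir) && pattern.as_str() == name)
    }
}

/// Walks `dir`, pushing non-ignored files into `out`, and returns the rules
/// collected from `dir` and every directory below it.
pub fn walk_with_rules(
    mut given: ToyIgnoreRules,
    dir: &Path,
    out: &mut Vec<PathBuf>,
) -> io::Result<ToyIgnoreRules> {
    // Hypothetical per-directory ignore file: one exact file name per line
    if let Ok(content) = fs::read_to_string(dir.join(".toyignore")) {
        for line in content.lines().map(str::trim).filter(|l| !l.is_empty()) {
            given.rules.push((dir.to_path_buf(), line.to_string()));
        }
    }
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if given.is_ignored(&path) {
            continue;
        }
        if entry.file_type()?.is_dir() {
            // The child sees the rules so far and hands back an extended set
            given = walk_with_rules(given, &path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(given)
}
```

The returned rules can answer is_ignored for paths that did not exist during the walk, which is what a notifier needs for newly created files.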

Adding a file system notifier

For the second change, we need to access the underlying OS machinery for file system notifications. My research led me to the notify crate. Although it seems to be going through a major overhaul as of this writing, it's the only promising option short of interfacing directly with the OS.

Again, we begin by thinking about the enums and structs for this problem.

What do we want? We want to collect file system events to update the Vec<PathMetadata> that was built with walk_serial or walk_parallel. PathMetadata was defined as:

pub struct PathMetadata {
    pub path: PathBuf,
    pub metadata: Metadata,
}

We could build a HashMap<PathBuf, Metadata> or keep this as a vector, but for our purposes we want to track new events. These events correspond to creating a new file, updating the metadata (size, timestamp) of a file, or deleting a file. We don't need, e.g., permission changes, as we only want to track which files are newer or whose size has changed.

pub enum PathEvent {
    Create { path: PathBuf, metadata: Metadata },
    Update { path: PathBuf, metadata: Metadata },
    Delete { path: PathBuf },
}
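If the snapshot is kept as a HashMap<PathBuf, Metadata>, applying these events comes down to a single match. A minimal sketch, assuming such an index; apply_event is a hypothetical helper, and the enum is repeated so the block compiles on its own.

```rust
use std::collections::HashMap;
use std::fs::Metadata;
use std::path::PathBuf;

// Same shape as the PathEvent enum above
#[derive(Debug)]
pub enum PathEvent {
    Create { path: PathBuf, metadata: Metadata },
    Update { path: PathBuf, metadata: Metadata },
    Delete { path: PathBuf },
}

/// Hypothetical helper: applies one event to a path -> metadata index
/// built from the initial walk.
pub fn apply_event(index: &mut HashMap<PathBuf, Metadata>, event: PathEvent) {
    match event {
        // Create and Update both (re)insert the latest metadata
        PathEvent::Create { path, metadata } | PathEvent::Update { path, metadata } => {
            index.insert(path, metadata);
        }
        PathEvent::Delete { path } => {
            index.remove(&path);
        }
    }
}
```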

notify has an abstraction over three OSes: [RecommendedWatcher], which picks the native watcher for the current platform and provides a consistent interface to events. A watcher is built from an event handler, i.e., a type that implements the [EventHandler] trait.

As we need to check whether paths are ignored before reporting them, the watcher should receive an IgnoreRules struct.

pub fn make_watcher(
    ignore_rules: IgnoreRules,
) -> Result<(RecommendedWatcher, Receiver<PathEvent>)> {

It returns a RecommendedWatcher and a (crossbeam) Receiver for the PathEvent we defined above. The implementation creates a channel, initializes the watcher and starts watching.

    let (sender, receiver) = bounded(10000);
    let root = ignore_rules.root.clone();
    let mut watcher = notify::recommended_watcher(PathEventHandler {
        ignore_rules,
        sender,
    })?;
    // `watch` is a method of the notify::Watcher trait, which must be in scope
    watcher.watch(&root, RecursiveMode::Recursive)?;
    Ok((watcher, receiver))
}
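bounded(10000) caps how many events can wait in the channel. With crossbeam, send on a full bounded channel blocks until there is room, so a slow consumer would stall the watcher's callback. The same limit can be shown with std::sync::mpsc::sync_channel, used here as a stand-in for crossbeam's bounded; send_or_drop is a hypothetical helper that uses try_send to drop events instead of blocking, for cases where losing an event is preferable to stalling.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical helper: tries to queue `item` without blocking.
/// Returns false, dropping the item, if the buffer is full or the receiver is gone.
pub fn send_or_drop<T>(sender: &SyncSender<T>, item: T) -> bool {
    match sender.try_send(item) {
        Ok(()) => true,
        // The consumer is lagging and the buffer is at capacity
        Err(TrySendError::Full(_)) => false,
        // The receiving side has been dropped
        Err(TrySendError::Disconnected(_)) => false,
    }
}
```

With send instead of try_send, a call on a full sync_channel would block until the receiver takes an item.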

PathEventHandler is the struct that implements the EventHandler trait. The trait requires a single method to be implemented:

    fn handle_event(&mut self, event: notify::Result<Event>) 

notify::Event is a struct whose kind field is a hierarchical set of enums describing the type of file system event. The event handler is expected to discriminate these events and react accordingly.

pub struct Event {
    pub kind: EventKind,
    pub paths: Vec<PathBuf>,
    pub attrs: EventAttributes,
}

We're interested in filtering the event by EventKind, checking whether the paths are ignored, and sending the path and metadata through a channel. We have three kinds of events in PathEvent. PathEventHandler filters the events and calls a handler method for each of these PathEvents.

The struct has methods that correspond to one or more PathEvent variants: create_event, write_event, remove_event, and rename_event. The first three correspond to PathEvent::Create, PathEvent::Update, and PathEvent::Delete, and the last one sends a PathEvent::Delete followed by a PathEvent::Create.

// `event` is the Event unwrapped from the notify::Result<Event> passed to handle_event
match event.kind {
    notify::EventKind::Create(_) => self.create_event(event.paths[0].clone()),
    notify::EventKind::Modify(mk) => match mk {
        // Any is reported when the backend doesn't know the exact kind of change
        notify::event::ModifyKind::Any => self.write_event(event.paths[0].clone()),
        notify::event::ModifyKind::Data(_) => self.write_event(event.paths[0].clone()),
        notify::event::ModifyKind::Metadata(_) => {
            self.write_event(event.paths[0].clone())
        }
        notify::event::ModifyKind::Name(rk) => match rk {
            notify::event::RenameMode::Any => {}
            notify::event::RenameMode::To => self.create_event(event.paths[0].clone()),
            notify::event::RenameMode::From => {
                self.remove_event(event.paths[0].clone())
            }
            // Both carries two paths: the old name first, then the new one
            notify::event::RenameMode::Both => {
                self.rename_event(event.paths[0].clone(), event.paths[1].clone())
            }
            notify::event::RenameMode::Other => {}
        },
        notify::event::ModifyKind::Other => {}
    },
    notify::EventKind::Remove(_) => self.remove_event(event.paths[0].clone()),
    notify::EventKind::Any => {}
    notify::EventKind::Access(_) => {}
    notify::EventKind::Other => {}
}

A single event handler should illustrate how the others work.

fn write_event(&mut self, path: PathBuf) {
    match check_ignore(&self.ignore_rules, &path) {
        MatchResult::Whitelist | MatchResult::NoMatch => {
            // The file may be gone before we read its metadata; log instead of panicking
            match path.metadata() {
                Ok(metadata) => self
                    .sender
                    .send(PathEvent::Update { path, metadata })
                    .unwrap_or_else(|e| warn!("{}", e)),
                Err(e) => warn!("{}: {}", path.to_string_lossy(), e),
            }
        }
        MatchResult::Ignore => {
            debug!("FS Notification Ignored: {}", path.to_string_lossy());
        }
    }
}

Other event handlers work similarly. They send the other PathEvent variants, which can be used to update a list of paths and their metadata.
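Following the same pattern, create_event, remove_event, and rename_event might look like this. The sketch is self-contained and makes several substitutions: SketchHandler is hypothetical, a plain is_ignored function replaces IgnoreRules and check_ignore, std's mpsc::Sender replaces crossbeam's, and eprintln! replaces the log macros.

```rust
use std::fs::Metadata;
use std::path::{Path, PathBuf};
use std::sync::mpsc::Sender;

// Same shape as the post's PathEvent, repeated so the block compiles alone
#[derive(Debug)]
pub enum PathEvent {
    Create { path: PathBuf, metadata: Metadata },
    Update { path: PathBuf, metadata: Metadata },
    Delete { path: PathBuf },
}

/// Hypothetical stand-in for PathEventHandler.
pub struct SketchHandler {
    pub is_ignored: fn(&Path) -> bool, // replaces IgnoreRules + check_ignore
    pub sender: Sender<PathEvent>,
}

impl SketchHandler {
    pub fn create_event(&mut self, path: PathBuf) {
        if (self.is_ignored)(&path) {
            return;
        }
        // The file may already be gone again; skip it instead of panicking
        match path.metadata() {
            Ok(metadata) => {
                if let Err(e) = self.sender.send(PathEvent::Create { path, metadata }) {
                    eprintln!("{}", e);
                }
            }
            Err(e) => eprintln!("{}: {}", path.display(), e),
        }
    }

    pub fn remove_event(&mut self, path: PathBuf) {
        if (self.is_ignored)(&path) {
            return;
        }
        // A deleted path has no metadata, so only the path is sent
        if let Err(e) = self.sender.send(PathEvent::Delete { path }) {
            eprintln!("{}", e);
        }
    }

    /// A rename is reported as a Delete of the old path, then a Create of the new one.
    pub fn rename_event(&mut self, from: PathBuf, to: PathBuf) {
        self.remove_event(from);
        self.create_event(to);
    }
}
```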

To use this watcher, you first get the list of files with walk_serial. It fills the given vector with a snapshot of the directory and returns the IgnoreRules collected from the directories below it. Then you create a watcher with:

let ignore_rules = walk_serial(...)?;
// Keep `watcher` alive: notifications stop when it is dropped
let (watcher, path_event_rec) = make_watcher(ignore_rules)?;

After this, you can spawn another thread to receive the file system changes:

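crossbeam::scope(|s| {
    s.spawn(|_| {
        // recv() returns Err once every sender has been dropped
        while let Ok(path_event) = path_event_rec.recv() {
            match path_event {
                PathEvent::Create { path, metadata } => { /* ... */ }
                PathEvent::Update { path, metadata } => { /* ... */ }
                PathEvent::Delete { path } => { /* ... */ }
            }
        }
    });
})
.expect("the watcher thread panicked");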
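Since Rust 1.63, std::thread::scope provides the same scoped-thread shape without crossbeam. The sketch below swaps in std's scoped threads and mpsc channel, and uses a simplified, hypothetical event type (a path plus Some(size), or None for a deletion) so it runs without any crate. As in the loop above, the consumer only finishes when every sender has been dropped.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::mpsc::Receiver;
use std::thread;

/// Hypothetical simplified event: Some(size) for a created or updated file,
/// None for a deleted one.
pub type SizeEvent = (PathBuf, Option<u64>);

/// Consumes events on a scoped thread until every sender is dropped,
/// then returns the resulting path -> size index.
pub fn consume_in_scope(events: Receiver<SizeEvent>) -> HashMap<PathBuf, u64> {
    let mut index = HashMap::new();
    let index_ref = &mut index;
    thread::scope(|s| {
        // std's Receiver is not Sync, so it is moved into the thread, not borrowed
        s.spawn(move || {
            while let Ok((path, size)) = events.recv() {
                match size {
                    Some(len) => {
                        index_ref.insert(path, len);
                    }
                    None => {
                        index_ref.remove(&path);
                    }
                }
            }
        });
    }); // thread::scope joins the spawned thread before returning
    index
}
```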