Syncing Path Operations in Xvc

While writing a HOWTO post for the documentation, I found a bug where running multiple carry-in commands caused file system failures. When multiple threads were accessing the same cache directory and one of them set the cache directory's permissions while another was still working in it, the other thread failed with a permissions error.

Rust offers fearless concurrency for memory access, but for the file system there seem to be no locking primitives.

I decided to write one. Parallel file system operations are important for Xvc, and the error messages are annoying: identical files in a repository are common, and in those cases these messages make it look like there is a problem. In theory, when files are identical, writing only one of them to the cache is fine, but other issues may also be preventing cache access, so we can't just swallow the error and hope to get away with it.

Two options came to mind. One is to modify the list of cache files before creating threads, so that no two threads access the same cache file at the same time. This is hard to convey and adds extra complexity to thread creation: a one-in-a-thousand concern becomes an architectural burden.
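A minimal sketch of the first option, assuming each unit of work pairs a source file with its cache path. The CarryInJob struct, the dedup_by_cache_path function and the cache/... paths are hypothetical and do not come from Xvc; the sketch only shows the idea of filtering the job list before any thread is spawned.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical unit of work: copy `source` into the cache at `cache_path`.
#[derive(Debug, Clone)]
struct CarryInJob {
    source: PathBuf,
    cache_path: PathBuf,
}

/// Keep only the first job for each cache path, so that no two threads
/// spawned from the result will ever write to the same cache file.
fn dedup_by_cache_path(jobs: Vec<CarryInJob>) -> Vec<CarryInJob> {
    let mut seen = HashSet::new();
    jobs.into_iter()
        // `insert` returns false if the cache path was already seen.
        .filter(|job| seen.insert(job.cache_path.clone()))
        .collect()
}

fn main() {
    let jobs = vec![
        CarryInJob { source: "data/a.txt".into(), cache_path: "cache/abc".into() },
        CarryInJob { source: "data/copy-of-a.txt".into(), cache_path: "cache/abc".into() },
        CarryInJob { source: "data/b.txt".into(), cache_path: "cache/def".into() },
    ];
    for job in dedup_by_cache_path(jobs) {
        println!("{} -> {}", job.source.display(), job.cache_path.display());
    }
}
```

The filter itself is short, but the dropped duplicates (data/copy-of-a.txt above) presumably still need their own bookkeeping, and every call site that spawns threads has to remember to run this step first.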

The other solution is to lock paths while accessing them, so that two threads working on the same cache path wait for each other. This is easier and only requires injecting a dependency into the thread functions. The downside is slower file system operations, as each path operation now has to check the mutex, but that seems of little concern for file system access, which is already much slower than memory operations.

First I tried to implement this with a HashMap<PathBuf, Mutex<()>>, but the map itself had to be passed to the functions wrapped in Arc<Mutex<HashMap>>, which made the ceremony of acquiring the lock for a single file much longer.

The issue is that you don't want this HashMap to become a bottleneck. Threads shouldn't have to wait for the HashMap to become available, as it's not the HashMap we want to lock but the values in it.
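To make the bottleneck concrete, here is a minimal sketch of that first attempt, assuming the map is shared as Arc<Mutex<HashMap<PathBuf, Mutex<()>>>>. The NaiveLocks alias and the write_with_naive_lock function are hypothetical names. Because the per-path Mutex lives inside the map, its guard borrows from the map guard, so the whole map stays locked for as long as a single path is in use.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};
use std::thread;

// The whole map sits behind a single Mutex, shared via Arc.
type NaiveLocks = Arc<Mutex<HashMap<PathBuf, Mutex<()>>>>;

fn write_with_naive_lock(locks: &NaiveLocks, path: &Path) {
    // Step 1: lock the entire map, even though we only care about one entry.
    let mut map = locks.lock().unwrap();
    // Step 2: find or create the per-path mutex.
    let path_lock = map
        .entry(path.to_path_buf())
        .or_insert_with(|| Mutex::new(()));
    // Step 3: lock the per-path mutex. `path_lock` borrows from `map`, so the
    // map guard must stay alive for as long as the path guard is held.
    let _path_guard = path_lock.lock().unwrap();
    // ... the file system operation on `path` would go here ...
    // Both guards drop at the end of this function; until then, every other
    // thread blocks on the map, no matter which path it wants.
}

fn main() {
    let locks: NaiveLocks = Arc::new(Mutex::new(HashMap::new()));
    // Two different paths, yet these threads still serialize on the map.
    let handles: Vec<_> = vec!["cache/abc", "cache/def"]
        .into_iter()
        .map(|name| {
            let locks = Arc::clone(&locks);
            thread::spawn(move || write_with_naive_lock(&locks, Path::new(name)))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```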

The solution is to return the lock from a method of a struct. Something like:

    use std::{collections::HashMap, path::PathBuf, sync::{Arc, Mutex, RwLock}};

    pub struct PathSync {
        locks: Arc<RwLock<HashMap<PathBuf, Arc<Mutex<()>>>>>,
    }

Now the ceremony can be performed inside the method, and threads that work in different directories won't need to wait for the hash map to become available.
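A minimal sketch of such a method, using the PathSync struct above. The method name path_lock is hypothetical. It takes a shared read lock first, which many threads can hold at once, and only falls back to the write lock when the path has no mutex yet. Because each value is an Arc<Mutex<()>>, the method can hand out a clone and release the map immediately.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, RwLock};

pub struct PathSync {
    locks: Arc<RwLock<HashMap<PathBuf, Arc<Mutex<()>>>>>,
}

impl PathSync {
    pub fn new() -> Self {
        PathSync { locks: Arc::new(RwLock::new(HashMap::new())) }
    }

    /// Return the mutex for `path`, creating it on first use (hypothetical name).
    /// The map is locked only during the lookup, not while the caller works.
    pub fn path_lock(&self, path: &Path) -> Arc<Mutex<()>> {
        // Fast path: a read lock, shared with other readers.
        if let Some(lock) = self.locks.read().unwrap().get(path) {
            return Arc::clone(lock);
        }
        // Slow path: take the write lock and insert. `entry` also covers the
        // case where another thread inserted the same path in the meantime.
        let mut map = self.locks.write().unwrap();
        Arc::clone(
            map.entry(path.to_path_buf())
                .or_insert_with(|| Arc::new(Mutex::new(()))),
        )
    }
}

fn main() {
    let sync = PathSync::new();
    let a1 = sync.path_lock(Path::new("cache/abc"));
    let a2 = sync.path_lock(Path::new("cache/abc"));
    let b = sync.path_lock(Path::new("cache/def"));
    // Same path, same mutex; different path, different mutex.
    assert!(Arc::ptr_eq(&a1, &a2));
    assert!(!Arc::ptr_eq(&a1, &b));
    let _guard = a1.lock().unwrap();
    // ... operate on cache/abc while `_guard` is alive ...
}
```

The caller still has to remember to call path_lock and keep the guard alive for the whole operation, which is the weak spot discussed next.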

However, after implementing this, I realized I would probably forget to lock a path at some point. This is a general-purpose solution, and it should be applied to all path operations whenever multiple threads are working. In most cases there are multiple paths to lock (cache_path, cache_dir, the repository path), and if I forget to lock one of them, some user, somewhere, will eventually see an error message.

So I decided to do what I did for stores: create wrappers that run the passed closures. This makes it much more obvious that the paths Xvc works on must be locked before any operation.

    pub fn with_sync_path(
        &self,
        path: &Path,
        mut f: impl FnMut(&Path) -> Result<()>,
    )

It works by passing the path to lock and a closure that operates on that path. The method first locks the path and then runs the closure. The locking mechanism lets threads with different paths run in parallel, while threads operating on the same path wait for each other.
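A minimal sketch of how with_sync_path could be built on top of PathSync, under a few assumptions: the Result alias below stands in for Xvc's own error type, the method is assumed to return Result<()> (the signature above does not show a return type), and PathSync derives Clone so each thread can hold a handle to the same map.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

// Assumption: a crate-level alias like this; Xvc's real error type differs.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

#[derive(Clone, Default)]
pub struct PathSync {
    locks: Arc<RwLock<HashMap<PathBuf, Arc<Mutex<()>>>>>,
}

impl PathSync {
    /// Lock `path`, run `f` on it, and release the lock when `f` returns.
    pub fn with_sync_path(
        &self,
        path: &Path,
        mut f: impl FnMut(&Path) -> Result<()>,
    ) -> Result<()> {
        // Get or create the per-path mutex. The map's write lock is held
        // only inside this block, not while `f` runs.
        let path_lock = {
            let mut map = self.locks.write().unwrap();
            Arc::clone(map.entry(path.to_path_buf()).or_default())
        };
        // The mutex guards no data, so a lock poisoned by a panicking closure
        // is still safe to reuse.
        let _guard = path_lock
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        f(path)
    }
}

fn main() -> Result<()> {
    let sync = PathSync::default();
    let handles: Vec<_> = vec!["cache/abc", "cache/abc", "cache/def"]
        .into_iter()
        .enumerate()
        .map(|(i, p)| {
            let sync = sync.clone();
            thread::spawn(move || {
                sync.with_sync_path(Path::new(p), |path| {
                    // Placeholder for the real work, e.g. copying into the cache.
                    println!("thread {} working on {}", i, path.display());
                    Ok(())
                })
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("thread panicked")?;
    }
    Ok(())
}
```

The two threads on cache/abc run their closures one after the other, while the thread on cache/def does not wait for either of them. Since the lock is taken and released inside the wrapper, a caller cannot forget to lock the path or drop the guard too early.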

The implementation is here
