Premature Caching is the Root of All Evil
I’m writing a Rust command-line app in my spare time to learn the language. It involves some file system checks, for which I use `fs::metadata`. As everyone knows, accessing the disk is an expensive operation and should be kept to a minimum, so I was thinking of using a `HashMap<PathBuf, Metadata>` to cache the results per path. (The key has to be an owned `PathBuf` rather than `Path`, since `HashMap` keys must be sized.)
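A minimal sketch of the cache I had in mind, using only the standard library (the `MetadataCache` name is just for illustration):

```rust
use std::collections::HashMap;
use std::fs::{self, Metadata};
use std::io;
use std::path::{Path, PathBuf};

/// Memoizes `fs::metadata` results per path.
struct MetadataCache {
    entries: HashMap<PathBuf, Metadata>,
}

impl MetadataCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return the metadata for `path`, hitting the disk only on a cache miss.
    fn metadata(&mut self, path: &Path) -> io::Result<Metadata> {
        if let Some(meta) = self.entries.get(path) {
            return Ok(meta.clone());
        }
        let meta = fs::metadata(path)?;
        self.entries.insert(path.to_path_buf(), meta.clone());
        Ok(meta)
    }
}
```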
Then I came across the `cached` crate, which memoizes function results. “This is exactly what I need,” I thought; internally it does what I was planning to do by hand.
Later, I noticed the bugs this could introduce. I intend to use the function in a short-running process, so the metadata is not expected to change during the run. But suppose that, for some reason, the process starts running longer, or I decide to add a web server on top of it. By then, probably many moons from now, I will have forgotten the decision I made about caching, and my assumption that the metadata won’t change during the run. The result will be weird file-related bugs, where changes to a file’s timestamp go undetected.
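The failure mode is easy to reproduce: once a path’s metadata is memoized, a later change to the file on disk is invisible through the cache. A self-contained sketch (the `stale_read_demo` helper is hypothetical, written for this post):

```rust
use std::collections::HashMap;
use std::fs::{self, Metadata};
use std::io;
use std::path::PathBuf;

/// Memoize a path's metadata, change the file on disk, and return
/// (cached length, fresh length). The two disagree, because the cache
/// never sees the second write.
fn stale_read_demo(path: PathBuf) -> io::Result<(u64, u64)> {
    let mut cache: HashMap<PathBuf, Metadata> = HashMap::new();

    fs::write(&path, b"v1")?;
    // First access: populate the cache from disk.
    let cached = cache
        .entry(path.clone())
        .or_insert(fs::metadata(&path)?)
        .clone();

    fs::write(&path, b"version two")?; // the file changes on disk
    let fresh = fs::metadata(&path)?; // a direct call sees the change

    fs::remove_file(&path)?;
    Ok((cached.len(), fresh.len()))
}
```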
No one notices when you fix bugs that never get the chance to appear, but I believe this is the best kind of software engineering.
Commentary (2022-08-01)
- It looks like my assumption that RAM is vastly faster than disk access may also be wrong. SSDs are fast, and under parallel access they may perform nearly as fast as RAM.