Linux Deleting Many Files
When a process creates hundreds of thousands, or millions, of files in the same directory on a Linux system, it can be difficult to delete them in a timely manner. The simple go-to is rm -rf /directory, but this may not be the fastest method, and it may not be the optimal method in every case. Below are a few methods I’ve used to help remove files en masse.
All of these methods depend on many factors: filesystem type, NAS or local storage, I/O speeds, and so on. No single method is always the fastest, but each has other benefits that may help you choose the best one for a given situation.
rm -rf
While rm -rf is the method that comes to mind first, it is possibly the slowest, and it may delete more than you initially want. Depending on the exact command, you may delete /directory itself, and you may then need to manually recreate that directory and set up its permissions after the delete. Subdirectories may also be deleted, depending on the exact command.
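One concern with this method is that while you can use wildcards to limit what is deleted, doing so slows the rm process down further: the shell has to list all the files and filter them against the pattern before rm can start deleting. With enough matches, the expanded command can also fail with an "Argument list too long" error.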
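As a minimal sketch of the difference between removing a directory's contents and removing the directory itself, the example below works in a scratch directory created with mktemp. The target variable and the file names are assumptions for illustration, standing in for /directory.

```shell
# Hypothetical scratch directory standing in for /directory.
target="$(mktemp -d)"
mkdir "$target/sub"
touch "$target/a.log" "$target/b.log" "$target/sub/c.log"

# Remove everything inside the directory, subdirectories included,
# but keep the directory itself and its permissions.
# ${target:?} aborts if the variable is empty, so this cannot expand to "/*".
# Note that * does not match hidden (dot) files.
rm -rf -- "${target:?}"/*

# Removing the directory itself would look like this; it would then
# have to be recreated and its permissions restored by hand.
# rm -rf -- "$target"
```

The wildcard here is expanded by the shell before rm runs, which is the extra listing and filtering step described above.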
find and delete
This process uses find’s built-in -delete switch to delete files as they are identified. It has the added benefit that the directory structure is not lost, because we filter on files only.
This is the fastest method of all the find options as only one process is being spawned to search for and delete files.
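A minimal sketch of this approach, assuming GNU find (the -delete action is not part of every find implementation). The scratch directory and file names are made up for the example.

```shell
# Hypothetical scratch directory with a nested subdirectory.
target="$(mktemp -d)"
mkdir "$target/sub"
touch "$target/1.tmp" "$target/2.tmp" "$target/sub/3.tmp"

# -type f restricts the match to regular files, so directories stay in place.
# Further tests such as -name '*.tmp' or -mtime +7 can be placed before -delete
# to narrow the selection.
find "$target" -type f -delete
```

Because -delete is an action like any other, it should come last: find evaluates its expression left to right, and tests placed after -delete would not limit what is removed.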
find and pipe to rm with xargs
This method is unlikely to be performant, but it does allow piping through other processes, such as grep, to be even more selective about which files you delete.
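One way this pipeline might look, assuming GNU find, grep, and xargs. The names are passed NUL-terminated so that file names containing spaces or newlines are handled safely; the old- prefix and the file names are hypothetical.

```shell
# Hypothetical scratch directory with files to keep and files to delete.
target="$(mktemp -d)"
touch "$target/keep-1.dat" "$target/old-1.dat" "$target/old-2.dat"

# find prints NUL-terminated paths, grep -z filters them record by record,
# and xargs -0 hands the survivors to rm in batches.
# -r (GNU xargs) skips running rm when nothing matched.
find "$target" -type f -print0 \
  | grep -z -- '/old-[^/]*$' \
  | xargs -0 -r rm -f --
```

Here only files whose base name starts with old- reach rm; everything else is left alone.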
find and pipe to many rm with xargs
Again, this method is not the fastest, but it may be slightly better than piping to a single rm, as you can now run up to 100 rm processes at once. Be careful when chaining pipes like this, as you can quickly multiply the number of processes you spawn.
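A sketch of the parallel variant, assuming GNU xargs. The -P option sets how many rm processes may run at once and -n caps the number of files given to each one; the values 4 and 10 are arbitrary choices for this small example, not recommendations.

```shell
# Hypothetical scratch directory with 50 empty files.
target="$(mktemp -d)"
for i in $(seq 1 50); do
  : > "$target/file-$i"
done

# Up to 4 rm processes run concurrently, each receiving at most 10 paths.
find "$target" -type f -print0 \
  | xargs -0 -r -n 10 -P 4 rm -f --
```

Raising -P toward the 100 processes mentioned above is possible, but whether that helps depends on the storage behind the directory, since the deletes may be limited by I/O rather than by CPU.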
rsync with an empty directory
Using rsync is a bit counterintuitive, but it can be very efficient. You create an empty directory, then use rsync to sync the empty directory into your directory with files, deleting anything that doesn’t exist in the empty directory.
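The steps above can be sketched as follows, assuming rsync is installed. Both directories are scratch directories made up for the example. The sketch uses -r rather than -a so that the permissions and timestamps of the empty directory are not copied onto the target directory.

```shell
# Hypothetical directories: an empty source and a target full of files.
empty="$(mktemp -d)"
target="$(mktemp -d)"
mkdir "$target/sub"
touch "$target/a" "$target/b" "$target/sub/c"

# The trailing slashes make rsync compare the contents of the two directories.
# --delete removes anything in the target that is missing from the empty source.
rsync -r --delete "$empty"/ "$target"/

# Clean up the temporary empty directory afterwards.
rmdir "$empty"
```

The target directory itself survives, but unlike the find -type f approach, its subdirectories are removed along with the files.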
Timing and System Resources
Deletes like this may take a significant amount of time. If you would like to time the deletion, run it with time, and if you would like it not to impact other system processes, you can nice -n 19 the process.
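As a sketch, the two can be combined in front of any of the commands above. This assumes bash, where time is a shell keyword; the scratch directory is hypothetical.

```shell
# Hypothetical scratch directory with a few files.
target="$(mktemp -d)"
touch "$target/a" "$target/b" "$target/c"

# nice -n 19 runs find at the lowest CPU scheduling priority;
# time reports how long the whole command took once it finishes.
time nice -n 19 find "$target" -type f -delete
```

Note that nice only lowers CPU priority, so a delete that is mostly waiting on disk or network I/O may still compete with other processes for that I/O.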