
File Searching Speed

Author: Gāruḍam

Noticing that some of my colleagues get results for their grep searches almost instantly, I began to wonder why my searches of all my Sanskrit etexts took close to 2 minutes. They run grep from the command line (Terminal on OS X) and attributed the speed to that. Not wanting to drop my one-stop application BBEdit, where I can edit the files a search returns right then and there, I ran a test with a colleague who has a similar machine and etext collection. The same search finished on his machine in just 20 seconds, while mine took almost 2 minutes. He attributed the difference to having converted most of his files to Unix line breaks, presumably because tools that read input line by line see an old Mac-style CR-only file as one enormous line, which slows them down.

Having no luck with batch-converter applications, I realized I would have to go through every folder and convert on a smaller scale. This gave me the chance to see what was there and clean out non-text files. I moved several hundred megabytes of web archives and PDFs out of the etext collection, reducing the number of files and the size of the collection by about 25%. In the process I gave every file a .txt ending, whereas before there was a plethora of extensionless files and files with many different extensions. (The shell sketches below show roughly what these steps amount to.)

I haven’t gotten around to converting the line breaks yet, but I can already search all of my Sanskrit etexts in around 20 seconds and have them ready for editing instantly in BBEdit’s results window. This is a huge improvement: before, I often limited searches to specific folders to keep the speed within reason, whereas now I can search freely.
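For the record, the command-line search my colleagues swear by amounts to something like this. The search term and the ~/etexts path are placeholders, not anyone’s actual setup:

    # list every file in the collection that contains the term
    grep -r -l 'yoginī' ~/etexts

    # once everything ends in .txt, the search can be narrowed to those files
    grep -r -l --include='*.txt' 'yoginī' ~/etexts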
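The cleanup steps, plus the line-break conversion still on my to-do list, can also be scripted. This is only a rough sketch, assuming the collection lives in ~/etexts and discards go to a hypothetical ~/etexts-attic folder; try it on a copy first:

    # move web archives and PDFs out of the collection
    # (this flattens their folder structure, fine for a one-off cull)
    mkdir -p ~/etexts-attic
    find ~/etexts \( -name '*.pdf' -o -name '*.webarchive' \) \
        -exec mv {} ~/etexts-attic/ \;

    # give extensionless files a .txt ending
    find ~/etexts -type f ! -name '*.*' | while read -r f; do
        mv "$f" "$f.txt"
    done

    # convert classic Mac (CR) and Windows (CRLF) line breaks to Unix (LF)
    find ~/etexts -name '*.txt' -exec perl -pi -e 's/\r\n?/\n/g' {} +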

UPDATE 2021: processors and disks have gotten faster, and so has grep. I now prefer ripgrep, which searches my whole archive of etexts from the command line in under a second.
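For the curious, the ripgrep equivalent is even simpler; the term and path are again placeholders:

    # rg recurses in parallel by default and skips binary and hidden files
    rg -l 'yoginī' ~/etexts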