File Searching Speed

Torana over entrance to Śāntipura, Svayambhunath, Kathmandu

Noticing that some of my colleagues get results for their GREP searches almost instantly, I began to wonder why searching all of my Sanskrit etexts took close to 2 minutes. They use grep from the command line (Terminal on OS X) and attributed the speed to that. I did not want to give up BBEdit, my one-stop application, where I can edit the files a search returns right then and there, so I decided to run a test with a colleague who had a similar machine and etext collection. The same search finished in just 20 seconds on his machine, while mine took almost 2 minutes. He attributed the difference to having converted most of his files to Unix line breaks.

Not having any luck with batch converter applications, I realized I would have to go through every folder individually and batch convert on a smaller scale. This gave me the chance to see what was there and clean out non-text files. I moved several hundred megabytes of web archives and PDFs out of the etext collection, which reduced both the number of files and the size of the collection by about 25%. In the process I gave every file the extension .txt, whereas before there was a plethora of files with no extension and files with many different extensions.

I haven’t gotten to converting the line breaks yet, but I can now search all of my Sanskrit etexts in around 20 seconds and have them ready for editing instantly in the results window of BBEdit. This is a huge improvement: I can search more freely, whereas before I often limited searches to specific folders to keep the speed within reason.
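For anyone facing the same cleanup, the steps could probably be scripted rather than done folder by folder. The following is only a rough sketch in Python, not what was actually used on the collection described here. The function names, the list of non-text extensions, and the choice to append .txt to the existing file name are all assumptions made for illustration, and it assumes the etexts are in UTF-8 or another ASCII-compatible encoding (it would mangle UTF-16 files).

```python
from pathlib import Path

# Assumed list of extensions to treat as non-text; adjust for your own collection.
NON_TEXT_SUFFIXES = {".pdf", ".webarchive"}


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and classic Mac (CR) line breaks to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def tidy_collection(root, dry_run=True):
    """Walk `root`, list non-text files, and give the rest LF breaks and a .txt name.

    Returns (non_text, changed): the non-text files found, and (old, new) path
    pairs for files that need (or, with dry_run=False, received) a change.
    Non-text files are only reported, never moved or deleted.
    """
    non_text, changed = [], []
    # sorted() snapshots the listing so renames do not disturb the walk.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in NON_TEXT_SUFFIXES:
            non_text.append(path)
            continue
        # Append .txt rather than replace the suffix, to avoid name collisions.
        if path.suffix.lower() == ".txt":
            target = path
        else:
            target = path.with_name(path.name + ".txt")
        if target != path and target.exists():
            continue  # never overwrite an existing file
        data = path.read_bytes()
        fixed = normalize_newlines(data)
        if fixed == data and target == path:
            continue  # already tidy
        changed.append((path, target))
        if not dry_run:
            target.write_bytes(fixed)
            if target != path:
                path.unlink()
    return non_text, changed
```

Calling `tidy_collection("/path/to/etexts")` with the default `dry_run=True` only reports what would change, which seems the safer way to start on a collection that took years to assemble. Work on a backup copy before running it with `dry_run=False`.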