Opinions on Speed

Viper426

New member
Joined
Mar 1, 2012
Messages
2
Location
Toronto, ON, Canada
Programming Experience
5-10
I'm finalizing an application that does a recursive search through our server farm (we're talking tens of millions of folders). As it stands, it scans each and every folder and reports back on various metrics as entered by the user. The overall functionality isn't terribly important here.

What I'm looking at now is shortening the run time by excluding folders with certain names ("Windows", "Temporary Internet Files", "Product", etc.; the list is ever-growing but will likely settle in the 20-30 name range within a couple of days).

My question is one of speed. Since every directory that the application hits is getting compared against the list of excluded folders, even a minor slowdown in the comparison could add hours to the total run time. From what I've seen I have three options:
1: Build a for-each loop over the entire list of excluded names and run it on each directory. This option just sounds stupid, since that loop will run 30 times for each of ~10,000,000 directories = 300,000,000 comparisons. Way too slow.
2: Add a table of excluded names to the SQL Server DB that the app already uses for storing reports, and query it against the name of each directory. On paper it's smarter than option 1, but I'm worried that the VB-to-SQL Server round trip might be even slower than the for-each loop. I've worked with DBs before, but usually running one-off gigantic queries rather than millions of small queries like this.
3: Create a String array of the excluded names and just use If strDirectoryNames.Contains(myDirectory.Name) Then <blah>. Again, on paper this seems like a pretty good idea, but it all depends on what's actually going on behind the array's Contains method. For all I know it's just running a for-each of its own (but hopefully a little faster... I don't think arrays are indexed for lookups, so I don't see how it could determine whether a value exists without checking the entire array).
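On option 3: a plain array's Contains really is a linear scan, but a HashSet(Of String) does a hashed lookup, so each directory costs roughly one hash probe instead of up to 30 string comparisons. A minimal sketch (the name list here is just the three examples from the post, and folder-name comparison is assumed to be case-insensitive):

```vbnet
Imports System
Imports System.Collections.Generic

Module ExclusionDemo
    ' Case-insensitive set: "WINDOWS" and "Windows" should both be skipped.
    Private ReadOnly Excluded As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase) From
        {"Windows", "Temporary Internet Files", "Product"}

    Sub Main()
        For Each dirName In {"windows", "Users", "Product"}
            ' One hash lookup per directory, regardless of how long the list gets.
            If Excluded.Contains(dirName) Then
                Console.WriteLine(dirName & ": skipped")
            Else
                Console.WriteLine(dirName & ": scanned")
            End If
        Next
    End Sub
End Module
```

Since HashSet.Contains averages O(1), growing the exclusion list from 30 names to 300 adds essentially nothing per directory.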

I'm well aware that there's no "right" answer, but I figured I'd poll the experts to see what opinions there are out there before I build up and test a solution.

Thanks!
 
1- Scan the folders into a source list: "dir c:\* /b /s > list.txt"
2- Import that list into a SQL table
3- Import the list of keywords into another SQL table
4- Create a query that keeps only the folders matching none of the keywords, something like "SELECT * FROM FolderList f WHERE NOT EXISTS (SELECT 1 FROM Keywords k WHERE f.Filename LIKE '%' + k.KeyWord + '%')" (a straight JOIN on NOT LIKE would return each folder once per keyword it fails to match)

I actually just did all that: scanning my C drive took about 1 minute, importing the data into tables maybe 30 seconds, and running the query was close to instantaneous, over 450,000 records in the FolderList table.
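For anyone wiring step 4 into the VB side, here's a minimal ADO.NET sketch; the table and column names (FolderList.Filename, Keywords.KeyWord) follow the steps above, and the connection string is a hypothetical placeholder. It's written with NOT EXISTS so each folder comes back at most once even if it fails to match several keywords:

```vbnet
Imports System
Imports System.Data.SqlClient

Module ExclusionQuery
    Sub Main()
        ' Hypothetical connection string; point this at the reporting DB.
        Dim connStr As String = "Server=.;Database=Reports;Integrated Security=True"

        ' Keep only folders whose path matches none of the keywords.
        Dim sql As String =
            "SELECT f.Filename FROM FolderList f " &
            "WHERE NOT EXISTS (SELECT 1 FROM Keywords k " &
            "                  WHERE f.Filename LIKE '%' + k.KeyWord + '%')"

        Using conn As New SqlConnection(connStr)
            conn.Open()
            Using cmd As New SqlCommand(sql, conn)
                Using reader As SqlDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        Console.WriteLine(reader.GetString(0))
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module
```

The nice part is that it's one set-based query instead of millions of per-directory round trips, which sidesteps the option-2 worry about the VB-to-SQL Server bridge.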
 
What a great idea. I'd become so wrapped up in manipulating the DirectoryInfo class that I hadn't thought to leave it out of the directory-exclusion part of the program. Good thinking, Herman!
 