Multi-Threaded file copying?

MNVBGuy
New member; joined Dec 1, 2018; messages: 3; programming experience: 1-3
Hello,

I'm a noob but am trying to learn. I've found a way to make VB useful for a task I've been given: copying thousands of files to a slow SAN server daily. Right now I copy one file at a time. It works. Painfully slow, but it works.

What I'd like to do is come up with some kind of stack that I can add filenames to, then call some code to spawn X threads that chew away at the stack one by one: each thread pops a file off the stack, copies it, then takes another, until the stack is empty and the threads end gracefully.

Umm, yeahhh. My code isn't even close. I got the multi-threading to work, but I can't seem to get one thread to take a file from the top and the second thread to take the next one down, etc. The way I have it, all three threads seem to grab the same top item in the stack. It's a mess. Any ideas on how I should be doing this? I appreciate the help. Thank you!
Imports System


Module Module1
    Public FileStack As New Stack(Of String)


    Sub FCT1()
        Try
            Console.WriteLine("FCT1: " & FileStack.Peek)
            FileStack.Pop()
        Catch ex As Exception
            Console.WriteLine("Error: FCT1")
        End Try
        'Do actual file copy to SAN
    End Sub
    Sub FCT2()
        Try
            Console.WriteLine("FCT2: " & FileStack.Peek)
            FileStack.Pop()
        Catch ex As Exception
            Console.WriteLine("Error: FCT2")
        End Try
        'Do actual file copy to SAN
    End Sub
    Sub FCT3()
        Try
            Console.WriteLine("FCT3: " & FileStack.Peek)
            FileStack.Pop()
        Catch ex As Exception
            Console.WriteLine("Error: FCT3")
        End Try
        'Do actual file copy to SAN
    End Sub


    Sub Main()
        FileStack.Push("1.PDF")
        FileStack.Push("2.PDF")
        FileStack.Push("3.PDF")
        FileStack.Push("4.PDF")
        FileStack.Push("5.PDF")


        If FileStack.Count > 0 Then
            Console.WriteLine("Copying " & FileStack.Count.ToString & " files...")
        Else
            Console.WriteLine("No files to copy.")
        End If


        Do While FileStack.Count > 0
            Dim A As System.Threading.Thread = New Threading.Thread(AddressOf FCT1)
            Dim B As System.Threading.Thread = New Threading.Thread(AddressOf FCT2)
            Dim C As System.Threading.Thread = New Threading.Thread(AddressOf FCT3)
            A.Name = "File_Copier_Thread_1"
            B.Name = "File_Copier_Thread_2"
            C.Name = "File_Copier_Thread_3"
            A.Start()
            B.Start()
            C.Start()


        Loop


        Console.ReadLine()
    End Sub
End Module
 
These days, you should use Tasks. They are executed on ThreadPool threads, and the ThreadPool manages the number of concurrent threads. E.g.
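Imports System.Collections.Generic
Imports System.IO
Imports System.Threading.Tasks

Private Sub CopyFiles(filePaths As IEnumerable(Of String))
    Dim tasks As New List(Of Task)

    For Each filePath In filePaths
        'Local copy so that each lambda captures its own path.
        Dim source As String = filePath
        'Queue one copy per file. The literal is a placeholder: each file needs its own destination.
        tasks.Add(Task.Factory.StartNew(Sub() File.Copy(source, "destination path here")))
    Next

    'Block until every copy has finished.
    Task.WaitAll(tasks.ToArray())
End Sub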
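For what it's worth, the "X threads chewing through one shared stack" idea from the original post can be made to work too. Here is a minimal sketch, assuming a ConcurrentStack in place of the plain Stack. TryPop removes and returns the top item in one atomic step, so no two workers can end up with the same file, which is what goes wrong with a separate Peek followed by Pop. DrainStack, workerCount and copyAction are names made up for illustration, and the Console.WriteLine is only a stand-in for the real copy.

```vb
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading
Imports System.Threading.Tasks

Module WorkerPoolSketch
    'Hypothetical helper: a fixed number of workers drain one shared stack.
    'copyAction is whatever should happen to each file (e.g. a File.Copy call).
    Public Function DrainStack(pending As ConcurrentStack(Of String),
                               workerCount As Integer,
                               copyAction As Action(Of String)) As Integer
        Dim processed As Integer = 0
        Dim workers As New List(Of Task)

        For i = 1 To workerCount
            workers.Add(Task.Run(
                Sub()
                    Dim filePath As String = Nothing
                    'TryPop returns False once the stack is empty, which ends this worker.
                    While pending.TryPop(filePath)
                        copyAction(filePath)
                        Interlocked.Increment(processed)
                    End While
                End Sub))
        Next

        'Wait for all workers to run out of items.
        Task.WaitAll(workers.ToArray())
        Return processed
    End Function

    Sub Main()
        Dim files As New ConcurrentStack(Of String)({"1.PDF", "2.PDF", "3.PDF", "4.PDF", "5.PDF"})
        'Stand-in for the real copy to the SAN.
        Dim count = DrainStack(files, 3, Sub(f) Console.WriteLine("Copying " & f))
        Console.WriteLine(count.ToString() & " files processed")
    End Sub
End Module
```

The order of the "Copying" lines will vary from run to run. If the files should be handled in the order they were added, a ConcurrentQueue with TryDequeue would be the closer fit.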
 
Hello jmcilhinney,

Ok, that works so much better than the way I tried. lol. A few questions, if I may? I tried to read up on tasks first. I liked the way you used a lambda expression when adding the task.

1. It sounds like all of the tasks run in one thread pool. Does that mean they all run on one CPU, or would they be distributed across all logical cores?
2. Is there an upper limit or practical limit on the number of tasks one can add? Say I have 100,000 files to copy.
Should I:
A- Add all files to that IEnumerable list and pass it in to the subroutine?
B- Only pass in, say, 100 at a time, wait for them to finish, then take another chunk of files?

What are your thoughts on this?

Thanks
 
1. It sounds like all of the tasks run in one thread pool. Does that mean they all run on one CPU, or would they be distributed across all logical cores?
You should do some reading on the ThreadPool class for specifics, because I don't know the answer for certain, but I would expect all cores to be utilised. As far as I know, which core a thread runs on is the domain of the OS rather than .NET or the ThreadPool itself.

The ThreadPool always keeps at least a certain number of threads available to do work and will create more, up to an upper limit, if the number of queued work items grows too large. Once the work item queue drops to a certain level, the ThreadPool will start terminating unused threads after a while.

I don't know whether a single thread is executed on the same CPU core all the time. I'm guessing that is most efficient but, if another core becomes available, I would guess that threads waiting on a busy core get executed on the available one. Again, I would expect that to be up to the OS to manage, but I could be wrong about that.
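If you want to see the numbers the ThreadPool is working with on your machine, something like the following should print them. It is only a sketch: it uses Environment.ProcessorCount, ThreadPool.GetMinThreads and ThreadPool.GetMaxThreads, the module and variable names are arbitrary, and the values will differ by machine and .NET version.

```vb
Imports System
Imports System.Threading

Module ThreadPoolInfo
    Sub Main()
        Dim minWorkers, minIo, maxWorkers, maxIo As Integer

        'Both calls fill in the worker-thread count and the I/O completion-thread count.
        ThreadPool.GetMinThreads(minWorkers, minIo)
        ThreadPool.GetMaxThreads(maxWorkers, maxIo)

        Console.WriteLine("Logical cores:      " & Environment.ProcessorCount.ToString())
        Console.WriteLine("Min worker threads: " & minWorkers.ToString())
        Console.WriteLine("Max worker threads: " & maxWorkers.ToString())
    End Sub
End Module
```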
2. Is there an upper limit or practical limit on the number of tasks one can add? Say I have 100,000 files to copy.
Should I:
A- Add all files to that IEnumerable list and pass it in to the subroutine?
B- Only pass in, say, 100 at a time, wait for them to finish, then take another chunk of files?
A Task is an object, so I would expect the relevant limits to be the same as for any other type of object. If you use the ThreadPool directly, you call QueueUserWorkItem and pass a delegate, and I would imagine that's what happens inside a Task too, amongst other things. Other than available memory and the like, I would think the only limit is that the queue cannot hold more than Integer.MaxValue items, which is way bigger than 100,000.
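If queuing everything at once turns out to be too much for the SAN, one option that avoids chunking by hand is Parallel.ForEach with a cap on how many copies run at once. This is a rough sketch, not something tested against a SAN: destinationFolder and maxConcurrent are hypothetical parameters, the 3 in the demo is an arbitrary number you would need to tune, and the demo only copies throwaway files in the temp folder.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading.Tasks

Module BoundedCopySketch
    'Copies every file into destinationFolder, running at most maxConcurrent copies at a time.
    Public Sub CopyFiles(filePaths As IEnumerable(Of String),
                         destinationFolder As String,
                         maxConcurrent As Integer)
        Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = maxConcurrent}

        Parallel.ForEach(filePaths, options,
            Sub(filePath)
                'Keep the file name, change the folder.
                Dim target = Path.Combine(destinationFolder, Path.GetFileName(filePath))
                File.Copy(filePath, target, True) 'True = overwrite an existing file
            End Sub)
    End Sub

    Sub Main()
        'Demo with throwaway files in the temp folder.
        Dim src = Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName
        Dim dst = Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())).FullName

        For i = 1 To 5
            File.WriteAllText(Path.Combine(src, i.ToString() & ".txt"), "demo")
        Next

        CopyFiles(Directory.GetFiles(src), dst, 3)
        Console.WriteLine(Directory.GetFiles(dst).Length.ToString() & " files copied")
    End Sub
End Module
```

Parallel.ForEach blocks until all items are done, so the whole list can be passed in at once (option A) while the cap keeps the number of simultaneous copies down.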
 
You rock. Thank you for pointing me in the right direction. I hope to be able to help others on this forum once I get better myself. I'll do more reading on that, do some testing, and see how it goes. Thank you!!
 