Question Problem with multi threading

aceinc

Member
Joined
Mar 12, 2011
Messages
5
Programming Experience
10+
I have written a rather complex DLL for processing a large volume of data (~4,000,000 records) from an Oracle database. It works well, but takes about 23 hours to run. Since I wrote the code in a fairly compartmentalized way, I realized that there are 6 routines that should be able to run concurrently. I have a multi-core processor, so I decided to try my first multi-threaded implementation.

In the routine I am working on I am doing no I/O; everything is in memory in variables. My data is already in DataTables and Hashtables. Most of the data is defined as Private at class scope, but no conflicting updates should happen because the code segments work on very distinct portions of the DataTables.

The concept is simple: I have 2 routines that need to complete before launching my threads, and one routine that needs to run after all of the threads have completed. Here is what I am trying:


ProcessDXDate()
ProcessHistology()

aryThreads(0) = New Threading.Thread(AddressOf ProcessLaterality)
aryThreads(0).Start()
aryThreads(1) = New Threading.Thread(AddressOf ProcessTreatments)
aryThreads(1).Start()
aryThreads(2) = New Threading.Thread(AddressOf ProcessRadTherapy)
aryThreads(2).Start()
aryThreads(3) = New Threading.Thread(AddressOf ProcessSummaryStage)
aryThreads(3).Start()
aryThreads(4) = New Threading.Thread(AddressOf ProcessReportingSource)
aryThreads(4).Start()
aryThreads(5) = New Threading.Thread(AddressOf ProcessDXAddress)
aryThreads(5).Start()

aryThreads(0).Join()
aryThreads(1).Join()
aryThreads(2).Join()
aryThreads(3).Join()
aryThreads(4).Join()
aryThreads(5).Join()

ProcessSurgery()

My understanding, which appears to be incorrect, is that ".Join()" is supposed to pause processing until the thread has completed. It appears that not all of the threads complete before "ProcessSurgery()" runs.

How should I be doing this?

Paul
 
Your understanding is correct. For example, place a Debug.Write call at the end of each method and you'll see that output appear before all of the Joins return.
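A minimal sketch of that check, assuming two hypothetical routines `WorkA` and `WorkB` in place of the real processing methods. Each one writes a trace line as its last statement, and `Thread.Join` blocks the calling thread until the joined thread has terminated, so both "Finished" lines should appear before the final message:

```vbnet
Imports System.Diagnostics
Imports System.Threading

Module JoinDemo
    ' Hypothetical stand-ins for the real processing routines.
    Sub WorkA()
        Thread.Sleep(50)
        Debug.WriteLine("Finished WorkA")
    End Sub

    Sub WorkB()
        Thread.Sleep(10)
        Debug.WriteLine("Finished WorkB")
    End Sub

    Sub Main()
        Dim workers() As ThreadStart = {New ThreadStart(AddressOf WorkA), New ThreadStart(AddressOf WorkB)}
        Dim threads(workers.Length - 1) As Thread

        ' Create and start one thread per routine.
        For i As Integer = 0 To workers.Length - 1
            threads(i) = New Thread(workers(i))
            threads(i).Start()
        Next

        ' Join blocks until the given thread has terminated.
        For Each t As Thread In threads
            t.Join()
        Next

        Debug.WriteLine("All threads done")
    End Sub
End Module
```

Looping over the array keeps the start and join steps in sync if routines are added or removed later.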
 

OK, I did what you said. My code now looks like this:

ProcessDXDate()
ProcessHistology()
Debug.Print("Starting Threads")

aryThreads(0) = New Threading.Thread(AddressOf ProcessLaterality)
aryThreads(0).Start()
aryThreads(1) = New Threading.Thread(AddressOf ProcessTreatments)
aryThreads(1).Start()
aryThreads(2) = New Threading.Thread(AddressOf ProcessRadTherapy)
aryThreads(2).Start()
aryThreads(3) = New Threading.Thread(AddressOf ProcessSummaryStage)
aryThreads(3).Start()
aryThreads(4) = New Threading.Thread(AddressOf ProcessReportingSource)
aryThreads(4).Start()
aryThreads(5) = New Threading.Thread(AddressOf ProcessDXAddress)
aryThreads(5).Start()

aryThreads(0).Join()
aryThreads(1).Join()
aryThreads(2).Join()
aryThreads(3).Join()
aryThreads(4).Join()
aryThreads(5).Join()

Debug.Print("All Done Threads")
ProcessSurgery()

And at the end of each module I put a Debug.Print. It seems to work swimmingly for the first few iterations, with output that looks like this:
intCurrentRecord=3 | Count=1 | ID=4-1
------------Finished ProcessDXDate
------------Finished ProcessHistology
Starting Threads
------------Finished ProcessLaterality
The thread 0x820 has exited with code 0 (0x0).
The thread 0x7b4 has exited with code 0 (0x0).
------------Finished ProcessRadTherapy
The thread 0xa44 has exited with code 0 (0x0).
------------Finished ProcessDXAddress
The thread 0x1ab8 has exited with code 0 (0x0).
------------Finished ProcessSummaryStage
The thread 0x19f0 has exited with code 0 (0x0).
------------Finished ProcessReportingSource
The thread 0xad4 has exited with code 0 (0x0).
All Done Threads
------------Finished ProcessSurgery
Saving

On the fourth iteration I get:

intCurrentRecord=4 | Count=1 | ID=5-1
------------Finished ProcessDXDate
------------Finished ProcessHistology
Starting Threads
The thread 0x1af0 has exited with code 0 (0x0).
------------Finished ProcessRadTherapy
The thread 0x12ac has exited with code 0 (0x0).
------------Finished ProcessReportingSource
The thread 0x10d8 has exited with code 0 (0x0).
------------Finished ProcessDXAddress
The thread 0x1414 has exited with code 0 (0x0).
All Done Threads
A first chance exception of type 'System.InvalidCastException' occurred in Microsoft.VisualBasic.dll

It seems to not wait until all threads are finished. I will put a debug.print at the beginning of each module as well to make sure it starts each thread.

But if you have any ideas, let me know.

Paul
 
As mentioned in my previous post, I added a Debug.Print to the beginning of each module. After doing so, the problem went away in debug mode. Apparently the Debug.Print changed the timing enough that everything finished properly, processing 11 records, where before it would die on record 4.

As a test I compiled for release as opposed to debug, and when I run it against the same data it throws a "WindowsApplication1 has stopped working" appcrash error.

Here is the output for the record that failed in the previous attempt (which had only the "Finished" debug messages), now with the "Starting" debug messages added.

intCurrentRecord=4 | Count=1 | ID=5-1
++++++++++ Starting ProcessDXDate
------------Finished ProcessDXDate
++++++++++ Starting ProcessHistology
------------Finished ProcessHistology
Starting Threads
++++++++++ Starting ProcessLaterality
------------Finished ProcessLaterality
The thread 0x19c4 has exited with code 0 (0x0).
++++++++++ Starting ProcessTreatments
++++++++++ Starting ProcessRadTherapy
++++++++++ Starting ProcessSummaryStage
------------Finished ProcessTreatments
The thread 0xc18 has exited with code 0 (0x0).
------------Finished ProcessRadTherapy
The thread 0xd18 has exited with code 0 (0x0).
++++++++++ Starting ProcessReportingSource
------------Finished ProcessReportingSource
++++++++++ Starting ProcessDXAddress
The thread 0x1600 has exited with code 0 (0x0).
------------Finished ProcessSummaryStage
The thread 0x1b0c has exited with code 0 (0x0).
------------Finished ProcessDXAddress
The thread 0x420 has exited with code 0 (0x0).
All Done Threads
++++++++++ Starting ProcessSurgery
------------Finished ProcessSurgery
Saving

I am at a loss.

Paul
 
OK, so maybe the problem isn't that the modules are not waiting. I took the Debug.Print statements out of my modules and created a class-scoped private integer. The last line in each module adds 1 to that variable. I changed the code to look like this:


ProcessDXDate()
ProcessHistology()

Debug.Print("Starting Threads")
_intThreadCount = 0
aryThreads(0) = New Threading.Thread(AddressOf ProcessLaterality)
aryThreads(0).Start()
aryThreads(1) = New Threading.Thread(AddressOf ProcessTreatments)
aryThreads(1).Start()
aryThreads(2) = New Threading.Thread(AddressOf ProcessRadTherapy)
aryThreads(2).Start()
aryThreads(3) = New Threading.Thread(AddressOf ProcessSummaryStage)
aryThreads(3).Start()
aryThreads(4) = New Threading.Thread(AddressOf ProcessReportingSource)
aryThreads(4).Start()
aryThreads(5) = New Threading.Thread(AddressOf ProcessDXAddress)
aryThreads(5).Start()

aryThreads(0).Join()
aryThreads(1).Join()
aryThreads(2).Join()
aryThreads(3).Join()
aryThreads(4).Join()
aryThreads(5).Join()

Debug.Print("All Done Threads " & _intThreadCount)

While _intThreadCount < 6
    System.Threading.Thread.Sleep(1)
End While

Debug.Print("All Done Counting " & _intThreadCount)

For each record that succeeded I got "All Done Threads 6" and "All Done Counting 6."
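One caveat about a counter like this: a plain `+= 1` on a shared integer is a read followed by a write, so two threads can interleave and lose an update. `Interlocked.Increment` does the addition atomically. A minimal sketch, assuming a hypothetical `Worker` routine in place of the six modules:

```vbnet
Imports System
Imports System.Threading

Module CounterDemo
    Private _intThreadCount As Integer = 0

    ' Hypothetical worker; the real routines would do their processing first.
    Sub Worker()
        ' Atomic add, safe when several threads finish at the same time.
        Interlocked.Increment(_intThreadCount)
    End Sub

    Sub Main()
        Dim threads(5) As Thread
        For i As Integer = 0 To 5
            threads(i) = New Thread(AddressOf Worker)
            threads(i).Start()
        Next

        ' Once every Join has returned, all six increments have happened.
        For Each t As Thread In threads
            t.Join()
        Next

        Console.WriteLine("All Done Threads " & _intThreadCount)
    End Sub
End Module
```

Because every thread has terminated by the time the last Join returns, a polling loop on the counter after the Joins adds nothing.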

On the record that failed, it failed in "ProcessDXAddress" on a line that reads a field from one of the rows in a DataTable into a string, like this:

Dim strMyVariable As String = objInputDataTable.Rows(intDXDateRec)("MY_VARIABLE")

This field is set by a module well upstream of this one, and it is used in the non-threaded modules that run before I start the threads (ProcessHistology), which would fail if it were not set.

So, perhaps my understanding of how variables can be accessed is flawed.

What I currently have is:

A group of class-scoped private variables that contain data used for processing. One of the variables is a DataTable called "objInputDataTable", which contains fewer than 70 rows. "objInputDataTable" contains my input data and is used strictly as an input area; I never modify anything in this object. Each of the 6 threaded routines will, however, loop through it one or more times analyzing data elements (fields).

I do not understand how the data can have a value when I access it in the main thread, and most of the time the child threads see the data too, but sometimes accessing the data returns:

Run-time exception thrown : System.InvalidCastException - Conversion from type 'DBNull' to type 'String' is not valid.

Can anyone illuminate me regarding this?

Paul
 
A DataTable that is not changed can't have a row that sometimes has a DBNull value (i.e. no value) and sometimes something else. You should take a closer look at that table.
It is true, though, that DBNull can't be converted to String.
 
Perhaps a little help using the debugger might be useful. Please tell which FM, and where in the FM to find it if I need to RTFM.

a) How can I set a trap/breakpoint that fires whenever a value changes, such as a value in the DataTable?

b) From within the Immediate window, how can I display all the values in a DataTable?

Paul
 
Since you say that from a certain point the table is fully loaded and only read from then on, I would start there. You also think there should be no DBNull, so loop through all rows and all fields to verify whether that is true. If there is a DBNull, that is the problem. If there is not, and one appears later, some code must be filling the table or setting a value in it. The DataTable class does have events for change notification: Handling DataTable Events (ADO.NET)
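Both checks can be sketched roughly as follows. The table built in `Main` is a hypothetical two-row stand-in for objInputDataTable, and the helper names `ReportNulls` and `OnColumnChanged` are made up for illustration; `DataRow.IsNull` and the `DataTable.ColumnChanged` event are the real ADO.NET members:

```vbnet
Imports System
Imports System.Data

Module TableCheckDemo
    ' Reports every cell in the table that holds DBNull.
    Sub ReportNulls(ByVal table As DataTable)
        For rowIndex As Integer = 0 To table.Rows.Count - 1
            For Each col As DataColumn In table.Columns
                If table.Rows(rowIndex).IsNull(col) Then
                    Console.WriteLine("DBNull at row " & rowIndex & ", column " & col.ColumnName)
                End If
            Next
        Next
    End Sub

    ' Handler for DataTable.ColumnChanged: runs after a value in a column changes.
    Sub OnColumnChanged(ByVal sender As Object, ByVal e As DataColumnChangeEventArgs)
        Console.WriteLine("Changed column: " & e.Column.ColumnName)
    End Sub

    Sub Main()
        ' Hypothetical stand-in for objInputDataTable.
        Dim table As New DataTable("Input")
        table.Columns.Add("MY_VARIABLE", GetType(String))
        table.Rows.Add(New Object() {"abc"})
        table.Rows.Add(New Object() {DBNull.Value})

        ' Check 1: scan once, right after the table has been loaded.
        ReportNulls(table)

        ' Check 2: get notified if any code writes to the table afterwards.
        AddHandler table.ColumnChanged, AddressOf OnColumnChanged
        table.Rows(0)("MY_VARIABLE") = "xyz"
    End Sub
End Module
```

Setting a breakpoint inside the handler shows, via the call stack, which code is writing to a table that was supposed to be read-only.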
If some field may legitimately have no value, then DBNull is expected and you can check for it in code, for example using DataRow.IsNull or one of the methods mentioned in the remarks here: DBNull Class (System)
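A minimal sketch of such a guard, assuming a hypothetical helper `ReadString` and a one-row table whose only cell is DBNull. It tests the cell with `DataRow.IsNull` before converting, so the InvalidCastException is avoided:

```vbnet
Imports System
Imports System.Data

Module NullGuardDemo
    ' Returns the field as a String, or the fallback when the cell is DBNull.
    Function ReadString(ByVal row As DataRow, ByVal columnName As String, ByVal fallback As String) As String
        If row.IsNull(columnName) Then
            Return fallback
        End If
        Return CStr(row(columnName))
    End Function

    Sub Main()
        ' Hypothetical table with a single DBNull cell.
        Dim table As New DataTable("Input")
        table.Columns.Add("MY_VARIABLE", GetType(String))
        table.Rows.Add(New Object() {DBNull.Value})

        ' Prints "(no value)" instead of throwing on the DBNull cell.
        Console.WriteLine(ReadString(table.Rows(0), "MY_VARIABLE", "(no value)"))
    End Sub
End Module
```

Whether a fallback is appropriate depends on the data; if the field must always be present, logging the row and stopping may be the better response.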
 