Deleting records in a DataView problem

Bernie

Well-known member
Joined
Aug 13, 2007
Messages
98
Programming Experience
3-5
I have a DataTable in memory that is created from processed data from a previous query. To it I attach a Dataview. I'm reading through the DataView with a DataRecordViewer using a For Each loop.

The objective is to find similar records and then report the match to the user interface. When I find a matching record, I delete the record in the DataView using myDataView.delete. The records are deleted to make the next search run quicker.

The code runs fine in Visual Studio and also when installed on the development system. The database on the developement system is a copy of the production database.

When run on a production system with the production database, I get an error of "There is no row at position 16". The stack trace is "at System.Data.DataView.GetRecord(Int32 recordIndex).

Any suggestions why this could work on the development system and not on the production system?

It appears the first record deleted is around the 16th in the table, so that's what has me suspecting the deletion process. But when I step through the code it works find, so I have to sumise this from what's on the UI at the time of the crash.

Any suggetsions on how to duplicate the problem in the test environment?

Any suggestions on how to "refresh" the DataView after the deletes so it can be reused for the next search?

Thanks,

Bernie
 
Please don't post the same question more than once. If you want to add information then just add another post to the same thread. If you want to change something then edit your existing post or, if you can't, ask a moderator. Your other thread has been deleted.
 
Show us your code. I'm guessing that by deleting one row you are affecting the indexes of the remaining rows. Consider a list of two items that you want to delete both items from. If you delete the item at index 0 then there's only one item left, so you can't then delete the item at index 1.

If you want to loop through a collection and remove items as you go you should loop backwards. If you remove the item at index 1 first then you can safely move onto the item at index 0.
 
Please don't post the same question more than once.

Please fix the message board so it doesn't lock up IE.

IE locked up trying to post the first message. I logged back on with a new instance of IE and my post was not posted. Therefore I posted again.

Thank you for cleaning up the other post, but there is nothing I can do if the board locks up and doesn't show my post.

Bernie
 
Show us your code.

VB.NET:
    ds = getPatientsForTesting()
    dtPatients = ds.Tables("Patients")
    dtAddresses = ds.Tables("Addresses")

    dvPatients = New DataView(dtPatients)
    dvPatients.Sort = "LastName"
    dvAddress = New DataView(dtAddresses)
    dvAddress.Sort = "Zip5"

    Try
            For Each drTemp In dvPatients
                If cbSameAddress.Checked Then
                    If tmpAddress.Length > 0 Then
                        dvAddress.RowFilter = ("Zip5 = " & "'" & FirstFive(tmpZip) & "'")
                        i = 0
                        For Each drAddress In dvAddress
                            If tmpAddress.CompareTo(CheckForNulls(drAddress.Item("Address").ToString)) = 0 Then
                                i = i + 1
                            End If
                        Next
                        For Each drAddress In dvAddress
                            If tmpAddress.CompareTo(CheckForNulls(drAddress.Item("Address")).ToString) = 0 Then
                                If i > 1 Then
                                    DebugOutput("Cluster", _
                                                tmpAddress, _
                                                (drAddress.Item("PatientID").ToString & " - " & CheckForNulls(drAddress.Item("FirstName").ToString) & " " & CheckForNulls(drAddress.Item("LastName").ToString)) & _
                                                ", " & (drAddress.Item("MailingLabelType")).ToString)
                                End If
                                drAddress.Delete()
                            End If
                        Next
                    End If
                End If    ' cbSameAddress ...
            Next ' dataRow in dvPatients ...
        Catch ex As Exception
            MessageBox.Show(ex.Message)
            MessageBox.Show(ex.StackTrace)
            MessageBox.Show(ex.Data.ToString)
        End Try

I tried to cut the code down as much as possible, sorry it's so long.

The records being removed are in the middle of the DataView. I understand your point about deleting records from the end rather than the front. But that doesn't explain why the code runs on one system and not the other.

If it wasn't for a bunch of firewalls in the way I'm halfway considering VPNing my development machine into the production site to attach to the production database. I've double checked the ODBC driver settings, but it still bugs me that it runs here and not there.

Bernie
 
I'm reading through the DataView with a DataRecordViewer
A what?

The objective is to find similar records and then report the match to the user interface.
Define similar - it might be possible, and faster, to create an SQL that does the comparison youre running, rather than having the client do it

When I find a matching record, I delete the record in the DataView using myDataView.delete
Beware of Delete vs Remove; one has data loss implications at the database end, the other does not

When run on a production system with the production database, I get an error of "There is no row at position 16". The stack trace is "at System.Data.DataView.GetRecord(Int32 recordIndex).

You may be making the mistake of assuming that the row position in a datatable is correlated to the apparent position of a row in a dataview. A view can sort or filter data and may maintain its own internal list inedependent of the data source. Ergo you should never assume that a view positon is correlated to the underlying datasource layout. The MSDN boards have had note of some internal indexing issues with datatables resulting in similar error messages to yours (from my memory) but in most cases these erros have arisen through concurrent modification in multithreaded scenarios and datatable hasnt always been the best choice data container

Edit: I just looked at your code! Oh no! Youre using both filtering AND sorting.. AND youre modifying a colelction as you iterate over it. I'm really not surprised youre having problems.
Please fix the message board so it doesn't lock up IE
Please use firefox? :)

Side note: I use IE6, and I don't encounter any problem. Are you running any browser add ons that might interfere?



Any suggetsions on how to duplicate the problem in the test environment?
Rework your code so that it does not exhibit the error, possibly by using a different approach to the problem

Any suggestions on how to "refresh" the DataView after the deletes so it can be reused for the next search?
Why are you using a DataView at all? Youre allowed to iterate over a DataTable?!
 
Whoops DataRowViewer.

Define similar - it might be possible, and faster, to create an SQL that does the comparison youre running, rather than having the client do it
I'm thinking about that, but the table is over 20K records and I'd like to not have to hit the DB 10K times to get this done. I think I'll give this a try to benchmark some timing and overhead used.

Beware of Delete vs Remove; one has data loss implications at the database end, the other does not
I assume you mean conceptually rather than specific methods. In my case the data is all in memory with no connection back to the database, so the implecation is the same.

Side note: I use IE6, and I don't encounter any problem. Are you running any browser add ons that might interfere?
IE addins = death! No, I gave up on IE addins a long time ago, hahaha. For some reason this board has always been problematic for posting. I've just learned to copy my post before submitting just in case. Last night I got sloppy and had to retype, hahaha.

Why are you using a DataView at all? Youre allowed to iterate over a DataTable?!
This routine looks for exact addresses, it has a sister routine that looks exactly the same but only matches the first potion of the address, the numbers and a few letters. This avoids the problem of non stadardized addresses with the many ways to indicate street, road, or avenue. So I filtered the search group to the same zip code to help inprove the accuracy. I don't know of anyway to sort a datatable other than a dataview. Is there one?


Bernie
 
Whoops DataRowViewer.
There's no such thing. Do you mean DataRowView? Each item in a DataView is a DataRowView, just as each item in a DataRowCollection is a DataRow. The Rows property of a DataTable is type DataRowCollection while the DefaultView property is type DataView. One is the raw data and the other is a customisable view of that data.

Can you please explain, fully and clearly, what you're actually trying to achieve? That code is rather convoluted and I don't enjoy reading other people's uncommented code in an effort to work out what it does. I'm quite sure there's a far cleaner way to do what you're doing. For instance, the DataRow class has an IsNull method, so I'm guessing that your CheckForNulls method is redundant. How can you possibly check for nulls like that anyway, given that you're converting a null value to an empty string? A null value and an empty string are not the same thing and should not be used to represent the same thing in a database.
 
I'm thinking about that, but the table is over 20K records and I'd like to not have to hit the DB 10K times to get this done. I think I'll give this a try to benchmark some timing and overhead used.
It might be better to just give me the table structure and example data containing duplicates, and how you'd like me to present a query that finds them, and I'll assist you writing a query.

Queries don't "hit the db" any more than once, because they run within the db. Your process, on the other hand, selects a load of data that is irrelevant, drags it across the network and exceutes a slow and possibly flawed comparison routine; youre not using a hash table like the DB would and having a collection of 20,000 items, looping through it and then looping more to find similar items can result in up to 400,000,000 comparisons. Sorting first helps as it limits the looking, but even then a sort operation is more costly than loading a hash table.

This is why I say; tell us more about your problem then we can help fix it


I assume you mean conceptually rather than specific methods. In my case the data is all in memory with no connection back to the database, so the implecation is the same.
Remove would probably be more optimal in your case, if youre not saving changes to the db

IE addins = death! No, I gave up on IE addins a long time ago
Google toolbar 4 beta and HttpWatch Professional are my two must-haves



This routine looks for exact addresses, it has a sister routine that looks exactly the same but only matches the first potion of the address
SELECT addrLine1 FROM address GROUP BY addrLIne1 HAVING COUNT(*) > 1 would be better

If you want the whole row:
SELECT * FROM(SELECT t.*, COUNT(*) OVER(PARTITION BY addrLine1) as ctr) WHERE ctr > 1


the numbers and a few letters
SELECT * FROM(SELECT t.*, COUNT(*) OVER(PARTITION BY SUBSTR(addrLine1, 1, 5)) as ctr) WHERE ctr > 1

These are oracle syntax. Hopefully youre using a sufficiently advanced version of SQLServer that supports analytics, otherwise you'll have to rework them in a form that SQLS can understand. It's not hard, just tedious..


I don't know of anyway to sort a datatable other than a dataview. Is there one?
I kinda think youre asking me to do something I call "solution fixing" - you've envisaged a solution that is actually broken and now seek assistance for fixing a broken solution. It would be better if you can involve us in the problem, then we can fix the problem, not the problem with your solution to the problem.. Hopefully that makes sense!
 
I think I have this problem solved. After some more testing on various workstations, the failure is on the older workstations with slower processors and less memory. I suspect its related to page swapping but I don't have the desire to pursue it any further.

On Cjard's recommondation, I changed the methodology to not delete the records once they have been used. A new boolean field was added to the table to indicate if the record had been matched or not. The dataview filtering was then modified to include this field being true before the record would be shown. Once the record is matched, then that field is updated to false and it won't be shown again.

This fixes two problems, one the deleting of records while iterating through the table, and also the need to loop twice to tell if more than one record was available for the match.

This solution seems to have solved the problem on the older workstations and I think is better code overall.

Bernie
 
Hashtable would have been more optimal than loops

Well now you tell me. :D

Thanks for your help. I'll have to look into the hash table later, that programs already in the can and I've move onto the next one, hahaha.

Thanks again,
Bernie
 
Well now you tell me. :D

I tried to tell you many times before, including asking for a better description of your problem so I could tell you how to get the database to do this (and the database would have used a hashtable) ;)

But.. On to the next one..
 
Back
Top