DataColumn.Expression is too slow

Abattoir

I'm just wondering if anyone else has had a problem with Expression columns taking way too long to compute. I have a DataSet with 15 interconnected tables, and I'm doing a lot of Count operations on the parent tables, plus some Parent lookups, string concatenations, and a little conditional processing.
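
To illustrate the kind of thing I mean, here's a self-contained sketch - the table, column, and relation names are all invented for the example:

Imports System.Data

Module ExpressionDemo
    ' Invented schema, purely to illustrate the kinds of expressions I mean.
    Dim ds As New DataSet()
    Dim orders As DataTable
    Dim items As DataTable

    Sub Main()
        orders = ds.Tables.Add("Orders")
        orders.Columns.Add("OrderID", GetType(Integer))
        orders.Columns.Add("OrderNumber", GetType(String))

        items = ds.Tables.Add("OrderItems")
        items.Columns.Add("ItemID", GetType(Integer))
        items.Columns.Add("OrderID", GetType(Integer))
        items.Columns.Add("ProductName", GetType(String))

        ds.Relations.Add("Orders_OrderItems", orders.Columns("OrderID"), items.Columns("OrderID"))

        ' Count of child rows through the relation
        orders.Columns.Add("ItemCount", GetType(Integer), "Count(Child(Orders_OrderItems).ItemID)")

        ' Parent lookup plus string concatenation
        items.Columns.Add("Caption", GetType(String), "Parent(Orders_OrderItems).OrderNumber + ': ' + ProductName")

        ' A little conditional processing
        orders.Columns.Add("Status", GetType(String), "IIF(ItemCount > 0, 'Open', 'Empty')")
    End Sub
End Module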

It's not bad with a small amount of data, but when I get into the 10-20k row range it suddenly becomes very computationally expensive. I've taken to eliminating Expression columns and instead incrementing/decrementing plain Count columns based on child table events, etc.
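
The workaround looks roughly like this (continuing the invented Orders/OrderItems schema from the sketch above, but with ItemCount as a plain Integer column this time, not an expression):

' Maintain the count by hand instead of using "Count(Child...)".
orders.PrimaryKey = New DataColumn() {orders.Columns("OrderID")}
AddHandler items.RowChanged, AddressOf OnItemChanged
AddHandler items.RowDeleted, AddressOf OnItemDeleted

Private Sub OnItemChanged(ByVal sender As Object, ByVal e As DataRowChangeEventArgs)
    If e.Action = DataRowAction.Add Then
        BumpCount(CInt(e.Row("OrderID")), +1)
    End If
End Sub

Private Sub OnItemDeleted(ByVal sender As Object, ByVal e As DataRowChangeEventArgs)
    ' The row's current values are gone after the delete; read the originals.
    BumpCount(CInt(e.Row("OrderID", DataRowVersion.Original)), -1)
End Sub

Private Sub BumpCount(ByVal orderId As Integer, ByVal delta As Integer)
    Dim parent As DataRow = orders.Rows.Find(orderId)   ' Find needs the primary key set
    If parent IsNot Nothing Then
        parent("ItemCount") = CInt(parent("ItemCount")) + delta
    End If
End Sub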

Is there anything I can do to speed these up? They are very helpful for development, but I can't seem to get them up to production speed.
 
A DataSet isn't supposed to be a world-class replacement for an enterprise-level database, and it certainly isn't intended for use with 10 to 20 thousand rows at a time. It's a local container for a small amount of data, downloaded from the database for the user to peruse and update. No user can effectively deal with tens of thousands of data items, so why bother loading them all? I suggest you redesign your app to use data more selectively and wisely, and also make better/more use of the analytical/aggregating capabilities of whichever RDBMS you are using.
 
There's no way for me to do that in this application without abandoning a disconnected architecture. I am already being quite selective in the rows I am downloading - to do any less would result in excessive database reads.

Based on my research, I would respectfully disagree with you on the capabilities of the DataSet, at least when speaking of ADO.NET 2.0. Version 1.1 - sure, loading time degraded disproportionately as the number of rows in a DataTable grew. Tests of the new DataSet, however, show a speed boost by a factor of ~40 when loading a DataTable with 1 million rows.

I can load my DataSet in under a minute (even 20 seconds) without expressions; with them, it takes 10 minutes. DataBinding and record navigation in code are excellent - no issues there either.

Being able to use Expressions properly would really simplify my development, but they slow things down too much to be useful. Just wondering if anyone else has seen this.
 
You don't need to respectfully disagree with me; I'm pretty hard to offend :D - I'm an opinionated sod and annoyingly right a lot of the time.. ;)

Hopefully you don't mind me making the observation that only a fool would load a million rows into a DataTable. There is no way, on earth, that they were designed to hold and manipulate that much data.
Seriously, why would you spend thousands of dollars on an enterprise-class database system when you might as well just save all million rows on the hard disk in an XML file, if you're going to use the free-with-the-framework DataSet as the container to manipulate it?

If you're being selective in loading only a million rows, then I find it hard to take you seriously, sorry. You don't quite seem to realise just how fast blitting bits of data from a database is, or where your data should be stored. There isn't much you can do to improve the performance of .Expression; you're already using it way out of spec.

Might I suggest you move some business logic into the database? If you really must work with a huge amount of data, write some stored procedures and leave the data where it is best stored: in the system that MS spent years/millions developing solely for data storage and analysis purposes, not the DataSet architecture that has (relatively speaking) had months/thousands thrown at it in a different direction (i.e. not data storage).
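
Something along these lines - a rough sketch where the connection string and the stored procedure name (dbo.GetOrderItemCounts) are invented - so the RDBMS does the counting and you only download the totals:

Imports System.Data
Imports System.Data.SqlClient

Module ServerSideCounts
    Sub Main()
        ' Let the server aggregate; pull back only the summary rows.
        Dim summary As New DataTable("OrderSummary")
        Using conn As New SqlConnection("Server=.;Database=Sales;Integrated Security=True")
            ' Hypothetical proc, e.g. SELECT OrderID, COUNT(*) AS ItemCount
            ' FROM OrderItems GROUP BY OrderID
            Dim cmd As New SqlCommand("dbo.GetOrderItemCounts", conn)
            cmd.CommandType = CommandType.StoredProcedure
            Dim da As New SqlDataAdapter(cmd)
            da.Fill(summary)   ' the counting happened in the database, not the DataSet
        End Using
    End Sub
End Module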
 
I agree with you that a DataTable is not the right place to load a million rows - I was using it to show that the performance of ADO.NET has greatly improved when it comes to large DataSets. If 1,000,000 rows can be loaded into a DataTable in an acceptable amount of time, then surely 10,000 isn't too much for it to handle.

I'm being selective in loading only 10k rows, yes. I could have designed the tool to load 10-100 rows with every click of the mouse, or just the 10k I need in one shot. I leave the other 20 million rows in the database, where they rightly belong.

It actually handles it very well, with the singular exception of the Expression column. All of my data is loaded in < 5 seconds, and reaction time is instantaneous. With the expressions, my load time jumps to 2 minutes.

I do understand how quickly data can be spit out of a database - I also understand connection overhead, server loading, and reaction time. Do you know of any source that describes the size constraints of DataSets?
 
I agree with you that a DataTable is not the right place to load a million rows - I was using it to show that the performance of ADO.NET has greatly improved when it comes to large DataSets. If 1,000,000 rows can be loaded into a DataTable in an acceptable amount of time, then surely 10,000 isn't too much for it to handle.
I'm not sure that load time is a great indicator of performance. Effectively all you're testing is the speed of the network connection; rows are loaded into the table merely by being copied into client memory. I'd imagine that loading a hash table with a million entries would take a similarly minimal amount of time. Of course, loading and manipulating are different things..

I'm being selective in loading only 10k rows, yes. I could have designed the tool to load 10-100 rows with every click of the mouse, or just the 10k I need in one shot. I leave the other 20 million rows in the database, where they rightly belong.
What I can't understand is: typically, we use DataSets to download some data.. maybe up to 1,000 rows if a customer really has made a lot of orders this month, but usually closer to, er.. 2 or 3 rows. We edit them, and we send them back.

What can your program possibly do that means:
a) the user needs to see 10,000 rows of info?
b) the user can effectively manipulate 10,000 rows of info? (that's what update statements are for)
c) the user needs to do something significant with 10,000 rows of info?

That's what I'm having difficulty understanding. I can think of NO situation where 10,000 lines of info is an acceptably small amount of information that a user can effectively look at, navigate, use, add up, consider, and not get lost in.

It actually handles it very well, with the singular exception of the Expression column. All of my data is loaded in < 5 seconds, and reaction time is instantaneous. With the expressions, my load time jumps to 2 minutes.
Expressions probably aren't compiled into .NET code; they will likely be interpreted every time - parsed from string into numbers, operators parsed: If "+" Then a + b, ElseIf "-" Then a - b, and so on.
You're not looking at something quick here; you're looking at a simple, scripting-level intelligence add-on to a DataTable that is intended to maybe add up 10 items, or give Cost * Qty = Price. DataSets are not relational databases, and Expressions are no substitute for analytical queries (rank() over (partition by group_number order by rec_date desc)) and other advanced SQL.
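
Something like this caricature - emphatically not the real ADO.NET internals, just the shape of the cost - paid once per row, per expression column:

' Caricature of interpreted evaluation (NOT the actual ADO.NET code):
' the operator is re-examined as a string for every single row.
Function Evaluate(ByVal op As String, ByVal a As Double, ByVal b As Double) As Double
    Select Case op
        Case "+"
            Return a + b
        Case "-"
            Return a - b
        Case "*"
            Return a * b
        Case "/"
            Return a / b
        Case Else
            Throw New ArgumentException("Unknown operator: " & op)
    End Select
End Function

Multiply that string-dispatch business by 20,000 rows and a dozen expression columns, and a two-minute load stops being surprising.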

I do understand how quickly data can be spit out of a database - I also understand connection overhead,
.NET pools connections, so there is virtually no overhead.
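
A quick sketch (connection string invented) of why: with SqlClient, pooling is on by default, so "opening" a connection is usually just a checkout from the pool:

Imports System.Data.SqlClient

Module PoolingDemo
    Sub Main()
        For i As Integer = 1 To 100
            ' After the first iteration, Open() is a cheap pool checkout.
            Using conn As New SqlConnection("Server=.;Database=Sales;Integrated Security=True")
                conn.Open()
            End Using   ' Dispose returns the connection to the pool; it isn't torn down
        Next
    End Sub
End Module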

server loading,
Opening a cursor to read 10,000 rows ties the database up for longer than reading 10 rows does. If 100 people all load 10,000 rows, don't you think the database will be busier than if they each asked for 10 rows?

and reaction time.
If the server is limited to X connections, then keeping those connections open for longer (10,000 rows) lengthens your reaction time, as queries queue up waiting for the release of resources.

Do you know of any source that describes the size constraints of DataSets?
The memory available in the client machine - though that doesn't mean it is sensible to load 2 GB of data into a DataSet.
 