DataSet or DataReader?

cjard

Well-known member
Joined
Apr 25, 2006
Messages
7,081
Programming Experience
10+
I offer the following as a supplement/corrective advice (the article seems to be written by someone with an interest that is focused on web things, which does change the choice you will make)

First off, here is a brief orientation to ADO.NET 2.0 major components:

A DataSet is a collection of DataTable
A DataTable is a collection of DataRow
A DataRow is a collection of data elements
A TableAdapter is a device that uses a hidden DataAdapter (which is based on a hidden DataReader) to fill a DataTable with DataRows from a data source such as a database, and send updates (changes to the DataRows) back to the data source
TableAdapters do not fill DataSets. DataSets do not hold data directly.
In .NET 2.0 controls are usually bound to data through a BindingSource
A BindingSource sits on top of a block of data (such as a DataTable) and maintains knowledge of position; changing the .Position of a BindingSource changes what row of data all controls that bind through that BindingSource, are looking at
BindingSources can also filter and sort the rows they present, independently of the DataTable or DataRelation upon which they are based
A DataRelation acts as a filter device to only show records from a child data source (usually a DataTable) whose foreign key matches the primary key of the current row of the relation's parent table. Because the concept of "Current Parent Row" is needed for a DataRelation to work, they are usually found within BindingSources; the Parent BindingSource exposes all the DataRelations whose parent table is the the table upon which the BindingSource sits. Any databound entity (usually a BindingSource) bound to the DataRelation, only receives a filtered set of rows from the child data source, dependent on which parent row the Parent BindingSource is positioned at
A BindingNavigator is a device that alters the position of a BindingSource through use of its [|<], [<], [>] and [>|] buttons, and can be used to invoke common actions on the BindingSource, such as .AddNew(), .Delete() and .Update()


Now that we know what is going on, which to choose? Reader or Adapter/Table?


A DataReader is the only way data comes out of a database. Anything else is an additional device to fill the need for some functionality.



You would use a DataReader directly when:
  • You want forward-only, read-only access to data coming out of a database and you can process it on the fly
    • For example, you are dumping search results to a web page (data grid), or writing a list of values to a text file on the client PC (e.g. youre dumping a list of products that will then be emailed to your supplier, thereby creating an order)
  • You cant support, or dont want the overhead (in your client machine's memory) of downloading thousands or millions of rows from the database.
    • Remember that a server serving up ASP.NET web pages is, when pulling data out of a database, a client of the database. You must work within the confines of its memory
You would use a DataTable when:
  • You want to download a small amount of data from a database, and you want to scan over that data backwards and forwards or have random-access to it
  • You want to download data and show it on screen (in a Windows Forms client app)
  • You want to download data that will be viewed and changed by the user, and the changes sent back to the database (in Windows Forms)
  • You want to download data that doesnt change often (lookup table values) and you have some way of storing it in a globally accessible location for all your app to use (some kind of Session on ASP.NET or a global static class or module, in Windows Forms)
  • You want to group, count, sum or present things hierarchically (though using the database to do this is often a better idea)
You would use a DataSet of DataTables when:
  • You are displaying related data to your users, with the expectation that they will view/edit it and changes will be sent back to the database.
  • You require the facility of DataRelations (similar to above point)
    • If you have database tables that participate in several relations i.e. your one Addresses table is used to store Supplier, Customer and Employee addresses, consider making several DataSets e.g. EmpDataSet[Employee,Address], SupDataSet[Supplier,Address] ... you will find this approach much easier to work with than arranging all your tables in one dataset, and having several DataRelations attached to one table
  • You want to easily persist to disk, in xml format (or read from disk in xml format) a collection of DataTables
  • You have lots of lookup tables for use across your app and want them all in one place
  • You want to use typed DataSets, generated by the DataSet Designer, and all the wonderful things that go with them (TableAdapters, Compile time checking of your data access code to reduce bugs and speed up development etc)
Nowadays, the argument for using a DataReader directly is limited, in a similar way that writing your own database driver for Oracle doesnt really make sense when Oracle provide a perfectly good, working one with wonderful functionality. I only ever use a DataReader when I am pulling millions of rows out of my database and dumping them to disk.

It is extremely rare that I do this

The rest of the time (even if I am reading thousands of rows for dumping to disk), I use a datatable - the client machines have enough memory to cope with it, and (typed) DataTables are much nice to work with than untyped DataReaders

The line in the original article:
The sooner the data is off-loaded from the DataReader and the connection is closed the better the application performance.
Is a little moot. You see, a DataAdapter, or TableAdapter (more sophisticated wrapper of a DataAdapter) uses a DataReader and reads everything, very quickly, in a forward-only fashion from the database. Dumps all the data into a DataTable and closes the connection. It doesnt get much faster than this in terms of "get the data and close the conenction" so I disagree with the original article's inference that you should use a DataReader when you want the data offloaded and the connection closed as soon as possible. I virtually guarantee that whatever you use the DataReader for, you wont get that connection closed as quickly as the DataAdapter/TableAdapter gets it closed.

One might say that adding a datatable, reading data into into it, and then copying all that data into a DataGridView (Windows Forms) is expensive, using twice the resources that the DataReader appraoch uses, and yes.. It does.. But youre not supposed to copy the data from the table to the grid..

In MVC (not MFC!) concept we have what is known as:
Model
This is where your data is stored. The DataTable is the Model.

View
This is the component that shows the data found in the Model. The DataGridView is the View here. When used in data-bound mode, you dont copy from the datatable into the grid, the grid just shows what it finds in the Model. Change the Model, the View changes what it shows

Controller
I mentioned changing the Model; that's what this does. Most of the time, the thing that changes the Model is the same thing that Views the Model. It doesnt really make sense (usually) to show some text from your database, in a Label, but then have a TextBox to edit it.. because the TextBox is capable of being the View.
But suppose you have a button called [Clear] that removes all the information from the Model. Pressing this button clears the Model and suddenly the View (grid) updates to show nothing. This is an example of a Controller; it affects the Model, but it doesnt behave as a View of anything in the Model.


MVC is a good OO concept. Put your data here, show it using that component there. "There" and "here" shouldnt be the same thing. Dont store your data into your View components.

It is important to remember there are now two data retrieval classes
Well, not really. As I said before, only ever a DataReader is used to do the donkeywork of pulling data from a database.. Whether that DataReader is hidden inside an adapter or used directly, there is still only one. However, it is true that you should carefully consider when to use each. Using a DataReader directly is/should be uncommon because there is only a limited set of circumstances when it can truly be used for good reason.
Most of the time, data is to be gotten, shown, changed and sent back. That's where datasets/tables/adapters come in, and they do it very nicely.
 
Bookmarked for future reference.

-tg
 
I'm going to briefly expand on the TableAdapter(TA) because it seems some things are not well understood.

TA is quite a hard-working component, but remember, it is only a device that moves data from a database to your client machine and back to the database again.. It has nothing more to do with the data.

Data is downloaded from a database using the Fill command(s) offered by the TA. If you have created a parameterised query, the Fill() command will be created by the wizard, to have inputs for all your parameter values.

Data is usually uploaded to the database using the Update() command. I personally feel it was somewhat foolish of microsoft to have called the method such, as it repeatedly causes confusion with the SQL query type of the same name. Update takes a datatable and scans every row. Depending on the RowState of the row (Added, Modified, Deleted), the relevant InsertCommand, UpdateCommand or DeleteCommand is run respectively. A further point of confusion is found in the default TA wizard option of "Generate DBDirect methods" which generates methods on the TA that allow the developer to directly Insert, Update or Delete a record without using bound data at all. Because the DB Direct method for updating data is also called Update it becomes overloaded, so that a TA with DB Direct methods has a up to 6-times overloaded Update() method.
Overloads 5 and 6 take individual parameters, invoke an UPDATE query on the database but do not require the use of bound data. Overloads 1-4 take a dataset/datatable and iterate it, looking for rows of various states and firing off relevant Insert, Update or Delete queries in order to sync the database with the DataTable

As such the overloads of Update() that take a datatable, or dataset should actually be thought of as Save(), and not exclusively with the notion of updating data that already exists.

Use of the DB Direct methods:
TA.Insert(individual, values)
TA.Update(individual, values)
TA.Delete(individual, values)

will have no effect on any bound data.

-

If your TA's InsertCommand, UpdateCommand or DeleteCommand refer to a stored procedure that has output parameters, or an SQL text that is capable of returning values after it has executed (used when e.g. the database calculates a new ID upon insert and needs to pass the ID back), the use of Update() will cause the relevant DataRow to be populated with any values that were returned. If youre calling one of the DB Direct methods, you'll need to pass in a ByRef argument. When the method has finished executing the variable will be altered to the value the database calculated. This feature is only available on database systems that support these kind of stored procedures or sepcialized parameterised queries

-

TA connections can be customised in run time by clicking the relevant TA in design view, and altering the access level for the ConnectionModifier to something accessible to you. This also allows low level access to the database connection for purposes such as starting/enrolling transactions. TA are also programmed such that, if you open a connection explicitly before you use the TA, the TA will not close the conenction when it is done. In cases where the connection was NOT opened explicitly, the TA will open, use and close the conenction for you. Unless youre using connection-level transactions it is recommended that you do not open and close connections yourself.

-

I'll talk more about TA eventually, this was just a quick 15 minutes update (no pun intended) ;)
 
Back
Top