Typed datasets make sense with large static tables, queries, etc., but it doesn't really make sense to use them here. I'm always changing the tables to suit my needs.
Whenever I've designed a database app, the db schema has been the first thing to be concreted. Like you, I can't cope with having to faff about changing the datasets all the time, so the approach I take is to get the data design right first time.
However, it isn't that hard to add columns etc. - just throw them into the TableAdapter and let the wizard update the rest of the queries. Stick to simple queries for your SELECT and it won't have a problem creating/maintaining the Insert/Update/Delete statements.
and it's irritating to have to retype the datasets each time as opposed to typing in a field name (I know... I know... it doesn't take long either way, but I'm partial to my way of doing things!).
The other thing to consider, if you keep changing your database schema, is that it might make some code stale, or out of date. Given that the compiler cannot check this:
ds.Tables("myTable").Rows(0).Item("myCol")
in the same way it can do this:
ds.myTable(0).myCol
You can code in a lot of bugs using untyped access. However you want to do it, though, is your choice.
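Just to make that concrete, here's a minimal sketch; the table, column and dataset names are only placeholders, and MyTypedDataSet stands for whatever class the dataset designer generated for you:

' Untyped access: the column name is just a string, so a typo compiles fine
' and only blows up at run time
Dim ds As New DataSet()
' ... fill ds from your TableAdapter/DataAdapter here ...
Dim v As Object = ds.Tables("myTable").Rows(0).Item("myColl")   ' typo of "myCol" - runtime error

' Typed access: the column is a real property on the generated row class,
' so the same typo simply refuses to compile
Dim tds As New MyTypedDataSet()   ' MyTypedDataSet = whatever the designer generated for you
' ... fill tds here ...
Dim v2 As Object = tds.myTable(0).myCol   ' typing "myColl" here would be a compile error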
Sometimes they get repeated for a particular number value (this CANNOT be changed, because each is independent of the others due to other unique info).
That's fine; it is expected that a child table's primary key is different from the parent's, and that the child rows reference a particular primary key in the parent, such that many child rows exist for one parent row.
The important thing here is to consider carefully which table will drive the query, and also which rows are actually wanted from the child. It might be possible to reduce the repeated rows from the child to none with careful choice of a WHERE clause.
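For example, and purely as an illustration with made-up Customers/Orders tables: if what you actually want from the child is just each customer's most recent order, a WHERE clause can collapse the child side to one row per parent before DISTINCT is ever tempting:

' one row per customer, because the WHERE clause picks exactly one order each
Dim sql As String = _
    "SELECT c.CustomerID, c.Name, o.OrderDate " & _
    "FROM Customers AS c INNER JOIN Orders AS o ON o.CustomerID = c.CustomerID " & _
    "WHERE o.OrderDate = (SELECT MAX(o2.OrderDate) FROM Orders AS o2 WHERE o2.CustomerID = c.CustomerID)"

Small caveat: if two orders share the same latest OrderDate for a customer you'd still see two rows, so ties need a tie-breaker column.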
The whole purpose of the DISTINCT keyword is to cater for this.
Contentious though it may be, I hold the opinion that the whole purpose of the DISTINCT keyword is to cater for lazy SQL programmers who don't know how queries work or why they get repeated rows. They use it to conveniently reduce their result sets to unique elements, without considering that generating a million repeated rows through semi-cartesian joins and then reducing them back to unique rows through duplicate checking and hashing is far less efficient than just selecting only unique rows in the first place, which removes the cartesian aspect of the join altogether.
I hence normally recommend avoiding DISTINCT unless you are using COUNT(DISTINCT x), which is a rather different thing.
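To show the pattern I'm grumbling about (made-up Customers/Orders tables again):

' the join fans each customer out to one row per order, and DISTINCT then
' throws all those duplicates away again - generate-and-discard work for nothing
Dim lazySql As String = _
    "SELECT DISTINCT c.CustomerID, c.Name " & _
    "FROM Customers AS c INNER JOIN Orders AS o ON o.CustomerID = c.CustomerID"

All that query is really asking is "which customers have at least one order?", and there is a way of asking that which never produces a repeated row in the first place (sketched a little further down).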
You can analyse my table structure and design and you'll see it follows conventions and best practices. Done enough courses and db design to have that drilled into me.
I don't doubt that it does, because as you have described your tables, it is perfectly sensible that a child will have many rows for a particular parent. What we should not do, however, is join child to parent and then DISTINCT the result to remove the repeated rows. I'd be more than happy to take a look at the SQL and table designs and make some recommendations for query design, if you'd like?
design my query to avoid using DISTINCT - how is it possible?
It rather depends on the particular query, but once I see it I can explain in better detail.
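As a general illustration, though, the usual trick is to ask the question against the parent table only and push the child into an EXISTS, so the repeated rows are never generated and there is nothing left to DISTINCT away (same made-up Customers/Orders tables as above):

' "which customers have at least one order?" asked without the fan-out
Dim betterSql As String = _
    "SELECT c.CustomerID, c.Name " & _
    "FROM Customers AS c " & _
    "WHERE EXISTS (SELECT * FROM Orders AS o WHERE o.CustomerID = c.CustomerID)"

Jet is happy with correlated EXISTS subqueries like this, and because the child table never actually joins in, nothing gets multiplied out.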
They're not even the fields being DISTINCTed either so why didn't they just design the driver to use the primary keys only?
Oh, it's not so much that they are or are not used in the PK; you see, the PK becomes redundant after a 1:M join, because one table's PK values can be repeated in the result (thus they are no longer primary, unique, or capable of identifying a single row). I know what you are angling at: you think the rows can be DISTINCTed using only the PKs of each table, and the additional data tacked on afterwards, but this isn't how DISTINCT works. Let me explain:
The memo fields are truncated to preserve resources.
Let's say you have a query that selects 4 text fields at 100 bytes each, and 3 memo fields, which can be what, 64kb each?
So potentially each row can be 4 * 100 + 3 * 65536 = 197,008 bytes.
Now suppose the query returns 1000 rows; that's 197,008,000 bytes or, to you and me, roughly 188 megabytes.
Now how do we de-dupe things? The easiest way is to use a hash table with a key of what we are deduping and a null value:
If Not ht.Contains(theKey) Then ht.Add(theKey, Nothing)   ' else: it's a duplicate row, ignore it
Remember that theKey is ~197kb of data. It has to be hashed, which takes some time. The hashtable is then probed to see if it already contains that key; this isn't always an O(1) operation, so it can take a varying amount of time.
Then, if the key isn't in the table, it is put there. It's not sufficient to store just the hash, in case another, different key hashes to the same value (unlikely but possible), so the whole 197kb key has to be stored.
Repeat this op for all 1000 rows. --> Long time
So our hash table eventually contains somewhere between ~197kb (all rows are dupes) and roughly 188 megs (all rows are unique).
The hashtable is paged and sent out de-duped
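In VB.NET terms, the whole pass looks roughly like this - just a sketch of the idea, not what Jet literally does internally:

Imports System.Collections
Imports System.Collections.Generic
Imports System.Data

Module DedupeSketch
    ' rough idea of what DISTINCT has to do: hash every row's full contents
    ' and keep only the first occurrence of each
    Function Dedupe(ByVal table As DataTable) As List(Of DataRow)
        Dim seen As New Hashtable()               ' key = the concatenated row contents
        Dim unique As New List(Of DataRow)()

        For Each row As DataRow In table.Rows
            Dim key As String = ""
            For Each col As DataColumn In table.Columns
                ' a memo column can contribute up to ~64kb to this key on its own
                key &= Convert.ToString(row(col)) & ControlChars.NullChar
            Next

            If Not seen.Contains(key) Then        ' hash + probe...
                seen.Add(key, Nothing)            ' ...then keep the whole key so collisions can be told apart
                unique.Add(row)
            End If
        Next

        Return unique
    End Function
End Module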
Can you see how this is a hugely more intense operation than if we truncate the memo fields to 255 bytes before we hash them? (4 * 100 + 3 * 255 = 1,165 bytes per row.)
So it's a resources issue: memos have to be truncated to make hashing the whole row (so we can DISTINCT it) doable within reasonable time and memory constraints. What we must therefore do (there's a sketch of a memo-friendly query after this list) is:
Not use DISTINCT
Not GROUP BY in any query that selects a memo field we want to use
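If you do still need the full memo text out of a query that would otherwise tempt you into DISTINCT or GROUP BY, one approach (made-up names again, and only a sketch) is to do the de-duping on the key columns in a subquery and pull the memo straight off the parent row, so the memo never goes anywhere near the hashing:

' the memo is read from single Customers rows only; the de-duping happens
' on CustomerID inside the IN (...) subquery, where no memo is involved
Dim memoSafeSql As String = _
    "SELECT c.CustomerID, c.Name, c.NotesMemo " & _
    "FROM Customers AS c " & _
    "WHERE c.CustomerID IN (SELECT o.CustomerID FROM Orders AS o)"

The IN subquery may well produce duplicate CustomerIDs internally, but that's fine; IN only cares whether a match exists, and the memo field itself is never grouped, DISTINCTed or hashed, so it comes back untruncated.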
The end result is I now dislike Jet 4.0 because I have to manually iterate hundreds/thousands of times through my dataset making sure that there are no duplicate rows
So effectively you hit the same wall that Microsoft did: it takes you a long time to write code to de-dupe huge datasets. Now that you're in their shoes, hopefully you can accept that they can't wave a magic wand any more than you can. Their choice was to truncate the memo fields in DISTINCT cases.
The answer is simple: don't use DISTINCT. Jet is great, in its own way. We just have to accept the limitations it imposes, in the same way we accept the limitations of nature and physics that mean a human can't run faster than about 25mph or so...
where this should have been handled correctly through the query.
Heh, but don't blame poor performance of a db engine on the engine when the query it is being asked to run is suboptimal.
has enough memory to handle it anyway!
Just 1000 rows -> roughly 188 megs. Multiply that by 4 concurrent users and it's getting on for three-quarters of a gig, and you barely have enough RAM left for the O/S on most machines where Jet is found (desktops)..