Access incomplete value returned

UncleRonin

Well-known member · Joined: Feb 28, 2006 · Messages: 230 · Location: South Africa · Programming Experience: 5-10
I'm querying an Access database to get a memo value. When the value is returned it's only 255 characters long. It was working just now but SOMETHING happened to screw it up. I've checked in the database and the value is there at its full length. I've checked other memo fields as well and they all return a maximum of 255 characters.

Can anyone tell me what's going wrong, because this is ridiculous!
 
Post the code that is reading them, including the schema of any datasets they are being read into (XML is fine).
 
The thing is I use untyped datasets, so the schemas are a bit of a bugger. The database only has two tables with a total of 11 fields. Really small, so I just built three data adapters and go nuts. I steer clear of typed datasets for the simple reason I never do anything intense, so they aren't really needed. Bad practice, I know, but I am a strange one... in any case...

I was getting particularly angry so I removed the data adapter I was using (the one generating the dodgy 255-character string) and duplicated another data adapter which does something similar. I ran it and got the correct result. This pissed me off even further because the two DAs are identical aside from a single field. I added the additional field and again it worked perfectly. Now, you must understand when I tell you that I became furious! Why would this one work and not that one? As a random attempt at 'debugging' I mangled the SELECT statement I was using and removed DISTINCT from it, trying to narrow down the error. Lo and behold! It was the DISTINCT that was causing the problem. I really don't understand this! I've used SELECT DISTINCT in plenty of instances and they all worked fine. I've never used it with memo fields, but apparently there was a reason for this.

Has anyone ever encountered this before? I'm using the default Jet 4 driver, so could the problem maybe be with the driver? This is truly an unexpected problem.
 
I steer clear of typed datasets for the simple reason I never do anything intense
Uhm, but not using typed datasets is kinda like arguing that you make all your variables Object, in a class called MyObjects, or that you don't bother with the generic collections...

Typed datasets are nice, sensible ways of encapsulating differing data types.

You would make the following class:

VB.NET:
Class Person
  Public firstName As String
  Public surName As String
  Public age As Integer
  ' ...
End Class

rather than:

VB.NET:
Class MyObjects
  Public firstName As Object
  Public surName As Object
  Public age As Object
  Public myTempTreeviewNodeCollection As Object
  Public someOtherVariablesMyAppUses As Object
  ' ...
  ' 5000 lines later...
  Public yetAnotherTempVariableForThatProcessOnThatForm As Object
End Class

You type the rest of your app, including the variables that hold strings, ints, etc.
Why not type the datasets?

This annoyed me even further
Do try to avoid the P word! :D

I'm using the default Jet 4 driver, so could the problem maybe be with the driver?
Ah, now you never mentioned the grouping!

One of Jet's limitations is that queries involving aggregation (GROUP BY or DISTINCT) will cause memo fields to be truncated to 255 characters.
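If you want to see it for yourself, here's a minimal repro sketch (the tblNotes/ID/Notes table and column names and the connection string are placeholders I've made up, assuming a Jet 4 .mdb file):

VB.NET:
Imports System.Data
Imports System.Data.OleDb

Module MemoTruncationDemo
    Sub Main()
        Dim cs As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\test.mdb"

        ' Plain SELECT: the memo column comes back at its full length.
        Dim plain As New DataTable()
        Using da As New OleDbDataAdapter("SELECT ID, Notes FROM tblNotes", cs)
            da.Fill(plain)
        End Using

        ' SELECT DISTINCT: Jet truncates the memo column to 255 characters,
        ' even when no rows are actually duplicated.
        Dim truncated As New DataTable()
        Using da As New OleDbDataAdapter("SELECT DISTINCT ID, Notes FROM tblNotes", cs)
            da.Fill(truncated)
        End Using

        Console.WriteLine("Plain SELECT:    {0} chars", CStr(plain.Rows(0)("Notes")).Length)
        Console.WriteLine("SELECT DISTINCT: {0} chars", CStr(truncated.Rows(0)("Notes")).Length)
    End Sub
End Module

The second adapter will never give you more than 255 characters from the memo, no matter what is actually stored in the table.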

To be honest, you really shouldn't ever be using DISTINCT, except to modify counting aggregators. Never use it to suppress repeated rows. Instead, write the query better to remove the repeated rows (treat the cause, not the symptoms).
Additionally, don't GROUP BY any huge fields. While it is grouping, the database has to hold all the variations of the grouped columns in memory. The larger the data item being grouped, the more variations it can have, and hence the more memory it uses.

I personally think the Jet limitation is a good thing. In 13 years of developing, I don't think I've ever group-aggregated on a field longer than 255 characters.
 
Typed datasets make sense with large static tables, queries, etc., but it doesn't really make sense to use them here. I'm always changing the tables to suit my needs and it's irritating to have to re-type the datasets each time as opposed to typing in a field name (I know... I know... it doesn't take long either way, but I'm partial to my way of doing things!).

The bugger with not being able to use DISTINCT is that the secondary table contains lots of number values, each with a reference to a particular entry in the primary table. Sometimes they get repeated for a particular number value (this CANNOT be changed because each is independent of the others due to other unique info - it just happens that the values can be repeated). So without DISTINCT, I have multiple references to the same items in the primary table. The whole purpose of the DISTINCT keyword is to cater for this. You can analyse my table structure and design and you'll see it follows conventions and best practices. Done enough courses and db design to have that drilled into me. *shakes* I don't like database design! But if, as you say, I should design my query to avoid using DISTINCT - how is it possible?

Also, I don't EVER GROUP BY. I might use ORDER BY once in a while on a primary key, but that's normal. *giggle* Personally I think it's a bit daft to truncate memo fields! They're MEMO fields! The whole point is to store long text. They're not even the fields being DISTINCTed either so why didn't they just design the driver to use the primary keys only? Generally that's all you should be using in any case, otherwise your db has too much redundancy.

Man, I can talk nonsense. The end result is I now dislike Jet 4.0 because I have to manually iterate hundreds/thousands of times through my dataset making sure that there are no duplicate rows, where this should have been handled correctly through the query. The result: it's much slower. And in any case, if a table is SO big and has SO many items/tuples/rows/whatever that the memo has to be restricted, then it is either badly designed or running on a server somewhere that has enough memory to handle it anyway!

Geh.
 
Typed datasets make sense with large static tables, queries, etc., but it doesn't really make sense to use them here. I'm always changing the tables to suit my needs

Whenever I've designed a database app, the db schema has been the first thing to be concreted. Like you, I can't cope with having to faff with changing the datasets all the time, so the approach I take is to get the data design right the first time.
However, it isn't that hard to add columns etc. - just throw them into the TableAdapter and let the wizard update the rest of the queries. Stick to simple queries for your SELECT and it won't have a problem creating/maintaining the INSERT/UPDATE/DELETE :D

and it's irritating to have to re-type the datasets each time as opposed to typing in a field name (I know... I know... it doesn't take long either way, but I'm partial to my way of doing things!).
The other thing to consider, if you keep changing your database schema, is that it might make some code stale, or out of date. Given that the compiler cannot check this:

ds.Tables("myTable").Rows(0).Item("myCol")

in the same way it can check this:

ds.myTable(0).myCol


You can code in a lot of bugs using untyped stuff. However you want to do it, though, is your choice...
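To make that concrete, here's a little sketch (the untyped DataSet ds is assumed to be already filled, and MyDataSet/myTable/myCol stand in for whatever the designer would have generated for you):

VB.NET:
' Untyped: a misspelt column name compiles fine and only blows up at
' runtime with an ArgumentException when the column can't be found.
Dim untypedValue As Object = ds.Tables("myTable").Rows(0).Item("myColl")

' Typed: the table and column are real, strongly typed properties, so a
' misspelt name such as tds.myTable(0).myColl simply does not compile,
' and the value already comes back as the right .NET type.
Dim tds As New MyDataSet()
Dim typedValue As String = tds.myTable(0).myCol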


Sometimes they get repeated for a particular number value (this CANNOT be changed because each is independent of the others due to other unique info
That's fine; it is expected that a child table's primary key is different to the parent's, and that the child rows reference a particular primary key in the parent such that many child rows exist for one parent row.

The important thing here is to consider carefully which table will drive the query, and also which rows are actually wanted from the child. It might be possible to reduce the repeated rows from the child to none with careful choice of a WHERE clause.


The whole purpose of the DISTINCT keyword is to cater for this.
Contentious though it may be, I hold the opinion that the whole purpose of the DISTINCT keyword is to cater for lazy SQL programmers who don't know how queries work or why they get repeated rows, and who use it to conveniently reduce their result sets to unique elements. They give no consideration to the fact that generating a million repeated rows through semi-cartesian joins, then reducing them to unique rows through duplicate checking and hashing, is far more inefficient than just selecting only unique rows in the first place, thereby removing the cartesian aspect of the join.

I hence normally recommend avoiding the use of DISTINCT unless you are using COUNT(DISTINCT x), which is a rather different thing :)
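To illustrate what "selecting only unique rows in the first place" can look like (Parent, Child and their columns here are invented names, not your schema): rather than joining every matching child in and then DISTINCTing the repeats away, ask only whether a matching child exists:

VB.NET:
' Generates one row per matching child, then relies on DISTINCT to squash them.
Dim wasteful As String = _
    "SELECT DISTINCT p.* " & _
    "FROM Parent AS p INNER JOIN Child AS c ON c.ParentID = p.ID " & _
    "WHERE c.SomeValue = 42"

' Never produces the repeats at all: one row per qualifying parent.
Dim better As String = _
    "SELECT p.* " & _
    "FROM Parent AS p " & _
    "WHERE EXISTS (SELECT * FROM Child AS c WHERE c.ParentID = p.ID AND c.SomeValue = 42)"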

You can analyse my table structure and design and you'll see it follows conventions and best practices. Done enough courses and db design to have that drilled into me.
I don't doubt that it does, because as you have described your tables, it is perfectly sensible that a child will have many rows for a particular parent. What we should not do, however, is join child to parent and then DISTINCT it to remove repeated rows. I'd be more than happy to take a look at the SQL and table designs and make some recommendations for query design, if you'd like?


design my query to avoid using DISTINCT - how is it possible?
It rather depends on the particular query, but once I see it I can explain in more detail.

They're not even the fields being DISTINCTed either so why didn't they just design the driver to use the primary keys only?
Oh, it's not so much that they are or are not used in the PK; you see, the PK becomes redundant after a 1:M join because one table's PK can be repeated (thus it is no longer primary, unique, capable of identifying a single row). I know what you are angling at: you think the rows can be DISTINCTed using only the PKs of each table, and then the additional data can be tacked on afterwards, but this isn't how DISTINCT works. Let me explain:

The memo fields are truncated to preserve resources.

Let's say you have a query that selects 4 text fields at 100 bytes each, and 3 memo fields, which can be what... 64 KB each?
So potentially each row can be 4 * 100 + 3 * 65536 bytes, so that's 197,008 bytes per row.
Now suppose the query returns 1000 rows: that's 197,008,000 bytes or, to you and me, about 187 megabytes.

Now how do we de-dupe things? The easiest way is to use a hash table with a key of what we are deduping and a null value:


If Not ht.Contains(theKey) Then ht.Add(theKey, Nothing) ' only add keys we haven't seen before

Remember that theKey is ~197 KB of data. It has to be hashed, which will take some time. The hashtable is then queried to see if it contains that hash. This isn't always an O(1) operation, so it can take a varying amount of time.
Then, if the key isn't in the table, it is put there. It's not sufficient to store just the hash, in case another, different key hashes to the same value (unlikely but possible), so the whole 197 KB has to be stored.
Repeat this op for all 1000 rows. --> Long time.

So our hash table eventually contains somewhere between 197 KB (all rows are dupes) and 187 megs (all rows are unique).

The hashtable is paged and sent out de-duped



Can you see how this is a hugely more intense operation than if we truncate the memo fields to 255 before we hash them? (1165 bytes per row)


So, it's a resources issue. Memos have to be truncated to make hashing the whole row (so we can DISTINCT it) doable within reasonable time and memory constraints. What we must do is:
Not use DISTINCT
Not GROUP BY on any table containing a memo that we want to use
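For what it's worth, the manual de-dupe you'd otherwise end up writing in .NET is essentially the same hash-based trick, which is why it hurts just as much. A rough sketch only (it assumes table is a DataTable you've already filled, and the "|"-joined key is crude but enough to show the idea - note that every surviving key, memo text and all, sits in memory):

VB.NET:
Dim seen As New Hashtable()                 ' key = the whole row, value = nothing
Dim unique As DataTable = table.Clone()     ' same schema, no rows yet

For Each row As DataRow In table.Rows
    ' Build the de-dupe key from every column value, just as DISTINCT must.
    Dim items As Object() = row.ItemArray
    Dim parts(items.Length - 1) As String
    For i As Integer = 0 To items.Length - 1
        parts(i) = Convert.ToString(items(i))
    Next
    Dim key As String = String.Join("|", parts)

    If Not seen.Contains(key) Then
        seen.Add(key, Nothing)
        unique.ImportRow(row)
    End If
Next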


The end result is I now dislike Jet 4.0 because I have to manually iterate hundreds/thousands of times through my dataset making sure that there are no duplicate rows
So effectively you hit the same wall that Microsoft did. It takes you a long time to write code to dedupe huge datasets. Now that you're in their shoes, hopefully you can accept they can't wave a magic wand any more than you can. Their choice was to truncate the memo fields in DISTINCT cases.

The answer is simple: don't use DISTINCT. Jet is great, in its own way. We just have to accept the limitations it imposes in the same way we accept the limitations of nature and physics that mean a human can't run faster than about 25 mph or something...

where this should have been handled correctly through the query.
Heh, but don't blame the poor performance of a db engine on the engine when the query it is running is suboptimal.

has enough memory to handle it anyway!
Just 1000 rows -> 187 megs. Multiply by 4 concurrent users and you barely have enough RAM left for the OS on most machines where Jet is found (desktops).
 
I have come to the realisation that when I posted this thread I had barely a clue what I was talking about and ran around in circles and contradicted myself and...

I cannot remember which project this was related to but I'm sure it would have sorted itself out... well, almost sure... ;)

I'm very curious to see what queries you would use in place of SELECT DISTINCT! Say, for example, you have a parent table with a PK and however many fields. You then have a child table (I really prefer the term table to relation, so bear with me here) which has a whole bunch of fields and an FK to the parent. The problem here, though, is that the child table's fields are mostly numeric and the values are all very, very similar and duplicated all over the place. Say you're only interested in the unique combinations of these numeric values and you have two queries: one is to show the unique combinations as per the parent table PK and the other is to show the unique combinations overall without regard for the parent. How would you do this? Obviously SELECT DISTINCT would pretty much do this perfectly in both cases and it would be quick and painless, but how would you do it otherwise?

Now for some more fun stuff... say you have to append a MEMO description to the end of each record (I prefer record to tuple, so bear with me here... :3) and this memo field is more than 255 characters long, and somewhere in the field there is some aspect which changes accordingly - like an error code or something interesting that is there for whatever reason. So you now want all the unique combinations plus the related memo field, so you can see which combinations had which different errors or info. Obviously SELECT DISTINCT should theoretically work here as well, barring the 255-character cut-off, but how would you do it otherwise?

*scratches* How on earth you would manage this without some seriously intensive iteration of the results or lots of queries, I don't know! If the answer is really simple and straightforward I do believe I will pass out on the spot!! *indicates with foot* This spot right over here...

The question has been asked and the goal has been set...
 
DISTINCT is a shortcut for writing GROUP BY (list of columns selected)

My problem with it is that it appeals to laziness, a human constant of significant pervasiveness.

"Oh, my query is returning multiple rows that are identical. 2 choices: find the table I have joined in incorrectly without squashing the cartesian product, or.. write the word DISTINCT at the top"

Lazy developers prefer the latter...

So you end up with a barmy situation where a db might have one parent row and join 1000 child rows in, resulting in 1000 repeated values. Then the DISTINCT will cause it to create a hashtable, add all 1000 results into the hash (ergo killing duplicates), and then return only the one unique row.

If I asked you to fetch me spanners where the size was 13mm, but I actually had 17 of them in my toolbox, you would bring me all 17. I only need to undo one nut. What should I do? Tell you to compare all the sizes of the spanners with each other and return any duplicates to the toolbox (DISTINCT), or should I just have told you "bring me the first 13mm spanner you find"?


As to your situations, both can be solved without the use of DISTINCT (and I don't use memo fields, or Access for that matter), but the actual design and data content of the database prevents me from giving you a "one size fits all" answer to your question. As a guideline, though, a query like:

VB.NET:
SELECT
  Combo1,
  Combo2,
  ErrorCode,
  Count(*) AS Occurrences
FROM
  tblErrorCombinations
GROUP BY
  Combo1,
  Combo2,
  ErrorCode

You can now see which combinations occur with which error codes and at what frequency. A result like:
A,B,404,1

might not be as worth investigating as:
A,B,302,1987234
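And for your memo scenario, one way to dodge the truncation entirely - a sketch only, with the ID autonumber, the ErrorMemo memo column and the table name all assumed rather than taken from your actual schema - is to group on the small columns, keep one representative row per combination, and then pull the full memo back by primary key in an outer query that does no grouping or DISTINCT at all. (If your copy of Jet doesn't like the inline subquery, save it as a separate Access query and join to that instead.)

VB.NET:
' The inner query groups only on the small fields; the outer query fetches the
' untruncated memo for one representative (lowest-ID) row per combination.
Dim sql As String = _
    "SELECT t.Combo1, t.Combo2, t.ErrorMemo " & _
    "FROM tblErrorCombinations AS t INNER JOIN " & _
    "  (SELECT Combo1, Combo2, MIN(ID) AS FirstID " & _
    "   FROM tblErrorCombinations " & _
    "   GROUP BY Combo1, Combo2) AS g " & _
    "ON t.ID = g.FirstID"

Because nothing in the outer query aggregates, the memo comes back whole; the trade-off is that you get one memo per combination, which is exactly the ambiguity DISTINCT was hiding anyway.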
 