Dodgy SQl query execution

UncleRonin

Well-known member
Joined
Feb 28, 2006
Messages
230
Location
South Africa
Programming Experience
5-10
Okay, i find this to be particularly weird. I've seen this with MySQL but haven't tested it with any other DB servers.

If I execute this query I get a result of x for AH.
VB.NET:
SELECT SUM(A_Hours) AS AH, SUM(B_Total) AS BT FROM A, B, C WHERE A_C = C_Code AND B_C = C_Code AND C_Code = "Code"
If I execute this query I get a result of y for AH.
VB.NET:
SELECT SUM(A_Hours) AS AH FROM A, C WHERE A_C = C_Code AND C_Code = "Code"
If I execute this query I get a result of z for AH.
VB.NET:
SELECT SUM(A_Hours) AS AH FROM A, C WHERE A_C = "Code"
If I execute this query I get a result of v for AH.
VB.NET:
SELECT SUM(A_Hours) AS AH FROM A WHERE A_C = "Code"
What the hell! The last two queries SHOULD return the same result but they differ considerably. I mean, there is no actual difference in the queries. Even if I refer to the fields specifically with A.A_Hours, etc. there is no change in the differing results.

Can somebody explain to me what the problem is exactly? I've never encountered this sort of problem before and never would've if I hadn't checked the results against each other. Is this a problem with MySQL (4.*) or is it a problem with the actual queries themselves? This is really bizarre and because of it I'm gonna have to go and check every other query I've used - which is over 500!
 
I'm going to take a guess that z is less than v.... correct?

You have an inner join... which means that it's only going to deal with rows that have corresponding data in both tables... that alone probably reduces the number of rows to less than those with A_C="Code" by itself.... although.... you don't define the link between A & C .... in SQL Server at least, that would produce an error. I don't know if mySQL maybe does something intelligent in finding the links between tables. *shrug*

-tg
 
Okay, i find this to be particularly weird. I've seen this with MySQL but haven't tested it with any other DB servers.

.
VB.NET:
SELECT SUM(A_Hours) AS AH FROM A, C WHERE A_C = "Code"
If I execute this query I get a result of v for AH.
VB.NET:
SELECT SUM(A_Hours) AS AH FROM A WHERE A_C = "Code"
What the hell! The last two queries SHOULD return the same result but they differ considerably.

Of course they differ considerably. In the first one, you are putting an extra table, but not specifying any restrictions on its participation, thus the DB, not having any more instruction, performs what is called a cartesian join.

Here is what you have done, suppose your tables look like this:

Table A, a_hours, code
1, "CODE"
2, "CODE"
3, "CODE"

Table C, whatever column
W
X
Y
Z


Now you write:
SELECT SUM(hours) FROM A, C WHERE A.code = "CODE"


You join C in but say nothing about it, so the database does this cartesian product:

a_hours, whatever
1, W
1, X
1, Y
1, Z
2, W
2, X
2, Y
2, Z
3, W
3, X
3, Y
3, Z

Obviously, the sum of 1+1+1+1+2+2+2+2+3+3+3+3 = 40 is very different to 1+2+3 = 10

I mean, there is no actual difference in the queries.
THere's a huge difference. Its actually caused by the way you write them.. That style is very old syntax, and we try not to write SQL that way any more

Even if I refer to the fields specifically with A.A_Hours, etc. there is no change in the differing results.
Indeed not, because no matter what you name the fields, the cartesian product will still exist.

I think youll find that the two queries give different results such that the sum from query 1 (the big one) divided by the sum from query 2 (the small one0 will = the numbe rof rows in table C

Can somebody explain to me what the problem is exactly?

Yes, if you'd get into the habit of writing sql like this:

VB.NET:
SELECT
  whatever
FROM
  a
  INNER JOIN 
  c
  ON 
    a.id = c.id
THen youll never induce a cartesian accidentally by leaving a stray table name in a FROM without linking it. To perform a cartesian in this way of writing SQLs (modern ANSI standard) you must state CROSS JOIN which is very different to type than INNER JOIN

Get into the new way, and discover a whole new world of SQL, including left and right joins. Up to now, as TG says, inner joining everything causes results to drop out of the set as equivalences are lost..



I'm gonna have to go and check every other query I've used - which is over 500!
I dont envy you, though I'd recommend checking and you really should consider using ANSI sql, not that old way..
 
I learnt SQL using the INNER JOIN syntax but I dislike having to type excessive INNER JOINs when creating my queries. I basically use WHERE and do things the old way. It also makes it simpler to view the tables I'm referencing within a query since they are grouped together rather than being scattered all over the place.

I almost never use any complicated joins because I design my tables to be directly related and task specific. The only reason I'm even using more than 1 'INNER JOIN' at the moment is because I'm working with a ghastly DB that was designed by an engineer rather than a programmer - 100+ tables of badness. It is UGLY! In virtually all the tables the primary key is an autonumber! I'm too scared to even do a redundancy check because I know I'm gonna find some very bad results.

Thanks for all the info about the joins and that but I already know about it, I just didn't know that an unreferenced table defaults to a CROSS JOIN. Apparently, for that query - if that table is left unreferenced while creating a query in SQL Server a CROSS JOIN is automatically inserted.
 
I learnt SQL using the INNER JOIN syntax but I dislike having to type excessive INNER JOINs when creating my queries. I basically use WHERE and do things the old way.
Or you could look at it as a bug inducing hazard; its a subjective opinion, but having to press an additional 10 keys to type the words INNER JOIN isnt really that much hardship..
The point I'm trying to put forward, is one you have proved here yourself; You failed to notice that the two queries were significantly different, possibly because the essential difference was just 2 characters (",c").
This wouldnt have happened with new style, so why insistently stick to a style that induced a mistake in potentially hundreds of cases, for the sake of 10 bytes?
It's like peeling potatoes with a knife and shrugging that you cut yourself occasionally, when you own a potato peeler (impossible to cut one's self, wastes less potato, is faster) that you do not use because it's new-fangled and you prefer the old way..

It also makes it simpler to view the tables I'm referencing within a query since they are grouped together rather than being scattered all over the place.
When simple queries are written, necessarily all the tables appear in one place:
VB.NET:
SELECT
  *
FROM
  (all the tables appear here)
WHERE
There's nothing particularly scatered-all-over about this, and I use indentation to show the reader the difference between a top-level keyword and the data block upon which it is working. Granted, tables start to get scattered when queries start to become nested but at that moment you have to start becoming less precious about tables as being the only source of data you work with in a database; tables are just one among many, and every time you write the word SELECT, youre generating a block of data that can be used in conjuction with another block of data to be selected from again.. It's all just blocks of data, and the sematic that it's from a table becomes rather moot after a few properly complex queries..

Fact of life though, that in relational databases, you cannot solve everything with an inner join. There are some times when there really are no entries in the PhoneCalls table for a particular person because they havent made any calls.. If you want to show them with a 0 next to their name, you cannot do this with inner joins.

Interestingly, its at a state now in our company, where the underplebs who write routine SQLs as part of their job, use left outer joins exclusively - they dont even stop to think what join they need. Everything is left outer because more often than not they are linking two tables where not every row in L has a row in R. Rather than have to think about which joins should be inner and which should be outer, they exclusively use outer because an outer join can do the work of inner (unless you specifically want to knock out results, which is rarer) whereas the converse is not true. Drives me bonkers when I come to audit the SQLs.. ;)

I almost never use any complicated joins because I design my tables to be directly related and task specific.
Hmm.. Sounds more like you have a manager who understands how computers work and doesnt want the moon on a stick! Envy that!
For most slightly-complex reports though, table design cannot obviate the need for outer joins..
If your manager asked for "all accounts who did not spend last year" pleeease dont tell me you would do:
SELECT * FROM account_details WHERE account_number NOT IN (SELECT DISTINCT account_number FROM accounts_spend WHERE spend_date BETWEEN #01/01#2006 AND #31/12/2006#)

Apparently, for that query - if that table is left unreferenced while creating a query in SQL Server a CROSS JOIN is automatically inserted.
Cartesian product is the result of a CROSS JOIN. Cross join is modern ansi syntax, your query doesnt contain a cross join keyword and sql server doesnt amend your query text to contain one, but it does perform the same operation as would writing cross join..
It's not so much that SQL Server consciously thinks "oh, he hasnt referenced C in the where clause, I'll cartese it" - its just that conceptually in a SELECT ... FROM a,b,c WHERE ... query, all tables have their cartesian product worked out, then only those rows that pass the where clause criteria are shown..

Either way; do consider updating your programming style when writing SQLs.. This style standard has been around since 1992 so clinging on to an even older syntax in a world that shifts as rapidly as computing will outdate your thinking and professional appearance quite quickly.
I note too, that it is rare to find sites offering SQLs in old-style, because it is so proprietary. Nobody wants to write a tutorial with 4 different ways of writing a query in old-style when a ratified standard that all modern, serious RDBMSs adhere to, is established..

ANSI92; It's a Good Thing! (TM) :D
 
I've been trolling through all my odd posts and came across this one... I sound rather, err... how do you say... stupid! ;)

Thanks for all the information you provided here Cjard. I'm pretty sure it would've helped more than a few people learn a thing or two. As for myself, I still stick to WHERE style simply because the database I'm being forced to use (the same one mentioned earlier in this thread) doesn't support anything remotely interesting or complex (sub queries aren't even supported!!) and everything is pretty straight forward. Plus all my new projects I get involved with never require anything but a basic join between tables (the way I design databases always happens to fit the WHERE style more easily for some reason... the subconscious at work no doubt!)

I've always abhorred anything with the word 'JOIN' in it because I know quite a few people who over use it and turn what should be a nice and simple query into this nightmare with JOINs all over the place where one WHERE clause would do the same job. I dunno. I guess I just like keeping my eggs in one basket and my comparisons together... I am gonna make a point of forcing myself to go over to the other style though seeing as most modern methods and training force people to use it and most new kids on the block get all nervous and sweaty when they see something that isn't familiar to them :3

But ja. I honestly enjoyed reading this thread and hearing what you had to say. So shot for that!
 
I've always abhorred anything with the word 'JOIN' in it because I know quite a few people who over use it and turn what should be a nice and simple query into this nightmare with JOINs all over the place where one WHERE clause would do the same job.

The database will work in the same way regardless. You should note that arguing that JOIN is pointless or superfluous is a bit like arguing that the English language is overly wordy when hitting someone over the head with a club communicates perfectly well that youre displeased with them. JOIN is the standard, it allows for more logical grouping of queries, subqueries, views etc and the different types of join are like modified verbs in spoken language; they tell us more about what is going on. Before it came along each vendor had their own proprietary joining regulation and while some overlapped, some didnt.


I dunno. I guess I just like keeping my eggs in one basket and my comparisons together...
Yes, but being able to turn up to a job interview saying that you can start writing queries compliant with most databases is better than saying you only learned one proprietary syntax applicable to a dead database
Glad youre looking to move to new syntax. It makes a lot more sense, when you start to see it. remember to indent your SQLs properly and nicely and you'll always be able to work out what goes where
 
Back
Top