Interesting SELECT query

InertiaM

Well-known member
Joined
Nov 3, 2007
Messages
663
Location
Kent, UK
Programming Experience
10+
VB.NET:
[FONT="Courier New"]
ID    WONUMBER   ITEMCODE
 1    8642       ABC1
 2    8642       TESTA
 3    8642       TESTB
 4    8642       TESTC
 5    8642       TESTD
 6    8666       TESTE
 7    8666       TESTF
 8    8666       TESTG
 9    8666       ABC2
10    8680       TESTH
11    8680       TESTJ
12    8680       DEF1
13    8680       TESTK
14    8681       DEF2
15    8682       ZYX9
[/FONT]

I need to run a query on the above records in SQL Server 2000, which will return the following

VB.NET:
[FONT="Courier New"]
ID    WONUMBER   ITEMCODE
 1    8642       ABC1
 9    8666       ABC2
12    8680       DEF1
14    8681       DEF2
15    8682       ZYX9
[/FONT]

because ABC1 is the first alphabetical (top) record of 8642, ABC2 is the first alphabetical (top) record etc.. This is only a small snapshot - there will be about 5000 records returned.

Any suggestions gratefully received :D
 
Last edited:
Care to elaborate a bit?

At first glance it appears you're just wanting an ORDER BY ITEMCODE clause in your select statement.
 
Sorry, maybe I should have added some more records to the example. Basically I want the "TOP 1 ITEMCODE FOR DISTINCT WONUMBER".

I used to manage it by running a query on all the records, and then programatically paging through to get the TOP 1 value for each WONUMBER. As the number of records has now increased dramatically, I wonder if it is possible to write a fairly complex query to do it.
 
you simply join to a pseudo table that has what you want. SOmething like this:


VB.NET:
SELECT t1.* FROM
 table1 t1
 INNER JOIN
 (
   SELECT woNumber, MIN(itemCode) as itemCode FROM table1 GROUP BY woNumber
 ) finder
 ON
  t1.wonumber = finder.woNumber AND t1.itemCode = finder.itemCode

group for every wonumer, find the MIN (I assume this is what you mean by alphabetically first) itemcode, and then use it as a filter via join back to the main table
 
Works fine, thank you :D
 
I see you've already got a valid answer to your question. The solution I arrived at was a subselect. Hopefully somebody can learn something from it.

VB.NET:
SELECT *
FROM TestData
WHERE ITEMCODE IN
	(SELECT MIN(ITEMCODE) As ItemCode
	FROM TestData
	GROUP BY WONUMBER)
 
The problem I have always found with Subselects is that they take ages to run. I'm not sure about this (someone please correct me if I'm wrong) but the subselect part of that has to run for every line. In other words, if there are 10,000 ITEMCODEs, it will run the subselect 10,000 times. Preselecting the records cjard's way into a pseudo table, it will calculate the pseudo table once only and then compare against that.

I do, however, use subselects for deleting because I feel the logic is safer to understand. I expect cjard is going to have a dig at me about this :p but

VB.NET:
DELETE FROM
  tblParent
WHERE
  tblParent.USRRowID >= 498677
AND
  tblParent.ITEMCODE NOT IN
(
  SELECT
    tblMaster.ITEMCODE
  FROM
    tblMaster
)

To me, it's blindingly obvious that you are only deleting records from tblParent, and you are NOT deleting anything from tblMaster.

I know there's a better way of doing this with LEFT JOINs - it's just that if I delete records from tblParent, I can recreate them. If I delete anything from tblMaster, the world would come to an end for me :D If someone wants to post an extremely safe way of doing the above, I'd be happy to change to it :)
 
I see you've already got a valid answer to your question. The solution I arrived at was a subselect. Hopefully somebody can learn something from it.

VB.NET:
SELECT *
FROM TestData
WHERE ITEMCODE IN
	(SELECT MIN(ITEMCODE) As ItemCode
	FROM TestData
	GROUP BY WONUMBER)

This isnt a solution I'm afraid. You have made the assumption that the ItemCode and ID sort in the same order but they dont..

If you pick the MIN ID from WONUMBER group 8680 you will get ID 10, but the requirement is ID 12, which is gained from the alphabetical sorting of the ItemCode, not the numerical sorting of the ID

Further to this, there s asimple rule to follow when using IN operators: Never ever ever use an IN operator with a list longer than your would be prepared to type by hand.

i.e. DONT do this:

SELECT * FROM table WHERE column IN (SELECT * FROM million_rows)

never run an IN with a million rows, the performance is horrendous. Always and only ever use IN for small queries.

-

Inertia's point about using IN for delete and sometimes update is valid though it should be balanced, performance wise, against an EXISTS query.
I could say prefer to avoid IN, but realistically, you have to look at the plans and the performances of individual queries... I've had IN queries run rapidly, i've had EXISTS queries take forever. Typically, IN is horrible to the database, but sometimes it works out.. Prefer to avoid it, but be open to using it if necessary.

It wouldnt have helped here though.. Because IN can only join on one field. Making a sub-list and joining to its multiple fields is a great, high performance option in most cases
 
reduce the IN list length:

VB.NET:
DELETE FROM
  tblParent
WHERE
  tblParent.USRRowID >= 498677
AND
  tblParent.ITEMCODE NOT IN
(
  SELECT
    ITEMCODE
  FROM
    tblMaster INNER JOIN tblParent USING (ItemCode)
)

Before, youselected all million rows from master. Now you only selec tthe 1000 that match. This may outperform the original

Also the linked query form:

VB.NET:
DELETE FROM
  tblParent p
WHERE
  tblParent.USRRowID >= 498677
AND
  EXISTS
(
  SELECT
    1
  FROM
    tblMaster m
  WHERE
    m.itemcode = p.itemcode
)

It depends how the db engine implements the keywords.. i find that oracle uses nested loops for In and hash tables for exists. THere are certain situations wher eone is preferable to the other
 
This isnt a solution I'm afraid. You have made the assumption that the ItemCode and ID sort in the same order but they dont..

If you pick the MIN ID from WONUMBER group 8680 you will get ID 10, but the requirement is ID 12, which is gained from the alphabetical sorting of the ItemCode, not the numerical sorting of the ID

I'm note sure what you mean here. I created a database with the records supplied and received the exact same result set that InertiaM supplied in his first post.
-----
Thank you for the advice on the usage of the IN operator though. I see now that 5000 results returned in an IN dataset could result in a much higher overhead than the JOIN statement you provided.
 
I'm note sure what you mean here. I created a database with the records supplied and received the exact same result set that InertiaM supplied in his first post.

I'm not sure how you can get the same result as the examples..

Look at group WONUMBER 8680:
10 8680 TESTH
11 8680 TESTJ
12 8680 DEF1
13 8680 TESTK

Your query will pull the MIN ID in that group, yes?
The min ID is 10

Look at inertiaM's example. He needs ID 12 from group WONUMBER 8680. SELECT MIN ID GROUP BY WONUMBER will not give ID 12, because 10 is less than 12 and MIN picks the smallest. The same for 8666 group.. InetriaM wants ID 9 but your query will pull ID 6

Allow me to put it another way. Sort group 8680 by ID:
10 8680 TESTH
11 8680 TESTJ
12 8680 DEF1
13 8680 TESTK

Sort by ItemCode:
12 8680 DEF1
10 8680 TESTH
11 8680 TESTJ
13 8680 TESTK

Can you see that picking the top one of each of these groups will realise different things? MIN ID does work out for some groups because coincidentally the ID and the itemcode sort in the same order, but it does not work for all groups

;)



Thank you for the advice on the usage of the IN operator though. I see now that 5000 results returned in an IN dataset could result in a much higher overhead than the JOIN statement you provided.
Conceptually the DB vendor could choose to implement JOIN anf IN the same, but because IN typically expands out to a sequence of ORs, it has its own "in-list iterator" rather than using some intelligent join
 
The query will pull out the MIN ITEMCODE not the MIN ID.

Here's the results I get:

VB.NET:
ID          WONUMBER                                ITEMCODE
----------- --------------------------------------- -----------------------
1           8642                                    ABC1
9           8666                                    ABC2
12          8680                                    DEF1
14          8681                                    DEF2
15          8682                                    ZYX9

(5 row(s) affected)

Once again thank you for your input, it has been enlightening.
 
Sorry.. i seem to have misread your SQL repeatedly; i read it as SELECT MIN(id) ... I have no idea why!
 
it has been enlightening.

Definitely - thank you both for your input. I've already changed two of my DELETE queries ;)
 

Latest posts

Back
Top