SQL Statement Dilemma

VBStarterKid

Member
Joined
Aug 1, 2007
Messages
13
Location
Malaysia
Programming Experience
Beginner
Hello there, I am stuck as to how to sequence an SQL statement to select all customer names from the Customer table and order the list by PurchaseDate. The structure of the tables are like the following:

The relation between the Customer table and Booking table is linked by a primary key CustomerID in Customer and foreign key CustomerID in Booking table. The relationship is a one-to-many, whereby one customer can make multiple purchases. In order to get all customer names, regardless of whether they have made a purchase or not, I'm joining both the tables with:

VB.NET:
SELECT Customer.CustomerID, Customer.Name, Booking.PurchaseDate 
FROM Customer 
LEFT OUTER JOIN Booking ON Customer.CustomerID = Booking.CustomerID 
ORDER BY Booking.PurchaseDate DESC

Currently my statement returns multiple customer names if he/she has records of multiple purchases in the Booking table. Note that the ORDER BY Booking.PurchaseDate is very important because I need to sort the list by giving priority to the customers who made more recent purchases to the top of the result list.

I tried to put a DISTINCT keyword to the Customer.ID and hoped the query would return each customer ID occurence only once but the returned result does not change from the above code:

VB.NET:
SELECT DISTINCT Customer.CustomerID, Customer.Name, Booking.PurchaseDate 
FROM Customer 
LEFT OUTER JOIN Booking ON Customer.CustomerID = Booking.CustomerID 
ORDER BY Booking.PurchaseDate DESC

Is there any way I can achieve what I wanted?? I'd appreciate help because it's quite needed for my project and if possible I'd want to select distinctive customer details at SQL query level and not VB.Net code level because of performance reasons. Thank you.
 
Hello there, I am stuck as to how to sequence an SQL statement to select all customer names from the Customer table and order the list by PurchaseDate. The structure of the tables are like the following:

The relation between the Customer table and Booking table is linked by a primary key CustomerID in Customer and foreign key CustomerID in Booking table. The relationship is a one-to-many, whereby one customer can make multiple purchases. In order to get all customer names, regardless of whether they have made a purchase or not, I'm joining both the tables with:

VB.NET:
SELECT Customer.CustomerID, Customer.Name, Booking.PurchaseDate 
FROM Customer 
LEFT OUTER JOIN Booking ON Customer.CustomerID = Booking.CustomerID 
ORDER BY Booking.PurchaseDate DESC

Currently my statement returns multiple customer names if he/she has records of multiple purchases in the Booking table. Note that the ORDER BY Booking.PurchaseDate is very important because I need to sort the list by giving priority to the customers who made more recent purchases to the top of the result list.

What did you expect? Or, asked another way:

Please show me how you would want the results written so that customer name only appears once. You yourself have said that the relaitonship is one to many, and when this "the element on the one side of the relationship is repeated X times where X is the number of records on the many side - i dont want that" complaint comes up, I do always wonder what is it they actually want the database do to?

So fred smith buys 3 tickets:
Name, TicketId
Fred, 1
Fred, 2
Fred, 3


My question to you is.. What would you have the database put there instead of Repeating Fred 3 times? What else is there that is logical to put there?

I tried to put a DISTINCT keyword to the Customer.ID and hoped the query would return each customer ID occurence only once but the returned result does not change from the above code:
But surely you see that those three rows above are distinct? One is Fred 1, one is Fred 2, and one is Fred 3.. They are all different.. If that concept doesnt quite make sense, tell me, are all these strings the same:
Fred1
Fred2
Fred3

No, they are all distinctly different. DISTINT applies to the whole row. Again, I wonder, you added DISTINCT to the customer name.. what were you hoping it would show? Tell me from the Fred example, what is the one row you would like to see?


Is there any way I can achieve what I wanted??
I'm not sure if you know what you want, but I get the feeling that it's going to involve an aggregate operation that youre not doing. Basically, by asking you all these questions, I'm trying to establisht he difference between how your brain thinks an how a SQL database works. Once I know, i cqan effectively reprogram your thinking..

I'd appreciate help because it's quite needed for my project and if possible I'd want to select distinctive customer details at SQL query level and not VB.Net code level because of performance reasons. Thank you.
But you already are selecting distinct records as far as the DBMS is concerned.. We'll see what your answers to my post bring, and then we will know where you need to change your logic processes to match dataset operations :)
 
I am pretty sure that VBStarterKid wants to achieve this:

- Tim (last purchasedate 13th Sept)
- Kris (last purchasedate 12th Sept)
- John (last purchasedate 12th Sept)
- Fred (no purchases)
- Roy (no purchases)

To order by purchasedate, he simply had to include it in the select statement.
But by doing that, he gets multiple names, and that's what he's now trying to fix.

So his question is how to order the client names by last purchasedate without showing the purchase date.
 
Thank you cjard for the reply. Yes, as far as the SQL logic goes, the returned rows from my query already are distinct from each other in the sense that at least one column is distinct in each row.

What I wanted is that I wanted a distinctive CustomerID column and nto just the whole row:

But surely you see that those three rows above are distinct? One is Fred 1, one is Fred 2, and one is Fred 3.. They are all different.. If that concept doesnt quite make sense, tell me, are all these strings the same:
Fred1
Fred2
Fred3

No, they are all distinctly different. DISTINT applies to the whole row. Again, I wonder, you added DISTINCT to the customer name.. what were you hoping it would show? Tell me from the Fred example, what is the one row you would like to see?

The above shows that Fred bought 3 tickets, but it shows Fred 3 times in the returned result. I don't need to know how many tickets that Fred bought. I just needed to know whether Fred bought ANY tickets and WHEN is the most recent date that he has bought a ticket.
Let's just say in customer table there is Albert, Fred, John and Mike.

Albert bought 2 tickets, Fred bought 3 tickets, John bought 4 tickets and Mike bought none yet. The returned rows are:

Albert1
Albert2
Fred1
Fred2
Fred3
John1
John2
John3
John4
MikeNULL

10 rows selected.

Let's assume ticket numbers represent months in numerical format.
With that said, Albert2, Fred3,John4 and MikeNULL are the only records im interested in, because they're the most recent purchase by the 3 customers Albert, Fred and John. I'm also interested in knowing that Mike has not bought any tickets because I'm going to sort all the customers by PurchaseDate descending into a listview when the user clicks on the Search Customer button. The listview gives priority to customers who have most recently purchased a ticket and put them at the top of the list whereas for those that has not yet purchased any tickets via the system (for example, Mike), they will be pushed to the bottom of the listview. It's not going to help if the system displays duplicate customer names with similar CustomerIDs in a Search Customer listview.

With all that said, I expect SQL Server to return this result set to me:

John4
Fred3
Albert2
MikeNULL

4 rows selected.

John bought a ticket in April, so he's the customer who most recently purchased a ticket based on the ORDER BY PurchaseDate DESC clause. Whereas the customer Mike will appear after everybody else, because of the NULL value.
 
CygNuS, you're right! That is what I'm trying to do, but if the database has 2 records of Kris, lets just say I follow-on to your example:

- Tim (last purchasedate 13th Sept)
- Kris (last purchasedate 12th Sept)
- John (last purchasedate 12th Sept)
- Fred (no purchases)
- Roy (no purchases)

To order by purchasedate, he simply had to include it in the select statement.
But by doing that, he gets multiple names, and that's what he's now trying to fix.

Let's just say Kris had another purchase record in 11th Sept. thus the resultset would be:

- Tim (last purchasedate 13th Sept)
* - Kris (last purchasedate 12th Sept) *
- John (last purchasedate 12th Sept)
* - Kris (last purchasedate 11th Sept) *
- Fred (no purchases)
- Roy (no purchases)


There! Duplicated Kris! Same customer there. I do not want that...how do I filter out Kris' 11th Sept record?
 
The above shows that Fred bought 3 tickets
Um, it actually shows that fred's id is present 3 times in the ticket table.. To say he bought 3 tickets is quite a human interpretation of the raw data.

I just needed to know whether Fred bought ANY tickets and WHEN is the most recent date that he has bought a ticket.
OK, so, for all the tickets fred has ever bought, you want to know the MAXimum date found, if the ticket sales are grouped up by Fred's ID?

Can you see how this thinking is very different to your current approach?

The listview gives priority to customers who have most recently purchased a ticket and put them at the top of the list whereas for those that has not yet purchased any tickets via the system (for example, Mike), they will be pushed to the bottom of the listview. It's not going to help if the system displays duplicate customer names with similar CustomerIDs in a Search Customer listview.
One of the problems is that youre focusing too much attention on the "most recent" being solvable by an order by.. it isnt, and because databases do ORDER BY last of any operation, you should look to include them last in your thinking..


John bought a ticket in April, so he's the customer who most recently purchased a ticket based on the ORDER BY PurchaseDate DESC clause. Whereas the customer Mike will appear after everybody else, because of the NULL value.

OK, well going back to the concept before, what your brain is trying to do is:
Sort all purchaes by date
Skip down the list looking at the names, only allowing the first occurrence of a name to get through

The problem here is that this requires you to look around the row that youre checking, and requires you to implement some kind of memory to remember whether you have seen a name before. Databases cannot do that, because they dont have a memory (we'll ignore analytics right now)

Instead your brain must approach the problem like this:

Break the results up into groups. All of Fred in one group, all of Mike in another
Clearly the grouping is based on the name. All the Fred rows must go into the Fred bucket etc.
Then we look for the highest (most recent) date in the bucket. This is a simple operation that requires no brain or memory.. you jsut get a ticket out, look at the date, get the next ticket out, and keep the one with the higher date, get the nex tticket out.. until you enpty the bucket

In short, working on grouped/related data is what databases do. Working on lists of data and looking at other entries in the list is what humans do


Your query would look like:

VB.NET:
SELECT
  name,
  MAX(purch_date)
FROM
  cust LEFT OUTER JOIN tickets ON cust.id = ticket.id
GROUP BY name

Its a cert that, when everything is grouped up, there will only be one distinct entry for each member in the group
All other entries in the row (not in the group) must have an aggregate function applied to them, like MIN, MAX, SUM, AVG ..

Hopefully this helps you alter your approach to problems of this nature?!
 
There! Duplicated Kris! Same customer there. I do not want that...how do I filter out Kris' 11th Sept record?

You dont filter it out; you ensure it is never included in the first place.. Filtering out is the human/list approach. Breaking into groups based on name and only including the one MAX is the database approach
 
Yeah! That's it! Thank you cjard, the MAX function is exactly what I needed. Now the list looks good.

And your explanation on the difference between my thinking and the actual database operation is clear and concise. I'll do well to remember to think in a computer's logic in my future programming endeavours.
 
Back
Top