sql distinct troubles

Anti-Rich

Well-known member
Joined
Jul 1, 2006
Messages
325
Location
Perth, Australia
Programming Experience
1-3
hi everyone,

i have a problem at the moment, where i am trying to retrieve timesheet information (for the employees at my company) from the database. now i can get basic information, but i wanted (for comparitive purposes), to break it down to do it by month...

here is my query so far

VB.NET:
[SIZE=2][COLOR=#0000ff]select [/COLOR][/SIZE][SIZE=2][COLOR=#0000ff]Distinct [/COLOR][/SIZE][SIZE=2][COLOR=#ff00ff]Month[/COLOR][/SIZE][SIZE=2][COLOR=#808080]([/COLOR][/SIZE][SIZE=2]t[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]tsDate[/SIZE][SIZE=2][COLOR=#808080])[/COLOR][/SIZE][SIZE=2][COLOR=#0000ff]as [/COLOR][/SIZE][SIZE=2][COLOR=#ff0000]'Month'[/COLOR][/SIZE][SIZE=2][COLOR=#808080],[/COLOR][/SIZE][SIZE=2] t[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]EmployeeID[/SIZE][SIZE=2][COLOR=#808080],[/COLOR][/SIZE]
[SIZE=2][COLOR=#ff00ff]Sum[/COLOR][/SIZE][SIZE=2][COLOR=#808080]([/COLOR][/SIZE][SIZE=2]t[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]RegularHours[/SIZE][SIZE=2][COLOR=#808080]) [/COLOR][/SIZE][SIZE=2][COLOR=#0000ff]AS [/COLOR][/SIZE][SIZE=2][COLOR=#ff0000]'Total Regular Hours'[/COLOR][/SIZE][SIZE=2][COLOR=#808080],[/COLOR][/SIZE]
[SIZE=2][COLOR=#ff00ff]Sum[/COLOR][/SIZE][SIZE=2][COLOR=#808080]([/COLOR][/SIZE][SIZE=2]t[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]Overtime[/SIZE][SIZE=2][COLOR=#808080]) [/COLOR][/SIZE][SIZE=2][COLOR=#0000ff]as [/COLOR][/SIZE][SIZE=2][COLOR=#ff0000]'Total Overtime Hours'[/COLOR][/SIZE][SIZE=2][COLOR=#808080], [/COLOR][/SIZE][SIZE=2][COLOR=#ff00ff]sum[/COLOR][/SIZE][SIZE=2][COLOR=#808080]([/COLOR][/SIZE][SIZE=2]t[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]TotalPay[/SIZE][SIZE=2][COLOR=#808080]) [/COLOR][/SIZE][SIZE=2][COLOR=#0000ff]AS [/COLOR][/SIZE][SIZE=2][COLOR=#ff0000]'Total Paid'[/COLOR][/SIZE]
[SIZE=2][COLOR=#0000ff]FROM[/COLOR][/SIZE][SIZE=2] tblEmployeeTimeSheet t [/SIZE][SIZE=2][COLOR=#808080]INNER [/COLOR][/SIZE][SIZE=2][COLOR=#808080]JOIN[/COLOR][/SIZE][SIZE=2] tblEmployee e [/SIZE]
[SIZE=2][COLOR=#0000ff]ON[/COLOR][/SIZE][SIZE=2] t[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]EmployeeID [/SIZE][SIZE=2][COLOR=#808080]=[/COLOR][/SIZE][SIZE=2] e[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]EmployeeID [/SIZE][SIZE=2][COLOR=#0000ff]GROUP [/COLOR][/SIZE][SIZE=2][COLOR=#0000ff]BY[/COLOR][/SIZE][SIZE=2] e[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]EmployeeID[/SIZE][SIZE=2][COLOR=#808080],[/COLOR][/SIZE][SIZE=2] t[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]EmployeeID[/SIZE][SIZE=2][COLOR=#808080],[/COLOR][/SIZE][SIZE=2] t[/SIZE][SIZE=2][COLOR=#808080].[/COLOR][/SIZE][SIZE=2]tsDate[/SIZE]

when i run this query in the analyzer it returns incorrect results, in the way that if there are 5 entries from one person for january, it shows 5 rows, rather than 1 condensed row. what im trying to do is make it so that it returns a total hours worked, total overtime and total pay broken down month by month. i understand how distinct works (or at least i thought i did) but it doesnt seem to want to do what i want in this case. i dont understand why its returning x number of rows rather than just 1 per month.

if someone could offer any input id be greatly appreciative

cheers
adam
 
update!

ok, i have discovered why it is returning x number of rows rather than one... the problem is the group by statement, and the t.tsDate column. but i cant remove it from the query, as it spits out an error saying

Column 'tblEmployeeTimeSheet.tsDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

does anyone know how i can change the query so i get distinct Month values but without removing the group by clause?

cheers in advance
adam
 
You almost got it.... change the group BY to use Month(tsDate) rather than just tsDate....

-tg
 
bloody hell tg, your a lifesaver! cheeky little thing, that sql statement. i never knew you could group by with an aggregate function.

cheers mate, much appreciated
regards
adam
 
I do that kind of stuff all the time at the office....it's amazing what you can put into a Where or a Group By clause. But just to be technically correct, the Month() function isn't an aggregate.... if it were, it would have to go into a Having clause... Month() is just a function straight up.

-tg
 
really? ill definitely remember that next time im having multiple row trouble! :p

just out of curiosity, what makes an 'aggregate' function, 'aggregate'?

cheers mate

regards
adam
 
aggregates operate on groups of data, usually numerical, but they can be string as well.
Min
Max
Avg
Sum
Count
Basically anything that can have a "list" of things sent into it (which would be the rows in this case) and a single scalar value comes out, it's an aggregate. Any time you use an aggregate, if there are other non aggregate fields in the query, they must be grouped by those non-aggregate values.

If you search SQL books online for "aggregate" one of the results should be a list of all the aggregate functions.

-tg
 
Note that the DISTINCT in this query is moot.. Queries that use GROUP BY are inherently distinct in their groupings. If rows are being repeated it indicates that some grouping granularity is too fine (i.e. you were grouping by exact day, but then taking the month of the day; allowed, but not the group boundaries you wanted)

A tip is to avoid DISTINCT wherever possible; it can be replaced with a proper GROUP BY in 100% of situations, except for when it is used to modify an aggregate:

SELECT COUNT(DISTINCT employeeID), COUNT(employeeID) FROM tblEmployeeTimeSheet WHERE month = january

The distinct will give us the number of different employees in the timesheet table.. If we have 10 employees but only 3 of them have worked in january, count distinct would return us 3.

This can be very useful sometimes for calculating multiple types of aggregates in a single pass of the table. For example: "This website has have 5 million hits from 250 unique visitors this month"

SELECT COUNT(DISTINCT ip_address_hit), COUNT(*) FROM website_hits WHERE month = current_month
 
Back
Top