Question Sql Select from group by

DekaFlash

Well-known member
Joined
Feb 14, 2006
Messages
117
Programming Experience
1-3
I have an application into which data on horse races is entered, I then click calculate to work out and save the top 3 horses into a seperate database.

I am trying to find the highest value within any column in consecutive groups of 3 rows in a datagrid.

The 3 rows are grouped using a flag in the last column on the 3rd row.

However the following code results in incorrect syntax near = exception.

I hope you can assist, thanks.

VB.NET:
Dim sql As String = "SELECT max(Jockey) FROM VsrData group by EndOfRace = Y "
        Try
            
            racetypeda = New SqlDataAdapter(sql, conn)

            racetypeda.SelectCommand = New SqlCommand(sql, conn)
            
            builder = New SqlCommandBuilder(racetypeda)

            conn.Open()

            racetypeds.Clear()

            racetypeda.Fill(racetypeds, "VsrData")

            VsrDataDataGridView.DataSource = racetypeds.Tables("VsrData")

        Catch ex As Exception
            MsgBox(ex.ToString.Trim)
        End Try
        conn.Close()
 

Stonkie

Well-known member
Joined
Sep 12, 2007
Messages
279
Programming Experience
1-3
And how would you order these values?

B - FDB
2 - 2
3 - 3
4 - 4
5 - 5

Or would you reject them all because none of the rows has the best in both column at the same time? And do we still need the 3 best? Or is 3 only a maximum and we can have no rows at all that actually have both the highest B and lowest FDB at the same time?
 

DekaFlash

Well-known member
Joined
Feb 14, 2006
Messages
117
Programming Experience
1-3
Actually its B the highest and FDB the lowest, so the first example (5-2) would indicate a selection and the others would be disregarded.

B - FDB
5 - 2
4 - 3
3 - 4
2 - 5

The majority of the time we will always produce the 3 best, the only exception is on the rare occasion where there are only 2 horses in that race.

There are instances where you can have joint b or joint fdb values or instances where one of the conditions is met but not the other.

There are other conditions to be met in regards to the selection criteria but identifying B and FDB is the first condition we need to look for.

I think to give you a better idea, I'll upload a copy of the analysis file.

Analysis File - 01/11/08

I'll highlight rows matching selection criteria 1 in yellow and those matching selection criteria 2 in blue.
 

Stonkie

Well-known member
Joined
Sep 12, 2007
Messages
279
Programming Experience
1-3
Alright... So, doesn't the last query do that?
 

DekaFlash

Well-known member
Joined
Feb 14, 2006
Messages
117
Programming Experience
1-3
Alright... So, doesn't the last query do that?

I believe the last query example you provided me with was

SELECT TOP(3) * FROM VsrData ORDER BY (B - FDB) DESC

I modified your code as follows - Dim sql As String = "SELECT * FROM VsrData where bolds > 1 and arrows > 1 ORDER BY (b-fdb) desc, Jockey DESC, TRAINER desc"

When I run that as a query it highlights, SHE as a selection even though it doesn't have the highest B.

 

Stonkie

Well-known member
Joined
Sep 12, 2007
Messages
279
Programming Experience
1-3
From that last example, let's compare LIT and MAM. LIT has the highest B and MAM has the lowest FDB. Which one should come first?
 

Stonkie

Well-known member
Joined
Sep 12, 2007
Messages
279
Programming Experience
1-3
How can you get stable results then? I mean, if the highest B does not have the lowest FDB, no other record can have the highest B and no record can be a valid selection. And since B and FDB are not strictly inversely correlated, you will end up with no selection a good part of the time. And I mean that from a mathematical point of view, not from the code. It is not possible to get any selection from the table you showed in the last post.

Is that the expected behavior? If so, you can try this :

VB.NET:
SELECT * FROM VrsData
WHERE (B IN (SELECT MAX(B)  FROM VrsData)) AND 
(FDB IN (SELECT MIN(FDB) FROM VrsData))

Now, how many records does your VrsData table contain? Does it only contain the data for the currently processing race? Otherwise you will have to add filters to that race but I don't know which column your foreign key is.

By the way, this will give no result most of the time. It's not an anomaly, it simply doesn't return any record that does not fit both having the highest B and the lowest FDB at the same time and there isn't necessarily one all the time. This will not return more than one result either unless you have ties for both the B and FDB.
 

DekaFlash

Well-known member
Joined
Feb 14, 2006
Messages
117
Programming Experience
1-3
Yes that is the expected behaviour, as a result the number of times the selection criteria is met for the daily analysis can vary and on some occasions there may not even be any selections.

The reason I highlighted that table was because your previous code identified a wrong selection.

At the moment the VsrData table contains 6597 records but this is only for March 08 as i'm still in the process of importing data over from the old system.

I have put in filters based on date, time and venue.

I'll test your code and let you know the outcome thanks.
 

DekaFlash

Well-known member
Joined
Feb 14, 2006
Messages
117
Programming Experience
1-3
Both SELECT * FROM VsrData WHERE (B IN (SELECT MAX(B) FROM VsrData)) AND (FDB IN (SELECT MIN(FDB) FROM VsrData)) and bolds > 1 and arrows > 1 and SELECT * FROM VsrData WHERE (B IN (SELECT MAX(B) FROM VsrData)) AND (FDB IN (SELECT MIN(FDB) FROM VsrData)) and bolds > 1 and arrows > 1 ORDER BY Jockey DESC, TRAINER desc return no selections.
 

Stonkie

Well-known member
Joined
Sep 12, 2007
Messages
279
Programming Experience
1-3
Of course, we're comparing the values for the event to the minimum and maximum throughout the whole table. We must set the same filter to both inner selects. Here :

VB.NET:
SELECT * FROM VsrData WHERE (B IN (SELECT MAX(B) FROM VsrData WHERE bolds > 1 and arrows > 1)) AND (FDB IN (SELECT MIN(FDB) FROM VsrData WHERE bolds > 1 and arrows > 1)) and bolds > 1 and arrows > 1

Now I don't really understand what "bolds" and "arrows" mean and why you use "> 1". Do you use the database to determine what is selected in the user interface? If you don't specify "race = @race", this will take the matching selection for all of the records in the table and there are good chances it won't return anything.

By the way, this is not the most efficient way of doing it. It should be better to join the table on self, but I always come up with the inner select solutions first. It seems to make more sense to me. If performance is a problem, I can probably figure out at way to boost that up a bit.
 

DekaFlash

Well-known member
Joined
Feb 14, 2006
Messages
117
Programming Experience
1-3
Bolds and Arrows is another part of the criteria - each selection must have at least 1 bold and 1 arrow hence the > 1 but as this is easier to implement I've only just mentioned it.

I'll look at adding race = @race as you suggested, at the moment however from the main screen they select if they want to look at data from a specific date or specific race and this then sets up parameters.

When the analysis window is opened, these parameters help load the correct dataset from the main form into the datagrid on the analysis form.

I'm new to sql so may require help on how to join, if once i get this working i
notice a performance issue we can see if it can be boosted.

Using your code SELECT * FROM VsrData WHERE (B IN (SELECT MAX(B) FROM VsrData WHERE bolds > 1 and arrows > 1)) AND (FDB IN (SELECT MIN(FDB) FROM VsrData WHERE bolds > 1 and arrows > 1)) and bolds > 1 and arrows > 1 still returns a blank grid.
 
Last edited:

Stonkie

Well-known member
Joined
Sep 12, 2007
Messages
279
Programming Experience
1-3
When you add filtering for the event, it will return results. Right now you are considering the whole lot of race results as a single race with thousands of contestant and none of them have both the best B and FDB to be returned. You must limit this to a certain event for the result to make sense.
 

cjard

Well-known member
Joined
Apr 25, 2006
Messages
7,052
Programming Experience
10+
Deka

To add a bit of generic info to this discussion and hopefully aid your understanding of how databases work and how they complete queries, I'd like you to consider the following:


When working with data in databases we are always thinking in terms of a set of data. A SELECT * FROM x will produce a set of all X. SELECT * FROM x WHERE ... will produce a subset of x. When you go on to add grouping the database will reduce the set to a subset, and then start any counting or summing you have told it to do in the group. Any columns that are not counted, summed or have some other aggregate function performed on them must be declared to be group members, and each unique value they contain contributes to the grouping.

Because everything in a database realises a set, a set output from one part of a query can be used as input to another part of the query. SELECT * FROM (SELECT * FROM x WHERE ...) WHERE ...
is a perfectly valid query and the database will first create a subset and output the results, then take that subset and apply another filter, further reducing the size, then output the results again [actually, for the purists, it's sometimes a fib; the database will rewrite your query and in this case it would be perfectly valid to push the two where clauses together because they are effectively an AND.. but we'll leave that truth alone for now].

To effectively have a database answer your problem you must boil it down into sets and subsets, and then understand how sets of data come together.
JOIN connects sets of data horizontally. Some columns must be equal to allow the database to assert that the rows should be linked.
A JOIN B ->
AB

UNION connects sets of data vertically, and the number of columns must be equal
A UNION B ->
A
B


When you have GROUP prepare a set with aggregated results, the detail is irretrievably lost. There are ways of having the database perform aggregate operations alongside the main operation and putting the values into the detailed output, but these are analytical queries and I'll leave them alone. A WHERE clause is applied before a GROUP, and a HAVING clause is applied after.
SELECT a, count(b) as c_b FROM x WHERE y > 100 GROUP BY a HAVING c_b > 20
The above query finds everything in x having a y greater than 100, then creates a group of A and how many times each A value occurs, but only returns values of A where they occur more than 20 times (in the subset)


Have a think of how to state your problem in terms of sets..

In your problem
 
Top Bottom