Question Linq join on parameterized distinct key

lordbaddkitty

New member
Joined
Sep 8, 2010
Messages
4
Programming Experience
1-3
I'm trying to LINQ two tables based on a dynamic key. User can change key via a combo box. Key may be money, string, double, int, etc. Currently I'm getting the data just fine, but without filtering out the doubles. I can filter the double in VB, but it's slooooow. I'd like to do it in the LINQ query right out of the gate.

Here's the data:

First Table:
VB.NET:
 -------------------------------------------------------------
| AppleIndex  | AppleCost  | AppleColor  | AppleDescription   |
 ------------------------------------------------------------
|     1       |     3      | Red         | This is an apple   |
|     2       |     5      | Green       | This is an apple   |
|     3       |     4      | Pink        | This is an apple   |
|     4       |     2      | Yellow      | This is an apple   |
|     5       |     2      | Orange      | This is an apple   |
|     1       |     3      | Red         | This is an uplicate|
|     2       |     5      | Green       | This is an uplicate|
|     3       |     4      | Pink        | This is an uplicate|
|     4       |     2      | Yellow      | This is an uplicate|
|     5       |     2      | Orange      | This is an uplicate|
 -------------------------------------------------------------

Second Table:
VB.NET:
 ------------------------------------------------------------
| OrangeIndex | OrangeCost | OrangeColor | OrangeDescription |
 ------------------------------------------------------------
|     1       |     1      | Orange      | This is an Orange |
|     2       |     3      | Orange      |                   |
|     3       |     2      | Orange      | This is an Orange |
|     4       |     3      | Orange      |                   |
|     5       |     2      | Orange      | This is an Orange |
 ------------------------------------------------------------

Currently, I'm using the following code to get too much data:

VB.NET:
Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
              On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows Distinct

Outcome:
VB.NET:
 -------------------------------------------------------------------------
| 1  | 3 | Red    | This is an apple     | 1 | Orange | This is an Orange |
| 1  | 3 | Red    | This is an duplicate | 1 | Orange | This is an Orange |
| 2  | 5 | Green  | This is an apple     | 3 | Orange |                   |
| 2  | 5 | Green  | This is an duplicate | 3 | Orange |                   |
| 3  | 4 | Pink   | This is an apple     | 2 | Orange | This is an Orange |
| 3  | 4 | Pink   | This is an duplicate | 2 | Orange | This is an Orange |
| 4  | 2 | Yellow | This is an apple     | 3 | Orange |                   |
| 4  | 2 | Yellow | This is an duplicate | 3 | Orange |                   |
| 5  | 2 | Orange | This is an apple     | 2 | Orange | This is an Orange |
| 5  | 2 | Orange | This is an duplicate | 2 | Orange | This is an Orange |
 -------------------------------------------------------------------------

What I want is this:

VB.NET:
 ------------------------------------------------------------------------
| 1 | 3 | Red    | This is an apple | 1 | 1 | Orange | This is an Orange |
| 2 | 5 | Green  | This is an apple | 2 | 3 | Orange |                   |
| 3 | 4 | Pink   | This is an apple | 3 | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 4 | 3 | Orange |                   |
| 5 | 2 | Orange | This is an apple | 5 | 2 | Orange | This is an Orange |
 ------------------------------------------------------------------------

WHAT I NEED TO DO:

  1. Select a list of unique items from the LinqMasterTable from the selected "ThePrimaryKey" column
  2. Join LinqMasterTable and LinqSecondTable with the User Selected ThePrimaryKey and TheForignKey "Keys"
  3. Select ONLY the Rows where LinqMasterTable Key Items are Unique (Distinct)

I have spent a ton of time on this and found nothing online that fits as a solution to this problem. Maybe I'm looking wrong, but I'd appreciate some ideas. Thanks.
 
Declarations and Such:
VB.NET:
    Private Sub LinqTwoTableInnerJoin(ByRef qMasterDS As DataSet, _
                                      ByRef qMasterTable As DataTable, _
                                      ByRef qSecondDS As DataSet, _
                                      ByRef qSecondTable As DataTable, _
                                      ByRef qPrimaryKey As String, _
                                      ByRef qForignKey As String, _
                                      ByVal qResultsName As String)

        Dim TheMasterTable As String = qMasterTable.TableName
        Dim TheSecondTable As String = qSecondTable.TableName
        Dim ThePrimaryKey As String = qPrimaryKey
        Dim TheForignKey As String = qForignKey
        Dim TheNewForignKey As String = ""

        MasterTableColumns = GetColumns(qMasterDS, TheMasterTable)
        SecondTableColumns = GetColumns(qSecondDS, TheSecondTable)

        Dim mColumnCount As Integer = MasterTableColumns.Count
        Dim sColumnCount As Integer = SecondTableColumns.Count

        Dim ColumnCount As Integer = mColumnCount + sColumnCount

        Dim LinqMasterTable = qMasterDS.Tables(TheMasterTable).AsEnumerable
        Dim LinqSecondTable = qSecondDS.Tables(TheSecondTable).AsEnumerable

Get the Data and order it by the Selected Key:
VB.NET:
Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
             On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
             Order By mRows(ThePrimaryKey) _
             Select mRows, sRows

Put the Results into a Dataset Table:
VB.NET:
        ' Make sure the dataset is available and/or cleared:
        If dsResults.Tables(qResultsName) Is Nothing Then dsResults.Tables.Add(qResultsName)
        dsResults.Tables(qResultsName).Clear() : dsResults.Tables(qResultsName).Columns.Clear()

        'Adds Master Table Column Names
        For x = 0 To MasterTableColumns.Count - 1
            dsResults.Tables(qResultsName).Columns.Add(MasterTableColumns(x))
        Next

        'Rename Second Table Names if Needed:
        For x = 0 To SecondTableColumns.Count - 1
            With dsResults.Tables(qResultsName)
                For y = 0 To .Columns.Count - 1
                    If SecondTableColumns(x) = .Columns(y).ColumnName Then
                        SecondTableColumns(x) = SecondTableColumns(x) & "_2"
                    End If
                Next
            End With
        Next

        'Make sure that the Forign Key is a Unique Value
        If ForignKey1 = PrimaryKey Then
            TheNewForignKey = ForignKey1 & "_2"
        Else
            TheNewForignKey = ForignKey1
        End If

        'Adds Second Table Column Names
        For x = 0 To SecondTableColumns.Count - 1 
            dsResults.Tables(qResultsName).Columns.Add(SecondTableColumns(x))
        Next

        PleaseWait(True) 'Locks controls while processing data

        'Copy Results into the Dataset:
        For Each Match In Matches

            'Build an array for each row:
            Dim NewRow(ColumnCount - 1) As Object

            'Add the mRow Items:
            For x = 0 To MasterTableColumns.Count - 1
                NewRow(x) = Match.mRows.Item(x)
            Next

            'Add the srow Items:
            For x = 0 To SecondTableColumns.Count - 1
                Dim y As Integer = x + (MasterTableColumns.Count)
                NewRow(y) = Match.sRows.Item(x)
            Next

            'Add the array to dsResults as a Row:
            dsResults.Tables(qResultsName).Rows.Add(NewRow)

        Next

Give the user an option to clean doubles or not:
VB.NET:
        If chkUnique.Checked = True Then
            ReMoveDuplicates(dsResults.Tables(qResultsName), ThePrimaryKey)
        End If

Remove the Duplicates if they so desire:
VB.NET:
    Private Sub ReMoveDuplicates(ByRef SkipTable As DataTable, _
                             ByRef TableKey As String)

        'Make sure that there's data to work with:
        If SkipTable Is Nothing Then Exit Sub
        If TableKey Is Nothing Then Exit Sub

        'Create an ArrayList of rows to delete:
        Dim DeleteRows As New ArrayList()

        'Fill the Array with Row Number of the items equal 
        'to the item above them:
        For x = 1 To SkipTable.Rows.Count - 1
            Dim RowOne As DataRow = SkipTable.Rows(x - 1)
            Dim RowTwo As DataRow = SkipTable.Rows(x)
            If RowTwo.Item(TableKey) = RowOne.Item(TableKey) Then
                DeleteRows.Add(x)
            End If
        Next

        'If there are no hits, exit this sub:
        If DeleteRows.Count < 1 Or DeleteRows Is Nothing Then
            Exit Sub
        End If

        'Otherwise, remove the rows based on the row count value:
        For x = 0 To DeleteRows.Count - 1

            'Start at the END and count backwards so the duplicate 
            'item's row count value doesn't change with each deleted row
            Dim KillRow As Integer = DeleteRows((DeleteRows.Count - 1) - x)

            'Delete the row:
            SkipTable.Rows(KillRow).Delete()

        Next


    End Sub

Then clean up any leftovers:
VB.NET:
        If Not chkRetainKeys.Checked = True Then 'Removes Forign Key
            dsResults.Tables(qResultsName).Columns.Remove(TheNewForignKey)
        End If

        'Clear Arrays
        MasterTableColumns.Clear()
        SecondTableColumns.Clear()

Final Analysis:
Ran this against 2 Files with 4 columns, 65,535 rows, and with some doubles. Process time, roughly 1 second. In fact it took longer to load the fields into memory than it did to parse the data.
 
Back
Top