Refreshing a sqldatareader in SQL Server 2000

pha_roah

Member
Joined
Jun 13, 2007
Messages
11
Programming Experience
3-5
I am writing a small application that traverses a website log file (already imported in SQL server 2000) and populates a field called "country_" based on the ip and another lookup table that matches ip long numbers with countries.

What I want to do is run a sql query giving me a set of records where country_ is not populated, then load the the first record, run an update statement for the ip address for this and all other records with the same ip address.
It works and updates, except that it goes through the records even if they have been updated simply because the datareader is not refreshed, even if I close it and reload it again.

I previously used a data control and updated record by record which worked, but was slow (I update about 300-600K records per run), so I am trying to see if I run the sql command directly if it would make a difference in performance.

Anyone know what I need to do to ensure I refresh the set after each loop?

Help is appreciated.

The code looks as follows (I have removed the regular declarations and parts that don't apply to the database)
------------------------------------
Dim myConnection As SqlConnection
Dim Countries_conn As SqlConnection
Dim myCommand As SqlCommand

Dim Countries As SqlCommand
Dim dr As SqlDataReader

goahead = True



myConnection = New SqlConnection(ConnectionStrText.Text)
myConnection.Open()
myCommand = New SqlCommand("Select * from staging where country_ IS Null or Country_ = 'Not Found'", myConnection)


Countries_conn = New SqlConnection(ConnectionStrText.Text)
Countries_conn.Open()

Do While goahead
dr = myCommand.ExecuteReader

If dr.HasRows Then
dr.Read()
ipindr = dr("IP")
numip = Dot2LongIP(Trim(ipindr))
Countries = New SqlCommand("UPDATE staging SET Country_ = (SELECT countryLong FROM IPCountry where " & numip & " BETWEEN FROM_Long AND To_Long) WHERE staging.IP = '" & ipindr & "'", Countries_conn)
ra = Countries.ExecuteNonQuery()

Countries_conn.Close()
Else
goahead = False
End If
myConnection.Close()
Loop
-------------------------------------------

Dot2LongIP is a function that converts ip address from Dot format to Long format so I can compare it to my ip country database.
Staging is the table where I want to update country_
IPcountry is the lookup table
 
Done.
The zip file is in the same location, same name, so just click on the link above again. The file is slightly larger this time (300MB:D) because I don't have access to the original database any more (records have been moved from staging to weblog) so use the weblog table for the query which has almost 4 million records (sorry but I don't have access to the tools I need to shrink it for you for a few days)

thanks cjard.
 
Wow.. a 3.5 gig database.. Took 15 minutes just to get it extracted..

My next question is.. there are ~3000 rows in staging that cannot be assigned a country (i.e. all of them) because their IPLong doesnt fall into range of ANY record in ipcountry

e.g. the first one i picked on:

select * from ipcountry where from_long <= 2070727755 and to_long > 2070727755

there is just nothing there.. .what do you want to update those to?
 
Yes, there are about 3000 records with no matching longip. I left these as empty or null, that way, next time, when I update the ipcountry table, and run this query again, they may match something in the updated list.
Now you see why I didn't want to send you the whole thing to begin with.

What you can do is clear the weblog table of most of the records (leave a few hundred thousand) and use that instead of staging.
 
ok.. but if i'm writing a query to target staging, I cant use weblog because the schema is different, and specifically i'm joining on a field in Staging that doesnt exist (longip) in weblog..

I presume I'll have to move some number of records into staging and calc the longip as I go..
 
Now I am making you work.
Let me send you another DB with staging already populated. I have everything setup to do that easily, but won't be able to do it till Saturday.
I promise it won't be 3GB this time.
 
I loaded half a million rows into staging, from weblog using the following:

VB.NET:
insert into staging
select
  classa + '.' + classb + '.' + classc + '.' + classd as ip, 
  Date_,
  File_,
  Result_,
  Hour_,
  Country_,
  sizekb,
  cast(classa as bigint) * 256 * 256 * 256 +
  cast(classb as bigint) * 256 * 256 +
  cast(classc as bigint) * 256 +
  cast(classd as bigint) longip
from
(
  select 
    Date_,
    File_,
    Result_,
    Hour_,
    Country_, 
    Sizekb,
    classa,
    classb,
    substring(ip, 1, charindex('.', ip)-1) as classc,
    substring(ip, charindex('.', ip)+1, len(ip) - charindex('.', ip)+1 )  as classd
  from
  (
    select 
      Date_,
      File_,
      Result_,
      Hour_,
      Country_, 
      Sizekb,
      classa,
      substring(ip, 1, charindex('.', ip)-1) as classb,
      substring(ip, charindex('.', ip)+1, len(ip) - charindex('.', ip)+1 )  as ip


    from(
      SELECT 
        top 500000 
        Date_,
        File_,
        Result_,
        Hour_,
        null as Country_, 
        Sizekb,
        substring(ip, 1, charindex('.', ip)-1) as classa,
        substring(ip, charindex('.', ip)+1, len(ip) - charindex('.', ip)+1 )  as ip 

      FROM weblog w
    ) w
  ) w
) w

And set the following query going:

VB.NET:
UPDATE s
set s.country_ = i.country_  
  
FROM
  weblog.dbo.staging s
  inner join
  ipcountry i
  on
    s.longip >=  i.from_long and 
    s.longip < i.to_long

WHERE
  s.country_ is null

And it was still going after half an hour on a 1.8GHz centrino laptop (with the database on an external usb disk)

I did some more testing, and up to about 11000 rows, the execution plan favourably uses the index (I changed it for a clustered one on ipcountry, figuring that if the country data is stored in the leaf nodes it will be faster than index-to-rowid translation and lookup) on staging and the query completes in less than a second. This compares favourably with your original query (UPDATE staging SET country = (SELECT .. ) ) which took 19 seconds to do 11,000 rows,

Beyond 11000 rows, the execution plan changes to include too massive a dataset from the index seek; i dont know why, but it only considers one of the predicates in the range, and pulls all rows matching just s.longip < i.to_long
.. i dont know where the starting range predicate has gone, but it's a nuisance. I suspect that, because we cannot reassure sqlserver that there will only be one row arising from the "between from_long and to_long" predicate, it doesnt treat it like a primary key. The only solution I can relly think of to this is to manipulate the ipcountry table to include a row for every relevant class of IP addresses we have. i.e. it is unlikely that classes will be split and assigned to differnt countries, so if we can say for sure that:

12.34.56.x is USA
12.34.57.x is China
12.34.58.x is Germany

etc, and fill in a row for every known country, then the query can become based on an equality rather than a range:

VB.NET:
ON
  (s.ip / 256) = i.longip_classC

Alternately, you could simply run the update process every 10 thousand rows inserted. How you would count that is up to you.

I'm going to look up how to hint in SQLServer (but it might not be today), or use a pre-arranged plans on any query.. We might get this cracked in a few seconds yet.. Or we may not! :)
 
Last edited:
Thanks!

Thanks cjard for your help.
I think I can accept the current 30 minutes based on the fact that I only do this once every few months anyway. It's not worth the effort to try and make it any faster, especially that it looks like a challenge.
We can consider this case closed.
 
Back
Top