What is the fastest way to write to a text file?

venablc
Member
Joined: Jul 18, 2006
Messages: 18
Programming Experience: Beginner
I am converting an old VB6 program into VB.NET for various reasons and have hit a problem...

Part of the program involves writing a lot of lines to a file (up to 500,000). The original program can get through 250,000 lines in just over 2 minutes; here is the source code used for this:

VB.NET:
Dim fname As String, rsnap As New ADODB.Recordset, FH As Integer
    Dim buff As String
 
If fname <> "" And Err <> 32755 Then
       FH = FreeFile
       Open fname For Output As #FH%
 
       rsnap.Open "select mobile, status, lastping, network from view_ping_batch where pID=" & pID & " order by status, mobile", CNN, adOpenForwardOnly, adLockReadOnly
       While Not rsnap.EOF
             buff$ = ""
             buff$ = RTrim(rsnap("Mobile")) & vbTab
             buff = buff$ & rsnap("Status") & vbTab
             buff$ = buff$ & Format$(rsnap("lastping"), "dd mmm yyyy ttttt") & vbTab
             buff$ = buff$ & RTrim(rsnap("Network"))
             Print #FH%, buff$
             rsnap.MoveNext
       Wend
       rsnap.Close
       Close #FH%
End If

In VB.NET, the equivalent code I am using looks like this:

VB.NET:
Dim fname As String, rSNAP As New ADODB.Recordset, pID As Long, fh As Integer
Dim buff As String
If fname <> "" And Err().Number <> 32755 Then
    Dim fs As New FileStream(fname, FileMode.Create, FileAccess.Write)
    Dim s As New StreamWriter(fs)
    s.BaseStream.Seek(0, SeekOrigin.End)
    rSNAP.Open("select mobile, status, lastping, network, rcode from view_ping_batch where pID=" & pID & " order by status, mobile", CNN, ADODB.CursorTypeEnum.adOpenForwardOnly, ADODB.LockTypeEnum.adLockReadOnly)
    While Not rSNAP.EOF
        buff = ""
        buff = RTrim(rSNAP("Mobile").Value) & vbTab
        buff = buff & rSNAP("Status").Value & vbTab
        buff = buff & Format(rSNAP("lastping").Value, "dd mmm yyyy ttttt") & vbTab
        buff = buff & RTrim(rSNAP("Network").Value)
        pBAR.Value = pBAR.Value + 1
        s.WriteLine(buff)
        rSNAP.MoveNext()
    End While
    s.Close()
    rSNAP.Close()
    rSNAP = Nothing
End If

This implementation, however, is very slow, taking almost an hour to write 250,000 lines instead of the original 2 or 3 minutes! The loop and everything else is just as fast; it is the WriteLine call that is comparatively very slow compared to the original Print function in VB6 (which, as far as I'm aware, is no longer present in .NET).

Can anyone suggest a faster way of writing to a file line by line?
 
My test with 500,000 WriteLines and similar string manipulation during the loop took only 2.6 seconds to finish and made a 36MB file. I don't know about ADODB.Recordset, but I can't find any particular complaining on the web about slowness. So the only conclusion I have is that FileStream and StreamWriter are very fast in .NET.
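The test was nothing fancy; roughly this kind of loop (the path, count and line content here are just placeholders, not real data):

VB.NET:
'rough sketch of that kind of timing test
'requires Imports System.IO and Imports System.Diagnostics at the top of the file
Dim timer As New Stopwatch()
timer.Start()

Using writer As New StreamWriter("C:\temp\timingtest.txt", False)
    For i As Integer = 1 To 500000
        'do some similar string work: trim, concatenate, tab-separate
        Dim line As String = "0771234567   ".TrimEnd() & vbTab & "OK" & vbTab & _
                             Now.ToString("dd MMM yyyy HH:mm:ss") & vbTab & "NetworkX"
        writer.WriteLine(line)
    Next
End Using

timer.Stop()
Console.WriteLine(timer.ElapsedMilliseconds & " ms")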
 
First off, get yourself out of bad VB6 habits like RTrim(string); it's now theString.TrimEnd().

Then look at StringBuilder and DataReader. StreamWriters can be opened in append mode, so there's no need to seek to the end.
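For example, something along these lines (the file name is just a placeholder):

VB.NET:
'requires Imports System.IO at the top of the file
Dim theString As String = "ABC123   "
Dim trimmed As String = theString.TrimEnd()   'the .NET replacement for VB6 RTrim()

'passing True opens the StreamWriter in append mode, so there is no need
'to Seek to the end of the stream yourself
Using writer As New StreamWriter("C:\temp\output.txt", True)
    writer.WriteLine(trimmed)
End Using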

To concur with johnH, I once forgot to put a command in to move a DataReader on to the next record, and in the 30 seconds it took me to realise while debugging, VB had written a 500MB file of several million records. It's not slow!

I do wonder if it's something else that is the slow part. Do yourself a big favour and rewrite that entire code sample in proper VB.NET.
 
I have found what is causing the slowness. It is not actually the writing to the text file, it is the line
VB.NET:
rsnap.MoveNext()
In VB6, the line
VB.NET:
rsnap.Open "select mobile, status, lastping, network from view_ping_batch where pID=" & pID & " order by status, mobile", CNN, adOpenForwardOnly, adLockReadOnly
downloads the dataset from our server, which I assume is stored in a local cache, and
VB.NET:
rsnap.MoveNext()
reads from this locally, however...
In VB.NET, the exact same code opens the recordset, but
VB.NET:
rsnap.MoveNext()
reads the next record from its source, the server, making it incredibly slow when reading 500,000 records!

Does anyone know how to prevent this and make the recordset work in the same way as it did in VB6?
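One thing that might be worth a try (I can't promise it behaves the same against your server) is asking ADO for a client-side cursor before opening the recordset, so the whole result set is pulled down in one go and MoveNext then reads from the local copy:

VB.NET:
'sketch only - assumes the same ADODB interop reference, CNN connection and pID as in your code
Dim rsnap As New ADODB.Recordset

'ask for a client-side cursor BEFORE calling Open; ADO then fetches the whole
'result set up front and MoveNext works against the local copy
rsnap.CursorLocation = ADODB.CursorLocationEnum.adUseClient

rsnap.Open("select mobile, status, lastping, network from view_ping_batch where pID=" & pID & " order by status, mobile", _
           CNN, ADODB.CursorTypeEnum.adOpenStatic, ADODB.LockTypeEnum.adLockReadOnly)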
 
You did explicitly state that you are using an MS Access database, didn't you? Have a look at the MS Access forum http://www.vbdotnetforums.com/forumdisplay.php?f=54 There might be something in one of our other database forums too; Database General and Data Access are both well used. Other than that, looking into the System.Data.OleDb namespace should get you right on track; this is one of the subsets of ADO.NET in the .NET Framework. A search engine and the keywords you now know should turn up plenty of online tutorials.
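For example, if it were an Access .mdb, reading it through the OleDb provider looks roughly like this (connection string, table and file names are just placeholders):

VB.NET:
'sketch only - requires Imports System.Data.OleDb and Imports System.IO at the top of the file
Using con As New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\temp\data.mdb")
    con.Open()

    Dim cmd As New OleDbCommand("SELECT mobile, status FROM view_ping_batch", con)

    Using rdr As OleDbDataReader = cmd.ExecuteReader()
        Using writer As New StreamWriter("C:\temp\output.txt", False)
            While rdr.Read()
                writer.WriteLine(rdr.GetString(0).TrimEnd() & vbTab & rdr.GetString(1).TrimEnd())
            End While
        End Using
    End Using
End Using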
 
No, I didn't say I was using MS Access; I'm using a remote SQL database. Thanks for the tips on other places to look, though...
 
Is there the possibility that the reader is reconnecting to the database every time a read occurs?

I just wonder, because I've got an app here that uses ADO.NET and readers for the file-writing stuff (DataTables for the random-access stuff) and it really does fly; hundreds of megabytes a minute. I'll have a bash at knocking up some "proper" .NET code for you...
 
Here is some code I wrote that does a similar thing to what you require. No error checking; add that yourself.

VB.NET:
        'requires Imports System.Data.SqlClient, System.IO and System.Text at the top of the file
        'connection
        Dim con As New SqlConnection("Data Source=MATT\SQLEXPRESS;Initial Catalog=tempdb;Integrated Security=True")
        con.Open()
 
        'command linked to connection
        Dim cmd As New SqlCommand("SELECT * FROM tbl1", con)
 
        'data reader
        Dim rdr As SqlDataReader = cmd.ExecuteReader()
        
 
        'open new stream in true=append mode
        Dim sw As New StreamWriter("C:\temp\file.txt", True)
 
        'make sb of 100 char capacity
        Dim sb As New StringBuilder(100)
 
        While rdr.Read
            'get and trim the string, put in sb
            sb.Append(rdr.GetString(0).TrimStart())
            'put a comma in sb
            sb.Append(",")
            'get and format the decimal value, put in sb
            sb.AppendFormat("{0:0.00}", rdr.GetSqlDecimal(1).Value)
 
            'convert sb to string and write to file
            sw.WriteLine(sb.ToString())
 
            'clear sb
            sb.Remove(0, sb.Length)
        End While
 
        sw.Flush()
        sw.Close()
 
        rdr.Close()
        con.Close()

I've made my sample reasonably close to yours, showing you how to use a reader, get a string, use a StringBuilder, and format a numeric type.

I was using a table with 2 columns, a varchar(50) and a numeric(18,0), filled simply with a-z in the varchar and 0-9 repeating in the numeric. Here is the file it produced in approx 0.1 seconds:

a,1.00
b,2.00
c,3.00
d,4.00
e,5.00
f,6.00
g,7.00
h,8.00
i,9.00
j,0.00
k,1.00
l,2.00
m,3.00
n,4.00
o,5.00
p,6.00
q,7.00
r,8.00
s,9.00
t,0.00
u,1.00
v,2.00
w,3.00
x,4.00
y,5.00
z,6.00
 
Thank you, cjard,

I got it working very fast now, 30,000 records in a matter of seconds, although it does make a request to the server for each separate record. Is there any way around this, to lower the load on our servers?

cheers
 
I didn't time it like that; 0.1 seconds is the rounded-to-1dp time for the entire procedure: the database TCP connection setup, login, auth, parse query, compile query, access tables, pipe data to client, write file, flush and close file.

I wouldn't bother extrapolating it out that 260,000 records is going to take 10,000 times as long; just run it and see.

(I don't have quarter-of-a-million-row tables lying around to make use of in timing a bit of code for you, sorry.)
 
venablc said:
I got it working very fast now, 30,000 records in a matter of seconds, although it does make a request to the server for each separate record

Can you qualify this? Are you simply looking at the While rdr.Read line and reasoning that it is performing a database call? Are you imagining a problem that may not exist?

If you look at pretty much all the file I/O that is done in any framework, not just .NET, there is some element of sequential repeat calling. In the database case, the reader is plumbed to a TCP socket and data flows from the database, though exactly how the flow is controlled is largely beyond your control. If you were to fill an ADODB Recordset, the data would be read in exactly the same way, and you can be sure that there is little performance penalty in the way most modern I/O subsystems work.

I guess, ultimately, my question would be: why do you think this method has cause for complaint? It's exactly the same way a Recordset would work anyway.
 
I used a packet sniffer, and it appears that packets are being sent and received throughout the download with the .NET version but not with the original VB6 version.

I will get the server admin to monitor the traffic server-side to see whether the application is indeed making repeated requests or not.
 
I read around a lot, and most of the stuff I find on the web is confusing and contradictory:

Client-side cursors are faster than server-side cursors because they don't hold resources on the server and allow the server to dump all the data to the client without a care.
ADO.NET lets you choose between server-side and client-side cursors.
ADO.NET doesn't do client-side cursors.
ADO.NET DataSet uses client-side cursors.
ADO.NET DataSet uses a DataReader to do its work.
ADO.NET DataReader always establishes a server-side cursor.
ADO.NET SqlClient data provider never uses a server cursor to access your data.
ADO.NET DataReader object allows you to perform a single pass through a set of records as efficiently as possible.

I was eventually able to find an MSDN page that said the most efficient way to access data in a forward-only fashion, such as we have here, is to use a DataReader, as we have done here. It's possible that you were using your Recordset in ADO in a client-side fashion: all the data is dumped into it and there is no more communication with the server. In contrast, the DataReader uses server-side cursoring, and I can think of a few situations where this is beneficial when fast-forwarding through a resultset.
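For completeness, if you ever do want the old client-side behaviour back (everything pulled down first, then no more talking to the server), a rough ADO.NET equivalent is to load the result into a DataTable and loop over that instead of the reader. A sketch only, using the same throwaway table and connection string as my earlier sample:

VB.NET:
'requires Imports System.Data and Imports System.Data.SqlClient at the top of the file
Dim table As New DataTable()

Using con As New SqlConnection("Data Source=MATT\SQLEXPRESS;Initial Catalog=tempdb;Integrated Security=True")
    con.Open()
    Dim cmd As New SqlCommand("SELECT * FROM tbl1", con)

    Using rdr As SqlDataReader = cmd.ExecuteReader()
        'every row crosses the wire here; after this the connection can close
        table.Load(rdr)
    End Using
End Using

'from this point on everything is local - no further server round trips
For Each row As DataRow In table.Rows
    'work with row(0), row(1) etc. and write out each line as before
Next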

In the end, we've got some code that:

a) Microsoft say is the fastest way to do it
b) works at the same speed as the original
c) is more controllable and quicker to cancel (I believe) if needed

You may consider that you have reached a point where further investigation into a subject, in order to gain a single-digit % increase in speed, is uneconomical in terms of research time.
 