Can you help with my string parsing problem?

lickuid (Active member; joined Nov 6, 2006; 37 messages; programming experience: Beginner)
Hi all,
I have a program that is supposed to parse HTML. I can run the For loop for 100 iterations or so, but the strings are tens of thousands of characters long, usually around 21 thousand. My application freezes when I call the Public Sub parser_parse().
My main question: does VB.NET 2005 have memory issues when it processes a large string in one pass? Also, I tried moving the parse into a timer, but the increment variable never incremented.

Notes:
My form name is Root
I took out some of the unrelated code and stripped the parser of most of its functions, but it still illustrates my issue.
HTML_txt contains the HTML text.
stats is an output window that shows the resulting parsed text.

Here's the code:
Public Class Custom_Parser
    Public i As Long
    Public totalChars As Long
    Public currChar As String
    Public nextChar As String
    Public outString As String
    Public currTag As String
    Public buildTag As Boolean
    Public buildOutString As Boolean
    Public Function ParseHTML(ByVal str_input As String)
        For Me.i = 0 To Me.totalChars
            Me.currChar = str_input.Substring(i, 1)
            Root.stats.Text += currChar
        Next
        Return DBNull.Value
    End Function
End Class

Public Class Root
    Public myBrowser As New System.Windows.Forms.WebBrowser
    Public parser As New Custom_Parser

    Private Sub Root_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        HTML_txt.WordWrap = False
        parser.i = 0
        parser.totalChars = HTML_txt.Text.Length
        timer.Enabled = False
    End Sub

    Public Sub timer_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles timer.Tick
        Dim HTML_transfer As Boolean = True
        If (browser.StatusText = "Done" And HTML_transfer = True) Then
            HTML_txt.Text = browser.Document.Body.InnerHtml
            'parse_timer.Enabled = True
            parser.ParseHTML(HTML_txt.Text)
            HTML_transfer = False
            timer.Enabled = False
            stats.Text = "HTML_txt length: " & HTML_txt.Text.Length
        End If
    End Sub
End Class
 
Strings in VB.NET are immutable: every time you "change" a string, you do not actually change it. A new one is created instead.

VB.NET:
Root.stats.Text += currChar

So there you are creating tens of thousands of new strings, each one character longer than the last, and updating the text box every time. This may be the reason for the freeze. Look into using the StringBuilder class instead.
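
Something along these lines might help. It is only a minimal sketch, not your actual parser: the module name StringBuilderSketch, the helper CopyChars and the sample HTML are made up for illustration. It copies the input one character at a time, the way your loop does, but appends to a System.Text.StringBuilder and only builds the final string once.

```vbnet
Imports System
Imports System.Text

Module StringBuilderSketch
    ' Hypothetical helper (not from the original post): copies the input
    ' character by character into a StringBuilder instead of using += on a String.
    Function CopyChars(ByVal input As String) As String
        ' Pre-size the buffer so it does not need to grow during the loop.
        Dim sb As New StringBuilder(input.Length)
        For i As Integer = 0 To input.Length - 1
            sb.Append(input.Chars(i))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim html As String = "<p>Hello</p>"   ' sample input, assumed for the demo
        Dim result As String = CopyChars(html)
        Console.WriteLine(result)
        ' In the form, assign the control once, after the loop has finished:
        ' stats.Text = result
    End Sub
End Module
```

The idea is that the text box is touched a single time after the loop, rather than on every character.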
 
Okay, well, here's my problem now.
The StringBuilder class works fine; however, I still get an error:
ArgumentOutOfRangeException was unhandled
Index and length must refer to a location within the string.
Parameter name: length

here's my Public class code:

Public Class HTML_Parser
    Public stringBuild As New System.Text.StringBuilder
    Public i As Long
    Public totalChars As Long
    Public currChar As String
    Public nextChar As String
    Public outString As String
    Public currTag As String
    Public buildTag As Boolean
    Public buildOutString As Boolean
    Public Function ParseHTML(ByVal str_input As String)
        For Me.i = 0 To Root.browser.DocumentText.Length Step 1

            'Me.currChar = Root.htmlFormatted.ToString.Substring(Me.i, 1)
            'Root.htmlParsed.Text = String.Concat(Root.htmlParsed.Text, Me.currChar)
            'Root.htmlParsed.Text = String.Concat(Root.htmlParsed.Text, Me.i)
            stringBuild.Append(str_input.ToString.Substring(Me.i, 1))
            'the problem lies in using the Me.i variable in the substring function
            'But it's an actual long integer
        Next
        'outString = stringBuild.ToString
        outString = "Me.i: " & stringBuild.ToString & " Total Chars :" & Root.browser.DocumentText.Length
        Return Me.outString
    End Function
End Class

What am I doing wrong??
 
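String indexes are zero-based, so go from 0 To ...Length - 1. On the last pass of your loop, Substring is asked for one character starting at index Length, which is past the end of the string.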
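
Here is a small sketch of the difference, using a made-up three-character string rather than your document text (the module name ZeroBasedSketch and the variable sample are just for illustration). It may also be worth taking the loop bound from the same string you index (str_input in your function) instead of Root.browser.DocumentText, in case the two differ in length.

```vbnet
Imports System

Module ZeroBasedSketch
    Sub Main()
        Dim sample As String = "abc"   ' Length is 3; valid indexes are 0, 1 and 2

        ' Correct bounds: the loop stops at the last valid index.
        For i As Integer = 0 To sample.Length - 1
            Console.WriteLine(i & ": " & sample.Substring(i, 1))
        Next

        ' Looping up to sample.Length would end with this call, which asks for
        ' one character starting at index 3 and throws.
        Try
            Dim bad As String = sample.Substring(sample.Length, 1)
        Catch ex As ArgumentOutOfRangeException
            Console.WriteLine("Out of range at index " & sample.Length)
        End Try
    End Sub
End Module
```

The loop prints the three characters with their indexes, and the Try block shows the same kind of ArgumentOutOfRangeException you are seeing.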
 
thanks JohnH, I found the solution to the problem a few minutes after my post... it's one of those common-sense-yet-overlooked problems

Thanks a heap for your reply all the same!
 