Regular Expression Performance

kracked2

Hi, I'm wondering about the performance of regular expressions in different scenarios, and whether it would be better to go with substrings and/or the StringBuilder class.

VB.NET:
Private m_objRegexStrReplace As Regex = New Regex("\{[0-9]+\}")
        For Each objMatch As Match In m_objRegexStrReplace.Matches(strObjectText)
            Try
                drResults = m_dtData.Select("ID = " & objMatch.Value.Substring(1, objMatch.Length - 2))
                strDatabaseString = drResults(0).Item("cString")
                objSpcRegex = New Regex("\{" & objMatch.Value.Substring(1, objMatch.Length - 2) & "\}")
                strObjectText = objSpcRegex.Replace(strObjectText, strDatabaseString, 1)
            Catch
                'ID Not found in database
                MessageBox.Show("Id not found in DB")
            End Try
        Next

        r_cr_TextObj.Text = strObjectText

I'm trying to cycle through Crystal Reports objects, find the fields that contain text, and replace the tags in them using the DataTable m_dtData, which contains 2 columns: the ID and the message. I can't just use my main regex and do a replace right away, because what each tag gets replaced with depends on a DataTable field, so I have to create a second regex for every match. I'm wondering about the performance of that: for a pattern like "\{[0-9]+\}", should I use the StringBuilder class and do manual replaces, or would regex be faster? I'm also not 100% sure that I'm using regex the correct way for what I want to do.
 
Can you rephrase your question a bit? I'm not clear what you're doing.

You have a list of things that might be like:
{123}
{456}
{789}

And you want them to be mapped, so that 123 -> "hello", 456 -> "goodbye", 789 -> "foobar"?

{hello}
{goodbye}
{foobar}
 
More like, I have Crystal Reports objects which might contain text with tags like {8271}, {8712}, etc.

Each tag contains an ID. For example, the {8271} tag represents message ID #8271, and we have a database that holds message #8271 in every language, so all we have to do is replace {8271} with the appropriate message from the database. In the code above, the DataTable m_dtData is already loaded for the correct language. It has 2 columns: one for the ID and one for the string that replaces the tag.

And no. Let's say the database has message 61 as "hello", message 62 as "good bye" and message 63 as "hahaha"; it would have to change:

"iha891 {61}{62}{63}BOUBOUjahah{62}**{{9{63}hehehe"
To
"iha891 hellogood byehahahaBOUBOUjahahgood bye**{{9hahahahehehe"

But like I said, I can't just hand the DataTable to this regex and expect it to understand that it has to look inside the tags for the ID, match it with a message and replace it that way. So I have to do it manually, which requires another regex for every match, and I think that might be slow for the app.

I hope this clears it up. I'm looking for a faster way to do this with regex, and for someone to confirm which of regex or string manipulation is faster.
 
You don't say how your IDs are stored, but I'll assume it's like:

id, message
61, haha
62, hello


So we can use a regex to get a list of all the IDs present in the message. I've made the variable names a bit shorter for clarity:

VB.NET:
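'note the added brackets, gives us a capture group
Private rex As Regex = New Regex("\{([0-9]+)\}")
Dim idList As New Dictionary(Of String, String)

For Each m As Match In rex.Matches(strObjectText)
  'Groups(n) is a Group object, so use .Value to get the matched text
  If Not idList.ContainsKey(m.Groups(1).Value) Then idList.Add(m.Groups(1).Value, m.Groups(0).Value)
Next m

rex = Nothing

Dim sb As New StringBuilder(strObjectText)
For Each id As String In idList.Keys
  'FindByID returns a row (Nothing if the ID is missing), not the text itself
  'assumes the ID column is an Integer and the text column is called message
  Dim row As DataRow = myDataTable.FindByID(CInt(id))
  If row IsNot Nothing Then sb.Replace(idList(id), CStr(row("message")))
Next id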


We use the regex to match all the IDs. We use a dictionary to ensure only distinct values (if 63 is in twice, we won't run 2 replacements, because a single StringBuilder.Replace call already replaces every occurrence)

Groups(0) is "{63}"
Groups(1) is "63"

These are thanks to the capture brackets in the regex
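To see the two groups for yourself, here is a minimal sketch. The module name and the sample string "abc{63}def" are made up for illustration; only the pattern comes from the code above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupsSketch
    Sub Main()
        ' Same pattern as above: the round brackets capture the digits.
        Dim rex As New Regex("\{([0-9]+)\}")
        Dim m As Match = rex.Match("abc{63}def")

        Console.WriteLine(m.Groups(0).Value) ' whole match, braces included
        Console.WriteLine(m.Groups(1).Value) ' first capture group, digits only
    End Sub
End Module
```

Note that Groups(n) returns a Group object, so you need .Value to get at the string.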

So we put our string in a StringBuilder - an efficient way to run repeated replacements, since it avoids creating a new string for every one - and we replace {63} with the message
myDataTable is a TYPED datatable of which ID is the PK - it hence has a FindByID function. If the PK were a column called Bobble, this function would be FindByBobble

The database is not repeatedly queried

I could have just saved the IDs from Groups(1), and done this:

sb.Replace("{" & id & "}", FindByID...)

But when Groups(0) has an already concatted version ready for the taking, we can use it rather than building yet another one

Of course, if you want to mod your DB query to:

SELECT '{' + id + '}' as ID, message FROM tblMessages

Then you can use a List(Of String), and merely store the matched values as you had them originally (for each match in the MatchCollection)
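For comparison, here is a different technique from the StringBuilder loop above: Regex.Replace accepts a MatchEvaluator delegate, so one regex can do all the replacing in a single pass with no second regex per match. This is only a sketch under assumptions. The sample table stands in for m_dtData, using the ID and cString column names from the first post; the module, field and function names are made up for illustration, and whether it is actually faster for your reports is something you would have to measure.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Text.RegularExpressions

Module TagReplaceSketch
    ' Matches tags such as {61}; the capture group holds the digits only.
    Private ReadOnly TagRegex As New Regex("\{([0-9]+)\}")

    ' Lookup built once from the DataTable: ID text -> message.
    Private messages As New Dictionary(Of String, String)

    ' Called once per match; returns the text that replaces the tag.
    Private Function ReplaceTag(ByVal m As Match) As String
        Dim found As String = Nothing
        If messages.TryGetValue(m.Groups(1).Value, found) Then Return found
        Return m.Value ' unknown ID: leave the tag untouched
    End Function

    Sub Main()
        ' Sample data standing in for m_dtData (assumed columns ID and cString).
        Dim dt As New DataTable()
        dt.Columns.Add("ID", GetType(Integer))
        dt.Columns.Add("cString", GetType(String))
        dt.Rows.Add(61, "hello")
        dt.Rows.Add(62, "good bye")
        dt.Rows.Add(63, "hahaha")

        ' Read the table once so the replace step never touches it again.
        For Each row As DataRow In dt.Rows
            messages(row("ID").ToString()) = CStr(row("cString"))
        Next

        Dim source As String = "iha891 {61}{62}{63}BOUBOUjahah{62}**{{9{63}hehehe"
        Dim result As String = TagRegex.Replace(source, New MatchEvaluator(AddressOf ReplaceTag))
        Console.WriteLine(result)
    End Sub
End Module
```

With the sample string from earlier in the thread this should print "iha891 hellogood byehahahaBOUBOUjahahgood bye**{{9hahahahehehe". One difference from the StringBuilder loop: because the text is only scanned once, a message that itself happens to contain something like {62} is not replaced a second time.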

