Extracting value from email body using regex

graham23s

Hi Guys,

I have never been the greatest with regex. In my application, what I am trying to do is find the value that comes after "username:", e.g. for "username: graham23s" I want "graham23s".

The very basic code I have so far is:

VB.NET:
Dim emailSource As New Regex("(?<=""username:"").*?", _
                             RegexOptions.IgnoreCase Or RegexOptions.Singleline)
Dim matches As MatchCollection = emailSource.Matches(GetTextBody(message))

Dim usernameValue As String = String.Empty
For Each captchaSourceMatch As Match In matches
    usernameValue = matches(0).ToString
    MessageBox.Show(usernameValue)
Next

It's not working but it's a start lol :)

Any help would be appreciated!

thanks guys

Graham
 
Hi,

I have not actually tried your regular expression, but it looks overly complicated to me. One way to do this is to use RegEx to split the message body into words (on spaces and line breaks) and then iterate through the words, picking out the word that follows each "username". Have a try with this:

VB.NET:
Imports System.Text.RegularExpressions
 
Public Class Form1
  Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    Const strSearch As String = "username"
    Const colonCheck As String = ":"
 
    Dim bolUserNameFound As Boolean
    Dim strMessageBody As String = "This is some sample text that has come from an email" & vbCrLf
    strMessageBody += vbCrLf
    strMessageBody += "All sorts of info could come into your email." & vbCrLf
    strMessageBody += vbCrLf
    strMessageBody += "And you could have one or many users listed. username: Ian" & vbCrLf
    strMessageBody += "The structure could also change and have spelling mistakes. username: Billy - or it could be fixed" & vbCrLf
    'The only constant which I am going to focus on here is the word 'username' in the string body
    strMessageBody += "UserName: graham123 and Username : Harold" & vbCrLf
 
    MsgBox(strMessageBody)
 
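    'split on spaces and line breaks; keep the original casing so usernames are not lower-cased
    Dim mySplitEmail() As String = Regex.Split(strMessageBody, " |" & vbCrLf)
    For Each strWord In mySplitEmail
      'check for the username string (ignoring case) and set a flag for the next words
      If strWord.ToLower.Contains(strSearch) Then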
        bolUserNameFound = True
      Else
        If bolUserNameFound Then
          If Not strWord = String.Empty AndAlso Not strWord = colonCheck Then
            'here we have a valid username
            TextBox1.Text += strWord & vbCrLf
            bolUserNameFound = False
          End If
        End If
      End If
    Next
  End Sub
End Class
Hope that helps.

Cheers,

Ian
 
If you are sure your users won't misspell the string "username:", your RegEx pattern doesn't need to capture anything before it; it only needs to assert that the word starts at a word boundary (\b).
My own bitter experience with users, however, makes me suggest taking defensive measures to catch "username", "username=" and "username :" too.
Then, I think, you should match the username value itself as a run of non-whitespace characters (\S+), followed by another word boundary (\b).
I'd try "\busername(\s*[:=]?\s*|\s+)\S+\b" with RegexOptions.IgnoreCase set.
But if you are pretty sure that they will type "username", ONE colon and ONE whitespace, then you might try: "\busername:\s\S+\b"
Note that both patterns match the whole "username: value" text, so wrap the \S+ part in a capture group if you want the value on its own.
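For what it's worth, here is a minimal sketch of how a tolerant pattern like this might be used to pull out just the value. The module name UsernameExtractor, the function ExtractUsernames and the group name "value" are my own placeholders, not anything from the original code. I have also assumed the separator should not be optional, so that a word like "usernames" is not picked up. As far as I can tell, the pattern in the first post fails because the doubled quotes inside the VB string make the lookbehind require literal quote marks around username:, and the lazy .*? is then satisfied by an empty match.

VB.NET:
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UsernameExtractor
  ' "username", then either a ":" or "=" with optional whitespace around it,
  ' or plain whitespace, then the value captured in the named group "value".
  ' (Hypothetical pattern for illustration; adjust it to your real emails.)
  Private Const Pattern As String = "\busername(?:\s*[:=]\s*|\s+)(?<value>\S+)\b"

  ' Returns every value found after a "username" marker, in order of appearance.
  Function ExtractUsernames(body As String) As List(Of String)
    Dim results As New List(Of String)
    For Each m As Match In Regex.Matches(body, Pattern, RegexOptions.IgnoreCase)
      ' Use the loop variable and the named group, not the whole match
      results.Add(m.Groups("value").Value)
    Next
    Return results
  End Function

  Sub Main()
    ' Sample text only; in the real application this would be the email body
    Dim body As String = "username: graham23s" & Environment.NewLine &
                         "UserName = Ian and Username : Harold"
    For Each name As String In ExtractUsernames(body)
      Console.WriteLine(name)
    Next
  End Sub
End Module

With the sample text above this should print graham23s, Ian and Harold on separate lines. Be aware that the plain-whitespace alternative will also grab the next word in a sentence such as "username is required", so drop the "|\s+" part if that is a problem for your emails.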
Hope it helps.
 