Regular Expression rtf lines

vinnie881

I have text that I need to run regular expressions on.

Basically, the text holds numerous records, and each record in the text starts with a %\par or =\par character string.

I have written the following regex to give me everything between those strings:

[=|%]\\par \\par.*?%\\par

The issue is that it returns what I want, but I do not want the results to include the last %\par.

If the match includes it, the next record is skipped, because the search resumes after the last %\par and I need it to resume before that.

Basically, I need to take my regex and tell it to omit the last %\par characters. Does anyone know how to do this?

Thanks

Here is the text.
{\rtf1\ansi \deff0{\fonttbl{\f0\fnil\fprq1 Courier New;}}{\info{\doccomm System: KeyesMail}}\paperw15840\paperh12240\margl0360\margr0360\margt360\margb360\landscape\subfontbysize\sectd\pard\plain\f0\fs16\charscalex79 \par PAGE 1\par PEN230-95-1 707 SEVERANCE FUNDING REPORT FROM 6/01/07 - 6/30/07 7/05/07\par \par 10:39:00\par \par ----- 2006 CONTRACT WKS /$/COMPENS/% ----------- ----- 2007 CONTRACT WKS /$/COMPENS/% ----------- CURRENT\par HOME WEEK 1 WEEK 2 WEEK 3 WEEK 4 WEEK 5 TOTAL WEEK 1 WEEK 2 WEEK 3 WEEK 4 WEEK 5 TOTAL WORK\par CO TM EMPL# EMPLOYEE NAME 6/01 6/08 6/15 6/22 6/29 HIRED ELGBLE TERMED CO LOC\par ======================================================================================================================================================================================================\par \par 09 48 090133 ABITIA GEORGE 38 T HRS 46.50 63.25 44.00 56.00 51.75 261.50 7/25/99 11/25/99 09 48\par 111-11-1111 R HRS 40.00 40.00 40.00 40.00 40.00 200.00\par UNION 01 LOCAL #707 RATE 1.750 1.750 1.750 1.750 1.750\par 401-X MPP-N 707-Y CONTR 70.00 70.00 70.00 70.00 70.00 350.00\par 6.4%\par \par 09 48 087277 ABITIA GEORGE 38 T HRS 52.00 58.50 40.00 40.00 16.00 206.50 5/14/98 9/14/98 09 48\par 111-11-1111 R HRS 40.00 40.00 40.00 40.00 16.00 176.00\par UNION 01 LOCAL #707 RATE 1.750 1.750 1.750 1.750 1.750\par 401-N MPP-N 707-Y CONTR 70.00 70.00 70.00 70.00 28.00 308.00\par 7.1%\par \par 09 48 105138 ACEVEDO MARIA 49 T HRS 40.00 48.00 51.50 51.50 59.00 250.00 11/10/06 3/10/07 09 48\par 111-11-1111 R HRS 40.00 40.00 51.50 40.00 40.00 211.50\par UNION 01 LOCAL #707 RATE .550 .550 .550 .550 .550\par 401-N MPP-N 707-Y CONTR 22.00 22.00 28.32 22.00 22.00 116.32\par 4.7%\par \par 09 48 104494 ADAMS MARCUS 38 T HRS 39.75 27.25 32.00 40.00 39.25 178.25 7/28/06 11/28/06 09 48\par 111-11-1111 R HRS 39.75 27.25 32.00 40.00 39.25 178.25\par UNION 01 LOCAL #707 RATE .550 .550 .550 .550 .550\par 401-N MPP-N 707-Y CONTR 21.86 14.98 17.60 22.00 21.58 98.02\par 4.1%\par 
\par
 
Wouldn't it be easier to read it into a RichTextBox instance and retrieve the Lines from there? This code does what you ask:
VB.NET:
        Dim rtb As New RichTextBox
        rtb.LoadFile("Document.rtf")
        For Each line As String In rtb.Lines
            MsgBox(line)
        Next
 
I need it in the way described above. The regex I posted is just to illustrate what I want to accomplish; the overall goal is a lot more complex.
 
How about this:
VB.NET:
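        ' Requires: Imports System.Text.RegularExpressions
        Dim rtf As String
        Using reader As New IO.StreamReader("Document.rtf")
            rtf = reader.ReadToEnd()
        End Using
        ' The regex \\par matches the literal control word \par.
        Dim lines() As String = Regex.Split(rtf, "\\par")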
 
I understand your suggestion, and appreciate your help, but is there any way to accomplish what I stated (specify the regex start and end, then back up a specified number of characters)?

I am thinking I can probably take the value returned by Regex.Match and subtract that number from the match position, but I was hoping there was a way to do it in the regex itself.
Thanks.
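
For reference, the "subtract from the location" idea could look roughly like the sketch below. It assumes the closing delimiter is always the five characters %\par, and it uses a shortened, made-up sample string in place of the real report. The module name, the sample text and the closerLength variable are illustrative only. [=%] is used instead of [=|%] because | is a literal character inside a character class.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ResumeSketch
    Sub Main()
        ' Hypothetical, shortened stand-in for the report text: three records.
        Dim rtf As String = "=\par \par A 6.4%\par \par B 7.1%\par \par C 4.7%\par "

        ' Same idea as the posted pattern; the match still consumes the closing %\par.
        Dim pattern As New Regex("[=%]\\par \\par.*?%\\par", RegexOptions.Singleline)
        Dim closerLength As Integer = "%\par".Length

        Dim records As New List(Of String)
        Dim m As Match = pattern.Match(rtf)
        While m.Success
            ' Keep the record without its trailing %\par.
            records.Add(m.Value.Substring(0, m.Length - closerLength))
            ' Back up so the next search starts at that %\par,
            ' which is also the opener of the next record.
            m = pattern.Match(rtf, m.Index + m.Length - closerLength)
        End While

        Console.WriteLine(records.Count) ' 3
    End Sub
End Module
```

Every match is at least as long as both delimiters together, so the restart position always moves forward and the loop terminates.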
 
What you're asking for is called 'lookaround' in regex terms (see http://www.regular-expressions.info). For example, this matches the text between '\par's without including them:
VB.NET:
(?<=\\par).+?(?=\\par)
Also use RegexOptions.Singleline if there are actual linefeeds in the source document, so that the dot matches them too.

I don't see how the [=|%] part fits in. Are you sure about it? These characters are part of the document content, not RTF control codes.
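
Applied to the pattern from the first post, a lookahead might look like the sketch below. This is only an illustration on a shortened, made-up sample string (the module name and the sample text are assumptions), not a tested solution for the full report. Inside a character class the | is a literal character, so [=%] is used in place of [=|%].

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadSketch
    Sub Main()
        ' Hypothetical, shortened stand-in for the report text: three records.
        Dim rtf As String = "=\par \par A 6.4%\par \par B 7.1%\par \par C 4.7%\par "

        ' Posted approach: the closing %\par is consumed,
        ' so it cannot also open the next record.
        Dim consuming As New Regex("[=%]\\par \\par.*?%\\par", RegexOptions.Singleline)

        ' Lookahead variant: (?=%\\par) requires %\par to follow the match
        ' but leaves it unconsumed for the next search.
        Dim lookahead As New Regex("[=%]\\par \\par.*?(?=%\\par)", RegexOptions.Singleline)

        Console.WriteLine(consuming.Matches(rtf).Count) ' 2 (record B is skipped)
        Console.WriteLine(lookahead.Matches(rtf).Count) ' 3

        For Each m As Match In lookahead.Matches(rtf)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

Each match still begins with its opening =\par \par or %\par \par. If that should be dropped as well, a lookbehind such as (?<=[=%]\\par \\par) in place of the leading part is one option.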
 