Using Regular Expressions or Pattern Matching

irishtom

I am new to regular expressions and don't really know whether this is the right place to ask, or even whether regular expressions are the way to go.

I have a comma-delimited text file (a BAI banking file). Each line is a series of fields, and two constant codes matter to me. The code 03 means that the field following it is the account number (any number of digits), and the code 015 is always followed by the account balance (any number of digits, without a decimal point). These codes can appear at any position in the text, but their meaning never changes: the field after 03 is always the account number and the field after 015 is always the balance. My problem is to extract the account number and the balance from the text so that I can work with them. An example of the text is as follows:

01,,,050111,0943,2988,,,2/
02,1520813349,121301028,1,050110,,USD,2/
03,0005015146,USD,010,380384,,Z/
88,015,13338154,,Z/
88,040,13301754,,Z/
88,045,854354,,Z/
88,050,459800,,Z/
88,055,459700,,Z/
88,072,12447400,,Z/
88,074,34700,,Z/
88,075,1700,,Z/
88,100,12957770,6,Z/
88,110,,,Z/
88,400,,,Z/
16,173,12518500,Z,,000009100638200,/
16,173,210600,Z,,000009100604520,/
16,173,151000,Z,,000009100604540,/

Any help that anyone can give me will be appreciated. I don't really know where to start. Thank you
 
A better understanding of the source format is needed when parsing out content, to determine all the conditions relevant to the retrieval. BAI.org has a reference for this as a PDF here: http://www.bai.org/operations/faq.asp

I haven't yet worked out an expression, but from what I can see regular expressions are one way to get only the fields mentioned. I assume we are talking about a large data file with many sets of accounts and balances. Here is a good regular expressions site: http://www.regular-expressions.info/
Start by reading the whole file contents into one string variable, ready for regex processing.
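
As a rough sketch of that first step, assuming the file is small enough to hold in memory (the path and the variable names are placeholders of mine, not anything from your project):

```vbnet
Imports System
Imports System.IO

Module ReadBaiFileSketch
  Sub Main()
    'Hypothetical path, replace with the location of your own BAI file
    Dim baiPath As String = "C:\data\sample.bai"
    'ReadAllText opens the file, reads it to the end and closes it again
    Dim fileContents As String = File.ReadAllText(baiPath)
    Console.WriteLine("Read {0} characters", fileContents.Length)
  End Sub
End Module
```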

If this is only your first step and you expect to extend it to parse out more content later, I would not choose regular expressions, but rather parse out all fields by string manipulation now, following the complete layout in the BAI file format reference.
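
To give an idea of the string manipulation route, here is a minimal sketch that splits each line on commas and picks up the field after 03 and the field after 015. It only assumes what the question states (03 starts the line and is followed by the account number, 015 is followed by the balance); the module and function names are made up for illustration, and it is nowhere near a full BAI parser.

```vbnet
Imports System

Module BaiSplitSketch
  'Returns the field that follows the first occurrence of code in fields,
  'or Nothing when the code is absent or is the last field
  Function FieldAfter(ByVal fields() As String, ByVal code As String) As String
    Dim pos As Integer = Array.IndexOf(fields, code)
    If pos >= 0 AndAlso pos < fields.Length - 1 Then Return fields(pos + 1)
    Return Nothing
  End Function

  Sub Main()
    'Sample lines taken from the question
    Dim lines() As String = { _
      "03,0005015146,USD,010,380384,,Z/", _
      "88,015,13338154,,Z/", _
      "88,040,13301754,,Z/"}
    Dim account As String = Nothing
    For Each line As String In lines
      'Drop the trailing "/" record terminator, then split on commas
      Dim fields() As String = line.TrimEnd("/"c).Split(","c)
      'A line that starts with 03 carries the account number in its second field
      If fields.Length > 1 AndAlso fields(0) = "03" Then account = fields(1)
      Dim balance As String = FieldAfter(fields, "015")
      If balance IsNot Nothing Then
        Console.WriteLine("account " & account & vbTab & "balance " & balance)
      End If
    Next
  End Sub
End Module
```

Because whole fields are compared, the "015" that happens to sit inside the account number 0005015146 is not mistaken for the balance code.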
 
Thanks

John H

Thanks for the advice. I will give it more thought. Your idea of string manipulation as a first step in the process seems to be the way to go. I will look into this solution as well as check the BAI site you referenced. Thanks again.

Tom
 
Here is basic regex code to get only those fields.

The fields are assumed to appear in pairs. You have to make use of the line breaks: it is assumed here that the 03 record is always at the "root", i.e. at the beginning of a new line. It is also assumed that 015 is not at the "root", relying on the fact that it is always preceded by a comma. A further assumption is that fields are not broken over several lines, so that, for instance, the 015 and the balance amount are on the same line.

Note that 88 means Continuation: a record started on one line continues on the next line. Continuation is something you can prepare for before processing by removing the BAI-coded "/" line endings and the 88 codes.
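
A minimal sketch of that preparation step, assuming every continuation line starts with "88," and the line before it ends with "/" as in the sample from the question (the function name is mine, and real files may need more care than this):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BaiContinuationSketch
  'Joins every "88" continuation line onto the record before it
  Function MergeContinuations(ByVal contents As String) As String
    'A "/" that ends a line, the line break, and the "88," that opens the
    'next line are replaced by a single comma
    Return Regex.Replace(contents, "/[ \t]*\r?\n88,", ",")
  End Function

  Sub Main()
    Dim sample As String = _
      "03,0005015146,USD,010,380384,,Z/" & vbCrLf & _
      "88,015,13338154,,Z/" & vbCrLf & _
      "16,173,12518500,Z,,000009100638200,/"
    'Prints the 03 record and its continuation as one line, then the 16 record
    Console.WriteLine(MergeContinuations(sample))
  End Sub
End Module
```

After merging, the 03 record still begins its line and 015 is still preceded by a comma, so both assumptions above continue to hold.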

The regular expressions used here basically look for digits only, with no + or - signs and no decimals. Standard lookbehind regex syntax is used for "^03," and ",015,". The ^ means start of line when the Multiline option is set. A number sequence of any length is captured.

Here is the code:
VB.NET:
'Requires at the top of the file: Imports System.Text
'fileContents is assumed to hold the whole BAI file as one string
'
'getting the results
Dim ex1 As String = "(?<=^03,)[0-9]+"
Dim ex2 As String = "(?<=,015,)[0-9]+"
Dim rm1 As RegularExpressions.MatchCollection = RegularExpressions.Regex.Matches(fileContents, ex1, RegularExpressions.RegexOptions.Multiline)
Dim rm2 As RegularExpressions.MatchCollection = RegularExpressions.Regex.Matches(fileContents, ex2)
 
'displaying the results; pair only as many matches as both collections contain
Dim out As New StringBuilder
For i As Integer = 0 To Math.Min(rm1.Count, rm2.Count) - 1
  out.Append("account ")
  out.Append(rm1.Item(i).Value)
  out.Append(vbTab)
  out.Append("balance ")
  out.Append(rm2.Item(i).Value)
  out.Append(vbNewLine)
Next
MsgBox(out.ToString)
 
Getting data from delimited files

One way that I load my delimited files is below. Notice that MyA() holds each field of a line once it has been split on the delimiter (in my case a "|"; you would use a ","). You can then search the fields to find the data you want and collect any associated data by referencing the index. This way you have all of the data from the line stored in the array and easily accessible:

'Requires: Imports System.IO, System.Text, System.Data.SqlServerCe
'Loc, bar, AName and ssceconn are class-level members defined elsewhere
Private Sub LoadFile(ByVal TextName As String, ByVal FileName As String)
  Dim FS As FileStream
  Dim SR As StreamReader
  Dim MyA() As String
  Dim Hold As String
  Dim X, Y As Integer
  Dim InsertRow As SqlCeCommand
  Dim sb As New StringBuilder
  AName = FileName
  'Set up a filestream & reader for the text data file
  FS = New FileStream(Loc & "\TextFiles\" & TextName, System.IO.FileMode.Open, System.IO.FileAccess.Read)
  SR = New StreamReader(FS)
  'The command text is empty for now and is filled in for each line below
  InsertRow = New SqlCeCommand(sb.ToString, ssceconn)
  'Until you get to EOF...
  While Not SR.Peek = -1
    Y = 0
    'Copy the data stream one line at a time into a hold field
    Hold = SR.ReadLine
    'Parse out the data into fields (they are separated by |'s)
    MyA = Hold.Split(bar)
    'GetLength(0) returns the number of elements in the first dimension of the array, so X is the index of the last field
    X = MyA.GetLength(0) - 1
    'Put in initial data for command
    'Note: values are concatenated as-is, so a field containing an apostrophe will break the statement
    sb.Append("INSERT INTO " & FileName & " Values ('" & MyA(0))
    Do While (Y <> X)
      'Add data for each field with separators
      Y = Y + 1
      sb.Append("', '" & MyA(Y))
    Loop
    sb.Append("')")
    'Add the row by running InsertRow
    InsertRow.CommandText = sb.ToString
    InsertRow.ExecuteNonQuery()
    sb.Length = 0
  End While
  SR.Close()
  FS.Close()
  InsertRow.Dispose()
  'Delete the text file once it has been loaded
  If File.Exists(Loc & "\Textfiles\" & TextName) Then
    Try
      File.Delete(Loc & "\Textfiles\" & TextName)
    Catch ex As Exception
      'A failed delete is deliberately ignored
    End Try
  End If
End Sub



 