Tricky data formatting issue I need some help with

shurb

How could I extract the information below:

Name :Smith, John WilliamArrest#: 1349087 1342556 All
Alias:pID#:351556
DOB:11/04/1976Race/Sex:W/M
Height:5'10"Weight:160
Arrested:01/02/2008
At:20:30By:MSD
Address:1426 Main DR Sometown ST 00000

Charges for Arrest #: 1349087
Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725218301TRAFFICDRIVING WHILE IMPAIRED4000SECUREDORDER FOR ARREST
0725218401TRAFFICDRIVING WHILE LICENSE REVOKED2000SECUREDORDER FOR ARREST
0725218501TRAFFICSPEEDING500SECUREDORDER FOR ARREST

into a format like this (just a comma-delimited file that I will load into a DB):
Smith, John William,1349087,,351556,11/04/1976,W/M,5'10",160,,01/02/2008,,20:30,MSD,1426 Main DR Sometown ST 00000,0725218301,TRAFFIC,DRIVING WHILE IMPAIRED,4000,SECURED,ORDER FOR ARREST,0725218401,TRAFFIC,DRIVING WHILE LICENSE REVOKED,2000,SECURED,ORDER FOR ARREST,0725218501,TRAFFIC,SPEEDING,500,SECURED,ORDER FOR ARREST



I have a text file that could contain well over 100 entries like this. The really tricky part is this piece:
Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725218301TRAFFICDRIVING WHILE IMPAIRED4000SECUREDORDER FOR ARREST
0725218401TRAFFICDRIVING WHILE LICENSE REVOKED2000SECUREDORDER FOR ARREST
0725218501TRAFFICSPEEDING500SECUREDORDER FOR ARREST

It is tricky because the number of charges varies from one arrest to the next, and every charge needs to be entered into the DB and tied to the Name. I am really at a loss as to what to do at this point and could really use some help. Any suggestions?
 
yup, what a mess... I've done something similar to this before.

It seems hard when you first look at it, but it shouldn't be that hard to process. The strategy I used was quite simple:

1. Derive a way to process a CASE
2. Process the file into multiple CASEs
3. Loop through and pass each CASE to the method in step #1

If you process the text line by line, it will look much simpler. Sometimes, you will need to peek at the next line to make sure that the current piece of info is the correct element to extract.
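A rough sketch of those three steps in Python, using the sample from the first post. It assumes that a line starting with "Name" opens a new case and that the charge rows are whatever follows the "Court Case..." header line; both are guesses from the one sample shown, and `split_cases` and `process_case` are made-up names.

```python
SAMPLE = """\
Name :Smith, John WilliamArrest#: 1349087 1342556 All
Alias:pID#:351556
DOB:11/04/1976Race/Sex:W/M
Height:5'10"Weight:160
Arrested:01/02/2008
At:20:30By:MSD
Address:1426 Main DR Sometown ST 00000

Charges for Arrest #: 1349087
Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725218301TRAFFICDRIVING WHILE IMPAIRED4000SECUREDORDER FOR ARREST
0725218401TRAFFICDRIVING WHILE LICENSE REVOKED2000SECUREDORDER FOR ARREST
0725218501TRAFFICSPEEDING500SECUREDORDER FOR ARREST
"""


def split_cases(text):
    """Step 2: group non-blank lines into cases.

    Assumption: a line starting with 'Name' opens a new case.
    """
    cases, current = [], []
    for line in text.splitlines():
        if line.startswith("Name") and current:
            cases.append(current)
            current = []
        if line.strip():
            current.append(line)
    if current:
        cases.append(current)
    return cases


def process_case(lines):
    """Step 1: walk one case line by line, peeking at the next line."""
    header, charges = [], []
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # Peek: only treat this as the start of the charge table if the
        # column-header line really follows it.
        if line.startswith("Charges for Arrest") and nxt.startswith("Court Case"):
            charges = lines[i + 2:]
            break
        header.append(line)
    return {"header": header, "charges": charges}


# Step 3: loop through the cases and hand each one to process_case.
results = [process_case(case) for case in split_cases(SAMPLE + "\n" + SAMPLE)]
for result in results:
    print(result["header"][0], len(result["charges"]))
```

The header and charge lines still need to be broken into fields after this; the point here is only the split-then-process shape.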
 
I wouldn't even do that. I'd just write a regex in multiline mode with capturing groups. But that data is a mess, no matter how you look at it. Take a look at your post: are you sure the forum software didn't mangle the text? Because it's an out-and-out dog's dinner right now. I can't believe that any computer system output that as any kind of report.
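For what it's worth, here is roughly what a multiline regex with capturing groups could look like, sketched in Python. Because the columns are run together, the pattern only works if the arrest types and bond types come from a known list. TRAFFIC and SECURED appear in the sample; CRIMINAL and UNSECURED are placeholders I made up, so swap in the real values.

```python
import re

SAMPLE = """\
Name :Smith, John WilliamArrest#: 1349087 1342556 All
Alias:pID#:351556
Charges for Arrest #: 1349087
Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725218301TRAFFICDRIVING WHILE IMPAIRED4000SECUREDORDER FOR ARREST
0725218401TRAFFICDRIVING WHILE LICENSE REVOKED2000SECUREDORDER FOR ARREST
0725218501TRAFFICSPEEDING500SECUREDORDER FOR ARREST
"""

# Name line: everything between "Name :" and "Arrest#:" is the name,
# and the first number after "Arrest#:" is taken as the arrest number.
NAME_RE = re.compile(
    r"^Name :(?P<name>.+?)Arrest#: (?P<arrest_no>\d+)",
    re.MULTILINE,
)

# Charge row: digits, a known arrest type, a lazily matched description,
# the bond amount, a known bond type, then the rest of the line.
CHARGE_RE = re.compile(
    r"^(?P<court_case>\d+)"
    r"(?P<arrest_type>TRAFFIC|CRIMINAL)"    # CRIMINAL is a hypothetical value
    r"(?P<description>.+?)"
    r"(?P<bond_amount>\d+)"
    r"(?P<bond_type>UNSECURED|SECURED)"     # UNSECURED is a hypothetical value
    r"(?P<process>.+)$",
    re.MULTILINE,
)

person = NAME_RE.search(SAMPLE).groupdict()
charges = [m.groupdict() for m in CHARGE_RE.finditer(SAMPLE)]

print(person)
for charge in charges:
    print(charge)
```

A charge description that itself contains digits directly before the bond amount would confuse the lazy match, so this needs checking against the real file.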
 
Incidentally, you won't achieve your aim by taking all the offences and putting them on one line. A long time ago I wrote a piece of software that can deal with this kind of input. It uses regular expressions to find and replace the inbound text, the notion being that you convert the inbound text into XML that a dataset reader can read in. It's then much easier to upload into a DB. In your case, though, don't think that setting it up will be a breeze; I can promise you it will be a nightmare. If you want to take a look at it, search the forums for RegexFindReplaceMutatingReader.
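To illustrate the "one element per offence instead of one long line" idea, here is a small Python sketch that writes already-parsed data out as nested XML. It is not how RegexFindReplaceMutatingReader works; the element names (`Arrests`, `Arrest`, `Charge`, and so on) and the shape of the `arrest` dict are invented for the example, and the values are copied from the sample in the first post.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names for the six charge columns.
CHARGE_TAGS = ("CourtCase", "ArrestType", "Description",
               "BondAmount", "BondType", "Process")

# One parsed arrest, values taken from the sample post.
arrest = {
    "name": "Smith, John William",
    "arrest_no": "1349087",
    "charges": [
        ("0725218301", "TRAFFIC", "DRIVING WHILE IMPAIRED",
         "4000", "SECURED", "ORDER FOR ARREST"),
        ("0725218401", "TRAFFIC", "DRIVING WHILE LICENSE REVOKED",
         "2000", "SECURED", "ORDER FOR ARREST"),
        ("0725218501", "TRAFFIC", "SPEEDING",
         "500", "SECURED", "ORDER FOR ARREST"),
    ],
}


def to_xml(arrests):
    """Build one <Arrest> per person with one child <Charge> per offence."""
    root = ET.Element("Arrests")
    for a in arrests:
        node = ET.SubElement(root, "Arrest", ArrestNo=a["arrest_no"])
        ET.SubElement(node, "Name").text = a["name"]
        for fields in a["charges"]:
            charge = ET.SubElement(node, "Charge")
            for tag, value in zip(CHARGE_TAGS, fields):
                ET.SubElement(charge, tag).text = value
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml([arrest])
print(xml_text)
```

Each charge stays tied to its arrest through nesting, so an arrest with one charge and an arrest with ten charges have the same structure.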
 