Single Char Function ?

welchb

Hi Guys,

I'm trying to read a text file, line by line and then use some of the data from these lines as I read them.
I'm OK with opening the file and reading the text in using (tempReadline = objReader.ReadLine() & vbNewLine)

The main problem I have is that the text often has multiple spaces, or multiple "-" chars in it. What I'm trying to do is to get a consistent single character throughout, e.g.

VB.NET:
"    This     is an    example of a      string    "
to
VB.NET:
"This is an example of a string"

I have tried using the TRIM function as follows -->

Dim charsToTrim() As Char = {"*"c, " "c, "-"c}
tempReadline = objReader.ReadLine() & vbNewLine
TrimTextLine = tempReadline.Trim(charsToTrim)


however the result is -->

VB.NET:
"This     is an    example of a      string"

so it does not TRIM all of the extra spaces within the string.

Ideally I'd like to replace every run of repeated chars with a single char of my choice (space being the default)...

I did also look at the REPLACE function, however I do not want to replace ALL the spaces (or selected char). I started to look at Regular Expressions, however I could not find anything.

I also did search the forum to see if this had come up.

If someone can help please.

Tkx,

Brian
 
Hi John

Thanks for the quick reply --- I'll have a look at RegEx. I've never used them before, which is why I was hesitant to go down this route.
I'll post back here as and when I have a solution (or if I get stuck) and maybe tease some better brains than mine.

Thanks also for the link.

Cheers,

Brian
 
Hi John,

I probably still need to look at RegEx to see if I can do this in a better (more efficient) way. That said, I did have a quick look and from what I could gauge, even with RegEx I would need to do some sort of loop to remove multiple occurrences within a string.

I might be wrong and if someone can suggest something I'm open to this.

Also, my learning curve might need a little more time on RegEx, so in the mean time I've coded a quick test bed, as follows -->

findstring = 0
TrimTextLine = tempReadline.Trim(charsToTrim)
Do
    ' look for two consecutive spaces
    findstring = InStr(TrimTextLine, "  ")
    If findstring <> 0 Then
        ' replace each pair of spaces with a single space
        TrimTextLine = TrimTextLine.Replace("  ", " ")
    End If
Loop Until findstring = 0
TextBox4.Text = TrimTextLine

I've missed out some of the DIMs at the front of this, however I guess it gives the general idea of what's going on -- my thoughts on this are

-- I still use TRIM for leading/trailing spaces (or a char of my choice), as this is probably more efficient than a loop and gets rid of both ends at once
-- I test with findstring whether there are still chars to remove
-- This then loops around and replaces 2 for 1 each time

My next steps would be to make this a Function so I can call it as needed and replace the hard-coded " " with a char being passed into the function -- if nothing is passed I will default to a space.
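
A minimal sketch of what such a function might look like. The names TextCleanup and CollapseRepeats are hypothetical, and it assumes the same 2-for-1 Replace loop as the test bed above, with the char as an optional parameter defaulting to a space:

```vbnet
Module TextCleanup
    ' Hypothetical helper: collapses every run of ch down to a single ch
    ' and trims ch from both ends. ch defaults to a space.
    Function CollapseRepeats(ByVal source As String, Optional ByVal ch As Char = " "c) As String
        Dim pair As New String(ch, 2)        ' e.g. "--" when ch is "-"
        Dim one As String = ch.ToString()
        Dim result As String = source.Trim(ch)   ' leading/trailing first

        ' Each pass replaces 2 for 1, so loop until no pair is left
        While result.Contains(pair)
            result = result.Replace(pair, one)
        End While
        Return result
    End Function
End Module
```

Called as CollapseRepeats(tempReadline) it works on spaces; CollapseRepeats(tempReadline, "-"c) does the same for "-" chars.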

Performance wise --- this does not seem to be too bad and the file I'm loading only takes about 2 seconds on top of the processing, which is acceptable.

Thanks again for your help and for pointing me to RegEx -- I will take a look as I think this can be useful for other things too.

Cheers,

Brian
 
even with RegEx I would need to do some sort of loop to remove multiple occurrences within a string.
No, regex is all about string patterns, and repetition is a pattern, so it is integral to regex syntax. In your case the pattern is, for example, a space char repeated 2 or more times. The regex pattern for this is simply " {2,}". So you end up with this one line call:
Dim input = "   This     is an    example of a      string  "
input = input.Trim 'trim lead/trail

Dim pattern = " {2,}"
input = System.Text.RegularExpressions.Regex.Replace(input, pattern, " ")
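
The same idea stretches to the other chars mentioned in the first post. A rough sketch, assuming each run of spaces, "-" or "*" should shrink to one instance of that same char; the names RegexCollapse and CollapseRuns are made up for illustration:

```vbnet
Imports System.Text.RegularExpressions

Module RegexCollapse
    ' Hypothetical helper: collapses runs of space, "-" or "*".
    Function CollapseRuns(ByVal source As String) As String
        ' ([ \-*]) captures one space, hyphen or asterisk;
        ' \1+ then matches one or more repeats of that same char.
        ' "$1" puts back a single copy of the captured char.
        Return Regex.Replace(source, "([ \-*])\1+", "$1")
    End Function
End Module
```

Because of the backreference, a run of "-" stays a "-" and a run of spaces stays a space; mixed runs such as " - " are left as they are.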
 
Thanks again John.

I've added this example to my 'test bed' and yes it works great thanks. I'll try it on the whole file to see if it offers performance improvements (which I'm sure it does).

I guess I really do need to look at these Reg expressions more fully --- I understand your example, probably the bigger challenge is understanding the patterns that can be used and what they will do, i.e. the general syntax of Regular Expressions.

It looks like an art form in its own right!!!

Appreciate the help/feedback
 
Regex comes with a little bit of overhead, especially for small strings and simple patterns, and when only used a single time. As the text content grows, though, regex gets more efficient, and as the pattern gets more complex the coding gets easier. The regex engine may also give better results when an expression is used more frequently, and it is possible to reuse the same Regex object for multiple calls for better performance.

For the simple pattern and very small input used here, called only once, a String.Replace loop will out-perform Regex.Replace (only measured in ticks, i.e. marginally), for example this code:
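
As a sketch of reusing one Regex object when cleaning a file line by line (the names LineCleaner, MultiSpace and CleanLines are hypothetical, and RegexOptions.Compiled is an optional assumption that trades some start-up cost for faster matching):

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LineCleaner
    ' One Regex instance, created once and shared by every call.
    Private ReadOnly MultiSpace As New Regex(" {2,}", RegexOptions.Compiled)

    ' Hypothetical helper: trims each line and collapses runs of spaces.
    Function CleanLines(ByVal lines As IEnumerable(Of String)) As List(Of String)
        Dim cleaned As New List(Of String)
        For Each line As String In lines
            ' Instance Replace reuses the already-parsed pattern
            cleaned.Add(MultiSpace.Replace(line.Trim(), " "))
        Next
        Return cleaned
    End Function
End Module
```

It could be fed from a file with something like CleanLines(System.IO.File.ReadLines("input.txt")), where the file name is only a placeholder.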
While input.IndexOf("  ") <> -1
    input = input.Replace("  ", " ")
End While

But as mentioned, it doesn't take much usage, or much more complexity in the pattern, before Regex shows its power.
Actually, for a single call, regexinstance.Replace will do that replacement a bit faster, but it takes a few ticks to create that instance.
 
Thanks again for the additional explanation --- for me and the process I'm running it's not too time dependent to save the small 'clicks' of time -- although useful to know and another way to skin the same cat.

I'm actually going to keep the small function I've written in addition to the RegEx -- as it's good for my learning and understanding and your last example is also another useful reference point (for me & I hope others).

At the moment, I'm taking a particular ASCII file format and converting it into an easier to use format, e.g. CSV or TAB delimited.

Slightly longer term I want to be able to create a profile of the incoming format and a profile of the outgoing format --- e.g. text to CSV, CSV to XML, etc.

I'm sure there are a few products out there that do this type of thing, however I want to learn more and we have some business specific requirements I've not seen anything handle before -- hence this approach.

All the best,

Brian
 