Question Stripping HTML

taliesan · Mar 4, 2010

Hey,
I've been writing a program that opens HTML files in a text area and, after letting the user make various changes, exports the result to a Word template using bookmarks.

My problem is that I want to remove all the HTML from an opened file while it's in the text area, without permanently affecting the original file's contents. I found a few example solutions but none worked very well. I'm a novice programmer in VB.NET. I've read a bit about using regex to solve this problem and found some examples, but I couldn't get them working, so I'm pretty much at my wits' end.

Please note: it'd be brilliant if it would strip the HTML within the text area inside my form. Some of the suggested solutions opened the file through a browser first, but I'd like to avoid this as it's meant to be a very user-friendly program.

So, I really need some help/guidance. I specifically want to:
1. Strip away the HTML, but
2. Where line breaks or paragraph tags are present, turn them into gaps (so the file content, once the HTML is removed, doesn't just blur together)
3. If possible, keep the URLs or file paths contained within image tags and link tags (it'd be nice if it did this, but I won't lose sleep over it if it doesn't)

Is there a neat and fairly simple way of doing this?

I'm using Visual Studio 2005.
Thanks in advance.

ProtekNickz · Mar 6, 2010

VB.NET:

Trimming and Removing Characters from Strings.

The String class provides Trim, TrimStart and TrimEnd methods to trim strings. The Trim method removes whitespace from the beginning and end of a string. The TrimEnd method removes the characters specified in an array of characters from the end of a string, and the TrimStart method removes the characters specified in an array of characters from the beginning of a string.

You can also use the Remove method to remove characters from a string. The following code shows how to use these methods.

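' Trim removes whitespace from both ends.
Dim str As String = " C# "
Console.WriteLine("Hello{0}World!", str)      ' Hello C# World!
Dim trStr As String = str.Trim()
Console.WriteLine("Hello{0}World!", trStr)    ' HelloC#World!

' TrimStart removes any of the listed characters from the start.
str = "Hello World!"
Dim chArr() As Char = {"e"c, "H"c, "l"c, "o"c, " "c}
trStr = str.TrimStart(chArr)
Console.WriteLine(trStr)                      ' World!

' TrimEnd does the same from the end; nothing is removed here because the string ends in "!".
str = "Hello World!"
Dim chArr1() As Char = {"e"c, "H"c, "l"c, "o"c, " "c}
trStr = str.TrimEnd(chArr1)
Console.WriteLine(trStr)                      ' Hello World!

' Remove(startIndex, count) deletes count characters starting at startIndex.
Dim MyString As String = "Hello Delta World!"
Console.WriteLine(MyString.Remove(5, 6))      ' Hello World!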

Basically, you want to remove the tags in the text box as the file loads. This is just an example; you might want to look up HTML parsing or string manipulation. Hope you figure it out.
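For the tag stripping itself, something along these lines might get you started. It's only a rough sketch using Regex, not a full HTML parser: the module name HtmlStripper and the function StripHtml are made up for illustration, it assumes attribute values are quoted, and it doesn't decode entities such as &amp;. It covers the three points from the question: tags are removed, <br> and </p> become line breaks, and the href/src value of link and image tags is kept in square brackets.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HtmlStripper

    ' Hypothetical helper: returns a plain-text copy of the given HTML.
    ' The input string itself is never modified.
    Function StripHtml(ByVal html As String) As String
        Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        Dim text As String = html

        ' Drop script and style blocks together with their contents.
        text = Regex.Replace(text, "<(script|style)\b[^>]*>.*?</\1\s*>", "", opts)

        ' Keep the URL or path from <a href="..."> and <img src="..."> tags.
        ' Assumes the attribute value is wrapped in single or double quotes.
        text = Regex.Replace(text, _
            "<(?:a|img)\b[^>]*?\b(?:href|src)\s*=\s*[""']([^""']*)[""'][^>]*>", _
            " [$1] ", opts)

        ' Turn <br> into one line break and </p> into a blank line.
        text = Regex.Replace(text, "<br\s*/?>", Environment.NewLine, opts)
        text = Regex.Replace(text, "</p\s*>", _
            Environment.NewLine & Environment.NewLine, opts)

        ' Remove every remaining tag.
        text = Regex.Replace(text, "<[^>]+>", "", opts)

        ' Trim leftover whitespace at both ends.
        Return text.Trim()
    End Function

End Module
```

In the form you could then do something like TextBox1.Text = HtmlStripper.StripHtml(TextBox1.Text), where TextBox1 stands in for whatever your multiline text box is called. Only the text in the box changes, so the file on disk stays as it was unless you save over it. Regex will trip over badly malformed HTML, so treat the output as a first pass.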

taliesan · Mar 6, 2010

Thank you for replying. This seems like a neat solution; I'll give it a try tomorrow.
