Hey,
I've been writing a program which opens HTML files in a text area and, after allowing the user to make various changes, exports the result to a Word template using bookmarks.
My problem is that I want to remove all the HTML from an opened file while it's in the text area, without permanently changing the original file's contents. I found a few example solutions, but none worked very well. I'm a novice programmer in VB.NET. I've read a bit about using regex to solve this problem and found some examples, but I couldn't get them working, so I'm pretty much at my wits' end.
Please note: it'd be brilliant if the HTML could be stripped within the text area inside my form. Some of the suggested solutions opened the file through a browser first, but I'd like to avoid this as it's meant to be a very user-friendly program.
So, I really need some help/guidance. I specifically want to:
1. Strip away the HTML, but:
2. Where line breaks or paragraph tags are present, interpret them as gaps (so the file content, once the HTML is removed, doesn't just blur together)
3. If possible, keep the URLs or file paths contained within image tags and link tags (it'd be nice if it did this, but I won't lose sleep over it if it doesn't)
Is there a neat and fairly simple way of doing this?
I'm using Visual Studio 2005.
Thanks in advance.
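
For illustration, the three requirements listed above could be sketched roughly as follows using System.Text.RegularExpressions. This is only a minimal sketch under some assumptions, not a robust HTML parser: the module name HtmlStripper and the function name StripHtml are made up for this example, attribute values are assumed to be quoted, and regexes will not cope with every malformed or unusual HTML file. HTML entities such as &amp;amp; are left undecoded.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HtmlStripper

    ' Hypothetical helper: returns a plain-text copy of the HTML.
    ' The input string itself is never modified.
    Function StripHtml(ByVal html As String) As String
        Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        Dim result As String = html

        ' Drop script and style blocks entirely, including their contents.
        result = Regex.Replace(result, "<(script|style)\b[^>]*>.*?</\1\s*>", "", opts)

        ' Requirement 3: keep the URL or path from <img src="..."> and <a href="...">
        ' by replacing the opening tag with the quoted attribute value in brackets.
        result = Regex.Replace(result, "<img\b[^>]*?\bsrc\s*=\s*[""']([^""']*)[""'][^>]*>", " [$1] ", opts)
        result = Regex.Replace(result, "<a\b[^>]*?\bhref\s*=\s*[""']([^""']*)[""'][^>]*>", " [$1] ", opts)

        ' Requirement 2: turn <br> tags and paragraph ends into line breaks.
        result = Regex.Replace(result, "<br\s*/?>", Environment.NewLine, opts)
        result = Regex.Replace(result, "</p\s*>", Environment.NewLine & Environment.NewLine, opts)

        ' Requirement 1: remove every remaining tag.
        result = Regex.Replace(result, "<[^>]+>", "", opts)

        Return result.Trim()
    End Function

    Sub Main()
        Dim sample As String = "<p>Hello <b>world</b><br/>See <a href=""page.html"">this</a></p>"
        Console.WriteLine(StripHtml(sample))
    End Sub

End Module
```

In a form, the same function could be applied to the text area directly, for example TextBox1.Text = StripHtml(TextBox1.Text), where TextBox1 is an assumed control name. Since only the control's text changes, the original file on disk stays as it was unless the program writes the text back to it.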