Question How can I extract the data in a standard format?

pisceswzh

I have a string like the following.
VB.NET:
<td>aaa</td><td>bbb</td>...

Now, I want to extract each of the <td>...</td> groups. How should I do this?

I tried to use XML to do this. However, XML doesn't allow < and & in the content, and I can't guarantee that those two characters won't appear.

Any help would be appreciated. Thanks!

Tom
 
Regex Class (System.Text.RegularExpressions)
Regex Groups - Google search: http://www.google.de/search?client=firefox-a&rls=org.mozilla%3Ade%3Aofficial&channel=s&hl=de&q=Regex+Groups&meta=&btnG=Google-Suche&aq=f&oq=

Also, search the board; I can remember two or three other people with the very same question.

Bobby
 
I tried using a regex. However, if I use a pattern string like
VB.NET:
<td>.*</td>
it matches only one group, which is the entire string: the match runs from the first <td> to the last </td> and swallows the ones in between. For the sample I provided, there should be two matches.

I have already searched the board, but in vain. Maybe you can give me some keywords. I think my problem is kinda difficult to describe in one or two words.

Thanks!
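
The behavior described above comes from .* being greedy: it takes as much text as it can, so the match runs from the first <td> to the last </td>. A minimal sketch of the usual fix, assuming the cells are not nested and that the text </td> never occurs inside a cell, is to make the quantifier lazy with .*? instead. The module name and the shortened sample string are only for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyMatchDemo
    Sub Main()
        ' Sample input from the first post, without the trailing "..."
        Dim input As String = "<td>aaa</td><td>bbb</td>"

        ' .*? is lazy: it stops at the first </td> it can reach.
        ' The parentheses capture the text between the tags as group 1.
        Dim pattern As String = "<td>(.*?)</td>"

        ' RegexOptions.Singleline lets "." also match line breaks inside a cell.
        For Each m As Match In Regex.Matches(input, pattern, RegexOptions.Singleline)
            Console.WriteLine(m.Groups(1).Value) ' prints aaa, then bbb
        Next
    End Sub
End Module
```

Useful search keywords for this are "greedy vs. lazy quantifier" and "non-greedy match".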
 
Actually, I have just tried

VB.NET:
<td>[^</td>]</td>

but it seems that it does not treat </td> as a whole word. If I use this regex on <td>d</td><td>aaa</td>, it returns only 1 result and skips <td>d</td>, because the regex negates the d in it.

Is there a way to negate a whole word?
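
On the whole-word question: [^</td>] is a character class, so it excludes the single characters <, /, t, d and >, not the sequence </td>. That is why a cell containing d is rejected. One common way to say "any character, as long as </td> does not start here" is a negative lookahead, (?!</td>), checked before each character. A rough sketch, assuming the cells are not nested; the module name is made up for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TemperedMatchDemo
    Sub Main()
        ' The input from the post above.
        Dim input As String = "<td>d</td><td>aaa</td>"

        ' (?!</td>) is a negative lookahead: it succeeds only at positions
        ' where the literal text </td> does NOT begin. Pairing it with "."
        ' consumes one character at a time until the closing tag is next.
        Dim pattern As String = "<td>((?:(?!</td>).)*)</td>"

        Dim matches As MatchCollection = Regex.Matches(input, pattern, RegexOptions.Singleline)
        Console.WriteLine(matches.Count) ' prints 2

        For Each m As Match In matches
            Console.WriteLine(m.Groups(1).Value) ' prints d, then aaa
        Next
    End Sub
End Module
```

For this particular input a lazy <td>(.*?)</td> gives the same result and is easier to read; the lookahead form is mainly useful when the "stop word" has to be excluded inside a larger pattern.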
 
Also, since the given text is quite well formatted, I am wondering whether I can treat it as XML. The only problem with using XML is that < and & can't appear in the content. Is there a way to solve this?

Thanks!
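
On the XML idea: one possible workaround, sketched here under stated assumptions, is to pull the cells out with a lazy regex first, escape each cell's text, and only then hand the rebuilt string to an XML parser. SecurityElement.Escape replaces <, >, & and quotes with entities. The sample input, the <row> wrapper element and the module name are hypothetical. The sketch also assumes the content is not already escaped (otherwise an existing &amp; would be escaped a second time) and never contains a literal </td>.

```vbnet
Imports System
Imports System.Security
Imports System.Text
Imports System.Text.RegularExpressions
Imports System.Xml

Module EscapeThenParseDemo
    Sub Main()
        ' Hypothetical input with a raw < and a raw & inside the cells.
        Dim input As String = "<td>a < b</td><td>R&D</td>"

        ' Rebuild the fragment with each cell's text escaped, under a
        ' single root element, because an XML document needs exactly one root.
        Dim sb As New StringBuilder("<row>")
        For Each m As Match In Regex.Matches(input, "<td>(.*?)</td>", RegexOptions.Singleline)
            sb.Append("<td>")
            sb.Append(SecurityElement.Escape(m.Groups(1).Value))
            sb.Append("</td>")
        Next
        sb.Append("</row>")

        ' The rebuilt string is now well-formed and can be loaded as XML.
        Dim doc As New XmlDocument()
        doc.LoadXml(sb.ToString())

        ' InnerText returns the unescaped content of each cell.
        For Each cell As XmlNode In doc.DocumentElement.ChildNodes
            Console.WriteLine(cell.InnerText) ' prints a < b, then R&D
        Next
    End Sub
End Module
```

If the cell text is all that is needed, the regex step alone is enough; going through XML only pays off when the rest of the code already works with an XmlDocument.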
 