Question How can I extract the data in a standard format?

pisceswzh · May 7, 2009

I have a string like the following.

VB.NET:

<td>aaa</td><td>bbb</td>...

Now I want to extract each of the <td>...</td> groups. How should I do this?

I tried to use XML to do this. However, XML doesn't allow < and & in the content area, and I can't guarantee that those two characters won't appear.

Any help would be appreciated. Thanks!

Tom

Robert_Zenz · May 7, 2009

Regex Class (System.Text.RegularExpressions)

Regex Groups - Google search: http://www.google.de/search?client=firefox-a&rls=org.mozilla%3Ade%3Aofficial&channel=s&hl=de&q=Regex+Groups&meta=&btnG=Google-Suche&aq=f&oq=

Also search the board; I can remember two or three other people with the very same question.

Bobby

pisceswzh · May 7, 2009

I tried to use the regex. However, if I use a pattern string like

VB.NET:

<td>.*</td>

It turns out it only matches one group which is the entire string, meaning it matches the first <td> and the last </td> and omits the ones between them. In the sample I have provided, it should be two.

I have already searched the board, but in vain. Maybe you can suggest some keywords; I think my problem is hard to describe in one or two words.

Thanks!

JohnH · May 7, 2009

* is greedy; add ? after it (.*?) to make it lazy, so the match stops at the first </td> instead of the last. The tutorial at Regular-Expressions.info (Regex Tutorial, Examples and Reference) is recommended.
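As a minimal sketch of the lazy quantifier in use, the following pulls the content out of each <td>...</td> pair. The module name TdExtractor and the function name ExtractCells are made up for illustration, and the use of RegexOptions.Singleline is an assumption (it only matters if a cell can contain line breaks).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module TdExtractor
    ' Hypothetical helper: returns the text between each <td> and the
    ' nearest following </td>.
    Public Function ExtractCells(ByVal input As String) As List(Of String)
        Dim cells As New List(Of String)()
        ' .*? is the lazy form of .* and stops at the first </td> it can reach.
        ' RegexOptions.Singleline lets "." match line breaks inside a cell too.
        For Each m As Match In Regex.Matches(input, "<td>(.*?)</td>", RegexOptions.Singleline)
            ' Group 1 is the part captured by the parentheses.
            cells.Add(m.Groups(1).Value)
        Next
        Return cells
    End Function

    Public Sub Main()
        ' Prints "aaa" and "bbb" on separate lines.
        For Each cell As String In ExtractCells("<td>aaa</td><td>bbb</td>")
            Console.WriteLine(cell)
        Next
    End Sub
End Module
```

Because the pattern never tries to parse the cell content, characters such as < and & inside a cell do not need any special treatment here.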

pisceswzh · May 7, 2009

Actually, I have just tried

VB.NET:

<td>[^</td>]*</td>

but it seems that it does not treat </td> as a whole word. If I use this regex to match <td>d</td><td>aaa</td>, it returns only one result and skips <td>d</td>, because the character class also excludes the d in the content.

Is there a way to negate a whole word?
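For reference, a character class such as [^</td>] excludes the single characters <, /, t, d and >, not the sequence </td>. One possible way to express "any character, as long as </td> does not start here" is a negative lookahead, (?!</td>), which .NET regular expressions support. The sketch below is only an illustration under that assumption; the names WholeWordNegation and ExtractCellsWithLookahead are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module WholeWordNegation
    ' Hypothetical helper: captures everything between <td> and the next </td>
    ' by checking, before each character, that </td> does not begin there.
    Public Function ExtractCellsWithLookahead(ByVal input As String) As List(Of String)
        Dim cells As New List(Of String)()
        ' (?!</td>). matches one character only if </td> does not start at it.
        ' (?: ... )* repeats that test for every character of the cell content.
        Dim pattern As String = "<td>((?:(?!</td>).)*)</td>"
        For Each m As Match In Regex.Matches(input, pattern, RegexOptions.Singleline)
            cells.Add(m.Groups(1).Value)
        Next
        Return cells
    End Function

    Public Sub Main()
        ' Prints "d" and "aaa" on separate lines.
        For Each cell As String In ExtractCellsWithLookahead("<td>d</td><td>aaa</td>")
            Console.WriteLine(cell)
        Next
    End Sub
End Module
```

For this input the lookahead form and the lazy form <td>(.*?)</td> should give the same cells; the lookahead version simply makes the "not </td>" condition explicit.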

pisceswzh · May 7, 2009

Also, since the given text is quite well formed, I am wondering whether I can treat it as XML. The only problem with XML is that < and & can't appear in the content. Is there a way to solve this?

Thanks!
