I want to create a tool that removes invalid links from all files of a given type, say *.txt, inside a directory.
Sample file:
<sec id="sec1">
<p>"You fig. 23 did?" I <a href rid="sec12">section 12</a> asked, surprised.</p>
<p>"Cross sent it table 9 to me a few weeks ago." Stanton crossed over to my mother, taking her hand in his. "I <a href rid="sec2">section 2</a> couldn"t have argued for better terms."</p>
<p>"There are always better terms <a href rid="sec6">section 6</a>, Richard!" my mom said sharply.</p>
<p>"Of course, I <a href rid="sec2">section 2</a> didn"t know." He pulled her into his arms, crooning softly like he would wit table 9h a child. "I <a href rid="sec2">section 2</a> assumed he was looking ahead.</p>
<p>I <a href rid="sec2">section 2</a> stood. I <a href rid="sec2">section 2</a> had to hurry if I <a href rid="sec2">section 2</a> was going to get to work on time. Today of all days, I <a href rid="sec2">section 2</a> didn"t want to be late.
<fig id="fig4">
<caption><p>I'm confused</p></caption>
</fig>
</p>
<p>Turning to face her, I <a href rid="sec2">section 2</a> walked backward. "I"ve seriously got to get ready. Why don"t we get together for lunch and talk more then?"</p>
<sec id="sec2">
<p>"You fig. 23 can"t be""</p>
<p>I <a href rid="sec4">section 4</a> adored the Art Deco elegance of the Chrysler Building. I <a href rid="sec2">section 2</a> could pinpoint my place on the island in relation to the posit table 9ion of the Empire State Building. I <a href rid="sec2">section 2</a> was awed by the breathtaking height of the Freedom Tower that now dominated downtown. But the Crossfire Building was in a class by it table 9self.</p>
<p>I <a href rid="sec1">section 1</a> felt Gideon before I <a href rid="sec1">section 1</a> saw him, my entire body humming wit table 9h awareness as he stepped out of the Bentley, which had pulled up behind the Benz. The air around me charged wit table 9h electricit table 9y, the crackling energy that always heralded the approach of a storm.</p>
</sec>
</sec>
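I want to remove all invalid "section" link tags from the files. For each ``rid="secDIGIT"`` in a file, I want to check whether the same file contains a matching ``<sec id="secSAMEDIGIT">``. If it does, the link is valid and I move on to the next one. If it does not, I want to delete only the tags, i.e. ``<a href rid="secDIGIT">`` and ``</a>``, and keep the text in between. In the sample above only sec1 and sec2 exist, so the links to sec12, sec6 and sec4 should be unwrapped, e.g. ``<a href rid="sec12">section 12</a>`` should become ``section 12``.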
The code I've written so far is incomplete and probably full of flaws. Can anyone help?
Code:
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Public Class Form1
    ' Let the user pick the folder that holds the *.txt files.
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        If FolderBrowserDialog1.ShowDialog() = DialogResult.OK Then
            TextBox1.Text = FolderBrowserDialog1.SelectedPath
        End If
    End Sub

    ' Clean every *.txt file in the selected folder.
    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        Dim targetDirectory As String = TextBox1.Text
        If Not Directory.Exists(targetDirectory) Then Return

        For Each txtFile As String In Directory.GetFiles(targetDirectory, "*.txt")
            ' Read the whole file as one string so the regexes can see every <sec> tag.
            Dim input As String = File.ReadAllText(txtFile)

            ' Collect the ids that are actually defined in this file, e.g. "sec1", "sec2".
            Dim validIds As New HashSet(Of String)
            For Each m As Match In Regex.Matches(input, "<sec id=""(sec\d+)"">")
                validIds.Add(m.Groups(1).Value)
            Next

            ' Keep links whose rid is defined; otherwise keep only the text between the tags.
            Dim result As String = Regex.Replace(input, "<a href rid=""(sec\d+)"">(.*?)</a>",
                Function(link As Match) If(validIds.Contains(link.Groups(1).Value), link.Value, link.Groups(2).Value))

            ' Overwrite the file only when something changed (assumes in-place editing is wanted).
            If result <> input Then File.WriteAllText(txtFile, result)
        Next
    End Sub
End Class