String fuzzy match algo

pai

Does anyone have a code snippet for fuzzy string matching?

For example, if I have a source string "visual basic dot net" and a target string "visual source basic code", it should return a match result of 50%.

Any help will be gladly appreciated.
Thanks in advance.
--pai
 
Why would it return 50%? Is it because 2 of the 4 words match? Fuzzy logic is what users feel, but on the back end we programmers have to specify the rules and formulas one by one...
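If the rule really is "matching words divided by total words", it can be sketched as below. This is only one possible reading of the 50% figure: the name WordMatchPercent, the case-insensitive comparison, and the choice to divide by the number of source words are all assumptions, not something the original poster specified.

```vbnet
Imports System
Imports System.Collections.Generic

Module WordMatchSketch
    ' Hypothetical helper: percentage of source words that also occur
    ' as whole words in the target string.
    Function WordMatchPercent(ByVal source As String, ByVal target As String) As Double
        Dim separators() As Char = {" "c}
        Dim sourceWords() As String = source.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim targetWords() As String = target.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        If sourceWords.Length = 0 Then Return 0.0

        ' Assumption: words are compared case-insensitively.
        Dim targetSet As New HashSet(Of String)(targetWords, StringComparer.OrdinalIgnoreCase)

        Dim matches As Integer = 0
        For Each word As String In sourceWords
            If targetSet.Contains(word) Then matches += 1
        Next

        ' "/" is floating-point division in VB.NET.
        Return matches / sourceWords.Length * 100.0
    End Function

    Sub Main()
        ' "visual" and "basic" are found, "dot" and "net" are not: 2 of 4 words.
        Console.WriteLine(WordMatchPercent("visual basic dot net", "visual source basic code"))
    End Sub
End Module
```

For the two strings in the question this gives 50. Note that a word repeated in the source is counted each time it appears, which may or may not be what you want.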
 
Yup, because it found two words.

Actually I have the "Edit Distance algo", which I have modified to accept strings and return the match % (not fully functional yet, though... ;) ).

But I wanted to know if there is some other approach, and whether someone has better logic, maybe something similar to the Trados algo. Trados calculates string fuzzy matches in Word macros and it is not CPU intensive. The Edit Distance algo that I have can consume a lot of memory when handling large volumes.

Let me know what you think...
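On the memory concern: a standard Levenshtein edit distance only needs the previous row of its table at any time, so it can run with two small arrays instead of a full matrix. The sketch below is not the modified version mentioned above (that code was not posted) and has nothing to do with how Trados works; the names EditDistance and MatchPercent, and the formula "1 minus distance over the longer length", are assumptions for illustration.

```vbnet
Imports System

Module EditDistanceSketch
    ' Levenshtein distance using two rows instead of a full table,
    ' so memory use grows with the shorter string only.
    Function EditDistance(ByVal a As String, ByVal b As String) As Integer
        ' Keep the shorter string as b so the rows stay small.
        If a.Length < b.Length Then
            Dim tmp As String = a
            a = b
            b = tmp
        End If

        Dim previous(b.Length) As Integer
        Dim current(b.Length) As Integer
        For j As Integer = 0 To b.Length
            previous(j) = j
        Next

        For i As Integer = 1 To a.Length
            current(0) = i
            For j As Integer = 1 To b.Length
                Dim cost As Integer = If(a(i - 1) = b(j - 1), 0, 1)
                ' Cheapest of deletion, insertion and substitution.
                current(j) = Math.Min(Math.Min(previous(j) + 1, current(j - 1) + 1), _
                                      previous(j - 1) + cost)
            Next
            ' The row just filled becomes the previous row for the next pass.
            Dim swap() As Integer = previous
            previous = current
            current = swap
        Next

        Return previous(b.Length)
    End Function

    ' Assumed scoring: 100% for identical strings, 0% when every character differs.
    Function MatchPercent(ByVal a As String, ByVal b As String) As Double
        Dim longest As Integer = Math.Max(a.Length, b.Length)
        If longest = 0 Then Return 100.0
        Return (1.0 - EditDistance(a, b) / longest) * 100.0
    End Function

    Sub Main()
        Console.WriteLine(EditDistance("kitten", "sitting")) ' distance is 3
        Console.WriteLine(MatchPercent("visual basic dot net", "visual source basic code"))
    End Sub
End Module
```

This compares characters, not words, so its percentage for the two example strings will generally differ from the word-based 50%.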
 
I think 'Edit Distance' measures the similarity between two strings, so it compares them character by character rather than word by word. I have no knowledge of Trados.

Try creating this function somewhere in the form:
VB.NET:
	Private Function CountInString(ByVal StartingPoint As Integer, _
	                               ByVal SourceString As String, _
	                               ByVal CompareString As String) As Integer
		' This function counts the number of times
		' a word is found in a sentence.
		' If the word is found, the function is called recursively on
		' the remainder of the string.
		' An empty CompareString would match at every position, so stop early.
		If Strings.Len(CompareString) = 0 Then Return 0
		' InStr is 1-based and returns 0 when nothing is found.
		Dim strloc As Integer = Strings.InStr(StartingPoint, SourceString, CompareString)
		If strloc <> 0 Then
			CountInString = 1 + CountInString(strloc + Strings.Len(CompareString), _
			                                  SourceString, CompareString)
		End If
	End Function
and put this in one of the button click events:
VB.NET:
		Dim fString As String = "visual basic dot net"
		Dim cString As String = "visual source basic code"
		Dim WordArray() As String = fString.Split(" "c)
		Dim matches As Integer = 0
		' Count the source words that occur somewhere in the target string.
		' Note: this is a substring test, so "net" would also match "internet".
		For i As Integer = 0 To WordArray.GetUpperBound(0)
			If CountInString(1, cString, WordArray(i)) > 0 Then
				matches += 1
			End If
		Next
		' "/" is floating-point division in VB.NET, so 2 / 4 * 100 gives 50.
		MessageBox.Show((matches / WordArray.Length * 100).ToString() & "%")
You may want to rewrite the CountInString function, since it uses recursion and could consume a lot of stack memory when a string contains many matches. Hope this helps.
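One way to drop the recursion is a plain loop over String.IndexOf. This is a minimal sketch, not a drop-in replacement: the name CountOccurrences is made up here, it uses 0-based positions instead of InStr's 1-based ones, and it assumes an ordinal (binary) comparison.

```vbnet
Imports System

Module CountSketch
    ' Iterative counterpart to CountInString: counts non-overlapping
    ' occurrences of compareString in sourceString without recursion.
    Function CountOccurrences(ByVal sourceString As String, ByVal compareString As String) As Integer
        If String.IsNullOrEmpty(sourceString) OrElse String.IsNullOrEmpty(compareString) Then
            Return 0
        End If

        Dim count As Integer = 0
        Dim position As Integer = sourceString.IndexOf(compareString, StringComparison.Ordinal)
        Do While position >= 0
            count += 1
            ' Continue searching just past the match that was found.
            Dim nextStart As Integer = position + compareString.Length
            position = sourceString.IndexOf(compareString, nextStart, StringComparison.Ordinal)
        Loop
        Return count
    End Function

    Sub Main()
        Console.WriteLine(CountOccurrences("banana", "an")) ' prints 2
        Console.WriteLine(CountOccurrences("visual source basic code", "basic")) ' prints 1
    End Sub
End Module
```

Since the button click code only checks whether the count is greater than zero, a single IndexOf call tested with `>= 0` would also be enough there.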
 