Nearest words
Edit distance is a way of measuring how far apart (or how "near") two words are.
Searching by this kind of nearness is often called "approximate search".
One of the most common methods is the Levenshtein distance; see for example
http://www.merriampark.com/ld.htm or, more generally, search for it on
Google.
The edit distance counts how many operations (deletion, insertion,
substitution) are needed to transform one word into another. E.g. the
edit distance between the word hakan and the word håkan is 1,
since we need one substitution ("å" is substituted for "a"). The distance between
hakan and kalle is 4: only the "a" in the second position matches, so the
other four letters must each be substituted.
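The counting described above can be sketched with the classic dynamic-programming formulation of the Levenshtein distance. This is a minimal Python sketch, not the code behind this page; the function name levenshtein is just an illustrative choice, and every operation is assumed to cost 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein distance between the strings a and b."""
    # previous[j] is the distance between the part of `a` handled so far
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("hakan", "håkan"))  # 1
```

Only two rows of the table are kept at a time, so memory use grows with the length of one word rather than with the product of both lengths.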
This program shows the words nearest to the searched word, using the
Levenshtein distance. It first shows the words at the nearest distance,
then the words at the next distance (i.e. nearest distance + 1). (If you don't type
in a word, the program will use kjellerstrand, my last name.)
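The lookup described above might be sketched roughly as follows. This is an assumption about how such a search could work, not the actual program: the names nearest_words and WORDS and the tiny word list are hypothetical, and a real word list would be far larger.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def nearest_words(query, words):
    """Return the words at the nearest distance and at nearest distance + 1."""
    by_distance = defaultdict(list)
    for word in words:
        by_distance[levenshtein(query, word)].append(word)
    if not by_distance:
        return []
    nearest = min(by_distance)
    return [(d, sorted(by_distance.get(d, []))) for d in (nearest, nearest + 1)]


# Hypothetical word list, for illustration only.
WORDS = ["kellerstrand", "kjellstrand", "kjellerstrands", "strand"]

for distance, group in nearest_words("kjellerstrand", WORDS):
    print(distance, group)
```

Grouping all words by distance first keeps the "nearest, then nearest + 1" step trivial, at the price of computing the distance to every word in the list.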
Some statistics of edit distances for Swedish and English
The longest distance I found in Swedish is 27, for the following word pairs:
- distriktsveterinärorganisationen, förhandsanmälningsskyldighet
- distriktsveterinärorganisationen, kommunikationsdepartementet
- distriktsveterinärorganisationen, försvarsområdesbefälhavaren
- industritjänstemannaförbundets, förhandsanmälningsskyldighet
- industritjänstemannaförbundets, kompletteringspropositionen
- industritjänstemannaförbundets, återbetalningsskyldigheten
For English the longest distance is 24:
- antidisestablishmentarianism, electroencephalography
- antidisestablishmentarianism, electroencephalograph
The next longest distance is 20, for the following word pairs:
- electroencephalography, straightforwardness
- electroencephalography, contradistinctions
- electroencephalography, mohammedanizations
- electroencephalograph, straightforwardness
- nondeterministically, straightforwardness
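How the statistics above were computed is not stated here. One straightforward way to find the farthest pairs in a word list is a brute-force comparison of every pair, sketched below; the name farthest_pairs and the three-word example list are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def farthest_pairs(words):
    """Return (longest distance, list of word pairs at that distance)."""
    best, pairs = -1, []
    for a, b in combinations(words, 2):
        d = levenshtein(a, b)
        if d > best:
            best, pairs = d, [(a, b)]   # new maximum: start a fresh list
        elif d == best:
            pairs.append((a, b))        # tie with the current maximum
    return best, pairs


print(farthest_pairs(["cat", "cot", "dog"]))  # (3, [('cat', 'dog')])
```

The number of pairs grows quadratically with the size of the word list, so for a full dictionary it would help to skip pairs early: the distance can never exceed the length of the longer word.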
A very nice approximate search program is agrep ("approximate grep").
Also see "Generate spelling errors", which is, in a way, an "inverted edit distance".
Created by Hakan Kjellerstrand hakank@bonetmail.com