Generate Simple Anagrams

This program shows different methods for generating anagrams that should be (somewhat) easy to solve. The methods are the n-step method and the shuffle method, both explained under Result below. The generated anagrams are ranked by two measures: n-gram distance and edit distance.

n-gram distance

The n-gram distance compares the n-grams of the original word with the n-grams of the anagrammed (shuffled) word. The more n-grams the two words share, the better. For both words it uses all possible n-grams, from 2-grams up to (word length - 1)-grams. E.g. for the word "anagram" the n-grams used are "an", "na", "ag", "gr", "ra", "am", "ana", "nag", ... "ram", "anag", ... "gram", and so on up to the 6-grams "anagra" and "nagram". The distance is defined as the number of n-grams that are the same as in the original word, divided by the maximal number of n-grams for the word. This means that the distance ranges from 0 (no likeness between the words at all) to 1 (same word).
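
The n-gram distance described above can be sketched in Python roughly as follows. This is only an illustration, not the program's actual code: the function names are made up here, and it assumes that repeated n-grams are counted once per occurrence and that the "maximal number of n-grams" is the total number of n-grams of the original word.

```python
from collections import Counter

def ngrams(word, min_n=2):
    """Count all n-grams of word for n from min_n to len(word) - 1."""
    return Counter(
        word[i:i + n]
        for n in range(min_n, len(word))
        for i in range(len(word) - n + 1)
    )

def ngram_distance(original, anagram):
    """Share of the original's n-grams that also occur in the anagram.

    Assumption: multiset counting; 1.0 means same word, 0.0 means no
    n-gram in common.
    """
    orig = ngrams(original)
    total = sum(orig.values())
    if total == 0:
        return 0.0  # words shorter than 3 letters have no usable n-grams
    shared = sum((orig & ngrams(anagram)).values())
    return shared / total
```

With these assumptions, "anagram" has 6 + 5 + 4 + 3 + 2 = 20 n-grams, and comparing a word with itself gives 1.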

Edit distance

The (Levenshtein) edit distance is a common metric for comparing two words, i.e. how similar they are. It seems that this metric is not very good at predicting how easily we recognize the word (i.e. solve the anagram), but this is only a subjective impression and has no scientific backing.
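
For reference, the Levenshtein distance can be computed with the usual dynamic-programming table. The sketch below is a textbook version written for this page, not necessarily what the program runs; the function name is chosen here for illustration.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) ca
            ))
        previous = current
    return previous[-1]
```

Note that swapping two different letters always costs 2 with this metric, since the plain Levenshtein distance has no transposition operation.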

Shuffling

Please note that the (random) shuffling part may take some time. The number of shuffled words is 500.
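
Generating the 500 shuffled words could look something like the sketch below. It is an assumption-laden illustration: the function name and the `seed` parameter are invented here, and whether the program removes duplicates or the unshuffled word itself is not stated, so this version keeps everything.

```python
import random

def shuffled_anagrams(word, count=500, seed=None):
    """Return count random permutations of the letters in word.

    Assumption: duplicates and the original word are not filtered out.
    """
    rng = random.Random(seed)  # seed only makes runs repeatable
    result = []
    for _ in range(count):
        letters = list(word)
        rng.shuffle(letters)
        result.append("".join(letters))
    return result
```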

See also: A program that uses a different approach to approximately the same problem is Reading scrambled words; possibly also Generate spelling errors.

Note: This program was announced on my (Swedish) blog: Skapa enkla anagram, where there may be some more info.

Word:
Max steps:
Number to show (max 100):
Random word: no / yes
Language (for random word): Swedish / English

Result

Word: establishment
Max steps: 1
Random word: no

n-step Method

Result by the n-step method. This is for max steps = 1. Showing the first 20 words.
The numbers and characters in parentheses are the specific steps.
(5 <-> 6 : f <-> l edit dist: 2 ngram dist: 0.39 pct same positions: 0.79)
means that the 5th and 6th characters were transposed, which were the letters "f" and "l". The edit distance was 2 and the ngram distance 0.39, and the anagram has 79% of its characters in the same positions as the source word.
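
A single step of this kind, and the "pct same positions" figure, can be sketched as below. This is a guess at the mechanics rather than the program's code: both function names are hypothetical, positions are 0-based Python indices (how the program numbers positions is not stated), and the choice of which positions to swap is left to the caller.

```python
def transpose(word, i, j):
    """Return word with the characters at positions i and j swapped."""
    letters = list(word)
    letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)

def pct_same_positions(original, anagram):
    """Fraction of positions holding the same letter in both words."""
    same = sum(a == b for a, b in zip(original, anagram))
    return same / len(original)

step = transpose("establishment", 4, 5)  # swaps "b" and "l"
print(step, round(pct_same_positions("establishment", step), 2))
```

Applying up to "max steps" such transpositions in sequence would give the n-step anagrams.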

Edit distance sort

The words generated by the n-step method, sorted by (Levenshtein) edit distance. Showing the first (best) 20 words, with the edit distance for each anagram. Also: "step pos" is the position of the word in the n-step generation, i.e. in which turn that anagram was generated. "ngram dist" is the ngram distance (see above).

Ngram distance sort

The words generated by the n-step method, sorted by ngram-distance (see above), showing the best 20 words. The ngram distance is also shown.
Also shows the edit distance and the position of the word in the generation of n-step anagrams.

Shuffle method

The shuffle method just creates 500 random anagrams (permutations) based on the word. Here they are sorted by edit distance (see above), showing the first (best) 20 words.
The same shuffled words as above, but now sorted by ngram distance. Showing the first (best) 20 words. The edit distance and ngram distance are also shown.
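
Picking the best 20 candidates under some score can be sketched like this. The helper names are invented for illustration, and the score used in the example (number of mismatched positions) is only a stand-in so the block runs on its own; an edit distance would be ranked lowest-first, while the ngram distance, where 1 means "same word", would be ranked highest-first.

```python
import heapq

def best_candidates(candidates, score, limit=20, higher_is_better=False):
    """Return the limit best candidates according to score."""
    pick = heapq.nlargest if higher_is_better else heapq.nsmallest
    return pick(limit, candidates, key=score)

def mismatches(original):
    """Stand-in score: count positions where a candidate differs."""
    return lambda cand: sum(a != b for a, b in zip(original, cand))

words = ["tesa", "etsa", "aset", "east"]
print(best_candidates(words, mismatches("east"), limit=2))
```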


Created by Hakan Kjellerstrand hakank@bonetmail.com