/* Isomorphic words in Picat. http://www.hakank.org/isomorphic_words/index.html """ This program shows the isomorphic words (sometimes also called "cryptograms") for a given input word. It is done by comparing the "word structure" of the target word and the structure of the words in a word list (English or Swedish). The program was presented in my Swedish blog post "Isomorfa ord (Isomorphic Words)" [http://www.hakank.org/webblogg/archives/000659.html]. The basic principle is that two words are isomorphic if they have the same lists of character occurrences. Example: The word "kjellerstrand" (my last name) has the character occurrence list [1, 1, 1, 1, 1, 1, 1, 2, 2, 2], which means that there are three characters which occurs 2 times (the characters e, l, and r), and seven characters which has one occurrence (the rest). Using an English word list with about 128000 words, there are quite a few words (453) that are isomorphic to "kjellerstrand", for example: "appreciations", "astonishingly" and "wonderfulness". The case of characters matters, i.e. the word "isomorphic" (two lower case i) has not the same word structure as "Isomorphic" (one capital I). Note that the order or position of the characters are not important here. But see Exact isomorphism below, where the exact position is important. Exact isomorphism The basic principle stated above, is that the order of the characters are not important. But the option exact isomorphism makes it possible to compare words where the exact positions of the characters should be compared. Two words with the same structure in this stricter sense, I call exact isomorphic. Example: The word "abba" consists of two different characters ("a" and "b") where the first and last character are the same, and the two mid characters are the same. There are 8 words in the word list with this form, i.e. which are exact isomorphic to "abba", is: "anna", "noon", "otto". The word structure is quite different from "plain" isomorphism. For abba it is [0,1,1,0], which means that the two first characters (index 0 and 1, starting from 0) is different, then the same characters as index 1 (the second character), and at the end the same character as the first. The exact isomorph structure of "kjellerstrand" is [0, 1, 2, 3, 3, 2, 6, 7, 8, 6, 10, 11, 12]; though there are no exact isomorphism for this word in the wordlist. With this option there is much less probability of isomorphisms between words. By the way, the words "probability" and "overwriting" are exact isomorphic. And there are 5 exact isomorphisms to "isomorphic": "economized", "economizer", "economizes", "eliminated", and "eliminates". Print mappings This option states if the mapping between the characters of two isomorphic words should be shown. The default is to show the mapping. See below for some examples of the mappings. If the word consists of only unique characters, however, no mapping will never be shown, since all characters in the word are interchangeable. In these cases, all the exact isomorphic words are also "plain" isomorphic words. Output of the program If there are any isomorphic words, they will be shown in a list along with a mapping between the characters from the input word to the isomorphic word. Example of a mapping: The word "hakank" is isomorphic to "visits". It will be show in this way: visits 1 char: [hn] -> [tv] 2 chars: [ak] -> [is] The two lines with "->" are the characters for mapping the input word ("hakank") to the other word ("visits"). The exact interpretation depends of it is an exact isomorphism or not. Here is shown where there are not search for exact isomorphism. For not exact isomorphisms, the characters left of the "->" are interchangeable with each other, as well as the characters on the right. The example above means that either "h" and "n" can be mapped to either "t" and "v", and "a" and "k" can be mapped to either "i" or "s". The notation of [..] is somewhat related to character classes in regular expressions. However, if it was a search for exact isomorphism the characters are not interchangeable with each other. The word "andersson" is exact isomorphic to "unwritten", and is represented like this: unwritten 1 char: "aedor" -> "eiruw" 2 chars: "ns" -> "nt" This means that "a" (left side) is mapped to "e" (right side), "e" (left side) is mapped to "i" (right side),... and "s" to "t". Some notes * The list of isomorphic words can be quite large. * The word mapping shown may or may not be unique. The program just shows the first mapping that comes to mind. * Space between two words in the input string counts as a character. This means that you can check for words isomorphic to a name. Space first or last are trimmed away, though. * If you just want to study some specific structure, you can use a string which are not a real word. E.g. if you, for some reason, want 7-letter words that are constructed with all unique characters, you can use "abcdefg" as the input word; and there are many such words. Example: there are 24 "abcabc"-like word: e.g. "murmur", "atlatl", "tartar". * The program don't care about "successors" in the alphabet, i.e. the fact that "a" is the character just before "b". This is, however, a feature that may be implemented in the future. * If the input word itself is in the word list it will be shown and counted as a isomorphic word. """ This model was created by Hakan Kjellerstrand, hakank@gmail.com See also my Picat page: http://www.hakank.org/picat/ */ main => go. go ?=> % [cohosh,detect,detent,detest,filial,minion,pinion,suburb] Exact = true, PrintMappings = true, Word = "hakank", WordList = "unixdict.txt", Words = read_file_lines(WordList), Res = isomorphic_words(Word,Exact,PrintMappings,Words), println(Res), println(count=Res.len), nl. go => true. go2 ?=> % [murmur,tartar,testes] Exact = true, PrintMappings = true, Word = "abcabc", WordList = "unixdict.txt", Words = read_file_lines(WordList), Res = isomorphic_words(Word,Exact,PrintMappings,Words), println(Res), println(count=Res.len), nl. go2 => true. go3 ?=> % [outguessing,programming,progressing] Exact = true, PrintMappings = true, Word = "programming", WordList = "allwords3.txt", Words = read_file_lines(WordList), Res = isomorphic_words(Word,Exact,PrintMappings,Words), println(Res), println(count=Res.len), nl. go3 => true. /* An example: word = yuwoawryt [word = yuwoawryt,exact = true,isoList = [0,1,2,3,4,2,6,0,8],isoMap = (map)[u = 1,r = 1,t = 1,o = 1,a = 1,y = 2,w = 2]] Matching words: w = avuncular [a,o,r,t,u,w,y] => [c,n,l,r,v,u,a] w = isotropic [a,o,r,t,u,w,y] => [r,t,p,c,s,o,i] w = watertown [a,o,r,t,u,w,y] => [r,e,o,n,a,t,w] [avuncular,isotropic,watertown] count = 3 */ go4 ?=> Exact = true, PrintMappings = true, _ = random2(), Alpha = "abcdefghijklmnopqrstuvwxyz", N=9, Word = [Alpha[1+random mod Alpha.len]: _ in 1..N], println(word=Word), Word.len != Word.remove_dups.len, % must be distinct WordList = "unixdict.txt", % WordList = "words_lower.txt", % WordList = "allwords3.txt", % WordList = "sv_spelling_org_utf8.txt", Words = read_file_lines(WordList), Res = isomorphic_words(Word,Exact,PrintMappings,Words), println(Res), println(count=Res.len), nl. go4 => true. % % Check all words in a wordlist. % Skip those with just unique letters. % go5 ?=> Exact = true, PrintMappings = false, WordList = "unixdict.txt", % WordList = "words_lower.txt", % WordList = "allwords3.txt", % WordList = "sv_spelling_org_utf8.txt", Words = read_file_lines(WordList), foreach(Word in Words) if Word.length != Word.remove_dups.length then println(word=Word), Res = isomorphic_words(Word,Exact,PrintMappings,Words), println(Res), nl end end, nl. go5 => true. % % Check what words are isomorphic to the word Word % given the words in the word list Words. % - Exact: if true then the structure is the same, % if false it don't have to be the same structure, just similar % - PrintMapping: if true, print the structure % isomorphic_words(Word,Exact,PrintMappings,Words) = Res => Len = Word.len, [IsoList,IsoMap] = make_isomorphism(Word), println([word=Word,exact=Exact,isoList=IsoList,isoMap=IsoMap]), Count = 0, println("\nMatching words:"), Res1 = [], foreach(W in Words,length(W) == Len) [L,M] = make_isomorphism(W), Found = false, if Exact then if L == IsoList then Found := true, Count := Count + 1 end else if map_eq(M,IsoMap) then Found := true, Count := Count + 1 end end, if Found == true then Res1 := Res1 ++ [W], if PrintMappings then println(w=W), MM = new_map(), foreach(I in L) MM.put(Word[I+1],W[I+1]) end, MMList = MM.to_list.sort, writef("%w => %w\n",[K : K=_ in MMList],[V : _=V in MMList]), nl end end end, Res = Res1. % % Convert a word to a word mapping % - as a list (L), used when Exact == true % - as a hash map (Map), use when Exact == false % make_isomorphism(Word) = [L,Map] => M=new_map(), L1 = [], Map = new_map(), foreach({C,I} in zip(Word,0..Word.len-1)) if M.has_key(C) then L1 := L1 ++ [M.get(C)] else L1 := L1 ++ [I], M.put(C,I) end, Map.put(C,Map.get(C,0)+1) end, L = L1. % % Check if the two maps M1 and M2 are equal regarding % occurrences of the values. % map_eq(M1,M2) => M1.values.sort == M2.values.sort.