/* 

  Isomorphic words in Picat.

  http://www.hakank.org/isomorphic_words/index.html
  """
  This program shows the isomorphic words (sometimes also called "cryptograms") for 
  a given input word. It is done by comparing the "word structure" of the target word 
  and the structure of the words in a word list (English or Swedish). The program 
  was presented in my Swedish blog post "Isomorfa ord (Isomorphic Words)" 
  [http://www.hakank.org/webblogg/archives/000659.html].

  The basic principle is that two words are isomorphic if they have the same lists of 
  character occurrences. Example: The word "kjellerstrand" (my last name) has the character 
  occurrence list [1, 1, 1, 1, 1, 1, 1, 2, 2, 2], which means that there are three 
  characters which occurs 2 times (the characters e, l, and r), and seven characters 
  which has one occurrence (the rest).

  Using an English word list with about 128000 words, there are quite a few 
  words (453) that are isomorphic to "kjellerstrand", for example: "appreciations", 
  "astonishingly" and "wonderfulness".

  The case of characters matters, i.e. the word "isomorphic" (two lower case i) has 
  not the same word structure as "Isomorphic" (one capital I).

  Note that the order or position of the characters are not important here. 
  But see Exact isomorphism below, where the exact position is important.

  Exact isomorphism
  The basic principle stated above, is that the order of the characters are not important. 
  But the option exact isomorphism makes it possible to compare words where the exact 
  positions of the characters should be compared. Two words with the same structure 
  in this stricter sense, I call exact isomorphic.

  Example: The word "abba" consists of two different characters ("a" and "b") where the 
  first and last character are the same, and the two mid characters are the same. 
  There are 8 words in the word list with this form, i.e. which are exact 
  isomorphic to "abba", is: "anna", "noon", "otto".

  The word structure is quite different from "plain" isomorphism. For 
  abba it is [0,1,1,0], which means that the two first characters (index 0 and 1, 
  starting from 0) is different, then the same characters as index 1 
  (the second character), and at the end the same character as the first. 
  The exact isomorph structure of "kjellerstrand" is 
  [0, 1, 2, 3, 3, 2, 6, 7, 8, 6, 10, 11, 12]; though there are no exact 
  isomorphism for this word in the wordlist.

  With this option there is much less probability of isomorphisms between words.
  By the way, the words "probability" and "overwriting" are exact isomorphic. And 
  there are 5 exact isomorphisms to "isomorphic": 
  "economized", "economizer", "economizes", "eliminated", and "eliminates".

  Print mappings
  This option states if the mapping between the characters of two isomorphic words 
  should be shown. The default is to show the mapping. See below for some examples 
  of the mappings.

  If the word consists of only unique characters, however, no mapping will never be 
  shown, since all characters in the word are interchangeable. In these cases, all 
  the exact isomorphic words are also "plain" isomorphic words.

  Output of the program
  If there are any isomorphic words, they will be shown in a list along with a 
  mapping between the characters from the input word to the isomorphic word.

  Example of a mapping: The word "hakank" is isomorphic to "visits". It will be show in this way:

    visits
        1 char: [hn] -> [tv]
        2 chars: [ak] -> [is] 

  The two lines with "->" are the characters for mapping the input word ("hakank") 
  to the other word ("visits"). The exact interpretation depends of it is an exact 
  isomorphism or not. Here is shown where there are not search for exact isomorphism.

  For not exact isomorphisms, the characters left of the "->" are interchangeable 
  with each other, as well as the characters on the right. The example above means 
  that either "h" and "n" can be mapped to either "t" and "v", and "a" and "k" 
  can be mapped to either "i" or "s". The notation of [..] is somewhat related to 
  character classes in regular expressions.

  However, if it was a search for exact isomorphism the characters are not interchangeable 
  with each other. The word "andersson" is exact isomorphic to "unwritten", and is 
  represented like this:

    unwritten
        1 char: "aedor" -> "eiruw"
        2 chars: "ns" -> "nt" 

  This means that "a" (left side) is mapped to "e" (right side), "e" (left side) 
  is mapped to "i" (right side),... and "s" to "t".

  Some notes

    * The list of isomorphic words can be quite large.
    * The word mapping shown may or may not be unique. The program just shows the first 
      mapping that comes to mind.
    * Space between two words in the input string counts as a character. This means that 
      you can check for words isomorphic to a name. Space first or last are trimmed away, though.
    * If you just want to study some specific structure, you can use a string which are 
      not a real word. E.g. if you, for some reason, want 7-letter words that are 
      constructed with all unique characters, you can use "abcdefg" as the input word; 
      and there are many such words. Example: there are 24 "abcabc"-like word: 
      e.g. "murmur", "atlatl", "tartar".

   * The program don't care about "successors" in the alphabet, i.e. the fact that "a" 
     is the character just before "b". This is, however, a feature that may be 
     implemented in the future.
   * If the input word itself is in the word list it will be shown and counted as a isomorphic word. 
  """

  This model was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Picat page: http://www.hakank.org/picat/

*/

main => go.

go ?=>
  % [cohosh,detect,detent,detest,filial,minion,pinion,suburb]
  Exact = true,
  PrintMappings = true,
  Word = "hakank",
  
  WordList = "unixdict.txt",
  Words = read_file_lines(WordList),
  Res = isomorphic_words(Word,Exact,PrintMappings,Words),
  println(Res),
  println(count=Res.len),
  nl.
go => true.

go2 ?=>
  % [murmur,tartar,testes]
  Exact = true,
  PrintMappings = true,
  Word = "abcabc",
  
  WordList = "unixdict.txt",
  Words = read_file_lines(WordList),
  Res = isomorphic_words(Word,Exact,PrintMappings,Words),
  println(Res),
  println(count=Res.len),
  nl.
go2 => true.


go3 ?=>
  % [outguessing,programming,progressing]
  Exact = true,
  PrintMappings = true,
  Word = "programming",
  
  WordList = "allwords3.txt",
  Words = read_file_lines(WordList),
  Res = isomorphic_words(Word,Exact,PrintMappings,Words),
  println(Res),
  println(count=Res.len),
  nl.
go3 => true.

/*
  An example:

  word = yuwoawryt
  [word = yuwoawryt,exact = true,isoList = [0,1,2,3,4,2,6,0,8],isoMap = (map)[u = 1,r = 1,t = 1,o = 1,a = 1,y = 2,w = 2]]

  Matching words:
  w = avuncular
  [a,o,r,t,u,w,y] => [c,n,l,r,v,u,a]

  w = isotropic
  [a,o,r,t,u,w,y] => [r,t,p,c,s,o,i]

  w = watertown
  [a,o,r,t,u,w,y] => [r,e,o,n,a,t,w]

  [avuncular,isotropic,watertown]
  count = 3

*/
go4 ?=>
  Exact = true,
  PrintMappings = true,
  _ = random2(),
  Alpha = "abcdefghijklmnopqrstuvwxyz",
  N=9,
  Word = [Alpha[1+random mod Alpha.len]: _ in 1..N],
  println(word=Word),
  Word.len != Word.remove_dups.len,  % must be distinct
  
  WordList = "unixdict.txt",
  % WordList = "words_lower.txt",
  % WordList = "allwords3.txt",
  % WordList = "sv_spelling_org_utf8.txt",
  Words = read_file_lines(WordList),
  Res = isomorphic_words(Word,Exact,PrintMappings,Words),
  println(Res),
  println(count=Res.len),
  nl.
go4 => true.

%
% Check all words in a wordlist.
% Skip those with just unique letters.
% 
go5 ?=>
  Exact = true,
  PrintMappings = false,
  
  WordList = "unixdict.txt",
  % WordList = "words_lower.txt",
  % WordList = "allwords3.txt",
  % WordList = "sv_spelling_org_utf8.txt",
  Words = read_file_lines(WordList),
  foreach(Word in Words)
    if Word.length != Word.remove_dups.length then
      println(word=Word),
      Res = isomorphic_words(Word,Exact,PrintMappings,Words),
      println(Res),
      nl
    end
  end,
  nl.
go5 => true.


%
% Check what words are isomorphic to the word Word
% given the words in the word list Words.
% - Exact: if true then the structure is the same,
%          if false it don't have to be the same structure, just similar
% - PrintMapping: if true, print the structure
%
isomorphic_words(Word,Exact,PrintMappings,Words) = Res =>
  Len = Word.len,
  [IsoList,IsoMap] = make_isomorphism(Word),
  println([word=Word,exact=Exact,isoList=IsoList,isoMap=IsoMap]),
  Count = 0,
  println("\nMatching words:"),
  Res1 = [],
  foreach(W in Words,length(W) == Len)
    [L,M] = make_isomorphism(W),
    Found = false,
    if Exact then
      if  L == IsoList then
        Found := true,
        Count := Count + 1
      end
    else
      if map_eq(M,IsoMap) then
         Found := true,
         Count := Count + 1
      end
    end,
    if Found == true then
      Res1 := Res1 ++ [W],
      if PrintMappings then
        println(w=W),
        MM = new_map(),
        foreach(I in L)
          MM.put(Word[I+1],W[I+1])
        end,
        MMList = MM.to_list.sort,
        writef("%w => %w\n",[K : K=_ in MMList],[V : _=V in MMList]),
        nl
      end
    end
  end,
  Res = Res1.


%
% Convert a word to a word mapping
% - as a list (L), used when Exact == true
% - as a hash map (Map), use when Exact == false
%
make_isomorphism(Word) = [L,Map] =>
   M=new_map(),
   L1 = [],
   Map = new_map(),
   foreach({C,I} in zip(Word,0..Word.len-1))
     if M.has_key(C) then
       L1 := L1 ++ [M.get(C)]
     else
       L1 := L1 ++ [I],
       M.put(C,I)
     end,
     Map.put(C,Map.get(C,0)+1)
   end,
   L = L1.

%
% Check if the two maps M1 and M2 are equal regarding
% occurrences of the values.
%
map_eq(M1,M2) =>
  M1.values.sort == M2.values.sort.