
March 19, 2004

Lord Of The Strings

The two articles Lord Of The Strings Part 1 and Lord Of The Strings Part 2 analyze which natural language is closest to Tolkien's invented languages. Database queries and program code accompany the articles.


As a developer, I was thinking about an algorithmic approach to the problem. My idea was to write a program that takes each Tolkien word in turn and finds which real language has the word which is most similar. By inspecting the number of times each language is chosen, we should be able to decide which language was Tolkien’s biggest influence. Of course I would need to look on the Web to find lists of Tolkien words, as well as word lists for other languages, but I assumed that wouldn’t be a problem. My own string similarity metric could be used for the word-by-word comparison, and is a good choice because it acknowledges similarity for a common substring of any size, and is robust to differences in string size. Of course this would be a comparison of lexical similarity, as my string similarity algorithm makes only lexical comparisons. It is still possible that the inspiration for the grammar and the lexical structure of Tolkien’s languages came from entirely different sources.
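To make the idea in the quote concrete, here is a minimal sketch of the word-by-word comparison in Python. It is not the author's code. The excerpt does not spell out the similarity metric, so the sketch assumes a Dice coefficient over adjacent letter pairs as a stand-in. The function names (`letter_pairs`, `similarity`, `closest_language`, `tally_influences`) and the shape of the word lists are hypothetical, chosen only for illustration.

```python
from collections import Counter


def letter_pairs(word):
    """Return the adjacent letter pairs of a word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def similarity(a, b):
    """Dice coefficient over letter pairs (an assumed stand-in metric).

    Returns a value between 0.0 (no shared pairs) and 1.0 (same pairs).
    """
    pairs_a = Counter(letter_pairs(a))
    pairs_b = Counter(letter_pairs(b))
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        return 0.0
    # Multiset intersection: each shared pair counts once per occurrence.
    shared = sum((pairs_a & pairs_b).values())
    return 2.0 * shared / total


def closest_language(word, word_lists):
    """Return the language whose word list holds the best match for word.

    word_lists maps a language name to a list of words. On a tie, the
    language listed first wins, which is a simplification of this sketch.
    """
    best_lang, best_score = None, -1.0
    for lang, words in word_lists.items():
        score = max((similarity(word, w) for w in words), default=0.0)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


def tally_influences(tolkien_words, word_lists):
    """Count how many times each language is chosen as the closest one."""
    return Counter(closest_language(w, word_lists) for w in tolkien_words)
```

Calling `tally_influences(tolkien_words, word_lists)` with a list of Tolkien words and a dictionary of per-language word lists gives a count per language, and the language with the highest count would be the candidate for the biggest lexical influence. With real word lists the inner loop is a full pairwise comparison, so it gets slow quickly, which may be one reason the original articles also ship database queries.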
...
Conclusions

When I started this investigation, I had no idea what the result would be. I just clung firmly onto the belief that my string similarity metric, together with a simple algorithm to iterate over the set of possible word pair comparisons, would provide an interesting result. In fact, the results are very satisfying. I found that English had a profound effect on Tolkien's invented languages, with perhaps further influences from Hungarian and Spanish. This is satisfying because it is entirely reasonable (at least the part about English!), though not exactly what I expected after reading about the (apparently unfounded) claims for the influences of Finnish. It is also satisfying because it increases my confidence in the string similarity method. And as developers, we like to have confidence in our methods.

See also Matching Strings and Algorithms by the same author (Simon White).


(Thanks, Ulf!)

Posted by hakank at March 19, 2004 01:13 AM Posted to Statistics/data analysis