Generate spelling errors
This program generates spelling errors in a text using the four character-change operators insert, delete, transpose and replace. It may or may not generate realistic output. There are some parameters to test and tweak; see below. More about this program is in the (Swedish) blog post Skapa stavfel ("Making spelling errors"). If you have questions or other comments about the program, please mail me.
One of the things to study is how readable the text is with the spelling errors. Also see my program Reading scrambled words for a different way of scrambling words.
Test this program with an English text.
Type a text in the text area below and change the parameters. Then click "OK" to proceed. The scrambled text will be shown below. How readable is it?
Ec veytenskaplig undersökhing gjrd vdd tet universtiet i Enguand hvr vfisat tat uttfall dee wvå vörsta ochl dei vå sist brkstäverna i ahlla roden i ne tmext rä riktisgt placeraze, sxelar de ilten koll i vielken ordningföljd ed vriga bockstäverna i oden kmomer. Txten ä fpllt läsbai t.o.m. em ed adra bokspäverna koxmer hulleäombuller! Detsa ftersom iv nite läsexr vaje enskil bokstavq, ujan er blden v oredt sm hehlet.
Probability of spelling errors: 0.5
Probability of transpose: 0.25 (real: 0.22)
Probability of delete: 0.25 (real: 0.27)
Probability of insert: 0.25 (real: 0.20)
Probability of replace: 0.25 (real: 0.31)
Sum of probabilities of transpose, delete, insert and replace: 1
Max number of changes per word: 1
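The panel above lists each operator's configured probability together with a "real" value. A plausible reading (the page does not say) is that "real" is the share of changes of each type actually made in the last run; the four real values above do add up to 1.00. The Python sketch below computes such shares from a hypothetical log of applied operator names and checks that the configured probabilities sum to about 1. The function names and the tolerance of 0.01 are assumptions, not the program's own.

```python
from collections import Counter

OPERATORS = ("transpose", "delete", "insert", "replace")

def weights_sum_ok(weights, tol=0.01):
    """Return True if the configured operator probabilities add up to about 1."""
    return abs(sum(weights.values()) - 1.0) <= tol

def observed_shares(applied_ops):
    """Return each operator's share of the changes actually applied.

    applied_ops is a hypothetical log with one operator name per change made.
    """
    counts = Counter(applied_ops)
    total = sum(counts.values())
    if total == 0:
        # No changes were made, so every share is zero.
        return {op: 0.0 for op in OPERATORS}
    return {op: counts[op] / total for op in OPERATORS}

weights = {"transpose": 0.25, "delete": 0.25, "insert": 0.25, "replace": 0.25}
print(weights_sum_ok(weights))  # True

log = ["transpose"] * 2 + ["delete"] * 3 + ["insert"] + ["replace"] * 4
print(observed_shares(log))
```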
Probability of spelling error
The probability that a given word in the text will get a spelling error. A value of 0 means that no errors are generated at all; 1 means that every word gets at least one change. The probability is tested separately for each word in the text, so with the default value of 0.5, on average about every other word will be changed.
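The per-word test can be sketched like this in Python, assuming one uniform random number is drawn per word and compared with the probability (the function and variable names are hypothetical). Since `random.random()` returns a value in [0, 1), a probability of 0 never selects a word and a probability of 1 always does.

```python
import random

def pick_words_to_change(words, p_error, rng):
    """Decide independently for each word whether it gets a spelling error.

    Each word is tested once with probability p_error, so on average
    about p_error * len(words) words are selected.
    """
    return [rng.random() < p_error for _ in words]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
words = "the quick brown fox jumps over the lazy dog".split()
flags = pick_words_to_change(words, 0.5, rng)
print(sum(flags), "of", len(words), "words selected")
```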
There are no constraints on where in a word a change is made: the positions are simply randomized. For each of the four operators it is possible to set a probability (from 0 to 1) that this type of error will occur. These probabilities should add up to (about) 1, or else something unforeseen may happen.
- insert: insert a character somewhere in a word
- delete: delete a random character from a word
- transpose: swap two (near) characters in a word
- replace: replace one character with another (randomly selected) character.
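A minimal Python sketch of the four operators, each applied once at a random position. It assumes that "(near)" in transpose means two adjacent characters and that new characters are drawn from "a" to "z"; note that replace can occasionally pick the same character as before, leaving the word unchanged. The function names are hypothetical, not the program's own.

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def insert_char(word, rng, alphabet=LETTERS):
    """Insert a random character from alphabet at a random position (0..len)."""
    pos = rng.randint(0, len(word))
    return word[:pos] + rng.choice(alphabet) + word[pos:]

def delete_char(word, rng):
    """Delete the character at a random position."""
    if not word:
        return word
    pos = rng.randrange(len(word))
    return word[:pos] + word[pos + 1:]

def transpose_chars(word, rng):
    """Swap a random character with its right-hand neighbour (assumed meaning of 'near')."""
    if len(word) < 2:
        return word
    pos = rng.randrange(len(word) - 1)
    return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]

def replace_char(word, rng, alphabet=LETTERS):
    """Replace the character at a random position with a random one from alphabet."""
    if not word:
        return word
    pos = rng.randrange(len(word))
    return word[:pos] + rng.choice(alphabet) + word[pos + 1:]

rng = random.Random(7)
for op in (insert_char, delete_char, transpose_chars, replace_char):
    print(op.__name__, op("spelling", rng))
```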
If you just want to study (say) transposes, set the transpose probability to 1 (one) and the others to 0 (zero).
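Choosing which operator to apply can be sketched as a weighted random choice. This assumes the program picks the operator for each change with probability proportional to the configured values; the names below are hypothetical. With the transpose weight at 1 and the others at 0, only transposes are ever chosen:

```python
import random

OPERATORS = ["transpose", "delete", "insert", "replace"]

def choose_operator(weights, rng):
    """Pick one operator name with probability proportional to its weight."""
    return rng.choices(OPERATORS, weights=[weights[op] for op in OPERATORS], k=1)[0]

rng = random.Random(0)
only_transpose = {"transpose": 1.0, "delete": 0.0, "insert": 0.0, "replace": 0.0}
print({choose_operator(only_transpose, rng) for _ in range(1000)})  # {'transpose'}
```

Since `random.choices` normalizes the weights, this sketch also copes with weights that do not sum to exactly 1 (if all weights are 0, the call fails); the original program may behave differently in that case.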
For the insertion operator, either the English or the Swedish character set may be used.
The only characters that may be inserted are the lowercase letters "a" to "z" (for both languages) and "å", "ä" and "ö" (for Swedish only); uppercase letters are never inserted. The option "Just letters in the word" inserts only letters that already occur in the word.
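The choice of insertion characters can be sketched as follows. The option names "english", "swedish" and "word" are hypothetical stand-ins for the three settings on the page; that the word's letters are lowercased, and that a word without letters falls back to "a" to "z", are also assumptions.

```python
import random

ENGLISH = "abcdefghijklmnopqrstuvwxyz"
SWEDISH = ENGLISH + "åäö"

def insertion_alphabet(word, charset):
    """Return the lowercase characters allowed for insertion into word.

    charset is "english", "swedish" or "word" (hypothetical option names).
    """
    if charset == "english":
        return ENGLISH
    if charset == "swedish":
        return SWEDISH
    if charset == "word":
        # Assumption: letters already in the word, lowercased, without duplicates.
        letters = sorted({c.lower() for c in word if c.isalpha()})
        # Assumption: fall back to a-z if the word contains no letters at all.
        return "".join(letters) or ENGLISH
    raise ValueError(f"unknown charset: {charset!r}")

def insert_char(word, alphabet, rng):
    """Insert one character from alphabet at a random position in word."""
    pos = rng.randint(0, len(word))
    return word[:pos] + rng.choice(alphabet) + word[pos:]

rng = random.Random(3)
print(insertion_alphabet("Hello", "word"))  # ehlo
print(insert_char("hej", insertion_alphabet("hej", "swedish"), rng))
```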
Maximum number of errors
For a word that is to be changed, there may be more than one change. Set this parameter to the maximum number of changes per word. The actual number of changes is a random value between 1 and this maximum. Note that the result may be unrealistic: e.g. transposing within a word a couple of times is not very likely in real life. At most 10 changes per word can be generated.
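Drawing the number of changes can be sketched as below, assuming a uniform draw between 1 and the configured maximum, capped at the page's limit of 10. The replace operator serves only as an example change, and all names are hypothetical.

```python
import random

MAX_CHANGES_LIMIT = 10  # the page states that at most 10 changes per word are generated

def number_of_changes(max_changes, rng):
    """Draw the number of changes uniformly from 1..max_changes, capped at 10."""
    upper = max(1, min(max_changes, MAX_CHANGES_LIMIT))
    return rng.randint(1, upper)

def replace_char(word, rng, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Example operator: replace a random character with a random letter."""
    if not word:
        return word
    pos = rng.randrange(len(word))
    return word[:pos] + rng.choice(alphabet) + word[pos + 1:]

def apply_changes(word, max_changes, operator, rng):
    """Apply operator to word a random number of times (1..max_changes)."""
    for _ in range(number_of_changes(max_changes, rng)):
        word = operator(word, rng)
    return word

rng = random.Random(11)
print(apply_changes("spelling", 3, replace_char, rng))
```

With a maximum of 1 (the default shown above), `number_of_changes` always returns 1, so each selected word gets exactly one change.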
There may be some comments in my blog post announcing this program.
Also see the related programs Reading scrambled words and Nearest words.
Created by Hakan Kjellerstrand email@example.com