Let’s find the repeated phrases in the English words. The words come from https://github.com/dwyl/english-words/blob/master/words_alpha.txt.
The words list looks like this:
...
aardvark
aardvarks
aardwolf
...
To dig into the letters, I separate the letters of each word with spaces and append a comma to the end of each word.
...
a a r d v a r k,
a a r d v a r k s,
a a r d w o l f,
...
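That reshaping step is small enough to sketch in a few lines of Python. This is only an illustration of the format shown above, not the script I actually ran; the helper name `to_scanner_line` is made up for this example.

```python
def to_scanner_line(word: str) -> str:
    """Put a space between the letters of a word and append a comma."""
    return " ".join(word.strip()) + ","


# Three words from the list, reshaped for the pattern scanner.
words = ["aardvark", "aardvarks", "aardwolf"]
lines = [to_scanner_line(w) for w in words]
print("\n".join(lines))
```

The comma acts as an end-of-word marker, so the scanner can later tell a suffix from a pattern that merely sits in the middle of a word.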
Now feed that into our favorite pattern scanner: the one we’ve been using for the last few months. Look for it on [raypereda.com].
Let’s start with the counts of the top 10 single letters:
length | count | repeated word parts |
---|---|---|
1 | 376454 | e |
1 | 370098 | , |
1 | 313006 | i |
1 | 295792 | a |
1 | 251597 | o |
1 | 251431 | n |
1 | 250280 | s |
1 | 246143 | r |
1 | 230894 | t |
1 | 194915 | l |
1 | 152980 | c |
Well, everybody knows that e is the most common letter in the English language. The count for the comma is equal to the number of words in the list, because every word gets exactly one comma. After that come i, a, and so on, in decreasing order of count.
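The single-letter counts are easy to reproduce without the pattern scanner. Here is a minimal sketch using Python's `collections.Counter`; the function name `count_tokens` and the three-word sample are assumptions for illustration, and the real run used the full word list.

```python
from collections import Counter


def count_tokens(words):
    """Count every letter, plus one comma per word as the end-of-word marker."""
    counts = Counter()
    for word in words:
        counts.update(word)  # adds one count per letter in the word
        counts[","] += 1     # one comma for each word
    return counts


sample = ["aardvark", "aardvarks", "aardwolf"]
counts = count_tokens(sample)

# The comma count always equals the number of words.
print(counts[","], len(sample))
print(counts.most_common(3))
```

Because each word contributes exactly one comma, the comma row in the table doubles as a word count for the whole list.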
Let’s dig into two letter patterns.
length | count | repeated word parts |
---|---|---|
2 | 75666 | s, |
2 | 66714 | er |
2 | 60886 | in |
2 | 56959 | e, |
2 | 49645 | ti |
2 | 47067 | on |
2 | 46930 | es |
2 | 44229 | te |
2 | 42057 | at |
2 | 40559 | al |
2 | 40042 | an |
So, we see that many words end with s. The obsession with singular and plural in English words is a social disease. It is almost as bad as the obsession with gender in Spanish words.
er is that super popular two-letter pattern, and very often a suffix. er is perhaps the most important concept in the English language: something-er is the thing that does something!
Then many words end in e. Boring! Many times the e is silent. That problem is not as bad as the silent letters in French, but still, this is a problem. One day when we switch to Esperanto, this nonsense will be fixed.
The other common two-letter patterns are in, ti, on, es, te, at, al, and an. Interesting.
Let’s dig into three letter patterns.
length | count | repeated word parts |
---|---|---|
3 | 23500 | ing |
3 | 22395 | ed, |
3 | 20147 | , un |
3 | 18672 | ng, |
3 | 17423 | es, |
3 | 16689 | ly, |
3 | 15213 | ati |
3 | 15092 | ess |
3 | 15054 | er, |
3 | 14153 | ion |
3 | 13942 | ter |
ing is a common ending of words, including the word ending. Wow. Many of these patterns happen at the ends of words: ed, ng, es, ly, and er. Very interesting. When reading these tables, keep in mind that repeated suffixes end with a comma and repeated prefixes start with a comma.
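To make the comma convention concrete, here is a rough sketch of counting fixed-length patterns over the comma-terminated word stream. It is a plain sliding window, not the pattern scanner itself, and it drops the spaces between letters for brevity; the names `count_patterns` and `kind` are hypothetical.

```python
from collections import Counter


def count_patterns(words, n):
    """Count every run of n symbols in the stream of comma-terminated words."""
    stream = "".join(word + "," for word in words)
    return Counter(stream[i:i + n] for i in range(len(stream) - n + 1))


def kind(pattern):
    """Classify a pattern by where its comma sits."""
    if "," in pattern[1:-1]:
        return "spans words"   # comma in the middle: crosses a word boundary
    if pattern.endswith(","):
        return "suffix"        # comma on the right: end of a word
    if pattern.startswith(","):
        return "prefix"        # comma on the left: start of the next word
    return "inside"            # no comma: could be anywhere in a word


sample = ["ending", "unending", "undo", "doing"]
threes = count_patterns(sample, 3)
fours = count_patterns(sample, 4)
print(threes[",un"], kind(",un"))
print(fours["ing,"], kind("ing,"))
```

Note that the length includes the comma, which is why ing, shows up among the four-symbol patterns rather than the three-symbol ones.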
Let’s dig into four letter patterns.
length | count | repeated word parts |
---|---|---|
4 | 18118 | ing, |
4 | 12103 | ess, |
4 | 10910 | tion |
4 | 9906 | ness |
4 | 8571 | ion, |
4 | 7639 | atio |
4 | 7479 | , non |
4 | 7213 | ous, |
4 | 6340 | ical |
4 | 5811 | able |
4 | 5575 | ble, |
Here the common suffixes are ing, ion, and ous. The common prefix is non. ical is probably a suffix, but to confirm that we need to look for a comma on the right, and that is a longer pattern. We’ll check it in the next section.
Let’s dig into five letter patterns.
length | count | repeated word parts |
---|---|---|
5 | 9564 | ness, |
5 | 7539 | ation |
5 | 7245 | tion, |
5 | 4609 | able, |
5 | 4316 | ally, |
5 | 3816 | , over |
5 | 3731 | ting, |
5 | 3201 | ical, |
5 | 2750 | eness |
The trend is clear: nearly all of the patterns occur at the ends of words. Most of the conjugation of words happens at the ends of words. As we suspected, ical occurs at the end of words: the comma at the end of the pattern confirms it. There is one prefix: over. That is very interesting. Over must be an uber prefix to be so popular.
Let’s dig into the longest patterns.
These patterns are a delight to browse. I’m going to leave you to browse them yourself. But first I want to look at the longest repeated phrases.
length | count | repeated word parts |
---|---|---|
26 | 2 | , antidisestablishmentarian |
26 | 2 | , electroencephalographical |
26 | 2 | , microspectrophotometrical |
This is graffiti. My hunch is that the coiners of these words are a politician, a brain doctor, and a physicist. I am not comfortable with politicians making long words. Keep an eye on them.
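All three of these longest patterns start with a comma, so they are shared word beginnings. For that special case there is a shortcut that does not need the pattern scanner: in an alphabetically sorted list, the longest beginning shared by any two words is always shared by two neighbors. The sketch below uses that shortcut, which is a different technique from the scanner; the function name `longest_shared_start` is made up, and it assumes the input is already sorted.

```python
def longest_shared_start(sorted_words):
    """Return the longest word beginning shared by two neighboring words.

    Assumes sorted_words is in alphabetical order, so the pair of words with
    the longest common beginning sits next to each other in the list.
    """
    best = ""
    for first, second in zip(sorted_words, sorted_words[1:]):
        k = 0
        # Walk forward while the two neighbors still agree letter by letter.
        while k < min(len(first), len(second)) and first[k] == second[k]:
            k += 1
        if k > len(best):
            best = first[:k]
    return best


sample = ["aardvark", "aardvarks", "aardwolf"]
print(longest_shared_start(sample))
```

This only finds shared beginnings. Repeated endings and repeated middles, like the ones in the tables here, still need a general pattern scanner.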
The Top 10 repeated patterns for lengths from 1 to 26 letters
Here is the complete list of the top 10 repeated patterns for each length. Enjoy browsing!
length | count | repeated word parts |
---|---|---|
26 | 2 | , antidisestablishmentarian |
26 | 2 | , electroencephalographical |
26 | 2 | , microspectrophotometrical |
25 | 2 | disestablishmentarianism, |
24 | 3 | , electroencephalographic |
24 | 3 | , microspectrophotometric |
23 | 2 | , microelectrophoretical |
23 | 2 | spectrophotometrically, |
22 | 6 | , electroencephalograph |
22 | 4 | , microspectrophotometr |
22 | 3 | establishmentarianism, |
22 | 2 | , alkylbenzenesulfonate |
22 | 2 | , anthropomorphological |
22 | 2 | , counterclassification |
22 | 2 | , disestablishmentarian |
22 | 2 | , immunoelectrophoretic |
22 | 2 | , pseudophilanthropical |
22 | 2 | , stereophotomicrograph |
22 | 2 | , superincomprehensible |
21 | 5 | , microspectrophotomet |
21 | 4 | disestablishmentarian |
21 | 3 | , microelectrophoretic |
21 | 3 | , superincomprehensibl |
21 | 3 | ndistinguishableness, |
21 | 2 | , anatomicophysiologic |
21 | 2 | , anticonstitutionalis |
21 | 2 | , antienvironmentalist |
21 | 2 | , antiinstitutionalist |
21 | 2 | , cineangiocardiograph |
20 | 8 | , electroencephalogra |
20 | 4 | distinguishableness, |
20 | 4 | spectrophotometrical |
20 | 3 | , antienvironmentalis |
20 | 3 | , ballistocardiograph |
20 | 3 | , hyperconstitutional |
20 | 3 | , magnetohydrodynamic |
20 | 3 | , pseudophilanthropic |
20 | 3 | contemporaneousness, |
20 | 3 | electrophoretically, |
20 | 3 | intellectualization, |
19 | 5 | , electrocardiograph |
19 | 5 | representativeness, |
19 | 4 | , anticonstitutional |
19 | 4 | , overintellectualiz |
19 | 4 | comprehensibleness, |
19 | 4 | impressionableness, |
19 | 4 | intellectualization |
19 | 3 | , antisupernaturalis |
19 | 3 | , electrophysiologic |
19 | 3 | , electroretinograph |
19 | 3 | , electrotherapeutic |
18 | 6 | , counterrevolution |
18 | 6 | , overintellectuali |
18 | 6 | establishmentarian |
18 | 6 | industrialization, |
18 | 6 | spectrophotometric |
18 | 5 | comprehensiveness, |
18 | 5 | conscientiousness, |
18 | 5 | constitutionalism, |
18 | 5 | nationalistically, |
18 | 4 | , ballistocardiogra |
18 | 4 | , electrophysiologi |
17 | 9 | , overintellectual |
17 | 7 | , electrocardiogra |
17 | 7 | ationalistically, |
17 | 7 | enthusiastically, |
17 | 6 | , hydrotherapeutic |
17 | 6 | , transubstantiati |
17 | 6 | conservativeness, |
17 | 5 | , anticonstitution |
17 | 5 | , chemotherapeutic |
17 | 5 | , electrophysiolog |
16 | 11 | , transcendentali |
16 | 9 | , intercommunicat |
16 | 9 | , transubstantiat |
16 | 9 | prehensibleness, |
16 | 9 | spectrophotometr |
16 | 8 | differentiation, |
16 | 8 | representational |
16 | 7 | , anthropomorphis |
16 | 7 | , institutionalis |
15 | 19 | , anthropomorphi |
15 | 14 | , internationali |
15 | 14 | , transcendental |
15 | 14 | opsychological, |
15 | 13 | , institutionali |
15 | 12 | , disproportiona |
15 | 12 | ationalization, |
15 | 12 | contemporaneous |
14 | 28 | , anthropomorph |
14 | 26 | anthropomorphi |
14 | 26 | constitutional |
14 | 24 | otherapeutics, |
14 | 24 | representation |
14 | 21 | psychological, |
14 | 21 | tionalization, |
14 | 20 | denominational |
13 | 110 | ographically, |
13 | 35 | anthropomorph |
13 | 28 | intellectuali |
13 | 28 | nstitutionali |
13 | 28 | ogenetically, |
12 | 129 | graphically, |
12 | 69 | alistically, |
12 | 62 | heartedness, |
12 | 62 | ometrically, |
12 | 54 | intellectual |
11 | 198 | ologically, |
11 | 146 | ographical, |
11 | 133 | raphically, |
11 | 94 | metrically, |
10 | 383 | ification, |
10 | 265 | istically, |
10 | 261 | alization, |
10 | 256 | ographical |
10 | 215 | logically, |
10 | 192 | ativeness, |
10 | 176 | tableness, |
9 | 947 | ableness, |
9 | 552 | ological, |
9 | 479 | iousness, |
9 | 462 | tiveness, |
9 | 447 | ification |
9 | 394 | fication, |
9 | 386 | lization, |
9 | 382 | stically, |
8 | 1189 | bleness, |
8 | 1140 | ousness, |
8 | 1126 | ization, |
8 | 1101 | ability, |
8 | 1099 | tically, |
8 | 752 | ological |
8 | 697 | iveness, |
8 | 603 | , counter |
8 | 590 | logical, |
8 | 588 | ication, |
7 | 2598 | ically, |
7 | 1464 | bility, |
7 | 1268 | leness, |
7 | 1223 | ization |
7 | 1142 | usness, |
7 | 1141 | ousness |
7 | 1133 | zation, |
7 | 988 | edness, |
7 | 887 | ations, |
6 | 5676 | ation, |
6 | 2702 | eness, |
6 | 2645 | cally, |
6 | 1846 | , inter |
6 | 1776 | ingly, |
6 | 1712 | ating, |
6 | 1706 | ograph |
6 | 1665 | , super |
5 | 9564 | ness, |
5 | 7539 | ation |
5 | 7245 | tion, |
5 | 4609 | able, |
5 | 4316 | ally, |
5 | 3816 | , over |
5 | 3731 | ting, |
5 | 3201 | ical, |
5 | 2750 | eness |
4 | 18118 | ing, |
4 | 12103 | ess, |
4 | 10910 | tion |
4 | 9906 | ness |
4 | 8571 | ion, |
4 | 7639 | atio |
4 | 7479 | , non |
4 | 7213 | ous, |
4 | 6340 | ical |
4 | 5811 | able |
4 | 5575 | ble, |
3 | 23500 | ing |
3 | 22395 | ed, |
3 | 20147 | , un |
3 | 18672 | ng, |
3 | 17423 | es, |
3 | 16689 | ly, |
3 | 15213 | ati |
3 | 15092 | ess |
3 | 15054 | er, |
3 | 14153 | ion |
3 | 13942 | ter |
2 | 75666 | s, |
2 | 66714 | er |
2 | 60886 | in |
2 | 56959 | e, |
2 | 49645 | ti |
2 | 47067 | on |
2 | 46930 | es |
2 | 44229 | te |
2 | 42057 | at |
2 | 40559 | al |
2 | 40042 | an |
1 | 376454 | e |
1 | 370098 | , |
1 | 313006 | i |
1 | 295792 | a |
1 | 251597 | o |
1 | 251431 | n |
1 | 250280 | s |
1 | 246143 | r |
1 | 230894 | t |
1 | 194915 | l |
1 | 152980 | c |