
Patterns in Words

Nov 2, 2018 | 7 minutes read

Tag: patterns

Let’s find the repeated phrases (runs of letters) in English words. The words come from https://github.com/dwyl/english-words/blob/master/words_alpha.txt.

The words list looks like this:

...
aardvark
aardvarks
aardwolf
...

To dig into the letters, I separate the letters of each word with spaces and append a comma to the end of each word.

...
a a r d v a r k,
a a r d v a r k s,
a a r d w o l f,
...
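Here is a minimal sketch of that preprocessing step in Python. It is not the script I actually used; the function names `spell_out` and `transform` are made up for illustration, and the three sample words stand in for the full word list.

```python
def spell_out(word: str) -> str:
    """Separate the letters of a word with spaces and append a comma."""
    return " ".join(word.strip()) + ","


def transform(words):
    """Spell out every non-empty word in an iterable of words."""
    return [spell_out(w) for w in words if w.strip()]


# Stand-in for the lines of words_alpha.txt.
sample = ["aardvark", "aardvarks", "aardwolf"]
for line in transform(sample):
    print(line)
```

To run it on the real list, you could pass an open file handle for a local copy of words_alpha.txt to `transform` instead of `sample`.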

Now feed that into our favorite pattern scanner: the one we’ve been using for the last few months. Look for it on [raypereda.com].

Let’s start with the counts of the top 10 single letters:

length count repeated word parts
1 376454 e
1 370098 ,
1 313006 i
1 295792 a
1 251597 o
1 251431 n
1 250280 s
1 246143 r
1 230894 t
1 194915 l
1 152980 c

Well, everybody knows that e is the most common letter in the English language. The count for the comma is equal to the number of words in the list, because every word gets exactly one comma. After that come i, a, and so on, in decreasing order of count.
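If you want to check the single-letter counts without the pattern scanner, a rough sketch with Python's `collections.Counter` does the job. The helper name `symbol_counts` and the three sample words are assumptions for illustration; the point is that the comma count always equals the number of words.

```python
from collections import Counter


def symbol_counts(words):
    """Count every letter, plus one comma per word as the end-of-word marker."""
    counts = Counter()
    for word in words:
        counts.update(word)   # adds one count per letter in the word
        counts[","] += 1      # one comma closes each word
    return counts


sample = ["aardvark", "aardvarks", "aardwolf"]
counts = symbol_counts(sample)
print(counts.most_common(3))
print("commas:", counts[","], "words:", len(sample))
```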

Let’s dig into two letter patterns.

length count repeated word parts
2 75666 s,
2 66714 er
2 60886 in
2 56959 e,
2 49645 ti
2 47067 on
2 46930 es
2 44229 te
2 42057 at
2 40559 al
2 40042 an

So, we see that many words end with s. The obsession with singular and plural in English words is a social disease. It is almost as bad as the obsession with gender in Spanish words.

er is that super popular two-letter suffix. er is perhaps the most important concept in the English language: something-er is the thing that does something!

Then many words end in e. Boring! Many times the e is silent. That problem is not as bad as the silent letters in French, but still, this is a problem. One day when we switch to Esperanto, this nonsense will be fixed.

The other common 2-letter patterns are in, ti, on, es, te, at, al, and an. Interesting.
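For the curious, here is one way such two-letter counts could be computed. This is a sketch under my own assumptions, not the pattern scanner itself: it glues the words into one comma-terminated stream and counts every window of length n, including windows that straddle a comma. The name `ngram_counts` and the sample words are made up.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count every length-n window in the comma-terminated stream of words."""
    stream = "".join(word + "," for word in words)
    return Counter(stream[i:i + n] for i in range(len(stream) - n + 1))


sample = ["baker", "bakers", "maker", "makers"]
pairs = ngram_counts(sample, 2)
print(pairs.most_common(3))
print("er:", pairs["er"], " s,:", pairs["s,"])
```

A pattern like `s,` counts words ending in s, while `,m` counts words starting with m (except the very first word, which has no comma in front of it in this sketch).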

Let’s dig into three letter patterns.

length count repeated word parts
3 23500 ing
3 22395 ed,
3 20147 , un
3 18672 ng,
3 17423 es,
3 16689 ly,
3 15213 ati
3 15092 ess
3 15054 er,
3 14153 ion
3 13942 ter

ing is a common ending of words, including the word ending. Wow. Many of these patterns happen at the ends of words: ed, ng, es, ly, and er. Very interesting. When reading these tables, keep in mind that repeated suffixes end with a comma and repeated prefixes start with a comma.
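That reading rule is simple enough to write down as code. The function below is a hypothetical helper of mine, not part of the scanner; it ignores the display spaces and labels a pattern by where the comma sits.

```python
def classify(pattern):
    """Label a pattern by where the end-of-word comma sits."""
    compact = pattern.replace(" ", "")  # ", un" is displayed with a space
    if compact == ",":
        return "word boundary"
    if "," in compact[1:-1]:
        return "spans two words"
    starts = compact.startswith(",")
    ends = compact.endswith(",")
    if starts and ends:
        return "whole word"
    if starts:
        return "prefix"
    if ends:
        return "suffix"
    return "anywhere"


for pattern in ["ing", "ed,", ", un", "ly,", "ati"]:
    print(repr(pattern), "->", classify(pattern))
```

Note that a pattern without a comma, like ing, is labeled "anywhere": it may well be a suffix most of the time, but the pattern alone does not prove it.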

Let’s dig into four letter patterns.

length count repeated word parts
4 18118 ing,
4 12103 ess,
4 10910 tion
4 9906 ness
4 8571 ion,
4 7639 atio
4 7479 , non
4 7213 ous,
4 6340 ical
4 5811 able
4 5575 ble,

The common suffixes here include ing, ion, and ous. The one common prefix is non. ical is probably a suffix, but to confirm that we need to look for a comma on its right, and that is a longer pattern. We’ll check in the next table.

Let’s dig into five letter patterns.

length count repeated word parts
5 9564 ness,
5 7539 ation
5 7245 tion,
5 4609 able,
5 4316 ally,
5 3816 , over
5 3731 ting,
5 3201 ical,
5 2750 eness

The trend is clear: nearly all of the patterns occur at the ends of words. Most of the inflection of words happens at the ends of words. As we suspected, ical occurs at the end of words, as shown by the comma at the end of the pattern. There is one prefix: over. That is very interesting. Over must be an uber prefix to be so popular.
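We can put a number on how often a pattern really is a suffix by dividing the count with the trailing comma by the count without it. The sketch below uses only counts copied from the four- and five-letter tables above; the dictionary and function names are mine. By these counts, roughly half of the ical occurrences end a word, while ness almost always does.

```python
# Counts copied from the four- and five-letter tables above.
anywhere = {"ness": 9906, "tion": 10910, "able": 5811, "ical": 6340}
at_word_end = {"ness": 9564, "tion": 7245, "able": 4609, "ical": 3201}


def suffix_share(pattern):
    """Fraction of a pattern's occurrences that sit right before the comma."""
    return at_word_end[pattern] / anywhere[pattern]


for pattern in anywhere:
    print(f"{pattern}: {suffix_share(pattern):.1%} of occurrences end a word")
```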

Let’s dig into the longest patterns.

These patterns are a delight to browse. I’m going to leave you to browse them yourself. But before that, I want to look at the longest repeated phrases.

length count repeated word parts
26 2 , antidisestablishmentarian
26 2 , electroencephalographical
26 2 , microspectrophotometrical

This is graffiti. My hunch is that the coiners of these words are a politician, a brain doctor, and a physicist. I am not comfortable with politicians making long words. Keep an eye on them.
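Finding the longest repeated phrase can be sketched by brute force: start with long windows and shrink until something shows up at least twice. This is my own simplified stand-in, not the scanner's algorithm, and it assumes a pattern may touch a comma at either end but never contains one in the middle. The name `longest_repeats` and the tiny sample list are hypothetical.

```python
from collections import Counter


def longest_repeats(words, min_count=2):
    """Return (length, {pattern: count}) for the longest repeated patterns."""
    stream = "".join(word + "," for word in words)
    longest_word = max(len(word) for word in words)
    # A pattern is at most one word plus a comma on each side.
    for n in range(longest_word + 2, 0, -1):
        counts = Counter(
            stream[i:i + n]
            for i in range(len(stream) - n + 1)
            if "," not in stream[i + 1:i + n - 1]  # no comma strictly inside
        )
        repeated = {p: c for p, c in counts.items() if c >= min_count}
        if repeated:
            return n, repeated
    return 0, {}


sample = ["ant", "antidisestablishmentarian", "antidisestablishmentarianism"]
length, patterns = longest_repeats(sample)
print(length, patterns)
```

Counting all windows for every length is wasteful on the full word list, but it keeps the idea visible.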

The Top 10 repeated patterns for lengths from 1 to 26 letters

Here is the complete list of the top 10 repeated patterns for each length. Enjoy browsing!

length count repeated word parts
26 2 , antidisestablishmentarian
26 2 , electroencephalographical
26 2 , microspectrophotometrical
25 2 disestablishmentarianism,
24 3 , electroencephalographic
24 3 , microspectrophotometric
23 2 , microelectrophoretical
23 2 spectrophotometrically,
22 6 , electroencephalograph
22 4 , microspectrophotometr
22 3 establishmentarianism,
22 2 , alkylbenzenesulfonate
22 2 , anthropomorphological
22 2 , counterclassification
22 2 , disestablishmentarian
22 2 , immunoelectrophoretic
22 2 , pseudophilanthropical
22 2 , stereophotomicrograph
22 2 , superincomprehensible
21 5 , microspectrophotomet
21 4 disestablishmentarian
21 3 , microelectrophoretic
21 3 , superincomprehensibl
21 3 ndistinguishableness,
21 2 , anatomicophysiologic
21 2 , anticonstitutionalis
21 2 , antienvironmentalist
21 2 , antiinstitutionalist
21 2 , cineangiocardiograph
20 8 , electroencephalogra
20 4 distinguishableness,
20 4 spectrophotometrical
20 3 , antienvironmentalis
20 3 , ballistocardiograph
20 3 , hyperconstitutional
20 3 , magnetohydrodynamic
20 3 , pseudophilanthropic
20 3 contemporaneousness,
20 3 electrophoretically,
20 3 intellectualization,
19 5 , electrocardiograph
19 5 representativeness,
19 4 , anticonstitutional
19 4 , overintellectualiz
19 4 comprehensibleness,
19 4 impressionableness,
19 4 intellectualization
19 3 , antisupernaturalis
19 3 , electrophysiologic
19 3 , electroretinograph
19 3 , electrotherapeutic
18 6 , counterrevolution
18 6 , overintellectuali
18 6 establishmentarian
18 6 industrialization,
18 6 spectrophotometric
18 5 comprehensiveness,
18 5 conscientiousness,
18 5 constitutionalism,
18 5 nationalistically,
18 4 , ballistocardiogra
18 4 , electrophysiologi
17 9 , overintellectual
17 7 , electrocardiogra
17 7 ationalistically,
17 7 enthusiastically,
17 6 , hydrotherapeutic
17 6 , transubstantiati
17 6 conservativeness,
17 5 , anticonstitution
17 5 , chemotherapeutic
17 5 , electrophysiolog
16 11 , transcendentali
16 9 , intercommunicat
16 9 , transubstantiat
16 9 prehensibleness,
16 9 spectrophotometr
16 8 differentiation,
16 8 representational
16 7 , anthropomorphis
16 7 , institutionalis
15 19 , anthropomorphi
15 14 , internationali
15 14 , transcendental
15 14 opsychological,
15 13 , institutionali
15 12 , disproportiona
15 12 ationalization,
15 12 contemporaneous
14 28 , anthropomorph
14 26 anthropomorphi
14 26 constitutional
14 24 otherapeutics,
14 24 representation
14 21 psychological,
14 21 tionalization,
14 20 denominational
13 110 ographically,
13 35 anthropomorph
13 28 intellectuali
13 28 nstitutionali
13 28 ogenetically,
12 129 graphically,
12 69 alistically,
12 62 heartedness,
12 62 ometrically,
12 54 intellectual
11 198 ologically,
11 146 ographical,
11 133 raphically,
11 94 metrically,
10 383 ification,
10 265 istically,
10 261 alization,
10 256 ographical
10 215 logically,
10 192 ativeness,
10 176 tableness,
9 947 ableness,
9 552 ological,
9 479 iousness,
9 462 tiveness,
9 447 ification
9 394 fication,
9 386 lization,
9 382 stically,
8 1189 bleness,
8 1140 ousness,
8 1126 ization,
8 1101 ability,
8 1099 tically,
8 752 ological
8 697 iveness,
8 603 , counter
8 590 logical,
8 588 ication,
7 2598 ically,
7 1464 bility,
7 1268 leness,
7 1223 ization
7 1142 usness,
7 1141 ousness
7 1133 zation,
7 988 edness,
7 887 ations,
6 5676 ation,
6 2702 eness,
6 2645 cally,
6 1846 , inter
6 1776 ingly,
6 1712 ating,
6 1706 ograph
6 1665 , super
5 9564 ness,
5 7539 ation
5 7245 tion,
5 4609 able,
5 4316 ally,
5 3816 , over
5 3731 ting,
5 3201 ical,
5 2750 eness
4 18118 ing,
4 12103 ess,
4 10910 tion
4 9906 ness
4 8571 ion,
4 7639 atio
4 7479 , non
4 7213 ous,
4 6340 ical
4 5811 able
4 5575 ble,
3 23500 ing
3 22395 ed,
3 20147 , un
3 18672 ng,
3 17423 es,
3 16689 ly,
3 15213 ati
3 15092 ess
3 15054 er,
3 14153 ion
3 13942 ter
2 75666 s,
2 66714 er
2 60886 in
2 56959 e,
2 49645 ti
2 47067 on
2 46930 es
2 44229 te
2 42057 at
2 40559 al
2 40042 an
1 376454 e
1 370098 ,
1 313006 i
1 295792 a
1 251597 o
1 251431 n
1 250280 s
1 246143 r
1 230894 t
1 194915 l
1 152980 c
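If you would like to build a table like this yourself, here is a rough sketch. It is not the pattern scanner; it is a brute-force approximation under my own assumptions: patterns may start or end with a comma but not contain one inside, only patterns seen at least twice are kept, and ties are broken alphabetically. The name `top_patterns` and its parameters are made up for illustration.

```python
from collections import Counter


def top_patterns(words, max_len, k=10):
    """Return (length, count, pattern) rows, longest patterns first."""
    stream = "".join(word + "," for word in words)
    rows = []
    for n in range(max_len, 0, -1):
        counts = Counter(
            stream[i:i + n]
            for i in range(len(stream) - n + 1)
            if "," not in stream[i + 1:i + n - 1]  # no comma strictly inside
        )
        repeated = [(p, c) for p, c in counts.items() if c >= 2]
        # Highest count first, then alphabetical order for ties.
        repeated.sort(key=lambda pc: (-pc[1], pc[0]))
        rows.extend((n, c, p) for p, c in repeated[:k])
    return rows


sample = ["baker", "bakers", "maker", "makers"]
rows = top_patterns(sample, max_len=2, k=2)
print("length count repeated word parts")
for n, count, pattern in rows:
    print(n, count, pattern)
```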