Kangaroo words

    Recently I saw two great art projects by @whisbe. Each is a sign with a phrase written on it, and the phrase’s meaning changes when certain letters or words are added or removed.

    New purple variant “She Was inTOXICating” neon for the “Canto XIV” group show with @retna @obeygiant @zesmsk @drewmerritt @saberawr curated by @natalie.elizabeth.brady

    A post shared by WhIsBe (@whisbe)

    24/7 365 Happy Fucking Valentines Day you filthy animals 🖤

    A post shared by WhIsBe (@whisbe)

    I loved these projects and I wanted to make something similar of my own. I searched for other examples, without much success. Maybe my Google-fu is lacking today.

    In my search, I discovered List of different types of word play and Kangaroo words.

    A Kangaroo word is a word that contains letters of another word, in order (without transposing any letters). For example: encourage contains courage, cog, cur, urge, core, cure, nag, rag, age, nor, rage and enrage.
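    The definition above boils down to an "in order" letter check. Here is a minimal sketch of that check in Python. The function name `is_kangaroo` is my own for illustration and is not taken from the original script.

```python
def is_kangaroo(word: str, candidate: str) -> bool:
    """Return True if the letters of `candidate` appear in `word`, in order."""
    # A sub word has to be shorter than the word that contains it.
    if len(candidate) >= len(word):
        return False
    # `ch in letters` advances the iterator, so each letter of the
    # candidate must be found after the previously matched letter.
    letters = iter(word)
    return all(ch in letters for ch in candidate)


print(is_kangaroo("encourage", "courage"))  # True
print(is_kangaroo("encourage", "enrage"))   # True
print(is_kangaroo("encourage", "garage"))   # False
```

    The letters do not have to be adjacent, which is why enrage counts even though it skips over "cou".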

    Kangaroo words weren’t exactly what I was looking for, but they were on the right path. I started searching for a list of all the common Kangaroo words, and for the largest Kangaroo words that exist. I was able to find a few small lists but nothing that was complete.

    The Word Circus: A Letter-Perfect Book (Lighter Side of Language Series) was a good book with lots of Kangaroo words and phrases.

    Because there was no exhaustive list of Kangaroo words, and I needed to practice my Python, I decided to create my own list. The script goes through all ~250,000 words and finds all the sub words that appear in each word. The problem with this version is that a lot of the sub words it found were not common words.
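    A rough sketch of what this first pass could look like is below. It is not the original script, and the names `is_subsequence` and `find_sub_words` are hypothetical. It compares every word against every other word, so it is slow on a full dictionary, but it shows the idea.

```python
from typing import Dict, Iterable, List


def is_subsequence(candidate: str, word: str) -> bool:
    """True if the letters of `candidate` appear in `word`, in order."""
    letters = iter(word)
    return all(ch in letters for ch in candidate)


def find_sub_words(words: Iterable[str]) -> Dict[str, List[str]]:
    """Map each word to the shorter words hidden inside it."""
    # Normalise and de-duplicate the word list.
    vocab = sorted({w.strip().lower() for w in words if w.strip()})
    result: Dict[str, List[str]] = {}
    for word in vocab:
        subs = [
            c for c in vocab
            if len(c) < len(word) and is_subsequence(c, word)
        ]
        if subs:
            result[word] = subs
    return result


# Tiny stand-in for the real word list.
sample = ["district", "districts", "dirt", "strict", "disc", "cat"]
for word, subs in find_sub_words(sample).items():
    print(word, subs)
```

    With a real dictionary, every short string that happens to be in the list (abbreviations, acronyms and so on) shows up as a sub word, which is the noise problem described above.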

    For example: Districts has 44 sub words: ric, iris, srs, ist, tit, disc, dcs, discs, itc, tis, irc, sic, sti, ics, dst, dir, str, tits, src, ict, sri, irs, its, sits, dss, dist, iss, sit, tic, district, tri, sis, rcs, rts, dit, sts, dts, dsc, isis, cts, iis, dis, strict, dirt

    I would have preferred that it only showed the most common words: disc, discs, strict, dirt, tits, district, sits

    Source code, Output

    The next version only used the 20,000 most commonly used words, taken from an n-gram frequency analysis of Google’s Trillion Word Corpus. This subset of words also included slang, swear words, and names of companies. I limited this script to only find sub words that are longer than 3 letters, to reduce the noise. This produces a much better result.
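    The two filters described above (a cut-off on how common a word is, and a minimum sub word length) can be sketched like this. I am assuming the frequency list is already loaded as a Python list ordered from most to least common. The function name and its `top_n` and `min_len` parameters are hypothetical, not the original script's.

```python
from typing import List


def common_sub_words(word: str, ranked_words: List[str],
                     top_n: int = 20_000, min_len: int = 4) -> List[str]:
    """Sub words of `word` that are common enough and long enough."""
    # Keep only the `top_n` most frequent words.
    common = {w.lower() for w in ranked_words[:top_n]}
    word = word.lower()
    found = []
    for candidate in common:
        # "Longer than 3 letters" means at least `min_len` = 4 letters.
        if not (min_len <= len(candidate) < len(word)):
            continue
        letters = iter(word)
        if all(ch in letters for ch in candidate):
            found.append(candidate)
    return sorted(found)


# Toy ranked list, most common first (made-up ordering, for illustration only).
ranked = ["district", "its", "dirt", "strict", "disc", "sits", "districts", "tic"]
print(common_sub_words("districts", ranked, top_n=6))
```

    Raising `min_len` or lowering `top_n` trades completeness for less noise, and both are easy to tune by eye.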

    For example:

    Source code, Output

    Now I have a giant list of words and their sub words.

    What is the largest Kangaroo word in the top 20,000 most commonly used words?

    Telecommunications has 120 sub words: lemma, communion, comm, tion, cocos, cont, coco, elena, unions, lena, louis, loan, coats, counts, onion, union, lion, ciao, coca, ions, cain, conn, cons, icon, mains, cats, tons, econ, toons, locations, latin, comma, elec, onto, lent, lemon, telecommunication, eaton, coins, conan, comics, commits, elect, tele, mins, unit, communication, toni, luton, comic, unto, mans, laos, teas, location, como, outs, cuts, count, tits, tuna, toon, commit, nato, units, lots, oman, main, cunt, commons, econo, elem, loans, tout, elton, lions, icons, cans, lotion, lean, common, coma, omni, lucas, eats, tous, tomato, emma, coat, onions, nation, mais, lets, telecom, coin, leica, leon, tees, tent, teen, teens, tomcat, lens, mats, elections, cation, omit, luna, tuition, tents, cmos, lois, communications, tions, luis, otis, election, tens, telecoms, nations
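    Finding the "largest" Kangaroo word is just a matter of ranking words by how many sub words they have. A small sketch, assuming the results are already in a dict that maps each word to its list of sub words (the name `top_kangaroos` and the toy data are mine, for illustration):

```python
from typing import Dict, List, Tuple


def top_kangaroos(sub_word_map: Dict[str, List[str]],
                  k: int = 1) -> List[Tuple[str, int]]:
    """Return the `k` words with the most sub words, as (word, count) pairs."""
    counts = [(word, len(subs)) for word, subs in sub_word_map.items()]
    # Most sub words first; ties are broken alphabetically.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:k]


# Toy data: a few sub words per word, not the full lists.
toy = {
    "districts": ["disc", "discs", "strict", "dirt", "district"],
    "encourage": ["courage", "cure", "rage", "urge"],
    "strict": ["sit"],
}
print(top_kangaroos(toy, k=2))
```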

    The next step is to create a phrase using the words with the most sub words, then test different arrangements of sub words to see if they also produce a phrase that makes sense. Testing whether a string of words forms a proper English phrase is harder than it sounds. I am going to try it manually first, and if I fail I can let the robots at it.