Looking for the shortest unique phrases in the Hidden Words
O Son of Spirit! My first counsel is this: Possess a pure, kindly and radiant heart, that thine may be a sovereignty ancient, imperishable and everlasting.
Inspired by a post on r/bahaidev, I identified the shortest unique phrase in each of the Hidden Words. It turns out that for the vast majority of Hidden Words, a single word is enough to identify each one uniquely.
The Hidden Words are described by their Author as the inner essence of that which has been revealed unto the Prophets of old, 'clothed in the garment of brevity.' Many intonations of the Hidden Words exist, and their titles are often taken from the opening line, e.g. "O Son of Man" or "O Son of Spirit". But because several opening lines are shared by multiple Hidden Words, the opening line generally doesn't identify the Hidden Word, so intonations of different Hidden Words sometimes end up with the same title. This situation inspired the following attempt to find uniquely identifying phrases in the 153 Hidden Words that might serve as titles for intonations.
I got the Hidden Words from the Bahá'í Library.
Then I converted every word to lower case and removed punctuation.
To split each Hidden Word into individual words, I used the word_tokenize function from NLTK.
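As a rough, dependency-free sketch of this cleaning step, the snippet below lowercases a passage, strips ASCII punctuation, and splits it into words. The whitespace split is only a stand-in for NLTK's word_tokenize (the two may differ on a few words), and the helper name clean_and_split is made up for illustration.

```python
import string


def clean_and_split(text: str) -> list[str]:
    """Lowercase the text, drop ASCII punctuation, split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


sample = "O Son of Spirit! My first counsel is this: Possess a pure, kindly and radiant heart"
tokens = clean_and_split(sample)
print(tokens[:6])  # ['o', 'son', 'of', 'spirit', 'my', 'first']
```

Note that string.punctuation only covers ASCII characters, so typographic quotes or dashes in a source text would need to be handled separately.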
In computational linguistics, an n-gram is a contiguous sequence of n items from a sample of text or speech.
In our case we consider the items to be words.
But they could also be any other unit such as letters, syllables or sentences.
I used the ngrams function from NLTK to compute all n-grams for each Hidden Word.
This gives me a set of all phrases occurring in each Hidden Word.
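To make the idea concrete, here is a small pure-Python sketch of word n-grams. The function word_ngrams is a hypothetical stand-in written for this example; like nltk.ngrams it produces tuples of consecutive tokens, but it is not the NLTK implementation.

```python
def word_ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of appearance."""
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = ["o", "son", "of", "spirit"]

bigrams = word_ngrams(tokens, 2)
print(bigrams)  # [('o', 'son'), ('son', 'of'), ('of', 'spirit')]

# Every phrase of every length, shortest first
all_phrases = [ng for n in range(1, len(tokens) + 1) for ng in word_ngrams(tokens, n)]
print(len(all_phrases))  # 10
```

A passage of k words has k + (k - 1) + ... + 1 phrases in total, which stays manageable for texts as short as the Hidden Words.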
Finding unique n-grams
Once I had all n-grams, I just needed to find the shortest n-gram in each Hidden Word that does not occur in any other Hidden Word. I brute-forced this by creating two sets of n-grams for each Hidden Word: the first contains all n-grams from the current Hidden Word, the second contains all n-grams from the "other" Hidden Words. Then I could just look for the shortest n-gram that does not appear in the "other" Hidden Words. The full list of shortest unique phrases is at the bottom of the article.
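The brute-force search can be sketched on a toy example. The three lines below are made-up stand-ins rather than actual Hidden Words, and the helper names (word_ngrams, all_ngrams, shortest_unique) are hypothetical.

```python
def word_ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of appearance."""
    return list(zip(*(tokens[i:] for i in range(n))))


def all_ngrams(tokens):
    """All n-grams of a token list, shortest first."""
    return [ng for n in range(1, len(tokens) + 1) for ng in word_ngrams(tokens, n)]


def shortest_unique(texts):
    """For each token list, return its first shortest n-gram found in no other text."""
    per_text = [all_ngrams(tokens) for tokens in texts]
    result = []
    for i, current in enumerate(per_text):
        # Collect the n-grams of every text except the current one
        others = set()
        for j, grams in enumerate(per_text):
            if j != i:
                others.update(grams)
        # n-grams are ordered shortest first, so the first miss is the answer
        result.append(next((ng for ng in current if ng not in others), None))
    return result


toy = [
    "o son of spirit possess a pure heart".split(),
    "o son of light forget all".split(),
    "o son of man forget all the light".split(),
]
print(shortest_unique(toy))
# [('spirit',), ('of', 'light'), ('man',)]
```

The second toy line shares every single word with the other lines, so it needs a two-word phrase. A text that is fully contained in another one has no unique phrase at all and yields None.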
- 145 of the 153 Hidden Words contain a word that appears in no other Hidden Word
- Eight Hidden Words need two words to be uniquely identified
  - Six are from the Arabic part
  - Two are from the Persian part
  - For seven of them, the first occurrence of a two-word unique phrase is the last word of the opening line and the first word of the following sentence
- All Hidden Words can be uniquely identified with one or two words
- The identified words are not always suitable as a title. They either do not convey enough meaning or don't make any sense on their own (e.g. hence, whoso, light forget).
- Next steps for improved results
  - Don't include adverbs and pronouns
  - Prevent the two-word phrases from crossing sentence boundaries, maybe even enforce one of the words to be a noun
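One way the sentence-boundary idea might be approached is to split each passage into sentences before stripping punctuation, and to build n-grams per sentence. The sketch below assumes a simple regex split on ., ! and ? is good enough; the helper names are hypothetical, and the sample is a shortened version of the passage quoted at the top.

```python
import re
import string


def sentence_tokens(text):
    """Split on ., ! or ? and tokenize each sentence separately."""
    table = str.maketrans("", "", string.punctuation)
    sentences = re.split(r"[.!?]+", text)
    token_lists = [s.lower().translate(table).split() for s in sentences]
    return [tokens for tokens in token_lists if tokens]  # drop empty pieces


def bounded_ngrams(text, n):
    """n-grams that never cross a sentence boundary."""
    grams = []
    for tokens in sentence_tokens(text):
        grams.extend(zip(*(tokens[i:] for i in range(n))))
    return grams


sample = "O Son of Spirit! My first counsel is this: Possess a pure heart."
bigrams = bounded_ngrams(sample, 2)
print(("of", "spirit") in bigrams)  # True
print(("spirit", "my") in bigrams)  # False
```

With this variant, a phrase made of the last word of the opening line and the first word of the next sentence can no longer be selected. Colons and commas are not treated as boundaries here; whether they should be is a separate design choice.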
```python
import string

import nltk

# The text file holds one Hidden Word per block, separated by blank lines
with open('hidden_words.txt', 'r') as f:
    hidden_words = f.read().split('\n\n')

# Lower-case, strip punctuation, and split into words
hidden_words_cleaned = [
    hw.lower().translate(str.maketrans("", "", string.punctuation))
    for hw in hidden_words
]
hidden_words_tokenized = [nltk.word_tokenize(hw) for hw in hidden_words_cleaned]

# Get all ngrams, shortest first
all_hidden_words_ngrams = []
for hw in hidden_words_tokenized:
    hidden_word_ngrams = []
    for i in range(1, len(hw) + 1):  # + 1 so the full-length phrase is included
        hidden_word_ngrams.extend(nltk.ngrams(hw, i))
    all_hidden_words_ngrams.append(hidden_word_ngrams)

# Find unique ngrams
identifying_ngrams = []
for i, current_ngrams in enumerate(all_hidden_words_ngrams):
    other_ngrams = list(all_hidden_words_ngrams)
    del other_ngrams[i]
    # Flatten into a set so the membership test below is fast
    other_ngrams = {item for sublist in other_ngrams for item in sublist}
    for ng in current_ngrams:
        if ng not in other_ngrams:
            identifying_ngrams.append(ng)
            break
```
Unique phrases — Part One: From the Arabic
1. possess
2. confide
3. immemorial
4. hence
5. reach
6. heavenly
7. regard
8. renouncing
9. entereth
10. safety
11. get
12. fingers
13. beside
14. fearest
15. endureth
16. light forget
17. helper
18. sake
19. forsaken
20. plenteous
21. hung
22. abased
23. supreme to
24. limits
25. vaunt
26. whoso
27. breathe
28. biddeth
29. ascribe
30. anything
31. account
32. supreme i
33. hail
34. beareth
35. drawing
36. gladness
37. beauteous
38. statutes
39. commandments
40. speed
41. greatness
42. humble
43. being make
44. testify
45. ordain
46. reflect
47. tinge
48. fortitude
49. yearneth
50. adversity
51. providence
52. prosperity
53. overtake
54. if thine
55. test
56. freedom
57. bestow
58. established
59. descent
60. bosom
61. man ascend
62. slumber
63. sinai
64. handiwork
65. grandeur
66. perturbed
67. written
68. same
69. sons
70. stood
71. write
Unique phrases — Part Two: From the Persian
1. solomon
2. nest
3. plant
4. whither
5. disputeth
6. wherein
7. step
8. swift
9. shadow
10. aught
11. blind
12. visions
13. sink
14. friends abandon
15. cometh
16. myriads
17. comrades
18. proclaim
19. surroundings
20. lying
21. moving
22. learned
23. blasts
24. foolish
25. seeming
26. awhile
27. human
28. flash
29. wastes
30. bondslave
31. gaze
32. befriended
33. delightsome
34. young
35. error
36. unless
37. river
38. burst
39. offspring
40. fetters
41. conceal
42. purge
43. lay
44. companion
45. swiftness
46. neglected
47. attire
48. forbearing
49. tell
50. quintessence
51. troubled
52. suffered
53. barrier
54. midst
55. passion cleanse
56. increaseth
57. beware
58. drunk
59. openly
60. whatsoever
61. dewdrop
62. matchless
63. unforeseen
64. oppressors
65. rebellious
66. emigrants
67. breaketh
68. weed
69. adam
70. worldliness
71. entered
72. finely
73. heavens
74. abandoned
75. unwary
76. show
77. nightseason
78. dayspring
79. bestowed
80. trees
81. basest
82. earn