Looking for the shortest unique phrases in the Hidden Words

28 April 2021

O Son of Spirit! My first counsel is this: Possess a pure, kindly and radiant heart, that thine may be a sovereignty ancient, imperishable and everlasting.

Inspired by a post on r/bahaidev, I identified the shortest unique phrase in each of the Hidden Words. It turns out that for 145 of the 153 Hidden Words a single word is enough to identify it uniquely.

The Hidden Words are described by their Author as the inner essence of that which has been revealed unto the Prophets of old, 'clothed in the garment of brevity.' There are many intonations of the Hidden Words, and their titles are often the opening line, e.g. "O Son of Man" or "O Son of Spirit". But several opening lines are shared by multiple Hidden Words, so the opening line generally doesn't identify the Hidden Word, and intonations of different Hidden Words sometimes end up with the same title. This situation inspired the following attempt to find uniquely identifying phrases in the 153 Hidden Words that might serve as titles for intonations.

Preprocessing

I got the Hidden Words from the Bahá'í Library. Then I converted every word to lower case and removed punctuation. To split each Hidden Word into individual words, I used the word_tokenize function from NLTK.
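The cleaning step can be sketched in plain Python as follows. This is a minimal illustration, not the code used for the article: the helper name `normalize` is hypothetical, and `str.split` stands in for NLTK's word_tokenize, which may split some words differently.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def normalize(text):
    """Lower-case the text, drop ASCII punctuation, split on whitespace."""
    return text.lower().translate(_STRIP_PUNCT).split()

sample = "O Son of Spirit! My first counsel is this: Possess a pure, kindly and radiant heart"
tokens = normalize(sample)
print(tokens[:6])  # ['o', 'son', 'of', 'spirit', 'my', 'first']
```

One side effect worth knowing: because hyphens count as punctuation, a hyphenated word such as "night-season" is fused into "nightseason", which is likely why some entries in the result lists look like single compound words.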

Computing ngrams

In computational linguistics, an n-gram is a contiguous sequence of n items from a sample of text or speech. In our case the items are words, but they could also be any other unit such as letters, syllables or sentences. I used the ngrams function from NLTK to compute all n-grams for each Hidden Word. This gives me a list of all phrases occurring in each Hidden Word, ordered from shortest to longest.
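To make the definition concrete, here is a small sketch of word n-grams without NLTK. The helper names `word_ngrams` and `all_ngrams` are made up for this illustration; the article itself relies on NLTK's ngrams function.

```python
def word_ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def all_ngrams(tokens):
    """All n-grams for n = 1 .. len(tokens), shortest first."""
    result = []
    for n in range(1, len(tokens) + 1):
        result.extend(word_ngrams(tokens, n))
    return result

tokens = ["o", "son", "of", "spirit"]
print(word_ngrams(tokens, 2))
# [('o', 'son'), ('son', 'of'), ('of', 'spirit')]
print(len(all_ngrams(tokens)))  # 4 + 3 + 2 + 1 = 10
```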

Finding unique n-grams

Once I had all n-grams, I just needed to find the shortest n-gram in each Hidden Word that does not occur in any other Hidden Word. I brute-forced this by creating two collections of n-grams for each Hidden Word: the first contains all n-grams from the current Hidden Word, the second contains all n-grams from the "other" Hidden Words. Because the n-grams of the current Hidden Word are generated shortest first, the first one that does not appear in the "other" Hidden Words is also the shortest. The full list of shortest unique phrases is at the bottom of the article.
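The search can also be sketched without rebuilding the "other" collection for every Hidden Word. The variant below is an alternative to the brute-force approach described above, not the article's code: it counts in how many texts each n-gram occurs and picks the first n-gram with a count of one. The function names and the three toy token lists are invented for illustration.

```python
from collections import Counter

def all_ngrams(tokens):
    """All contiguous n-grams of tokens as tuples, shortest first."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, len(tokens) + 1)
        for i in range(len(tokens) - n + 1)
    ]

def shortest_unique_ngrams(docs):
    """For each tokenized text, return its first shortest n-gram that
    occurs in no other text, or None if there is none."""
    grams_per_doc = [all_ngrams(doc) for doc in docs]
    # Document frequency: set() makes repeats inside one text count once.
    doc_freq = Counter(g for grams in grams_per_doc for g in set(grams))
    return [
        next((g for g in grams if doc_freq[g] == 1), None)
        for grams in grams_per_doc
    ]

# Toy token lists, not actual Hidden Words.
docs = [
    ["o", "son", "of", "light", "forget"],
    ["o", "son", "of", "light", "shine"],
    ["forget", "not", "o", "son"],
]
print(shortest_unique_ngrams(docs))
# [('light', 'forget'), ('shine',), ('not',)]
```

In the first toy text every single word also occurs elsewhere, so the result falls back to a two-word phrase, which mirrors what happens for a few of the Hidden Words.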

Analysis

  • 145 of the 153 Hidden Words contain a word that is unique
  • Eight Hidden Words need two words to be uniquely identified
    • Six are from the Arabic part
    • Two are from the Persian part
    • For seven of them, the first unique two-word phrase consists of the last word of the opening line and the first word of the following sentence
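Counts like the ones above can be tallied directly from the result list. The following sketch assumes the phrases are available as plain strings; it uses only a handful of entries copied from the lists at the bottom of the article, so the numbers it prints describe this sample, not all 153 Hidden Words.

```python
from collections import Counter

# A few entries copied from the result lists in this article.
phrases = ["possess", "confide", "light forget", "hence", "supreme to", "solomon"]

# Map phrase length in words -> number of phrases with that length.
length_counts = Counter(len(p.split()) for p in phrases)
print(dict(length_counts))  # {1: 4, 2: 2}
```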

Conclusion

  • All Hidden Words can be uniquely identified with one or two words
  • The identified words are not always suitable as a title. They either do not convey enough meaning or don't make any sense on their own (e.g. hence, whoso, light forget).
  • Next steps for improved results
    • Don't include adverbs and pronouns
    • Prevent the two-word phrases from crossing sentence boundaries, maybe even require one of the words to be a noun
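One possible way to keep phrases from crossing sentence boundaries is to split the text at sentence-ending marks before removing punctuation, and to compute n-grams per sentence. This is only a sketch of that idea: the function name `sentence_ngrams` is hypothetical, and treating colons and semicolons as boundaries is an assumption, not something tested on the full text.

```python
import re
import string

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def sentence_ngrams(text, n):
    """n-grams that never span a sentence-ending mark (. ! ? : ;)."""
    grams = []
    # Split first, while punctuation is still present, then clean each piece.
    for sentence in re.split(r"[.!?:;]+", text):
        tokens = sentence.lower().translate(_STRIP_PUNCT).split()
        grams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams

text = "O Son of Spirit! My first counsel is this: Possess a pure, kindly and radiant heart"
bigrams = sentence_ngrams(text, 2)
print(("spirit", "my") in bigrams)      # False: would cross the "!"
print(("first", "counsel") in bigrams)  # True: inside one sentence
```

With this change a phrase such as "light forget" could no longer be produced, so the affected Hidden Words would need a different, possibly longer, identifying phrase.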

Code

import string
import nltk

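# Read the text; Hidden Words are assumed to be separated by blank lines
with open('hidden_words.txt', 'r', encoding='utf-8') as f:
    hidden_words = [hw for hw in f.read().split('\n\n') if hw.strip()]

# Lower-case and strip ASCII punctuation, then split into words
# (word_tokenize needs NLTK's tokenizer data, e.g. nltk.download('punkt'))
strip_punctuation = str.maketrans("", "", string.punctuation)
hidden_words_cleaned = [hw.lower().translate(strip_punctuation) for hw in hidden_words]
hidden_words_tokenized = [nltk.word_tokenize(hw) for hw in hidden_words_cleaned]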

# Get all n-grams for each Hidden Word, shortest first
all_hidden_words_ngrams = []
for hw in hidden_words_tokenized:
    hidden_word_ngrams = []
    # + 1 so that the n-gram spanning the whole Hidden Word is included
    for n in range(1, len(hw) + 1):
        hidden_word_ngrams.extend(nltk.ngrams(hw, n))
    all_hidden_words_ngrams.append(hidden_word_ngrams)

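# Find the shortest n-gram of each Hidden Word that occurs in no other one
identifying_ngrams = []
for i, current_ngrams in enumerate(all_hidden_words_ngrams):
    # Set of every n-gram from all other Hidden Words (fast membership tests)
    other_ngrams = {
        ng
        for j, ngrams in enumerate(all_hidden_words_ngrams) if j != i
        for ng in ngrams
    }
    # current_ngrams is ordered shortest first, so the first hit is the shortest
    for ng in current_ngrams:
        if ng not in other_ngrams:
            identifying_ngrams.append(ng)
            break
    else:
        # Keep positions aligned if no unique n-gram exists
        identifying_ngrams.append(None)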

Unique phrases — Part One: From the Arabic

1 possess
2 confide
3 immemorial
4 hence
5 reach
6 heavenly
7 regard
8 renouncing
9 entereth
10 safety
11 get
12 fingers
13 beside
14 fearest
15 endureth
16 light forget
17 helper
18 sake
19 forsaken
20 plenteous
21 hung
22 abased
23 supreme to
24 limits
25 vaunt
26 whoso
27 breathe
28 biddeth
29 ascribe
30 anything
31 account
32 supreme i
33 hail
34 beareth
35 drawing
36 gladness
37 beauteous
38 statutes
39 commandments
40 speed
41 greatness
42 humble
43 being make
44 testify
45 ordain
46 reflect
47 tinge
48 fortitude
49 yearneth
50 adversity
51 providence
52 prosperity
53 overtake
54 if thine
55 test
56 freedom
57 bestow
58 established
59 descent
60 bosom
61 man ascend
62 slumber
63 sinai
64 handiwork
65 grandeur
66 perturbed
67 written
68 same
69 sons
70 stood
71 write

Unique phrases — Part Two: From the Persian

1 solomon
2 nest
3 plant
4 whither
5 disputeth
6 wherein
7 step
8 swift
9 shadow
10 aught
11 blind
12 visions
13 sink
14 friends abandon
15 cometh
16 myriads
17 comrades
18 proclaim
19 surroundings
20 lying
21 moving
22 learned
23 blasts
24 foolish
25 seeming
26 awhile
27 human
28 flash
29 wastes
30 bondslave
31 gaze
32 befriended
33 delightsome
34 young
35 error
36 unless
37 river
38 burst
39 offspring
40 fetters
41 conceal
42 purge
43 lay
44 companion
45 swiftness
46 neglected
47 attire
48 forbearing
49 tell
50 quintessence
51 troubled
52 suffered
53 barrier
54 midst
55 passion cleanse
56 increaseth
57 beware
58 drunk
59 openly
60 whatsoever
61 dewdrop
62 matchless
63 unforeseen
64 oppressors
65 rebellious
66 emigrants
67 breaketh
68 weed
69 adam
70 worldliness
71 entered
72 finely
73 heavens
74 abandoned
75 unwary
76 show
77 nightseason
78 dayspring
79 bestowed
80 trees
81 basest
82 earn