Generating Personalized Wordlists by Analyzing Target's Tweets
9 Aug 2019
Live Demo
Utku Sen
Abstract
Adversaries need a wordlist or a combination-generation tool when conducting password guessing attacks. To narrow the combination pool, researchers developed a method named “mask attack”, in which the attacker assumes a password’s structure. Although this narrows the combination pool significantly, the pool is still too large for online attacks or for offline attacks with limited hardware resources.
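To make the size of a masked combination pool concrete, the following is a minimal sketch that counts the candidates a mask produces. The placeholder notation (`?l`, `?u`, `?d`) follows the hashcat convention, which is an assumption here since the text names no specific tool, and `mask_keyspace` is a hypothetical helper name.

```python
# Sizes of the character classes behind each placeholder (hashcat-style
# notation, used here as an assumption; the text names no specific tool).
CHARSET_SIZES = {
    "?l": 26,  # lowercase letters a-z
    "?u": 26,  # uppercase letters A-Z
    "?d": 10,  # digits 0-9
}


def mask_keyspace(mask: str) -> int:
    """Return the number of candidates a mask generates."""
    if len(mask) % 2 != 0:
        raise ValueError("mask must be a sequence of two-character placeholders")
    total = 1
    for i in range(0, len(mask), 2):
        placeholder = mask[i:i + 2]
        if placeholder not in CHARSET_SIZES:
            raise ValueError(f"unknown placeholder: {placeholder!r}")
        # Each position multiplies the pool by the size of its character class.
        total *= CHARSET_SIZES[placeholder]
    return total


if __name__ == "__main__":
    # One uppercase letter, four lowercase letters, two digits (e.g. "Tesla18").
    print(mask_keyspace("?u?l?l?l?l?d?d"))  # 26 * 26**4 * 10**2 = 1188137600
```

Even with the structure fixed to seven characters, this mask yields more than a billion candidates, which illustrates why a mask alone is impractical for online guessing.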
In the real world, a password’s structure is an unknown value, just like the password itself. Even if we specify a password structure with masks, we are still brute forcing the characters in the mask. When we analyzed the Ashley Madison and Myspace wordlists, we saw that they mostly consist of sequential alpha characters, which means there is a high probability that they are meaningful words. The first step is determining whether a letter sequence is a meaningful word in the English language. We can state that a letter sequence is an English word if it is listed in an English lexicon. WordNet (a lexical database for English created by Princeton University) is used as the lexicon. Our research shows that 30% of the Ashley Madison wordlist and 36% of the Myspace wordlist contain meaningful English words.
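The lexicon check described above can be sketched as follows. This is an illustration under stated assumptions, not the measurement code used in the research: `LEXICON` is a tiny stand-in for WordNet, the sample passwords are invented, and the matching rule (any maximal run of letters that appears in the lexicon) is one plausible reading of the text.

```python
from __future__ import annotations

import re

# Tiny stand-in lexicon (an assumption for this sketch). The research uses
# WordNet; a real run would load its word list here instead.
LEXICON = {"love", "dragon", "monkey", "football", "summer"}


def alpha_sequences(password: str) -> list[str]:
    """Return the maximal runs of ASCII letters in a password, lowercased."""
    return [run.lower() for run in re.findall(r"[A-Za-z]+", password)]


def contains_lexicon_word(password: str, lexicon: set[str]) -> bool:
    """True if any letter run in the password is listed in the lexicon."""
    return any(run in lexicon for run in alpha_sequences(password))


def meaningful_ratio(passwords: list[str], lexicon: set[str]) -> float:
    """Fraction of passwords containing at least one lexicon word."""
    if not passwords:
        return 0.0
    hits = sum(contains_lexicon_word(p, lexicon) for p in passwords)
    return hits / len(passwords)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = ["dragon123", "qwzx9981", "Summer2019", "1234567"]
    print(meaningful_ratio(sample, LEXICON))  # 2 of 4 passwords match: 0.5
```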
If we use all words in the Oxford English Dictionary, the combination pool will be 171,476 words. But 171,476 is still a big number for online attacks. We can reduce this number if we can identify what kind of words people usually choose. According to experiments conducted by Carnegie Mellon and Carleton universities, most people choose words for their passwords based on personal topics such as hobbies, work, religion, sports, video games, etc. So if we can identify candidate words from a person’s areas of interest, we can reduce the combination pool significantly. On Twitter, people tend to share posts mostly related to their areas of interest. Because of that, Twitter is a good source for identifying a user’s personal topics and generating related words to reduce the combination pool for password guessing attacks.
Our tool, Rhodiola, was developed to narrow the combination pool by creating a personalized wordlist for a target person. It finds the interest areas of a given user by analyzing his/her tweets and builds a personalized wordlist. The wordlist consists of the most used nouns and proper nouns, paired nouns and proper nouns, and cities and years related to the detected proper nouns. Example usage:
python rhodiola.py --username elonmusk
Example output:
...
tesla
car
boring
spacex
falcon
flamethrower
coloradosprings
tesla1856
...
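The idea of turning a user's most used words into candidates can be sketched as below. This is not Rhodiola's implementation: part-of-speech tagging is replaced by a small stopword filter, pairing is done by plain concatenation, the city and year lookups are omitted, and the tweets, `STOPWORDS`, and all function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from itertools import permutations

# Crude substitute for noun detection (an assumption): drop common
# function words instead of running a part-of-speech tagger.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "to", "of", "and",
             "in", "on", "for", "our", "new", "will"}


def tokenize(tweet: str) -> list[str]:
    """Split a tweet into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", tweet.lower())


def most_used_words(tweets: list[str], top_n: int) -> list[str]:
    """Return the top_n most frequent non-stopword tokens across all tweets."""
    counts = Counter(
        word
        for tweet in tweets
        for word in tokenize(tweet)
        if word not in STOPWORDS and len(word) > 2
    )
    return [word for word, _ in counts.most_common(top_n)]


def build_wordlist(tweets: list[str], top_n: int = 3) -> list[str]:
    """Combine the most used words with every ordered pair of them."""
    words = most_used_words(tweets, top_n)
    pairs = [first + second for first, second in permutations(words, 2)]
    return words + pairs


if __name__ == "__main__":
    # Invented tweets, for illustration only.
    tweets = [
        "Tesla car production is on track",
        "The new Tesla car will launch on a Falcon rocket",
        "Falcon landing was perfect, Tesla team celebrates",
    ]
    for candidate in build_wordlist(tweets):
        print(candidate)
```

With three base words the sketch emits nine candidates (three words plus six ordered pairs), a pool small enough for online guessing, which is the property the abstract is after.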