Generates passphrases from a list of basic English words at least 8192 words in length.

JohnL4/PassphraseGenerator

Overview

Generates passphrases from a list of at least 8192 basic English words.
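The 8192-word minimum is presumably motivated by 8192 being 2^13: if each word is drawn uniformly and independently from a list of that size, every word contributes 13 bits of entropy. A minimal sketch of that arithmetic follows; the function name `passphrase_bits` is made up for illustration and is not part of this repository.

```shell
# Hypothetical helper, not part of this repository.
# Usage: passphrase_bits WORDS_IN_LIST WORDS_PER_PHRASE
# Prints the entropy in bits of a passphrase, assuming every word is drawn
# uniformly and independently from the list: k * log2(n).
passphrase_bits() {
  awk -v n="$1" -v k="$2" 'BEGIN { printf "%.1f\n", k * log(n) / log(2) }'
}
```

For example, `passphrase_bits 8192 3` gives the figure for a three-word phrase like the ones in the sample run, and `passphrase_bits 8192 4` shows what one extra word buys.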

Use

shuffle-pick.sh

Shell script to “shuffle” the input word list (the output of mostFrequent) and pick n words at a time.

Once you’ve processed all the raw input data into a list of the most frequently occurring words, this is the script you’ll use most often to generate candidate passphrases.

(Note: It might be possible to run this script via Windows Subsystem for Linux, https://docs.microsoft.com/en-us/windows/wsl/about. Untested at this time.)

Sample run:

J1231-W10$ ./shuffle-pick.sh
exotic performing flood
calm strain membership
holding cooper lifting
sphere thomas august
accelerated ever seeking
candy passions void
encouraging epic measurement
amino constituted tariff
wealthy comparison bibliography
take supra promise
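The script itself is not reproduced here, so the following is only a sketch of the shuffle-and-pick idea, not the actual contents of shuffle-pick.sh. It assumes GNU `shuf` is available and that the word list has one word per line; the function name, its arguments, and the defaults (3 words per line, 10 lines, inferred from the sample run above) are all assumptions.

```shell
# Hypothetical sketch, NOT the real shuffle-pick.sh.
# Usage: shuffle_pick WORDLIST [WORDS_PER_LINE] [LINES]
# Shuffles the word list, joins consecutive words into groups, and prints
# the first few groups. Defaults mirror the sample run (3 words, 10 lines).
shuffle_pick() {
  shuf -- "$1" |
    awk -v n="${2:-3}" '{ printf "%s%s", $0, (NR % n ? " " : "\n") }' |
    head -n "${3:-10}"
}
```

By default `shuf` uses its own internal random source. For passphrases you intend to use, passing `--random-source=/dev/urandom` to `shuf` may be a more cautious choice.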

mostFrequent

Essentially, pipe a raw 1-gram data file into mostFrequent:

cat googlebooks-eng-us-all-1gram-20120701-a | ./mostFrequent

There are, however, some other steps you can take to clean things up a bit.
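To make the idea concrete, here is a rough shell and awk approximation of what mostFrequent is described as doing: find the n most frequent words occurring on or after year y. It is not the Haskell implementation. It assumes the 2012 1-gram layout of tab-separated `ngram`, `year`, `match_count`, `volume_count` fields, and the function name and its two arguments are hypothetical. Whether the real program also folds case or handles part-of-speech-tagged entries is not stated here, so the sketch does neither.

```shell
# Hypothetical approximation, NOT the real mostFrequent.
# Usage: most_frequent N MIN_YEAR < raw-1gram-file
# Assumed input columns (tab-separated): ngram, year, match_count, volume_count.
most_frequent() {
  # Sum match_count per word over all rows with year >= MIN_YEAR.
  # "%.0f" is used instead of "%d" so large totals print safely in any awk.
  awk -F'\t' -v y="$2" '
    $2 >= y { count[$1] += $3 }
    END { for (w in count) printf "%.0f\t%s\n", count[w], w }' |
    # Highest total first; ties broken alphabetically for stable output.
    sort -t "$(printf '\t')" -k1,1nr -k2,2 |
    head -n "$1" |
    cut -f2
}
```

With this sketch, the pipeline above would look something like `cat googlebooks-eng-us-all-1gram-20120701-a | most_frequent 8192 1970`, where both numbers are made-up example values.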

Significant files

mostFrequent.hs
Find the n most frequent words occurring on or after year y in a Google ngrams raw data file, in the 2012 format.
PassphraseGenerator.hs
The guts of the program, so I can write a separate unit-test suite.
spec.hs
Unit test.
filterSpecialCharacters.awk
Awk program to filter out words with non-ASCII characters. We only speak Murrican here.
go
A bash script that encapsulates the commands to build this thing for profiling, prepare the data, and actually run it (and get profiling output).
go2
Bash script to run mostFrequent over all the google raw ngram data files. This is the “Big Run” to produce final output.
filterCommonSuffixes.hs
Program to read a list of words and filter out those that are just another word in the list with a common suffix added (e.g., to pluralize it). This was the first program I wrote, but mostFrequent is where I’ve spent most of my effort recently (as of 24 Oct 2015). I’ve been having second thoughts about whether I actually want to run this, since I don’t think words that are “almost duplicates” really reduce entropy.
raw data
The raw data is at http://storage.googleapis.com/books/ngrams/books/datasetsv2.html, under the section “American English” (version 20120701, 1-grams).
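The two filtering steps listed above can be illustrated with a short sketch. These are not the contents of filterSpecialCharacters.awk or filterCommonSuffixes.hs. The function names are hypothetical, and the suffix list (`s es ed ing ly`) is an assumption chosen only for illustration. Both functions read one word per line on standard input.

```shell
# Hypothetical sketches, NOT the repository's actual filters.

# Drop any line containing a byte outside printable ASCII (space through ~).
# LC_ALL=C makes awk compare bytes rather than locale-specific characters.
filter_non_ascii() {
  LC_ALL=C awk '!/[^ -~]/'
}

# Drop any word that is another word in the list plus a common suffix.
# The suffix list below is an assumption, not taken from the repository.
filter_common_suffixes() {
  awk '
    { word[NR] = $0; seen[$0] = 1 }   # remember input order and membership
    END {
      n = split("s es ed ing ly", suffix, " ")
      for (i = 1; i <= NR; i++) {
        w = word[i]; keep = 1
        for (j = 1; j <= n; j++) {
          stem_len = length(w) - length(suffix[j])
          if (stem_len > 0 &&
              substr(w, stem_len + 1) == suffix[j] &&
              (substr(w, 1, stem_len) in seen)) { keep = 0; break }
        }
        if (keep) print w
      }
    }'
}
```

Chained together, something like `filter_non_ascii < words.txt | filter_common_suffixes` would apply both steps, where `words.txt` is a placeholder file name. Note that the suffix filter holds the whole list in memory, which should be fine for lists of a few thousand words.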
