Overview

Generates passphrases from a list of basic English words at least 8192 words in length.

Use

`shuffle-pick.sh`

Shell script to “shuffle” the input word list (output of mostFrequent) and pick n at a time.

Once you’ve processed all the input raw data into a list of most-frequently-occurring words, this is the script you’ll use most often to generate password possibilities.

(Note: It might be possible to run this script via Windows Subsystem for Linux, https://docs.microsoft.com/en-us/windows/wsl/about. Untested at this time.)

Sample run:

J1231-W10$ ./shuffle-pick.sh
exotic performing flood
calm strain membership
holding cooper lifting
sphere thomas august
accelerated ever seeking
candy passions void
encouraging epic measurement
amino constituted tariff
wealthy comparison bibliography
take supra promise

`mostFrequent`

Essentially,

cat googlebooks-eng-us-all-1gram-20120701-a | ./mostFrequent

Although there are some other steps you can take to clean things up a bit.

Significant files

mostFrequent.hs: Find the n most frequent words occurring on or after year y in a Google ngrams raw data file, in the 2012 format.
PassphraseGenerator.hs: The guts of the program, so I can write a separate unit-test suite.
spec.hs: Unit test.
filterSpecialCharacters.awk: Awk program to filter out words with non-ASCII characters. We only speak Murrican here.
go: A bash script that encapsulates the commands to build this thing for profiling, prepare the data, and actually run it (and get profiling output).
go2: Bash script to run mostFrequent over all the google raw ngram data files. This is the “Big Run” to produce final output.
filterCommonSuffixes.hs: Program to read a list of words and filter out those that are just another word in the list with a common suffix added on (e.g., to pluralize it). This was the first program I wrote, but mostFrequent is where I’ve spent most of my effort recently (as of 24 Oct 2015). I’ve been having second thoughts about whether I wanted to actually run this or not, since I don’t think words that are “almost duplicates” reduces entropy, really.
raw data: The raw data is at http://storage.googleapis.com/books/ngrams/books/datasetsv2.html, under section “American English” (version 20120701, 1-grams).

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
.gitignore		.gitignore
LICENSE		LICENSE
PassphraseGenerator.hs		PassphraseGenerator.hs
README.org		README.org
filterCommonSuffixes.hs		filterCommonSuffixes.hs
filterSpecialCharacters.awk		filterSpecialCharacters.awk
go		go
go2		go2
hscolour.css		hscolour.css
mostFreq-a-z-first-33000.txt		mostFreq-a-z-first-33000.txt
mostFreq-a-z-first-8192.txt		mostFreq-a-z-first-8192.txt
mostFrequent.hs		mostFrequent.hs
ppgen.hs		ppgen.hs
recordPlay.hs		recordPlay.hs
shuffle		shuffle
shuffle-pick.sh		shuffle-pick.sh
spec.hs		spec.hs
words-longer-than-3-chars-sorted.txt		words-longer-than-3-chars-sorted.txt
words-longer-than-3-chars.txt		words-longer-than-3-chars.txt
words.txt		words.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Use

`shuffle-pick.sh`

`mostFrequent`

Significant files

About

Releases

Packages

Languages

License

JohnL4/PassphraseGenerator

Folders and files

Latest commit

History

Repository files navigation

Overview

Use

shuffle-pick.sh

mostFrequent

Significant files

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`shuffle-pick.sh`

`mostFrequent`

Packages