convert to rust
tinmarr committed Jan 18, 2023
1 parent f205e95 commit b8c4cd5
Showing 9 changed files with 821 additions and 105 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
/target
5 changes: 5 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,5 @@
{
    "licenser.license": "GPLv3",
    "licenser.projectName": "word_unscrambler",
    "licenser.author": "Martin Chaperot"
}
7 changes: 7 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

8 changes: 8 additions & 0 deletions Cargo.toml
@@ -0,0 +1,8 @@
[package]
name = "word_unscrambler"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
695 changes: 674 additions & 21 deletions LICENSE

Large diffs are not rendered by default.

41 changes: 28 additions & 13 deletions README.md
@@ -1,29 +1,40 @@
# Word-Unscrambler
This Python program outputs a list of the possible options a scrambled word could be.

This Rust program outputs a list of the possible options a scrambled word could be.

_Looking for the Python version? It's been archived to the
[python-version](https://github.com/tinmarr/Word-Unscrambler/tree/python-version) branch._

![Demo Gif](ezgif.com-gif-maker.gif)

## About
## About

This program was created by [Martin Chaperot-Merino](https://github.com/tinmarr)

This word unscrambler can clean unstructured text before a named-entity recognition (NER) algorithm is applied. For example, the word unscrambler function can be applied to every word in a text file before these words are looked up in a gazetteer (a list of entities such as cities, organizations, days of the week, etc.).
This word unscrambler can clean unstructured text before a named-entity recognition (NER) algorithm is applied. For
example, the word unscrambler function can be applied to every word in a text file before these words are looked up in a
gazetteer (a list of entities such as cities, organizations, days of the week, etc.).

# How to use

1. Open the IDE: [https://replit.com/@Tinmarr/Word-Unscrambler?v=1](https://replit.com/@Tinmarr/Word-Unscrambler?v=1)
2. Wait for the prompt <br />
![The code asks to enter a scrambled word](step1.png)
3. Enter a scrambled word <br />
![The entered word is lleho](step2.png)
4. Hit enter <br />
![The code returns hello and asks if you want to restart](step3.png)
2. Wait for the prompt <br /> ![The code asks to enter a scrambled word](step1.png)
3. Enter a scrambled word <br /> ![The entered word is lleho](step2.png)
4. Hit enter <br /> ![The code returns hello and asks if you want to restart](step3.png)

# How it works
It takes words from a text file and uses a lookup function to find words with the same letters (where the order of the letters does not matter).

It takes words from a text file and uses a lookup function to find words with the same letters (where the order of the
letters does not matter).

## The key to its speed
It converts every word into an integer derived from its letter counts and groups words with the same integer in a dictionary. Then it converts the typed word into an integer and looks up that integer in the dictionary.

A first function, Word2Vect, converts a word into a 26-dimensional vector. Each dimension holds the number of occurrences of one letter ('a', 'b', 'c', ...).
It converts every word into an integer derived from its letter counts and groups words with the same integer in a
dictionary. Then it converts the typed word into an integer and looks up that integer in the dictionary.

A first function, Word2Vect, converts a word into a 26-dimensional vector. Each dimension holds the number of
occurrences of one letter ('a', 'b', 'c', ...).

```
def Word2Vect(word):
l = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
@@ -36,7 +47,10 @@ def Word2Vect(word):
v[ind] += 1
return v
```
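
Since this commit moves the project to Rust, the letter-count step described above can be sketched in Rust as well. This is only an illustration: the name `letter_counts` and the choice to ignore every character outside a-z are assumptions, not code from the repository.

```rust
/// Counts how often each ASCII letter a-z occurs in `word`.
/// Characters that are not a-z after lowercasing are ignored (an assumption).
fn letter_counts(word: &str) -> [u8; 26] {
    let mut counts = [0u8; 26];
    for ch in word.chars() {
        let lower = ch.to_ascii_lowercase();
        if lower.is_ascii_lowercase() {
            // 'a' maps to index 0, 'b' to index 1, and so on up to 'z' at 25.
            let idx = (lower as u8 - b'a') as usize;
            counts[idx] = counts[idx].saturating_add(1);
        }
    }
    counts
}
```

Two words are anagrams of each other exactly when this function returns the same array for both, which is why `lleho` and `hello` end up in the same group.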
Then a second function, Vect2Int, converts the 26-dimensional vector into an integer. Each dimension is reduced to 4 bits, so the 26 fields together take 104 bits of the integer.

Then a second function, Vect2Int, converts the 26-dimensional vector into an integer. Each dimension is reduced to 4 bits,
so the 26 fields together take 104 bits of the integer.


```
def Vect2Int(vect):
pv = 0
@@ -47,4 +61,5 @@ def Vect2Int(vect):
pv += 4
return f
```
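
The packing step can be sketched in Rust along the same lines. The function name `pack_counts` and the decision to return `None` when a count does not fit in 4 bits are assumptions made for this example; the elided Python above may handle that case differently.

```rust
/// Packs 26 letter counts into one integer, 4 bits per letter.
/// Letter index i occupies bits 4*i .. 4*i+3, so 104 of the 128 bits are used.
/// Returns `None` if any count exceeds 15 and would not fit in its 4-bit field.
fn pack_counts(counts: &[u8; 26]) -> Option<u128> {
    let mut packed: u128 = 0;
    for (i, &count) in counts.iter().enumerate() {
        if count > 15 {
            return None;
        }
        packed |= (count as u128) << (4 * i);
    }
    Some(packed)
}
```

The `word_2_int` function in `src/main.rs` produces the same bit layout by adding `2^(4 * i)` once per occurrence of letter `i`, without building the intermediate vector.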

Using an integer as the lookup key in a dictionary makes each lookup fast!
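
A minimal sketch of the grouping and lookup, assuming a `HashMap` keyed by the packed integer. The names `anagram_key`, `build_index`, and `unscramble` are hypothetical and chosen for this example; the Rust source in this commit uses a `BTreeMap` and reads its word list from a file instead.

```rust
use std::collections::HashMap;

/// Anagram key: 4 bits per letter a-z; other characters are ignored.
fn anagram_key(word: &str) -> u128 {
    let mut key: u128 = 0;
    for b in word.bytes() {
        let lower = b.to_ascii_lowercase();
        if lower.is_ascii_lowercase() {
            key += 1u128 << (4 * (lower - b'a') as u32);
        }
    }
    key
}

/// Groups the words by key so that all anagrams share one map entry.
fn build_index<'a>(words: &[&'a str]) -> HashMap<u128, Vec<&'a str>> {
    let mut index: HashMap<u128, Vec<&'a str>> = HashMap::new();
    for &word in words {
        if word.is_empty() {
            continue;
        }
        index.entry(anagram_key(word)).or_default().push(word);
    }
    index
}

/// Returns every indexed word with the same letters as `scrambled`.
fn unscramble<'a>(index: &HashMap<u128, Vec<&'a str>>, scrambled: &str) -> Vec<&'a str> {
    index
        .get(&anagram_key(scrambled))
        .cloned()
        .unwrap_or_default()
}
```

The index is built once, and each query then costs one key computation plus one map lookup, regardless of how many words the dictionary holds.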
File renamed without changes.
71 changes: 0 additions & 71 deletions main.py

This file was deleted.

98 changes: 98 additions & 0 deletions src/main.rs
@@ -0,0 +1,98 @@
// Copyright (C) 2023 Martin Chaperot
//
// This file is part of word_unscrambler.
//
// word_unscrambler is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// word_unscrambler is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with word_unscrambler. If not, see <http://www.gnu.org/licenses/>.
use std::collections::BTreeMap;
use std::fs;
use std::io::{self, Write};

const LETTERS: [char; 26] = [
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's',
    't', 'u', 'v', 'w', 'x', 'y', 'z',
];

fn load_dictionary() -> BTreeMap<u128, Vec<String>> {
    let data = fs::read_to_string("assets/DL.txt").expect("Unable to read dictionary file");
    let mut map: BTreeMap<u128, Vec<String>> = BTreeMap::new();
    for line in data.lines() {
        let word = line.to_string().to_lowercase();
        if word == "" {
            continue;
        }
        let i = word_2_int(&word);
        match map.get_mut(&i) {
            Some(vector) => {
                vector.push(word);
            }
            None => {
                map.insert(i, vec![word]);
            }
        }
    }
    map
}

fn word_2_int(word: &String) -> u128 {
    let mut word_int: u128 = 0;
    for letter in word.chars() {
        let i: u32 = match LETTERS.binary_search(&letter) {
            Ok(i) => i,
            Err(_) => continue,
        }
        .try_into()
        .expect("If this panics something went horribly wrong");
        word_int += 2u128.pow(4u32 * i);
    }
    word_int
}

fn main() {
    let map = load_dictionary();
    loop {
        let mut word = String::new();

        print!("Enter a scrambled word: ");
        io::stdout().flush().unwrap();

        io::stdin()
            .read_line(&mut word)
            .expect("Failed to read line");

        word = word.trim().to_string().to_lowercase();
        match map.get(&word_2_int(&word)) {
            Some(vector) => {
                println!("{:?}", vector);
            }
            None => {
                println!("No match found");
            }
        }

        print!("Try again? [Y/n]: ");
        io::stdout().flush().unwrap();

        let mut again = String::new();

        io::stdin()
            .read_line(&mut again)
            .expect("Failed to read line");

        if again.trim().to_lowercase() != "n" {
            continue;
        } else {
            break;
        }
    }
}
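
One property of the 4-bit encoding in `word_2_int` may be worth keeping in mind: a 16th occurrence of a letter carries into the field of the next letter. The sketch below reproduces the same arithmetic under a hypothetical name, `key_by_addition`, and adds a hypothetical `counts_fit` check; neither function is part of this commit.

```rust
/// Same arithmetic as `word_2_int`: add 16^i once per occurrence of letter i.
/// Only lowercase ASCII letters are counted here (an assumption of this sketch).
fn key_by_addition(word: &str) -> u128 {
    word.bytes()
        .filter(|b| b.is_ascii_lowercase())
        .map(|b| 1u128 << (4 * (b - b'a') as u32))
        .sum()
}

/// Returns true when no letter occurs more than 15 times, which means
/// every count fits in its own 4-bit field and no carry can occur.
fn counts_fit(word: &str) -> bool {
    let mut counts = [0u32; 26];
    for b in word.bytes().filter(|b| b.is_ascii_lowercase()) {
        counts[(b - b'a') as usize] += 1;
    }
    counts.iter().all(|&c| c <= 15)
}
```

Under this scheme, sixteen `a`s produce the same key as a single `b`. Ordinary dictionary words are unlikely to repeat a letter that often, so the effect should matter only for unusual input.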
