nlptagger

This project is under heavy construction and will change a lot. I am learning as I build it, and accuracy is not the top priority right now.

For when you need a program to understand the context of commands.

nlptagger
General info
Why build this?
What does it do?
Technologies
Requirements
Repository overview
Overview of the code.
Things to remember
Reference Commands
Special thanks
Why Go?
Just added

General info

* This project is used for tagging CLI commands. It is not an LLM and is not trying to be one. I use it to generate Go code, but I kept it completely separate so others can enjoy it.
* I will keep working on it and, hopefully, improving how well it guesses intent.

- Background

Tokenization: This is the very first step in most NLP pipelines. It involves breaking down text into individual units called tokens (words, punctuation marks, etc.). Tokenization is fundamental because it creates the building blocks for further analysis.
Part-of-Speech (POS) Tagging: POS tagging assigns grammatical categories (noun, verb, adjective, etc.) to each token. It's a crucial step for understanding sentence structure and is often used as input for more complex tasks like phrase tagging.
Named Entity Recognition (NER): NER identifies and classifies named entities (people, organizations, locations, dates, etc.) in text. This is more specific than POS tagging but still more generic than phrase tagging, as it focuses on individual entities rather than complete phrases.
Dependency Parsing: Dependency parsing analyzes the grammatical relationships between words in a sentence, creating a tree-like structure that shows how words depend on each other. It provides a deeper understanding of sentence structure than phrase tagging, which focuses on contiguous chunks.
Lemmatization and Stemming: These techniques reduce words to their base or root forms (e.g., "running" to "run"). They help to normalize text and improve the accuracy of other NLP tasks.
Word2Vec: A technique that represents words as numerical vectors capturing semantic relationships, so that words with similar meanings have closer vectors. This lets algorithms process text more effectively by leveraging word similarities.
Semantic Roles: These describe the roles of words or phrases within a sentence, such as agent, action, or object. Identifying these roles helps to understand the meaning of, and relationships between, different parts of a sentence.

* Phrase tagging often uses the output of these more generic techniques as input. For example:

POS tags are commonly used to define rules for identifying phrases (e.g., a noun phrase might be defined as a sequence of words starting with a determiner followed by one or more adjectives and a noun). NER can be used to identify specific types of phrases (e.g., a phrase tagged as "PERSON" might indicate a person's name).

Why build this?

Go never changes
It is nice to not have terminal drop downs

What does it do?

It tags words for commands.
* I made an overview video on this project, but there have been a lot of changes since then. video

Technologies

* Just Go.

Requirements

go 1.23 for gonew

How to run as is?

package main

import (
	"bufio"
	"flag"
	"fmt"
	"os"
	"strings"

	"github.com/golangast/nlptagger/neural/nn/g"
	"github.com/golangast/nlptagger/neural/nn/semanticrole"
	"github.com/golangast/nlptagger/neural/nnu"
	"github.com/golangast/nlptagger/neural/nnu/intent"
	"github.com/golangast/nlptagger/neural/nnu/train"
	"github.com/golangast/nlptagger/neural/nnu/word2vec"
)

// Command-line settings. The flag names match the run command further down;
// the Go types are assumptions inferred from how each value is used.
var (
	model               string
	epochs              int
	window              int
	hiddensize          int
	vectorsize          int
	learningrate        float64
	maxgrad             float64
	similaritythreshold float64
)

func main() {
	// Defaults mirror the run command shown in this README.
	flag.StringVar(&model, "model", "true", "load trained_model.gob before training")
	flag.IntVar(&epochs, "epochs", 100, "number of training epochs")
	flag.Float64Var(&learningrate, "learningrate", 0.1, "learning rate")
	flag.IntVar(&hiddensize, "hiddensize", 100, "neurons in the hidden layer")
	flag.IntVar(&vectorsize, "vectorsize", 100, "length of each word vector")
	flag.IntVar(&window, "window", 10, "context window size")
	flag.Float64Var(&maxgrad, "maxgrad", 20, "maximum gradient norm")
	flag.Float64Var(&similaritythreshold, "similaritythreshold", 0.6, "similarity cutoff")
	flag.Parse()

	var sw2v *word2vec.SimpleWord2Vec
	var err error

	if model == "true" {
		sw2v, err = word2vec.LoadModel("trained_model.gob")
		if err != nil {
			fmt.Println("Error loading the model in loadmodel:", err)
		}
	}

	// NOTE: as written, this fresh configuration replaces any model loaded above.
	sw2v = &word2vec.SimpleWord2Vec{
		Vocabulary:  make(map[string]int),
		WordVectors: make(map[int][]float64),
		// Each word in the vocabulary is represented by a vector of VectorSize
		// numbers. A larger VectorSize allows a more nuanced representation of
		// words, but increases the cost of training and storage.
		VectorSize:        vectorsize,
		ContextEmbeddings: make(map[string][]float64),
		Window:            window, // context window size
		Epochs:            epochs,
		ContextLabels:     make(map[string]string),
		UNKToken:          "<UNK>",
		// Number of neurons in the hidden layer. A larger hidden size usually
		// lets the network learn more complex patterns, but needs more compute.
		HiddenSize:   hiddensize,
		LearningRate: learningrate,
		// Exploding gradients make training unstable. Limiting the gradient
		// norm to MaxGrad keeps weight updates within a reasonable range.
		MaxGrad: maxgrad,
		// Tightens the definition of similarity used in similarity calculations.
		SimilarityThreshold: similaritythreshold,
	}
	sw2v.Ann, err = g.NewANN(sw2v.VectorSize, "euclidean")
	if err != nil {
		fmt.Println("Error creating ANN:", err)
		return // exit if ANN creation fails
	}

	nn := nnu.NewSimpleNN("datas/tagdata/training_data.json")
	// Train the model
	c, err := train.JsonModelTrain(sw2v, nn)
	if err != nil {
		fmt.Println("Error in JsonModelTrain:", err)
	}

	// Save the trained model
	err = sw2v.SaveModel("trained_model.gob")
	if err != nil {
		fmt.Println("Error saving the model:", err)
	}

	// Ask for a command and classify its intent
	i := intent.IntentClassifier{}
	com := InputScanDirections("what would you like to do?")
	intents, err := i.ProcessCommand(com, sw2v.Ann.Index, c)
	if err != nil {
		fmt.Println("Error in ProcessCommand:", err)
	}
	fmt.Println("~~~ this is the intent: ", intents)

	myModel, err := semanticrole.NewSemanticRoleModel("word2vec_model.gob", "bilstm_model.gob", "role_map.gob")
	if err != nil {
		fmt.Println("Error creating SemanticRoleModel:", err)
	} else {
		fmt.Println("Semantic Role Model:", myModel)
	}
}

// InputScanDirections prints a prompt and returns one trimmed line from stdin.
// The snippet calls this helper without showing it, so this body is an
// assumed implementation.
func InputScanDirections(directions string) string {
	fmt.Println(directions)
	scanner := bufio.NewScanner(os.Stdin)
	if scanner.Scan() {
		return strings.TrimSpace(scanner.Text())
	}
	return ""
}

- clone it

git clone https://github.com/golangast/nlptagger

- or
- install gonew to pull down project quickly

go install golang.org/x/tools/cmd/gonew@latest

- run gonew

gonew github.com/golangast/nlptagger example.com/nlptagger

- cd into nlptagger

cd nlptagger

- run the project

go run . -model true -epochs 100 -learningrate 0.1 -hiddensize 100 -vectorsize 100 -window 10 -maxgrad 20 -similaritythreshold 0.6

Repository overview

├── data #training data
│   └── training_data.json
├── neural #neural network
│   ├── nn #neural networks for tagging
│   ├── nnu #neural network utils
│   └── semanticrole #semantic role model
├── tagger #tagger folder
│   ├── dependencyrelation #dependency relation
│   ├── nertagger #ner tagging
│   ├── phrasetagger #phrase tagging
│   ├── postagger #pos tagging
│   ├── stem #stemming tokens before tagging
│   ├── tag #tag data structure
│   └── tagger.go
└── all .gob files/models are at the outer directory #model

Overview of the code.

* Tries to guess the intent of the command given to the program.

## Things to remember
* it is not an LLM or trying to be
* it is only for cli commands

Just added

* the project
* word2vec
* semanticroles
* context

Special thanks

Go Team because they are gods

Why Go?

The language has been done since 1.0. https://youtu.be/rFejpH_tAHM Only small features have been added in the 10 years since, so whatever you learn now will stay useful.
It also has a compatibility promise https://go.dev/doc/go1compat
It was also built by great people. https://hackernoon.com/why-go-ef8850dc5f3c
14th most used language https://insights.stackoverflow.com/survey/2021
Highest starred language https://github.com/golang/go
It is also the number 1 language to go to and not from https://www.jetbrains.com/lp/devecosystem-2021/#Do-you-plan-to-adopt--migrate-to-other-languages-in-the-next--months-If-so-to-which-ones
Go is growing in all measures https://madnight.github.io/githut/#/stars/2023/3
Jobs are almost doubling every year. https://stacktrends.dev/technologies/programming-languages/golang/
Companies that use go. https://go.dev/wiki/GoUsers
Why I picked Go https://youtu.be/fD005g07cU4
