Fin-URLs

Detects URLs, emails, and IP addresses for the Fin natural language processor.

The Problem

The core lexer doesn't treat URLs, emails and IP addresses any differently from ordinary text, so when presented with the string www.google it splits www and google into separate tokens. This is inaccurate, and the error carries over into the POS tagger and the dependency parser.

The solution is to have a preprocessor function that takes the URLs out and a postprocessor function that puts them back once the lexer, the POS tagger and the dependency parser are done with the sentence.
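The take-out and put-back idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual fin-urls source: the names `maskLinks` and `restoreLinks`, the placeholder format and the simplified regular expression are all made up for this example, and a stand-in tokenizer replaces the real Fin lexer.

```typescript
// Hypothetical sketch; not the actual fin-urls implementation.
// Simplified pattern (an assumption): emails, then http(s)/www URLs, then IPv4 addresses.
const LINK_PATTERN =
  /[\w.+-]+@[\w-]+(?:\.[\w-]+)+|(?:https?:\/\/|www\.)[^\s]*[^\s.,;:!?]|\b(?:\d{1,3}\.){3}\d{1,3}\b/g;

interface Masked {
  text: string;    // input with every link replaced by a placeholder
  links: string[]; // the original links, indexed by placeholder number
}

// Preprocessor: swap each link for an alphanumeric placeholder
// that a lexer would keep as a single token.
function maskLinks(input: string): Masked {
  const links: string[] = [];
  const text = input.replace(LINK_PATTERN, (match) => {
    links.push(match);
    return `LINK${links.length - 1}PLACEHOLDER`;
  });
  return { text, links };
}

// Postprocessor: put the original links back into the token list.
function restoreLinks(tokens: string[], links: string[]): string[] {
  return tokens.map((token) => {
    const found = /^LINK(\d+)PLACEHOLDER$/.exec(token);
    return found ? links[Number(found[1])] : token;
  });
}

// Usage with a stand-in tokenizer (words and single punctuation marks).
const masked = maskLinks("Visit www.google.com. Mail bob@example.com");
const tokens = masked.text.match(/\w+|[^\w\s]/g) ?? [];
const restored = restoreLinks(tokens, masked.links);
// restored: ["Visit", "www.google.com", ".", "Mail", "bob@example.com"]
```

The placeholder is purely alphanumeric so that a punctuation-aware tokenizer leaves it in one piece. A real implementation would also have to guard against input that already contains text shaped like a placeholder.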

And while we're at it, we can attach a detector to the prototype that returns the URLs, emails and IP addresses that have been detected.
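Attaching a detector to a prototype might look something like the sketch below. Everything here is an assumption for illustration: the `Run` class is a tiny stand-in rather than the real `Fin.Run`, the `classify` helper and its regular expressions are invented, and the `"ip"` type label is a guess (only `"email"` and `"url"` appear in the output shown under Usage).

```typescript
// Stand-in for the core class; the real Fin.Run has a different, richer shape.
class Run {
  tokens: string[][]; // one array of tokens per sentence
  constructor(tokens: string[][]) {
    this.tokens = tokens;
  }
}

// "ip" is an assumed label; the README only shows "email" and "url".
type LinkType = "email" | "url" | "ip";
type Detection = false | { type: LinkType; token: string };

// Declaration merging: tell TypeScript about the method attached below.
interface Run {
  links(): Detection[][];
}

// Hypothetical classifier using simplified patterns.
function classify(token: string): Detection {
  if (/^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$/.test(token)) return { type: "email", token };
  if (/^(?:\d{1,3}\.){3}\d{1,3}$/.test(token)) return { type: "ip", token };
  if (/^(?:https?:\/\/|www\.)\S+$/.test(token)) return { type: "url", token };
  return false;
}

// Attach the detector to the prototype so every instance gets links().
Run.prototype.links = function (this: Run): Detection[][] {
  return this.tokens.map((sentence) => sentence.map(classify));
};

// Usage: one entry per token, false where the token is not a link.
const detected = new Run([["see", "www.google.com"]]).links();
// detected: [[false, { type: "url", token: "www.google.com" }]]
```

Extending the prototype means existing code keeps constructing instances the usual way and simply gains one extra method once the extension module has been imported.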

Installation

npm i --save fin-urls

Usage

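import * as Fin from "finnlp";
import "fin-urls"; // side-effect import that makes links() available

const input = "Here's an email [email protected]. and a website: www.google.com.";
const instance = new Fin.Run(input);
// One array per sentence: false for ordinary tokens, an object for each detected link.
const result = instance.links();
// Drop the false entries inside each sentence, keeping only the detected links.
const links = result.map((sentence) => sentence.filter((link) => link));

console.log(result);
console.log(links);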

The above example will give you:

[
    [false,false,false,false,{"type":"email","token":"[email protected]"},false],
    [false,false,false,false,{"type":"url","token":"www.google.com"}]
]

[
    [{"type":"email","token":"[email protected]"}],
    [{"type":"url","token":"www.google.com"}]
]
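Since the result is nested per sentence, a flat list of detections is sometimes easier to work with. The sketch below assumes the result shape shown above; the `flattenLinks` helper and the `DetectedLink` type are names made up for this example and are not part of fin-urls, and the sample data uses a placeholder email address.

```typescript
// Assumed shapes, inferred from the example output above.
type DetectedLink = { type: string; token: string };
type LinkResult = (DetectedLink | false)[][];

// Hypothetical helper: collect every detected link across all sentences.
function flattenLinks(result: LinkResult): DetectedLink[] {
  const flat: DetectedLink[] = [];
  for (const sentence of result) {
    for (const entry of sentence) {
      if (entry) flat.push(entry); // skip the false slots of ordinary tokens
    }
  }
  return flat;
}

// Hand-written sample in the same shape as the output of links().
const sample: LinkResult = [
  [false, false, { type: "email", token: "user@example.com" }, false],
  [false, { type: "url", token: "www.google.com" }],
];

// Keep only the URL tokens.
const urls = flattenLinks(sample)
  .filter((link) => link.type === "url")
  .map((link) => link.token);
// urls: ["www.google.com"]
```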
