A set of python scripts for getting vectors of word counts for a set of documents.

eerpini/GenDocWordCount

This repository consists of a set of Python scripts that count the number of occurrences of words in a given directory of *.txt files (other files are ignored). The word count is calculated for each unique word occurring across all the documents considered together.
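
To illustrate what a word count vector is, here is a minimal sketch of the idea, not the code from this repository. The function name word_count_vectors, the tokenizing regular expression, and the lowercasing are all assumptions made for the example; the actual scripts may split words differently.

```python
import os
import re
from collections import Counter


def word_count_vectors(directory):
    """Hypothetical helper: return (vocabulary, {filename: count vector}).

    Each vector has one entry per word in the shared vocabulary, in the
    same order, so vectors from different documents are comparable.
    """
    per_doc = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".txt"):
            continue  # files that are not *.txt are ignored
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8", errors="ignore") as handle:
            # Assumed tokenization: lowercase runs of letters, digits, apostrophes.
            words = re.findall(r"[a-z0-9']+", handle.read().lower())
        per_doc[name] = Counter(words)

    # The vocabulary is every unique word across all documents together.
    vocabulary = sorted(set().union(*per_doc.values()))
    vectors = {
        name: [counts[word] for word in vocabulary]
        for name, counts in per_doc.items()
    }
    return vocabulary, vectors
```

A word that does not appear in a document gets a count of 0 in that document's vector, because the vocabulary is built from all documents together.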

RUNNING THE SCRIPTS
-------------------
The main script is gen_doc_class_input.py. Run it as follows:

$ python gen_doc_class_input.py <path to a directory with *.txt files>

If you want the final word count vectors to be written to a file, run the script as follows:

$ python gen_doc_class_input.py <path to a directory with *.txt files> -f <output_file_name>

Note: the first argument must always be the path, and -f must always be followed immediately by the name of the file to write to.
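
The argument order described above could be checked along these lines. This is only a sketch under assumptions: parse_args is a hypothetical helper, and the way gen_doc_class_input.py actually reads its arguments may differ.

```python
def parse_args(argv):
    """Hypothetical parser: return (directory, output_file or None).

    argv is the argument list without the script name, for example
    sys.argv[1:].
    """
    usage = "usage: gen_doc_class_input.py <directory> [-f <output_file_name>]"
    if not argv:
        raise SystemExit(usage)

    directory = argv[0]  # the first argument is always the path
    output_file = None
    if len(argv) > 1:
        # Anything after the path must be exactly: -f <output_file_name>
        if len(argv) != 3 or argv[1] != "-f":
            raise SystemExit(usage)
        output_file = argv[2]
    return directory, output_file
```

For example, parse_args(["docs", "-f", "out.txt"]) returns ("docs", "out.txt"), while parse_args(["docs"]) returns ("docs", None).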

For any further discussion, please mail me at <[email protected]>
