Document-Feature-Identification-Vasavi

This project takes arbitrary text documents on different topics and classifies them into their respective domain areas using a custom algorithm. First, the user uploads the files that need to be classified. The classification algorithm reads all the input files into an array, then performs chunking and part-of-speech tagging on the data array. Next, it constructs a parse tree and appends all the nouns to an array. A frequency distribution is calculated for the first file in the array, and the top 100 most frequent words are taken from it (this number can be changed). The top 100 words (also configurable) are likewise taken from each category to compare against. Every word from the file's frequency distribution is then compared with the words of each relevant category, and the count variable of a category is incremented for each match. Finally, the file is moved to the folder of the category with the most word matches.
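
The counting and matching steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the names `extract_terms`, `top_words`, `classify`, and `move_to_category` are hypothetical, and `extract_terms` is a simple tokenizer standing in for the chunking, part-of-speech tagging, and noun extraction steps, which would normally rely on an NLP library.

```python
import re
import shutil
from collections import Counter
from pathlib import Path

TOP_N = 100  # number of frequent words to keep; can be changed


def extract_terms(text):
    """Stand-in for noun extraction (assumption): lowercase alphabetic
    tokens longer than two characters. A real pipeline would keep only
    the nouns found by a part-of-speech tagger."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]


def top_words(terms, n=TOP_N):
    """Return the n most frequent terms from a list of terms."""
    return [word for word, _ in Counter(terms).most_common(n)]


def classify(text, category_words, n=TOP_N):
    """Count how many of the document's top words appear in each
    category's word set and return (best_category, counts)."""
    doc_words = top_words(extract_terms(text), n)
    counts = {category: 0 for category in category_words}
    for word in doc_words:
        for category, words in category_words.items():
            if word in words:
                counts[category] += 1
    # On a tie, max() returns the first category in dictionary order.
    best = max(counts, key=counts.get)
    return best, counts


def move_to_category(path, category, root):
    """Move the file into the folder named after its category."""
    target = Path(root) / category
    target.mkdir(parents=True, exist_ok=True)
    return shutil.move(str(path), str(target / Path(path).name))
```

For example, with the hypothetical categories `{"sports": {"match", "team", "goal"}, "finance": {"bank", "loan", "market"}}`, calling `classify` on a short text about a football match would increment the `sports` counter for each of its matching words, and `move_to_category` would then place the file in the `sports` folder.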