


uwaces/KL-Text-Categorization


Text_Categorization_KDLvsTfIdf

Here we compare two methods of text categorization and evaluate their results. First we use tf-idf, a commonly used term-weighting scheme. Next we use the information-theoretic Kullback-Leibler distance, as suggested in the paper "Using Kullback-Leibler Distance for Text Categorization" by Brigitte Bigi.
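The two approaches can be illustrated with a small, self-contained sketch. It is not the code in tfidf.py or kld.py. The toy corpus TRAIN and every function name here are hypothetical. For tf-idf we assume that each category is represented by one tf-idf vector built from all of its training text and that the category with the highest cosine similarity wins, which is a common choice but not necessarily the repository's. For KLD we assume the symmetric form sum((p - q) * log(p / q)) over a fixed vocabulary with a simple back-off smoothing (unseen terms get a small epsilon). This follows our reading of Bigi's approach, but the paper's exact smoothing may differ.

```python
import math
from collections import Counter

# Hypothetical toy training data: category -> list of tokenized documents.
TRAIN = {
    "sports": [["ball", "team", "score"], ["team", "win", "match"]],
    "tech": [["code", "python", "compile"], ["linux", "code", "kernel"]],
}

def idf(docs):
    """Inverse document frequency log(N / df) over a list of token lists."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}

def tfidf_vector(tokens, idf_table):
    """Raw term frequency weighted by idf; terms unknown to the table get 0."""
    tf = Counter(tokens)
    return {t: c * idf_table.get(t, 0.0) for t, c in tf.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify_tfidf(tokens, train):
    """Pick the category whose tf-idf vector is most similar to the document."""
    table = idf([d for docs in train.values() for d in docs])
    query = tfidf_vector(tokens, table)
    scores = {c: cosine(query, tfidf_vector([t for d in docs for t in d], table))
              for c, docs in train.items()}
    return max(scores, key=scores.get)

def smoothed_dist(tokens, vocab, eps=1e-4):
    """Back-off smoothing: seen terms get beta * relative frequency, unseen
    vocabulary terms get eps, so the distribution sums to 1 over vocab."""
    tf = Counter(t for t in tokens if t in vocab)
    total = sum(tf.values())
    beta = 1.0 - (len(vocab) - len(tf)) * eps
    return {t: beta * tf[t] / total if tf[t] else eps for t in vocab}

def kld(p, q):
    """Symmetric KL distance: sum over terms of (p - q) * log(p / q)."""
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in p)

def classify_kld(tokens, train):
    """Pick the category whose term distribution is closest to the document's."""
    vocab = {t for docs in train.values() for d in docs for t in d}
    doc_p = smoothed_dist(tokens, vocab)
    dists = {c: kld(smoothed_dist([t for d in docs for t in d], vocab), doc_p)
             for c, docs in train.items()}
    return min(dists, key=dists.get)

if __name__ == "__main__":
    query = ["team", "score", "match"]
    print(classify_tfidf(query, TRAIN), classify_kld(query, TRAIN))  # prints: sports sports
```

Note the opposite decision rules: tf-idf keeps the category with the largest similarity, while KLD keeps the one with the smallest distance. Smoothing is what makes the KLD side work at all, because log(p / q) is undefined whenever a term has zero probability in one of the two distributions.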

This project is designed to run on Linux and expects python to be Python 3. To run it, use either python tfidf.py or python kld.py. Alternatively, run the scripts run-tfidf.sh or run-kdl.sh, which compile the code to C and may therefore run faster.


