This project categorizes text using two different methods and compares the results. The first is the commonly used tf-idf weighting. The second is the information-theoretic Kullback-Leibler distance, as suggested in the paper "Using Kullback-Leibler Distance for Text Categorization" by Brigitte Bigi.
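To make the two approaches concrete, here is a minimal sketch of each, written for illustration only. It is not the code in tfidf.py or kld.py. The function names, the toy data, the epsilon smoothing value, and the use of cosine similarity for comparing tf-idf vectors are all assumptions. The distance uses the symmetric form sum((P(t) - Q(t)) * log(P(t) / Q(t))) and leaves out the normalization step described in the paper.

```python
import math
from collections import Counter


def term_probabilities(tokens, vocabulary, epsilon=1e-6):
    """Estimate P(term) over a shared vocabulary.

    Terms missing from `tokens` get a small probability `epsilon`
    (an assumed smoothing value) so the logarithm below stays defined.
    """
    counts = Counter(tokens)
    total = len(tokens)
    raw = {t: (counts[t] / total if counts[t] else epsilon) for t in vocabulary}
    norm = sum(raw.values())
    return {t: p / norm for t, p in raw.items()}


def kl_distance(p, q):
    """Symmetric Kullback-Leibler distance between two distributions."""
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in p)


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per tokenized document."""
    n = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / doc_freq[t])
         for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data (hypothetical): two categories and one document to classify.
sports = "ball goal team goal".split()
cooking = "oven salt pan salt".split()
doc = "team goal ball".split()

vocabulary = set(sports) | set(cooking) | set(doc)
p_doc = term_probabilities(doc, vocabulary)
kld_sports = kl_distance(p_doc, term_probabilities(sports, vocabulary))
kld_cooking = kl_distance(p_doc, term_probabilities(cooking, vocabulary))

vec_sports, vec_cooking, vec_doc = tfidf_vectors([sports, cooking, doc])
sim_sports = cosine(vec_doc, vec_sports)
sim_cooking = cosine(vec_doc, vec_cooking)
```

With the Kullback-Leibler distance the document is assigned to the category with the smallest distance; with tf-idf and cosine similarity it is assigned to the category with the largest similarity.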
This project is designed to run on Linux and expects the python command to be Python 3. To run a method directly, use python tfidf.py or python kld.py. Alternatively, use the scripts run-tfidf.sh or run-kdl.sh, which compile to C first; this may be faster.