This project implements a machine learning algorithm based on Zdzisław Pawlak's Rough Set Theory to predict whether golf will be played under given weather conditions.
The project consists of the following files:
- `Train_data_golf_14ex.csv`: Training dataset.
- `Test_data_golf_50ex.csv`: Test dataset.
- `algorithm.py`: The main script with the implementation of the algorithm.

- Clone the repository:

      git clone https://github.com/your-username/rst-golf-prediction.git

- Go to your project folder:

      cd rst-golf-prediction

- Install required dependencies:

      pip install pandas

- Place your CSV data files in your project root folder.
- Set the paths to the training and test datasets in the script according to where the files are located on your computer:

      df_path = 'Put your personal path here'
      df_test_path = 'Put your personal path here too'

- Run the script `RS-ML.py`:

      python RS-ML.py

| Outlook | Humidity % | Wind | Play |
|---|---|---|---|
| Overcast | 87 | False | Yes |
| Sunny | 80 | True | Yes |
| Sunny | 80 | True | Yes |
| Overcast | 75 | True | Yes |
| Overcast | 75 | True | Yes |
| Rainy | 80 | False | No |
| Sunny | 80 | True | No |
| Rainy | 80 | False | No |
| Rainy | 85 | False | No |
| Overcast | 87 | False | Yes |
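
In rough set terms, rows of a table like the one above that share the same condition attributes (Outlook, Humidity %, Wind) are indiscernible and form one elementary subset. The following is a minimal pure-Python sketch of that grouping, not the project's own `get_elementary_subsets`; the `ROWS` list and the `elementary_subsets` helper are illustrative names, and the rows are copied by hand from the table.

```python
from collections import defaultdict

# The ten rows of the table above: (Outlook, Humidity %, Wind, Play).
# Hand-copied for illustration; the real script reads them from a CSV file.
ROWS = [
    ("Overcast", 87, False, "Yes"),
    ("Sunny", 80, True, "Yes"),
    ("Sunny", 80, True, "Yes"),
    ("Overcast", 75, True, "Yes"),
    ("Overcast", 75, True, "Yes"),
    ("Rainy", 80, False, "No"),
    ("Sunny", 80, True, "No"),
    ("Rainy", 80, False, "No"),
    ("Rainy", 85, False, "No"),
    ("Overcast", 87, False, "Yes"),
]


def elementary_subsets(rows):
    """Group row indexes whose condition attributes are identical."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        conditions = row[:-1]  # every column except the decision "Play"
        groups[conditions].append(index)
    # Order the subsets by their smallest row index.
    return sorted(groups.values())


subsets = elementary_subsets(ROWS)
print(subsets)  # [[0, 9], [1, 2, 6], [3, 4], [5, 7], [8]]
```

Rows 1, 2 and 6 are identical in their conditions but disagree on Play, which is why that subset cannot be assigned a certain decision.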
After launch, the script prints the following intermediate results, which show how the production rules are constructed:
Getting an elementary subsets of dataset:
[[0, 9], [1, 2, 6], [3, 4], [5, 7], [8]]
[[0, 9], [3, 4]]
======== Production rules for positive region ========
1) IF (Outlook = Overcast)& (Humidity% = 87 & 75)& (Wind = False & True)& THEN DECISION "PLAY" = PLAY
======== Production rules for negative region ========
2) IF (Outlook = Rainy)&(Humidity% = 85 V 80)&(Wind = False) THEN DECISION "PLAY" = DON'T PLAY
======== Production rules for boundry region ========
3) IF (Outlook = Sunny)&(Humidity% = 80)&(Wind = True) THEN DECISION "PLAY" = MAYBE PLAY
Approximation accuracy: 0.571

The final result is the classification of the test dataset using the constructed rules, together with a comparison of the algorithm's classification against the true values.

| Outlook | Humidity % | Wind | Play | Classification |
|---|---|---|---|---|
| Overcast | 87 | False | Yes | Yes |
| Sunny | 80 | True | Yes | Maybe |
| Rainy | 80 | True | Yes | Unknown |
| Sunny | 75 | True | Yes | Maybe |
| NaN | 75 | True | Yes | Unknown |
| Overcast | 80 | False | No | Yes |
| Raqiny | 80 | True | No | No |

Accuracy of the classification RS1: 42.9 %

The main implemented functions of the algorithm are:
- `get_elementary_subsets(X)`: Returns the elementary subsets of a set of objects.
- `get_lower(elementary, X_true_indexes)`: Forms the lower approximation.
- `get_upper(elementary, X_true_indexes)`: Forms the upper approximation.
- `get_pos_rule(pos_dataframe)`: Creates production rules for the positive region (the lower approximation).
- `get_neg_rule(not_pos_dataframe)`: Creates production rules for the negative region (objects outside the upper approximation).
- `get_maybe_rule(maybe_dataframe)`: Creates production rules for the boundary region.
- `classify_new_data(row, pos_df, maybe_df, neg_df)`: Classifies a row of the test dataset based on the constructed rules.
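
To make the approximation step concrete, here is a minimal sketch of how lower and upper approximations, the three regions, and the approximation accuracy relate to each other. It is not the project's implementation: `lower_approximation` and `upper_approximation` are hypothetical helpers, the elementary subsets are copied from the intermediate output shown earlier, and `play_yes` assumes the target set is the rows of the training table where Play is Yes.

```python
def lower_approximation(elementary, target):
    """Indexes of elementary subsets that lie entirely inside the target set."""
    return sorted(i for subset in elementary if set(subset) <= target for i in subset)


def upper_approximation(elementary, target):
    """Indexes of elementary subsets that share at least one row with the target set."""
    return sorted(i for subset in elementary if set(subset) & target for i in subset)


# Elementary subsets as printed in the intermediate results.
elementary = [[0, 9], [1, 2, 6], [3, 4], [5, 7], [8]]
# Assumption: the target set X is every training row with Play = Yes.
play_yes = {0, 1, 2, 3, 4, 9}

lower = lower_approximation(elementary, play_yes)   # positive region
upper = upper_approximation(elementary, play_yes)
boundary = sorted(set(upper) - set(lower))          # boundary ("maybe") region
all_rows = {i for subset in elementary for i in subset}
negative = sorted(all_rows - set(upper))            # negative region

# Approximation accuracy = |lower approximation| / |upper approximation|.
accuracy = len(lower) / len(upper)

print(lower)               # [0, 3, 4, 9]
print(boundary)            # [1, 2, 6]
print(negative)            # [5, 7, 8]
print(round(accuracy, 3))  # 0.571
```

Under these assumptions the lower approximation holds 4 rows and the upper approximation 7, so the ratio 4/7 reproduces the reported approximation accuracy of 0.571.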