Item title:
Python Machine Learning. Dry Beans Classification Case
A dataset containing over 13,000 samples of geometric features of dry beans was analyzed using machine learning (ML) and deep learning (DL) techniques with the goal of automatically classifying the bean species. Performance was assessed in terms of accuracy, training time and test time. First, the original dataset was reduced to eliminate redundant features (those too strongly correlated with, and therefore echoing, others). Then the dataset was visualized and analyzed with a few shallow learning techniques and a simple artificial neural network. Cross-validation was used to check the repeatability of the learning process. The influence of data preparation (dimension reduction) on the shallow learning techniques was observed. For the Multilayer Perceptron, three activation functions were tried: ReLU, ELU and sigmoid. Random Forest proved to be the best model for the dry beans classification task, reaching an average accuracy of 92.61% with reasonable training and test times.
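The feature-reduction step mentioned in the abstract (dropping features that are too strongly correlated with others) could look roughly like the sketch below. It is only an illustration under stated assumptions: the 0.95 threshold, the greedy keep-first strategy, the helper names `pearson` and `drop_correlated`, and the toy columns are all hypothetical and are not taken from the work itself, which does not state how the redundant features were selected.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.95):
    """Greedily keep a feature only if its |r| with every already kept
    feature does not exceed the threshold (threshold is an assumption)."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy columns for illustration only; these are NOT values from the dataset.
toy = {
    "area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "convex_area": [1.1, 2.1, 3.0, 4.2, 5.1],  # nearly a copy of "area"
    "roundness": [0.9, 0.4, 0.7, 0.3, 0.8],
}
selected = drop_correlated(toy)
print(selected)
```

In this toy case `convex_area` is discarded because it almost duplicates `area`, while `roundness` is kept. The result of a greedy filter depends on column order, so a real analysis might instead rank features by their usefulness before deciding which member of a correlated pair to drop.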