
Project lazerChair Machine Learning Experiments

From this page, you can download the files needed to replicate a machine learning (ML) experiment that attempted to build a word-level machine translation quality estimation model as part of Devin Gilbert's doctoral dissertation at Kent State University. Much of the script was developed by Dr. Michael Carl. Devin Gilbert wrote the feature engineering functions, which generate new features from existing features in the training/test data table (context_generator(), context_aggregator(), and generate_features_from_features()), as well as the functions that cross-validate the different ML methods with different training/test samples (NB_loop(), RF_loop(), and SVM_loop()). The script does not exactly replicate everything that was attempted; rather, it provides the base code from which our results could be replicated.
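To give a rough idea of what generating features from existing features can look like for word-level data, here is a minimal sketch. It is not the notebook's code: the function name add_context_features(), the "dur" column, and the example rows are all hypothetical, and the actual context_generator() may work quite differently. The sketch simply copies each word's neighbouring values of one feature into new columns.

```python
from typing import Any, Dict, List, Optional


def add_context_features(
    rows: List[Dict[str, Any]],
    feature: str,
    window: int = 1,
    pad: Optional[Any] = None,
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with neighbouring values of `feature` added.

    Hypothetical illustration only; not the notebook's context_generator().
    For each row, the value of `feature` in the rows up to `window` positions
    before and after it is stored in new columns. Positions that fall outside
    the table are filled with `pad`.
    """
    enriched = []
    n = len(rows)
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the input table is left unchanged
        for offset in range(1, window + 1):
            prev_i, next_i = i - offset, i + offset
            new_row[f"{feature}_prev{offset}"] = (
                rows[prev_i][feature] if prev_i >= 0 else pad
            )
            new_row[f"{feature}_next{offset}"] = (
                rows[next_i][feature] if next_i < n else pad
            )
        enriched.append(new_row)
    return enriched


# Made-up rows: one word per row, with an assumed numeric feature "dur".
words = [
    {"token": "the", "dur": 120},
    {"token": "red", "dur": 340},
    {"token": "chair", "dur": 200},
]
enriched = add_context_features(words, "dur")
```

Each row in enriched keeps its original columns and gains dur_prev1 and dur_next1, so a classifier can see a word's immediate context as well as the word itself.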

The Jupyter Notebook Script ("lazerChair_featureGeneration_MLtesting.ipynb") contains all the Python code needed to replicate our machine learning experiments and also includes a description of all features available in the training/test data. The Results Documentation is an Excel sheet ("lazerCHair_ML_results_and_documentation.xlsx") containing descriptions of the method used to generate labels for both training/test sets, descriptions of all ML configurations that were run, and a table of precision and recall results for each combination of ML configuration and training/test set. Training and test data 1 ("LS14_Training_Set.csv") contains the first set of training/test data. Training and test data 2 ("LS14_Training_Set_DG.csv") contains the second set of training/test data.
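For readers less familiar with the two metrics reported in the results table, the following sketch shows how precision and recall are computed for binary labels. The function name precision_recall(), the choice of 1 as the positive label, and the example labels are assumptions made for illustration; they are not taken from the notebook or the results sheet.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Compute precision and recall for one class of a labelled sample."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up gold labels and predictions for six words.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

In this example there are two true positives, one false positive, and one false negative, so both precision and recall come out to 2/3.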

These ML experiments are described in Section 2 of Chapter 4 of Devin Gilbert's dissertation. Reference information will be added once the dissertation has been published.