Week five here at Othot has been very productive. We finally finished the main preprocessing steps for the data set! This is a big accomplishment for me because it is my first time doing anything like this. I also wrote a train script and a predict script. These were brand new tasks for me too, but there were a couple of other scripts to work from, which helped a lot. When looking at the training results, there are a few things we like to focus on for the decision tree. First is the MCC score, the Matthews correlation coefficient, which is used in machine learning to measure the quality of binary classifications. It ranges from -1 to 1, and we want it to be as close to 1 as possible. Next is the accuracy, which is the share of records in the data set that the decision tree classifies correctly. Last is the true positive classification rate. This comes from the confusion matrix: we add up the relevant counts and divide to get a percentage. These three outputs give us a lot of insight into the data. Also this week, my co-worker and I looked at a couple of histograms of the numerical values in the data set. The purpose of these charts is to see whether there are any outliers that can be excluded. They were interesting to look at because they showed us that the data looks good; only a couple of variables needed to be excluded. Week five has taught me even more about analytics. It is nice to finally see all of the work we have been doing for the past four weeks come together.
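To make the three decision tree outputs above a bit more concrete, here is a minimal sketch of how they could be computed from the four cells of a binary confusion matrix. This is not our actual train script: the function name `confusion_metrics` and the counts in the example are made up for illustration, and I am assuming the true positive percentage means true positives divided by all actual positives.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, true positive rate, and MCC from confusion matrix counts.

    Hypothetical helper for illustration only; the counts are the four
    cells of a binary confusion matrix.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Accuracy: share of all records that were classified correctly.
    accuracy = (tp + tn) / total

    # True positive rate (assumed definition): add the true positives and
    # false negatives to get all actual positives, then divide.
    actual_positives = tp + fn
    true_positive_rate = tp / actual_positives if actual_positives else 0.0

    # Matthews correlation coefficient, between -1 and 1.
    # By convention it is reported as 0 when the denominator is 0.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    return {
        "accuracy": accuracy,
        "true_positive_rate": true_positive_rate,
        "mcc": mcc,
    }


# Made-up counts, not results from our data set.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Working from the raw counts keeps the arithmetic visible. In practice a library such as scikit-learn provides these metrics directly, so a helper like this is mostly useful for checking your understanding.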
I give myself a quality performance score of 4 this week. The work I am doing now is brand new to me, and I am still learning the specifics of what we are looking at in the decision tree. Other than that, I would say this week went very well for me.