No Access to R, SAS, Python, or SPSS?!! No sweat....

11/18/2014

In this post we will look at how you can use some basic SQL code to solve many of the data science problems you will encounter. Granted, SVM, random forest, and GBM often outperform simpler methods such as logistic regression and KNN, but that does not mean the simpler methods are no longer used or useful.

Let's start off with some basics: you will need MySQL installed and a basic understanding of SQL. If SQL is a little confusing, I recommend working through an online tutorial first.

We are going to use MySQL to accomplish the following:

Frequency Table
Mean
Median
Standard Deviation
Log Transformation
Z-Scores (Outlier detection)
Correlation
Multiple Linear Regression
Naive Bayes
KNN
K-means
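
To give a feel for the simpler items in this list, here is a minimal sketch in MySQL. It assumes a hypothetical table named adult with numeric columns age and hours_per_week and a text column education; those names are placeholders chosen for illustration, not a schema defined in this post. The cutoff of 3 standard deviations for outliers is a common convention, not a rule.

```sql
-- Assumed (hypothetical) table: adult(age INT, hours_per_week INT, education VARCHAR(32))

-- Frequency table: how many rows fall into each education level
SELECT education, COUNT(*) AS n
FROM adult
GROUP BY education
ORDER BY n DESC;

-- Mean and sample standard deviation of age
SELECT AVG(age)         AS mean_age,
       STDDEV_SAMP(age) AS sd_age
FROM adult;

-- Log transformation (natural log); skip non-positive values
SELECT age, LN(age) AS log_age
FROM adult
WHERE age > 0;

-- Z-scores: flag rows more than 3 standard deviations from the mean
SELECT a.age,
       (a.age - s.mean_age) / s.sd_age AS z_age
FROM adult AS a
CROSS JOIN (
    SELECT AVG(age) AS mean_age, STDDEV_SAMP(age) AS sd_age
    FROM adult
) AS s
WHERE ABS((a.age - s.mean_age) / s.sd_age) > 3;

-- Pearson correlation between age and hours_per_week:
-- population covariance divided by the product of population standard deviations
SELECT (AVG(age * hours_per_week) - AVG(age) * AVG(hours_per_week))
       / (STDDEV_POP(age) * STDDEV_POP(hours_per_week)) AS r_age_hours
FROM adult;
```

Each query uses only aggregate functions that ship with MySQL, so nothing beyond a working server is needed to try them.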

We will be using two classification data sets provided by the UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/Adult
https://archive.ics.uci.edu/ml/datasets/Covertype

We will do the following for each data set:

Build tables
Import Data
Report basic statistics
Split into test and train
Run through algorithm and validate
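
The first few steps above might look roughly like this for the Adult data set. This is only a sketch under stated assumptions: the table and column names are hypothetical, only four of the file's 15 fields are kept, the file is assumed to be saved locally as adult.data with fields separated by a comma and a space, and the 70/30 split ratio is an arbitrary choice.

```sql
-- Build table (hypothetical, trimmed-down schema)
CREATE TABLE adult (
    id             INT AUTO_INCREMENT PRIMARY KEY,
    age            INT,
    education      VARCHAR(32),
    hours_per_week INT,
    income         VARCHAR(8),
    is_train       TINYINT          -- filled in by the split step below
);

-- Import data: keep fields 1, 4, 13 and 15, discard the rest into @d.
-- LOAD DATA LOCAL needs local_infile enabled on both client and server.
LOAD DATA LOCAL INFILE 'adult.data'
INTO TABLE adult
FIELDS TERMINATED BY ', '
LINES TERMINATED BY '\n'
(age, @d, @d, education, @d, @d, @d, @d, @d, @d, @d, @d,
 hours_per_week, @d, income);

-- Report basic statistics: row count and class balance
SELECT income, COUNT(*) AS n
FROM adult
GROUP BY income;

-- Split into train and test: roughly 70% of rows get is_train = 1
UPDATE adult
SET is_train = (RAND() < 0.7);

CREATE TABLE adult_train AS SELECT * FROM adult WHERE is_train = 1;
CREATE TABLE adult_test  AS SELECT * FROM adult WHERE is_train = 0;
```

Storing the split flag in a column rather than calling RAND() in each query keeps the train and test sets fixed between runs, which matters when comparing algorithms later.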
