When it comes to machine learning and data processing, the big three are R, Python, and SAS. While I prefer R for running analytical models, it is not always the fastest option for cleaning large datasets. This is where I prefer the power (heavyweight) and speed (lightweight) of Python.
In this post we will look at how to do the following:
- Trim Whitespace
- Remove Invalid Characters
- Create Dummy Variables
- Format Dates
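
To give a feel for the four steps above before working with the real data, here is a minimal sketch using only the Python standard library. The sample rows, the column names (`name`, `city`, `gender`, `joined`), the rule for what counts as an "invalid" character, and the MM/DD/YYYY input date format are all assumptions made up for illustration; they are not taken from the dataset used in this post.

```python
import re
from datetime import datetime

# Hypothetical sample rows; column names and values are invented for illustration.
rows = [
    {"name": "  Alice ", "city": "Bos#ton", "gender": "F", "joined": "03/15/2019"},
    {"name": "Bob  ", "city": "Chi@cago", "gender": "M", "joined": "11/02/2020"},
]


def clean_row(row, categories):
    """Return a cleaned copy of one record."""
    cleaned = {}

    # 1. Trim whitespace from every string field.
    for key, value in row.items():
        cleaned[key] = value.strip()

    # 2. Remove invalid characters (assumed here: anything other than
    #    letters, spaces, and hyphens in the city field).
    cleaned["city"] = re.sub(r"[^A-Za-z \-]", "", cleaned["city"])

    # 3. Create dummy variables: one 0/1 column per category value.
    gender = cleaned.pop("gender")
    for category in categories:
        cleaned[f"gender_{category}"] = int(gender == category)

    # 4. Format dates: parse MM/DD/YYYY (assumed) and re-emit as YYYY-MM-DD.
    parsed = datetime.strptime(cleaned["joined"], "%m/%d/%Y")
    cleaned["joined"] = parsed.strftime("%Y-%m-%d")

    return cleaned


# Collect the distinct category values first so every row gets the same columns.
categories = sorted({row["gender"].strip() for row in rows})
cleaned_rows = [clean_row(row, categories) for row in rows]

for record in cleaned_rows:
    print(record)
```

For larger datasets, a library such as pandas offers vectorized equivalents of these steps (for example `Series.str.strip`, `pandas.get_dummies`, and `pandas.to_datetime`), which is usually the faster route.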
We will use a dataset that can be downloaded from the following link: