Data science has become a hot topic lately. At any IT event, conference, or seminar, putting data science or big data on the agenda is a reliable way to draw a large crowd.

In this article, I am going to walk through one data science task using machine learning with open source R. My main goal is to illustrate the capabilities and benefits of R in helping you perform advanced analytics work at minimal cost without compromising quality.

Objective

We have collected some records related to cars. Each record describes one car model and consists of miles per gallon (MPG, a measure of fuel consumption), horsepower, weight, number of cylinders, number of gears, transmission type, and so on. From this data, we will build a prediction model that estimates the MPG of a car from user-supplied information.

[Figure: mpg data]

Analyze and Build Prediction Model in R

The first step is to analyze the correlation between each variable and the target value. A data scientist then uses the statistically significant car variables to build a linear regression model. I will not go deep into the statistical work involved here; all of it can be performed with functions available in R.
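As a rough illustration of this first step, the sketch below ranks each variable by its correlation with mpg. It assumes that R's built-in `mtcars` dataset is a reasonable stand-in for the car records described above; the article does not name its data source, so treat that as an assumption. Pairwise correlation is only a first look, and the significance testing mentioned above would come from fitting the regression itself.

```r
# Assumption: the built-in mtcars dataset stands in for the article's car data.
data(mtcars)

# Pearson correlation of every variable with mpg, taken from the full matrix.
cors <- cor(mtcars)["mpg", ]

# Drop mpg's trivial correlation with itself.
cors <- cors[names(cors) != "mpg"]

# Rank the remaining variables by absolute strength of correlation.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 2))
```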

The study suggests that horsepower (hp), weight (wt), and transmission type (am) have the strongest relationship with the target value, miles per gallon (mpg). The statistics show that these three variables together explain about 84% of the variation in mpg, so the other variables can be left out of the model.
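The "variation explained" figure is the R-squared of the fitted regression. A minimal sketch of how one might check it follows, again assuming the built-in `mtcars` dataset stands in for the article's data. If the data is indeed that dataset, the value should land close to the figure quoted above.

```r
# Assumption: mtcars stands in for the article's car data.
data(mtcars)

# Linear regression of mpg on horsepower, weight, and transmission type.
fit <- lm(mpg ~ hp + wt + am, data = mtcars)

# Coefficient table with estimates, standard errors, and p-values.
print(summary(fit)$coefficients)

# R-squared: the share of the variation in mpg explained by the model.
r2 <- summary(fit)$r.squared
cat(sprintf("R-squared: %.3f\n", r2))
```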

With this finding, we generate a prediction model using linear regression. To let users run their own simulations, we use Shiny, a web application framework for R, to capture user input for the selected variables.
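Once the model is fitted, turning user input into a prediction is a single call to `predict()`. The helper below is a hypothetical illustration (the name `predict_mpg` is not from the original application), and it assumes the `mtcars` dataset, where weight is recorded in thousands of pounds and `am` is 0 for automatic and 1 for manual.

```r
# Assumption: mtcars stands in for the article's car data.
data(mtcars)
fit <- lm(mpg ~ hp + wt + am, data = mtcars)

# Hypothetical helper: estimate mpg for one car from user-supplied values.
#   hp: gross horsepower
#   wt: weight in 1000 lbs
#   am: transmission type (0 = automatic, 1 = manual)
predict_mpg <- function(model, hp, wt, am) {
  newdata <- data.frame(hp = hp, wt = wt, am = am)
  unname(predict(model, newdata = newdata))
}

# Example: a 110 hp manual car weighing 2,620 lbs.
estimate <- predict_mpg(fit, hp = 110, wt = 2.62, am = 1)
print(estimate)
```

Keeping the prediction in a plain function makes it easy to test on its own before wiring it into a web interface.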

[Figure: mpg input]

Prediction Result and Visual Output

We publish the entire work, including the prediction model, to Shiny. In this web-based environment, the user enters the car information and the model returns the predicted result. R also has high-quality graphics packages that can render the data and the result as an attractive visual output.
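To make the flow concrete, here is a minimal sketch of what such a Shiny application could look like. It is not the published application: the input names, value ranges, and layout are assumptions, the data is the built-in `mtcars` dataset, and the plot uses base R graphics rather than whichever graphics package the original used.

```r
library(shiny)

# Assumption: mtcars stands in for the article's car data.
data(mtcars)
fit <- lm(mpg ~ hp + wt + am, data = mtcars)

# User interface: three inputs on the left, prediction and plot on the right.
ui <- fluidPage(
  titlePanel("Car MPG Prediction (sketch)"),
  sidebarLayout(
    sidebarPanel(
      numericInput("hp", "Horsepower (hp)", value = 110, min = 50, max = 350),
      numericInput("wt", "Weight (1000 lbs)", value = 2.6,
                   min = 1.5, max = 5.5, step = 0.1),
      radioButtons("am", "Transmission",
                   choices = c("Automatic" = "0", "Manual" = "1"))
    ),
    mainPanel(
      textOutput("prediction"),
      plotOutput("scatter")
    )
  )
)

# Server logic: recompute the prediction whenever an input changes.
server <- function(input, output) {
  predicted <- reactive({
    req(input$hp, input$wt)  # wait until both numeric fields are filled in
    newdata <- data.frame(hp = input$hp, wt = input$wt,
                          am = as.numeric(input$am))
    unname(predict(fit, newdata = newdata))
  })

  output$prediction <- renderText({
    sprintf("Predicted fuel economy: %.1f mpg", predicted())
  })

  # Show the user's car (red triangle) against the training data.
  output$scatter <- renderPlot({
    plot(mtcars$wt, mtcars$mpg, pch = 19, col = "grey40",
         xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
    points(input$wt, predicted(), pch = 17, cex = 2, col = "red")
  })
}

# Build the app object; it is not launched here.
app <- shinyApp(ui = ui, server = server)
# To launch locally: shiny::runApp(app)
```

Wrapping the prediction in `reactive()` lets the text and the plot share one computation, so the model is evaluated once per input change.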

[Figure: mpg output]

Conclusion

This short write-up shows that R is more than a purely statistical platform. It has a rich set of libraries covering data manipulation, analysis, machine learning, graphics, and web interfaces. As a preferred tool of data scientists, its analytics and machine learning algorithms are core to finding value in big data.

The Car MPG Prediction application is available here. Do give it a try!

You may also look at the updated R summary on our product page.