Tuesday, 19 March 2013
Monday, 18 March 2013
This is a living post that I will keep updating whenever I find something new. I hope you find the following links useful.
Online Courses for Learning the R Language
Free Documentation for Learning the R Language
- R for Beginners by Emmanuel Paradis
- R Graphics by Paul Murrell
- ggplot2 (official documentation)
- Advanced R Programming by Hadley Wickham
Online Courses for Data Mining with R
e-Books for Data Mining with R
- R and Data Mining: Examples and Case Studies by Yanchang Zhao (Really useful worked examples!)
- Data Mining Algorithms in R (Wikibooks)
- The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman
- Introduction to Data Science by Jeffrey Stanton
- Forecasting: Principles and Practice by Rob Hyndman and George Athanasopoulos
- Bayesian Computation with R (Free Kindle Edition): UK Link, US Link (Aug 2013 Update: No longer free)
- 12 Free (as in beer) Data Mining Books
- Twotorials by Anthony Damico (learn new tricks from short two-minute videos)
- Revolution Analytics Free Webinars
- ggplot2 Graphics Cheat Sheet
- 10 tips for making your R graphics look their best
- Making Maps with R
- Compiling R 3.0.1 with MKL support
- Flowing Data - Tutorials
- R-Uni (A List of Free R Tutorials and Resources in University Webpages)
Interesting Blogs and Articles
- Statistics Blogs
- Whizage by Thia Kai Xin
- R Resources by Vivek Patil
- 100 most read R posts in 2012
- "R" you ready?
- The Angry Statistician
- VizWiz - Data Visualization Done Right
- FastML - Machine learning made easy
- ggplot2 Blog
- Spatial.ly - Visualisation, Analysis and Resources
- Vistat - a reproducible gallery of statistical graphics
- The Shape of Data
- Animated Graph for Data People
- 60+ R resources to improve your data skills
Useful R Packages
- Ten R packages I wish I knew about earlier (Before you do anything else, read this blog post first!)
- caret (short for Classification And REgression Training) for a simple, unified way to train and fine-tune models using many different algorithms
- ff and bigmemory - two packages for handling datasets too big to fit comfortably in memory
- quantmod for financial modelling
- foreach and doSNOW for parallel computing in R
Integrated Development Environment
- RStudio - a really nice IDE for R
- RStudio Server Amazon Machine Image by Louis Aslett (Wanna run RStudio on Amazon EC2? Try this!)
Other Useful Tips
Sunday, 17 March 2013
Over the years I have learned quite a few things about machine learning, but I never got round to writing them down properly. Too often, when I look at my old code, I can't figure out exactly what I did. The time is NOW!
More importantly, I have fallen in love with the R programming language and the huge number of useful packages from the R community. Here I want to talk about tricks, tools and useful resources for data mining with R (and sometimes my old favourite, Matlab).
Bayesian Ensemble Learning
One of the interesting tricks I have learned is called "Bayesian Ensemble Learning". It involves combining (i.e. blending) different models to improve overall prediction accuracy. Although it has its downsides (e.g. it is computationally expensive and the combined model is difficult to interpret), it is certainly my favourite data mining technique at the moment. I actually decided to name this blog after it long before I started writing this first post!
I also need to promote my own research project online, so there will be times when I talk about drainage design, green infrastructure and decision support systems. This is not the main focus of the blog, but I will try to create some funky graphs and explain my research to a wider audience when the time is right (i.e. when I eventually master the art of graphics in R).
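The post does not say exactly how the models get combined, so here is only a minimal sketch of plain (non-Bayesian) blending. It assumes two already-trained regression models and a held-out validation set, and the function names and toy numbers are hypothetical. The sketch grid-searches the mixing weight `w` that minimises validation mean squared error for `w * A + (1 - w) * B`. A Bayesian variant would instead weight each model by its posterior probability (Bayesian model averaging).

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(preds_a, preds_b, w):
    """Weighted average of two models' predictions: w*a + (1-w)*b."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


def fit_blend_weight(y_val, preds_a, preds_b, steps=100):
    """Grid-search w in [0, 1] (in steps of 1/steps) minimising validation MSE.

    Hypothetical helper, not taken from any R or Python package.
    """
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        err = mse(y_val, blend(preds_a, preds_b, w))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Hypothetical validation targets and predictions from two models
y_val = [3.0, 5.0, 7.0, 9.0]
preds_a = [2.5, 5.5, 6.0, 9.5]  # model A
preds_b = [3.5, 4.0, 7.5, 8.0]  # model B

w, err = fit_blend_weight(y_val, preds_a, preds_b)
print(f"weight for model A: {w:.2f}, blended MSE: {err:.4f}")
print(f"model A MSE: {mse(y_val, preds_a):.4f}, model B MSE: {mse(y_val, preds_b):.4f}")
```

Because the errors of the two toy models partly cancel, the blend scores better on the validation set than either model alone. The grid includes `w = 0` and `w = 1`, so on the validation data the blend can never do worse than the better single model. Its weight should still be checked on separate data to avoid overfitting the blend itself.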
OK, so here we go, this is my journey into the wonderful world of data science!