Monday 22 September 2014

H2O, Domino & Kaggle Quick-Start Guide and RUGSMAPS2

Following up on my previous posts about H2O Deep Learning (TTTAR1) and RUGSMAPS (TTTAR2), here is a quick update on two interesting things I have been working on: a Kaggle tutorial and a new RUGSMAPS app.

Short Tutorials based on a Kaggle Competition

First of all, I would like to share with you my first ever guest post on Domino Data Lab's blog - “How to use R, H2O and Domino for a Kaggle competition”.

My guest post on Domino Data Lab's blog.

As a sequel to TTTAR1, this blog post is a more in-depth machine learning article with starter code and short tutorials. The purpose is to get more people started with R, H2O Deep Learning and Domino Data Lab, using a recent Kaggle competition as a case study. The short tutorials should be generic enough for other Kaggle competitions as well as general data mining exercises. I hope you will find it useful if you are interested in machine learning.

RUGSMAPS2: A Crowd-Sourcing Experiment

Shortly after RUGSMAPS went public, Ines Garmendia kindly pointed out that there were several mistakes in the app (thanks, Ines! … at the same time, doh!!!). For example, the two groups in Madrid are supposed to be much further apart than I thought. More importantly, they are subgroups of the main “Comunidad R Hispano” group. Without Ines' feedback, I would never have noticed that myself, because I was relying only on the source data from the contest.

I know that a lot more local knowledge is required to make RUGSMAPS a much better app for the R community. It is also necessary to streamline the updating process so that new groups can be added easily. Therefore, I am proposing a crowd-sourcing experiment, and I hope that more RUG organisers and members can contribute in future. My idea is a dynamic web app (RUGSMAPS2) that reads information directly from a live Google spreadsheet.

Let's start with my favourite LondonR and its sister group ManchesterR. I know a lot about them personally so I can provide information like their key sponsor (Mango Solutions), venues and websites. To make this information available to all other R users, I just need to update the Google spreadsheet and the new RUGSMAPS2 will automatically render maps with new data.

Adding venue, key sponsors, websites and other information.

LondonR and ManchesterR with additional information.

For the Comunidad R Hispano issues I mentioned above, you can see that Ines helped me to fill in some new information about the four subgroups:

Entering main and subgroup information.

Fixing RUGSMAPS to show subgroups of Comunidad R Hispano correctly.
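To make the spreadsheet-driven idea a bit more concrete, here is a minimal sketch in Python (not the actual RUGSMAPS2 code, which lives in the repository). It assumes the spreadsheet can be exported as CSV text, and that it has columns named group, parent, city, lat, lon and sponsor. Those column names, the two function names and the placeholder subgroup row are all my own assumptions for illustration, not the real layout of the sheet.

```python
import csv
import io
from collections import defaultdict

# Illustrative CSV text standing in for an export of the spreadsheet.
# Column names and the placeholder subgroup row are assumptions,
# not the real sheet layout.
SAMPLE_CSV = """group,parent,city,lat,lon,sponsor
LondonR,,London,51.5074,-0.1278,Mango Solutions
ManchesterR,,Manchester,53.4808,-2.2426,Mango Solutions
Comunidad R Hispano,,,,,
Madrid subgroup (placeholder),Comunidad R Hispano,Madrid,40.4168,-3.7038,
"""


def load_groups(csv_text):
    """Parse CSV text into a list of dicts, one per user group.

    Empty coordinates become None so that groups without a location
    (for example an umbrella group) can be skipped when drawing a map.
    """
    groups = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Treat missing cells as empty strings and trim whitespace.
        row = {key: (value or "").strip() for key, value in row.items()}
        row["lat"] = float(row["lat"]) if row["lat"] else None
        row["lon"] = float(row["lon"]) if row["lon"] else None
        groups.append(row)
    return groups


def subgroups_by_parent(groups):
    """Map each main group name to the names of its subgroups."""
    tree = defaultdict(list)
    for group in groups:
        if group["parent"]:
            tree[group["parent"]].append(group["group"])
    return dict(tree)


if __name__ == "__main__":
    groups = load_groups(SAMPLE_CSV)
    for parent, children in subgroups_by_parent(groups).items():
        print(parent, "->", ", ".join(children))
```

The point of the parent column is that a subgroup row simply names its main group, so fixing something like the Comunidad R Hispano hierarchy becomes a spreadsheet edit rather than a code change. For a live sheet, the CSV text would come from the published spreadsheet instead of a string constant.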

So if you're interested in helping me out or know someone else who might be able to help, please spread the word and forward this Google spreadsheet. Oh, wait ... can EVERYONE edit that spreadsheet??? Yes. I understand there might be issues if everyone can edit the spreadsheet without my permission. That's why I am calling it an experiment! I would like to see whether I can turn this into a successful crowd-sourcing project (or, otherwise, organised chaos). The app won't go live until I am happy with the new information and features. You can check out the RUGSMAPS2 repository for all the latest development!


That's it for now. Take my word for it: give H2O and Domino a try and you won't regret it!