Greg Hamel’s Series on Python & R

I had planned to work up a quick tutorial on typical R commands for those starting out with Python notebooks, but when I looked around to see what resources were already out there, I came across these – Greg Hamel’s Introduction to R and Python for Data Analysis.

No more need to work up a tutorial – what an amazing set of resources (and kind of hard to find too!). They have been so well put together: a 30 part series on each language, giving a really nice balanced view of both. You should head on over and make sure you are across these, and thank this guy (thanks Greg!).


And don’t go all should-I-learn-Python-or-R on me! Make sure you know them both! For me, R wins out for the sheer joy of doing exploratory stuff, but Python has the edge when it comes to plugging into other systems, and just generally plays better with messy tech situations.

Postgres on Node with Massive.js

One of the difficulties with Postgres is just getting the workflow right. pgAdmin III (maybe not pgAdmin 4 just yet) is kind of ok and well featured, and for database set-up and design it’s fine. But the issue for me comes up when you start building out a web application.

A few years ago I went through a big Django phase which let me avoid this to an extent. Django has an ORM which handles things fairly well. However, since I moved over to Node.js and Angular I just haven’t been happy with the SQL situation.

In the end I moved to MongoDB because it plays so well with Node. Now I have to say I really dig MongoDB. It’s great, and mongoose.js works really well for MongoDB query building. But overall, SQL is still my preferred approach (though MongoDB is perfect for my own projects), especially if it’s a team project.

So where does this leave the Postgres backend on Node? Well, for a couple of projects of late, I have been using Sequelize.js. This was ok, but the SQL felt a little too abstracted, like it didn’t want me to know it was hiding SQL in the basement. I really want the clarity of working with SQL, but I want it to feel as integrated as the way I work with MongoDB on Node.

So what can be done? Enter massive.js.

I really like this library – it’s a super lightweight interface to Postgres on Node. You can just connect and then handle everything with a db folder and .sql files. Setup took about 5 minutes and the amount of code felt like I was typing a tweet!

You can check out a hello world for massive.js and Node here. You will need Postgres installed and running, plus a connection string to your own database. For this, I just used the sample dvdrental database, which is always handy for playing around with Postgres.

Admittedly, this is all partly a philosophical thing – do you never want to see the SQL and just use the ORM? Or do you like to see the SQL? Personally, I like SQL. It’s an interesting and powerful language, and this solution keeps things pristine.

Probability Distributions

Yes, I know… what a thrilling blog post title! But you know, sometimes I get the feeling we get all caught up in overly complicated ways to solve problems: it’s neural nets that, boosting this, recommender that, yada, yada, yada… like, whatever!

We forget that in data analysis it’s so important to have an intuitive grasp of the basics. And with that in mind I just had to post this amazing image from Sean Owen’s great blog post on understanding the relationships between different probability distributions. I started searching for something like this because I got into a long and thrilling conversation the other day about different distributions, and everyone seemed a little too vague about it all. I tell you, I felt uneasy!

So check out the post and ensure you have a zen understanding of how they all fit together!
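
As a small taste of those relationships, here is a quick Python sketch (not taken from the post, and the function names are just illustrative). It leans on one standard link: a binomial with lots of trials and a small success probability looks more and more like a Poisson with the same mean.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_gap(n, lam, k_max=10):
    """Largest pmf difference over k = 0..k_max between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n  # keep the mean n * p fixed at lam while n grows
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1))


if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(f"n={n:>5}  largest gap: {max_gap(n, 3.0):.6f}")
```

Run it and the gap shrinks as n grows, which is about as close as you can get to seeing one arrow of the diagram come to life.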




One of the things about MusicXML is that it does not explicitly encode time information, but just sticks everything in a list. Each note or rest carries a duration, but you can’t pull any particular one out of the data and read off its exact position; you have to add up everything that came before it.

To solve this I have created a node module that takes a MusicXML file and converts it into MusicJSON! This provides all the time information you would expect, such as tempo, the absolute location of a note or rest, its location within a measure, the location of the beginning of each measure, and an ISO 8601 compliant timestamp location in milliseconds that is calculated using the location and tempo information.
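
To give a feel for the calculation involved, here is a minimal Python sketch. It is not the code from the node module, and the field names (absolute_location, measure_location, time_ms) are made up for illustration rather than being the MusicJSON format. It assumes a single part with no chords, ties, or backup/forward elements, and a sound tempo element sitting directly inside the measure.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written MusicXML fragment, just enough for the sketch.
SCORE = """
<score-partwise>
  <part id="P1">
    <measure number="1">
      <attributes><divisions>2</divisions></attributes>
      <sound tempo="120"/>
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>2</duration></note>
      <note><rest/><duration>2</duration></note>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>4</duration></note>
    </measure>
    <measure number="2">
      <note><pitch><step>G</step><octave>4</octave></pitch><duration>8</duration></note>
    </measure>
  </part>
</score-partwise>
"""


def add_timing(xml_text):
    """Walk the first part in document order and attach timing to each note or rest."""
    part = ET.fromstring(xml_text).find("part")
    divisions = 1     # MusicXML duration units per quarter note
    tempo = 120.0     # quarter notes per minute, used until a <sound tempo> appears
    absolute = 0      # running position in divisions from the start of the part
    elapsed_ms = 0.0  # running position in milliseconds
    events = []
    for measure in part.findall("measure"):
        measure_start = absolute
        for child in measure:
            if child.tag == "attributes" and child.find("divisions") is not None:
                divisions = int(child.findtext("divisions"))
            elif child.tag == "sound" and child.get("tempo"):
                tempo = float(child.get("tempo"))
            elif child.tag == "note":
                duration = int(child.findtext("duration"))
                step = child.findtext("pitch/step")  # None for a rest
                events.append({
                    "measure": measure.get("number"),
                    "pitch": None if step is None else step + child.findtext("pitch/octave"),
                    "absolute_location": absolute,
                    "measure_location": absolute - measure_start,
                    "time_ms": elapsed_ms,
                })
                # Advance both clocks by this event's length.
                absolute += duration
                elapsed_ms += duration / divisions * 60000.0 / tempo
    return events


if __name__ == "__main__":
    for event in add_timing(SCORE):
        print(event)
```

With 2 divisions per quarter note at 120 quarter notes per minute, each division lasts 250 ms, so a pair of running totals is all it takes to place every note and rest in time.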

Having this kind of information encoded in JSON makes data visualisation far simpler and also makes the data set well suited to a range of machine learning and machine intelligence applications.

I have included a piano roll visualization Express app in this node module (which generates the visualization below), and it is the test I have found most handy. You can check it all out on my GitHub at



Angular research template

After using all kinds of editors over the years, from Emacs to Word to LaTeX, I have to say there has never been a better time to be doing research. Tools like Jupyter Notebook and the MEAN stack mean that you can easily have access to an industrial weight data store right alongside your academic writing. There is nothing cooler than having access to variables in academic writing, and being able to put together interactive visualisations so people can immerse themselves in your data when they are reading your stuff (well, ok, there is probably some cooler stuff than this… but even so).

But while this brave new world is cool, sometimes you do need to be able to print stuff out. Especially if you work across different disciplines, it’s not enough to have all your stuff living on the web. You need to be able to create high quality, camera ready papers suitable for hard copy printing.

To solve the issue, I created a cool Angular research template. Styled for even the most discerning tweed-wearing academic type, it’s a way to work in the web environment with a view to printing.

The features?

  • A simple CSS setup to manage A4 pages, margins, and fonts, plus some Twitter Bootstrap styling, which is super handy for basic layout stuff.
  • An Angular driven website under the hood, with all the advantages that come with that. You can beautifully organise your data in factories and all that other Angular garb. Directives, along with the CSS, give you that nice LaTeX style control that we all love and crave.
  • Some other cool JavaScript libraries I always use, like D3 for data visualisation and MathJax for beautiful math typesetting.

You can check it out on my GitHub (which is another neat thing about this setup – simple version control!).