This week we introduced the Open Data Project. It’s worth reading the post that introduces the assignment. It’s here.

A little extract of that:

Do these things to make your life better in the long run:

These tasks are going to feel like an imposition. They aren’t marked, but doing them will make everything so much easier. Think of it as a warm-up; you don’t get any Olympic medals for warming up, but you’ll never :sports_medal::sports_medal: in one if you don’t warm up! These are good things to put into your lab book.

  • State your question. Be very clear about what you are asking. This will make communicating with your tutors a lot easier too.

  • Document your data source. Where did you get it from? When?

  • Document the data. If it comes as a table, what does each column mean? What type of data is it? For example, this is some data from an Uber data set:


    "4/1/2014 0:11:00",40.769,-73.9549,"B02512"

    This looks fairly obvious, but what does the datetime actually mean? Is it a pick-up time? The same goes for the lat/long. What is Base?

    This is a simple data set; some of the government sets have a lot more columns. Going through them, documenting each column's data type and being specific about what it refers to, will be really good for you.

  • Think about your story. In your career you’ll be looking at a data set to help you tell a story. If you know what that story is, then you’ll find it a lot easier to find supporting graphs and statistics to tell it for you. (The only people I can think of that this doesn’t apply to are solo investors, and even they are probably telling a story to themselves!)

So, think about what you care about in life, and then go to and and see if anything is related to that.

By next session, we want to be able to see:

  • Your subject - what are you interested in?
  • A documented list of all the value types in your data, what data type they are, the ranges of values (by eye) and how each value is collected.
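If you'd like a head start on that documented list, here is a minimal Python sketch of one way to get a first draft. It takes the single Uber row quoted above, guesses a type for each field, and reports the range it sees. The column names, the `parse_value` and `describe` helpers, and the month/day reading of the date are all assumptions made for illustration (the sample has no header row), so check them against your own data source rather than trusting them.

```python
import csv
import io
from datetime import datetime

# The one sample row quoted above; a real file would have many rows.
SAMPLE = '"4/1/2014 0:11:00",40.769,-73.9549,"B02512"\n'

# Assumed column names: the sample row comes with no header.
COLUMNS = ["datetime", "lat", "lon", "base"]


def parse_value(text):
    """Guess a type for one CSV field: float, then datetime, else str."""
    try:
        return float(text)
    except ValueError:
        pass
    try:
        # Assumption: dates are month/day/year. Document this for your data!
        return datetime.strptime(text, "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return text


def describe(csv_text, columns):
    """Return {column: (type name, smallest value, largest value)}.

    Assumes every column has at least one value and a single type.
    """
    seen = {name: [] for name in columns}
    for row in csv.reader(io.StringIO(csv_text)):
        for name, field in zip(columns, row):
            seen[name].append(parse_value(field))
    return {
        name: (type(values[0]).__name__, min(values), max(values))
        for name, values in seen.items()
    }


summary = describe(SAMPLE, COLUMNS)
for name, (kind, low, high) in summary.items():
    print(f"{name}: {kind}, from {low} to {high}")
```

A script like this only tells you what the values look like. It can't tell you whether the datetime is a pick-up time or how each value was collected, so that part still has to come from the data source's documentation and go into your lab book.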

As a tip, I would avoid data contained in PDF files like the plague. PDF is an old format, and it’s not queryable. Its enthusiastic use by someone is a strong indicator that that person is a low quality human.

This week’s reading :books:

The Catalog of Refactoring. Have a look around this site for inspiration.

Atwood, J. (2009). Paying Down Your Technical Debt. Worth reading the comments on this one too.

Doherty, B. (2015). Technical debt: stealing from ourselves.

and then ready for next week:

Urban, T. (2015). The Artificial Intelligence Revolution: Part 1 - Wait But Why.

Urban, T. (2015). The Artificial Intelligence Revolution: Part 2 - Wait But Why.

Sanderson, M., Sandberg, A. (2015). Battle Cry - Anders Sandberg on ethical AI. :headphones:

Dijkstra, E. W. (1979). Programming Considered as a Human Activity.


Vi Hart ripping some recursive space filling curves!

This week’s homework

Some exercises, the reading, and looking for a dataset.

Week 4 & 5 exercises

These are about IO, refactoring and recursion.


Videos coming up

