Categories[1] are an emergent way of describing set creation operations based on attributes

Most of what I want to say here is taken directly from an old post on the BVN internal blog that I called Taxonomy. I’ll summarize it here and copy that post below so that you don’t need to go looking.

https://youtu.be/05WS0WN7zMQ

In the video above, Richard Feynman[2] tells a story about naming things. He separates the name of something from what that thing is[3]. This is a safe and future-proof way […]. Directly categorising things is lazy thinking and asking for trouble!


I made this presentation to explain how data is organised these days.

It’s part of the work on the People and Projects data cleaning, so I’ll refer to that throughout. It’s applicable to a lot of other things though, so keep an open mind and it might be interesting.

taxonomy_Page_01

eyore

People have always been interested in how to group like things together. Methods of filing, organisation, and classification have taken up thousands of hours of discussion. There’s even a whole job title and academic discipline associated with taxonomy, with departments of library science and so on.

There is a problem with old-fashioned filing of physical items, though:

taxonomy_Page_08

taxonomy_Page_09

Almost everything we work with is vague in one way or another. Consider organising these shapes:

classes

How do we know if she’s a witch? She’s either a witch or not a witch right?

https://www.youtube.com/watch?v=zrzMhU_4m-g

Luckily, a new method of filing has emerged. By adding metadata to things[4], we are able to filter things out of a giant pile really quickly.

pile

The computer is able to make a big sieve that only lets through things that have certain attributes.

taxonomy_Page_11

A person can have attributes, a project can too. The big questions of our time are about what that means, and what we can do with that knowledge!
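To make the sieve idea a bit more concrete, here is a minimal sketch. It assumes each thing is just a dictionary of attributes; the names, offices and the `sieve` function are all made up for illustration and are not the real People and Projects data.

```python
# Hypothetical records: each "thing" is a bag of attributes (metadata).
things = [
    {"kind": "person", "name": "Alex", "office": "Sydney", "role": "architect"},
    {"kind": "person", "name": "Sam", "office": "Brisbane", "role": "modeller"},
    {"kind": "project", "name": "Library fit-out", "office": "Sydney", "sector": "education"},
]


def sieve(pile, **wanted):
    """Let through only the things whose attributes match every wanted value."""
    return [
        thing
        for thing in pile
        if all(thing.get(key) == value for key, value in wanted.items())
    ]


# Nothing was ever filed under "Sydney"; the group is created by asking for it.
sydney_things = sieve(things, office="Sydney")
print([thing["name"] for thing in sydney_things])  # ['Alex', 'Library fit-out']
```

Notice that a person and a project both come through the same sieve. Neither had to be put in a folder beforehand, which is the whole point.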

What’s really cool is that you can stack these sieves so that you can combine the filters.

taxonomy_Page_14

taxonomy_Page_15
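Stacking can be sketched in a few lines too. This is only an illustration under my own assumptions: the project records are invented, and `has` and `stack` are hypothetical helper names, not part of any real system.

```python
# Hypothetical pile of projects; attribute names and values are invented.
projects = [
    {"name": "Primary school hall", "sector": "education", "city": "Sydney"},
    {"name": "Uni library", "sector": "education", "city": "Brisbane"},
    {"name": "Office tower", "sector": "commercial", "city": "Sydney"},
]


def has(key, value):
    """Build one sieve: a test that passes things with this attribute value."""
    return lambda thing: thing.get(key) == value


def stack(*sieves):
    """Stack sieves: a thing gets through only if it passes every one of them."""
    return lambda thing: all(passes(thing) for passes in sieves)


education_in_sydney = stack(has("sector", "education"), has("city", "Sydney"))
matches = [p["name"] for p in projects if education_in_sydney(p)]
print(matches)  # ['Primary school hall']
```

Each sieve stays small and reusable, and a stack is itself just another sieve, so you can keep piling them up.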

[…] same thing as someone else if it suits their needs, but if they have new needs they aren’t forced to pick a “next best thing” option.

Downsides

Free-form tagging has downsides, even in cool, advanced versions with typeahead support and synonyms.

If you have megalomaniac tendencies or a god complex, free-form tagging is no good. You are ceding control to your vassals, so you don’t get to have total control over your taxonomy. You should be wary of this if you don’t trust your staff.

The other downside is a bit more subtle and easier to work with. Tags, and content in general, need to be gardened. This means that every so often someone needs to convert all the “HOuse” tags into “House” tags.
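A lot of that gardening can be automated. Here is a rough sketch, assuming that tags which differ only by capitalisation or stray spaces mean the same thing, and that the most commonly typed spelling is the one to keep. Both are my assumptions for the example; a human gardener may well decide otherwise.

```python
from collections import Counter

# Hypothetical tags, exactly as people typed them.
raw_tags = ["House", "HOuse", "house ", "School", "school", "House", "School"]


def garden(tags):
    """Merge tags that differ only by case or surrounding spaces.

    Every tag in a group is renamed to the group's most common spelling.
    """
    spellings = {}
    for tag in tags:
        cleaned = tag.strip()
        spellings.setdefault(cleaned.casefold(), Counter())[cleaned] += 1

    preferred = {
        key: counts.most_common(1)[0][0] for key, counts in spellings.items()
    }
    return [preferred[tag.strip().casefold()] for tag in tags]


print(garden(raw_tags))
# ['House', 'House', 'House', 'School', 'School', 'House', 'School']
```

This only catches the easy cases. Real synonyms ("House" versus "Residential") still need a person to make the call.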

“OK you weird nerd, why are you telling me this?”

Well, mainly because I like you. I do this stuff and I might as well share what I’m thinking with people.

Secondly, it’s part of the grand project to sort out all our data, so I’m sharing my philosophy of data with you, and I’d love it if you could ask me questions and probe the bits that I’ve missed. I’m looking for any comments, as aggressive as you like, about how we can make our data capture better!

taxonomy_Page_21

  1. I recognise that I don’t know any Category Theory and not a great deal of Set Theory. I might be committing horrible sins with this statement. It works in its naive context though.

  2. One of my heroes. If he isn’t one of yours then he really should be! 

  3. I think that this is the way 

  4. Yes, the same metadata that all the fuss is about in the news.