Michael Bauer works at the School of Data and Open Knowledge Foundation teaching journalists and NGOs how to better construct stories from data. He told me how he started out, and what his advice would be to young journalists who want to get to grips with data.
Michael: I studied medicine, and I did research, and I did technology things all the time on the side. While doing research I realised that a lot of people are not good at dealing with data, so I thought it would be good to help them, to teach them how to do this. I was able to do research my colleagues couldn’t. After a detour I ended up with the School of Data from the Open Knowledge Foundation, where we aim to teach journalists and charity organisations how to do things with data.
Nabeelah: How did you end up at the School of Data and the Open Knowledge Foundation and how did you become interested in particular in teaching people like journalists?
Michael: So one thing was realising I could understand the world in a certain way if I’m good at handling data, handling the information that comes in, and I realised that when I was doing research my colleagues couldn’t. The people who were doing research with me were struggling every time they got large amounts of data, and they spent a lot of time processing the data and not as much time thinking about what it means. For me, one of the things I ideally want to achieve is that less time is lost in the process, thinking about data and about how to get from A to B, and more time goes into thinking about your story, about what it actually means, and what it means in context, which is really what it takes to come up with some numbers and a beautiful graph.
And the OKFN was hiring last year and I just stumbled across it. I saw their School of Data project and I thought “this is exactly the thing I thought about when I left research”. So I’m really happy to be there. Our focus is not researchers; our focus is really more journalists and NGOs, teaching journalists and people at NGOs how to build data-driven stories, how to build campaigns.
Nabeelah: Are there any stories you’ve helped with that you’re really proud of having helped with? Like, a journalist you’ve taught has gone away and managed to find out something important or do something with the skills you’ve taught them. And why do you think data journalism can do more than traditional journalism?
Michael: Actually, I don’t think it can do more – I don’t even like the term ‘data journalist’. For me data is a tool, and it’s a tool like a pen or a typewriter. People didn’t call themselves ‘typewriter journalists’ because they used typewriters instead of a pen. Interestingly, with computers this did happen: there was ‘computer-assisted reporting’, which is nothing else but data journalism. I actually see the ability to deal with data, the ability to deal with information that comes in this condensed form, as something that covers all journalism, and I would caution against making a divide here between the journalists and the data journalists. For me, data is one of the tools you have as a journalist to find stories, to find evidence for what is happening and to understand what is happening.
And on the story side, I don’t have one in mind. If you go out to workshops and you talk to people, trying to open their minds to working with data, and you try to teach them, they rarely report back to you what they do. As a result of one of these workshops, I have started working with a journalist on a story that is unfortunately not published yet – it’s on hold. It’s a large research project into environmental data, and because we networked at the workshop, we started working on it together.
Which I think is another thing that’s worth mentioning if we’re talking about these new kinds of journalism, where you always think about fancy news apps, data projects and big investigative projects. A lot of this can’t be done by a single person. So I think it’s very important, if you start to look into the technological field and to use more technology in reporting, to actually reach out to other people. Reach out to your local Hacks/Hackers group, for example, where journalists meet technology people who are interested in journalism. Collaborate, and find people who can help you whenever you need help. A big story is not done by a single person; a big story is done by several people. There’s no reason you can’t have a technologist on board…
Nabeelah: On the subject of data being another tool you can use, the visualizations on your website are really beautiful. What tools are some of your favourites for creating visualizations and graphs?
Michael: That’s a good question. I use different kinds of things. I’m very good at programming; I’m not so good at designing, as you might have noticed. So I tend to use tools that require you to program, and that I just pick up for a project to get an idea of how they work. Three.js, for example, lets you draw incredibly stunning, beautiful, animated, interactive graphics. But it’s really hard for a journalist to pick up. Often what you want is something quick – “I have a set of data and I’d just like to highlight the issues”. And there’s one tool in particular I want to mention.
Good friends of mine, Gregor Aisch and Mirko Lorenz, started a project called Datawrapper, which is basically a website where you can upload a data file and make simple charts, specifically targeted towards journalists. So this is one of my favourite tools out there when I need to teach someone to make simple graphs. If I want to make my own thing I’d probably go down a different road. But if I want to teach this to someone I’d say, hey, this is something you can use. Gregor has lots of ideas about information design and visualization, and they put all of this into the application. So what it helps you to do is to produce beautiful graphs that follow all the rules of information design.
Nabeelah: I noticed one of your talks in Perugia was about “social network analysis”. I’m interested in what you mean by that and how you collect the data from Twitter.
Michael: So social network analysis is basically a subset of very complicated mathematics: mathematics to deal with networks, because a lot of things behave very similarly to networks. Social network analysis deals with networks between people – and you can apply the same kind of analysis to other networks, like electricity networks, or the internet, or how documents are related to one another, using a similar methodology. And what you’re trying to find out with your queries is: what’s important in a certain area, what do people talk about, who talks? Who are the key players in this area, and are there different groups talking about the same topic? I can then figure out who the key players within a group are and start talking to one of them. So for journalists it’s not so much useful as a result: the graph tends to look beautiful and you can do nice animated things with it, but it doesn’t tell you that much. I think it’s an important research tool to find leads. Who do I talk to if I want to talk about this? Who has perspectives and opinions on this, and where can I find a way into this topic? And I think that’s the important thing.
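To make the idea of “key players” concrete, here is a minimal sketch that ranks accounts by how often others mention them. It assumes you already have a list of who-mentions-whom pairs; the account names and the function `rank_by_mentions_received` are hypothetical, and counting incoming mentions is only the simplest of several possible measures of importance in a network.

```python
from collections import Counter

# Hypothetical "who mentions whom" pairs: (author, mentioned account).
mentions = [
    ("anna", "ben"),
    ("carl", "ben"),
    ("dora", "ben"),
    ("ben", "anna"),
    ("carl", "anna"),
    ("eve", "dora"),
]


def rank_by_mentions_received(edges):
    """Count how often each account is mentioned and sort by that count."""
    counts = Counter(target for _source, target in edges)
    return counts.most_common()


ranking = rank_by_mentions_received(mentions)
for account, received in ranking:
    print(account, received)
```

The account at the top of such a ranking is a candidate lead to contact, not a finding in itself.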
I use Twitter simply because Twitter is relatively easy to get data out of. Twitter offers an API, which is an application programming interface: specifically, a way that software or your computer can ask the website for information.
What we did in the workshop was use Google Refine, the data cleaning tool, to collect, clean and explore data: to connect to Twitter and get the data out of Twitter into Google Refine in a table format. APIs don’t usually give you tables; they usually give you a very complicated data structure. Google Refine can interface with that and put it into a table, and then you can clean it up a little bit and figure out who is talking, who they are talking to, figure out other things like hashtags, and then say okay, yeah, this is the person – and create a simple data map.
There will be a video at some point, but we have all the instructions on how to do this on the School of Data blog.
Nabeelah: Are there any new tools out there apart from Datawrapper, more complex tools you’d use yourself, that you really like? Or are there any tools you’d like to see people create, that you think would be useful in the future?
Michael: There are a few more complex things that I think are reasonably good and are really worth knowing. One of them is Google Refine. If you work a lot with data, look at Refine. Stop using spreadsheets for things and start moving into Refine. Refine also offers a lot of things, like categories, reconciliation features, a lot of cleaning features, lots of features that remove common mistakes in data files.
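The step from a nested API response to a table can be sketched in a few lines of Python. This is an illustration only, not the workshop’s Google Refine recipe: the JSON below is a hypothetical response, and its field names (`user`, `screen_name`, `entities`, `hashtags`, `mentions`) are made up for the example rather than taken from a real Twitter API schema.

```python
import csv
import io
import json

# Hypothetical API response; field names are invented for illustration.
raw = """
[
  {"user": {"screen_name": "anna"},
   "text": "Great session #ddj",
   "entities": {"hashtags": ["ddj"], "mentions": ["ben"]}},
  {"user": {"screen_name": "carl"},
   "text": "Agree with both #ddj #opendata",
   "entities": {"hashtags": ["ddj", "opendata"], "mentions": ["ben", "anna"]}}
]
"""


def flatten(records):
    """Turn nested records into flat rows, one row per (author, mention) pair."""
    rows = []
    for record in records:
        author = record["user"]["screen_name"]
        hashtags = " ".join(record["entities"]["hashtags"])
        for mentioned in record["entities"]["mentions"]:
            rows.append(
                {"author": author, "mentions": mentioned, "hashtags": hashtags}
            )
    return rows


rows = flatten(json.loads(raw))

# Write the table as CSV text (to a string here; a file works the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["author", "mentions", "hashtags"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Each row answers “who is talking to whom, and under which hashtags”, which is the table shape the rest of the analysis needs.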
The other possibility is to get data out of APIs on the internet, so out of different websites. This is one of the things that is really great to know.
The other thing I use is Gephi, which is the social network analysis tool. It is software for looking at other networks too – the network can be anything: it can be something like resource flows, it can be connections between politicians and companies, it can be company structures, it can be all these things. So there’s a lot of use there for simply trying to visualize and analyse certain structures, and how things are related to each other. So it’s good for visualization of that.
One thing I miss and would love to see created relates to one of the most common problems, which you’ve probably experienced as well: getting PDF reports. Lots of organisations and governments don’t publish data in machine-readable formats. They publish it as PDF reports because they still think of print and people reading it. And what we’re missing is good ways to get data out of PDFs.
There are forays into it, and Tabula, which was released by people around Mozilla Open News, is one of the ways of trying to do this. Unfortunately it’s not at a level that can be easily used by journalists, although it’s sort of created for journalists. So there’s something that needs to be done in this sector – you have commercial software that tries to do this, but we still need to find ways to get data out of PDFs so that everyone can do it at home. I think this will help us get access to a lot more information than we get right now.
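As a rough sketch of how such relationships could be prepared for a network tool, the snippet below writes an edge list as CSV. The connections are invented, and the `Source`, `Target` and `Weight` column headers are an assumption about what Gephi’s spreadsheet import expects; check the Gephi documentation for your version before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical connections between politicians and companies.
connections = [
    ("Politician A", "Company X"),
    ("Politician A", "Company Y"),
    ("Politician B", "Company X"),
    ("Politician A", "Company X"),  # a repeated link raises the weight
]

# Collapse repeated links into one weighted edge each.
weights = Counter(connections)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight"])  # assumed header names
for (source, target), weight in sorted(weights.items()):
    writer.writerow([source, target, weight])

edge_csv = buffer.getvalue()
print(edge_csv)
```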
Nabeelah: My last question is – I often hear people talk about those who are journalists first and then those who have come at it from a technological side. Is it possible to be very good at both, and what advice would you give to students coming at it from the journalistic rather than the technological side of things?
Michael: I do believe you can be good at both. I used to be a researcher and do computer stuff on the side. I was interested in technology. It was one of the things I did in my free time during med school, and before that during high school, when I started playing with computers because it was fun. I built a technological expertise that I never thought I would use in my day-to-day job, because I was focusing on doing something else. And then whilst doing research I realised I could do all these things. I was a researcher who was good at technology, which is not a common thing to see. I believe there are journalists out there, and I know one or two, who are also very good at technology.
So you have those people who bridge the two, but they are rare. One thing I would suggest to people who come from journalism is not to be afraid of it. Those who are not familiar with data always have the impression they have to use a lot of complicated maths and numbers, when in reality most of the things we do are relatively straightforward. It’s simply adding numbers and dividing them. So it’s not complicated maths – if you can do basic adding and subtracting you can do this. And programming computers is not that hard either; there aren’t that many things you can break. It’s basically a very safe space. It’s not like learning to do rock climbing without a rope, where once you fall, you’re probably not climbing another rock for a few months. If you make a mistake there’s very little that can happen.
So what we see a lot is that people are afraid of touching technology and computers, and of working with computers in ways they’ve never tried. In the same way, people are afraid to do things with data because they think of data as this big complicated thing you need specialists to look at. A lot of what we’re trying to do at the School of Data, including some of the workshops we run, is designed to make working with data less intimidating. You’ll only get good at it if you do it, and you won’t do it if you’re intimidated. So we try to take the scary parts out of it and make it a little more fun, and encourage people to just try new things.
And I think if you’re seriously interested in doing journalism and using data as a tool in journalism, just start playing around. When you have a question like ‘how can we understand the recent crisis in Europe?’, the easiest thing is to try to find numbers, such as the economic indicators economists use when they talk about these things, and play around with them, and see if you can find something interesting. The only way to learn and get better is by actually doing it.
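To show how little arithmetic is usually involved, here is a small sketch of two calculations that come up constantly in reporting: a percentage change and a rate per 1,000 people. All figures are hypothetical and the function names are chosen for the example.

```python
def percent_change(old, new):
    """Change from old to new, expressed as a percentage of old."""
    return (new - old) * 100 / old


def rate_per_1000(count, population):
    """How many cases there are for every 1,000 people."""
    return count * 1000 / population


# Hypothetical figures for one region.
unemployed_last_year = 40_000
unemployed_this_year = 46_000
population = 800_000

change = percent_change(unemployed_last_year, unemployed_this_year)
rate = rate_per_1000(unemployed_this_year, population)

print(f"Unemployment rose by {change:.1f}%")
print(f"{rate:.1f} unemployed per 1,000 residents")
```

Subtracting, multiplying and dividing is all that happens here; the journalistic work is in choosing which two numbers to compare.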