Data Roaming: A Portable Linux Environment for Data Science

When trying to fit data science around a busy study schedule, you often need to work from shared university or library computers. Rather than spending the first 15 minutes of every working session reinstalling software, why not create a bootable USB stick with everything you need ready to go?


2020-10-02

Data Science / Best Practice

21-minute read (about 4,100 words)

AI is Not as Smart as You May Think (but That's No Reason Not to Be Worried)

The world of AI is full of hype, making it hard to distinguish real threats from fiction. This post is one of a pair and discusses the current challenges and limitations that AI systems face, particularly the large obstacles that must be overcome before AI could pose any existential threat. The other details the current use of AI in military applications and the risks this introduces. Together, these posts aim to give you an accurate view of the current state of AI and to direct attention towards the threats from AI that most need addressing going forwards.

2020-04-05

Data Science / Best Practice

8-minute read (about 1,500 words)

Is it Time to Ditch the MNIST Dataset?

The MNIST dataset is the bread and butter of deep learning. Featuring 70,000 handwritten numerical digits partitioned into training and test sets, the dataset is the go-to candidate for a large proportion of introductory tutorials, benchmarking tests, and data science showcases. This post questions the suitability of this dataset for such uses, attributing its shortcomings to the excessive simplicity of the challenge it presents when tackled with modern machine learning tools. Additionally, we look at alternatives to the dataset that present a more appropriate challenge without fundamentally changing the learning problem.

2018-12-06

Data Science / Best Practice

13-minute read (about 2,500 words)

The Inaccuracy of Accuracy

In a data-driven world, your analyses will only ever be as good as the metric you use to evaluate them. In this post, I argue that the de facto metric used in data science is unfit for purpose and can lead to the construction of unethical models. If this is the case, what should we use instead?


2018-10-21

Data Science / Best Practice

10-minute read (about 2,000 words)

Streamlining Your Data Science Workflow With Magrittr

The Tidyverse is here to stay, so why not make the most of it? The `magrittr` package extends the basic piping vocabulary of the core Tidyverse to facilitate the production of more intuitive, readable, and concise code. This post aims to be an all-encompassing guide to the package and the benefits it provides.