What is data science?

By Data Tricks, 29 April 2019

Over the past five years or so, it’s been hard to ignore the rise of data science. Everyone seems to be talking about it. And, as a senior director I spoke to recently put it, ‘everybody wants to be a data scientist’.

If Google’s search trends are anything to go by, the rapid rise of data science began around 2013.

Google search trends for data science

But the concepts that underpin the field we now know as data science didn’t suddenly materialise in 2013. In fact, machine learning – a cornerstone of data science – has been around for at least 70 years thanks to pioneers such as Alan Turing.

Interestingly, many machine learning algorithms don’t attract the same level of interest in Google searches. More searches for support vector machines were carried out in 2004 than today.


Google search trends for support vector machines

Perhaps this is not surprising given that support vector machines have been around for 25 years.

So, if machine learning predates the music of Elvis Presley, why has it taken so long for data science to take off?

Digital transformation

Digital transformation is now widespread across many industries, and it has increased the volume, velocity and variety of data. Over 2.5 quintillion bytes of data (that’s 2,500,000,000,000,000,000 bytes) are created every single day (see Domo’s report), and that figure is increasing all the time. At the same time, advances in cloud computing and distributed file systems have made it easier than ever to store, retrieve and analyse this data.
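To get a feel for that daily figure, it helps to convert it into more familiar units. The short Python sketch below is only an illustration: it assumes decimal (SI) units, where each step is a factor of 1,000, and a constant daily rate over a year. Neither assumption comes from Domo's report, and the helper name `to_si` is made up for this example.

```python
# Assumption: decimal (SI) byte units, where each step is a factor of 1,000.
BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion bytes, as quoted above

SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_si(num_bytes):
    """Return (value, unit) using the largest SI unit that keeps value below 1,000."""
    value = float(num_bytes)
    for unit in SI_UNITS:
        # Stop once the value is readable, or when we run out of units.
        if value < 1000 or unit == SI_UNITS[-1]:
            return value, unit
        value /= 1000


daily_value, daily_unit = to_si(BYTES_PER_DAY)

# Assumption: the daily rate stays constant for 365 days (in reality it is growing).
yearly_value, yearly_unit = to_si(BYTES_PER_DAY * 365)

print(f"Per day:  {daily_value:g} {daily_unit}")
print(f"Per year: {yearly_value:g} {yearly_unit}")
```

In other words, 2.5 quintillion bytes is 2.5 exabytes a day, or a little over 900 exabytes a year if the rate stayed flat.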

For a long time, organisations have used data to measure how well they are performing. But now, an increasing number of businesses are realising that data can help them predict future trends, and can even help inform their direction and strategy to drive greater success.

Data science vs. statistics

So we’ve got more data, a greater ability to analyse it, and the will of businesses to extract useful information from it. This is where data science comes in. But you might also be thinking, isn’t this where statistics comes in? And you’d be right. As an article (which, ironically, sets out to explain the differences between data science and statistics) published on Educba rather unhelpfully puts it, “statistics is the science of data”.

Other opinions, including this article (which is in need of updating, but interesting nevertheless), state that broadly there is very little difference between the two. Indeed, pick up any syllabus for a degree in statistics – particularly an Applied Statistics degree – and you’re likely to find modules on machine learning, forecasting, or even data science.

It might not be a coincidence that Google searches for the term statistics have been in gradual decline whilst searches for data science have been rising.


Google search trends for statistics

So is data science simply statistics rebranded? The now ubiquitous data science Venn diagram is one of the most helpful illustrations of what data science is.

A data science Venn diagram

From this illustration, data science is clearly a very wide field. Generally, however, it is the process of using statistical and mathematical models and scientific methods to extract knowledge and insight from data, and using that insight to inform business strategy and direction.

Conclusion

Today, great data scientists will have a combination of statistical and mathematical understanding, coding skills in order to apply that knowledge to real scenarios, and the business acumen to convert findings into future strategy.
