What is data science?

By Data Tricks, 29 April 2019

Over the past five years or so, it’s been hard to ignore the rise of data science. Everyone seems to be talking about it. And, as put by a senior director I was recently talking to, ‘everybody wants to be a data scientist’.

If Google’s search trends are anything to go by, the rapid rise of data science began around 2013.

But the concepts that underpin the field we now know as data science didn’t suddenly materialise in 2013. In fact, machine learning – a cornerstone of data science – has been around for at least 70 years thanks to pioneers such as Alan Turing.

Interestingly, many machine learning algorithms don’t attract the same level of interest in Google searches. More searches for support vector machines were carried out in 2004 than today.

Google search trends for *support vector machines*

Perhaps this is not surprising given that support vector machines have been around for 25 years.

So, if machine learning predates the music of Elvis Presley, why has it taken so long for data science to take off?

Digital transformation

Digital transformation is now widespread across many industries, and it has increased the volume, velocity and variety of data. Over 2.5 quintillion bytes of data (that’s 2,500,000,000,000,000,000 bytes) are created every single day (see Domo’s report), and that figure is increasing all the time. At the same time, advances in cloud computing and distributed file systems have made it easier than ever to store, retrieve and analyse this data.
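To put the daily figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal (SI) units and a constant rate of data creation throughout the day, which is a simplification. The function names are purely illustrative.

```python
# Figure quoted in the article: 2.5 quintillion bytes created per day.
BYTES_PER_DAY = 2_500_000_000_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def to_exabytes(n_bytes):
    """Convert bytes to exabytes, assuming SI units (1 EB = 10**18 bytes)."""
    return n_bytes / 10**18


def bytes_per_second(n_bytes_per_day):
    """Average rate per second, assuming data is created evenly over the day."""
    return n_bytes_per_day / SECONDS_PER_DAY


exabytes_per_day = to_exabytes(BYTES_PER_DAY)
terabytes_per_second = bytes_per_second(BYTES_PER_DAY) / 10**12  # 1 TB = 10**12 bytes

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under those assumptions, 2.5 quintillion bytes a day is 2.5 exabytes a day, or roughly 29 terabytes every second.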

For a long time, organisations have used data to measure how well they are performing. But now, an increasing number of businesses are realising that data can help them predict future trends, and can even help inform their direction and strategy to drive greater success.

Data science vs. statistics

So we’ve got more data, a greater ability to analyse it, and the will of businesses to extract useful information from it. This is where data science comes in. But you might also be thinking, isn’t this where statistics comes in? And you’d be right. As an article (which, ironically, sets out to explain the differences between data science and statistics) published on Educba rather unhelpfully puts it, “statistics is the science of data”.

Other opinions, including this article (which is in need of updating, but interesting nevertheless), state that broadly there is very little difference between the two. Indeed, pick up any syllabus for a degree in statistics – particularly an Applied Statistics degree – and you’re likely to find modules on machine learning, forecasting, or even data science.

It might not be a coincidence that Google search interest in the term ‘statistics’ has gradually declined whilst interest in ‘data science’ has been rising.

So is data science simply statistics rebranded? The now ubiquitous data science Venn diagram is one of the most helpful illustrations of what data science is.

From this illustration, data science is clearly a very wide field. Generally, however, it is the process of using statistical and mathematical models and scientific methods to extract knowledge and insight from data, and using that insight to inform business strategy and direction.

Conclusion

Today, great data scientists will have a combination of statistical and mathematical understanding, coding skills in order to apply that knowledge to real scenarios, and the business acumen to convert findings into future strategy.

Tags: data science, machine learning, statistics
