What is Data Science? Why Do I Care?
Why Do We Care About Data?
Simply put, data touches every aspect of our lives.
Every day, roughly 2.5 quintillion bytes (2.5 exabytes) of data are produced, and that figure is projected to rise to 463 exabytes per day by 2025. Each data point collected offers an opportunity to better understand a specific problem.
Consider the following examples of how data is used to uncover valuable insights:
- Spotify recommends weekly playlists based on the songs you play the most.
- Amazon’s predictive analysis suggests items similar to your previous purchases and moves products closer to you in anticipation of your order to reduce delivery times.
- Hospitals utilize analytics to reduce inefficiencies such as patient wait times, readmissions, and understaffed units. This can end up saving lives!
- Smart power grids can predict power outages and signal when maintenance is needed.
- Reporting and tracking systems are used to improve agricultural production in developing countries, thereby reducing hunger and promoting rural growth.
- Police departments use data to predict and map crime across cities.
- Lawyers can analyze how specific federal judges have ruled in past cases.
- The IRS generates profiles that are used to forecast individual tax returns and flag tax evasion.
What is Data Science?
Hopefully those case studies convinced you that we should care about data and what we can do with it. However, collecting the data only scratches the surface. To put data to good use, organizations need analysts who can distill actionable insights from it and use them to inform strategy.
The process that data goes through is best described by the Data Science Life Cycle.
There are many data-focused careers out there, and each employer uses slightly different titles and job descriptions. One of the most popular is data scientist. At its core, a data scientist’s goal is to gain insight and understanding. One of the best categorizations I have come across of the types of insights data science can produce comes from Jeff Leek. They include descriptive (“the average client spends $30”), exploratory (“different products are more successful than others”), predictive (“we predict that a client with X,Y attributes will spend $Z”), and causal (“a randomized experiment shows that customers are more likely to buy a product when there is music playing in the store than when there is silence”).
Keep in mind that not everything that produces insights qualifies as data science. The classic definition is that data science combines statistics, software engineering, and domain expertise. The main marker is that data science always has a human in the loop: someone is interpreting the insight, viewing the figure, or benefiting from the conclusion.
This definition of data science thus emphasizes:
- Statistical inference
- Data visualization
- Experimental design
- Domain knowledge
- Communication
Data scientists might use simple tools: they could report percentages and make line graphs based on SQL queries. They could also use very complex methods: they might work with distributed data stores to analyze trillions of records, develop cutting-edge statistical techniques, and build interactive visualizations. Whatever they use, the goal is to gain a better understanding of their data.
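As an example of the simple end of that spectrum, the sketch below reports percentages from a SQL query. It assumes a hypothetical `orders` table with a `channel` column (both invented for illustration) and uses Python's built-in `sqlite3` module with an in-memory database so it runs without any setup.

```python
import sqlite3

# Hypothetical table: one row per order, tagged with the channel it came through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, channel TEXT)")
conn.executemany(
    "INSERT INTO orders (channel) VALUES (?)",
    [("web",), ("web",), ("web",), ("mobile",)],
)

# Share of orders per channel, as a percentage of all orders.
rows = conn.execute(
    """
    SELECT channel,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM orders), 1) AS pct
    FROM orders
    GROUP BY channel
    ORDER BY pct DESC
    """
).fetchall()

for channel, pct in rows:
    print(f"{channel}: {pct}%")

conn.close()
```

Multiplying by `100.0` rather than `100` forces floating-point division in SQLite; with integer arithmetic the percentages would be truncated.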
Note: Data science is not the same thing as Machine Learning (ML) and Artificial Intelligence (AI).
If you have a question or a suggestion related to the topic covered in this article, please feel free to contact me!