Data Science Skills – A Brief Guide

“Data is useless without the skill to analyze it” – Jeanne Harris, co-author of “Competing on Analytics: The New Science of Winning”

Are you looking to hire data scientists or develop them internally?

Or, you might be a new graduate wondering what skills are needed to be a top data scientist and what technical skills will be covered during data science assessments like QuantHub’s.

We put together this Data Science Skills guide to help you understand:

    • What skills are required for a data scientist?
    • What are the qualities of a good data scientist?

We’ll outline what the practice of data science covers and the key skillsets to look for in job candidates or to develop in your employees.

 

Skills Needed to be a Data Scientist

Before we get into specific skills, let’s address some basic definitions.

What is a Data Scientist?

The field of data science has existed for at least a decade in its current form.  You would think that it would be obvious by now what exactly a “data scientist” is and does.  To a certain extent, there’s agreement on what competencies and responsibilities constitute this role.

Data scientists are data experts who have the analytical and technical skills to explore and solve complex business problems.  Among other things, they manage data and find trends in it.

During the course of a typical day, a data scientist can assume many different roles, from software engineer to data miner to business communicator.

In recent years, the rapid growth of artificial intelligence and machine learning applications has continued to change the competencies required of a data scientist.

The varied nature of data science, along with the continuous change in technical tools, can make it difficult for organizations and individuals to identify necessary skills.

For this reason, we see that in many cases data scientist job descriptions focus too much on very specific qualifications, making it difficult to match a person’s skills to the job.  In other cases, candidates who lack the requisite level of qualifications are being recruited for data science roles.

So it’s important to have a grasp on foundational data science skills – the “must-have” skills that are critical to building a successful data science team or to becoming a top data scientist, regardless of new developments in the field.

What is Data Science?

Data Science is a cross-disciplinary set of competencies and roles. It involves, to varying degrees, statistics, programming, and business or industry skills.

The goal of anyone working in data science is to discover hidden patterns and insights from data.

Unlike “data analysis”, which typically focuses on explaining patterns in existing structured data sets, data science makes predictions and decisions about the future based on yet-to-be-identified patterns in any kind of raw structured or unstructured data.

Data science, in essence, is focused on discovering answers to questions that an organization has yet to think of.

What Does a Data Scientist Do?

Below is a diagram published in 2020 by IBM depicting the data science workflow. Data Scientists typically engage in all of these activities, each of which requires a certain skill set.

They first understand a business opportunity or context by working with management.

They then work across the organization to identify and uncover multiple data sources that relate to the business context of a project.

Working with IT and data engineers, they’ll ensure that their data sources are reliable enough to base business decisions upon.

Once the requisite data is cleaned and ready to use, Data Scientists build and train predictive models using algorithms and a variety of modeling techniques.

Eventually, after several iterations, once a model is validated and therefore valuable to the organization, they’ll assist in deploying the model in the appropriate parts of the organization.

They’ll then monitor these models for success and performance over time and ensure the model maintains accuracy.

Finally, they’ll communicate any findings and results, usually through visualization techniques and tools.

 

Foundational Data Science Skills

There’s a long list of academic, technical and soft skills that may or may not be required for any Data Scientist role.  Core data science skills, however, fall into three buckets: math/statistics, programming/coding, and business/domain skills.

Math Skills

Math skills can be some of the most challenging competencies for a data science team to obtain. The reason is unclear, but we suspect it’s because a lot of math is taught theoretically, whereas data science is about applying math. Math competencies related to data science focus primarily on statistics, linear algebra and differential calculus.

Statistics/Probability

The foundation of data science involves descriptive and inferential statistical methods and probability.  Knowledge in these areas provides fundamental techniques to use when working with data.  Statistics is the process of working with and analyzing a data set to identify its mathematical characteristics (e.g. mean or variance). Data Scientists then use those characteristics to make decisions.

Statistics and probability are the most fundamental data science skills required to be a Data Scientist.  Just a few of the many skills required in this area include:

  • Probability distributions
  • Statistical significance
  • Hypothesis testing
  • Regression
  • Bayesian concepts
  • Central Limit Theorem
  • Experimental Design
  • Sampling Methods
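
To make hypothesis testing and statistical significance a little more concrete, here is a minimal sketch of a two-sided permutation test written with only the Python standard library. The function name, the sample values and the 5% threshold are our own illustrative assumptions, and a permutation test is just one of several ways to test a difference in means.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical page-load times (seconds) for two site designs.
design_a = [12.1, 11.8, 12.5, 12.9, 11.5, 12.3, 12.7, 12.0]
design_b = [13.4, 13.9, 13.1, 14.2, 13.6, 13.0, 13.8, 14.0]

p_value = permutation_test(design_a, design_b)
print(f"p-value: {p_value:.4f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

The null hypothesis here is that the group labels do not matter. A small p-value means that shuffling the labels rarely reproduces a gap as large as the one observed.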

Linear Algebra

Many machine learning concepts are tied to linear algebra. Along with calculus, linear algebra forms the backbone of algorithms, so at least a general understanding of algebraic functions is required of Data Scientists.  For a Machine Learning Engineer or anyone working with deep learning algorithms, linear algebra concepts are critical.

Some relevant concepts include:

  • Mathematical objects (scalar, vector, matrix, tensor)
  • Computational rules (matrix-scalar, matrix-vector, matrix multiplication, etc.)
  • Inverse and Transpose
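
The concepts above can be sketched in a few lines of plain Python, with matrices stored as lists of rows. The helper names are our own, and in practice a library such as NumPy would normally do this work; the point is only to show what each operation computes.

```python
def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def scalar_multiply(scalar, matrix):
    """Multiply every entry of the matrix by a scalar."""
    return [[scalar * value for value in row] for row in matrix]


def matmul(left, right):
    """Matrix product: each entry is a row-by-column dot product."""
    if len(left[0]) != len(right):
        raise ValueError("inner dimensions must match")
    right_columns = transpose(right)
    return [
        [sum(a * b for a, b in zip(row, column)) for column in right_columns]
        for row in left
    ]


def inverse_2x2(matrix):
    """Inverse of a 2x2 matrix via the determinant formula."""
    (a, b), (c, d) = matrix
    determinant = a * d - b * c
    if determinant == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]


A = [[4, 7], [2, 6]]
v = [[1], [2]]  # a column vector stored as a 2x1 matrix

print(transpose(A))           # [[4, 2], [7, 6]]
print(scalar_multiply(2, A))  # [[8, 14], [4, 12]]
print(matmul(A, v))           # [[18], [14]]

# Multiplying a matrix by its inverse gives the identity matrix,
# up to floating-point rounding.
A_inv = inverse_2x2(A)
identity = matmul(A, A_inv)
print(identity)
```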

Calculus

Like linear algebra, calculus is a field of math key to machine learning algorithms.  Data Scientists use it in machine learning and deep learning to formulate the functions used to train algorithms to reach their objective.

Data science-related skills include:

  • Univariate and multivariate calculus
  • Derivatives
  • Gradient descent
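
As a rough illustration of how derivatives and gradient descent fit together, the sketch below minimizes a simple two-variable function by repeatedly stepping against its partial derivatives. The function, starting point, learning rate and step count are all assumptions chosen so the example converges quickly.

```python
def loss(x, y):
    """A bowl-shaped function whose minimum is at x = 1, y = -2."""
    return (x - 1) ** 2 + 2 * (y + 2) ** 2


def gradient(x, y):
    """Partial derivatives of the loss with respect to x and y."""
    return 2 * (x - 1), 4 * (y + 2)


def gradient_descent(start, learning_rate=0.1, steps=200):
    """Move against the gradient a fixed number of times."""
    x, y = start
    for _ in range(steps):
        grad_x, grad_y = gradient(x, y)
        x -= learning_rate * grad_x
        y -= learning_rate * grad_y
    return x, y


x_min, y_min = gradient_descent(start=(5.0, 5.0))
print(f"minimum near x={x_min:.4f}, y={y_min:.4f}, loss={loss(x_min, y_min):.2e}")
```

Training a machine learning model follows the same pattern, except that the loss measures prediction error and the variables are the model’s parameters.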

Programming Skills

Coding permits a Data Scientist to convert theoretical knowledge (e.g. of statistics) into practical applications.  It’s now widely accepted that every Data Scientist should know Python.  R is also an option but is losing ground to Python.

At any rate, a data science candidate should be able to code proficiently in one of these languages.

A solid understanding of programming concepts, data structures such as trees and graphs, and knowledge of commonly used algorithms is necessary to do the job.

Other fundamental programming techniques a Data Scientist should know are:

  • Basic syntax and functions
  • Flow control statements
  • Object-oriented programming
  • Libraries such as NumPy and pandas
  • Documentation (reading and writing)

Business/Domain Skills

With the failure of many data science initiatives in the early days of big data, organizations now recognize that Data Scientists should have an understanding of basic business concepts.  It’s also highly recommended that you hire or develop Data Scientists who have some knowledge or experience in your particular industry.

This is one argument for building your data science team by developing internal employees who already have domain experience and context.

At any rate, Data Scientists should be willing and able to frame their work in the context of a company’s strategic business goals.

9 Top Data Science Skills

In addition to these broad buckets of core competencies, there’s another layer of skills beneath them that typically rounds out the “top” data science skills most data teams need. These are often broken into technical and non-technical skills.

Technical Data Science Skills

Data Wrangling

(source: I2tutorials)

Data wrangling constitutes a series of tasks that can take the majority of a Data Scientist’s time.  It’s critical that a Data Scientist be adept at data wrangling tasks because it’s often during this phase that important discoveries are made.

In all data science projects, data needs to be hunted down from a variety of sources, combined and formatted in such a way that it is reliable enough to use for decision making. This multi-step process is called data wrangling.

Digging up data often involves using hacking skills such as writing complex SQL queries to extract data, manipulating text files using Python scripts, or understanding coding algorithms.

In addition to finding necessary data, wrangling skills involve the ability to:

  • Understand the business question and clarify related data aspects, such as types of data to collect and time frame
  • Collect data, which involves requesting and accessing various databases across the organization
  • Prepare (clean) data, which involves manipulating data and dealing with anomalies such as missing values, outliers and redundancies
  • Identify relationships in data
  • Create machine learning features, for example by filling in missing data
  • Explore data through visualization and reports
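
The data preparation step can be sketched with nothing more than the Python standard library. The records, field names and outlier threshold below are hypothetical, and the choices made (median imputation, a median-absolute-deviation rule for outliers) are just one reasonable option; in practice most teams would reach for pandas.

```python
from statistics import median

# Hypothetical customer records with a duplicate, a missing value and an outlier.
raw_records = [
    {"id": 1, "age": 34, "spend": 120.0},
    {"id": 2, "age": None, "spend": 80.0},
    {"id": 3, "age": 29, "spend": 95.0},
    {"id": 3, "age": 29, "spend": 95.0},
    {"id": 4, "age": 41, "spend": 5000.0},
    {"id": 5, "age": 38, "spend": 110.0},
]


def drop_duplicates(records, key="id"):
    """Keep the first record seen for each key value."""
    seen, unique = set(), []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(dict(record))  # copy so the raw data stays untouched
    return unique


def impute_median(records, field):
    """Replace missing values in a field with the median of the known values."""
    known = [record[field] for record in records if record[field] is not None]
    fill_value = median(known)
    for record in records:
        if record[field] is None:
            record[field] = fill_value
    return fill_value


def flag_outliers(records, field, threshold=5.0):
    """Return ids whose value lies far from the median, measured in MADs."""
    values = [record[field] for record in records]
    center = median(values)
    mad = median([abs(value - center) for value in values])
    if mad == 0:
        return []  # no spread, so nothing can be called an outlier
    return [record["id"] for record in records
            if abs(record[field] - center) / mad > threshold]


cleaned = drop_duplicates(raw_records)
fill_value = impute_median(cleaned, "age")
outlier_ids = flag_outliers(cleaned, "spend")

print(f"{len(cleaned)} unique records, missing ages filled with {fill_value}")
print(f"records flagged for review: {outlier_ids}")
```

Flagged records are returned for review rather than deleted, because an unusual value is sometimes the most interesting finding in the data.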

Essentially, a Data Scientist must know how to get the right data for a project and how to put it into a usable and valuable form.

Model Building and Deployment

Model building is at the core of executing data science initiatives.

Data Scientists need to know multiple modeling techniques, model validation, and model selection techniques. They also need to know how to deploy a validated model and monitor it to maintain the accuracy of results.

Some specific types of skills associated with model building include:

  • A predictive mindset
  • An understanding of predictive techniques (regression, classification) and why to use them.
  • Critical thinking about attributes
  • An understanding of how to interpret results and validate a model (k-fold or leave-one-out cross-validation)
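
Here is a minimal sketch of k-fold and leave-one-out validation, using a one-variable least-squares line as a stand-in model. The data is synthetic and the helper names are our own. Real projects would usually shuffle the rows before splitting and rely on a library such as scikit-learn.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_mean - slope * x_mean


def cross_validate(xs, ys, k):
    """Return the mean squared error on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        scores.append(sum(errors) / len(errors))
    return scores


# Synthetic data: a straight line plus a small alternating disturbance.
xs = list(range(10))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

five_fold = cross_validate(xs, ys, k=5)
leave_one_out = cross_validate(xs, ys, k=len(xs))  # one sample per fold

print("5-fold MSE per fold:", [round(score, 3) for score in five_fold])
print("leave-one-out mean MSE:", round(sum(leave_one_out) / len(leave_one_out), 3))
```

Each fold is scored on data the model never saw during fitting, which is what makes the resulting error a fair estimate of performance on new data.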

Top-performing data scientists are differentiated by their ability to understand the use of different modeling methodologies to obtain insights from data that translate into value for the business.

They are also able to confidently defend their analysis and explain what they did and how their technique works.

SQL

SQL skills are a long-standing prerequisite for success.

This is because being able to run the right search can create a lot of value out of data.  Good SQL skills allow a Data Scientist to dig into the vast swaths of legacy and list-based data that go unused and find the right kind of information using queries.

Some SQL skills specific to data science include:

  • Relational Database Model
  • SQL commands – data query language, data manipulation language, data definition language, data control language
  • Primary and foreign keys
  • Null values
  • Subqueries
  • Indexes
  • Creating tables
  • Joins
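
Several of these concepts can be tried out with Python’s built-in sqlite3 module and an in-memory database. The tables, column names and rows below are hypothetical, and SQLite’s dialect differs in small ways from other databases, so treat this as a sketch rather than a template.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Data definition: two related tables and an index on the foreign key.
conn.executescript("""
CREATE TABLE customers (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    region TEXT                -- nullable on purpose
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

# Data manipulation: insert a few rows.
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ada", "East"), (2, "Grace", None), (3, "Linus", "West")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 50.0), (2, 1, 150.0), (3, 2, 75.0)])

# Join: total spend per customer, keeping customers with no orders.
totals = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0.0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Null values are matched with IS NULL, not with "=".
missing_region = conn.execute(
    "SELECT name FROM customers WHERE region IS NULL").fetchall()

# Subquery: orders larger than the average order.
large_orders = conn.execute("""
    SELECT id, amount FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
""").fetchall()

# The foreign key rejects an order that points at a customer who does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (4, 99, 10.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(totals)
print(missing_region)
print(large_orders)
print("foreign key enforced:", fk_enforced)
```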

Data Visualization

Data scientists use visualization for exploring data and also for communicating the story that the data tells.

In order to communicate model results and analytical outcomes, data scientists must be able to present what might be thousands of rows of data in a way that is understandable. They do this using data visualization tools and techniques.

Part of visualization skills involves determining which visualization best fits the data set and expresses it most effectively. Basic level skills include creating graphs, charts, and other graphical images.  These include bar, scatter and line charts, heatmaps, and word clouds.

Visualization skills also include understanding the components of a good data visualization: data, geometry, mapping, scale, and labels.

To create visualizations, data scientists may need to code in Python or another language, using plotting libraries, or know how to use tools such as Tableau, Highcharts and Power BI.

For end-user consumption, data scientists need to be able to transform data into a more interactive display that communicates insights clearly and effectively for use throughout the organization.

To do this, a data scientist needs to be able to answer the question: what is the end user trying to learn from this data?

Machine Learning

In this age of artificial intelligence, machine learning skills have become indispensable for data scientists. But what are these skills exactly?

They mainly involve being familiar with supervised and unsupervised algorithms.  A few of the key algorithms that a data scientist should be familiar with are:

  • Simple, multiple and logistic regression algorithms
  • Linear model
  • Support Vector Machine
  • K-nearest neighbors
  • Decision Trees
  • Neural Networks
  • K-means clustering

In addition, anyone doing machine learning should be well versed in Python.
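
To show what a supervised algorithm looks like under the hood, here is a bare-bones k-nearest-neighbors classifier in plain Python. The training points and labels are made up for illustration, and a production project would normally use an established library instead.

```python
import math
from collections import Counter


def knn_predict(training_data, query_point, k=3):
    """Label a point by majority vote among its k closest training points."""
    neighbors = sorted(
        training_data,
        key=lambda item: math.dist(item[0], query_point),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples: (feature vector, class label).
training_data = [
    ((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
    ((6.0, 6.0), "b"), ((7.0, 7.5), "b"), ((6.5, 8.0), "b"),
]

print(knn_predict(training_data, (1.2, 1.5)))  # a
print(knn_predict(training_data, (6.8, 7.0)))  # b
```

The algorithm is supervised because it learns from examples that already carry the right answer. An unsupervised method such as k-means would receive only the points and have to discover the two groups on its own.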

Non-Technical Skills

Data science is as much about people, teamwork and non-technical skills as it is about nuts-and-bolts mathematics.  So what are the qualities of a good data scientist that don’t involve technical skills?

The Data Science Process

There’s a data science methodology and workflow that all professionals should understand and follow.  In any interview, candidates should be asked to describe it.

The basic steps are:

  • Characterize and understand a business problem
  • Formulate a hypothesis
  • Choose and use a variety of methodologies in the analytics cycle
  • Plan for the execution of analyses

The last two steps are depicted in the schema below. They cover many of the technical skills described previously.

Data science workflows could look slightly different for different teams, companies and individual Data Scientists.  Generally, Data Scientists should know how to organize their work, where to put data and code, which tools to use and why.

Source: Konstantin in Towards Data Science

 

Problem Solving Skills

Data Scientists should have a rigorous data-driven problem-solving approach to their thinking.  Top Data Scientists are able to discern which problems are important to solve and then model what is critical to solving the problem.

There’s no template for solving a data science problem. The path to solving a business problem changes with every new dataset.

In addition, the practice of data science is riddled with challenges like missing data values, uncooperative stakeholders and coding bugs.

Data Scientists need to be comfortable with this uncertainty of the job.

Communication

Along with being able to create great visualizations to communicate results to end users, Data Scientists must possess persuasive communication skills and strong interpersonal skills to see a project from start to finish.

In their role, they may have to interact with a variety of personalities and stakeholders from technical IT and software engineers to marketing managers and other functional staff to C-suite managers. Certainly, to progress in the ranks as a Data Scientist, communication skills need to be strong.

Curiosity

Albert Einstein famously said:

“I have no special talent, I am only passionately curious”

The same can be said for good Data Scientists.  This personality trait is often a key differentiator in job interviews.

Data is messy and complex.  No one knows what insights it holds. It’s up to the Data Scientists to be curious about what data can tell a business and figure out a way to find that out.

To do this, they must be naturally curious, creative and eager to try new things, experiment and apply new concepts to their work.

Summary

We’ve covered a lot of skill sets and competencies in this guide, yet we’ve really only scratched the surface.  We’ve left out many other highly specific skills such as Hadoop, TensorFlow, deep learning and other “nice to have” skills such as cloud software skills and data ethics.

The fact is there are over 50 potential skills and tools that could be required of a Data Scientist for a particular role or company. It’s impossible to cover them all.

We consider these 3 broad data science skill buckets and 9 additional skills to be the bottom-line skills for today’s Data Science candidates. They allow candidates to hit the ground running, then grow and learn in their role to acquire the many other skills out there.

That’s why perhaps the most important skill of all for a Data Scientist is the desire to learn and improve their data science skills.

If you’d like to read more detail about specific data science skills, check out our article outlining “50 Data Science Interview Questions”, which will give you an idea of which skill areas to assess.

Would you like to learn more about assessing data science skills?  Contact sales@quanthub.com to discuss your ideal data science skill set and we’ll help you figure out how to assess candidates for those!