Data science interview prep (relax, start here)

By Julia Hamilton on July 28, 2021

Data science interview preparation is a big challenge. There are a variety of question types you'll need to master, and tech companies like Meta, Amazon, and Google don’t always focus on the same ones.

So, what do you do? And where do you start? That’s where we come in.

From analyzing 300+ data scientist interview questions reported by real candidates on Glassdoor, we’ve got a good idea of how the recruitment process works at top tech companies.

Below, you’ll find an interview timeline, a breakdown of the five most common question types, practice examples, and links to resources for extra interview prep.

To begin, here’s a quick overview of our step-by-step guide:

Learn the data science interview process
Know the question types
Practice with example questions
Do mock interviews


1. Learn the data science interview process

What's the interview process and timeline for a data scientist? We'll be giving a general overview here, but if you need insight into a particular company, refer to one of our company-specific preparation guides below.

Generally, the process takes between 2 and 8 weeks in total but can take longer. It typically follows these steps:

1.1 What interviews to expect

HR recruiter phone screen (~30 min)
Technical screen (1-2 screens, 45-60 min)
Onsite (5-6 interviews, 45-60 min each)

Let's take a look at each of those steps in a bit more detail.

1.1.1 HR recruiter phone screen

In most cases, you'll start your interview process by talking to an HR recruiter on the phone. They are looking to confirm that you've got a chance of getting the data science job at all, so be prepared to explain your background and why you’re a good fit at the company. You should expect typical behavioral and resume questions like, "Tell me about yourself", "Why [this company]?", or "Tell me about your current day-to-day."

If you get past this first HR call, the recruiter will then help schedule a technical screen. At this time, they’ll also walk you through the next steps in the hiring process and they’ll likely share some company resources to help you prepare.

1.1.2 Technical screen

The technical screen consists of one or two phone interviews and/or a take-home assessment. The type of interviews you’ll face at this stage depends on which company you’re applying to.

Facebook tends toward one or two interviews with a focus on SQL and product analytics, while Google’s technical screen consists of one interview centered around statistics and coding. Amazon has the most variation in technical screens, with one or multiple take-home assignments and/or interviews focusing on live coding and machine learning questions.

Whether you’re interviewing with one of these companies or a different one, check with your recruiter if you are unsure of the process ahead of you. To prepare for specific question types, see sections 2 and 3 below.

1.1.3 Onsite

If you pass the HR and technical screens, you'll be invited onsite. The onsite interviews are the biggest test for data scientist candidates. During this interview loop, you'll have 5 or 6 interview rounds lasting between 30 and 60 minutes, in addition to lunch in the company cafeteria.

You'll mostly be interviewed by current data scientists. But, depending on the company, role, and circumstances, you may also have interviews with an HR rep, a senior executive or, in Amazon’s case, the “Bar Raiser.” If you're physically onsite, your lunch may take the form of an informal chat with a junior employee, or an additional lunch interview.

[COVID Update] Given the Covid-19 pandemic, your onsite interviews will likely be conducted virtually. You can ask your recruiter for the latest information on their Covid-19 adjustments.

Right, ready to get into the interview questions?

Let's go.

2. Know the question types

The questions you'll be asked in data scientist interviews can be boiled down into five broad categories. Below, you'll find a breakdown of each type, showing the frequency at which they appeared in the 300+ questions we’ve analyzed from leading tech companies.

Coding questions (38%) test your problem-solving and data manipulation skills through SQL, data structure, algorithm, and modeling questions.
Statistics questions (21%) test your understanding of statistical concepts and your experience applying them in your past projects.
Machine learning questions (17%) test your ability to build innovative systems that improve and remain accurate over time.
Product sense questions (12%) test your ability to use your technical knowledge to drive business and product decisions.
Behavioral questions (13%) test your culture fit to the company with which you’re interviewing, according to your past experiences and current motivations.

To understand the full breakdown of each question type across the three companies we studied, refer to the graphic below.

[Figure: Data science interview question breakdown]

If you're wondering how to prepare for these question types, don't worry, we have plenty of examples later in this article that should help, along with extra resources to study.

Note that many of these questions come in the form of case studies. For more information on what to expect and how to prepare for data science case interviews, take a look here.

3. Practice with example questions

Now that you’ve seen the high-level breakdown of the most common data scientist interview questions, let’s get into some examples.

3.1 Coding (38%)

While the frequency of coding questions will vary by company, all data scientist interviews will require a solid technical skillset. This is because data scientists must be able to work with their company's vast datasets to understand and solve real-world problems.

Of course, coding encompasses many types of questions that require different skills. So we’ve divided this category into the following subcategories:

SQL
Data structures and algorithms
Modeling
Statistical coding (Google only)

Consult the graphic below to see the frequency of each subcategory as compared with the other question types.

[Figure: Data science interview question percentages]

As you may have noticed, SQL as well as data structure and algorithm problems tend to come up as frequently as product sense and behavioral questions. This highlights just how important technical skills are in a data scientist role. Spend extra time preparing this section, using the questions and resources below.

In most cases you will be coding on a whiteboard (or the virtual equivalent), so practice writing your scripts and queries on paper while speaking through your reasoning.

Below, you’ll find real data scientist interview questions reported by past candidates, links to solutions when appropriate, and resources to help you prepare.

Data scientist interview questions - Coding

SQL (14% of total)

How would you find the top 5 highest-selling items from a list of order histories?
Given three columns of data, how would you compare the first three to the last three?
How do you calculate the median for a given column of numbers in a data set?
Provided a table with user_id and the dates they visited the platform, find the top 100 users with the longest continuous streak of visiting the platform as of yesterday.
Provided a table with page_id, event timestamp, and an on/off status flag, find the number of pages that are currently on.
What's the difference between a left join, a union, and a right join?
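
To make the first question above concrete, here's a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its `item` and `quantity` columns, and the sample rows are all assumptions invented for this example. In a real interview you'd be given the schema, and "highest-selling" might mean revenue rather than units.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
ORDERS = [
    ("apple", 3), ("pear", 1), ("apple", 2), ("kiwi", 7),
    ("fig", 4), ("plum", 5), ("pear", 1), ("lime", 6),
]

TOP_ITEMS_SQL = """
    SELECT item, SUM(quantity) AS units_sold
    FROM orders
    GROUP BY item
    ORDER BY units_sold DESC, item ASC   -- item name breaks ties deterministically
    LIMIT ?
"""


def top_selling_items(rows, limit=5):
    """Return the `limit` items with the highest total quantity sold."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (item TEXT, quantity INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(TOP_ITEMS_SQL, (limit,)).fetchall()
    finally:
        conn.close()
```

Calling `top_selling_items(ORDERS)` returns `(item, units_sold)` pairs, best seller first. The pattern to say out loud at the whiteboard is aggregate, then order, then limit.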

Data structure and algorithms (12% of total)

How do you invert a binary tree? (Frequent question. Solution)
Given a bar plot, imagine you are pouring water from the top. How do you quantify how much water can be kept in the bar chart? (Solution)
Write a Python function that displays the first n Fibonacci numbers. (Solution)
Given a list, search for consecutive numbers (n) whose sum is equal to a specific number (x).
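
As a rough illustration of two of the questions above, here's one possible sketch of inverting a binary tree and of the "water in a bar chart" problem. The `Node` class and the function names are our own assumptions, not part of any reported question. The water function uses the common two-pointer approach, which is only one of several valid solutions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal tree node used only for this example."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def invert(root: Optional[Node]) -> Optional[Node]:
    """Mirror the tree in place by swapping the children of every node."""
    if root is not None:
        root.left, root.right = invert(root.right), invert(root.left)
    return root


def trapped_water(heights: List[int]) -> int:
    """Units of water held between bars, using two pointers in O(n) time."""
    left, right = 0, len(heights) - 1
    left_max = right_max = total = 0
    while left < right:
        if heights[left] < heights[right]:
            # The left side is the limiting wall for this position.
            left_max = max(left_max, heights[left])
            total += left_max - heights[left]
            left += 1
        else:
            # The right side is the limiting wall for this position.
            right_max = max(right_max, heights[right])
            total += right_max - heights[right]
            right -= 1
    return total
```

Both functions visit each element once. In an interview, stating the time and space complexity unprompted is usually worth the extra sentence.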

Modeling (6% of total)

We have two models, one with 85% accuracy, one 82%. Which one do you pick? (Solution)
How would you improve a classification model that suffers from low precision?
How would you create a model to find bad sellers on marketplace?
Assume you have a file containing data in the form of data = [{"one":a1, "two":b1,...},{"one":a2, "two":b2,...},{"one":a3, "two":b3,...},...] How could you split this data into 30% test and 70% train data?
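
For the last question above, here's a minimal sketch of a 70/30 split over a list of dictionaries using only the standard library. The function name, the `seed` parameter, and the sample records are assumptions for illustration. In practice many candidates would reach for a library helper instead.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    shuffled = list(records)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)    # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data in the shape described in the question.
data = [{"one": i, "two": i * 2} for i in range(10)]
train, test = train_test_split(data)
```

Shuffling before slicing matters: if the file is ordered (by date or by label, for example), taking the first 30% as the test set would bias the evaluation.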

Statistical coding (4% of total - Google only)

Write a function to generate N samples from a normal distribution and plot the histogram. (Solution)
Write code to generate iid draws from distribution X when we only have access to a random number generator.
Given a list of characters, a list of prior probabilities for each character, and a matrix of probabilities for each character combination, return the optimal sequence for the highest probability.
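
The first two questions above can be combined into one sketch: generating normal samples when all you have is a uniform random number generator. The version below uses the Box-Muller transform, which is one standard option and not necessarily what an interviewer expects. The function names are our own, and the histogram is rendered as text rows to keep the example dependency-free (a plotting library would be the usual choice).

```python
import math
import random


def normal_samples(n, mean=0.0, std=1.0, seed=None):
    """Draw n normal samples from uniform draws via the Box-Muller transform."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        u1 = 1.0 - rng.random()          # shift to (0, 1] so log(u1) is defined
        u2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent normal draws.
        samples.append(mean + std * radius * math.cos(angle))
        samples.append(mean + std * radius * math.sin(angle))
    return samples[:n]


def text_histogram(samples, bins=10, width=40):
    """Return (bin_start, count, bar) rows for a quick text histogram."""
    low, high = min(samples), max(samples)
    step = (high - low) / bins or 1.0    # avoid zero-width bins
    counts = [0] * bins
    for x in samples:
        index = min(int((x - low) / step), bins - 1)
        counts[index] += 1
    peak = max(counts)
    return [(low + i * step, c, "#" * (c * width // peak))
            for i, c in enumerate(counts)]
```

Printing the third element of each row from `text_histogram(normal_samples(10000))` should give a rough bell shape, which is a quick sanity check you can describe aloud.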

To prepare for the coding interview questions, start with the video below that shows a step-by-step method by Amazon for answering programming questions.

Practice this method using practice questions such as those above. Also, practice SQL and programming with medium and hard level examples on LeetCode, HackerRank, or StrataScratch. For even more help with SQL, read this analysis of the 3 "types" of SQL problems.

3.2 Statistics (21%)

Data scientists must be able to derive useful insights from large and complex datasets, which makes statistical analysis an important part of their daily work. Additionally, for companies that place an emphasis on machine learning, statistical analysis can help to clean and sort data for machine learning projects.

Prior to your interviews, take some time to brush up on statistics fundamentals and to practice giving concise explanations of statistical terms (e.g. p-value, recall). In addition, it's pretty common to get questions related to A/B testing, probability, and statistical models. At Google, this is the most prevalent question type.

Below, we've listed several example questions that were asked in prior data scientist interviews, according to data from Glassdoor, followed by resources to help you out.

Data scientist interview questions - Statistics

Make an unfair coin fair. (Solution)
What is the assumption of error in linear regression? (Solution)
Given uniform distributions X and Y, both with mean 0 and standard deviation 1, what’s the probability of 2X > Y? (Solution)
What is p-value?
What is the maximum likelihood of getting k heads when you toss a coin n times? Write down the mathematics behind it.
What is the difference between linear regression and a t-test?
How would you do an A/B test on a new metric to see if it truly captures meaningful social interactions?
What is "recall", can you explain it from scratch?
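
For "Make an unfair coin fair", one well-known answer is the von Neumann trick: flip twice, keep the result only if the two flips differ, and otherwise try again. Here's a minimal sketch. The `biased_flip` helper and its 0.8 bias are assumptions made up to simulate an unfair coin.

```python
import random


def biased_flip(rng, p_heads=0.8):
    """A stand-in unfair coin; p_heads is an arbitrary assumed bias."""
    return "H" if rng.random() < p_heads else "T"


def fair_flip(rng, p_heads=0.8):
    """Von Neumann trick: flip twice, keep HT or TH, retry on HH or TT."""
    while True:
        first = biased_flip(rng, p_heads)
        second = biased_flip(rng, p_heads)
        if first != second:
            return first
```

This works because, for independent flips with heads probability p, the outcomes HT and TH both have probability p(1 - p), so the first flip of a kept pair is heads exactly half the time. The cost is extra flips, and more of them the more biased the coin is.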

For extra help on preparing for statistics questions, Brilliant.org offers online courses designed around statistical probability and other useful topics, some of which are free. Search for specific questions and answers around statistics, machine learning, data analysis, and others on StackExchange. Finally, you can post your own questions and discuss topics likely to come up in your job interview on the statistics subreddit.

3.3 Machine learning (17%)

Data scientists devote a lot of their time to creating products and solving problems that are endlessly complex and constantly evolving. So your interviewer may test your ability to build innovative algorithms that improve and remain accurate over time.

Depending on the role, your interviewer may ask you to define and discuss specific ideas around system design and machine learning models. More in-depth machine learning rounds will require you to build out a hypothetical model or discuss how to improve existing ones related to real-life business decisions (more common at Google and Amazon). Other companies, like Facebook, may not focus on machine learning in their interviews.

Let’s get into some example questions.

Data scientist interview questions - Machine learning

Why use feature selection? (Solution)
When using a Gaussian mixture model, how do you know it is applicable?
What is the difference between K-means and EM?
Describe a case where you have solved an ambiguous business problem using machine learning.
How does a neural network with one layer and one input and output compare to a logistic regression?
What is L1 vs L2 regularization?
What is the difference between bagging and boosting?
Having a categorical variable with thousands of distinct values, how would you encode it?
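
For the last question above, one option among several (others include target encoding and learned embeddings) is the hashing trick: map each category to one of a fixed number of buckets. Below is a minimal sketch. The function names and the bucket count of 32 are assumptions chosen for illustration.

```python
import hashlib


def hash_bucket(category, n_buckets=32):
    """Map a category string to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.sha256(category.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hashed_one_hot(category, n_buckets=32):
    """One-hot vector over buckets instead of over every distinct value."""
    vector = [0] * n_buckets
    vector[hash_bucket(category, n_buckets)] = 1
    return vector
```

The trade-off worth mentioning in an interview is that distinct categories can collide in the same bucket. In exchange, the feature width stays fixed and previously unseen values need no special handling.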

To prepare for machine learning questions, StackExchange is useful here as well. Kaggle also offers free courses on introductory and intermediate machine learning, as well as data cleaning, data visualization, SQL, and other topics. Take a look at Facebook’s machine learning field guide for an end-to-end process for implementing machine learning solutions. This can be helpful whether or not you’re interviewing with Facebook.

3.4 Product sense (12%)

In addition to all the skills we’ve mentioned above, data scientists help to drive product and business decisions. Through a variety of techniques, they generate insights that can improve a company’s products and grow its business.

So come prepared to apply your technical knowledge to business case scenarios. For example, Google tends to ask questions that use statistical A/B testing to compare the performance of their products and services, while Facebook tends to ask questions about metrics (e.g. how to set good metrics or how to react to metric changes). On the other hand, Amazon asks very few questions on product sense.

If your recruiter has indicated incoming product or business sense interviews, use the real example questions and resources below to start preparing.

Data scientist interview questions - Product sense

You have a Google app and you make a change. How do you test if a metric has increased or not? (Solution)
Facebook user groups have gone down by 20%, what will you do?
How would you improve product notifications?
How would you set up an experiment to understand a feature change in Instagram stories?
How would you determine whether upgrading the Android system produces more searches?
How do you detect viruses or inappropriate content on YouTube?
Given there are no metrics being tracked for Google Docs, a product manager comes to you and asks, what are the top five metrics you would implement?
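
Several of the questions above come down to "did the metric really change?". Here's a minimal sketch of one common way to check this for a conversion-style metric: a one-sided two-proportion z-test comparing a control group (A) with a treatment group (B). The function name and the example counts are assumptions, and the sketch assumes independent users, large samples, and a pooled rate strictly between 0 and 1.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided test of H1: rate_b > rate_a. Returns (z, p_value)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail normal probability
    return z, p_value


# Hypothetical counts: 100 of 1,000 users convert in A, 150 of 1,000 in B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
```

A small p-value suggests the increase is unlikely to be noise. In an interview, it's also worth discussing how you'd choose the sample size up front and which guardrail metrics you'd watch alongside the primary one.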

To prepare for product sense questions, we'd recommend studying our product management guides on metric or product improvement questions. Also, familiarize yourself with the products produced by the company you're interviewing with, as you'll likely be asked questions about them. Here's a short resource guide related to product metrics, included in Facebook's onsite interview guide. Finally, watch this short video for a method to answer data science product sense questions.

3.5 Behavioral (13%)

Finally, you can expect to be asked behavioral or "resume" questions about your past work experience and your motivation for applying to the company at hand. Your interviewers are looking for you to demonstrate your culture fit as well as your ability to communicate clearly.

Be strategic by aligning your answers for behavioral questions with the top qualifications that are listed in the description of the job you’re applying for. If the company has published its core values, study those and be ready to list examples from your past positions that align with each value. This is extremely important at a company like Amazon, with its 16 leadership principles.

Give it a try using the example questions and techniques below.

Data scientist interview questions - Behavioral

Tell me about yourself
Why do you want to work at this company? (sample answer from Amazon interviews)
Why data science?
How would you measure the impact of a business initiative?
Tell me about a project you worked on that was not successful. What would you do differently?
What is the one feedback/complaint you always get from your colleagues? How are you working on such feedback?
How do you sort your priorities when engaged in multitasking?

To prepare for behavioral interview questions, we recommend learning our step-by-step method for answering behavioral questions. The guide focuses primarily on Meta but can be applied to any company. You can then use this method to try answering the practice examples provided above.

4. How to prepare for data science interviews

We've coached more than 15,000 people for interviews since 2018. There are essentially three activities you can do to practice for interviews. Here’s what we've learned about each of them.

4.1 Learn by yourself

Learning by yourself is an essential first step. We recommend you make full use of the resources we've linked to in this article and the free prep resources on this blog.

Once you’re in command of the subject matter, you’ll want to practice answering questions. But by yourself, you can’t simulate thinking on your feet or the pressure of performing in front of a stranger. Plus, there are no unexpected follow-up questions and no feedback.

That’s why many candidates try to practice with friends or peers.

4.2 Practice with peers

If you have friends or peers who can do mock interviews with you, that's an option worth trying. It’s free, but be warned, you may come up against the following problems:

It’s hard to know if the feedback you get is accurate
They’re unlikely to have insider knowledge of interviews at your target company
On peer platforms, people often waste your time by not showing up

For those reasons, many candidates skip peer mock interviews and go straight to mock interviews with an expert.

4.3 Practice with experienced tech interviewers

In our experience, practicing real interviews with experts who can give you company-specific feedback makes a huge difference.

Find a data science interview coach so you can:

Test yourself under real interview conditions
Get accurate feedback from a real expert
Build your confidence
Get company-specific insights
Learn how to tell the right stories better
Save time by focusing your preparation

Landing a job at a big tech company often results in a $50,000 per year or more increase in total compensation. In our experience, three or four coaching sessions worth ~$500 make a significant difference in your ability to land the job. That’s an ROI of 100x!

Click here to book data science mock interviews with experienced FAANG interviewers