The mother of all Data Science Interview Guides

Having gone through hundreds of data science interviews, I realized that they are all fairly similar in the types of questions they ask. This helped me build a framework for what to expect and how to answer. That framework helped me land multiple job offers, at levels higher than the ones I applied for.


Let me first outline the rounds you’ll generally face. Then we’ll dig into the questions you’re likely to see in each round and how to answer them.

  1. HR Phone Screen

  2. Programming Rounds

    1. SQL

    2. Python

  3. Case Study Rounds

  4. Behavioral Rounds



Part 1: HR Phone Screen

Candidates often think this call merely tests whether they are right for the role. The recruiter asks a few questions about your work experience and runs through a list of requirements: “Do you know SQL, Python, NLP, etc.?” End of call. Your first round is scheduled. Yay!


This is the wrong way to approach the screening call. The screening call is also your chance to pitch your case for a higher-level role. Build a relationship during this short call. Demonstrate effective communication skills and come across as someone people would want to work with and for. I’ve used this mindset to interview up a level or two. That is much better than going through the interview loop, hoping you’ve done a great job, and hoping someone decides you belong a level above the one you applied for. Needless to say, the #1 way to get the pay you want is to interview at the level where that pay is the average, rather than interviewing a level down and hoping to negotiate your way to top pay. Reach out to the insider crew and ask for coaching on this specifically. We’ve done this many times, to good effect.



Part 2: Programming Rounds


Every interview loop I’ve gone through had a SQL round, and 20% of them also had a Python round. Here’s what interviewers are looking for, which translates directly into the steps you should follow during the coding round (the Coding Step Framework):

  1. Take a moment to understand the problem instead of jumping straight into code

  2. Communicate up front, ask clarifying questions and get on the same page as the interviewer.

  3. Think out loud and communicate possible approaches and limitations

  4. Write out pseudocode

  5. Write clean and efficient code

TL;DR: Make it a collaborative round. You and the interviewer are working together to solve the problem.


How to ace SQL


Any SQL round can be crushed if you know all of the following concepts: 


  1. How to use ROW_NUMBER / RANK / DENSE_RANK

  2. How to use LEAD / LAG

  3. How to use window-function sums, e.g. SUM(x) OVER ()

  4. How to write SUM(CASE WHEN … THEN … ELSE … END) and COUNT(CASE WHEN … END) statements

  5. How to write subqueries / CTEs

  6. How to join two tables
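To practice most of these concepts in one place, here is a minimal sketch that runs against an in-memory SQLite database through Python’s built-in sqlite3 module. The `sales` and `countries` tables and their data are hypothetical, made up for practice. SQLite needs version 3.25 or later for window functions; other engines use broadly similar syntax, but check your interview’s dialect.

```python
import sqlite3

# Hypothetical practice schema: a sales fact table and a countries lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, country_id INTEGER,
                    sale_date TEXT, amount REAL);
INSERT INTO countries VALUES (1, 'US'), (2, 'CA');
INSERT INTO sales VALUES
  (1, 1, '2024-01-01', 100), (2, 1, '2024-01-02', 100),
  (3, 1, '2024-01-03', 50),  (4, 2, '2024-01-01', 80),
  (5, 2, '2024-01-05', 20);
""")

# Concepts 1, 2, 3, 5 and 6: ranking, LAG, window sums, a CTE and a join.
rows = conn.execute("""
WITH country_sales AS (              -- CTE: join the two tables once
  SELECT c.name AS country, s.sale_date, s.amount
  FROM sales s
  JOIN countries c ON c.country_id = s.country_id
)
SELECT
  country,
  sale_date,
  amount,
  ROW_NUMBER() OVER (PARTITION BY country ORDER BY amount DESC) AS row_num,
  RANK()       OVER (PARTITION BY country ORDER BY amount DESC) AS rnk,
  DENSE_RANK() OVER (PARTITION BY country ORDER BY amount DESC) AS dense_rnk,
  LAG(amount)  OVER (PARTITION BY country ORDER BY sale_date)   AS prev_amount,
  SUM(amount)  OVER ()                                          AS grand_total,
  SUM(amount)  OVER (PARTITION BY country)                      AS country_total
FROM country_sales
ORDER BY country, sale_date
""").fetchall()

# Concept 4: conditional aggregation with SUM(CASE ...) and COUNT(CASE ...).
summary = conn.execute("""
SELECT
  SUM(CASE WHEN amount >= 80 THEN amount ELSE 0 END) AS big_sale_revenue,
  COUNT(CASE WHEN amount >= 80 THEN 1 END)           AS big_sale_count,
  COUNT(*)                                           AS total_sales
FROM sales
""").fetchone()

for row in rows:
    print(row)
print(summary)
```

On the two tied US sales of 100, ROW_NUMBER hands out 1 and 2 in no guaranteed order, RANK gives 1, 1, 3, and DENSE_RANK gives 1, 1, 2. Being able to explain that difference is a common interview probe.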


This amazing blog by Linda Chen has all the SQL questions you need to practice these concepts: https://towardsdatascience.com/sql-questions-summary-df90bfe4c9c


You will often be given two or more tables to work with and several questions to answer. I have encountered two types of questions so far:

  • Type 1, Descriptive: “Write a query that gives me sales by country.”

  • Type 2, Ambiguous: “What percentage of users are repeat offenders?” or “Do we have a repeat offender problem on the platform?” You’ll have to collaborate with the interviewer to define what the output should be, and then write the query for it.
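Here is a minimal sketch of how a Type 2 question might end up once the definitions are agreed. Everything in it is an assumption for illustration: the `users` and `violations` tables are hypothetical, and “repeat offender” is defined as a user with at least `min_violations` (default 2) violations. The query returns two percentages because the denominator (all users vs. users with at least one violation) is exactly the kind of thing to settle with the interviewer before writing code.

```python
import sqlite3

# Hypothetical schema: one row per user, one row per violation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE violations (violation_id INTEGER PRIMARY KEY, user_id INTEGER,
                         violation_date TEXT);
INSERT INTO users VALUES (1), (2), (3), (4), (5);
INSERT INTO violations (user_id, violation_date) VALUES
  (1, '2024-01-01'), (1, '2024-02-01'), (1, '2024-03-01'),
  (2, '2024-01-15'),
  (3, '2024-02-10'), (3, '2024-02-20');
""")


def repeat_offender_share(conn, min_violations=2):
    """Return (% of all users, % of offending users) with >= min_violations."""
    query = """
    WITH per_user AS (                 -- LEFT JOIN keeps users with 0 violations
      SELECT u.user_id, COUNT(v.violation_id) AS n_violations
      FROM users u
      LEFT JOIN violations v ON v.user_id = u.user_id
      GROUP BY u.user_id
    )
    SELECT
      100.0 * SUM(CASE WHEN n_violations >= :k THEN 1 ELSE 0 END) / COUNT(*),
      100.0 * SUM(CASE WHEN n_violations >= :k THEN 1 ELSE 0 END)
            / NULLIF(SUM(CASE WHEN n_violations >= 1 THEN 1 ELSE 0 END), 0)
    FROM per_user
    """
    return conn.execute(query, {"k": min_violations}).fetchone()


pct_of_all_users, pct_of_offenders = repeat_offender_share(conn)
print(pct_of_all_users, pct_of_offenders)
```

With this toy data, two of five users are repeat offenders (40% of all users), but that is two of the three users with any violation (about 67% of offenders). Same data, very different story, which is why the definition step matters.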


Whether the question is Type 1 or Type 2, remember to apply the Coding Step Framework. The only difference: if you get only Type 1 questions, the round is all about “speed SQL”, and you can skip the pseudocode step to save time.


Beyond knowing the concepts and the framework, my biggest confidence booster was anticipating the questions that would be asked. I’m not asking you to go Dr. Strange here. I am, however, asking you to think about the company, its core product, the data it has, and the problems it might be dealing with. For example, say you have a LinkedIn interview coming up. LinkedIn’s product connects professionals with other professionals. Every user has a profile listing the companies they worked at and when they worked there.

  • Right off the bat, that means they have a users table with user attributes: user_id, name, etc.

  • They might have an employers table: employer_id, company name, etc.

  • They might have a user-employments table: user_id, employer_id, start date, end date

  • You might encounter questions like:

    • What percentage of people who work at Google end up going to Facebook?
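Working through the anticipated question might look like the sketch below. LinkedIn’s real schema is unknown, so the `employers` and `user_employments` tables and their rows are hypothetical. It also assumes one interpretation you would confirm with the interviewer: “end up going to Facebook” means Facebook is the very next employer after a Google stint (the users table isn’t needed for this). LEAD over each user’s history, ordered by start date, finds that next employer.

```python
import sqlite3

# Hypothetical schema modeled on the tables anticipated above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employers (employer_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE user_employments (user_id INTEGER, employer_id INTEGER,
                               start_date TEXT, end_date TEXT);
INSERT INTO employers VALUES (1, 'Google'), (2, 'Facebook'), (3, 'Amazon');
INSERT INTO user_employments VALUES
  (1, 1, '2015-01-01', '2018-01-01'), (1, 2, '2018-02-01', NULL),
  (2, 1, '2016-01-01', '2019-01-01'), (2, 3, '2019-02-01', '2021-01-01'),
  (2, 2, '2021-02-01', NULL),
  (3, 1, '2017-01-01', NULL),
  (4, 3, '2014-01-01', '2016-01-01'), (4, 2, '2016-02-01', NULL),
  (5, 2, '2012-01-01', '2015-01-01'), (5, 1, '2015-02-01', '2019-01-01'),
  (5, 2, '2019-02-01', NULL);
""")

query = """
WITH stints AS (
  -- For every job, look up the company of the same user's next job.
  SELECT ue.user_id,
         e.company_name,
         LEAD(e.company_name) OVER (
           PARTITION BY ue.user_id ORDER BY ue.start_date
         ) AS next_company
  FROM user_employments ue
  JOIN employers e ON e.employer_id = ue.employer_id
)
-- Among users with a Google stint, the share whose next employer was Facebook.
SELECT 100.0 * COUNT(DISTINCT CASE WHEN next_company = 'Facebook' THEN user_id END)
             / COUNT(DISTINCT user_id) AS pct_google_to_facebook
FROM stints
WHERE company_name = 'Google'
"""
pct = conn.execute(query).fetchone()[0]
print(pct)
```

Users 1 and 5 go straight from Google to Facebook, user 2 detours through Amazon, and user 3 is still at Google, so the query returns 50.0. Under an “eventually” definition, user 2 would count too; raising that ambiguity out loud is part of what the interviewer wants to see.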


Play both interviewer and interviewee, and try to solve these questions by applying the Coding Step Framework. This anticipation exercise helped me think about what kinds of tables I might get during the rounds, and I was right on the money a few times. Even when you aren’t right, you’ll walk into the interview with a whole new level of confidence.


Summary on SQL: know the concepts, apply the framework, and anticipate the tables you’ll be given.



How to ace Python


Any Python round can be crushed if you know all of the following concepts: 


  1. How to use Python classes

  2. How to use arrays and dictionaries

  3. How to do simple pandas DataFrame manipulations

  4. How to use sklearn


Here are three types of Python rounds you’re likely to encounter in the wild:


  • Type 1, LeetCode-style problems: given a 2-D array, can you write code to transpose it? (arrays, dictionaries, classes)

  • Type 2, Case-study questions: given some data in a CSV, perform some exploration to answer a question. (pandas and data manipulation)

  • Type 3, Code review: “Here’s code we’re working on. Can you tell us what it does and what’s wrong with it?”
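For the Type 1 example, here is a minimal sketch of a transpose on a plain Python list of lists. Clarifying questions from the Coding Step Framework show up as code: what to return for an empty input, and what to do with ragged rows (this sketch assumes the answer is “raise an error”). The zip-based one-liner is a common idiomatic alternative worth mentioning once the explicit version works.

```python
from typing import List, TypeVar

T = TypeVar("T")


def transpose(matrix: List[List[T]]) -> List[List[T]]:
    """Return the transpose of a rectangular 2-D list."""
    if not matrix:
        return []  # assumption: an empty input transposes to an empty list
    n_rows, n_cols = len(matrix), len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("matrix must be rectangular")
    # Element (i, j) of the input becomes element (j, i) of the output.
    return [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]


def transpose_zip(matrix: List[List[T]]) -> List[List[T]]:
    """Idiomatic variant: zip(*rows) yields columns (silently truncates ragged rows)."""
    return [list(col) for col in zip(*matrix)]


print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

Both run in O(rows × columns) time. Note the trade-off: `transpose_zip` quietly drops extra elements from longer rows, which is exactly the kind of limitation to call out while thinking out loud.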


Recall that the Coding Step Framework applies to Python rounds too. In Python rounds you’re often allowed to search Google for syntax or functions. I highly recommend not doing that. You should be able to write basic Python and use pandas without searching Stack Overflow for every other line; using it as a crutch for every line of code is a red flag. I recommend working through the LeetCode array and dictionary modules. I also recommend memorizing the basic pandas functions (merge, fillna, groupby with aggregation, filtering on a specific column value, describe, etc.).
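As a quick self-check for memorization, the sketch below touches each of those pandas functions once. It assumes pandas is installed; the `users` and `orders` frames are made-up data, and treating a missing amount as 0 is an assumption you would confirm with the interviewer rather than a default to apply blindly.

```python
import pandas as pd

# Hypothetical toy data: every user, plus their orders (one amount is missing).
users = pd.DataFrame({"user_id": [1, 2, 3, 4], "country": ["US", "US", "CA", "CA"]})
orders = pd.DataFrame({"user_id": [1, 1, 2, 3], "amount": [20.0, None, 15.0, 40.0]})

# merge: a left join keeps user 4, who has no orders (amount becomes NaN).
df = users.merge(orders, on="user_id", how="left")

# fillna: treat missing amounts as 0 (an assumption to confirm out loud).
df["amount"] = df["amount"].fillna(0)

# groupby + aggregate: named aggregation gives readable column names.
by_country = df.groupby("country").agg(total=("amount", "sum"), avg=("amount", "mean"))

# filter for a specific column value.
us_rows = df[df["country"] == "US"]

# describe: quick summary statistics (count, mean, std, min, quartiles, max).
stats = df["amount"].describe()

print(by_country)
print(us_rows)
print(stats)
```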



Part 3: Case Study Rounds


The common theme across all these rounds (HR, coding, case study, behavioral) is your ability to communicate. Here’s the framework for tackling ANY case study question:



  1. Take a moment to explain how you plan to structure your approach to this problem and to the round

  2. The goals section: ask clarifying questions to uncover the intent of the problem, then list your users, stakeholders, hypotheses, and metrics

    1. Users: who is impacted by this particular problem

    2. Hypothesis: a statement that formalizes your intent and how each user is impacted

    3. Metrics: define primary success metrics for the business and the users, plus guardrail metrics

  3. The methods section: talk about your approach to the problem. Is it an A/B test (80% of case studies), an optimization or regression problem, or a simple exploratory or segment analysis?

    1. If it’s an A/B test: talk about the null and alternative hypotheses, randomization, sample size calculation, effect size, Type I and Type II errors, test duration, and network effects, if any.

  4. Business strategy and evaluation:

    1. Talk about how you validate your experimental results

    2. Talk about rollout strategies, next steps, and limitations

    3. Talk about what you’d do if you observe no effect or the experiment is unsuccessful
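If the case is an A/B test, it helps to be able to back up “sample size calculation”, “effect size”, and “Type I and Type II errors” with actual arithmetic. Below is a minimal sketch of one common textbook approach for a conversion-rate test: a normal-approximation sample size per arm, and a pooled two-proportion z-test for validating the result. It uses only the standard library (`statistics.NormalDist`, Python 3.8+); the baseline and target rates are illustrative numbers, not data from any real experiment.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.80):
    """Approximate users per arm for a two-sided test of two proportions."""
    effect = p_target - p_baseline  # minimum detectable effect (absolute)
    if effect == 0:
        raise ValueError("p_target must differ from p_baseline")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # alpha = Type I error rate (false positive)
    z_beta = z.inv_cdf(power)  # power = 1 - Type II error rate (false negative)
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z_stat = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


# Detecting a lift from 10% to 12% conversion at alpha = 0.05 and 80% power.
print(sample_size_per_group(0.10, 0.12))  # 3839
print(two_proportion_z_test(400, 4000, 480, 4000))
```

Dividing the per-arm sample size by expected daily traffic per arm gives a first estimate of test duration. Smaller effects blow up the required sample quadratically, which is a useful point to make when discussing what to do if an experiment shows no result.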


To fully grok this framework, here’s a case study mock interview that you MUST watch: https://www.youtube.com/watch?v=TG6d7CQACyI

We also recommend a whiteboard tech setup that lets you walk through and demonstrate each step during the interview. This comes in handy during a virtual case study round, and your setup and preparedness alone will set you apart from the crowd. Reach out to us for mock case study interviews and help with your tech setup.


Part 4: Behavioral Rounds


The behavioral rounds are where you prove you are right for the organization and, more importantly, right for the level. The types of questions, and the responses expected, vary with the level you’re interviewing for. In general, though, you can expect the following themes:


  1. Talk about failures: how you failed, why, and how you handled it

  2. Talk about conflicts and how you managed them: when you didn’t get along with someone or there was friction in the team

  3. Talk about how you collaborate with others (stakeholders from across the org)

  4. Talk about when you disagreed with a direction: why, how, and what you did to proceed


The mantra for this round: be the person you’d want to work with. People are looking for good, low-drama, self-aware, competent human beings to work alongside. Your ability to tell stories about how you pulled people together, your listening and communication skills, your ability to disagree with reasoning and then commit to a cause without taking it personally, and your ability to resolve conflicts between people are all crucial here. When you share these experiences, remember that the interviewer knows nothing about your work, your previous org, or any other context. Give them the high-level context they need to follow along before you dive deep. Go in prepared, and don’t take this round lightly: this is where you make the big bucks. Reach out to the insider crew for mock interviews specific to the level and company you’re interviewing at.


