Senior Data Engineer Interview Questions


Senior Data Engineer interview questions shared by candidates

Top Interview Questions

Senior Data Scientist was asked... 21 October 2014

How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections?

4 Answers

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below 0.5), the items are not being answered consistently with one another, which suggests that the questions were answered at random.

I would design the test so that certain information is asked in two different ways. If the two answers disagree with each other, I would seriously doubt the validity of the answers.

We need to build a histogram for each question in the survey to see the distribution of answers to that question. The histograms will likely follow a normal distribution if the selections are truthful. If more than half of one respondent's answers lie outside the 95% confidence interval of the corresponding histograms, that response will be categorized as random, i.e. falling outside the mean plus tw...

How would you build and test a metric to compare two users' ranked lists of movie/TV show preferences?

4 Answers

1) Develop a list of shows/movies that are representative of different taste categories (more on this below).
2) Obtain rankings of the items in the list from the 2 users.
3) Use Spearman's rho (or another statistic that works with rankings) to assess dependence/congruence between the 2 people's rankings.
* To find shows/movies to include in the measurement instrument, maybe do a cluster analysis on a large number of viewers' viewing habits.
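Step 3 can be sketched with the textbook formula for Spearman's rho on two rankings without ties, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of the same title. The function name and the sample titles below are hypothetical, and the sketch assumes both users ranked exactly the same set of titles.

```python
def spearman_rho(rank_a, rank_b):
    """rank_a, rank_b: dicts mapping title -> rank (1 = favourite), no ties."""
    titles = list(rank_a)
    n = len(titles)
    # Sum of squared rank differences over the shared titles.
    d2 = sum((rank_a[t] - rank_b[t]) ** 2 for t in titles)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of four titles by two users.
user_1 = {"Drama A": 1, "Comedy B": 2, "Thriller C": 3, "Documentary D": 4}
user_2 = {"Drama A": 4, "Comedy B": 3, "Thriller C": 2, "Documentary D": 1}

print(spearman_rho(user_1, user_1))  # identical rankings give 1.0
print(spearman_rho(user_1, user_2))  # fully reversed rankings give -1.0
```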

Look at the mean average precision of the movies that the users watch out of the rankings. So if, out of 10 recommended movies, one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better.

It's essential to demonstrate that you can really go deep... there are plenty of follow-up questions and (sometimes tangential) angles to explore. There are a lot of Senior Data Scientist experts who've worked at Netflix and who provide this sort of practice through mock interviews. There's a whole list of them curated on Prepfully.

1. Given the sample:

id, status
1, active
2, active
3, active
4, pending
5, expired
6, expired
7, expired
8, pending

Pull the unique statuses that show up consecutively 3 times, e.g. from the sample, the output would be 'active', 'expired'.

2. Given the sample:

employee, in_out, time
A, IN, 6:00
B, IN, 7:00
A, OUT, 8:00
C, IN, 9:30
A, IN, 9:00
A, OUT, 10:00
B, OUT, 11:00
C, OUT, 10:00

Determine which employees are in the building at 10:30.

4 Answers

I was perturbed since I thought this was going to be a behavioral interview. I could not answer.

select distinct status from (select *, case when status = lead(status,1) over(order by id) and lead(status,1) over(order by id) = lead(status,2) over(order by id) then 1 else 0 end as consecutive from tab) t where consecutive = 1
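The same idea (find runs of at least three equal statuses when rows are ordered by id) can be cross-checked outside SQL. The following is a small sketch in Python; the function name `statuses_with_run` and the tuple layout are assumptions made for illustration.

```python
from itertools import groupby

def statuses_with_run(rows, run_length=3):
    """rows: iterable of (id, status). Returns statuses with a run >= run_length."""
    ordered = [status for _, status in sorted(rows)]   # order by id
    result = []
    for status, group in groupby(ordered):             # consecutive equal statuses
        if len(list(group)) >= run_length and status not in result:
            result.append(status)
    return result

sample = [(1, "active"), (2, "active"), (3, "active"), (4, "pending"),
          (5, "expired"), (6, "expired"), (7, "expired"), (8, "pending")]
print(statuses_with_run(sample))  # ['active', 'expired']
```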

with cte as ( select *, dense_rank() over(partition by employee order by time) as rnk from tab ) select distinct a.employee from cte as a, cte as b where a.employee = b.employee and a.in_out = 'IN' and b.in_out = 'OUT' and a.rnk = b.rnk - 1 and a.time <= '10:30' and b.time > '10:30' -- assumes time is stored as a TIME value, not as text
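An alternative way to think about the second question is "whose most recent event at or before 10:30 is an IN?". The sketch below applies that rule to the sample data in Python; the function name `in_building` and the 'H:MM' string format are assumptions, and unlike the pairing query it also covers an employee who has entered but has no later OUT row.

```python
def in_building(events, at):
    """events: list of (employee, 'IN' or 'OUT', 'H:MM'); at: 'H:MM'."""
    def minutes(t):
        hours, mins = t.split(":")
        return int(hours) * 60 + int(mins)

    cutoff = minutes(at)
    last_event = {}
    # Walk events in time order and remember each employee's latest
    # event that happened at or before the cutoff.
    for employee, direction, t in sorted(events, key=lambda e: minutes(e[2])):
        if minutes(t) <= cutoff:
            last_event[employee] = direction
    return sorted(e for e, d in last_event.items() if d == "IN")

events = [("A", "IN", "6:00"), ("B", "IN", "7:00"), ("A", "OUT", "8:00"),
          ("C", "IN", "9:30"), ("A", "IN", "9:00"), ("A", "OUT", "10:00"),
          ("B", "OUT", "11:00"), ("C", "OUT", "10:00")]
print(in_building(events, "10:30"))  # ['B']
```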

Given a list, create a new list that does not include the duplicates of the original list.

3 Answers

a = old list, b = new list. Code: b = list(set(a)). Note that a set does not preserve the original order of the elements.
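If the order of first appearance matters, one common variant is to go through a dict instead of a set, since dicts keep insertion order in Python 3.7+. This is a minimal sketch; the function name `dedupe` is hypothetical, and like the set approach it assumes the elements are hashable.

```python
def dedupe(items):
    """Return a new list with duplicates removed, keeping first occurrences."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(items))

print(dedupe([3, 1, 3, 2, 1]))  # [3, 1, 2]
```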

Maybe they were asking to do it in-place. In that case, move the duplicate elements to the end.

Python, 4 lines of code.


Find the percentage of the customer base that is female.

3 Answers

Wrote the SQL query to answer this question

Do you have any details on Python questions?

You need demographics data for this. Query would be fairly simple


If you can build a perfect (100% accuracy) classification model to predict some customer behavior, what will be the problem in application?

3 Answers

Distribution shift. You can never guarantee your train or test distribution covers future observations.

Then we have a deterministic problem, so what is the point of building a model at all?

Boston Consulting Group

Technical case interview which is a mix of modelling skills + classical case interview structure

3 Answers

Hi there, Thank you for sharing your experience. Just a quick question - do you remember how long you waited till you heard back after the business case interview stage? Thanks!

Hi there, Sorry you had a bad experience with this interviewer - do you mind giving us the first name of this interviewer? Or at least first and last initials? I'll be sure to contact this employee and point them to training resources at BCG. Thanks.

Wow, sorry to hear that. Of all of Gamma’s shortcomings, lack of common courtesy/EQ would not be on my radar’s radar.


Imagine you have N pieces of rope in a bucket. You reach in and grab one end-piece, then reach in and grab another end-piece, and tie those two together. What is the expected value of the number of loops in the bucket?

3 Answers

Do the question and answer make sense? I thought the answer is 1/(2N-1). I don't understand why the solution adds the probabilities for all cases from 1 to N together. For the 2-rope case, p(1 loop) = 1/3, so the expected number of loops is also 1/3. Why is the answer 1 + 1/3 = 4/3? Am I missing something?

You are right, the long answer fails a simple boundary condition: if you tie only once after picking two ends, the maximum number of loops is one! So p(n) is in [0, 1], lol.

I got the correct answer, but the mathematician yelled at me for arriving too slowly at such an "easy" answer.
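The disagreement in the answers above seems to come from how the question is read. With a single tie, the chance of closing a loop is 1/(2N-1). If, as in the usual version of this puzzle, the tying is repeated until no free ends remain (an assumption, since the question as quoted describes only one tie), the expected number of loops is the sum of 1/(2k-1) for k = 1 to N, which gives 4/3 for N = 2. The sketch below compares that sum with a direct simulation; all function names are hypothetical.

```python
import random
from fractions import Fraction

def expected_loops(n):
    """Exact expectation when tying continues until no free ends remain."""
    return sum(Fraction(1, 2 * k - 1) for k in range(1, n + 1))

def simulate_once(n, rng):
    """Tie random pairs of free ends until none are left; count closed loops."""
    ends = [strand for strand in range(n) for _ in range(2)]  # two ends per strand
    loops = 0
    while ends:
        i, j = rng.sample(range(len(ends)), 2)   # pick two distinct free ends
        a, b = ends[i], ends[j]
        for idx in sorted((i, j), reverse=True): # remove the two tied ends
            ends.pop(idx)
        if a == b:
            loops += 1                           # both ends of one strand: a loop
        else:
            ends[ends.index(b)] = a              # merge strand b into strand a
    return loops

def simulate_mean(n, trials=20000, seed=0):
    rng = random.Random(seed)
    return sum(simulate_once(n, rng) for _ in range(trials)) / trials

print(expected_loops(2))   # 4/3
print(simulate_mean(2))    # should be close to 1.333
```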


If you take 3 consecutive numbers (n, n+1, n+2) and know that n and n+2 are prime numbers, can you prove that n+1 is always divisible by 6?

3 Answers

3, 4, 5. 3 and 5 are prime. 4 is not divisible by 6.

For n > 3: n+1 will be divisible by 2, since n and n+2 are prime and therefore odd. One of n, n+1 and n+2 must be divisible by 3; n and n+2 are prime, so it has to be n+1. Hence n+1 is divisible by 6.

1. If n and n+2 are prime (and n > 3), then n+1 is divisible by 2. One of the three consecutive numbers (n, n+1, n+2) must be divisible by 3. Because n and n+2 are prime, n+1 is the one divisible by 3. So n+1 is divisible by 6.
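The answers above can be reconciled with a quick brute-force check: 3, 4, 5 is a counterexample, and the divisibility argument holds once n is greater than 3. The sketch below searches for counterexamples below an arbitrary limit; the function names and the limit of 10,000 are assumptions chosen for illustration, and a finite search is of course not a proof.

```python
def is_prime(n):
    """Trial division; adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def counterexamples(limit):
    """All n < limit with n and n+2 prime but n+1 not divisible by 6."""
    return [n for n in range(2, limit)
            if is_prime(n) and is_prime(n + 2) and (n + 1) % 6 != 0]

print(counterexamples(10_000))  # [3]
```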


Describe the metrics one would use to evaluate a binary classifier.

2 Answers

Precision, Recall, F-score, Accuracy, ROC
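Most of the metrics listed above follow directly from the four confusion-matrix counts. The sketch below computes precision, recall, F1 and accuracy from 0/1 labels using only the standard library; the function name and the sample labels are hypothetical. ROC analysis is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """y_true, y_pred: equal-length sequences of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```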

Through questions like this, interviewers are mostly trying to test your skillset (and its relevance to the role) as robustly as possible, so be prepared for multiple offshoots and follow-ups. It could be a useful exercise to do mocks with friends or colleagues at Bumble to get a real sense of what the interview is actually like. Alternatively, Prepfully has a ton of Bumble Senior Data Scientist experts who provide mock interviews for a pretty reasonable amount.
