# Senior Data Engineer Interview Questions

Senior Data Engineer interview questions shared by candidates

### How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections?

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below 0.5), it is very likely that the questions were answered at random.
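A minimal sketch of the Cronbach's alpha calculation mentioned above, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The response matrices are made-up illustration data, not from any real survey, and the 0.5 cutoff is the answer's own rule of thumb. Note that alpha summarizes the whole sample rather than flagging a single respondent.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a response matrix.

    responses: one row per respondent, one column per survey item.
    Assumes at least two items and a non-zero variance of total scores.
    """
    k = len(responses[0])                  # number of items
    item_columns = list(zip(*responses))   # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: respondents who answer related items consistently.
consistent = [
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 4, 3],
    [4, 4, 4, 5],
    [5, 5, 5, 5],
]

# Hypothetical data: answers that show no agreement across items.
inconsistent = [
    [1, 5, 2, 4],
    [5, 1, 4, 3],
    [2, 4, 5, 1],
    [4, 2, 1, 5],
    [3, 3, 3, 2],
]

alpha_consistent = cronbach_alpha(consistent)
alpha_inconsistent = cronbach_alpha(inconsistent)
print(alpha_consistent, alpha_inconsistent)
```

With data like this, the consistent matrix gives an alpha close to 1, while the inconsistent one falls far below the 0.5 cutoff (alpha can even go negative when items do not co-vary).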

I would design the test so that certain information is asked in two different ways. If the two answers disagree with each other, I would seriously doubt the validity of the answers.

We need to plot a histogram for each question in the survey to see the distribution of answers per question. If the selections are truthful, the histograms will likely follow a normal distribution. If more than half of one respondent's answers lie outside the 95% confidence interval of the corresponding histograms, the response will be categorized as random fall out of mean plus tw…

### How would you build and test a metric to compare two user's ranked lists of movie/tv show preferences?

1) Develop a list of shows/movies that are representative of different taste categories (more on this later).
2) Obtain a ranking of the items in the list from the 2 users.
3) Use Spearman's rho (or another statistic that works with rankings) to assess the dependence/congruence between the 2 people's rankings.

To find shows/movies to include in the measurement instrument, maybe do a cluster analysis on a large number of viewers' viewing habits.
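The Spearman's rho step can be sketched as follows, assuming both users rank the same items with no ties, so the shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies. The titles and ranks are hypothetical placeholders.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho for two rankings of the same items (no ties assumed).

    ranks_a, ranks_b: dicts mapping item -> rank (1 = most preferred).
    """
    if set(ranks_a) != set(ranks_b):
        raise ValueError("both users must rank the same items")
    n = len(ranks_a)
    d_squared = sum((ranks_a[item] - ranks_b[item]) ** 2 for item in ranks_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five placeholder titles.
user_a = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
user_b = {"A": 2, "B": 1, "C": 3, "D": 5, "E": 4}
user_c = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}  # exact reverse of user_a

rho_ab = spearman_rho(user_a, user_b)
rho_ac = spearman_rho(user_a, user_c)
print(rho_ab, rho_ac)  # 0.8 -1.0
```

A value near 1 means the two users order the items almost identically, a value near -1 means their tastes are reversed, and a value near 0 means the rankings are unrelated.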

Look at the mean average precision of the movies that the users watch out of the rankings. So if, out of 10 recommended movies, one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better. InterviewQuery.com has a more in-depth answer.

### 1. Given the first sample below, pull the unique statuses that show up consecutively 3 times, e.g. from the sample, the output would be 'active', 'expired'. 2. Given the second sample below, determine which employees are in the building at 10:30.

Sample for question 1:

| id | status |
| --- | --- |
| 1 | active |
| 2 | active |
| 3 | active |
| 4 | pending |
| 5 | expired |
| 6 | expired |
| 7 | expired |
| 8 | pending |

Sample for question 2:

| employee | in_out | time |
| --- | --- | --- |
| A | IN | 6:00 |
| B | IN | 7:00 |
| A | OUT | 8:00 |
| C | IN | 9:30 |
| A | IN | 9:00 |
| A | OUT | 10:00 |
| B | OUT | 11:00 |
| C | OUT | 10:00 |

I was perturbed since I thought this was going to be a Behavioral Interview. I could not answer.

select distinct status from (select *, case when status = lead(status, 1) over (order by id) and status = lead(status, 2) over (order by id) then 1 else 0 end as consecutive from tab) t where consecutive = 1
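The LEAD-based idea in the query above can be checked against the sample data with Python's built-in `sqlite3` module. This is a sketch that assumes a SQLite build with window-function support (3.25 or newer); the table name `tab` follows the answer, and the column aliases `next1` and `next2` are made up here for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO tab (id, status) VALUES (?, ?)",
    [
        (1, "active"), (2, "active"), (3, "active"), (4, "pending"),
        (5, "expired"), (6, "expired"), (7, "expired"), (8, "pending"),
    ],
)

# For each row, look at the next two statuses in id order and keep the
# status when all three match.
query = """
SELECT DISTINCT status
FROM (
    SELECT status,
           LEAD(status, 1) OVER (ORDER BY id) AS next1,
           LEAD(status, 2) OVER (ORDER BY id) AS next2
    FROM tab
) AS t
WHERE status = next1 AND status = next2
ORDER BY status
"""

consecutive_statuses = [row[0] for row in conn.execute(query)]
print(consecutive_statuses)  # ['active', 'expired']
```

Rows near the end of the table get NULL from `LEAD`, so they never satisfy the equality checks and need no special handling.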

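with cte as (select *, dense_rank() over (partition by employee order by time) as rnk from tab) select distinct a.employee from cte as a join cte as b on a.employee = b.employee and b.rnk = a.rnk + 1 where a.in_out = 'IN' and b.in_out = 'OUT' and a.time <= '10:30' and b.time > '10:30' (this assumes time is stored as a time type, so the comparisons are chronological, and that every IN has a matching OUT)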
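A different way to answer the second question is to look only at each employee's latest event at or before the target time: whoever last badged IN is still in the building. The sketch below runs that query with Python's `sqlite3` module (SQLite 3.25 or newer for window functions). The table name `badge_log` is hypothetical, and times are stored as zero-padded `HH:MM` text so that string comparison matches chronological order; that storage choice is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE badge_log (employee TEXT, in_out TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO badge_log (employee, in_out, time) VALUES (?, ?, ?)",
    [
        ("A", "IN", "06:00"), ("B", "IN", "07:00"), ("A", "OUT", "08:00"),
        ("C", "IN", "09:30"), ("A", "IN", "09:00"), ("A", "OUT", "10:00"),
        ("B", "OUT", "11:00"), ("C", "OUT", "10:00"),
    ],
)

# Rank each employee's events up to the target time, newest first, and
# keep employees whose most recent event is an IN.
query = """
WITH latest AS (
    SELECT employee,
           in_out,
           ROW_NUMBER() OVER (PARTITION BY employee ORDER BY time DESC) AS rn
    FROM badge_log
    WHERE time <= :at
)
SELECT employee
FROM latest
WHERE rn = 1 AND in_out = 'IN'
ORDER BY employee
"""


def employees_inside(at):
    """Return employees whose latest event at or before `at` is an IN."""
    return [row[0] for row in conn.execute(query, {"at": at})]


print(employees_inside("10:30"))  # ['B']
```

Unlike the self-join on IN/OUT pairs, this version also counts an employee who has badged in and has no later OUT row yet.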

### The percentage of female customer base

Wrote the SQL query to answer this question.

Do you have any details on Python questions?

You need demographics data for this. The query would be fairly simple.
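As a rough sketch of what such a query might look like, assuming a hypothetical `customers` table with a `gender` column (the real schema was not shared), run here through Python's `sqlite3` module. Customers with an unknown gender stay in the denominator in this version; excluding them is an equally defensible choice worth raising with the interviewer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, for illustration only.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, gender TEXT)")
conn.executemany(
    "INSERT INTO customers (customer_id, gender) VALUES (?, ?)",
    [(1, "F"), (2, "M"), (3, "F"), (4, "M"),
     (5, "F"), (6, None), (7, "M"), (8, "F")],
)

# Multiply by 100.0 so the division is done in floating point.
query = """
SELECT 100.0 * SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) / COUNT(*)
FROM customers
"""

female_pct = conn.execute(query).fetchone()[0]
print(female_pct)  # 50.0
```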

### If you can build a perfect (100% accuracy) classification model to predict some customer behavior, what will be the problem in application?

### Technical case interview which is a mix of modelling skills + classical case interview structure

Hi there, Thank you for sharing your experience. Just a quick question - do you remember how long you waited till you heard back after the business case interview stage? Thanks!

Hi there, Sorry you had a bad experience with this interviewer - do you mind giving us the first name of this interviewer? Or at least first and last initials? I'll be sure to contact this employee and point them to training resources at BCG. Thanks.

Wow, sorry to hear that. Of all of Gamma’s shortcomings, lack of common courtesy/EQ would not be on my radar’s radar.

### Imagine you have N pieces of rope in a bucket. You reach in and grab one end-piece, then reach in and grab another end-piece, and tie those two together. What is the expected value of the number of loops in the bucket?

Do the question and answer make sense? I thought the answer is 1/(2n-1). I don't understand why the solution adds the probabilities for all cases from 1 to N together. For the 2-rope case, p(1 loop) = 1/3, so the expected number of loops is also 1/3; why is the answer 1 + 1/3 = 4/3? Am I missing something?

You are right, the long answer fails a simple boundary condition: if you tie only once after picking two ends, the max number of loops is one! So p(n) is in [0, 1], lol.

I got the correct answer, but the mathematician yelled at me for arriving too slowly at such an "easy" answer.
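The disagreement in the comments comes down to how the question is read. A single tie with n ropes in the bucket closes a loop with probability 1/(2n-1), because after picking one end only one of the remaining 2n-1 ends belongs to the same rope. If the tying is repeated until no free ends remain (the usual reading of this puzzle, and an assumption here since the question as posted does not say so), each tie leaves one fewer open strand and the expectations add up to 1/(2N-1) + ... + 1/3 + 1. A small sketch of both the exact sum and a simulation:

```python
import random
from fractions import Fraction


def expected_loops(n):
    """Exact expected loop count when tying continues until no ends remain."""
    return sum(Fraction(1, 2 * k - 1) for k in range(1, n + 1))


def simulate_loops(n, rng):
    """One run of the repeated-tying process; returns the number of loops."""
    loops = 0
    strands = n  # open strands, each with two free ends
    while strands > 0:
        # The first end is arbitrary; exactly one of the other
        # 2 * strands - 1 free ends belongs to the same strand.
        if rng.randrange(2 * strands - 1) == 0:
            loops += 1  # strand closed onto itself
        # Either way (closed loop or two strands merged), one fewer open strand.
        strands -= 1
    return loops


rng = random.Random(0)
runs = 20_000
simulated_mean = sum(simulate_loops(2, rng) for _ in range(runs)) / runs

print(expected_loops(2))  # 4/3
print(simulated_mean)
```

For two ropes this gives 1/3 for the first tie plus 1 for the last tie (which always closes a loop), which is where 4/3 comes from; the 1/3 figure is the answer to the single-tie reading.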

### If you take 3 consecutive numbers (n, n+1, n+2) and know that n and n+2 are prime numbers, can you prove that n+1 is always divisible by 6?

3, 4, 5. 3 and 5 are prime. 4 is not divisible by 6.

For n > 3: n+1 will be divisible by 2, since n and n+2 are prime and therefore odd. Now, exactly one of n, n+1, n+2 must be divisible by 3; n and n+2 are primes greater than 3, so n+1 must be divisible by 3. Hence proved (the only exception is n = 3).

1. If n and n+2 are prime (with n > 3), both are odd, so n+1 is divisible by 2. 2. One of the three consecutive numbers (n, n+1, n+2) must be divisible by 3; because n and n+2 are primes greater than 3, n+1 is divisible by 3. So n+1 is divisible by 6.
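The argument and the counterexample can both be checked by brute force. The sketch below uses simple trial division and an arbitrary search bound of 10,000; it lists every middle number n+1 between two primes that is not divisible by 6.

```python
def is_prime(m):
    """Trial-division primality test, fine for small m."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


# Middle numbers n+1 where both n and n+2 are prime, for n below 10,000.
twin_middles = [n + 1 for n in range(2, 10_000) if is_prime(n) and is_prime(n + 2)]

# Any middle number that is NOT divisible by 6.
exceptions = [m for m in twin_middles if m % 6 != 0]
print(exceptions)  # [4]
```

The only exception found is 4 (from the pair 3, 5), which matches the proof: the divisibility-by-3 step needs both primes to be greater than 3.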

### Describe the metrics one would use to evaluate a binary classifier.

Precision, Recall, F-score, Accuracy, ROC
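A minimal sketch of how these metrics are computed, in plain Python with made-up labels and scores. Precision, recall, F1 and accuracy come from the confusion-matrix counts at a fixed threshold (0.5 here, an arbitrary choice), while the area under the ROC curve is computed as the fraction of (positive, negative) pairs that the scores order correctly.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy from binary labels and predictions.

    Assumes at least one predicted positive and one actual positive.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true, scores):
    """ROC AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05, 0.45]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print(auc)
```

The threshold-based metrics change when the cutoff moves, whereas the AUC depends only on how the scores rank positives against negatives, which is why the two kinds of metric are usually reported together.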

