Data Scientist Interview Questions in United States | Glassdoor.ie

2,573 data scientist interview questions shared by candidates

Top Interview Questions


12 Sep 2013

25 Feb 2012
 Find the second largest element in a Binary Search Tree

16 Answers

Find the rightmost element. If this is a right node with no children, return its parent. If it is not, return the largest element of its left child.

One addition is the situation where the tree has no right branch (the root is the largest). In this special case, the largest node does not have a parent. So it is better to keep track of parent and current pointers: if they are different, the original method by the candidate works well; if they are the same (which means the root situation), find the largest element of its left branch.

    if (root == null || !root.hasRightChild()) {
        return null;
    } else {
        return findSecondGreatest(root, root.getValue());
    }

    value findSecondGreatest(Node curr, value oldValue) {
        if (curr.hasRightChild()) {
            return findSecondGreatest(curr.getRightChild(), curr.value);
        } else {
            return oldValue;
        }
    }

Above answer is wrong. It has to be something like this.

    public static Node findSecondLargest(Node node) {
        Node secondLargest = null;
        Node parent = null;
        Node child = node;
        if (node != null && (node.hasLeftChild() || node.hasRightChild())) {
            if (node.hasRightChild()) {
                while (child.hasRightChild()) {
                    parent = child;
                    child = child.rightChild();
                }
                secondLargest = parent;
            } else if (node.hasLeftChild()) {
                child = node.leftChild();
                while (child.hasRightChild()) {
                    child = child.rightChild();
                }
                secondLargest = child;
            }
        }
        return secondLargest;
    }

The above answer is also wrong;

    Node findSecondLargest(Node root) {
        // If tree is null or is a single node only, return null (no second largest)
        if (root == null || (root.left == null && root.right == null)) return null;
        Node parent = null, child = root;
        // find the rightmost child
        while (child.right != null) {
            parent = child;
            child = child.right;
        }
        // if the rightmost child has no left child, then its parent is second largest
        if (child.left == null) return parent;
        // otherwise, return left child's rightmost child as second largest
        child = child.left;
        while (child.right != null) child = child.right;
        return child;
    }

Soln by "mindpower" works. Thank you. I am trying to solve a similar problem: find the 2nd nearest higher value (in in-order traversal) for a given node. E.g. given nums 12 7 14 3, construct a BST. If the given value is 7, then we should return 14 (in the sort order: 3, 7, 12, 14). If the given value is 3, then we should return 12 (in the sort order: 3, 7, 12, 14).

Generic solution in C# for any k. Notice that this example can be easily changed to find the k-th smallest node by doing a depth-first recursion on root.Left first, and then a tail recursion on root.Right.

    public Node GetKthLargest(int k)
    {
        return GetKthLargest(ref k, this.Root);
    }

    Node GetKthLargest(ref int k, Node root)
    {
        if (root == null || k < 1) return null;
        var node = GetKthLargest(ref k, root.Right);
        if (node != null) return node;
        if (--k == 0) return root;
        return GetKthLargest(ref k, root.Left);
    }

Recursion is not needed.

    Node SecondLargest(Node root, Node secondLarge) {
        if (root.right == null) return root.left;
        Node secondLargest = root;
        // walk right until the next node is the rightmost one
        while (secondLargest.right.right != null) secondLargest = secondLargest.right;
        return secondLargest;
    }

    int getmax(node *root) {
        if (root->right == NULL) {
            return root->d;
        }
        return getmax(root->right);
    }

    int secondmax(node *root) {
        if (root == NULL) {
            return -1;
        }
        if (root->right == NULL && root->left != NULL) {
            return getmax(root->left);
        }
        if (root->right != NULL) {
            if (root->right->right == NULL && root->right->left == NULL) {
                return root->d;
            }
        }
        return secondmax(root->right);
    }

In-order traverse the tree. The second-to-last element in the array is the answer.

In Python:

    def find_second_largest_bst_element(root, parent=None):
        if parent is None:
            # BST root
            if root.right is None:
                # no right subtree
                if root.left is not None:
                    # if a left subtree exists...
                    return root.left
                else:
                    # root is the only element of the BST
                    return False
        else:
            if root.right is None:
                # right-most element
                if root.left is not None:
                    # left subtree exists
                    return root.left
                else:
                    # leaf
                    return parent
            else:
                # check right subtree
                return find_second_largest_bst_element(root.right, root)

    find_second_largest_bst_element(root)

For kth smallest, descend the left subtree first.

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value = value
            self.left = left
            self.right = right

    def findKthLargest(root, k):
        global count
        if root is None:
            return
        findKthLargest(root.right, k)
        count += 1
        if count == k:
            print(root.value)
            return
        findKthLargest(root.left, k)

    count = 0
    r = Node(10, Node(5, Node(2), Node(7)), Node(30, Node(22), Node(32)))
    findKthLargest(r, 3)

    // solution in java
    // main routine
    Node findSecondMax(Node root) {
        if (root == null || (root.left == null && root.right == null))
            return null;
        else {
            Node max = findMax(root);
            return (max.parent == null) ? findMax(max.left) : max.parent;
        }
    }

    // helper routine, recursive implementation.... can also be done non-recursively
    Node findMax(Node root) {
        return (root.right == null) ? root : findMax(root.right);
    }

Find the largest number in the binary tree and delete it. And again find the largest number. Short and fast.

Reverse in-order traversal of the BST, keeping a count of the number of visited nodes. This method works great to return the kth largest element in a BST.

mindpower's solution looks right
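The approach most answers converge on (walk to the rightmost node, then return either its parent or the largest node of its left subtree) might look like this in Python. This is only a sketch: the `Node` class and the name `second_largest` are assumptions made for illustration, and the function returns the value rather than the node.

```python
class Node:
    """Minimal BST node, assumed for this sketch."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def second_largest(root):
    """Return the second largest value in a BST, or None if there is none."""
    # An empty tree or a single node has no second largest element.
    if root is None or (root.left is None and root.right is None):
        return None

    # Walk to the rightmost node (the maximum), remembering its parent.
    parent, node = None, root
    while node.right is not None:
        parent, node = node, node.right

    # If the maximum has no left subtree, its parent is the second largest.
    if node.left is None:
        return parent.value

    # Otherwise the second largest is the maximum of that left subtree.
    node = node.left
    while node.right is not None:
        node = node.right
    return node.value


# Same tree as in one of the answers above.
tree = Node(10, Node(5, Node(2), Node(7)), Node(30, Node(22), Node(32)))
result = second_largest(tree)
```

The loop uses O(h) time for a tree of height h and no extra memory, which is why the iterative form is usually preferred over a full in-order traversal.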

17 Jan 2018
 Common statistical and Python-related questions. 1) How do you prove that males are on average taller than females, knowing just gender and height? 2) What is a monkey patch? 3) How do you get the count of each letter in a sentence?

8 Answers

1) Get the averages and perform a t-test. 2) https://stackoverflow.com/questions/5626193/what-is-monkey-patching 3) I answered that you can build a for loop that looks for a specific character and adds to a counter every time one is found. This is an example: https://stackoverflow.com/questions/2932511/letter-count-on-a-string

1) How do you prove that males are on average taller than females by knowing just gender and height?

Assuming that the samples given in the data are independent within each group as well as across groups, this question can be answered by using "inference for comparing two independent means" (a good source for learning the method is the following link: https://www.coursera.org/learn/inferential-statistics-intro/lecture/wkwlZ/inference-for-comparing-two-independent-means).

There are two ways to do the inference, and the results of both should agree with each other:
a) calculating the confidence interval (CI)
b) doing a hypothesis test (HT)

The conditions for doing both are as follows:
1) independence
 * within groups
   - random sampling
   - if sampling is done without replacement, the number of samples (n) should be less than 10% of the population
 * between groups
   - samples should be non-paired
2) If we think the distribution of height for the population of interest is skewed, then we need a larger sample size (n > 30).

After checking the conditions above, we can use the central limit theorem to do the inference:

a) CI:
- set a confidence level (e.g. 90%) --> Note that the t-test should be "one-tailed" because the question asks if male heights are "higher".
- calculate the difference between the sample means for males and females (d)
- estimate the difference between means by using the following formula: d +/- margin of error
 * calculate SE_d
 * calculate the t score with df = min(n_male - 1, n_female - 1) and a confidence level of 90%
 * calculate d +/- t*SE_d
- if the confidence interval doesn't contain 0, we can reject the statement that "there is no difference between male and female heights."

b) HT:
- set a significance level (corresponding to the confidence level above, it should be 5%)
- set the null and alternative hypotheses (H_0 and H_a)
 * H_0: d = 0
 * H_a: d > 0
- calculate the t score ( t = (d - 0)/SE )
- calculate the p_value corresponding to the t score (note that the t-test should be "one-tailed" because the question asks if male heights are "higher", so the shaded area under the t curve should be calculated on the right side only)
- if p_value < 5%, we can reject the null hypothesis.

A low p-value gives us statistical evidence to support rejecting the null hypothesis, but it does not prove that the alternative hypothesis is true. Here we have chosen an alpha level of 0.05, so there is a 5% chance of incorrectly rejecting the null hypothesis when it is actually true.

There are three ways to reject the null hypothesis: 1) calculate the confidence interval: the hypothesized value falls outside the confidence interval 2) the t-score is greater than the t-critical value 3) p < alpha. The above answers are kind of mixed up.

    sent = 'this is a test. so count each letter of this variable.'
    letter = {}
    for char in sent:
        if char in letter:
            letter[char] += 1
        else:
            letter[char] = 1
    letter

    list_of_char_count = [(x, given_string.count(x)) for x in [char for char in given_string]]

    import pandas as pd

    sent = 'This is a sentence.'
    lst = []
    for char in sent:
        lst.append(char)
    pd.Series(lst).value_counts()

    from collections import Counter
    Counter("This is the sentence.")

    a <- "mystring"
    table(unlist(strsplit(a, ""), use.names=FALSE)) # without zeros
    # OR
    table(factor(unlist(strsplit(a, ""), use.names=FALSE), levels=letters)) # with zeros
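The hypothesis-test steps from the long answer above (difference of means d, standard error SE_d, t score, conservative df = min(n_male - 1, n_female - 1)) can be sketched with the Python standard library. The height values below are made up purely for illustration, the function name `two_sample_t` is an assumption, and the critical value 2.132 is the usual one-tailed 5% t-table entry for df = 4.

```python
import math
import statistics


def two_sample_t(sample_a, sample_b):
    """Return (d, se, t, df) for H_0: mean_a - mean_b = 0, using an unpooled SE."""
    d = statistics.mean(sample_a) - statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    # Conservative degrees of freedom, as in the answer above.
    df = min(len(sample_a) - 1, len(sample_b) - 1)
    return d, se, d / se, df


# Made-up heights in cm, for illustration only.
males = [178, 182, 175, 180, 185]
females = [165, 170, 160, 168, 162]

d, se, t, df = two_sample_t(males, females)

# One-tailed critical value at alpha = 0.05 for df = 4, from a standard t table.
T_CRITICAL_DF4 = 2.132
reject_null = t > T_CRITICAL_DF4
```

With real data you would look up (or compute) the critical value for the actual df rather than hard-coding it; the standard library has no t-distribution CDF, so a statistics package is the usual route for an exact p-value.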

26 May 2013
 Write a function that takes in two sorted lists and outputs a sorted list that is their union.

10 Answers

    f(a,b) { return sort(unique(a,b)) }

    def sortedUnion(list1, list2):
        # concatenate both lists; set() below removes the duplicates
        list3 = list1 + list2
        return sorted(list(set(list3)))

google merge sort

Write 2 helpers: 1) INSERT(A, b) = put element b within A in the sort order 2) DEL(A, a) = delete element a from A. Then do this recursion:

    f(A,B):
        if max(A) <= min(B) return [A B]
        else {
            B = INSERT(B, max(A));
            A = DEL(A, max(A));
            f(A,B);
        }

Something like that. Try coding and testing. I haven't.

Oops, check/write a termination condition

In Python, you could do:

    def merge_sort(a, b):
        # the built-in set replaces the old Python 2 `sets.Set`
        return sorted(set(a).union(set(b)))

    def sorted_union(list1, list2):
        union = set(list1).union(set(list2))
        sorted_union = sorted(list(union))
        return sorted_union

Second part of merge sort. Don't answer with sort(a), etc. Anyone can do that...

    def merge(A, B):
        i = 0
        j = 0
        sorted_list = []
        while i < len(A) and j < len(B):
            if A[i] <= B[j]:
                sorted_list.append(A[i])
                i += 1
            else:
                sorted_list.append(B[j])
                j += 1
        if i < len(A):
            sorted_list.extend(A[i:])
        elif j < len(B):
            sorted_list.extend(B[j:])
        return sorted_list

I assumed that we cannot use any "sort" function and we want it in linear time. So here it is:

    def my_sort(list_a, list_b):
        if len(list_a) == 0:
            return list_b
        elif len(list_b) == 0:
            return list_a
        else:
            if list_a[-1] > list_b[-1]:
                return my_sort(list_a[0:-1], list_b) + [list_a.pop(-1)]
            else:
                return my_sort(list_a, list_b[:-1]) + [list_b.pop(-1)]

In SQL:

    SELECT List1 FROM Table1
    UNION
    SELECT List2 FROM Table2
    ORDER BY List1, List2;
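Several answers above either sort again (O(n log n)) or merge without removing duplicates. A sketch that does both in one linear pass, assuming "union" means each value should appear once in the output, could look like this. The name `sorted_union_linear` is made up for the example.

```python
def sorted_union_linear(a, b):
    """Merge two sorted lists into a sorted list without duplicate values."""
    result = []

    def push(x):
        # Only append if x differs from the last value already emitted.
        if not result or result[-1] != x:
            result.append(x)

    i = j = 0
    # Standard two-pointer merge: always take the smaller head element.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            push(a[i])
            i += 1
        else:
            push(b[j])
            j += 1

    # One list is exhausted; drain whatever remains of the other.
    for x in a[i:]:
        push(x)
    for x in b[j:]:
        push(x)
    return result


union = sorted_union_linear([1, 2, 2, 5], [2, 3, 5, 8])
```

Because both inputs are already sorted, equal values always arrive next to each other, so comparing against the last emitted element is enough to deduplicate.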

16 Feb 2012
 Generating a sorted vector from two sorted vectors.

3 Answers

Keep two pointers and compare the two numbers they point to. Move the pointer which points to the smaller or equal number. End the loop when both pointers reach the end.

Look at merge in mergesort, it does the exact same thing.

Merge sort is the best. Many languages have this function built in; otherwise it can also be done manually. Assume two vectors A [1,2,3,4] and B [5,6,7,8] and merge them: compare the last value of A and the first value of B. In our case 4<5 is true, so the result is simply A followed by B. If it is false, then move the number up, compare it with the previous number, and so on.
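As one example of the built-in route mentioned in the last answer, Python's standard library provides `heapq.merge`, which lazily merges already-sorted inputs. A minimal sketch using the vectors A and B from that answer:

```python
import heapq

A = [1, 2, 3, 4]
B = [5, 6, 7, 8]

# heapq.merge returns an iterator over the merged, sorted values;
# wrap it in list() to materialise the result.
merged = list(heapq.merge(A, B))

# Duplicates are kept, unlike a set-based union.
merged_with_dupes = list(heapq.merge([1, 3], [2, 3]))
```

This keeps every element from both inputs, so it matches the "sorted vector from two sorted vectors" wording rather than a deduplicating union.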

Data Scientist at Square was asked...

1 Mar 2013
 How do you test whether a new credit risk scoring model works? What data would you look at?

3 Answers

I think I did fairly well on the data side, but I think I should have connected this to a model or something. Not fully sure on this one.

One could use the machine learning concept known as cross-validation as an element to solve this case. Assuming that in the development of the model, borrower data has already been broken into several subsets (a training, a validation, and a test set) and part of this data has already been used to fit and tune the model (the training and validation sets), the test set can then be used to provide an unbiased and independent assessment of the model's performance. In this case, we would be interested in comparing the MSEs of the training and test sets, which should be roughly equivalent if the model is good.

An ideal model will have the lowest sum of squared bias and variance. If the model already has the lowest expected error compared to other models, it is the only choice for a working model.
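The train-versus-test comparison described in the second answer can be sketched in a few lines of Python. Everything here is illustrative: the default indicators and model scores are made-up numbers, and `mse` is just a helper written for this example.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed outcomes and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up default indicators (1 = defaulted) and model scores, illustration only.
train_y, train_pred = [1, 0, 0, 1], [0.8, 0.2, 0.1, 0.7]
test_y, test_pred = [0, 1, 0, 1], [0.3, 0.6, 0.2, 0.9]

train_mse = mse(train_y, train_pred)
test_mse = mse(test_y, test_pred)

# A test error far above the training error suggests overfitting.
gap = test_mse - train_mse
```

How large a gap is acceptable is a judgment call that depends on the data and the business context; the answer above only says the two errors should be roughly equivalent.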

3 Mar 2019

Senior Data Scientist at Fractal Analytics was asked...

3 Mar 2019
 The HackerRank challenge had questions about basic Python/pandas skills.

1 Answer

You can practice for the HackerRank questions on their website. Try solving a few moderate/difficult ones there.

Data Scientist at Wayfair was asked...

26 Jun 2019
 How would you link each device a person uses to visit the website back to that person?

1 Answer

Right answer: logistic regression. Check if they logged in. Wrong answers: SVMs, anything temporal, dimensionality reduction, linear regression, finding out more about the dataset/industry to help with modeling the data in any way.

Data Scientist at zulily was asked...

17 Apr 2019
 What is A/B testing?

2 Answers

Central limit theorem, etc.

Anyone can share zulily interview questions? I'll pay $20 bounties for the interview information on Rooftop Slushie. https://wwww.rooftopslushie.com
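The "central limit theorem" hint in the first answer usually points at a test like the following: compare the conversion rates of variants A and B with a two-proportion z-test, which relies on the normal approximation. This is a sketch under that assumption; the function name and the conversion counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H_0: both variants convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Made-up counts: 200/1000 conversions for A, 250/1000 for B.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
significant = p_value < 0.05
```

The normal approximation is only reasonable when each variant has enough conversions and non-conversions; for very small samples an exact test would be the safer choice.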