Search and Data Mining Engineer was asked...6 June 2016

↳

Sorting the strings is not optimal, because each sort is O(N log N), where N is the number of characters in the word. A more efficient solution is to encode each word as a hash table of character frequencies, which takes O(N) per word.
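The question itself is not shown here, so the following is only a sketch that assumes it was about detecting or grouping anagrams. It illustrates the frequency-table idea with Python's collections.Counter; the function names are made up for this example.

```python
from collections import Counter, defaultdict

def are_anagrams(a, b):
    # Two words are anagrams when their character-frequency tables match.
    # Building each table is O(N) in the length of the word.
    return Counter(a) == Counter(b)

def group_anagrams(words):
    # Use the frequency table, frozen into a hashable form, as a dictionary key.
    groups = defaultdict(list)
    for word in words:
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    return list(groups.values())
```

For short words the sorted-string key is often just as fast in practice; the frequency table mainly pays off when the words are long.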

↳

Sort the strings and compare.

Data Mining/Machine Learning Engineer Position was asked...29 December 2010

↳

Use collaborative filtering to compare a user's preferences with those of other users. If A and B are similar, we can recommend B's preferred items to A.

↳

Why the downvote on the other answer? He/she is right. Collaborative filtering is the most common strategy for recommendation systems. You see that user A buys these things and user B also bought those things, but user B bought this other thing too, so let's show that thing to user A.
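As a rough illustration of the "user B also bought this other thing" idea, here is a minimal sketch. It assumes purchase histories are available as a dictionary of sets; the recommend function and the overlap-count scoring are illustrative choices, not a description of any production system.

```python
def recommend(purchases, target):
    """purchases maps each user to the set of items they bought."""
    mine = purchases[target]
    scores = {}
    for user, items in purchases.items():
        if user == target:
            continue
        overlap = len(mine & items)  # crude similarity: shared purchases
        if overlap == 0:
            continue
        # Items the similar user bought that the target has not bought yet.
        for item in items - mine:
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))
```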

↳

I think you mean the normal distribution! If you are using R, call set.seed() first so the results are reproducible. You can then use rnorm() with the sample size, mean, and SD, e.g. set.seed(123) followed by rnorm(100, 2, 5).

↳

I'm the original poster; sorry for my typo. I actually meant the multinomial distribution. The advanced question was: if the probabilities are skewed, how would you speed up your algorithm? You can find both answers on Wikipedia. :)
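The poster does not spell out either answer, so this is just one possible reading. A multinomial sample can be built from repeated categorical draws by scanning cumulative probabilities. For a skewed distribution, visiting the outcomes in order of decreasing probability lets most scans stop after a step or two. The function names below are hypothetical, and the alias method is another well-known option that this sketch does not cover.

```python
import random

def make_sampler(probs, rng=None):
    rng = rng or random.Random()
    # Visit likely outcomes first so the scan usually ends early
    # when the distribution is skewed.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    def draw():
        u = rng.random()  # uniform in [0, 1)
        cumulative = 0.0
        for i in order:
            cumulative += probs[i]
            if u < cumulative:
                return i
        # Rounding can leave the total slightly below 1;
        # fall back to the most likely outcome.
        return order[0]

    return draw

def sample_multinomial(n, probs, rng=None):
    # Count how often each outcome appears in n categorical draws.
    draw = make_sampler(probs, rng)
    counts = [0] * len(probs)
    for _ in range(n):
        counts[draw()] += 1
    return counts
```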

↳

I got the optimal solution (with a couple of nudges but time to spare), yet apparently this was the only module where I did not "meet expectations." Shame that some presumably small mistake in my first hour was enough to discount the otherwise very strong six-hour interview.

Search and Data Mining Engineer was asked...21 August 2014

Software Engineer New Grad (Data Mining) was asked...22 August 2017

Data Mining Engineer was asked...10 February 2017

↳

Mathematically speaking, it adds a regularization term to the loss that penalizes large coefficients, which keeps the model from fitting the training data so closely that it overfits. The difference between L1 and L2 is that L2 is the sum of the squares of the weights, while L1 is the sum of the absolute values of the weights.
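To make the two penalties concrete, here is a small sketch that takes L1 as the sum of absolute values and L2 as the sum of squares. The strength parameter lam and the function names are assumptions for illustration; libraries differ in how they scale these terms.

```python
def l1_penalty(weights, lam=1.0):
    # L1 (lasso-style): sum of absolute values of the weights.
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=1.0):
    # L2 (ridge-style): sum of squared weights.
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam=1.0, kind="l2"):
    # Total objective = fit to the data + penalty on the weights.
    penalty = l1_penalty if kind == "l1" else l2_penalty
    return data_loss + penalty(weights, lam)
```

For the weights [1.0, -2.0, 3.0] the L1 penalty is 6.0 and the L2 penalty is 14.0, which shows how L2 punishes large weights more heavily.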

Data Mining/Machine Learning Engineer Position was asked...5 February 2011

↳

It depends on the volume of data that we have. Assuming there is a lot of data on hand, it is best to use collaborative filtering. This involves finding users or items similar to the ones we are recommending for, then taking a weighted average of how much they liked the product to decide whether to recommend it. This could be implemented as user-user collaborative filtering, where we find similar users, or as item-item collaborative filtering. If we have less data to work with, it is a better idea to use a content-based filtering approach, where we create profiles for the users and recommend products based on the features of those profiles.
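The weighted-average step of user-user collaborative filtering can be sketched as follows. This assumes ratings are stored as nested dictionaries and uses cosine similarity as the weight; both are illustrative choices, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two users' rating dictionaries.
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict_rating(ratings, user, item):
    # Similarity-weighted average of other users' ratings for the item.
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += abs(sim)
    # None means no similar user has rated this item.
    return numerator / denominator if denominator else None
```

An item-item variant would apply the same similarity to columns (items) instead of rows (users).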

Data Mining Engineer was asked...20 February 2015

↳

The closest point to the mean of all the points.
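The question is not shown, so this is only a sketch of the stated answer: compute the coordinate-wise mean, then return the input point nearest to it. Euclidean distance and the function name are assumptions.

```python
def closest_to_mean(points):
    # points: non-empty list of equal-length coordinate tuples.
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    # Squared Euclidean distance is enough for picking the minimum.
    return min(points, key=lambda p: sum((p[d] - mean[d]) ** 2 for d in range(dims)))
```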