## Interview Question

Data Scientist Interview

-

# Given the following data: Table: searches Columns: date STRING date of the search, search_id INT the unique identifier of each search, user_id INT the unique identifier of the searcher, age_group STRING ('<30', '30-50', '50+'), search_query STRING the text of the search query Sample Rows: date | search_id | user_id | age_group | search_query -------------------------------------------------------------------- '2020-01-01' | 101 | 9991 | '<30' | 'justin bieber' '2020-01-01' | 102 | 9991 | '<30' | 'menlo park' '2020-01-01' | 103 | 5555 | '30-50' | 'john' '2020-01-01' | 104 | 1234 | '50+' | 'funny cats' Table: search_results Columns: date STRING date of the search action, search_id INT the unique identifier of each search, result_id INT the unique identifier of the result, result_type STRING (page, event, group, person, post, etc.), clicked BOOLEAN did the user click on the result? Sample Rows: date | search_id | result_id | result_type | clicked -------------------------------------------------------------------- '2020-01-01' | 101 | 1001 | 'page' | TRUE '2020-01-01' | 101 | 1002 | 'event' | FALSE '2020-01-01' | 101 | 1003 | 'event' | FALSE '2020-01-01' | 101 | 1004 | 'group' | FALSE Over the last 7 days, how many users made more than 10 searches? You notice that the number of users that clicked on a search result about a Facebook Event increased 10% week-over-week. How would you investigate? How do you decide if this is a good thing or a bad thing? The Events team wants to up-rank Events such that they show up higher in Search. How would you determine if this is a good idea or not?

10

Investigation: avg. # clicks per search (YoY for seasonality) # clicks per event type (histogram) # clicks by weekday / weekends good / bad if avg. # clicks per search has increased it might be bad cause users don't find relevant content, I would have also check the avg. time per link, if it has decreased it means the user didn't find relevant content I will also check the ratio between reacted events (arrive / interested) out of total clicked events , if the ratio has increased it might be good cause people reacted as it's relevant for them

Anonymous on

37

Over the last 7 days, how many users made more than 10 searches? WITH search_agg as( SELECT user_id, count(search_id) as count_search FROM search WHERE date>current_date -interval '7 days' GROUP BY user_id HAVING count_search>10) SELECT count (distinct user_id) FROM search_agg

Anonymous on

2

select count(distinct user_id) from ( select user_id, count(search_id) as cnt_search_7days from searches where date > datediff(today() -7) having count(search_id) > 10 ) a

Vivek on

1

SELECT SUM(CASE WHEN COUNT(search_id) > 10 THEN 1 ELSE 0 END) AS num_users_over_10 FROM searches WHERE date > CURRDATE() - INTERVAL 7 DAY GROUP BY user_id

Anonymous on

7

DROP VIEW IF EXISTS users_clicked; DROP TABLE IF EXISTS searches; CREATE TABLE searches ( "Date" VARCHAR(10), search_id int , user_id int, age_group VARCHAR(25), search_query VARCHAR(255) ); INSERT INTO searches ("Date", search_id, user_id, age_group, search_query) VALUES ('2020-01-01', 101, 9991, ' 4; CREATE VIEW users_clicked as( SELECT s."Date", s.search_id, s.user_id, s.age_group, sr.clicked FROM searches as s Left join search_results as sr on s.search_id = sr.search_id); SELECT '-----users_clicked------' as msg; SELECT * from users_clicked; -- how many costumers clicked over the past 10 days SELECT count(distinct user_id) filter (where clicked=True and CURRENT_DATE - TO_DATE("Date", 'YYY-MM-DD')<=10) From users_clicked;

Thoughts? on

0

library(tidyverse) library(lubridate) searches %>% filter(date >= now()-days(7)) %>% distinct(search_id, user_id) %>% group_by(user_id) %>% count() %>% ungroup() %>% filter(n>10)

Anonymous on

5

import pandas as pd from datetime import datetime,timedelta df = pd.read_csv(searches) df['date'] = pd.to_datetime(df['date']).dt.date last_7 = (df['date'] - datetime.now().date())10) num_searches_per_user_over_10 = num_searches_per_user.sum() percent_searches_per_user_over_10 = num_searches_per_user.mean()

Frank on