Leetcode 30 days of Pandas (Part 3 of 3)

Shahidullah Kawsar
20 min readJan 27, 2024

LeetCode has launched a study plan “30 days of Pandas”, with 33 questions to practice your Pandas skills. I have already discussed problems 1 to 10 in part 1 and problems 11 to 20 in part 2. Today I will discuss problems 21 to 33 and their multiple solutions using pandas.

Problem 21: Leetcode Easy 1741. Find Total Time Spent by Each Employee

Table: Employees
+-------------+------+
| Column Name | Type |
+-------------+------+
| emp_id | int |
| event_day | date |
| in_time | int |
| out_time | int |
+-------------+------+
(emp_id, event_day, in_time) is the primary key (combinations of columns with unique values) of this table.
The table shows the employees' entries and exits in an office.
event_day is the day at which this event happened, in_time is the minute at which the employee entered the office, and out_time is the minute at which they left the office.
in_time and out_time are between 1 and 1440.
It is guaranteed that no two events on the same day intersect in time, and in_time < out_time.

Write a solution to calculate the total time in minutes spent by each employee on each day at the office. Note that within one day, an employee can enter and leave more than once. The time spent in the office for a single entry is out_time - in_time. Return the result table in any order. The result format is in the following example.

Input: 
Employees table:
+--------+------------+---------+----------+
| emp_id | event_day | in_time | out_time |
+--------+------------+---------+----------+
| 1 | 2020-11-28 | 4 | 32 |
| 1 | 2020-11-28 | 55 | 200 |
| 1 | 2020-12-03 | 1 | 42 |
| 2 | 2020-11-28 | 3 | 33 |
| 2 | 2020-12-09 | 47 | 74 |
+--------+------------+---------+----------+
Output:
+------------+--------+------------+
| day | emp_id | total_time |
+------------+--------+------------+
| 2020-11-28 | 1 | 173 |
| 2020-11-28 | 2 | 30 |
| 2020-12-03 | 1 | 41 |
| 2020-12-09 | 2 | 27 |
+------------+--------+------------+
Explanation:
Employee 1 has three events: two on day 2020-11-28 with a total of (32 - 4) + (200 - 55) = 173, and one on day 2020-12-03 with a total of (42 - 1) = 41.
Employee 2 has two events: one on day 2020-11-28 with a total of (33 - 3) = 30, and one on day 2020-12-09 with a total of (74 - 47) = 27.

Solution:

import pandas as pd

def total_time(employees: pd.DataFrame) -> pd.DataFrame:

# calculate the time spent in each entry
employees["total_time"] = employees["out_time"] - employees["in_time"]

# for each event_day, sum the total time spent in the office
df = employees.groupby(["event_day", "emp_id"])["total_time"].sum()

return df.reset_index().rename(columns={"event_day": "day"})

Problem 22: Leetcode Easy 511. Game Play Analysis I

Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) is the primary key (combination of columns with unique values) of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.

Write a solution to find the first login date for each player. Return the result table in any order. The result format is in the following example.

Input: 
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Output:
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
+-----------+-------------+

Solution:

import pandas as pd

def game_analysis(activity: pd.DataFrame) -> pd.DataFrame:

activity["first_login"] = activity.groupby("player_id")["event_date"].transform('min')

return activity[["player_id", "first_login"]].drop_duplicates()

Problem 23: Leetcode Easy 2356. Number of Unique Subjects Taught by Each Teacher

Table: Teacher
+-------------+------+
| Column Name | Type |
+-------------+------+
| teacher_id | int |
| subject_id | int |
| dept_id | int |
+-------------+------+
(subject_id, dept_id) is the primary key (combinations of columns with unique values) of this table.
Each row in this table indicates that the teacher with teacher_id teaches the subject subject_id in the department dept_id.

Write a solution to calculate the number of unique subjects each teacher teaches in the university. Return the result table in any order. The result format is shown in the following example.

Input: 
Teacher table:
+------------+------------+---------+
| teacher_id | subject_id | dept_id |
+------------+------------+---------+
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 1 | 3 | 3 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 1 |
+------------+------------+---------+
Output:
+------------+-----+
| teacher_id | cnt |
+------------+-----+
| 1 | 2 |
| 2 | 4 |
+------------+-----+
Explanation:
Teacher 1:
- They teach subject 2 in departments 3 and 4.
- They teach subject 3 in department 3.
Teacher 2:
- They teach subject 1 in department 1.
- They teach subject 2 in department 1.
- They teach subject 3 in department 1.
- They teach subject 4 in department 1.

Solution:

import pandas as pd

def count_unique_subjects(teacher: pd.DataFrame) -> pd.DataFrame:

# calculate the number of unique subjects each teacher teaches in the university
df = teacher.groupby("teacher_id")[["subject_id"]].nunique()

return df.reset_index().rename(columns={"subject_id": "cnt"})

Problem 24: Leetcode Easy 596. Classes More Than 5 Students

Table: Courses
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| student | varchar |
| class | varchar |
+-------------+---------+
(student, class) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates the name of a student and the class in which they are enrolled.

Write a solution to find all the classes that have at least five students. Return the result table in any order. The result format is in the following example.

Input: 
Courses table:
+---------+----------+
| student | class |
+---------+----------+
| A | Math |
| B | English |
| C | Math |
| D | Biology |
| E | Math |
| F | Computer |
| G | Math |
| H | Math |
| I | Math |
+---------+----------+
Output:
+---------+
| class |
+---------+
| Math |
+---------+
Explanation:
- Math has 6 students, so we include it.
- English has 1 student, so we do not include it.
- Biology has 1 student, so we do not include it.
- Computer has 1 student, so we do not include it.

Solution 1:

import pandas as pd

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:

# find number of students in each class
df = courses.groupby("class")[["student"]].count().reset_index()

# filter to find all the classes that have at least five students
df = df[df["student"]>=5][["class"]]

return df

Solution 2:

import pandas as pd

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:

# find number of students in each class
df = courses.groupby('class').size().reset_index(name='count')

# filter to find all the classes that have at least five students
df = df[df['count'] >= 5]

return df[['class']]

Problem 25: Leetcode Easy 586. Customer Placing the Largest Number of Orders

Table: Orders
+-----------------+----------+
| Column Name | Type |
+-----------------+----------+
| order_number | int |
| customer_number | int |
+-----------------+----------+
order_number is the primary key (column with unique values) for this table.
This table contains information about the order ID and the customer ID.

Write a solution to find the customer_number for the customer who has placed the largest number of orders. The test cases are generated so that exactly one customer will have placed more orders than any other customer. The result format is in the following example.

Input: 
Orders table:
+--------------+-----------------+
| order_number | customer_number |
+--------------+-----------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |
+--------------+-----------------+
Output:
+-----------------+
| customer_number |
+-----------------+
| 3 |
+-----------------+
Explanation:
The customer with number 3 has two orders, which is greater than either customer 1 or 2 because each of them only has one order.
So the result is customer_number 3.

Solution 1:

import pandas as pd

def largest_orders(orders: pd.DataFrame) -> pd.DataFrame:

# count the order number for each customer
df = orders.groupby(["customer_number"])[["order_number"]].count().reset_index()

# filter the customers with the maximum number of orders
df = df[df["order_number"]==df["order_number"].max()][["customer_number"]]

return df

Solution 2:

import pandas as pd

def largest_orders(orders: pd.DataFrame) -> pd.DataFrame:
# If orders is empty, return an empty DataFrame.
if orders.empty:
return pd.DataFrame({'customer_number': []})

df = orders.groupby('customer_number').size().reset_index(name='count')
df.sort_values(by='count', ascending = False, inplace=True)
return df[['customer_number']].head(1)

Problem 26: Leetcode Easy 1484. Group Sold Products By The Date

Table Activities:
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| sell_date | date |
| product | varchar |
+-------------+---------+
There is no primary key (column with unique values) for this table. It may contain duplicates.
Each row of this table contains the product name and the date it was sold in a market.

Write a solution to find for each date the number of different products sold and their names. The sold product names for each date should be sorted lexicographically. Return the result table ordered by sell_date. The result format is in the following example.

Input: 
Activities table:
+------------+------------+
| sell_date | product |
+------------+------------+
| 2020-05-30 | Headphone |
| 2020-06-01 | Pencil |
| 2020-06-02 | Mask |
| 2020-05-30 | Basketball |
| 2020-06-01 | Bible |
| 2020-06-02 | Mask |
| 2020-05-30 | T-Shirt |
+------------+------------+
Output:
+------------+----------+------------------------------+
| sell_date | num_sold | products |
+------------+----------+------------------------------+
| 2020-05-30 | 3 | Basketball,Headphone,T-shirt |
| 2020-06-01 | 2 | Bible,Pencil |
| 2020-06-02 | 1 | Mask |
+------------+----------+------------------------------+
Explanation:
For 2020-05-30, Sold items were (Headphone, Basketball, T-shirt), we sort them lexicographically and separate them by a comma.
For 2020-06-01, Sold items were (Pencil, Bible), we sort them lexicographically and separate them by a comma.
For 2020-06-02, the Sold item is (Mask), we just return it.

Solution:

import pandas as pd

def categorize_products(activities: pd.DataFrame) -> pd.DataFrame:

# sort activities by product in ascending order
activities = activities.sort_values(by="product")

# for each sell_date, find the number of unique products and the products list
df = activities.groupby(["sell_date"]).agg({"product": ["nunique", "unique"]}).reset_index()

# convert multi-level header to single header
df.columns = [c[0] + "_" + c[1] for c in df.columns]
df = df.rename(columns={"sell_date_": "sell_date", "product_nunique": "num_sold", "product_unique": "products"})

# extract the products from list
df["products"] = df["products"].apply(','.join)

df = df.sort_values(by="sell_date")

return df

Problem 27: Leetcode Easy 1693. Daily Leads and Partners

Table: DailySales
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| date_id | date |
| make_name | varchar |
| lead_id | int |
| partner_id | int |
+-------------+---------+
There is no primary key (column with unique values) for this table. It may contain duplicates.
This table contains the date and the name of the product sold and the IDs of the lead and partner it was sold to.
The name consists of only lowercase English letters.

For each date_id and make_name, find the number of distinct lead_id's and distinct partner_id's. Return the result table in any order. The result format is in the following example.

Input: 
DailySales table:
+-----------+-----------+---------+------------+
| date_id | make_name | lead_id | partner_id |
+-----------+-----------+---------+------------+
| 2020-12-8 | toyota | 0 | 1 |
| 2020-12-8 | toyota | 1 | 0 |
| 2020-12-8 | toyota | 1 | 2 |
| 2020-12-7 | toyota | 0 | 2 |
| 2020-12-7 | toyota | 0 | 1 |
| 2020-12-8 | honda | 1 | 2 |
| 2020-12-8 | honda | 2 | 1 |
| 2020-12-7 | honda | 0 | 1 |
| 2020-12-7 | honda | 1 | 2 |
| 2020-12-7 | honda | 2 | 1 |
+-----------+-----------+---------+------------+
Output:
+-----------+-----------+--------------+-----------------+
| date_id | make_name | unique_leads | unique_partners |
+-----------+-----------+--------------+-----------------+
| 2020-12-8 | toyota | 2 | 3 |
| 2020-12-7 | toyota | 1 | 2 |
| 2020-12-8 | honda | 2 | 2 |
| 2020-12-7 | honda | 3 | 2 |
+-----------+-----------+--------------+-----------------+
Explanation:
For 2020-12-8, toyota gets leads = [0, 1] and partners = [0, 1, 2] while honda gets leads = [1, 2] and partners = [1, 2].
For 2020-12-7, toyota gets leads = [0] and partners = [1, 2] while honda gets leads = [0, 1, 2] and partners = [1, 2].

Solution:

import pandas as pd

def daily_leads_and_partners(daily_sales: pd.DataFrame) -> pd.DataFrame:

# For each date_id and make_name, find the number of distinct lead_id's and distinct partner_id's.
df = daily_sales.groupby(["date_id", "make_name"])[["lead_id", "partner_id"]].nunique()
df = df.reset_index().rename(columns={"lead_id": "unique_leads", "partner_id": "unique_partners"})

return df

Problem 28: Leetcode Easy 1050. Actors and Directors Who Cooperated At Least Three Times

Table: ActorDirector
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| actor_id | int |
| director_id | int |
| timestamp | int |
+-------------+---------+
timestamp is the primary key (column with unique values) for this table.

Write a solution to find all the pairs (actor_id, director_id) where the actor has cooperated with the director at least three times. Return the result table in any order. The result format is in the following example.

Input: 
ActorDirector table:
+-------------+-------------+-------------+
| actor_id | director_id | timestamp |
+-------------+-------------+-------------+
| 1 | 1 | 0 |
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 2 | 1 | 5 |
| 2 | 1 | 6 |
+-------------+-------------+-------------+
Output:
+-------------+-------------+
| actor_id | director_id |
+-------------+-------------+
| 1 | 1 |
+-------------+-------------+
Explanation: The only pair is (1, 1) where they cooperated exactly 3 times.

Solution 1:

import pandas as pd

def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:

# find the number of pairs
df = actor_director[["actor_id", "director_id"]].value_counts().reset_index()

# data filtering
df = df[df["count"]>=3]

return df[["actor_id", "director_id"]]

Solution 2:

import pandas as pd

def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:

df = actor_director.groupby(['actor_id', 'director_id']).size().reset_index(name='counts')
df = df[df['counts'] >= 3][['actor_id', 'director_id']]

return df

Problem 29: Leetcode Easy 1378. Replace Employee ID With The Unique Identifier

Table: Employees
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| name | varchar |
+---------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains the id and the name of an employee in a company.

Table: EmployeeUNI
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| unique_id | int |
+---------------+---------+
(id, unique_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table contains the id and the corresponding unique id of an employee in the company.

Write a solution to show the unique ID of each user, If a user does not have a unique ID replace just show null. Return the result table in any order. The result format is in the following example.

Input: 
Employees table:
+----+----------+
| id | name |
+----+----------+
| 1 | Alice |
| 7 | Bob |
| 11 | Meir |
| 90 | Winston |
| 3 | Jonathan |
+----+----------+
EmployeeUNI table:
+----+-----------+
| id | unique_id |
+----+-----------+
| 3 | 1 |
| 11 | 2 |
| 90 | 3 |
+----+-----------+
Output:
+-----------+----------+
| unique_id | name |
+-----------+----------+
| null | Alice |
| null | Bob |
| 2 | Meir |
| 3 | Winston |
| 1 | Jonathan |
+-----------+----------+
Explanation:
Alice and Bob do not have a unique ID, We will show null instead.
The unique ID of Meir is 2.
The unique ID of Winston is 3.
The unique ID of Jonathan is 1.

Solution:

import pandas as pd

def replace_employee_id(employees: pd.DataFrame, employee_uni: pd.DataFrame) -> pd.DataFrame:

df = employee_uni.merge(employees, how="outer")
df = df[["unique_id", "name"]].dropna(subset=["name"])

return df

Problem 30: Leetcode Easy 1280. Students and Examinations

Table: Students
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| student_name | varchar |
+---------------+---------+
student_id is the primary key (column with unique values) for this table.
Each row of this table contains the ID and the name of one student in the school.

Table: Subjects
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| subject_name | varchar |
+--------------+---------+
subject_name is the primary key (column with unique values) for this table.
Each row of this table contains the name of one subject in the school.

Table: Examinations
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| student_id | int |
| subject_name | varchar |
+--------------+---------+
There is no primary key (column with unique values) for this table. It may contain duplicates.
Each student from the Students table takes every course from the Subjects table.
Each row of this table indicates that a student with ID student_id attended the exam of subject_name.

Write a solution to find the number of times each student attended each exam. Return the result table ordered by student_id and subject_name. The result format is in the following example.

Input: 
Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1 | Alice |
| 2 | Bob |
| 13 | John |
| 6 | Alex |
+------------+--------------+
Subjects table:
+--------------+
| subject_name |
+--------------+
| Math |
| Physics |
| Programming |
+--------------+
Examinations table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1 | Math |
| 1 | Physics |
| 1 | Programming |
| 2 | Programming |
| 1 | Physics |
| 1 | Math |
| 13 | Math |
| 13 | Programming |
| 13 | Physics |
| 2 | Math |
| 1 | Math |
+------------+--------------+
Output:
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1 | Alice | Math | 3 |
| 1 | Alice | Physics | 2 |
| 1 | Alice | Programming | 1 |
| 2 | Bob | Math | 1 |
| 2 | Bob | Physics | 0 |
| 2 | Bob | Programming | 1 |
| 6 | Alex | Math | 0 |
| 6 | Alex | Physics | 0 |
| 6 | Alex | Programming | 0 |
| 13 | John | Math | 1 |
| 13 | John | Physics | 1 |
| 13 | John | Programming | 1 |
+------------+--------------+--------------+----------------+
Explanation:
The result table should contain all students and all subjects.
Alice attended the Math exam 3 times, the Physics exam 2 times, and the Programming exam 1 time.
Bob attended the Math exam 1 time, the Programming exam 1 time, and did not attend the Physics exam.
Alex did not attend any exams.
John attended the Math exam 1 time, the Physics exam 1 time, and the Programming exam 1 time.

Solution:

import pandas as pd

def students_and_examinations(students: pd.DataFrame, subjects: pd.DataFrame, examinations: pd.DataFrame) -> pd.DataFrame:

# Group by student_id and subject_name to count the number of exams
grouped = examinations.groupby(['student_id',
'subject_name']).size().reset_index(name='attended_exams')

# Get all combinations of (id, subject)
combinations = pd.merge(students, subjects, how='cross')

# Left join to retain all combinations
df = pd.merge(combinations, grouped,
on=['student_id', 'subject_name'],
how='left')

# Data cleaning
df['attended_exams'] = df['attended_exams'].fillna(0).astype(int)

# Sort DataFrame in ascending based on 'student_id', 'subject_name'
df.sort_values(['student_id', 'subject_name'], inplace=True)

return df[['student_id', 'student_name', 'subject_name', 'attended_exams']]

Problem 31: Leetcode Medium 570. Managers with at Least 5 Direct Reports

Table: Employee
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| name | varchar |
| department | varchar |
| managerId | int |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the name of an employee, their department, and the id of their manager.
If managerId is null, then the employee does not have a manager.
No employee will be the manager of themself.

Write a solution to find managers with at least five direct reports. Return the result table in any order. The result format is in the following example.

Input: 
Employee table:
+-----+-------+------------+-----------+
| id | name | department | managerId |
+-----+-------+------------+-----------+
| 101 | John | A | null |
| 102 | Dan | A | 101 |
| 103 | James | A | 101 |
| 104 | Amy | A | 101 |
| 105 | Anne | A | 101 |
| 106 | Ron | B | 101 |
+-----+-------+------------+-----------+
Output:
+------+
| name |
+------+
| John |
+------+

Solution 1:

import pandas as pd

def find_managers(employee: pd.DataFrame) -> pd.DataFrame:

# count and filter "managerId"
mask = pd.DataFrame(employee["managerId"].value_counts()>=5)
mask = mask[mask["count"]==True].index.tolist()

# filter where employee id == manager id
employee = employee[employee["id"].isin(mask)][["name"]]

return employee

Solution 2:

import pandas as pd

def find_managers(employee: pd.DataFrame) -> pd.DataFrame:

# find the number of reports for each manager id
df = employee.groupby('managerId').size().reset_index(name='count')

# count and filter "managerId"
df = df[df['count'] >= 5]

# find managers with at least five direct reports
managers_info = pd.merge(df, employee, left_on='managerId', right_on='id', how='inner')

return managers_info[['name']]

Problem 32: Leetcode Easy 607. Sales Person

Table: SalesPerson
+-----------------+---------+
| Column Name | Type |
+-----------------+---------+
| sales_id | int |
| name | varchar |
| salary | int |
| commission_rate | int |
| hire_date | date |
+-----------------+---------+
sales_id is the primary key (column with unique values) for this table.
Each row of this table indicates the name and the ID of a salesperson alongside their salary, commission rate, and hire date.

Table: Company
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| com_id | int |
| name | varchar |
| city | varchar |
+-------------+---------+
com_id is the primary key (column with unique values) for this table.
Each row of this table indicates the name and the ID of a company and the city in which the company is located.

Table: Orders
+-------------+------+
| Column Name | Type |
+-------------+------+
| order_id | int |
| order_date | date |
| com_id | int |
| sales_id | int |
| amount | int |
+-------------+------+
order_id is the primary key (column with unique values) for this table.
com_id is a foreign key (reference column) to com_id from the Company table.
sales_id is a foreign key (reference column) to sales_id from the SalesPerson table.
Each row of this table contains information about one order. This includes the ID of the company, the ID of the salesperson, the date of the order, and the amount paid.

Write a solution to find the names of all the salespersons who did not have any orders related to the company with the name “RED”. Return the result table in any order. The result format is in the following example.

Input: 
SalesPerson table:
+----------+------+--------+-----------------+------------+
| sales_id | name | salary | commission_rate | hire_date |
+----------+------+--------+-----------------+------------+
| 1 | John | 100000 | 6 | 4/1/2006 |
| 2 | Amy | 12000 | 5 | 5/1/2010 |
| 3 | Mark | 65000 | 12 | 12/25/2008 |
| 4 | Pam | 25000 | 25 | 1/1/2005 |
| 5 | Alex | 5000 | 10 | 2/3/2007 |
+----------+------+--------+-----------------+------------+
Company table:
+--------+--------+----------+
| com_id | name | city |
+--------+--------+----------+
| 1 | RED | Boston |
| 2 | ORANGE | New York |
| 3 | YELLOW | Boston |
| 4 | GREEN | Austin |
+--------+--------+----------+
Orders table:
+----------+------------+--------+----------+--------+
| order_id | order_date | com_id | sales_id | amount |
+----------+------------+--------+----------+--------+
| 1 | 1/1/2014 | 3 | 4 | 10000 |
| 2 | 2/1/2014 | 4 | 5 | 5000 |
| 3 | 3/1/2014 | 1 | 1 | 50000 |
| 4 | 4/1/2014 | 1 | 4 | 25000 |
+----------+------------+--------+----------+--------+
Output:
+------+
| name |
+------+
| Amy |
| Mark |
| Alex |
+------+
Explanation:
According to orders 3 and 4 in the Orders table, it is easy to tell that only salesperson John and Pam have sales to company RED, so we report all the other names in the table salesperson.

Solution 1:

import pandas as pd

def sales_person(sales_person: pd.DataFrame, company: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:

# separate company ids for the "RED" companies
red_code = company[company["name"] == "RED"]["com_id"]

# separate the sales ids from the "RED" companies
red_sales = orders[orders["com_id"].isin(red_code)]["sales_id"]

# find the salespersons who didn't have any orders related to the company "RED"
non_red_sales = sales_person[~sales_person["sales_id"].isin(red_sales)][['name']]

return non_red_sales

Solution 2:

import pandas as pd

def sales_person(sales_person: pd.DataFrame, company: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:

# merge orders and company based on company id
df = orders.merge(company, on='com_id')

# find the unique sales id of the RED company
invalid_ids = df[df['name'] == 'RED']["sales_id"].unique()

# find the sales persons who are not related to the company RED
salespersons = sales_person[~sales_person['sales_id'].isin(invalid_ids)]

return salespersons[['name']]

Problem 33: Leetcode Medium 2041. Accepted Candidates From the Interviews

Table: Candidates
+--------------+----------+
| Column Name | Type |
+--------------+----------+
| candidate_id | int |
| name | varchar |
| years_of_exp | int |
| interview_id | int |
+--------------+----------+
candidate_id is the primary key (column with unique values) for this table.
Each row of this table indicates the name of a candidate, their number of years of experience, and their interview ID.

Table: Rounds
+--------------+------+
| Column Name | Type |
+--------------+------+
| interview_id | int |
| round_id | int |
| score | int |
+--------------+------+
(interview_id, round_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates the score of one round of an interview.

Write a solution to report the IDs of the candidates who have at least two years of experience and the sum of the score of their interview rounds is strictly greater than 15. Return the result table in any order. The result format is in the following example.

Input: 
Candidates table:
+--------------+---------+--------------+--------------+
| candidate_id | name | years_of_exp | interview_id |
+--------------+---------+--------------+--------------+
| 11 | Atticus | 1 | 101 |
| 9 | Ruben | 6 | 104 |
| 6 | Aliza | 10 | 109 |
| 8 | Alfredo | 0 | 107 |
+--------------+---------+--------------+--------------+
Rounds table:
+--------------+----------+-------+
| interview_id | round_id | score |
+--------------+----------+-------+
| 109 | 3 | 4 |
| 101 | 2 | 8 |
| 109 | 4 | 1 |
| 107 | 1 | 3 |
| 104 | 3 | 6 |
| 109 | 1 | 4 |
| 104 | 4 | 7 |
| 104 | 1 | 2 |
| 109 | 2 | 1 |
| 104 | 2 | 7 |
| 107 | 2 | 3 |
| 101 | 1 | 8 |
+--------------+----------+-------+
Output:
+--------------+
| candidate_id |
+--------------+
| 9 |
+--------------+
Explanation:
- Candidate 11: The total score is 16, and they have one year of experience. We do not include them in the result table because of their years of experience.
- Candidate 9: The total score is 22, and they have six years of experience. We include them in the result table.
- Candidate 6: The total score is 10, and they have ten years of experience. We do not include them in the result table because the score is not good enough.
- Candidate 8: The total score is 6, and they have zero years of experience. We do not include them in the result table because of their years of experience and the score.

Solution:

import pandas as pd

def accepted_candidates(candidates: pd.DataFrame, rounds: pd.DataFrame) -> pd.DataFrame:

# filter candidates with at least 2 years of experience
candidates = candidates[candidates["years_of_exp"]>=2]

# Left join rounds and candidates based on the "interview_id"
rounds = rounds.merge(candidates, how="left")

# for each candidate_id, find the sum of score
rounds = rounds.groupby(["candidate_id"])[["score"]].sum().reset_index()

# filter higher scoring candidates
rounds = rounds[rounds["score"]>15]

return rounds[["candidate_id"]]
Source

--

--