LeetCode 30 Days of Pandas (Part 1 of 3)

Shahidullah Kawsar
12 min read · Jan 23, 2024

LeetCode has launched a study plan, “30 Days of Pandas”, with 33 questions for practicing your pandas skills. Here are the first 10 questions, several of them with more than one solution.

Problem 01: Leetcode Easy 595. Big Countries

Table: World
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | bigint  |
+-------------+---------+
name is the primary key (column with unique values) for this table.
Each row of this table gives information about the name of a country,
the continent to which it belongs, its area, the population,
and its GDP value.

A country is big if it has an area of at least three million (i.e., 3000000 km²), or it has a population of at least twenty-five million (i.e., 25000000). Write a solution to find the name, population, and area of the big countries. Return the result table in any order. The result format is in the following example.

Input: 
World table:
+-------------+-----------+---------+------------+--------------+
| name        | continent | area    | population | gdp          |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |
+-------------+-----------+---------+------------+--------------+
Output:
+-------------+------------+---------+
| name        | population | area    |
+-------------+------------+---------+
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
+-------------+------------+---------+

Solution:

import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:

    # filtering a dataframe based on multiple conditions
    world = world[(world["area"] >= 3000000) | (world["population"] >= 25000000)]

    return world[["name", "population", "area"]]
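
As a quick sanity check, the sketch below rebuilds the example World table from the problem statement and applies the same filter with `.loc`, which selects the rows and the output columns in one step. The function name `big_countries_loc` is made up for this illustration; it is an alternative phrasing of the filter, not part of the LeetCode template.

```python
import pandas as pd

def big_countries_loc(world: pd.DataFrame) -> pd.DataFrame:
    # boolean mask: large area OR large population
    is_big = (world["area"] >= 3_000_000) | (world["population"] >= 25_000_000)
    # .loc filters the rows and picks the output columns in a single step
    return world.loc[is_big, ["name", "population", "area"]]

# example table copied from the problem statement
world = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
    "area": [652230, 28748, 2381741, 468, 1246700],
    "population": [25500100, 2831741, 37100000, 78115, 20609294],
    "gdp": [20343000000, 12960000000, 188681000000, 3712000000, 100990000000],
})

result = big_countries_loc(world)
print(result)
```

Both qualifying rows here pass on population alone; none of the five example countries reaches the area threshold.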

Problem 02: Leetcode Easy 1757. Recyclable and Low Fat Products

Table: Products
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| low_fats    | enum    |
| recyclable  | enum    |
+-------------+---------+
product_id is the primary key (column with unique values) for this table.
low_fats is an ENUM (category) of type ('Y', 'N') where 'Y' means this product is low fat and 'N' means it is not.
recyclable is an ENUM (category) of types ('Y', 'N') where 'Y' means this product is recyclable and 'N' means it is not.

Write a solution to find the ids of products that are both low fat and recyclable. Return the result table in any order. The result format is in the following example.

Input: 
Products table:
+-------------+----------+------------+
| product_id  | low_fats | recyclable |
+-------------+----------+------------+
| 0           | Y        | N          |
| 1           | Y        | Y          |
| 2           | N        | Y          |
| 3           | Y        | Y          |
| 4           | N        | N          |
+-------------+----------+------------+
Output:
+-------------+
| product_id  |
+-------------+
| 1           |
| 3           |
+-------------+
Explanation: Only products 1 and 3 are both low fat and recyclable.

Solution:

import pandas as pd

def find_products(products: pd.DataFrame) -> pd.DataFrame:

    # filtering a dataframe based on multiple conditions
    df = products[(products["low_fats"] == "Y") & (products["recyclable"] == "Y")]

    return df[["product_id"]]
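
If you prefer SQL-like syntax, the same two-condition filter can be sketched with `DataFrame.query`. The helper name `find_products_query` is hypothetical, and the data is the example table from the problem statement.

```python
import pandas as pd

def find_products_query(products: pd.DataFrame) -> pd.DataFrame:
    # query() takes the condition as a string; "and" plays the role of "&"
    matched = products.query("low_fats == 'Y' and recyclable == 'Y'")
    return matched[["product_id"]]

# example table copied from the problem statement
products = pd.DataFrame({
    "product_id": [0, 1, 2, 3, 4],
    "low_fats": ["Y", "Y", "N", "Y", "N"],
    "recyclable": ["N", "Y", "Y", "Y", "N"],
})

result = find_products_query(products)
print(result)
```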

Problem 03: Leetcode Easy 183. Customers Who Never Order

Table: Customers
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the ID and name of a customer.

Table: Orders
+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| customerId  | int  |
+-------------+------+
id is the primary key (column with unique values) for this table.
customerId is a foreign key (reference columns) of the ID from the Customers table.
Each row of this table indicates the ID of an order and the ID of the customer who ordered it.

Write a solution to find all customers who never order anything. Return the result table in any order. The result format is in the following example.

Input: 
Customers table:
+----+-------+
| id | name  |
+----+-------+
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |
+----+-------+
Orders table:
+----+------------+
| id | customerId |
+----+------------+
| 1  | 3          |
| 2  | 1          |
+----+------------+
Output:
+-----------+
| Customers |
+-----------+
| Henry     |
| Max       |
+-----------+

Solution: pandas.Series.unique, pandas.Series.isin

import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:

    # find the list of customers who placed orders
    customer_list = orders["customerId"].unique().tolist()

    # keep only the customers whose id never appears in the orders
    customers = customers[~customers["id"].isin(customer_list)]

    # rename the column name
    customers = customers.rename(columns={"name": "Customers"})

    return customers[["Customers"]]
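
Another common way to express "rows in one table with no match in the other" is an anti-join: a left merge with `indicator=True`, keeping only the rows marked `left_only`. The sketch below is an alternative to the `isin` solution, not the approach used above; the function name `find_customers_merge` is invented for the example.

```python
import pandas as pd

def find_customers_merge(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # only the customerId column is needed; de-duplicate so the join adds no rows
    ordered_ids = orders[["customerId"]].drop_duplicates()
    # left join keeps every customer; indicator=True adds a "_merge" column
    merged = customers.merge(ordered_ids, how="left",
                             left_on="id", right_on="customerId",
                             indicator=True)
    # "left_only" marks customers that found no matching order
    never_ordered = merged[merged["_merge"] == "left_only"]
    return never_ordered[["name"]].rename(columns={"name": "Customers"})

# example tables copied from the problem statement
customers = pd.DataFrame({"id": [1, 2, 3, 4],
                          "name": ["Joe", "Henry", "Sam", "Max"]})
orders = pd.DataFrame({"id": [1, 2], "customerId": [3, 1]})

result = find_customers_merge(customers, orders)
print(result)
```

The `isin` version is shorter for a single key column; the merge version generalizes more easily when the match involves several columns.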

Problem 04: Leetcode Easy 1148. Article Views I

Table: Views
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
There is no primary key (column with unique values) for this table, the table may have duplicate rows.
Each row of this table indicates that some viewer viewed an article (written by some author) on some date.
Note that equal author_id and viewer_id indicate the same person.

Write a solution to find all the authors who viewed at least one of their own articles. Return the result table sorted by id in ascending order. The result format is in the following example.

Input: 
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+
Output:
+------+
| id   |
+------+
| 4    |
| 7    |
+------+

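Solution 1: numpy.where

import numpy as np
import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:

    # create new column to identify the authors who viewed their own articles
    # (rows where author and viewer differ get NaN, which makes the column float)
    views["id"] = np.where(views["author_id"] == views["viewer_id"],
                           views["author_id"],
                           np.nan)

    # drop the NaN rows, cast back to int, then de-duplicate and sort
    return views[["id"]].dropna().astype(int).drop_duplicates().sort_values(by="id")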

Solution 2: pandas.DataFrame.drop_duplicates

import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:

    # filter to identify the authors who viewed their own articles
    df = views[views['author_id'] == views['viewer_id']]

    # remove duplicate authors (reassign instead of using inplace=True,
    # which can trigger SettingWithCopyWarning on a filtered slice)
    df = df.drop_duplicates(subset=['author_id'])

    # sort by author_id in ascending order
    df = df.sort_values(by=['author_id'])

    # rename author_id to the expected output column
    df = df.rename(columns={'author_id': 'id'})

    return df[['id']]
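
One practical difference between the two approaches is the dtype of the ids. Filling non-matching rows with `np.nan` forces the column to float, while filtering with a boolean mask keeps the original integers. The sketch below checks this on the example Views table; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# example table copied from the problem statement (dates kept as plain strings)
views = pd.DataFrame({
    "article_id": [1, 1, 2, 2, 4, 3, 3],
    "author_id": [3, 3, 7, 7, 7, 4, 4],
    "viewer_id": [5, 6, 7, 6, 1, 4, 4],
    "view_date": ["2019-08-01", "2019-08-02", "2019-08-01", "2019-08-02",
                  "2019-07-22", "2019-07-21", "2019-07-21"],
})

same_person = views["author_id"] == views["viewer_id"]

# np.where with a NaN fallback: integers and NaN share a float column
with_nan = pd.Series(np.where(same_person, views["author_id"], np.nan))

# boolean filtering: only matching rows survive, so the ints stay ints
own_views = views.loc[same_person, "author_id"]

ids = sorted(own_views.unique().tolist())
print(with_nan.dtype, own_views.dtype, ids)
```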

Problem 05: Leetcode Easy 1683. Invalid Tweets

Table: Tweets
+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| tweet_id       | int     |
| content        | varchar |
+----------------+---------+
tweet_id is the primary key (column with unique values) for this table.
This table contains all the tweets in a social media app.

Write a solution to find the IDs of the invalid tweets. The tweet is invalid if the number of characters used in the content of the tweet is strictly greater than 15. Return the result table in any order. The result format is in the following example.

Input: 
Tweets table:
+----------+----------------------------------+
| tweet_id | content                          |
+----------+----------------------------------+
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |
+----------+----------------------------------+
Output:
+----------+
| tweet_id |
+----------+
| 2        |
+----------+
Explanation:
Tweet 1 has length = 14. It is a valid tweet.
Tweet 2 has length = 32. It is an invalid tweet.

Solution: pandas.Series.str.len

import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:

    # keep the tweets whose content is longer than 15 characters
    return tweets[tweets['content'].str.len() > 15][["tweet_id"]]

Problem 06: Leetcode Easy 1873. Calculate Special Bonus

Table: Employees
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| employee_id | int     |
| name        | varchar |
| salary      | int     |
+-------------+---------+
employee_id is the primary key (column with unique values) for this table.
Each row of this table indicates the employee ID, employee name, and salary.

Write a solution to calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee's name does not start with the character 'M'. The bonus of an employee is 0 otherwise. Return the result table ordered by employee_id. The result format is in the following example.

Input: 
Employees table:
+-------------+---------+--------+
| employee_id | name    | salary |
+-------------+---------+--------+
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |
+-------------+---------+--------+
Output:
+-------------+-------+
| employee_id | bonus |
+-------------+-------+
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |
+-------------+-------+
Explanation:
The employees with IDs 2 and 8 get 0 bonus because they have an even employee_id.
The employee with ID 3 gets 0 bonus because their name starts with 'M'.
The rest of the employees get a 100% bonus.

Solution 1: numpy.where

import numpy as np
import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:

    # bonus = salary for odd ids whose name does not start with "M", else 0
    employees["bonus"] = np.where((employees["employee_id"] % 2 == 1) & (employees["name"].str[0] != "M"),
                                  employees["salary"],
                                  0)

    return employees[["employee_id", "bonus"]].sort_values(by="employee_id")

Solution 2: pandas.DataFrame.apply, lambda function

import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:

    # row-wise version of the same rule: odd id and name not starting with "M"
    employees['bonus'] = employees.apply(
        lambda x: x['salary'] if x['employee_id'] % 2 and not x['name'].startswith('M') else 0,
        axis=1
    )

    df = employees[['employee_id', 'bonus']].sort_values('employee_id')
    return df
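
Both solutions above add a `bonus` column to the DataFrame that was passed in. If you would rather leave the input untouched, one option is `Series.where`, which keeps a value where the condition holds and substitutes a fallback elsewhere. This is a sketch under that assumption; `special_bonus_where` is a hypothetical name, and the data is the example Employees table.

```python
import pandas as pd

def special_bonus_where(employees: pd.DataFrame) -> pd.DataFrame:
    # eligible: odd employee_id AND name not starting with "M"
    eligible = (employees["employee_id"] % 2 == 1) & ~employees["name"].str.startswith("M")
    # copy the id column so the caller's DataFrame is not modified
    out = employees[["employee_id"]].copy()
    # keep the salary where eligible, otherwise use 0
    out["bonus"] = employees["salary"].where(eligible, 0)
    return out.sort_values("employee_id")

# example table copied from the problem statement
employees = pd.DataFrame({
    "employee_id": [2, 3, 7, 8, 9],
    "name": ["Meir", "Michael", "Addilyn", "Juan", "Kannon"],
    "salary": [3000, 3800, 7400, 6100, 7700],
})

result = special_bonus_where(employees)
print(result)
```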

Problem 07: Leetcode Easy 1667. Fix Names in a Table

Table: Users
+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| name           | varchar |
+----------------+---------+
user_id is the primary key (column with unique values) for this table.
This table contains the ID and the name of the user. The name consists of only lowercase and uppercase characters.

Write a solution to fix the names so that only the first character is uppercase and the rest are lowercase. Return the result table ordered by user_id. The result format is in the following example.

Input: 
Users table:
+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | aLice |
| 2       | bOB   |
+---------+-------+
Output:
+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | Alice |
| 2       | Bob   |
+---------+-------+

Solution 1: pandas.Series.str.capitalize

import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:

    # capitalize() upper-cases the first character and lower-cases the rest
    users["name"] = users["name"].str.capitalize()

    return users.sort_values(by="user_id")

Solution 2: pandas.Series.apply, str.capitalize

import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:

    # apply Python's built-in str.capitalize to each name
    users["name"] = users["name"].apply(lambda x: x.capitalize())

    return users.sort_values(by="user_id")

Solution 3: pandas.Series.str.upper, pandas.Series.str.lower

import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:

    # upper-case the first character, lower-case the remainder, and join them
    users["name"] = users["name"].str[0].str.upper() + users["name"].str[1:].str.lower()

    return users.sort_values("user_id")

Problem 08: Leetcode Easy 1517. Find Users With Valid E-Mails

Table: Users
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
| mail          | varchar |
+---------------+---------+
user_id is the primary key (column with unique values) for this table.
This table contains information of the users signed up in a website. Some e-mails are invalid.

Write a solution to find the users who have valid emails. A valid e-mail has a prefix name and a domain where:

  • The prefix name is a string that may contain letters (upper or lower case), digits, underscore '_', period '.', and/or dash '-'. The prefix name must start with a letter.
  • The domain is '@leetcode.com'.

Return the result table in any order. The result format is in the following example.

Input: 
Users table:
+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | winston@leetcode.com    |
| 2       | Jonathan  | jonathanisgreat         |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |
| 5       | Marwan    | quarz#2020@leetcode.com |
| 6       | David     | david69@gmail.com       |
| 7       | Shapiro   | .shapo@leetcode.com     |
+---------+-----------+-------------------------+
Output:
+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | winston@leetcode.com    |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |
+---------+-----------+-------------------------+
Explanation:
The mail of user 2 does not have a domain.
The mail of user 5 has the # sign which is not allowed.
The mail of user 6 does not have the leetcode domain.
The mail of user 7 starts with a period.

Solution: pandas.Series.str.match

import pandas as pd

def valid_emails(users: pd.DataFrame) -> pd.DataFrame:

    # prefix: one letter, then letters/digits/_/./-; domain: exactly @leetcode.com
    users = users[users["mail"].str.match(r'^[a-zA-Z][a-zA-Z0-9_.-]*@leetcode[.]com$')]

    return users
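
To see which example addresses the pattern accepts, you can try it with Python's `re` module, whose matching rules `Series.str.match` follows for ordinary object-dtype string columns. The sketch below also adds one hypothetical address with a trailing newline (not part of the LeetCode example) to illustrate a subtlety: in Python regexes `$` also matches just before a final newline, whereas `re.fullmatch` requires the whole string to match. pandas offers `Series.str.fullmatch` as the counterpart.

```python
import re

# same character classes as the solution above, without the ^ and $ anchors
PATTERN = r"[a-zA-Z][a-zA-Z0-9_.-]*@leetcode[.]com"

# the seven example addresses, plus one hypothetical entry ending in "\n"
mails = [
    "winston@leetcode.com",
    "jonathanisgreat",
    "bella-@leetcode.com",
    "sally.come@leetcode.com",
    "quarz#2020@leetcode.com",
    "david69@gmail.com",
    ".shapo@leetcode.com",
    "winston@leetcode.com\n",   # hypothetical edge case, not in the source table
]

# anchored with ^...$ as in the solution
anchored = re.compile("^" + PATTERN + "$")
valid_with_dollar = [bool(anchored.match(m)) for m in mails]

# fullmatch: the entire string, including any trailing newline, must fit
valid_fullmatch = [bool(re.fullmatch(PATTERN, m)) for m in mails]

for mail, a, b in zip(mails, valid_with_dollar, valid_fullmatch):
    print(repr(mail), a, b)
```

The two checks agree on all seven example rows and differ only on the hypothetical newline case.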

Problem 09: Leetcode Easy 1527. Patients With a Condition

Table: Patients
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| patient_id   | int     |
| patient_name | varchar |
| conditions   | varchar |
+--------------+---------+
patient_id is the primary key (column with unique values) for this table.
'conditions' contains 0 or more codes separated by spaces.
This table contains information of the patients in the hospital.

Write a solution to find the patient_id, patient_name, and conditions of the patients who have Type I Diabetes. Type I Diabetes always starts with DIAB1 prefix. Return the result table in any order. The result format is in the following example.

Input: 
Patients table:
+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 1          | Daniel       | YFEV COUGH   |
| 2          | Alice        |              |
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |
| 5          | Alain        | DIAB201      |
+------------+--------------+--------------+
Output:
+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |
+------------+--------------+--------------+
Explanation: Bob and George both have a condition that starts with DIAB1.

Solution 1: pandas.Series.str.contains

import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:

    # \b anchors DIAB1 to the start of a code, so it cannot match mid-word
    return patients[patients["conditions"].str.contains(r'\bDIAB1\w*\b')]

Solution 2: pandas.Series.str.startswith

import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:

    # DIAB1 is either the first code or follows a space
    return patients[patients["conditions"].str.startswith("DIAB1") | patients["conditions"].str.contains(" DIAB1", regex=False)]
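
Why not simply search for the substring `DIAB1`? Because a code could contain it in the middle. The sketch below adds one hypothetical patient with the made-up code `SADIAB100` (not in the LeetCode example) and compares a plain substring search with the word-boundary pattern.

```python
import pandas as pd

# example table from the problem statement, plus one hypothetical row (id 6)
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "patient_name": ["Daniel", "Alice", "Bob", "George", "Alain", "Eve"],
    "conditions": ["YFEV COUGH", "", "DIAB100 MYOP", "ACNE DIAB100",
                   "DIAB201", "SADIAB100"],
})

# plain substring search: also hits "SADIAB100"
naive = patients[patients["conditions"].str.contains("DIAB1", regex=False)]

# \b requires DIAB1 to begin at a word boundary, i.e. at the start of a code
anchored = patients[patients["conditions"].str.contains(r"\bDIAB1")]

print(naive["patient_id"].tolist())
print(anchored["patient_id"].tolist())
```

Only the anchored version leaves out the hypothetical row, which is the behaviour the prefix rule in the problem asks for.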

Problem 10: Leetcode Medium 2738. Count Occurrences in Text

Table: Files
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| file_name   | varchar |
| content     | text    |
+-------------+---------+
file_name is the column with unique values of this table.
Each row contains file_name and the content of that file.

Write a solution to find the number of files that have at least one occurrence of the words ‘bull’ and ‘bear’ as a standalone word, respectively, disregarding any instances where it appears without space on either side (e.g. ‘bullet’, ‘bears’, ‘bull.’, or ‘bear’ at the beginning or end of a sentence will not be considered). Return the word ‘bull’ and ‘bear’ along with the corresponding number of occurrences in any order. The result format is in the following example.

Input: 
Files table:
+------------+----------------------------------------------------------------------------------+
| file_name | content |
+------------+----------------------------------------------------------------------------------+
| draft1.txt | The stock exchange predicts a bull market which would make many investors happy. |
| draft2.txt | The stock exchange predicts a bull market which would make many investors happy, |
| | but analysts warn of possibility of too much optimism and that in fact we are |
| | awaiting a bear market. |
| draft3.txt | The stock exchange predicts a bull market which would make many investors happy, |
| | but analysts warn of possibility of too much optimism and that in fact we are |
| | awaiting a bear market. As always predicting the future market is an uncertain |
| | game and all investors should follow their instincts and best practices. |
+------------+----------------------------------------------------------------------------------+
Output:
+------+-------+
| word | count |
+------+-------+
| bull | 3     |
| bear | 2     |
+------+-------+
Explanation:
- The word "bull" appears 1 time in "draft1.txt", 1 time in "draft2.txt", and 1 time in "draft3.txt". Therefore, the total number of occurrences for the word "bull" is 3.
- The word "bear" appears 1 time in "draft2.txt", and 1 time in "draft3.txt". Therefore, the total number of occurrences for the word "bear" is 2.

Solution 1: numpy.where

import numpy as np
import pandas as pd

def count_occurrences(files: pd.DataFrame) -> pd.DataFrame:
    # flag each file with 1 if it contains the standalone word, else 0
    files["bull"] = np.where(files["content"].str.contains(" bull "), 1, 0)
    files["bear"] = np.where(files["content"].str.contains(" bear "), 1, 0)

    # sum the flags per word and reshape the totals into (word, count) rows
    output = pd.DataFrame(files[["bull", "bear"]].sum().reset_index())
    output = output.rename(columns={"index": "word", 0: "count"})

    return output

Solution 2: pandas.Series.str.contains

import pandas as pd

def count_occurrences(files: pd.DataFrame) -> pd.DataFrame:

    # no capture group in the pattern: str.contains warns when it sees one
    bull_count = files[files["content"].str.contains(r"\s+bull\s+",
                                                     regex=True, case=False)]["file_name"].nunique()

    bear_count = files[files["content"].str.contains(r"\s+bear\s+",
                                                     regex=True, case=False)]["file_name"].nunique()

    data = {"word": ["bull", "bear"],
            "count": [bull_count, bear_count]}

    return pd.DataFrame(data)

Solution 3:

import pandas as pd

def count_occurrences(files: pd.DataFrame) -> pd.DataFrame:

    # summing a boolean Series counts the files that contain the word
    bull_count = files["content"].str.contains(" bull ", case=False).sum()
    bear_count = files["content"].str.contains(" bear ", case=False).sum()

    data = {"word": ["bull", "bear"], "count": [bull_count, bear_count]}

    return pd.DataFrame(data)
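
To check the counting logic against the example, the sketch below rebuilds the Files table and loops over the words instead of repeating the line for each one. It assumes the wrapped lines of each file in the example are joined by single spaces, and `count_words` is a hypothetical helper name.

```python
import pandas as pd

# sentence fragments from the example; joining them with spaces is an assumption
intro = "The stock exchange predicts a bull market which would make many investors happy"
warning = ("but analysts warn of possibility of too much optimism and that "
           "in fact we are awaiting a bear market.")
closing = ("As always predicting the future market is an uncertain game and "
           "all investors should follow their instincts and best practices.")

files = pd.DataFrame({
    "file_name": ["draft1.txt", "draft2.txt", "draft3.txt"],
    "content": [
        intro + ".",
        intro + ", " + warning,
        intro + ", " + warning + " " + closing,
    ],
})

def count_words(files: pd.DataFrame, words=("bull", "bear")) -> pd.DataFrame:
    rows = []
    for word in words:
        # regex=False: look for the literal word with a space on each side
        hits = files["content"].str.contains(f" {word} ", regex=False)
        # each file counts at most once, however often the word appears in it
        rows.append({"word": word, "count": int(hits.sum())})
    return pd.DataFrame(rows)

result = count_words(files)
print(result)
```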
