Learn Pandas in 15 Questions from LeetCode
LeetCode has launched a study plan “Introduction to Pandas”, with 15 questions to learn the basics of Pandas.
Problem 1: Leetcode 2877. Create a DataFrame from List
Write a solution to create a DataFrame from a 2D list called student_data. This 2D list contains the IDs and ages of some students. The DataFrame should have two columns, student_id and age, and be in the same order as the original 2D list. The result format is in the following example.
[1, 15],
[2, 11],
[3, 11],
[4, 20]
| student_id | age |
| 1 | 15 |
| 2 | 11 |
| 3 | 11 |
| 4 | 20 |
A DataFrame was created on top of student_data, with two columns named student_id and age.
Solution: pandas.DataFrame
from typing import List
import pandas as pd
def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    # Each inner list becomes one row; the column names are given explicitly
    return pd.DataFrame(student_data, columns=['student_id', 'age'])
Time complexity: Creating the DataFrame from the 2D list takes O(n) time, where n is the number of rows in the student_data list.
Space complexity: The space complexity is also O(n) as we create a DataFrame to store the data.
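As a quick check of the constructor call above, here is a minimal sketch that builds the DataFrame from the sample input in the problem statement. The variable name df is only illustrative.

```python
import pandas as pd

# Sample input from the problem statement: [student_id, age] pairs
student_data = [[1, 15], [2, 11], [3, 11], [4, 20]]

# Row order follows the order of the inner lists
df = pd.DataFrame(student_data, columns=["student_id", "age"])
print(df)
```

Without the columns argument, pandas would label the columns 0 and 1, so passing the names explicitly is what gives the required header.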
Problem 2: Leetcode 2878. Get the Size of a DataFrame
DataFrame players:
player_id (int), name (object), age (int), position (object), team (object)
Write a solution to calculate and display the number of rows and columns of players. Return the result as an array: [number of rows, number of columns]
The result format is in the following example.
| player_id | name | age | position | team |
| 846 | Mason | 21 | Forward | RealMadrid |
| 749 | Riley | 30 | Winger | Barcelona |
| 155 | Bob | 28 | Striker | ManchesterUnited |
| 583 | Isabella | 32 | Goalkeeper | Liverpool |
| 388 | Zachary | 24 | Midfielder | BayernMunich |
| 883 | Ava | 23 | Defender | Chelsea |
| 355 | Violet | 18 | Striker | Juventus |
| 247 | Thomas | 27 | Striker | ParisSaint-Germain |
| 761 | Jack | 33 | Midfielder | ManchesterCity |
| 642 | Charlie | 36 | Center-back | Arsenal |
[10, 5]
This DataFrame contains 10 rows and 5 columns.
Solution: pandas.DataFrame.shape
from typing import List
import pandas as pd
def getDataframeSize(players: pd.DataFrame) -> List[int]:
    # .shape is a (rows, columns) tuple; convert it to a list
    return list(players.shape)
Time complexity: The time complexity is O(1) because accessing the .shape attribute of a DataFrame is a constant-time operation.
Space complexity: The space complexity is O(1) as we only return a small array containing two integers.
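The following sketch shows what .shape returns on a small, made-up subset of the players table (three rows and three columns, chosen only for illustration).

```python
import pandas as pd

# A reduced version of the players table, for illustration only
players = pd.DataFrame({
    "player_id": [846, 749, 155],
    "name": ["Mason", "Riley", "Bob"],
    "age": [21, 30, 28],
})

# .shape is a tuple, so it can be unpacked or converted to a list
n_rows, n_cols = players.shape
size = list(players.shape)
print(size)
```

Because .shape is a tuple, the list(...) conversion is only needed to match the return type the problem asks for.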
Problem 3: Leetcode 2879. Display the First Three Rows
DataFrame: employees
| Column Name | Type |
| employee_id | int |
| name | object |
| department | object |
| salary | int |
Write a solution to display the first 3 rows of this DataFrame.
DataFrame employees
| employee_id | name | department | salary |
| 3 | Bob | Operations | 48675 |
| 90 | Alice | Sales | 11096 |
| 9 | Tatiana | Engineering | 33805 |
| 60 | Annabelle | InformationTechnology | 37678 |
| 49 | Jonathan | HumanResources | 23793 |
| 43 | Khaled | Administration | 40454 |
| employee_id | name | department | salary |
| 3 | Bob | Operations | 48675 |
| 90 | Alice | Sales | 11096 |
| 9 | Tatiana | Engineering | 33805 |
Only the first 3 rows are displayed.
Solution: pandas.DataFrame.head
import pandas as pd
def selectFirstRows(employees: pd.DataFrame) -> pd.DataFrame:
    return employees.head(3)
Time complexity: O(1) with respect to the number of rows, because .head(3) copies at most three rows regardless of how large the DataFrame is.
Space complexity: O(1), since the returned DataFrame contains at most three rows.
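A small sketch of how .head behaves, using the first four rows of the sample employees table. The names first_three and short are illustrative.

```python
import pandas as pd

employees = pd.DataFrame({
    "employee_id": [3, 90, 9, 60],
    "name": ["Bob", "Alice", "Tatiana", "Annabelle"],
    "salary": [48675, 11096, 33805, 37678],
})

# The first three rows, in their original order
first_three = employees.head(3)

# Asking for more rows than exist simply returns everything available
short = employees.head(2).head(3)
print(first_three)
```

Note that .head does not raise an error when the DataFrame has fewer rows than requested.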
Problem 4: Leetcode 2880. Select Data
DataFrame students
| Column Name | Type |
| student_id | int |
| name | object |
| age | int |
Write a solution to select the name and age of the student with student_id = 101. The result format is in the following example.
| student_id | name | age |
| 101 | Ulysses | 13 |
| 53 | William | 10 |
| 128 | Henry | 6 |
| 3 | Henry | 11 |
| name | age |
| Ulysses | 13 |
Student Ulysses has student_id = 101, we select the name and age.
Solution: Indexing and selecting data
import pandas as pd
def selectData(students: pd.DataFrame) -> pd.DataFrame:
    # Filter rows with a boolean mask, then keep only the requested columns
    return students[students["student_id"] == 101][["name", "age"]]
Time complexity: The time complexity depends on the size of the ‘students’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
Space complexity: The space complexity is also dependent on the size of the ‘students’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
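The row filter and the column selection can also be written as a single .loc call. This is an alternative sketch, not the solution used above; both forms give the same result on the sample data.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})

# .loc takes the row mask and the column list in one indexing step
result = students.loc[students["student_id"] == 101, ["name", "age"]]

# The chained form: filter rows first, then select columns
chained = students[students["student_id"] == 101][["name", "age"]]
print(result)
```

The .loc form avoids building the intermediate filtered DataFrame, which is mostly a matter of style at this size.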
Problem 5: 2881. Create a New Column
DataFrame employees
| Column Name | Type |
| name | object |
| salary | int |
A company plans to provide its employees with a bonus. Write a solution to create a new column named bonus that contains the doubled values of the salary column. The result format is in the following example.
DataFrame employees
| name | salary |
| Piper | 4548 |
| Grace | 28150 |
| Georgia | 1103 |
| Willow | 6593 |
| Finn | 74576 |
| Thomas | 24433 |
| name | salary | bonus |
| Piper | 4548 | 9096 |
| Grace | 28150 | 56300 |
| Georgia | 1103 | 2206 |
| Willow | 6593 | 13186 |
| Finn | 74576 | 149152 |
| Thomas | 24433 | 48866 |
A new column bonus is created by doubling the value in the column salary.
import pandas as pd
def createBonusColumn(employees: pd.DataFrame) -> pd.DataFrame:
    # Assigning to a new column label adds the column to the DataFrame
    employees["bonus"] = employees["salary"]*2
    return employees
Time complexity: The time complexity depends on the size of the ‘employees’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
Space complexity: The space complexity is also dependent on the size of the ‘employees’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
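If the caller's DataFrame should stay untouched, DataFrame.assign is one alternative: it returns a new DataFrame with the extra column. This is a sketch of that variant, not the solution above; the name with_bonus is illustrative.

```python
import pandas as pd

employees = pd.DataFrame({
    "name": ["Piper", "Grace"],
    "salary": [4548, 28150],
})

# assign returns a new DataFrame and leaves `employees` unchanged
with_bonus = employees.assign(bonus=employees["salary"] * 2)
print(with_bonus)
```

The assignment form used in the solution mutates the input, which is fine on LeetCode but may surprise callers elsewhere.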
Problem 6: 2882. Drop Duplicate Rows
DataFrame customers
| Column Name | Type |
| customer_id | int |
| name | object |
| email | object |
There are some duplicate rows in the DataFrame based on the email column. Write a solution to remove these duplicate rows and keep only the first occurrence. The result format is in the following example.
Example 1:
| customer_id | name | email |
| 1 | Ella | emily@example.com |
| 2 | David | michael@example.com |
| 3 | Zachary | sarah@example.com |
| 4 | Alice | john@example.com |
| 5 | Finn | john@example.com |
| 6 | Violet | alice@example.com |
| customer_id | name | email |
| 1 | Ella | emily@example.com |
| 2 | David | michael@example.com |
| 3 | Zachary | sarah@example.com |
| 4 | Alice | john@example.com |
| 6 | Violet | alice@example.com |
Alice (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of this email is retained.
Solution: pandas.DataFrame.drop_duplicates
import pandas as pd
def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # Rows are compared on the email column only; the first match is kept
    return customers.drop_duplicates(subset=["email"], keep="first")
Time complexity: The time complexity depends on the size of the ‘customers’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
Space complexity: The space complexity is also dependent on the size of the ‘customers’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
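To make the effect of the keep argument concrete, here is a small sketch on a reduced version of the customers table. The names first and last are illustrative.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 4, 5, 6],
    "name": ["Ella", "Alice", "Finn", "Violet"],
    "email": ["emily@example.com", "john@example.com",
              "john@example.com", "alice@example.com"],
})

# keep="first" retains Alice (id 4); keep="last" would retain Finn (id 5)
first = customers.drop_duplicates(subset=["email"], keep="first")
last = customers.drop_duplicates(subset=["email"], keep="last")
print(first)
```

keep="first" is the default, so passing it explicitly is only for readability.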
Problem 7: 2883. Drop Missing Data
DataFrame students
| Column Name | Type |
| student_id | int |
| name | object |
| age | int |
Some rows have missing values in the name column. Write a solution to remove the rows with missing values. The result format is in the following example.
| student_id | name | age |
| 32 | Piper | 5 |
| 217 | None | 19 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |
| student_id | name | age |
| 32 | Piper | 5 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |
The student with id 217 has an empty value in the name column, so that row is removed.
Solution: pandas.DataFrame.dropna
import pandas as pd
def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    # Solution 1: drop the rows whose name is missing
    return students.dropna(subset=["name"])
    # Solution 2 (alternative): keep the rows whose name is not null
    # return students[students["name"].notnull()]
Time complexity: The time complexity depends on the size of the ‘students’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
Space complexity: The space complexity is O(n) as we are creating a new DataFrame containing only the rows with non-null ‘name’ values.
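The subset argument matters when other columns can also contain missing values. The sketch below uses made-up data in which one row is missing a name and another is missing an age; the names by_name and any_column are illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [32, 217, 779],
    "name": ["Piper", None, "Georgia"],
    "age": [5, 19, None],
})

# Only rows with a missing name are dropped; the missing age is tolerated
by_name = students.dropna(subset=["name"])

# Without subset, a missing value in any column drops the row
any_column = students.dropna()
print(by_name)
```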
Problem 8: 2884. Modify Columns
DataFrame employees
| Column Name | Type |
| name | object |
| salary | int |
A company intends to give its employees a pay rise. Write a solution to modify the salary column by multiplying each salary by 2. The result format is in the following example.
DataFrame employees
| name | salary |
| Jack | 19666 |
| Piper | 74754 |
| Mia | 62509 |
| Ulysses | 54866 |
| name | salary |
| Jack | 39332 |
| Piper | 149508 |
| Mia | 125018 |
| Ulysses | 109732 |
Every salary has been doubled.
import pandas as pd
def modifySalaryColumn(employees: pd.DataFrame) -> pd.DataFrame:
    # Overwrite the existing column with the doubled values
    employees["salary"] = employees["salary"]*2
    return employees
Time complexity: The time complexity depends on the size of the ‘employees’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
Space complexity: The multiplication builds a new Series of n values before it replaces the ‘salary’ column, so the operation needs O(n) temporary space. No new DataFrame is created; the returned object is the input DataFrame itself.
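One consequence of assigning back to the column is that the caller's DataFrame is changed. The sketch below reuses the solution function on two sample rows to show this; the names original and result are illustrative.

```python
import pandas as pd

def modifySalaryColumn(employees: pd.DataFrame) -> pd.DataFrame:
    employees["salary"] = employees["salary"] * 2
    return employees

original = pd.DataFrame({
    "name": ["Jack", "Piper"],
    "salary": [19666, 74754],
})

result = modifySalaryColumn(original)

# The function returns the same object it received, with salary doubled
print(result is original)
print(original)
```

Passing employees.copy() instead would be one way to keep the original values, at the cost of an extra copy.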
Problem 9: 2885. Rename Columns
DataFrame students
| Column Name | Type |
| id | int |
| first | object |
| last | object |
| age | int |
Write a solution to rename the columns as follows: id to student_id, first to first_name, last to last_name, and age to age_in_years.
The result format is in the following example.
Example 1:
| id | first | last | age |
| 1 | Mason | King | 6 |
| 2 | Ava | Wright | 7 |
| 3 | Taylor | Hall | 16 |
| 4 | Georgia | Thompson | 18 |
| 5 | Thomas | Moore | 10 |
| student_id | first_name | last_name | age_in_years |
| 1 | Mason | King | 6 |
| 2 | Ava | Wright | 7 |
| 3 | Taylor | Hall | 16 |
| 4 | Georgia | Thompson | 18 |
| 5 | Thomas | Moore | 10 |
The column names are changed accordingly.
Solution: pandas.DataFrame.rename
import pandas as pd
def renameColumns(students: pd.DataFrame) -> pd.DataFrame:
    # Map each old column name to its new name
    return students.rename(columns={"id": "student_id",
                                    "first": "first_name",
                                    "last": "last_name",
                                    "age": "age_in_years"})
Time complexity: Updating the column labels takes time proportional to the number of columns, not the number of rows. However, rename returns a new DataFrame, and depending on the pandas version and copy settings it may also copy the underlying data, which costs O(n) for n rows.
Space complexity: rename does not modify the ‘students’ DataFrame in place; it returns a new DataFrame. The extra space is O(n) if the data is copied and close to O(1) if the result shares the original data.
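A short sketch showing that rename returns a new DataFrame and leaves the input's column names as they were. It uses one row of the sample data; the name renamed is illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    "id": [1], "first": ["Mason"], "last": ["King"], "age": [6],
})

renamed = students.rename(columns={
    "id": "student_id",
    "first": "first_name",
    "last": "last_name",
    "age": "age_in_years",
})

# The original keeps its old labels; only the returned frame is renamed
print(list(students.columns))
print(list(renamed.columns))
```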
Problem 10: 2886. Change Data Type
DataFrame students
| Column Name | Type |
| student_id | int |
| name | object |
| age | int |
| grade | float |
Write a solution to correct the errors: The grade column is stored as floats, convert it to integers. The result format is in the following example.
Example 1:
DataFrame students:
| student_id | name | age | grade |
| 1 | Ava | 6 | 73.0 |
| 2 | Kate | 15 | 87.0 |
| student_id | name | age | grade |
| 1 | Ava | 6 | 73 |
| 2 | Kate | 15 | 87 |
The data type of the column grade is converted to int.
Solution: pandas.DataFrame.astype
import pandas as pd
def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    # astype returns a converted copy, which replaces the original column
    students["grade"] = students["grade"].astype("int")
    return students
Time complexity: O(n), where n is the number of rows, because astype converts every value in the ‘grade’ column.
Space complexity: O(n), because astype returns a new Series holding the converted values, which then replaces the ‘grade’ column.
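The conversion can be checked by inspecting the column dtype before and after. In this sketch, DataFrame.astype with a dict is used as an alternative to the column assignment above, and the names converted and truncated are illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2],
    "name": ["Ava", "Kate"],
    "age": [6, 15],
    "grade": [73.0, 87.0],
})

# A dict maps column names to target types; other columns are untouched
converted = students.astype({"grade": int})
print(converted.dtypes)

# Float-to-int conversion truncates the fractional part instead of rounding
truncated = pd.Series([73.9, 87.2]).astype("int")
print(truncated.tolist())
```

Since the fractional part is simply dropped, values that need rounding should be rounded before the cast.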
Problem 11: 2887. Fill Missing Data
DataFrame products
| Column Name | Type |
| name | object |
| quantity | int |
| price | int |
Write a solution to fill in the missing value as 0 in the quantity column. The result format is in the following example.
Example 1:
| name | quantity | price |
| Wristwatch | None | 135 |
| WirelessEarbuds | None | 821 |
| GolfClubs | 779 | 9319 |
| Printer | 849 | 3051 |
| name | quantity | price |
| Wristwatch | 0 | 135 |
| WirelessEarbuds | 0 | 821 |
| GolfClubs | 779 | 9319 |
| Printer | 849 | 3051 |
The missing quantities for Wristwatch and WirelessEarbuds are filled with 0.
Solution: pandas.DataFrame.fillna
import pandas as pd
def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    # Replace missing quantities with 0 and write the result back
    products["quantity"] = products["quantity"].fillna(0)
    return products
Time complexity: The time complexity depends on the size of the ‘products’ DataFrame but can be considered as O(n) because it involves updating values in a column.
Space complexity: O(n), because fillna returns a new Series that is then assigned back to the ‘quantity’ column. No new DataFrame is created.
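A small sketch of fillna on two sample rows. The final astype(int) is an extra step added here only to get integer values back, since a numeric column holding missing values is normally stored as floats; the name filled is illustrative.

```python
import pandas as pd

products = pd.DataFrame({
    "name": ["Wristwatch", "GolfClubs"],
    "quantity": [None, 779],
    "price": [135, 9319],
})

# Fill the gaps with 0, then convert the column back to integers
filled = products["quantity"].fillna(0).astype(int)
print(filled.tolist())
```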
Problem 12: 2888. Reshape Data: Concatenate
DataFrame df1
| Column Name | Type |
| student_id | int |
| name | object |
| age | int |
DataFrame df2
| Column Name | Type |
| student_id | int |
| name | object |
| age | int |
Write a solution to concatenate these two DataFrames vertically into one DataFrame. The result format is in the following example.
| student_id | name | age |
| 1 | Mason | 8 |
| 2 | Ava | 6 |
| 3 | Taylor | 15 |
| 4 | Georgia | 17 |
| student_id | name | age |
| 5 | Leo | 7 |
| 6 | Alex | 7 |
| student_id | name | age |
| 1 | Mason | 8 |
| 2 | Ava | 6 |
| 3 | Taylor | 15 |
| 4 | Georgia | 17 |
| 5 | Leo | 7 |
| 6 | Alex | 7 |
The two DataFrames are stacked vertically, and their rows are combined.
Solution: pandas.concat
import pandas as pd
def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    # axis=0 stacks the rows of df2 below the rows of df1
    return pd.concat([df1, df2], axis=0)
Time complexity: The time complexity depends on the sizes of ‘df1’ and ‘df2’ but can be considered as O(n), where n is the combined number of rows in both DataFrames.
Space complexity: The space complexity is O(n) because we create a new DataFrame containing the concatenated data.
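One detail worth knowing is that pd.concat keeps each input's own index labels. The sketch below, on reduced sample data, contrasts that with ignore_index=True; the names stacked and renumbered are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"]})
df2 = pd.DataFrame({"student_id": [5, 6], "name": ["Leo", "Alex"]})

# Index labels from both inputs are preserved, so they repeat
stacked = pd.concat([df1, df2], axis=0)

# ignore_index=True renumbers the rows from 0
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered)
```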
Problem 13: 2889. Reshape Data: Pivot
DataFrame weather
| Column Name | Type |
| city | object |
| month | object |
| temperature | int |
Write a solution to pivot the data so that each row represents temperatures for a specific month, and each city is a separate column. The result format is in the following example.
Example 1:
| city | month | temperature |
| Jacksonville | January | 13 |
| Jacksonville | February | 23 |
| Jacksonville | March | 38 |
| Jacksonville | April | 5 |
| Jacksonville | May | 34 |
| ElPaso | January | 20 |
| ElPaso | February | 6 |
| ElPaso | March | 26 |
| ElPaso | April | 2 |
| ElPaso | May | 43 |
| month | ElPaso | Jacksonville |
| April | 2 | 5 |
| February | 6 | 23 |
| January | 20 | 13 |
| March | 26 | 38 |
| May | 43 | 34 |
The table is pivoted, each column represents a city, and each row represents a specific month.
Solution: pandas.DataFrame.pivot
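import pandas as pd
def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
    # One row per month, one column per city, temperatures as the values
    return weather.pivot(index="month",
                         columns="city",
                         values="temperature")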
Time complexity: The time complexity depends on the size of the ‘weather’ DataFrame but can be considered as O(n), where n is the number of rows in the DataFrame.
Space complexity: The space complexity is O(n) because we create a new DataFrame in the pivoted format.
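As a sketch of the pivot described in the problem statement, the example below reshapes four of the sample rows, assuming the column names city, month and temperature from the table above. The name pivoted is illustrative.

```python
import pandas as pd

weather = pd.DataFrame({
    "city": ["Jacksonville", "Jacksonville", "ElPaso", "ElPaso"],
    "month": ["January", "February", "January", "February"],
    "temperature": [13, 23, 20, 6],
})

# month becomes the row index, each city becomes a column
pivoted = weather.pivot(index="month", columns="city", values="temperature")
print(pivoted)
```

The row and column labels come out sorted, which matches the alphabetical month order in the expected output.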
Problem 14: 2890. Reshape Data: Melt
DataFrame report
| Column Name | Type |
| product | object |
| quarter_1 | int |
| quarter_2 | int |
| quarter_3 | int |
| quarter_4 | int |
Write a solution to reshape the data so that each row represents sales data for a product in a specific quarter. The result format is in the following example.
| product | quarter_1 | quarter_2 | quarter_3 | quarter_4 |
| Umbrella | 417 | 224 | 379 | 611 |
| SleepingBag | 800 | 936 | 93 | 875 |
| product | quarter | sales |
| Umbrella | quarter_1 | 417 |
| SleepingBag | quarter_1 | 800 |
| Umbrella | quarter_2 | 224 |
| SleepingBag | quarter_2 | 936 |
| Umbrella | quarter_3 | 379 |
| SleepingBag | quarter_3 | 93 |
| Umbrella | quarter_4 | 611 |
| SleepingBag | quarter_4 | 875 |
The DataFrame is reshaped from wide to long format. Each row represents the sales of a product in a quarter.
Solution: pandas.melt
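import pandas as pd
def meltTable(report: pd.DataFrame) -> pd.DataFrame:
    # Keep product as the identifier; turn the quarter columns into rows
    return pd.melt(report,
                   id_vars=["product"],
                   value_vars=["quarter_1", "quarter_2",
                               "quarter_3", "quarter_4"],
                   var_name="quarter",
                   value_name="sales")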
Time complexity: The time complexity depends on the size of the ‘report’ DataFrame but can be considered as O(n), where n is the number of cells in the DataFrame.
Space complexity: The space complexity is O(n) because we have created a new DataFrame in the melted format.
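The wide-to-long reshape can be sketched on two quarters of the sample data. The output column names quarter and sales are taken from the expected result table; the name melted is illustrative.

```python
import pandas as pd

report = pd.DataFrame({
    "product": ["Umbrella", "SleepingBag"],
    "quarter_1": [417, 800],
    "quarter_2": [224, 936],
})

# product stays as an identifier; each quarter column becomes its own rows
melted = pd.melt(
    report,
    id_vars=["product"],
    value_vars=["quarter_1", "quarter_2"],
    var_name="quarter",
    value_name="sales",
)
print(melted)
```

The rows are grouped by quarter, with the products in their original order inside each group.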
Problem 15: 2891. Method Chaining
DataFrame animals
| Column Name | Type |
| name | object |
| species | object |
| age | int |
| weight | int |
Write a solution to list the names of animals that weigh strictly more than 100 kilograms. Return the animals sorted by weight in descending order. The result format is in the following example.
DataFrame animals:
| name | species | age | weight |
| Tatiana | Snake | 98 | 464 |
| Khaled | Giraffe | 50 | 41 |
| Alex | Leopard | 6 | 328 |
| Jonathan | Monkey | 45 | 463 |
| Stefan | Bear | 100 | 50 |
| Tommy | Panda | 26 | 349 |
| name |
| Tatiana |
| Jonathan |
| Tommy |
| Alex |
All animals weighing more than 100 should be included in the results table.
Tatiana's weight is 464, Jonathan's weight is 463, Tommy's weight is 349, and Alex's weight is 328.
The results should be sorted in descending order of weight.
In Pandas, method chaining enables us to perform operations on a DataFrame without breaking up each operation into a separate line or creating multiple temporary variables. Can you complete this task in just one line of code using method chaining?
Solution: pandas.DataFrame.sort_values
import pandas as pd
def findHeavyAnimals(animals: pd.DataFrame) -> pd.DataFrame:
    # Filter by weight, sort heaviest first, then keep only the name column
    return animals.loc[animals["weight"]>100].sort_values(by="weight",
                                                          ascending=False)[["name"]]
Time complexity: Filtering the rows takes O(n), and sorting the m rows that pass the filter takes O(m * log(m)), so the overall cost is O(n * log(n)) in the worst case.
Space complexity: The space complexity is O(n) as we have created a new DataFrame containing the selected rows.
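A sketch of the filter, sort and select chain on four of the sample animals. Wrapping the chain in parentheses to put one step per line is a formatting choice made here; the name heavy is illustrative.

```python
import pandas as pd

animals = pd.DataFrame({
    "name": ["Tatiana", "Khaled", "Alex", "Jonathan"],
    "species": ["Snake", "Giraffe", "Leopard", "Monkey"],
    "weight": [464, 41, 328, 463],
})

heavy = (
    animals.loc[animals["weight"] > 100]        # keep animals over 100 kg
    .sort_values(by="weight", ascending=False)  # heaviest first
    [["name"]]                                  # return only the name column
)
print(heavy)
```

Double brackets around "name" keep the result a DataFrame rather than a Series, which is what the problem's return type expects.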