User Purchase Platform: Leetcode Pandas

Shahidullah Kawsar
4 min readJan 3, 2025

--

In the rapidly evolving landscape of digital commerce, businesses strive to comprehend how users interact with various platforms to optimize their strategies and enhance user experiences. Imagine having access to detailed data that captures user spending behavior across different devices over specific periods. In this blog post, we’ll explore a concise dataset that tracks user expenditures over multiple days, distinguishing between desktop and mobile platform usage.

View from the World’s Fair Marina Banquet, New York. Photo Credit: Tasnim and Kawsar
Table: Spending
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| user_id | int |
| spend_date | date |
| platform | str |
| amount | int |
+-------------+---------+
The table logs the history of the spending of users that make purchases from
an online shopping website that has a desktop and a mobile application.

(user_id, spend_date, platform) is the primary key (combination of columns
with unique values) of this table.

The platform column is a str (category) type of ('desktop', 'mobile').

Write a solution to find the total number of users and the total amount spent using the mobile-only, the desktop-only, and both mobile and desktop together for each date. Return the result table in any order. The result format is in the following example.

Example 1:
Input:
Spending table:
+---------+------------+----------+--------+
| user_id | spend_date | platform | amount |
+---------+------------+----------+--------+
| 1 | 2019-07-01 | mobile | 100 |
| 1 | 2019-07-01 | desktop | 100 |
| 2 | 2019-07-01 | mobile | 100 |
| 2 | 2019-07-02 | mobile | 100 |
| 3 | 2019-07-01 | desktop | 100 |
| 3 | 2019-07-02 | desktop | 100 |
+---------+------------+----------+--------+
Output:
+------------+----------+--------------+-------------+
| spend_date | platform | total_amount | total_users |
+------------+----------+--------------+-------------+
| 2019-07-01 | desktop | 100 | 1 |
| 2019-07-01 | mobile | 100 | 1 |
| 2019-07-01 | both | 200 | 1 |
| 2019-07-02 | desktop | 100 | 1 |
| 2019-07-02 | mobile | 100 | 1 |
| 2019-07-02 | both | 0 | 0 |
+------------+----------+--------------+-------------+
Explanation:
On 2019-07-01, user 1 purchased using both desktop and mobile,
user 2 purchased using mobile only and user 3 purchased using desktop only.
On 2019-07-02, user 2 purchased using mobile only,
user 3 purchased using desktop only and no one purchased using both platforms.

Solution 01: We can solve this problem using **Pandas** by following these steps:

Step 1: Group by spend_date and user_id to aggregate platforms and amounts

df = spending.groupby(['spend_date', 'user_id']).agg({
'platform': lambda x: set(x),
'amount': 'sum'
}).reset_index()

display(df)

Step 2: Categorize each user based on the platforms used

def categorize_platform(platforms):
if platforms == {'mobile'}:
return 'mobile'
elif platforms == {'desktop'}:
return 'desktop'
elif platforms == {'mobile', 'desktop'}:
return 'both'
else:
return 'other' # For any unexpected cases

df['platform'] = df['platform'].apply(categorize_platform)

display(df)

Step 3: Aggregate total_amount and total_users for each spend_date and platform_category

df = df.groupby(['spend_date', 'platform']).agg(
total_amount=('amount', 'sum'),
total_users=('user_id', 'nunique')
).reset_index()

display(df)

Step 4: Ensure all combinations of spend_date and platforms are present

# Define all possible categories
categories = ['mobile', 'desktop', 'both']

# Create a data frame with all combinations of spend_date and platform
all_combinations = pd.MultiIndex.from_product(
[spending['spend_date'].unique(), categories],
names=['spend_date', 'platform']
).to_frame(index=False)

display(all_combinations)

Step 5: Merge with the df to include missing combinations with default values

df = all_combinations.merge(
df,
on=['spend_date', 'platform'],
how='left'
).fillna({'total_amount': 0, 'total_users': 0})

# Convert total_amount and total_users to integer type
df['total_amount'] = df['total_amount'].astype(int)
df['total_users'] = df['total_users'].astype(int)

# Optional: Sort for better readability
df = df.sort_values(by=['spend_date', 'platform']).reset_index(drop=True)

# Display the final output
display(df)

Happy learning! Follow me: LinkedIn Instagram Medium profile or subscribe to my email list for the latest blogs, and click the clap button to support writing. Feel free to ask questions in the comment section. Don’t forget to share the blog link with your friends or LinkedIn connections.

References:
[1] Leetcode Source: https://leetcode.com/problems/user-purchase-platform/description/
[2] Full code: https://github.com/SKawsar/Pandas-Hard-Data-Preprocessing/blob/main/User%20Purchase%20Platform.ipynb

--

--

Shahidullah Kawsar
Shahidullah Kawsar

Written by Shahidullah Kawsar

Senior Data Scientist, IDARE, Houston, TX

No responses yet