User Purchase Platform: Leetcode Pandas
In the rapidly evolving landscape of digital commerce, businesses strive to comprehend how users interact with various platforms to optimize their strategies and enhance user experiences. Imagine having access to detailed data that captures user spending behavior across different devices over specific periods. In this blog post, we’ll explore a concise dataset that tracks user expenditures over multiple days, distinguishing between desktop and mobile platform usage.
Table: Spending
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| user_id | int |
| spend_date | date |
| platform | str |
| amount | int |
+-------------+---------+
The table logs the history of the spending of users that make purchases from
an online shopping website that has a desktop and a mobile application.
(user_id, spend_date, platform) is the primary key (combination of columns
with unique values) of this table.
The platform column is a str (category) type of ('desktop', 'mobile').
Write a solution to find the total number of users and the total amount spent using the mobile-only, the desktop-only, and both mobile and desktop together for each date. Return the result table in any order. The result format is in the following example.
Example 1:
Input:
Spending table:
+---------+------------+----------+--------+
| user_id | spend_date | platform | amount |
+---------+------------+----------+--------+
| 1 | 2019-07-01 | mobile | 100 |
| 1 | 2019-07-01 | desktop | 100 |
| 2 | 2019-07-01 | mobile | 100 |
| 2 | 2019-07-02 | mobile | 100 |
| 3 | 2019-07-01 | desktop | 100 |
| 3 | 2019-07-02 | desktop | 100 |
+---------+------------+----------+--------+
Output:
+------------+----------+--------------+-------------+
| spend_date | platform | total_amount | total_users |
+------------+----------+--------------+-------------+
| 2019-07-01 | desktop | 100 | 1 |
| 2019-07-01 | mobile | 100 | 1 |
| 2019-07-01 | both | 200 | 1 |
| 2019-07-02 | desktop | 100 | 1 |
| 2019-07-02 | mobile | 100 | 1 |
| 2019-07-02 | both | 0 | 0 |
+------------+----------+--------------+-------------+
Explanation:
On 2019-07-01, user 1 purchased using both desktop and mobile,
user 2 purchased using mobile only and user 3 purchased using desktop only.
On 2019-07-02, user 2 purchased using mobile only,
user 3 purchased using desktop only and no one purchased using both platforms.
Solution 01: We can solve this problem using **Pandas** by following these steps:
Step 1: Group by spend_date and user_id to aggregate platforms and amounts
df = spending.groupby(['spend_date', 'user_id']).agg({
'platform': lambda x: set(x),
'amount': 'sum'
}).reset_index()
display(df)
Step 2: Categorize each user based on the platforms used
def categorize_platform(platforms):
if platforms == {'mobile'}:
return 'mobile'
elif platforms == {'desktop'}:
return 'desktop'
elif platforms == {'mobile', 'desktop'}:
return 'both'
else:
return 'other' # For any unexpected cases
df['platform'] = df['platform'].apply(categorize_platform)
display(df)
Step 3: Aggregate total_amount and total_users for each spend_date and platform_category
df = df.groupby(['spend_date', 'platform']).agg(
total_amount=('amount', 'sum'),
total_users=('user_id', 'nunique')
).reset_index()
display(df)
Step 4: Ensure all combinations of spend_date and platforms are present
# Define all possible categories
categories = ['mobile', 'desktop', 'both']
# Create a data frame with all combinations of spend_date and platform
all_combinations = pd.MultiIndex.from_product(
[spending['spend_date'].unique(), categories],
names=['spend_date', 'platform']
).to_frame(index=False)
display(all_combinations)
Step 5: Merge with the df to include missing combinations with default values
df = all_combinations.merge(
df,
on=['spend_date', 'platform'],
how='left'
).fillna({'total_amount': 0, 'total_users': 0})
# Convert total_amount and total_users to integer type
df['total_amount'] = df['total_amount'].astype(int)
df['total_users'] = df['total_users'].astype(int)
# Optional: Sort for better readability
df = df.sort_values(by=['spend_date', 'platform']).reset_index(drop=True)
# Display the final output
display(df)
Happy learning! Follow me: LinkedIn Instagram Medium profile or subscribe to my email list for the latest blogs, and click the clap button to support writing. Feel free to ask questions in the comment section. Don’t forget to share the blog link with your friends or LinkedIn connections.
References:
[1] Leetcode Source: https://leetcode.com/problems/user-purchase-platform/description/
[2] Full code: https://github.com/SKawsar/Pandas-Hard-Data-Preprocessing/blob/main/User%20Purchase%20Platform.ipynb