Amazon Product Placement and Brand Bias Analysis

This project investigates spatial patterns and brand-related trends in Amazon search result data. The dataset, collected through browser extensions, includes the position (top/left coordinates), brand affiliation, and promotional status (e.g., Amazon Prime, Sponsored) of products shown in user searches.

The notebook explores whether Amazon-branded products receive more favorable placement than third-party products. I perform exploratory data analysis (EDA), calculate frequency metrics, and visualize spatial trends to assess potential bias in search result rankings.

This notebook focuses on data cleaning, visual pattern discovery, and generating early insights.

In [23]:
# Libraries import for data analysis
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from sklearn.cluster import KMeans

Load and Preview Data

In [24]:
# Data import
# Load the Amazon search-result dataset and preview the first few rows
data = pd.read_csv('C:/Users/Baljot/Desktop/Old School/Data 301/A3/Amazon_data.csv')
data.head()
Out[24]:
Unnamed: 0 base_spell subspell date_created_day Top Left asin is_targeted_brand search_result_amazonprime search_result_usedoptions ... used_offers subsave_option name_fit name_cosine_distance nonamefit nonamedistance brand major_brand has_amazon_brands has_other_brands
0 1 1 1 8/31/2022 226.00000 283.20001 B098QR8Q4N False False False ... False False 0.400000 0.186015 False NaN Warriors: False 0 0
1 2 1 1 8/31/2022 461.05002 283.20001 B000VYX8L8 False False False ... False False 0.293103 0.119305 False NaN Warriors False 0 0
2 3 1 1 8/31/2022 722.65002 283.20001 B0B6Y7SNT9 False False False ... False False 0.333333 0.303127 False NaN Warriors: False 0 0
3 4 1 1 8/31/2022 1341.45010 283.20001 B09RKBVZVJ False False False ... False False 0.363636 0.310618 False NaN Warriors: False 0 0
4 5 1 1 8/31/2022 1571.65000 283.20001 B09N8T1DWR False False False ... False False 0.363636 0.170995 False NaN Warriors False 0 0

5 rows × 68 columns
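
The `Unnamed: 0` column in the preview is what pandas typically produces when a CSV was written together with its index and then read back without `index_col`. The sketch below assumes that is what happened here; it uses a small in-memory CSV with illustrative values rather than the real file.

```python
import io
import pandas as pd

# Hypothetical two-row CSV whose first column has an empty header,
# as happens when a DataFrame is saved together with its index.
csv_text = ",base_spell,asin\n1,1,B098QR8Q4N\n2,1,B000VYX8L8\n"

# Default read: the unnamed first column shows up as 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))

# Treating the first column as the index avoids the extra column.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(indexed.columns))
```

If the assumption holds for Amazon_data.csv, passing `index_col=0` to the existing `pd.read_csv` call would have the same effect.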

Feature Engineering

Several new columns were created to support the analysis:

  • date_created_day: Converted the original date strings into datetime format to enable time-based grouping and visualization.

  • amazon_brand_count: Added a 0/1 indicator marking whether a product is affiliated with an Amazon brand, so that daily sums give the number of Amazon-branded listings.

  • amazon_rank_value: Extracted the rank values for Amazon-branded products to calculate their average search ranking over time.

  • targeted_brand_count: Created a count for targeted brands appearing in search results to track their visibility trends.

  • major_brand_count: Added a 0/1 indicator marking whether a product belongs to a major brand.

  • total_brand_count: Set to 1 for every row, so that daily sums give the total number of listings.

These features were used throughout the exploratory analysis to better understand brand presence, ranking behaviors, and potential spatial biases in search result placements.

Exploratory Data Analysis (EDA)

In [67]:
data.columns
Out[67]:
Index(['Unnamed: 0', 'base_spell', 'subspell', 'date_created_day', 'Top',
       'Left', 'asin', 'is_targeted_brand', 'search_result_amazonprime',
       'search_result_usedoptions', 'search_result_outofstock',
       'search_result_best_seller', 'search_result_sponsored',
       'search_result_sponsored_tag', 'search_result_rank',
       'search_result_resultdetail', 'search_result_stars',
       'search_result_ratings', 'search_result_newprice',
       'search_result_oldprice', 'search_result_unitprice',
       'search_result_deliveryrule', 'search_result_deliverytime',
       'search_result_price', 'search_result_stockleft',
       'search_result_coupon', 'search_result_freedelivery',
       'search_result_used_price', 'search_result_used_offers',
       'search_result_discount_subsave', 'search_result_freeshipping',
       'search_result_brand_subtitle', 'rank_full', 'rank_data_index',
       'rank_unique_page', 'amazon_brand', 'search_results_stars', 'ratings',
       'norating', 'stars', 'nostars', 'price', 'price_discount', 'noprice',
       'delivery_speed', 'min_for_freedelivery', 'delivery_fee',
       'nodeliveryfee', 'free_delivery', 'free_delivery_possible',
       'delivery_date', 'delivery_days', 'nodeliverydt', 'noinfostockleft',
       'coupon', 'n_other_offers', 'no_n_other_offers', 'new_offers',
       'used_offers', 'subsave_option', 'name_fit', 'name_cosine_distance',
       'nonamefit', 'nonamedistance', 'brand', 'major_brand',
       'has_amazon_brands', 'has_other_brands', 'amazon_brand_count',
       'major_brand_count', 'total_brand_count'],
      dtype='object')
In [25]:
targeted_pct = np.mean(data['is_targeted_brand'] == True) * 100
print(f"Amazon-branded products make up {targeted_pct:.2f}% of all results.")
Amazon-branded products make up 1.26% of all results.

Spatial Placement Distribution

In [26]:
# Split the data into targeted-brand products and major-brand products that are not targeted
targeted_brand = data[data['is_targeted_brand'] == True]
not_targeted_brand = data[(data['is_targeted_brand'] == False) & (data['major_brand'] == True)]

# Scatter plot with a different color for each group
plt.figure(figsize=(10, 6))
plt.scatter(targeted_brand['Left'], targeted_brand['Top'], marker='o', color='#00A8E1', label='Amazon Targeted Brand', alpha=0.5)
plt.scatter(not_targeted_brand['Left'], not_targeted_brand['Top'], marker='o', color='red', label='Major Brand (Not Targeted)', alpha=0.5)

plt.xlabel('Left (X-coordinate)')
plt.ylabel('Top (Y-coordinate)')
plt.title('Approximate positioning of Brands')
plt.legend()
plt.grid(True)
plt.show()
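
A numeric summary can complement the scatter plot. The sketch below assumes that `Top` and `Left` are pixel offsets from the top-left corner of the page, so a smaller `Top` means a higher position. The rows in `toy` are hypothetical stand-ins for `data`.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
toy = pd.DataFrame({
    "is_targeted_brand": [True, True, False, False, False],
    "Top": [200.0, 400.0, 300.0, 900.0, 1500.0],
    "Left": [280.0, 280.0, 540.0, 280.0, 800.0],
})

# Median position per group: a lower median 'Top' would mean the group
# tends to sit higher on the page (under the assumption stated above).
position_summary = toy.groupby("is_targeted_brand")[["Top", "Left"]].median()
print(position_summary)
```

Under the same assumption, calling `plt.gca().invert_yaxis()` before `plt.show()` would orient the scatter plot like the page itself, with the top of the page at the top of the figure.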

Date Parsing

Converted the date_created_day column to datetime format for proper time-based analysis and plotting.

In [27]:
data['date_created_day'] = pd.to_datetime(data['date_created_day'])
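
The preview shows dates such as 8/31/2022, which suggests a month/day/year layout. The sketch below assumes every row follows that layout. It passes an explicit format so that unexpected values are flagged rather than silently guessed. The sample strings are illustrative.

```python
import pandas as pd

# Illustrative values: two valid dates and one malformed entry.
sample = pd.Series(["8/31/2022", "6/17/2022", "not a date"])

# errors="coerce" turns unparseable strings into NaT so they can be counted.
parsed = pd.to_datetime(sample, format="%m/%d/%Y", errors="coerce")

n_failed = parsed.isna().sum()
print(parsed)
print(n_failed)
```

A nonzero `n_failed` on the real column would indicate rows that do not match the assumed layout.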

Dataset Time Range

To understand the scope of the data, I extracted the minimum and maximum dates from the date_created_day column.
This shows the period over which the data was collected.

In [28]:
min_date = data['date_created_day'].min()
max_date = data['date_created_day'].max()
print(min_date, max_date)
2022-06-17 00:00:00 2023-01-09 00:00:00
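
The length of the collection window follows directly from these two dates. A short sketch, using the printed values as literals:

```python
import pandas as pd

# Dates copied from the printed output above.
min_date = pd.Timestamp("2022-06-17")
max_date = pd.Timestamp("2023-01-09")

# Subtracting two Timestamps gives a Timedelta; .days is the whole-day span.
span = max_date - min_date
print(span.days)
```

The span only bounds the window. On the real data, `data['date_created_day'].nunique()` would show on how many distinct days results were actually recorded.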

Amazon vs Major Brand Presence Over Time

Behind every product search lies a quiet competition. This chart shows the proportion of Amazon-branded and major-branded products appearing in search results over time, comparing Amazon’s own offerings with major household names day by day.

Ratios were calculated by dividing the count of branded products by the total number of listings each day. The plot highlights shifts in visibility: which brand type is shown more often on a given day, and how that balance changes over the collection period.

In [52]:
# Convert boolean values to int
data['amazon_brand_count'] = data['amazon_brand'].astype(int)
data['major_brand_count'] = data['major_brand'].astype(int)
data['total_brand_count'] = 1  # Every row = 1 product

# Group data by date
grouped = data.groupby('date_created_day').agg({'amazon_brand_count': 'sum', 'total_brand_count': 'sum'})
major_grouped = data.groupby('date_created_day').agg({'major_brand_count': 'sum', 'total_brand_count': 'sum'})

# Calculate daily ratios
grouped['amazon_brand_ratio'] = grouped['amazon_brand_count'] / grouped['total_brand_count']
major_grouped['major_brand_ratio'] = major_grouped['major_brand_count'] / major_grouped['total_brand_count']

# Plot
plt.figure(figsize=(12, 6))
plt.plot(grouped.index, grouped['amazon_brand_ratio'], color='#00A8E1', marker='o', label='Amazon Brand Ratio')
plt.plot(major_grouped.index, major_grouped['major_brand_ratio'], color='red', marker='o', label='Major Brand Ratio')
plt.xlabel('Date')
plt.ylabel('Brand Ratio')
plt.title('Amazon vs Major Brand Ratios Over Time')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
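
Because the mean of a boolean column equals the share of True values, the same daily ratios can be obtained with a single grouped mean. The sketch below uses hypothetical rows and assumes `amazon_brand` and `major_brand` are boolean with no missing values.

```python
import pandas as pd

# Hypothetical listings: four on the first day, two on the second.
toy = pd.DataFrame({
    "date_created_day": pd.to_datetime(
        ["2022-08-31", "2022-08-31", "2022-08-31", "2022-08-31",
         "2022-09-01", "2022-09-01"]
    ),
    "amazon_brand": [True, False, False, False, False, True],
    "major_brand": [False, True, True, False, True, False],
})

# Mean of a boolean column per day = branded listings / all listings that day.
daily_ratios = toy.groupby("date_created_day")[["amazon_brand", "major_brand"]].mean()
print(daily_ratios)
```

This avoids the helper count columns, although keeping them, as the cell above does, makes the numerator and denominator explicit.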

Average Search Rank of Amazon vs Major Brands

This line chart compares the daily average search result rank (rank_full) of Amazon-branded and major-branded products over time. In online retail, rank determines visibility, and visibility drives outcomes. Lower values indicate higher placement on the page. Dashed horizontal lines show the mean of the daily averages for each group, which makes any persistent gap in positioning easy to see. Even small differences in search ranking can shape which brands capture the customer’s attention.

In [65]:
# Calculate the average rank of Amazon's brands for each time period
amazon_average_ranks = data[data['amazon_brand']].groupby('date_created_day')['rank_full'].mean()
major_average_ranks = data[data['major_brand']].groupby('date_created_day')['rank_full'].mean()

# Plot the daily average rank for each brand type
plt.figure(figsize=(12, 6))
plt.plot(amazon_average_ranks.index, amazon_average_ranks.values, color='#00A8E1', marker='o', label='Amazon Brands')
plt.plot(major_average_ranks.index, major_average_ranks.values, color='red', marker='o', label='Major Brands')

# Add a dashed line at the mean of the daily averages for each group
plt.axhline(major_average_ranks.mean(), color='red', linestyle='--', label=f'Avg Major Rank: {major_average_ranks.mean():.2f}')
plt.axhline(amazon_average_ranks.mean(), color='#00A8E1', linestyle='--', label=f'Avg Amazon Rank: {amazon_average_ranks.mean():.2f}')
plt.xlabel('Date')
plt.ylabel('Brand Rank Average')
plt.title('Brand Rank Average Over Time')
plt.grid(True)
plt.legend()
plt.show()
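
Calling `.mean()` on the series of daily averages weights every day equally, regardless of how many listings it contains. The toy example below, with hypothetical ranks, shows how that value can differ from the mean taken over all rows.

```python
import pandas as pd

# Hypothetical ranks: three listings on one day, one listing on the next.
toy = pd.DataFrame({
    "date_created_day": pd.to_datetime(["2022-08-31"] * 3 + ["2022-09-01"]),
    "rank_full": [2, 4, 6, 20],
})

daily_mean = toy.groupby("date_created_day")["rank_full"].mean()

mean_of_daily_means = daily_mean.mean()  # each day weighted equally
pooled_mean = toy["rank_full"].mean()    # each listing weighted equally

print(mean_of_daily_means, pooled_mean)
```

Which of the two is the better benchmark depends on whether days or individual listings are the unit of interest. The dashed lines in the chart use the first.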

Average Search Result Position of Amazon vs Major Brands

This chart shows the daily average search result position of Amazon and major-branded products using rank_data_index, which records placement per individual user search. rank_data_index was selected over rank_full due to its clearer construction and direct interpretability. Lower values correspond to better visibility in search results.

In [66]:
# Calculate the average rank by brand type for each time period
amazon_average_ranks = data[data['amazon_brand']].groupby('date_created_day')['rank_data_index'].mean()
major_average_ranks = data[data['major_brand']].groupby('date_created_day')['rank_data_index'].mean()

# Plot the daily average result position for each brand type
plt.figure(figsize=(12, 6))
plt.plot(amazon_average_ranks.index, amazon_average_ranks.values, color='#00A8E1', marker='o', label='Amazon Brands')
plt.plot(major_average_ranks.index, major_average_ranks.values, color='red', marker='o', label='Major Brands')

# Add a dashed line at the mean of the daily averages for each group
plt.axhline(amazon_average_ranks.mean(), color='#00A8E1', linestyle='--', label=f'Avg Amazon Position: {amazon_average_ranks.mean():.2f}')
plt.axhline(major_average_ranks.mean(), color='red', linestyle='--', label=f'Avg Major Position: {major_average_ranks.mean():.2f}')
plt.xlabel('Date')
plt.ylabel('Brand Result Position Average')
plt.title('Brand Result Position Average Over Time')
plt.grid(True)
plt.legend()
plt.show()

Brand Prevalence Over Time

This chart traces the daily count of Amazon-branded and major-branded products appearing across user searches. Grouped by date, the counts reveal how often each brand type surfaced over time. Dashed horizontal lines mark the average daily prevalence for Amazon and major brands, offering a steady benchmark against the day-to-day fluctuations. Together, these trends highlight how brand presence shifts within the search landscape.

In [54]:
# Filters the DataFrame to select Amazon's brands and major brands
amazon_brands_df = data[data['amazon_brand']]
major_brands_df = data[data['major_brand']]

# Groups by the relevant time period (e.g., date) and counts the occurrences of Amazon's brands and major brands
amazon_brand_counts = amazon_brands_df.groupby('date_created_day').size()
major_brand_counts = major_brands_df.groupby('date_created_day').size()

# Visualize the Brand Prevalence over time
plt.figure(figsize=(12, 6))
plt.plot(amazon_brand_counts.index, amazon_brand_counts.values, label='Amazon Brands', color='#00A8E1', marker='o')
plt.plot(major_brand_counts.index, major_brand_counts.values, label='Major Brands', color='red', marker='o')
plt.axhline(amazon_brand_counts.mean(), color='#00A8E1', linestyle='--', label=f'Avg Amazon Prevalence: {amazon_brand_counts.mean():.2f}')
plt.axhline(major_brand_counts.mean(), color='red', linestyle='--', label=f'Avg Major Prevalence: {major_brand_counts.mean():.2f}')
plt.xlabel('Date')
plt.ylabel('Brand Prevalence')
plt.title('Brand Prevalence Over Time')
plt.grid(True)
plt.legend()
plt.show()
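
One caveat when counting on a filtered DataFrame: days on which a brand type never appears are absent from the result, which raises the average daily count. Whether this affects the real dataset cannot be told from the notebook. The sketch below uses hypothetical rows and sums the boolean column over all days instead.

```python
import pandas as pd

# Hypothetical rows; no Amazon brand appears on 2022-09-01.
toy = pd.DataFrame({
    "date_created_day": pd.to_datetime(
        ["2022-08-31", "2022-08-31", "2022-09-01", "2022-09-02"]
    ),
    "amazon_brand": [True, True, False, True],
})

# Filter-then-count drops 2022-09-01 from the index entirely.
filtered_counts = toy[toy["amazon_brand"]].groupby("date_created_day").size()

# Summing the boolean per day keeps that day with a count of 0.
all_day_counts = toy.groupby("date_created_day")["amazon_brand"].sum()

print(filtered_counts.mean(), all_day_counts.mean())
```

If every day in the real data contains at least one product of each brand type, both approaches give the same counts.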

Summary

This analysis explored brand visibility patterns within Amazon search results, focusing on Amazon-branded and major-branded products. Key trends were identified by comparing daily brand frequency, average ranking position, and overall brand prevalence. The dataset revealed that Amazon's own brands consistently occupied higher-ranking positions (that is, lower rank values) and appeared more frequently than major competitors. These patterns are descriptive, but they suggest potential brand favoritism in search result visibility, raising important considerations around competition and platform neutrality.

Further analysis with expanded datasets or additional metadata (e.g., click-through rates, category filters) could deepen these insights.