
Scraping Walmart Product Data Using Massive

Jason Grad
Proxy Network Manager
March 4, 2025

Walmart's website offers a vast collection of product data that is a valuable resource for businesses, researchers, and marketers. Whether you're accessing a single Walmart product page or navigating through many, collecting relevant information can be challenging due to Walmart's strong anti-scraping measures.

This article will explain how Massive proxies make it easier to bypass these restrictions and allow you to collect Walmart product details from specific locations.

Let’s dive in!

Common Use Cases for Walmart Product Data

The Walmart website contains valuable insights that can be used for various purposes, including:

  • Product & Market Research: Analyze data points from each Walmart product page, including reviews and ratings, to understand consumer preferences and market trends.
  • Competitor Analysis: Gain insight into competitor pricing and product strategies.
  • Price Monitoring & Optimization: Track real-time prices to adjust pricing strategies and stay competitive.
  • Inventory Management: Monitor stock levels and product availability to optimize inventory and supply chain operations.

Why Use Proxies in Web Scraping

Proxies play a crucial role in web scraping by serving as intermediaries between your scraper and the target website. They offer several key benefits, including:

  1. Avoiding IP Bans: Proxies allow you to rotate IP addresses, which reduces the risk of detection and blocking by the website.
  2. Accessing Geo-Restricted Content: Some content or products are available only in certain regions. Proxies allow you to appear as though you are browsing from a different location.
  3. Bypassing Rate Limits: Websites often impose limits on the number of requests from a single IP address. Proxies help distribute your requests across multiple IPs, allowing you to avoid these restrictions.
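In its simplest form, IP rotation means cycling each request through a different address in a proxy pool. Here is a minimal sketch using hypothetical proxy endpoints (a rotating residential service like Massive handles this rotation server-side, so you would not maintain such a list yourself):

```python
from itertools import cycle

# Hypothetical proxy pool; replace with real endpoints or a rotating service.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

proxy_cycle = cycle(PROXY_POOL)

def next_proxies():
    """Return a requests-style proxies dict using the next address in the pool."""
    proxy = next(proxy_cycle)
    return {"http": proxy, "https": proxy}
```

Each call to `next_proxies()` advances through the pool and wraps around, so consecutive requests leave from different IPs.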

Understanding Walmart Anti-Scraping Measures

Scraping data from Walmart can be challenging due to several protections in place. Here are some common issues you might face:

  1. CAPTCHA Challenges: Walmart uses CAPTCHAs to block bots. While these are relatively easy for humans to solve, they can be difficult for automated scripts.
  2. Rate Limiting: Walmart restricts the number of requests you can make within a short period. If you exceed this limit, your access may be blocked.
  3. IP Blocking: If Walmart detects excessive scraping activity from a single IP address, it may block that IP.
  4. Changing Page Layout: Walmart frequently updates the structure of its web pages. These changes can break your scraping code, requiring you to update it regularly.
  5. Dynamic Content: The Walmart search page and other Walmart pages use JavaScript to load content dynamically, which can make scraping more complex.
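Rate limiting (point 2) is usually handled by spacing out retries with exponential backoff plus a little random jitter, so repeated requests don't arrive in a machine-like rhythm. A minimal sketch:

```python
import random

def backoff_delays(retries, base=1.0, cap=30.0):
    """Compute exponential backoff delays with jitter: the base delay
    doubles on each retry, capped, with up to 0.5s of random jitter added."""
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay + random.uniform(0, 0.5))
    return delays
```

In a scraper you would sleep for each successive delay after a failed or rate-limited request before trying again.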

For example, I created a script to scrape data from multiple product pages on Walmart. However, my scraper was eventually blocked by Walmart’s bot detection system, as you can see in the image below.

This challenge, which asks me to "Press & Hold" to verify my humanity, is a common CAPTCHA mechanism used by websites to prevent bots from accessing their content.

At this point, my scraper could no longer access the data, which was both frustrating and time-consuming. But don't worry—there's a solution. We'll explore how Massive Residential Proxies can help you scrape Walmart product data.

Residential vs. Datacenter Proxies for Walmart Scraping: What Works Best

I've been scraping Walmart for a while now, and your choice of proxy makes all the difference. In my experience, residential proxies are worth the extra cost. They use real IPs assigned by actual ISPs to homeowners, so to Walmart's systems, you just look like a regular shopper browsing from home. Yes, it costs more, but the data quality and uninterrupted scraping sessions make it worthwhile.

Datacenter proxies are tempting (they're faster and cheaper), but Walmart's anti-bot systems have gotten very good at spotting them.

Benefits of Using Massive Proxies for Walmart Scraping

Massive residential proxies offer several key benefits:

  1. 🌐 Global Reach: Access 195+ countries—target cities, ZIP codes, or ASN
  2. 🔒 Anonymity: Millions of rotating IPs + customizable rotation (per request or 1-60 mins)
  3. ⚡ Speed & Reliability: 99.8% success rate, <0.7s response times, and 99.9% uptime
  4. 💰 Budget-Friendly: Start at $4.49/GB with scalable plans
  5. ✅ Ethically Compliant: Fully GDPR/CCPA-compliant proxies, 100% ethically sourced
  6. 🛠️ Support: Via Slack, Skype, email, or tickets

Getting Started with Massive

If you’re new to Massive, sign up for an account and choose a plan that fits your needs.

Note: We offer a 2 GB free trial for companies. To get started, fill out this form. If you need more bandwidth, contact our sales team, and we’ll assist you.

After signing up, go to the Massive Dashboard to retrieve your proxy credentials (username and password).

Configuration Steps:

Visit the Quickstart section to customize your proxy settings:

  • Choose your preferred protocol (HTTP, HTTPS, or SOCKS5)
  • Select between rotating or sticky proxies
  • Set geo-targeting preferences (country, state, city, or ZIP code)

Once configured, you'll get a ready-to-use cURL command for your specific use case.
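The generated command will look roughly like the sketch below (the server address and username format match those used later in this tutorial; substitute your own credentials, and prefer the exact command from your dashboard):

```shell
# Route a test request through Massive with US/Washington geo-targeting.
# MASSIVE_USERNAME and MASSIVE_PASSWORD are placeholders for your credentials.
curl -x "https://MASSIVE_USERNAME-country-US-subdivision-WA:MASSIVE_PASSWORD@network.joinmassive.com:65535" \
     "https://api.ipify.org"
```

If the proxy is working, the response shows the exit IP assigned for that session rather than your own.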

For advanced features like location-based targeting and sticky sessions, refer to the Massive Documentation. The docs provide step-by-step instructions for getting the most out of Massive Residential Proxies.

With this setup, you can use Massive Proxies to scrape Walmart product data from a specific region.

Building Walmart Scraper with Python and Massive Proxies

While you could use a Walmart scraper API, building your own solution with proper user agent configuration gives you more control over the scraping process. Let's explore how to build a Python scraper for Walmart product data using Massive proxies and Playwright. Playwright helps automate browser actions and handle dynamic content, such as loading more products as you scroll.

Using Massive proxies, you can scrape Walmart data from any location where Walmart operates, simply by changing the proxy settings. For this tutorial, we’ll show scraping product data in Washington, USA.
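The geo-targeting is driven entirely by the proxy username, which embeds the country and subdivision (the `USER-country-XX-subdivision-YY` format shown in the proxy configuration later in this tutorial). A small helper (hypothetical, not part of Massive's SDK) makes switching regions a one-argument change:

```python
def massive_proxy_auth(username, password, country=None, subdivision=None):
    """Build Massive proxy credentials, appending optional geo-targeting
    parameters to the username (format: USER-country-XX-subdivision-YY)."""
    user = username
    if country:
        user += f"-country-{country}"
    if subdivision:
        user += f"-subdivision-{subdivision}"
    return {"username": user, "password": password}
```

For this tutorial, `massive_proxy_auth("MASSIVE_USERNAME", "MASSIVE_PASSWORD", country="US", subdivision="WA")` produces the Washington-targeted credentials used below.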

We’ll extract the following data for each Walmart product: Product Name, Rating, Number of Reviews, Price, Previous Price (if available), Shipping Info, and Product Link.

1. Set Up Your Python Environment

To begin, make sure you have Python installed on your machine. Next, install Playwright and its necessary browser binaries:

pip install playwright
playwright install

2. Import Required Libraries

Now, let’s start writing the script. You'll need to import the following libraries:

  • asyncio for asynchronous programming.
  • random to add random delays.
  • json to save our scraped data.
  • async_playwright from Playwright to control the browser and automate the scraping.

import asyncio
import random
import json
from playwright.async_api import async_playwright

3. Launch the Browser with Massive Proxy Settings

Launch the Chromium browser in headless mode with proxy settings. This allows you to bypass Walmart's anti-scraping measures using Massive proxies.

async with async_playwright() as p:
    browser = await p.chromium.launch(
        headless=True,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-infobars",
            "--disable-extensions",
            "--disable-popup-blocking",
            "--no-sandbox",
        ],
    )

Next, set up the browser context to route traffic through Massive residential proxies:

context = await browser.new_context(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
    locale="en-GB",
    proxy={
        "server": "https://network.joinmassive.com:65535",
        "username": "MASSIVE_USERNAME-country-US-subdivision-WA",
        "password": "MASSIVE_PASSWORD",
    },
    viewport={"width": 1920, "height": 1080},
)

Open a new page and navigate to the Walmart search results:

page = await context.new_page()
await page.goto(
    "https://www.walmart.com/search?q=windows+laptops",
    wait_until="domcontentloaded",
)

This code directs the browser to the Walmart search results page for "Windows laptops".
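To scrape other search terms, the query string can be built programmatically instead of hard-coding the URL. A small sketch (the helper name is ours, and the `page` parameter for pagination is an assumption about Walmart's URL scheme):

```python
from urllib.parse import urlencode

def walmart_search_url(query, page=1):
    """Build a Walmart search URL for a query; spaces become '+' via urlencode."""
    params = {"q": query}
    if page > 1:
        params["page"] = page  # assumed pagination parameter
    return f"https://www.walmart.com/search?{urlencode(params)}"
```

For example, `walmart_search_url("windows laptops")` reproduces the URL used above.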

4. Scroll and Extract Data

Since Walmart loads more products as you scroll, we'll simulate human-like scrolling to make sure all products are loaded:

async def scroll_and_extract(page):
    previous_height = await page.evaluate("document.body.scrollHeight")
    while True:
        await page.evaluate("window.scrollBy(0, Math.random() * 100 + 300)")
        await asyncio.sleep(random.uniform(1, 2))

        new_height = await page.evaluate("document.body.scrollHeight")
        if new_height == previous_height:
            break
        previous_height = new_height

5. Extract Product Information

Now we extract details like product name, price, and ratings using CSS selectors. Here’s how you can extract each piece of information:

A. Product Name:

Code snippet:

product_name = await product.query_selector('span[data-automation-id="product-title"]')
product_name = await product_name.inner_text() if product_name else "N/A"

B. Current Price:

Code snippet:

price = await product.query_selector(
    'div[data-automation-id="product-price"] div[aria-hidden="true"]'
)
price = await price.inner_text() if price else "N/A"

C. Previous Price:

Code snippet:

previous_price = await product.query_selector("div.gray.strike")
previous_price = await previous_price.inner_text() if previous_price else "N/A"

D. Product Rating:

Code snippet:

rating = await product.query_selector('span[data-testid="product-ratings"]')
rating = await rating.get_attribute("data-value") if rating else "N/A"

E. Number of Reviews:

Code snippet:

num_reviews = await product.query_selector('span[data-testid="product-reviews"]')
num_reviews = await num_reviews.inner_text() if num_reviews else "N/A"

F. Shipping Information:

Code snippet:

shipping_info = await product.query_selector('div[data-automation-id="fulfillment-badge"]')
shipping_info = await shipping_info.inner_text() if shipping_info else "N/A"

Here’s the combined code that returns all the information from each product.

async def extract_product_info(product):
    title_selector = 'span[data-automation-id="product-title"]'
    price_selector = 'div[data-automation-id="product-price"] div[aria-hidden="true"]'
    previous_price_selector = "div.gray.strike"
    rating_selector = 'span[data-testid="product-ratings"]'
    reviews_selector = 'span[data-testid="product-reviews"]'
    shipping_info_selector = 'div[data-automation-id="fulfillment-badge"]'
    product_url_selector = 'a[href*="/ip/"]'

    title = await product.query_selector(title_selector)
    product_url_element = await product.query_selector(product_url_selector)
    product_url = (
        await product_url_element.get_attribute("href") if product_url_element else None
    )

    if product_url and "from=/search" in product_url:
        current_price = await product.query_selector(price_selector)
        previous_price = await product.query_selector(previous_price_selector)
        rating = await product.query_selector(rating_selector)
        num_reviews = await product.query_selector(reviews_selector)
        shipping_info = await product.query_selector(shipping_info_selector)

        return {
            "title": await title.inner_text() if title else "N/A",
            "product_url": f"https://www.walmart.com{product_url}",
            "current_price": (
                await current_price.inner_text() if current_price else "N/A"
            ),
            "previous_price": (
                await previous_price.inner_text() if previous_price else "N/A"
            ),
            "rating": await rating.get_attribute("data-value") if rating else "N/A",
            "num_reviews": await num_reviews.inner_text() if num_reviews else "N/A",
            "shipping_info": (
                await shipping_info.inner_text() if shipping_info else "N/A"
            ),
        }
    return None

6. Scrape Multiple Pages

To scrape multiple pages, we'll locate the "Next Page" button and click through each page:

async def scrape_walmart(page, current_page):
    async def product_info_generator(current_page):
        while True:
            print(f"Scraping page {current_page}...")
            await scroll_and_extract(page)

            # Extract product information
            product_elements = await page.query_selector_all(
                'div[role="group"][data-item-id]'
            )

            for product in product_elements:
                product_data = await extract_product_info(product)
                if product_data:
                    yield product_data
            # Check for the "Next Page" button
            next_page_button = await page.query_selector('a[data-testid="NextPage"]')
            if next_page_button:
                await next_page_button.click()
                # Wait for the next results page to load before scraping it
                await page.wait_for_load_state("domcontentloaded")
                current_page += 1
            else:
                break

    return product_info_generator(current_page)

7. Save the Data to a JSON File

Once all data is extracted, save it to a JSON file:

def save_data_to_json(data, filename):
    with open(filename, "w", encoding="utf-8") as json_file:
        json.dump(data, json_file, ensure_ascii=False, indent=4)
    print(f"Product information saved to {filename}")

8. Running the Scraper

Here’s the main function to start the Walmart scraper:

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--disable-infobars",
                "--disable-extensions",
                "--disable-popup-blocking",
                "--no-sandbox",
            ],
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
            locale="en-GB",
            proxy={
                "server": "https://network.joinmassive.com:65535",
                "username": "MASSIVE_USERNAME-country-US-subdivision-WA",
                "password": "MASSIVE_PASSWORD",
            },
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await page.goto(
            "https://www.walmart.com/search?q=windows+laptops",
            wait_until="domcontentloaded",
        )

        current_page = 1
        product_info_generator = await scrape_walmart(page, current_page)

        with open("walmart_products.json", "w", encoding="utf-8") as json_file:
            json_file.write("[")
            first = True
            async for product in product_info_generator:
                if not first:
                    json_file.write(",\n")
                json.dump(product, json_file, ensure_ascii=False, indent=4)
                first = False
            json_file.write("\n]")
            print("Product information saved to walmart_products.json")


asyncio.run(main())

Final Result

You can find the complete code for scraping Walmart data using Massive proxies in the GitHub Gist.

When you run the code, the result will look something like this:

[
    {
        "title": "14.1in Windows 11 Pro Laptop, 8GB DDR4, 512GB SSD Computer, Intel Celeron, 1920x1080, 1TB Expansion, Silver",
        "product_url": "https://www.walmart.com/ip/Temlicolo-14-1-Laptop-8GB-RAM-PC-512GB-SSD-Intel-Celeron-N4020C-up-to-2-8GHz-Windows-11-Pro-Webcam-1TB-SSD-Expansion-Silver/1519228026?classType=VARIANT&selectedSellerId=101196098&from=/search",
        "current_price": "$227.89",
        "previous_price": "$499.99",
        "rating": "4.5",
        "num_reviews": "220",
        "shipping_info": "Free shipping, arrives in 2 days"
    },
    {
        "title": "HP Stream 14 inch Windows Laptop Intel Processor N4120 4GB RAM 64GB eMMC Pink (12-mo. Microsoft 365 included)",
        "product_url": "https://www.walmart.com/ip/HP-Stream-14-inch-Laptop-Intel-Processor-N4102-4GB-RAM-64GB-eMMC-Pink-12-mo-Microsoft-365-included/443153637?classType=VARIANT&athbdg=L1102&from=/search",
        "current_price": "$199.00",
        "previous_price": "$229.00",
        "rating": "4",
        "num_reviews": "11,240",
        "shipping_info": "Free pickup today\nDelivery today\nFree shipping, arrives today"
    },
    {
        "title": "Jumper 15.6\" Windows 11 Laptop 4GB DDR4 128GB Rom Computer with Intel Celeron 5205U, Come with 1-Yr Free Office 365",
        "product_url": "https://www.walmart.com/ip/Jumper-15-6-Laptop-4GB-DDR4-128GB-ROM-Computer-with-Dual-Core-Intel-Celeron-5205U-CPU-1-yr-Office-Free-1366x768-HD/9497006657?classType=VARIANT&selectedSellerId=101078354&from=/search",
        "current_price": "$199.89",
        "previous_price": "$379.99",
        "rating": "4.2",
        "num_reviews": "6",
        "shipping_info": "Free shipping, arrives in 2 days"
    }
]

Check the complete JSON file with all Walmart “Windows laptop” data scraped from every available page.

Wrapping Up

This article showed how Massive proxies can help you extract valuable Walmart product data while minimizing the risk of detection and blocking. For more details on proxy configuration and best practices, visit our official documentation.

Ready to get started? Sign up for Massive Proxies today 🚀
