Building Bot-Resistant E2E Tests for Shopify Stores with Python, Playwright, and SeleniumBase

You point Playwright at a Shopify store fronted by Cloudflare. You get a 403 or a permanent challenge page. You try headless=False. Same result. You add a user agent. Same result. You add a residential proxy. Now you get rate-limited instead of blocked, which is progress of a sort.

This is a build log for the stack that got us past all of that — about 300 lines of Python and one GitHub Actions file, running ~10 checkout and catalog flows against a live store every day without tripping.

1. The problem

Four things are fighting you, and they compound:

Your Chrome looks wrong. Vanilla Playwright ships a Chromium build with navigator.webdriver = true, missing chrome.runtime, mismatched Navigator.permissions, and a suspicious WebGL vendor string. Cloudflare’s challenges.cloudflare.com/turnstile runs a few hundred of these checks in under a second. You lose before your first page.goto.

Your mouse doesn’t exist. A real user lands somewhere random on a button, often drifts past it, hovers for 200ms, then clicks. Playwright’s .click() teleports the cursor to the exact center of the element’s bounding box and fires a mousedown in the same frame. Bot detection systems have been pattern-matching that for years.

Your IP is hot. Datacenter IPs from AWS, GCP, Hetzner — all pre-flagged. Even free residential proxies arrive pre-burned because ten thousand other scrapers used them yesterday.

Your rotation is worse than your static IP. The naive fix — rotate the proxy on every request — is what tips Cloudflare from “mildly suspicious” to “definitely a bot.” Cloudflare’s cf_clearance cookie is bound to the IP it was issued to. Rotating nukes the clearance every time.

The stack below addresses all four.

2. The architecture

┌─────────────────────────────────────────────────────────────┐
│  pytest + Playwright sync API                               │
│     │                                                       │
│     │  (CDP over ws://127.0.0.1:<port>)                     │
│     ▼                                                       │
│  SeleniumBase (uc=True) Chrome — stealth fingerprint        │
│     │                                                       │
│     │  (HTTP CONNECT)                                       │
│     ▼                                                       │
│  Local auth-bridge proxy  127.0.0.1:18888                   │
│     │                                                       │
│     │  (basic auth injected)                                │
│     ▼                                                       │
│  Webshare static residential proxy                          │
│  (auto-replaced on health signals — unavailable >15min,     │
│  low country confidence, slowdown — via dashboard           │
│  Replace Proxies setting)                                   │
│     │                                                       │
│     ▼                                                       │
│  Shopify store (behind Cloudflare)                          │
└─────────────────────────────────────────────────────────────┘

A few deliberate choices here:

  • SeleniumBase drives Chrome, Playwright drives the test. SeleniumBase’s uc=True mode handles undetected-chromedriver patching, argument scrubbing, and CDP flags. Once Chrome is up, we connect Playwright to the existing browser over CDP and throw the SeleniumBase driver away except for its Cloudflare solver. The test code gets Playwright’s much nicer API.
  • Static residential, with provider-side auto-replace. One Webshare residential endpoint. The cf_clearance cookie sticks to it for the lifetime of a session. Rotation happens out of band at the dashboard layer, not per-request from the client.
  • Local auth bridge. Chrome has no mechanism for supplying proxy credentials at launch: --proxy-server does not accept user:pass@host:port, and there is no flag to set them separately. The workaround is a tiny local forwarder that listens on 127.0.0.1 unauthenticated, injects basic auth, and forwards to Webshare.
  • Two-tier Cloudflare bypass. SeleniumBase has a free built-in Turnstile clicker (sb.uc_gui_click_captcha()). It works maybe 70% of the time. For the other 30%, we call CapSolver’s AntiCloudflareTask and inject the returned cf_clearance. CapSolver is one option; 2Captcha, Anti-Captcha, and a few others expose equivalent APIs.

3. Setup walkthrough

Dependencies

# requirements.txt
seleniumbase==4.32.7
playwright==1.47.0
pytest==8.3.3
pytest-rerunfailures==14.0
python-dotenv==1.0.1
requests==2.32.3

pip install -r requirements.txt
playwright install chromium

Environment

# .env
CAPSOLVER_API_KEY=CAP-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
WEBSHARE_PROXY=user-session-abc123:[email protected]:80

Only one proxy, one key. If either is missing, the fixtures skip the run with a clear message instead of falling back to a clean browser and pretending everything is fine.

The local auth-bridge proxy (appendix, but you need it)

Chrome can’t do HTTP CONNECT with basic auth via command-line flags. This small forwarder sits between Chrome and Webshare and adds the Proxy-Authorization header for you:

# tests/support/auth_proxy.py
import base64
import select
import socket
import threading


def start_auth_proxy(upstream_host: str, upstream_port: int,
                     username: str, password: str,
                     listen_port: int = 18888) -> threading.Thread:
    creds = base64.b64encode(f"{username}:{password}".encode()).decode()

    def handle(client: socket.socket) -> None:
        try:
            request = b""
            while b"\r\n\r\n" not in request:
                chunk = client.recv(4096)
                if not chunk:
                    return
                request += chunk

            # inject Proxy-Authorization if missing
            if b"Proxy-Authorization:" not in request:
                headers_end = request.find(b"\r\n\r\n")
                injected = (
                    request[:headers_end]
                    + f"\r\nProxy-Authorization: Basic {creds}".encode()
                    + request[headers_end:]
                )
                request = injected

            upstream = socket.create_connection((upstream_host, upstream_port))
            upstream.sendall(request)
            pipe(client, upstream)
        except Exception:
            pass
        finally:
            client.close()

    def pipe(a: socket.socket, b: socket.socket) -> None:
        try:
            sockets = [a, b]
            while True:
                r, _, _ = select.select(sockets, [], [], 30)
                if not r:
                    break
                for s in r:
                    data = s.recv(8192)
                    if not data:
                        return
                    (b if s is a else a).sendall(data)
        finally:
            b.close()  # upstream socket; the client is closed in handle()

    def serve() -> None:
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("127.0.0.1", listen_port))
        server.listen(16)
        while True:
            client, _ = server.accept()
            threading.Thread(target=handle, args=(client,), daemon=True).start()

    t = threading.Thread(target=serve, daemon=True)
    t.start()
    return t

It’s sixty lines, it handles CONNECT tunnelling, and you’ll never think about it again.
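The header the bridge injects is nothing exotic — just base64 of user:pass. A standalone sanity check, independent of the forwarder above:

```python
import base64


def proxy_auth_header(username: str, password: str) -> bytes:
    """Build the header line the bridge splices into the CONNECT request."""
    creds = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Proxy-Authorization: Basic {creds}".encode()


print(proxy_auth_header("user", "pass"))
# b'Proxy-Authorization: Basic dXNlcjpwYXNz'
```

If Webshare ever returns 407 despite the bridge, printing this value and comparing it against a manual `curl -x http://user:pass@host:port` is the fastest way to find out whose credentials are wrong.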

conftest.py — the real work

# tests/conftest.py
import os
import random
import re
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any

import pytest
import requests
from dotenv import load_dotenv
from playwright.sync_api import Page, BrowserContext, expect, sync_playwright
from seleniumbase import SB

from tests.support.auth_proxy import start_auth_proxy

load_dotenv()

SCREENSHOT_ROOT = Path("latest_logs/screenshots")
LOCAL_PROXY_PORT = 18888


@dataclass(frozen=True)
class StealthPage:
    sb: Any
    page: Page
    context: BrowserContext
    proxy_info: dict


def _parse_proxy(raw: str) -> dict:
    # user:pass@host:port — rsplit so "@" or ":" inside the password survives
    auth, host = raw.rsplit("@", 1)
    user, pw = auth.split(":", 1)
    hostname, port = host.rsplit(":", 1)
    return {"host": hostname, "port": int(port), "user": user, "pass": pw}


def bypass_cloudflare(sb: Any, page: Page, target_url: str,
                      proxy_info: dict) -> None:
    challenge_markers = [
        "challenge-platform",
        "cf-challenge-running",
        "Just a moment",
    ]
    title = page.title() or ""
    html = page.content()
    challenged = any(m in title or m in html for m in challenge_markers)
    if not challenged:
        return

    # Tier 1 — SeleniumBase free solver
    try:
        sb.uc_gui_click_captcha()
        page.wait_for_timeout(4000)
        if "challenge-platform" not in page.content():
            return
    except Exception:
        pass

    # Tier 2 — CapSolver AntiCloudflareTask
    api_key = os.environ["CAPSOLVER_API_KEY"]
    create = requests.post(
        "https://api.capsolver.com/createTask",
        json={
            "clientKey": api_key,
            "task": {
                "type": "AntiCloudflareTask",
                "websiteURL": target_url,
                "proxy": f"http:{proxy_info['host']}:{proxy_info['port']}"
                         f":{proxy_info['user']}:{proxy_info['pass']}",
            },
        },
        timeout=30,
    ).json()
    task_id = create.get("taskId")
    if not task_id:
        raise RuntimeError(f"CapSolver createTask failed: {create}")

    deadline = time.time() + 180
    while time.time() < deadline:
        time.sleep(4)
        result = requests.post(
            "https://api.capsolver.com/getTaskResult",
            json={"clientKey": api_key, "taskId": task_id},
            timeout=30,
        ).json()
        if result.get("status") == "ready":
            solution = result["solution"]
            clearance = solution["cookies"]["cf_clearance"]
            ua = solution["userAgent"]
            page.context.add_cookies([{
                "name": "cf_clearance",
                "value": clearance,
                "domain": solution.get("domain", ".shopify.com"),
                "path": "/",
                "secure": True,
                "httpOnly": True,
            }])
            page.context.set_extra_http_headers({"User-Agent": ua})
            page.reload(wait_until="domcontentloaded")
            return
    raise RuntimeError("CapSolver timed out after 180s")


@pytest.fixture(scope="function")
def stealth_page(request):
    raw_proxy = os.environ.get("WEBSHARE_PROXY")
    if not raw_proxy:
        pytest.skip("WEBSHARE_PROXY not set")
    proxy_info = _parse_proxy(raw_proxy)
    start_auth_proxy(
        upstream_host=proxy_info["host"],
        upstream_port=proxy_info["port"],
        username=proxy_info["user"],
        password=proxy_info["pass"],
        listen_port=LOCAL_PROXY_PORT,
    )
    time.sleep(0.3)  # let the listener bind

    with SB(uc=True, headless=False,
            proxy=f"127.0.0.1:{LOCAL_PROXY_PORT}") as sb:
        cdp_port = sb.driver.capabilities["goog:chromeOptions"]["debuggerAddress"]
        with sync_playwright() as pw:
            browser = pw.chromium.connect_over_cdp(f"http://{cdp_port}")
            context = browser.contexts[0]
            page = context.pages[0] if context.pages else context.new_page()
            yield StealthPage(sb=sb, page=page,
                              context=context, proxy_info=proxy_info)


@pytest.fixture
def step(request):
    test_slug = re.sub(r"[^a-z0-9]+", "_", request.node.name.lower()).strip("_")
    folder = SCREENSHOT_ROOT / test_slug
    folder.mkdir(parents=True, exist_ok=True)
    counter = {"n": 0}

    def _step(name: str, page: Page) -> None:
        counter["n"] += 1
        n = counter["n"]
        slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
        print(f"\n===== STEP {n:02d}: {name} =====", flush=True)
        page.screenshot(path=str(folder / f"{n:02d}_{slug}.png"),
                        full_page=True)

    return _step


@pytest.fixture
def random_sleep():
    def _sleep(page: Page, lo: float = 2.0, hi: float = 5.0) -> None:
        close_cookie_popup(page)
        page.wait_for_timeout(int(random.uniform(lo, hi) * 1000))
    return _sleep


def close_cookie_popup(page: Page) -> None:
    selectors = [
        'button:has-text("Accept")',
        'button:has-text("Accept all")',
        'button:has-text("Alle akzeptieren")',
        '[aria-label*="accept" i]',
        '#onetrust-accept-btn-handler',
    ]
    for sel in selectors:
        try:
            btn = page.locator(sel).first
            if btn.is_visible(timeout=500):
                btn.click(timeout=1000)
                return
        except Exception:
            continue
    # JS fallback — some Shopify apps refuse to respect the click
    try:
        page.evaluate("""
            () => {
                const nodes = document.querySelectorAll(
                    'button, [role="button"]'
                );
                for (const n of nodes) {
                    const t = (n.innerText || '').toLowerCase();
                    if (t.includes('accept') || t.includes('akzeptieren')) {
                        n.click();
                        return;
                    }
                }
            }
        """)
    except Exception:
        pass


@pytest.fixture
def human_click():
    def _click(page: Page, selector: str) -> None:
        locator = page.locator(selector).first
        locator.scroll_into_view_if_needed()
        box = locator.bounding_box()
        if box is None:
            locator.click()
            return
        # land somewhere off-center
        fx = random.uniform(0.30, 0.70)
        fy = random.uniform(0.30, 0.70)
        tx = box["x"] + box["width"] * fx
        ty = box["y"] + box["height"] * fy
        steps = random.randint(8, 18)
        page.mouse.move(tx, ty, steps=steps)
        page.wait_for_timeout(random.randint(120, 420))  # hover
        page.mouse.click(tx, ty)
    return _click

A few things worth calling out:

The StealthPage dataclass is frozen. You cannot reassign .page halfway through a test and confuse yourself. Tests that need the raw SeleniumBase driver still have it via stealth_page.sb; tests that only need Playwright use stealth_page.page.

bypass_cloudflare is a no-op when there’s no challenge. You can call it freely after every navigation without paying for a CapSolver task. Only an actual challenge page triggers the paid path.

close_cookie_popup runs inside random_sleep. Cookie dialogs show up unpredictably — after the first paint, after a GDPR geolocation check, sometimes after the first scroll. Tying it to the sleep rhythm means you don’t have to think about it in test code.

4. The test-writing pattern

Before the first test, lock prices and selectors in frozen dataclasses. When the store later moves to an API-driven price, you swap the dataclass; the tests don’t move.

# tests/fixtures/catalog.py
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    slug: str
    title: str
    price_eur: str  # "29.95" — compare as string to avoid float drift


@dataclass(frozen=True)
class Selectors:
    add_to_cart: str = 'button[name="add"]'
    cart_drawer: str = '[data-cart-drawer]'
    checkout_button: str = 'button:has-text("Checkout")'
    product_price: str = '[data-product-price]'


HERO_TEE = Product(slug="hero-tee", title="Hero Tee", price_eur="29.95")
SEL = Selectors()
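The string comparison on price_eur sidesteps the classic binary-float trap — worth one demonstration if you've never been bitten (standalone snippet; Decimal is the alternative if you ever need arithmetic on prices):

```python
from decimal import Decimal

# Binary floats drift; a price assertion built on them will eventually flake.
print(0.1 + 0.2)           # 0.30000000000000004
print(0.1 + 0.2 == 0.3)    # False

# Decimal (or plain string comparison, as in Product.price_eur) stays exact.
print(Decimal("0.1") + Decimal("0.2"))                    # 0.3
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```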

And the rhythm in a test — step → human_click → random_sleep — for every action, including the last one:

# tests/test_add_to_cart.py
from playwright.sync_api import expect

from tests.fixtures.catalog import HERO_TEE, SEL
from tests.conftest import bypass_cloudflare

BASE = "https://store.example.com"


def test_add_to_cart_shows_correct_price(stealth_page, step,
                                         random_sleep, human_click):
    sp = stealth_page
    page = sp.page

    step("open home", page)
    page.goto(BASE, wait_until="domcontentloaded")
    bypass_cloudflare(sp.sb, page, BASE, sp.proxy_info)
    random_sleep(page)

    step("navigate to product", page)
    page.goto(f"{BASE}/products/{HERO_TEE.slug}",
              wait_until="domcontentloaded")
    bypass_cloudflare(sp.sb, page, f"{BASE}/products/{HERO_TEE.slug}",
                      sp.proxy_info)
    random_sleep(page)

    step("verify listed price", page)
    expect(page.locator(SEL.product_price)).to_contain_text(
        HERO_TEE.price_eur, timeout=15_000,
    )
    random_sleep(page)

    step("add to cart", page)
    human_click(page, SEL.add_to_cart)
    random_sleep(page)

    step("verify cart drawer", page)
    expect(page.locator(SEL.cart_drawer)).to_contain_text(
        HERO_TEE.title, timeout=15_000,
    )
    expect(page.locator(SEL.cart_drawer)).to_contain_text(
        HERO_TEE.price_eur, timeout=15_000,
    )
    random_sleep(page)

Two things the expect(...).to_contain_text(timeout=...) form buys you: it retries on Playwright’s internal polling loop, so the async DOM settles naturally; and it fails with a readable diff instead of a stale assertion snapshot. Never use locator.inner_text() == "…" for prices on a Shopify store — Shopify’s cart drawer mutates in two passes and you will flake.

5. The human_click helper, explained

Here it is again, isolated, so you can lift it into another project without importing the rest of the file:

def human_click(page, selector):
    locator = page.locator(selector).first
    locator.scroll_into_view_if_needed()
    box = locator.bounding_box()
    if box is None:
        locator.click()
        return
    fx = random.uniform(0.30, 0.70)
    fy = random.uniform(0.30, 0.70)
    tx = box["x"] + box["width"] * fx
    ty = box["y"] + box["height"] * fy
    steps = random.randint(8, 18)
    page.mouse.move(tx, ty, steps=steps)
    page.wait_for_timeout(random.randint(120, 420))
    page.mouse.click(tx, ty)

Twenty lines. Here’s what each part earns:

  • scroll_into_view_if_needed — if the button is below the fold, a real human scrolled to it. Playwright’s own .click() does this, but we’re not calling .click(), we’re calling .mouse.click() at absolute coordinates, which doesn’t.
  • fx, fy ∈ [0.30, 0.70] — nobody clicks dead-center of a button. The 30–70% window keeps you away from the edge while still being off-axis.
  • steps=8..18 — page.mouse.move(x, y, steps=n) interpolates n intermediate mousemove events between the current position and (x, y). Bot detectors watch the distribution of those events; a single teleport from (0,0) is a dead giveaway.
  • 120..420ms hover — the gap between the mouse arriving and the click firing. Humans hesitate. Bots don’t.
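For intuition, the steps=n interpolation produces a path like the one below. This is a standalone sketch of the idea, not Playwright's actual implementation (which lives in the driver):

```python
import random


def interpolate_path(x0, y0, x1, y1, steps):
    """One intermediate mousemove per step, ending exactly on the target."""
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]


# e.g. cursor travels from the viewport origin to a button at (300, 120)
path = interpolate_path(0, 0, 300, 120, steps=random.randint(8, 18))
print(path[-1])  # (300.0, 120.0) — lands on target; the intermediate
                 # points are what detectors inspect
```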

Full Bezier-curve libraries like ghost-cursor do more — acceleration curves, overshoot, micro-corrections. They also add a dependency, a Puppeteer bridge, and a handful of edge cases. Twenty lines of linear interpolation plus a hover covers roughly 80% of the signal for none of the cost. Revisit it only when you have evidence the simple version got caught.

6. CI setup with GitHub Actions

One job, sequential, per-test ::group:: blocks, single artifact upload:

# .github/workflows/e2e.yml
name: e2e

on:
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch:

jobs:
  run:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install system deps for Chrome
        run: |
          sudo apt-get update
          sudo apt-get install -y xvfb libnss3 libatk-bridge2.0-0 \
            libgbm1 libasound2t64  # libasound2 on runners older than 24.04

      - name: Install Python deps
        run: |
          pip install -r requirements.txt
          playwright install chromium

      - name: Clean old logs
        run: rm -rf latest_logs && mkdir -p latest_logs/junit

      - name: Run tests (sequential, one proxy)
        env:
          CAPSOLVER_API_KEY: ${{ secrets.CAPSOLVER_API_KEY }}
          WEBSHARE_PROXY: ${{ secrets.WEBSHARE_PROXY }}
        run: |
          failed=0
          for f in tests/test_*.py; do
            name=$(basename "$f" .py)
            echo "::group::${name}"
            xvfb-run -a pytest "$f" \
              --reruns 1 --reruns-delay 20 \
              --junit-xml="latest_logs/junit/${name}.xml" \
              -v || failed=1
            echo "::endgroup::"
          done
          exit $failed

      - name: Upload logs and screenshots
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: e2e-latest-logs
          path: latest_logs/
          retention-days: 14

The ::group:: wrapping is the quality-of-life upgrade nobody thinks about until they’ve scrolled through 4,000 lines of combined pytest output looking for which test actually broke. With groups, the Actions UI collapses each test to a one-liner; you click only the ones you care about.

The per-file --junit-xml is mandatory, not optional — if every test run writes to the same report.xml, you lose everything but the last one.

Worth spelling out why this beats a matrix strategy: you have one residential IP. Two runners hitting the store simultaneously from the same exit IP doubles your request rate without doubling your credibility, which is the exact failure mode Cloudflare watches for. Sequential is slower but it gets to the end.

7. What’s intentionally not solved

Per-request proxy rotation. The stack uses one static residential IP at any given time, on purpose — but “static” here means static for the duration of a session, not static forever. Webshare’s dashboard exposes a Replace Proxies setting (Proxy Settings → Replace Proxies) that auto-swaps the underlying IP when any of these trigger:

  • proxy unavailable for more than 15 minutes
  • proxy has low country confidence
  • proxy experiencing temporary slowdown

That’s the right layer for rotation. The client keeps pointing at the same Webshare endpoint (p.webshare.io:80), the cf_clearance cookie stays valid for the lifetime of one test run, and IP health is handled by the provider in the background. When Webshare rotates the upstream IP, the next test run gets a fresh cf_clearance on first challenge — no client-side pool logic, no round-robin, no cookie juggling.

What we explicitly don’t do is rotate the proxy per-request from inside the test. That’s the failure mode: Cloudflare issues cf_clearance bound to the IP it saw, you rotate mid-session, the next request arrives from a new IP carrying a clearance cookie that was minted for a different one, Cloudflare invalidates it, and you re-challenge every single navigation. Rotation belongs at the session boundary (provider-side, between runs), not the request boundary.

Rule of thumb for the split:

Layer                               Who handles it                          When
IP health, dead-proxy replacement   Webshare dashboard (Replace Proxies)    Background, auto
Per-session IP stickiness           Webshare static endpoint                Duration of one test run
Per-request rotation                Nobody. Don't do it.                    —

Full ghost-cursor Bezier curves. The lightweight human_click above is ~20 lines and buys ~80% of the humanness signal. If you find yourself caught specifically on mouse-motion fingerprints — not all Cloudflare fingerprints, not TLS, not headless giveaways — then swap it. Otherwise, leave it.

Parallel test execution. pytest-xdist would cut wall-clock time by 4x. It would also multiply your requests-per-second from one exit IP by 4x, and Cloudflare’s rate limiter is tighter than you think. Sequential with --reruns 1 is the setting that stays green.

8. Conclusion

Total build: about 300 lines of Python across conftest.py, auth_proxy.py, and a small catalog.py, plus one GitHub Actions file. It survives Cloudflare on a live Shopify store, it looks human enough that the challenge rate stays under 10%, and it scales comfortably to around ten flows on a single residential IP.

The pieces are independent on purpose. Lift human_click into a Selenium codebase and it still works. Use bypass_cloudflare with a different test runner and it still works. Keep the auth-bridge proxy around even if you throw the rest out — any time you need authenticated proxying from a tool that can’t do CONNECT auth, it’s the same sixty lines.

Adapt what you need, ignore what you don’t, and check the latest_logs/ artifact on every failed run. The screenshots are almost always faster than reading the stack trace.
