You point Playwright at a Shopify store fronted by Cloudflare. You get a 403 or a permanent challenge page. You try headless=False. Same result. You add a user agent. Same result. You add a residential proxy. Now you get rate-limited instead of blocked, which is progress of a sort.
This is a build log for the stack that got us past all of that — about 300 lines of Python and one GitHub Actions file, running ~10 checkout and catalog flows against a live store every day without tripping.
1. The problem
Four things are fighting you, and they compound:
Your Chrome looks wrong. Vanilla Playwright ships a Chromium build with navigator.webdriver = true, missing chrome.runtime, mismatched Navigator.permissions, and a suspicious WebGL vendor string. Cloudflare’s challenges.cloudflare.com/turnstile runs a few hundred of these checks in under a second. You lose before your first page.goto.
Your mouse doesn’t exist. A real user lands somewhere random on a button, often drifts past it, hovers for 200ms, then clicks. Playwright’s .click() teleports the cursor to the exact center of the element’s bounding box and fires a mousedown in the same frame. Bot detection systems have been pattern-matching that for years.
Your IP is hot. Datacenter IPs from AWS, GCP, Hetzner — all pre-flagged. Even free residential proxies arrive pre-burned because ten thousand other scrapers used them yesterday.
Your rotation is worse than your static IP. The naive fix — rotate the proxy on every request — is what tips Cloudflare from “mildly suspicious” to “definitely a bot.” Cloudflare’s cf_clearance cookie is bound to the IP it was issued to. Rotating nukes the clearance every time.
The stack below addresses all four.
2. The architecture
pytest + Playwright sync API
        │  (CDP over ws://127.0.0.1:<port>)
        ▼
SeleniumBase (uc=True) Chrome — stealth fingerprint
        │  (HTTP CONNECT)
        ▼
Local auth-bridge proxy 127.0.0.1:18888
        │  (basic auth injected)
        ▼
Webshare static residential proxy
  (auto-replaced on health signals — unavailable >15 min,
   low country confidence, slowdown — via the dashboard
   Replace Proxies setting)
        │
        ▼
Shopify store (behind Cloudflare)
A few deliberate choices here:
- SeleniumBase drives Chrome, Playwright drives the test. SeleniumBase's `uc=True` mode handles undetected-chromedriver patching, argument scrubbing, and CDP flags. Once Chrome is up, we connect Playwright to the existing browser over CDP and throw the SeleniumBase driver away except for its Cloudflare solver. The test code gets Playwright's much nicer API.
- Static residential, with provider-side auto-replace. One Webshare residential endpoint. The `cf_clearance` cookie sticks to it for the lifetime of a session. Rotation happens out of band at the dashboard layer, not per-request from the client.
- Local auth bridge. Chrome cannot handle HTTP proxy auth via CDP. You cannot pass `user:pass@host:port` through `--proxy-server`. The workaround is a tiny local forwarder that listens on `127.0.0.1` unauthenticated, injects basic auth, and forwards to Webshare.
- Two-tier Cloudflare bypass. SeleniumBase has a free built-in Turnstile clicker (`sb.uc_gui_click_captcha()`). It works maybe 70% of the time. For the other 30%, we call CapSolver's `AntiCloudflareTask` and inject the returned `cf_clearance`. CapSolver is one option; 2Captcha, Anti-Captcha, and a few others expose equivalent APIs.
3. Setup walkthrough
Dependencies
# requirements.txt
seleniumbase==4.32.7
playwright==1.47.0
pytest==8.3.3
pytest-rerunfailures==14.0
python-dotenv==1.0.1
requests==2.32.3
pip install -r requirements.txt
playwright install chromium
Environment
# .env
CAPSOLVER_API_KEY=CAP-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
WEBSHARE_PROXY=user-session-abc123:[email protected]:80
Only one proxy, one key. If either is missing, the fixtures skip the run with a clear message instead of falling back to a clean browser and pretending everything is fine.
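That skip-don't-fallback behavior is worth a tiny helper. A sketch of one way to do it (the `require_env` name is mine, not from the codebase):

```python
import os


def require_env(*names: str) -> dict:
    """Return the named environment variables, or raise with a message
    listing everything that is missing — no silent fallback."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {n: os.environ[n] for n in names}
```

Inside the fixtures you would catch the error and call `pytest.skip` with its message; the point is that a missing key stops the run instead of letting a clean, un-proxied browser limp onward and produce misleading green.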
The local auth-bridge proxy (appendix, but you need it)
Chrome can’t do HTTP CONNECT with basic auth via command-line flags. This small forwarder sits between Chrome and Webshare and adds the Proxy-Authorization header for you:
# tests/support/auth_proxy.py
import base64
import select
import socket
import threading
def start_auth_proxy(upstream_host: str, upstream_port: int,
username: str, password: str,
listen_port: int = 18888) -> threading.Thread:
creds = base64.b64encode(f"{username}:{password}".encode()).decode()
def handle(client: socket.socket) -> None:
try:
request = b""
while b"\r\n\r\n" not in request:
chunk = client.recv(4096)
if not chunk:
return
request += chunk
# inject Proxy-Authorization if missing
if b"Proxy-Authorization:" not in request:
headers_end = request.find(b"\r\n\r\n")
injected = (
request[:headers_end]
+ f"\r\nProxy-Authorization: Basic {creds}".encode()
+ request[headers_end:]
)
request = injected
            upstream = socket.create_connection((upstream_host, upstream_port))
            try:
                upstream.sendall(request)
                pipe(client, upstream)
            finally:
                upstream.close()  # don't leak the upstream half of the tunnel
        except Exception:
            pass
        finally:
            client.close()
def pipe(a: socket.socket, b: socket.socket) -> None:
sockets = [a, b]
while True:
r, _, _ = select.select(sockets, [], [], 30)
if not r:
break
for s in r:
data = s.recv(8192)
if not data:
return
(b if s is a else a).sendall(data)
def serve() -> None:
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", listen_port))
server.listen(16)
while True:
client, _ = server.accept()
threading.Thread(target=handle, args=(client,), daemon=True).start()
t = threading.Thread(target=serve, daemon=True)
t.start()
return t
It’s sixty lines, it handles CONNECT tunnelling, and you’ll never think about it again.
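The header-injection step is the only part with any subtlety, and it can be unit-tested without opening a socket. A standalone sketch of the same logic (the `inject_proxy_auth` helper is hypothetical, mirroring the forwarder above):

```python
import base64


def inject_proxy_auth(request: bytes, username: str, password: str) -> bytes:
    """Add a Proxy-Authorization header to a raw HTTP/CONNECT request
    if one is not already present."""
    if b"Proxy-Authorization:" in request:
        return request
    creds = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers_end = request.find(b"\r\n\r\n")
    if headers_end == -1:
        return request  # malformed request; pass through untouched
    return (
        request[:headers_end]
        + f"\r\nProxy-Authorization: Basic {creds}".encode()
        + request[headers_end:]
    )
```

Calling it twice is a no-op, which matches the forwarder's "if missing" guard.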
conftest.py — the real work
# tests/conftest.py
import os
import random
import re
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any
import pytest
import requests
from dotenv import load_dotenv
from playwright.sync_api import Page, BrowserContext, expect, sync_playwright
from seleniumbase import SB
from tests.support.auth_proxy import start_auth_proxy
load_dotenv()
SCREENSHOT_ROOT = Path("latest_logs/screenshots")
LOCAL_PROXY_PORT = 18888
@dataclass(frozen=True)
class StealthPage:
sb: Any
page: Page
context: BrowserContext
proxy_info: dict
def _parse_proxy(raw: str) -> dict:
    # user:pass@host:port — rsplit from the right so passwords
    # containing "@" or ":" still parse correctly
    auth, host = raw.rsplit("@", 1)
    user, pw = auth.split(":", 1)
    hostname, port = host.rsplit(":", 1)
    return {"host": hostname, "port": int(port), "user": user, "pass": pw}
def bypass_cloudflare(sb: Any, page: Page, target_url: str,
proxy_info: dict) -> None:
challenge_markers = [
"challenge-platform",
"cf-challenge-running",
"Just a moment",
]
title = page.title() or ""
html = page.content()
challenged = any(m in title or m in html for m in challenge_markers)
if not challenged:
return
# Tier 1 — SeleniumBase free solver
try:
sb.uc_gui_click_captcha()
page.wait_for_timeout(4000)
if "challenge-platform" not in page.content():
return
except Exception:
pass
# Tier 2 — CapSolver AntiCloudflareTask
api_key = os.environ["CAPSOLVER_API_KEY"]
create = requests.post(
"https://api.capsolver.com/createTask",
json={
"clientKey": api_key,
"task": {
"type": "AntiCloudflareTask",
"websiteURL": target_url,
"proxy": f"http:{proxy_info['host']}:{proxy_info['port']}"
f":{proxy_info['user']}:{proxy_info['pass']}",
},
},
timeout=30,
).json()
task_id = create.get("taskId")
if not task_id:
raise RuntimeError(f"CapSolver createTask failed: {create}")
deadline = time.time() + 180
while time.time() < deadline:
time.sleep(4)
result = requests.post(
"https://api.capsolver.com/getTaskResult",
json={"clientKey": api_key, "taskId": task_id},
timeout=30,
).json()
if result.get("status") == "ready":
solution = result["solution"]
clearance = solution["cookies"]["cf_clearance"]
ua = solution["userAgent"]
page.context.add_cookies([{
"name": "cf_clearance",
"value": clearance,
"domain": solution.get("domain", ".shopify.com"),
"path": "/",
"secure": True,
"httpOnly": True,
}])
page.context.set_extra_http_headers({"User-Agent": ua})
page.reload(wait_until="domcontentloaded")
return
raise RuntimeError("CapSolver timed out after 180s")
_AUTH_PROXY_STARTED = False  # the bridge binds 18888 once; a second bind would fail


@pytest.fixture(scope="function")
def stealth_page(request):
    raw_proxy = os.environ.get("WEBSHARE_PROXY")
    if not raw_proxy:
        pytest.skip("WEBSHARE_PROXY not set")
    if not os.environ.get("CAPSOLVER_API_KEY"):
        pytest.skip("CAPSOLVER_API_KEY not set")
    proxy_info = _parse_proxy(raw_proxy)
    global _AUTH_PROXY_STARTED
    if not _AUTH_PROXY_STARTED:
        start_auth_proxy(
            upstream_host=proxy_info["host"],
            upstream_port=proxy_info["port"],
            username=proxy_info["user"],
            password=proxy_info["pass"],
            listen_port=LOCAL_PROXY_PORT,
        )
        _AUTH_PROXY_STARTED = True
        time.sleep(0.3)  # let the listener bind
with SB(uc=True, headless=False,
proxy=f"127.0.0.1:{LOCAL_PROXY_PORT}") as sb:
cdp_port = sb.driver.capabilities["goog:chromeOptions"]["debuggerAddress"]
with sync_playwright() as pw:
browser = pw.chromium.connect_over_cdp(f"http://{cdp_port}")
context = browser.contexts[0]
page = context.pages[0] if context.pages else context.new_page()
yield StealthPage(sb=sb, page=page,
context=context, proxy_info=proxy_info)
@pytest.fixture
def step(request):
test_slug = re.sub(r"[^a-z0-9]+", "_", request.node.name.lower()).strip("_")
folder = SCREENSHOT_ROOT / test_slug
folder.mkdir(parents=True, exist_ok=True)
counter = {"n": 0}
def _step(name: str, page: Page) -> None:
counter["n"] += 1
n = counter["n"]
slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
print(f"\n===== STEP {n:02d}: {name} =====", flush=True)
page.screenshot(path=str(folder / f"{n:02d}_{slug}.png"),
full_page=True)
return _step
@pytest.fixture
def random_sleep():
def _sleep(page: Page, lo: float = 2.0, hi: float = 5.0) -> None:
close_cookie_popup(page)
page.wait_for_timeout(int(random.uniform(lo, hi) * 1000))
return _sleep
def close_cookie_popup(page: Page) -> None:
selectors = [
'button:has-text("Accept")',
'button:has-text("Accept all")',
'button:has-text("Alle akzeptieren")',
'[aria-label*="accept" i]',
'#onetrust-accept-btn-handler',
]
for sel in selectors:
try:
btn = page.locator(sel).first
if btn.is_visible(timeout=500):
btn.click(timeout=1000)
return
except Exception:
continue
# JS fallback — some Shopify apps refuse to respect the click
try:
page.evaluate("""
() => {
const nodes = document.querySelectorAll(
'button, [role="button"]'
);
for (const n of nodes) {
const t = (n.innerText || '').toLowerCase();
if (t.includes('accept') || t.includes('akzeptieren')) {
n.click();
return;
}
}
}
""")
except Exception:
pass
@pytest.fixture
def human_click():
def _click(page: Page, selector: str) -> None:
locator = page.locator(selector).first
locator.scroll_into_view_if_needed()
box = locator.bounding_box()
if box is None:
locator.click()
return
# land somewhere off-center
fx = random.uniform(0.30, 0.70)
fy = random.uniform(0.30, 0.70)
tx = box["x"] + box["width"] * fx
ty = box["y"] + box["height"] * fy
steps = random.randint(8, 18)
page.mouse.move(tx, ty, steps=steps)
page.wait_for_timeout(random.randint(120, 420)) # hover
page.mouse.click(tx, ty)
return _click
A few things worth calling out:
The StealthPage dataclass is frozen. You cannot reassign .page halfway through a test and confuse yourself. Tests that need the raw SeleniumBase driver still have it via stealth_page.sb; tests that only need Playwright use stealth_page.page.
bypass_cloudflare is a no-op when there’s no challenge. You can call it freely after every navigation without paying for a CapSolver task. Only an actual challenge page triggers the paid path.
close_cookie_popup runs inside random_sleep. Cookie dialogs show up unpredictably — after the first paint, after a GDPR geolocation check, sometimes after the first scroll. Tying it to the sleep rhythm means you don’t have to think about it in test code.
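The create-then-poll shape inside bypass_cloudflare recurs with every solver API. A generic sketch of that loop, decoupled from CapSolver (the `poll_until` name is mine):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(check: Callable[[], Optional[T]], timeout: float,
               interval: float = 4.0) -> T:
    """Call `check` every `interval` seconds until it returns a non-None
    result or `timeout` seconds elapse. Raises TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = check()
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError(f"no result within {timeout:.0f}s")
```

With a helper like this, the tier-2 body reduces to one `createTask` call followed by `poll_until(fetch_result, timeout=180)`, where `fetch_result` returns the solution dict once the task status is ready.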
4. The test-writing pattern
Before the first test, lock prices and selectors in frozen dataclasses. When the store later moves to an API-driven price, you swap the dataclass; the tests don’t move.
# tests/fixtures/catalog.py
from dataclasses import dataclass
@dataclass(frozen=True)
class Product:
slug: str
title: str
price_eur: str # "29.95" — compare as string to avoid float drift
@dataclass(frozen=True)
class Selectors:
add_to_cart: str = 'button[name="add"]'
cart_drawer: str = '[data-cart-drawer]'
checkout_button: str = 'button:has-text("Checkout")'
product_price: str = '[data-product-price]'
HERO_TEE = Product(slug="hero-tee", title="Hero Tee", price_eur="29.95")
SEL = Selectors()
And the rhythm in a test — step → human_click → random_sleep, every action, including the last one:
# tests/test_add_to_cart.py
from playwright.sync_api import expect
from tests.fixtures.catalog import HERO_TEE, SEL
from tests.conftest import bypass_cloudflare
BASE = "https://store.example.com"
def test_add_to_cart_shows_correct_price(stealth_page, step,
random_sleep, human_click):
sp = stealth_page
page = sp.page
step("open home", page)
page.goto(BASE, wait_until="domcontentloaded")
bypass_cloudflare(sp.sb, page, BASE, sp.proxy_info)
random_sleep(page)
step("navigate to product", page)
page.goto(f"{BASE}/products/{HERO_TEE.slug}",
wait_until="domcontentloaded")
bypass_cloudflare(sp.sb, page, f"{BASE}/products/{HERO_TEE.slug}",
sp.proxy_info)
random_sleep(page)
step("verify listed price", page)
expect(page.locator(SEL.product_price)).to_contain_text(
HERO_TEE.price_eur, timeout=15_000,
)
random_sleep(page)
step("add to cart", page)
human_click(page, SEL.add_to_cart)
random_sleep(page)
step("verify cart drawer", page)
expect(page.locator(SEL.cart_drawer)).to_contain_text(
HERO_TEE.title, timeout=15_000,
)
expect(page.locator(SEL.cart_drawer)).to_contain_text(
HERO_TEE.price_eur, timeout=15_000,
)
random_sleep(page)
The expect(...).to_contain_text(timeout=...) form buys you two things: it retries on Playwright's internal polling loop, so the async DOM settles naturally, and it fails with a readable diff instead of a stale assertion snapshot. Never use locator.inner_text() == "…" for prices on a Shopify store — Shopify's cart drawer mutates in two passes and you will flake.
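Comparing prices as strings sidesteps float drift. If you ever need arithmetic on them (cart totals, quantity math), `decimal.Decimal` preserves that exactness; a sketch reusing the `price_eur` string convention (the `cart_total` helper is mine):

```python
from decimal import Decimal


def cart_total(prices_eur: list[str], quantity: int = 1) -> str:
    """Sum string prices exactly and return a string, so the result
    can still be compared against the page with to_contain_text."""
    total = sum(Decimal(p) for p in prices_eur) * quantity
    return f"{total:.2f}"
```

The float equivalent drifts almost immediately (0.1 + 0.2 != 0.3 in binary floats), which is exactly the class of flake the string convention exists to prevent.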
5. The human_click helper, explained
Here it is again, isolated, so you can lift it into another project without importing the rest of the file:
def human_click(page, selector):
locator = page.locator(selector).first
locator.scroll_into_view_if_needed()
box = locator.bounding_box()
if box is None:
locator.click()
return
fx = random.uniform(0.30, 0.70)
fy = random.uniform(0.30, 0.70)
tx = box["x"] + box["width"] * fx
ty = box["y"] + box["height"] * fy
steps = random.randint(8, 18)
page.mouse.move(tx, ty, steps=steps)
page.wait_for_timeout(random.randint(120, 420))
page.mouse.click(tx, ty)
Twenty lines. Here’s what each part earns:
- `scroll_into_view_if_needed` — if the button is below the fold, a real human scrolled to it. Playwright's own `.click()` does this, but we're not calling `.click()`, we're calling `.mouse.click()` at absolute coordinates, which doesn't.
- `fx, fy ∈ [0.30, 0.70]` — nobody clicks dead-center of a button. The 30–70% window keeps you away from the edge while still being off-axis.
- `steps=8..18` — `page.mouse.move(x, y, steps=n)` interpolates `n` intermediate `mousemove` events between the current position and `(x, y)`. Bot detectors watch the distribution of those events; a single teleport from `(0, 0)` is a dead giveaway.
- `120..420 ms` hover — the gap between the mouse arriving and the click firing. Humans hesitate. Bots don't.
Full Bezier-curve libraries like ghost-cursor do more — acceleration curves, overshoot, micro-corrections. They also add a dependency, a Puppeteer bridge, and a handful of edge cases. Twenty lines of linear interpolation plus a hover covers roughly 80% of the signal for none of the cost. Revisit it only when you have evidence the simple version got caught.
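For intuition, the interpolation behind steps=n is just linear spacing of intermediate coordinates. A sketch of the path it traces (not Playwright's actual implementation, which also spaces the events in time):

```python
def lerp_path(x0: float, y0: float, x1: float, y1: float,
              steps: int) -> list[tuple[float, float]]:
    """The n intermediate points a steps=n mouse move passes through,
    ending exactly at the target. Linear only — no curve, no overshoot."""
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]
```

Evenly spaced points are still detectably regular in principle, which is why the randomized step count (8..18) matters: it varies the spacing from click to click.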
6. CI setup with GitHub Actions
One job, sequential, per-test ::group:: blocks, single artifact upload:
# .github/workflows/e2e.yml
name: e2e
on:
schedule:
- cron: "0 6 * * *"
workflow_dispatch:
jobs:
run:
runs-on: ubuntu-latest
timeout-minutes: 60
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install system deps for Chrome
run: |
sudo apt-get update
sudo apt-get install -y xvfb libnss3 libatk-bridge2.0-0 \
libgbm1 libasound2
- name: Install Python deps
run: |
pip install -r requirements.txt
playwright install chromium
- name: Clean old logs
run: rm -rf latest_logs && mkdir -p latest_logs/junit
- name: Run tests (sequential, one proxy)
env:
CAPSOLVER_API_KEY: ${{ secrets.CAPSOLVER_API_KEY }}
WEBSHARE_PROXY: ${{ secrets.WEBSHARE_PROXY }}
        run: |
failed=0
for f in tests/test_*.py; do
name=$(basename "$f" .py)
echo "::group::${name}"
xvfb-run -a pytest "$f" \
--reruns 1 --reruns-delay 20 \
--junit-xml="latest_logs/junit/${name}.xml" \
-v || failed=1
echo "::endgroup::"
done
exit $failed
- name: Upload logs and screenshots
if: always()
uses: actions/upload-artifact@v4
with:
name: e2e-latest-logs
path: latest_logs/
retention-days: 14
The ::group:: wrapping is the quality-of-life upgrade nobody thinks about until they’ve scrolled through 4,000 lines of combined pytest output looking for which test actually broke. With groups, the Actions UI collapses each test to a one-liner; you click only the ones you care about.
The per-file --junit-xml is mandatory, not optional — if every test run writes to the same report.xml, you lose everything but the last one.
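Those per-file reports are trivially machine-readable afterwards. A sketch that folds them into one pass/fail summary with the stdlib, assuming the standard junit attributes (`tests`, `failures`, `errors`) on each testsuite tag:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_junit(junit_dir: str) -> dict:
    """Aggregate counts across every per-file junit XML in a directory."""
    totals = {"tests": 0, "failures": 0, "errors": 0}
    for report in sorted(Path(junit_dir).glob("*.xml")):
        root = ET.fromstring(report.read_text())
        # iter() matches the root itself, so this handles both a bare
        # <testsuite> and a <testsuites> wrapper
        for suite in root.iter("testsuite"):
            for key in totals:
                totals[key] += int(suite.get(key, 0))
    return totals
```

Handy as a final CI step that prints the totals before `exit $failed`, or locally when you only want the headline from latest_logs/junit/.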
Worth spelling out why this beats a matrix strategy: you have one residential IP. Two runners hitting the store simultaneously from the same exit IP doubles your request rate without doubling your credibility, which is the exact failure mode Cloudflare watches for. Sequential is slower but it gets to the end.
7. What’s intentionally not solved
Per-request proxy rotation. The stack uses one static residential IP at any given time, on purpose — but “static” here means static for the duration of a session, not static forever. Webshare’s dashboard exposes a Replace Proxies setting (Proxy Settings → Replace Proxies) that auto-swaps the underlying IP when any of these trigger:
- proxy unavailable for more than 15 minutes
- proxy has low country confidence
- proxy experiencing temporary slowdown
That’s the right layer for rotation. The client keeps pointing at the same Webshare endpoint (p.webshare.io:80), the cf_clearance cookie stays valid for the lifetime of one test run, and IP health is handled by the provider in the background. When Webshare rotates the upstream IP, the next test run gets a fresh cf_clearance on first challenge — no client-side pool logic, no round-robin, no cookie juggling.
What we explicitly don’t do is rotate the proxy per-request from inside the test. That’s the failure mode: Cloudflare issues cf_clearance bound to the IP it saw, you rotate mid-session, the next request arrives from a new IP carrying a clearance cookie that was minted for a different one, Cloudflare invalidates it, and you re-challenge every single navigation. Rotation belongs at the session boundary (provider-side, between runs), not the request boundary.
Rule of thumb for the split:
| Layer | Who handles it | When |
|---|---|---|
| IP health, dead-proxy replacement | Webshare dashboard (Replace Proxies) | Background, auto |
| Per-session IP stickiness | Webshare static endpoint | Duration of one test run |
| Per-request rotation | Nobody. Don’t do it. | — |
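The clearance-to-IP binding is easy to model. A toy sketch — not Cloudflare's actual mechanism, just the one invariant that makes per-request rotation fatal:

```python
import secrets


class ClearanceStore:
    """Toy model: a clearance token is only valid from the IP
    it was issued to."""

    def __init__(self) -> None:
        self._issued: dict[str, str] = {}  # token -> issuing IP

    def issue(self, ip: str) -> str:
        token = secrets.token_hex(8)
        self._issued[token] = ip
        return token

    def valid(self, token: str, ip: str) -> bool:
        return self._issued.get(token) == ip
```

Rotate mid-session and every subsequent request presents a token minted for a different IP, so every single navigation re-challenges — the failure mode described above.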
Full ghost-cursor Bezier curves. The lightweight human_click above is ~20 lines and buys ~80% of the humanness signal. If you find yourself caught specifically on mouse-motion fingerprints — not all Cloudflare fingerprints, not TLS, not headless giveaways — then swap it. Otherwise, leave it.
Parallel test execution. pytest-xdist would cut wall-clock time by 4x. It would also multiply your requests-per-second from one exit IP by 4x, and Cloudflare’s rate limiter is tighter than you think. Sequential with --reruns 1 is the setting that stays green.
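If you do ever experiment with limited parallelism, put a shared minimum-interval throttle in front of navigation so one exit IP never bursts. A sketch (the `Throttle` class is mine, not part of the stack above):

```python
import threading
import time


class Throttle:
    """Enforce a minimum interval between actions across threads,
    so requests from a single exit IP never burst."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last = 0.0

    def wait(self) -> None:
        with self._lock:
            delay = self._last + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            self._last = time.monotonic()
```

Every worker calls `throttle.wait()` before `page.goto`; total request rate from the IP stays bounded no matter how many workers you add. Sequential execution is still the simpler and safer default.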
8. Conclusion
Total build: about 300 lines of Python across conftest.py, auth_proxy.py, and a small catalog.py, plus one GitHub Actions file. It survives Cloudflare on a live Shopify store, it looks human enough that the challenge rate stays under 10%, and it scales comfortably to around ten flows on a single residential IP.
The pieces are independent on purpose. Lift human_click into a Selenium codebase and it still works. Use bypass_cloudflare with a different test runner and it still works. Keep the auth-bridge proxy around even if you throw the rest out — any time you need authenticated proxying from a tool that can’t do CONNECT auth, it’s the same sixty lines.
Adapt what you need, ignore what you don’t, and check the latest_logs/ artifact on every failed run. The screenshots are almost always faster than reading the stack trace.

