---
name: playwright-macro-builder
description: Build browser automation macros using Playwright with stealth capabilities. Use when creating undetectable browser automation scripts in ./macros.
---

# Playwright Macro Builder

## Identity

You are building **stealth browser automation macros** using Playwright. Your macros live in `./macros/` and are designed to evade bot detection while automating repetitive browser tasks.

## Principles

- **Screenshot-First Development**: Capture screenshots at every step to verify state before acting
- **Stealth by Default**: Use Patchright or playwright-stealth to avoid detection
- **Human-Like Behavior**: Add realistic delays, mouse movements, and typing patterns
- **Fail-Safe**: Every action must have verification and graceful error handling
- **Reproducible**: Macros must work reliably across runs with clear state management

## Step Back: Before Building Any Macro

Before writing automation code, challenge yourself:

### 1. Is Automation Appropriate?

> "Am I automating something I have legitimate access to?"

- Is this for a service I own or have explicit permission to automate?
- Could this violate Terms of Service?
- Is there an official API I should use instead?

### 2. Is Stealth Necessary?

> "Why does this need to be undetectable?"

- Am I bypassing rate limits that exist for good reasons?
- Would the site operator object to this automation?
- Is there a legitimate reason (e.g., accessibility, testing my own site)?

### 3. Is This the Right Tool?

> "Should I use Playwright at all?"

- Would a simple HTTP client suffice?
- Is there a browser extension that does this?
- Would manual operation be faster for a one-time task?

**After step back:** Document your justification in the macro's README.
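
To make that step enforceable rather than advisory, `main.py` can refuse to run until the README contains a justification. This is a minimal sketch that assumes a hypothetical convention: the justification lives under a `## Justification` heading. `has_justification` is an illustrative name, not part of any library.

```python
from pathlib import Path


def has_justification(readme: Path) -> bool:
    """Return True if README.md has a non-empty '## Justification' section.

    Assumes (hypothetical convention) the section is a level-2 heading named
    exactly 'Justification'; it ends at the next level-2 heading.
    """
    if not readme.is_file():
        return False
    in_section = False
    for line in readme.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Entering or leaving the Justification section
            in_section = stripped[3:].strip().lower() == "justification"
            continue
        if in_section and stripped:
            return True  # at least one non-blank line of justification
    return False
```

For example, the first lines of `main()` could be `if not has_justification(Path(__file__).parent / "README.md"): raise SystemExit("Add a justification to README.md first")`.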

## Technology Stack

### Primary: Patchright (Recommended)

```bash
pip install patchright
patchright install chromium
```

Patchright is an undetected fork of Playwright that patches detection vectors at the source level.

### Alternative: playwright-stealth

```bash
pip install playwright playwright-stealth
playwright install chromium
```

Use when Patchright isn't available or for simpler use cases.
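
If a macro should run with whichever library is installed, a small loader can prefer Patchright and fall back to stock Playwright. This sketch assumes both packages expose `async_playwright` from their `async_api` module (Patchright is designed as a drop-in replacement); `pick_async_api` is a hypothetical helper. It does not apply playwright-stealth for you: that plugin's API differs between major versions, so follow the README of the version you pin.

```python
import importlib
import importlib.util


def pick_async_api():
    """Return patchright.async_api if installed, else playwright.async_api."""
    for name in ("patchright.async_api", "playwright.async_api"):
        try:
            # find_spec on a dotted name imports the parent package first,
            # which raises ModuleNotFoundError when that package is absent
            if importlib.util.find_spec(name) is not None:
                return importlib.import_module(name)
        except ModuleNotFoundError:
            continue
    raise RuntimeError("Install patchright or playwright first")
```

Macro code can then use `api = pick_async_api()` and `async with api.async_playwright() as p:` unchanged.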

## Macro Structure

Every macro lives in `./macros/<name>/` with this structure:

```
macros/
└── <macro-name>/
    ├── README.md          # Purpose, justification, usage
    ├── main.py            # Entry point
    ├── config.py          # Configuration (no secrets!)
    ├── steps/             # Individual step modules
    │   ├── __init__.py
    │   ├── step_01_login.py
    │   ├── step_02_navigate.py
    │   └── step_03_extract.py
    ├── screenshots/       # Auto-captured verification screenshots
    │   └── .gitkeep
    ├── requirements.txt   # Dependencies
    └── .env.example       # Template for secrets
```
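
Creating this layout by hand is easy to get wrong, so a short scaffolding script can generate it. This is a minimal sketch using only `pathlib`; `scaffold_macro` and the placeholder file contents are illustrative assumptions, and the numbered step modules are left for you to write. Existing files are never overwritten, so it is safe to rerun.

```python
from pathlib import Path

# Placeholder contents for the layout above (hypothetical defaults)
SCAFFOLD = {
    "README.md": "# {name}\n\n## Purpose\n\n## Justification\n\n## Usage\n",
    "main.py": "",
    "config.py": "# Configuration only; secrets belong in .env\n",
    "steps/__init__.py": "",
    "screenshots/.gitkeep": "",
    "requirements.txt": "",
    ".env.example": "",
}


def scaffold_macro(name: str, root: Path = Path("macros")) -> Path:
    """Create <root>/<name>/ with the standard layout without overwriting files."""
    macro_dir = root / name
    for rel_path, template in SCAFFOLD.items():
        target = macro_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)  # creates steps/, screenshots/
        if not target.exists():
            target.write_text(template.format(name=name), encoding="utf-8")
    return macro_dir
```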

## Do

1. **Always screenshot before acting**

```python
async def click_button(page, selector: str, step_name: str):
    await page.screenshot(path=f"screenshots/{step_name}_before.png")
    await page.click(selector)
    await page.wait_for_load_state("networkidle")
    await page.screenshot(path=f"screenshots/{step_name}_after.png")
```

2. **Use Patchright's stealth context**

```python
from patchright.async_api import async_playwright


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
            locale="en-US",
            timezone_id="America/New_York",
        )
        # ... open pages from `context` and run the macro steps here
```

3. **Add human-like delays**

```python
import random
import asyncio


async def human_delay(min_ms=500, max_ms=2000):
    delay = random.randint(min_ms, max_ms) / 1000
    await asyncio.sleep(delay)


async def human_type(page, selector: str, text: str):
    await page.click(selector)
    for char in text:
        await page.keyboard.type(char)
        await asyncio.sleep(random.uniform(0.05, 0.15))
```

4. **Verify state before proceeding**

```python
from patchright.async_api import TimeoutError as PlaywrightTimeoutError


async def wait_for_element(page, selector: str, timeout: int = 10000) -> bool:
    try:
        await page.wait_for_selector(selector, timeout=timeout)
        return True
    except PlaywrightTimeoutError as exc:
        # Capture the page state that caused the failure before re-raising
        await page.screenshot(path="screenshots/error_missing_element.png")
        raise RuntimeError(f"Element not found: {selector}") from exc
```

5. **Use explicit waits, not sleep**

```python
# Good
await page.wait_for_selector("#result")
await page.wait_for_load_state("networkidle")

# Bad
await asyncio.sleep(5)
```

6. **Rotate fingerprints for repeated runs**

```python
import random

VIEWPORTS = [
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
    {"width": 1536, "height": 864},
]

viewport = random.choice(VIEWPORTS)
```

7. **Store credentials in .env, never in code**

```python
from dotenv import load_dotenv
import os

load_dotenv()
USERNAME = os.getenv("MACRO_USERNAME")
PASSWORD = os.getenv("MACRO_PASSWORD")
```
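
`os.getenv` silently returns `None` for a missing variable, which tends to surface later as a confusing login failure. A fail-fast check at startup avoids that. This is a sketch; `require_env` is a hypothetical helper, and it assumes `load_dotenv()` has already copied `.env` into `os.environ`.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested variables, failing fast if any are unset or empty."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        # Name only the missing keys; never echo secret values
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}
```

Typical use is `creds = require_env("MACRO_USERNAME", "MACRO_PASSWORD")` directly after `load_dotenv()`.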

## Do Not

1. **Never hardcode credentials or secrets**

```python
import os

# WRONG
password = "hunter2"

# RIGHT
password = os.getenv("MACRO_PASSWORD")
```

2. **Never skip screenshot verification**

```python
# WRONG
await page.click("#submit")

# RIGHT
await page.screenshot(path="screenshots/before_submit.png")
await page.click("#submit")
await page.screenshot(path="screenshots/after_submit.png")
```

3. **Never use default User-Agent**

```python
# WRONG - in headless mode the default UA contains "HeadlessChrome"
context = await browser.new_context()

# RIGHT
context = await browser.new_context(
    user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
```

4. **Never ignore errors silently**

```python
import logging

# WRONG
try:
    await page.click("#button")
except:
    pass

# RIGHT
try:
    await page.click("#button")
except Exception as e:
    await page.screenshot(path="screenshots/error.png")
    logging.error("Click failed: %s", e)
    raise
```

5. **Never run at machine speed**

```python
# WRONG - instant, bot-like
await page.fill("#search", "query")
await page.click("#submit")

# RIGHT - human-like
await human_type(page, "#search", "query")
await human_delay(300, 800)
await page.click("#submit")
```

6. **Never commit screenshots to git** (add them to `.gitignore`)

7. **Never automate services without legitimate access**
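
For item 6, a helper can make sure the ignore rules exist before the first run. In this minimal sketch (`ensure_gitignore` is a hypothetical name), the pattern `screenshots/*.png` is used instead of ignoring the whole `screenshots/` directory, so the `.gitkeep` placeholder from the macro layout stays tracked; `.env` is ignored while `.env.example` remains committable.

```python
from pathlib import Path

IGNORED = ("screenshots/*.png", ".env")


def ensure_gitignore(macro_dir: Path, patterns: tuple[str, ...] = IGNORED) -> list[str]:
    """Append any missing patterns to <macro_dir>/.gitignore; return what was added."""
    gitignore = macro_dir / ".gitignore"
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    added = [p for p in patterns if p not in existing]
    if added:
        # Start on a fresh line if the file lacks a trailing newline
        prefix = "" if not text or text.endswith("\n") else "\n"
        gitignore.write_text(text + prefix + "\n".join(added) + "\n", encoding="utf-8")
    return added
```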

## Stealth Checklist

Before running a macro, verify these evasion techniques:

- [ ] Using Patchright or playwright-stealth
- [ ] Custom User-Agent string (recent Chrome version)
- [ ] Realistic viewport dimensions
- [ ] Timezone matches expected locale
- [ ] WebGL vendor/renderer not exposed as headless
- [ ] `navigator.webdriver` is `undefined`
- [ ] Human-like typing delays (50-150ms per character)
- [ ] Random delays between actions (500-2000ms)
- [ ] Mouse movements before clicks (optional but recommended)
- [ ] Cookies/session persistence between runs if needed

## Template: Basic Macro

```python
#!/usr/bin/env python3
"""
Macro: [NAME]
Purpose: [DESCRIPTION]
Justification: [WHY AUTOMATION IS APPROPRIATE]
"""

import asyncio
import os
import random
from datetime import datetime
from pathlib import Path

from dotenv import load_dotenv
from patchright.async_api import async_playwright

load_dotenv()

SCREENSHOTS_DIR = Path(__file__).parent / "screenshots"
SCREENSHOTS_DIR.mkdir(exist_ok=True)


async def human_delay(min_ms=500, max_ms=2000):
    await asyncio.sleep(random.randint(min_ms, max_ms) / 1000)


async def screenshot(page, name: str):
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = SCREENSHOTS_DIR / f"{timestamp}_{name}.png"
    await page.screenshot(path=str(path))
    print(f"[Screenshot] {path}")


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=False,  # Set True for production
        )
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            locale="en-US",
            timezone_id="America/New_York",
        )
        page = await context.new_page()

        try:
            # Step 1: Navigate
            await page.goto("https://example.com")
            await page.wait_for_load_state("networkidle")
            await screenshot(page, "01_loaded")

            # Step 2: Your automation here
            await human_delay()
            # ...

            # Step 3: Verify success
            await screenshot(page, "99_complete")
            print("[OK] Macro completed successfully")

        except Exception as e:
            await screenshot(page, "error")
            print(f"[ERROR] {e}")
            raise

        finally:
            await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```
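
The template runs everything inline in `main()`, while the macro layout puts steps in `steps/`. One way to connect them is a runner that wraps every step in before/after screenshots, plus an error screenshot on failure. This is a sketch that assumes each step module exposes `async def run(page)`, which is a hypothetical convention, not something Playwright defines; `run_steps` is likewise an illustrative name.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical step signature: each module in steps/ exposes `async def run(page)`
Step = Callable[[object], Awaitable[None]]


async def run_steps(page, steps: list[tuple[str, Step]], screenshot) -> None:
    """Run named steps in order, screenshotting before and after each one.

    `screenshot` is an async callable with the template's signature:
    screenshot(page, name).
    """
    for index, (name, step) in enumerate(steps, start=1):
        label = f"{index:02d}_{name}"
        await screenshot(page, f"{label}_before")
        try:
            await step(page)
        except Exception:
            await screenshot(page, f"{label}_error")
            raise  # keep the original traceback
        await screenshot(page, f"{label}_after")
```

Inside the template's `try` block this might read `await run_steps(page, [("login", step_01_login.run), ("navigate", step_02_navigate.run)], screenshot)`, after `from steps import step_01_login, step_02_navigate`.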

## Decision Points

Stop and ask yourself:

- **"The site shows a CAPTCHA"** → Do not attempt to bypass. Stop and notify the user.
- **"I need to handle 2FA"** → Design for manual intervention or use app-based TOTP with user consent.
- **"The element structure changed"** → Take screenshot, update selectors, verify with new screenshots.
- **"Rate limiting detected"** → Increase delays, reduce frequency, or reconsider if automation is appropriate.
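
For the rate-limiting case, "increase delays" can be made concrete with capped exponential backoff. The sketch below is one common approach, not something the site or Playwright prescribes, and the parameter defaults are illustrative. When the generator is exhausted, treat that as the signal to stop and reconsider the automation rather than retrying indefinitely.

```python
import random


def backoff_delays(base_s=2.0, factor=2.0, cap_s=120.0, max_attempts=5, jitter=0.1):
    """Yield increasing wait times in seconds, capped at cap_s.

    The generator ends after max_attempts; the caller should then give up.
    """
    delay = base_s
    for _ in range(max_attempts):
        # Spread each wait by +/- jitter so retries are not perfectly regular
        yield min(cap_s, delay * random.uniform(1 - jitter, 1 + jitter))
        delay *= factor
```

In a macro, iterate over `backoff_delays()`, `await asyncio.sleep(wait)` each time the rate-limit response is still showing, and raise an error if the loop finishes without the page recovering.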

## Constraints

- **NEVER** attempt CAPTCHA solving or bypass
- **NEVER** automate financial transactions without explicit user confirmation per transaction
- **NEVER** scrape personal data without consent
- **NEVER** violate robots.txt for web scraping use cases
- **ALWAYS** include justification in macro README
- **ALWAYS** capture screenshots at every significant step
- **ALWAYS** use environment variables for credentials
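
The robots.txt rule can be checked with the standard library's `urllib.robotparser`. In this sketch, `allowed_by_robots` is a hypothetical wrapper that fetches `/robots.txt` from the site root unless you pass the file's text directly; `user_agent` is the user-agent token whose rules you want to honor (rules under `*` apply to everyone). robots.txt expresses the operator's wishes, not permission, so a `True` result does not replace the access and ToS questions from the step-back section.

```python
from typing import Optional
from urllib import robotparser
from urllib.parse import urljoin


def allowed_by_robots(url: str, user_agent: str, robots_txt: Optional[str] = None) -> bool:
    """Return True if robots.txt permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    if robots_txt is None:
        parser.set_url(urljoin(url, "/robots.txt"))
        parser.read()  # network fetch of the live file
    else:
        parser.parse(robots_txt.splitlines())  # offline check
    return parser.can_fetch(user_agent, url)
```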

## Output Format

When creating a new macro, produce:

1. `README.md` with purpose and justification
2. `main.py` using the template above
3. `requirements.txt` with pinned versions
4. `.env.example` with required variables
5. Initial test run with screenshots demonstrating it works
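
"Pinned" here means exact `==` versions. A quick check like the following can flag loose requirements before a run. It is a simplified sketch: `unpinned_requirements` is a hypothetical helper, and its regex only recognizes plain `name==version` lines (optionally with extras), so it is not a full requirements-file parser.

```python
import re

# A requirement pinned to an exact version, e.g. "name==1.0" or "name[extra]==1.0"
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[^=\s]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, fail the run early with `if unpinned_requirements(Path("requirements.txt").read_text()): raise SystemExit("Pin all dependencies")`.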

## Resources

- [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright) - Undetected Playwright fork
- [playwright-stealth](https://pypi.org/project/playwright-stealth/) - Stealth plugin for standard Playwright
- [ZenRows Guide](https://www.zenrows.com/blog/playwright-stealth) - Avoiding bot detection