How to Bypass CAPTCHA With Playwright

Yelyzaveta Nechytailo

2023-08-14 · 4 min read

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) have become vital to website security. Once the security apparatus of the website becomes suspicious of access (e.g., the access pattern does not follow normal human behavior), it loads a CAPTCHA (e.g., reCAPTCHA, sound, and image puzzles), preventing bots from further access.

Solving a CAPTCHA challenge once it loads can be extremely difficult. However, there are a few methods that make your script appear more human to the web firewall, so that the CAPTCHA never loads in the first place. We call this bypassing, or avoiding, a CAPTCHA.
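As a small illustration of what a more human access pattern can mean in practice, the sketch below pauses for a random interval between actions instead of firing them at a fixed rate. The helper name `human_pause` and its default bounds are assumptions made for this example; they are not part of Playwright or of any Oxylabs product, and randomized pacing alone is unlikely to defeat a serious anti-bot system.

```python
import random
import time


def human_pause(min_seconds: float = 1.0, max_seconds: float = 4.0) -> float:
    """Sleep for a random interval and return the chosen duration in seconds.

    The bounds are illustrative defaults, not values recommended by the tutorial.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    # Draw a different delay on every call so requests are not evenly spaced
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

Calling `human_pause()` between page visits or clicks spreads requests unevenly over time, which is one of several signals a firewall may consider.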

This step-by-step tutorial demonstrates how to use Playwright and Oxylabs’ Web Unblocker to bypass CAPTCHA challenges using Python. The tutorial will also discuss the perks of using Oxylabs’ Web Unblocker instead of the `playwright-stealth` library.

Note: Bypassing CAPTCHAs for illegal or malicious purposes violates ethical and legal standards. This tutorial is for educational purposes only, and we encourage readers to thoroughly read the Terms of Service of the target website to avoid legal issues.

Bypass CAPTCHA with Playwright

Playwright provides a robust and user-friendly API to interact with web pages, allowing developers to perform tasks such as clicking elements, filling out forms, and extracting data from dynamic websites. Its support for multiple browsers (like Chromium, Firefox, and WebKit) ensures cross-browser compatibility. Additionally, Playwright’s support for headless mode allows for hidden browser interactions, making it suitable for web scraping tasks.

Bypassing CAPTCHAs with Playwright alone is difficult as websites can detect traffic from automated and headless scripts. Fortunately, the `playwright-stealth` package can help.

Combining the stealth package with Playwright is a powerful way to bypass CAPTCHAs. The stealth package helps Playwright’s headless browser instances appear more human to websites, which reduces the chances of being detected.

Let’s demonstrate bypassing CAPTCHAs by creating a Python script that opens a web link in headless mode, captures a screenshot of the page, and saves it to local file storage. The script is successful if the screenshot shows the actual contents of the page instead of a CAPTCHA or reCAPTCHA screen.

The following steps set up the stealth package with Playwright in Python and build the script.

1. Preliminaries: Install the Playwright library and the stealth package, then download the Chromium build that Playwright drives.

pip install playwright playwright-stealth
playwright install chromium

2. Import the required modules: Use the synchronous version of the Playwright library for a straightforward and linear program flow.

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

3. Create a headless browser instance: Define the `capture_screenshot()` function, which encapsulates all the code needed to open a headless browser instance, visit the URL, and capture the screenshot. In this function, create a new `sync_playwright` instance and use it to launch the Chromium browser in headless mode.

# Define the function to capture screenshot
def capture_screenshot():
   # Create a playwright instance
   with sync_playwright() as play_wright:
       browser = play_wright.chromium.launch(headless=True)

       # Create a new context and page
       context = browser.new_context()
       page = context.new_page()

4. Apply the stealth settings: After creating the page, apply the stealth settings to it using the `playwright-stealth` package. These settings reduce the chances of detection by masking the signs of browser automation.

       # Apply the stealth settings
       stealth_sync(page)

5. Navigate to the page: Specify the target URL and open it using the page's `goto()` method.

       url = "http://sandbox.oxylabs.io/products"
       page.goto(url)

6. Take the screenshot: Wait for the page to load completely, take the screenshot, and close the browser.

       # Wait for the webpage to load completely
       page.wait_for_load_state("load")

       # Take a screenshot
       screenshot_filename = "oxylabs_screenshot.png"
       page.screenshot(path=screenshot_filename)

       # Close the browser
       browser.close()

       print("Done! You can check the screenshot...")

7. Call the function:

capture_screenshot()

8. Execute and test: Here is what our complete code looks like:

# Import the required modules
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync



# Define the function to capture the screenshot
def capture_screenshot():
   # Create a playwright instance
   with sync_playwright() as play_wright:
       browser = play_wright.chromium.launch(headless=True)

       # Create a new context and page
       context = browser.new_context()
       page = context.new_page()

       # Apply the stealth settings
       stealth_sync(page)

       # Navigate to the website
       url = "http://sandbox.oxylabs.io/products"
       page.goto(url)

       # Wait for the webpage to load completely
       page.wait_for_load_state("load")

       # Take a screenshot
       screenshot_filename = "oxylabs_screenshot.png"
       page.screenshot(path=screenshot_filename)

       # Close the browser
       browser.close()

       print("Done! You can check the screenshot...")


capture_screenshot()

Executing the code saves the screenshot. Here is what it looks like in our case:

The screenshot shows the actual content of the page, which means that no CAPTCHA or reCAPTCHA challenge was triggered on this page.
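Inspecting a screenshot by eye works for a single run, but the same success check can be approximated in code. The sketch below is a simple text heuristic rather than a reliable detector: the function name `looks_like_captcha` and the marker strings are assumptions chosen for illustration, and a page that merely mentions the word "captcha" in its text would be flagged as a false positive.

```python
# Hypothetical marker strings; adjust them to the challenge pages you actually encounter.
CAPTCHA_MARKERS = (
    "captcha",
    "are you a robot",
    "unusual traffic",
)


def looks_like_captcha(html: str) -> bool:
    """Return True if the HTML contains any of the marker strings (case-insensitive)."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)
```

In the Playwright script, the HTML to check could be obtained with `page.content()` after the page has loaded, for example `looks_like_captcha(page.content())`.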

Bypass CAPTCHA with Web Unblocker

Oxylabs’ Web Unblocker employs advanced AI techniques to help users access publicly available information on CAPTCHA-protected pages. Bypassing CAPTCHAs with this proxy solution is straightforward: you send a simple request, and Web Unblocker automatically chooses the fastest CAPTCHA proxy, attaches all essential headers, and returns the response HTML, bypassing the target website's anti-bot measures.

Here are the steps you must follow to implement a simple web scraping request using Web Unblocker.

1. Create an account: You can create an account on the dashboard with a 7-day free trial.

2. Create API credentials: After successfully creating your account, you can set your API credentials (username and password) from the dashboard. These credentials are used later in the code.

3. Install the required Python modules: You need a library that can perform HTTP requests. We will use the `requests` library to send HTTP requests to the Web Unblocker API and capture the response.

pip install requests

4. Import the required modules: In your Python script file, import the modules using the following import statement:

import requests

5. Define proxy: You can get your proxy links from the documentation of Web Unblocker.

# Define proxy dict. Remember to put your real user and pass here as well.
proxies = {
   "http": "http://YOUR_USERNAME:YOUR_PASSWORD@unblock.oxylabs.io:60000",
   "https": "http://YOUR_USERNAME:YOUR_PASSWORD@unblock.oxylabs.io:60000",
}

6. Define your request: Perform your request by specifying the URL, request type, and proxy by using the following code.

response = requests.request(
   "GET",
   "http://sandbox.oxylabs.io/products",
   verify=False,  # Ignore the certificate
   proxies=proxies,
)

7. Save the response: Write code to print the response and save it in an HTML file.

# Print result page to stdout
print(response.text)

# Save returned HTML to result.html file
with open("result.html", "w") as f:
   f.write(response.text)

8. Execute and check: Execute the code and test the output. If the output HTML file has actual page contents, the script successfully bypassed the CAPTCHA. Here is what our complete code looks like.

# Import the modules
import requests

# Define proxy dict. Don't forget to put your real user and pass here as well.
proxies = {
   "http": "http://YOUR_USERNAME:YOUR_PASSWORD@unblock.oxylabs.io:60000",
   "https": "http://YOUR_USERNAME:YOUR_PASSWORD@unblock.oxylabs.io:60000",
}


response = requests.request(
   "GET",
   "http://sandbox.oxylabs.io/products",
   verify=False,  # Ignore the certificate
   proxies=proxies,
)

# Print result page to stdout
print(response.text)

# Save returned HTML to result.html file
with open("result.html", "w") as f:
   f.write(response.text)

Here is the snapshot of the output HTML displayed on the screen:

Here is a snapshot of how the browser renders this HTML:

The above snapshot makes it clear that we accessed the products page without any block.
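The request above can be wrapped into small reusable functions. The following is a minimal sketch rather than an official client: the helper names `build_proxies`, `fetch_html`, and `main`, the environment variable names `OXY_USERNAME` and `OXY_PASSWORD`, and the 60-second timeout are all assumptions made for this example. It URL-escapes the credentials so that characters such as `@` or `:` in a password do not break the proxy URL, and it raises an exception on HTTP error statuses instead of silently saving an error page.

```python
import os
from urllib.parse import quote

# Endpoint as used in the tutorial's proxy definition
UNBLOCKER_HOST = "unblock.oxylabs.io"
UNBLOCKER_PORT = 60000


def build_proxies(username: str, password: str) -> dict:
    """Build a requests-style proxies dict with URL-escaped credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    proxy_url = f"http://{user}:{secret}@{UNBLOCKER_HOST}:{UNBLOCKER_PORT}"
    return {"http": proxy_url, "https": proxy_url}


def fetch_html(url: str, proxies: dict, timeout: float = 60.0) -> str:
    """Fetch a page through the proxy and raise on HTTP error statuses."""
    # Imported here so build_proxies can be used without requests installed
    import requests

    # verify=False mirrors the tutorial's request, which ignores the certificate
    response = requests.get(url, proxies=proxies, verify=False, timeout=timeout)
    response.raise_for_status()
    return response.text


def main() -> None:
    # OXY_USERNAME and OXY_PASSWORD are hypothetical environment variable names
    proxies = build_proxies(os.environ["OXY_USERNAME"], os.environ["OXY_PASSWORD"])
    html = fetch_html("http://sandbox.oxylabs.io/products", proxies)
    with open("result.html", "w", encoding="utf-8") as f:
        f.write(html)
```

Calling `main()` after setting the two environment variables reproduces the tutorial's result while keeping credentials out of the source file.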

Conclusion

Playwright, when combined with the `playwright-stealth` package, can be used effectively to scrape content from sites with ordinary CAPTCHA protection. Learn more about how to perform web scraping with Playwright and configure Playwright with proxies in our blog posts. If you're still wondering which proxies fit your needs best, get a free trial of our premium proxies to make the right decision.

However, bypassing CAPTCHAs (e.g., reCAPTCHA) on websites with advanced anti-bot systems requires a more sophisticated solution. Oxylabs’ Web Unblocker automatically combines the latest AI techniques with bypassing schemes (e.g., proxies and IP rotation, realistic fingerprints, and JS rendering) to get past advanced anti-bot systems. Therefore, it is a more secure, convenient, and reliable solution for bypassing CAPTCHAs and scraping data at scale.

About the author

Yelyzaveta Nechytailo

Senior Content Manager

Yelyzaveta Nechytailo is a Senior Content Manager at Oxylabs. After working as a writer in fashion, e-commerce, and media, she decided to switch her career path and immerse in the fascinating world of tech. And believe it or not, she absolutely loves it! On weekends, you’ll probably find Yelyzaveta enjoying a cup of matcha at a cozy coffee shop, scrolling through social media, or binge-watching investigative TV series.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
