How to Bypass CAPTCHA With Playwright

Yelyzaveta Nechytailo

2023-08-14 · 4 min read

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) have become vital to website security. When a website’s security system becomes suspicious of a visitor (e.g., the access pattern does not resemble normal human behavior), it serves a CAPTCHA (e.g., reCAPTCHA or an audio or image puzzle) to keep bots from going any further.

Bypassing a CAPTCHA challenge once it loads can be extremely difficult. However, there are a few methods that make your script look more human to the web firewall, so that the CAPTCHA often never loads in the first place. We call this bypassing, or avoiding, a CAPTCHA.
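As one illustration of what "more human behavior" can mean, the sketch below spaces out actions with randomized pauses instead of a fixed interval. The helper name `human_pause` and its delay bounds are assumptions made for this example; they are not part of Playwright or any Oxylabs product, and pacing alone is unlikely to be enough against advanced anti-bot systems.

```python
import random
import time


def human_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration and return the delay that was used.

    Hypothetical helper: the bounds are illustrative defaults, not values
    recommended by any particular website or library.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    # Pick a different delay on every call so requests are not evenly spaced
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

In a browser automation script, a call such as `human_pause()` could sit between opening a page and interacting with it.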

This step-by-step tutorial demonstrates how to use Playwright and Oxylabs’ Web Unblocker to bypass CAPTCHA challenges using Python. The tutorial will also discuss the perks of using Oxylabs’ Web Unblocker instead of the `playwright-stealth` library. 

Note: Bypassing CAPTCHAs for illegal or malicious motives violates ethical and legal standards. This tutorial is for educational purposes only, and we encourage readers to thoroughly read the Terms of Service of the target website to avoid legal issues.

Bypass CAPTCHA with Playwright

Playwright provides a robust and user-friendly API to interact with web pages, allowing developers to perform tasks such as clicking elements, filling out forms, and extracting data from dynamic websites. Its support for multiple browsers (like Chromium, Firefox, and WebKit) ensures cross-browser compatibility. Additionally, Playwright’s support for headless mode allows for hidden browser interactions, making it suitable for web scraping tasks.

Bypassing CAPTCHAs with Playwright alone is difficult as websites can detect traffic from automated and headless scripts. Fortunately, the `playwright-stealth` package can help.

Combining the stealth package with Playwright is a powerful way to bypass CAPTCHAs. The stealth package makes Playwright’s headless browser instances appear more human to websites, which reduces the chances of being detected.

Let’s demonstrate bypassing CAPTCHAs by creating a Python script that opens a web page in headless mode, captures a screenshot of it, and saves the screenshot to local storage. The script is successful if the screenshot shows the actual contents of the page instead of a CAPTCHA or reCAPTCHA screen.

The following steps set up the stealth package with Playwright in Python and build the script.

1. Preliminaries: Install the Playwright library and the stealth package, then download the Chromium browser binary that Playwright drives.

pip install playwright playwright-stealth
playwright install chromium

2. Import the required modules: Use the synchronous version of the Playwright library for a straightforward and linear program flow.

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

3. Create a headless browser instance: Define the `capture_screenshot()` function, which contains all the code needed to open a headless browser instance, visit the URL, and capture the screenshot. In this function, create a new `sync_playwright` instance and use it to launch the Chromium browser in headless mode.

# Define the function to capture screenshot
def capture_screenshot():
   # Create a playwright instance
   with sync_playwright() as play_wright:
       browser = play_wright.chromium.launch(headless=True)

       # Create a new context and page
       context = browser.new_context()
       page = context.new_page()

4. Apply the stealth settings: After creating the page, apply the stealth settings to it using the `playwright-stealth` package. These settings reduce the chances of detection by masking telltale signs of browser automation.

       # Apply the stealth settings
       stealth_sync(page)

5. Navigate to the page: Specify the target URL and open it with the page’s `goto()` method.

       url = "http://sandbox.oxylabs.io/products"
       page.goto(url)

6. Take the screenshot: Wait for the page to load completely, take the screenshot, and close the browser.

       # Wait for the webpage to load completely
       page.wait_for_load_state("load")

       # Take a screenshot
       screenshot_filename = "oxylabs_screenshot.png"
       page.screenshot(path=screenshot_filename)

       # Close the browser
       browser.close()

       print("Done! You can check the screenshot...")

7. Call the function:

capture_screenshot()

8. Execute and test: Here is what our complete code looks like:

# Import the required modules
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync



# Define the function to capture the screenshot
def capture_screenshot():
   # Create a playwright instance
   with sync_playwright() as play_wright:
       browser = play_wright.chromium.launch(headless=True)

       # Create a new context and page
       context = browser.new_context()
       page = context.new_page()

       # Apply the stealth settings
       stealth_sync(page)

       # Navigate to the website
       url = "http://sandbox.oxylabs.io/products"
       page.goto(url)

       # Wait for the webpage to load completely
       page.wait_for_load_state("load")

       # Take a screenshot
       screenshot_filename = "oxylabs_screenshot.png"
       page.screenshot(path=screenshot_filename)

       # Close the browser
       browser.close()

       print("Done! You can check the screenshot...")


capture_screenshot()

Executing the code saves the screenshot. Here is what it looks like in our case:

The screenshot shows the actual content of the page, which means no CAPTCHA or reCAPTCHA was triggered on this page.
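Checking a screenshot by eye does not scale to many pages, so a rough programmatic check can complement it. The sketch below is a heuristic under stated assumptions: the function name `looks_like_captcha` and the marker list are chosen for this example, the list is not exhaustive, and a page can serve a challenge without any of these strings.

```python
# Substrings that commonly appear in pages serving a CAPTCHA widget.
# This list is an illustrative assumption, not an exhaustive catalogue.
CAPTCHA_MARKERS = (
    "g-recaptcha",   # class name used by Google reCAPTCHA widgets
    "h-captcha",     # class name used by hCaptcha widgets
    "cf-turnstile",  # class name used by Cloudflare Turnstile widgets
)


def looks_like_captcha(html):
    """Return True if the HTML contains any of the known CAPTCHA markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)
```

Inside `capture_screenshot()`, the HTML could be read with `page.content()` once the page has loaded and then passed to `looks_like_captcha()` before the screenshot is taken.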

Bypass CAPTCHA with Web Unblocker

Oxylabs’ Web Unblocker employs advanced AI techniques to help users access publicly available information on CAPTCHA-protected websites. Bypassing CAPTCHAs with our advanced proxy solution is easy: you just need to send a simple query. Web Unblocker will automatically choose the fastest CAPTCHA proxy, attach all essential headers, and return the response HTML, bypassing the anti-bot systems of the target website.

Here are the steps you must follow to implement a simple web scraping request using Web Unblocker. 

1. Create an account: You can create an account on the dashboard with a 7-day free trial.

2. Create API credentials: After successfully creating your account, you can set your API username and password from the dashboard. These credentials will be used later in the code.

3. Install the required Python modules: You need a library that can perform HTTP requests. We will use the `requests` library to send HTTP requests to the Web Unblocker API and capture the response.

pip install requests

4. Import the required modules: In your Python script file, import the modules using the following import statement:

import requests

5. Define proxy: You can get your proxy links from the documentation of Web Unblocker. 

# Define the proxy dict. Replace YOUR_USERNAME and YOUR_PASSWORD with your real credentials.
proxies = {
   "http": "http://YOUR_USERNAME:YOUR_PASSWORD@unblock.oxylabs.io:60000",
   "https": "http://YOUR_USERNAME:YOUR_PASSWORD@unblock.oxylabs.io:60000",
}

6. Define your request: Send the request, specifying the HTTP method, the target URL, and the proxies, using the following code.

response = requests.request(
   "GET",
   "http://sandbox.oxylabs.io/products",
   verify=False,  # Ignore the certificate
   proxies=proxies,
)

7. Save the response: Write code to print the response and save it in an HTML file. 

# Print result page to stdout
print(response.text)

# Save the returned HTML to result.html, using UTF-8 regardless of the platform default
with open("result.html", "w", encoding="utf-8") as f:
   f.write(response.text)

8. Execute and check: Execute the code and inspect the output. If the output HTML file contains the actual page contents, the script successfully bypassed the CAPTCHA. Here is what our complete code looks like:

# Import the modules
import requests

# Define the proxy dict. Replace YOUR_USERNAME and YOUR_PASSWORD with your real credentials.
proxies = {
   "http": "http://YOUR_USERNAME:YOUR_PASSWORD@unblock.oxylabs.io:60000",
   "https": "http://YOUR_USERNAME:YOUR_PASSWORD@unblock.oxylabs.io:60000",
}


response = requests.request(
   "GET",
   "http://sandbox.oxylabs.io/products",
   verify=False,  # Ignore the certificate
   proxies=proxies,
)

# Print result page to stdout
print(response.text)

# Save the returned HTML to result.html, using UTF-8 regardless of the platform default
with open("result.html", "w", encoding="utf-8") as f:
   f.write(response.text)

Here is the snapshot of the output HTML displayed on the screen:

Here is a snapshot of how a browser renders this HTML:

The above snapshot makes it clear that we accessed the products page without any block.
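The proxy dictionary in the steps above embeds the credentials directly in the source file. A minimal sketch of one alternative is to read them from environment variables and percent-encode them, so that characters such as `@` or `:` in a password do not break the proxy URL. The function names and the `UNBLOCKER_USERNAME` and `UNBLOCKER_PASSWORD` variable names are assumptions made for this example; only the host and port come from the code above.

```python
import os
from urllib.parse import quote


def build_proxies(username, password, host="unblock.oxylabs.io", port=60000):
    """Build a requests-style proxies dict with percent-encoded credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    proxy_url = f"http://{user}:{secret}@{host}:{port}"
    # The same proxy endpoint handles both HTTP and HTTPS targets
    return {"http": proxy_url, "https": proxy_url}


def proxies_from_env(env=None):
    """Read credentials from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return build_proxies(env["UNBLOCKER_USERNAME"], env["UNBLOCKER_PASSWORD"])
```

The result of `proxies_from_env()` can be passed as the `proxies` argument of the request in place of the hardcoded dictionary.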

Conclusion

Playwright, when combined with the `playwright-stealth` package, can be used effectively to scrape content from sites with ordinary CAPTCHA protection. Learn more about how to perform web scraping with Playwright and configure Playwright with proxies in our blog posts. If you're still wondering which proxies fit your needs best, get a free trial of our premium proxies to make the right decision.

However, bypassing CAPTCHAs (e.g., reCAPTCHA) on websites with advanced anti-bot systems requires a more sophisticated and intelligent solution. Oxylabs’ Web Unblocker automatically combines the latest AI techniques with bypassing schemes (e.g., proxies and IP rotation, realistic fingerprints, and JS rendering) to get past advanced anti-bot systems. Therefore, it is a more secure, convenient, and reliable solution for bypassing CAPTCHAs and scraping data at scale.

About the author

Yelyzaveta Nechytailo

Senior Content Manager

Yelyzaveta Nechytailo is a Senior Content Manager at Oxylabs. After working as a writer in fashion, e-commerce, and media, she decided to switch her career path and immerse herself in the fascinating world of tech. And believe it or not, she absolutely loves it! On weekends, you’ll probably find Yelyzaveta enjoying a cup of matcha at a cozy coffee shop, scrolling through social media, or binge-watching investigative TV series.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
