How to Scrape YouTube Data: Step-by-Step Guide

Vytenis Kaubre

2023-09-12 | 5 min read

YouTube is one of the largest content-sharing platforms in the world, with more than 500 hours of content uploaded each minute. In November 2022, YouTube ranked as the second most visited website globally, with 74.8 billion monthly visits, according to Statista.

The sheer volume of public data and traffic on YouTube unlocks various research opportunities for businesses and individuals. Web scraping is the go-to method for extracting data from publicly available YouTube pages, such as video details, comments, channel information, as well as search results. Hence, in this guide, you’ll learn how to leverage Python, Oxylabs’ YouTube Scraper API, and Custom Parser to scrape YouTube videos and harness the potential of YouTube data.

1. Prepare the environment

First, install the latest version of Python, which you can download from the official Python website.

1.1 Install the dependencies

Next, run the following command in your terminal to install the necessary modules:

pip install yt-dlp requests

1.2 Obtain YouTube Scraper API credentials

To use Oxylabs’ YouTube Scraper API, you’ll need an Oxylabs account. Head to the Oxylabs dashboard and sign up to create a new account. Once you create your account, you’ll get a one-week free trial together with your user credentials. You’ll need these credentials later to extract channel information, subscriber count, and search results.

2. Download YouTube videos

Please note that all information provided herein is for informational purposes only and does not grant you any rights with regard to the described data, videos, or images, which may be protected by copyright, intellectual property, or other rights. Before engaging in scraping activities, you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

Now, let’s download a YouTube video using the yt-dlp library, which is popular for downloading YouTube videos. For this example, you can use this video as your target URL.

To download this video, you’ll first need to import the library. Then, use the download() method as shown below:

from yt_dlp import YoutubeDL


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
opts = dict()

with YoutubeDL(opts) as yt:
    yt.download([video_url])

When you run this code, the script will download the video and store it in the current folder of your project.
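
If you'd rather control where the file lands and which quality is fetched, the options dictionary is the place to do it. The sketch below is only an illustration: `build_download_opts` and `download` are hypothetical helper names, and the folder name and height limit are arbitrary. `outtmpl` and `format` are existing yt-dlp options.

```python
import os


def build_download_opts(output_dir=".", max_height=720):
    """Return a yt-dlp options dict (illustrative helper, not part of yt-dlp)."""
    return {
        # Output template: yt-dlp fills in the title, video ID, and extension.
        "outtmpl": os.path.join(output_dir, "%(title)s [%(id)s].%(ext)s"),
        # Best single-file format that is no taller than max_height pixels.
        "format": f"best[height<={max_height}]",
    }


def download(video_url, opts):
    """Download one video with the given options."""
    # Imported lazily so the options helper works without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(opts) as yt:
        yt.download([video_url])


opts = build_download_opts("videos", max_height=480)
print(opts["format"])
```

Calling `download("https://www.youtube.com/watch?v=mDveiNIpqyw", opts)` would then save the file under the `videos` folder instead of the current one.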

3. Scrape YouTube video data

Scraping YouTube videos is also possible with the yt-dlp library. You can extract public video data like the title, video dimensions, and the language used.

Let’s extract video details from the video we’ve downloaded previously. For this task, you can use the extract_info() method with the download=False parameter so that it doesn’t download the video file again. This method will return a dictionary with all the video-related info:

from yt_dlp import YoutubeDL


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
opts = dict()

with YoutubeDL(opts) as yt:
    info = yt.extract_info(video_url, download=False)
    video_title = info.get("title", "")
    width = info.get("width", "")
    height = info.get("height", "")
    language = info.get("language", "")
    print(video_url, video_title, width, height, language)
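
The info dictionary is large, so it can help to keep only the fields you care about before storing or printing them. Here's a minimal sketch, assuming a made-up `sample_info` dictionary that stands in for the real result of `extract_info()`. `summarize_info` and `FIELDS` are hypothetical names used for illustration.

```python
import json

# Keys to keep; adjust to your needs.
FIELDS = ("title", "width", "height", "language", "duration")


def summarize_info(info, fields=FIELDS):
    """Keep only the requested keys; missing keys become None."""
    return {field: info.get(field) for field in fields}


# Made-up stand-in for the dictionary returned by extract_info().
sample_info = {"title": "Example video", "width": 1920, "height": 1080, "formats": []}

summary = summarize_info(sample_info)
print(json.dumps(summary, indent=2))
```

Using `None` for missing values rather than an empty string makes it easier to tell "not provided" apart from "empty" later on.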

4. Scrape YouTube Comments

Please note that all information provided herein is for informational purposes only and does not grant you any rights with regard to the described data, which may be protected by corresponding privacy rights or other rights. Before engaging in scraping activities, you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

To extract all the video comments, you’ll need to pass an additional option getcomments while initializing the yt-dlp library.

Once you set getcomments to True, the extract_info() method will fetch all the comment threads along with the other information about the video. So, you can extract just the comments from the info dictionary like below:

from yt_dlp import YoutubeDL
from pprint import pprint


video_url = "https://www.youtube.com/watch?v=mDveiNIpqyw"
opts = {
    "getcomments": True
}

with YoutubeDL(opts) as yt:
    info = yt.extract_info(video_url, download=False)
    comments = info["comments"]
    comment_count = info["comment_count"]
    print("Number of comments: {}".format(comment_count))
    pprint(comments)
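
The comments come back as a flat list, so you may want to separate top-level comments from replies. The sketch below assumes each comment is a dictionary with `id`, `parent`, `author`, and `text` keys, and that top-level comments have `parent` set to `"root"`. Check this against your own output before relying on it. `group_replies` is a hypothetical helper, and the sample data is made up.

```python
from collections import defaultdict


def group_replies(comments):
    """Return (top_level_comments, replies_by_parent_id)."""
    top_level = []
    replies = defaultdict(list)
    for comment in comments:
        # Assumption: top-level comments carry parent == "root".
        parent = comment.get("parent", "root")
        if parent == "root":
            top_level.append(comment)
        else:
            replies[parent].append(comment)
    return top_level, dict(replies)


# Made-up sample in the assumed shape.
sample = [
    {"id": "a", "parent": "root", "author": "user1", "text": "Great video"},
    {"id": "a.1", "parent": "a", "author": "user2", "text": "Agreed"},
    {"id": "b", "parent": "root", "author": "user3", "text": "Thanks"},
]

threads, replies = group_replies(sample)
for thread in threads:
    reply_count = len(replies.get(thread["id"], []))
    print(thread["author"], "-", thread["text"], "-", reply_count, "replies")
```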

5. Scrape YouTube channel information

For this example, let’s use the Oxylabs channel's “About” section to extract the channel name and description. Here, you’ll have to use your YouTube Scraper API credentials to authenticate with the API.

5.1 Inspect elements

The first step is to find the necessary XPath selectors to extract the channel name and description. If you want to use CSS selectors, visit our Custom Parser documentation for more information.

So, open the “About” page in a web browser and use the Developer Tools to inspect elements. You can simply press CTRL + SHIFT + I on Windows or Option + Command + I on macOS to open the Developer Tools.

By inspecting the elements, you can easily construct the relative XPath selector using the IDs associated with the elements. Thus, the XPath selectors are:

Channel name XPath

//ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]

Description XPath

//yt-formatted-string[@id="description"]
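
Before sending selectors to the API, you can sanity-check them locally. The sketch below runs the two selectors against a small hand-written fragment using Python's built-in `xml.etree.ElementTree`. The fragment only mimics the assumed structure and isn't real YouTube markup. ElementTree supports just a subset of XPath, so the paths start with `.` and the text is read via `.text` instead of `/text()`.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment that mimics the assumed structure; not real YouTube markup.
FRAGMENT = """
<html>
  <ytd-channel-name id="channel-name">
    <div><div><yt-formatted-string id="text">Oxylabs</yt-formatted-string></div></div>
  </ytd-channel-name>
  <yt-formatted-string id="description">Example description</yt-formatted-string>
</html>
"""

root = ET.fromstring(FRAGMENT.strip())

# Same selectors as above, written relative to the root element.
name = root.find(
    './/ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]'
)
description = root.find('.//yt-formatted-string[@id="description"]')

print(name.text, "|", description.text)
```

This only confirms that the selector logic matches the structure you expect. The live page can differ, so the browser's Developer Tools remain the source of truth.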

5.2 Prepare parsing instructions

Now, using the XPath selectors, you can prepare the parsing instructions for YouTube Scraper API. It’s a dictionary that lists all the functions to execute when parsing the data from the HTML content. Let’s begin by importing the requests module and defining the variable instructions that'll contain the parsing instructions:

import requests


url = "https://www.youtube.com/@oxylabs/about"

instructions = {
    "Channel Name": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]/text()']
        }]
    },
    "Description": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//yt-formatted-string[@id="description"]/text()']
        }]
    }
}

Note the xpath_one function, which tells the API to select only the first matched element when parsing.
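
Since every field follows the same pattern, you could generate the dictionary from a simple mapping of field names to selectors. This is just a convenience sketch: `build_instructions` is a hypothetical helper, not part of the API, and it only covers the `xpath_one` and `xpath` functions used in this guide.

```python
def build_instructions(selectors, first_only=True):
    """Build parsing instructions from a {field_name: xpath} mapping."""
    # xpath_one returns the first match; xpath returns every match.
    fn_name = "xpath_one" if first_only else "xpath"
    return {
        field: {"_fns": [{"_fn": fn_name, "_args": [xpath]}]}
        for field, xpath in selectors.items()
    }


instructions = build_instructions({
    "Channel Name": '//ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]/text()',
    "Description": '//yt-formatted-string[@id="description"]/text()',
})
print(instructions["Description"])
```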

5.3 Prepare payload

Create a new variable payload that'll contain the scraping parameters and parsing instructions that you’ll send to the API:

payload = {
    "source": "universal",
    "render": "html",
    "parse": "true",
    "parsing_instructions": instructions,
    "url": url,
}

The render parameter is set to html, so the API will execute JavaScript to render all dynamic content. parse is also set to true to tell the API that the payload includes parsing_instructions.
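
To double-check what will actually be sent, you can serialize the payload yourself before making the request. The sketch below assumes a hypothetical `build_payload` helper and a shortened instructions dictionary. It prints the JSON body that the `json=` argument of `requests.post()` would produce from the same dictionary.

```python
import json


def build_payload(url, instructions):
    """Assemble the request payload using the same keys as in this guide."""
    return {
        "source": "universal",
        "render": "html",
        "parse": "true",
        "parsing_instructions": instructions,
        "url": url,
    }


# Shortened instructions, for illustration only.
example_instructions = {
    "Description": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//yt-formatted-string[@id="description"]/text()'],
        }]
    }
}

payload = build_payload("https://www.youtube.com/@oxylabs/about", example_instructions)
body = json.dumps(payload, indent=2)
print(body)
```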

5.4 Make a POST request to the API

To POST the payload to the API, you’ll have to use the credentials that you’ve obtained from the Oxylabs dashboard:

credentials = ("USERNAME", "PASSWORD")

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=credentials,
    json=payload,
)

print(response.status_code)

Replace USERNAME and PASSWORD with your credentials and run the code. If everything works as expected, you’ll get a status_code of 200.
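
A request won't always come back with 200 on the first try, so you might wrap the call in a small retry loop. The following is a minimal sketch under stated assumptions: `post_with_retry` is a hypothetical helper that accepts any zero-argument callable returning an object with a `status_code` attribute, and `FakeResponse` is a stand-in used here so the example runs without network access.

```python
import time


def post_with_retry(send, attempts=3, delay=2.0):
    """Call send() until it returns status 200 or the attempts run out."""
    response = None
    for attempt in range(1, attempts + 1):
        response = send()
        if response.status_code == 200:
            break
        if attempt < attempts:
            time.sleep(delay * attempt)  # wait a little longer each time
    return response


class FakeResponse:
    """Stand-in for a real response object (illustration only)."""

    def __init__(self, status_code):
        self.status_code = status_code


# Simulate one failure followed by a success.
statuses = iter([500, 200])
result = post_with_retry(lambda: FakeResponse(next(statuses)), delay=0)
print(result.status_code)
```

With the real API, the callable could be `lambda: requests.post("https://realtime.oxylabs.io/v1/queries", auth=credentials, json=payload)`.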

5.5 Extract the channel info

YouTube Scraper API sends a JSON response from which you can extract the parsed channel name and description, as showcased below:

channel_name = response.json()["results"][0]["content"]["Channel Name"]
description = response.json()["results"][0]["content"]["Description"]

print(channel_name)
print(description)

Here’s the complete code:

import requests


url = "https://www.youtube.com/@oxylabs/about"

instructions = {
    "Channel Name": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//ytd-channel-name[@id="channel-name"]/div/div/yt-formatted-string[@id="text"]/text()']
        }]
    },
    "Description": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//yt-formatted-string[@id="description"]/text()']
        }]
    }
}

payload = {
    "source": "universal",
    "render": "html",
    "parse": "true",
    "parsing_instructions": instructions,
    "url": url,
}

credentials = ("USERNAME", "PASSWORD")

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=credentials,
    json=payload,
)

print(response.status_code)

channel_name = response.json()["results"][0]["content"]["Channel Name"]
description = response.json()["results"][0]["content"]["Description"]

print(channel_name)
print(description)
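
The direct indexing above raises an error if the response has no results or the content wasn't parsed. A more defensive variant might look like the sketch below. `extract_content` is a hypothetical helper, and the sample dictionary is made up to mirror the `results[0]["content"]` structure used in this guide.

```python
def extract_content(data):
    """Return the parsed content dict from a response body, or {} if absent."""
    results = data.get("results") or []
    if not results:
        return {}
    content = results[0].get("content")
    # Anything other than a dictionary is treated as unparsed content.
    return content if isinstance(content, dict) else {}


# Made-up response body in the shape used above.
sample = {"results": [{"content": {"Channel Name": "Oxylabs", "Description": "Example"}}]}

content = extract_content(sample)
print(content.get("Channel Name"), content.get("Description"))
```

You'd call it as `extract_content(response.json())`, which also avoids decoding the JSON body twice.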

6. Scrape YouTube channel subscribers

You can extract the subscriber count of a YouTube channel using the same approach. Let’s again use the Oxylabs channel’s “About” page.

By inspecting elements with Developer Tools, you can see the element has an ID subscriber-count, so building XPath is relatively easy: //*[@id="subscriber-count"]. With this information, you can create parsing instructions as follows:

instructions = {
    "subscribers": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//*[@id="subscriber-count"]/text()'],
        }]
    },
}

And, just like before, the xpath_one function picks only the first match. The rest of the code is almost the same. Here’s the full source code:

import requests


url = "https://www.youtube.com/@oxylabs/about"
instructions = {
    "subscribers": {
        "_fns": [{
            "_fn": "xpath_one",
            "_args": ['//*[@id="subscriber-count"]/text()'],
        }]
    },
}

payload = {
    "source": "universal",
    "render": "html",
    "parse": "true",
    "parsing_instructions": instructions,
    "url": url,
}

credentials = ("USERNAME", "PASSWORD")

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=credentials,
    json=payload,
)

print(response.status_code)

subscribers = response.json()["results"][0]["content"]["subscribers"]
print(subscribers)

As the data is in the JSON response, you can extract the parsed subscriber count from the response and print it as an output.
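
The subscriber count arrives as text, so you may want to turn it into a number. The sketch below assumes the text looks like "12.3K subscribers", with an optional uppercase K, M, or B suffix. That format is an assumption and may vary by locale. `parse_count` is a hypothetical helper.

```python
import re
from decimal import Decimal

MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
# A number with optional thousands separators and decimals, then an optional suffix.
COUNT_RE = re.compile(r"\s*(\d+(?:,\d{3})*(?:\.\d+)?)\s*([KMB])?")


def parse_count(text):
    """Convert text such as '12.3K subscribers' to an integer, or None."""
    match = COUNT_RE.match(text or "")
    if not match:
        return None
    # Decimal avoids floating-point rounding surprises.
    number = Decimal(match.group(1).replace(",", ""))
    multiplier = MULTIPLIERS.get(match.group(2), 1)
    return int(number * multiplier)


for text in ("12.3K subscribers", "1.05M subscribers", "987 subscribers"):
    print(text, "->", parse_count(text))
```

Keep in mind that abbreviated counts are rounded by the site, so the result is an approximation of the real figure.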

7. Scrape YouTube search results

You can also use YouTube Scraper API to scrape public data from search results.

To scrape video titles and video links of every search result, first, you need to find the related XPath selectors, and then you can modify the instructions as below:

instructions = {
    "titles": {
        "_fns": [{
            "_fn": "xpath",
            "_args": ['//*[@id="video-title"]/yt-formatted-string/text()']
        }]
    },
    "links": {
        "_fns": [{
            "_fn": "xpath",
            "_args": ['//*[@id="video-title"]/@href']
        }]
    }
}

In this instance, we’re using xpath instead of xpath_one because there are multiple search results, and we want to extract all of them. The complete code for scraping the search page looks like this:

import requests


url = "https://www.youtube.com/results?search_query=oxylabs"

instructions = {
    "titles": {
        "_fns": [{
            "_fn": "xpath",
            "_args": ['//*[@id="video-title"]/yt-formatted-string/text()']
        }]
    },
    "links": {
        "_fns": [{
            "_fn": "xpath",
            "_args": ['//*[@id="video-title"]/@href']
        }]
    }
}

payload = {
    "source": "universal",
    "render": "html",
    "parse": "true",
    "parsing_instructions": instructions,
    "url": url,
}

credentials = ("USERNAME", "PASSWORD")

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=credentials,
    json=payload,
)

print(response.status_code)

titles = response.json()["results"][0]["content"]["titles"]
links = response.json()["results"][0]["content"]["links"]
base_url = "https://www.youtube.com"
for title, link in zip(titles, links):
    full_url = f"{base_url}{link}"
    print(title, full_url)

Since both titles and links are Python lists, you can use the built-in zip() function to pair each title with its link.
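
Note that zip() silently stops at the shorter list, so a mismatch between titles and links would go unnoticed. The sketch below checks the lengths first and then writes the pairs as CSV using only the standard library. `build_rows` and `write_csv` are hypothetical helpers, and the sample title and link are made up.

```python
import csv
import io
from urllib.parse import urljoin

BASE_URL = "https://www.youtube.com"


def build_rows(titles, links):
    """Pair titles with absolute URLs, refusing mismatched lists."""
    if len(titles) != len(links):
        raise ValueError("titles and links differ in length")
    return [(title.strip(), urljoin(BASE_URL, link)) for title, link in zip(titles, links)]


def write_csv(rows, handle):
    """Write a header and the (title, url) rows to a file-like object."""
    writer = csv.writer(handle)
    writer.writerow(["title", "url"])
    writer.writerows(rows)


# Made-up sample data; an in-memory buffer stands in for a real file.
rows = build_rows(["  Video A "], ["/watch?v=abc"])
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("results.csv", "w", newline="", encoding="utf-8")`.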

Wrap up

Feel free to expand the source code with additional functionality and adjust the target URLs for your YouTube data needs. If you want to store your scraped public data in a CSV or Excel file, check out this in-depth Python web scraping guide for more details. Additionally, visit our API documentation to find more information about the payload parameters and other code examples.

In case you prefer visual tutorials, take a look at this extensive playlist of Oxylabs’ video guides to get an even easier head-start into web scraping.

Need to collect data from other sources? See these detailed guides on how to scrape Google Search Results, Bing Search Results, Google News, Google Shopping, as well as Amazon data.

Frequently asked questions

Is it legal to scrape YouTube videos?

The legality of scraping YouTube videos depends on what data you gather and how you use it. It’s important to follow all the regulations and laws that govern online data, including privacy and copyright laws. In addition, it’s always best to seek professional legal advice before engaging in scraping activities.

It’s also recommended to adhere to the website’s terms of use and follow web scraping best practices. To better understand this topic, we recommend reading this article about the legal frameworks behind web scraping.

Does YouTube block scrapers?

Yes, YouTube may block suspicious requests coming from web scrapers. It uses various anti-scraping measures and constantly monitors incoming web requests for any indication of bot-like behavior.

If you want to learn more about web scraping and bot detection systems, check out this great article on 13 tips for block-free scraping and hear about the bypassing methods from our scraping expert in this free webinar.

About the author

Vytenis Kaubre

Copywriter

Vytenis Kaubre is a Copywriter at Oxylabs. With a passion for creative writing and a growing curiosity about anything tech, he joined the army of copywriters. After work, you might find Vytenis watching TV shows, playing guitar, or learning something new.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
