How to Scrape Tripadvisor Data

Augustas Pelakauskas

2023-10-06 · 3 min read

Tripadvisor is a prominent platform in the travel and hospitality industry. It offers a wealth of data on hotels, restaurants, and attractions, along with user reviews, making it a good target for web scraping and a valuable resource for market research, competitor analysis, and, in turn, decision-making.

You can scrape data like names, addresses, contact info, ratings, user-generated reviews, images, pricing, and geographic coordinates to enhance your understanding of the industry.

In this tutorial, you’ll learn how to scrape Tripadvisor data with Tripadvisor Scraper API and Python.

1. Prepare the environment

You can download the latest version of Python from the official website.

Install dependencies

Install scraping-related Python libraries. Run the command below.

pip install bs4 requests pandas

It’ll automatically download and install Beautiful Soup, Requests, and Pandas.

Import libraries

Import the libraries for use at a later step.

from bs4 import BeautifulSoup
import requests
import pandas as pd

Get API credentials

To use Tripadvisor Scraper API, you’ll need an Oxylabs account. With a one-week free trial, you’ll have ample time to fine-tune your scraping task. Once signed up, you’ll receive your API credentials. Save them in a tuple, as shown below.

credentials = ('USERNAME', 'PASSWORD')

Don’t forget to replace USERNAME and PASSWORD with your credentials.
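Hard-coding credentials works for a quick test, but keeping them out of the script is safer if you plan to share or commit the code. Here’s a minimal sketch that reads them from environment variables instead. The variable names OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical, chosen for this example; the API doesn’t require them.

```python
import os


def load_credentials(env=os.environ):
    """Build the (username, password) tuple from environment variables.

    OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical names used
    for illustration; pick any names you like.
    """
    username = env.get("OXYLABS_USERNAME")
    password = env.get("OXYLABS_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set OXYLABS_USERNAME and OXYLABS_PASSWORD first")
    return (username, password)
```

With both variables exported in your shell, credentials = load_credentials() gives you the same tuple as the hard-coded version.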

2. Prepare payload

Prepare a payload to make a POST request to the API. For Tripadvisor, the source must be set to universal. You’ll also have to set render to html.

NOTE: You can always find all of the parameters and examples in our documentation.

url = "https://www.tripadvisor.com/Search?searchSessionId=000a97712c5c1aad.ssid&searchNearby=false&ssrc=e&q=Nearby&sid=6786CB884ED642F4A91E6E9AD932BE131695517577013&blockRedirect=true&geo=1&rf=1"
payload = {
    'source': 'universal',
    'render': 'html',
    'url': url,
}

Just replace the URL above with your own search query.
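If you want to generate payloads for several search terms, you could build the URL in code rather than pasting it each time. The sketch below assumes the search term travels in the q parameter, as it appears to in the example URL, and leaves out the session-specific parameters. This tutorial doesn’t confirm that Tripadvisor returns the same page without them, so fall back to a URL copied from your browser if the results look off. The build_payload helper is a name made up for this example.

```python
from urllib.parse import urlencode


def build_payload(query):
    """Return an API payload for a Tripadvisor search page.

    Assumption: the search term is carried by the `q` parameter, as in
    the example URL. Session-specific parameters are left out.
    """
    search_url = "https://www.tripadvisor.com/Search?" + urlencode({"q": query})
    return {
        "source": "universal",
        "render": "html",
        "url": search_url,
    }
```

urlencode() takes care of escaping spaces and special characters in the search term.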

3. Send POST request

Use credentials and payload to send a POST request to the API. Because the payload is passed through the json parameter, Requests serializes the dict to JSON before sending it.

response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=credentials,
    json=payload,
)
print(response.status_code)

The printed status code should be 200, which indicates success. If you get a different code, check that your credentials and payload are correct.
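For unattended runs, you may prefer to handle a non-200 code in the script instead of reading it off the console. Here’s one possible sketch of a small retry wrapper. The helper name, the attempt count, the delay, and the 180-second timeout are all assumptions made for illustration, not values recommended by the API. The post argument is any callable that accepts the same keyword arguments as requests.post, which keeps the helper testable without network access.

```python
import time


def post_with_retry(post, url, credentials, payload, attempts=3, delay=2.0):
    """Call `post` until it returns HTTP 200 or the attempts run out.

    `post` is expected to behave like requests.post. The timeout value
    is an illustrative assumption.
    """
    last_status = None
    for attempt in range(1, attempts + 1):
        response = post(url, auth=credentials, json=payload, timeout=180)
        last_status = response.status_code
        if last_status == 200:
            return response
        if last_status in (401, 403):
            break  # retrying will not fix an authentication or permission error
        if attempt < attempts:
            time.sleep(delay)  # brief pause before the next try
    raise RuntimeError(f"Request failed, last status code: {last_status}")
```

In the tutorial’s script, you would call it as post_with_retry(requests.post, 'https://realtime.oxylabs.io/v1/queries', credentials, payload).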

4. Extract data

The API sends the response in JSON format. You can extract the HTML content of the page as follows.

content = response.json()["results"][0]["content"]
soup = BeautifulSoup(content, "html.parser")
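Indexing straight into the JSON raises a KeyError or IndexError if the response doesn’t have the expected shape. As a sketch, a small helper (hypothetical, named extract_content here) can follow the same results[0]["content"] path and fail with a clearer message:

```python
def extract_content(api_json):
    """Return the HTML string from the API's JSON body.

    Follows the results[0]["content"] path used in this tutorial and
    raises a descriptive error when that path is missing or empty.
    """
    results = api_json.get("results") or []
    if not results:
        raise ValueError("API response contains no results")
    content = results[0].get("content")
    if not content:
        raise ValueError("First result has no content")
    return content
```

You would then write content = extract_content(response.json()) before handing the HTML to BeautifulSoup.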

The soup object now holds the parsed HTML. You can locate specific elements in it by tag name and CSS class.

Let’s collect the following data from the Restaurants category.

Name

To extract a restaurant name, you’ll first need to find the corresponding CSS selector. Use your web browser’s developer tools to inspect and find the necessary CSS selector. Navigate to the web page, right-click, and then select Inspect.

If you inspect a name, you’ll notice it’s wrapped in a <span> inside the <div> with the result-title class. Using this information, you can build the matching find() calls.

name = soup.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
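Beautiful Soup also accepts CSS selector strings through select_one() and select(), so the chained find() calls can be written as a single selector. The sketch below runs on a hand-written HTML sample that mimics the structure described above; it’s not real Tripadvisor markup.

```python
from bs4 import BeautifulSoup

# Hand-written sample that mimics the described structure;
# it is not real Tripadvisor markup.
sample_html = """
<div class="result">
  <div class="result-title"><span>Example Bistro</span></div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "div.result-title span" means: a <span> nested inside <div class="result-title">
name_tag = soup.select_one("div.result-title span")

# select_one() returns None when nothing matches, so guard before reading text
name = name_tag.get_text(strip=True) if name_tag else None
print(name)
```

Both styles find the same element; the selector form is easier to paste straight from the browser’s developer tools.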

Rating

Similarly, for the rating, inspect the rating bubbles.

The <span> element has the class ui_bubble_rating, and the rating is stored in its alt attribute. Use the find() method to locate the element, then read its alt attribute.

rating = soup.find('span', {"class": "ui_bubble_rating"})['alt']
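The alt attribute is a text label, so you’ll probably want a number for sorting or averaging. Here’s a minimal sketch, assuming the label looks like "4.5 of 5 bubbles"; that format is an assumption, so check it against your own output. The parse_rating helper is hypothetical.

```python
import re


def parse_rating(alt_text):
    """Pull the first number out of a rating label.

    Assumes labels shaped like "4.5 of 5 bubbles"; returns None when
    no number is found.
    """
    match = re.search(r"\d+(?:\.\d+)?", alt_text or "")
    return float(match.group()) if match else None
```

Because the helper takes the first number in the string, it relies on the score appearing before the scale.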

Reviews

The review count can be extracted from the <a> tag with the class review_count. The code looks like this.

review = soup.find('a', {"class": "review_count"}).get_text(strip=True)
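The extracted value is text rather than a number. If you need an integer, a sketch like the one below could help. It assumes labels such as "1,234 reviews" with commas as thousands separators, which may not hold for every locale. The parse_review_count helper is hypothetical.

```python
import re


def parse_review_count(text):
    """Convert a label such as "1,234 reviews" to the integer 1234.

    The label format is an assumption; returns None if no digits appear.
    """
    match = re.search(r"\d[\d,]*", text or "")
    if not match:
        return None
    return int(match.group().replace(",", ""))
```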

NOTE: In all three cases, the find() method only grabs elements from the first search result. See the following section for extracting all results.

Search results

To extract all the search results, grab each result and then run a loop. Each result is wrapped in a <div> with the class result.

Now, update the code to grab all the search results.

data = []
# Each search result sits in its own <div class="result">
for div in soup.find_all("div", {"class": "result"}):
    # Search inside the current result only, not the whole page
    name = div.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
    rating = div.find('span', {"class": "ui_bubble_rating"})['alt']
    review = div.find('a', {"class": "review_count"}).get_text(strip=True)
    data.append({
        "name": name,
        "rating": rating,
        "review": review,
    })

The code above extracts all the search results and stores them in the data list.
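One thing to watch for: find() returns None when an element is missing, so a result without a rating or review count would stop the loop with an error. Here’s a sketch of a more forgiving version that stores None for missing fields. The sample HTML is hand-written for illustration and isn’t real Tripadvisor markup, and parse_results is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def parse_results(html):
    """Collect name, rating, and review text for every result block.

    Fields that are missing from a block are stored as None.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for div in soup.find_all("div", {"class": "result"}):
        # CSS selector form of: <span> inside <div class="result-title">
        name_tag = div.select_one("div.result-title span")
        rating_tag = div.find("span", {"class": "ui_bubble_rating"})
        review_tag = div.find("a", {"class": "review_count"})
        rows.append({
            "name": name_tag.get_text(strip=True) if name_tag else None,
            "rating": rating_tag.get("alt") if rating_tag else None,
            "review": review_tag.get_text(strip=True) if review_tag else None,
        })
    return rows


# Hand-written sample, not real Tripadvisor markup
sample_html = """
<div class="result">
  <div class="result-title"><span>Example Bistro</span></div>
  <span class="ui_bubble_rating" alt="4.5 of 5 bubbles"></span>
  <a class="review_count">120 reviews</a>
</div>
<div class="result">
  <div class="result-title"><span>Unrated Cafe</span></div>
</div>
"""
rows = parse_results(sample_html)
```

The second sample block has no rating or review link, yet it still produces a row instead of an exception.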

Save to CSV

Lastly, use Pandas to export data to a CSV file using the to_csv() method.

df = pd.DataFrame(data)
df.to_csv("search_results.csv", index=False)
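If you’d rather not pull in Pandas for a three-column file, Python’s built-in csv module can write the same rows. This is an alternative to the to_csv() approach above, not what the tutorial itself uses; rows_to_csv is a hypothetical helper.

```python
import csv
import io


def rows_to_csv(rows, fieldnames=("name", "rating", "review")):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To save the output, write the returned string to a file opened with open("search_results.csv", "w", newline="", encoding="utf-8").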

The complete code

Here’s the full source code.

from bs4 import BeautifulSoup
import requests
import pandas as pd

# Replace with your API credentials
credentials = ('USERNAME', 'PASSWORD')
url = "https://www.tripadvisor.com/Search?searchSessionId=000a97712c5c1aad.ssid&searchNearby=false&ssrc=e&q=Nearby&sid=6786CB884ED642F4A91E6E9AD932BE131695517577013&blockRedirect=true&geo=1&rf=1"
# Tripadvisor requires the universal source with rendering enabled
payload = {
    'source': 'universal',
    'render': 'html',
    'url': url,
}
# json= serializes the payload dict before sending it
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=credentials,
    json=payload,
)
print(response.status_code)

# The page HTML is in the content field of the first result
content = response.json()["results"][0]["content"]
soup = BeautifulSoup(content, "html.parser")

# Collect name, rating, and review count from every search result
data = []
for div in soup.find_all("div", {"class": "result"}):
    name = div.find('div', {"class": "result-title"}).find('span').get_text(strip=True)
    rating = div.find('span', {"class": "ui_bubble_rating"})['alt']
    review = div.find('a', {"class": "review_count"}).get_text(strip=True)
    data.append({
        "name": name,
        "rating": rating,
        "review": review,
    })

# Export the collected rows to a CSV file
df = pd.DataFrame(data)
df.to_csv("search_results.csv", index=False)

Conclusion

Pairing Python with Tripadvisor Scraper API lets you scrape Tripadvisor data while avoiding common web scraping challenges. Check our technical documentation for all the API parameters and variables mentioned in this tutorial.

Additionally, explore our blog to learn how to scrape data from popular targets like YouTube, Best Buy, Zillow, eBay, Walmart, and many others.

If you have questions about the tutorial or web scraping in general, don't hesitate to reach out by sending a message to support@oxylabs.io or using the live chat.

Frequently asked questions

Is scraping Tripadvisor legal?

Yes, you can scrape publicly available data, including public data on Tripadvisor. Make sure to adhere to the website’s regulations and consider legal differences based on geographic location. To learn more about the legalities of web scraping, check here.

How do I crawl data from Tripadvisor?

To scrape data at scale, you can either build and maintain your own web scraping infrastructure using a preferred programming language or outsource an all-in-one solution, such as a scraper API.

Can you scrape Tripadvisor reviews?

Yes. With Python’s Beautiful Soup, inspect the page to locate the HTML elements that hold the reviews, then extract them by tag and class.

About the author

Augustas Pelakauskas

Senior Copywriter

Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures - the most recent one being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his fourth best friend.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
