How to Scrape Yelp in Python | Unwrangle
Learn how to scrape Yelp data, including business details, reviews, and ratings, using Python. This guide covers the essentials to help you easily extract Yelp data with a simple script.
Yelp is a widely used platform where customers share reviews and experiences about local businesses. Since its launch in 2004, Yelp has grown to include:
- 287 million reviews across various categories like restaurants, shopping, and home services
- Over 7 million listed businesses, making it a go-to source for discovering local businesses (Source)
With millions of visitors each month, Yelp provides valuable insights into customer preferences and market trends. This makes it a powerful data source for businesses and researchers who want to scrape Yelp reviews and analyze customer sentiment.
Python Tutorial: Scraping Yelp Reviews with Unwrangle API
Step 1: Prerequisites
Before you begin, ensure you have the following:
- API Key: Sign up on Unwrangle to get your API key.
- yelp-biz-id: This ID is unique to each business on Yelp. To find it, open the business page, use your browser's Inspect Element tool, and search the page source for yelp-biz-id.
- Python Installed: Make sure Python 3.x is installed on your system.
- Requests Library: If you don't already have the requests library, install it by running: pip install requests
Step 2: Making a Basic API Request
To scrape Yelp reviews, make a GET request to the /api/getter endpoint with the following query parameters:
- platform: Set this to "yelp_reviews".
- yelp-biz-id: The unique Yelp business ID.
- api_key: Your Unwrangle API key.
- page (optional): Specifies the page number of results. Default is 1.
Here's a Python example:
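A minimal sketch of such a request follows. Two things in it are assumptions rather than facts from this guide: the base URL https://data.unwrangle.com (only the /api/getter path is named above) and the exact spelling of the business ID parameter, which is copied from the list above and may differ in the live API. The helper names build_params and fetch_reviews are hypothetical.

```python
import requests

# Assumed base URL; this guide only names the /api/getter path.
API_URL = "https://data.unwrangle.com/api/getter/"


def build_params(biz_id, api_key, page=1):
    """Assemble the query parameters described in Step 2."""
    return {
        "platform": "yelp_reviews",
        # Key spelled as in this guide; check the Unwrangle docs for the exact name.
        "yelp-biz-id": biz_id,
        "api_key": api_key,
        "page": page,
    }


def fetch_reviews(biz_id, api_key, page=1, timeout=30):
    """Send the GET request and return the decoded JSON body as a dict."""
    response = requests.get(
        API_URL,
        params=build_params(biz_id, api_key, page),
        timeout=timeout,
    )
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error page
    return response.json()


# Example call (needs a real key and business ID):
# data = fetch_reviews("YOUR_YELP_BIZ_ID", "YOUR_API_KEY")
```

Keeping the parameter assembly in its own function makes it easy to check what will be sent before any network call is made.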
Step 3: Response Format
The API returns a JSON object containing the reviews and metadata. Here's a quick overview of the key fields:
Meta Information
- success: Indicates whether the API call was successful.
- page: The current page of results.
- total_results: Total number of reviews available.
- no_of_pages: Total pages for all reviews.
- result_count: Number of reviews on the current page.
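These fields are enough to walk through every page of results. The sketch below assumes the metadata fields sit at the top level of the JSON object next to a reviews array, and that success is truthy on a good call. iter_all_reviews is a hypothetical helper, and fetch_page stands for any function that takes a page number and returns the decoded response for that page.

```python
def iter_all_reviews(fetch_page):
    """Yield every review by requesting pages 1 through no_of_pages.

    fetch_page: callable taking a page number and returning the decoded
    JSON response (a dict) for that page.
    """
    page = 1
    while True:
        data = fetch_page(page)
        if not data.get("success"):
            raise RuntimeError(f"API call failed on page {page}")
        yield from data.get("reviews", [])
        # Stop once the last page reported by the API has been read.
        if page >= data.get("no_of_pages", 1):
            break
        page += 1
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, so it can be tried against canned responses first.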
Review Details
Each review in the reviews array contains the following attributes:
Attribute | Data Type | Description |
---|---|---|
id | string | Yelp's unique ID for the review |
date | string | Date when the review was published |
rating | integer | Star rating provided by the reviewer (1-5) |
review_text | string | The full text content of the review |
review_url | string | Direct link to the review on Yelp |
lang | string | Two-letter language code for the review (e.g., en) |
author_avatar | string | URL of the reviewer's profile avatar |
author_name | string | Name of the reviewer |
author_url | string | Link to the reviewer's Yelp profile |
review_imgs | list | Links to images included in the review (if any) |
meta_data | dict | Feedback metrics including useful, funny, and cool votes |
location | string | City and state of the reviewer |
response | dict | Contains the business owner's response to the review, if available |
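For readers who use type hints, the table can be written down as a pair of TypedDict classes. This is only a sketch: the class names are hypothetical, the types of the metadata fields are guesses (the guide does not state them), and response is marked optional because it is only present when the owner has replied.

```python
from typing import Optional, TypedDict


class Review(TypedDict):
    """One entry of the reviews array, mirroring the attribute table."""
    id: str
    date: str
    rating: int               # 1-5
    review_text: str
    review_url: str
    lang: str                 # two-letter code, e.g. "en"
    author_avatar: str
    author_name: str
    author_url: str
    review_imgs: list         # image links, possibly empty
    meta_data: dict           # useful / funny / cool votes
    location: str
    response: Optional[dict]  # owner's reply, if available


class ReviewsResponse(TypedDict):
    """Top-level response; metadata field types are assumed."""
    success: bool
    page: int
    total_results: int
    no_of_pages: int
    result_count: int
    reviews: list             # list of Review dicts
```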
Step 4: Handling the API Response
To process the response and extract useful information:
- Parse the JSON response into a Python dictionary.
- Access the metadata (e.g., total reviews and pages).
- Iterate through the reviews array to extract individual review details.
Here's the code to parse reviews:
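The three steps above could be sketched as follows. The function takes the already-parsed dictionary (what response.json() returns in the requests library) and assumes the metadata fields are top-level keys alongside reviews. summarize_response is a hypothetical name, and the choice of which review fields to keep is arbitrary.

```python
def summarize_response(data):
    """Return (metadata, trimmed reviews) from a parsed API response."""
    # Step 2: pick out the paging metadata.
    meta = {
        key: data.get(key)
        for key in ("page", "total_results", "no_of_pages", "result_count")
    }
    # Step 3: walk the reviews array and keep a few fields per review.
    reviews = [
        {
            "author": review.get("author_name"),
            "rating": review.get("rating"),
            "date": review.get("date"),
            "text": review.get("review_text"),
        }
        for review in data.get("reviews", [])
    ]
    return meta, reviews
```

Using .get() throughout means a missing field shows up as None rather than raising a KeyError.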
Here's what the preview would look like:
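As a rough illustration, the sketch below prints one review as a single preview line. The sample review is made-up placeholder data, not real API output (the date format in particular is a guess), and format_preview is a hypothetical helper.

```python
def format_preview(review, width=60):
    """Format one review as 'author (rating/5, date): text', trimming long text."""
    text = review.get("review_text") or ""
    if len(text) > width:
        text = text[: width - 3] + "..."
    return f"{review.get('author_name')} ({review.get('rating')}/5, {review.get('date')}): {text}"


# Placeholder review for illustration only; not real Yelp data.
sample = {
    "author_name": "Jane D.",
    "rating": 4,
    "date": "2024-01-15",
    "review_text": "Great coffee and friendly staff.",
}
print(format_preview(sample))
# Jane D. (4/5, 2024-01-15): Great coffee and friendly staff.
```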
Get Started with Unwrangle
Sign up for Unwrangle to:
- Access Yelp reviews through a simple API call.
- Get structured JSON responses with review data.
- Avoid dealing with proxies and CAPTCHAs.