Zillow Scraper is an Apify actor for extracting data about homes from Zillow.com. It allows you to search for homes in any location and extract detailed information about each one. It is built on top of the Apify SDK, and you can run it both on the Apify platform and locally.
It works by accessing Zillow's internal API and recursively splitting the map 4 ways to overcome the limit of 500 results per search. To limit the number of results in the output, you can set the maximum depth of the 4-way split zooms using the maxLevel attribute.
Field | Type | Description | Default value |
---|---|---|---|
search | string | Query string to be searched on the site | "Los Angeles" |
startUrls | array | List of Request objects that will be deeply crawled. The URL can be any Zillow.com home list page | none |
maxItems | number | Maximum number of pages that will be scraped | 200 |
maxLevel | number | Maximum map splitting level | 20 |
minDate | string | Minimum date of the results allowed (timestamp or date string) | none |
simple | boolean | Toggle whether simplified results will be returned | true |
extendOutputFunction | string | Function that takes Zillow home data object as argument and returns data that will be merged with the default output. More information in Extend output function | (data) => { return {}; } |
proxyConfiguration | object | Proxy settings of the run. If you have access to Apify proxy, leave the default settings. If not, you can set { "useApifyProxy": false } to disable proxy usage | { "useApifyProxy": true } |
Either the search or the startUrls attribute has to be set.
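The rule above can be checked before starting a run. The following is a minimal sketch; `validateInput` is a hypothetical helper written for illustration and is not part of the actor's code.

```javascript
// Hypothetical helper (not part of the actor): checks that the input
// contains at least one of the two attributes the README requires.
function validateInput(input) {
    const hasSearch = typeof input.search === 'string' && input.search.trim() !== '';
    const hasStartUrls = Array.isArray(input.startUrls) && input.startUrls.length > 0;

    if (!hasSearch && !hasStartUrls) {
        throw new Error('Either "search" or "startUrls" has to be set.');
    }
    return input;
}
```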
Output is stored in a dataset. Each item is information about a home.
If the simple attribute is set, an example result may look like this:
{
  "address": {
    "streetAddress": "312 N Kendall Ave APT B",
    "city": "Kalamazoo",
    "state": "MI",
    "zipcode": "49006",
    "neighborhood": "Westwood",
    "community": null,
    "subdivision": null
  },
  "bedrooms": 6,
  "bathrooms": 3.5,
  "price": 300,
  "longitude": -85.626183,
  "latitude": 42.29457,
  "description": "Rent: $300.00/bed Student Housing. This is a 3-unit complex within close proximity to WMU consisting of 6 bedrooms and 3.5 baths throughout three levels in each unit. The main level features 1 of the 6 bedrooms, 1/2 bath, a roomy living room, kitchen with an eating area, and entry to your private deck. The upper level features 3 bedrooms, 2 full baths and a laundry area with a full size washer and dryer. The lower level features 2 bedrooms, 1 full bath and a bonus 2nd living area to use for socializing, gaming, studying, etc. All bedrooms are good-size and privately keyed. trash, lawn and snow plowing services included. Pet friendly with prior management approval and pet rent. Cats Allowed\nOven\nParking\nResident Pays Electricity\nResident Pays Gas\nResident Pays Water\nSmall Dogs Allowed\nSmoke Free\nTrash Pick Up Included\nUnfurnished\nWasher & Dryer",
  "livingArea": 2236,
  "currency": "USD",
  "url": "https://www.zillow.com/homedetails/312-N-Kendall-Ave-APT-B-Kalamazoo-MI-49006/2096316908_zpid/",
  "photos": [
    "https://photos.zillowstatic.com/p_f/IS3f0lgq5a0cxn0000000000.jpg",
    "https://photos.zillowstatic.com/p_f/ISzvuwdfl85g7p0000000000.jpg",
    "https://photos.zillowstatic.com/p_f/ISrpskv8h0xi7p0000000000.jpg",
    "https://photos.zillowstatic.com/p_f/ISjjq8d2dsol7p0000000000.jpg",
    "https://photos.zillowstatic.com/p_f/ISbdowuv8kgo7p0000000000.jpg",
    "https://photos.zillowstatic.com/p_f/IS3zfi07y70jap0000000000.jpg"
  ]
}
If the simple attribute is not set, the result will contain many more attributes.
You can find an example of a full result here.
To overcome the limit of 500 results per page, the crawler uses Zillow's internal API to search for homes on a rectangular section of a map. If the number of results on the map is higher than 500, the map is split into 4 quadrants and zoomed. Each quadrant is then searched separately and can itself return up to 500 results, so a single split raises the total result limit to 2000. A quadrant that returns fewer than 500 results needs no further splitting; any other quadrant is split again, and so on. To limit this behavior, you can set the maxLevel attribute. The map will then be split at most maxLevel times, even if the number of results is still higher than 500.
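The splitting strategy described above can be sketched as a short recursive function. This is an illustration only, not the actor's actual implementation: `countHomes` is a hypothetical stand-in for a call to Zillow's internal API, the rectangle shape `{ west, east, south, north }` is an assumed representation, and the sketch assumes a rectangle is split once its count reaches the limit.

```javascript
// Per-search result limit, as stated in the text.
const RESULT_LIMIT = 500;

// Split a rectangle { west, east, south, north } into 4 equal quadrants.
function splitIntoQuadrants({ west, east, south, north }) {
    const midLng = (west + east) / 2;
    const midLat = (south + north) / 2;
    return [
        { west, east: midLng, south: midLat, north }, // north-west
        { west: midLng, east, south: midLat, north }, // north-east
        { west, east: midLng, south, north: midLat }, // south-west
        { west: midLng, east, south, north: midLat }, // south-east
    ];
}

// Returns the list of rectangles that would be searched for homes.
// countHomes(rect) is hypothetical: it stands in for an API call that
// reports how many results a rectangle contains.
function collectSearchAreas(rect, countHomes, maxLevel, level = 0) {
    const count = countHomes(rect);

    // Stop when the rectangle fits under the limit (assumed: split at >= 500)
    // or when the maximum split depth has been reached.
    if (count < RESULT_LIMIT || level >= maxLevel) {
        return [rect];
    }

    return splitIntoQuadrants(rect).flatMap(
        (quadrant) => collectSearchAreas(quadrant, countHomes, maxLevel, level + 1),
    );
}
```

Because every level multiplies the number of rectangles by at most 4, a depth of n splits yields at most 4^n search areas, which extends the "1 split gives 2000 results" arithmetic to an upper bound of 500 × 4^n results.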
Keep in mind that, because of the startup time, it is much more efficient to run one longer scrape (at least one minute) than several shorter ones.
The average consumption is about 1 Compute unit per 2000 results scraped.
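As a rough planning aid, the stated average can be turned into a small estimate. This is a sketch based only on the figure above; `estimateComputeUnits` is a hypothetical helper, and actual consumption will vary from run to run.

```javascript
// Average stated in the README: about 1 compute unit per 2000 results.
const RESULTS_PER_COMPUTE_UNIT = 2000;

// Hypothetical helper: rough estimate of compute units for a run.
function estimateComputeUnits(resultCount) {
    return resultCount / RESULTS_PER_COMPUTE_UNIT;
}
```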
You can use this function to update the default output of this actor. The function receives Zillow's internal home data object as an argument, so you can choose which other attributes you would like to add. The output of this function is merged with the default output.
The internal home object contains huge amounts of data - example
Any of these attributes can be added to the result object.
The return value of this function has to be an object!
You can return fields to achieve 3 different things:
- Add a new field - Return object with a field that is not in the default output
- Change a field - Return an existing field with a new value
- Remove a field - Return an existing field with a value undefined
(data) => {
    return {
        schools: data.property.schools, // add a new field
        homeStatus: 'SOLD',             // change an existing field
        address: undefined,             // remove an existing field
    };
}
This example will add a new field schools, change the homeStatus field and remove the address field.
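The three behaviors (add, change, remove) follow naturally from a shallow merge in which undefined values are dropped. The sketch below shows one way such a merge could work; `mergeOutput` is a hypothetical function written for illustration, and the actor's real merging code may differ.

```javascript
// Hypothetical sketch of merging the default output with the object
// returned by extendOutputFunction. Not the actor's actual code.
function mergeOutput(defaultOutput, extension) {
    // The return value of extendOutputFunction has to be an object.
    if (typeof extension !== 'object' || extension === null || Array.isArray(extension)) {
        throw new Error('extendOutputFunction must return an object');
    }

    // Later keys win, so new fields are added and existing ones are changed.
    const merged = { ...defaultOutput, ...extension };

    // Fields explicitly set to undefined are removed from the result.
    for (const key of Object.keys(merged)) {
        if (merged[key] === undefined) {
            delete merged[key];
        }
    }
    return merged;
}
```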
Thank you for trying my actor. You can send any feedback you have to my email [email protected].
If you find a bug, please create an issue on the GitHub page.
To run the actor locally, create an input file at ./apify_storage/key_value_stores/default/INPUT.json
Example:
{
  "search": "Los Angeles",
  "maxItems": 200,
  "maxLevel": 20,
  "simple": true,
  "extendOutputFunction": "(data) => { return {}; }",
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
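If you prefer to generate the input file from a script, the following sketch writes it to the path given above using only Node.js built-in modules. `writeInput` and its `projectDir` parameter are hypothetical names chosen for this example.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: writes the actor input to
// <projectDir>/apify_storage/key_value_stores/default/INPUT.json
function writeInput(input, projectDir = '.') {
    const dir = path.join(projectDir, 'apify_storage', 'key_value_stores', 'default');

    // Create the directory tree if it does not exist yet.
    fs.mkdirSync(dir, { recursive: true });

    const file = path.join(dir, 'INPUT.json');
    fs.writeFileSync(file, JSON.stringify(input, null, 2));
    return file;
}
```

Calling `writeInput({ search: 'Los Angeles', maxItems: 200 })` from the project root would create the file at the location the actor reads when run locally.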