nodyhub / flathunter
TelegramBot that finds flats on immobilienscout24.de and wg-gesucht.de
License: GNU Affero General Public License v3.0
If you crawl the immobilienscout webpage, the bot gets stuck in a loop and does not continue scraping the other sites.
In my tests it worked with Python ≥ 3.3
Hi guys,
Thanks for your work. I've been trying to make this work for my WG-Gesucht apartment search. I wanted to step into the code and see if something is wrong (I'm not a Python expert), but first I wanted to make sure I'm not missing anything. Is the WG-Gesucht crawl still a work in progress?
Hey,
I would like to get this script to run, but I get an error. I commented out the Google API settings in the config; or are they really needed?
I used Python v3.
What am I doing wrong?
ubuntu:~/flathunter-master$ python3 flathunter.py
[2023/01/16 14:10:51|flathunter.py |INFO ]: Using config /home/ubuntu/flathunter-master/config.yaml
/home/ubuntu/flathunter-master/flathunter.py:65: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(config_handle.read())
[2023/01/16 14:10:51|idmaintainer.py |INFO ]: already processed: 0
Traceback (most recent call last):
File "/home/ubuntu/flathunter-master/flathunter.py", line 89, in <module>
main()
File "/home/ubuntu/flathunter-master/flathunter.py", line 85, in main
launch_flat_hunt(config)
File "/home/ubuntu/flathunter-master/flathunter.py", line 43, in launch_flat_hunt
hunter.hunt_flats(config, searchers, id_watch)
File "/home/ubuntu/flathunter-master/flathunter/hunter.py", line 27, in hunt_flats
results = searcher.get_results(url)
File "/home/ubuntu/flathunter-master/flathunter/crawl_wggesucht.py", line 19, in get_results
soup = self.get_page(search_url, page_no)
File "/home/ubuntu/flathunter-master/flathunter/crawl_wggesucht.py", line 41, in get_page
return BeautifulSoup(resp.content, 'html.parser')
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/__init__.py", line 228, in __init__
self._feed()
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/__init__.py", line 289, in _feed
self.builder.feed(self.markup)
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/builder/_htmlparser.py", line 215, in feed
parser.feed(markup)
File "/usr/lib/python3.10/html/parser.py", line 110, in feed
self.goahead(0)
File "/usr/lib/python3.10/html/parser.py", line 178, in goahead
k = self.parse_html_declaration(i)
File "/usr/lib/python3.10/html/parser.py", line 269, in parse_html_declaration
self.handle_decl(rawdata[i+2:gtpos])
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/builder/_htmlparser.py", line 153, in handle_decl
self.soup.endData()
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/__init__.py", line 365, in endData
self.object_was_parsed(o)
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/__init__.py", line 370, in object_was_parsed
previous_element = most_recent_element or self._most_recent_element
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/element.py", line 1054, in __getattr__
return self.find(tag)
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/element.py", line 1292, in find
l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/element.py", line 1313, in find_all
return self._find_all(name, attrs, text, limit, generator, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/element.py", line 528, in _find_all
strainer = SoupStrainer(name, attrs, text, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/element.py", line 1610, in __init__
self.text = self._normalize_search_value(text)
File "/home/ubuntu/.local/lib/python3.10/site-packages/bs4/element.py", line 1615, in _normalize_search_value
if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'
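The last frame of the traceback above fails because the `collections.Callable` alias was removed in Python 3.10; the abstract base class now lives only in `collections.abc`. The sketch below illustrates the difference with the standard library alone. The helper name `is_callable_value` is hypothetical and is not taken from bs4.

```python
import collections
import collections.abc
import sys


def is_callable_value(value):
    """Hypothetical helper: same kind of check as in the failing bs4 line,
    but using collections.abc, which exists on every Python 3 version."""
    return isinstance(value, collections.abc.Callable)


# On Python 3.10 and later the old alias is gone, which is what raises
# the AttributeError shown in the traceback.
has_old_alias = hasattr(collections, "Callable")

print(sys.version_info[:2], "collections.Callable present:", has_old_alias)
print(is_callable_value(len), is_callable_value("not callable"))
```

Since the failing line sits inside the installed bs4 package rather than in flathunter, upgrading beautifulsoup4 is the likely remedy, assuming the installed release predates Python 3.10. The YAMLLoadWarning earlier in the log is unrelated to the crash; PyYAML's `yaml.safe_load` is the usual replacement for a bare `yaml.load`.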
Hey,
I am trying to understand what is wrong with my attempt to get "flathunter" running...
I cannot figure out where I should start looking for the problem. Maybe someone could point me in the right direction?
Thank you so much!!!
This is the error message:
[2019/11/08 12:09:47|flathunter.py |INFO ]: Using config config.yaml.dist
[2019/11/08 12:09:47|idmaintainer.py |INFO ]: already processed: 0
[2019/11/08 12:09:48|crawl_immobilienscout.py|INFO ]: Number of results: 1
[2019/11/08 12:09:48|hunter.py |INFO ]: New offer: Wohnung mit Balkon
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.5/logging/__init__.py", line 981, in emit
msg = self.format(record)
File "/usr/lib/python3.5/logging/__init__.py", line 831, in format
return fmt.format(record)
File "/usr/lib/python3.5/logging/__init__.py", line 568, in format
record.message = record.getMessage()
File "/usr/lib/python3.5/logging/__init__.py", line 331, in getMessage
msg = msg % self.args
ValueError: unsupported format character 'C' (0x43) at index 62
Call stack:
File "./flathunter.py", line 89, in <module>
main()
File "./flathunter.py", line 85, in main
launch_flat_hunt(config)
File "./flathunter.py", line 43, in launch_flat_hunt
hunter.hunt_flats(config, searchers, id_watch)
File "/home/hannes/flathunter/flathunter/hunter.py", line 61, in hunt_flats
durations=self.get_formatted_durations(config, address)).strip()
File "/home/hannes/flathunter/flathunter/hunter.py", line 79, in get_formatted_durations
duration = self.get_gmaps_distance(config, address, dest, mode['gm_id'])
File "/home/hannes/flathunter/flathunter/hunter.py", line 109, in get_gmaps_distance
self.__log__.error("Failed retrieving distance to address %s: " % str(address), str(result))
Message: 'Failed retrieving distance to address Lichtenberger+Str.++11%2C+Friedrichshain+%28Friedrichshain%29%2C+Berlin: '
Arguments: ("{'rows': [], 'origin_addresses': [], 'error_message': 'This API project is not authorized to use this API.', 'destination_addresses': [], 'status': 'REQUEST_DENIED'}",)
Traceback (most recent call last):
File "./flathunter.py", line 89, in <module>
main()
File "./flathunter.py", line 85, in main
launch_flat_hunt(config)
File "./flathunter.py", line 43, in launch_flat_hunt
hunter.hunt_flats(config, searchers, id_watch)
File "/home/hannes/flathunter/flathunter/hunter.py", line 64, in hunt_flats
sender.send_msg(message)
File "/home/hannes/flathunter/flathunter/sender_telegram.py", line 20, in send_msg
qry = url % (self.bot_token, chat_id, text)
TypeError: %i format: a number is required, not str
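Two separate problems show up in this log, and both can be reproduced with the standard library. First, the log call formats the URL-encoded address into the message itself, so the resulting `%2C` is later read by `logging` as a format specifier. Second, the Telegram URL is built with a `%i` placeholder while the chat id arrives as a string. The sketch below is an illustration under assumptions, not the project's code: the logger name, `ListHandler`, `build_query` and `URL_TEMPLATE` are hypothetical, and the template is only a guess at what `sender_telegram.py` uses.

```python
import logging
from urllib.parse import quote_plus


class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted messages for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log = logging.getLogger("hunter_sketch")  # hypothetical logger name
log.setLevel(logging.ERROR)
log.propagate = False
handler = ListHandler()
log.addHandler(handler)

address = "Lichtenberger+Str.++11%2C+Friedrichshain"
result = {"status": "REQUEST_DENIED"}


def buggy_message(address, result):
    # Same pattern as the log call in the traceback: the address is formatted
    # into the message first, then the message is %-formatted a second time
    # (this second step is what logging does internally with its arguments).
    msg = "Failed retrieving distance to address %s: " % str(address)
    return msg % (str(result),)


try:
    buggy_message(address, result)
    buggy_failed = False
except ValueError:
    buggy_failed = True  # "%2C" is parsed as a format specifier

# Safer pattern: one format string, all values passed as logging arguments.
log.error("Failed retrieving distance to address %s: %s", address, result)

# Assumed shape of the Telegram query; "%i" only accepts a number.
URL_TEMPLATE = "https://api.telegram.org/bot%s/sendMessage?chat_id=%i&text=%s"


def build_query(bot_token, chat_id, text):
    # Convert the chat id explicitly so a quoted id from the config still works.
    return URL_TEMPLATE % (bot_token, int(chat_id), quote_plus(text))


print(buggy_failed, handler.messages)
print(build_query("TOKEN", "12345", "New offer"))
```

If the chat id in the config is written in quotes, YAML loads it as a string, which would explain the TypeError; removing the quotes or converting with `int()` as above are two possible ways around it. The REQUEST_DENIED status in the arguments suggests that the Google API key is not authorized for the distance API, which is a configuration issue rather than a code bug.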
I am going to make a fork of this project that can crawl WG-Gesucht, because they updated their site :(
In flathunter/hunter.py, the code on line 119 is unreachable because it follows a "continue".
This is an indentation error.
Fix: indent line 119 one level less.
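The effect of such an indentation slip can be shown with a small, made-up loop. The function names and data below are hypothetical and do not reproduce the actual code in hunter.py; they only illustrate why a statement indented under a `continue` never runs.

```python
def collect_buggy(results):
    kept = []
    for item in results:
        if item is None:
            continue
            kept.append(item)  # unreachable: same block as the continue above
    return kept


def collect_fixed(results):
    kept = []
    for item in results:
        if item is None:
            continue
        kept.append(item)  # one indent less: runs for every non-None item
    return kept


print(collect_buggy([1, None, 2]))
print(collect_fixed([1, None, 2]))
```

Python reports no error for the buggy version, so a linter that flags unreachable code is one way to catch this kind of mistake.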
When executing python flathunter.py, I get an error message saying that immobilienscout has identified a robot.
Do you know a way around this?
Thank you!