
Comments (25)

alvarobartt commented on May 28, 2024 11

Hi guys! Investing.com has some severe limitations and strict Cloudflare protection, so that is probably the cause. I tried to contact them in the past to find a solution, but they didn't reply, so it seems they are not interested at all in having an open source project pull data from their site and/or API. I'll probably try to contact them again, but they are reluctant to do anything, so it will probably never work as smoothly as before 😭

from investiny.

jvn-cc commented on May 28, 2024 4

https://info.signalsciences.com/rs/025-XKO-469/images/signal-sciences-case-study-investingcom.pdf
Doesn't look like it's gonna work again any time soon. What a shame.

ymyke commented on May 28, 2024 4

If I may shamelessly plug my project tessa – it's an abstraction layer on top of data sources. It used to use investpy and coingecko. Now it uses yfinance and coingecko. It offers a streamlined interface and a number of features such as a Symbol class including collections to store and load a list of symbols. Might be useful for some people.

csr13 commented on May 28, 2024 3

Same issue here. I tried from a remote server (different IP, in case of a blacklist) and from my own machine; both got 403.

RafPe commented on May 28, 2024 1

@alvarobartt Can you contact me as I can share with you how to make it work :)

Without the patch I get a 403 👎, but with some extra setup added I get proper responses as needed.

{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [222 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [222 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
} [5 bytes data]
< HTTP/2 200
< date: Sun, 04 Jun 2023 14:57:14 GMT
< content-type: text/html; charset=utf-8
< access-control-allow-origin: https://tvc-invdn-com.investing.com
< x-requested-with: XMLHttpRequest
< access-control-allow-methods: POST, GET, OPTIONS, PUT, DELETE
< access-control-allow-headers: Content-Type, Depth, User-Agent, X-File-Size, X-Requested-With, If-Modified-Since, X-File-Name, Cache-Control, accept, sessionid, x-csrftoken, content-type
< x-benchmark-1a: 0ms, mem alloc - 768.00Kb start
< x-benchmark-1b: 66ms, mem alloc - 1.25Mb start of TradingviewConnector::getExchanges
< x-benchmark-1c: 66ms, mem alloc - 1.25Mb get currencies in TradingviewConnector::getExchanges
< x-benchmark-1d: 184ms, mem alloc - 8.00Mb get_exchange_attr in TradingviewConnector::getExchanges
< x-benchmark-1e: 455ms, mem alloc - 9.00Mb make list in TradingviewConnector::getExchanges
< x-benchmark-1f: 456ms, mem alloc - 9.00Mb end of TradingviewConnector::getExchanges
< x-benchmark-1g: 457ms, mem alloc - 9.00Mb end
< vary: Accept-Encoding,User-Agent
< content-security-policy: upgrade-insecure-requests; block-all-mixed-content
< cf-cache-status: DYNAMIC
< set-cookie: __cf_bm=EHzHH2HOT6DJb4uLsqBH2gNwP3wdFvD9U9fubmMjR6c-1685890634-0-AZWJpSXr3ShvcdKca0E2t1T7pxeWSprj1xYrm8tggM9qcqs4VHBYloXJVYUdNa9PP6oJXtJK5b7HkQqnVSobM7Y=; path=/; expires=Sun, 04-Jun-23 15:27:14 GMT; domain=.investing.com; HttpOnly; Secure; SameSite=None
< server: cloudflare
< cf-ray: 7d2106f01cf51c18-AMS
< content-encoding: br
< alt-svc: h3=":443"; ma=86400
<
{ [103 bytes data]
100   103    0   103    0     0    147      0 --:--:-- --:--:-- --:--:--   150
* Connection #0 to host tvc6.investing.com left intact
[{"symbol":"AAPL","full_name":"NASDAQ:AAPL","description":"Apple Inc","type":"Stock","ticker":"6408","exchange":"NASDAQ"}]

zh1cheng commented on May 28, 2024 1

@ivgomezarnedo

Amazing! Good job!

For Selenium, maybe the problem is in the driver. The User-Agent in the HTTP header is also a factor that Cloudflare checks.

You can check this link: https://stackoverflow.com/questions/68289474/selenium-headless-how-to-bypass-cloudflare-detection-using-selenium

adalbertobrant commented on May 28, 2024

Hi @alvarobartt, great job my friend, well done! Just sent you 5 bat

Why not use TradingView instead of Investing.com?

JosePereiraUA commented on May 28, 2024

Does anyone have any other option to obtain historical data of investment funds, etc?

vulonviing commented on May 28, 2024

> Does anyone have any other option to obtain historical data of investment funds, etc?

We just have yfinance.

> https://info.signalsciences.com/rs/025-XKO-469/images/signal-sciences-case-study-investingcom.pdf Doesn't look like it's gonna work again any time soon. What a shame.

It is a really big shame. Supporting open source projects should be no problem on the modern internet, but clearly we still have a problem with narrow-minded thinking.

alvarobartt commented on May 28, 2024

Yes, guys... I know it's a shame. I tried my best to contact them, but it seems they are not keen to help!

@JosePereiraUA regarding your question, the most popular options were either investpy (later turned into investiny) or yfinance from @ranaroussi, as pointed out by @vulonviing.

Both have pros and cons, but Yahoo! Finance offers an actual API, which makes that project more stable, whereas investpy and investiny relied on Investing.com scrapers...

That said, I'm still waiting for Investing.com's response, which I assume will never come: I have already been waiting 2 months, people from Investing.com kept redirecting me to more and more contacts, and in the end no one answered...

alvarobartt commented on May 28, 2024

> @alvarobartt Can you contact me as I can share with you how to make it work :)
>
> Without the patch I get a 403 👎, but with some extra setup added I get proper responses as needed.

Sure @RafPe, I'm happy you made it work! Feel free to email me at [email protected] and we can continue the conversation there, and I'm also happy to make you a collaborator if the fix is worth adding 😄

landifrancesco commented on May 28, 2024

Thank you for the great work, @alvarobartt!
Unfortunately, I am getting this error right now. Any fix?

ZhengfengRao commented on May 28, 2024

Awesome open project, I hope we can fix the 403 error.

a2nath commented on May 28, 2024

same problem

zh1cheng commented on May 28, 2024

Just realized that when you send a request to the server, you need to include __cf_bm in the "Cookie" field of the HTTP request header. The __cf_bm value looks something like the following:

__cf_bm=IbnWLE8.iLpa51GMDhU4sgQNuWZzN0YzU4nTSvh5IAI-1704787008-1-AWf1Y3Cbrf9qUKzKKzDhIrgwEYDJFkWQHwwhtH9/Nxh7nKV0TYpXQZIiI+R8w5hQIjf+M17LG3T915oFnZ0xsjY=;

Of course you can find this cookie string via your web browser, set it manually in the script, and then run it. But the problem is how to get this cookie dynamically instead of manually.

Does anyone have a clue?
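
On getting the cookie dynamically: the server sends `__cf_bm` in a `Set-Cookie` response header, so one starting point is to read it from there. Below is a minimal sketch, assuming you already have the raw header string. `extract_cookie` is a hypothetical helper, and the cookie value is a made-up placeholder. Whether replaying the cookie is enough to get past Cloudflare's checks is a separate question.

```python
from typing import Optional


def extract_cookie(set_cookie_header: str, name: str) -> Optional[str]:
    """Return the value of cookie `name` from one Set-Cookie header, or None."""
    # The name=value pair is the first ";"-separated field; the remaining
    # fields are attributes such as path, expires and domain.
    first_pair = set_cookie_header.split(";", 1)[0].strip()
    key, sep, value = first_pair.partition("=")
    if not sep or key.strip() != name:
        return None
    return value


# Placeholder header shaped like a Set-Cookie line (the value is made up).
header = (
    "__cf_bm=PLACEHOLDER-1700000000-0-abc123=; path=/; "
    "domain=.investing.com; HttpOnly; Secure"
)
cf_bm = extract_cookie(header, "__cf_bm")

# The string that would go into the "Cookie" request header.
cookie_field = f"__cf_bm={cf_bm}"
```

Splitting on the first `=` only keeps any trailing `=` padding that is part of the cookie value itself.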

RafPe commented on May 28, 2024

> Just realized that when you send a request to the server, you need to include __cf_bm in the "Cookie" field of the HTTP request header. The __cf_bm value looks something like the following:
>
> __cf_bm=IbnWLE8.iLpa51GMDhU4sgQNuWZzN0YzU4nTSvh5IAI-1704787008-1-AWf1Y3Cbrf9qUKzKKzDhIrgwEYDJFkWQHwwhtH9/Nxh7nKV0TYpXQZIiI+R8w5hQIjf+M17LG3T915oFnZ0xsjY=;
>
> Of course you can find this cookie string via your web browser, set it manually in the script, and then run it. But the problem is how to get this cookie dynamically instead of manually.
>
> Does anyone have a clue?

This cookie belongs to Cloudflare's bot manager, and just copy-pasting it will not cut it, as additional checks exist on their server side.

zh1cheng commented on May 28, 2024

Manually setting the HTTP headers will work: "Origin", "Referer", "Host", "Domain-Id" and "Cookie". The rest of the fields are static values, except "Cookie". You must set valid values for "__cf_bm" and "__cflb" in the cookie field.

Origin=https://www.investing.com
Referer=https://www.investing.com/
Host=api.investing.com
Domain-Id=www
Cookie=__cf_bm=zYbnFeYK5jgrbdaWMH8PXIwcnJLTZzjVO7bELanOuTQ-1705051441-1-AWIFWyqBC1ojnwh4XRxM/ElniX10cVnqkZpjJAnLfESC+S4qDeF5lkHRJUbBA+6sq7EUS2vm+fb+i6wlNUxUcgw=; __cflb=02DiuEaBtsFfH7bEbN4qQwLpwTUxNYEGzXEVQQDTnUDek_

  • You must find a valid cookie in your Chrome browser before setting it in the Python script and running it.

It looks like there is no way to fully automate this, as the server checks the TLS fingerprint. Some work would be needed to send the correct TLS fingerprint. Not worth the effort.
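
For reference, the header set described above can be sketched in Python as follows. `build_headers` is a hypothetical helper name, and the two cookie arguments are placeholders that would have to be replaced with values copied from a live browser session.

```python
from typing import Dict


def build_headers(cf_bm: str, cflb: str) -> Dict[str, str]:
    """Assemble the request headers; only the two cookie values vary."""
    return {
        "Origin": "https://www.investing.com",
        "Referer": "https://www.investing.com/",
        "Host": "api.investing.com",
        "Domain-Id": "www",
        # Both cookies go into a single "Cookie" header, separated by "; ".
        "Cookie": f"__cf_bm={cf_bm}; __cflb={cflb}",
    }


# Placeholder values: real ones come from the browser and expire quickly.
headers = build_headers("CF_BM_VALUE", "CFLB_VALUE")
```

The resulting dict can be passed to an HTTP client, for example through the `headers=` argument of `httpx.get`. Because of the TLS fingerprint check mentioned above, the headers alone may still not be enough.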

RafPe commented on May 28, 2024

> Manually setting the HTTP headers will work: "Origin", "Referer", "Host", "Domain-Id" and "Cookie". The rest of the fields are static values, except "Cookie". You must set valid values for "__cf_bm" and "__cflb" in the cookie field.
>
> It looks like there is no way to fully automate this, as the server checks the TLS fingerprint. Some work would be needed to send the correct TLS fingerprint. Not worth the effort.

I think you should not give up, @zh1cheng... there has been work done on that front and you can simply use it: https://github.com/lwthiker/curl-impersonate

zh1cheng commented on May 28, 2024

@RafPe Thanks for the link. I need some time to study it.

By the way, there might be another solution; can anyone try it? Selenium with Python looks like it can do similar things:

https://selenium-python.readthedocs.io

RafPe commented on May 28, 2024

@zh1cheng you need to take into consideration that the bot manager solution from Cloudflare will detect (in most cases) the automated browser and will eventually block you. So your best choice would be to bake the app into a Docker container, where you can easily use the framework I referenced as linked libraries, or to create a method interface in the code that would execute the calls instead :)

ivgomezarnedo commented on May 28, 2024

@RafPe thanks for your comments. I have been able to make the library work again by modifying how the requests are made (instead of using httpx, it uses the subprocess module to call curl-impersonate directly), and it works fine!
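
Roughly, the subprocess approach can look like the sketch below. This is not the exact patch, only an illustration under assumptions: `build_command` and `fetch` are hypothetical helper names, and `curl_chrome116` stands in for whichever wrapper script your curl-impersonate install provides (the name depends on the build and must be on `PATH`).

```python
import subprocess
from typing import Dict, List, Optional


def build_command(url: str, headers: Optional[Dict[str, str]] = None,
                  binary: str = "curl_chrome116") -> List[str]:
    """Build the argument list for a curl-impersonate wrapper script."""
    # --fail makes curl exit non-zero on HTTP errors such as 403.
    cmd = [binary, "--silent", "--show-error", "--fail"]
    for key, value in (headers or {}).items():
        cmd += ["-H", f"{key}: {value}"]
    cmd.append(url)
    return cmd


def fetch(url: str, headers: Optional[Dict[str, str]] = None,
          binary: str = "curl_chrome116", timeout: float = 30.0) -> str:
    """Run the command and return the response body as text."""
    result = subprocess.run(
        build_command(url, headers, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raises CalledProcessError when curl exits non-zero
    )
    return result.stdout
```

Keeping the command construction separate from the call makes it easy to check the arguments without touching the network, and the body returned by `fetch` can be handed to `json.loads` wherever the library previously parsed an httpx response.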

I'm curious about what you said:

> @zh1cheng you need to take into consideration that the bot manager solution from Cloudflare will detect (in most cases) the automated browser and will eventually block you. So your best choice would be to bake the app into a Docker container, where you can easily use the framework I referenced as linked libraries, or to create a method interface in the code that would execute the calls instead :)

Do you know why Cloudflare could detect a Selenium implementation but not an implementation that uses curl-impersonate?

Furthermore, I don't know what the best option would be to push my working changes (as these changes will add a dependency on curl-impersonate): create a merge request, create a fork of the repo...?

eromoe commented on May 28, 2024

A simple Python module to bypass Cloudflare's anti-bot page
https://github.com/VeNoMouS/cloudscraper
Does this help?

alvarobartt commented on May 28, 2024

> A simple Python module to bypass Cloudflare's anti-bot page https://github.com/VeNoMouS/cloudscraper Does this help?

Thanks for sharing, tried that in the past but had no success with Cloudflare v2

eromoe commented on May 28, 2024

I came up with an idea:
I am using Cloudflare's WARP to bypass the chat.openai.com check (OpenAI blocks some countries). It could be worth a try, but I'm not sure whether OpenAI uses Cloudflare v2. Maybe combine it with cloudscraper.
