
Comments (8)

femtotrader commented on August 25, 2024

In https://github.com/femtotrader/pandas_datareaders_unofficial I was using requests-cache http://requests-cache.readthedocs.org/

but @bashtage also seems to have some ideas about it. He suggested using a Django-style cache decorator:

In terms of caching, is there some reason not to use a django-style @cache decorator that assumes the exact same set of inputs produces the same output, so that only the final result is cached, and the reader doesn't need to be modified?

Maybe we could weigh the pros and cons of each method here.

from pandas-datareader.

bashtage commented on August 25, 2024

I'm not literally suggesting using django's cache - I was suggesting using their strategy, which is really algorithmically simple - same function, same inputs, read cache, else read fresh (in django it is a sql read, here it will be a network read). This would remove an install dependency on a 3rd party caching solution.
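A minimal sketch of that strategy, assuming hashable arguments and a plain in-memory dict as the cache. The names `cached` and `read_prices` are hypothetical and not part of pandas-datareader; the "network read" is simulated.

```python
import functools


def cached(func):
    """Cache results keyed by the call arguments (arguments must be hashable)."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if key not in store:
            # Cache miss: perform the real read and remember the result.
            store[key] = func(*args, **kwargs)
        return store[key]

    wrapper.cache = store  # exposed only so the cache can be inspected
    return wrapper


calls = []


@cached
def read_prices(symbol, source):
    # Stand-in for a network read; records every real call.
    calls.append((symbol, source))
    return f"{symbol} from {source}"


first = read_prices("IBM", "yahoo")
second = read_prices("IBM", "yahoo")  # same inputs: served from the cache
third = read_prices("MSFT", "yahoo")  # different inputs: fresh read
```

The reader function itself is untouched; only the decorator knows about the cache. The trade-off is that nothing here expires or persists between processes.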


femtotrader commented on August 25, 2024

Sorry @bashtage (English is not my mother tongue).

My idea was to pass a session parameter to the DataReader function (with None as the default value). If a requests-cache session is passed, it will be used. If no session is passed, a normal requests session is used. But yes, it adds a dependency on a library (requests for the "normal" session, plus an optional dependency on requests-cache if we want to cache results).

What I don't like about requests-cache (for now) is the lack of support for several databases: requests-cache/requests-cache#43


femtotrader commented on August 25, 2024

I'm working on this here:

https://github.com/femtotrader/pandas-datareader/tree/session

Cache seems to work fine with

  • Google Daily
  • Yahoo Daily

I think that all the other datareaders have a session parameter (set to None by default).
We just need to call

session = _get_session(session)

to get a requests.Session

and use it inside the other datareaders.

Some help would be nice.
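For illustration, a helper like `_get_session` could look roughly like this. This is a guess at its behaviour based on the description above, not the code from the branch: it is assumed to return the caller's session unchanged, or a plain `requests.Session` when none is given.

```python
import requests


def _get_session(session=None):
    # Assumed behaviour: reuse the caller's session if one was passed,
    # otherwise fall back to an ordinary (uncached) requests session.
    if session is None:
        session = requests.Session()
    return session


plain = _get_session(None)      # no session passed: a new requests.Session
shared = requests.Session()
same = _get_session(shared)     # a passed session is returned as-is
```

Because `requests_cache.CachedSession` is a subclass of `requests.Session`, a cached session can be passed in the same way without the reader knowing about it.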


femtotrader commented on August 25, 2024

It should work now (except for some readers such as wb, famafrench...).

Try this:

from pandas_datareader import data
import requests_cache
import datetime
session = requests_cache.CachedSession(cache_name='cache', backend='sqlite', expire_after=datetime.timedelta(days=60))
data.DataReader(["IBM", "MSFT", "AAPL"], "yahoo", session=session)

Then disconnect from the internet and run the same call again:

data.DataReader(["IBM", "MSFT", "AAPL"], "yahoo", session=session)

It should still work fine (thanks to @sinhrks' great work!).

Now we need to write some tests to check that the cache is working correctly.
To do that, we will need to disable the network connection.
Maybe this will help: http://stackoverflow.com/questions/18601828/python-block-network-connections-for-testing-purposes

import socket

def guard(*args, **kwargs):
    raise Exception("I told you not to use the Internet!")
# Any attempt to open a socket now raises instead of touching the network.
socket.socket = guard
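In a test suite the patch should probably be undone afterwards so that other tests can still reach the network. A possible sketch, wrapping the same guard in a context manager (the name `no_network` is hypothetical):

```python
import contextlib
import socket


def guard(*args, **kwargs):
    raise Exception("I told you not to use the Internet!")


@contextlib.contextmanager
def no_network():
    """Temporarily replace socket.socket so new connections fail."""
    original = socket.socket
    socket.socket = guard
    try:
        yield
    finally:
        socket.socket = original  # always restore the real socket class


original_socket = socket.socket
blocked = False
with no_network():
    try:
        socket.socket()
    except Exception:
        blocked = True  # the guard fired instead of opening a socket
restored = socket.socket is original_socket
```

A cache test could first populate the cache normally, then repeat the same DataReader call inside `with no_network():` and check that it still returns data.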

We need to fix the other data readers so that the cache mechanism works for them too.

For example:

from pandas_datareader import famafrench
famafrench.get_available_datasets()

doesn't cache queries.

The same goes for

data.DataReader("5_Industry_Portfolios", "famafrench")

We also need to improve the docs to tell users about this feature.


twiecki commented on August 25, 2024

That looks great. Will it automatically try to update the cache if I request data that's not available?


davidastephens commented on August 25, 2024

@twiecki I think it works like this: when you make a request to a website, it checks whether the response is in the cache and has not expired; if not, it gets a fresh copy from the server.
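The logic described above can be sketched as follows. This only illustrates the "cached and not expired, else fetch" rule; it is not how requests-cache is implemented, and `ExpiringCache` and `fake_fetch` are hypothetical names. The clock is injectable so the expiry can be shown without waiting.

```python
import time


class ExpiringCache:
    """Return a stored response unless it is missing or older than expire_after seconds."""

    def __init__(self, expire_after, clock=time.monotonic):
        self.expire_after = expire_after
        self.clock = clock
        self._store = {}  # url -> (time stored, response)

    def get(self, url, fetch):
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self.expire_after:
            return entry[1]           # cached and still fresh
        value = fetch(url)            # missing or expired: fetch a fresh copy
        self._store[url] = (now, value)
        return value


fetched = []


def fake_fetch(url):
    # Stand-in for a real HTTP request; counts how often it is called.
    fetched.append(url)
    return f"response {len(fetched)} for {url}"


now = [0.0]  # fake clock, advanced by hand
cache = ExpiringCache(expire_after=60, clock=lambda: now[0])
a = cache.get("http://example.com/ibm", fake_fetch)  # first request: fetched
b = cache.get("http://example.com/ibm", fake_fetch)  # within 60 s: from cache
now[0] = 61.0
c = cache.get("http://example.com/ibm", fake_fetch)  # expired: fetched again
```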


davidastephens commented on August 25, 2024

@femtotrader I'm not sure we need to test that the cache is working, since requests-cache isn't a dependency. We should probably test that the session parameter is working, though. Is there any way to confirm with a test that the same requests session was used?

I put together a list of the things we still need to do for this issue. Please add to it if I missed anything.

  • Tests to confirm session is working for each data reader
  • Add session to famafrench.get_available_datasets
  • Others?
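One possible way to confirm that a reader used the session it was given is to pass in a fake session that records its calls. The sketch below uses only hypothetical stand-ins (`RecordingSession`, `FakeResponse`, `read_csv_text`); a real test would call the actual data reader with `session=...` instead.

```python
class FakeResponse:
    """Minimal stand-in for a response object with a .text attribute."""

    def __init__(self, text):
        self.text = text


class RecordingSession:
    """Fake session that records every URL requested through it."""

    def __init__(self):
        self.urls = []

    def get(self, url, **kwargs):
        self.urls.append(url)
        return FakeResponse("date,close\n2015-01-02,100.0\n")


def read_csv_text(url, session):
    # Hypothetical reader: all network access goes through the given session.
    return session.get(url).text


def test_reader_uses_given_session():
    session = RecordingSession()
    text = read_csv_text("http://example.com/prices.csv", session)
    # The request went through our session, and only once.
    assert session.urls == ["http://example.com/prices.csv"]
    assert text.startswith("date,close")


test_reader_uses_given_session()
```

No network access or requests-cache install is needed for this kind of test, since the fake session never opens a connection.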

