
direct-access-py's Introduction

Hi there 👋

direct-access-py's People

Contributors

adamchainz · magerton · wchatx


direct-access-py's Issues

Add to_dataframe method on V2 class

Many users of this module are also using Pandas to create dataframes from their API queries. This issue covers the requirements for providing a method that generates the dataframe for them.

Using the DDL feature of the V2 endpoints allows creation of precise dtypes for dataframes.
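A minimal sketch of the idea, assuming pandas is installed and that the client exposes the DDL feature as a ddl(dataset, database) helper returning CREATE TABLE text; the DDL parsing and the type map below are illustrative only, not a proposed implementation:

import re

import pandas as pd
from directaccess import DirectAccessV2

d2 = DirectAccessV2(api_key='...', client_id='...', client_secret='...')

# Crude, illustrative parse of the DDL text into a column -> dtype map.
ddl_text = d2.ddl('rigs', database='pg')
type_map = {'integer': 'Int64', 'bigint': 'Int64', 'numeric': 'float64',
            'timestamp': 'datetime64[ns]', 'text': 'object'}
dtypes = {}
for col, sql_type in re.findall(r'^\s*"?(\w+)"?\s+(\w+)', ddl_text, re.M):
    dtypes[col] = type_map.get(sql_type.lower(), 'object')

# Materialize the query generator, then apply the derived dtypes.
df = pd.DataFrame.from_records(d2.query('rigs', deleteddate='null'))
df = df.astype({c: t for c, t in dtypes.items() if c in df.columns},
               errors='ignore')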

Still receiving timeout error

Using the following code:

prod_list = []
print('Begin DI Pull')
for ID in GroupedEntityIds:
    for row in d2.query(
            'producing-entity-details',
            fields='ApiNo,EntityId,Gas,Liq,ProdDate,ProdMonthNo',
            EntityId='in({})'.format(','.join([str(x) for x in ID])),
            DeletedDate='null',
            proddate='gt(2009-12-01)',
            pagesize=10000):
        prod_list.append(row)

Error:

RetryError: HTTPSConnectionPool(host='di-api.drillinginfo.com', port=443): Max retries exceeded with url: /v2/direct-access/producing-entity-details?
...
(Caused by ResponseError('too many 503 error responses'))

Unable to install directaccess package in FME environment

Hi,

FME has a PythonCaller module for Python integration. I tried to install the directaccess package in the FME environment with the following command:

fme.exe python -m pip install directaccess

I get the following error messages:


Collecting directaccess
Using cached https://files.pythonhosted.org/packages/c7/b2/3bb51148af50f4aeda5ced745224317357cbaa9ea11cb3ea0995eea69a69/directaccess-1.4.0-py2.py3-none-any.whl
Collecting unicodecsv==0.14.1 (from directaccess)
Using cached https://files.pythonhosted.org/packages/6f/a4/691ab63b17505a26096608cc309960b5a6bdf39e4ba1a793d5f9b1a53270/unicodecsv-0.14.1.tar.gz
Error [WinError 2] The system cannot find the file specified while executing command python setup.py egg_info
Could not install packages due to an EnvironmentError: [WinError 2] The system cannot find the file specified

It appears to have trouble with the "python setup.py egg_info" command. The FME installation is the 2019 version on Windows 10.

Need some help to resolve this issue.

Thanks in advance.

WARNING Throttled token request. Waiting 60 seconds...

I'm failing to get a token. The keys I am using (not shown below; I've substituted dummies) work in other scripts that do not use the directaccess package.

from directaccess import DirectAccessV2

d2 = DirectAccessV2(
    api_key='555555',
    client_id='5555',
    client_secret='555555555555555',
)

Output: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I've also tried:
from directaccess import DirectAccessV2

d2 = DirectAccessV2(
    api_key='<555555>',
    client_id='<5555>',
    client_secret='<555555555555555>',
)

Output:
directaccess WARNING Throttled token request. Waiting 60 seconds...

Is there some sort of issue with my syntax? Do I need new keys to use Direct Access?

Requesting Example For Use-Case Scenario | Multi-Processing into SQL Server Database

A use-case scenario I'm interested in would be some form of the multi-processing example where multiple API endpoints can be queried and then loaded into a SQL server database on a regular basis. An added bonus would be to utilize the Enverus Developer API Best Practices to incrementally update a database by leveraging the "UpdatedDate" and "DeletedDate" fields, but this isn't a critical feature at this point.

Currently, I am using the multi-processing example to download .CSV files from two different API endpoints (Rigs and Rig Analytics) on a daily basis, and a separate Python script I wrote uses the pyodbc package to truncate the target SQL Server tables and load each .CSV file into its respective table and columns. It's clunky, but it works. My concern is that as I add more endpoints, the inefficiency of this approach will come back to haunt me.

Also, I want to mention that I am an amateur Python user, so I'm open to any approach that is most sensible and greatly appreciate what is being provided with the direct-access package. Please let me know if I can provide any further information. Thank you very much!
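As a rough starting point, here is a sketch of the incremental pattern using pyodbc. The connection string, table name (dbo.Rigs), column names (RigID, UpdatedDate), and last-load date are all placeholders; the updateddate filter follows the gt() syntax shown elsewhere in this tracker:

import pyodbc
from directaccess import DirectAccessV2

d2 = DirectAccessV2(api_key='...', client_id='...', client_secret='...')
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;')
cursor = conn.cursor()

# Pull only rows updated since the last successful load instead of
# truncating and reloading everything; the date is a placeholder.
last_load = '2020-01-01'
for row in d2.query('rigs', updateddate='gt({})'.format(last_load),
                    pagesize=10000):
    # Simple delete-then-insert upsert; dbo.Rigs and its columns are
    # hypothetical and should match your actual target table.
    cursor.execute('DELETE FROM dbo.Rigs WHERE RigID = ?;', row['RigID'])
    cursor.execute('INSERT INTO dbo.Rigs (RigID, UpdatedDate) VALUES (?, ?);',
                   row['RigID'], row['UpdatedDate'])
conn.commit()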

Handle endpoint 404s

When a user provides a dataset endpoint that doesn't exist, we don't handle it ourselves; the user just gets back nginx's 404 page.

Provide a useful exception for this
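One possible shape for that, as a standalone check; DAQueryException is the library's existing exception (it appears in tracebacks elsewhere in this tracker), and the response/dataset arguments stand in for the values available inside the client's query method:

from directaccess import DAQueryException

def check_dataset_response(response, dataset):
    # The bare nginx 404 page means the dataset name doesn't exist;
    # surface that as the library's own query exception instead.
    if response.status_code == 404:
        raise DAQueryException(
            'Invalid dataset endpoint: {!r}. See the Direct Access '
            'documentation for valid dataset names.'.format(dataset))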

directaccess.DAQueryException: Non-200 response: 403 Authentication failed

I am getting a 403 Authentication Error partway through a query download. I am confident my API key is correct, and while I see you have an issue open to implement different handling for throttled requests, it seems I am not being throttled here. Any insight?

Login successful...
Downloading production data...
Fri, 09 Oct 2020 12:34:45 directaccess INFO     Wrote 100000 records to file /var/cache/analytics/enverus/temp/2020-10-09T12:31:09.364976-producing-entities.csv
Fri, 09 Oct 2020 12:38:29 directaccess INFO     Wrote 200000 records to file /var/cache/analytics/enverus/temp/2020-10-09T12:31:09.364976-producing-entities.csv
Fri, 09 Oct 2020 12:42:24 directaccess INFO     Wrote 300000 records to file /var/cache/analytics/enverus/temp/2020-10-09T12:31:09.364976-producing-entities.csv
Fri, 09 Oct 2020 12:46:19 directaccess INFO     Wrote 400000 records to file /var/cache/analytics/enverus/temp/2020-10-09T12:31:09.364976-producing-entities.csv
Fri, 09 Oct 2020 12:50:04 directaccess INFO     Wrote 500000 records to file /var/cache/analytics/enverus/temp/2020-10-09T12:31:09.364976-producing-entities.csv
Fri, 09 Oct 2020 12:53:40 directaccess INFO     Wrote 600000 records to file /var/cache/analytics/enverus/temp/2020-10-09T12:31:09.364976-producing-entities.csv
Traceback (most recent call last):
  File "enverus_prod.py", line 66, in <module>
    old_csv = get_production_data(env, temp_path)
  File "enverus_prod.py", line 24, in get_production_data
    env.to_csv(producing_entities, os.path.join(temp_path, filename))
  File "/usr/local/lib/python3.6/dist-packages/directaccess/__init__.py", line 87, in to_csv
    for i, row in enumerate(query, start=1):
  File "/usr/local/lib/python3.6/dist-packages/directaccess/__init__.py", line 336, in query
    response.status_code, response.text)
directaccess.DAQueryException: Non-200 response: 403 Authentication failed

Match `well-origins` with `well-production-values` results

I used the API to request two sets of results:

d2.query('well-origins') and d2.query('well-production-details')

I'm looking to combine the two results to essentially have a table like the one at the bottom of the web tool:
(screenshot from 2020-05-11: the web tool's combined results table)

However I'm not finding a matching ID. Well Origins has a UID for each well, while Well Production Details has the 14-digit API well number. How could I merge these?

Many thanks again!

directaccess WARNING Throttled token request. Waiting 60 seconds...

I am unable to instantiate the DirectAccessV2 class. I double-checked the API key, client ID, and client secret, and they are correct. I get the following error (with debug enabled):

urllib3.connectionpool DEBUG    https://di-api.drillinginfo.com:443 "POST /v2/direct-access/tokens?grant_type=client_credentials HTTP/1.1" 403 41
Mon, 29 Jun 2020 15:00:38 directaccess WARNING  Throttled token request. Waiting 60 seconds...

I am sure my credentials are correct, but this warning repeats 5 times before a DAAuthException is finally thrown:

Traceback (most recent call last):
  File "enverus_global.py", line 24, in <module>
    d2 = login(api_key, client_id, client_secret)
  File "enverus_global.py", line 9, in login
    env = da(api_key, client_id, client_secret)
  File "/usr/local/lib/python3.6/dist-packages/directaccess/__init__.py", line 197, in __init__
    self.access_token = self.get_access_token()['access_token']
  File "/usr/local/lib/python3.6/dist-packages/directaccess/__init__.py", line 254, in get_access_token
    response.status_code, response.text)
directaccess.DAAuthException: Error getting token. Code: 403 Message: Authentication failed

Any insight?

Way to get a count of query results without downloading the dataset?

I'm looking for a way to get a count of results of a query, say:

DirectAccessV2().query('dataset')

and get back the number of results instead of the results themselves; it'd be nice to know the count before downloading each individual record. I spoke with a customer service rep who mentioned something called 'post' (I could be completely wrong), but he wasn't sure whether it was possible or what the syntax would be in the Python API wrapper. Does the generator object have any length attributes or methods?
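The query generator has no length attribute (generators never do); you would have to exhaust it to count. One sketch for probing the count up front is a HEAD request carrying the same filters, assuming the service reports a count header; the header names below are assumptions, not confirmed API behavior:

import requests

# HEAD request with the same filters as the query; no records are
# transferred. The auth and count header names here are assumptions.
resp = requests.head(
    'https://di-api.drillinginfo.com/v2/direct-access/rigs',
    params={'deleteddate': 'null'},
    headers={'Authorization': 'Bearer <access_token>',
             'X-API-KEY': '<api_key>'})
record_count = resp.headers.get('X-QUERY-RECORD-COUNT')  # assumed header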

Finalize to_dataframe()

The to_dataframe() method has not been released yet. Finalize and release:

  • The option to chunk results for large datasets is untested. Remove it for now.
  • Creating the dataframe from a temporary CSV might not be possible in environments that don't allow write access to temp space. Try building it from records instead (see the sketch below).
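A minimal sketch of that direction, assuming pandas; nothing is written to disk:

import pandas as pd

def to_dataframe(client, dataset, **options):
    # Build the frame in memory from the query generator rather than
    # round-tripping through a temporary CSV.
    return pd.DataFrame.from_records(client.query(dataset, **options))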

List of possible options for `query` method?

Hey there. I'm using the DirectAccess API to query for all wells in North America but I'm having trouble figuring out what the possible options for the query method are. From the documentation:


query(dataset, **options)

  • Parameters: dataset – a valid dataset name. See the Direct Access documentation for valid values
  • options – query parameters as keyword arguments

What are the possible "options" or query parameters? I've seen some in the examples provided but is there a full list somewhere?

Ideally I'd like to specify a rectangular Area of Interest using North, South, East & West bounds. Is this possible? Or can I only do geographical AOIs like county='reeves'?
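For what it's worth: the options are the dataset's own field names, plus request controls like fields and pagesize, passed straight through as query-string parameters; the per-dataset field lists in the Direct Access documentation are the full reference. A sketch:

from directaccess import DirectAccessV2

d2 = DirectAccessV2(api_key='...', client_id='...', client_secret='...')

# Each keyword argument becomes a URL query parameter on the dataset
# endpoint; filter functions are passed as string values.
for row in d2.query('well-origins',
                    county='REEVES',      # equality filter on a field
                    deleteddate='null',   # null filter
                    pagesize=10000):      # paging control
    print(row)

The wrapper itself adds no AOI logic on top of these parameters; if the dataset exposes latitude/longitude fields, bounding them with gt()/lt()-style filters could approximate a rectangular AOI (check the dataset's field list).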

Deprecation warning from urllib3 Retry

I'm seeing this warning in a project using directaccess:

/.../python3.9/site-packages/directaccess/__init__.py:69: DeprecationWarning: Using 'method_whitelist' with Retry is deprecated and will be removed in v2.0. Use 'allowed_methods' instead
  retries = Retry(

The docs describe this as a simple replacement.
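For reference, the rename looks like this; urllib3 1.26+ accepts allowed_methods, and the other Retry arguments here are illustrative rather than the library's actual values:

from urllib3.util.retry import Retry

retries = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504],
    allowed_methods=['GET', 'POST'],  # was: method_whitelist=['GET', 'POST']
)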

Certificate problems behind corporate firewall

I am behind a corporate firewall. Any time a request uses certificate validation it fails, and I have to disable verification. I have been unable to figure out where to add verify=False in the package's __init__.py module; I've tried adding it in several locations. Any assistance would be appreciated.
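One workaround that avoids editing the installed package: point requests at the corporate CA bundle via an environment variable before the client is created. The token request fires inside __init__ (per the tracebacks elsewhere in this tracker), so patching the client afterwards comes too late; the bundle path below is a placeholder:

import os

from directaccess import DirectAccessV2

# Tell requests to trust the corporate CA; this must be set before the
# client is instantiated because __init__ immediately requests a token.
os.environ['REQUESTS_CA_BUNDLE'] = '/path/to/corporate-ca.pem'  # placeholder

d2 = DirectAccessV2(api_key='...', client_id='...', client_secret='...')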

How can I query specific fields of a dataset?

I want to query the producing-entity-details dataset, but I'd like to download only records with UpdatedDate after, or between, certain dates. I can see in the API Explorer that one can make a request with parameters such as btw(date1, date2) or gt(date):

Field: UpdatedDate (required: no). Value should be in the RFC3339 format. Date the record was updated. Examples: updateddate=2017-01-15, updateddate=eq(2017-01-15)

but I'm not sure how this translates to the DirectAccessV2.query() method.
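Those filter expressions translate directly: pass them as string values for the field's keyword argument, the same way proddate='gt(...)' is used elsewhere in this tracker. The dates below are placeholders, and the btw() argument format is as shown in the API Explorer:

from directaccess import DirectAccessV2

d2 = DirectAccessV2(api_key='...', client_id='...', client_secret='...')

# Records updated after a date:
recent = d2.query('producing-entity-details', updateddate='gt(2017-01-15)')

# Records updated between two dates:
window = d2.query('producing-entity-details',
                  updateddate='btw(2017-01-15, 2017-06-30)')

for row in recent:
    print(row)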

Add helper methods for relational endpoints

The Enverus Developer API V2 has two high-level concepts in which many endpoints participate: the well hierarchy and production. This issue covers the requirements for creating helper methods that allow a user to provide a query to one of the top-level endpoints (well-origins or producing-entities) and retrieve records for the other participating endpoints (wellbores, packers, completions, producing-entity-details, etc) that correspond to their parent.
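A rough sketch of what such a helper could look like, built on the existing query method and the in() filter function; the key field names (UID on well-origins, WellUID on wellbores) are assumptions for illustration only:

from directaccess import DirectAccessV2

def query_with_children(client, parent_dataset, child_dataset,
                        parent_key, child_key, **parent_options):
    # Fetch parent records, then fetch children by filtering the child
    # endpoint on the parents' key values with the in() filter function.
    parents = list(client.query(parent_dataset, **parent_options))
    keys = [str(p[parent_key]) for p in parents]
    children = []
    for i in range(0, len(keys), 100):  # chunk to keep the URL short
        batch = ','.join(keys[i:i + 100])
        children.extend(client.query(
            child_dataset, **{child_key: 'in({})'.format(batch)}))
    return parents, children

# Hypothetical usage; the key field names depend on the actual schemas:
d2 = DirectAccessV2(api_key='...', client_id='...', client_secret='...')
wells, bores = query_with_children(d2, 'well-origins', 'wellbores',
                                   'UID', 'WellUID', county='REEVES')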

Remove `links` argument from V2 class init

Allowing a user to provide a persisted links object was intended to ease recovering from a failure but created more confusion than it was worth.

Remove the links argument and note this breaking change in the README

enhancement: Completions Query

I would really like to see how to pull completions matched to when they were filed with the state. I can't seem to figure out how to extract the most recent W-2 filings by the date they were filed (not the date the well was completed), if that makes sense. I'm sure I'm missing a relationship. For example, I've linked to the most recent New Mexico completions table. I can pull the data if I already know the API number, but I'm unsure how to do this without first going to the website to get the well IDs.

https://wwwapps.emnrd.state.nm.us/ocd/ocdpermitting/Reporting/Activity/WeeklyActivity.aspx

thanks!

Differentiate between bad API key and throttled token request

There is currently no difference in handling between a bad API key and a throttled token request.

When a user provides an incorrect API key, ensure an appropriate exception is thrown immediately.

When a user is throttled from requesting tokens too quickly, warn the user and wait 60 seconds before trying again.
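A sketch of how the token handler could split the two cases. The status codes below are assumptions about how the service signals each condition (the log in the issue above shows a 403 during throttling, so the real discriminator may differ):

import time

from directaccess import DAAuthException

def handle_token_response(response, attempt, max_attempts=5):
    # Assumed: 401 means bad credentials -> fail immediately.
    if response.status_code == 401:
        raise DAAuthException('Error getting token. Code: {} Message: {}'
                              .format(response.status_code, response.text))
    # Assumed: 429 means throttled -> warn, wait 60s, retry up to a limit.
    if response.status_code == 429:
        if attempt >= max_attempts:
            raise DAAuthException('Token request still throttled; giving up')
        time.sleep(60)
        return 'retry'
    return 'ok'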
