backblaze / b2-sdk-python
Python library to access B2 cloud storage.
License: Other
Several times over the past several days we have witnessed b2 upload-file hang. We run these commands with --noProgress, and there is no output from our logging that indicates anything went wrong; the command simply hangs for >24h until a human intervenes. In debugging this we've seen a connection open to Backblaze but no network traffic being sent or received by the process. Every time we kill and restart the process, it succeeds, so debugging this is rather tricky.
We have experienced nearly identical issues with other Python projects that were traced back to a lack of a timeout parameter in python requests. I can't confirm this is exactly what is happening here, but I do see what appear to be similar code issues with .post and .get, which seem to be at the core of the b2 HTTP API (both before the 1.4.0 release and after):
https://github.com/Backblaze/b2-sdk-python/blob/master/b2sdk/b2http.py#L290
https://github.com/Backblaze/b2-sdk-python/blob/master/b2sdk/b2http.py#L358
Without a timeout there, if the server simply stops responding but keeps the socket open, my understanding and experience is that these calls will hang indefinitely.
From the requests docs:
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failure to do so can cause your program to hang indefinitely:
>>> requests.get('https://github.com/', timeout=0.001)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
Note timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
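A minimal sketch of the kind of fix this suggests: pass an explicit timeout to each call. The (connect, read) values below are illustrative choices, not values taken from b2sdk:

import requests

# With an explicit timeout, a server that keeps the socket open but stops
# responding raises requests.exceptions.Timeout instead of hanging forever.
# (5, 60) means: 5s to establish the connection, 60s max between bytes read.
response = requests.post('https://httpbin.org/post', timeout=(5, 60))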
In order to delete a file you need to know the file id of the version to delete, but unless I'm missing something there is no straightforward way to list the different versions of a file with a given filename. bucket.ls exists, but since it enforces a trailing / on the folder you pass, the only way I can see right now to get this through the public API is to do something like this:
import os

def get_file_ids(bucket, filename):
    for file_info, folder_name in bucket.ls(folder_to_list=os.path.dirname(filename), show_versions=True):
        if file_info.file_name == filename:
            yield file_info
Obviously this potentially wastes bandwidth by returning all files in the same "folder".
def make_folder_sync_actions(
    source_folder, dest_folder, args, now_millis, reporter, policies_manager=DEFAULT_SCAN_MANAGER
):
    """
    Yields a sequence of actions that will sync the destination
    folder to the source folder.
    """
    if args.skipNewer and args.replaceNewer:
        raise CommandError('--skipNewer and --replaceNewer are incompatible')
    if args.delete and (args.keepDays is not None):
        raise CommandError('--delete and --keepDays are incompatible')
    if (args.keepDays is not None) and (dest_folder.folder_type() == 'local'):
        raise CommandError('--keepDays cannot be used for local files')
In the SDK those should probably be assertions, and we should have similar checks in the CLI with proper reporting to the user.
Request:
Provide type annotations for Python, sufficient to support tools like mypy.
A starting point perhaps: https://mypy.readthedocs.io/en/stable/stubgen.html?highlight=generate
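For example, a first pass with stubgen might look like this (a sketch; the generated .pyi stubs would need manual cleanup before shipping):

pip install mypy
stubgen -p b2sdk -o stubs/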
Not sure if this has already been considered, but I saw no record of it.
It would also make using some of your APIs more intuitive, as I have a hard time discerning return types.
Thanks!
b2-sdk-python/b2sdk/session.py, line 67 in b103939
If the user specifies no AccountInfo object, they get a SqliteAccountInfo and an AuthInfoCache by default. This is a good thing. However, if they specify any other AccountInfo object (InMemoryAccountInfo, for instance), they get a much worse cache: the DummyCache.
What could break if this defaulted to at least the InMemoryCache? Personally I don't see the harm in always defaulting to the AuthInfoCache instead. AbstractAccountInfo already forces the implementation of the functions necessary for AuthInfoCache to function correctly. This could only break existing code where people use their own AccountInfo objects that are themselves broken.
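A sketch of the proposed defaulting, to make the suggestion concrete (hypothetical; names mirror this report, not necessarily the sdk's exact code):

# If the caller supplies any AccountInfo but no cache, fall back to
# AuthInfoCache rather than DummyCache:
if account_info is None:
    account_info = SqliteAccountInfo()
if cache is None:
    cache = AuthInfoCache(account_info)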
Hi Author. Thanks a lot for this great SDK. Very easy to set up and use.
I'm wondering how I can avoid uploading duplicate files?
Say I upload a file twice with the same key; is there a way to reject the second upload somehow?
Thanks!
Move b2sdk/account_info/test_upload_url_concurrency.py, and move test_raw_api, test_raw_api_helper, _clean_and_delete_bucket, _should_delete_bucket, _add_range_header out of b2sdk/raw_api.py, to somewhere else, so that we can still execute them in pre-commit.sh, but so that they do not get test coverage tracking and are not shipped in the package to the library user.
When trying to make a backup with benji to Backblaze B2 I get this error:
ERROR: An exception of type AttributeError occurred: module 'b2' has no attribute 'bucket'
Hello,
I have a Python 3 application leveraging the b2sdk; how can I throttle bucket.upload_local_file(), for example? I'd like to limit it to a set bandwidth if possible, say 100 Mbps; is that possible?
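I'm not aware of a built-in bandwidth option; one generic workaround sketch (plain Python, not a b2sdk feature) is to wrap the stream you upload in a rate-limiting reader:

import time

class ThrottledReader:
    """File-like wrapper that caps read bandwidth at max_bytes_per_second."""

    def __init__(self, stream, max_bytes_per_second):
        self._stream = stream
        self._rate = max_bytes_per_second
        self._start = time.monotonic()
        self._consumed = 0

    def read(self, size=-1):
        data = self._stream.read(size)
        self._consumed += len(data)
        # Sleep just long enough to keep consumed / elapsed <= rate.
        expected = self._consumed / self._rate
        elapsed = time.monotonic() - self._start
        if expected > elapsed:
            time.sleep(expected - elapsed)
        return data

For 100 Mbps, max_bytes_per_second would be 12_500_000 (100,000,000 bits / 8).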
from https://mail.python.org/archives/list/[email protected]/message/EYLXCGGJOUMZSE5X35ILW3UNTJM3MCRE/
Use the development mode to see DeprecationWarning and ResourceWarning: use the "-X dev" command line option or set the PYTHONDEVMODE=1 environment variable. Or you can use the PYTHONWARNINGS=default environment variable to see DeprecationWarning.
You might even want to treat all warnings as errors to ensure that you don't miss any when you run your test suite in your CI. You can use PYTHONWARNINGS=error, and combine it with PYTHONDEVMODE=1.
Warnings filters can be used to ignore warnings in third party code, see the documentation:
https://docs.python.org/dev/library/warnings.html#the-warnings-filter
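For example, to run a test suite with every warning surfaced and treated as an error (pytest here is just an illustrative runner, not the project's configured one):

PYTHONDEVMODE=1 PYTHONWARNINGS=error python -m pytest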
The command run from a cron.hourly script was
b2 sync --noProgress --keepDays 14 /home/data/v2 b2://bhs-backup/ > /dev/null
b2 wrote to stderr
ERROR:b2sdk.bucket:error when uploading, upload_url was https://pod-000-1009-12.backblaze.com/b2api/v2/b2_upload_file/redacted/redacted
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/b2sdk/bucket.py", line 615, in _upload_small_file
content_type, HEX_DIGITS_AT_END, file_info, hashing_stream
File "/usr/local/lib/python2.7/dist-packages/b2sdk/raw_api.py", line 533, in upload_file
return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 297, in post_content_return_json
response = _translate_and_retry(do_post, try_count, post_params)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 127, in _translate_and_retry
return _translate_errors(fcn, post_params)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 60, in _translate_errors
int(error['status']), error['code'], error['message'], post_params
ServiceError: 500 internal_error incident id 62096cb3b343-0424780b144c
Hi,
First, thanks for the awesome Python SDK.
I have one question: how do we retrieve the friendly and native URL using this SDK?
When I upload a file, I just get a FileVersionInfo object, which according to the docs doesn't have a friendly or a native url property.
I think a way to get these URLs is useful if I want to upload a file and link that file (based on these URLs) to a column (for example file_url) in my database.
For now I'm just hardcoding the url with something like:
response = bucket.upload_bytes() # ....
file_id = response.id_
url = "https://f000.backblazeb2.com/b2api/v1/b2_download_file_by_id?fileId=" + file_id
Thanks
It should be documented what the default values for the parameters are, which ones are required, why it is set up this way, and what the intended usage is.
I have a loop uploading files locally to B2 cloud. After a few thousand uploads over a few hours, I get this error:
b2sdk.exception.UnknownError: Unknown error: 400 bad_request more than one upload using auth token 4_0021a87abca5e6b0000000005_018fe4f8_97f3ec_uplg_wANv-9n9ent4nLDoDI0yMFmZqeQ=
I feel like the upload_local_file method should be able to renegotiate a new auth token on its own rather than failing like this.
def sync_folders(
    source_folder,
    dest_folder,
    args,
    now_millis,
    stdout,
    no_progress,
    max_workers,
    policies_manager=DEFAULT_SCAN_MANAGER,
    dry_run=False,
    allow_empty_source=False,
):
Making args a mandatory argument is a choice that we made back in the CLI days, and an area we can improve on.
Hi --
I'd like to set the bucket info on my bucket to not have any entries. To do that, I tried calling bucket.set_info() and got an assert. Here's a simple failing case:
b2_api = B2Api(InMemoryAccountInfo())
b2_api.authorize_account(url, app_key_id, app_key)
bucket = b2_api.get_bucket_by_name("myBucket")
bucket.set_info({})
This triggers an assert on line 558, in update_bucket:
assert bucket_info or bucket_type
I'm running on macOS 10.15.7.
I'm using Python 3.8.6 installed with mac ports.
I have the following b2 things installed with pip3:
$ pip3 list | grep b2
b2 2.0.2
b2sdk 1.1.4
Here's the stack trace that's generated: stack.txt
A similar thing happens when I use bucket.update(bucket_info={}), which ends up in the same place. I'm currently working around it by passing the bucket's type back in:
bucket.update(bucket_type=bucket.as_dict()['bucketType'], bucket_info={}, lifecycle_rules=[])
clear_large_file_upload_urls may not clear some URLs that are not in the pool right now because they are "rented" by upload threads. Those would need to be blacklisted, or kept in a structure that tracks them even during the rental period, or something like that.
The documentation doesn't say so, but the idea behind that feature in the first place was that a failing pod is very likely to fail any subsequent requests, so the sdk code invalidates "sister" tokens for that same pod in order to save a few failing requests and go directly to retrieval of new upload urls+tokens.
It is not a severe issue - the code improves behavior in a corner case and can be refined later. I am filing this issue so we don't forget about the problem; maybe it can be solved along with some other similar issue.
Not exactly related to the SDK; however, I am trying to access B2 using Python with the requests library to create a new key. My code is as below:
import requests

response = requests.get(
    "https://api.backblazeb2.com/b2api/v2/b2_create_key",
    params={
        'accountId': 'xxxx',
        'capabilities': ['listKeys', 'listBuckets', 'listFiles', 'readFiles',
                         'shareFiles', 'writeFiles', 'deleteFiles'],
        'keyName': 'test',
    },
    headers={'Authorization': 'xxxx'},
)
I have included the values for the account ID and the authorization. I am able to access other B2 services. However, when using the above function I get the following error:
{'code': 'bad_request', 'message': 'duplicate name in query string: capabilities', 'status': 400}
Any idea how to overcome this?
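The B2 docs describe b2_create_key as a POST call with a JSON body; sending the capabilities list as JSON avoids requests encoding it as a repeated query-string parameter, which is the likely cause of the "duplicate name in query string" error. A sketch of that change:

import requests

# POST with a JSON body; the capabilities array is serialized as JSON
# instead of being expanded into repeated query-string parameters.
response = requests.post(
    "https://api.backblazeb2.com/b2api/v2/b2_create_key",
    json={
        'accountId': 'xxxx',
        'capabilities': ['listKeys', 'listBuckets', 'listFiles', 'readFiles',
                         'shareFiles', 'writeFiles', 'deleteFiles'],
        'keyName': 'test',
    },
    headers={'Authorization': 'xxxx'},
)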
The command run from a cron.hourly script was
b2 sync --noProgress --keepDays 14 /home/data/v2 b2://bhs-backup/ > /dev/null
b2 wrote to stderr
ERROR:b2sdk.bucket:error when uploading, upload_url was https://pod-000-1117-16.backblaze.com/b2api/v2/b2_upload_file/xxxx/xxxx
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/b2sdk/bucket.py", line 615, in _upload_small_file
content_type, HEX_DIGITS_AT_END, file_info, hashing_stream
File "/usr/local/lib/python2.7/dist-packages/b2sdk/raw_api.py", line 533, in upload_file
return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 297, in post_content_return_json
response = _translate_and_retry(do_post, try_count, post_params)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 127, in _translate_and_retry
return _translate_errors(fcn, post_params)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 84, in _translate_errors
raise B2ConnectionError(str(e0))
B2ConnectionError: Connection error: HTTPSConnectionPool(host='pod-000-1117-16.backblaze.com', port=443): Max retries exceeded with url: /b2api/v2/b2_upload_file/xxxx/xxxx (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f139ca04c10>: Failed to establish a new connection: [Errno 111] Connection refused',))
Currently, the SDK is able to synchronize files between two B2 buckets (implemented in #165), but it synchronizes only the latest versions as the whole idea of synchronization works on files and not on file versions.
We may consider adding a feature to sync every version of the files. It may not be b2 sync but something else, or a special b2 sync mode.
The requirements.txt lists arrow>=0.8.0,<0.13.1.
arrow is currently at 0.14.2, so the pin causes issues/warnings in pip.
I reviewed the code; it's just a few lines of very basic usage which doesn't appear to have any issues with later versions. Please update the requirements.txt file.
The command run from a cron.hourly script was
b2 sync --noProgress --keepDays 14 /home/data/v2 b2://bhs-backup/ > /dev/null
b2 wrote to stderr
ERROR:b2sdk.bucket:error when uploading, upload_url was https://pod-000-1128-03.backblaze.com/b2api/v2/b2_upload_file/redacted/redacted
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/b2sdk/bucket.py", line 615, in _upload_small_file
content_type, HEX_DIGITS_AT_END, file_info, hashing_stream
File "/usr/local/lib/python2.7/dist-packages/b2sdk/raw_api.py", line 533, in upload_file
return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 297, in post_content_return_json
response = _translate_and_retry(do_post, try_count, post_params)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 119, in _translate_and_retry
return _translate_errors(fcn, post_params)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 83, in _translate_errors
raise BrokenPipe()
BrokenPipe: Broken pipe: unable to send entire request
ERROR:b2sdk.bucket:error when uploading, upload_url was https://pod-000-1009-00.backblaze.com/b2api/v2/b2_upload_file/redacted/redacted
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/b2sdk/bucket.py", line 615, in _upload_small_file
content_type, HEX_DIGITS_AT_END, file_info, hashing_stream
File "/usr/local/lib/python2.7/dist-packages/b2sdk/raw_api.py", line 533, in upload_file
return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 297, in post_content_return_json
response = _translate_and_retry(do_post, try_count, post_params)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 127, in _translate_and_retry
return _translate_errors(fcn, post_params)
File "/usr/local/lib/python2.7/dist-packages/b2sdk/b2http.py", line 60, in _translate_errors
int(error['status']), error['code'], error['message'], post_params
ServiceError: 503 service_unavailable c001_v0001009_t0016 is too busy
We are in the process of setting up monitoring and alerts for our backblaze backups, so we are notified if one of our backup processes stops working.
Some metrics I'd like to track per bucket are:
For total size and number of files - I found Backblaze/B2_Command_Line_Tool#404, which adds --showSize to the CLI. But I looked at the code that calculates this, and it recursively visits every file in the bucket and adds the sizes up. That's simply not going to perform well. (It worked fine for a small bucket, but when I tried it on one of our larger buckets I didn't get a result after 15 minutes of waiting and killed the process.) What's strange is, I can see this info on the Backblaze website, so it seems like you already know these stats per bucket. Is there a chance they could be exposed somehow?
For time since last update, we could add a "canary file" to the root of each bucket, make sure it gets updated regularly, and check that... but it would be far less brittle if Backblaze provided this info. Do you store it? If so, is there any way to access it?
For number of versions, I could parse lifecycle_rules out of the bucket info, so that's fine.
Any guidance here? We use prometheus for metrics, so the plan is to use this python sdk and write a simple client to export the above metrics. It would be generic enough that we could open-source the project so others could use it. But as it stands right now, I can't figure out a way to get enough information to create useful metrics in a generic way.
Thanks!
Using b2api 1.0.2.
I have a bucket, and I can list files and get URLs and everything, so I think the bucket object is working.
However, when I call bucket.download_file_by_name("filepath/filename", "download_path")
I get "AttributeError: 'str' object has no attribute 'make_file_context'".
I can download the file from the Backblaze webpage, and I can get the URL from the b2api, so it doesn't seem to be an issue with the specific file.
Here is the trace:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-18-88c956cdf6c6> in <module>
----> 1 bucket.download_file_by_name("filepath/filename", "download_path")
~/.pyenv/versions/3.8.1/envs/scipy/lib/python3.8/site-packages/logfury/v0_1/trace_call.py in wrapper(*wrapee_args, **wrapee_kwargs)
82 # actually log the call
83 self.logger.log(self.LEVEL, 'calling %s(%s)%s', function_name, arguments, suffix)
---> 84 return function(*wrapee_args, **wrapee_kwargs)
85
86 return wrapper
~/.pyenv/versions/3.8.1/envs/scipy/lib/python3.8/site-packages/b2sdk/bucket.py in download_file_by_name(self, file_name, download_dest, progress_listener, range_)
256 url_factory=self.api.account_info.get_download_url,
257 )
--> 258 return self.api.transferer.download_file_from_url(
259 url, download_dest, progress_listener, range_
260 )
~/.pyenv/versions/3.8.1/envs/scipy/lib/python3.8/site-packages/logfury/v0_1/trace_call.py in wrapper(*wrapee_args, **wrapee_kwargs)
82 # actually log the call
83 self.logger.log(self.LEVEL, 'calling %s(%s)%s', function_name, arguments, suffix)
---> 84 return function(*wrapee_args, **wrapee_kwargs)
85
86 return wrapper
~/.pyenv/versions/3.8.1/envs/scipy/lib/python3.8/site-packages/b2sdk/transferer/transferer.py in download_file_from_url(self, url, download_dest, progress_listener, range_)
94 )
95
---> 96 with download_dest.make_file_context(
97 metadata.file_id,
98 metadata.file_name,
~/.pyenv/versions/3.8.1/lib/python3.8/contextlib.py in __enter__(self)
111 del self.args, self.kwds, self.func
112 try:
--> 113 return next(self.gen)
114 except StopIteration:
115 raise RuntimeError("generator didn't yield") from None
~/.pyenv/versions/3.8.1/envs/scipy/lib/python3.8/site-packages/b2sdk/download_dest.py in write_file_and_report_progress_context(self, file_id, file_name, content_length, content_type, content_sha1, file_info, mod_time_millis, range_)
211 mod_time_millis, range_
212 ):
--> 213 with self.download_dest.make_file_context(
214 file_id, file_name, content_length, content_type, content_sha1, file_info,
215 mod_time_millis, range_
AttributeError: 'str' object has no attribute 'make_file_context'
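Judging from the trace, download_file_by_name expects a download destination object rather than a path string; a sketch using the v1 class that appears elsewhere in these reports:

from b2sdk.v1 import DownloadDestLocalFile

# Wrap the local path in a DownloadDest object instead of passing a str:
dest = DownloadDestLocalFile('download_path')
bucket.download_file_by_name('filepath/filename', dest)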
Hello,
I've read quite a bit through the documentation, but haven't really found an answer.
Users are uploading their files to the website (using Flask) and I'd like to upload these files to Backblaze B2. I have a FileStorage object which I'd rather not save to disk locally just to upload to Backblaze afterwards, as that creates an unnecessary step.
The way I see it, there are 'upload_local_file' and 'upload'.
The first requires the full path to the file, which I don't have, and for the second one, upload, I don't quite understand how to use it or what upload_source is supposed to be.
Can I achieve what I want, or does the API not support uploading files directly?
Edit: an easier question while I work around this issue: how do I get the generated id when I upload a file with upload_local_file?
Thanks
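One way to avoid the temp file, sketched under the assumption that the payload fits in memory (file_storage is the Werkzeug FileStorage from the Flask request; the destination name is a placeholder):

# Read the uploaded bytes and push them straight to B2; upload_bytes
# returns a FileVersionInfo, whose id_ attribute is the generated file id.
data = file_storage.read()
file_version = bucket.upload_bytes(data, 'uploads/my_file.txt')
print(file_version.id_)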
The package is being built, but the tests are not being run.
... so that if the main thread terminates, they terminate as well on their own. I think we don't want the library user to have to call wait() on the internal threads of b2sdk.
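A generic sketch of the idea (plain threading, not the sdk's actual code): daemon threads are terminated automatically when the interpreter exits, so nobody has to join them:

import threading
import time

def background_work():
    while True:
        time.sleep(1)  # stand-in for internal sdk work

worker = threading.Thread(target=background_work, daemon=True)
worker.start()
# When the main thread exits, the daemon worker is killed along with it.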
Hi,
I know that this question was asked previously here but I had some further questions regarding this functionality.
You previously mentioned that we could use list_file_versions to do this - which is fairly easy (just limit the fetch_count to 1). But my concern with that is that in terms of "API cost" it's more expensive than calling get_file_id, as list_file_versions is in the same class as list_file_names.
So my question is, could you have a call which assumes the latest version for the file you want the information about?
If this is resolved with some other solution then I would be happy to use that but I am just trying to reduce the number of API calls and I couldn't see any obvious alternative.
Thanks
Sorry if this has already been discussed before, but I'm uploading to B2 using:
client.get_bucket_by_name(bucket_name).upload_bytes(b"0" * 2**18, path)
and I see that the uploaded file size on the B2 web portal is 262 KB instead of 256 KB. Is this expected? Is there some automatic padding that is added?
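The arithmetic alone may explain this (no padding involved): 2**18 bytes = 262,144 bytes, which is exactly 256 KiB in 1024-based units but displays as 262 KB if the portal uses 1000-based (SI) units.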
The command run from a cron.hourly script was
b2 sync --noProgress --keepDays 14 /home/data/v2 b2://bhs-backup/ > /dev/null
b2 wrote to stderr
ERROR: FAILED to upload after 5 tries. Encountered exceptions: Connection error: ('Connection aborted.', BadStatusLine("''",))
Connection error: ('Connection aborted.', BadStatusLine("''",))
Broken pipe: unable to send entire request
Connection error: ('Connection aborted.', error(104, 'Connection reset by peer'))
Connection error: ('Connection aborted.', error(104, 'Connection reset by peer'))
Hi there,
Is there functionality to upload a file to a folder when using the upload_local_file() function?
Example: "my_folder/my_file.txt"
I assumed that it would be possible simply by prepending the B2 folder path to the file_name argument. However, this didn't work (no file appears, although the function doesn't throw any errors).
I've confirmed that uploading a file without any folder (e.g. "my_file.txt") does work.
What is the correct way to upload a file to a specific folder?
Just to be clear, the folder does exist.
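For reference, a sketch of the slash-in-file-name approach being described (B2 has no real directories; a "folder" is just a prefix of the file name, so this is the expected pattern; paths here are placeholders):

bucket.upload_local_file(
    local_file='/tmp/my_file.txt',
    file_name='my_folder/my_file.txt',
)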
It would be more convenient for users if the most-used classes, like B2Api, were available in the top-level package. Some things are internal, some are not; it is very hard for the user to tell which things should be used and which shouldn't.
Following the simple installation instructions (in a virtualenv) does not seem to work:
% virtualenv v
% . v/bin/activate
(v) % pip install b2sdk
(v) % python -c 'from b2sdk.v1 import InMemoryAccountInfo'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'b2sdk.v1'
Extra info:
(v) % python --version
Python 3.7.3
(v) % pip show b2sdk
Name: b2sdk
Version: 0.1.8
Summary: Backblaze B2 SDK
Home-page: https://github.com/Backblaze/b2-sdk-python
Author: Backblaze, Inc.
Author-email: [email protected]
License: MIT
Location: /home/rob/tmp/v/lib/python3.7/site-packages
Requires: logfury, requests, tqdm, six, setuptools, arrow
Required-by:
When one passes a dict to file_info, it is not checked whether too many fields are present.
A cryptic error is thrown instead, i.e.:
FAILED to upload after 5 tries. Encountered exceptions: Broken pipe: unable to send entire request
After getting a Bucket instance by calling B2Api.get_bucket_by_id(), I'm unable to download files with bucket.download_file_by_name(). A minimal example of the problem:
from b2sdk.v1 import InMemoryAccountInfo, B2Api, DownloadDestLocalFile
app_id = '<app_id>'
app_key = '<app_key>'
bucket_id = '<bucket_id>'
bucket_name = '<bucket_name>'
b2 = B2Api(InMemoryAccountInfo())
b2.authorize_account('production', app_id, app_key)
bucket = b2.get_bucket_by_id(bucket_id)
# bucket = b2.get_bucket_by_name(bucket_name)
dst = DownloadDestLocalFile('local_file_name')
bucket.download_file_by_name('<b2_file_path>', dst)
If the commented line that calls get_bucket_by_name() is used instead, the snippet works as expected.
Tried both v1.0.2 on PyPI and installing directly from master (more specifically 1707190).
The problem seems to be that the Bucket class assumes that its name attribute is always set in download_file_by_name(), which is not a correct assumption when the bucket is instantiated with B2Api.get_bucket_by_id().
Is this a bug or have I missed something in the documentation?
The command
b2 sync --noProgress --keepDays 14 /home/data/v2 b2://backup/ > /path/to/logfile
produced this error on stderr.
WARNING:b2sdk.sync.report:could not output the following line with encoding None on stdout due
to 'ascii' codec can't encode character u'\xfa' in position 40: ordinal not in range(128):
upload files/Playlist-for-Derecho-a-la-música-9-15-19.csv
I think this is not my fault as a user of the b2 command.
If b2 uses ASCII on stdout then it should take care not to send any non-ASCII characters to stdout, to avoid this Python error.
Better than that would be to use UTF-8 for stdout, at least when it is not connected to a terminal.
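As a stopgap on the user side, forcing the interpreter's stream encoding may help (standard Python behavior, not an sdk feature):

PYTHONIOENCODING=utf-8 b2 sync --noProgress --keepDays 14 /home/data/v2 b2://backup/ > /path/to/logfile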
We need to significantly alter how uploads are handled in b2sdk to provide better configurability and to properly handle multi-part server-side copy.
The logic of uploads is going to be moved to the transferer, but it is going to be pretty abstract. The same flow will be used by single-part uploads, multi-part uploads, multi-part upload continuation, single-part server-side copy, multi-part server-side copy, as well as "emerging" - synthesizing a file out of pieces, some of which are in the cloud already and some are still local and need to be uploaded. The Emerger will be capable of uploading, server-side copying, and downloading a stream to then upload it; that's why it is not called an Uploader.
A corner case of that is when we need to download some data to then upload it again (due to the server-side lower size limit on a part, currently equal to 5MB), and another is that sometimes a very large file needs to be copied and Emerge will need to break it down into smaller chunks, otherwise the server-side copy will refuse to handle the request.
The copy operation will have an optional flag which will terminate the operation if any data would need to be downloaded in order to fulfill the copy (i.e. if it would not be a purely server-side copy). Setting that flag to True will force the Emerger to consume the entire iterable to verify whether the operation is a pure copy (unless it is not - then it can exit as soon as a range that cannot be satisfied without downloading is found).
There will be two thread pools, one for uploading and another one for downloading. This is to avoid deadlocks when we'd need to download data in order to upload it back and all threads in the pool are busy doing that (waiting for someone to download the data). Since downloads will never wait for uploads (in the current design), two threadpools will ensure we avoid a deadlock.
To decrease the final PR size, we'll move the upload logic as is to transferer first. We'll implement b2_copy_part support in raw_api as well as in the simulator. Some tests can maybe be added before the feature is actually implemented.
The part of the code which decides how to split a file into parts is going to have a simple implementation - it is a very hard problem to optimize and we are not going to spend a lot of effort on the strategy at this point. This may cause some copy operations which could be completed with just upload+copy to not be properly recognized, so if the forbid_downloads flag is set to True, we may return failure because of an imperfect strategy. This is a known limitation of the design, and we will be open to pull requests from anyone wishing to optimize this further (as long as it doesn't degrade the performance of the naive approach too much).
In Transferer we already delegate some functionality away to Downloaders. This is nice because the Transferer class itself is smaller. Here the functionality will be moved to a new class, called Emerger. Emerger, in its main operation, will need to accept an iterable of RangeToEmerge objects, which may be cloud ranges, local ranges, or unfinished parts, and those can be mixed. Sometimes a file will be present both in the cloud and on the local filesystem, and Emerger will decide whether to use the local part or download it. A range may be present in the cloud in multiple objects at different locations, and maybe the same applies to the local side too, but optimization of local reading to maximize streaming is not the goal at this point. Therefore the interface will NOT allow the user to provide multiple locations of a range of the same type (cloud, local, unfinished part). Our API interface policy allows us to easily add such support in the future without breaking interface compatibility for existing users, so we'll match the interface with the current implementation plan and will change the interface if it's ever required.
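A hypothetical sketch of that input type, to make the design concrete (field names are guesses for illustration, not the sdk's actual interface):

from dataclasses import dataclass
from typing import Optional

@dataclass
class RangeToEmerge:
    offset: int                           # position of this range in the emerged file
    length: int                           # number of bytes in the range
    cloud_file_id: Optional[str] = None   # set if the bytes already live in B2
    local_path: Optional[str] = None      # set if the bytes live on local disk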
Transferer will provide a few wrappers for Emerger, allowing the user to easily upload a file or perform a server-side copy.
Both of those operations may break an operation into smaller ones. There are no plans to allow forcing a file to be treated as small or large, for upload or for server-side copy. If someone needed this (though I can't imagine why), the function would be easy to add by swapping out Emerger for another implementation (so one would have to pass it to an explicitly constructed Transferer, which would be passed to B2Api).
When a piece of a file needs to be downloaded from the cloud so that we can upload it again (the simplest case being a "copy" request of two remote-only ranges, at least one of which is less than 5MB), we will NOT save it into temporary storage on the local drive, but will pipe it directly through to the uploading thread. Local storage would be tricky to configure and may not be expected by the user, may cause security issues, etc., but streaming also has its issues - if downloading is faster than uploading, we may have a memory utilization problem. Therefore care will need to be taken to avoid exploding memory usage in the case of asymmetric network performance.
This issue is now open for comments on the design, so that we can improve the concept before a significant amount of work is put into implementation.
When the source is B2 and the file is hidden, the file on the destination won't be deleted, even with KeepOrDeleteMode.DELETE.
We should discuss whether this is expected behavior, or whether the file should be deleted when the destination file is local.
When the destination is B2 (bucket-to-bucket sync of the latest versions, implemented in #165), then we may want to hide that file instead. It may require synchronizing not only the latest versions, as described in #166.
Running tests on a clean checkout on OSX fails:
running nosetests
running egg_info
writing b2sdk.egg-info/PKG-INFO
writing dependency_links to b2sdk.egg-info/dependency_links.txt
writing requirements to b2sdk.egg-info/requires.txt
writing top-level names to b2sdk.egg-info/top_level.txt
reading manifest file 'b2sdk.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'b2sdk.egg-info/SOURCES.txt'
/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/config.py:430: DeprecationWarning: Use of multiple -w arguments is deprecated and support may be removed in a future release. You can get the same behavior by passing directories without the -w argument on the command line, or by using the --tests argument in a configuration file.
warn("Use of multiple -w arguments is deprecated and "
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 263, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/dave/workspace/b2-sdk-python/setup.py", line 52, in <module>
setup(
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/setuptools/__init__.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/commands.py", line 158, in run
TestProgram(argv=argv, config=self.__config)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/core.py", line 118, in __init__
unittest.TestProgram.__init__(
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/unittest/main.py", line 100, in __init__
self.parseArgs(argv)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/core.py", line 145, in parseArgs
self.config.configure(argv, doc=self.usage())
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/config.py", line 346, in configure
self.plugins.configure(options, self)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/manager.py", line 284, in configure
cfg(options, config)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/manager.py", line 99, in __call__
return self.call(*arg, **kw)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/manager.py", line 167, in simple
result = meth(*arg, **kw)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/multiprocess.py", line 239, in configure
_import_mp()
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/multiprocess.py", line 150, in _import_mp
m = Manager()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/managers.py", line 579, in start
self._process.start()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "setup.py", line 52, in <module>
setup(
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/setuptools/__init__.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/commands.py", line 158, in run
TestProgram(argv=argv, config=self.__config)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/core.py", line 118, in __init__
unittest.TestProgram.__init__(
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/unittest/main.py", line 100, in __init__
self.parseArgs(argv)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/core.py", line 145, in parseArgs
self.config.configure(argv, doc=self.usage())
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/config.py", line 346, in configure
self.plugins.configure(options, self)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/manager.py", line 284, in configure
cfg(options, config)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/manager.py", line 99, in __call__
return self.call(*arg, **kw)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/manager.py", line 167, in simple
result = meth(*arg, **kw)
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/multiprocess.py", line 239, in configure
_import_mp()
File "/Users/dave/config/repos/pyenv/versions/b2-sdk/lib/python3.8/site-packages/nose/plugins/multiprocess.py", line 150, in _import_mp
m = Manager()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/managers.py", line 583, in start
self._address = reader.recv()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/Users/dave/config/repos/pyenv/versions/3.8.2/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
make: *** [test] Error 1
How do I use this SDK in Python? Does anyone have a demo?
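A minimal quickstart sketch, assembled from the v1 calls that appear elsewhere in these reports (credentials and bucket name are placeholders):

from b2sdk.v1 import InMemoryAccountInfo, B2Api

info = InMemoryAccountInfo()
api = B2Api(info)
api.authorize_account('production', '<application_key_id>', '<application_key>')
bucket = api.get_bucket_by_name('<bucket_name>')
bucket.upload_bytes(b'hello world', 'hello.txt')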
I'm running duplicity 0.7.19 on Centos 7, backing up to a Backblaze B2 account. I periodically get an error:
Attempt [x] failed. AttributeError: 'module' object has no attribute 'packages'
When I do, the progress information becomes nonsense. Here's an excerpt from my logs:
Jan 29 14:18:14 fafnir backups: Getting delta of (duplicati-bb8434f1fa55a4cacac317a30d3c9d1ca.dblock.zip.aes reg) and None
Jan 29 14:18:14 fafnir backups: A duplicati-bb8434f1fa55a4cacac317a30d3c9d1ca.dblock.zip.aes
Jan 29 14:18:22 fafnir backups: AsyncScheduler: running task synchronously (asynchronicity disabled)
Jan 29 14:18:22 fafnir backups: Writing duplicity-full.20200123T060005Z.vol362.difftar.gpg
Jan 29 14:18:22 fafnir backups: Put: /tmp/duplicity-B8z5vS-tempdir/mktemp-HfU6lV-7 -> [folder redacted]/duplicity-full.20200123T060005Z.vol362.difftar.gpg
Jan 29 14:18:25 fafnir backups: Backtrace of previous error: Traceback (innermost last):
Jan 29 14:18:25 fafnir backups: File "/usr/lib64/python2.7/site-packages/duplicity/backend.py", line 369, in inner_retry
Jan 29 14:18:25 fafnir backups: return fn(self, *args)
Jan 29 14:18:25 fafnir backups: File "/usr/lib64/python2.7/site-packages/duplicity/backend.py", line 529, in put
Jan 29 14:18:25 fafnir backups: self.__do_put(source_path, remote_filename)
Jan 29 14:18:25 fafnir backups: File "/usr/lib64/python2.7/site-packages/duplicity/backend.py", line 515, in __do_put
Jan 29 14:18:25 fafnir backups: self.backend._put(source_path, remote_filename)
Jan 29 14:18:25 fafnir backups: File "/usr/lib64/python2.7/site-packages/duplicity/backends/b2backend.py", line 121, in _put
Jan 29 14:18:25 fafnir backups: progress_listener=progress_listener_factory())
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
Jan 29 14:18:25 fafnir backups: return function(*wrapee_args, **wrapee_kwargs)
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/bucket.py", line 537, in upload_local_file
Jan 29 14:18:25 fafnir backups: progress_listener=progress_listener
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
Jan 29 14:18:25 fafnir backups: return function(*wrapee_args, **wrapee_kwargs)
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/bucket.py", line 593, in upload
Jan 29 14:18:25 fafnir backups: upload_source, file_name, content_type, file_info, progress_listener
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/bucket.py", line 677, in _upload_large_file
Jan 29 14:18:25 fafnir backups: part_sha1_array = [interruptible_get_result(f)['contentSha1'] for f in part_futures]
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/utils.py", line 41, in interruptible_get_result
Jan 29 14:18:25 fafnir backups: return future.result(timeout=1.0)
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 429, in result
Jan 29 14:18:25 fafnir backups: return self.__get_result()
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/concurrent/futures/thread.py", line 62, in run
Jan 29 14:18:25 fafnir backups: result = self.fn(*self.args, **self.kwargs)
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/bucket.py", line 765, in _upload_part
Jan 29 14:18:25 fafnir backups: HEX_DIGITS_AT_END, hashing_stream
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/raw_api.py", line 545, in upload_part
Jan 29 14:18:25 fafnir backups: return self.b2_http.post_content_return_json(upload_url, headers, data_stream)
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/b2http.py", line 297, in post_content_return_json
Jan 29 14:18:25 fafnir backups: response = _translate_and_retry(do_post, try_count, post_params)
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/b2http.py", line 119, in _translate_and_retry
Jan 29 14:18:25 fafnir backups: return _translate_errors(fcn, post_params)
Jan 29 14:18:25 fafnir backups: File "/usr/lib/python2.7/site-packages/b2sdk/b2http.py", line 69, in _translate_errors
Jan 29 14:18:25 fafnir backups: if isinstance(e1, requests.packages.urllib3.exceptions.MaxRetryError):
Jan 29 14:18:25 fafnir backups: AttributeError: 'module' object has no attribute 'packages'
Jan 29 14:18:55 fafnir backups: Writing duplicity-full.20200123T060005Z.vol362.difftar.gpg
Jan 29 14:18:55 fafnir backups: Put: /tmp/duplicity-B8z5vS-tempdir/mktemp-HfU6lV-7 -> [folder redacted]/duplicity-full.20200123T060005Z.vol362.difftar.gpg
Hi,
I've noticed a lot of the file operations in the SDK take the ID of the file in order to work. Unfortunately, I am only keeping track of the file names locally. How would I go about getting the ID of a file given its name?
Thanks in advance.
Hi there.
I'm a new B2 user. I've got files stored in a private B2 bucket, and I'd like to be able to generate a download token with a set expiration time that I can give to users so they can temporarily download files from my private B2 bucket.
S3 supports this functionality, and since B2 offers an S3-compatible API, I believe this should be possible.
Unfortunately, when I try to use the official boto3 library to access B2 and generate a presigned URL for a B2 item, I run into issues. For example:
url = s3.generate_presigned_url(
    'get_object',
    Params={
        'Bucket': 'BUCKET',
        'Key': 'FILENAME',
    },
    ExpiresIn=60 * 60 * 24 * 7,  # 7 days in seconds
)
print(url)
The resulting URL I get back from this call to B2 looks something like this:
https://s3.us-west-002.backblazeb2.com/BUCKET/FILE?AWSAccessKeyId=XXX&Signature=gEPp9je72g01htiu5VZJPMZA344%3D&Expires=1591850644
Unfortunately, when I go to visit this URL, I get a B2 authorization error:
<Error>
<Code>UnauthorizedAccess</Code>
<Message>bucket is not authorized: BUCKET</Message>
</Error>
This led me to try to use this native B2 library to accomplish the same feat instead, but unfortunately, it doesn't appear there is a way to make this work using this library.
It'd be cool to get some support added for generating a presigned file download URL =)
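For what it's worth, B2's native equivalent appears to be download authorization tokens; a hedged sketch (method names as found in b2sdk, so verify them against your version; bucket and api come from an authorized B2Api, as in the snippets above):

# Issue a token that permits downloads of matching file names for 7 days,
# then append it to the native download URL as the Authorization parameter.
token = bucket.get_download_authorization(
    file_name_prefix='FILENAME',
    valid_duration_in_seconds=60 * 60 * 24 * 7,
)
base_url = api.get_download_url_for_file_name(bucket.name, 'FILENAME')
url = base_url + '?Authorization=' + token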