kaggle / kagglehub
Python library to access Kaggle resources
License: Apache License 2.0
Importing kagglehub 0.2.6 as a non-root user without a home directory fails. This was not the case with 0.2.5.
To reproduce:
Dockerfile:
FROM python:3.12.3-slim
ARG UID=10001
RUN adduser \
--disabled-password \
--gecos "" \
--shell "/sbin/nologin" \
--no-create-home \
--uid "${UID}" \
appuser
RUN pip install kagglehub==0.2.6
USER $UID
ENTRYPOINT [ "python", "-c", "import kagglehub" ]
docker build -t kagglehub . && docker run --rm kagglehub
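One plausible cause, and this is an assumption the report does not confirm, is that kagglehub tries to create a directory under the user's home at import time, which cannot work for a user created with --no-create-home. A minimal diagnostic sketch you could run inside the container; the helpers is_writable_dir and first_writable_dir are hypothetical, and the candidate directories are illustrative rather than kagglehub's confirmed paths.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def is_writable_dir(path: Path) -> bool:
    """Return True if `path` exists (or can be created) and a file can be written into it."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        # Creating and deleting a temporary file is the most direct writability test.
        with tempfile.NamedTemporaryFile(dir=str(path)):
            pass
        return True
    except OSError:  # includes PermissionError and NotADirectoryError
        return False


def first_writable_dir(candidates: List[Path]) -> Optional[Path]:
    """Return the first candidate directory that is writable, or None."""
    for candidate in candidates:
        if is_writable_dir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Illustrative candidates: a cache folder under HOME, then one under the system temp dir.
    home = Path(os.path.expanduser("~"))
    candidates = [home / ".cache" / "kagglehub", Path(tempfile.gettempdir()) / "kagglehub"]
    print("HOME =", home)
    print("first writable candidate:", first_writable_dir(candidates))
```

If HOME turns out not to be writable, two things worth trying in the Dockerfile are dropping --no-create-home or pointing HOME at a writable directory (for example ENV HOME=/tmp); whether either resolves the import error depends on the actual cause.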
I posted about this a while ago, here.
In short, for public resources we can already do the following without hassle. Can we do the same with kagglehub?
import keras

images = keras.utils.get_file(
    origin="https://huggingface.co/datasets/images.tar.gz",
    untar=True,
)
or
from huggingface_hub import hf_hub_download

hf_dataset_identifier = "{user_id}/data_id"
filename = "dataset.zip"
file_path = hf_hub_download(
    repo_id=hf_dataset_identifier,
    filename=filename,
    repo_type="dataset",
)
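kagglehub addresses resources by handle rather than by URL. A minimal sketch of the equivalent call, assuming a kagglehub release that provides kagglehub.dataset_download (it may not exist in 0.2.5) and that the dataset handle has the shape owner/dataset or owner/dataset/versions/N; parse_dataset_handle and download_public_dataset are hypothetical helpers.

```python
from typing import NamedTuple, Optional


class DatasetHandle(NamedTuple):
    owner: str
    dataset: str
    version: Optional[int] = None


def parse_dataset_handle(handle: str) -> DatasetHandle:
    """Parse 'owner/dataset' or 'owner/dataset/versions/N' (assumed handle shapes)."""
    parts = handle.strip("/").split("/")
    if len(parts) == 2:
        return DatasetHandle(parts[0], parts[1])
    if len(parts) == 4 and parts[2] == "versions" and parts[3].isdigit():
        return DatasetHandle(parts[0], parts[1], int(parts[3]))
    raise ValueError(f"Invalid dataset handle: {handle!r}")


def download_public_dataset(handle: str) -> str:
    """Download a dataset with kagglehub and return the local directory path."""
    parse_dataset_handle(handle)  # fail fast on malformed handles
    # Imported lazily so the parser above can be used without kagglehub installed.
    import kagglehub

    return kagglehub.dataset_download(handle)
```

The returned directory plays the same role as the path returned by keras.utils.get_file or hf_hub_download; whether credentials are required for public datasets may depend on the kagglehub version.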
kagglehub version 0.2.5
kagglehub does not seem to be able to upload subfolders linked with 'link -s', whereas the Kaggle API can upload them for datasets.
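Assuming 'link -s' refers to symbolic links created with ln -s, one common way such folders go missing is a directory walk that does not follow symlinks: Python's os.walk only descends into symlinked directories when followlinks=True. This is not a claim about how kagglehub collects files; it is a hypothetical helper, list_upload_files, for checking which files a walk would see with and without following links.

```python
import os
from pathlib import Path
from typing import List


def list_upload_files(root: str, follow_symlinks: bool = True) -> List[str]:
    """Return every file under `root` as a sorted list of POSIX-style relative paths.

    os.walk() skips the contents of symlinked directories unless followlinks=True.
    """
    files = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        for name in filenames:
            rel = Path(dirpath, name).relative_to(root)
            files.append(rel.as_posix())
    return sorted(files)
```

Comparing list_upload_files(folder) with list_upload_files(folder, follow_symlinks=False) shows exactly which files live behind symlinked subfolders.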
Hello, thank you for making this utility lib for Kaggle!
I'm trying to upload a local model to Kaggle Hub and want it to be private. However, after following the instructions in the README, I am not able to import the model into a Kaggle notebook and instead see "No instances available" in the notebook's Add Input tab.
Steps to reproduce:
import kagglehub
handle = "lewtun/mistral-7b-sft/pyTorch/v1"
local_files = "./mistral-7b-sft/" # Just a fine-tuned Mistral 7B
kagglehub.model_upload(handle, local_files)
Is it possible that something is wrong with the variation being set during the upload? Thanks!
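A model handle encodes the owner, model, framework and variation, in the format <KAGGLE_USERNAME>/<MODEL>/<FRAMEWORK>/<VARIATION> used elsewhere on this page. A quick sanity check is to split the handle and confirm which framework and variation the upload targets. A minimal sketch; parse_model_handle is hypothetical, and the optional trailing version number is an assumption. It does not diagnose the "No instances available" problem itself.

```python
from typing import NamedTuple, Optional


class ModelHandle(NamedTuple):
    owner: str
    model: str
    framework: str
    variation: str
    version: Optional[int] = None


def parse_model_handle(handle: str) -> ModelHandle:
    """Split '<owner>/<model>/<framework>/<variation>[/<version>]' into its parts."""
    parts = handle.strip("/").split("/")
    if len(parts) == 4:
        return ModelHandle(*parts)
    if len(parts) == 5 and parts[4].isdigit():
        return ModelHandle(*parts[:4], version=int(parts[4]))
    raise ValueError(f"Expected <owner>/<model>/<framework>/<variation>, got {handle!r}")


print(parse_model_handle("lewtun/mistral-7b-sft/pyTorch/v1"))
# ModelHandle(owner='lewtun', model='mistral-7b-sft', framework='pyTorch', variation='v1', version=None)
```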
I am getting an error when pushing a 7B model to the Kaggle Hub that wasn't happening previously. I am using kagglehub==0.2.4,
and my script to reproduce the error is:
r"""
Script to upload a Transformers model from the Hugging Face Hub to Kaggle.
To push to your Kaggle account, generate a `kaggle.json` file with your Kaggle API credentials and store it in ~/.kaggle/kaggle.json
See: https://github.com/Kaggle/kagglehub?tab=readme-ov-file#authenticate
Usage:
    python upload_model.py \
        --model_id ORG/MODEL_ID \
        --revision REV
"""
import argparse
from pathlib import Path

import kagglehub
from huggingface_hub import snapshot_download


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_id", type=str, help="Name of repository on the Hub in '<ORG>/<NAME>' format.")
    parser.add_argument(
        "--revision", type=str, default="main", help="Name of branch in repository to save experiments."
    )
    parser.add_argument(
        "--kaggle_handle",
        type=str,
        default=None,
        help="Kaggle handle to upload the model to. Should be in the format <KAGGLE_USERNAME>/<MODEL>/<FRAMEWORK>/<VARIATION>. Defaults to <KAGGLE_USERNAME>/<MODEL_NAME>/transformers/<REVISION>",
    )
    args = parser.parse_args()

    # Download repo
    model_name = args.model_id.split("/")[-1]
    local_dir = Path(f"data/{model_name}-{args.revision}")
    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(
        repo_id=args.model_id,
        revision=args.revision,
        local_dir=local_dir,
        ignore_patterns=[
            "checkpoint-*",
            "pytorch_model*",
            ".git*",
            "*_results.json",
            "*.bin",
            "trainer_*",
        ],  # Kaggle doesn't allow uploads >25 files, so we need to heavily filter: https://github.com/Kaggle/kagglehub/issues/116
    )

    # If no handle is provided, default to <KAGGLE_USERNAME>/{model_name}/transformers/{revision}
    if args.kaggle_handle is None:
        kaggle_username = kagglehub.config.get_kaggle_credentials().username
        kaggle_handle = f"{kaggle_username}/{model_name}/transformers/{args.revision}"
    else:
        kaggle_handle = args.kaggle_handle
    print(f"Pushing to Kaggle Hub with handle {kaggle_handle} ...")
    kagglehub.model_upload(kaggle_handle, local_dir)
    print("Done!")


if __name__ == "__main__":
    main()
Here is the full stack trace:
Traceback (most recent call last):
File "/fsx/lewis/git/hf/h4/scripts/deployment/aimo/upload_model.py", line 84, in <module>
main()
File "/fsx/lewis/git/hf/h4/scripts/deployment/aimo/upload_model.py", line 75, in main
kagglehub.model_upload(kaggle_handle, local_dir)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models.py", line 53, in model_upload
create_model_instance_or_version(h, tokens, license_name, version_notes)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 67, in create_model_instance_or_version
raise (e)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 61, in create_model_instance_or_version
_create_model_instance(model_handle, files, license_name)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 34, in _create_model_instance
api_client.post(f"/models/{model_handle.owner}/{model_handle.model}/create/instance", data)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/clients.py", line 122, in post
process_post_response(response_dict)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/exceptions.py", line 115, in process_post_response
raise BackendError(response["error"], error_code)
kagglehub.exceptions.BackendError: The service storage has thrown an exception. HttpStatusCode is NotFound. No such object: kaggle-models-data/inbox/1002070/534392bb61a8dd2cdbab22e064199089/generation_config.json.lock
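The missing object in this error is a .lock file (generation_config.json.lock), which suggests a transient lock file left over from the download was included in the upload. Assumption: these lock files come from huggingface_hub's download bookkeeping inside local_dir (recent versions keep it in a .cache/huggingface subfolder). A minimal sketch that copies only the real model files into a clean staging directory; stage_for_upload and its exclusion rules are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical exclusion rules: lock files anywhere, plus any ".cache" metadata folder.
EXCLUDED_SUFFIXES = (".lock",)
EXCLUDED_DIRS = {".cache"}


def stage_for_upload(src: Path, dst: Path) -> List[str]:
    """Copy files from src into dst, skipping lock files and cache folders.

    Returns the copied files as sorted POSIX-style relative paths.
    """
    copied = []
    for path in sorted(src.rglob("*")):
        rel = path.relative_to(src)
        if any(part in EXCLUDED_DIRS for part in rel.parts):
            continue
        if path.is_dir() or path.name.endswith(EXCLUDED_SUFFIXES):
            continue
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(rel.as_posix())
    return copied
```

The staging directory, rather than local_dir, would then be passed to kagglehub.model_upload.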
I noticed that the KaggleAPI library supports the use of a proxy by specifying it in the kaggle.json file. Since kagglehub also reads the configuration from the same file, I would like to check if there are any plans to support proxy configuration in kagglehub as well.
If so, I am willing to implement this feature myself. If the kagglehub maintainers are open to this addition, I would appreciate it if someone could review my pull request once it is submitted.
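A minimal sketch of how such a setting could be read, assuming the proxy URL sits under a top-level "proxy" key in kaggle.json and that kagglehub's HTTP calls go through requests, whose proxies argument maps a URL scheme to a proxy URL. The helper proxies_from_kaggle_json is hypothetical.

```python
import json
from typing import Dict, Optional


def proxies_from_kaggle_json(text: str) -> Optional[Dict[str, str]]:
    """Return a requests-style proxies mapping built from kaggle.json content, or None.

    ASSUMPTION: the proxy URL is stored under a top-level "proxy" key.
    """
    config = json.loads(text)
    proxy = config.get("proxy")
    if not proxy:
        return None
    # requests expects {"http": <proxy url>, "https": <proxy url>}.
    return {"http": proxy, "https": proxy}
```

The result could be passed as requests.get(url, proxies=...). Independently of any kaggle.json support, requests already honors the standard HTTP_PROXY and HTTPS_PROXY environment variables by default, which may serve as a stopgap if kagglehub uses requests with default settings.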
Hello, I am trying to upload a model that has more than 25 files and am hitting this error:
Traceback (most recent call last):
File "/fsx/lewis/git/hf/h4/scripts/deployment/aimo/upload_model.py", line 77, in <module>
main()
File "/fsx/lewis/git/hf/h4/scripts/deployment/aimo/upload_model.py", line 68, in main
kagglehub.model_upload(kaggle_handle, local_dir)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models.py", line 53, in model_upload
create_model_instance_or_version(h, tokens, license_name, version_notes)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 67, in create_model_instance_or_version
raise (e)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 61, in create_model_instance_or_version
_create_model_instance(model_handle, files, license_name)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 34, in _create_model_instance
api_client.post(f"/models/{model_handle.owner}/{model_handle.model}/create/instance", data)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/clients.py", line 122, in post
process_post_response(response_dict)
File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/exceptions.py", line 108, in process_post_response
raise BackendError(response["error"], error_code)
kagglehub.exceptions.BackendError: The file count exceeds the maximum of 25
It would be nice if this constraint could be relaxed, since I typically shard my models into smaller files to speed up model loading in Kaggle notebooks.
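Until the limit changes, a client-side pre-flight check can catch the problem before the upload starts instead of failing on the backend. A minimal sketch; files_to_upload and check_file_count are hypothetical helpers, the patterns mirror the ignore_patterns list in the upload script earlier on this page, and the shell-style matching uses Python's fnmatch (similar in spirit to, but not guaranteed identical to, snapshot_download's own filtering).

```python
from fnmatch import fnmatch
from typing import Iterable, List

MAX_FILES = 25  # limit reported by the BackendError above


def files_to_upload(filenames: Iterable[str], ignore_patterns: Iterable[str]) -> List[str]:
    """Drop every filename that matches any shell-style ignore pattern."""
    patterns = list(ignore_patterns)
    return [name for name in filenames if not any(fnmatch(name, p) for p in patterns)]


def check_file_count(filenames: List[str], limit: int = MAX_FILES) -> None:
    """Raise locally if the upload would exceed the backend's file limit."""
    if len(filenames) > limit:
        raise ValueError(
            f"{len(filenames)} files exceed the limit of {limit}; use fewer shards or filter more files"
        )
```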