kagglehub's People

Contributors

jeward414, jplotts, lucyhe, mayankmalik-colab, mohami2000, neshdev, philmod, rosbo, wcuk

kagglehub's Issues

download data w/o auth!

I posted about this a while ago, here.

In short, we can do the following without any hassle, since these are public resources. Can we do the same with kagglehub?

import keras

# Download and extract a public archive by URL; no authentication required.
images = keras.utils.get_file(
    origin="https://huggingface.co/datasets/images.tar.gz",
    untar=True,
)

or

from huggingface_hub import hf_hub_download

# Download a single file from a public dataset repo; no authentication required.
hf_dataset_identifier = "{user_id}/data_id"
filename = "dataset.zip"
file_path = hf_hub_download(
    repo_id=hf_dataset_identifier,
    filename=filename,
    repo_type="dataset",
)
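
For comparison, here is a minimal sketch of what the equivalent might look like with kagglehub, assuming the installed version exposes kagglehub.dataset_download and the dataset is public:

import kagglehub

# Sketch only: download a public dataset by its "{user_id}/data_id" handle.
# Assumes dataset_download is available in the installed kagglehub version
# and that no credentials are needed for public data.
dataset_path = kagglehub.dataset_download("{user_id}/data_id")
print(dataset_path)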

Support for Proxy Configuration in kagglehub

I noticed that the Kaggle API library supports the use of a proxy specified in the kaggle.json file. Since kagglehub also reads its configuration from the same file, I would like to check whether there are any plans to support proxy configuration in kagglehub as well.

If so, I am willing to implement this feature myself. If the kagglehub maintainers are open to this addition, I would appreciate it if someone could review my pull request once it is submitted.
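
In the meantime, a minimal workaround sketch (an assumption, not kagglehub's documented API): kagglehub issues its HTTP calls through the requests library, which honors the standard HTTP_PROXY/HTTPS_PROXY environment variables, so a proxy can be set in the environment before kagglehub is used:

import os

# Workaround sketch: requests (used by kagglehub's HTTP client) picks up the
# standard proxy environment variables automatically. The proxy URL below is
# a hypothetical example.
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"

import kagglehub  # subsequent kagglehub downloads/uploads should go through the proxy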

Unable to upload repos with more than 25 files

Hello, I am trying to upload a model that has more than 25 files and am hitting this error:

Traceback (most recent call last):
  File "/fsx/lewis/git/hf/h4/scripts/deployment/aimo/upload_model.py", line 77, in <module>
    main()
  File "/fsx/lewis/git/hf/h4/scripts/deployment/aimo/upload_model.py", line 68, in main
    kagglehub.model_upload(kaggle_handle, local_dir)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models.py", line 53, in model_upload
    create_model_instance_or_version(h, tokens, license_name, version_notes)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 67, in create_model_instance_or_version
    raise (e)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 61, in create_model_instance_or_version
    _create_model_instance(model_handle, files, license_name)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 34, in _create_model_instance
    api_client.post(f"/models/{model_handle.owner}/{model_handle.model}/create/instance", data)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/clients.py", line 122, in post
    process_post_response(response_dict)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/exceptions.py", line 108, in process_post_response
    raise BackendError(response["error"], error_code)
kagglehub.exceptions.BackendError: The file count exceeds the maximum of 25

It would be nice if this constraint could be relaxed, since I typically shard my models into smaller files to speed up model loading in Kaggle notebooks.
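
A possible interim workaround sketch, assuming the checkpoint is a Transformers model sharded into many small safetensors files: re-save it with a larger max_shard_size so the upload directory stays under the 25-file limit.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only (not an official kagglehub recommendation): reload the sharded
# checkpoint and re-save it with bigger shards to reduce the file count.
model = AutoModelForCausalLM.from_pretrained("path/to/sharded-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("path/to/sharded-checkpoint")

model.save_pretrained("path/to/resharded-checkpoint", max_shard_size="10GB")
tokenizer.save_pretrained("path/to/resharded-checkpoint")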

The service storage has thrown an exception

I am getting an error when pushing a 7B model to Kaggle Hub that wasn't happening previously. I am using kagglehub==0.2.4, and my script to reproduce the error is:

import argparse
from pathlib import Path

import kagglehub
from huggingface_hub import snapshot_download


"""
Script to upload a Transformers model from the Hugging Face Hub to Kaggle.

To push to your Kaggle account, generate a `kaggle.json` file with your Kaggle API credentials and store it in ~/.kaggle/kaggle.json
See: https://github.com/Kaggle/kagglehub?tab=readme-ov-file#authenticate

Usage:

python upload_model.py \
    --model_id ORG/MODEL_ID \
    --revision REV
"""


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_id", type=str, help="Name of repository on the Hub in '<ORG>/<NAME>' format.")
    parser.add_argument(
        "--revision", type=str, default="main", help="Name of branch in repository to save experiments."
    )
    parser.add_argument(
        "--kaggle_handle",
        type=str,
        default=None,
        help="Kaggle handle to upload the model to. Should be in the format <KAGGLE_USERNAME>/<MODEL>/<FRAMEWORK>/<VARIATION>. Defaults to <KAGGLE_USERNAME>/{model_id}/transformers/{revision}",
    )
    args = parser.parse_args()

    # Download repo
    model_name = args.model_id.split("/")[-1]
    local_dir = Path(f"data/{model_name}-{args.revision}")
    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(
        repo_id=args.model_id,
        revision=args.revision,
        local_dir=local_dir,
        ignore_patterns=[
            "checkpoint-*",
            "pytorch_model*",
            ".git*",
            "*_results.json",
            "*.bin",
            "trainer_*",
        ],  # Kaggle doesn't allow uploads >25 files, so we need to heavily filter: https://github.com/Kaggle/kagglehub/issues/116
    )

    # If no handle is provided, default to <KAGGLE_USERNAME>/{model_name}/transformers/{revision}
    if args.kaggle_handle is None:
        kaggle_username = kagglehub.config.get_kaggle_credentials().username
        kaggle_handle = f"{kaggle_username}/{model_name}/transformers/{args.revision}"
    else:
        kaggle_handle = args.kaggle_handle

    print(f"Pushing to Kaggle Hub with handle {kaggle_handle} ...")
    kagglehub.model_upload(kaggle_handle, local_dir)

    print("Done!")

Here is the full stack trace:

Traceback (most recent call last):
  File "/fsx/lewis/git/hf/h4/scripts/deployment/aimo/upload_model.py", line 84, in <module>
    main()
  File "/fsx/lewis/git/hf/h4/scripts/deployment/aimo/upload_model.py", line 75, in main
    kagglehub.model_upload(kaggle_handle, local_dir)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models.py", line 53, in model_upload
    create_model_instance_or_version(h, tokens, license_name, version_notes)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 67, in create_model_instance_or_version
    raise (e)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 61, in create_model_instance_or_version
    _create_model_instance(model_handle, files, license_name)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/models_helpers.py", line 34, in _create_model_instance
    api_client.post(f"/models/{model_handle.owner}/{model_handle.model}/create/instance", data)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/clients.py", line 122, in post
    process_post_response(response_dict)
  File "/fsx/lewis/miniconda3/envs/h4/lib/python3.10/site-packages/kagglehub/exceptions.py", line 115, in process_post_response
    raise BackendError(response["error"], error_code)
kagglehub.exceptions.BackendError: The service storage has thrown an exception. HttpStatusCode is NotFound. No such object: kaggle-models-data/inbox/1002070/534392bb61a8dd2cdbab22e064199089/generation_config.json.lock

Upload for private model yields "No instances available"

Hello, thank you for making this utility lib for Kaggle!

I'm trying to upload a local model to Kaggle Hub and want it to be private. However, after following the instructions in the README, I am not able to import the model into a Kaggle notebook; instead I see a "No instances available" message on the notebook's Add Input tab:

[Screenshot from 2024-04-16 21:51:49 showing the "No instances available" message]

Steps to reproduce:

import kagglehub

handle = "lewtun/mistral-7b-sft/pyTorch/v1"
local_files = "./mistral-7b-sft/" # Just a fine-tuned Mistral 7B
kagglehub.model_upload(handle, local_files)

Is it possible that something is wrong with the variation being set during the upload? Thanks!

New logging forces user to have a home folder

This was not the case with 0.2.5.

To reproduce:

Dockerfile:

FROM python:3.12.3-slim

ARG UID=10001
RUN adduser \
    --disabled-password \
    --gecos "" \
    --shell "/sbin/nologin" \
    --no-create-home \
    --uid "${UID}" \
    appuser

RUN pip install kagglehub==0.2.6
USER $UID

ENTRYPOINT [ "python", "-c", "import kagglehub" ]
docker build -t kagglehub . && docker run --rm kagglehub
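
A possible workaround sketch, assuming the failure comes from kagglehub's logging setup trying to write under the missing home directory: point HOME at a writable location such as /tmp before importing kagglehub.

import os

# Hypothetical workaround (untested assumption): give the process a writable
# "home" so kagglehub's logging setup has somewhere to create its files.
os.environ["HOME"] = "/tmp"

import kagglehub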
