tinyllm's Issues

vLLM Pascal architecture fix no longer works

Hello, what a great project! Unfortunately, the vLLM fix for the Pascal architecture no longer works on the main branch.
vLLM changed the way it checks for compute capability, but I was unable to find how it's done in the current version.
Would you be able to refactor the Pascal patch so that TinyLLM works with recent versions of vLLM? Much obliged!
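
For context, a compute-capability gate usually compares the device's (major, minor) pair against a minimum. The sketch below is illustrative only: `meets_minimum` and the `(7, 0)` threshold are assumptions for the example, not vLLM's actual code, and the real check may live somewhere quite different. Pascal cards report 6.x (for example 6.1 on a GTX 1080); on a machine with PyTorch installed, the pair could come from `torch.cuda.get_device_capability()`.

```python
from typing import Tuple

Capability = Tuple[int, int]

# Assumed threshold for illustration; verify against the vLLM version in use.
ASSUMED_MINIMUM: Capability = (7, 0)


def meets_minimum(device: Capability, minimum: Capability = ASSUMED_MINIMUM) -> bool:
    """Return True when the device capability is at least the minimum."""
    # Tuples compare element by element, so (6, 1) < (7, 0) < (7, 5).
    return device >= minimum


pascal = (6, 1)  # e.g. GTX 1080
turing = (7, 5)  # e.g. T4

print(meets_minimum(pascal))                  # Pascal against the assumed minimum
print(meets_minimum(turing))                  # Turing against the assumed minimum
print(meets_minimum(pascal, minimum=(6, 0)))  # what a Pascal patch would need to allow
```

A patch for Pascal would, in effect, need to lower whatever minimum the current vLLM build enforces to 6.0.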

Ollama GPU support on Apple Silicon

When running Ollama via Docker as described in Option 1 on Apple Silicon with the --gpus=all flag, the container cannot use the GPU: Apple Silicon does not use NVIDIA GPUs, and Docker Desktop does not expose Apple's own GPU. Users may receive the following error message:

docker: Error response from daemon: could not select device driver "" with capabilities: [[GPU]].

I'd like to submit a PR to the README with the following guidance, if that works for you:

**Apple Silicon GPU Support**:

Apple Silicon GPUs are programmed through Apple's Metal API (including Metal Performance Shaders), which is not as widely supported as NVIDIA's CUDA API. As a result, Docker, which is commonly used to run applications in containers, cannot detect or use the Apple Silicon GPU.

**Docker Limitations**:
When running Ollama in Docker on an Apple Silicon Mac, the GPU is not detected, and the system falls back to using the CPU. This is because Docker images are typically configured to use NVIDIA GPU libraries, which are not compatible with Apple Silicon GPUs.

**Native Execution**:
Running Ollama natively on macOS, without Docker, can enable GPU acceleration. This approach leverages the Metal API directly, allowing better utilization of the Apple Silicon GPU.

**Model Size and Memory Constraints**:
Large models may not fit within the GPU memory available on Apple Silicon Macs, leading to fallback to CPU usage. For efficient performance, use models that fit within the memory accessible to the GPU (approximately 10.5GB for a 16GB RAM system).
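
As a rough illustration of the sizing rule above, the following sketch scales the one data point given (about 10.5GB usable out of 16GB) to other RAM sizes. The linear scaling and the names `gpu_budget_gb` and `fits_on_gpu` are assumptions for the example; the actual limit is set by macOS and may differ by machine and OS version.

```python
# Fraction of system RAM assumed to be usable by the GPU,
# derived from the single 10.5GB / 16GB figure quoted above.
ASSUMED_GPU_FRACTION = 10.5 / 16


def gpu_budget_gb(system_ram_gb: float, fraction: float = ASSUMED_GPU_FRACTION) -> float:
    """Estimate the memory (in GB) the GPU can address on a unified-memory Mac."""
    return system_ram_gb * fraction


def fits_on_gpu(model_size_gb: float, system_ram_gb: float) -> bool:
    """Return True when the model is expected to fit within the GPU budget."""
    return model_size_gb <= gpu_budget_gb(system_ram_gb)


print(gpu_budget_gb(16))      # budget for a 16GB machine
print(fits_on_gpu(4.7, 16))   # a ~4.7GB model on 16GB
print(fits_on_gpu(26.0, 16))  # a ~26GB model on 16GB
```

The model sizes here are placeholders; the size on disk of the quantized model file is a reasonable first estimate, though context length adds to it.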

Chatbot - FastAPI Template Update

/usr/local/lib/python3.10/site-packages/starlette/templating.py:178: DeprecationWarning: The name is not the first parameter anymore. The first parameter should be the Request instance.
Replace TemplateResponse(name, {"request": request}) by TemplateResponse(request, name).
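
One way to apply the change the warning asks for is to route every render through a single helper, so the argument order is fixed in one place. This is a minimal sketch: `render_template` is a hypothetical helper name, not part of Starlette or of this project, and `templates` is assumed to be a Starlette or FastAPI `Jinja2Templates` instance.

```python
from typing import Any, Dict, Optional


def render_template(
    templates: Any,
    request: Any,
    name: str,
    context: Optional[Dict[str, Any]] = None,
) -> Any:
    """Render a template using the request-first TemplateResponse signature."""
    # Deprecated form (triggers the warning):
    #   templates.TemplateResponse(name, {"request": request, **context})
    # Current form: the request comes first, then the template name,
    # then the context, which no longer needs a "request" entry.
    return templates.TemplateResponse(request, name, context or {})
```

Inside a route handler this would be called as `render_template(templates, request, "index.html", {"title": "Chat"})`, where the template name and context are placeholders.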

Chatbot - Switch to WSGI server

INFO WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

None of the "easy" conversions worked because of how threading is used to stream model output (tokens stream to the browser via socketio).

  • Gunicorn - No streaming token response and crashes
  • FastAPI + uvicorn + socketio - Moving to async: Test mostly works but again, no streaming
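
To make the constraint concrete, here is a generic sketch of the thread-based streaming pattern described above: a worker thread produces tokens while a loop forwards each one as it arrives. It uses only the standard library and is not the chatbot's actual code; `stream_tokens`, `generate`, and `emit` are hypothetical names, with `emit` standing in for a socketio emit call.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the token stream


def stream_tokens(generate: Callable[[], Iterable[str]],
                  emit: Callable[[str], None]) -> None:
    """Run generate() in a worker thread and forward each token to emit()."""
    tokens: "queue.Queue[object]" = queue.Queue()

    def producer() -> None:
        try:
            for token in generate():
                tokens.put(token)
        finally:
            # Always signal completion, even if generation fails.
            tokens.put(_DONE)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    # Forward tokens one at a time instead of waiting for the full reply.
    while True:
        item = tokens.get()
        if item is _DONE:
            break
        emit(item)
    worker.join()


received: List[str] = []
stream_tokens(lambda: iter(["Hello", ",", " world"]), received.append)
print("".join(received))  # prints "Hello, world"
```

A server that buffers the whole response, or that does not cooperate with background threads, would break the token-by-token delivery in this pattern, which may be consistent with the Gunicorn and uvicorn results listed above.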
