
Comments (4)

sashirestela commented on August 17, 2024

Hi @cryptoapebot, here is a working example of vision + streaming with that model, in two versions:

  1. Demo for external image
  2. Demo for a local image

This code runs using the simple-openai library:

package io.github.sashirestela.openai.playground;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.List;

import io.github.sashirestela.openai.OpenAI;
import io.github.sashirestela.openai.SimpleOpenAI;
import io.github.sashirestela.openai.domain.chat.ChatRequest;
import io.github.sashirestela.openai.domain.chat.content.ContentPartImage;
import io.github.sashirestela.openai.domain.chat.content.ContentPartText;
import io.github.sashirestela.openai.domain.chat.content.ImageUrl;
import io.github.sashirestela.openai.domain.chat.message.ChatMsgUser;

public class DemoVision {

    private SimpleOpenAI openai;
    private OpenAI.ChatCompletions chatService;

    public DemoVision() {
        openai = SimpleOpenAI.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .build();
        chatService = openai.chatCompletions();
    }

    public void demoCallChatWithVisionExternalImage() {
        var chatRequest = ChatRequest.builder()
                .model("gpt-4-turbo-2024-04-09")
                .messages(List.of(
                        new ChatMsgUser(List.of(
                                new ContentPartText(
                                        "What do you see in the image? Give in details in no more than 100 words."),
                                new ContentPartImage(new ImageUrl(
                                        "https://upload.wikimedia.org/wikipedia/commons/e/eb/Machu_Picchu%2C_Peru.jpg"))))))
                .temperature(0.0)
                .maxTokens(500)
                .build();
        var chatResponse = chatService.createStream(chatRequest).join();
        chatResponse.filter(chatResp -> chatResp.firstContent() != null)
                .map(chatResp -> chatResp.firstContent())
                .forEach(System.out::print);
        System.out.println();
    }

    public void demoCallChatWithVisionLocalImage() {
        var chatRequest = ChatRequest.builder()
                .model("gpt-4-turbo-2024-04-09")
                .messages(List.of(
                        new ChatMsgUser(List.of(
                                new ContentPartText(
                                        "What do you see in the image? Give in details in no more than 100 words."),
                                new ContentPartImage(loadImageAsBase64("src/main/resources/machupicchu.jpg"))))))
                .temperature(0.0)
                .maxTokens(500)
                .build();
        var chatResponse = chatService.createStream(chatRequest).join();
        chatResponse.filter(chatResp -> chatResp.firstContent() != null)
                .map(chatResp -> chatResp.firstContent())
                .forEach(System.out::print);
        System.out.println();
    }

    private ImageUrl loadImageAsBase64(String imagePath) {
        try {
            Path path = Paths.get(imagePath);
            byte[] imageBytes = Files.readAllBytes(path);
            String base64String = Base64.getEncoder().encodeToString(imageBytes);
            var extension = imagePath.substring(imagePath.lastIndexOf('.') + 1);
            var prefix = "data:image/" + extension + ";base64,";
            return new ImageUrl(prefix + base64String);
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        var demoVision = new DemoVision();
        demoVision.demoCallChatWithVisionExternalImage();
        demoVision.demoCallChatWithVisionLocalImage();
    }
}
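
One detail in loadImageAsBase64 above may be worth tightening: it builds the data URL prefix straight from the file extension, so a .jpg file yields "data:image/jpg" rather than the registered MIME type image/jpeg. The thread does not say whether the API rejects that, so treat the following as a minimal sketch under assumptions: DataUrls and toDataUrl are hypothetical names (not part of simple-openai), and the extension table assumes the PNG, JPEG, GIF and WebP formats.

```java
import java.util.Base64;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of simple-openai.
public class DataUrls {

    // Assumed set of image formats; adjust to what your model accepts.
    private static final Map<String, String> MIME_BY_EXTENSION = Map.of(
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "png", "image/png",
            "gif", "image/gif",
            "webp", "image/webp");

    // Builds a base64 data URL, mapping the file extension to a MIME type.
    public static String toDataUrl(byte[] imageBytes, String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("No file extension in: " + fileName);
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String mimeType = MIME_BY_EXTENSION.get(extension);
        if (mimeType == null) {
            // Fail loudly instead of returning null to the request builder.
            throw new IllegalArgumentException("Unsupported image type: " + extension);
        }
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        // Three dummy bytes stand in for real image content.
        System.out.println(toDataUrl(new byte[] {1, 2, 3}, "machupicchu.jpg"));
    }
}
```

The resulting string could be passed to new ImageUrl(...) in place of the prefix + base64String concatenation. Throwing on an unknown extension also avoids handing a null ImageUrl to ContentPartImage.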

from openai-java.

sashirestela commented on August 17, 2024

@cryptoapebot To extend my answer: to generate images you should use only the dall-e-2 and dall-e-3 models. The vision feature (reading images and describing them) belongs to the chat completion service, where you should use one of the gpt models, such as gpt-4-turbo-2024-04-09. See OpenAI's model endpoint compatibility table:

https://platform.openai.com/docs/models/model-endpoint-compatibility
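
The split described above can be summarised as a small lookup. This is only an illustrative sketch: ModelChooser and Task are hypothetical names, and the model lists contain just the models mentioned in this thread, not the full compatibility table.

```java
import java.util.List;

// Hypothetical lookup that mirrors the distinction made in this comment.
public class ModelChooser {

    enum Task { GENERATE_IMAGE, DESCRIBE_IMAGE }

    // REST endpoint that serves each task.
    static String endpointFor(Task task) {
        switch (task) {
            case GENERATE_IMAGE:
                return "/v1/images/generations";
            case DESCRIBE_IMAGE:
                return "/v1/chat/completions";
            default:
                throw new IllegalArgumentException("Unknown task: " + task);
        }
    }

    // Models named in this thread for each task (not an exhaustive list).
    static List<String> modelsFor(Task task) {
        switch (task) {
            case GENERATE_IMAGE:
                return List.of("dall-e-2", "dall-e-3");
            case DESCRIBE_IMAGE:
                return List.of("gpt-4-turbo-2024-04-09");
            default:
                throw new IllegalArgumentException("Unknown task: " + task);
        }
    }

    public static void main(String[] args) {
        for (Task task : Task.values()) {
            System.out.println(task + " -> " + endpointFor(task) + " " + modelsFor(task));
        }
    }
}
```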


xyifhgvnlo286 commented on August 17, 2024

[openai4j](https://github.com/Lambdua/openai4j) is a fork of this library that already supports GPT-4 vision:

// `service` is assumed to be an already configured OpenAiService instance.
final List<ChatMessage> messages = new ArrayList<>();
final ChatMessage systemMessage = new SystemMessage("You are a helpful assistant.");
// The imageMessage carries both the question and the image to be recognized.
final ChatMessage imageMessage = UserMessage.buildImageMessage("What's in this image?",
        "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg");
messages.add(systemMessage);
messages.add(imageMessage);
ChatCompletionRequest chatCompletionRequest = ChatCompletionRequest.builder()
        .model("gpt-4-turbo")
        .messages(messages)
        .n(1)
        .maxTokens(200)
        .build();
ChatCompletionChoice choice = service.createChatCompletion(chatCompletionRequest).getChoices().get(0);
// A chat completion choice exposes its reply through the message, not getText().
System.out.println(choice.getMessage().getContent());


cryptoapebot commented on August 17, 2024

Thank you!
