
gpttools's Introduction

gpttools

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

The goal of gpttools is to extend gptstudio so that R package developers can more easily incorporate large language models (LLMs) into their project workflows. These models appear to be a step change in our use of text for knowledge work, but you should carefully consider the ethical implications of using them. The ethics of LLMs (also called foundation models) is an area of very active discussion.

Installation

Install from GitHub with {pak}

# install.packages("pak")
pak::pak("JamesHWade/gpttools")

Install from R-Universe

# Enable repository from jameshwade
options(repos = c(
  jameshwade = "https://jameshwade.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
))
# Download and install gpttools in R
install.packages("gpttools")
# Browse the gpttools manual pages
help(package = "gpttools")

Available AI Services and Models

| AI Service | Models | Documentation | Setup |
|---|---|---|---|
| OpenAI | gpt-4-turbo, gpt-4, gpt-3.5-turbo (latest models) | OpenAI API Docs | OpenAI Setup |
| HuggingFace | various | HF Inference API Docs | HF Setup |
| Anthropic | claude-2.1, claude-instant-1.2 | Anthropic API Docs | Anthropic Setup |
| Ollama | mistral, llama2, mixtral, phi (latest models) | Ollama API Docs | Ollama Setup |
| Perplexity | pplx-7b-chat, pplx-70b-chat, pplx-7b-online, pplx-70b-online, llama-2-70b-chat, codellama-34b-instruct, mistral-7b-instruct, mixtral-8x7b-instruct | Perplexity API Docs | Perplexity Setup |
| Google AI Studio | Gemini and Palm (legacy) | Google AI Studio Docs | Google AI Studio Setup |
| Azure OpenAI | gpt-4, gpt-3.5-turbo (latest models) | Azure OpenAI API Docs | Azure OpenAI Setup |

Default AI Service: OpenAI

To get started, you must first set up an API service. The package is configured to work with several AI service providers, allowing for flexibility and choice based on your specific needs. The default configuration uses OpenAI’s services. To use it, you need to:

  1. Make an OpenAI account. Sign up here.

  2. Create an OpenAI API key to use with the package.

  3. Set up the API key in RStudio. See the section below on configuring the API key.

Configuring OpenAI API Key

To interact with the OpenAI API, you need a valid OPENAI_API_KEY environment variable. Here are the steps to configure it.

You can establish this environment variable globally by including it in your project’s .Renviron file. This approach ensures that the environment variable persists across sessions, including when the Shiny app runs as a background job.

Here is a set of commands to open the .Renviron file for modification:

library(usethis)
edit_r_environ()

For a persistent setting that loads every time you launch this project, add the following line to .Renviron, replacing "<APIKEY>" with your actual API key:

OPENAI_API_KEY="<APIKEY>"

Caution: If you’re using version control systems like GitHub or GitLab, remember to include .Renviron in your .gitignore file to prevent exposing your API key!

Important Note: OpenAI API will not function without valid payment details entered into your OpenAI account. This is a restriction imposed by OpenAI and is unrelated to this package.
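As a quick sanity check, you can confirm from the console that the key is visible to your R session before launching any addins. The helper name below is illustrative, not part of gpttools; the package performs its own validation.

```r
# Returns TRUE if an OPENAI_API_KEY is visible to this R session.
# (Helper name is illustrative; gpttools does its own key validation.)
openai_key_is_set <- function() {
  nzchar(Sys.getenv("OPENAI_API_KEY"))
}

openai_key_is_set()
```

If this returns `FALSE` after editing .Renviron, remember that R only reads .Renviron at startup, so restart your session first.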

Alternative AI Service Providers

While OpenAI is the default and currently considered one of the most robust options, gpttools is also compatible with other AI service providers. These include Anthropic, HuggingFace, Google AI Studio, Azure OpenAI, and Perplexity. You can select any of these providers based on your preference or specific requirements. You can also run local models with Ollama. This requires more setup but at the benefit of not sharing your data with any third party.

To use an alternative provider, you will need to obtain the relevant API key or access credentials from the chosen provider and configure them similarly.
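As a sketch, a provider key can be added to .Renviron just like the OpenAI one and then checked from R. The variable name below (ANTHROPIC_API_KEY) is an assumption; consult the provider's setup page for the name gpttools actually reads.

```r
# After adding, e.g., ANTHROPIC_API_KEY="<APIKEY>" to .Renviron and
# restarting R, confirm the session can see it (without printing the key):
nzchar(Sys.getenv("ANTHROPIC_API_KEY"))
```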

Privacy Notice for gpttools

This privacy notice applies to this R package, which uses popular language models such as gpt-4-turbo and claude-2.1. By using this package, you agree to adhere to the privacy terms and conditions set by the API service.

Data Sharing with AI Services

When using this R package, any text or code you highlight/select with your cursor, or the prompt you enter within the built-in applications, will be sent to the selected AI service provider (e.g., OpenAI, Anthropic, HuggingFace, Google AI Studio, Azure OpenAI) as part of an API request. This data sharing is governed by the privacy notice, rules, and exceptions that you agreed to with the respective service provider when creating an account.

Security and Data Usage by AI Service Providers

We cannot guarantee the security of the data you send via the API to any AI service provider, nor can we provide details on how each service processes or uses your data. However, these providers often state that they use prompts and results to enhance their AI models, as outlined in their terms of use. Be sure to review the terms of use of the respective AI service provider directly.

Limiting Data Sharing

The R package is designed to share only the text or code that you specifically highlight/select or include in a prompt through our built-in applications. No other elements of your R environment will be shared unless you turn those features on. It is your responsibility to ensure that you do not accidentally share sensitive data with any AI service provider.

IMPORTANT: To maintain the privacy of your data, do not highlight, include in a prompt, or otherwise upload any sensitive data, code, or text that should remain confidential.

Code of Conduct

Please note that the gpttools project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

gpttools's People

Contributors: jameshwade, michelnivard, ph-add, pre-commit-ci[bot]


gpttools's Issues

Error when trying to create embeddings using the crawl() function

Thanks a lot for this amazing package and continuously developing new cool functions for it!

When I tried to crawl https://adv-r.hadley.nz/ using the code below and confirmed that I wanted to create the embeddings, I got the following error message:

! Duplicate text entries detected.
i These are removed by default.
Error in `mutate_cols()`:
! Problem with `mutate()` column `embeddings`.
i `embeddings = purrr::map(.x = chunks, .f = create_openai_embedding, .progress = "Create Embeddings")`.
x unused argument (.progress = "Create Embeddings")
Caused by error in `.f()`:
! unused argument (.progress = "Create Embeddings")
Run `rlang::last_error()` to see where the error occurred.

Warning messages:
1: UNRELIABLE VALUE: Future (‘<none>’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore". 
2: UNRELIABLE VALUE: Future (‘<none>’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore". 
3: UNRELIABLE VALUE: Future (‘<none>’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

reprex:

library(gpttools)
crawl("https://adv-r.hadley.nz/")

Note: I use R version 4.1.3 and gpttools v 0.0.5.
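The `unused argument (.progress = ...)` error suggests an outdated dependency: the `.progress` argument of `purrr::map()` was added in purrr 1.0.0, so one plausible fix (an assumption, not a maintainer-confirmed diagnosis) is to upgrade purrr.

```r
# purrr::map() gained the .progress argument in purrr 1.0.0; older
# versions fail with "unused argument (.progress = ...)".
purrr_ok <- tryCatch(
  utils::packageVersion("purrr") >= "1.0.0",
  error = function(e) FALSE  # treat "purrr not installed" as not OK
)
if (!purrr_ok) {
  message("Upgrade purrr: install.packages(\"purrr\")")
}
```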

No change to code/text in script when running gpttools functions

Hello, I'm excited to get started with gpttools but I seem to have run into an issue off the bat.

I've installed gpttools using pak::pak("JamesHWade/gpttools") which seemed to work fine.

The OpenAI token is set properly in .Renviron and works well with gptstudio.

The gpttools features appear in the addins menu and are available through Ctrl+Shift+P

However, when I try to use any of the gpttools functions something appears to be happening in the console but my script remains unchanged and I can't access the response from OpenAI.

I have tried restarting R, reinstalling the package, using a clean environment, and various code chunks and functions (suggest unit tests, suggest improvements, convert to function, etc.); however, no matter what I've done, the result has been the same.

This is an example of the output from the console:

Selection: plantae <- lit_ents %>% filter(kingdom == "Plantae") %>% count() %>% collect() %>% as.numeric()
✔ Received response from OpenAI
Text to insert: plantae <- lit_ents %>% filter(kingdom == "Plantae") %>% count() %>% collect() %>%
as.numeric()

I'm using the latest versions of R (4.3.1) and RStudio (2023.09.0+463).

Leave dependency on openai

Great work, thanks for this initiative!

It seems that the openai package on CRAN is not actually from OpenAI, but just a wrapper package from an OpenAI user. It also seems that there’s not a lot of experience with R development on their side, indicated by how they are struggling to define a correct required R version in DESCRIPTION. No problem of course, but it’s probably not a good idea at the moment to rely on their work.

So, I would suggest dropping the dependency on that package, so there's no need for users to install your openai fork, which would open your addin package to a much wider range of users.

Limited number of characters printed with add roxygen skeleton?

I tested "add roxygen to function" with gpttools on a function that starts like this:

blastn <- function(db, # reference database in fasta format
                   query, # sequences to blast in fasta format
                   out = "Blast_output.tsv",
                   outfmt = "6 qacc saccver evalue bitscore length pident",
                   retrieve_nonmatching = TRUE, #
                   num_threads = detectCores(),
                   max_target_seqs = 20,
                   evalue = 10,
                   verbose = 0,  # 0 for no messages, 1 for time used, 2 for blast DB creation messages
                   clean = FALSE # remove the files created (blast db based on a fasta file)
){
(...)
}

This is the result obtained:

#' Blast nucleotide sequences
#' 
#' @param db reference database in fasta format
#' @param query sequences to blast in fasta format
#' @param out output file
#' @param outfmt output format
#' @param retrieve_nonmatching if true, retrieve non matching sequences
#' @param num_threads number of threads used
#' @param max_target_seqs maximum number of target

So it is very short, and it does not even provide an entry for every parameter.
If I request the same task directly in ChatGPT, I obtain a much more in-depth reply with entries for all parameters:

#' Blast nucleotide sequences
#'
#' This function allows to run a blastn search on a reference database. If the reference database is a fasta file, the function will create a blast database. 
#' If `retrieve_nonmatching` is set to `TRUE`, the function will retrieve non-matching sequences from the query file and add them to the blast output.
#' Optionally, the function can remove the files created by the blast search (blast database based on a fasta file).
#'
#' @param db a character string giving the path to the reference database in fasta format.
#' @param query a character string giving the path to the sequences to blast in fasta format.
#' @param out a character string giving the path to the output file. Default is "Blast_output.tsv".
#' @param outfmt a character string giving the format of the output. Default is "6 qacc saccver evalue bitscore length pident".
#' @param retrieve_nonmatching a logical indicating whether to retrieve non-matching sequences from the query file and add them to the blast output. Default is `TRUE`.
#' @param num_threads an integer giving the number of threads to use. Default is the number of cores detected on the machine.
#' @param max_target_seqs an integer giving the maximum number of target sequences to report for each query. Default is 20.
#' @param evalue a numeric giving the maximum expectation value to report. Default is 10.
#' @param verbose an integer indicating the level of messages to print. 0 for no messages, 1 for time used, 2 for blast DB creation messages. Default is 0.
#' @param clean a logical indicating whether to remove the files created by the blast search (blast database based on a fasta file). Default is `FALSE`.
#' @return invisible NULL.
#' @export
#' @examples
#' blastn(db = "reference.fasta", query = "sequences.fasta", out = "blast_output.tsv")

This is a much better starting point to document the function. I tested with several other large functions and I always get what looks like truncated answers from gpttools.

Is there a parameter to adjust to make the AI more talkative?

Add option for code style to optionally include in prompts

Many R programmers prefer a particular style of code (e.g., tidyverse, base R). It would be nice to adjust the prompts to keep to the programmer's preferred style of code writing. This could be done with an option such as gpttools.code_style.
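A sketch of how such an option could work. The option name `gpttools.code_style` and the prompt-prefix helper are hypothetical, not an existing gpttools API.

```r
# Hypothetical sketch: read a user-set style option and prepend it
# to a prompt. "tidyverse" is an assumed default.
style_prefix <- function() {
  style <- getOption("gpttools.code_style", default = "tidyverse")
  paste0("Write R code in the ", style, " style. ")
}

options(gpttools.code_style = "base-R")
style_prefix()
```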

Upkeep for gpttools

Pre-history

  • usethis::use_readme_rmd()
  • usethis::use_roxygen_md()
  • usethis::use_github_links()
  • usethis::use_pkgdown_github_pages()
  • usethis::use_tidy_github_labels()
  • usethis::use_tidy_style()
  • usethis::use_tidy_description()
  • urlchecker::url_check()

2020

  • usethis::use_package_doc()
    Consider letting usethis manage your @importFrom directives here.
    usethis::use_import_from() is handy for this.
  • usethis::use_testthat(3) and upgrade to 3e, testthat 3e vignette
  • Align the names of R/ files and test/ files for workflow happiness.
    The docs for usethis::use_r() include a helpful script.
    usethis::rename_files() may be useful.

2021

  • usethis::use_tidy_dependencies()
  • usethis::use_tidy_github_actions() and update artisanal actions to use setup-r-dependencies
  • Remove check environments section from cran-comments.md
  • Bump required R version in DESCRIPTION to 3.5
  • Use lifecycle instead of artisanal deprecation messages, as described in Communicate lifecycle changes in your functions

2022

issue with ollama

Trying to get this working with ollama/codestral

It looks like I have it working with gptstudio, but with gpttools, I am getting this error when I try to launch the settings menu:

Listening on http://127.0.0.1:6828
Access your index files here:
/Users/cotinga/Library/Application Support/org.R-project.R/R/gpttools/index
Warning: Error in httr2::req_perform: HTTP 401 Unauthorized.
66:
65: signalCondition
64: signal_abort
63: abort
62: resp_abort
61: handle_resp
60: httr2::req_perform
59: is_response
58: check_response
57: httr2::resp_body_json
56: pluck_raw
55: purrr::pluck
54: vctrs_vec_compat
53: map_
52: purrr::map_chr
51: %>%
50: list_available_models.openai
48: gptstudio::get_available_models
46: observe
45:
2: shiny::runApp
1: gpttools:::launch_settings

Trying most of the addins indicates that my API is not configured correctly, but I want to use a local model... Any ideas?


Enhancement via dm

This came via Twitter dm:

“Hi Michel, I'm very excited about your GPT packages for R, which look amazing. I have a question if you don't mind: do you know a way to apply chat GPT's prompts to a dataset? For instance, I have a dataset with a variable with long texts. I'd like to summarize all of them (e.g., in a sentence) with chat GPT. I've checked your packages, but I'm unsure if one can apply the prompt to an entire dataset. Any thoughts would be really useful”

gptapply()?

Not an addin, so serious mission creep (I suggested the API directly for now).
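A minimal sketch of what a `gptapply()` could look like, with the chat call injected so the shape is testable without an API key. Everything here is hypothetical: `query_fn` stands in for a real API call (e.g., a wrapper around a chat endpoint).

```r
# Hypothetical gptapply(): apply one prompt to each element of a text
# vector (e.g., a data frame column). query_fn is injected so the
# function can be exercised without hitting a remote API.
gptapply <- function(texts, prompt, query_fn) {
  vapply(
    texts,
    function(txt) query_fn(paste(prompt, txt, sep = "\n\n")),
    character(1),
    USE.NAMES = FALSE
  )
}

# With a real backend, query_fn would call the chat API; here a mock
# backend just echoes the first line of the assembled prompt.
mock_llm <- function(full_prompt) strsplit(full_prompt, "\n")[[1]][1]
gptapply(c("long text A", "long text B"), "Summarize in one sentence:", mock_llm)
```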

Misspelled function remove_new_lines_and_spaces()

Hello!

It seems that there is a spelling error when crawling using crawl() and calling remove_new_lines_and_spaces():

gpttools::crawl("https://r4ds.hadley.nz/")
#> ── Crawling <https://r4ds.hadley.nz/> ──────────────────────────────────────────
#> ℹ This may take a while.
#> ℹ Gathering links to scrape
#> ℹ Total urls: 1
#> ℹ Total urls: 40
#> ℹ Scraping validated links
#> Error:
#> ℹ In index: 1.
#> Caused by error in `remove_newlines_and_spaces()`:
#> ! could not find function "remove_newlines_and_spaces"

Tested with gpttools v0.0.5 installed via devtools::install_github("JamesHWade/gpttools") and pak.

no response from fresh gpttools installation

Thanks for this software and I'm looking forward to using it during development. However, I'm having trouble getting any of the features to work.

I've installed gpttools (using install_github()) and had no problems noted in the install.

The OPENAI_API_KEY is set and works with gptstudio without a problem.

the gpttools features are included in the addins menu and are available through Ctrl+Shift+P

However, when I try to use any of the gpttools functions (roxygen documentation, script-to-function, etc) nothing is happening.

By nothing I mean there are no messages in the console or in the background-jobs pane. Furthermore, there is no evidence of increased cpu usage by rsession or rstudio.

I've tried on several installs on different linux boxes.
I'm seeing this behavior on Rstudio desktop version 2023.06.1 Build 524 and both R 4.3.0 and R 4.3.1.

Given that there don't seem to be other similar issues reported, I'm assuming that I am doing something wrong that is causing the behavior. I'd appreciate any pointers you may have to guide me back on track

thanks
allan

Is gpttools supported by RStudio Server?

When trying to run gpttools on an RStudio Server, I get the following message:

> gpttools::script_to_function_addin()
✓ API already validated in this session.
Error: 'selectionGet' is not an exported object from 'namespace:rstudioapi'

Does gpttools currently support RStudio Server?

Thanks a lot in advance.

Best,

Joschka

Problem accessing Azure OpenAI

I am trying to use your package, but it is not possible for me because we have a private instance for my company. It would be very helpful if you could adapt the way URLs are created. It should be possible to support different URL formats.

In your implementation, the problem seems to be the [URL construction](https://github.com/JamesHWade/gpttools/blob/42f140acd7d91c439bd04dad7f5e23ac67c7fc42/R/azure_openai.R).

The code below provides an example of how I construct a URL for chat (I have not implemented the version for embeddings yet).

endpoint    <- Sys.getenv("GENAI_SYN_GPT4TURBO_ENDPOINT")
deployment  <- Sys.getenv("GENAI_SYN_GPT4TURBO_DEPLOYMENT_NAME")
api_version <- Sys.getenv("GENAI_SYN_GPT4TURBO_API_VERSION")
if (is.null(api_key)) {
  api_key <- Sys.getenv("GENAI_SYN_GPT4TURBO_KEY")
}

url <- glue::glue("https://{endpoint}.openai.azure.com/openai/deployments/{deployment}/chat/completions?api-version={api_version}")
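The snippet above can be wrapped as a small helper. This is a sketch of the URL shape the issue proposes (using base R `sprintf` instead of glue so it is dependency-free), not the package's actual implementation; the argument names are illustrative.

```r
# Sketch of the Azure OpenAI chat-completions URL proposed in this issue.
# Not gpttools' implementation; argument names are illustrative.
azure_chat_url <- function(endpoint, deployment, api_version) {
  sprintf(
    "https://%s.openai.azure.com/openai/deployments/%s/chat/completions?api-version=%s",
    endpoint, deployment, api_version
  )
}

azure_chat_url("myresource", "gpt4-turbo", "2023-12-01-preview")
```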

Chat With Retrieval Isn't Using The Embedding

Hello,

I'm trying James Wade's simple example from the posit::conf(2023) video before I apply the gpttools package to my own work.

The issue I have is that Chat With Retrieval doesn't contextualize the response with the content scraped from the tidyverse design website. The Assistant provides a response but unlike in James' example, the Sources at the end of the response say "No context used so far.". I'm not sure what my next step should be.

I'm using R (posit) Cloud and the OpenAI API. I can see in my OpenAI account that I'm being charged for the embedding(s) and for the Assistant answering my prompts (ie. GPT-4 Turbo) so things are good at that end.

# scrape website and create text embeddings
gpttools::crawl("https://design.tidyverse.org")

# response to the question "Would you like to continue with creating embeddings?"
# Yes
3

Thank you for any insight you can provide. I think this project is super cool.

Laura

Script to function not working

Hello again!

I found that when accessing the Script to Function option the following error is returned:

Error in gptstudio::gpt_create(model = "text-davinci-edit-001", instruction = "convert this R code into an R function or a few R functions",  : 
  unused argument (instruction = "convert this R code into an R function or a few R functions")

I tried replacing gpt_create with gpt_edit in addin_script_to_function.R, but I get no input returned after the message Inserting text from GPT...

Also, when accessing the function from the Addins dropdown in RStudio, the hover text refers to the commenting function, not the script-to-function one.

Use proxy to access openai for Chinese users?

As a Chinese user, I came across this error when using gpttools:
! Timeout was reached: [api.openai.com] Resolving timed out after 10012 milliseconds

I guess it is due to the proxy. I am using Clash, and I tried to add the local proxy in the .Renviron file following this, but it still gives me the error.
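One thing worth trying (an assumption about your setup, not a confirmed fix) is setting the proxy environment variables from within the same R session and verifying they are visible; the port below is Clash's common default mixed port, but check your own client's settings.

```r
# Point R's HTTP stack at a local proxy. 7890 is a common Clash default;
# replace it with the port shown in your proxy client.
Sys.setenv(
  http_proxy  = "http://127.0.0.1:7890",
  https_proxy = "http://127.0.0.1:7890"
)

# Confirm the variables are visible to this session:
Sys.getenv(c("http_proxy", "https_proxy"))
```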

Thanks in advance and looking forward to your reply!

Best regards
Kun Guo
