
Continue

Continue keeps developers in flow. Our open-source VS Code and JetBrains extensions enable you to easily create your own modular AI software development system that you can improve.

Easily understand code sections

VS Code: cmd+L (MacOS) / ctrl+L (Windows)

JetBrains: cmd+J (MacOS) / ctrl+J (Windows)

Tab to autocomplete code suggestions

VS Code: tab

JetBrains: tab

Refactor functions where you are coding

VS Code: cmd+I (MacOS) / ctrl+I (Windows)

JetBrains: cmd+I (MacOS) / ctrl+I (Windows)

Ask questions about your codebase

VS Code: @codebase

JetBrains: Support coming soon

Quickly use documentation as context

VS Code: @docs

JetBrains: @docs

Getting Started

Download for VS Code and JetBrains

You can try out Continue with our free trial models before configuring your setup.

Learn more about the models and providers here.

Contributing

Check out the contribution ideas board, read the contributing guide, and join #contribute on Discord.

License

Apache 2.0 © 2023 Continue Dev, Inc.


Contribution Ideas

Prompt Engineering

Since Continue works with any model, there's a ton of prompt engineering needed to optimize for each of them.

The most important area for work is the /edit slash command. GPT-4 is able to handle a very complicated prompt that we give it here, but smaller open-source models often struggle and need a simpler prompt. The goal is to reliably convert (previous code, user instructions) --> new code without the model outputting any English.

This can be done entirely from config.py by setting the prompt_templates parameter of your model:

config = ContinueConfig(
    models=Models(
        default=Ollama(
            ...,  # your other model settings stay unchanged
            prompt_templates={"edit": """<MY_PROMPT_TEMPLATE>"""},
        )
    )
)

The default prompt template for "edit" is the following:

[INST] Consider the following code:
```
{{code_to_edit}}
```
Edit the code to perfectly satisfy the following user request:
{{user_input}}
Output nothing except for the code. No code block, no English explanation, no start/end tags.
[/INST]

Support a new LLM Provider

Continue supports many different LLM providers by subclassing the LLM class. If you know of an LLM provider that we don't support, adding it can be as simple as writing a single method.

For an example, see the Ollama LLM subclass. If you implement only the _stream_complete method, the parent class will automatically fill in the _stream_chat and _complete methods.
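
As a rough illustration, here is a minimal sketch of what such a subclass might look like; the import path, endpoint, request shape, and exact _stream_complete signature are assumptions to verify against libs/llm:

import aiohttp

from continuedev.src.continuedev.libs.llm import LLM

class MyProvider(LLM):
    model: str = "my-model"
    api_base: str = "http://localhost:8000"  # hypothetical HTTP endpoint

    async def _stream_complete(self, prompt, options):
        # Stream raw text chunks from the provider's API; the parent class
        # derives _stream_chat and _complete from this single method.
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.api_base}/complete",
                json={"prompt": prompt, "stream": True},
            ) as resp:
                async for chunk in resp.content.iter_any():
                    yield chunk.decode()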

Some providers we don't yet support:

/refactor Slash Command

This is for situations where you want to find all occurrences of some pattern, or all usages of a function, and edit every one of them, but the change is slightly more complicated than find and replace.

The /refactor slash command would do the following (a rough sketch in code follows the steps):

  1. Find all occurrences of a pattern in the codebase, either by using ripgrep (see here), semantic search, or another method
  2. Write a prompt for each occurrence and pass each to the LLM in parallel
  3. Once all edits are returned, parse them so that each edit can be applied to its file
  4. Apply them to each of the files. One way of doing this is with sdk.ide.showDiff, but we might also consider a better refactoring UI here.
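
Here is a rough sketch of that flow, assuming an sdk object like the one passed to Steps; sdk.models.default.complete and the exact showDiff signature are assumptions to check against the real SDK:

import asyncio
import subprocess

def find_occurrences(pattern: str, root: str = ".") -> list:
    # Step 1: use ripgrep to list the files containing the pattern.
    result = subprocess.run(
        ["rg", "-l", pattern, root], capture_output=True, text=True
    )
    return result.stdout.splitlines()

async def refactor(sdk, pattern: str, instructions: str):
    files = find_occurrences(pattern)
    # Step 2: build a prompt per file and query the LLM in parallel.
    prompts = [
        f"Apply the following change to {f}: {instructions}" for f in files
    ]
    edits = await asyncio.gather(
        *(sdk.models.default.complete(p) for p in prompts)
    )
    # Steps 3 and 4: parse each response and surface it as a diff.
    for f, new_contents in zip(files, edits):
        await sdk.ide.showDiff(f, new_contents)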

If you're interested in working on this, feel free to reach out and ask questions first.

Update Documentation

A feature is never complete until it's documented! If you see anything out-of-date or that ought to be better explained, please edit the file and make a PR.

  • Starting at the root of the repo, navigate to the file you want to edit

  • Click the edit button (the pencil icon) at the top right of the file view

  • This will tell you that you must fork the repository to proceed. Click "Fork this repository"

  • Use the editor that appears to make the desired changes. Once you're ready, click "Commit Changes..."

  • This will open a dialog box. Optionally, you can add a message explaining the changes. Click "Commit Changes"

  • Finally, you'll be taken to an overview of your changes. Click "Create pull request" and you're good to go! You can always make additional changes and these will be shown in the same pull request.

  • We'll review your code as soon as time allows, possibly ask follow-up questions, and then merge it into the main repository.

Report a Bug

If you find anything that doesn't work, please submit a GitHub Issue with this template.

Fill in the basic details, including what went wrong, what you expected, operating system, etc... and then click "Submit new issue" at the bottom of the page. You can then track progress as we work toward the solution, and ask follow-up questions.

Embeddings ContextProvider

I've started work on this here, but it hasn't been tested thoroughly and could use some user experience improvements / better design decisions.

Improve the RAG Pipeline

A "/codebase" slash command is currently being developed that will allow you to ask questions without explicitly specifying which files should be included as context. Instead, Continue will use embeddings to pull out the most important files to answer your question.

The current implementation, in the embeddings branch, uses the most basic setup with Chroma. Each file is indexed as a single document.

There is tons of room to improve the indexing and retrieval steps, and development is probably quite straightforward: most changes will happen inside the create_codebase_index and query_codebase_index functions here.

Here are some ideas for how the pipeline can be improved (you can also contribute by adding your own ideas here!); a chunking sketch follows the list:

  • Chunking
    • Code-aware chunking (for example, by function or class; consider using tree-sitter)
    • Separating the text used for similarity search from the text actually returned (for example, indexing a short preamble summary for search, or the reverse of the question-to-answer conversion below)
  • Convert the input to text that is more appropriate for search (e.g. a possible answer to the question, then run similarity search on that)
  • Custom embeddings model (currently ada or sentence transformers, the Chroma default)
  • Re-ranking: retrieve many options and then prune afterward
    • Improve the re-ranking prompts (currently there is a "remove" prompt that chooses which files are irrelevant, and an "include" prompt that says which files are important)
  • Weight chunks by information like commit frequency/recency, file length, etc.
  • Use other retrieval methods like fuzzy search, ripgrep, etc. to expand the initial pool
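
As promised above, a sketch of the code-aware chunking idea, assuming the tree-sitter-languages convenience package (the repo may prefer raw tree-sitter bindings):

from tree_sitter_languages import get_parser

def chunk_by_definition(source: bytes, language: str = "python") -> list:
    parser = get_parser(language)
    tree = parser.parse(source)
    chunks = []
    # Treat each top-level function or class definition as one chunk;
    # anything else could be grouped into a catch-all "module" chunk.
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition"):
            chunks.append(source[node.start_byte : node.end_byte].decode())
    return chunks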

Reliably edit file without repeating the entire contents

The current prompt for editing files requires the LLM to rewrite the entire range that it wants to edit. Ideally, it could specify only the diff, or at most a few lines surrounding the areas that it wants to edit.

To see the start of one attempt at this, look here. One idea on how to do this is by getting the LLM to output git merge conflict notation, like this:

<<<<<<< HEAD
<OLD_CODE>
=======
<NEW_CODE>
>>>>>>> updated
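
A sketch of how such output could be applied, assuming the model quotes OLD_CODE verbatim from the original range (which is exactly the fragile part to make reliable):

import re

CONFLICT = re.compile(
    r"<<<<<<< HEAD\n(.*?)\n=======\n(.*?)\n>>>>>>> updated", re.DOTALL
)

def apply_conflict_edits(file_contents: str, llm_output: str) -> str:
    for old_code, new_code in CONFLICT.findall(llm_output):
        # Each quoted block of old code is swapped for its replacement.
        file_contents = file_contents.replace(old_code, new_code)
    return file_contents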

Other ideas:

  • Fine-tune a model to generate a git diff
  • Get the model to quote parts of the range it wants to edit
  • Give the model tools like "insert here", "delete here", or "find and replace"

Answer questions on Discord

Our Discord community is active and growing. The faster someone can get an answer, the happier they will be, and we think it's overall good when people are happy. If you see an unanswered question to which you know the answer, feel free to respond!

If they are reporting a bug, you can help them create an issue and reference it in Discord so the conversation can move over to GitHub.

If they are looking for resources you know exist in the docs, feel free to point them there. Commonly, questions can be answered by the Customization and Troubleshooting pages.

Add a Language Server Provider

We currently support the Python LSP, but no others. Ideally, we will be able to support all languages by having plugins for each LSP.

If you can find a reliable way to download a language server, or alternatively to connect to one already running on one's machine, then it can be hooked up to a ContinueLSPClient.

Note that we might have to make a tweak to the ContinueLSPClient class, but we're happy to do so - if you can present a way to run and connect to a language server, we'll gladly take it from there : )

Improved Webpage Parsing

The current URLContextProvider just uses bs4 to parse HTML into raw text, but this includes a lot of junk: ads, navigation text, etc. We just want the title and main article. There are Python libraries intended for this, but they are larger than we'd like to include in our bundle.
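
As a starting point, a bs4-only heuristic in this spirit might look like the sketch below; the tag list and fallbacks are assumptions, not a tested design:

from bs4 import BeautifulSoup

def extract_main_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Drop tags that almost never contain article content.
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()
    title = soup.title.string if soup.title else ""
    # Prefer an <article>/<main> element when the page provides one.
    main = soup.find("article") or soup.find("main") or soup.body
    text = main.get_text(separator="\n", strip=True) if main else ""
    return f"{title}\n\n{text}"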

Support a new IDE

Continue was designed so that this can be as easy as possible, but it will still probably be a good amount of work. However, features can be added incrementally, and getting 80% of Continue into a new IDE might take only 20% of the work.

To support a new IDE, you need to implement a class like the one here, which just maps each of the actions like "read file" or "display message" to the proper API given by that IDE.
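
A hypothetical skeleton of such a class; the method names here are illustrative stand-ins, not the actual protocol:

from abc import ABC, abstractmethod

class NewIdeProtocol(ABC):
    @abstractmethod
    async def read_file(self, filepath: str) -> str:
        """Return the contents of a file, using the IDE's own API."""

    @abstractmethod
    async def show_message(self, message: str) -> None:
        """Display a message to the user through the IDE's UI."""

    @abstractmethod
    async def apply_edit(self, filepath: str, new_contents: str) -> None:
        """Replace a file's contents, ideally through the IDE's undo stack."""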

IDEs we would like to support:

  • JetBrains
  • Neovim
  • Visual Studio
  • Vim
  • Emacs
  • Chrome extension?

Add more settings to CompletionOptions

In the LLM class, every completion is passed a set of parameters in the CompletionOptions object.

We currently support common settings like max_tokens, temperature, top_p, top_k, frequency_penalty, and presence_penalty, but are missing things like tail-free sampling and mirostat sampling.

Some model providers, like llama.cpp, will accept these, so it is only a matter of allowing the parameter to be passed through (a sketch follows the steps below).

  1. Update CompletionOptions to include the new parameter
  2. Update each of stream_complete, complete, and stream_chat in __init__.py to accept the argument and pass it through to the CompletionOptions object
  3. Test with all model providers to make sure that the new parameter doesn't break the request. If it does, you'll have to update the collect_args method of that class to remove the parameter. For testing, you can run pytest; llm_test.py will automatically run some tests of the model providers
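
As referenced above, here is a sketch of steps 1 and 3, assuming CompletionOptions is a pydantic-style model and that collect_args builds the provider's request dict; check the real definitions before copying this:

from typing import Optional

from pydantic import BaseModel

class CompletionOptions(BaseModel):
    max_tokens: int = 1024
    temperature: Optional[float] = None
    mirostat: Optional[int] = None  # step 1: the new sampling parameter

def collect_args(options: CompletionOptions) -> dict:
    args = options.dict(exclude_none=True)
    # Step 3: a provider that rejects the new parameter can drop it here.
    args.pop("mirostat", None)
    return args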

Write unit tests for the rest of the LLM classes

We already have a framework set up to test subclasses of LLM in llm_test.py, but are missing some of the LLM providers found in libs/llm.

This task is mostly a matter of copying the patterns already in that file, running the tests, and modifying as needed to make sure the tests pass for each provider.
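
A hedged sketch of the pattern, assuming pytest-asyncio and the Ollama import path; the real fixtures and helpers are in llm_test.py and may differ:

import pytest

from continuedev.src.continuedev.libs.llm.ollama import Ollama

@pytest.mark.asyncio
async def test_ollama_completion():
    llm = Ollama(model="codellama")  # swap in the provider under test
    completion = await llm.complete("Say hello.")
    # Every provider should return a non-empty string for a simple prompt.
    assert isinstance(completion, str)
    assert len(completion) > 0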

Create a new Policy

A Policy is essentially a function that takes the history of previous steps and decides what action to take next. Note that whenever a user enters input, this is ingested as a "UserInputStep".

As an example, consider the DefaultPolicy here. It does the following:

  • If there is no history, show the welcome message.
  • If the previous step is not a UserInputStep, stop and wait for the next one.
  • Otherwise, parse the user input for slash commands or custom commands, and either:
    • Run the matching slash/custom command (for example, EditHighlightedCodeStep when the input starts with '/edit')
    • Run SimpleChatStep if no command matches

Policy ideas:

  • Whenever the last step was an EditHighlightedCodeStep, run a "review" step that looks over the code, either just by passing it to an LLM, or by linting, running the code, or checking for certain properties (ideally configurable by the user); a sketch follows this list
  • Generate code from scratch: Define a goal at the beginning of the policy, and then run steps that repeatedly write code, add features, and check in with the current status of the goal
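
Here is the promised sketch of the first idea, modeled on the DefaultPolicy description above; WelcomeStep and ReviewCodeStep are hypothetical names, and the real Policy interface may differ:

class ReviewAfterEditPolicy:
    def next(self, config, history):
        last_step = history[-1] if history else None
        if last_step is None:
            return WelcomeStep()  # hypothetical welcome message step
        if isinstance(last_step, EditHighlightedCodeStep):
            # New behavior: follow every edit with a review pass.
            return ReviewCodeStep()  # hypothetical lint/LLM review step
        if not isinstance(last_step, UserInputStep):
            return None  # stop and wait for the next user input
        return SimpleChatStep(user_input=last_step.user_input)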

Tab Autocomplete

Check out the discussion here - it is probably a quick task to set up autocompletions, but a difficult task to do them very well.

Support a new chat template

The llama2 and codellama families of models use a chat template that looks like this:

[INST] <<SYS>>
{system_message}
<</SYS>>

{user_input} [/INST] {response}

But other models use different templates. For example, the Alpaca series of models uses a pattern like this, which we don't yet have implemented:

### Instruction: {system_message}

### Input: {user_input}

### Response: {response}

This can probably be implemented in fewer than 10 lines of code in prompts/chat.py.
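
For instance, assuming prompts/chat.py maps a list of chat messages (dicts with "role" and "content") to a single prompt string, as the llama2 template does, an Alpaca template might look like this:

def template_alpaca_messages(messages) -> str:
    prompt = ""
    for msg in messages:
        if msg["role"] == "system":
            prompt += f"### Instruction: {msg['content']}\n\n"
        elif msg["role"] == "user":
            prompt += f"### Input: {msg['content']}\n\n"
        elif msg["role"] == "assistant":
            prompt += f"### Response: {msg['content']}\n\n"
    # End with an open response header for the model to complete.
    prompt += "### Response: "
    return prompt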

Create a new ContextProvider

A ContextProvider is what allows you to type '@' and select from a dropdown of related items, potentially including GitHub Issues, local files, Jira Tickets, and more.

Docs on building a ContextProvider are here.

Some ideas (the Stack Overflow idea is sketched after the list):

  • Jira Tickets
  • Slack Messages
  • Stack Overflow search
  • Contents of a URL
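
As mentioned above, a sketch of the raw search behind the Stack Overflow idea, using the public Stack Exchange API; wiring it into an actual ContextProvider subclass is what the docs above describe:

import requests

def search_stackoverflow(query: str) -> list:
    resp = requests.get(
        "https://api.stackexchange.com/2.3/search/advanced",
        params={
            "q": query,
            "site": "stackoverflow",
            "order": "desc",
            "sort": "relevance",
        },
        timeout=10,
    )
    resp.raise_for_status()
    # Each item includes the question title, link, and score.
    return resp.json().get("items", [])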

"Use This Snippet" Button

It would be great if you could click a button in the top right of any code block from Continue's output and have it automatically inserted into your code.

If you're interested in making this contribution, don't worry about the button - we can help with that. The real task is to come up with a good prompt that reliably converts (contents of file in question, code block snippet generated by Continue) --> new contents of file.

To do this, you should implement a new subclass of Step and write the run method, where you'll have access to the ContinueSDK, including all of the utilities you need to interact with the IDE, Language Server Protocol, and more. There are many examples of steps in the plugins/steps directory. The important part, though, is the prompting, so there's probably no need to even clone the Continue repository at first - it might be easier to try things in the OpenAI playground.
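
A first attempt at such a prompt, easy to iterate on in a playground; treat it as a starting point rather than a tested template:

USE_SNIPPET_PROMPT = """Here are the current contents of a file:

{file_contents}

Here is a code snippet that should be incorporated into the file:

{snippet}

Rewrite the entire file with the snippet integrated wherever it fits best.
Output only the new file contents, with no explanation and no code fences."""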

Unit Testing

We have a simple test set up right now to make sure that the extension can start the continue server binary on each platform, but it would be great to individually test some of the smaller components of Continue.

There's room for a small framework that would test any and all Step subclasses, and ideally we would have a unit test for each one. For example, the SimpleChatStep, which is the default, should be passed some user input, and we should make sure that, for each of several models, it runs without fail and has an appropriate description afterward.

from continuedev.src.continuedev.plugins.steps.chat import SimpleChatStep

async def test(sdk):
    chat_step = SimpleChatStep(user_input="Output exactly the following string, and nothing else: 'Hello World'")
    # run() is a coroutine, so the test itself must be async.
    await chat_step.run(sdk)
    assert chat_step.description == "Hello World"

In order to test with each of multiple models, you'll want to update the sdk object that is passed, for example:

# Initialize an SDK
models = [...]
for model in models:
    sdk.config.models.default = model
    await test(sdk)

Headless Mode

Since Continue is built to work with any IDE, it can also work with no IDE. This would be useful if you want to run tasks in the background, for example from the CLI or in CI/CD.

Implementation of headless mode is complete at a basic level, but there isn't yet a sophisticated CLI application. Here you can find the beginnings of a simple CLI app, but much more can probably be done, for example (a CLI sketch follows the list):

  • allowing more options
  • formatting the output more nicely
  • developing AI agents that can run in the background in headless mode
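
For the "more options" idea, a hypothetical sketch using argparse; none of these flags exist yet, and the real CLI entry point may differ:

import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="continue-headless")
    parser.add_argument("task", help="natural-language task to run")
    parser.add_argument(
        "--config", default="~/.continue/config.py",
        help="path to a Continue config file",
    )
    parser.add_argument(
        "--json", action="store_true",
        help="print machine-readable output for CI/CD pipelines",
    )
    return parser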
