text-dataset-aid-plugin's Introduction

Personalize your Second Brain Buddy (Text Generation Model)

Use a txt file to house your dataset. A feature to export your txt to a jsonl file will be added soon.

Context

Condition: Fully Working

The creation of NLP and text generation datasets is highly impactful and can allow researchers to train models that automatically generate text. However, creating custom datasets is a tedious and slow process.

The text dataset aid plugin helps you build finetuning datasets for text generation models like GPT-3 by hand. Finetuning on such a dataset can make the text your model generates more personalized, detailed, or better formatted. Say no to dealing with menus: everything runs through configurable hotkeys!

This plugin can be used to quickly generate training data for NLP and text generation models. This would speed up research in these areas, as well as make it easier for practitioners to train these models.

Context within your second brain

Updating your own text generation model on the dataset you collect while working in your second brain lets the model better fit your second brain's needs. Because it is built on Obsidian commands, this plugin fits into any writing or editing workflow. I hope you use this plugin as much as I do!

Advantages of Finetuning

Finetuning your text generation model can make the text it produces more natural and expressive. Potential advantages include:

  1. Increased accuracy in text prediction/generation
  2. Increased fluency and coherence in text generation
  3. Greater control over the style and content of generated text
  4. More control over the types of outputs the model produces
  5. Greater flexibility in the types of inputs the model can accept
  6. The ability to produce more human-like outputs
  7. Increased accuracy in the prediction of certain types of outputs

A great resource on fine-tuning principles from Microsoft

Usage

The core function of this plugin is easier to use with Vim mode, but it works without it as well. The plugin currently offers two commands, each with an accompanying hotkey that you can configure in Obsidian's hotkey settings.

When you send a prompt to the dataset and there is already a prompt there, the plugin does nothing.

When you send a completion to the dataset and there is already a prompt, the text selection is sent to the dataset as the completion to that prompt.

Open Ended Generation Support!

When you send a completion to the dataset and there is no prompt, the text selection is inserted into the dataset with an empty prompt prepended to it.

An example of this:

{"prompt":"", "completion":"Hello can I help you?"}

Another example:

{"prompt":"", "completion":"Hi, How can I help you today"}

The two commands are:

  • Send the Selection to send to your dataset file as prompt
  • Send the Selection to send to your dataset file as completion
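The rules above can be sketched as a small state model. This is only an illustration in Python, not the plugin's actual code: the `Dataset` class, its `pending_prompt` field, and the method names are hypothetical, and the sketch assumes "there is already a prompt" means a prompt that has not yet received its completion.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """Hypothetical in-memory model of the dataset file."""

    records: List[Dict[str, str]] = field(default_factory=list)
    pending_prompt: Optional[str] = None  # prompt still waiting for a completion

    def send_prompt(self, selection: str) -> None:
        # If a prompt is already waiting, do nothing.
        if self.pending_prompt is not None:
            return
        self.pending_prompt = selection

    def send_completion(self, selection: str) -> None:
        # With no waiting prompt, fall back to an empty prompt
        # (open ended generation).
        prompt = self.pending_prompt if self.pending_prompt is not None else ""
        self.records.append({"prompt": prompt, "completion": selection})
        self.pending_prompt = None


dataset = Dataset()
dataset.send_completion("Hello can I help you?")  # no prompt yet: empty prompt
dataset.send_prompt("first prompt")
dataset.send_prompt("second prompt")              # ignored: a prompt is waiting
dataset.send_completion("the completion")
```

After these calls, `dataset.records` holds one open ended record with an empty prompt and one record pairing "first prompt" with "the completion".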

Example of a finetuning dataset

{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}

Installation

Installing from the community plugins page in Obsidian

  • Open Settings > Third-party plugins
  • Make sure Safe mode is off
  • Click Browse community plugins
  • Search for "Dataset Finetuning Aid Plugin"
  • Click Install
  • Once installed, close the community plugins window and activate the newly installed plugin

Manually Installing from github

  • Download the latest release from the Releases section of the GitHub repository (if you can't find it, it should be on the right-hand side of the repository page)
  • Extract the plugin folder from the zip to your vault's plugins folder: <vault>/.obsidian/plugins/
    Note: On some machines the .obsidian folder may be hidden. On macOS you should be able to press Command+Shift+Dot to show the folder in Finder.
  • Reload Obsidian

Settings

The plugin's settings panel exposes four main settings. Their default values are set up for JSONL, a popular dataset format for text generation models.

| Setting Name | Description | Default |
| --- | --- | --- |
| Prefix for Prompts | This is the string that is prepended to the prompt when sent to the dataset | `{"prompt":` |
| Suffix for Prompts | This is the string that is appended to the prompt when sent to the dataset | `,` |
| Prefix for Completion | This is the string that is prepended to the completion when sent to the dataset | `"completion":` |
| Suffix for Completion | This is the string that is appended to the completion when sent to the dataset | `}\n` |
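To see how the four defaults combine into one JSONL line, here is a minimal sketch in Python. It is not the plugin's code: the `Settings` class and the two helper functions are hypothetical names, and quoting the selected text with `json.dumps` is an assumption, since how the plugin quotes the selection is not described here.

```python
import json
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical container holding the four default values from the table."""

    prompt_prefix: str = '{"prompt":'
    prompt_suffix: str = ","
    completion_prefix: str = '"completion":'
    completion_suffix: str = "}\n"


def format_prompt(text: str, settings: Settings) -> str:
    # Assumption: the selection is written as a JSON string literal.
    return settings.prompt_prefix + json.dumps(text) + settings.prompt_suffix


def format_completion(text: str, settings: Settings) -> str:
    return settings.completion_prefix + json.dumps(text) + settings.completion_suffix


settings = Settings()
line = format_prompt("Supported:", settings) + format_completion(" yes", settings)
record = json.loads(line)  # the combined line parses as one JSON object
```

With the defaults, a prompt followed by a completion yields a single line that parses as one JSON object. Changing the prefixes and suffixes would let the same two commands write a different text format.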

Help within development

Development

Creating a new version:

git tag -a 1.0.1 -m "1.0.1"
git push origin 1.0.1

Inspiration

Inspired by the efficiency and appeal of fine-tuning your own language model, this plugin allows you to build datasets from your notes in the form of prompts and responses. It automatically formats the text to OpenAI's specification for finetuning models like GPT-3.

This plugin shares similarities with the textTransporter plugin made by TfTHacker

Made with ❤️ by Conner Ohnesorge

text-dataset-aid-plugin's People

Contributors

conneroisu

text-dataset-aid-plugin's Issues

v0.0.6

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

Plugin ID mismatch

When trying to download the plugin in one of my vaults, the install failed with a "plugin ID mismatch" response.

new git tag

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

Clone Repo

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

v0.0.3

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

v0.0.1

Due Date: Nov 25 2022
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN
Description:

- [x] Clone the repository for the sample plugin offered by obsidian ✅ 2022-11-20

- [x] Update the version in `manifest.json` ✅ 2022-11-20
- [x] Update the version in `package.json` ✅ 2022-11-20
- [x] Update the version in `versions.json`
- [x] New git tag for version ✅ 2022-11-20
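The version updates in this checklist can be sketched as one helper. This is an illustrative Python sketch, not a script from the repository: `bump_version` is a hypothetical name, and it assumes the layout used by the Obsidian sample plugin, where `manifest.json` and `package.json` each carry a `version` field and `versions.json` maps every plugin version to the `minAppVersion` it requires.

```python
from typing import Dict, Tuple


def bump_version(
    manifest: Dict[str, str],
    package: Dict[str, str],
    versions: Dict[str, str],
    new_version: str,
) -> Tuple[Dict[str, str], Dict[str, str], Dict[str, str]]:
    """Return updated copies of the three parsed JSON files."""
    new_manifest = {**manifest, "version": new_version}
    new_package = {**package, "version": new_version}
    # Assumption: versions.json records the minimum app version per release.
    new_versions = {**versions, new_version: manifest["minAppVersion"]}
    return new_manifest, new_package, new_versions


# Placeholder values for illustration only.
manifest, package, versions = bump_version(
    {"id": "example-plugin", "version": "0.0.1", "minAppVersion": "0.15.0"},
    {"name": "example-plugin", "version": "0.0.1"},
    {"0.0.1": "0.15.0"},
    "0.0.2",
)
```

The inputs are the parsed contents of the three files, so the result can be written back with `json.dump` before creating the git tag.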

v0.0.5

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

Export Command to export your txt file to jsonl

A user should be able to export the dataset txt file to a jsonl file. This should be an additional mappable command which converts your txt file to a jsonl file and puts it into the main directory in the vault.
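A possible shape for this export is sketched below. It is written in Python purely for illustration, whereas the plugin command itself would live in the plugin code. The names `txt_to_jsonl` and `export_to_jsonl` are hypothetical, and the sketch assumes the txt file already holds one JSON record per line, as produced with the default settings.

```python
import json
from pathlib import Path


def txt_to_jsonl(text: str) -> str:
    """Re-serialize each non-blank line as one JSON record per output line."""
    out_lines = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {number} is not valid JSON: {err}") from err
        out_lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(entry + "\n" for entry in out_lines)


def export_to_jsonl(txt_path: Path, vault_root: Path) -> Path:
    """Write the converted dataset next to the vault root and return its path."""
    out_path = vault_root / (txt_path.stem + ".jsonl")
    converted = txt_to_jsonl(txt_path.read_text(encoding="utf-8"))
    out_path.write_text(converted, encoding="utf-8")
    return out_path


converted = txt_to_jsonl('{"prompt":"", "completion":"Hello can I help you?"}\n\n')
```

Raising on the first malformed line, rather than skipping it, is a design choice made here so that an unfinished record is noticed before the file is used for finetuning.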

v0.0.8

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

Update manifest

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

New plugin name requirements

Hi @conneroisu

Per our developer policies,
please ensure that your plugin's name does not include the word "Obsidian".
In addition, the plugin name should not include the word "Plugin", as that is unnecessary duplication.
We have already modified the name of your plugin to "Text Dataset Aid" in our records.

To maintain compliance, take the following actions:

  1. Modify the manifest.json file in your plugin repository.
  2. Generate a new release for your plugin to ensure users download the updated manifest.

If you have an idea for a different plugin name, you may also submit a pull request to the obsidianmd/obsidian-releases repository.

Thank you for your cooperation.
— the Obsidian team

Send to dataset commands are not parsing the last line

The two commands needed for release are getting hung up on a promise in the TypeScript code. An example of the resulting output is shown below:

{"prompt": "fsdfd","completion": "dasffsdfdsf"}
{"prompt": "dasffsdfdsf","completion": "dasffsdfdsf"}
{"prompt": "dasffsdfdsf",{"prompt": "dasffsdfdsf",{"prompt": "dasffsdfdsf",{"prompt": "dasffsdfdsf",

v0.0.4

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

update package

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

v0.0.9

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

v0.0.2

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

v0.0.7

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:

v0.0.1

Priority:
Due Date:
Date Created: Nov 22 2022
Creator: Conner Ohnesorge
Status: OPEN

Description:
