datasloth's Introduction

DataSloth

Natural language Pandas queries and data generation powered by GPT-3

Installation

pip install datasloth

Usage

DataSloth requires a working OpenAI API key. Either set it in the OPENAI_API_KEY environment variable or pass it to the DataSloth object, for example DataSloth(openai_api_key='...'). For more info, refer to this guide.
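
The two ways of supplying the key can be pictured as a simple lookup. This is a minimal sketch, not DataSloth's actual code: `resolve_api_key` is a hypothetical helper, the precedence of an explicit key over the environment is an assumption, and the `OPENAI_API_KEY` name is taken from the library's error message quoted in the issues further down.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Hypothetical helper: prefer an explicit key, else fall back to OPENAI_API_KEY."""
    env = os.environ if env is None else env
    key = explicit_key or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OpenAI API key is not set")
    return key


# Placeholder values only; never hard-code a real key.
print(resolve_api_key("sk-explicit", env={"OPENAI_API_KEY": "sk-from-env"}))
print(resolve_api_key(env={"OPENAI_API_KEY": "sk-from-env"}))
```

Running a check like this before creating the sloth can help tell a missing variable apart from a problem inside the library.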

DataSloth automatically discovers all Pandas dataframes in your namespace (filtering out names starting with an underscore). Before you load any data, import DataSloth and create the sloth:

from datasloth import DataSloth
sloth = DataSloth()
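
The discovery rule described above (keep dataframes, skip names with a leading underscore) can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `discover_tables` and the `Table` stand-in class are hypothetical, and `Table` replaces `pandas.DataFrame` only so the sketch runs without dependencies.

```python
class Table:
    """Stand-in for pandas.DataFrame so the sketch runs without pandas."""


def discover_tables(namespace, kind=Table):
    """Return {name: object} for objects of `kind` whose names lack a leading underscore."""
    return {
        name: value
        for name, value in namespace.items()
        if isinstance(value, kind) and not name.startswith("_")
    }


sales = Table()
_scratch = Table()   # skipped: name starts with an underscore
threshold = 10       # skipped: not a table

found = discover_tables({"sales": sales, "_scratch": _scratch, "threshold": threshold})
print(sorted(found))  # ['sales']
```

With pandas installed, the same idea would apply with `kind=pandas.DataFrame` and `namespace=globals()`. Prefixing a dataframe's name with an underscore is therefore a simple way to keep it out of the prompt.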

Next, load any data you want to use. Try naming your dataframes and columns in a meaningful way, as DataSloth uses these names to understand what the data is about.

Once your data is loaded, simply run

sloth.query('...')

to query the data.

Improving results

To improve the results, you can set custom descriptions of your tables:

df.sloth.description = 'Verbose description of the table'

By default, table descriptions consist of information about each column in the table. You can include this default description in your custom one by adding a {COLUMNS_SUMMARY} placeholder. See the detailed example notebook in the examples folder for more information.
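
How the placeholder is filled can be sketched with plain string formatting. This assumes the library substitutes `{COLUMNS_SUMMARY}` with `str.format`, as the traceback quoted in the issues further down suggests. `render_description` and the sample summary text are hypothetical and only illustrate the mechanism.

```python
def render_description(template, columns_summary):
    """Fill the {COLUMNS_SUMMARY} placeholder the way a str.format call would."""
    return template.format(COLUMNS_SUMMARY=columns_summary)


# Hypothetical default summary; the real one is generated by the library.
default_summary = "age: numeric, range 0-80\nsex: unique values: male, female"

custom = "Passengers of the Titanic, one row per person.\n{COLUMNS_SUMMARY}"
description = render_description(custom, default_summary)
print(description)
```

If the substitution does go through `str.format`, any other literal braces in a custom description would need to be doubled (`{{` and `}}`) to avoid a formatting error.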

Solving issues

Often, if the returned data is incorrect or not formatted the way you want, it helps to rephrase the question or give specific pointers on how the final data should look. To better understand where things might have gone wrong, pass show_query=True to sloth.query(), or run sloth.show_last_query() after the query has finished to print the SQL query that was used (without rerunning the engine).
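
Once the SQL has been printed, one way to check it independently is to run it by hand against a small sample. The sketch below uses Python's built-in `sqlite3` module and assumes the generated SQL uses only syntax that SQLite accepts. The `passengers` table, its rows, and the query text are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows standing in for a dataframe named `passengers`.
rows = [("male", 34.0), ("female", 28.0), ("male", 22.0), ("male", 45.0)]

# SQL text of the kind show_last_query() prints; written by hand here.
sql = """
SELECT COUNT(*) AS men_over_30
FROM passengers
WHERE sex = 'male' AND age > 30
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (sex TEXT, age REAL)")
conn.executemany("INSERT INTO passengers VALUES (?, ?)", rows)

men_over_30 = conn.execute(sql).fetchone()[0]
conn.close()
print(men_over_30)  # 2
```

If the query runs here but fails inside DataSloth, the problem is more likely in the execution environment than in the generated SQL.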

Data generation

DataSloth is also able to generate random data with the generate function. For example, running:

sloth.generate(
    description="people from Mars, with very space-sounding names, and strange taste in ice cream", 
    columns=['First Name', 'Last Name', 'Date Of Birth', 'Country', 'City', 'Favourite Ice Cream'],
    n_rows=15
)

Produces something like this:

First Name | Last Name | Date Of Birth | Country | City | Favourite Ice Cream
Glorza | Mangal | 06/12/2079 | Mars | Pryus Mater | Celestial Delight
Yalza | Krang | 09/21/2084 | Mars | Valles Marineris | Moon Mist
Tralza | Vomar | 04/17/2074 | Mars | Syrtis Major | Mars Mud Pie
Dalza | Ralad | 01/02/2088 | Mars | Hellas Planitia | Alien Abduction
Halza | Wular | 11/04/2092 | Mars | Olympus Mons | Martian Sunrise

Note that the results of the generate function are random, and different on each call.

datasloth's Issues

Local Run?

Hi, thanks for putting this neat library together!

Is there a way to run this locally? (Using a model from HF?)

Consistent Issue with executing queries

what I'm trying

sloth.query("Number of men over 30 on board the titanic", show_query= True)

The error

ObjectNotExecutableError: Not an executable object: "SELECT COUNT(*) AS men_over_30\nFROM real_data\nWHERE Sex = 'male' AND Age > 30"

What sloth.show_last_query() gets me:

SELECT COUNT(*) AS men_over_30
FROM real_data
WHERE Sex = 'male' AND Age > 30

pandas versioning:

pandas==1.5.3
pandas-datareader==0.10.0
pandas-gbq==0.17.9
pandasql==0.7.3

TypeError: data type "category" not understood

Hi,
I am trying to reproduce the example. It throws the following error for this code:

# Main dataset to show datasloth capabilities
titanic = sns.load_dataset('titanic')
titanic.head()
titanic.sloth.description = 'Verbose description of the table'
sloth.query("Number of men which survived the titanic", show_query=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-b1e190aa764c> in <module>
      5 titanic.head()
      6 titanic.sloth.description = 'Verbose description of the table'
----> 7 sloth.query("Number of men which survived the titanic", show_query=True)

~/.local/lib/python3.6/site-packages/datasloth/__init__.py in query(self, query, env, show_query)
    118         env = env or get_outer_frame_variables()
    119         query = query[0].lower() + query[1:]
--> 120         prompt = self.dataframes_summary(env)
    121         if not prompt:
    122             print('No dataframes found')

~/.local/lib/python3.6/site-packages/datasloth/__init__.py in dataframes_summary(env, ignore)
    102                 summary_lines += [
    103                     f"\n\nTable name: {name}",
--> 104                     value.sloth.description
    105                 ]
    106                 table_count += 1

~/.local/lib/python3.6/site-packages/datasloth/__init__.py in description(self)
     27     @property
     28     def description(self) -> str:
---> 29         return self._description.format(COLUMNS_SUMMARY=self.columns_summary())
...
--> 200     if is_string_dtype(col) or col.dtype == 'category':
    201         unique = col.unique().tolist()
    202         summary = 'unique values: ' + ', '.join(map(str, unique[:30]))

TypeError: data type "category" not understood

Not able to transpose table

Great work. But when I tried to "transpose table" it did not get it. Then I tried...

sloth.query("columns in rows and rows in columns", show_query=True)

SELECT *
FROM (
  SELECT *
  FROM T
  LIMIT 10
)
PIVOT (
  COUNT(*)
  FOR column_name
  IN ('a', 'b', 'c')
)

Unsuccessful. Try rephrasing your query, or add additional table descriptions in df.sloth.description.
You can inspect the generated prompt and GPT response in sloth.show_last_prompt().

sloth.show_last_prompt is not working by the way.

Open ai key

Great work, but I've set the OpenAI key in all the ways suggested and still get the following exception: "Exception: OpenAI API key is not set. Either provide it to DataSloth(openai_api_key='...') run openai.api_key('...'), or set it as an env variable OPENAI_API_KEY". Any suggestions appreciated.
