Quantipy

Python for people data

Quantipy is an open-source data processing, analysis and reporting software project that builds on the excellent pandas and numpy libraries. Aimed at people data, Quantipy offers support for native handling of special data types like multiple choice variables, statistical analysis using case or observation weights, DataFrame metadata and pretty data exports.

Key features

  • Understands plain .csv, converts from Dimensions, SPSS, Decipher, or Ascribe and to SPSS
  • Accessible metadata format to describe and manage case data inputs
  • Computation and assessment of data weights
  • Easy-to-use analysis interface
  • Extensible automated data aggregation via View objects
  • Structured analysis and reporting using savable Link, Stack, Chain and Cluster containers
  • Beautiful exports to MS Excel and Powerpoint with flexible layouts
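
As a rough illustration of what handling a multiple choice variable involves, the sketch below parses and counts answers in plain Python. It assumes each case stores its selected codes as a semicolon-delimited string such as '1;3;'. That storage format, the helper names and the sample answers are assumptions made for illustration, and none of this is Quantipy's own implementation.

```python
from collections import Counter


def parse_delimited_set(value, sep=";"):
    """Split a delimited-set cell such as '1;3;' into a list of int codes."""
    # Treat None and NaN (the only value not equal to itself) as "no answer".
    if value is None or value != value:
        return []
    return [int(code) for code in str(value).split(sep) if code.strip()]


def count_codes(cells, sep=";"):
    """Count how many cases selected each code across a column of cells."""
    counts = Counter()
    for cell in cells:
        # set() guards against a code repeated within a single case.
        counts.update(set(parse_delimited_set(cell, sep)))
    return counts


# Hypothetical answers from four respondents to one multiple choice question.
answers = ["1;3;", "2;", None, "1;2;3;"]
counts = count_codes(answers)
```

Because one case can contribute to several codes, the counts can add up to more than the number of respondents, which is why such variables need different treatment from single choice ones.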

Required libraries before installation

We recommend installing Anaconda for Python 2.7, which provides most of the required libraries and an easy way to keep them up to date.

  • Python 2.7.8
  • Numpy 1.9.2
  • Pandas 0.16.2
  • pylzma

5 minutes to Quantipy

Start a new folder called 'Quantipy-5' and add a subfolder called 'data'.

You can find an example dataset in quantipy/tests:

  • Example Data (A).csv
  • Example Data (A).json

Put these files into your 'data' folder.
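
If you would rather script this setup, a minimal sketch could look like the following. The helper name prepare_project and its tests_dir argument are hypothetical; point tests_dir at wherever your copy of quantipy/tests lives.

```python
import os
import shutil

# File names as listed above.
EXAMPLE_FILES = ["Example Data (A).csv", "Example Data (A).json"]


def prepare_project(tests_dir, project_dir="Quantipy-5"):
    """Create project_dir/data and copy the example files into it.

    Returns the names of the files that were found and copied.
    """
    data_dir = os.path.join(project_dir, "data")
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    copied = []
    for name in EXAMPLE_FILES:
        source = os.path.join(tests_dir, name)
        if os.path.isfile(source):
            shutil.copy(source, data_dir)
            copied.append(name)
    return copied
```

Calling prepare_project with the path to your quantipy/tests folder should leave both example files in Quantipy-5/data. An empty return value means the files were not found at the path you passed.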

Start with some import statements:

import pandas as pd
import quantipy as qp

from quantipy.core.tools.dp.io import load_json
from quantipy.core.helpers.functions import paint_dataframe

# This is a handy bit of pandas code that stops wide dataframes
# from being wrapped across several blocks when they are printed.
pd.set_option('display.expand_frame_repr', False)

# Set up the required path variables
path_data = './data/'
name_data = 'Example Data (A)'

# Paths to the input files
path_json = '{}{}.json'.format(path_data, name_data)
path_csv = '{}{}.csv'.format(path_data, name_data)

# Paths to expected Quantipy files we will want to save
path_stack = '{}{}.stack'.format(path_data, name_data)
path_cluster = '{}{}'.format(path_data, name_data)
path_excel = '{}{}.xlsx'.format(path_data, name_data)

# Load the case metadata and the case data
# (the first column of the CSV is used as the index)
meta = load_json(path_json)
data = pd.read_csv(path_csv, index_col=0, parse_dates=True)

# Create a stack (container for aggregations) and add the 
# source data to it
stack = qp.Stack(add_data={'Example': {'data': data, 'meta': meta}})

# If you want to list your variables by type you can use 
# something like this.
cols_by_type = {
    t: [
        col 
        for col in meta['columns'] 
        if meta['columns'][col]['type']==t
    ]
    for t in ['single', 'delimited set', 'int', 'float', 'string']
}
singles = cols_by_type['single']
multiples = cols_by_type['delimited set']
ints = cols_by_type['int']

# Quantipy cares about the links between variables, so set up x and y lists
x_vars = ['q1', 'q2']
y_vars = ['gender', 'ethnicity']

# Add variable links and views (aggregations) on those links
stack.add_link(x=x_vars, y=y_vars, views=['cbase', 'c%'])

# Save the stack
stack.save(path_stack)

# See what's in the stack (what aggregations exist already?)
print stack.describe()

#       data     filter   x          y                     view  #
# 0  Example  no_filter  q1     gender       x|frequency||y||c%  1
# 1  Example  no_filter  q1     gender  x|frequency|x:y|||cbase  1
# 2  Example  no_filter  q1  ethnicity       x|frequency||y||c%  1
# 3  Example  no_filter  q1  ethnicity  x|frequency|x:y|||cbase  1
# 4  Example  no_filter  q2     gender       x|frequency||y||c%  1
# 5  Example  no_filter  q2     gender  x|frequency|x:y|||cbase  1
# 6  Example  no_filter  q2  ethnicity       x|frequency||y||c%  1
# 7  Example  no_filter  q2  ethnicity  x|frequency|x:y|||cbase  1

# These are the keys under which our base and column percentages
# are saved. We'll use them to pull those views out of the stack.
view_keys = [
    'x|frequency|x:y|||cbase',
    'x|frequency||y||c%'
]

# Isolate a single aggregation in the stack and take a look at it
data_key = 'Example'
filter_key = 'no_filter'
x_key = x_vars[0]
y_key = y_vars[0]
view_key = 'x|frequency||y||c%'
# Look at the raw dataframe
df = stack[data_key][filter_key][x_key][y_key][view_key].dataframe
print df

# Question            gender           
# Values                   1          2
# Question Values                      
# q1       1        3.669028   3.532419
#          2        5.187247   4.462003
#          3       27.682186  27.980479
#          4       36.386640  36.277016
#          5        2.403846   2.300720
#          6        5.035425   6.460609
#          7       11.310729  10.388101
#          8        1.644737   1.533814
#          9        0.075911   0.023240
#          96       1.037449   1.161980
#          98       0.986842   1.510574
#          99       4.579960   4.369045
         
# Paint the labels onto the raw dataframe
print paint_dataframe(df, meta)

# Question                                                     gender. What is your gender?           
# Values                                                                    Male     Female
# Question                                Values                                                                        
# q1. Min fitness activity? Swimming                                    3.669028   3.532419
#                           Running/jogging                             5.187247   4.462003
#                           Lifting weights                            27.682186  27.980479
#                           Aerobics                                   36.386640  36.277016
#                           Yoga                                        2.403846   2.300720
#                           Pilates                                     5.035425   6.460609
#                           Football (soccer)                          11.310729  10.388101
#                           Basketball                                  1.644737   1.533814
#                           Hockey                                      0.075911   0.023240
#                           Other                                       1.037449   1.161980
#                           I regularly change my fitness activity      0.986842   1.510574
#                           Not applicable - I don't exercise           4.579960   4.369045
                                        
# Extract chains of links from the stack in preparation for the build.
# Chains are a subset of the stack drawn out in a special shape that
# represents a one-to-many set of relationships.
chains = stack.get_chain(x=x_vars, y=y_vars, views=view_keys, orient_on='x')

# The first chain is 'q1' to 'gender' and 'ethnicity'
print chains[0].describe()

#       data     filter   x          y                     view  #
# 0  Example  no_filter  q1     gender       x|frequency||y||c%  1
# 1  Example  no_filter  q1     gender  x|frequency|x:y|||cbase  1
# 2  Example  no_filter  q1  ethnicity       x|frequency||y||c%  1
# 3  Example  no_filter  q1  ethnicity  x|frequency|x:y|||cbase  1

# The second chain is 'q2' to 'gender' and 'ethnicity'
print chains[1].describe()

#       data     filter   x          y                     view  #
# 0  Example  no_filter  q2     gender       x|frequency||y||c%  1
# 1  Example  no_filter  q2     gender  x|frequency|x:y|||cbase  1
# 2  Example  no_filter  q2  ethnicity       x|frequency||y||c%  1
# 3  Example  no_filter  q2  ethnicity  x|frequency|x:y|||cbase  1

# Create a cluster and fill it with the chains
# The cluster is consumed by a build
cluster = qp.Cluster('Percentages')
cluster.add_chain(chains)
cluster.save(path_cluster)

# Use the cluster to build an XLSX
qp.ExcelPainter(
    path_excel=path_excel,
    meta=meta,
    cluster=[cluster],
    create_toc=False,
    display_names=['x', 'y']
)

print 'Finished!'
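
To make the two views used above more concrete, here is a minimal plain-Python sketch of what a column base and column percentages represent. It is not how Quantipy computes them: the function name, the unweighted counting and the sample cases are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def column_percentages(pairs):
    """Return (bases, percentages) for an iterable of (x_code, y_code) cases.

    bases[y] is the number of cases in column y; percentages[y][x] is the
    share of that column falling in row x, on a 0-100 scale.
    """
    counts = defaultdict(Counter)
    for x_code, y_code in pairs:
        counts[y_code][x_code] += 1
    bases = {y: sum(col.values()) for y, col in counts.items()}
    percentages = {
        y: {x: 100.0 * n / bases[y] for x, n in col.items()}
        for y, col in counts.items()
    }
    return bases, percentages


# Hypothetical cases: (answer to q1, gender), unweighted.
cases = [(1, 1), (2, 1), (2, 1), (3, 1), (1, 2), (3, 2)]
bases, percentages = column_percentages(cases)
```

For a single choice x variable the percentages in each column add up to 100, which is also true of the q1 by gender table printed earlier.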

More examples

There is so much more you can do with Quantipy... why don't you explore the docs to find out!
