

Gattica

Gattica is an easy-to-use gem for getting data from the Google Analytics API.

Features

  • Supports: metrics, dimensions, sorting, filters, goals, and segments.
  • Handles accounts with over 1000 profiles
  • Returns data as: hash, JSON, CSV

How to export Google Analytics data using Ruby (Links to my blog post on Seer Interactive)


Quick Start

Here are the bare basics to get you up and running.

Installation

Add Gattica to your Gemfile

gem 'gattica', :git => 'https://github.com/chrisle/gattica.git'

Don't forget to bundle install:

$ bundle install

Log in, get a list of accounts, pick an account, and get data:

# Include the gem
require 'gattica'

# Login
ga = Gattica.new({ 
    :email => 'you@example.com',
    :password => 'password'
})

# Get a list of accounts
accounts = ga.accounts

# Choose the first account
ga.profile_id = accounts.first.profile_id

# Get the data
data = ga.get({ 
    :start_date   => '2011-01-01',
    :end_date     => '2011-04-01',
    :dimensions   => ['month', 'year'],
    :metrics      => ['visits', 'bounces'],
})

# Show the data
puts data.inspect

General Usage

Create your Gattica object

ga = Gattica.new({ :email => 'you@example.com', :password => 'password' })
puts ga.token   # => returns a big alpha-numeric string

Query for accounts you have access to

# Retrieve a list of accounts
accounts = ga.accounts

# Show information about accounts
puts "---------------------------------"
puts "Available profiles: " + accounts.count.to_s
accounts.each do |account|
  puts "   --> " + account.title
  puts "   last updated: " + account.updated.inspect
  puts "   web property: " + account.web_property_id
  puts "     profile id: " + account.profile_id.inspect
  puts "          goals: " + account.goals.count.inspect
end

Set which profile Gattica needs to use

# Tell Gattica to query profile ID 5555555
ga.profile_id = 5555555 

Get data from Google Analytics

The get method queries Google Analytics and returns a Gattica::DataSet.

Here's an example:

# Get the number of visitors by month from Jan 1st to April 1st.
data = ga.get({ 
    :start_date   => '2011-01-01',
    :end_date     => '2011-04-01',
    :dimensions   => ['month', 'year'],
    :metrics      => ['visitors']
})
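
The date options are plain 'YYYY-MM-DD' strings, so they can be computed instead of hard-coded. Here is a minimal sketch using Ruby's standard Date class. The helper name date_range_options is hypothetical and not part of Gattica.

```ruby
require 'date'

# Hypothetical helper (not part of Gattica): builds the :start_date and
# :end_date options for a window of `days` days ending on `today`.
def date_range_options(today, days)
  {
    :start_date => (today - days).strftime('%Y-%m-%d'),
    :end_date   => today.strftime('%Y-%m-%d')
  }
end

# A fixed date keeps the example reproducible; Date.today would be typical.
options = date_range_options(Date.new(2011, 4, 1), 90)
```

The resulting hash could then be merged with the other options, for example ga.get(options.merge(:dimensions => ['month', 'year'], :metrics => ['visitors'])).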

Using Dimension & Metrics

Here are some additional examples that illustrate different things you can do with dimensions and metrics.

Sorting

# Sorting by number of visits in descending order (most visits at the top)
data = ga.get({ 
    :start_date   => '2011-01-01',
    :end_date     => '2011-04-01',
    :dimensions   => ['month', 'year'],
    :metrics      => ['visits'],
    :sort         => ['-visits']
})

Limiting results

# Limit the number of results to 25.
data = ga.get({ 
    :start_date   => '2011-01-01',
    :end_date     => '2011-04-01',
    :dimensions   => ['month', 'year'],
    :metrics      => ['visits'],
    :max_results  => 25 
})

Results as a Hash

my_hash = data.to_h['points']

# => 
#   [{
#     "xml"         => "<entry gd:etag=\"W/&quot;....  </entry>", 
#     "id"          => "http://www.google.com/analytics/feeds/data?...", 
#     "updated"     => Thu, 31 Mar 2011 17:00:00 -0700, 
#     "title"       => "ga:month=01 | ga:year=2011", 
#     "dimensions"  => [{:month=>"01"}, {:year=>"2011"}], 
#     "metrics"     => [{:visitors=>6}]
#   },
#   {
#     "xml"         => ...
#     "id"          => ...
#     "updated"     => ...
#     ...
#   }]
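
Each point keeps its dimensions and metrics as arrays of single-pair hashes, which can be awkward to work with. The sketch below merges them into one flat hash per row. It runs against a hand-written sample shaped like the output above (the values are made up), and flatten_point is a hypothetical helper, not a Gattica method.

```ruby
# Sample data mirroring the structure documented above; not real API output.
points = [
  { "title"      => "ga:month=01 | ga:year=2011",
    "dimensions" => [{ :month => "01" }, { :year => "2011" }],
    "metrics"    => [{ :visitors => 6 }] },
  { "title"      => "ga:month=02 | ga:year=2011",
    "dimensions" => [{ :month => "02" }, { :year => "2011" }],
    "metrics"    => [{ :visitors => 9 }] }
]

# Hypothetical helper: merges every dimension and metric pair into one hash.
def flatten_point(point)
  (point["dimensions"] + point["metrics"]).reduce({}) do |row, pair|
    row.merge(pair)
  end
end

rows = points.map { |point| flatten_point(point) }
```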

JSON formatted string

# Return data as a json string. (Useful for NoSQL databases)
my_json = data.to_h['points'].to_json

# => 
#   "[{
#       \"xml\":\"<entry> .... </entry>\",
#       \"id\":\"http://www.google.com/analytics/feeds/data? ...",
#       \"updated\":\"2011-03-31T17:00:00-07:00\",
#       \"title\":\"ga:month=01 | ga:year=2011\",
#       \"dimensions\":[{\"month\":\"01\"},{\"year\":\"2011\"}],
#       \"metrics\":[{\"visitors\":6}]
#     },
#     { 
#       \"xml\":\"<entry> .... </entry>\",
#       \"id\":\"http://www.google.com/analytics/feeds/data? ...",
#       ...
#   }]"
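
When the JSON string is read back later, Ruby's standard JSON library returns string keys, even where the original hash used symbols. Here is a small round-trip sketch against a hand-written sample shaped like the output above (the values are made up):

```ruby
require 'json'

# Sample data mirroring the documented structure; not real API output.
points = [
  { "title"      => "ga:month=01 | ga:year=2011",
    "dimensions" => [{ :month => "01" }, { :year => "2011" }],
    "metrics"    => [{ :visitors => 6 }] }
]

my_json  = points.to_json          # serialize, as in the example above
restored = JSON.parse(my_json)     # parse it back into Ruby objects

# Symbol keys such as :visitors come back as the string "visitors".
visitors = restored.first["metrics"].first["visitors"]
```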

CSV formatted string

# Return the data in CSV format.  (Useful for using in Excel.)

# Short CSV will only return your dimensions and metrics:
short_csv = data.to_csv(:short)   

# => "month,year,visitors\n\n01,2011, ...."

# Long CSV will get you a few additional columns:
long_csv = data.to_csv            

# => "id,updated,title,month,year,visitors\n\nhttp:// ..."
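
The CSV string can be read back with Ruby's standard CSV library. The sketch below assumes a short CSV shaped like the example output above, including the blank line after the header row. The data values are made up.

```ruby
require 'csv'

# Hand-written sample in the shape shown above; not real API output.
short_csv = "month,year,visitors\n\n01,2011,34552\n02,2011,36732\n"

# :skip_blanks drops the empty line that follows the header row.
table = CSV.parse(short_csv, :headers => true, :skip_blanks => true)

# Each row becomes a hash keyed by column name; all values are strings.
rows = table.map { |row| row.to_h }
```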

DIY formatting

# You can work directly with the 'points' method to return data.
data.points.each do |data_point|
  month = data_point.dimensions.detect { |dim| dim.key == :month }.value
  year = data_point.dimensions.detect { |dim| dim.key == :year }.value
  visitors = data_point.metrics.detect { |metric| metric.key == :visitors }.value
  puts "#{month}/#{year} got #{visitors} visitors"
end

# => 
#   01/2011 got 34552 visitors
#   02/2011 got 36732 visitors
#   03/2011 got 45642 visitors
#   04/2011 got 44456 visitors
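
The repeated detect calls can be folded into a small lookup helper. In this sketch, value_for is a hypothetical helper, and the two Structs are stand-ins for Gattica's point objects, assumed to expose key and value as in the loop above.

```ruby
# Stand-ins for Gattica's objects, for illustration only.
pair_class  = Struct.new(:key, :value)
point_class = Struct.new(:dimensions, :metrics)

# Hypothetical helper: returns the value stored under `key`, or nil.
def value_for(pairs, key)
  pair = pairs.detect { |candidate| candidate.key == key }
  pair && pair.value
end

point = point_class.new(
  [pair_class.new(:month, '01'), pair_class.new(:year, '2011')],
  [pair_class.new(:visitors, 34552)]
)

month    = value_for(point.dimensions, :month)
year     = value_for(point.dimensions, :year)
visitors = value_for(point.metrics, :visitors)
summary  = "#{month}/#{year} got #{visitors} visitors"
```

Returning nil for a missing key avoids the NoMethodError that detect { ... }.value raises when a dimension or metric is absent.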

Using Filters, Goals, and Segments

Learn more about filters: Google Data feed filtering reference

Get profiles with goals

# Get all the profiles that have goals
profiles_with_goals = accounts.select { |account| account.goals.count > 0 }

# => 
#   [{
#     "id"                => "http://www.google.com/analytics/feeds/accounts/ga:...",
#     "updated"           => Mon, 16 May 2011 16:40:30 -0700, 
#     "title"             => "Profile Title", 
#     "table_id"          => "ga:123456", 
#     "account_id"        => 123456, 
#     "account_name"      => "Account name", 
#     "profile_id"        =>  123456, 
#     "web_property_id"   => "UA-123456-3", 
#     "goals"=>[{
#         :active   => "true", 
#         :name     => "Goal name", 
#         :number   => 1, 
#         :value    => 0.0
#     }]
#   }, 
#   {
#     "id"                => "http://www.google.com/analytics/feeds/accounts/ga:...",
#     "updated"           => Mon, 16 May 2011 16:40:30 -0700, 
#     "title"             => "Profile Title", 
#     ...
#   }]

List available segments

# Get all the segments that are available to you
segments = ga.segments

# Segments with negative gaid are default segments from Google. Segments
# with positive gaid numbers are custom segments that you created.
# =>
#   [{
#     "id"          => "gaid::-1", 
#     "name"        => "All Visits", 
#     "definition"  => " "
#   }, 
#   {
#     "id"          => "gaid::-2", 
#     "name"        => "New Visitors", 
#     "definition"  => "ga:visitorType==New Visitor"
#   }, 
#   {
#     "id"          => ... # more default segments
#     "name"        => ...
#     "definition"  => ...
#   },
#   {
#     "id"          => "gaid::12345678", 
#     "name"        => "Name of segment", 
#     "definition"  => "ga:keyword=...."
#   }, 
#   {
#     "id"          => ... # more custom segments
#     "name"        => ...
#     "definition"  => ...
#   }]
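
The sign of the gaid number can be used to separate Google's default segments from your own. This sketch assumes the segments are available as hashes with string keys, as in the output above. The objects Gattica returns may need converting first, and gaid_number is a hypothetical helper.

```ruby
# Hand-written sample in the shape shown above.
segments = [
  { "id" => "gaid::-1", "name" => "All Visits", "definition" => " " },
  { "id" => "gaid::-2", "name" => "New Visitors",
    "definition" => "ga:visitorType==New Visitor" },
  { "id" => "gaid::12345678", "name" => "Name of segment",
    "definition" => "ga:keyword=...." }
]

# Hypothetical helper: extracts the numeric part of a "gaid::N" id.
def gaid_number(segment)
  segment["id"].sub("gaid::", "").to_i
end

# Negative ids are Google's defaults; positive ids are custom segments.
default_segments, custom_segments =
  segments.partition { |segment| gaid_number(segment) < 0 }
```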

Query by segment

# Return visits and bounces for mobile traffic 
# (Google's default user segment gaid::-11)

mobile_traffic = ga.get({ 
  :start_date   => '2011-01-01', 
  :end_date     => '2011-02-01', 
  :dimensions   => ['month', 'year'],
  :metrics      => ['visits', 'bounces'],
  :segment      => 'gaid::-11'
})

Filtering

Filters are boolean expressions in strings. Here's an example of an equality:

# Filter by Firefox users
firefox_users = ga.get({
  :start_date   => '2010-01-01', 
  :end_date     => '2011-01-01',
  :dimensions   => ['month', 'year'],
  :metrics      => ['visits', 'bounces'],
  :filters      => ['browser == Firefox']
})

Here's an example of greater-than-or-equal-to:

# Filter where visits is >= 10000
lots_of_visits = ga.get({
  :start_date   => '2010-01-01', 
  :end_date     => '2011-02-01',
  :dimensions   => ['month', 'year'],
  :metrics      => ['visits', 'bounces'],
  :filters      => ['visits >= 10000']
})

Pass multiple filters as an array. Currently, they can only be joined by 'AND'.

# Firefox users and visits >= 10000
firefox_users_with_many_visits = ga.get({
  :start_date   => '2010-01-01', 
  :end_date     => '2011-02-01',
  :dimensions   => ['month', 'year'],
  :metrics      => ['visits', 'bounces'],
  :filters      => ['browser == Firefox', 'visits >= 10000']
})
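
In the Google Analytics filter syntax, a semicolon joins expressions with AND and a comma joins them with OR, which is why an array of filters maps naturally onto AND. The sketch below shows how such an array could be turned into a filters expression. build_filters is a hypothetical helper and is not Gattica's actual implementation.

```ruby
# Hypothetical helper: converts 'field operator value' strings into a
# single Google Analytics filters expression.
def build_filters(filters)
  expressions = filters.map do |filter|
    field, operator, value = filter.split(' ', 3)
    "ga:#{field}#{operator}#{value}"
  end
  expressions.join(';')   # ';' means AND in the filter syntax
end

expression = build_filters(['browser == Firefox', 'visits >= 10000'])
```

The expression would still need to be URL-encoded before being sent as a query parameter.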

Even More Examples!

Top 25 keywords that drove traffic

Output the top 25 keywords that drove traffic to your website in the first quarter of 2011.

# Get the top 25 keywords that drove traffic
data = ga.get({ 
  :start_date => '2011-01-01',
  :end_date => '2011-04-01',
  :dimensions => ['keyword'],
  :metrics => ['visits'],
  :sort => ['-visits'],
  :max_results => 25 
})

# Output our results
data.points.each do |data_point|
  kw = data_point.dimensions.detect { |dim| dim.key == :keyword }.value
  visits = data_point.metrics.detect { |metric| metric.key == :visits }.value
  puts "#{visits} visits => '#{kw}'"
end

# =>
#   19667 visits => '(not set)'
#   1677 visits => 'keyword 1'
#   178 visits => 'keyword 2'
#   165 visits => 'keyword 3'
#   161 visits => 'keyword 4'
#   112 visits => 'keyword 5'
#   105 visits => 'seo company reviews'
#   ...

Additional Options & Settings

Setting HTTP timeout

If you have a lot of profiles in your account (1000 or more), querying for accounts may take over a minute. Net::HTTP will time out and raise an exception.

To avoid this, specify a timeout when you instantiate the Gattica object:

ga = Gattica.new({ 
    :email => 'you@example.com',
    :password => 'password',
    :timeout => 600  # Set timeout for 10 minutes!
})

The default timeout is 300 seconds (5 minutes). Change the default in: lib/gattica/settings.rb

For reference, querying 1000 profiles with 2-5 goals each takes around 90-120 seconds.
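
For background, Net::HTTP's own read timeout defaults to 60 seconds, which is why a query that runs for over a minute fails unless the timeout is raised. The sketch below shows only the underlying Net::HTTP setting. How Gattica applies :timeout internally is an assumption here.

```ruby
require 'net/http'

# Creating the object does not open a connection, so this runs offline.
http = Net::HTTP.new('www.google.com', 443)

default_timeout = http.read_timeout   # Net::HTTP's built-in default, in seconds
http.read_timeout = 600               # wait up to 10 minutes for a response
```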

Reusing a session token

You can reuse an older session if you still have the token string. Google recommends doing this to avoid authenticating over and over.

my_token = ga.token # => 'DSasdf94...'

# Sometime later, you can initialize Gattica with the same token
ga = Gattica.new({ :token => my_token })

If your token times out, you will need to re-authenticate.
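
One way to keep a token between runs is to store it in a file that only your user can read. This is a minimal sketch. save_token and load_token are hypothetical helpers, not part of Gattica, and the temporary directory stands in for wherever you would actually keep the file.

```ruby
require 'tmpdir'

# Hypothetical helper: writes the token and restricts the file to its owner,
# since the token grants access to the account.
def save_token(path, token)
  File.write(path, token)
  File.chmod(0600, path)
end

# Hypothetical helper: returns the stored token, or nil if none was saved.
def load_token(path)
  File.exist?(path) ? File.read(path).strip : nil
end

token_path = File.join(Dir.mktmpdir, 'gattica_token')
save_token(token_path, 'DSasdf94...')
cached_token = load_token(token_path)
```

When load_token returns nil, or Google rejects the stored token, fall back to logging in with an email and password.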

Specifying your own headers

Google expects every HTTP request to carry an 'Authorization' header. Gattica handles this header automatically. If you want to specify your own headers, you can do that when you instantiate Gattica:

ga = Gattica.new({
    :token => 'DSasdf94...', 
    :headers => { 'My-Special-Header' => 'my_custom_value' }
})

Using an HTTP proxy

You can configure an HTTP proxy when you instantiate the Gattica object:

ga = Gattica.new({ 
    :email => 'you@example.com',
    :password => 'password',
    :http_proxy => { :host => 'proxy.example.com', :port => 8080, :user => 'username', :password => 'password' }
})
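
For reference, Ruby's Net::HTTP accepts the same four proxy values as positional arguments. The sketch below shows that mapping. Whether Gattica passes :http_proxy through in exactly this way is an assumption.

```ruby
require 'net/http'

proxy = { :host => 'proxy.example.com', :port => 8080,
          :user => 'username', :password => 'password' }

# Creating the object does not open a connection, so this runs offline.
http = Net::HTTP.new('www.google.com', 443,
                     proxy[:host], proxy[:port],
                     proxy[:user], proxy[:password])

using_proxy = http.proxy?
```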

History

Version history

0.6.1

  • Incorporated fixes by vgololobov
    • Removed circular dependency
    • Fixed 1.9.3 init exception #6

0.6.0

  • Update to use Google Analytics v2.4 management API

    TL;DR: Uses the v2.4 API now because Google deprecated <2.3.

    • :) - Drop-in replacement for you.
    • :) - Won't timeout anymore.
    • :) - Accounts method might be faster if you have a few profiles
    • :( - Accounts method is notably slower if you have >1000 profiles.

    Google has changed the output of the API < 2.3. Most notable changes were the output of what was the /management/accounts/default call. Some of the XML changed, but most notably it didn't return everything all at once. It used to look like this: http://bit.ly/w6Ummj

  • Fixed token [deviantech]

0.5.1

  • Added some tests - needs more work :(

0.4.7

  • Removed version numbers [john mcgrath]

0.4.6

  • Removed monkey patch [mathieuravaux]

0.4.4

  • Added a configuration file to unit tests
  • Removed version.rb. Not needed. (thanks John McGrath see: github.com/john)
  • Migrated examples and rewrote README file

0.4.3

  • FIXED: Typo in start-index parameter
  • Refactored Engine class into its own file.
  • Began to re-style code to wrap at 80 characters
  • Added some unit tests

0.4.2

  • Added Ruby 1.9 support (Thanks @mathieuravaux https://github.com/mathieuravaux)
  • Uses hpricot 0.8.4 now. 0.8.3 segfaults.
  • Added ability to change the timeout when requesting analytics from Google
  • Added the ability to use max_results

0.3.2.scottp

  • scottp Added Analytics API v2 header, and basic support for "segment" argument.

0.3.2

  • er1c updated to use standard Ruby CSV library

0.3.0

  • Support for filters (filters are all AND'ed together, no OR yet)

0.2.1

  • More robust error checking on HTTP calls
  • Added to_xml to get raw XML output from Google

0.2.0 / 2009-04-27

  • Changed initialization format: pass a hash of options rather than individual email, password and profile_id
  • Can initialize with a valid token and use that instead of requiring email/password each time
  • Can initialize with your own logger object instead of having to use the default (useful if you're using with Rails, initialize with RAILS_DEFAULT_LOGGER)
  • Show error if token is invalid or expired (Google returns a 401 on any HTTP call)
  • Started tests

0.1.4 / 2009-04-22

  • Another attempt at getting the gem to build on github

0.1.3 / 2009-04-22

  • Getting gem to build on github

0.1.2 / 2009-04-22

  • Updated readme and examples, better documentation throughout

0.1.1 / 2009-04-22

  • When outputting as CSV, surround each piece of data with double quotes (it appears pretty common for various properties, like Browser name, to contain commas)

0.1.0 / 2009-03-26

  • Basic functionality working well. Can't use filters yet.

Maintainer history


gattica's Issues

Segmentation issue plus

I just updated to the latest version of the gem, now when I try to pull the list of segments I get the following error:

"The requested URL /analytics/v2.4/management/segments was not found on this server. That’s all we know."

I have no idea why, I looked through the code and the Google Docs for the API, which states that /analytics/v2.4/management/segments is the URL.

Also, another strange thing happens: when I pull data from the API (such as a report), I have to alter your code to make it work, but I don't know why. It should work just as you've written it. Below is the example:

for the do_http_get method:
I update:
response, data = @http.get(query_string, @headers)

to:
response = @http.get(query_string, @headers)
data = response.body

Splitting up the code makes it work; otherwise data is nil. Again, it should work from my understanding of Ruby, but for some reason it doesn't. I have the latest version of Ruby.

Thanks for your help, this is truly an awesome gem!!!
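
A likely explanation for the nil, offered as an assumption and not a confirmed diagnosis: on recent Ruby versions Net::HTTP#get returns a single response object, and assigning one object to two variables leaves the second one nil. The sketch below reproduces that behaviour with a plain stand-in object instead of a real HTTP response.

```ruby
# Stand-in for a response object that cannot be unpacked into two values.
fake_response = Object.new
def fake_response.body
  'payload'
end

# Only the first variable receives the object; `data` is left as nil.
response, data = fake_response

# Reading the body explicitly works regardless of the Ruby version.
body = response.body
```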

OAuth 2.0 support

I noticed that grabbing a token from /ClientLogin gives you a token that expires in ~14 days. According to the docs here: https://developers.google.com/analytics/devguides/reporting/core/v2/ and here https://developers.google.com/analytics/devguides/config/mgmt/v2/ V2.4 supports OAuth 2.0.

I don't see any pull requests or anything referencing adding support for OAuth 2.0 so I was going to add in support for it unless someone has already started on it somewhere?? One of the applications that I'm working on looks to need it at this point.

You have a nil object when you didn’t expect it!

Hi Chris, thanks for helping!

I was trying to follow your instructions, but when I run:
ga = Gattica.new({ :email => 'MY EMAIL',
:password => 'MY PASSWORD',
:timeout => 500 })

I keep getting:

You have a nil object when you didn’t expect it!
You might have expected an instance of Array.
The error occurred while evaluating nil.split
Any idea what could be wrong?

Other than the above message, I don't get any other feedback.

Thanks!

Read: Google Analytics Data Export API with Ruby + Gattica | SEER Interactive

Passing the API key

How does one pass the API key to Google Analytics in order to get past the rate limits?

Cheers.

filters shouldn't be dependent on available dimensions

Right now, the gem only lets you filter on a dimension that's being used in the query. But based on the Query Explorer (http://code.google.com/apis/analytics/docs/gdata/gdataExplorer.html), it doesn't appear that the filters need to be dependent on the selected dimensions...

.rvm/gems/ruby-1.8.7-p352/gems/gattica-0.4.3/lib/gattica.rb:359:in `validate_and_clean': You are trying to filter by fields that are not in the available dimensions or metrics: source == google (GatticaError::InvalidSort)

It seems logical to de-link these in the gem, because it would free up extra dimensions for reporting; otherwise, the filtered dimension is pretty much a waste of space. For example, if I'm looking for only Google / CPC data, there's no reason why I should have filtered columns that all read "google", "cpc"...

Thanks for the work, btw! It's saved me a lot of time with a project I'm working on...

Gattica::DataSet to_hash method returning an array?

Hi
I just installed the gem and I'm playing around to get used to it, but I ran into a weird case in which the to_hash method for a Gattica::DataSet object returns an array.

This is my code
1.9.3p0 :012 > visits.class
=> Gattica::DataSet

1.9.3p0 :013 > visits.to_hash.class
=> Array

The object itself contains an array of hashes, but the method's name is a little bit misleading since I was expecting to retrieve a hash

Thanks for the hard work!

Not all goal data retrieved

Goal data is not retrieved for 'event' goals.

On one website, I have 1 goal (#1) that is a 'url destination' goal, and 4 other goals that are all events (#2, 6, 11, and 12). When calling account.goals in Gattica, only one goal (#1) is returned.

I can still get the metrics for the other goals, they just aren't returned by account.goals.
