

Guillotine

Simple URL Shortener hobby kit. Currently used to shorten URLs at GitHub.com, and also available as an installable Heroku app.

USAGE

The easiest way to use it is with the built-in memory adapter.

# app.rb
require 'guillotine'
module MyApp
  class App < Guillotine::App
    adapter = Guillotine::Adapters::MemoryAdapter.new
    set :service => Guillotine::Service.new(adapter)

    get '/' do
      redirect 'https://homepage.com'
    end
  end
end
# config.ru
require "rubygems"
require File.expand_path("../app.rb", __FILE__)
run MyApp::App

Once it's running, add URLs with a simple POST.

curl http://localhost:4567 -i \
  -F "url=http://techno-weenie.net"

You can specify your own code too:

curl http://localhost:4567 -i \
  -F "url=http://techno-weenie.net" \
  -F "code=abc"
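If you'd rather do that from Ruby, here's a minimal sketch using only the standard library's Net::HTTP. `shorten_request` and `shorten_url` are hypothetical helper names, not part of Guillotine. It assumes the app is listening on http://localhost:4567 as in the curl examples, and that the short URL comes back in the Location response header.

```ruby
require 'net/http'
require 'uri'

# Build the same request as the curl examples. curl -F sends multipart/form-data;
# this sends an application/x-www-form-urlencoded body instead, which Rack/Sinatra
# also parses into params.
def shorten_request(base, url, code = nil)
  uri = URI.parse(base)
  req = Net::HTTP::Post.new(uri.request_uri)
  form = { "url" => url }
  form["code"] = code if code # optional custom code, like -F "code=abc"
  req.set_form_data(form)
  [uri, req]
end

# Send the request and return the Location header (assumed to hold the short URL).
def shorten_url(base, url, code = nil)
  uri, req = shorten_request(base, url, code)
  res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
  res['Location']
end
```

Usage, with the app running: `shorten_url("http://localhost:4567", "http://techno-weenie.net", "abc")`.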

Sequel

The memory adapter sucks though: it keeps URLs only in process memory, so they're gone when the app restarts. You probably want to use a DB. The Sequel adapter works with anything the Sequel gem supports: SQLite, MySQL, PostgreSQL, and a bunch of other databases.

require 'guillotine'
require 'sequel'
module MyApp
  class App < Guillotine::App
    # in-memory SQLite; pass a filename, e.g. Sequel.sqlite('urls.db'), to persist
    db = Sequel.sqlite
    adapter = Guillotine::Adapters::SequelAdapter.new(db)
    set :service => Guillotine::Service.new(adapter)
  end
end

You'll need to initialize the DB schema with something like this (MySQL syntax shown; adjust it for your database):

CREATE TABLE IF NOT EXISTS `urls` (
  `url` varchar(255) DEFAULT NULL,
  `code` varchar(255) DEFAULT NULL,
  UNIQUE KEY `url` (`url`),
  UNIQUE KEY `code` (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Redis

Redis works well, too. The sample below is adapted from Katana, a hosted wrapper around Guillotine designed for Heroku.

require 'guillotine'
require 'redis'
require 'uri'

module MyApp
  class App < Guillotine::App
    # use redis adapter with redistogo on Heroku
    uri = URI.parse(ENV["REDISTOGO_URL"])
    redis = Redis.new(:host => uri.host, :port => uri.port, :password => uri.password)
    adapter = Guillotine::Adapters::RedisAdapter.new(redis)
    set :service => Guillotine::Service.new(adapter)
  end
end
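The example above calls `URI.parse(ENV["REDISTOGO_URL"])`, which raises if the variable isn't set (say, when running locally). Here's a minimal sketch of a hypothetical `redis_options` helper, not part of Guillotine or redis-rb, that builds the same options hash and falls back to an assumed local server at 127.0.0.1:6379:

```ruby
require 'uri'

# Hypothetical helper: turn a Redis URL like the one REDISTOGO_URL holds into the
# options passed to Redis.new, or fall back to a local server when unset.
def redis_options(url = ENV["REDISTOGO_URL"])
  return { :host => "127.0.0.1", :port => 6379 } if url.nil? || url.empty?
  uri = URI.parse(url) # unknown schemes like redis:// parse as URI::Generic
  opts = { :host => uri.host, :port => uri.port || 6379 }
  opts[:password] = uri.password if uri.password
  opts
end
```

You'd then write `redis = Redis.new(redis_options)` inside the app class.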

Riak

If you need to scale your URL shortening service out across a cluster, you can use Riak!

require 'guillotine'
require 'riak/client'
module MyApp
  class App < Guillotine::App
    client = Riak::Client.new :protocol => 'pbc', :pb_port => 8087
    bucket = client['guillotine']
    adapter = Guillotine::Adapters::RiakAdapter.new(bucket)
    set :service => Guillotine::Service.new(adapter)
  end
end

Cassandra

You can use Cassandra, too!

require 'guillotine'
require 'cassandra'

module MyApp
  class App < Guillotine::App
    cassandra = Cassandra.new('url_shortener', '127.0.0.1:9160')
    adapter = Guillotine::Adapters::CassandraAdapter.new(cassandra)

    set :service => Guillotine::Service.new(adapter)
  end
end

You need to create the keyspace and column families first:

CREATE KEYSPACE url_shortener;
USE url_shortener;

CREATE COLUMN FAMILY urls
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [{column_name: code, validation_class: UTF8Type}];

CREATE COLUMN FAMILY codes
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [{column_name: url, validation_class: UTF8Type}];

Domain Restriction

You can restrict which domains Guillotine will shorten.

require 'guillotine'
module MyApp
  class App < Guillotine::App
    adapter = Guillotine::Adapters::MemoryAdapter.new
    # pick one of the following; each `set` replaces the previous one

    # only this domain
    set :service => Guillotine::Service.new(adapter,
      :required_host => 'github.com')

    # or, any *.github.com domain
    set :service => Guillotine::Service.new(adapter,
      :required_host => /(^|\.)github\.com$/)

    # or set a simple wildcard
    set :service => Guillotine::Service.new(adapter,
      :required_host => '*.github.com')
  end
end
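To make the three rule styles concrete, here's a hypothetical `host_allowed?` sketch of how a URL's host could be checked against each one. It is not Guillotine's own code, and it assumes that the '*.github.com' wildcard matches subdomains but not the bare github.com domain.

```ruby
require 'uri'

# Hypothetical check of a URL's host against a :required_host-style rule.
def host_allowed?(url, rule)
  host = URI.parse(url).host.to_s.downcase
  case rule
  when Regexp
    rule.match?(host)                # e.g. /(^|\.)github\.com$/
  when /\A\*\./
    host.end_with?(rule[1..-1])      # '*.github.com' -> must end in '.github.com'
  else
    host == rule.downcase            # exact host, e.g. 'github.com'
  end
end
```

Note that a plain suffix test like `host.end_with?("github.com")` would also accept evilgithub.com; anchoring on the leading dot (or `(^|\.)` in the regexp) avoids that.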

Not TODO

  • Statistics
  • Authentication

Contributors

brianmario, kalw, max-power, mrtazz, snatchev, technoweenie


guillotine's Issues

Inverse transform

I entered https://git.io/vo4w5 into git.io because I wanted to know where it points. Obviously, shortening failed, since it makes no sense to shorten an already shortened URL. But when a short URL is supplied, could you expand it instead and show what the original URL was? Simply entering the URL into the browser's address bar is not an option, because it will follow the redirect and download the file, which may not be what you want.

This concerns the web interface, not command line. I see that in the command line you can see the redirect link.
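For scripts, one way to see where a short URL points without following it is to request it and read the Location header, since Net::HTTP does not follow redirects on its own. A minimal sketch; `expand_short_url` is a hypothetical helper name, and it assumes the service answers with an HTTP redirect:

```ruby
require 'net/http'
require 'uri'

# Return the redirect target of a short URL without following it,
# or nil if the response isn't a redirect.
def expand_short_url(short_url)
  uri = URI.parse(short_url)
  res = Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri.request_uri))
  end
  res.is_a?(Net::HTTPRedirection) ? res['Location'] : nil
end
```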

Default code generation strategy fails due to birthday paradox

The default short code generator only has enough entropy for about 9,300 URLs before the chance of a collision reaches 1%. This is known as the birthday paradox.

This is due to Adapter#shorten only using the last 32 bits (8 hex characters/4 bytes) of the MD5 of the URL, as demonstrated here:

irb(main):001:0> require 'base64'
=> true
irb(main):002:0> require 'digest/md5'
=> true
irb(main):003:0> hashed = Digest::MD5.hexdigest("birthday")
=> "cf673f7ee88828c9fb8f6acf2cb08403"
irb(main):004:0> hashed[-8..-1].to_i(16) == [hashed.to_i(16)].pack("N").unpack("N")[0]
=> true

This is the birthday probability table (source). Each cell is the approximate number of random outputs after which the probability of at least one collision reaches p:

| Bits | Possible outputs H (rounded) | p = 10^-18 | 10^-15 | 10^-12 | 10^-9 | 10^-6 | 0.1% | 1% | 25% | 50% | 75% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 66,000 | 2 | 2 | 2 | 2 | 2 | 11 | 36 | 190 | 300 | 430 |
| 32 | 4.3 × 10^9 | 2 | 2 | 2 | 2.9 | 93 | 2900 | 9300 | 50,000 | 77,000 | 110,000 |
| 64 | 1.8 × 10^19 | 6.1 | 190 | 6100 | 190,000 | 6,100,000 | 1.9 × 10^8 | 6.1 × 10^8 | 3.3 × 10^9 | 5.1 × 10^9 | 7.2 × 10^9 |
| 128 | 3.4 × 10^38 | 2.6 × 10^10 | 8.2 × 10^11 | 2.6 × 10^13 | 8.2 × 10^14 | 2.6 × 10^16 | 8.3 × 10^17 | 2.6 × 10^18 | 1.4 × 10^19 | 2.2 × 10^19 | 3.1 × 10^19 |
| 256 | 1.2 × 10^77 | 4.8 × 10^29 | 1.5 × 10^31 | 4.8 × 10^32 | 1.5 × 10^34 | 4.8 × 10^35 | 1.5 × 10^37 | 4.8 × 10^37 | 2.6 × 10^38 | 4.0 × 10^38 | 5.7 × 10^38 |
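The 9,300 figure follows from the standard birthday-bound approximation n(p; H) ≈ sqrt(2 · H · ln(1 / (1 − p))), where H = 2^bits is the number of possible codes. A quick sketch to check a row (`codes_before_collision` is just an illustrative name):

```ruby
# Approximate number of random codes drawn from 2**bits possibilities before
# the probability of at least one collision reaches p.
def codes_before_collision(bits, p)
  h = 2.0**bits
  Math.sqrt(2 * h * Math.log(1.0 / (1.0 - p)))
end

[32, 64].each do |bits|
  puts "#{bits} bits: ~#{codes_before_collision(bits, 0.01).round} codes for a 1% collision chance"
end
```

For 32 bits this gives roughly 9,290, which the table rounds to 9300.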

Here's a program that demonstrates the problem:

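package main

import (
    "crypto/md5"
    "crypto/rand"
    "fmt"
)

func main() {
    // Maps each truncated 32-bit key to the full MD5 that produced it.
    res := make(map[string][16]byte)
    data := make([]byte, 16)
    for n := 0; ; n++ {
        // Hash 16 random bytes, standing in for a new URL.
        rand.Read(data)
        hash := md5.Sum(data)
        // Keep only the last 4 bytes (32 bits), as Adapter#shorten does.
        str := string(hash[12:])
        if old, ok := res[str]; ok {
            fmt.Printf("collision after %d tries: %x vs %x\n", n, hash, old)
            return
        }
        res[str] = hash
    }
}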
$ go run paradox.go
collision after 47836 tries: 8fbe4353a14023b5dabe513932b4a308 vs 5a3e7e510588862076ff2ada32b4a308
$ go run paradox.go
collision after 165556 tries: 26429a5f96eca5c86fc7e1feddd760d2 vs 44bb5ec7290a8c309aeb9d8dddd760d2
$ go run paradox.go
collision after 104012 tries: 564d3703f2f8d5629582c05bc3de3e36 vs 48731311e763d00f056d235ac3de3e36

Create with an empty code

Hello,

I'm working on a Sinatra app that passes the parameters through a form, and when we pass an empty code, Guillotine tries to create a URL at the root path.

I'd like to avoid workarounds like:

params[:code] = nil if params[:code] == ""

Could Guillotine ignore an empty code?
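Until then, a slightly more general version of that workaround is a small normalizing helper. This is a sketch; `normalize_code` is a hypothetical name, and it assumes surrounding whitespace should also be stripped:

```ruby
# Treat a missing, empty, or whitespace-only code as "no code",
# so a code is generated instead of mapping the root path.
def normalize_code(code)
  code = code.to_s.strip
  code.empty? ? nil : code
end
```

In the Sinatra app that would be `params[:code] = normalize_code(params[:code])` before handing the params to Guillotine.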

Output urls in response body

It'd be nice to get the short URL in the response body, so you don't need the extra sed:

curl -i http://git.io -F "url=https://github.com/rapportive/repo/commit/$(git rev-parse HEAD)" | sed -n 's/Location: \(.*\)/\1/p'

=>

curl http://git.io -F "url=https://github.com/rapportive/repo/commit/$(git rev-parse HEAD)"

way to overwrite/delete/list db keys?

Is there any way to overwrite, list, or delete keys via GET or POST? I'm guessing not since you don't support authentication, but I thought I'd check anyway. (Disclaimer: I have no background in Ruby or web stuff.) It would be great not to have to fire up a redis-cli and poke around that way.

Doesn't work on ruby 1.8.x

It has to do with the url shorten code:

>> Digest::MD5.hexdigest(url).to_i(16)
=> 191415658344158766168031473277922803570
>> [Digest::MD5.hexdigest(url).to_i(16)].pack("N")
RangeError: bignum too big to convert into `unsigned long'
    from (irb):6:in `pack'
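One portable way around the RangeError is to mask off the low 32 bits before packing, so the value always fits an unsigned 32-bit integer. Since the irb session in the birthday-paradox issue shows newer Rubies already pack just those low 32 bits, this should yield the same 4 bytes there. A sketch, with `md5_low32_bytes` as a hypothetical name:

```ruby
require 'digest/md5'

# Keep only the low 32 bits of the MD5 before pack("N"), which avoids
# "bignum too big to convert" on Ruby 1.8.
def md5_low32_bytes(url)
  [Digest::MD5.hexdigest(url).to_i(16) & 0xFFFFFFFF].pack("N")
end
```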

The most common url shortening algorithms involve mapping a unique incrementing ID to some custom encoding:

How to convert the id to a shortened URL:

  • Think of an alphabet you want to use. In your case that's [a-zA-Z0-9]. It contains 62 characters.
  • Take the auto-generated unique numerical key (auto-incremented id): for example 125 (a decimal number)
  • Now you have to convert the 125 (base 10) to X (base 62). This will then be {2}{1} (2×62+1=125).
  • Now map the symbols {2} and {1} to your alphabet. Say {0} = 'a', {25} = 'z' and so on. We will have {2} = 'c' and {1} = 'b'. So '/cb' will be your shortened URL.

http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener
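The steps above can be sketched in Ruby like this, using the same alphabet ordering ({0} = 'a', {25} = 'z', then 'A'-'Z', then '0'-'9'). The method names are illustrative, and this assumes you already have a unique auto-incremented ID from your datastore:

```ruby
# a-z, A-Z, 0-9: 62 symbols, so {0} = 'a', {25} = 'z', {61} = '9'.
ALPHABET = (('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a).join
BASE = ALPHABET.length

# Convert a non-negative integer ID to base 62, most significant digit first.
def encode_id(id)
  raise ArgumentError, "id must be non-negative" if id < 0
  return ALPHABET[0] if id.zero?
  digits = []
  while id > 0
    id, rem = id.divmod(BASE)
    digits.unshift(ALPHABET[rem])
  end
  digits.join
end

# Inverse: map each symbol back to its digit value.
def decode_code(code)
  code.each_char.reduce(0) { |n, ch| n * BASE + ALPHABET.index(ch) }
end
```

For example, `encode_id(125)` gives "cb", matching the worked example, and `decode_code("cb")` gives 125.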

Riak has no way to generate such an auto-incrementing ID, though. I have some thoughts on doing host-based counters, but it's way out of scope. Maybe if someone gets really bored...
