OKCupid hacker scripts, data, etc

Tools for crawling OKCupid for users and their profiles.

Pulling Data

  • FindUsers.py - Finds the usernames (with age and gender) of all current OKC users in the target location and writes them to a CSV file.
  • FetchProfiles.py - Takes the usernames CSV produced by FindUsers.py, pulls the profiles one at a time, and writes them to a new CSV file.
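
The two-step flow above (usernames CSV in, profiles CSV out, one user at a time) can be sketched roughly as follows. This is not the code in FetchProfiles.py: the column names (`username`, `age`, `gender`, `height`), the `fetch_one` callable, and the `delay_seconds` parameter are all assumptions made for illustration, and the actual network fetch is replaced by a stub.

```python
import csv
import io
import time


def fetch_profiles(usernames_csv, profiles_csv, fetch_one, delay_seconds=0.0):
    """Read usernames from one CSV stream and write one profile row per user.

    fetch_one is a hypothetical callable (username -> dict of profile fields,
    or None when the profile is unavailable). It stands in for the real fetch.
    """
    reader = csv.DictReader(usernames_csv)
    writer = None
    written = 0
    for row in reader:
        profile = fetch_one(row["username"])  # "username" column is assumed
        if profile is None:
            continue  # skip users whose profile could not be pulled
        out_row = {**row, **profile}
        if writer is None:
            # The header is taken from the first successful row.
            writer = csv.DictWriter(
                profiles_csv, fieldnames=list(out_row), extrasaction="ignore"
            )
            writer.writeheader()
        writer.writerow(out_row)
        written += 1
        time.sleep(delay_seconds)  # pause between profiles
    return written


def fake_fetch(username):
    """Stub used only for this example; returns a made-up field."""
    return None if username == "gone" else {"height": "70"}


src = io.StringIO("username,age,gender\nalice,29,f\ngone,40,m\nbob,35,m\n")
dst = io.StringIO()
n_written = fetch_profiles(src, dst, fake_fetch)
```

Passing the fetch step in as a callable keeps the CSV handling testable without touching the network; a real run would open files instead of `io.StringIO` and use a non-zero delay.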

Analyzing Data

  • anonymized_profiles.sf.20120630.csv - CSV file containing profile information (taken from public profiles on 2012/06/30) for n=59,946 users. It does NOT include essays, personal questions, or any other potentially individually identifiable data. The users included are those who
    • were members on 2012/06/26
    • were living within 25 miles of San Francisco
    • had been online in the previous year
    • had at least one photo in their profile
  • anonymized_usernames.sf.20140218.csv - To study temporal population changes, this dataset contains just the age and gender of users at a later point in time (2014/02/18, going by the filename), selected with the same filters as above. Anonymized user ids map to users in the other file if the user existed in both.
  • Analysis.R - R script to read in profile data and
    • produce a mosaicplot cross-classifying gender and sexual orientation
    • produce a histogram of heights split by gender
  • ReadingLevel.py - Tools for determining the reading level of a chunk of text, along with some useful statistics about it (number of words, sentences, syllables, etc). I have used it to compute and plot reading levels for essay responses, though none of that code is checked in.
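  • cmudict.0.7a.txt - Used by ReadingLevel.py to get the number of syllables in a word. This is the CMU Pronouncing Dictionary, which maps words to phoneme pronunciations; each vowel phoneme carries a stress digit, so counting those gives the syllable count. I pulled it from some CMU page on the interwebs, but I forget where.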
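
As a rough illustration of how syllable counts from a cmudict-style file can feed a reading-level score, here is a minimal sketch. It is not the code in ReadingLevel.py: the function names are made up, the Flesch-Kincaid grade formula is one common choice and may not be the one the script uses, and treating unknown words as one syllable is an assumption.

```python
import re


def load_syllable_counts(lines):
    """Parse cmudict-style lines ("WORD  PH1 PH2 ...") into {word: syllables}."""
    counts = {}
    for line in lines:
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and ";;;" comment lines
        word, *phonemes = line.split()
        word = re.sub(r"\(\d+\)$", "", word).lower()  # "THE(1)" -> "the"
        # Vowel phonemes end in a stress digit (0, 1 or 2): one per syllable.
        syllables = sum(p[-1].isdigit() for p in phonemes)
        counts.setdefault(word, syllables)  # keep the first pronunciation
    return counts


def flesch_kincaid_grade(text, syllable_counts):
    """Flesch-Kincaid grade level; unknown words count as one syllable (assumed)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    total = sum(syllable_counts.get(w.lower(), 1) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * total / len(words) - 15.59


# Tiny stand-in for cmudict.0.7a.txt, in the same line format.
sample_dict = [
    ";;; comment line",
    "THE  DH AH0",
    "THE(1)  DH IY0",
    "CAT  K AE1 T",
    "SAT  S AE1 T",
    "QUIETLY  K W AY1 AH0 T L IY0",
]
counts = load_syllable_counts(sample_dict)
grade = flesch_kincaid_grade("The cat sat quietly.", counts)
print(round(grade, 2))
```

With the real dictionary, `load_syllable_counts(open("cmudict.0.7a.txt", encoding="latin-1"))` would be the likely call; the encoding here is a guess and may need adjusting.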

Fun Links

Because I'm not the only one who has tried this kind of thing:

Contributors

  • rudeboybert
