
caravanstudios_webscraping4localknowledge's Introduction

EU Commission Grant Exploratory Research: Polish Discourse on Social Media on Air Pollution


Background

As the Citizen Science Research Intern at Caravan Studios, I was tasked with exploring the online discourse on air pollution in Poland, the EU country with the worst air pollution record. My aims were to 1) gain a temporal and spatial understanding of interest in air pollution in Poland and 2) get a sense of the popular discourse about it. To expand on the latter: how did Poles express themselves about air pollution (whom did they blame, how did they describe the impact of pollution on daily life, etc.)?

I investigated several social media sources, including Twitter, Facebook, Reddit, Wykop, NK.pl, and Google Trends. Here is a summary of my outputs per source:

  • Google Trends: I generated charts showing search frequency over the last 5 years for a set of keywords relevant to air pollution. I was also able to compare the frequency of those searches between Poland as a whole and the southern region of Lesser Poland, known to be heavily impacted by smog.

  • Facebook: Due to changes in Facebook's API, I was unable to query for meaningful content. I considered a web scraping approach, but Facebook's browsing features limit results to 5 posts over a narrow time period, so I dropped the site from my inquiry.

  • Reddit: I made a post on r/Polska, the Polish subreddit, soliciting input about air pollution directly from Polish individuals, and gathered over half a dozen responses. The objective of my solicitation was to inform my research process and to use local testimonies to cross-examine news stories and the data I subsequently gathered.

  • NK.pl: A Polish forum site, which I quickly found to be lacking in the content I was researching. Furthermore, a poster on r/Polska kindly informed me that "NK.pl is dead".

  • Twitter Analytics: Using Twitter's API, I queried Tweets from the last 7 days (a limitation of the public access API) that were geotagged to Poland and mentioned smog. Although the time window is quite limited, the API nonetheless provides a snapshot of the "current discourse". I was able to generate both a time series and word clouds from the Tweets collected.

  • Wykop: The "Polish Reddit". Wykop was my richest source of information: I was able to scrape over 7,000 posts mentioning smog, dating as far back as 2012. I generated a time series of post frequency, in addition to a word cloud.
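The time series and word cloud inputs mentioned for Twitter and Wykop can be approximated with a few lines of standard-library Python. This is only a minimal sketch, not the code used in this project: the `posts` records, the `STOPWORDS` set, and the function names `monthly_counts` and `word_frequencies` are all hypothetical, and the sample texts are invented for illustration rather than taken from the scraped data.

```python
from collections import Counter
from datetime import date
import re

# Hypothetical sample records (posting date, post text); NOT real scraped data.
posts = [
    (date(2017, 1, 9), "Smog w Krakowie znowu przekracza normy"),
    (date(2017, 1, 15), "Smog i mróz, maski antysmogowe wyprzedane"),
    (date(2017, 2, 2), "Kraków: smog mniejszy niż w styczniu"),
]

# Tiny illustrative stopword list (an assumption); a real analysis would
# need a much fuller Polish stopword list.
STOPWORDS = {"i", "w", "na", "z", "niż"}


def monthly_counts(records):
    """Count posts per (year, month), sorted chronologically."""
    counts = Counter((d.year, d.month) for d, _ in records)
    return dict(sorted(counts.items()))


def word_frequencies(records, stopwords=STOPWORDS):
    """Count lowercase word tokens across all posts, skipping stopwords."""
    words = []
    for _, text in records:
        for token in re.findall(r"\w+", text.lower()):
            if token not in stopwords:
                words.append(token)
    return Counter(words)


if __name__ == "__main__":
    print(monthly_counts(posts))
    print(word_frequencies(posts).most_common(5))
```

The monthly counts can be plotted directly as a time series, and the word frequencies are the kind of input a word cloud tool typically expects.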

Method: Scraping all posts mentioning 'smog' on Wykop.pl
