
facepager's Introduction


Facepager was made for fetching publicly available data from YouTube, Twitter and other websites on the basis of APIs and web scraping. All data is stored in an SQLite database and may be exported to CSV.

Installer

Installation packages for each version are available on the releases page. Database files may be incompatible between versions.

  • Windows: Download and execute the latest exe installer from the releases page. If Windows complains about an unknown publisher and refuses to launch the app, click "More info" and start anyway.

  • Mac OS X: Download and install the latest package from the releases page. Your computer will complain that it can't install the package. Ctrl+Click on the installer icon to bypass the complaint; see https://support.apple.com/guide/mac-help/open-a-mac-app-from-an-unidentified-developer-mh40616/mac for further information.
    Older versions of Facepager were distributed as zip files and were not code-signed. Download and unzip the file from the releases page, then drag and drop the app into your "Applications" folder. Next, you need to remove the download flag: open the terminal, go to the folder where Facepager is stored (e.g. cd /Applications) and run xattr -cr Facepager.app. Then open the app using Ctrl+Click.

  • Linux: There is no binary release; see src/readme.md for the steps to run it under Linux.

If you want to run from source, see src/readme.md.

Getting help

Try the help button built into Facepager or go directly to the Wiki. There you will find everything you need to get started. You will also find some tutorials on YouTube.

You can get help regarding specific problems in the Facepager Usergroup on Facebook. If you want to be informed about updates please follow the Facebook Page.

Citation

Jünger, Jakob / Keyling, Till (2019). Facepager. An application for automated data retrieval on the web. Source code and releases available at https://github.com/strohne/Facepager/.

Licence

MIT License

Copyright (c) 2019 Jakob Jünger and Till Keyling

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

facepager's People

Contributors

catwolfman, chantalgrtnr, dorvak, fuerdiearbeit, henriekekotthoff, isklee, nikicc, shahnewaz-labib, strohne


facepager's Issues

Object empty

Hi!
I'm very excited about discovering Facepager, but life is not a bed of roses...

I need to extract the Facebook IDs of my friends. When I try it, the result is an object of type "empty".

I've already tried:

  • using "me" node
  • using fb ID as
  • using fb ID as
    ...

    I can only fetch the basic info of my ID, using in the "Resources" box without parameters, but that's all. I cannot fetch any data from it, like friends, events, etc. Every attempt results in an object of type "empty".

SSL handshake fails on Windows with Facebook

The SSL handshake seems to fail on one user's Windows machine, oddly only with the Facebook page:

2014-01-09 10:13:34.022000 Network error (6): SSL handshake failed
2014-01-09 10:13:34.042000 Error loading web page

Guess: Facebook doesn't send some piece of SSL information that Qt requires. It's hardly debuggable...

YouTube Comments Download

Hi Guys,

I am trying to get the comments from youtube by using this tutorial "http://tillkeyling.com/facepager-and-the-youtube-api-v3-a-quick-tutorial.html#facepager-and-the-youtube-api-v3-a-quick-tutorial".
However, I am getting the following error. I have tried everything that I can. If anyone can help me it would be much appreciated.
{
  "error": {
    "errors": [
      {
        "domain": "usageLimits",
        "reason": "keyInvalid",
        "message": "Bad Request"
      }
    ],
    "code": 400,
    "message": "Bad Request"
  }
}

CSV structure occasionally broken

Some FB posts break the CSV export. By break I mean that posts are no longer one per line, since some posts are split over multiple lines. For example, this is a screenshot of a post online (coming from here; scroll down to 23 May 2014):
[Screenshot of the post, taken 2016-06-01 at 21:22:06]

The resulting CSV is broken into multiple lines like this:

"594";"4";"1";"98358327274_10152537848242275";"data";"fetched (200)";"2016-06-01 21:07:13.278147";"Facebook:<Object ID>/posts";"PIRATSKA STRANKA SLOVENIJE";"Posnetek včerajšnjega TV soočenja na RTV1 na katerem smo sodelovali tudi mi in na katerem je bil naš kandidat, po mnenju mnogih, nesporni zmagovalec.

Prepričajte se sami. Priporočamo tudi ogled včerajšnje aktivnosti na twitterju pod hashtagom #EUVolitve
https://twitter.com/search?q=%23euvolitve
";"51";"6";"13";"2014-05-23T07:34:08+0000"

This prevents Excel from importing the data correctly. Maybe we should escape newlines in messages when exporting to CSV?
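Python's csv module already quotes fields that contain the separator, quotes or line breaks, and a quoted field with embedded newlines is valid CSV (RFC 4180); some spreadsheet imports still split such records, though. A minimal sketch of an optional "remove linebreaks" step, assuming rows arrive as lists of values; export_rows and remove_linebreaks are hypothetical names, not Facepager's exporter:

```python
import csv
import io
import re

# Hypothetical helper, not Facepager's actual export code.
LINEBREAKS = re.compile(r"[\r\n]+")

def export_rows(rows, delimiter=";", remove_linebreaks=False):
    """Serialize rows as CSV text, quoting every field like the export above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_ALL,
                        lineterminator="\r\n")
    for row in rows:
        if remove_linebreaks:
            # Collapse each run of CR/LF characters into one space so that
            # every record ends up on exactly one physical line.
            row = [LINEBREAKS.sub(" ", str(value)) for value in row]
        writer.writerow(row)
    return buffer.getvalue()

if __name__ == "__main__":
    post = ["594", "Posnetek ...\n\nPrepričajte se sami."]
    print(export_rows([post]), end="")                          # record spans 3 lines
    print(export_rows([post], remove_linebreaks=True), end="")  # record on 1 line
```

Keeping the default as plain quoting preserves the original text; the flattening step is lossy and is best offered as an option.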

Twitter API v 1.1

Accessing Twitter via API version 1.0 is deprecated; updating to 1.1 is necessary.

only 10 posts collectable


I want to analyse the feed of a specific Facebook group, but I can only download the last 10 posts. Furthermore, if I set the parameters 'since' and 'until', Facepager answers with empty. Where is the mistake?

One extra question: I want to analyze posts with all related likes, shares, comments and replies, together with all other reactions to these comments, for a specific period, including links, photos and videos. Is it possible to collect all this data in a single fetch?

Pagination seems to not work with Facebook

Using "Search for Facebook Pages" works; however, I can't get paging to work. No matter what I set under "Maximum pages", it always returns the same data. Is that not the way to change the number of result pages? Thanks.
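If a paging loop re-requests the first URL instead of following the cursor returned with each response, every "page" contains the same data. A minimal sketch of cursor-following pagination, assuming Graph-API-style responses that carry the next page URL under paging.next; iterate_pages is hypothetical and fetch stands in for the actual HTTP request:

```python
from typing import Callable, Iterator, Optional

def iterate_pages(fetch: Callable[[str], dict], first_url: str,
                  max_pages: int) -> Iterator[dict]:
    """Yield at most max_pages responses, following paging.next each time."""
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        response = fetch(url)
        yield response
        pages += 1
        # The next request uses the URL returned by the API, not first_url;
        # when paging.next is missing, there are no further pages.
        url = response.get("paging", {}).get("next")
```

Because the loop stops when paging.next is absent, fewer pages than the configured maximum can come back.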

Linux support?

Hello,
thanks for the nice tool. Unfortunately I can't use it under Linux because I'm missing credentials.py, and I have no idea how it should be structured.
Is a Linux version planned at some point?
Best regards, Sam

Collecting data from Facebook groups: Errors 400 and "Exception: global name 'sleep' is not defined"

Hello,

I'm currently trying to use Facepager to collect data from a Facebook group.

Let's say that the Facebook URL is https://www.facebook.com/groups/1420997938186432/.

I wasn't sure how I should've input the node, and so I tried doing that in two ways.

(1) When I used the node: groups/1420997938186432/, I received the error (400).
(2) When I used the node: 1420997938186432/, I received the error: "Exception: global name 'sleep' is not defined."

How can I go about extracting information from Facebook groups? Am I incorrectly setting up the variables in Facepager? Here's the link to the screenshot of my settings in Facepager: https://drive.google.com/file/d/0BzGweDhxb-SjaE90RUNzX0FhNkE/view?usp=sharing

Thanks in advance for your kind assistance.

Best,
Phoebe

Twitter Stream Disconnecting/Crashing

Hi,
I am trying to collect tweets, but my stream keeps disconnecting. I get the following message in the status log:
2017-03-20 15:36:15.476000 Exception: IncompleteRead(0 bytes read, 2 more expected).

I did not change any settings. It was working fine two days ago (the last time I used it)! The most tweets I can get are 300-700.

Logging System

As in earlier versions, we should implement a simple logging system (log to file). Users still report the "blank login" error; a log is needed because the error could not be reproduced on our machines.
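A minimal sketch of a log-to-file setup with Python's standard logging module; the format string reproduces the "2014-05-05 13:44:17,226 ERROR:..." layout of the log excerpt further down. setup_file_logging and the logger name are hypothetical:

```python
import logging
import os

def setup_file_logging(path, level=logging.INFO):
    """Attach a UTF-8 file handler to a named logger and return the logger."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    logger = logging.getLogger("facepager_sketch")  # hypothetical logger name
    logger.setLevel(level)
    handler = logging.FileHandler(path, encoding="utf-8")
    # Produces lines such as "2014-05-05 13:44:17,226 ERROR:message".
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s:%(message)s"))
    logger.addHandler(handler)
    return logger
```

Inside an except block, logger.exception("...") writes the message plus the full traceback, which is the information missing for errors that cannot be reproduced locally.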

File-Tab: Encode/Extract Picture-Names | file-naming pattern

Some pictures and other resources from Facebook throw a file-creation error due to their name, e.g. https://fbexternal-a.akamaihd.net/safe_image.php?d=AQBGKf_DPP-FtQdF&w=154&h=154&url=http%3A%2F%2Fwww.bundeskanzlerin.de%2FContent%2FDE%2FArtikel%2F2014%2F02%2FBilder%2F2014-02-18-eskalation-ukraine.jpg%3Bjsessionid%xxxx2%3F__blob%3DbpaTopmeldung%26v%3D3

will result in a file name like "safe_image.php?d=AQBGKf_DPP-FtQdF&w=154&h=154&url=http%3A%2F%2Fwww.bundeskanzlerin.de%2FContent%2FDE%2FArtikel%2F2014%2F02%2FBilder%2F2014-02-18-eskalation-ukraine.jpg%3Bjsessionid%3D85F8048B6FE745F7B0CFC5D95AD1F484-2014-03-28-09-52-12-1.jpeg"

We should use the urllib library's urlparse functions and the os.path functions to create a valid path (or use the object ID as a fallback when file creation throws an exception due to an invalid name). As an additional feature, we could provide pattern-based file naming, e.g. <object_id>_<created_at>.
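A minimal sketch of the proposed naming using Python 3's urllib.parse: it keeps only the last path segment, unwraps proxy URLs that carry the real image in a url query parameter (as safe_image.php does above), replaces characters that are invalid in Windows file names, and falls back to the object ID. filename_from_url and its pattern syntax ({object_id}_{name}, or extra fields such as {created_at}) are hypothetical:

```python
import os
import re
from urllib.parse import parse_qs, unquote, urlparse

# Characters not allowed in Windows file names, plus control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def filename_from_url(url, object_id, pattern="{object_id}_{name}",
                      max_length=120, **fields):
    """Build a valid file name for a downloaded resource (hypothetical helper)."""
    parsed = urlparse(url)
    # Proxy URLs such as .../safe_image.php?...&url=... wrap the real image URL.
    nested = parse_qs(parsed.query).get("url")
    if nested:
        parsed = urlparse(nested[0])
    # Keep only the last path segment; query strings never reach the name.
    name, ext = os.path.splitext(os.path.basename(unquote(parsed.path)))
    name = name.strip(" .")
    # Fall back to the object ID alone when no usable name is left.
    stem = pattern.format(object_id=object_id, name=name, **fields) if name else object_id
    stem = INVALID_CHARS.sub("_", stem)
    ext = INVALID_CHARS.sub("", ext)
    return stem[: max_length - len(ext)] + ext
```

Prefixing the object ID keeps names unique even when different posts link to images with the same file name.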

"Long" encoding error drops randomly

The following error seems to occur randomly:

2014-05-05 13:44:17,226 ERROR:'long' object has no attribute 'encode'
Traceback (most recent call last):
  File "N:\src\facepager\src\apithread.py", line 112, in run
    self.module.fetchData(job['data'], job['options'], streamingData)
  File "N:\src\facepager\src\apimodules.py", line 561, in fetchData
    urlpath, urlparams = self.getURL(urlpath, options["params"], nodedata)
  File "N:\src\facepager\src\apimodules.py", line 110, in getURL
    urlparams[name] = value.encode("utf-8")

This looks like an issue with long numbers (e.g. the Twitter tweet ID) in combination with the QParamEdit class (by the way: we should not name our own classes like QAnyName, as that suggests an official Qt class) and its .getcurrentText() method. Although it returns unicode by default, a conversion to long or int might have happened at some point.

Not reproducible; all entries in the ParamEdit seem to be unicode strings.
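Whatever turns the value into a number, the AttributeError disappears if every parameter is normalized to text before it is encoded, since int and long objects have no .encode() method. A sketch in Python 3 (the traceback above comes from the Python 2 code base); to_param_text and encode_params are hypothetical helpers:

```python
from urllib.parse import urlencode

def to_param_text(value):
    """Return a request parameter as text, whatever type it arrived as."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    # Numbers such as tweet IDs are formatted explicitly instead of calling
    # .encode() on them, which is the call that failed in the traceback.
    return str(value)

def encode_params(params):
    """URL-encode a parameter dict; urlencode percent-encodes text as UTF-8."""
    return urlencode({name: to_param_text(value) for name, value in params.items()})
```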

Crash when deleting multiple nodes

In Debian, Facepager closes itself when I try to delete multiple nodes. Every time I select all nodes and try to delete them, Facepager first deletes just one of the nodes. If I try again, it crashes with this error:

Traceback (most recent call last):
  File "/home/abitporu/Facepager/src/actions.py", line 141, in deleteNodes
    self.mainWindow.tree.treemodel.deleteNode(index, delaycommit=True)
  File "/home/abitporu/Facepager/src/datatree.py", line 326, in deleteNode
    item.remove(True)
  File "/home/abitporu/Facepager/src/datatree.py", line 145, in remove
    self.parentItem.removeChild(self, persistent)
  File "/home/abitporu/Facepager/src/datatree.py", line 160, in removeChild
    dbnode = self.dbnode()
  File "/home/abitporu/Facepager/src/datatree.py", line 185, in dbnode
    return Node.query.get(self.id)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 831, in get
    return self._get_impl(ident, loading.load_on_ident)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 842, in _get_impl
    if len(ident) != len(mapper.primary_key):
TypeError: object of type 'NoneType' has no len()
Falha de segmentação (segmentation fault)

$ sudo cat /proc/version
Linux version 3.16.0-4-amd64 ([email protected]) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.39-1 (2016-12-30)
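The traceback shows Node.query.get(self.id) being called with self.id set to None, which fails inside SQLAlchemy's query.get(). One plausible, unverified explanation is that deleting the first item invalidates the remaining selected items. A minimal sketch of a more defensive deletion loop that resolves stable IDs before deleting anything and skips items without an ID; delete_nodes and delete_by_id are hypothetical and not Facepager's datatree API:

```python
def delete_nodes(selected_items, delete_by_id):
    """Delete the database nodes behind the selected tree items.

    All IDs are read up front, before the first deletion can change or
    invalidate the remaining items. Items whose id is None are skipped,
    and duplicates are dropped while keeping the selection order.
    """
    ids = list(dict.fromkeys(
        item.id for item in selected_items if getattr(item, "id", None) is not None
    ))
    for node_id in ids:
        delete_by_id(node_id)
    return ids
```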

Filter nodes

Implement a user interface for filtering nodes by node type or JSON keys when fetching data.
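A minimal sketch of such a filter, assuming each node is a dict with an objecttype field and the parsed JSON response under response (both hypothetical field names); nested keys are given in dotted form, e.g. likes.summary:

```python
def filter_nodes(nodes, object_type=None, required_key=None):
    """Keep nodes that match object_type and/or contain the (dotted) JSON key."""

    def has_key(data, dotted_key):
        # Walk the nested dicts one key at a time, e.g. "likes.summary".
        for part in dotted_key.split("."):
            if not isinstance(data, dict) or part not in data:
                return False
            data = data[part]
        return True

    return [
        node for node in nodes
        if (object_type is None or node.get("objecttype") == object_type)
        and (required_key is None or has_key(node.get("response", {}), required_key))
    ]
```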

Overflow Error

Finally, we produced an OverflowError (yeah!) in version 3.3:

  File "Facepager.py", line 314, in <module>
    startMain()
  File "Facepager.py", line 309, in startMain
    sys.exit(app.exec_())
OverflowError

Reproducible on Mac OS X 10.8.4: collect some tweets via the search method. Selecting one of them freezes Facepager, so that no further interaction is possible. Closing the application then produces the error above.

Mac Version Test

Created a Mac .app version (which works on my Mac). It should be tested on other hardware.

HTTPS Connection Pool Error

I sometimes get an error here: Exception: Request Error: HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /..

When the request is run a second time, it usually works. Is this reproducible? It probably has to do with requests and some old objects/garbage collection etc.

Implement a simple scheduler

The idea is a basic scheduling system for repetitive tasks (fetch XY every 10 minutes) within the tool (task handling via crontab, autostart etc. is probably overkill). Because the requests are saved in the info window, the scheduler could read in the request details and use this information.
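Python's standard sched module already covers the "fetch XY every 10 minutes" case inside one process. A minimal sketch; run_repeating is hypothetical, and since scheduler.run() blocks, a Qt application would more likely use a QTimer instead. The clock is injectable so the schedule can be tested without waiting:

```python
import sched
import time

def run_repeating(task, interval, repetitions, scheduler=None):
    """Run task now and then every `interval` seconds (repetitions >= 1 runs)."""
    scheduler = scheduler or sched.scheduler(time.monotonic, time.sleep)

    def run(remaining):
        task()
        if remaining > 1:
            # Re-arm relative to the current time, i.e. `interval` after this run.
            scheduler.enter(interval, 1, run, (remaining - 1,))

    scheduler.enter(0, 1, run, (repetitions,))
    scheduler.run()  # blocks until the last repetition has finished
```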

Twitter: Streaming API

With wrappers like Tweepy, building a Streaming API client is straightforward. Therefore, if Facepager users provide their own developer app credentials (consumer_key, consumer_secret), they could use the Streaming API. Problem: running the streaming event loop inside the PyQt event loop while concurrently updating the nodes in the TreeView and writing to the DB might be tricky.
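One common way around the event-loop problem is to let a background thread consume the blocking stream and hand messages to the GUI thread through a thread-safe queue, so that only the GUI thread touches the tree view and the database. A minimal standard-library sketch; stream stands in for whatever client (e.g. a Tweepy stream) yields the messages, and both function names are hypothetical. In PyQt, drain would typically be called from a QTimer:

```python
import queue
import threading

def start_stream_worker(stream, out_queue):
    """Consume a blocking iterable of messages in a daemon thread."""

    def worker():
        for message in stream:
            out_queue.put(message)  # Queue.put is thread-safe
        out_queue.put(None)  # sentinel: the stream has ended

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

def drain(out_queue, max_items=100):
    """Fetch pending messages without blocking; meant to run in the GUI thread."""
    items = []
    while len(items) < max_items:
        try:
            items.append(out_queue.get_nowait())
        except queue.Empty:
            break
    return items
```

Limiting each drain call to max_items keeps the GUI responsive during bursts of tweets.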

BadStatusLine after every N request

I'm still getting a BadStatusLine error on Win 7 (I need to check this on my Mac). The error occurs after issuing a new request (e.g. it does not occur while paginating through the results).

HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /1.1/application/rate_limit_status.json?resources= blablabla (Caused by <class 'httplib.BadStatusLine'>: '')

An educated guess without debugging anything: 1) remnants of the old request inside the HTTPSConnectionPool (this seems to be a known bug in older versions of requests); 2) issues with proxies on the university network (although these should not cause such an error); 3) our threading implementation somehow breaks the ConnectionPool.

This is not really urgent, but quite annoying. I'll debug that in May (holidays!)
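Whatever the root cause turns out to be, a retry with exponential backoff hides such one-off failures. A minimal, library-agnostic sketch; with_retries is hypothetical. With requests, pass requests.exceptions.ConnectionError in retry_on (it does not derive from the built-in ConnectionError), or mount an HTTPAdapter configured with urllib3's Retry instead:

```python
import time

def with_retries(request, attempts=3, backoff=1.0,
                 retry_on=(ConnectionError,), sleep=time.sleep):
    """Call request(); on a retryable error, wait and try again.

    Waits backoff, 2*backoff, 4*backoff, ... seconds between attempts and
    re-raises the last error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(backoff * 2 ** (attempt - 1))
```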

URL & File Download Option

It should be possible to download pictures etc. (more generally: URLs in a field) within the tool. For Facebook content, an authenticated request may be needed (private pictures, for example). Another option would be an "export to curl" command (e.g. as used in the Chrome browser).
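An "export to curl" option could be as simple as quoting each part of the request for a POSIX shell. A minimal sketch; to_curl is hypothetical, and shlex.quote produces POSIX shell quoting, which Windows cmd.exe does not understand:

```python
import shlex

def to_curl(method, url, headers=None, data=None):
    """Return a copy-pasteable curl command for a request (POSIX shells)."""
    parts = ["curl", "-X", method.upper()]
    for name, value in (headers or {}).items():
        parts += ["-H", f"{name}: {value}"]
    if data is not None:
        parts += ["--data", data]
    parts.append(url)
    # shlex.quote escapes each argument so spaces, ?, & etc. survive the shell.
    return " ".join(shlex.quote(part) for part in parts)
```

Note that the exported command contains the access token whenever it is part of the URL or headers.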

Facebook Token Import

OAuth with Facebook works, but the token is not passed on correctly. If you simply copy the (invisible) token out and paste it back in, authentication works, so the bug is probably somewhere in the paste function. It only seems to show up with some requests that need the token (e.g. posts on Boris Becker's page).

Unpack list data

Display lists in single columns or unpack lists into new datasets.
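The two options can be sketched as plain dict transformations, assuming each dataset is a flat dict and the list sits under one known key; unpack_list and list_to_columns are hypothetical:

```python
def unpack_list(record, key):
    """Long format: one new record per list element (empty list gives no records)."""
    base = {k: v for k, v in record.items() if k != key}
    return [{**base, key: value} for value in record.get(key) or []]

def list_to_columns(record, key):
    """Wide format: spread the list into numbered columns key.0, key.1, ..."""
    base = {k: v for k, v in record.items() if k != key}
    base.update({f"{key}.{i}": value for i, value in enumerate(record.get(key) or [])})
    return base
```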

Facebook

  1. How can I set up two nodes with different resources and, at the same time, set the timer with a different timestamp for each?
  2. How can I overwrite the SQLite database every time my timer triggers? All data presented in the view ends up in the database, where it is duplicated, because Facepager returns the full data (page information) every time.
  3. Page information like paging, likes and comments is stored in the database as a single column of responses, but in the Facepager view I can see it in different columns. How can I get the same columns in the SQLite database?
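Regarding question 3: the view shows nested JSON fields as separate columns, while the database keeps the whole response in one column. A minimal sketch of flattening a response into dotted column names (e.g. likes.summary.total_count) before writing it to separate columns; flatten is hypothetical, and lists are stored as JSON text:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}; lists become JSON strings."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = value
    return flat
```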

API-Limit block/notification & Error warnings

Most of the Facebook API errors were caused by exceeding the limits ("User request limit reached"). In the future, we could block further requests once the Twitter/Facebook limits are exceeded (each is clearly identified by an error message in the metadata), or alternatively just show a warning via a dialog etc.

The same applies to "normal" error messages, e.g. when a wrong or non-existent endpoint was chosen. Here we might introduce a limit that prevents thousands of requests from being generated with the same faulty setting (i.e. parse error messages during data collection, increment an "errorcount" variable and abort if necessary).
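The proposal (recognize rate-limit messages, count consecutive errors and abort at a threshold) can be sketched in a few lines. A minimal sketch; ErrorGuard, the threshold and the marker strings are assumptions ("User request limit reached" is quoted in the issue, "Rate limit exceeded" is assumed to be Twitter's wording):

```python
# Assumed marker strings: the Facebook wording is quoted in the issue,
# the Twitter wording is an assumption.
RATE_LIMIT_MARKERS = ("User request limit reached", "Rate limit exceeded")

class ErrorGuard:
    """Classify responses and stop after too many consecutive errors."""

    def __init__(self, max_consecutive_errors=10):
        self.max_consecutive_errors = max_consecutive_errors
        self.errorcount = 0

    def check(self, status_code, message=""):
        """Return "ok", "error", "rate_limited" or "abort" for one response."""
        if any(marker in message for marker in RATE_LIMIT_MARKERS):
            return "rate_limited"  # caller should pause or warn, not count it
        if status_code >= 400:
            self.errorcount += 1
            if self.errorcount >= self.max_consecutive_errors:
                return "abort"
            return "error"
        self.errorcount = 0  # a successful response resets the counter
        return "ok"
```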

GET statuses/show/<id> not formulated correctly

Hi all,
Thank you for this great tool. At the moment, we are using Facepager to get individual tweets by status ID. We have noticed that the API call GET statuses/show/ is not formulated correctly: it has an extra slash between show and .json.
https://api.twitter.com/1.1/statuses/show/.json?id=xxxxxxxxxxx
The issue seems to be in apimodules.py, line 623, in options["query"].
Thank you for resolving this.
Dimitra
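For reference, the Twitter v1.1 endpoint expects the ID as a query parameter on statuses/show.json, so a trailing slash left over from an empty path placeholder has to be stripped before .json is appended. A minimal sketch; build_url is hypothetical and not the code in apimodules.py:

```python
from urllib.parse import urlencode

API_BASE = "https://api.twitter.com/1.1/"

def build_url(resource, params):
    """Join a resource such as 'statuses/show' with '.json' and a query string."""
    resource = resource.strip("/")
    # A leftover placeholder like "<id>" means the path was not filled in.
    if not resource or "<" in resource or ">" in resource:
        raise ValueError(f"unresolved resource path: {resource!r}")
    query = urlencode(params)
    return f"{API_BASE}{resource}.json" + (f"?{query}" if query else "")
```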

build instructions for linux (ubuntu 15.04)

I didn't find any instructions on how to get Facepager running under Linux. Maybe you would like to add the following steps to README.md so that Linux users can get the software running more quickly in the future.

#tested under ubuntu vivid x64
#git has to be installed before the repository can be cloned
sudo apt-get install build-essential git cmake libqt4-dev libphonon-dev python2.7-dev libxml2-dev libxslt1-dev qtmobility-dev python-virtualenv

git clone https://github.com/strohne/Facepager
cd Facepager

virtualenv facepager_env
. facepager_env/bin/activate

pip install SQLAlchemy python-dateutil requests rauth wheel
cd src/

#you may want to change the download link to the most recent version
wget https://pypi.python.org/packages/source/P/PySide/PySide-1.2.2.tar.gz
tar -xzf PySide-1.2.2.tar.gz
cd PySide-1.2.2/

python2.7 setup.py bdist_wheel --qmake=/usr/bin/qmake-qt4

#the virtualenv is two levels up (Facepager/facepager_env)
../../facepager_env/bin/pip2.7 install dist/PySide-1.2.2-cp27-none-linux_x86_64.whl
cd ..

#add your credentials
cp credentials.py.readme credentials.py
python Facepager.py

Lower/Uppercase aware templating system

Just noticed that in some cases you'll need to distinguish from <object_id> (the latter might differ, e.g. for FB posts, where this ID is the part after the underscore: 4334_343434). This doesn't work yet, because it still takes the instead. It should be a minor change in the regex.
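Matching placeholder names case-sensitively is enough to tell such placeholders apart. A minimal sketch; render and the example keys ID and id are illustrative assumptions, not Facepager's actual placeholder names:

```python
import re

# Case-sensitive on purpose: no re.IGNORECASE, so <ID> and <id> differ.
PLACEHOLDER = re.compile(r"<([^<>]+)>")

def render(template, fields):
    """Replace <key> with fields[key]; unknown placeholders are left untouched."""
    return PLACEHOLDER.sub(lambda m: str(fields.get(m.group(1), m.group(0))), template)

if __name__ == "__main__":
    post_id = "4334_343434"
    # Hypothetical keys: the full ID and the part after the underscore.
    fields = {"ID": post_id, "id": post_id.split("_")[-1]}
    print(render("<ID>/comments", fields))
```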

Export Options

List of Export-Options:

  • BOM
  • Separator Type
  • Exclude empty nodes/offcuts (e.g. when you retrieve comments on comments on Facebook, a lot of empty nodes will occur due to non-existent reply comments)
  • Sort order (see selective export)
  • wide and long format
  • casted/multi-page XLSX-Export
  • Remove linebreaks in text columns
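Several of these options map directly onto Python's csv module and codecs. A minimal sketch covering the BOM (the utf-8-sig codec writes one), the separator and excluding empty nodes; write_export is hypothetical, and it assumes offcut nodes are marked with objecttype "empty", the object type that shows up in the issues here:

```python
import csv

def write_export(path, rows, columns, separator=";", bom=True, exclude_empty=True):
    """Write dict rows to CSV with a configurable separator, BOM and filter."""
    # utf-8-sig prepends a byte order mark, which can help Excel detect UTF-8.
    encoding = "utf-8-sig" if bom else "utf-8"
    with open(path, "w", encoding=encoding, newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, delimiter=separator,
                                quoting=csv.QUOTE_ALL, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            # Assumption: offcut nodes are marked with objecttype "empty".
            if exclude_empty and row.get("objecttype") == "empty":
                continue
            writer.writerow(row)
```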

Licensing

So that companies can't simply copy the code, we should consider a license such as the GNU GPL or MIT.

Templates/Presets

Save settings as template, manage templates for generic requests, provide some examples
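Templates could simply be JSON files in a presets folder. A minimal sketch; the file layout, field names and functions are hypothetical, and the example resource reuses the <Object ID>/posts resource visible in the CSV export in the issue about broken CSV structure:

```python
import json
from pathlib import Path

def save_preset(directory, name, settings):
    """Store settings as <directory>/<name>.json (name must be a valid file name)."""
    path = Path(directory) / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"name": name, "settings": settings}
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path

def load_presets(directory):
    """Return {preset name: settings} for every *.json file in directory."""
    presets = {}
    for path in sorted(Path(directory).glob("*.json")):
        payload = json.loads(path.read_text(encoding="utf-8"))
        presets[payload["name"]] = payload["settings"]
    return presets

if __name__ == "__main__":
    example = {"module": "Facebook", "resource": "<Object ID>/posts", "params": {"limit": "100"}}
    save_preset("presets", "page-posts", example)
    print(load_presets("presets"))
```

Shipping a few such files with the application would also cover the "provide some examples" part.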

"Bad Request"

Just now on Ines Schaudel's computer: 2013-07-31 15:42:16.550000 Request Error: HTTPSConnectionPool(host='graph.facebook.com', port=443): Max retries exceeded with url: /57771538323?access_token=balblabla&since=2013-07-17&until=2013-07-18&metadata=1 (Caused by <class 'httplib.BadStatusLine'>: '')

  • Windows on a Mac (VirtualBox)
  • after she had logged in to Facebook (in the browser), the request ran completely correctly
  • let's keep an eye on this issue ;)

facebook remove friends api - facepager help pls

Hi, I'm just trying to use Facepager and noticed an announcement from FB stating that they have discontinued the friends API. I am having issues fetching data with Facepager: in some cases I get a 404 error when using a user name, but when I use an FB ID number string it produces an object of type 'empty' with fetched (200), and no other data is pulled. The user page does have data that is available to me and can be viewed via the normal FB web page. Any help would be appreciated.

GUI: Resizable Raw-Data Window

For easier object inspection, the raw-data window/widget should be resizable, or a full-screen view / "open in new window" function would be handy.

API Documentation and Pattern System

Like here, the API documentation could be built into the tabs automatically, with the possible parameters supplied directly. We could either use the JSON file in the background or parse the API documentation directly. We would have to do the same for Facebook. If we build a parser, it could even become its own small GitHub project: "Faceparser" :D

Is there an API for API documentation?

Extract Reviews from Facebook Page

I would like to get some reviews from a public Facebook page. However, when I try <application>/reviews in Resource, it returns an error. I'm not sure whether there is a problem with Facepager's code or whether I am not choosing the correct value in the "Resource" drop-down menu. I'm grateful for your help.


API v2.0 Migration Facebook

In the future, we should consider making the v2.0 API the standard endpoint (basically adding /v2.0/ to the URL path and maybe updating some preferences).

Test request

When configuring generic requests a feature to test and look at the results would be nice (instead of immediately writing fetched data to database).

Installation package for MAC for latest commits

Hello guys! Thanks for creating Facepager, awesome application. May I ask whether the Mac installation package including the recent commit that fixes line breaks in exported text will be out soon? Thanks.
