Comments (22)

amjedbj avatar amjedbj commented on May 29, 2024

Personally, I need the following fields (sorted by importance):
1- tweet.lang (if available)
2- tweet.retweet_status.id (if retweeted)
3- tweet.retweet_status.user.screenname (if retweeted)
4- tweet.in_reply_to_status_id (if reply)
5- tweet.in_reply_to_screen_name (if reply)
6- tweet.user.followers_count
7- tweet.user.friends_count
8- tweet.retweet_status.retweet_count (if retweeted)
9- tweet.coordinates (lat/long) (if available)
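
A minimal sketch of pulling these fields out of a single status, assuming the tweet has already been parsed into a Python dict. The key names follow Twitter's v1.1 JSON (`retweeted_status`, `screen_name`, `retweet_count`), which differ slightly from the shorthand in the list above. The function `extract_fields`, the output key names, and the sample status are hypothetical and not part of twitter-tools.

```python
def extract_fields(tweet):
    """Collect the requested fields from a parsed status dict.

    Missing or null values come back as None, so the optional fields
    (lang, retweet data, reply data, coordinates) are safe to read.
    """
    user = tweet.get("user") or {}
    retweet = tweet.get("retweeted_status") or {}
    retweet_user = retweet.get("user") or {}
    return {
        "lang": tweet.get("lang"),
        "retweeted_status_id": retweet.get("id"),
        "retweeted_user_screen_name": retweet_user.get("screen_name"),
        "in_reply_to_status_id": tweet.get("in_reply_to_status_id"),
        "in_reply_to_screen_name": tweet.get("in_reply_to_screen_name"),
        "followers_count": user.get("followers_count"),
        "friends_count": user.get("friends_count"),
        "retweeted_status_retweet_count": retweet.get("retweet_count"),
        "coordinates": tweet.get("coordinates"),
    }


# Made-up sample status, for illustration only.
sample = {
    "id": 2,
    "lang": "en",
    "in_reply_to_status_id": None,
    "in_reply_to_screen_name": None,
    "coordinates": None,
    "user": {"followers_count": 120, "friends_count": 80},
    "retweeted_status": {
        "id": 1,
        "retweet_count": 42,
        "user": {"screen_name": "example_user"},
    },
}
fields = extract_fields(sample)
```

Reading the retweet count from the retweeted status rather than from the tweet itself matches item 8 of the list.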

from twitter-tools.

lintool avatar lintool commented on May 29, 2024

Hi Miles,

I think these are all reasonable fields to throw into the index. I've assigned this task to you? We need to see how big the index becomes...

Thanks!

JamesMcMinn avatar JamesMcMinn commented on May 29, 2024

The list given by amjedbj seems more than enough for our needs, however can I suggest location (including lat/long if available) as something which may be useful.

lintool avatar lintool commented on May 29, 2024

I would be -1 on location, unless someone absolutely needs it.

amjedbj avatar amjedbj commented on May 29, 2024

The problem with location is that there are three fields:

  • tweet.coordinates (lat/long)
  • tweet.place (complex object)
  • tweet.user.location (string)

Which field is useful?

lintool avatar lintool commented on May 29, 2024

lat/long I'd say
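
A small sketch of reading lat/long from a parsed status, assuming Twitter's v1.1 JSON, where `coordinates` is a GeoJSON point whose array is ordered longitude first. The helper `tweet_lat_long` and the sample values are hypothetical.

```python
def tweet_lat_long(tweet):
    """Return (latitude, longitude) from tweet.coordinates, or None if absent."""
    point = tweet.get("coordinates")
    if not point or point.get("type") != "Point":
        return None
    # GeoJSON stores the pair as [longitude, latitude].
    longitude, latitude = point["coordinates"]
    return (latitude, longitude)


# Made-up statuses: one geotagged, one with only a free-text profile location.
geotagged = {
    "coordinates": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
    "user": {"location": "somewhere nice"},
}
untagged = {"coordinates": None, "user": {"location": "somewhere nice"}}

print(tweet_lat_long(geotagged))
print(tweet_lat_long(untagged))
```

The sketch deliberately ignores `user.location` and `place`, so a tweet without `coordinates` yields no position at all.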

isoboroff avatar isoboroff commented on May 29, 2024

Also the lang slot from the tweet (not the user)?

On Wed, Apr 17, 2013 at 1:28 PM, James McMinn wrote:

The list given by amjedbj seems more than enough for our needs, however
can I suggest location (including lat/long if available) as something which
may be useful.


isoboroff avatar isoboroff commented on May 29, 2024

Definitely not user.location; that is filled in by the user in the profile
and would not correspond to the tweet's location.

On Wed, Apr 17, 2013 at 1:50 PM, Jimmy Lin wrote:

lat/long I'd say


stewhdcs avatar stewhdcs commented on May 29, 2024

If this causes issues with the Lucene index, a viable alternative is to keep it separate in AWS DynamoDB, with tables to look up stats by userid or by tweetid. DynamoDB pricing has just been cut to $0.25 per GB/month.

amjedbj avatar amjedbj commented on May 29, 2024

Hashtags, mentions and URLs could be extracted (on the client side) from the tweet text.
I solved this in issue #28.
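
A rough sketch of such client-side extraction using regular expressions. The patterns are simplifying assumptions (for example, a URL is taken to be anything from `http://` or `https://` up to the next whitespace, so trailing punctuation would be included), and `extract_entities` is a hypothetical name rather than anything in twitter-tools.

```python
import re

# "#" or "@" must not be preceded by a word character (skips "a@b" and "x#y").
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return the hashtags, mentions and URLs found in a tweet's text."""
    urls = URL_RE.findall(text)
    # Blank out URLs first so a fragment such as "#top" inside a URL
    # is not counted as a hashtag.
    remainder = URL_RE.sub(" ", text)
    return {
        "hashtags": HASHTAG_RE.findall(remainder),
        "mentions": MENTION_RE.findall(remainder),
        "urls": urls,
    }


# Made-up tweet text, for illustration only.
entities = extract_entities(
    "RT @lintool: indexing #TREC2013 tweets http://t.co/abc123 #microblog"
)
print(entities)
```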

dpmccul avatar dpmccul commented on May 29, 2024

Though you have retweet count, you do not appear to have the retweet field. We need to see if a particular tweet is a retweet, or which portion of the tweet is a retweet.
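
One way to answer both questions from a parsed status, sketched under two assumptions: native retweets carry a `retweeted_status` object (as in Twitter's v1.1 JSON), and manual retweets follow the informal "comment RT @user: original text" convention. The function `split_retweet` and the sample texts are hypothetical.

```python
import re

# Manual retweets conventionally look like "my comment RT @user: original text".
RT_MARKER_RE = re.compile(r"(?:^|\s)RT @(\w+):?\s*")


def split_retweet(tweet):
    """Return (is_retweet, comment, retweeted_text) for a parsed status dict."""
    text = tweet.get("text", "")
    if tweet.get("retweeted_status"):
        # Native retweet: nothing was added by the retweeting user.
        return True, "", tweet["retweeted_status"].get("text", text)
    match = RT_MARKER_RE.search(text)
    if match is None:
        return False, text, ""
    # Everything before the marker is the comment, everything after is retweeted.
    return True, text[:match.start()].strip(), text[match.end():].strip()


# Made-up statuses, for illustration only.
manual = {"text": "so true RT @lintool: we need more fields"}
native = {"text": "RT @a: hello", "retweeted_status": {"text": "hello"}}
plain = {"text": "no retweet here"}

print(split_retweet(manual))
```

The text heuristic will miss retweets that use other conventions (such as "via @user"), so it is only a starting point.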

milesefron avatar milesefron commented on May 29, 2024

yep. i'll do this. in most cases, the data types will be pretty obvious,
i assume. i'll ping you if i see ambiguity.

and i will store created_at as a Long, corresponding to Unix epoch. that
will take up less space than a string and allow easy computation of things
like recency priors. make sense?
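
A sketch of what this could look like, assuming `created_at` arrives in Twitter's v1.1 string format and that the stored value is whole seconds (the thread does not say whether seconds or milliseconds are intended). The exponential form of the prior and the `rate_per_day` value are illustrative assumptions, as are the function names.

```python
import math
from datetime import datetime

# Twitter v1.1 created_at layout, e.g. "Wed Apr 17 17:28:00 +0000 2013".
# Note: %a and %b expect English day and month names (the default C locale).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def created_at_to_epoch(created_at):
    """Parse a created_at string into whole seconds since the Unix epoch."""
    return int(datetime.strptime(created_at, CREATED_AT_FORMAT).timestamp())


def recency_prior(tweet_epoch, query_epoch, rate_per_day=0.01):
    """Exponential decay in the tweet's age (in days) at query time."""
    age_days = max(0.0, (query_epoch - tweet_epoch) / 86400.0)
    return rate_per_day * math.exp(-rate_per_day * age_days)


tweet_epoch = created_at_to_epoch("Wed Apr 17 17:28:00 +0000 2013")
query_epoch = created_at_to_epoch("Thu Apr 18 17:28:00 +0000 2013")
print(tweet_epoch, recency_prior(tweet_epoch, query_epoch))
```

With an integer timestamp, the age of a tweet is a single subtraction, which is what makes this kind of prior cheap to compute at query time.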

On Wed, Apr 17, 2013 at 12:19 PM, Jimmy Lin wrote:

Hi Miles,

I think these are all reasonable fields to throw into the index. I've
assigned this task to you? We need to see how big the index becomes...

Thanks!



Miles Efron
Assistant Professor
Graduate School of Library and Information Science
University of Illinois, Urbana-Champaign

lintool avatar lintool commented on May 29, 2024

sgtm

hussam123 avatar hussam123 commented on May 29, 2024
  • the comments of each tweet

hussam123 avatar hussam123 commented on May 29, 2024
  • list of twitterer friends and followers

amjedbj avatar amjedbj commented on May 29, 2024

@dpmccul that's right, what we need to extract is the retweet count of the retweeted status (tweet.retweet_status.retweet_count). Tweets from the Streaming API have only just been published and thus have had little chance of being retweeted.
@hussam123 comments, followers and friends are not available through the Streaming API. You can use the Twitter REST API (rate limited).

It seems that the lang attribute is not available for all tweets in the Tweet2013 dataset
(see https://dev.twitter.com/blog/introducing-new-metadata-for-tweets)

@lintool @milesefron I updated the list of fields in #26 (comment).

telsayed avatar telsayed commented on May 29, 2024

I'd also like to have the following indexed as fields:

  • hashtags (i.e., that are mentioned in the tweet), [so that we can retrieve all tweets that have a specific hashtag].
  • mentions (usernames mentioned in the tweet)
  • URLs [so that we can get all tweets that point to a specific URL]
  • comments (that are written beside a retweet)

Sorry if duplicated.

stewhdcs avatar stewhdcs commented on May 29, 2024

@telsayed If the analyzer (Lucene's tokenization approach) is set up correctly (keeping preceding #'s and @'s), then you should be able to perform queries for these through the current index.

@JamesMcMinn is assigned task #23 to develop an appropriate analyzer. I will add your comments to that issue.
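
To illustrate the idea, here is a plain-Python stand-in for such an analyzer (not Lucene code, and not the analyzer developed in #23): a tokenizer that keeps a leading `#` or `@` attached to the token, feeding a toy inverted index. All names and sample texts are hypothetical.

```python
import re

# A token is an optional leading "#" or "@" followed by word characters.
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping '#' and '@' prefixes."""
    return TOKEN_RE.findall(text.lower())


def build_index(tweets):
    """Map each token to the set of tweet ids that contain it."""
    index = {}
    for tweet_id, text in tweets.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(tweet_id)
    return index


# Made-up tweets, for illustration only.
tweets = {
    1: "Indexing tweets for #TREC with @lintool",
    2: "TREC is a long-running evaluation series",
}
index = build_index(tweets)
print(index["#trec"], index["trec"])
```

Because the prefix survives tokenization, a query for the hashtag and a query for the bare word hit different postings, which is what makes hashtag and mention lookups possible without separate fields.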

amjedbj avatar amjedbj commented on May 29, 2024

In the latest version of the API specification https://github.com/lintool/twitter-tools/wiki/TREC-2013-API-Specifications status.retweet_status.id and status.retweet_status.user.screenname have been removed.

Even though retweets were considered irrelevant in the last two editions of the TREC Microblog track, these fields are helpful for social-network-based approaches. I used these two fields in my TREC 2011 and TREC 2012 runs.

milesefron avatar milesefron commented on May 29, 2024

I'll go ahead and put the retweeted_status.id and retweeted_user_id
elements back in the index.

As for the screenname element, do folks need that if we're already exposing
the user_id? i assumed having one would be enough. but let me know if not.

On Mon, May 6, 2013 at 4:43 PM, Lamjed Ben Jabeur wrote:

In the last version of API specification
https://github.com/lintool/twitter-tools/wiki/TREC-2013-API-Specifications
status.retweet_status.id and status.retweet_status.user.screenname have been removed.

Even though retweets are considered irrelevant in the two last editions of
TREC microblogs, these fields are helpful for social network based
approaches. I used these two fields in my trec2011 and trec2012 runs.



Miles Efron
Assistant Professor
Graduate School of Library and Information Science
University of Illinois, Urbana-Champaign

Latifa-AlMarri avatar Latifa-AlMarri commented on May 29, 2024

Any chance we can play with the APIs?
.......
Latifa
Qatar University

lintool avatar lintool commented on May 29, 2024

This task has been completed and results have been merged into the trec2013-api branch.
