
covid19_twitter's People

Contributors

jmbanda · tutubalinaev

covid19_twitter's Issues

Tweets in Indonesian are not found

Hello, according to your data description at http://www.panacealab.org/covid19/ (Table 1: Languages and their frequencies on the dataset), Indonesian (language code "id") ranks sixth among all languages by number of tweets. But I didn't find any Indonesian tweets in the JSON data I crawled. Have you encountered the same problem?

Best wishes
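For reference, a quick way to count tweets per language in a hydrated file — a sketch, assuming one JSON object per line with the standard Twitter v1.1 `lang` field; the file name is hypothetical:

```python
import json
from collections import Counter

def count_languages(jsonl_path):
    """Count tweets per language code in a hydrated JSON-lines file."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                tweet = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than aborting
            counts[tweet.get("lang", "und")] += 1
    return counts

# Example: count_languages("hydrated_tweets.jsonl").get("id", 0)
# gives the number of Indonesian tweets in the file.
```

If "id" is missing here but present in Table 1, the gap is in the crawl or hydration step, not the dataset itself.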

Link to dataset on Zenodo is broken

The README in version 1.0 says:

"Apparently github has a bandwidth limit on free accounts for large files, so the full dataset Version 1 will be available in Zenodo: https://doi.org/10.5281/zenodo.3723940"

But this link to the dataset on Zenodo throws an internal server error, with
Error identifier: 9eb9e7c76b9348b797b692c529dc143a

Only Tweet IDs

Hi,
Once I load the TSV file I can only find three columns. Is this correct, or should there be other columns as well?
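A quick way to inspect the header and confirm what the file contains — a sketch, assuming the TSV has a header row; the exact column names in the daily clean files are an assumption here:

```python
import csv

def tsv_columns(path):
    """Return the column names from the first row of a TSV file."""
    with open(path, encoding="utf-8") as f:
        return next(csv.reader(f, delimiter="\t"))

# Assumption: the daily clean files carry roughly tweet ID, date,
# and time; the full (non-clean) files add more columns.
```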

Latest Tweets

What about the last few days?
Are there any updates to this repo planned?

Hydrating Tweets tutorial

Hi,

Thank you for sharing this dataset. I am trying to use it for my master's dissertation.

I am just trying to work through the usage tutorial and am stuck at:

!python3 get_metadata.py -i clean-dataset-filtered.tsv -o hydrated_tweets -k api_keys.json

I get the error:

ImportError: cannot import name 'TweepError' from 'tweepy'

Following a search I gleaned that TweepError has been replaced by TweepyException in the current version. I have tried that, but still no luck. Do I need to use a specific Tweepy version?

EDIT: I have tried making changes to the get_metadata.py file, and I get output files with headings, but they are otherwise empty.

Thanks!
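For others hitting this: Tweepy 4.0 renamed TweepError to TweepyException, so a script written for v3 breaks on v4. A small compatibility shim can fetch the error class under whichever name the installed version exposes — a sketch, not the repo's own code:

```python
import importlib

def load_tweepy_error():
    """Return Tweepy's request-error class under its v3 name (TweepError)
    or its v4 name (TweepyException), whichever the installed version has."""
    tweepy = importlib.import_module("tweepy")
    return getattr(tweepy, "TweepError", None) or tweepy.TweepyException
```

Alternatively, pinning an older release (e.g. `pip install "tweepy<4"`) avoids editing get_metadata.py at all. Note that empty output files can also come from exhausted API rate limits or revoked keys, not just the exception rename.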

Issues about hydrating

Hello, I completed the pre-step successfully, but I ran into a problem at the hydration step.

After running get_metadata.py and entering the parameters, the output files I get ("hydrated_tweets" and so on) are blank (0 KB). What might be wrong? (I am sure my input file "clean-dataset-filtered.tsv" is valid.)

Finally, thank you for the code you provided, which is very inspiring for my research!

Geotagged subset?

Hi!

First of all, thanks for all your great work. This dataset is awesome.

I was wondering if it's possible for you to make public the subset of tweet IDs that are geotagged (i.e., that have the coordinates field). Most users have limited capacity for rehydration, and in my case I am interested specifically in the geotagged tweets. Rehydrating the whole dataset and filtering them out myself would be infeasible, but I noticed that your website has an interactive map of tweets based on their locations, so I assume such a subset already exists.

Thanks in advance!
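For anyone who does end up filtering after hydration, a sketch of the per-tweet check — assuming hydrated JSON-lines with the v1.1 `coordinates` field, which is null unless the user shared an exact location:

```python
import json

def geotagged_ids(jsonl_path):
    """Yield the IDs of hydrated tweets that carry exact GPS coordinates."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                tweet = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            if tweet.get("coordinates"):  # null/absent for non-geotagged tweets
                yield tweet["id_str"]
```

Tweets with only a coarse "place" (city-level bounding box) would need a separate check on the `place` field.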

Cannot unpack dailies

Hello! I'm trying to unpack daily .tar.gz files but running into the following error in Python:

import tarfile
t = tarfile.open('./covid19_twitter/dailies/2020-03-23/2020-03-23_clean.tar.gz')

(screenshot: Python traceback from tarfile.open, 2020-03-25)

And bash:

$ tar tvf ./covid19_twitter/dailies/2020-03-23/2020-03-23_clean.tar.gz
tar: This does not look like a tar archive

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
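Errors like these usually mean the file on disk is not actually gzip data — for example, an HTML error page or a Git LFS pointer saved under the .tar.gz name. A quick sanity check of the magic bytes (a sketch, for a local path):

```python
def looks_like_gzip(path):
    """True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

# If this returns False, the download was probably an error page or a
# Git LFS pointer rather than the archive itself; re-download the file.
```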

Anyhow, I really like your team's work. Thanks in advance!

Retrieve the tweets from ids

Hello, Mr. Banda.
Thanks to you and all co-authors for the amazing database.
To retrieve the tweets from IDs and obtain the tweet text, I used processing_code/getDataset_clean.py and processing_code/parse_json_extreme_cleantweets.py in a Jupyter Notebook. In the process I ran into an issue: after running the scripts, the output files are empty, and I could not retrieve the tweets. Could you please guide me?

Best regards,
Zarnigor Dzhuraeva!

Filter by language. Especially for English

Hi,

This is awesome work, and I can see how much effort it must have taken. I am really glad about that.
We will use this dataset for our Information Networks course, and I was wondering whether you have the data in a language-filtered format. If we filter it ourselves, we will have to hydrate 1.5 billion tweets only to find the 600 million that are in English. Given the limitations of our Twitter API access, that would be a huge burden.
Is there any chance you could do that, perhaps for English only? I believe you store the full (hydrated) tweets in JSON format.
That would be a huge help, and I believe English-language data would reach a wider community, since the tech and research world mostly works in English.
Thanks

"dailies/2020-12-20" and "dailies/2020-12-21" only include the cleaned dataset

Thank you very much for making the COVID-19 Twitter dataset available. While browsing the repository I noticed that the folders for "2020-12-20" and "2020-12-21" do not include the "dataset.tsv.gz" file; only "clean-dataset.tsv.gz" is included. Would it be possible to add the full dataset files?

Clustering data

Hello,
could you please share the code for clustering the data by term frequency?

Best regards,
Zarnigor.

Hello, I failed when using parse_json_extreme.py

I ran parse_json_extreme.py in the shell after downloading dataset.tsv, but what I got was an empty file. I searched on Google and found that something goes wrong when the program reaches tweet = json.loads(line): it always fails and continues. Did I forget something? I don't know much about programming or Python, and I'm confused. Can you help me?

When I run the program, I type: python .\parse_json_extreme.py .\2020-03-22-dataset.tsv
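A likely cause (an assumption based on the script names): dataset.tsv contains tab-separated tweet IDs, not JSON, while parse_json_extreme.py expects the hydrated JSON-lines output, so json.loads fails on every line and the result is empty. A small probe to tell the two formats apart:

```python
import json

def first_parse_error(path, max_lines=5):
    """Return (line_number, error) for the first sampled line that is not
    valid JSON, or None if the sampled lines all parse. A TSV of tweet IDs
    fails immediately; a hydrated JSON-lines file passes."""
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, 1):
            if n > max_lines:
                break
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                return (n, str(e))
    return None
```

If this reports an error on line 1, hydrate the IDs first (e.g. with get_metadata.py) and run the parser on the hydrated output instead.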

Tools to hydrate the dataset

Hi there,

Thanks for your excellent work on collecting and sharing this dataset.

As you mentioned on the website, there is a set of tools that can be used to hydrate the dataset.
Could you provide a list of these tools?

Thanks and I really appreciate your help.

Lack of some data in the full clean dataset

In my understanding, the set of tweet IDs from clean_language_en.tsv should be a proper subset of the tweet IDs from full_dataset_clean.tsv. However, that is not the case: some IDs are not present in the full clean dataset. I am wondering how and why this happens. I can provide a list of the missing tweet IDs if needed.
Thanks in advance!
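A sketch of the check described above — assuming both files are TSVs with a header row and the tweet ID in the first column (adjust if your copies differ):

```python
def missing_ids(subset_tsv, full_tsv):
    """Return tweet IDs present in subset_tsv but absent from full_tsv.
    An empty result confirms the expected subset relation."""
    def ids(path):
        with open(path, encoding="utf-8") as f:
            next(f)  # skip the header row
            return {line.split("\t")[0].strip() for line in f if line.strip()}
    return ids(subset_tsv) - ids(full_tsv)

# Example: missing_ids("clean_language_en.tsv", "full_dataset_clean.tsv")
```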

Language and place Location unavailable before 07-2020

Hello, it seems that the language and place location fields are not present in the datasets before July 2020. It was my understanding that they were added to all tweets in version 20.
Was I mistaken and they were only added to new tweets scraped from version 20 onward, or is this a mistake?

Thank you for your amazing dataset and work :)

Tweet Label

Good morning!!
How is it going?
I was browsing through your data.
Do you have labels for the tweets in terms of True (truthful/correct) or False (fake/disinformation)?

Best Wishes,
Fernando Durier
