Comments (4)
Please feel free to patch. I'm traveling and can get to it in a couple days but not right now. If I don't hear otherwise, I'll work on it in a day or two.
And feel free to modify if an approach other than those try/gets would work. None were obvious to me.
On May 20, 2013, at 1:02 PM, Ian Soboroff wrote:
I found some statuses in my crawl that don't have the retweet_count field. This causes an NPE:
java.lang.NullPointerException
at cc.twittertools.corpus.data.Status.fromJson(Status.java:162)
at cc.twittertools.corpus.data.JsonStatusBlockReader.next(JsonStatusBlockReader.java:44)
at cc.twittertools.corpus.data.JsonStatusCorpusReader.next(JsonStatusCorpusReader.java:48)
at cc.twittertools.search.indexing.IndexStatuses.main(IndexStatuses.java:138)
Sure enough, Status.fromJson tries to set the retweetCount by blindly following obj.get()'s result without checking it. Since it's an NPE, I can only get the stack trace by running in jdb.
I'm happy to submit a patch to check this case, but I want to know -- is there any reason to expect any of these fields to be in a given status? If not, is there a cleaner way to wrap these checks than using all these try-catch blocks?
from twitter-tools.
I'll put up a basic patch in a pull request, and leave the redesign for some comments from Jimmy. The trivial answer is methods on a JsonObject like get_asLong("field") which can catch the null more cleanly, but Jimmy may have a better answer from the style perspective.
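For illustration, here is a minimal sketch of the kind of null-safe getter described above. It is not the patch itself: the class name `SafeFields`, its method signatures, and the default-value parameter are all hypothetical, and a plain `Map<String, Object>` stands in for the parsed JSON object so the example runs without any JSON library on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of twitter-tools or any JSON library.
final class SafeFields {
    private SafeFields() {}

    // Returns the field as a long, or defaultValue when the field is
    // absent, explicitly null, or not a number.
    static long getAsLong(Map<String, Object> obj, String field, long defaultValue) {
        Object value = obj.get(field);
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        return defaultValue;
    }

    // Returns the field as a String, or defaultValue when the field is
    // absent, explicitly null, or not a string.
    static String getAsString(Map<String, Object> obj, String field, String defaultValue) {
        Object value = obj.get(field);
        return (value instanceof String) ? (String) value : defaultValue;
    }

    public static void main(String[] args) {
        // A status with no retweet_count field, like the ones in the crawl.
        Map<String, Object> status = new HashMap<>();
        status.put("id", 123L);
        status.put("text", "hello");

        long retweets = getAsLong(status, "retweet_count", 0L);
        System.out.println(retweets); // prints 0
    }
}
```

With helpers of this shape, each field read becomes a single call with an explicit fallback, so the per-field try/catch blocks are not needed. Whether 0 is the right default for a missing retweet_count is a separate design question.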
This patch worked for me, completing a full index in about a day and a half on my Mac Pro running entirely on local storage. I'm going to mark this closed and merge the patch.