craw's Issues
Enable pagination to get all the subreddit URLs
Construct a feed for a set of tags
A rough idea:
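This would be a synchronous process. The interface could be something like:
import java.util.List;
import java.util.Set;
interface FeedGenerator {
    // Blocks until the feed for the given topics has been constructed.
    List<Article> genFeed(Set<String> topics);
}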
This call can block for as long as it takes to construct the feed. You can use concurrency inside, but the final call must block. This will be a common pattern for all the feed transformations so that we can mix and match them.
For example, let's say we now want to select the final set of articles from a cluster. This cluster could pre-exist or be created on demand (out of scope for now).
We could just implement the above interface: the topics in this case would be the tags of the clusters produced, and the implementation would supply the heuristics for selecting articles. More importantly, we would already know how to compose interfaces of this kind, so we can have various such implementations and compose them in different ways to improve feed generation.
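The composition idea above could be sketched roughly as follows. This is only an illustration, not the project's code: the `Article` fields, the `FeedGenerators` helper class, and the `merge` and `limit` combinators are all hypothetical names introduced here. Only the `FeedGenerator` interface itself comes from the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal article model; the real Article type is not shown in the issue.
final class Article {
    final String url;
    final Set<String> tags;
    Article(String url, Set<String> tags) { this.url = url; this.tags = tags; }
}

// The interface proposed in the issue: a blocking call that returns the finished feed.
interface FeedGenerator {
    List<Article> genFeed(Set<String> topics);
}

// Hypothetical combinators showing how blocking generators can be mixed and matched.
final class FeedGenerators {
    private FeedGenerators() {}

    // Runs each generator in turn and concatenates the results, dropping duplicate URLs.
    static FeedGenerator merge(List<FeedGenerator> parts) {
        return topics -> {
            Set<String> seen = new LinkedHashSet<>();
            List<Article> out = new ArrayList<>();
            for (FeedGenerator part : parts) {
                for (Article a : part.genFeed(topics)) {
                    if (seen.add(a.url)) {
                        out.add(a);
                    }
                }
            }
            return out;
        };
    }

    // Wraps a generator and keeps only its first n articles.
    static FeedGenerator limit(FeedGenerator inner, int n) {
        return topics -> {
            List<Article> all = inner.genFeed(topics);
            int end = Math.max(0, Math.min(n, all.size()));
            return new ArrayList<>(all.subList(0, end));
        };
    }
}
```

Because every combinator returns another `FeedGenerator`, a cluster-based implementation could be dropped in wherever a simple one is used, for example `FeedGenerators.limit(FeedGenerators.merge(parts), 20)`.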
API service
- given a subreddit - get urls
- get all subreddits { name, followers }
- given urls - get article content
- link - get current html
- given two html - send diff
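One way to read the list above is as a single service interface with one method per bullet. The sketch below is an assumption throughout: `CrawApi`, `SubredditInfo`, `NaiveHtmlDiff`, and every method name and return type are hypothetical, since the issue does not specify them. The diff helper is a deliberately naive line-set comparison, not a real diff algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model for the { name, followers } pair in the list.
final class SubredditInfo {
    final String name;
    final long followers;
    SubredditInfo(String name, long followers) { this.name = name; this.followers = followers; }
}

// Hypothetical service interface: one method per bullet. Names and types are assumptions.
interface CrawApi {
    List<String> urlsFor(String subreddit);                 // given a subreddit - get urls
    List<SubredditInfo> allSubreddits();                    // get all subreddits
    Map<String, String> articleContent(List<String> urls);  // url -> article content
    String currentHtml(String link);                        // link - get current html
    List<String> diff(String oldHtml, String newHtml);      // given two html - send diff
}

final class NaiveHtmlDiff {
    private NaiveHtmlDiff() {}

    // Naive line-set diff: "- " lines occur only in oldHtml, "+ " lines only in newHtml.
    // It ignores line order and repeated lines, which a proper diff would not.
    static List<String> diff(String oldHtml, String newHtml) {
        Set<String> oldLines = new LinkedHashSet<>(Arrays.asList(oldHtml.split("\n")));
        Set<String> newLines = new LinkedHashSet<>(Arrays.asList(newHtml.split("\n")));
        List<String> out = new ArrayList<>();
        for (String line : oldLines) {
            if (!newLines.contains(line)) {
                out.add("- " + line);
            }
        }
        for (String line : newLines) {
            if (!oldLines.contains(line)) {
                out.add("+ " + line);
            }
        }
        return out;
    }
}
```

Returning plain Java types keeps the interface directly usable from JVM clients; an implementation of `CrawApi.diff` could delegate to `NaiveHtmlDiff.diff` until a proper diff is chosen.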
Modularize parseJSON
Add OAuth for Reddit API
Extract the subreddits
Return types of interface
I think returning a Java model would be better, since I can use it directly in JVM clients.
Add .gitignore
Interface doubts
- https://github.com/entranceplus/craw/blob/master/src/main/java/com/ep/LinkExtractor/ExtractionInterface/LinkFunc.java#L9
What URL is expected here? [Raghav] No URL is expected now; just invoke the method.
- https://github.com/entranceplus/craw/blob/master/src/main/java/com/ep/LinkExtractor/ExtractionInterface/LinkFunc.java#L6
This could just be the subreddit name. [Raghav] Just pass the subreddit, e.g. soccer.