erdincuzun / webce
WEB Content Extractor (WEBCE) is an open-source project that provides two effective algorithms for eliminating uninformative blocks and efficiently extracting content blocks from web pages. Moreover, WEBCE produces an XML file that contains the main content, the headline, and information about the article for a given web page.
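The description does not say how WEBCE's two algorithms work, so the following is only a minimal sketch of the general idea under stated assumptions: it splits a page into block-level elements, drops blocks that look uninformative using a simple word-count and link-density heuristic (a generic technique swapped in for illustration, not WEBCE's method), and writes the surviving text plus the page title into an XML document. The names `BlockCollector`, `extract`, `min_words`, `max_link_density`, and the XML elements `article`, `headline`, and `main` are hypothetical and do not reflect WEBCE's actual API or output schema.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical set of tags treated as content blocks (nesting is flattened).
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section"}


class BlockCollector(HTMLParser):
    """Collects the text of each block plus the part of it inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of [block_text, link_text] pairs
        self.title = ""
        self._current = None    # block being filled; None before the first block
        self._in_title = False
        self._link_depth = 0
        self._skip = 0          # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._link_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()       # text after a nested block starts a new block
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return
        if self._in_title:
            self.title += data
        elif self._current is not None:
            self._current[0] += data
            if self._link_depth:
                self._current[1] += data

    def _flush(self):
        # Keep the finished block only if it has visible text, then start a fresh one.
        if self._current is not None and self._current[0].strip():
            self.blocks.append(self._current)
        self._current = ["", ""]

    def close(self):
        super().close()         # delivers any buffered text first
        self._flush()


def extract(html, min_words=10, max_link_density=0.3):
    """Return an XML string with the page title and the informative blocks."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()

    kept = []
    for text, link_text in parser.blocks:
        text = " ".join(text.split())
        link_len = len(" ".join(link_text.split()))
        density = link_len / len(text)  # text is non-empty by construction
        # Short or link-heavy blocks (menus, teasers) are treated as uninformative.
        if len(text.split()) >= min_words and density <= max_link_density:
            kept.append(text)

    root = ET.Element("article")
    ET.SubElement(root, "headline").text = " ".join(parser.title.split())
    ET.SubElement(root, "main").text = "\n".join(kept)
    return ET.tostring(root, encoding="unicode")


page = """<html><head><title>Sample Story</title></head><body>
<div class="nav"><a href="/">Home</a> <a href="/news">News</a> <a href="/about">About</a></div>
<p>This paragraph is the actual article text and it is long enough to be kept by the filter.</p>
<p>Short teaser <a href="/x">read more</a></p>
</body></html>"""

print(extract(page))
```

In this sample the navigation bar and the teaser are dropped because they are short and mostly link text, while the long paragraph survives into the `main` element. Thresholds such as `min_words` and `max_link_density` would need tuning per site; a real extractor like WEBCE may use entirely different features.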