0x38 / simple-tika-server Goto Github PK
View Code? Open in Web Editor NEWThis project forked from gselva/simple-tika-server
Apache Tika as a http service, PUT files and get the metadata as JSON
This project forked from gselva/simple-tika-server
Apache Tika as a http service, PUT files and get the metadata as JSON
Simple Tika Server https://github.com/gselva ========================== Tika as a HTTP service to parse and get metadata and the textual content of documents (as json) such as below. Sample Invocation: ------------------ curl -T tika.pdf http://localhost:8080/tika/fulldata Sample Output: ------------- { "metadata": { "Author":"Apache", "Content-Length":243166, "Content-Type":"application/pdf", "Creation-Date":"2010-05-11T13:39:32Z", "Keywords":"", "Last-Modified":"2010-05-11T13:39:32Z", "created":"Wed May 11 14:39:32 BST 2010", "creator":"Google Chrome", "producer":"Mac OS X 10.6.7 Quartz PDFContext", "resourceName":"tika.pdf", "subject":"", "title":"Apache Tika - Apache Tika", "xmpTPg:NPages":4 }, "text":"Apache Tika\nApache Tika - a content analysis toolkit" } Based on https://github.com/maxcom/tikaserver-ex (thanks Max!) More info --------- Tika as server : https://issues.apache.org/jira/browse/TIKA-593 Json output : https://issues.apache.org/jira/browse/TIKA-213 Build and run ------------- Install Maven if you haven't, then, under project root folder do: mvn jetty:run Deploy ------ mvn war:war Then drop the war into the webapps folder of a servlet container like Jetty or Apache Tomcat. If using Simple Tika Server for parsing local files with HTTP GET, you'll need to setup JNDI entries, see below. Usage ----- HTTP PUT a document to parse with Tika PUT document to get metadata curl -T pom.xml http://localhost:8080/tika/metadata returns metadata as JSON PUT document to get text curl -T pom.xml http://localhost:8080/tika/text returns textual content, if any, as json PUT document to get metadata and text curl -T pom.xml http://localhost:8080/tika/fulldata returns metadata and textual content as json HTTP GET to parse a locally available document with Tika GET Metadata for files local to server ('metadata' option) http://localhost:8080/tika/metadata/localfiles/tika.pdf returns metadata as JSON 'localfiles' is a JNDI value that returns a URL (see jetty-env.xml for example) tika.pdf should be made available there before calling http://localhost:8080/tika/metadata/wikipedia/wiki/Douglas_Adams returns metadata as JSON 'wikipedia' is a JNDI value that returns a URL (see jetty-env.xml for example) "wiki/Douglas_Adams" is the resource you want to parse http://localhost:8080/tika/metadata/wikipedia/w/index.php?title=Talk:Douglas_Adams&action=history document part can have query params as well "w/index.php?title=Talk:Douglas_Adams&action=history" is treated as the source document here GET Text for files local to server ('text' option) http://localhost:8080/tika/text/localfiles/tika.pdf returns textual content, if any, as json 'localfiles' is a JNDI value that returns a URL (see jetty-env.xml for example) tika.pdf should be made available there before calling GET Metadata and Text for files local to server ('fulldata' option) http://localhost:8080/tika/fulldata/localfiles/tika.pdf returns metadata and textual content as json 'localfiles' is a JNDI value that returns a URL (see jetty-env.xml for example) tika.pdf should be made available there before calling HTTP Codes returned ------------------- 200 - Ok 404 - Document not found 415 - Unknown file type 422 - Unparsable document of known type (password protected documents and unsupported versions like Biff5 Excel) 500 - Internal error
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.