Apache Nutch README

For the latest information about Nutch, please visit our website at:

https://nutch.apache.org/

and our wiki, at:

https://cwiki.apache.org/confluence/display/NUTCH/Home

To get started with Nutch, read the tutorial:

https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial

Contributing

To contribute a patch, follow these instructions (note that installing Hub is not strictly required, but is recommended).

0. Download and install hub.github.com
1. File a JIRA issue for your fix at https://issues.apache.org/jira/projects/NUTCH/issues
- you will get an issue key of the form NUTCH-xxx, where xxx is the issue number.
2. git clone https://github.com/apache/nutch.git
3. cd nutch
4. git checkout -b NUTCH-xxx
5. edit files (please try and include a test case if possible)
6. git status (make sure it shows what files you expected to edit)
7. Make sure that your code complies with the [Nutch code formatting template](https://raw.githubusercontent.com/apache/nutch/master/eclipse-codeformat.xml), which is basically two-space indents
8. git add <files>
9. git commit -m "fix for NUTCH-xxx contributed by <your username>"
10. git fork
11. git push -u <your git username> NUTCH-xxx
12. git pull-request

IDE setup

Generate Eclipse project files

ant eclipse

and follow the instructions in Importing existing projects.

For IntelliJ IDEA, first install the IvyIDEA plugin, then run ant eclipse.

Then open the project in IntelliJ. You may see popups like "Ant build scripts found", "Frameworks detected - IvyIDEA Framework detected". Just follow the simple steps in these dialogs.

You must configure nutch-site.xml before running. Make sure you have added the http.agent.name and plugin.folders properties. plugin.folders normally points to <project_root>/build/plugins.
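
As a minimal sketch of that configuration, the script below writes a nutch-site.xml containing only the two properties named above. It assumes the file lives in conf/ under the project root and that the script is run from the project root (or with NUTCH_HOME set). The agent name MyNutchCrawler is a hypothetical placeholder. Note that the script overwrites any existing conf/nutch-site.xml.

```shell
#!/bin/sh
set -eu

# Assumption: run from the project root, or set NUTCH_HOME to point at it.
NUTCH_HOME="${NUTCH_HOME:-$(pwd)}"
# Hypothetical agent name; replace it with one that identifies your crawler.
AGENT_NAME="${AGENT_NAME:-MyNutchCrawler}"

# Assumption: the configuration file is read from conf/ under the project root.
mkdir -p "$NUTCH_HOME/conf"

# Warning: this overwrites any existing conf/nutch-site.xml.
cat > "$NUTCH_HOME/conf/nutch-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>$AGENT_NAME</value>
  </property>
  <property>
    <name>plugin.folders</name>
    <value>$NUTCH_HOME/build/plugins</value>
  </property>
</configuration>
EOF

echo "Wrote $NUTCH_HOME/conf/nutch-site.xml"
```

An absolute path is used for plugin.folders here so that the value does not depend on the working directory of the IDE run configuration.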

Now create a Java Application run configuration, choose org.apache.nutch.crawl.Injector as the main class, and add two paths as arguments. The first is the crawldb directory; the second is the URL directory from which the injector reads URLs. Now run your configuration.
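
The two arguments can be prepared along these lines. This is only a sketch: the crawl/crawldb and urls locations and the seed.txt file name are assumptions chosen for illustration, and any writable paths should work. The URL directory holds plain-text files with one URL per line.

```shell
#!/bin/sh
set -eu

# Hypothetical locations; adjust them to your setup.
CRAWLDB="${CRAWLDB:-$(pwd)/crawl/crawldb}"
URL_DIR="${URL_DIR:-$(pwd)/urls}"

# Create the URL directory with a single seed file, one URL per line.
mkdir -p "$URL_DIR"
printf '%s\n' "https://nutch.apache.org/" > "$URL_DIR/seed.txt"

# Print the two values in the order the injector expects:
# crawldb directory first, URL directory second.
echo "Program arguments: $CRAWLDB $URL_DIR"
```

The printed line can be pasted into the program arguments field of the run configuration.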

If you still see the error No plugins found on paths of property plugin.folders="plugins", you can update plugin.folders in nutch-default.xml. This is a quick fix, but it should be avoided.
