Comments (3)
Did you create the uberjar? See the note in the README just after the docker build
command. Sorry, I'll move the note and also extend the instructions on how to compile the Java code (that step is mandatory). Please let me know whether running mvn package
before docker build ...
fixes your problem. Thanks!
from news-crawl.
That's the issue. While running the command
mvn clean package
I'm getting this error:
Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path /usr/lib/jvm/java-11-openjdk-amd64/../lib/tools.jar
The tools.jar file was removed in Java 9 and is absent from all later versions. There are two ways to address this problem:
- use Java 8
- exclude the transitive dependency in the pom.xml:
  <dependency>
    <groupId>com.digitalpebble.stormcrawler</groupId>
    <artifactId>storm-crawler-warc</artifactId>
    <version>${storm-crawler.version}</version>
    <exclusions>
      <exclusion>
        <!-- not available with Java 9 and higher -->
        <groupId>jdk.tools</groupId>
        <artifactId>jdk.tools</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
I'll also look into how to fix this problem.
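For the first option (use Java 8), a quick way to see which installed JDK still ships tools.jar is to look for lib/tools.jar under each JDK home. The following is only a minimal sketch: the function names has_tools_jar and find_jdk_with_tools_jar are hypothetical helpers, not part of Maven or the JDK, and the layout under /usr/lib/jvm is assumed from the path in the error message above.

```shell
#!/bin/sh
# Sketch: report whether a JDK installation still ships lib/tools.jar.
# JDK 8 and earlier include it; JDK 9 and later do not.
# has_tools_jar is a hypothetical helper name (an assumption for illustration).
has_tools_jar() {
    jdk_home="$1"
    [ -f "$jdk_home/lib/tools.jar" ]
}

# Print the first candidate JDK home that contains lib/tools.jar.
# Returns non-zero when none of the candidates qualifies.
find_jdk_with_tools_jar() {
    for candidate in "$@"; do
        if has_tools_jar "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Assuming the JDKs are installed under /usr/lib/jvm, something like JAVA_HOME="$(find_jdk_with_tools_jar /usr/lib/jvm/*)" && export JAVA_HOME && mvn clean package would then run the build with the first matching JDK, and would skip the build if no such JDK is found.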