concrete-java's Introduction

Copyright 2012-2019 Johns Hopkins University HLTCOE. All rights reserved. See LICENSE in the project root directory.

Concrete Java

Java libraries for the Concrete HLT data schema.

JavaDoc API documentation is hosted on javadoc.io

Generating Thrift Java files

Run generate.sh with a single argument: the path to the Thrift files in the concrete repository.

As an example, if the concrete repo and this repo are in the same directory, run:

./generate.sh ../concrete/thrift

Thrift 0.10.0 must be installed and on your $PATH.

Building and Installing

Maven is used to build concrete-java:

mvn clean package

To install the jars into your local maven repository, run:

mvn clean install

Using an IDE

If you are using an IDE such as Eclipse or IntelliJ, you will likely see many build errors until annotation processing is set up, because some modules use FreeBuilder. See the FreeBuilder readme for instructions on configuring your IDE.

Maven Dependencies

Replace x.y.z below with the current version, which is listed in the pom.xml file.

<dependency>
  <groupId>edu.jhu.hlt</groupId>
  <artifactId>concrete-core</artifactId>
  <version>x.y.z</version>
</dependency>
<dependency>
  <groupId>edu.jhu.hlt</groupId>
  <artifactId>concrete-safe</artifactId>
  <version>x.y.z</version>
</dependency>
<dependency>
  <groupId>edu.jhu.hlt</groupId>
  <artifactId>concrete-util</artifactId>
  <version>x.y.z</version>
</dependency>
<dependency>
  <groupId>edu.jhu.hlt</groupId>
  <artifactId>concrete-validation</artifactId>
  <version>x.y.z</version>
</dependency>

concrete-java's People

Contributors

azpoliak, cash, charman, cjmay, ctongfei, fmof, forkunited, jacksullivan, jamesmayfield, jayded, jennsleeman, maxthomas, mdredze, mgormley, tomlippincott, tturpen, twolfe18, vandurme

concrete-java's Issues

Difficulty with first installation

I'm trying to start with Concrete, first by parsing the Gigaword corpus. The script ingest-gw.sh gives me this error:

[minhle@fs0 gigaword]$ ./ingest-gw.sh /home/minhle/scratch/gigaword ../../output/gigaword
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: edu/jhu/hlt/utilt/io/NotFileException
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: edu.jhu.hlt.utilt.io.NotFileException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more

This is what I have done so far:

git clone https://github.com/hltcoe/concrete-java.git
rm -rf concrete-java/.git
cd concrete-java
mvn clean package
cd ingesters/gigaword
./ingest-gw.sh /home/minhle/scratch/gigaword ../../output/gigaword

BTW, following the instructions in the ingesters/gigaword folder gives me this error:

mvn clean compile assembly:single
...
[ERROR] Failed to execute goal on project concrete-ingesters-gigaword: Could not resolve dependencies for project edu.jhu.hlt:concrete-ingesters-gigaword:jar:4.12.1-SNAPSHOT: The following artifacts could not be resolved: edu.jhu.hlt:concrete-util:jar:4.12.1-SNAPSHOT, edu.jhu.hlt:concrete-validation:jar:4.12.1-SNAPSHOT, edu.jhu.hlt:concrete-ingesters-base:jar:4.12.1-SNAPSHOT: Failure to find edu.jhu.hlt:concrete-util:jar:4.12.1-SNAPSHOT in https://clojars.org/repo was cached in the local repository, resolution will not be reattempted until the update interval of clojars has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

Concrete should not depend on specific SLF4J backends

Concrete should not depend on specific SLF4J backends. As a library, Concrete should depend only on the SLF4J API, not on any specific implementation such as Log4j. The actual backend should be chosen by the user of the library, i.e. the person who builds an application using Concrete.

Having a reusable library depend on an SLF4J backend introduces a high risk of ending up with multiple backends on the classpath when the library is used within an application.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/.../.m2/repository-ukp/org/apache/logging/log4j/log4j-slf4j-impl/2.5/log4j-slf4j-impl-2.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/.../.m2/repository-ukp/org/slf4j/slf4j-log4j12/1.7.21/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Concretely affected (sorry!) are

  • concrete-safe 4.10.0
  • concrete-util 4.10.0
  • maybe others ;)

Splitting hashtags

Some hashtags are tokenized incorrectly if they occur at the end of a tweet. The main example is "#FelizDiaDeLaM…", where the '…' is the Unicode ellipsis; it gets tokenized as '#', 'FelizDiaDeLaM', '…'. My guess is that this error will occur on all hashtags that look like "#[[:alnum:]]+[^[:alnum:]\s]+"
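To make the guessed pattern easier to test, here is a minimal Java sketch that translates it into java.util.regex syntax and checks whole tokens against it. The class and method names are hypothetical and are not part of tift or concrete-java. Note that \p{Alnum} is ASCII-only by default in Java, which may or may not match what the tokenizer treats as alphanumeric.

```java
import java.util.regex.Pattern;

// Hypothetical helper for reproducing the report; not part of tift or concrete-java.
final class HashtagPatternSketch {
    // Java translation of "#[[:alnum:]]+[^[:alnum:]\s]+":
    // '#', one or more ASCII alphanumerics, then one or more characters
    // that are neither ASCII alphanumeric nor whitespace.
    private static final Pattern SUSPECT =
        Pattern.compile("#\\p{Alnum}+[^\\p{Alnum}\\s]+");

    // Returns true when the entire token has the suspect shape.
    static boolean isSuspectHashtag(String token) {
        return SUSPECT.matcher(token).matches();
    }

    public static void main(String[] args) {
        // \u2026 is the Unicode ellipsis from the example in the report.
        String[] tokens = {"#FelizDiaDeLaM\u2026", "#FelizDiaDeLaMadre", "#tag!!"};
        for (String t : tokens) {
            System.out.println(t + " -> " + isSuspectHashtag(t));
        }
    }
}
```

Running candidate tokens through a check like this before feeding them to the tokenizer could help confirm whether every hashtag of this shape is split, or only those ending in the ellipsis.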

Numbers?

Should numbers receive different tags from unknown words? Currently both are 'OTHER'.

Missed URLs

While the string 'http' or 'https' is not necessarily always a URL, if it occurs at the end of a tweet (or particularly when followed only by '...') and after another URL, it should likely be tagged as such. (I haven't looked at how tift actually works, so this reasoning may or may not be implementable in the current framework.)

Example tweet: "'La traición vendrá de un general de alto rango que generará un gran caos' - http://t.co/MgLypirfTV http…"
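The heuristic described above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: tokens are split on whitespace, "after another URL" is read as "some earlier token is a complete http(s) URL", and the class and method names are hypothetical. It does not reflect how tift is actually implemented.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed heuristic; not tift's actual logic.
final class TruncatedUrlSketch {
    // A complete URL token such as http://t.co/MgLypirfTV.
    private static final Pattern FULL_URL = Pattern.compile("https?://\\S+");
    // A bare "http" or "https", optionally followed by "..." or the Unicode ellipsis.
    private static final Pattern TRUNCATED =
        Pattern.compile("https?(\\.\\.\\.|\u2026)?");

    // True when the last token looks like a cut-off URL and an earlier token is a full URL.
    static boolean endsWithTruncatedUrl(String tweet) {
        String[] tokens = tweet.trim().split("\\s+");
        String last = tokens[tokens.length - 1];
        if (!TRUNCATED.matcher(last).matches()) {
            return false;
        }
        for (int i = 0; i < tokens.length - 1; i++) {
            if (FULL_URL.matcher(tokens[i]).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tail of the example tweet, with the ellipsis written as \u2026.
        String tweet = "un gran caos' - http://t.co/MgLypirfTV http\u2026";
        System.out.println(endsWithTruncatedUrl(tweet));
    }
}
```

Requiring an earlier full URL keeps ordinary sentences that merely end in the word "http" from being tagged, which is the caution the report itself raises.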
