Comments (7)

kbastani commented on July 17, 2024

Great question. They would be considered distinct documents with the same training labels.

I originally went with an approach that extracted a set of sentences from a source text and ran the training algorithm sentence by sentence. I did this because repetition was important for training good models. That is no longer the case, as training now focuses more on the quality of the training examples.

Your idea of sending multiple documents during training already works. However, each document is stored as a "Data" node with no unique identity, such as a URL that identifies the document's text.

What I am going to do is improve the data model to include an optional document identifier. This would be something you pass along during training:

{
    "documents": [
        {
            "uri": "http://en.wikipedia.org/wiki/Interoperability",
            "text": "Interoperability is the ability of making systems and organizations work together.",
            "label": [
                "Computing terminology",
                "Telecommunications engineering",
                "Interoperability",
                "Product testing"
            ]
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Information_technology",
            "text": "Information technology (IT) is the application of computers and telecommunications equipment to store, retrieve, transmit and manipulate data, often in the context of a business or other enterprise.",
            "label": [
                "Information technology",
                "Media technology"
            ]
        }
    ]
}
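For anyone assembling this request programmatically, here is a minimal Python sketch of building the proposed payload. The helper names `make_document` and `make_training_payload` are hypothetical, and the `uri` field follows the proposal in this comment, so it assumes the endpoint accepts that shape once the data model change lands.

```python
import json


def make_document(text, labels, uri=None):
    """Build one training document in the proposed shape.

    `uri` is the optional identifier; it is left out of the result when not given.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if not labels:
        raise ValueError("at least one label is required")
    document = {}
    if uri is not None:
        document["uri"] = uri
    document["text"] = text
    document["label"] = list(labels)
    return document


def make_training_payload(documents):
    """Wrap the documents in a top-level "documents" array and serialize to JSON."""
    return json.dumps({"documents": list(documents)}, indent=4)


payload = make_training_payload([
    make_document(
        "Interoperability is the ability of making systems and organizations work together.",
        ["Computing terminology", "Interoperability"],
        uri="http://en.wikipedia.org/wiki/Interoperability",
    ),
    # The identifier is optional, so a document without a URI is built the same way.
    make_document(
        "Information technology (IT) is the application of computers to store and transmit data.",
        ["Information technology", "Media technology"],
    ),
])
print(payload)
```

Serializing with `json.dumps` rather than hand-writing the string keeps quoting and escaping correct regardless of what the document text contains.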

Let me know what you think.

from graphify.

moooji commented on July 17, 2024

Thanks a lot for the explanation, and yes, this improved data model would be exactly what I was looking for! 👍 Roughly how many training samples (10k, 100k, 1M) would you say is a good amount for your algorithm, and would there be a big difference between a few large documents and many small documents (like tweets)?


kbastani commented on July 17, 2024

In the movie review dataset, as few as 200 documents are enough to train a model that classifies correctly 60% of the time. Accuracy increases with the number of documents, although eventually this comes at the cost of performance. I'm working on putting together a set of guidelines based on the examples.

As for document size, batching tweets with the same hashtags into one document is equivalent to submitting them individually, one by one. All content is treated equally during training. Good generalizations come from content whose grammar is uniform enough for patterns to carry across a large set of examples. Since the training model performs grammar induction, having many movie reviews by the same author would be less effective than having all reviews in the training data written by different people.
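To make the two tweet-submission styles concrete, here is a small Python sketch that prepares the same tweets either one document per tweet or one document per hashtag set. Using hashtags as labels, the function names, and joining tweets with spaces are all assumptions for illustration; only the `text` and `label` field names come from the examples in this thread.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")


def hashtags(tweet):
    """Return the tweet's hashtags, lowercased, de-duplicated and sorted."""
    return tuple(sorted({tag.lower() for tag in HASHTAG.findall(tweet)}))


def individual_documents(tweets):
    """One training document per tweet, labelled with that tweet's hashtags."""
    return [
        {"text": tweet, "label": list(hashtags(tweet))}
        for tweet in tweets
        if hashtags(tweet)  # tweets without hashtags have no label, so skip them
    ]


def batched_documents(tweets):
    """One training document per distinct hashtag set, with the tweets joined."""
    groups = defaultdict(list)
    for tweet in tweets:
        tags = hashtags(tweet)
        if tags:
            groups[tags].append(tweet)
    return [
        {"text": " ".join(group), "label": list(tags)}
        for tags, group in groups.items()
    ]


tweets = [
    "Loved the new film #movies",
    "Great cast, weak script #Movies",
    "Battery life is impressive #tech",
    "Just had lunch",
]
print(len(individual_documents(tweets)), "individual documents")
print(len(batched_documents(tweets)), "batched documents")
```

Per the comment above, either shape should train equivalently, so the choice mostly affects how many requests are sent.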


cicero19 commented on July 17, 2024

I attempted to train a model with the example you give and there seem to be a few issues. Is there an issue with my installation?

C:\Users>curl -H "Content-Type: application/json" -d '{"label": ["Document classification"], "text": ["Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach."]}' http://localhost:7474/service/graphify/training
curl: (3) [globbing] bad range in column 6
curl: (6) Could not resolve host: text
curl: (3) [globbing] bad range in column 6
{"error":"[org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433),
org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521),
org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:442),
org.codehaus.jackson.impl.ReaderBasedParser._handleUnexpectedValue(ReaderBasedParser.java:1198),
org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:485),
org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2770),
org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2718),
org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863),
org.neo4j.nlp.ext.PatternRecognitionResource.training(PatternRecognitionResource.java:52),
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method),
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source),
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source),
java.lang.reflect.Method.invoke(Unknown Source),
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60),
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205),
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75),
org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139),
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288),
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147),
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108),
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147),
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84),
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469),
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400),
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349),
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339),
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416),
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537),
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699),
javax.servlet.http.HttpServlet.service(HttpServlet.java:848),
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698),
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:505),
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:211),
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096),
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432),
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175),
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030),
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136),
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52),
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97),
org.eclipse.jetty.server.Server.handle(Server.java:445),
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268),
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229),
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358),
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601),
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532),
java.lang.Thread.run(Unknown Source)]"}
C:\Users>


kbastani commented on July 17, 2024

It looks like the JSON request was malformed. I think the Windows command line treats single and double quotes differently. You may want to try swapping them: use double quotes where you have single quotes, and single quotes where you have double quotes.
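One way to sidestep shell quoting entirely is to send the request from a short script. The `curl` errors in the report ("Could not resolve host: text", "[globbing] bad range") are consistent with the payload having been split into several arguments, since `cmd.exe` does not treat single quotes as quoting characters. Below is a minimal Python sketch using only the standard library; the function names are hypothetical, and the URL and the `label`/`text` fields are taken from the command in the report.

```python
import json
import urllib.request

# Endpoint taken from the curl command in the report above.
TRAINING_URL = "http://localhost:7474/service/graphify/training"


def build_training_request(labels, texts, url=TRAINING_URL):
    """Build a POST request whose body is JSON serialized by the json module."""
    body = json.dumps({"label": list(labels), "text": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request; requires a running server at the request's URL."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


request = build_training_request(
    ["Document classification"],
    ["Documents may be classified according to their subjects or according to other attributes."],
)
# Print the exact body that would be sent, without contacting the server.
print(request.data.decode("utf-8"))
```

With the server running, `send(request)` performs the actual call. If you prefer to stay with `curl` on `cmd.exe`, the usual alternatives are to wrap the payload in double quotes and escape the inner ones with backslashes, or to save the JSON to a file and pass it with `-d @filename`.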

On Mon, Oct 13, 2014 at 4:24 PM, Mark Cicero wrote:

> I attempted to train a model with the example you give and there seem to be a few issues. Is there an issue with my installation?




cicero19 commented on July 17, 2024

Yeah, it seems to be a command prompt issue. It works great using the REST Console Chrome plugin. I'm very impressed with this plugin, keep up the good work. I hope it yields good results for what I am trying to do.


kbastani commented on July 17, 2024

I'm glad you were able to get it working. Thanks for your support. Please let me know how it goes.

