Comments (10)

Balearica commented on May 29, 2024

We've exhausted the obvious answers, so to go further we would need a reproducible example. The snippet below is minimal code to run recognition 1,000 times in Node.js. You can run it by cloning this repo, saving the snippet in the examples folder in a new file named recognize_1000.js, and then running node recognize_1000.js [path to your file] from within the examples directory on whatever system you are using for the bot.

const path = require('path');
const { createWorker } = require('../../');

// Optional first CLI argument: path to the image to recognize.
const [,, imagePath] = process.argv;
const image = path.resolve(__dirname, (imagePath || '../../tests/assets/images/cosmic.png'));

(async () => {
  // Create, use, and terminate a fresh worker on every iteration.
  for (let i = 0; i < 1e3; i++) {
    console.log(`Starting iteration: ${i}`);
    console.log('Creating worker');
    const worker = await createWorker();
    console.log('Loading language');
    await worker.loadLanguage('eng');
    console.log('Initializing');
    await worker.initialize('eng');
    console.log('Recognizing');
    await worker.recognize(image);
    console.log('Terminating');
    await worker.terminate();
  }
})();

If you are unable to recognize a specific image 1,000 times in a row using this code, please provide the image so I can try to replicate the failure. If you are unable to produce an error using the code above with any image, then I think the issue is specific to some aspect of your project and not Tesseract.js.
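
If the failure is intermittent, it may help to let the loop continue past an error and report how often it occurs. The following is only a sketch of that idea: `runTrials` and `trial` are hypothetical names, and `trial` stands for one create/recognize/terminate cycle like the loop body above. It assumes failures surface as rejected promises.

```javascript
// Hypothetical helper: runs `trial` `count` times in sequence and records
// which iterations failed instead of stopping at the first error.
async function runTrials(trial, count) {
  const failures = [];
  for (let i = 0; i < count; i++) {
    try {
      await trial(i);
    } catch (err) {
      // Keep the iteration number and message so the failure can be reported.
      const message = err && err.message ? err.message : String(err);
      failures.push({ iteration: i, message });
    }
  }
  return { count, failed: failures.length, failures };
}
```

A failure rate from such a run (for example, how many of 1,000 iterations failed) would make a report easier to act on than a single crash log.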

from tesseract.js.

Balearica commented on May 29, 2024

The only reason I can think of why initialization would fail in a non-deterministic manner is network issues.

  1. Update to the latest version (v4.1.1 as of the time of this writing)
    1. There have been recent changes to reduce network issues, including (1) switching to a CDN with better reliability and (2) automatically deleting invalid language data downloads
  2. If the issues continue, you can eliminate network-related issues by hosting all the necessary files locally
    1. How to host all files locally is explained here
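
As a rough sketch of option 2 in Node.js, the worker can be pointed at a local directory through the `langPath` option. The directory name `lang-data` and the helper `localLangOptions` are hypothetical, and this assumes you have already placed `eng.traineddata.gz` in that directory; if the file is stored uncompressed, `gzip` would need to be `false`.

```javascript
const path = require('path');

// Hypothetical helper: builds createWorker options that read language data
// from a local directory instead of a remote server.
function localLangOptions(baseDir, gzipped = true) {
  return {
    // Assumption: `lang-data` is a directory you created and filled yourself.
    langPath: path.join(baseDir, 'lang-data'),
    // Set to false if the local file is eng.traineddata, not eng.traineddata.gz.
    gzip: gzipped,
  };
}

// Usage (requires tesseract.js to be installed):
//   const { createWorker } = require('tesseract.js');
//   const worker = await createWorker(localLangOptions(__dirname));
```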

C0rentinC commented on May 29, 2024

Thank you for your response. I have updated to the latest version and will see over the next few days whether the problem persists. If so, I will try your second solution!

Balearica commented on May 29, 2024

Okay, let me know what happens. We are working to resolve these sorts of (seemingly) random/nondeterministic errors. Unfortunately, because there are multiple causes and they are inherently hard to replicate, it can be difficult to know when a problem is fixed.

C0rentinC commented on May 29, 2024

In one week I ran into this bug about 2 or 3 times.

In this screenshot, Tesseract failed to initialize:
image

And in this one it seems to have failed at the beginning of recognition, which is the first time I have encountered this:
image

I think it's a network issue. I haven't tried the local install yet.

Do I just have to do this?
const worker = await createWorker({
  workerPath: 'https://cdn.jsdelivr.net/npm/[email protected]/dist/worker.min.js',
  langPath: 'https://tessdata.projectnaptha.com/4.0.0',
  corePath: 'https://cdn.jsdelivr.net/npm/[email protected]',
});
Or do I just add langPath? I'm using Node.js.

Balearica commented on May 29, 2024

Does it fail repeatedly, or does simply rerunning the failed command resolve the issue?

Can you post a link to the code that you are running?

I am now more skeptical that it is a network issue. The Node.js version does not use a CDN for code (as all of the code is already in the npm package). While the Node.js version does use a remote server for language data (which you can change to a local directory using langPath), your logs indicate that the language traineddata was loaded from the cache, which is (by definition) always local.

C0rentinC commented on May 29, 2024

Just rerunning the command resolves the issue.

The repo is private, but this is the part of the code where I use Tesseract:
image

If you want, I can send you more.

Balearica commented on May 29, 2024

Is the code block you show above ever executed in parallel, with multiple images potentially being recognized at the same time? Given that a worker is created and destroyed each time this is run, I could see this being problematic in situations where that is the case.

For parallel processing, we recommend creating a scheduler with a set number of workers. Then, jobs can be sent to the scheduler, which sends them to the workers it manages. An example of this can be found here. Without using schedulers, resource-related crashes are common, as each instance of Tesseract requires multiple gigabytes of memory.
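
The scheduler pattern described above might look roughly like the sketch below. It uses `createScheduler`, `addWorker`, `addJob`, and `terminate` from the Tesseract.js API together with the v4 `loadLanguage`/`initialize` calls shown earlier in this thread. The function name `recognizeAll`, its parameters, and the default of two workers are assumptions for illustration; the module is passed in as `tesseract` rather than required directly.

```javascript
// Sketch of a fixed-size worker pool. `tesseract` is supplied by the caller,
// e.g. require('tesseract.js'). `recognizeAll` is a hypothetical name.
async function recognizeAll(tesseract, images, workerCount = 2) {
  const scheduler = tesseract.createScheduler();
  try {
    // Create the workers once, up front, instead of once per image.
    for (let i = 0; i < workerCount; i++) {
      const worker = await tesseract.createWorker();
      await worker.loadLanguage('eng');
      await worker.initialize('eng');
      scheduler.addWorker(worker);
    }
    // Queue every image; the scheduler hands each job to an idle worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image)),
    );
    return results.map(({ data: { text } }) => text);
  } finally {
    // Terminates every worker that was added to the scheduler.
    await scheduler.terminate();
  }
}
```

With this shape, the number of Tesseract instances alive at once is capped at `workerCount` no matter how many images are queued.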

C0rentinC commented on May 29, 2024

No, this code is never executed in parallel, and only ever with one image, unfortunately :(

A small example:
image
The first command works and the second crashes. For the last one, I reran the crashed command with the same parameters and it worked.
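
Since rerunning the same command succeeds, one possible stopgap while the root cause is unknown is to retry the failed job automatically. The sketch below is a generic retry helper and not part of Tesseract.js: `withRetry` and its parameters are hypothetical names, and it only helps if the failure surfaces as a rejected promise rather than a process crash.

```javascript
// Hypothetical helper: runs an async task, making up to `attempts` tries in
// total, and rethrows the last error if every try fails.
async function withRetry(task, attempts = 3, delayMs = 500) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Brief pause before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Here `task` would wrap one full create/recognize/terminate cycle, so that each retry starts from a fresh worker.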

Balearica commented on May 29, 2024

Closing for now. Will reopen if a reproducible example of code + image that causes these errors (even probabilistically/non-deterministically) is identified.
