Comments (5)

rajatbarve avatar rajatbarve commented on June 12, 2024

@polm, could you please chime in?

from fugashi.

polm avatar polm commented on June 12, 2024

I'm not sure I understand your problem entirely, but it sounds like you want to download the UniDic dictionary from code instead of the command line. You can do that like this:

# Requires the unidic package (pip install unidic).
from unidic.download import download_version
download_version()  # downloads the full UniDic dictionary

The command line download command is just a wrapper for that function.
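
A minimal sketch of guarding that call so the download only runs when it is needed. It assumes the `unidic` package exposes the dictionary location as `unidic.DICDIR` and that a missing directory means the dictionary has not been downloaded yet. The helper names `unidic_status` and `ensure_unidic` are hypothetical and not part of any library.

```python
import importlib.util
import os


def unidic_status():
    """Report whether the unidic package and its dictionary files are available."""
    if importlib.util.find_spec("unidic") is None:
        return "not-installed"
    import unidic

    # Assumption: the dictionary data lives in the directory named by DICDIR.
    if os.path.isdir(unidic.DICDIR):
        return "present"
    return "missing-dictionary"


def ensure_unidic():
    """Download the dictionary only if the package is there but the data is not."""
    status = unidic_status()
    if status == "missing-dictionary":
        from unidic.download import download_version

        download_version()
        status = unidic_status()
    return status
```

Calling `ensure_unidic()` once at startup, before any tokenizer is created, keeps the download out of the command line entirely.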

rajatbarve avatar rajatbarve commented on June 12, 2024

Thank you @polm!

But I am still getting the error "The unidic_lite dictionary is not installed".
My model calls the class BertJapaneseTokenizer. Please take a look at the source code here, as it will make my explanation much easier to follow: https://huggingface.co/transformers/v4.11.3/_modules/transformers/models/bert_japanese/tokenization_bert_japanese.html

Following the trail of function calls in reverse, it looks like that error can occur only when the variable mecab_dic is equal to 'unidic_lite' (mecab_dic is an attribute of the class MecabTokenizer, which you will also find in the link above).

My question is: can I override this value of mecab_dic and change it to "unidic" instead of the current "unidic_lite"?

That way I will not get the error.
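
One way such an override might look, as a sketch rather than a confirmed fix. In the linked source, BertJapaneseTokenizer accepts a `mecab_kwargs` dictionary and forwards it to MecabTokenizer, so passing `mecab_dic` there should replace the configured value. The helper names `mecab_overrides` and `load_tokenizer` are hypothetical, and selecting "unidic" additionally assumes the `unidic` package and its downloaded dictionary are present.

```python
# Dictionary names handled by MecabTokenizer in the linked source.
SUPPORTED_DICS = ("ipadic", "unidic_lite", "unidic")


def mecab_overrides(mecab_dic="unidic"):
    """Build keyword arguments that select a MeCab dictionary (hypothetical helper)."""
    if mecab_dic not in SUPPORTED_DICS:
        raise ValueError(f"unsupported mecab_dic: {mecab_dic!r}")
    return {"mecab_kwargs": {"mecab_dic": mecab_dic}}


def load_tokenizer(model_name, mecab_dic="unidic"):
    """Load BertJapaneseTokenizer with the dictionary overridden."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import BertJapaneseTokenizer

    return BertJapaneseTokenizer.from_pretrained(
        model_name, **mecab_overrides(mecab_dic)
    )
```

`load_tokenizer` would be called with whatever model name the existing code already passes to `from_pretrained`.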

polm avatar polm commented on June 12, 2024

I looked at the code that you linked to but I'm not sure what to tell you - I didn't write that code and it's not part of this library, so I have no control over it. Your code (which you haven't shared) or the HuggingFace code is setting that value somewhere.

I'm also not sure why you think you can't use unidic-lite. Can you explain that or try it?
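
A quick way to try it, sketched under the assumption that the dictionary is installed with `pip install unidic-lite` and imported as `unidic_lite` (the module name that appears in the error message). The check below reports which of the relevant packages the current interpreter can see; one possible cause of the error is that the package was installed into a different environment than the one running the model. The helper name `importable` is hypothetical.

```python
import importlib.util


def importable(modules=("fugashi", "unidic_lite", "unidic", "ipadic")):
    """Map each module name to whether this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in importable().items():
        print(f"{name}: {'found' if found else 'missing'}")
```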

polm avatar polm commented on June 12, 2024

Closing for lack of response / because this issue isn't relevant to fugashi directly.

For the record, if you want help with this you should show the code where you're using BertJapaneseTokenizer and explain what you're actually trying to do. It sounds like you have a usage question about the HuggingFace code, but with just the information you've given there's no way to tell what's going on.
