limelight's Introduction

Limelight

A PHP Japanese language analyzer and parser.
  • Split Japanese text into individual, full words
  • Find parts of speech for words
  • Find dictionary entries (lemmas) for conjugated words
  • Get readings and pronunciations for words
  • Build furigana for words
  • Convert Japanese to romaji (English lettering)

Quick Guide

Version Notes

  • April 25, 2016: The Limelight API changed in Version 1.6.0. The new API uses collection methods to give developers better control of Limelight parse results. Please see the wiki for the updated documentation.
  • April 11, 2016: php-mecab, the MeCab bindings Limelight uses, were updated to version 0.6.0 in Dec. 2015 for php 7 support. The pre-0.6.0 bindings no longer work with the master branch of Limelight. If you are using an older version of php-mecab, please update your bindings or use the php-mecab_pre_0.6.0 version.

Install Limelight

Using Docker

From the project root, build the image:

docker build -f docker/Dockerfile -t limelight .

Once it is built, run the container:

docker run --name limelight -v /host/path/to/limelight:/usr/limelight -d --rm limelight

Access the project in the container:

docker exec -it limelight bash

Install composer dependencies from within the container:

composer install

Without Docker

Requirements
  • php >= 5.6
Dependencies

Before installing Limelight, you must install both mecab and the php extension php-mecab on your system.
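Once both are installed, a quick check like the one below can confirm that PHP sees the extension. This is a minimal sketch: the extension name `mecab` is assumed from the package's `ext-mecab` requirement, and `mecabStatus()` is a hypothetical helper, not part of Limelight.

```php
<?php
// Hypothetical helper: reports whether the MeCab extension is visible
// to the PHP binary that runs this script.
function mecabStatus()
{
    // 'mecab' is assumed to be the extension name (from "ext-mecab").
    return extension_loaded('mecab')
        ? 'php-mecab is loaded'
        : 'php-mecab is NOT loaded for this PHP binary';
}

echo mecabStatus() . "\n";
```

The command line and the web server can use different php.ini files, so it may be worth running the check in both places.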

Linux Ubuntu Users

Use the install script included in this repository. The script only works with php7. Download the script:

curl -O https://raw.githubusercontent.com/nihongodera/limelight/master/install_mecab_php-mecab.sh

Make the file executable:

chmod +x install_mecab_php-mecab.sh

Execute the script:

./install_mecab_php-mecab.sh

You may need to restart your server to complete the process.

For information about what the script does, see here.

Other Systems

Please see this page to learn more about installing on your system.

Install Limelight

Install Limelight through composer.

composer require nihongodera/limelight

Parse Text

Make a new instance of Limelight\Limelight. Limelight takes no arguments.

$limelight = new \Limelight\Limelight();

Use the parse() method on the Limelight object to parse Japanese text.

$results = $limelight->parse('庭でライムを育てています。');

The returned object is an instance of Limelight\Classes\LimelightResults.

Get Results

Get results for the entire text using methods available on LimelightResults.

$results = $limelight->parse('庭でライムを育てています。');

echo 'Words: ' . $results->string('word') . "\n";
echo 'Readings: ' . $results->string('reading') . "\n";
echo 'Pronunciations: ' . $results->string('pronunciation') . "\n";
echo 'Lemmas: ' . $results->string('lemma') . "\n";
echo 'Parts of speech: ' . $results->string('partOfSpeech') . "\n";
echo 'Hiragana: ' . $results->toHiragana()->string('word') . "\n";
echo 'Katakana: ' . $results->toKatakana()->string('word') . "\n";
echo 'Romaji: ' . $results->string('romaji', ' ') . "\n";
echo 'Furigana: ' . $results->string('furigana') . "\n";

Output:
Words: 庭でライムを育てています。
Readings: ニワデライムヲソダテテイマス。
Pronunciations: ニワデライムヲソダテテイマス。
Lemmas: 庭でライムを育てる。
Parts of speech: noun postposition noun postposition verb symbol
Hiragana: にわでらいむをそだてています。
Katakana: ニワデライムヲソダテテイマス。
Romaji: niwa de raimu o sodateteimasu.
Furigana: (にわ)でライムを(そだ)てています。

Alter the collection of words however you like using the library of collection methods.

Get individual words off the LimelightResults object by using one of several applicable collection methods. Use methods available on the returned LimelightWord object.

$results = $limelight->parse('庭でライムを育てています。');

$word1 = $results->pull(2);

$word2 = $results->where('word', '庭');

echo $word1->string('romaji') . "\n";

echo $word2->string('furigana') . "\n";

Output:
raimu
にわ

Methods on the LimelightResults object and the LimelightWord object follow the same conventions, but LimelightResults methods are plural (words()) while LimelightWord methods are singular (word()).

Alternatively, loop through all the words on the LimelightResults object.

$results = $limelight->parse('庭でライムを育てています。');

foreach ($results as $word) {
    echo $word->word() . ' is a ' . $word->partOfSpeech() . ' read like ' . $word->reading() . "\n";
}

Output:
庭 is a noun read like ニワ
で is a postposition read like デ
ライム is a noun read like ライム
を is a postposition read like ヲ
育てています is a verb read like ソダテテイマス
。 is a symbol read like 。

Full Documentation

Full documentation for Limelight can be found on the Limelight Wiki page.

Sources, Contributions, and Contributing

The Japanese parsing logic used in Limelight was adapted from Kimtaro's excellent Ruby program Ve. A big thank you to him and all the others who contributed on that project.

Limelight relies heavily on both MeCab and php-mecab.

Collection methods and methods in the Arr class were derived from Laravel's collection methods.

Contributors are more than welcome.

limelight's People

Contributors

nihongodera, shou-nen, zachleigh

limelight's Issues

PHP v8 support?

I tried PHP 8.0, 8.1, and 8.2.
The build fails when I run:

RUN wget https://github.com/nihongodera/php-mecab/archive/master.zip \
    && unzip master.zip \
    && cd php-mecab-master/mecab \
    && phpize \
    && ./configure \
    && make \
    && make install

I'm getting these errors
php v8 errors.txt

Kanji -> Furigana translation sometimes makes small mistakes

Hello :)

First of all, congratulations on this plugin; it is great work.

Second, I wanted to ask about something that may be a dictionary error rather than a plugin error: sometimes I get incorrect furigana when converting from kanji. I understand how difficult it is to get this 100% correct, but I was wondering if there's a way to fix those inconsistencies?

For example, I've found these small errors:

土 - gets translated as ど while it should be つち
一人 - gets translated as いちにん while it should be ひとり
三百 - gets translated as さんひゃく while it should be さんびゃく
...

These are just a few examples; there are more, although I would say about 95% of the translated text is correct, which is absolutely awesome.

By any chance, do you have any suggestions for the above predicament?
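One possible workaround, sketched here under assumptions, is a small correction table applied to each word's reading after parsing. The `$overrides` table and `correctReading()` are hypothetical names built from the examples in this issue; they are not part of Limelight.

```php
<?php
// Hypothetical correction table built from the examples in this issue.
$overrides = ['土' => 'つち', '一人' => 'ひとり', '三百' => 'さんびゃく'];

// Return the corrected reading when the word has an override,
// otherwise keep the reading that the parser produced.
function correctReading($word, $reading, array $overrides)
{
    return isset($overrides[$word]) ? $overrides[$word] : $reading;
}

echo correctReading('一人', 'いちにん', $overrides) . "\n"; // ひとり
echo correctReading('庭', 'にわ', $overrides) . "\n";       // にわ
```

A blanket table like this is blunt, because some words legitimately have more than one reading depending on context. Adjusting the MeCab dictionary itself may be the more robust route.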

Composer namespace Limelight not found!

This does not work for me. Composer cannot find your namespace and class name, although the install is correct. Please help me with this.

My file installed.json:
[
    {
        "name": "nihongodera/limelight",
        "version": "v1.6.6",
        "version_normalized": "1.6.6.0",
        "source": {
            "type": "git",
            "url": "https://github.com/nihongodera/limelight.git",
            "reference": "037cd19aa1df6ceb51b346af25378ee96de42a14"
        },
        "dist": {
            "type": "zip",
            "url": "https://api.github.com/repos/nihongodera/limelight/zipball/037cd19aa1df6ceb51b346af25378ee96de42a14",
            "reference": "037cd19aa1df6ceb51b346af25378ee96de42a14",
            "shasum": ""
        },
        "require": { "ext-mecab": "*", "php": ">=5.6" },
        "require-dev": { "phpunit/phpunit": "^6.4" },
        "time": "2018-09-22T02:16:06+00:00",
        "type": "project",
        "installation-source": "dist",
        "autoload": { "psr-4": { "Limelight\\": "src/" } },
        "notification-url": "https://packagist.org/downloads/",
        "license": [ "MIT" ],
        "authors": [
            { "name": "Zach Leigh", "email": "[email protected]", "role": "Developer" }
        ],
        "description": "A php Japanese language text analyzer and parser.",
        "homepage": "https://github.com/nihongodera/limelight",
        "keywords": [ "furigana", "japanese", "kanji", "language", "mecab", "parse", "romaji" ]
    },
    {
        "name": "overtrue/pinyin",
        "version": "3.0.6",
        "version_normalized": "3.0.6.0",
        "source": {
            "type": "git",
            "url": "https://github.com/overtrue/pinyin.git",
            "reference": "3b781d267197b74752daa32814d3a2cf5d140779"
        },
        "dist": {
            "type": "zip",
            "url": "https://api.github.com/repos/overtrue/pinyin/zipball/3b781d267197b74752daa32814d3a2cf5d140779",
            "reference": "3b781d267197b74752daa32814d3a2cf5d140779",
            "shasum": ""
        },
        "require": { "php": ">=5.3" },
        "require-dev": { "phpunit/phpunit": "~4.8" },
        "time": "2017-07-10T07:20:01+00:00",
        "type": "library",
        "installation-source": "dist",
        "autoload": { "psr-4": { "Overtrue\\Pinyin\\": "src/" } },
        "notification-url": "https://packagist.org/downloads/",
        "license": [ "MIT" ],
        "authors": [
            { "name": "Carlos", "homepage": "http://github.com/overtrue" }
        ],
        "description": "Chinese to pinyin translator.",
        "homepage": "https://github.com/overtrue/pinyin",
        "keywords": [ "Chinese", "Pinyin", "cn2pinyin" ]
    }
]
Any ideas?
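The installed.json above registers the PSR-4 rule `"Limelight\\": "src/"`, so the autoloader only resolves class names that start with that namespace prefix. The sketch below illustrates the mapping. `psr4Path()` is a hypothetical helper written for this example; it is not Composer's actual implementation.

```php
<?php
// Hypothetical helper: maps a class name to a file path the way a PSR-4
// rule such as "Limelight\\" => "src/" does.
function psr4Path($class, $prefix, $baseDir)
{
    // Only classes that start with the namespace prefix are covered by the rule.
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    // Strip the prefix and turn namespace separators into directory separators.
    $relative = substr($class, strlen($prefix));

    return $baseDir . str_replace('\\', '/', $relative) . '.php';
}

var_dump(psr4Path('Limelight\\Limelight', 'Limelight\\', 'src/')); // string "src/Limelight.php"
var_dump(psr4Path('Limelight', 'Limelight\\', 'src/'));            // NULL: the bare name has no namespace
```

If this is the cause, importing the class with `use Limelight\Limelight;` after requiring vendor/autoload.php, or writing `new \Limelight\Limelight()`, would be the usual fix.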

Furigana in input

Is it possible to add furigana to the input text to help the algorithm choose the correct kana, or simply to force specific kana for a given kanji (especially when converting to romaji)?

Something like
漢字 [かんじ] or [漢字] {かな} etc.
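Until something like this exists, the hints could be stripped out before parsing and applied afterwards. The sketch below assumes the first convention above, kanji followed by a bracketed reading. `extractReadingHints()` is a hypothetical helper, and Limelight does not document any such input syntax.

```php
<?php
// Hypothetical helper for the "kanji [kana]" convention suggested above.
function extractReadingHints($text)
{
    $hints = [];

    // \p{Han}+ matches a run of kanji; the bracketed part is captured as the reading.
    preg_match_all('/(\p{Han}+)\s*\[([^\]]+)\]/u', $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $hints[$match[1]] = $match[2];
    }

    // Remove the bracketed hints so the plain text can be passed to a parser.
    $plain = preg_replace('/\s*\[[^\]]+\]/u', '', $text);

    return [$plain, $hints];
}

list($plain, $hints) = extractReadingHints('漢字 [かんじ]を勉強する');

echo $plain . "\n";         // 漢字を勉強する
echo $hints['漢字'] . "\n"; // かんじ
```

The collected hints would still have to be matched against the parsed words, which this sketch does not attempt.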

When adding multiple dividing characters with string() method, divider is present at beginning of string

Original: 宇宙航空研究開発機構(JAXA)は8日、金星を回る探査機「あかつき」の軌道修正に成功したと発表した。

$limelightResult->toKatakana()->string('reading', '---');

// ---うちゅう---こうくう---けんきゅう---かいはつ---きこう---(---JAXA---)---は---8---にち---、---きんぼし---を---まわる---たんさき---「---あかつき---」---の---きどう---しゅうせい---に---せいこうした---と---はっぴょうした---。

Should not have dividing character at front of string.
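How string() builds its result is not shown here, so the following is only an illustration of one pattern that produces a leading divider and of the usual alternative. Both functions are hypothetical stand-ins, not Limelight code.

```php
<?php
// Hypothetical stand-in: prepends the divider to every part,
// so the result starts with a divider.
function joinPrepending(array $parts, $divider)
{
    $out = '';
    foreach ($parts as $part) {
        $out .= $divider . $part;
    }

    return $out;
}

// Hypothetical stand-in: implode() places the divider only between neighbours.
function joinBetween(array $parts, $divider)
{
    return implode($divider, $parts);
}

$parts = ['うちゅう', 'こうくう', 'けんきゅう'];

echo joinPrepending($parts, '---') . "\n"; // ---うちゅう---こうくう---けんきゅう
echo joinBetween($parts, '---') . "\n";    // うちゅう---こうくう---けんきゅう
```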

toHiragana and toKatakana methods are skipping kanji

Hello,
I've created a simple page that reproduces the example from "Getting started", and I've noticed that in my implementation the toHiragana() and toKatakana() methods skip kanji characters.
Results:
Words: 庭 で ライム を 育てています。
Readings: ニワデライムヲソダテテイマス。
Pronunciations: ニワデライムヲソダテテイマス。
Lemmas: 庭でライムを育てる。
Parts of speech: noun, postposition, noun, postposition, verb, symbol
Hiragana: 庭でらいむを育てています。
Katakana: 庭デライムヲ育テテイマス。
Romaji: niwa de raimu o sodateteimasu.
Furigana: 庭ニワデライムヲ育ソダテテイマス。

If you want, you may try it yourself: http://jpn.white-miku.me/index.php
At the same time, readings, pronunciations, romaji and furigana work perfectly.
Could it be a bug? Or maybe it is a MeCab misconfiguration?
Thank you.
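The reported Hiragana line is what a kana-only conversion of the word text produces, since kanji carry no kana to convert. The sketch below shows that difference with PHP's mb_convert_kana(), assuming the mbstring extension is available. It does not show how Limelight implements toHiragana().

```php
<?php
// Sketch only: mb_convert_kana() rewrites kana and leaves kanji untouched.
$word    = '庭でライムを育てています。';
$reading = 'ニワデライムヲソダテテイマス。';

// 'c' converts full-width katakana to full-width hiragana.
$fromWord    = mb_convert_kana($word, 'c', 'UTF-8');
$fromReading = mb_convert_kana($reading, 'c', 'UTF-8');

echo $fromWord . "\n";    // 庭でらいむを育てています。
echo $fromReading . "\n"; // にわでらいむをそだてています。
```

Converting the reading rather than the word gives the fully hiragana result that the README output shows.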

Undefined offset: Using user dictionary

Thanks so much for putting this together.

Using a user dictionary:

vim /etc/mecabrc

userdic = /home/paul/userdict3.dic

I get:

Undefined offset: 9

at vendor/nihongodera/limelight/src/Parse/Tokenizer.php:155
  151|             $parameters = explode(',', $feature);
  152|
  153|             foreach ($parameters as $index => $parameter) {
  154|                 if ($parameter) {
> 155|                     $token[$this->mecabParameters[$index]] = $this->getParameter($parameter);
  156|                 }
  157|             }
  158|         }
  159|

    +6 vendor frames
7   app/Console/Commands/testMecab3.php:44
    Limelight\Limelight::parse()

    +13 vendor frames
21  artisan:37
    Illuminate\Foundation\Console\Kernel::handle()

I'm getting round this by using an isset just before. But thought you should be aware of it.
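The isset workaround can be sketched as follows. The field names in `$mecabParameters` and the ten-field feature string are made up for illustration, and `parseFeature()` is a hypothetical stand-in for the loop in Tokenizer.php, which may differ.

```php
<?php
// Hypothetical field names; the real Tokenizer's $mecabParameters may differ.
$mecabParameters = [
    'pos', 'pos1', 'pos2', 'pos3', 'ctype', 'cform',
    'lemma', 'reading', 'pronunciation',
];

function parseFeature($feature, array $names)
{
    $token = [];

    foreach (explode(',', $feature) as $index => $parameter) {
        // Skip empty fields and any field beyond the known names;
        // without the isset() guard the extra field raises "Undefined offset".
        if ($parameter && isset($names[$index])) {
            $token[$names[$index]] = $parameter;
        }
    }

    return $token;
}

// A made-up feature string with a tenth field, as a user dictionary entry might carry.
$token = parseFeature('名詞,一般,*,*,*,*,庭,ニワ,ニワ,extra', $mecabParameters);

echo count($token) . "\n"; // 9
```

The guard silently drops the extra field, so any information the user dictionary stores there is lost.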

Thanks

PHP Fatal error: Uncaught Error: Class 'Limelight' not found

Having trouble understanding why I can't use any Limelight commands. Here is the setup:

  • Installed mecab OK. commandline mecab works as expected.
  • Installed PHP mecab module. Can see ini file in /etc/php/7.4/mods_enabled, phpinfo page confirms it is loaded.
  • Tested php_mecab. parse seems to work, but split hasn't worked yet. There was an issue with open_basedir, but that was resolved so dictionary is available in /var/lib/mecab/

OK, so now trying to figure out Limelight.

  • made a directory on the webserver ./limelight.
  • cd into limelight and run "composer require nihongodera/limelight" which creates the vendor directory and subdirs.
  • in same dir, create limetest.php file containing.

<?php

require 'vendor/autoload.php';

$limelight = new Limelight();

$results = $limelight->parse('庭でライムを育てています');

echo 'Words: ' . $results->string('word') . "\n";
echo 'Readings: ' . $results->string('reading') . "\n";
echo 'Pronunciations: ' . $results->string('pronunciation') . "\n";
echo 'Lemmas: ' . $results->string('lemma') . "\n";
echo 'Parts of speech: ' . $results->string('partOfSpeech') . "\n";
echo 'Hiragana: ' . $results->toHiragana()->string('word') . "\n";
echo 'Katakana: ' . $results->toKatakana()->string('word') . "\n";
echo 'Romaji: ' . $results->string('romaji', ' ') . "\n";
echo 'Furigana: ' . $results->string('furigana') . "\n";

?>

And I get: Fatal error: Uncaught Error: Class 'Limelight' not found

Have tried several forms of the autoload file path, absolute, relative, full, but none change the error.

So where can I troubleshoot next?
