
susing-piauki's Introduction

su5-sing3


Home page

Taiwanese part-of-speech tagging

Run the service

docker-compose up --build

Add an account and password for logging in to the site

docker exec -ti su5-sing3_gunicorn_1 /bin/bash 
python3 manage.py createsuperuser

Other

time docker build -t su5-sing3 .
time docker build -t tai5-hua5 流程/台華翻譯模型訓練/
docker run -d  --name tai5-hua5 -p 8080:8080 tai5-hua5 /usr/local/lib/python3.5/dist-packages/外部程式/mosesdecoder/bin/mosesserver -f 服務資料/臺語/翻譯做外文模型/model/moses.ini
# docker run --rm -p 8080:8080 tai5-hua5 /usr/local/lib/python3.5/dist-packages/外部程式/mosesdecoder/bin/mosesserver -f 服務資料/臺語/翻譯做外文模型/model/moses.ini
time docker build -t tai5_gi2-liau7 流程/華語標台語語料/

# Our own implementations; we gave up on them later, since using an existing package directly is more stable
# time docker build -t tai5_gi2-gian5_boo5-hing5 流程/台語語料算語言模型/
# time docker build -t tai5_tng7-su5 流程/台語標詞性/

time docker build -t tai5_deepnlp 流程/產生deepnlp語料/

Upload the Taiwanese-to-Mandarin translation image

docker login 
time docker build -t tai5-hua5 流程/台華翻譯模型訓練/
docker tag tai5-hua5 i3thuan5/su5-sing3_tai5-hua5
docker push i3thuan5/su5-sing3_tai5-hua5


susing-piauki's Issues

Ordering in the 標記表 (tagging table) admin

ordering = ['-先標記無', '標記者', 'id', ]

or

ordering = ['-先標記無', '標記者', 'ppl', 'id', ]
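
To make the difference between the two `ordering` candidates easier to see, here is a plain-Python sketch that imitates the sort without Django. The sample rows and the `sort_rows` helper are hypothetical, and `ppl` is assumed to be a numeric score, which the issue does not spell out. A leading `-` means descending, as in a Django `ordering` list.

```python
from functools import cmp_to_key


def sort_rows(rows, ordering):
    """Sort dict rows the way a Django-style ordering list would (sketch only)."""
    def compare(a, b):
        for field in ordering:
            descending = field.startswith('-')
            name = field.lstrip('-')
            if a[name] != b[name]:
                result = -1 if a[name] < b[name] else 1
                return -result if descending else result
        return 0
    return sorted(rows, key=cmp_to_key(compare))


# Hypothetical rows; only the field names come from the issue.
rows = [
    {'id': 1, '先標記無': False, '標記者': 'a', 'ppl': 9.0},
    {'id': 2, '先標記無': True, '標記者': 'b', 'ppl': 3.0},
    {'id': 3, '先標記無': True, '標記者': 'a', 'ppl': 7.0},
    {'id': 4, '先標記無': True, '標記者': 'a', 'ppl': 2.0},
]

by_id = sort_rows(rows, ['-先標記無', '標記者', 'id'])
by_ppl = sort_rows(rows, ['-先標記無', '標記者', 'ppl', 'id'])
print([r['id'] for r in by_id])   # [3, 4, 2, 1]
print([r['id'] for r in by_ppl])  # [4, 3, 2, 1]
```

With `ppl` in the list, rows 3 and 4 swap places because they share the same 標記者 and row 4 has the lower `ppl`.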

Neutral tone

臺灣言語工具.解析整理.解析錯誤.解析錯誤: 詞組內底的型「你欲來食飯無」比音「lí beh lâi tsia̍h-pn̄g --bô」少!配對結果:[詞:[字:你 lí], 詞:[字:欲 beh], 詞:[字 :來 lâi], 詞:[字:食 tsia̍h, 字:飯 pn̄g]]
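
The error above reports that the Han characters are fewer than the romanized syllables. A small sketch, independent of 臺灣言語工具, counts syllables by treating whitespace, `-`, and the neutral-tone marker `--` as separators. `count_syllables` is a hypothetical helper, not part of that library. Under this counting both sides have six units, which suggests (but does not prove) that the trouble lies in how ` --bô` is parsed rather than in the sentence itself.

```python
import re


def count_syllables(romanization):
    """Count syllables, treating whitespace, '-' and '--' as separators."""
    return len([s for s in re.split(r'[\s-]+', romanization.strip()) if s])


han = '你欲來食飯無'
lomaji = 'lí beh lâi tsia̍h-pn̄g --bô'
print(len(han), count_syllables(lomaji))
```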

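Initial corpus

Our current approach

  1. Take the example sentences from the Ministry of Education Taiwanese dictionary: Taiwanese (Han characters and romanization) and Mandarin
    https://github.com/g0v/moedict-data-twblg/tree/master/uni

  2. Send the Mandarin side to the National Academy for Educational Research segmenter, which also attaches a part of speech to each Mandarin word
    http://coct.naer.edu.tw/Segmentor/

  3. The Taiwanese side needs no changes, because the romanization is already segmented into words
    For example, the Han characters and the romanization are:
    Han characters: 阮冬山河的水是清甲無地比,
    Romanization: Guán Tang-suann-hô ê tsuí sī tshing kah bô tè pí,

  4. Use the Taiwanese and the segmented Mandarin to train a Moses machine translation model, model A
    https://github.com/sih4sing5hong5/hok8-bu7/blob/master/使用範例/台華翻譯/Dockerfile#L13

  5. When a user enters Taiwanese, model A is used to obtain the Mandarin words and their parts of speech
    https://github.com/i3thuan5/su5-sing3/blob/master/提著詞性結果/views.py#L77

  6. Display each Taiwanese word with the corresponding Mandarin word and Mandarin part of speech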
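
Step 3 relies on the romanization already carrying word boundaries. The sketch below shows one way to carry those boundaries over to the Han characters, assuming one Han character per syllable. `segment_han` is a hypothetical helper, not the project's code, and the trailing punctuation from the example is left out for simplicity.

```python
def segment_han(han, lomaji):
    """Cut a Han-character string at the word boundaries of its romanization."""
    words = []
    position = 0
    for roman_word in lomaji.split():
        # Each hyphen-joined syllable is assumed to match one Han character.
        size = len([s for s in roman_word.split('-') if s])
        words.append(han[position:position + size])
        position += size
    if position != len(han):
        raise ValueError('Han characters and romanization do not line up')
    return words


print(segment_han('阮冬山河的水是清甲無地比',
                  'Guán Tang-suann-hô ê tsuí sī tshing kah bô tè pí'))
# ['阮', '冬山河', '的', '水', '是', '清', '甲', '無', '地', '比']
```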

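Part of speech of neutral-tone words

    When a neutral-tone word has two characters, for example when typing 「久來kú--lâi」 or 「一寡--tsi̍t-kuá」, the result after clicking 「提交」 (Submit) shows 「久來kú-lâi」 and 「一寡tsi̍t-kuá」, and the part-of-speech tag cannot be displayed either.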

基礎句選擇表/72/change/ 500 Error

gunicorn_1  | Internal Server Error: /admin/標記/基礎句選擇表/72/change/
gunicorn_1  | Traceback (most recent call last):
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 34, in inner
gunicorn_1  |     response = get_response(request)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 126, in _get_response
gunicorn_1  |     response = self.process_exception_by_middleware(e, request)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 124, in _get_response
gunicorn_1  |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/contrib/admin/options.py", line 607, in wrapper
gunicorn_1  |     return self.admin_site.admin_view(view)(*args, **kwargs)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/utils/decorators.py", line 142, in _wrapped_view
gunicorn_1  |     response = view_func(request, *args, **kwargs)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
gunicorn_1  |     response = view_func(request, *args, **kwargs)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/contrib/admin/sites.py", line 223, in inner
gunicorn_1  |     return view(request, *args, **kwargs)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/contrib/admin/options.py", line 1650, in change_view
gunicorn_1  |     return self.changeform_view(request, object_id, form_url, extra_context)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/utils/decorators.py", line 45, in _wrapper
gunicorn_1  |     return bound_method(*args, **kwargs)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/utils/decorators.py", line 142, in _wrapped_view
gunicorn_1  |     response = view_func(request, *args, **kwargs)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/contrib/admin/options.py", line 1536, in changeform_view
gunicorn_1  |     return self._changeform_view(request, object_id, form_url, extra_context)
gunicorn_1  |   File "/usr/local/lib/python3.5/dist-packages/django/contrib/admin/options.py", line 1565, in _changeform_view
gunicorn_1  |     ModelForm = self.get_form(request, obj, change=not add)
gunicorn_1  | TypeError: get_form() got an unexpected keyword argument 'change'
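
The last frame of the traceback shows the admin calling `self.get_form(request, obj, change=not add)`, so any `get_form` override that does not accept a `change` keyword would fail this way. The sketch below reproduces the pattern with stand-in classes; `BaseAdmin`, `BrokenAdmin`, and `FixedAdmin` are hypothetical and are not Django classes. The traceback does not show where the override lives, so which class needs the change in this project is an assumption to verify.

```python
class BaseAdmin:
    """Stand-in for the admin base class; it is NOT Django's ModelAdmin."""

    def get_form(self, request, obj=None, change=False, **kwargs):
        return {'obj': obj, 'change': change}

    def change_view(self, request, obj):
        # Mirrors the call in the traceback: get_form(request, obj, change=not add)
        return self.get_form(request, obj, change=True)


class BrokenAdmin(BaseAdmin):
    def get_form(self, request, obj=None):  # no room for the 'change' keyword
        return super().get_form(request, obj)


class FixedAdmin(BaseAdmin):
    def get_form(self, request, obj=None, **kwargs):  # forwards extra keywords
        return super().get_form(request, obj, **kwargs)


try:
    BrokenAdmin().change_view(request=None, obj=72)
except TypeError as error:
    print('broken:', error)

print('fixed:', FixedAdmin().change_view(request=None, obj=72))
```

Accepting `**kwargs` and passing it on keeps the override working whether or not the caller supplies `change`.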

Cannot connect to mosesserver on DigitalOcean

time docker build -t tai5_gi2-liau7 流程/華語標台語語料/
runs into problems.

Probably a firewall problem.

docker .0.6
mosesserver .0.5
host .0.1

Inside the docker container

root@467607f66d64:/usr/local/su5-sing3# wget 172.17.0.1:8080
--2018-05-04 08:15:06--  http://172.17.0.1:8080/
Connecting to 172.17.0.1:8080... failed: Connection timed out.
Retrying.

Firewall: .0.6 => .0.1 gets no response

On the host

root@hok8bu7-docker-s-2vcpu-4gb-sgp1-01:~# wget 172.17.0.1:8080
--2018-05-04 16:14:49--  http://172.17.0.1:8080/
Connecting to 172.17.0.1:8080... connected.
HTTP request sent, awaiting response... 404 Not Found
2018-05-04 16:14:49 ERROR 404: Not Found.

Firewall: .0.1 => .0.5 gets no response
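
The two `wget` runs above amount to a TCP reachability check from two vantage points. The same check can be scripted with the standard library, as in this minimal sketch. `can_connect` is a hypothetical helper and the 3-second default timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False
```

Calling `can_connect('172.17.0.1', 8080)` once inside the container and once on the host would repeat the comparison made with `wget`. A timeout inside the container combined with success on the host is consistent with traffic being dropped between the two.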

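Discussion: how the 基礎句 (base sentence) list page should work

Discuss how to make choosing sentences less work

  1. Move chosen sentences to the very end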
-----            -----
1         2 v    1
2         =>     3
3                4
4                2

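先無標記 is a boolean, default False
Drawbacks:

  • The sentence order gets scrambled, so sentences are hard to find
  • There is no way to tell whether 1 and 3 are not yet chosen or marked "do not tag for now"
  2. Show the number of the most recently chosen sentence
-----             -----     --------------------
1         2 v    1          | Last updated: 2  |
2         =>     2  v       | On page 1        |
3                3          --------------------

先無標記 is a boolean, default False
Suppose 1 and 2 have both been viewed and 2 is then chosen: the page shows that the last update was 2.
The order does not change
Drawback: pages must be turned manually

  3. Use radio buttons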
-----               -----
1   ooo      2 v    1  oov
2   ooo      =>     2  ovo
3   ooo             3  ooo  

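先標記無 is an int, default 0 (not selected); the options are 愛揀 (choose) / 莫揀 (do not choose)
The order does not change
Drawback: the screen may get too cluttered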
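
Option 3 replaces the boolean with a three-state value, which removes the ambiguity noted under option 1. A minimal sketch of that idea follows, assuming the values 1 and 2 for the two choices. Only "0 means not selected" comes from the discussion; `TagFirst` and `unreviewed` are hypothetical names.

```python
from enum import IntEnum


class TagFirst(IntEnum):
    """Three states for the 先標記無 field of option 3 (names are illustrative)."""
    UNSELECTED = 0   # default, nobody has decided yet
    CHOOSE = 1       # 愛揀 (assumed value)
    SKIP = 2         # 莫揀 (assumed value)


def unreviewed(rows):
    """Return ids of rows nobody has decided on, keeping the original order."""
    return [row['id'] for row in rows if row['先標記無'] == TagFirst.UNSELECTED]


# Mirrors the diagram: sentence 1 chosen, sentence 2 skipped, sentence 3 untouched.
rows = [
    {'id': 1, '先標記無': TagFirst.CHOOSE},
    {'id': 2, '先標記無': TagFirst.SKIP},
    {'id': 3, '先標記無': TagFirst.UNSELECTED},
]
print(unreviewed(rows))  # [3]
```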
