skyworkaigc / skytext-chinese-gpt3


SkyText is a Chinese GPT3 pre-trained large model released by Singularity-AI. It can perform tasks such as article continuation, dialogue, Chinese-English translation, stylized content generation, reasoning, and writing poetry and couplets.

Home Page: https://openapi.singularity-ai.com/index.html#/

License: MIT License

gpt-3 gpt3 chatgpt ai artificial-intelligence chatbot instructgpt machine-learning model-training openai

skytext-chinese-gpt3's Introduction

SkyText

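SkyText is a Chinese GPT3 pre-trained large model released by Singularity-AI that can handle tasks such as chat, question answering, and translation between Chinese and English. Beyond basic chat, dialogue, and Q&A, the model also supports Chinese-English translation in both directions, text continuation, couplet matching, classical poetry writing, recipe generation, third-person paraphrasing, and drafting interview questions.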

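Hugging Face model pages

14-billion-parameter model [temporarily closed source; a new model at the ten-billion-parameter scale will be released soon, stay tuned!] https://huggingface.co/SkyWork/SkyText

3-billion-parameter model https://huggingface.co/SkyWork/SkyTextTiny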

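Some examples follow.

Example results

To try the model yourself, visit the Singularity-AI API trial.

Chat

[image: chat example]

Q&A

[image: Q&A example]

Recipe generation

Input: [image]

Output: [image]

Couplets

[image: couplet example]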

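Project highlights

  1. Technical advantage 1: a data-cleaning pipeline with more than 30 steps

    As NLP has advanced, large pre-trained models have become one of the core technologies of artificial intelligence. They typically need massive amounts of text for training, which makes web text the most important corpus source, and the quality of that corpus directly affects model quality. To train a strong model, Singularity-AI cleans its data with a pipeline of more than 30 steps; this attention to detail underpins the model's results.

  2. Technical advantage 2: an innovative encoding scheme optimized for Chinese

    Large pre-trained models have long been dominated by the English-language community, yet Chinese pre-trained models are clearly important. Chinese is not written with an alphabet as English is, so the way Chinese text is fed to a model should differ accordingly. Singularity-AI designed a distinctive encoding scheme around the characteristics of Chinese, one that better matches how the language is used, and rebuilt the Chinese vocabulary so that it is easier for the model to learn from.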
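
The encoding scheme itself is not described here. As a rough illustration of why Chinese input is a special case, the sketch below compares how many units the same sentence occupies when viewed as characters versus raw UTF-8 bytes, which is where byte-level tokenizers such as the original GPT-2 one start before any merges are learned. The function name `encoding_footprint` is hypothetical, and this is not SkyText's tokenizer.

```python
def encoding_footprint(text: str) -> dict:
    """Count the units produced by two naive ways of splitting text.

    Illustrative only: this is NOT SkyText's tokenizer, just a comparison
    of the character-level and UTF-8 byte-level views of one string.
    """
    return {
        "characters": len(text),                  # one unit per Unicode code point
        "utf8_bytes": len(text.encode("utf-8")),  # one unit per UTF-8 byte
    }


chinese = encoding_footprint("今天是个好天气")
english = encoding_footprint("nice day")

print(chinese)  # {'characters': 7, 'utf8_bytes': 21}
print(english)  # {'characters': 8, 'utf8_bytes': 8}
```

Each common Chinese character takes three UTF-8 bytes, so a vocabulary built without Chinese in mind can spend several tokens on a single character. That is one plausible motivation for a vocabulary designed around Chinese text.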

Singularity-AI news

——————————————————————————————————

Dependencies

Recommended:
transformers>=4.18.0
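
A quick way to confirm that an environment satisfies this recommendation is to compare the installed version against the minimum. The sketch below uses only the standard library; the helper names (`version_tuple`, `meets_minimum`, `check_transformers`) are hypothetical, and the comparison is deliberately simple: it looks at the first three numeric components and ignores pre-release suffixes.

```python
import re
from importlib import metadata

MINIMUM_TRANSFORMERS = "4.18.0"  # taken from the recommendation above


def version_tuple(version: str) -> tuple:
    """Turn '4.18.0' or '4.30.0.dev0' into a 3-tuple of integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # treat '4.18' the same as '4.18.0'
    return tuple(parts[:3])


def meets_minimum(installed: str, minimum: str = MINIMUM_TRANSFORMERS) -> bool:
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_transformers() -> str:
    """Report on the transformers version found in this environment."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    status = "OK" if meets_minimum(installed) else "too old"
    return f"transformers {installed}: {status}"


print(check_transformers())
```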

Model usage

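# -*- coding: utf-8 -*-
import torch
from transformers import AutoTokenizer
from transformers import GPT2LMHeadModel
from transformers import TextGenerationPipeline

# This example uses SkyWork/SkyText (13 billion parameters);
# SkyWork/SkyTextTiny (2.6 billion parameters) is also available.
# SkyWork/SkyText is listed above as temporarily closed source, so loading it
# may fail with a 401 error; swap in SkyWork/SkyTextTiny in that case.

model = GPT2LMHeadModel.from_pretrained("SkyWork/SkyText")
tokenizer = AutoTokenizer.from_pretrained("SkyWork/SkyText", trust_remote_code=True)
# Use the first GPU when one is available; -1 tells the pipeline to run on CPU.
device = 0 if torch.cuda.is_available() else -1
text_generator = TextGenerationPipeline(model, tokenizer, device=device)
input_str = "今天是个好天气"
max_new_tokens = 20
print(text_generator(input_str, max_new_tokens=max_new_tokens, do_sample=True))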

License

MIT License

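Join the developer group

Scan the QR code with WeChat to join the developer group.

[image: WeChat QR code]

If you find the project interesting, don't forget to give it a star~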

skytext-chinese-gpt3's People

Contributors

skyworkaigc

skytext-chinese-gpt3's Issues

HTTP 401: Unauthorized for url https://huggingface.co/SkyWork/SkyText/resolve/main/config.json

After installing transformers, running the demo code fails with:


HTTPError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
258 try:
--> 259 response.raise_for_status()
260 except HTTPError as e:

12 frames
HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/SkyWork/SkyText/resolve/main/config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError Traceback (most recent call last)
RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-643f85f2-6dc7fa3c1f329a1562a2c476)

Repository Not Found for url: https://huggingface.co/SkyWork/SkyText/resolve/main/config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/transformers/utils/hub.py in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash)
422
423 except RepositoryNotFoundError:
--> 424 raise EnvironmentError(
425 f"{path_or_repo_id} is not a local folder and is not a valid model identifier "
426 "listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to "

OSError: SkyWork/SkyText is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.
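
The model list in the introduction marks SkyWork/SkyText as temporarily closed source, which would explain the 401: the repo is presumably no longer publicly readable. One possible workaround, assuming SkyWork/SkyTextTiny is still public, is to try the repo ids in order and keep the first one that loads. The helper name `load_first_available` is hypothetical. It catches `OSError` because that is the exception transformers raises in the traceback above.

```python
from typing import Any, Callable, Sequence, Tuple


def load_first_available(
    repo_ids: Sequence[str],
    loader: Callable[[str], Any],
) -> Tuple[str, Any]:
    """Try each repo id in order and return the first one that loads.

    `loader` is any callable that takes a repo id, for example
    `GPT2LMHeadModel.from_pretrained`. An unreachable repo surfaces as
    OSError in the traceback above, so that is the exception caught here.
    """
    errors = []
    for repo_id in repo_ids:
        try:
            return repo_id, loader(repo_id)
        except OSError as exc:
            errors.append(f"{repo_id}: {exc}")
    raise OSError("no repo could be loaded:\n" + "\n".join(errors))


# Hypothetical usage (needs transformers installed and network access):
#   from transformers import GPT2LMHeadModel
#   repo_id, model = load_first_available(
#       ["SkyWork/SkyText", "SkyWork/SkyTextTiny"],
#       GPT2LMHeadModel.from_pretrained,
#   )
```

If you have been granted access to the closed repo, logging in with `huggingface-cli login` as the error message suggests is the other route. Remember to load the tokenizer from whichever repo id the helper returns.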
