microsoft / ai-system
System for AI Education Resource.
Home Page: https://microsoft.github.io/AI-System/
License: Creative Commons Attribution 4.0 International
My environment: CUDA Toolkit 10.0, PyTorch 1.5.0, TensorFlow 1.15.0.
When I run python mnist_tensorboard.py, the following error occurs:
2021-03-21 20:15:01.978418: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2021-03-21 20:15:04.781186: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
Error occurs, No graph saved
Traceback (most recent call last):
File "mnist_tensorboard.py", line 199, in <module>
main()
File "mnist_tensorboard.py", line 182, in main
writer.add_graph(model, images)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\utils\tensorboard\writer.py", line 707, in add_graph
self._get_file_writer().add_graph(graph(model, input_to_model, verbose))
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\utils\tensorboard\_pytorch_graph.py", line 291, in graph
raise e
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\utils\tensorboard\_pytorch_graph.py", line 285, in graph
trace = torch.jit.trace(model, args)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\jit\__init__.py", line 875, in trace
check_tolerance, _force_outplace, _module_class)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\jit\__init__.py", line 1027, in trace_module
module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\nn\modules\module.py", line 548, in __call__
result = self._slow_forward(*input, **kwargs)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\nn\modules\module.py", line 534, in _slow_forward
result = self.forward(*input, **kwargs)
File "mnist_tensorboard.py", line 61, in forward
x = self.conv1(x)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\nn\modules\module.py", line 548, in __call__
result = self._slow_forward(*input, **kwargs)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\nn\modules\module.py", line 534, in _slow_forward
result = self.forward(*input, **kwargs)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\nn\modules\conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "D:\Program_Files\Anaconda3\envs\ai-system-learn\lib\site-packages\torch\nn\modules\conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
I think something is wrong with TensorBoard. How can I fix this?
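The RuntimeError above says the input is a CPU tensor (torch.FloatTensor) while the conv1 weights are on the GPU (torch.cuda.FloatTensor). A likely cause is that images was not moved to the model's device before writer.add_graph(model, images) traced the model. The following is a minimal sketch of that fix, not the code from mnist_tensorboard.py: TinyNet and to_model_device are hypothetical names used only for illustration, and the sketch falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the model defined in mnist_tensorboard.py."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3)

    def forward(self, x):
        return self.conv1(x)


def to_model_device(model, tensor):
    """Move a tensor to the device that holds the model's parameters."""
    device = next(model.parameters()).device
    return tensor.to(device)


# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyNet().to(device)

# A batch created on the CPU, as a DataLoader would typically return it.
images = torch.randn(2, 1, 28, 28)

# Moving the batch first keeps the input and the weights on the same device.
images = to_model_device(model, images)
output = model(images)

# With matching devices, the call from the traceback would be:
# writer.add_graph(model, images)
```

If this is indeed the cause, the same applies to any other tensor passed to the model outside the training loop.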
When running bash quick-start-service.sh -m ~/master.csv -w ~/worker.csv -c ~/config.yaml
I get the error "No worker node is detected." I traced the error to the script /contrib/kubespray/script/openpai-generator.py (line 304). Is there a way to fix this?
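Hello! I see that the Textbook directory is missing the content of some chapters. Is the textbook for this course still being written? Also, are there accompanying videos for this course?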
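Which chapter should this correspond to? Could it be merged into the main text?
Replace "serving" (服务) with "inference" (推理), and reduce how often the word "problem" (问题) appears.
2. Heading changes
Original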
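12. AI Security and Privacy
12.1 Intrinsic Security and Privacy of AI
12.1.1 Intrinsic security problems of deep neural networks
12.1.2 Intrinsic privacy problems of deep neural networks
12.2 Security and Privacy in AI Training
12.2.1 Security problems in deep learning training
12.2.2 Privacy problems in deep learning training
12.2.3 Federated learning and its privacy problems during training
12.3 Security and Privacy in AI Serving
12.3.1 Security problems in deep learning serving
12.3.2 User privacy problems in deep learning serving
12.3.3 Model privacy problems in deep learning serving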
->
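New
12. AI Security and Privacy
12.1 Intrinsic Security and Privacy of AI
12.1.1 Security problems of deep neural networks
12.1.2 Privacy problems of deep neural networks
12.2 Security and Privacy in AI Training
12.2.1 Training system security
12.2.2 Training system privacy
12.2.3 Federated learning privacy
12.3 Security and Privacy in AI Inference
12.3.1 Inference system security
12.3.2 Inference system user privacy
12.3.3 Inference system privacy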
The main text is 6,000 characters and the references are 4,000 characters; the ratio is unbalanced, so shorten the latter.
Markdown file
Dear teachers, we are having trouble with our Lab 6 homework and are stuck on one step. The problem is shown in the picture. Could you please help us? Thanks a lot!
Usually only the training changes, so it is just a matter of using the same converter for a new model.
Great job! Will there be an open video course?
Is there a link to lecture videos for this textbook?
On page 19, in the calculation of the gradient of L(x), there is probably an error in the second part, for sin(exp(x)+exp(x)^2).
Should the gradient be cos(exp(x)+exp(x)^2)(exp(x)+2exp(x)^2), i.e. is there one unnecessary exp in the answer?
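The proposed gradient follows from the chain rule: the inner function exp(x)+exp(x)^2 has derivative exp(x)+2exp(x)^2, because exp(x)^2 = exp(2x). The page 19 text is not reproduced here, so the sketch below only checks the expression quoted in this issue against a central finite difference; the function names f, grad_f, and central_difference are made up for this check.

```python
import math


def f(x):
    """The term quoted in the issue: sin(exp(x) + exp(x)^2)."""
    e = math.exp(x)
    return math.sin(e + e ** 2)


def grad_f(x):
    """Gradient proposed in the issue: cos(exp(x)+exp(x)^2) * (exp(x)+2exp(x)^2)."""
    e = math.exp(x)
    return math.cos(e + e ** 2) * (e + 2 * e ** 2)


def central_difference(func, x, h=1e-6):
    """Numerical derivative used as an independent reference."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (-1.0, 0.0, 0.5):
    analytic = grad_f(x)
    numeric = central_difference(f, x)
    print(f"x={x:+.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```

The two columns should agree to several decimal places if the proposed gradient is right.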
Thanks
Thanks for your great work. Is there an open video recording of the course? If so, could you share the link?
Starting kubernetes...
setup k8s cluster
PLAY [localhost] *******************************************************************************************************************************************************************************************
[WARNING]: Could not match supplied host pattern, ignoring: bastion
PLAY [bastion[0]] ******************************************************************************************************************************************************************************************
skipping: no hosts matched
PLAY [k8s-cluster:etcd] ************************************************************************************************************************************************************************************
included: /home/openpai/pai-deploy/kubespray/roles/bootstrap-os/tasks/bootstrap-debian.yml for stu-276, iair279, stu-282
PLAY [k8s-cluster:etcd] ************************************************************************************************************************************************************************************
TASK [kubernetes/preinstall : Stop if access_ip is not pingable] *******************************************************************************************************************************************
changed: [iair279]
changed: [stu-276]
changed: [stu-282]
included: /home/openpai/pai-deploy/kubespray/roles/container-engine/docker/tasks/set_facts_dns.yml for stu-276, iair279, stu-282
[WARNING]: flush_handlers task does not support when conditional
TASK [download : prep_download | Create staging directory on remote node] **********************************************************************************************************************************
changed: [stu-276]
changed: [iair279]
changed: [stu-282]
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/prep_kubeadm_images.yml for stu-276
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_file.yml for stu-276
TASK [download : download_file | Create dest directory on node] ********************************************************************************************************************************************
changed: [stu-276]
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/extract_file.yml for stu-276
[WARNING]: noop task does not support when conditional
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_file.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_file.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_file.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_file.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/download_container.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/set_docker_image_facts.yml for stu-276
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/check_pull_required.yml for stu-276
TASK [download : download_file | Create dest directory on node] ********************************************************************************************************************************************
changed: [stu-282]
changed: [iair279]
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/extract_file.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/extract_file.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/extract_file.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/extract_file.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/set_docker_image_facts.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/check_pull_required.yml for stu-276, iair279, stu-282
TASK [download : download_container | Download image if required] ******************************************************************************************************************************************
changed: [stu-276 -> 192.168.1.187]
changed: [iair279 -> 192.168.1.187]
changed: [stu-282 -> 192.168.1.187]
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/set_docker_image_facts.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/check_pull_required.yml for stu-276, iair279, stu-282
TASK [download : download_container | Download image if required] ******************************************************************************************************************************************
changed: [stu-276 -> 192.168.1.187]
changed: [stu-282 -> 192.168.1.187]
changed: [iair279 -> 192.168.1.187]
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/set_docker_image_facts.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/check_pull_required.yml for stu-276, iair279, stu-282
TASK [download : download_container | Download image if required] ******************************************************************************************************************************************
changed: [stu-276 -> 192.168.1.187]
changed: [iair279 -> 192.168.1.187]
changed: [stu-282 -> 192.168.1.187]
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/set_docker_image_facts.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/check_pull_required.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/set_docker_image_facts.yml for stu-276, iair279, stu-282
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/check_pull_required.yml for stu-276, iair279, stu-282
TASK [download : download_container | Download image if required] ******************************************************************************************************************************************
changed: [iair279 -> 192.168.1.187]
changed: [stu-276 -> 192.168.1.187]
changed: [stu-282 -> 192.168.1.187]
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/set_docker_image_facts.yml for stu-276
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/check_pull_required.yml for stu-276
TASK [download : download_container | Download image if required] ******************************************************************************************************************************************
changed: [stu-276 -> 192.168.1.187]
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/set_docker_image_facts.yml for stu-276
included: /home/openpai/pai-deploy/kubespray/roles/download/tasks/check_pull_required.yml for stu-276
FAILED - RETRYING: download_container | Download image if required (4 retries left).
FAILED - RETRYING: download_container | Download image if required (3 retries left).
FAILED - RETRYING: download_container | Download image if required (2 retries left).
FAILED - RETRYING: download_container | Download image if required (1 retries left).
TASK [download : download_container | Download image if required] ******************************************************************************************************************************************
fatal: [stu-276 -> 192.168.1.187]: FAILED! => {"attempts": 4, "changed": true, "cmd": ["/usr/bin/docker", "pull", "k8s.gcr.io/cluster-proportional-autoscaler-amd64:1.6.0"], "delta": "0:00:15.027078", "end": "2021-05-11 19:29:07.932611", "msg": "non-zero return code", "rc": 1, "start": "2021-05-11 19:28:52.905533", "stderr": "Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", "stderr_lines": ["Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"], "stdout": "", "stdout_lines": []}
NO MORE HOSTS LEFT *****************************************************************************************************************************************************************************************
PLAY RECAP *************************************************************************************************************************************************************************************************
iair279 : ok=204 changed=7 unreachable=0 failed=0 skipped=192 rescued=0 ignored=0
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
stu-276 : ok=295 changed=8 unreachable=0 failed=1 skipped=258 rescued=0 ignored=0
stu-282 : ok=204 changed=7 unreachable=0 failed=0 skipped=192 rescued=0 ignored=0
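In the log above, the only failed task is the docker pull of a k8s.gcr.io image on stu-276, which timed out while connecting to the registry. In a long Ansible run, lines like this are easy to miss, so the sketch below pulls the host, command, and stderr out of each "fatal:" line. It is a hypothetical triage helper, not part of kubespray or OpenPAI; summarize_failures is an invented name, and sample_log is a shortened copy of the failing line above.

```python
import json
import re

# Matches lines such as: fatal: [host]: FAILED! => {...json...}
FATAL_RE = re.compile(
    r"^fatal: \[(?P<host>[^\]]+)\]: FAILED! => (?P<payload>\{.*\})$"
)


def summarize_failures(log_text):
    """Return one dict (host, cmd, stderr) per 'fatal:' line in the log."""
    failures = []
    for line in log_text.splitlines():
        match = FATAL_RE.match(line.strip())
        if not match:
            continue
        payload = json.loads(match.group("payload"))
        cmd = payload.get("cmd", "")
        if isinstance(cmd, list):
            cmd = " ".join(cmd)
        failures.append({
            "host": match.group("host"),
            "cmd": cmd,
            "stderr": payload.get("stderr", ""),
        })
    return failures


# Shortened copy of the failing line from the log above.
sample_log = '''changed: [stu-282 -> 192.168.1.187]
fatal: [stu-276 -> 192.168.1.187]: FAILED! => {"attempts": 4, "cmd": ["/usr/bin/docker", "pull", "k8s.gcr.io/cluster-proportional-autoscaler-amd64:1.6.0"], "rc": 1, "stderr": "Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"}
'''

failures = summarize_failures(sample_log)
for failure in failures:
    print(failure["host"], "|", failure["cmd"])
    print("  ", failure["stderr"])
```

Here the stderr indicates that the node could not reach k8s.gcr.io, which points at network access to the registry rather than at the playbook itself.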
Update chapter 2 outline in main README.md outline #71
Feel free to discuss the practice projects here!
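Dear Xiaowu,
It seems Chapter 2 is missing the content below. Will it be added in a later version?
2.2 Fundamentals of Deep Learning Systems
2.2.1 Representation of deep learning operations
2.2.2 Compilation frameworks and intermediate representations
2.2.3 Runtime and hardware
2.2.4 Distributed execution
2.2.5 Performance optimization of deep learning systems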
Dear Xiaowu, Transformer, MoE, and Pathways-style models are becoming increasingly important. Could you add a chapter on these model architectures, alongside CNN and RNN?
https://github.com/microsoft/AI-System/tree/main/Textbook/%E7%AC%AC2%E7%AB%A0-%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%9F%BA%E7%A1%80
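This issue serves as the submission address for the results of the 2021_USTC_JointPhD AI system practice project.
Project deadline: May 26
Please submit your results under this issue before 17:00 Beijing time on May 27, 2021.
Submission steps: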
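A public repository named MSRA-USTC-AISystemProject-2021:
# Lab 1
|-- lab 1
    # All images go in the images directory
    |-- images
        |-- image1.png
        |-- image2.jpg
    # All code goes in the src directory
    |-- src
        |-- code1.py
        |-- code2.ipynb
    # All other files go in the resources directory
    |-- resources
    # In README.md, describe the contents of each file, your experiment procedure, and your results
    |-- README.md
# Lab 6
|-- lab 6
    |-- ...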
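1. Email address provided at registration: yourmail[at]your.domain
2. Personal repository URL: https://github.com/yourname/yourrepositories
3. Additional information (optional)
If you have any other questions, feel free to contact the teaching assistants at any time.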