Comments (14)

ucasiggcas commented on August 28, 2024
Traceback (most recent call last):
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/utils/envs.py", line 221, in lazy_instance_by_fliename
    globals(), locals(), package.split("."))
  File "models/recall/gnn/model.py", line 23, in <module>
    from paddlerec.core.metrics import RecallK
ImportError: cannot import name 'RecallK' from 'paddlerec.core.metrics' (/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/metrics/__init__.py)
Catch Exception:cannot import name 'RecallK' from 'paddlerec.core.metrics' (/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/metrics/__init__.py)
Traceback (most recent call last):
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainer.py", line 246, in run
    self.context_process(self._context)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainer.py", line 207, in context_process
    self._status_processor[context['status']](context)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainers/general_trainer.py", line 90, in network
    network_class.build_network(context)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainers/framework/network.py", line 64, in build_network
    model_path, "Model")(context["env"])
TypeError: 'NoneType' object is not callable
Catch Exception:'NoneType' object is not callable

--------------------------------
PaddleRec Error Message Summary:
--------------------------------

Exit PaddleRec. catch exception in precoss status: [network_pass], except: 'NoneType' object is not callable
TypeError
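
Read together, the two tracebacks above suggest the TypeError is a follow-on of the ImportError: the first one shows lazy_instance_by_fliename failing to import models/recall/gnn/model.py and printing "Catch Exception", and the second shows build_network calling whatever came back. A minimal sketch of that pattern, assuming the loader swallows the exception and returns None (the helper and the module name below are hypothetical, not PaddleRec's actual code):

```python
import importlib


def lazy_class_by_name(module_name, class_name):
    """Hypothetical loader: returns the class, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except Exception as err:  # swallowing the error hides the root cause
        print("Catch Exception:%s" % err)
        return None


# "no_such_package_for_demo" is a made-up module name that cannot be imported.
model_class = lazy_class_by_name("no_such_package_for_demo", "Model")

try:
    model_class({})  # calling None raises the same TypeError as in the log
    second_error = None
except TypeError as err:
    second_error = str(err)

print(second_error)
```

If that is what happens here, the error to fix is the ImportError: the installed paddle_rec-0.1.0 egg has no RecallK in paddlerec.core.metrics, while models/recall/gnn/model.py imports it, which points to the model directory and the installed package coming from different versions.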

from paddlerec.

ucasiggcas commented on August 28, 2024
PaddleRec: Runner single_cpu_train Begin
Executor Mode: train
processor_register begin
Running SingleInstance.
Running SingleNetwork.
Warning:please make sure there are no hidden files in the dataset folder and check these hidden files:[]
need_split_files: False
QueueDataset can not support PY3, change to DataLoader
Traceback (most recent call last):
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainer.py", line 256, in run
    self.context_process(self._context)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainer.py", line 217, in context_process
    self._status_processor[context['status']](context)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainers/general_trainer.py", line 90, in network
    network_class.build_network(context)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainers/framework/network.py", line 80, in build_network
    model._data_loader)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainers/framework/dataset.py", line 60, in get_dataloader
    reader_class_name=reader_class_name)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/utils/dataloader_instance.py", line 96, in dataloader_by_name
    return gen_batch_reader()
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/utils/dataloader_instance.py", line 93, in gen_batch_reader
    return reader.generate_batch_from_trainfiles(files)
  File "models/recall/gnn/reader.py", line 135, in generate_batch_from_trainfiles
    self.input = self.base_read(files)
  File "models/recall/gnn/reader.py", line 35, in base_read
    for line in fin:
  File "/home/xulm1/anaconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Catch Exception:'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

--------------------------------
PaddleRec Error Message Summary:
--------------------------------

Exit PaddleRec. catch exception in precoss status: [network_pass], except: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
UnicodeDecodeError
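
Byte 0x80 at position 0 means the very first byte of a file handed to models/recall/gnn/reader.py is not valid UTF-8 text. One possible cause (an assumption, the thread does not confirm it) is a binary file in the data folder; Python pickles written with protocol 2 or later, for example, start with 0x80. A small sketch for finding which files in a folder do not decode as UTF-8; the function name and the demo file names are illustrative only:

```python
import os
import pickle
import tempfile


def find_non_utf8_files(folder):
    """Return (path, first_byte) for every file in folder that is not valid UTF-8."""
    bad = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as fin:
            data = fin.read()  # reads the whole file; fine for a quick check
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((path, data[:1]))
    return bad


# Demo on a throwaway folder: one text file and one pickle file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "train.txt"), "w", encoding="utf-8") as fout:
    fout.write("1,2,3\t4\n")
with open(os.path.join(demo_dir, "train.pkl"), "wb") as fout:
    pickle.dump([1, 2, 3], fout, protocol=2)

bad_files = find_non_utf8_files(demo_dir)
for path, first_byte in bad_files:
    print(path, first_byte)
```

Pointing the function at the actual training data folder would show whether the reader is being fed something other than text.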

ucasiggcas commented on August 28, 2024

I ran the command below. The second traceback is the result after reinstalling.
$ python -m paddlerec.run -m models/recall/gnn/config.yaml

ucasiggcas commented on August 28, 2024

What I don't understand is why the recall count (RecallCnt) keeps growing. There aren't that many items in total.
[image]

ucasiggcas commented on August 28, 2024
2020-09-15 15:14:05,122-INFO: 	[Train],  epoch: 0,  batch: 1, time_each_interval: 29.89s, LOSS: [10.532445], InsCnt: [10000.], RecallCnt: [73.], Acc(Recall@20): [0.0073]
2020-09-15 15:14:18,110-INFO: 	[Train],  epoch: 0,  batch: 2, time_each_interval: 12.99s, LOSS: [10.150826], InsCnt: [15000.], RecallCnt: [266.], Acc(Recall@20): [0.01773333]
2020-09-15 15:14:30,812-INFO: 	[Train],  epoch: 0,  batch: 3, time_each_interval: 12.70s, LOSS: [9.429095], InsCnt: [20000.], RecallCnt: [459.], Acc(Recall@20): [0.02295]
2020-09-15 15:14:42,839-INFO: 	[Train],  epoch: 0,  batch: 4, time_each_interval: 12.03s, LOSS: [8.945746], InsCnt: [25000.], RecallCnt: [814.], Acc(Recall@20): [0.03256]
2020-09-15 15:14:54,804-INFO: 	[Train],  epoch: 0,  batch: 5, time_each_interval: 11.96s, LOSS: [8.617248], InsCnt: [30000.], RecallCnt: [1152.], Acc(Recall@20): [0.0384]
2020-09-15 15:15:06,927-INFO: 	[Train],  epoch: 0,  batch: 6, time_each_interval: 12.12s, LOSS: [8.601961], InsCnt: [35000.], RecallCnt: [1509.], Acc(Recall@20): [0.04311429]
2020-09-15 15:15:18,632-INFO: 	[Train],  epoch: 0,  batch: 7, time_each_interval: 11.70s, LOSS: [8.352413], InsCnt: [40000.], RecallCnt: [1921.], Acc(Recall@20): [0.048025]
2020-09-15 15:15:30,354-INFO: 	[Train],  epoch: 0,  batch: 8, time_each_interval: 11.72s, LOSS: [8.464729], InsCnt: [45000.], RecallCnt: [2270.], Acc(Recall@20): [0.05044444]
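
On the question of why RecallCnt keeps growing: the log lines above are consistent with InsCnt and RecallCnt being running totals over all training instances seen so far, not counts of distinct items. Under that reading RecallCnt is bounded by InsCnt, not by the number of items. This is an inference from the numbers, not something the thread confirms; a quick check:

```python
# (InsCnt, RecallCnt, logged Acc(Recall@20)) copied from the log lines above.
rows = [
    (10000, 73, 0.0073),
    (15000, 266, 0.01773333),
    (20000, 459, 0.02295),
    (25000, 814, 0.03256),
    (30000, 1152, 0.0384),
    (35000, 1509, 0.04311429),
    (40000, 1921, 0.048025),
    (45000, 2270, 0.05044444),
]

# Each logged accuracy equals RecallCnt / InsCnt on the same line.
ratios_match = all(abs(recall / ins - acc) < 1e-6 for ins, recall, acc in rows)

# InsCnt grows by exactly one batch (5000 instances) per log line.
ins_steps = [b[0] - a[0] for a, b in zip(rows, rows[1:])]

print(ratios_match, ins_steps)
```

Both checks hold for every logged line, so Acc(Recall@20) here looks like a cumulative average since the start of the epoch.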

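With 1 million rows of training data, a bit over 30,000 items, about 12 s per batch and batch_size=5000, one epoch takes 1,000,000 / 5000 * 12 s = 2400 s, whereas the TF version needs less than 10 min for the same amount of data. This needs to improve.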
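
The epoch-time estimate works out as follows; the figures are taken from this comment and from time_each_interval in the log:

```python
rows = 1_000_000        # training rows
batch_size = 5000
seconds_per_batch = 12  # approximate, from time_each_interval in the log

batches_per_epoch = rows // batch_size
epoch_seconds = batches_per_epoch * seconds_per_batch
print(batches_per_epoch, epoch_seconds, epoch_seconds / 60)
```

That is 200 batches and about 40 minutes per epoch, against under 10 minutes reported for the TF version.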

ucasiggcas commented on August 28, 2024

And this isn't even with the 10 million rows of training data yet. What can I do? Large datasets are still out of reach.

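ucasiggcas commented on August 28, 2024

How is inference done?
How do I get the list of recommended items for each user?
Does the data have to be saved to disk (train and test)
and then read back?
That is cumbersome. Can't training start right after the data is processed, as one pipeline?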

ucasiggcas commented on August 28, 2024

models/recall/gnn/data/config.txt
187993
7806633
How can a script put the two numbers in this file into config.yaml?
I run training on a schedule, so I can't check the file every so often and edit the config by hand.
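
One way to script this is plain text substitution on config.yaml, which avoids any extra dependency. The sketch below is only an illustration: the thread does not say which config.yaml keys the two numbers belong to, so hyper_parameters, first_value and second_value are placeholder names, and the demo runs on temporary copies of the files.

```python
import os
import re
import tempfile


def read_two_numbers(config_txt):
    """Read the first two non-empty lines of config.txt as integers."""
    with open(config_txt, encoding="utf-8") as fin:
        values = [int(line) for line in fin if line.strip()]
    return values[0], values[1]


def set_yaml_scalar(yaml_text, key, value):
    """Replace the value on the first 'key: old' line, keeping its indentation.

    Anything after the colon on that line (including a trailing comment) is overwritten.
    """
    pattern = re.compile(r"^([ \t]*%s[ \t]*:[ \t]*).*$" % re.escape(key), re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), yaml_text, count=1)
    if count != 1:
        raise KeyError("key %r not found" % key)
    return new_text


# Demo files; the key names are placeholders, not real PaddleRec settings.
demo_dir = tempfile.mkdtemp()
txt_path = os.path.join(demo_dir, "config.txt")
yaml_path = os.path.join(demo_dir, "config.yaml")
with open(txt_path, "w", encoding="utf-8") as fout:
    fout.write("187993\n7806633\n")
with open(yaml_path, "w", encoding="utf-8") as fout:
    fout.write("hyper_parameters:\n  first_value: 0\n  second_value: 0\n")

first, second = read_two_numbers(txt_path)
with open(yaml_path, encoding="utf-8") as fin:
    text = fin.read()
text = set_yaml_scalar(text, "first_value", first)
text = set_yaml_scalar(text, "second_value", second)
with open(yaml_path, "w", encoding="utf-8") as fout:
    fout.write(text)
print(text)
```

A scheduled job could run such a script after preprocessing and before python -m paddlerec.run, once the placeholder keys are swapped for the real ones.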

ucasiggcas commented on August 28, 2024

Also, what about changing other values in config.yaml? This format is cumbersome.
I think it would be better to take parameters directly through argparse.
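
A rough sketch of what such an argparse front end could look like: repeated --set KEY=VALUE options are parsed and written into a nested config dict. This is not an existing PaddleRec feature; the --set flag, both function names, and the keys in the demo are made up for illustration.

```python
import argparse


def parse_overrides(argv):
    """Parse repeated '--set dotted.key=value' options into a list of (key, value)."""
    parser = argparse.ArgumentParser(description="Override config values (sketch)")
    parser.add_argument("--set", dest="overrides", action="append", default=[],
                        metavar="KEY=VALUE")
    args = parser.parse_args(argv)
    pairs = []
    for item in args.overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            parser.error("expected KEY=VALUE, got %r" % item)
        pairs.append((key, value))
    return pairs


def apply_overrides(config, pairs):
    """Write each value into a nested dict, following the dotted key path."""
    for dotted_key, value in pairs:
        node = config
        parts = dotted_key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value  # values stay strings; type conversion is left out
    return config


config = {"hyper_parameters": {"learning_rate": "0.001"}}
pairs = parse_overrides(["--set", "hyper_parameters.learning_rate=0.01",
                         "--set", "runner.epochs=3"])
config = apply_overrides(config, pairs)
print(config)
```

The config dict would come from loading config.yaml first, so the file keeps the defaults and the command line only carries what changes between runs.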

ucasiggcas commented on August 28, 2024
/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:789: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainer.py", line 256, in run
    self.context_process(self._context)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle_rec-0.1.0-py3.7.egg/paddlerec/core/trainer.py", line 217, in context_process
    self._status_processor[context['status']](context)
  File "core/trainers/general_trainer.py", line 113, in startup
    startup_class.startup(context)
  File "/data1/xulm1/PaddleRec/core/trainers/framework/startup.py", line 237, in startup
    context["exe"].run(startup_prog)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 790, in run
    six.reraise(*sys.exc_info())
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/six.py", line 696, in reraise
    raise value
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 785, in run
    use_program_cache=use_program_cache)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 838, in _run_impl
    use_program_cache=use_program_cache)
  File "/home/xulm1/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 912, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
2   paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
3   paddle::framework::GarbageCollector::GarbageCollector(paddle::platform::Place const&, unsigned long)
4   paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long)
5   paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
6   paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)

----------------------
Error Message Summary:
----------------------
Error: Place CUDAPlace(2) is not supported, Please check that your paddle compiles with WITH_GPU option or check that your train process hold the correct gpu_id if you use Executor at (/paddle/paddle/fluid/platform/device_context.cc:67)

ucasiggcas commented on August 28, 2024

But device 2 is actually usable:

>>> import paddle.fluid as fluid
>>> fluid.CUDAPlace(2)
<paddle.fluid.core_avx.CUDAPlace object at 0x7fcf4e938c30>
>>> 
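
The interactive check shows that the place object can be constructed, but the error in the earlier traceback is raised at a different point, when the executor looks the place up in its device pool (DeviceContextPool::Get in the C++ stack). One possible explanation, which the thread does not confirm, is that the training process sees fewer GPUs than the interactive session, for example because CUDA_VISIBLE_DEVICES is set for it. With that variable set, CUDA renumbers the visible devices starting from 0. The helper below is a hypothetical illustration of that renumbering, not a Paddle API:

```python
def logical_gpu_id(physical_id, cuda_visible_devices):
    """Map a physical GPU index to the index a process sees under CUDA_VISIBLE_DEVICES.

    Returns None when the physical GPU is hidden from the process.
    Only handles plain integer indices, not GPU UUIDs.
    """
    if cuda_visible_devices is None:  # variable unset: all GPUs visible, same numbering
        return physical_id
    visible = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    if physical_id in visible:
        return visible.index(physical_id)
    return None


print(logical_gpu_id(2, None))   # variable unset
print(logical_gpu_id(2, "2"))    # only GPU 2 visible: the process sees it as device 0
print(logical_gpu_id(2, "0,1"))  # GPU 2 hidden from the process
```

Under this assumption, printing os.environ.get("CUDA_VISIBLE_DEVICES") inside the training process would show whether the GPU ID in the config and the ID the process can actually address agree.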

ucasiggcas commented on August 28, 2024

Using 1 for both train and infer, with the GPU explicitly set to 1:

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError: 

Out of memory error on GPU 1. Cannot allocate 7.003248GB memory on GPU 1, available memory is only 2.751526GB.

Please check whether there is any other process using GPU 1.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model. 

 at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)

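This suggests that the GPU memory used during training is not released after training finishes.
Next I'll try train on GPU 1 and infer on GPU 0.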

ucasiggcas commented on August 28, 2024

Still doesn't work. I don't know whether I changed something I shouldn't have. This is exhausting.

----------------------
Error Message Summary:
----------------------
Error: Place CUDAPlace(0) is not supported, Please check that your paddle compiles with WITH_GPU option or check that your train process hold the correct gpu_id if you use Executor at (/paddle/paddle/fluid/platform/device_context.cc:67)

EnforceNotMet

This is still some way from being usable in practice.

ucasiggcas commented on August 28, 2024

[image]
