Comments (9)

guoxinyang commented on August 22, 2024 3

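@whuaxiom For data that has IDs, feature_dim must be exact: it has to be greater than or equal to the largest feature id. For de-ID'd data, feature_dim is only a reference value. ps-plus uses it to allocate parameters; it does not have to be exact, and automatic expansion is supported. It is still best to get the order of magnitude right, otherwise memory may OOM.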
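
A rough illustration of the rule above, taken literally. This is not XDL code: `check_feature_dim` is a hypothetical helper, and the factor-of-10 window used for the "same order of magnitude" check on de-ID'd data is an arbitrary assumption.

```python
def check_feature_dim(feature_ids, feature_dim, has_ids=True):
    """Return True if feature_dim looks acceptable under the rule in the comment."""
    if has_ids:
        # ID data: feature_dim must be >= the largest feature id.
        return feature_dim >= max(feature_ids)
    # De-ID'd data: feature_dim is only a sizing hint, so just check that it is
    # within a factor of 10 (assumed) of the number of distinct keys seen.
    distinct = len(set(feature_ids))
    return distinct / 10 <= feature_dim <= distinct * 10
```

For example, `check_feature_dim([3, 7, 42], 41)` fails the check, while `check_feature_dim([3, 7, 42], 42)` passes it.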

from x-deeplearning.

guoxinyang commented on August 22, 2024 2

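@DecKen The code has been released. Real-time feature frequency control currently supports probabilistic filtering, implemented in the GetWithAddProbability method in hashmap.cc; filtering by bloom filter will be supported later. Feature eviction supports deletion based on the value of a given slot. The common practice today is to filter on the global_step difference between the last update and now, implemented in hash_unary_filter.cc and ps_mark_op.cc. Concretely, each round uses ps_mark_op to mark the global_step at which a feature was updated, and then, at some interval of time or steps, hash_unary_filter is called to run one filtering pass over all keys.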
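
The two mechanisms described above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not XDL's implementation: `ToyFeatureTable` and its method names are hypothetical, the table is an ordinary dict rather than the hashmap in hashmap.cc, and keys that were never marked are assumed to date from step 0.

```python
import random


class ToyFeatureTable:
    """Toy stand-in for a sparse parameter table (hypothetical, not XDL's hashmap)."""

    def __init__(self, add_probability, seed=0):
        self.add_probability = add_probability
        self.rng = random.Random(seed)
        self.values = {}     # key -> parameter (a single float here, for brevity)
        self.last_step = {}  # key -> global_step of the last marked update

    def get_with_add_probability(self, key):
        """Return the value for key; a new key is admitted only with some probability."""
        if key not in self.values:
            if self.rng.random() >= self.add_probability:
                return None  # not admitted this time
            self.values[key] = 0.0
        return self.values[key]

    def mark(self, keys, global_step):
        """Record global_step as the last update step of every admitted key in keys."""
        for key in keys:
            if key in self.values:
                self.last_step[key] = global_step

    def filter_stale(self, global_step, max_age):
        """Drop every key whose last marked step is more than max_age steps old."""
        stale = [
            key for key in self.values
            # Keys that were never marked are treated as last updated at step 0.
            if global_step - self.last_step.get(key, 0) > max_age
        ]
        for key in stale:
            del self.values[key]
            self.last_step.pop(key, None)
        return stale
```

In a training loop one would call `mark` on the keys touched in each step and call `filter_stale` every so many steps, which mirrors the mark-then-filter pattern in the comment.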

guoxinyang commented on August 22, 2024

Thanks for your interest. The code currently supports reading from and writing to Kafka; the documentation will be added later.

DecKen commented on August 22, 2024

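What I am curious about is the Online-Learning part: how are real-time feature frequency control and expired-feature eviction implemented? Will this part also be open-sourced later?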

Thanks for your interest. The code currently supports reading from and writing to Kafka; the documentation will be added later.

kobe0849 commented on August 22, 2024

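@DecKen The code has been released. Real-time feature frequency control currently supports probabilistic filtering, implemented in the GetWithAddProbability method in hashmap.cc; filtering by bloom filter will be supported later. Feature eviction supports deletion based on the value of a given slot. The common practice today is to filter on the global_step difference between the last update and now, implemented in hash_unary_filter.cc and ps_mark_op.cc. Concretely, each round uses ps_mark_op to mark the global_step at which a feature was updated, and then, at some interval of time or steps, hash_unary_filter is called to run one filtering pass over all keys.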

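So your approach is to record something like a timestamp for each feature and delete the feature if it has not been updated within a certain window. Is that the right understanding? How is that window chosen in practice: is it tuned to keep the total number of features within a certain range, or is it simply picked by hand? Also, which class contains the business code that performs this filtering?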

songyue1104 commented on August 22, 2024

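XDL has two ways to control the number of features. The first is feature admission: a feature takes part in training only once its occurrence count exceeds a threshold. This is currently done probabilistically; exact filtering based on a counting bloom filter will be implemented later. The second is feature eviction: time-based eviction is provided today, and business code needs to call the cleanup op periodically. Later we will build a more complete, comprehensive feature scoring mechanism.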
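
For readers unfamiliar with the counting bloom filter mentioned above as a planned admission mechanism, here is a minimal sketch of the general technique. It is not XDL code: `CountingBloomAdmission`, its parameters, and the choice of `hashlib.blake2b` for hashing are all assumptions made for illustration. Hash collisions can make a counting bloom filter overcount, so a key may be admitted slightly early, but never late.

```python
import hashlib


class CountingBloomAdmission:
    """Admit a key once it has been seen more than `threshold` times (hypothetical)."""

    def __init__(self, num_counters=1 << 16, num_hashes=3, threshold=2):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.threshold = threshold

    def _slots(self, key):
        """Map key to a set of counter indexes, one per hash function."""
        slots = set()
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            slots.add(int.from_bytes(digest, "big") % len(self.counters))
        return slots

    def observe(self, key):
        """Count one occurrence of key; return True if it should now be admitted."""
        slots = self._slots(key)
        for slot in slots:
            self.counters[slot] += 1
        # The smallest counter is an upper bound on the true occurrence count.
        return min(self.counters[slot] for slot in slots) > self.threshold
```

With the default `threshold=2`, the third observation of a key is the first one for which `observe` returns True.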

hazoth commented on August 22, 2024

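Looking forward to the documentation.
One question: in online learning, how is the model usually updated? From the documentation I see that online inference requires exporting the XDL model and converting it to blaze. How is this handled for online learning?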

simon1024 commented on August 22, 2024

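Looking forward to the documentation.
One question: in online learning, how is the model usually updated? From the documentation I see that online inference requires exporting the XDL model and converting it to blaze. How is this handled for online learning?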

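Also looking forward to the documentation; I am getting ready to do online learning. I have worked through the process of exporting an incremental model, but I do not quite understand how the model is hooked up to blaze for incremental updates.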

nolanliou commented on August 22, 2024

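@DecKen The code has been released. Real-time feature frequency control currently supports probabilistic filtering, implemented in the GetWithAddProbability method in hashmap.cc; filtering by bloom filter will be supported later. Feature eviction supports deletion based on the value of a given slot. The common practice today is to filter on the global_step difference between the last update and now, implemented in hash_unary_filter.cc and ps_mark_op.cc. Concretely, each round uses ps_mark_op to mark the global_step at which a feature was updated, and then, at some interval of time or steps, hash_unary_filter is called to run one filtering pass over all keys.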

I have a question: if ps_mark_op is used to mark updated features, how do I get the IDs of the features that were updated in the current step? Thanks.
