investment_data's Introduction

Hi there 👋 This is Di, nice to meet you.

About me

  • 💼 Full Stack Software Engineer at Microsoft
  • 📈 Part-time Quant Trader
  • 😄 I love hacking stuff
  • 🌱 I’m currently learning distributed storage systems and financial engineering.
  • 📫 How to reach me: [email protected]
  • ⚡ Fun fact:
  • 💬 Ask me about anything here

investment_data's People

Contributors

andyli1007, chenditc, dmnsn7, hlstwizard, xu-li, zhuoju36

investment_data's Issues

Missing data for delisted stocks

The stock sz000418 is listed in csi500:

SZ000418 2012-03-20 2019-06-10

However, features contains no data for this stock, which was delisted in 2019.
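
A quick way to look for other instruments with the same problem is to compare the instrument list against the feature directories. The sketch below assumes the usual qlib on-disk layout (instruments/<market>.txt with whitespace-separated symbol, start and end columns, and one features/<symbol> directory per instrument); the function name and paths are illustrative, not part of this repository.

```python
from pathlib import Path


def find_missing_features(data_dir, market="csi500"):
    """Return symbols listed for `market` that have no features directory.

    Assumes instruments/<market>.txt holds one "SYMBOL START END" row per line
    and features/ holds one sub-directory per symbol (case-insensitive match).
    """
    data_dir = Path(data_dir).expanduser()
    listed = set()
    instruments_file = data_dir / "instruments" / f"{market}.txt"
    for line in instruments_file.read_text().splitlines():
        fields = line.split()
        if fields:
            listed.add(fields[0].upper())
    features_dir = data_dir / "features"
    present = {p.name.upper() for p in features_dir.iterdir() if p.is_dir()}
    return sorted(listed - present)
```

Calling find_missing_features("~/.qlib/qlib_data/cn_data") would list every csi500 member without a feature directory, which should include SZ000418 if the report above still holds.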

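About adjusted prices

Hello, and thank you very much for your work.
I have a question. You wrote:
"Validated the adjustment factors of A-share historical data. Because of floating-point precision, small differences are left unadjusted; for large differences, the factor is computed from the corresponding dividend or ex-rights announcement."

In practice, Wind's adjustment method is the most rigorous, and the adjustment methods of other data sources have problems. So shouldn't Wind's adjusted prices and adjustment factors be treated as authoritative, instead of computing them independently "from the corresponding dividend or ex-rights announcement", since that algorithm may be wrong?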

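About the constituent stock pool files

I downloaded the latest qlib data file you uploaded, extracted it, and looked at csi300.txt. The same stock appears on multiple lines, and I'm not sure whether this is abnormal. For example, if you search for SH601668, you will find several lines containing it.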
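
One plausible reading is that each row is a separate membership interval, so a stock that left the index and later rejoined would appear more than once. The sketch below groups rows by symbol and flags only symbols whose intervals overlap, which would point to genuinely duplicated rows. It assumes whitespace-separated "SYMBOL START END" rows with ISO dates; the function names are hypothetical.

```python
from collections import defaultdict


def membership_intervals(lines):
    """Group (start, end) membership intervals by symbol, sorted by start date."""
    intervals = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        symbol, start, end = fields[:3]
        intervals[symbol].append((start, end))
    return {symbol: sorted(spans) for symbol, spans in intervals.items()}


def overlapping_symbols(intervals):
    """Return symbols that have at least one pair of overlapping intervals."""
    flagged = []
    for symbol, spans in intervals.items():
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start <= prev_end:
                flagged.append(symbol)
                break
    return sorted(flagged)
```

Running overlapping_symbols(membership_intervals(open("csi300.txt"))) on the extracted file would show whether SH601668 has overlapping rows or simply several disjoint intervals.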

ts_index_weight update frequency

Hello, has ts_index_weight not been updated for a long time? The trade_date for the CSI 300 constituents is still stuck at July 1, 2022.

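About the price change (change)

Hello, I'd like to ask: when normalizing the data, YahooNormalize.calc_change from qlib's yahoo.collector is called to compute the price change, and it uses the unadjusted close price. Isn't that unreasonable? Shouldn't the adjusted close price be used instead?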
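
To illustrate why the choice of price matters, here is a minimal sketch with made-up numbers (a hypothetical 2-for-1 split on the third day). It is not qlib's calc_change; daily_change is an illustrative helper that computes the day-over-day change either on the raw close or on close multiplied by an adjustment factor.

```python
def daily_change(close, factor=None):
    """Day-over-day percentage change; uses close * factor when factor is given."""
    if factor is None:
        factor = [1.0] * len(close)
    price = [c * f for c, f in zip(close, factor)]
    # The first day has no previous price, so its change is undefined.
    return [None] + [cur / prev - 1 for prev, cur in zip(price, price[1:])]


# Hypothetical data: the raw close halves on day 3 while the factor doubles.
close = [10.0, 11.0, 5.5, 5.5]
factor = [1.0, 1.0, 2.0, 2.0]

raw_change = daily_change(close)          # day 3 shows -0.5, a spurious 50% drop
adj_change = daily_change(close, factor)  # day 3 shows 0.0, no real price move
```

On the unadjusted series the split looks like a 50% loss, while the adjusted series shows no change, which is the distortion the question is concerned about.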

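Daily update data is delayed

Since 2024-01-08, the Daily update release qlib_bin data no longer includes the data for the release day. For example, in the 2024-01-08 release the last trading day is 01-05, and in the 01-09 release the last trading day is 01-08.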
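
The last trading day in an extracted package can be checked without loading qlib. This sketch assumes the usual qlib layout, in which calendars/day.txt lists one trading date per line; the function name is illustrative.

```python
from pathlib import Path


def last_trading_day(data_dir):
    """Return the last non-empty line of calendars/day.txt, or None if it is empty."""
    calendar = Path(data_dir).expanduser() / "calendars" / "day.txt"
    days = [line.strip() for line in calendar.read_text().splitlines() if line.strip()]
    return days[-1] if days else None
```

Comparing last_trading_day("~/.qlib/qlib_data/cn_data") with the release date makes the one-day lag described above easy to confirm.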

Is there a problem with the data?

Using the configuration below with the 2023-04-15 data, the results are abnormally poor, which doesn't seem right.

'The following are analysis results of benchmark return(1day).'
risk
mean 0.000884
std 0.008006
annualized_return 0.210426
information_ratio 1.703677
max_drawdown -0.078120
'The following are analysis results of the excess return without cost(1day).'
risk
mean -0.000896
std 0.004709
annualized_return -0.213222
information_ratio -2.934913
max_drawdown -0.104559
'The following are analysis results of the excess return with cost(1day).'
risk
mean -0.001283
std 0.004692
annualized_return -0.305263
information_ratio -4.217365
max_drawdown -0.139517

qlib_init:
    provider_uri: "~/.qlib/qlib_data/cn_data"
    region: cn
market: &market csi500
benchmark: &benchmark SH000905
data_handler_config: &data_handler_config
    start_time: 2018-01-01
    end_time: 2023-04-01
    fit_start_time: 2018-01-01
    fit_end_time: 2021-12-31
    instruments: *market
port_analysis_config: &port_analysis_config
    strategy:
        class: TopkDropoutStrategy
        module_path: qlib.contrib.strategy
        kwargs:
            model: <MODEL> 
            dataset: <DATASET>
            topk: 50
            n_drop: 10
    backtest:
        start_time: 2022-11-01
        end_time: 2023-04-01
        account: 100000000
        benchmark: *benchmark
        exchange_kwargs:
            limit_threshold: 0.095
            deal_price: close
            open_cost: 0.0005
            close_cost: 0.0015
            min_cost: 5
task:
    model:
        class: LGBModel
        module_path: qlib.contrib.model.gbdt
        kwargs:
            loss: mse
            colsample_bytree: 0.9
            learning_rate: 0.1
            subsample: 0.9
            lambda_l1: 205.6999
            lambda_l2: 580.9768
            max_depth: 8
            num_leaves: 250
            num_threads: 20
    dataset:
        class: DatasetH
        module_path: qlib.data.dataset
        kwargs:
            handler:
                class: Alpha158
                module_path: qlib.contrib.data.handler
                kwargs: *data_handler_config
            segments:
                train: [2010-01-01, 2021-12-31]
                valid: [2022-01-01, 2022-10-31]
                test: [2022-11-01, 2023-04-01]
    record: 
        - class: SignalRecord
          module_path: qlib.workflow.record_temp
          kwargs: 
            model: <MODEL>
            dataset: <DATASET>
        - class: SigAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs: 
            ana_long_short: False
            ann_scaler: 252
        - class: PortAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs: 
            config: *port_analysis_config

Daily Update incremental export

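Is it possible to export the qlib-format data incrementally?

My current workflow is to update the data from dolthub, perhaps once a week. To export the latest data, is a full export required?
If not, how can I export incrementally? (If this isn't supported, I can submit a PR, but I'd need some guidance.)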

Incorrect adjustment data for 833533.BJ

For 833533.BJ, the adjustment factor is 1 on 20220527 and 4.07 on 20220530, but other data sources show no adjustment event at that time.
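
Jumps like this can be screened for mechanically. The sketch below flags any day on which the adjustment factor changes by more than a relative threshold; the 50% default is an arbitrary assumption chosen for illustration, and factor_jumps is a hypothetical helper.

```python
def factor_jumps(dates, factors, threshold=0.5):
    """Return (date, previous_factor, factor) for relative changes above threshold."""
    jumps = []
    for i in range(1, len(factors)):
        prev, cur = factors[i - 1], factors[i]
        # Skip zero or missing previous values to avoid dividing by zero.
        if prev and abs(cur / prev - 1) > threshold:
            jumps.append((dates[i], prev, cur))
    return jumps


# The two data points reported above for 833533.BJ.
suspicious = factor_jumps(["20220527", "20220530"], [1.0, 4.07])
```

Each flagged date can then be checked against dividend or ex-rights announcements to decide whether the jump is a real corporate action or a data error.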

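About the data published in release tags

Thank you for building a fairly reliable data source. I'd like to ask: is the data package you publish on GitHub each day a full dataset? At what fixed time each day are these tags updated? That way I can be lazy and write a script that downloads directly from GitHub.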

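normalize.py hits a network error

When fetching the calendar, requests were blocked by the Shenzhen Stock Exchange (www.szse.cn); the exchange's site could not be reached even from a regular browser.
I suspect the requests were too frequent. Should a delay be added?

Log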
C:\Projects\Quant\Quant1\bin\Debug\net7.0>python ./normalize.py normalize_data --source_dir C:\Projects\Quant\Quant1\bin\Debug\net7.0\output\ --normalize_dir C:\Projects\Quant\Quant1\bin\Debug\net7.0\output_normalize\ --max_workers=28 --date_field_name="tradedate"
2024-04-12 18:31:40.760 | INFO | data_collector.utils:get_calendar_list:68 - get calendar list: ALL......
2024-04-12 18:33:17.852 | WARNING | data_collector.utils:wrapper:491 - _get_calendar: 1 :2000-01-->HTTPConnectionPool(host='www.szse.cn', port=80): Read timed out. (read timeout=None)
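
The delay suggested above could take the form of a retry with exponential backoff around the calendar request. This is a generic sketch, not the code in data_collector.utils; call_with_retry and its parameters are hypothetical, and the sleep argument exists only so the wait can be replaced in tests.

```python
import time


def call_with_retry(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**attempt, then retry.

    Re-raises the last exception once all attempts have failed.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except Exception as exc:  # in real code, catch only network errors
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Wrapping the request in call_with_retry spaces out repeated attempts (1 s, 2 s, 4 s, and so on), which may reduce the chance of being blocked again, though it cannot help if the site has already blocked the client's address.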

Data problem: the 2023-07-29 data package looks incorrect when viewed with qlib

After extracting the 2023-07-29 data package and viewing it with qlib, the data looks incorrect, for example sh601933:

instrument  datetime       $open      $volume     $high      $low  $adjclose    $close
sh601933    2023-07-24  0.937075  2016378.750  0.962632  0.928556  30.900000  0.956953
            2023-07-25  0.962424  1617211.750  0.965263  0.945390  30.709999  0.951068
            2023-07-26  0.948279  1798458.375  0.965314  0.945440  31.170000  0.965314
            2023-07-27  0.959740  1200080.250  0.965419  0.954061  30.990000  0.959740
            2023-07-28  0.956948  1648106.875  0.973986  0.954108  31.450001  0.973986

However, this stock's actual closing price is 3.43.

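Missing data for 2023-06-09

After importing into qlib, there is no data for 2023-06-09, as shown in the screenshot:
image

The dolt data source doesn't have it either, as shown in the screenshot:
image

There is no stock data for either the Shanghai or Shenzhen exchange, although there is some Beijing Stock Exchange data.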
