Comments (5)
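As far as statistical models go, the simplest approach is probably to collect the weather in Beijing on August 1 across past years into a single table. A model trained on that table should be able to generate any number of August 1 weather records for Beijing that follow the same distribution. Repeating this once for each of the 31 days in August yields any number of simulated data sets.
This approach should work, but for many users it may not be easy to obtain that much data, or data that accurate.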
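A minimal sketch of the per-day idea, assuming a single numeric column and an independent Gaussian per calendar day in place of a trained model. The `history` values and the function names are hypothetical and exist only for illustration:

```python
import random
import statistics

def fit_daily_gaussians(history):
    """history maps day-of-month -> past values for that day (one per year)."""
    return {day: (statistics.mean(vals), statistics.stdev(vals))
            for day, vals in history.items()}

def sample_month(params, rng):
    """Draw one synthetic value per day, independently for each day."""
    return {day: rng.gauss(mu, sigma)
            for day, (mu, sigma) in sorted(params.items())}

# Hypothetical data: max temperature (deg C) on Aug 1..3 over four years.
history = {
    1: [31.2, 33.0, 29.8, 32.1],
    2: [30.5, 34.1, 30.2, 31.7],
    3: [32.0, 33.5, 28.9, 30.8],
}

params = fit_daily_gaussians(history)
rng = random.Random(0)  # fixed seed so runs are repeatable
synthetic = [sample_month(params, rng) for _ in range(5)]
```

Because each day is sampled on its own, this sketch ignores any relationship between neighbouring days, and it needs several years of history per day to estimate the spread.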
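Even with only one year of data, that year still carries a lot of useful information. Several classical statistical algorithms address this problem, for example:
- Autoregressive moving average (ARMA)
- Autoregressive integrated moving average (ARIMA)
- Seasonal autoregressive integrated moving average (SARIMA)
- and others
These methods generate data by analyzing time-series characteristics such as seasonality (as distinct from cyclicity), cyclicity, and trend, and they may need less data.
I suspect these algorithms are better suited to a single column of this problem, such as temperature or humidity.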
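To make the single-column time-series idea concrete, here is a rough sketch using AR(1), the simplest member of the ARMA family, fitted by least squares on one short series. It is a stand-in for illustration, not a full ARIMA/SARIMA implementation (a library such as statsmodels would be the practical choice for those). The temperature values and function names are hypothetical:

```python
import math
import random
import statistics

def fit_ar1(series):
    """Estimate the mean, lag-1 coefficient phi, and noise std of an AR(1) model."""
    mu = statistics.mean(series)
    centered = [x - mu for x in series]
    pairs = list(zip(centered[:-1], centered[1:]))
    # Least-squares slope of x_t on x_{t-1} (both centered).
    phi = sum(a * b for a, b in pairs) / sum(a * a for a, _ in pairs)
    residuals = [b - phi * a for a, b in pairs]
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return mu, phi, sigma

def simulate_ar1(mu, phi, sigma, n, rng):
    """Generate n values from x_t = mu + phi * (x_{t-1} - mu) + noise."""
    x = mu
    out = []
    for _ in range(n):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma)
        out.append(x)
    return out

# Hypothetical daily max temperatures (deg C) from a single year.
observed = [30.1, 31.0, 31.8, 30.9, 29.5, 28.7, 29.9, 31.2, 32.0, 31.1]

mu, phi, sigma = fit_ar1(observed)
simulated = simulate_ar1(mu, phi, sigma, 31, random.Random(0))
```

Unlike per-day sampling, each simulated value depends on the previous one, so day-to-day persistence is retained. Seasonality and trend are not modelled in this sketch.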
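It may be possible to develop an algorithm that, given the values of some features, fills in the values of the remaining features; this should be feasible.
Combined with classical methods such as time-series models, this might solve the problem more elegantly.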
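One very simple way to picture "known features in, missing features out" is a linear regression with residual noise: fit one column against another, then sample the missing column given the known one. This is only a sketch under that linear-Gaussian assumption; the temperature and humidity numbers and the function names are hypothetical:

```python
import math
import random
import statistics

def fit_conditional(xs, ys):
    """Fit y ~ a + b * x by least squares and keep the residual std."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return a, b, sigma

def complete(known_x, model, rng):
    """Sample a plausible y for a known x from the fitted model."""
    a, b, sigma = model
    return a + b * known_x + rng.gauss(0.0, sigma)

# Hypothetical paired observations: temperature (deg C) and humidity (%).
temperature = [28.0, 30.0, 32.0, 34.0, 36.0]
humidity = [80.0, 72.0, 66.0, 58.0, 50.0]

model = fit_conditional(temperature, humidity)
rng = random.Random(0)
# Known temperatures could come from a time-series model for that column.
filled = [(t, complete(t, model, rng)) for t in (29.0, 33.0)]
```

The known column could itself be produced by a time-series model, with the remaining columns completed row by row in this way.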
from synthetic-data-generator.
Hi @twodonkeys ,
We have received your PR and thank you for your valuable content.
If, after a technical evaluation, we decide to support the feature you proposed, we will open an Issue tagged Feature/SDG-Feature that describes the new feature precisely, and we will keep you updated in follow-up conversations on this Issue :)
from synthetic-data-generator.
Could you provide a CSV version of the data shown in the figure (with as many entries as possible) so that our technical staff can analyze and experiment with it?
from synthetic-data-generator.
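This should be achievable simply by grouping the data.
As far as statistical models go, the simplest approach is probably to collect the weather in Beijing on August 1 across past years into a single table. A model trained on that table should be able to generate any number of August 1 weather records for Beijing that follow the same distribution. Repeating this once for each of the 31 days in August yields any number of simulated data sets.
I am not sure how best to organize the data for an LLM-based or GAN-based model. The days before and after a given date should also carry useful information. It would be best to study this scenario and the corresponding data set in detail so that we can build a better simulation.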
from synthetic-data-generator.
Could you provide a CSV version of the data shown in the figure (with as many entries as possible) so that our technical staff can analyze and experiment with it?
open-meteo-39.89N116.36E47m (1).csv
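The data is of this kind: each day's weather conditions, plus some weather labels for that day. I could then use the weather labels to generate data that matches the corresponding weather characteristics.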
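A minimal sketch of label-conditioned generation, assuming each row carries one categorical weather label: group the rows by label, then resample rows from the requested group and add small noise to the numeric fields. The column names (`label`, `temp_max`, `precip_mm`) and the values are hypothetical and are not taken from the attached CSV:

```python
import random
from collections import defaultdict

def group_by_label(rows, label_key):
    """Bucket rows by the value stored under label_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    return groups

def sample_for_label(groups, label, n, rng, jitter=0.5):
    """Resample rows carrying `label`, adding Gaussian noise to float fields."""
    out = []
    for _ in range(n):
        base = rng.choice(groups[label])
        out.append({k: v + rng.gauss(0.0, jitter) if isinstance(v, float) else v
                    for k, v in base.items()})
    return out

# Hypothetical daily records with a weather label.
rows = [
    {"label": "rain", "temp_max": 27.5, "precip_mm": 12.0},
    {"label": "rain", "temp_max": 26.1, "precip_mm": 20.4},
    {"label": "clear", "temp_max": 33.8, "precip_mm": 0.0},
    {"label": "clear", "temp_max": 34.6, "precip_mm": 0.0},
]

groups = group_by_label(rows, "label")
rainy = sample_for_label(groups, "rain", 4, random.Random(0))
```

Resampling whole rows keeps the relationship between columns within a label. The added noise is unconstrained, so a real implementation would need to clamp values such as precipitation to valid ranges.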
from synthetic-data-generator.