This Python program generates a fake dataset of 5000 rows. It is useful wherever a large dataset is needed for testing and development, particularly in machine learning projects.
Each row has the following columns (first five rows shown):
|   | id | name | age | gender | salary | join_date |
|---|---|---|---|---|---|---|
| 0 | 1 | William | 37 | Female | 48600 | 2021-07-24 |
| 1 | 2 | Jonathan | 48 | Female | 869900 | 2011-12-31 |
| 2 | 3 | Mary | 30 | Female | 474200 | 2001-01-03 |
| 3 | 4 | Shari | 56 | Female | 43900 | 2009-11-23 |
| 4 | 5 | Lindsey | 48 | Male | 981000 | 2011-10-27 |
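
The notebook's own implementation is not reproduced here. As a rough illustration of how rows with these columns could be produced, here is a minimal sketch using only the Python standard library. The function names `generate_rows` and `write_csv`, the name pool, and the age, salary, and date ranges are all assumptions made for this example, not values taken from the notebook. The sketch also omits the unlabeled index column shown in the sample.

```python
import csv
import random
from datetime import date, timedelta

# Hypothetical value pools; the notebook may draw names from a dedicated library instead.
NAMES = ["William", "Jonathan", "Mary", "Shari", "Lindsey"]
GENDERS = ["Female", "Male"]
COLUMNS = ["id", "name", "age", "gender", "salary", "join_date"]


def generate_rows(n_rows=5000, seed=None):
    """Return a list of dicts, one per fake record (ranges are assumed)."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    start = date(2000, 1, 1)
    span_days = (date(2023, 12, 31) - start).days
    rows = []
    for i in range(1, n_rows + 1):
        join_date = start + timedelta(days=rng.randint(0, span_days))
        rows.append({
            "id": i,                                      # sequential, starting at 1
            "name": rng.choice(NAMES),
            "age": rng.randint(18, 65),                   # inclusive bounds
            "gender": rng.choice(GENDERS),
            "salary": rng.randrange(30000, 1000000, 100), # multiples of 100
            "join_date": join_date.isoformat(),           # YYYY-MM-DD
        })
    return rows


def write_csv(rows, path="fake_data.csv"):
    """Write the records to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_csv(generate_rows(5000, seed=42))` would write a 5000-row file named `fake_data.csv` to the current directory.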
To use this program, follow these steps:
- Clone this repository to your local machine.
- Navigate to the directory containing the program.
- Run the Jupyter notebook `data_generator_program.ipynb`.
- The generated data will be saved as a CSV file named `fake_data.csv` in the same directory.
Feel free to modify the parameters of the generated data according to your requirements. You can also integrate this program into your machine learning pipeline for data preprocessing and testing.
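
For pipeline integration, one possible first step is reading the CSV back and converting each field to a proper type. The sketch below assumes the file has a header row with the column names shown in the sample table; `parse_rows` is a hypothetical helper name, and any extra column (such as an unlabeled index) is simply ignored.

```python
import csv
from datetime import date


def parse_rows(lines):
    """Convert CSV text lines (header first) into typed records."""
    records = []
    for row in csv.DictReader(lines):
        records.append({
            "id": int(row["id"]),
            "name": row["name"],
            "age": int(row["age"]),
            "gender": row["gender"],
            "salary": int(row["salary"]),
            "join_date": date.fromisoformat(row["join_date"]),
        })
    return records


# Two rows copied from the sample table, used here in place of the real file.
sample = [
    "id,name,age,gender,salary,join_date",
    "1,William,37,Female,48600,2021-07-24",
    "2,Jonathan,48,Female,869900,2011-12-31",
]
records = parse_rows(sample)
print(records[0]["join_date"].year)  # prints 2021
```

To read the real file, pass an open file object instead of the sample list, for example `parse_rows(open("fake_data.csv", newline=""))`.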
This project is licensed under the MIT License - see the LICENSE file for details.