igtd's Introduction

Image Generator for Tabular Data (IGTD): Converting Tabular Data into Images for Deep Learning Using Convolutional Neural Networks

Description

Image Generator for Tabular Data (IGTD) is an algorithm for transforming tabular data into images. The algorithm assigns each feature to a unique pixel position in the image representation. Similar features are assigned to neighboring pixels, while dissimilar features are assigned to pixels that are far apart. Based on this assignment, an image is generated for each sample, in which each pixel's intensity reflects the value of the corresponding feature in that sample. One of the most important applications of the generated images is building Convolutional Neural Networks (CNNs) on the image representations in subsequent analysis. A publication about the IGTD algorithm is available at https://www.nature.com/articles/s41598-021-90923-y
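
The pixel-filling step described above can be illustrated with a minimal sketch. It assumes a feature-to-pixel assignment has already been computed and that feature values are already scaled. The function name `sample_to_image` and the array `pixel_of_feature` are hypothetical and are not part of the IGTD package. The search for a good assignment is not shown.

```python
import numpy as np

def sample_to_image(sample, pixel_of_feature, shape):
    """Write each feature value into its assigned pixel (hypothetical helper)."""
    rows, cols = shape
    flat = np.zeros(rows * cols, dtype=float)
    # Feature i is written to flat pixel index pixel_of_feature[i].
    flat[pixel_of_feature] = sample
    return flat.reshape(rows, cols)

# Hypothetical assignment of 4 features to the pixels of a 2 x 2 image.
pixel_of_feature = np.array([2, 0, 3, 1])
sample = np.array([0.1, 0.9, 0.5, 0.3])  # one row of scaled tabular data

image = sample_to_image(sample, pixel_of_feature, (2, 2))
print(image)
```

Because each feature owns exactly one pixel, the same assignment is reused for every sample, and only the intensities change from image to image.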

User Community

  • Primary: machine learning; computational data modeling
  • Secondary: bioinformatics; computational biology

Usability

To use this software package, users must possess the basic skills to program and run Python scripts. Users need to process the input data into the data format accepted by the package. Users also need to understand the input parameters of the IGTD algorithm, so that the parameters can be appropriately set to execute the algorithm. To build CNN prediction models based on the converted images, users need to understand the parameters of CNN models and set their values in the model parameter files.

Uniqueness

IGTD is a novel algorithm for transforming tabular data into images. Compared with existing methods for converting tabular data into images, IGTD has several advantages.

  • IGTD does not require prior knowledge about the features. Thus, it can be used in the absence of domain knowledge.
  • IGTD generates compact image representations, in which each pixel represents a unique feature. Deep learning based on compact image representations usually requires less memory and time to train the prediction model.
  • IGTD can simultaneously convert multiple data tables of different feature types into multi-channel images for modeling using multi-channel CNNs.
  • IGTD has been shown to generate compact image representations promptly, which also better preserve the feature neighborhood structure.
  • CNNs trained on IGTD images achieve better (or similar) prediction performance compared with both CNNs trained on alternative image representations and prediction models trained on the original tabular data.
  • IGTD provides a flexible framework that can be extended to accommodate diversified data and requirements. The size and shape of the image representation can be flexibly chosen.
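
The multi-channel idea in the list above can be sketched as follows. This is only an illustration, assuming each data table has already been converted into a single-channel image of the same height and width for a given sample. The arrays are made-up values and the code does not call the IGTD package.

```python
import numpy as np

# Two hypothetical single-channel images of the same sample,
# e.g. one generated from each of two data tables.
channel_a = np.array([[0.1, 0.2], [0.3, 0.4]])
channel_b = np.array([[0.9, 0.8], [0.7, 0.6]])

# Stack along a new last axis to obtain a (height, width, channels) array,
# the layout many CNN libraries expect for 2D inputs.
multi_channel = np.stack([channel_a, channel_b], axis=-1)
print(multi_channel.shape)  # (2, 2, 2)
```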

Components

The package includes five Python scripts.

  1. IGTD_Functions.py provides all the functions used by the IGTD algorithm. It provides comments explaining the input and output of each function.
  2. Examples_Of_Table_To_Image_Conversion.py provides examples showing how to run the IGTD algorithm for demonstration purposes.
  3. Examples_Of_Multi_Table_To_Image_Conversion.py provides examples showing how to use the IGTD algorithm for converting multiple data tables of different feature types into multi-channel images.
  4. Prediction_Modeling_Functions.py provides all the functions used for building CNN classification/regression models based on images generated by IGTD.
  5. Examples_Of_Prediction_On_Images.py provides examples showing how to build CNN classification/regression models for demonstration purposes.

The package also includes data files for demonstrating its utility.

  1. Example_Gene_Expression_Tabular_Data.txt is a gene expression dataset of 100 cancer cell lines and 1600 genes, used for demonstrating the conversion of tabular data into images.
  2. Example_Drug_Response_Data.txt is a drug response dataset obtained from the Cancer Cell Line Encyclopedia (CCLE), used for building CNN models of drug response prediction.
  3. Example_Drug_Descriptor_Image_Data folder includes drug descriptor data files in matrix (image) format generated by IGTD. They are used for building CNN models of drug response prediction.
  4. Example_Gene_Expression_Image_Data folder includes gene expression data files in matrix (image) format generated by IGTD. They are used for building CNN models of drug response prediction.
  5. Example_Model_Parameters folder includes three files of CNN parameters. CNN2D_SubNetwork.txt includes parameters for CNN layers in the subnetwork of each input data modality. FCNN_Classifier.txt is for classification modeling, including parameters of the dense layers after concatenating the embeddings from subnetworks. FCNN_Regressor.txt includes the same set of parameters as FCNN_Classifier.txt does, but is for regression modeling.

Technical Details

Refer to this README.

igtd's People

Contributors

zhuyitan

igtd's Issues

Small number of input columns

Hi!

I have a binary classification project with tabular data. The data contains 3 input columns and 1 output column, and has 6646 rows.

[image]

I want a bigger matrix. Because there are only 3 inputs, the generated images are only 1x3 or 3x1. For example, I want 4x4 images generated from the 3 inputs.

[image]

I want images like this:

[image]

Is there any way to do this?

.pkl files

Results.pkl in the Test_1 and Test_2 directories should store the original tabular data, the generated image data, and the names of the samples. However, not all of these objects are returned when pickle.load() is executed.
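
One possible explanation, offered here only as an assumption to check against how the file is written in IGTD_Functions.py: if a pickle file is produced by several successive pickle.dump() calls, each pickle.load() call returns only the next object in the file. The sketch below reads every object until the end of the file. The helper name `load_all_pickled` and the demo file are hypothetical.

```python
import os
import pickle
import tempfile

def load_all_pickled(path):
    """Return every object stored in a pickle file, in the order written."""
    objects = []
    with open(path, "rb") as handle:
        while True:
            try:
                objects.append(pickle.load(handle))
            except EOFError:
                # Raised when no further pickled object remains in the file.
                break
    return objects

# Demonstration with a throwaway file holding three separately dumped objects.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pkl")
    with open(demo_path, "wb") as handle:
        for obj in ([[1, 2], [3, 4]], "image data", ["s1", "s2"]):
            pickle.dump(obj, handle)
    loaded = load_all_pickled(demo_path)

print(len(loaded))  # 3
```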

image for CNN

What is the final image used for training a CNN, and where can it be loaded from in the Test_1 and Test_2 directories?

Images and data text files being saved empty.

Hello,

I ran the Examples_Of_Table_To_Image_Conversion.py file and it produced the expected output images. I am now attempting to create my own images, but when I add my own test data and follow the same steps as Examples_Of_Table_To_Image_Conversion.py, it produces empty images and data text files. The script otherwise runs as expected and produces all other images and text files correctly with no error. Is there a way I can debug what is going on with the algorithm?

Matrix image shape

I have read the paper in Nature and I am unsure whether the generated image needs to be symmetrical (square). If I have 200 features, can I create a 20 x 10 image, or does it need to be square, with padding features added so that the feature count has an integer square root?
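
The README states that the size and shape of the image representation can be flexibly chosen. As a rough aid for picking a shape, the sketch below lists the rectangular grids that hold a given number of features exactly, and the padding a square grid would need. It assumes one pixel per feature, so rows times columns must equal the feature count. Both helper names are hypothetical and not part of the IGTD package.

```python
import math

def grid_shapes(num_features):
    """List every (rows, cols) pair with rows <= cols and rows * cols == num_features."""
    shapes = []
    rows = 1
    while rows * rows <= num_features:
        if num_features % rows == 0:
            shapes.append((rows, num_features // rows))
        rows += 1
    return shapes

def square_padding(num_features):
    """Return (side, padding) for the smallest square grid that fits all features."""
    side = math.isqrt(num_features - 1) + 1  # ceiling of the square root
    return side, side * side - num_features

print(grid_shapes(200))
# [(1, 200), (2, 100), (4, 50), (5, 40), (8, 25), (10, 20)]
print(square_padding(200))
# (15, 25)
```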

Converting back to panel data

Not necessarily an issue and more of a question...
Would it be possible to convert from the image back to panel data?
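
A minimal sketch of the reverse direction, under several assumptions: each feature occupies exactly one pixel, the feature-to-pixel assignment used for conversion has been kept, and the image stores scaled values (recovering the original units would also require the scaling parameters). The names `image_to_sample` and `pixel_of_feature` are hypothetical and not part of the IGTD package.

```python
import numpy as np

def image_to_sample(image, pixel_of_feature):
    """Read each feature value back from its assigned pixel (hypothetical helper)."""
    # Feature i was stored at flat pixel index pixel_of_feature[i].
    return image.reshape(-1)[pixel_of_feature]

# Hypothetical assignment and a 2 x 2 image produced with it.
pixel_of_feature = np.array([2, 0, 3, 1])
image = np.array([[0.9, 0.3], [0.1, 0.5]])

recovered = image_to_sample(image, pixel_of_feature)
print(recovered)
```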

Results.pkl

The results.pkl file only seems to contain the original tabular data and nothing else, but IGTD seems to have worked, as I have the optimized_feature_ranking.png image along with some others. If I want to use the CNN regressor provided, am I meant to use one of the generated png files or the contents of the Results.pkl file?

The contents of my results.pkl file.
Screenshot 2024-05-20 144958

test set

Hello,
I'm trying to use your code for prediction based on chemistry descriptors. How can I create such images for a test set when I have already built a model on a training set converted with IGTD? In other words, after creating the training images, are there any settings I should apply to the test set?
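
One common practice, sketched here as an assumption rather than as the package's documented workflow, is to derive the scaling parameters and the feature-to-pixel assignment from the training set only, then reuse both unchanged for the test set. The helper names, the min-max scaling choice, and `pixel_of_feature` are all hypothetical.

```python
import numpy as np

def fit_min_max(train):
    """Compute per-feature minimum and range from the training data only."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return lo, span

def to_images(data, lo, span, pixel_of_feature, shape):
    """Scale with the given parameters and place features at their assigned pixels."""
    scaled = (data - lo) / span
    flat = np.zeros((data.shape[0], shape[0] * shape[1]))
    flat[:, pixel_of_feature] = scaled
    return flat.reshape(data.shape[0], *shape)

train = np.array([[0.0, 10.0, 5.0, 1.0],
                  [2.0, 20.0, 7.0, 3.0]])
test = np.array([[1.0, 15.0, 6.0, 2.0]])

# Hypothetical assignment, assumed to have been found on the training set.
pixel_of_feature = np.array([2, 0, 3, 1])

lo, span = fit_min_max(train)
train_images = to_images(train, lo, span, pixel_of_feature, (2, 2))
test_images = to_images(test, lo, span, pixel_of_feature, (2, 2))
print(test_images.shape)  # (1, 2, 2)
```

Reusing the training-set parameters keeps each pixel tied to the same feature and the same intensity scale in both sets, so a model trained on one can be applied to the other.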

Time complexity is high when the number of columns is arbitrary in image transformation

When using the image transformation with a large number of columns (for example, about 20 thousand columns), I encounter very long run times, for example about 30 hours, and the run then ends in a crash. I got around this by reducing the number of columns. What would you suggest or recommend?
Thanks in advance
