
sisso's Introduction

Version SISSO.3.3, July, 2023.
This code is licensed under the Apache License, Version 2.0

If you are using this code, please cite:
R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, and L. M. Ghiringhelli, Phys. Rev. Mater. 2, 083802 (2018).

Features

  • Regression & Classification
    Ref.: [R. Ouyang et al., Phys. Rev. Mater. 2, 083802 (2018)]
  • Multi-Task Learning (MT-SISSO)
    Ref.: [R. Ouyang et al., J. Phys.: Mater. 2, 024002 (2019)]
  • Variable Selection assisted Symbolic Regression (VS-SISSO, see VarSelect.py in 'utilities')
    Ref.: [Z. Guo et al., J. Chem. Theory Comput. 18, 4945 (2022)]
  • Sign-Constrained Multi-Task Learning (SCMT-SISSO)
    Ref.: [J. Wang et al., J. Am. Chem. Soc. 145, 11457 (2023)]

(Please refer to the Refs. and the SISSO_guide.pdf for more details on using the code)

Installation

A Fortran MPI compiler is required to compile the SISSO parallel program. Below are two options for compiling the program with the Intel MPI compiler (other compilers may work as well). In the folder 'src', run:
(1) mpiifort -fp-model precise var_global.f90 libsisso.f90 DI.f90 FC.f90 SISSO.f90 -o ~/bin/SISSO
or (2) mpiifort -O2 var_global.f90 libsisso.f90 DI.f90 FC.f90 SISSO.f90 -o ~/bin/SISSO

Note:

  • Option (1) gives better accuracy and run-to-run reproducibility of floating-point calculations; option (2) is about 2x faster than (1), but tiny run-to-run variations may occur between processors of different types, e.g. Intel and AMD.
  • If MPI-related errors occur during compilation, try opening the file 'var_global.f90' and replacing the line "use mpi" with "include 'mpif.h'". However, "use mpi" is strongly encouraged (see https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node411.htm).

Modules of the program:

  • var_global.f90 ! declaring the global variables
  • libsisso.f90 ! subroutines and functions for mathematical operations
  • DI.f90 ! model sparsification (descriptor identification)
  • FC.f90 ! feature construction
  • SISSO.f90 ! the main program

Running SISSO

Input Files: SISSO.in and train.dat, whose templates can be found in the folder input_templates.
Note that the input templates and the tools in the folder utilities may be modified when a new version of the code is released. Users are therefore advised to always use the updated files, in particular SISSO.in.

Command-line usage:
SISSO > log ! You may first need to remove the stack size limit by running the command 'ulimit -s unlimited'
On computer clusters, use for example this command in your submission script:
mpirun -np number_of_cores SISSO >log

Primary Output Files:

  • File "SISSO.out": overall information from feature construction to model building
  • Folder "models": the top ranked descriptors/models
  • Folder "SIS_subspaces": SIS-selected subspaces (feature data and expressions)
  • Folder "desc_dat": the data for the best descriptors/models
  • File "convexnd_hull": the vertices of the nD convex hulls in classification
  • File "VS_results": the results from the VS-SISSO run.

User guide

More details on using this code can be found in the SISSO_guide.pdf

About

Created and maintained by Runhai Ouyang. Please feel free to open issues on GitHub or contact Ouyang
([email protected]) in case of any problems, comments, or suggestions regarding the code.

Other SISSO-related codes

SISSO++: https://gitlab.com/sissopp_developers/sissopp
MATLAB: https://github.com/NREL/SISSORegressor_MATLAB
Python interface: https://github.com/Matgenix/pysisso

sisso's Issues

Issues on running the code with my data

Hi!

I am trying to run this code on my data, and I have a few questions about the input file (SISSO.in). It would be great if you could clarify them.

a. nsample: does it mean the number of samples I have, i.e. the number of rows in the dataset? If so, what do the numbers mean for classification with >=2 properties (nprop), given in the file description as 'classification: e.g.(4,3,5),(7,9)'?
b. nsf: is it, broadly speaking, the number of features (columns) I have?
c. rung: I assume that rung means the complexity of the feature space. How is this different from the following parameter, 'maxcomplexity'?
d. opset: isn't including all the available operators the best option?
e. dimclass: do features of the same dimension have to be grouped consecutively? For example, if features 1, 4 and 7 have the same dimension, how do I group them together? And do I put dimensionless features under the same dimension?
f. Is there any rule of thumb for selecting the subspace size (subs_sis)?
g. Does the input data file need to have a specific format? Can it have character entries in the first few columns, instead of just one? Can I ignore any column? If so, how?

Thanks!

Questions about space_001d.name and space_002d.name

Hello, I tried the classification task using the example train.dat posted on your GitHub.

I have 2 questions about space_001d.name and space_002d.name

  1. In the space_001d.name file, what exactly do N and S mean?
  2. In the space_002d.name file, why is the S value missing?

Thank you

A problem statement for MT-SISSO

Hi Dr. Ouyang,

I have multiple properties that need to be predicted from a single dataset (e.g. band gap, phonon frequency, etc.) using atomic features. Clearly, this is a regression problem. Previously, I was mapping individual properties using the dataset and running it multiple times. In the paper on MT-SISSO you predicted the relative stability of crystal structures to construct a phase diagram. Can I use this model for my problem?

Feature Scaling

Could you add to the documentation whether preprocessing such as feature scaling is necessary? As far as I understand, you should not scale the features, as otherwise the + and - operators in the physical feature construction no longer make sense. I think this is missing in the SISSO paper and would be useful to have in the documentation.
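
The concern about + and - can be illustrated with a small sketch. It assumes "scaling" means per-feature z-scoring, and the feature values are made up for illustration. Two samples with the same raw sum a+b end up with different sums once each feature is standardized separately, so the scaled sum no longer tracks the physical sum.

```python
import statistics


def zscore(values):
    """Standardize a list to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up values for two features that share a unit, for illustration only.
a = [0.0, 10.0, 5.0]
b = [10.0, 0.0, 50.0]

raw_sum = [x + y for x, y in zip(a, b)]
scaled_sum = [x + y for x, y in zip(zscore(a), zscore(b))]

print(raw_sum)     # the first two samples have the same raw sum
print(scaled_sum)  # but different sums after per-feature scaling
```

This does not settle whether SISSO itself expects scaled or unscaled input; it only shows why per-feature scaling and unit-aware addition do not combine well.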

How to convert a CSV file to the input 'train.dat' file

Dear Dr. Ouyang:

I tried to convert a CSV file to the input 'train.dat' file, but I encountered this issue:

forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Image              PC                Routine            Line        Source
SISSO_3.0.2_1      00000000004842BE  Unknown               Unknown  Unknown
SISSO_3.0.2_1      00000000004A97FD  Unknown               Unknown  Unknown
SISSO_3.0.2_1      00000000004A7F66  Unknown               Unknown  Unknown
SISSO_3.0.2_1      000000000047C40C  Unknown               Unknown  Unknown
SISSO_3.0.2_1      0000000000478E3E  Unknown               Unknown  Unknown
SISSO_3.0.2_1      000000000040345E  Unknown               Unknown  Unknown
libc-2.17.so       00002B0CE484F445  __libc_start_main     Unknown  Unknown
SISSO_3.0.2_1      0000000000403369  Unknown               Unknown  Unknown
forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Image              PC                Routine            Line        Source
SISSO_3.0.2_1      00000000004842BE  Unknown               Unknown  Unknown
SISSO_3.0.2_1      00000000004A97FD  Unknown               Unknown  Unknown
SISSO_3.0.2_1      00000000004A7F66  Unknown               Unknown  Unknown
SISSO_3.0.2_1      000000000047C40C  Unknown               Unknown  Unknown
SISSO_3.0.2_1      0000000000478E3E  Unknown               Unknown  Unknown
SISSO_3.0.2_1      000000000040345E  Unknown               Unknown  Unknown
libc-2.17.so       00002AE102510445  __libc_start_main     Unknown  Unknown
SISSO_3.0.2_1      0000000000403369  Unknown               Unknown  Unknown

Here is my SISSO.in file:

!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
! keywords for the target properties
!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ptype=1              ! property type 1: continuous for regression,2:categorical for classification
ntask=16             ! number of tasks (properties or maps) 1: single-task learning, >1: multi-task learning
nsample=23,18,16,9,16,16,4,4,13,6,6,15,10,12,6,6           ! number of samples for each task (seperate the numbers by comma for ntask >1)
task_weighting=2     ! 1: no weighting (tasks treated equally) 2: weighted by #sample_task_i/total_sample.a
desc_dim=2           ! dimension of the descriptor (<=3 for classification)
restart=.false.      ! set .true. to continue a job that was stopped but not yet finished

!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
! keywords for feature construction and sure independence screening
! implemented operators:(+)(-)(*)(/)(exp)(exp-)(^-1)(^2)(^3)(sqrt)(cbrt)(log)(|-|)(scd)(^6)(sin)(cos)
! scd: standard Cauchy distribution
!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
nsf=35              ! number of scalar features (one feature is one number for each material)
rung=2               ! rung (<=3) of the feature space to be constructed (times of applying the opset recursively)
opset='(+)(-)(*)(/)(exp)(log)(^-1)(^2)(^3)(sqrt)(cbrt)(|-|)'  ! ONE operator set for feature transformation
maxcomplexity=30     ! max feature complexity (number of operators in a feature)
dimclass=(1:2)  ! group features according to their dimension/unit; those not in any () are dimensionless
maxfval_lb=1e-3      ! features having the max. abs. data value <maxfval_lb will not be selected
maxfval_ub=1e5       ! features having the max. abs. data value >maxfval_ub will not be selected
subs_sis=20000        ! size of the SIS-selected (single) subspace for each descriptor dimension

!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
! keywords for descriptor identification via a sparsifying operator
!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
method='L0'          ! sparsification operator: 'L1L0' or 'L0'; L0 is recommended!
L1L0_size4L0= 1      ! If method='L1L0', specify the number of features to be screened by L1 for L0
fit_intercept=.true. ! fit to a nonzero intercept (.true.) or force the intercept to zero (.false.)
metric='RMSE'        ! for regression only, the metric for model selection: RMSE,MaxAE
nm_output=100        ! number of the best models to output

I checked the number of features and the 'nsample' and 'ntask' parameters; they are correct.

Thank you in advance!
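
For the CSV conversion itself, a minimal sketch is given here. It assumes that train.dat is a whitespace-delimited text file with one header line, and that a name must not contain spaces; the function name `csv_to_train_dat` and the example CSV are made up for illustration. It may not address the cause of the error above, but it rules out stray commas, empty cells and embedded spaces.

```python
import csv
import io


def csv_to_train_dat(csv_text):
    """Convert CSV text into whitespace-delimited text (assumed train.dat layout)."""
    out_lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        # A space inside a cell would split it into two fields when the file
        # is read field by field, so replace it with an underscore.
        cells = [cell.strip().replace(" ", "_") for cell in row]
        if any(cell == "" for cell in cells):
            raise ValueError(f"empty cell in row: {row}")
        out_lines.append(" ".join(cells))
    return "\n".join(out_lines) + "\n"


# Made-up CSV content, for illustration only.
example_csv = "materials,property,feature 1,feature2\nA,1.5,0.1,2\nB,2.5,0.2,3\n"
train_dat_text = csv_to_train_dat(example_csv)
print(train_dat_text)
```

Writing `train_dat_text` to a file named train.dat is then a plain `open(...).write(...)` call. Raising on empty cells is a deliberate choice, so that missing values are noticed before the run starts.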

SISSO error

Hi Dr. Ouyang!
I am sorry to bother you, but I am new to SISSO. I want to do regression with SISSO, but I don't know how to deal with this problem. I am not sure whether it is an input file error or a compile error, but it ran successfully at the beginning. Here is a picture of the error. Thank you very much; I look forward to your suggestions.

image

Selection of SIS subspace for respective dimensions

Hello, Dr. Ouyang

I'd like to ask about setting a separate SIS subspace size for each dimension.
As far as I know, although the SIS value itself is not a hyperparameter, it influences the probability of finding a better descriptor.
Several previous papers set different values for each dimension.
I'd also like to set a different SIS value for each dimension, decreasing with the dimension.
But it does not seem to work as I expected when I run the code.

I set the SIS value for each dimension as subs_sis=6105, 2000, 1000, 100, 50 in the SISSO.in file, but according to SISSO.out this is not applied correctly, and the overall calculation becomes unreasonably long.
Please find the input and output files attached.

One more question, about the baseline.
In your previous paper (Journal of Physics: Materials 2.2 (2019): 024002.), you argued that the RMSD of a SISSO model should be smaller than the standard deviation of the reference (target) values (i.e. the baseline).
If the RMSD predicted by SISSO is not lower than the baseline value, does that mean the model is overfitting or not learning properly from the data?
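
The baseline comparison can be sketched as follows. The standard deviation of the target values equals the RMSE of a trivial model that predicts the mean for every sample, so a model with a larger RMSE does worse than that constant prediction. The target values below are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def baseline_rmse(y_true):
    """RMSE of the trivial model that predicts the mean for every sample."""
    mean = sum(y_true) / len(y_true)
    return rmse(y_true, [mean] * len(y_true))


# Made-up target values, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
baseline = baseline_rmse(y)
print(baseline)  # population standard deviation of y
```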

Any comment would be appreciated and if you need further information, please tell me.
Many thanks in advance.

Best regards,
Hyunwook
SISSO_SIS.zip

A small issue about VarSelect.py

Dear professor,

When I performed VS_SISSO calculations, I found that the parameters of a defined function initial_SISSO_in_2_output_parameter() in the VarSelect.py file were using older versions of the parameters, which resulted in an arithmetic error. The specific parameters to be changed are as follows:
opset --> ops
dimclass --> funit
Thank you for developing this software, which has made me particularly interested in machine learning. I hope my suggestions are helpful to you.
Best wishes!

untar the code on github

Could you untar the source code? The benefit of hosting the uncompressed source is the ability to view and search the files, which isn't possible when the code is compressed.

Question about mpiifort

In the command mpiifort -O2 var_global.f90 libsisso.f90 DI.f90 FC.f90 SISSO.f90 -o ~/bin/your_code_name

May I ask for an example of ~/bin/your_code_name?
What file should I put after the -o flag?

How to predict new data manually with classification SISSO

Dear Dr. Ouyang,

First of all, thanks a lot for providing such a reliable method for regression and classification. It helps me a lot in my work!

I have one question about the classification method. I ran the example from your GitHub and also tested some of my own datasets to see whether I understood the methodology well.

Nevertheless, after finding descriptors in "SISSO.out", I predict new outputs manually, not with Fortran, since it does not work on my computer (I really don't know why I face issues with predicting). For regression I do exactly the same, but it is simpler since it is a regression equation.

Thus, about the SVM approaches you mentioned on this GitHub:
- Is it possible to use SISSO to find a subspace onto which the initial dataset is projected, and then use an SVM approach to construct a classifier on the SISSO descriptors?

Otherwise, is it possible to predict new data points manually (I mean with something I may have overlooked in the SISSO folder) without using Fortran?

Thanks a lot for your time.

Marc
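
For the regression case mentioned above, predicting by hand amounts to evaluating each descriptor expression on the new sample and forming intercept + sum of coefficient times descriptor. A minimal sketch follows, assuming a linear model of that form. The descriptor string is rewritten by hand as a Python function, and the intercept, coefficient and feature values are placeholders, not taken from a real model.

```python
def predict(features, intercept, coefficients, descriptors):
    """Evaluate y = intercept + sum_i c_i * d_i(features)."""
    return intercept + sum(c * d(features) for c, d in zip(coefficients, descriptors))


# A descriptor written in SISSO.out style as ((Z_A+Z_B)/rP_B),
# translated by hand into a Python function of the primary features.
descriptors = [lambda f: (f["Z_A"] + f["Z_B"]) / f["rP_B"]]

# Placeholder numbers for illustration only.
intercept = 1.0
coefficients = [0.5]
new_sample = {"Z_A": 3.0, "Z_B": 5.0, "rP_B": 2.0}

y_new = predict(new_sample, intercept, coefficients, descriptors)
print(y_new)
```

The classification case is not covered by this sketch, since it depends on how the class regions are defined in the output.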

Continue and parallel computation

Hi Dr.Ouyang,

  1. I realized the program would generate a CONTINUE file and there is a flag in the SISSO.in that seems to give us the option to continue calculations after it is interrupted. So to restart the calculation do I simply need to change from false to true for 'restart' flag in the input?

  2. For some reason, it takes me the same time using 2 nodes as using 1 node. What would you suggest to run the program most efficiently?

Thanks.

Define matrix as input features

Dear Dr. Ouyang
Is there any possibility to define a matrix as input features in the SISSO code, such as the Coulomb matrix, which is extensively used in ML calculations?

Regards

CV_RMSE

Hi! I was just wondering if the cross validation part of the code can be utilized. I have observed that the code calculating the CV is present in the code but for some reason has been set aside (and also has been removed from the input parameters post the first version of the code). Is there any specific reason for this, or could I just un-comment those parts of the code and use that feature?

SISSO input target feature

Hello Dr. Ouyang,

I am doing some experiments with SISSO and I am wondering how SISSO figures out which column is the target feature. If I understood it right, in train.dat_classification this is the materials-column and in train.dat_regression this is the property-column? If I want to run it on my own dataset, how should I name my target variable? Is the name important?

Thank you in advance!

subs_sis >= 1000 --> segmentation fault

I am facing another problem. As soon as I set the variable subs_sis >= 1000, I get a segmentation fault:

Final sure independence screening ...
Size of the SIS-selected subspace from phi02: 1000
FC done!

DI starts ...
forrtl: severe (174): SIGSEGV, segmentation fault occurred

This happens before the first iteration finishes. Do you have any idea why I get this error?

correlation between features

Dear Dr. Ouyang
Hi,
How can I calculate the correlation between the different primary features used in the SISSO run?
Is there any built-in subroutine for doing this?

Best Regards.
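
Independently of whether SISSO offers a built-in routine, the correlation between primary features can be computed outside the code from the feature columns. A minimal sketch using the Pearson correlation coefficient follows. The feature names and values are made up; in practice the columns would be read from train.dat.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Note: a constant column gives a zero norm and therefore a division by zero.
    return cov / (norm_x * norm_y)


# Made-up feature columns, for illustration only.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.0],
    "f3": [4.0, 3.0, 2.0, 1.0],
}

# One coefficient per unordered pair of features.
correlations = {
    (a, b): pearson(features[a], features[b])
    for a, b in combinations(features, 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```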

Documentation or tutorials?

Dr. Ouyang,

Is it possible to create documentation or maybe a tutorial explaining different parameters that can be used to run this SISSO code for different tasks? I'm a new FORTRAN user and have been trying to reproduce your regression example. I was finally able to make it work with difficulty. It might be nice to have an end-to-end tutorial for new users like me and will encourage a larger community to use this excellent code.

Classification input error

Hello, the error I'm getting is from sisso.f90 line 533 "no y value in the train.dat file for classification".
I was wondering how the code recognizes the columns? My input file is set up like the example input you have with the materials column followed by feature columns. I have specified the classes (#,#) accordingly.

SISSO.txt

About cross validation procedure in SISSO

Hello, thank you for developing fascinating tools for descriptor identification.
I have some questions when performing cross validation for regression.

  1. In the paper, J. Phys.: Mater. 2, 024002 (2019), standardization of the input features is mentioned.
    Also, to avoid contamination between the training and test sets, it is stated that standardization using only the training set is recommended.
    The question is: do I need to standardize the input features of the materials when writing the "train.dat" file?
    Or does the SISSO Fortran code include a standardization step during regression?

If the standardization should be performed outside of the current SISSO code, should I then standardize the training data and test data separately to avoid the above-mentioned contamination of the dataset?
I have seen several posts related to this scaling issue, but I am still confused: #3, #10

  2. Using ST-SISSO, I tried leave-10%-out cross validation on my dataset, randomly selecting the prediction set 30 times as suggested in the paper.
    This results in 30 models, with some descriptors having a very high frequency of occurrence.
    For instance, 2D descriptor was shown to give small RMSD error and
    Among the 30 models, how can I decide on the optimum descriptors and the corresponding coefficients and intercept?
    The following is an example result of L10%out CV repeated 30 times.

CV RMSE MaxAE_T MaxAE_P interc coeffs. Descriptor

CV1 14.07 58.67 108.80 81.19 -0.83, 34.61 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_HF-Eb_Li)*abs(Eb_Li-Eb_PF5))]
CV2 13.04 39.09 60.40 81.67 -0.89, 1.19 [((Eb_Ni/LUMO)/cos(E_pro))], [((Dipole/Type)/log(LUMO))]
CV3 12.28 41.09 70.24 76.58 0.86, 7.56 [((Dipole/LUMO)/cos(E_pro))], [((Eb_HF-Eb_Ni)+(Type*Eb_PF5))]
CV4 13.38 39.27 58.53 81.78 -0.89, 1.19 [((Eb_Ni/LUMO)/cos(E_pro))], [((Dipole/Type)/log(LUMO))]
CV5 12.31 38.70 79.80 79.02 -0.98, -10.34 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_Ni-Eb_PF5)*abs(Eb_Li-Eb_PF5))]
CV6 13.07 43.51 51.02 83.42 4.74, 0.71 [(cos(LUMO)/cos(E_pro))], [((Dipole/LUMO)/log(LUMO))]
CV7 12.47 41.58 68.29 76.50 0.85, 8.00 [((Dipole/LUMO)/cos(E_pro))], [((Eb_HF-Eb_Ni)+(Type*Eb_PF5))]
CV8 13.02 58.61 43.21 76.00 -0.87, -9.38 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_Ni-Eb_PF5)-(Type*Eb_HF))]
CV9 13.20 40.14 57.41 82.90 -0.90, -0.28 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_Ni*Dipole)/log(LUMO))]
CV10 13.07 42.08 42.21 83.64 0.80, 1.13 [((Dipole/LUMO)/cos(E_pro))], [((Dipole/Type)/log(LUMO))]
CV11 13.97 59.30 31.19 77.68 -0.82, 6.74 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_Ni/Eb_Li)*abs(Eb_Li-Eb_PF5))]
CV12 14.11 57.08 297.52 93.33 -0.82, -2.80 [((Eb_Ni/LUMO)/cos(E_pro))], [(log(Type)/abs(Eb_Li-Eb_PF5))]
CV13 10.81 36.53 104.99 85.06 -7.16, -0.31 [cos((HOMO/Eb_Li))], [((Eb_PF5/Eb_HF)/cos(LUMO))]
CV14 14.01 58.78 175.01 80.32 -0.81, 36.92 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_HF-Eb_Li)*abs(Eb_Li-Eb_PF5))]
CV15 13.05 48.58 98.92 85.57 1.18, -2.26 [((Type/LUMO)/cos(E_pro))], [(sin(HOMO)/cos(E_pro))]
CV16 13.28 58.42 107.78 123.92 -0.83, -50.59 [((Eb_Ni/LUMO)/cos(E_pro))], [exp(-abs(Eb_Li-Eb_PF5))]
CV17 13.93 56.94 20.18 81.26 -0.79, 36.41 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_HF-Eb_Li)*abs(Eb_Li-Eb_PF5))]
CV18 11.93 38.80 105.97 80.57 -2.46, 34.30 [((LUMO*Eb_HF)/cos(E_pro))], [(abs(Eb_HF-Eb_Li)*abs(Eb_Li-Eb_PF5))]
CV19 13.71 41.30 17.27 82.92 0.78, -0.26 [((Dipole/LUMO)/cos(E_pro))], [((Eb_Ni*Dipole)/log(LUMO))]
CV20 13.29 57.89 44.36 82.85 0.72, -20.56 [((Dipole/LUMO)/cos(E_pro))], [(abs(Eb_HF-Eb_PF5)-abs(Eb_Li-Eb_PF5))]
CV21 12.21 41.33 213.96 77.13 0.87, 7.37 [((Dipole/LUMO)/cos(E_pro))], [((Eb_HF-Eb_Ni)+(Type*Eb_PF5))]
CV22 13.87 58.06 113.07 74.70 -0.84, -9.94 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_Ni-Eb_PF5)-(Type*Eb_HF))]
CV23 12.30 44.19 337.82 80.24 -11.52, 0.45 [((Eb_Li/LUMO))^6], [(exp(-Eb_Ni)/sin(E_pro))]
CV24 13.54 50.41 106.22 82.99 1.37, -5.64 [((volume/LUMO)/cos(E_pro))], [((Eb_Li-Eb_PF5)/(Eb_HF/Eb_Li))]
CV25 12.16 37.31 69.68 80.06 -0.98, 40.87 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_HF-Eb_Li)*abs(Eb_Li-Eb_PF5))]
CV26 13.92 42.68 128.83 87.57 -0.13, 1.48 [exp((Dipole/LUMO))], [(log(Type)/cos(E_pro))]
CV27 13.59 41.22 291.34 82.68 0.76, -0.27 [((Dipole/LUMO)/cos(E_pro))], [((Eb_Ni*Dipole)/log(LUMO))]
CV28 14.33 57.50 24.78 81.69 -0.81, 35.23 [((Eb_Ni/LUMO)/cos(E_pro))], [((Eb_HF-Eb_Li)*abs(Eb_Li-Eb_PF5))]
CV29 13.59 40.87 51.63 82.26 0.75, 1.08 [((Dipole/LUMO)/cos(E_pro))], [((Dipole/Type)/log(LUMO))]
CV30 12.57 49.80 109.57 86.00 1.96, -2.39 [((volume/LUMO)/cos(E_pro))], [(sin(HOMO)/cos(E_pro))]

  3. I made a boxplot for my research project, as presented in the paper.
    Unlike what was shown in the paper, where the overall error decreases with increasing dimension and rung (increasing complexity), my result indicates the opposite.
    The overall error increases with the dimension and rung.
    How should I interpret the result? Does this mean that my selected features are inappropriate for describing the target property?
    (I'm trying to predict experimental values using DFT-calculated features, which is actually a pretty demanding task.)
    SISSO_boxplot

  4. How should I compose the "train.dat" file for MT-SISSO in case I have some missing features?
    I think I'm having trouble with the attached "train.txt" file.
    For some materials, several features are not available and cannot be given in the input file.
    So I left them blank... but is that the correct way of making the input file?
    If not, how can I make it for MT-SISSO?

My questions are quite naive, but I would appreciate your help!
train.txt
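
For the 30-model cross-validation table in this post, the "frequency of occurrence" of a descriptor can be tallied as sketched here. Only the first descriptors of CV1 to CV6 are copied from that table to keep the example short. Picking the most frequent descriptor is one possible selection rule, not necessarily the one recommended in the paper.

```python
from collections import Counter

# First-dimension descriptors of CV1-CV6, copied from the CV table in this post.
first_descriptors = [
    "((Eb_Ni/LUMO)/cos(E_pro))",   # CV1
    "((Eb_Ni/LUMO)/cos(E_pro))",   # CV2
    "((Dipole/LUMO)/cos(E_pro))",  # CV3
    "((Eb_Ni/LUMO)/cos(E_pro))",   # CV4
    "((Eb_Ni/LUMO)/cos(E_pro))",   # CV5
    "(cos(LUMO)/cos(E_pro))",      # CV6
]

counts = Counter(first_descriptors)
most_common = counts.most_common(1)[0]  # (descriptor, number of occurrences)
print(most_common)
```

Once a descriptor set is chosen this way, one option would be to refit its coefficients and intercept on the full dataset; whether that is appropriate is part of the question asked above.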

How does SIS choose features in Sid

Hello, Dr. Ouyang!
I use SISSO and am trying to find the top 1000 features correlated with the target.
However, the results differ in the following two situations:
1)I set parameters in SISSO.in like this:
subs_sis=1000
dim=1
then I get top1000 features in feature_space/space_001d.name and feature_space/space_001d_p001.dat
2)I set parameters in SISSO.in like this:
subs_sis=400
dim=3
then I get 3*400 features, in feature_space/space_001(002,003)d.name and feature_space/space_001(002,003)d_p001.dat
BUT!
The 1st feature in feature_space/space_002d.name in case 2 is not the 401st one in feature_space/space_001d.name in case 1.
What happens when SIS runs in different epochs?

Double counting in multiclass classification.

Hi Dr. Ouyang, I am running a multiclass (5 classes) classification problem with about 736 data points. In the output file, I get something like this:

Number of data in all overlap regions (the first metric): 300
Size of the overlap (the second metric): 0.32131
Actual number (without double counting) of data in all overlap regions: 204

Can you please elaborate a bit more on what 'double counting' means here? I have some assumptions, but I fear I might have it wrong.
Also, my classes are distributed unevenly (179 68 373 68 48); will this imbalance affect my output?

p.s. I am running the ST-SISSO version.

SISSO models different for shuffled training data

Hello,

I am getting different results for a regression model using the same training data (and input parameters) but with shuffled lines in the train.dat file. It seems like an unintended behavior for the code, but could also be because I made a mistake.

Attaching the two shuffled training-data (train_shuffle1,2.dat), and my SISSO.in file here. Advice would be much appreciated. Using SISSOv3.0.

Thanks.
Shuffled.zip

parallelisation

Could you please give some suggestions on how to run your code? If I use a small number of CPUs, the code of course gets very slow after a few iterations. However, if I start with a large number of CPUs, the code does not make progress at the very beginning. I figured out that I can start with a small number of CPUs and then restart the code with a larger number of CPUs once there is no visible progress. But this does not seem to be a very elegant solution.

Error while reading file "train.dat"

I am trying to run SISSO but the only output I got is this error message:

Error while reading file "train.dat" !!!

The contents of SISSO.out are as follows:

****************************************************************
  Sure Independence Screening and Sparsifying Operator (SISSO)
             Version SISSO.3.2, September, 2022.
****************************************************************

Read in data from train.dat.

It does not specify what the error while reading 'train.dat' actually is so I am pretty lost. I double checked and my train.dat has the correct number of rows and columns based on SISSO.in, and the names of the rows have no spaces or mathematical operators.
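
Since the message does not say which line fails, a small check outside SISSO can help narrow it down. The sketch below assumes train.dat is whitespace-delimited with one header line, a name in the first column and numbers everywhere else; the function name `check_train_dat` and the sample text are made up for illustration.

```python
def check_train_dat(text):
    """Return a list of human-readable problems found in train.dat-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    problems = []
    n_cols = len(lines[0].split())  # the header defines the expected column count
    for number, line in enumerate(lines[1:], start=2):  # counts non-blank lines
        fields = line.split()
        if len(fields) != n_cols:
            problems.append(f"line {number}: {len(fields)} fields, header has {n_cols}")
        for field in fields[1:]:
            try:
                # Python's float() rejects Fortran-style exponents such as 1.0D5,
                # so those are reported even though Fortran may accept them.
                float(field)
            except ValueError:
                problems.append(f"line {number}: non-numeric value {field!r}")
    return problems


# Made-up content with one short row and one non-numeric entry.
sample = "samples prop f1\ns1 1.0 2.0\ns2 1.5\ns3 n/a 3.0\n"
problems = check_train_dat(sample)
for problem in problems:
    print(problem)
```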

About the number of operators

Hello, Dr. Ouyang!
When I set rung=3, I find that all formulas in my Uspace.name file have 7 operators, fewer than the maxcomplexity=10 that I set, so maxcomplexity does not seem to take effect. So if I want more complicated formulas, should I set rung=4 or more? Does the maxcomplexity parameter only take effect when rung=3 and maxcomplexity<7?
Or, if I am wrong, please tell me the maximum number of operators that rung=3 can reach.
Thank you very much!
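
The figure of 7 operators at rung=3 is consistent with a simple counting argument, sketched here under an assumption: each rung combines at most two features of the previous rung with one binary operator, and complexity counts operators. How SISSO counts complexity internally is not confirmed here.

```python
def max_operators(rung):
    """Upper bound on operators in a feature after `rung` rounds of binary combination."""
    count = 0  # a primary feature contains no operator
    for _ in range(rung):
        # Two sub-features from the previous rung plus one new binary operator.
        count = 2 * count + 1
    return count


for rung in (1, 2, 3):
    print(rung, max_operators(rung))
```

Under this assumption the bound is 2^rung - 1, so a maxcomplexity above 7 would not change anything at rung=3.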

Python code?

Hi Dr. Ouyang, do you have a copy of SISSO code based on Python?

Error: forrtl: severe (64): input conversion error, unit -5, file Internal List-Directed Read

Hello,
I executed SISSO_predict from the terminal (after successful installation and previous successful executions) and this error appears:


dimension: 1
feature: ((Z_A+Z_B)/rP_B)
forrtl: severe (64): input conversion error, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
SISSO_predict 00000000004089AB Unknown Unknown Unknown
SISSO_predict 00000000004328D1 Unknown Unknown Unknown
SISSO_predict 00000000004315F5 Unknown Unknown Unknown
SISSO_predict 0000000000405BFF Unknown Unknown Unknown
SISSO_predict 0000000000402AC2 Unknown Unknown Unknown
libc-2.27.so 00007FE78219DB97 __libc_start_main Unknown Unknown
SISSO_predict 00000000004029AA Unknown Unknown Unknown


I do not understand the error or what causes it. What I noticed also is that in SISSO.out the following unusual lines appear after executing SISSO (see attached file):


Model/descriptor for generating residual:
coefficients_001: 0.7195685272E-02
coefficients_001: -0.1425151870E-02
Intercept_001: 0.3607139813E+01
coefficients_001: 0.5149789758E-01
Intercept_001: 0.3651808281E+01
coefficients_001: 0.1518181025E-02
Intercept_001: 0.3558224874E+01
coefficients_001: 0.3887855296E-01
Intercept_001: 0.3705094158E+01
coefficients_001: 0.3067298966E-02
Intercept_001: 0.3662611429E+01

      Intercept_001:     0.3677778981E+01

I do not know what causes these lines to appear, either. In previous attempts to run the code, a similar output was generated, with only the 2D descriptor part being skipped (the 3D descriptor followed directly after the 1D descriptor).

Any idea what happened here?

Thanks in advance!

Milena
files.zip

Clustering?

Hello Dr. Ouyang,

We know that this code can perform classification tasks, but is there scope for performing clustering with this code?

Question about regenerating your metal/non-metal classifier

Hello

I am trying to reproduce the metal/non-metal classifier from your paper.
But when I tried to reproduce it with this SISSO.in, I couldn't get the same results as you.
May I ask which parameter is wrong?

image

Order of primary feature in train.dat

image

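SISSO compilation guide

E-Mail: [email protected]

SISSO is built with mpiifort, which requires the Intel compiler tools. Choose the version and tune it according to your own cluster environment; the versions below have been tested.

Windows version

thanks to @testlablive

Software versions

Compiler: Intel.Parallel.Studio.XE.2019.Cluster.Edition

Operating system: Windows 10

SISSO-Windows: https://github.com/testlablive/SISSO-for-windows

Compilation and installation

On Windows, compile from the command line with the command below.

  • Download version 3.0.2 of the source code
  • Install Intel.Parallel.Studio (restart after installation)

mpiifort -F 100000000 -O2 var_global.f90 libsisso.f90 DI.f90 FC.f90 SISSO.f90 -o SISSO

Linux version

Software versions

Compiler: parallel_studio_xe_2019_update5_cluster_edition

Operating system: CentOS 7.6 (Ubuntu 18.04)

SISSO-Linux: https://github.com/rouyang2017/SISSO

Compilation and installation

When compiling and installing on Linux, pay attention to the pre-installation check: it prompts you to install g++, and the warning about the 32-bit libc version can be ignored.

  • Download version 3.0.2 of the source code
  • After installing intel_parallel_studio, load the environment variables
    • source $INTEL_INSTALL_DIR/parallel_studio_xe_2019/psxevars.sh
    • source $INTEL_INSTALL_DIR/intel/impi/2019.5.281/intel64/bin
    • For a personal environment, these can be placed in .bashrc
  • At run time, the default system stack size must be changed: ulimit -s unlimited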

Feature dimensions

Dr. Ouyang,

Can you please explain how the feature dimensions work? In the input script, suppose I have dimclass=(1:2)(3:4)(5:5): does this assign dimensions as (mass, length, time), or does it just group features and assume they have the same dimensions?

My data looks as follows and I have also added their units here. How would I make sure that equations given by SISSO are dimensionally consistent? I'm currently using dimclass=(1:2)(3:3) and the equations generated don't seem right.

Materials Del_P Pipe_D (m) Inlet_V (m/s) angle (deg)
sample1 24.937540 0.005 0.020 1.0
sample2 23.688087 0.005 0.019 2.0
sample3 22.438908 0.015 0.018 1.0
sample4 21.190007 0.025 0.017 4.0
sample5 19.941388 0.007 0.016 5.0
... ... ... ... ...
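
One reading of the SISSO.in comment quoted elsewhere on this page ("group features according to their dimension/unit; those not in any () are dimensionless") is that each (start:end) range marks a run of consecutive features sharing one unit. A sketch that builds such a string from per-feature unit labels follows. It assumes indices count feature columns from 1, that ranges must be contiguous, and that degrees are treated as their own unit; the helper `build_dimclass` is hypothetical.

```python
from itertools import groupby


def build_dimclass(units):
    """Build a dimclass string from one unit label per feature column.

    `units` follows the feature order in train.dat; None marks a dimensionless feature.
    """
    groups = []
    seen = set()
    position = 1  # assumed 1-based index of the first feature column
    for unit, run in groupby(units):
        length = len(list(run))
        if unit is not None:
            if unit in seen:
                raise ValueError(
                    f"features with unit {unit!r} are not contiguous; reorder the columns"
                )
            seen.add(unit)
            groups.append(f"({position}:{position + length - 1})")
        position += length
    return "".join(groups)


# Units of Pipe_D, Inlet_V and angle from the table above.
dimclass = build_dimclass(["m", "m/s", "deg"])
print("dimclass=" + dimclass)
```

Under this reading, dimclass=(1:2)(3:3) would place Pipe_D (m) and Inlet_V (m/s) in the same unit group, which may be why the generated equations look dimensionally inconsistent.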

Folder residual related issues

Hello
I hope this email finds you well. I have a question regarding the "residual" folder that is generated by the model. I would appreciate it if you could provide me with more details about its purpose and contents, as I'm not quite clear on its specifics.

Could you kindly explain what the "residual" folder represents? I'm particularly interested in understanding its functionality and the role it plays in the generated output. Additionally, if you could provide any information regarding the files typically found within the "residual" folder, it would greatly assist me in gaining a better understanding of its significance.

Thank you very much for your time and assistance. I look forward to your prompt response.

Non-physical features with feature_unit

Hi Dr. Ouyang,

I want to first thank you for this great work - this is a very powerful idea and code, and I'm excited to see its full potential in the future. I've run into a small issue with the code, and I'm hoping you can help me figure out how to fix this. Thank you in advance for your help.


NOTE: I updated to the most recent version to this date (ea96f46)

I've been trying some sample datasets with SISSO, and I think I've run into a problem where SISSO seems to suggest features that don't have physical meaning - the units do not match.

The dataset comes from UCI-Airfoil_Self-Noise. The features are below:

  • Frequency (Hz) - "freq"
  • Angle of attack (deg) - "angle"
  • Chord length (m) - "chord"
  • Free-stream velocity (m/s) - "fsv"
  • Suction side displacement thickness (m) - "ssdt"
  • Scaled sound pressure level (dB) - "sound" (output)

Here are the first few lines of the "train.dat" file:

samples sound freq angle chord fsv ssdt
0 126.201 800 0.0 0.3048 71.3 0.00266337
1 125.201 1000 0.0 0.3048 71.3 0.00266337
2 125.951 1250 0.0 0.3048 71.3 0.00266337
3 127.591 1600 0.0 0.3048 71.3 0.00266337

And here is the feature_unit file I've been using (columns correspond to m, s, deg):

0 -1 0
0 0 1
1 0 0
1 -1 0
1 0 0

I've formatted these in markdown for this post, but of course they are tab-delimited.

Looking at the "Uspace.expressions" file, I see some features that do not seem to have physical meaning (I have the first few lines below):

  • (sqrt(ssdt)*(freq*chord)) corr= 0.7020
  • ((freq*ssdt)*(chord-ssdt)) corr= 0.6955
  • ((freq/fsv)*sqrt(ssdt)) corr= 0.6829
  • ((freq-fsv)*(ssdt*chord)) corr= 0.6809
  • ((angle-freq)*(ssdt*chord)) corr= 0.6795

For example, freq [1/s] - fsv [m/s] does not make physical sense, and neither does angle [deg] - freq [1/s]. Am I doing this wrong, or is this a unique case?


Please let me know if you need any more information. I look forward to hearing back from you, and thank you in advance for your help.
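
The unit bookkeeping described in this post can be checked by hand with a short script. The following is a minimal sketch in Python, not SISSO's own implementation: it assumes that units are exponent vectors over (m, s, deg) exactly as in the feature_unit rows above, that multiplication and division add and subtract exponents, and that addition and subtraction are only allowed between identical vectors. The helper names (`mul`, `div`, `power`, `add_or_sub`) are hypothetical.

```python
from fractions import Fraction

# Unit exponents over (m, s, deg), copied from the feature_unit rows in the post.
UNITS = {
    "freq":  (0, -1, 0),
    "angle": (0, 0, 1),
    "chord": (1, 0, 0),
    "fsv":   (1, -1, 0),
    "ssdt":  (1, 0, 0),
}

def mul(u, v):
    """Units of a product: exponents add."""
    return tuple(a + b for a, b in zip(u, v))

def div(u, v):
    """Units of a quotient: exponents subtract."""
    return tuple(a - b for a, b in zip(u, v))

def power(u, p):
    """Units of a power (p may be fractional, e.g. 1/2 for sqrt)."""
    return tuple(Fraction(a) * Fraction(p) for a in u)

def add_or_sub(u, v):
    """Units of a sum or difference: both operands must share the same units."""
    if tuple(u) != tuple(v):
        raise ValueError(f"unit mismatch: {u} vs {v}")
    return tuple(u)

# (freq/fsv)*sqrt(ssdt) only multiplies and divides, so it has well-defined units.
ok = mul(div(UNITS["freq"], UNITS["fsv"]), power(UNITS["ssdt"], Fraction(1, 2)))

# (freq-fsv) subtracts 1/s and m/s, which this check rejects.
try:
    add_or_sub(UNITS["freq"], UNITS["fsv"])
    consistent = True
except ValueError:
    consistent = False

print(ok, consistent)
```

Running every expression from Uspace.expressions through a check like this would show which ones combine mismatched units, independently of how SISSO itself read the feature_unit file.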

SISSO div0 problem

Hello, I am working with the SISSO.3.0 version of June 2019 and have run into the following problem:
SISSO fit sometimes returns a model that contains a division by (or a log of) featureA or (featureB-featureC). Such a model cannot be used with SISSO predict if featureA or (featureB-featureC) is 0 (or very small) in prediction.dat. I am embedding SISSO fit and SISSO predict in a loop in which each iteration depends on the previous one, so the code stops executing at that point. I can program a workaround using the models and feature_space folders, but could an automatic reselection of the next-best model, in case the chosen one fails, be worth implementing in the next SISSO version? Thanks in advance!
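
The workaround mentioned above (falling back to the next-best model when the chosen one cannot be evaluated) could look roughly like the following. This is a minimal sketch in Python under stated assumptions: the ranked models are represented as plain callables, the three example models and the function name `first_finite_model` are hypothetical, and nothing here reads SISSO's actual output files.

```python
import math

def first_finite_model(models, rows):
    """Return (rank, predictions) for the first model whose outputs are all finite."""
    for rank, model in enumerate(models):
        try:
            preds = [model(row) for row in rows]
        except (ZeroDivisionError, ValueError, OverflowError):
            continue  # division by zero, log of a non-positive number, overflow
        if all(math.isfinite(p) for p in preds):
            return rank, preds
    raise RuntimeError("no candidate model gave finite predictions")

# Hypothetical candidates, ordered from best to worst training error.
models = [
    lambda r: 2.0 / r["featureA"] + 1.0,
    lambda r: 0.5 * math.log(r["featureB"] - r["featureC"]),
    lambda r: 3.0 * r["featureB"] - r["featureC"],
]

# Hypothetical prediction rows; the first row breaks the first two models.
rows = [
    {"featureA": 0.0, "featureB": 1.0, "featureC": 1.0},
    {"featureA": 2.0, "featureB": 3.0, "featureC": 1.0},
]

rank, preds = first_finite_model(models, rows)
print(rank, preds)
```

A stricter variant could also reject predictions whose magnitude is far outside the training range, which would catch the "very small value" case as well as exact zeros.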

some clarifications about the output from a quantitative model prediction

Hi! So considering that I am doing quantitative prediction, I have a few questions:
a. Is there a place where I can get the coefficients and intercepts for models other than the top-ranked model?
b. Does it do scaling and standardization internally?
c. Does it consider the values in an absolute sense internally? I ran two datasets with the same absolute values, where only the sign (positive or negative) differed in some instances, and the output models were the same. But this could be a special case of the dataset too.
d. I understand that the output model should be put in the form y'=mX+c, where X is the value evaluated from the descriptor, and that this finally gives me the predicted output variable. Is there any way I can change the linear function to a different function, say a polynomial of order two?
e. Are two different descriptors linked with each other in any way (both when they are part of a multi-dimensional descriptor and when they are not)? A really naive question, but it bugs me a lot. :p

missing data representation

In the MT-SISSO paper you explain the capability of dealing with missing or unknown data.

How is this represented in the train.dat file?

Compiling this code with NVIDIA support compiler

Hi Runhai,

I am trying to compile this code on the ALCF Polaris machine. The problem is that it does not have the mpiifort compiler. The compiler I used on that machine was nvhpc, but that leads to errors while building SISSO.f90. Some of the functions are built-in functions of ifport and Intel. I tried to compile ifport by hand, but after compiling ifport there are still some errors:
################################
ftn -Mfree -acc -mp=multicore,gpu -gpu=cc80 -Mcudalib=cublas,cufft -Mcuda=lineinfo -traceback -Minfo=mp,acc -gopt -traceback -cpp ifport.f90 var_global.f90 libsisso.f90 DI.f90 FC.f90 SISSO.f90 -o ../bin/SISSO
ifport.f90:
NVFORTRAN-W-0119-Redundant specification for sizeof_time_t (ifport.f90: 51)
NVFORTRAN-W-0119-Redundant specification for sizeof_size_t (ifport.f90: 52)
NVFORTRAN-W-0119-Redundant specification for sizeof_clock_t (ifport.f90: 53)
0 inform, 3 warnings, 0 severes, 0 fatal for ifport_types
var_global.f90:
libsisso.f90:
DI.f90:
FC.f90:
SISSO.f90:
/usr/bin/ld: DI.o: in function `di_model_':
/home/siyugao/Polaris/SISSO/SISSO-master/src/DI.f90:542: undefined reference to `isnan_'
/usr/bin/ld: /home/siyugao/Polaris/SISSO/SISSO-master/src/DI.f90:523: undefined reference to `isnan_'
/usr/bin/ld: /home/siyugao/Polaris/SISSO/SISSO-master/src/DI.f90:507: undefined reference to `isnan_'
/usr/bin/ld: SISSO.o: in function `MAIN_':
/home/siyugao/Polaris/SISSO/SISSO-master/src/SISSO.f90:94: undefined reference to `makedirqq_'
/usr/bin/ld: /home/siyugao/Polaris/SISSO/SISSO-master/src/SISSO.f90:95: undefined reference to `makedirqq_'
/usr/bin/ld: /home/siyugao/Polaris/SISSO/SISSO-master/src/SISSO.f90:96: undefined reference to `makedirqq_'
pgacclnk: child process exit status 1: /usr/bin/ld
################################
Since I don't want to make any change to your code, in case I might screw it, do you have other suggestions to build the code with GPU support? Is there anyone else trying to build the code on GPUs?

Running SISSO on Windows

The current version v3.0.2 of SISSO makes multiple system calls that are not available on Windows. Therefore SISSO won't run on Windows machines. It might be possible to use built-in Fortran functions rather than system calls in future releases.

In the meantime, if you want to run SISSO on a Windows machine, you can use SISSO-for-windows, which is an adapted version of SISSO v3.0.2. All Linux-specific system calls are replaced with Windows system calls.

LS_RSME

How exactly is the LS_RSME calculated? How is the data split into training and learning sets? Could you please add this to the documentation? It is not clear whether this is done the same way as in your paper.

Max number of tasks for multi target

Hi,

Is there any limitation to the maximum number of tasks when doing multi task SISSO?
For instance, I have 160 properties (they are temperature-dependent variables), with 200 samples each. My train.dat file thus contains 32001 lines. In SISSO.in I have ntasks=160 and nsamples=200,200,... 160 times. However, I get an "end-of-file during read" error. Reducing the ntask parameter to 123 works (124 does not) while leaving the rest as is.

Also if I do exactly the same but for 80 tasks (ntask=80, 16001 lines in train.dat, and 200,200,... 80 times) everything works, so I assume my input files are right.

Thank you.
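
The bookkeeping in this post (one header line plus 200 samples for each of 160 tasks) can be cross-checked with a few lines of Python. This is only a sketch: the helper names are hypothetical, the keyword spelling `nsample` follows the SISSO.in shown elsewhere on this page, and the single header line is an assumption taken from the 32001 figure in the post. The sketch also reports the length of the generated input line, in case the reader uses a fixed-length input buffer; that is an assumption to check, not something the post confirms.

```python
def nsample_value(counts):
    """Build the comma-separated per-task sample counts, e.g. '200,200,200'."""
    return ",".join(str(n) for n in counts)

def expected_train_lines(counts, header_lines=1):
    """Number of lines train.dat should have for the given per-task counts."""
    return sum(counts) + header_lines

counts = [200] * 160                       # 160 tasks, 200 samples each
line = "nsample=" + nsample_value(counts)  # the line that would go into SISSO.in
total = expected_train_lines(counts)       # 160 * 200 + 1 header line

print(total)      # expected number of lines in train.dat
print(len(line))  # length of the generated nsample line
```

Comparing `total` with the output of `wc -l train.dat` is a quick way to rule out a mismatch between the declared counts and the data file.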

SISSO_predict with Multi-Task SISSO

Hi
Is there a way to use SISSO_predict with a SISSO.out from multi-task learning?
If yes, what should the input in the parameter file look like?

Installation Help (no prior experience with fortran)

Hello,

Sorry to bother you, but I'm not familiar with Fortran compilers at all.
I read your advice in some other issue comments, so I installed the Intel Fortran Compiler and the Intel MPI Library for Windows based on your links. I also installed PowerShell.
In PowerShell, I changed the directory to the location of src and pasted your recommended compile command.

This just returned "You have to source \bin\mpivars.bat"

Given that I don't know anything about the procedure I'm trying to follow, is there a tutorial you could recommend to help me understand how to properly compile the Fortran files into Windows executables?

Thank you

error in first run

Dear Dr. Ouyang
I'm new to the SISSO code and would like to use it in my machine-learning projects. I compiled it with mpiifort without any compilation errors, but when I run it with the SISSO.in and train.dat files present, the code crashes and shows this error:

forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Image              PC                Routine            Line        Source             
sisso              00000000004986E6  Unknown               Unknown  Unknown
sisso              00000000004B7BAF  Unknown               Unknown  Unknown
sisso              00000000004B663E  Unknown               Unknown  Unknown
sisso              000000000048D611  Unknown               Unknown  Unknown
sisso              0000000000402AEE  Unknown               Unknown  Unknown
libc.so.6          00002BA0BDD06830  Unknown               Unknown  Unknown
sisso              00000000004029E9  Unknown               Unknown  Unknown

Both SISSO.in and train.dat are attached for clarity.
This is my SISSO.in file:

!_________________________________________________________________
! keywords for the target properties                               
!_________________________________________________________________
ptype=1                               
ntask=1                               
nsample=33                            ! number of samples for each task
task_weighting=1
desc_dim=5                            ! dimension of the descriptor  
restart=.false.                       ! set .true. to continue a job that was stopped but not yet finished 
!_________________________________________________________________
!keywords for feature construction and sure independence screening 
!_________________________________________________________________
nsf= 4                            ! number of scalar features (one feature is one number for each material)
rung=2                                ! rung (<=3) of the feature space to be constructed (times of applying the opset recursively)
opset='(+)(-)(*)(/)(exp)(log)(^-1)(^2)(^3)(sqrt)(cbrt)(|-|)'
maxcomplexity=10                      ! max feature complexity (number of operators in a feature)
dimclass=(1:2:3:4)                    ! group features according to their dimension/unit; those not in any () are dimensionless
maxfval_lb=1e-3                       ! features having the max. abs. data value < maxfval_lb will not be selected 
maxfval_ub=1e5                        ! features having the max. abs. data value > maxfval_ub will not be selected
subs_sis=50                          ! size of the SIS-selected (single) subspace for each descriptor dimension
!_________________________________________________________________
!keywords for descriptor identification via a sparsifying operator
!_________________________________________________________________
method='L0'                           ! sparsification operator: 'L1L0' or 'L0'; L0 is recommended!
fit_intercept=.true.                  ! fit to a nonzero intercept (.true.) or force the intercept to zero (.false.)
metric='RMSE'                         ! for regression only, the metric for model selection: RMSE,MaxAE
nm_output=100                         ! number of the best models to output

train.dat :

materials    dft_half_gap    pbe_gap         total_e            colomb_pot_e                  colomb_e
AlAs-216      3.149          1.355         -68080.02590        -104935.5177              -134814.8317
AlAs-186      3.069          1.689         -136160.0320        -209869.5141              -269630.2325
AlN-186       6.121          4.129         -16164.42650        -22887.61690              -30984.17055
AlN-216       5.667          3.194         -8082.159900        -11441.74040              -15491.99290
AlN-225       6.800          4.521         -8081.988900        -11387.52260              -15490.23130
AlP-186       3.330          1.857         -31778.35700        -46347.78760              -61421.24900
AlP-216       3.121          1.458         -15889.16660        -23174.11200              -30710.30160
AlSb-216      2.480          1.170         -182947.1691        -296433.0431              -370563.0144
BAs-186       2.116          1.041         -124315.2232        -192401.9206              -246681.0124
BAs-216       2.121          1.089         -62157.60320        -96202.02930              -123340.4979
Be2C-225      2.932          1.081         -1832.390600        -2288.257900              -3355.758500
BeO-186       10.504         7.532         -4888.655000        -6466.508300              -9150.075400
BeS-216       4.865          2.919         -11247.25590        -16533.60910              -21778.02410
BeSe-194      2.069          0.445         -132971.4590        -206458.1258              -264223.1369
BeSe-216      4.421          2.478         -66486.60637        -103300.2282              -132112.6400
BeSe-225      1.668          0.185         -66485.63960        -103228.3446              -132111.3807
BN-186        7.404          5.046         -4320.406000        -5625.691400              -8039.889500
BN-194        6.287          4.279         -4320.343300        -5791.917900              -8042.123700
BN-216        6.574          4.300         -2160.242300        -2811.809600              -4019.990000
BP-186        1.989          0.913         -19934.36640        -28959.59100              -38473.69420
BP-216        1.998          1.114         -9967.189900        -14479.93150              -19236.86480
BSb-216       1.440          0.671         -177024.8354        -287628.7633              -359087.5343
CaO-225       6.539          3.550         -20536.81340        -30823.76600              -40018.13260
CaS-225       4.226          2.178         -29341.50050        -44176.84650              -57223.23550
CaSe-216      5.301          3.138         -84580.19460        -131056.9032              -167559.0155  
CdS-216       2.720          1.008         -163052.1670        -261587.9915              -328687.1789
CdSe-216      2.130          0.463         -218291.8035        -348406.6972              -439023.3847
GaAs-216      1.598          0.400         -114347.7170        -177044.2114              -226854.0568
GaN-186       3.636          1.879         -108696.6435        -167015.0723              -215058.1597
GaN-216       3.435          1.709         -54348.31670        -83505.50830              -107529.2433
GaN-225       2.626          0.530         -54347.46020        -83419.36840              -107526.7074
GaP-186       2.434          1.288         -124313.2937        -190563.5159              -245498.9299
GaP-216       2.764          1.462         -62156.71210        -95270.00460              -122749.5544

Any guideline would be greatly appreciated.
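
One thing that can be checked before running is whether the dimclass value is well formed. The following Python sketch assumes that every group is written as (start:end), as in the dimclass=(1:2)(3:4)(5:5) example elsewhere on this page; whether SISSO accepts any other form is not confirmed here, and the function name `parse_dimclass` is hypothetical.

```python
import re

# One group looks like "(3:4)": two integers separated by a single colon.
GROUP = re.compile(r"\((\d+):(\d+)\)")

def parse_dimclass(value, nsf):
    """Parse '(1:2)(3:4)' into [(1, 2), (3, 4)]; raise ValueError otherwise."""
    groups = []
    pos = 0
    for match in GROUP.finditer(value):
        if match.start() != pos:
            break  # unexpected text between groups
        start, end = int(match.group(1)), int(match.group(2))
        if not 1 <= start <= end <= nsf:
            raise ValueError(f"group ({start}:{end}) is outside 1..{nsf}")
        groups.append((start, end))
        pos = match.end()
    if pos != len(value):
        raise ValueError(f"unexpected text in dimclass: {value[pos:]!r}")
    return groups

print(parse_dimclass("(1:2)(3:4)", nsf=4))

try:
    parse_dimclass("(1:2:3:4)", nsf=4)   # the value used in the SISSO.in above
    accepted = True
except ValueError:
    accepted = False
print(accepted)
```

Under the stated assumption, the value (1:2:3:4) from the SISSO.in above does not pass this check, while (1:4) or (1:2)(3:4) would.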

[feature overlap] feature overlap reject issue

Hi, I am trying to port your feature generator from Fortran to Python.

I have completed the 1D SIS feature generator, but the score-sorting part differs from your code.

I compute a score for every feature and then sort the features by score.

I used the following data:

materials    property   feature1   feature2   feature3   feature4   feature5
sample1       7.4011   7.9444   3.3169   6.4316   7.9682   2.6082
sample2       0.5658   4.5632   6.1904   2.6816   7.3536   3.3143
sample3       9.9871   2.2088   3.0835   9.1712   4.0763   5.4474
sample4       4.0016   7.0526   2.9464   0.18   4.7757   7.4445
sample5       1.4485   0.9315   6.3577   0.5511   2.65   8.2115
sample6       0.0635   2.4469   5.1507   7.6108   2.4139   9.2765

These are the 1D feature names from your code (10 results):

(cbrt(feature2)*(feature4+feature5))  corr=      0.9886
(feature4+(feature2+feature5))  corr=      0.9821
abs((-feature4)-(feature2+feature5))  corr=      0.9821
(abs(-feature4)+(feature2+feature5))  corr=      0.9821
((+feature2)+(feature4+feature5))  corr=      0.9821
((feature2-)+(feature4+feature5))  corr=      0.9821
((-feature2)-(feature4+feature5))  corr=      0.9821
abs((-feature2)-(feature4+feature5))  corr=      0.9821
(abs(-feature2)+(feature4+feature5))  corr=      0.9821
(feature2+(feature4+feature5))  corr=      0.9821

These are the 1D feature names from my code (10 results):

((feature2)^0.333*(feature5+feature4))           0.9886299719844789
((feature5+feature2)-(-feature4))                0.9820851989932219
((feature5+feature2)+abs(-feature4))             0.9820851989932219
(abs(-feature2)+(feature5+feature4))             0.9820851989932219
((-feature2)-(feature5+feature4))                0.9820851989932219
abs((feature5-feature3)-abs(feature2-feature3))       0.9820851989932219
abs((feature3+feature1)-abs(feature4-feature5))       0.9820851989932219
((feature3+feature2)/exp(feature2))              0.9820851989932219
((feature2/feature4)*(feature4+feature1))        0.9804105405638213
((feature4+feature1)/(feature4/feature2))        0.9797699529153319

Besides this example, I tested 10 dummy datasets with 5 features and 6 samples each, and the highest scores and names are mostly the same. However, in my sorting scheme I sort all of the features first and then reject the overlapping features by ftag and score, so, as in the results above, my results sometimes differ from those of your Fortran code.

Finally, my question: is it fine to reject features with the same meaning by ftag and score?
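
The "sort everything, then reject overlaps by score" scheme described in this post can be written down compactly. The sketch below is in Python and is not the Fortran code's method: it scores each feature by the absolute Pearson correlation with the target, sorts by score, and keeps only the first feature for each score rounded to `ndigits` decimals. The ftag part is left out because the post does not define it, and the data and names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_unique(features, target, ndigits=10):
    """Sort features by |corr| with the target, keeping one feature per rounded score."""
    scored = sorted(
        ((abs(pearson(values, target)), name) for name, values in features.items()),
        key=lambda item: (-item[0], item[1]),  # best score first, then by name
    )
    kept, seen = [], set()
    for score, name in scored:
        key = round(score, ndigits)
        if key in seen:
            continue  # same score as an already kept feature: treat as overlap
        seen.add(key)
        kept.append((name, score))
    return kept

# Hypothetical data: two expressions with identical values, one different.
target = [1.0, 2.0, 3.0, 4.0]
features = {
    "(f1+f2)":       [1.0, 2.0, 3.0, 5.0],
    "abs((-f1)-f2)": [1.0, 2.0, 3.0, 5.0],
    "(f1*f2)":       [4.0, 1.0, 3.0, 2.0],
}

kept = rank_unique(features, target)
print([name for name, _ in kept])
```

An identical score is only a proxy for an identical feature: two unrelated expressions can happen to correlate equally well with the target, so comparing the feature value vectors themselves is a stricter overlap test.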

Errors with open source compilers

Hi, this may be some user error but I cannot get SISSO compiled using gfortran or mpif90, only ifort.

For reproducibility I've been using Docker (The same behavior is seen on a clean Ubuntu install, I tried and had a friend try too), since it's 0-overhead usually. I've included some instructions for how to use it, sorry if you know.

Here is the Dockerfile, once you get Docker installed (fairly easy following online docs) you can use this command wherever you download the Dockerfile to: "docker build -t "sisso" ."

FROM ubuntu:latest

RUN apt-get -y update && apt-get -y upgrade && apt-get install -y build-essential

RUN DEBIAN_FRONTEND='noninteractive' apt-get install -y git p7zip-full wget libboost-all-dev make g++9 cmake
RUN DEBIAN_FRONTEND='noninteractive' apt-get install -y libmpich-dev

WORKDIR /root

RUN git clone https://github.com/rouyang2017/SISSO.git

WORKDIR /root/SISSO

#RUN cat src/var_global.f90
RUN sed -i "s/use mpi/include \'mpif.h\'/g" src/var_global.f90 
#RUN sed -i "s/implicit none//" src/var_global.f90
#RUN cat src/var_global.f90

RUN cd src && mpif90 -fimplicit-none var_global.f90 libsisso.f90 DI.f90 FC.f90 SISSO.f90 -o ~/SISSO.x

#entry.sh is blank, I'm just working on compilation first
ENTRYPOINT ["/bin/bash", "entry.sh"]

I run into this error while building:

var_global.f90:18:13:

   18 | implicit none
      |             1
Error: IMPLICIT NONE statement at (1) cannot follow INTERFACE statement at (2)
libsisso.f90:37:4:

   37 | use var_global
      |    1
Fatal Error: Cannot open module file 'var_global.mod' for reading at (1): No such file or directory

If I remove the -fimplicit-none the same error results. Using libopenmpi-dev instead of libmpich-dev didn't help, and using gfortran-8 did not help.

If you want to debug, you can comment out the make line in the Dockerfile and build it like above.

However if you , then to run that SISSO tag, "docker run --entrypoint /bin/bash -I sisso"

Are we forced into using Intel compilers, or am I missing something with the other compilers?

Thanks for your time
