cedergrouphub / text-mined-synthesis_public

Codes for text-mined solid-state reactions dataset

Python 100.00%

materials-science natural-language-processing inorganic-materials-synthesis text-mining dataset solid-state-reactions

text-mined-synthesis_public's Introduction

Text-mined Synthesis

In our project on text mining of the scientific literature, we have built up a large dataset of solid-state reactions. Here, we provide our auto-generated open-source dataset of 30,031 chemical reactions retrieved from 95,283 solid-state synthesis paragraphs: text-mined dataset. The data were collected using an automated extraction pipeline (see below) that converts unstructured scientific paragraphs describing inorganic materials synthesis into a so-called “codified recipe” of synthesis. The pipeline uses a variety of text-mining and NLP approaches to find information about target materials, starting compounds, synthesis steps, and conditions in the text, and to process them into a chemical equation.

This repo contains the code and modules built to create the solid-state reactions dataset. If you find the code and data useful, please cite our papers:

Dataset:

Kononova, O., Huo, H., He, T., Rong, Z., Botari, T., Sun, W., Tshitoyan, V. and Ceder, G., 2019. Text-mined dataset of inorganic materials synthesis recipes. Scientific Data 6: 203.

Paragraphs classification:

Huo, H., Rong, Z., Kononova, O., Sun, W., Botari, T., He, T., Tshitoyan, V. and Ceder, G., 2019. Semi-supervised machine-learning classification of materials synthesis procedures. npj Computational Materials, 5(1), p.62.

Materials Entity Recognition (MER):

He, T., Sun, W., Huo, H., Kononova, O., Rong, Z., Tshitoyan, V., Botari, T. and Ceder, G., 2020. Similarity of Precursors in Solid-State Synthesis as Text-Mined from Scientific Literature. Chemistry of Materials, 32(18), pp.7861-7873.

Versions

[2020-07-13] Updated dataset: 31,782 solid-state reactions and 9,518 sol-gel precursor synthesis reactions. The updated data schema is given in dataset_typing.py.
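As a rough illustration of how a released archive might be read, here is a minimal sketch using only the Python standard library. It assumes the dataset is distributed as an xz-compressed JSON file (file names such as solid-state_dataset_20200713.json.xz are quoted in the issues on this page). It also assumes the top level is either a bare list of records or an object that wraps the list under a "reactions" key. That key and the helper names load_reactions and count_record_keys are illustrative assumptions, not part of the repository; dataset_typing.py remains the authoritative schema.

```python
import json
import lzma
from collections import Counter
from pathlib import Path


def load_reactions(path):
    """Load reaction records from a .json or .json.xz file.

    Assumption: the file holds either a list of records or a dict that
    wraps the list under a "reactions" key (hypothetical key name).
    """
    path = Path(path)
    # Pick a text-mode opener based on the file extension.
    opener = lzma.open if path.suffix == ".xz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        data = data.get("reactions", [])
    return data


def count_record_keys(reactions):
    """Count how often each top-level key occurs across the records."""
    counts = Counter()
    for record in reactions:
        counts.update(record.keys())
    return counts
```

Calling count_record_keys(load_reactions("solid-state_dataset_20200713.json.xz")) is one quick way to see which fields the records actually carry before relying on any of them.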

Getting help

If you have questions about the project, please submit an issue or contact us ([email protected]). Thanks!

text-mined-synthesis_public's Issues

MER install problem

Hi, I get an error when I try to install Materials Entity Recognition (MER) using 'git clone [email protected]:CederGroupHub/MatEntityRecognition.git':

ERROR: Repository not found.
fatal: Could not read from remote repository.

Do you have any idea what is wrong?
Thanks!

Full script for recipe extraction

Hi,

Is there a script for running the full pipeline? Specifically, I am interested in a script that takes an HTML/XML/PDF article and outputs the recipe extracted from it in JSON format, like the ones in the dataset. If not, could you provide instructions on exactly what to run to do this?

Problem with MER

When I try to import MatRecognition from material_entity_recognition, I get the error "zsh: illegal hardware instruction python3"

I have installed Git LFS successfully and installed all the dependencies as described in the README file.

Can anyone please help?

Thanks

OSError: [E053] Could not read config file from \OperationsExtraction\operations_extractor\models\SpaCy_updated_v1.model\config.cfg

from operations_extractor import OperationsExtractor

w2v_model = 'models/w2v_embeddings_lemmas_v3'
classifier_model = 'models/fnn-model-1_7classes_dense32_perSentence_3'
spacy_model = 'models/SpaCy_updated_v1.model'

OC = OperationsExtractor(w2v_model, classifier_model, spacy_model)

I ran the code above, but it reports that it cannot find config.cfg for the spaCy model:

OSError: [E053] Could not read config file from \OperationsExtraction\operations_extractor\models\SpaCy_updated_v1.model\config.cfg
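One way to narrow this error down is to look at what the model directory actually contains. spaCy v3 expects a config.cfg file inside the model directory, whereas models saved with spaCy v2 carry only a meta.json. Whether that version mismatch is the cause here has not been confirmed by the maintainers, so treat the following as a diagnostic sketch. The helper name diagnose_spacy_model_dir is hypothetical, and it deliberately avoids importing spaCy so it runs in any environment.

```python
from pathlib import Path


def diagnose_spacy_model_dir(model_dir):
    """Return a short guess about why spacy.load() might fail on model_dir."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return "missing: the model directory does not exist"
    if (model_dir / "config.cfg").is_file():
        return "ok: config.cfg found (spaCy v3 layout)"
    if (model_dir / "meta.json").is_file():
        # Only meta.json is present: consistent with a model saved by spaCy v2.
        return "legacy: meta.json without config.cfg (looks like a spaCy v2 model)"
    return "unknown: neither config.cfg nor meta.json found"


if __name__ == "__main__":
    # Relative path as used in the snippet above; adjust to your checkout.
    print(diagnose_spacy_model_dir("models/SpaCy_updated_v1.model"))
```

If the result is "legacy", running the code in an environment with a spaCy 2.x release is one thing to try. If it is "missing" or "unknown", the model files may not have been downloaded at all.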

version query

What version of Python did you use for the code?

Data request

Hey,

Thanks for the great project.
Is it possible to share the original data? I mean not the automatically generated data (the JSON file) but the original text used for learning.
In the shared JSON data, there is a paragraph_string field, but it only includes the first and last 50 characters of the original text. Is it possible to get the full paragraph text?

Again, thank you very much for the great effort.

The full corpus is not provided?

Hi,
I would like to use your corpus to challenge the extraction of the recipes. However, when I checked the contents of the json files (extracted from solid-state_dataset_2019-12-03.json.xz and solid-state_dataset_20200713.json.xz), the text of the paragraph is omitted, as shown below in "<...>".

Is the full dataset published?

    {"token": "repelleted", "type": "ShapingOperation", "conditions": null}],
  "paragraph_string": "All materials were obtained from Aldrich Chemicals<...>d repelleted after each 24 or 48 h heating period."
},
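For readers who have retrieved a paragraph's full text themselves, the truncated paragraph_string can still serve as a fingerprint. The sketch below splits the field at the "<...>" marker shown in the excerpt above and checks whether a candidate text starts and ends with the two preserved fragments. The function names are hypothetical helpers, not part of the repository, and the matching is a plain prefix/suffix comparison, so differences in whitespace or character encoding would cause a miss.

```python
ELISION_MARKER = "<...>"


def split_truncated_paragraph(paragraph_string):
    """Split a paragraph_string into (head, tail) around the elision marker.

    Returns (paragraph_string, None) when no marker is present.
    """
    head, marker, tail = paragraph_string.partition(ELISION_MARKER)
    if not marker:
        return paragraph_string, None
    return head, tail


def matches_full_text(paragraph_string, full_text):
    """Check whether full_text is consistent with the truncated string."""
    head, tail = split_truncated_paragraph(paragraph_string)
    if tail is None:
        return full_text == head
    return (
        len(full_text) >= len(head) + len(tail)
        and full_text.startswith(head)
        and full_text.endswith(tail)
    )
```

Applied to the excerpt above, split_truncated_paragraph yields two fragments of 50 characters each, matching the truncation described in the earlier data request.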

[Bug]: OSError: [E053] Could not read config file from <path>\OperationsExtraction\operations_extractor\models\SpaCy_updated_v1.model\config.cfg

Version

not sure

Which OS(es) are you using?

MacOS
Windows
Linux

What happened?

from operations_extractor import OperationsExtractor

w2v_model = 'models/w2v_embeddings_lemmas_v3'
classifier_model = 'models/fnn-model-1_7classes_dense32_perSentence_3'
spacy_model = 'models/SpaCy_updated_v1.model'

OC = OperationsExtractor(w2v_model, classifier_model, spacy_model)

I ran the code above, but it reports that it cannot find config.cfg for the spaCy model:

OSError: [E053] Could not read config file from \OperationsExtraction\operations_extractor\models\SpaCy_updated_v1.model\config.cfg

Code snippet

from operations_extractor import OperationsExtractor

w2v_model = 'models/w2v_embeddings_lemmas_v3'
classifier_model = 'models/fnn-model-1_7classes_dense32_perSentence_3'
spacy_model = 'models/SpaCy_updated_v1.model'

OC = OperationsExtractor(w2v_model, classifier_model, spacy_model)

Log output

Operations Extractor v2.9
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[12], line 5
      2 classifier_model = 'models/fnn-model-1_7classes_dense32_perSentence_3'
      3 spacy_model = 'models/SpaCy_updated_v1.model'
----> 5 OC = OperationsExtractor(w2v_model, classifier_model, spacy_model)

File D:\conda_envs\Text-Mined dataset Paper\text-mined-synthesis_public\MaterialParser\OperationsExtraction\operations_extractor\operations_extractor.py:120, in OperationsExtractor.__init__(self, w2v_model, classifier_model, spacy_model)
    117 print("Operations Extractor v2.9")
    119 my_folder = os.path.dirname(os.path.realpath(__file__))
--> 120 self.__nlp = spacy.load(os.path.join(my_folder, spacy_model))
    121 self.__embeddings = Word2Vec.load(os.path.join(my_folder, w2v_model))
    122 self.__model = keras.models.load_model(os.path.join(my_folder, classifier_model))

File ~\anaconda3\envs\MaterialParser\Lib\site-packages\spacy\__init__.py:51, in load(name, vocab, disable, enable, exclude, config)
     27 def load(
     28     name: Union[str, Path],
     29     *,
   (...)
     34     config: Union[Dict[str, Any], Config] = util.SimpleFrozenDict(),
     35 ) -> Language:
     36     """Load a spaCy model from an installed package or a local path.
     37 
     38     name (str): Package name or model path.
   (...)
     49     RETURNS (Language): The loaded nlp object.
     50     """
---> 51     return util.load_model(
     52         name,
     53         vocab=vocab,
     54         disable=disable,
     55         enable=enable,
     56         exclude=exclude,
     57         config=config,
     58     )

File ~\anaconda3\envs\MaterialParser\Lib\site-packages\spacy\util.py:467, in load_model(name, vocab, disable, enable, exclude, config)
    465         return load_model_from_package(name, **kwargs)  # type: ignore[arg-type]
    466     if Path(name).exists():  # path to model data directory
--> 467         return load_model_from_path(Path(name), **kwargs)  # type: ignore[arg-type]
    468 elif hasattr(name, "exists"):  # Path or Path-like to model data
    469     return load_model_from_path(name, **kwargs)  # type: ignore[arg-type]

File ~\anaconda3\envs\MaterialParser\Lib\site-packages\spacy\util.py:538, in load_model_from_path(model_path, meta, vocab, disable, enable, exclude, config)
    536 config_path = model_path / "config.cfg"
    537 overrides = dict_to_dot(config, for_overrides=True)
--> 538 config = load_config(config_path, overrides=overrides)
    539 nlp = load_model_from_config(
    540     config,
    541     vocab=vocab,
   (...)
    545     meta=meta,
    546 )
    547 return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)

File ~\anaconda3\envs\MaterialParser\Lib\site-packages\spacy\util.py:714, in load_config(path, overrides, interpolate)
    712 else:
    713     if not config_path or not config_path.is_file():
--> 714         raise IOError(Errors.E053.format(path=config_path, name="config file"))
    715     return config.from_disk(
    716         config_path, overrides=overrides, interpolate=interpolate
    717     )

OSError: [E053] Could not read config file from D:\conda_envs\Text-Mined dataset Paper\text-mined-synthesis_public\MaterialParser\OperationsExtraction\operations_extractor\models\SpaCy_updated_v1.model\config.cfg

Problem with MER

I would like to ask for your advice about MER.
Following the steps in the README, I tried to run the example in the test folder and encountered this error:
File "D:\aconcon\envs\txmine\lib\site-packages\keras\optimizers\optimizer_experimental\optimizer.py", line 1151, in weight_decay_fn
wd = tf.cast(self.weight_decay, variable.dtype)
Node: 'Cast_1'
Cast string to float is not supported
[[{{node Cast_1}}]] [Op:__inference_train_function_52474]
