
lotus-web's Introduction

LOTUS web

Natural Products Online is an open-source project for the storage, search and analysis of natural products (NPs). This repository contains the code for LOTUS, one of the largest and best-annotated resources for NP occurrences, available free of charge and without any restriction. LOTUS is a living database hosted in parallel on Wikidata and here. The Wikidata version allows community curation and the addition of new data, while this version offers a more user-friendly experience (structure search, taxonomy-oriented queries, flat-table and structure exports).

Before anything else, generate the target folder with the following command:

mvn package

Running a local instance requires Docker. Build and start the containers, then open a shell in the MongoDB container and drop any existing NPOC2021 database:

docker-compose build
docker-compose up -d
docker exec -it npoc-mongo-db bash
mongo --port 27019
use NPOC2021
db.dropDatabase()
exit

Next, download the latest version of LOTUS (also available at https://lotus.naturalproducts.net/download), unzip it, restore it into MongoDB, create the indexes and rebuild the lotus-online container:

curl https://lotus.naturalproducts.net/download/mongo -o mongodata/LOTUSlatest.zip
unzip mongodata/LOTUSlatest.zip -d mongodata/ 
cd mongodata/NPOC2021/NPOC2021/
mongorestore --port 27019 --db=NPOC2021 --noIndexRestore .
mongo --port 27019
use NPOC2021
db.lotusUniqueNaturalProduct.createIndex( {lotus_id:1})
db.lotusUniqueNaturalProduct.createIndex( {inchi:"hashed"})
db.lotusUniqueNaturalProduct.createIndex( {inchikey:1})
db.lotusUniqueNaturalProduct.createIndex( {smiles: "hashed"})
db.lotusUniqueNaturalProduct.createIndex( {inchi2D:"hashed"})
db.lotusUniqueNaturalProduct.createIndex( {inchikey2D:1})
db.lotusUniqueNaturalProduct.createIndex( {smiles2D: "hashed"})
db.lotusUniqueNaturalProduct.createIndex( {molecular_formula:1})
db.lotusUniqueNaturalProduct.createIndex( {fragmentsWithSugar:"hashed"})
db.lotusUniqueNaturalProduct.createIndex( {fragments:"hashed"})

db.lotusUniqueNaturalProduct.createIndex( {molecular_weight:1})
db.lotusUniqueNaturalProduct.createIndex( {fsp3:1})
db.lotusUniqueNaturalProduct.createIndex( {lipinskiRuleOf5Failures:1})
db.lotusUniqueNaturalProduct.createIndex( {heavy_atom_number:1})

db.runCommand(
  {
    createIndexes: 'lotusUniqueNaturalProduct',
    indexes: [
      {
        key: {
          iupac_name:"text", traditional_name:"text", allTaxa:"text", allChemClassifications:"text", allWikidataIds:"text"
        },
        name: "superTextIndex",
        weights: { traditional_name:10, allTaxa:5 }
      }
    ]
  }
)

db.lotusUniqueNaturalProduct.createIndex( {npl_score:1})
db.lotusUniqueNaturalProduct.createIndex( { pubchemBits : "hashed" } )
db.lotusUniqueNaturalProduct.createIndex( {deep_smiles: "hashed"})
db.lotusUniqueNaturalProduct.createIndex( { "pfCounts.bits" :1} )
db.lotusUniqueNaturalProduct.createIndex( { "pfCounts.count" : 1 })

exit
exit
docker-compose up -d --no-deps --build lotus-online
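For systems without curl or unzip, the download and extraction steps above can be reproduced with the Python standard library. This is a minimal sketch, assuming the download URL from the curl command serves a ZIP archive; `download`, `extract` and `LOTUS_MONGO_URL` are hypothetical names, not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the curl command above (assumed to serve a ZIP archive).
LOTUS_MONGO_URL = "https://lotus.naturalproducts.net/download/mongo"


def download(url: str, dest: Path) -> Path:
    """Stream the file at `url` to `dest`, as `curl -o` does."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in 1 MiB chunks so the whole archive is never held in memory.
        while chunk := response.read(1 << 20):
            out.write(chunk)
    return dest


def extract(archive: Path, target: Path) -> list[str]:
    """Unzip `archive` into `target`, as `unzip -d` does, and list its members."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()
```

Mirroring the two shell commands: `extract(download(LOTUS_MONGO_URL, Path("mongodata/LOTUSlatest.zip")), Path("mongodata"))`. The `mongorestore` and index steps are unchanged.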

lotus-web's People

Contributors

adafede, cthoyt, egonw, imgbotapp, kohulan, msorok, oolonek, renovate-bot, renovate[bot]

lotus-web's Issues

Errors in Lotus

How and where can erroneous entries in the Lotus database be reported and corrected? For example, mescaline is listed as reported from Echinopsis spachianus, but anyone familiar with the cited reference will know that the paper specifically did not observe mescaline in this species (in either the experimental subjects or the controls). This is only one example of erroneous data that would benefit from correction.

Error occurred on image search

I tried dragging a molecular structure image displayed on the page into the search box. The box accepted the image as a search string (a base64-encoded PNG data URI, data:image/png;base64,iVBORw0KGgo…, truncated here).

However, clicking Search gave an error. The error is reproducible.
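The search box accepted a `data:` URI, which is not a structure identifier. A client-side check that classifies the input before submission could reject such strings with a clear message instead of a server error. This sketch is illustrative only: `classify_query` and its categories are hypothetical, and the project's front end is React/JavaScript, so a real fix would live there.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def classify_query(query: str) -> str:
    """Guess what kind of search string was entered (hypothetical helper)."""
    q = query.strip()
    if q.startswith("data:"):
        return "data-uri"        # e.g. an image dropped into the box; reject it
    if q.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(q):
        return "inchikey"
    return "text-or-smiles"      # left for the server to interpret
```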

How to download all the chemical compounds, and their related data, for an organism from LOTUS?

I have an organism and I want to download all the chemical compounds related to it, with their SMILES and the species that produce them.

What I did was search the web page, which found all the compound entries related to that organism. I downloaded the SDF file, which was the only download option available, and later converted it to Excel format.

But I realized that the file was missing the compound names.

What I want is: compound name, SMILES, and the species it is present in.

Is it possible to get this from the LOTUS database by any means?
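Since LOTUS data are mirrored on Wikidata, one way to get compound name, SMILES and species in a single table is to run a SPARQL query there and write the results to CSV. A minimal sketch, assuming a query that selects `?structureLabel`, `?structure_smiles` and `?organism_name` (as the Gentianales queries in later issues on this page do, with the taxon changed). The endpoint is the public Wikidata one; the User-Agent string and the function names are placeholders. Results from Wikidata may not match the web interface exactly, as other issues here report.

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# SPARQL variable -> CSV column requested in the issue.
COLUMNS = {"structureLabel": "compound_name",
           "structure_smiles": "smiles",
           "organism_name": "species"}


def fetch_sparql_json(query: str, timeout: float = 120.0) -> dict:
    """Run `query` on the public Wikidata endpoint; return SPARQL JSON results."""
    url = f"{WIKIDATA_SPARQL}?{urlencode({'query': query, 'format': 'json'})}"
    # Wikidata asks clients for a descriptive User-Agent; this one is a placeholder.
    req = Request(url, headers={"User-Agent": "lotus-export-sketch/0.1 (example)"})
    with urlopen(req, timeout=timeout) as response:
        return json.load(response)


def results_to_csv(results: dict, out) -> int:
    """Write compound name, SMILES and species rows to `out`; return the row count.

    Variables missing from a binding are written as empty cells.
    """
    writer = csv.DictWriter(out, fieldnames=list(COLUMNS.values()))
    writer.writeheader()
    rows = 0
    for binding in results["results"]["bindings"]:
        writer.writerow({col: binding.get(var, {}).get("value", "")
                         for var, col in COLUMNS.items()})
        rows += 1
    return rows
```

Usage: `with open("compounds.csv", "w", newline="") as f: results_to_csv(fetch_sparql_json(query), f)`.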

Cool feature

While I was reading this nice document, I imagined something that might be a cool feature: similarity/substructure networks of the structures, color-coded by taxonomy.
Let me know if that makes sense for you to add.
Thanks

Inconsistency in taxon:compound, name:compound mapping? Example (Q105216729)

Hello:
I want to use Lotus to reconstruct the evolution of natural products on phylogenetic trees. I began entering some references and taxon:compound pairs that are not in Lotus (or Wikidata) and found some unexpected complications and/or inconsistencies in how the taxon:compound and name:compound mappings shown in Lotus are encoded in Wikidata, and in how these associations can be retrieved with Wikidata queries. Can you help me understand what is happening so that I can add and retrieve data effectively? Thank you! Tanya

I give as example the compound blepharin (Q105216729 on Wikidata).

1. If you query "blepharin" in Lotus, you get no results.
2. If you query the InChIKey encoding of blepharin, "PYQSUTLVBSTCSK-UHFFFAOYSA-N", in Lotus, you get "Q105216729
2-[(3-hydroxy-2h-1,4-benzoxazin-2-yl)oxy]-6-(hydroxymethyl)oxane-3,4,5-triol".
3. That name "2-[(3-hydroxy-2h-1,4-benzoxazin-2-yl)oxy]-6-(hydroxymethyl)oxane-3,4,5-triol" does not appear in the Wikidata (nor PubChem) record for "blepharin". "Blepharin" does not occur in the Lotus record for "2-[(3-hydroxy-2h-1,4-benzoxazin-2-yl)oxy]-6-(hydroxymethyl)oxane-3,4,5-triol". Why not? Where is this name coming from, and why doesn't the name in the Wikidata record appear on Lotus (and vice versa)?
4. The chemical structure illustrated on Lotus for Q105216729 is not the same as chemical structure illustrated on PubChem for PubChem CID 14605136 (blepharin). PubChem has the structure depicted with N-C(=O). Lotus has the structure illustrated with N=C(-OH). Are these the same structure?
5. When you look at the Wikidata record for blepharin (Q105216729), you do not see the taxa listed on Lotus as containing this compound (record Q105216729 on Lotus). There is no "found in taxon P703" statement in the Wikidata record for the species listed on Lotus: Acanthus montanus, Blepharis edulis, Acanthus ebracteatus.
6. When you click on the Wikidata symbol for the reference of each of these taxon records in Lotus, you will see that the species name and the compound name "blepharin" both occur in the "main subject P921" statement for the reference (Q42783412) on Wikidata. This seems a dangerous way to make the taxon:compound link, since a publication may have multiple species and multiple compounds as its "main subject", and not all combinations of them necessarily occur. How many of the taxon:compound links in Lotus are made via this "main subject" statement in the reference? Is there a plan to transfer these links to "found in taxon P703" statements on the compound Wikidata record?
7. When I query Wikidata for all compounds in, e.g. Acanthus montanus (Q4672080), it does not return blepharin (Q105216729). The query yields 10 compounds. See query text below.

SELECT DISTINCT ?taxon ?children ?childrenLabel ?structure ?structureLabel ?structure_inchi
WHERE {
VALUES ?taxon {
wd:Q4672080 # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
}
?children (wdt:P171*) ?taxon. # Include children taxa
?structure wdt:P234 ?structure_inchi ; # Get the InChI
(p:P703/ps:P703) ?children. # Found in given taxon/taxa

SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }}

8. When I query Lotus for the text string "Acanthus montanus", the query returns 16 compounds, including Q105216729. How can I retrieve the Lotus result from Wikidata?

Add the ability to flag dubious compound

Diethyl Phthalate is listed as naturally sourced, but it is widely considered an artifact of purification/isolation in natural product.

This is one of the most common examples, but there are probably more; it could be useful to have a way to flag compounds of dubious origin for manual review.

I have Lotus running locally. How do I access it?

I want to connect to the local Docker instance of Lotus, so I can run the API query below. How do I do that?
I have both the mongo and lotus-online containers running, but I only see command line access via the Docker dashboard.

https://lotus.naturalproducts.net/api/search/simple?query=inchikey

[Screenshot, 2022-10-14, 3:11 PM, not preserved]
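The public API URL quoted in the issue should have a local equivalent once the lotus-online container publishes a port on the host. A minimal sketch, assuming the app is reachable at http://localhost:8080 (Spring Boot's default port, which may not match this project's docker-compose.yml) and that the endpoint returns JSON; `BASE_URL`, `simple_search_url` and `simple_search` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the container's web port is mapped to localhost:8080.
# Check the `ports:` mapping in docker-compose.yml or `docker ps`, then adjust.
BASE_URL = "http://localhost:8080"


def simple_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Local equivalent of https://lotus.naturalproducts.net/api/search/simple?query=..."""
    return f"{base_url}/api/search/simple?{urlencode({'query': query})}"


def simple_search(query: str, base_url: str = BASE_URL, timeout: float = 30.0):
    """Call the local API and decode the body (assumed to be JSON)."""
    with urlopen(simple_search_url(query, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `simple_search("PYQSUTLVBSTCSK-UHFFFAOYSA-N")` would query the local instance for that InChIKey. If the connection is refused, `docker ps` shows which host port, if any, is mapped to the container.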

Use of NPASS

I was wondering about your use of NPASS as a data source, as I've found a variety of inconsistencies with it in the past:

NPASS is listed as a source in your paper, so I was wondering whether you had vetted these issues somehow, as I also notice that the chloroquine/Vernonia brachycalyx issue isn't apparent in LOTUS.

Role of Wikidata in LOTUS

I want to clarify the role of Wikidata in LOTUS. Initially I thought LOTUS had curated a lot of structure-organism data, which is then disseminated via https://lotus.naturalproducts.net and also via Wikidata, and that Wikidata's role was simply dissemination. If this is the case, is it possible to extract only LOTUS-curated data from Wikidata?

I suspect this isn't the case, as on further reading the favoured approach to adding data to LOTUS is to add entries to Wikidata directly. In that case, is there any distinction between LOTUS data and Wikidata data?

Bad SDF file when downloaded after similarity search

The SDF file downloaded after a similarity search with a SMILES query (returning about 200 molecules) cannot be opened (bad file), either in R or in a desktop chemistry program such as ChemDraw.

> read.SDFset(sdf_file_path)
An instance of "SDFset" with 1 molecules
There were 26 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In FUN(X[[i]], ...) : bad key value pair found:
									  V3000
2: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
3: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
4: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
5: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
6: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
7: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
8: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
9: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
10: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
11: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
12: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
13: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
14: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
15: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
16: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
17: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
18: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
19: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
20: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
21: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
22: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
23: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0
24: In FUN(X[[i]], ...) : bad key value pair found:
									  0 0 V3000
25: In mde(x) : NAs introduced by coercion
26: In read.SDFset(sdf_file_path) :
  1 invalid SDFs detected. To fix, run: valid <- validSDF(sdfset); sdfset <- sdfset[valid]


> validSDF(sdf_file_path)

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘atomblock’ for signature ‘"character"’
In addition: Warning message:
In validSDF(sdf_file_path) : x needs to be of class SDFset
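The R warnings above repeatedly mention `V3000`, which suggests, without confirming, that the file contains V3000 records, or a mix of formats, that the reader does not expect. A quick way to check is to report the connection-table version declared on every record's counts line. A minimal sketch, assuming the standard SDF layout (three header lines, then a counts line ending in V2000 or V3000, with records separated by `$$$$`); `split_sdf`, `ctab_versions` and `RecordReport` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RecordReport:
    index: int          # 0-based position of the record in the file
    title: str          # first header line of the molfile
    ctab_version: str   # "V2000", "V3000" or "unknown"


def split_sdf(text: str) -> list[str]:
    """Split SDF text into records on `$$$$` delimiter lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # last record lacks a final $$$$
        records.append("\n".join(current))
    return records


def ctab_versions(text: str) -> list[RecordReport]:
    """Report the version declared on each record's counts line (line 4)."""
    reports = []
    for i, record in enumerate(split_sdf(text)):
        lines = record.splitlines()
        counts = lines[3] if len(lines) > 3 else ""
        version = next((v for v in ("V2000", "V3000") if v in counts), "unknown")
        reports.append(RecordReport(i, lines[0] if lines else "", version))
    return reports
```

Records reported as `unknown` are likely malformed. Separately, the `validSDF` error message itself notes that the function expects an `SDFset` object rather than a file path.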

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • Update dependency openchemlib to v8
  • Update dependency org.apache.tomcat:tomcat-catalina to v10
  • Update dependency postcss-loader to v7
  • Update dependency react-bootstrap to v2
  • Update dependency react-bootstrap-range-slider to v3
  • Update dependency react-infinite-scroll-component to v6
  • Update dependency react-markdown to v8
  • Update dependency react-router-dom to v6
  • Update dependency sass-loader to v13
  • Update dependency style-loader to v3
  • Update dependency uglifyjs-webpack-plugin to v2
  • Update dependency url-loader to v4
  • Update dependency webpack to v5
  • Update dependency webpack-cli to v5
  • Update mongo Docker tag to v6
  • Update react monorepo to v18 (major) (react, react-dom)
  • 🔐 Create all rate-limited PRs at once 🔐

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

docker-compose
docker-compose.yml
  • mongo 4.0.6
dockerfile
Dockerfile
maven
pom.xml
  • org.springframework.boot:spring-boot-starter-parent 2.2.4.RELEASE
  • org.openscience.cdk:cdk-core 2.3
  • org.openscience.cdk:cdk-atomtype 2.3
  • org.openscience.cdk:cdk-io 2.3
  • org.openscience.cdk:cdk-libiocml 2.3
  • org.openscience.cdk:cdk-formula 2.3
  • org.openscience.cdk:cdk-fingerprint 2.3
  • org.openscience.cdk:cdk-structgen 2.3
  • org.openscience.cdk:cdk-inchi 2.3
  • org.openscience.cdk:cdk-render 2.3
  • org.openscience.cdk:cdk-renderawt 2.3
  • org.openscience.cdk:cdk-renderbasic 2.3
  • org.openscience.cdk:cdk-renderextra 2.3
  • org.openscience.cdk:cdk-sdg 2.3
  • org.openscience.cdk:cdk-extra 2.3
  • org.openscience.cdk:cdk-log4j 2.3
  • org.openscience.cdk:cdk-silent 2.3
  • org.openscience.cdk:cdk-qsarcml 2.3
  • org.openscience.cdk:cdk-qsarsubstance 2.3
  • org.openscience.cdk:cdk-qsar 2.3
  • org.openscience.cdk:cdk-qsaratomic 2.3
  • org.openscience.cdk:cdk-signature 2.3
  • org.openscience.cdk:cdk-legacy 2.3
  • org.javatuples:javatuples 1.2
  • org.openscience.cdk:cdk-qsarmolecular 2.3
  • com.google.guava:guava 28.0-jre
  • org.apache.tomcat:tomcat-catalina 8.0.28
  • org.springframework.data:spring-data-rest-webmvc 3.2.4.RELEASE
  • org.jetbrains.kotlin:kotlin-maven-allopen 1.3.61
  • com.github.eirslett:frontend-maven-plugin 1.8.0
  • com.spotify:dockerfile-maven-plugin 1.3.6
  • org.apache.maven.plugins:maven-resources-plugin 3.0.1
npm
package.json
  • @fortawesome/fontawesome-svg-core ^1.2.30
  • @fortawesome/free-solid-svg-icons ^5.14.0
  • @fortawesome/react-fontawesome ^0.1.11
  • bootstrap ^4.5.0
  • formik ^2.1.5
  • jquery ^3.5.1
  • mdbreact ^4.27.0
  • openchemlib ^7.2.1
  • react ^16.13.1
  • react-bootstrap ^1.3.0
  • react-bootstrap-range-slider ^1.1.2
  • react-dom ^16.13.1
  • react-infinite-scroll-component ^5.1.0
  • react-markdown ^4.3.1
  • react-native ^0.63.2
  • react-router-bootstrap ^0.25.0
  • react-router-dom ^5.2.0
  • react-showdown ^2.1.0
  • react-star-ratings ^2.3.0
  • rest ^2.0.0
  • stompjs ^2.3.3
  • uglifyjs-webpack-plugin ^1.2.0
  • webpack ^4.43.0
  • webpack-cli ^3.3.12
  • when ^3.7.8
  • @babel/core ^7.10.5
  • @babel/plugin-proposal-class-properties ^7.10.4
  • @babel/preset-env ^7.10.4
  • @babel/preset-react ^7.10.4
  • @welldone-software/why-did-you-render ^3.3.6
  • autoprefixer ^9.8.5
  • babel-loader ^8.1.0
  • css-loader ^3.6.0
  • favicons-webpack-plugin ^3.0.1
  • file-loader ^4.2.0
  • html-loader ^0.5.5
  • node-sass ^4.14.1
  • postcss-loader ^3.0.0
  • raw-loader ^4.0.1
  • sass-loader ^8.0.0
  • style-loader ^1.2.1
  • url-loader ^2.2.0
  • xxxxx ^1.0.3

  • Check this box to trigger a request for Renovate to run again on this repository

SMILES outputs from LOTUS and WikiData

I've downloaded some metabolite data from LOTUS and am trying to cross-reference it with data from ChEMBL. It seems that one of the more reliable ways to do this would be to use the SMILES string as a key.

Looking at some examples in LOTUS, e.g. https://lotus.naturalproducts.net/compound/lotus_id/LTS0095286, the SMILES given by Wikidata are (canonical) "COC1=CC2=C(C=CN=C2C=C1)C(C3CC4CCN3CC4C=C)O" and (isomeric) "COC1=CC2=C(C=CN=C2C=C1)C@HO", neither of which appears to give a direct match in ChEMBL. In contrast, the 2D SMILES given by LOTUS for this metabolite, "C=CC1CN2CCC1CC2C(O)c1ccnc2ccc(OC)cc12", matches the ChEMBL compound (https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL15088/).

This 2D SMILES given by LOTUS generally appears to match ChEMBL, and seems to be the result of applying the RDKit method Chem.CanonSmiles(x) to the 'canonical' SMILES given in Wikidata. My question: is it possible to download this 2D SMILES directly, and if not, is my guess as to how it is generated correct?

Note, I'm downloading the data using the query:

SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structure_cas ?structure_inchikey ?organism ?organism_name WHERE {
VALUES ?taxon {
  wd:Q21754 # Gentianales
}
?organism (wdt:P171*) ?taxon;
  wdt:P225 ?organism_name.
?structure (p:P703/ps:P703) ?organism.
OPTIONAL { ?structure wdt:P235 ?structure_inchikey. }
OPTIONAL { ?structure wdt:P233 ?structure_smiles. }
OPTIONAL { ?structure wdt:P231 ?structure_cas. }
OPTIONAL { ?organism wdt:P961 ?ipniID. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000
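An alternative to matching on SMILES, whose form differs between canonicalization toolkits, is to compare the first 14-character block of the standard InChIKey. That block encodes the connectivity only, while stereochemistry is encoded in the second block, so stereo and stereo-free records of the same skeleton share it. The query above already retrieves `?structure_inchikey`. This sketch assumes both data sets provide standard InChIKeys; the function names are hypothetical, and it is a substitute technique, not how LOTUS computes its 2D SMILES. Because stereochemistry is ignored, one LOTUS entry may match several ChEMBL stereoisomers.

```python
def connectivity_block(inchikey: str) -> str:
    """Return the first (connectivity) block of a standard InChIKey."""
    block = inchikey.strip().split("-")[0]
    if len(block) != 14 or not block.isalpha() or not block.isupper():
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return block


def match_by_connectivity(lotus: dict[str, str],
                          chembl: dict[str, str]) -> dict[str, list[str]]:
    """Map each LOTUS id to the ChEMBL ids sharing its InChIKey first block.

    Both inputs map an identifier to its InChIKey.
    """
    index: dict[str, list[str]] = {}
    for chembl_id, key in chembl.items():
        index.setdefault(connectivity_block(key), []).append(chembl_id)
    return {lts: index.get(connectivity_block(key), []) for lts, key in lotus.items()}
```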

download files with source organisms

Hello,
I can see we can download certain files for the full database, but in these, I couldn't find a file that includes the source organisms.
Interestingly (and I've been using it, thank you), I can search for a genus and download an .sdf file that includes this information...

Thank you again for this database.

Returning all metabolites in a given clade, including possibly missing properties

I'm trying to extract all metabolites in a plant order, including their CAS ID, InChIKey and SMILES. When I run:

SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {
  VALUES ?taxon {
    wd:Q21754                                    # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
  }
  ?organism (wdt:P171*) ?taxon;                   # Include children taxa
                        wdt:P225 ?organism_name.  # Get organism name
  ?structure wdt:P233 ?structure_smiles;          # Get the SMILES
             (p:P703/ps:P703) ?organism.          # Found in given taxon/taxa

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000

I get 20968 results. However, when I try to include the CAS ID and InChIKey with the following query:

SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {
  VALUES ?taxon {
    wd:Q21754                                    # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
  }
  ?organism (wdt:P171*) ?taxon;                   # Include children taxa
                        wdt:P225 ?organism_name.  # Get organism name
  ?structure wdt:P233 ?structure_smiles;          # Get the SMILES
             (p:P703/ps:P703) ?organism;          # Found in given taxon/taxa
             wdt:P231 ?structureCAS;          # Get the CAS
             wdt:P235 ?structureINCHIKEY.          # Get the INCHIKEY

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000

I only get 7967 results. I imagine this might be because the latter query doesn't return instances without a CAS ID or INCHIKEY. Is it possible to return all metabolites found in taxa and leave missing values for the properties as NaN?

Originally posted by @alrichardbollans in #27 (comment)
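In SPARQL, every triple pattern outside an OPTIONAL block must match for a row to be returned, so adding the CAS (P231) and InChIKey (P235) patterns to the required block drops every structure that lacks either. Wrapping them in OPTIONAL, as the Gentianales query in the "SMILES outputs" issue does, keeps those rows with the variables unbound. A minimal sketch, with hypothetical function names, that builds such a query and turns unbound values into NaN:

```python
import math


def build_clade_query(taxon_qid: str, limit: int = 100000) -> str:
    """Query structures found in a taxon and its children; CAS/InChIKey optional."""
    if not (taxon_qid.startswith("Q") and taxon_qid[1:].isdigit()):
        raise ValueError(f"expected a Wikidata item id like 'Q21754', got {taxon_qid!r}")
    return f"""SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {{
  VALUES ?taxon {{ wd:{taxon_qid} }}
  ?organism (wdt:P171*) ?taxon;                  # Include children taxa
            wdt:P225 ?organism_name.            # Get organism name
  ?structure wdt:P233 ?structure_smiles;         # Get the SMILES (required)
             (p:P703/ps:P703) ?organism.         # Found in given taxon/taxa
  OPTIONAL {{ ?structure wdt:P231 ?structureCAS. }}       # CAS, may be missing
  OPTIONAL {{ ?structure wdt:P235 ?structureINCHIKEY. }}  # InChIKey, may be missing
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


def binding_to_row(binding: dict, variables: list[str]) -> dict:
    """Flatten one SPARQL JSON binding; unbound OPTIONAL variables become NaN."""
    return {v: binding[v]["value"] if v in binding else math.nan for v in variables}
```

For the same data, the OPTIONAL version returns every row of the SMILES-only query, possibly more if a structure carries several CAS numbers or InChIKeys.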

No mongodb data

After entering the mongo shell and dropping the NPOC2021 database, we get the following, which indicates that we have no data to restore. How do we solve this?

> db.dropDatabase()
{ "ok" : 1 }
> exit
bye
root@54ce1f4c6c12:/mongodata# cd mongodata/NPOC2021/NPOC2021/
bash: cd: mongodata/NPOC2021/NPOC2021/: No such file or directory
root@54ce1f4c6c12:/mongodata# 
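The `cd` error shows that no `mongodata/NPOC2021/NPOC2021/` directory existed relative to `/mongodata`, the current directory in the container. Before running `mongorestore`, a small helper can locate any directory that actually contains dump files. A minimal sketch, assuming the archive was unzipped somewhere under the search root and that, as with `mongodump` output, each collection is stored as a `.bson` file; `find_dump_dirs` is a hypothetical name.

```python
from pathlib import Path


def find_dump_dirs(root) -> list[Path]:
    """Return every directory under `root` that directly contains *.bson files.

    Such a directory is a candidate for `mongorestore --db=NPOC2021 ... .`.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"search root does not exist: {root}")
    return sorted({p.parent for p in root.rglob("*.bson")})
```

If it returns an empty list, the archive was probably never downloaded or extracted under that root, and the curl and unzip steps need to be run first.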

Differences between relations available on Wikidata and the LOTUS web interface

Hello,

I may have noticed some inconsistencies between the data available by SPARQL request on Wikidata and those available on the LOTUS web interface, which I understood as a mirror of what's in Wikidata.

For instance, N(6)-(1,2-dicarboxyethyl)-AMP (Q2823236) is found in Valsa sordida (Q7912606), and this relation is referenced by the publication Phosphoproteomic and Metabolomic Profiling Uncovers the Roles of CcPmk1 in the Pathogenicity of Cytospora chrysosperma (https://doi.org/10.1128/SPECTRUM.00176-22). However, this relation does not seem to be available in the LOTUS interface.

Is this just an update issue ?

Thanks for your help.

Two entries for Valencene

If you search in LOTUS for Valencene, you get two hits:
Q105219112 and
Q289496.
They look the same to the uninitiated user, at least in the hit list. The one with stereochemistry should certainly be displayed accordingly in the LOTUS hit list.

The two original entries for Valencene in Wikidata (https://www.wikidata.org/wiki/Q289496 and http://www.wikidata.org/entity/Q105219112) are different: one has species annotations but no stereochemistry, and the other has stereochemistry. The two InChIKeys are different. I guess they should still be merged in LOTUS.

How to get all compounds from Plantae

Hi, thanks for your efforts on the valuable database.
I want to get all natural compounds found in Plantae with their chemical ontology, organism taxonomy and SMILES, and I tried to download the data from MongoDB. However, I found that the data are not consistent with those on the LOTUS website.
For example, this is the organism source for LTS0257199 from MongoDB and from the website.
[Screenshots not preserved: MongoDB record and website record for LTS0257199]

I wonder how I can get the organism source in the same form as on the website, so that I can clearly see its phylum, family, genus, species, etc. Can I get all this information with a Wikidata query? If so, how can I use such a query to get all compounds from Plantae?
Thanks!
