ChemExper Blocking #13
A short update: some structures are still found without ChemExper, so it seems to be a coincidence that none of the first 150 were :)
When reading log files, I often had the impression that ChemExper blocks access once repeated requests are detected. However, I hoped that the other suppliers would be enough in such a case. MOLfiles can be loaded from Acros, Cactus, ChemicalBook, Fluorochem, NIST and PubChem, so there are multiple sources. MSDS can be loaded from Acros, Activate, Alfa, Apollo, Biosolve, carbolution, Carl Roth, Cayman, Fisher, Fluorochem, ITW/Applichem, Merck, Oakwood and Strem. The blocking is a bit sad, as Acros has many substances, good-quality data, MOLfiles and MSDS... In other cases of IP address blocking, proxy services like http://anonymouse.org/cgi-bin/anon-www_de.cgi/http://sciformation.com may help, but I am not sure whether we should get into this.
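The fallback idea described above (several sources per data type, so one blocked supplier is not fatal) can be sketched as follows: query each source in turn and stop at the first one that returns a MOLfile. This is a minimal Python sketch under stated assumptions, not open_enventory's actual (PHP) implementation; the lookup functions, their return convention (a MOLfile string or `None`) and the use of the built-in `TimeoutError` to signal a blocked supplier are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical supplier lookup: takes a CAS number, returns a MOLfile string or None.
Lookup = Callable[[str], Optional[str]]


def first_molfile(cas_nr: str,
                  sources: Sequence[Tuple[str, Lookup]]) -> Optional[Tuple[str, str]]:
    """Try each (name, lookup) pair in order; return (name, molfile) from the first hit."""
    for name, lookup in sources:
        try:
            molfile = lookup(cas_nr)
        except TimeoutError:
            # A blocked supplier usually shows up as a timeout: move on to the next source.
            continue
        if molfile:
            return name, molfile
    return None  # no source had a structure for this CAS number


if __name__ == "__main__":
    def blocked(cas_nr: str) -> Optional[str]:
        raise TimeoutError("no response")

    def no_hit(cas_nr: str) -> Optional[str]:
        return None

    def found(cas_nr: str) -> Optional[str]:
        return f"dummy MOLfile for {cas_nr}"

    print(first_molfile("64-17-5", [("Acros", blocked), ("NIST", no_hit), ("PubChem", found)]))
    # ('PubChem', 'dummy MOLfile for 64-17-5')
```

The order of `sources` encodes the preference between suppliers; the sketch does not avoid the time spent waiting for each timeout, it only keeps a blocked source from stopping the lookup.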
Thanks for your answer! I got the same impression: the first few requests go fine, then the blocking starts. Indeed, the other suppliers are sufficient for most of the molecules. A downside, however, is the long time it takes when being blocked, as several hundred (or thousand) timeouts add up. I tried to deactivate Acros in the
As a side question: we only see the following suppliers in our
Is there a setting we are overlooking that would also add the others to the list?
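One way to keep hundreds of timeouts from adding up is a per-supplier circuit breaker: after a few consecutive timeouts the supplier is skipped immediately, and it is only tried again after a cool-down, which fits the observation that the blocking resets after some hours. The sketch below is hypothetical Python; the thread does not mention such a setting in open_enventory, and the threshold, the 4-hour cool-down and the assumption that lookups raise the built-in `TimeoutError` are illustrative choices.

```python
import time
from typing import Callable, Dict, Optional


class SupplierBreaker:
    """Skip a supplier after `max_timeouts` consecutive timeouts until `cooldown_s` seconds pass."""

    def __init__(self, max_timeouts: int = 3, cooldown_s: float = 4 * 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_timeouts = max_timeouts
        self.cooldown_s = cooldown_s              # assumed cool-down, not a documented value
        self.clock = clock                        # injectable so the logic can be tested quickly
        self.failures: Dict[str, int] = {}        # consecutive timeouts per supplier
        self.blocked_until: Dict[str, float] = {} # supplier -> earliest retry time

    def allowed(self, supplier: str) -> bool:
        """Return True if the supplier may be queried now."""
        until = self.blocked_until.get(supplier)
        if until is None:
            return True
        if self.clock() >= until:
            # Cool-down is over: forget the old failures and try the supplier again.
            del self.blocked_until[supplier]
            self.failures[supplier] = 0
            return True
        return False

    def call(self, supplier: str, lookup: Callable[[str], Optional[str]],
             cas_nr: str) -> Optional[str]:
        """Run lookup(cas_nr) unless the supplier is cooling down; timeouts count as failures."""
        if not self.allowed(supplier):
            return None  # skipped immediately instead of waiting for yet another timeout
        try:
            result = lookup(cas_nr)
        except TimeoutError:
            count = self.failures.get(supplier, 0) + 1
            self.failures[supplier] = count
            if count >= self.max_timeouts:
                self.blocked_until[supplier] = self.clock() + self.cooldown_s
            return None
        self.failures[supplier] = 0  # any answer resets the streak
        return result
```

In a batch run, routing every supplier lookup through `breaker.call(...)` means a blocked source costs at most `max_timeouts` timeouts per cool-down period instead of one per molecule.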
I had the same experience: ChemExper was temporarily blocked after several attempts. As Felix said, there are also many other sources for structures and SDS.
Lines 309 to 325 in 6198356
// Khoi: removing Sigma and Acros because the scraping scripts for these two sites do not work and just take time
unset($addInfo[1]); // removing Acros
// unset($addInfo[4]); // removing Sigma; update 2019-07-26, Sigma search is working on A2hosting server now
// unset($addInfo[6]); // removing chemicalBook
This has worked for me, but @rudolphi can tell you the best way.
@lcnittl: I also wrote a couple of Python scripts that scrape structures and SDS from the internet and add that information to OE. They look into your OE database of interest, find the molecules (by CAS#) with a missing structure or SDS, and then scrape that information from the internet. You need Python on your hosting server and root access to that server. If you are interested, please let me know and I can share those scripts with you.
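The first step of such a script, finding molecules that have a CAS number but no stored structure, could look roughly like the query below. This is a hypothetical sketch, not khoivan88's actual code: the `molecule` table and its `cas_nr` and `molfile` columns are a simplified stand-in for the real OE schema, and Python's built-in `sqlite3` replaces the real database server so the example is self-contained.

```python
import sqlite3
from typing import List

# Hypothetical, simplified schema; the real open_enventory tables differ.
SCHEMA = """
CREATE TABLE molecule (
    molecule_id INTEGER PRIMARY KEY,
    cas_nr      TEXT,
    molfile     TEXT  -- NULL or empty when no structure is stored
);
"""


def cas_missing_structure(conn: sqlite3.Connection) -> List[str]:
    """Return CAS numbers of molecules that have a CAS number but no stored structure."""
    rows = conn.execute(
        "SELECT cas_nr FROM molecule "
        "WHERE cas_nr IS NOT NULL AND cas_nr <> '' "
        "AND (molfile IS NULL OR molfile = '') "
        "ORDER BY molecule_id"
    )
    return [cas for (cas,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO molecule (cas_nr, molfile) VALUES (?, ?)",
        [("64-17-5", "dummy molfile"), ("67-64-1", None), ("", None)],
    )
    print(cas_missing_structure(conn))  # ['67-64-1']
```

Each CAS number returned would then be passed to the scraping step; molecules without a CAS number are skipped because there is nothing to search for.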
@khoivan88 Thanks for your input. I think I will indeed go with option 2b. Concerning the Python scripts: if you are willing to give them away, I would certainly not say no :)
@lcnittl: Here is the link to my Python script that searches for missing structures: https://github.com/khoivan88/update_sql_mol. The required packages can be installed as described in the repo. I have another script to update SDS, but I have not uploaded it to GitHub yet; I will do that and send you the link later. Update (2020-01-18): the newest version of this script should work without the extra manual Batch Processing step. I have updated the instructions in the repo as well.
@lcnittl: so this is the link to the script for updating missing SDS. It runs very similarly to the Python script for updating MOLfiles, except that you just run this script and are done; no second step is required. As usual, if there is any problem, please let me know. PS: I forgot to say that both Python scripts are made for OE hosted on Linux (specifically CentOS 7); if you host it on a different system such as Mac or Windows, you might want to change the
@khoivan88 Thanks for the scripts, they are very much appreciated. I will have a look at them within the next few days. As for the OS: no problem, we are running on Debian (containerized, so I will still have a look) :)
Is it possible that ChemExper blocks an IP that sends too many requests? We were running an inventory-wide "Batch processing" with "Read data from suppliers". The first few entries go fine, then requests sent to ChemExper time out. To check whether ChemExper was down, we cURLed it from another host: no problem reaching it. After waiting some hours, the blocking seems to be reset. I guess there is no possible workaround for this? And did I deduce correctly that structures are fetched from ChemExper? At least no structures were generated when we deactivated the use of ChemExper by setting
$GLOBALS["suppliers"]["acros"]["alwaysProcDetail"]
to false in open_enventory/suppliers/Acros.php.
Lines 29 to 38 in 6198356
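If the blocking is triggered by request rate, as suspected here, spacing out requests on the client side may avoid it. The minimal sketch below enforces a minimum interval between consecutive requests to one host. It is hypothetical Python, not part of open_enventory, and the interval is an assumption: ChemExper's actual limits are not known from this thread.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Wait until at least `min_interval_s` seconds have passed since the previous request."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval_s = min_interval_s  # assumed spacing, tune to the remote site
        self.clock = clock                    # injectable clock and sleep for testing
        self.sleep = sleep
        self.last: Optional[float] = None     # time of the previous request, None at first

    def wait(self) -> float:
        """Sleep if needed before the next request; return the number of seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.last is not None:
            delay = max(0.0, self.last + self.min_interval_s - now)
        if delay > 0:
            self.sleep(delay)
        self.last = now + delay  # the request is considered sent right after waiting
        return delay
```

Calling `throttle.wait()` right before each request to the same supplier makes a batch run slower but steadier, which trades a predictable delay for fewer blocked periods, assuming the blocking really is rate-based.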