@Maikuolan How about adding Browser Integrity Check?
Is it possible? That way, people couldn't retrieve page data using basic functions like:
file_get_contents($url);
What do you think about this?
This might help you:
https://github.com/ItayRosen/Browser-Integrity-Check
I'm also worried about how Google, Bing, Yandex, and Facebook would handle it.
@Dibbyo456 Sorry about the delayed reply here. The delay is because I've been looking into this, and I've found a few problems with the idea. I'm not quite ready to write it off yet, though, because I still think it's a good idea, and if it can be made to work, it could be useful for users, I think.
Creating a separate issue for this suggestion, to avoid bloating #34 too much.
The example linked to is pretty straightforward, and on the surface, this idea seems like it should be pretty simple and easy to implement. But I think you hit the nail on the head with your original concern about search engines.
Because the browser integrity check (at least in the manner originally suggested, as well as most everything else I've been able to come up with) relies on an intermediary page being served between the initial request from any particular source and serving the actually requested resource, I think there's a high probability that this would negatively impact PR and search engine listings as a whole. In particular, because the manner originally suggested and most alternatives rely on JavaScript, it's likely to trip up any bots that don't support JavaScript properly or at all, or that can't handle JavaScript in the manner required. In the case of spambots, hacktools and so on, that's a good thing, but it would likely also apply to a number of search engines, which is obviously a bad thing.
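For reference, the kind of intermediary page I mean would look roughly like this. This is just a sketch, not anything CIDRAM actually implements; the cookie name, secret, and hashing scheme are all made up for illustration. The key point is that a client without working JavaScript (or cookies) never gets past the interstitial:

```php
<?php
// Sketch of a JavaScript-based browser integrity check (illustrative only;
// not CIDRAM's actual API). A token is derived per visitor; JavaScript on
// the interstitial page stores it as a cookie and reloads, after which the
// server lets the request through.

$cookieName = 'bic_token';
$expected = hash('sha256', $_SERVER['REMOTE_ADDR'] . date('Ymd') . 'some-secret');

if (isset($_COOKIE[$cookieName]) && $_COOKIE[$cookieName] === $expected) {
    // Cookie checks out; fall through and serve the requested resource.
    return;
}

// Otherwise, serve the interstitial: JavaScript sets the cookie and reloads.
echo '<!DOCTYPE html><html><head><title>Checking your browser...</title>';
echo '<script>document.cookie="' . $cookieName . '=' . $expected . ';path=/";';
echo 'location.reload();</script></head>';
echo '<body><noscript>Please enable JavaScript to continue.</noscript></body></html>';
exit;
```

A plain `file_get_contents($url);` call would only ever see the interstitial HTML, never the real page, which is the desired effect for scrapers, but exactly the problem for any search engine crawler that doesn't execute JavaScript.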
I'd briefly considered checking all requests against a whitelist prior to serving the browser integrity check, so as not to serve it at all for search engines, good bots and such. The problem there, though, is that the number of things we'd want on the whitelist is so large, not just search engines, but also other potentially good bots, social network share bots, previewers, analytics tools, uptime checkers and so on, that I suspect it would effectively double the workload required for maintaining CIDRAM, could be untenable in terms of false positives and the time required to fix and prevent them, and might be better suited as a secondary project, or maybe an entirely separate package or something. Any such whitelist would also likely be constantly evolving over time, as new services emerge on the web, old services go offline, brand names change, user agents change and so on, so a simpler solution would probably be preferable, I think.
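To illustrate why the whitelist route balloons: even a crude version would need a long, ever-changing list of user agent needles, and user agents can be spoofed, so a serious implementation would also need verified IP/CIDR checks per service. The function name and the (deliberately incomplete) list below are just for illustration:

```php
<?php
// Rough sketch of a user agent whitelist check (hypothetical; not part of
// CIDRAM). Returns true when the browser integrity check should be skipped.
// Note: user agents are trivially spoofable, so a real implementation would
// also have to verify the request's IP against each service's published
// ranges, multiplying the maintenance burden further.

function shouldSkipIntegrityCheck(string $userAgent): bool
{
    $goodBotNeedles = [
        'googlebot',
        'bingbot',
        'yandex',
        'facebookexternalhit',
        'twitterbot',
        'applebot',
        'duckduckbot',
        // ...and hundreds more, constantly changing over time.
    ];
    $userAgent = strtolower($userAgent);
    foreach ($goodBotNeedles as $needle) {
        if (strpos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}
```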
Anyway, I'll keep this issue open, in case anyone has any suggestions, ideas, etc, that might be helpful in regard to the issue.