I have a list of URLs and am trying to collect their "descriptions." By description I mean what comes up, for example, if you Googled the link. For example, http://stackoverflow.com">Google: http://stackoverflow.com shows the description as
A language-independent collaboratively edited question and answer site for programmers. Questions and answers displayed by user votes and tags.
This the data I'm trying to accumulate for the URLs I have.
I tried parsing the URL's meta-descriptions, however most of them are lacking a meta-description (yet Google and other search engines manage to get a description somehow).
Any ideas? Should I just "google" each link and scrape the data? I have a feeling Google wouldn't like this...
Thanks guys.
Different search engines have different algorithms to get the description out of the page if/when they are lacking the description meta tag. Some ignore the tag even it it's there.
If you want the description Google has, the most accurate way to get it would be to scrape it. Otherwise, you could write your own or look around on the web for code that does it.
These are called snippets.
Google use proprietary (and possibly patented) methods to garner this information, so there is no simple answer.
As you suggest, they will use meta-description information if it is there. (How to set the meta-information to help Google.)
They will also honour requests from the page authors to NOT include snippets. (How to prevent Google from displaying snippets) You should probably respect this too (as well as robots.txt, of course.)
You may have some luck with existing auto-summary packages, such as OTS.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With