jsoup : Send search query to Google

This example shows you how to use jsoup to send a search query to Google.


	Document doc = Jsoup
		.connect("https://www.google.com/search?q=mario")
		.userAgent("Mozilla/5.0")
		.timeout(5000).get();
Unusual traffic from your computer network
Don’t use this example to spam Google. If you do, Google will respond with the message above; read this Google answer.

1. jsoup example

Example to send a “mario” search query to Google, parse the search results, and filter out the domain names.

FunnyCrawler.java

package com.mkyong;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FunnyCrawler {

  private static final String DOMAIN_NAME_PATTERN 
	= "([a-zA-Z0-9]([a-zA-Z0-9\\-]{0,61}[a-zA-Z0-9])?\\.)+[a-zA-Z]{2,6}";
  // compiled once and shared; Pattern is immutable and thread-safe
  private static final Pattern patternDomainName 
	= Pattern.compile(DOMAIN_NAME_PATTERN);
	
  public static void main(String[] args) {

	FunnyCrawler obj = new FunnyCrawler();
	Set<String> result = obj.getDataFromGoogle("mario");
	for(String temp : result){
		System.out.println(temp);
	}
	System.out.println(result.size());
  }

  // returns the first domain-like substring of url, or "" if none is found
  public String getDomainName(String url){
		
	String domainName = "";
	// Matcher is kept local because it holds per-call state
	Matcher matcher = patternDomainName.matcher(url);
	if (matcher.find()) {
		domainName = matcher.group(0).toLowerCase().trim();
	}
	return domainName;
		
  }
	
  private Set<String> getDataFromGoogle(String query) {
		
	Set<String> result = new HashSet<String>();	
	String request = "https://www.google.com/search?q=" + query + "&num=20";
	System.out.println("Sending request..." + request);
		
	try {

		// the URL must include the protocol (http:// or https://);
		// the user agent below imitates Googlebot
		Document doc = Jsoup
			.connect(request)
			.userAgent(
			  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
			.timeout(5000).get();

		// get all links
		Elements links = doc.select("a[href]");
		for (Element link : links) {

			String temp = link.attr("href");		
			if(temp.startsWith("/url?q=")){
				// use regex to get domain name
				result.add(getDomainName(temp));
			}

		}

	} catch (IOException e) {
		e.printStackTrace();
	}
		
	return result;
  }

}

Output


Sending request...https://www.google.com/search?q=mario&num=20

www.imdb.com
www.mariobatali.com
www.freemario.org
www.mariogames.be
mario.wikia.com
stabyourself.net
webcache.googleusercontent.com
www.youtube.com
www.huffingtonpost.com
www.mariowiki.com
mario.lancashire.gov.uk
amirulhafiz.deviantart.com
www.mariohugo.com
mariofoods.com
mario.nintendo.com
www.mario2u.com
www.botta.ch
en.wikipedia.org
www.mariotestino.com
www.hubmario.com
www.mariolemieux.org
pouetpu.pbworks.com
23
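
To see why the output contains bare host names, here is a minimal, jsoup-free sketch of the domain-extraction step on its own. It reuses the regular expression from FunnyCrawler; the class name DomainNameDemo and the sample href strings are made up for illustration and are not real Google output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainNameDemo {

    // Same expression as DOMAIN_NAME_PATTERN in FunnyCrawler.
    private static final Pattern DOMAIN = Pattern.compile(
        "([a-zA-Z0-9]([a-zA-Z0-9\\-]{0,61}[a-zA-Z0-9])?\\.)+[a-zA-Z]{2,6}");

    // Returns the first domain-like substring, lower-cased, or "" if none is found.
    static String getDomainName(String url) {
        Matcher m = DOMAIN.matcher(url);
        return m.find() ? m.group(0).toLowerCase() : "";
    }

    public static void main(String[] args) {
        // Hypothetical href values in the "/url?q=..." shape the crawler filters on.
        System.out.println(getDomainName("/url?q=http://www.imdb.com/title/tt0000000/&sa=U"));
        System.out.println(getDomainName("/url?q=http://en.wikipedia.org/wiki/Mario&sa=U"));
        // No dot anywhere, so nothing matches and "" is returned.
        System.out.println(getDomainName("/search?q=mario"));
    }
}
```

Note that the pattern caps the last label at six letters, so a longer top-level domain would be cut short. Loosening `{2,6}` is one option if that matters for your results.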

About Author

Founder of Mkyong.com, loves Java and open source stuff. Follow him on Twitter. If you like my tutorials, consider making a donation to these charities.

Comments

15 Comments
neha
6 years ago

Hi, thank you so much for the useful program. I am curious to know whether jsoup returns the results in the same order in which an incognito search would return them, so that while iterating we get the position at which a particular link was found, which could be equivalent to its page rank.
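
On the ordering question, one thing is certain from the tutorial code itself: it collects domains in a HashSet, which does not guarantee any iteration order, so positions are lost regardless of what jsoup returns. The sketch below shows one way to keep first-seen order with a LinkedHashSet. The class name OrderedResultsDemo and the sample list are assumptions for illustration. Whether the position on the fetched page matches what a browser shows, or says anything about PageRank, is not something this code can establish.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderedResultsDemo {

    // Removes duplicates while keeping the order in which items were first seen.
    static List<String> inFirstSeenOrder(List<String> domains) {
        return new ArrayList<>(new LinkedHashSet<>(domains));
    }

    public static void main(String[] args) {
        // Hypothetical domains in the order the links were encountered.
        List<String> seen = Arrays.asList(
            "www.imdb.com", "en.wikipedia.org", "www.imdb.com", "www.youtube.com");

        List<String> ranked = inFirstSeenOrder(seen);
        for (int i = 0; i < ranked.size(); i++) {
            // Position is 1-based: the first distinct domain gets 1.
            System.out.println((i + 1) + ". " + ranked.get(i));
        }
    }
}
```

In FunnyCrawler, the equivalent change would be to create the result set as a LinkedHashSet instead of a HashSet.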

sunny sharma
6 years ago

Great

Jyothish Jose
8 years ago

Hi,
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=400. I get this error all the time.

D.hou
8 years ago

Hi,
It’s very interesting. But I would like to ask a question, if possible. I want to get search results from Google using an Arabic keyword with the Java API, but I get an error when I run the program. I think it is because the UTF-8 charset doesn’t work. Can you help me? Thanks in advance.
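
The tutorial concatenates the raw query into the URL, so spaces and non-ASCII keywords are sent unencoded. A minimal sketch of percent-encoding the query with the standard library's URLEncoder follows. The class name QueryEncodingDemo and the method buildRequest are made up for illustration, and it is only an assumption that encoding is what causes the error described above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncodingDemo {

    // Builds the search URL with the query percent-encoded as UTF-8.
    static String buildRequest(String query) {
        try {
            return "https://www.google.com/search?q="
                + URLEncoder.encode(query, "UTF-8") + "&num=20";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A space becomes "+" in the encoded query.
        System.out.println(buildRequest("super mario"));
        // An Arabic keyword, written with Unicode escapes so the source file stays ASCII.
        System.out.println(buildRequest("\u0645\u0627\u0631\u064A\u0648"));
    }
}
```

In FunnyCrawler, the same call could wrap `query` where the `request` string is built.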

Shashank Makkar
8 years ago

Hi,

Thanks for sharing this. But I am in a dilemma about whether to use their interface like “/search”, as according to Google it’s considered illegal.
I have also checked their robot.txt file: http://www.google.com/robot.txt
interface: /search is not allowed.

So if I use this interface 10 million times in my Java program, it will definitely create network congestion for Google (particularly on this exposed interface) and then a problem for me, won’t it?

But before that, please assist me with the screen scraping I am doing.
I am trying to fetch data that Google provides up front, such as a word’s meaning, using jsoup:

https://www.google.in/?gws_rd=ssl$#q=pretend+meaning

Thanks in Anticipation

abdielcs
8 years ago

Apparently what you’re looking for is http://www.faroo.com. Maybe not as good as google but at least it’s free and 1 million queries/month.

Philip Vero
8 years ago

Very nice and comprehensive. I have one question, though, about this line of code:

String request = "https://www.google.com/search?q=" + query + "&num=20";
I assume `num=20` is the number of retrieved URLs, yet when I set it to 3 it returns 11. Why is there no direct correspondence, and how could I retrieve only 3 URLs? Thanks in advance.

Philip Vero
8 years ago
Reply to  Philip Vero

I found the reason: multiple URLs are transferred. Try to minimise the output by telling jsoup
Elements tag = doc.getElementsByTag("h3");
Elements links = tag.select("a[href]");

because Google uses an h3 tag for each title. That way you will get exactly the URLs you are looking for.

Garth
9 years ago

How do you retrieve the full link from the search? Eg( https://mkyong.com/java/jsoup-send-search-query-to-google/) instead of http://www.mkyong.com. ? Appreciate any light on the matter 😀

t t
4 years ago
Reply to  Garth

I use this, but it does not work for all websites.
Replace getDomainName with this:
public String getDomainName(String url) {
	String domainName = url.replace("/url?q=", "");
	int d = domainName.indexOf("&");
	domainName = domainName.substring(0, d);
	return domainName;
}
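
One reason the snippet above can fail is that `indexOf` returns -1 when the href contains no `&`, which makes `substring(0, d)` throw. A hedged variant that guards against that case and also percent-decodes the result is sketched below. The class name TargetUrlDemo and the method getTargetUrl are made up for illustration, and the `/url?q=` prefix is taken from the tutorial code rather than from any documented Google format.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class TargetUrlDemo {

    private static final String PREFIX = "/url?q=";

    // Returns the full target URL from a "/url?q=..." href, or "" for other hrefs.
    static String getTargetUrl(String href) {
        if (!href.startsWith(PREFIX)) {
            return "";
        }
        String target = href.substring(PREFIX.length());
        int amp = target.indexOf('&');
        if (amp >= 0) {
            // Drop the trailing parameters that follow the target URL.
            target = target.substring(0, amp);
        }
        try {
            return URLDecoder.decode(target, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical href values for illustration.
        System.out.println(getTargetUrl(
            "/url?q=https://mkyong.com/java/jsoup-send-search-query-to-google/&sa=U"));
        System.out.println(getTargetUrl("/url?q=https://example.com/a%3Fb%3D1"));
    }
}
```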

new programmer
9 years ago

Hi,

I’m unable to use the jsoup.jar file with the source code above.

I have downloaded jsoup and have it in the desktop along with the class file.

Would you be so kind as to outline the steps in using the library with this code?

Kind regards.

t t
4 years ago
Reply to  new programmer

If you are using NetBeans, just right-click the Libraries folder and then click Add JAR/Folder.

T
9 years ago

Very nice google scrape of a specific number of results with jsoup java. Thanks!

Digvijay Bhakuni
9 years ago

Works great