Java Forum / General / April 2006

Sending a (UTF-8) query to Google search engine

Kevin - 13 Apr 2006 16:06 GMT
Hi, All! I have spent days now trying to get a simple program to
work.

I want to query Google with queries that contain Unicode (Chinese,
Japanese) characters, using URLConnection.

Try this in your favorite browser:

http://www.google.com/search?hl=en&lr=&q=%e6%96%b0%e5%ae%bf+&start=0&sa=N

If you go to your browser (choose View->Encoding), you will see that
the browser automatically set it to UTF-8. If you manually change it
to ISO and type in the URL again, then the search returns wrong
results.

It seems that I need to set the HTTP request correctly to UTF-8.   How
do I do that?

I am using the following code, and it is NOT working.

**************************************************************
URL urlObject = new URL(url);
HttpURLConnection con = (HttpURLConnection)urlObject.openConnection();
con.setRequestProperty ( "User-Agent","Mozilla/4.71 [en] (WinNT; I)");
con.setRequestProperty("Content-Type", "x-www-form-urlencoding; charset=UTF8");
con.setRequestProperty("Content-Encoding", "UTF8");
System.out.println(con.getRequestProperty("Content-Type")) ;
BufferedReader webData = new BufferedReader(
        new InputStreamReader(con.getInputStream(), "UTF8"));
**************************************************************

Thanks!

Kevin
Roedy Green - 13 Apr 2006 20:06 GMT
>It seems that I need to set the HTTP request correctly to UTF-8.   How
>do I do that?

Google has a search parm for the encoding, separate from the HTTP
header.

Try going to the Google website and using their tools to build your
search boxes. They will likely include it.

Also just look at the URL Google constructs when you use one of the
search boxes on their site. e.g.

http://www.google.ca/search?client=opera&rls=en&q=simple+serial+Javax&sourceid=opera&ie=utf-8&oe=utf-8


Note the ie and oe parms. I presume one describes the encoding of the
URL and one describes the desired encoding of the response.

Seems a bit odd to have a parm to control the encoding come after the
data it describes, though.
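
To make that concrete, here is a minimal sketch of building such a URL in Java. It assumes the ie/oe parms mean what is presumed above (input and output encoding), which this thread does not confirm. The class and method names are made up for illustration, and the Charset overload of URLEncoder.encode needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class GoogleQueryUrl {
    // Percent-encodes the query text as UTF-8 and appends it to a search URL.
    // The ie/oe values are an assumption taken from the URL quoted above.
    public static String buildSearchUrl(String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "http://www.google.com/search?hl=en&ie=utf-8&oe=utf-8&q=" + encoded;
    }

    public static void main(String[] args) {
        // \u65b0\u5bbf is the two-character query from the original post.
        System.out.println(buildSearchUrl("\u65b0\u5bbf"));
    }
}
```

The encoded query comes out as %E6%96%B0%E5%AE%BF, the same bytes as in the URL from the original post, only with upper-case hex digits.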

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

NOBODY - 20 Apr 2006 01:30 GMT
> Hi, All!   I am spending days now trying to get a simple program to
> work.
[quoted text clipped - 27 lines]
> BufferedReader webData = new BufferedReader(new
> InputStreamReader(con.getInputStream(), "UTF8"));

In the HTTP/URL specs, the URI (the part of the URL after the server's
host[:port]) cannot contain raw UTF-8 characters; anything outside
ASCII has to be percent-encoded (unless some new IRI spec is considered,
but that is not the point here since most servers aren't there yet).
To state the charset explicitly you would have to POST to the server,
not GET (change the method on HttpURLConnection).

In HTTP, nothing in a GET request tells the server which charset the
percent-encoded query bytes are in, and many servers assume ISO-8859-1
(standard Latin-1). Some servers can understand a UTF-8 URI, but don't
count on that. Setting the request Content-Type is futile, as there is
no content sent on a GET.
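
A small sketch of why the server's guess matters: the same text percent-encodes to different bytes depending on the charset, and the query string itself does not say which one was used. The class name and sample word are made up for illustration, and the Charset overload of URLEncoder.encode needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QueryCharsetDemo {
    // Percent-encodes the text using the bytes of the given charset.
    public static String encode(String text, Charset charset) {
        return URLEncoder.encode(text, charset);
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "cafe" with an acute accent on the e
        // Two bytes for the accented letter in UTF-8, one byte in Latin-1.
        System.out.println(encode(word, StandardCharsets.UTF_8));      // caf%C3%A9
        System.out.println(encode(word, StandardCharsets.ISO_8859_1)); // caf%E9
    }
}
```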

On a POST, however, you can set the type to
       application/x-www-form-urlencoded; charset="UTF-8"
and in the body, send your params encoded as UTF-8 and then URL-encoded
(see java.net.URLEncoder). Now, you just typed
...charset=UTF8
instead of (try these, I can't remember which is good)
...charset=\"UTF8\""
...charset=\"UTF-8\""
And don't fiddle with Content-Encoding. It describes transformations
such as gzip compression, not the character set, so I don't think you
should touch it (read the HTTP RFC carefully for the meaning of each
header).
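
A rough sketch of such a POST with HttpURLConnection follows. The class and method names are made up, the endpoint is whatever URL you pass in, and nothing in this thread confirms that Google's search page accepts POST, so treat it as an illustration of the mechanics only. It uses the unquoted charset=UTF-8 form and the Java 10+ Charset overload of URLEncoder.encode.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormPost {
    // Joins the params as name=value pairs, each percent-encoded from UTF-8 bytes.
    public static String buildBody(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Sends the params as a form POST and returns the HTTP status code.
    public static int post(String endpoint, Map<String, String> params) throws IOException {
        // The encoded body is plain ASCII, so US_ASCII is enough here.
        byte[] payload = buildBody(params).getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection con = (HttpURLConnection) new URL(endpoint).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true); // we are going to write a request body
        con.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = con.getOutputStream()) {
            out.write(payload);
        }
        return con.getResponseCode();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "\u65b0\u5bbf");
        params.put("hl", "en");
        // Only prints the body; call post(...) with a server of your own to send it.
        System.out.println(buildBody(params));
    }
}
```

Note that no Content-Encoding header is set anywhere.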

Finally, the browser's View->Encoding setting is totally useless here,
as it is only a rendering setting (although it may affect the
content-type charset when clicking). It just forces the browser to read
the bytes in a given encoding (and lets it fail if there are any
errors).
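
What that rendering setting does can be mimicked in a few lines: the bytes stay the same and only the charset used to read them changes. This is a sketch with made-up names, not what any particular browser does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ViewEncodingDemo {
    // Reads the same bytes as text in the given charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u65b0\u5bbf";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8); // 6 bytes

        // Reading the bytes as UTF-8 gives the two original characters back.
        System.out.println(decode(raw, StandardCharsets.UTF_8).equals(original)); // true

        // Reading them as Latin-1 turns every byte into its own character.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length()); // 6
    }
}
```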

