Discussion:
Problem with google search
root
2019-06-02 21:03:21 UTC
Recently google search returns an html page that
cannot be rendered correctly by either lynx or w3m.
Chrome has no problem with the pages.

In particular try the search:

http://www.google.com/search?q=blood+pressure

Under Chrome a link to NIH appears near the top.

Under either lynx or w3m the link for NIH
is rendered:
[21]High Blood Pressure - National Institute on Aging - NIH
and is not a link to a URL.

None of what come up as links do so under either
lynx or w3m.

If you have a console work-around to access google
search please let me know.

Thanks.
root
2019-06-02 22:22:21 UTC
Post by root
Recently google search returns an html page that
cannot be rendered correctly by either lynx or w3m.
Chrome has no problem with the pages.
http://www.google.com/search?q=blood+pressure
Under Chrome a link to NIH appears near the top.
Under either lynx or w3m the link for NIH
[21]High Blood Pressure - National Institute on Aging - NIH
and is not a link to a URL.
None of what come up as links do so under either
lynx or w3m.
If you have a console work-around to access google
search please let me know.
Thanks.
Problem somewhat solved. The links are there, they
are just invisible. It is as if they were written
but ended in CRLF and overwritten.

The search results come back as a series of short
paragraphs. Just click on the line above any
one of the paragraphs.
Mike Spencer
2019-06-03 06:00:09 UTC
Post by root
Problem somewhat solved. The links are there, they
are just invisible. It is as if they were written
but ended in CRLF and overwritten.
The search results come back as a series of short
paragraphs. Just click on the line above any
one of the paragraphs.
Google.ca [1] has changed the format of "hit" URLs in search results.
For a while -- a few days -- I was seeing what you describe above.

All my Gwgle search results are run through a perl script before the
browser renders them to tweak them to my satisfaction. I scrutinized
a saved page of results and found a couple of places where my regexps
no longer matched google's locutions. Fixed that and rendering of
search results returned to status quo ante.

Can't say for sure that the format changes I found were the cause of
the blocky paragraphs and messed up links. Maybe Gwglw broke
something, then fixed it during the couple of days I was messing with
my script.

On my system [2], Google search for blood pressure (as separate words)
returns:

High Blood Pressure - National Institute on Aging - NIH

https://www.nia.nih.gov/health/high-blood-pressure

among others, the latter line as a clickable link.

[1] AFAICT and the last time I looked, google.com redirects my IP
address(es) to google.ca. That messed up part of my script so I
now send searches directly to google.ca. Shpx knows what
difference, if any, that may make.

[2] Sending User-agent request header of:

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

FWIW,
--
Mike Spencer Nova Scotia, Canada
root
2019-06-03 13:40:53 UTC
Post by Mike Spencer
All my Gwgle search results are run through a perl script before the
browser renders them to tweak them to my satisfaction. I scrutinized
a saved page of results and found a couple of places where my regexps
no longer matched google's locutions. Fixed that and rendering of
search results returned to status quo ante.
I am interested in how you operate on the search results. Have
you wired into the browser or do you fetch the search page
and pipe it into the browser through the perl script?

I do the latter with google news. Even though I say I run w3m
what I use is my heavily modified version of w3m of over
a decade ago.
Post by Mike Spencer
On my system [2], Google search for blood pressure (as separate words)
High Blood Pressure - National Institute on Aging - NIH
https://www.nia.nih.gov/health/high-blood-pressure
That link comes back using either lynx or w3m. In the html
source the link is:

/url?q=https://www.nia.nih.gov/health/high-blood-pressure&sa=U&ved=2ahUKEwj1rLuT4cbiAhUK8LwKHRLGB0cQFjAAegQIIBAE&usg=AOvVaw2aCWwRNevvZbD7MP0wn7nP

to which you have to add the prefix https://www.google.com
so it becomes a redirection through google to nih.
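
Adding that prefix is ordinary relative-URL resolution. A minimal sketch in Python (not the Perl used elsewhere in this thread); the name absolute_link and the base URL are assumptions for illustration, with the base taken from the prefix mentioned above:

```python
from urllib.parse import urljoin

# Assumed address of the results page the relative link was found on.
SEARCH_PAGE = "https://www.google.com/search?q=blood+pressure"


def absolute_link(href, base=SEARCH_PAGE):
    """Resolve an href from the page source against the page's own URL."""
    # A leading "/" replaces the path of the base; scheme and host are kept.
    return urljoin(base, href)
```

An href that is already absolute comes back unchanged, so the same call can be applied to every link on the page.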

I never bothered to look into the html source coming from google
until the recent change.
Post by Mike Spencer
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
I found that if I use w3m with that user agent, nasdaq blocks my
traffic. nasdaq is important to me so I stick with UA=w3m/0.5.3

Thanks for responding Mike.
Mike Spencer
2019-06-03 21:03:15 UTC
Post by root
Post by Mike Spencer
All my Gwgle search results are run through a perl script before the
browser renders them to tweak them to my satisfaction. I scrutinized
a saved page of results and found a couple of places where my regexps
no longer matched google's locutions. Fixed that and rendering of
search results returned to status quo ante.
I am interested in how you operate on the search results. Have
you wired into the browser or do you fetch the search page
and pipe it into the browser through the perl script?
+ Browser home page is a local file, which has a link to

+ A local copy of the Gwgle advanced search page saved years ago.

+ On that page the ACTION attribute of the search FORM points to
a cgi-bin script on localhost which

+ calls wget [1] to send the search string to Gwgle with returned
results piped to stdout (viz. an open perl file handle) and into a
variable.

+ After the entire returned data is accumulated in a perl variable,
that data is edited with regular expressions to suit me.
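
The fetch-then-edit steps above could be sketched roughly as follows. This is Python with urllib standing in for the wget-plus-Perl setup described, not the actual script; the function names and EXAMPLE_EDITS are hypothetical, the host and User-Agent string are the ones mentioned earlier in the thread, and whether Google still answers this request the same way is an assumption.

```python
import re
import urllib.parse
import urllib.request

# User-Agent string quoted earlier in the thread.
USER_AGENT = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"

# Hypothetical edit list: (compiled pattern, replacement) pairs.
EXAMPLE_EDITS = [
    (re.compile(r"<noscript>.*?</noscript>", re.DOTALL | re.IGNORECASE), ""),
]


def build_search_request(query, host="www.google.ca"):
    """Build (but do not send) the HTTP request for a search."""
    url = f"https://{host}/search?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_page(request, timeout=30):
    """Send the request and return the whole response body as one string."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def edit_page(page, edits):
    """Apply each (pattern, replacement) pair to the page, in order."""
    for pattern, replacement in edits:
        page = pattern.sub(replacement, page)
    return page
```

A cgi-bin wrapper would then print edit_page(fetch_page(build_search_request(query)), EXAMPLE_EDITS) back to the browser.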
Post by root
I do the latter with google news. Even though I say I run w3m
what I use is my heavily modified version of w3m of over
a decade ago.
Post by Mike Spencer
On my system [2], Google search for blood pressure (as separate words)
High Blood Pressure - National Institute on Aging - NIH
https://www.nia.nih.gov/health/high-blood-pressure
That link comes back using either lynx or w3m. In the html
/url?q=https://www.nia.nih.gov/health/high-blood-pressure&sa=U&ved=2ahUKEwj1rLuT4cbiAhUK8LwKHRLGB0cQFjAAegQIIBAE&usg=AOvVaw2aCWwRNevvZbD7MP0wn7nP
so it becomes a redirection through google to nih.
Well, yes, you *can* add back the redirect through Gwgle but why would
you want to? AFAICT (and experience so far supports the notion)
everything after the first '&' or "&amp;" is tracking data. You don't
need it (or the "/url?q=") to reach the target page.
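
A minimal sketch of that unwrapping, in Python rather than Perl. The name clean_google_href is hypothetical, and the sketch assumes result links keep the "/url?q=TARGET&..." shape quoted in this thread; it reads the q parameter instead of cutting at the first '&'.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit


def clean_google_href(href):
    """Return the target URL wrapped in a Google "/url?q=..." redirect link.

    Assumes the link shape quoted in this thread; any other href is
    returned unchanged.
    """
    # Raw HTML source may carry "&amp;" where the URL proper has "&".
    parts = urlsplit(unescape(href))
    if parts.path != "/url":
        return href
    targets = parse_qs(parts.query).get("q")
    return targets[0] if targets else href
```

Applied to the href quoted above, this should give back the plain nia.nih.gov address with the sa, ved and usg parameters dropped.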
Post by root
I never bothered to look into the html source coming from google
until the recent change.
Cavalierly eliding all js, style & noscript blocks, meta and link tags
and globally replacing '>' with ">\n\n" makes reading it much easier.
The "non-greedy" ".*?" locution in perl regexps is your friend, e.g.

s/<noscript>(.*?)<\/noscript>//sgi;
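
For anyone not working in perl, roughly the same clean-up can be sketched in Python. The name simplify_html and the tag list are assumptions for illustration, and regular expressions are only a rough tool for HTML; this is meant for reading a saved page, not for parsing it robustly.

```python
import re

# Paired elements whose whole contents are dropped.
BLOCK_TAGS = ("script", "style", "noscript")


def simplify_html(page):
    """Strip script/style/noscript blocks and meta/link tags for easier reading."""
    flags = re.DOTALL | re.IGNORECASE
    for tag in BLOCK_TAGS:
        # Non-greedy ".*?" stops at the first closing tag, not the last one.
        page = re.sub(rf"<{tag}\b.*?</{tag}\s*>", "", page, flags=flags)
    # meta and link are single tags with no closing partner.
    page = re.sub(r"<(?:meta|link)\b[^>]*>", "", page, flags=flags)
    # Put a blank line after every remaining tag.
    return page.replace(">", ">\n\n")
```

With a greedy ".*" the first pass would swallow everything between the first opening tag and the last closing one, which is why the non-greedy form matters here.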

[1] I was opening a socket (within perl) on gwglw and unpacking the
chunked data returned. A good educational exercise but wget is
easier, simpler to maintain.
--
Mike Spencer Nova Scotia, Canada
root
2019-06-03 22:24:50 UTC
Post by Mike Spencer
+ Browser home page is a local file, which has a link to
+ A local copy of the Gwgle advanced search page saved years ago.
+ On that page the ACTION attribute of the search FORM points to
a cgi-bin script on localhost which
+ calls wget [1] to send the search string to Gwgle with returned
results piped to stdout (viz. an open perl file handle) and into a
variable.
+ After the entire returned data is accumulated in a perl variable,
that data is edited with regular expressions to suit me.
Thanks.
Post by Mike Spencer
Post by root
so it becomes a redirection through google to nih.
I am sorry I used the pronoun 'you' I meant the string needed a
prefix to become a valid URL.
Post by Mike Spencer
Well, yes, you *can* add back the redirect through Gwgle but why would
you want to? AFAICT (and experience so far supports the notion)
everything after the first '&' or "&amp;" is tracking data. You don't
need it (or the "/url?q=") to reach the target page.
Post by root
I never bothered to look into the html source coming from google
until the recent change.
Rich
2019-06-03 03:50:24 UTC
Post by root
Recently google search returns an html page that
cannot be rendered correctly by either lynx or w3m.
Chrome has no problem with the pages.
...
Under Chrome a link to NIH appears near the top.
Under either lynx or w3m the link for NIH
[21]High Blood Pressure - National Institute on Aging - NIH
and is not a link to a URL.
None of what come up as links do so under either
lynx or w3m.
Just tested your search above on lynx.

The NIH entry was a link that took me to the NIH page.

So I don't see the same results you see.

Note, I also did reject all of the cookies Google wanted to set.
root
2019-06-03 04:27:43 UTC
Post by Rich
Just tested your search above on lynx.
The NIH entry was a link that took me to the NIH page.
So I don't see the same results you see.
Note, I also did reject all of the cookies Google wanted to set.
Here is what lynx brings up for me:
[15]Understanding Blood Pressure Readings | American Heart ...

Use our blood pressure chart to learn what your blood pressure numbers mean.
Systolic, diastolic? The American Heart Association helps you understand the ...
https://www.heart.org/...blood-pressure/understanding-blood-pressure- readings - 114k - [16]Cached - [17]Similar pages

[18]Blood pressure: What is normal? - Medical News Today

Blood pressure is essential to life because it forces the blood around the body,
delivering all the nutrients it needs. Here, we explain how to take your blood ...
https://www.medicalnewstoday.com/articles/270644.php - 171k - [19]Cached - [20]Similar pages

[21]High Blood Pressure - National Institute on Aging - NIH

May 2, 2018 ... Read about high blood pressure or hypertension. Learn how changes in lifestyle
--like getting more exercise and having less salt--may help ...
https://www.nia.nih.gov/health/high-blood-pressure - 149k - [22]Cached - [23]Similar pages

[24]Blood Pressure Chart & Numbers (Normal Range, Systolic, Diastolic)

There is more above and below what I cited here.

Nothing I click on brings me to the nia.nih.gov link. However
if I use tab to move down I can get to the link by hitting
right arrow.

The display on w3m (which I use) is different. As I said in
the previous post, if I click on the blank line above
the line May 2, 2018.....
I get to the site.

Someone more familiar with javascript than I suggested that
this behavior is due to the lack of javascript on these
browsers.

Knowing what I do, the google search is still useful.
I had been using bing, but it is pretty pathetic.

Thanks for responding Rich.
Rich
2019-06-03 12:51:42 UTC
Post by root
Nothing I click on brings me to the nia.nih.gov link. However
if I use tab to move down I can get to the link by hitting
right arrow.
Ok, when you say "click" do you mean using the mouse on an xterm/rxvt
window?

Did that ever work for you? Because lynx is normally purely keyboard
driven, you have to "tab" to the link you want, then hit right arrow to
"go" to the link. Is it even possible to have lynx react to text terminal
mouse clicks?
root
2019-06-03 13:47:55 UTC
Post by Rich
Post by root
Nothing I click on brings me to the nia.nih.gov link. However
if I use tab to move down I can get to the link by hitting
right arrow.
Ok, when you say "click" do you mean using the mouse on an xterm/rxvt
window?
Did that ever work for you? Because lynx is normally purely keyboard
driven, you have to "tab" to the link you want, then hit right arrow to
"go" to the link. Is it even possible to have lynx react to text terminal
mouse clicks?
I (almost) never run X on this machine so everything runs under console.

By default lynx ignores the mouse, and it might always do
that running under an xterm. It depends upon whether lynx is
smart enough to switch to gpm under a console, and use
X mouse routines under X.

Under a console you have to start lynx with
lynx -use_mouse

I haven't tried running either lynx or my w3m under X because
of the mouse problem.

Thanks again Rich.
Steve555
2019-06-18 19:41:12 UTC
Post by root
Post by Rich
Post by root
Nothing I click on brings me to the nia.nih.gov link. However
if I use tab to move down I can get to the link by hitting
right arrow.
Ok, when you say "click" do you mean using the mouse on an xterm/rxvt
window?
Did that ever work for you? Because lynx is normally purely keyboard
driven, you have to "tab" to the link you want, then hit right arrow to
"go" to the link. Is it even possible to have lynx react to text terminal
mouse clicks?
I (almost) never run X on this machine so everything runs under console.
By default lynx ignores the mouse, and it might always do
that running under an xterm. It depends upon whether lynx is
smart enough to switch to gpm under a console, and use
X mouse routines under X.
Under a console you have to start lynx with
lynx -use_mouse
I haven't tried running either lynx or my w3m under X because
of the mouse problem.
Thanks again Rich.
FWIW I have found that 'links' (from www.twibright.com) works very
nicely in the console. And with the linux framebuffer console
you can get a graphical mode that displays images, by running 'links -g'.
And no javascript.
--
Gnd -|o----|- Vcc Hey computer, what's the weather in Sydney?
trig -| 555 |- dschrg $> finger o:***@graph.no|tail -1|espeak
o/p -| |- thrsh
rst -|-----|- cntrl Steve555
root
2019-06-19 19:04:31 UTC
Post by Steve555
FWIW I have found that 'links' (from www.twibright.com) works very
nicely in the console. And with the linux framebuffer console
you can get a graphical mode that displays images, by running 'links -g'.
And no javascript.
Thanks for responding. I just tried links. After a few
attempts it started and gave me an error message that
I hadn't configured the mouse, and then it crashed my
system.

It seems that links uses svgalib and must be trying to
display a mode that my graphics card no longer supports.
I forget which mode I chose, but I think it was
640x480x256. I don't know for sure, but I suspect that
svgalib is no longer supported.

Years ago I looked into svgalib and I really had to
admire the guy supporting it. The guy had a patchwork
of code to support the various video cards.

Sylvain Robitaille
2019-06-14 16:57:50 UTC
Post by root
http://www.google.com/search?q=blood+pressure
Ok ...
Post by root
Under Chrome a link to NIH appears near the top.
I'll take your word for it ...
Post by root
Under either lynx or w3m the link for NIH
[21]High Blood Pressure - National Institute on Aging - NIH
and is not a link to a URL.
I'm afraid that I don't see what you mean by "is not a link to a URL":

Link that you currently have selected

Linkname: High Blood Pressure - National Institute on Aging - NIH
URL:
http://www.google.com/url?q=https://www.nia.nih.gov/health/high-
blood-pressure&sa=U&ved=2ahUKEwi9m9rWtuniAhUDT98KHZkUBZU4ChAWMAR
6BAgAEAI&usg=AOvVaw1vq3V0zqjga9hu7QI0XckJ
Post by root
None of what come up as links do so under either
lynx or w3m.
Seems to work perfectly well for me:

: charlotte[syl] ~; cat /etc/slackware-version
Slackware 14.2
: charlotte[syl] ~; lynx --version
Lynx Version 2.8.8rel.2 (09 Mar 2014)
libwww-FM 2.14, SSL-MM 1.4.1, OpenSSL 1.0.2r, ncurses 5.9.20141206(wide)
Built on linux-gnu May 30 2017 12:31:07

Copyrights held by the Lynx Developers Group,
the University of Kansas, CERN, and other contributors.
Distributed under the GNU General Public License (Version 2).
See http://lynx.isc.org/ and the online help for more information.

See http://www.openssl.org/ for information about OpenSSL.

(this is the Lynx shipped with Slackware)
Post by root
If you have a console work-around to access google
search please let me know.
Don't need. The link I get seems to work just fine. Sorry I can't
help more than that, but perhaps the extra data point will still be
useful to you.
--
----------------------------------------------------------------------
Sylvain Robitaille ***@encs.concordia.ca

Systems analyst / AITS Concordia University
Faculty of Engineering and Computer Science Montreal, Quebec, Canada
----------------------------------------------------------------------
root
2019-06-14 19:44:36 UTC
Post by Sylvain Robitaille
Post by root
None of what come up as links do so under either
lynx or w3m.
If you have a console work-around to access google
search please let me know.
Don't need. The link I get seems to work just fine. Sorry I can't
help more than that, but perhaps the extra data point will still be
useful to you.
Thanks for responding. I should have made clear that I was
running Lynx under console mode, with mouse action enabled.
The links work if you tab down to them, but do not work
if you click on them with the mouse.

Under w3m the links appear as blank lines, but if you
click on the blank lines you get to the site.

Now that I am used to the behavior, it is perfectly usable.