
Wget page title

Tags: shell, wget

Is it possible to Wget a page's title from the command line?

input:

$ wget http://bit.ly/rQyhG5 <<code>>

output:

If it’s broke, fix it right   - Keeping it Real Estate. Home
Mr. Demetrius Michael asked Nov 23 '25 23:11


1 Answer

This script would give you what you need:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'

But it breaks in many situations, for example if a <title>...</title> also appears in the body of the page, or if the title spans more than one line.
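The multi-line failure can be reproduced without fetching anything. This is only a sketch: the function name `title_one_line` and the sample HTML are made up for illustration, and the sed expression is the one from the script above.

```shell
# The one-line extractor from above, reading HTML on stdin instead of from wget.
title_one_line() {
  sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
}

# Title on a single line: extracted.
printf '<head><title>Single line</title></head>\n' | title_one_line

# Title split over two lines: no single line holds both tags, so nothing is printed.
printf '<head><title>Split\ntitle</title></head>\n' | title_one_line
```

sed works line by line, so the pattern never sees the opening and closing tags together in the second case.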

This might be a little better:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | paste -s -d " "  \
  | sed -e 's!.*<head>\(.*\)</head>.*!\1!' \
  | sed -e 's!.*<title>\(.*\)</title>.*!\1!'

but it does not fit your case, as your page contains the following opening head tag:

<head profile="http://gmpg.org/xfn/11">
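To see what the attribute changes, here is a small offline comparison. It is a sketch under assumptions: the sample page, the body-level `<title>` inside an `<svg>`, and both function names are invented for the demonstration, and `paste` is given an explicit `-` so it reads stdin on any implementation.

```shell
# Hypothetical sample: a head tag with an attribute, plus a second <title> in the body.
page='<html><head profile="http://gmpg.org/xfn/11"><title>Real title</title></head><body><svg><title>Icon</title></svg></body></html>'

# Literal <head>: the first sed matches nothing and passes the page through unchanged,
# so the second sed searches the whole page and its greedy .* lands on the last <title>.
with_literal_head() {
  paste -s -d " " - \
    | sed -e 's!.*<head>\(.*\)</head>.*!\1!' \
    | sed -e 's!.*<title>\(.*\)</title>.*!\1!'
}

# <head[^>]*> also accepts attributes, so the search stays inside the head.
with_head_attributes() {
  paste -s -d " " - \
    | sed -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!' \
    | sed -e 's!.*<title>\(.*\)</title>.*!\1!'
}

printf '%s\n' "$page" | with_literal_head      # prints: Icon
printf '%s\n' "$page" | with_head_attributes   # prints: Real title
```

So with the literal `<head>` the pipeline still prints a title, but nothing restricts it to the head any more.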

Again, this might be better:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | paste -s -d " "  \
  | sed -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!' \
  | sed -e 's!.*<title>\(.*\)</title>.*!\1!'

but there are still ways to break it, for example a page with no head or title.
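A quick way to see that failure, assuming a made-up page without head or title (the function name `extract_without_n` is hypothetical, and `paste` gets an explicit `-` for stdin):

```shell
# The pipeline from above, reading HTML on stdin instead of from wget.
extract_without_n() {
  paste -s -d " " - \
    | sed -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!' \
    | sed -e 's!.*<title>\(.*\)</title>.*!\1!'
}

# Neither substitution matches, and sed prints its input by default,
# so the whole joined page comes out instead of a title.
printf '<html>\n<body>no head here</body>\n</html>\n' | extract_without_n
# prints: <html> <body>no head here</body> </html>
```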

Again, a better solution might be:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | paste -s -d " "  \
  | sed -n -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!p' \
  | sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'

but I am sure we can find a way to break it. This is why a true XML parser is the right solution, but as your question is tagged shell, the above is the best I can come up with.
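For comparison, here is roughly what the parser route could look like while staying on the command line. This is an assumption on my side, not something from the question: it presumes `xmllint` (shipped with libxml2) is installed, and the function name `title_with_parser` is made up.

```shell
# Assumption: xmllint from libxml2 is available on this machine.
# --html selects the lenient HTML parser, --xpath evaluates the expression,
# and "-" makes xmllint read the document from stdin.
title_with_parser() {
  xmllint --html --xpath 'string(//head/title)' - 2>/dev/null
}

# Usage with an inline sample page; with wget it would be:
#   wget --quiet -O - http://bit.ly/rQyhG5 | title_with_parser
printf '<html><head><title>Sample title</title></head><body></body></html>\n' \
  | title_with_parser
```

The parser does not care about attributes on the head tag or about where the line breaks fall, which is what the sed versions keep tripping over.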

The paste and the two sed commands can be merged into a single sed, at the cost of readability. However, the merged version has the advantage of working on multi-line titles:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;T;s!.*<title>\(.*\)</title>.*!\1!p}'
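The multi-line support comes from collecting the whole input in the hold space before substituting. Below is a stripped-down sketch of just that idea. It is not the command above: it skips the head step, uses `$!d` instead of the `${...}` group, and the name `slurp_title` and the sample input are invented.

```shell
slurp_title() {
  # H    append every input line to the hold space
  # $!d  on all lines but the last, stop here and read the next line
  # x    on the last line, swap the accumulated text into the pattern space
  # s    now sees the whole page at once, embedded newlines included
  sed -n -e 'H' -e '$!d' -e 'x' -e 's!.*<title>\(.*\)</title>.*!\1!p'
}

printf '<head>\n<title>If it is broke,\nfix it right</title>\n</head>\n' | slurp_title
# prints two lines:
#   If it is broke,
#   fix it right
```

The line break inside the title is kept in the output, since nothing replaces it with a space.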

Update:

As explained in the comments, the last sed above uses the T command, which is a GNU extension. If your sed does not support it, you can use:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;tnext;b;:next;s!.*<title>\(.*\)</title>.*!\1!p}'
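The `t`/`b`/label combination stands in for `T`: `t` jumps to the label when the substitution succeeded, otherwise the bare `b` skips to the end of the script. A minimal sketch of that control flow on toy input (the function name `only_changed` and the label `hit` are made up, and each command gets its own `-e` so the label is not followed by a semicolon):

```shell
# Print only the lines where the substitution succeeded, without using T.
only_changed() {
  sed -n -e 's/an/AN/' -e 't hit' -e 'b' -e ':hit' -e 'p'
}

printf 'apple\nbanana\ncherry\n' | only_changed
# prints: bANana
```

On `apple` and `cherry` the substitution fails, `t` does nothing, and `b` ends the cycle before `p` is reached.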

Update 2:

As the above still does not work on Mac, try:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;tnext};b;:next;s!.*<title>\(.*\)</title>.*!\1!p'

and/or

cat << EOF > script
H
\$x
\$s!.*<head[^>]*>\(.*\)</head>.*!\1!
\$tnext
b
:next
s!.*<title>\(.*\)</title>.*!\1!p
EOF
wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -f script

(Note the \ before the $ to avoid variable expansion.)

It seems that :next does not like being prefixed by a $, which could be a problem in some sed versions.
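A possible variant that sidesteps both points, offered as a sketch rather than something verified on every sed: quoting the here-document delimiter (`'EOF'`) turns off variable expansion, so no backslash is needed before `$`, and using `$!d` once means no later command needs a `$` address. The temp file, the function name `title_from_head` and the sample page are assumptions for the demonstration.

```shell
# Write the sed script to a temporary file; the quoted 'EOF' keeps $ literal.
script_file=$(mktemp)
trap 'rm -f "$script_file"' EXIT

cat <<'EOF' > "$script_file"
H
$!d
x
s!.*<head[^>]*>\(.*\)</head>.*!\1!
t next
d
:next
s!.*<title>\(.*\)</title>.*!\1!p
EOF

# Reads HTML on stdin; with wget it would be:
#   wget --quiet -O - http://bit.ly/rQyhG5 | title_from_head
title_from_head() {
  sed -n -f "$script_file"
}

printf '<html>\n<head profile="p">\n<title>Example\nTitle</title>\n</head>\n</html>\n' \
  | title_from_head
# prints two lines:
#   Example
#   Title
```

If the page has no head, the `d` after `t next` discards everything and nothing is printed.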

jfg956 answered Nov 25 '25 14:11