 

How to extract links from an html page

I have an html page that has data like so:

<td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td>
<td><a href="PASS_report_test_2025-03-24_17h07m10.html">PASS_report_test_2025-03-24_17h07m10.html</a></td>
<td><a href="TESTS-test_01.xml">TESTS-test_01.xml</a></td>
<td><a href="TESTS-test_02.xml">TESTS-test_02.xml</a></td>

I would like to extract the link 'PASS_report_test_2025-03-24_17h07m10.html'. The date and timestamp in the link change depending on the day the tests are run. However, the prefix substring 'PASS_report_' does not.

Expected output: PASS_report_test_2025-03-24_17h07m10.html

I tried the solution suggested here:

sed -n 's/.*href="\([^"]*\).*/\1/p' file

But it didn't work, i.e. printing the variable that was supposed to hold the parsed links showed a null (empty) value.

Any suggestions on how to extract the link?

Thank you in advance.

Archie asked Oct 22 '25 15:10


1 Answer

OP copied a sed solution from another Q&A but states that it didn't work, which I take to mean that it printed every link, i.e.:

$ sed -n 's/.*href="\([^"]*\).*/\1/p' test.html
test-2025-03-24_17-05.log
PASS_report_test_2025-03-24_17h07m10.html
TESTS-test_01.xml
TESTS-test_02.xml

One idea for updating this sed solution to look for just the one link OP is interested in:

$ sed -n 's/.*href="\(PASS_report[^"]*\).*/\1/p' test.html
PASS_report_test_2025-03-24_17h07m10.html
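
Since OP mentions that the variable holding the result came out empty, here is a minimal sketch of capturing the link with command substitution. It assumes the sample rows from the question; the here-doc recreates them in a temporary directory so the snippet is self-contained, and the names `tmpdir` and `link` are hypothetical. The pattern also includes the trailing underscore of OP's `PASS_report_` prefix.

```shell
# Recreate the sample rows from the question (assumed file name: test.html)
tmpdir=$(mktemp -d)
cat > "$tmpdir/test.html" <<'EOF'
<td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td>
<td><a href="PASS_report_test_2025-03-24_17h07m10.html">PASS_report_test_2025-03-24_17h07m10.html</a></td>
<td><a href="TESTS-test_01.xml">TESTS-test_01.xml</a></td>
<td><a href="TESTS-test_02.xml">TESTS-test_02.xml</a></td>
EOF

# Command substitution stores sed's output; 'link' is a hypothetical name
link=$(sed -n 's/.*href="\(PASS_report_[^"]*\).*/\1/p' "$tmpdir/test.html")

# Guard against an empty result before using the variable
if [ -n "$link" ]; then
    printf '%s\n' "$link"
else
    echo "no PASS_report_ link found" >&2
fi

rm -rf "$tmpdir"
```

If `link` is still empty with the real file, checking that the file path is correct and that the href really starts with `PASS_report_` would be the first things to look at.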

If OP's html file is guaranteed to be nicely formatted as in the example (one link per line, with the href as the first double-quoted string), then there are a slew of approaches that will also work, e.g.:

$ grep '"PASS_report' test.html | cut -d'"' -f2
PASS_report_test_2025-03-24_17h07m10.html

$ cut -d'"' -f2 test.html | grep '^PASS_report'
PASS_report_test_2025-03-24_17h07m10.html

$ awk -F'"' '$2~/^PASS_report/ {print $2}' test.html
PASS_report_test_2025-03-24_17h07m10.html

$ while IFS='"' read -r _ link _; do [[ "${link}" =~ ^PASS_report ]] && { echo "${link}"; break; }; done < test.html
PASS_report_test_2025-03-24_17h07m10.html
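
The field-based approaches above rely on each link sitting on its own line. If several links may share a line, one possible variation is to have grep print only the matching href. This is a sketch under assumptions: the single-line input in `html` is hypothetical, and `-o` is supported by GNU and BSD grep but is not part of POSIX.

```shell
# Hypothetical input: two links on a single line
html='<td><a href="test-2025-03-24_17-05.log">x</a></td><td><a href="PASS_report_test_2025-03-24_17h07m10.html">y</a></td>'

# -o prints only the matched text (href="PASS_report_...");
# cut then keeps the part between the double quotes
link=$(printf '%s\n' "$html" | grep -o 'href="PASS_report_[^"]*"' | cut -d'"' -f2)

printf '%s\n' "$link"
```

For HTML that is not under OP's control, a real HTML parser is generally a safer choice than line-oriented text tools.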
markp-fuso answered Oct 25 '25 05:10


