I have the following JSON input:
{
"url": "https://www.example.com",
"html": "<html>...</html>"
}
How can I use jq together with pup to extract all JavaScript <script> tags from the html field?
For example, I can extract all the scripts I want from a single html value with a pipeline:
cat example.json | jq -r .html | pup 'script[type="text/javascript"] text{}'
I would like to put all of these extracted scripts into a new JSON object:
{
"url": "https://www.example.com",
"scripts": [
"<script>...",
"<script>..."
]
}
If I try using:
jq -c '{url: .url, scripts: [.html | pup "script[type=text/javascript] text{}"]}'
it will not work because pup is an external command and not part of jq.
How can I achieve this?
This is not possible from within jq itself, because a jq program cannot call external programs. But if your input contains only a single object with those two properties, the following should work in POSIX shells:
{
jq '{url}' input.json;
jq -r '.html' input.json | pup ... | jq -Rs '{scripts: .}';
} | jq -s 'add'
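To see why this works, here is a minimal sketch of the two jq options involved, with a printf standing in for the pup output (the script text and the function name merge_demo are made up for illustration). -Rs reads the whole raw input as one JSON string, and -s with add merges the two objects into one:

```shell
# merge_demo is a hypothetical name; the printf lines stand in for
# "jq '{url}' input.json" and for the output of pup.
merge_demo() {
  {
    # First object: what jq '{url}' would emit.
    printf '%s\n' '{"url":"https://www.example.com"}'
    # Second object: raw text (-R) slurped into one string (-s).
    printf 'var a = 1;\nvar b = 2;\n' | jq -Rs '{scripts: .}'
  } | jq -s -c 'add'   # slurp both objects into an array, then merge them
}

merge_demo
```

Because -Rs slurps everything into one string, scripts ends up as a single string here rather than an array.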
It's also possible to pass the pup output to jq with --arg. This still invokes jq twice and reads your input twice:
jq --arg scripts "$(jq -r '.html' input.json | pup ... )" \
'{url, $scripts}' input.json
Sample output (note that scripts is a single string here, not an array):
{
"url": "https://www.example.com",
"scripts": "...."
}
Use json{} instead of text{} to enable post-processing with jq. For example (the <(...) process substitution requires a shell such as bash, zsh, or ksh):
jsonfile='example.json'
jq '.scripts = (input | map(.text // empty))' "$jsonfile" <(
jq -r '.html' "$jsonfile" | pup 'script[type=text/javascript] json{}'
)
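If process substitution is not available, the same idea can probably be expressed with jq's --argjson option instead. The following is only a sketch: the function name extract_scripts is hypothetical, and it assumes, as the command above does, that pup's json{} prints an array of objects whose script body is in a text field.

```shell
# extract_scripts FILE
# Hypothetical helper: prints {url, scripts: [...]} for a single-object JSON file.
extract_scripts() {
  jsonfile=$1
  # Run pup on the html field; json{} is assumed to yield a JSON array.
  scripts_json=$(jq -r '.html' "$jsonfile" |
    pup 'script[type="text/javascript"] json{}') || return 1
  # Pass the array in as $scripts and keep only entries that have a text field.
  jq --argjson scripts "$scripts_json" \
    '{url, scripts: ($scripts | map(.text // empty))}' "$jsonfile"
}
```

It would be called as extract_scripts example.json. Like the other approaches, this runs jq twice and reads the input file twice.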