I don't think I understand how to return only the part of a line that a regular expression matched. I have a file that is a web page, and I'm trying to get all the links in it. The regex works fine, but if I printf the result, it prints the whole line in which the match occurs. I want to display only the match. I saw that you can use grouping, so I tried that, and I get back an int value from my second printf call. According to the docs it is an offset, but an offset to what? It doesn't seem to be accurate either: it says 32 when character 32 on that line has nothing to do with the regex. I put in an exit just to see the first match. Where am I going wrong?
char line[1000];
FILE *fp_original;
fp_original = fopen (file_original_page, "r");
regex_t re_links;
regmatch_t group[2];
regcomp (&re_links, "(href|src)=[\"|'][^\"']*[\"|']", REG_EXTENDED);
while (fgets (line, sizeof line, fp_original) != NULL) {
if (regexec (&re_links, line, 2, group, 0) == 0) {
printf ("%s", line);
printf ("%u\n", line[group[1].rm_so]);
exit (1);
}
}
fclose (fp_original);
regmatch_t array
regmatch_t is the element type of the match array that you pass to the regexec call. If we pass 2 as the number of matches to regexec, we get the whole match in pmatch[0] and the first parenthesized submatch in pmatch[1].
For instance:
size_t nmatch = 2;
regmatch_t pmatch[2];
int rc = regexec(&re_links, line, nmatch, pmatch, 0);
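If this succeeded (regexec returned 0), you can print the subexpression as follows:
printf("a matched substring \"%.*s\" is found at position %d to %d.\n",
       (int)(pmatch[1].rm_eo - pmatch[1].rm_so), &line[pmatch[1].rm_so],
       (int)pmatch[1].rm_so, (int)(pmatch[1].rm_eo - 1));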
Here is an example of how to apply the above:
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    regex_t preg;
    const char *string = "I'm a link to somewhere";
    const char *pattern = ".*\\(link\\).*";   /* basic syntax: \( \) marks the group */
    size_t nmatch = 2;
    regmatch_t pmatch[2];

    if (regcomp(&preg, pattern, 0) != 0) {
        fprintf(stderr, "regcomp failed\n");
        return EXIT_FAILURE;
    }
    if (regexec(&preg, string, nmatch, pmatch, 0) == 0) {
        /* rm_so and rm_eo are offsets into string; rm_eo is one past the end */
        printf("a matched substring \"%.*s\" is found at position %d to %d.\n",
               (int)(pmatch[1].rm_eo - pmatch[1].rm_so), &string[pmatch[1].rm_so],
               (int)pmatch[1].rm_so, (int)(pmatch[1].rm_eo - 1));
    }
    regfree(&preg);
    return 0;
}
The code above is certainly not safe; it serves only as an example. If you replace pmatch with your group array it should work. Also don't forget to parenthesize the part of your regex you want to capture in your group: \\(.*\\) in basic syntax, or plain (.*) when you compile with REG_EXTENDED as in your code.
Edit
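If you would rather have the match as a separate string (this also avoids the compiler warning about the field precision, which must be an int), you can replace the whole printf part with the following. It needs #include <string.h>:
char *result;
size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
result = malloc(len + 1);            /* +1 for the terminating NUL */
if (result == NULL) {
    regfree(&preg);
    return EXIT_FAILURE;
}
strncpy(result, &string[pmatch[1].rm_so], len);
result[len] = '\0';                  /* strncpy does not terminate the copy here */
printf("a matched substring \"%s\" is found at position %lld to %lld.\n",
       result, (long long)pmatch[1].rm_so, (long long)(pmatch[1].rm_eo - 1));
// later on ...
free(result);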
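Applied to the pattern from the question, the same idea might look like the sketch below. Note that in the question's code, line[group[1].rm_so] is the character stored at the offset, so %u prints that character's code rather than the offset or the match. The helper name first_link and the extra pair of parentheses around the URL are my own assumptions for illustration; with them, m[1] is href or src and m[2] is the URL itself.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of regex.h): copies the URL of the first
 * href/src attribute found in `line` into `out`.
 * Returns 0 on success, -1 if nothing matched, the pattern failed to
 * compile, or `out` is too small to hold the URL plus a NUL. */
int first_link(const char *line, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] whole match, m[1] href|src, m[2] the URL */
    int rc = -1;

    assert(line != NULL && out != NULL);

    /* Extended syntax: plain parentheses create capture groups. */
    if (regcomp(&re, "(href|src)=[\"']([^\"']*)[\"']", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 3, m, 0) == 0 && m[2].rm_so != -1) {
        /* Offsets count bytes from the start of `line`;
         * rm_eo is one past the last matched byte. */
        size_t len = (size_t)(m[2].rm_eo - m[2].rm_so);
        if (len < outsz) {
            memcpy(out, line + m[2].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Inside the fgets loop you could then call first_link(line, buf, sizeof buf) and print buf whenever it returns 0. Compiling the pattern on every call keeps the sketch short; in a real loop you would compile it once before reading the file.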
The resulting match (your group) gives you a start index and an end index. You need to print just the characters between those two indices; the end index is one past the last matched character.
group[0] will be the entire regex match. The subsequent groups will be any captures you have in your regex.
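/* there are re_nsub + 1 results (whole match plus captures), capped by the array size */
size_t ngroups = sizeof group / sizeof group[0];
for (size_t i = 0; i <= re_links.re_nsub && i < ngroups; ++i) {
    if (group[i].rm_so == -1)
        continue;   /* this group did not take part in the match */
    printf("match %zu from index %d to %d: ", i, (int)group[i].rm_so, (int)group[i].rm_eo);
    for (regoff_t j = group[i].rm_so; j < group[i].rm_eo; ++j) {
        printf("%c", line[j]);
    }
    printf("\n");
}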
For a full example see my answer here.
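Since a single line of HTML can hold several links, and regexec only reports the first match in the string it is given, one way to collect them all is to call it repeatedly and start each search just past the previous match. The following is only a sketch under that assumption; print_all_matches is a hypothetical name of mine, not a library function.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: prints every match of the compiled pattern `re`
 * in `line`, one per output line, and returns how many were found. */
int print_all_matches(const regex_t *re, const char *line)
{
    regmatch_t m[1];             /* only the whole match is needed here */
    const char *cursor = line;
    int count = 0;

    assert(re != NULL && line != NULL);

    while (*cursor != '\0') {
        /* After the first search, cursor is no longer the start of the line. */
        int eflags = (cursor == line) ? 0 : REG_NOTBOL;
        if (regexec(re, cursor, 1, m, eflags) != 0)
            break;

        /* Offsets are relative to `cursor`, not to `line`. */
        printf("%.*s\n", (int)(m[0].rm_eo - m[0].rm_so), cursor + m[0].rm_so);
        ++count;

        if (m[0].rm_eo > m[0].rm_so) {
            cursor += m[0].rm_eo;            /* continue after this match */
        } else {
            if (cursor[m[0].rm_so] == '\0')  /* empty match at end of string */
                break;
            cursor += m[0].rm_so + 1;        /* empty match: step one byte */
        }
    }
    return count;
}
```

With the question's variables, calling print_all_matches(&re_links, line) inside the fgets loop would replace the two printf calls and the exit.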