Extract each occurrence of a string into a separate line to build a list of URLs

Tagged: find in files

Viewing 4 posts - 1 through 4 (of 4 total)

Author
Posts
March 29, 2022 at 4:28 pm #28131
Simon McMahon
Participant
Hi there, I would like to extract all occurrences of a URL string pattern (which can appear multiple times in a file) to build a list of all occurrences.
I can already identify each occurrence with the Find in Files feature, but I would like the Extract feature to list each occurrence on its own line. At the moment it lists each line that contains the string, and a line can contain the string multiple times.
My goal is a list of every full URL that contains __data/assets/
In the example below, __data/assets/ occurs 48 times.
However, only 44 lines are extracted, and I need to output all 48 occurrences (the full URL of each).
I will be running this extract over 270 files in total.
View source of this example webpage:
https://www.walkerville.sa.gov.au/council/strategic-plans/2020-2024-living-in-the-town-of-walkerville-a-strategic-community-plan
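To illustrate why the number of extracted lines can be lower than the number of occurrences, here is a minimal Python sketch. The sample HTML is made up for illustration and is not taken from the page above; `sample`, `matching_lines` and `occurrences` are hypothetical names.

```python
import re

# Hypothetical sample: three lines, one of which contains the marker twice.
sample = (
    '<a href="/__data/assets/pdf_file/a.pdf">A</a> '
    '<a href="/__data/assets/pdf_file/b.pdf">B</a>\n'
    '<p>no link here</p>\n'
    '<img src="/__data/assets/image/c.png">\n'
)

marker = "__data/assets/"

# Line-based extraction: one result per line that contains the marker.
matching_lines = [line for line in sample.splitlines() if marker in line]

# Occurrence-based extraction: one result per match, regardless of line breaks.
occurrences = re.findall(re.escape(marker), sample)

print(len(matching_lines))  # 2
print(len(occurrences))     # 3
```

A line that holds two links is counted once by the line-based approach but twice by the occurrence-based one, which is the same kind of gap as 44 lines versus 48 occurrences.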
March 30, 2022 at 10:29 am #28132
Patrick C
Participant
Partial answer
You probably cannot do it in one step, but you would* be able to do it in two:
First extract the matching lines,
and then use Find -> Extract with the following regex*:
/https:\/\/.*__data\/assets.*(?=")/gU
*There is a problem here:
the syntax above is not Perl-compatible.
Does somebody know how to apply the /gU regex flags?
I would really appreciate it.
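A note on the flags, assuming the `U` above is the PCRE "ungreedy" modifier (Perl itself has no such flag) and `g` simply means "find all matches": the same effect can usually be had by making each quantifier lazy with `?`. The Python sketch below only illustrates the difference between greedy and lazy matching; the sample line and the example.com URLs are made up.

```python
import re

# Hypothetical line containing two URLs.
line = (
    '<a href="https://example.com/__data/assets/a.pdf">A</a> '
    '<a href="https://example.com/__data/assets/b.pdf">B</a>'
)

# Greedy quantifiers: .* runs as far as it can, so both URLs are swallowed
# into a single match that ends just before the last double quote.
greedy = re.findall(r'https://.*__data/assets.*(?=")', line)

# Lazy quantifiers: .*? stops at the first position where the rest can match.
lazy = re.findall(r'https://.*?__data/assets.*?(?=")', line)

print(len(greedy))  # 1
print(lazy)
```

With the lazy version each URL is returned as its own match, without needing a separate flag.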
March 30, 2022 at 10:52 am #28133
Patrick C
Participant
Figured it out:
First extract the lines via Find in Files.
Then Find → Extract with regex:
(?<=")[^"]*__data\/assets.*?(?=")
🙂
March 30, 2022 at 11:06 am #28134
Patrick C
Participant
Arghh, stupid me
You can do this in one step
Find in Files → Extract with regex:
(?<=")[^"]*__data\/assets.*?(?=")
Use the Extract option to display matched strings only.
😴
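For anyone who wants to check the pattern outside the editor, here is a rough Python sketch using the same regex. It is not how the editor implements Extract; the function names, the folder argument and the `*.html` glob are assumptions for illustration, as is the sample HTML.

```python
import re
from pathlib import Path

# Same pattern as in the post above: text between double quotes
# that contains __data/assets.
PATTERN = re.compile(r'(?<=")[^"]*__data\/assets.*?(?=")')


def extract_urls(text):
    """Return every quoted string in text that contains __data/assets, in order."""
    return PATTERN.findall(text)


def extract_from_folder(folder, glob="*.html"):
    """Collect matches from every file in folder (folder and glob are hypothetical)."""
    urls = []
    for path in sorted(Path(folder).glob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        urls.extend(extract_urls(text))
    return urls


# Hypothetical sample line with two matches.
sample = (
    '<a href="/__data/assets/pdf_file/a.pdf">A</a> '
    '<img src="/__data/assets/image/c.png">'
)
print("\n".join(extract_urls(sample)))  # one match per line
```

Note that the pattern returns whatever sits between the quotes, so a relative link such as /__data/assets/... comes out as a relative path rather than a full URL.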