What is the easiest way to select all the text between two tags, for example the text between all of the '<pre>' tags on the page?
Asked by basheps
Use "<pre>(.*?)</pre>" (replacing pre with whatever tag name you want) and extract the first group (for more specific instructions, say which language you are using). Note that this assumes very simple, well-formed HTML.
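As a minimal sketch of "extract the first group", here is how it might look in Python (the language and the sample string are assumptions, since the question does not name a language):

```python
import re

# Sample input, made up for illustration.
html = "<p>intro</p><pre>first block</pre><p>text</p><pre>second block</pre>"

# Non-greedy capture group: each match stops at the nearest closing tag.
PRE_PATTERN = re.compile(r"<pre>(.*?)</pre>")

# With one capture group, findall returns the contents of group 1 per match.
pre_blocks = PRE_PATTERN.findall(html)
print(pre_blocks)
```

Without the question mark, the greedy `.*` would run from the first opening tag to the last closing tag and return one long match.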
If you’re doing something complicated, as several commenters have advised, use an HTML parser.
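For the parser route, a rough sketch using Python's standard-library html.parser could look like this. The class and function names are hypothetical, and a third-party parser would work just as well:

```python
from html.parser import HTMLParser


class PreCollector(HTMLParser):
    """Collects the text inside every <pre> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # greater than 0 while inside a <pre> element
        self.blocks = []     # finished text blocks
        self._current = []   # pieces of the block currently being read

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, attributes are ignored here.
        if tag == "pre":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.blocks.append("".join(self._current))
                self._current = []

    def handle_data(self, data):
        if self.depth > 0:
            self._current.append(data)


def extract_pre(html):
    parser = PreCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Unlike the plain regex, this copes with attributes on the tag, upper-case tag names, and content that spans several lines.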
Answered by PyKing
The tag can be closed on a later line. This is why \n must be included in the pattern.
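A small Python sketch of the newline problem, assuming a pattern of the form "any character or \n" (the exact pattern is my reconstruction, not a quote from the answer):

```python
import re

# Sample input with the content spread over two lines.
html = "<pre>line one\nline two</pre>"

# By default "." does not match a newline, so this finds nothing.
single_line = re.findall(r"<pre>(.*?)</pre>", html)

# Alternation with \n lets the match cross line breaks. The inner group is
# non-capturing so that findall returns the whole of group 1.
with_newline = re.findall(r"<pre>((?:.|\n)*?)</pre>", html)

# Python-specific alternative: DOTALL makes "." match newlines as well.
with_dotall = re.findall(r"<pre>(.*?)</pre>", html, flags=re.DOTALL)
```

In Python the DOTALL flag is usually the tidier of the two, but the explicit \n alternation also works in engines that have no such flag.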
Answered by zac
Basically, it accomplishes the following:
(?<=(<pre>)) The selection must be preceded by the <pre> tag.
(\w|\d|\n|[().,\-:;@#$%&*"'+–/®°0!?|]| ) This is simply the regular expression I wanted to apply. In this case, it matches a letter, a digit, a newline character, a space, or one of the special characters listed in the square brackets. The pipe symbol | stands for "or".
+? The plus sign means one or more of the above, in any order. The question mark changes the default behavior from greedy to non-greedy ("ungreedy").
(?=(</pre>)) The selection must be followed by the </pre> tag.
Depending on your use case, you might need to add some modifiers, such as i or m.
I ran this search in Sublime Text, so I did not need any modifiers in my regex.
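To illustrate the point about modifiers, here is a sketch in Python, where the i modifier corresponds to re.IGNORECASE. The allowed-character set is a shortened, illustrative assumption rather than the full list from the answer:

```python
import re

# Shortened stand-in for the allowed characters: word characters
# (letters, digits, underscore), newline, space and some punctuation.
ALLOWED = r"(?:\w|\n|[ ().,:;!?'-])"

# Lookbehind for the opening tag, lazy repetition, lookahead for the
# closing tag.
BODY = r"(?<=<pre>)" + ALLOWED + r"+?(?=</pre>)"

html = "<PRE>Hello, world!\nSecond line.</PRE>"

# Case-sensitive: the upper-case tags are not recognised.
strict = re.search(BODY, html)

# With the "i" modifier the same pattern matches.
relaxed = re.search(BODY, html, flags=re.IGNORECASE)
```

Here strict is None, while relaxed.group(0) holds the two lines of text between the tags.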
Answered by DevWL
To exclude the delimiting tags from the match, use a lookbehind and a lookahead:
(?<=<pre>) looks for text after <pre>.
(?=</pre>) looks for text before </pre>.
The result will be the text inside the pre tag.
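A short Python sketch of the lookaround approach, assuming a lazy .*? between the two assertions (the sample string is made up):

```python
import re

html = "<p>intro</p><pre>alpha</pre><pre>beta</pre>"

# The lookbehind and lookahead assert the tags without consuming them,
# so the whole match (group 0) is already the inner text.
LOOKAROUND = re.compile(r"(?<=<pre>).*?(?=</pre>)")

inner = [m.group(0) for m in LOOKAROUND.finditer(html)]
print(inner)
```

Note that Python's re module only accepts fixed-width lookbehinds, so this works for a literal tag like <pre> but not for an opening tag with arbitrary attributes.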
Answered by Jean-Simon Collard
To get content between elements, use the pattern below. Replace [tag] with the name of the element from which you want to extract the content.
When tags have attributes, such as the href attribute on an anchor tag, use the pattern below.
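A sketch of what such patterns could look like in Python, with [tag] filled in from a function argument. The helper name, the \b word boundary, and the DOTALL flag are my assumptions, not part of the answer:

```python
import re


def tag_contents(html, tag):
    """Return the inner text of every <tag ...>...</tag> pair (hypothetical helper)."""
    name = re.escape(tag)
    # [^>]* skips over any attributes; \b keeps "a" from matching "abbr".
    pattern = re.compile(rf"<{name}\b[^>]*>(.+?)</{name}>", re.DOTALL)
    return pattern.findall(html)


html = '<a href="https://example.com">Example</a> and <abbr>e.g.</abbr> <a>bare</a>'
links = tag_contents(html, "a")
print(links)
```

The same function handles tags with and without attributes, since [^>]* is also allowed to match nothing.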
Answered by Shravan Ramamurthy
Post is based on https://stackoverflow.com/questions/7167279/regex-select-all-text-between-tags