4. Testing
Zidarics Zoltán, zamek@vili.pmmf.hu, 2020

Regular expression
Embedded Programming

The problem
describing a certain amount of text
    find a word which begins with pr and ends with ted
validating a text
    is an email/telephone number etc. valid?
manipulating a text
    change the word order like this: final <-> static

Coding MQTT DEMO

Wildcards
In the examples, each pattern is followed by the sample strings it is tried against.

dot (.) metacharacter: match any single character
    d.g -> dog dig doing

matching specific character: character classes [] [^]
    d[io]g -> dog dig dag doing
    d[^ia]g -> dug dog dig dag

matching number only: numbers are [0-9] or \d, not numbers are [^0-9] or \D
    d[0-9]g -> d0g d1g dag doing
    d[^0-9]g -> dug dog d0g d9g
    d\dg -> d0g d1g dag doing
    d\Dg -> dug dog d0g d9g

matching alpha only: alphas are [a-zA-Z], not alphas are [^a-zA-Z]
    d[a-zA-Z]g -> dog dag d0g doing
    d[^a-zA-Z]g -> dug dog d0g doing

matching whitespaces only: whitespaces are [ \t\r\n] or \s, not whitespaces are [^ \t\r\n] or \S
    d[ ]g -> d g d g dog doing
    d[^ ]g -> dug dog d g
    d\sg -> d g d g d0g doing
    d\Sg -> d g dog dag

special characters: \. \[ \] \/ \^ \$ \| \? \+ \* \( \) \{ \} \\

matching word only: words are \w, not words are \W
    d\wg -> dog dag d0g doing
    d\Wg -> d0g d9g dog dag

negative look ahead (?!): provides the possibility to exclude a pattern
    a(?!b) -> apple abbey

Repetitions
A repetition belongs to the previous pattern.
    * zero or more repetitions
    + one or more repetitions
    ? zero or 1 repetitions
    {m} m repetitions
    {m,n} m to n repetitions

    d.*g -> dog dig doing
    d.+g -> dog dig doing dg
    d.?g -> dog dig dg doing
    d.{1}g -> dog dig dg doing
    d.{1,3}g -> dog dig doing dg

Watch out for the greediness!
    <.+> -> this is a <b>bold</b> text
    <[^>]+> -> this is a <b>bold</b> text
avoid greediness: put ? after the pattern: .+? .*? .{2,6}?
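The difference between greedy and non-greedy repetition can be tried out with a minimal sketch in Python's `re` module. The language choice and the variable names are assumptions for illustration, not part of the slides; the negated class is written here as `[^>]+`, with a `+` quantifier, which is assumed to be the intended form.

```python
import re

text = "this is a <b>bold</b> text"

# Greedy: .+ takes as much as it can, so a single match spans both tags.
greedy = re.findall(r"<.+>", text)

# Negated class: [^>]+ cannot cross a '>', so each tag is matched on its own.
negated = re.findall(r"<[^>]+>", text)

# Non-greedy: .+? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r"<.+?>", text)

print(greedy)   # ['<b>bold</b>']
print(negated)  # ['<b>', '</b>']
print(lazy)     # ['<b>', '</b>']
```

Both the negated class and the `?` suffix give one match per tag; the negated class is usually the more explicit of the two because it states what the repetition must not cross.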
    <.+?> -> this is a <b>bold</b> text

Anchors
^ start of line
    ^d.*g -> dog eats chicken
    ^d.*g -> The dog eats chicken
$ end of line
    .*n$ -> dog eats chicken
    .*n$ -> The dog eats meal
Empty line pattern: ^$
Start of string: \A
    never matches at end of the string
    never matches at line breaks
End of string: \Z
    never matches at the start of the string
    never matches at line breaks

Word boundary: \b (zero length matches)
    Before the first character in the string, if the first character is a word character.
    After the last character in the string, if the last character is a word character.
    Between two characters in the string, where one is a word character and the other is not a word character.
    \b -> |dog| |d| |g| |d_g| |d||!|g|

Capture groups
Regular expressions allow us to not just match text but also to extract information for further processing.
Any subpattern inside a pair of parentheses will be captured as a group.
    find: (final) (static) -> private final static String KEY_ZAPHOD="Zaphod";
    replace: \2 \1 -> private static final String KEY_ZAPHOD="Zaphod";
You can use the | (logical OR, a.k.a. the pipe) to denote different possible sets of characters.
    Buy more (milk|bread) -> Buy more milk. Buy more bread. Buy more dogfood.

Regex tutorials
Regular expressions tutorial

Text processing with AWK

AWK
Text file processing (flowchart): START -> Execute BEGIN block -> read line from input stream -> execute AWK commands on a line -> repeat until EOF -> Execute END block

BEGIN block
The BEGIN block gets executed at program start-up. It executes only once. This is a good place to initialize variables. BEGIN is an AWK keyword and hence it must be in upper-case. This block is optional.
Syntax
    BEGIN { awk-commands }

Body block
The body block applies AWK commands on every input line.
By default, AWK executes commands on every line. We can restrict this by providing patterns. There are no keywords for the Body block.
Syntax
    /pattern/ { awk-commands }

End block
The END block executes at the end of the program. END is an AWK keyword and hence it must be in upper-case. This block is optional.
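The order in which the three blocks run can be sketched as a rough Python analogue. This is not AWK itself: Python only stands in for the control flow, and the function name `run_blocks`, the sample lines and the pattern are hypothetical.

```python
import re

def run_blocks(lines, pattern):
    # BEGIN block: runs once, before any input is read; initialize variables here.
    count = 0
    matched = []

    # Body block: runs for every input line, restricted here by /pattern/.
    for line in lines:
        if re.search(pattern, line):
            count += 1
            matched.append(line)

    # END block: runs once, after the last line has been read (EOF).
    return count, matched

# Hypothetical input standing in for the input stream.
sample = [
    'private final static String KEY_ZAPHOD="Zaphod";',
    "int x = 1;",
    "static int y = 2;",
]

count, matched = run_blocks(sample, r"static")
print(count)  # 2
```

Leaving out the pattern test in the loop corresponds to a body block without a pattern, which runs on every line.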
  1. Main
  2. The problem
  3. Wildcards 1
  4. Wildcards 2
  5. Repetitions
  6. Anchors
  7. Capture groups
  8. AWK 1
  9. AWK 2
  10. AWK 3
  11. Tutorials
  12. End