4. Testing
Zidarics Zoltán
zamek@vili.pmmf.hu
2020
Regular expression
Embedded
Programming
The problem
describing a certain amount of text
validating a text
manipulating a text
find words that begin with pr and end with ted
is an email address, telephone number, etc. valid?
change the word order, like this: final <-> static
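The three tasks above can be sketched with Python's standard `re` module. The sample sentence, the sample address and the deliberately simplified e-mail pattern are assumptions made for illustration; they are not part of the slides.

```python
import re

# Describing: find words that begin with "pr" and end with "ted".
text = "The printed report was protected and presented on time."
pr_ted_words = re.findall(r"\bpr\w*ted\b", text)

# Validating: a simplified e-mail check (illustrative assumption only;
# real address validation is considerably more involved).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
is_valid = EMAIL.fullmatch("user@example.com") is not None

# Manipulating: swap the word order "final static" -> "static final".
swapped = re.sub(r"\b(final) (static)\b", r"\2 \1",
                 "private final static int X;")

print(pr_ted_words)  # ['printed', 'protected', 'presented']
print(is_valid)      # True
print(swapped)       # private static final int X;
```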
Coding
MQTT
DEMO
wildcards
dot (.) metacharacter
match any single character
d.g -> matches dog, dig; no match: doing
matching specific character
character classes [] [^]
d[io]g -> matches dog, dig; no match: dag, doing
d[^ia]g -> matches dug, dog; no match: dig, dag
matching number only
numbers are [0-9] or \d
not numbers [^0-9] or \D
d[0-9]g -> matches d0g, d1g; no match: dag, doing
d[^0-9]g -> matches dug, dog; no match: d0g, d9g
d\dg -> matches d0g, d1g; no match: dag, doing
d\Dg -> matches dug, dog; no match: d0g, d9g
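A minimal sketch for trying the dot, the character classes and the digit shorthands in Python. The helper name `matching` and the word list are assumptions chosen to mirror the slide examples.

```python
import re

words = ["dog", "dig", "dug", "dag", "d0g", "d1g", "d9g", "doing"]

def matching(pattern):
    # Return the words in which the pattern is found somewhere.
    return [w for w in words if re.search(pattern, w)]

print(matching(r"d.g"))      # any single character between d and g
print(matching(r"d[io]g"))   # only i or o:          ['dog', 'dig']
print(matching(r"d[^ia]g"))  # anything except i or a
print(matching(r"d\dg"))     # a digit:              ['d0g', 'd1g', 'd9g']
print(matching(r"d\Dg"))     # a non-digit:          ['dog', 'dig', 'dug', 'dag']
```

For ASCII input such as this, `\d` behaves like `[0-9]` and `\D` like `[^0-9]`.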
matching alpha only
alphas are [a-zA-Z]
not alpha [^a-zA-Z]
d[a-zA-Z]g -> matches dog, dag; no match: d0g, doing
d[^a-zA-Z]g -> matches d0g; no match: dug, dog, doing
matching whitespaces only
whitespaces are [ \t\r\n] or \s
not whitespaces [^ \t\r\n] or \S
d[ ]g -> matches d g; no match: dog, doing
d[^ ]g -> matches dug, dog; no match: d g
d\sg -> matches d g; no match: d0g, doing
d\Sg -> matches dog, dag; no match: d g
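The letter and whitespace classes can be checked the same way. In this sketch the sample list, including the tab-separated `d\tg`, is an assumption added to show the difference between `[ ]` and `\s`.

```python
import re

samples = ["d g", "d\tg", "dog", "d0g", "doing"]

def matching(pattern):
    # Return the samples in which the pattern is found somewhere.
    return [s for s in samples if re.search(pattern, s)]

letters    = matching(r"d[a-zA-Z]g")   # ['dog']
space_only = matching(r"d[ ]g")        # ['d g']: a literal space only
any_space  = matching(r"d\sg")         # ['d g', 'd\tg']: space or tab
no_space   = matching(r"d\Sg")         # ['dog', 'd0g']

print(letters, space_only, any_space, no_space)
```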
special characters
to match them literally, escape them with a backslash:
\. \[ \] \/ \^ \$ \| \? \+ \* \( \) \{ \} \\
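A short sketch of why escaping matters, using Python's `re` module. The sample strings are assumptions; `re.escape` is the standard-library helper that escapes an arbitrary literal string.

```python
import re

# Unescaped, the dot matches any character; escaped, only a literal dot.
loose = re.findall(r"3.50", "3.50 3x50")    # ['3.50', '3x50']
strict = re.findall(r"3\.50", "3.50 3x50")  # ['3.50']

# re.escape adds the backslashes for us when the literal text is not
# known in advance.
literal = re.escape("(approx.)")
found = re.search(literal, "cost: 3.50 (approx.)") is not None

print(loose, strict, found)
```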
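matching word only
word characters are \w (letters, digits and underscore)
not word characters \W
d\wg -> matches dog, dag, d0g; no match: doing
d\Wg -> no match: d0g, d9g, dog, dag (letters and digits are word characters)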
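A quick check of `\w` and `\W` in Python. The samples `d_g` and `d!g` are assumptions added to show that the underscore counts as a word character while punctuation does not.

```python
import re

samples = ["dog", "dag", "d0g", "d_g", "d!g", "doing"]

word = [s for s in samples if re.search(r"d\wg", s)]
non_word = [s for s in samples if re.search(r"d\Wg", s)]

print(word)      # ['dog', 'dag', 'd0g', 'd_g']
print(non_word)  # ['d!g']
```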
negative look ahead
(?!...) excludes a pattern: the match succeeds only if the excluded pattern does not follow
a(?!b) -> apple abbey: matches the a in apple, not the a in abbey
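The look-ahead example can be reproduced in Python as follows. This is a minimal sketch; the variable names are illustrative.

```python
import re

text = "apple abbey"

# The look-ahead only inspects the next character; it does not consume it,
# so each match is just the single letter "a".
matches = re.findall(r"a(?!b)", text)
positions = [m.start() for m in re.finditer(r"a(?!b)", text)]

print(matches)    # ['a']
print(positions)  # [0]  (the "a" of "abbey" at index 6 is rejected)
```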
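Repetitions
a repetition belongs to the previous pattern
* zero or more repetitions
+ one or more repetitions
? zero or one repetition
{m} exactly m repetitions
{m,n} m to n repetitions
in the examples each word is tested on its own
d.*g -> matches dog, dig, doing
d.+g -> matches dog, dig, doing; no match: dg
d.?g -> matches dog, dig, dg; no match: doing
d.{1}g -> matches dog, dig; no match: dg, doing
d.{1,3}g -> matches dog, dig, doing; no match: dg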
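The quantifiers can be compared side by side with a small Python sketch. The helper name `matching` is an assumption; each word is tested separately.

```python
import re

words = ["dg", "dog", "dig", "doing"]

def matching(pattern):
    # Test every word on its own and keep those where the pattern is found.
    return [w for w in words if re.search(pattern, w)]

print(matching(r"d.*g"))      # ['dg', 'dog', 'dig', 'doing']
print(matching(r"d.+g"))      # ['dog', 'dig', 'doing']
print(matching(r"d.?g"))      # ['dg', 'dog', 'dig']
print(matching(r"d.{1}g"))    # ['dog', 'dig']
print(matching(r"d.{1,3}g"))  # ['dog', 'dig', 'doing']
```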
Watch out for the greediness!
<.+> -> this is a <b>bold</b> text: one match, <b>bold</b>
<[^>]+> -> this is a <b>bold</b> text: two matches, <b> and </b>
avoid greediness: put ? after the repetition:
.+? .*? .{2,6}?
<.+?> -> this is a <b>bold</b> text: two matches, <b> and </b>
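The greedy, negated-class and lazy variants can be compared directly in Python. This is a sketch on the slide's sample sentence; the variable names are illustrative.

```python
import re

html = "this is a <b>bold</b> text"

greedy = re.findall(r"<.+>", html)      # runs to the last ">"
negated = re.findall(r"<[^>]+>", html)  # cannot cross a ">"
lazy = re.findall(r"<.+?>", html)       # stops at the first ">"

print(greedy)   # ['<b>bold</b>']
print(negated)  # ['<b>', '</b>']
print(lazy)     # ['<b>', '</b>']
```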
Anchors
^ start of line
^d.*g -> dog eats chicken: matches dog
^d.*g -> The dog eats chicken: no match
$ end of line
.*n$ -> dog eats chicken: matches the whole line
.*n$ -> The dog eats meal: no match
Empty line pattern: ^$
Start of string: \A
unlike ^, never matches at line breaks inside the string
never matches at end of the string
End of string: \Z
unlike $, never matches at line breaks inside the string (some flavors allow one trailing line break)
never matches at the start of the string
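A minimal Python sketch of the anchors. It assumes Python's `re` semantics, where `^` and `$` match at line breaks only when `re.MULTILINE` is set, while `\A` and `\Z` always refer to the whole string. The two-line sample text is an assumption.

```python
import re

text = "dog eats chicken\nThe dog eats meal"
M = re.MULTILINE

first_g = re.findall(r"^d.*g", text)       # ['dog']
ends_n = re.findall(r"^.*n$", text, M)     # ['dog eats chicken']
line_starts = re.findall(r"^\w+", text, M) # ['dog', 'The']
str_start = re.findall(r"\A\w+", text, M)  # ['dog']
line_ends = re.findall(r"\w+$", text, M)   # ['chicken', 'meal']
str_end = re.findall(r"\w+\Z", text, M)    # ['meal']

# ^$ finds the empty line between "a" and "b".
empty_lines = len(re.findall(r"^$", "a\n\nb", M))

print(first_g, ends_n, line_starts, str_start, line_ends, str_end, empty_lines)
```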
Word boundary: \b
zero-length match, like the other anchors
\b matches at three kinds of positions:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
\b (each | marks a boundary):
dog -> |dog|
d g -> |d| |g|
d_g -> |d_g|
d!g -> |d|!|g|
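The boundary positions can be made visible by substituting a marker at every zero-length `\b` match. This sketch assumes Python 3.7 or later; the helper name `mark_boundaries` is hypothetical.

```python
import re

def mark_boundaries(s):
    # Insert "|" at every position where the word-boundary anchor matches.
    return re.sub(r"\b", "|", s)

for sample in ["dog", "d g", "d_g", "d!g"]:
    print(sample, "->", mark_boundaries(sample))
# dog -> |dog|
# d g -> |d| |g|
# d_g -> |d_g|
# d!g -> |d|!|g|
```

The underscore is a word character, so `d_g` has no boundary inside it, while `!` is not, so `d!g` has one on each side of it.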
Capture groups
Regular expressions allow us not just to match text but also to extract information for further processing.
Any subpattern inside a pair of parentheses will be captured as a group.
find: (final) (static) -> private final static String KEY_ZAPHOD="Zaphod";
replace: \2 \1 -> private static final String KEY_ZAPHOD="Zaphod";
You can use | (logical OR, also known as the pipe) to denote different possible sets of characters.
Buy more (milk|bread) -> Buy more milk. Buy more bread. Buy more dogfood.: matches Buy more milk and Buy more bread, not Buy more dogfood
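The find/replace and alternation examples can be run in Python as a sketch. Python writes back-references in the replacement as `\1`, `\2`, as on the slide; other tools may use `$1`, `$2`.

```python
import re

line = 'private final static String KEY_ZAPHOD="Zaphod";'

# Each pair of parentheses captures one group.
groups = re.search(r"(final) (static)", line).groups()

# \2 \1 writes the two captured groups back in swapped order.
swapped = re.sub(r"(final) (static)", r"\2 \1", line)

# Alternation inside a group: findall returns what the group captured.
text = "Buy more milk. Buy more bread. Buy more dogfood."
items = re.findall(r"Buy more (milk|bread)", text)

print(groups)   # ('final', 'static')
print(swapped)  # private static final String KEY_ZAPHOD="Zaphod";
print(items)    # ['milk', 'bread']
```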
Regex tutorials
Regular expressions tutorial
Text processing with AWK
AWK
Text file processing
Workflow: START -> execute BEGIN block -> read line from input stream -> execute AWK commands on the line -> repeat until EOF -> execute END block
BEGIN block
The BEGIN block gets executed at program start-up. It executes only once.
This is a good place to initialize variables.
BEGIN is an AWK keyword and hence it must be in upper-case.
This block is optional.
Syntax
BEGIN { awk-commands }
Body block
The body block applies AWK commands on every input line.
By default, AWK executes commands on every line.
We can restrict this by providing patterns.
There are no keywords for the body block.
Syntax
/pattern/ { awk-commands }
End block
The END block executes at the end of the program.
END is an AWK keyword and hence it must be in upper-case.
This block is optional.
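To make the order of the three blocks concrete, here is a Python sketch that imitates the AWK workflow. It is an analogue, not AWK itself. The function name `run_awk_like` and the sample lines are hypothetical; the corresponding AWK program would look roughly like `BEGIN { count = 0 } /pattern/ { count++ } END { print count }`.

```python
import re

def run_awk_like(lines, pattern):
    # BEGIN block: runs once, before any input is read.
    count = 0
    matched = []

    # Body block: runs for every input line; the pattern restricts it
    # to the lines in which the regular expression is found.
    for line in lines:
        if re.search(pattern, line):
            count += 1
            matched.append(line)

    # END block: runs once, after the last line (EOF).
    return count, matched

count, matched = run_awk_like(["dog eats", "cat sleeps", "dig deep"], r"d.g")
print(count, matched)  # 2 ['dog eats', 'dig deep']
```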