user3034801 February 2016

Regex to find spaces between quotes in Graylog

I'm working on an input extractor issue with IIS logs. We use an "advanced" IIS logging tool to collect more than the basic logs provide. It adds double quotes and spaces to many of the fields, and we are trying to use the extractor to correct this. This is the beginning of an example message:

2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"

We've already written an extractor to remove all the added quotes before running it through all the other extractors to populate the fields, etc., but we want to replace all spaces between the quotes with + before we do that to match the old logging style.

Can anyone point us in the right direction for this? The closest I've come so far is catching " " between SITE and SOURCE and replacing that using something like "([\s]*)". Result:

2016-02-08 16:46:35.957 "SITE+SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX+HTTP/1.1+Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"

I can't seem to only look for spaces between the quotes.

Any help would be greatly appreciated. Thanks.


Further clarification: this portion of the string:

"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"

Should be:

"Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+yie11;+rv:11.0)+like+Gecko"

Everything else should remain the same as those are the only spaces inside of a quoted section of the string.

Is this even possible with regex?

Answers


tobias_k February 2016

I'm afraid that regular expressions are not the best tool for this. You basically have to "count" quotes to determine whether a space is within quotes or not.

You can try something like this (Python):

text = '2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"'
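pieces = []        # output characters, joined once at the end
quote_count = 0    # double quotes seen so far
for c in text:
    if c == '"':
        quote_count += 1
    # An odd count means we are currently inside a quoted section.
    if c == " " and quote_count % 2 == 1:
        pieces.append("+")
    else:
        pieces.append(c)
escaped = "".join(pieces)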

Afterwards, escaped is this:

2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+yie11;+rv:11.0)+like+Gecko"
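
That said, if the quotes are always balanced and never escaped (an assumption, but it holds for the sample line), a regex can do it too. Below is a rough sketch of two variants in Python. The function and pattern names are made up for illustration. The first matches each quoted section and swaps the spaces inside it. The second is a single pattern with a lookahead that matches a space only when an odd number of quotes follows it, which is the same "counting" idea expressed as a regex.

```python
import re

# Hypothetical names; both variants assume balanced, unescaped double quotes.
QUOTED = re.compile(r'"[^"]*"')

def plus_spaces_in_quotes(line):
    """Replace spaces with + inside each double-quoted section."""
    return QUOTED.sub(lambda m: m.group(0).replace(" ", "+"), line)

# A space is inside quotes when an odd number of double quotes
# follows it before the end of the line: one quote, then pairs.
SPACE_IN_QUOTES = re.compile(r' (?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)')

def plus_spaces_lookahead(line):
    """Same result, using only a pattern and a fixed replacement string."""
    return SPACE_IN_QUOTES.sub("+", line)

sample = 'GET /x - "HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1) like Gecko"'
print(plus_spaces_in_quotes(sample))
print(plus_spaces_lookahead(sample))
```

The lookahead variant needs no callback, so it may be usable wherever you can only supply a pattern and a replacement string. Java's regex engine supports lookahead as well, but I have not tried this pattern inside a Graylog extractor, so treat it as a starting point. It also rescans the rest of the line for every space, so it will be slower on long messages.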
