chopper draw lion4 February 2016

Matching special characters with Regex lookahead

I am receiving messages which have underscores:

  _omgitworks_

However, when these messages are formatted by our HTML wire formatter, we receive them with backslashes in front of the underscores:

  \_omgitworks\_

I created the following regex to capture the backslashes and remove them so the text tokenizes correctly:

rawInput.replace(/\\([_])/g, '$1');

However, there is an edge case when a user italicizes text. When the user italicizes text, we receive the messages with NO backslashes but with underscores, and we want to remove the underscores. This is what the received text looks like:

_omgitworks_

I am trying to design a regex which matches a backslash followed by an underscore and removes the backslash (but not the underscore) OR, if there is only an underscore with no backslash, matches just the underscore.

I tried to implement this using lookaheads:

var regex = /\\(?=_)|_/g;
var string = '\\_omgitworks\\_';
string.replace(regex, '');
// >>> "omgitworks"

But it is removing the backslash AND the underscore instead of just the backslash. Are there any nuances to lookaheads that I am overlooking?

Answers


Wiktor Stribiżew February 2016

You can use the following regex:

\\(_)|(^|[^\\])_

And replace with $1$2.

Explanation:

  • \\(_) - the first alternative, matching a \ followed by a _ (the _ is captured into Group 1)
  • | - or...
  • (^|[^\\])_ - the second alternative, matching the start of the string or any character other than \ (captured into Group 2), followed by an underscore.

In the replacement part, we restore the captures using the backreferences. In JS, a group that did not participate in the match is substituted as an empty string in the replacement pattern, so $1$2 is safe to use even when one of the capture groups is empty.
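The difference between the string form and a callback can be sketched as follows. This is only an illustration, assuming a modern JavaScript engine (Node.js or a current browser); the names input, seen, viaCallback and viaPattern are made up for the example. With a replacer function, a group that did not take part in the match arrives as undefined, so it has to be guarded by hand, whereas the $1$2 pattern needs no guard.

```javascript
var re = /\\(_)|(^|[^\\])_/g;
var input = '\\_a_';   // the text \_a_

// Record what each match hands to a replacer callback.
var seen = [];
var viaCallback = input.replace(re, function (match, g1, g2) {
  seen.push({ match: match, g1: g1, g2: g2 });
  // A group that did not participate arrives as undefined here,
  // so fall back to '' explicitly.
  return (g1 || '') + (g2 || '');
});

// The string form needs no such guard.
var viaPattern = input.replace(re, '$1$2');
```

Both viaCallback and viaPattern end up as _a: the escaped underscore keeps its underscore and loses the backslash, while the bare underscore after a is dropped.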

var re = /\\(_)|(^|[^\\])_/g; 
var str = '_omgitworks_\n\\_omgitworks\\_';
var result = str.replace(re, '$1$2');
document.body.innerHTML = "<pre>"+ result + "</pre>";
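
As for why the lookahead attempt in the question strips everything: \\(?=_) consumes only the backslash, so the underscore it looked ahead at is still unconsumed, and the second alternative _ then matches and removes it on the next step. A small sketch that lists the matches makes this visible (the variable names are illustrative only):

```javascript
var lookaheadRe = /\\(?=_)|_/g;
var input = '\\_omgitworks\\_';   // the text \_omgitworks\_

// Every piece the pattern matches, in order:
// a backslash, an underscore, a backslash, an underscore.
var pieces = input.match(lookaheadRe);

// All four pieces are replaced with '', so only the word survives.
var stripped = input.replace(lookaheadRe, '');
```

The lookahead itself behaves as documented; it is the bare _ alternative that cannot tell an escaped underscore from an unescaped one, which is what the (^|[^\\]) prefix in the regex above takes care of.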

Post Status

Asked in February 2016
Viewed 1,413 times
Voted 8
Answered 1 times