Kaleidoscope

Statically-checked inline pattern matching on regular expressions

About

Kaleidoscope is a small library to make pattern matching against strings more pleasant. Regular expressions can be written directly in patterns, and capturing groups bound directly to variables, typed according to the group's repetition. Here is an example:

case class Email(user: Text, domain: Text)

email match
  case r"$user([^@]+)@$domain(.*)" => Email(name, domain)

Strings are widely used to carry complex data, when it's wiser to use structured objects. Kaleidoscope makes it easier to move away from strings.

Features

Availability

Kaleidoscope is available as a binary for Scala 3.4.0 and later, from Maven Central. To include it in an sbt build, use the coordinates:

libraryDependencies += "dev.soundness" % "kaleidoscope-core" % "0.1.0"

Getting Started

Kaleidoscope is included in the kaleidoscope package, and exported to the soundness package.

To use Kaleidoscope alone, you can include the import,

import kaleidoscope.*

or to use it with other Soundness libraries, include:

import soundness.*

Note that Kaleidoscope uses the Text type from Anticipation and the Optional type from Vacuous. These offer some advantages, but they can be easily converted: Text#s converts a Text to a String and Optional#option converts an Optional value to its equivalent Option. The necessary imports are show in the examples.

You can then use a Kaleidoscope regular expression—a string prefixed with the letter r—anywhere you can pattern match against a string in Scala. For example,

import anticipation.Text

def describe(path: Text): Unit =
  path match
    case r"/images/.*" => println("image")
    case r"/styles/.*" => println("stylesheet")
    case _             => println("something else")

or,

import vacuous.{Optional, Unset}

def validate(email: Text): Optional[Text] = email match
  case r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}$$" => email
  case _                                            => Unset

Such patterns will either match or not, however should they match, it is possible to extract parts of the matched string using capturing groups. The pattern syntax is exactly as described in the Java Standard Library, with the exception that a capturing group (enclosed within ( and )) may be bound to an identifier by placing it, like an interpolated string substitution, immediately prior to the capturing group, as $identifier or ${identifier}.

Here is an example of using a pattern match against filenames:

enum FileType:
  case Image(text: Text)
  case Stylesheet(text: Text)

def identify(path: Text): FileType = path match
  case r"/images/${img}(.*)"  => FileType.Image(img)
  case r"/styles/$styles(.*)" => FileType.Stylesheet(styles)

Alternatively, as with patterns in general, this can be extracted directly in a val definition.

Here is an example of matching an email address:

val r"^[a-z0-9._%+-]+@$domain([a-z0-9.-]+\.$tld([a-z]{2,6}))$$" =
  "test@example.com": @unchecked

The @unchecked annotation ascribed to the result is standard Scala, and acknowledges to the compiler that the match is partial and may fail at runtime.

If you try this example in the Scala REPL, it would bind the following values:

> domain: Text = t"example.com"
> tld: Text = t"com"

In addition, the syntax of the regular expression will be checked at compile-time, and any issues will be reported then.

Repeated and optional capture groups

A normal, unitary capturing group, like domain and tld above, will extract into Text values. But if a capturing group has a repetition suffix, such as * or +, then the extracted type will be a List[Text]. This also applies to repetition ranges, such as {3}, {2,} or {1,9}.

Note that {1} will still extract a Text value. The type is determined statically from the pattern, and not dynamically from the runtime scrutinee.

A capture group may be marked as optional, meaning it can appear either zero or one times. This will extract a value with the type Optional[Text]; that is, if it present it will be a Text value, and if not, it will be Unset.

For example, see how init is extracted as a List[Text], below:

import gossamer.{drop, Rtl}

def parseList(): List[Text] = "parsley, sage, rosemary, and thyme" match
  case r"$only([a-z]+)"                      => List(only)
  case r"$first([a-z]+) and $second([a-z]+)" => List(first, second)
  case r"$init([a-z]+, )*and $last([a-z]+)"  => init.map(_.drop(2, Rtl)) :+ last

Escaping

Note that inside an extractor pattern string, whether it is single- (r"...") or triple-quoted (r"""..."""), special characters, notably \, do not need to be escaped, with the exception of $ which should be written as $$.

It is still necessary, however, to follow the regular expression escaping rules, for example, an extractor matching a single opening parenthesis would be written as r"\(" or r"""\(""".

Globs

Globs offer a simplified and limited form of regular expression. You can use these in exactly the same way as a standard regular expresion, using the g"..." interpolator instead.

License

Kaleidoscope is copyright © 2024 Jon Pretty & Propensive OÜ, and is made available under the Apache 2.0 License.