Chapter 02: Semigroups

In chapter one we learned about TypeClasses, now we'll see what we can do with them. From this chapter on, we will learn some algebraic structures that are easy to learn and gives us a basic understanding of algebraic structures. We'll start with one of the most basic structures: Semigroup.

Definition

Semigroup is one of the most simple algebraic structure that we'll work with. Since we don't want to delve into Category Theory, suffice to say that most of these algebraic structures (especially the ones with strange names like SEMIGROUP!!!) are borrowed from Category Theory.

A Semigroup knows how to combine two instances of specific type into an instance of the same type: T, T => T. Mathematically speaking, a Semigroup is an algebraic structure that has a set with a binary associative function defined on that set.

Associativity means that changing order of parathesis must not affect the result, for example + and * are associative, : (a + b) + c == a + (b + c) == a + b + c. but - is not.

For example (in mathematics) the set of Integers and the function add form a semigroup, because add is both binary and associative. An also the set of Integers and the function multiply form a semigroup too.

In programming, we can translate it to: A semigroup is a TypeClass defined for a type T with an associative function def combine(T, T): T.

Semigroup TypeClass

Ok, it's a little scary, so let's speak in our language!

trait Semigroup[T] {
    def combine(left: T, right: T): T
}

This is the TypeClass for semigroup. it has a binary function (with two parameters). Note that the combine should be associative, but at this moment there is no way to guaranty associativity at the compile-time (at least not that I know of) and we have to ensure it with design and tests.

Let's implement a few Semigroups to make it clear.

  implicit val intMulSemigroup = new Semigroup[Int] {
    def combine(left: Int, right: Int): Int = left * right
  }

  implicit val stringSemigroup = new Semigroup[String] {
    def combine(left: String, right: String): String = left + right
  }

  type IntVector = (Int, Int)
  implicit val intVectorSemigroup = new Semigroup[IntVector] {
    def combine(left: IntVector, right: IntVector): IntVector =
      (left._1 + right._1, left._2 + right._2)
  }

I think everything is more clear now. Note that Semigroup is a TypeClass and these are some instances. Can you implement Syntax and Ops classes for Semigroup to make it easier to use ?

// Exercise 2.1: Implement Syntax and Ops classes for Semigroup so that you can test it like this:
  "SemigroupSyntax" should "provide an easy way to combine values" in {
    import SemigroupSyntax._
    import BasicSemigroupInstances._

    2 |+| 3 shouldBe 6
    "Hello" |+| " " |+| "World!" shouldBe "Hello World!"
    (1, 1) |+| (2, 3) shouldBe (3, 4)
  }

Checkout the answer and also the source code for tests.

Semigroup with subtype polymorphism

If you think that the title is a little bit scary, it's not. You may know subtype polymorphism as generics, or parametric types, and here we want to know how to use Semigroup for parametric types. To better understand this concept and see the true power of abstractions using Semigroups, let's walk through an example. Remember IntVector from our previous sample? It's fairly simple, we can treat any Tuple2 of integers as a vector (we call them IntVector) and we can add two IntVectors together. But in this model, we're kind of stuck with integers. What if we want Vectors of Long ints ? or Doubles? If we want to add two vectors, what types can we use other than Int, Long, and Double? Think about it. When we're talking about vectors and adding them, we can use any type, as long as we can add two instances of the type together in a consistent way with rules regarding to add vectors. For example, think of string vectors ! ('a' , 'b') + ('c' , 'd') => ('ac' , 'bd') . This way we can easily have string vectors, BigInt vectors or anything else! So how do we implement this way of thinking ? Using Semigroup TypeClass! Let's rewrite the previous sample in this way:

  type Vec[T] = (T, T)
  implicit def vectorSemigroup[T](implicit semigroup: Semigroup[T]): Semigroup[Vec[T]] = new Semigroup[Vec[T]] {
    override def combine(left: (T, T), right: (T, T)): (T, T) =
      (semigroup.combine(left._1, right._1), semigroup.combine(left._2, right._2))
  }

or even nicer!

  import SemigroupSyntax._
  type Vec[T] = (T, T)

  implicit def vectorSemigroup[T: Semigroup]: Semigroup[Vec[T]] = new Semigroup[Vec[T]] {
    override def combine(left: (T, T), right: (T, T)): (T, T) =
      (left._1 |+| right._1 , left._2 |+| right._2)
  }

Ok let's break it down piece by piece. First we define a type alias Vec[T], in Scala, type aliases are just a syntactic sugar to make working with complex types a little easier. a type alias doesn't do anything specific, it's just a reference to a definition, which means that you can replace it with the definition wherever you see it. It this example, wherever you see Vec[T], you can replace it with (T, T), that's it.

Next we define a Semigroup instance for type Vec[T]. This function creates an instance of Semigroup for Vec[T] if you provide a Semigroup instance for type T. Which means for every type T, if you can combine two instances of T, it can combine two vectors of T, and of course combining two instances of T we need a Semigroup[T].

We've implemented the answer in two styles, which have two differences: 1. The way to pass implicit instance for Semigroup[T]: The first style uses direct implicit parameter, the second style uses Haskell-like TypeClass style : [T: Semigroup] 2. Implementation: The first style directly uses the provided semigroup to combine two instances of T. The second style uses SimegroupSyntax and SemigroupOps which is easier to read and understand.

Time for exercise! Implement an instance for Option, Note that in the following test we use some and none syntactic sugar for option (10.some) which we implemented in previous chapter, try to implement it again. Beware of the typing, 10.some should produce the type Option[Int] not Some[Int], For example, if you type val o = 10.some and checkout the type inferred for o, you'll see Option[Int]. Here's the syntactic sugar for none[Int]:

  def none[T]: Option[T] = None

Note that this function produces Option type as well. Can you guess the reason ? It's because we want to use these values with semigroup and thus we need implicit conversion and implicit parameters, and implicit resolution in Scala compiler is not comfortable with using Option and Some interchangeably.

// Exercise 2: Implement an instance for Semigroup[Option[T]], so that your implementation satisfies the following test

  "Option Semigroup" should "combine two options" in {
    import BasicSemigroupInstances._
    import SemigroupSyntax._
    import Ex2._

    (10.2, 11.3).some |+| (12.3, 13.5).some shouldBe (22.5, 24.8).some

    10.some |+| 20.some shouldBe 200.some

    "Something".some |+| none[String] shouldBe none[String]
    none[String] |+| "Something".some shouldBe none[String]

    none[Int] |+| none[Int] shouldBe none[Int]

  }

Checkout the answer and also the source code for tests.

Easy, right? let's harder examples!

// Exercise 3: Implement an instance for Semigroup[Future[T]] and another one for lists!

  "Semigroup[Future]" should "combine values inside futures" in {
    import Ex3._
    import SemigroupSyntax._
    import BasicSemigroupInstances._
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    val future = Future.successful(12 -> 24) |+| Future.successful(23 -> 42)
    Await.result(future, 2.seconds) shouldBe (35, 66)

    List(1, 2, 3) |+| List(4, 5, 6) shouldBe (1 to 6).toList
  }

Checkout the answer and also the source code for tests.

More applications

Despite being simple, Semigroups have many powerful applications. One of the most useful applications of Semigroups is a concept called Reduction. Reduction is the process of combining many objects into a single one. For example, when you have a list of integers, you can reduce it into a single one by summing (or multiplying) all of them. Reduction is one of the core functions in Map-Reduce (hence the name) which has powered the big data engines for years. Note that you can't use just any T, T => T as a reduction function. Think for a minute, what makes Semigroup so special? What is the main difference between a Semigroup and the little binary function T, T => T? Why does it matter so much in the big data world? Try to guess the answer before reading the next passage.

If you look back at the Semigroup's definition above, the only thing that differentiates between a simple binary function and a semigroup is associativity! That's right!! Why on earth do you ask? Because when our binary operation is associative, it means that we can run our operations in parallel. Let's say we want to reduce 1, 2, 3, 4 with + (e.g. you can't use - because it's not associative and the result changes by changing the order of execution, try it), since summing is associative we can sum 1,2 in one thread and 3,4 in another thread in parallel and then reduce it one more time to form the final result, the reduction process looks something like this

1, 2, 3, 4
    =>
  3 , 7
    =>
    10

Our tiny list of four numbers may look innocent and you might ask why bothering summing them in parallel, but when you have millions and millions and millions of numbers that won't even fit in a single server, the reduction process is inevitable. And the good news is with Semigroup, it doesn't matter what is the operation or what is the type, you can reduce anything as long as you can define a Semigroup instance for it!

Ok, enough talk! let's test it. Try to implement a simple (non-parallel) reduction operation that reduces a list, the result is optional since we can't reduce an empty list. Note that Scala's default implementation of reduce returns a concrete result instead of an optional one and throws an execption for empty inputs.

  // Exercise 4: Implement reduce function:
  // def reduce[T: Semigroup](list: List[T]): Option[T]
  "reduce" should "return None if the list is empty" in {
    reduce(List[Int]()) shouldBe none[Int]
  }

  it should "return the head if the list has just one element" in {
    reduce(List("Test")) shouldBe "Test".some
    reduce(List(12)) shouldBe 12.some
  }

  it should "sum all elements in a long list" in {
    val term = "Hello world!"
    val list = term.toList.map(_.toString)
    reduce(list) shouldBe term.some
    // Note that the Semigroup[Int] that we've imported here multiplies numbers
    reduce(1 :: 2 :: 3 :: 4 :: 5 :: Nil) shouldBe 120.some

Checkout the answer and also the source code for tests.

Scalaz

Like everything else in this book, Scalaz offers great implementations for Semigroup and many useful instances, you just have to import it all!

import scalaz._
import Scalaz._

Have fun!

Last updated

Was this helpful?