Chains: My Attempt at an Itertools for Go
Top Matter: GitHub for the library, doc for the library.
It’s been six months since I’ve done this, but I’m finally writing about it!
Go recently added proper iterator support to the language, which is something of an improvement over the prior pattern of spinning up a goroutine and communicating via a channel in a range
to get a stream of values.
One of the tools in my toolbox that I use in coding interviews and some light data processing work is Python’s itertools. The nice thing about this library is it gives you a good set of conceptual building blocks to use as a frame around a problem and a fairly clean way to use them. Once you’re familiar with, say, combinations
and permutations
you can take a harder problem and decompose it into those recognizable parts and then have a stdlib function that’s already bug free and readily available.
While I was inspired to author this, I was writing a lot of Ruby and Typescript. Both Ruby and Javascript do processing over lists in a very chainy way; for example
Object.entries(thing)
.filter(([key, value]) => key.startsWith("data-"))
.map(([key, value]) => `${key.replace(/^data-/, "")}: ${value}`);
Ruby likes to add lots of compact calls etc. as well to handle bad data.
Having this syntactic sugar makes it easier to write complex logic, and it also helps conform the logic to one’s brain.
Anyway, Go has iterators now, and I like using iterators. The first thing I wanted was my brain poisoning syntactic sugar from Ruby/Typescript; how could I go about doing X.Filter(g => g > 100).Map(h => fmt.Sprintf("Hello, %v!", h))
in Go?
The Cookbook is the Requirements Doc
Once I had a framework cooking I could start thinking of examples. My test suite took a backseat to being a test of cases I cared about, in cookbook form.
Some things I wanted:
- Map/Filter/Reduce
- Cleanups (compacts, nonzeroes, etc)
- Combinatorics
- Higher-level stream processing (various merges)
Map/Filter/Reduce
Not much to write home about here, anyone can write these and I encourage each person to do it themselves using Go iterators. There’s the usual suspects here along with various types of Reduce
that all boil down to the base Accumulate
.
Cleanups
Similar to the above, pretty trivial to write. Can even treat these as specific cases of Filter
. Compact
is one such case.
Basic Tests
- There’s an
Any
and anAll
for testing for conditions satisfied by the entire sequence;Count
gets just the length. All are useful withTee
!
Slicing
First
andLast
do what they say;FirstAndRest
is the CAR/CDR you didn’t know you needed.- You can
TakeWhile
andDropUntil
to start/end at a particular point. - You can
Repeat
an item N times,Rotate
a slice so the first N items are moved to the back,Repeat
each element N times, andCycle
the iterator which is just infinite repeats.
Ruby Silliness
Flatten
takes an iterable of iterables and turns it into a flat iterable, but it doesn’t do it to arbitrary levels of nesting like Ruby does. There are rules here dude.Tap
is largely useless, kind of like a forEach or a visitor that passes the item along.Partition
splits an iterator into two based on a partition function, allowing you to e.g. split good/bad inputs into separate pipelines. A simplerUniq
function just returns the first value of each key grouping instead.- Similarly,
GroupBy
takes an ordered set of items and “bins” them based on a key function, allowing you tofor key, vals := range Partition { for val := range vals { ...} }
- There’s also a
Tee
to get a tear-off copy of the iterator.
Combinatorics
I wanted combinations
and permutations
, so those were high on the list. I found myself writing the code more and more generically as I went along, eventually ending with the mess that is combinatorics.go
. Funnily enough, each combinatorial case was some combination of:
- Length (one, fixed, variable)
- Ordering (in order of occurrence, free variance)
- Repetition of elements (on/off)
Function | Length | Ordering | Repetition |
---|---|---|---|
(Identity function, superfluous) | N (Length of inputs) | Fixed | No |
OrderedPermutationsOfLength |
M (User-specified) | Fixed | No |
AllOrderedPermutations |
1…N (Length of inputs) | Fixed | No |
OrderedPermutationsToLength |
1…M (User-specified) | Fixed | No |
Permutations |
N (Length of inputs) | Free | No |
PermutationsOfLength |
M (User-specified) | Free | No |
AllPermutations |
1…N (Length of inputs) | Free | No |
PermutationsToLength |
1…M (User-specified) | Free | No |
Combinations |
N (Length of inputs) | Free | Yes |
CombinationsOfLength |
M (User-specified) | Free | Yes |
AllCombinations |
1…N (Length of inputs) | Free | Yes |
CombinationsToLength |
1…M (User-specified) | Free | Yes |
Windows
Not as high-level as Combinatorics, but I wanted to take a window of N at as time.
Function | Length | Overlaps | Examples |
---|---|---|---|
Windows |
1…M | No | {1, 2, 3}, 2 -> {1, 2}, {3} |
SlidingWindows |
M | Yes | {1, 2, 3}, 2 -> {1, 2}, {2, 3} |
Higher-Level Chaining
I like to concatenate iterators sometimes; e.g. to process the results of two tasks in a single queue. itertools.chain
works in Python, it was a breeze to write here as well.
for x := range chains.FlattenArgs(chains.Each([]int{1, 2, 3}, []int{4, 5, 6})) {
/// Gives you 1, 2, 3, 4, 5, 6
}
Got the base case down.
Things I Never Had in Itertools: Merging Iterators
itertools.chain
works in a pinch for “gluing” together iterators in Python, and that was pretty trivial to write (I called it “Flatten” because the first iteration took an iterator of iterators, thanks to Ruby/Javascript brain, then made one that took variadic iterator args as FlattenArgs
too). Some common use cases I find myself writing a lot just aren’t in the stdlib in Python. They generally involve taking many iterators and unifying then in ways dependent on the structure of the iterators themselves; either by length or by value.
Combining Streams “Fairly”
Another use case is Round-Robining a set of iterators until they are all exhausted. We’ve got a set of inputs and want to consume from all of them until we run out. For that, there’s RoundRobin
, which works exactly as expected. Each iterator can have a variable number of entries but all entries are considered by index, so we don’t exhaust one before going to the next but try to consume them all equally.
item1 := []int{1, 2, 3, 4}
item2 := []int{5, 6, 7, 8, 9}
item3 := []int{10, 11, 12}
for x := range RoundRobin(Each(item1), Each(item2), Each(item3)) {
// Equivalent of ...[]int{1, 5, 10, 2, 6, 11, 3, 7, 12, 4, 8, 9}
}
Merging Sorted Streams
Use case: I have 3000 CSVs, each has rows in order of date. The time ranges may overlap in some cases. I wanted a unified stream of all the rows in order. For that, I wrote Merged
, which is at its core a pull-on-demand heap. Once the smallest value has been pulled off the heap, yield it, then grab the next value from the iterator that provided the value to place on the heap.
item1 := []int{1, 2, 3, 4}
item2 := []int{4, 5, 6, 7}
item3 := []int{1, 5, 8, 10}
expectedSlice := []int{1, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 10}
for x := range Merged(Each(item1), Each(item2), Each(item3)) {
// Equivalent of ...[]int{1, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 10}
}
My Journey Of Discovery Got Me Using My Favored Pattern In A More Go-Like Way In Go
Starting Out: The Chain
I wanted to be able to chain calls like I can in those other languages, so the first thing I thought to do was design some sort of struct or interface with various .Map
/.Reduce
/.Filter
/etc methods – so like
type Chainable interface {
Map(mapFunc(T) V) Chainable[V]
Reduce(mapFunc(T) V) Chainable[V]
Filter(mapFunc(T) V) Chainable[V]
}
Immediately there’s a problem because Interfaces can’t use generics in Go, so we do a struct instead:
type Chainable[T] struct {
func(c *Chainable) Map(mapFunc(T) V) Chainable[V] { ... }
Reduce(mapFunc(T) V) Chainable[V] Chainable[V] { ... }
Filter(mapFunc(T) V) Chainable[T] Chainable[V] { ... }
}
This sort of works! You can see in the IterableSequence
type we do this. But to do two types (say we’re mapping from int
to string
), we have to have an IterableSequence
that does two, and so IterableSequence2
was born.
Now how to get from an IterableSequence
to an IterableSequence2
? I decided on a top-level function to create a type called a Junction
to go from a one-typed chainable to a two-typed one. Now what if we need a third type? This is getting messy1.
This pattern works for simple cases just fine, but it falls down once we get into the variadic world. Go’s obviously stunted-on-purpose generics are preventing us from doing this syntactic sugar in a clean way, but it is also suggesting a different way to do it.
I was in love with my ability to do chained iterators, but they got clunky. Go generics only apply to functions and you can’t template an interface. So while X.Filter(...).Map(...)
is fun and cute, in Go you’re better off doing something like:
f := Filter(X, filterFunc)
m := Map(f, mapfunc)
for results := range m {...}
…which, quite frankly, feels a lot more Go-like and less foreign than the cute way we do it in other languages. You can see I gave up on chaining in the above examples and just do individual iterators.
And so, instead I found myself using single iterators via the Each
adapter function and back to slices with ToSlice
. As I got further into implementing the various functions I wanted, I moved away from the Chainable
pattern into simple functions. It’s still ugly to do Map(Filter(Reduce(...)))
because the pipeline appears in opposite order but doing each as an assignment keeps the order at the expense of slightly more verbosity. It’s not as aesthetic but it works.
I think the most practical example I can give is the test in the cookbook that generates a sequence of fights in Street Fighter. If you play as a playable character you can fight all the other playable characters and each of the bosses. You cannot play as the bosses. As such, we have two separate matchup types:
- Player v Player, unordered
- Player v Boss, unordered
And gluing the two together is pretty clean:
regularFighters := []string{"Ryu", "Chun Li", "Guile", "E. Honda"}
bosses := []string{"Sagat", "Vega", "M. Bison"}
allExpectedFights := []string{
"Ryu vs. Chun Li",
"Ryu vs. Guile",
"Ryu vs. E. Honda",
"Chun Li vs. Guile",
"Chun Li vs. E. Honda",
"Guile vs. E. Honda",
"Ryu vs. Sagat",
"Chun Li vs. Sagat",
"Guile vs. Sagat",
"E. Honda vs. Sagat",
"Ryu vs. Vega",
"Chun Li vs. Vega",
"Guile vs. Vega",
"E. Honda vs. Vega",
"Ryu vs. M. Bison",
"Chun Li vs. M. Bison",
"Guile vs. M. Bison",
"E. Honda vs. M. Bison",
}
// Each combination of players without replacement
matchups := CombinationsOfLength(regularFighters, 2)
singlePlayerFights := Map(matchups, func(names []string) string {
return strings.Join(names, " vs. ")
})
// Trick to get pairwise fights from two lists -- lengthen the one by
// the number of elements in the other, then cycle.
// e.g. if you have P1, P2, P3 and B1, B2:
// Cycle --> P1, P2, P3, P1, P2, P3, P1, ...
// Lengthen --> B1, B1, B1, B2, B2, B2
// (each boss is repeated number of players times)
bossMatchups := Zip(
Cycle(Each(regularFighters)),
Lengthen(
Each(bosses),
len(regularFighters),
),
)
bossMatchupFights := Map2(bossMatchups,
func(p1, p2 string) string {
return strings.Join([]string{p1, p2}, " vs. ")
},
)
// Combine all iterators into one
allFightsCombined := FlattenArgs(singlePlayerFights, bossMatchupFights)
allFights := ToSlice(allFightsCombined)
if !slices.Equal(allFights, allExpectedFights) {
t.Fatalf("%v != %v", allFights, allExpectedFights)
}
Once I started doing things the Go way, it really increased the pace of development as well. Being able to implement each iterator as a simple function meant I could focus on implementation and not boilerplate. Constraining myself to iter.Seq
and iter.Seq2
relieved me of the analysis paralysis of variadic iterators: one value, and when it made sense, two.
Don’t Use This As A Library
I’ve made this available as an importable library, but many of these patterns are easier to just copy and paste into your code. They should also be inspiration: this is a fun problem to solve! Solve it yourself!
-
After posting this, I discovered another person had done something similar but used runtime reflection to sacrifice compile-time safety for the syntactic cleanliness I was going after. I respect the approach and I like the depth of knowledge of the language needed to do this. I’m going to say which implementation is better is a matter of knowing the tradeoffs: are you willing to sacrifice compile-time safety for convenience? It’s probably an unequivocal yes if you have good test coverage. ↩︎