Programming with sequences part 1 - introduction to powerseq
Introduction
Functional programming gives us many wonderful things, and one of the best things for me is the way we can work with collections of data. Every developer, day in and day out, loops over collections, aggregates some values, maps one type of collection to another type of collection. If you change the way you look at those kinds of operations, everything will change for you. It will have a more significant impact on your programming skills than many other fancy libraries, tips, patterns, tools, architectures, etc. It allows you to write simple, concise, easy-to-understand, and maintainable code.
In this article, we will be talking about lazy sequences of values and the operators that can be performed on them. All code samples will be written in JavaScript, but you can easily apply this knowledge to any other language or runtime. I am aware that most developers know and use operators like map, filter, and reduce in their code, but I have also frequently encountered bad or entirely unnecessary usage of those operators. The true power comes from the number of operators available to us and the way we can compose them together.
First, I will try to explain why a declarative style of programming is better than an imperative style (at least for me). Then we will extract some typical code snippets and implement equivalent operators ourselves. We can treat this article as an introduction to the powerseq library. In the following parts of the series, I will demonstrate real-world scenarios implemented using sequence operators. But to be honest, it is not about any particular library or even programming language; it is about a way of thinking.
Sequences vs. collections (iterators, generators)
Let’s take a look at the for/of loop in JavaScript:
const numbers = [1, 5, 3, 9, -1, 5, -12, 0, 44, 12, -100];
for (const n of numbers) {
console.log(`${n} zl`);
}
Looping over a collection of items using for/of is the same as executing the following code:
const iterable = numbers;
const iterator = iterable[Symbol.iterator]();
let result;
while (!(result = iterator.next()).done) {
console.log(`${result.value} zl`);
}
It’s a JavaScript way of implementing the “iterator design pattern,” which can be expressed using the TypeScript interfaces Iterable and Iterator:
interface Iterable<T> {
[Symbol.iterator](): Iterator<T>;
}
interface Iterator<T> {
next(): { done: boolean; value: T };
}
The Iterable<T> interface represents a sequence of values. In contrast to a typical collection of values like Array, a sequence does not have a length property returning the number of items. A sequence is a lazily evaluated generator of values that can potentially be infinite. The consumer of the sequence decides how many items it needs to process and can always stop processing at any time by escaping the loop using break or return.
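We can see this protocol in action by driving an iterator by hand:
const it = [10, 20][Symbol.iterator]();
console.log(it.next()); // -> { value: 10, done: false }
console.log(it.next()); // -> { value: 20, done: false }
console.log(it.next()); // -> { value: undefined, done: true }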
Because the Array type implements the Iterable<T> interface, we can iterate over the items of an array using a for/of loop. We can always create our own objects implementing Iterable<T>:
function range(start, count) {
return {
[Symbol.iterator]() {
const end = start + count;
let i = start;
return {
next() {
return i < end ? { done: false, value: i++ } : { done: true };
},
};
},
};
}
var numbers = range(1, 5);
console.log(Array.from(numbers).join()); // -> 1,2,3,4,5
console.log(Array.from(numbers).join()); // -> 1,2,3,4,5
This function returns an object that creates a sequence of count numbers starting with start. We can iterate over the sequence or convert it into an array using the built-in static method Array.from(numbers) or the spread operator [...numbers]. The returned object generates values only on demand, so we can even call range(1, Number.MAX_VALUE) and nothing happens. It’s just a recipe for generating values, not the values themselves.
const numbers = range(1, Number.MAX_VALUE);
for (const number of numbers) {
if (number > 5) {
// leave the loop
break;
}
console.log(number);
}
Writing objects that implement the Iterable<T> interface manually can be cumbersome. As we saw above, the code can become very verbose and hard to understand, especially as the complexity of the operation grows. This is why the creators of the JavaScript language provided us with a feature called generators. The same range function can be written as follows:
function* range(start, count) {
const end = start + count;
for (let i = start; i < end; i++) {
yield i;
}
}
const numbers = range(1, 5);
console.log(Array.from(numbers).join()); // -> 1,2,3,4,5
console.log(Array.from(numbers).join()); // -> <no items> !
A generator function must be marked with an asterisk after the function keyword (function*). This means that the function returns a JavaScript object implementing the Iterable<T> interface. In addition to the regular return ... (which ends the sequence), we can use yield ... every time our function generates a new value. We have to get used to the new syntax, but with time, the code becomes much easier to write and understand compared to a manual implementation of the Iterable<T> interface. As an example, take a look at how natural the implementation of an infinite sequence looks:
function* repeat(value) {
while (true) {
yield value;
}
}
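Of course, we must never try to materialize such a sequence eagerly; calling Array.from(repeat(0)) would loop forever. We can only consume it lazily, stopping when we have enough:
for (const x of repeat("ping")) {
  console.log(x); // -> ping
  break; // without this, the loop would run forever
}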
I’m not sure if you’ve noticed one small detail in the range example above. The generator function creates an iterable object numbers, then we iterate over all items in the sequence until we reach the last one. If we try to iterate over the same object again, we won’t get any items. It was quite surprising when I encountered this behavior for the first time. In some languages like C#, F#, or Dart, a generator function produces values from the beginning every time we start a new iteration. JavaScript (and, for example, Python) took a different approach: the second iteration continues where the previous one left off. This happens because the generator function creates an iterable object that returns the same instance of the iterator object every time we execute the iterable[Symbol.iterator]() method. I personally prefer the behavior found in languages like C# and F#, so I implemented it in the powerseq library.
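If we want the re-iterable behavior with plain generator functions, one simple trick is to wrap the generator function in an object that starts a fresh generator for every iteration. This is just a sketch of the idea (the toIterable helper is hypothetical, not powerseq's actual implementation):
function toIterable(generatorFn, ...args) {
  return {
    [Symbol.iterator]() {
      // every iteration starts a brand new generator
      return generatorFn(...args);
    },
  };
}
const numbers = toIterable(range, 1, 5);
console.log(Array.from(numbers).join()); // -> 1,2,3,4,5
console.log(Array.from(numbers).join()); // -> 1,2,3,4,5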
filter
Let’s take a look at the following code:
const numbers = [1, 5, 3, 9, -1, 5, -12, 0, 44, 12, -100];
for (const n of numbers) {
if (n > 0) {
console.log(`${n} zl`);
}
}
It’s a very typical loop iterating over a collection of items and performing some operation only for selected items. We could implement the same functionality using the built-in Array method called filter.
for (const n of numbers.filter((n) => n > 0)) {
console.log(`${n} zl`);
}
For me, the second approach is definitely better. In the first approach, we filter the collection of items using a for/of loop. The loop is a very primitive, general-purpose structure; we can do anything with it. It’s like reverse engineering: we need to carefully read every loop to understand the programmer’s intent. In the second case, we immediately see what we are trying to achieve. However, there is a small issue with this approach as well. Calling the filter method creates a new temporary array in memory only to iterate over it once, and this array has to be cleaned up by the garbage collector. To eliminate this small waste of resources, we can implement our own filter function based on JavaScript generators.
function* filter(items, f) {
for (const item of items) {
if (f(item)) {
yield item;
}
}
}
for (const n of filter(numbers, (n) => n > 0)) {
console.log(`${n} zl`);
}
Now, the code is readable and no longer introduces an unnecessary array.
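There is another benefit: because this filter is lazy, it even works with infinite sequences. A small illustrative sketch (naturals is a hypothetical helper, not part of our toolbox):
function* naturals() {
  let i = 1;
  while (true) {
    yield i++;
  }
}
for (const n of filter(naturals(), (n) => n % 3 === 0)) {
  if (n > 18) {
    break;
  }
  console.log(n); // -> 3, 6, 9, 12, 15, 18
}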
map
We can go even further looking at the code above. We iterate over a collection of numbers only to convert them into a collection of strings. Once again, it’s a very common pattern that we frequently encounter in our code. We could use the map method of the Array type, but again, it would introduce a temporary array. Let’s implement our own map function:
function* map(items, f) {
for (const item of items) {
yield f(item);
}
}
const positiveNumbers = filter(numbers, n => n > 0);
const positiveCurrencies = map(positiveNumbers, n => `${n} zl`);
for (const c of positiveCurrencies) {
console.log(c);
}
pipe
The final result looks like a SQL query, where the filter operator is the WHERE clause and the map operator is the SELECT clause. We can imagine many additional operators like distinct, max, join, and so on. You might be wondering whether there is a simple way to avoid introducing temporary variables like positiveNumbers or positiveCurrencies. Yes, there is, and it’s quite straightforward, thanks to the pipe function:
function pipe(value, ...funcs) {
return funcs.reduce((v, f) => f(v), value);
}
pipe(
10,
(v) => v + 1,
(v) => v * 10,
(v) => v.toString()
); // -> "110"
The pipe function takes a value and any number of functions funcs; each function takes one argument and returns one result. The pipe function calls the first function from the funcs collection, passing the value argument. The returned result is then passed to the second function, and so on and so forth. The final result becomes the result returned from the pipe function. In many programming languages, like OCaml, F#, and Elm, the pipe function is available as the |> infix operator, and it is a crucial part of many codebases. There is a proposal to add a similar pipeline operator to JavaScript as well. Now let’s look at how the pipe function can help us with our previous example:
const positiveCurrencies = pipe(
numbers,
(x) => filter(x, (n) => n > 0),
(x) => map(x, (n) => `${n} zl`)
);
It’s slightly better because we don’t have to introduce temporary variables, and the whole piece of code remains a single expression. Previously, there were two separate statements for the declaration and initialization of two variables. However, it still does not look perfect; we can improve it further. Let’s change the implementations of the filter and map functions to support new overloads:
function filter(...args) {
return args.length === 2 ? filter_(...args) : (s) => filter_(s, ...args);
function* filter_(items, f) {
let index = 0;
for (const item of items) {
if (f(item, index)) {
yield item;
}
index++;
}
}
}
function map(...args) {
return args.length === 2 ? map_(...args) : (s) => map_(s, ...args);
function* map_(items, f) {
let index = 0;
for (const item of items) {
yield f(item, index);
index++;
}
}
}
const positiveCurrencies = pipe(
numbers,
filter((n) => n > 0),
map((n) => `${n} zl`)
);
Now we can use those operators in one of two ways: as a stand-alone operator, passing two arguments, or as an operator used inside a pipe call, passing one argument. We have also introduced a small additional feature: an optional index argument passed next to the item argument to the function f.
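The index behaves just like the second callback parameter of the built-in Array methods. For example, we can filter by position, using either calling style:
// stand-alone: the sequence is passed as the first argument
[...filter(numbers, (n, i) => i % 2 === 0)]; // -> [ 1, 3, -1, -12, 44, -100 ]
// inside pipe: only the lambda is passed
[...pipe(numbers, filter((n, i) => i % 2 === 0))]; // -> [ 1, 3, -1, -12, 44, -100 ]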
toarray
In our example, we have printed the positive numbers in a currency format to the console. However, maybe we would like to create an array containing the final result instead. Let’s introduce a helper function called toarray:
function toarray(...args) {
return args.length === 1 ? toarray_(...args) : (s) => toarray_(s);
function toarray_(items) {
return Array.isArray(items) ? items : Array.from(items);
}
}
pipe(
numbers,
filter((n) => n > 0),
map((n) => `${n} zl`),
toarray()
);
// -> [ '1 zl', '5 zl', '3 zl', '9 zl', '5 zl', '44 zl', '12 zl' ]
take
Finally, let’s add one last operator called take. It takes two arguments: a sequence of items and a number n. It returns a new sequence containing the first n items of the original sequence, so it’s similar to the TOP operator in SQL.
function take(...args) {
  return args.length === 2 ? take_(...args) : (s) => take_(s, ...args);
  function* take_(items, n) {
    let i = n;
    if (i > 0) {
      for (const item of items) {
        yield item;
        if (--i === 0) {
          return;
        }
      }
    }
  }
}
const numbers = [1, 5, 3, 9, -1, 5, -12, 0, 44, 12, -100];
[...take(numbers, 4)]; // -> [ 1, 5, 3, 9 ]
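Because take is lazy, it can even be safely combined with the infinite repeat sequence we wrote earlier:
[...take(repeat("x"), 3)]; // -> [ 'x', 'x', 'x' ]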
The power of lazy evaluation
The code below compares two implementations of the same task side by side. The first one uses built-in Array methods, while the second one uses the functions we implemented previously.
const numbers = [1, 5, 3, 9, -1, 5, -12, 0, 44, 12, -100];
// 1. using built-in Array methods
var result = numbers
.filter((n) => n > 0)
.map((n) => `${n} zl`)
.slice(0, 5);
// -> [ '1 zl', '5 zl', '3 zl', '9 zl', '5 zl' ]
// 2. using functions based on sequences
var result = pipe(
numbers,
filter((n) => n > 0),
map((n) => `${n} zl`),
take(5),
toarray()
); // -> [ '1 zl', '5 zl', '3 zl', '9 zl', '5 zl' ]
The final result is exactly the same. In terms of the number of lines of code, the first one is even shorter. But what is the difference during the execution?
In the case of the first query, a new array is created after each step: the filter and map operations produce two new arrays, completely unnecessarily. What’s even worse is that the lambda functions passed to the methods are executed for all items stored in the arrays. In our example, the input array contains 11 items, so the function passed to filter is executed 11 times (for all items), and the function passed to map is executed 7 times (for all items returned by filter). Just imagine how many times those functions would be executed for an input array containing 1000 items when, in fact, we only need the first 5 items.
In the case of the second query, operators like filter, map, and take are lazy. The last operator, toarray, kicks off the execution. First, the code inside the take function is executed; take pulls an item from map, and map pulls an item from filter. filter takes the first number, 1, and calls the predicate function n => n > 0. Since 1 is greater than 0, it is passed on to map; map calls the converter function n => `${n} zl`, which produces the value "1 zl" that is returned to take and becomes the first element of the final result. So to produce the first value, the lambda functions had to be called only twice. If we analyze the execution further, we will see that the lambda functions are called the minimum number of times necessary to produce the final collection. Furthermore, no temporary arrays are created that would have to be cleaned up by the garbage collector afterwards.
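We can verify this with a quick experiment, instrumenting the lambda functions with counters (a sketch reusing the operators defined above):
let filterCalls = 0;
let mapCalls = 0;
pipe(
  numbers,
  filter((n) => (filterCalls++, n > 0)),
  map((n) => (mapCalls++, `${n} zl`)),
  take(5),
  toarray()
);
console.log(filterCalls, mapCalls); // -> 6 5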
Summary
Let’s recap what we have learned:
- Starting with ES6, the iterator pattern is directly supported in JavaScript as an API for iterable objects. A new kind of loop, for/of, and generator functions (function*) were also introduced. They provide a very convenient way of implementing iterable objects.
- Typical imperative code contains variables, loops, and conditionals (such as if statements), and often we can rewrite the same logic using functions like filter, map, and reduce. Declarative code, which utilizes these functions, is easier to read, understand, and maintain.
- The built-in Array type does provide several functions like filter, map, and reduce. However, there may be situations where these functions are not sufficient, especially for those who have experience with functional programming languages, and there is no elegant way to add new operators to the existing Array type.
- We can write our own lazy operators as generator functions (map, filter, etc.) and combine them using the pipe function in a very composable way.
- Lazy evaluation helps us write more performant code and also enables us to divide a problem into smaller, reusable functions that can be combined at the end to form a final solution. This is possible because the consumer of the sequence decides how and when the sequence will be executed.
- Finally, powerseq is a set of operators working with lazy sequences (see the sketch below). Its API is based on LINQ, which in turn is based on functional programming languages like Haskell and OCaml. It is no coincidence that these operators are so powerful; they are “standing on the shoulders of giants”.
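As a teaser, here is how the final query could look written with powerseq, assuming it exports the operators under the same names we used in this article (a sketch; check the library’s documentation for the exact API):
const { pipe, filter, map, take, toarray } = require("powerseq");
const result = pipe(
  numbers,
  filter((n) => n > 0),
  map((n) => `${n} zl`),
  take(5),
  toarray()
); // -> [ '1 zl', '5 zl', '3 zl', '9 zl', '5 zl' ]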
In the next articles, we will take a closer look at these new operators and use them to solve real-world problems using a declarative style of programming.