Golang | Channel vs Mutex - Which should I use
更新一下。我的结论是没错的,但是结论不准确。有问题的是我使用channel的方式。请看我的下一篇post,会解释Golang的channel是什么,我为什么用错了,怎么解决。
Update: my conclusion is not wrong, but it is not precise either. The real problem is the way I used the channel. See my next post, which explains what a Golang channel is, why I used it incorrectly, and how to fix it.
最近在 Exercism 上做题练习一些我不够熟悉的语言。这两天遇到了一个简单的问题,就是并行任务应该加锁还是用Channel避开锁,和我的mentor有不同的想法。在这里解释为什么。由于Mentor是外国人,这篇博客也会用英语再写一遍。
I’ve been working on exercises on Exercism for a while. I ran into a pretty simple exercise and ended up disagreeing with my mentor on one question: channel or mutex, which is the right choice for this particular problem? Since most of my readers are Chinese, I will mainly write in Chinese and rewrite/translate most of it into English. Tell me if some sentences make no sense and I will update this post.
背景 Background
问题是个非常简单的问题:数多段文字中 rune 的数量。
The question itself is dead simple: count the runes in a list of strings.
问题是如何实现这样的事情,有两种思路。
The question is how to implement it. There are two approaches.
粗暴思路 Simple way: Lock & Mutex
最基本的思路就是使用传统的同步原语来加锁,比较常见的就是Mutex。简单说加锁就是一个原子的操作,一个锁被锁上以后,在解锁以前都无法再次加锁。也就意味着这段代码和这块数据在这段时间内不再可以被访问,直到锁被释放掉。这种方法相对来说比较常见。
The most basic approach is to lock with a traditional synchronization primitive, most commonly a mutex. Acquiring a lock is an atomic operation: once the lock is held, it cannot be acquired again until it is released, which means that piece of code and data cannot be touched by anyone else until the lock is freed. This is the more common approach.
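To make this concrete, here is a minimal sketch of protecting shared state with sync.Mutex; the shared counter is made up for illustration and has nothing to do with the exercise:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var (
        mu      sync.Mutex
        counter int
        wg      sync.WaitGroup
    )
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                mu.Lock() // only one goroutine may enter at a time
                counter++
                mu.Unlock()
            }
        }()
    }
    wg.Wait()
    fmt.Println(counter) // always 4000, because every increment happens under the lock
}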
但是相对的问题也很多,比如说死锁,比如说性能下降,比如说热点。
However, it brings its own problems, such as deadlocks, performance degradation, and hotspots.
对于分布式系统来说,也有分布式锁,比如说最近比较火的redis所支持的一种用法:Redlock.
For distributed systems there are distributed locks as well, for example Redlock, a recently popular recipe built on top of Redis.
保证只有一个人写内存 Ensure that only one thread writes the memory
另一种方法其实是把任务切分开,负载分成读、工作、写三部分。每部分给自己的下游发布数据,写的部分单线程,以此保证写的时候一致。比如Producer/Consumer模式和Actor模式。
Another way is to split the work into three stages: read the data, process it, and write the result. Each stage publishes its output to its downstream stage, and the writing stage runs in a single thread so that writes stay consistent. The Producer/Consumer and Actor patterns are examples of this.
通常来说,我们需要一个类似于队列的东西。在不同语言中有不同的方法来做类似的事情,比如说Python的threading.Queue和Go的channel。分布式里面也有各种MQ,比如说Azure的Storage Queue,或者开源的RabbitMQ。
Generally we need something like a queue. Different languages have different ways to do this, such as threading.Queue in Python and chan in Go; in the distributed world there are message queues such as Azure Storage Queue and the open-source RabbitMQ.
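A minimal sketch of this pattern in Go, with made-up producers and a single consumer that owns all the writes (none of these names come from the exercise):

package main

import (
    "fmt"
    "sync"
)

func main() {
    results := make(chan int)
    var producers sync.WaitGroup

    // several producers: they only read input and publish downstream
    for i := 1; i <= 3; i++ {
        producers.Add(1)
        go func(n int) {
            defer producers.Done()
            results <- n * n
        }(i)
    }

    // close the channel once every producer is finished
    go func() {
        producers.Wait()
        close(results)
    }()

    // a single consumer owns the "write" side, so no lock is needed on sum
    sum := 0
    for r := range results {
        sum += r
    }
    fmt.Println(sum) // 14
}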
方法对比 Comparison
编码成本 Complexity
加锁很明显是最简单最普遍的方案,只要在需要并发访问的资源上加锁就好了。用队列需要把任务拆分成多个部分,麻烦一些。
Locking is clearly the simplest and most common solution: just put a lock around the resource that is accessed concurrently. Using a queue is more work, because the task has to be split into separate stages.
出错 Bugs
锁更多的会出现竞争、死锁和饥饿的问题。而队列和Channel更多的可能出现内存泄漏问题。
Locks are more prone to contention, deadlock, and starvation, while queues and channels are more prone to memory leaks.
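As an aside, a typical channel-related leak looks like the contrived sketch below (not part of the exercise): if nobody ever reads from the channel, the sending goroutine blocks forever and everything it references stays reachable.

// leaky is a contrived example of a goroutine leak: if the caller
// returns early and never reads from ch, the sending goroutine blocks
// on the send forever and is never cleaned up.
func leaky() <-chan []byte {
    ch := make(chan []byte) // unbuffered: a send blocks until someone receives
    go func() {
        ch <- make([]byte, 1<<20) // stuck here if nobody receives
    }()
    return ch
}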
效率差别 Performance
效率问题略微复杂一些,对于单机程序要考虑锁和队列本身的开支以及异步任务的开支,要考虑访问数据的时序、频率之类的问题,要考虑拆分工作以后、新增的协程线程进程的开支,对于分布式系统还有网络和系统调优。
Performance is a more complicated topic. For a single-machine program we have to compare the overhead of the lock or the queue itself plus the overhead of running tasks asynchronously, think about how often and in what order the data is accessed, and account for the extra goroutines, threads, or processes created after splitting the work. For a distributed system, network latency and system tuning come into play as well.
除此之外顺带一提,其实队列也要用到锁的。
As a side note, many queue implementations actually rely on locks internally anyway.
重新思考题目 Rethinking the question
题目要求是利用并行求出字符数量。其实就是两步
- 从字符串取出字符
- 计数器增一
The exercise asks us to count the characters using concurrency. There are really only two steps:
- read the rune
- increase the counter
事实上两种方法都可以解决问题。
In fact, either approach can solve the problem.
但是直觉上其实就很倾向于用锁。因为这个任务足够小,新建一个协程的成本会远超锁。
Still, my gut tells me to use a mutex, because the task is so small that spawning an extra goroutine costs more than taking the lock.
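For context, the snippets below assume the type and the sequential baseline that the Exercism exercise ships with; roughly like this (a sketch, the exact test file may differ slightly):

// FreqMap records the frequency of each rune in a given text.
type FreqMap map[rune]int

// Frequency counts the runes in a single string sequentially.
// It serves as the baseline for the sequential benchmark later on.
func Frequency(s string) FreqMap {
    m := FreqMap{}
    for _, r := range s {
        m[r]++
    }
    return m
}

With that in place, here are the two concurrent implementations: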
// ConcurrentFrequency calculates the frequency concurrently
func ConcurrentFrequency(strings []string) FreqMap {
    m := FreqMap{}
    lock := &sync.Mutex{}
    // use wg to wait for the concurrent tasks
    wg := &sync.WaitGroup{}
    wg.Add(len(strings))
    for _, str := range strings {
        // anonymous closure; the string is passed as an argument so the loop variable is not captured
        go func(s string) {
            for _, r := range s {
                // in a concurrent scenario we need either locks or atomic operations.
                // since I did not find how to make sync/atomic work with map[rune]int,
                // I just use sync.Mutex as the lock
                lock.Lock()
                m[r]++
                lock.Unlock()
            }
            wg.Done()
        }(str)
    }
    wg.Wait()
    return m
}
// ConcurrentFrequencyChan calculates the frequency concurrently
func ConcurrentFrequencyChan(strings []string) FreqMap {
    m := FreqMap{}
    ch := make(chan rune)
    // use wg to wait for the producer tasks
    wg := &sync.WaitGroup{}
    writerWg := &sync.WaitGroup{}
    writerWg.Add(1)
    wg.Add(len(strings))
    for _, str := range strings {
        go func(s string) {
            for _, r := range s {
                ch <- r
            }
            wg.Done()
        }(str)
    }
    // the single writer: only this goroutine touches the map, so no lock is needed
    go func() {
        for r := range ch {
            m[r]++
        }
        writerWg.Done()
    }()
    wg.Wait()
    // close the channel so the writer's range loop ends and its goroutine does not leak
    close(ch)
    writerWg.Wait()
    return m
}
两个都可以通过测试:
And both of them passed the tests.
func BenchmarkConcurrentFrequency(b *testing.B) {
    for i := 0; i < b.N; i++ {
        ConcurrentFrequency([]string{euro, dutch, us})
    }
}

func BenchmarkConcurrentFrequencyChan(b *testing.B) {
    for i := 0; i < b.N; i++ {
        ConcurrentFrequencyChan([]string{euro, dutch, us})
    }
}
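The sequential baseline that appears in the results below is benchmarked the same way. This is a sketch under the assumption that euro, dutch, and us are the sample texts from the exercise's test file and that the baseline simply calls the sequential Frequency on the concatenated texts:

func BenchmarkSequentialFrequency(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Frequency(euro + dutch + us)
    }
}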
实际测试结果:
Here is the result:
=== RUN TestConcurrentFrequency
--- PASS: TestConcurrentFrequency (0.00s)
=== RUN TestSequentialFrequency
--- PASS: TestSequentialFrequency (0.00s)
goos: linux
goarch: amd64
pkg: letter
BenchmarkSequentialFrequency
BenchmarkSequentialFrequency-8 54927 21473 ns/op
BenchmarkConcurrentFrequency
BenchmarkConcurrentFrequency-8 22798 44139 ns/op
BenchmarkConcurrentFrequencyChan
BenchmarkConcurrentFrequencyChan-8 3967 380515 ns/op
PASS
ok letter 8.987s
go test -bench '.*' -v . 9.14s user 0.89s system 109% cpu 9.192 total
但是我们看到实际上锁的性能会远高于Channel。在我的猜想中,新增一个coroutine会带来更多的资源消耗。我们下面验证一下
We can see that the lock version (BenchmarkConcurrentFrequency) performs far better than the channel version (BenchmarkConcurrentFrequencyChan). My guess is that the extra goroutine is what brings the additional cost. Let's verify that below.
pprof
我用了runtime/pprof
改了一下Benchmark函数,做了简单的计量。
I used runtime/pprof and modified the benchmark functions to collect a simple CPU profile.
func BenchmarkConcurrentFrequency(b *testing.B) {
    fp, err := os.OpenFile("lock.pprof", os.O_CREATE|os.O_RDWR, 0644)
    if err == nil {
        defer fp.Close()
        pprof.StartCPUProfile(fp)
        defer pprof.StopCPUProfile()
    }
    for i := 0; i < b.N; i++ {
        ConcurrentFrequency([]string{euro, dutch, us})
    }
}

func BenchmarkConcurrentFrequencyChan(b *testing.B) {
    fp, err := os.OpenFile("chan.pprof", os.O_CREATE|os.O_RDWR, 0644)
    if err == nil {
        defer fp.Close()
        pprof.StartCPUProfile(fp)
        defer pprof.StopCPUProfile()
    }
    for i := 0; i < b.N; i++ {
        ConcurrentFrequencyChan([]string{euro, dutch, us})
    }
}
用go tool pprof -svg -nodefraction=0 -output xx.{svg,pprof}
得出来了下面的两张call graph。点开看吧,这个SVG感觉有魔法,我嵌到网页里面丑到爆。
We get the two call graphs below by running go tool pprof -svg -nodefraction=0 -output xx.{svg,pprof}. Open the SVGs in a new tab to view them; they look terrible when embedded directly in the page. These are the files produced by profiling:
简单解释一下就是,channel这种思路,在channel本身上,以及coroutine调度上,都花费了很多数据。而且我们可以看到对于channel本身、go语言实际上还是用到了runtime.lock,实际上还是用到了锁。
To put it simply, the channel approach spends a lot of time on the channel itself and on scheduling runnable goroutines, and the call graph shows that for the channel Go still calls runtime.lock, so a lock is still being taken under the hood.
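If you prefer text over the call-graph SVGs, the same information can be pulled out with pprof's top report (the file names match the ones written by the benchmarks above); in the channel profile the runtime's channel and scheduling functions should show up near the top:

go tool pprof -top -nodefraction=0 lock.pprof
go tool pprof -top -nodefraction=0 chan.pprof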
Conclusion
我认为对于这种简单任务来说,比如说数字数的这一题,锁还是最正确的做法,我们主要需要梳理资源的依赖关系,尽可能让依赖指向一个方向,而不是互相依赖。对于更复杂的问题,尤其是有比较重的计算任务或者IO时,用channel可以减少编码时维持资源依赖关系带来的复杂度。
For simple tasks like this character-counting exercise, I still think a mutex is the right answer. The main job is to sort out the dependencies between resources so that they point in one direction rather than at each other. For more complex problems, especially ones with heavy computation or IO, channels can reduce the complexity of maintaining those resource dependencies while coding.