With engineers trying out grandiose databases every day for all sorts of use cases, I recently had one where I went old school, as in a single-file database, namely SQLite. Now, before you become skeptical, let me assure you that it ended up serving the use case very well, but there were some nasty surprises along the way.
So, we had this service (called MeMe, for brevity) that read a CSV-formatted data dump and then loaded it into an in-memory database. One apparently fine day, the business came up with a request to provide data for missing metrics as well (yes, they basically asked us to create data out of thin air), and we figured that we had to calculate a bunch of mean values to generate the missing values. This responsibility fell on MeMe. After evaluating various solutions, I ended up giving SQLite a shot, because these aggregations were required only once a day and SQLite was more portable and cheaper (both in terms of development time and infrastructure cost) than other common databases.
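The aggregation itself was nothing fancy, conceptually just a mean over the values we already had. A minimal sketch of the idea (the table and column names here are made up for illustration, not MeMe's actual schema) could look like this:

package meme // hypothetical package name

import "database/sql"

// fillMissing sketches the kind of aggregation used to generate a missing
// metric value: the mean of the values we already had for that metric.
// The "metrics" table and its columns are purely illustrative.
func fillMissing(db *sql.DB, metric string) (float64, error) {
	var mean float64
	err := db.QueryRow("SELECT AVG(value) FROM metrics WHERE name = ?", metric).Scan(&mean)
	return mean, err
}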
I decided to use go-sqlite3 as it seemed like a mature enough Go driver for SQLite. After the initial implementation, I ran into an apparently common issue with SQLite drivers, i.e. errors saying:
database is locked
After some digging, I found out that go-sqlite3 compiles SQLite in serialized mode, i.e. it can be safely used by multiple threads with no restrictions, because SQLite internally manages concurrent read and write operations.
So, ideally, this mode should require no coordination on the Go side, but when used with Go’s SQL interface it introduces contention on the SQLite side due to connection pooling, which results in some requests running into the locked-database error.
This situation was further complicated by the fact that I was sharding requests to the SQL interface itself over multiple goroutines, because MeMe was reading data dumps as a batch of multiple files, and the initial ingestion of all the data could be performed in parallel.
After some research, I figured that the easiest way (albeit not the best one) was to force the SQL interface to use a single connection to SQLite, so that SQLite completely takes over the responsibility of managing concurrent operations:
db.SetMaxOpenConns(1)
This increased the overall ingestion time of the data, as it was now limited by SQLite’s rate of write operations, but I kept the sharded reading of the data dump, as there were no such limits on that side.
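Put together, the ingestion path ended up looking roughly like this. This is a simplified sketch rather than the actual MeMe code: the file names, table and columns are made up, and it assumes the mattn/go-sqlite3 driver:

package main

import (
	"database/sql"
	"encoding/csv"
	"log"
	"os"
	"sync"

	_ "github.com/mattn/go-sqlite3" // registers the "sqlite3" driver (compiled via cgo)
)

func main() {
	db, err := sql.Open("sqlite3", "meme.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A single connection lets SQLite serialize all operations itself.
	db.SetMaxOpenConns(1)

	files := []string{"dump_1.csv", "dump_2.csv"} // hypothetical data dump batch
	rows := make(chan []string)

	// Fan out: each dump file is read and parsed in its own goroutine.
	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			fh, err := os.Open(path)
			if err != nil {
				log.Println(err)
				return
			}
			defer fh.Close()
			records, err := csv.NewReader(fh).ReadAll()
			if err != nil {
				log.Println(err)
				return
			}
			for _, rec := range records {
				rows <- rec
			}
		}(f)
	}
	go func() {
		wg.Wait()
		close(rows)
	}()

	// Fan in: every insert goes through the single SQLite connection.
	for rec := range rows {
		if _, err := db.Exec("INSERT INTO metrics(name, value) VALUES (?, ?)", rec[0], rec[1]); err != nil {
			log.Println(err)
		}
	}
}

Even though the parsing happens in parallel, the db.Exec calls queue up on that one connection, which is exactly the write bottleneck mentioned above.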
Now, with MeMe using SQLite queries to perform our required aggregations (and severely crippled by the single-connection access to SQLite, which, by the way, was not an issue, since the time delta was less than 5 minutes and we were only gonna run this part of MeMe once every 24 hours), I confidently set out to deploy it using our existing Docker image, and ran into:
standard_init_linux.go:207: exec user process caused "no such file or directory"
Upon inspection, I realized that our Docker image was not using cgo when compiling the Go binaries (disabling cgo is, by the way, the default behavior when cross-compiling Go binaries). cgo enables the use of C code alongside Go code by compiling the C parts together with the Go code, and MeMe needed it because SQLite is written in C and gets compiled as part of go-sqlite3.
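If you haven’t run into cgo before, a toy example (completely unrelated to MeMe) looks like this; the C snippet in the comment block gets compiled and linked into the binary, which is why a C toolchain enters the picture at all:

package main

/*
#include <stdio.h>

static void helloFromC() {
	printf("hello from C\n");
}
*/
import "C"

func main() {
	// Calls the C function defined in the preamble comment above;
	// building this requires CGO_ENABLED=1 and a working C compiler.
	C.helloFromC()
}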
So, I enabled cgo in the build command for MeMe’s Go binary in our Dockerfile:
CGO_ENABLED=1 GOOS=linux go build -o meme .
But, after giving it another try, it still gave the same error.
It turned out that even though the C code was being compiled now, the libraries it linked against were still missing, because we had a multi-stage Docker build and we only copied the MeMe binary over to the execution container. The solution to this problem was to simply bundle everything into a single binary, i.e. build a static binary, which includes all the linked libraries (actually, not all, as I later realized):
CGO_ENABLED=1 GOOS=linux go build -a -ldflags '-linkmode external -extldflags "-static"' -o meme .
With fingers crossed, I triggered the deployment again, and came face to face with a rather underwhelming error:
sh: /bin/meme: not found
I was literally scratching my head at this point; I mean, the binary was compiling alright now, but somehow my execution container thought it was not there!
After almost hitting the rock bottom of this rabbit hole (luckily, not the rock bottom of my life), I found out that even static binaries are not always fully self-contained, e.g. using Go’s net package adds a dynamic dependency on some system libraries, which was a problem in our case because MeMe was being built in a Debian container but run in an Alpine one, and the two distros ship different networking libraries.
So, I happily changed the build container to use an Alpine image, and everything went south: Git couldn’t access our private repositories anymore, so the Go build started failing. The problem was that our CA certificates (which we copied from a volume into our build container) were not being added to the system. After some investigation, I found out that this was a common issue on Alpine, i.e. update-ca-certificates on Alpine only takes into account certificates located directly under /usr/local/share/ca-certificates, while certificates in subdirectories of that path are ignored. Therefore, I copied the certificate files themselves into the certificates directory instead of copying the folder containing them:
cp -R /tmp/cert/* /usr/share/ca-certificates/
At this stage, I was almost ready to let loose a couple of party poppers, but was greeted by an old friend after another try:
standard_init_linux.go:207: exec user process caused "no such file or directory"
Before I went bald from pulling out all my hair, I figured out that Alpine keeps its standard libraries in a different location than most distros. So, to give the Go compiler access to the build tools it needed for the C code, I installed the standard C compiler in our Alpine-based build image:
RUN apk add --update gcc=6.3.0-r4 g++=6.3.0-r4
And voila! I slept like a baby after that…zzZ