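Building Container Images: From Beginner to Master

2021-12-18, Sat

A rough walkthrough of how container images are packaged.

For various reasons this post uses podman, a docker replacement, rather than docker as its main command-line tool. podman is not available on macOS, but I believe the principles are the same.

For inspecting the manifests of multi-architecture images I still use docker, because podman is a little immature there. This does not need dockerd: when docker inspects a manifest, the client talks to the registry directly.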

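What containers and images are

An image is to a container what a file is to a process.

An image is simply all the files inside a container. Some images hold a tiny but complete Ubuntu system, some hold a database, and others hold nothing but a single application. Once started, an image becomes a container whose filesystem is isolated from the outside and unaffected by the host.

A container image is really just an archive of a root directory laid out like a normal system, plus information about the image such as its name, architecture, hashes and signatures. The application container images in common use today consist of many layers, so they also carry information about the image as a whole, the filesystem of each layer, and metadata about those filesystems.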
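
To make "an archive plus metadata" concrete, here is a minimal Python sketch that packs a couple of files into a gzip-compressed tar and computes the size and sha256 digest that a manifest would record for it. The file names and the helper names build_layer and describe are made up for illustration; real engines produce layers with far more metadata (owners, modes, whiteouts).

```python
import gzip
import hashlib
import io
import json
import tarfile


def build_layer(files):
    """Pack a {path: bytes} mapping into a gzip-compressed tar, like one image layer."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for path, content in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            info.mtime = 0  # fixed timestamp so the archive is reproducible
            tar.addfile(info, io.BytesIO(content))
    # mtime=0 keeps the gzip header identical between runs as well
    return gzip.compress(raw.getvalue(), mtime=0)


def describe(blob, media_type):
    """Return the descriptor a manifest stores for a blob: type, size and digest."""
    return {
        "mediaType": media_type,
        "size": len(blob),
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
    }


# Hypothetical layer content, only for demonstration
layer = build_layer({"etc/hostname": b"demo\n", "app/hello.txt": b"hello\n"})
descriptor = describe(layer, "application/vnd.docker.image.rootfs.diff.tar.gzip")
print(json.dumps(descriptor, indent=4))
```

The printed descriptor has the same three fields (mediaType, size, digest) as the entries under "layers" in a real manifest.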

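How images work

Overlay

More complex container images, such as the Docker-style application images that this post focuses on, build the container's filesystem by stacking several filesystem layers into one complete filesystem, which keeps distribution size down and makes containers lighter. This kind of filesystem is called overlay. On Linux it is provided by the kernel's OverlayFS; Docker ships two storage drivers built on it, overlay and overlay2, and overlay2 is normally the default.

An overlay mount is assembled from several directories: one or more read-only lower directories and an optional writable upper directory. Without an upper directory the mount is read-only; with one it is writable and also needs a work directory, and everything that gets written lands in the upper directory.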
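
The lookup rules can be modelled in a few lines of Python. This is only a toy sketch that uses dictionaries in place of directories; the class name Overlay and the WHITEOUT marker are my own, and the real kernel implementation handles directories, copy-up and much more.

```python
WHITEOUT = object()  # marker meaning "this path was deleted in this layer"


class Overlay:
    """Toy model of overlay lookup: an optional writable upper over read-only lowers."""

    def __init__(self, lowers, upper=None):
        self.lowers = lowers  # list of dicts, topmost (newest) layer first
        self.upper = upper    # dict, or None for a read-only mount

    def read(self, path):
        layers = ([self.upper] if self.upper is not None else []) + self.lowers
        for layer in layers:
            if path in layer:
                if layer[path] is WHITEOUT:
                    break  # deleted in a higher layer, stop searching
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        if self.upper is None:
            raise PermissionError("read-only overlay: no upper directory")
        self.upper[path] = data  # lower layers are never modified

    def delete(self, path):
        if self.upper is None:
            raise PermissionError("read-only overlay: no upper directory")
        self.read(path)  # raises FileNotFoundError if the path does not exist
        self.upper[path] = WHITEOUT


# Hypothetical layers, only for demonstration
base = {"/etc/os-release": "debian", "/bin/sh": "shell"}
app = {"/app/server": "v1"}
fs = Overlay(lowers=[app, base], upper={})
fs.write("/test", "123")
fs.delete("/bin/sh")
print(fs.read("/etc/os-release"), fs.read("/test"))
```

Both the new file and the deletion are recorded only in the upper dictionary; base and app stay exactly as they were, which is why many containers can share the same image layers.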

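After a container image is downloaded, its layers are unpacked into directories. When the engine starts a container it creates a new empty directory as the upper, lists the image's layers in lower from newest to oldest, and the mount target becomes the container's final rootfs.

Many newer filesystems, such as btrfs and zfs, support this natively, and container engines that support them can be configured to use those built-in features directly. The logic is similar to overlay.

If we inspect an image we built ourselves, we can see that it is made up of separate diff layers. Here is one from a CRD and controller example I wrote: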

guochao@yoga14c ~ % podman manifest inspect docker.io/jeffguorg/l4-ingress-controller-for-nginx:0.0.1
WARN[0006] Warning! The manifest type application/vnd.docker.distribution.manifest.v2+json is not a manifest list but a single image.
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 1132,
        "digest": "sha256:edf8ed398c5ecb23c80df9171f206d9c561b77f18e8f14de4de860cf9cc7f661"
    },
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 643568,
            "digest": "sha256:5749e56bea7178768b00aac0bf087558fccd5b8e0c807610618be7568459f359"
        },
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 17300907,
            "digest": "sha256:04b473b9b121d617cdc67c689ca6dc54278bc114bb6de2b846477c33dc9f6f60"
        }
    ]
}

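That is an example of an image with a single architecture. The other case is one tag covering several architectures, which returns a manifest of type application/vnd.docker.distribution.manifest.list.v2+json, as the official debian:buster does: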

guochao@yoga14c ~ % podman manifest inspect docker.io/library/debian:buster
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 529,
            "digest": "sha256:9a1f494bb52e5d18e2dfb0fd6e59dbfe69aae9feecff1b246ad69984fbe25772",
            "platform": {
                "architecture": "amd64",
                "os": "linux"
            }
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 529,
            "digest": "sha256:e32b6426b9674c7673fa277a465ca0cf0d83ef5c414dca495b69d576ad0045e8",
            "platform": {
                "architecture": "arm",
                "os": "linux",
                "variant": "v5"
            }
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 529,
            "digest": "sha256:e2dfabc3b3958ebbf3b5c007b1d557a49d68c3ad07a0137ef4263b45f90371fc",
            "platform": {
                "architecture": "arm",
                "os": "linux",
                "variant": "v7"
            }
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 529,
            "digest": "sha256:513fb887826a93e31a0fc321fc1b425681d806dec87c23996304b68952e74166",
            "platform": {
                "architecture": "arm64",
                "os": "linux",
                "variant": "v8"
            }
        },
        # ...
    ]
}

When we actually pull an image, only the diff layers for the architecture that matches the local machine are pulled.
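
A rough Python sketch of that selection step follows. The list is an abbreviated copy of the debian:buster output above; the function select_manifest is my own illustration, and real engines apply extra rules (default variants, fallbacks) that are left out here.

```python
MANIFEST_LIST = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "digest": "sha256:9a1f494bb52e5d18e2dfb0fd6e59dbfe69aae9feecff1b246ad69984fbe25772",
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "digest": "sha256:e2dfabc3b3958ebbf3b5c007b1d557a49d68c3ad07a0137ef4263b45f90371fc",
            "platform": {"architecture": "arm", "os": "linux", "variant": "v7"},
        },
        {
            "digest": "sha256:513fb887826a93e31a0fc321fc1b425681d806dec87c23996304b68952e74166",
            "platform": {"architecture": "arm64", "os": "linux", "variant": "v8"},
        },
    ],
}


def select_manifest(manifest_list, os_name, architecture, variant=None):
    """Return the first entry whose platform matches the requested one."""
    for entry in manifest_list["manifests"]:
        platform = entry["platform"]
        if platform["os"] != os_name or platform["architecture"] != architecture:
            continue
        if variant is not None and platform.get("variant") != variant:
            continue
        return entry
    raise LookupError("no manifest for %s/%s" % (os_name, architecture))


chosen = select_manifest(MANIFEST_LIST, "linux", "arm", "v7")
print(chosen["digest"])
```

The engine would then fetch the manifest with that digest and download only the layers it lists.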

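We will come back to the details of manifests later.

Chroot & Mount Namespace

With overlay in place, the remaining problem is how to make the processes in a container believe they are in a fresh environment, instead of still being able to use, and by default using, the host's filesystem.

Anyone who has installed Arch or Gentoo Linux knows chroot well. chroot makes a process treat a directory as its root filesystem. Its drawback is that it cannot hide the rest of the host: mount points and processes, among much else, remain visible from inside the chroot, and it has no means of masking privileges.

So the feature we mainly rely on is the namespace. It is as if we opened a new space, a new universe: processes inside it cannot see most of what exists in the previous space and can only see and manipulate a very limited set of things.

For example, when I start a container, the container engine creates empty upper and work directories, and the runtime mounts the directories inside a new namespace before starting the container process. The container process can then see only its own root filesystem and sees that its root directory is an overlay, while each of the underlying directories remains visible from the host side.

For example, I pulled debian:buster and started a podman container: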

guochao@yoga14c ~ % podman image inspect docker.io/debian:buster 
[
    {
        "Id": "6642e362a8254d7645ed8dae69be5e3edcd2e9a5ee77baeb0655595244082d13",
        "Digest": "sha256:5b57f8c365c40fde437d53b953c436995525be7c481eb0128b1cbf3b49b0df18",
        "RepoTags": [
            "docker.io/library/debian:buster"
        ],
        "RepoDigests": [
            "docker.io/library/debian@sha256:5b57f8c365c40fde437d53b953c436995525be7c481eb0128b1cbf3b49b0df18",
            "docker.io/library/debian@sha256:9a1f494bb52e5d18e2dfb0fd6e59dbfe69aae9feecff1b246ad69984fbe25772"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2021-12-02T02:48:32.500000689Z",
        "Config": {
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "bash"
            ]
        },
        "Version": "20.10.7",
        "Author": "",
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 119266070,
        "VirtualSize": 119266070,
        "GraphDriver": {
            "Name": "overlay",
            "Data": {
                "UpperDir": "/home/guochao/.local/share/containers/storage/overlay/2c7e7ab2260a4a82502119b16f69d58fbcb7b561435a8a86013226505809d550/diff",
                "WorkDir": "/home/guochao/.local/share/containers/storage/overlay/2c7e7ab2260a4a82502119b16f69d58fbcb7b561435a8a86013226505809d550/work"
            }
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:2c7e7ab2260a4a82502119b16f69d58fbcb7b561435a8a86013226505809d550"
            ]
        },
        "Labels": null,
        "Annotations": {},
        "ManifestType": "application/vnd.docker.distribution.manifest.v2+json",
        "User": "",
        "History": [
            {
                "created": "2021-12-02T02:48:31.875020574Z",
                "created_by": "/bin/sh -c #(nop) ADD file:f077e1a42fb919be2af67820782ceee3b46ffd13d7ab6949bea9921217d12813 in / "
            },
            {
                "created": "2021-12-02T02:48:32.500000689Z",
                "created_by": "/bin/sh -c #(nop)  CMD [\"bash\"]",
                "empty_layer": true
            }
        ],
        "NamesHistory": [
            "docker.io/library/debian:buster"
        ]
    }
]
guochao@yoga14c ~ % podman run -it --rm docker.io/debian:buster     
root@720de25ef7d2:/# mount
overlay on / type overlay (rw,relatime,lowerdir=/home/guochao/.local/share/containers/storage/overlay/l/GGGENAHOSSNMQFTCIWTGULBJAR,upperdir=/home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/diff,workdir=/home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/work,index=off,metacopy=off,volatile,userxattr)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
...
tmpfs on /etc/resolv.conf type tmpfs (rw,nosuid,nodev,relatime,size=1423144k,nr_inodes=355786,mode=700,uid=1000,gid=1000,inode64)
tmpfs on /etc/hosts type tmpfs (rw,nosuid,nodev,relatime,size=1423144k,nr_inodes=355786,mode=700,uid=1000,gid=1000,inode64)
shm on /dev/shm type tmpfs (rw,relatime,size=64000k,uid=1000,gid=1000,inode64)
tmpfs on /etc/hostname type tmpfs (rw,nosuid,nodev,relatime,size=1423144k,nr_inodes=355786,mode=700,uid=1000,gid=1000,inode64)
tmpfs on /run/.containerenv type tmpfs (rw,nosuid,nodev,relatime,size=1423144k,nr_inodes=355786,mode=700,uid=1000,gid=1000,inode64)
...
devpts on /dev/console type devpts (rw,relatime,gid=100004,mode=620,ptmxmode=666)
root@720de25ef7d2:/#
guochao@yoga14c ~ % podman inspect 720de25ef7d2
[
    {
        "Id": "720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f",
        "Created": "2021-12-18T15:06:20.965087196+08:00",
        "Path": "bash",
        "Args": [
            "bash"
        ],
        "State": {
            "OciVersion": "1.0.2-dev",
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 2530,
            "ConmonPid": 2527,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2021-12-18T15:06:21.03623332+08:00",
            "FinishedAt": "0001-01-01T00:00:00Z",
            "Healthcheck": {
                "Status": "",
                "FailingStreak": 0,
                "Log": null
            },
            "CgroupPath": "/user.slice/user-1000.slice/user@1000.service/user.slice/libpod-720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f.scope"
        },
        "Image": "6642e362a8254d7645ed8dae69be5e3edcd2e9a5ee77baeb0655595244082d13",
        "ImageName": "docker.io/library/debian:buster",
        "Rootfs": "",
        "Pod": "",
        "ResolvConfPath": "/run/user/1000/containers/overlay-containers/720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f/userdata/resolv.conf",
        "HostnamePath": "/run/user/1000/containers/overlay-containers/720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f/userdata/hostname",
        "HostsPath": "/run/user/1000/containers/overlay-containers/720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f/userdata/hosts",
        "StaticDir": "/home/guochao/.local/share/containers/storage/overlay-containers/720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f/userdata",
        "OCIConfigPath": "/home/guochao/.local/share/containers/storage/overlay-containers/720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f/userdata/config.json",
        "OCIRuntime": "crun",
        "ConmonPidFile": "/run/user/1000/containers/overlay-containers/720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f/userdata/conmon.pid",
        "PidFile": "/run/user/1000/containers/overlay-containers/720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f/userdata/pidfile",
        "Name": "nifty_chaplygin",
        "RestartCount": 0,
        "Driver": "overlay",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "EffectiveCaps": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
        ],
        "BoundingCaps": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
        ],
        "ExecIDs": [],
        "GraphDriver": {
            "Name": "overlay",
            "Data": {
                "LowerDir": "/home/guochao/.local/share/containers/storage/overlay/2c7e7ab2260a4a82502119b16f69d58fbcb7b561435a8a86013226505809d550/diff",
                "MergedDir": "/home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/merged",
                "UpperDir": "/home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/diff",
                "WorkDir": "/home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/work"
            }
        },
        "Mounts": [],
        "Dependencies": [],
        "NetworkSettings": {
            "EndpointID": "",
            "Gateway": "",
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "MacAddress": "",
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/run/user/1000/netns/cni-0be496a9-f5d2-71a2-586d-e0ca261e40fa"
        },
        "ExitCommand": [
            "/usr/bin/podman",
            "--root",
            "/home/guochao/.local/share/containers/storage",
            "--runroot",
            "/run/user/1000/containers",
            "--log-level",
            "warning",
            "--cgroup-manager",
            "systemd",
            "--tmpdir",
            "/run/user/1000/libpod/tmp",
            "--runtime",
            "crun",
            "--storage-driver",
            "overlay",
            "--events-backend",
            "journald",
            "container",
            "cleanup",
            "--rm",
            "720de25ef7d2a10a92e5d709eff0f38248869a2c51998f8d965a536b98aabb3f"
        ],
        "Namespace": "",
        "IsInfra": false,
        "Config": {
            "Hostname": "720de25ef7d2",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": true,
            "OpenStdin": true,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "TERM=xterm",
                "container=podman",
                "https_proxy=http://127.0.0.1:8888/",
                "HTTPS_PROXY=http://127.0.0.1:8888/",
                "FTP_PROXY=http://127.0.0.1:8888/",
                "NO_PROXY=localhost,127.0.0.0/8,::1,192.168.0.0/16,172.16.0.0/12,10.0.0.0/8,*.cn,cn",
                "http_proxy=http://127.0.0.1:8888/",
                "ftp_proxy=http://127.0.0.1:8888/",
                "no_proxy=localhost,127.0.0.0/8,::1,192.168.0.0/16,172.16.0.0/12,10.0.0.0/8,*.cn,cn",
                "HTTP_PROXY=http://127.0.0.1:8888/",
                "HOME=/root",
                "HOSTNAME=720de25ef7d2"
            ],
            "Cmd": [
                "bash"
            ],
            "Image": "docker.io/library/debian:buster",
            "Volumes": null,
            "WorkingDir": "/",
            "Entrypoint": "",
            "OnBuild": null,
            "Labels": null,
            "Annotations": {
                "io.container.manager": "libpod",
                "io.kubernetes.cri-o.Created": "2021-12-18T15:06:20.965087196+08:00",
                "io.kubernetes.cri-o.TTY": "true",
                "io.podman.annotations.autoremove": "TRUE",
                "io.podman.annotations.init": "FALSE",
                "io.podman.annotations.privileged": "FALSE",
                "io.podman.annotations.publish-all": "FALSE",
                "org.opencontainers.image.stopSignal": "15"
            },
            "StopSignal": 15,
            "CreateCommand": [
                "podman",
                "run",
                "-it",
                "--rm",
                "docker.io/debian:buster"
            ],
            "Umask": "0022",
            "Timeout": 0,
            "StopTimeout": 10
        },
        "HostConfig": {
            "Binds": [],
            "CgroupManager": "systemd",
            "CgroupMode": "private",
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "journald",
                "Config": null,
                "Path": "",
                "Tag": "",
                "Size": "0B"
            },
            "NetworkMode": "slirp4netns",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": true,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": [],
            "CapDrop": [
                "CAP_AUDIT_WRITE",
                "CAP_MKNOD",
                "CAP_NET_RAW"
            ],
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": [],
            "GroupAdd": [],
            "IpcMode": "private",
            "Cgroup": "",
            "Cgroups": "default",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "private",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [],
            "Tmpfs": {},
            "UTSMode": "private",
            "UsernsMode": "",
            "ShmSize": 65536000,
            "Runtime": "oci",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "user.slice",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": 0,
            "OomKillDisable": false,
            "PidsLimit": 2048,
            "Ulimits": [],
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "CgroupConf": null
        }
    }
]

In the inspect output of the debian image we can see that it has no parent image and that its only layer is sha256:2c7e7ab2260a4a82502119b16f69d58fbcb7b561435a8a86013226505809d550. The podman inspect output for the container shows that this layer's diff directory (the per-layer filesystem difference) is exactly the LowerDir under GraphDriver, and that the UpperDir is /home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/diff. Looking at the root mount from inside the container confirms the graph driver information reported by podman inspect (the lowerdir there is a short symlink under overlay/l that points at the layer's diff directory): overlay on / type overlay (rw,relatime,lowerdir=/home/guochao/.local/share/containers/storage/overlay/l/GGGENAHOSSNMQFTCIWTGULBJAR,upperdir=/home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/diff,workdir=/home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/work,index=off,metacopy=off,volatile,userxattr)
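
If you want to pick such a mount line apart programmatically, a small Python sketch like the one below is enough. The paths in the sample line are shortened, hypothetical stand-ins for the long ones above, and the parser assumes that no path contains a comma or a colon.

```python
def parse_overlay_options(mount_line):
    """Turn the parenthesised option list of a `mount` output line into a dict."""
    start = mount_line.index("(") + 1
    end = mount_line.rindex(")")
    options = {}
    for item in mount_line[start:end].split(","):
        key, _, value = item.partition("=")
        options[key] = value if value else True  # bare flags such as "rw" become True
    if "lowerdir" in options:
        # several lower directories are separated by ':', topmost layer first
        options["lowerdir"] = options["lowerdir"].split(":")
    return options


# Hypothetical, shortened paths
line = ("overlay on / type overlay (rw,relatime,"
        "lowerdir=/storage/l/BBB:/storage/l/AAA,"
        "upperdir=/storage/c1/diff,workdir=/storage/c1/work,"
        "index=off,metacopy=off,volatile,userxattr)")
opts = parse_overlay_options(line)
print(opts["lowerdir"], opts["upperdir"], opts["workdir"])
```

For an image with many layers, the lowerdir list is where the extra directories show up.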

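If we touch a file inside the container, it shows up in the corresponding upper directory, and if we change its content from outside, the change is visible inside the container as well: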

root@720de25ef7d2:/# touch /test
root@720de25ef7d2:/# ls /
bin  boot  dev	etc  home  lib	lib64  media  mnt  opt	proc  root  run  sbin  srv  sys  test  tmp  usr  var
guochao@yoga14c ~ % ls /home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/diff/    
etc  run  test
guochao@yoga14c ~ % echo 123 > /home/guochao/.local/share/containers/storage/overlay/33dee2ffefddb08cb5efdc1097cab8a7dfc04b4c20ca4452888dd221f6f83778/diff/test
root@720de25ef7d2:/# cat /test 
123

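With a more complex image the list of lower directories gets much longer; try it yourself if you are curious.

Creating images with a Dockerfile

Now that we understand what an image is, let us talk about how to create one.

Requirements and principles

Application container images are designed this way to keep containers small, while namespaces, cgroups and AppArmor are there to make them more secure. We should build images with the same principles in mind.

To stay lean, once layers that can actually be reused appear, we should package them as a separate image that other image builds can start from, which cuts down on duplicated content.

Every image, and even every layer, should as far as possible contain only the artifacts the image needs, such as binaries and processed assets. Source files and intermediate products, such as package manager caches, the sources being compiled and intermediate build files, should be kept out as much as possible.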
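
A back-of-the-envelope Python sketch shows why shared layers pay off. The digests and sizes are invented for the example; the only assumption is that a layer with a given digest is stored and downloaded once, however many images reference it.

```python
def total_size(images):
    """Bytes needed when every distinct layer is fetched only once."""
    unique = {}
    for layers in images.values():
        for digest, size in layers:
            unique[digest] = size
    return sum(unique.values())


def naive_size(images):
    """Bytes needed if every image carried its own copy of every layer."""
    return sum(size for layers in images.values() for _, size in layers)


# Hypothetical images: both services are built on the same base layer
images = {
    "service-a": [("sha256:base", 50_000_000), ("sha256:app-a", 3_000_000)],
    "service-b": [("sha256:base", 50_000_000), ("sha256:app-b", 4_000_000)],
}
print(total_size(images), naive_size(images))
```

The more images share a common base, the larger the gap between the two numbers becomes.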

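For security, we should do as much of the packaging as possible with the container's own facilities and avoid packaging outside the container, where the packaging process itself can introduce security problems.

Basics

We usually define the build process in a Dockerfile, which is the de facto industry standard. The instructions generally used are FROM, ARG, LABEL, ADD, COPY, WORKDIR, USER, RUN, VOLUME, EXPOSE, ENTRYPOINT and CMD.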

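# Base image
FROM    ubuntu:focal

# Extra metadata (labels), for example the keys listed by OCI: https://github.com/opencontainers/image-spec/blob/main/annotations.md#rules
LABEL   org.opencontainers.image.authors="guochao <jeffguorg at gmail.com>"

# Unpack the tarball and add its contents
ADD     ./release.tar.gz /app/

# Copy a file
COPY    ./blablabla.json /app/config.json

# Working directory for the build and run phases
WORKDIR /app/

# Run as a low-privilege user and group (on Ubuntu the matching group is nogroup)
USER    nobody:nogroup

# Mount a storage volume at /data
VOLUME  /data

# Run a command during the build
RUN     /app/myapp bootstrap

# Expose port 8080
EXPOSE  8080

# Command run by default at run time (exec form, so that CMD is appended as its arguments)
ENTRYPOINT ["/app/myapp"]

# Default arguments for the command at run time
CMD     ["serve"]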

For example, for a Python/tornado application I wrote, I would package it like this:

FROM python:3.8-buster
RUN  pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tornado aioredis 'databases[mysql]'
ADD  . /app
VOLUME /data
EXPOSE 1234
WORKDIR /app
USER nobody
# No ENTRYPOINT is set, so the exec-form CMD below runs directly; a shell-form CMD would be run through /bin/sh -c
CMD  ["python", "-m", "myserver", "-p", "1234", "-d", "/data"]
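
How ENTRYPOINT, CMD and the arguments given to run combine is easy to get wrong, so here is a small Python model of the rules as I understand them from the Dockerfile reference. The function effective_argv is my own illustration; a list stands for the exec form and a plain string for the shell form.

```python
def effective_argv(entrypoint=None, cmd=None, run_args=None):
    """Compute the argv of the container process.

    A list models the exec form (["a", "b"]); a str models the shell form.
    run_args models extra arguments passed after the image name on `run`.
    """
    def shell(command):
        return ["/bin/sh", "-c", command]

    if isinstance(entrypoint, str):
        # Shell-form ENTRYPOINT: CMD and run arguments are not appended
        return shell(entrypoint)
    if run_args:
        tail = list(run_args)       # arguments on `run` replace CMD
    elif isinstance(cmd, str):
        tail = shell(cmd)           # shell-form CMD is wrapped in /bin/sh -c
    else:
        tail = list(cmd or [])      # exec-form CMD is used as is
    return list(entrypoint or []) + tail


print(effective_argv(cmd=["python", "-m", "myserver"]))
print(effective_argv(entrypoint=["/app/myapp"], cmd=["serve"]))
print(effective_argv(entrypoint="/app/myapp", cmd=["serve"]))
```

The last call shows why a shell-form ENTRYPOINT combined with CMD is a trap: the serve argument never reaches the program.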

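The build runs inside containers, and standard container engines use the container's own isolation as much as possible when building, so that malicious behavior during the build cannot do real damage to the infrastructure.

Multi-stage builds

The build script above is a very simple one, and it has plenty of problems. In particular, when we bundle a frontend with a tool like webpack, or compile source code to a binary, we only need the final product and do not care about the files produced along the way. We could build outside and copy the result into the container, but that violates the security premise; and if the image includes the sources and intermediate products, it becomes abnormally large.

This is where multi-stage builds come in. An application container build can start several containers in turn, do different things in each, and produce one container as the final output. To support this, Dockerfile instructions gained new parameters; the two that matter here are FROM and COPY.

In a multi-stage build we can write several FROM instructions. Each FROM starts a build container and can optionally carry an AS that gives the build container an alias; the last build container is normally the one that becomes the final image. A later stage can use an earlier alias as its base and keep building from there, or use COPY with an alias or an index to copy files produced by an earlier stage.

For example, I might want to build the frontend in an earlier stage (with webpack, say), then compile the server (Java, Rust, Go, or Python compiled to bytecode), and finally produce an image that contains only the build output and the server binary.

I would write: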

FROM node:11
ADD  frontend /app
WORKDIR /app
RUN  yarn install && yarn build

FROM golang:1.17-alpine
ADD  backend /app
WORKDIR /app
RUN  go build ./cmd/server

FROM ubuntu:focal
COPY --from=0 /app/dist /dist
ARG  ASSET_S3
RUN  if [ -z "$ASSET_S3" ]; then ....; fi

FROM alpine:3.12
RUN  apk add --no-cache ca-certificates
COPY --from=0 /app/dist /app/static
COPY --from=1 /app/server /app/server
VOLUME /data
EXPOSE 8888
WORKDIR /app
ENTRYPOINT ["/app/server"]
CMD  ["serve", "--listen", ":8888", "--data", "/data"]

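The first three containers build the frontend, compile the Go code, and upload files depending on the build argument, but none of the files they produce end up in the final product's container. Only the last container copies artifacts from the earlier ones and becomes the image.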
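
To see how stage indexes and AS aliases relate, the following Python sketch parses a simplified Dockerfile and resolves each --from value to a stage. The sample Dockerfile (with the aliases frontend and backend) and the helpers parse_stages and resolve_from are my own illustration; line continuations and other syntax details are deliberately ignored.

```python
def parse_stages(dockerfile):
    """Return one dict per FROM: its base image, alias and the --from values it uses."""
    stages = []
    for line in dockerfile.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        instruction, args = words[0].upper(), words[1:]
        flags = [a for a in args if a.startswith("--")]
        rest = [a for a in args if not a.startswith("--")]
        if instruction == "FROM":
            alias = rest[2] if len(rest) == 3 and rest[1].upper() == "AS" else None
            stages.append({"base": rest[0], "alias": alias, "copies_from": []})
        elif instruction == "COPY" and stages:
            for flag in flags:
                if flag.startswith("--from="):
                    stages[-1]["copies_from"].append(flag[len("--from="):])
    return stages


def resolve_from(stages, ref):
    """Map a --from value to a stage index; None means it names an external image."""
    if ref.isdigit():
        return int(ref)
    for index, stage in enumerate(stages):
        if stage["alias"] == ref:
            return index
    return None


# Hypothetical Dockerfile, only for demonstration
DOCKERFILE = """\
FROM node:11 AS frontend
WORKDIR /app
RUN yarn install && yarn build

FROM golang:1.17-alpine AS backend
WORKDIR /app
RUN go build ./cmd/server

FROM alpine:3.12
COPY --from=frontend /app/dist /app/static
COPY --from=1 /app/server /app/server
"""

stages = parse_stages(DOCKERFILE)
final = stages[-1]
print([resolve_from(stages, ref) for ref in final["copies_from"]])
```

Aliases are usually the safer choice, because an index silently points at a different stage once a new FROM is inserted above it.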

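Multi-architecture images

Multiple architectures are a separate problem.

The main reason is that far more computer architectures are in everyday use than before. We used to be on x86/ia32, and more recently mostly on x86-64/amd64, but today there are also arm, aarch64, mips, riscv, loongson, s390x and ppc64, and most of them can run containers as long as they can run Linux. When these architectures are mixed, it is impractical to configure every container by hand, with one user on this image and another on that one, so we need to offer images for several architectures under one name.

That raises several further questions: how do we make one image:tag point at images for several architectures, how do we build images for several architectures on a single machine wherever possible, and how many registries support multi-architecture images?

Let us start with the easy one: how many registries support multi-architecture images? Today practically every mainstream registry does.

Making image:tag point at different images: application/vnd.docker.distribution.manifest.list.v2+json

Next, how multi-architecture images are implemented. First, making one image:tag point at different images. Files are just data with no particularly obvious characteristics, and a filesystem may well contain files for several architectures, so the type cannot be determined from the filesystem with 100% accuracy. Instead, a trick is played on the manifest: image:tag points at a list, and each entry in the list carries platform information along with the information of the real image.

Take the manifest of docker.io/library/debian:buster mentioned above: it returns a list containing manifests for many architectures, including amd64, arm/v5, arm/v7, arm64 and more, and each of these manifests ultimately points at a concrete image.

I switch to docker here, because podman has some odd problems.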

docker manifest inspect docker.io/library/debian:buster -v

[
	{
		"Ref": "docker.io/library/debian:buster@sha256:9a1f494bb52e5d18e2dfb0fd6e59dbfe69aae9feecff1b246ad69984fbe25772",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:9a1f494bb52e5d18e2dfb0fd6e59dbfe69aae9feecff1b246ad69984fbe25772",
			"size": 529,
			"platform": {
				"architecture": "amd64",
				"os": "linux"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 1462,
				"digest": "sha256:6642e362a8254d7645ed8dae69be5e3edcd2e9a5ee77baeb0655595244082d13"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 50437113,
					"digest": "sha256:c4cc477c22ba7abce860198107408434dd7bd73ddbaf82f1e69ab941b9979405"
				}
			]
		}
	},
	{
		"Ref": "docker.io/library/debian:buster@sha256:e32b6426b9674c7673fa277a465ca0cf0d83ef5c414dca495b69d576ad0045e8",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:e32b6426b9674c7673fa277a465ca0cf0d83ef5c414dca495b69d576ad0045e8",
			"size": 529,
			"platform": {
				"architecture": "arm",
				"os": "linux",
				"variant": "v5"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 1475,
				"digest": "sha256:22f7b5af76cfc19ea1c3297f4e55c5ef2cac68588f4c9c6846dc6129fcec6c38"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 48154319,
					"digest": "sha256:a1593bc232eeb9f23957084e12e8d4be6b14f8a99d43fd93b74963cfca9a24a9"
				}
			]
		}
	},
	{
		"Ref": "docker.io/library/debian:buster@sha256:e2dfabc3b3958ebbf3b5c007b1d557a49d68c3ad07a0137ef4263b45f90371fc",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:e2dfabc3b3958ebbf3b5c007b1d557a49d68c3ad07a0137ef4263b45f90371fc",
			"size": 529,
			"platform": {
				"architecture": "arm",
				"os": "linux",
				"variant": "v7"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 1475,
				"digest": "sha256:9c8cb83393db2d78474684771e30adfa89b6820ac0c35acd3e7cf71d86262fd3"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 45918126,
					"digest": "sha256:2c5d3a36ba44675d774d996a47340758fe658760807ada03e875b485efd98631"
				}
			]
		}
	},
	{
		"Ref": "docker.io/library/debian:buster@sha256:513fb887826a93e31a0fc321fc1b425681d806dec87c23996304b68952e74166",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:513fb887826a93e31a0fc321fc1b425681d806dec87c23996304b68952e74166",
			"size": 529,
			"platform": {
				"architecture": "arm64",
				"os": "linux",
				"variant": "v8"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 1477,
				"digest": "sha256:d89af8d127c5228a69ab74a3b442f8d17c73c156fa42479c3f75b569527208f8"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 49223045,
					"digest": "sha256:39e4f823356a9e2dbba530f9d363b4d76beaff75a13bad788d38eebeae67e5b0"
				}
			]
		}
	},
	{
		"Ref": "docker.io/library/debian:buster@sha256:31633d43755ba72b393b1f87f86e1ba166e5c749aa9b983f536a93a56f8db72d",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:31633d43755ba72b393b1f87f86e1ba166e5c749aa9b983f536a93a56f8db72d",
			"size": 529,
			"platform": {
				"architecture": "386",
				"os": "linux"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 1460,
				"digest": "sha256:0f748b4ae4c84d01f74e031cf5696a8838cb50a03fd050f93d9eff7fb1b24756"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 51207683,
					"digest": "sha256:3ed3f8c8d2c596bcef8beee90c4666003f0848cc4a23aaa9a695092c1c5da63e"
				}
			]
		}
	},
	{
		"Ref": "docker.io/library/debian:buster@sha256:0723c5922ea4b6f9970388b8744b9eb88ce3e23a4be563575ef6422e78ebecbb",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:0723c5922ea4b6f9970388b8744b9eb88ce3e23a4be563575ef6422e78ebecbb",
			"size": 529,
			"platform": {
				"architecture": "mips64le",
				"os": "linux"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 1465,
				"digest": "sha256:1e2d6b2807f976b0d0208528a986d5ce47ce1768705356020564d082ed692f89"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 49079474,
					"digest": "sha256:db00aa5bb460f25028e8f1293b6a9fc4b7a63ea0c2601b0b4e6f8b9e5acfa683"
				}
			]
		}
	},
	{
		"Ref": "docker.io/library/debian:buster@sha256:90b1140489ded702d3424b85dcba30ea2f41c3098fd9ec8fe4b48536b18f2756",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:90b1140489ded702d3424b85dcba30ea2f41c3098fd9ec8fe4b48536b18f2756",
			"size": 529,
			"platform": {
				"architecture": "ppc64le",
				"os": "linux"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 1464,
				"digest": "sha256:9a23435a17a291fc393f6a795f17506c556f3c3fb2e2274ac5e5f7a16e578081"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 54183785,
					"digest": "sha256:e863e86537b5fca0c122c393c3cc05bd29095cfe10c60ec14e3024f0589df622"
				}
			]
		}
	},
	{
		"Ref": "docker.io/library/debian:buster@sha256:c80020fdd952b5a2c9d660b723fde16661cacaecb70102140c54c86d92dfcd3c",
		"Descriptor": {
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"digest": "sha256:c80020fdd952b5a2c9d660b723fde16661cacaecb70102140c54c86d92dfcd3c",
			"size": 529,
			"platform": {
				"architecture": "s390x",
				"os": "linux"
			}
		},
		"SchemaV2Manifest": {
			"schemaVersion": 2,
			"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
			"config": {
				"mediaType": "application/vnd.docker.container.image.v1+json",
				"size": 1462,
				"digest": "sha256:2c154b650dfa3cf8df3e1285b5e99833c3fa90aba0652da8b92279fbe5acca1f"
			},
			"layers": [
				{
					"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
					"size": 49005485,
					"digest": "sha256:880071943e5204965576e16f73b7be79cd355b6dfa2808413019623a4fc50be8"
				}
			]
		}
	}
]

We can see architecture, os and variant in there. Once the engine has pulled this list, it can use these three fields to decide exactly which image to use.
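
The engine first has to express the local machine in the same vocabulary. Below is a minimal Python sketch of that translation, assuming the usual `uname -m` machine names; the table is deliberately incomplete, and local_platform and matches are hypothetical helper names.

```python
import platform

# Kernel machine names mapped to the architecture/variant names used in manifests
MACHINE_TO_PLATFORM = {
    "x86_64": ("amd64", None),
    "amd64": ("amd64", None),
    "i386": ("386", None),
    "i686": ("386", None),
    "aarch64": ("arm64", "v8"),
    "arm64": ("arm64", "v8"),
    "armv7l": ("arm", "v7"),
    "armv6l": ("arm", "v6"),
    "ppc64le": ("ppc64le", None),
    "s390x": ("s390x", None),
    "riscv64": ("riscv64", None),
}


def local_platform(machine=None, system=None):
    """Describe a machine with the os/architecture/variant fields of a manifest."""
    machine = (machine or platform.machine()).lower()
    system = (system or platform.system()).lower()
    architecture, variant = MACHINE_TO_PLATFORM.get(machine, (machine, None))
    result = {"os": system, "architecture": architecture}
    if variant:
        result["variant"] = variant
    return result


def matches(entry_platform, wanted):
    """True if a manifest entry's platform agrees with every field we ask for."""
    return all(entry_platform.get(key) == value for key, value in wanted.items())


print(local_platform())
```

For instance, local_platform("armv7l", "Linux") gives the linux / arm / v7 triple that appears in the list above.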

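Conversely, we can build several images and have the build engine create a list, add the information for each architecture one by one, and point each entry at that image's actual layer information.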
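
Going in that direction, assembling the list is mostly bookkeeping. The Python sketch below builds a manifest list from per-architecture manifest bytes; the two byte strings are placeholders rather than real manifests, and list_entry and manifest_list are names I made up. The one real constraint is that size and digest must be computed over exactly the bytes stored in the registry.

```python
import hashlib
import json

MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"
MANIFEST_LIST_V2 = "application/vnd.docker.distribution.manifest.list.v2+json"


def list_entry(manifest_bytes, os_name, architecture, variant=None):
    """Describe one per-architecture manifest for inclusion in a manifest list."""
    platform = {"architecture": architecture, "os": os_name}
    if variant:
        platform["variant"] = variant
    return {
        "mediaType": MANIFEST_V2,
        "size": len(manifest_bytes),
        "digest": "sha256:" + hashlib.sha256(manifest_bytes).hexdigest(),
        "platform": platform,
    }


def manifest_list(entries):
    """Wrap the entries in the envelope that a multi-architecture tag points at."""
    return {"schemaVersion": 2, "mediaType": MANIFEST_LIST_V2, "manifests": list(entries)}


# Placeholder bytes standing in for the manifests of two already pushed images
amd64_manifest = b'{"placeholder": "amd64 image manifest"}'
arm64_manifest = b'{"placeholder": "arm64 image manifest"}'

index = manifest_list([
    list_entry(amd64_manifest, "linux", "amd64"),
    list_entry(arm64_manifest, "linux", "arm64", "v8"),
])
print(json.dumps(index, indent=4))
```

The output has the same shape as the debian:buster list shown earlier, just with two entries.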

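Creating a multi-architecture image by hand with Docker

According to the Docker blog, after building and pushing images on different machines, we can create a multi-architecture image by hand with docker manifest: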

# AMD64
$ docker build -t your-username/multiarch-example:manifest-amd64 --build-arg ARCH=amd64/ .
$ docker push your-username/multiarch-example:manifest-amd64

# ARM32V7
$ docker build -t your-username/multiarch-example:manifest-arm32v7 --build-arg ARCH=arm32v7/ .
$ docker push your-username/multiarch-example:manifest-arm32v7

# ARM64V8
$ docker build -t your-username/multiarch-example:manifest-arm64v8 --build-arg ARCH=arm64v8/ .
$ docker push your-username/multiarch-example:manifest-arm64v8

# Create multi-arch manifest
$ docker manifest create \
your-username/multiarch-example:manifest-latest \
--amend your-username/multiarch-example:manifest-amd64 \
--amend your-username/multiarch-example:manifest-arm32v7 \
--amend your-username/multiarch-example:manifest-arm64v8

# Push the manifest list to the registry
$ docker manifest push your-username/multiarch-example:manifest-latest

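In short: build three images separately, push them to the registry, and finally have the docker client tell the registry directly to put those three images into a list that forms a new image.

This approach suits the case where we have machines of different architectures that a CI system can schedule onto. If we would rather build for every architecture with a single command, the tools below cover that.

Using buildah/podman

Just in case, I will assume most readers have not used podman much. podman is a docker replacement; Red Hat seems to be pushing it, and it is integrated into many components of RHEL 8.

podman uses buildah as its image build tool, so podman build and buildah build are the same command. We can pass --platform to specify which platforms to build for and build them all in one go.

The one annoyance is that during the couple of days I spent writing this post, buildah's multi-platform build seemed to have bugs and did not work particularly well.

Using Docker buildx

docker's multi-platform build tool lives in buildx. It is used much like podman.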

docker buildx build --platform="linux/arm64,linux/amd64" --tag registry.jeffthecoder.xyz/guocaho/iam:0.0.1 .
...

Unsolved problems

Images for platforms other than the local one cannot be built. podman and docker report an exec format error:

podman build --platform="linux/arm64" --tag registry.jeffthecoder.xyz/guocaho/iam:0.0.1 .
[1/2] STEP 1/5: FROM golang:buster
Resolving "golang" using unqualified-search registries (/etc/containers/registries.conf)
Trying to pull docker.io/library/golang:buster...
Getting image source signatures
Copying blob 4435d4bbc53c skipped: already exists  
Copying blob df66cf961d40 skipped: already exists  
Copying blob e8921db2f78f skipped: already exists  
Copying blob 39e4f823356a skipped: already exists  
Copying blob b9db99d12f51 skipped: already exists  
Copying blob f36cb2a39eb4 skipped: already exists  
Copying blob cf6c547e43b8 skipped: already exists  
Copying config df3537e9a5 done  
Writing manifest to image destination
Storing signatures
[1/2] STEP 2/5: ADD  . /app
--> Using cache 23a714426667c5c74b953b9409261010cb405f57eadde833e5fdcadd23dfa5bc
--> 23a71442666
[1/2] STEP 3/5: WORKDIR /app
--> Using cache 11ed84cfc39b4f3aa32f542df1e13b94aabf1f7124b1fb1e1ce1f375481c324c
--> 11ed84cfc39
[1/2] STEP 4/5: RUN  sed -i 's/deb.debian.org/mirrors.ustc.edu.cn/g; s|security.debian.org/debian-security|mirrors.ustc.edu.cn/debian-security|g' /etc/apt/sources.list && apt update && apt install -y build-essential libsqlite3-dev
exec container process `/bin/sh`: Exec format error
[2/2] STEP 1/5: FROM debian:buster
Error: error building at STEP "RUN sed -i 's/deb.debian.org/mirrors.ustc.edu.cn/g; s|security.debian.org/debian-security|mirrors.ustc.edu.cn/debian-security|g' /etc/apt/sources.list && apt update && apt install -y build-essential libsqlite3-dev": error while running runtime: exit status 1

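This happens because every executable is compiled for a specific target architecture, and a binary built for a different architecture generally cannot run on the host.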
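The target architecture is recorded in the file itself. As a small illustration, the sketch below reads the e_machine field from an ELF header, which is what the kernel looks at before refusing to run /bin/sh from an arm64 image on an amd64 host. The function name elf_machine and the short lookup table are my own; the table covers only a handful of values from the ELF specification.

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}


def elf_machine(header):
    """Return the architecture name recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) is 1 for little-endian and 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown (%d)" % machine)


if __name__ == "__main__":
    # Synthetic header: 64-bit, little-endian, e_type=2, e_machine=183.
    fake = b"\x7fELF\x02\x01\x01" + b"\x00" * 9 + struct.pack("<HH", 2, 183)
    print(elf_machine(fake))
```

To inspect a real binary, pass the first 20 bytes of the file, for example the /bin/sh extracted from an image.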

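How one machine builds multi-arch images

From the above, the build process itself is not really any different: it is all a matter of performing some operations to put files in, or adding information to the manifest. The only difference is which architecture's binaries go in.

For projects that are easy to cross-compile, the binaries can be built in the earlier stages of the Dockerfile and then copied into the final image. That is the manual docker approach described above, but it ends up being a lot more hassle.

If we want to start containers for other platforms on the same machine, we need a slightly unorthodox approach: executing other platforms' binaries directly through user-mode qemu. I will not dig into the mechanism here. In terms of usage, first install the qemu-user-static package, which should provide the qemu-$PLATFORM-static binaries. Then run this image: docker run --rm --privileged multiarch/qemu-user-static --reset -p yes. After that, images for other platforms can be run directly.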
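To check whether the registration took effect, one option is to look at the kernel's binfmt_misc entries, which is where that image registers the qemu interpreters. The sketch below parses the text of one such entry. The entry name qemu-aarch64 and the sample contents are assumptions based on what these files typically look like, so compare with your own /proc/sys/fs/binfmt_misc before relying on it.

```python
def parse_binfmt_entry(text):
    """Parse the text of one /proc/sys/fs/binfmt_misc/<name> file into a dict."""
    lines = text.strip().splitlines()
    # The first line is the status, the rest are "key value" pairs.
    entry = {"enabled": bool(lines) and lines[0].strip() == "enabled"}
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        entry[key.rstrip(":")] = value.strip()
    return entry


# Assumed sample of /proc/sys/fs/binfmt_misc/qemu-aarch64 (not captured from a real host).
SAMPLE = """enabled
interpreter /usr/bin/qemu-aarch64-static
flags: F
offset 0
magic 7f454c460201010000000000000000000200b700
mask ffffffffffffff00fffffffffffffffffeffffff
"""

if __name__ == "__main__":
    entry = parse_binfmt_entry(SAMPLE)
    print(entry["enabled"], entry["interpreter"])
```

On a real host, read the file with open("/proc/sys/fs/binfmt_misc/qemu-aarch64").read() and pass the result in; a missing file means no handler is registered for that architecture.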

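Building images without Docker

Building an image is essentially starting a container, doing various things inside it, and packing the result into a tar, with each step forming a layer. Since it is a container, CI running inside a Kubernetes cluster faces an extra difficulty: nesting containers inside containers is hard, mounting docker's socket is a security risk, and Kubernetes is also planning to retire docker. So we need a way for CI to build images from inside a container.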
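The "pack it into a tar" part can be sketched in a few lines. The example below packs some in-memory files into a gzip-compressed tar and computes the two hashes involved: the digest of the compressed blob, which is what a manifest's layers entry records, and the digest of the uncompressed tar, which the image config lists as the diff ID. The function name make_layer and the file contents are invented; a real builder would capture the filesystem changes of a build step instead.

```python
import gzip
import hashlib
import io
import tarfile


def make_layer(files):
    """Pack {path: bytes} into a gzipped tar; return (blob, digest, diff_id)."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    tar_bytes = raw.getvalue()
    # mtime=0 keeps the gzip header stable so the digest is reproducible.
    blob = gzip.compress(tar_bytes, mtime=0)
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    diff_id = "sha256:" + hashlib.sha256(tar_bytes).hexdigest()
    return blob, digest, diff_id


if __name__ == "__main__":
    blob, digest, diff_id = make_layer({"app/hello.txt": b"hello from a layer\n"})
    print(len(blob), digest, diff_id)
```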

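Building with buildah instead of Docker / or with rootless docker

The solution here is similar to the one in my earlier post 《在LXD中运行containerd》 (Running containerd in LXD): rootless containers, which can be created and run without the root user or any privileges.

buildah and podman are well ahead in this area; podman is where I first started using rootless containers. Once subuid is configured on the host side, podman and buildah can be used inside a container to start further containers and build images without trouble.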
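For reference, /etc/subuid holds one user:start:count entry per line. The sketch below parses that format and checks whether a user has enough subordinate IDs. The threshold of 65536 is an assumption (a common allocation size, not a hard requirement), and the user name and range in the demo are example values rather than anything taken from a real host.

```python
def parse_subuid(text):
    """Parse /etc/subuid-style text into {user: [(start, count), ...]}."""
    ranges = {}
    for line in text.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 3:
            # Skip blank or malformed lines instead of failing.
            continue
        user, start, count = parts
        ranges.setdefault(user, []).append((int(start), int(count)))
    return ranges


def has_enough_ids(ranges, user, needed=65536):
    """True if the user's ranges add up to at least `needed` IDs (assumed threshold)."""
    return sum(count for _, count in ranges.get(user, [])) >= needed


if __name__ == "__main__":
    # Example contents; on a real host read /etc/subuid instead.
    sample = "guochao:100000:65536\n"
    ranges = parse_subuid(sample)
    print(ranges, has_enough_ids(ranges, "guochao"))
```

/etc/subgid uses the same layout for group IDs, so the same parser applies there.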

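If you are interested, see Red Hat's write-up: https://www.redhat.com/sysadmin/podman-inside-container

Since we are building images in a k8s environment, we can start a fresh container for every build, so we do not care whether the environment inside the container gets broken. As long as we can use the container's isolation to safely mount directories, run whatever instructions we need, and finally push the image to the registry, we count that as a win.

Mapped onto Red Hat's steps, what we need to do is prepare a base image with podman installed (or use the podman container image directly), modify /etc/containers/storage.conf and containers.conf inside it, and create the required directories.

A demo of running a container inside a container: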

guochao@yoga14c ~ % podman run -it --rm --cap-add=sys_admin,mknod --device /dev/fuse centos:8
[root@eed0acd277a8 /]# dnf install podman vim crun -y
Failed to set locale, defaulting to C.UTF-8
CentOS-8 - AppStream                                                                                                                                                                                          3.9 MB/s | 8.2 MB     00:02    
CentOS-8 - Base                                                                                                                                                                                               1.3 MB/s | 3.5 MB     00:02    
CentOS-8 - Extras                                                                                                                                                                                             7.6 kB/s |  10 kB     00:01    
Dependencies resolved.
==============================================================================================================================================================================================================================================
 Package                                                         Architecture                               Version                                                                       Repository                                     Size
==============================================================================================================================================================================================================================================
Installing:
 podman                                                          x86_64                                     3.3.1-9.module_el8.5.0+988+b1f0b741                                           AppStream                                      12 M
 vim-enhanced                                                    x86_64                                     2:8.0.1763-16.el8                                                             AppStream                                     1.4 M
 crun                                            x86_64                                            1.0-1.module_el8.5.0+911+f19012f9                                               AppStream                                            193 k
....

[root@eed0acd277a8 /]# vim /etc/containers/containers.conf
[containers]
netns="host"
userns="host"
ipcns="host"
utsns="host"
cgroupns="host"
cgroups="disabled"
log_driver = "k8s-file"
[engine]
cgroup_manager = "cgroupfs"
events_logger="file"
runtime="crun"

[root@eed0acd277a8 /]# chmod 644 /etc/containers/containers.conf; sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' -e 's|^mountopt[[:space:]]*=.*$|mountopt = "nodev,fsync=0"|g' /etc/containers/storage.conf
[root@eed0acd277a8 /]# mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers /var/lib/shared/vfs-images /var/lib/shared/vfs-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock; touch /var/lib/shared/vfs-images/images.lock; touch /var/lib/shared/vfs-layers/layers.lock

[root@eed0acd277a8 /]# podman run -it --rm vhmjt3oe.mirror.aliyuncs.com/library/ubuntu:focal
root@eed0acd277a8:/#

From then on, we start a build container from this image every time, and can build images relatively safely.