
Re: Docker builds vs package builds



On 23/05/14 12:33 +0000, Colin Walters wrote:
I posted something related to this on atomic@, but it's more of a development thing, so moving here.

After playing with Dockerfiles for some of my code, one thing that's becoming very clear to me is that using Dockerfiles certainly makes it easy to get started, but the simplicity of the model/format seriously hampers efficient integration with RPM (or many other build systems), and has some subtle traps.

For example, a simple thing you see in a lot of Dockerfiles is:

RUN yum -y update

Except...this gets cached by Docker which is **not** what you want in general. This is https://github.com/dotcloud/docker/issues/1996

I ended up doing the recommended workaround in that bug of adding a comment, so it's RUN yum -y update #nocache20140523.0
And whenever I want to avoid the cache I change the comment.
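Changing that comment by hand is easy to forget, so it can be scripted. The following is only a minimal sketch: it assumes the marker always has the form `#nocacheYYYYMMDD.N` as in the example above, and the helper name `bump_nocache` is made up for illustration.

```python
import re
from datetime import date

# Matches the hand-written marker from the post, e.g. "#nocache20140523.0".
NOCACHE_RE = re.compile(r"#nocache(\d{8})\.(\d+)")


def bump_nocache(dockerfile_text: str, today: date) -> str:
    """Return the Dockerfile text with every #nocache marker advanced.

    Hypothetical helper: the marker format is an assumption taken from
    the example in this thread, not anything Docker itself defines.
    """
    stamp = today.strftime("%Y%m%d")

    def replace(match: re.Match) -> str:
        old_stamp, serial = match.group(1), int(match.group(2))
        # Same day: bump the serial. Different day: restart at 0.
        new_serial = serial + 1 if old_stamp == stamp else 0
        return f"#nocache{stamp}.{new_serial}"

    return NOCACHE_RE.sub(replace, dockerfile_text)
```

Because Docker keys the cached layer on the instruction text, any change to the marker should invalidate that RUN step and every step after it.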

There is also `--no-cache` available to `docker build`.

The right thing depends on circumstance, but it would be a lot more efficient to do a check of the repository timestamps, and reuse the cached layer if they haven't changed. A further optimization would be to reuse the cached layer if none of the packages have changed (this matters a lot for Fedora, where the repo changes a lot but not always for packages you actually use).
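As a rough illustration of the timestamp check, a key can be derived from the repository's `repomd.xml`, which stays the same as long as the metadata timestamps do. This is a sketch under assumptions: the function name `repo_cache_key` is hypothetical, fetching the file is left out, and it covers only the first idea (repository timestamps), not the per-package comparison.

```python
import hashlib
import xml.etree.ElementTree as ET

# XML namespace used by yum-style repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def repo_cache_key(repomd_xml: str) -> str:
    """Derive a short key from the <timestamp> of each <data> entry.

    Hypothetical helper: the key changes whenever any metadata
    timestamp changes and is stable otherwise.
    """
    root = ET.fromstring(repomd_xml)
    stamps = []
    for data in root.findall("repo:data", REPO_NS):
        ts = data.findtext("repo:timestamp", default="", namespaces=REPO_NS)
        stamps.append(f"{data.get('type')}={ts}")
    # Sort so the key does not depend on element order in the file.
    digest = hashlib.sha256("\n".join(sorted(stamps)).encode("utf-8"))
    return digest.hexdigest()[:12]
```

One way to use such a key would be to place it in the trailing comment of the RUN line instead of a hand-edited date, so the layer is rebuilt only when the repository metadata has actually changed.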

Doing this sort of intelligence requires nontrivial code; it's not clear to me whether it should live inside or outside of the container.

It's hard for it to live inside the container as the container itself (AFAIK) can't drive the Docker layer caching. We'd have to do something like include within each layer cache metadata (such as the repository timestamps), and then enhance yum to leave the system untouched if the timestamps haven't changed.

This gets really quite ugly as we'd need to do pervasive O_NOATIME inside the container to avoid any changes from simply running code, clean up every temporary file, etc.

Then the other model is to try to entirely drive the process from outside of Docker. But if you do that, it's again hard to reuse the Docker image cache; you'd have to keep cached state on the host system.
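The host-side variant might look roughly like the sketch below: a small state file on the host remembers the last key seen per image, and `--no-cache` is passed to `docker build` only when the key differs. The state file layout and the names `needs_rebuild` and `build_command` are assumptions made for illustration, not an existing tool.

```python
import json
from pathlib import Path


def needs_rebuild(state_file: Path, image: str, key: str) -> bool:
    """Return True if `key` differs from the one stored for `image`.

    Hypothetical helper. The new key is recorded immediately; a real
    tool would probably record it only after the build succeeds.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    changed = state.get(image) != key
    if changed:
        state[image] = key
        state_file.write_text(json.dumps(state, indent=2, sort_keys=True))
    return changed


def build_command(image: str, context: str, rebuild: bool) -> list[str]:
    """Assemble a `docker build` invocation, bypassing the cache on demand."""
    cmd = ["docker", "build", "-t", image]
    if rebuild:
        cmd.append("--no-cache")
    return cmd + [context]
```

This also shows the drawback described above: the cache decision lives in a file on the host rather than in the image, so it is lost when the build moves to another machine.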

Is anyone aware of any advanced work on package (or other advanced buildsystem/deployment) integration with Dockerfiles? This topic could probably use a wiki page or something; it seems like a fairly open area.

I'm not entirely sure of any more advanced uses going on around this. :-\


