INFRA-162: Increase BuildKite parallelism

Metadata

Source
INFRA-162
Type
Improvement
Priority
Major
Status
Closed
Resolution
Won't Fix
Assignee
Giovanni Tirloni
Reporter
Giovanni Tirloni
Created
2018-04-12T11:36:28.307-0400
Updated
2018-08-29T13:24:13.664-0400
Versions
N/A
Fixed Versions
N/A
Component
N/A

Description

Buildkite's agent works on a single build at a time. To increase parallelism and make effective use of all CPUs for our builds, more agents need to be started per host.

Comments

  • Giovanni Tirloni commented 2018-04-13T08:51:41.233-0400

    Job Concurrency

    While Jenkins lets you tweak the number of executors and GitLab lets you define how many concurrent jobs a runner supports, the Buildkite agent runs one job at a time. This makes it simpler and more robust, which are trade-offs I can agree with.

    If you want to increase the job concurrency per host, you have to deploy multiple agents.

    The official documentation is a bit light on details so I’ll add the steps I have followed to achieve this. My environment consists of CentOS hosts and the agents are started by systemd. I’ve tested these steps on buildkite-agent v3.0.

    The idea is to have a single agent configuration file with global settings for the host and tweak the per-agent settings using a systemd unit template, so you can easily instantiate as many units as needed.

    The only settings you need to define in the configuration file are the token and any tags. Tags are useful when you need to define a different queue name or add meta-data about your hosts (e.g. queue=priority,hypervisor=kvm,docker=true) that can later be used to target jobs.

    The minimum per-agent settings are:

    • name
    • build-path
    • hooks-path
    • plugins-path

    As mentioned above, these are set automatically through the systemd unit rather than in the configuration file.

    If you need to set per-host settings, use the configuration file. For per-agent settings, add them to the systemd unit file.

    Another approach is to have per-agent configuration files instead of using systemd (and environment variables). For that, tweak the systemd unit file below to pass the BUILDKITE_AGENT_CONFIG environment variable or the --config parameter to buildkite-agent start.
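
    A minimal sketch of the per-agent configuration file alternative. The render_agent_config helper, the agent-N.cfg file naming and the directory layout are assumptions for illustration; only the setting names and the --config parameter come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: print a per-agent configuration for instance $1.
# $2 optionally overrides the host name (defaults to `hostname`).
# The token placeholder and paths are illustrative assumptions.
render_agent_config() {
  instance="$1"
  host="${2:-$(hostname)}"
  cat <<EOF
token="xxx"
name="${host}-${instance}"
build-path="/var/lib/buildkite-agent-${instance}/builds"
hooks-path="/etc/buildkite-agent/hooks"
plugins-path="/etc/buildkite-agent/plugins"
EOF
}

# Example (not executed here): write one file per instance, then point each
# unit at its own file, e.g.
#   ExecStart=/usr/bin/buildkite-agent start --config /etc/buildkite-agent/agent-%i.cfg
# for i in 1 2; do render_agent_config "$i" > "/etc/buildkite-agent/agent-$i.cfg"; done
```

    With this variant the token is repeated in every file, which is one reason to prefer the shared configuration file plus a systemd template.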

    Steps

    First, install the agent following the official installation steps.

    Edit the /etc/buildkite-agent/buildkite-agent.cfg file so it only has the following lines:

    token="xxx"
    tags="tag1=value1,tag2=value2"     # optional
    

    In other words, ensure that per-agent settings like name, build-path, hooks-path and plugins-path are not defined in the configuration file.

    Next, edit the /usr/lib/systemd/system/buildkite-agent@.service file and replace it with the following:

    [Unit]
    Description=Buildkite Agent (%i)
    Documentation=https://buildkite.com/agent
    After=syslog.target
    After=network.target
    
    [Service]
    Type=simple
    User=buildkite-agent
    PermissionsStartOnly=true
    Environment=HOME=/var/lib/buildkite-agent-%i
    Environment=BUILDKITE_AGENT_NAME=%H-%i
    Environment=BUILDKITE_BUILD_PATH=/var/lib/buildkite-agent-%i/builds
    Environment=BUILDKITE_HOOKS_PATH=/etc/buildkite-agent/hooks
    Environment=BUILDKITE_PLUGINS_PATH=/etc/buildkite-agent/plugins
    ExecStartPre=/bin/mkdir -p /var/lib/buildkite-agent-%i/builds
    ExecStartPre=/bin/chown -R buildkite-agent /var/lib/buildkite-agent-%i
    ExecStart=/usr/bin/buildkite-agent start
    RestartSec=5
    Restart=on-failure
    TimeoutSec=10
    
    [Install]
    WantedBy=multi-user.target
    DefaultInstance=1
    

    Each agent will get a directory under /var/lib/buildkite-agent-%i where %i is the instance argument.

    Reload the systemd configuration with systemctl daemon-reload.

    Now you are ready to start as many agents as needed:

    # systemctl enable buildkite-agent@1.service
    # systemctl start buildkite-agent@1.service
    

    Repeat these steps with increasing instance numbers, and your agents should appear in the Buildkite dashboard.
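
    Repeating the enable and start commands for several instances can be scripted. The following is a sketch only: the agent_units and start_agents helpers and the DRY_RUN switch are hypothetical names, and it assumes the buildkite-agent@.service template shown above is installed.

```shell
#!/bin/sh
# Print the unit name for each of the first N instances, one per line.
agent_units() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "buildkite-agent@${i}.service"
    i=$((i + 1))
  done
}

# Enable and start N agents. Set DRY_RUN=1 to only print the commands.
start_agents() {
  for unit in $(agent_units "$1"); do
    for action in enable start; do
      if [ -n "${DRY_RUN:-}" ]; then
        echo "systemctl ${action} ${unit}"
      else
        systemctl "${action}" "${unit}"
      fi
    done
  done
}

# Example (as root): start_agents 4
```

    Running it with DRY_RUN=1 first shows which units would be touched before anything changes on the host.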

    In case you want to change the agent name to something other than %hostname-%instance, ensure it’s set to something globally unique. For a list of systemd specifiers, see the official documentation (table 4).

    Caveats

    It might seem unintuitive at first but Buildkite can schedule jobs from a single pipeline to different agents.

    If all your pipeline steps need to run on the same host, you have to ensure that each step is properly targeted at that host using a combination of meta-data selectors (see the agents attribute for the command step).

    - name: build
      command: echo 'hello world'
      agents:
        name: hostname
        queue: default
    

    There is a feature request to have per-pipeline agent rules so the agent doesn’t need to be specified per step (buildkite/feedback/issues/173), which should simplify this.

    Source: https://ret.cx/2018/04/running-multiple-buildkite-agents-per-host/

  • Giovanni Tirloni commented 2018-04-13T09:05:51.613-0400

    Avtar Gill, does this approach sound okay? I'm just worried about what I mentioned in the caveats section. That will make the pipeline.yml file look a bit ugly. Some people have tried to simplify it by turning the YAML into a shell script that reads the agent's name and adds it to every step (so the step becomes "pipeline.sh | buildkite-agent pipeline upload"). If issue #173 were implemented, that would be awesome, but I'm not sure how long we'd have to wait for that.
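
    A rough sketch of what such a pipeline.sh could look like. The render_pipeline helper is a hypothetical name, and the sketch assumes each agent carries a 'name' tag equal to its agent name and that BUILDKITE_AGENT_NAME is set in the job environment; the step itself is the example from the caveats section.

```shell
#!/bin/sh
# Hypothetical pipeline.sh: emit a pipeline whose steps all target the
# agent named in $1, so every step lands on the same agent.
render_pipeline() {
  agent_name="$1"
  cat <<EOF
steps:
  - name: build
    command: echo 'hello world'
    agents:
      name: ${agent_name}
      queue: default
EOF
}

# Only render when run inside a job that exposes the agent name.
# Usage (not run here): ./pipeline.sh | buildkite-agent pipeline upload
if [ -n "${BUILDKITE_AGENT_NAME:-}" ]; then
  render_pipeline "${BUILDKITE_AGENT_NAME}"
fi
```

    The cost is that the pipeline definition now lives in a script instead of a static YAML file, which is the ugliness mentioned above in another form.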

    I'm trying to think of a situation where running steps on different hosts would be beneficial; it's so different from the way we have been working for years now.

  • Giovanni Tirloni commented 2018-04-16T20:54:16.520-0400

    Made some changes to the systemd unit file so that each agent runs as a separate user:

    [Unit]
    Description=Buildkite Agent (%i)
    Documentation=https://buildkite.com/agent
    After=syslog.target
    After=network.target
    
    [Service]
    Type=simple
    User=buildkite-agent-%i
    PermissionsStartOnly=true
    Environment=HOME=/home/buildkite-agent-%i
    Environment=BUILDKITE_AGENT_NAME=%H-%i
    Environment=BUILDKITE_BUILD_PATH=/home/buildkite-agent-%i/builds
    Environment=BUILDKITE_HOOKS_PATH=/etc/buildkite-agent/hooks
    Environment=BUILDKITE_PLUGINS_PATH=/etc/buildkite-agent/plugins
    ExecStartPre=/bin/mkdir -p /home/buildkite-agent-%i/builds
    ExecStartPre=/bin/chown -R buildkite-agent-%i /home/buildkite-agent-%i
    ExecStart=/usr/bin/buildkite-agent start
    RestartSec=5
    Restart=on-failure
    TimeoutSec=10
    
    [Install]
    WantedBy=multi-user.target
    DefaultInstance=1
    
    Each instance needs a matching user, created with:

    useradd -g buildkite-agent buildkite-agent-1
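
    Creating the per-instance users can be made repeatable with a small helper. This is a sketch under assumptions: ensure_agent_user and the DRY_RUN switch are hypothetical names, and the buildkite-agent group is assumed to exist already (created by the agent package).

```shell
#!/bin/sh
# Create the user for agent instance $1 unless it already exists.
# Set DRY_RUN=1 to print the useradd command instead of running it.
ensure_agent_user() {
  user="buildkite-agent-$1"
  if id -u "$user" >/dev/null 2>&1; then
    return 0
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo "useradd -g buildkite-agent ${user}"
  else
    useradd -g buildkite-agent "${user}"
  fi
}

# Example (as root): for i in 1 2 3 4; do ensure_agent_user "$i"; done
```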
    
  • Justin Obara commented 2018-05-22T10:24:53.090-0400

    Merged PR ( https://github.com/fluid-project/infusion/pull/901 ) into the project repo at 8b7038bd1c50df813748fe12cfb186b9ae8546ab

  • Giovanni Tirloni commented 2018-05-22T12:21:17.866-0400

    Thanks Justin Obara! Now we can spin up new agents for other purposes and they won't interfere.

  • Giovanni Tirloni commented 2018-05-24T16:27:47.060-0400

    Until the following features are implemented, it's necessary to define a 'name' tag in the agent configuration (through an environment variable, so that systemd sets it automatically from the agent name) so we have the information needed to properly target the agent:

    Submitted a new PR with the necessary changes (https://github.com/fluid-project/infusion/pull/905)

    Deployed 4 agents on h-0005.

    This is the latest systemd unit file:

    # cat /usr/lib/systemd/system/buildkite@.service
    [Unit]
    Description=Buildkite Agent (%i)
    Documentation=https://buildkite.com/agent
    After=syslog.target
    After=network.target
    
    [Service]
    Type=simple
    User=buildkite-agent-%i
    PermissionsStartOnly=true
    Environment=HOME=/home/buildkite-agent-%i
    Environment=BUILDKITE_AGENT_NAME=%H-%i
    Environment=BUILDKITE_AGENT_TAGS=name=%H-%i,env=dev,type=physical,hypervisor=virtualbox,docker=true,vagrant=true
    Environment=BUILDKITE_BUILD_PATH=/home/buildkite-agent-%i/builds
    Environment=BUILDKITE_HOOKS_PATH=/etc/buildkite-agent/hooks
    Environment=BUILDKITE_PLUGINS_PATH=/etc/buildkite-agent/plugins
    ExecStartPre=/bin/mkdir -p /home/buildkite-agent-%i/builds
    ExecStartPre=/bin/chown -R buildkite-agent-%i /home/buildkite-agent-%i/builds
    ExecStart=/usr/bin/buildkite-agent start
    RestartSec=5
    Restart=on-failure
    TimeoutSec=10
    
    [Install]
    WantedBy=multi-user.target
    DefaultInstance=1
    
  • Giovanni Tirloni commented 2018-08-29T13:24:13.661-0400

    It's unlikely we will continue to use Buildkite at this point (and Jenkins has been configured for some Fluid projects already).

    With that in mind, I've rolled back these changes and uninstalled buildkite-agent from h-0005.