INFRA-162 | Fluid Project Issues Archive

Metadata

Source: INFRA-162
Type: Improvement
Priority: Major
Status: Closed
Resolution: Won't Fix
Assignee: Giovanni Tirloni
Reporter: Giovanni Tirloni
Created: 2018-04-12T11:36:28.307-0400
Updated: 2018-08-29T13:24:13.664-0400
Versions: N/A
Fixed Versions: N/A
Component: N/A

Description

The Buildkite agent runs a single job at a time. To increase parallelism and make effective use of all CPUs for our builds, we need to start multiple agents per host.

Comments

Giovanni Tirloni commented 2018-04-13T08:51:41.233-0400

Job Concurrency

While Jenkins lets you tweak the number of executors and GitLab lets you define how many concurrent jobs a runner supports, the Buildkite agent runs one job at a time. This makes it simpler and more robust, which are trade-offs I can agree with.

If you want to increase the job concurrency per host, you have to deploy multiple agents.

The official documentation is a bit light on details so I’ll add the steps I have followed to achieve this. My environment consists of CentOS hosts and the agents are started by systemd. I’ve tested these steps on buildkite-agent v3.0.

The idea is to have a single agent configuration file with global settings for the host and tweak the per-agent settings using a systemd unit template, so you can easily instantiate as many units as needed.

The only settings you will need to define in that configuration file are the token and any tags you might need. Tags are useful when you need to define a different queue name or add meta-data about your hosts (e.g. queue=priority,hypervisor=kvm,docker=true) that can later be used to target jobs.

The minimum per-agent settings are:
- name
- build-path
- hooks-path
- plugins-path
As I’ve mentioned, all four of these will be set automatically by the systemd unit.

If you need to set per-host settings, use the configuration file. For per-agent settings, add them to the systemd unit file.

Another approach is to have per-agent configuration files instead of using systemd (and environment variables). For that, tweak the systemd unit file below to pass the BUILDKITE_AGENT_CONFIG environment variable or the --config parameter to buildkite-agent start.
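A minimal sketch of that alternative, assuming one configuration file per agent. The function name render_agent_config, the example host name and the agent-N.cfg file names are hypothetical; the paths mirror the systemd unit template used in the Steps section.

```shell
#!/bin/sh
# Print a per-agent configuration file for instance $1.
# $2 is the host name (defaults to the output of `hostname`).
# Assumption: the token placeholder and directory layout follow the
# systemd-based setup described in this post; adjust them to your hosts.
render_agent_config() {
    instance="$1"
    host="${2:-$(hostname)}"
    cat <<EOF
token="xxx"
name="${host}-${instance}"
build-path="/var/lib/buildkite-agent-${instance}/builds"
hooks-path="/etc/buildkite-agent/hooks"
plugins-path="/etc/buildkite-agent/plugins"
EOF
}

# Show what the file for agent 1 on a host called "example-host" would contain.
render_agent_config 1 example-host
```

Each file could then be written out (for example to /etc/buildkite-agent/agent-1.cfg) and passed to the agent with buildkite-agent start --config, or through BUILDKITE_AGENT_CONFIG in the unit file.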

Steps

First, install the agent following the official installation steps.

Edit the /etc/buildkite-agent/buildkite-agent.cfg file so it only has the following lines:
```
token="xxx"
tags="tag1=value1,tag2=value2"     # optional
```
In other words, ensure that per-agent settings like name, build-path, hooks-path and plugins-path are not defined in the configuration file.

Next, edit the /usr/lib/systemd/system/buildkite-agent@.service file and replace it with the following:
```
[Unit]
Description=Buildkite Agent (%i)
Documentation=https://buildkite.com/agent
After=syslog.target
After=network.target

[Service]
Type=simple
User=buildkite-agent
PermissionsStartOnly=true
Environment=HOME=/var/lib/buildkite-agent-%i
Environment=BUILDKITE_AGENT_NAME=%H-%i
Environment=BUILDKITE_BUILD_PATH=/var/lib/buildkite-agent-%i/builds
Environment=BUILDKITE_HOOKS_PATH=/etc/buildkite-agent/hooks
Environment=BUILDKITE_PLUGINS_PATH=/etc/buildkite-agent/plugins
ExecStartPre=/bin/mkdir -p /var/lib/buildkite-agent-%i/builds
ExecStartPre=/bin/chown -R buildkite-agent /var/lib/buildkite-agent-%i
ExecStart=/usr/bin/buildkite-agent start
RestartSec=5
Restart=on-failure
TimeoutSec=10

[Install]
WantedBy=multi-user.target
DefaultInstance=1
```
Each agent will get a directory under /var/lib/buildkite-agent-%i where %i is the instance argument.

Reload the systemd configuration with systemctl daemon-reload.

Now you are ready to start as many agents as needed:
```
# systemctl enable buildkite-agent@1.service
# systemctl start buildkite-agent@1.service
```
Repeat these steps, increasing the instance number each time, and you should see your agents appear in the Buildkite dashboard.
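Repeating the two commands by hand gets tedious beyond a couple of agents. The sketch below prints them for instances 1 to N; the function name agent_commands and the count of 4 are illustrative assumptions. It is a dry run on purpose, so nothing is changed until the output is reviewed.

```shell
#!/bin/sh
# Print the systemctl commands that enable and start agents 1..N.
# Dry run by design: nothing is executed, the commands are only echoed.
agent_commands() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "systemctl enable buildkite-agent@${i}.service"
        echo "systemctl start buildkite-agent@${i}.service"
        i=$((i + 1))
    done
}

# Example: commands for four agents on this host.
agent_commands 4
```

To apply the commands, pipe the output to sh as root once it looks right.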

In case you want to change the agent name to something other than %hostname-%instance, ensure it’s set to something globally unique. For a list of systemd specifiers, see the official documentation (table 4).

Caveats

It might seem unintuitive at first but Buildkite can schedule jobs from a single pipeline to different agents.

If all your pipeline steps need to run on the same host, you have to ensure that each step is properly targeted at that host using a combination of meta-data selectors (see the agents attribute of the command step).
```
- name: build
  command: echo 'hello world'
  agents:
    name: hostname
    queue: default
```
There is a feature request to have per-pipeline agent rules so the agent doesn’t need to be specified per step (buildkite/feedback/issues/173), which should simplify this.

Source: https://ret.cx/2018/04/running-multiple-buildkite-agents-per-host/

Giovanni Tirloni commented 2018-04-13T09:05:51.613-0400

Avtar Gill, does this approach sound okay? I'm just worried about what I mentioned in the Caveats section, since it will make the pipeline.yml file look a bit ugly. Some people have tried to simplify it by turning the YAML into a shell script that reads the agent's name and adds it to every step (so the upload step becomes "pipeline.sh | buildkite-agent pipeline upload"). If issue #173 were implemented, that would be awesome, but I'm not sure how long we'd have to wait for that.
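A minimal sketch of that shell-script approach, not taken from any existing Fluid pipeline. It assumes the job environment exposes BUILDKITE_AGENT_NAME and that the agent also advertises a matching name tag; the function name emit_pipeline, the second step and the placeholder agent name are hypothetical.

```shell
#!/bin/sh
# pipeline.sh (hypothetical): emit pipeline YAML with every step pinned to
# the agent that is running this script.
# Assumption: the agent advertises a "name" tag equal to its agent name,
# otherwise the selector below will not match any agent.
emit_pipeline() {
    agent_name="${BUILDKITE_AGENT_NAME:?BUILDKITE_AGENT_NAME is not set}"
    cat <<EOF
steps:
  - name: build
    command: echo 'hello world'
    agents:
      name: ${agent_name}
      queue: default
  - name: test
    command: echo 'still on the same agent'
    agents:
      name: ${agent_name}
      queue: default
EOF
}

# Use a placeholder name so the sketch also runs outside a Buildkite job.
: "${BUILDKITE_AGENT_NAME:=example-host-1}"
emit_pipeline
```

In a real job the output would be piped into the upload command, as in ./pipeline.sh | buildkite-agent pipeline upload.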

I'm trying to think of a situation where running steps on different hosts would be beneficial; it's so different from the way we have been working for years now.

Giovanni Tirloni commented 2018-04-16T20:54:16.520-0400

Made some changes to the systemd unit file so that each agent runs as a separate user:

```
[Unit]
Description=Buildkite Agent (%i)
Documentation=https://buildkite.com/agent
After=syslog.target
After=network.target

[Service]
Type=simple
User=buildkite-agent-%i
PermissionsStartOnly=true
Environment=HOME=/home/buildkite-agent-%i
Environment=BUILDKITE_AGENT_NAME=%H-%i
Environment=BUILDKITE_BUILD_PATH=/home/buildkite-agent-%i/builds
Environment=BUILDKITE_HOOKS_PATH=/etc/buildkite-agent/hooks
Environment=BUILDKITE_PLUGINS_PATH=/etc/buildkite-agent/plugins
ExecStartPre=/bin/mkdir -p /home/buildkite-agent-%i/builds
ExecStartPre=/bin/chown -R buildkite-agent-%i /home/buildkite-agent-%i
ExecStart=/usr/bin/buildkite-agent start
RestartSec=5
Restart=on-failure
TimeoutSec=10

[Install]
WantedBy=multi-user.target
DefaultInstance=1
```

```
useradd -g buildkite-agent buildkite-agent-1
```
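Since every instance now needs its own account, the useradd command has to be repeated per agent. A small sketch that prints one command per instance, assuming the buildkite-agent group already exists on the host; the function name user_commands and the count of 4 are illustrative.

```shell
#!/bin/sh
# Print one useradd command per agent instance (dry run, nothing is created).
# Assumption: the buildkite-agent group exists before these are executed.
user_commands() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "useradd -g buildkite-agent buildkite-agent-${i}"
        i=$((i + 1))
    done
}

# Example: accounts for four agent instances.
user_commands 4
```

The instance number in each user name has to match the systemd instance, because the unit runs as buildkite-agent-%i.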

Justin Obara commented 2018-05-22T10:24:53.090-0400

Merged PR (https://github.com/fluid-project/infusion/pull/901) into the project repo at 8b7038bd1c50df813748fe12cfb186b9ae8546ab.

Giovanni Tirloni commented 2018-05-22T12:21:17.866-0400

Thanks Justin Obara! Now we can spin up new agents for other purposes and they won't interfere.

Giovanni Tirloni commented 2018-05-24T16:27:47.060-0400

Until the following features are implemented, it's necessary to define a 'name' tag in the agent configuration (through an environment variable, so that systemd can set it automatically per agent), which gives us the information needed to target a specific agent:

Submitted a new PR with the necessary changes (https://github.com/fluid-project/infusion/pull/905)

Deployed 4 agents on h-0005.

This is the latest systemd unit file:

```
# cat /usr/lib/systemd/system/buildkite@.service
[Unit]
Description=Buildkite Agent (%i)
Documentation=https://buildkite.com/agent
After=syslog.target
After=network.target

[Service]
Type=simple
User=buildkite-agent-%i
PermissionsStartOnly=true
Environment=HOME=/home/buildkite-agent-%i
Environment=BUILDKITE_AGENT_NAME=%H-%i
Environment=BUILDKITE_AGENT_TAGS=name=%H-%i,env=dev,type=physical,hypervisor=virtualbox,docker=true,vagrant=true
Environment=BUILDKITE_BUILD_PATH=/home/buildkite-agent-%i/builds
Environment=BUILDKITE_HOOKS_PATH=/etc/buildkite-agent/hooks
Environment=BUILDKITE_PLUGINS_PATH=/etc/buildkite-agent/plugins
ExecStartPre=/bin/mkdir -p /home/buildkite-agent-%i/builds
ExecStartPre=/bin/chown -R buildkite-agent-%i /home/buildkite-agent-%i/builds
ExecStart=/usr/bin/buildkite-agent start
RestartSec=5
Restart=on-failure
TimeoutSec=10

[Install]
WantedBy=multi-user.target
DefaultInstance=1
```

Giovanni Tirloni commented 2018-08-29T13:24:13.661-0400

It's unlikely we will continue to use BuildKite at this point (and Jenkins has been configured for some Fluid projects already).

With that in mind, I've rolled back these changes and uninstalled buildkite-agent from h-0005.
