NAIS deploy #118

Open
wants to merge 25 commits into
base: master
25 commits
24ba37e
NAIS deploy
mallport Nov 29, 2024
e535d40
Merge branch 'master' into nais-deploy
mallport Nov 29, 2024
92a13a3
Merge branch 'master' into nais-deploy
mallport Nov 29, 2024
6213fa5
Merge branch 'master' into nais-deploy
mallport Nov 29, 2024
c3eb29b
Temporarily deploy on PR commit
mallport Dec 4, 2024
86887cf
Add NAIS Keycloak as trusted issuer
mallport Dec 4, 2024
5b40d46
Fix PR deploy branch
mallport Dec 4, 2024
3d5d3bc
Merge branch 'master' into nais-deploy
mallport Dec 4, 2024
933325b
Forgor to save
mallport Dec 4, 2024
7aeae39
Fix application config. Add variable for templating
mallport Dec 4, 2024
ccc50e5
Add team as templated variable
mallport Dec 4, 2024
0092d99
yaml -> yml
mallport Dec 4, 2024
e806f4d
Add Keycloak for egress
mallport Dec 4, 2024
60eae1b
Add Keycloak BIP for egres
mallport Dec 4, 2024
74c3ab7
add prod release
mallport Dec 10, 2024
36fcc0c
Use pseudo users
mallport Dec 10, 2024
20fc430
Use pseudo admins
mallport Dec 10, 2024
aee750a
Lower resources in test
mallport Dec 10, 2024
fe8395e
Add internal ingress for prod. Add external egress for test
mallport Dec 10, 2024
0cd43b3
Remove test subdomain from ingress URL
mallport Dec 10, 2024
f641eeb
add alerts for pseudo-service (#119)
ssb-jnk Jan 10, 2025
94a5601
alert-deploy.yml (#120)
ssb-jnk Jan 10, 2025
772643d
change high memory usage to fetch memory dynamically
ssb-jnk Jan 10, 2025
3ae7013
edit high memory alert
ssb-jnk Jan 10, 2025
3ee443c
revert to putting max memory manually
ssb-jnk Jan 10, 2025
40 changes: 40 additions & 0 deletions .github/workflows/alert-deploy.yml
@@ -0,0 +1,40 @@
name: Deploy alerts
run-name: Deploy alerts for pseudo-service to dev and prod

on:
  push:
    branches:
      - master
      - nais-deploy
    paths:
      - '.nais/alerts.yaml'
      - '.github/workflows/alert-deploy.yml'
permissions:
  id-token: write

jobs:
  test-deploy:
    name: Deploy alerts to test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Deploy to test
        uses: nais/deploy/actions/deploy@v2
        env:
          CLUSTER: test
          RESOURCE: .nais/alerts.yaml
          DEPLOY_SERVER: deploy.ssb.cloud.nais.io:443

  prod-deploy:
    name: Deploy alerts to prod
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Deploy to prod
        uses: nais/deploy/actions/deploy@v2
        env:
          CLUSTER: prod
          RESOURCE: .nais/alerts.yaml
          DEPLOY_SERVER: deploy.ssb.cloud.nais.io:443
17 changes: 5 additions & 12 deletions .github/workflows/build-deploy-app.yml
@@ -1,8 +1,7 @@
on:
  release:
    types: [ published ]
  pull_request: ## ONLY FOR TESTING, SHOULD BE REMOVED AFTER DEPLOY PR IS MERGED
    branches: [nais-deploy]
    branches:
      - master
  push:
    branches:
      - master
@@ -82,15 +81,9 @@ jobs:
      - name: Generate image tags
        id: nais-deploy-vars
        run: |
          if [[ ${{github.event_name}} == "release" ]]; then
            echo "nais_tag=${{ steps.version-tag.outputs.version_tag }}" >> "$GITHUB_OUTPUT"
            echo "cluster=prod" >> "$GITHUB_OUTPUT"
            echo "nais_config_path=.nais/prod/nais.yaml" >> "$GITHUB_OUTPUT"
          else
            echo "nais_tag=${{ steps.docker-push.outputs.tag }}" >> "$GITHUB_OUTPUT"
            echo "cluster=test" >> "$GITHUB_OUTPUT"
            echo "nais_config_path=.nais/test/nais.yaml" >> "$GITHUB_OUTPUT"
          fi
          echo "nais_tag=${{ steps.docker-push.outputs.tag }}" >> "$GITHUB_OUTPUT"
          echo "cluster=prod" >> "$GITHUB_OUTPUT"
          echo "nais_config_path=.nais/prod/nais.yaml" >> "$GITHUB_OUTPUT"

  deploy:
    name: Deploy to NAIS
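
The "Generate image tags" hunk above contains two forms of the script: one that branches on the event name and one that writes fixed values. This page does not mark which lines were added or removed, so the following is only a minimal sketch of the branching form, written to run outside GitHub Actions. The function name `deploy_vars` and its three arguments are hypothetical stand-ins for `github.event_name` and the two step outputs; they are not part of the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the three deploy variables for a given event name.
# Arguments (hypothetical stand-ins for the workflow expressions):
#   $1 event name, $2 release version tag, $3 docker image tag
deploy_vars() {
  local event_name="$1" version_tag="$2" docker_tag="$3"
  # Quoting the left-hand side keeps the test valid for an empty value.
  if [[ "$event_name" == "release" ]]; then
    echo "nais_tag=${version_tag}"
    echo "cluster=prod"
    echo "nais_config_path=.nais/prod/nais.yaml"
  else
    echo "nais_tag=${docker_tag}"
    echo "cluster=test"
    echo "nais_config_path=.nais/test/nais.yaml"
  fi
}

# In a workflow step the output would be appended to "$GITHUB_OUTPUT";
# here it goes to stdout so the sketch runs anywhere.
deploy_vars "release" "v1.2.3" "master-abc1234"
deploy_vars "push" "v1.2.3" "master-abc1234"
```

Inside a workflow step, the call would presumably look like `deploy_vars ... >> "$GITHUB_OUTPUT"`, which yields the same `key=value` lines the original script writes one by one.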
2 changes: 1 addition & 1 deletion .github/workflows/deploy-to-nais.yml
@@ -79,6 +79,6 @@ jobs:
        env:
          CLUSTER: ${{ inputs.cluster }}
          RESOURCE: ${{ inputs.nais-config-path }}
          VAR: image=${{ inputs.registry }}/${{ secrets.NAIS_MANAGEMENT_PROJECT_ID }}/${{ inputs.repository }}/${{ inputs.image-name }}:${{ inputs.image-tag }}
          VAR: image=${{ inputs.registry }}/${{ secrets.NAIS_MANAGEMENT_PROJECT_ID }}/${{ inputs.repository }}/${{ inputs.image-name }}:${{ inputs.image-tag }},team=dapla-stat
          DEPLOY_SERVER: deploy.ssb.cloud.nais.io:443
          REF: ${{ inputs.ref }}
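
The second `VAR` line appends `team=dapla-stat` to the comma-separated `key=value` list, which is what fills the `{{team}}` placeholders in `.nais/prod/nais.yaml`. The actual rendering is done by the deploy action. The sketch below only approximates the substitution with `sed` to show the data flow. The function `render_template` and the example image path are hypothetical, and the sketch assumes values contain no commas, `|`, or `&`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Replace {{key}} and {{ key }} placeholders using a comma-separated
# list of key=value pairs. Illustration only, not the deploy action's code.
render_template() {
  local vars="$1" text="$2" pair key value
  local -a pairs
  IFS=',' read -ra pairs <<< "$vars"
  for pair in "${pairs[@]}"; do
    key="${pair%%=*}"    # text before the first '='
    value="${pair#*=}"   # text after the first '='
    text="$(printf '%s\n' "$text" | sed -E "s|\{\{ *${key} *\}\}|${value}|g")"
  done
  printf '%s\n' "$text"
}

# Hypothetical template fragment and image path, for illustration.
template='namespace: {{team}}
image: "{{ image }}"'

render_template "image=registry.example/project/repo/pseudo-service:1.0.0,team=dapla-stat" "$template"
```

Without the `team` pair in `VAR`, the `{{team}}` placeholders in this sketch would be left untouched, which is the gap the added variable closes.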
75 changes: 75 additions & 0 deletions .nais/alerts.yaml
@@ -0,0 +1,75 @@
apiVersion: "monitoring.coreos.com/v1"
kind: PrometheusRule
metadata:
  name: alert-pseudo-service
  namespace: dapla-stat
  labels:
    team: dapla-stat
spec:
  groups:
    - name: dapla-stat
      rules:
        # This alert checks if no replicas of pseudo-service are available, indicating the service is unavailable.
        - alert: PseudoServiceUnavailable
          expr: kube_deployment_status_replicas_available{deployment="pseudo-service"} == 0
          for: 1m
          annotations:
            title: "Pseudo-service is unavailable"
            consequence: "The service is unavailable to users. Immediate investigation required."
            action: "Check the deployment status and logs for issues."
          labels:
            service: pseudo-service
            namespace: dapla-stat
            severity: critical

        # This alert detects high CPU usage by calculating the CPU time used over 5 minutes.
        - alert: HighCPUUsage
          expr: rate(process_cpu_seconds_total{app="pseudo-service"}[5m]) > 0.8
          for: 5m
          annotations:
            title: "High CPU usage detected"
            consequence: "The service might experience performance degradation."
            action: "Investigate the cause of high CPU usage and optimize if necessary."
          labels:
            service: pseudo-service
            namespace: dapla-stat
            severity: warning

        # This alert checks if memory usage exceeds 90% of the 12GB limit, which could cause instability.
        - alert: HighMemoryUsage
          expr: process_resident_memory_bytes{app="pseudo-service"} > (0.9 * 12 * 1024 * 1024 * 1024)
          for: 5m
          annotations:
            title: "High memory usage detected"
            consequence: "The service might experience instability due to high memory usage."
            action: "Check memory utilization and consider increasing resources or optimizing the service."
          labels:
            service: pseudo-service
            namespace: dapla-stat
            severity: warning

        # This alert detects a high number of error logs in pseudo-service.
        - alert: HighNumberOfErrors
          expr: (100 * sum by (app, namespace) (rate(log_messages_errors{app="pseudo-service", level=~"Error"}[3m])) / sum by (app, namespace) (rate(log_messages_total{app="pseudo-service"}[3m]))) > 10
          for: 3m
          annotations:
            title: "High number of errors logged in pseudo-service"
            consequence: "The application is logging a significant number of errors."
            action: "Check the service logs for errors and address the root cause."
          labels:
            service: pseudo-service
            namespace: dapla-stat
            severity: critical

        # This alert monitors the number of pod restarts for pseudo-service and triggers if more than 3 restarts occur within 15 minutes.
        - alert: HighPodRestarts
          expr: increase(kube_pod_container_status_restarts_total{namespace="dapla-stat", app="pseudo-service"}[15m]) > 3
          for: 15m
          annotations:
            title: "High number of pod restarts"
            consequence: "The service may be unstable or misconfigured."
            action: "Investigate the cause of pod restarts and fix configuration or resource issues."
          labels:
            service: pseudo-service
            namespace: dapla-stat
            severity: warning
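
The `HighMemoryUsage` threshold is hard-coded as 90% of 12 GiB, matching the `limits.memory: 12Gi` value in `.nais/prod/nais.yaml`. The last commits in this PR went back to setting the maximum by hand, so the number would presumably need updating whenever the limit changes. A small sketch of the arithmetic follows. The variable names are illustrative only, and integer division rounds the result down by a fraction of a byte.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Memory limit from .nais/prod/nais.yaml, expressed in GiB.
limit_gib=12

# 1 GiB = 1024 * 1024 * 1024 bytes.
limit_bytes=$(( limit_gib * 1024 * 1024 * 1024 ))

# 90% of the limit, using integer arithmetic (rounds down).
threshold_bytes=$(( limit_bytes * 9 / 10 ))

echo "limit:     ${limit_bytes} bytes"
echo "threshold: ${threshold_bytes} bytes"
```

With `limit_gib=12` this prints a limit of 12884901888 bytes and a threshold of 11596411699 bytes, the value the alert expression compares `process_resident_memory_bytes` against.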
213 changes: 213 additions & 0 deletions .nais/prod/nais.yaml
@@ -0,0 +1,213 @@
apiVersion: nais.io/v1alpha1
kind: Application
metadata:
  name: pseudo-service
  namespace: {{team}}
  labels:
    team: {{team}}
spec:
  image: "{{ image }}" # Injected from the GitHub Action
  port: 10210
  replicas:
    max: 5
    min: 1
  resources:
    requests:
      cpu: 200m
      memory: 2Gi
    limits:
      memory: 12Gi

  ingresses:
    - https://pseudo-service.intern.ssb.no

  accessPolicy:
    outbound:
      external:
        - host: "auth.ssb.no"
        - host: "keycloak.prod-bip-app.ssb.no"
        - host: "cloudkms.googleapis.com"
        - host: "secretmanager.googleapis.com"
        - host: "www.googleapis.com"
        - host: "cloudidentity.googleapis.com"

  liveness:
    path: /health/liveness
    port: 10210
  readiness:
    path: /health/readiness
    port: 10210
  startup:
    path: /health/readiness
    port: 10210

  env:
    - name: MICRONAUT_CONFIG_FILES
      value: /conf/bootstrap-prod.yml,/conf/application-prod.yml
    - name: LOGBACK_CONFIGURATION_FILE
      value: /conf/logback-prod.xml

  envFrom:
    - secret: pseudo-key-config
    - secret: pseudo-elevated-users

  filesFrom:
    - configmap: pseudo-application-prod-configmap
      mountPath: /conf

---

apiVersion: v1
kind: ConfigMap
metadata:
  name: pseudo-application-prod-configmap
  namespace: {{team}}
  labels:
    team: {{team}}
data:
  bootstrap-prod.yml: |-
    micronaut:
      application:
        name: pseudo-service
      config-client:
        enabled: true
    gcp:
      project-id: prod-dapla-pseudo-1530

  application-prod.yml: |-
    micronaut:
      application:
        name: pseudo-service
      server:
        port: 10210
        cors.enabled: true
        idle-timeout: 60m
        read-idle-timeout: 60m
        write-idle-timeout: 60m
        thread-selection: AUTO
        max-request-size: 2gb
        multipart:
          max-file-size: 2gb

      netty:
        event-loops:
          other:
            num-threads: 100
            prefer-native-transport: true

      http:
        client:
          event-loop-group: other
          read-timeout: 60s

        services:
          sid-service:
            url: 'http://reg-freg-p-sid-lookup-service.freg.svc.cluster.local'
            path: '/v2'
            read-timeout: 60s
            pool:
              enabled: true
              max-connections: 50
          cloud-identity-service:
            url: 'https://cloudidentity.googleapis.com'
            path: '/v1'
            read-timeout: 60s

      caches:
        secrets:
          expire-after-access: 15m
        cloud-identity-service-cache:
          expire-after-write: 1m

      router:
        static-resources:
          swagger:
            paths: classpath:META-INF/swagger
            mapping: /api-docs/**
          swagger-ui:
            paths: classpath:META-INF/swagger/views/swagger-ui
            mapping: /api-docs/swagger-ui/**
          rapidoc:
            paths: classpath:META-INF/swagger/views/rapidoc
            mapping: /api-docs/rapidoc/**
          redoc:
            paths: classpath:META-INF/swagger/views/redoc
            mapping: /api-docs/redoc/**

      security:
        enabled: true
        intercept-url-map:
          - pattern: /api-docs/**
            httpMethod: GET
            access:
              - isAnonymous()
        token:
          name-key: email
          jwt:
            signatures:
              jwks:
                keycloak-nais:
                  url: 'https://auth.ssb.no/realms/ssb/protocol/openid-connect/certs'
                keycloak-bip:
                  url: 'https://keycloak.prod-bip-app.ssb.no/auth/realms/ssb/protocol/openid-connect/certs'
                google:
                  url: 'https://www.googleapis.com/oauth2/v3/certs'

        basic-auth:
          enabled: false

    endpoints:
      prometheus:
        sensitive: false
      info:
        enabled: true
        sensitive: false

    logger:
      levels:
        io.micronaut.security: INFO
        no.ssb.dlp.pseudo.service: INFO
        io.micronaut.security.token.jwt.validator: DEBUG

    services:
      secrets:
        impl: GCP

    gcp:
      kms:
        key-uris:
          - ${PSEUDO_KEK_URI}

      http:
        client:
          filter:
            project-id: 'prod-dapla-pseudo-1530'
            services:
              cloud-identity-service:
                audience: "https://www.googleapis.com/auth/cloud-identity.groups.readonly"

    pseudo.secrets:
      ssb-common-key-1:
        id: ${SSB-COMMON-KEY-1-KEY-ID}
        type: TINK_WDEK
      ssb-common-key-2:
        id: ${SSB-COMMON-KEY-2-KEY-ID}
        type: TINK_WDEK
      papis-common-key-1:
        id: ${PAPIS-COMMON-KEY-1-KEY-ID}
        type: TINK_WDEK

    export:
      default-target-root: gs://ssb-prod-dapla-pseudo-service-data-export/felles

    sid.mapper.partition.size: 100000

    app-roles:
      # When using isAuthenticated() the JWT token must be signed by this trusted-issuer
      trusted-issuers:
        - https://keycloak.prod-bip-app.ssb.no/auth/realms/ssb
        - https://auth.ssb.no/realms/ssb
      users: ${PSEUDO_USERS}
      # admins: ${PSEUDO_ADMINS}
      users-group: [email protected]
      admins-group: [email protected]