Data Control
I am sure it is not news to anyone at this point that the major search engine providers (namely Google Search, Microsoft Bing, etc.) hoover up all the data they can get their hands on from their users. The data race has only accelerated with the hunt for training data for large machine learning models.
To regain control of your highly sensitive search data, the answer is to self-host your own search engine: namely, SearxNG.
Kubernetes
There already exists a Helm chart for SearxNG; however, I personally feel it offers less control than I want (for example over the securityContext).
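For completeness, the chart route is only a couple of commands. A minimal sketch is shown below; the repository URL and chart name are placeholders you should verify against the SearxNG documentation before use:

helm repo add searxng https://charts.searxng.org   # placeholder repository URL, verify before use
helm repo update
helm install searxng searxng/searxng --namespace default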
Below are the deployment, service and ingress that I personally use day to day.
The ingress uses the NGINX ingress class, and I use cert-manager with a Let's Encrypt ClusterIssuer for certificate issuance. Everything lives in the default namespace for simplicity.
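The manifests below assume a ClusterIssuer named letsencrypt-prod already exists in the cluster. A minimal sketch of such an issuer, where the contact email and account key secret name are placeholders, might look like:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production ACME endpoint
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com          # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx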
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: searxng-service
  namespace: default
spec:
  internalTrafficPolicy: Cluster
  ports:
    - port: 8080
  selector:
    app: searxng
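Once the deployment further down is applied, the service can be sanity-checked before the ingress and certificate are in place with a local port-forward (assuming kubectl access to the cluster):

kubectl port-forward -n default svc/searxng-service 8080:8080
# then browse to http://localhost:8080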
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  name: searxng-ingress
  namespace: default
spec:
  ingressClassName: nginx
  rules:
    - host: search.example.com
      http:
        paths:
          - backend:
              service:
                name: searxng-service
                port:
                  number: 8080
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - search.example.com
      secretName: searxng-tls
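With the cert-manager annotation above, the ingress-shim controller should create a Certificate resource for the searxng-tls secret. Issuance can be followed with (assuming the default namespace):

kubectl get certificate -n default
kubectl describe certificate searxng-tls -n default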
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: searxng
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: searxng
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: searxng
    spec:
      automountServiceAccountToken: false
      securityContext:
        fsGroup: 977
      containers:
        - env:
            - name: BASE_URL
              value: https://search.example.com
          name: searxng
          image: searxng/searxng:latest
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 1000m
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 128Mi
          startupProbe:
            failureThreshold: 60
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            periodSeconds: 10
            timeoutSeconds: 30
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 15
            periodSeconds: 20
            successThreshold: 1
          ports:
            - containerPort: 8080
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
          securityContext:
            runAsNonRoot: false
            privileged: false
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            procMount: DefaultProcMount
            seccompProfile:
              type: RuntimeDefault
            capabilities:
              drop:
                - ALL
              add:
                - SETGID
                - SETUID
          terminationMessagePath: /dev/termination-log
          volumeMounts:
            - mountPath: /etc/searxng
              name: searxng
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      volumes:
        - name: searxng
          emptyDir: {}
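With the three manifests saved under the filenames above, rolling everything out is a standard apply and watch:

kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml
kubectl rollout status deployment/searxng -n default
kubectl get ingress searxng-ingress -n default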