How to save data in Kubernetes? I have tried a Persistent Volume but it doesn't solve the problem

There is a folder named "data-persistent" in the running container that the code reads from and writes to, and I want to save the changes made in that folder. When I use a persistent volume, it removes/hides the existing data in that folder and the code throws an error. So what should my approach be?

FROM python:latest
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
#RUN mkdir data-persistent
COPY linkedin_scrape.py .
COPY requirements.txt ./requirements.txt
COPY final_links.csv ./final_links.csv
COPY credentials.txt ./credentials.txt
COPY vectorizer.pk ./vectorizer.pk
COPY model_IvE ./model_IvE
COPY model_JvP ./model_JvP
COPY model_NvS ./model_NvS
COPY model_TvF ./model_TvF
COPY nocopy.xlsx ./nocopy.xlsx
COPY data.db /data-persistent/
COPY textdata.txt /data-persistent/
RUN ls -la /data-persistent/*
RUN pip install -r requirements.txt
CMD python linkedin_scrape.py --bind 0.0.0.0:8080 --timeout 90

And my deployment file

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-first-cluster1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: scrape
  template:
    metadata:
      labels:
        app: scrape
    spec:
      containers:
      - name: scraper
        image: image-name
        ports:
        - containerPort: 8080
        env:
        - name: PORT
          value: "8080"

        volumeMounts:
        - mountPath: "/dev/shm"
          name: dshm
        - mountPath: "/data-persistent/"
          name: tester
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      - name: tester
        persistentVolumeClaim:
          claimName: my-pvc-claim-1
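
For reference, a PersistentVolumeClaim matching the claimName above could look like the sketch below; the 1Gi size is an assumption, and with replicas: 2 the storage class may need to support ReadWriteMany so both pods can mount the volume at once:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc-claim-1
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi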

Let me explain the workflow of the code. The code reads from the textdata.txt file, which contains the indices of the links to be scraped (e.g. from 100 to 150). It then scrapes those profiles, inserts them into the data.db file, and writes the range for the next run (e.g. 150 to 200) back to textdata.txt.

asked Oct 15 '25 by Sardar Arslan

1 Answer

First, a Kubernetes volume mount shadows whatever the image originally had at the mount point, so mounting at /data-persistent/ hides the files your Dockerfile copied there.

There are several ways to handle this.

Solution 1

  • edit your Dockerfile to copy the local data to /tmp-data-persistent instead of /data-persistent
  • then add an init container that copies the content of /tmp-data-persistent into /data-persistent; since /data-persistent is the volume mount, this seeds the volume and makes the data persistent (see the sketch below)
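
A minimal sketch of how that init container could look in the Deployment above. The seed-data name and the seed-on-first-run command are assumptions, not tested against your image:

      initContainers:
      - name: seed-data
        image: image-name    # same image, with the seed files baked into /tmp-data-persistent
        # seed the volume only if it is still empty, so restarts
        # don't overwrite the progress the scraper has already written
        command: ["sh", "-c", "[ -f /data-persistent/data.db ] || cp -r /tmp-data-persistent/. /data-persistent/"]
        volumeMounts:
        - mountPath: "/data-persistent/"
          name: tester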

Solution 2

  • copying data into Docker images is not good practice: it inflates the image size and couples the data and code change pipelines

  • it is better to keep the data in shared storage such as S3, and let the init container compare and sync the data (a sketch follows)
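
A hedged example of such an init container, assuming the public amazon/aws-cli image, an s3://my-scraper-data bucket, and AWS credentials supplied separately; the bucket name is illustrative:

      initContainers:
      - name: sync-data
        image: amazon/aws-cli:latest
        # pull the current data set from the bucket onto the volume;
        # `aws s3 sync` transfers only the files that differ
        command: ["aws", "s3", "sync", "s3://my-scraper-data/", "/data-persistent/"]
        volumeMounts:
        - mountPath: "/data-persistent/"
          name: tester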

If a cloud service like S3 is not available:

  • you can use a persistent volume type that supports multiple read/write mounts (access mode ReadWriteMany)

  • attach the same volume to another deployment (using a busybox image, for example) and do the copy with "kubectl cp", as in the helper deployment below

  • scale the temporary deployment to zero after finalizing the copy; you can also make this part of a CI pipeline
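
A minimal helper deployment for that approach could look like this (the data-loader names are assumptions; the PVC needs a ReadWriteMany-capable storage class if the scraper pods stay mounted at the same time):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-loader
spec:
  replicas: 1
  selector:
    matchLabels:
      app: data-loader
  template:
    metadata:
      labels:
        app: data-loader
    spec:
      containers:
      - name: loader
        image: busybox
        # keep the pod alive so kubectl cp has a running target
        command: ["sh", "-c", "while true; do sleep 3600; done"]
        volumeMounts:
        - mountPath: "/data-persistent/"
          name: tester
      volumes:
      - name: tester
        persistentVolumeClaim:
          claimName: my-pvc-claim-1

Then copy the files in and scale the loader back down (replace the pod name with the one kubectl gives you):

kubectl cp data.db <data-loader-pod>:/data-persistent/data.db
kubectl cp textdata.txt <data-loader-pod>:/data-persistent/textdata.txt
kubectl scale deployment data-loader --replicas=0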

answered Oct 16 '25 by Tamer Elfeky