# Automated OS image import, poll and update

CDI supports automating OS image import, poll and update, keeping OS images up-to-date according to the given `schedule`. On the first time a `DataImportCron` is scheduled, the controller will import the source image. On any following scheduled poll, if the source image digest (sha256) has updated, the controller will import it to a new [*source*](#dataimportcron-source-formats) in the `DataImportCron` namespace, and update the managed `DataSource` to point to the newly created source. A garbage collector (`garbageCollect: Outdated` enabled by default) is responsible to keep the last `importsToKeep` (3 by default) imported sources per `DataImportCron`, and delete older ones.

See design doc [here](https://github.com/kubevirt/community/blob/main/design-proposals/golden-image-delivery-and-update-pipeline.md)

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataImportCron
metadata:
  name: fedora-image-import-cron
  namespace: golden-images
spec:
  template:
    spec:
      source:
        registry:
          url: "docker://quay.io/kubevirt/fedora-cloud-registry-disk-demo:latest"
          pullMethod: node
          certConfigMap: some-certs
      storage:
        resources:
          requests:
            storage: 5Gi
        storageClassName: hostpath-provisioner
  schedule: "30 1 * * 1"
  garbageCollect: Outdated
  importsToKeep: 2
  managedDataSource: fedora
```

A `DataVolume` can use a `sourceRef` referring to a `DataSource`, instead of the `source`, so whenever created it will use the latest imported source similarly to specifying `dv.spec.source`. 

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: fedora-ref
  namespace: golden-images
spec:
  sourceRef:
      kind: DataSource
      name: fedora
  storage:
    resources:
      requests:
        storage: 5Gi
    storageClassName: hostpath-provisioner
```
## OpenShift ImageStreams

Using `pullMethod: node` we also support import from OpenShift `imageStream` instead of `url`:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataImportCron
metadata:
  name: rhel8-image-import-cron
  namespace: openshift-virtualization-os-images
spec:
  template:
    spec:
      source:
        registry:
          imageStream: rhel8-is
          pullMethod: node
      storage:
        resources:
          requests:
            storage: 5Gi
        storageClassName: hostpath-provisioner
  schedule: "0 0 * * 5"
  importsToKeep: 4
  managedDataSource: rhel8
```

Currently we assume the `ImageStream` is in the same namespace as the `DataImportCron`.

To create an `ImageStream` one can use for example:
* oc import-image rhel8-is -n openshift-virtualization-os-images --from=registry.redhat.io/rhel8/rhel-guest-image --scheduled --confirm
* oc set image-lookup rhel8-is -n openshift-virtualization-os-images 

Or on CRC:
* oc import-image cirros-is -n openshift-virtualization-os-images --from=kubevirt/cirros-container-disk-demo --scheduled --confirm
* oc set image-lookup cirros-is -n openshift-virtualization-os-images

More information on image streams is available [here](https://docs.openshift.com/container-platform/4.13/openshift_images/image-streams-manage.html) and [here](https://www.tutorialworks.com/openshift-imagestreams).

## PVC Source

A `PVC` from any namespace can also be the source for a `DataImportCron`. The source digest is based on the `PVC` `UID`, which is polled according to the schedule, so when a new `PVC` is detected it will be imported.

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataImportCron
metadata:
  name: pvc-import-cron
  namespace: ns1
spec:
  template:
    spec:
      source:
        pvc:
          name: my-pvc
          namespace: ns2
...
```

## DataImportCron source formats

* PersistentVolumeClaim
* VolumeSnapshot

DataImportCron was originally designed to only maintain PVC sources,  
However, for certain storage types, we know that snapshots sources scale better.  
Some details and examples can be found in [clone-from-volumesnapshot-source](./clone-from-volumesnapshot-source.md).

We keep this provisioner-specific information on the [StorageProfile](./storageprofile.md) object for each provisioner at the `dataImportCronSourceFormat` field (possible values are `snapshot`/`pvc`), which tells the DataImportCron which type of source is preferred for the provisioner.  

Some provisioners like ceph rbd are opted in automatically.  
To opt-in manually, one must edit the `StorageProfile`:
```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  ...
spec:
  dataImportCronSourceFormat: snapshot
```

To ensure smooth transition, existing DataImportCrons can be switchd to maintaining snapshots instead of PVCs by updating their corresponding storage profiles.

## DataImportCron storage class
Unless specified explicitly, similarly to PVCs, DataImportCrons will be provisioned using the default [virt](./datavolumes.md#default-virtualization-storage-class)/k8s storage class.  
In previous versions, an admin would have to actively delete the old sources upon change of the storage class  
(either explicitly by editing the DataImportCron or a cluster-wide change of the default storage class)  

Today, the controller performs this automatically;  
However, changing the storage class should be a conscious decision and in some cases (complex CI setups) it's advised to specify it explicitly
to avoid exercising a different storage class for golden images throughout installation.  
This flip flop could be costly and in some cases outright surprising to cluster admins.