Skip to content

CLVM enhancements and fixes#12617

Draft
Pearl1594 wants to merge 87 commits into
mainfrom
clvm-enhancements
Draft

CLVM enhancements and fixes#12617
Pearl1594 wants to merge 87 commits into
mainfrom
clvm-enhancements

Conversation

@Pearl1594

@Pearl1594 Pearl1594 commented Feb 9, 2026

Copy link
Copy Markdown
Contributor

Description

This PR enhances the existing CLVM implementation which was based on the deprecated CLVM technology which was based on corosync/pacemaker. With RHEL 7 having reached EOL, CLVM seems to be broken. CLVM supports RAW volumes on LVM , where as CLVM_NG support QCOW2 on LVM.

Further details: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Modernized+CLVM%3A+Enhancements+and+CLVM_NG+support

NOTE: On testing - it was identified that incremental snapshots for clvm-ng do not work as expected. As of now it's been removed from scope. So, CLVM and CLVM_NG would only support full snapshots.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@codecov

codecov Bot commented Feb 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.68%. Comparing base (d3e1976) to head (df61d6f).
⚠️ Report is 15 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (d3e1976) and HEAD (df61d6f). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (d3e1976) HEAD (df61d6f)
unittests 1 0
Additional details and impacted files
@@              Coverage Diff              @@
##               main   #12617       +/-   ##
=============================================
- Coverage     17.90%    3.68%   -14.23%     
=============================================
  Files          5938      454     -5484     
  Lines        532864    38798   -494066     
  Branches      65192     7151    -58041     
=============================================
- Hits          95392     1428    -93964     
+ Misses       426793    37183   -389610     
+ Partials      10679      187    -10492     
Flag Coverage Δ
uitests 3.67% <ø> (+<0.01%) ⬆️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16801

@harikrishna-patnala harikrishna-patnala added this to the 4.23.0 milestone Feb 17, 2026
@codecov

codecov Bot commented Feb 17, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.53%. Comparing base (6bc83a3) to head (729f79a).

❗ There is a different number of reports uploaded between BASE (6bc83a3) and HEAD (729f79a). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (6bc83a3) HEAD (729f79a)
unittests 1 0
Additional details and impacted files
@@              Coverage Diff              @@
##               main   #12617       +/-   ##
=============================================
- Coverage     18.75%    3.53%   -15.23%     
=============================================
  Files          6160      464     -5696     
  Lines        552578    40203   -512375     
  Branches      67348     7563    -59785     
=============================================
- Hits         103660     1421   -102239     
+ Misses       437512    38592   -398920     
+ Partials      11406      190    -11216     
Flag Coverage Δ
uitests 3.53% <ø> (-0.01%) ⬇️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16875

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16877

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18195

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan test

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18198

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan test

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@sonarqubecloud

sonarqubecloud Bot commented Jun 8, 2026

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot
34.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube Cloud

@JoaoJandre JoaoJandre left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only able to review half the files. I will do the rest later.
Also, could you point to a documentation somewhere on how to setup a CLVM_NG storage to add it to ACS? I was not able to find it online

Comment thread engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java Outdated
Comment thread engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java Outdated
Comment thread engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java Outdated
}
}

updateClvmLockHostForVmVolumes(vm.getId(), dstHostId);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the result is false, or null, we still update the lock? How do we know if this update will be correct?
If the host does not execute the command (the command queue is full and the command times out, or the host is restarted) we will update the lock metadata to point to the wrong host, correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it does, in the next re-try it will get corrected.

Comment thread engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java Outdated
Comment on lines +151 to +156
return new MigrateCommand.MigrateDiskInfo(srcVolumeInfo.getPath(),
MigrateCommand.MigrateDiskInfo.DiskType.FILE,
MigrateCommand.MigrateDiskInfo.DriverType.QCOW2,
MigrateCommand.MigrateDiskInfo.Source.FILE,
destPath,
backingPath);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a style change only? Is it needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was just cosmetic for better readability.

Comment on lines +2268 to +2269
String details = preMigAnswer != null ? preMigAnswer.getDetails() : "null answer returned";
logger.warn("PreMigrationCommand failed for CLVM/CLVM_NG VM [{}] on source host [{}]: {}. Migration will continue but may fail if volumes are exclusively locked.", vmTO.getName(), srcHost.getId(), details);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we abort the migration in this case?
If we continue and it does not work, it could lead to problems in the VM, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - if it fails, it will behave like any VM migration failure. The source VM / volume goes back to original state.

@Pearl1594

Copy link
Copy Markdown
Contributor Author

I was only able to review half the files. I will do the rest later. Also, could you point to a documentation somewhere on how to setup a CLVM_NG storage to add it to ACS? I was not able to find it online

It can be found at: apache/cloudstack-documentation#637

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment on lines +2749 to 2785
// Check if CLVM lock transfer is needed (even if moveVolumeNeeded is false)
// This handles the case where the volume is already on the correct storage pool
// but the VM is running on a different host, requiring only a lock transfer
boolean isClvmLockTransferNeeded = !moveVolumeNeeded &&
isClvmLockTransferRequired(newVolumeOnPrimaryStorage, existingVolumeOfVm, vm);

if (isClvmLockTransferNeeded) {
// CLVM lock transfer - no data copy, no pool change needed
newVolumeOnPrimaryStorage = executeLightweightLockMigration(
newVolumeOnPrimaryStorage, vm, existingVolumeOfVm,
"CLVM lock transfer", "same pool to different host");
} else if (moveVolumeNeeded) {
PrimaryDataStoreInfo primaryStore = (PrimaryDataStoreInfo)newVolumeOnPrimaryStorage.getDataStore();
if (primaryStore.isLocal()) {
throw new CloudRuntimeException(
"Failed to attach local data volume " + volumeToAttach.getName() + " to VM " + vm.getDisplayName() + " as migration of local data volume is not allowed");
}
StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId());

try {
HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId());
newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(),
volumeToAttachHyperType);
} catch (ConcurrentOperationException | StorageUnavailableException e) {
logger.debug("move volume failed", e);
throw new CloudRuntimeException("move volume failed", e);
boolean isClvmLightweightMigration = isClvmLightweightMigrationNeeded(
newVolumeOnPrimaryStorage, existingVolumeOfVm, vm);

if (isClvmLightweightMigration) {
newVolumeOnPrimaryStorage = executeLightweightLockMigration(
newVolumeOnPrimaryStorage, vm, existingVolumeOfVm,
"CLVM lightweight migration", "different pools, same VG");
} else {
StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId());

try {
HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId());
newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(),
volumeToAttachHyperType);
} catch (ConcurrentOperationException | StorageUnavailableException e) {
logger.debug("move volume failed", e);
throw new CloudRuntimeException("move volume failed", e);
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this volume move section be extracted to a separate method to reduce indentation?

@Pearl1594 Pearl1594 Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok if I addressed this in another PR - with other enhancements that are yet to come.

Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18208

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants