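On the "Health Monitoring" page, you can query the HA events of each server component, modify custom levels, and configure the monitoring scope.
When health monitoring is enabled, the system publishes HA events to vCenter Server, which then proactively takes high-availability protection measures to improve system availability.
Before enabling health monitoring, enable DRS, vSphere HA, Proactive HA, and the provider.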
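The prerequisite check above can be sketched in Python. This is a minimal illustration, not part of the plug-in. It assumes the cluster configuration object exposes `drsConfig`, `dasConfig`, and `infraUpdateHaConfig` attributes in the style of pyVmomi's `ClusterConfigInfoEx`. The function name `missing_prerequisites` is hypothetical, and a stub object stands in for a real cluster.

```python
from types import SimpleNamespace


def missing_prerequisites(config):
    """Return the names of prerequisite features that are not enabled.

    `config` is assumed to look like a pyVmomi ClusterConfigInfoEx object
    (drsConfig, dasConfig, infraUpdateHaConfig). Any sub-object may be None.
    """
    missing = []
    if not getattr(config.drsConfig, "enabled", False):
        missing.append("DRS")
    if not getattr(config.dasConfig, "enabled", False):
        missing.append("vSphere HA")
    proactive = config.infraUpdateHaConfig
    if not getattr(proactive, "enabled", False):
        missing.append("Proactive HA")
    # An empty or absent provider list means no provider is enabled.
    if not getattr(proactive, "providers", None):
        missing.append("Proactive HA provider")
    return missing


# Stub cluster configuration: DRS and vSphere HA are on, Proactive HA is off.
example = SimpleNamespace(
    drsConfig=SimpleNamespace(enabled=True),
    dasConfig=SimpleNamespace(enabled=True),
    infraUpdateHaConfig=SimpleNamespace(enabled=False, providers=[]),
)
print(missing_prerequisites(example))
```

An empty result would mean that all four prerequisites appear to be satisfied.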
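Go to the "Hosts and Clusters" page.
Go to the "Health Monitoring" page.
Health monitoring is enabled by default.
Next to "Component", "Recommended Level", and "Custom Level", click the icon and select the object to view from the drop-down list. The system queries the event ID, event description, recommended level, and custom level of the selected object and displays the results.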
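Parameter | Description | Example |
---|---|---|
Component | Component type. Monitored scope: power, memory, fan, storage, and network | Fan |
Event ID | ID of the HA event | 0x04000005 |
Event Description | Description of the HA event | Fan Redundancy Lost |
Recommended Level | Recommended level of the HA event | Minor |
Custom Level | Event level published to vCenter Server. Note: After a user modifies the event level, the system publishes the modified HA event to vCenter Server. | Critical |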
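The relationship between the recommended level and the custom level can be sketched as a small record type. This is an illustrative model only. The class `HaEvent`, its field names, and the assumption that an unmodified event is published at its recommended level are not taken from the plug-in. The level names are English renderings of the three levels that appear in this document.

```python
from dataclasses import dataclass
from typing import Optional

# The three levels that appear in this document (English renderings).
LEVELS = ("Minor", "Major", "Critical")


@dataclass
class HaEvent:
    component: str
    event_id: str
    description: str
    recommended_level: str
    custom_level: Optional[str] = None  # None means "not customized" (assumption)

    def set_custom_level(self, level: str) -> None:
        """Override the level that is published to vCenter Server."""
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        self.custom_level = level

    @property
    def published_level(self) -> str:
        """Custom level if one is set, otherwise the recommended level."""
        return self.custom_level or self.recommended_level


# Values taken from the Example column of the table above.
event = HaEvent("Fan", "0x04000005", "Fan Redundancy Lost", "Minor")
event.set_custom_level("Critical")
print(event.published_level)  # Critical
```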
No. | Component | Event ID | Event Description | Recommended Level |
---|---|---|---|---|
1 | Power | 0x03000011 | PSU Overtemperature | Major |
2 | Power | 0x03000007 | Power Supply Redundancy Lost | Major |
3 | Power | 0x03000009 | PSU Fault | Major |
4 | Power | 0x0300000B | Power Supply Prewarning | Minor |
5 | Power | 0x0300000D | Power Input Lost | Critical |
6 | Power | 0x0300000F | PSU Fan Fault | Major |
7 | Power | 0x19000003 | PSU backplane power supply abnormal | Major |
8 | Fan | 0x04000005 | Fan Redundancy Lost | Major |
9 | Fan | 0x04000007 | Large Fan Speed Difference | Major |
10 | Fan | 0x0400000F | (Major) The fan speed is high | Major |
11 | Fan | 0x04000011 | (Minor) The fan speed is high | Minor |
12 | Memory | 0x0100003B | Memory Overtemperature | Minor |
13 | Memory | 0x0100003D | Memory Overtemperature Major | Major |
14 | Memory | 0x01000071 | Memory Isolated | Major |
15 | Memory | 0x0100005D | Memory minor prefailure alarm | Minor |
16 | Memory | 0x01000017 | Memory MCE Error | Critical |
17 | Memory | 0x2C00004B | Configuration error of the DIMMs connected to the CPU | Critical |
18 | Memory | 0x01000049 | NVDIMM Slot Mismatch | Minor |
19 | Memory | 0x0100004D | DCPMM Warning | Minor |
20 | Memory | 0x0100004F | DCPMM Fault | Major |
21 | Memory | 0x0100005B | Memory in poor contact alarm | Major |
22 | Memory | 0x0100005F | Memory major prefailure alarm | Major |
23 | Network | 0x53000003 | OCP hardware component overtemperature | Minor |
24 | Network | 0x53000009 | OCP hardware component optical module overtemperature minor alarm | Minor |
25 | Network | 0x080000CF | NetCard minor fault alarm on PCIe card | Minor |
26 | Network | 0x53000007 | OCPCard hardware component fault minor alarm | Minor |
27 | Network | 0x2900000F | Optical Module Voltage Abnormal | Major |
28 | Network | 0x2900002F | NIC port link down | Major |
29 | Network | 0x080000E5 | Network Card minor fault alarm on PCIe card | Minor |
30 | Network | 0x080000E7 | Network Card major fault alarm on PCIe card | Major |
31 | Storage | 0x02000007 | Hard Disk Fault | Major |
32 | Storage | 0x02000009 | Hard Disk Prewarning | Minor |
33 | Storage | 0x0200002B | Hard Disk Link Fault | Minor |
34 | Storage | 0x0200001D | Low Hard Disk Remnant Wearout | Major |
35 | Storage | 0x080000CD | Disk major fault alarm on PCIe card | Major |
36 | Storage | 0x02000025 | Hard Disk Link Fault | Major |
37 | Storage | 0x02000027 | Hard Disk Status Abnormal | Minor |
38 | Storage | 0x0200002D | Hard Disk Lost | Major |
39 | Storage | 0x02000013 | Hard Disk MCE/AER Error | Major |
40 | Storage | 0x32000003 | Communication Failure Between Expander Controller and RAID Controller Card | Major |
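Filtering the event list by component or level, as the drop-down filters on the page do, can be sketched as follows. This is a minimal illustration that uses a handful of rows copied from the list above, with levels written as Minor, Major, and Critical. The function name `query_events` and the tuple layout are hypothetical and do not describe the plug-in's implementation.

```python
# A few rows from the event list above:
# (component, event ID, event description, recommended level)
EVENTS = [
    ("Power", "0x0300000D", "Power Input Lost", "Critical"),
    ("Fan", "0x04000005", "Fan Redundancy Lost", "Major"),
    ("Fan", "0x04000011", "(Minor) The fan speed is high", "Minor"),
    ("Memory", "0x01000017", "Memory MCE Error", "Critical"),
    ("Storage", "0x02000009", "Hard Disk Prewarning", "Minor"),
]


def query_events(events, component=None, level=None):
    """Return the rows that match every filter that is not None."""
    return [
        row
        for row in events
        if (component is None or row[0] == component)
        and (level is None or row[3] == level)
    ]


# Show all fan events, then only the critical ones across all components.
for row in query_events(EVENTS, component="Fan"):
    print(row)
for row in query_events(EVENTS, level="Critical"):
    print(row)
```

Passing both filters narrows the result to rows that satisfy both conditions. Passing neither returns the full list.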
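After a reset, all custom levels are restored to the system default recommended levels.
Go to the "Hosts and Clusters" page.
Go to the "Health Monitoring" page.
A confirmation dialog box is displayed.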
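The reset behavior described above can be sketched as follows. This is a minimal illustration under two assumptions: levels are kept in plain dictionaries keyed by event ID, and nothing changes unless the user confirms the dialog box. The names `reset_custom_levels` and `confirmed` are hypothetical.

```python
# Recommended levels for two events from the event list (English renderings).
RECOMMENDED = {"0x04000005": "Major", "0x0300000D": "Critical"}

# Custom levels start out equal to the recommended levels (assumption),
# then the user raises one of them.
custom = dict(RECOMMENDED)
custom["0x04000005"] = "Critical"


def reset_custom_levels(custom_levels, recommended_levels, confirmed):
    """Restore every custom level to its recommended level if confirmed.

    Returns True when a reset was performed, False when it was cancelled.
    """
    if not confirmed:
        return False
    custom_levels.clear()
    custom_levels.update(recommended_levels)
    return True


# Cancelling the dialog leaves the override in place; confirming removes it.
reset_custom_levels(custom, RECOMMENDED, confirmed=False)
print(custom["0x04000005"])  # Critical
reset_custom_levels(custom, RECOMMENDED, confirmed=True)
print(custom["0x04000005"])  # Major
```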