ceph - ceph-dashboard - 2024-08-27

Timestamp (UTC) Message
2024-08-27T06:21:39.019Z
<nizamial09> @athakkar there is a bug report from @Yite Gu of a crash in `get_daemon_health_metrics` in the mgr. He is running 16.2.14. Has something similar been observed before in prometheus, and did we fix it at some point?
2024-08-27T06:22:58.257Z
<athakkar> I'm not sure I was ever involved in its implementation. AFAIK it was @Pere Diaz Bou, right?
2024-08-27T06:24:51.258Z
<nizamial09> tbh, I don't recall exactly. Let's wait for Pere then.
2024-08-27T06:32:33.383Z
<Yite Gu> This is the call stack from the crash:
```{
    "crash_id": "2024-08-26T09:54:36.436236Z_e855b666-41be-4c85-92fe-fc6d8fba6c8e",
    "timestamp": "2024-08-26T09:54:36.436236Z",
    "process_name": "ceph-mgr",
    "entity_name": "mgr.a",
    "ceph_version": "16.2.14-2",
    "utsname_hostname": "n63-228-004",
    "utsname_sysname": "Linux",
    "utsname_release": "5.4.56.bsk.9-amd64",
    "utsname_version": "#5.4.56.bsk.9 SMP Debian 5.4.56.bsk.9 Wed Aug 25 03:42:38 UTC 20",
    "utsname_machine": "x86_64",
    "os_name": "CentOS Stream",
    "os_id": "centos",
    "os_version_id": "8",
    "os_version": "8",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7f8192e81cf0]",
        "(ActivePyModules::get_daemon_health_metrics()+0xf3) [0x55651ca5cee3]",
        "/lib64/libpython3.6m.so.1.0(+0x19d0d7) [0x7f819d06e0d7]",
        "_PyEval_EvalFrameDefault()",
        "/lib64/libpython3.6m.so.1.0(+0x179e48) [0x7f819d04ae48]",
        "/lib64/libpython3.6m.so.1.0(+0x19d377) [0x7f819d06e377]",
        "_PyEval_EvalFrameDefault()",
        "/lib64/libpython3.6m.so.1.0(+0x179e48) [0x7f819d04ae48]",
        "/lib64/libpython3.6m.so.1.0(+0x19d377) [0x7f819d06e377]",
        "_PyEval_EvalFrameDefault()",
        "/lib64/libpython3.6m.so.1.0(+0xf9984) [0x7f819cfca984]",
        "/lib64/libpython3.6m.so.1.0(+0x19c14f) [0x7f819d06d14f]",
        "PyObject_Call()",
        "_PyEval_EvalFrameDefault()",
        "/lib64/libpython3.6m.so.1.0(+0xfa2f6) [0x7f819cfcb2f6]",
        "/lib64/libpython3.6m.so.1.0(+0x17a030) [0x7f819d04b030]",
        "/lib64/libpython3.6m.so.1.0(+0x19d377) [0x7f819d06e377]",
        "_PyEval_EvalFrameDefault()",
        "_PyFunction_FastCallDict()",
        "_PyObject_FastCallDict()",
        "/lib64/libpython3.6m.so.1.0(+0x10db30) [0x7f819cfdeb30]",
        "PyObject_Call()",
        "_PyEval_EvalFrameDefault()",
        "/lib64/libpython3.6m.so.1.0(+0x179e48) [0x7f819d04ae48]",
        "/lib64/libpython3.6m.so.1.0(+0x19d377) [0x7f819d06e377]",
        "_PyEval_EvalFrameDefault()",
        "/lib64/libpython3.6m.so.1.0(+0x179e48) [0x7f819d04ae48]",
        "/lib64/libpython3.6m.so.1.0(+0x19d377) [0x7f819d06e377]",
        "_PyEval_EvalFrameDefault()",
        "_PyFunction_FastCallDict()",
        "_PyObject_FastCallDict()",
        "/lib64/libpython3.6m.so.1.0(+0x10db30) [0x7f819cfdeb30]",
        "PyObject_Call()",
        "/lib64/libpython3.6m.so.1.0(+0x20e012) [0x7f819d0df012]",
        "/lib64/libpython3.6m.so.1.0(+0x1b44c4) [0x7f819d0854c4]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7f8192e771ca]",
        "clone()"
    ]
}```
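Most frames in the backtrace above are unsymbolized libpython addresses, so it can help to filter the report down to the frames that carry a function name. The following is a minimal sketch, not part of the original report: the `CRASH_REPORT` string is a trimmed copy of the JSON above (only a few frames kept), and `symbolized_frames` and `SYMBOL_RE` are hypothetical helper names introduced for illustration.

```python
import json
import re

# Trimmed copy of the crash report above; only a few backtrace frames are kept.
CRASH_REPORT = """
{
    "crash_id": "2024-08-26T09:54:36.436236Z_e855b666-41be-4c85-92fe-fc6d8fba6c8e",
    "process_name": "ceph-mgr",
    "ceph_version": "16.2.14-2",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7f8192e81cf0]",
        "(ActivePyModules::get_daemon_health_metrics()+0xf3) [0x55651ca5cee3]",
        "/lib64/libpython3.6m.so.1.0(+0x19d0d7) [0x7f819d06e0d7]",
        "_PyEval_EvalFrameDefault()",
        "PyObject_Call()",
        "clone()"
    ]
}
"""

# Hypothetical pattern: a frame is treated as symbolized when it starts with
# a function name followed by "()", optionally wrapped in an opening "(".
# Frames that start with a library path such as "/lib64/..." do not match.
SYMBOL_RE = re.compile(r"^\(?([A-Za-z_][\w:]*)\(\)")


def symbolized_frames(report):
    """Return (depth, function_name) pairs for frames that carry a symbol."""
    frames = []
    for depth, frame in enumerate(report["backtrace"]):
        match = SYMBOL_RE.match(frame)
        if match:
            frames.append((depth, match.group(1)))
    return frames


report = json.loads(CRASH_REPORT)
frames = symbolized_frames(report)
for depth, name in frames:
    print(depth, name)
```

On the trimmed input, the first symbolized frame is `ActivePyModules::get_daemon_health_metrics` at depth 1, which matches the function named in the bug report. This only narrows down the function; finding the failing line would still need a core dump or debug symbols.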
2024-08-27T07:58:34.605Z
<Pere Diaz Bou> Is there a way to reproduce this? Were you able to get the core dump and retrieve the line where it fails?
2024-08-27T08:13:57.164Z
<Yite Gu> It happens very rarely, and we still don't know how to reproduce it.
2024-08-27T08:15:06.600Z
<Yite Gu> No core dump file was generated.
