2024-09-12T11:22:51.980Z | <Matthews Jose> Hi everyone,
I’m working on understanding the behavior of `MOSDOp` messages in Ceph, specifically regarding how multiple operations are bundled in the `ops` array.
From what I understand, `MOSDOp` is used to send operations (like reads, writes, and deletes) from clients to OSDs. The `ops` array in `MOSDOp` contains a list of `OSDOp` structures, each representing a different operation on the same object. I also know that `MOSDOp` messages can include various types of operations (such as reads and writes), but I want to clarify how often read and write operations (or other modifying operations like deletes) are mixed together in a single `MOSDOp`.
For example, in the following code snippet, the `ops` array could contain a write, followed by a read, and then another write:
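(The original attachment isn’t viewable in this export, so here is a minimal reconstruction, not the actual snippet, of what such a mixed `ops` vector could look like, using `OSDOp` and the `CEPH_OSD_OP_*` opcodes from Ceph’s internal headers; the offsets and lengths are made up:)
```cpp
#include <vector>
#include "osd/osd_types.h"  // OSDOp, which wraps a ceph_osd_op

// Build a write / read / write sequence as it would sit in MOSDOp::ops.
void fill_mixed_ops(std::vector<OSDOp>& ops) {
  ops.resize(3);

  ops[0].op.op = CEPH_OSD_OP_WRITE;   // write 4K at offset 0
  ops[0].op.extent.offset = 0;
  ops[0].op.extent.length = 4096;

  ops[1].op.op = CEPH_OSD_OP_READ;    // read the same extent back
  ops[1].op.extent.offset = 0;
  ops[1].op.extent.length = 4096;

  ops[2].op.op = CEPH_OSD_OP_WRITE;   // a second write at offset 4K
  ops[2].op.extent.offset = 4096;
  ops[2].op.extent.length = 4096;
}
```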
My Questions:
1. *How common is it to mix read and write operations* in a single `MOSDOp` message? From what I’ve seen, most operations (like reads and writes) are typically issued separately by Ceph clients. Is it normal for clients to bundle them together, or is it an edge case for specific workloads?
2. If they are mixed, how does the OSD handle the execution of the operations? Does the OSD process these operations sequentially and return the results of the read, while ensuring that the writes are applied atomically?
3. Are there specific client-side behaviors (such as in CephFS, RADOS, or RBD) where this is more common, or are read and write operations typically separated in these scenarios as well?
Any insights or examples from real-world use cases would be helpful!
Thanks in advance for the clarification. https://files.slack.com/files-pri/T1HG3J90S-F07MH7JAR4Z/download/untitled |
2024-09-12T13:14:19.599Z | <Casey Bodley> i managed to get something working by adding a pyproject.toml file with `requires = ["setuptools >= 61.0"]` in <https://github.com/ceph/ragweed/pull/28> |
2024-09-12T13:15:11.834Z | <Casey Bodley> but i have no idea why this broke. our s3tests repo uses a very similar pattern for tox and pytest, and still works |
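(For reference, the `requires` line presumably sits in the standard `[build-system]` table; a minimal sketch of such a pyproject.toml, though the actual file in the PR may differ:)
```toml
[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"
```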
2024-09-12T13:34:57.926Z | <John Mulligan> interesting. |
2024-09-12T14:33:11.826Z | <gregsfortytwo> @Matthews Jose It’s not very common, but it’s a key feature. Mixing reads and writes together isn’t itself very interesting, since anything that includes a write can only return an integer value; the useful part is that you can assert the object version, or an xattr value, or whatever other nonce you want, and only proceed to the write if it matches what you expect. |
2024-09-12T14:34:20.577Z | <gregsfortytwo> I’m not sure if any of the built-in systems use that any more, or have moved to more featureful interfaces like object classes. But for anybody using librados directly as a “normal” client application it’s quite a useful primitive |
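(In librados C++ terms, the compare-then-write described above could look like this minimal sketch; the io context, object name, payload, and version value are placeholders, not from the thread:)
```cpp
#include <rados/librados.hpp>

// Write new contents only if the object's user version still matches
// what we observed earlier; the assertion and the write are applied
// atomically as one compound operation (one MOSDOp on the wire).
int conditional_write(librados::IoCtx& ioctx, uint64_t expected_version) {
  librados::bufferlist data;
  data.append("new contents");

  librados::ObjectWriteOperation op;
  op.assert_version(expected_version);  // abort unless the version matches
  op.write_full(data);

  // Returns 0 on success, or a negative errno if the assertion failed
  // (in which case nothing was written).
  return ioctx.operate("myobject", &op);
}
```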
2024-09-12T14:38:39.329Z | <gregsfortytwo> I believe RADOS will process the ops you input sequentially, but it will also only give you a small amount of information back; you can’t read back 4K with your write. If bigger mixed reads and writes are happening in a transaction, it will be as an object class function. That you **can** construct a mixed MOSDOp like you have is an accident of the APIs |
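(A server-side sketch of that object-class route, modeled on Ceph’s cls examples; the class and method names here are illustrative, not an existing class:)
```cpp
#include "objclass/objclass.h"

CLS_VER(1, 0)
CLS_NAME(example)

// Read the first 4K of the object, then write it back at offset 4K;
// both steps execute inside the OSD as one transaction.
static int read_then_write(cls_method_context_t hctx,
                           ceph::buffer::list *in,
                           ceph::buffer::list *out) {
  ceph::buffer::list bl;
  int r = cls_cxx_read(hctx, 0, 4096, &bl);
  if (r < 0)
    return r;
  return cls_cxx_write(hctx, 4096, bl.length(), &bl);
}

CLS_INIT(example) {
  cls_handle_t h_class;
  cls_method_handle_t h_read_then_write;
  cls_register("example", &h_class);
  cls_register_cxx_method(h_class, "read_then_write",
                          CLS_METHOD_RD | CLS_METHOD_WR,
                          read_then_write, &h_read_then_write);
}
```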
2024-09-12T16:16:47.088Z | <Guna Kambalimath> Hi,
For PRs raised in the community, I see tests being run for arm64; similarly, I would like to have tests enabled for ppc64le. What needs to be done for that?
Possible approaches are:
• Adding GitHub Actions: we run make, ninja tests, etc. by configuring them in GitHub Actions. I am already trying this against my fork of the repo.
• Providing a VM to the Ceph Foundation so that the same tests run via Jenkins; we can work on providing the VM.
Would the first approach (GitHub Actions) be approved by the Ceph community? |
2024-09-12T16:25:10.868Z | <Matthews Jose> @gregsfortytwo Thanks for your response! I have a few follow-up questions to help clarify the behavior of a VM running on RBD and how operations are handled:
1. **Handling of Read/Write Operations in RBD**: When a VM accesses its RBD-backed storage and performs file operations (like `cat /etc/test.txt`), does Ceph typically handle these operations as **separate read and write `MOSDOp` messages**, or are these requests **bundled together** into a single `MOSDOp` message for efficiency?
2. **RBD Client and `MOSDOp` Process**: From my understanding, the VM's file system translates read/write operations into block-level requests. The **RBD client** then converts these block operations into **RADOS operations** (`MOSDOp` messages) that are sent to OSDs. Is this correct, and how does this process typically handle read/write interactions—are multiple `MOSDOp` messages generated for a sequence of operations like read-then-write?
3. **Object Classes in Ceph**: Regarding **object classes** in Ceph: if I were to run a command like `cat /etc/test.txt` from within the VM, I assume that wouldn’t involve an object class, but rather the **RBD client** using the **RADOS library** to send `MOSDOp` messages. Object classes seem to be used for more complex operations. For example, if I wanted to perform an operation that **counts the number of records** in an object or **compresses data** inside the OSD, I’d use an object class for that, but not for basic file read/write operations. Is that understanding correct?
Thanks again for the clarification. I’m trying to better understand the typical flow of operations from a VM on RBD and how `MOSDOp` and object classes come into play. |
2024-09-12T17:52:05.125Z | <gregsfortytwo> @Guna Kambalimath we haven’t used GitHub actions before so it would depend on how strong a case you can make. ARM enablement happened after several companies provided hardware to the sepia lab that provides the necessary throughput, and then interested volunteers committed to keeping them running and stable |
2024-09-12T17:52:57.717Z | <gregsfortytwo> A single VM is definitely not enough compute power to provide that throughput, though. You can go ask in <#C1HFJ4VTN|> about what the hardware requirements would be like. |
2024-09-12T17:54:19.362Z | <gregsfortytwo> @Matthews Jose rbd won’t bundle reads and writes together, and how many messages are generated will depend on a lot of different configuration options both above and within rbd |
2024-09-12T17:55:19.307Z | <gregsfortytwo> Yes, object classes are used for specialized ops. Rbd invokes them for its volume metadata updates and lock coordination but not much else |
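(Client-side, such object-class methods are invoked through librados `exec()`, which ships the call as a single CALL op; a minimal sketch reusing the illustrative names from the earlier object-class sketch:)
```cpp
#include <rados/librados.hpp>

// Ask the OSD to run "example.read_then_write" on one object.
// exec() sends the invocation as a single op inside one MOSDOp.
int call_read_then_write(librados::IoCtx& ioctx) {
  librados::bufferlist in, out;
  return ioctx.exec("myobject", "example", "read_then_write", in, out);
}
```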
2024-09-12T18:03:13.314Z | <Nathan Hoad> Hi all,
I just wanted to share that Bloomberg is hiring! If you are looking for a new challenge, and working on interesting problems with large enterprise-grade Ceph clusters sounds exciting to you, please feel free to apply. <https://bloomberg.avature.net/careers/JobDetail/Senior-Software-Engineer-Storage-Distributed-Downstream/7295> |