docs_sphinx/chapters/base.rst
Lines changed: 6 additions & 7 deletions
@@ -3,12 +3,13 @@ Base
 
 In this chapter, we get more familiar with some base ARM64 assembly instructions and how to benchmark the performance of such instructions.
 
+All files related to the tasks of this chapter can be found under ``submissions/base/``.
+
 Copying Data
 ------------
 
 First, we will implement the functionality of the given ``copy_c_0`` and ``copy_c_1`` C functions from the ``copy_c.c`` file using only base instructions.
-The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy_asm_1`` functions, located in the ``copy_asm.s`` file under
-``submissions/submission_25_04_24/copy_asm.s``.
+The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy_asm_1`` functions, located in the ``copy_asm.s`` file.
 
 1. copy_asm_0
 ^^^^^^^^^^^^^
@@ -53,7 +54,7 @@ The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy
@@ -79,9 +80,7 @@ Instruction Throughput and Latency
 
 The next task is to benchmark the execution throughput and latency of the ``ADD`` (shifted register) and ``MUL`` instructions.
 
-Our implementation is located under the directory ``submissions/submission_25_05_24/``.
-
-Files: ``submissions/submission_25_05_24/``
+Files:
 - ``benchmark_driver.cpp``
 - ``benchmark.s``
 
@@ -151,7 +150,7 @@ throughput and latency. For the throughput measurement of ``ADD`` this looks lik
 ret
 .size throughput_add, (. - throughput_add)
 
-Throughput measurement of ``MUL`` is similar. For the latency benchmakring we use read-after-write dependencies to measure the latency of the instructions.
+Throughput measurement of ``MUL`` is similar. For the latency benchmarking we use read-after-write dependencies to measure the latency of the instructions.
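
The read-after-write chain forces every instruction to wait for the result of the previous one, so the elapsed time divided by the number of chained instructions approximates the latency of a single instruction. In the throughput kernel the instructions are independent, so the same division yields the sustained issue rate instead. The following is a minimal sketch of that arithmetic only. The helper names ``instructions_per_second`` and ``latency_cycles`` are hypothetical and are not taken from ``benchmark_driver.cpp``, and the clock frequency is an assumed input that has to be supplied for the machine under test.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of benchmark_driver.cpp.

// Instructions executed per second, given the number of loop iterations,
// the number of measured instructions per iteration, and the elapsed time.
double instructions_per_second(std::uint64_t iterations,
                               std::uint64_t instructions_per_iteration,
                               double seconds) {
    const double total = static_cast<double>(iterations) *
                         static_cast<double>(instructions_per_iteration);
    return total / seconds;
}

// Average latency, in clock cycles, of one instruction in a dependent chain.
// Assumes the chain is fully serialized by read-after-write dependencies and
// that clock_hz is the (assumed, user-supplied) core clock frequency.
double latency_cycles(std::uint64_t iterations,
                      std::uint64_t chain_length,
                      double seconds,
                      double clock_hz) {
    const double total = static_cast<double>(iterations) *
                         static_cast<double>(chain_length);
    return seconds * clock_hz / total;
}
```

As a made-up example, 1000 iterations of 25 independent instructions that take 0.5 seconds correspond to ``instructions_per_second(1000, 25, 0.5)``, which evaluates to 50000. Loop overhead such as the counter decrement and the branch is ignored here, so the loop body should contain enough measured instructions to make that overhead negligible.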