Commit 7f7ebe1

doc: cleanup assembly, base, neon
1 parent 5225bb0 commit 7f7ebe1

59 files changed, +325 -428 lines changed

.github/workflows/ci.yml

Lines changed: 6 additions & 16 deletions
@@ -5,6 +5,8 @@ on:
 push:
 pull_request:
 branches: [ "main" ]
+# Allows you to run this workflow manually from the Actions tab
+workflow_dispatch:

 env:
 parallel_processes: 8 # A good default counts is: available Threads + 4
@@ -29,38 +31,26 @@ jobs:
 # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
 # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
 run: |
-cmake -S ${{github.workspace}}/submissions/submission_25_05_01 -B ${{github.workspace}}/build/submission_25_05_01 -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
-cmake -S ${{github.workspace}}/submissions/submission_25_05_08 -B ${{github.workspace}}/build/submission_25_05_08 -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
-cmake -S ${{github.workspace}}/submissions/submission_25_05_15 -B ${{github.workspace}}/build/submission_25_05_15 -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
-cmake -S ${{github.workspace}}/submissions/submission_25_05_22 -B ${{github.workspace}}/build/submission_25_05_22 -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
+cmake -S ${{github.workspace}}/submissions/neon -B ${{github.workspace}}/build/neon -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
 cmake -S ${{github.workspace}} -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{matrix.build_type}}

 - name: Build
 # Build your program with the given configuration
 run: |
-cmake --build ${{github.workspace}}/build/submission_25_05_01 --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
-cmake --build ${{github.workspace}}/build/submission_25_05_08 --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
-cmake --build ${{github.workspace}}/build/submission_25_05_15 --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
-cmake --build ${{github.workspace}}/build/submission_25_05_22 --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
+cmake --build ${{github.workspace}}/build/neon --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
 cmake --build ${{github.workspace}}/build --config ${{matrix.build_type}} -j ${{env.parallel_processes}}

 - name: Test
 working-directory: ${{github.workspace}}/build
 # Execute tests defined by the CMake configuration.
 run: |
-ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir submission_25_05_01 --output-on-failure
-ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir submission_25_05_08 --output-on-failure
-ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir submission_25_05_15 --output-on-failure
-ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir submission_25_05_22 --output-on-failure
+ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir neon --output-on-failure
 ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --output-on-failure -E "^Test einsum tree optimize and execute first example"

 - name: Test + Valgrind
 working-directory: ${{github.workspace}}/build
 # Execute tests defined by the CMake configuration.
 run: |
-ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir submission_25_05_01 --output-on-failure
-ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir submission_25_05_08 --output-on-failure
-ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir submission_25_05_15 --output-on-failure
-ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir submission_25_05_22 --output-on-failure
+ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir neon --output-on-failure
 ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --output-on-failure -E "^Test *(gemm generation|unary|tensor operation|parallel tensor operation|einsum tree execute|einsum tree optimize and execute)"

.vscode/settings.json

Lines changed: 5 additions & 0 deletions
@@ -88,12 +88,17 @@
 "Fastor",
 "fmax",
 "fmla",
+"GFLOPS",
 "heapbytes",
 "jited",
 "linalg",
 "madd",
+"matmul",
 "MATMUL",
 "MATMULS",
+"microbenchmark",
+"Microbenchmark",
+"microbenchmarks",
 "microkernel",
 "MINIJIT",
 "movz",

docs_sphinx/chapters/assembly.rst

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@ Assembly
 Before we begin implementing the individual components of the project, we will start with a brief review of assembly language.
 This short chapter is intended as a refresher on the basic knowledge required for the project.

+All files related to the tasks of this chapter can be found under ``submissions/assembly/``.
+
 Hello Assembly
 --------------

docs_sphinx/chapters/base.rst

Lines changed: 6 additions & 7 deletions
@@ -3,12 +3,13 @@ Base

 In this chapter, we get more familiar with some base ARM64 assembly instructions and how to benchmark the performance of such instructions.

+All files related to the tasks of this chapter can be found under ``submissions/base/``.
+
 Copying Data
 ------------

 First, we will implement the functionality of the given ``copy_c_0`` and ``copy_c_1`` C functions from the ``copy_c.c`` file using only base instructions.
-The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy_asm_1`` functions, located in the ``copy_asm.s`` file under
-``submissions/submission_25_04_24/copy_asm.s``.
+The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy_asm_1`` functions, located in the ``copy_asm.s`` file.

 1. copy_asm_0
 ^^^^^^^^^^^^^
@@ -53,7 +54,7 @@ The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy
 cmp x3, x0 // compare value in x3 and x0
 b.ge end_loop // conditions: counter x3 greater equal n/x0 (value in [x0])

-ldr w4, [x1, x3, lsl #2] // adress = x1 + (x3 << 2)
+ldr w4, [x1, x3, lsl #2] // address = x1 + (x3 << 2)
 str w4, [x2, x3, lsl #2] // x3 << 2 = x3 * 4

 add x3, x3, #1
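
For readers less familiar with the scaled-register addressing in this hunk, the loop can be read as a plain element-wise copy. The following is a minimal C sketch of that reading, assuming ``x0`` holds the element count, ``x1`` the source pointer, ``x2`` the destination pointer, and that elements are 32 bits wide (suggested by ``w4`` and ``lsl #2``). The name ``copy_sketch`` and its signature are hypothetical and are not taken from ``copy_c.c``.

```c
#include <stdint.h>

/* Hypothetical C reading of the assembly loop; not the project's copy_c_1. */
void copy_sketch(int64_t n, const int32_t *src, int32_t *dst) {
    /* cmp x3, x0 ; b.ge end_loop  -> signed comparison, leave once i >= n */
    for (int64_t i = 0; i < n; i++) {
        /* ldr w4, [x1, x3, lsl #2] -> load 32 bits from src + i * 4  */
        /* str w4, [x2, x3, lsl #2] -> store 32 bits to  dst + i * 4  */
        dst[i] = src[i];
        /* add x3, x3, #1 -> i++ */
    }
}
```

Because the comparison is signed (``b.ge``), a zero or negative count leaves the destination untouched, which the sketch mirrors by using a signed loop index.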
@@ -79,9 +80,7 @@ Instruction Throughput and Latency

 The next task is to benchmark the execution throughput and latency of the ``ADD`` (shifted register) and ``MUL`` instructions.

-Our implementation is located under the directory ``submissions/submission_25_05_24/``.
-
-Files: ``submissions/submission_25_05_24/``
+Files:
 - ``benchmark_driver.cpp``
 - ``benchmark.s``

@@ -151,7 +150,7 @@ throughput and latency. For the throughput measurement of ``ADD`` this looks lik
 ret
 .size throughput_add, (. - throughput_add)

-Throughput measurement of ``MUL`` is similar. For the latency benchmakring we use read-after-write dependencies to measure the latency of the instructions.
+Throughput measurement of ``MUL`` is similar. For the latency benchmarking we use read-after-write dependencies to measure the latency of the instructions.
 For ``ADD`` this looks like this:

 .. code-block:: asm
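
The documentation touched by this commit describes throughput and latency benchmarks for ``ADD`` and ``MUL``. As a rough illustration of how such timings are usually turned into numbers, here is a small C sketch. It is not taken from ``benchmark_driver.cpp``: the function names and parameters are hypothetical, and the latency helper assumes the core clock frequency is known and constant during the run.

```c
#include <stdint.h>

/* Hypothetical helper: executed instructions per second, in units of 1e9. */
double giga_ops_per_second(uint64_t iterations,
                           uint64_t instr_per_iteration,
                           double seconds) {
    return (double)iterations * (double)instr_per_iteration / seconds / 1e9;
}

/* Hypothetical helper: average cycles per instruction of a timed loop.
 * If every instruction depends on the previous one (read-after-write),
 * this approximates the instruction latency. clock_hz is an assumed,
 * fixed core frequency. */
double cycles_per_instruction(uint64_t iterations,
                              uint64_t instr_per_iteration,
                              double seconds,
                              double clock_hz) {
    return seconds * clock_hz /
           ((double)iterations * (double)instr_per_iteration);
}
```

For independent instructions the first figure reflects throughput; for a dependent chain the second figure is the more meaningful one, since each instruction must wait for its predecessor's result.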
