bondscell_resultsE$8a51b9c5-8888-4578-ae40-cf906ec9b5faqueued¤logsrunning¦outputbody

[Eij10] V. Eijkhout. Introduction to High Performance Scientific Computing. 3 Edition, Vol. 1 (Lulu.com, 2010).

mimetext/htmlrootassigneelast_run_timestampAe#Mpersist_js_state÷has_pluto_hook_features§cell_id$8a51b9c5-8888-4578-ae40-cf906ec9b5fadepends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$19943be2-1633-48c9-8cb3-2a73fb96e4aequeued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAe6^persist_js_state·has_pluto_hook_features§cell_id$19943be2-1633-48c9-8cb3-2a73fb96e4aedepends_on_disabled_cells§runtime 3published_object_keysdepends_on_skipped_cells§errored$98a65469-573e-43b5-9043-f3d0f3198bccqueued¤logsrunning¦outputbodys mimetext/htmlrootassigneelast_run_timestampAepersist_js_state·has_pluto_hook_features§cell_id$98a65469-573e-43b5-9043-f3d0f3198bccdepends_on_disabled_cells§runtime jpublished_object_keysdepends_on_skipped_cells§errored$8e337fad-abcf-4ad3-bf75-ab3980f36baaqueued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAeаpersist_js_state·has_pluto_hook_features§cell_id$8e337fad-abcf-4ad3-bf75-ab3980f36baadepends_on_disabled_cells§runtime published_object_keysdepends_on_skipped_cells§errored$9e78f2a1-0811-4f61-957d-ad4718430f7fqueued¤logsrunning¦outputbodyt

Cache hierarchy for a multi-core CPU

mimetext/htmlrootassigneelast_run_timestampAeҟcpersist_js_state÷has_pluto_hook_features§cell_id$9e78f2a1-0811-4f61-957d-ad4718430f7fdepends_on_disabled_cells§runtimeFpublished_object_keysdepends_on_skipped_cells§errored$e83baa29-ad2b-4ffc-99f5-cdbca9e31233queued¤logsrunning¦outputbodyb

Application to parallel sum

mimetext/htmlrootassigneelast_run_timestampAeҡpersist_js_state÷has_pluto_hook_features§cell_id$e83baa29-ad2b-4ffc-99f5-cdbca9e31233depends_on_disabled_cells§runtimeаpublished_object_keysdepends_on_skipped_cells§errored$96bffd66-24fc-46f7-b211-57e7d27bc316queued¤logsrunning¦outputbodyprefixFloat32elements0.700597text/plain0.429158text/plain0.159631text/plain0.174324text/plain0.834876text/plain0.742624text/plain0.60752text/plain0.819994text/plain 0.744194text/plain 0.817196text/plain 0.515746text/plain 0.0915513text/plain 0.204315text/plain0.266617text/plain0.0926264text/plain0.73415text/plain0.392629text/plain0.409257text/plain0.427472text/plain0.535635text/plainmore0.114796text/plain0.943326text/plain0.481296text/plain0.639699text/plain0.645393text/plain0.407587text/plain0.304633text/plain0.967761text/plain0.830977text/plain0.820921text/plaintypeArrayprefix_shortobjectid262fefed9465eccdmime!application/vnd.pluto.tree+objectrootassigneemany_veclast_run_timestampAe>̧persist_js_state·has_pluto_hook_features§cell_id$96bffd66-24fc-46f7-b211-57e7d27bc316depends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$658ca396-2d73-4c93-8138-33c101deee7bqueued¤logsrunning¦outputbody

This shows the importance of data locality. An algorithm performs better if it accesses data close in memory and in a predictable pattern.

mimetext/htmlrootassigneelast_run_timestampAeҜpersist_js_state÷has_pluto_hook_features§cell_id$658ca396-2d73-4c93-8138-33c101deee7bdepends_on_disabled_cells§runtime õpublished_object_keysdepends_on_skipped_cells§errored$d5432907-3e55-4035-9c91-183c37d886eaqueued¤logsrunning¦outputbody mimetext/htmlrootassigneelast_run_timestampAeӈpersist_js_state·has_pluto_hook_features§cell_id$d5432907-3e55-4035-9c91-183c37d886eadepends_on_disabled_cells§runtimejpublished_object_keys573dc2256-4ab3-11f1-b967-89f2e690ee32/8850476c856a19d0depends_on_skipped_cells§errored$e90fd21d-d046-4852-823c-5d7210068923queued¤logsrunning¦outputbodyى

Cache coherence : Update L1 cache when the corresponding memory is modified by another core.

mimetext/htmlrootassigneelast_run_timestampAeҟpersist_js_state÷has_pluto_hook_features§cell_id$e90fd21d-d046-4852-823c-5d7210068923depends_on_disabled_cells§runtime4published_object_keysdepends_on_skipped_cells§errored$4b7a62a4-1e88-410b-8549-3021f6cdf6daqueued¤logsrunning¦outputbodyU

$$\begin{align} T_p &= T_1F_s + T_1F_p/p\\ S_p &= \frac{1}{F_s + F_p/p} & E_p &= \frac{1}{pF_s + F_p}\\ \lim_{p \to \infty} S_p &= \frac{1}{F_s} \end{align}$$

mimetext/htmlrootassigneelast_run_timestampAeҡSLpersist_js_state÷has_pluto_hook_features§cell_id$4b7a62a4-1e88-410b-8549-3021f6cdf6dadepends_on_disabled_cells§runtime ߵpublished_object_keysdepends_on_skipped_cells§errored$b26ab400-ce89-4a76-ad48-464ac6821dd2queued¤logsrunning¦outputbodyL

Amdahl's law

mimetext/htmlrootassigneelast_run_timestampAeҡpersist_js_state÷has_pluto_hook_features§cell_id$b26ab400-ce89-4a76-ad48-464ac6821dd2depends_on_disabled_cells§runtimeصpublished_object_keysdepends_on_skipped_cells§errored$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1queued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAe˰persist_js_state·has_pluto_hook_features§cell_id$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1depends_on_disabled_cells§runtimeCpublished_object_keysdepends_on_skipped_cells§errored$1f45bab8-afb7-4cd2-8a37-1f258f37ad8fqueued¤logsrunning¦outputbodyJ

Many processors

mimetext/htmlrootassigneelast_run_timestampAeҠ$persist_js_state÷has_pluto_hook_features§cell_id$1f45bab8-afb7-4cd2-8a37-1f258f37ad8fdepends_on_disabled_cells§runtimezpublished_object_keysdepends_on_skipped_cells§errored$e7445ed8-cbf7-475d-bd67-3df8d9015de2queued¤logsrunning¦outputbodyD

Parallel sum

mimetext/htmlrootassigneelast_run_timestampAeҟpersist_js_state÷has_pluto_hook_features§cell_id$e7445ed8-cbf7-475d-bd67-3df8d9015de2depends_on_disabled_cells§runtime?published_object_keysdepends_on_skipped_cells§errored$ccfd4488-a32a-4b35-a922-2e830f91ca08queued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAepersist_js_state·has_pluto_hook_features§cell_id$ccfd4488-a32a-4b35-a922-2e830f91ca08depends_on_disabled_cells§runtime!%|published_object_keysdepends_on_skipped_cells§errored$ebe1cd42-ba25-4538-acbe-353e0e47009equeued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAe͒persist_js_state·has_pluto_hook_features§cell_id$ebe1cd42-ba25-4538-acbe-353e0e47009edepends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$8b98b33e-f65d-4cbd-9e80-20a7132cd349queued¤logsrunning¦outputbody0.2mimetext/plainrootassigneelast_run_timestampAe٢8vpersist_js_state·has_pluto_hook_features§cell_id$8b98b33e-f65d-4cbd-9e80-20a7132cd349depends_on_disabled_cells§runtimeOpublished_object_keysdepends_on_skipped_cells§errored$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5equeued¤logsrunning¦outputbody mimetext/htmlrootassigneelast_run_timestampAeӮ&Ȱpersist_js_state·has_pluto_hook_features§cell_id$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5edepends_on_disabled_cells§runtimeUd7published_object_keysdepends_on_skipped_cells§errored$f95dd40b-8c56-4e10-abbc-3dbb58148e1fqueued¤logsrunning¦outputbodyL

Amdahl's law

mimetext/htmlrootassigneelast_run_timestampAeҠCpersist_js_state÷has_pluto_hook_features§cell_id$f95dd40b-8c56-4e10-abbc-3dbb58148e1fdepends_on_disabled_cells§runtimeڇpublished_object_keysdepends_on_skipped_cells§errored$d8238145-9787-40f0-a151-1ef73d8c97eequeued¤logsrunning¦outputbodychildren text/html0_text/htmlclassnamestyle#display: flex; flex-direction: row;mime'application/vnd.pluto.divelement+objectrootassigneelast_run_timestampAe28persist_js_state·has_pluto_hook_features§cell_id$d8238145-9787-40f0-a151-1ef73d8c97eedepends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$a32ba8f2-a9c9-41c6-99b4-577f0823bd9fqueued¤logsrunning¦outputbody\

Cache lines and prefetch

mimetext/htmlrootassigneelast_run_timestampAeҖcpersist_js_state÷has_pluto_hook_features§cell_id$a32ba8f2-a9c9-41c6-99b4-577f0823bd9fdepends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$cd6c2bda-e80d-46ba-8242-f0f61a250471queued¤logsrunning¦outputbody+definition (generic function with 1 method)mimetext/plainrootassigneelast_run_timestampAe\persist_js_state·has_pluto_hook_features§cell_id$cd6c2bda-e80d-46ba-8242-f0f61a250471depends_on_disabled_cells§runtimek_published_object_keysdepends_on_skipped_cells§errored$253bd533-99b7-4012-b3f4-e86a2466a919queued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAeKbpersist_js_state·has_pluto_hook_features§cell_id$253bd533-99b7-4012-b3f4-e86a2466a919depends_on_disabled_cells§runtime":published_object_keysdepends_on_skipped_cells§errored$910fc9b2-c57d-4874-b8b2-df440fc921c0queued¤logslinemsg& 9.137 μs (3 allocations: 80 bytes) text/plaincell_id$910fc9b2-c57d-4874-b8b2-df440fc921c0kwargsidPlutoRunner_d1acb81efileP/home/runner/.julia/packages/Pluto/1XRxx/src/runner/PlutoRunner/src/io/stdout.jlgroupstdoutlevelLogLevel(-555)running¦outputbody32735.955f0mimetext/plainrootassigneelast_run_timestampAeّpersist_js_state·has_pluto_hook_features§cell_id$910fc9b2-c57d-4874-b8b2-df440fc921c0depends_on_disabled_cells§runtimeu published_object_keysdepends_on_skipped_cells§errored$6f70144e-5240-41ef-a719-8a8942e18feequeued¤logsrunning¦outputbodymimetext/htmlrootassigneelast_run_timestampAe`persist_js_state·has_pluto_hook_features§cell_id$6f70144e-5240-41ef-a719-8a8942e18feedepends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$caec43a3-9bac-4f73-a8e5-288cfa9e1606queued¤logsrunning¦outputbodyw mimetext/htmlrootassigneelast_run_timestampAe

Benchmark

mimetext/htmlrootassigneelast_run_timestampAeҠNpersist_js_state÷has_pluto_hook_features§cell_id$db24839c-eb42-4d5c-8545-3714abc01bc5depends_on_disabled_cells§runtime ڵpublished_object_keysdepends_on_skipped_cells§errored$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96queued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAe\persist_js_state·has_pluto_hook_features§cell_id$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96depends_on_disabled_cells§runtime 8ߵpublished_object_keysdepends_on_skipped_cells§errored$ea9ff1a9-615d-4e18-a4c8-9aad20447156queued¤logsrunning¦outputbodyw mimetext/htmlrootassigneelast_run_timestampAeApersist_js_state·has_pluto_hook_features§cell_id$ea9ff1a9-615d-4e18-a4c8-9aad20447156depends_on_disabled_cells§runtime=published_object_keysdepends_on_skipped_cells§errored$c0bda86a-136b-45ca-84ba-7365c367d265queued¤logsrunning¦outputbodyT

Arithmetic intensity

mimetext/htmlrootassigneelast_run_timestampAeҞDpersist_js_state÷has_pluto_hook_features§cell_id$c0bda86a-136b-45ca-84ba-7365c367d265depends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$11b1c6a8-3918-4dda-9028-17af2d6c44c4queued¤logsrunning¦outputbody`

Consider a program requiring m load / store operations with memory for o arithmetic operations.

As arithmetic operations and data transfer are done in parallel, the time per iteration is

$$\max(t_\text{arith}, t_\text{mem}) / o = 1/\min(\text{frequency}, a \cdot \text{bandwidth})$$

So the number of operations per second is $\min(\text{frequency}, a \cdot \text{bandwidth})$.

This piecewise linear function in $a$ gives the roofline model.

mimetext/htmlrootassigneelast_run_timestampAeҞapersist_js_state÷has_pluto_hook_features§cell_id$11b1c6a8-3918-4dda-9028-17af2d6c44c4depends_on_disabled_cells§runtimeWpublished_object_keysdepends_on_skipped_cells§errored$d221bad8-98fb-4c1d-9c9c-66e1b697f023queued¤logsrunning¦outputbody7
mimetext/htmlrootassigneelast_run_timestampAeҟo?persist_js_state÷has_pluto_hook_features§cell_id$d221bad8-98fb-4c1d-9c9c-66e1b697f023depends_on_disabled_cells§runtime Npublished_object_keysdepends_on_skipped_cells§errored$a7118fbb-66d6-44a1-a6ae-839f0e42a3ecqueued¤logslinemsg% 3.078 μs (0 allocations: 0 bytes) text/plaincell_id$a7118fbb-66d6-44a1-a6ae-839f0e42a3eckwargsidPlutoRunner_d1acb81efileP/home/runner/.julia/packages/Pluto/1XRxx/src/runner/PlutoRunner/src/io/stdout.jlgroupstdoutlevelLogLevel(-555)running¦outputbody32735.955f0mimetext/plainrootassigneelast_run_timestampAe&5persist_js_state·has_pluto_hook_features§cell_id$a7118fbb-66d6-44a1-a6ae-839f0e42a3ecdepends_on_disabled_cells§runtimeΨpublished_object_keysdepends_on_skipped_cells§errored$6e8865f5-84ad-4083-bb19-57ad1b561fabqueued¤logsrunning¦outputbodyP

The roofline model

mimetext/htmlrootassigneelast_run_timestampAeҟ+persist_js_state÷has_pluto_hook_features§cell_id$6e8865f5-84ad-4083-bb19-57ad1b561fabdepends_on_disabled_cells§runtimeUpublished_object_keysdepends_on_skipped_cells§errored$d718f117-41da-42ff-9bcd-8bef0e7e6974queued¤logsrunning¦outputbodyn

If we have many processors, we may want to speed up the last part as well:

mimetext/htmlrootassigneelast_run_timestampAeҠ}persist_js_state÷has_pluto_hook_features§cell_id$d718f117-41da-42ff-9bcd-8bef0e7e6974depends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$3887824b-7c7f-4c24-bf6d-7a55ed7adc89queued¤logsrunning¦outputbodyF

Memory layout

mimetext/htmlrootassigneelast_run_timestampAe2?Ͱpersist_js_state÷has_pluto_hook_features§cell_id$3887824b-7c7f-4c24-bf6d-7a55ed7adc89depends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$050a67f8-7f02-4ac9-8ac4-20327d46c5e8queued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAeWpersist_js_state·has_pluto_hook_features§cell_id$050a67f8-7f02-4ac9-8ac4-20327d46c5e8depends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$62297efe-3a15-4dab-ac17-b823ab3e7933queued¤logslinemsg% 2.774 μs (0 allocations: 0 bytes) text/plaincell_id$62297efe-3a15-4dab-ac17-b823ab3e7933kwargsidPlutoRunner_d1acb81efileP/home/runner/.julia/packages/Pluto/1XRxx/src/runner/PlutoRunner/src/io/stdout.jlgroupstdoutlevelLogLevel(-555)running¦outputbody32811.023f0mimetext/plainrootassigneelast_run_timestampAeqװpersist_js_state·has_pluto_hook_features§cell_id$62297efe-3a15-4dab-ac17-b823ab3e7933depends_on_disabled_cells§runtimeΟ]published_object_keysdepends_on_skipped_cells§errored$31727049-ac0a-45a6-aae0-934d4549b541queued¤logslinemsg& 2.340 μs (3 allocations: 80 bytes) text/plaincell_id$31727049-ac0a-45a6-aae0-934d4549b541kwargsidPlutoRunner_d1acb81efileP/home/runner/.julia/packages/Pluto/1XRxx/src/runner/PlutoRunner/src/io/stdout.jlgroupstdoutlevelLogLevel(-555)running¦outputbody32811.03f0mimetext/plainrootassigneelast_run_timestampAe4persist_js_state·has_pluto_hook_features§cell_id$31727049-ac0a-45a6-aae0-934d4549b541depends_on_disabled_cells§runtimeδrǵpublished_object_keysdepends_on_skipped_cells§errored$2a1f3d29-4d6b-4634-86f3-4ecd4a7821a2queued¤logsrunning¦outputbodyX

Speed-up and efficency

mimetext/htmlrootassigneelast_run_timestampAeҠаpersist_js_state÷has_pluto_hook_features§cell_id$2a1f3d29-4d6b-4634-86f3-4ecd4a7821a2depends_on_disabled_cells§runtime3ipublished_object_keysdepends_on_skipped_cells§errored$81465bf1-8e54-461f-892c-2769bf94fdfequeued¤logsrunning¦outputbody

Latency of n bytes of data is given by

$$\alpha + \beta n$$

where $\alpha$ is the start up time and $\beta$ is the inverse of the bandwidth.

mimetext/htmlrootassigneelast_run_timestampAeҖ5ΰpersist_js_state÷has_pluto_hook_features§cell_id$81465bf1-8e54-461f-892c-2769bf94fdfedepends_on_disabled_cells§runtime=published_object_keysdepends_on_skipped_cells§errored$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbaqueued¤logsrunning¦outputbodyprefixFloat32elements0.85051text/plain0.0503311text/plain0.198103text/plain0.0931568text/plain0.0859969text/plain0.695007text/plain0.736921text/plain0.212104text/plain 0.362795text/plain 0.110771text/plain 0.555139text/plain 0.0231498text/plain 0.835648text/plain0.349553text/plain0.663635text/plain0.100212text/plain0.0345355text/plain0.931736text/plain0.53907text/plain0.26969text/plainmore0.73696text/plain0.26269text/plain0.617377text/plain0.540314text/plain0.849055text/plain0.216793text/plain0.796756text/plain0.767626text/plain0.954686text/plain0.186273text/plaintypeArrayprefix_shortobjectid57a27ca5a9c857f3mime!application/vnd.pluto.tree+objectrootassigneeveclast_run_timestampAeӨ2npersist_js_state·has_pluto_hook_features§cell_id$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbadepends_on_disabled_cells§runtime}published_object_keysdepends_on_skipped_cells§errored$6c021710-5828-4ac0-8619-ce690ba89d5fqueued¤logsrunning¦outputbody mimetext/htmlrootassigneelast_run_timestampAe=뎰persist_js_state·has_pluto_hook_features§cell_id$6c021710-5828-4ac0-8619-ce690ba89d5fdepends_on_disabled_cells§runtime: published_object_keys573dc2256-4ab3-11f1-b967-89f2e690ee32/3583170c6e86b605depends_on_skipped_cells§errored$e867d9be-5668-4756-af7f-c23c48962f08queued¤logsrunning¦outputbodyv mimetext/htmlrootassigneelast_run_timestampAeOưpersist_js_state·has_pluto_hook_features§cell_id$e867d9be-5668-4756-af7f-c23c48962f08depends_on_disabled_cells§runtimeCpublished_object_keysdepends_on_skipped_cells§errored$de0bbef2-1240-4f85-889f-0af509d6cfffqueued¤logsrunning¦outputbody mimetext/htmlrootassigneelast_run_timestampAe@(persist_js_state·has_pluto_hook_features§cell_id$de0bbef2-1240-4f85-889f-0af509d6cfffdepends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$d52b5227-0ef4-4dc6-b443-03364b0caeffqueued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAe^persist_js_state·has_pluto_hook_features§cell_id$d52b5227-0ef4-4dc6-b443-03364b0caeffdepends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$138caa9b-1d53-4c01-a3b9-c1a097413736queued¤logsrunning¦outputbody3mimetext/htmlrootassigneelast_run_timestampAezppersist_js_state·has_pluto_hook_features§cell_id$138caa9b-1d53-4c01-a3b9-c1a097413736depends_on_disabled_cells§runtime#{published_object_keysdepends_on_skipped_cells§errored$d1aef3d4-33d1-4151-8ba3-2169f734ea6bqueued¤logsrunning¦outputbody

How to get $1/F_s = \lim_{p \to \infty} S_p$ ?

The algorithm cannot use more than $n$ processes so if $p \ge n$, we have $T_p = 1 + \log_2(n)$. Therefore, $\lim_{p \to \infty} S_p = S_n = \frac{n}{1 + \log_2(n)}$. Ignoring the constant $1$, we get $F_s = \log_2(n)/n$.

mimetext/htmlrootassigneelast_run_timestampAeٖpersist_js_state·has_pluto_hook_features§cell_id$d1aef3d4-33d1-4151-8ba3-2169f734ea6bdepends_on_disabled_cells§runtimetpublished_object_keysdepends_on_skipped_cells§errored$947b8e5c-9cb7-4fe6-aff6-48416879fb43queued¤logslinemsgathread id : 0 / 2 0:32767 thread id : 1 / 2 32768:65535 thread id : 0 / 1 0:1 0.000088 seconds text/plaincell_id$947b8e5c-9cb7-4fe6-aff6-48416879fb43kwargsidPlutoRunner_d1acb81efileP/home/runner/.julia/packages/Pluto/1XRxx/src/runner/PlutoRunner/src/io/stdout.jlgroupstdoutlevelLogLevel(-555)running¦outputbody32811.023f0mimetext/plainrootassigneelast_run_timestampAeٓؐpersist_js_state·has_pluto_hook_features§cell_id$947b8e5c-9cb7-4fe6-aff6-48416879fb43depends_on_disabled_cells§runtimeCOpublished_object_keysdepends_on_skipped_cells§errored$4b9cfb4d-2355-42e3-be2f-35e2638e984bqueued¤logsrunning¦outputbody!
#include <vector>
#include <stdint.h>
#include <omp.h>
#include <stdio.h>

extern "C" {
float sum(float *vec, int length, int num_threads, int verbose) {
  float total = 0;
  omp_set_dynamic(0); // Force the value `num_threads`
  omp_set_num_threads(num_threads);
  #pragma omp parallel
  {
    int thread_num = omp_get_thread_num();
	int stride = length / num_threads;
    int last = stride * (thread_num + 1);
    if (thread_num + 1 == num_threads)
      last = length;
	if (verbose >= 1)
      fprintf(stderr, "thread id : %d / %d %d:%d\n", thread_num, omp_get_num_threads(), stride * thread_num, last - 1);
    #pragma omp simd
    for (int i = stride * thread_num; i < last; i++)
      total += vec[i];
  }
  return total;
}
}
mimetext/htmlrootassigneelast_run_timestampAepersist_js_state·has_pluto_hook_features§cell_id$4b9cfb4d-2355-42e3-be2f-35e2638e984bdepends_on_disabled_cells§runtime!9published_object_keysdepends_on_skipped_cells§errored$3bff6f16-ad66-4aed-9a29-4bdcfe41b57fqueued¤logsrunning¦outputbodyV]

LINMA2710 - Scientific Computing Shared-Memory Multiprocessing

P.-A. Absil and B. Legat

      mimetext/htmlrootassigneelast_run_timestampAe哰persist_js_state·has_pluto_hook_features§cell_id$3bff6f16-ad66-4aed-9a29-4bdcfe41b57fdepends_on_disabled_cells§runtime.published_object_keysdepends_on_skipped_cells§errored$f7da896d-089c-4430-b82c-db86c380b171queued¤logsrunning¦outputbody@

The first sum_to takes $n/p$ operations. Assuming factor is 2, there is one operation for each of the $\log_2(p)$ subsequent sum_to.

$$\begin{align} T_1 & = n\\ T_p & = n/p + \log_2(p)\\ S_p & = \frac{1}{1/p + \log_2(p)/n} & E_p & = \frac{1}{1 + p\log_2(p)/n} \end{align}$$

mimetext/htmlrootassigneelast_run_timestampAeҡʰpersist_js_state÷has_pluto_hook_features§cell_id$f7da896d-089c-4430-b82c-db86c380b171depends_on_disabled_cells§runtime@4published_object_keysdepends_on_skipped_cells§errored$655d59f3-665b-410b-b534-afadae96295bqueued¤logsrunning¦outputbody mimetext/htmlrootassigneelast_run_timestampAe̚persist_js_state·has_pluto_hook_features§cell_id$655d59f3-665b-410b-b534-afadae96295bdepends_on_disabled_cells§runtimeE published_object_keysdepends_on_skipped_cells§errored$5d7cd5e3-5fc2-4835-bea1-c4897467365bqueued¤logsrunning¦outputbody mimetext/htmlrootassigneelast_run_timestampAe9fpersist_js_state·has_pluto_hook_features§cell_id$5d7cd5e3-5fc2-4835-bea1-c4897467365bdepends_on_disabled_cells§runtime6 published_object_keysdepends_on_skipped_cells§errored$81da94b8-1bbf-4773-ba53-d229452cef75queued¤logsrunning¦outputbody256×256 Matrix{Float32}: 0.0597793 0.417787 0.126267 0.859484 … 0.371382 0.500273 0.938325 0.535802 0.242073 0.0661986 0.185958 0.154594 0.210403 0.471678 0.650056 0.062982 0.59131 0.536327 0.966751 0.744848 0.274031 0.318135 0.288982 0.104517 0.251897 0.251931 0.234469 0.0543474 0.427851 0.45264 0.0766009 0.310852 0.746904 0.00116313 0.693265 0.737975 0.119318 0.209818 0.86741 … 0.468451 0.139429 0.0551009 0.0502196 0.897035 0.595642 0.305991 0.0977772 0.160193 0.243116 ⋮ ⋱ ⋮ 0.420051 0.933573 0.288458 0.385103 … 0.733172 0.30978 0.991448 0.105571 0.65548 0.528511 0.605652 0.0840088 0.853477 0.7249 0.947079 0.30351 0.6983 0.179855 0.841775 0.839626 0.217399 0.143247 0.649893 0.37815 0.672719 0.215264 0.902336 0.00395232 0.00201213 0.883567 0.349935 0.920309 0.647945 0.369827 0.625111 0.579034 0.364038 0.0776384 0.0328473 … 0.573341 0.850136 0.074568mimetext/plainrootassigneematlast_run_timestampAeCݰpersist_js_state·has_pluto_hook_features§cell_id$81da94b8-1bbf-4773-ba53-d229452cef75depends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$258817e3-8495-4136-8cb9-00a4475245b2queued¤logsrunning¦outputbody
#include <omp.h>
#include <stdio.h>

extern "C" {
void sum_to(float *vec, int length, float *local_results, int num_threads, int verbose) {
  omp_set_dynamic(0); // Force the value `num_threads`
  omp_set_num_threads(num_threads);
  #pragma omp parallel
  {
    int thread_num = omp_get_thread_num();
	int stride = length / num_threads;
    int last = stride * (thread_num + 1);
    if (thread_num + 1 == num_threads)
      last = length;
	if (verbose >= 1)
      fprintf(stderr, "thread id : %d / %d %d:%d\n", thread_num, omp_get_num_threads(), stride * thread_num, last - 1);
	float no_false_sharing = 0;
    #pragma omp simd
    for (int i = stride * thread_num; i < last; i++)
      no_false_sharing += vec[i];
	local_results[thread_num] = no_false_sharing;
  }
}

float sum(float *vec, int length, int num_threads, int factor, int verbose) {
  float* buffers[2] = {new float[num_threads], new float[num_threads / factor]};
  sum_to(vec, length, buffers[0], num_threads, verbose);
  int prev = num_threads, cur;
  int buffer_idx = 0;
  for (cur = num_threads / factor; cur > 0; cur /= factor) {
	sum_to(buffers[buffer_idx % 2], prev, buffers[(buffer_idx + 1) % 2], cur, verbose);
	prev = cur;
	buffer_idx += 1;
  }
  if (prev == 1)
	return buffers[buffer_idx % 2][0];
  sum_to(buffers[buffer_idx % 2], prev, buffers[(buffer_idx + 1) % 2], 1, verbose);
  return buffers[(buffer_idx + 1) % 2][0];
}
}
mimetext/htmlrootassigneelast_run_timestampAe'persist_js_state·has_pluto_hook_features§cell_id$258817e3-8495-4136-8cb9-00a4475245b2depends_on_disabled_cells§runtime#published_object_keysdepends_on_skipped_cells§errored$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2queued¤logsrunning¦outputbodyĒ mimetext/htmlrootassigneelast_run_timestampAe>persist_js_state·has_pluto_hook_features§cell_id$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2depends_on_disabled_cells§runtimeJxpublished_object_keysdepends_on_skipped_cells§errored$dbe9c6d8-611e-46f9-9a8e-2b3647e813faqueued¤logsrunning¦outputbodylmimetext/htmlrootassigneelast_run_timestampAepersist_js_state·has_pluto_hook_features§cell_id$dbe9c6d8-611e-46f9-9a8e-2b3647e813fadepends_on_disabled_cells§runtimeLpublished_object_keysdepends_on_skipped_cells§errored$4ae73a28-d945-4c1b-a281-aa4931bf0bfdqueued¤logslinemsg& 79.369 μs (0 allocations: 0 bytes) text/plaincell_id$4ae73a28-d945-4c1b-a281-aa4931bf0bfdkwargsidPlutoRunner_d1acb81efileP/home/runner/.julia/packages/Pluto/1XRxx/src/runner/PlutoRunner/src/io/stdout.jlgroupstdoutlevelLogLevel(-555)running¦outputbody32710.873f0mimetext/plainrootassigneelast_run_timestampAeapersist_js_state·has_pluto_hook_features§cell_id$4ae73a28-d945-4c1b-a281-aa4931bf0bfddepends_on_disabled_cells§runtimeJҕpublished_object_keysdepends_on_skipped_cells§errored$19655acd-5880-44fa-ac29-d56faf43e87bqueued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAeӯ~persist_js_state·has_pluto_hook_features§cell_id$19655acd-5880-44fa-ac29-d56faf43e87bdepends_on_disabled_cells§runtime59published_object_keysdepends_on_skipped_cells§errored$b2b3beda-c8bf-4616-b1bd-bdd907d11636queued¤logsrunning¦outputbodychildrenٺ

Def: Speed-up

$$S_p = \frac{T_1}{T_p}$$

text/htmlchildrenٺ

Def: Efficiency

$$E_p = \frac{S_p}{p}$$

text/htmlclassnamestylemargin-left: 30px;'application/vnd.pluto.divelement+objectchildren]

Let $T_p$ bet the time with $p$ processes

text/htmlclassnamestyleflex-grow: 1; margin: 30px;'application/vnd.pluto.divelement+objectclassnamestyleQalign-items: center; display: flex; flex-direction: row; justify-content: center;mime'application/vnd.pluto.divelement+objectrootassigneelast_run_timestampAeDаpersist_js_state·has_pluto_hook_features§cell_id$b2b3beda-c8bf-4616-b1bd-bdd907d11636depends_on_disabled_cells§runtime lpublished_object_keysdepends_on_skipped_cells§errored$37d9b5f0-48b6-4ff3-873d-592230687995queued¤logsrunning¦outputbody>

Hierarchy

mimetext/htmlrootassigneelast_run_timestampAe҅persist_js_state÷has_pluto_hook_features§cell_id$37d9b5f0-48b6-4ff3-873d-592230687995depends_on_disabled_cells§runtimepublished_object_keysdepends_on_skipped_cells§errored$fa017c45-6410-4c14-b9a2-ede33759d396queued¤logsrunning¦outputbodymimetext/plainrootassigneelast_run_timestampAe60persist_js_state·has_pluto_hook_features§cell_id$fa017c45-6410-4c14-b9a2-ede33759d396depends_on_disabled_cells§runtimeKLpublished_object_keysdepends_on_skipped_cells§errored$91f7f3fe-cb4d-4461-9147-81c72b650a4bqueued¤logsrunning¦outputbody%img (generic function with 3 methods)mimetext/plainrootassigneelast_run_timestampAe٬x6persist_js_state·has_pluto_hook_features§cell_id$91f7f3fe-cb4d-4461-9147-81c72b650a4bdepends_on_disabled_cells§runtimeVjpublished_object_keysdepends_on_skipped_cells§errored$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0queued¤logslinemsgKthread id : 0 / 2 0:32767 thread id : 1 / 2 32768:65535 0.000062 seconds text/plaincell_id$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0kwargsidPlutoRunner_d1acb81efileP/home/runner/.julia/packages/Pluto/1XRxx/src/runner/PlutoRunner/src/io/stdout.jlgroupstdoutlevelLogLevel(-555)running¦outputbody32811.03f0mimetext/plainrootassigneelast_run_timestampAe=Lpersist_js_state·has_pluto_hook_features§cell_id$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0depends_on_disabled_cells§runtimedpublished_object_keysdepends_on_skipped_cells§errored$c6ea9bc5-bb15-4e25-a854-de3417d736a6queued¤logsrunning¦outputbodyw mimetext/htmlrootassigneelast_run_timestampAeA{persist_js_state·has_pluto_hook_features§cell_id$c6ea9bc5-bb15-4e25-a854-de3417d736a6depends_on_disabled_cells§runtime<۵published_object_keysdepends_on_skipped_cells§errored$02be0de6-70dc-4cf4-b630-b541a304eecdqueued¤logsrunning¦outputbodyoKmimetext/htmlrootassigneelast_run_timestampAe,԰persist_js_state·has_pluto_hook_features§cell_id$02be0de6-70dc-4cf4-b630-b541a304eecddepends_on_disabled_cells§runtimejpublished_object_keysdepends_on_skipped_cells§errored$f26f0a70-c16b-491d-b4cf-45ca146727c2queued¤logsrunning¦outputbody`

Illustration with matrices

mimetext/htmlrootassigneelast_run_timestampAeҜJpersist_js_state÷has_pluto_hook_features§cell_id$f26f0a70-c16b-491d-b4cf-45ca146727c2depends_on_disabled_cells§runtimeH2published_object_keysdepends_on_skipped_cells§errored±cell_dependenciesE$8a51b9c5-8888-4578-ae40-cf906ec9b5faprecedence_heuristic cell_id$8a51b9c5-8888-4578-ae40-cf906ec9b5fadownstream_cells_mapupstream_cells_map@md_strgetindex$19943be2-1633-48c9-8cb3-2a73fb96e4aeprecedence_heuristic cell_id$19943be2-1633-48c9-8cb3-2a73fb96e4aedownstream_cells_mapc_sum$4ae73a28-d945-4c1b-a281-aa4931bf0bfd$62297efe-3a15-4dab-ac17-b823ab3e7933$31727049-ac0a-45a6-aae0-934d4549b541$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0$a7118fbb-66d6-44a1-a6ae-839f0e42a3ecupstream_cells_mapsum_matrix_lib$fa017c45-6410-4c14-b9a2-ede33759d396PtrMatrixCfloatsizeCintccall$98a65469-573e-43b5-9043-f3d0f3198bccprecedence_heuristic cell_id$98a65469-573e-43b5-9043-f3d0f3198bccdownstream_cells_mapcolumn_wise$fa017c45-6410-4c14-b9a2-ede33759d396upstream_cells_map@md_strCoreBase.get@bindasideFoldablePlutoRunnerCheckBoxPlutoRunner.create_bondBaseCore.applicablegetindex$8e337fad-abcf-4ad3-bf75-ab3980f36baaprecedence_heuristic cell_id$8e337fad-abcf-4ad3-bf75-ab3980f36baadownstream_cells_mapmany_sum_md_code$258817e3-8495-4136-8cb9-00a4475245b2many_sum_lib$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96upstream_cells_mapmany_sum_code$050a67f8-7f02-4ac9-8ac4-20327d46c5e8compile_lib$9e78f2a1-0811-4f61-957d-ad4718430f7fprecedence_heuristic cell_id$9e78f2a1-0811-4f61-957d-ad4718430f7fdownstream_cells_mapupstream_cells_map@md_strgetindex$e83baa29-ad2b-4ffc-99f5-cdbca9e31233precedence_heuristic cell_id$e83baa29-ad2b-4ffc-99f5-cdbca9e31233downstream_cells_mapupstream_cells_map@md_strgetindex$96bffd66-24fc-46f7-b211-57e7d27bc316precedence_heuristic cell_id$96bffd66-24fc-46f7-b211-57e7d27bc316downstream_cells_mapmany_vec$a7118fbb-66d6-44a1-a6ae-839f0e42a3ec$910fc9b2-c57d-4874-b8b2-df440fc921c0upstream_cells_mapmany_log_size$6c021710-5828-4ac0-8619-ce690ba89d5f^Cfloatrand$658ca396-2d73-4c93-8138-33c101deee7bprecedence_heuristic cell_id$658ca396-2d73-4c93-8138-33c101deee7bdownstream_cells_mapupstream_cells_map@md_strgetindex$d5432907-3e55-4035-9c91-183c37d886eaprecedence_heuristic cell_id$d5432907-3e55-4035-9c91-183c37d886eadownstream_cells_mapnum_threads$31727049-ac0a-45a6-aae0-934d4549b541$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0log_size$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbaupstream_cells_map@md_strCore:vboxBase.get@bindSliderasideBasePlutoRunnerPlutoRunner.create_bondCore.applicablegetindex$e90fd21d-d046-4852-823c-5d7210068923precedence_heuristic cell_id$e90fd21d-d046-4852-823c-5d7210068923downstream_cells_mapupstream_cells_map@md_strgetindex$4b7a62a4-1e88-410b-8549-3021f6cdf6daprecedence_heuristic cell_id$4b7a62a4-1e88-410b-8549-3021f6cdf6dadownstream_cells_mapupstream_cells_map@md_strgetindex$b26ab400-ce89-4a76-ad48-464ac6821dd2precedence_heuristic cell_id$b26ab400-ce89-4a76-ad48-464ac6821dd2downstream_cells_mapupstream_cells_map@md_strgetindex$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1precedence_heuristiccell_id$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1downstream_cells_mapStaticArraysSimpleClangPlutoUI$3bff6f16-ad66-4aed-9a29-4bdcfe41b57f$91f7f3fe-cb4d-4461-9147-81c72b650a4bHypertextLiteral$3bff6f16-ad66-4aed-9a29-4bdcfe41b57f$91f7f3fe-cb4d-4461-9147-81c72b650a4bExperimentalLayoutBenchmarkTools$4ae73a28-d945-4c1b-a281-aa4931bf0bfd$62297efe-3a15-4dab-ac17-b823ab3e7933$31727049-ac0a-45a6-aae0-934d4549b541$a7118fbb-66d6-44a1-a6ae-839f0e42a3ec$910fc9b2-c57d-4874-b8b2-df440fc921c0$8b98b33e-f65d-4cbd-9e80-20a7132cd349Markdown$cd6c2bda-e80d-46ba-8242-f0f61a250471LuxorPlutoTeachingTools$3bff6f16-ad66-4aed-9a29-4bdcfe41b57f$91f7f3fe-cb4d-4461-9147-81c72b650a4bupstream_cells_map$1f45bab8-afb7-4cd2-8a37-1f258f37ad8fprecedence_heuristic cell_id$1f45bab8-afb7-4cd2-8a37-1f258f37ad8fdownstream_cells_mapupstream_cells_map@md_strgetindex$e7445ed8-cbf7-475d-bd67-3df8d9015de2precedence_heuristic cell_id$e7445ed8-cbf7-475d-bd67-3df8d9015de2downstream_cells_mapupstream_cells_map@md_strgetindex$ccfd4488-a32a-4b35-a922-2e830f91ca08precedence_heuristic cell_id$ccfd4488-a32a-4b35-a922-2e830f91ca08downstream_cells_mapc_sum_matrix$fa017c45-6410-4c14-b9a2-ede33759d396upstream_cells_mapCCode*$ebe1cd42-ba25-4538-acbe-353e0e47009eprecedence_heuristic cell_id$ebe1cd42-ba25-4538-acbe-353e0e47009edownstream_cells_mapsum_md_code$4b9cfb4d-2355-42e3-be2f-35e2638e984bsum_lib$253bd533-99b7-4012-b3f4-e86a2466a919upstream_cells_mapno_false_sharing$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5elocal_results$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5ec_sum_code$19655acd-5880-44fa-ac29-d56faf43e87bcompile_lib$8b98b33e-f65d-4cbd-9e80-20a7132cd349precedence_heuristic cell_id$8b98b33e-f65d-4cbd-9e80-20a7132cd349downstream_cells_mapupstream_cells_mapBenchmarkTools$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5eprecedence_heuristic cell_id$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5edownstream_cells_mapno_false_sharing$ebe1cd42-ba25-4538-acbe-353e0e47009elocal_results$ebe1cd42-ba25-4538-acbe-353e0e47009eno_difffalse_sharingupstream_cells_map@md_strCoreBase.get@bindasideFoldablePlutoRunnerCheckBoxPlutoRunner.create_bondBaseCore.applicablegetindex$f95dd40b-8c56-4e10-abbc-3dbb58148e1fprecedence_heuristic cell_id$f95dd40b-8c56-4e10-abbc-3dbb58148e1fdownstream_cells_mapupstream_cells_map@md_strgetindex$d8238145-9787-40f0-a151-1ef73d8c97eeprecedence_heuristic cell_id$d8238145-9787-40f0-a151-1ef73d8c97eedownstream_cells_mapupstream_cells_maphbox=>img$91f7f3fe-cb4d-4461-9147-81c72b650a4b$a32ba8f2-a9c9-41c6-99b4-577f0823bd9fprecedence_heuristic cell_id$a32ba8f2-a9c9-41c6-99b4-577f0823bd9fdownstream_cells_mapupstream_cells_map@md_strgetindex$cd6c2bda-e80d-46ba-8242-f0f61a250471precedence_heuristic cell_id$cd6c2bda-e80d-46ba-8242-f0f61a250471downstream_cells_mapdefinition$b2b3beda-c8bf-4616-b1bd-bdd907d11636upstream_cells_mapMarkdown.AdmonitionMarkdown.MDMarkdown$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1$253bd533-99b7-4012-b3f4-e86a2466a919precedence_heuristic cell_id$253bd533-99b7-4012-b3f4-e86a2466a919downstream_cells_mapc_sum$4ae73a28-d945-4c1b-a281-aa4931bf0bfd$62297efe-3a15-4dab-ac17-b823ab3e7933$31727049-ac0a-45a6-aae0-934d4549b541$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0$a7118fbb-66d6-44a1-a6ae-839f0e42a3ecupstream_cells_maplengthsum_lib$ebe1cd42-ba25-4538-acbe-353e0e47009ePtrCfloatCintccallVector$910fc9b2-c57d-4874-b8b2-df440fc921c0precedence_heuristic cell_id$910fc9b2-c57d-4874-b8b2-df440fc921c0downstream_cells_mapupstream_cells_mapBenchmarkTools.Parameters,BenchmarkTools.generate_benchmark_definition##many_vec#310many_vec$96bffd66-24fc-46f7-b211-57e7d27bc316#___this_pluto_module_nameprintlnBenchmarkTools$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1base_num_threads$6c021710-5828-4ac0-8619-ce690ba89d5fmany_sum$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96@btimefactor$6c021710-5828-4ac0-8619-ce690ba89d5f==$6f70144e-5240-41ef-a719-8a8942e18feeprecedence_heuristic cell_id$6f70144e-5240-41ef-a719-8a8942e18feedownstream_cells_mapupstream_cells_mapimg$91f7f3fe-cb4d-4461-9147-81c72b650a4b$caec43a3-9bac-4f73-a8e5-288cfa9e1606precedence_heuristic cell_id$caec43a3-9bac-4f73-a8e5-288cfa9e1606downstream_cells_mapupstream_cells_mapasidebibcite$d52b5227-0ef4-4dc6-b443-03364b0caeff$db24839c-eb42-4d5c-8545-3714abc01bc5precedence_heuristic cell_id$db24839c-eb42-4d5c-8545-3714abc01bc5downstream_cells_mapupstream_cells_map@md_strgetindex$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96precedence_heuristic cell_id$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96downstream_cells_mapmany_sum$947b8e5c-9cb7-4fe6-aff6-48416879fb43$910fc9b2-c57d-4874-b8b2-df440fc921c0upstream_cells_mapccalllengthPtrCfloatCintmany_sum_lib$8e337fad-abcf-4ad3-bf75-ab3980f36baaVector$ea9ff1a9-615d-4e18-a4c8-9aad20447156precedence_heuristic cell_id$ea9ff1a9-615d-4e18-a4c8-9aad20447156downstream_cells_mapupstream_cells_mapasidebibcite$d52b5227-0ef4-4dc6-b443-03364b0caeff$c0bda86a-136b-45ca-84ba-7365c367d265precedence_heuristic cell_id$c0bda86a-136b-45ca-84ba-7365c367d265downstream_cells_mapupstream_cells_map@md_strgetindex$11b1c6a8-3918-4dda-9028-17af2d6c44c4precedence_heuristic cell_id$11b1c6a8-3918-4dda-9028-17af2d6c44c4downstream_cells_mapupstream_cells_map@md_strgetindex$d221bad8-98fb-4c1d-9c9c-66e1b697f023precedence_heuristic cell_id$d221bad8-98fb-4c1d-9c9c-66e1b697f023downstream_cells_mapupstream_cells_map@md_strgetindex$a7118fbb-66d6-44a1-a6ae-839f0e42a3ecprecedence_heuristic cell_id$a7118fbb-66d6-44a1-a6ae-839f0e42a3ecdownstream_cells_mapupstream_cells_mapBenchmarkTools.Parametersc_sum$19943be2-1633-48c9-8cb3-2a73fb96e4ae$253bd533-99b7-4012-b3f4-e86a2466a919##many_vec#304@btime,BenchmarkTools.generate_benchmark_definitionmany_vec$96bffd66-24fc-46f7-b211-57e7d27bc316#___this_pluto_module_name==printlnBenchmarkTools$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1$6e8865f5-84ad-4083-bb19-57ad1b561fabprecedence_heuristic cell_id$6e8865f5-84ad-4083-bb19-57ad1b561fabdownstream_cells_mapupstream_cells_map@md_strgetindex$d718f117-41da-42ff-9bcd-8bef0e7e6974precedence_heuristic cell_id$d718f117-41da-42ff-9bcd-8bef0e7e6974downstream_cells_mapupstream_cells_map@md_strgetindex$3887824b-7c7f-4c24-bf6d-7a55ed7adc89precedence_heuristic cell_id$3887824b-7c7f-4c24-bf6d-7a55ed7adc89downstream_cells_mapupstream_cells_map@md_strgetindex$050a67f8-7f02-4ac9-8ac4-20327d46c5e8precedence_heuristic cell_id$050a67f8-7f02-4ac9-8ac4-20327d46c5e8downstream_cells_mapmany_sum_code$8e337fad-abcf-4ad3-bf75-ab3980f36baaupstream_cells_mapCppCode$62297efe-3a15-4dab-ac17-b823ab3e7933precedence_heuristic cell_id$62297efe-3a15-4dab-ac17-b823ab3e7933downstream_cells_mapupstream_cells_mapBenchmarkTools.Parametersc_sum$19943be2-1633-48c9-8cb3-2a73fb96e4ae$253bd533-99b7-4012-b3f4-e86a2466a919vec$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cba@btime,BenchmarkTools.generate_benchmark_definition##vec#292BenchmarkTools$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1#___this_pluto_module_name==println$31727049-ac0a-45a6-aae0-934d4549b541precedence_heuristic cell_id$31727049-ac0a-45a6-aae0-934d4549b541downstream_cells_mapupstream_cells_mapBenchmarkTools.Parametersc_sum$19943be2-1633-48c9-8cb3-2a73fb96e4ae$253bd533-99b7-4012-b3f4-e86a2466a919,BenchmarkTools.generate_benchmark_definition##vec#298BenchmarkTools$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1#___this_pluto_module_nameprintlnvec$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cba@btimenum_threads$d5432907-3e55-4035-9c91-183c37d886ea==$2a1f3d29-4d6b-4634-86f3-4ecd4a7821a2precedence_heuristic cell_id$2a1f3d29-4d6b-4634-86f3-4ecd4a7821a2downstream_cells_mapupstream_cells_map@md_strgetindex$81465bf1-8e54-461f-892c-2769bf94fdfeprecedence_heuristic cell_id$81465bf1-8e54-461f-892c-2769bf94fdfedownstream_cells_mapupstream_cells_map@md_strgetindex$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbaprecedence_heuristic cell_id$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbadownstream_cells_mapvec$62297efe-3a15-4dab-ac17-b823ab3e7933$31727049-ac0a-45a6-aae0-934d4549b541$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0$947b8e5c-9cb7-4fe6-aff6-48416879fb43upstream_cells_map^Cfloatrandlog_size$d5432907-3e55-4035-9c91-183c37d886ea$6c021710-5828-4ac0-8619-ce690ba89d5fprecedence_heuristic cell_id$6c021710-5828-4ac0-8619-ce690ba89d5fdownstream_cells_mapmany_log_size$96bffd66-24fc-46f7-b211-57e7d27bc316base_num_threads$947b8e5c-9cb7-4fe6-aff6-48416879fb43$910fc9b2-c57d-4874-b8b2-df440fc921c0factor$947b8e5c-9cb7-4fe6-aff6-48416879fb43$910fc9b2-c57d-4874-b8b2-df440fc921c0upstream_cells_map@md_strCore:vboxBase.get@bindSliderasideBasePlutoRunnerPlutoRunner.create_bondCore.applicablegetindex$e867d9be-5668-4756-af7f-c23c48962f08precedence_heuristic cell_id$e867d9be-5668-4756-af7f-c23c48962f08downstream_cells_mapupstream_cells_mapasidebibcite$d52b5227-0ef4-4dc6-b443-03364b0caeff$de0bbef2-1240-4f85-889f-0af509d6cfffprecedence_heuristic cell_id$de0bbef2-1240-4f85-889f-0af509d6cfffdownstream_cells_mapupstream_cells_map@md_strasidebibcite$d52b5227-0ef4-4dc6-b443-03364b0caefftipgetindex$d52b5227-0ef4-4dc6-b443-03364b0caeffprecedence_heuristic cell_id$d52b5227-0ef4-4dc6-b443-03364b0caeffdownstream_cells_mapbibcite$e867d9be-5668-4756-af7f-c23c48962f08$caec43a3-9bac-4f73-a8e5-288cfa9e1606$de0bbef2-1240-4f85-889f-0af509d6cfff$ea9ff1a9-615d-4e18-a4c8-9aad20447156$c6ea9bc5-bb15-4e25-a854-de3417d736a6upstream_cells_map*$138caa9b-1d53-4c01-a3b9-c1a097413736precedence_heuristic cell_id$138caa9b-1d53-4c01-a3b9-c1a097413736downstream_cells_mapupstream_cells_mapURL$91f7f3fe-cb4d-4461-9147-81c72b650a4bimg$91f7f3fe-cb4d-4461-9147-81c72b650a4b$d1aef3d4-33d1-4151-8ba3-2169f734ea6bprecedence_heuristic cell_id$d1aef3d4-33d1-4151-8ba3-2169f734ea6bdownstream_cells_mapupstream_cells_map@md_strFoldablegetindex$947b8e5c-9cb7-4fe6-aff6-48416879fb43precedence_heuristic cell_id$947b8e5c-9cb7-4fe6-aff6-48416879fb43downstream_cells_mapupstream_cells_map@timeBase.gc_alloc_countBase.*Base.-Base.gc_numbase_num_threads$6c021710-5828-4ac0-8619-ce690ba89d5fBase.cumulative_compile_time_nsBase.GC_DiffBase.===Base.stringBasevec$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbamany_sum$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96Base.time_printfactor$6c021710-5828-4ac0-8619-ce690ba89d5fBase.time_nsBase.Threads.lock_profilingBase./Base.cumulative_compile_timing$4b9cfb4d-2355-42e3-be2f-35e2638e984bprecedence_heuristic cell_id$4b9cfb4d-2355-42e3-be2f-35e2638e984bdownstream_cells_mapupstream_cells_mapsum_md_code$ebe1cd42-ba25-4538-acbe-353e0e47009e$3bff6f16-ad66-4aed-9a29-4bdcfe41b57fprecedence_heuristic cell_id$3bff6f16-ad66-4aed-9a29-4bdcfe41b57fdownstream_cells_mapupstream_cells_mapPlutoTeachingTools$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1HypertextLiteral.BypassHypertextLiteral.ResultHypertextLiteral$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1PlutoUI$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1HypertextLiteral.content$PlutoTeachingTools.ChooseDisplayMode@htlPlutoUI.TableOfContents$f7da896d-089c-4430-b82c-db86c380b171precedence_heuristic cell_id$f7da896d-089c-4430-b82c-db86c380b171downstream_cells_mapupstream_cells_map@md_strgetindex$655d59f3-665b-410b-b534-afadae96295bprecedence_heuristic cell_id$655d59f3-665b-410b-b534-afadae96295bdownstream_cells_mapupstream_cells_map@md_strasidegetindex$5d7cd5e3-5fc2-4835-bea1-c4897467365bprecedence_heuristic cell_id$5d7cd5e3-5fc2-4835-bea1-c4897467365bdownstream_cells_mapupstream_cells_mapasidesum_matrix_code$fa017c45-6410-4c14-b9a2-ede33759d396$81da94b8-1bbf-4773-ba53-d229452cef75precedence_heuristic cell_id$81da94b8-1bbf-4773-ba53-d229452cef75downstream_cells_mapmat$4ae73a28-d945-4c1b-a281-aa4931bf0bfdupstream_cells_map^Cfloatrand$258817e3-8495-4136-8cb9-00a4475245b2precedence_heuristic cell_id$258817e3-8495-4136-8cb9-00a4475245b2downstream_cells_mapupstream_cells_mapmany_sum_md_code$8e337fad-abcf-4ad3-bf75-ab3980f36baa$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2precedence_heuristic cell_id$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2downstream_cells_mapupstream_cells_map@md_strasideURL$91f7f3fe-cb4d-4461-9147-81c72b650a4b=>img$91f7f3fe-cb4d-4461-9147-81c72b650a4bgetindex$dbe9c6d8-611e-46f9-9a8e-2b3647e813faprecedence_heuristic cell_id$dbe9c6d8-611e-46f9-9a8e-2b3647e813fadownstream_cells_mapfiledirupstream_cells_mapruncmd_genjoinpath@cmdmktempdirimg$91f7f3fe-cb4d-4461-9147-81c72b650a4b$4ae73a28-d945-4c1b-a281-aa4931bf0bfdprecedence_heuristic cell_id$4ae73a28-d945-4c1b-a281-aa4931bf0bfddownstream_cells_mapupstream_cells_mapmat$81da94b8-1bbf-4773-ba53-d229452cef75BenchmarkTools.Parametersc_sum$19943be2-1633-48c9-8cb3-2a73fb96e4ae$253bd533-99b7-4012-b3f4-e86a2466a919@btime,BenchmarkTools.generate_benchmark_definitionBenchmarkTools$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1##mat#286==println#___this_pluto_module_name$19655acd-5880-44fa-ac29-d56faf43e87bprecedence_heuristic cell_id$19655acd-5880-44fa-ac29-d56faf43e87bdownstream_cells_mapc_sum_code$ebe1cd42-ba25-4538-acbe-353e0e47009eupstream_cells_map*BoolCppCode$b2b3beda-c8bf-4616-b1bd-bdd907d11636precedence_heuristic cell_id$b2b3beda-c8bf-4616-b1bd-bdd907d11636downstream_cells_mapupstream_cells_map@md_strhboxdefinition$cd6c2bda-e80d-46ba-8242-f0f61a250471DictDiv=>getindex$37d9b5f0-48b6-4ff3-873d-592230687995precedence_heuristic cell_id$37d9b5f0-48b6-4ff3-873d-592230687995downstream_cells_mapupstream_cells_map@md_strgetindex$fa017c45-6410-4c14-b9a2-ede33759d396precedence_heuristic cell_id$fa017c45-6410-4c14-b9a2-ede33759d396downstream_cells_mapsum_matrix_code$5d7cd5e3-5fc2-4835-bea1-c4897467365bsum_matrix_lib$19943be2-1633-48c9-8cb3-2a73fb96e4aeupstream_cells_mapc_sum_matrix$ccfd4488-a32a-4b35-a922-2e830f91ca08column_wise$98a65469-573e-43b5-9043-f3d0f3198bcccompile_lib$91f7f3fe-cb4d-4461-9147-81c72b650a4bprecedence_heuristic cell_id$91f7f3fe-cb4d-4461-9147-81c72b650a4bdownstream_cells_mapsave_imagePath$91f7f3fe-cb4d-4461-9147-81c72b650a4bimgpathURL$138caa9b-1d53-4c01-a3b9-c1a097413736$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2$91f7f3fe-cb4d-4461-9147-81c72b650a4bimg$dbe9c6d8-611e-46f9-9a8e-2b3647e813fa$138caa9b-1d53-4c01-a3b9-c1a097413736$02be0de6-70dc-4cf4-b630-b541a304eecd$d8238145-9787-40f0-a151-1ef73d8c97ee$6f70144e-5240-41ef-a719-8a8942e18fee$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2upstream_cells_map!HypertextLiteral.BypassPlutoUI.LocalResourcePlutoUI$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1joinpathHypertextLiteral.contentstartswithStringendPlutoTeachingTools$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1@htlHypertextLiteral.attribute_pair&PlutoTeachingTools.RobustLocalResourcePath$91f7f3fe-cb4d-4461-9147-81c72b650a4bHypertextLiteral.ResultHypertextLiteral$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1URL$91f7f3fe-cb4d-4461-9147-81c72b650a4b*@__DIR__insplit$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0precedence_heuristic cell_id$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0downstream_cells_mapupstream_cells_map@timeBase.gc_alloc_countBase.*c_sum$19943be2-1633-48c9-8cb3-2a73fb96e4ae$253bd533-99b7-4012-b3f4-e86a2466a919Base.-Base.gc_numBase.cumulative_compile_time_nsBase.GC_DiffBase.===Base.stringBasevec$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbaBase.time_printBase.time_nsnum_threads$d5432907-3e55-4035-9c91-183c37d886eaBase.Threads.lock_profilingBase./Base.cumulative_compile_timing$c6ea9bc5-bb15-4e25-a854-de3417d736a6precedence_heuristic cell_id$c6ea9bc5-bb15-4e25-a854-de3417d736a6downstream_cells_mapupstream_cells_mapasidebibcite$d52b5227-0ef4-4dc6-b443-03364b0caeff$02be0de6-70dc-4cf4-b630-b541a304eecdprecedence_heuristic cell_id$02be0de6-70dc-4cf4-b630-b541a304eecddownstream_cells_mapupstream_cells_mapimg$91f7f3fe-cb4d-4461-9147-81c72b650a4b$f26f0a70-c16b-491d-b4cf-45ca146727c2precedence_heuristic cell_id$f26f0a70-c16b-491d-b4cf-45ca146727c2downstream_cells_mapupstream_cells_map@md_strgetindexcell_execution_orderE$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1$3bff6f16-ad66-4aed-9a29-4bdcfe41b57f$8a51b9c5-8888-4578-ae40-cf906ec9b5fa$d52b5227-0ef4-4dc6-b443-03364b0caeff$3887824b-7c7f-4c24-bf6d-7a55ed7adc89$655d59f3-665b-410b-b534-afadae96295b$37d9b5f0-48b6-4ff3-873d-592230687995$81465bf1-8e54-461f-892c-2769bf94fdfe$e867d9be-5668-4756-af7f-c23c48962f08$a32ba8f2-a9c9-41c6-99b4-577f0823bd9f$658ca396-2d73-4c93-8138-33c101deee7b$caec43a3-9bac-4f73-a8e5-288cfa9e1606$f26f0a70-c16b-491d-b4cf-45ca146727c2$81da94b8-1bbf-4773-ba53-d229452cef75$98a65469-573e-43b5-9043-f3d0f3198bcc$ccfd4488-a32a-4b35-a922-2e830f91ca08$fa017c45-6410-4c14-b9a2-ede33759d396$19943be2-1633-48c9-8cb3-2a73fb96e4ae$5d7cd5e3-5fc2-4835-bea1-c4897467365b$c0bda86a-136b-45ca-84ba-7365c367d265$11b1c6a8-3918-4dda-9028-17af2d6c44c4$de0bbef2-1240-4f85-889f-0af509d6cfff$6e8865f5-84ad-4083-bb19-57ad1b561fab$d221bad8-98fb-4c1d-9c9c-66e1b697f023$ea9ff1a9-615d-4e18-a4c8-9aad20447156$9e78f2a1-0811-4f61-957d-ad4718430f7f$e90fd21d-d046-4852-823c-5d7210068923$c6ea9bc5-bb15-4e25-a854-de3417d736a6$e7445ed8-cbf7-475d-bd67-3df8d9015de2$d5432907-3e55-4035-9c91-183c37d886ea$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cba$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5e$19655acd-5880-44fa-ac29-d56faf43e87b$ebe1cd42-ba25-4538-acbe-353e0e47009e$4b9cfb4d-2355-42e3-be2f-35e2638e984b$253bd533-99b7-4012-b3f4-e86a2466a919$4ae73a28-d945-4c1b-a281-aa4931bf0bfd$62297efe-3a15-4dab-ac17-b823ab3e7933$31727049-ac0a-45a6-aae0-934d4549b541$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0$1f45bab8-afb7-4cd2-8a37-1f258f37ad8f$db24839c-eb42-4d5c-8545-3714abc01bc5$d718f117-41da-42ff-9bcd-8bef0e7e6974$6c021710-5828-4ac0-8619-ce690ba89d5f$96bffd66-24fc-46f7-b211-57e7d27bc316$a7118fbb-66d6-44a1-a6ae-839f0e42a3ec$050a67f8-7f02-4ac9-8ac4-20327d46c5e8$8e337fad-abcf-4ad3-bf75-ab3980f36baa$258817e3-8495-4136-8cb9-00a4475245b2$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96$947b8e5c-9cb7-4fe6-aff6-48416879fb43$910fc9b2-c57d-4874-b8b2-df440fc921c0$f95dd40b-8c56-4e10-abbc-3dbb58148e1f$2a1f3d29-4d6b-4634-86f3-4ecd4a7821a2$b26ab400-ce89-4a76-ad48-464ac6821dd2$4b7a62a4-1e88-410b-8549-3021f6cdf6da$e83baa29-ad2b-4ffc-99f5-cdbca9e31233$f7da896d-089c-4430-b82c-db86c380b171$d1aef3d4-33d1-4151-8ba3-2169f734ea6b$8b98b33e-f65d-4cbd-9e80-20a7132cd349$91f7f3fe-cb4d-4461-9147-81c72b650a4b$dbe9c6d8-611e-46f9-9a8e-2b3647e813fa$138caa9b-1d53-4c01-a3b9-c1a097413736$02be0de6-70dc-4cf4-b630-b541a304eecd$d8238145-9787-40f0-a151-1ef73d8c97ee$6f70144e-5240-41ef-a719-8a8942e18fee$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2$cd6c2bda-e80d-46ba-8242-f0f61a250471$b2b3beda-c8bf-4616-b1bd-bdd907d11636last_hot_reload_timeshortpath2_shared.jlprocess_statusreadypath:/home/runner/work/LINMA2710/LINMA2710/Lectures/2_shared.jlpluto_versionv0.20.24last_save_timeAeιzȪcell_orderE$3bff6f16-ad66-4aed-9a29-4bdcfe41b57f$8a51b9c5-8888-4578-ae40-cf906ec9b5fa$d52b5227-0ef4-4dc6-b443-03364b0caeff$3887824b-7c7f-4c24-bf6d-7a55ed7adc89$dbe9c6d8-611e-46f9-9a8e-2b3647e813fa$655d59f3-665b-410b-b534-afadae96295b$37d9b5f0-48b6-4ff3-873d-592230687995$138caa9b-1d53-4c01-a3b9-c1a097413736$81465bf1-8e54-461f-892c-2769bf94fdfe$e867d9be-5668-4756-af7f-c23c48962f08$a32ba8f2-a9c9-41c6-99b4-577f0823bd9f$02be0de6-70dc-4cf4-b630-b541a304eecd$658ca396-2d73-4c93-8138-33c101deee7b$caec43a3-9bac-4f73-a8e5-288cfa9e1606$f26f0a70-c16b-491d-b4cf-45ca146727c2$4ae73a28-d945-4c1b-a281-aa4931bf0bfd$81da94b8-1bbf-4773-ba53-d229452cef75$19943be2-1633-48c9-8cb3-2a73fb96e4ae$5d7cd5e3-5fc2-4835-bea1-c4897467365b$98a65469-573e-43b5-9043-f3d0f3198bcc$fa017c45-6410-4c14-b9a2-ede33759d396$ccfd4488-a32a-4b35-a922-2e830f91ca08$c0bda86a-136b-45ca-84ba-7365c367d265$11b1c6a8-3918-4dda-9028-17af2d6c44c4$de0bbef2-1240-4f85-889f-0af509d6cfff$6e8865f5-84ad-4083-bb19-57ad1b561fab$d8238145-9787-40f0-a151-1ef73d8c97ee$d221bad8-98fb-4c1d-9c9c-66e1b697f023$ea9ff1a9-615d-4e18-a4c8-9aad20447156$9e78f2a1-0811-4f61-957d-ad4718430f7f$6f70144e-5240-41ef-a719-8a8942e18fee$e90fd21d-d046-4852-823c-5d7210068923$c6ea9bc5-bb15-4e25-a854-de3417d736a6$e7445ed8-cbf7-475d-bd67-3df8d9015de2$4b9cfb4d-2355-42e3-be2f-35e2638e984b$62297efe-3a15-4dab-ac17-b823ab3e7933$31727049-ac0a-45a6-aae0-934d4549b541$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cba$253bd533-99b7-4012-b3f4-e86a2466a919$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2$d5432907-3e55-4035-9c91-183c37d886ea$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5e$ebe1cd42-ba25-4538-acbe-353e0e47009e$19655acd-5880-44fa-ac29-d56faf43e87b$1f45bab8-afb7-4cd2-8a37-1f258f37ad8f$258817e3-8495-4136-8cb9-00a4475245b2$db24839c-eb42-4d5c-8545-3714abc01bc5$d718f117-41da-42ff-9bcd-8bef0e7e6974$947b8e5c-9cb7-4fe6-aff6-48416879fb43$a7118fbb-66d6-44a1-a6ae-839f0e42a3ec$910fc9b2-c57d-4874-b8b2-df440fc921c0$96bffd66-24fc-46f7-b211-57e7d27bc316$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96$6c021710-5828-4ac0-8619-ce690ba89d5f$8e337fad-abcf-4ad3-bf75-ab3980f36baa$050a67f8-7f02-4ac9-8ac4-20327d46c5e8$f95dd40b-8c56-4e10-abbc-3dbb58148e1f$2a1f3d29-4d6b-4634-86f3-4ecd4a7821a2$b2b3beda-c8bf-4616-b1bd-bdd907d11636$b26ab400-ce89-4a76-ad48-464ac6821dd2$4b7a62a4-1e88-410b-8549-3021f6cdf6da$e83baa29-ad2b-4ffc-99f5-cdbca9e31233$f7da896d-089c-4430-b82c-db86c380b171$d1aef3d4-33d1-4151-8ba3-2169f734ea6b$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1$8b98b33e-f65d-4cbd-9e80-20a7132cd349$91f7f3fe-cb4d-4461-9147-81c72b650a4b$cd6c2bda-e80d-46ba-8242-f0f61a250471published_objects573dc2256-4ab3-11f1-b967-89f2e690ee32/3583170c6e86b605childrenO

many_log_size = 16

text/html4

base_num_threads = 2

text/html

factor = 2

text/htmlclassnamestyle&display: flex; flex-direction: column;573dc2256-4ab3-11f1-b967-89f2e690ee32/8850476c856a19d0childrenE

log_size = 16

text/html*

num_threads = 2

text/htmlclassnamestyle&display: flex; flex-direction: column;nbpkginstall_time_ns|+&instantiatedòinstalled_versionsSimpleClang0.1.0StaticArrays1.9.17PlutoUI0.7.79HypertextLiteral1.0.0BenchmarkTools1.6.3Luxor4.4.1MarkdownstdlibPlutoTeachingTools0.4.7terminal_outputsStaticArrays Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...SimpleClang Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...nbpkg_sync Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...PlutoUI Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...HypertextLiteral Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...BenchmarkTools Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...Markdown Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...Luxor Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...PlutoTeachingTools Resolving... ===  Project No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Project.toml`  Updating `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_voodkaakaf/Manifest.toml` [d091e8ba] + Xorg_libXfixes_jll v6.0.2+0 [a65dc6b1] + Xorg_libpciaccess_jll v0.19.0+0 [8e53e030] + libdrm_jll v2.4.125+1 [9a156e7d] + libva_jll v2.23.0+0 Instantiating... === Precompiling... === Waiting for notebook process to start... Done. Starting precompilation...enabled÷restart_recommended_msgrestart_required_msgbusy_packageswaiting_for_permission,waiting_for_permission_but_probably_disabled«cell_inputsE$8a51b9c5-8888-4578-ae40-cf906ec9b5facell_id$8a51b9c5-8888-4578-ae40-cf906ec9b5facodeٙmd"[Eij10] V. Eijkhout. [Introduction to High Performance Scientific Computing.](https://theartofhpc.com/istc.html) 3 Edition, Vol. 1 (Lulu.com, 2010). "metadatashow_logsèdisabled®skip_as_script«code_folded$19943be2-1633-48c9-8cb3-2a73fb96e4aecell_id$19943be2-1633-48c9-8cb3-2a73fb96e4aecodexc_sum(x::Matrix{Cfloat}) = ccall(("sum", sum_matrix_lib), Cfloat, (Ptr{Cfloat}, Cint, Cint), x, size(x, 1), size(x, 2));metadatashow_logsèdisabled®skip_as_script«code_folded$98a65469-573e-43b5-9043-f3d0f3198bcccell_id$98a65469-573e-43b5-9043-f3d0f3198bcccodekaside( Foldable( md"What is the performance issue of this code ?", md"The way matrices are represented by Julia in memory is by concatenating all columns as single continuous memory. This means that it is more efficient to access the matrix column by column ! Switch to column-wise sum $(@bind column_wise CheckBox(default = false))", ), v_offset = -275 )metadatashow_logsèdisabled®skip_as_script«code_folded$8e337fad-abcf-4ad3-bf75-ab3980f36baacell_id$8e337fad-abcf-4ad3-bf75-ab3980f36baacodeymany_sum_md_code, many_sum_lib = compile_lib(many_sum_code("float"), lib = true, cflags = ["-O3", "-mavx2", "-fopenmp"]);metadatashow_logsèdisabled®skip_as_script«code_folded$9e78f2a1-0811-4f61-957d-ad4718430f7fcell_id$9e78f2a1-0811-4f61-957d-ad4718430f7fcode+md"## Cache hierarchy for a multi-core CPU"metadatashow_logsèdisabled®skip_as_script«code_folded$e83baa29-ad2b-4ffc-99f5-cdbca9e31233cell_id$e83baa29-ad2b-4ffc-99f5-cdbca9e31233code"md"## Application to parallel sum"metadatashow_logsèdisabled®skip_as_script«code_folded$96bffd66-24fc-46f7-b211-57e7d27bc316cell_id$96bffd66-24fc-46f7-b211-57e7d27bc316code(many_vec = rand(Cfloat, 2^many_log_size)metadatashow_logsèdisabled®skip_as_script«code_folded$658ca396-2d73-4c93-8138-33c101deee7bcell_id$658ca396-2d73-4c93-8138-33c101deee7bcodegmd""" * Accessing value not in the cache → *cache miss* * This value is then loaded along with a whole cache line (e.g., 64 or 128 contiguous bytes) * Following cache lines may also be anticipated and prefetched This shows the importance of *data locality*. An algorithm performs better if it accesses data close in memory and in a predictable pattern. """metadatashow_logsèdisabled®skip_as_script«code_folded$d5432907-3e55-4035-9c91-183c37d886eacell_id$d5432907-3e55-4035-9c91-183c37d886eacodeaside(vbox([ md"`log_size` = $(@bind log_size Slider(14:24, default = 16, show_value = true))", md"`num_threads` = $(@bind num_threads Slider(2:8, default = 2, show_value = true))", ]), v_offset = -900)metadatashow_logsèdisabled®skip_as_script«code_folded$e90fd21d-d046-4852-823c-5d7210068923cell_id$e90fd21d-d046-4852-823c-5d7210068923codehmd""" *Cache coherence* : Update L1 cache when the corresponding memory is modified by another core. """metadatashow_logsèdisabled®skip_as_script«code_folded$4b7a62a4-1e88-410b-8549-3021f6cdf6dacell_id$4b7a62a4-1e88-410b-8549-3021f6cdf6dacode%md""" * ``F_s`` : Fraction of ``T_1`` that is sequential * ``F_p = 1 - F_s`` : Fraction of ``T_1`` that is parallelizable ```math \begin{align} T_p &= T_1F_s + T_1F_p/p\\ S_p &= \frac{1}{F_s + F_p/p} & E_p &= \frac{1}{pF_s + F_p}\\ \lim_{p \to \infty} S_p &= \frac{1}{F_s} \end{align} ``` """metadatashow_logsèdisabled®skip_as_script«code_folded$b26ab400-ce89-4a76-ad48-464ac6821dd2cell_id$b26ab400-ce89-4a76-ad48-464ac6821dd2codemd"## Amdahl's law"metadatashow_logsèdisabled®skip_as_script«code_folded$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1cell_id$34519b36-0e60-4c2c-92d6-3b8ed71e6ad1codeًusing PlutoUI, PlutoUI.ExperimentalLayout, HypertextLiteral, Luxor, StaticArrays, BenchmarkTools, PlutoTeachingTools, Markdown, SimpleClangmetadatashow_logsèdisabled®skip_as_script«code_folded$1f45bab8-afb7-4cd2-8a37-1f258f37ad8fcell_id$1f45bab8-afb7-4cd2-8a37-1f258f37ad8fcodemd"## Many processors"metadatashow_logsèdisabled®skip_as_script«code_folded$e7445ed8-cbf7-475d-bd67-3df8d9015de2cell_id$e7445ed8-cbf7-475d-bd67-3df8d9015de2codemd"# Parallel sum"metadatashow_logsèdisabled®skip_as_script«code_folded$ccfd4488-a32a-4b35-a922-2e830f91ca08cell_id$ccfd4488-a32a-4b35-a922-2e830f91ca08codefunction c_sum_matrix(T; column_wise) code = """ #include $T sum($T *mat, int n, int m) { $T total = 0; """ idx = column_wise ? 'j' : 'i' len = column_wise ? 'm' : 'n' code *= """ for (int $idx = 0; $idx < $len; $idx++) { """ idx = column_wise ? 'i' : 'j' len = column_wise ? 'n' : 'm' code *= """ for (int $idx = 0; $idx < $len; $idx++) { """ code *= """ total += mat[i + j * n]; } } return total; } """ return CCode(code) end;metadatashow_logsèdisabled®skip_as_script«code_folded$ebe1cd42-ba25-4538-acbe-353e0e47009ecell_id$ebe1cd42-ba25-4538-acbe-353e0e47009ecodesum_md_code, sum_lib = compile_lib(c_sum_code("float"; local_results, no_false_sharing, simd = true), lib = true, cflags = ["-O3", "-mavx2", "-I/usr/lib/llvm-18/lib/clang/18/include", "-fopenmp"]);metadatashow_logsèdisabled®skip_as_script«code_folded$8b98b33e-f65d-4cbd-9e80-20a7132cd349cell_id$8b98b33e-f65d-4cbd-9e80-20a7132cd349code/BenchmarkTools.DEFAULT_PARAMETERS.seconds = 0.2metadatashow_logsèdisabled®skip_as_script«code_folded$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5ecell_id$1b9fb8aa-71cf-4e69-ad84-666c1b66bb5ecodebegin no_diff = Foldable( md"Wait, these didn't make any difference in the benchmark, how can it be ?", md""" The compiler most probably use a register (actually a SIMD register here (if there is any) since we used `#pragma omp simd`) as accumulator for the `for` loop and only stored the value of that register into `total`, `local_results[thread_num]` or `no_false_sharing` (depending on the version). Despite all this, it is still important to be careful about this issue and not trust the execution on one environment or rely too much on compiler optimizations for the code to be portable. """, ) false_sharing = Foldable( md"This is still a performance issue, can you see why ?", md""" The entries of `local_results` are close to each other in memory. There are therefore very likely going to be part of the same block on cache. This means that when one threads modifies it, the cache block will need to be written and then other threads will need to refresh the value of this block in their cache. variable `total` is shared between the threads, so its value in the register should be sync'ed between the threads! This is called *false sharing*. Let's fix this ? $(@bind no_false_sharing CheckBox(default = false)) $no_diff """, ) aside(Foldable( md"Can you spot the issue in the code ?", md""" The same variable `total` is shared between the threads, so its value in the register should be sync'ed between the threads, this is a performance issue! More importantly, the access to the `total` variable are not **atomic**. Therefore, two threads may read the value, `add`, and then store → only one of the two `add` will then be accounted for! Let's fix this ? $(@bind local_results CheckBox(default = false)) $false_sharing """, ), v_offset = -800) endmetadatashow_logsèdisabled®skip_as_script«code_folded$f95dd40b-8c56-4e10-abbc-3dbb58148e1fcell_id$f95dd40b-8c56-4e10-abbc-3dbb58148e1fcodemd"# Amdahl's law"metadatashow_logsèdisabled®skip_as_script«code_folded$d8238145-9787-40f0-a151-1ef73d8c97eecell_id$d8238145-9787-40f0-a151-1ef73d8c97eecodeAhbox([ img("https://github.com/VictorEijkhout/TheArtOfHPC_vol1_scientificcomputing/raw/refs/heads/main/booksources/graphics/roofline1.jpeg", :height => "260px"), img("https://github.com/VictorEijkhout/TheArtOfHPC_vol1_scientificcomputing/raw/refs/heads/main/booksources/graphics/roofline3.jpeg", :height => "260px"), ])metadatashow_logsèdisabled®skip_as_script«code_folded$a32ba8f2-a9c9-41c6-99b4-577f0823bd9fcell_id$a32ba8f2-a9c9-41c6-99b4-577f0823bd9fcodemd"## Cache lines and prefetch"metadatashow_logsèdisabled®skip_as_script«code_folded$cd6c2bda-e80d-46ba-8242-f0f61a250471cell_id$cd6c2bda-e80d-46ba-8242-f0f61a250471codezfunction definition(name, content) return Markdown.MD(Markdown.Admonition("key-concept", "Def: $name", [content])) endmetadatashow_logsèdisabled®skip_as_script«code_folded$253bd533-99b7-4012-b3f4-e86a2466a919cell_id$253bd533-99b7-4012-b3f4-e86a2466a919codeٞc_sum(x::Vector{Cfloat}; num_threads = 1, verbose = 0) = ccall(("sum", sum_lib), Cfloat, (Ptr{Cfloat}, Cint, Cint, Cint), x, length(x), num_threads, verbose);metadatashow_logsèdisabled®skip_as_script«code_folded$910fc9b2-c57d-4874-b8b2-df440fc921c0cell_id$910fc9b2-c57d-4874-b8b2-df440fc921c0code4@btime many_sum($many_vec; base_num_threads, factor)metadatashow_logsèdisabled®skip_as_script«code_folded$6f70144e-5240-41ef-a719-8a8942e18feecell_id$6f70144e-5240-41ef-a719-8a8942e18feecodeيimg("https://github.com/VictorEijkhout/TheArtOfHPC_vol1_scientificcomputing/raw/refs/heads/main/booksources/graphics/cache-hierarchy.jpg")metadatashow_logsèdisabled®skip_as_script«code_folded$caec43a3-9bac-4f73-a8e5-288cfa9e1606cell_id$caec43a3-9bac-4f73-a8e5-288cfa9e1606code.aside(bibcite("Figure 1.11"), v_offset = -280)metadatashow_logsèdisabled®skip_as_script«code_folded$db24839c-eb42-4d5c-8545-3714abc01bc5cell_id$db24839c-eb42-4d5c-8545-3714abc01bc5codemd"## Benchmark"metadatashow_logsèdisabled®skip_as_script«code_folded$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96cell_id$6657e4dd-f5c2-47c4-b0d6-a2a56aac7b96codemany_sum(x::Vector{Cfloat}; base_num_threads = 1, factor = 2, verbose = 0) = ccall(("sum", many_sum_lib), Cfloat, (Ptr{Cfloat}, Cint, Cint, Cint, Cint), x, length(x), base_num_threads, factor, verbose);metadatashow_logsèdisabled®skip_as_script«code_folded$ea9ff1a9-615d-4e18-a4c8-9aad20447156cell_id$ea9ff1a9-615d-4e18-a4c8-9aad20447156code.aside(bibcite("Figure 1.16"), v_offset = -360)metadatashow_logsèdisabled®skip_as_script«code_folded$c0bda86a-136b-45ca-84ba-7365c367d265cell_id$c0bda86a-136b-45ca-84ba-7365c367d265codemd"## Arithmetic intensity"metadatashow_logsèdisabled®skip_as_script«code_folded$11b1c6a8-3918-4dda-9028-17af2d6c44c4cell_id$11b1c6a8-3918-4dda-9028-17af2d6c44c4codemd""" Consider a program requiring `m` load / store operations with memory for `o` arithmetic operations. * The *arithmetic intensity* is the ratio ``a = o / m``. * The arithmetic time is ``t_\text{arith} = o / \text{frequency}`` * The data transfer time is ``t_\text{mem} = m / \text{bandwidth} = o / (a \cdot \text{bandwidth})`` As arithmetic operations and data transfer are done in parallel, the time per iteration is ```math \max(t_\text{arith}, t_\text{mem}) / o = 1/\min(\text{frequency}, a \cdot \text{bandwidth}) ``` So the number of operations per second is ``\min(\text{frequency}, a \cdot \text{bandwidth})``. This piecewise linear function in ``a`` gives the *roofline model*. """metadatashow_logsèdisabled®skip_as_script«code_folded$d221bad8-98fb-4c1d-9c9c-66e1b697f023cell_id$d221bad8-98fb-4c1d-9c9c-66e1b697f023codemd""" * *compute-bound* : For large arithmetic intensity (Alg2 in above picture), performance determined by processor characteristics * *bandwidth-bound* : For low arithmetic intensity (Alg1 in above picture), performance determined by memory characteristics * Bandwidth line may be lowered by inefficient memory access (e.g., no locality) * Peak performance line may be lowered by inefficient use of CPU (e.g., not using SIMD) """metadatashow_logsèdisabled®skip_as_script«code_folded$a7118fbb-66d6-44a1-a6ae-839f0e42a3eccell_id$a7118fbb-66d6-44a1-a6ae-839f0e42a3eccode@btime c_sum($many_vec)metadatashow_logsèdisabled®skip_as_script«code_folded$6e8865f5-84ad-4083-bb19-57ad1b561fabcell_id$6e8865f5-84ad-4083-bb19-57ad1b561fabcodemd"## The roofline model"metadatashow_logsèdisabled®skip_as_script«code_folded$d718f117-41da-42ff-9bcd-8bef0e7e6974cell_id$d718f117-41da-42ff-9bcd-8bef0e7e6974codeTmd""" If we have many processors, we may want to speed up the last part as well: """metadatashow_logsèdisabled®skip_as_script«code_folded$3887824b-7c7f-4c24-bf6d-7a55ed7adc89cell_id$3887824b-7c7f-4c24-bf6d-7a55ed7adc89codemd"# Memory layout"metadatashow_logsèdisabled®skip_as_script«code_folded$050a67f8-7f02-4ac9-8ac4-20327d46c5e8cell_id$050a67f8-7f02-4ac9-8ac4-20327d46c5e8codefunction many_sum_code(T) code = """ #include #include extern "C" { void sum_to($T *vec, int length, $T *local_results, int num_threads, int verbose) { omp_set_dynamic(0); // Force the value `num_threads` omp_set_num_threads(num_threads); #pragma omp parallel { int thread_num = omp_get_thread_num(); int stride = length / num_threads; int last = stride * (thread_num + 1); if (thread_num + 1 == num_threads) last = length; if (verbose >= 1) fprintf(stderr, "thread id : %d / %d %d:%d\\n", thread_num, omp_get_num_threads(), stride * thread_num, last - 1); $T no_false_sharing = 0; #pragma omp simd for (int i = stride * thread_num; i < last; i++) no_false_sharing += vec[i]; local_results[thread_num] = no_false_sharing; } } $T sum($T *vec, int length, int num_threads, int factor, int verbose) { $T* buffers[2] = {new $T[num_threads], new $T[num_threads / factor]}; sum_to(vec, length, buffers[0], num_threads, verbose); int prev = num_threads, cur; int buffer_idx = 0; for (cur = num_threads / factor; cur > 0; cur /= factor) { sum_to(buffers[buffer_idx % 2], prev, buffers[(buffer_idx + 1) % 2], cur, verbose); prev = cur; buffer_idx += 1; } if (prev == 1) return buffers[buffer_idx % 2][0]; sum_to(buffers[buffer_idx % 2], prev, buffers[(buffer_idx + 1) % 2], 1, verbose); return buffers[(buffer_idx + 1) % 2][0]; } } """ return CppCode(code) end;metadatashow_logsèdisabled®skip_as_script«code_folded$62297efe-3a15-4dab-ac17-b823ab3e7933cell_id$62297efe-3a15-4dab-ac17-b823ab3e7933code0@btime c_sum($vec, num_threads = 1, verbose = 0)metadatashow_logsèdisabled®skip_as_script«code_folded$31727049-ac0a-45a6-aae0-934d4549b541cell_id$31727049-ac0a-45a6-aae0-934d4549b541code,@btime c_sum($vec; num_threads, verbose = 0)metadatashow_logsèdisabled®skip_as_script«code_folded$2a1f3d29-4d6b-4634-86f3-4ecd4a7821a2cell_id$2a1f3d29-4d6b-4634-86f3-4ecd4a7821a2codemd"## Speed-up and efficency"metadatashow_logsèdisabled®skip_as_script«code_folded$81465bf1-8e54-461f-892c-2769bf94fdfecell_id$81465bf1-8e54-461f-892c-2769bf94fdfecode٣md"""Latency of `n` bytes of data is given by ```math \alpha + \beta n ``` where ``\alpha`` is the start up time and ``\beta`` is the inverse of the bandwidth. """metadatashow_logsèdisabled®skip_as_script«code_folded$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbacell_id$3a5d674d-7c5b-4dac-b9ae-d65a1e9a5cbacodevec = rand(Cfloat, 2^log_size)metadatashow_logsèdisabled®skip_as_script«code_folded$6c021710-5828-4ac0-8619-ce690ba89d5fcell_id$6c021710-5828-4ac0-8619-ce690ba89d5fcode*aside(vbox([ md"`many_log_size` = $(@bind many_log_size Slider(14:24, default = 16, show_value = true))", md"`base_num_threads` = $(@bind base_num_threads Slider(2:8, default = 2, show_value = true))", md"`factor` = $(@bind factor Slider(2:8, default = 2, show_value = true))", ]), v_offset = -500)metadatashow_logsèdisabled®skip_as_script«code_folded$e867d9be-5668-4756-af7f-c23c48962f08cell_id$e867d9be-5668-4756-af7f-c23c48962f08code-aside(bibcite("Figure 1.5"), v_offset = -300)metadatashow_logsèdisabled®skip_as_script«code_folded$de0bbef2-1240-4f85-889f-0af509d6cfffcell_id$de0bbef2-1240-4f85-889f-0af509d6cfffcodeR aside(tip(md""" See examples in $(bibcite("Section 1.6.1")). """), v_offset=-300)metadatashow_logsèdisabled®skip_as_script«code_folded$d52b5227-0ef4-4dc6-b443-03364b0caeffcell_id$d52b5227-0ef4-4dc6-b443-03364b0caeffcode(bibcite(what) = "[Eij10; " * what * "]";metadatashow_logsèdisabled®skip_as_script«code_folded$138caa9b-1d53-4c01-a3b9-c1a097413736cell_id$138caa9b-1d53-4c01-a3b9-c1a097413736codeىimg(URL("https://github.com/VictorEijkhout/TheArtOfHPC_vol1_scientificcomputing/raw/refs/heads/main/booksources/graphics/hierarchy.jpg"))metadatashow_logsèdisabled®skip_as_script«code_folded$d1aef3d4-33d1-4151-8ba3-2169f734ea6bcell_id$d1aef3d4-33d1-4151-8ba3-2169f734ea6bcode3Foldable(md"How to get ``1/F_s = \lim_{p \to \infty} S_p`` ?", md""" The algorithm cannot use more than ``n`` processes so if ``p \ge n``, we have ``T_p = 1 + \log_2(n)``. Therefore, ``\lim_{p \to \infty} S_p = S_n = \frac{n}{1 + \log_2(n)}``. Ignoring the constant ``1``, we get ``F_s = \log_2(n)/n``. """)metadatashow_logsèdisabled®skip_as_script«code_folded$947b8e5c-9cb7-4fe6-aff6-48416879fb43cell_id$947b8e5c-9cb7-4fe6-aff6-48416879fb43code:@time many_sum(vec; base_num_threads, factor, verbose = 1)metadatashow_logsèdisabled®skip_as_script«code_folded$4b9cfb4d-2355-42e3-be2f-35e2638e984bcell_id$4b9cfb4d-2355-42e3-be2f-35e2638e984bcodesum_md_codemetadatashow_logsèdisabled®skip_as_script«code_folded$3bff6f16-ad66-4aed-9a29-4bdcfe41b57fcell_id$3bff6f16-ad66-4aed-9a29-4bdcfe41b57fcode@htl("""

LINMA2710 - Scientific Computing Shared-Memory Multiprocessing

P.-A. Absil and B. Legat

$(PlutoTeachingTools.ChooseDisplayMode()) $(PlutoUI.TableOfContents(depth=1)) """)metadatashow_logsèdisabled®skip_as_script«code_folded$f7da896d-089c-4430-b82c-db86c380b171cell_id$f7da896d-089c-4430-b82c-db86c380b171code4md""" The first `sum_to` takes ``n/p`` operations. Assuming `factor` is `2`, there is one operation for each of the ``\log_2(p)`` subsequent `sum_to`. ```math \begin{align} T_1 & = n\\ T_p & = n/p + \log_2(p)\\ S_p & = \frac{1}{1/p + \log_2(p)/n} & E_p & = \frac{1}{1 + p\log_2(p)/n} \end{align} ```"""metadatashow_logsèdisabled®skip_as_script«code_folded$655d59f3-665b-410b-b534-afadae96295bcell_id$655d59f3-665b-410b-b534-afadae96295bcodeEaside(md"Try it on your laptop! ```sh $ lstopo ```", v_offset = -200)metadatashow_logsèdisabled®skip_as_script«code_folded$5d7cd5e3-5fc2-4835-bea1-c4897467365bcell_id$5d7cd5e3-5fc2-4835-bea1-c4897467365bcode'aside(sum_matrix_code, v_offset = -470)metadatashow_logsèdisabled®skip_as_script«code_folded$81da94b8-1bbf-4773-ba53-d229452cef75cell_id$81da94b8-1bbf-4773-ba53-d229452cef75codemat = rand(Cfloat, 2^8, 2^8)metadatashow_logsèdisabled®skip_as_script«code_folded$258817e3-8495-4136-8cb9-00a4475245b2cell_id$258817e3-8495-4136-8cb9-00a4475245b2codemany_sum_md_codemetadatashow_logsèdisabled®skip_as_script«code_folded$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2cell_id$a144f991-3af4-4f88-a7e8-c1f3e9d7f7e2codeaside(md""" Low level implementation using POSIX Threads (pthreads) covered in "LEPL1503 : Projet 3". We use the high level $(img(URL("https://upload.wikimedia.org/wikipedia/commons/e/eb/OpenMP_logo.png"), :width => "45pt")) library in this course. """, v_offset = -1000)metadatashow_logsèdisabled®skip_as_script«code_folded$dbe9c6d8-611e-46f9-9a8e-2b3647e813facell_id$dbe9c6d8-611e-46f9-9a8e-2b3647e813facodejbegin dir = mktempdir() file = joinpath(dir, "topo.png") run(`lstopo $file`) img(file) endmetadatashow_logsèdisabled®skip_as_script«code_folded$4ae73a28-d945-4c1b-a281-aa4931bf0bfdcell_id$4ae73a28-d945-4c1b-a281-aa4931bf0bfdcode@btime c_sum($mat)metadatashow_logsèdisabled®skip_as_script«code_folded$19655acd-5880-44fa-ac29-d56faf43e87bcell_id$19655acd-5880-44fa-ac29-d56faf43e87bcodefunction c_sum_code(T; local_results::Bool, no_false_sharing::Bool, simd::Bool) code = """ #include #include #include #include extern "C" { $T sum($T *vec, int length, int num_threads, int verbose) { $T total = 0; omp_set_dynamic(0); // Force the value `num_threads` omp_set_num_threads(num_threads); """ if local_results code *= """ std::vector<$T> local_results(num_threads); """ end code *= """ #pragma omp parallel { int thread_num = omp_get_thread_num(); int stride = length / num_threads; int last = stride * (thread_num + 1); if (thread_num + 1 == num_threads) last = length; if (verbose >= 1) fprintf(stderr, "thread id : %d / %d %d:%d\\n", thread_num, omp_get_num_threads(), stride * thread_num, last - 1); """ if no_false_sharing code *= """ $T no_false_sharing = 0; """ end if simd code *= """ #pragma omp simd """ end code *= """ for (int i = stride * thread_num; i < last; i++) $(local_results ? (no_false_sharing ? "no_false_sharing" : "local_results[thread_num]") : "total") += vec[i]; """ if local_results && no_false_sharing code *= """ local_results[thread_num] = no_false_sharing; """ end code *= """ } """ if local_results code *= """ for (int i = 0; i < local_results.size(); i++) total += local_results[i]; """ end code *= """ return total; } } """ return CppCode(code) end;metadatashow_logsèdisabled®skip_as_script«code_folded$b2b3beda-c8bf-4616-b1bd-bdd907d11636cell_id$b2b3beda-c8bf-4616-b1bd-bdd907d11636codehbox([ definition("Speed-up", md""" ```math S_p = \frac{T_1}{T_p} ```"""), Div(definition("Efficiency", md""" ```math E_p = \frac{S_p}{p} ```"""); style = Dict("margin-left" => "30px")), Div(md""" Let ``T_p`` bet the time with ``p`` processes * ``E_p > 1`` → Unlikely * ``E_p = 1`` → Ideal * ``E_p < 1`` → Realistic """; style = Dict("margin" => "30px", "flex-grow" => "1")), ]; style = Dict("align-items" => "center", "justify-content" => "center"))metadatashow_logsèdisabled®skip_as_script«code_folded$37d9b5f0-48b6-4ff3-873d-592230687995cell_id$37d9b5f0-48b6-4ff3-873d-592230687995codemd"## Hierarchy"metadatashow_logsèdisabled®skip_as_script«code_folded$fa017c45-6410-4c14-b9a2-ede33759d396cell_id$fa017c45-6410-4c14-b9a2-ede33759d396codeىsum_matrix_code, sum_matrix_lib = compile_lib(c_sum_matrix("float"; column_wise), lib = true, cflags = ["-O3", "-mavx2", "-ffast-math"]);metadatashow_logsèdisabled®skip_as_script«code_folded$91f7f3fe-cb4d-4461-9147-81c72b650a4bcell_id$91f7f3fe-cb4d-4461-9147-81c72b650a4bcodesbegin struct Path path::String end function imgpath(path::Path) file = path.path if !('.' in file) file = file * ".png" end return joinpath(joinpath(@__DIR__, "images", file)) end function img(path::Path, args...; kws...) return PlutoUI.LocalResource(imgpath(path), args...) end struct URL url::String end function save_image(url::URL, html_attributes...; name = split(url.url, '/')[end], kws...) path = joinpath("cache", name) return PlutoTeachingTools.RobustLocalResource(url.url, path, html_attributes...), path end function img(url::URL, args...; kws...) r, _ = save_image(url, args...; kws...) return @htl("$r") end function img(file::String, args...; kws...) if startswith(file, "http") img(URL(file), args...; kws...) else img(Path(file), args...; kws...) end end endmetadatashow_logsèdisabled®skip_as_script«code_folded$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0cell_id$b5f1d18c-53ad-4441-8ca8-e02d6ab840d0code*@time c_sum(vec; num_threads, verbose = 1)metadatashow_logsèdisabled®skip_as_script«code_folded$c6ea9bc5-bb15-4e25-a854-de3417d736a6cell_id$c6ea9bc5-bb15-4e25-a854-de3417d736a6code.aside(bibcite("Figure 1.13"), v_offset = -250)metadatashow_logsèdisabled®skip_as_script«code_folded$02be0de6-70dc-4cf4-b630-b541a304eecdcell_id$02be0de6-70dc-4cf4-b630-b541a304eecdcodeلimg("https://github.com/VictorEijkhout/TheArtOfHPC_vol1_scientificcomputing/raw/refs/heads/main/booksources/graphics/prefetch.jpeg")metadatashow_logsèdisabled®skip_as_script«code_folded$f26f0a70-c16b-491d-b4cf-45ca146727c2cell_id$f26f0a70-c16b-491d-b4cf-45ca146727c2code!md"## Illustration with matrices"metadatashow_logsèdisabled®skip_as_script«code_foldedënotebook_id$73dc2256-4ab3-11f1-b967-89f2e690ee32in_temp_dir¨metadata