instlatx64.bsky.social
x86/x64, SIMD, #AVX512, "Aha!" moments. I have been writing code since 1986. Budapest, Europe https://github.com/InstLatx64/InstLatx64

#Intel refreshed the #AVX10_1 specification to 3.1: cdrdv2.intel.com/v1/dl/getCon...

#AMD refreshed the "Revision Guide for AMD Family 1Ah Models 00h-0Fh Processors" 58251 to v1.10 pdf (#Turin #Zen5 C1 CPUID B00F21 #EPYC) www.amd.com/content/dam/...

#AMD refreshed the "Revision Guide for AMD Family 1Ah Models 10h-1Fh Processors" 58730 to v1.10 pdf (#TurinD #Zen5c B0 CPUID B10F10 #EPYC) www.amd.com/content/dam/...
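The CPUID signatures quoted in these posts (B00F21, B10F10) map onto the "Family 1Ah Models ..." wording of the revision guides through the standard CPUID leaf 1 EAX layout. A minimal sketch of that decoding follows; the function name `decode_cpuid_signature` is hypothetical, and the extended-model rule is simplified to cover both vendors' current parts (an assumption, not vendor code).

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf 1 EAX signature into family, model and stepping.

    Hypothetical helper for illustration only.
    """
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # Simplifying assumption: apply the extended model for base family
    # 0x6 or 0xF, which covers current Intel and AMD parts.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {"family": family, "model": model, "stepping": stepping}


if __name__ == "__main__":
    for sig in (0xB00F21, 0xB10F10):
        d = decode_cpuid_signature(sig)
        print(f"{sig:X}: family {d['family']:X}h, "
              f"model {d['model']:02X}h, stepping {d['stepping']}")
```

With this decoding, B00F21 lands in Family 1Ah Models 00h-0Fh and B10F10 in Family 1Ah Models 10h-1Fh, consistent with the two revision guide titles above.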

This is an important question: #AVX512_VP2INTERSECT is useful for finding duplicates, since it compresses the results of 256 comparisons into two 16-bit masks. Source / Idea: x.com/GrakenKrenia...
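To make the "256 comparisons into two 16-bit masks" point concrete, here is a scalar sketch of what the 512-bit dword form of VP2INTERSECT computes: every lane of one input is compared against every lane of the other (16 x 16), and each input gets one mask bit per lane that found a match. The function name `vp2intersect_emulated` is hypothetical, and this is a behavioural model only, not the hardware implementation or a performance-relevant one.

```python
def vp2intersect_emulated(a, b):
    """Scalar model of a 16-lane VP2INTERSECT (hypothetical helper).

    Returns (k1, k2): bit i of k1 is set if a[i] equals any element of b,
    bit j of k2 is set if b[j] equals any element of a.
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("expected two 16-lane inputs")
    k1 = 0
    k2 = 0
    for i, x in enumerate(a):          # 16 lanes of the first input
        for j, y in enumerate(b):      # 16 lanes of the second input
            if x == y:                 # one of the 256 comparisons
                k1 |= 1 << i
                k2 |= 1 << j
    return k1, k2


if __name__ == "__main__":
    a = list(range(16))
    b = [100 + i for i in range(16)]
    b[3] = 5
    b[7] = 5
    k1, k2 = vp2intersect_emulated(a, b)
    print(f"k1={k1:016b} k2={k2:016b}")
```

In the example, lane 5 of `a` matches lanes 3 and 7 of `b`, so one bit is set in the first mask and two bits in the second.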

Good news: according to ark.intel.com, every #Intel #GraniteRapids-based #Xeon6 6900P, 6700P, 6500P SKU supports #AVX512 with dual 512b FMA, even the #GraniteRidgeD ones www.intel.com/content/www/...

This is a bit surprising, all #Intel #Xeon6 63xxP SKUs are #RaptorCove based, LGA1700:

#AMD refreshed the "AMD I/O Virtualization Technology (IOMMU) Specification" to 3.10: www.amd.com/content/dam/...

Trace of a working A0 #PantherLake (3/3.2GHz, CPUID C06C0, #Intel 18A process) in coreboot project: chromium.googlesource.com/chromiumos/t...

The latest #Intel "Software Security Guidance" reveals the CPUID stepping number of #GrandRidge too: B0664 www.intel.com/content/www/...

Interesting factors:
- #GoldenCove has 3 load ports (the reason for its advantage over #CypressCove)
- zmm store readout is 7 vs 11+ clks on #AMD
- AMD Zens have just 10 register read ports (mentioned in the optimization guide)
- 1 clk ternlog latency (2 clks on the 2nd & 3rd operand on AMD): www.uops.info/html-lat/ZEN...

There is always faster code: my current byte-histogram results vs 2024. It is interesting how much closer #Intel #GoldenCove is to the theoretical limit (0.343 <-> 0.372) than #AMD #Zen5 (0.207 <-> 2.54) x.com/InstLatX64/s... #GFNI

#AMD #Zen6-updated #AVX512 / #AMX Euler-diagram: "#AVX512_VP2INTERSECT will continue to be supported in AMD processors going forward" Source: x.com/LeslieB82382... GitHub: github.com/InstLatx64/I...

@haroldaptroot.bsky.social after @instlatx64.bsky.social pointed me to your blog post about histogramming, I showed it to a coworker. He was so impressed (as was I) that he wrote his own blog post about it. github.com/JoernEngel/j...

#Intel microcode refresh 20250211: github.com/intel/Intel-... Release Notes: github.com/intel/Intel-...

#Intel mentions the brand name of #GrandRidge (#Atom P6900, #Crestmont, CPUID B0660) here: www.intel.com/content/www/...

Unfortunately the diagram was wrong; hopefully it's correct now: based on the current #AMD #EPYC 9965 dump, #Zen5 also shifts the APIC bits to the right when SMT is off, like #Zen2 and #Zen3 (and unlike #Zen4). EPYC 9965: github.com/InstLatx64/I...

New dump: -- 2x 192-Core #AMD #EPYC 9965 (#TurinD, #Zen5c) B10F10 CPUID dump (SMT Off) GitHub: github.com/InstLatx64/I...

End of an era: #AIDA64 doesn't support Win9x anymore

#AMD released the "Smart Data Cache Injection (SDCI)" White Paper 58725 v1.00: www.amd.com/content/dam/...

MCExtractor DB r322 reveals #AMD #MI300 CPUIDs: A80F00, A80F01 = AMD MI300C (#Zen4 + #HBM3) A90F00, A90F01 = AMD MI300A (#Zen4 + #CDNA3 + #HBM3) github.com/platomav/CPU...

#AMD released the "Versioned Chip Endorsement Key (VCEK) Certificate and KDS Interface Specification" 57230 1.00 pdf www.amd.com/content/dam/...

#Intel projects 2025+ v37 #PantherLake #NovaLake #RazerLake #TitanLake #BartlettLake #WildcatLake #GraniteRapids #DiamondRapids #CoralRapids #ClearwaterForest #RogueRiverForest #APX #AVX10_2

#VisualStudio2022 17.13.0 Preview 4 got the #EVEX-encoded, zmm/512b #SM4 cipher support: #DiamondRapids #AVX10_2

A very nice phrase search example from Gabriel Menezes using the #AVX512 VP2INTERSECT instruction - too bad it's deprecated on #Intel processors and its future on #AMD #Zen6 is unclear. gab-menezes.github.io/2025/01/13/u...

#Intel #GNR-W and #GNR-E are mentioned on intel.com: www.intel.com/content/www/... #GraniteRapidsW #GraniteRapidsE

@AgnerFog_ has a new version of the NaN propagation paper: www.agner.org/optimize/nan...

New #Intel #RaptorLake / #BartlettLake? steppings appeared among the microcode updates: B06F6, B06F7 winraid.level1techs.com/t/intel-amd-...

#AMD refreshed the "SEV Secure Nested Paging Firmware ABI Specification" to v1.57: www.amd.com/content/dam/...

I'm only asking because I'm currently working on the #Intel #LionCove / #Skymont port assignment (e.g. this diagram is wrong, vector SQRT/DIV uses V01, not V23), but if there's no interest, I won't bother publishing it.

#Intel x86 / x64 projects 2025+ v0.33

#Intel refreshed the "Resource Director Technology (Intel® RDT) Architecture Specification" with the "Region Aware Memory Bandwidth Allocation" feature: cdrdv2-public.intel.com/789566/35668... #DiamondRapids 356688-002US pdf

#Intel refreshed the Advanced Performance Extensions (#Intel #APX) Architecture Specification to 6.0: cdrdv2-public.intel.com/844828/35582... #AMX_MOVRS #AMX_TRANSPOSE #MSR_IMM

#Intel refreshed the #AVX10_2 specification to 3.0: cdrdv2-public.intel.com/844829/36105... AVX10.2 is unified now with levels of #GraniteRidge + #AMX_AVX512 #SM4 #MOVRS, #AMX_TRANSPOSE #AMX_FP8 #AMX_TF32 still missing

In the #AVX512 world, that's just 8 uops on the critical path:

Test code in InstLatX64_Demo: github.com/InstLatx64/I...

This is so cool! #AMD #StrixHalo #AVX512

Addendum: There is no port or throughput change here, 1|.5 P06 will be 3|.5 P06 (and 1|1 P1 -> 3|1 P1) #Intel, #GoldenCove, #RaptorCove, #RedwoodCove

New dump: -- 8C AMD Ryzen 7 8845HS (Hawk Point, Zen4) A70F52 CPUID dump GitHub: github.com/InstLatx64/I...

What explains the unusually low interest in #Intel #LionCove? I posted it over 2 weeks ago, Intel/Agner/Uops.info didn't elaborate on it until now, yet no one reposted it. Bad timing? Credibility? I find it hard to believe that no one is interested in the details of the new Intel P-core.