abhi9u.bsky.social
NetBSD Dev | Python Internals, AI, compilers, databases, & performance engineering | https://blog.codingconfessions.com/
83 posts 1,268 followers 129 following
Regular Contributor
Active Commenter
comment in response to post
No Starch has a C++ Crash Course book, but it is 700+ pages. The language has become a monster.
comment in response to post
In my latest article I take you through each step of the design of this ingenious algorithm, right from scratch. blog.codingconfessions.com/p/how-unix-s...
comment in response to post
The story of how he designed this dictionary lookup from the ground up is a lesson in software engineering: how to consider the resource constraints in front of you and design your solution to run within them.
comment in response to post
He analyzed the mathematical properties of dictionary lookups and developed a compression algorithm that came within 0.03 bits of the theoretical minimum possible compression - a feat that remains unmatched even today.
comment in response to post
The constraints of the PDP-11 computer meant the entire dictionary needed to fit in just 64kB of RAM. A seemingly impossible task. Instead of using generic compression techniques, McIlroy took a different approach.
comment in response to post
PDF: web.stanford.edu/class/ee398a...
comment in response to post
This is similar to saying modern hardware supports SIMD, or that it is superscalar, when those things became common a few decades ago and are no longer particularly modern. But it still feels wrong not to write "modern", because sure enough there is someone out there running an ancient 16-bit processor without them.
comment in response to post
Link to the article: blog.codingconfessions.com/p/the-cap-th...
comment in response to post
2. Richness: The algorithm should be capable of generating all possible clusterings of the data. 3. Consistency: If we move similar points closer and dissimilar points farther apart, the clustering shouldn't change. The paper proves mathematically that it is impossible to have all three properties.
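For anyone who wants the three properties stated more precisely, here is a rough formalization in the spirit of Kleinberg's paper (a clustering function f takes a distance function d on a point set S and returns a partition of S); this is my paraphrase, not a quote from the paper:

```latex
% Sketch of Kleinberg's axioms (my paraphrase).
\begin{itemize}
  \item Scale-invariance: $f(\alpha \cdot d) = f(d)$ for every $\alpha > 0$.
  \item Richness: for every partition $\Gamma$ of $S$ there exists a distance
        function $d$ with $f(d) = \Gamma$.
  \item Consistency: if $\Gamma = f(d)$ and $d'$ only shrinks distances within
        clusters of $\Gamma$ and only grows distances between clusters,
        then $f(d') = \Gamma$.
\end{itemize}
% Impossibility theorem: for $|S| \ge 2$, no clustering function $f$
% satisfies all three properties simultaneously.
```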
comment in response to post
It is based on a 2002 paper by Jon Kleinberg. The paper argues that an ideal clustering algorithm should have three properties. 1. Scale-invariance: Changing the scale of the data shouldn't change the clustering algorithm's output
comment in response to post
I've had it since childhood, when I had no devices. I asked everyone whether they could hear this sound and found out that it was just me 😂
comment in response to post
That's pretty interesting. Thank you for the detailed response. :) What do you do about explaining the more complicated algorithms? I was listening to Jeremy Siek's workshop and he mentioned that his book covers some of it in one chapter and he usually redirects students to other books.
comment in response to post
How did you get rid of it?
comment in response to post
The comments there are just terrible. They're either unrelated to the topic, or, in order to sound smart, they nitpick some small point and harp on it.
comment in response to post
I was looking at the BPF JIT compiler in Linux recently. It takes the BPF bytecode, decodes it, and then directly generates machine code instructions. They don't call it a transpiler either.
comment in response to post
Lol, first time hearing this joke 🤣
comment in response to post
Looks amazing. After seeing this picture, I can't resist ordering pasta for dinner.
comment in response to post
Was this made possible because of the PEG parser?
comment in response to post
In the case of execve, it loads a new program into the memory of the current process. As a result, it replaces the executable instructions in the child with those of the new program. This means the child will not execute any of the remaining instructions from the old program.
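If it helps to see it concretely, here is a minimal Python sketch of the same idea (the choice of /bin/echo is just for illustration): once execv succeeds in the child, the final print never runs there, because the child's old instructions have been replaced.

```python
import os

pid = os.fork()                  # duplicate the current process

if pid == 0:
    # Child: replace this process image with /bin/echo.
    os.execv("/bin/echo", ["echo", "hello from the new program"])
    # Only reached if execv itself fails; on success the old
    # instructions (including this print) are gone.
    print("never reached")
else:
    os.waitpid(pid, 0)           # parent waits for the child
    print("parent still runs its original instructions")
```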
comment in response to post
Good catches. The assembly program does print on stderr; I did that because initially I intended to make the parent and child write to different files for demonstration. Eventually I went with execve and didn't update the programs. I've fixed these.
comment in response to post
A non-tech person might think that these software people are weird, mixing their coffee and wine together. 😂
comment in response to post
Argh, I meant 3blue1brown
comment in response to post
What happened?
comment in response to post
You found a use for sys.monitoring? I've some interesting ideas as well, but too many other things to do :/
comment in response to post
Hey man, thank you so much! Great to see you here.
comment in response to post
There are some promising changes happening around improving the GC that might cut down the cost of a full heap scan quite a lot. I'm keeping an eye on them.
comment in response to post
If there are long-lived objects, there is no way around the GC at the moment. The best knob is to increase the threshold of the old generation, because an in-memory DB means you are happy to pay for memory. It would have been great if there were a way to say "don't scan these objects", but there isn't.
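For reference, that knob is gc.set_threshold; a minimal sketch, with the numbers picked arbitrarily for illustration:

```python
import gc

# CPython's defaults are (700, 10, 10). Raising the last value makes
# collections of the oldest generation much rarer, so long-lived objects
# get scanned far less often - a reasonable trade for an in-memory store
# that is happy to pay with memory.
gc.set_threshold(700, 10, 10_000)

print(gc.get_threshold())   # confirm the new thresholds
```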
comment in response to post
Use weak references so that when those cache entries are no longer referenced anywhere, they get deallocated. This means that when the GC runs, it will not have that many objects to scan.
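A minimal sketch of that pattern using weakref.WeakValueDictionary (the Record class and the cache key are made up for illustration):

```python
import weakref

class Record:
    """Stand-in for a cached object (plain instances support weak refs)."""
    def __init__(self, payload):
        self.payload = payload

cache = weakref.WeakValueDictionary()

record = Record("hot data")
cache["user:42"] = record    # the cache holds only a weak reference

print(len(cache))            # 1: 'record' is still strongly referenced
del record                   # drop the last strong reference
print(len(cache))            # 0: the entry is gone, nothing extra for the GC to scan
```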
comment in response to post
Read here for the full gory details: blog.codingconfessions.com/p/connecting...
comment in response to post
All of these details have an impact on the performance of your Python code. The size of your objects, how many of them you create, their lifetimes, and so on all play a role in how much time you spend in memory allocation and garbage collection. I cover these insights in the article as well.
comment in response to post
The GC scans the heap and detects cyclic references. If it finds cycles that are unreachable from anywhere else, it breaks them so that the reference counting mechanism kicks in and those objects are destroyed. I've already covered how all of this happens in the earlier mentioned GC article.
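A tiny demo of that behaviour (the Node class is just an illustration):

```python
import gc

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a     # a <-> b now form a reference cycle

del a, b                    # no outside references remain, but the cycle
                            # keeps both refcounts above zero

found = gc.collect()        # the cycle detector finds and breaks the cycle
print("unreachable objects found:", found)
```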