kitmenke.com
Data Engineering leader in Saint Louis, STL Big Data I.D.E.A. meetup organizer, lifelong learner and teacher. He / him #dataBS
39 posts 133 followers 336 following

Databricks recently changed the default notebook format from "source" (.py, .sql, .scala) to IPYNB, which seems to indicate they will be getting rid of the source format. IMO, the IPYNB format brings a few issues, like difficult diffs and the potential to leak data learn.microsoft.com/en-us/azure/...

I found this while looking through some scratch notes. I no longer recall what the context was, but it's an interesting thought on the evolution of the data warehouse. (Though there is an equivocation embedded in this history) #databs

Do you version your data assets? Or is there only the current version of a database table? What about the table definition? #dataBS

An agile ceremony / rite of passage nobody mentions: arguing about story points and what they mean.

1. Impact. How much revenue does my work protect or generate?
2. Quality. Does my work meet or exceed customer expectations?
3. Efficiency. Reward making the right buy versus build decision.
4. Reusability. How do others leverage my work?
5. Supportability. How much work do I create for others?

I'm out walking and had some thoughts about data and fun stuff and mental health that I wanted to share. #dataBS

Gahhh it’s time! @data-dragoness.bsky.social devUp call for speakers! Let’s take over with the #PowerPlatform and #MicrosoftFabric topics! For anyone who’s in the middle west, let’s do this! sessionize.com/dev-up-2025

Great breakdown of the new S3 Tables feature that leverages Apache Iceberg. Including an explanation of the costs... which are complicated. #dataBS bigdata.2minutestreaming.com/p/meet-your-...

The more I think about yesterday's announcement about Amazon S3 Tables, the more I think that it changes things a great deal. The gravity of data has shifted from the warehouse to cloud storage... but is there really a difference any more? 🧵1/n www.businesswire.com/news/home/20....

Arch Data Network is hosting an event around Apache Airflow and how it is being used. This Thursday, December 5th from 5:30 - 7:30 pm in Creve Coeur www.linkedin.com/events/archd...

I made a Starter Pack for people in the Saint Louis, Missouri area who are doing cool stuff in Data Engineering, Data Analytics, or Data Science. If you're doing data in STL let me know! #datasky #dataBS go.bsky.app/SZUtRw3

In two weeks, the St. Louis Big Data I.D.E.A. meetup is hosting @chad-isenberg.bsky.social to give an overview on dbt, alternatives, and the future of "the last mile" in data management. Beginners welcome! 🗓️ When: December 4, 2024 @ 5:30 PM 📍Where: Virtually on Zoom RSVP below! #dataBS #datasky

Foursquare just open sourced their 100 million place point of interest dataset! Some notes on poking around with it using DuckDB (it's Parquet files on S3) simonwillison.net/2024/Nov/20/...

New feed! Add #dataBS or #databsky to your post, and it'll get ingested into this custom Data BS feed, which looks back over 7 days of posts.
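The feed rule described above (a matching hashtag plus a 7-day lookback) can be sketched roughly as follows. This is only an illustration, not the feed's actual code: the function name `in_feed`, the case-insensitive matching, and the punctuation stripping are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Tags named in the post; lowercased because case-insensitive matching is assumed here.
FEED_TAGS = {"#databs", "#databsky"}
LOOKBACK = timedelta(days=7)

def in_feed(text, posted_at, now):
    """Return True if a post carries a feed tag and is at most 7 days old."""
    # Hypothetical tokenization: split on whitespace, drop trailing punctuation.
    words = {w.strip(".,!?").lower() for w in text.split()}
    has_tag = bool(words & FEED_TAGS)
    age = now - posted_at
    return has_tag and timedelta(0) <= age <= LOOKBACK

now = datetime(2024, 11, 20, tzinfo=timezone.utc)
print(in_feed("Do you version your data assets? #dataBS", now - timedelta(days=2), now))
print(in_feed("Old thoughts on warehouses #databs", now - timedelta(days=8), now))
```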

Some work that I have been involved in over the last year. I hope you like the blog post from our lead, Soumaya, as it's a very interesting solution. Not all problems are nails to the hammer of Spark :) ministryofjustice.github.io/data-and-ana...

In June Elon posted this graph of the rate of likes on X. It doesn't have a unit on the y axis, but it's plausible to assume that it's events/sec. If that is true, X in June was handling about 20k likes/sec. For comparison, Bluesky is now handling about 700 likes/sec during the busy part of the day.
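To put the two rates side by side, here is the arithmetic as a short script. It uses only the figures quoted in the post, and it inherits the post's own assumption that the X graph is in events/sec.

```python
# Rates quoted in the post; the events/sec unit for X is the post's assumption.
x_likes_per_sec = 20_000
bluesky_likes_per_sec = 700

ratio = x_likes_per_sec / bluesky_likes_per_sec
share = bluesky_likes_per_sec / x_likes_per_sec

print(f"X / Bluesky like rate: about {ratio:.1f}x")   # about 28.6x
print(f"Bluesky as a share of X: about {share:.1%}")  # about 3.5%
```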

Thinking about creating an STL Data starter pack for those doing cool stuff in data engineering, data analytics, and data science...

New post up! ✨ Exploring AT Protocol with Python to visualize the #databs social graph! davidgasquez.com/exploring-at... Took less than 1 hour to get the data and plot it. Amazing what you can do with open APIs and great SDKs!

I'm all set up with SAP PowerDesigner. Next step... World domination.

if u are wondering about that red pin emoji here 📌 if u find a post u want to return to u can stick a pin in it by replying with the red pin emoji then use @jaz.bsky.social's red pin feed which keeps track of your red pins

By popular request, the moment on May 1, 2023, when Jake Tapper went on TV and referred to a Bluesky post as a "skeet"

I have to be honest, updating my website is a real bummer because there is always some breaking change that Hugo has introduced.

Cat tax!

Do you unit test Databricks SQL code? SQL pipelines seem impossible to test in an automated way. A local option is to use open source Spark but that seems incomplete if you're using Databricks specific syntax. #dataBS
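One general pattern for automated SQL tests is to load a tiny fixture into a local engine, run the query, and assert on the rows. The sketch below uses Python's built-in `sqlite3` purely as a stand-in engine, not Spark or Databricks; the `orders` table, its columns, and the query are hypothetical. It also shows the limitation raised in the post: any Databricks-specific syntax would not run on a substitute engine.

```python
import sqlite3

# Hypothetical query under test; table and column names are made up for this sketch.
DAILY_TOTALS_SQL = """
SELECT order_date, SUM(amount) AS total
FROM orders
GROUP BY order_date
ORDER BY order_date
"""

def make_fixture():
    """Build an in-memory database holding a few known rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-12-01", 10), ("2024-12-01", 5), ("2024-12-02", 7)],
    )
    return conn

def run_daily_totals(conn):
    """Run the query under test and return all rows."""
    return conn.execute(DAILY_TOTALS_SQL).fetchall()

def test_daily_totals():
    conn = make_fixture()
    # Expected rows are computed by hand from the fixture above.
    assert run_daily_totals(conn) == [("2024-12-01", 15), ("2024-12-02", 7)]
    conn.close()

test_daily_totals()
```

The same fixture-and-assert shape should carry over to a local Spark session, with the dialect gap remaining the open question.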