kojikubota.ai
AI enthusiast based in Tokyo, Japan 🇯🇵 Originally from London, UK 🇬🇧 @golang.org dev 💻
631 posts 72 followers 55 following

If you’re making the most of GPT-4.5, I’d love to hear how you’re using it. Your ideas would be greatly appreciated.

I’m not sure when I’d ever need to use GPT-4.5. I rely on Claude 3.7 Sonnet for coding and Grok 3 for everything else. To me, GPT-4.5 doesn’t stand out as the best at anything. If I’m off base here, feel free to let me know!

If AI could build anything, what should human engineers do?

I use Claude 3.7 Sonnet for coding and Grok 3 for everything else. That’s my rule.

Cursor Agent charges 1 fast request for up to 25 tool calls, and it treats Claude 3.7 Sonnet (Thinking) the same as other premium models. That feels so cheap that it’s hard to explain from a token-based billing perspective.

The token rate for Claude 3.7 Sonnet and Claude 3.7 Sonnet (Thinking) should be the same, but it seems the latter has a higher usage cost due to the thinking tokens. Windsurf’s pricing structure reflects this, which might suggest that Cursor is remarkably cheap in comparison.

Unlimited DeepSeek-V3 used to look like an advantage, but it might now seem a bit pricey next to Cursor’s Claude 3.7 Sonnet (Thinking) at 1 fast request per use.

Windsurf has been updated to version 1.3.9, and Claude 3.7 Sonnet has finally appeared on the model list. It seems the Extended Thinking mode will consume 1.5 credits.

Claude 3.7 Sonnet is so brilliant that it genuinely impresses me. The code it generates has hardly any bugs. It’s as close to zero as you can get.

Claude 3.7 Sonnet’s extended thinking mode makes tricky coding challenges easier to manage. It offers:
1. Thorough debugging and verification
2. Thoughtful design and smooth refactoring
3. Solid support for complex algorithms
4. Improved accuracy in testing and code generation

On X, the praise for Claude 3.7 Sonnet just doesn’t stop. I wonder what will happen when Claude 4.0 comes out. I’m curious not only about 4.0’s coding performance but also about how devs will react.

I thought Grok 3’s coding capabilities were quite nice, but I must say Claude 3.7 Sonnet has rather easily overtaken it.

It’s absolutely brilliant how quickly Cursor and GitHub Copilot have made Claude 3.7 Sonnet available. Windsurf feels a bit like it’s lagging behind.

What do you think will happen when Anthropic releases Claude 4.0? I guess all the devs in the world will cry with joy.

I'm keeping the details of my Bluesky app under wraps for now – you'll have to wait and see! With so many fantastic clients already out there, thanks to other talented developers, I'm focusing on building something quite niche, specialising in a single feature.

I plan to use TypeScript + React for the frontend and Go + Gin for the backend, which means I also need to learn TypeScript. There’s a lot to learn, but I’m enjoying it all.

I’ve got a decent understanding of web app development, so I reckon I’ll have a go at developing a Bluesky-related app as a practical learning exercise.

S-Pulse win! For old-time fans, beating Verdy is the ultimate joy.

I wish I could have gone to watch today's Verdy vs. S-Pulse match...

No prior web app experience, but I'm diving in with Devin AI! I'm using its Knowledge feature to track my learning. I've built a basic CRUD app, sticking to best practices. #Devin #Cognition #AISoftwareEngineer

What models will the Claude 4 series have? Haiku, Sonnet and that’s it? Opus will never come back? #Anthropic #AI

It's a real pain when you've had a long chat and then decide to switch sessions. You’ve got to give all the context from scratch again.

Crafting a .windsurfrules file for TypeScript projects now, and I’m doing it with massive help from ChatGPT o1 pro. #TypeScript #Windsurf #ChatGPT

Lots of developers love Claude 3.5 Sonnet, so I'm super excited for Claude 4 – it's supposed to be coming out in the next few weeks! #Anthropic #AI

I've just developed a bot that posts famous quotes from the greatest minds on Bluesky, using Cognition's Devin AI. Things are getting more and more fun with Devin! github.com/littleironwa...

I had Devin-tan build a Bluesky bot on its own. Devin did almost all of the work; I only gave a few instructions and a bit of advice. github.com/littleironwa...

That which does not kill us makes us stronger. - Friedrich Nietzsche

To be, or not to be, that is the question. - William Shakespeare

I'm building a Bluesky bot in Go. The famous quotes from great minds are its posts.

In the middle of difficulty lies opportunity. - Albert Einstein

I think, therefore I am. - René Descartes

In the middle of difficulty lies opportunity. ― Albert Einstein

To be, or not to be, that is the question. ― William Shakespeare

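Devin is said to be well suited to building simple systems from scratch but weak at modifying complex existing systems. For the latter, its test success rate is reportedly only about 15%, so reliability is low, and human oversight is essential for complex code in particular. In my own trials it handled complex refactoring without any trouble, so this came as a bit of a surprise.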

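I used o3-mini-high x Search to look into the formatting rules for Devin's Knowledge. Content written in Markdown has its basic formatting rendered correctly (headings, bullet lists, links, code blocks), but YAML does not seem to get any special parsing and is treated as plain text.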

My latest pleasure is checking how Devin's Knowledge builds up day by day 😅

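By default, Devin's Knowledge is set up not to verify behavior with local tests. Given common practice in system development, is there some intent behind that? Could Devin's development philosophy favor development speed over testing?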

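When it comes to "raising" Devin, it pays to give specific instructions even for the most trivial tasks. That way Knowledge accumulates and Devin keeps getting better. Just like training a junior engineer, the key is to be patient and thorough.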

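I wrote a super simple tool in Go that writes the contents of every source file in a project, along with the folder and file structure, to a text file. Feeding the tool's output straight into o1 pro makes it easy to hand off tasks that need a bird's-eye view of the whole project. My work has become much more efficient.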

When OpenAI announced the release of o3-mini, I thought, "If o1 pro is smarter, I'll obviously use that," but now I use o3-mini-high almost exclusively. Speed matters after all.

It feels like I've completely switched from Cursor to Windsurf. Sorry, Cursor.

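Anthropic has announced a new AI safety system that reportedly blocked more than 95% of jailbreaks in initial testing. It still has issues, such as over-refusing harmless queries and high compute costs, but a jailbreak challenge with a $15,000 prize is currently running, so if you think you're up to it, give it a try!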

What shall I have Devin-tan do today?