I made this free and open source dev tool:
https://gitingest.com
Replace "hub" with "ingest" in any GitHub URL to get a text digest that you can feed into any LLM.
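For anyone who wants to script the URL swap, here is a minimal sketch. The helper name `to_gitingest` is made up for illustration and is not part of gitingest itself; it only rewrites the host and leaves the path untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest(url: str) -> str:
    """Swap the github.com host for gitingest.com, keeping the rest of the URL.

    Hypothetical helper for illustration, not part of gitingest.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    return urlunsplit(parts._replace(netloc="gitingest.com"))


print(to_gitingest("https://github.com/cyclotruc/gitingest"))
```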
Comments
Check out the repo if there's any issue you like, or open your own if you have an idea! https://github.com/cyclotruc/gitingest
A temporary fix would be to get a specific subdirectory from the URL, like so:
https://gitingest.com/google-gemini/generative-ai-python/docs/api/google
There's no logic in my code to do anything special about licences, but I fail to see which kind of licence would say "this code shouldn't be read by humans or machines".
I didn't know about those types of licence.
While on a personal note I fail to see their purpose, I agree that gitingest should respect those.
I'm definitely adding this to the roadmap. Do you know if there's any standard that could help me identify those licences?
One thing I've hit a couple of times is that if the repo has data outputting image/png, gitingest includes the whole hash. This quickly makes the number of tokens skyrocket, but I'm not sure how to exclude it.
If so maybe the interface could warn or offer an option to exclude assets? I was hoping to capture oddly long contribution docs, examples, etc, but not media.
Better yet, an opt-in model?
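A rough sketch of what an "exclude assets" option could look like, assuming the digest is built from a list of file paths. The suffix list and the `drop_media` name are assumptions for illustration, not gitingest's actual behaviour, and this only skips standalone asset files; it would not strip image data embedded inside another file.

```python
from pathlib import PurePosixPath

# Assumed set of asset extensions; adjust to taste.
MEDIA_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".mp4", ".pdf"}


def drop_media(paths):
    """Return only the paths whose extension is not a known media type."""
    return [
        p for p in paths
        if PurePosixPath(p).suffix.lower() not in MEDIA_SUFFIXES
    ]


kept = drop_media(["CONTRIBUTING.md", "docs/logo.PNG", "examples/demo.py"])
print(kept)
```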
I heard that anti-AI licences exist, but taking those into account would require... using an AI to read them.
Also, your code is not "added" to a tool like gitingest; someone has to use gitingest on your code.
SPDX licensing has an API, and generally most copyleft licensing could be anti-AI training.
Opt-in can be inverted such that folks can only use this with repositories that they have collaborator access to (for example).
In the end, my tool is free and open source.
Anyone can just fork it and remove the restrictions in the code directly, then ship the modified product, which will eventually become more popular as people hit the limits of my own "censored" version.
Keeping the context focused is the key.
The space between AI and coding has a lot of room to grow. A tool I was thinking about was a VSCode extension where I can right-click a file and copy the entire file, but with the file name on top, for prompting.
I also believe there are a lot of things to do, starting with the simplest ones.
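The formatting half of that idea is tiny. Here is a sketch in plain Python rather than an actual VSCode extension; both function names are invented for illustration, and the clipboard and right-click parts are left out.

```python
from pathlib import Path


def format_for_prompt(name: str, text: str) -> str:
    """Put the file name on the first line, followed by the file's contents."""
    return f"{name}\n{text}"


def file_for_prompt(path: str) -> str:
    """Read a file from disk and format it for pasting into a prompt."""
    p = Path(path)
    return format_for_prompt(p.name, p.read_text(encoding="utf-8"))


print(format_for_prompt("hello.py", "print('hi')\n"))
```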
I would love for someone to steal that idea.
Hope you keep building mini tools and the like. Feel free to ping me for a retweet, or whatever it's called here.