What advice do folks have for organising projects that will be deployed to production? How do you organise your directories? What do you do if you're deploying multiple "things" (e.g. an app and an api) from the same project?
Comments
we PR model files separately from the API to run them. this means that the model PR doesn't add any functionality, but also lets us separate the stan/python reviews
yeah --- when we can, the first pull request will only contain new stan files. the new files don't get referenced anywhere until the second request that updates the api internals. helps keep PR size/cognitive load manageable (& eng can avoid reviewing stan code lol)
R+Docker, we use an R pkg project structure (R/, man/, tests/, inst/) plus additional top-level folders like `exec/` for docker-executable scripts, `dev/` for devel/sandbox scripts, `reports/` for one-time reports, and `local/` for gitignored large files.
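For concreteness, a minimal sketch of scaffolding that layout with usethis/fs (the project name and the exact calls are placeholders, not a prescribed workflow):

```r
# hypothetical scaffold for the layout described above
usethis::create_package("myproject")                  # standard R/, man/, tests/, DESCRIPTION
# then, from inside the new project:
fs::dir_create(c("exec", "dev", "reports", "local"))  # extra top-level folders
usethis::use_git_ignore("local/")                     # keep large local files out of git
```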
hm, pak solves a ton of pain points and I'm pretty happy with where it's at, but it's not as good as a ~uv-type solution, which does frictionless virtual envs
uv creates a project-specific virtual environment with its own python install + package libraries (renv does only the package half of this) and feels fast and frictionless (pak is pretty fast and frictionless for the pkg install part)
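For reference, the renv half of that workflow looks roughly like this; it pins package versions in a project lockfile, but the R installation itself still has to come from elsewhere (rig, Docker, etc.):

```r
renv::init()      # create a project-local library + renv.lock
renv::snapshot()  # record the package versions currently in use
renv::restore()   # recreate that library on another machine / in CI
```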
Hadley, do you know about Nix? It makes dependency management so smooth, as Nix takes care of not just R package dependencies but also R and system level dependencies in a declarative manner. It's been around for 20 years and works for many languages and tools, see https://docs.ropensci.org/rix/
The equivalent in R would be pak + renv + rig. I would need to learn 3 tools and their quirks vs one tool with uv. I believe this is one of the few places a unified tool excels compared to decoupling; add to that the caveat that the 3 R tools, in my experience, don't have great interoperability.
In Python, by tradition, the package and environment managers are called from the system terminal. I call renv and/or pak from within an R session. I've personally come around to liking Python's approach when writing build scripts.
DESCRIPTION lists all package dependencies, and we use Config/Needs/App, Config/Needs/API etc to specify dependencies that are only needed by specific parts of projects.
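A hedged sketch of how those fields might be consumed at build time with pak (this assumes pak/pkgdepends' handling of custom Config/Needs/* dependency fields; the field name comes from the comment above):

```r
# in the API image's build step: install hard deps plus the API-only extras
pak::local_install_deps(
  root = ".",
  dependencies = c("Depends", "Imports", "LinkingTo", "Config/Needs/API")
)
```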
We have one Dockerfile per "thing" and a makefile that builds each docker image + applies tags + pushes to docker repo.
i've championed using R package structure for all projects (including those that aren't actually "packages") because (in approximate order of importance; quick sketch after the list):
- encourages my team to abstract logic into functions over scripts
- easier to write tests for functions, + leverage testing infrastructure from testthat
- easier to document functions than scripts because of roxygen tooling
- easier to manage dependencies via DESCRIPTION file for prod and devel
- able to use a lot of the tooling R devs have written for package development eg pkgload::load_all
- is conventional and standardized across R developers
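A tiny sketch of what that buys in practice; the function, test, and file names are hypothetical placeholders:

```r
# R/clean_orders.R -- logic lives in a documented, testable function, not a script
#' Drop cancelled orders
#' @export
clean_orders <- function(orders) {
  orders[orders$status != "cancelled", , drop = FALSE]
}

# tests/testthat/test-clean_orders.R
test_that("cancelled orders are removed", {
  df <- data.frame(id = 1:2, status = c("cancelled", "shipped"))
  expect_equal(nrow(clean_orders(df)), 1)
})

# during development, load everything without installing:
pkgload::load_all()
```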
do you use pkgdown for the documentation? i'm looking for basically pkgdown but for quarto instead of rmarkdown. i've been using quartodoc for python packages and it's awesome, but haven't seen equivalents in R
internally we mostly use ?help type stuff moreso than pkgdown (primarily because annoying to host a private pkg webpage), but I've definitely used pkgdown in public work.
one of my current takes is that renv is a pain in the ass, we use docker to handle all the pkg alignment stuff in prod and i'd rather be antifragile by having the team encouraged to update to new versions as much as possible for performance/security reasons
- a pipeline image that calls an exec script to build something and put it elsewhere (database, S3 etc)
- an app image + a pipeline image to build the data for the app
- we don't use apis as much right now but have experimented with, would use similar
app.R goes in top level for ease of being able to run shiny::runApp() and have it automatically find the app. the logic and modules all live in R/ (which gets auto-loaded)
Same thing with plumber etc, although occasionally those can go in exec/ instead.
reports/ (internally) are defined as things run only once + stored on git for future comparisons, if the report was a product that was generated on a schedule it would get written to something like S3 instead.
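A concrete (hypothetical) sketch of that top-level app.R, assuming the UI and server functions are defined in R/:

```r
# app.R at the repo root; shiny (>= 1.5) auto-sources the adjacent R/ directory,
# so app_ui() and app_server (hypothetical names) are already available here
shiny::shinyApp(ui = app_ui(), server = app_server)
```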
I structure everything as a package. I usually don't deploy multiple things from the same repo but if I did I would probably put the api in inst/ and have separate dockerfiles for the app and api.
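If the API lived in inst/, the entry point might look something like this sketch (the package name and api/ path are made up):

```r
# api-entrypoint.R, e.g. the script the API's Dockerfile runs
spec <- system.file("api", "plumber.R", package = "mypkg")  # installed from inst/api/
plumber::pr_run(plumber::pr(spec), host = "0.0.0.0", port = 8000)
```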
Often I build an app, but then biz wants a quick one-off job, and then product wants to turn that into a regular job, etc.
The invocation method is just a thin wrapper around a method in the package. Every app invocation looks the same, every job invocation looks the same, etc.
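In other words (hypothetical package and function names), each entry point is a one-liner into the package:

```r
# app.R
mypkg::run_app()

# jobs/nightly_refresh.R
mypkg::refresh_data()
```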
Usually some form of:
- everything gets built in one repo, organized in subfolders. CI understands these folders and builds and deploys apps
- everything gets its own repo, packaged and sent to some artifact storage, and a central orchestration repo has CI to download and deploy them
The "everything gets built in one repo" approach can be fraught with headaches unless there is a team dedicated to wrangling the CI/CD pipeline. I much prefer separating things into their own repos, or at least separating by deployment artifact type.
The tricky thing is that every organization beyond a certain size will have extremely bespoke ways for you to deploy into their environment, so a lot of the time your projects need to change to accommodate that
Right, the particulars of the deployment change, but surely that doesn't affect too much how you lay out your project? (Assuming that you have some automated step to get from standard layout to your bespoke layout)
I've got horror stories -- I worked at a place that hadn't yet refactored away from their initial monorepo, so all apps (even ones deployed as microservices) were hosted in the single repo and had to be formatted in the "standard" structure used by the original app the deploy script was written for
I don't think many tech-forward places are intentionally setting up environments like this, but I think there's similar horror stories at a lot of non tech companies who have slowly accumulated processes over time (and aren't focused on the investments needed to fix it)
Tree of R packages (one per repo) encapsulating distinct parts of the business logic, one metapackage to install them all, and a separate repo with renv, Dockerfile, and deployment tools.
In an extremely general sense, some typical conventions we use:
- Dockerfile at the root of the repository
- .github/workflows/* for CI/CD (typically rebuilding the Docker image and pushing it somewhere where it'll be picked up by prod)
- One script that "does the thing", e.g., "main.R" at root. Typically the last line in the Dockerfile runs this script (sketch after this list).
- A directory called R/ containing all supporting logic used in "main.R"
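A minimal, hypothetical version of that main.R; run_pipeline() stands in for whatever the project actually does and is assumed to be defined in R/:

```r
# main.R -- the one script the Dockerfile runs
for (f in list.files("R", pattern = "[.][Rr]$", full.names = TRUE)) source(f)
run_pipeline()
```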
You're welcome! It's a great question / thought exercise.
I'm not sure I understand your question about sharing code across repos. Do you mean with respect to practicing D.R.Y.? Or are you asking how two services with codebases in different repos would talk to each other?
Do you find that resonates with people? My sense is that most people don't want all the infrastructure of a package if they're just deploying a simple dashboard.
"All the infrastructure of a package" is kind of needed for stuff used long-term. One reason I use the package approach is the clear documentation and refined functionality. It's easier to adapt a very nice wheel than invent a new one.
I think it's the most reliable thing to do if it's a product expected to be maintained over time. If it is a one-off then likely not. (So, back to the "it depends")
How many people are working on the project? Is the API internal or external to the app? Are you releasing an SDK? How many project components? Very generally, usually each separate component in a repo will be its own versioned project but the repo joins them.
yep exactly. I haven't used uv workspaces yet but most of my projects where I deploy multiple separate libraries/packages together in a single git repo usually follow this similar pattern.
If you're developing packages that need to be aware of and depend on each other with a team, and then exposing that repo to external consumers, it's much easier to keep everything in one place
I like to write a small CLI as an abstraction point to route "production commands" to whatever code should run. Works well as an entry point to a docker image too. The CLI script ends up being a nice overview of the functionality of the project as well.
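Something along these lines, say; the subcommand names and mypkg functions are hypothetical:

```r
#!/usr/bin/env Rscript
# cli.R -- routes "production commands" to package functions;
# doubles as a readable inventory of what the project can do
args <- commandArgs(trailingOnly = TRUE)
cmd  <- if (length(args)) args[[1]] else "help"
switch(cmd,
  "refresh-data" = mypkg::refresh_data(),
  "run-app"      = mypkg::run_app(),
  "serve-api"    = mypkg::serve_api(port = 8000),
  "help"         = cat("commands: refresh-data, run-app, serve-api\n"),
  stop("unknown command: ", cmd)  # default branch for anything unrecognised
)
```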
My preferred approach is R packages + Docker (r-minimal) + a Makefile, with either a main.R in inst/ run by a trigger or cron job, or a plumber API. But generally my advice is to use an internal CRAN and to be careful about writing your own job queue logic.
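For the plumber variant, the file shipped in inst/ might look like this minimal sketch (the endpoint is an assumed example, not from the original comment):

```r
# inst/plumber.R
#* Health check used by the container orchestrator
#* @get /healthz
function() {
  list(status = "ok")
}
```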
"It depends", what's the size of the team/complexity of the system you're developing. How fast do you expect changes to occur on components, how much stability do you think you can expect in interfaces between components, what is your target deployment architecture ?
I deploy multiple Rmds, Shiny apps, Blastula emails, and APIs out of the same GitLab project. This simplifies issue intake. If it's about the Shiny app, the emailed summary, the API, or the reporting on this broad concept, issues go in one place, with detailed tags, etc.
But the Shiny app, the Markdown library, the API, etc. about end of life are all in the same project, each in their own folder. Using renv to manage the libraries helps, because if I need a function in the Shiny app, I probably also need it in the API.
I'd also advocate for deploying apps, APIs or jobs via dedicated packages in Docker containers. Common tools (e.g. custom shiny modules reused across apps) end up as local packages in a shared base image (a self-hosted package repo so far didn't seem worth the effort).
Also, consistent QA and prod environments are really important. My last project was a shiny app that became pretty complex over time, used box modules, was deployed by copying files to a Shiny Server, and had tables with a _test suffix for QA. Updates constantly broke stuff; I really missed Docker.
Shiny app is a package installed and imported in app.R in its own dir, separate dir (git submodule) for plumber API. Then a script in root uses {rsconnect} to deploy multiple instances (loop, write temp yml config, deploy)
Simple and works like a charm, though {renv} is often painful.
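A rough sketch of that deploy loop; instance names, paths, and the config contents are made up:

```r
# deploy.R at the repo root
instances <- c("dev", "uat", "prod")
for (inst in instances) {
  yaml::write_yaml(list(instance = inst), "app/config.yml")  # temp per-instance config
  rsconnect::deployApp(
    appDir      = "app",
    appName     = paste0("myapp-", inst),
    forceUpdate = TRUE
  )
}
```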
I advocate separating deployable units from each other if possible so each can be released independently.
The type of deployment (batch job, pubsub service, api, etc) is mostly boilerplate that invokes one or more functions from the library.
lib code -> lib pkg -> app code -> app pkg -> container
Code is stored in a code repository; packages and containers are stored in an artifact repository.
app.R, plumber.R etc go at the top level.
I haven't tried to use R in a production setting in years.
I think the quarto portion you want is here: https://pkgdown.r-lib.org/articles/quarto.html
Why do you put apis and apps at the top-level but reports in a subdirectory? What do you put in inst?
honestly it's 99% because pre-commit's spell check lives here by default and hasn't been moved elsewhere.
and then one of those new repos evolves to have multiple components, and then those get split into their own repos, and the cycle continues
This is some real Dark Lord Sauron stuff.
I see most people pushing images to prod rather than pulling them in. Just interested in what you do.
In almost all cases, we take the microservices approach where 1 service (app, API, etc.) == 1 repository
Sometimes directory structure. Sometimes separate coordinated packages. Sometimes Docker Compose.
https://github.com/chasemc/presentations/blob/master/RinPharma/rinpharma_aug_2019/chase_RINPHARMA.pdf
There's a top-level project (not sure I'd always use "packages", but bird-feeder could be the app and seeds the API/SDK)
Others much better at this probably have better ideas.