What advice do folks have for organising projects that will be deployed to production? How do you organise your directories? What do you do if you're deploying multiple "things" (e.g. an app and an api) from the same project?
Comments
we PR model files separately from the API to run them. this means that the model PR doesn't add any functionality, but also lets us separate the stan/python reviews
yeah --- when we can, the first pull request will only contain new stan files. the new files don't get referenced anywhere until the second request that updates the api internals. helps keep PR size/cognitive load manageable (& eng can avoid reviewing stan code lol)
R+Docker, we use an R pkg project structure (R/, man/, tests/, inst/) plus additional top-level folders like `exec/` for docker-executable scripts, `dev/` for devel/sandbox scripts, `reports/` for one-time reports, and `local/` for gitignored large files.
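For concreteness, a minimal sketch of scaffolding that layout with usethis/fs (the project name and the exact calls are placeholders, not a prescribed workflow):

```r
# hypothetical scaffold for the layout described above
usethis::create_package("myproject")                  # standard R/, man/, tests/, DESCRIPTION
# then, from inside the new project:
fs::dir_create(c("exec", "dev", "reports", "local"))  # extra top-level folders
usethis::use_git_ignore("local/")                     # keep large local files out of git
```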
hm, pak solves a ton of pain points and I'm pretty happy with where it's at, but it's not as good as a ~uv-type solution, which does frictionless virtual envs
uv creates a project-specific virtual environment with its own python install + package libraries (renv does only the package half of this) and feels fast and frictionless (pak is pretty fast and frictionless for the pkg install part)
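For reference, the renv half of that workflow looks roughly like this; it pins package versions in a project lockfile, but the R installation itself still has to come from elsewhere (rig, Docker, etc.):

```r
renv::init()      # create a project-local library + renv.lock
renv::snapshot()  # record the package versions currently in use
renv::restore()   # recreate that library on another machine / in CI
```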
Hadley, do you know about Nix? It makes dependency management so smooth, as Nix takes care of not just R package dependencies but also R and system level dependencies in a declarative manner. It's been around for 20 years and works for many languages and tools, see https://docs.ropensci.org/rix/
The equivalent in R would be pak + renv + rig. I would need to learn 3 tools and their quirks vs one tool with uv. I believe this is one of the few places a unified tool excels compared to decoupling; add to that the caveat that the 3 R tools, in my experience, don't have great interoperability.
In Python, by tradition, the package and environment managers are called from the system terminal. I call renv and/or pak from within an R session. I've personally come around to liking Python's approach when writing build scripts.
DESCRIPTION lists all package dependencies, and we use Config/Needs/App, Config/Needs/API etc to specify dependencies that are only needed by specific parts of projects.
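A hedged sketch of how those fields might be consumed at build time with pak (this assumes pak/pkgdepends' handling of custom Config/Needs/* dependency fields; the field name comes from the comment above):

```r
# in the API image's build step: install hard deps plus the API-only extras
pak::local_install_deps(
  root = ".",
  dependencies = c("Depends", "Imports", "LinkingTo", "Config/Needs/API")
)
```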
We have one Dockerfile per "thing" and a makefile that builds each docker image + applies tags + pushes to docker repo.
i've championed using R package structure for all projects (including those that aren't actually "packages") because (in approximate order of importance; quick sketch after the list):
- encourages my team to abstract logic into functions over scripts
- easier to write tests for functions, + leverage testing infrastructure from testthat
- easier to document functions than scripts because of roxygen tooling
- easier to manage dependencies via DESCRIPTION file for prod and devel
- able to use a lot of the tooling R devs have written for package development eg pkgload::load_all
- is conventional and standardized across R developers
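A tiny sketch of what that buys in practice; the function, test, and file names are hypothetical placeholders:

```r
# R/clean_orders.R -- logic lives in a documented, testable function, not a script
#' Drop cancelled orders
#' @export
clean_orders <- function(orders) {
  orders[orders$status != "cancelled", , drop = FALSE]
}

# tests/testthat/test-clean_orders.R
test_that("cancelled orders are removed", {
  df <- data.frame(id = 1:2, status = c("cancelled", "shipped"))
  expect_equal(nrow(clean_orders(df)), 1)
})

# during development, load everything without installing:
pkgload::load_all()
```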
do you use pkgdown for the documentation? i'm looking for basically pkgdown but for quarto instead of rmarkdown. i've been using quartodoc for python packages and it's awesome, but haven't seen equivalents in R
internally we mostly use ?help type stuff moreso than pkgdown (primarily because annoying to host a private pkg webpage), but I've definitely used pkgdown in public work.
one of my current takes is that renv is a pain in the ass, we use docker to handle all the pkg alignment stuff in prod and i'd rather be antifragile by having the team encouraged to update to new versions as much as possible for performance/security reasons
- a pipeline image that calls an exec script to build something and put it elsewhere (database, S3 etc)
- an app image + a pipeline image to build the data for the app
- we don't use apis as much right now but have experimented with, would use similar
app.R goes in top level for ease of being able to run shiny::runApp() and have it automatically find the app. the logic and modules all live in R/ (which gets auto-loaded)
Same thing with plumber etc, although occasionally those can go in exec/ instead.
reports/ (internally) are defined as things run only once + stored on git for future comparisons, if the report was a product that was generated on a schedule it would get written to something like S3 instead.
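A concrete (hypothetical) sketch of that top-level app.R, assuming the UI and server functions are defined in R/:

```r
# app.R at the repo root; shiny (>= 1.5) auto-sources the adjacent R/ directory,
# so app_ui() and app_server (hypothetical names) are already available here
shiny::shinyApp(ui = app_ui(), server = app_server)
```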
I structure everything as a package. I usually don't deploy multiple things from the same repo but if I did I would probably put the api in inst/ and have separate dockerfiles for the app and api.
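If the API lived in inst/, the entry point might look something like this sketch (the package name and api/ path are made up):

```r
# api-entrypoint.R, e.g. the script the API's Dockerfile runs
spec <- system.file("api", "plumber.R", package = "mypkg")  # installed from inst/api/
plumber::pr_run(plumber::pr(spec), host = "0.0.0.0", port = 8000)
```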
Often I build an app, but then biz wants a quick one-off job, and then product wants to turn that into a regular job, etc.
The invocation method is just a thin wrapper around a method in the package. Every app invocation looks the same, every job invocation looks the same, etc.
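In other words (hypothetical package and function names), each entry point is a one-liner into the package:

```r
# app.R
mypkg::run_app()

# jobs/nightly_refresh.R
mypkg::refresh_data()
```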
Usually some form of:
- everything gets built in one repo, organized in subfolders. CI understands these folders and builds and deploys apps
- everything gets its own repo, packaged and sent to some artifact storage, and a central orchestration repo has CI to download and deploy them
The "everything gets built in one repo" approach can be fraught with headaches unless there is a team dedicated to wrangling the CI/CD pipeline. I much prefer separating things into their own repos, or at least separating by deployment artifact type.
The tricky thing is that every organization beyond a certain size will have extremely bespoke ways for you to deploy into their environment, so a lot of the time your projects need to change to accommodate that
Right, the particulars of the deployment change, but surely that doesn't affect too much how you lay out your project? (Assuming that you have some automated step to get from standard layout to your bespoke layout)
I've got horror stories -- I worked at a place that hadn't yet refactored away from their initial monorepo, so all apps (even ones deployed as microservices) were hosted in the single repo and had to be formatted in the "standard" structure used by the original app the deploy script was written for
I don't think many tech-forward places are intentionally setting up environments like this, but I think there's similar horror stories at a lot of non tech companies who have slowly accumulated processes over time (and aren't focused on the investments needed to fix it)
Tree of R packages (one per repo) encapsulating distinct parts of the business logic, one metapackage to install them all, and a separate repo with renv, Dockerfile, and deployment tools.
In an extremely general sense, some typical conventions we use:
- Dockerfile at the root of the repository
- .github/workflows/* for CI/CD (typically rebuilding the Docker image and pushing it somewhere where it'll be picked up by prod)
- One script that "does the thing", e.g., "main.R" at root. Typically the last line in the Dockerfile runs this script (sketch after this list).
- A directory called R/ containing all supporting logic used in "main.R"
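A minimal, hypothetical version of that main.R; run_pipeline() stands in for whatever the project actually does and is assumed to be defined in R/:

```r
# main.R -- the one script the Dockerfile runs
for (f in list.files("R", pattern = "[.][Rr]$", full.names = TRUE)) source(f)
run_pipeline()
```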
You're welcome! It's a great question / thought exercise.
I'm not sure I understand your question about sharing code across repos. Do you mean with respect to practicing D.R.Y.? Or are you asking how two services with codebases in different repos would talk to each other?
Do you find that resonates with people? My sense is that most people don't want all the infrastructure of a package if they're just deploying a simple dashboard.
"All the infrastructure of a package" is kind of needed for stuff used long-term. One reason I use the package approach is the clear documentation and refined functionality. It's easier to adapt a very nice wheel than invent a new one.
I think it's the most reliable thing to do if it's a product expected to be maintained over time. If it is a one-off then likely not. (So, back to the "it depends")
How many people are working on the project? Is the API internal or external to the app? Are you releasing an SDK? How many project components? Very generally, usually each separate component in a repo will be its own versioned project but the repo joins them.
yep exactly. I haven't used uv workspaces yet but most of my projects where I deploy multiple separate libraries/packages together in a single git repo usually follow this similar pattern.
If you're developing packages that need to be aware of and depend on each other with a team, and then exposing that repo to external consumers, it's much easier to keep everything in one place
I like to write a small CLI as an abstraction point to route "production commands" to whatever code should run. Works well as an entry point to a docker image too. The CLI script ends up being a nice overview of the functionality of the project as well.
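Something along these lines, say; the subcommand names and mypkg functions are hypothetical:

```r
#!/usr/bin/env Rscript
# cli.R -- routes "production commands" to package functions;
# doubles as a readable inventory of what the project can do
args <- commandArgs(trailingOnly = TRUE)
cmd  <- if (length(args)) args[[1]] else "help"
switch(cmd,
  "refresh-data" = mypkg::refresh_data(),
  "run-app"      = mypkg::run_app(),
  "serve-api"    = mypkg::serve_api(port = 8000),
  "help"         = cat("commands: refresh-data, run-app, serve-api\n"),
  stop("unknown command: ", cmd)  # default branch for anything unrecognised
)
```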
My preferred approach is R packages + Docker (r-minimal) + a Makefile, with either a main.R in inst/ run by a trigger or cron job, or a plumber API. But generally my advice is to use an internal CRAN and to be careful about writing your own job queue logic.
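For the plumber variant, the file shipped in inst/ might look like this minimal sketch (the endpoint is an assumed example, not from the original comment):

```r
# inst/plumber.R
#* Health check used by the container orchestrator
#* @get /healthz
function() {
  list(status = "ok")
}
```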
"It depends", what's the size of the team/complexity of the system you're developing. How fast do you expect changes to occur on components, how much stability do you think you can expect in interfaces between components, what is your target deployment architecture ?
I deploy multiple Rmds, Shiny apps, Blastula emails, and APIs out of the same GitLab project. This simplifies issue intake. If it's about the Shiny app, the emailed summary, the API, or the reporting on this broad concept, issues go in one place, with detailed tags, etc.
But the Shiny app, the Markdown library, the API, etc. about end of life are all in the same project, each in their own folder. Using renv to manage the libraries helps, because if I need a function in the Shiny app, I probably also need it in the API.
I'd also advocate for deploying apps, APIs or jobs via dedicated packages in Docker containers. Common tools (e.g. custom shiny modules reused across apps) end up as local packages in a shared base image (a self-hosted package repo so far didn't seem worth the effort).
Also, consistent QA and prod environments are really important. My last project was a shiny app that became pretty complex over time, used box modules, was deployed by copying files to a Shiny Server, and had tables with a _test suffix for QA. Updates constantly broke stuff; I really missed Docker.
Shiny app is a package installed and imported in app.R in its own dir, separate dir (git submodule) for plumber API. Then a script in root uses {rsconnect} to deploy multiple instances (loop, write temp yml config, deploy)
Simple and works like a charm, though {renv} is often painful.
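A rough sketch of that deploy loop; instance names, paths, and the config contents are made up:

```r
# deploy.R at the repo root
instances <- c("dev", "uat", "prod")
for (inst in instances) {
  yaml::write_yaml(list(instance = inst), "app/config.yml")  # temp per-instance config
  rsconnect::deployApp(
    appDir      = "app",
    appName     = paste0("myapp-", inst),
    forceUpdate = TRUE
  )
}
```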
I advocate separating deployable units from each other if possible so each can be released independently.
The type of deployment (batch job, pubsub service, api, etc) is mostly boilerplate that invokes one or more functions from the library.
lib code -> lib pkg -> app code -> app pkg -> container
Code is stored in a code repository; packages and containers are stored in an artifact repository.
app.R, plumber.R etc go at the top level.
I haven't tried to use R in a production setting in years.
I think the quarto portion you want is here: https://pkgdown.r-lib.org/articles/quarto.html
Why do you put apis and apps at the top-level but reports in a subdirectory? What do you put in inst?
honestly it's 99% because pre-commit's spell check lives here by default and hasn't been moved elsewhere.
and then one of those new repos evolves to have multiple components, and then those get split into their own repos, and the cycle continues
This is some real Dark Lord Sauron stuff.
I see most people pushing images to prod rather than pulling them in. Just interested in what you do.
In almost all cases, we take the microservices approach where 1 service (app, API, etc.) == 1 repository
Sometimes directory structure. Sometimes separate coordinated packages. Sometimes Docker Compose.
https://github.com/chasemc/presentations/blob/master/RinPharma/rinpharma_aug_2019/chase_RINPHARMA.pdf
There's a top-level project (not sure I'd always use "packages", but bird-feeder could be the app and seeds the API/SDK)
Others much better at this probably have better ideas.