I get to propose a variable naming convention. Give me your "I'll die on this hill" opinions about variable names #databs #rstats - ThreadSky

bharrap.bsky.social • 1 day ago

I get to propose a variable naming convention.

Give me your "I'll die on this hill" opinions about variable names #databs #rstats

Comments

christopherjarvis.bsky.social•1 day ago

Variables must be named after the author in numerical order: chris, chris1, chris2, chris3. If you need to make a new one earlier on, then add a letter: chris1a

flemn8r.bsky.social•1 day ago

Names can be too long as well as too short

chr1sw.bsky.social•1 day ago

x, y, z are very underrated! But a comment at definition can be helpful

betasci.bsky.social•1 day ago

Variables should include units when appropriate, like "start_time_utc" or "elapsed_duration_min".

Abbreviations should include what convention they follow like "fips_state".

Foreign key ids should be labeled as [foreign_table_name]_id.

All of this ensures the context is immediately understood.
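As a rough illustration of why the [foreign_table_name]_id pattern helps, here is a minimal Python sketch that recovers the referenced table from the column name alone. The column `customer_id` and the helper `foreign_table` are made up for this example and are not from the comment above.

```python
def foreign_table(column_name):
    """Return the table a '<table>_id' column points at, or None if it is not a key."""
    suffix = "_id"
    if column_name.endswith(suffix) and len(column_name) > len(suffix):
        return column_name[: -len(suffix)]
    return None


# The first three names come from the comment; customer_id is hypothetical.
columns = ["start_time_utc", "elapsed_duration_min", "fips_state", "customer_id"]

foreign_keys = {}
for name in columns:
    table = foreign_table(name)
    if table is not None:
        foreign_keys[name] = table

print(foreign_keys)  # {'customer_id': 'customer'}
```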

bharrap.bsky.social•1 day ago

Love it!

rshean.bsky.social•1 day ago

Absolutely no dot.case.variables in R, I know #rstats allows it, but it looks way too much like object.method() oop stuff in other languages. Even though I started with R before switching to #python, it still confuses me at first glance when I see it. (1/10)

healthstatsdude.bsky.social•14 hours ago

and this is exactly why i use = in place of <-

😎

rshean.bsky.social•1 day ago

I don't like javaScriptStyleVars because iThinkItsHardToRead. Good general rule of thumb is don't take best practices for anything from #JavaScript 🤣 (kidding, but only sorta. It's WILD some of the things js will accept and execute without erroring out) (2/10)

rshean.bsky.social•1 day ago

I like_under_scores, because they're easy_to_read. I don't buy the complaint about them being too much effort to type. I would say ease of reading is waaay more valuable than avoiding using the shift key.
(3/10)

rshean.bsky.social•1 day ago

Don't do what I did in my thesis and name variables after random farm animals like cow, sheep1, sheep2 because it's the middle of the night and you think it's funny. Use descriptive names so you can figure out what you were doing the next morning... (4/10)

rshean.bsky.social•1 day ago

...Doesn't matter if your advisor only looks at the outputs and not the code. YOU'LL need to look at the code when your advisor doesn't like the outputs and you have to edit the code. (5/10)

rshean.bsky.social•1 day ago

Not exactly a naming convention, but related to variable naming: if you have a billion intermediate variables in the global environment, consider writing some functions, using pipes, or overwriting variables instead of adding df_23 to the 22 other intermediate dfs. (6/10)

bharrap.bsky.social•1 day ago

Yessssss underscores for life

econmaett.bsky.social•19 hours ago

100% agree

jonathankitt.bsky.social•14 hours ago

Choose wisely the order of words in the variables, to get the most out of auto-completion: "bill_length", "bill_depth" rather than "length_bill", "depth_bill".
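A small Python sketch of the auto-completion point, assuming completion is a plain prefix match (real editors may match more loosely). The `flipper_length` names and the `complete` helper are hypothetical additions for the example.

```python
def complete(prefix, names):
    """Mimic prefix-based auto-completion: return the names starting with prefix."""
    return sorted(name for name in names if name.startswith(prefix))


# Same measurements, two word orders.
thing_first = ["bill_length", "bill_depth", "flipper_length"]
measure_first = ["length_bill", "depth_bill", "length_flipper"]

# Typing "bill" finds both bill columns only when the shared word comes first.
print(complete("bill", thing_first))    # ['bill_depth', 'bill_length']
print(complete("bill", measure_first))  # []
```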

rambeaux.bsky.social•1 day ago

You should be brutally consistent in applying a specific_to_general or general_to_specific ordering convention!

Similar to how adjectives must be in the same order in natural language.

rambeaux.bsky.social•1 day ago

It means you will be able to scan code and mentally map related variables by prefix_context_suffix with less cognitive overhead.

lucas.meyerperin.org•1 day ago

All caps. For example, if an artist released an album…

ANALBUMRELEASED

indy.bsky.social•1 day ago

Does this variable name explain enough to me when I come back in a year's time?

brueckmann.bsky.social•1 day ago

📌

emilyriederer.bsky.social•1 day ago

Dataset column names or script names?

My hills I’ll die on: https://www.emilyriederer.com/post/column-name-contracts/

elyobo.bsky.social•1 day ago

Reasonable hill selection!

bharrap.bsky.social•1 day ago

Columns! Thanks for the link I'll read it at work tomorrow 😊

emilyriederer.bsky.social•1 day ago

To summarize (I do tend to ramble), I like to break down properties of my data into tiers described by keywords and do like

{tier 1}_{tier 2}_{etc}

I’m somewhat agnostic to the ordering of those, but the post has some examples of how it makes things like autocomplete or bulk aggregation easy

bharrap.bsky.social•1 day ago

Yes this is part of my plan!

I'm thinking {survey wave}_{topic}_{abbreviated question wording}_{optional identifier for free text/other/question sets}
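One way a scheme like this could be checked by machine is sketched below in Python. It assumes a numeric wave prefix such as w2_, a topic with no underscores, and that everything after the topic (question wording plus any optional identifier) is kept as one piece. The pattern and the example name are assumptions for illustration, not the actual convention.

```python
import re

# Assumed layout: w<wave>_<topic>_<question wording and optional identifier>
NAME_PATTERN = re.compile(
    r"^w(?P<wave>\d+)_(?P<topic>[a-z0-9]+)_(?P<question>[a-z0-9_]+)$"
)


def parse_name(name):
    """Split a column name into wave, topic and question parts, or raise ValueError."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return {
        "wave": int(match["wave"]),
        "topic": match["topic"],
        "question": match["question"],
    }


# Hypothetical column name, made up for this sketch.
print(parse_name("w2_health_smoke_daily_txt"))
```

A check like this also shows where the scheme is ambiguous: the optional identifier cannot be told apart from the question wording unless it comes from a fixed list of suffixes.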

francismarkham.work•1 day ago

If you're doing longitudinal data, the HILDA convention of having a single-letter prefix is quite handy: it means you can drop the first letter if you want to lose it.

But not sure what they'll do in wave 27...

bharrap.bsky.social•1 day ago

I hate letters for wave prefixes... Just write w1_, w2_ etc.

Much easier to code with, much more explicit, and I don't have to keep going "hmmm what number is j again?"
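To make the letter-versus-number trade-off concrete, here is a hedged Python sketch that converts a single-letter wave prefix into a w1_, w2_ style prefix. It assumes a is wave 1, b is wave 2 and so on; the variable name `jincome` is invented for the example.

```python
import string


def letter_to_wave(letter):
    """Map a single lowercase wave letter to its wave number (a -> 1, b -> 2, ...)."""
    return string.ascii_lowercase.index(letter) + 1


def to_numbered(name):
    """Rewrite a letter-prefixed name such as 'jincome' as 'w10_income'."""
    return f"w{letter_to_wave(name[0])}_{name[1:]}"


print(letter_to_wave("j"))     # 10
print(to_numbered("jincome"))  # w10_income
```

With only 26 letters the lookup has nothing to return for a 27th wave, while a numeric prefix keeps going.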

gdeejay.bsky.social•1 day ago

I've been thinking of this a lot recently as I want to add to my prefix convention.

I saw somebody say they always name their functions in a verb-like fashion to describe what they do.

I'd love to hear people's ideas!

gdeejay.bsky.social•1 day ago

Hadley's object syntax is snake case using nouns for variable names and verbs for functions:
https://style.tidyverse.org/syntax.html

gdeejay.bsky.social•1 day ago

I kind of think good variable naming needs to always be related to the object name.

For instance, a variable named 'age' will have a different meaning for a dataframe named 'dta_antiques' vs. 'dta_students'.

Obvious as hell, but I didn't see guides make this point.

bharrap.bsky.social•1 day ago

At the moment I'm planning to encode survey wave (it's longitudinal), topic, and question description in the name.

My priority right now is making the names human readable (within reason) and to avoid excessive reference to the dictionary

gdeejay.bsky.social•1 day ago

Have you considered using variable descriptions as well?

I haven't mastered managing them, but I need to as they're pretty handy.

bharrap.bsky.social•1 day ago

Do you mean the variable type? Like strings, factors etc?

gdeejay.bsky.social•1 day ago

I searched for examples and didn't find much. This was kind of interesting:
https://thenewstack.io/best-practices-for-naming-variables-what-the-research-shows/

gdeejay.bsky.social•1 day ago

Oh, and this is my prefix convention:

gdeejay.bsky.social•1 day ago

Oh no!

wpball.com•1 day ago

Boring, but it's a bit of a balance between human- and machine-readability for me.

I dislike very cryptic naming, or where you need to refer to a separate document to decipher what is going on, but that can result in long column names!

wpball.com•1 day ago

I'm an underscore_variable kinda guy and I arbitrarily dislike ALL_CAPS

bharrap.bsky.social•1 day ago

Yes the _ is key for me! Makes it really easy to work with variable names in functions like pivot_wider()
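The reshaping benefit can be sketched in plain Python: with one consistent separator, a wide record splits into long rows with a single split call. This goes in the wide-to-long direction and is only an illustration of the idea, not how pivot_wider() works internally. The column names and values are made up.

```python
def wide_to_long(record, sep="_"):
    """Turn {'w1_income': 100, ...} into one row per wave/variable pair."""
    rows = []
    for name, value in record.items():
        # Split on the first separator only, so the variable part may contain more.
        wave, variable = name.split(sep, 1)
        rows.append({"wave": wave, "variable": variable, "value": value})
    return rows


# Hypothetical wide record for one respondent.
wide = {"w1_income": 100, "w2_income": 120, "w2_job_title": "nurse"}

for row in wide_to_long(wide):
    print(row)
```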

danolner.bsky.social•1 day ago

I like to have absolutely no consistency whatsoever and change between all possible conventions within the same script. If I remember to think about it, separating with a period is just easier to type. I also like to replace ends of words with a "z" to avoid command names, added bonus of being cute.

bharrap.bsky.social•1 day ago

I'm going to have to take all these shitpost responses and produce a "what not to do" guide

johntormerod.bsky.social•1 day ago

I use Hungarian notation... mX for matrix X, vx for vector x. Have used it for two decades. I know it is weird, but I like it.

chr1sw.bsky.social•1 day ago

#NoMoreCamelCase

bharrap.bsky.social•1 day ago

Absolutely not!

themeepone.bsky.social•1 day ago

Variable name length should be proportional to (log of) its scope length.
