Missing commit messages and changelogs

Hi all:)
Reviewing many commits to main I noticed lots of commits have either a messy changelog or no message at all.
The ElasticSearch repo had commit messages that really help developers understand what the change was about, yet the new commits leading to the RC1 give little to no info.
I think the flip side is along the lines of this post:

I think no message is really a problem since developers are not keen on skipping between github and the IDE to see what led to the change they are seeing.

Does anyone have proper best practices?
Where should we add those guidelines?

4 Likes

proper git commit messages are a pet-peeve of mine (people hate it if i review their code - i always criticize the commit-msg :laughing:).

here are some very good guidelines on writing commit messages:
my favourite:
https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html

i would probably put these guidelines in a CONTRIBUTING.md file

EDIT: probably the best source of good examples for git commit messages is still the linux kernel (for those who don’t know: git was built for & by the linux kernel community): kernel/git/torvalds/linux.git - Linux kernel source tree
and of course also git itself: Commits · git/git · GitHub

1 Like

First, I’m guessing that this is relevant mostly to OpenSearch or is it an issue in the other repos?

I agree with @ralph about this type of guideline being in CONTRIBUTING.md.

Actually specifying these in the file is not hard, getting folks on-board is going to be harder depending how prescriptive the guidelines are.

The other place for this the PR template but my experience is that check boxes are often just a formality that no one thinks about, especially if it involves messing with existing commits.

1 Like

+1 for this, the only issue is that this is a project-wide issue and shouldn’t be limited to just the OpenSearch repo imo. Should we have some general place for guidelines that involve the whole project?

Also prevalent in opensearch-dashboards and many of the plugins.

When a project has a one liner as the issue description, and a commit message that doesn’t even reference that issue that is resolves - it could be considered gatekeeping, since newcomers won’t know what is going on in the project.

1 Like

So, the one CONTRIBUTING.md to rule them all is challenging. There are just so many differences between languages and repos to make it relevant across all the repos. Not that there shouldn’t be some harmony between the guidelines though.

I tend to agree that erring on the side of verbosity is a net-positive for getting other contributors up-to-speed on the project.

It sounds like there is some really specific occurrences that are being referenced. Has there been comments made on these occurrences?

I made comments on a PR recently, but commenting on the commits after they are committed seems redundant. Also, I am pretty sure only a maintainer could prevent a commit from being merged with an improper message. Policing this is not fun, contributors should be aware that maintaining these messages is in their best interest as well.

I honestly did not even think to check this as it is a very basic and important practice. I just noticed this the other day, checked the last 10+ commits in several repos and 7 times out of 10 the commit message is either missing or a mess.

By “a mess” I mean messages that are comprised of all the commit messages of the PR appended one after the other. This is obvious since the sign-off is layered in.

I would not say this is at all specific, if anything, it is wide-spread. I just never thought developers would submit code without a proper message.

Having said that, if a commit message is redundant because the title is descriptive enough, then perhaps the change is too small for its own PR.

Do you think it best then to add this to the different projects as a new issue, and then each repo’s maintainers would resolve the issue in their own CONTRIBUTING.md?

This seems like a great meta issue. E.g. find a sample of commit and/or PR messages which have insufficient detail and link these in the meta issue - describe what you would have like to have seen.

I’m personally a big believer in addressing behaviour from specific to broad instead of the other way around. The impact is only large enough to affect the greatest change.

Once there is a recommendation, yeah, I think it would have to be on each CONTRIBUTING.

The reason I refrained from doing that so far is to avoid any form of singling out anyone.
I will do my best to frame it in a way that is constructive though:)

1 Like

I get that’s a little hard, but it’s really no different than pointing out a bug in the code :slight_smile:

1 Like

I’m wondering if perhaps proposing a flow where all commits are squashed and rebased prior to merge would make sense. Something like this could be added to the contributing MD file: Always Squash and Rebase your Git Commits

And we could include the squash and rebase step as part of the checklist in the PR template (Along with the DCO check) - That would ensure PR’s have clean, changelog ready commit messages.

Edit: If there was a good commit message as part of the PR template, this could be used by the maintainers to do a “Squash and Merge” from Github, and omit some of the rebase shenanigans which can also be tricky

2 Likes

100% agreed. We should do better.

In the past, I have been held to standard of every commit being reviewable on a PR. If the commit is invalid it should have been squashed prior to the PR being raised or if the changes were requested then the PR should have an interactive rebased. Even if the plan is to have it squashed, there shouldn’t be broken commits out for review.

Descriptions are a must and the commit message should be 50 characters. We should treat commits like a story and if a commit has a vague commit message and no description we have to open up the commit and figure it out. The context might be lost at the point and retracking it will be hard. 50 characters commit message because tools that depend on the 50 characters for best experience.

We can’t amend committed messages with no messages with rewriting history but we should do better to hold this standard going forward. Just like you said,

it could be considered gatekeeping , since newcomers won’t know what is going on in the project.

I think now that we are in the stabilization phases I think this standard will easier to keep track.

3 Likes

Opened Commit messages should be checked as part of the Pull Request · Issue #851 · opensearch-project/OpenSearch · GitHub to continue the discussion over there:)

1 Like

i disagree on that one: i’m a big fan of small, simple, atomic commits. it can happen that if you’re working on something you see something else which is super small but unrelated to what you’re working on => this can become its own commit (obviously) and own PR (because it’s unrelated it shouldn’t be bundled together as otherwise it’ll take longer for it to get merged and it might be confusing as a causation or correlation is presumed where there is none).

i’m very much against squashing multiple commits into one. on the contrary: i’m a firm believer in atomic commits (first search result as a random link for those who’re not familiar with it (note: i didn’t read the whole page, but it looks like an ok description at first glance): Make Atomic Git Commits | Aleksandr Hovhannisyan).
i continuously do interactive rebases when i’m working on things, but not to squash everything into one commit but rather to amend existing commits (usually by using fixup or squash), re-ordering commits (so that they’re logically grouped and i can open separate PRs if i can already split something out ahead of the whole topic). i sometimes also create separate branches, cherry-pick commits there and then create a PR from there - and after that PR has been merged i can rebase my “real” working branch again.

obviously i only do the force-pushes when working on a feature branch alone.

in general i’m also a firm believer in keeping feature-branches short-lived (also avoids all the rebasing/merging in the master branch). thus it’s important to do atomic commits which are well documented: this makes it easier to git-revert them again if for some reason the feature needs to be removed again. and of course the proper commit message is needed to understand if this is actually part of the feature or something else which must be kept in.

and maybe i got a bit off-topic now… i seem to have a tendency to do that, sorry :smiley:
but i do believe that the topics “atomic commits”, “proper commit messages”, “clean commit history”, git-revert, etc. are interlinked.

1 Like

Although atomic commits can be great, this slows down the development velocity because you must constantly think about the context of your change - as you said:

From my experience, it is really hard for many developers to work in an orderly fashion with atomic commits, which may add friction to onboarding. If it were a widespread practice then I probably wouldn’t say so.

I agree with you that some commits warrant a none message, for instance, a spelling fix in a doc, or a rename etc. You are correct that bunching things up when they are unrelated issues can be bad and should be avoided.