camilamaia left a comment
Awesome progress on this PR! I have two extra polish suggestions to keep the repo clean, reproducible, and easy to maintain.
1) Keep generated files out of version control (build in CI instead)
Why:
- Reduces review noise and merge conflicts (generated diffs are hard to review).
- Keeps the repo lightweight and avoids stale artifacts.
- Improves reproducibility: a clean clone + build should recreate outputs exactly.
- Makes security reviews clearer (we review source, not compiled/minified outputs).
Proposed changes:
- Add the generated-file patterns to `.gitignore`.
- Generate artifacts in CI and publish them as build artifacts or attach them to releases instead of committing them.
2) Ensure a final newline at the end of files (let your editor add it automatically)
Why:
- It’s a long-standing POSIX convention; many tools expect it.
- Prevents noisy diffs and “No newline at end of file” markers.
- Plays nicer with formatters/linters and multi-file concatenation.
Quick fixes in editors:
- VS Code: Settings → search "Insert Final Newline" → enable `files.insertFinalNewline`.
- JetBrains (IntelliJ/PyCharm/etc.): Editor → General → "Ensure line feed at file end on Save".
- Vim/Neovim: ensure `endofline`/`fixeol` is enabled (`:set endofline fixeol`).
- Sublime: Preferences → Settings → `"ensure_newline_at_eof_on_save": true`.
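Editor settings only affect files saved from now on. For files that already lack the final newline, a small script can fix them in bulk. The sketch below is similar in spirit to the `end-of-file-fixer` hook from pre-commit, but it is not that hook's implementation: it only appends `\n` to non-empty files that lack one and does not, for example, collapse extra trailing blank lines. The function name `ensure_final_newline` is hypothetical.

```python
import sys
from pathlib import Path


def ensure_final_newline(path: Path) -> bool:
    """Append a newline to a non-empty file that does not end with one.

    Returns True if the file was modified, False otherwise.
    Works on raw bytes so the file's encoding is left untouched.
    """
    data = path.read_bytes()
    if data and not data.endswith(b"\n"):
        path.write_bytes(data + b"\n")
        return True
    return False


if __name__ == "__main__":
    # Usage: python fix_eof.py file1 file2 ...
    for name in sys.argv[1:]:
        if ensure_final_newline(Path(name)):
            print(f"Added final newline to {name}")
```

Empty files are skipped on purpose, since adding a newline to them would create a one-line file rather than fix a missing terminator.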
TL;DR
- Let’s ignore generated outputs and build in CI (publish as artifacts/releases).
- Let’s auto-add a final newline on save.
I have a suggestion to improve the maintainability and readability of this script: consider refactoring the repeated code blocks that load the JSON files and build the data dictionary into functions. This would avoid repetition, make the code easier to update, and simplify testing in the future. Here is one idea of how it could look:
```python
import csv
import json

# Input files produced by the GitHub Actions workflow
STAR_COUNT_FILE = "stargazerCount.json"
WATCHERS_COUNT_FILE = "watchersCount.json"
FORK_COUNT_FILE = "forkCount.json"
LAST_RELEASE_FILE = "lastReleaseData.json"
CONTRIBUTORS_COUNT_FILE = "contributorsCount.json"
OPEN_ISSUES_COUNT_FILE = "openIssuesCount.json"
CLOSED_ISSUES_COUNT_FILE = "closedIssuesCount.json"
OPEN_PRS_COUNT_FILE = "openPRsCount.json"
CLOSED_PRS_COUNT_FILE = "closedPRsCount.json"
COMMUNITY_STANDARDS_FILE = "communityStandards.json"
SECURITY_FILE = "security.json"
OUTPUT_FILE = "repo_health_data.csv"


def load_json(filepath):
    """Read and parse a single JSON file."""
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


def build_repo_data():
    """Load each metric file with load_json() and collect the values into one dict."""
    pass  # to be filled in


def write_csv(data, filename=OUTPUT_FILE):
    """Write the collected metrics to a CSV file."""
    pass  # to be filled in


if __name__ == "__main__":
    repo_data_dict = build_repo_data()
    write_csv(repo_data_dict)
    print(repo_data_dict)
```

You could also create helper functions to read nested attributes from a dict and to check whether a key is present in it. That way they can be reused, and the code becomes easier to read and test.
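One way the stubs might be filled in, together with a nested-lookup helper along those lines, is sketched below. It is not the PR's actual code. It assumes that each JSON file holds a nested object, that one run produces a single CSV row, and that the key paths in `METRIC_SOURCES` point at the wanted values. The mapping, its column names and key paths, and `get_nested` are all hypothetical, and only two metrics are shown.

```python
from __future__ import annotations

import csv
import json
from typing import Any

# Hypothetical mapping: CSV column -> (JSON file, key path inside that file).
# The real key paths depend on what the workflow writes to each file.
METRIC_SOURCES: dict[str, tuple[str, list[str]]] = {
    "stars": ("stargazerCount.json", ["data", "repository", "stargazerCount"]),
    "forks": ("forkCount.json", ["data", "repository", "forkCount"]),
}


def load_json(filepath: str) -> Any:
    """Read and parse a single JSON file."""
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


def get_nested(data: Any, keys: list[str], default: Any = None) -> Any:
    """Follow `keys` through nested dicts; return `default` if any key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def build_repo_data(
    sources: dict[str, tuple[str, list[str]]] = METRIC_SOURCES,
) -> dict[str, Any]:
    """Build one CSV row: each column is read from its own JSON file."""
    return {
        column: get_nested(load_json(path), keys)
        for column, (path, keys) in sources.items()
    }


def write_csv(data: dict[str, Any], filename: str = "repo_health_data.csv") -> None:
    """Write `data` as a header row followed by a single data row."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(data))
        writer.writeheader()
        writer.writerow(data)
```

Because `build_repo_data` takes the mapping as a parameter, it can be tested against small temporary JSON files instead of the real workflow outputs. Adding a metric then means adding one entry to the mapping rather than another copy of the loading code.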
Co-authored-by: Camila Maia <cmaiacd@gmail.com>
__pycache__ directories can be safely added to .gitignore. They contain compiled Python bytecode files (.pyc) that are automatically generated by Python to speed up execution, so they don’t need to be versioned. Keeping them out of the repo helps reduce noise in diffs and avoids unnecessary files being tracked.
To ignore them, you can add this to .gitignore:
```
__pycache__/
*.py[cod]
```
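To see where these files come from and what the second pattern covers, the snippet below uses `importlib.util.cache_from_source`, which returns the path CPython uses to cache the bytecode of a given source file. It also uses `fnmatch` as an approximation of how Git applies a glob to a single file name. `mypkg/module.py` is a hypothetical path. The `[cod]` character class matches `.pyc`, `.pyo` (optimized bytecode from Python versions before 3.5), and `.pyd` (Windows extension modules), but not `.py` sources.

```python
import fnmatch
import importlib.util

# Path CPython would use for the cached bytecode of a (hypothetical) source file.
# With default settings on POSIX this looks like
# "mypkg/__pycache__/module.cpython-312.pyc"; the tag depends on the interpreter.
cached = importlib.util.cache_from_source("mypkg/module.py")
print(cached)

# fnmatch approximates how Git matches "*.py[cod]" against a file name.
names = ["module.pyc", "module.pyo", "module.pyd", "module.py"]
matches = {name: fnmatch.fnmatch(name, "*.py[cod]") for name in names}
print(matches)
```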
path: communityStandards.json
path: ./files/communityStandards.json
- name: Generate JSON File with Repo Data Metric
  run: |
Initial Python script to read the JSON files generated by GitHub Actions