Initial python script to read json files#3

Open
maitecr wants to merge 7 commits into main from feat/readjson

Conversation

maitecr (Member) commented Aug 15, 2025

Initial Python script to read the JSON files generated by GitHub Actions.

maitecr requested a review from camilamaia August 15, 2025 01:42
maitecr requested review from a team as code owners August 15, 2025 01:42

camilamaia (Member) left a comment

Awesome progress on this PR! I have two extra polish suggestions to keep the repo clean, reproducible, and easy to maintain.

1) Keep generated files out of version control (build in CI instead)

Why:

  • Reduces review noise and merge conflicts (generated diffs are hard to review).
  • Keeps the repo lightweight and avoids stale artifacts.
  • Improves reproducibility: a clean clone + build should recreate outputs exactly.
  • Makes security reviews clearer (we review source, not compiled/minified outputs).

Proposed changes:

  • Add the patterns to .gitignore (example starter):
  • Generate artifacts in CI and publish them as build artifacts or attach to releases instead of committing.

2) Ensure a final newline at the end of files (let your editor add it automatically)

Why:

  • It’s a long-standing POSIX convention; many tools expect it.
  • Prevents noisy diffs and “No newline at end of file” markers.
  • Plays nicer with formatters/linters and multi-file concatenation.

Quick fixes in editors:

  • VS Code: Settings → search Insert Final Newline → enable files.insertFinalNewline.
  • JetBrains (IntelliJ/PyCharm/etc.): Editor → General → Ensure line feed at file end on Save.
  • Vim/Neovim: ensure endofline/fixeol is enabled (:set endofline fixeol).
  • Sublime: Preferences → Settings → "ensure_newline_at_eof_on_save": true.
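
Editor settings only affect files saved from now on. For files already in the branch, a small check can list the ones still missing a final newline. This is a minimal sketch, not part of the PR: the function names are hypothetical, and treating an empty file as acceptable is an assumption.

```python
from pathlib import Path


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a line feed."""
    data = Path(path).read_bytes()
    # Assumption: an empty file has nothing to terminate, so it passes.
    return not data or data.endswith(b"\n")


def files_missing_final_newline(paths):
    """Return the subset of `paths` whose content lacks a final newline."""
    return [str(p) for p in paths if not ends_with_newline(p)]
```

For example, `files_missing_final_newline(Path("scripts").glob("*.py"))` would report any Python file under `scripts/` that still needs the fix.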

TL;DR

  • Let’s ignore generated outputs and build in CI (publish as artifacts/releases).
  • Let’s auto-add a final newline on save.

Comment thread scripts/generate_csv.py Outdated

I have a suggestion to improve the maintainability and readability of this script: consider refactoring the repeated code blocks for loading JSON files and building the data dictionary into functions. This would avoid repetition, make the code easier to update, and simplify testing in the future. One idea of what it could look like:

import json
import csv

STAR_COUNT_FILE = 'stargazerCount.json'
WATCHERS_COUNT_FILE = 'watchersCount.json'
FORK_COUNT_FILE = 'forkCount.json'
LAST_RELEASE_FILE = 'lastReleaseData.json'
CONTRIBUTORS_COUNT_FILE = 'contributorsCount.json'
OPEN_ISSUES_COUNT_FILE = 'openIssuesCount.json'
CLOSED_ISSUES_COUNT_FILE = 'closedIssuesCount.json'
OPEN_PRS_COUNT_FILE = 'openPRsCount.json'
CLOSED_PRS_COUNT_FILE = 'closedPRsCount.json'
COMMUNITY_STANDARDS_FILE = 'communityStandards.json'
SECURITY_FILE = 'security.json'

OUTPUT_FILE = "repo_health_data.csv"

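def load_json(filepath):
    """Return the parsed contents of a single JSON file."""
    with open(filepath) as f:
        return json.load(f)

def build_repo_data():
    """Load each metric file and return one dictionary of repo data."""
    pass  # TODO: call load_json() for each *_FILE constant

def write_csv(data, filename=OUTPUT_FILE):
    """Write the repo data dictionary to a CSV file."""
    pass  # TODO: write the data with the csv module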

if __name__ == "__main__":
    repo_data_dict = build_repo_data()
    write_csv(repo_data_dict)
    print(repo_data_dict)
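
One way the two stubbed functions could be filled in is sketched below. It rests on several assumptions that are not in the PR: the column names (`stars`, `forks`), the `base_dir` and `metric_files` parameters, recording a missing file as `None`, and storing each file's parsed value in a single CSV cell. Only two of the file names from the constants above are used, to keep the example short.

```python
import csv
import json
from pathlib import Path

# Hypothetical mapping of CSV column name -> JSON file name (subset only).
METRIC_FILES = {
    "stars": "stargazerCount.json",
    "forks": "forkCount.json",
}


def load_json(filepath):
    """Return the parsed contents of a single JSON file."""
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


def build_repo_data(base_dir=".", metric_files=METRIC_FILES):
    """Load every metric file and return {column_name: parsed_value}.

    Assumption: a missing file is recorded as None instead of raising.
    """
    data = {}
    for column, filename in metric_files.items():
        path = Path(base_dir) / filename
        data[column] = load_json(path) if path.exists() else None
    return data


def write_csv(data, filename="repo_health_data.csv"):
    """Write the dictionary as a header line plus one data row."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(data))
        writer.writeheader()
        writer.writerow(data)
```

Driving the loop from one mapping means adding a metric is a one-line change, and each function can be tested against a temporary directory.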

You can also create small helper functions, for example one that reads nested attributes from a dict and one that checks whether a key is present, so you can reuse them and make the script easier to read and test.
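
A minimal sketch of such helpers follows. The names `get_nested` and `has_key` are hypothetical, as is the payload shape in the usage note; the real JSON structure produced by the workflow may differ.

```python
def get_nested(data, *keys, default=None):
    """Follow `keys` through nested dicts; return `default` if any step fails."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def has_key(data, key):
    """Return True only when `data` is a dict that contains `key`."""
    return isinstance(data, dict) and key in data
```

With a made-up payload such as `{"repository": {"latestRelease": {"tagName": "v1.0"}}}`, the call `get_nested(payload, "repository", "latestRelease", "tagName")` returns `"v1.0"`, while a path with a missing key returns the default instead of raising `KeyError`.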

__pycache__ directories can be safely added to .gitignore. They contain compiled Python bytecode files (.pyc) that are automatically generated by Python to speed up execution, so they don’t need to be versioned. Keeping them out of the repo helps reduce noise in diffs and avoids unnecessary files being tracked.

To ignore them, you can add this to .gitignore:

__pycache__/
*.py[cod]

path: communityStandards.json

path: ./files/communityStandards.json

Suggested change


- name: Generate JSON File with Repo Data Metric
run: |
run: |

Suggested change
run: |
run: |
