SolDevelo · jkondrat · Jun 24, 2026 · Jun 23, 2026
diff --git a/README.md b/README.md
@@ -85,7 +85,7 @@ CONTAINER_SCANNER=docker-scout
 ### 🔍 Scanner Options
 
 InfraScan offers several scanning modes:
-- **regex** (Fast): Quick cost optimization scan (19 regex rules)
+- **regex** (Fast): Quick cost optimization scan (27 regex rules)
 - **containers**: Container vulnerability scanning (Docker Scout or Grype)
 - **checkov**: IaC Security checks only
 - **comprehensive**: All scanners combined (Cost + Security + Containers)
@@ -160,6 +160,7 @@ docker run --rm -v $(pwd):/scan soldevelo/infrascan --framework kubernetes --sca
   - **Explicit framework**: Scan only that specific framework (terraform, kubernetes, etc.).
 - `-f`, `--include`: Select specific files or directories to scan. Can be used multiple times (e.g., `-f dir1 -f file2.tf`). This is useful in large repositories to avoid scanning redundant or test deployments.
 - `--download-external-modules`: Allow Checkov to download external modules (Terraform/etc)
+- `--traffic-profile`: `auto`, `small`, `medium`, `large` (default: `auto`). Controls usage-based cost assumptions for NAT transfer, CloudWatch log ingestion, Lambda invocations, S3 storage, and API calls. `auto` detects the profile from infra size (EC2/NAT/Lambda/RDS counts). Profiles are defined in `reporter/traffic_profiles.json` and can be edited without code changes.
 - `--fail-on`: Exit code 1 when: `any` findings, `high_critical` findings, specific grade threshold (`grade_a` through `grade_f`), or priority threshold (`priority_critical` through `priority_info`). Fails if the result matches or is worse than the specified criteria.
 
 #### Selective Scanning (Partial Scans)
@@ -286,7 +287,81 @@ InfraScan supports advanced container scanning features:
   - **Other Registries**: Pre-authenticate manually using `docker login` before running InfraScan, and it will use your existing local Docker credentials.
 
 
-## 📊 Grading System
+## 💰 Cost Estimation
+
+InfraScan estimates dollar savings for every finding: instead of static text like "$10-50/month", it computes a before/after cost from a static table of AWS `us-east-1` prices.
+
+### How it works
+
+1. **Pricing table** (`reporter/pricing_table.json`) — static AWS `us-east-1` prices for EC2, RDS, EBS, NAT Gateway, Lambda, API Gateway, CloudWatch, S3, DynamoDB, SQS, Fargate, Kinesis, and more. Updated on each InfraScan release.
+2. **Per-rule savings models** — every COST-* rule has a `savings_fn` that reads the actual HCL config (instance type, volume size, RCU/WCU, etc.) and computes a precise before/after cost.
+3. **Per-resource total cost** — InfraScan also computes the monthly cost of every resource found, giving a total infrastructure cost estimate and a savings-as-%-of-total headline.
+4. **Traffic profile** — usage-based resources (NAT transfer, Lambda invocations, CloudWatch log ingestion) use configurable defaults from `reporter/usage_defaults.json`, scaled by the active traffic profile.
+
+### Traffic profiles
+
+| Profile | NAT transfer/day | CloudWatch log ingestion/mo | Lambda invocations/function/mo | S3 storage |
+|---|---|---|---|---|
+| `small` (auto-detected default for small infra) | 10 GB | 5 GB | 1M | 50 GB |
+| `medium` | 100 GB | 50 GB | 10M | 500 GB |
+| `large` | 1 TB | 500 GB | 100M | 5,000 GB |
+
+The `auto` mode (default) **detects the profile automatically** from the scanned repo: it scores the infra by counting EC2 instances, NAT gateways, load balancers, RDS instances, Lambda functions, and ECS tasks. Large instance types (8xlarge+) add extra weight. No manual flag needed in most cases.
+
+```bash
+# Let InfraScan auto-detect the profile (recommended)
+docker run --rm -v $(pwd):/scan soldevelo/infrascan --scanner regex
+
+# Force a profile when auto-detection doesn't match your actual traffic
+docker run --rm -v $(pwd):/scan soldevelo/infrascan --scanner regex --traffic-profile medium
+```
+
+### Customising defaults
+
+Edit `reporter/usage_defaults.json` or `reporter/traffic_profiles.json` directly — no Python changes needed. This is useful when you know your actual traffic numbers:
+
+```json
+// reporter/usage_defaults.json — Tier 1 baseline assumptions
+{
+  "nat_gb_per_day": 10.0,
+  "lambda_invocations_per_mo": 1000000,
+  ...
+}
+```
+
+### Confidence levels
+
+- 🟢 **high** — derived entirely from config (instance type, volume size, Multi-AZ flag)
+- 🟡 **medium** — requires one usage assumption (invocation count, transfer volume)
+- ⚪ **low** — governance rules with no direct cost delta, or highly variable resources
+
+### PR comments
+
+When running in GitHub Actions with `GITHUB_TOKEN` set, InfraScan posts (or updates) a single PR comment when there are cost savings to act on (i.e., `low_usd_month > 0`) or when security or container findings are present. The comment also includes the top 3 critical/high security findings so reviewers get a full health check in one place:
+
+> **🔍 InfraScan Report**
+>
+> | Metric | Value |
+> |---|---|
+> | Estimated monthly infrastructure cost | **$6,941** |
+> | Potential savings (low) | **$4,999/mo** (72.0%) |
+> | Potential savings (high) | **$5,469/mo** (78.8%) |
+> | Overall grade | **C (71.7%)** |
+>
+> **💰 Top cost savings opportunities**
+> | Rule | File | Saving/month |
+> |---|---|---|
+> | COST-005 | main.tf:46 | $1,415.25 |
+> | COST-027 | main.tf:46 | $270.00 |
+> | COST-012 | main.tf:11 | $587.65–$1,057.77 |
+>
+> **🔒 Top security issues (critical/high)**
+> | Severity | Rule | Location |
+> |---|---|---|
+> | 🔴 CRITICAL | CKV_AWS_8 | ec2.tf:21 |
+> | 🟠 HIGH | CKV_AWS_3 | s3.tf:14 |
+
+## 📊 Grading System
 
 InfraScan provides four separate grades:
 

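Review note on the README's `auto` traffic profile: the text says the profile is chosen by scoring resource counts, with extra weight for 8xlarge+ instance types, but the diff does not show the scoring itself. A minimal sketch of how such a scorer could look follows. The weights, the bonus, the thresholds, and the `detect_profile` name are all hypothetical and are not taken from InfraScan's code.

```python
from typing import Mapping

# Hypothetical weights per resource kind (not InfraScan's real values).
WEIGHTS = {
    "ec2": 2,
    "nat_gateway": 3,
    "load_balancer": 2,
    "rds": 3,
    "lambda": 1,
    "ecs_task": 1,
}
# Hypothetical extra weight for each 8xlarge+ instance.
LARGE_INSTANCE_BONUS = 5
# Hypothetical score thresholds, checked from largest to smallest.
THRESHOLDS = (("large", 50), ("medium", 15))


def detect_profile(counts: Mapping[str, int], large_instances: int = 0) -> str:
    """Map resource counts to 'small', 'medium', or 'large'."""
    score = sum(WEIGHTS.get(kind, 0) * n for kind, n in counts.items())
    score += LARGE_INSTANCE_BONUS * large_instances
    for name, minimum in THRESHOLDS:
        if score >= minimum:
            return name
    return "small"
```

Under these assumed numbers, two EC2 instances and three Lambda functions score 7 and map to `small`, while five EC2 instances plus two RDS instances score 16 and map to `medium`.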
diff --git a/app.py b/app.py
@@ -421,7 +421,8 @@ def clone_repo():
             findings=results,
             resource_count=resource_count,
             scanner_type=scanner_type,
-            extra_recommendations=recommendations
+            extra_recommendations=recommendations,
+            scan_path=temp_dir
         )
 
         # Extract repository name from URL for display
@@ -544,7 +545,8 @@ def scan_repository(repo_url, branch='main', scanner_type='comprehensive', is_pr
             findings=results,
             resource_count=resource_count,
             scanner_type=scanner_type,
-            extra_recommendations=recommendations
+            extra_recommendations=recommendations,
+            scan_path=temp_dir
         )
 
         repo_name = repo_url.rstrip('/').split('/')[-1] if '/' in repo_url else repo_url
@@ -628,7 +630,8 @@ def save_results():
         'cost': data.get('cost'),
         'security': data.get('security'),
         'container': data.get('container'),
-        'analysis': data.get('analysis')
+        'analysis': data.get('analysis'),
+        'metrics': data.get('metrics'),
     }
 
     # Ensure is_private is preserved in metadata

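Review note on the persisted `metrics` key: judging by the `cli.py` changes in this PR, the savings estimate lives under `metrics.savings_estimate` with the fields `low_usd_month`, `high_usd_month`, and `total_infra_cost_usd_month`. The sketch below shows how a consumer of saved results might read it defensively and reproduce the percentages from the README's sample PR comment. The `savings_pct` and `headline` helpers are hypothetical, and the rounding to one decimal place is an assumption.

```python
from typing import Optional


def savings_pct(saving_usd: float, total_usd: Optional[float]) -> Optional[float]:
    """Return the saving as a percentage of total cost, or None if total is unknown."""
    if not total_usd:
        return None
    return round(saving_usd / total_usd * 100, 1)


def headline(saved: dict) -> str:
    """Build a one-line savings summary from a saved results dict."""
    # `metrics` may be missing or None in results saved before this change.
    est = (saved.get("metrics") or {}).get("savings_estimate") or {}
    low = est.get("low_usd_month", 0)
    high = est.get("high_usd_month", 0)
    total = est.get("total_infra_cost_usd_month")
    pct_lo, pct_hi = savings_pct(low, total), savings_pct(high, total)
    if pct_lo is None:
        return f"${low:,.0f}-${high:,.0f}/mo"
    return f"${low:,.0f}-${high:,.0f}/mo ({pct_lo}%-{pct_hi}% of ${total:,.0f}/mo)"
```

With the README's sample figures (total $6,941, savings $4,999 to $5,469), this yields 72.0% and 78.8%, matching the sample comment.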
diff --git a/cli.py b/cli.py
@@ -36,6 +36,55 @@ def send_slack_notification(message: str) -> None:
     except Exception as e:
         print(f"Slack notification error: {e}", file=sys.stderr)
 
+def post_pr_comment(body: str) -> None:
+    """Post (or update) a PR comment via the GitHub REST API."""
+    token      = os.getenv('GITHUB_TOKEN', '').strip()
+    event_path = os.getenv('GITHUB_EVENT_PATH', '').strip()
+    repo       = os.getenv('GITHUB_REPOSITORY', '').strip()
+    if not (token and event_path and repo):
+        return
+    try:
+        with open(event_path, 'r', encoding='utf-8') as f:
+            event = json.load(f)
+        pr_number = (
+            event.get('pull_request', {}).get('number')
+            or event.get('issue', {}).get('number')
+        )
+        if not pr_number:
+            return
+        marker   = '<!-- infrascan-cost-report -->'
+        full_body = f"{marker}\n{body}"
+        api_url  = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
+        headers  = {
+            'Authorization': f'Bearer {token}',
+            'Accept': 'application/vnd.github+json',
+            'X-GitHub-Api-Version': '2022-11-28',
+        }
+        # Look for an existing comment carrying the marker so it is updated instead of duplicated.
+        existing_resp = requests.get(api_url, headers=headers, params={'per_page': 100}, timeout=10)
+        if existing_resp.status_code == 200:
+            for comment in existing_resp.json():
+                if marker in comment.get('body', ''):
+                    patch_url = comment['url']
+                    requests.patch(patch_url, json={'body': full_body}, headers=headers, timeout=10)
+                    return
+        requests.post(api_url, json={'body': full_body}, headers=headers, timeout=10)
+    except Exception as e:
+        print(f"PR comment error: {e}", file=sys.stderr)
+
+
+def write_gh_step_summary(content: str) -> None:
+    """Append *content* to the GitHub Actions step summary file."""
+    summary_path = os.getenv('GITHUB_STEP_SUMMARY', '').strip()
+    if not summary_path:
+        return
+    try:
+        with open(summary_path, 'a', encoding='utf-8') as f:
+            f.write(content + '\n')
+    except Exception as e:
+        print(f"Step summary write error: {e}", file=sys.stderr)
+
+
 def build_gh_actions_context() -> dict:
     """Extract GitHub Actions context from environment variables."""
     repo = os.getenv('GITHUB_REPOSITORY', '')
@@ -116,7 +165,16 @@ def setup_args():
         version=f"InfraScan v{__version__}",
         help="Show version information and exit"
     )
-
+
+    parser.add_argument(
+        "--traffic-profile",
+        choices=["auto", "small", "medium", "large"],
+        default="auto",
+        dest="traffic_profile",
+        help="Usage-based cost scaling profile (default: auto — detected from infra size). "
+             "small=10GB/d NAT, medium=100GB/d, large=1TB/d."
+    )
+
     return parser.parse_args()
 
 def print_text_report(report_dict, resource_count, scanner_type):
@@ -234,6 +292,56 @@ def print_grade_line(name, grade):
     print(f"\n{Fore.CYAN}{Style.BRIGHT}{'=' * 60}\n")
 
 
+def _print_savings_block(report_dict: dict) -> None:
+    """Print the '💰 Estimated Savings' block after the grading summary."""
+    init(autoreset=True)
+    est = (report_dict.get('metrics') or {}).get('savings_estimate')
+    if not est:
+        return
+
+    low   = est.get('low_usd_month', 0)
+    high  = est.get('high_usd_month', 0)
+    total = est.get('total_infra_cost_usd_month')
+    pct_lo_det = est.get('savings_pct_of_detectable_low')
+    pct_hi_det = est.get('savings_pct_of_detectable_high')
+    pct_lo_tot = est.get('savings_pct_of_total_low')
+    pct_hi_tot = est.get('savings_pct_of_total_high')
+    profile    = (report_dict.get('metrics') or {}).get('traffic_profile', 'small')
+    provider   = est.get('cost_provider', 'internal')
+
+    print(f"\n{Fore.GREEN}{Style.BRIGHT}💰 ESTIMATED SAVINGS:")
+    print(f"{'-' * 30}")
+
+    if low == high:
+        print(f"  Potential saving: {Fore.GREEN}{Style.BRIGHT}${low:,.2f}/month{Style.RESET_ALL}")
+    else:
+        print(f"  Potential saving: {Fore.GREEN}{Style.BRIGHT}${low:,.2f} – ${high:,.2f}/month{Style.RESET_ALL}")
+
+    if pct_lo_tot is not None and pct_hi_tot is not None:
+        print(f"  vs total infra cost: {Fore.YELLOW}{pct_lo_tot}% – {pct_hi_tot}%{Style.RESET_ALL}", end='')
+        if total:
+            print(f"  (total: ${total:,.0f}/mo)", end='')
+        print()
+    elif pct_lo_det is not None and pct_hi_det is not None:
+        print(f"  vs detectable resources: {Fore.YELLOW}{pct_lo_det}% – {pct_hi_det}%{Style.RESET_ALL}")
+
+    print(f"  Traffic profile: {profile} | Pricing source: {provider}")
+
+    per = sorted(
+        est.get('per_finding', []),
+        key=lambda f: f.get('saving_high', 0), reverse=True
+    )[:3]
+    if per:
+        print(f"  {Style.BRIGHT}Top opportunities:{Style.RESET_ALL}")
+        for pf in per:
+            s_lo = pf.get('saving_low', 0)
+            s_hi = pf.get('saving_high', 0)
+            saving_str = f"${s_lo:,.2f}" if s_lo == s_hi else f"${s_lo:,.2f}–${s_hi:,.2f}"
+            # Show only the file name; `os` is already imported at module level.
+            fname = os.path.basename(pf.get('file', ''))
+            print(f"    • {pf.get('rule_id', '')}: {saving_str}/mo  ({fname}:{pf.get('line', '')})")
+
+
 def should_fail(args, report_dict, results):
     if not args.fail_on:
         return False
@@ -308,7 +416,9 @@ def main():
             findings=results,
             resource_count=resource_count,
             scanner_type=args.scanner,
-            extra_recommendations=recommendations
+            extra_recommendations=recommendations,
+            scan_path=target_path,
+            traffic_profile=getattr(args, 'traffic_profile', 'auto'),
         )
 
         report_dict = report.to_dict()
@@ -350,8 +460,41 @@ def main():
             # If format is text OR if output is saved to record/html/json
             # always show the text summary in the console
             print_text_report(report_dict, resource_count, args.scanner)
+            _print_savings_block(report_dict)
             if args.out:
                  print(f"{Fore.GREEN}[v] Full {args.format.upper()} report saved to: {Fore.WHITE}{args.out}")
+
+        # Phase 3 — GitHub Actions step summary
+        savings_est = (report_dict.get('metrics') or {}).get('savings_estimate')
+        overall_g   = report_dict.get('overall', {})
+        if savings_est and os.getenv('GITHUB_STEP_SUMMARY'):
+            from reporter.cost_estimator import format_savings_summary_md
+            summary_md = format_savings_summary_md(
+                savings_est,
+                overall_grade=overall_g.get('letter'),
+                overall_pct=overall_g.get('percentage'),
+                security_findings=report_dict.get('findings', {}).get('security', []),
+                container_findings=report_dict.get('findings', {}).get('container', []),
+            )
+            write_gh_step_summary(summary_md)
+
+        # Phase 3: PR comment, posted when there are savings to act on or security/container findings
+        has_savings = savings_est and savings_est.get('low_usd_month', 0) > 0
+        has_security = bool(
+            report_dict.get('findings', {}).get('security') or
+            report_dict.get('findings', {}).get('container')
+        )
+        if (has_savings or has_security) and os.getenv('GITHUB_TOKEN') and os.getenv('GITHUB_EVENT_PATH'):
+            from reporter.cost_estimator import format_savings_summary_md
+            comment_md = format_savings_summary_md(
+                savings_est if has_savings else {},
+                overall_grade=overall_g.get('letter'),
+                overall_pct=overall_g.get('percentage'),
+                security_findings=report_dict.get('findings', {}).get('security', []),
+                container_findings=report_dict.get('findings', {}).get('container', []),
+            )
+            if comment_md:
+                post_pr_comment(comment_md)
 
         # Send Slack notification if configured
         webhook_url = os.getenv('SLACK_WEBHOOK_URL', '').strip()
@@ -386,6 +529,18 @@ def main():
                 lines.append(f"Triggered by: {ctx['actor']}")
             lines.append(f"Grades: {grades_summary}")
             lines.append(f"Findings: {total_findings} | Scanner: {args.scanner}")
+
+            # Cost savings summary (if available)
+            slack_savings = (report_dict.get('metrics') or {}).get('savings_estimate')
+            if slack_savings:
+                s_lo  = slack_savings.get('low_usd_month', 0)
+                s_hi  = slack_savings.get('high_usd_month', 0)
+                total_c = slack_savings.get('total_infra_cost_usd_month')
+                if total_c:
+                    lines.append(f"Infra cost: ~${total_c:,.0f}/mo | Potential savings: ${s_lo:,.0f}–${s_hi:,.0f}/mo")
+                else:
+                    lines.append(f"Potential savings: ${s_lo:,.0f}–${s_hi:,.0f}/mo")
+
             if ctx['run_url']:
                 lines.append(f"<{ctx['run_url']}|View run>")