From f81fb13a751641d4f9df681009d9f7878a1e477e Mon Sep 17 00:00:00 2001 From: Jakub Kondrat Date: Tue, 23 Jun 2026 15:47:02 +0200 Subject: [PATCH] feat(cost): implement cost estimation --- README.md | 79 +- app.py | 9 +- cli.py | 159 ++- docs/COST_ESTIMATION.md | 160 +++ reporter/cost_estimator.py | 1751 ++++++++++++++++++++++++++++++++ reporter/grading.py | 70 +- reporter/pricing_table.json | 153 +++ reporter/traffic_profiles.json | 24 + reporter/usage_defaults.json | 23 + rules/definitions.py | 6 +- scripts/update_pricing.py | 460 +++++++++ static/app.js | 238 ++++- static/pdf_generator.js | 51 +- static/style.css | 187 ++++ 14 files changed, 3323 insertions(+), 47 deletions(-) create mode 100644 docs/COST_ESTIMATION.md create mode 100644 reporter/cost_estimator.py create mode 100644 reporter/pricing_table.json create mode 100644 reporter/traffic_profiles.json create mode 100644 reporter/usage_defaults.json create mode 100755 scripts/update_pricing.py diff --git a/README.md b/README.md index e2f01ba..b7d9b3c 100644 --- a/README.md +++ b/README.md @@ -85,7 +85,7 @@ CONTAINER_SCANNER=docker-scout ### πŸ” Scanner Options InfraScan offers several scanning modes: -- **regex** (Fast): Quick cost optimization scan (19 regex rules) +- **regex** (Fast): Quick cost optimization scan (27 regex rules) - **containers**: Container vulnerability scanning (Docker Scout or Grype) - **checkov**: IaC Security checks only - **comprehensive**: All scanners combined (Cost + Security + Containers) @@ -160,6 +160,7 @@ docker run --rm -v $(pwd):/scan soldevelo/infrascan --framework kubernetes --sca - **Explicit framework**: Scan only that specific framework (terraform, kubernetes, etc.). - `-f`, `--include`: Select specific files or directories to scan. Can be used multiple times (e.g., `-f dir1 -f file2.tf`). This is useful in large repositories to avoid scanning redundant or test deployments. 
- `--download-external-modules`: Allow Checkov to download external modules (Terraform/etc) +- `--traffic-profile`: `auto`, `small`, `medium`, `large` (default: `auto`). Controls usage-based cost assumptions for NAT transfer, CloudWatch log ingestion, Lambda invocations, S3 storage, and API calls. `auto` detects the profile from infra size (EC2/NAT/Lambda/RDS counts). Profiles are defined in `reporter/traffic_profiles.json` and can be edited without code changes. - `--fail-on`: Exit code 1 when: `any` findings, `high_critical` findings, specific grade threshold (`grade_a` through `grade_f`), or priority threshold (`priority_critical` through `priority_info`). Fails if the result matches or is worse than the specified criteria. #### Selective Scanning (Partial Scans) @@ -286,7 +287,81 @@ InfraScan supports advanced container scanning features: - **Other Registries**: Pre-authenticate manually using `docker login` before running InfraScan, and it will use your existing local Docker credentials. -## πŸ“Š Grading System +## οΏ½ Cost Estimation + +InfraScan calculates actual dollar savings for every finding β€” not just static text like "$10-50/month", but a computed before/after cost derived from real AWS pricing. + +### How it works + +1. **Pricing table** (`reporter/pricing_table.json`) β€” static AWS `us-east-1` prices for EC2, RDS, EBS, NAT Gateway, Lambda, API Gateway, CloudWatch, S3, DynamoDB, SQS, Fargate, Kinesis, and more. Updated on each InfraScan release. +2. **Per-rule savings models** β€” every COST-* rule has a `savings_fn` that reads the actual HCL config (instance type, volume size, RCU/WCU, etc.) and computes a precise before/after cost. +3. **Per-resource total cost** β€” InfraScan also computes the monthly cost of every resource found, giving a total infrastructure cost estimate and a savings-as-%-of-total headline. +4. 
**Traffic profile** β€” usage-based resources (NAT transfer, Lambda invocations, CW log ingestion) use configurable defaults from `reporter/usage_defaults.json`, scaled by the active traffic profile. + +### Traffic profiles + +| Profile | NAT transfer/day | CW log ingestion/mo | Lambda invocations/function/mo | S3 storage | +|---|---|---|---|---| +| `small` (auto-detected default for small infra) | 10 GB | 5 GB | 1M | 50 GB | +| `medium` | 100 GB | 50 GB | 10M | 500 GB | +| `large` | 1 TB | 500 GB | 100M | 5,000 GB | + +The `auto` mode (default) **detects the profile automatically** from the scanned repo: it scores the infra by counting EC2 instances, NAT gateways, load balancers, RDS instances, Lambda functions, and ECS tasks. Large instance types (8xlarge+) add extra weight. No manual flag needed in most cases. + +```bash +# Let InfraScan auto-detect the profile (recommended) +docker run --rm -v $(pwd):/scan soldevelo/infrascan --scanner regex + +# Force a profile when auto-detection doesn't match your actual traffic +docker run --rm -v $(pwd):/scan soldevelo/infrascan --scanner regex --traffic-profile medium +``` + +### Customising defaults + +Edit `reporter/usage_defaults.json` or `reporter/traffic_profiles.json` directly β€” no Python changes needed. This is useful when you know your actual traffic numbers: + +```json +// reporter/usage_defaults.json β€” Tier 1 baseline assumptions +{ + "nat_gb_per_day": 10.0, + "lambda_invocations_per_mo": 1000000, + ... +} +``` + +### Confidence levels + +- 🟒 **high** β€” derived entirely from config (instance type, volume size, Multi-AZ flag) +- 🟑 **medium** β€” requires one usage assumption (invocation count, transfer volume) +- βšͺ **low** β€” governance rules with no direct cost delta, or highly variable resources + +### PR comments + +When running in GitHub Actions with `GITHUB_TOKEN` set, InfraScan posts a comment on the PR **only when there are actual cost savings to act on** (i.e., `low_usd_month > 0`). 
The comment also includes the top 3 critical/high security findings so reviewers get a full health check in one place: + +> **πŸ” InfraScan Report** +> +> | Metric | Value | +> |---|---| +> | Estimated monthly infrastructure cost | **$6,941** | +> | Potential savings (low) | **$4,999/mo** (72.0%) | +> | Potential savings (high) | **$5,469/mo** (78.8%) | +> | Overall grade | **C (71.7%)** | +> +> **πŸ’° Top cost savings opportunities** +> | Rule | File | Saving/month | +> |---|---|---| +> | COST-005 | main.tf:46 | $1,415.25 | +> | COST-027 | main.tf:46 | $270.00 | +> | COST-012 | main.tf:11 | $587.65–$1,057.77 | +> +> **πŸ”’ Top security issues (critical/high)** +> | Severity | Rule | Location | +> |---|---|---| +> | πŸ”΄ CRITICAL | CKV_AWS_8 | ec2.tf:21 | +> | 🟠 HIGH | CKV_AWS_3 | s3.tf:14 | + +## οΏ½πŸ“Š Grading System InfraScan provides four separate grades: diff --git a/app.py b/app.py index 1fad99b..c8ccd41 100644 --- a/app.py +++ b/app.py @@ -421,7 +421,8 @@ def clone_repo(): findings=results, resource_count=resource_count, scanner_type=scanner_type, - extra_recommendations=recommendations + extra_recommendations=recommendations, + scan_path=temp_dir ) # Extract repository name from URL for display @@ -544,7 +545,8 @@ def scan_repository(repo_url, branch='main', scanner_type='comprehensive', is_pr findings=results, resource_count=resource_count, scanner_type=scanner_type, - extra_recommendations=recommendations + extra_recommendations=recommendations, + scan_path=temp_dir ) repo_name = repo_url.rstrip('/').split('/')[-1] if '/' in repo_url else repo_url @@ -628,7 +630,8 @@ def save_results(): 'cost': data.get('cost'), 'security': data.get('security'), 'container': data.get('container'), - 'analysis': data.get('analysis') + 'analysis': data.get('analysis'), + 'metrics': data.get('metrics'), } # Ensure is_private is preserved in metadata diff --git a/cli.py b/cli.py index ff89680..4be0f6c 100755 --- a/cli.py +++ b/cli.py @@ -36,6 +36,55 @@ def 
send_slack_notification(message: str) -> None: except Exception as e: print(f"Slack notification error: {e}", file=sys.stderr) +def post_pr_comment(body: str) -> None: + """Post (or update) a PR comment via the GitHub REST API.""" + token = os.getenv('GITHUB_TOKEN', '').strip() + event_path = os.getenv('GITHUB_EVENT_PATH', '').strip() + repo = os.getenv('GITHUB_REPOSITORY', '').strip() + if not (token and event_path and repo): + return + try: + with open(event_path, 'r', encoding='utf-8') as f: + event = json.load(f) + pr_number = ( + event.get('pull_request', {}).get('number') + or event.get('issue', {}).get('number') + ) + if not pr_number: + return + marker = '' + full_body = f"{marker}\n{body}" + api_url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments" + headers = { + 'Authorization': f'Bearer {token}', + 'Accept': 'application/vnd.github+json', + 'X-GitHub-Api-Version': '2022-11-28', + } + # Check for an existing comment with the marker to update rather than post duplicate. 
+ existing_resp = requests.get(api_url, headers=headers, timeout=10) + if existing_resp.status_code == 200: + for comment in existing_resp.json(): + if marker in comment.get('body', ''): + patch_url = comment['url'] + requests.patch(patch_url, json={'body': full_body}, headers=headers, timeout=10) + return + requests.post(api_url, json={'body': full_body}, headers=headers, timeout=10) + except Exception as e: + print(f"PR comment error: {e}", file=sys.stderr) + + +def write_gh_step_summary(content: str) -> None: + """Append *content* to the GitHub Actions step summary file.""" + summary_path = os.getenv('GITHUB_STEP_SUMMARY', '').strip() + if not summary_path: + return + try: + with open(summary_path, 'a', encoding='utf-8') as f: + f.write(content + '\n') + except Exception as e: + print(f"Step summary write error: {e}", file=sys.stderr) + + def build_gh_actions_context() -> dict: """Extract GitHub Actions context from environment variables.""" repo = os.getenv('GITHUB_REPOSITORY', '') @@ -116,7 +165,16 @@ def setup_args(): version=f"InfraScan v{__version__}", help="Show version information and exit" ) - + + parser.add_argument( + "--traffic-profile", + choices=["auto", "small", "medium", "large"], + default="auto", + dest="traffic_profile", + help="Usage-based cost scaling profile (default: auto β€” detected from infra size). " + "small=10GB/d NAT, medium=100GB/d, large=1TB/d." 
+ ) + return parser.parse_args() def print_text_report(report_dict, resource_count, scanner_type): @@ -234,6 +292,56 @@ def print_grade_line(name, grade): print(f"\n{Fore.CYAN}{Style.BRIGHT}{'=' * 60}\n") +def _print_savings_block(report_dict: dict) -> None: + """Print the 'πŸ’° Estimated Savings' block after the grading summary.""" + init(autoreset=True) + est = (report_dict.get('metrics') or {}).get('savings_estimate') + if not est: + return + + low = est.get('low_usd_month', 0) + high = est.get('high_usd_month', 0) + total = est.get('total_infra_cost_usd_month') + pct_lo_det = est.get('savings_pct_of_detectable_low') + pct_hi_det = est.get('savings_pct_of_detectable_high') + pct_lo_tot = est.get('savings_pct_of_total_low') + pct_hi_tot = est.get('savings_pct_of_total_high') + profile = (report_dict.get('metrics') or {}).get('traffic_profile', 'small') + provider = est.get('cost_provider', 'internal') + + print(f"\n{Fore.GREEN}{Style.BRIGHT}πŸ’° ESTIMATED SAVINGS:") + print(f"{'-' * 30}") + + if low == high: + print(f" Potential saving: {Fore.GREEN}{Style.BRIGHT}${low:,.2f}/month{Style.RESET_ALL}") + else: + print(f" Potential saving: {Fore.GREEN}{Style.BRIGHT}${low:,.2f} – ${high:,.2f}/month{Style.RESET_ALL}") + + if pct_lo_tot is not None and pct_hi_tot is not None: + print(f" vs total infra cost: {Fore.YELLOW}{pct_lo_tot}% – {pct_hi_tot}%{Style.RESET_ALL}", end='') + if total: + print(f" (total: ${total:,.0f}/mo)", end='') + print() + elif pct_lo_det is not None: + print(f" vs detectable resources: {Fore.YELLOW}{pct_lo_det}% – {pct_hi_det}%{Style.RESET_ALL}") + + print(f" Traffic profile: {profile} | Pricing source: {provider}") + + per = sorted( + est.get('per_finding', []), + key=lambda f: f.get('saving_high', 0), reverse=True + )[:3] + if per: + print(f" {Style.BRIGHT}Top opportunities:{Style.RESET_ALL}") + for pf in per: + s_lo = pf.get('saving_low', 0) + s_hi = pf.get('saving_high', 0) + saving_str = f"${s_lo:,.2f}" if s_lo == s_hi else 
f"${s_lo:,.2f}–${s_hi:,.2f}" + import os as _os + fname = _os.path.basename(pf.get('file', '')) + print(f" β€’ {pf.get('rule_id', '')}: {saving_str}/mo ({fname}:{pf.get('line', '')})") + + def should_fail(args, report_dict, results): if not args.fail_on: return False @@ -308,7 +416,9 @@ def main(): findings=results, resource_count=resource_count, scanner_type=args.scanner, - extra_recommendations=recommendations + extra_recommendations=recommendations, + scan_path=target_path, + traffic_profile=getattr(args, 'traffic_profile', 'auto'), ) report_dict = report.to_dict() @@ -350,8 +460,41 @@ def main(): # If format is text OR if output is saved to record/html/json # always show the text summary in the console print_text_report(report_dict, resource_count, args.scanner) + _print_savings_block(report_dict) if args.out: print(f"{Fore.GREEN}[v] Full {args.format.upper()} report saved to: {Fore.WHITE}{args.out}") + + # Phase 3 β€” GitHub Actions step summary + savings_est = (report_dict.get('metrics') or {}).get('savings_estimate') + overall_g = report_dict.get('overall', {}) + if savings_est and os.getenv('GITHUB_STEP_SUMMARY'): + from reporter.cost_estimator import format_savings_summary_md + summary_md = format_savings_summary_md( + savings_est, + overall_grade=overall_g.get('letter'), + overall_pct=overall_g.get('percentage'), + security_findings=report_dict.get('findings', {}).get('security', []), + container_findings=report_dict.get('findings', {}).get('container', []), + ) + write_gh_step_summary(summary_md) + + # Phase 3 β€” PR comment: only when there are actual savings to act on + has_savings = savings_est and savings_est.get('low_usd_month', 0) > 0 + has_security = bool( + report_dict.get('findings', {}).get('security') or + report_dict.get('findings', {}).get('container') + ) + if (has_savings or has_security) and os.getenv('GITHUB_TOKEN') and os.getenv('GITHUB_EVENT_PATH'): + from reporter.cost_estimator import format_savings_summary_md + comment_md = 
format_savings_summary_md( + savings_est if has_savings else {}, + overall_grade=overall_g.get('letter'), + overall_pct=overall_g.get('percentage'), + security_findings=report_dict.get('findings', {}).get('security', []), + container_findings=report_dict.get('findings', {}).get('container', []), + ) + if comment_md: + post_pr_comment(comment_md) # Send Slack notification if configured webhook_url = os.getenv('SLACK_WEBHOOK_URL', '').strip() @@ -386,6 +529,18 @@ def main(): lines.append(f"Triggered by: {ctx['actor']}") lines.append(f"Grades: {grades_summary}") lines.append(f"Findings: {total_findings} | Scanner: {args.scanner}") + + # Cost savings summary (if available) + slack_savings = (report_dict.get('metrics') or {}).get('savings_estimate') + if slack_savings: + s_lo = slack_savings.get('low_usd_month', 0) + s_hi = slack_savings.get('high_usd_month', 0) + total_c = slack_savings.get('total_infra_cost_usd_month') + if total_c: + lines.append(f"Infra cost: ~${total_c:,.0f}/mo | Potential savings: ${s_lo:,.0f}–${s_hi:,.0f}/mo") + else: + lines.append(f"Potential savings: ${s_lo:,.0f}–${s_hi:,.0f}/mo") + if ctx['run_url']: lines.append(f"<{ctx['run_url']}|View run>") diff --git a/docs/COST_ESTIMATION.md b/docs/COST_ESTIMATION.md new file mode 100644 index 0000000..c556078 --- /dev/null +++ b/docs/COST_ESTIMATION.md @@ -0,0 +1,160 @@ +# Cost Estimation + +InfraScan estimates monthly AWS infrastructure cost directly from Terraform source files β€” +no credentials, no network access, no external tools required. + +Two things are calculated for each scan: + +- **Savings estimate** β€” how much could be saved by fixing the cost findings InfraScan flagged +- **Total infrastructure cost** β€” a bottom-up monthly cost for every billable resource in the repo + +--- + +## Savings estimate + +Each cost rule (`COST-001`, `COST-004`, etc.) fires on a specific Terraform resource. 
When it +does, InfraScan calculates what that configuration decision costs versus the recommended +alternative. + +The report shows a **range** (`$low – $high / month`) rather than a single number. For +straightforward optimisations the two ends are equal (you know exactly what you'll save). For +uncertain ones, `low` is set to `$0` to avoid overstating the benefit. + +Rules where `low = $0`: + +| Rule | Why savings are uncertain | +|---|---| +| COST-005: NAT Gateway exists | The gateway may be necessary; eliminating it requires routing all traffic that currently uses it through VPC endpoints, which may not be feasible | +| COST-007: DynamoDB provisioned capacity | Switching to on-demand billing can cost *more* under sustained load | +| COST-027: Missing VPC endpoints | The fraction of NAT traffic that goes to S3/DynamoDB is unknown from config alone | + +For rules that represent a fleet-wide change (COST-012, EC2 Spot instances), the saving is +counted once for the entire account, not multiplied per instance. + +When multiple rules fire on the same resource, their combined savings are capped at that +resource's estimated monthly cost, so you can never save more than the resource costs. 
+ +### Supported cost rules + +| Rule | What it detects | Confidence | +|---|---|---| +| COST-001 | EC2 previous-generation instance (upgrade to current gen) | High | +| COST-004 | io1/io2 EBS volume that could use gp3 | High | +| COST-005 | NAT Gateway present (consider VPC endpoints) | Low | +| COST-006 | Unassociated Elastic IP | High | +| COST-007 | DynamoDB provisioned capacity (consider on-demand) | Low | +| COST-008 | EC2 detailed monitoring enabled | High | +| COST-009 | gp2 EBS volume (upgrade to gp3) | High | +| COST-012 | No Spot instances in the fleet | Medium | +| COST-014 | Unnecessary Route53 health checks | High | +| COST-015 | CloudWatch Log group with no retention policy | Medium | +| COST-020 | RDS previous-generation instance | High | +| COST-021 | Lambda function over-provisioned memory | Medium | +| COST-022 | API Gateway REST API (consider HTTP API) | Medium | +| COST-024 | RDS Multi-AZ in what looks like a non-production environment | High | +| COST-025 | ECS task definition with no CPU/memory limits | Low | +| COST-026 | Multiple NAT Gateways in the same VPC | High | +| COST-027 | S3/DynamoDB traffic routed through NAT instead of VPC endpoints | Low | + +Rules COST-003, COST-010, COST-011, COST-017, and COST-023 are governance signals with no +direct cost delta; they show `$0` in the savings column. + +--- + +## Total infrastructure cost + +Every resource block in the Terraform files is matched against a list of supported resource +types. For each one, InfraScan calculates two figures: + +- **Min cost**: fixed charges only, assuming zero traffic. This is the floor: what you pay + just for the resource existing. +- **Expected cost**: fixed charges plus estimated usage at the detected traffic profile. + +For resources that are purely pay-per-use (Lambda, CloudWatch Logs), min cost is `$0`. 
+ +### Traffic profiles + +Because usage-based charges (data transfer, invocations, storage GB) aren't visible in +Terraform config, InfraScan uses defaults that scale with the apparent size of the +infrastructure. + +It auto-detects one of three profiles: + +| Profile | Rough signal | +|---|---| +| **Small** | A handful of services, no large instances | +| **Medium** | Dozens of services, a few NAT gateways or load balancers | +| **Large** | Many EC2 instances, large instance types, significant data infrastructure | + +The profile sets total environment-level defaults (e.g. total NAT data transfer per day), which +are then divided evenly across the relevant resource count so each resource gets its own +share. + +Pricing comes from a static table (`us-east-1`, on-demand Linux, updated per release). Reserved +Instance or Savings Plan discounts are not modelled. + +### Supported resource types + +| Resource type | What's priced | +|---|---| +| `aws_instance` | On-demand instance + root EBS volume | +| `aws_db_instance` | RDS on-demand (doubled for Multi-AZ) | +| `aws_rds_cluster_instance` | Aurora on-demand | +| `aws_ebs_volume` | Storage GB + provisioned IOPS if io1/io2 | +| `aws_nat_gateway` | Hourly charge + data processing | +| `aws_eip` | Hourly charge when unattached | +| `aws_lb` / `aws_alb` / `aws_elb` | Hourly base + LCU charge from estimated data | +| `aws_lambda_function` | GB-seconds + request count | +| `aws_cloudwatch_log_group` | Ingestion + storage (based on retention setting) | +| `aws_cloudwatch_metric_alarm` | Per-alarm flat rate | +| `aws_cloudwatch_dashboard` | Per-dashboard flat rate | +| `aws_s3_bucket` | Standard storage GB | +| `aws_dynamodb_table` | Provisioned RCU/WCU (on-demand tables show $0 fixed) | +| `aws_ecs_task_definition` | Fargate vCPU + memory | +| `aws_eks_cluster` | Control plane hourly charge | +| `aws_elasticache_cluster` | Node type × node count | +| `aws_elasticache_replication_group` | Node type × node count 
| +| `aws_msk_cluster` | Broker type × broker count | +| `aws_opensearch_domain` / `aws_elasticsearch_domain` | Instance type × node count | +| `aws_redshift_cluster` | Node type × node count | +| `aws_secretsmanager_secret` | Per-secret monthly fee + API requests | +| `aws_vpc_endpoint` (Interface) | Hourly charge + data processing | +| `aws_vpc_endpoint` (Gateway) | Free | +| `aws_wafv2_web_acl` | Per-ACL flat rate | +| `aws_sfn_state_machine` | State transitions | +| `aws_api_gateway_rest_api` | Per-request pricing | +| `aws_apigatewayv2_api` | Per-request pricing | +| `aws_kinesis_stream` | Per-shard hourly | +| `aws_route53_health_check` | Per-check flat rate | +| `aws_route53_zone` | Per-zone flat rate | +| `aws_kms_key` | Per-key flat rate | +| `aws_efs_file_system` | Standard storage GB | +| `aws_ec2_transit_gateway` | Hourly (per attachment) | +| `aws_ec2_transit_gateway_vpc_attachment` | Hourly + data processing | +| `aws_ecr_repository` | Image storage GB | +| `aws_cloudfront_distribution` | HTTPS requests + transfer out | + +Resources that are free or configuration-only (IAM roles and policies, VPCs, subnets, security +groups, route tables, ACM certificates, Route53 records, SSM parameters, and many others) are +counted as covered but contribute $0 to the total. + +--- + +## Limitations + +- **Terraform variable references** (`var.*`, `each.value.*`, `jsondecode(...)`) are not + evaluated. When an `instance_type` or `instance_class` attribute uses a variable reference + instead of a literal string, InfraScan attempts to infer the value by scanning other blocks + in the same repo (locals, variable declarations) for: + - Explicit assignments: `instance_type = "t3.medium"` + - Variable `default` values that look like EC2/RDS types: `default = "t3.medium"` + If types are found, the **average price** across all candidates is used and confidence is + lowered to `Low`. If no types can be inferred (e.g. 
when sizing is passed in as a JSON blob + via `var.*` from the caller), a conservative floor of **$50/mo** per EC2 instance and + **$100/mo** per RDS instance is used. +- All `.tf` files are scanned recursively, including local module directories. Remote modules + (sourced from a registry or a URL) are only included if they have already been downloaded + into `.terraform/`. +- Prices are for `us-east-1` on-demand. Other regions and purchasing options are not modelled. +- Inter-AZ and internet egress costs are not included except where they're part of a specific + resource's charge (NAT gateway data processing, TGW attachment data, CloudFront transfer out). diff --git a/reporter/cost_estimator.py b/reporter/cost_estimator.py new file mode 100644 index 0000000..8a65cca --- /dev/null +++ b/reporter/cost_estimator.py @@ -0,0 +1,1751 @@ +""" +Cost estimation module for InfraScan. +""" + +import os +import re +import json +import logging +from collections import defaultdict +from dataclasses import dataclass, asdict +from typing import Callable, Dict, List, Optional + +logger = logging.getLogger(__name__) + +# ── Pricing ────────────────────────────────────────────────────────────────── + +def load_pricing() -> dict: + """Load pricing_table.json bundled with the reporter package.""" + path = os.path.join(os.path.dirname(__file__), "pricing_table.json") + with open(path, "r", encoding="utf-8") as f: + return json.load(f) + + +# ── Data classes ───────────────────────────────────────────────────────────── + +@dataclass +class SavingsResult: + saving_low_usd: float + saving_high_usd: float + before_usd: float + after_usd: float + assumptions: list + confidence: str # "high" | "medium" | "low" + + +@dataclass +class ResourceCost: + resource_type: str + resource_name: str + file: str + line: int + fixed_usd_month: float + usage_usd_month: float + min_usd_month: float # guaranteed floor: fixed charges only (zero usage) + total_usd_month: float # point estimate: fixed + 
expected usage + assumptions: list + confidence: str # "high" | "medium" | "low" + + +# ── Usage defaults and traffic profiles (loaded from JSON) ─────────────────── + +def load_usage_defaults() -> Dict[str, float]: + """Load usage_defaults.json bundled with the reporter package.""" + path = os.path.join(os.path.dirname(__file__), "usage_defaults.json") + with open(path, "r", encoding="utf-8") as f: + return json.load(f) + + +def load_traffic_profiles() -> Dict[str, Dict[str, float]]: + """Load traffic_profiles.json bundled with the reporter package.""" + path = os.path.join(os.path.dirname(__file__), "traffic_profiles.json") + with open(path, "r", encoding="utf-8") as f: + return json.load(f) + + +# Module-level constants: loaded once at import time; importable by callers. +USAGE_DEFAULTS: Dict[str, float] = load_usage_defaults() +TRAFFIC_PROFILES: Dict[str, Dict[str, float]] = load_traffic_profiles() + + +# ── Block extraction ───────────────────────────────────────────────────────── + +def extract_all_blocks(scan_path: str) -> Dict[str, List[dict]]: + """ + Walk all .tf files under *scan_path* and extract resource blocks. + + Returns a dict keyed by resource_type, each value being a list of block + dicts with keys: name, file, start_line, content, first_line, + resource_type. 
+ """ + blocks: Dict[str, List[dict]] = {} + for root, _dirs, files in os.walk(scan_path): + for fname in files: + if not fname.endswith(".tf"): + continue + fpath = os.path.join(root, fname) + try: + with open(fpath, "r", encoding="utf-8", errors="replace") as fh: + content = fh.read() + except OSError as exc: + logger.warning("Failed to read %s: %s", fpath, exc) + continue + _extract_blocks_from_content(content, fpath, blocks) + return blocks + + +def _extract_blocks_from_content( + content: str, filepath: str, blocks: Dict[str, List[dict]] +) -> None: + """Append parsed resource blocks from *content* into *blocks*.""" + lines = content.splitlines() + i = 0 + while i < len(lines): + match = re.match( + r'\s*resource\s+["\']([^"\']+)["\']\s+["\']([^"\']+)["\']', + lines[i], + ) + if match: + resource_type = match.group(1) + resource_name = match.group(2) + start_line = i + 1 # 1-based + block_lines = [lines[i]] + brace_count = lines[i].count("{") - lines[i].count("}") + i += 1 + while i < len(lines) and brace_count > 0: + block_lines.append(lines[i]) + brace_count += lines[i].count("{") - lines[i].count("}") + i += 1 + blocks.setdefault(resource_type, []).append( + { + "name": resource_name, + "file": filepath, + "start_line": start_line, + "content": "\n".join(block_lines), + "first_line": lines[start_line - 1].strip(), + "resource_type": resource_type, + } + ) + continue + i += 1 + + +def _paths_match(block_file: str, filepath: str) -> bool: + """Compare two file paths that may be absolute vs relative. + + ``block_file`` is always an absolute path (from ``extract_all_blocks``). + ``filepath`` may be a relative path (normalised by ``scan_directory``). + """ + if block_file == filepath: + return True + # Relative path is a suffix of the absolute path after a separator. 
+ sep = "/" + return block_file.endswith(sep + filepath) or block_file.replace("\\", "/").endswith(sep + filepath.replace("\\", "/")) + + +def _find_block( + blocks: Dict[str, List[dict]], filepath: str, line: int +) -> Optional[dict]: + """Return the block that spans *line* in *filepath*, or None.""" + for _rtype, blist in blocks.items(): + for block in blist: + if not _paths_match(block["file"], filepath): + continue + block_line_count = block["content"].count("\n") + 1 + end_line = block["start_line"] + block_line_count - 1 + if block["start_line"] <= line <= end_line: + return block + return None + + +# ── Traffic profile auto-detection ─────────────────────────────────────────── + +def detect_traffic_profile(blocks: Dict[str, List[dict]]) -> str: + """ + Infer *small* / *medium* / *large* traffic profile from infra block counts. + + Heuristics: + - Number of EC2 instances, Lambda functions, RDS instances, NAT gateways, + load balancers, ECS task definitions. + - Presence of very large instance types (8xlarge+). + + Returns one of: ``"small"``, ``"medium"``, ``"large"``. 
+ """ + instance_count = len(blocks.get("aws_instance", [])) + lambda_count = len(blocks.get("aws_lambda_function", [])) + nat_count = len(blocks.get("aws_nat_gateway", [])) + lb_count = ( + len(blocks.get("aws_lb", [])) + + len(blocks.get("aws_alb", [])) + + len(blocks.get("aws_elb", [])) + ) + ecs_count = len(blocks.get("aws_ecs_task_definition", [])) + rds_count = len(blocks.get("aws_db_instance", [])) + + large_instances = sum( + 1 + for b in blocks.get("aws_instance", []) + if re.search( + r'instance_type\s*=\s*["\'][^"\']*\.(8xlarge|10xlarge|12xlarge|16xlarge|24xlarge|metal)', + b.get("content", ""), + ) + ) + + score = ( + instance_count * 2 + + lambda_count * 1 + + nat_count * 8 + + lb_count * 4 + + ecs_count * 2 + + rds_count * 3 + + large_instances * 15 + ) + + if score >= 50: + return "large" + if score >= 15: + return "medium" + return "small" + + +def scale_usage_defaults( + usage: dict, profile: str, blocks: Dict[str, List[dict]] +) -> dict: + """ + Return a new usage dict with Tier 2 profile overrides applied. + + ``nat_gb_per_day`` in the profiles represents the *total* estimated daily + NAT transfer for the whole environment. It scales up with compute (more + instances → more egress), then is divided by the number of NAT gateways + so each gateway gets an equal per-gateway share. This prevents the + per-gateway cost estimate from exploding when many parallel gateways are + defined for multi-AZ redundancy. 
+ """ + scaled = dict(usage) + profile_overrides = TRAFFIC_PROFILES.get(profile, TRAFFIC_PROFILES["small"]) + scaled.update(profile_overrides) + + instance_count = len(blocks.get("aws_instance", [])) + len( + blocks.get("aws_ecs_task_definition", []) + ) + nat_count = max(1, len(blocks.get("aws_nat_gateway", []))) + s3_count = max(1, len(blocks.get("aws_s3_bucket", []))) + lambda_count = max(1, len(blocks.get("aws_lambda_function", []))) + apigw_count = max(1, + len(blocks.get("aws_api_gateway_rest_api", [])) + + len(blocks.get("aws_apigatewayv2_api", [])) + ) + ep_count = max(1, len(blocks.get("aws_vpc_endpoint", []))) + tgw_count = max(1, len(blocks.get("aws_ec2_transit_gateway_vpc_attachment", []))) + lb_count = max(1, + len(blocks.get("aws_lb", [])) + + len(blocks.get("aws_alb", [])) + + len(blocks.get("aws_elb", [])) + ) + + # Total egress scales with compute; each gateway receives an equal share. + nat_scale = max(1.0, instance_count / 5.0) + total_nat_gb = scaled["nat_gb_per_day"] * nat_scale + scaled["nat_gb_per_day"] = round(total_nat_gb / nat_count, 2) + + # S3 / Lambda / API GW: profile value is the *total* across all resources; divide + # evenly so each bucket / function / API gets a per-resource estimate. + scaled["s3_gb_standard"] = round(scaled["s3_gb_standard"] / s3_count, 2) + scaled["lambda_invocations_per_mo"] = max(1, scaled["lambda_invocations_per_mo"] // lambda_count) + scaled["api_calls_per_mo"] = max(1, scaled["api_calls_per_mo"] // apigw_count) + + # Usage-based data params: divide total environment estimate by resource count. 
+ if "vpc_endpoint_data_gb_per_mo" in scaled: + scaled["vpc_endpoint_data_gb_per_mo"] = round(scaled["vpc_endpoint_data_gb_per_mo"] / ep_count, 2) + if "tgw_data_processed_gb_per_mo" in scaled: + scaled["tgw_data_processed_gb_per_mo"] = round(scaled["tgw_data_processed_gb_per_mo"] / tgw_count, 2) + if "lb_data_processed_gb" in scaled: + scaled["lb_data_processed_gb"] = round(scaled["lb_data_processed_gb"] / lb_count, 2) + + return scaled + + +# ── EC2 / RDS upgrade maps ──────────────────────────────────────────────────── + +_EC2_UPGRADE_MAP: Dict[str, str] = { + "t2.nano": "t3.nano", + "t2.micro": "t3.micro", + "t2.small": "t3.small", + "t2.medium": "t3.medium", + "t2.large": "t3.large", + "t2.xlarge": "t3.xlarge", + "t2.2xlarge": "t3.2xlarge", + "m3.medium": "m5.large", + "m3.large": "m5.large", + "m4.large": "m5.large", + "m4.xlarge": "m5.xlarge", + "m4.2xlarge": "m5.2xlarge", + "m4.4xlarge": "m5.4xlarge", + "m4.10xlarge": "m5.8xlarge", + "c4.large": "c5.large", + "c4.xlarge": "c5.xlarge", + "c4.2xlarge": "c5.2xlarge", + "c4.4xlarge": "c5.2xlarge", + "r3.large": "r5.large", + "r3.xlarge": "r5.xlarge", + "r3.2xlarge": "r5.2xlarge", + "r3.4xlarge": "r5.2xlarge", + "r4.large": "r5.large", + "r4.xlarge": "r5.xlarge", + "r4.2xlarge": "r5.2xlarge", + "r4.4xlarge": "r5.2xlarge", +} + +_RDS_UPGRADE_MAP: Dict[str, str] = { + "db.t2.micro": "db.t3.micro", + "db.t2.small": "db.t3.small", + "db.t2.medium": "db.t3.medium", + "db.t2.large": "db.t3.large", + "db.t2.xlarge": "db.t3.xlarge", + "db.t2.2xlarge": "db.t3.2xlarge", + "db.m3.medium": "db.m5.large", + "db.m3.large": "db.m5.large", + "db.m4.large": "db.m5.large", + "db.m4.xlarge": "db.m5.xlarge", + "db.m4.2xlarge": "db.m5.2xlarge", + "db.m4.4xlarge": "db.m5.4xlarge", + "db.m4.10xlarge": "db.m5.12xlarge", + "db.r3.large": "db.r5.large", + "db.r3.xlarge": "db.r5.xlarge", + "db.r3.2xlarge": "db.r5.2xlarge", + "db.r3.4xlarge": "db.r5.4xlarge", + "db.r3.8xlarge": "db.r5.8xlarge", + "db.r4.large": "db.r5.large", + 
"db.r4.xlarge": "db.r5.xlarge", + "db.r4.2xlarge": "db.r5.2xlarge", + "db.r4.4xlarge": "db.r5.4xlarge", + "db.r4.8xlarge": "db.r5.8xlarge", + "db.r4.16xlarge": "db.r5.16xlarge", +} + + +# ── Phase 1 β€” per-rule savings functions ───────────────────────────────────── + +def _savings_cost001(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-001 EC2 old-gen β†’ new-gen upgrade.""" + ec2 = pricing.get("ec2_instances", {}) + m = re.search(r'instance_type\s*=\s*["\']([^"\']+)["\']', block_content) + if not m: + return SavingsResult(0.0, 0.0, 0.0, 0.0, [], "low") + inst = m.group(1).strip() + before = ec2.get(inst, 0.0) + if before == 0.0: + # Instance type not in pricing table β€” use a conservative floor so the + # calculated saving is shown instead of falling back to static text. + before = 100.0 + conf = "low" + else: + conf = "high" + new_inst = _EC2_UPGRADE_MAP.get(inst) + if new_inst: + after = ec2.get(new_inst, before * 0.85) + else: + after = before * 0.85 # fallback: ~15% saving for unmapped old-gen + saving = max(0.0, before - after) + return SavingsResult( + saving, saving, before, after, + [f"{inst} β†’ {new_inst or 'new-gen'} on-demand us-east-1"], + conf, + ) + + +def _savings_cost004(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-004 io1/io2 β†’ gp3.""" + ebs = pricing.get("ebs_per_gb_month", {}) + iops_pricing = pricing.get("ebs_iops_per_iops_month", {}) + size_m = re.search(r'(?:volume_size|size)\s*=\s*(\d+)', block_content) + iops_m = re.search(r'\biops\s*=\s*(\d+)', block_content) + vol_m = re.search(r'(?:volume_type|type)\s*=\s*["\']([^"\']+)["\']', block_content) + size = int(size_m.group(1)) if size_m else 100 + iops = int(iops_m.group(1)) if iops_m else 3000 + vol_type = vol_m.group(1) if vol_m else "io1" + before = size * 0.125 + iops * iops_pricing.get("io1", 0.065) + after = size * ebs.get("gp3", 0.08) + saving = max(0.0, before - after) + conf = "high" if (size_m and iops_m) else "medium" 
+ return SavingsResult( + saving, saving, before, after, + [f"{vol_type} {size}GB {iops}IOPS → gp3"], + conf, + ) + + +def _savings_cost005(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-005 NAT Gateway — uncertain removal: may be required by workloads.""" + hourly = pricing.get("nat_gateway_hourly", 0.045) + per_gb = pricing.get("nat_gateway_per_gb", 0.045) + gb_day = usage.get("nat_gb_per_day", 10.0) + before = round(hourly * 730 + per_gb * gb_day * 30, 2) + return SavingsResult( + 0.0, before, # low=0: gateway may be required; high: full cost if replaced by VPC endpoints + before, 0.0, + [ + f"$0.045/hr×730h + $0.045/GB×{gb_day}GB/d×30d", + "low=$0: may be necessary; high: full replacement with VPC endpoints", + ], + "low", + ) + + +def _savings_cost006(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-006 Unassociated Elastic IP.""" + hourly = pricing.get("eip_unattached_per_hour", 0.005) + before = hourly * 730 + return SavingsResult( + before, before, before, 0.0, + ["$0.005/hr × 730h/mo unattached EIP charge"], + "high", + ) + + +def _savings_cost007(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-007 DynamoDB provisioned → on-demand — uncertain: on-demand may cost more under sustained load.""" + per_rcu_hr = pricing.get("dynamodb_per_rcu_hour", 0.00013) + per_wcu_hr = pricing.get("dynamodb_per_wcu_hour", 0.00065) + rcu_m = re.search(r'read_capacity\s*=\s*(\d+)', block_content) + wcu_m = re.search(r'write_capacity\s*=\s*(\d+)', block_content) + rcu = int(rcu_m.group(1)) if rcu_m else 5 + wcu = int(wcu_m.group(1)) if wcu_m else 5 + before = (rcu * per_rcu_hr + wcu * per_wcu_hr) * 730 + idle_pct = usage.get("dynamo_idle_pct", 0.70) + saving = before * idle_pct * idle_pct + after = before - saving + return SavingsResult( + 0.0, saving, # low=0: on-demand can cost more under sustained load + before, after, + [ + f"RCU={rcu} WCU={wcu}, assumed {int(idle_pct*100)}% idle
both dimensions", + "low=$0: on-demand billing can exceed provisioned under steady load", + ], + "low", + ) + + +def _savings_cost008(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-008 EC2 detailed monitoring — fixed $2.10/instance/month.""" + before = 2.10 + return SavingsResult( + before, before, before, 0.0, + ["fixed $2.10/instance/month for detailed monitoring"], + "high", + ) + + +def _savings_cost009(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-009 gp2 → gp3 EBS volume.""" + ebs = pricing.get("ebs_per_gb_month", {}) + size_m = re.search(r'volume_size\s*=\s*(\d+)', block_content) + size = int(size_m.group(1)) if size_m else 20 + before = size * ebs.get("gp2", 0.10) + after = size * ebs.get("gp3", 0.08) + saving = before - after + conf = "high" if size_m else "medium" + assumptions = [f"{size}GB gp2→gp3 ($0.02/GB/mo saving)"] + if not size_m: + assumptions.append("volume_size not found, assumed 20 GB") + return SavingsResult(saving, saving, before, after, assumptions, conf) + + +def _savings_cost014(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-014 Route53 health check.""" + before = pricing.get("route53_health_check_per_month", 0.50) + return SavingsResult( + before, before, before, 0.0, + ["$0.50/health-check/month"], + "high", + ) + + +def _savings_cost015(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-015 CloudWatch Logs without retention.""" + per_gb = pricing.get("cloudwatch_logs_per_gb_stored", 0.03) + gb_mo = usage.get("cw_log_gb_per_month", 5.0) + # Without retention: logs accumulate indefinitely — model as 12 months stored. + # With 30-day retention: ~1 month stored at any time.
+ before = per_gb * gb_mo * 12 + after = per_gb * gb_mo * 1 + saving = before - after + return SavingsResult( + saving, saving, before, after, + [f"{gb_mo}GB/mo ingested, no-retention≈12mo stored vs 30-day≈1mo stored"], + "medium", + ) + + +def _savings_cost020(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-020 RDS old-gen → new-gen upgrade.""" + rds = pricing.get("rds_instances", {}) + m = re.search(r'instance_class\s*=\s*["\']([^"\']+)["\']', block_content) + if not m: + return SavingsResult(0.0, 0.0, 0.0, 0.0, [], "low") + inst = m.group(1).strip() + before = rds.get(inst, 0.0) + if before == 0.0: + # Instance type not in pricing table — use a conservative floor so the + # calculated saving is shown instead of falling back to static text. + before = 100.0 + conf = "low" + else: + conf = "high" + new_inst = _RDS_UPGRADE_MAP.get(inst) + after = rds.get(new_inst, before * 0.85) if new_inst else before * 0.85 + saving = max(0.0, before - after) + return SavingsResult( + saving, saving, before, after, + [f"{inst} → {new_inst or 'new-gen'} on-demand us-east-1"], + conf, + ) + + +def _savings_cost021(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-021 Lambda over-provisioned memory.""" + per_gb_sec = pricing.get("lambda_per_gb_second", 0.0000166667) + per_1m_req = pricing.get("lambda_per_1m_requests", 0.20) + invocations = usage.get("lambda_invocations_per_mo", 1_000_000) + duration_ms = usage.get("lambda_avg_duration_ms", 200.0) + after_mb = usage.get("lambda_memory_after_mb", 1024.0) + + mem_m = re.search(r'memory_size\s*=\s*(\d+)', block_content) + before_mb = float(mem_m.group(1)) if mem_m else 3008.0 + + def _lambda_cost(mem_mb: float) -> float: + gb_seconds = (mem_mb / 1024) * (duration_ms / 1000) * invocations + return per_gb_sec * gb_seconds + per_1m_req * (invocations / 1_000_000) + + before = _lambda_cost(before_mb) + after = _lambda_cost(after_mb) + saving = max(0.0, before - after) + return
SavingsResult( + saving, saving, before, after, + [ + f"{int(before_mb)}MB → {int(after_mb)}MB", + f"{invocations/1e6:.0f}M invocations/mo, {duration_ms}ms avg", + ], + "medium", + ) + + +def _savings_cost022(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-022 API Gateway REST → HTTP API.""" + rest_per_1m = pricing.get("api_gw_rest_per_1m_calls", 3.50) + http_per_1m = pricing.get("api_gw_http_per_1m_calls", 1.00) + calls_mo = usage.get("api_calls_per_mo", 1_000_000) + before = rest_per_1m * calls_mo / 1_000_000 + after = http_per_1m * calls_mo / 1_000_000 + saving = before - after + return SavingsResult( + saving, saving, before, after, + [f"{calls_mo/1e6:.0f}M calls/mo REST→HTTP API ($3.50→$1.00 per 1M)"], + "medium", + ) + + +def _savings_cost023(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-023 SQS max retention — governance flag, no direct cost delta.""" + return SavingsResult( + 0.0, 0.0, 0.0, 0.0, + ["SQS retention has no direct per-GB cost; governance signal only"], + "low", + ) + + +def _savings_cost024(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-024 RDS Multi-AZ in non-production environment.""" + rds = pricing.get("rds_instances", {}) + m = re.search(r'instance_class\s*=\s*["\']([^"\']+)["\']', block_content) + if not m: + hints = usage.get("_var_hints", {}) + inst = hints.get("instance_class") + if not inst: + return SavingsResult(0.0, 0.0, 0.0, 0.0, [], "low") + else: + inst = m.group(1).strip() + instance_cost = rds.get(inst, 100.0) + # Multi-AZ doubles instance cost; disabling it saves one replica.
+ before = instance_cost * 2 + after = instance_cost + saving = instance_cost + return SavingsResult( + saving, saving, before, after, + [f"{inst} multi_az=false saves one replica (${instance_cost:.2f}/mo)"], + "high", + ) + + +def _savings_cost025(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-025 ECS task definition without CPU/memory limits.""" + vcpu_hr = pricing.get("ecs_fargate_per_vcpu_hour", 0.04048) + gb_hr = pricing.get("ecs_fargate_per_gb_hour", 0.004445) + over_pct = usage.get("ecs_overprovisioning_pct", 0.25) + # Minimum viable Fargate task: 0.25 vCPU, 0.5 GB. + min_task_cost = (vcpu_hr * 0.25 + gb_hr * 0.5) * 730 + before = min_task_cost * (1 + over_pct) + after = min_task_cost + saving = min_task_cost * over_pct + return SavingsResult( + saving, saving, before, after, + [ + f"min Fargate task 0.25vCPU/0.5GB (${min_task_cost:.2f}/mo)", + f"assumed {int(over_pct*100)}% over-provisioning waste", + ], + "low", + ) + + +def _savings_cost026(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-026 Multiple NAT Gateways — saving = cost of one extra gateway.""" + hourly = pricing.get("nat_gateway_hourly", 0.045) + before = hourly * 730 # cost of one redundant NAT GW (data excl.)
+ return SavingsResult( + before, before, before, 0.0, + ["$0.045/hr × 730h/mo per extra NAT GW (data charges excluded)"], + "high", + ) + + +def _savings_cost027(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + """COST-027 Missing VPC Endpoints for S3/DynamoDB — uncertain: actual S3/Dynamo share unknown.""" + per_gb = pricing.get("nat_gateway_per_gb", 0.045) + gb_day = usage.get("nat_gb_per_day", 10.0) + s3_dynamo_pct = usage.get("s3_dynamo_pct_of_nat", 0.20) + s3_dynamo_gb_day = gb_day * s3_dynamo_pct + before = per_gb * s3_dynamo_gb_day * 30 + return SavingsResult( + 0.0, before, # low=0: actual S3/DynamoDB share of NAT traffic is unknown + before, 0.0, + [ + f"S3/DynamoDB={int(s3_dynamo_pct*100)}% of {gb_day}GB/d NAT traffic", + f"= {s3_dynamo_gb_day:.1f}GB/d × $0.045/GB × 30d", + "low=$0: actual traffic split to S3/DynamoDB is unobservable from config", + ], + "low", + ) + + +def _savings_zero(label: str) -> Callable: + """Factory for governance rules with no direct cost delta.""" + def _fn(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + return SavingsResult(0.0, 0.0, 0.0, 0.0, [label], "low") + return _fn + + +def _savings_cost012_factory(total_ec2_cost: float, inferred: bool = False) -> Callable: + """COST-012 Spot instances — fleet-level saving (counted once, not per instance).""" + def _fn(block_content: str, pricing: dict, usage: dict) -> SavingsResult: + low = total_ec2_cost * 0.50 + high = total_ec2_cost * 0.90 + assumptions = [ + f"total EC2 on-demand ${total_ec2_cost:.2f}/mo", + "Spot instances save 50–90%", + ] + if inferred: + assumptions.append("instance_type unresolvable (variable) — using inferred price") + return SavingsResult( + low, high, + total_ec2_cost, total_ec2_cost * 0.10, + assumptions, + "low" if inferred else "medium", + ) + return _fn + + +# Rules that represent a fleet-level optimisation rather than a per-resource one.
+# estimate_savings will only count the first finding for each of these. +_FLEET_LEVEL_RULES = {"COST-012"} + + +# Base SAVINGS_MODELS registry (COST-012 is built dynamically in estimate_savings). +SAVINGS_MODELS: Dict[str, Callable] = { + "COST-001": _savings_cost001, + "COST-003": _savings_zero("security risk — encrypting EBS has no cost delta"), + "COST-004": _savings_cost004, + "COST-005": _savings_cost005, + "COST-006": _savings_cost006, + "COST-007": _savings_cost007, + "COST-008": _savings_cost008, + "COST-009": _savings_cost009, + "COST-010": _savings_zero("S3 lifecycle saving depends on object churn; no static estimate"), + "COST-011": _savings_zero("AWS Budget is a governance control; no direct cost delta"), + "COST-014": _savings_cost014, + "COST-015": _savings_cost015, + "COST-017": _savings_zero("CUR is a governance control; no direct cost delta"), + "COST-020": _savings_cost020, + "COST-021": _savings_cost021, + "COST-022": _savings_cost022, + "COST-023": _savings_cost023, + "COST-024": _savings_cost024, + "COST-025": _savings_cost025, + "COST-026": _savings_cost026, + "COST-027": _savings_cost027, +} + + +# ── Phase 1 — estimate_savings ──────────────────────────────────────────────── + +def _build_var_hints(blocks: Dict[str, List[dict]], pricing: dict) -> dict: + """ + Scan non-resource blocks for literal instance_type / instance_class values and + variable ``default`` values that look like EC2/RDS types.
+ + Returns a hints dict with optional keys: + ``instance_type`` β€” EC2 type whose price is closest to the inferred average + ``_ec2_fallback_cost`` β€” average EC2 price across all found types + ``instance_class`` β€” RDS type closest to the inferred average + ``_rds_fallback_cost`` β€” average RDS price across all found types + """ + ec2_prices = pricing.get("ec2_instances", {}) + rds_prices = pricing.get("rds_instances", {}) + + # Scan all non-resource blocks so we don't re-count literals already handled + # individually in each resource block. + _resource_types = set(_RESOURCE_COST_FNS.keys()) + support = " ".join( + b.get("content", "") + for btype, blist in blocks.items() + if btype not in _resource_types + for b in blist + ) + + # Pattern 1 β€” explicit attribute: instance_type = "t3.medium" + ec2_types: List[str] = re.findall(r'instance_type\s*=\s*["\']([\w.]+)["\']', support) + # Pattern 2 β€” variable default: default = "t3.medium" (value is a known EC2 type) + ec2_types += [ + t for t in re.findall(r'\bdefault\s*=\s*["\']([\w.]+)["\']', support) + if t in ec2_prices + ] + + rds_types: List[str] = re.findall(r'instance_class\s*=\s*["\']([\w.]+)["\']', support) + rds_types += [ + t for t in re.findall(r'\bdefault\s*=\s*["\']([\w.]+)["\']', support) + if t in rds_prices + ] + + hints: dict = {} + if ec2_types: + avg = sum(ec2_prices.get(t, 50.0) for t in ec2_types) / len(ec2_types) + # Store both the representative type (for display) and the average cost (for math) + hints["instance_type"] = min(ec2_types, key=lambda t: abs(ec2_prices.get(t, 50.0) - avg)) + hints["_ec2_fallback_cost"] = avg + if rds_types: + avg = sum(rds_prices.get(t, 100.0) for t in rds_types) / len(rds_types) + hints["instance_class"] = min(rds_types, key=lambda t: abs(rds_prices.get(t, 100.0) - avg)) + hints["_rds_fallback_cost"] = avg + return hints + + +def estimate_savings( + findings: List[dict], + blocks: Dict[str, List[dict]], + pricing: dict, + usage: dict, +) -> dict: + """ + 
Compute per-finding and aggregate savings for all cost findings. + + Returns a dict matching ScanReport.metrics["savings_estimate"]. + """ + # Build variable hints once and reuse for COST-012 fleet cost and per-finding savings. + var_hints = _build_var_hints(blocks, pricing) + ec2_fallback = var_hints.get("_ec2_fallback_cost", 50.0) + + # For COST-012, pre-compute total EC2 on-demand cost from all discovered blocks. + ec2_prices = pricing.get("ec2_instances", {}) + total_ec2_cost = 0.0 + cost012_inferred = False + for b in blocks.get("aws_instance", []): + m = re.search(r'instance_type\s*=\s*["\']([^"\']+)["\']', b.get("content", "")) + if m: + total_ec2_cost += ec2_prices.get(m.group(1).strip(), 50.0) + else: + # instance_type is a variable/local reference β€” use inferred average or floor + total_ec2_cost += ec2_fallback + cost012_inferred = True + + models = dict(SAVINGS_MODELS) + models["COST-012"] = _savings_cost012_factory(total_ec2_cost, inferred=cost012_inferred) + + usage_with_hints = {**usage, "_var_hints": var_hints} if var_hints else usage + + per_finding: List[dict] = [] + seen_fleet_rules: set = set() # deduplicate fleet-level rules (e.g. COST-012) + + for finding in findings: + rule_id = finding.get("rule_id") + savings_fn = models.get(rule_id) + if savings_fn is None: + continue + + # Fleet-level rules produce one saving for the whole fleet β€” only count once. + if rule_id in _FLEET_LEVEL_RULES: + if rule_id in seen_fleet_rules: + continue + seen_fleet_rules.add(rule_id) + + fpath = finding.get("file", "") + line = finding.get("line", 0) + block = _find_block(blocks, fpath, line) + block_content = block["content"] if block else "" + # Track which block this finding belongs to for per-block cap (see below). 
+ block_key = (block["file"], block["start_line"]) if block else None + + try: + result: SavingsResult = savings_fn(block_content, pricing, usage_with_hints) + except Exception as exc: + logger.warning("savings_fn for %s failed: %s", rule_id, exc) + continue + + per_finding.append( + { + "rule_id": rule_id, + "file": fpath, + "line": line, + "before_usd": round(result.before_usd, 2), + "after_usd": round(result.after_usd, 2), + "saving_low": round(result.saving_low_usd, 2), + "saving_high": round(result.saving_high_usd, 2), + "assumptions": result.assumptions, + "confidence": result.confidence, + # Exposed so the JS can join per_finding β†’ resource_costs exactly. + "block_file": block["file"] if block else fpath, + "block_line": block["start_line"] if block else line, + } + ) + + # Distribute fleet-level COST-012 saving across individual instance blocks. + # The main loop created ONE per_finding entry (fleet total). Replace it with + # N per-instance entries so every row in the resource cost table shows its + # proportional Spot saving. Aggregate sum stays identical: N x (fleet/N) = fleet. + # Only findings that resolve to an actual resource block are distributed; + # data source findings (e.g. data "aws_instance") are silently dropped so the + # headline total stays consistent with the per-row table. + cost012_idx = next( + (i for i, pf in enumerate(per_finding) if pf["rule_id"] == "COST-012"), None + ) + if cost012_idx is not None: + fleet_pf = per_finding[cost012_idx] + cost012_all = [f for f in findings if f.get("rule_id") == "COST-012"] + # Resolve each finding to its resource block; drop those without a match + # (data sources, unmapped types) to prevent orphaned headline inflation. 
+ cost012_resolved = [ + (f, b) + for f in cost012_all + for b in [_find_block(blocks, f.get("file", ""), f.get("line", 0))] + if b is not None + ] + n = len(cost012_resolved) + if n == 0: + # No resolvable blocks β€” remove the entry entirely + per_finding.pop(cost012_idx) + elif n == 1: + # Single instance β€” rewrite in-place with resolved block coords + f0, b0 = cost012_resolved[0] + per_finding[cost012_idx].update({ + "file": f0.get("file", ""), + "line": f0.get("line", 0), + "block_file": b0["file"], + "block_line": b0["start_line"], + }) + else: + distributed = [] + for f, b in cost012_resolved: + distributed.append({ + "rule_id": "COST-012", + "file": f.get("file", ""), + "line": f.get("line", 0), + "before_usd": round(fleet_pf["before_usd"] / n, 2), + "after_usd": round(fleet_pf["after_usd"] / n, 2), + "saving_low": round(fleet_pf["saving_low"] / n, 2), + "saving_high": round(fleet_pf["saving_high"] / n, 2), + "assumptions": fleet_pf["assumptions"], + "confidence": fleet_pf["confidence"], + "block_file": b["file"], + "block_line": b["start_line"], + }) + per_finding[cost012_idx : cost012_idx + 1] = distributed + + # Per-block savings cap + # Multiple rules can fire on the same resource block (e.g. COST-001 and + # COST-004 both target the same aws_instance). Their combined savings must + # not exceed the block’s actual resource cost. + # Fleet-level rules are excluded from this cap: their before_usd spans the + # whole fleet, not one block, so scaling them to a single block’s cost + # would drastically under-count the saving. + # + # 1. Build block_key β†’ (resource_type, block) from all parsed blocks. + _block_registry: Dict[tuple, tuple] = {} + for rtype, blist in blocks.items(): + for b in blist: + key = (b["file"], b["start_line"]) + _block_registry[key] = (rtype, b) + + # 2. Group per_finding indices by block_key, excluding fleet rules. 
+ groups: Dict[tuple, List[int]] = defaultdict(list) + for i, pf in enumerate(per_finding): + if pf["rule_id"] not in _FLEET_LEVEL_RULES: + groups[(pf["block_file"], pf["block_line"])].append(i) + + # 3. For each block that has more than one non-fleet finding, cap and scale. + for block_key, indices in groups.items(): + if len(indices) <= 1: + continue # single finding on this block β€” no overlap possible + + rtype_block = _block_registry.get(block_key) + if not rtype_block: + continue + rtype, block = rtype_block + + cost_fn = _RESOURCE_COST_FNS.get(rtype) + if not cost_fn: + continue + + try: + rc = cost_fn(block, pricing, usage) + resource_cap = rc.total_usd_month + except Exception: + continue + + if resource_cap <= 0: + continue + + sum_high = sum(per_finding[i]["saving_high"] for i in indices) + if sum_high <= resource_cap: + continue # already within budget + + # Scale every non-fleet finding on this block proportionally. + ratio = resource_cap / sum_high + for i in indices: + per_finding[i]["saving_low"] = round(per_finding[i]["saving_low"] * ratio, 2) + per_finding[i]["saving_high"] = round(per_finding[i]["saving_high"] * ratio, 2) + + # Recompute totals from the (possibly scaled) values. + low_total = sum(pf["saving_low"] for pf in per_finding) + high_total = sum(pf["saving_high"] for pf in per_finding) + detectable_cost = sum(f["before_usd"] for f in per_finding) + + # Coverage statistics β€” how many blocks are priced vs total found. 
+ priced_types = set(_RESOURCE_COST_FNS.keys()) + known_free = _ZERO_COST_TYPES + covered_found = set(blocks.keys()) & (priced_types | known_free) + uncovered_found = set(blocks.keys()) - priced_types - known_free + total_block_count = sum(len(v) for v in blocks.values()) + covered_block_count = sum(len(blocks[t]) for t in covered_found) + + return { + "low_usd_month": round(low_total, 2), + "high_usd_month": round(high_total, 2), + "detectable_resource_cost_usd_month": round(detectable_cost, 2), + "total_infra_cost_usd_month": None, # populated by Phase 2 + "savings_pct_of_detectable_low": + round(low_total / detectable_cost * 100, 1) if detectable_cost else None, + "savings_pct_of_detectable_high": + round(high_total / detectable_cost * 100, 1) if detectable_cost else None, + "savings_pct_of_total_low": None, # populated by Phase 2 + "savings_pct_of_total_high": None, + "per_finding": per_finding, + "confidence": "medium", + "cost_provider": "internal", + "covered_resource_types": sorted(covered_found), + "uncovered_resource_types": sorted(uncovered_found), + "total_block_count": total_block_count, + "covered_block_count": covered_block_count, + "total_cost_note": ( + "total_infra_cost_usd_month covers only the resource types listed in " + "covered_resource_types. Free or config-opaque resources (VPC, IAM, " + "Security Groups, ACM, Route Tables, etc.) are excluded." 
+ ), + } + + +# ── Phase 2 β€” per-resource cost calculation ─────────────────────────────────── + +def _rc( + block: dict, + fixed: float, + usage_cost: float, + assumptions: List[str], + confidence: str, +) -> ResourceCost: + """Helper: construct a ResourceCost from a block dict and computed costs.""" + total = round(fixed + usage_cost, 2) + return ResourceCost( + resource_type=block["resource_type"], + resource_name=block["name"], + file=block["file"], + line=block["start_line"], + fixed_usd_month=round(fixed, 2), + usage_usd_month=round(usage_cost, 2), + min_usd_month=round(fixed, 2), + total_usd_month=total, + assumptions=assumptions, + confidence=confidence, + ) + + +def _cost_aws_instance(block: dict, pricing: dict, usage: dict) -> ResourceCost: + ec2 = pricing.get("ec2_instances", {}) + m = re.search(r'instance_type\s*=\s*["\']([^"\']+)["\']', block["content"]) + if m: + inst = m.group(1).strip() + inst_cost = ec2.get(inst, 50.0) + conf = "high" + else: + # instance_type is a variable/local reference β€” use the inferred average cost + hints = usage.get("_var_hints", {}) + inst = hints.get("instance_type", "~inferred") + inst_cost = hints.get("_ec2_fallback_cost", 50.0) + conf = "low" if hints.get("_ec2_fallback_cost") else "medium" + + # Root block device EBS + size_m = re.search( + r'root_block_device\s*\{[^}]*volume_size\s*=\s*(\d+)', block["content"], re.DOTALL + ) + vol_m = re.search( + r'root_block_device\s*\{[^}]*volume_type\s*=\s*["\']([^"\']+)["\']', + block["content"], re.DOTALL, + ) + size = int(size_m.group(1)) if size_m else 8 + vol_type = vol_m.group(1) if vol_m else "gp3" + ebs_cost = size * pricing.get("ebs_per_gb_month", {}).get(vol_type, 0.08) + + return _rc( + block, inst_cost + ebs_cost, 0.0, + [f"{inst} on-demand us-east-1", f"{vol_type} {size}GB root volume"], + conf, + ) + + +def _cost_aws_db_instance(block: dict, pricing: dict, usage: dict) -> ResourceCost: + rds = pricing.get("rds_instances", {}) + m = 
re.search(r'instance_class\s*=\s*["\']([^"\']+)["\']', block["content"]) + if m: + inst = m.group(1).strip() + cost = rds.get(inst, 100.0) + conf = "high" + else: + hints = usage.get("_var_hints", {}) + inst = hints.get("instance_class", "~inferred") + cost = hints.get("_rds_fallback_cost", 100.0) + conf = "low" if hints.get("_rds_fallback_cost") else "medium" + multi_az = bool(re.search(r'multi_az\s*=\s*true', block["content"])) + if multi_az: + cost *= 2 + return _rc( + block, cost, 0.0, + [f"{inst} on-demand us-east-1" + (" multi-AZ" if multi_az else "")], + conf, + ) + + +def _cost_aws_ebs_volume(block: dict, pricing: dict, usage: dict) -> ResourceCost: + ebs = pricing.get("ebs_per_gb_month", {}) + iops_pricing = pricing.get("ebs_iops_per_iops_month", {}) + size_m = re.search(r'(?:size|volume_size)\s*=\s*(\d+)', block["content"]) + vol_m = re.search(r'(?:type|volume_type)\s*=\s*["\']([^"\']+)["\']', block["content"]) + size = int(size_m.group(1)) if size_m else 20 + vol_type = vol_m.group(1) if vol_m else "gp3" + cost = size * ebs.get(vol_type, 0.08) + iops_cost = 0.0 + iops = 0 + if vol_type in ("io1", "io2"): + iops_m = re.search(r'\biops\s*=\s*(\d+)', block["content"]) + iops = int(iops_m.group(1)) if iops_m else 3000 # default: provisioned IOPS minimum practical value + iops_cost = iops * iops_pricing.get("io1", 0.065) + conf = "high" if (size_m and vol_m) else "medium" + assumptions = [f"{vol_type} {size}GB"] + if iops: + assumptions.append(f"{iops} IOPS") + return _rc(block, cost + iops_cost, 0.0, assumptions, conf) + + +def _cost_aws_nat_gateway(block: dict, pricing: dict, usage: dict) -> ResourceCost: + hourly = pricing.get("nat_gateway_hourly", 0.045) + per_gb = pricing.get("nat_gateway_per_gb", 0.045) + gb_day = usage.get("nat_gb_per_day", 10.0) + fixed = hourly * 730 + usage_c = per_gb * gb_day * 30 + return _rc( + block, fixed, usage_c, + [f"$0.045/hr fixed + $0.045/GB Γ— {gb_day}GB/d Γ— 30d data"], + "medium", + ) + + +def _cost_aws_eip(block: 
dict, pricing: dict, usage: dict) -> ResourceCost: + hourly = pricing.get("eip_unattached_per_hour", 0.005) + cost = hourly * 730 + return _rc(block, cost, 0.0, ["$0.005/hr Γ— 730h/mo (unattached)"], "high") + + +def _cost_aws_lb(block: dict, pricing: dict, usage: dict) -> ResourceCost: + hourly = pricing.get("alb_per_hour", 0.0225) + per_lcu = pricing.get("alb_per_lcu_hour", 0.008) + data_gb = usage.get("lb_data_processed_gb", 10.0) + fixed = hourly * 730 + # Processed bytes dimension: 1 LCU = 1 GB/hr; estimate from monthly total + lcu_hr = data_gb / 730 + usage_c = round(per_lcu * lcu_hr * 730, 2) + return _rc( + block, round(fixed, 2), usage_c, + [ + f"${hourly}/hr Γ— 730h/mo base", + f"~{lcu_hr:.3f} LCU/hr from {data_gb}GB/mo processed (LCU approx)", + ], + "low", + ) + + +def _cost_aws_lambda_function(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_gb_sec = pricing.get("lambda_per_gb_second", 0.0000166667) + per_1m_req = pricing.get("lambda_per_1m_requests", 0.20) + invocations = usage.get("lambda_invocations_per_mo", 1_000_000) + duration_ms = usage.get("lambda_avg_duration_ms", 200.0) + mem_m = re.search(r'memory_size\s*=\s*(\d+)', block["content"]) + mem_mb = float(mem_m.group(1)) if mem_m else 128.0 + gb_seconds = (mem_mb / 1024) * (duration_ms / 1000) * invocations + compute_cost = per_gb_sec * gb_seconds + req_cost = per_1m_req * (invocations / 1_000_000) + conf = "medium" + return _rc( + block, 0.0, round(compute_cost + req_cost, 4), + [f"{int(mem_mb)}MB, {invocations/1e6:.0f}M invocations/mo, {duration_ms}ms avg"], + conf, + ) + + +def _cost_aws_cloudwatch_log_group(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_gb_ingested = pricing.get("cloudwatch_logs_per_gb_ingested", 0.50) + per_gb_stored = pricing.get("cloudwatch_logs_per_gb_stored", 0.03) + gb_mo = usage.get("cw_log_gb_per_month", 5.0) + ret_m = re.search(r'retention_in_days\s*=\s*(\d+)', block["content"]) + ret_days = int(ret_m.group(1)) if ret_m else 0 # 0 = 
forever + stored_gb = gb_mo * (ret_days / 30.0) if ret_days else gb_mo * 12 + ingestion_cost = per_gb_ingested * gb_mo + storage_cost = per_gb_stored * stored_gb + return _rc( + block, 0.0, round(ingestion_cost + storage_cost, 4), + [ + f"{gb_mo}GB/mo ingested", + f"retention={'forever' if not ret_days else f'{ret_days}d'}", + ], + "medium", + ) + + +def _cost_aws_s3_bucket(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_gb = pricing.get("s3_per_gb_standard", 0.023) + gb = usage.get("s3_gb_standard", 50.0) + usage_c = per_gb * gb + return _rc( + block, 0.0, round(usage_c, 4), + [f"{gb}GB Standard storage (lifecycle/request charges excluded)"], + "low", + ) + + +def _cost_aws_dynamodb_table(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_rcu_hr = pricing.get("dynamodb_per_rcu_hour", 0.00013) + per_wcu_hr = pricing.get("dynamodb_per_wcu_hour", 0.00065) + billing_m = re.search(r'billing_mode\s*=\s*["\']([^"\']+)["\']', block["content"]) + mode = billing_m.group(1).upper() if billing_m else "PROVISIONED" + if mode == "PAY_PER_REQUEST": + return _rc( + block, 0.0, 0.0, + ["on-demand mode (usage-dependent, no static estimate)"], + "low", + ) + rcu_m = re.search(r'read_capacity\s*=\s*(\d+)', block["content"]) + wcu_m = re.search(r'write_capacity\s*=\s*(\d+)', block["content"]) + rcu = int(rcu_m.group(1)) if rcu_m else 5 + wcu = int(wcu_m.group(1)) if wcu_m else 5 + cost = (rcu * per_rcu_hr + wcu * per_wcu_hr) * 730 + return _rc( + block, cost, 0.0, + [f"PROVISIONED RCU={rcu} WCU={wcu}"], + "high" if (rcu_m and wcu_m) else "medium", + ) + + +def _cost_aws_ecs_task_definition(block: dict, pricing: dict, usage: dict) -> ResourceCost: + vcpu_hr = pricing.get("ecs_fargate_per_vcpu_hour", 0.04048) + gb_hr = pricing.get("ecs_fargate_per_gb_hour", 0.004445) + cpu_m = re.search(r'^\s*cpu\s*=\s*["\']?(\d+)', block["content"], re.MULTILINE) + mem_m = re.search(r'^\s*memory\s*=\s*["\']?(\d+)', block["content"], re.MULTILINE) + cpu_units = 
int(cpu_m.group(1)) if cpu_m else 256 + mem_mb = int(mem_m.group(1)) if mem_m else 512 + vcpu = cpu_units / 1024 + mem_gb = mem_mb / 1024 + cost = (vcpu_hr * vcpu + gb_hr * mem_gb) * 730 + conf = "high" if (cpu_m and mem_m) else "low" + return _rc( + block, cost, 0.0, + [f"{vcpu}vCPU {mem_gb:.2f}GB Fargate"], + conf, + ) + + +def _cost_aws_api_gateway_rest_api(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_1m = pricing.get("api_gw_rest_per_1m_calls", 3.50) + calls_mo = usage.get("api_calls_per_mo", 1_000_000) + return _rc( + block, 0.0, round(per_1m * calls_mo / 1_000_000, 4), + [f"{calls_mo/1e6:.0f}M calls/mo REST API"], + "low", + ) + + +def _cost_aws_apigatewayv2_api(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_1m = pricing.get("api_gw_http_per_1m_calls", 1.00) + calls_mo = usage.get("api_calls_per_mo", 1_000_000) + return _rc( + block, 0.0, round(per_1m * calls_mo / 1_000_000, 4), + [f"{calls_mo/1e6:.0f}M calls/mo HTTP API"], + "low", + ) + + +def _cost_aws_kinesis_stream(block: dict, pricing: dict, usage: dict) -> ResourceCost: + shard_hr = pricing.get("kinesis_shard_per_hour", 0.015) + shard_m = re.search(r'shard_count\s*=\s*(\d+)', block["content"]) + shards = int(shard_m.group(1)) if shard_m else 1 + cost = shard_hr * shards * 730 + return _rc( + block, cost, 0.0, + [f"{shards} shard(s) × $0.015/hr × 730h/mo"], + "high" if shard_m else "medium", + ) + + +def _cost_aws_route53_health_check(block: dict, pricing: dict, usage: dict) -> ResourceCost: + cost = pricing.get("route53_health_check_per_month", 0.50) + return _rc(block, cost, 0.0, ["$0.50/health-check/month"], "high") + + +def _cost_aws_rds_cluster_instance(block: dict, pricing: dict, usage: dict) -> ResourceCost: + """Aurora cluster instance - shares the rds_instances price table.""" + rds = pricing.get("rds_instances", {}) + m = re.search(r'instance_class\s*=\s*["\']([^"\']+)["\']', block["content"]) + inst = m.group(1).strip() if m else "db.t3.medium" + cost = 
rds.get(inst, 0.0) + conf = "high" if m else "medium" + return _rc(block, cost, 0.0, [f"{inst} Aurora instance us-east-1"], conf) + + +def _cost_aws_elasticache_replication_group(block: dict, pricing: dict, usage: dict) -> ResourceCost: + elasticache = pricing.get("elasticache_instances", {}) + m = re.search(r'node_type\s*=\s*["\']([^"\']+)["\']', block["content"]) + node_type = m.group(1).strip() if m else "cache.t3.micro" + per_node = elasticache.get(node_type, 30.0) + rep_m = re.search(r'(?:num_cache_clusters|replicas_per_node_group)\s*=\s*(\d+)', block["content"]) + nodes = int(rep_m.group(1)) if rep_m else 1 + conf = "high" if (m and rep_m) else "medium" if m else "low" + return _rc(block, per_node * nodes, 0.0, [f"{node_type} × {nodes} nodes (ElastiCache)"], conf) + + +def _cost_aws_elasticache_cluster(block: dict, pricing: dict, usage: dict) -> ResourceCost: + elasticache = pricing.get("elasticache_instances", {}) + m = re.search(r'node_type\s*=\s*["\']([^"\']+)["\']', block["content"]) + node_type = m.group(1).strip() if m else "cache.t3.micro" + per_node = elasticache.get(node_type, 30.0) + num_m = re.search(r'num_cache_nodes\s*=\s*(\d+)', block["content"]) + nodes = int(num_m.group(1)) if num_m else 1 + conf = "high" if (m and num_m) else "medium" if m else "low" + return _rc(block, per_node * nodes, 0.0, [f"{node_type} × {nodes} nodes (ElastiCache)"], conf) + + +def _cost_aws_cloudwatch_metric_alarm(block: dict, pricing: dict, usage: dict) -> ResourceCost: + cost = pricing.get("cloudwatch_alarm_per_month", 0.10) + return _rc(block, cost, 0.0, ["$0.10/alarm/month (standard metrics)"], "high") + + +def _cost_aws_eks_cluster(block: dict, pricing: dict, usage: dict) -> ResourceCost: + hourly = pricing.get("eks_cluster_per_hour", 0.10) + cost = hourly * 730 + return _rc(block, cost, 0.0, ["$0.10/hr × 730h/mo (control plane only; node costs separate)"], "high") + + +def _cost_aws_secretsmanager_secret(block: dict, pricing: dict, usage: dict) -> 
ResourceCost: + base_mo = pricing.get("secretsmanager_secret_per_month", 0.40) + per_10k = pricing.get("secretsmanager_per_10k_api_calls", 0.05) + requests = usage.get("secretsmanager_requests_per_mo", 10000) + req_cost = per_10k * (requests / 10000) + return _rc( + block, base_mo, round(req_cost, 4), + [ + "$0.40/secret/month", + f"$0.05/10k API calls × {requests/1000:.0f}K calls/mo", + ], + "medium", + ) + + +def _cost_aws_vpc_endpoint(block: dict, pricing: dict, usage: dict) -> ResourceCost: + ep_type_m = re.search(r'vpc_endpoint_type\s*=\s*["\']([^"\']+)["\']', block["content"]) + ep_type = ep_type_m.group(1).upper() if ep_type_m else "INTERFACE" + if ep_type == "GATEWAY": # ep_type is upper-cased above, so compare against the upper-case literal + return _rc(block, 0.0, 0.0, ["Gateway endpoint - free (S3/DynamoDB)"], "high") + hourly = pricing.get("vpc_endpoint_per_hour", 0.01) + per_gb = pricing.get("vpc_endpoint_per_gb", 0.01) + data_gb = usage.get("vpc_endpoint_data_gb_per_mo", 20.0) + fixed = hourly * 730 + usage_c = per_gb * data_gb + return _rc( + block, round(fixed, 2), round(usage_c, 4), + [f"Interface $0.01/hr × 730h + $0.01/GB × {data_gb}GB/mo data processed"], + "medium", + ) + + +def _cost_aws_wafv2_web_acl(block: dict, pricing: dict, usage: dict) -> ResourceCost: + cost = pricing.get("wafv2_web_acl_per_month", 5.0) + return _rc(block, cost, 0.0, ["$5.00/ACL/month REGIONAL (rule/request charges excluded)"], "medium") + + +def _cost_aws_msk_cluster(block: dict, pricing: dict, usage: dict) -> ResourceCost: + msk_prices = pricing.get("msk_brokers", {}) + broker_m = re.search(r'instance_type\s*=\s*["\']([^"\']+)["\']', block["content"]) + broker_type = broker_m.group(1).strip() if broker_m else "kafka.m5.large" + per_hr = msk_prices.get(broker_type, pricing.get("msk_m5_large_per_hour", 0.212)) + count_m = re.search(r'number_of_broker_nodes\s*=\s*(\d+)', block["content"]) + brokers = int(count_m.group(1)) if count_m else 3 + cost = per_hr * brokers * 730 + conf = "high" if (broker_m and count_m) else "medium" + return 
_rc(block, cost, 0.0, [f"{broker_type} × {brokers} brokers (MSK)"], conf) + + +def _cost_aws_opensearch_domain(block: dict, pricing: dict, usage: dict) -> ResourceCost: + os_prices = pricing.get("opensearch_instances", {}) + m = re.search(r'instance_type\s*=\s*["\']([^"\']+)["\']', block["content"]) + inst_type = m.group(1).strip() if m else "t3.small.search" + per_hr = os_prices.get(inst_type, 0.036) + count_m = re.search(r'instance_count\s*=\s*(\d+)', block["content"]) + count = int(count_m.group(1)) if count_m else 1 + cost = per_hr * count * 730 + conf = "high" if m else "medium" + return _rc(block, cost, 0.0, [f"{inst_type} × {count} nodes (OpenSearch)"], conf) + + +def _cost_aws_redshift_cluster(block: dict, pricing: dict, usage: dict) -> ResourceCost: + rs_prices = pricing.get("redshift_nodes", {}) + m = re.search(r'node_type\s*=\s*["\']([^"\']+)["\']', block["content"]) + node_type = m.group(1).strip() if m else "dc2.large" + per_node_mo = rs_prices.get(node_type, 0.25 * 730) # redshift_nodes prices in pricing_table.json are monthly; only the fallback starts from $0.25/hr + count_m = re.search(r'number_of_nodes\s*=\s*(\d+)', block["content"]) + nodes = int(count_m.group(1)) if count_m else 1 + cost = per_node_mo * nodes + conf = "high" if m else "medium" + return _rc(block, cost, 0.0, [f"{node_type} × {nodes} nodes (Redshift)"], conf) + + +def _cost_aws_sfn_state_machine(block: dict, pricing: dict, usage: dict) -> ResourceCost: + transitions = usage.get("sfn_transitions_per_mo", 100_000) + per_1k = pricing.get("sfn_per_1k_state_transitions", 0.025) + cost = transitions / 1000 * per_1k + return _rc(block, 0.0, round(cost, 4), [f"{transitions/1000:.0f}K transitions/mo (Standard Workflow)"], "low") + + +def _cost_aws_route53_zone(block: dict, pricing: dict, usage: dict) -> ResourceCost: + cost = pricing.get("route53_hosted_zone_per_month", 0.50) + return _rc(block, cost, 0.0, ["$0.50/hosted zone/month (first 25 zones)"], "high") + + +# ── New cost functions ─────────────────────────────────────────────────────── + +def _cost_aws_kms_key(block: dict, pricing: 
dict, usage: dict) -> ResourceCost: + cost = pricing.get("kms_key_per_month", 1.0) + return _rc(block, cost, 0.0, ["$1.00/key/month + API call charges"], "high") + + +def _cost_aws_efs_file_system(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_gb = pricing.get("efs_per_gb_month", 0.30) + gb = usage.get("efs_gb_stored", 100.0) + return _rc(block, round(per_gb * gb, 2), 0.0, + [f"Standard storage ${per_gb}/GB-mo × {gb}GB assumed"], "low") + + +def _cost_aws_ec2_transit_gateway(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_hr = pricing.get("tgw_attachment_per_hour", 0.05) + attachments = usage.get("tgw_attachments", 1) + cost = per_hr * 730 * attachments + return _rc(block, round(cost, 2), 0.0, + [f"${per_hr}/attachment/hr × {attachments} attachment(s) × 730h/mo"], "low") + + +def _cost_aws_ec2_transit_gateway_vpc_attachment(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_hr = pricing.get("tgw_attachment_per_hour", 0.05) + per_gb = pricing.get("tgw_data_per_gb", 0.02) + data_gb = usage.get("tgw_data_processed_gb_per_mo", 50.0) + fixed = per_hr * 730 + usage_c = per_gb * data_gb + return _rc( + block, round(fixed, 2), round(usage_c, 2), + [f"$0.05/hr × 730h/mo + $0.02/GB × {data_gb}GB/mo data processed"], + "medium", + ) + + +def _cost_aws_cloudwatch_dashboard(block: dict, pricing: dict, usage: dict) -> ResourceCost: + cost = pricing.get("cloudwatch_dashboard_per_month", 3.0) + return _rc(block, cost, 0.0, ["$3.00/dashboard/month (first 3 dashboards free)"], "high") + + +def _cost_aws_ecr_repository(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_gb = pricing.get("ecr_per_gb_month", 0.10) + gb = usage.get("ecr_gb_stored", 10.0) + return _rc(block, round(per_gb * gb, 2), 0.0, + [f"$0.10/GB-mo × {gb}GB image storage assumed"], "low") + + +def _cost_aws_cloudfront_distribution(block: dict, pricing: dict, usage: dict) -> ResourceCost: + per_10k_https = pricing.get("cloudfront_per_10k_https_requests", 
0.0075) + per_gb_out = pricing.get("cloudfront_per_gb_transfer_out", 0.085) + requests_mo = usage.get("cloudfront_requests_per_mo", 1_000_000) + gb_out_mo = usage.get("cloudfront_gb_out_per_mo", 50.0) + req_cost = per_10k_https * (requests_mo / 10_000) + transfer_cost = per_gb_out * gb_out_mo + return _rc( + block, 0.0, round(req_cost + transfer_cost, 2), + [ + f"{requests_mo/1e6:.0f}M HTTPS requests/mo", + f"{gb_out_mo}GB transfer out/mo", + ], + "low", + ) + + +# Known-free resource types β€” these carry no fixed monthly charge. +# Listed here so coverage stats don't treat them as "unpriced". +_ZERO_COST_TYPES: frozenset = frozenset({ + # IAM + "aws_iam_role", "aws_iam_policy", "aws_iam_role_policy", + "aws_iam_role_policy_attachment", "aws_iam_user", "aws_iam_user_policy", + "aws_iam_user_policy_attachment", "aws_iam_access_key", "aws_iam_group", + "aws_iam_group_policy", "aws_iam_group_membership", "aws_iam_group_policy_attachment", + "aws_iam_instance_profile", "aws_iam_service_linked_role", + "aws_iam_openid_connect_provider", "aws_iam_saml_provider", + "aws_iam_account_password_policy", + # VPC / networking (free β€” data transfer billed separately) + "aws_vpc", "aws_subnet", "aws_internet_gateway", "aws_vpn_gateway", + "aws_customer_gateway", "aws_vpn_connection", + "aws_route_table", "aws_route_table_association", "aws_route", + "aws_main_route_table_association", "aws_default_route_table", + "aws_security_group", "aws_security_group_rule", "aws_vpc_security_group_ingress_rule", + "aws_vpc_security_group_egress_rule", + "aws_network_acl", "aws_network_acl_rule", + "aws_default_network_acl", "aws_default_security_group", + "aws_network_interface", "aws_network_interface_attachment", + "aws_flow_log", + "aws_db_subnet_group", "aws_db_parameter_group", + "aws_rds_cluster_parameter_group", "aws_rds_cluster", "aws_rds_global_cluster", + "aws_rds_cluster_endpoint", + # EKS sub-resources (node compute billed at EC2 rates) + "aws_eks_addon", 
"aws_eks_identity_provider_config", + "aws_eks_node_group", "aws_eks_fargate_profile", + "aws_eks_access_entry", "aws_eks_access_policy_association", + # Compute sub-resources + "aws_autoscaling_group", "aws_autoscaling_attachment", + "aws_autoscaling_notification", "aws_autoscaling_schedule", + "aws_autoscaling_policy", "aws_autoscaling_lifecycle_hook", + "aws_launch_template", "aws_launch_configuration", + "aws_key_pair", "aws_ami", + "aws_placement_group", "aws_spot_fleet_request", + # Load balancer sub-resources (LB itself is priced) + "aws_lb_listener", "aws_lb_listener_rule", "aws_lb_target_group", + "aws_lb_target_group_attachment", "aws_alb_listener", "aws_alb_target_group", + "aws_lb_cookie_stickiness_policy", + # API Gateway sub-resources (REST API is priced) + "aws_api_gateway_api_key", "aws_api_gateway_deployment", + "aws_api_gateway_integration", "aws_api_gateway_integration_response", + "aws_api_gateway_method", "aws_api_gateway_method_response", + "aws_api_gateway_method_settings", "aws_api_gateway_model", + "aws_api_gateway_resource", "aws_api_gateway_stage", + "aws_api_gateway_usage_plan", "aws_api_gateway_usage_plan_key", + "aws_api_gateway_account", "aws_api_gateway_base_path_mapping", + "aws_api_gateway_domain_name", "aws_api_gateway_vpc_link", + "aws_api_gateway_authorizer", "aws_api_gateway_gateway_response", + "aws_api_gateway_request_validator", + # S3 sub-resources (bucket itself is priced) + "aws_s3_bucket_policy", "aws_s3_bucket_acl", + "aws_s3_bucket_versioning", "aws_s3_bucket_cors_configuration", + "aws_s3_bucket_lifecycle_configuration", "aws_s3_bucket_notification", + "aws_s3_bucket_server_side_encryption_configuration", + "aws_s3_bucket_replication_configuration", + "aws_s3_bucket_ownership_controls", "aws_s3_bucket_public_access_block", + "aws_s3_bucket_object", "aws_s3_object", "aws_s3_bucket_logging", + "aws_s3_bucket_metric", "aws_s3_bucket_intelligent_tiering_configuration", + # Lambda sub-resources + "aws_lambda_permission", 
"aws_lambda_event_source_mapping", + "aws_lambda_alias", "aws_lambda_layer_version", + "aws_lambda_code_signing_config", + # CloudWatch sub-resources + "aws_cloudwatch_event_rule", "aws_cloudwatch_event_target", + "aws_cloudwatch_event_bus", "aws_cloudwatch_event_permission", + "aws_cloudwatch_log_subscription_filter", + "aws_cloudwatch_log_metric_filter", "aws_cloudwatch_log_resource_policy", + "aws_cloudwatch_log_stream", "aws_cloudwatch_composite_alarm", + # SNS/SQS β€” usage-based only; fixed cost is zero + "aws_sns_topic", "aws_sns_topic_subscription", "aws_sns_topic_policy", + "aws_sqs_queue", "aws_sqs_queue_policy", + # ElastiCache sub-resources + "aws_elasticache_subnet_group", "aws_elasticache_parameter_group", + "aws_elasticache_security_group", + # ACM + "aws_acm_certificate", "aws_acm_certificate_validation", + # Route53 sub-resources (zone itself is priced) + "aws_route53_record", + # RAM + "aws_ram_resource_share", "aws_ram_resource_association", + "aws_ram_principal_association", + # SSM + "aws_ssm_parameter", "aws_ssm_document", + "aws_ssm_patch_baseline", "aws_ssm_patch_group", + "aws_ssm_association", "aws_ssm_maintenance_window", + # ECR sub-resources + "aws_ecr_lifecycle_policy", "aws_ecr_repository_policy", + # EFS sub-resources + "aws_efs_mount_target", "aws_efs_access_point", + # Misc governance / config (no direct monthly charge) + "aws_budgets_budget", + "aws_organizations_account", "aws_organizations_organization", + "aws_organizations_policy", "aws_organizations_policy_attachment", + "aws_organizations_organizational_unit", + "aws_service_quota", + "aws_cloudformation_stack", "aws_cloudformation_stack_set", + "aws_servicecatalog_portfolio", + # Tags / meta + "aws_default_tags", "aws_provider", + # EIP sub-resources + "aws_eip_association", + # Volume sub-resources + "aws_volume_attachment", + # KMS sub-resources (key itself is priced) + "aws_kms_alias", "aws_kms_key_policy", "aws_kms_grant", + # Secrets Manager sub-resources + 
"aws_secretsmanager_secret_version", + # Classic ELB sub-resources + "aws_load_balancer_policy", "aws_load_balancer_listener_policy", + # Auto Scaling sub-resources (compute is billed via EC2 instances) + "aws_appautoscaling_target", "aws_appautoscaling_policy", + # IAM sub-resources + "aws_iam_user_login_profile", "aws_iam_user_ssh_key", + # Miscellaneous AWS sub-resources + "aws_lambda_function_event_invoke_config", + "aws_db_event_subscription", + "aws_ec2_tag", + "aws_cloudfront_origin_access_identity", + "aws_vpc_endpoint_route_table_association", + # Non-AWS providers β€” Terraform creates these but they carry no AWS cost + "null_resource", "local_file", + "random_password", "random_string", "random_id", "random_integer", + "time_sleep", "time_rotating", + # Kubernetes provider (cost is in EKS cluster / node EC2 instances) + "kubernetes_cluster_role", "kubernetes_cluster_role_binding", + "kubernetes_config_map", "kubernetes_config_map_v1", + "kubernetes_deployment", "kubernetes_namespace", + "kubernetes_role", "kubernetes_role_binding", + "kubernetes_secret", "kubernetes_service", + "kubernetes_service_account", "kubernetes_storage_class_v1", + # Helm provider (chart installs β€” AWS cost is in underlying resources) + "helm_release", + # Cloudflare provider β€” separate billing, not AWS + "cloudflare_dns_record", "cloudflare_record", + "cloudflare_load_balancer", "cloudflare_load_balancer_pool", + "cloudflare_load_balancer_monitor", + # Sub-resources confirmed uncovered in practice + "aws_backup_plan", "aws_backup_selection", "aws_backup_vault", + "aws_ec2_transit_gateway_peering_attachment", + "aws_ec2_transit_gateway_route", + "aws_s3_bucket_website_configuration", + "aws_vpc_endpoint_service", + # EC2 network observability β€” no direct AWS charge + "aws_ec2_instance_connect_endpoint", + "aws_ec2_traffic_mirror_filter", + "aws_ec2_traffic_mirror_filter_rule", + "aws_ec2_traffic_mirror_session", + "aws_ec2_traffic_mirror_target", +}) + +# Registry of 
per-resource-type cost functions. +_RESOURCE_COST_FNS: Dict[str, Callable] = { + "aws_instance": _cost_aws_instance, + "aws_db_instance": _cost_aws_db_instance, + "aws_rds_cluster_instance": _cost_aws_rds_cluster_instance, + "aws_ebs_volume": _cost_aws_ebs_volume, + "aws_nat_gateway": _cost_aws_nat_gateway, + "aws_eip": _cost_aws_eip, + "aws_lb": _cost_aws_lb, + "aws_alb": _cost_aws_lb, + "aws_elb": _cost_aws_lb, + "aws_lambda_function": _cost_aws_lambda_function, + "aws_cloudwatch_log_group": _cost_aws_cloudwatch_log_group, + "aws_cloudwatch_metric_alarm": _cost_aws_cloudwatch_metric_alarm, + "aws_s3_bucket": _cost_aws_s3_bucket, + "aws_dynamodb_table": _cost_aws_dynamodb_table, + "aws_ecs_task_definition": _cost_aws_ecs_task_definition, + "aws_eks_cluster": _cost_aws_eks_cluster, + "aws_elasticache_cluster": _cost_aws_elasticache_cluster, + "aws_elasticache_replication_group": _cost_aws_elasticache_replication_group, + "aws_msk_cluster": _cost_aws_msk_cluster, + "aws_opensearch_domain": _cost_aws_opensearch_domain, + "aws_elasticsearch_domain": _cost_aws_opensearch_domain, # alias + "aws_redshift_cluster": _cost_aws_redshift_cluster, + "aws_secretsmanager_secret": _cost_aws_secretsmanager_secret, + "aws_vpc_endpoint": _cost_aws_vpc_endpoint, + "aws_wafv2_web_acl": _cost_aws_wafv2_web_acl, + "aws_sfn_state_machine": _cost_aws_sfn_state_machine, + "aws_api_gateway_rest_api": _cost_aws_api_gateway_rest_api, + "aws_apigatewayv2_api": _cost_aws_apigatewayv2_api, + "aws_kinesis_stream": _cost_aws_kinesis_stream, + "aws_route53_health_check": _cost_aws_route53_health_check, + "aws_route53_zone": _cost_aws_route53_zone, + "aws_kms_key": _cost_aws_kms_key, + "aws_efs_file_system": _cost_aws_efs_file_system, + "aws_ec2_transit_gateway": _cost_aws_ec2_transit_gateway, + "aws_ec2_transit_gateway_vpc_attachment": _cost_aws_ec2_transit_gateway_vpc_attachment, + "aws_cloudwatch_dashboard": _cost_aws_cloudwatch_dashboard, + "aws_ecr_repository": _cost_aws_ecr_repository, + 
"aws_cloudfront_distribution": _cost_aws_cloudfront_distribution, +} + + +def estimate_total_cost( + blocks: Dict[str, List[dict]], pricing: dict, usage: dict +) -> List[ResourceCost]: + """ + Compute a monthly cost for every resource block whose type has a cost + function registered in *_RESOURCE_COST_FNS*. + + Returns a list of :class:`ResourceCost` objects. + """ + var_hints = _build_var_hints(blocks, pricing) + usage_with_hints = {**usage, "_var_hints": var_hints} if var_hints else usage + + results: List[ResourceCost] = [] + for resource_type, cost_fn in _RESOURCE_COST_FNS.items(): + for block in blocks.get(resource_type, []): + try: + rc = cost_fn(block, pricing, usage_with_hints) + results.append(rc) + except Exception as exc: + logger.warning( + "cost_fn for %s %s failed: %s", + resource_type, block.get("name"), exc, + ) + return results + + +# ── Phase 3 β€” Markdown / summary formatters ────────────────────────────────── + +def format_savings_summary_md( + savings_estimate: dict, + overall_grade: Optional[str] = None, + overall_pct: Optional[float] = None, + security_findings: Optional[List[dict]] = None, + container_findings: Optional[List[dict]] = None, +) -> str: + """ + Return GitHub-flavoured Markdown for the GitHub Actions step summary and + PR comment. + + *savings_estimate* must have ``low_usd_month > 0``; callers are responsible + for that guard. *security_findings* and *container_findings* are the raw + finding lists from ScanReport β€” only critical/high entries are surfaced. 
""" + if not savings_estimate: + return "" + + low = savings_estimate.get("low_usd_month", 0) + high = savings_estimate.get("high_usd_month", 0) + total = savings_estimate.get("total_infra_cost_usd_month") + pct_low = savings_estimate.get("savings_pct_of_total_low") + pct_high = savings_estimate.get("savings_pct_of_total_high") + + lines = ["## 🔍 InfraScan Report", ""] + + # ── Savings headline banner ────────────────────────────────────────────── + def _fmt_usd(n: float) -> str: + """Format a dollar amount; show '<$1' for positive sub-dollar values.""" + if 0 < n < 1: + return "<$1" + return f"${n:,.0f}" + + if high > 0: + range_str = _fmt_usd(low) if low == high else f"{_fmt_usd(low)} – {_fmt_usd(high)}" + lines.append(f"### 💰 Cost Savings Estimate: **{range_str}/mo**") + if pct_low is not None and total: + lines.append(f"> {pct_low}%–{pct_high}% of **{_fmt_usd(total)}/mo** total infrastructure cost") + elif total: + lines.append(f"> vs. **{_fmt_usd(total)}/mo** measured infrastructure cost") + lines.append("") + + lines += ["| Metric | Value |", "|---|---|"] + + if total: + lines.append(f"| Estimated monthly infrastructure cost | **{_fmt_usd(total)}** |") + if pct_low is not None: + lines.append( + f"| Potential savings (low) | **{_fmt_usd(low)}/mo** ({pct_low}%) |" + ) + lines.append( + f"| Potential savings (high) | **{_fmt_usd(high)}/mo** ({pct_high}%) |" + ) + else: + lines.append(f"| Potential savings (low) | **{_fmt_usd(low)}/mo** |") + lines.append(f"| Potential savings (high) | **{_fmt_usd(high)}/mo** |") + + if overall_grade: + lines.append(f"| Overall grade | **{overall_grade} ({overall_pct}%)** |") + + # Top 3 cost savings opportunities (only findings with actual saving > 0) + per_finding = sorted( + [f for f in savings_estimate.get("per_finding", []) if f.get("saving_high", 0) > 0], + key=lambda f: f.get("saving_high", 0), + reverse=True, + )[:3] + if per_finding: + lines += [ + "", + "### 💰 Top cost savings opportunities", + "| Rule | 
File | Saving/month |", + "|---|---|---|", + ] + for pf in per_finding: + s_low = pf.get("saving_low", 0) + s_high = pf.get("saving_high", 0) + fname = os.path.basename(pf.get("file", "")) + line_n = pf.get("line", "") + saving_str = _fmt_usd(s_low) if s_low == s_high else f"{_fmt_usd(s_low)}–{_fmt_usd(s_high)}" + lines.append( + f"| {pf.get('rule_id', '')} | {fname}:{line_n} | {saving_str} |" + ) + + # Top security issues - critical and high only, max 3, across IaC + containers + _sev_order = {"critical": 0, "high": 1} + top_sec: List[dict] = [] + for f in (security_findings or []) + (container_findings or []): + if f.get("severity", "").lower() in _sev_order: + top_sec.append(f) + top_sec.sort(key=lambda f: (_sev_order.get(f.get("severity", "").lower(), 9),)) + top_sec = top_sec[:3] + + if top_sec: + lines += [ + "", + "### 🔒 Top security issues (critical/high)", + "| Severity | Rule | Location |", + "|---|---|---|", + ] + for sf in top_sec: + sev = sf.get("severity", "").upper() + icon = "🔴" if sev == "CRITICAL" else "🟠" + rid = sf.get("rule_id") or sf.get("check_id", "") + fname = os.path.basename(sf.get("file", "") or sf.get("image", "")) + line_n = sf.get("line", "") + loc = f"{fname}:{line_n}" if line_n else fname + lines.append(f"| {icon} {sev} | {rid} | {loc} |") + + return "\n".join(lines) diff --git a/reporter/grading.py b/reporter/grading.py index f9b09e1..83ea905 100644 --- a/reporter/grading.py +++ b/reporter/grading.py @@ -304,7 +304,9 @@ def generate_report(self, findings: List[Dict[str, Any]], resource_count: int = 0, scanner_type: str = 'comprehensive', - extra_recommendations: List[str] = None) -> ScanReport: + extra_recommendations: List[str] = None, + scan_path: str = None, + traffic_profile: str = None) -> ScanReport: """ Generate complete scan report. 
@@ -397,6 +399,48 @@ def generate_report(self, # Additional metrics (extensible) metrics = self._calculate_additional_metrics(findings, resource_count) + # ── Cost estimation (Phases 1 + 2) ────────────────────────────────── + if scan_path and 'cost' in enabled_scanners: + try: + from reporter.cost_estimator import ( + load_pricing, extract_all_blocks, detect_traffic_profile, + scale_usage_defaults, USAGE_DEFAULTS, + estimate_savings, estimate_total_cost, + ) + from dataclasses import asdict as _asdict + pricing = load_pricing() + blocks = extract_all_blocks(scan_path) + + # Auto-detect traffic profile when not explicitly supplied. + effective_profile = traffic_profile + if not effective_profile or effective_profile == 'auto': + effective_profile = detect_traffic_profile(blocks) + + usage = scale_usage_defaults(USAGE_DEFAULTS, effective_profile, blocks) + savings = estimate_savings(cost_findings, blocks, pricing, usage) + rc_list = estimate_total_cost(blocks, pricing, usage) + + total_cost = sum(rc.total_usd_month for rc in rc_list) + savings['total_infra_cost_usd_month'] = round(total_cost, 2) + if total_cost > 0: + # Per-block capping handles most overlap, but fleet-level rules + # (e.g. COST-012) can still push the total above 100% when their + # fleet before_usd exceeds individual block costs. 
Final safety net: + lo = min(savings['low_usd_month'], total_cost) + hi = min(savings['high_usd_month'], total_cost) + savings['low_usd_month'] = round(lo, 2) + savings['high_usd_month'] = round(hi, 2) + savings['savings_pct_of_total_low'] = round(lo / total_cost * 100, 1) + savings['savings_pct_of_total_high'] = round(hi / total_cost * 100, 1) + + metrics['savings_estimate'] = savings + metrics['resource_costs'] = [_asdict(rc) for rc in rc_list] + metrics['traffic_profile'] = effective_profile + except Exception as _ce: + import logging as _logging + _logging.getLogger(__name__).warning('Cost estimation failed: %s', _ce) + # ───────────────────────────────────────────────────────────────────── + single_scanner_mode = len(enabled_scanners) == 1 return ScanReport( @@ -575,12 +619,7 @@ def _generate_recommendations(self, cost_grade: GradeInfo, f"vulnerabilities - update container images or patch affected packages" ) - # Cost - show only if high priority - if cost_grade and cost_grade.severity_breakdown.get('high', 0) > 0: - recommendations.append( - f"💰 Optimize {cost_grade.severity_breakdown['high']} high-cost " - f"{'issue' if cost_grade.severity_breakdown['high'] == 1 else 'issues'} for significant savings" - ) + # Cost - removed: the savings panel in the UI owns this story now # Overall assessment - max 1 available_letters = [ @@ -593,18 +632,10 @@ def _generate_recommendations(self, cost_grade: GradeInfo, if worst_grade in ['D', 'F']: recommendations.append( - "⚠️ Infrastructure needs improvement - consider professional review" + "⚠️ Infrastructure needs significant improvement - consider a professional review" ) - elif all( - g.letter == 'A' - for g in [cost_grade, security_grade, container_grade] - if g - ) and total_findings > 0: - recommendations.append("✅ Excellent infrastructure health - maintain current practices") - elif worst_grade in ['B', 'C']: - recommendations.append("👍 Good foundation - address remaining issues for optimal results") - 
return recommendations or ["✅ No significant issues found"] + return recommendations or ["✅ No significant security issues found"] def _identify_top_issues(self, findings: List[Dict[str, Any]], top_n: int = 5) -> List[Dict[str, Any]]: @@ -636,7 +667,4 @@ def _calculate_additional_metrics(self, findings: List[Dict[str, Any]], 'unique_rules_triggered': len(set(f.get('rule_id') for f in findings)), 'files_affected': len(set(f.get('file') for f in findings if f.get('file'))), } - - # Calculate estimated potential savings (for cost findings) - # This is extensible - add more calculations as needed - + return metrics diff --git a/reporter/pricing_table.json b/reporter/pricing_table.json new file mode 100644 index 0000000..fa6bb48 --- /dev/null +++ b/reporter/pricing_table.json @@ -0,0 +1,153 @@ +{ + "version": "2026-06-23", + "region": "us-east-1", + "ec2_instances": { + "t2.nano": 4.23, + "t2.micro": 8.47, + "t2.small": 16.79, + "t2.medium": 33.87, + "t2.large": 67.74, + "t2.xlarge": 135.49, + "t2.2xlarge": 270.98, + "t3.nano": 3.8, + "t3.micro": 7.59, + "t3.small": 15.18, + "t3.medium": 30.37, + "t3.large": 60.74, + "t3.xlarge": 121.47, + "t3.2xlarge": 242.94, + "m3.medium": 48.91, + "m3.large": 97.09, + "m4.large": 73.0, + "m4.xlarge": 146.0, + "m4.2xlarge": 292.0, + "m5.large": 70.08, + "m5.xlarge": 140.16, + "m5.2xlarge": 280.32, + "m5.4xlarge": 560.64, + "m5.8xlarge": 1121.28, + "c4.large": 73.0, + "c4.xlarge": 145.27, + "c5.large": 62.05, + "c5.xlarge": 124.1, + "c5.2xlarge": 248.2, + "r3.large": 121.18, + "r3.xlarge": 243.09, + "r4.large": 97.09, + "r4.xlarge": 194.18, + "r5.large": 91.98, + "r5.xlarge": 183.96, + "r5.2xlarge": 367.92, + "r5.8xlarge": 1471.68, + "r5.12xlarge": 2207.52, + "r5.24xlarge": 4415.04, + "m4.4xlarge": 584.0, + "m4.10xlarge": 1460.0, + "c4.2xlarge": 290.54, + "c4.4xlarge": 581.08, + "c4.8xlarge": 1161.43, + "r3.2xlarge": 485.45, + "r3.4xlarge": 970.9, + "r3.8xlarge": 1941.8, + "r4.2xlarge": 388.36, + "r4.4xlarge": 776.72, + 
"r4.8xlarge": 1553.44, + "r4.16xlarge": 3106.88 + }, + "rds_instances": { + "db.t2.micro": 12.41, + "db.t2.small": 24.82, + "db.t2.medium": 49.64, + "db.t3.micro": 12.41, + "db.t3.small": 24.82, + "db.t3.medium": 49.64, + "db.m3.medium": 65.7, + "db.m3.large": 135.05, + "db.m4.large": 127.75, + "db.m4.xlarge": 255.5, + "db.m4.2xlarge": 511.0, + "db.m5.large": 124.83, + "db.m5.xlarge": 249.66, + "db.m5.2xlarge": 499.32, + "db.r3.large": 175.2, + "db.r3.xlarge": 346.75, + "db.r4.large": 175.2, + "db.r4.xlarge": 350.4, + "db.r5.large": 175.2, + "db.r5.xlarge": 350.4, + "db.r5.2xlarge": 700.8, + "db.t2.large": 99.28, + "db.t2.xlarge": 198.56, + "db.t2.2xlarge": 397.12, + "db.t3.large": 99.28, + "db.t3.xlarge": 198.56, + "db.t3.2xlarge": 397.12, + "db.m4.4xlarge": 1022.73, + "db.m4.10xlarge": 2556.46, + "db.m5.4xlarge": 998.64, + "db.m5.8xlarge": 2000.2, + "db.m5.12xlarge": 2995.92, + "db.r3.2xlarge": 689.85, + "db.r3.4xlarge": 1379.7, + "db.r3.8xlarge": 2759.4, + "db.r4.2xlarge": 700.8, + "db.r4.4xlarge": 1401.6, + "db.r4.8xlarge": 2803.2, + "db.r4.16xlarge": 5606.4, + "db.r5.4xlarge": 1401.6, + "db.r5.8xlarge": 2803.2, + "db.r5.16xlarge": 5606.4 + }, + "ebs_per_gb_month": { + "gp2": 0.1, + "gp3": 0.08, + "io1": 0.125, + "io2": 0.125, + "st1": 0.045, + "sc1": 0.015 + }, + "ebs_iops_per_iops_month": { + "io1": 0.065, + "io2_tier1": 0.065, + "io2": 0.065 + }, + "nat_gateway_hourly": 0.045, + "nat_gateway_per_gb": 0.045, + "eip_unattached_per_hour": 0.005, + "alb_per_hour": 0.0225, + "alb_per_lcu_hour": 0.008, + "lambda_per_gb_second": 1.66667e-05, + "lambda_per_1m_requests": 0.2, + "api_gw_rest_per_1m_calls": 3.5, + "api_gw_http_per_1m_calls": 1.0, + "cloudwatch_logs_per_gb_ingested": 0.5, + "cloudwatch_logs_per_gb_stored": 0.03, + "s3_per_gb_standard": 0.023, + "dynamodb_per_rcu_hour": 0.00013, + "dynamodb_per_wcu_hour": 0.00065, + "sqs_per_1m_requests": 0.4, + "ecs_fargate_per_vcpu_hour": 0.04048, + "ecs_fargate_per_gb_hour": 0.004445, + "kinesis_shard_per_hour": 
0.015, + "elasticache_t3_micro_per_hour": 0.017, + "msk_m5_large_per_hour": 0.212, + "route53_health_check_per_month": 0.5, + "elasticache_instances": { + "cache.t3.micro": 19.71, + "cache.t3.small": 24.82, + "cache.t3.medium": 39.42, + "cache.m5.large": 113.88, + "cache.m5.xlarge": 227.03, + "cache.r5.large": 252.58, + "cache.r5.xlarge": 130.67, + "cache.r6g.large": 240.9, + "cache.r6g.xlarge": 300.03 + }, + "redshift_nodes": { + "dc2.large": 182.5, + "dc2.8xlarge": 3504.0, + "ra3.xlplus": 792.78, + "ra3.4xlarge": 2379.8, + "ra3.16xlarge": 9519.2 + } +} diff --git a/reporter/traffic_profiles.json b/reporter/traffic_profiles.json new file mode 100644 index 0000000..038e688 --- /dev/null +++ b/reporter/traffic_profiles.json @@ -0,0 +1,24 @@ +{ + "_comment": "nat_gb_per_day = total daily NAT transfer across ALL gateways (split per-gateway in scale_usage_defaults). cw_log_gb_per_month = per log-group ingestion.", + "small": { + "nat_gb_per_day": 1.0, + "cw_log_gb_per_month": 2.0, + "s3_gb_standard": 50.0, + "lambda_invocations_per_mo": 1000000, + "api_calls_per_mo": 1000000 + }, + "medium": { + "nat_gb_per_day": 10.0, + "cw_log_gb_per_month": 20.0, + "s3_gb_standard": 500.0, + "lambda_invocations_per_mo": 10000000, + "api_calls_per_mo": 10000000 + }, + "large": { + "nat_gb_per_day": 50.0, + "cw_log_gb_per_month": 100.0, + "s3_gb_standard": 5000.0, + "lambda_invocations_per_mo": 100000000, + "api_calls_per_mo": 100000000 + } +} diff --git a/reporter/usage_defaults.json b/reporter/usage_defaults.json new file mode 100644 index 0000000..995b430 --- /dev/null +++ b/reporter/usage_defaults.json @@ -0,0 +1,23 @@ +{ + "nat_gb_per_day": 10.0, + "cw_log_gb_per_month": 5.0, + "s3_gb_standard": 50.0, + "lambda_invocations_per_mo": 1000000, + "lambda_avg_duration_ms": 200.0, + "lambda_memory_after_mb": 1024.0, + "api_calls_per_mo": 1000000, + "sqs_requests_per_mo": 1000000, + "dynamo_reads_per_mo": 1000000, + "dynamo_writes_per_mo": 1000000, + "s3_dynamo_pct_of_nat": 0.20, + 
"dynamo_idle_pct": 0.70, + "ecs_overprovisioning_pct": 0.25, + "vpc_endpoint_data_gb_per_mo": 20.0, + "tgw_data_processed_gb_per_mo": 50.0, + "secretsmanager_requests_per_mo": 10000, + "lb_data_processed_gb": 10.0, + "cloudfront_requests_per_mo": 1000000, + "cloudfront_gb_out_per_mo": 50.0, + "efs_gb_stored": 100.0, + "ecr_gb_stored": 10.0 +} diff --git a/rules/definitions.py b/rules/definitions.py index aade39a..aab88cf 100644 --- a/rules/definitions.py +++ b/rules/definitions.py @@ -213,9 +213,9 @@ def check(self, content): ), RegexRule( id="COST-004", - name="Provisioned IOPS (io1/io2)", + name="EBS Provisioned IOPS (io1/io2)", severity="High", - description="Usage of Provisioned IOPS SSD (io1/io2). These are very expensive.", + description="EBS volume using Provisioned IOPS (io1/io2) type. These are very expensive β€” io2 costs 56Γ— more than gp3 per GB plus per-IOPS charges.", remediation="Verify if gp3 can meet performance requirements at a lower cost.", estimated_savings="$50-200+/month per volume", pattern=r'type\s*=\s*["\'](io1|io2)["\']' @@ -292,7 +292,7 @@ def check(self, content): remediation="Use spot instances for batch jobs, data analysis, and optional tasks. Consider aws_spot_instance_request or spot_price in launch templates.", estimated_savings="50-90% savings on compute (hundreds to thousands per month)", pattern=r'(spot_instance_request|spot_price|spot\s*=|provisioning_model|market_type)', - resource_pattern=r'(instance_type\s*=|aws_instance|aws_launch)' + resource_pattern=r'resource\s*["\']aws_instance["\']' ), RegexRule( id="COST-013", diff --git a/scripts/update_pricing.py b/scripts/update_pricing.py new file mode 100755 index 0000000..3caa722 --- /dev/null +++ b/scripts/update_pricing.py @@ -0,0 +1,460 @@ +ο»Ώ#!/usr/bin/env python3 +""" +Update reporter/pricing_table.json with current AWS on-demand prices. + +Uses the public AWS Bulk Pricing JSON endpoints -- no credentials, no boto3. 
+The EC2 regional file is ~40 MB compressed; others are typically 5-20 MB each. +All files are cached to /tmp so re-runs in the same session skip the download. + +Usage: + python3 scripts/update_pricing.py [--region us-east-1] [--dry-run] [--no-cache] + +No extra dependencies beyond Python 3 stdlib. +""" + +import argparse +import gzip +import hashlib +import json +import logging +import os +import sys +import tempfile +import urllib.request +from datetime import date +from typing import Dict, Optional, Tuple + +logger = logging.getLogger(__name__) +logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s") + +_PRICING_BASE = "https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws" + +_REGION_NAME: Dict[str, str] = { + "us-east-1": "US East (N. Virginia)", + "us-east-2": "US East (Ohio)", + "us-west-1": "US West (N. California)", + "us-west-2": "US West (Oregon)", + "eu-west-1": "Europe (Ireland)", + "eu-central-1": "Europe (Frankfurt)", + "ap-southeast-1": "Asia Pacific (Singapore)", + "ap-northeast-1": "Asia Pacific (Tokyo)", +} + +_EC2_TYPES = [ + "t2.nano", "t2.micro", "t2.small", "t2.medium", "t2.large", "t2.xlarge", "t2.2xlarge", + "t3.nano", "t3.micro", "t3.small", "t3.medium", "t3.large", "t3.xlarge", "t3.2xlarge", + "m3.medium", "m3.large", + "m4.large", "m4.xlarge", "m4.2xlarge", "m4.4xlarge", "m4.10xlarge", + "m5.large", "m5.xlarge", "m5.2xlarge", "m5.4xlarge", "m5.8xlarge", + "c4.large", "c4.xlarge", "c4.2xlarge", "c4.4xlarge", "c4.8xlarge", + "c5.large", "c5.xlarge", "c5.2xlarge", + "r3.large", "r3.xlarge", "r3.2xlarge", "r3.4xlarge", "r3.8xlarge", + "r4.large", "r4.xlarge", "r4.2xlarge", "r4.4xlarge", "r4.8xlarge", "r4.16xlarge", + "r5.large", "r5.xlarge", "r5.2xlarge", "r5.8xlarge", "r5.12xlarge", "r5.24xlarge", +] + +_RDS_CLASSES = [ + "db.t2.micro", "db.t2.small", "db.t2.medium", "db.t2.large", "db.t2.xlarge", "db.t2.2xlarge", + "db.t3.micro", "db.t3.small", "db.t3.medium", "db.t3.large", "db.t3.xlarge", "db.t3.2xlarge", + 
"db.m3.medium", "db.m3.large", + "db.m4.large", "db.m4.xlarge", "db.m4.2xlarge", "db.m4.4xlarge", "db.m4.10xlarge", + "db.m5.large", "db.m5.xlarge", "db.m5.2xlarge", "db.m5.4xlarge", "db.m5.8xlarge", "db.m5.12xlarge", + "db.r3.large", "db.r3.xlarge", "db.r3.2xlarge", "db.r3.4xlarge", "db.r3.8xlarge", + "db.r4.large", "db.r4.xlarge", "db.r4.2xlarge", "db.r4.4xlarge", "db.r4.8xlarge", "db.r4.16xlarge", + "db.r5.large", "db.r5.xlarge", "db.r5.2xlarge", "db.r5.4xlarge", "db.r5.8xlarge", "db.r5.16xlarge", +] + +_ELASTICACHE_TYPES = [ + "cache.t3.micro", "cache.t3.small", "cache.t3.medium", + "cache.m5.large", "cache.m5.xlarge", + "cache.r5.large", "cache.r5.xlarge", + "cache.r6g.large", "cache.r6g.xlarge", +] + +_OPENSEARCH_TYPES = [ + "t3.small.search", "t3.medium.search", + "m5.large.search", "m5.xlarge.search", + "r5.large.search", "r5.xlarge.search", + "r6g.large.search", "r6g.xlarge.search", +] + +_REDSHIFT_TYPES = [ + "dc2.large", "dc2.8xlarge", + "ds2.xlarge", "ds2.8xlarge", + "ra3.xlplus", "ra3.4xlarge", "ra3.16xlarge", +] + +_MSK_TYPES = [ + "kafka.t3.small", + "kafka.m5.large", "kafka.m5.xlarge", "kafka.m5.2xlarge", "kafka.m5.4xlarge", + "kafka.m6g.large", "kafka.m6g.xlarge", +] + + +def _fetch_json(url: str, no_cache: bool = False) -> dict: + cache_key = hashlib.md5(url.encode()).hexdigest() + cache_path = os.path.join(tempfile.gettempdir(), f"infrascan_pricing_{cache_key}.json") + + if not no_cache and os.path.exists(cache_path): + logger.info("Using cached %s", url) + with open(cache_path, "r", encoding="utf-8") as f: + return json.load(f) + + logger.info("Downloading %s ...", url) + req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"}) + try: + with urllib.request.urlopen(req, timeout=300) as resp: + raw = resp.read() + encoding = resp.headers.get("Content-Encoding", "") + except Exception as exc: + sys.exit(f"Failed to download {url}: {exc}") + + if encoding == "gzip" or raw[:2] == b"\x1f\x8b": + raw = gzip.decompress(raw) + + data = 
json.loads(raw) + with open(cache_path, "w", encoding="utf-8") as f: + json.dump(data, f) + return data + + +def _on_demand_price(data: dict, sku: str) -> Optional[float]: + for term in data.get("terms", {}).get("OnDemand", {}).get(sku, {}).values(): + for dim in term.get("priceDimensions", {}).values(): + try: + p = float(dim["pricePerUnit"]["USD"]) + if p > 0: + return p + except (KeyError, ValueError): + pass + return None + + +# EC2 + EBS + +def fetch_ec2_prices(data: dict, region_name: str) -> Dict[str, float]: + wanted = set(_EC2_TYPES) + sku_map: Dict[str, str] = {} + for sku, prod in data.get("products", {}).items(): + attrs = prod.get("attributes", {}) + if ( + attrs.get("instanceType") in wanted + and attrs.get("operatingSystem") == "Linux" + and attrs.get("tenancy") == "Shared" + and attrs.get("preInstalledSw") == "NA" + and attrs.get("capacitystatus") == "Used" + and attrs.get("location") == region_name + ): + sku_map[attrs["instanceType"]] = sku + + prices: Dict[str, float] = {} + for itype in _EC2_TYPES: + sku = sku_map.get(itype) + if not sku: + logger.warning(" EC2 %-20s - SKU not found, keeping existing", itype) + continue + p = _on_demand_price(data, sku) + if p is None: + logger.warning(" EC2 %-20s - price not found, keeping existing", itype) + continue + prices[itype] = round(p * 730, 2) + logger.info(" EC2 %-20s $%.6f/hr -> $%.2f/mo", itype, p, prices[itype]) + return prices + + +def fetch_ebs_prices(data: dict, region_name: str) -> Tuple[Dict[str, float], Dict[str, float]]: + vol_types = {"gp2", "gp3", "io1", "io2", "st1", "sc1"} + gb_skus: Dict[str, str] = {} + iops_skus: Dict[str, str] = {} + + for sku, prod in data.get("products", {}).items(): + attrs = prod.get("attributes", {}) + if attrs.get("location") != region_name: + continue + vol = attrs.get("volumeApiName", "") + fam = prod.get("productFamily", "") + if fam == "Storage" and vol in vol_types: + gb_skus[vol] = sku + elif fam == "System Operation" and attrs.get("group") == "EBS IOPS" 
and vol in {"io1", "io2"}: + iops_skus[vol] = sku + + per_gb: Dict[str, float] = {} + for vol in sorted(vol_types): + sku = gb_skus.get(vol) + if not sku: + logger.warning(" EBS %-6s GB - SKU not found, keeping existing", vol) + continue + p = _on_demand_price(data, sku) + if p is not None: + per_gb[vol] = round(p, 4) + logger.info(" EBS %-6s GB $%.4f/GB-mo", vol, p) + else: + logger.warning(" EBS %-6s GB - price not found, keeping existing", vol) + + per_iops: Dict[str, float] = {} + for vol in ["io1", "io2"]: + sku = iops_skus.get(vol) + if not sku: + continue + p = _on_demand_price(data, sku) + if p is not None: + per_iops[vol] = round(p, 5) + logger.info(" EBS %-6s IOPS $%.5f/IOPS-mo", vol, p) + + return per_gb, per_iops + + +# RDS + +def fetch_rds_prices(data: dict, region_name: str) -> Dict[str, float]: + wanted = set(_RDS_CLASSES) + sku_map: Dict[str, str] = {} + for sku, prod in data.get("products", {}).items(): + attrs = prod.get("attributes", {}) + if ( + attrs.get("instanceType") in wanted + and attrs.get("databaseEngine") == "MySQL" + and attrs.get("deploymentOption") == "Single-AZ" + and attrs.get("location") == region_name + ): + sku_map[attrs["instanceType"]] = sku + + prices: Dict[str, float] = {} + for cls in _RDS_CLASSES: + sku = sku_map.get(cls) + if not sku: + logger.warning(" RDS %-25s - SKU not found, keeping existing", cls) + continue + p = _on_demand_price(data, sku) + if p is None: + logger.warning(" RDS %-25s - price not found, keeping existing", cls) + continue + prices[cls] = round(p * 730, 2) + logger.info(" RDS %-25s $%.6f/hr -> $%.2f/mo", cls, p, prices[cls]) + return prices + + +# ElastiCache + +def fetch_elasticache_prices(data: dict, region_name: str) -> Dict[str, float]: + wanted = set(_ELASTICACHE_TYPES) + sku_map: Dict[str, str] = {} + for sku, prod in data.get("products", {}).items(): + attrs = prod.get("attributes", {}) + if ( + attrs.get("instanceType") in wanted + and attrs.get("cacheEngine") == "Redis" + and 
attrs.get("location") == region_name + and prod.get("productFamily") == "Cache Instance" + ): + sku_map[attrs["instanceType"]] = sku + + prices: Dict[str, float] = {} + for itype in _ELASTICACHE_TYPES: + sku = sku_map.get(itype) + if not sku: + logger.warning(" ElastiCache %-25s - SKU not found, keeping existing", itype) + continue + p = _on_demand_price(data, sku) + if p is None: + logger.warning(" ElastiCache %-25s - price not found, keeping existing", itype) + continue + prices[itype] = round(p * 730, 2) + logger.info(" ElastiCache %-25s $%.6f/hr -> $%.2f/mo", itype, p, prices[itype]) + return prices + + +# OpenSearch / Elasticsearch + +def fetch_opensearch_prices(data: dict, region_name: str) -> Dict[str, float]: + # The bulk pricing API still uses the legacy "AmazonES" service code. + # productFamily is "Amazon Elasticsearch Service Instance" for both ES and OpenSearch instances. + wanted = set(_OPENSEARCH_TYPES) + sku_map: Dict[str, str] = {} + for sku, prod in data.get("products", {}).items(): + attrs = prod.get("attributes", {}) + if ( + attrs.get("instanceType") in wanted + and attrs.get("location") == region_name + and "Elasticsearch" in prod.get("productFamily", "") + ): + sku_map[attrs["instanceType"]] = sku + + prices: Dict[str, float] = {} + for itype in _OPENSEARCH_TYPES: + sku = sku_map.get(itype) + if not sku: + logger.warning(" OpenSearch %-25s - SKU not found, keeping existing", itype) + continue + p = _on_demand_price(data, sku) + if p is None: + logger.warning(" OpenSearch %-25s - price not found, keeping existing", itype) + continue + prices[itype] = round(p * 730, 2) + logger.info(" OpenSearch %-25s $%.6f/hr -> $%.2f/mo", itype, p, prices[itype]) + return prices + + +# Redshift + +def fetch_redshift_prices(data: dict, region_name: str) -> Dict[str, float]: + wanted = set(_REDSHIFT_TYPES) + sku_map: Dict[str, str] = {} + for sku, prod in data.get("products", {}).items(): + attrs = prod.get("attributes", {}) + if ( +
attrs.get("instanceType") in wanted + and attrs.get("location") == region_name + and prod.get("productFamily") == "Compute Instance" + and "Redshift" in attrs.get("servicecode", "") + ): + sku_map[attrs["instanceType"]] = sku + + prices: Dict[str, float] = {} + for itype in _REDSHIFT_TYPES: + sku = sku_map.get(itype) + if not sku: + logger.warning(" Redshift %-20s - SKU not found, keeping existing", itype) + continue + p = _on_demand_price(data, sku) + if p is None: + logger.warning(" Redshift %-20s - price not found, keeping existing", itype) + continue + prices[itype] = round(p * 730, 2) + logger.info(" Redshift %-20s $%.6f/hr -> $%.2f/mo", itype, p, prices[itype]) + return prices + + +# MSK (Kafka) + +def fetch_msk_prices(data: dict, region_name: str) -> Dict[str, float]: + wanted = set(_MSK_TYPES) + sku_map: Dict[str, str] = {} + for sku, prod in data.get("products", {}).items(): + attrs = prod.get("attributes", {}) + if ( + attrs.get("instanceType") in wanted + and attrs.get("location") == region_name + ): + sku_map[attrs["instanceType"]] = sku + + prices: Dict[str, float] = {} + for itype in _MSK_TYPES: + sku = sku_map.get(itype) + if not sku: + logger.warning(" MSK %-25s - SKU not found, keeping existing", itype) + continue + p = _on_demand_price(data, sku) + if p is None: + logger.warning(" MSK %-25s - price not found, keeping existing", itype) + continue + prices[itype] = round(p * 730, 2) + logger.info(" MSK %-25s $%.6f/hr -> $%.2f/mo", itype, p, prices[itype]) + return prices + + +# main + +def main() -> None: + ap = argparse.ArgumentParser( + description="Update InfraScan pricing_table.json from AWS public pricing (no credentials needed)" + ) + ap.add_argument("--region", default="us-east-1") + ap.add_argument("--dry-run", action="store_true", help="Print changes without writing") + ap.add_argument("--no-cache", action="store_true", help="Ignore /tmp cache, re-download") + args = ap.parse_args() + + region_name = _REGION_NAME.get(args.region) + if
not region_name: + sys.exit( + f"Unknown region '{args.region}'. Supported: {', '.join(_REGION_NAME)}. " + "Add it to _REGION_NAME in this script." + ) + + if args.no_cache: + for fname in os.listdir(tempfile.gettempdir()): + if fname.startswith("infrascan_pricing_"): + os.remove(os.path.join(tempfile.gettempdir(), fname)) + + pricing_path = os.path.normpath( + os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "reporter", "pricing_table.json") + ) + with open(pricing_path, "r", encoding="utf-8") as f: + table = json.load(f) + + ec2_url = f"{_PRICING_BASE}/AmazonEC2/current/{args.region}/index.json" + rds_url = f"{_PRICING_BASE}/AmazonRDS/current/{args.region}/index.json" + elasticache_url = f"{_PRICING_BASE}/AmazonElastiCache/current/{args.region}/index.json" + opensearch_url = f"{_PRICING_BASE}/AmazonES/current/{args.region}/index.json" + redshift_url = f"{_PRICING_BASE}/AmazonRedshift/current/{args.region}/index.json" + msk_url = f"{_PRICING_BASE}/AmazonMSK/current/{args.region}/index.json" + + logger.info("=== EC2 + EBS ===") + ec2_data = _fetch_json(ec2_url, args.no_cache) + for k, v in fetch_ec2_prices(ec2_data, region_name).items(): + table["ec2_instances"][k] = v + per_gb, per_iops = fetch_ebs_prices(ec2_data, region_name) + for k, v in per_gb.items(): + table["ebs_per_gb_month"][k] = v + for k, v in per_iops.items(): + table["ebs_iops_per_iops_month"][k] = v + + logger.info("=== RDS ===") + rds_data = _fetch_json(rds_url, args.no_cache) + for k, v in fetch_rds_prices(rds_data, region_name).items(): + table["rds_instances"][k] = v + + logger.info("=== ElastiCache ===") + ec_data = _fetch_json(elasticache_url, args.no_cache) + ec_prices = fetch_elasticache_prices(ec_data, region_name) + if ec_prices: + if "elasticache_instances" not in table: + table["elasticache_instances"] = {} + for k, v in ec_prices.items(): + table["elasticache_instances"][k] = v + + logger.info("=== OpenSearch ===") + os_data = _fetch_json(opensearch_url, args.no_cache) + 
os_prices = fetch_opensearch_prices(os_data, region_name) + if os_prices: + if "opensearch_instances" not in table: + table["opensearch_instances"] = {} + for k, v in os_prices.items(): + table["opensearch_instances"][k] = v + + logger.info("=== Redshift ===") + rs_data = _fetch_json(redshift_url, args.no_cache) + rs_prices = fetch_redshift_prices(rs_data, region_name) + if rs_prices: + if "redshift_nodes" not in table: + table["redshift_nodes"] = {} + for k, v in rs_prices.items(): + table["redshift_nodes"][k] = v + + logger.info("=== MSK ===") + msk_data = _fetch_json(msk_url, args.no_cache) + msk_prices = fetch_msk_prices(msk_data, region_name) + if msk_prices: + if "msk_brokers" not in table: + table["msk_brokers"] = {} + for k, v in msk_prices.items(): + table["msk_brokers"][k] = v + + table["version"] = date.today().isoformat() + table["region"] = args.region + + if args.dry_run: + print(json.dumps(table, indent=2)) + logger.info("Dry-run: no file written.") + return + + with open(pricing_path, "w", encoding="utf-8") as f: + json.dump(table, f, indent=2) + f.write("\n") + + logger.info("Written %s (version: %s)", pricing_path, table["version"]) + + +if __name__ == "__main__": + main() + diff --git a/static/app.js b/static/app.js index 4c6991a..56256db 100644 --- a/static/app.js +++ b/static/app.js @@ -503,6 +503,7 @@ function initApp() { security: currentGradeReport ? currentGradeReport.security : null, container: currentGradeReport ? currentGradeReport.container : null, analysis: currentGradeReport ? currentGradeReport.analysis : null, + metrics: currentGradeReport ? currentGradeReport.metrics : null, is_private: currentMetadata ? currentMetadata.is_private : false }) }); @@ -756,6 +757,7 @@ function initApp() { security: data.security, container: data.container, analysis: data.analysis, + metrics: data.metrics, is_private: data.metadata ? 
data.metadata.is_private : false }) }); @@ -1097,10 +1099,37 @@ function initApp() { Problem: ${escapeHtml(first.description)} ` : ''} - ${first.scanner === 'regex' ? ` -
- Potential Savings: ${escapeHtml(first.estimated_savings)} -
` : ''} + ${first.scanner === 'regex' ? (() => { + // Look for computed cost data from the cost estimator (Phase 1+2) + const perFinding = currentGradeReport?.metrics?.savings_estimate?.per_finding || []; + const costData = perFinding.find( + pf => pf.rule_id === first.rule_id && pf.file === first.file && pf.line === first.line + ); + if (costData && (costData.saving_high > 0 || costData.before_usd > 0)) { + const savingStr = costData.saving_low === costData.saving_high + ? `$${costData.saving_low.toFixed(2)}` + : `$${costData.saving_low.toFixed(2)} – $${costData.saving_high.toFixed(2)}`; + const confIcon = costData.confidence === 'high' ? '🟢' : costData.confidence === 'medium' ? '🟡' : '⚪'; + const assumptions = (costData.assumptions || []).join('; '); + let afterStr = ''; + if (costData.before_usd > 0) { + const afterBest = Math.max(0, costData.before_usd - costData.saving_high); + const afterWorst = Math.max(0, costData.before_usd - costData.saving_low); + afterStr = afterBest === afterWorst + ? `$${afterBest.toFixed(2)}` + : `$${afterBest.toFixed(2)}–$${afterWorst.toFixed(2)}`; + } + return `
+ Estimated Saving: + ${savingStr}/mo ${confIcon} + ${afterStr ? `current: $${costData.before_usd.toFixed(2)} → after: ${afterStr}` : ''} +
`; + } + // Fall back to the static estimated_savings text + return `
+ Potential Savings: ${escapeHtml(first.estimated_savings)} +
`; + })() : ''}
Occurrences: ${fileCount} ${fileCount === 1 ? 'location' : 'locations'}
@@ -1410,6 +1439,67 @@ function initApp() { }); } + function renderSavingsCard(est, trafficProfile) { + if (!est || (est.low_usd_month === 0 && est.high_usd_month === 0 && !est.total_infra_cost_usd_month)) return ''; + + const low = est.low_usd_month || 0; + const high = est.high_usd_month || 0; + const total = est.total_infra_cost_usd_month; + const pctLoTot = est.savings_pct_of_total_low; + const pctHiTot = est.savings_pct_of_total_high; + const pctLoDet = est.savings_pct_of_detectable_low; + const pctHiDet = est.savings_pct_of_detectable_high; + const provider = est.cost_provider || 'internal'; + const profile = trafficProfile || 'small'; + + const savingRange = low === high + ? `$${low.toLocaleString('en-US', {minimumFractionDigits: 2, maximumFractionDigits: 2})}` + : `$${low.toLocaleString('en-US', {minimumFractionDigits: 2, maximumFractionDigits: 2})} – $${high.toLocaleString('en-US', {minimumFractionDigits: 2, maximumFractionDigits: 2})}`; + + let pctLine = ''; + if (pctLoTot != null && total) { + pctLine = `${pctLoTot}%–${pctHiTot}% of $${Math.round(total).toLocaleString()}/mo total infra cost`; + } else if (pctLoDet != null) { + pctLine = `${pctLoDet}%–${pctHiDet}% of detectable resource cost`; + } + + // Top 3 findings by saving + const topFindings = (est.per_finding || []) + .filter(f => f.saving_high > 0) + .sort((a, b) => b.saving_high - a.saving_high) + .slice(0, 3); + + const topRows = topFindings.map(f => { + const s = f.saving_low === f.saving_high + ? `$${f.saving_low.toFixed(2)}` + : `$${f.saving_low.toFixed(2)}–$${f.saving_high.toFixed(2)}`; + const fname = (f.file || '').split('/').pop().split('\\\\').pop(); + const conf = f.confidence === 'high' ? '🟢' : f.confidence === 'medium' ? '🟡' : '⚪'; + return ` + ${escapeHtml(f.rule_id || '')} + ${escapeHtml(fname)}${f.line ? ':' + f.line : ''} + ${s}/mo + ${conf} + `; + }).join(''); + + return ` +
+
+ 💰 + Estimated Monthly Savings + ${escapeHtml(provider)} · ${escapeHtml(profile)} profile +
+
${savingRange}/month
+ ${pctLine ? `
${pctLine}
` : ''} + ${topRows ? ` + + + ${topRows} +
RuleResourceSavingConfidence
` : ''} +
`; + } + function renderGradeReport(gradeReport) { if (!gradeReport || !gradeReport.overall) return ''; @@ -1439,7 +1529,7 @@ function initApp() { return explanations[title] || ''; }; - const renderGradeCard = (title, grade, icon) => { + const renderGradeCard = (title, grade, icon, extra = '') => { if (!grade) return ''; // Context-aware label for violations @@ -1474,12 +1564,128 @@ function initApp() { ${grade.severity_breakdown.low > 0 ? `${grade.severity_breakdown.low} Low` : ''} ` : ''} + ${extra} `; }; const singleScannerMode = gradeReport.metrics?.single_scanner_mode; - const recommendations = gradeReport.analysis?.recommendations || []; + const recommendations = [...(gradeReport.analysis?.recommendations || [])]; + const est = gradeReport.metrics?.savings_estimate; + + // Build savings panel: per-resource savings via exact block_file/block_line join + const savingsPanelHtml = (() => { + if (!est || (est.low_usd_month === 0 && est.high_usd_month === 0 && !est.total_infra_cost_usd_month)) return ''; + const lo = est.low_usd_month || 0; + const hi = est.high_usd_month || 0; + const tot = est.total_infra_cost_usd_month; + const fmt = n => `$${Math.round(n).toLocaleString('en-US')}`; + const savingStr = lo === hi ? fmt(lo) : `${fmt(lo)} – ${fmt(hi)}`; + const pctLo = est.savings_pct_of_total_low; + const pctHi = est.savings_pct_of_total_high; + + // Group per_finding by block coords (new) or finding coords (old scans) + const findingsByBlock = {}; + (est.per_finding || []).forEach(pf => { + const bf = pf.block_file != null ? pf.block_file : pf.file; + const bl = pf.block_line != null ?
pf.block_line : pf.line; + const key = `${bf}::${bl}`; + if (!findingsByBlock[key]) findingsByBlock[key] = []; + findingsByBlock[key].push(pf); + }); + + // Attach per-resource savings and sort by savings desc + const resources = (gradeReport.metrics?.resource_costs || []) + .filter(r => r.total_usd_month > 0) + .map(r => { + const matched = findingsByBlock[`${r.file}::${r.line}`] || []; + const sLow = Math.min(matched.reduce((s, pf) => s + pf.saving_low, 0), r.total_usd_month); + const sHigh = Math.min(matched.reduce((s, pf) => s + pf.saving_high, 0), r.total_usd_month); + return { ...r, savings_low: sLow, savings_high: sHigh }; + }) + .sort((a, b) => b.savings_high - a.savings_high || b.total_usd_month - a.total_usd_month); + + const makeRow = r => { + const confLevel = r.confidence === 'high' ? 'high' : r.confidence === 'medium' ? 'medium' : 'low'; + const confIcon = confLevel === 'high' ? '🟢' : confLevel === 'medium' ? '🟡' : '⚪'; + const confLabels = { high: 'High — price read directly from config', medium: 'Medium — partially inferred (variable reference or missing field)', low: 'Low — inferred from variable defaults or fallback estimate' }; + const confTip = [confLabels[confLevel], ...(r.assumptions || [])].join('\n'); + const conf = `${confIcon}`; + const fmtS = n => n < 1 ? `<$1` : `$${Math.round(n).toLocaleString('en-US')}`; + const sStr = r.savings_high > 0 + ? `${ + r.savings_low === r.savings_high + ?
`${fmtS(r.savings_high)}/mo` + : `${fmtS(r.savings_low)}–${fmtS(r.savings_high)}/mo` + }` + : `–`; + return ` + ${escapeHtml(r.resource_type.replace('aws_', ''))} + ${escapeHtml(r.resource_name || (r.file||'').split('/').pop())} + ${sStr} + ${fmt(r.total_usd_month)}/mo + ${conf} + `; + }; + + // Columns: 20% type, 30% resource, 22% savings, 22% cost, 6% conf + const colgroup = ` + + + `; + const thead = `TypeResourceSavingsCostConf.`; + + const top = resources.slice(0, 3); + // In expanded section only show resources that have savings to review + const rest = resources.slice(3).filter(r => r.savings_high > 0); + + return ` +
+

💰 Cost Savings Estimate

+
${savingStr}/mo
+ ${(pctLo != null && tot && pctHi < 99) + ? `
${pctLo}%–${pctHi}% of ${fmt(tot)}/mo total infra cost
` + : (tot ? `
vs. ${fmt(tot)}/mo measured infra cost ⓘ
` : '') + } + ${top.length ? ` + + ${colgroup}${thead} + ${top.map(makeRow).join('')} +
` : ''} + ${rest.length ? ` +
+ ${rest.length} more resource${rest.length !== 1 ? 's' : ''} with savings + + ${colgroup}${rest.map(makeRow).join('')} +
+
` : ''} + ${(() => { + const total = est.total_block_count || 0; + const covered = est.covered_block_count || 0; + const unpriced = (est.uncovered_resource_types || []) + .filter(t => !['aws_vpc','aws_subnet','aws_route_table','aws_route','aws_internet_gateway', + 'aws_security_group','aws_security_group_rule','aws_iam_role','aws_iam_policy', + 'aws_iam_role_policy','aws_iam_role_policy_attachment','aws_iam_instance_profile', + 'aws_network_acl','aws_network_acl_rule','aws_vpc_dhcp_options', + 'aws_vpc_dhcp_options_association','aws_main_route_table_association', + 'aws_route_table_association','aws_vpc_endpoint','aws_vpn_gateway', + 'aws_customer_gateway','aws_acm_certificate','aws_acm_certificate_validation', + 'aws_key_pair','aws_placement_group','aws_ssm_parameter','aws_secretsmanager_secret', + 'aws_secretsmanager_secret_version','aws_kms_key','aws_kms_alias', + 'aws_cloudwatch_log_subscription_filter'].includes(t)); + if (!total || covered >= total) return ''; + const unpriced3 = unpriced.slice(0,3).map(t => t.replace('aws_','')).join(', '); + const more = unpriced.length > 3 ? ` +${unpriced.length - 3} more` : ''; + return `
Priced ${covered} of ${total} resources${unpriced.length ? ` · potentially unpriced: ${unpriced3}${more}` : ''}
`; + })()} +
`; + })(); + + const hasSavings = savingsPanelHtml !== ''; + + // Only show recommendations panel when there are real actionable security alerts, + // not the generic fallback "no significant issues" placeholder. + const actionableRecs = recommendations.filter(r => !r.startsWith('✅')); return `
@@ -1489,20 +1695,22 @@ function initApp() { ${!singleScannerMode && gradeReport.overall ? renderGradeCard('Overall Grade', gradeReport.overall, '🎯') : ''} - ${renderGradeCard('Cost Optimization', gradeReport.cost, '💰')} ${renderGradeCard('IaC Security', gradeReport.security, '🔒')} ${renderGradeCard('Container Security', gradeReport.container, '🐳')}
- ${recommendations.length > 0 ? ` -
-

💡 Recommendations

- -
- ` : ''} + ${(actionableRecs.length > 0 || hasSavings) ? ` +
+ ${actionableRecs.length > 0 ? ` +
+

💡 Recommendations

+
    + ${actionableRecs.map(rec => `
  • ${escapeHtml(rec)}
  • `).join('')} +
+
` : ''} + ${hasSavings ? savingsPanelHtml : ''} +
` : ''} ` ; } diff --git a/static/pdf_generator.js b/static/pdf_generator.js index 5fb0f33..a11a5a1 100644 --- a/static/pdf_generator.js +++ b/static/pdf_generator.js @@ -113,6 +113,32 @@ function buildPdfDocument(results, summary, metadata, gradeReport) { const TH = (txt, w='') => `${txt}`; + // Build a lookup: rule_id+file+line -> computed cost data from savings_estimate + const perFindingLookup = {}; + (gradeReport?.metrics?.savings_estimate?.per_finding || []).forEach(pf => { + const key = `${pf.rule_id}::${pf.file}::${pf.line}`; + perFindingLookup[key] = pf; + }); + + // For a rule group, return the computed saving range summed across all occurrences, + // or fall back to the static estimated_savings text when no occurrence has computed data. + function computedSavingStr(ruleId, findings, staticFallback) { + let totalLow = 0, totalHigh = 0, hasData = false; + findings.forEach(f => { + const pf = perFindingLookup[`${ruleId}::${f.file}::${f.line}`]; + if (pf && (pf.saving_high > 0 || pf.before_usd > 0)) { + totalLow += pf.saving_low || 0; + totalHigh += pf.saving_high || 0; + hasData = true; + } + }); + if (!hasData) return esc(staticFallback || '—'); + const fmtN = n => (n > 0 && n < 1) ? '<$1' : `$${Math.round(n).toLocaleString('en-US')}`; + return totalLow === totalHigh + ? `${fmtN(totalHigh)}/mo` + : `${fmtN(totalLow)}–${fmtN(totalHigh)}/mo`; + } + function costTable() { if (costGroups.length === 0) return showIaC() || containerScannerName ? `
@@ -126,11 +152,12 @@ function buildPdfDocument(results, summary, metadata, gradeReport) { const files = findings.slice(0,3).map(fi => `
${esc(trunc(fi.file,50))}${fi.line?':'+fi.line:''}
`).join('') + (findings.length>3 ? `
+${findings.length-3} more…
` : ''); + const savingDisplay = computedSavingStr(f.rule_id, findings, f.estimated_savings); return `
${esc(f.rule_name)}
${esc(f.description || '')}
${sevBadge(f.severity)} ${findings.length} - ${esc(f.estimated_savings||'—')} + ${savingDisplay}
${esc(f.remediation || '')}
${files} `; }).join(''); @@ -425,6 +452,28 @@ function buildPdfDocument(results, summary, metadata, gradeReport) { ${gradesSection} + +${(() => { + const est = gradeReport?.metrics?.savings_estimate; + if (!est) return ''; + const low = est.low_usd_month || 0; + const high = est.high_usd_month || 0; + if (high === 0) return ''; + const total = est.total_infra_cost_usd_month; + const pctLo = est.savings_pct_of_total_low; + const pctHi = est.savings_pct_of_total_high; + const fmt = n => `$${Math.round(n).toLocaleString('en-US')}`; + const rangeStr = low === high ? `${fmt(high)}/mo` : `${fmt(low)} – ${fmt(high)}/mo`; + const pctStr = (pctLo != null && total) + ? `${pctLo}%–${pctHi}% of ${fmt(total)}/mo total infrastructure cost` + : (total ? `vs. ${fmt(total)}/mo measured infrastructure cost` : ''); + return `
+
+ 💰 Cost Savings Estimate
+
${rangeStr}
+ ${pctStr ? `
${pctStr}
` : ''} +
`; +})()} + ${costTable()} diff --git a/static/style.css b/static/style.css index edf558c..4b8812c 100644 --- a/static/style.css +++ b/static/style.css @@ -2226,6 +2226,12 @@ textarea:focus { font-size: 0.9rem; } +.grade-savings-inline { + margin-top: 0.6rem; + padding-top: 0.6rem; + border-top: 1px solid var(--border); +} + .grade-detail-label { color: var(--text-muted); } @@ -2272,6 +2278,96 @@ textarea:focus { color: #93c5fd; } +/* ── Savings Card ── */ +.savings-card { + background: linear-gradient(135deg, rgba(16, 185, 129, 0.12), rgba(5, 150, 105, 0.06)); + border: 1px solid rgba(16, 185, 129, 0.35); + border-radius: 12px; + padding: 1.5rem; + margin: 1.25rem 0; +} + +.savings-card-header { + display: flex; + align-items: center; + gap: 0.6rem; + margin-bottom: 0.75rem; +} + +.savings-icon { font-size: 1.4rem; } + +.savings-title { + font-size: 1.05rem; + font-weight: 700; + color: var(--text-main); + flex: 1; +} + +.savings-badge { + font-size: 0.75rem; + background: rgba(16, 185, 129, 0.2); + color: #6ee7b7; + padding: 2px 8px; + border-radius: 20px; + font-weight: 500; + text-transform: capitalize; +} + +.savings-amount { + font-size: 2rem; + font-weight: 800; + color: #10b981; + letter-spacing: -0.5px; + line-height: 1.1; +} + +.savings-unit { + font-size: 0.95rem; + font-weight: 500; + color: #6ee7b7; + margin-left: 4px; +} + +.savings-pct-line { + margin-top: 0.35rem; + font-size: 0.87rem; + color: var(--text-secondary); +} + +.savings-pct { color: #fbbf24; font-weight: 600; } + +.savings-table { + width: 100%; + border-collapse: collapse; + margin-top: 1rem; + font-size: 0.85rem; + table-layout: fixed; +} + +.savings-table th { + text-align: left; + color: var(--text-secondary); + font-weight: 600; + border-bottom: 1px solid rgba(255,255,255,0.1); + padding: 0.4rem 0.5rem; + overflow: hidden; + white-space: nowrap; + text-overflow: ellipsis; +} + +.savings-table td { + padding: 0.4rem 0.5rem; + border-bottom: 1px solid rgba(255,255,255,0.05); + 
vertical-align: top; + overflow: hidden; + white-space: nowrap; + text-overflow: ellipsis; + max-width: 0; +} + +.savings-table tr:last-child td { border-bottom: none; } +/* ───────────────── */ + .recommendations-section { background: rgba(99, 102, 241, 0.1); border: 1px solid rgba(99, 102, 241, 0.3); @@ -2279,6 +2375,85 @@ textarea:focus { padding: 1.5rem; } +/* Two-column insights row: recommendations (left) + savings panel (right) */ +.insights-row { + display: grid; + grid-template-columns: 1fr; + gap: 1rem; + margin-top: 1rem; +} + +.insights-row.has-savings { + grid-template-columns: 1fr 1fr; +} + +/* Savings panel */ +.savings-panel { + background: rgba(16, 185, 129, 0.07); + border: 1px solid rgba(16, 185, 129, 0.3); + border-radius: 8px; + padding: 1.5rem; +} + +.savings-panel-title { + font-size: 1.1rem; + font-weight: 600; + margin-bottom: 0.75rem; + color: var(--text-main); +} + +.savings-panel-amount { + font-size: 1.6rem; + font-weight: 700; + color: var(--success); + line-height: 1.2; +} + +.savings-panel-unit { + font-size: 1rem; + font-weight: 400; + color: var(--text-secondary); + margin-left: 2px; +} + +.savings-panel-pct { + font-size: 0.85rem; + color: var(--text-secondary); + margin-top: 0.2rem; + margin-bottom: 0.5rem; +} + +.savings-expand { + margin-top: 0.4rem; +} + +.savings-expand summary { + cursor: pointer; + font-size: 0.85rem; + color: var(--text-secondary); + padding: 0.3rem 0; + user-select: none; + list-style: none; /* Firefox */ +} + +.savings-expand summary::-webkit-details-marker { + display: none; /* Chrome / Safari / Edge */ +} + +.savings-expand summary::before { + content: '▶ '; + font-size: 0.7em; + vertical-align: middle; +} + +.savings-expand[open] summary::before { + content: '▼ '; +} + +.savings-expand summary:hover { + color: var(--text-main); +} + .recommendations-title { font-size: 1.1rem; font-weight: 600; @@ -3132,6 +3307,18 @@ textarea:focus { .severity-tag.low-tag { color: #64748b !important; } /* ---- 
Recommendations ---- */ + .insights-row.has-savings { + grid-template-columns: 1fr !important; + } + + .savings-panel { + background: #f0fdf4 !important; + border: 1px solid #86efac !important; + } + + .savings-panel-amount { color: #15803d !important; } + .savings-panel-pct { color: #374151 !important; } + .recommendations-section { background: #f1f5f9 !important; border: 1px solid #cbd5e1 !important;