Integrate BugsInPy by t-sorger · Pull Request #184 · ASSERT-KTH/repairbench-framework

t-sorger · 2024-11-24T16:57:09Z

WIP: Issue #178

andre15silva · 2025-01-14T09:17:30Z

Any update here?

FYI, it would probably be a good idea to rebase with master, I made some updates recently.

t-sorger · 2025-01-14T11:36:44Z

I am still encountering issues with BugsInPy and how it works.
I will send you an email so we can arrange a meeting to discuss the problems I’m facing.

Thanks for the hint, I will rebase with master as soon as possible!

t-sorger · 2025-05-28T07:00:40Z

Hi @andre15silva, I've been very busy lately finishing my thesis. Therefore, I will continue working on this PR after my defence, which is scheduled for mid-June.

andre15silva · 2025-07-01T11:19:10Z

Thanks for the updates @t-sorger ! What are the current blockers here?

t-sorger · 2025-07-01T12:10:11Z

Hi @andre15silva, I still need to check why the tests are running locally but not here; I haven’t had a chance to look into it yet.
Other than that, the tests for core should be more or less done. There were some internal dependency issues that I need to double-check and figure out which bugs are affected and how to resolve them.
Next step (please correct me if I’m wrong) would be to continue/start writing the tests for sampling and evaluation.

andre15silva · 2025-07-02T13:51:39Z

Got it.

As for the next steps, yes that's about it. The list we had defined is still valid: #178 (comment)

t-sorger · 2025-09-23T13:34:23Z

Hi @andre15silva,
I’d like to run a test across the entire benchmark now to identify unreproducible bugs and exclude them.
From my understanding, this can be done by running test_checkout_all_bugs and test_run_all_bugs and then checking which ones fail and why.
Is there anything else I should keep in mind?
Thanks!

andre15silva · 2025-09-24T15:20:47Z

Hi @t-sorger !

test_checkout_all_bugs only ensures that every bug can be checked out.
test_run_all_bugs should check out each bug and run both the buggy and the fixed version.

To identify the flaky ones, you want to run them several times. One solution is to add a for loop in test_run_all_bugs to confirm, e.g., 5 times that the results are as expected for both versions of each sample.
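A minimal sketch of the repeat-and-compare idea, not taken from the PR: `classify_bug` and the `run_tests` callable are hypothetical names, and `run_tests` is assumed to return True when the test suite passes for the given version ("buggy" or "fixed"). The actual framework API may look different.

```python
from typing import Callable


def classify_bug(run_tests: Callable[[str], bool], repetitions: int = 5) -> str:
    """Run both versions of one bug several times and classify the outcome.

    run_tests is a hypothetical callable: it receives "buggy" or "fixed"
    and returns True if the tests passed for that version.
    """
    buggy = [run_tests("buggy") for _ in range(repetitions)]
    fixed = [run_tests("fixed") for _ in range(repetitions)]

    # Any disagreement between repetitions of the same version means flakiness.
    if len(set(buggy)) > 1 or len(set(fixed)) > 1:
        return "flaky"
    # Expected behaviour: the buggy version fails, the fixed version passes.
    if not buggy[0] and fixed[0]:
        return "reproducible"
    return "unreproducible"
```

Bugs classified as "flaky" or "unreproducible" would then be the candidates for exclusion from the benchmark.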

t-sorger · 2025-10-01T12:58:00Z

test_checkout_all_bugs runs fine for all bugs (it takes around 3 hours to run).
I also started running test_run_all_bugs. It takes quite a while, so I let it run for a few hours (~130 bugs in so far). Many of the bugs appear to fail; some, which I double-checked manually, fail with a "command not found" error. I’m not sure if I’m missing a dependency, but the library’s documentation doesn’t mention any additional installation steps or external dependencies.

andre15silva · 2025-10-04T05:23:07Z

How many failed to run the tests and what command fails to run? Would be nice to have the statistics of this and a list of common errors.
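One way to collect such statistics, as a rough sketch under assumptions: the captured output of each failing bug is assumed to be available as a string keyed by a bug identifier, and shell errors are assumed to follow the usual `<name>: command not found` wording. `categorize`, `summarize`, and the sample identifiers are hypothetical and not part of the framework.

```python
import re
from collections import Counter
from typing import Dict

# Assumed shell wording, e.g. "bash: line 1: pytest: command not found".
COMMAND_NOT_FOUND = re.compile(r"(\S+): command not found")


def categorize(output: str) -> str:
    """Map the captured output of one failed run to a coarse error category."""
    match = COMMAND_NOT_FOUND.search(output)
    if match:
        return f"command not found: {match.group(1)}"
    # Otherwise fall back to the last non-empty line of the output.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else "no output"


def summarize(failures: Dict[str, str]) -> Counter:
    """Count how often each error category occurs across all failed bugs."""
    return Counter(categorize(output) for output in failures.values())
```

Calling `summarize(failures).most_common()` would list the most frequent error categories first, which could help decide which missing command to look into before the others.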

monperrus · 2026-02-17T06:40:14Z

ping @t-sorger for completion. thanks!

t-sorger · 2026-02-19T17:45:01Z

After running the tests on all the bugs, I got the following results.

Is there a specific project I should prioritise to analyse why they fail or why the command not found errors occur? I remember using ansible and pysnooper to go through the setup process manually to understand the flow, so I may have installed some dependencies when the setup failed. I assume some more dependencies are missing for the other projects as well.

monperrus · 2026-02-20T09:00:36Z

you can debug with any project and hopefully the root cause and its fix will be shared with the other ones.

t-sorger · 2026-03-05T20:42:06Z

I added more extensive logging and ran it over the entire dataset again, so I can now investigate the log files.

t-sorger added 5 commits November 24, 2024 17:54

add BugsInPy submodule

a09695d

add initial BugsInPybug.py

c9384d5

add initial BugsInPy.py to benchmark

ce48490

add BugsInPy to core utils

865975b

add initial tests for BugsInPy; fix typo

e8976c5

t-sorger added 23 commits January 14, 2025 13:37

add BugsInPy submodule

9a3325d

add initial BugsInPybug.py

96d79c5

add initial BugsInPy.py to benchmark

83b35cd

add BugsInPy to core utils

0cf0179

add initial tests for BugsInPy; fix typo

e09839c

add test implementation for BugsInPybug

f335bdf

fix bin path issues

2bc479a

lint code

bd08ec1

rework tests for BugsInPy

11600a3

update submodules

1cc7bc6

Update submodules when rebasing with master

Merge branch 'BugsInPy' of github.com:ASSERT-KTH/repairbench-framewor…

0d28f9d

…k into BugsInPy

add BugsInPy submodule

d3de871

add initial BugsInPybug.py

56f4502

add initial BugsInPy.py to benchmark

8274a8d

add BugsInPy to core utils

63f5834

add initial tests for BugsInPy; fix typo

8e761a6

add test implementation for BugsInPybug

41821d4

fix bin path issues

28e4c9a

lint code

21420fd

rework tests for BugsInPy

5962796

update submodules

ea287fa

Update submodules when rebasing with master

Merge branch 'BugsInPy' of github.com:ASSERT-KTH/repairbench-framewor…

17c438d

…k into BugsInPy

adds RichBug and fixes process calls

7177e86

t-sorger added 3 commits June 27, 2025 11:43

add first docker adoptations

b72565c

update BugsInPy for Docker

5507ee7

lint files

029538a

t-sorger force-pushed the BugsInPy branch from aac1e95 to 029538a · June 27, 2025 16:51

update steup

04a0fc0

t-sorger added 5 commits September 21, 2025 15:01

add sample/instruct test for BugsInPy

b629e73

add sample/infilling test for BugsInPy

70e7251

add evaluation tests for BugsInPy

6dd1290

add missing tests for RichBug implementation of BugsInPy

7c21a6d

remove prints

4963e5b

t-sorger added 8 commits June 13, 2026 16:02

use bugsinpy docker

fc33407

update Dockerfile.bugsinpy and clean up code

7d667dd

pins actions to a full-length commit SHA

7f6a817

try fixing workflow

c1a08d2

put setup.sh into initial state

bf00014

fix setup.sh

e98b448

update test workflow to only run BugsInPy tests

a6578a1

Merge branch 'master' into BugsInPy

a9f1cee
