Integrate BugsInPy #184

Open
t-sorger wants to merge 60 commits into master from BugsInPy

Conversation

@t-sorger

Collaborator

WIP: Issue #178

@andre15silva

Member

Hi @t-sorger,

Any update here?

FYI, it would probably be a good idea to rebase with master, I made some updates recently.

@t-sorger

Collaborator Author

Hi @andre15silva,

I am still encountering issues with BugsInPy and how it works.
I will send you an email so we can arrange a meeting to discuss the problems I’m facing.

Thanks for the hint, I will rebase with master as soon as possible!

@t-sorger

Collaborator Author

Hi @andre15silva, I've been very busy lately finishing my thesis. Therefore, I will continue working on this PR after my defence, which is scheduled for mid-June.

@andre15silva

Member

Thanks for the updates @t-sorger ! What are the current blockers here?

@t-sorger

t-sorger commented Jul 1, 2025

Collaborator Author

Hi @andre15silva, I still need to check why the tests are running locally but not here; I haven’t had a chance to look into it yet.
Other than that, the tests for core should be more or less done. There were some internal dependency issues that I need to double-check and figure out which bugs are affected and how to resolve them.
Next step (please correct me if I’m wrong) would be to continue/start writing the tests for sampling and evaluation.

@andre15silva

Member

Got it.

As for the next steps, yes that's about it. The list we had defined is still valid: #178 (comment)

@t-sorger

Collaborator Author

Hi @andre15silva,
I’d like to run a test across the entire benchmark now to identify unreproducible bugs and exclude them.
From my understanding, this can be done by running test_checkout_all_bugs and test_run_all_bugs and then checking which ones fail and why.
Is there anything else I should keep in mind?
Thanks!

@andre15silva

Member

Hi @t-sorger !

test_checkout_all_bugs only ensures that every bug can be checked out.
test_run_all_bugs should check out each bug and run both the buggy and the fixed version.

To identify the flaky ones, you want to run them several times. One solution is to add a for loop in test_run_all_bugs to confirm, e.g., 5 times that the results are as expected for both versions of each sample.
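The repeated-run idea above could be sketched roughly as follows. This is only an illustration, not the project's actual test code: `classify_bugs` and the `run_once` callback are hypothetical names, and `run_once` is assumed to perform one checkout-and-run cycle and return whether the tests passed on the buggy and on the fixed version.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Tuple


def classify_bugs(
    bug_ids: Iterable[Hashable],
    run_once: Callable[[Hashable], Tuple[bool, bool]],
    repeats: int = 5,
) -> Dict[Hashable, str]:
    """Run every bug `repeats` times and label it by outcome consistency.

    `run_once(bug_id)` is a hypothetical callback that returns
    (buggy_tests_pass, fixed_tests_pass) for a single run.
    """
    expected = (False, True)  # buggy version fails, fixed version passes
    labels: Dict[Hashable, str] = {}
    for bug_id in bug_ids:
        # Count how often each distinct outcome was observed.
        outcomes = Counter(run_once(bug_id) for _ in range(repeats))
        if len(outcomes) > 1:
            labels[bug_id] = "flaky"  # outcomes differ between runs
        elif expected in outcomes:
            labels[bug_id] = "reproducible"
        else:
            labels[bug_id] = "unreproducible"  # stable, but not as expected
    return labels
```

Separating "flaky" from "unreproducible" keeps the two exclusion reasons apart: the first group gives different results across runs, while the second is consistently wrong.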

@t-sorger

t-sorger commented Oct 1, 2025

Collaborator Author

test_checkout_all_bugs runs fine for all bugs (it takes around 3 hours).
I also started running test_run_all_bugs. It takes quite a while, so I let it run for a few hours (~130 bugs in so far). Many of the bugs appear to fail; some of them, which I double-checked manually, fail with a "command not found" error. I'm not sure whether I'm missing a dependency, but the library's documentation doesn't mention any additional installation steps or required external dependencies.
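One quick way to narrow down a "command not found" error is to check which executables are actually visible on `PATH` before starting a long run. A minimal sketch, assuming you already know which commands the setup scripts invoke (the names in the usage comment are placeholders, not taken from the BugsInPy documentation):

```python
import shutil
from typing import Iterable, List


def missing_commands(commands: Iterable[str]) -> List[str]:
    """Return the commands that cannot be found on the current PATH."""
    # shutil.which returns None when no matching executable is found.
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# Usage (placeholder command names):
# print(missing_commands(["git", "python3", "some-tool"]))
```

Running this inside the same environment (shell, container, or virtualenv) as the tests matters, since `PATH` can differ between an interactive session and the test runner.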

@andre15silva

Member

How many failed to run the tests and what command fails to run? Would be nice to have the statistics of this and a list of common errors.
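Such statistics could be collected with something like the sketch below. It assumes the run produces, for each bug, a project name and either `None` (success) or the captured error text; the `categorize` and `summarize` helpers, the input shape, and the sample messages in the usage comment are all made up for illustration and are not actual output from this PR.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches shell errors of the form "<name>: command not found".
_CMD_NOT_FOUND = re.compile(r"(\S+): command not found")


def categorize(error: str) -> str:
    """Map raw error text to a short category label."""
    match = _CMD_NOT_FOUND.search(error)
    if match:
        return f"command not found: {match.group(1)}"
    # Otherwise use the last non-empty line, which is usually the
    # final exception message in a Python traceback.
    lines = [line.strip() for line in error.splitlines() if line.strip()]
    return lines[-1] if lines else "unknown"


def summarize(
    results: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[Counter, Counter]:
    """Count failures per project and per error category."""
    per_project: Counter = Counter()
    per_error: Counter = Counter()
    for project, error in results:
        if error is None:  # the run succeeded
            continue
        per_project[project] += 1
        per_error[categorize(error)] += 1
    return per_project, per_error


# Usage (made-up data):
# per_project, per_error = summarize([
#     ("project_a", None),
#     ("project_a", "bash: line 1: pytest: command not found"),
# ])
# print(per_error.most_common(10))
```

`Counter.most_common` then gives the list of the most frequent errors directly, which makes it easier to see whether a single root cause explains most failures.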

@monperrus

Contributor

ping @t-sorger for completion. thanks!

@t-sorger

Collaborator Author

After running the tests on all the bugs, I got the following results.

Is there a specific project I should prioritise to analyse why they fail or why the command not found errors occur? I remember using ansible and pysnooper to go through the setup process manually to understand the flow, so I may have installed some dependencies when the setup failed. I assume some more dependencies are missing for the other projects as well.

@monperrus

Contributor

You can debug with any project; hopefully the root cause and its fix will carry over to the others.

@t-sorger

t-sorger commented Mar 5, 2026

Collaborator Author

I added more extensive logging and ran it over the entire dataset again, so I can now investigate the log files.


3 participants