Wv scraper by Ash1R · Pull Request #496 · biglocalnews/warn-scraper

Ash1R · 2022-10-28T05:43:11Z

This is for issue #375, for West Virginia.
The data was in a large PDF on their workforce site. pdfplumber extracted the tables pretty well, although there were some irregularities in the PDF (for example, some tables had boxes listing the specific sites of the layoffs).
There are a couple of errors on their end (switching up values), but nothing too significant.
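
For readers unfamiliar with the approach, a minimal sketch of pulling table rows out of a PDF with pdfplumber might look like the following. The helper names `extract_rows` and `drop_blank_rows` are hypothetical and not taken from this PR. The sketch assumes pdfplumber's `open()` and `page.extract_tables()`, where each extracted cell is either a string or `None`.

```python
from typing import List, Optional

Row = List[Optional[str]]


def drop_blank_rows(rows: List[Row]) -> List[Row]:
    """Discard rows in which every cell is None, empty, or whitespace."""
    return [row for row in rows if any(cell and cell.strip() for cell in row)]


def extract_rows(pdf_path: str) -> List[Row]:
    """Collect every table row from every page of a PDF (hypothetical helper)."""
    import pdfplumber  # imported lazily so drop_blank_rows works without it

    rows: List[Row] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables; each table is a list of rows.
            for table in page.extract_tables():
                rows.extend(table)
    return drop_blank_rows(rows)
```

Irregular layouts such as the boxed site listings mentioned above would likely still need handling specific to this PDF after extraction.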

palewire · 2022-12-05T18:40:18Z

@Ash1R, I am still seeing the MI scraper being deleted from the repo, I think. Do you see the same thing on the files tab?

https://github.com/biglocalnews/warn-scraper/pull/496/files

palewire · 2023-01-29T13:20:22Z

+                companydone = False
+                row = []
+                for k in range(len(data)):
+                    if data[k][0] is not None:


Why is it necessary to use range in the loop here? Can you not simply do something more like for row in data?

Ash1R · 2023-02-10

Each company's data is contained in two consecutive rows, with some blank rows between these row pairs. Alternative company names and addresses are stored on the second row. I used range so that I can access the second row with an index of k + 1. I had also used range unnecessarily later on, so I removed that.
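
The row-pair layout described above can be sketched roughly as follows. The sample field names `company` and `alt_name_or_address` and the function name `pair_company_rows` are invented for illustration, and the actual scraper may be organized differently.

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]


def pair_company_rows(data: List[Row]) -> List[Dict[str, str]]:
    """Join each company row with the row that immediately follows it."""
    records: List[Dict[str, str]] = []
    k = 0
    while k < len(data):
        first = data[k]
        if not first or not first[0] or not first[0].strip():
            k += 1  # blank spacer row between pairs
            continue
        # Index access is what makes the look-ahead to row k + 1 possible.
        second = data[k + 1] if k + 1 < len(data) else []
        records.append(
            {
                "company": first[0].strip(),
                "alt_name_or_address": (second[0] or "").strip() if second else "",
            }
        )
        k += 2  # skip past the second row of the pair
    return records
```

A plain `for row in data` loop cannot see the next row without extra bookkeeping, which is the reason given for keeping the index here.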

stucka · 2023-08-21T00:39:24Z

Triggering tests by closing and reopening.

stucka · 2023-08-21T00:50:28Z

mypy is flagging some type errors:
warn/scrapers/wv.py:65: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:66: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:68: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:72: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:74: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:75: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/la.py:170: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs [annotation-unchecked]

stucka · 2023-08-21T00:58:52Z

@Ash1R, I think I see an easy way to work around the mypy type conflict and maybe make this a bit more readable, something like:

if not data[k][0]:
    rowkey = None
else:
    rowkey = data[k][0].strip()

Then start folding that change into the flagged rows, like if rowkey in header_whitelist: and keep working on down from there. The last bit might be more readable as elif ((not rowkey) and (k != 0)) ... ?
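
A self-contained sketch of that suggestion, under stated assumptions: `HEADER_WHITELIST` is a made-up stand-in for the scraper's `header_whitelist`, and the `classify_row` function and its return labels are hypothetical. What the real branches do is not shown in this thread.

```python
from typing import List, Optional

# Hypothetical stand-in; the real header_whitelist lives in the scraper.
HEADER_WHITELIST = ["Company", "Date"]


def classify_row(data: List[List[Optional[str]]], k: int) -> str:
    """Label row k using a rowkey that is normalized once, up front."""
    cell = data[k][0]
    # Checking a local name first means .strip() is only called on a real str,
    # which is what the union-attr errors were complaining about.
    rowkey: Optional[str]
    if not cell:
        rowkey = None
    else:
        rowkey = cell.strip()

    if rowkey in HEADER_WHITELIST:
        return "header"
    elif (not rowkey) and (k != 0):
        return "blank"
    return "data"
```

Normalizing the cell once also avoids repeating `data[k][0].strip()` on each of the flagged lines.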

stucka · 2023-09-22T17:23:39Z

The landing page may have been killed off. This is the closest I could find, and I can't guarantee it will be updated in the same way from notice to notice. https://workforcewv.org/about-us/

I have not tried seeing if this scraper works with that PDF.

Ash1R added 3 commits August 5, 2022 10:46

fixed issue biglocalnews#469 . First commit to this project

42a4e08

added a scraper for wv for issue biglocalnews#375

b83733b

Merged in main

3d8312d

palewire requested changes Nov 7, 2022

Comment thread warn/scrapers/mi.py Outdated

Comment thread warn/scrapers/mi.py Outdated

Comment thread warn/scrapers/mi.py Outdated

Comment thread warn/scrapers/mi.py

removing michigan scraper as per request

568ca53

Merged in main

6e814da

palewire requested changes Jan 4, 2023

Comment thread warn/scrapers/wv.py Outdated

Ash1R added 2 commits January 8, 2023 01:25

Merged in main

9faca27

removed some redundancy

17b8455

palewire requested changes Jan 29, 2023

remove an unncessary range()

de7ea50

stucka closed this Aug 21, 2023

stucka reopened this Aug 21, 2023

stucka added 3 commits August 20, 2023 21:05

Try to patch mypy errors

18182b5

Editing on Github may have been a bad idea

280cb56

Editing in Github was a terrible idea. Restoring back to Ashir's last…

920d488

… version

Ash1R wants to merge 11 commits into biglocalnews:main from Ash1R:wv-scraper
