Wv scraper #496
@Ash1R, I am still seeing MI being deleted from the repo, I think. Do you see the same thing on the files tab?
companydone = False
row = []
for k in range(len(data)):
    if data[k][0] is not None:
Why is it necessary to use range in the loop here? Can you not simply do something more like for row in data?
Each company's data is contained in two consecutive rows, with some blank rows in between these company row-pairs. Alternative company names and addresses are stored on the second row. I used range so I can access the second row using an index of k + 1. I did unnecessarily use range later, so I removed that.
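For illustration, the two-row layout described above could be walked as in the sketch below. This is not the PR's actual code: the function name pair_company_rows and the sample values are hypothetical, and it assumes a blank separator row is either empty or has None in its first cell.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def pair_company_rows(data: List[Row]) -> List[Tuple[Row, Row]]:
    """Group each company's primary row with the row directly below it."""
    pairs: List[Tuple[Row, Row]] = []
    k = 0
    while k < len(data):
        # Assumption: blank separator rows are empty or start with None.
        if not data[k] or data[k][0] is None:
            k += 1
            continue
        # The second row (alternative name and address) sits at index k + 1.
        second: Row = data[k + 1] if k + 1 < len(data) else []
        pairs.append((data[k], second))
        k += 2  # Skip the row just consumed as the second half of the pair.
    return pairs


# Hypothetical sample data, not taken from the West Virginia PDF.
sample: List[Row] = [
    ["Acme Co", "Charleston"],
    ["Acme Holdings", "123 Main St"],
    [None, None],
    ["Widget Inc", "Morgantown"],
    ["Widget LLC", "45 Oak Ave"],
]

for first, second in pair_company_rows(sample):
    print(first[0], "/", second[0])
```

Indexing is what makes the k + 1 lookup possible; a plain for row in data loop would need extra state to remember the previous row.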

Triggering tests by closing and reopening.

mypy is flagging some type errors:

@Ash1R, I think I see an easy way to work around the mypy type conflict and also make this a bit more readable, something like:
Then start folding those changes into the flagged rows, like

The landing page may have been killed off. This is the closest I could find, and I can't guarantee it will be updated in the same way from notice to notice: https://workforcewv.org/about-us/ I have not tried to see whether this scraper works with that PDF.

This is for issue #375, for West Virginia.
The data was in a large PDF on their workforce site. pdfplumber extracted the tables pretty well, although there were some irregularities in the PDF (for example, some tables had boxes listing the specific sites of the layoffs).
There are a couple of errors on their end (switching up values), but nothing too significant.
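
For context, table extraction with pdfplumber generally looks like the sketch below. It is not the scraper's actual code: the function names extract_rows and clean_row are hypothetical, and the whitespace cleanup is only an assumption about how multi-line cells might be normalized.

```python
from typing import List, Optional

Row = List[Optional[str]]


def clean_row(row: Row) -> Row:
    """Collapse newlines and repeated whitespace in each cell; keep None as-is."""
    return [None if cell is None else " ".join(cell.split()) for cell in row]


def extract_rows(pdf_path: str) -> List[Row]:
    """Collect the rows of every table on every page of the PDF."""
    # Third-party dependency, imported here so clean_row works without it.
    import pdfplumber

    rows: List[Row] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns one list of rows per table on the page.
            for table in page.extract_tables():
                rows.extend(clean_row(row) for row in table)
    return rows
```

Calling extract_rows on a downloaded copy of the PDF would give a flat list of rows, with empty cells left as None so later code can detect blank rows.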