Add ability to scrape user pages by xraymemory · Pull Request #12 · meetmangukiya/instagram-scraper

xraymemory · 2018-06-28T04:01:06Z

Refactored the code so that you can specify both tags and users to be scraped. Also fixed some off-by-one errors and added more function documentation.

meetmangukiya

This is a nice new addition, thank you! Left a detailed review, some are nitpicks here and there, please bear with me. 😅

meetmangukiya · 2018-06-28T17:08:08Z

@@ -1,3 +1,4 @@
+


umm... why?

meetmangukiya · 2018-06-28T17:24:15Z

+        :param short_circuit:
+            Whether or not to short_circuit total_count loop 
+
+    Yields url, captions, hashtags, and mentions for provided insta url


caption*

Move this to the top, in the docstring.
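A minimal sketch of the layout this comment asks for, with the summary sentence first and the parameter fields after it. The function name `describe_post` and the parameter `insta_url` are hypothetical, and the body is elided; only the docstring shape is the point.

```python
def describe_post(insta_url, existing=None, short_circuit=False):
    """
    Yield the URL, caption, hashtags, and mentions for the given Instagram URL.

    :param insta_url:
        Page to read posts from (hypothetical parameter name).
    :param existing:
        URLs to skip.
    :param short_circuit:
        Whether or not to short-circuit the total_count loop.
    """
    # Body elided in this sketch: it returns an empty iterator so the
    # example stays runnable without touching the network.
    return iter(())
```

With the summary on the first docstring line, `help(describe_post)` and most documentation tools show what the function does before listing its parameters.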

meetmangukiya · 2018-06-28T17:24:34Z

+        :param existing:
+            URLs to skip
+        :param short_circuit:
+            Whether or not to short_circuit total_count loop 


Dedent lines 26-33 by 4 spaces.

meetmangukiya · 2018-06-28T17:32:05Z

        Total number of images to be scraped.
+    :param existing:
+        URLs to skip
+    :param mode


add a colon after mode

meetmangukiya · 2018-06-28T17:34:47Z

+        List of users to be scraped
+    :param total_count:
+        total number of images to be scraped
+    :param should_continue


add colon after should_continue
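Missing colons like the two flagged above are easy to miss by eye. One possible way to catch them is a small regex check over the docstring text. This is only a sketch: the helper name `params_missing_colon` is made up for illustration and is not part of the PR.

```python
import re

# Matches a ":param name" field that ends the line without a closing colon.
_MISSING_COLON = re.compile(r'^[ \t]*:param[ \t]+(\w+)[ \t]*$', re.MULTILINE)


def params_missing_colon(docstring: str) -> list:
    """Return the parameter names whose ':param name' field lacks a colon."""
    return _MISSING_COLON.findall(docstring)


example = """
    :param existing:
        URLs to skip
    :param mode
    :param should_continue
"""
print(params_missing_colon(example))
```

For the example string, the helper reports `mode` and `should_continue` and leaves `existing` alone.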

meetmangukiya · 2018-06-28T17:52:16Z

-                    existing_links.add(row[1])
-                start = i + 1
-        _single_tag_processing(tag, total_count, existing_links, start)
+                    print(f'[{target}] downloaded {url} as {file_index}.jpg in data/{target}')


This becomes incorrect: the image is saved as f'{count}.jpg', and count is one less than file_index, so the message reports the wrong file name. Replace count with file_index; it is also the better variable name.
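One way to keep the saved file name and the log message from drifting apart is to build both from the same value. The helpers below (`image_path`, `download_message`) are hypothetical names used only for this sketch; the `data/{target}/{file_index}.jpg` layout follows the diff above.

```python
import os


def image_path(target: str, file_index: int, root: str = 'data') -> str:
    """Build the path an image is written to, e.g. data/<target>/<file_index>.jpg."""
    return os.path.join(root, target, f'{file_index}.jpg')


def download_message(target: str, url: str, file_index: int, root: str = 'data') -> str:
    """Build the log line from the same path, so it cannot disagree with the file."""
    path = image_path(target, file_index, root)
    name = os.path.basename(path)
    folder = os.path.dirname(path)
    return f'[{target}] downloaded {url} as {name} in {folder}'
```

Because both functions take `file_index`, there is no separate `count` that can be off by one relative to the message.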

meetmangukiya · 2018-06-28T17:54:16Z

                try:
                    req = requests.get(url)
-                    with open(f'data/{tag}/{count}.jpg', 'wb') as img:
+                    with open(f'data/{target}/{count}.jpg', 'wb') as img:


We want users to be able to distinguish user photos from tag photos: if I scrape @instagram, I might mistake the results for images scraped from the instagram tag. So, use mode-specific data directories. :)
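A possible shape for the mode-specific directories, as a sketch only. The helper name `target_directory` and the `data/<mode>/<target>` layout are assumptions; the mode names `'tags'` and `'users'` are taken from the `targets` dict in the diff.

```python
import os

# Mode names as used by the `targets` dict in the diff (assumed to be final).
VALID_MODES = ('tags', 'users')


def target_directory(mode: str, target: str, root: str = 'data') -> str:
    """Return a per-mode directory so a user and a tag with the same name do not collide."""
    if mode not in VALID_MODES:
        raise ValueError(f'unknown mode: {mode!r}')
    return os.path.join(root, mode, target)


def ensure_target_directory(mode: str, target: str, root: str = 'data') -> str:
    """Create the directory if needed and return its path."""
    path = target_directory(mode, target, root)
    os.makedirs(path, exist_ok=True)
    return path
```

With this layout, images from the user @instagram would land under `data/users/instagram` and images from the instagram tag under `data/tags/instagram`.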

meetmangukiya · 2018-06-28T17:54:37Z

+                    print(f'[{target}] downloaded {url} as {file_index}.jpg in data/{target}')
+
+    targets = {'tags': tags, 'users': users}
+    for mode,lists in targets.items():


Add a space after the comma.

meetmangukiya · 2018-06-28T17:55:40Z

+
+    Scrapes user and hashtag images from Instagram
+    """
+    def _single_input_processing(target: str, total_count: int, existing_links: set, start: int, mode: str='tag'):


Rename this, this is no longer single input processing.

meetmangukiya · 2018-06-28T17:56:48Z

+                    for i, row in enumerate(reader):
+                        existing_links.add(row[1])
+                    start = i + 1
+            _single_input_processing(target, total_count, existing_links, start, mode=mode)


Account for the rename here too.
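For context, the hunk above resumes a previous run by reading an existing CSV, collecting the URLs already downloaded, and setting `start` to the number of rows read. A hedged sketch of that step as a standalone helper follows. The name `load_progress` is hypothetical, and the assumption that the URL sits in the second column comes from `row[1]` in the diff. Counting from 1 with `enumerate` also avoids an unbound loop variable when the file is empty.

```python
import csv
import os


def load_progress(csv_path: str):
    """Return (existing_links, start) from a previous run's CSV, if there is one."""
    existing_links = set()
    start = 0  # stays 0 when the file is missing or empty
    if not os.path.exists(csv_path):
        return existing_links, start
    with open(csv_path, newline='') as csv_file:
        # Counting from 1 means `start` equals the number of rows read so far.
        for start, row in enumerate(csv.reader(csv_file), start=1):
            if len(row) > 1:
                existing_links.add(row[1])  # assumed: URL is the second column
    return existing_links, start
```

The returned pair can then be passed to whatever the renamed processing function ends up being called.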

xraymemory added 6 commits June 27, 2018 23:40

Add ability to scrape user pages

7607fa2

More meta info for functions

6a91d7b

Move count tick

f1c5d8a

Fix off by one

78d6e31

More off by one fun

c4e25ab

Remove gratuitous prints

ab2004e

meetmangukiya suggested changes Jun 28, 2018

xraymemory wants to merge 6 commits into meetmangukiya:master from xraymemory:patch-1

2 participants