 * Can load the first 1k of images found in the page to determine dimensions if the attributes are not included
+* Loads as little of the image as possible when fetching image info. Stops downloading once dimensions are found or a settable limit is reached.

 * The original rules from the JS version are still implemented. (see options)

-Webpage Resolver options
-------------------------
+ImageResolver() METHODS
+-----------------------
+
+**__init__** *(\*\*kwargs)*
+
+Keyword options
+
+* *max_read_size* - maximum number of bytes to read when looking for the width and height of an image. Default `10240`
+* *chunk_size* - size in bytes of each chunk read. Default `1024`
+* *read_all* - set to read the entire image and then detect its info. Overrides *max_read_size*. Default `False`
+* *debug* - set to enable debugging output (logger="ImageResolver"). Default `False`
+
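The interaction of *max_read_size*, *chunk_size* and *read_all* can be illustrated with a minimal, self-contained sketch. It does not use the library's code: `png_size` and `read_dimensions` are hypothetical helpers, only PNG headers are parsed, and an in-memory stream stands in for an HTTP response.

```python
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) if data holds a complete PNG IHDR header, else None."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    # Width and height are two big-endian 32-bit integers at bytes 16..24.
    return struct.unpack(">II", data[16:24])


def read_dimensions(stream, chunk_size=1024, max_read_size=10240, read_all=False):
    """Hypothetical helper: read in chunks, stopping early once dimensions are known.

    Returns a tuple of (dimensions or None, number of bytes read).
    """
    buf = b""
    while read_all or len(buf) < max_read_size:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        buf += chunk
        if not read_all:
            size = png_size(buf)
            if size:
                return size, len(buf)  # stop as soon as the header is parsed
    return png_size(buf), len(buf)


# A fake PNG: signature, IHDR length, chunk type, width, height, then filler bytes.
fake_png = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">II", 640, 480)
    + b"\x00" * 5000
)

early = read_dimensions(io.BytesIO(fake_png), chunk_size=16)
everything = read_dimensions(io.BytesIO(fake_png), read_all=True)
not_an_image = read_dimensions(io.BytesIO(b"x" * 20000))
```

With a 16-byte chunk size the sketch stops after two chunks, while `read_all=True` consumes the whole stream before parsing, and non-image data is abandoned once the read limit is reached.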
+**fetch** *(string url)*
+
+Fetches a URL and returns the response data.
+
+**fetch_image_info** *(string url)*
+
+Fetches an image url and examines the resulting image. Returns a tuple consisting of the detected file extension, the width and the height of the image.
+
+**register** *(instance filter)*
+
+Register a filter to examine an image with. The filter argument must be an instance of a class that has a `resolve()` method. `resolve()` must accept a string URL and must return a URL or `None`.
+
+**resolve** *(string url)*
+
+Loops through each registered filter until one of them resolves a URL. If no URL is found, returns `None`.
+
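The register/resolve contract described above can be sketched without the library itself. Both classes below are hypothetical stand-ins written for illustration: `ExtensionFilter` is a toy filter with a `resolve()` method, and `FilterChain` mimics the loop over registered filters.

```python
import os
from urllib.parse import urlparse


class ExtensionFilter:
    """Hypothetical filter: returns the URL when its path ends in an image extension."""

    IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}

    def resolve(self, url):
        # Look only at the path so query strings do not hide the extension.
        ext = os.path.splitext(urlparse(url).path)[1].lower()
        return url if ext in self.IMAGE_EXTENSIONS else None


class FilterChain:
    """Hypothetical stand-in for the resolver: tries each filter in registration order."""

    def __init__(self):
        self._filters = []

    def register(self, filter_instance):
        if not callable(getattr(filter_instance, "resolve", None)):
            raise TypeError("filter must provide a resolve() method")
        self._filters.append(filter_instance)

    def resolve(self, url):
        for registered in self._filters:
            result = registered.resolve(url)
            if result is not None:
                return result  # first filter to resolve a URL wins
        return None


chain = FilterChain()
chain.register(ExtensionFilter())

image_url = chain.resolve("http://example.com/a/photo.PNG?x=1")
page_url = chain.resolve("http://example.com/page.html")
```

Returning `None` from a filter simply hands the URL to the next registered filter, so cheap checks are best registered first.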

-Options to pass to the webpage resolver. Default values are shown::
-
-# set to true to load the first 1k of images whose size is not set in HTML
-load_images=False
+FileExtensionResolver() METHODS
+-------------------------------
+
+**resolve** *(string url)*
+
+Returns the url if the extension matches a possible image
+
+ImgurPageResolver() METHODS
+---------------------------

-# set to true to use the original rules from the Javascript version
-use_js_ruleset=False
+**resolve** *(string url)*

-# set to false to disable adblock filters
-use_adblock_filters=True
+Returns an Imgur image url if `url` matches the pattern of an Imgur page

-# set to a BeautifulSoup compatable parser (lxml is recommended)
-parser='html.parser'
+WebpageResolver() METHODS
+-------------------------
+
+The workhorse of this module. Our uses revolve mostly around this filter, so it is the
+most feature-complete and tested.

-# set to a file containing AdBlockPlus style filters that will lower an
-# image's score
-blacklist='blacklist.txt'
+**__init__** *(\*\*kwargs)*

-# set to a file containing AdBlockPlus style filters that will raise an
-# image's score
-whitelist='whitelist.txt'
+Initialize the class with options:
+
+* *load_image* - set to true to load the first 1k of images whose size is not set in HTML. Default `False`
+* *use_js_ruleset* - set to true to use the original rules from the Javascript version. Default `False`
+* *use_adblock_filters* - set to false to disable adblock filters. Default `True`
+* *parser* - set to a BeautifulSoup compatible parser (lxml is recommended). Default `html.parser`
+* *blacklist* - set to a file containing AdBlockPlus style filters used to lower an image's score. Default `blacklist.txt`
+* *whitelist* - set to a file containing AdBlockPlus style filters used to raise an image's score. Default `whitelist.txt`
+* *significant_surface* - Amount of surface (width x height) of the image required to add additional scoring
+* *skip_fetch_errors* - Skip exceptions raised by fetch_image_info(). Exceptions are logged and the image will be skipped. Default `True`

 The default parser for BeautifulSoup is html.parser which is built-in to python. We *highly* recommend you install lxml and pass parser="lxml"
 to WebpageResolver(). In our testing we found that it was much faster and more accurate.

-Currently Implemented Resolvers
--------------------------------
+LOGGING
+-------
+
+Use the name "ImageResolver" to configure a logger. Skipped exceptions are logged to this logger at error level, along with debugging output when it is enabled.
+
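One way to configure the logger named above uses only the standard `logging` module. The handler and format string here are illustrative choices, not something the README prescribes.

```python
import logging

# "ImageResolver" is the logger name given in the README text above.
logger = logging.getLogger("ImageResolver")
logger.setLevel(logging.ERROR)  # use logging.DEBUG to also see debugging output

# Illustrative handler and format; any handler can be attached instead.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)
```

At `ERROR` level only the skipped-exception messages would be shown; lowering the level to `DEBUG` is the standard way to let debug records through as well.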
+EXCEPTIONS
+----------
+
+**ImageInfoException**

-* FileExtensionResolver()
+Raised if the image could not be read, or if its type, width or height comes back undefined.
+By default this exception is logged and skipped; pass "skip_fetch_errors=False" to WebpageResolver to have it raised instead.

-* ImgurPageResolver()
+**HTTPException**

-* WebpageResolver()
+Raised if the image could not be loaded from the URL.
+By default this exception is logged and skipped; pass "skip_fetch_errors=False" to WebpageResolver to have it raised instead.
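The skip-or-raise behaviour described for `skip_fetch_errors` might look roughly like the sketch below. Everything in it is a stand-in written for illustration: the local `ImageInfoException` class, `collect_image_info` and `fake_fetch` are hypothetical and are not taken from the library.

```python
import logging

logger = logging.getLogger("ImageResolver")


class ImageInfoException(Exception):
    """Local stand-in for the exception named above, not the library's own class."""


def collect_image_info(urls, fetch_image_info, skip_fetch_errors=True):
    """Hypothetical helper: gather image info, logging and skipping failures by default."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch_image_info(url)
        except ImageInfoException as exc:
            if not skip_fetch_errors:
                raise  # surface the error to the caller
            logger.error("skipping %s: %s", url, exc)
    return results


def fake_fetch(url):
    """Pretend fetcher that fails for one URL so both paths can be exercised."""
    if url.endswith("broken.png"):
        raise ImageInfoException("could not read image")
    return ("png", 640, 480)


skipped = collect_image_info(["a.png", "broken.png"], fake_fetch)
```

Passing `skip_fetch_errors=False` to the same helper lets the exception propagate, which is useful while debugging a page whose images all fail.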
 Need to implement better caching. Future plan is to include a configurable cache method so images seen across sessions can be cached for better performance


 AUTHOR
@@ -112,7 +165,7 @@ Probably. Send us an email or a patch if you find one
 COPYRIGHT / ACKNOWLEDGEMENTS
 ----------------------------

-(c) 2013 National Write Your Congressman
+(c) 2014 Constituent Voice, LLC.

 Original idea and basic setup came from Maurice Svay https://github.com/mauricesvay/ImageResolver