Commit 2c34416

copied the readme into docs
1 parent 68c2e00 commit 2c34416

3 files changed

Lines changed: 102 additions & 45 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -16,3 +16,7 @@ venv/
 
 #build files
 build/
+
+#python dist files
+dist/
+MANIFEST

MANIFEST.in

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 include *.txt
-include README.md
+include README.rst
 recursive-include docs *.txt
 recursive-include imageresolver/data *

docs/README.txt

Lines changed: 97 additions & 44 deletions
@@ -5,6 +5,25 @@ ImageResolver
 A python clone of ImageResolver for finding significant images in HTML content
 See the excellent JS version at: https://github.com/mauricesvay/ImageResolver
 
+USAGE
+-----
+
+::
+
+    import imageresolver
+    import sys
+
+    try:
+        i = imageresolver.ImageResolver()
+        i.register(imageresolver.FileExtensionResolver())
+        i.register(imageresolver.ImgurPageResolver())
+        i.register(imageresolver.WebpageResolver(load_images=True, parser='lxml',blacklist='easylist.txt'))
+        url = sys.argv[1]
+
+        print i.resolve(url)
+    except:
+        print "An error occurred"
+
 Differences From the Javascript Version
 ---------------------------------------
 
@@ -25,50 +44,101 @@ WebpageResolver Additions
 
 * blacklist image sources and whitelist
 
-* Can load the first 1k of images found in the page to determine dimensions if the attributes are not includes
+* Loads as little of the image as possible when fetching for image info. Stops downloading if dimensions are found or a settable limit is reached.
 
 * The original rules from the JS version are still implemented. (see options)
 
-Webpage Resolver options
-------------------------
+ImageResolver() METHODS
+-----------------------
+
+**__init__** *(\*\*kwargs)*
+
+Keyword options
+
+* *max_read_size* - set to the maximum number of bytes to read to find the width and height of an image. Default `10240`
+* *chunk_size* - set to the chunk size to read. Default `1024`
+* *read_all* - set to read the entire image and then detect its info. This option overrides max_read_size. Default `False`
+* *debug* - set to enable debugging output (logger="ImageResolver"). Default `False`
+
+**fetch** *(string url)*
+
+Fetches a URL and returns the response data.
+
+**fetch_image_info** *(string url)*
+
+Fetches an image url and examines the resulting image. Returns a tuple consisting of the detected file extension, the width and the height of the image.
+
+**register** *(instance filter)*
+
+Register a filter to examine an image with. The filter argument must be an instance of a class that has a `resolve()` method. `resolve()` must accept a string URL and must return a url or `None`
+
+**resolve** *(string url)*
+
+Loop through each registered filter until a url is resolved by one of them. If no url is found, returns `None`
+
 
-Options to pass to the webpage resolver. Default values are shown::
-
-    # set to true to load the first 1k of images whose size is not set in HTML
-    load_images=False
+FileExtensionResolver() METHODS
+-------------------------------
+
+**resolve** *(string url)*
+
+Returns the url if the extension matches a possible image
+
+ImgurPageResolver() METHODS
+---------------------------
 
-    # set to true to use the original rules from the Javascript version
-    use_js_ruleset=False
+**resolve** *(string url)*
 
-    # set to false to disable adblock filters
-    use_adblock_filters=True
+Returns an Imgur image url if `url` matches the pattern of an Imgur page
 
-    # set to a BeautifulSoup compatable parser (lxml is recommended)
-    parser='html.parser'
+WebpageResolver() METHODS
+-------------------------
+
+The work-horse of this module. Our uses revolve mostly around this filter and thus it is the
+most feature complete and tested.
 
-    # set to a file containing AdBlockPlus style filters that will lower an
-    # image's score
-    blacklist='blacklist.txt'
+**__init__** *(\*\*kwargs)*
 
-    # set to a file containing AdBlockPlus style filters that will raise an
-    # image's score
-    whitelist='whitelist.txt'
+Initialize the class with options:
+
+* *load_images* - set to true to load the first 1k of images whose size is not set in HTML. Default `False`
+* *use_js_ruleset* - set to true to use the original rules from the Javascript version. Default `False`
+* *use_adblock_filters* - set to false to disable adblock filters. Default `True`
+* *parser* - set to a BeautifulSoup compatible parser (lxml is recommended). Default `html.parser`
+* *blacklist* - set to a file containing AdBlockPlus style filters used to lower an image's score. Default `blacklist.txt`
+* *whitelist* - set to a file containing AdBlockPlus style filters used to raise an image's score. Default `whitelist.txt`
+* *significant_surface* - Amount of surface (width x height) of the image required to add additional scoring
+* *boost_jpeg* - add (int) boost score to JPEG files. Default `1`
+* *boost_gif* - add (int) boost score to GIF files. Default `0`
+* *boost_png* - add (int) boost score to PNG files. Default `0`
+* *skip_fetch_errors* - Skip exceptions raised by fetch_image_info(). Exceptions are logged and the image will be skipped. Default `True`
 
 The default parser for BeautifulSoup is html.parser which is built-in to python. We *highly* recommend you install lxml and pass parser="lxml"
 to WebpageResolver(). In our testing we found that it was much faster and more accurate.
 
-Currently Implemented Resolvers
--------------------------------
+LOGGING
+-------
+
+Use the name "ImageResolver" to configure a logger. Skipped exceptions are logged to this logger's error output, as is debugging output when enabled.
+
+EXCEPTIONS
+----------
+
+**ImageInfoException**
 
-* FileExtensionResolver()
+Raised if the image could not be read, or if its type, width or height could not be determined.
+By default this exception is logged and skipped; pass "skip_fetch_errors=False" to WebpageResolver to have it raised instead
 
-* ImgurPageResolver()
+**HTTPException**
 
-* WebpageResolver()
+Raised if the image could not be loaded from the URL.
+By default this exception is logged and skipped; pass "skip_fetch_errors=False" to WebpageResolver to have it raised instead
 
-To Be Implemented
+TODO
 -----------------
 
+Still missing the following resolvers:
+
 * ImgurAlbumResolver()
 
 * FlickrResolver()
@@ -79,24 +149,7 @@ To Be Implemented
 
 I have no plans to implement a 9gag resolver.
 
-USAGE
------
-
-::
-
-    import imageresolver
-    import sys
-
-    try:
-        i = imageresolver.ImageResolver()
-        i.register(imageresolver.FileExtensionResolver())
-        i.register(imageresolver.ImgurPageResolver())
-        i.register(imageresolver.WebpageResolver(load_images=True, parser='lxml',blacklist='easylist.txt'))
-        url = sys.argv[1]
-
-        print i.resolve(url)
-    except:
-        print "An error occured"
+Need to implement better caching. Future plan is to include a configurable cache method so images seen across sessions can be cached for better performance
 
 
 AUTHOR
@@ -112,7 +165,7 @@ Probably. Send us an email or a patch if you find one
 COPYRIGHT / ACKNOWLEDGEMENTS
 ----------------------------
 
-(c) 2013 National Write Your Congressman
+(c) 2014 Constituent Voice, LLC.
 
 Original idea and basic setup came from Maurice Svay https://github.com/mauricesvay/ImageResolver
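
The README text in this commit describes a small filter contract: `register()` takes any object with a `resolve()` method, and `resolve()` tries each registered filter in turn until one returns a URL, otherwise returning `None`. The sketch below illustrates that contract in Python 3 without importing the library. `ResolverChain`, `SuffixResolver` and `PrefixRewriteResolver` are hypothetical names invented for this illustration, not classes from `imageresolver`, and the debug logging is only an assumption about how the "ImageResolver" logger name might be used.

```python
import logging

# The README names "ImageResolver" as the logger to configure.
logger = logging.getLogger("ImageResolver")


class SuffixResolver:
    """Hypothetical filter: returns the URL if its path ends in an image suffix."""

    SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")

    def resolve(self, url):
        # Ignore any query string and compare case-insensitively.
        path = url.split("?", 1)[0].lower()
        return url if path.endswith(self.SUFFIXES) else None


class PrefixRewriteResolver:
    """Hypothetical filter: rewrites page URLs under one prefix to image URLs."""

    def __init__(self, page_prefix, image_prefix):
        self.page_prefix = page_prefix
        self.image_prefix = image_prefix

    def resolve(self, url):
        if url.startswith(self.page_prefix):
            return self.image_prefix + url[len(self.page_prefix):] + ".jpg"
        return None


class ResolverChain:
    """Stand-in for the documented register()/resolve() behaviour.

    This is an illustration of the contract, not the library's implementation.
    """

    def __init__(self):
        self._filters = []

    def register(self, flt):
        # The README requires an instance with a resolve() method.
        if not callable(getattr(flt, "resolve", None)):
            raise TypeError("filter must provide a resolve(url) method")
        self._filters.append(flt)

    def resolve(self, url):
        # Try filters in registration order; the first non-None result wins.
        for flt in self._filters:
            result = flt.resolve(url)
            logger.debug("%s -> %r", type(flt).__name__, result)
            if result is not None:
                return result
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    chain = ResolverChain()
    chain.register(SuffixResolver())
    chain.register(PrefixRewriteResolver("http://pages.example/", "http://img.example/"))
    print(chain.resolve("http://pages.example/abc"))
```

Registration order matters in this sketch: cheap checks such as the suffix test run first, which mirrors the order used in the README's usage example (file extension, then Imgur page, then webpage).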
