Bases: object
The main class. Performs the following functions:
- Parses command-line options
- Optionally just watches a directory for new PDFs to OCR; once a file appears, it does the next step
- Files the OCR’ed file in the proper place, if specified
- Files the original file, if specified
Helper function to delete files.
Parameters: | files (list) – List of files to delete |
---|---|
Returns: | None |
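A minimal sketch of such a helper, assuming it simply calls os.remove() on each path. The name remove_files is hypothetical, and skipping files that are already gone is an assumption rather than documented behaviour.

```python
import os

def remove_files(files):
    """Hypothetical helper: delete every path in ``files``; returns None."""
    for path in files:
        try:
            os.remove(path)
        except FileNotFoundError:
            # Assumption: a file that is already gone is not an error here.
            pass
```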
Helper function to run the conversion, then do the optional filing, and optional emailing.
Read in the YAML config file.
Parameters: | config_file (file) – Configuration file (YAML format) |
---|---|
Returns: | dict of the parsed YAML file |
Return type: | dict |
Instantiate the proper PyFiler object (either pypdfocr.pypdfocr_filer_dirs.PyFilerDirs or pypdfocr.pypdfocr_filer_evernote.PyFilerEvernote).
TODO: Make this more generic to allow third-party plugin filing objects
Returns: | Nothing |
---|---|
Move the converted file to its destination directory. Optionally also moves the original PDF.
Returns: | Target folder name |
---|---|
Return type: | string |
Parse the command-line options and set the corresponding object properties.
Parameters: | argv – usually just sys.argv[1:] |
---|---|
Returns: | Nothing |
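For illustration, an option parser of this kind can be built with the standard argparse module. The flag names below (-w, -c, -f), their destinations, and the function name get_options are hypothetical and may not match PyPDFOCR's real command line.

```python
import argparse

def get_options(argv):
    """Hypothetical parser: either one PDF to convert or a directory to watch."""
    p = argparse.ArgumentParser(description="OCR scanned PDFs")
    p.add_argument("pdf_filename", nargs="?", help="Scanned PDF to convert")
    p.add_argument("-w", "--watch", metavar="DIR", dest="watch_dir",
                   help="Watch DIR for new PDFs instead of converting one file")
    p.add_argument("-c", "--config", dest="config_file",
                   help="YAML configuration file")
    p.add_argument("-f", "--file", dest="enable_filing", action="store_true",
                   help="File the OCR'ed PDF according to the config file")
    args = p.parse_args(argv)
    # Need something to do: a single file or a watch directory.
    if args.pdf_filename is None and args.watch_dir is None:
        p.error("give a PDF filename or --watch DIR")  # exits with status 2
    return args
```

Typical use would be `get_options(sys.argv[1:])`, matching the argv convention in the table above.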
The main entry point into PyPDFOCR.
Parameters: | pdf_filename (string) – Scanned PDF |
---|---|
Returns: | OCR’ed PDF |
Return type: | filename string |
Wrap ghostscript calls. Yes, this is ugly.
Bases: object
Class to wrap all the ghostscript calls
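As a rough idea of what such a wrapper does, the sketch below builds a Ghostscript command line that rasterizes each PDF page to a TIFF image for OCR. The function names, the tiffg4 device, and the 300 dpi default are assumptions chosen for the example; the real wrapper's arguments may differ.

```python
import subprocess

def gs_to_tiff_cmd(pdf_filename, output_pattern, dpi=300, gs_binary="gs"):
    """Hypothetical helper: build a Ghostscript PDF-to-TIFF command."""
    return [
        gs_binary,
        "-q",                              # suppress startup messages
        "-dNOPAUSE",                       # do not pause between pages
        "-dBATCH",                         # exit when the input is processed
        "-sDEVICE=tiffg4",                 # 1-bit CCITT Group 4 TIFF output
        f"-r{dpi}",                        # rendering resolution in dpi
        f"-sOutputFile={output_pattern}",  # %d is replaced by the page number
        pdf_filename,
    ]

def run_gs(pdf_filename, output_pattern, dpi=300):
    # Raises CalledProcessError if Ghostscript exits with a non-zero status.
    subprocess.run(gs_to_tiff_cmd(pdf_filename, output_pattern, dpi), check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems with file names that contain spaces.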
Wrap pdf generation and text addition code
Bases: object
Class to create pdfs from images
Take two page objects, rotate the text page if necessary, and return the merged page
Draw an invisible text layer for OCR data.
This function really needs to get cleaned up
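A common way to build an invisible OCR layer is PDF text rendering mode 3 (neither fill nor stroke), combined with converting hOCR pixel boxes, whose origin is the top-left corner, into PDF points, whose origin is the bottom-left corner at 72 points per inch. The sketch below emits raw content-stream operators for one word. The function names and the /F1 font resource are hypothetical, using the box height as the font size is a simplification, and the real code may use a PDF library instead of writing operators by hand.

```python
def hocr_bbox_to_pdf(bbox, page_height_px, dpi):
    """Map an hOCR bbox (x0, y0, x1, y1 in pixels, origin top-left)
    to (left, bottom, height) in PDF points (origin bottom-left)."""
    x0, y0, x1, y1 = bbox
    scale = 72.0 / dpi                      # PDF user space: 72 points per inch
    left = x0 * scale
    bottom = (page_height_px - y1) * scale  # flip the y axis
    height = (y1 - y0) * scale
    return left, bottom, height

def invisible_word_ops(word, bbox, page_height_px, dpi, font="/F1"):
    """Content-stream operators that place ``word`` invisibly over its box.
    Assumes ``font`` is defined in the page's /Resources dictionary."""
    left, bottom, height = hocr_bbox_to_pdf(bbox, page_height_px, dpi)
    # Escape characters that are special inside PDF literal strings.
    escaped = word.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        "BT\n"
        f"{font} {height:.2f} Tf\n"              # font and size
        "3 Tr\n"                                 # render mode 3: invisible text
        f"1 0 0 1 {left:.2f} {bottom:.2f} Tm\n"  # move to the word's position
        f"({escaped}) Tj\n"
        "ET\n"
    )
```

Because the text is invisible but still present, viewers can search and select it while the scanned image underneath stays unchanged.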
alist.sort(key=natural_keys) sorts in human (natural) order; see http://nedbatchelder.com/blog/200712/human_sorting.html (Toothy’s implementation in the comments).
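A minimal natural-sort key in the spirit of the implementation referenced above: split the string into digit and non-digit runs and compare the digit runs as integers. This is a sketch, not necessarily the exact code PyPDFOCR ships.

```python
import re

def natural_keys(text):
    """Sort key that orders embedded numbers numerically ("page2" < "page10")."""
    # re.split with a capturing group keeps the digit runs: even indices hold
    # non-digit text (possibly ""), odd indices hold the captured digit runs.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

pages = ["page10.tiff", "page2.tiff", "page1.tiff"]
pages.sort(key=natural_keys)
print(pages)  # ['page1.tiff', 'page2.tiff', 'page10.tiff']
```

Since strings always sit at even positions and integers at odd ones, two keys never compare a str against an int.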
Provides the capability to search PDFs for keywords and file them to a specific folder based on those keywords.
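Assuming the PDF text has already been extracted, keyword matching can be as simple as a case-insensitive substring search over a folder-to-keywords mapping. The function name find_matching_folder and the first-match-wins rule are assumptions for illustration.

```python
def find_matching_folder(text, folder_targets, default_folder):
    """Hypothetical matcher: return the first folder whose keywords occur in text.

    folder_targets maps folder name -> list of keywords; dicts preserve
    insertion order, so earlier folders take priority.
    """
    lowered = text.lower()
    for folder, keywords in folder_targets.items():
        if any(keyword.lower() in lowered for keyword in keywords):
            return folder
    return default_folder
```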
Run Tesseract to generate an hOCR file.
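A sketch of the Tesseract call, assuming the tesseract CLI is on the PATH. tesseract takes an image, an output base name, options such as -l, and then the hocr config name; it writes the hOCR result next to the output base (as .hocr in recent versions, .html in some older ones). The function names and the eng default are hypothetical.

```python
import os
import subprocess

def tesseract_hocr_cmd(image_filename, tesseract_binary="tesseract", lang="eng"):
    """Hypothetical helper: build the command that OCRs one page image to hOCR."""
    base, _ext = os.path.splitext(image_filename)  # output base: image name without extension
    return [tesseract_binary, image_filename, base, "-l", lang, "hocr"]

def run_tesseract(image_filename, lang="eng"):
    # Raises CalledProcessError if tesseract exits with a non-zero status.
    subprocess.run(tesseract_hocr_cmd(image_filename, lang=lang), check=True)
```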
Bases: watchdog.events.FileSystemEventHandler
Watch a folder for new pdf files.
If a new-file event arrives, add the file to the queue with a timestamp. If a file-modified event arrives, update its timestamp in the queue. Every few seconds, pop an entry off the queue: if its timestamp is older than 3 seconds, process the file; otherwise, push it back onto the queue.
Called by the file-watching API on any file creation or modification. For any file ending with “.pdf” but not “_ocr.pdf”, it adds new files to the event queue with the current timestamp, or updates existing files in the queue with the current timestamp. This queue tracks each file's last “touched” time, so processing can start once check_queue() finds a file that hasn’t been touched in a while.
If the file does not exist in the events dict:
- Add it with the current time
Otherwise:
- If the file time is marked as -1, delete it from the dict
- Else, update the time in the dict to the current time
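The bookkeeping above fits in a few lines. This is a hypothetical stand-in class that keeps the events dict without depending on watchdog; in the real handler the path would come from the watchdog event's src_path, and the class and method names here are illustrative only.

```python
import time

class EventQueue:
    """Hypothetical stand-in for the watcher's events dict (path -> timestamp)."""

    def __init__(self):
        self.events = {}

    def on_event(self, path, now=None):
        # Only plain PDFs; our own "_ocr.pdf" outputs are ignored.
        if not path.endswith(".pdf") or path.endswith("_ocr.pdf"):
            return
        now = time.time() if now is None else now
        if path not in self.events:
            self.events[path] = now      # new file: start tracking it
        elif self.events[path] == -1:
            del self.events[path]        # already handed off: purge it
        else:
            self.events[path] = now      # touched again: reset its clock
```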
This function is called at regular intervals by start().
Iterate through the events, and if any file was last touched more than scan_interval seconds ago, return it and set its timestamp to -1 for purging later.
Returns: | Filename if available to process, otherwise None. |
---|
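A sketch of the polling step, assuming the same events dict mapping path to last-touched time. Skipping entries already marked -1 is an assumption added so that a returned file is not handed out twice; the injectable now argument exists only to make the sketch testable.

```python
import time

def check_queue(events, scan_interval, now=None):
    """Return one file untouched for more than scan_interval seconds, else None."""
    now = time.time() if now is None else now
    for path, stamp in events.items():
        if stamp != -1 and now - stamp > scan_interval:
            events[path] = -1  # mark for purging; updating a value is safe mid-iteration
            return path
    return None
```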
Wrap ImageMagick calls. Yes, this is ugly.
Bases: object
Abstract base class for defining filing objects, whether you want to save to a file-system/directory structure or to something like Evernote
Move the original file given by original_filename to the proper location. You will need to use original_move_target.
Parameters: | original_filename (string) – File to move |
---|---|
Returns: | Full path+filename of destination (original_filename if not moved) |
Return type: | string |
Data structure for mapping a keyword to a folder target. Usually just a dict; new mappings are added via add_folder_target().
Move the file given by filename to the proper location. You will need to use target_folder and folder_targets to figure out the proper destination. If there is no matching location, use default_folder.
Parameters: | filename (string) – File to move |
---|---|
Returns: | Full path+filename of destination |
Return type: | string |
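A sketch of a directory-based move along these lines, assuming keyword matching has already produced a folder name (or None when nothing matched). The function name file_document and the choice to create missing directories are assumptions; name collisions at the destination are not handled.

```python
import os
import shutil

def file_document(filename, matched_folder, target_folder, default_folder):
    """Hypothetical filer: move filename and return its full destination path."""
    if matched_folder is not None:
        dest_dir = os.path.join(target_folder, matched_folder)
    else:
        dest_dir = default_folder  # no keyword matched
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(filename))
    shutil.move(filename, dest)
    return dest
```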
Bases: pypdfocr.pypdfocr_filer.PyFiler
Bases: pypdfocr.pypdfocr_filer.PyFiler
Establish a connection to Evernote and authenticate.
Parameters: | dictUserInfo – Dict of user info such as user/password. For now, just the dev token |
---|---|
Returns: | success – whether the connection succeeded |
Return type: | bool |
Override this to make sure we only have the basename
Just file it to the local file system (don’t upload to evernote)
Use the Evernote API to create a new note.