With 80,000 images, the headache begins. For convenience's sake (and my own clarity, lol), here is the workflow below. Pardon the extra step of first extracting just the links #regret - it is possible to download the images directly with Scrapy (via its built-in ImagesPipeline).
1) Extract links from Wikipedia. This saves all the links into ‘items.csv’ as a tab-delimited CSV, using my spider, ‘myspider.py’.
scrapy runspider myspider.py -o items.csv -t csv
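The actual myspider.py isn't reproduced here, but a bare-bones link-extracting spider could look something like the sketch below. The start URL, CSS selector, and the image_url field name are placeholders, not what the real spider uses.

```python
# Minimal sketch of a link-extracting spider.
# Placeholder start URL and selector - the real myspider.py differs.
import scrapy

class ImageLinkSpider(scrapy.Spider):
    name = "myspider"
    # Placeholder category page - swap in the Wikipedia pages actually crawled
    start_urls = ["https://en.wikipedia.org/wiki/Category:Paintings"]

    def parse(self, response):
        # Yield one item per image link found on the page
        for src in response.css("img::attr(src)").getall():
            yield {"image_url": response.urljoin(src)}
```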
2) Download the pictures from the links, crop and resize them (50x50), and upload to AWS
python3 uploading.py
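The gist of uploading.py is roughly the sketch below: read the links out of items.csv, fetch each image, centre-crop it to a square, resize to 50x50, and push the result to S3. The bucket name, key prefix, and the image_url column are placeholders rather than the script's actual values.

```python
# Rough sketch of the download/crop/resize/upload step.
# Bucket name, key prefix, and CSV column name are placeholders.
import csv
import io

import boto3
import requests
from PIL import Image

BUCKET = "my-image-bucket-50"  # placeholder bucket name
s3 = boto3.client("s3")

with open("items.csv", newline="") as f:
    # items.csv is tab-delimited, as described in step 1
    for i, row in enumerate(csv.DictReader(f, delimiter="\t")):
        url = row["image_url"]  # assumed column name
        resp = requests.get(url, timeout=10)
        img = Image.open(io.BytesIO(resp.content)).convert("RGB")

        # Centre-crop to a square, then resize to 50x50
        side = min(img.size)
        left = (img.width - side) // 2
        top = (img.height - side) // 2
        img = img.crop((left, top, left + side, top + side)).resize((50, 50))

        buf = io.BytesIO()
        img.save(buf, format="JPEG")
        buf.seek(0)
        s3.upload_fileobj(buf, BUCKET, f"images_50/{i}.jpg")
```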
3) Resize the pictures (28x28) and resave them to another bucket in AWS
python3 resize_28_upload.py
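resize_28_upload.py boils down to something like this: list the 50x50 objects, shrink each one to 28x28, and write it into a second bucket. Bucket names and key prefixes are again placeholders.

```python
# Sketch of shrinking the 50x50 images to 28x28 and writing them to a second
# bucket. Bucket names and key prefixes are placeholders.
import io

import boto3
from PIL import Image

s3 = boto3.client("s3")
SRC_BUCKET = "my-image-bucket-50"  # placeholder
DST_BUCKET = "my-image-bucket-28"  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC_BUCKET, Prefix="images_50/"):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=SRC_BUCKET, Key=obj["Key"])["Body"].read()
        img = Image.open(io.BytesIO(body)).resize((28, 28))

        buf = io.BytesIO()
        img.save(buf, format="JPEG")
        buf.seek(0)
        key = obj["Key"].replace("images_50/", "images_28/")
        s3.upload_fileobj(buf, DST_BUCKET, key)
```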
4) Save the pictures into an array, pickle it, and store it in AWS
python3 pickle_img_array.py
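pickle_img_array.py collects the 28x28 images into a single numpy array and pickles it back to S3, roughly along these lines. The grayscale conversion, bucket name, and output key are assumptions.

```python
# Sketch of stacking the 28x28 images into one numpy array and pickling it
# to S3. Grayscale conversion, bucket name, and output key are assumptions.
import io
import pickle

import boto3
import numpy as np
from PIL import Image

s3 = boto3.client("s3")
BUCKET = "my-image-bucket-28"  # placeholder

images = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix="images_28/"):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        images.append(np.asarray(Image.open(io.BytesIO(body)).convert("L")))

arr = np.stack(images)  # shape: (n_images, 28, 28)
s3.put_object(Bucket=BUCKET, Key="img_array.pkl", Body=pickle.dumps(arr))
```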
5) Run the test model on AWS
python3 POC_adapted_28_aws.py
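The model itself isn't the point of this post, but for a sense of scale, a proof-of-concept on 28x28 inputs can be as small as the Keras sketch below. The bucket/key, the random placeholder labels, and the architecture are purely illustrative; the real POC_adapted_28_aws.py may look quite different.

```python
# Purely illustrative stand-in for the proof-of-concept: a tiny Keras CNN fed
# the pickled 28x28 array. Bucket/key, placeholder labels, and architecture
# are all assumptions - the real script may differ.
import pickle

import boto3
import numpy as np
import tensorflow as tf

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-image-bucket-28", Key="img_array.pkl")  # placeholders
x = pickle.loads(obj["Body"].read()).astype("float32") / 255.0
x = x.reshape(-1, 28, 28, 1)
y = np.random.randint(0, 2, size=len(x))  # placeholder labels, not real data

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=3, validation_split=0.1)
```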