Hacks and tools for machine learning
matchbox.py
A toolbox for PyTorch: a wrapper around the trainer.
Full introduction here
kmean_torch.py
The core class for CUDA-accelerated k-means, computed in batches.
Full introduction here
lprint.py
A helper that prints log messages in a neat format.
from ray import lprint
l=lprint.lprint("newtask")
# Do something time consuming
l.p("data loaded","data avatar images loaded")
# or you can just pass one string
l.p("data processed")
bcolzer.py
Turns a large array into a bcolz file.
With NumPy, if you create something big, such as an array with a.shape=(1000000,224,224,3), it will not fit in memory. With bcolz, you can flush the array to the hard drive and still use the bcolz array as a single variable.
from ray import bcolzer
bzr=bcolzer.bcolzer("img")
# create an image generator, use bzr.gen as the generator
bzr.img_gen("/dir_to_your_img_folder",with_class=True)
bzr.empety_img_bcolz("/dir_to_your_img_bcolz","/dir_to_your_label_bcolz")
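To make the memory problem concrete, here is a small back-of-the-envelope calculation for the shape quoted above. It only uses the standard library; the two dtypes (uint8 and float32) are assumptions chosen for illustration, since the text does not say which dtype the array would have.

```python
import math

# Shape quoted in the text above: one million 224x224 RGB images.
shape = (1_000_000, 224, 224, 3)
n_elements = math.prod(shape)

# Bytes per element for two common dtypes (assumed for illustration):
# uint8 takes 1 byte, float32 takes 4 bytes.
bytes_uint8 = n_elements * 1
bytes_float32 = n_elements * 4

print(f"elements: {n_elements:,}")
print(f"uint8:    {bytes_uint8 / 1e9:.1f} GB")
print(f"float32:  {bytes_float32 / 1e9:.1f} GB")
```

Even at one byte per element the array needs roughly 150 GB, which is why an on-disk container is attractive here.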
metrics.py
Augmented metrics for Keras.
Adds ratio, precision, and recall.
These calculate the ratio, precision, and recall of a specific category in a classification problem.
from ray.metrics import precision
# Precision is the fraction of detections
# reported by the model that were correct.
def dog_precision(y_true,y_pred):
    # assuming dog is your second category
    return precision(1,y_true,y_pred)
# while compiling a keras model
model.compile(loss='categorical_crossentropy',metrics=["accuracy",dog_precision],optimizer="Adam")
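For intuition, per-class precision can be sketched in plain NumPy as below. This is not the implementation in ray.metrics, which works on Keras tensors; the function name class_precision, the argmax-based decision rule, and the choice to return 0.0 when the class is never predicted are all assumptions made for this illustration.

```python
import numpy as np

def class_precision(class_index, y_true, y_pred):
    """Precision for one class.

    y_true: one-hot labels, shape (n_samples, n_classes)
    y_pred: predicted scores or probabilities, same shape
    """
    true_labels = np.argmax(y_true, axis=-1)
    pred_labels = np.argmax(y_pred, axis=-1)

    # Samples the model assigned to the class of interest.
    predicted_as_class = pred_labels == class_index
    n_predicted = predicted_as_class.sum()
    if n_predicted == 0:
        # Convention assumed here: no detections means precision 0.0.
        return 0.0

    # Of those detections, how many were actually that class?
    true_positives = np.logical_and(
        predicted_as_class, true_labels == class_index
    ).sum()
    return float(true_positives / n_predicted)

# Tiny made-up example with two classes; "dog" is index 1.
y_true = np.array([[0, 1], [0, 1], [1, 0], [1, 0]])
y_pred = np.array([[0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]])
print(class_precision(1, y_true, y_pred))
```

In the toy data, two samples are predicted as class 1 and one of them is correct.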
armory.py
preproc
Preprocesses images before they are fed to a CNN/ResNet.
Note: it does not scale elements to [0,1].
check_img_folder_multi
Checks an image folder using multiprocessing.
check_img_folder_multi('/data/cats/img/')
folder_split
Splits a folder into train/valid sets.
from ray.armory import folder_split
# The path we input contains sub-folders(categories)
folder_split("/data/animals",percent=.8)
# percent: the fraction of the data used for training
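The idea behind such a split can be sketched with the standard library alone. The function below is hypothetical and is not the code in ray.armory: the name split_folder, the separate dest argument, the seed parameter, and the choice to copy rather than move files are all assumptions made for this illustration.

```python
import os
import random
import shutil

def split_folder(src, dest, percent=0.8, seed=0):
    """Copy src/<category>/* into dest/train/<category> and dest/valid/<category>.

    Returns a dict mapping each category to (n_train, n_valid).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for category in sorted(os.listdir(src)):
        category_dir = os.path.join(src, category)
        if not os.path.isdir(category_dir):
            continue  # only sub-folders are treated as categories

        files = sorted(
            name for name in os.listdir(category_dir)
            if os.path.isfile(os.path.join(category_dir, name))
        )
        rng.shuffle(files)
        n_train = int(len(files) * percent)

        subsets = (("train", files[:n_train]), ("valid", files[n_train:]))
        for subset, names in subsets:
            target = os.path.join(dest, subset, category)
            os.makedirs(target, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(category_dir, name),
                             os.path.join(target, name))

        counts[category] = (n_train, len(files) - n_train)
    return counts
```

Splitting each category separately keeps the train/valid ratio roughly the same for every class, which matters when classes are imbalanced.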
one_hot
Turns an index array into a one-hot encoded array.
from ray.armory import one_hot
# say we have an array of labels covering 5 classes
one_hot(label_array, num_classes=5)
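As a rough picture of what one-hot encoding produces, here is a minimal NumPy sketch. It is not the ray.armory implementation; the name one_hot_sketch and the float32 output dtype are assumptions made for this illustration.

```python
import numpy as np

def one_hot_sketch(labels, num_classes):
    """Return an array of shape (n_samples, num_classes) with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64).ravel()
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

label_array = np.array([0, 3, 4, 1])
print(one_hot_sketch(label_array, num_classes=5))
```

Each row has exactly one non-zero entry, and taking the argmax of a row recovers the original label.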