Abstract: Broadcast is a common operation in machine learning and widely used in calculating bias or subtracting maximum for normalization in convolutional neural networks. Broadcast operation is required when two tensors possibly with different number of dimensions, hence with different number of elements, are input to an element-wise function. Tensors are scaled in process so that the two tensors match in size and dimension. In this research, we introduce a new broadcast functionality for matrices to be used on CUDA enabled GPU devices. We further extend this operation to multidimensional arrays and measure its performance against the implementation available in the Knet deep learning framework. Our final implementation provides up to 2x improvement over the Knet broadcast implementation, which only supports vector broadcast. Our implementation can handle broadcast operations with any number of dimensions.
Cheetos Food Dye Turns Mice Transparent
1 hour ago
No comments:
Post a Comment