Cloud services have been adopted by both start-ups and enterprises in recent years. However, they bring security issues with them. At this point, developed code differs from data. Critical data should be stored encrypted. Developed code, on the other hand, is mostly deployed to a server unprotected. For instance, a Java project could be deployed to a server as a jar or ear file. These files contain the Java classes hierarchically. However, there are several decompilers that can recover the original Java code from class files.
What if the developed code includes a patentable algorithm? An enterprise might want to protect its intellectual property. In this case, deploying the project directly to a server would be like turkeys voting for Christmas. So, what we are saying is that we should encrypt important code just like critical data, store it in a cloud database, and decrypt it at runtime to protect intellectual property. In this way, custom code would still be secure even if the cloud system is compromised, because the encryption key would not be stored on the cloud system.
Even the worthy Homer sometimes nods. The idiom means that even the most gifted person occasionally makes mistakes. We can adapt this saying to the machine learning lifecycle. Even the best ML models will make mistakes (a model that never errs probably has an overfitting problem). The important thing is to know how to measure errors. There are lots of metrics for measuring forecasts. In this post, we will mention evaluation metrics that are meaningful for ML studies.
Homer Simpson uses the catchphrase D’oh! when he has done something wrong
The sign of the difference between actual and predicted values should not be considered when calculating the total error of a system. Otherwise, a series with equally large underestimations and overestimations might show a very low total error, because the two cancel each other out. In fact, the total error should come out low only when the forecasts include small underestimations and overestimations. Discarding the signs gets rid of this negative effect, and squaring the differences is one way to discard them. The mean of these squared differences is called Mean Squared Error, or MSE for short.
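The cancellation effect can be seen in a small sketch. The function names and the sample series below are made up for illustration; the signed mean error is included only to contrast it with MSE.

```python
def mean_error(actual, predicted):
    """Signed average error, shown only to illustrate how signs cancel."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series with equally large over- and underestimations.
actual = [10.0, 10.0, 10.0, 10.0]
predicted = [15.0, 5.0, 15.0, 5.0]

signed = mean_error(actual, predicted)       # 0.0, the errors cancel out
mse = mean_squared_error(actual, predicted)  # 25.0, the errors accumulate
print(signed, mse)
```

Every forecast here is off by 5, yet the signed error reports a perfect system. Squaring keeps each miss visible.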
Today’s world limits the length of our expressions to 140 characters. Whether you send an SMS or a Tweet, you have to fit your sentences within this restriction. Furthermore, nobody has the patience for longer statements. That’s why the text you include should be chosen carefully. Here, subsidiary material such as long links would waste that space. To sum up, small is beautiful.
Blowfish Represents Long URLs in the Bitly Icon
URL shortening services map long URLs to shorter ones that redirect to the original address. Such URLs are friendly to messaging technologies that allow a limited number of characters, such as SMS. Thus, you can append a link to your message without exceeding the message quota. Moreover, short URLs are memorable. That’s why referencing short URLs is common in printed media such as newspapers or magazines.
Formerly, Twitter automatically shortened links with the Bitly service, which made Bitly popular. Today, the Blue Bird uses its own t.co service.
What’s more, Bitly and Google are the most common URL shortening service providers for end users. Although the web interfaces of these service providers are easy to use, URLs have to be shortened manually. In this post, we will focus on how to consume these services in our programs.
The contest between humans and computers starts with the Mechanical Turk. It was a supposedly autonomous chess player constructed in the 18th century. However, it was a fake. The mechanism allowed a chess player to hide inside the machine. Thus, the Turk operated while concealing a master who actually played the chess. (Yes, just like Anthony Daniels and Kenny Baker hid inside 3PO and R2D2 in Star Wars.) So, there is no intelligence in this old example. Still, this fake machine shows what 18th century people expected from an intelligent system that takes part in daily life.
IBM Deep Blue is the first chess playing computer to win against a world champion. Garry Kasparov was defeated by Deep Blue in 1997. Interestingly, development of Deep Blue began in 1985 at Carnegie Mellon University (remember this university). In other words, success came after 12 years of study.
Gradient descent is one of the most powerful optimization methods. However, learning time is a challenge, too. The standard version of gradient descent learns slowly. That’s why some modifications are made to gradient descent in real world applications. These approaches are applied to converge faster. In a previous post, incorporating momentum was already mentioned as serving the same purpose. Now, we will focus on applying an adaptive learning rate to learn faster.
Learning Should Adapt To The Environment Like Chameleons (Rango, 2011)
As you might remember, weights are updated by the following formula in back propagation.
wi = wi – α . (∂Error / ∂wi)
Alpha refers to the learning rate in the formula. An adaptive learning rate increases or decreases alpha based on changes in the cost. The following code block would realize this process.
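A minimal sketch of this idea, under stated assumptions: the one-dimensional cost function, the growth factor 1.05 and the shrink factor 0.5 are illustrative choices, not values taken from the post. Alpha grows slightly while the cost keeps falling, and it is cut back (and the step rejected) as soon as the cost would rise.

```python
def cost(w):
    # Toy cost with its minimum at w = 3 (an assumption for illustration).
    return (w - 3.0) ** 2


def gradient(w):
    # Derivative of the toy cost with respect to w.
    return 2.0 * (w - 3.0)


def train_adaptive(w=0.0, alpha=0.1, epochs=50, grow=1.05, shrink=0.5):
    """Gradient descent whose learning rate adapts to changes in the cost."""
    previous_cost = cost(w)
    for _ in range(epochs):
        candidate = w - alpha * gradient(w)  # w = w - alpha * dError/dw
        current_cost = cost(candidate)
        if current_cost < previous_cost:
            # Cost decreased: accept the step and speed up a little.
            w = candidate
            previous_cost = current_cost
            alpha *= grow
        else:
            # Cost did not decrease: reject the step and slow down.
            alpha *= shrink
    return w, alpha


w_final, alpha_final = train_adaptive()
print(w_final, alpha_final)
```

Rejecting the step when the cost rises is one possible design choice. A variant could keep the step and only reduce alpha.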
Newton’s cradle is the most popular example of momentum conservation. A lifted and released sphere strikes the stationary spheres, and the force is transmitted through them. This action pushes the last sphere upward. This shows that the last ball receives the momentum of the first ball. We can apply a similar principle in neural networks to improve learning speed. The idea of including momentum in neural network learning is to incorporate the previous update into the current change.
Newton’s Cradle Demonstrates Conservation of Momentum
Gradient descent guarantees reaching a local minimum as the number of iterations approaches infinity. However, that is not applicable in reality. Gradient descent iterations have to be terminated after a reasonable number. Moreover, gradient descent converges slowly. Here, momentum improves the performance of gradient descent considerably. Thus, the cost might converge faster, with fewer iterations, if momentum is included in the weight update formula.
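As a rough sketch of the weight update with momentum, the current change can be written as the usual gradient step plus a fraction of the previous update. The toy two-weight cost, the momentum coefficient mu = 0.9 and the learning rate 0.01 are assumptions chosen for illustration.

```python
def cost(w):
    # Toy cost (an assumption for illustration): 0.5 * (w0^2 + 10 * w1^2)
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def gradient(w):
    # Partial derivatives of the toy cost with respect to each weight.
    return [w[0], 10.0 * w[1]]


def train(w, alpha=0.01, mu=0.0, epochs=200):
    """Gradient descent; mu > 0 adds a share of the previous update."""
    previous_update = [0.0] * len(w)
    for _ in range(epochs):
        grad = gradient(w)
        # update = -alpha * gradient + mu * previous update
        update = [-alpha * g + mu * prev for g, prev in zip(grad, previous_update)]
        w = [wi + u for wi, u in zip(w, update)]
        previous_update = update
    return w


start = [5.0, 5.0]
plain = train(start, mu=0.0)          # standard gradient descent
with_momentum = train(start, mu=0.9)  # same iterations, with momentum
print(cost(plain), cost(with_momentum))
```

With the same number of iterations, the run with momentum ends at a lower cost on this toy problem. How much it helps in practice depends on the cost surface and on the chosen coefficients.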
Applying neural networks can be divided into two phases: learning and forecasting. The learning phase is costly, whereas the forecasting phase produces results very quickly. The epoch value (aka training time), the network structure and the size of the historical data determine the cost of the learning phase. Normally, a larger epoch value produces better results. However, increasing the epoch value makes learning take longer. That’s why picking a very large epoch value would not be applicable for online transactions if learning is performed on the fly.
However, we can run the learning and forecasting steps asynchronously. We can perform neural network learning as a batch job (e.g. a periodic day-end or month-end calculation). Thus, a very large epoch value can be picked. Besides, the weights of the neural network will be calculated under low system load (most probably in late night hours). In this way, it does not matter how long neural network learning lasts. Thus, we can even make forecasts for online transactions in milliseconds. You might think of this approach as the human nervous system updating its own weights while sleeping.
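The separation could be sketched as follows. To keep the example short, a single linear unit stands in for a full neural network, and the file name, function names and training data are hypothetical. The batch step trains with a large epoch value and stores the weights. The online step only loads them and does one cheap calculation.

```python
import json
import os
import tempfile


def train_batch(samples, epochs=2000, alpha=0.05):
    """Offline step: fit y = w * x + b with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in samples) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in samples) * 2 / n
        w -= alpha * grad_w
        b -= alpha * grad_b
    return {"w": w, "b": b}


def save_weights(weights, path):
    # Persist the learned weights so the online step never has to train.
    with open(path, "w") as f:
        json.dump(weights, f)


def load_weights(path):
    with open(path) as f:
        return json.load(f)


def forecast(weights, x):
    """Online step: a single multiply-add, no learning involved."""
    return weights["w"] * x + weights["b"]


# Batch job (e.g. day-end): hypothetical history following y = 2x + 1.
history = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weights_path = os.path.join(tempfile.mkdtemp(), "weights.json")
save_weights(train_batch(history), weights_path)

# Online transaction: load the stored weights and forecast immediately.
prediction = forecast(load_weights(weights_path), 4.0)
print(prediction)
```

In a real system the weights would more likely live in a database or a shared store than in a temporary file. The point is only that the expensive loop and the fast lookup run at different times.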