<h2 id="peer-review">Peer Review</h2>
<p>This is my review of the work wuwei did during the summer, and of his blog posts. I found his work very interesting, and there are a few things worth talking about.</p>
<h2 id="main-ideas">Main ideas</h2>
<p>The Transformers API is nicely implemented. The old converter and <a href="http://www.shogun-toolbox.org/api/6.0.0/classshogun_1_1CPreprocessor.html">preprocessor</a> APIs were essentially doing the same thing, hence a unified class was needed. <code class="highlighter-rouge">.fit</code> and <code class="highlighter-rouge">.transform</code> are simple to use from SWIG too, as shown <a href="https://github.com/shogun-toolbox/shogun/blob/41888fe7c8dc3797063d674f452e13351f321338/examples/meta/src/converter/independent_component_analysis_fast.sg#L14">here</a>. I agree the ref-ing is sometimes a lot of trouble to deal with (for example <a href="https://github.com/shogun-toolbox/shogun/pull/4285/files#diff-90ffa0d34a7969080b14e84afd82eae7L58">here</a>), hence this work really shines in its simplicity in spite of being huge.</p>
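<p>For illustration, the unified API boils down to fit-then-transform. Below is a minimal C++ sketch of that usage; the choice of <code class="highlighter-rouge">CPruneVarSubMean</code> and the exact headers are my assumptions, not taken from the reviewed PRs:</p>
<pre><code class="language-c++">#include <shogun/base/init.h>
#include <shogun/base/some.h>
#include <shogun/features/DenseFeatures.h>
#include <shogun/preprocessor/PruneVarSubMean.h>

using namespace shogun;

int main()
{
    init_shogun_with_defaults();

    SGMatrix<float64_t> mat(2, 3);
    for (index_t i = 0; i < mat.num_rows * mat.num_cols; ++i)
        mat.matrix[i] = i; // column-major fill with dummy data

    auto features = some<CDenseFeatures<float64_t>>(mat);
    auto preproc = some<CPruneVarSubMean>();

    preproc->fit(features);                                 // learn mean/variance
    CFeatures* transformed = preproc->transform(features);  // apply them
    SG_UNREF(transformed);

    exit_shogun();
    return 0;
}
</code></pre>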
<p>The pipeline API can be used to chain transformers and machines together as shown <a href="https://github.com/vinx13/shogun/blob/56acfd9bb58ccd38b7f3ce8f177de042dfa056ca/examples/meta/src/pipeline/pipeline.sg#L24">here</a>. Because something like</p>
<pre><code class="language-Python">p.over("submean", transformer("PruneVarSubMean")).then("kmeans", machine("KMeans"))
</code></pre>
<p>is possible in Python, the API is quite similar to that of <a href="http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html">sklearn</a>. I really liked this. Also, the keywords <code class="highlighter-rouge">over</code> and <code class="highlighter-rouge">then</code> make it clear that the pipeline expects a pair of a transformer and a machine. Overall it is definitely user friendly.<br />
The pipeline is also easy to use with cross validation, as shown <a href="https://github.com/shogun-toolbox/shogun/blob/5bd958ca3accdd6e21da2fb10b06566c097addc5/examples/meta/src/evaluation/cross_validation_pipeline.sg#L25">here</a>.</p>
<p>The custom exceptions that were introduced will enable some creative error handling in shogun. The idea of separating <code class="highlighter-rouge">logical_errors</code> from <code class="highlighter-rouge">out_of_range</code> exceptions is informative and clean. The idea is simply to throw a std exception instead of always using <code class="highlighter-rouge">ShogunException</code>, with the help of two macros, <code class="highlighter-rouge">SG_THROW</code> and <code class="highlighter-rouge">REQUIRE_E</code>. I think we can write down using these exceptions all over shogun as future work somewhere, because this will make a nice task for contributors.</p>
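<p>To make the idea concrete, here is a minimal sketch of what such a macro can do; the macro name and body are hypothetical, not the actual <code class="highlighter-rouge">SG_THROW</code>/<code class="highlighter-rouge">REQUIRE_E</code> definitions:</p>
<pre><code class="language-c++">#include <sstream>
#include <stdexcept>

// Hypothetical macro in the spirit of SG_THROW: the caller picks the
// std exception type instead of always throwing ShogunException.
#define MY_THROW(ExceptionType, message)      \
    do                                        \
    {                                         \
        std::ostringstream ss;                \
        ss << message;                        \
        throw ExceptionType(ss.str());        \
    } while (0)

void check_index(int i, int n)
{
    // out_of_range for bad indices; std::logic_error would signal a broken invariant
    if (i < 0 || i >= n)
        MY_THROW(std::out_of_range, "index " << i << " not in [0, " << n << ")");
}
</code></pre>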
<p>The view template for features and labels is a nice step towards immutable features. Making a shallow copy of the features ensures the originals are not modified. The best thing is that they are <code class="highlighter-rouge">const</code> during a view, <a href="https://github.com/shogun-toolbox/shogun/pull/4352/files#diff-1c85d9d179a2ac86ac9808c1e1ea342eR25">here</a>. I also liked returning <code class="highlighter-rouge">Some</code> as subset features; no one likes dealing with refs/unrefs. The <code class="highlighter-rouge">duplicate</code> method <a href="https://github.com/shogun-toolbox/shogun/pull/4352/files#diff-d635d9223d8ee233fa41029992b2d832R137">here</a> does not incur data-copying overhead. It is also clearer how to use this, since <code class="highlighter-rouge">add_subset</code> followed by <code class="highlighter-rouge">remove_subset</code> can be complicated to work with.</p>
<p>Untemplated linalg has a lot of ideas I found very interesting. The idea behind lazy evaluation is to deal with types at runtime: an untemplated Vector and Matrix can be converted to templated instances of SGVector and SGMatrix using an operator. I like how the code for basic things like dot, add and multiply is already there, demonstrating the idea’s feasibility. I had some trouble understanding how <code class="highlighter-rouge">Exp</code> came into the picture <a href="http://wuwei.io/post/2018/06/lazy-evaluation-with-expression-templates-1/">here</a>; some hints about that would make it clearer. I found it a bit difficult to differentiate between implicit and explicit evaluations of expressions (<a href="https://github.com/vinx13/shogun-untemplated-demo/blob/162a9aceb28d231580f70e2cb1f8c7b45b073894/demo.cpp#L32">here</a> for example). <br />
This is where the lazy evaluation truly shines, in the sense that the expression is not evaluated until it is “needed”, i.e. assigned to a Vector. I think the more we eval implicitly the better; a few thoughts about that would be helpful. The recursive simplicity of the eval method is very intuitive: solve lhs, solve rhs, apply the operator. <a href="https://github.com/vinx13/shogun-untemplated-demo/blob/162a9aceb28d231580f70e2cb1f8c7b45b073894/demo.cpp#L39">An explicit eval here</a> is, however, a price we pay for being lazy, it seems.</p>
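<p>As a toy illustration of the pattern (my own minimal sketch, not the actual prototype): <code class="highlighter-rouge">operator+</code> only builds an expression node, and nothing is computed until the expression is assigned to a Vector.</p>
<pre><code class="language-c++">#include <cstddef>
#include <vector>

// An Add node only remembers its operands; eval() is the recursive
// "solve lhs, solve rhs, apply the operator" step.
template <typename L, typename R>
struct Add
{
    const L& lhs;
    const R& rhs;
    double eval(std::size_t i) const { return lhs.eval(i) + rhs.eval(i); }
    std::size_t size() const { return lhs.size(); }
};

struct Vector
{
    std::vector<double> data;
    explicit Vector(std::size_t n) : data(n) {}
    double eval(std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression to a Vector is the implicit evaluation point.
    template <typename E>
    Vector& operator=(const E& expr)
    {
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = expr.eval(i);
        return *this;
    }
};

template <typename L, typename R>
Add<L, R> operator+(const L& a, const R& b)
{
    return Add<L, R>{a, b};
}

int main()
{
    Vector a(3), b(3), c(3);
    a.data = {1, 2, 3};
    b.data = {4, 5, 6};
    c = a + b + a; // builds Add<Add<Vector, Vector>, Vector>, evaluated lazily here
    return 0;
}
</code></pre>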
<p>Maybe we can add a few words about future work and ideas to the <a href="http://wuwei.io/post/2018/08/gsoc18-final-review/">blog</a>. This will help pick things up later. Ideas like a meta example for the inverse transform API, using custom exceptions shogun-wide, or the systematic tests with all features and labels views would be a few good candidates.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I have enjoyed working on shogun with you this summer. I appreciate the complexity of the work that has been done over the summer. Your work during the project has kept me excited and helped me work more efficiently. I look forward to writing more code together.</p>
<p><a href="http:/blog/gsoc'18/Peer-Review">Peer Review Continuous Detoxification</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on August 11, 2018.</p>http:/blog/gsoc'18/Final-Report2018-08-10T00:00:00+00:002018-08-10T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<h2 id="overview">Overview</h2>
<p><strong>Name</strong>: Shubham Shukla<br />
<strong>Project</strong>: Inside The Black Box<br />
<strong>Mentors</strong>: Heiko Strathmann, Giovanni De Toni<br />
<strong>Organization</strong>: Shogun Machine Learning Toolbox</p>
<h3 id="abstract">Abstract</h3>
<p>Shogun is a large scale machine learning toolkit shaped by many different minds and ideas. This means we have a lot of opportunities to optimize what goes on under the hood and create something simple and impactful. This project focuses a bit more on iterative algorithms, among other things. The premature stopping framework was improved to make it more robust and natural to use and modify; the progress bar was made more verbose and implemented in more iterative algorithms; and we worked on making algorithms respect the provided feature types, making them fully templated for more generic behaviour.</p>
<h3 id="table-of-contents">Table of Contents</h3>
<ul>
<li><a href="#stoppablesgobject-class-and-progress-bar">StoppableSGObject class and progress bar</a></li>
<li><a href="#iterative-machine">Iterative Machine</a></li>
<li><a href="#feature-type-dispatching-and-generic-nature">Feature type dispatching and generic nature</a></li>
<li><a href="#other-contributions-and-ideas">Other Contributions and Ideas</a></li>
</ul>
<h3 id="stoppablesgobject-class-and-progress-bar">StoppableSGObject class and progress bar</h3>
<p>In shogun we have a <code class="highlighter-rouge">SignalHandler</code> that, for some algorithms, gives the user some control over what happens in case of an event like premature cancellation (<code class="highlighter-rouge">CTRL+C</code>). We took all the premature stopping code and made it accessible to non-<code class="highlighter-rouge">CMachine</code> types as well by placing it in a new <code class="highlighter-rouge">CStoppableSGObject</code> class. My mentor also introduced the <code class="highlighter-rouge">m_callback</code> data member, which accepts a lambda function that serves as a way to fire a cancel-computation signal.</p>
<h5 id="relevant-prs">Relevant PRs:</h5>
<p>The class is implemented in <a href="https://github.com/shogun-toolbox/shogun/pull/4280">PR4280</a>. Other updates include</p>
<ul>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4286">#4286</a>: replacing cancel_computation calls with macro</li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4287">#4287</a>: enable premature stopping in all machines</li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4291">#4291</a>: use case for <code class="highlighter-rouge">CStoppableSGObject</code> class</li>
</ul>
<h5 id="progress-bar">Progress bar</h5>
<p>We added a default prefix of <code class="highlighter-rouge">class_name::method_name</code> to the progress bar. This slightly changes the usage, which is now done via the <code class="highlighter-rouge">SG_PROGRESS</code> macro instead of the <code class="highlighter-rouge">progress</code> method. We implemented it in most iterative algorithms, which also helped us prepare a list of iterative algorithms.<br />
<strong>PR</strong>: <a href="https://github.com/shogun-toolbox/shogun/pull/4305">#4305</a>: The new macro and its usage<br />
<strong>Future Work</strong>: Finding more use cases for it and using it to extend the list of iterative algorithms.</p>
<h3 id="iterative-machine">Iterative Machine</h3>
<p>Previously an algorithm could define what happens when a training process is cancelled or paused with the help of methods like <code class="highlighter-rouge">on_next</code>, <code class="highlighter-rouge">on_pause</code> etc. This is not flexible for a user behind an interface like Python. We use the concept of <em>mixins</em> for the first time in shogun to write a new <code class="highlighter-rouge">CIterativeMachine</code> class, which allows the user to cancel training at any time, execute some other code, and then resume training later if needed. The prematurely stopped model remains usable and in a consistent state. This is done by making sure the model updates its <code class="highlighter-rouge">state</code> in every iteration.</p>
<h5 id="relevant-prs-1">Relevant PRs:</h5>
<p>The mixin is implemented in <a href="https://github.com/shogun-toolbox/shogun/pull/4335">PR4335</a>. Related PRs are:</p>
<ul>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4320">#4320</a>: update the <code class="highlighter-rouge">state</code> of Perceptron in every iteration</li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4335">PR4335</a>: this also includes the implementation of <code class="highlighter-rouge">CIterativeMachine</code> in <code class="highlighter-rouge">CPerceptron</code></li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4347">#4347</a>: <code class="highlighter-rouge">CIterativeMachine</code> in <code class="highlighter-rouge">CNewtonSVM</code> class.</li>
</ul>
<h5 id="future-work">Future Work:</h5>
<ul>
<li>Porting all algorithms from the <a href="https://github.com/shogun-toolbox/shogun/wiki/List-of-iterative-algorithms">List of Iterative Algorithms</a> wiki page to this code style</li>
<li>Systematic tests for all iterative machines that ensure proper state updates and consistency.</li>
</ul>
<h3 id="feature-type-dispatching-and-generic-nature">Feature type dispatching and generic nature</h3>
<p>There is an implicit assumption in most algorithms that the provided features will be 64-bit dense. To introduce more generic behaviour in an automated way we have written some new classes. These are all <em>mixins</em> that use the curiously recurring template pattern: an algorithm inherits from a class templated on the algorithm itself as well as on the original base class. The concept is new to shogun and opens up a lot more possibilities for similar ideas. The idea is to dispatch feature types from the base class and then have subclasses implement a templated version of <code class="highlighter-rouge">train_machine</code>.</p>
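<p>The general shape of these classes, as a stripped-down sketch with hypothetical names (not the actual shogun classes), looks like this:</p>
<pre><code class="language-c++">// Base class of the usual hierarchy.
struct MachineBase
{
    virtual ~MachineBase() = default;
};

// CRTP mixin: templated on both the concrete machine (Derived) and the class
// it should slot under (Base), so it can forward to Derived's templated
// method without needing virtual templates.
template <typename Derived, typename Base>
struct DenseDispatch : Base
{
    bool train_dispatch()
    {
        // The real code would switch on the runtime feature type here.
        return static_cast<Derived*>(this)->template train_templated<double>();
    }
};

struct MyMachine : DenseDispatch<MyMachine, MachineBase>
{
    template <typename T>
    bool train_templated()
    {
        return true; // train with T-typed features here
    }
};

int main()
{
    MyMachine m;
    return m.train_dispatch() ? 0 : 1;
}
</code></pre>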
<h5 id="relevant-prs-2">Relevant PRs:</h5>
<ul>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4373">#4373</a>: This implements two mixin + CRTP classes to dispatch dense and string feature types. We also implement dense feature type dispatching in <code class="highlighter-rouge">CLDA</code> and <code class="highlighter-rouge">CLeastAngleRegression</code>. There are also unit tests for training a machine with a list of feature types.</li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4389">#4389</a>: a meta example for the dense dispatcher</li>
</ul>
<h5 id="future-work-1">Future Work:</h5>
<ul>
<li>Add more dispatchers with tests along with implementations.</li>
<li>An automated way to create a new dispatcher class, or maybe a workaround so that we don’t need to have as many dispatchers as the number of feature types.</li>
</ul>
<h3 id="other-contributions-and-ideas">Other Contributions and Ideas:</h3>
<h5 id="observer-and-put">Observer and put</h5>
<p>Shogun has an API to set values of member parameters that are registered by algorithms. This API can be used together with parameter observers to record summaries of changes in member variables. This can be done using a new <code class="highlighter-rouge">put_observe</code> method that informs the parameter observers about anything that is being updated. This comes with a design change: using <code class="highlighter-rouge">put</code> to change member data in code instead of using assignment. To quantify the overhead we incur compared to direct assignment, we have written a benchmark in <a href="https://github.com/shogun-toolbox/shogun/pull/4342">#4342</a><br />
<strong>Future Work:</strong></p>
<ul>
<li>Prototyping and benchmarking of <code class="highlighter-rouge">put_observe</code></li>
<li>Inferring the type using <code class="highlighter-rouge">Any</code> instead of doing so in <code class="highlighter-rouge">CObservedValue</code></li>
</ul>
<h5 id="systematic-tests-for-iterative-machine">Systematic tests for Iterative machine</h5>
<p>Writing separate tests in multiple classes that aim to do the same thing is redundant. We can use <code class="highlighter-rouge">TYPED_TESTS</code> instead to cover a number of classes at once. This still has a few issues but will be possible in the future. An idea of how to do it is <a href="https://github.com/shogun-toolbox/shogun/pull/4327">#4327</a> for serialization tests.<br />
<strong>Future Work:</strong></p>
<ul>
<li>Using <code class="highlighter-rouge">LibASTParser</code> to provide more information on base classes</li>
<li>Making a general dataset for proper testing</li>
</ul>
<p>An idea of what these tests should look like is in <code class="highlighter-rouge">Perceptron_unittest.cc</code>.</p>
<h5 id="newtonsvm-refactoring">NewtonSVM Refactoring</h5>
<p>The implementation of <code class="highlighter-rouge">CNewtonSVM</code> was outdated. We refactored it to use <code class="highlighter-rouge">SGVector</code>/<code class="highlighter-rouge">SGMatrix</code> for data storage and the <code class="highlighter-rouge">linalg</code> API for computations. We also ported it to the new IterativeMachine code style.<br />
<strong>PRs</strong>: We refactored most of the code in <a href="https://github.com/shogun-toolbox/shogun/pull/4347">#4347</a>. We added a new linalg method, with two implementations, to calculate the pseudo-inverse of matrices in <a href="https://github.com/shogun-toolbox/shogun/pull/4356">#4356</a></p>
<h5 id="meta-examples-and-cookbook-contributions">Meta examples and cookbook contributions</h5>
<p>These are some contributions to meta examples and cookbooks</p>
<ul>
<li><strong>Neural Network Factory</strong>: Adding the option to <code class="highlighter-rouge">auto_initialize</code> the neural network, along with a new <code class="highlighter-rouge">layer</code> factory to create new layers. The example from Python looks much more intuitive now.<br />
<strong>PR</strong>: <a href="https://github.com/shogun-toolbox/shogun/pull/4386">#4386</a> contains the factory example along with a cookbook for training a convolutional neural network in shogun on a dataset of MNIST images of the digits 0, 1 and 2. The corresponding dataset PR is <a href="https://github.com/shogun-toolbox/shogun-data/pull/165">#165</a></li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4346">#4346</a>: NewtonSVM meta example</li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4340">#4340</a>: Diffusion meta cookbook and example</li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4310">#4310</a>: porting <code class="highlighter-rouge">KRRNystrom</code> and <code class="highlighter-rouge">LeastAngleRegression</code> to new API</li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4297">#4297</a>: porting <code class="highlighter-rouge">KernelRidgeRegression</code> meta example to new API</li>
<li><strong>Deleting CLabelsFactory</strong>: The <code class="highlighter-rouge">CLabelsFactory</code> performed static casts which were not needed anymore, so we deleted it and used <code class="highlighter-rouge">as</code> for conversions<br />
<strong>PRs</strong>: <a href="https://github.com/shogun-toolbox/shogun/pull/4281">#4281</a>, <a href="https://github.com/shogun-toolbox/shogun/pull/4277">#4277</a></li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4278">#4278</a>: A few meta examples on using distance machines from factory</li>
<li><a href="https://github.com/shogun-toolbox/shogun/pull/4236">#4236</a>: factory methods in lda meta example</li>
</ul>
<p><strong>Parallel computation of <code class="highlighter-rouge">sample</code> in CLogDetEstimator</strong>: <a href="https://github.com/shogun-toolbox/shogun/pull/4235">#4235</a> We used OpenMP to parallelize the code, along with refactoring it for more efficient memory usage</p>
<p><a href="http:/blog/gsoc'18/Final-Report">Final Report Inside the Black Box</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on August 10, 2018.</p>http:/blog/my%20experience/Gsoc-Experience2018-08-05T00:00:00+00:002018-08-05T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>We have had a terrific time working together to solve some really cool problems. I found my work to be relevant. This is a short note to my mentors and future students.</p>
<p>For a detailed description of our main ideas please check out the <a href="https://shubham808.github.iohttp:/blog/featured/"><em>Featured Posts</em></a> section of my blog. For an even closer look at the project timeline check out the <a href="https://shubham808.github.iohttp:/blog/categories/index.html#Weekly%20Updates"><em>Weekly Updates</em></a> category. There you will find the intermediate ideas and decisions that led us to the final versions, which will be helpful for future contributors.</p>
<p>When I first started working on shogun, I worked on small and some large issues. The community was very supportive and helped me with everything from the trivial to the complicated through detailed and regular feedback. This, when mixed with Summer of Code and some regularity, transformed into brainstorming sessions with my mentors. I had a lot of fun exploring those ideas and implementing them in the project. However, not all of those plans have been realized in code; I would love to continue working on them after Summer of Code.</p>
<p>The most important part of a successful project is good communication with mentors. I have enjoyed that part of this summer to the fullest. Every week we would have a meeting on Hangouts, sometimes with a document open on the side where we wrote about new ideas. This went on for hours. The next morning I couldn’t wait to merge it as soon as I had something that compiled. Of course, the merging always came after a few days, or maybe a week, with a lot more thought to make it shine. I have learned a lot from my mentors about the way I should think about problems.</p>
<p>I want to thank my mentors Heiko Strathmann and Giovanni De Toni for working with me and always patiently helping me, for understanding my ideas and for realizing them in the shogun codebase. I am very excited to see where we take the new developments from here.</p>
<h3 id="advice-for-new-contributors">Advice for new contributors</h3>
<p>The community is welcoming of new contributors. People are very happy to help out and guide you through issues. Do not be afraid to try out new ideas. An explorative approach to things is the best way to understand and then improve upon new ideas. Taking your time to learn about things, realizing that you are stuck and then clearly explaining why that is, will lead to some of the most constructive ideas. Shogun welcomes new contributors to take up new ideas and produce nice code. Being active will <em>always</em> lead to a solution.</p>
<p><a href="http:/blog/my%20experience/Gsoc-Experience">Google Summer of Code with Shogun</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on August 05, 2018.</p>http:/blog/gsoc'18/StoppableSGObject and progress bar2018-08-02T00:00:00+00:002018-08-02T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<h3 id="overview">Overview:</h3>
<p>The code for the premature stopping framework was written in <code class="highlighter-rouge">CMachine</code>. We take all of its components and put them in a separate object type called <code class="highlighter-rouge">CStoppableSGObject</code>. This makes life easier and removes duplicated code.</p>
<h3 id="motivation">Motivation:</h3>
<p>The stopping framework is useful in classes that do not inherit from <code class="highlighter-rouge">CMachine</code>, like <code class="highlighter-rouge">CMachineEvaluation</code>. This makes the idea a lot more scalable in terms of usage, since all any class needs to do in order to include the whole thing is inherit from <code class="highlighter-rouge">CStoppableSGObject</code>. It also makes introducing new features, like a callback member function, a lot easier.</p>
<h3 id="implementation-details-and-design-choice">Implementation details and Design choice:</h3>
<p>The <code class="highlighter-rouge">CStoppableSGObject</code> inherits from <code class="highlighter-rouge">CSGObject</code>. It has all the members that were introduced in the <a href="https://github.com/shogun-toolbox/shogun/wiki/premature-stopping">premature stopping framework</a> last year. My mentor added a new feature that enables us to register a lambda function as callback whenever a new iteration starts in a loop. This is done by invoking an <code class="highlighter-rouge">SG_BLOCK_COMP</code> in the callback when a condition returns True.</p>
<h5 id="data-members-and-methods">Data members and Methods:</h5>
<p>Apart from the already present components of the premature stopping framework, we introduced a new way to cancel the computation of a machine. This will make testing easier and more understandable.</p>
<ul>
<li><code class="highlighter-rouge">m_callback</code>: It is a <code class="highlighter-rouge">std::function<bool></code> which can call <code class="highlighter-rouge">cancel_computation()</code> along with generating block signal from the <code class="highlighter-rouge">global_signal_handler</code>.
An example of callback is:</li>
</ul>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">function</span><span class="o"><</span><span class="kt">bool</span><span class="p">()</span><span class="o">></span> <span class="n">callback</span> <span class="o">=</span> <span class="p">[</span><span class="k">this</span><span class="p">]()</span>
<span class="p">{</span>
<span class="c1">// Stop if we did more than 5 steps
</span> <span class="k">if</span> <span class="p">(</span><span class="n">m_last_iteration</span> <span class="o">>=</span> <span class="mi">5</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">get_global_signal</span><span class="p">()</span><span class="o">-></span><span class="n">get_subscriber</span><span class="p">()</span><span class="o">-></span><span class="n">on_next</span><span class="p">(</span><span class="n">SG_BLOCK_COMP</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">m_last_iteration</span><span class="o">++</span><span class="p">;</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<ul>
<li><code class="highlighter-rouge">set_callback</code>: setter for <code class="highlighter-rouge">m_callback</code>.</li>
</ul>
<h3 id="some-thoughts">Some Thoughts:</h3>
<p>For <code class="highlighter-rouge">CIterativeMachine</code> this provides a base for a testing mechanism. The idea is to stop a model using the callback and compare it with reference results to test consistency. This enables us to simulate a user pressing <code class="highlighter-rouge">CTRL+C</code>, which will help us write good unit tests along with providing an alternative way to cancel computation.
For non-iterative classes this means the <code class="highlighter-rouge">COMPUTATIONS_CONTROLLERS</code> macro is still usable to support the signal handler and deal with signals in a systematic way without having to write all the code again.</p>
<h1 id="progress-bar-macro">Progress bar macro</h1>
<h3 id="overview-and-motivation">Overview and motivation:</h3>
<p>The progress bar now has an informative prefix by default. This is more verbose and makes it easier to understand and differentiate. This was done by adding a new <code class="highlighter-rouge">SG_PROGRESS</code> macro that prepends the <code class="highlighter-rouge">class name::method name</code> prefix to the progress bar. Since it is only possible to get the name of the function currently being executed, a macro is needed to obtain the name of the caller.</p>
<h3 id="examples-and-applying-to-more-algorithms">Examples and applying to more algorithms:</h3>
<p>The new, smooth progress bar looks like:</p>
<blockquote>
<table>
<tbody>
<tr>
<td>CustomKernel::get_kernel_matrix</td>
<td>██████████████████████████████████████████████████████</td>
<td>100.00% 0.0 seconds</td>
</tr>
</tbody>
</table>
</blockquote>
<blockquote>
<table>
<tbody>
<tr>
<td>KMeansMiniBatch::minibatch_KMeans</td>
<td>████████████████████████████████████████████████████</td>
<td>100.00% 0.0 seconds</td>
</tr>
</tbody>
</table>
</blockquote>
<p>To use it in a new algorithm we can make the following changes:</p>
<ul>
<li>Identify a candidate loop that will need the progress bar.</li>
<li>Use the macro at the beginning of the loop.</li>
</ul>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">for</span> <span class="p">(</span><span class="k">auto</span> <span class="n">e</span> <span class="o">:</span> <span class="n">SG_PROGRESS</span><span class="p">(</span><span class="n">range</span><span class="p">(</span><span class="n">epochs</span><span class="p">)))</span></code></pre></figure>
<p>All <code class="highlighter-rouge">CIterativeMachine</code> subclasses will automatically have a progress bar, since we apply it in <code class="highlighter-rouge">continue_train()</code>.</p>
<h3 id="future-work">Future Work:</h3>
<ul>
<li>Use the progress bar anywhere it seems feasible.</li>
<li>Expand the <code class="highlighter-rouge">List of Iterative Algorithms</code> while searching for suitable candidates.</li>
</ul>
<p><a href="http:/blog/gsoc'18/StoppableSGObject-and-progress-bar">StoppableSGObject class and progress bar</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on August 02, 2018.</p>http:/blog/gsoc'18/Some more ideas around shogun2018-08-02T00:00:00+00:002018-08-02T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<h3 id="overview">Overview:</h3>
<p>This is a collection of some spin-off stories we worked on during the project. They are some nice future ideas and other cool stuff we played with and would like to develop into something better.</p>
<h3 id="iterative-machine-automated-tests">Iterative Machine automated tests:</h3>
<p>A simple and elegant way to test iterative machines is by using three reference models. We train the first model for, say, 5 iterations and the second model for 10 iterations; the third model is to be trained for 10 iterations but is prematurely stopped at 5. The prematurely stopped model must produce the same result as the first reference model. We then call continue_train so that the model completes training at 10 iterations. The result must be the same as that of the second reference model.
To automate these tests, we will create a <code class="highlighter-rouge">generator</code> Python file which will collect all <code class="highlighter-rouge">CIterativeMachine</code> subclasses and create various <code class="highlighter-rouge">type lists</code>. These can be used to write <code class="highlighter-rouge">TYPED_TESTS</code> which implement the above idea.</p>
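<p>A rough sketch of what such a generated test could look like with googletest; the stand-in machines are hypothetical, and the real type list would come from the generator:</p>
<pre><code class="language-c++">#include <gtest/gtest.h>

// Hypothetical stand-ins for generated CIterativeMachine subclasses. Real
// tests would prematurely stop training via a callback and then call
// continue_train; these fakes just echo the iteration count.
struct FakePerceptron { int train(int iters) { return iters; } };
struct FakeNewtonSVM  { int train(int iters) { return iters; } };

template <typename T>
class IterativeMachineTest : public ::testing::Test
{
};

// The generator script would emit this type list.
typedef ::testing::Types<FakePerceptron, FakeNewtonSVM> MachineTypes;
TYPED_TEST_CASE(IterativeMachineTest, MachineTypes);

TYPED_TEST(IterativeMachineTest, stop_and_continue_matches_references)
{
    TypeParam ref5, ref10, stopped;
    // stopped at 5 iterations: must match the 5-iteration reference
    EXPECT_EQ(stopped.train(5), ref5.train(5));
    // after continuing to 10 iterations: must match the 10-iteration reference
    EXPECT_EQ(stopped.train(10), ref10.train(10));
}
</code></pre>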
<h5 id="future-work">Future work:</h5>
<ul>
<li>Using an AST parser or something similar to generate graphs that provide more information about inheritance</li>
<li>Choosing dataset such that the tests can remain <em>general</em> for all machines that need it.</li>
</ul>
<p>One test following this approach is available in <code class="highlighter-rouge">Perceptron_unittest.cc</code></p>
<h3 id="put-and-observe">Put and Observe:</h3>
<p>The Parameter Observer framework is a way to <em>observe</em> how various members change during training. To implement this in an algorithm we would need to call the <code class="highlighter-rouge">observe</code> method periodically. The new idea was to use the <code class="highlighter-rouge">put</code> API to update member variable state, and to observe whenever some variable is <code class="highlighter-rouge">put</code>. Basically, we will try to add a new <code class="highlighter-rouge">put_observe</code> API that uses the original API and also calls <code class="highlighter-rouge">observe</code> to update the parameter observers. From an algorithm’s view, this means we will not update member variables directly (using <code class="highlighter-rouge">=</code>); instead we will replace those updates with a call to <code class="highlighter-rouge">put_observe</code>. We wrote a benchmark to quantify the overhead that using put generates, with and without the observer.</p>
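<p>A minimal stand-in for the intended call pattern is below; the body is my sketch of the idea, not shogun’s real <code class="highlighter-rouge">put()</code>/observer machinery:</p>
<pre><code class="language-c++">#include <functional>
#include <iostream>
#include <string>

class Algorithm
{
public:
    // stand-in for whatever the ParameterObserver framework attaches
    std::function<void(const std::string&, double)> observer;

    template <typename T>
    void put_observe(const std::string& name, const T& value)
    {
        put(name, value); // the normal put() path updates the parameter
        if (observer)
            observer(name, value); // observers see every update
    }

private:
    template <typename T>
    void put(const std::string& name, const T& value)
    {
        // stand-in: the real put() writes to the registered member
        std::cout << name << " <- " << value << "\n";
    }
};

int main()
{
    Algorithm a;
    a.observer = [](const std::string& name, double value) {
        std::cout << "observed " << name << " = " << value << "\n";
    };
    a.put_observe("bias", 0.5); // replaces `m_bias = 0.5;` in algorithm code
    return 0;
}
</code></pre>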
<h5 id="future-work-1">Future work:</h5>
<ul>
<li>Since the <code class="highlighter-rouge">Any</code> API is now better, we can use it in the <code class="highlighter-rouge">ObservedValue</code> class for some runtime magic and type deduction.</li>
<li>Prototyping <code class="highlighter-rouge">put_observe</code>.</li>
</ul>
<h3 id="neural-networks">Neural Networks:</h3>
<p>We added a new dataset based on MNIST images of the digits <code class="highlighter-rouge">0, 1, 2</code>, along with a factory for neural networks. The model trains nicely; however, using the factory does not allow us to <code class="highlighter-rouge">connect</code> neural layers in a custom manner.</p>
<h5 id="future-work-2">Future work:</h5>
<ul>
<li>A workaround for custom connection of layers. This workaround will also be very helpful in porting more meta examples like <code class="highlighter-rouge">featureblock_logistic_regression</code> etc.</li>
<li>Remove <code class="highlighter-rouge">NeuralNets.i</code> from swig.</li>
</ul>
<h3 id="newtonsvm-refactoring">NewtonSVM Refactoring:</h3>
<p>This was a big spin-off. <code class="highlighter-rouge">CNewtonSVM</code> was implemented in an obsolete manner and we wrote it all over again, making it new and shinier. This included using <code class="highlighter-rouge">linalg</code> for most operations, removing the use of raw pointers in favour of <code class="highlighter-rouge">SGVector</code> and <code class="highlighter-rouge">SGMatrix</code>, and also implementing our <code class="highlighter-rouge">CIterativeMachine</code> code style here. The class is now a lot more readable, along with being a classic example of how <code class="highlighter-rouge">CIterativeMachine</code> works.
We also added the pseudo-inverse of matrices (separate self-adjoint and general implementations) to complete the refactor.</p>
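<p>For reference, the general variant boils down to an SVD where only the non-negligible singular values are inverted. Below is a sketch with Eigen (which shogun’s linalg wraps); this is not the actual <code class="highlighter-rouge">linalg::pinv</code> signature:</p>
<pre><code class="language-c++">#include <Eigen/Dense>

// Moore-Penrose pseudo-inverse via thin SVD: invert the non-negligible
// singular values and recompose. The self-adjoint variant can instead use
// an eigendecomposition, which is cheaper for symmetric matrices.
Eigen::MatrixXd pinv(const Eigen::MatrixXd& A, double tol = 1e-10)
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(
        A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd s = svd.singularValues();
    for (Eigen::Index i = 0; i < s.size(); ++i)
        s(i) = s(i) > tol ? 1.0 / s(i) : 0.0;
    return svd.matrixV() * s.asDiagonal() * svd.matrixU().transpose();
}
</code></pre>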
<h5 id="future-works">Future works:</h5>
<ul>
<li>Enabling the use of <code class="highlighter-rouge">svd_bdc</code> in <code class="highlighter-rouge">linalg::pinv</code> for faster singular value decomposition, and providing an API to get <em>all</em> decomposed matrices.</li>
<li>More efficient memory allocation by turning more temporaries into data members.</li>
<li>Refactoring more such classes</li>
</ul>
<p><a href="http:/blog/gsoc'18/Some-more-ideas-around-shogun">Some more ideas around shogun</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on August 02, 2018.</p>http:/blog/gsoc'18/IterativeMachine2018-08-02T00:00:00+00:002018-08-02T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<h3 id="overview">Overview:</h3>
<p>Iterative Machine enables us to write iterative algorithms that are prematurely stoppable. This means users can cancel the training process at any time. The model is still usable and in a consistent state. This model can then be applied to test data, compared with reference weights and, if needed, can <em>resume</em> training from where it left off earlier. The Iterative Machine framework makes using prematurely stopped models more robust.
The idea here is to have iterative models implement only a single iteration of the main training loop; this is now called from a while loop in the <a href="https://github.com/shogun-toolbox/shogun/tree/develop/src/shogun/machine/IterativeMachine.h#62">CIterativeMachine</a> class.</p>
<h3 id="motivation">Motivation:</h3>
<p>The previous idea was to have different callbacks like <code class="highlighter-rouge">on_next</code>, <code class="highlighter-rouge">on_pause</code> which are called based on the user choice obtained from the <code class="highlighter-rouge">ShogunSignalHandler</code> prompt. This proves a bit restrictive with respect to what the user can actually do. Furthermore, it was not possible to define such behaviour from an interface like Python; the user <em>has</em> to write some C++ code. Therefore, it is a better approach to allow the user to just cancel training whenever they want and then also to resume training from where it was left off. This is more flexible for the user and the developer, as it removes the element of <em>guessing</em> what is meaningful for a user.</p>
<h3 id="implementation-details">Implementation details:</h3>
<p>The <code class="highlighter-rouge">CIterativeMachine</code> is a mixin class. This means it can inherit from some other class which is passed to it through a template argument to its constructors. Iterative models will now inherit from <code class="highlighter-rouge">CIterativeMachine<CMockMachine></code> instead of being a direct subclass of <code class="highlighter-rouge">CMockMachine</code>.</p>
<h5 id="data-members">data members:</h5>
<ul>
<li><code class="highlighter-rouge">m_current_iteration</code>: The current iteration count.</li>
<li><code class="highlighter-rouge">m_max_iteration</code>: Maximum number of iterations allowed.</li>
<li><code class="highlighter-rouge">m_complete</code>: If the model has completed training and converged.
<h5 id="methods">methods:</h5>
</li>
<li><code class="highlighter-rouge">init_model</code>: Virtual, must be written in subclass, this is called before training loop begins to initialize all members.</li>
<li><code class="highlighter-rouge">continue_training</code>: Contains the main training loop which updates <code class="highlighter-rouge">m_current_iteration</code> and called the <code class="highlighter-rouge">iteration</code> method.</li>
<li><code class="highlighter-rouge">iteration</code>: Virtual, This must be written in subclass and implements a single iteration of training loop.</li>
<li><code class="highlighter-rouge">end_training</code>: An optional method called after training to clean member states or giving warnings etc.</li>
</ul>
<h3 id="example">Example:</h3>
<p>Below is a C++ example of a fake iterative model.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <shogun/base/init.h>
#include <shogun/base/some.h>
#include <shogun/labels/BinaryLabels.h>
#include <shogun/machine/IterativeMachine.h>
#include <iostream>
</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">shogun</span><span class="p">;</span>
<span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="p">;</span>
<span class="c1">// Mock Iterative Algorithm which implements fake methods
</span><span class="k">class</span> <span class="nc">MockModel</span> <span class="o">:</span> <span class="k">public</span> <span class="n">CIterativeMachine</span><span class="o"><</span><span class="n">CMachine</span><span class="o">></span>
<span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="n">MockModel</span><span class="p">()</span> <span class="o">:</span> <span class="n">CIterativeMachine</span><span class="o"><</span><span class="n">CMachine</span><span class="o">></span><span class="p">()</span> <span class="p">{}</span>
<span class="o">~</span><span class="n">MockModel</span><span class="p">()</span> <span class="p">{}</span>
<span class="k">protected</span><span class="o">:</span>
<span class="k">virtual</span> <span class="kt">void</span> <span class="n">init_model</span><span class="p">(</span><span class="n">CFeatures</span> <span class="o">*</span> <span class="n">data</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Initialize members
</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">m_max_iterations</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">virtual</span> <span class="kt">void</span> <span class="n">iteration</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">// Single iteration of training loop
</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">m_labels</span><span class="o">-></span><span class="n">get_value</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">virtual</span> <span class="kt">void</span> <span class="n">end_training</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">// clean member variable states or give warnings and information
</span> <span class="n">cout</span><span class="o"><<</span><span class="n">x</span><span class="o"><<</span><span class="n">endl</span><span class="p">;</span>
<span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">protected</span><span class="o">:</span>
<span class="n">float64_t</span> <span class="n">x</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">init_shogun_with_defaults</span><span class="p">();</span>
<span class="c1">// Set up binary labels
</span> <span class="k">auto</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">some</span><span class="o"><</span><span class="n">CBinaryLabels</span><span class="o">></span><span class="p">(</span><span class="n">SGVector</span><span class="o"><</span><span class="n">float64_t</span><span class="o">></span><span class="p">({</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">}));</span>
<span class="n">MockModel</span> <span class="n">a</span><span class="p">;</span>
<span class="n">a</span><span class="p">.</span><span class="n">set_labels</span><span class="p">(</span><span class="n">labels</span><span class="p">);</span>
<span class="n">cout</span><span class="o"><<</span><span class="s">"Training Start..."</span><span class="o"><<</span><span class="n">endl</span><span class="p">;</span>
<span class="n">a</span><span class="p">.</span><span class="n">train</span><span class="p">();</span>
<span class="c1">// Press CTRL+C before training is complete. Another way to stop training
</span> <span class="c1">//is to pass a callback that will trigger can trigger a signal.
</span>
<span class="c1">// Here you can use the pre-trained model. For example we can apply on test data, serialize the model etc.
</span> <span class="n">cout</span><span class="o"><<</span><span class="s">"Resuming Training"</span><span class="o"><<</span><span class="n">endl</span><span class="p">;</span>
<span class="n">a</span><span class="p">.</span><span class="n">continue_train</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>There are two ways to prematurely stop an algorithm: the user can press <code class="highlighter-rouge">CTRL+C</code>, or the user can write a callback that will trigger a signal. For more details on the second method see <a href="https://github.com/shogun-toolbox/shogun/pull/4293">this patch</a>. From Python the code will look like:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">shogun</span> <span class="kn">import</span> <span class="n">Perceptron</span>
<span class="n">Perceptron</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">feats</span><span class="p">)</span>
<span class="c"># Press CTRL+C and you will see something like</span>
<span class="c"># [ShogunSignalHandler] Immediately return to prompt / Prematurely finish computations / Pause current computation / Do nothing (I/C/P/D)?</span>
<span class="c"># Type "C"</span>
<span class="c"># Perform operations like apply on test data, save current model etc</span>
<span class="n">Perceptron</span><span class="o">.</span><span class="n">continue_train</span><span class="p">()</span></code></pre></figure>
<h3 id="applying-iterative-machine-to-more-algorithms">Applying Iterative Machine to more Algorithms:</h3>
<p>To use the features of <code class="highlighter-rouge">CIterativeMachine</code> with a new Algorithm we can make the following changes:</p>
<ul>
<li>Use existing machine members (e.g. <code class="highlighter-rouge">m_w</code> and <code class="highlighter-rouge">bias</code> of <code class="highlighter-rouge">CLinearMachine</code> for weights and bias) instead of local copies. If there are corresponding local members present they must be removed. This makes sure the model updates its state every iteration.</li>
<li>Identify the main training loop. This is where the magic is happening.</li>
<li>Everything above the loop in the training process is a likely candidate for the <code class="highlighter-rouge">init_model</code> method.</li>
<li>The contents of the loop represent a single iteration, hence they go into <code class="highlighter-rouge">iteration</code>.</li>
</ul>
<p>And you are all set.</p>
<h3 id="future-work">Future Work:</h3>
<ul>
<li>Porting all iterative models to this code style is the aim here. A list of Iterative Algorithms is available <a href="https://github.com/shogun-toolbox/shogun/wiki/List-of-iterative-algorithms">here</a>.</li>
<li>Automated tests for all iterative machines. These will include tests for the correctness of a prematurely stopped model, along with a test to make sure that the model updates its state each iteration.</li>
</ul>
<p><a href="http:/blog/gsoc'18/IterativeMachine">Iterative Machine Guide</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on August 02, 2018.</p>http:/blog/gsoc'18/Feature type dispatching2018-08-02T00:00:00+00:002018-08-02T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<h3 id="overview">Overview:</h3>
<p>Most algorithms in shogun do not behave in a generic manner, in the sense that they are <em>type dependent</em>. The <code class="highlighter-rouge">train</code> method can accept any type of features as a <code class="highlighter-rouge">CFeatures*</code> pointer; however, it is later assumed that the features provided are of a particular type. We introduced feature dispatching to enable more generic behaviour in an automated way. Some algorithms (like <code class="highlighter-rouge">CLDA</code>) already try to take care of types and implement a templated <code class="highlighter-rouge">train_machine</code>. We take that idea and give it its own space in shogun <a href="https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/machine/FeatureDispatchCRTP.h">here</a>.</p>
<h3 id="motivation">Motivation:</h3>
<p>Least Angle Regression is an iterative algorithm that tries to stay type independent. This is a good idea in cases where a small feature matrix can be scaled down to the <code class="highlighter-rouge">float32_t</code> type. Implementing our Iterative Machine code style here was also a problem, since it meant having to perform type dispatching in <em>every iteration</em>. This is obviously redundant; even if it is cheap, it is not good code style. The idea here is to dispatch the feature type in the base class (<code class="highlighter-rouge">CMachine</code>) so that when we start the training loop, types are already taken care of.</p>
<p>An idea to solve such a problem could be using a hierarchy and then making a child class aware of the templated types, with other subclasses overloading virtual methods. This idea will not work because we cannot have virtual methods that are templated: once the run-time system figured out it would need to call a templated virtual function, compilation is already done and the compiler cannot generate the appropriate instance anymore.<br />
Hence, a mixin is a better idea here.</p>
<h3 id="implementation-details-and-design-choice">Implementation details and Design choice:</h3>
<p>The <code class="highlighter-rouge">CDenseRealDispatch</code> is a class to dispatch dense feature types in <code class="highlighter-rouge">FeatureDispatchCRTP.h</code>. It is a mixin class that takes two template arguments. First is all the members of the base class hence we definitely need to inherit that. The second is <em>something</em> to bring the templated version of <code class="highlighter-rouge">train_machine</code> in scope <em>up</em> the inheritance ladder. Hence we inherit it from the subclass itself. This is possible due to the concept of Curiously Recursive Template Pattern(<code class="highlighter-rouge">CRTP</code>). <code class="highlighter-rouge">C++</code> is lazy, this means a pointer for a class is available to use even <em>before</em> it is declared. In other words, a call to a member method of such a class does not need to be instantiated until the function is actually <em>called</em>. This is diffrent from a normal mixin approach that uses a single template argument because not all the methods can be collected with the help of a single class.</p>
<p>Classes (like <code class="highlighter-rouge">CLDA</code>, <code class="highlighter-rouge">CLeastAngleRegression</code>) which support dynamic dispatching via the mixin will inherit from <code class="highlighter-rouge">CDenseRealDispatch<CMockModel, CBaseMachine></code> instead of directly inheriting from <code class="highlighter-rouge">CBaseMachine</code>.</p>
<h5 id="methods">Methods:</h5>
<ul>
<li><code class="highlighter-rouge">train_dense</code>: Virtual, the method is written in <code class="highlighter-rouge">CDenseRealDispatch</code> called if the feature class of data pointer is <code class="highlighter-rouge">C_DENSE</code>. In the dispatcher this calls <code class="highlighter-rouge">train_machine_templated</code> of model with appropriate type.</li>
</ul>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">train_dense</span><span class="p">(</span><span class="n">CFeatures</span><span class="o">*</span> <span class="n">data</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">auto</span> <span class="n">this_casted</span> <span class="o">=</span> <span class="k">this</span><span class="o">-></span><span class="k">template</span> <span class="n">as</span><span class="o"><</span><span class="n">P</span><span class="o">></span><span class="p">();</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">data</span><span class="o">-></span><span class="n">get_feature_type</span><span class="p">())</span>
<span class="p">{</span>
<span class="k">case</span> <span class="n">F_DREAL</span><span class="p">:</span>
<span class="k">return</span> <span class="n">this_casted</span><span class="o">-></span><span class="k">template</span> <span class="n">train_machine_templated</span><span class="o"><</span><span class="n">float64_t</span><span class="o">></span><span class="p">(</span>
<span class="n">data</span><span class="o">-></span><span class="n">as</span><span class="o"><</span><span class="n">CDenseFeatures</span><span class="o"><</span><span class="n">float64_t</span><span class="o">>></span><span class="p">());</span>
<span class="k">case</span> <span class="n">F_SHORTREAL</span><span class="p">:</span>
<span class="k">return</span> <span class="n">this_casted</span><span class="o">-></span><span class="k">template</span> <span class="n">train_machine_templated</span><span class="o"><</span><span class="n">float32_t</span><span class="o">></span><span class="p">(</span>
<span class="n">data</span><span class="o">-></span><span class="n">as</span><span class="o"><</span><span class="n">CDenseFeatures</span><span class="o"><</span><span class="n">float32_t</span><span class="o">>></span><span class="p">());</span>
<span class="k">case</span> <span class="n">F_LONGREAL</span><span class="p">:</span>
<span class="k">return</span> <span class="n">this_casted</span>
<span class="o">-></span><span class="k">template</span> <span class="n">train_machine_templated</span><span class="o"><</span><span class="n">floatmax_t</span><span class="o">></span><span class="p">(</span>
<span class="n">data</span><span class="o">-></span><span class="n">as</span><span class="o"><</span><span class="n">CDenseFeatures</span><span class="o"><</span><span class="n">floatmax_t</span><span class="o">>></span><span class="p">());</span>
<span class="nl">default:</span>
<span class="n">SG_SERROR</span><span class="p">(</span>
<span class="s">"Training with %s of provided type %s is not "</span>
<span class="s">"possible!"</span><span class="p">,</span>
<span class="n">data</span><span class="o">-></span><span class="n">get_name</span><span class="p">(),</span>
<span class="n">feature_type</span><span class="p">(</span><span class="n">data</span><span class="o">-></span><span class="n">get_feature_type</span><span class="p">()).</span><span class="n">c_str</span><span class="p">());</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<ul>
<li><code class="highlighter-rouge">train_string</code>: Virtual, this is similar to <code class="highlighter-rouge">train_dense</code> but it dispatches string types like <code class="highlighter-rouge">uint8_t</code>, <code class="highlighter-rouge">char</code>.</li>
<li><code class="highlighter-rouge">train_machine_templated</code>: This is a templated version of <code class="highlighter-rouge">train_machine</code> written in subclass. It is called with appropriate parameter by the dispatcher.</li>
</ul>
<p>These methods keep the feature class check in the base class and perform feature type checks in the mixin. This keeps dense and string features separate.</p>
<p>There is also an added detail: any class that implements feature type dispatching needs to pass features when calling <code class="highlighter-rouge">train()</code>. This is something we want to enforce all over shogun, and the mixin seemed a good place to start.</p>
<h3 id="example-and-tests">Example and Tests:</h3>
<p>A cookbook on how to use a class that supports dispatching is <a href="">here</a>.
The tests for dynamic dispatch use a fake model that returns <em>true</em> when a particular feature type is passed. The expected feature type is provided in the constructor.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">CDenseRealMockMachine</span>
<span class="o">:</span> <span class="k">public</span> <span class="n">CDenseRealDispatch</span><span class="o"><</span><span class="n">CDenseRealMockMachine</span><span class="p">,</span> <span class="n">CMachine</span><span class="o">></span>
<span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="n">CDenseRealMockMachine</span><span class="p">(</span><span class="n">EFeatureType</span> <span class="n">f</span><span class="p">)</span>
<span class="o">:</span> <span class="n">CDenseRealDispatch</span><span class="o"><</span><span class="n">CDenseRealMockMachine</span><span class="p">,</span> <span class="n">CMachine</span><span class="o">></span><span class="p">()</span>
<span class="p">{</span>
<span class="n">m_expected_feature_type</span> <span class="o">=</span> <span class="n">f</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">~</span><span class="n">CDenseRealMockMachine</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="kt">bool</span> <span class="n">train_machine_templated</span><span class="p">(</span><span class="n">CDenseFeatures</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span> <span class="n">data</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">data</span><span class="o">-></span><span class="n">get_feature_type</span><span class="p">()</span> <span class="o">==</span> <span class="n">m_expected_feature_type</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">virtual</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">get_name</span><span class="p">()</span> <span class="k">const</span>
<span class="p">{</span>
<span class="k">return</span> <span class="s">"CDenseRealMockMachine"</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">EFeatureType</span> <span class="n">m_expected_feature_type</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>This is then tested with a few feature types for each dispatcher.</p>
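<p>Usage in a test then looks roughly like this (a sketch continuing the mock above; the actual unit test in the PR may differ):</p>
<pre><code class="language-c++">// The mock expects F_DREAL, so training with float64 dense features passes.
auto machine = some<CDenseRealMockMachine>(F_DREAL);
auto features = some<CDenseFeatures<float64_t>>(SGMatrix<float64_t>(2, 2));
EXPECT_TRUE(machine->train(features));
</code></pre>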
<h3 id="applying-dispatchers-to-more-classes">Applying Dispatchers to more Classes:</h3>
<p>To implement dense dispatching in more algorithms we can make the following changes:</p>
<ul>
<li>Port <code class="highlighter-rouge">train_machine</code> to its templated version <code class="highlighter-rouge">train_machine_templated</code>. This can be a bit tricky: it involves making the implementation fully templated and making sure the dispatched types are respected.</li>
<li>Inherit from the mixin instead of directly inheriting from base class.
<blockquote>
<ul>
<li>class CMockModel : public CMockMachine</li>
<li>class CMockModel : public CDenseRealDispatch<CMockModel, CMockMachine></li>
</ul>
</blockquote>
</li>
<li>Declare the dispatcher as a friend of the class. This is done so that the protected <code class="highlighter-rouge">train_machine_templated</code> is accessible from the dispatcher’s scope.</li>
</ul>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">friend</span> <span class="k">class</span> <span class="nc">CMockModel</span> <span class="o">:</span> <span class="k">public</span> <span class="n">CDenseRealDispatch</span><span class="o"><</span><span class="n">CMockModel</span><span class="p">,</span> <span class="n">CMockMachine</span><span class="o">></span></code></pre></figure>
<p>The idea is the same for <code class="highlighter-rouge">CStringFeaturesDispatch</code>.
With these changes in place you have a fully templated model.</p>
<p>Writing a dispatcher for a new feature class F can be done by:</p>
<ul>
<li>Identify which feature types make sense to dispatch. This depends heavily on the feature class we pick.</li>
<li>Add a new method, <code class="highlighter-rouge">train_F</code> or another suitable name, to CMachine and update <code class="highlighter-rouge">CMachine::train()</code> to dispatch on the new feature class.</li>
<li>Add a new dispatcher class to <code class="highlighter-rouge">FeatureDispatcherCRTP.h</code> in the same way as the existing ones (see the sketch after this list).</li>
<li>Implement it in an algorithm or write a unit test.</li>
</ul>
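<p>To make the shape concrete, here is a small self-contained sketch of the whole flow for a hypothetical feature class F. Every name below (FFeatures, FDispatch, train_F) is a stand-in, not shogun’s actual API; the real dispatchers live in <code class="highlighter-rouge">FeatureDispatcherCRTP.h</code>:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;cstdio&gt;

enum EFeatureType { F_SHORTREAL, F_DREAL };

struct Features
{
    virtual ~Features() = default;
    virtual EFeatureType get_feature_type() const = 0;
};

template &lt;typename T&gt;
struct FFeatures : Features
{
    EFeatureType get_feature_type() const override
    {
        return sizeof(T) == 8 ? F_DREAL : F_SHORTREAL;
    }
};

struct Machine
{
    virtual ~Machine() = default;
    // CMachine::train() would branch here on the feature class and call
    // the matching train_F hook
    bool train(Features* data) { return train_F(data); }
    virtual bool train_F(Features*) { return false; }
};

// the CRTP mixin: P is the subclass itself, M the base class it extends
template &lt;typename P, typename M&gt;
struct FDispatch : M
{
    bool train_F(Features* data) override
    {
        auto* self = static_cast&lt;P*&gt;(this);
        switch (data-&gt;get_feature_type())
        {
        case F_DREAL:
            return self-&gt;template train_machine_templated&lt;double&gt;(
                static_cast&lt;FFeatures&lt;double&gt;*&gt;(data));
        case F_SHORTREAL:
            return self-&gt;template train_machine_templated&lt;float&gt;(
                static_cast&lt;FFeatures&lt;float&gt;*&gt;(data));
        }
        return false;
    }
};

// the model only implements the templated training, no duplicated dispatch
struct MockModel : FDispatch&lt;MockModel, Machine&gt;
{
    template &lt;typename T&gt;
    bool train_machine_templated(FFeatures&lt;T&gt;*)
    {
        std::printf("dispatched on a %zu-byte type\n", sizeof(T));
        return true;
    }
};

int main()
{
    FFeatures&lt;double&gt; feats;
    MockModel model;
    return model.train(&amp;feats) ? 0 : 1;
}</code></pre></figure>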
<h3 id="future-work">Future Work:</h3>
<ul>
<li>Add more dispatchers with tests, and roll the dispatchers out all over shogun.</li>
<li>A nice design improvement would be an automated way to create a new dispatcher class, or a workaround so that we don’t need as many dispatchers as we have feature types.</li>
</ul>
<p><a href="http:/blog/gsoc'18/Feature-type-dispatching">Feature Dispatching Guide</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on August 02, 2018.</p>http:/blog/weekly%20updates/Post-122018-07-23T00:00:00+00:002018-07-23T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>This week we completed the mixin and merged it into develop.</p>
<p>We also made <code class="highlighter-rouge">train_machine_templated</code> protected again by making the mixin base class a friend of its subclasses.<br />
Also, <code class="highlighter-rouge">CWDSVMOcas</code> is not the best candidate for string features since it actually asserts a particular feature type, so we had to drop the new changes there. <br />
We added a unit test for the dispatcher. The strategy is to make a fake machine that implements a templated train_machine. The model returns true if the feature type received from the <code class="highlighter-rouge">train</code> call matches the expected feature type, which is set in the machine constructor. <br />
It all worked out pretty well. <br />
We saw a few problems, like the fact that we will need a new mixin class for each feature class we dispatch on. There are a lot of feature classes, so this does feel a bit redundant, although it also makes sense to keep different feature classes separate. We will think about this a bit more. <br />
Another problem is when train_machine_templated is called with an <code class="highlighter-rouge">illegal</code> type parameter. Such an error is not caught and causes compile problems downstream. A solution is to add another type parameter to train_machine_templated that defaults to allowing only certain types, like floating-point types. When train_machine_templated is called with a type that is not allowed, we can throw a ShogunException and avoid messy compiler errors (see the sketch below).<br />
Thinking about this a bit more, we realized that we need to separate arithmetic types from floating-point types. This means a new mixin class for arithmetic types, and the problem might just scale upwards as we introduce more feature dispatchers.</p>
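<p>A minimal sketch of the restricted-type idea (illustrative only; <code class="highlighter-rouge">std::invalid_argument</code> stands in for ShogunException and the real solution may look different):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;stdexcept&gt;
#include &lt;type_traits&gt;

// the enabled overload accepts floating-point types only; the fallback
// catches every other instantiation and throws at runtime, so the user
// gets one clear error instead of messy compiler output from deep
// inside the training code
template &lt;typename T&gt;
typename std::enable_if&lt;std::is_floating_point&lt;T&gt;::value, bool&gt;::type
train_machine_templated(const T* /*features*/)
{
    return true; // the real templated training would run here
}

template &lt;typename T&gt;
typename std::enable_if&lt;!std::is_floating_point&lt;T&gt;::value, bool&gt;::type
train_machine_templated(const T* /*features*/)
{
    throw std::invalid_argument("feature type not supported for training");
}

int main()
{
    double d = 1.0;
    train_machine_templated(&amp;d); // dispatches fine
    try
    {
        int i = 1;
        train_machine_templated(&amp;i); // compiles, but throws at runtime
    }
    catch (const std::invalid_argument&amp;)
    {
        return 0;
    }
    return 1;
}</code></pre></figure>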
<p>I also worked on a patch for the cookbook on convolutional neural networks. This was a fun patch. First I created a dataset from images of 0, 1, 2 by reading them with opencv and writing them down as matrices. I used some default parameters along with creating two factories, <code class="highlighter-rouge">neural_networks</code> and <code class="highlighter-rouge">neural_layer</code>. Initially the network did not behave nicely because of misleading parameters. I will work on this next week.</p>
<h3 id="contributions">Contributions</h3>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4373">Feature type dispatching through recursive mixin</a><br />
<a href="https://github.com/shogun-toolbox/shogun/pull/4386">Neural Layers Cookbook</a></p>
<p><a href="http:/blog/weekly%20updates/Post-12">Week 10</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on July 23, 2018.</p>http:/blog/weekly%20updates/Post-112018-07-16T00:00:00+00:002018-07-16T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>This week we came up with a second idea for feature dispatching. The earlier approach was to use macros to generate function names for the train_dense and train_string calls. This is not very automated, and macros are hard to debug since it can be difficult to extract useful information from them. We realized the problem is solvable via mixins. The idea is to have each machine implement a templated version of <code class="highlighter-rouge">train_machine</code>.</p>
<p>This will be called by the mixin depending on the feature type. However, since it is not possible to have virtual methods that are templated, we found a workaround for our case. We write the mixins as <code class="highlighter-rouge">CRTP</code> classes. CRTP, the Curiously Recurring Template Pattern, is a piece of C++ magic that relies on the compiler’s lazy instantiation of template member function calls. For our case this means we can pass the class name as a parameter to its own base class! As long as a method of the subclass is not called, the compiler need not give an error over this. We can now call the templated train_machine from the base class without making it virtual. The only issue I see is that train_machine_templated needs to be public now.</p>
<p>We will have a different dispatcher for each feature class. The subclasses inherit from the dispatcher, which accepts two template arguments: one is the subclass itself (this is the magic of CRTP), the second is a base class like LinearMachine. We implemented this for dense features and used the dispatcher in <code class="highlighter-rouge">CLeastAngleRegression</code> and <code class="highlighter-rouge">CLDA</code>.</p>
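<p>The trick in a nutshell, as a toy sketch independent of shogun:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">// the base class receives the derived class as a template argument, so it
// can call a templated (hence non-virtual) method on it directly
template &lt;typename Derived&gt;
struct TrainDispatch
{
    bool train()
    {
        // only type-checked when instantiated, by which time Derived is
        // a complete type -- this is the "compiler laziness" above
        return static_cast&lt;Derived*&gt;(this)
            -&gt;template train_templated&lt;double&gt;();
    }
};

struct ToyMachine : TrainDispatch&lt;ToyMachine&gt;
{
    template &lt;typename T&gt;
    bool train_templated() { return sizeof(T) == 8; }
};

int main()
{
    ToyMachine m;
    return m.train() ? 0 : 1;
}</code></pre></figure>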
<p>The macro version of the solution had the merit that the feature class dispatching was done in <code class="highlighter-rouge">CMachine</code>. We combined this idea with the mixin:
we added a method train_dense which is called if dense features are provided and the machine supports dispatching. The idea remains the same for string features.
train_dense is implemented in our <code class="highlighter-rouge">CDenseRealDispatch</code> mixin. It checks the feature type and calls <code class="highlighter-rouge">train_machine_templated</code> with the appropriate template argument.
We also implemented string feature dispatching in <code class="highlighter-rouge">CWDSVMOcas</code>.</p>
<p>We also wrote a <code class="highlighter-rouge">feature_name</code> method which can generate a std::string for various feature classes.</p>
<h3 id="contributions">Contributions</h3>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4373">Feature type dispatching through recursive mixin</a></p>
<p><a href="http:/blog/weekly%20updates/Post-11">Week 9</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on July 16, 2018.</p>http:/blog/weekly%20updates/Post-102018-07-09T00:00:00+00:002018-07-09T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>This week we made some major refactors to the NewtonSVM class.</p>
<p>These include:</p>
<ul>
<li>Cleaning up all raw pointers and using SGVector and SGMatrix instead.</li>
<li>Using linalg instead of hand-written SGVector/SGMatrix ops.</li>
<li>Making NewtonSVM iterative.</li>
<li>Calculating bias and weights separately; until now both were kept in a single matrix.</li>
<li>Using the weights member of LinearMachine instead of a local member. This ensures the model is usable while paused.</li>
</ul>
<p>Next we worked on implementing the pseudo-inverse in linalg.
Any m x n matrix A can be decomposed as A = U S V<sup>T</sup>. If A is a self-adjoint positive semi-definite matrix, then a self-adjoint eigensolver can be used to calculate S and U (in this case A = U S U<sup>T</sup>), and the pseudo-inverse is A<sup>+</sup> = U S<sup>+</sup> U<sup>T</sup>, where S<sup>+</sup> inverts the non-zero entries of S. We already have a symmetric eigensolver in linalg, so we used it here.
For a general m x n matrix we use the singular value decomposition of A and calculate the pseudo-inverse as A<sup>+</sup> = V S<sup>+</sup> U<sup>T</sup>, where S<sup>+</sup> transposes S and inverts its non-zero singular values. This needed to be implemented directly on top of the Eigen backend (see the sketch below).</p>
<p>With this, all of the refactoring of NewtonSVM was completed.
We also worked on a systematic way to test all iterative machines: since they inherit from IterativeMachine, we can use ctags to sort them out and apply our test to them.</p>
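<p>For the general case, a sketch in plain Eigen (not the linalg wrapper itself; the tolerance value is an arbitrary choice):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;Eigen/Dense&gt;
#include &lt;iostream&gt;

// pseudo-inverse of a general m x n matrix via SVD: A+ = V * S+ * U^T,
// where S+ inverts the singular values above a tolerance and zeroes the rest
Eigen::MatrixXd pinv(const Eigen::MatrixXd&amp; A, double tol = 1e-10)
{
    Eigen::JacobiSVD&lt;Eigen::MatrixXd&gt; svd(
        A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd s = svd.singularValues();
    for (int i = 0; i &lt; s.size(); ++i)
        s(i) = (s(i) &gt; tol) ? 1.0 / s(i) : 0.0;
    return svd.matrixV() * s.asDiagonal() * svd.matrixU().transpose();
}

int main()
{
    Eigen::MatrixXd A(3, 2);
    A &lt;&lt; 1, 2, 3, 4, 5, 6;
    std::cout &lt;&lt; pinv(A) &lt;&lt; "\n";
}</code></pre></figure>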
<h3 id="contributions">Contributions</h3>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4335">NewtonSVM</a><br />
<a href="https://github.com/shogun-toolbox/shogun/pull/4335">pinv in linalg</a><br />
<a href="https://github.com/shogun-toolbox/shogun/pull/4335">Iterative machine test</a><br /></p>
<p><a href="http:/blog/weekly%20updates/Post-10">Week 8-End of Phase 2</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on July 09, 2018.</p>http:/blog/weekly%20updates/Post-92018-07-02T00:00:00+00:002018-07-02T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>This week we worked on a few more cleanups for our mixin along with finally merging it into develop.<br />
We also wrote a cookbook for converters along with adding a factory for them.<br /></p>
<p>We decided to apply our mixin to another iterative model as an additional example.
For this we chose NewtonSVM, a LinearMachine. We updated the class a bit and discovered a bug
in IterativeMachine where end_training() was not called in case of premature stopping; that was fixed in this PR.<br />
<br />
The code of NewtonSVM is old, with a lot of raw pointers for memory allocation instead of SGVector and SGMatrix, along with
many places that should use linalg instead of for loops.<br />
<br />
As a start we applied IterativeMachine to it as is. This requires making everything that is shared across iterations
a data member of the subclass and initializing it to avoid memory leaks,
then implementing the init_model, iteration and end_training methods.<br />
<br />
We also worked on the benchmark for “put” this week. To cover various test cases we wrote a std::function member that can be given a
lambda implementing each flavour of update (see the sketch below). The benchmark shows that there is negligible overhead when updating members with
put instead of plain assignment. This will be used together with ParameterObserver in subsequent weeks.
<br />
<br />
Another example for IterativeMachine is Least Angle Regression. To port it to the IterativeMachine framework we will need to make iteration templated.
Currently most classes in shogun do not deal with feature dispatching at all, and it is done in a redundant manner in the classes that do, like LDA and LARS.
We tried a few things and will be using mixins to solve this problem.</p>
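<p>The rough shape of that benchmark (a hypothetical Google Benchmark sketch, not the actual PR code):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;benchmark/benchmark.h&gt;
#include &lt;functional&gt;

static double weight = 0.0;

// each case supplies a lambda performing one flavour of update, so a
// single driver covers assignment, put(), and whatever else we add later
static void run_update_case(
    benchmark::State&amp; state, std::function&lt;void()&gt; update)
{
    for (auto _ : state)
        update();
}

BENCHMARK_CAPTURE(run_update_case, plain_assignment, [] { weight = 1.0; });
BENCHMARK_CAPTURE(run_update_case, via_put, [] {
    // machine-&gt;put("weight", 1.0) would go here
    weight = 1.0;
});
BENCHMARK_MAIN();</code></pre></figure>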
<h3 id="contributions">Contributions</h3>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4335">NewtonSVM</a><br />
<a href="https://github.com/shogun-toolbox/shogun/pull/4335">Iterative Machine</a><br />
<a href="https://github.com/shogun-toolbox/shogun/pull/4335">Benchmark for put</a><br />
<a href="https://github.com/shogun-toolbox/shogun/pull/4335">Coverter factory</a><br /></p>
<p><a href="http:/blog/weekly%20updates/Post-9">Week 7</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on July 02, 2018.</p>http:/blog/weekly%20updates/Post-82018-06-25T00:00:00+00:002018-06-25T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>This week we refined and completed the IterativeMachine class as a mixin.</p>
<p>We focused on Perceptron as an example to see how things would actually look
when we implement our idea.<br />
The flow begins when the user calls “train” on an iterative machine, which is Perceptron for us.
The train method is implemented in CMachine and calls train_machine with the provided data pointer.<br />
The mixin has three data members: m_current_iteration, m_max_iteration and m_complete.<br />
Normally train_machine is implemented in the subclasses and contains the training process of an algorithm.
With IterativeMachine, the mixin implements train_machine for its subclasses,
and the subclasses instead implement three methods:</p>
<ul>
<li>init_model: the subclass initializes its members here, along with any other ops that need to happen before training begins.</li>
<li>iteration: the actual iteration is implemented here.</li>
<li>end_training: an optional method for additional error handling and/or cleaning up member state.</li>
</ul>
<p>Data is shared between the three methods through data members of the subclass. Doing this has an additional advantage:
we are forced to write code that uses the already present data members, like m_w (weights) and bias of CLinearMachine.
The state is hence updated automatically every iteration, which keeps the model “current” while paused.<br />
<br />
train_machine calls init_model with the features pointer, which initializes the model parameters.
Next it calls continue_train, the new element of our class: it holds the while loop
that runs until convergence or the maximum number of iterations, calling the subclass’s iteration method again and again.<br />
<br />
When the user decides to prematurely cancel the computation, control is returned to the caller. The user can now do whatever is needed:
for example, serialize the incomplete model for comparisons or apply the model to some test data, and then simply call
continue_train to resume training (see the sketch below).<br />
<br />
The pull request includes a test in Perceptron that shows the general idea of how things work now.</p>
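<p>Putting it together, the mixin looks roughly like this (a simplified, self-contained sketch, not the exact shogun code; the member names mirror the post, everything else is a stand-in):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;iostream&gt;

template &lt;typename Base&gt;
class IterativeMachine : public Base
{
public:
    bool train()
    {
        init_model();
        return continue_train();
    }

    bool continue_train()
    {
        // the real mixin also runs the computation controllers here,
        // which hand control back to the user on CTRL+C
        while (!m_complete &amp;&amp; m_current_iteration &lt; m_max_iteration)
        {
            iteration();
            ++m_current_iteration;
        }
        end_training();
        return m_complete;
    }

protected:
    virtual void init_model() = 0;
    virtual void iteration() = 0;
    virtual void end_training() {}

    int m_current_iteration = 0;
    int m_max_iteration = 100;
    bool m_complete = false;
};

struct LinearMachine
{
    double m_w = 0; // weights live in the base class
};

struct ToyPerceptron : IterativeMachine&lt;LinearMachine&gt;
{
    void init_model() override { m_w = 0; }
    void iteration() override
    {
        m_w += 0.1; // state is updated every iteration, so a paused
                    // model is always "current"
        if (m_w &gt; 1.0)
            m_complete = true;
    }
};

int main()
{
    ToyPerceptron p;
    std::cout &lt;&lt; (p.train() ? "converged" : "stopped early") &lt;&lt; "\n";
}</code></pre></figure>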
<h3 id="contributions">Contributions</h3>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4335">Iterative Machine</a></p>
<p><a href="http:/blog/weekly%20updates/Post-8">Week 6</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on June 25, 2018.</p>http:/blog/weekly%20updates/Post-72018-06-18T00:00:00+00:002018-06-18T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>Phase one of GSoC is over this week. We have had a terrific run.</p>
<p>In this post we will look at how we are going to implement our IterativeMachine class and a few issues we faced
along with how we tackled them.</p>
<p>The class overrides train_machine, which calls a virtual method init_model that performs everything needed by the main training
loop. Communication between the training loop and init_model is done through member variables.
The next component is the continue_train method. This is where we have our COMPUTATION_CONTROLLERS macro and the main training loop,
which runs the single iteration implemented in the model and updates state after each iteration.</p>
<p>One issue was how to deal with features and labels. For labels we used the already present member in CMachine.
For features we added a new m_continue_features member as an extra. This is intended to keep things between IterativeMachine
and base classes like LinearMachine apart.</p>
<p>For now we only have an IterativeLinearMachine class that inherits from LinearMachine, and we implemented
the idea in Perceptron. We will build on it this week.</p>
<p>For end-of-training ops like cleaning/resetting state, warnings, errors etc. we have an end_training() method that can be overridden
in the subclass.</p>
<p>The final problem is where to place the IterativeMachine class in the inheritance ladder. The obvious solution appears
to be multiple inheritance, but we cannot do that since there would be some function overloading involved and it proved tough.
Another option is to implement IterativeMachine as a mixin for machines. A mixin is a class that inherits from
another class chosen dynamically through a template argument. It is not a standalone class but adds more things to a base class.
The result is a custom base class, exactly what we want. This keeps the iterative framework minimal. We might not need the extra
feature member anymore. Also the class will keep up with all API changes we introduce later, because it inherits almost everything
from the original base classes.</p>
<p>I have implemented this as a work in progress and it is working nicely. I have issues using it in interfaces and that is something
we will be looking into this week.</p>
<p>Another thing I worked on this week is a benchmark for “put” using Perceptron; it gave mixed results and more digging is required.</p>
<h3 id="contributions">Contributions</h3>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4335">Iterative Machine</a></p>
<p><a href="http:/blog/weekly%20updates/Post-7">Week 5-End of Phase 1</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on June 18, 2018.</p>http:/blog/weekly%20updates/Post-62018-06-11T00:00:00+00:002018-06-11T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>This week we dug deeply into the implementation of our framework and made a few changes.</p>
<p>After merging <a href="https://github.com/shogun-toolbox/shogun/pull/4320">#4320</a> the Perceptron is ready to implement an on_pause_impl()
of its own.</p>
<p>To start this off I wrote some simple code to serialize a machine whenever the user chooses pause after pressing CTRL+C.
The idea here was to simply allow a user to serialize the model into a CFile* member of the StoppableObject class. If the user wanted to
do something else on pause he could override the on_pause_impl() method and do it.</p>
<p>This had a few problems:</p>
<ul>
<li>Most of the time the file member will remain unused, which is bad design.</li>
<li>Problems with file_name and overwriting of files.</li>
<li>Tests: we had earlier worked on reusing the serialization tests and including a test for whether a model is stoppable.
However, we also need a test to check whether a model updates its state properly in each iteration. We came up with one, but it
still had its own limitations.</li>
<li>Interfaces like Python cannot directly overload the on_pause_impl() method. The user has no choice but to write C++ code if he wants to do something else on pause.</li>
</ul>
<p>With this in mind, the current approach needed more thought. So we decided to introduce the IterativeMachine class to shogun.
This will allow the user to simply cancel computation, perform whatever is needed on the intermediate model and then,
if desired, continue training again.
This approach solves all of the problems above.</p>
<p>We no longer need to “predict” and prepare for what the user might ideally want to do, so there is no need for extra members in StoppableObject.
Any issues with filenames are now directly under the user’s control, which makes it all a lot more transparent.</p>
<p>Implementing this means changing all iterative algorithms so that they only have a method to run a single iteration,
rather than the whole thing at once. A state that is updated in each iteration is then almost necessary for the model to train successfully,
so we don’t need complicated tests for that anymore.</p>
<p>But the best problem this approach solves is the easy flow of code into the interfaces. Earlier we were looking into using director classes to allow interfaces to
overload a method, but that feels like overkill.
Now all the user in Python needs to do is stop the training process with CTRL+C, perform whatever is needed, then call continue_train() again.
Much simpler to implement and more flexible.</p>
<h3 id="contributions">Contributions</h3>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4330">helper to serialize machine to ascii</a><br /></p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4335">Iterative Machine</a></p>
<p><a href="http:/blog/weekly%20updates/Post-6">Week 4</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on June 11, 2018.</p>http:/blog/weekly%20updates/Post-52018-06-04T00:00:00+00:002018-06-04T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>This week we looked into how the algorithms fail to update their state properly in each iteration.
Because of this we cannot expect meaningful information during a pause, since the model will just return its initial values.
This needs to be fixed for each algorithm.</p>
<p>That is, in each iteration the algorithm needs to update its state so that whenever we pause we get the current values.
We have implemented this for Perceptron.</p>
<p>Along the way we also wrote a test for hyperparameter initialization of the model as a jump start.
This will need to be addressed for all iterative algorithms.</p>
<p>I also refactored the trained-model serialization test to include a StoppableObject test that implements a basic check of an algorithm’s
iterative nature.
This led to interesting results. First, we cannot use this approach for algorithms that do not need a second iteration to converge; we will need to skip
these algorithms in the tests for now.
Second, we need to use break instead of continue in the COMPUTATION_CONTROLLERS macro.
The continue version turned training loops into infinite loops, as the loop condition is never updated and
the while statement never terminates (see the snippet below).</p>
<p>This was an easy fix but troublesome to find.</p>
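<p>The infinite loop is easy to see in a stripped-down version of such a training loop (illustrative):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;atomic&gt;

std::atomic&lt;bool&gt; cancel_requested{false}; // set from the signal handler

void train_loop(int max_iterations)
{
    int current = 0;
    while (current &lt; max_iterations)
    {
        // conceptually what the macro expands to: with `continue` the
        // increment below is skipped, the loop condition never changes
        // once cancellation is requested, and the loop spins forever;
        // `break` leaves the loop immediately
        if (cancel_requested)
            break;

        // iteration() would run here
        ++current;
    }
}

int main()
{
    train_loop(100);
}</code></pre></figure>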
<p>Another task this week was a nice refactor of the progress bar code that changes how it is used all over shogun.
We now have a prefix like class_name::method_name by default.</p>
<p>This was done by instantiating the progress bar from a macro and using the <code class="highlighter-rouge">__FUNCTION__</code> macro to obtain the function name.</p>
<p>We now have much broader questions to ask, like what constitutes the “state” of a model and how we plan to update it each iteration.</p>
<p>This week we also found a new issue in garbage collection with the factory API… or rather the absence of it. I tried adding
%newobject to have swig take ownership of the newly created factory objects but it is not working. We will need to investigate further.</p>
<h3 id="contributions">Contributions:</h3>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4305">Progress bar in iterative algorithms</a>.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4327">Pausing Unit Test</a>.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4322">Garbage collection in swig</a>.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4320">State Update in Perceptron and unit test</a>.</p>
<p><a href="http:/blog/weekly%20updates/Post-5">Week 3</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on June 04, 2018.</p>http:/blog/weekly%20updates/Post-42018-05-29T00:00:00+00:002018-05-29T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>This week we came up with an idea to easily test Iterative Machines.</p>
<p>The idea here was to provide callback methods that will trigger cancel_computation(). These callbacks will directly send a block
signal to the signal handler and we are done.</p>
<p>This is also nice functionality to have, since a user might want to trigger pause/cancel when some condition is satisfied.
This makes the framework more user friendly, along with making tests a lot easier (a sketch of the idea is below).</p>
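<p>Conceptually the callback mechanism looks like this (a sketch with hypothetical signatures, not the exact StoppableSGObject API):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;functional&gt;
#include &lt;iostream&gt;
#include &lt;utility&gt;

// the machine polls a user-supplied predicate once per iteration and
// cancels computation as soon as it returns true
class StoppableToy
{
public:
    void set_callback(std::function&lt;bool()&gt; cb)
    {
        m_callback = std::move(cb);
    }

    void train()
    {
        for (int i = 0; i &lt; 1000; ++i)
        {
            if (m_callback &amp;&amp; m_callback())
            {
                std::cout &lt;&lt; "cancelled at iteration " &lt;&lt; i &lt;&lt; "\n";
                return; // cancel_computation() would trigger here
            }
            // one training iteration would run here
        }
    }

private:
    std::function&lt;bool()&gt; m_callback;
};

int main()
{
    StoppableToy machine;
    int n = 0;
    // e.g. in a test: stop after five iterations
    machine.set_callback([&amp;n] { return ++n &gt;= 5; });
    machine.train();
}</code></pre></figure>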
<p>Currently we use jinja2 to write tests that systematically cover a bunch of algorithms, but jinja2 was being dropped from the serialization tests.
We could use it here, because the test we designed takes an algorithm, prematurely stops it with a callback on the number of iterations,
serializes the model, and then compares results after deserializing it.
What we test here is the fact that the algorithm is stoppable. We base this on whether the model triggered the callback, i.e. whether it had an
iteration during the training phase. This can serve as a test for whether the model is iterative in nature as well.</p>
<p>As a cookbook patch I ported regression meta examples to the new API.</p>
<p>We also implemented the progress bar in all iterative algorithms too. This was a nice refactor that involved identifying a lot of iterative algorithms :).</p>
<h3 id="contributions">Contributions:</h3>
<hr />
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4305">Progress bar in iterative algorithms</a>.<br /></p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4310">Refactor regression meta</a>.<br /></p>
<p>By my mentor<br /></p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4293">Add set_callback() to StoppableSGObject</a>.<br /></p>
<p><a href="http:/blog/weekly%20updates/Post-4">Week 2</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on May 29, 2018.</p>http:/blog/weekly%20updates/Post-32018-05-21T00:00:00+00:002018-05-21T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>The first week of the coding period ends today and we have an idea of how we are going to test the premature stopping framework. What we have in mind is using a
registered callback along with an <code class="highlighter-rouge">addcallback</code> method that allows the user to attach a custom callback to an algorithm. As Toni told me, what we ideally want to do is
train a model, stop and serialize it, and then calculate the result. Next, we compare it with the result we get by deserializing the model we saved earlier, for consistency.</p>
<p>Obviously, doing this for all iterative algorithms separately is difficult, so we are going to write some <code class="highlighter-rouge">TYPED_TESTS</code>. These can run for a large number
of algorithm instances without us having to explicitly write them for each instance. This is intuitive because we want to test just the fact that the model is serialized consistently,
and the callback will remain the same for every instance. Toni has made some edits to the <code class="highlighter-rouge">CStoppableSGObject</code> class regarding this.</p>
<p>I will be working on this the following week.</p>
<p>We also ran into an issue because <code class="highlighter-rouge">LinearMachine</code> and <code class="highlighter-rouge">KernelMachine</code> were implementing their own version of train instead of using the base class version.
Because of this we were not able to write custom on_pause_impl() methods, so I made a patch for that.</p>
<p>We wrote code for the <code class="highlighter-rouge">StoppableSGObject</code> class last week and I implemented it in <code class="highlighter-rouge">CMachineEvaluation</code>.</p>
<p>Next we took a look at removing the direct calls to <code class="highlighter-rouge">cancel_computation()</code> and replacing them with the <code class="highlighter-rouge">CANCEL_COMPUTATION</code> macro. During this we ran into an issue with using
the macro in const train methods.</p>
<p>The log-det refactoring was finally completed this week. Cheers to that! The final thing we did was make the code more memory efficient while preserving thread safety.
We used a boolean flag that defaults to false: we set it to true once we have negated the shifts and just let it be false otherwise. This is a simple hack and saves a lot of
memory, because earlier we were allocating a vector and then allocating another one all over again in the next iteration.</p>
<p>The final thing I did this week was come up with a list of iterative algorithms. This meant going through a lot of code and finding classes with some
logical iterative implementation. First I found a list of all algorithms with CMachine as base class. From there I manually went through all the for/while loops and
decided whether each will need to be visited later in this project, noting the line number where the loop starts. The list is not complete yet, but we have enough to start our work.
It was added as a wiki page to shogun. I tried to keep the list short by writing only base classes, like NeuralNetworks/EMBase; we will need to visit everything that uses them when
we start implementing custom pause/cancel behaviours.</p>
<h3 id="pull-requests">Pull Requests:</h3>
<hr />
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4286">#4286 using CANCEL_COMPUTATION macro</a><br />
replacing <code class="highlighter-rouge">cancel_computation()</code> with the macro.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4291">#4291 refactor CMachineEvaluation</a><br />
removing duplicate code from the class and inheriting from <code class="highlighter-rouge">CStoppableSGObject</code>.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4235">#4235 parallel computation of log-det</a><br />
making all methods called within <code class="highlighter-rouge">estimator.sample</code> const, along with parallelizing it.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4287">#4287 connect LinearMachine and KernelMachine to signal Handler</a><br />
making them use the train method of the base class.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/wiki/List-of-iterative-algorithms">List of iterative Algorithms</a><br /></p>
<p><a href="http:/blog/weekly%20updates/Post-3">Week 1</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on May 21, 2018.</p>http:/blog/weekly%20updates/Post-22018-05-13T00:00:00+00:002018-05-13T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>The <code class="highlighter-rouge">Community Bonding Period</code> is over today and tomorrow the <code class="highlighter-rouge">Coding Period</code> starts. In this post I will go through a summary of my
Community Bonding.</p>
<p>The first meeting with both of my mentors was very helpful and we made a decision to add a few more things to the work we needed to do.</p>
<p>First was an interface class that would later enable us to scale the Premature Stopping code to classes not inheriting from <code class="highlighter-rouge">CMachine</code>. This basically takes all our code
and places it in a nice new <code class="highlighter-rouge">StoppableSGObject</code> class.</p>
<p>My mentor’s experience proved really helpful in making this easy for me!</p>
<p>Another issue was a reliable testing mechanism for premature stopping. This is still in open discussion and we will be completing it soon.</p>
<p>As a Community Bonding exercise, all of the new students were required to help out with the new release by porting some meta examples to the new API and also translating some
undocumented ones to meta.</p>
<p>We have also decided to get rid of <code class="highlighter-rouge">CLabelsFactory</code> in favor of using ‘as’ all over shogun. This was a nice refactoring patch.</p>
<p>Overall this week was terrific for me. I got to work on a lot of cool things. The mentors are very helpful and we would be doing a lot more great stuff this summer.</p>
<h3 id="pull-requests">Pull Requests:</h3>
<hr />
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4236">#4236 factory methods in LDA meta example</a><br />
Refactors the <code class="highlighter-rouge">lda meta example</code> to use factory methods.<br />
Refactors LDA to work with <code class="highlighter-rouge">Dense</code> and <code class="highlighter-rouge">Multiclass</code> Labels instead of just Binary.<br />
Refactors the LDA unit tests by removing repeated code and also using <code class="highlighter-rouge">some</code> all over.<br /></p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4277">#4277 Delete CLabelsFactory Part1</a><br />
<a href="https://github.com/shogun-toolbox/shogun/pull/4281">#4281 Delete CLabelsFactory Part2</a><br />
Deletes <code class="highlighter-rouge">CLabelsFactory</code> in favor of using <code class="highlighter-rouge">as</code> in label conversions.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4280">#4280 CStoppableSGObject class</a><br />
Implements the new <code class="highlighter-rouge">CStoppableSGObject</code> base class.</p>
<p><a href="https://github.com/shogun-toolbox/shogun/pull/4278">#4278 distance meta examples</a><br />
Ported a few distance legacy python examples to meta.<br /></p>
<p><a href="http:/blog/weekly%20updates/Post-2">Community Bonding Period</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on May 13, 2018.</p>http:/blog/weekly%20updates/Post-12018-05-01T00:00:00+00:002018-05-01T00:00:00+00:00Shubham Shuklahttp:/blogshubhamshukla1197@gmail.com
<p>The 14th Google Summer of Code results were announced today and I am thrilled to be working with the people at Shogun.</p>
<p>My proposal for the project <em><a href="https://summerofcode.withgoogle.com/projects/6010966421012480">Inside the BlackBox</a></em> was accepted.
The major goal I am going to be working on is using premature stopping all over shogun’s codebase.</p>
<p>The backbone for this was beautifully written by my mentor Giovanni De Toni last year.
This interests me because debugging ease is something everyone is excited about for the obvious reason that debugging can be frustrating.
In machine learning, our models and algorithms do seem like black boxes with an input plugged in and generating meaningful output, especially when we do it in
other languages through wrappers.</p>
<p>What we want to achieve is to take the user along with the algorithm, so that he can see what the algorithm is seeing and then
make better, more informed decisions.</p>
<p>The first step is to map out my domain for this task, namely what I am going to touch and how.
This involves frisking all of shogun’s algorithms and finding the ones we care about. Thankfully someone (it was Ken Thompson, cheers to him) created grep.
We will be getting familiar with a large number of algorithms and deciding what is meaningful for each of them.</p>
<p>Aside from this I will also be working to complete the transition to factory methods in the meta examples. The new API that our mentors have developed piece by piece is definitely interesting.
Let’s take all that love and give it a place to live in our meta examples.
This week I am going to try to get my already open PRs merged. I have my final semester exams from next week and they will definitely eat
up most of my time for now.</p>
<p>We are going to have an IRC session soon enough to discuss how everything is really going to happen.
Interacting with my mentors is definitely something I have been looking forward to since I first found shogun a few months ago.
I am most interested in how they want to go about the project. This will give me a professional opinion that I am ready to
understand and work with.</p>
<p>I am also reading previous GSoC blog posts at Shogun, numFOCUS and other organisations to get a feel for how this summer is going to unfold.
This is a pretty interesting and fun thing to do. It’s always a great time trying to dive into the minds of the people you are going to be working with. :D</p>
<p><a href="http:/blog/weekly%20updates/Post-1">The proposal accepted</a> was originally published by Shubham Shukla at <a href="http:/blog">NOTEPAD</a> on May 01, 2018.</p>