Here’s an excellent post explaining functors, applicatives and monads with pictures.

Read More

(image from cartoonmovement.com)

Democracy is not just about voting, but about the wider problem of making decisions in a community. Good systems for making decisions are more likely to yield decisions that reflect the community’s preferences and exploit its collective knowledge. A decision making system, therefore, takes as input preferences and knowledge and produces as output a decision. In effect, such a decision making system is an information aggregator; high quality decisions are the result of aggregating as much information as possible: no preferences are ignored, and no valuable knowledge is left unused.

This brief description brings us to the distinction between voting and the general case of  decision making. In traditional voting, voters are asked to cast a vote which represents a choice between a predefined set of already existing alternatives. The alternatives presented to the voter are fixed prior to the vote, they have been previously elaborated by some other agent independent of the voters. The voters simply choose among the set presented to them.

In the terms presented above, the knowledge and preferences that voters input into the system only serves to discriminate between a predefined set of options. If the process by which the options were elaborated has not previously already aggregated this information and materialized it into the options themselves, that information could be lost.

In other words, voting is too coarse grained to aggregate the information in all its richness. The grain is determined by the options, if these options do not previously already represent the best quality decisions, the resulting decision will be suboptimal. The optimal decision implicit in the lost information is unavailable as a choice.

The rise of information technology is very relevant to this problem. It is clear, nominally speaking, that a problem of aggregating information is directly related to information technology. More specifically, the communication possibilities that the modern day internet afford are precisely those that are necessary to open up the possibility of aggregating information from hundreds of thousands of sources in a much more refined way than the coarse grained procedure of voting allows.

The key idea is this: information technology and the internet allow extending voters’ participation from just the voting phase, back in time to include the proposal drafting/elaboration (what before were merely fixed options) phase. In contrast to the coarse grain input that voting allows, contributions to a proposal as it is being written are fine grained; modifications to proposals can be arbitrarily small and precise. From adding (or removing) a comma, to a sentence, to a paragraph, to an entire proposal.

We call this opening up of proposal elaboration to the general voter population Open / Distributed / Federated proposal elaboration.

It is open because anyone can contribute. It is distributed/federated because there is no central authority that drafts proposals and imposes them on the voters. Proposals are drafted spontaneously in a distributed fashion. In conclusion, Open/Distributed/Federated proposal elaboration is a decision making medium that need not suffer from information loss that can affect democracy-as-voting. This strong focus on participation makes this approach strongly related to ideas such as collaborative governance[1] and participative democracy[2].

There are two sides to the coin, however. In opening up proposal elaboration to the general public, the gates to more information are opened, but they are also opened to too much information. The key then, is to balance the benefits of the extra possibilities and ideas that this system allows with the extra noise and less useful contributions that can potentially flood the medium. If the signal to noise ratio is too low, optimal decisions can be discarded, not because the required information was not present as we discussed,  but because those decisions were buried under noise that makes them impossible to find.

In summary, a three stage model of information aggregation as decision making is proposed. Although we use the term stage, these three opportunities need not occur chronologically, they may occur in any order and repeatedly.

  1. The first stage of decision making occurs when a set of individuals sets about collaboratively writing[3] or modifying an existing proposal. Ideally this occurs in realtime, these individuals discuss and debate the changes as they are being made. Or they can exchange views or ideas with more traditional means of interaction, offline. In any case, the proposal is the result of their collaboration and contribution, information in terms of preferences and knowledge is integrated to yield the proposal.
  2. The second stage is proposal cross-fertilization[6]. Whereas the focus on aggregation in the first stage was accross individuals collaboratively writing a proposal, in this stage the aggregation occurs across proposals. Proposals may be forked, merged, recombined and mutated. Sections from one proposal may be imported/transcluded into another one. Although the process is still carried out by individuals, proposal cross-fertilization does not  only gather contributions from a subset of users collaboratively making modifications, but rather contributions are drawn from the entire proposal pool.
  3. The third stage is proposal selection, which corresponds to our traditional idea of voting. Having a given pool of proposals that have evolved during the other two stages of aggregation, the voting population is asked to vote for one (or many).

Due to the problems of noised mentioned above, it may be necessary to add extra stages that act as filters. For example, a filter could discard proposals that are not supported by a threshold number of participants.


References

[1] http://en.wikipedia.org/wiki/Collaborative_governance

[2] http://en.wikipedia.org/wiki/Participatory_democracy

[3] By collaborative proposal editing we mean realtime online editing of document by several individuals. The most common examples of this today are found in Google Drive and Etherpad.

[4] http://en.wikipedia.org/wiki/Google_Drive

[5] http://en.wikipedia.org/wiki/Etherpad

[6] Tools that support this model are for example Wikis[7] and Version control systems[8]. The idea of cross fertilization has been suggested in [9]

[7] http://en.wikipedia.org/wiki/Wiki

[8] http://en.wikipedia.org/wiki/Version_control

[9] http://zelea.com/w/Stuff:Votorola/p/position_space_rationalization

Read More
CEM

Taken from Causal Entropic Forces

Recently I was referred to a paper titled Causal Entropic Forces published in Physical Review Letters that attempts to link intelligence and entropy maximization. You can find reviews of this paper here and here. The paper starts with

Recent advances in fields ranging from cosmology to computer science have hinted at a possible deep connection between intelligence and entropy maximization….In this Letter, we explicitly propose a first step toward such a relationship in the form of a causal generalization of entropic forces that we show can spontaneously induce remarkably sophisticated behaviors associated with the human ‘‘cognitive niche,’’ including tool use and social cooperation, in simple physical systems.

The authors then go on to define a causal path version of entropy. Briefly, this is a generalization from standard entropy, a measure of how many states a system can be in at a specific point in time, to causal path entropy, a measure of how many paths that system can follow during a given time horizon. In technical language, microstates are mapped to paths in configuration space, and macrostates are mapped to configuration space volumes:

In particular, we can promote microstates from instantaneous configurations to fixed-duration paths through configuration space while still partitioning such microstates into macrostates according to the initial coordinate of each path

In other words, an initial coordinate establishes a volume in configuration space which represents possible future histories starting at that point. This is the macrostate (depicted as a cone in the image above)

Having defined this version of entropy, the authors then add the condition of entropy maximization to their model; this is what they call causal entropic forcing. For this to have a net effect, some macrostates have volumes which are partially blocked off for physical reasons. Consequently these macrostates have less available future paths, and less causal path entropy. The result is that different macrostates with different entropies can be differentially favored by condition of causal entropy maximization:

there is an environmentally imposed excluded path-space volume that breaks translational symmetry, resulting in a causal entropic force F directed away from the excluded volume.

Note that, contrary to actual thermodynamical systems that naturally exhibit entropy maximization for statistical reasons, causal entropic forcing is not physical, it is a thermodynamics inspired premise the authors add to their model as a “what if” condition, to see what behaviour results. So, what happens when systems are subject to causal entropic forcing?

 we simulated its effect on the evolution of the causal macrostates of a variety of simple mechanical systems: (i) a particle in a box, (ii) a cart and pole system, (iii) a tool use puzzle, and (iv) a social cooperation puzzle…The latter two systems were selected because they isolate major behavioral capabilities associated with the human ‘‘cognitive niche’’

Before you get excited, the “tool use puzzle” and “social cooperation puzzle” are not what one would imagine. They are simple “toy simulations” that can be interpreted as tool use and social cooperation. In any case, the result was surprising. When running these simulation the authors observed adaptive behaviour that was remarkably sophisticated given the simplicity of the physics model it emerged from. What’s more, not only was the behaviour adaptive, but it exhibited a degree of generality; the same basic model was applied to both examples without specific tuning.

The remarkable spontaneous emergence of these sophisticated behaviors from such a simple physical process suggests that causal entropic forces might be used as the basis for a general—and potentially universal—thermodynamic model for adaptive behavior.

How does this fit in with intelligence?

I see two different ways one can think about this new approach. One, as an independent definition of intelligence from very simple physical principles. Two, in terms of existing definitions of intelligence, seeing where it fits in and if it can be shown to be equivalent or recovered partially.

Defining intelligence as causal entropy maximization (CEM) is a very appealing as it only requires a few very basic physical principles to work. In this sense it is a very powerful concept. But as all definitions it is neither right nor wrong, its merit rests on how useful it is. The question is thus how well does this version of intelligence capture our intutions about the concept, and how well it fits with existing phenomena that we currently classify as intelligent[1]. Ill consider a simple example to suggest that intelligence defined this way cannot be the entire picture.

That example is unsurprisingly life, the cradle of intelligence. The concept that directly collides with intelligence defined as CEM is negentropy. Organisms behave adaptively to keep their biological systems within the narrow space of parameters that is compatible with life. We would call this adaptive behaviour intelligent, and yet its effect is precisely that of reducing entropy. Indeed, maximizing causal entropy for a living being means one thing, death.

One could argue that the system is not just the living organism, but the living organism plus its environment, and that in that case the entropy perhaps would be maximized[2]. This could resolve the apparent incompatibility, but CEM still seems unsatisfying. How can a good definition of intelligence leave out an essential aspect of intelligent life: the entropy minimization that all living beings must carry out. Is this local entropy minimization implicit in the overall causal entropy maximization?

CEM, intelligence and goals

Although there is no single correct existing definition of intelligence, it can be said that current working definitions share certain common features. Citing [3]

If we scan through the definitions pulling out commonly occurring features we find that intelligence is:

• A property that an individual agent has as it interacts with its environment or environments.

• Is related to the agent’s ability to succeed or profit with respect to some goal or objective.

• Depends on how able to agent is to adapt to different objectives and environments.

In particular, intelligence is related to the ability to achieve goals. One of the appealing characteristics of CEM as defining intelligence is that it does not need to define goals explicitly. In the simulations carried out by the authors the resulting behaviour seemed to be directed at achieving some goal that was not specified by the experimenters. It could be said that the goals emerged spontaneously from CEM.  But it remains to be seen whether this goal directed behaviour results automatically in real complex environments. For the example of life I mentioned above, it looks to be just the opposite.

So in general, how does CEM fit in with existing frameworks where intelligence is the ability to achieve goals in a wide range of environments[3]? Again, I see two possibilities:

  • CEM is a very general heuristic[4] that aligns with standard intelligence when there is uncertainty in the utility of different courses of action
  • CEM can be shown to be equivalent if there exists an encoding that represents specific goals via blocked off regions in configuration space (macrostates)

The idea behind the first possibility is very simple. If an agent is faced with many possibilities where it is unclear which one will lead to achieving its goals, then maximizing expected utility would seek to follow courses of action that allow it to react adaptively and flexibly when more information becomes available. This heuristic is just a version of keep your options open.

The second idea is just a matter of realizing that CEM’s resulting behaviour depends on how you count possible paths to determine the causal entropy of a macrostate. If one were to rule out paths that result in low utility given certain goals, then CEM could turn out to be equivalent to existing approaches to intelligence. Is it possible to recover intelligent goal directed behaviour as an instance of CEM given the right configuration space restrictions?


References

[1] Our intuitions about intelligence exist prior to any technical definition. For example, we would agree that a monkey is more intelligent than a rock, and that a person is more intelligent than a fly. A definition that does not fit these basic notions would be unsatisfactory.

[2] http://prd.aps.org/abstract/PRD/v76/i4/e043513

[3] http://www.vetta.org/documents/A-Collection-of-Definitions-of-Intelligence.pdf

[4] This seems related to the idea of basic AI drives identified by Omohundro in his paper http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf. In particular 6. AIs will want to acquire resources and use them efficiently. Availability of resources translates to the ability to follow more paths in configuration space, paths that would be unavailable otherwise.

Read More

Continuing along the theme of tightening the feedback loop, I saw this post on LtU mentioning a paper on live programming

Live programming allows programmers to edit the code of a running program and immediately see the effect of the code changes. This tightening of the traditional edit-compile-run cycle reduces the cognitive gap between program code and behavior, improving the learning experience of beginning programmers while boosting the productivity of seasoned ones. Unfortunately, live programming is difficult to realize in practice as imperative languages lack well-defined abstraction boundaries that make live programming responsive or its feedback comprehensible.

This paper enables live programming for user interface programming by cleanly separating the rendering and non-rendering aspects of a UI program, allowing the display to be refreshed on a code change without restarting the program. A type and effect system formalizes this separation and provides an evaluation model that incorporates the code update step. By putting live programming on a more formal footing, we hope to enable critical and technical discussion of live programming systems.

 

 

Read More

Programmers have been known to engage in flame wars about programming languages (and related matters like choice of text editor, operating system or even code indent style). Rational arguments are absent from these heated debates, differences in opinion usually reduce to personal preferences and strongly held allegiances without much objective basis. I have discussed this pattern of thinking before as found in politics.

Although humans have a natural tendency to engage in this type of thought and debate for any subject matter, the phenomenon is exacerbated for fields in which there is no available objective evidence to reach conclusions; no method to settle questions in a technical and precise way. Programming languages are a clear example of this, and so better/worse opinions are more or less free to roam without the constraints of well established knowledge. Quoting from a presentation I link to below

Many claims are made for the efficacy and utility of new approaches to software engineering – structured methodologies, new programming paradigms, new tools, and so on. Evidence to support such claims is thin and such evidence, as there is, is largely anecdotal. Of proper scientific evidence there is remarkably little. – Frank Bott

Fortunately there is a recognized wisdom that can settle some debates: there is no overall better programming language, you merely pick the right tool for the job. This piece of wisdom is valuable for two reasons. First, because it is most probably true. Second, its down to earth characterization of a programming language as just a tool inoculates against religious attitudes towards it; you dont worship tools, you merely use them.

But even though this change of attitude is welcome and definitely more productive than the usual pointless flame wars, it does not automatically imply that there is no such thing as a better or worse programming language for some class of problems, or that better or worse cannot be defined in some technical yet meaningful way. After all, programming languages should be subject to advances like any other engineering tool The question is, what approach can be used to even begin to think about programming, programs, and programming languages in a rigorous way?

One approach is to establish objective metrics on source code that reflect some property of the program that is relevant for the purposes of writing better software. One such metric is the Cyclomatic complexity as a measure of soure code complexity. The motivation for this metric is clear, complex programs are harder to understand, maintain and debug. In this sense, cyclomatic complexity is an objective metric that tries to reflect a property that can be interpreted as better/worse; a practical recommendation could be to write and refactor programs code in a way that minimizes the value of this metric.

But the problem with cyclomatic complexity, or any measure, is whether it in fact reflects some property that is relevant and has meaningful consequences. It is not enough that the metric is precisely defined and objective if it doesn’t mean anything.  In the above, it would be important to determine that cyclomatic complexity is in fact correlated with difficulty in understading, maintaining, and debugging. Absent this verified correlation, one cannot make the jump from an objective metric on code to some interpretation in terms of better/worse, and we’re back where we started.

The important thing to note is that correctly assigning some property of source code a better/worse interpretation is partly a matter of human psychology, a field whose methods and conclusions can be exploited. The fact that some program is hard to understand (or maintain, debug, etc) is a consequence both of some property of the program and some aspect of the way we understand programs. This brings us to the concept of the psychology of programming as a necessary piece in the quest to investigate programming in a rigorous and empirical way.

Michael Hansen discusses these ideas in this talk: Cognitive Architectures: A Way Forward for the Psychology of Programming. His approach is very interesting, it attempts to simulate cognition via the same cognitive architectures that play a role in artificial general intelligence. Data from these simulations can cast light as to how how different programming language features impact cognition, and therefore how these features perform in the real world.

The ACT-R cognitive architecture

I have to say, however, that this approach seems very ambitious to me. First, because modeling cognition is incredibly hard to get right. Otherwise we’d already have machine intelligence. Secondly, because it is hard to isolate the effects of anything beyond a low granularity feature. And programming languages, let alone paradigms, are defined by the interplay of many of these features and characteristics. Both of these problems are recognized by the speaker.


[1] Image taken from http://www.lackuna.com/2013/01/02/4-programming-languages-to-ace-your-job-interviews/

Read More

(Beta reduction)

When first using programming languages like C, C++, Java, I never stopped to think about what the compilation phase was actually doing, beyond considering it a necessary translation from source code to machine code. The need to annotate variables and methods with types seemed an intrinsic part of that process, and I never gave much thought about what it really meant, or the purpose it served; it was just part of writing and compiling code.

Using dynamic languages changes things, you realize that code may need some form of translation or interpretation for execution, but does not absolutely require type annotations as with statically typed languages. In this light, one reconsiders what a type system is from scratch, how it plays a part in compilation, and what compilation errors and type errors really are.

A type error (I am not talking about statistics) is an important concept to grasp well. In fact not just for programming, it is also a useful concept to illuminate errors in natural language and thinking. Briefly, a type error is treating data as belonging to a kind to which it does not in fact belong. The obvious example is invoking operations on an object that does not support them. Or alternatively, according to this page

type error: an attempt to perform an operation on arguments of the wrong types.

If one tries to perform such illegal operations (and there is no available conversion or coercion), the program can behave unexpectedly or crash. Statically typed languages can detect this error at compile time, which is a good thing. This is one of the main points advocates of statically typed languages make when discussing static and dynamic languages.

But the question is, how big an advantage is this? How important are type errors, what fraction of common programming errors do they make up? And how much of program correctness (in the non strict sense of the term) can a compiler check and ensure?

Compilers and type systems can be seen not just requirements to write programs that do not crash, but also tools with which to express and check problem domain information. The more information about the problem domain can be encoded in the type information in a program, the more the compiler can check for you. In this mindset one adds type information order to exploit the type system, rather than just conform to it.

Here are a few examples where problem domain information is encoded in types in order to prevent errors that otherwise could occur at runtime, and therefore would need specific checks. A concrete example taken from the first post I linked

  • If I call move on a tic-tac-toe board, but the game has finished, I should get a compile-time type-error. In other words, calling move on inappropriate game states (i.e. move doesn’t make sense) is disallowed by the types.
  • If I call takeMoveBack on a tic-tac-toe board, but no moves have yet been made, I get a compile-time type-error.
  • If I call whoWonOrDraw on a tic-tac-toe board, but the game hasn’t yet finished, I get a compile-time type-error.

By encoding these rules of the problem domain into the type system, it is not possible to write a program that violates the rule, logic errors in the program do not compile.

But it is unrealistic to go all the way and say, all the relevant information should be expressible in types, and its really seductive twin: all program errors are type errors. Unfortunately, the real world is not that tidy and convenient, unless you’re doing specialized things like theorem proving, or programming with an exotic programming language like Coq.

As advocates of dynamic languages know very well, there is no subsitute for unit testing and integration tests. The compiler should not make you feel safe enough to disregard testing. Compile-time checking and testing are complementary, not mutually exclusive. Compile-time checking does not eliminate the need for testing, nor does testing eliminate the benefits of compile-time checking.

Still, there is most probably room left to express more relevant information in the type system and making the compiler do more work for you. And this becomes even more plausible if you’re using a language with such a powerful type system as Scala. More on these matters: Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages.

Read More

In a previous entry on brain machine interfaces I wrote

Injecting input into the brain need not only be a matter of feedback for control, but as a way to input sensory information in general. In one experiment, a rat was subject to artifcial stimuli corresponding to a virtual object, via directly stimulating neurons responsible for processing touch information from its whiskers (yes, its whiskers you read that right). So one could say that the rat “felt” the touch of a non existent object. Taking the idea even further, one could conceivably create entirely new senses, not linked to a specific part of the body, in order to sense surroundings in ways that do not correspond to anything that our bodies currently perceive.

Seems it’s been done again, but this time allowing the rat to “touch” infrared light. Unlike the experiment mentioned in my previous post, the rat in these new experiments was sensing real (infrared) light, not some fixed and arbitrarily specified virtual object.

Read More

I’ve recently come across some comments and questions that discuss the idea of non-blocking IO and how it works. Which led me to this video

described as

..here is a short video that tries to explain the advantage of non-blocking javascript. It’s oversimplifying but it should illustrate the basic concept.

A simple model

Continuing with oversimplification, here I’ll present a model that describes throughput for a blocking and non-blocking IO process where messages have to be sent across a communication channel. This channel is characterized by two parameters, bandwidth and latency, that together with a third parameter, message size, completes the model. Thus

m = message size — the size of the messages to be sent

b = bandwith — the channel’s capacity, data that can be sent per unit time

l = latency — round tri,p time for a ping message

and what we want to identify is the throughput, as a function of these parameters

t = throughput — messages processed per unit time = f(m, b, l)

Throughput is defined as the amount of message replies arriving at the source per unit time, which is equivalent to the rate of messages arriving at the target under the assumption that message replies are not constrained by bandwidth. This throughput is what defines the performance of the IO process. We’ll use arbitrary units for the quantities, but you can think of message size, bandwidth and latency in the usual units (ie Kb, Kb/s, ms) if that makes it clearer.

First, the non-blocking throughput is simply the bandwidth divided by the message size

tn = b / m

this results from the assumption that non-blocking message sending will achieve 100% bandwidth utilization; the number of messages per unit time is directly given by the bandwidth.

Blocking

With blocking, each message will only be sent once a reply has been received for the previous one; previous messages block subsequent ones. This causes the number of message replies that arrive at the source per unit time to not be fully constrained by bandwidth, but by bandwidth and latency. The net effect is to add a time overhead to each message’s total processing time. Since this overhead occurs per message, we first calculate transmission time for one message using available bandwidth, which is

transmission time = message size / bandwidth

the overhead is half the latency, since that is the time it takes for a message reply to arrive, triggering the next message send. So the total time per message is

total time = transmission time + return delay = m / b + l / 2

because this is the total time required for each message send, the throughput is the reciprocal

tb = 1 / (m / b + l / 2)

This is the function we were looking for. As a sanity check, we can see that if the latency is zero, the above equation reduces to the non-blocking case. This also shows that t≤ tn for all values.

Note that this throughput corresponds both to that of message replies and messages arriving at the destination, the important point is that the latency blocks further sends, besides adding time to the roundtrip reply. Finally, if delays corresponding to message processing were variable, the term l / 2 would have to be substituted with the average processing time plus half the latency to yield an average throughput.

Plotting f

Let’s plot f using some example values to see what happens. We will use the following

message size = 5

bandwidth = 1-100

latency = 0.01-1.0

To compare performance with and without blocking, we’ll define a quantity

r = ratio of blocking throughput to non-blocking throughput = tb / tn

We will use octave to generate 3D plots for this data, using this simple script

function draw(message)
b = [1:100];
l = [0.01:0.01:1];
[x,y] = meshgrid(b,l);
r = (1./((message./x) + y/2))./(x/message);
n = x/message;
b = (1./((message/x) + y/2));
figure(1);
mesh(x,y,n);
figure(2);
mesh(x,y,b);
figure(3);
mesh(x,y,r);
endfunction

which generates three plots, one for non-blocking throughput, one for blocking throughput, and one for the ratio. The vertical axis shows the throughput, with the two horizontal axes corresponding to bandwidth (left) and latency (right). Message size was fixed at 5 units.

Non-blocking throughput (tn)

Nothing special going on here, throughput scales linearly with bandwidth. Because there is no blocking, latency has no effect, the surface is a plane.

Blocking throughput (tb)

The interesting thing to note here is how latency (bottom right axis) controls the degree to which throughput scales with bandwidth (bottom left axis). With high latency, throughput barely scales, the curve rises but is almost flat. At the other extreme, with zero latency the scaling is the same as in the previous graph, a straight line with the same gradient rising up to the red region. The transition from no scaling to scaling occurs in the middle of the graph, as the shape of the surface changes. Overall, throughput is of course reduced in comparison to non-blocking.

Ratio (r = tb / tn)

A ratio of 1.0, equal throughput, occurs with zero latency, the peak at the top right that continues along the ridge at the top of the graph. But what is interesting is the other ridge that extends towards the bottom. In fact, the trend that we can see from the blue area towards the latency axis is that lower bandwidth produces higher ratios closer to 1.0, equal performance. After a moment’s thought, this makes sense, the lack of bandwidth negates the effects of blocking. Or in other words, blocking still manages to utilize 100% bandwidth when it is scarce.

Javascript and non-blocking IO

The last point I’d like to make here is that the association that some people make between javascript and non-blocking IO is wrong. Javascript as a language says nothing about blocking or non-blocking IO. What javascript as a language does have is anonymous functions (or lambdas, or whatever you want to call them). Anonymous functions as first class citizens makes it easy to define callbacks, and it is through these callbacks that non-blocking IO can occur. But in this day and age, not having anonymous functions is a stone age thing rather than the norm, so most modern languages should be able to do non-blocking IO with ease.

Read More

Selected quote

A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in.

The full post is here

 

Read More

 

It’s all about deep learning these days. I previously posted a video here of a talk by Andrew Ng where one can also see unsupervised feature learning, as for example Gabor filters, ie these features are learned by the network automatically.

Read More