We propose a computational approach based on a variational specification of the visual front-end, in which ganglion cells with the properties of retinal konio cells (K-cells) are considered as a network, yielding a mesoscopic view of retinal processing. The variational framework is implemented as a simple diffusion mechanism within a two-layered non-linear filtering scheme with feedback, as observed in the synaptic layers of the retina. This implementation retains biological plausibility and captures functionalities such as (i) stimulus-adapted response; (ii) non-local noise reduction (i.e. segmentation); and (iii) visual event detection, taking several visual cues into account: contrast and local texture, color or edge channels, and motion, in natural images. These functionalities could be implemented in biological tissue. We use computer vision methods to propose an effective link between the observed functions and their possible implementation in the retinal network, based on a two-layer network with non-separable local spatio-temporal convolution as input and recurrent connections performing non-linear diffusion before prototype-based visual event detection. The numerical robustness of the proposed model has been checked experimentally on real natural images. Finally, we discuss the generality of our description on the basis of experimental biological and computational results.
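To make the notion of recurrent non-linear diffusion concrete, the following is a minimal sketch in Python. It does not reproduce the variational model summarized above, whose equations are not given here; it substitutes the classical Perona-Malik edge-stopping diffusion as a stand-in, and it omits the spatio-temporal input convolution and the prototype-based detection stage. All names (`edge_stopping`, `diffuse`, `make_step_image`) and parameter values (`kappa`, `step`, `n_iter`) are assumptions chosen for illustration.

```python
import math


def edge_stopping(gradient, kappa):
    """Perona-Malik conductance: near 1 in flat regions, near 0 across strong edges."""
    return math.exp(-(gradient / kappa) ** 2)


def diffuse(image, n_iter=20, kappa=0.1, step=0.2):
    """Explicit non-linear diffusion on a 2-D list of floats.

    Each iteration takes the previous output as its input; this feedback
    loop is the only sense in which the sketch is "recurrent".
    """
    rows, cols = len(image), len(image[0])
    current = [row[:] for row in image]
    for _ in range(n_iter):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                flux = 0.0
                # 4-neighbourhood; missing neighbours at the border carry no flux
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        diff = current[rr][cc] - current[r][c]
                        flux += edge_stopping(abs(diff), kappa) * diff
                # step <= 0.25 keeps this explicit 4-neighbour scheme stable
                updated[r][c] = current[r][c] + step * flux
        current = updated
    return current


def make_step_image(size=8, noise=0.02):
    """Hypothetical test pattern: dark left half, bright right half, small ripple."""
    return [
        [(0.0 if c < size // 2 else 1.0) + noise * (-1) ** (r + c) for c in range(size)]
        for r in range(size)
    ]


noisy = make_step_image()
smoothed = diffuse(noisy)
```

In this toy setting the small ripple inside each half is smoothed away while the large step between the halves is left almost untouched, because the conductance vanishes where the local difference greatly exceeds `kappa`. This is the qualitative behaviour that links noise reduction to segmentation.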