Our work concerns knowledge extraction from graphical languages whose symbols are unknown a priori. We assume that observing a large quantity of documents should make it possible to discover the symbols of the language under consideration. The difficulty of the problem lies in the two-dimensional and handwritten nature of the graphical languages we study. We consider online handwriting produced with interfaces such as touch screens, interactive whiteboards, or electronic pens. The signal is then available as a sampled trajectory of the pen or fingertip, which yields a sequence of strokes, each composed of a sequence of points. A symbol, the basic element of the alphabet of the language, is composed of a set of strokes with specific structural and relational properties. Symbols are extracted by finding repetitive subgraphs in a global graph that models the strokes (nodes) and their spatial relationships (arcs) over the entire document set. The minimum description length (MDL) principle is used to select the best representatives of the symbol set. This work was validated on two experimental datasets: the first contains simple mathematical expressions, and the second is composed of flowcharts. On these datasets, we assessed the quality of the extracted symbols and compared them to the ground truth. Finally, we investigated reducing the annotation workload of a database by addressing both the segmentation and the labeling of the strokes.
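To make the pipeline above more concrete, the following is a minimal Python sketch, not the implementation used in this work. It rests on several simplifying assumptions that do not come from the text: each stroke carries a hypothetical `shape` label (standing in for some unsupervised quantization of stroke shapes), spatial relations are reduced to a coarse comparison of stroke centroids, candidate symbols are limited to two-stroke subgraphs, and the MDL criterion is approximated by counting graph elements instead of bits. All names (`Stroke`, `relation`, `build_graph`, `count_patterns`, `mdl_gain`, `best_symbol`) and thresholds are illustrative.

```python
from collections import Counter
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class Stroke:
    points: tuple[Point, ...]  # sampled pen or fingertip trajectory
    shape: str                 # hypothetical shape-cluster label (assumption)


def centroid(stroke: Stroke) -> Point:
    xs = [p[0] for p in stroke.points]
    ys = [p[1] for p in stroke.points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def relation(a: Stroke, b: Stroke, near: float = 1.0) -> str:
    """Coarse spatial relation from a to b (assumes y grows downward)."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    dx, dy = bx - ax, by - ay
    if abs(dx) <= near and abs(dy) <= near:
        return "overlap"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "below" if dy > 0 else "above"


def build_graph(strokes: list[Stroke], max_dist: float = 3.0):
    """Nodes are stroke indices; arcs link strokes with nearby centroids."""
    edges = []
    for i, a in enumerate(strokes):
        for j in range(i + 1, len(strokes)):
            (ax, ay), (bx, by) = centroid(a), centroid(strokes[j])
            if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= max_dist:
                edges.append((i, j, relation(a, strokes[j])))
    return edges


def count_patterns(documents: list[list[Stroke]]) -> Counter:
    """Count non-overlapping occurrences of each two-stroke pattern."""
    counts: Counter = Counter()
    for strokes in documents:
        used: dict[tuple, set[int]] = {}
        for i, j, rel in build_graph(strokes):
            pattern = (strokes[i].shape, rel, strokes[j].shape)
            taken = used.setdefault(pattern, set())
            if i not in taken and j not in taken:
                taken.update((i, j))
                counts[pattern] += 1
    return counts


def mdl_gain(count: int) -> int:
    """Crude MDL-style gain, measured in graph elements rather than bits.

    Each occurrence (2 nodes + 1 arc) collapses into a single node, saving
    2 elements; storing the pattern definition once costs 3 elements.
    """
    return 2 * count - 3


def best_symbol(documents: list[list[Stroke]]):
    counts = count_patterns(documents)
    if not counts:
        return None
    return max(counts, key=lambda p: mdl_gain(counts[p]))


# Toy data: horizontal and vertical bars forming "+" and "=" shapes.
def hline(cx: float, cy: float) -> Stroke:
    return Stroke(((cx - 0.5, cy), (cx + 0.5, cy)), "hline")


def vline(cx: float, cy: float) -> Stroke:
    return Stroke(((cx, cy - 0.5), (cx, cy + 0.5)), "vline")


documents = [
    [hline(0, 0), vline(0, 0), hline(10, 0), vline(10, 0)],      # "+ +"
    [hline(0, 0), vline(0, 0), hline(10, -0.3), hline(10, 0.3)],  # "+ ="
]
best = best_symbol(documents)
print(best)
```

On this toy input the crossing of a horizontal and a vertical bar occurs three times and the pair of stacked horizontal bars only once, so the former is selected as the best candidate symbol. A fuller treatment would search larger subgraphs, make patterns independent of stroke order, and measure description length in bits.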