Statistical modeling of functional oligonucleotides such as transcription factor binding sites, i.e., inferring a sequence motif with the aim of predicting new instances, is one of the classic fields of bioinformatics. Most previous work in this field is based on a comparatively simple motif model that assumes statistical independence among all nucleotides. Making use of additional features has to date been limited by inadequate statistical models that suffer from overfitting. In this work, we propose a new class of statistical models that allows modeling complex features in the data while keeping the parameter space small in order to avoid overfitting. For inferring these models from data, we propose different Bayesian and non-Bayesian learning approaches, both for fully observable data and in the presence of latent variables. We apply these models and learning algorithms to investigate the phenomenon of statistical dependencies within sequence motifs of DNA-binding proteins. Using de novo motif discovery on ChIP-seq data, we find that intra-motif dependencies are prevalent in nature and that modeling them increases prediction accuracy.
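For orientation, the simple independence-based motif model mentioned above can be sketched as a position weight matrix. This is a minimal illustration of that baseline only, not of the model class proposed in this work. The function names, the pseudocount value, and the toy binding sites are hypothetical and chosen purely for the example.

```python
import math

ALPHABET = "ACGT"


def fit_pwm(sites, pseudocount=1.0):
    """Estimate one nucleotide distribution per motif position.

    `sites` is a list of aligned, equal-length binding sites.
    The pseudocount is an illustrative smoothing choice (an assumption).
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount, then add observed counts.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm


def log_likelihood(pwm, seq):
    """Log-probability of `seq` under the independence assumption.

    Because positions are treated as independent, the joint probability
    factorizes and the log-probability is a plain sum over positions.
    """
    return sum(math.log(column[base]) for column, base in zip(pwm, seq))


# Hypothetical toy data, not taken from any real ChIP-seq experiment.
sites = ["ACGT", "ACGA", "TCGT", "ACGT"]
pwm = fit_pwm(sites)
score_consensus = log_likelihood(pwm, "ACGT")
score_other = log_likelihood(pwm, "TTTT")
```

Such a model needs only three free parameters per position, which is why it resists overfitting, but it cannot represent any dependency between positions. A model that captures intra-motif dependencies has to condition a position on others, and the number of parameters then grows quickly unless it is deliberately constrained.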