Big Data Processing and Analytics

Prof. E. Maiorana



  • Course Contents

    • The course provides an overview of the main tools used to analyze the large volumes of data (audio, video, text) generated by today's telecommunication systems and the services they offer.
    • To this end, it introduces the principles of statistical inference and machine learning, as well as the main deep learning architectures currently used in a wide range of application areas.
    • Matlab exercises are planned on the topics covered and on the use of Matlab for parallel computing.

    • The detailed course program includes:
      • Statistics
        • inference and statistical hypothesis testing
        • regression
      • Machine Learning
        • classification (supervised learning)
          • decision trees, random forests, naïve Bayes, linear discriminant analysis, k-nearest neighbor, support vector machines
        • clustering (unsupervised learning)
          • k-means clustering
          • hierarchical clustering
        • data modeling
          • principal component analysis, independent component analysis, outlier detection and data cleansing, hidden Markov models
        • deep learning & CNN
      • Processing
        • parallel processing
        • examples in Matlab
      • Data analytics in business applications
      • Graph-based signal processing (TBD)
      • Students' presentations
  • Course slides

  • Recommended textbooks

    • S. Nolan and T. Heinzen, "Statistics for the Behavioral Sciences"
    • G. James, D. Witten, T. Hastie, R. Tibshirani, "An Introduction to Statistical Learning"
    • K. P. Murphy, "Machine Learning - A Probabilistic Perspective"
    • S. Theodoridis and K. Koutroumbas, "Pattern Recognition"
    • T. A. Runkler, "Data Analytics - Models and Algorithms for Intelligent Data Analysis"
    • I. Goodfellow, Y. Bengio, A. Courville, "Deep Learning"
  • Online teaching material

    • Material available on the Moodle platform