Information Retrieval and Text Mining (Otkrivanje informacija i znanja iz teksta)
Text information retrieval systems; efficient text indexing; indexing, terms, and document processing. Robust term processing. Web search overview: web structure, the user, paid placement, search engine optimization and spam. Web size measurement. Crawling and web indexes. Near-duplicate detection. Index construction. Boolean and vector space models. Hierarchical clustering. Probabilistic retrieval models; ranking and rank aggregation; evaluating IR systems. Text clustering and classification methods. Text classification: Naive Bayes models, spam filtering, nearest neighbors, decision boundaries, vector space classification using centroids, comparative results. Text clustering: partitioning methods (k-means clustering). Latent semantic indexing (LSI). Applications to clustering and support vector machine classifiers. Kernel functions. Evaluation of classification: micro- and macro-averaging. Learning rankings. Taxonomy induction and cluster labeling; classification algorithms and their evaluation; text filtering and routing. Information extraction. Text understanding. Question answering. Link analysis. Sentiment analysis.