Data Science & Big Data (CS-7005): RGPV notes (CBGS)

Syllabus

UNIT 1:
Understanding Data: Data Wrangling and Exploratory Analysis, Data Transformation & Cleaning, Feature Extraction, Data Visualization. Introduction to contemporary tools and programming languages for data analysis, such as R and Python.

UNIT 2:
Statistical & Probabilistic analysis of Data: Multiple hypothesis testing, Parameter Estimation methods, Confidence intervals, Bayesian statistics and Data Distributions.

UNIT 3:
Introduction to machine learning: Supervised & unsupervised learning, classification & clustering algorithms, Dimensionality reduction: PCA & SVD, Correlation & Regression analysis, Training & testing data: Overfitting & Underfitting.

UNIT 4:
Introduction to Information Retrieval: Boolean Model, Vector Model, Probabilistic Model, Text-based search: Tokenization, TF-IDF, stop words and n-grams, synonyms and part-of-speech tagging.
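
As a small illustration of the tokenization, stop-word and TF-IDF topics listed in this unit, here is a minimal sketch in plain Python. The stop-word list, the sample documents and the function names (tokenize, tf_idf) are assumptions made up for this example, and the weighting used (relative term frequency times log(N / document frequency)) is only one of several common TF-IDF variants.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "and", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical sample corpus.
docs = ["the cat sat", "the dog sat", "the dog barked"]
weights = tf_idf(docs)
```

In this corpus "cat" occurs in only one document while "sat" occurs in two, so "cat" receives the larger weight in the first document.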

UNIT 5:
Introduction to Web Search & Big Data: Crawling and Indexes, Search Engine architectures, Link Analysis and ranking algorithms such as HITS and PageRank, Hadoop File System & MapReduce Paradigm.
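
To give a feel for the link-analysis topic in this unit, the following is a minimal sketch of PageRank computed by power iteration in plain Python. The graph, the function name pagerank, the damping factor of 0.85 and the fixed iteration count are assumptions chosen for illustration. The sketch also assumes every link target appears as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Assumes every link target is itself a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

Page C is linked to by all three other pages and ends up with the highest rank, while D, which nothing links to, keeps only the random-jump share.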

