Who Is a Data Scientist in Big Data: Professional Competencies of a Data Researcher
This article explains who a Data Scientist is: what a data scientist needs to know and how a data researcher differs from an analyst.
What does a Data Scientist do
Like a Data Analyst, a Data Scientist works with data sets, performing the following operations:
- search for patterns in data sets;
- preparation of data for modeling (sampling, cleaning, feature generation, integration, formatting);
- modeling and visualization of data;
- development and testing of hypotheses for improving business metrics by building Machine Learning models.
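The data preparation steps listed above can be illustrated with a minimal sketch in plain Python. The records, field names and helper functions here are hypothetical and chosen only for illustration; in practice these steps are usually done with dedicated data libraries, and the steps are ordered here for convenience rather than as a prescribed sequence.

```python
import random

# Hypothetical raw records; None marks a missing value.
orders = [
    {"customer_id": 1, "amount": 120.0, "items": 3},
    {"customer_id": 2, "amount": None, "items": 1},
    {"customer_id": 3, "amount": 80.0, "items": 4},
    {"customer_id": 4, "amount": 200.0, "items": 5},
]
# Hypothetical second data source to integrate with the orders.
regions = {1: "north", 2: "south", 3: "north", 4: "east"}


def clean(records):
    """Cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def add_features(records):
    """Feature generation: derive the average price per item."""
    return [{**r, "avg_item_price": r["amount"] / r["items"]} for r in records]


def integrate(records, region_by_customer):
    """Integration: attach the region from the second data source."""
    return [
        {**r, "region": region_by_customer.get(r["customer_id"], "unknown")}
        for r in records
    ]


def sample(records, k, seed=0):
    """Sampling: draw k records at random, reproducibly."""
    return random.Random(seed).sample(records, k)


def to_rows(records, columns):
    """Formatting: convert records to fixed-order numeric rows for a model."""
    return [[r[c] for c in columns] for r in records]


prepared = integrate(add_features(clean(orders)), regions)
rows = to_rows(sample(prepared, k=2), ["amount", "items", "avg_item_price"])
```

Each helper maps to one item in the list, so the whole preparation stage reads as a short pipeline whose output (`rows`) could be passed to a modeling step.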
In most cases, a Data Scientist focuses on predictive analytics, while a Data Analyst most often examines information after the fact. Nevertheless, the main goal of a Data Scientist matches that of a Big Data Analyst: to extract from data sets the information that helps the business make optimal management decisions.
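The difference between looking at data after the fact and predicting from it can be shown with a small sketch. The sales figures below are invented, and the straight-line trend fitted by ordinary least squares is only one simple assumption about how a forecast might be made; real predictive models are usually far richer.

```python
from statistics import mean

# Hypothetical monthly sales for four past months.
monthly_sales = [100.0, 110.0, 120.0, 130.0]


def describe(values):
    """After-the-fact view: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_trend(values):
    """Fit y = intercept + slope * x by ordinary least squares,
    where x is the month index 0, 1, 2, ..."""
    xs = list(range(len(values)))
    x_mean = mean(xs)
    y_mean = mean(values)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Predictive view: extrapolate the fitted trend into the future."""
    intercept, slope = fit_trend(values)
    x_future = len(values) - 1 + steps_ahead
    return intercept + slope * x_future


summary = describe(monthly_sales)          # describes the past
next_month = forecast(monthly_sales)       # estimates the next month
```

Here `describe` answers "what happened?", which is closer to the analyst's typical question, while `forecast` answers "what is likely to happen next?", which is closer to the data scientist's.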
Image 1: Portrait of professional competences of Data Scientist
Professional competencies of a Data Scientist: what a Data Scientist should know
To perform the tasks above, a Data Scientist must be competent in the following areas of knowledge:
- information technology – methods and tools of Data Mining: algorithms and data structures, machine learning and other branches of artificial intelligence (artificial neural networks, genetic algorithms, deep learning), programming languages (R, Python, Julia, Haskell), and statistical analysis environments (RStudio, MATLAB, Jupyter Notebook);
- mathematics (statistics, probability theory, discrete mathematics);
- domain knowledge – industry or corporate specifics.
Note that, unlike a Data Analyst, a Data Scientist focuses on the technical aspects of data research and pays less attention to system analysis and business processes.
Image 2: Data Science knowledge areas
What is the difference between a Big Data Analyst and a Data Scientist
At first glance, a Data Scientist may seem no different from a Data Analyst, because their responsibilities and professional competencies overlap. However, the two specialties are not interchangeable. Despite the similarities, there are significant differences between them:
- tools – the analyst often works with ETL-based data warehouses and data marts, while the scientist interacts with Big Data storage and processing systems (the Apache Hadoop stack, NoSQL databases, etc.), as well as statistical packages (RStudio, MATLAB, etc.);
- research methods – a Data Analyst often uses methods of system analysis and business intelligence, while a Data Scientist mainly works with the mathematical tools of Computer Science (machine learning models and algorithms, as well as other branches of artificial intelligence);
- salary – a Data Scientist is paid slightly more than a Data Analyst on the labor market. This may be due to a higher entry threshold for the profession: a data researcher needs programming skills, whereas a Data Analyst mainly works with ready-made SQL/ETL tools.
In practice, in some companies all data work, including business intelligence and building Machine Learning models, is done by the same person. However, more and more companies now tend to separate the responsibilities of the Data Analyst and the Data Scientist, as well as those of the Data Engineer and the Big Data Administrator, as we will discuss in the following articles.
Image 3: Data Scientist is one of the most popular professions in the modern IT market