Who Is a Big Data Administrator
What the Big Data Administrator does
A Big Data Administrator creates and supports cluster solutions (including Apache Hadoop-based cloud platforms). The work includes:
• installation and deployment of a cluster;
• selection of the initial configuration;
• optimization of nodes at the kernel level;
• managing updates and creating local repositories;
• configuring replication, authentication, and queue management;
• performance monitoring and balancing the load on servers;
• ensuring information security of clusters and systems;
• backup and recovery of data in case of failures.
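As an illustration of the load-balancing duty listed above, here is a minimal sketch of the kind of check an administrator might script. It flags nodes whose disk utilization differs from the cluster-wide utilization by more than a chosen threshold, an idea loosely similar to the threshold used by the HDFS balancer. The function name, the node names, and the sample figures are all hypothetical and chosen only for this example; a real cluster would supply these numbers from its own monitoring system.

```python
from typing import Dict, List, Tuple


def find_unbalanced_nodes(
    usage: Dict[str, Tuple[float, float]],
    threshold_pct: float = 10.0,
) -> List[str]:
    """Return nodes whose disk utilization deviates from the cluster-wide
    utilization by more than threshold_pct percentage points.

    usage maps a node name to a (used, capacity) pair in the same unit.
    """
    total_used = sum(used for used, _ in usage.values())
    total_capacity = sum(cap for _, cap in usage.values())
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    # Cluster-wide utilization as a percentage of total capacity.
    cluster_pct = 100.0 * total_used / total_capacity

    unbalanced = []
    for node, (used, cap) in sorted(usage.items()):
        if cap <= 0:
            raise ValueError(f"capacity of {node} must be positive")
        node_pct = 100.0 * used / cap
        if abs(node_pct - cluster_pct) > threshold_pct:
            unbalanced.append(node)
    return unbalanced


# Hypothetical three-node cluster; values are in terabytes (assumed data).
sample = {
    "node-1": (8.0, 10.0),
    "node-2": (5.0, 10.0),
    "node-3": (2.0, 10.0),
}
unbalanced = find_unbalanced_nodes(sample, threshold_pct=10.0)
print(unbalanced)  # ['node-1', 'node-3']
```

In this sample the cluster as a whole is 50% full, so the nodes at 80% and 20% are reported while the node at 50% is not. Raising the threshold makes the check more tolerant of uneven data placement.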
In performing these duties, the administrator works closely with big data engineers. The two roles have distinct responsibilities, although they touch in some areas. Read what a Data Engineer does here.
Image 1: Big Data Administrator – the Superman of the Big Data World
Big Data Administrator Professional Competencies
To solve the problems of creating, configuring and maintaining Big Data clusters, the Big Data Administrator must know the following disciplines and technologies:
• network protocols of the TCP/IP stack, as well as tools such as nginx and bash;
• programming languages Python, Shell, Go;
• Apache Hadoop ecosystem, as well as cluster solutions HBase, Kafka, Spark;
• monitoring systems Grafana, Zabbix, ELK, Prometheus;
• cluster management and coordination tools Cloudera Manager, Apache Ambari, Apache ZooKeeper;
• cluster security tools Kerberos, Apache Sentry, Cloudera Navigator, Apache Ambari, Apache Ranger, Apache Knox, Apache Atlas;
• cloud platforms for big data (Amazon Web Services, Google Cloud Platform, Microsoft Azure and other similar solutions from large PaaS / IaaS providers).
In some companies, the Big Data Administrator is also expected to know continuous integration and continuous delivery (CI/CD) tools such as Jenkins, Puppet, Chef, Ansible, Docker, OpenShift, and Kubernetes, as well as configuration management and testing tools (Terraform, Vault, Consul, Packer, Elasticsearch, etc.). However, such tasks are usually the responsibility of a DevOps engineer, while the Big Data Administrator is primarily involved in setting up cluster infrastructure.
Image 2: Big Data Admin professional areas
What is the difference between a Data Engineer and a Big Data Administrator?
Like the Data Engineer, the Big Data Administrator belongs to Big Data engineering, which prepares the infrastructure for the analysis performed by Data Analysts and Data Scientists. However, the two roles differ significantly in their areas of professional activity:
- the Data Engineer works at a higher level of abstraction, concentrating on automating the collection and distribution of information flows and on interacting with corporate data stores (SQL and NoSQL databases, Data Lake, Data Warehouse);
- the Big Data Administrator configures and maintains the data storage infrastructure: creating clusters, configuring cloud platforms, and taking care of the information security of big data.
Image 3: The Professional Path of Big Data Administrator