Linux host:
Username: fox
Password: 123
Hive (on port 10000):
Username: fox
Password: 123
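To check the Hive credentials above, a Beeline connection could look like the following sketch. It assumes HiveServer2 is already running; the host `localhost` and the names `HIVE_HOST`, `hive_jdbc_url`, and `hive_connect` are illustrative and not part of the OVA.

```shell
# Host of HiveServer2; "localhost" is an assumption, override HIVE_HOST as needed.
HIVE_HOST="${HIVE_HOST:-localhost}"

# Build the JDBC URL; port 10000 is the Hive port listed above.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:10000\n' "$HIVE_HOST"
}

# Open a Beeline session with the credentials listed above.
# Extra arguments are passed through, e.g. -e "SHOW DATABASES;".
hive_connect() {
  beeline -u "$(hive_jdbc_url)" -n fox -p 123 "$@"
}
```

Calling `hive_connect -e "SHOW DATABASES;"` would run a single statement and exit, which is a quick way to confirm that the service is reachable.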
All required environments and scripts are included in the OVA file. The software versions used are:
Software | Version |
---|---|
CentOS | 7.5 |
Hadoop | 3.1.3 |
Hive | 3.1.2 |
MySQL | 5.1.34 |
Superset | 2.1.2 |
The required software is installed under `/opt/module`, which includes Hadoop, Hive, Superset, etc.
Each component can be adjusted through its configuration file:
- HDFS: `/opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml`
- MapReduce: `/opt/module/hadoop-3.1.3/etc/hadoop/mapred-site.xml`
- Yarn: `/opt/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml`
- Hive: `/opt/module/hive/conf/hive-site.xml`
For the cluster to work and to start correctly with the scripts, configure the network to match your environment:
- IP address of each cluster node: `vim /etc/sysconfig/network-scripts/ifcfg-ens33`
- IP addresses and hostnames of the other hosts in the cluster: `vim /etc/hosts`
- Host name: `vim /etc/hostname`
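After editing `/etc/hosts`, it can help to confirm that every cluster hostname has an entry. The sketch below is only an illustration: the function name `missing_hosts` and the hostnames `node1` to `node3` are placeholders, not the names used in the OVA.

```shell
# Print every hostname from the argument list that has no entry in the hosts file.
# Usage: missing_hosts HOSTS_FILE HOSTNAME...
missing_hosts() {
  hosts_file="$1"
  shift
  for name in "$@"; do
    # Skip comment lines, stop at inline comments, and compare whole fields
    # after the IP column so that "node1" does not match "node10".
    if ! awk -v h="$name" '
        /^[ \t]*#/ { next }
        {
          for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break
            if ($i == h) found = 1
          }
        }
        END { exit !found }' "$hosts_file"; then
      echo "$name"
    fi
  done
}
```

For example, `missing_hosts /etc/hosts node1 node2 node3` prints nothing when all three placeholder names are present.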
Environment variables can be adjusted with `vim /etc/profile.d/my_env.sh`.
Helper scripts make it easy to start and shut down the cluster and to distribute files across it in a uniform manner. They are located under `/bin` and can be modified for your environment.
- `myhadoop [start|stop]`
- `xsync FILES`
- `jpsall`
- `sudo shutdownall`
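The shipped `xsync` script is not reproduced here. As an illustration of the general idea behind a file distribution script, the sketch below prints one `rsync` command per target host instead of running it. The function name `xsync_plan`, the `CLUSTER_HOSTS` variable, and the host names are hypothetical, and the real script may work differently.

```shell
# Hypothetical host list; replace with the hostnames from /etc/hosts.
CLUSTER_HOSTS="${CLUSTER_HOSTS:-node1 node2 node3}"

# Print (do not run) the rsync command that would copy each file to the
# same absolute directory on every host.
xsync_plan() {
  for host in $CLUSTER_HOSTS; do
    for file in "$@"; do
      # Resolve the absolute parent directory; skip files whose directory is missing.
      dir=$(cd "$(dirname "$file")" && pwd) || continue
      echo "rsync -av $dir/$(basename "$file") $host:$dir/"
    done
  done
}
```

Printing the commands first makes it possible to review what would be copied before anything is sent to the other hosts.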
`/home/fox/bilibili/user_1000000.csv` is a sample CSV file with one million records (the data has already been cleaned).
`word_cloud.csv` contains the data needed for the word cloud.
After starting Hive, you can execute `bilibili_hive_ql.sql` to import and analyze the data.
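Before running the import, a quick check of the sample file can catch a truncated copy. The sketch below counts the lines in a CSV file and shows how the SQL script might be run with `hive -f`. The helper names `csv_line_count` and `run_bilibili_sql` are illustrative, the location of `bilibili_hive_ql.sql` is an assumption, and whether the CSV has a header row is not stated here.

```shell
# Count the lines in a CSV file (includes a header line if the file has one).
csv_line_count() {
  wc -l < "$1" | tr -d ' '
}

# Run the provided script only if it is present in the current directory
# (its actual location inside the VM is an assumption).
run_bilibili_sql() {
  if [ -f bilibili_hive_ql.sql ]; then
    hive -f bilibili_hive_ql.sql
  else
    echo "bilibili_hive_ql.sql not found in $(pwd)" >&2
    return 1
  fi
}
```

A typical sequence would be `csv_line_count /home/fox/bilibili/user_1000000.csv` followed by `run_bilibili_sql` from the directory that holds the script.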
Use the following commands to start the visualization tool Superset on port 8099:
conda activate superset
superset run -h YOUR_IP -p 8099 --with-threads --reload
You can import the dashboard's analysis results directly into Superset; the files are provided in the `res_superset` folder.