Hadoop for Beginner: Beeswax

Introducing Beeswax

The Beeswax application enables you to perform queries on Apache Hive, a data warehousing system designed to work with Hadoop. You can create Hive tables, load data, run and manage Hive queries, and download the results in a Microsoft Office Excel worksheet file or a comma-separated values file.

Beeswax and Hive Installation and Configuration

Beeswax is installed as part of Hue. For more information about installing Hue, see Hue Installation.

Hive Configuration

Beeswax, the Hive user interface in Hue, uses your system's Hive installation and is compatible with Hive 0.7.

Your Hive data is stored in the Hadoop Distributed File System (HDFS), typically in the /user/hive/warehouse directory (or the directory you specify as hive.metastore.warehouse.dir in the hive-site.xml file). Make sure this directory exists and is writable by the users who will create tables. The /tmp directory on the local file system must also be world-writable, because Hive uses it extensively.
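You can check the local /tmp requirement from Python with os.stat. This is a minimal sketch for a POSIX host, and the helper names are hypothetical. It checks only a local directory, not the HDFS warehouse directory, which you would inspect with a command such as hadoop fs -ls /user/hive/warehouse.

```python
import os
import stat


def mode_is_world_writable_dir(mode: int) -> bool:
    """True if a st_mode value describes a directory with the 'other' write bit set."""
    return stat.S_ISDIR(mode) and bool(mode & stat.S_IWOTH)


def is_world_writable(path: str) -> bool:
    """Check a local path, such as /tmp, against the requirement above."""
    return mode_is_world_writable_dir(os.stat(path).st_mode)
```

On a typical Linux host, /tmp has mode 1777 (world-writable with the sticky bit set), so is_world_writable("/tmp") should return True.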

Beeswax Configuration

If there is an existing Hive installation:

In /etc/hue/beeswax.ini, modify the hive_conf_dir property to refer to the directory containing hive-site.xml.
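If you prefer to script this change, the following Python sketch sets the property with the standard configparser module. It assumes that hive_conf_dir belongs in a section named [beeswax], which may not match your beeswax.ini layout. The function name set_hive_conf_dir is hypothetical. configparser also discards comments when it writes a file, so compare the output with the original before replacing it.

```python
import configparser
import io


def set_hive_conf_dir(ini_text: str, conf_dir: str, section: str = "beeswax") -> str:
    """Return beeswax.ini contents with hive_conf_dir set to conf_dir.

    Assumption: the property lives in a [beeswax] section; pass a different
    `section` if your file is laid out differently.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "hive_conf_dir", conf_dir)

    # Serialize to a string so the caller decides whether to overwrite the file.
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

For example, passing the current file contents and "/etc/hive/conf" (an example path) returns the updated text without touching the original file.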

If there is no existing Hive installation:

For information about the configuration options in hive-site.xml, see http://wiki.apache.org/hadoop/Hive/AdminManual/Configuration. The hive-site.xml file is optional, but it is often useful, particularly if you want to set up a metastore. You can store hive-site.xml in /etc/hue/conf, or tell Beeswax where to find it by setting the hive_conf_dir configuration variable in /etc/hue/beeswax.ini.
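To confirm which warehouse directory Hive will use, you can read hive.metastore.warehouse.dir from hive-site.xml. This Python sketch assumes the standard Hadoop configuration layout: a <configuration> element containing <property> entries, each with a <name> and a <value>. It falls back to /user/hive/warehouse when the property is absent. The function name warehouse_dir is hypothetical.

```python
import xml.etree.ElementTree as ET

# Default warehouse location named in the Hive Configuration section.
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def warehouse_dir(hive_site_xml: str) -> str:
    """Return hive.metastore.warehouse.dir from hive-site.xml contents."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "hive.metastore.warehouse.dir":
            # An empty <value> also falls back to the default.
            return (prop.findtext("value") or "").strip() or DEFAULT_WAREHOUSE
    return DEFAULT_WAREHOUSE
```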

Sharing Saved Queries

By default, a Beeswax user can see the saved queries of all users, both their own queries and those of other Beeswax users. If this behavior is not desirable, you can restrict viewing of saved queries to the query owner and Hue administrators by changing an option in the /etc/hue/beeswax.ini file. To change this setting, find and uncomment the share_saved_queries property and set it to false.
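The effect of share_saved_queries can be modeled in a few lines of Python. This is a hypothetical sketch, not Beeswax code. It assumes the property sits in a [beeswax] section and treats a missing or commented-out property as the default (sharing enabled). It also encodes the viewing rule stated above.

```python
import configparser


def share_saved_queries(ini_text: str, section: str = "beeswax") -> bool:
    """Read share_saved_queries; a missing or commented-out value means True."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.getboolean(section, "share_saved_queries", fallback=True)


def can_view_saved_query(viewer: str, owner: str, is_admin: bool, shared: bool) -> bool:
    """Shared: everyone may view. Not shared: only the owner and Hue administrators."""
    return shared or viewer == owner or is_admin
```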

Starting Beeswax

To start the Beeswax application, click the Beeswax icon in the application bar at the bottom of the Hue web page. The Beeswax Hive Query window opens in the Hue web page.

Installing the Beeswax Samples

You can install two sample Beeswax tables to use as examples.

To install Beeswax samples:

In the Beeswax window, click Tables.
In the Table List window, click install samples.

After you click install samples, the samples are displayed in the Hive Table List window. Once the samples are installed, Beeswax removes the install samples button, so you can install the samples only once.

Working with Queries

The Hive Query view enables you to enter queries in Hive's Query Language (HQL), which is similar to Structured Query Language (SQL). You can name and save your queries to use later. When you submit a query, the Beeswax Server uses Hive to run it. You can either wait for the query to complete or return later and find it in the Beeswax History view. You can also receive an email message when the query completes.

For More Information

For information about HQL syntax, see http://wiki.apache.org/hadoop/Hive/LanguageManual.

Creating and Running Queries

To create and run a query:

In the Beeswax Hive Query window, type the query. For example, to select all data from the sample_08 table, you would type: `SELECT * FROM sample_08`
To view the Hive and Hadoop default settings for queries, click Settings at the top of the Beeswax window. To return to the Query Editor, click Query Editor.
To override the default Hive and Hadoop settings for the current query, click Advanced. A panel opens on the left side of the window where you can specify the advanced settings.

Click the plus sign icon to add a setting for any of the following options. Click the plus sign icon again to specify multiple settings for a group, such as Hive Settings.

- Hive Settings: Use Hive Settings to override the Hive and Hadoop default settings. For Key, enter a Hive or Hadoop configuration variable name. For Value, enter the value you want to use for the variable. For example, to override the directory where structured Hive query logs are created, enter `hive.querylog.location` for Key and a path for Value. For information about Hive configuration variables, see http://wiki.apache.org/hadoop/Hive/AdminManual/Configuration. For information about Hadoop configuration variables, see http://hadoop.apache.org/common/docs/current/mapred-default.html
- File Resources: Use File Resources to make locally accessible files available to the entire Hadoop cluster at query execution time. Hive uses Hadoop's Distributed Cache to distribute the added files to all machines in the cluster when the query runs. From the Type drop-down menu, choose one of the following:
  - JAR: Adds the resources to the Java classpath. This is required in order to reference objects such as user-defined functions.
  - ARCHIVE: Automatically unarchives resources when distributing them.
  - FILE: Adds resources to the distributed cache. Typically, this is a transform script (or similar) to be executed.
  For Path, enter the path to the file, or click Choose a File to browse to and select the file. Note: It is not necessary to specify files used in a transform script if the files are available in the same path on all machines in the Hadoop cluster.
- User-defined Functions: You can use user-defined functions in a query. Specify the function name for Name and the class name for Class name. You must also specify a JAR file for the user-defined functions in File Resources. To include a user-defined function in a query, add a $ (dollar sign) before the function name in the query. For example, if MyTable is a user-defined function name in the query, you would type: `SELECT * $MyTable`
- Parameterization: Select Parameterization if you want a dialog box to prompt you or other users for parameter values when the query is executed.
- Email Notification: Select Email Notification if you want to receive an email message after the query completes.
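The Hive Settings, File Resources, and User-defined Functions options roughly correspond to statements you could type in the Hive command-line interface: SET, ADD JAR/FILE/ARCHIVE, and CREATE TEMPORARY FUNCTION. The Python sketch below only illustrates that correspondence. It does not describe how Beeswax submits these settings internally. The function name build_preamble, the JAR path, and the class com.example.hive.udf.MyUpper are hypothetical.

```python
from typing import Dict, List, Tuple

# Types offered by the File Resources "Type" drop-down.
RESOURCE_TYPES = {"JAR", "FILE", "ARCHIVE"}


def build_preamble(
    settings: Dict[str, str],
    resources: List[Tuple[str, str]],
    udfs: Dict[str, str],
) -> List[str]:
    """Translate Advanced-panel entries into equivalent Hive CLI statements.

    settings:  {configuration key: value}  -> SET key=value
    resources: [(type, path), ...]         -> ADD JAR|FILE|ARCHIVE path
    udfs:      {function name: Java class} -> CREATE TEMPORARY FUNCTION
    """
    statements = [f"SET {key}={value}" for key, value in settings.items()]

    # Add resources first: a UDF's class must be on the classpath before
    # CREATE TEMPORARY FUNCTION can refer to it.
    for rtype, path in resources:
        rtype = rtype.upper()
        if rtype not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {rtype}")
        statements.append(f"ADD {rtype} {path}")

    for name, class_name in udfs.items():
        statements.append(f"CREATE TEMPORARY FUNCTION {name} AS '{class_name}'")
    return statements
```

In the Hive CLI, you would end each of these statements with a semicolon and run them before the query itself.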

Click the red Close icon to close a group, and click all of the Close icons to close the Advanced panel.
If you want to save your query and advanced settings to use them again later, click Save As, enter a name and description, and then click OK. To save changes to an existing query, click Save.
If you want to view the execution plan for the query, click Explain. For more information, see http://wiki.apache.org/hadoop/Hive/LanguageManual/Explain.
To run the query, click Execute. The Beeswax Query Results window appears with the results of your query.
Do any of the following to download or save the query results:
- Click Download XLS to download the results in a Microsoft Office Excel worksheet file.
- Click Download CSV to download the results in a comma-separated values file suitable for use in other applications.
- Click Save. To save the results in a new table, select In a new table, enter a name, and then click Save. To save the results in an HDFS file, select In an HDFS directory, enter a path or click Choose File to browse to the directory, and then click Save.
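Once downloaded, a CSV file can be loaded into other programs. Here is a minimal Python sketch using the standard csv module. It assumes that the first row of the file holds the column names, so verify this against your own download. The function name parse_results is hypothetical.

```python
import csv
import io
from typing import Dict, List


def parse_results(csv_text: str) -> List[Dict[str, str]]:
    """Parse the text of a downloaded results file into {column: value} dicts.

    Assumption: the first row holds the column names.
    """
    # newline="" leaves line endings untranslated so the csv module can
    # handle quoted fields and \r\n terminators itself.
    return list(csv.DictReader(io.StringIO(csv_text, newline="")))
```

To use it, open the file with open(path, newline="") and pass its contents to parse_results.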
To view a log of the query execution, click Log. You can use the information in this tab to debug your query.
Under MR Jobs, you can view any MapReduce jobs that the query started.
To return to the query in the Query Editor, click Unsaved Query or the name of your saved query in the blue box at the top of the panel on the left side of the Beeswax window.
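A query with Parameterization selected leaves placeholders that are filled in from the dialog box at execution time. The sketch below assumes placeholders are written as $name or ${name}, which matches Python's string.Template syntax. Check the exact syntax your Beeswax version expects. The column names in the test are illustrative.

```python
import re
from string import Template
from typing import Dict, List

# Assumption: parameters appear as $name or ${name} in the query text.
_PARAM = re.compile(r"\$(?:\{([_A-Za-z][_A-Za-z0-9]*)\}|([_A-Za-z][_A-Za-z0-9]*))")


def find_parameters(query: str) -> List[str]:
    """Return parameter names in order of first appearance, without duplicates."""
    names: List[str] = []
    for braced, bare in _PARAM.findall(query):
        name = braced or bare
        if name not in names:
            names.append(name)
    return names


def fill_parameters(query: str, values: Dict[str, str]) -> str:
    """Substitute user-entered values; raises KeyError if a value is missing."""
    return Template(query).substitute(values)
```

Values are substituted as raw text, so string values still need quotes in the query itself. For example, find_parameters("SELECT * FROM sample_08 WHERE salary > $min_salary") returns ["min_salary"].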

Viewing Query History

Beeswax enables you to view the history of queries that you have previously run. Results for these queries are available for one week or until Hue is restarted.

To view query history:

In the Beeswax window, click History. Beeswax displays a list of your unsaved and saved queries in the Beeswax Query History window.
To display the queries for all users, click everyone's. To display your queries only, click mine.
To display the automatically generated actions that Beeswax performed on a user's behalf, click auto actions. To display user queries again, click user queries.
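The History filters and the one-week retention period can be combined as in this toy Python model. The HistoryEntry fields are hypothetical and do not reflect how Beeswax stores history. The model also ignores the fact that restarting Hue discards results early.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Per the text, results stay available for one week (or until Hue restarts).
RESULT_RETENTION = timedelta(weeks=1)


@dataclass
class HistoryEntry:
    owner: str
    query: str
    submitted: datetime
    auto_action: bool = False  # True for actions Beeswax ran on a user's behalf


def filter_history(entries: List[HistoryEntry], user: str, everyone: bool = False,
                   auto_actions: bool = False) -> List[HistoryEntry]:
    """Mimic the mine/everyone's and user queries/auto actions toggles."""
    return [
        e for e in entries
        if (everyone or e.owner == user) and e.auto_action == auto_actions
    ]


def results_available(entry: HistoryEntry, now: datetime) -> bool:
    """True while the entry is inside the one-week window (restarts ignored)."""
    return now - entry.submitted < RESULT_RETENTION
```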

Viewing, Editing, or Deleting Saved Queries

You can view a list of saved queries by clicking Saved Queries in the Beeswax window. If Beeswax is configured for shared queries (the default), you can view and copy any user's query, but you can edit, delete, and view the history of only your own queries. If sharing is disabled, you can view and copy only your own queries.
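The permission rules above, for a user who is not a Hue administrator, can be summarized in a small Python sketch. The SavedQuery type and the action names are hypothetical labels for the menu commands described in this section.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class SavedQuery:
    name: str
    owner: str


def allowed_actions(user: str, query: SavedQuery, shared: bool = True) -> Set[str]:
    """Actions a non-administrator may take on a saved query."""
    own = query.owner == user
    actions: Set[str] = set()
    # Viewing and copying (Clone) follow the sharing setting.
    if shared or own:
        actions |= {"view", "copy"}
    # Editing, deleting, and viewing history are always owner-only.
    if own:
        actions |= {"edit", "delete", "history"}
    return actions
```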

To edit a saved query:

In the Beeswax window, click Saved Queries. Beeswax displays the Beeswax Queries window.
Right-click one of your queries and choose Edit from the context menu.

Beeswax displays the query in the Beeswax Query Editor window.
Change the query and then click Save. You can also click Save As, enter a new name, and click OK to save a copy of the query.

To delete a saved query:

In the Beeswax window, click Saved Queries. Beeswax displays the Beeswax Queries window.
Right-click any of your own queries and choose Delete from the context menu.
Click OK to confirm the deletion.

To copy a saved query:

In the Beeswax window, click Saved Queries. Beeswax displays the Beeswax Queries window.
Right-click any of the queries and choose Clone from the context menu. Beeswax displays the query in the Beeswax Query Editor window.
Change the query as necessary and then click Save. You can also click Save As, enter a new name, and click OK to save a copy of the query.

To copy a query in the Beeswax Query History window:

In the Beeswax window, click History. Beeswax displays the Beeswax Query History window.
To display the queries for all users, click everyone's. Beeswax displays the queries for all users in the Beeswax Query History window.
Click the Clone link next to the query you want to copy. Beeswax displays a copy of the query in the Beeswax Query Editor window.
Change the query, if necessary, and then click Save As, enter a new name, and click OK to save the query.

Working with Tables

When working with Hive tables, you can use Beeswax to:

Create tables
Browse tables
Import data into tables
Drop tables
View the location of a table

Creating Tables

Although you can create tables by executing the appropriate HQL DDL query commands, it is easier to create a table using the Beeswax table creation wizard.

To create a table:

In the Beeswax window, click Tables.
In the Beeswax Table List window, click new table. The table creation wizard starts.
Follow the instructions in the wizard to create the table. For information about an option in the wizard, place your mouse cursor on the help icon next to the option. After you click Submit Query at the end of the table creation wizard, a new query to create the table is displayed in the Query Editor window.
Click Execute to run the query and create the table. Beeswax displays the new table's metadata on the right side of the Beeswax Table Metadata window.
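The wizard's output is an ordinary HQL DDL statement. The following Python sketch builds a comparable CREATE TABLE statement for a delimited text file. The exact statement the wizard generates may differ. The function name and the table and column names in the test are illustrative.

```python
from typing import List, Tuple


def create_table_hql(name: str, columns: List[Tuple[str, str]],
                     delimiter: str = ",") -> str:
    """Build a CREATE TABLE statement for a delimited text table."""
    cols = ",\n  ".join(f"{col} {typ.upper()}" for col, typ in columns)
    # Write control characters such as a tab as the escape sequence '\t',
    # which is how HQL expects them inside the string literal.
    delim = delimiter.encode("unicode_escape").decode("ascii")
    return (
        f"CREATE TABLE {name} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delim}'\n"
        "STORED AS TEXTFILE"
    )
```

For example, create_table_hql("salaries", [("code", "string"), ("salary", "int")]) produces a comma-delimited table definition that you could paste into the Query Editor.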

Browsing Tables

To browse the data in a table:

In the Beeswax window, click Tables.
Click the Browse Data link next to the table you want to browse.

Beeswax displays the table's data in the Query Results window.

To browse the metadata in a table:

In the Beeswax window, click Tables.
Double-click the table. Beeswax displays the table's metadata on the right side of the Beeswax Table Metadata window.

Importing Data into Tables

When importing data, you can choose to append or overwrite the table's data with data from a file.

To import data into a table:

In the Beeswax window, click Tables.
Double-click the table. Beeswax displays the Beeswax Table Metadata window.
Click Import Data.
To replace the data in the selected table with the imported data, select Overwrite existing data. Leave it cleared to append the imported data instead.
For Path, enter the path to the file that contains the data you want to import, or click Choose File to browse to the file.
Click Submit to start importing the data.
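The HQL equivalent of this form is a LOAD DATA statement, where the OVERWRITE keyword corresponds to the Overwrite existing data check box. This Python sketch builds such a statement. It assumes the path is an HDFS path, and whether Beeswax generates exactly this text is also an assumption. Note that in Hive, LOAD DATA INPATH moves the source file into the table's directory rather than copying it.

```python
def load_data_hql(table: str, path: str, overwrite: bool = False) -> str:
    """Build a LOAD DATA statement: OVERWRITE replaces data, otherwise it appends."""
    mode = "OVERWRITE " if overwrite else ""
    # Escape single quotes so the path stays a valid HQL string literal.
    escaped = path.replace("'", "\\'")
    return f"LOAD DATA INPATH '{escaped}' {mode}INTO TABLE {table}"
```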

Dropping Tables

To drop a table:

In the Beeswax window, click Tables.
Double-click the table. Beeswax displays the Beeswax Table Metadata window.
Click Drop Table.
Click OK to confirm the deletion.

Viewing a Table's Location

To view a table's location:

In the Beeswax window, click Tables.
Double-click the table. Beeswax displays the Beeswax Table Metadata window.
Click View File Location. Beeswax lists the selected table in its directory in the File Browser window.
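For a managed table created without a LOCATION clause, the directory shown in the File Browser follows a simple convention. It is the warehouse directory plus the lowercased table name, with a <database>.db subdirectory for databases other than default. The following Python sketch illustrates that convention. External tables and tables with an explicit LOCATION are stored elsewhere, and the function name is hypothetical.

```python
import posixpath


def default_table_location(table: str, warehouse: str = "/user/hive/warehouse",
                           database: str = "default") -> str:
    """Default HDFS directory of a managed table in the given database."""
    parts = [warehouse]
    # Tables in the default database sit directly under the warehouse.
    if database.lower() != "default":
        parts.append(f"{database.lower()}.db")
    parts.append(table.lower())
    return posixpath.join(*parts)
```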
