Manage meausurments
CREATE MEASUREMENT
openGemini supports automatic table creation when writing data, but in the following three situations, tables need to be created in advance.
Specify a tag as partition key
By default, data in openGemini is partition by hash base on time series. However, in some scenarios, users frequently use one or more tags for data query. the hash method distributes the data to different nodes. AS a result, the query fanout is large.
If the data can be partitioned according to these frequently used TAGs, then the data with the same TAG value will be stored on the same node, thereby reducing query fan-out and improving data query efficiency.
Specify a TAG (such as location) to break up the data
CREATE MEASUREMENT mst WITH SHARDKEY location
Specify multiple TAGs (such as location, region) as SHARD KEY
CREATE MEASUREMENT mst WITH SHARDKEY location,region
TEXT SEARCH
Text retrieval refers to searching and filtering base on keywords or phrases in text data sets. openGemini supports text retrieval, such as keyword retrieval of logs, which can return all log data containing keywords.
You need to create a measurement before use it. The purpose of creating a measurement is to specify which Field fields to create a full-text index on, but there is a premise that these fields must be 'string' data type
> CREATE MEASUREMENT mst WITH INDEXTYPE text INDEXLIST description, error_message
Create a measurement named mst and specify to create a full-text index on the two fields description
and error_message
.
> CREATE MEASUREMENT mst WITH INDEXTYPE text INDEXLIST description, error_message SHARDKEY location
Create a measurement named mst, and specify to create a full-text index on the description
and error_message
fields, and set location
as partition key
Tips
Full-text indexes will only be created on the fields description
and error_message
specified by PRIMARYKEY. If you search keywords in other Fields, it may be slower
It supports exact matching, phrase matching and fuzzy matching on fields description
and error_message
related entries Text Search
It is not recommended to create a text index on a TAG, and unforeseeable problems may occur.
USE A HIGH SERIES CARDINALITY STORAGE ENGINE(HSCE)
The traditional time series database has the problem of index expansion due to the large time series, and openGemini's HSCE solves this problem. When we use it, we need to specify the engine type 'columnstore' when creating a measurement. The default storage engine is not HSCE.
> CREATE MEASUREMENT mst (location string field default "", direction string field default "", rtt int field default 0, time int field default 0,) WITH ENGINETYPE = columnstore SHARDKEY location TYPE hash PRIMARYKEY location, direction SORTKEY time
Create a measurement named mst with four fields location
, direction
, rtt
, time,
, and specify the data type and default value respectively. For example, location is a string type, and the default value is an empty string.
ENGINETYPE
Required, must be columnstore
SHARDKEY
Required, specify location
as partition key
TYPE
Required, There are two ways to break up data: hash and range
PRIMARYKEY
Required, The primary key is location
and direction
, which means that the storage engine will create indexes on these two fields.
SORTKEY
Required, specify the data sorting method inside the storage engine. time
means sorting by time, and can also be changed to rtt
or direction
, or even other fields in the table.
When creating measurement, you need to pay attention to:
Must specify all field names, data types, TAG or ordinary field and the default value in case of missing values.
If
SHARDKEY
is not specified, all data will be written to one data node
Tips
The traditional inverted index is similar to a dense index with high series cardinality. The index take a lot of memory space, and the query efficiency is low, and it has little effect on data filtering. The openGemini high series cardinality storage engine improves data query efficiency by building sparse indexes.
For the problem of high series cardinality, openGemini has found a solution, but many functions of openGemini on the new storage engine are not yet perfect and cannot be used in a production environment. For example, it does not support aggregation operators, and the syntax for creating measurement still needs further streamlining, and some exceptions have not been handled yet.
Welcome to participate and improve the functions together.
SHOW MEASUREMENTS
View the measurements created in the database
SHOW MEASUREMENTS [ON <database_name>] [WITH MEASUREMENT <operator> ['<measurement_name>' | <regular_expression>]]
ON <database_name>
is optional。If ON <database_name>
is not included in the query, you must specify the database with USE <database_name>
in the CLI before, or use the parameter db
in the openGemini API request.
The WITH
clause, WHERE
clause, LIMIT
clause and OFFSET
clause are optional. The WHERE
clause supports tag comparison; in SHOW MEASUREMENTS
queries, field comparison is invalid.
The operators in WHERE
clause are:
Operators | Description |
---|---|
= | equal |
<> | not equal |
!= | not equal |
=~ | match |
!~ | not match |
relate entries FROM
clause、LIMIT、OFFSET
clause
Examples
SHOW MEASUREMENTS
with an ON
clause
> SHOW MEASUREMENTS ON NOAA_water_database
name: measurements
+---------------------+
| name |
+---------------------+
| average_temperature |
| h2o_feet |
| h2o_pH |
| h2o_quality |
| h2o_temperature |
+---------------------+
1 columns, 5 rows in set
The database NOAA_water_database
has five measurements: average_temperature
, h2o_feet
, h2o_pH
, h2o_quality
and h2o_temperature
.
SHOW MEASUREMENTS
without the ON
clause
use command USE <database_name>
specified database:
> USE NOAA_water_database
Elapsed: 781ns
> SHOW MEASUREMENTS
name: measurements
+---------------------+
| name |
+---------------------+
| average_temperature |
| h2o_feet |
| h2o_pH |
| h2o_quality |
| h2o_temperature |
+---------------------+
1 columns, 5 rows in set
Use the parameter db
to specify the database
> curl -G "http://localhost:8086/query?db=NOAA_water_database&pretty=true" --data-urlencode "q=SHOW MEASUREMENTS"
{
"results": [
{
"statement_id": 0,
"series": [
{
"name": "measurements",
"columns": [
"name"
],
"values": [
[
"average_temperature"
],
[
"h2o_feet"
],
[
"h2o_pH"
],
[
"h2o_quality"
],
[
"h2o_temperature"
]
]
}
]
}
]
}
SHOW MEASUREMENTS
with multiple clauses
> SHOW MEASUREMENTS ON NOAA_water_database WITH MEASUREMENT =~ /h2o.*/
name: measurements
+-----------------+
| name |
+-----------------+
| h2o_feet |
| h2o_pH |
| h2o_quality |
| h2o_temperature |
+-----------------+
1 columns, 4 rows in set
Return measurements whose names start with h2o
in the database NOAA_water_database
.
View the number of measurements
> SHOW MEASUREMENTS CARDINALITY
TODO
> SHOW MEASUREMENTS CARDINALITY ON NOAA_water_database
TODO
SHOW MEASUREMENTS DETAIL
SHOW MEASUREMENTS DETAIL WITH MEASUREMENT = '<measurement_name>'
Show the INDEX, SHARDKEY, ENGINE TYPE, TAG KEYS, FIELD KEYS, RETENION POLICY of measurement. Previously, at least 3 or 4 commands were required to query the above information.
Examples
> show measurements
name: measurements
+-----------------------+
| name |
+-----------------------+
| compact |
| dbpt_tasks |
| downSample |
| executor |
| filestat |
| hitRatio |
| httpd |
| io |
| merge |
| meta |
| metaRaft |
| meta_dbpt_tasks |
| metadata |
| ooo_time_distribution |
| performance |
| record |
| runtime |
| spdy |
| store_query |
| stream |
| stream_window |
+-----------------------+
1 columns, 21 rows in set
> show measurements detail with measurement = meta
name: meta
+------------------------------------+
| Detail |
+------------------------------------+
| RETENTION POLICY: autogen |
| INDEX: <nil> |
| SHARD KEY: <nil> |
| ENGINE TYPE: tsstore |
| TAG KEYS: Host, NodeID, app, |
| hostname |
| FIELD KEYS: |
| GetFromDataMarshalLenTotal(float), |
| GetFromDataMarshalTotal(float), |
| GetFromOpsMapLenTotal(float), |
| GetFromOpsMapTotal(float), |
| LTime(float), |
| LeaderSwitchTotal(float), |
| SnapshotDataSize(float), |
| SnapshotTotal(float), |
| SnapshotUnmarshalDuration(float), |
| Status(float)... |
+------------------------------------+
1 columns, 6 rows in set
DROP MEASUREMENT
use command DROP MEASUREMENT
to delete measurement.
Deleting a measurement will delete all data and indexes.
DROP MEASUREMENT <retention_policy_name>.<measurement_name>
The retention_policy_name
part could be omitted. In that case, the default policy of current database will be used.
Examples
Delete the measurement h2o_feet
in default retention policy
> DROP MEASUREMENT "h2o_feet"
Delete the measurement h2o_feet
in retention policy rp0
> DROP MEASUREMENT "rp0"."h2o_feet"
Note
There has no results return when the command 'DROP MEASUREMENT' excute success.