from enum import IntEnum
from typing import List


class TSDataType(IntEnum):
    """
    Enumeration of data types currently supported by TsFile.
    """
    BOOLEAN = 0
    INT32 = 1
    INT64 = 2
    FLOAT = 3
    DOUBLE = 4
    TEXT = 5
    TIMESTAMP = 8
    DATE = 9
    BLOB = 10
    STRING = 11


class ColumnCategory(IntEnum):
    """
    Enumeration of column categories in TsFile.

    TAG: Represents a tag column, used for metadata.
    FIELD: Represents a field column, used for storing actual data values.
    """
    TAG = 0
    FIELD = 1


class ColumnSchema:
    """Defines schema for a table column (name, datatype, category)."""

    column_name = None
    data_type = None
    category = None

    def __init__(self, column_name: str, data_type: TSDataType,
                 category: ColumnCategory = ColumnCategory.FIELD): ...


class TableSchema:
    """Schema definition for a table structure."""

    table_name = None
    columns = None

    def __init__(self, table_name: str, columns: List[ColumnSchema]): ...


class ResultSetMetaData:
    """Metadata container for query result sets (columns, types, table name)."""

    column_list = None
    data_types = None
    table_name = None

    def __init__(self, column_list: List[str], data_types: List[TSDataType]): ...
Write Interface
TsFileWriter
class TsFileTableWriter:
    """
    Facilitates writing structured table data into a TsFile with a specified schema.
    """

    def __init__(self, path: str, table_schema: TableSchema):
        """
        :param path: The path of the TsFile; the file is created if it doesn't exist.
        :param table_schema: Describes the schema of the table to write.
        :return: no return value.
        """

    def write_table(self, tablet: Tablet):
        """
        Write a tablet into a table in the TsFile.

        :param tablet: Batch data of a table.
        :return: no return value.
        """

    def close(self):
        """
        Close the TsFileTableWriter and flush data automatically.

        :return: no return value.
        """
Tablet definition
You can use Tablet to insert data into TsFile in batches.
class Tablet(object):
    """
    A pre-allocated columnar data container for batch data with type constraints.
    Creates a timestamp buffer and typed data columns, with value range
    validation for numeric types.

    Initializes:
    :param column_name_list: Name list for data columns.
    :param type_list: TSDataType values specifying allowed types per column.
    :param max_row_num: Pre-allocated row capacity (default 1024).
    :return: no return value.
    """

    def __init__(self, column_name_list: list[str], type_list: list[TSDataType],
                 max_row_num: int = 1024): ...
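To make "pre-allocated columnar container with type constraints" concrete, here is a minimal pure-Python sketch of the idea. `MiniTablet`, its `set_row` method, and the string type tags are all hypothetical stand-ins invented for this illustration; they are not the tsfile Tablet class or its methods, and only INT32 and DOUBLE are modelled.

```python
from typing import Dict, List, Optional

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


class MiniTablet:
    """Illustrative stand-in; NOT the tsfile Tablet class."""

    def __init__(self, column_name_list: List[str], type_list: List[str],
                 max_row_num: int = 1024):
        if len(column_name_list) != len(type_list):
            raise ValueError("each column needs exactly one type")
        self.max_row_num = max_row_num
        self.types: Dict[str, str] = dict(zip(column_name_list, type_list))
        # Pre-allocate the timestamp buffer and one value buffer per column.
        self.timestamps: List[Optional[int]] = [None] * max_row_num
        self.columns: Dict[str, list] = {
            name: [None] * max_row_num for name in column_name_list
        }

    def set_row(self, row: int, timestamp: int, values: Dict[str, object]) -> None:
        """Hypothetical setter: validate every value, then store the row."""
        if not 0 <= row < self.max_row_num:
            raise IndexError(f"row {row} is outside capacity {self.max_row_num}")
        for name, value in values.items():
            self._check(name, value)
        self.timestamps[row] = timestamp
        for name, value in values.items():
            self.columns[name][row] = value

    def _check(self, name: str, value: object) -> None:
        kind = self.types[name]  # raises KeyError for an unknown column
        if kind == "INT32":
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} expects an int")
            if not INT32_MIN <= value <= INT32_MAX:
                raise OverflowError(f"{value} does not fit in INT32")
        elif kind == "DOUBLE":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} expects a number")


tablet = MiniTablet(["status", "temperature"], ["INT32", "DOUBLE"], max_row_num=4)
tablet.set_row(0, 1700000000000, {"status": 1, "temperature": 21.5})
```

Validating all values before storing any of them keeps a rejected row from leaving a half-written entry in the buffers.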
Read Interface
TsFileReader
import numpy as np


class TsFileReader:
    """
    Query table data from a TsFile.
    """

    def __init__(self, pathname):
        """
        Initialize a TsFile reader for the specified file path.

        :param pathname: The path to the TsFile.
        :return: no return value.
        """

    def query_table(self, table_name: str, column_names: List[str],
                    start_time: int = np.iinfo(np.int64).min,
                    end_time: int = np.iinfo(np.int64).max) -> ResultSet:
        """
        Executes a time range query on the specified table and columns.

        :param table_name: The name of the table to query.
        :param column_names: A list of column names to retrieve.
        :param start_time: The start time of the query range (default: minimum int64 value).
        :param end_time: The end time of the query range (default: maximum int64 value).
        :return: A query result set handler.
        """

    def get_table_schema(self, table_name: str) -> TableSchema:
        """
        Retrieves the schema of the specified table.

        :param table_name: The name of the table.
        :return: The schema of the specified table.
        """

    def get_all_table_schemas(self) -> dict[str, TableSchema]:
        """
        Retrieves the schemas of all tables in the TsFile.

        :return: A dictionary mapping table names to their schemas.
        """

    def close(self):
        """
        Closes the TsFile reader. If the reader has active result sets, they will be invalidated.
        """
ResultSet
class ResultSet:
    """
    Retrieves data from a query result set. When a query is executed, a query handler is
    returned. If the reader is closed, the result set will become invalid.
    """

    def next(self) -> bool:
        """
        Checks and moves to the next row in the query result set.

        :return: True if the next row exists, False otherwise.
        """

    def get_result_column_info(self) -> dict[str, TSDataType]:
        """
        Retrieves the column information of the result set.

        :return: A dictionary containing column names as keys and their data types as values.
        """

    def read_data_frame(self, max_row_num: int = 1024) -> DataFrame:
        """
        Fetches the next DataFrame from the query result set.

        :param max_row_num: The maximum number of rows to retrieve. Default is 1024.
        :return: A DataFrame containing data from the query result set.
        """

    def get_value_by_index(self, index: int):
        """
        Retrieves the value at the specified index from the query result set.

        :param index: The index of the value to retrieve, 1 <= index <= column_num.
        :return: The value at the specified index.
        """

    def get_value_by_name(self, column_name: str):
        """
        Retrieves the value for the specified column name from the query result set.

        :param column_name: The name of the column to retrieve the value from.
        :return: The value of the specified column.
        """

    def get_metadata(self) -> ResultSetMetaData:
        """
        Retrieves the metadata of the result set.

        :return: The metadata of the result set as a ResultSetMetaData object.
        """

    def is_null_by_index(self, index: int):
        """
        Checks whether the field at the specified index in the result set is null.

        :param index: The index of the field to check, 1 <= index <= column_num.
        :return: True if the field is null, False otherwise.
        """

    def is_null_by_name(self, name: str):
        """
        Checks whether the field with the specified column name in the result set is null.

        :param name: The name of the column to check.
        :return: True if the field is null, False otherwise.
        """

    def close(self):
        """
        Closes the result set and releases any associated resources.
        """
to_dataframe
from typing import Iterator, Optional, Union

import pandas as pd


def to_dataframe(file_path: str,
                 table_name: Optional[str] = None,
                 column_names: Optional[list[str]] = None,
                 start_time: Optional[int] = None,
                 end_time: Optional[int] = None,
                 max_row_num: Optional[int] = None,
                 as_iterator: bool = False) -> Union[pd.DataFrame, Iterator[pd.DataFrame]]:
    """
    Read data from a TsFile and convert it into a Pandas DataFrame
    or an iterator of DataFrames.

    This function supports both table-model and tree-model TsFiles.
    Users can filter data by table name, column names, time range,
    and maximum number of rows.

    Parameters
    ----------
    file_path : str
        Path to the TsFile to be read.
    table_name : Optional[str], default None
        Name of the table to query in table-model TsFiles.
        If None and the file is in table model, the first table
        found in the schema will be used.
    column_names : Optional[list[str]], default None
        List of column/measurement names to query.
        - If None, all columns will be returned.
        - Column existence will be validated in table-model TsFiles.
    start_time : Optional[int], default None
        Start timestamp for the query.
        If None, the minimum int64 value is used.
    end_time : Optional[int], default None
        End timestamp for the query.
        If None, the maximum int64 value is used.
    max_row_num : Optional[int], default None
        Maximum number of rows to read.
        - If None, all available rows will be returned.
        - When `as_iterator` is False, the final DataFrame will be
          truncated to this size if necessary.
    as_iterator : bool, default False
        Whether to return an iterator of DataFrames instead of
        a single concatenated DataFrame.
        - True: returns an iterator yielding DataFrames in batches
        - False: returns a single Pandas DataFrame

    Returns
    -------
    Union[pandas.DataFrame, Iterator[pandas.DataFrame]]
        - A Pandas DataFrame if `as_iterator` is False
        - An iterator of Pandas DataFrames if `as_iterator` is True

    Raises
    ------
    TableNotExistError
        If the specified table name does not exist in a table-model TsFile.
    ColumnNotExistError
        If any specified column does not exist in the table schema.
    """
TsFileDataFrame
TsFileDataFrame is built around three core types:
TsFileDataFrame: The entry object that loads one or more TsFiles and provides a unified view. Only metadata is scanned during initialization; actual data values are not read.
Timeseries: A lazy-loaded handle for a single time series. Obtained via array-style indexing with df[...], it contains series metadata but does not load data immediately – data reading is only triggered when indexed by row number.
AlignedTimeseries: The time-aligned result of multiple series. Obtained via df.loc[...], it aligns multiple specified series to the same timeline within a given time range and loads them into memory in one operation.
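The difference between the lazy Timeseries handle and the eagerly loaded AlignedTimeseries can be sketched in a few lines of plain Python. `LazySeries` and `align` below are hypothetical names invented for this illustration, not the TsFileDataFrame implementation. The sketch also assumes that alignment uses the union of timestamps inside an inclusive time range and fills gaps with None; the description above does not specify these details.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[int, float]


class LazySeries:
    """Hypothetical lazy handle: keeps metadata, reads data on first row access."""

    def __init__(self, name: str, loader: Callable[[], List[Point]]):
        self.name = name          # metadata is available immediately
        self._loader = loader
        self._data: Optional[List[Point]] = None
        self.load_count = 0       # counts how often data was actually read

    def __getitem__(self, row: int) -> Point:
        if self._data is None:    # the first row access triggers the read
            self._data = self._loader()
            self.load_count += 1
        return self._data[row]


def align(series: Dict[str, List[Point]], start: int, end: int) -> List[tuple]:
    """Put several series on one timeline within [start, end] (assumed inclusive)."""
    times = sorted({t for pts in series.values() for t, _ in pts if start <= t <= end})
    lookup = {name: dict(pts) for name, pts in series.items()}
    # One row per timestamp; a series without a value at that time yields None.
    return [(t, *[lookup[name].get(t) for name in series]) for t in times]


temp = LazySeries("temp", lambda: [(1, 20.0), (2, 21.0)])
loads_before = temp.load_count
second_point = temp[1]
aligned = align({"a": [(1, 1.0), (2, 2.0)], "b": [(2, 5.0), (3, 6.0)]}, 1, 2)
```

Creating the handle costs nothing beyond storing its name, while the aligned result is fully materialised in a single call, which mirrors the split between df[...] and df.loc[...] described above.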