@@ -34,6 +34,9 @@ class TSDataType(IntEnum):
FLOAT = 3
DOUBLE = 4
TEXT = 5
TIMESTAMP = 8
DATE = 9
BLOB = 10
STRING = 11

class ColumnCategory(IntEnum):
@@ -280,3 +283,74 @@ class ResultSet:
def close(self)
```
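
The `TSDataType` hunk in this diff adds `TIMESTAMP`, `DATE`, `BLOB`, and `STRING`. As a minimal sketch of how such `IntEnum` members behave, the block below reproduces only the members visible in the diff. The class name `TSDataTypeExcerpt` is hypothetical, and the real enum contains further members that are elided here.

```python
from enum import IntEnum


class TSDataTypeExcerpt(IntEnum):
    """Excerpt only: the members visible in the diff, not the full enum."""
    FLOAT = 3
    DOUBLE = 4
    TEXT = 5
    TIMESTAMP = 8
    DATE = 9
    BLOB = 10
    STRING = 11


# IntEnum members can be looked up by their integer code ...
timestamp_type = TSDataTypeExcerpt(8)
# ... and compare equal to plain integers.
string_is_eleven = TSDataTypeExcerpt.STRING == 11

# Names of the members added by this change (codes above TEXT).
new_members = [m.name for m in TSDataTypeExcerpt if m > TSDataTypeExcerpt.TEXT]
print(timestamp_type.name, string_is_eleven, new_members)
```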


### to_dataframe

```python

def to_dataframe(file_path: str,
                 table_name: Optional[str] = None,
                 column_names: Optional[list[str]] = None,
                 start_time: Optional[int] = None,
                 end_time: Optional[int] = None,
                 max_row_num: Optional[int] = None,
                 as_iterator: bool = False) -> Union[pd.DataFrame, Iterator[pd.DataFrame]]:
    """
    Read data from a TsFile and convert it into a Pandas DataFrame or
    an iterator of DataFrames.

    This function supports both table-model and tree-model TsFiles.
    Users can filter data by table name, column names, time range,
    and maximum number of rows.

    Parameters
    ----------
    file_path : str
        Path to the TsFile to be read.

    table_name : Optional[str], default None
        Name of the table to query in table-model TsFiles.
        If None and the file is in table model, the first table
        found in the schema will be used.

    column_names : Optional[list[str]], default None
        List of column names to query.
        - If None, all columns will be returned.
        - Column existence will be validated in table-model TsFiles.

    start_time : Optional[int], default None
        Start timestamp for the query.
        If None, the minimum int64 value is used.

    end_time : Optional[int], default None
        End timestamp for the query.
        If None, the maximum int64 value is used.

    max_row_num : Optional[int], default None
        Maximum number of rows to read.
        - If None, all available rows will be returned.
        - When `as_iterator` is False, the final DataFrame will be
          truncated to this size if necessary.

    as_iterator : bool, default False
        Whether to return an iterator of DataFrames instead of
        a single concatenated DataFrame.
        - True: returns an iterator yielding DataFrames in batches
        - False: returns a single Pandas DataFrame

    Returns
    -------
    Union[pandas.DataFrame, Iterator[pandas.DataFrame]]
        - A Pandas DataFrame if `as_iterator` is False
        - An iterator of Pandas DataFrames if `as_iterator` is True

    Raises
    ------
    TableNotExistError
        If the specified table name does not exist in a table-model TsFile.

    ColumnNotExistError
        If any specified column does not exist in the table schema.
    """
```
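
The docstring states that a missing `start_time` falls back to the minimum int64 value and a missing `end_time` to the maximum. A minimal sketch of that defaulting rule follows. The helper `resolve_time_range` and the constants are hypothetical names introduced for illustration, not part of the library, and the sketch makes no claim about how the library applies the bounds internally.

```python
from typing import Optional, Tuple

# Bounds of a signed 64-bit integer, as named in the docstring.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def resolve_time_range(start_time: Optional[int] = None,
                       end_time: Optional[int] = None) -> Tuple[int, int]:
    """Replace missing bounds with the int64 extremes (hypothetical helper)."""
    start = INT64_MIN if start_time is None else start_time
    end = INT64_MAX if end_time is None else end_time
    return start, end


# With no bounds given, the range spans every representable timestamp.
print(resolve_time_range())
# Only the missing bound is replaced.
print(resolve_time_range(start_time=1_700_000_000_000))
```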
9 changes: 9 additions & 0 deletions src/UserGuide/develop/QuickStart/QuickStart-PYTHON.md
@@ -148,6 +148,15 @@ with TsFileReader(table_data_dir) as reader:
print(result.read_data_frame())
```

Use `to_dataframe` to read a TsFile into a DataFrame.

```Python
import os
import tsfile as ts
table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile")
print(ts.to_dataframe(table_data_dir))
```
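
For files too large to load at once, the documented `as_iterator=True` mode yields DataFrames in batches. The sketch below counts rows batch by batch. `total_rows` and its `reader` parameter are hypothetical names added for illustration. `reader` defaults to `tsfile.to_dataframe`, imported lazily so the function can also be exercised with a stand-in reader when the package is not installed.

```python
from typing import Any, Callable, Iterator, Optional


def total_rows(file_path: str,
               reader: Optional[Callable[..., Iterator[Any]]] = None,
               **filters: Any) -> int:
    """Count rows batch by batch instead of building one large DataFrame."""
    if reader is None:
        # Lazy import: only needed when no stand-in reader is supplied.
        import tsfile as ts
        reader = ts.to_dataframe
    rows = 0
    # as_iterator=True is documented to return an iterator of DataFrames.
    for batch in reader(file_path, as_iterator=True, **filters):
        rows += len(batch)
    return rows
```

With the package installed, a call such as `total_rows("table_data.tsfile", start_time=0)` would pass the filter through to `to_dataframe`. Any keyword accepted by the documented signature can be forwarded the same way.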

## Sample Code

Sample code that uses these interfaces is available at: https://github.com/apache/tsfile/blob/develop/python/examples/example.py
@@ -34,6 +34,9 @@ class TSDataType(IntEnum):
FLOAT = 3
DOUBLE = 4
TEXT = 5
TIMESTAMP = 8
DATE = 9
BLOB = 10
STRING = 11

class ColumnCategory(IntEnum):
@@ -280,3 +283,73 @@ class ResultSet:
def close(self)
```

### to_dataframe

```python

def to_dataframe(file_path: str,
                 table_name: Optional[str] = None,
                 column_names: Optional[list[str]] = None,
                 start_time: Optional[int] = None,
                 end_time: Optional[int] = None,
                 max_row_num: Optional[int] = None,
                 as_iterator: bool = False) -> Union[pd.DataFrame, Iterator[pd.DataFrame]]:
    """
    Read data from a TsFile and convert it into a Pandas DataFrame or
    an iterator of DataFrames.

    This function supports both table-model and tree-model TsFiles.
    Users can filter data by table name, column names, time range,
    and maximum number of rows.

    Parameters
    ----------
    file_path : str
        Path to the TsFile to be read.

    table_name : Optional[str], default None
        Name of the table to query in table-model TsFiles.
        If None and the file is in table model, the first table
        found in the schema will be used.

    column_names : Optional[list[str]], default None
        List of column/measurement names to query.
        - If None, all columns will be returned.
        - Column existence will be validated in table-model TsFiles.

    start_time : Optional[int], default None
        Start timestamp for the query.
        If None, the minimum int64 value is used.

    end_time : Optional[int], default None
        End timestamp for the query.
        If None, the maximum int64 value is used.

    max_row_num : Optional[int], default None
        Maximum number of rows to read.
        - If None, all available rows will be returned.
        - When `as_iterator` is False, the final DataFrame will be
          truncated to this size if necessary.

    as_iterator : bool, default False
        Whether to return an iterator of DataFrames instead of
        a single concatenated DataFrame.
        - True: returns an iterator yielding DataFrames in batches
        - False: returns a single Pandas DataFrame

    Returns
    -------
    Union[pandas.DataFrame, Iterator[pandas.DataFrame]]
        - A Pandas DataFrame if `as_iterator` is False
        - An iterator of Pandas DataFrames if `as_iterator` is True

    Raises
    ------
    TableNotExistError
        If the specified table name does not exist in a table-model TsFile.

    ColumnNotExistError
        If any specified column does not exist in the table schema.
    """
```
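
The `max_row_num` description says that, when `as_iterator` is False, the concatenated result is truncated to that size if necessary. The following sketch illustrates that truncation rule only, using plain lists as stand-ins for batched DataFrames. `collect_rows` is a hypothetical helper, and nothing here is taken from the library's actual implementation.

```python
from typing import Any, Iterable, List, Optional, Sequence


def collect_rows(batches: Iterable[Sequence[Any]],
                 max_row_num: Optional[int] = None) -> List[Any]:
    """Concatenate batches, then truncate to max_row_num if one is given."""
    rows: List[Any] = []
    for batch in batches:
        rows.extend(batch)
        # Stop reading further batches once enough rows are collected.
        if max_row_num is not None and len(rows) >= max_row_num:
            break
    # The last batch may overshoot the limit, so trim the final result.
    return rows if max_row_num is None else rows[:max_row_num]


# Three batches of rows; a limit of 4 cuts into the second batch.
print(collect_rows([[1, 2, 3], [4, 5, 6], [7]], max_row_num=4))
```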
9 changes: 9 additions & 0 deletions src/UserGuide/latest/QuickStart/QuickStart-PYTHON.md
@@ -148,6 +148,15 @@ with TsFileReader(table_data_dir) as reader:
print(result.read_data_frame())
```

Use `to_dataframe` to read a TsFile into a DataFrame.

```Python
import os
import tsfile as ts
table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile")
print(ts.to_dataframe(table_data_dir))
```

## Sample Code

Sample code that uses these interfaces is available at: https://github.com/apache/tsfile/blob/develop/python/examples/example.py
@@ -33,6 +33,9 @@ class TSDataType(IntEnum):
FLOAT = 3
DOUBLE = 4
TEXT = 5
TIMESTAMP = 8
DATE = 9
BLOB = 10
STRING = 11

class ColumnCategory(IntEnum):
@@ -262,3 +265,72 @@ class ResultSet:

```

### to_dataframe

```Python

def to_dataframe(file_path: str,
                 table_name: Optional[str] = None,
                 column_names: Optional[list[str]] = None,
                 start_time: Optional[int] = None,
                 end_time: Optional[int] = None,
                 max_row_num: Optional[int] = None,
                 as_iterator: bool = False) -> Union[pd.DataFrame, Iterator[pd.DataFrame]]:
    """
    Read data from a TsFile and convert it into a Pandas DataFrame
    or an iterator of DataFrames.

    This function supports both table-model and tree-model TsFiles.
    Users can filter data by table name, column names, time range,
    and maximum number of rows.

    Parameters
    ----------
    file_path : str
        Path to the TsFile to be read.

    table_name : Optional[str], default None
        Name of the table to query in table-model TsFiles.
        If None and the file is in table model,
        the first table found in the schema will be used.

    column_names : Optional[list[str]], default None
        List of column/measurement names to query.
        - If None, all columns will be returned.
        - Column existence will be validated in table-model TsFiles.

    start_time : Optional[int], default None
        Start timestamp for the query.
        If None, the minimum int64 value is used.

    end_time : Optional[int], default None
        End timestamp for the query.
        If None, the maximum int64 value is used.

    max_row_num : Optional[int], default None
        Maximum number of rows to read.
        - If None, all available data will be returned.
        - When `as_iterator` is False, the DataFrame will be
          truncated if the result exceeds this number of rows.

    as_iterator : bool, default False
        Whether to return an iterator of DataFrames instead of a single concatenated DataFrame.
        - True: returns an iterator yielding DataFrames in batches
        - False: returns a single Pandas DataFrame

    Returns
    -------
    Union[pandas.DataFrame, Iterator[pandas.DataFrame]]
        - A Pandas DataFrame when `as_iterator` is False
        - An iterator of Pandas DataFrames when `as_iterator` is True

    Raises
    ------
    TableNotExistError
        Raised when the specified table name does not exist in a table-model TsFile.

    ColumnNotExistError
        Raised when a specified column does not exist in the table schema.
    """

```

9 changes: 9 additions & 0 deletions src/zh/UserGuide/develop/QuickStart/QuickStart-PYTHON.md
@@ -149,6 +149,15 @@ with TsFileReader(table_data_dir) as reader:
print(result.read_data_frame())
```

Use `to_dataframe` to read a TsFile into a DataFrame.

```Python
import os
import tsfile as ts
table_data_dir = os.path.join(os.path.dirname(__file__), "table_data.tsfile")
print(ts.to_dataframe(table_data_dir))
```

## Sample Code

Sample code that uses these interfaces is available at: https://github.com/apache/tsfile/blob/develop/python/examples/example.py
@@ -33,6 +33,9 @@ class TSDataType(IntEnum):
FLOAT = 3
DOUBLE = 4
TEXT = 5
TIMESTAMP = 8
DATE = 9
BLOB = 10
STRING = 11

class ColumnCategory(IntEnum):
@@ -262,3 +265,72 @@ class ResultSet:

```


### to_dataframe

```Python

def to_dataframe(file_path: str,
                 table_name: Optional[str] = None,
                 column_names: Optional[list[str]] = None,
                 start_time: Optional[int] = None,
                 end_time: Optional[int] = None,
                 max_row_num: Optional[int] = None,
                 as_iterator: bool = False) -> Union[pd.DataFrame, Iterator[pd.DataFrame]]:
    """
    Read data from a TsFile and convert it into a Pandas DataFrame
    or an iterator of DataFrames.

    This function supports both table-model and tree-model TsFiles.
    Users can filter data by table name, column names, time range,
    and maximum number of rows.

    Parameters
    ----------
    file_path : str
        Path to the TsFile to be read.

    table_name : Optional[str], default None
        Name of the table to query in table-model TsFiles.
        If None and the file is in table model,
        the first table found in the schema will be used.

    column_names : Optional[list[str]], default None
        List of column/measurement names to query.
        - If None, all columns will be returned.
        - Column existence will be validated in table-model TsFiles.

    start_time : Optional[int], default None
        Start timestamp for the query.
        If None, the minimum int64 value is used.

    end_time : Optional[int], default None
        End timestamp for the query.
        If None, the maximum int64 value is used.

    max_row_num : Optional[int], default None
        Maximum number of rows to read.
        - If None, all available data will be returned.
        - When `as_iterator` is False, the DataFrame will be
          truncated if the result exceeds this number of rows.

    as_iterator : bool, default False
        Whether to return an iterator of DataFrames instead of a single concatenated DataFrame.
        - True: returns an iterator yielding DataFrames in batches
        - False: returns a single Pandas DataFrame

    Returns
    -------
    Union[pandas.DataFrame, Iterator[pandas.DataFrame]]
        - A Pandas DataFrame when `as_iterator` is False
        - An iterator of Pandas DataFrames when `as_iterator` is True

    Raises
    ------
    TableNotExistError
        Raised when the specified table name does not exist in a table-model TsFile.

    ColumnNotExistError
        Raised when a specified column does not exist in the table schema.
    """

```