tpcbench.py add --query support to run custom query #84
zhangxffff wants to merge 1 commit into apache:main from
test in local environment
jazracherif
left a comment
Hello,
Thank you for this PR!
I'm new to this project and ran into the exact issue this PR handles. I was going to submit a PR myself until I found this one, so I'm just sharing some thoughts on it.
    if (args.qnum != -1 and args.query is not None):
        print("Please specify either --qnum or --query, but not both")
        exit(1)

    queries = []
    if (args.qnum != -1):
        if args.qnum < 1 or args.qnum > 22:
            print("Invalid query number. Please specify a number between 1 and 22.")
            exit(1)
        else:
            queries.append((str(args.qnum), tpch_query(args.qnum)))
            print("Executing tpch query ", args.qnum)

    elif (args.query is not None):
        queries.append(("custom query", args.query))
        print("Executing custom query: ", args.query)
    else:
        print("Executing all tpch queries")
        queries = [(str(i), tpch_query(i)) for i in range(1, 23)]
Minor suggestion: extract this into its own function, for example:
    from typing import List, Optional, Tuple

    def get_sql_queries(tpch_qnum: Optional[int] = None, sql_statement: Optional[str] = None) -> List[Tuple[str, str]]:
        """
        Get the list of SQL statements from either the TPCH queries or a user-provided SQL statement.
        At most one of these parameters can be provided.
        :param tpch_qnum: the TPCH query number. If None, return all supported TPCH queries
        :param sql_statement: SQL statement string on available data tables (e.g. ingested through make_data.py)
        :return: a list of tuples with the name of the query and the SQL statement string
        """

    parser.add_argument("--qnum", type=int, default=-1,
                        help="TPCH query number, 1-22")
    parser.add_argument("--query", required=False, type=str,
                        help="Custom query to run with tpch tables")
    - help="Custom query to run with tpch tables")
    + help="Custom SQL query statement to run with tpch tables")
            print("Invalid query number. Please specify a number between 1 and 22.")
            exit(1)
        else:
            queries.append((str(args.qnum), tpch_query(args.qnum)))
Explicitly mention TPCH in the id:
    - queries.append((str(args.qnum), tpch_query(args.qnum)))
    + queries.append((f"TPCH-{args.qnum}", tpch_query(args.qnum)))
        print("Executing custom query: ", args.query)
    else:
        print("Executing all tpch queries")
        queries = [(str(i), tpch_query(i)) for i in range(1, 23)]
    - queries = [(str(i), tpch_query(i)) for i in range(1, 23)]
    + queries = [(f"TPCH-{i}", tpch_query(i)) for i in range(1, 23)]
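Pulling the suggestions above together, the helper could look roughly like the sketch below. It is only one possible shape, not the PR's code: `tpch_query` is stubbed here with placeholder SQL so the block runs on its own, the helper raises `ValueError` instead of printing and calling `exit(1)`, and it uses `None` rather than `-1` to mean "no query number given". All three choices are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def tpch_query(qnum: int) -> str:
    """Stand-in for the benchmark's own tpch_query(); returns placeholder SQL."""
    return f"-- placeholder for TPCH query {qnum}"


def get_sql_queries(
    tpch_qnum: Optional[int] = None,
    sql_statement: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (name, sql) pairs for one TPCH query, a custom query, or all TPCH queries."""
    # The two selectors are mutually exclusive.
    if tpch_qnum is not None and sql_statement is not None:
        raise ValueError("Please specify either --qnum or --query, but not both")

    if sql_statement is not None:
        return [("custom query", sql_statement)]

    if tpch_qnum is not None:
        if not 1 <= tpch_qnum <= 22:
            raise ValueError(
                "Invalid query number. Please specify a number between 1 and 22."
            )
        return [(f"TPCH-{tpch_qnum}", tpch_query(tpch_qnum))]

    # Neither selector given: run every TPCH query.
    return [(f"TPCH-{i}", tpch_query(i)) for i in range(1, 23)]
```

With the current `default=-1` on `--qnum`, the caller would need to translate the sentinel, for example `get_sql_queries(None if args.qnum == -1 else args.qnum, args.query)`.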
    ```bash
    - RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tpc.py --data=file:///path/to/your/tpch/directory/ --concurrency=2 --batch-size=8182 --worker-pool-min=10 --qnum 2
    + RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tpcbench.py --data=file:///path/to/your/tpch/directory/ --concurrency=2 --batch-size=8182 --worker-pool-min=10 --qnum 2
I would recommend standardizing the data file directory to testdata/tpch and adding the correct make_data.py command just above, for example:
    - RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tpcbench.py --data=file:///path/to/your/tpch/directory/ --concurrency=2 --batch-size=8182 --worker-pool-min=10 --qnum 2
    + RAY_COLOR_PREFIX=1 RAY_DEDUP_LOGS=0 python tpcbench.py --data=../testdata/tpch --concurrency=2 --batch-size=8182 --worker-pool-min=10 --qnum 2
Before this, add more documentation on make_data.py:
- In the `tpch` directory, use `make_data.py` to create a TPCH dataset at a provided scale factor and an output directory, such as the `testdata` directory:

    python make_data.py 1 "../testdata/tpch"

You could also specify an environment variable for this in the setup:

    TPCH_DATA=../testdata/tpch

and replace the paths in the examples with $TPCH_DATA.
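If the script itself were also to honor `TPCH_DATA` (an assumption; nothing in this PR does that, and the suggestion above only concerns the documentation), the lookup order could be sketched as follows. `resolve_data_path` is a hypothetical helper name, not part of tpcbench.py.

```python
import os
from typing import Mapping, Optional


def resolve_data_path(
    cli_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the data directory: explicit --data value, then TPCH_DATA, then the default."""
    if cli_value:
        return cli_value
    # Fall back to the directory suggested in this review.
    return env.get("TPCH_DATA", "../testdata/tpch")
```

An explicit `--data` value still wins, so existing command lines would keep working unchanged.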
`--query` argument support for tpcbench.py to run custom query with tpch tables.
docs/contributing.md