Merged
19 changes: 13 additions & 6 deletions hivemind_etl/storage/s3_client.py
@@ -46,18 +46,25 @@ def __init__(self):
             f"Initializing S3 client with bucket: {self.bucket_name}, region: {self.region}"
         )

-        # Configure S3 client
-        config = Config(
-            signature_version="s3v4",
-            region_name=self.region,
-        )
+        # a region-agnostic client (no region_name) always works for GetBucketLocation
+        self.s3_client = boto3.client(
+            "s3",
+            aws_access_key_id=self.access_key,
+            aws_secret_access_key=self.secret_key,
+            config=Config(signature_version="s3v4"),
+        )
+
+        resp = self.s3_client.get_bucket_location(Bucket=self.bucket_name)
+        self.bucket_region = resp["LocationConstraint"] or "us-east-1"

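Worth noting why the added line ends with `or "us-east-1"`: GetBucketLocation returns an empty `LocationConstraint` for buckets in us-east-1, and very old buckets can report the legacy token `EU` instead of `eu-west-1`. A minimal, boto3-free sketch of the translation (the helper name and the `EU` handling are illustrative additions, not part of this PR):

```python
# Illustrative helper (not part of the PR): normalize GetBucketLocation output.
# S3 returns None/"" for buckets in us-east-1, and legacy buckets may report
# "EU" for eu-west-1; the PR's `or "us-east-1"` covers only the first case.
LEGACY_LOCATIONS = {None: "us-east-1", "": "us-east-1", "EU": "eu-west-1"}

def resolve_bucket_region(location_constraint):
    # pass normal region strings through unchanged
    return LEGACY_LOCATIONS.get(location_constraint, location_constraint)

print(resolve_bucket_region(None))  # prints us-east-1
```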
Comment on lines +49 to +59

🛠️ Refactor suggestion

Good approach for dynamic region detection!

Using a region-agnostic client to determine the bucket's actual region is a good improvement. However, the `get_bucket_location` call can fail (for example, on missing permissions or a nonexistent bucket) and should be wrapped in error handling.

         # a region-agnostic client (no region_name) always works for GetBucketLocation
         self.s3_client = boto3.client(
             "s3",
             aws_access_key_id=self.access_key,
             aws_secret_access_key=self.secret_key,
             config=Config(signature_version="s3v4"),
         )

-        resp = self.s3_client.get_bucket_location(Bucket=self.bucket_name)
-        self.bucket_region = resp["LocationConstraint"] or "us-east-1"
+        try:
+            resp = self.s3_client.get_bucket_location(Bucket=self.bucket_name)
+            self.bucket_region = resp["LocationConstraint"] or "us-east-1"
+            logging.info(f"Detected bucket region: {self.bucket_region}")
+        except ClientError as e:
+            logging.warning(f"Could not determine bucket region: {str(e)}. Falling back to configured region.")
+            self.bucket_region = self.region
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (resulting code for the highlighted lines):

        # a region-agnostic client (no region_name) always works for GetBucketLocation
        self.s3_client = boto3.client(
            "s3",
            aws_access_key_id=self.access_key,
            aws_secret_access_key=self.secret_key,
            config=Config(signature_version="s3v4"),
        )
        try:
            resp = self.s3_client.get_bucket_location(Bucket=self.bucket_name)
            self.bucket_region = resp["LocationConstraint"] or "us-east-1"
            logging.info(f"Detected bucket region: {self.bucket_region}")
        except ClientError as e:
            logging.warning(f"Could not determine bucket region: {str(e)}. Falling back to configured region.")
            self.bucket_region = self.region
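One prerequisite the suggestion leaves implicit: `ClientError` is not a builtin, so `s3_client.py` must import it from `botocore.exceptions` or the except clause itself will raise `NameError`. A sketch (the stub fallback exists only so this snippet runs without botocore installed; the real module ships with boto3, so it should import unconditionally):

```python
try:
    # what s3_client.py needs at module top for the except clause to resolve
    from botocore.exceptions import ClientError
except ImportError:
    # stand-in so this sketch is self-contained without botocore installed;
    # do not keep this fallback in the actual module
    class ClientError(Exception):
        pass
```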
🤖 Prompt for AI Agents
In hivemind_etl/storage/s3_client.py around lines 49 to 59, the
get_bucket_location call on the region-agnostic S3 client may raise exceptions
that are not currently handled. Add a try-except block around the
get_bucket_location call to catch potential errors, log or handle them
appropriately, and ensure the program can recover or fail gracefully if the
bucket location cannot be determined.
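The fallback behavior this prompt asks for can be exercised without touching AWS by faking the client. A hedged sketch (the names `StubS3` and `detect_bucket_region` are illustrative, not from the PR):

```python
class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError in this sketch."""

class StubS3:
    def get_bucket_location(self, Bucket):
        # simulate a permissions failure on GetBucketLocation
        raise ClientError("AccessDenied")

def detect_bucket_region(client, bucket, configured_region):
    # mirrors the suggested try/except: fall back to the configured region
    try:
        resp = client.get_bucket_location(Bucket=bucket)
        return resp["LocationConstraint"] or "us-east-1"
    except ClientError:
        return configured_region

print(detect_bucket_region(StubS3(), "my-bucket", "eu-central-1"))  # prints eu-central-1
```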

+        logging.info(f"Bucket region: {self.bucket_region}!")
+
         self.s3_client = boto3.client(
             "s3",
             # endpoint_url=self.endpoint_url,
+            region_name=self.bucket_region,
             aws_access_key_id=self.access_key,
             aws_secret_access_key=self.secret_key,
-            config=config,
+            config=Config(signature_version="s3v4"),
         )

         # Ensure bucket exists
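For context on the two-client construction in this hunk: an alternative some codebases use is `head_bucket`, since S3 reports the bucket's region in the `x-amz-bucket-region` response header, even on 301/403 error responses. A boto3-free sketch of pulling it from a boto3-shaped response dict (the sample response is fabricated for illustration):

```python
def region_from_head_bucket(resp):
    # boto3 exposes raw HTTP headers under ResponseMetadata -> HTTPHeaders
    headers = resp.get("ResponseMetadata", {}).get("HTTPHeaders", {})
    return headers.get("x-amz-bucket-region")

sample = {"ResponseMetadata": {"HTTPHeaders": {"x-amz-bucket-region": "ap-south-1"}}}
print(region_from_head_bucket(sample))  # prints ap-south-1
```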