File Sync & Archive to S3 Storage
Overview
File Synchronization and Archival
Sync directories, archive old files, and manage large file collections using S3 storage. Perfect for document management, media archives, and file distribution.
Prerequisites
Before You Start
- S3 storage configured with valid credentials
- AWS CLI installed and configured
- Sufficient bandwidth for large file transfers
- Clear understanding of sync vs backup operations
Directory Synchronization
Basic Directory Sync
Simple Directory Sync Script
Create sync_directory.sh:
#!/bin/bash
# Configuration
LOCAL_DIR="/home/user/Documents"
S3_BUCKET="file-sync"
S3_PREFIX="documents"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Syncing $LOCAL_DIR to S3..."
# Sync local directory to S3
aws s3 sync "$LOCAL_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com \
--delete
echo "Sync completed"
Bidirectional Sync
Two-Way Directory Sync
Create bidirectional_sync.sh:
#!/bin/bash
# Configuration
LOCAL_DIR="/home/user/SharedFiles"
S3_BUCKET="file-sync"
S3_PREFIX="shared"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Starting bidirectional sync..."
# First, sync S3 to local (download new/changed files)
echo "Downloading changes from S3..."
aws s3 sync "s3://$S3_BUCKET/$S3_PREFIX/" "$LOCAL_DIR" \
--endpoint-url https://eu-west-1.euronodes.com
# Then, sync local to S3 (upload new/changed files)
echo "Uploading changes to S3..."
aws s3 sync "$LOCAL_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com
echo "Bidirectional sync completed"
Selective Sync with Filters
Filtered Sync Script
Create filtered_sync.sh:
#!/bin/bash
# Configuration
LOCAL_DIR="/home/user/Projects"
S3_BUCKET="project-sync"
S3_PREFIX="projects"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Syncing projects with filters..."
# Sync with exclusions and inclusions
aws s3 sync "$LOCAL_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com \
--exclude "*" \
--include "*.pdf" \
--include "*.docx" \
--include "*.xlsx" \
--include "*.pptx" \
--exclude "*/node_modules/*" \
--exclude "*/.git/*" \
--exclude "*/build/*" \
--exclude "*/dist/*"
echo "Filtered sync completed"
Media Archive Solutions
Photo Archive Script
Photo Collection Archive
Create photo_archive.sh:
#!/bin/bash
# Configuration
PHOTO_DIR="/home/user/Photos"
S3_BUCKET="media-archive"
S3_PREFIX="photos"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Archiving photos to S3..."
# Sync photos, skipping OS metadata files
aws s3 sync "$PHOTO_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com \
--exclude "*.DS_Store" \
--exclude "Thumbs.db"
# Also copy each photo into a by-year prefix based on its modification year
find "$PHOTO_DIR" -type f \( -name "*.jpg" -o -name "*.png" -o -name "*.raw" \) -print0 | \
while IFS= read -r -d '' file; do
file_year=$(date -d "@$(stat -c %Y "$file")" +%Y)
relative_path=${file#$PHOTO_DIR/}
echo "Archiving $relative_path into year $file_year..."
aws s3 cp "$file" "s3://$S3_BUCKET/$S3_PREFIX/by-year/$file_year/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
done
echo "Photo archive completed"
Video Archive Script
Video Collection Archive
Create video_archive.sh:
#!/bin/bash
# Configuration
VIDEO_DIR="/home/user/Videos"
S3_BUCKET="media-archive"
S3_PREFIX="videos"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Archiving videos to S3..."
# Use multipart upload for large video files
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB
aws configure set default.s3.max_concurrent_requests 10
# Sync videos
aws s3 sync "$VIDEO_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com \
--exclude "*.tmp" \
--exclude "*.part"
echo "Video archive completed"
Document Management
Document Archive by Type
Organize Documents by Type
Create document_archive.sh:
#!/bin/bash
# Configuration
DOC_DIR="/home/user/Documents"
S3_BUCKET="document-archive"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Archiving documents by type..."
# Archive PDFs
find "$DOC_DIR" -name "*.pdf" -type f | while read file; do
relative_path=${file#$DOC_DIR/}
aws s3 cp "$file" "s3://$S3_BUCKET/pdfs/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
done
# Archive Office documents
find "$DOC_DIR" \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" \) -type f | while read file; do
relative_path=${file#$DOC_DIR/}
extension="${file##*.}"
aws s3 cp "$file" "s3://$S3_BUCKET/office/$extension/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
done
# Archive text files
find "$DOC_DIR" \( -name "*.txt" -o -name "*.md" -o -name "*.rtf" \) -type f | while read file; do
relative_path=${file#$DOC_DIR/}
aws s3 cp "$file" "s3://$S3_BUCKET/text/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
done
echo "Document archive by type completed"
Automated Document Workflow
Document Processing Pipeline
Create document_workflow.sh:
#!/bin/bash
# Configuration
INBOX_DIR="/home/user/DocumentInbox"
PROCESSED_DIR="/home/user/ProcessedDocuments"
S3_BUCKET="document-workflow"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Processing document workflow..."
# Process new documents in inbox
for file in "$INBOX_DIR"/*; do
if [ -f "$file" ]; then
filename=$(basename "$file")
extension="${filename##*.}"
date_prefix=$(date +%Y%m%d)
# Create processed filename
processed_name="${date_prefix}_${filename}"
# Move to processed directory
mv "$file" "$PROCESSED_DIR/$processed_name"
# Upload to S3 with date organization
aws s3 cp "$PROCESSED_DIR/$processed_name" \
"s3://$S3_BUCKET/$(date +%Y)/$(date +%m)/$processed_name" \
--endpoint-url https://eu-west-1.euronodes.com
echo "Processed: $filename -> $processed_name"
fi
done
echo "Document workflow completed"
Log File Management
Log Archive Script
Archive Old Log Files
Create log_archive.sh:
#!/bin/bash
# Configuration
LOG_DIR="/var/log"
S3_BUCKET="log-archive"
ARCHIVE_DAYS=7
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Archiving old log files..."
# Find log files older than specified days
find "$LOG_DIR" -name "*.log" -mtime +$ARCHIVE_DAYS -type f | while read logfile; do
# Compress log file
gzip "$logfile"
compressed_file="${logfile}.gz"
# Create S3 path with date structure
relative_path=${compressed_file#$LOG_DIR/}
archive_date=$(date -r "$compressed_file" +%Y/%m/%d)
# Upload to S3
aws s3 cp "$compressed_file" \
"s3://$S3_BUCKET/$archive_date/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
# Remove local compressed file after successful upload
if [ $? -eq 0 ]; then
rm "$compressed_file"
echo "Archived and removed: $compressed_file"
fi
done
echo "Log archive completed"
Application Log Rotation
Rotate and Archive Application Logs
Create app_log_rotation.sh:
#!/bin/bash
# Configuration
APP_NAME="myapp"
LOG_FILE="/var/log/$APP_NAME.log"
S3_BUCKET="app-logs"
MAX_SIZE="100M"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Checking log rotation for $APP_NAME..."
# Check if log file exceeds maximum size
if [ -f "$LOG_FILE" ]; then
file_size=$(stat -c%s "$LOG_FILE")
max_bytes=$(( ${MAX_SIZE%M} * 1024 * 1024 ))
if [ $file_size -gt $max_bytes ]; then
echo "Log file exceeds $MAX_SIZE, rotating..."
# Create rotated filename with timestamp
rotated_name="${APP_NAME}_$(date +%Y%m%d_%H%M%S).log"
# Move current log to rotated name
mv "$LOG_FILE" "/tmp/$rotated_name"
# Compress rotated log
gzip "/tmp/$rotated_name"
# Upload to S3
aws s3 cp "/tmp/${rotated_name}.gz" \
"s3://$S3_BUCKET/$APP_NAME/$(date +%Y/%m)/${rotated_name}.gz" \
--endpoint-url https://eu-west-1.euronodes.com
# Clean up
rm "/tmp/${rotated_name}.gz"
# Create new empty log file
touch "$LOG_FILE"
chown app:app "$LOG_FILE"  # adjust to the user/group the application runs as
echo "Log rotation completed: ${rotated_name}.gz"
else
echo "Log file size OK, no rotation needed"
fi
fi
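Note that if the application keeps its log file open, mv leaves it writing to the renamed file rather than the new one. A copy-and-truncate variant (a sketch; lines written during the copy can be lost) keeps the original file in place:
cp "$LOG_FILE" "/tmp/$rotated_name"
: > "$LOG_FILE"   # truncate in place; the application keeps writing to the same file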
Automated Scheduling
Cron Jobs for File Operations
Schedule File Sync and Archive
# Edit crontab
crontab -e
# Sync documents every 4 hours
0 */4 * * * /path/to/sync_directory.sh >> /var/log/file_sync.log 2>&1
# Archive photos weekly on Sundays at 3 AM
0 3 * * 0 /path/to/photo_archive.sh >> /var/log/photo_archive.log 2>&1
# Archive logs daily at 1 AM
0 1 * * * /path/to/log_archive.sh >> /var/log/log_archive.log 2>&1
# Process document workflow every hour
0 * * * * /path/to/document_workflow.sh >> /var/log/doc_workflow.log 2>&1
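If a sync can take longer than its interval, wrapping the job in flock (part of util-linux) prevents overlapping runs, for example:
0 */4 * * * flock -n /tmp/file_sync.lock /path/to/sync_directory.sh >> /var/log/file_sync.log 2>&1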
Monitoring Sync Status
Monitor File Sync Operations
Create sync_monitor.sh:
#!/bin/bash
# Configuration - bucket/prefix pairs used by the sync scripts above
SYNC_TARGETS=("file-sync/documents" "media-archive/photos" "project-sync/projects")
echo "Monitoring sync status..."
for target in "${SYNC_TARGETS[@]}"; do
echo "Checking $target..."
# Count objects under this prefix
s3_count=$(aws s3 ls "s3://$target/" --recursive \
--endpoint-url https://eu-west-1.euronodes.com | wc -l)
echo "$target: $s3_count objects in S3"
# Timestamp of the most recently modified object (approximates the last sync)
last_sync=$(aws s3 ls "s3://$target/" --recursive \
--endpoint-url https://eu-west-1.euronodes.com | \
sort | tail -1 | awk '{print $1" "$2}')
echo "Most recent object: $last_sync"
done
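For object counts and total size in one step, aws s3 ls can summarize a prefix directly, for example for the documents target:
aws s3 ls "s3://file-sync/documents/" --recursive --summarize --human-readable \
--endpoint-url https://eu-west-1.euronodes.com | tail -2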
FAQ
What's the difference between sync and backup?
Sync makes the destination mirror the source: with --delete, files removed from the source are also removed from the destination. A backup preserves historical versions, so deleted or changed files can still be restored.
How do I handle large files efficiently?
Use multipart uploads, increase chunk sizes, and consider compression for text-based files.
Can I sync between multiple computers?
Yes, use S3 as the central hub and sync from multiple locations to the same bucket.
How do I handle file conflicts in bidirectional sync?
aws s3 sync transfers a file when its size differs or the source copy is newer, so whichever side changed last wins and the other version is overwritten. Consider using version control for critical files.
What about bandwidth usage?
Monitor your usage and consider scheduling large syncs during off-peak hours.
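The AWS CLI can also cap its own transfer rate via the max_bandwidth setting; the value below is illustrative:
aws configure set default.s3.max_bandwidth 50MB/s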
Contact Support
Need Help?
- Sync Issues: Open support ticket through client portal
- Performance Problems: Include file sizes and transfer speeds
- Automation Help: Specify your sync requirements and schedule
For S3 setup, see S3 Configuration | For backup solutions, see Restic Backups