File Sync & Archive to S3 Storage
Overview
File Synchronization and Archival
Sync directories, archive old files, and manage large file collections using S3 storage. Perfect for document management, media archives, and file distribution.
Prerequisites
Before You Start
- S3 storage configured with valid credentials
- AWS CLI installed and configured
- Sufficient bandwidth for large file transfers
- Clear understanding of sync vs backup operations
Directory Synchronization
Basic Directory Sync
Simple Directory Sync Script
Create sync_directory.sh:
#!/bin/bash
# Configuration
LOCAL_DIR="/home/user/Documents"
S3_BUCKET="file-sync"
S3_PREFIX="documents"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Syncing $LOCAL_DIR to S3..."
# Sync local directory to S3
aws s3 sync "$LOCAL_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com \
--delete
echo "Sync completed"
Bidirectional Sync
Two-Way Directory Sync
Create bidirectional_sync.sh:
#!/bin/bash
# Configuration
LOCAL_DIR="/home/user/SharedFiles"
S3_BUCKET="file-sync"
S3_PREFIX="shared"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Starting bidirectional sync..."
# First, sync S3 to local (download new/changed files)
echo "Downloading changes from S3..."
aws s3 sync "s3://$S3_BUCKET/$S3_PREFIX/" "$LOCAL_DIR" \
--endpoint-url https://eu-west-1.euronodes.com
# Then, sync local to S3 (upload new/changed files)
echo "Uploading changes to S3..."
aws s3 sync "$LOCAL_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com
echo "Bidirectional sync completed"
Selective Sync with Filters
Filtered Sync Script
Create filtered_sync.sh:
#!/bin/bash
# Configuration
LOCAL_DIR="/home/user/Projects"
S3_BUCKET="project-sync"
S3_PREFIX="projects"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Syncing projects with filters..."
# Sync with exclusions and inclusions
aws s3 sync "$LOCAL_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com \
--exclude "*" \
--include "*.pdf" \
--include "*.docx" \
--include "*.xlsx" \
--include "*.pptx" \
--exclude "*/node_modules/*" \
--exclude "*/.git/*" \
--exclude "*/build/*" \
--exclude "*/dist/*"
echo "Filtered sync completed"
Media Archive Solutions
Photo Archive Script
Photo Collection Archive
Create photo_archive.sh:
#!/bin/bash
# Configuration
PHOTO_DIR="/home/user/Photos"
S3_BUCKET="media-archive"
S3_PREFIX="photos"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Archiving photos to S3..."
# Sync photos, skipping OS metadata files
aws s3 sync "$PHOTO_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com \
--exclude "*.DS_Store" \
--exclude "Thumbs.db"
# Also copy each photo into a by-year prefix based on its modification year
find "$PHOTO_DIR" -type f \( -name "*.jpg" -o -name "*.png" -o -name "*.raw" \) -print0 | \
while IFS= read -r -d '' file; do
file_year=$(date -d "@$(stat -c %Y "$file")" +%Y)
relative_path=${file#$PHOTO_DIR/}
echo "Archiving $relative_path into year $file_year..."
aws s3 cp "$file" "s3://$S3_BUCKET/$S3_PREFIX/by-year/$file_year/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
done
echo "Photo archive completed"
Video Archive Script
Video Collection Archive
Create video_archive.sh:
#!/bin/bash
# Configuration
VIDEO_DIR="/home/user/Videos"
S3_BUCKET="media-archive"
S3_PREFIX="videos"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Archiving videos to S3..."
# Use multipart upload for large video files
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB
aws configure set default.s3.max_concurrent_requests 10
# Sync videos
aws s3 sync "$VIDEO_DIR" "s3://$S3_BUCKET/$S3_PREFIX/" \
--endpoint-url https://eu-west-1.euronodes.com \
--exclude "*.tmp" \
--exclude "*.part"
echo "Video archive completed"
Document Management
Document Archive by Type
Organize Documents by Type
Create document_archive.sh:
#!/bin/bash
# Configuration
DOC_DIR="/home/user/Documents"
S3_BUCKET="document-archive"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Archiving documents by type..."
# Archive PDFs
find "$DOC_DIR" -name "*.pdf" -type f | while read file; do
relative_path=${file#$DOC_DIR/}
aws s3 cp "$file" "s3://$S3_BUCKET/pdfs/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
done
# Archive Office documents
find "$DOC_DIR" \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" \) -type f | while read file; do
relative_path=${file#$DOC_DIR/}
extension="${file##*.}"
aws s3 cp "$file" "s3://$S3_BUCKET/office/$extension/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
done
# Archive text files
find "$DOC_DIR" \( -name "*.txt" -o -name "*.md" -o -name "*.rtf" \) -type f | while read file; do
relative_path=${file#$DOC_DIR/}
aws s3 cp "$file" "s3://$S3_BUCKET/text/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
done
echo "Document archive by type completed"
Automated Document Workflow
Document Processing Pipeline
Create document_workflow.sh:
#!/bin/bash
# Configuration
INBOX_DIR="/home/user/DocumentInbox"
PROCESSED_DIR="/home/user/ProcessedDocuments"
S3_BUCKET="document-workflow"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Processing document workflow..."
# Process new documents in inbox
for file in "$INBOX_DIR"/*; do
if [ -f "$file" ]; then
filename=$(basename "$file")
extension="${filename##*.}"
date_prefix=$(date +%Y%m%d)
# Create processed filename
processed_name="${date_prefix}_${filename}"
# Move to processed directory
mv "$file" "$PROCESSED_DIR/$processed_name"
# Upload to S3 with date organization
aws s3 cp "$PROCESSED_DIR/$processed_name" \
"s3://$S3_BUCKET/$(date +%Y)/$(date +%m)/$processed_name" \
--endpoint-url https://eu-west-1.euronodes.com
echo "Processed: $filename -> $processed_name"
fi
done
echo "Document workflow completed"
Log File Management
Log Archive Script
Archive Old Log Files
Create log_archive.sh:
#!/bin/bash
# Configuration
LOG_DIR="/var/log"
S3_BUCKET="log-archive"
ARCHIVE_DAYS=7
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Archiving old log files..."
# Find log files older than specified days
find "$LOG_DIR" -name "*.log" -mtime +$ARCHIVE_DAYS -type f | while read logfile; do
# Compress log file
gzip "$logfile"
compressed_file="${logfile}.gz"
# Create S3 path with date structure
relative_path=${compressed_file#$LOG_DIR/}
archive_date=$(date -r "$compressed_file" +%Y/%m/%d)
# Upload to S3
aws s3 cp "$compressed_file" \
"s3://$S3_BUCKET/$archive_date/$relative_path" \
--endpoint-url https://eu-west-1.euronodes.com
# Remove local compressed file after successful upload
if [ $? -eq 0 ]; then
rm "$compressed_file"
echo "Archived and removed: $compressed_file"
fi
done
echo "Log archive completed"
Application Log Rotation
Rotate and Archive Application Logs
Create app_log_rotation.sh:
#!/bin/bash
# Configuration
APP_NAME="myapp"
LOG_FILE="/var/log/$APP_NAME.log"
S3_BUCKET="app-logs"
MAX_SIZE="100M"
# S3 Configuration
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
echo "Checking log rotation for $APP_NAME..."
# Check if log file exceeds maximum size
if [ -f "$LOG_FILE" ]; then
file_size=$(stat -c%s "$LOG_FILE")
max_bytes=$(( ${MAX_SIZE%M} * 1024 * 1024 ))
if [ $file_size -gt $max_bytes ]; then
echo "Log file exceeds $MAX_SIZE, rotating..."
# Create rotated filename with timestamp
rotated_name="${APP_NAME}_$(date +%Y%m%d_%H%M%S).log"
# Move current log to rotated name
mv "$LOG_FILE" "/tmp/$rotated_name"
# Compress rotated log
gzip "/tmp/$rotated_name"
# Upload to S3
aws s3 cp "/tmp/${rotated_name}.gz" \
"s3://$S3_BUCKET/$APP_NAME/$(date +%Y/%m)/${rotated_name}.gz" \
--endpoint-url https://eu-west-1.euronodes.com
# Clean up
rm "/tmp/${rotated_name}.gz"
# Create new empty log file
touch "$LOG_FILE"
chown app:app "$LOG_FILE"  # adjust to the user/group the application runs as
echo "Log rotation completed: ${rotated_name}.gz"
else
echo "Log file size OK, no rotation needed"
fi
fi
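Note that if the application keeps its log file open, mv leaves it writing to the renamed file rather than the new one. A copy-and-truncate variant (a sketch; lines written during the copy can be lost) keeps the original file in place:
cp "$LOG_FILE" "/tmp/$rotated_name"
: > "$LOG_FILE"   # truncate in place; the application keeps writing to the same file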
Automated Scheduling
Cron Jobs for File Operations
Schedule File Sync and Archive
# Edit crontab
crontab -e
# Sync documents every 4 hours
0 */4 * * * /path/to/sync_directory.sh >> /var/log/file_sync.log 2>&1
# Archive photos weekly on Sundays at 3 AM
0 3 * * 0 /path/to/photo_archive.sh >> /var/log/photo_archive.log 2>&1
# Archive logs daily at 1 AM
0 1 * * * /path/to/log_archive.sh >> /var/log/log_archive.log 2>&1
# Process document workflow every hour
0 * * * * /path/to/document_workflow.sh >> /var/log/doc_workflow.log 2>&1
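If a sync can take longer than its interval, wrapping the job in flock (part of util-linux) prevents overlapping runs, for example:
0 */4 * * * flock -n /tmp/file_sync.lock /path/to/sync_directory.sh >> /var/log/file_sync.log 2>&1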
Monitoring Sync Status
Monitor File Sync Operations
Create sync_monitor.sh:
#!/bin/bash
# Configuration - bucket/prefix pairs used by the sync scripts above
SYNC_TARGETS=("file-sync/documents" "media-archive/photos" "project-sync/projects")
echo "Monitoring sync status..."
for target in "${SYNC_TARGETS[@]}"; do
echo "Checking $target..."
# Count objects under this prefix
s3_count=$(aws s3 ls "s3://$target/" --recursive \
--endpoint-url https://eu-west-1.euronodes.com | wc -l)
echo "$target: $s3_count objects in S3"
# Timestamp of the most recently modified object (approximates the last sync)
last_sync=$(aws s3 ls "s3://$target/" --recursive \
--endpoint-url https://eu-west-1.euronodes.com | \
sort | tail -1 | awk '{print $1" "$2}')
echo "Most recent object: $last_sync"
done
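For object counts and total size in one step, aws s3 ls can summarize a prefix directly, for example for the documents target:
aws s3 ls "s3://file-sync/documents/" --recursive --summarize --human-readable \
--endpoint-url https://eu-west-1.euronodes.com | tail -2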
FAQ
What's the difference between sync and backup?
Sync makes the destination mirror the source: with --delete, files removed from the source are also removed from the destination. A backup preserves historical versions, so deleted or changed files can still be restored.
How do I handle large files efficiently?
Use multipart uploads, increase chunk sizes, and consider compression for text-based files.
Can I sync between multiple computers?
Yes, use S3 as the central hub and sync from multiple locations to the same bucket.
How do I handle file conflicts in bidirectional sync?
aws s3 sync transfers a file when its size differs or the source copy is newer, so whichever side changed last wins and the other version is overwritten. Consider using version control for critical files.
What about bandwidth usage?
Monitor your usage and consider scheduling large syncs during off-peak hours.
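The AWS CLI can also cap its own transfer rate via the max_bandwidth setting; the value below is illustrative:
aws configure set default.s3.max_bandwidth 50MB/s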
Contact Support
Need Help?
- Sync Issues: Open support ticket through client portal
- Performance Problems: Include file sizes and transfer speeds
- Automation Help: Specify your sync requirements and schedule
For S3 setup, see S3 Configuration | For backup solutions, see Restic Backups