NeoGuardianAI

An advanced machine learning model for detecting phishing URLs with high accuracy

Protecting Users from Phishing Attacks

NeoGuardianAI is a sophisticated machine learning model designed to identify and flag potentially dangerous phishing URLs. With the rise in cyber threats, this tool serves as a digital guardian, helping users navigate the web safely.

The model analyzes various features of a URL to determine if it's legitimate or a phishing attempt, providing real-time protection against cyber threats.

Key Features

  • High accuracy (96.31%) in detecting phishing URLs
  • Analyzes multiple URL characteristics for comprehensive detection
  • Accessible through Hugging Face Spaces and API
  • User-friendly interface for easy URL checking

How the Model Was Generated

Development Process

  1. Data Collection

    The model was trained on the pirocheto/phishing-url dataset from Hugging Face, containing thousands of labeled URLs.

  2. Feature Engineering

    Extracted over 30 features from each URL, including length metrics, domain characteristics, special character counts, and suspicious patterns.

  3. Model Selection

    After evaluating multiple algorithms, XGBoost was selected for its superior performance in classification tasks and ability to handle complex feature relationships.

  4. Training & Optimization

    The model was trained with carefully tuned hyperparameters including max depth, learning rate, and regularization to prevent overfitting.

  5. Evaluation & Deployment

    After rigorous testing and validation, the model was deployed to Hugging Face Hub for public access and integrated into a Gradio web interface.

How NeoGuardianAI Works

URL Analysis Process

When a URL is submitted, NeoGuardianAI performs a comprehensive analysis:

  1. Extracts features from the URL structure
  2. Normalizes and scales the features
  3. Passes the processed data through the XGBoost model
  4. Generates a prediction with confidence score
  5. Returns a user-friendly result indicating safety status
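
Steps 4 and 5 can be sketched as a small helper that turns the classifier's phishing probability into a label and a confidence score. This is a minimal sketch, not the project's actual code: the function name `describe_prediction`, the 0.5 threshold, and the shape of the returned dictionary are all assumptions made for illustration.

```python
def describe_prediction(phishing_probability: float, threshold: float = 0.5) -> dict:
    """Map a phishing probability to a label and a confidence score.

    The 0.5 default threshold is an assumption, not a documented setting.
    """
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")

    is_phishing = phishing_probability >= threshold
    # Confidence is the probability of whichever class was chosen.
    confidence = phishing_probability if is_phishing else 1.0 - phishing_probability
    return {
        "label": "phishing" if is_phishing else "legitimate",
        "confidence": round(confidence, 4),
    }


print(describe_prediction(0.9))
print(describe_prediction(0.2))
```

Raising the threshold would flag fewer URLs as phishing at the cost of missing more of them, so the right value depends on how the result is used.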

Key Features Analyzed

  • URL length and structure
  • Domain age and registration information
  • Presence of suspicious keywords
  • Special character frequency and distribution
  • TLD (Top-Level Domain) reputation
  • Presence of IP addresses in URL
  • Redirection patterns
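
Several of the features listed above can be computed from the URL string alone. The sketch below shows one way to do that with the Python standard library. The function name, the feature names, and the keyword list are hypothetical and are not taken from the model's actual feature set. Domain age, TLD reputation, and redirection patterns need external lookups and are left out.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list, for illustration only.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")

# Matches a dotted-quad host such as 192.168.0.1 (octet ranges are not validated).
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> dict:
    """Compute a few lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(url.count(ch) for ch in "@?=&%"),
        "has_ip_host": int(bool(IPV4_PATTERN.match(host))),
        "uses_https": int(parsed.scheme == "https"),
        "num_suspicious_keywords": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


features = extract_url_features("http://192.168.0.1/secure-login?user=1")
print(features)
```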

Model Architecture

XGBoost Classifier

XGBoost is a gradient boosting framework that builds an ensemble of decision trees, adding them one at a time to produce a highly accurate prediction model.

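from xgboost import XGBClassifier

model = XGBClassifier(
  max_depth=5,                  # maximum depth of each tree
  learning_rate=0.1,            # shrinkage applied to each new tree
  n_estimators=100,             # number of boosting rounds (trees)
  subsample=0.8,                # fraction of rows sampled per tree
  colsample_bytree=0.8,         # fraction of features sampled per tree
  gamma=0.1,                    # minimum loss reduction required to split
  objective='binary:logistic',  # binary classification, outputs probabilities
  eval_metric='logloss'
)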

Feature Processing

StandardScaler standardizes each feature to zero mean and unit variance, so that all inputs share a similar scale.
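
To make the scaling step concrete, the sketch below reproduces by hand the transformation that scikit-learn's StandardScaler applies to a single feature: subtract the mean, then divide by the population standard deviation. The helper name `standardize` and the sample URL lengths are made up for illustration.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Scale one feature to zero mean and unit variance."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation, as StandardScaler uses
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


url_lengths = [20, 35, 50, 120]  # hypothetical raw feature values
scaled = standardize(url_lengths)
print(scaled)
```

In a deployed pipeline, the mean and standard deviation learned from the training data would be reused at prediction time rather than recomputed per request.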

Decision Process

The model combines many decision trees. Each new tree is fit to correct the errors made by the trees before it, and the final prediction aggregates the outputs of all trees, resulting in high-accuracy predictions.
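
The idea of each tree correcting its predecessors can be shown with a toy example. The sketch below is a deliberately simplified stand-in for XGBoost, not its actual algorithm: it uses one-split trees (stumps), a single feature, and squared error instead of log loss, and it omits regularization and row or column sampling. All function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.1):
    """Fit stumps sequentially, each one to the residual errors of the current ensemble."""
    base = sum(ys) / len(ys)  # start from the mean label
    predictions = [base] * len(xs)
    stumps, losses = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add a shrunken correction from the new stump to every prediction.
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return stumps, losses


xs = [1, 2, 3, 4, 5, 6]   # toy feature values
ys = [0, 0, 0, 1, 1, 1]   # toy labels
_, losses = boost(xs, ys)
print(losses[0], losses[-1])  # training error after the first and last round
```

The training error shrinks with every round because each stump is fit to whatever error the ensemble still makes.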

Model Performance

Accuracy

96.31%

Precision

96.00%

Recall

96.66%

F1 Score

96.33%

Performance Analysis

NeoGuardianAI scores at or above 96% on all four reported metrics, which makes it a reliable choice for phishing URL detection:

Balanced Performance

Precision (96.00%) and recall (96.66%) are close, which indicates the model is well-balanced: false positives and false negatives occur in similar numbers, and neither is traded off against the other.

Real-World Effectiveness

The high F1 score (96.33%), the harmonic mean of precision and recall, indicates the model performs well in real-world scenarios where both precision and recall are important.
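
As a quick consistency check, the F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The inputs below are the figures stated in this section.

```python
precision = 0.9600
recall = 0.9666

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2%}")  # F1 = 96.33%
```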

Comparison to Industry Standards

NeoGuardianAI's performance exceeds many commercial phishing detection solutions, which typically achieve 85-90% accuracy.

How to Use NeoGuardianAI

Web Interface

The easiest way to use NeoGuardianAI is through the Hugging Face Spaces web interface:

  1. Visit the NeoGuardianAI Space
  2. Enter a URL in the input field
  3. Click "Check URL"
  4. View the prediction result and confidence score

API Integration

For developers, NeoGuardianAI can be integrated into applications using the Hugging Face Inference API:

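import requests

API_URL = "https://api-inference.huggingface.co/models/Devishetty100/neoguardianai"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

def query(url):
    """Send one URL to the hosted model and return the parsed JSON response."""
    payload = {"inputs": url}
    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return response.json()

result = query("https://example.com")
print(result)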

Replace YOUR_API_TOKEN with your Hugging Face API token. The API returns a prediction and confidence score for the provided URL.
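
To keep the token out of source code, the Authorization header can be built from an environment variable instead. This is a minimal sketch: the helper name `build_headers` and the choice of `HF_TOKEN` as the variable name are assumptions for illustration, not part of the NeoGuardianAI API.

```python
import os


def build_headers(env_var: str = "HF_TOKEN") -> dict:
    """Build the Authorization header from a token stored in the environment."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face API token"
        )
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the `headers` argument of `requests.post` in place of a hardcoded value.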