Overview
This is a repackaged open-source software product; additional charges apply for technical support and maintenance provided by Apps4Rent.
MLlib is a machine learning library for Apache Spark, an open-source distributed computing system designed for processing large-scale data sets. MLlib provides a set of distributed machine learning algorithms and utilities, as well as a high-level API for building scalable machine learning pipelines.
MLlib includes a range of supervised and unsupervised learning algorithms, such as linear regression, logistic regression, decision trees, random forests, k-means clustering, and collaborative filtering. The library also includes tools for feature extraction and transformation and for model evaluation.
One of the key benefits of using MLlib is its ability to scale machine learning algorithms to handle large data sets that may not fit in memory on a single machine. By distributing the data and computation across multiple machines, MLlib can process and analyze large volumes of data quickly and efficiently.
Overall, MLlib is a powerful tool for building large-scale machine learning pipelines and is widely used in the industry for a variety of applications, such as fraud detection, recommendation systems, and predictive analytics.
Disclaimer: The trademarks mentioned in this offering are owned by their respective companies. We do not provide a commercial license for any of these products. Many of the products have a free, demo, or open-source license, as applicable.
The image may take 5-7 minutes to launch for the first time.
Highlights
- MLlib is designed to scale out to handle large datasets, and it can run on distributed computing clusters. MLlib supports a wide range of machine learning algorithms, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- MLlib integrates seamlessly with other components of the Spark ecosystem, such as Spark SQL and Spark Streaming. MLlib provides a number of utilities for feature engineering, including feature extraction, transformation, and selection.
- MLlib provides tools for model selection and tuning, including cross-validation, grid search, and hyperparameter tuning. MLlib can read and write data in various formats, including CSV, JSON, and Parquet.
Details
Typical total price
$0.146/hour
Features and programs
Financing for AWS Marketplace purchases
Pricing
- ...
Instance type | Product cost/hour | EC2 cost/hour | Total/hour |
---|---|---|---|
t2.nano | $0.10 | $0.006 | $0.106 |
t2.micro (AWS Free Tier) | $0.10 | $0.012 | $0.112 |
t2.small | $0.10 | $0.023 | $0.123 |
t2.medium (Recommended) | $0.10 | $0.046 | $0.146 |
t2.large | $0.10 | $0.093 | $0.193 |
t2.xlarge | $0.10 | $0.186 | $0.286 |
t2.2xlarge | $0.10 | $0.371 | $0.471 |
t3.nano | $0.10 | $0.005 | $0.105 |
t3.micro (AWS Free Tier) | $0.10 | $0.01 | $0.11 |
t3.small | $0.10 | $0.021 | $0.121 |
Additional AWS infrastructure costs
Type | Cost |
---|---|
EBS General Purpose SSD (gp2) volumes | $0.10 per GB per month of provisioned storage |
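As a rough illustration of how the hourly and storage prices above combine, the following sketch estimates a monthly bill. The prices are copied from the tables above; the 730-hour month and the 30 GB volume size are assumptions made only for the example, and the function name is hypothetical.

```python
# Hourly prices in USD, copied from the pricing tables above.
PRODUCT_COST_PER_HOUR = 0.10
EC2_COST_PER_HOUR = {
    "t2.nano": 0.006,
    "t2.micro": 0.012,
    "t2.small": 0.023,
    "t2.medium": 0.046,
    "t2.large": 0.093,
    "t2.xlarge": 0.186,
    "t2.2xlarge": 0.371,
    "t3.nano": 0.005,
    "t3.micro": 0.01,
    "t3.small": 0.021,
}
EBS_GP2_PER_GB_MONTH = 0.10


def estimate_monthly_cost(instance_type, hours=730, ebs_gb=0):
    """Return an estimated monthly cost in USD.

    Assumption: 730 hours approximates one month of continuous use.
    """
    hourly_total = PRODUCT_COST_PER_HOUR + EC2_COST_PER_HOUR[instance_type]
    return hourly_total * hours + EBS_GP2_PER_GB_MONTH * ebs_gb


# Hypothetical scenario: the recommended t2.medium running all month with 30 GB of gp2.
estimate = estimate_monthly_cost("t2.medium", hours=730, ebs_gb=30)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

Under these assumptions the estimate comes to about $109.58 per month. Actual charges depend on region, usage hours, and any other AWS services in use.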
Vendor refund policy
Apps4Rent does not offer commercial licenses or refunds for any product mentioned above. The product is distributed under open-source licenses.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Additional details
Usage instructions
For Linux: Connect to your Linux instance over SSH on port 22. For more information, refer to this article: https://docs.thinkwithwp.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html
Sign-in credentials: Username: ubuntu
Dependency version check commands:
- To check the Python version: python3 --version (Python version: 3.10.6)
- To check the PySpark version: pyspark --version (PySpark version: 3.3.2)
- To check the Java version: java --version (Java version: 11.0.18)
To verify that MLlib is installed on the system, run the following commands:
Open PySpark: pyspark
In the PySpark shell, create a sample RDD: rdd = sc.parallelize([(1,2),(3,4),(5,6)])
Import the pyspark.mllib module and use its KMeans clustering algorithm to cluster the data in the RDD:
from pyspark.mllib.clustering import KMeans
model = KMeans.train(rdd, 2, maxIterations=10, initializationMode="random")
To confirm that MLlib is present in your system, you can print the model: print(model)
Output: <pyspark.mllib.clustering.KMeansModel object at 0x7fbb6d8effd0>
Minimum external resources required to use this product: an Internet connection, without which the product will not function as expected.
We recommend keeping your crucial data on a separate encrypted EBS volume so that it is not lost if the instance is terminated.
Resources
Vendor resources
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.