Create a 3D Object from Your Images with TripoSR in Python

by Ritwik Raha, PyImageSearch

In this tutorial, we’ll walk you through the process of creating a 3D object from a single image using TripoSR, a state-of-the-art model for fast feed-forward 3D reconstruction. We’ll cover everything from setting up the environment to generating the final 3D model and rendering a result video.

To learn how to generate high-quality 3D objects from a SINGLE image, just keep reading.


Image to 3D Objects

At PyImageSearch, we have shown how to create 3D objects from a set of specially captured images using Neural Radiance Fields (NeRFs). While NeRF is a reliable and established way to generate 3D objects from images, the approach has several drawbacks:

  1. COLMAP settings must be carefully calibrated
  2. Multiple images from multiple angles are required
  3. Training and inference are expensive
  4. The resulting 3D views are often low quality

Ideally, given the advances in computer vision over the last three years, we would like to generate reliable, high-quality 3D objects quickly from very few images (read: one). Enter TripoSR, developed by Tripo AI in collaboration with Stability AI.

Building on the principles of the Large Reconstruction Model (LRM), TripoSR introduces improvements that significantly boost both the speed and the quality of 3D reconstruction (as shown in Figure 1). According to its authors, the model generates a high-quality 3D model in less than 0.5 seconds on an NVIDIA A100 GPU, and it outperforms other open-source alternatives across multiple public datasets in both qualitative and quantitative evaluations. Details about the model architecture, training process, and comparisons can be found in the TripoSR technical report.

Figure 1: Results of the TripoSR model from their technical report.

TripoSR is a promising 3D reconstruction model with potential uses in a variety of applications. It is fast, accurate, and easy to use.

This tutorial uses your own images to generate 3D objects: you will upload an image and turn it into a 3D object.


Setting Up the Environment

First, we need to set up our environment:

!git clone https://github.com/pyimagesearch/TripoSR.git
import sys
sys.path.append('/content/TripoSR')  # repository root, so the tsr package can be imported
%cd TripoSR
!pip install -r requirements.txt -q

Here, we clone the TripoSR repository, add it to our Python path, change into the TripoSR directory, and install the required dependencies. The ! and % prefixes are notebook syntax, so these lines are meant to run in a Jupyter or Google Colab cell.


Importing Necessary Libraries

Next, we import the required libraries:

import torch
import os
import time
from PIL import Image
import numpy as np
from IPython.display import Video
from tsr.system import TSR
from tsr.utils import remove_background, resize_foreground, save_video
import pymeshlab as pymesh
import rembg

We’re importing libraries for image processing (PIL, NumPy, rembg), mesh handling (pymeshlab, imported here under the alias pymesh), and utility functions. The TSR class from tsr.system is the core of TripoSR.


Setting Up the Device

We determine whether to use CUDA (GPU) or CPU:

device = "cuda" if torch.cuda.is_available() else "cpu"

This line checks if a CUDA-compatible GPU is available and sets the device accordingly.


Creating a Timer Utility

To measure the performance of different steps, we create a Timer class:

class Timer:
    def __init__(self):
        self.items = {}  # maps a step name to its start time
        self.time_scale = 1000.0  # ms
        self.time_unit = "ms"
    def start(self, name: str) -> None:
        # Wait for pending GPU work so it is not attributed to this step
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        self.items[name] = time.time()
    def end(self, name: str) -> None:
        # Ignore names that were never started
        if name not in self.items:
            return
        # Wait for GPU work launched during this step to finish
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start_time = self.items.pop(name)
        delta = time.time() - start_time
        t = delta * self.time_scale
        print(f"{name} finished in {t:.2f}{self.time_unit}.")

timer = Timer()

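This Timer class lets us measure the execution time of each step. Because CUDA kernels run asynchronously, it calls torch.cuda.synchronize() before reading the clock so that pending GPU work is included in the measurement.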
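The same idea can also be packaged as a context manager, so that the start and end calls cannot get out of step. The following is a minimal sketch and not part of the TripoSR repository: the name timed and the sync hook are hypothetical. On a GPU, we would pass torch.cuda.synchronize as sync.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, results, sync=None):
    """Time the enclosed block in milliseconds and store it in `results`.

    `sync` is an optional zero-argument callable (a hypothetical hook);
    on a GPU it would be torch.cuda.synchronize.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results[name] = elapsed_ms
        print(f"{name} finished in {elapsed_ms:.2f}ms.")


# Usage: time a short sleep and keep the measurement for later inspection.
results = {}
with timed("Sleeping", results):
    time.sleep(0.05)
```

Storing the measurements in a dictionary makes it easy to compare steps afterward instead of reading them back from printed output.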


Uploading and Preparing the Image

Now, we upload our image, a Nike Low (shown in Figure 2), and prepare it for processing:

from google.colab import files
uploaded = files.upload()
original_image = Image.open(list(uploaded.keys())[0])
original_image.resize((512, 512)).save("examples/product.png")

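We use Google Colab’s file upload feature to get our image, then save a copy resized to 512×512 pixels. The resize does not preserve the aspect ratio, so a non-square upload will look stretched in the saved copy.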

Figure 2: A product image of Nike sneakers.

Setting Up TripoSR Parameters

We define the parameters for running TripoSR:

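image_paths = "/content/TripoSR/examples/product.png"  # resized copy saved above
device = "cuda:0" if torch.cuda.is_available() else "cpu"  # keep the CPU fallback
pretrained_model_name_or_path = "stabilityai/TripoSR"  # model ID on Hugging Face
chunk_size = 8192  # smaller values use less VRAM but run slower
no_remove_bg = False  # the processing step below always removes the background
foreground_ratio = 0.85  # share of the image side taken up by the foreground
output_dir = "output/"
model_save_format = "obj"
render = True  # also render preview views and a video
output_dir = output_dir.strip()
os.makedirs(output_dir, exist_ok=True)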

These parameters define the input image path, the device to use, the pretrained model to load, and various other settings for the 3D reconstruction process.
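To get a feel for foreground_ratio, here is a rough sketch of the arithmetic. It assumes the resize_foreground helper crops to the foreground’s bounding box, pads the crop to a square, and then adds a border so that the foreground spans foreground_ratio of the final side. The function name padded_side is hypothetical and is not part of TripoSR.

```python
def padded_side(fg_width, fg_height, foreground_ratio):
    """Side length of the square canvas for a cropped foreground (sketch)."""
    square = max(fg_width, fg_height)  # pad the crop to a square first
    return int(square / foreground_ratio)  # then leave a border around it


# A 400x300 foreground crop with the ratio used in this tutorial.
side = padded_side(400, 300, 0.85)
print(side, round(400 / side, 3))
```

Under these assumptions, a lower ratio leaves a wider margin around the object, while a ratio of 1.0 lets the object touch the image border.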


Initializing the TripoSR Model

We initialize the TripoSR model:

timer.start("Initializing model")
model = TSR.from_pretrained(
    pretrained_model_name_or_path,
    config_name="config.yaml",
    weight_name="model.ckpt",
)
model.renderer.set_chunk_size(chunk_size)
model.to(device)
timer.end("Initializing model")

Here, we load the pretrained TripoSR model, set the chunk size for rendering, and move the model to the specified device (GPU or CPU).


Processing the Image

Now, we process our input image:

timer.start("Processing images")
images = []
rembg_session = rembg.new_session()
image = remove_background(original_image, rembg_session)
image = resize_foreground(image, foreground_ratio)  # resize the background-removed image, not the original
if image.mode == "RGBA":
    image = np.array(image).astype(np.float32) / 255.0
    image = image[:, :, :3] * image[:, :, 3:4] + (1 - image[:, :, 3:4]) * 0.5
    image = Image.fromarray((image * 255.0).astype(np.uint8))
image_dir = os.path.join(output_dir, str(0))
os.makedirs(image_dir, exist_ok=True)
image.save(os.path.join(image_dir, "input.png"))
images.append(image)
timer.end("Processing images")

In this step, we remove the background from the image, rescale the foreground according to foreground_ratio, and, for RGBA (red, green, blue, alpha) images, composite the foreground over a mid-gray background using the alpha channel, as shown in Figure 3.

Figure 3: The processed image with the background removed.
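The gray blending step is easiest to see on a tiny example. The sketch below applies the same formula as the processing code to two hand-made pixels (the pixel values are made up for illustration): one fully opaque and one fully transparent.

```python
import numpy as np

# Two RGBA pixels: opaque red, and a fully transparent pixel.
rgba = np.array([[[255, 0, 0, 255], [10, 20, 30, 0]]], dtype=np.uint8)

pixels = rgba.astype(np.float32) / 255.0
rgb, alpha = pixels[:, :, :3], pixels[:, :, 3:4]

# Foreground is weighted by alpha, the background (0.5 = mid-gray) by 1 - alpha.
blended = rgb * alpha + (1 - alpha) * 0.5
result = (blended * 255.0).astype(np.uint8)

print(result.tolist())
```

The opaque pixel keeps its color, while the transparent pixel becomes gray regardless of the RGB values stored under it.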

Generating the 3D Model and Rendering

Finally, we generate the 3D model and render it:

for i, image in enumerate(images):
    print(f"Running image {i + 1}/{len(images)} ...")
    timer.start("Running model")
    with torch.no_grad():
        scene_codes = model([image], device=device)
    timer.end("Running model")
    
    if render:
        timer.start("Rendering")
        render_images = model.render(scene_codes, n_views=30, return_type="pil")
        for ri, render_image in enumerate(render_images[0]):
            render_image.save(os.path.join(output_dir, str(i), f"render_{ri:03d}.png"))
        save_video(
            render_images[0], os.path.join(output_dir, str(i), "render.mp4"), fps=30
        )
        timer.end("Rendering")
    
    timer.start("Exporting mesh")
    meshes = model.extract_mesh(scene_codes, has_vertex_color=False)
    mesh_file = os.path.join(output_dir, str(i), f"mesh.{model_save_format}")
    meshes[0].export(mesh_file)
    timer.end("Exporting mesh")

print("Processing complete.")

This loop processes each image (in our case, just one) through the TripoSR model. It generates the 3D scene codes, renders multiple views of the 3D model, saves these renders as images and a video, and exports the 3D mesh.
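The values n_views=30 and fps=30 together determine how the preview video looks. As a back-of-the-envelope sketch, assuming the renderer spaces its views evenly around a full 360-degree orbit (an assumption about the renderer, not something the code above states), the numbers work out as follows:

```python
n_views = 30
fps = 30

# Assumption: views are spaced evenly around a full 360-degree orbit.
azimuths = [360.0 * k / n_views for k in range(n_views)]
step_deg = 360.0 / n_views
duration_s = n_views / fps

print(f"{step_deg:.0f} degrees between views, {duration_s:.1f} s of video")
```

Raising n_views while keeping fps fixed would give a longer, smoother turntable at the cost of more rendering time.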


Downloading the .stl File

If you want to go from a flat PNG (Portable Network Graphics) image to an STL (Stereolithography) file, you can convert the exported .obj mesh directly to .stl format.

STL (Stereolithography) is a widely used file format for representing 3D models. It’s primarily used in 3D printing and computer-aided manufacturing (CAM).

OBJ is another common 3D model file format. Unlike STL, which stores only a list of triangles, OBJ lists each vertex once, lets faces refer to vertices by index, and can also carry extra data such as texture coordinates and material references.

STL is therefore a specialized format that is well-suited to 3D printing: its simplicity, geometric focus, and wide support make it a popular choice for this application, while OBJ offers more versatility and data storage.
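To make the difference concrete, the sketch below writes the same tiny mesh, a unit square split into two triangles, in both formats by hand. It is for illustration only: the helper names to_obj and to_ascii_stl are hypothetical, and the STL normal is hard-coded because both triangles lie flat in the z = 0 plane.

```python
# A unit square split into two triangles that share an edge.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]  # indices into `vertices`


def to_obj(vertices, faces):
    """OBJ: each vertex is listed once; faces refer to it by 1-based index."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines) + "\n"


def to_ascii_stl(vertices, faces, name="sketch"):
    """ASCII STL: every facet repeats its three corner coordinates."""
    lines = [f"solid {name}"]
    for face in faces:
        # Both triangles lie in the z = 0 plane, so the normal is +z here.
        lines.append("  facet normal 0.0 0.0 1.0")
        lines.append("    outer loop")
        for i in face:
            x, y, z = vertices[i]
            lines.append(f"      vertex {x} {y} {z}")
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines) + "\n"


obj_text = to_obj(vertices, faces)
stl_text = to_ascii_stl(vertices, faces)
print(obj_text.count("v "), "vertex lines in OBJ")
print(stl_text.count("vertex "), "vertex lines in STL")
```

The OBJ text stores the two shared corners once, while the STL text repeats them in each facet, which is why STL is simpler to parse but carries no connectivity information.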

obj_file = "/content/TripoSR/output/0/mesh.obj"

# Load the .obj mesh
ms = pymesh.MeshSet()
ms.load_new_mesh(obj_file)
mesh = ms.current_mesh()


# Convert to .stl format
stl_file = 'model.stl'
ms.save_current_mesh(stl_file)

We load the mesh saved in the output directory into a MeshSet from the pymeshlab library (imported here as pymesh).

To save it as an .stl file, we pass a file name ending in .stl to the save_current_mesh function, which picks the output format from the file extension.
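Before sending the file to a slicer, it can be worth checking that the export is intact. The sketch below relies on the binary STL layout (an 80-byte header, a 4-byte little-endian triangle count, then 50 bytes per triangle) and assumes the exported file is a binary STL; an ASCII STL would fail the size check. The function name binary_stl_triangle_count is hypothetical, and the demo writes its own one-triangle file so that it runs without the model.

```python
import os
import struct
import tempfile


def binary_stl_triangle_count(path):
    """Read the triangle count from a binary STL and check the file size."""
    with open(path, "rb") as f:
        f.seek(80)  # skip the 80-byte header
        (count,) = struct.unpack("<I", f.read(4))
    expected_size = 84 + 50 * count  # 50 bytes per triangle record
    if os.path.getsize(path) != expected_size:
        raise ValueError("not a binary STL (it may be ASCII) or the file is truncated")
    return count


# Self-contained demo: write a one-triangle binary STL, then read it back.
triangle = struct.pack(
    "<12fH",
    0.0, 0.0, 1.0,  # facet normal
    0.0, 0.0, 0.0,  # vertex 1
    1.0, 0.0, 0.0,  # vertex 2
    0.0, 1.0, 0.0,  # vertex 3
    0,              # attribute byte count
)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.stl")
with open(demo_path, "wb") as f:
    f.write(b"\0" * 80 + struct.pack("<I", 1) + triangle)

demo_count = binary_stl_triangle_count(demo_path)
print(demo_count)
```

In the notebook, calling binary_stl_triangle_count("model.stl") would report how many triangles the converted mesh contains, provided the file was written in binary form.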

You can download the .stl file by expanding the sidebar of the interactive Colab notebook and downloading it.


Displaying the Result

To view our result, we display the rendered video:

Video('output/0/render.mp4', embed=True)

This line displays the rendered video of our 3D model, shown as a GIF in Figure 4.

Figure 4: Rendered video output of the 3D object


Summary

In this tutorial, we’ve walked through the process of creating a 3D object from a single image using TripoSR. We began by setting up our environment and importing necessary libraries. We then uploaded and prepared our input image, initialized the TripoSR model, and processed the image to remove its background.

The core of our process involved using the TripoSR model to generate 3D scene codes from our 2D image. We then used these codes to render multiple views of our 3D model and export the 3D mesh.

Throughout the process, we used a custom Timer class to measure the performance of each step, giving us insights into the speed of the TripoSR model.

The result of this process is a 3D model of our input object, which we can view as a rendered video. This demonstrates the power of TripoSR to quickly and accurately create a 3D model from a single PNG image, opening up numerous possibilities in fields such as e-commerce, game development, and virtual reality.


Citation Information

Raha, R. “Create a 3D Object from Your Images with TripoSR in Python,” PyImageSearch, P. Chugh, S. Huot, and P. Thakur, eds., 2024, https://pyimg.co/g316x

@incollection{Raha_2024_create-3d-object-with-triposr-in-python,
  author = {Ritwik Raha},
  title = {Create a 3D Object from Your Images with TripoSR in Python},
  booktitle = {PyImageSearch},
  editor = {Puneet Chugh and Susan Huot and Piyush Thakur},
  year = {2024},
  url = {https://pyimg.co/g316x},
}
