Pydantic Tutorial: Data Validation in Python Made Simple

 

 

Python is a dynamically typed language, so you can create variables without declaring their types, and you can later assign a value of a completely different type to the same variable. While this makes things easier for beginners, it also makes it just as easy to create invalid objects in your Python application.

You can create data classes, which let you define fields with type hints, but they do not offer direct support for validating data. Enter Pydantic, a popular library with out-of-the-box support for data validation and serialization. With Pydantic, you can:

  • leverage Python’s type hints to validate fields, 
  • use the custom fields and built-in validators Pydantic offers, and 
  • define custom validators as needed.
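To see the gap that Pydantic fills, here is a minimal sketch of a plain data class accepting a value that contradicts its type hint. The `EmployeeRecord` name and its fields are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:  # hypothetical class, for illustration only
    name: str
    age: int


# The type hint says int, but the dataclass stores the string without complaint
record = EmployeeRecord(name="John Doe", age="thirty")
print(type(record.age).__name__)
```

The type hints here act as documentation only; nothing checks them when the object is created.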

In this tutorial, we’ll model a simple ‘Employee’ class and validate the values of the different fields using the data validation functionality of Pydantic. Let’s get started!

 

 

If you have Python 3.8 or a later version, you can install Pydantic using pip:

$ pip install pydantic

If you need email validation in your application, you can install the optional email-validator dependency when installing Pydantic like so:

$ pip install pydantic[email]

 

Alternatively, you can run the following command to install email-validator:

$ pip install email-validator

 

Note: In our example, we’ll use email validation. So please install the dependency if you’d like to code along.

 

 

Now let’s create a simple Employee class. First, we create a class that inherits from the BaseModel class. The fields and their expected types are specified as shown:

# main.py

from pydantic import BaseModel, EmailStr

class Employee(BaseModel):
    name: str
    age: int
    email: EmailStr
    department: str
    employee_id: str

 

Notice that we’ve specified email to be of the EmailStr type that Pydantic supports instead of a regular Python string. This is because not every valid string is a valid email address.

 

 

The Employee class is simple so far. Let’s add validation for the following fields:

  • email: should be a valid email. Declaring the field as EmailStr takes care of this, so creating an object with an invalid email raises a validation error.
  • employee_id: should be a valid employee ID. We’ll implement a custom validation for this field.

 

Implementing Custom Validation

 

For this example, let’s say the employee_id should be a string of length 6 containing only alphanumeric characters.

We can use the @validator decorator with the employee_id field as the argument and define the validate_employee_id method as shown:

# main.py 

from pydantic import BaseModel, EmailStr, validator

...

    # Runs when an Employee is created; raising ValueError marks employee_id as invalid
    @validator("employee_id")
    def validate_employee_id(cls, v):
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v

 

Now this method checks if the employee_id is valid for the Employee objects we try to create.

At this point, your script should look like so:

# main.py

from pydantic import BaseModel, EmailStr, validator

class Employee(BaseModel):
    name: str
    age: int
    email: EmailStr
    department: str
    employee_id: str

    @validator("employee_id")
    def validate_employee_id(cls, v):
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v
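If you are on Pydantic 2.x, the @validator decorator still works but is deprecated in favor of @field_validator. The sketch below shows the same check written that way, assuming Pydantic 2.x is installed. The `EmployeeIdModel` class is a hypothetical, trimmed-down model that keeps only the field being validated.

```python
from pydantic import BaseModel, ValidationError, field_validator


class EmployeeIdModel(BaseModel):  # hypothetical single-field model
    employee_id: str

    @field_validator("employee_id")
    @classmethod
    def validate_employee_id(cls, v: str) -> str:
        # Same rule as in the tutorial: exactly 6 alphanumeric characters
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v


valid = EmployeeIdModel(employee_id="EMP001")

try:
    EmployeeIdModel(employee_id="EMP0034")  # 7 characters, so it should fail
    rejected = False
except ValidationError:
    rejected = True

print(valid.employee_id, rejected)
```

The rest of this tutorial keeps using @validator, so treat this as an optional variant rather than a required change.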

 

 

In practice, it’s very common to parse JSON responses from APIs into data structures like Python dictionaries. Say we have an ‘employees.json’ file (in the current directory) with the following records:

# employees.json

[
	{
    	"name": "John Doe",
    	"age": 30,
    	"email": "john.doe@example.com",
    	"department": "Engineering",
    	"employee_id": "EMP001"
	},
	{
    	"name": "Jane Smith",
    	"age": 25,
    	"email": "jane.smith@example.com",
    	"department": "Marketing",
    	"employee_id": "EMP002"
	},
	{
    	"name": "Alice Brown",
    	"age": 35,
    	"email": "invalid-email",
    	"department": "Finance",
    	"employee_id": "EMP0034"
	},
	{
    	"name": "Dave West",
    	"age": 40,
    	"email": "dave.west@example.com",
    	"department": "HR",
    	"employee_id": "EMP005"
	}
]

 

We can see that the third record, corresponding to ‘Alice Brown’, has two invalid fields: the email and the employee_id.

 

Because we’ve specified that email should be EmailStr, the email string will be automatically validated. We’ve also added the validate_employee_id class method to check if the objects have a valid employee ID.

Now let’s add the code to parse the JSON file and create employee objects (we’ll use the built-in json module for this). We also import the ValidationError class from Pydantic. In essence, we try to create an object from each record, catch the ValidationError raised when data validation fails, and print out the errors:

# main.py

import json
from pydantic import BaseModel, EmailStr, ValidationError, validator
...

# Load and parse the JSON data
with open("employees.json", "r") as f:
    data = json.load(f)

# Validate each employee record
for record in data:
    try:
        employee = Employee(**record)
        print(f"Valid employee record: {employee.name}")
    except ValidationError as e:
        print(f"Invalid employee record: {record['name']}")
        print(f"Errors: {e.errors()}")

 

When you run the script, you should see output similar to the following:

Output >>>

Valid employee record: John Doe
Valid employee record: Jane Smith
Invalid employee record: Alice Brown
Errors: [{'type': 'value_error', 'loc': ('email',), 'msg': 'value is not a valid email address: The email address is not valid. It must have exactly one @-sign.', 'input': 'invalid-email', 'ctx': {'reason': 'The email address is not valid. It must have exactly one @-sign.'}}, {'type': 'value_error', 'loc': ('employee_id',), 'msg': 'Value error, Employee ID must be exactly 6 alphanumeric characters', 'input': 'EMP0034', 'ctx': {'error': ValueError('Employee ID must be exactly 6 alphanumeric characters')}, 'url': '
Valid employee record: Dave West

 

As expected, only the record corresponding to ‘Alice Brown’ is not a valid employee object. Zooming in to the relevant part of the output, you can see a detailed message on why the email and employee_id fields are invalid.
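The raw list returned by e.errors() can be hard to scan. One possible way to condense it is to print only the field location and the message for each error, as in the sketch below. The `MiniEmployee` model and the `summarize_errors` helper are hypothetical names introduced here for illustration, and the exact message text depends on your Pydantic version.

```python
from pydantic import BaseModel, ValidationError


class MiniEmployee(BaseModel):  # hypothetical two-field model
    name: str
    age: int


def summarize_errors(exc: ValidationError) -> list:
    """Return one 'field: message' string per validation error."""
    return [
        f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
        for err in exc.errors()
    ]


try:
    MiniEmployee(name="Alice Brown", age="not-a-number")
    lines = []
except ValidationError as e:
    lines = summarize_errors(e)

for line in lines:
    print(line)
```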

Here’s the complete code:

# main.py

import json
from pydantic import BaseModel, EmailStr, ValidationError, validator

class Employee(BaseModel):
    name: str
    age: int
    email: EmailStr
    department: str
    employee_id: str

    @validator("employee_id")
    def validate_employee_id(cls, v):
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v

# Load and parse the JSON data
with open("employees.json", "r") as f:
    data = json.load(f)

# Validate each employee record
for record in data:
    try:
        employee = Employee(**record)
        print(f"Valid employee record: {employee.name}")
    except ValidationError as e:
        print(f"Invalid employee record: {record['name']}")
        print(f"Errors: {e.errors()}")

 

 

That’s all for this introductory tutorial on Pydantic! I hope you learned the basics of modeling your data and using both the built-in and custom validations that Pydantic offers. All the code used in this tutorial is on GitHub.

Next, you may try using Pydantic in your Python projects and also explore its serialization capabilities. Happy coding!
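As a starting point for exploring serialization, here is a minimal sketch that converts a model instance to a dictionary and to a JSON string. It assumes Pydantic 2.x, where these methods are named model_dump and model_dump_json. The `EmployeeSummary` model is hypothetical.

```python
import json

from pydantic import BaseModel


class EmployeeSummary(BaseModel):  # hypothetical model, for illustration only
    name: str
    age: int
    department: str


emp = EmployeeSummary(name="John Doe", age=30, department="Engineering")

as_dict = emp.model_dump()       # plain Python dictionary
as_json = emp.model_dump_json()  # JSON-encoded string

print(as_dict)
print(json.loads(as_json) == as_dict)
```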
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.