Pydantic Tutorial: Data Validation in Python Made Simple
Python is a dynamically typed language, so you can create variables without declaring their types and later rebind the same variable to a value of a completely different type. While this makes things easier for beginners, it also makes it just as easy to create invalid objects in your Python application.
You can create data classes, which let you define fields with type hints, but they do not validate data on their own. Enter Pydantic, a popular data validation and serialization library that supports both out of the box. This means you can:
- leverage Python’s type hints to validate fields,
- use the custom fields and built-in validators Pydantic offers, and
- define custom validators as needed.
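To see the gap that data classes leave, consider this minimal sketch using only the standard library. The `EmployeeRecord` class and its fields are hypothetical names chosen for illustration; the type hints are recorded but never enforced at runtime.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    # Type hints document intent; dataclasses do not enforce them at runtime.
    name: str
    age: int


# A string is passed where an int is expected, yet no error is raised.
record = EmployeeRecord(name="John Doe", age="thirty")
print(type(record.age).__name__)  # prints "str"
```

The invalid object is created silently, which is exactly the situation a validation library is meant to catch.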
In this tutorial, we’ll model a simple ‘Employee’ class and validate the values of the different fields using the data validation functionality of Pydantic. Let’s get started!
If you have Python 3.8 or a later version, you can install Pydantic using pip:
$ pip install pydantic
If you need email validation in your application, you can install the optional email-validator dependency when installing Pydantic like so:
$ pip install pydantic[email]
Alternatively, you can run the following command to install email-validator:
$ pip install email-validator
Note: In our example, we’ll use email validation. So please install the dependency if you’d like to code along.
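Decorator names and error formats differ between major Pydantic versions, so it can help to confirm which version was installed. A quick check, assuming Pydantic is importable in the current environment (the `major_version` variable is just an illustrative name):

```python
import pydantic

# VERSION is a plain string; its first component is the major version,
# which tells you which API generation (v1 or v2) your code runs against.
print(pydantic.VERSION)
major_version = int(pydantic.VERSION.split(".")[0])
print(f"Pydantic major version: {major_version}")
```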
Now let’s create a simple Employee class. First, we create a class that inherits from the BaseModel class. The various fields and the expected types are specified as shown:
# main.py
from pydantic import BaseModel, EmailStr

class Employee(BaseModel):
    name: str
    age: int
    email: EmailStr
    department: str
    employee_id: str
Notice that we’ve specified email to be of the EmailStr type that Pydantic supports instead of a regular Python string. This is because not every valid string is a valid email address.
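The same annotation-driven checking applies to every field, not only email. Here is a minimal sketch using a pared-down, hypothetical `BasicEmployee` model with plain `str` and `int` fields, so that it runs without the optional email-validator dependency:

```python
from pydantic import BaseModel, ValidationError


class BasicEmployee(BaseModel):
    name: str
    age: int


# By default, Pydantic coerces compatible input: the numeric string "30"
# is converted to the int 30.
employee = BasicEmployee(name="John Doe", age="30")
print(repr(employee.age))

# Input that cannot be interpreted as an int is rejected.
try:
    BasicEmployee(name="Jane Smith", age="thirty")
    rejected = False
except ValidationError as e:
    rejected = True
    print(f"Invalid fields: {[err['loc'] for err in e.errors()]}")
```

Note that the default behavior is lenient about convertible values; only input that cannot be converted to the annotated type raises an error.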
Because the Employee class is simple, let’s add validation for the following fields:
- email: should be a valid email. Specifying the EmailStr type accounts for this, and we run into errors when creating objects with an invalid email.
- employee_id: should be a valid employee ID. We’ll implement a custom validation for this field.
Implementing Custom Validation
For this example, let’s say the employee_id should be a string of length 6 containing only alphanumeric characters.
We can use the @validator decorator with the employee_id field as the argument and define the validate_employee_id method as shown:
# main.py
from pydantic import BaseModel, EmailStr, validator

...

    @validator("employee_id")
    def validate_employee_id(cls, v):
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v
Now this method checks whether the employee_id is valid for each Employee object we try to create.
At this point, your script should look like so:
# main.py
from pydantic import BaseModel, EmailStr, validator

class Employee(BaseModel):
    name: str
    age: int
    email: EmailStr
    department: str
    employee_id: str

    @validator("employee_id")
    def validate_employee_id(cls, v):
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v
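If you are on Pydantic v2, the `@validator` decorator still works but is deprecated in favor of `@field_validator`. The following is a sketch of the same check in the v2 style, assuming Pydantic 2.x is installed. The `EmployeeV2` name is hypothetical, and the model is reduced to the one field being validated.

```python
from pydantic import BaseModel, ValidationError, field_validator


class EmployeeV2(BaseModel):
    employee_id: str

    # In Pydantic v2, field validators are written as explicit class methods.
    @field_validator("employee_id")
    @classmethod
    def validate_employee_id(cls, v: str) -> str:
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v


print(EmployeeV2(employee_id="EMP001").employee_id)

try:
    EmployeeV2(employee_id="EMP0034")  # 7 characters, so it fails the check
except ValidationError as e:
    error_message = e.errors()[0]["msg"]
    print(error_message)
```

The validation logic is unchanged; only the decorator and the explicit `@classmethod` differ.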
In practice, it’s very common to parse JSON responses from APIs into data structures like Python dictionaries. Say we have an ‘employees.json’ file (in the current directory) with the following records:
# employees.json
[
    {
        "name": "John Doe",
        "age": 30,
        "email": "john.doe@example.com",
        "department": "Engineering",
        "employee_id": "EMP001"
    },
    {
        "name": "Jane Smith",
        "age": 25,
        "email": "jane.smith@example.com",
        "department": "Marketing",
        "employee_id": "EMP002"
    },
    {
        "name": "Alice Brown",
        "age": 35,
        "email": "invalid-email",
        "department": "Finance",
        "employee_id": "EMP0034"
    },
    {
        "name": "Dave West",
        "age": 40,
        "email": "dave.west@example.com",
        "department": "HR",
        "employee_id": "EMP005"
    }
]
We can see that the third record, corresponding to ‘Alice Brown’, has two invalid fields: the email and the employee_id.
Because we’ve specified that email should be EmailStr, the email string will be automatically validated. We’ve also added the validate_employee_id class method to check whether the objects have a valid employee ID.
Now let’s add the code to parse the JSON file and create employee objects (we’ll use the built-in json module for this). We also import the ValidationError class from Pydantic. In essence, we try to create objects, handle ValidationError exceptions when the data validation fails, and also print out the errors:
# main.py
import json
from pydantic import BaseModel, EmailStr, ValidationError, validator

...

# Load and parse the JSON data
with open("employees.json", "r") as f:
    data = json.load(f)

# Validate each employee record
for record in data:
    try:
        employee = Employee(**record)
        print(f"Valid employee record: {employee.name}")
    except ValidationError as e:
        print(f"Invalid employee record: {record['name']}")
        print(f"Errors: {e.errors()}")
When you run the script, you should see output similar to the following:
Output >>>
Valid employee record: John Doe
Valid employee record: Jane Smith
Invalid employee record: Alice Brown
Errors: [{'type': 'value_error', 'loc': ('email',), 'msg': 'value is not a valid email address: The email address is not valid. It must have exactly one @-sign.', 'input': 'invalid-email', 'ctx': {'reason': 'The email address is not valid. It must have exactly one @-sign.'}}, {'type': 'value_error', 'loc': ('employee_id',), 'msg': 'Value error, Employee ID must be exactly 6 alphanumeric characters', 'input': 'EMP0034', 'ctx': {'error': ValueError('Employee ID must be exactly 6 alphanumeric characters')}, 'url': '
Valid employee record: Dave West
As expected, only the record corresponding to ‘Alice Brown’ is not a valid employee object. Zooming in on the relevant part of the output, you can see a detailed message explaining why the email and employee_id fields are invalid.
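The raw list returned by e.errors() is dense. One way to make it easier to scan is to print one line per failing field. In the sketch below, `summarize_errors` is a hypothetical helper (not part of Pydantic), and the pared-down `Record` model is an assumption made to keep the example free of the email dependency.

```python
from pydantic import BaseModel, ValidationError


class Record(BaseModel):
    name: str
    age: int


def summarize_errors(exc: ValidationError) -> list:
    """Return one 'field: message' string per validation error."""
    lines = []
    for err in exc.errors():
        # 'loc' is a tuple locating the failing field, e.g. ('age',)
        field = ".".join(str(part) for part in err["loc"])
        lines.append(f"{field}: {err['msg']}")
    return lines


try:
    Record(name="Alice Brown", age="thirty-five")
    summary = []
except ValidationError as e:
    summary = summarize_errors(e)
    for line in summary:
        print(line)
```

Each entry in e.errors() is a dictionary, so the same approach works for any subset of its keys you want to surface.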
Here’s the complete code:
# main.py
import json
from pydantic import BaseModel, EmailStr, ValidationError, validator

class Employee(BaseModel):
    name: str
    age: int
    email: EmailStr
    department: str
    employee_id: str

    @validator("employee_id")
    def validate_employee_id(cls, v):
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v

# Load and parse the JSON data
with open("employees.json", "r") as f:
    data = json.load(f)

# Validate each employee record
for record in data:
    try:
        employee = Employee(**record)
        print(f"Valid employee record: {employee.name}")
    except ValidationError as e:
        print(f"Invalid employee record: {record['name']}")
        print(f"Errors: {e.errors()}")
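One possible variation on the loop above, in case you need the validated objects afterwards rather than only printed messages, is to collect valid and invalid records in separate lists. This sketch is an assumption about how you might structure it, not part of the original script. It assumes Pydantic 2.x, inlines two of the records from employees.json as a JSON string, and uses a reduced, hypothetical `EmployeeLite` model so that it runs on its own.

```python
import json

from pydantic import BaseModel, ValidationError, field_validator


class EmployeeLite(BaseModel):
    name: str
    employee_id: str

    @field_validator("employee_id")
    @classmethod
    def validate_employee_id(cls, v: str) -> str:
        if not v.isalnum() or len(v) != 6:
            raise ValueError("Employee ID must be exactly 6 alphanumeric characters")
        return v


# Two records from employees.json, reduced to the fields this model declares
raw = (
    '[{"name": "John Doe", "employee_id": "EMP001"},'
    ' {"name": "Alice Brown", "employee_id": "EMP0034"}]'
)

valid_employees = []
invalid_records = []
for record in json.loads(raw):
    try:
        valid_employees.append(EmployeeLite(**record))
    except ValidationError as e:
        # Keep the offending record together with its error details
        invalid_records.append((record, e.errors()))

print(f"{len(valid_employees)} valid, {len(invalid_records)} invalid")
```

Keeping the rejected records alongside their errors makes it straightforward to log or report them later.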
That’s all for this tutorial! This is an introductory tutorial to Pydantic. I hope you learned the basics of modeling your data, and using both built-in and custom validations that Pydantic offers. All the code used in this tutorial is on GitHub.
Next, you may try using Pydantic in your Python projects and also explore serialization capabilities. Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.