Want data on the fly? Just call on Pandas and NumPy!

Life's too short for predictable data—embrace the randomness with Pandas and NumPy.

In data science and machine learning projects, it's often useful to create data frames filled with random data for testing, experimentation, or prototyping. Python's Pandas and NumPy libraries make this easy. In this blog, we'll explore how to generate data frames with random floats, random integers, and values drawn from a statistical distribution.

I assume you have a recent version of Python installed on your local machine.

Getting Started

To follow along with the examples in this blog, you need to have Pandas and NumPy installed. If you haven't installed them yet, you can use the following command:

pip install pandas numpy

Now, let's explore how to generate data frames with random data.

Generating Data Frames with Random Data

First, we'll import the Pandas and NumPy libraries:

import pandas as pd
import numpy as np

1. Generating a Data Frame with Random Numbers

You can use NumPy's np.random module to generate random numbers and Pandas' pd.DataFrame to create a data frame.

For example, let's generate a data frame with 10 rows and 5 columns filled with random numbers between 0 and 1:

# Generate a 10x5 array of random numbers between 0 and 1
random_data = np.random.rand(10, 5)

# Create a data frame using the random data
df = pd.DataFrame(random_data, columns=['A', 'B', 'C', 'D', 'E'])

# Display the data frame
print(df)

Output:

The values are random, so they will differ on every run. Try the code in any Python installation or in a notebook on Anaconda Cloud.
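If you want the same "random" data frame on every run (handy for tests and for sharing examples), one option is NumPy's seeded Generator API. The sketch below is an addition to the example above, not part of it: the seed value 42 and the names rng, df_seeded, and df_repeat are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same sequence of numbers on every run.
# The seed value (42) is arbitrary; any integer works.
rng = np.random.default_rng(42)
df_seeded = pd.DataFrame(rng.random((10, 5)), columns=list("ABCDE"))

# Re-creating the generator with the same seed reproduces the same data
rng_again = np.random.default_rng(42)
df_repeat = pd.DataFrame(rng_again.random((10, 5)), columns=list("ABCDE"))

print(df_seeded.equals(df_repeat))  # True
```

Like np.random.rand, rng.random draws floats from 0 (inclusive) up to 1 (exclusive).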

2. Generating a Data Frame with Random Integers

To generate a data frame with random integers, you can use NumPy's np.random.randint function. Its upper bound is exclusive, so to include 100 we pass 101. For example, let's create a data frame with 8 rows and 3 columns filled with random integers between 1 and 100:

# Generate an 8x3 array of random integers between 1 and 100
random_data_integers = np.random.randint(1, 101, size=(8, 3))

# Create a data frame using the random integer data
df_integers = pd.DataFrame(random_data_integers, columns=['X', 'Y', 'Z'])

# Display the data frame
print(df_integers)

Output:

    X    Y   Z
0  67   79  99
1  82   34  18
2  33   31  62
3  73   18  38
4  77  100  48
5  11   81  95
6  29   42  22
7  27   62  79
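Real test data often needs a different range in each column. A minimal sketch of one way to do that is to build the data frame from a dictionary of arrays, one per column. The column names (age, score, dice), their ranges, and the seed are made up for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)
n_rows = 8

# One array per column, each with its own range (hypothetical columns)
df_ranges = pd.DataFrame({
    "age": rng.integers(18, 66, size=n_rows),     # 18 to 65; upper bound is exclusive
    "score": rng.integers(0, 101, size=n_rows),   # 0 to 100
    "dice": rng.integers(1, 6, size=n_rows, endpoint=True),  # 1 to 6; endpoint=True includes 6
})

print(df_ranges)
```

As with np.random.randint, the upper bound of rng.integers is exclusive unless you pass endpoint=True.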

3. Generating Data Frames with Specific Distributions

You can use NumPy's functions to generate random data with specific statistical distributions.

Normal Distribution: Use np.random.normal to generate data with a normal distribution (mean=0 and standard deviation=1).

# Generate a 5x4 array of random numbers with a normal distribution
random_data_normal = np.random.normal(0, 1, size=(5, 4))

# Create a data frame using the random data
df_normal = pd.DataFrame(random_data_normal, columns=['W', 'X', 'Y', 'Z'])

# Display the data frame
print(df_normal)

Output:
          W         X         Y         Z
0  0.199598 -0.323744  1.183820  0.957453
1 -0.309851 -0.075018 -0.379081  1.062766
2 -1.765066  0.238010 -0.408007 -1.676825
3  0.732172  0.007899  0.764553 -1.122832
4 -1.313068  0.404557 -0.417568  0.362700
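The same idea extends to other distributions and other parameters. The sketch below mixes a normal column with a custom mean and standard deviation, a uniform column, and a weighted True/False column. All column names, parameter values, and the seed are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable (the seed is arbitrary)
rng = np.random.default_rng(7)
n_rows = 1000

df_dist = pd.DataFrame({
    # Normal distribution with mean 170 and standard deviation 10
    "height_cm": rng.normal(loc=170, scale=10, size=n_rows),
    # Uniform distribution: every value from 15.0 up to 30.0 is equally likely
    "temperature": rng.uniform(low=15.0, high=30.0, size=n_rows),
    # Weighted choice: True about 30% of the time, False about 70%
    "is_member": rng.choice([True, False], size=n_rows, p=[0.3, 0.7]),
})

# Summary statistics for the numeric columns
print(df_dist.describe())
```

With 1,000 rows, the mean of height_cm should land close to 170 and the share of True values close to 0.3, which makes for a quick sanity check that the parameters did what you intended.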

Conclusion

Generating data frames with random data is a great way to experiment with data manipulation techniques and test code functionality. With Pandas and NumPy, you can easily create data frames filled with random floats, random integers, or values drawn from a specific statistical distribution. Try out the examples above to start generating data frames with random data for your projects!

